OCR-Auto
100% bbox accuracy on 50 element types, and accuracy up from 65% to 92% over 10+ prompt iterations
Bbox Accuracy: 100%
Element Types: 50
Label Accuracy: 92%
Prompt Versions: 19
Most annotation tools require human labelers to classify each element manually. OCR-Auto replaces that entire workflow with an async 4-stage pipeline powered by Qwen VL models. The system identifies 50 distinct element types across code languages, interaction formats, content elements, and edge cases, ranging from hyperlinks to multi-column layouts to watermarks.
The engineering challenge was reliability at scale. OCR-Auto addresses it with three layers of fault tolerance (exponential-backoff retries, EWMA-adaptive rate limiting, and a circuit breaker), SHA-256 content-addressed caching for deterministic results, and checkpoint recovery for crash resilience. Prompt engineering iterated from V1.0 (65% accuracy) through 10+ versions to V3.9 (92%), with V2.0 reaching 100% bbox accuracy on validation sets.
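To make the reliability layers concrete, here is a minimal Python sketch of how they could fit together. It is not OCR-Auto's actual code: every name (`cache_key`, `backoff_delay`, `EwmaLimiter`, `CircuitBreaker`, `resilient_call`, `annotate_cached`) and every default value is a hypothetical chosen for illustration, and the sketch is synchronous for brevity even though the real pipeline is described as async.

```python
import hashlib
import json
import time


def cache_key(image_bytes: bytes, prompt_version: str, model: str) -> str:
    """Content address: SHA-256 of the image plus the settings that shape the output."""
    meta = json.dumps({"model": model, "prompt": prompt_version}, sort_keys=True)
    return hashlib.sha256(image_bytes + b"\x00" + meta.encode("utf-8")).hexdigest()


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 8.0) -> float:
    """Exponential backoff: base * 2**attempt, capped (assumed defaults)."""
    return min(cap, base * (2 ** attempt))


class EwmaLimiter:
    """Stretches the pause between requests as the smoothed error rate rises."""

    def __init__(self, alpha=0.2, min_delay=0.1, max_delay=5.0):
        self.alpha, self.min_delay, self.max_delay = alpha, min_delay, max_delay
        self.error_rate = 0.0  # exponentially weighted moving average of failures

    def observe(self, ok: bool) -> None:
        sample = 0.0 if ok else 1.0
        self.error_rate = self.alpha * sample + (1 - self.alpha) * self.error_rate

    def delay(self) -> float:
        return self.min_delay + (self.max_delay - self.min_delay) * self.error_rate


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is refusing calls."""


class CircuitBreaker:
    """Opens after `threshold` consecutive failures; allows a trial call after a cooldown."""

    def __init__(self, threshold=5, reset_after=30.0, clock=time.monotonic):
        self.threshold, self.reset_after, self.clock = threshold, reset_after, clock
        self.failures = 0
        self.opened_at = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        return self.clock() - self.opened_at >= self.reset_after

    def record(self, ok: bool) -> None:
        if ok:
            self.failures, self.opened_at = 0, None
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial


def resilient_call(fn, breaker, limiter, max_attempts=4, sleep=time.sleep):
    """Run fn() behind all three layers: breaker check, adaptive pacing, backoff retry."""
    for attempt in range(max_attempts):
        if not breaker.allow():
            raise CircuitOpenError("circuit open; call skipped")
        sleep(limiter.delay())
        try:
            result = fn()
        except Exception:
            breaker.record(False)
            limiter.observe(False)
            if attempt == max_attempts - 1:
                raise
            sleep(backoff_delay(attempt))
        else:
            breaker.record(True)
            limiter.observe(True)
            return result


def annotate_cached(image_bytes, prompt_version, model, infer, cache, **kwargs):
    """Return the cached result for identical inputs; otherwise call the model once."""
    key = cache_key(image_bytes, prompt_version, model)
    if key not in cache:
        cache[key] = resilient_call(lambda: infer(image_bytes), **kwargs)
    return cache[key]
```

Keying the cache on the image bytes together with the prompt version and model name means a rerun with unchanged inputs returns the stored result, while a prompt bump such as V3.8 to V3.9 misses the cache and triggers fresh inference. A dict stands in for the cache here; a persistent store would be needed for the results to survive a crash.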