OCR-Auto

100% bbox accuracy on 50 element types, with label accuracy rising from 65% to 92% over 10+ prompt versions


Bbox Accuracy: 100%
Element Types: 50
Label Accuracy: 92%
Prompt Versions: 10+


Most annotation tools require human labelers to classify each element manually. OCR-Auto replaces that workflow with an async 4-stage pipeline powered by Qwen VL models. The system identifies 50 distinct element types across code languages, interaction formats, content elements, and edge cases, ranging from hyperlinks to multi-column layouts to watermarks.
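As a rough illustration of what an async multi-stage pipeline can look like, the sketch below chains four stages with `asyncio.Queue` objects so that each stage processes items as soon as the previous one emits them. The stage names (`stage_1` to `stage_4`), the `make_stage` helper, and the `trace` field are placeholders invented for this example; the actual OCR-Auto stages and their Qwen VL calls are not described here.

```python
import asyncio
from typing import Awaitable, Callable

Stage = Callable[[dict], Awaitable[dict]]


def make_stage(name: str) -> Stage:
    """Build a placeholder stage that only records its own name on the item."""
    async def stage(item: dict) -> dict:
        await asyncio.sleep(0)  # stand-in for real async work (e.g. a model call)
        return {**item, "trace": item.get("trace", []) + [name]}
    return stage


# Hypothetical stage names: the source only states that there are four stages.
STAGES = [make_stage(f"stage_{i}") for i in range(1, 5)]


async def worker(stage: Stage, inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    """Pull items from inbox, apply the stage, push results to outbox."""
    while True:
        item = await inbox.get()
        if item is None:            # sentinel: forward it and stop
            await outbox.put(None)
            return
        await outbox.put(await stage(item))


async def run_pipeline(items: list[dict], stages: list[Stage]) -> list[dict]:
    # One queue between each pair of stages, plus an input and an output queue.
    queues = [asyncio.Queue() for _ in range(len(stages) + 1)]
    tasks = [
        asyncio.create_task(worker(stage, queues[i], queues[i + 1]))
        for i, stage in enumerate(stages)
    ]
    for item in items:
        await queues[0].put(item)
    await queues[0].put(None)

    results = []
    while (out := await queues[-1].get()) is not None:
        results.append(out)
    await asyncio.gather(*tasks)
    return results


if __name__ == "__main__":
    print(asyncio.run(run_pipeline([{"id": 1}, {"id": 2}], STAGES)))
```

With one worker per stage and FIFO queues, items leave the pipeline in the order they entered; running several workers per stage would raise throughput at the cost of that ordering guarantee.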

The engineering challenge was reliability at scale. The pipeline addresses it with three-layer fault tolerance (exponential backoff retry, EWMA-adaptive rate limiting, and a circuit breaker), SHA-256 content-addressed caching for deterministic results, and checkpoint recovery for crash resilience. Prompt engineering iterated from V1.0 (65% label accuracy) through 10+ versions to V3.9 (92%), with V2.0 achieving 100% bbox accuracy on validation sets.
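The reliability techniques named above can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the project's implementation: every name (`cache_key`, `Ewma`, `CircuitBreaker`, `call_with_retry`) and every default value is hypothetical, and including the prompt version in the cache key is an assumption about how results would stay deterministic across prompt changes. Checkpoint recovery is left out.

```python
import asyncio
import hashlib
import random
import time


def cache_key(image_bytes: bytes, prompt_version: str) -> str:
    """Content-addressed key: identical inputs always hash to the same key."""
    h = hashlib.sha256()
    h.update(prompt_version.encode("utf-8"))  # assumption: version is part of the key
    h.update(b"\0")
    h.update(image_bytes)
    return h.hexdigest()


class Ewma:
    """Exponentially weighted moving average, e.g. of request latency."""

    def __init__(self, alpha: float = 0.2) -> None:
        self.alpha = alpha
        self.value = None

    def update(self, sample: float) -> float:
        if self.value is None:
            self.value = sample
        else:
            self.value = self.alpha * sample + (1 - self.alpha) * self.value
        return self.value


class CircuitBreaker:
    """Opens after max_failures consecutive failures, closes after reset_after seconds.

    Simplified: it closes fully after the cooldown instead of modelling a half-open state.
    """

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def allow(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        if self.opened_at is None:
            return True
        if now - self.opened_at >= self.reset_after:
            self.opened_at, self.failures = None, 0
            return True
        return False

    def record(self, ok: bool, now=None) -> None:
        now = time.monotonic() if now is None else now
        if ok:
            self.failures = 0
            return
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = now


async def call_with_retry(fn, breaker, retries=4, base_delay=0.5,
                          max_delay=8.0, sleep=asyncio.sleep):
    """Retry an async call with exponential backoff and a little jitter."""
    for attempt in range(retries + 1):
        if not breaker.allow():
            raise RuntimeError("circuit open")
        try:
            result = await fn()
        except Exception:
            breaker.record(False)
            if attempt == retries:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            await sleep(delay + random.uniform(0, delay * 0.1))
        else:
            breaker.record(True)
            return result
```

One way to make the rate limiter adaptive would be to feed observed latencies into `Ewma.update` and widen the pause between requests as the average climbs; the exact policy used by OCR-Auto is not stated here.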

Python

Qwen VL

Async Pipeline

LLM
