OCR-Auto

100% bbox accuracy across 50 element types: from 65% to 92% over 10 prompt iterations

100% bbox · 50 labels

100% Bbox Accuracy · 50 Element Types · 92% Label Accuracy · 19 Prompt Versions

Most annotation tools require human labelers to classify each element manually. OCR-Auto replaces that entire workflow with an asynchronous four-stage pipeline powered by Qwen VL models. The system identifies 50 distinct element types spanning code languages, interaction formats, content elements, and edge cases, from hyperlinks to multi-column layouts to watermarks.
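
The page does not describe the four stages or how they are wired together. The sketch below assumes one plausible shape: each stage is a coroutine, and consecutive stages are connected by bounded `asyncio.Queue`s so that a slow stage applies backpressure upstream. The stage names (`preprocess`, `detect`, `classify`, `validate`) and their bodies are hypothetical placeholders; in a real pipeline, `classify` is where a Qwen VL request would go.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

Stage = Callable[[Any], Awaitable[Any]]
_DONE = object()  # sentinel that marks the end of the stream


async def _stage_worker(stage: Stage, inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    while True:
        item = await inbox.get()
        if item is _DONE:
            await outbox.put(_DONE)  # propagate shutdown to the next stage
            return
        await outbox.put(await stage(item))


async def run_pipeline(stages: List[Stage], items: List[Any], maxsize: int = 8) -> List[Any]:
    # A bounded queue between each pair of stages gives backpressure:
    # a slow stage (e.g. the model call) stalls its producers instead of
    # letting buffered items grow without bound.
    queues = [asyncio.Queue(maxsize=maxsize) for _ in range(len(stages) + 1)]
    workers = [asyncio.create_task(_stage_worker(s, queues[i], queues[i + 1]))
               for i, s in enumerate(stages)]

    async def feed() -> None:
        for item in items:
            await queues[0].put(item)
        await queues[0].put(_DONE)

    feeder = asyncio.create_task(feed())
    results = []
    while (out := await queues[-1].get()) is not _DONE:
        results.append(out)
    await asyncio.gather(feeder, *workers)
    return results


# Hypothetical stages: the source does not name the four stages.
async def preprocess(page_id: str) -> Dict[str, Any]:
    return {"page": page_id}


async def detect(rec: Dict[str, Any]) -> Dict[str, Any]:
    rec["boxes"] = [(0, 0, 100, 20)]  # stand-in for detected bounding boxes
    return rec


async def classify(rec: Dict[str, Any]) -> Dict[str, Any]:
    rec["labels"] = ["hyperlink" for _ in rec["boxes"]]  # a Qwen VL call would go here
    return rec


async def validate(rec: Dict[str, Any]) -> Dict[str, Any]:
    rec["ok"] = len(rec["labels"]) == len(rec["boxes"])
    return rec
```

Calling `asyncio.run(run_pipeline([preprocess, detect, classify, validate], ["p1", "p2"]))` returns one record per page. With a single worker per stage, input order is preserved; running several workers on the slow stage would raise throughput at the cost of reordering results.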

The main engineering challenge was reliability at scale, addressed with three layers of fault tolerance (retry with exponential backoff, EWMA-adaptive rate limiting, and a circuit breaker), SHA-256 content-addressed caching for deterministic results, and checkpoint recovery for crash resilience. Prompt engineering went through 10+ versions, raising accuracy from 65% at V1.0 to 92% at V3.9, and V2.0 achieved 100% bbox accuracy on the validation sets.
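
The three fault-tolerance layers can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the circuit-breaker threshold and timeout, the EWMA smoothing factor `alpha`, and the rule that the failure-rate EWMA stretches the spacing between calls (`base_interval * (1 + 10 * error_rate)`) are all hypothetical choices.

```python
import asyncio
import random
import time
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised instead of calling the model while the breaker is open."""


class CircuitBreaker:
    """Opens after N consecutive failures; after reset_timeout, lets trial calls through."""

    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at: Optional[float] = None

    def before_call(self) -> None:
        if self.opened_at is not None and time.monotonic() - self.opened_at < self.reset_timeout:
            raise CircuitOpenError("circuit open; failing fast")

    def record(self, ok: bool) -> None:
        if ok:
            self.failures, self.opened_at = 0, None  # close the circuit
        else:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()  # open (or re-open after a failed trial)


class EwmaRateLimiter:
    """Spaces calls apart; the gap widens as the EWMA of the failure rate rises."""

    def __init__(self, base_interval: float = 0.1, alpha: float = 0.2,
                 max_interval: float = 5.0) -> None:
        self.base_interval = base_interval
        self.alpha = alpha
        self.max_interval = max_interval
        self.error_rate = 0.0  # EWMA over samples: 1.0 = failure, 0.0 = success
        self._next_slot = 0.0

    @property
    def interval(self) -> float:
        # Hypothetical scaling rule: up to 11x the base gap at a 100% error rate.
        return min(self.max_interval, self.base_interval * (1 + 10 * self.error_rate))

    def observe(self, ok: bool) -> None:
        sample = 0.0 if ok else 1.0
        self.error_rate = self.alpha * sample + (1 - self.alpha) * self.error_rate

    async def acquire(self) -> None:
        # No await between reading and updating _next_slot, so concurrent
        # tasks on one event loop can share this limiter without a lock.
        now = time.monotonic()
        wait = self._next_slot - now
        self._next_slot = max(now, self._next_slot) + self.interval
        if wait > 0:
            await asyncio.sleep(wait)


async def call_with_resilience(fn: Callable[[], Awaitable[T]], breaker: CircuitBreaker,
                               limiter: EwmaRateLimiter, max_retries: int = 4,
                               base_delay: float = 0.5, max_delay: float = 8.0) -> T:
    for attempt in range(max_retries + 1):
        breaker.before_call()    # circuit breaker: fail fast while open
        await limiter.acquire()  # adaptive rate limiting: space out calls
        try:
            result = await fn()
        except Exception:        # a real client should retry only transient errors
            breaker.record(ok=False)
            limiter.observe(ok=False)
            if attempt == max_retries:
                raise
            # Retry: exponential backoff with full jitter.
            await asyncio.sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
        else:
            breaker.record(ok=True)
            limiter.observe(ok=True)
            return result
    raise AssertionError("unreachable")
```

Checking the breaker before waiting on the rate limiter means that once the endpoint is clearly down, callers fail immediately instead of queuing for slots they will not use.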

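Content-addressed caching and checkpoint recovery can be sketched with the standard library alone. In this sketch, which is assumed rather than taken from the source, the cache key is a SHA-256 digest over the model name, the prompt version, and the image bytes, so identical inputs always map to the same stored result and bumping the prompt version naturally invalidates old entries. The checkpoint is a JSON list of finished item IDs, rewritten atomically after each item. The names `ContentCache`, `Checkpoint`, `process_all`, and the `annotate` callback are hypothetical.

```python
import hashlib
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict, List, Optional


def cache_key(image_bytes: bytes, prompt_version: str, model: str) -> str:
    """SHA-256 over every input that can change the model output."""
    h = hashlib.sha256()
    for part in (model.encode(), prompt_version.encode(), image_bytes):
        h.update(len(part).to_bytes(8, "big"))  # length prefix keeps parts unambiguous
        h.update(part)
    return h.hexdigest()


def atomic_write(path: Path, text: str) -> None:
    """Write to a temp file, then rename, so a crash never leaves a half-written file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(text)
    os.replace(tmp, path)


class ContentCache:
    def __init__(self, root: Path) -> None:
        self.root = Path(root)

    def _path(self, key: str) -> Path:
        return self.root / key[:2] / f"{key}.json"  # shard files by hash prefix

    def get(self, key: str) -> Optional[Any]:
        p = self._path(key)
        return json.loads(p.read_text(encoding="utf-8")) if p.exists() else None

    def put(self, key: str, value: Any) -> None:
        atomic_write(self._path(key), json.dumps(value))


class Checkpoint:
    def __init__(self, path: Path) -> None:
        self.path = Path(path)
        self.done = set()
        if self.path.exists():
            self.done = set(json.loads(self.path.read_text(encoding="utf-8")))

    def mark_done(self, item_id: str) -> None:
        self.done.add(item_id)
        atomic_write(self.path, json.dumps(sorted(self.done)))


def process_all(items: Dict[str, bytes], annotate: Callable[[bytes], Any],
                cache: ContentCache, ckpt: Checkpoint,
                prompt_version: str, model: str) -> List[str]:
    """Annotate every item not yet checkpointed; return the IDs handled in this run."""
    processed = []
    for item_id, image in items.items():
        if item_id in ckpt.done:
            continue  # finished in an earlier run that later crashed
        key = cache_key(image, prompt_version, model)
        if cache.get(key) is None:
            cache.put(key, annotate(image))  # only cache misses reach the model
        ckpt.mark_done(item_id)
        processed.append(item_id)
    return processed
```

If a run crashes partway, constructing a fresh `Checkpoint` from the same file and calling `process_all` again skips finished items, and any item whose image bytes were already annotated is served from the cache without another model call.
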
Python
Qwen VL
Async Pipeline
LLM