T2V Hook Relabeling Pipeline
Using a better AI to fix a weaker AI's labeling mistakes, at a cost human reviewers cannot match
Label Dimensions: 6
Iterative Rounds: 3
Cost vs. Human: 100x
Adaptive Rate Limit: EWMA
The AI-generated video content industry has a dirty secret: the training data that teaches models about human demographics, scenes, and seasons is itself labeled by humans, and human labelers make systematic mistakes that compound into model bias. By 2025, synthetic avatar video generation is a $500M+ market (Sora, Kling, Pika, HeyGen), and every model needs demographically accurate training labels.
The meta-labeling insight: a more capable frontier model (Gemini 2.5 Flash) can audit and correct the output of cheaper, less accurate labeling rounds. Human labeling costs $0.50–2.00 per video. Our Gemini pipeline costs ~$0.01 per video at scale, a 50–200x cost reduction, while improving label accuracy. Beyond roughly 10,000 videos, the gap in absolute spend becomes too large to ignore.
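To make the cost claim concrete, here is a small arithmetic sketch using only the per-video prices quoted above. The function name `cost_comparison` and its parameters are illustrative, not part of the pipeline. Prices are kept in integer cents to avoid floating-point noise.

```python
def cost_comparison(num_videos: int,
                    human_cents_low: int = 50,    # $0.50 per video (from the text)
                    human_cents_high: int = 200,  # $2.00 per video (from the text)
                    pipeline_cents: int = 1):     # ~$0.01 per video (from the text)
    """Return total spend in USD for human vs. pipeline labeling, plus the ratio."""
    return {
        "human_usd": (num_videos * human_cents_low / 100,
                      num_videos * human_cents_high / 100),
        "pipeline_usd": num_videos * pipeline_cents / 100,
        "reduction": (human_cents_low / pipeline_cents,
                      human_cents_high / pipeline_cents),
    }


result = cost_comparison(10_000)
```

At 10,000 videos this works out to $5,000–20,000 for human labeling against about $100 for the pipeline.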
The engineering challenge was rate-limit adaptation at scale. Gemini's API has token-per-minute (TPM) quotas that change dynamically based on request patterns. Static throttling either leaves throughput on the table or causes cascading 429s. We implemented EWMA (Exponentially Weighted Moving Average) rate tracking that learns the effective TPM from the last N requests and automatically adjusts concurrency, with no manual tuning and no wasted tokens.
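A minimal sketch of what an EWMA-based tracker along these lines might look like. It is not the pipeline's actual code: the class name `EwmaRateTracker`, the smoothing factor `alpha`, the `probe` and `backoff` multipliers, and the per-window observation interface are all assumptions made for illustration. Clean windows nudge the estimate slightly above what was achieved, and throttled windows (any 429) pull it down.

```python
import math
from typing import Optional


class EwmaRateTracker:
    """Tracks an EWMA estimate of effective tokens-per-minute (TPM)."""

    def __init__(self, alpha: float = 0.2, probe: float = 1.1,
                 backoff: float = 0.5, max_workers: int = 32):
        self.alpha = alpha              # weight of the newest sample (assumed value)
        self.probe = probe              # headroom to try after a clean window
        self.backoff = backoff          # shrink factor after a throttled window
        self.max_workers = max_workers
        self.effective_tpm: Optional[float] = None  # estimated usable quota
        self.worker_tpm: Optional[float] = None     # tokens/min one worker consumes

    def _blend(self, old: Optional[float], sample: float) -> float:
        # Standard EWMA update: new = alpha * sample + (1 - alpha) * old
        if old is None:
            return sample
        return self.alpha * sample + (1 - self.alpha) * old

    def observe(self, tokens: int, seconds: float,
                workers: int, throttled: bool) -> None:
        """Fold one observation window into the running estimates."""
        achieved_tpm = tokens * 60.0 / seconds
        self.worker_tpm = self._blend(self.worker_tpm, achieved_tpm / workers)
        factor = self.backoff if throttled else self.probe
        self.effective_tpm = self._blend(self.effective_tpm, achieved_tpm * factor)

    def concurrency(self) -> int:
        """Number of workers the current TPM estimate can sustain."""
        if not self.effective_tpm or not self.worker_tpm:
            return 1
        wanted = math.floor(self.effective_tpm / self.worker_tpm)
        return max(1, min(self.max_workers, wanted))


tracker = EwmaRateTracker(alpha=0.5, probe=1.25, backoff=0.5)
tracker.observe(tokens=60_000, seconds=60, workers=4, throttled=False)
after_clean = tracker.concurrency()
tracker.observe(tokens=75_000, seconds=60, workers=5, throttled=True)
after_throttle = tracker.concurrency()
```

In this run the tracker steps up from 4 to 5 workers after a clean window, then drops to 3 after a throttled one. A lower `alpha` would make the estimate react more slowly in both directions.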
Gemini's Files API lets the model receive a video URL and stream the content directly without local download. The 60 MB guard rail prevents timeout failures on oversized files. Each of the 3 full-run rounds uses incremental merge: high-confidence labels from earlier rounds are preserved and only the uncertain ones are re-examined. After the third round, label quality converges.
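The size guard and the incremental merge could be sketched roughly as follows. Everything here is an assumption made for illustration: the `Label` shape, the 0.9 confidence threshold, the dimension names in the usage note, and the reading of "60 MB" as 60 × 1024 × 1024 bytes. The sketch also assumes an uncertain label is replaced only when the new round is more confident, which the text does not specify.

```python
from dataclasses import dataclass
from typing import Dict, List

MAX_VIDEO_BYTES = 60 * 1024 * 1024  # assumed interpretation of the 60 MB guard
CONFIDENCE_THRESHOLD = 0.9          # hypothetical cut-off for "high confidence"


@dataclass(frozen=True)
class Label:
    value: str
    confidence: float


# video_id -> label dimension -> Label
LabelSet = Dict[str, Dict[str, Label]]


def within_size_limit(size_bytes: int) -> bool:
    """Skip oversized files before they can cause a timeout."""
    return size_bytes <= MAX_VIDEO_BYTES


def merge_round(previous: LabelSet, current: LabelSet,
                threshold: float = CONFIDENCE_THRESHOLD) -> LabelSet:
    """Keep confident labels from earlier rounds; update only uncertain ones."""
    merged: LabelSet = {vid: dict(dims) for vid, dims in previous.items()}
    for video_id, dims in current.items():
        target = merged.setdefault(video_id, {})
        for dimension, new in dims.items():
            old = target.get(dimension)
            if old is not None and old.confidence >= threshold:
                continue  # earlier round was already confident: preserve it
            if old is None or new.confidence > old.confidence:
                target[dimension] = new
    return merged


def uncertain_videos(labels: LabelSet,
                     threshold: float = CONFIDENCE_THRESHOLD) -> List[str]:
    """Videos with at least one low-confidence dimension, for the next round."""
    return sorted(
        vid for vid, dims in labels.items()
        if any(label.confidence < threshold for label in dims.values())
    )
```

Suppose round one produced `season="summer"` at 0.95 and `scene="indoor"` at 0.4. A later round can then overwrite `scene` but not `season`. `uncertain_videos` returns the IDs that still need another pass.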