How HiveGuard Works
HiveGuard turns each visitor request into an opportunity to collect a label. Here is the full loop.
The labeling loop
1. You upload a dataset of unlabeled items
        │
        ▼
2. A visitor hits your site → HiveGuard intercepts the request
        │
        ▼
3. HiveGuard generates a challenge: one item from your dataset
   paired with one ground-truth item (known answer)
        │
        ▼
4. Visitor answers both items
   ├─ Ground-truth answer correct?
   │     Yes → visitor is a real human, request goes through
   │     No  → challenge failed, not recorded as a label
   └─ Unknown item answer recorded → feeds consensus engine
        │
        ▼
5. Consensus engine aggregates answers from multiple visitors
   When enough visitors agree → label finalized
        │
        ▼
6. Labeled item available for export

Why the ground-truth pairing matters
Each challenge contains two items: one whose answer you already know (ground truth), and one you want labeled (unknown). The visitor doesn’t know which is which.
The ground-truth item filters out random clickers and inattentive answers. If a visitor answers the GT item wrong, their answer on the unknown item is discarded — it wasn’t a genuine label, it was noise. This keeps label quality high without any manual review on your end.
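The filtering step can be sketched in a few lines of Python. This is a minimal illustration of the idea, not HiveGuard's actual API — the `Challenge` class and `record_answers` function are hypothetical names:

```python
from dataclasses import dataclass, field


@dataclass
class Challenge:
    known_item: str     # ground-truth item, answer already known
    known_answer: str
    unknown_item: str   # item we actually want labeled


def record_answers(
    challenge: Challenge,
    answer_known: str,
    answer_unknown: str,
    labels: dict[str, list[str]],
) -> bool:
    """Return True if the visitor passes the ground-truth check.

    The visitor's answer on the unknown item only counts as a label
    when the ground-truth item was answered correctly; otherwise it
    is discarded as noise.
    """
    if answer_known != challenge.known_answer:
        return False  # failed GT check: unknown answer is not recorded
    labels.setdefault(challenge.unknown_item, []).append(answer_unknown)
    return True
```

The key property is that a wrong ground-truth answer drops the whole submission: the unknown-item answer never reaches the consensus engine.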
Consensus
Labels are not finalized after a single solver. HiveGuard collects answers from multiple visitors and runs them through the consensus engine. Once enough solvers have answered (default: 3) and more than 50% of them agree on one answer, the label is finalized with a confidence score: the fraction of solvers who gave the winning answer.
This means a label with confidence 0.9 represents 9 out of 10 visitors agreeing — a much stronger signal than any single annotation.
Re-validation
Labels don’t expire, but they can age. HiveGuard continuously re-serves already-labeled items to new visitors, updating the confidence score over time. If a label starts losing agreement, its confidence drops. Items that fall below the confidence threshold are flagged for review and excluded from exports until they stabilize. See Re-validation for details.
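One way to picture the re-validation update is as a recomputation of confidence every time a new answer arrives, with a flag for items that drop below the export threshold. This is a sketch under assumptions — the threshold value and the `revalidate` name are hypothetical, not HiveGuard's real defaults:

```python
from collections import Counter

CONFIDENCE_FLOOR = 0.7  # hypothetical review threshold; actual default may differ


def revalidate(answers: list[str]) -> dict:
    """Recompute a label's confidence after new answers accumulate.

    Items whose confidence falls below the floor are flagged for
    review and excluded from exports until they stabilize.
    """
    winner, count = Counter(answers).most_common(1)[0]
    confidence = count / len(answers)
    return {
        "label": winner,
        "confidence": confidence,
        "flagged": confidence < CONFIDENCE_FLOOR,
    }
```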
What happens when your dataset runs out
When all unknown items have been labeled, HiveGuard doesn’t stop — it enters re-validation mode. Already-labeled items re-enter the challenge pool as the unknown slot, continuously refining confidence. Your traffic keeps doing useful work even after the first labeling pass is complete.
If the challenge pool is ever empty at the wrong moment, HiveGuard’s degradation ladder handles it gracefully — visitors get through without being blocked.
The feedback loop
Upload items → collect labels via traffic → export labels → train model → repeat. The dataset gets richer over time without any additional infrastructure or human effort on your part.