Re-validation

When an item first reaches consensus it has a label, but that label is based on a small number of votes. A 3/3 unanimous first-pass gives confidence = 1.0 mathematically, but three data points is not enough evidence for a production ML dataset. Re-validation is the mechanism that keeps items in the challenge pool until their label is backed by sufficient evidence.

The two graduation conditions

An item only leaves the challenge pool when both of these are true:

  1. confidence >= revalidation_exit_confidence (default 0.85)
  2. vote_count + revalidation_vote_count >= revalidation_min_total_votes (default 15)

The confidence condition ensures the crowd agrees on the label. The vote-volume condition ensures enough independent humans have weighed in — preventing a lucky streak of three unanimous first-pass votes from graduating an item before it’s truly validated.
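The two graduation conditions can be sketched as a single predicate. This is a minimal illustration, not the engine's actual code; the function name and constants mirror the settings described above but are assumptions.

```python
# Sketch of the two graduation conditions; identifiers are illustrative,
# defaults match the documented settings.
REVALIDATION_EXIT_CONFIDENCE = 0.85
REVALIDATION_MIN_TOTAL_VOTES = 15

def can_graduate(confidence: float, vote_count: int,
                 revalidation_vote_count: int) -> bool:
    """An item leaves the challenge pool only when BOTH conditions hold."""
    enough_confidence = confidence >= REVALIDATION_EXIT_CONFIDENCE
    enough_votes = (vote_count + revalidation_vote_count
                    >= REVALIDATION_MIN_TOTAL_VOTES)
    return enough_confidence and enough_votes

# A 3/3 unanimous first pass (confidence 1.0, 3 votes) does NOT graduate:
print(can_graduate(1.0, 3, 0))   # False: only 3 of the 15 required votes
print(can_graduate(0.9, 10, 6))  # True: both thresholds met
```

Note that neither condition alone is sufficient: a perfectly confident item with few votes stays in the pool, as does a heavily voted item whose confidence is still low.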

Item lifecycle

Upload
  │
  ▼
Unlabeled ──[consensus_threshold votes, majority > 50%]──► Labeled
(Tier 1 pool)                                          (Tier 2 pool)
                             ┌───────────────────────────────┤
                             │                               │
                  confidence < floor              confidence ≥ exit AND
                  (< 0.60 default)                total_votes ≥ 15
                             │                               │
                             ▼                               ▼
                       Needs Review                      Graduated
                      (excluded from                (out of challenge
                     training export)                      pool)

How confidence is updated

Re-validation votes arrive in batches. Once revalidation_min_votes (default 3) unprocessed votes have accumulated, the engine applies a delta to confidence:

  • Vote agrees with current label → confidence += revalidation_delta (default 0.05)
  • Vote disagrees with current label → confidence -= revalidation_delta

Each vote in the batch is marked processed before the delta is applied, preventing the same vote from being counted in a future batch.

Example: item labeled “cat” with initial confidence 0.67, 3 re-validation votes all agree:

0.67 + (3 × 0.05) = 0.82 → still in pool (below the 0.85 exit confidence, and only 6 total votes of the 15 required)

The item stays in re-validation until both thresholds are met.
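The batch update above can be sketched as follows. The vote representation (a dict with "label" and "processed" keys) and the function name are assumptions for illustration; the delta logic and the processed-before-applied ordering follow the text.

```python
# Minimal sketch of the batched confidence update; vote structure is assumed.
REVALIDATION_MIN_VOTES = 3
REVALIDATION_DELTA = 0.05

def apply_revalidation_batch(confidence, current_label, votes):
    """Apply deltas once enough unprocessed votes have accumulated."""
    unprocessed = [v for v in votes if not v["processed"]]
    if len(unprocessed) < REVALIDATION_MIN_VOTES:
        return confidence  # batch not full yet; nothing to do

    for vote in unprocessed:
        vote["processed"] = True  # mark first so it is never re-counted
        if vote["label"] == current_label:
            confidence += REVALIDATION_DELTA
        else:
            confidence -= REVALIDATION_DELTA
    return confidence

# The worked example: "cat" at 0.67, three agreeing re-validation votes.
votes = [{"label": "cat", "processed": False} for _ in range(3)]
print(round(apply_revalidation_batch(0.67, "cat", votes), 2))  # 0.82
```

Because processed votes are filtered out at the top, calling the function again with the same vote list is a no-op until new votes arrive.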

Item selection within the re-validation pool

Items are picked using weighted random sampling so lower-confidence and vote-starved items get more attention:

weight = GREATEST(
exit_threshold − confidence, # uncertain labels get higher weight
(min_total_votes − total_votes) / min_total_votes, # few-vote items also weighted up
0.01 # floor so near-graduation items still appear
)

This prevents the same easy high-confidence item from being served repeatedly while harder items sit unsampled.
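A sketch of the weight formula and the sampler in Python, mirroring the GREATEST(...) expression above. The function names and the use of random.choices are assumptions; only the three weight terms come from the formula.

```python
# Hedged sketch of weighted random sampling over the re-validation pool.
import random

EXIT_THRESHOLD = 0.85
MIN_TOTAL_VOTES = 15

def weight(confidence: float, total_votes: int) -> float:
    return max(
        EXIT_THRESHOLD - confidence,                        # uncertain labels
        (MIN_TOTAL_VOTES - total_votes) / MIN_TOTAL_VOTES,  # vote-starved items
        0.01,                                               # floor
    )

def pick_item(items):
    """items: list of (confidence, total_votes) tuples."""
    weights = [weight(c, v) for c, v in items]
    return random.choices(items, weights=weights, k=1)[0]

# A vote-starved item carries far more weight than a near-graduate:
print(weight(0.60, 6))   # 0.6  (vote term dominates: (15-6)/15)
print(weight(0.84, 14))  # ≈ 0.067 (vote term again; confidence term is 0.01)
```

The 0.01 floor means items that are one vote or one delta away from graduating still get served occasionally rather than starving entirely.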

Exit states

Graduated

Both conditions met. Item is removed from the re-validation pool permanently. revalidation_epochs is incremented to track how many complete passes the item went through.

Needs review

Confidence dropped below revalidation_confidence_threshold (default 0.60). The item is flagged needs_review = True, excluded from training data exports, and removed from the re-validation pool until an admin reviews and clears it in the dashboard.
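The two exit states can be sketched as one resolution step run after each batch update. The Item class and function are illustrative; the field names (needs_review, revalidation_epochs) follow the text above.

```python
# Sketch of exit-state resolution; Item is an assumed stand-in model.
from dataclasses import dataclass

EXIT_CONFIDENCE = 0.85
MIN_TOTAL_VOTES = 15
CONFIDENCE_FLOOR = 0.60

@dataclass
class Item:
    confidence: float
    total_votes: int
    needs_review: bool = False
    in_pool: bool = True
    revalidation_epochs: int = 0

def resolve_exit_state(item: Item) -> str:
    if item.confidence < CONFIDENCE_FLOOR:
        item.needs_review = True   # excluded from exports until cleared
        item.in_pool = False
        return "needs_review"
    if item.confidence >= EXIT_CONFIDENCE and item.total_votes >= MIN_TOTAL_VOTES:
        item.in_pool = False
        item.revalidation_epochs += 1  # track completed passes
        return "graduated"
    return "in_pool"
```

Anything between the floor and the graduation thresholds simply stays in the pool and continues to receive re-validation votes.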

Configuration reference

| Setting | Default | Description |
| --- | --- | --- |
| CONSENSUS_THRESHOLD | 3 | First-pass votes needed to assign an initial label |
| REVALIDATION_MIN_TOTAL_VOTES | 15 | Minimum total votes (first-pass + re-validation) before graduation |
| REVALIDATION_EXIT_CONFIDENCE | 0.85 | Confidence required for graduation |
| REVALIDATION_CONFIDENCE_THRESHOLD | 0.60 | Floor below which an item is flagged for review |
| REVALIDATION_DELTA | 0.05 | Confidence change per re-validation vote |
| REVALIDATION_MIN_VOTES | 3 | Batch size before deltas are applied |

Viewing re-validation status

The dashboard Datasets view shows each item’s current confidence, total vote count, and whether it’s in the re-validation pool or flagged for review.

To export only well-validated labels for ML training:

hiveguard labels export --fmt jsonl | \
jq -c 'select(.confidence >= 0.85)' > training_data.jsonl

Items flagged needs_review are automatically excluded from exports.