Exporting Labels
As visitors answer challenges, labels accumulate. When enough answers on an item agree, the consensus engine finalizes a label. You can export finalized labels at any time; they are ready for model training.
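To make the idea of consensus concrete, here is a minimal sketch of a majority-vote rule. It is an illustration only, not HiveGuard's actual consensus engine: the function name `consensus` and the thresholds `min_solvers` and `min_agreement` are assumptions chosen for the example.

```python
from collections import Counter
from typing import Optional


def consensus(
    answers: list[str],
    min_solvers: int = 3,        # hypothetical threshold, not a HiveGuard setting
    min_agreement: float = 0.6,  # hypothetical threshold, not a HiveGuard setting
) -> Optional[dict]:
    """Return a finalized label, or None if agreement is not yet sufficient."""
    if len(answers) < min_solvers:
        return None
    # The most common answer is the candidate label.
    label, votes = Counter(answers).most_common(1)[0]
    # Confidence is the fraction of solvers who gave that answer.
    confidence = votes / len(answers)
    if confidence < min_agreement:
        return None
    return {"label": label, "confidence": confidence, "solver_count": len(answers)}
```

With these assumed thresholds, three `cat` answers and one `dog` answer would finalize as `cat` with a confidence of 0.75.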
Quick export
```bash
# JSONL to stdout
hiveguard labels export --fmt jsonl

# Save to file
hiveguard labels export --fmt csv --output labels.csv

# JSON, pipe through jq
hiveguard labels export --fmt json | jq '.[] | select(.confidence > 0.9)'
```
Export formats
| Format | Best for |
|---|---|
| csv | pandas, scikit-learn, spreadsheets |
| jsonl | large datasets, streaming pipelines |
| json | small datasets, inspection |
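JSONL suits large exports because each line is a complete JSON object, so a consumer never has to hold the whole file in memory. A minimal sketch of a streaming reader follows; the helper name `iter_labels` is an assumption for illustration, and the sample record is made up.

```python
import io
import json
from typing import IO, Iterator


def iter_labels(stream: IO[str]) -> Iterator[dict]:
    """Yield one label record per non-empty line of a JSONL stream."""
    for line in stream:
        line = line.strip()
        if line:  # tolerate blank lines
            yield json.loads(line)


# Demo with an in-memory stream; the record below is invented sample data.
sample = io.StringIO('{"label": "cat", "confidence": 0.95}\n')
for record in iter_labels(sample):
    print(record["label"], record["confidence"])
```

The same function works on a file opened with `open("labels.jsonl")` or on `sys.stdin` when the export is piped in.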
Export from a specific dataset
```bash
hiveguard datasets export DATASET_ID --fmt csv --output dataset_labels.csv
```
This streams directly from the server, so it is safe for datasets with millions of rows.
What each label contains
| Field | Type | Description |
|---|---|---|
| item_id | UUID | The item that was labeled |
| data_ref | string | URL of the item content |
| modality | string | image, text, or audio |
| label | string | The consensus answer |
| confidence | float | Fraction of solvers who agreed (0.0–1.0) |
| solver_count | int | Number of human solvers who answered |
| created_at | ISO 8601 | When the label was finalized |
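The fields above map naturally onto a typed record. The sketch below parses one exported row into a dataclass and checks the confidence range. The class `Label` and function `parse_label` are hypothetical names, and the exact serialization of `created_at` (for example, whether it ends in `Z`) is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from uuid import UUID


@dataclass(frozen=True)
class Label:
    item_id: UUID
    data_ref: str
    modality: str
    label: str
    confidence: float
    solver_count: int
    created_at: datetime


def parse_label(raw: dict) -> Label:
    """Convert one exported row (from JSON or CSV) into a typed Label."""
    confidence = float(raw["confidence"])
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    # fromisoformat() rejects a trailing "Z" before Python 3.11, so normalize it.
    created = str(raw["created_at"]).replace("Z", "+00:00")
    return Label(
        item_id=UUID(str(raw["item_id"])),
        data_ref=str(raw["data_ref"]),
        modality=str(raw["modality"]),
        label=str(raw["label"]),
        confidence=confidence,
        solver_count=int(raw["solver_count"]),
        created_at=datetime.fromisoformat(created),
    )
```

Because every value is coerced explicitly, the same function accepts rows read from CSV, where all values arrive as strings.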
Filtering by confidence
High-confidence labels are more reliable. Filter before feeding to a training pipeline:
```bash
# Only labels where ≥90% of solvers agreed
hiveguard labels export --fmt jsonl | \
  jq -c 'select(.confidence >= 0.9)' > high_conf.jsonl
```
Or in Python:
```python
import pandas as pd

df = pd.read_csv("labels.csv")
train_df = df[df["confidence"] >= 0.8]

# Ready for model training
print(f"{len(train_df)} high-confidence labels")
```
Automating regular exports
Schedule exports in a cron job or CI pipeline:
```bash
#!/bin/bash
DATE=$(date +%Y%m%d)
hiveguard labels export --fmt jsonl --output "labels_${DATE}.jsonl"
```
This pattern works well for nightly training runs: export the current labels, train, deploy.
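If the nightly job is already written in Python, the same dated export can be driven from there. This is a sketch that shells out to the CLI with the flags shown above; the helper names `build_export_command` and `run_export` are assumptions for illustration.

```python
import subprocess
from datetime import date
from typing import Optional


def build_export_command(day: Optional[date] = None) -> list[str]:
    """Build the export command with a date-stamped output file."""
    day = day or date.today()
    output = f"labels_{day:%Y%m%d}.jsonl"
    return ["hiveguard", "labels", "export", "--fmt", "jsonl", "--output", output]


def run_export(day: Optional[date] = None) -> str:
    """Run the export and return the output filename."""
    cmd = build_export_command(day)
    # check=True raises CalledProcessError if the CLI exits non-zero,
    # so a failed export stops the pipeline before training starts.
    subprocess.run(cmd, check=True)
    return cmd[-1]
```

Calling `run_export()` at the start of a training script returns the filename to hand to the next step.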
Labels are living data
HiveGuard continues collecting labels after export. Labels with low confidence are automatically re-queued for re-validation. If a previously exported label later loses agreement, its confidence drops, which is a signal to re-export and re-check.
For production pipelines, export on a schedule rather than once. The dataset improves over time.
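One way to act on that signal is to compare two exports and list the items whose confidence fell. The sketch below assumes `item_id` is stable across exports and that both exports have been loaded as lists of dicts. The function name `confidence_drops` and the `min_drop` threshold are hypothetical.

```python
def confidence_drops(
    old: list[dict],
    new: list[dict],
    min_drop: float = 0.05,  # hypothetical threshold for a meaningful drop
) -> list[tuple[str, float, float]]:
    """Return (item_id, old_confidence, new_confidence) for labels that lost agreement."""
    old_by_id = {rec["item_id"]: rec for rec in old}
    drops = []
    for rec in new:
        prev = old_by_id.get(rec["item_id"])
        if prev is None:
            continue  # item is new in this export, nothing to compare
        if prev["confidence"] - rec["confidence"] >= min_drop:
            drops.append((rec["item_id"], prev["confidence"], rec["confidence"]))
    return drops
```

Items returned here are candidates for removal from the training set, or for review, until re-validation settles them.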