Exporting Labels

As visitors answer challenges, labels accumulate. When enough answers on an item agree, the consensus engine finalizes a label. Finalized labels can be exported at any time and are ready for model training.

Quick export

# JSONL to stdout
hiveguard labels export --fmt jsonl

# CSV, saved to a file
hiveguard labels export --fmt csv --output labels.csv

# JSON, pipe through jq
hiveguard labels export --fmt json | jq '.[] | select(.confidence > 0.9)'

Export formats

Format	Best for
`csv`	pandas, scikit-learn, spreadsheets
`jsonl`	large datasets, streaming pipelines
`json`	small datasets, inspection

Export from a specific dataset

hiveguard datasets export DATASET_ID --fmt csv --output dataset_labels.csv

The export streams directly from the server, so it is safe for datasets with millions of rows.
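A streamed export is best consumed the same way, one row at a time. The following is a minimal sketch, assuming the CSV has a header row whose column names match the fields listed under "What each label contains"; the helper name `count_by_modality` and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter
from typing import IO


def count_by_modality(csv_stream: IO[str]) -> Counter:
    """Tally labels per modality, reading one row at a time."""
    counts: Counter = Counter()
    # DictReader yields rows lazily, so memory use stays flat
    # no matter how many rows the export contains.
    for row in csv.DictReader(csv_stream):
        counts[row["modality"]] += 1
    return counts


# Tiny in-memory sample standing in for dataset_labels.csv (made-up rows;
# the header is an assumption based on the documented field names).
sample = io.StringIO(
    "item_id,modality,label,confidence\n"
    "a1,image,cat,0.95\n"
    "a2,text,spam,0.80\n"
    "a3,image,dog,0.70\n"
)
print(count_by_modality(sample))
```

For a real export, pass `open("dataset_labels.csv", newline="")` instead of the in-memory sample.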

What each label contains

Field	Type	Description
`item_id`	UUID	The item that was labeled
`data_ref`	string	URL of the item content
`modality`	string	`image`, `text`, or `audio`
`label`	string	The consensus answer
`confidence`	float	Fraction of solvers who agreed with the consensus answer (0.0–1.0)
`solver_count`	int	Number of human solvers who answered
`created_at`	ISO 8601	When the label was finalized
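As a rough illustration of how these fields map onto typed values, here is a sketch that parses one JSONL record into a dataclass. It assumes the JSON keys match the field names in the table exactly; the `Label` class, the `parse_label` helper, and the sample record are hypothetical and not part of HiveGuard.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from uuid import UUID


@dataclass(frozen=True)
class Label:
    item_id: UUID
    data_ref: str
    modality: str
    label: str
    confidence: float
    solver_count: int
    created_at: datetime


def parse_label(line: str) -> Label:
    """Parse one JSONL line into a typed Label, validating as it goes."""
    raw = json.loads(line)
    confidence = float(raw["confidence"])
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    # fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so normalize it to an explicit UTC offset first.
    created = datetime.fromisoformat(raw["created_at"].replace("Z", "+00:00"))
    return Label(
        item_id=UUID(raw["item_id"]),
        data_ref=raw["data_ref"],
        modality=raw["modality"],
        label=raw["label"],
        confidence=confidence,
        solver_count=int(raw["solver_count"]),
        created_at=created,
    )


# Made-up record shaped like the table above (values are illustrative only).
sample_line = json.dumps({
    "item_id": "123e4567-e89b-12d3-a456-426614174000",
    "data_ref": "https://example.com/items/1.png",
    "modality": "image",
    "label": "cat",
    "confidence": 0.92,
    "solver_count": 12,
    "created_at": "2024-01-15T10:30:00Z",
})
print(parse_label(sample_line))
```

Validating at parse time means a malformed record fails loudly before it reaches a training job.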

Filtering by confidence

High-confidence labels are more reliable. Filter out low-confidence labels before feeding a training pipeline:

# Only labels where ≥90% of solvers agreed
hiveguard labels export --fmt jsonl | \
  jq -c 'select(.confidence >= 0.9)' > high_conf.jsonl

Or in Python:

import pandas as pd

# Load a CSV export (see "Quick export")
df = pd.read_csv("labels.csv")
# Keep only labels where at least 90% of solvers agreed, matching the jq example
train_df = df[df["confidence"] >= 0.9]

print(f"{len(train_df)} high-confidence labels")
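Choosing a cutoff is a trade-off between label quality and dataset size. The sketch below counts how many labels survive at several candidate thresholds, using only the standard library. It assumes a JSONL export with a numeric `confidence` key, as in the field table; the function name `survivors_by_threshold` and the sample data are illustrative.

```python
import json
from typing import Iterable


def survivors_by_threshold(
    lines: Iterable[str], thresholds: list[float]
) -> dict[float, int]:
    """Count how many labels meet or exceed each confidence threshold."""
    counts = {t: 0 for t in thresholds}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the export
        confidence = json.loads(line)["confidence"]
        for t in thresholds:
            if confidence >= t:
                counts[t] += 1
    return counts


# Made-up records; a real run would iterate over an open JSONL file instead.
sample_lines = [
    '{"item_id": "a", "label": "cat", "confidence": 0.95}',
    '{"item_id": "b", "label": "dog", "confidence": 0.85}',
    '{"item_id": "c", "label": "cat", "confidence": 0.60}',
]
for threshold, kept in survivors_by_threshold(sample_lines, [0.7, 0.8, 0.9]).items():
    print(f">= {threshold}: {kept} labels")
```

If raising the threshold removes most of the dataset, it may be worth waiting for more answers rather than training on a very small set.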

Automating regular exports

Schedule exports in a cron job or CI pipeline:

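#!/bin/bash
# Abort on the first failure so a failed export never feeds a training run
set -euo pipefail
# Date-stamp each export (YYYYMMDD) so earlier files are not overwritten
DATE=$(date +%Y%m%d)
hiveguard labels export --fmt jsonl --output "labels_${DATE}.jsonl"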

This pattern works well for nightly training runs: export the current labels, train, deploy.

Labels are living data

HiveGuard continues collecting labels after an export. Low-confidence labels are automatically re-queued for re-validation, where they receive additional answers. If a previously exported label later loses agreement, its confidence drops, which is a signal to re-export and re-check.

For production pipelines, export on a schedule rather than once. The dataset improves over time.
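One way to act on that signal is to compare two dated exports and flag items whose label changed or whose confidence fell. This is a minimal sketch, assuming both exports are JSONL with `item_id`, `label`, and `confidence` keys. The helper names and sample records are hypothetical, and how HiveGuard represents a re-queued item in later exports is not specified here, so items missing from the newer export are simply skipped.

```python
import json
from typing import Iterable


def load_by_item(lines: Iterable[str]) -> dict[str, dict]:
    """Index JSONL records by item_id."""
    records: dict[str, dict] = {}
    for line in lines:
        if line.strip():
            record = json.loads(line)
            records[record["item_id"]] = record
    return records


def find_regressions(old_lines: Iterable[str], new_lines: Iterable[str]) -> list[str]:
    """Return item_ids whose label changed or whose confidence dropped."""
    old = load_by_item(old_lines)
    new = load_by_item(new_lines)
    regressions = []
    for item_id, before in old.items():
        after = new.get(item_id)
        if after is None:
            continue  # absent from the newer export; handle separately if needed
        if after["label"] != before["label"] or after["confidence"] < before["confidence"]:
            regressions.append(item_id)
    return regressions


# Made-up exports from two consecutive runs.
old_export = [
    '{"item_id": "a", "label": "cat", "confidence": 0.90}',
    '{"item_id": "b", "label": "dog", "confidence": 0.90}',
    '{"item_id": "c", "label": "bird", "confidence": 0.80}',
]
new_export = [
    '{"item_id": "a", "label": "cat", "confidence": 0.95}',
    '{"item_id": "b", "label": "dog", "confidence": 0.70}',
    '{"item_id": "c", "label": "fish", "confidence": 0.80}',
]
print(find_regressions(old_export, new_export))
```

Items flagged this way are candidates for removal from the training set until their confidence recovers.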