Add BoolQ evaluation results via inspect-ai on HF Jobs

#172

by mackenzietechdocs - opened 2 days ago

base: refs/heads/main ← from: refs/pr/172

Files changed: +37, -1

mackenzietechdocs

2 days ago

Description:

This PR adds BoolQ evaluation results for openai/gpt-oss-20b, following the Hugging Face Skills evaluation workflow.

Benchmark: BoolQ (google/boolq, validation split)
Task: inspect_evals/boolq
Framework: inspect-ai + inspect-evals
Infra: hf jobs uv run on the a10g-small flavor, with model inference served through Hugging Face Inference Providers
Metric: accuracy = 89.1% (stderr = 0.005, i.e. about ±0.5 percentage points)
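As a rough sanity check on the reported stderr, assume that all 3,270 questions in the BoolQ validation split were scored and that the stderr is the usual binomial standard error of a mean accuracy. Both are assumptions; inspect-ai's exact estimator may differ slightly, for example by using a sample rather than a population variance.

$$
\mathrm{SE} = \sqrt{\frac{\hat p\,(1-\hat p)}{n}} = \sqrt{\frac{0.891 \times 0.109}{3270}} \approx \sqrt{2.97 \times 10^{-5}} \approx 0.0054
$$

This rounds to the reported 0.005. Under the same assumptions, a normal-approximation 95% interval is roughly 0.891 ± 1.96 × 0.0054 ≈ [0.880, 0.902].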

The command used was:

hf jobs uv run hf_model_evaluation/scripts/inspect_eval_uv.py \
  --flavor a10g-small \
  --secrets HF_TOKEN \
  -- \
  --model "openai/gpt-oss-20b" \
  --task "inspect_evals/boolq"

Add BoolQ evaluation results via inspect-ai on HF Jobs (af1f59ba)


Ready to merge

This branch is ready to get merged automatically.
