Add evaluation results for GPQA, HLE

by SaylorTwift HF Staff - opened 7 days ago

7 days ago

Evaluation Results

    This PR adds evaluation results extracted from the Model Card.

    **Benchmarks:**
    - GPQA: 83.8
    - HLE: 22.29

    **Files created:**
    - .eval_results/gpqa.yaml
    - .eval_results/hle_with_tools.yaml

    ---

    Extracted automatically using the [LLM-powered evaluation extractor](https://github.com/huggingface/community-evals).

snapo

6 days ago

Request for benchmarks:
GPQA-Diamond, MMLU-Pro, Aider Polyglot, AA-LCR, IFBench

Ready to merge

This branch is ready to get merged automatically.
