Add evaluation results for GPQA, HLE

#8
by SaylorTwift HF Staff - opened

Evaluation Results

    This PR adds evaluation results extracted from the Model Card.

    **Benchmarks:**
    - GPQA: 83.8
    - HLE: 12.6
    - HLE: 22.29

    **Files created:**
    - .eval_results/gpqa.yaml
    - .eval_results/hle.yaml
    - .eval_results/hle_with_tools.yaml

    ---

    Extracted automatically using the [LLM-powered evaluation extractor](https://github.com/huggingface/community-evals).
    

Request for benchmarks:
GPQA-Diamond, MMLU-Pro, Aider Polyglot, AA-LCR, IFBench

Ready to merge
This branch is ready to get merged automatically.
