Add evaluation results for GPQA, HLE
#8 opened by SaylorTwift (HF Staff)
Evaluation Results
This PR adds evaluation results extracted from the Model Card.
**Benchmarks:**
- GPQA: 83.8
- HLE: 12.6
- HLE (with tools): 22.29
**Files created:**
- .eval_results/gpqa.yaml
- .eval_results/hle.yaml
- .eval_results/hle_with_tools.yaml
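
For reviewers who want to confirm that a local checkout of this branch contains the files listed above, here is a minimal sketch. The helper name `missing_eval_files` is hypothetical and not part of the extractor; only the three paths come from this PR, and the script makes no assumption about what the YAML files contain.

```python
from pathlib import Path

# Paths copied from the "Files created" list in this PR.
EXPECTED_FILES = [
    ".eval_results/gpqa.yaml",
    ".eval_results/hle.yaml",
    ".eval_results/hle_with_tools.yaml",
]


def missing_eval_files(repo_root: str) -> list[str]:
    """Return the expected files that are absent under repo_root.

    Hypothetical helper for illustration; it only checks file existence.
    """
    root = Path(repo_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # Run from the root of the checked-out model repository.
    missing = missing_eval_files(".")
    print("missing:", missing or "none")
```

An empty result means all three files are present; it says nothing about whether their contents are valid.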

---
Extracted automatically using the [LLM-powered evaluation extractor](https://github.com/huggingface/community-evals).
Request for benchmarks:
GPQA-Diamond, MMLU-Pro, Aider Polyglot, AA-LCR, IFBench