Clémentine
committed on
Commit
·
5f4c2c0
1
Parent(s):
7b35424
about2
app.py
CHANGED
@@ -111,6 +111,8 @@ def create_app() -> gr.Blocks:
 gr.Markdown("""
 In this demo, we run 10 samples of 3 evaluations: ifeval (instruction following), gsm_plus (grade school math problems, less contaminated than gsm8k) and gpqa, diamond subset (knowledge), with `lighteval`, `inference-providers` and `jobs`.
 
+The "status" column indicates whether the evaluation failed completely (usually because the provider was down or because we were rate limited).
+
 To run any of these locally, you can use the following
 ```python
 from huggingface_hub import run_job, inspect_job, whoami