Clémentine committed
Commit 5737cc1 · 1 Parent(s): 96bfd95
change

app/src/content/chapters/automated-benchmarks/designing-your-automatic-evaluation.mdx (CHANGED)
@@ -322,18 +322,18 @@ Once you've selected your model, you need to define what is the best possible pr
 
 <Note title="Prompt design guidelines" emoji="📝" variant="info">
 Provide a clear description of the task at hand:
--
--
+- *Your task is to do X*.
+- *You will be provided with Y*.
 
 Provide clear instructions on the evaluation criteria, including a detailed scoring system if needed:
--
--
+- *You should evaluate property Z on a scale of 1 - 5, where 1 means ...*
+- *You should evaluate if property Z is present in the sample Y. Property Z is present if ...*
 
 Provide some additional "reasoning" evaluation steps:
--
+- *To judge this task, you must first make sure to read sample Y carefully to identify ..., then ...*
 
 Specify the desired output format (adding fields will help consistency):
--
+- *Your answer should be provided in JSON, with the following format {"Score": Your score, "Reasoning": The reasoning which led you to this score}*
 
 </Note>
 
 You can and should take inspiration from the [MixEval](https://github.com/huggingface/lighteval/blob/main/src/lighteval/tasks/extended/mix_eval/judge_prompts.py) or [MTBench](https://github.com/huggingface/lighteval/blob/main/src/lighteval/tasks/extended/mt_bench/judge_prompt_templates.py) prompt templates.
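To make the note's guidelines concrete, here is a minimal sketch of a judge prompt assembled along those lines. The template text and the `build_judge_prompt` and `parse_judge_output` helpers are illustrative assumptions, not part of lighteval's API:

```python
import json

# Illustrative judge prompt following the four guidelines above:
# task description, scoring criteria, reasoning steps, and output format.
# Doubled braces escape the literal JSON example for str.format().
JUDGE_TEMPLATE = """Your task is to evaluate the fluency of a model answer.
You will be provided with a question and the model's answer.

You should evaluate fluency on a scale of 1 - 5, where 1 means the answer is
unreadable and 5 means it reads like polished native prose.

To judge this task, you must first read the answer carefully to identify
grammatical errors and awkward phrasing, then weigh how much they hurt readability.

Your answer should be provided in JSON, with the following format:
{{"Score": your score, "Reasoning": the reasoning which led you to this score}}

Question: {question}
Answer: {answer}"""


def build_judge_prompt(question: str, answer: str) -> str:
    """Fill the template with one sample to send to the judge model."""
    return JUDGE_TEMPLATE.format(question=question, answer=answer)


def parse_judge_output(raw: str) -> tuple[int, str]:
    """Extract score and reasoning; fall back gracefully on malformed JSON."""
    try:
        parsed = json.loads(raw)
        return int(parsed["Score"]), str(parsed["Reasoning"])
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        return 0, raw  # flag for manual inspection rather than crashing
```

Asking for a "Reasoning" field next to the "Score" is one way to apply the note's point that adding fields helps consistency: the judge must commit to a structured answer instead of free text, which also makes its output easier to parse.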