Clémentine committed
Commit 5737cc1 · 1 Parent(s): 96bfd95
change

app/src/content/chapters/automated-benchmarks/designing-your-automatic-evaluation.mdx (CHANGED)
@@ -322,18 +322,18 @@ Once you've selected your model, you need to define what is the best possible pr
 
 <Note title="Prompt design guidelines" emoji="📝" variant="info">
 Provide a clear description of the task at hand:
--
--
+- *Your task is to do X*.
+- *You will be provided with Y*.
 
 Provide clear instructions on the evaluation criteria, including a detailed scoring system if needed:
--
--
+- *You should evaluate property Z on a scale of 1 - 5, where 1 means ...*
+- *You should evaluate if property Z is present in the sample Y. Property Z is present if ...*
 
 Provide some additional "reasoning" evaluation steps:
--
+- *To judge this task, you must first make sure to read sample Y carefully to identify ..., then ...*
 
 Specify the desired output format (adding fields will help consistency):
--
+- *Your answer should be provided in JSON, with the following format {"Score": Your score, "Reasoning": The reasoning which led you to this score}*
 
 </Note>
 
 You can and should take inspiration from the [MixEval](https://github.com/huggingface/lighteval/blob/main/src/lighteval/tasks/extended/mix_eval/judge_prompts.py) or [MTBench](https://github.com/huggingface/lighteval/blob/main/src/lighteval/tasks/extended/mt_bench/judge_prompt_templates.py) prompt templates.
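To make the note's guidelines concrete, here is a minimal sketch of a judge prompt assembled along those lines. The template text and the `build_judge_prompt` and `parse_judge_output` helpers are illustrative assumptions, not part of lighteval's API:

```python
import json

# Illustrative judge prompt following the four guidelines above:
# task description, scoring criteria, reasoning steps, and output format.
# Doubled braces escape the literal JSON example for str.format().
JUDGE_TEMPLATE = """Your task is to evaluate the fluency of a model answer.
You will be provided with a question and the model's answer.

You should evaluate fluency on a scale of 1 - 5, where 1 means the answer is
unreadable and 5 means it reads like polished native prose.

To judge this task, you must first read the answer carefully to identify
grammatical errors and awkward phrasing, then weigh how much they hurt readability.

Your answer should be provided in JSON, with the following format:
{{"Score": your score, "Reasoning": the reasoning which led you to this score}}

Question: {question}
Answer: {answer}"""


def build_judge_prompt(question: str, answer: str) -> str:
    """Fill the template with one sample to send to the judge model."""
    return JUDGE_TEMPLATE.format(question=question, answer=answer)


def parse_judge_output(raw: str) -> tuple[int, str]:
    """Extract score and reasoning; fall back gracefully on malformed JSON."""
    try:
        parsed = json.loads(raw)
        return int(parsed["Score"]), str(parsed["Reasoning"])
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        return 0, raw  # flag for manual inspection rather than crashing
```

Asking for a "Reasoning" field next to the "Score" is one way to apply the note's point that adding fields helps consistency: the judge must commit to a structured answer instead of free text, which also makes its output easier to parse.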