Text Generation · Transformers · Safetensors · English · Japanese · llama · conversational · text-generation-inference
s-mizuki-nlp committed on
Commit 3ad0e74 · verified · 1 parent: 1fbe3b6

updated benchmark description.

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -144,7 +144,7 @@ We used llm-jp-eval(v1.3.0), JP Language Model Evaluation Harness(commit #9b42d4
  - Automatic summarization (XL-Sum [Hasan et al., 2021])
  - Machine translation (WMT2020 ja-en [Barrault et al., 2020])
  - Machine translation (WMT2020 en-ja [Barrault et al., 2020])
- - Mathematical reasoning (MGSM [Shi et al., 2023])
+ - Arithmetic reasoning (MGSM [Shi et al., 2023])
  - Academic exams (JMMLU [尹ら, 2024])
  - Code generation (JHumanEval [佐藤ら, 2024])
 
@@ -157,7 +157,7 @@ We used the Language Model Evaluation Harness(v.0.4.2) and Code Generation LM Ev
  - Machine reading comprehension (SQuAD2 [Rajpurkar et al., 2018])
  - Commonsense reasoning (XWINO [Tikhonov and Ryabinin, 2021])
  - Natural language inference (HellaSwag [Zellers et al., 2019])
- - Mathematical reasoning (GSM8K [Cobbe et al., 2021])
+ - Arithmetic reasoning (GSM8K [Cobbe et al., 2021])
  - Mathematical reasoning (MATH [Hendrycks et al., 2022][Lightman et al., 2024])
  - Reasoning (BBH (BIG-Bench-Hard) [Suzgun et al., 2023])
  - Academic exams (MMLU [Hendrycks et al., 2021])