Adding Evaluation Results

#1
Files changed (1)
  1. README.md +118 -2
README.md CHANGED
@@ -1,4 +1,5 @@
 ---
+license: llama3
 tags:
 - merge
 - mergekit
@@ -54,7 +55,109 @@ base_model:
 - NousResearch/Meta-Llama-3-8B
 - ryan0712/llama-3-8b-slow-DUS-random-layer-method2
 - NousResearch/Meta-Llama-3-8B
-license: llama3
+model-index:
+- name: llama-3-8b-slow-DUS-random-method2
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: AI2 Reasoning Challenge (25-Shot)
+      type: ai2_arc
+      config: ARC-Challenge
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: acc_norm
+      value: 39.16
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ryan0712/llama-3-8b-slow-DUS-random-method2
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HellaSwag (10-Shot)
+      type: hellaswag
+      split: validation
+      args:
+        num_few_shot: 10
+    metrics:
+    - type: acc_norm
+      value: 63.82
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ryan0712/llama-3-8b-slow-DUS-random-method2
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU (5-Shot)
+      type: cais/mmlu
+      config: all
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 30.25
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ryan0712/llama-3-8b-slow-DUS-random-method2
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: TruthfulQA (0-shot)
+      type: truthful_qa
+      config: multiple_choice
+      split: validation
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: mc2
+      value: 37.04
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ryan0712/llama-3-8b-slow-DUS-random-method2
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Winogrande (5-shot)
+      type: winogrande
+      config: winogrande_xl
+      split: validation
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 60.14
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ryan0712/llama-3-8b-slow-DUS-random-method2
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GSM8k (5-shot)
+      type: gsm8k
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 0.68
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ryan0712/llama-3-8b-slow-DUS-random-method2
+      name: Open LLM Leaderboard
 ---
 
 # llama-3-8b-slow-DUS-random-method2
@@ -333,4 +436,17 @@ pipeline = transformers.pipeline(
 
 outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
 print(outputs[0]["generated_text"])
-```
+```
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_ryan0712__llama-3-8b-slow-DUS-random-method2)
+
+|             Metric              |Value|
+|---------------------------------|----:|
+|Avg.                             |38.52|
+|AI2 Reasoning Challenge (25-Shot)|39.16|
+|HellaSwag (10-Shot)              |63.82|
+|MMLU (5-Shot)                    |30.25|
+|TruthfulQA (0-shot)              |37.04|
+|Winogrande (5-shot)              |60.14|
+|GSM8k (5-shot)                   | 0.68|
+
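The "Avg." row in the added table is simply the arithmetic mean of the six leaderboard scores. A quick sanity check, not part of the PR itself, with the values copied from the table above:

```python
# Sanity check: "Avg." should be the plain mean of the six
# Open LLM Leaderboard scores added in this PR.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 39.16,
    "HellaSwag (10-Shot)": 63.82,
    "MMLU (5-Shot)": 30.25,
    "TruthfulQA (0-shot)": 37.04,
    "Winogrande (5-shot)": 60.14,
    "GSM8k (5-shot)": 0.68,
}

avg = sum(scores.values()) / len(scores)
print(avg)  # ~38.515, which the leaderboard rounds to the 38.52 shown above
```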
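Once merged, the `model-index` block in the YAML front matter becomes machine-readable metadata; it is what the Hub uses to render evaluation results on the model page. A minimal sketch of reading it back with the `huggingface_hub` model-card parser — this assumes a recent `huggingface_hub` release where `ModelCard.load` and the parsed `eval_results` attribute are available, and is illustrative rather than part of this PR:

```python
# Minimal sketch: read the evaluation results back out of the model card's
# model-index metadata (assumes `pip install huggingface_hub`).
from huggingface_hub import ModelCard

card = ModelCard.load("ryan0712/llama-3-8b-slow-DUS-random-method2")

# eval_results is parsed from the model-index YAML added in this PR;
# each entry carries the dataset, metric type, and metric value.
for r in card.data.eval_results:
    print(f"{r.dataset_name}: {r.metric_type} = {r.metric_value}")
```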