Adding the Open Portuguese LLM Leaderboard Evaluation Results
This is an automated PR created with https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard
The purpose of this PR is to add evaluation results from the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard) to your model card.
If you encounter any issues, please report them to https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard/discussions
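For reference, below is a minimal sketch of how a card update like this one can be opened as a pull request with the `huggingface_hub` library. This is an illustration under assumed tooling, not the linked Space's actual implementation; the `results_section` string is a shortened placeholder for the full section shown in the diff below.

```python
# Hedged sketch (assumed workflow, not the Space's actual code):
# append a results section to an existing model card and open it as a PR.
from huggingface_hub import ModelCard

REPO_ID = "cognitivecomputations/dolphin-2.9.1-llama-3-70b"

# Placeholder; the complete section and table appear in the diff below.
results_section = """
# Open Portuguese LLM Leaderboard Evaluation Results

| Metric  | Value |
|---------|-------|
| Average | 72.43 |
"""

# Load the current README, append the new section, and push it as a pull request.
# Requires a write token (e.g. via `huggingface-cli login` or the HF_TOKEN env var).
card = ModelCard.load(REPO_ID)
card.text = card.text.rstrip() + "\n\n" + results_section.strip() + "\n"
card.push_to_hub(
    REPO_ID,
    create_pr=True,
    commit_message="Adding the Open Portuguese LLM Leaderboard Evaluation Results",
)
```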
README.md CHANGED
@@ -1,12 +1,9 @@
 ---
 license: llama3
-base_model: meta-llama/Meta-Llama-3-70B
 tags:
 - generated_from_trainer
 - axolotl
-model-index:
-- name: out
-  results: []
+base_model: meta-llama/Meta-Llama-3-70B
 datasets:
 - cognitivecomputations/Dolphin-2.9
 - teknium/OpenHermes-2.5
@@ -16,6 +13,9 @@ datasets:
 - microsoft/orca-math-word-problems-200k
 - Locutusque/function-calling-chatml
 - internlm/Agent-FLAN
+model-index:
+- name: out
+  results: []
 ---
 
 # Dolphin 2.9.1 Llama 3 70b 🐬
@@ -510,3 +510,22 @@ The following hyperparameters were used during training:
 - Pytorch 2.2.2+cu121
 - Datasets 2.19.1
 - Tokenizers 0.19.1
+
+
+# Open Portuguese LLM Leaderboard Evaluation Results
+
+Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/cognitivecomputations/dolphin-2.9.1-llama-3-70b) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+
+|          Metric          |  Value  |
+|--------------------------|---------|
+|Average                   |**72.43**|
+|ENEM Challenge (No Images)|    76.56|
+|BLUEX (No Images)         |    67.87|
+|OAB Exams                 |    61.37|
+|Assin2 RTE                |    92.11|
+|Assin2 STS                |    78.26|
+|FaQuAD NLI                |    52.75|
+|HateBR Binary             |    81.01|
+|PT Hate Speech Binary     |    71.78|
+|tweetSentBR               |    70.14|
+
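The "Detailed results" link in the added section points at a folder inside the leaderboard's raw-results dataset. A hedged sketch of pulling just that folder locally with `snapshot_download` follows; the repository ID and folder path are taken from the link above, and the file names inside the folder are not assumed.

```python
# Hedged sketch: download the detailed result files referenced by the
# "Detailed results" link in the section added by this PR.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="eduagarcia-temp/llm_pt_leaderboard_raw_results",
    repo_type="dataset",
    allow_patterns=["cognitivecomputations/dolphin-2.9.1-llama-3-70b/*"],
)
print(local_dir)  # local path now containing this model's raw evaluation outputs
```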