Update src/about.py
src/about.py  +6 -6
@@ -23,7 +23,7 @@ NUM_FEWSHOT = 0 # Change with your few shot
 
 
 # Your leaderboard name
-TITLE = """<h1 align="center" id="space-title">
+TITLE = """<h1 align="center" id="space-title">Open Taiwan LLM leaderboard</h1>"""
 
 # What does your leaderboard evaluate?
 INTRODUCTION_TEXT = """
@@ -36,7 +36,7 @@ LLM_BENCHMARKS_TEXT = f"""
 The leaderboard evaluates LLMs on the following benchmarks:
 
 1. TMLU (Taiwanese Mandarin Language Understanding): Measures the model's ability to understand Taiwanese Mandarin text across various domains.
-2. TW Truthful QA: Assesses the model's capability to provide truthful answers to questions in Taiwanese Mandarin, with a focus on Taiwan-specific context.
+2. TW Truthful QA: Assesses the model's capability to provide truthful and localized answers to questions in Taiwanese Mandarin, with a focus on Taiwan-specific context.
 3. TW Legal Eval: Evaluates the model's understanding of legal terminology and concepts in Taiwanese Mandarin, using questions from the Taiwanese bar exam for lawyers.
 4. MMLU (Massive Multitask Language Understanding): Tests the model's performance on a wide range of tasks in English.
 
@@ -44,10 +44,10 @@ To reproduce our results, please follow the instructions in the provided GitHub
 
 該排行榜在以下考題上評估 LLMs:
 
-1. TMLU(
-2. TW Truthful QA
-3. TW Legal Eval
-4. MMLU(
+1. [TMLU(臺灣中文大規模多任務語言理解)](https://huggingface.co/datasets/miulab/tmlu):衡量模型理解各個領域(國中、高中、大學、國考)的能力。
+2. TW Truthful QA:評估模型以臺灣特定的背景來回答問題,測試模型的在地化能力。
+3. [TW Legal Eval](https://huggingface.co/datasets/lianghsun/tw-legal-benchmark-v1):使用臺灣律師資格考試的問題,評估模型對臺灣法律術語和概念的理解。
+4. [MMLU(英文大規模多任務語言理解)](https://huggingface.co/datasets/cais/mmlu):測試模型在英語中各種任務上的表現。
 
 要重現我們的結果,請按照:https://github.com/adamlin120/lm-evaluation-harness/blob/main/run_all.sh
 """
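The reproduction pointer in the text above is run_all.sh in the adamlin120/lm-evaluation-harness fork, which stays authoritative. As a rough illustration only, a programmatic run with the lm-evaluation-harness Python API might look like the sketch below; the task names "tmlu", "tw_truthful_qa", and "tw_legal_eval" are assumptions about how that fork registers these benchmarks (only "mmlu" is a standard task id), the model checkpoint is a placeholder, and the snippet assumes a recent lm-evaluation-harness release that exposes lm_eval.simple_evaluate.

```python
# Minimal sketch, NOT the leaderboard's actual pipeline (see run_all.sh in the fork).
# Assumed task ids and a placeholder checkpoint are used for illustration only.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                    # Hugging Face transformers backend
    model_args="pretrained=your-org/your-model",   # placeholder model checkpoint
    tasks=["tmlu", "tw_truthful_qa", "tw_legal_eval", "mmlu"],  # assumed task ids
    num_fewshot=0,                                 # matches NUM_FEWSHOT = 0 in the hunk header
    batch_size="auto",
)

# Print one metrics dict per task (e.g. accuracy) for a quick comparison with the board.
for task, metrics in results["results"].items():
    print(task, metrics)
```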