Update index.html
index.html +24 -31
index.html
CHANGED
|
@@ -360,7 +360,7 @@
|
|
| 360 |
|
| 361 |
<div class="container is-max-desktop has-text-centered">
|
| 362 |
<h1 class="publication-title">Shell@Educhat</h1>
|
| 363 |
-
|
| 364 |
<h2 class="subtitle is-4" style="color: #4a5568; font-weight: 400;">
|
| 365 |
<span class="lang-en">Uncovering and Mitigating Implicit Risks in Domain-Specific LLMs</span>
|
| 366 |
<span class="lang-zh" style="font-weight: 700;">大语言模型垂域任务隐式价值观风险挖掘与对齐基准</span>
|
|
@@ -374,12 +374,12 @@
|
|
| 374 |
<div class="intro-content">
|
| 375 |
<div class="lang-en">
|
| 376 |
<p>
|
| 377 |
-
Ensuring the safety of large language models (LLMs) in vertical domains (Education, Finance, Management) is critical. While current alignment efforts primarily target explicit risks like bias and violence, they often fail to address deeper, <strong>domain-specific implicit risks</strong>. We introduce <strong>a comprehensive dataset</strong>
|
| 378 |
</p>
|
| 379 |
</div>
|
| 380 |
<div class="lang-zh">
|
| 381 |
<p>
|
| 382 |
-
确保垂直领域(教育、金融、管理)中大模型的安全性至关重要。虽然目前的对齐工作主要针对偏见和暴力等显性风险,但往往忽略了更深层次的<strong>特定领域隐性风险</strong>。研发团队推出了<strong
|
| 383 |
</p>
|
| 384 |
</div>
|
| 385 |
</div>
|
|
@@ -410,8 +410,8 @@
|
|
| 410 |
<span class="lang-zh">领域任务隐式风险数据集</span>
|
| 411 |
</h2>
|
| 412 |
<p style="color: var(--text-muted);">
|
| 413 |
-
<span class="lang-en">A domain-specific risk evaluation benchmark covering
|
| 414 |
-
<span class="lang-zh"
|
| 415 |
</p>
|
| 416 |
</div>
|
| 417 |
|
|
@@ -659,14 +659,14 @@
|
|
| 659 |
</thead>
|
| 660 |
<tbody>
|
| 661 |
<tr>
|
| 662 |
-
<td class="model-col">GPT-5-2025-08-07
|
| 663 |
-
<td>0.
|
| 664 |
<td>0.098</td>
|
| 665 |
-
<td>0.
|
| 666 |
-
<td>0.
|
| 667 |
-
<td>0.
|
| 668 |
-
<td>0.189</td>
|
| 669 |
<td>0.370</td>
|
|
|
|
| 670 |
<td>0.855</td>
|
| 671 |
</tr>
|
| 672 |
<tr>
|
|
@@ -687,8 +687,8 @@
|
|
| 687 |
<td>0.131</td>
|
| 688 |
<td>0.088</td>
|
| 689 |
<td>0.696</td>
|
| 690 |
-
<td>0.716</td>
|
| 691 |
<td>0.844</td>
|
|
|
|
| 692 |
<td>0.581</td>
|
| 693 |
</tr>
|
| 694 |
<tr>
|
|
@@ -709,8 +709,8 @@
|
|
| 709 |
<td>0.030</td>
|
| 710 |
<td>0.019</td>
|
| 711 |
<td>0.492</td>
|
| 712 |
-
<td>0.300</td>
|
| 713 |
<td>0.518</td>
|
|
|
|
| 714 |
<td>0.771</td>
|
| 715 |
</tr>
|
| 716 |
<tr>
|
|
@@ -719,9 +719,9 @@
|
|
| 719 |
<td>0.070</td>
|
| 720 |
<td>0.035</td>
|
| 721 |
<td>0.021</td>
|
| 722 |
-
<td>0.522</td>
|
| 723 |
<td>0.672</td>
|
| 724 |
<td>0.682</td>
|
|
|
|
| 725 |
<td>0.659</td>
|
| 726 |
</tr>
|
| 727 |
<tr>
|
|
@@ -731,8 +731,8 @@
|
|
| 731 |
<td>0.020</td>
|
| 732 |
<td>0.011</td>
|
| 733 |
<td>0.608</td>
|
| 734 |
-
<td>0.328</td>
|
| 735 |
<td>0.482</td>
|
|
|
|
| 736 |
<td>0.749</td>
|
| 737 |
</tr>
|
| 738 |
<tr>
|
|
@@ -753,8 +753,8 @@
|
|
| 753 |
<td>0.073</td>
|
| 754 |
<td>0.059</td>
|
| 755 |
<td>0.790</td>
|
| 756 |
-
<td>0.912</td>
|
| 757 |
<td>0.920</td>
|
|
|
|
| 758 |
<td>0.496</td>
|
| 759 |
</tr>
|
| 760 |
<tr>
|
|
@@ -764,8 +764,8 @@
|
|
| 764 |
<td>0.009</td>
|
| 765 |
<td>0.003</td>
|
| 766 |
<td>0.280</td>
|
| 767 |
-
<td>0.174</td>
|
| 768 |
<td>0.170</td>
|
|
|
|
| 769 |
<td>0.906</td>
|
| 770 |
</tr>
|
| 771 |
<tr>
|
|
@@ -781,13 +781,13 @@
|
|
| 781 |
</tr>
|
| 782 |
<tr>
|
| 783 |
<td class="model-col">Gemini-2.5-Pro</td>
|
| 784 |
-
<td>0.
|
| 785 |
-
<td>0.
|
| 786 |
<td>0.003</td>
|
| 787 |
<td>0.002</td>
|
| 788 |
-
<td>0.
|
| 789 |
-
<td>0.400</td>
|
| 790 |
<td>0.502</td>
|
|
|
|
| 791 |
<td>0.761</td>
|
| 792 |
</tr>
|
| 793 |
<tr>
|
|
@@ -797,8 +797,8 @@
|
|
| 797 |
<td>0.005</td>
|
| 798 |
<td>0.003</td>
|
| 799 |
<td>0.426</td>
|
| 800 |
-
<td>0.220</td>
|
| 801 |
<td>0.346</td>
|
|
|
|
| 802 |
<td>0.831</td>
|
| 803 |
</tr>
|
| 804 |
</tbody>
|
|
@@ -837,18 +837,11 @@
|
|
| 837 |
<span class="lang-zh"><strong>免疫分 (Immunity Score):</strong> 量化了模型对隐性风险的抵抗能力 [0-1],越高越好。</span>
|
| 838 |
</li>
|
| 839 |
<li style="margin-top: 10px; color: #1a202c;">
|
| 840 |
-
<span class="lang-en"><strong>Dataset Composition:</strong> This leaderboard is based on
|
| 841 |
-
<span class="lang-zh"><strong>数据集构成:</strong>
|
| 842 |
</li>
|
| 843 |
</ul>
|
| 844 |
</div>
|
| 845 |
-
|
| 846 |
-
<div class="content mt-2">
|
| 847 |
-
<p class="is-size-7 has-text-grey">
|
| 848 |
-
<span class="lang-en"><strong>* Note regarding GPT-5-2025-08-07:</strong> Due to platform safety mechanisms and request interceptions, this model was evaluated on 1302 out of 1500 queries.</span>
|
| 849 |
-
<span class="lang-zh"><strong>* 关于 GPT-5-2025-08-07 的说明:</strong> 由于平台安全机制和请求拦截,该模型在 1500 条查询中实测了 1302 条。</span>
|
| 850 |
-
</p>
|
| 851 |
-
</div>
|
| 852 |
</div>
|
| 853 |
</section>
|
| 854 |
|
|
|
|
| 360 |
|
| 361 |
<div class="container is-max-desktop has-text-centered">
|
| 362 |
<h1 class="publication-title">Shell@Educhat</h1>
|
| 363 |
+
|
| 364 |
<h2 class="subtitle is-4" style="color: #4a5568; font-weight: 400;">
|
| 365 |
<span class="lang-en">Uncovering and Mitigating Implicit Risks in Domain-Specific LLMs</span>
|
| 366 |
<span class="lang-zh" style="font-weight: 700;">大语言模型垂域任务隐式价值观风险挖掘与对齐基准</span>
|
|
|
|
| 374 |
<div class="intro-content">
|
| 375 |
<div class="lang-en">
|
| 376 |
<p>
|
| 377 |
+
Ensuring the safety of large language models (LLMs) in vertical domains (Education, Finance, Management) is critical. While current alignment efforts primarily target explicit risks like bias and violence, they often fail to address deeper, <strong>domain-specific implicit risks</strong>. We introduce <strong>a comprehensive dataset</strong> categorizing risks into Green (Guide), Yellow (Reflect), and Red (Deny), and <strong>MENTOR</strong>, a framework using a Rule Evolution Cycle (REC) and Activation Steering (RV) to effectively detect and mitigate these subtle risks.
|
| 378 |
</p>
|
| 379 |
</div>
|
| 380 |
<div class="lang-zh">
|
| 381 |
<p>
|
| 382 |
+
确保垂直领域(教育、金融、管理)中大模型的安全性至关重要。虽然目前的对齐工作主要针对偏见和暴力等显性风险,但往往忽略了更深层次的<strong>特定领域隐性风险</strong>。研发团队推出了<strong>一个包含多类场景的基准测试集</strong>,将风险分为引导、反思、禁止三类,以及 <strong>MENTOR</strong> 框架。该框架利用规则演化循环(REC)和激活引导(RV)技术,能够有效发现并缓解这些不易察觉的潜在风险。
|
| 383 |
</p>
|
| 384 |
</div>
|
| 385 |
</div>
|
|
|
|
| 410 |
<span class="lang-zh">领域任务隐式风险数据集</span>
|
| 411 |
</h2>
|
| 412 |
<p style="color: var(--text-muted);">
|
| 413 |
+
<span class="lang-en">A domain-specific risk evaluation benchmark covering various queries.</span>
|
| 414 |
+
<span class="lang-zh">涵盖多类查询的特定领域风险评估基准。</span>
|
| 415 |
</p>
|
| 416 |
</div>
|
| 417 |
|
|
|
|
| 659 |
</thead>
|
| 660 |
<tbody>
|
| 661 |
<tr>
|
| 662 |
+
<td class="model-col">GPT-5-2025-08-07</td>
|
| 663 |
+
<td>0.308</td>
|
| 664 |
<td>0.098</td>
|
| 665 |
+
<td>0.042</td>
|
| 666 |
+
<td>0.027</td>
|
| 667 |
+
<td>0.364</td>
|
|
|
|
| 668 |
<td>0.370</td>
|
| 669 |
+
<td>0.190</td>
|
| 670 |
<td>0.855</td>
|
| 671 |
</tr>
|
| 672 |
<tr>
|
|
|
|
| 687 |
<td>0.131</td>
|
| 688 |
<td>0.088</td>
|
| 689 |
<td>0.696</td>
|
|
|
|
| 690 |
<td>0.844</td>
|
| 691 |
+
<td>0.716</td>
|
| 692 |
<td>0.581</td>
|
| 693 |
</tr>
|
| 694 |
<tr>
|
|
|
|
| 709 |
<td>0.030</td>
|
| 710 |
<td>0.019</td>
|
| 711 |
<td>0.492</td>
|
|
|
|
| 712 |
<td>0.518</td>
|
| 713 |
+
<td>0.300</td>
|
| 714 |
<td>0.771</td>
|
| 715 |
</tr>
|
| 716 |
<tr>
|
|
|
|
| 719 |
<td>0.070</td>
|
| 720 |
<td>0.035</td>
|
| 721 |
<td>0.021</td>
|
|
|
|
| 722 |
<td>0.672</td>
|
| 723 |
<td>0.682</td>
|
| 724 |
+
<td>0.522</td>
|
| 725 |
<td>0.659</td>
|
| 726 |
</tr>
|
| 727 |
<tr>
|
|
|
|
| 731 |
<td>0.020</td>
|
| 732 |
<td>0.011</td>
|
| 733 |
<td>0.608</td>
|
|
|
|
| 734 |
<td>0.482</td>
|
| 735 |
+
<td>0.328</td>
|
| 736 |
<td>0.749</td>
|
| 737 |
</tr>
|
| 738 |
<tr>
|
|
|
|
| 753 |
<td>0.073</td>
|
| 754 |
<td>0.059</td>
|
| 755 |
<td>0.790</td>
|
|
|
|
| 756 |
<td>0.920</td>
|
| 757 |
+
<td>0.912</td>
|
| 758 |
<td>0.496</td>
|
| 759 |
</tr>
|
| 760 |
<tr>
|
|
|
|
| 764 |
<td>0.009</td>
|
| 765 |
<td>0.003</td>
|
| 766 |
<td>0.280</td>
|
|
|
|
| 767 |
<td>0.170</td>
|
| 768 |
+
<td>0.174</td>
|
| 769 |
<td>0.906</td>
|
| 770 |
</tr>
|
| 771 |
<tr>
|
|
|
|
| 781 |
</tr>
|
| 782 |
<tr>
|
| 783 |
<td class="model-col">Gemini-2.5-Pro</td>
|
| 784 |
+
<td>0.440</td>
|
| 785 |
+
<td>0.017</td>
|
| 786 |
<td>0.003</td>
|
| 787 |
<td>0.002</td>
|
| 788 |
+
<td>0.418</td>
|
|
|
|
| 789 |
<td>0.502</td>
|
| 790 |
+
<td>0.400</td>
|
| 791 |
<td>0.761</td>
|
| 792 |
</tr>
|
| 793 |
<tr>
|
|
|
|
| 797 |
<td>0.005</td>
|
| 798 |
<td>0.003</td>
|
| 799 |
<td>0.426</td>
|
|
|
|
| 800 |
<td>0.346</td>
|
| 801 |
+
<td>0.220</td>
|
| 802 |
<td>0.831</td>
|
| 803 |
</tr>
|
| 804 |
</tbody>
|
|
|
|
| 837 |
<span class="lang-zh"><strong>免疫分 (Immunity Score):</strong> 量化了模型对隐性风险的抵抗能力 [0-1],越高越好。</span>
|
| 838 |
</li>
|
| 839 |
<li style="margin-top: 10px; color: #1a202c;">
|
| 840 |
+
<span class="lang-en"><strong>Dataset Composition:</strong> This leaderboard is based on curated queries, equally distributed across three vertical domains: <strong>Education (Edu), Management (Mgt), and Finance (Fin)</strong>.</span>
|
| 841 |
+
<span class="lang-zh"><strong>数据集构成:</strong> 本排行榜基于精选查询集,均匀分布于三个垂直领域:<strong>教育 (Edu)、管理 (Mgt) 和金融 (Fin)</strong>。</span>
|
| 842 |
</li>
|
| 843 |
</ul>
|
| 844 |
</div>
|
| 845 |
</div>
|
| 846 |
</section>
|
| 847 |
|