| <head> | |
| <!-- Global site tag (gtag.js) - Google Analytics --> | |
| <script async src="https://www.googletagmanager.com/gtag/js?id=UA-178132094-1"></script> | |
| <script> | |
| window.dataLayer = window.dataLayer || []; | |
| function gtag() { | |
| dataLayer.push(arguments); | |
| } | |
| gtag("js", new Date()); | |
| gtag("config", "UA-178132094-1"); | |
| </script> | |
| <meta charset="UTF-8" /> | |
| <meta name="viewport" content="width=device-width, initial-scale=1" /> | |
| <!-- <meta name="viewport" content="width=1024" /> --> | |
| <title>OR-Bench: Over Refusal Benchmark</title> | |
| <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script> | |
| <link href="https://fonts.googleapis.com/css2?family=Montserrat:wght@400;700&display=swap" rel="stylesheet"> | |
| <script src="https://maxcdn.bootstrapcdn.com/bootstrap/4.3.1/js/bootstrap.min.js"></script> | |
| <link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.3.1/css/bootstrap.min.css" integrity="sha384-ggOyR0iXCbMQv3Xipma34MD+dH/1fQ784/j6cY/iJTQUOhcWr7x9JvoRxT2MZw1T" crossorigin="anonymous"> | |
| <script src="https://polyfill.io/v3/polyfill.min.js?features=es6"></script> | |
| <script type="text/javascript" async | |
| src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.7/MathJax.js?config=TeX-MML-AM_CHTML"> | |
| </script> | |
| <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/foundation/6.4.3/css/foundation.min.css" /> | |
| <link rel="stylesheet" href="https://cdn.rawgit.com/jpswalsh/academicons/master/css/academicons.min.css" /> | |
| <script src="https://kit.fontawesome.com/b939870cfb.js" crossorigin="anonymous"></script> | |
| <link rel="stylesheet" href="https://cdn.datatables.net/1.10.24/css/dataTables.foundation.min.css"> | |
| <script type="text/javascript" src="https://cdn.datatables.net/1.10.24/js/jquery.dataTables.min.js"></script> | |
| <link rel="stylesheet" href="./css/main.css" /> | |
| </head> | |
| <body> | |
| <nav class="navbar navbar-expand-md"> | |
| <div class="container"> | |
| <a class="navbar-brand" href="./index.html" | |
| >OR-Bench</a> | |
| <button | |
| class="navbar-toggler navbar-light" | |
| type="button" | |
| data-toggle="collapse" | |
| data-target="#main-navigation" | |
| > | |
| <span class="navbar-toggler-icon"></span> | |
| </button> | |
| <div class="collapse navbar-collapse" id="main-navigation"> | |
| <ul class="navbar-nav"> | |
| <li class="nav-item"> | |
| <a class="nav-link" href="#leaderboard">Leaderboards</a> | |
| </li> | |
| <li> | |
| <a class="nav-link" href="https://huggingface.co/datasets/orbench-llm/or-bench" target="_blank">Datasets</a> | |
| </li> | |
| <li> | |
| <a class="nav-link" href="https://huggingface.co/spaces/orbench-llm/or-bench-demo" target="_blank">Demo</a> | |
| </li> | |
| <li> | |
| <a class="nav-link text-nowrap" href="https://github.com/orbench/or-bench" | |
| target="_blank">GitHub</a> | |
| </li> | |
| </ul> | |
| </div> | |
| </div> | |
| </nav> | |
| <!-- <hr class="toprule" /> --> | |
| <header> | |
| <div class="header-block container"> | |
| <div class="title-logo"><img src="./images/logo.png" alt="logo" /></div> | |
| <div class="title">OR-BENCH</div> | |
| <div class="description"> | |
| An over-refusal benchmark for large language models | |
| </div> | |
| </div> | |
| </header> | |
| <!-- <hr class="toprule" /> --> | |
| <div class="container"> | |
| <section id="introduction"> | |
| <div class="overview"> | |
| <p class="doublealign"> | |
| <b>Large Language Models (LLMs)</b> require careful safety alignment to prevent malicious outputs. While significant research focuses on mitigating harmful content generation, | |
| the enhanced safety often comes with the side effect of over-refusal, where LLMs reject innocuous prompts and become less helpful. | |
| Although over-refusal has been empirically observed, systematic measurement is challenging | |
| due to the difficulty of crafting prompts that appear harmful but are benign.<br><br> | |
| We introduce OR-Bench, the <b>first large-scale over-refusal benchmark</b>. OR-Bench comprises 80,000 over-refusal prompts across 10 common rejection categories, a subset of around 1,000 hard prompts that are challenging even for state-of-the-art LLMs, and an additional 600 toxic prompts to prevent indiscriminate responses.<br><br> | |
| The figure below plots the evaluation results. The x-axis shows the over-refusal rate (how often a model rejects prompts that appear harmful but are benign) and the y-axis shows the rejection rate on genuinely toxic prompts. An ideal model sits in the top-left corner, where it rejects the most toxic prompts and the fewest safe prompts. | |
| </p> | |
| <div style="margin-top:20px"><img src="./images/overall_x_y_plot.png" style="width: 100%;"/></div> | |
| </div> | |
| </section> | |
| <div class="divider"><hr /></div> | |
| <section class="container" id="div_cifar10_ipc1_heading"> | |
| <div id="div_or_bench" class="display responsive nowrap" style="width:100%"></div> | |
| </section> | |
| <div class="divider"><hr /></div> | |
| <!-- <script | |
| type="module" | |
| src="https://gradio.s3-us-west-2.amazonaws.com/4.31.0/gradio.js" | |
| ></script> --> | |
| <div><b>Please try out our demos below</b></div> | |
| <div class="iframe-container"> | |
| <iframe | |
| src="https://orbench-llm-or-bench-demo.hf.space" | |
| frameborder="0" | |
| width="2160" | |
| height="450" | |
| ></iframe> | |
| </div> | |
| <div class="vspace50"></div> | |
| </div> | |
| <hr class="bottomrule" /> | |
| <footer> | |
| <small>© 2024, OR-Bench</small> | |
| </footer> | |
| <script> | |
| // When the user scrolls the page, execute myFunction | |
| window.onscroll = function () { | |
| myFunction(); | |
| }; | |
| // Get the navbar (the <nav> element has the class "navbar" but no id, so select it by class) | |
| var navbar = document.querySelector(".navbar"); | |
| // Get the offset position of the navbar | |
| var sticky = navbar.offsetTop; | |
| // Add the sticky class to the navbar when you reach its scroll position. Remove "sticky" when you leave the scroll position | |
| function myFunction() { | |
| if (window.pageYOffset >= sticky) { | |
| navbar.classList.add("sticky"); | |
| } else { | |
| navbar.classList.remove("sticky"); | |
| } | |
| } | |
| </script> | |
| <script> | |
| $("#div_or_bench").load("./data/or-bench.html", function() { | |
| $('#or-bench-table').DataTable({ | |
| "pageLength": 25, // Set the initial number of entries | |
| "lengthMenu": [[10, 25, 50, -1], [10, 25, 50, "All"]], // Set options for lengthMenu | |
| "order": [[3, "asc"]], // Sort by the fourth column (index 3) in ascending order | |
| "paging": false, // Disables pagination, so pageLength and lengthMenu above have no visible effect | |
| "responsive": true // Enable responsive feature | |
| }); | |
| }); | |
| </script> | |
| </body> | |