---
title: AI Safety Datasets Overview
emoji: 🛡️
colorFrom: red
colorTo: pink
sdk: static
pinned: false
license: cc-by-nc-4.0
short_description: AI safety datasets with adversarial conversations
tags:
- safety
- adversarial
- red-teaming
- ai-safety
- multi-turn
- synthetic
datasets:
- GoJulyAI/multi-turn-conversations
- GoJulyAI/multi-turn-bio-transformed-synth-conversations-v1
- GoJulyAI/multi-turn-bio-transformed-synth-conversations-v2
---

# 🛡️ AI Safety Datasets Collection

Comprehensive evaluation datasets for testing AI model safety mechanisms.

## 📊 Dataset Collection Summary

| Metric | Value |
|--------|-------|
| **Total Conversations** | 849+ |
| **Total Turns** | 6,694+ |
| **Dataset Types** | 3 complementary methodologies |
| **Sample Data Available** | 150 free conversations |

## 📈 Full Dataset Statistics

| Dataset | Conversations | Turns | Avg Turns/Conv | Focus |
|---------|--------------|-------|----------------|--------|
| **Psychology multi-turn** | 184+ | 1,964+ | 10.3 | Psychological harms such as self-harm, psychosis, and anthropomorphism |
| **Illicit (bioweapon) multi-turn** | 84+ | 822+ | 9.8 | Biosafety harms such as bioweapons and pathogens |
| **Illicit (chemical, general) multi-turn** | 581+ | 3,908+ | 6.7 | Non-biological safety harms such as chemical weapons and cyber threats |

## 🔗 Access Datasets on Hugging Face

### Psychology Multi-turn Conversations

Psychological harms such as self-harm, psychosis, and anthropomorphism.

**Sample:** 5 conversations

🔗 **[View Dataset](https://huggingface.co/datasets/GoJulyAI/psychology-multi-turn)**

### Illicit (bioweapon) Multi-turn Conversations

Biosafety harms such as bioweapons and pathogens.

**Sample:** 5 conversations

🔗 **[View Dataset](https://huggingface.co/datasets/GoJulyAI/illicit-bio-multi-turn/)**

### Illicit (chemical, general) Multi-turn Conversations

Non-biological safety harms such as chemical weapons and cyber threats.


**Sample:** 5 conversations

🔗 **[View Dataset](https://huggingface.co/datasets/GoJulyAI/illicit-general-multi-turn)**

## ⚠️ Ethical Considerations

**⚠️ IMPORTANT:** These datasets contain successful adversarial attacks and harmful content.

### ✅ Intended Use

- Defensive security research
- AI safety evaluation and improvement
- Academic research on adversarial robustness
- Training safety and moderation systems

### ❌ Prohibited Use

- Creating offensive content
- Developing attack tools for malicious purposes
- Bypassing safety systems for harm
- Any use that violates laws or ethical guidelines

## 🎯 Data Selection Process

All datasets are derived from high-quality, validated conversations that pass strict quality filters, including NeurIPS evaluation protocols.

### Base Criteria

- Text-based objectives (no code execution templates)
- Verdict: `success` (harmful requests successfully fulfilled)
- Multi-turn conversations with prompt-response pairs

### Psychology-Specific Criteria

- Organic conversations (`organicity = true`)
- Successfully elicited harmful psychology-related content

### Illicit-Specific Criteria

- Contains specific instruction details
- Practically executable (not abstract)
- Successfully elicited harmful illicit-related content

## 📄 License

Sample datasets are released under **CC-BY-NC-4.0** (Creative Commons Attribution-NonCommercial 4.0 International).

- ✅ Use for research and evaluation
- ✅ Modify and build upon the data
- ✅ Share with attribution
- ❌ Commercial use without separate licensing

## 💼 Full Dataset Access

The sample datasets provide representative examples. Full datasets contain thousands of additional conversations with expanded harm categories and regular updates.

**Please contact us at [info@gojuly.ai](mailto:info@gojuly.ai) to purchase any or all of the full datasets.** Include your research objectives, institutional affiliation, and intended use in your inquiry.
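
For readers who want to apply the criteria from the Data Selection Process section to their own conversation logs, the following is a minimal Python sketch. Only `verdict` and `organicity` are named in this README; every other field name (`objective_type`, `turns`, `has_specific_instructions`, `practically_executable`) is a hypothetical placeholder and may not match the actual dataset schema. Treating "multi-turn" as "more than one prompt-response pair" is also an assumption. The toy records contain placeholder text only.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def passes_base_criteria(rec: Record) -> bool:
    """Base criteria: text-based objective, verdict `success`, multi-turn."""
    return (
        rec.get("objective_type") == "text"   # hypothetical field name
        and rec.get("verdict") == "success"   # value named in the README
        and len(rec.get("turns", [])) > 1     # assumption: multi-turn means > 1 pair
    )


def passes_psychology_criteria(rec: Record) -> bool:
    """Psychology subset: base criteria plus `organicity = true`."""
    return passes_base_criteria(rec) and rec.get("organicity") is True


def passes_illicit_criteria(rec: Record) -> bool:
    """Illicit subset: base criteria plus specific, practically executable detail."""
    return (
        passes_base_criteria(rec)
        and rec.get("has_specific_instructions") is True  # hypothetical field name
        and rec.get("practically_executable") is True     # hypothetical field name
    )


# Toy records with placeholder prompt-response pairs (no real content).
pair = ("prompt", "response")
records: List[Record] = [
    {"id": "a", "objective_type": "text", "verdict": "success",
     "organicity": True, "turns": [pair, pair]},
    {"id": "b", "objective_type": "text", "verdict": "success",
     "organicity": False, "has_specific_instructions": True,
     "practically_executable": True, "turns": [pair, pair, pair]},
    {"id": "c", "objective_type": "text", "verdict": "failure",
     "organicity": True, "turns": [pair, pair]},
    {"id": "d", "objective_type": "code", "verdict": "success",
     "organicity": True, "has_specific_instructions": True,
     "practically_executable": True, "turns": [pair, pair]},
    {"id": "e", "objective_type": "text", "verdict": "success",
     "organicity": True, "turns": [pair]},
]

psychology_ids = [r["id"] for r in records if passes_psychology_criteria(r)]
illicit_ids = [r["id"] for r in records if passes_illicit_criteria(r)]

print(psychology_ids)  # ['a']
print(illicit_ids)     # ['b']
```

Keeping the base check in its own function means both subsets share one definition of a valid conversation, so a change to the base criteria only has to be made in one place.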

---

**Last Updated:** December 2, 2025

For detailed documentation, visit the individual dataset repositories on Hugging Face.