---
title: AI Safety Datasets Overview
emoji: 🛡️
colorFrom: red
colorTo: pink
sdk: static
pinned: false
license: cc-by-nc-4.0
short_description: AI safety datasets with adversarial conversations
tags:
- safety
- adversarial
- red-teaming
- ai-safety
- multi-turn
- synthetic
datasets:
- GoJulyAI/multi-turn-conversations
- GoJulyAI/multi-turn-bio-transformed-synth-conversations-v1
- GoJulyAI/multi-turn-bio-transformed-synth-conversations-v2
---

# 🛡️ AI Safety Datasets Collection

Multi-turn adversarial conversation datasets for evaluating AI model safety mechanisms.

## 📊 Dataset Collection Summary

| Metric | Value |
|--------|-------|
| **Total Conversations** | 849+ |
| **Total Turns** | 6,694+ |
| **Dataset Types** | 3 complementary harm domains (psychology, bioweapon, chemical/general) |
| **Sample Data Available** | 150 free conversations |

## 📈 Full Dataset Statistics

| Dataset | Conversations | Turns | Avg Turns/Conv | Focus |
|---------|--------------|-------|----------------|--------|
| **Psychology multi-turn** | 184+ | 1,964+ | 10.7 | Psychological harms such as self-harm, psychosis, and anthropomorphism |
| **Illicit (bioweapon) multi-turn** | 84+ | 822+ | 9.8 | Biosafety harms such as bioweapons and pathogens |
| **Illicit (chemical, general) multi-turn** | 581+ | 3,908+ | 6.7 | Non-biological illicit harms such as chemical weapons and cyber threats |
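The averages and totals in the tables follow directly from the per-dataset counts. The Python sketch below recomputes them, treating the lower-bound figures (the `+` values) as exact; `STATS` and `avg_turns` are illustrative names, not part of any published tooling.

```python
# Per-dataset (conversations, turns) counts, taken from the lower-bound
# figures in the Full Dataset Statistics table and treated as exact.
STATS = {
    "Psychology multi-turn": (184, 1964),
    "Illicit (bioweapon) multi-turn": (84, 822),
    "Illicit (chemical, general) multi-turn": (581, 3908),
}


def avg_turns(conversations: int, turns: int) -> float:
    """Average turns per conversation, rounded to one decimal place."""
    return round(turns / conversations, 1)


# Collection-wide totals, as reported in the summary table.
total_convs = sum(convs for convs, _ in STATS.values())
total_turns = sum(turns for _, turns in STATS.values())

for name, (convs, turns) in STATS.items():
    print(f"{name}: {avg_turns(convs, turns)} turns/conversation")
print(f"Total: {total_convs} conversations, {total_turns} turns")
```

With these counts the script reports averages of 10.7, 9.8, and 6.7 turns per conversation and totals of 849 conversations and 6,694 turns.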

## 🔗 Access Datasets on Hugging Face

### Psychology Multi-turn Conversations
Psychological harms such as self-harm, psychosis, and anthropomorphism.  
**Sample:** 5 conversations

🔗 **[View Dataset](https://huggingface.co/datasets/GoJulyAI/psychology-multi-turn)**

### Illicit (bioweapon) Multi-turn Conversations
Biosafety harms such as bioweapons and pathogens.  
**Sample:** 5 conversations

🔗 **[View Dataset](https://huggingface.co/datasets/GoJulyAI/illicit-bio-multi-turn/)**

### Illicit (chemical, general) Multi-turn Conversations
Non-biological illicit harms such as chemical weapons and cyber threats.  
**Sample:** 5 conversations

🔗 **[View Dataset](https://huggingface.co/datasets/GoJulyAI/illicit-general-multi-turn)**

## ⚠️ Ethical Considerations

**⚠️ IMPORTANT:** These datasets contain successful adversarial attacks and harmful content.

### ✅ Intended Use
- Defensive security research
- AI safety evaluation and improvement
- Academic research on adversarial robustness
- Training safety and moderation systems

### ❌ Prohibited Use
- Creating offensive content
- Developing attack tools for malicious purposes
- Bypassing safety systems for harm
- Any use that violates laws or ethical guidelines

## 🎯 Data Selection Process

All datasets are drawn from validated conversations that passed strict quality filters, including NeurIPS evaluation protocols.

### Base Criteria
- Text-based objectives (no code execution templates)
- Verdict: `success` (the model fulfilled the harmful request)
- Multi-turn conversations with prompt-response pairs

### Psychology-Specific Criteria
- Organic conversations (`organicity = true`)
- Successfully elicited harmful psychology-related content

### Illicit-Specific Criteria
- Contains specific instruction details
- Practically executable (not abstract)
- Successfully elicited harmful illicit-related content
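The selection criteria above amount to a chain of filters over labeled conversation records. The Python sketch below shows one way to express them, assuming a hypothetical record schema: the `Conversation` class and its field names (`uses_code_template`, `specific_details`, `executable`, and so on) are illustrative and are not the datasets' actual columns. Only `verdict = success` and `organicity = true` come from the criteria as written.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Conversation:
    """Hypothetical record schema; field names are illustrative only."""
    category: str                 # "psychology" or "illicit"
    verdict: str                  # e.g. "success"
    uses_code_template: bool      # True if the objective relies on a code-execution template
    organicity: bool = False      # psychology-specific criterion
    specific_details: bool = False  # illicit-specific: hypothetical reviewer label
    executable: bool = False        # illicit-specific: hypothetical reviewer label
    turns: List[Tuple[str, str]] = field(default_factory=list)  # (prompt, response) pairs


def meets_base_criteria(conv: Conversation) -> bool:
    """Text-based objective, successful verdict, and multi-turn prompt-response pairs."""
    return (
        not conv.uses_code_template
        and conv.verdict == "success"
        and len(conv.turns) >= 2
        and all(len(pair) == 2 for pair in conv.turns)
    )


def select(convs: List[Conversation]) -> List[Conversation]:
    """Apply the base criteria, then the category-specific criteria."""
    selected = []
    for conv in convs:
        if not meets_base_criteria(conv):
            continue
        if conv.category == "psychology" and not conv.organicity:
            continue
        if conv.category == "illicit" and not (conv.specific_details and conv.executable):
            continue
        selected.append(conv)
    return selected
```

Keeping the base checks in one function and the category checks in `select` mirrors the split between Base Criteria and the category-specific lists.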

## 📄 License

Sample datasets are released under **CC-BY-NC-4.0** (Creative Commons Attribution-NonCommercial 4.0 International).

- ✅ Use for research and evaluation
- ✅ Modify and build upon the data
- ✅ Share with attribution
- ❌ Commercial use without separate licensing

## 💼 Full Dataset Access

The sample datasets provide representative examples. The full datasets contain the complete collections summarized in Full Dataset Statistics, with expanded harm categories and regular updates.

**Please contact us at [info@gojuly.ai](mailto:info@gojuly.ai) to purchase any or all of the full datasets.**

Include your research objectives, institutional affiliation, and intended use in your inquiry.

---

**Last Updated:** December 2, 2025

For detailed documentation, visit the individual dataset repositories on Hugging Face.