---
license: apache-2.0
base_model:
- timm/tf_efficientnetv2_s.in21k_ft_in1k
- Ultralytics/YOLO11
tags:
- comfyui
- object-detection
- face-detection
- face-segmentation
- pytorch
- image-segmentation
---

<div align="center">
<img src="images/header.webp" width="800px" />

Custom-trained models for face detection and segmentation across realistic, anime, and NSFW content.

Made for the **Forbidden Vision** ComfyUI custom nodes

<a href="https://github.com/luxdelux7/ComfyUI-Forbidden-Vision">GitHub Repository</a>  
<a href="https://ko-fi.com/luxdelux" target="_blank">
  <img src="https://ko-fi.com/img/githubbutton_sm.svg" alt="Support me on Ko-fi">
</a>
</div>

---

## 🎯 Why These Models Exist

Traditional face models fail where it matters most for AI art workflows:

| **Problem** | **Why It Matters** |
|-------------|-------------------|
| 🎨 **Domain-locked** | Existing models excel at *either* anime *or* realistic content, but not both |
| 🔞 **NSFW blindness** | Most models trained only on SFW data break on adult content |
| 👁️‍🗨️ **Detail blindness** | Most models miss fine details such as anime eyebrows and real eyelashes |
| 🎲 **Generation artifacts** | Standard datasets don't include diffusion model quirks and failures |

**These models address all four.**

<div align="center">
<img src="./images/masks.webp" alt="Mask Example" style="border-radius: 6px; box-shadow: 0 0 12px rgba(0,0,0,0.1);">
<p><em>The segmentation model predicts face masks that include stylized eyebrows, eyelashes, and similar details.</em></p>
</div>

---

## 📊 Training Foundation

### The Dataset Difference

Built from **14,000+ manually annotated images** across the domains that actually matter for AI generation:

<table>
<tr>
<td width="50%">

**🎨 Multi-Domain Coverage**
- SDXL, SD1.5, Pony, Illustrious outputs
- Curated Danbooru (anime styles)
- Real photography
- Full NSFW inclusion (no filtering)

</td>
<td width="50%">

**💎 Edge Case Priority**
- ✓ Extreme angles & occlusions
- ✓ Failed/broken generations
- ✓ Low-quality artifacts
- ✓ Unusual expressions & poses
- ✓ Everything other models ignore

</td>
</tr>
</table>

### What This Means For You

```
Traditional models: Trained on clean celebrity faces
         ↓
    Fail on real workflows

These models: Trained on what you actually generate
         ↓
    Work when you need them
```

**One model family. Every domain. Zero compromises.**

## Model Details

### Face Detection (YOLOv11-Small)

**Purpose:** Primary face detection with high recall

**Training Approach:**
- After every training run, I ran the model on a new mixed dataset, hard-mined its failures, and improved the dataset until performance was acceptable
- Trained at 640px resolution (inference should use same resolution)

**Why YOLOv11-Small instead of nano?**  
More reliable detection across mixed realistic/anime domains, with an acceptable speed tradeoff.
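
Because the detector was trained at 640px, standalone inference can pin the same size. The sketch below uses the Ultralytics `YOLO.predict` call; the weights path, the `detect_faces` helper name, and the `conf=0.25` threshold are illustrative assumptions, not values taken from this repository or from the Forbidden Vision nodes.

```python
TRAIN_IMGSZ = 640  # training resolution stated in this model card


def detect_faces(weights_path, image_path, conf=0.25):
    """Return [(x1, y1, x2, y2, score), ...] for faces found in one image.

    weights_path is a hypothetical local path to the detection weights;
    conf is an illustrative confidence threshold, not a recommended value.
    """
    # Imported lazily so this module loads even without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(weights_path)
    # Run inference at the same resolution the model was trained at.
    results = model.predict(image_path, imgsz=TRAIN_IMGSZ, conf=conf, verbose=False)
    boxes = results[0].boxes
    return [
        (*map(float, xyxy), float(score))
        for xyxy, score in zip(boxes.xyxy.tolist(), boxes.conf.tolist())
    ]
```

Inside ComfyUI none of this is needed, since the nodes load and run the model themselves.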

---


### Segmentation (EfficientNet-v2)

**Purpose:** Precise face mask generation

**Training Approach:**
- Dataset prepared using the Forbidden Vision YOLO model at 512px resolution
- Iterative hard-mining training in multiple phases:
  - Train on the initial 700 samples
  - Evaluate on remaining images to find failure cases
  - Correct failed masks and add them to the dataset
  - Retrain with the expanded dataset
  - Repeat until failure cases drop to near-zero  
    (final dataset: 4k+ images) 
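
The phases above can be sketched as a generic loop. This is a minimal illustration and not the actual training code: `train`, `find_failures`, and `correct` are hypothetical callbacks standing in for model training, evaluation, and manual mask correction, and the toy data at the bottom only exists to make the loop runnable.

```python
def hard_mine(pool, train, find_failures, correct,
              seed_size=700, max_rounds=10, tolerance=0):
    """Grow a training set by repeatedly adding corrected failure cases."""
    dataset = list(pool[:seed_size])      # initial annotated samples
    remaining = list(pool[seed_size:])    # images not yet in the dataset
    model = train(dataset)
    for _ in range(max_rounds):
        failures = find_failures(model, remaining)
        if len(failures) <= tolerance:    # failures are near zero: stop
            break
        # Fix the failed masks and fold them into the dataset.
        dataset.extend(correct(item) for item in failures)
        failed = set(failures)
        remaining = [item for item in remaining if item not in failed]
        model = train(dataset)            # retrain on the expanded dataset
    return model, dataset


# Toy stand-ins: a "model" is just the set of styles it has seen.
def toy_train(data):
    return {style for _, style in data}


def toy_find_failures(model, items):
    return [item for item in items if item[1] not in model]


def toy_correct(item):
    return item  # a real workflow would fix the mask by hand here


pool = [(i, "real" if i % 2 else "anime") for i in range(6)]
pool += [(i, "3d") for i in range(6, 10)]
model, dataset = hard_mine(pool, toy_train, toy_find_failures, toy_correct,
                           seed_size=4)
```

In the toy run, the seed set contains only anime and realistic samples, so the first evaluation flags the four unseen 3D samples, which are then added before retraining.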

**Features:**
- Detects and includes facial features other models ignore, such as protruding anime eyebrows and realistic eyelashes that stick out past the face
- Glasses and similar accessories are treated as part of the face, even when they extend beyond the face outline
- NSFW-friendly across anime, realistic, and 3D domains

---

## Usage

These models are automatically downloaded and used by the **Fixer** node in ComfyUI Forbidden Vision.

## License

Apache 2.0

---

## Contact

- GitHub: [ComfyUI-Forbidden-Vision](https://github.com/luxdelux7/ComfyUI-Forbidden-Vision)
- Issues: [GitHub Issues](https://github.com/luxdelux7/ComfyUI-Forbidden-Vision/issues)
- Support: [Ko-fi](https://ko-fi.com/luxdelux)