Custom-trained models for face detection and segmentation across realistic, anime, and NSFW content.

Made for the Forbidden Vision ComfyUI custom nodes

GitHub Repository
Support me on Ko-fi


🎯 Why These Models Exist

Traditional face models fail where it matters most for AI art workflows:

| Problem | Why It Matters |
|---|---|
| 🎨 Domain-locked | Existing models excel at either anime or realistic, never both |
| 🔞 NSFW blindness | Most models trained only on SFW data break on adult content |
| 👁️‍🗨️ Detail blindness | Most models miss anime eyebrows, real eyelashes, etc. |
| 🎲 Generation artifacts | Standard datasets don't include diffusion model quirks and failures |

These models address all four problems.

Mask Example

The segmentation model predicts face masks that include stylistic eyebrows, eyelashes, and similar details.


📊 Training Foundation

The Dataset Difference

Built from 14,000+ manually annotated images across the domains that actually matter for AI generation:

🎨 Multi-Domain Coverage

  • SDXL, SD1.5, Pony, Illustrious outputs
  • Curated Danbooru (anime styles)
  • Real photography
  • Full NSFW inclusion (no filtering)

💎 Edge Case Priority

  • ✓ Extreme angles & occlusions
  • ✓ Failed/broken generations
  • ✓ Low-quality artifacts
  • ✓ Unusual expressions & poses
  • ✓ Everything other models ignore

What This Means For You

Traditional models: Trained on clean celebrity faces
         ↓
    Fail on real workflows

These models: Trained on what you actually generate
         ↓
    Work when you need them

One model family. Every domain. Zero compromises.

Model Details

Face Detection (YOLOv11-Small)

Purpose: Primary face detection with high recall

Training Approach:

  • After every training run, I ran the model on a new mixed dataset, mined the hard failure cases, and improved the dataset until performance was acceptable
  • Trained at 640px resolution (inference should use the same resolution)
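
As a rough illustration of matching the training resolution at inference time, the sketch below calls the Ultralytics Python API with `imgsz=640`. The helper name `detect_faces`, the weights path, and the `conf=0.25` threshold are assumptions for illustration, not values taken from this repository. Inside ComfyUI the Fixer node handles detection for you, so this is only relevant if you want to run the detector on its own.

```python
from pathlib import Path

TRAIN_IMGSZ = 640  # resolution the detector was trained at, per this card


def detect_faces(weights_path, image_path, conf=0.25, imgsz=TRAIN_IMGSZ):
    """Return face boxes as [x1, y1, x2, y2] lists in original-image pixels.

    weights_path is a hypothetical local path to the detection weights.
    conf=0.25 is an assumed threshold, not a recommendation from the author.
    """
    # Fail early with a clear error if the weights file is missing.
    if not Path(weights_path).is_file():
        raise FileNotFoundError(weights_path)

    # Imported lazily so this module loads even without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(str(weights_path))
    # imgsz matches the training resolution; Ultralytics resizes internally
    # and reports boxes in the coordinates of the original image.
    results = model.predict(source=str(image_path), imgsz=imgsz,
                            conf=conf, verbose=False)
    return results[0].boxes.xyxy.tolist()
```

Lowering `conf` trades precision for recall, which may suit a detector whose stated goal is high recall, but the right value depends on your images.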

Why YOLOv11-Small instead of nano?
It detects more reliably across mixed realistic/anime domains, with an acceptable speed tradeoff.


Segmentation (EfficientNet-v2)

Purpose: Precise face mask generation

Training Approach:

  • Dataset prepared using the Forbidden Vision YOLO model at 512px resolution
  • Iterative hard-mining training in multiple phases:
    • Train on the initial 700 samples
    • Evaluate on remaining images to find failure cases
    • Correct failed masks and add them to the dataset
    • Retrain with the expanded dataset
    • Repeat until failure cases drop to near-zero
      (final dataset: 4k+ images)
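
The phases above can be sketched as a simple loop. This is a minimal outline under stated assumptions, not the actual training code: `annotate`, `train`, `predict`, and `is_failure` are hypothetical callables standing in for manual mask annotation, a training run, model inference, and the review step that flags a bad mask.

```python
def hard_mining_rounds(pool, seed_size, annotate, train, predict,
                       is_failure, max_rounds=10):
    """Grow a training set from `pool` by repeatedly adding failure cases.

    All four callables are hypothetical placeholders:
      annotate(image)          -> corrected ground-truth mask
      train(dataset)           -> model trained on (image, mask) pairs
      predict(model, image)    -> predicted mask
      is_failure(image, mask)  -> True if the predicted mask is unacceptable
    """
    # Phase 1: train on the initial seed samples (700 in this card).
    dataset = [(img, annotate(img)) for img in pool[:seed_size]]
    remaining = list(pool[seed_size:])
    model = train(dataset)

    for _ in range(max_rounds):
        # Evaluate on the remaining images and split them by outcome.
        failures, passed = [], []
        for img in remaining:
            bucket = failures if is_failure(img, predict(model, img)) else passed
            bucket.append(img)
        if not failures:
            break  # no failure cases left, stop mining
        # Correct the failed masks, add them to the dataset, and retrain.
        dataset.extend((img, annotate(img)) for img in failures)
        remaining = passed
        model = train(dataset)

    return model, dataset
```

In practice the stopping rule would be "near-zero" failures rather than exactly zero, and `max_rounds` is just a safety cap added for this sketch.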

Features:

  • Detects and includes facial features other models ignore, such as protruding anime eyebrows and realistic eyelashes that extend beyond the face outline
  • Glasses and similar accessories are treated as part of the face, even where they extend beyond the face shape
  • NSFW-friendly across anime, realistic, and 3D domains

Usage

These models are automatically downloaded and used by the Fixer node in ComfyUI Forbidden Vision.

License

Apache 2.0


Contact
