How to use with Diffusers

TBD. The model requires pipeline patching to remap the embedding-preparation step. Code coming soon.
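
Until that code lands, the sketch below shows the intended wiring: load the trimmed encoder and pass it in as the pipeline's text encoder. The pipeline class name (Flux2Pipeline), the base checkpoint id, and loading the encoder via AutoModel are assumptions about the current diffusers/transformers APIs, and the embedding-preparation patch mentioned above is not included, so treat this as a sketch rather than a working recipe.

```python
# Hedged sketch only: Flux2Pipeline, the base repo id, and AutoModel loading are
# assumptions; the embedding-preparation patch this model needs is NOT included.
import torch
from transformers import AutoModel
from diffusers import Flux2Pipeline  # assumption: FLUX.2 pipeline class name

# Load the trimmed 7-layer text encoder in bfloat16.
text_encoder = AutoModel.from_pretrained(
    "WaveCut/FLUX.2-TE-Trimmed-7L-Distil", torch_dtype=torch.bfloat16
)

# Swap it into the base pipeline; all other components come from the base repo.
pipe = Flux2Pipeline.from_pretrained(
    "black-forest-labs/FLUX.2-dev",  # assumption: base FLUX.2 checkpoint id
    text_encoder=text_encoder,
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe("a lighthouse at dusk, volumetric fog").images[0]
image.save("out.png")
```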

How to use in Comfy

TBD. I plan to provide at least an fp8 safetensors distribution of this model.


WaveCut/FLUX.2-TE-Trimmed-7L-Distil (distilled text encoder)

A smaller 7-layer text encoder distilled from the FLUX.2 text encoder (aka Mistral-Small-3.2-24B-Instruct-2506), intended as a lighter text backbone.

Model description

  • Teacher model: mistralai/Mistral-Small-3.2-24B-Instruct-2506 (subfolder: text_encoder)
  • Compact init: WaveCut/FLUX2-TE-Trimmed7L-Research
  • Architecture: Mistral3 text encoder, 7 transformer layers (~6B parameters)
  • Max sequence length during distillation: 512
  • Dataset: k-mktr/improved-flux-prompts (split: train, field: prompt)
  • Dtype: bfloat16
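
As a quick sanity check of the compact encoder on its own, the sketch below tokenizes a prompt and extracts the last hidden state, which is the tensor the distillation objective targets. Loading the checkpoint via AutoModel and pairing it with the teacher's tokenizer are assumptions.

```python
# Minimal standalone sketch; AutoModel loading and reusing the teacher's
# tokenizer are assumptions, not the author's published usage code.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "mistralai/Mistral-Small-3.2-24B-Instruct-2506"  # assumption: teacher tokenizer
)
model = AutoModel.from_pretrained(
    "WaveCut/FLUX.2-TE-Trimmed-7L-Distil", torch_dtype=torch.bfloat16
).eval()

prompt = "a watercolor painting of a red fox in the snow"
inputs = tokenizer(prompt, max_length=512, truncation=True, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs)

# Last hidden state: (batch, seq_len, hidden_size) — what the distillation
# objective matches against the 24B teacher.
print(out.last_hidden_state.shape)
```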

Distillation setup

  • Objective: token-wise MSE between teacher and compact model last hidden states (see the training-step sketch after this list)
  • Optimizer: AdamW (lr=1e-05, weight_decay=0.01)
  • Scheduler: cosine with warmup (warmup_ratio=0.1)
  • Batch size: 8
  • Gradient accumulation steps: 1
  • Epochs: 16
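
Concretely, one training step under this setup looks roughly like the sketch below. `teacher`, `student`, `optimizer`, and `scheduler` are assumed to be constructed elsewhere (AdamW and the cosine-with-warmup schedule from the list above), and masking padding positions out of the loss is an assumption about the exact reduction used.

```python
# Sketch of one distillation step: token-wise MSE between last hidden states.
# `teacher`, `student`, `optimizer`, `scheduler` are assumed to already exist;
# masking out padding positions is an assumption about the exact reduction.
import torch
import torch.nn.functional as F

def distill_step(batch, teacher, student, optimizer, scheduler):
    input_ids, attention_mask = batch["input_ids"], batch["attention_mask"]

    with torch.no_grad():  # teacher is frozen
        t_hidden = teacher(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state

    s_hidden = student(
        input_ids=input_ids, attention_mask=attention_mask
    ).last_hidden_state

    # Token-wise MSE, averaged over non-padding positions.
    mask = attention_mask.unsqueeze(-1).to(s_hidden.dtype)
    mse = F.mse_loss(s_hidden, t_hidden.to(s_hidden.dtype), reduction="none")
    loss = (mse * mask).sum() / (mask.sum() * s_hidden.size(-1))

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
    return loss.item()
```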

Distillation metrics (evaluation)

On the training split (held-out eval batches); a sketch of how these can be computed follows the list:

  • token-wise MSE: 347.5225
  • token-level cosine similarity: 0.9222
  • pooled (last token) cosine similarity: 0.9722
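
For reference, the sketch below shows one way to compute these three quantities from teacher and student last hidden states. The exact masking and pooling conventions (non-padding tokens only, last non-padding token as the pooled representation, right padding) are assumptions.

```python
# Sketch of the eval metrics; masking/pooling conventions are assumptions.
import torch
import torch.nn.functional as F

def eval_metrics(t_hidden, s_hidden, attention_mask):
    # t_hidden, s_hidden: (batch, seq_len, hidden); attention_mask: (batch, seq_len)
    mask = attention_mask.bool()

    # Token-wise MSE over non-padding tokens.
    diff = (s_hidden - t_hidden)[mask]  # (n_tokens, hidden)
    mse = diff.pow(2).mean()

    # Token-level cosine similarity, averaged over non-padding tokens.
    token_cos = F.cosine_similarity(s_hidden[mask], t_hidden[mask], dim=-1).mean()

    # Pooled (last non-padding token) cosine similarity, assuming right padding.
    last_idx = attention_mask.sum(dim=1) - 1
    batch_idx = torch.arange(t_hidden.size(0))
    pooled_cos = F.cosine_similarity(
        s_hidden[batch_idx, last_idx], t_hidden[batch_idx, last_idx], dim=-1
    ).mean()

    return mse.item(), token_cos.item(), pooled_cos.item()
```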