mom-multilingual-class
Collection
long context models for MoM multilingual classifier (domain, jailbreak, pii, factual, feedback)
•
10 items
•
Updated
Full merged model for PII (Personally Identifiable Information) detection, ready for direct inference. Based on mmBERT-32K-YaRN with 32K context length support.
| Property | Value |
|---|---|
| Base Model | llm-semantic-router/mmbert-32k-yarn |
| Architecture | ModernBERT (Flash Attention 2) |
| Parameters | 307M |
| Task | Token Classification (NER) |
| Max Context | 32,768 tokens |
| Entity Types | 17 PII types (35 BIO labels) |
PERSON - Person names (98.7% accuracy)EMAIL_ADDRESS - Email addresses (95%+ accuracy)PHONE_NUMBER - Phone numbers (99.1% accuracy)STREET_ADDRESS - Street addresses (95.9% accuracy)CREDIT_CARD - Credit card numbers (84% accuracy)US_SSN - US Social Security NumbersUS_DRIVER_LICENSE - US Driver License numbersIBAN_CODE - International Bank Account NumbersIP_ADDRESS - IP addressesDATE_TIME - Dates and timesAGE - Age informationORGANIZATION - Organization namesGPE - Geopolitical entitiesZIP_CODE - ZIP/postal codesDOMAIN_NAME - Domain namesNRP - Nationalities, religious or political groupsTITLE - Titles (Mr., Dr., etc.)from transformers import AutoModelForTokenClassification, AutoTokenizer
import torch
model = AutoModelForTokenClassification.from_pretrained(
"llm-semantic-router/mmbert32k-pii-detector-merged"
)
tokenizer = AutoTokenizer.from_pretrained(
"llm-semantic-router/mmbert32k-pii-detector-merged"
)
text = "My email is john.smith@example.com and phone is 555-123-4567"
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
outputs = model(**inputs)
predictions = torch.argmax(outputs.logits, dim=2)
# Get label mapping
id2label = model.config.id2label
for token, pred in zip(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]), predictions[0]):
label = id2label[str(pred.item())]
if label != "O":
print(f"{token}: {label}")
This model is part of the vLLM Semantic Router Mixture-of-Models (MoM) family for intelligent LLM request routing.
MIT License
Base model
jhu-clsp/mmBERT-base