Maaza SLM-360M-JSON v1.2
The first SLM to post complex-schema wins on the EdgeJSON v3.1 normalized benchmark.
A 360M-parameter model fine-tuned for high-accuracy JSON extraction. v1.2 introduces an extended 2048-token context window for improved handling of complex schemas.
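Because the context window is 2048 tokens shared between the prompt and the generated JSON, long inputs are worth checking before generation. Below is a minimal, hypothetical guard (the helper name and the 512-token generation budget are assumptions), using the same tokenizer as the Quick Start further down:

```python
from transformers import AutoTokenizer

MAX_CONTEXT = 2048  # v1.2 context window (prompt + generated tokens)
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-360M")

def fits_in_context(prompt: str, max_new_tokens: int = 512) -> bool:
    # Leave room for the generated JSON inside the 2048-token window.
    n_prompt = len(tokenizer(prompt)["input_ids"])
    return n_prompt + max_new_tokens <= MAX_CONTEXT
```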
Performance
EdgeJSON v3.1 Benchmark (Normalized Scoring)
| Metric | Score |
|---|---|
| JSONExact | 58.9% |
| Field F1 | 0.761 |
| Avg Latency | ~39ms/token |
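For reference, the two headline metrics can be read as follows: JSONExact counts a prediction as correct only when the parsed output equals the reference object exactly, while Field F1 gives partial credit per extracted key/value pair. The sketch below is illustrative only, not the official slmbench scoring code:

```python
import json

def json_exact(pred_text: str, ref: dict) -> bool:
    # Exact match: the prediction must parse and equal the reference object.
    try:
        return json.loads(pred_text) == ref
    except json.JSONDecodeError:
        return False

def field_f1(pred: dict, ref: dict) -> float:
    # Partial credit: F1 over flattened (key path, value) pairs.
    def flatten(obj, prefix=""):
        pairs = set()
        if isinstance(obj, dict):
            for k, v in obj.items():
                pairs |= flatten(v, f"{prefix}{k}.")
        elif isinstance(obj, list):
            for i, v in enumerate(obj):
                pairs |= flatten(v, f"{prefix}{i}.")
        else:
            pairs.add((prefix.rstrip("."), json.dumps(obj)))
        return pairs

    p, r = flatten(pred), flatten(ref)
    if not p or not r:
        return 0.0
    tp = len(p & r)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(p), tp / len(r)
    return 2 * precision * recall / (precision + recall)
```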
By Complexity
| Complexity | JSONExact | Field F1 |
|---|---|---|
| Simple (2-4 fields) | 88.2% | 0.961 |
| Medium (4-8 fields) | 51.4% | 0.860 |
| Complex (8+ fields) | 4.0% | 0.072 |
Version Comparison
Note: v1.1 and later are evaluated with EdgeJSON v3.1 normalized scoring (case-insensitive keys). This is stricter on simple/medium schemas but fairer overall. v1.2 is the first SLM to register non-zero complex-schema wins under the standardized benchmark (see the normalization sketch after the table below).
| Version | EdgeJSON Eval | Overall (JSONExact) | Complex | Notes |
|---|---|---|---|---|
| v1.0 | v3.0 (strict) | 55.1% | 4.0% | Original release |
| v1.1 | v3.1 (normalized) | 60.1% | 0.0% | Best simple/medium under fair scoring |
| v1.2 | v3.1 (normalized) | 58.9% | 4.0% | Complex breakthrough under fair scoring |
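To make the v3.0/v3.1 distinction concrete: under normalized scoring, keys are compared case-insensitively before matching. The illustration below is a hedged sketch of the idea only; the helper and example objects are assumptions, not the benchmark's implementation:

```python
def normalize_keys(obj):
    # Lower-case all dict keys recursively; values are left untouched.
    if isinstance(obj, dict):
        return {k.lower(): normalize_keys(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [normalize_keys(v) for v in obj]
    return obj

pred = {"Order_ID": "12345", "Total": 89.97}
ref  = {"order_id": "12345", "total": 89.97}

print(pred == ref)                                  # False under v3.0-style strict matching
print(normalize_keys(pred) == normalize_keys(ref))  # True under v3.1-style normalized matching
```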
vs Baselines
| Model | Params | JSONExact | Complex |
|---|---|---|---|
| SmolLM2-360M (base) | 360M | 11.4% | 0.0% |
| Qwen2.5-3B | 3B | 6.0% | 0.0% |
| Maaza v1.2 | 360M | 58.9% | 4.0% |
Quick Start
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Load model
base = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceTB/SmolLM2-360M",
    torch_dtype=torch.float16,
    device_map="auto"
)
model = PeftModel.from_pretrained(base, "CycleCoreTechnologies/Maaza-SLM-360M-JSON-v1.2")
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-360M")

# Inference
prompt = """Extract the structured JSON data from the following text. Use snake_case for all keys.
Input: Order #12345 from Jane Smith (jane@example.com). Items: Widget x2 ($19.99), Gadget ($49.99). Ship to 123 Main St, Springfield IL 62701. Total $89.97.
Output:"""
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True).split("Output:")[-1])
```
Expected Output
```json
{
  "order_id": "12345",
  "customer": {"name": "Jane Smith", "email": "jane@example.com"},
  "items": [
    {"name": "Widget", "quantity": 2, "price": 19.99},
    {"name": "Gadget", "quantity": 1, "price": 49.99}
  ],
  "shipping": {"street": "123 Main St", "city": "Springfield", "state": "IL", "zip": "62701"},
  "total": 89.97
}
```
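In practice the decoded text may contain trailing tokens after the closing brace, so it is worth parsing defensively before using the result. A small, hypothetical post-processing helper (not part of the model's API):

```python
import json

def extract_json(generated: str):
    # Take the substring from the first '{' to the last '}' and try to parse it.
    start, end = generated.find("{"), generated.rfind("}")
    if start == -1 or end == -1 or end <= start:
        return None
    try:
        return json.loads(generated[start:end + 1])
    except json.JSONDecodeError:
        return None

# Example: feed in the decoded output from the Quick Start snippet.
result = extract_json('{"order_id": "12345", "total": 89.97}\n<extra>')
print(result)  # {'order_id': '12345', 'total': 89.97}
```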
Evaluation
Run the EdgeJSON benchmark:
```bash
git clone https://github.com/CycleCore-Technologies/slmbench
cd slmbench
pip install -r requirements.txt

python benchmarks/edge_json/scripts/eval.py \
  --model HuggingFaceTB/SmolLM2-360M \
  --adapter CycleCoreTechnologies/Maaza-SLM-360M-JSON-v1.2 \
  --dataset benchmarks/edge_json/data/edgejson_test_v3.jsonl \
  --device cuda
```
Model Details
- Base: SmolLM2-360M
- Method: LoRA fine-tuning
- Context: 2048 tokens (extended in v1.2)
- License: Apache 2.0
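Because v1.2 ships as a LoRA adapter on top of SmolLM2-360M, the adapter can be folded into the base weights for deployment, removing the peft dependency at inference time. A sketch using the standard peft API (model IDs as in the Quick Start; the output directory name is illustrative):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

base = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM2-360M", torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, "CycleCoreTechnologies/Maaza-SLM-360M-JSON-v1.2")

# Fold the LoRA weights into the base model and save a standalone checkpoint.
merged = model.merge_and_unload()
merged.save_pretrained("maaza-360m-json-v1.2-merged")  # output path is illustrative
AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-360M").save_pretrained("maaza-360m-json-v1.2-merged")
```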
Use Cases
- Edge device JSON extraction
- API response parsing
- Document structure extraction
- IoT data normalization
Limitations
- Complex schemas (8+ fields, deep nesting) remain challenging
- Best suited for simple/medium complexity extraction
- v1.1 is recommended when complex schemas are not needed (higher overall accuracy)
Links
- EdgeJSON Benchmark
- SLMBench Leaderboard
- v1.1 Model
- Research Paper (forthcoming)
Citation
```bibtex
@misc{cyclecore2025maaza,
  title={Maaza SLM-360M-JSON: Small Language Model for JSON Extraction},
  author={CycleCore Technologies},
  year={2025},
  howpublished={\url{https://huggingface.co/CycleCoreTechnologies/Maaza-SLM-360M-JSON-v1.2}}
}
```
Contact
Apache 2.0 | Copyright 2025 CycleCore Technologies