# TurkWeb-Edu Student (Reasoning) 🇹🇷
A Turkish educational content scorer that generates reasoning before scoring. It is the Turkish equivalent of the FineWeb-Edu classifier, built using Generative Reasoning Distillation.
## How It Works
- You send the model a Turkish text
- The model thinks first (generates its reasoning in Turkish)
- It then outputs an educational quality score (0-5)
## Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "YsK-dev/TurkWeb-Edu-Student-Qwen1.5B-SOTA",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("YsK-dev/TurkWeb-Edu-Student-Qwen1.5B-SOTA")

messages = [
    {"role": "system", "content": "You are an educational quality classifier."},
    {"role": "user", "content": "Analyze the following Turkish text for educational value (0-5):\n\n<your text>\n\nProvide your reasoning and final score."},
]

# Build the prompt with the chat template, generate, and decode only the newly generated tokens
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to(model.device)
output = model.generate(input_ids, max_new_tokens=300, temperature=0.1, do_sample=True)
print(tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True))
```
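If you need the score as a number rather than free text, you can extract it from the generated response. The exact output phrasing is not formally specified here, so the regex below (which takes the last standalone 0-5 digit in the response) is only an assumption; adjust it to whatever the model actually emits.

```python
import re

# Assumption: the model ends its answer with the final score, so we grab the
# last standalone 0-5 digit in the decoded response.
generated = tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True)
match = re.search(r"\b([0-5])\b(?!.*\b[0-5]\b)", generated, re.DOTALL)
score = int(match.group(1)) if match else None
print("Educational score:", score)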
## Training Details
| Component | Value |
|---|---|
| Teacher | Qwen3-30B-A3B-Instruct-2507 |
| Student | Qwen/Qwen2.5-1.5B-Instruct |
| Method | SFT with reasoning distillation (LoRA r=64) |
| Data | 660K Turkish web samples from FineWeb-2 |
| Hardware | 1x NVIDIA H100 80GB |
| Steps | 20,000 |
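For reference, here is a minimal sketch of the kind of LoRA setup the table implies (rank 64 on the Qwen2.5-1.5B-Instruct student). The alpha, dropout, and target modules below are assumptions, not the published training configuration.

```python
# Sketch only: reproduces the reported LoRA rank (r=64); other hyperparameters are assumed.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

student = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-1.5B-Instruct", torch_dtype="auto", device_map="auto"
)
lora_config = LoraConfig(
    r=64,                      # rank reported in the table
    lora_alpha=128,            # assumed; commonly set to 2*r
    lora_dropout=0.05,         # assumed
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention-only targets
    task_type="CAUSAL_LM",
)
student = get_peft_model(student, lora_config)
student.print_trainable_parameters()
```

The adapted student would then be fine-tuned on the teacher-generated reasoning-plus-score completions over the 660K FineWeb-2 samples for the 20,000 steps listed above.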