---
language: en
license: mit
tags:
- sdf
- classification
- qwen2.5
- gguf
- content-type
- web-content
base_model: Qwen/Qwen2.5-1.5B-Instruct
pipeline_tag: text-generation
---

# SDF Classify

Content type classifier for the [SDF Protocol](https://sdfprotocol.org). Fine-tuned from Qwen2.5-1.5B-Instruct using QLoRA.

## Purpose

Classifies web content into SDF's hierarchical type system: 10 parent types and 50+ subtypes (e.g., `article.news`, `commerce.product`, `documentation.api_docs`).
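
The example labels above suggest a `parent.subtype` shape. As a minimal sketch, assuming every label is a dot-separated string like those examples, a consumer could split the classifier output as follows. The `SdfType` tuple and `parse_sdf_type` helper are hypothetical names for illustration, not part of the SDF specification, and the lowercasing step is an assumption.

```python
from typing import NamedTuple, Optional


class SdfType(NamedTuple):
    """Hypothetical container for a parsed SDF type label."""
    parent: str
    subtype: Optional[str]


def parse_sdf_type(label: str) -> SdfType:
    """Split a label such as 'article.news' into parent and subtype."""
    # Assumption: labels are case-insensitive and may carry stray whitespace.
    cleaned = label.strip().lower()
    if not cleaned:
        raise ValueError("empty type label")
    # Split on the first dot only; a label without a dot has no subtype.
    parent, _, subtype = cleaned.partition(".")
    if not parent:
        raise ValueError(f"missing parent type in {label!r}")
    return SdfType(parent, subtype or None)


for label in ["article.news", "commerce.product", "documentation.api_docs"]:
    print(parse_sdf_type(label))
```

Keeping the parent and subtype separate lets downstream code route on the parent type alone when the subtype is missing or unrecognized.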

## Training

- **Base model**: Qwen2.5-1.5B-Instruct
- **Method**: QLoRA (rank 32, alpha 64, dropout 0.05)
- **Training data**: 2,335 classified web documents
- **Accuracy**: 95.2% exact type match
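
To make the adapter settings concrete, the sketch below restates them in plain Python and derives two quantities that follow from them. The dictionary keys mirror the field names used by the `peft` library's `LoraConfig`, but whether `peft` was used for training is an assumption, and the 1536 x 1536 matrix shape is purely illustrative because the adapted modules are not listed here.

```python
# Hyperparameters as listed above; key names follow peft's LoraConfig fields.
lora_hparams = {"r": 32, "lora_alpha": 64, "lora_dropout": 0.05}

# Standard LoRA scales the low-rank update by alpha / r.
scaling = lora_hparams["lora_alpha"] / lora_hparams["r"]
print(f"LoRA scaling factor: {scaling}")


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    # Matrix A has shape (r, d_in) and matrix B has shape (d_out, r).
    return r * (d_in + d_out)


# Illustrative shape only; the modules actually adapted are not documented.
print(lora_param_count(1536, 1536, lora_hparams["r"]))
```

With alpha set to twice the rank, the adapter update is scaled by 2 before being added to the frozen base weights.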

## Files

| File | Size | Description |
|------|------|-------------|
| `sdf-classify-Qwen2.5-1.5B-Instruct-Q4_K_M.gguf` | 941 MB | Quantized (Q4_K_M); recommended for deployment |
| `sdf-classify-Qwen2.5-1.5B-Instruct-f16.gguf` | 2.9 GB | Unquantized, half precision (f16) |
| `Modelfile` | n/a | Ollama import configuration |

## Usage with Ollama

```bash
# Download the Q4_K_M file and the Modelfile, then:
ollama create sdf-classify -f Modelfile
```
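
Once the model is imported, it can be queried through Ollama's local HTTP API. The following is a minimal sketch, assuming an Ollama server on its default port and that the model accepts the page text directly as the prompt. The expected prompt and output formats are not documented here, so the `build_payload` and `classify` helpers are hypothetical and may need adjusting to match the `Modelfile` template.

```python
import json
import urllib.request

# Ollama's default local endpoint for single-turn generation.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(page_text: str, model: str = "sdf-classify") -> dict:
    """Build a non-streaming generate request for the imported model."""
    # Assumption: the raw page text is a suitable prompt for this model.
    return {"model": model, "prompt": page_text, "stream": False}


def classify(page_text: str) -> str:
    """Send the text to a local Ollama server and return the raw model output."""
    data = json.dumps(build_payload(page_text)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read())["response"].strip()


# Show the request body without contacting the server.
print(json.dumps(build_payload("Example page text"), indent=2))
```

Calling `classify("...")` requires a running Ollama server with the `sdf-classify` model created as shown above. The returned string is the unparsed model output.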

## Part of SDF Protocol

- **Protocol**: [sdfprotocol.org](https://sdfprotocol.org)
- **Specification**: [github.com/sdfprotocol/sdf](https://github.com/sdfprotocol/sdf)
- **Whitepaper**: [DOI 10.5281/zenodo.18559223](https://doi.org/10.5281/zenodo.18559223)
- **Extractor model**: [pranab2050/sdf-extract](https://huggingface.co/pranab2050/sdf-extract)

## Citation

```bibtex
@article{sarkar2026sdf,
  title={Convert Once, Consume Many: SDF for Cacheable, Typed Semantic Extraction from Web Pages},
  author={Sarkar, Pranab},
  year={2026},
  doi={10.5281/zenodo.18559223},
  publisher={Zenodo}
}
```