---
license: apache-2.0
datasets:
- Syamsuddin/kamus-program-kegiatan
- Syamsuddin/akun-sumber-pendanaan
- Syamsuddin/qwen-wilayah-administratif-ID
- Syamsuddin/qwen-kamus-sumber-pendanaan-ID
language:
- id
base_model:
- Qwen/Qwen3-4B-Thinking-2507
library_name: transformers
tags:
- indonesia
- nusstek-ai
- apbd
- sakip
- pemerintah-daerah
---
<p align="center">
<img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgsRMVaInTsXT2qdZtt2xiU4mhqFGAchjtWDLnJpdXn1vJAmst-ausr-m9aM4y_8a3YzwGnIvozpwP8BeoyVmHcyv9kjkG-MSX8Y0Z4sCi12Zr3CboGmLOGRj5X4MmfQrbU-DNuS6-6jUF3G2Ytc34e_RoWwnKxe6y1GEaaY6y23z2WZSsOeeiNtuHsKnrN/s320/banner-nusstek.png" alt="NUSSTEK-AI Banner" width="100%"/>
</p>
# Qwen3-4B-Thinking Indonesia Admin AI
<!-- ===== BADGES: LoRA Release ===== -->
[![License: Apache-2.0](https://img.shields.io/badge/License-Apache--2.0-green.svg)](./LICENSE)
[![Framework: Transformers](https://img.shields.io/badge/Framework-Transformers-blue.svg)](https://huggingface.co/docs/transformers)
[![PEFT: LoRA](https://img.shields.io/badge/PEFT-LoRA-8A2BE2.svg)](https://github.com/huggingface/peft)
[![Quantization: QLoRA 4-bit](https://img.shields.io/badge/Quantization-QLoRA%204--bit-59316B.svg)](https://github.com/huggingface/peft)
[![Base: Qwen3-4B-Thinking-2507](https://img.shields.io/badge/Base-Qwen3--4B--Thinking--2507-black.svg)](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507)
[![Language: Indonesian](https://img.shields.io/badge/Language-Indonesia-red.svg)](#)
[![Tasks: Instruction Tuning](https://img.shields.io/badge/Tasks-Instruction%20Tuning-orange.svg)](#)
[![Domain: Pemda/APBD](https://img.shields.io/badge/Domain-Pemda%2FAPBD-9cf.svg)](#)
[![Datasets](https://img.shields.io/badge/Datasets-4%20sources-informational.svg)](#datasets)
[![Model Type: Adapter](https://img.shields.io/badge/Model%20Type-Adapter%20(LoRA)-steelblue.svg)](#)
<!-- ===== BADGES: Full Model Release ===== -->
[![Weights: FP16/BF16](https://img.shields.io/badge/Weights-FP16%2FBF16-5865F2.svg)](#)
[![Model Type: Full](https://img.shields.io/badge/Model%20Type-Full%20Weights-darkcyan.svg)](#)
## Model Summary
**Qwen3-4B-Thinking Indonesia Admin AI** is a **Large Language Model (LLM)** based on *Qwen3-4B-Thinking-2507* that has been fine-tuned on Indonesian regional administrative and financial datasets.
The model is intended to:
- Understand the structure of **APBD programs/activities** (APBD is the regional government budget),
- Manage the dictionary of **accounts & funding sources**,
- Answer questions about **Indonesia's administrative regions**,
- Support **regional government AI** systems such as SAKIP-AI, APBD-AI, and MONEV-AI.
The model was developed as part of the **NUSSTEK-AI (Nusantara Smart Teknologi)** initiative to build an on-premise AI ecosystem for regional governments.
---
<!-- ===== BADGES: Activity (Optional) ===== -->
[![HF Downloads](https://img.shields.io/huggingface/datasets/dl-total.svg?label=HF%20Downloads&color=brightgreen)](https://huggingface.co/Syamsuddin)
[![Model Likes](https://img.shields.io/badge/HF%20Likes-❤︎-pink.svg)](https://huggingface.co/Syamsuddin)
[![Open In Colab](https://img.shields.io/badge/Open%20in-Colab-yellow.svg)](#)
## Model Details
- **Developed by:** Syamsuddin / NUSSTEK-AI
- **Model type:** Instruction-tuned LLM (Qwen3-4B)
- **Language(s):** Indonesian (`id`)
- **License:** Apache-2.0
- **Finetuned from:** [Qwen/Qwen3-4B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507)
- **Library:** HuggingFace Transformers
### Model Sources
- **Repository:** [Model on HuggingFace](https://huggingface.co/Syamsuddin/qwen3-4B-Indonesia-AdminAI)
- **Datasets:**
  - [Syamsuddin/kamus-program-kegiatan](https://huggingface.co/datasets/Syamsuddin/kamus-program-kegiatan)
  - [Syamsuddin/akun-sumber-pendanaan](https://huggingface.co/datasets/Syamsuddin/akun-sumber-pendanaan)
  - [Syamsuddin/qwen-wilayah-administratif-ID](https://huggingface.co/datasets/Syamsuddin/qwen-wilayah-administratif-ID)
  - [Syamsuddin/qwen-kamus-sumber-pendanaan-ID](https://huggingface.co/datasets/Syamsuddin/qwen-kamus-sumber-pendanaan-ID)
---
<!-- ===== BADGES: Datasets Detail (Optional) ===== -->
[![Dataset: Program-Kegiatan](https://img.shields.io/badge/Dataset-kamus--program--kegiatan-blue.svg)](https://huggingface.co/datasets/Syamsuddin/kamus-program-kegiatan)
[![Dataset: Akun-Pendanaan](https://img.shields.io/badge/Dataset-akun--sumber--pendanaan-blue.svg)](https://huggingface.co/datasets/Syamsuddin/akun-sumber-pendanaan)
[![Dataset: Wilayah ID](https://img.shields.io/badge/Dataset-wilayah%20administratif%20ID-blue.svg)](https://huggingface.co/datasets/Syamsuddin/qwen-wilayah-administratif-ID)
[![Dataset: Kamus Pendanaan ID](https://img.shields.io/badge/Dataset-kamus%20sumber%20pendanaan%20ID-blue.svg)](https://huggingface.co/datasets/Syamsuddin/qwen-kamus-sumber-pendanaan-ID)
## Intended Uses
### Direct Use
- AI chatbot for regional governments.
- Digital assistant for drafting **RKA, DPA, SAKIP documents, and financial reports**.
- Fast lookup of account codes/names, funding sources, programs, activities, and regions.
### Downstream Use
- **RAG (Retrieval-Augmented Generation)** over regional government documents.
- AI foundation for internal regional government (Pemda) applications (BudView-AI, APBD-AI, PBJ-AI, SAKIP-AI, MONEV-AI).
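A RAG setup retrieves relevant passages and places them in the prompt before the question. The sketch below illustrates the idea with a toy word-overlap retriever; the function names, the `Konteks`/`Pertanyaan` prompt layout, and the placeholder documents are assumptions for illustration, and a real deployment would more likely use an embedding-based index.

```python
import re

def words(text: str) -> set[str]:
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))

def build_rag_prompt(query: str, documents: list[str], top_k: int = 2) -> str:
    """Pick the top_k documents by word overlap and prepend them to the question as context."""
    query_words = words(query)
    ranked = sorted(documents, key=lambda d: len(query_words & words(d)), reverse=True)
    context = "\n".join(f"- {d}" for d in ranked[:top_k])
    return f"Konteks:\n{context}\n\nPertanyaan: {query}"

# Placeholder snippets; a real system would index actual regional documents.
docs = [
    "Dokumen A membahas program pendidikan daerah.",
    "Dokumen B membahas sumber pendanaan daerah.",
    "Dokumen C membahas wilayah kecamatan.",
]
prompt = build_rag_prompt("Apa sumber pendanaan daerah?", docs, top_k=1)
print(prompt)
```

The resulting string can be passed to the model in place of a bare question.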
### Out-of-Scope Use
- Not suitable for the medical domain, international law, or languages other than Indonesian.
- Must not be used as the basis for legal decisions without manual verification.
---
## Training Details
### Training Data
The model was trained on several specialized datasets:
- **APBD Programs & Activities** → structure, codes, names, hierarchical relationships.
- **Accounts & Funding Sources** → funding codes and descriptions, including group, type, and object.
- **Administrative Regions** → codes for provinces, regencies/cities, districts, and villages/urban villages.
### Preprocessing
- Data was converted to an **instruction tuning** format with three columns:
  - `instruction` → the task,
  - `input` → the context,
  - `output` → the answer.
- Question variants were generated (name→code, code→name, fuzzy search, normalization, full listings).
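The three-column format and the name→code / code→name variants can be sketched as follows. This is a minimal illustration, not the author's actual pipeline: the helper `make_records`, the instruction wording, and the sample entry (code `0.00.00`, name `Contoh Program`) are hypothetical placeholders.

```python
import json

def make_records(code: str, name: str, kind: str = "program") -> list[dict]:
    """Build two instruction-tuning records (name->code and code->name) for one dictionary entry."""
    return [
        {
            "instruction": f"Sebutkan kode {kind} berikut.",
            "input": name,
            "output": code,
        },
        {
            "instruction": f"Sebutkan nama {kind} dengan kode berikut.",
            "input": code,
            "output": name,
        },
    ]

# Hypothetical entry; real codes and names come from the source datasets.
records = make_records("0.00.00", "Contoh Program")
for record in records:
    print(json.dumps(record, ensure_ascii=False))
```

Each printed line is one JSON record, which is a common on-disk layout (JSON Lines) for instruction-tuning data.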
### Hyperparameters
- **Precision:** bf16 mixed precision
- **Batch size:** 4 (grad accum = 8)
- **Learning rate:** 2e-5
- **Epochs:** 3
- **Max sequence length:** 2048
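With gradient accumulation, the optimizer sees a larger effective batch than the per-step batch size listed above. The arithmetic below is a sketch that assumes the listed batch size is per device and that a single GPU was used, as stated under Compute Infrastructure. `num_examples` is a hypothetical dataset size used only to illustrate step counts, and the step count is approximate because trainers differ in how they round partial batches.

```python
import math

per_device_batch_size = 4   # from the list above
grad_accum_steps = 8        # from the list above
num_gpus = 1                # from Compute Infrastructure
epochs = 3                  # from the list above
num_examples = 10_000       # hypothetical, not the real dataset size

# Examples contributing to each optimizer update.
effective_batch_size = per_device_batch_size * grad_accum_steps * num_gpus
# Approximate optimizer updates per epoch and over the whole run.
steps_per_epoch = math.ceil(num_examples / effective_batch_size)
total_steps = steps_per_epoch * epochs

print(effective_batch_size)  # 32
print(steps_per_epoch, total_steps)
```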
### Compute Infrastructure
- **Hardware:** 1x NVIDIA A100 40GB
- **Framework:** HuggingFace Transformers + PEFT (LoRA)
- **OS:** Ubuntu Server 22.04
---
## Evaluation
### Testing Data
- Sample questions drawn from the region, account, and program datasets.
### Metrics
- Manual evaluation: **code/name lookup accuracy** and **completeness of region listings**.
- Results:
  - Code/name lookup: >95% correct.
  - Region listings match Ministry of Home Affairs (Kementerian Dalam Negeri) data.
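Lookup accuracy of this kind can be computed as exact-match accuracy over prediction/reference pairs. The sketch below assumes case- and whitespace-insensitive matching; the function names and the toy data are hypothetical, and the evaluation reported above was done manually rather than with this script.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences do not count as errors."""
    return " ".join(text.lower().split())

def lookup_accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions that exactly match their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return correct / len(references)

# Toy data for illustration only.
preds = ["Pendapatan Asli Daerah", "pendapatan  transfer", "Belanja Modal"]
refs = ["Pendapatan Asli Daerah", "Pendapatan Transfer", "Belanja Pegawai"]
print(f"{lookup_accuracy(preds, refs):.2f}")
```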
---
## Environmental Impact
- **Hardware Type:** GPU A100 40GB
- **Training Time:** approximately 4 hours
- **Carbon Emitted:** ~3 kgCO2eq (estimated with the MLCO2 Impact Calculator)
---
## Limitations & Risks
- The datasets are limited to the most recent versions (2023/2024). New regulatory changes may make answers outdated.
- The model may be **sensitive to spelling variations** that do not appear in the datasets.
- Languages other than Indonesian are not supported.
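One way to soften the spelling sensitivity is to map a user's query to the closest known name before it reaches the model. The sketch below uses `difflib.get_close_matches` from the Python standard library; the helper name `closest_names`, the `0.8` cutoff, and the sample names are assumptions for illustration, not part of the released model.

```python
import difflib

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace."""
    return " ".join(text.lower().split())

def closest_names(query: str, names: list[str], n: int = 3, cutoff: float = 0.8) -> list[str]:
    """Return up to n known names that closely match the query after normalization."""
    by_normalized = {normalize(name): name for name in names}
    matches = difflib.get_close_matches(normalize(query), list(by_normalized), n=n, cutoff=cutoff)
    # Map the normalized matches back to their original spelling.
    return [by_normalized[m] for m in matches]

# Sample names for illustration; a real list would come from the source datasets.
known = ["Pendapatan Asli Daerah", "Pendapatan Transfer", "Belanja Modal"]
print(closest_names("pendapatan asli daerh", known))
```

A query with no sufficiently close match returns an empty list, which a caller can treat as a signal to ask the user for clarification.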
### Recommendations
Use this model **as an aid**, not as a replacement for manual validation.
Check AI output against official documents (Permendagri, RPJMD, RKA, APBD).
---
## How to Use
```python
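from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Syamsuddin/qwen3-4B-Indonesia-AdminAI"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Example query: "What is the funding source code for 'Pendapatan Asli Daerah'?"
prompt = "Apa kode sumber pendanaan untuk 'Pendapatan Asli Daerah'?"
# Move the input tensors to the same device as the model.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
# The decoded text contains the prompt followed by the generated continuation.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))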
```
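The base model, Qwen3-4B-Thinking-2507, emits its reasoning first and closes it with a `</think>` tag before the final answer. Assuming this fine-tune keeps that behavior and that the tag survives decoding, a small helper can separate the two parts. The function name `split_thinking` and the sample string are hypothetical.

```python
def split_thinking(generated: str, end_tag: str = "</think>") -> tuple[str, str]:
    """Split decoded output into (reasoning, answer) at the first closing think tag."""
    reasoning, sep, answer = generated.partition(end_tag)
    if not sep:
        # No closing tag found: return the text as the answer. With a small
        # max_new_tokens this may actually be reasoning that was cut off.
        return "", generated.strip()
    # Drop an opening tag if one is present, then trim whitespace.
    return reasoning.replace("<think>", "").strip(), answer.strip()

# Made-up output string for illustration only.
sample = "Pengguna menanyakan kode.</think>\n\nJawaban contoh."
reasoning, answer = split_thinking(sample)
print(answer)
```

If the answer comes back empty or the tag is missing, raising `max_new_tokens` above 200 may give the model room to finish its reasoning.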
---
## Citation
```bibtex
@misc{syamsuddin2025qwenid,
author = {Syamsuddin},
title = {Qwen3-4B-Thinking Indonesia Admin AI},
year = {2025},
howpublished = {HuggingFace Models},
url = {https://huggingface.co/Syamsuddin/qwen3-4B-Indonesia-AdminAI},
note = {Qwen3-4B fine-tuned for the Indonesian government context (APBD, SAKIP, administrative regions)}
}
```
---
## Contact
- **Author:** Syamsuddin
- **Organization:** NUSSTEK-AI
- **Email:** office@nusstek.com, syamsuddin.ideris@gmail.com