---
license: apache-2.0
datasets:
- Syamsuddin/kamus-program-kegiatan
- Syamsuddin/akun-sumber-pendanaan
- Syamsuddin/qwen-wilayah-administratif-ID
- Syamsuddin/qwen-kamus-sumber-pendanaan-ID
language:
- id
base_model:
- Qwen/Qwen3-4B-Thinking-2507
library_name: transformers
tags:
- indonesia
- nusstek-ai
- apbd
- sakip
- pemerintah-daerah
---

<p align="center">
<img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgsRMVaInTsXT2qdZtt2xiU4mhqFGAchjtWDLnJpdXn1vJAmst-ausr-m9aM4y_8a3YzwGnIvozpwP8BeoyVmHcyv9kjkG-MSX8Y0Z4sCi12Zr3CboGmLOGRj5X4MmfQrbU-DNuS6-6jUF3G2Ytc34e_RoWwnKxe6y1GEaaY6y23z2WZSsOeeiNtuHsKnrN/s320/banner-nusstek.png" alt="NUSSTEK-AI Banner" width="100%"/>
</p>

# Qwen3-4B-Thinking Indonesia Admin AI

## Model Summary

**Qwen3-4B-Thinking Indonesia Admin AI** is a **Large Language Model (LLM)** based on *Qwen3-4B-Thinking-2507* and fine-tuned on Indonesian regional administrative and financial datasets.

The model is intended to:
- Understand the structure of **APBD (regional budget) programs and activities**,
- Manage dictionaries of **accounts and funding sources**,
- Answer questions about **Indonesia's administrative regions**,
- Support **local-government AI** systems such as SAKIP-AI, APBD-AI, and MONEV-AI.

The model was developed as part of the **NUSSTEK-AI (Nusantara Smart Teknologi)** initiative to build an on-premise AI ecosystem for local governments.

---

## Model Details

- **Developed by:** Syamsuddin / NUSSTEK-AI
- **Model type:** Instruction-tuned LLM (Qwen3-4B)
- **Language(s):** Indonesian (`id`)
- **License:** Apache-2.0
- **Finetuned from:** [Qwen/Qwen3-4B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507)
- **Library:** Hugging Face Transformers

### Model Sources

- **Repository:** [Model on Hugging Face](https://huggingface.co/Syamsuddin/qwen3-4B-Indonesia-AdminAI)
- **Datasets:**
  - [Syamsuddin/kamus-program-kegiatan](https://huggingface.co/datasets/Syamsuddin/kamus-program-kegiatan)
  - [Syamsuddin/akun-sumber-pendanaan](https://huggingface.co/datasets/Syamsuddin/akun-sumber-pendanaan)
  - [Syamsuddin/qwen-wilayah-administratif-ID](https://huggingface.co/datasets/Syamsuddin/qwen-wilayah-administratif-ID)
  - [Syamsuddin/qwen-kamus-sumber-pendanaan-ID](https://huggingface.co/datasets/Syamsuddin/qwen-kamus-sumber-pendanaan-ID)

---

## Intended Uses

### Direct Use
- AI chatbot for local governments.
- Digital assistant for drafting **RKA, DPA, SAKIP documents, and financial reports**.
- Fast lookup of account codes and names, funding sources, programs, activities, and regions.

### Downstream Use
- **RAG (Retrieval-Augmented Generation)** over regional government documents.
- AI foundation for internal local-government (Pemda) applications (BudView-AI, APBD-AI, PBJ-AI, SAKIP-AI, MONEV-AI).
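
As a rough illustration of the RAG use case, the sketch below retrieves the passage that shares the most words with a question and places it ahead of the question in the prompt. The word-overlap retriever, the function names, the prompt layout, and the sample passages are all assumptions made for this example; a production system would more likely use an embedding index over official documents.

```python
def tokenize(text: str) -> set[str]:
    """Lowercase and split on whitespace; a stand-in for a real tokenizer."""
    return set(text.lower().split())


def retrieve(query: str, documents: list[str], top_k: int = 1) -> list[str]:
    """Rank documents by word overlap with the query (toy retriever)."""
    query_words = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query: str, context: list[str]) -> str:
    """Place the retrieved passages before the question."""
    joined = "\n".join(f"- {passage}" for passage in context)
    return f"Konteks:\n{joined}\n\nPertanyaan: {query}"


# Hypothetical passages, for illustration only.
docs = [
    "Program A mencakup kegiatan pelatihan aparatur.",
    "Sumber pendanaan kegiatan B adalah Pendapatan Asli Daerah.",
]
question = "Apa sumber pendanaan kegiatan B?"
prompt = build_prompt(question, retrieve(question, docs))
print(prompt)
```

The resulting `prompt` string can then be passed to the model in place of a bare question.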

### Out-of-Scope Use
- Not suitable for the medical domain, international law, or non-Indonesian languages.
- Must not be used as the basis for legal decisions without manual verification.

---

## Training Details

### Training Data
The model was trained on several domain-specific datasets:
- **APBD Programs & Activities** → structure, codes, names, and hierarchical relationships.
- **Accounts & Funding Sources** → funding codes and descriptions, including the group (kelompok), type (jenis), and object (obyek) levels.
- **Administrative Regions** → codes for provinces, regencies/cities (kabupaten/kota), districts (kecamatan), and villages (desa/kelurahan).

### Preprocessing
- The data was converted to an **instruction-tuning** format with three columns:
  - `instruction` → the task,
  - `input` → the context,
  - `output` → the answer.
- Question variations were generated (name→code, code→name, fuzzy search, normalization, full listings).
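
To make the three-column format concrete, here is a minimal sketch of how one dictionary entry might be expanded into name→code and code→name records. The instruction wording, the `make_records` helper, and the JSONL serialization are assumptions for illustration; the actual templates used for the datasets are not documented here, and `"X.X"` is a placeholder rather than a real code.

```python
import json


def make_records(code: str, name: str) -> list[dict]:
    """Build name→code and code→name training records for one entry."""
    return [
        {
            "instruction": "Sebutkan kode untuk nama berikut.",
            "input": name,
            "output": code,
        },
        {
            "instruction": "Sebutkan nama untuk kode berikut.",
            "input": code,
            "output": name,
        },
    ]


# "X.X" is a placeholder, not a real account or funding-source code.
records = make_records("X.X", "Pendapatan Asli Daerah")

# One JSON object per line, keeping non-ASCII characters readable.
jsonl = "\n".join(json.dumps(record, ensure_ascii=False) for record in records)
print(jsonl)
```

Generating both directions from the same entry keeps the two lookups consistent with each other.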

### Hyperparameters
- **Precision:** bf16 mixed precision
- **Batch size:** 4 (gradient accumulation steps = 8)
- **Learning rate:** 2e-5
- **Epochs:** 3
- **Max sequence length:** 2048
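
The batch size and gradient accumulation values combine into an effective batch size per optimizer step. The arithmetic below is a sketch that assumes the batch size of 4 is per device and that training ran on the single GPU listed under Compute Infrastructure; the variable names are illustrative.

```python
per_device_batch_size = 4
gradient_accumulation_steps = 8
num_gpus = 1  # assumption: the single A100 listed in this card
max_seq_length = 2048

# Sequences contributing to each optimizer update.
effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_gpus

# Upper bound on tokens per update, reached only if every sequence is full length.
max_tokens_per_step = effective_batch_size * max_seq_length

print(effective_batch_size)  # 32
print(max_tokens_per_step)   # 65536
```

Under these assumptions, each update averages gradients over 32 sequences while only 4 are held in GPU memory at a time.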

### Compute Infrastructure
- **Hardware:** 1x NVIDIA A100 40GB
- **Framework:** Hugging Face Transformers + PEFT (LoRA)
- **OS:** Ubuntu Server 22.04

---

## Evaluation

### Testing Data
- Sample questions drawn from the region, account, and program datasets.

### Metrics
- Manual evaluation: **code/name lookup accuracy** and **completeness of region listings**.
- Results:
  - Code/name lookup: >95% correct.
  - Region listings match data from the Ministry of Home Affairs (Kementerian Dalam Negeri).
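
The evaluation reported here was manual. If one wanted to automate the lookup-accuracy check, a normalized exact-match score is one possible approach, sketched below. The normalization rule, the function names, and the sample answers are assumptions and do not describe how the reported figure was obtained.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def lookup_accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions that match their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(
        normalize(pred) == normalize(ref)
        for pred, ref in zip(predictions, references)
    )
    return correct / len(references)


# Hypothetical model answers and gold labels, for illustration only.
preds = ["Pendapatan Asli Daerah", "kode  A", "Kode C"]
golds = ["pendapatan asli daerah", "Kode A", "Kode B"]
accuracy = lookup_accuracy(preds, golds)
print(f"{accuracy:.2%}")
```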

---

## Environmental Impact

- **Hardware Type:** GPU A100 40GB
- **Training Time:** approx. 4 hours
- **Carbon Emitted:** ~3 kgCO2eq (estimated with the MLCO2 Impact Calculator)

---

## Limitations & Risks

- The datasets cover only the 2023/2024 versions of the source data. Later regulatory changes may make answers outdated.
- The model may be **sensitive to spelling variations** that are not present in the datasets.
- Languages other than Indonesian are not supported.
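
One way to soften the spelling sensitivity is to map user input to the closest known name before querying the model. The sketch below uses `difflib.get_close_matches` from the Python standard library; the `closest_name` helper, the 0.8 cutoff, and the two-entry vocabulary are illustrative assumptions, and a real vocabulary would come from the datasets.

```python
import difflib
from typing import Optional

# Illustrative vocabulary; a real one would be loaded from the datasets.
known_names = ["Pendapatan Asli Daerah", "Dana Alokasi Umum"]


def closest_name(query: str, names: list[str], cutoff: float = 0.8) -> Optional[str]:
    """Return the closest known name, or None if nothing is similar enough."""
    lowered = {name.lower(): name for name in names}
    matches = difflib.get_close_matches(
        query.lower(), list(lowered), n=1, cutoff=cutoff
    )
    return lowered[matches[0]] if matches else None


print(closest_name("Pendapatan Asli Daerh", known_names))  # Pendapatan Asli Daerah
print(closest_name("xyz", known_names))                    # None
```

A stricter cutoff reduces false corrections at the cost of missing heavier misspellings, so the threshold is worth tuning on real queries.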

### Recommendations
Use this model **as an assistant**, not as a replacement for manual validation.
Check AI output against official documents (Permendagri, RPJMD, RKA, APBD).

---

## How to Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Syamsuddin/qwen3-4B-Indonesia-AdminAI"

# Load the tokenizer and the model weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Example question in Indonesian:
# "What is the funding-source code for 'Pendapatan Asli Daerah'?"
prompt = "Apa kode sumber pendanaan untuk 'Pendapatan Asli Daerah'?"

# Keep the input tensors on the same device as the model.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)

# The decoded text contains the prompt followed by the generated continuation.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
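
The base model, Qwen3-4B-Thinking-2507, emits its reasoning before a closing `</think>` tag and then gives the final answer. Assuming this fine-tune keeps that behavior and the tag survives decoding, the decoded text can be split into reasoning and answer as sketched below. The `split_thinking` helper and the sample string are hypothetical.

```python
def split_thinking(decoded: str, end_tag: str = "</think>") -> tuple[str, str]:
    """Split decoded text into (reasoning, answer) at the last end-of-thinking tag."""
    if end_tag not in decoded:
        # No reasoning section found: treat everything as the answer.
        return "", decoded.strip()
    reasoning, answer = decoded.rsplit(end_tag, 1)
    # Drop an opening tag if the output happens to include one.
    reasoning = reasoning.replace("<think>", "").strip()
    return reasoning, answer.strip()


# Hypothetical decoded output, for illustration only.
sample = "Pengguna menanyakan kode.\n</think>\n\nKode yang diminta adalah X.X."
reasoning, answer = split_thinking(sample)
print(answer)
```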

---

## Citation

```bibtex
@misc{syamsuddin2025qwenid,
  author       = {Syamsuddin},
  title        = {Qwen3-4B-Thinking Indonesia Admin AI},
  year         = {2025},
  howpublished = {HuggingFace Models},
  url          = {https://huggingface.co/Syamsuddin/qwen3-4B-Indonesia-AdminAI},
  note         = {Fine-tuned Qwen3-4B for the Indonesian government context (APBD, SAKIP, administrative regions)}
}
```

---

## Contact

- **Author:** Syamsuddin
- **Organization:** NUSSTEK-AI
- **Email:** office@nusstek.com, syamsuddin.ideris@gmail.com