Llama-3.2-1B-Instruct-bnb-4bit-lima - LoRA Adapters
LoRA adapters for unsloth/Llama-3.2-1B-Instruct-bnb-4bit, trained with supervised fine-tuning (SFT) on the GAIR/lima dataset.
Model Details
- Base Model: unsloth/Llama-3.2-1B-Instruct-bnb-4bit
- Training Method: LoRA (Low-Rank Adaptation)
- Dataset: GAIR/lima
- Training Framework: Unsloth + TRL + Transformers
- Adapter Type: PEFT LoRA adapters only (requires base model)
Prompt Format
This model uses the Llama 3.2 chat template.
Use the tokenizer's apply_chat_template() method:
messages = [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Your question here"}
]
inputs = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
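To inspect the rendered prompt rather than token IDs, the same method can return plain text; add_generation_prompt=True appends the assistant header so the model starts its reply:

# Render the chat template to a string for inspection
prompt_text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt_text)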
Training Configuration
LoRA Parameters
- LoRA Rank (r): 64
- LoRA Alpha: 128
- LoRA Dropout: 0.0
- Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Training Hyperparameters
- Learning Rate: 0.0003
- Batch Size: 4
- Gradient Accumulation Steps: 2
- Effective Batch Size: 8
- Epochs: 3.0
- Max Sequence Length: 2048
- Optimizer: adamw_8bit
- Packing: False
- Weight Decay: 0.01
- Learning Rate Scheduler: linear
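The parameters above correspond to a standard Unsloth + TRL SFT run. Below is a minimal sketch of how they map onto code; it is illustrative rather than the exact training script, and it assumes LIMA's "conversations" column of alternating user/assistant turns and recent Unsloth/TRL argument names (processing_class, SFTConfig).

from unsloth import FastLanguageModel
from trl import SFTConfig, SFTTrainer
from datasets import load_dataset

# Load the 4-bit base model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-1B-Instruct-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters with the parameters listed above
model = FastLanguageModel.get_peft_model(
    model,
    r=64,
    lora_alpha=128,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Format LIMA conversations with the Llama 3.2 chat template
# (assumes a "conversations" column of alternating user/assistant turns;
# accessing GAIR/lima may require accepting the dataset's terms on the Hub)
def to_text(example):
    roles = ["user", "assistant"]
    msgs = [{"role": roles[i % 2], "content": turn}
            for i, turn in enumerate(example["conversations"])]
    return {"text": tokenizer.apply_chat_template(msgs, tokenize=False)}

dataset = load_dataset("GAIR/lima", split="train").map(to_text)

# Hyperparameters as listed above (effective batch size 4 x 2 = 8)
trainer = SFTTrainer(
    model=model,
    processing_class=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(
        per_device_train_batch_size=4,
        gradient_accumulation_steps=2,
        num_train_epochs=3,
        learning_rate=3e-4,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        packing=False,
        output_dir="outputs",
    ),
)
trainer.train()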
Training Results
- Training Loss: 1.1123
- Training Steps: 480
- Dataset Samples: 1,278
- Training Scope: 1,278 samples (3 epochs, full dataset)
Benchmark Results
Evaluated: 2025-11-24 03:10
Comparison: fine-tuned vs. base model
Evaluation backend: HuggingFace Transformers (16-bit merged model)
IFEval (Instruction Following)
| Model | Strict Prompt | Strict Inst | Loose Prompt | Loose Inst |
|---|---|---|---|---|
| Base | 0.4399 | 0.5731 | 0.4787 | 0.6067 |
| Fine-tuned | 0.3050 | 0.4376 | 0.3327 | 0.4700 |
| Δ | -0.1349 | -0.1355 | -0.1460 | -0.1367 |
Summary
| Benchmark | What It Tests | Base | Fine-tuned | Change |
|---|---|---|---|---|
| IFEval | Tests ability to follow specific instructions | 43.99% | 30.50% | -13.49% (-30.7% relative) |
| GSM8K | Tests math reasoning and chain-of-thought | - | - | - |
| HellaSwag | Tests real-world knowledge and common sense | - | - | - |
| MMLU | Tests broad knowledge retention (detects catastrophic forgetting) | - | - | - |
| TruthfulQA | Tests tendency to generate truthful answers | - | - | - |
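The card does not state which evaluation harness produced these numbers. The sketch below shows one way to reproduce an IFEval run with EleutherAI's lm-evaluation-harness (lm_eval), assuming the merged 16-bit model has been saved locally as "merged_model"; metric key names can vary between harness versions.

import lm_eval

# Evaluate the merged 16-bit model on IFEval via the Transformers backend
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=merged_model,dtype=bfloat16",
    tasks=["ifeval"],
    batch_size=8,
)
# Strict/loose prompt- and instruction-level accuracies live under results["results"]["ifeval"]
print(results["results"]["ifeval"])

Running the same call with pretrained= pointed at the base checkpoint gives the base-model row for comparison.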
Usage
Load with Transformers + PEFT
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
"unsloth/Llama-3.2-1B-Instruct-bnb-4bit",
load_in_4bit=True,
device_map="auto"
)
# Load LoRA adapters
model = PeftModel.from_pretrained(base_model, "path/to/lora")
tokenizer = AutoTokenizer.from_pretrained("unsloth/Llama-3.2-1B-Instruct-bnb-4bit")
# Generate
messages = [{"role": "user", "content": "Your question here"}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to("cuda")
outputs = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Load with Unsloth (Recommended)
from unsloth import FastLanguageModel
model, tokenizer = FastLanguageModel.from_pretrained(
model_name="path/to/lora",
max_seq_length=2048,
dtype=None,
load_in_4bit=True,
)
# For inference
FastLanguageModel.for_inference(model)
# Generate
messages = [{"role": "user", "content": "Your question here"}]
inputs = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to("cuda")
outputs = model.generate(input_ids=inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Related Models
- Merged Model: fs90/Llama-3.2-1B-Instruct-bnb-4bit-lima - Ready-to-use merged model
- GGUF Quantized: fs90/Llama-3.2-1B-Instruct-bnb-4bit-lima-GGUF - GGUF format for llama.cpp/Ollama
Dataset
Training dataset: GAIR/lima
Please refer to the dataset documentation for licensing and usage restrictions.
Merge with Base Model
To create a standalone merged model:
from unsloth import FastLanguageModel
# Load model with LoRA
model, tokenizer = FastLanguageModel.from_pretrained(
model_name="path/to/lora",
max_seq_length=2048,
dtype=None,
load_in_4bit=True,
)
# Save merged 16-bit model
model.save_pretrained_merged("merged_model", tokenizer, save_method="merged_16bit")
# Or save as GGUF for llama.cpp/Ollama
model.save_pretrained_gguf("model.gguf", tokenizer, quantization_method="q4_k_m")
Framework Versions
- Unsloth: 2025.11.3
- Transformers: 4.57.1
- PyTorch: 2.9.0+cu128
- PEFT: 0.18.0
- TRL: 0.22.2
- Datasets: 4.3.0
License
This model is based on unsloth/Llama-3.2-1B-Instruct-bnb-4bit and trained on GAIR/lima. Please refer to the original model and dataset licenses for usage terms.
Credits
Trained by: Farhan Syah
Training pipeline:
- unsloth-finetuning by @farhan-syah
- Unsloth - 2x faster LLM fine-tuning
Base components:
- Base model: unsloth/Llama-3.2-1B-Instruct-bnb-4bit
- Training dataset: GAIR/lima by GAIR
Citation
If you use this model, please cite:
@misc{llama_3.2_1b_instruct_bnb_4bit_lima_lora,
  author       = {Farhan Syah},
  title        = {Llama-3.2-1B-Instruct-bnb-4bit-lima Fine-tuned with LoRA},
  year         = {2025},
  note         = {Fine-tuned using Unsloth: https://github.com/unslothai/unsloth},
  howpublished = {\url{https://github.com/farhan-syah/unsloth-finetuning}}
}