---
license: "apache-2.0"
base_model: "openai/gpt-oss-20b"
tags: ["lora", "unsloth", "peft", "gpt-oss", "fine-tuning"]
language: ["en"]
datasets: ["yahma/alpaca-cleaned"]
library_name: peft
pipeline_tag: text-generation
---
# LoRA Adapter for openai/gpt-oss-20b

This repository hosts a **LoRA adapter** (and tokenizer files) trained on top of **openai/gpt-oss-20b**.

## ✨ What’s inside
- **PEFT type**: LORA
- **LoRA r**: 16
- **LoRA alpha**: 16
- **LoRA dropout**: 0.0
- **Target modules**: q_proj, v_proj, k_proj, up_proj, gate_proj, o_proj, down_proj
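
For reference, the hyperparameters above correspond roughly to the following `peft` `LoraConfig`. This is a sketch for readers, not the exact training script; `bias` and `task_type` are assumed defaults.

```python
from peft import LoraConfig

# Sketch of a LoraConfig matching the adapter settings listed above
# (bias/task_type are assumptions, not taken from the training run).
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    bias="none",
    task_type="CAUSAL_LM",
)
```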

## 📚 Datasets
- yahma/alpaca-cleaned
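
For context, a small slice of the dataset can be loaded with 🤗 Datasets as sketched below; the 2% fraction is inferred from the adapter name ("2pct") and may not match the exact training recipe.

```python
from datasets import load_dataset

# Load roughly the first 2% of the training split (fraction inferred from the repo name).
ds = load_dataset("yahma/alpaca-cleaned", split="train[:2%]")
print(len(ds), ds.column_names)
```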

## 🌐 Languages
- en

## 📝 Usage

### (A) Use the adapter with the **official base model**
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "openai/gpt-oss-20b"
adapter_id = "hwang2006/gpt-oss-20b-alpaca-2pct-lora"

tok = AutoTokenizer.from_pretrained(base)
# Load the base model (bf16 on GPU, fp32 on CPU), then attach the LoRA adapter.
base_model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32,
    device_map="auto",
)

model = PeftModel.from_pretrained(base_model, adapter_id)

messages = [
    {"role":"system","content":"You are a helpful assistant."},
    {"role":"user","content":"Quick test?"},
]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(prompt, return_tensors="pt").to(model.device)

with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9)

print(tok.decode(out[0], skip_special_tokens=True))
```

### (B) Load in 4-bit on the fly (if VRAM is tight)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

# Quantize the base model to 4-bit at load time with bitsandbytes to cut VRAM usage.
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
base = "openai/gpt-oss-20b"
adapter_id = "hwang2006/gpt-oss-20b-alpaca-2pct-lora"

tok = AutoTokenizer.from_pretrained(base)
base_model = AutoModelForCausalLM.from_pretrained(base, quantization_config=bnb, device_map="auto")
model = PeftModel.from_pretrained(base_model, adapter_id)
```
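
After loading, generation works exactly as in example (A).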

## ⚠️ Notes
- Load this adapter only on a **compatible base** (same architecture and tokenizer as `openai/gpt-oss-20b`).
- This repo contains **only** the LoRA adapter and tokenizer files, not full model weights (see the merge sketch below if you need standalone weights).
- The license above applies to this adapter repository; make sure the **base model's license** also fits your use.
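
If you need standalone full weights, you can merge the adapter into the base model loaded as in example (A). A minimal sketch (the output directory name is illustrative, the merge needs enough memory for the full-precision model, and merging into the 4-bit load from (B) is not recommended):

```python
# Continuing from example (A): fold the LoRA deltas into the base weights
# and save a standalone model plus the tokenizer. Output path is illustrative.
merged = model.merge_and_unload()
merged.save_pretrained("gpt-oss-20b-alpaca-merged")
tok.save_pretrained("gpt-oss-20b-alpaca-merged")
```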