---
base_model: Qwen/Qwen1.5-1.8B
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:Qwen/Qwen1.5-1.8B
- lora
- transformers
- code-generation
- python
- reasoning
- synthetic-data
language:
- en
license: apache-2.0
---
# Qwen 1.5 1.8B - Python Code Generation with Step-by-Step Reasoning
A LoRA fine-tune of Qwen 1.5 1.8B that generates Python code preceded by step-by-step reasoning. The model explains its approach to a programming problem before writing the code.
## Model Details
### Model Description
This model is fine-tuned with QLoRA on a synthetic dataset of 1,000 Python programming problems, each enriched with step-by-step reasoning. It learns to explain its problem-solving approach before generating code, which suits it to educational use and to settings where the reasoning behind generated code should be visible.
- **Developed by:** Rachit Verma
- **Model type:** Causal Language Model (Fine-tuned with LoRA adapters)
- **Language(s):** English (code generation in Python)
- **License:** Apache 2.0
- **Finetuned from model:** Qwen/Qwen1.5-1.8B
### Model Sources
- **Base Model:** [Qwen/Qwen1.5-1.8B](https://huggingface.co/Qwen/Qwen1.5-1.8B)
- **Training Data:** Synthetic dataset generated from MBPP and CodeAlpaca using Llama 3.1 8B
## Uses
### Direct Use
This model is designed for:
- **Educational code generation**: Teaching programming concepts through explained solutions
- **Transparent AI coding assistants**: Understanding how the model approaches problems
- **Code explanation**: Generating step-by-step breakdowns of problem-solving strategies
- **Learning tool**: Helping beginners understand algorithmic thinking
### Example Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
# Load base model and tokenizer
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen1.5-1.8B",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-1.8B")
# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "[YOUR_MODEL_PATH]")
# Generate code with reasoning
prompt = "Write a Python function to find the longest common prefix in a list of strings."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### Out-of-Scope Use
- **Production-critical systems**: This model is fine-tuned on a limited dataset and should not be used for safety-critical applications
- **Non-Python languages**: The model is specifically trained on Python problems
- **Complex software architecture**: Best suited for algorithm-level problems, not large-scale system design
- **Security-sensitive code**: Should not be used for generating cryptographic or security-critical code without expert review
## Bias, Risks, and Limitations
### Limitations
1. **Dataset size**: The model was trained on only 1,000 examples and may not generalize to all problem types.
2. **Teacher model quality**: The synthetic data was generated by Llama 3.1 8B and may contain errors.
3. **Small test set**: The model was evaluated on only 7 problems, so its true generalization is unknown.
4. **Potential overfitting**: High accuracy on the test set may reflect memorization rather than learning.
5. **No code validation**: The training data was not checked for correctness before fine-tuning.
### Recommendations
- Always review and test generated code before using in production
- Use as a learning tool rather than a replacement for human expertise
- Validate outputs against test cases and edge cases
- Consider the model's explanations as one perspective, not absolute truth
## Training Details
### Training Data
- **Source datasets**: MBPP (Mostly Basic Programming Problems) and CodeAlpaca
- **Dataset size**: 1,000 Python programming problems
- **Data generation**: Synthetic step-by-step reasoning generated using Llama 3.1 8B Instant via Groq API
- **Data structure**: Each example contains:
  - Original programming problem
  - Step-by-step reasoning (problem understanding, algorithm design, implementation strategy)
  - Python solution
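The three-part example structure above could be represented as a simple record. This is a minimal sketch: the class name, the field names, and the `to_training_text` template are hypothetical, since this card does not specify the prompt format that was actually used.

```python
from dataclasses import dataclass


@dataclass
class ReasoningExample:
    """One training record; field names are illustrative, not taken from the dataset."""

    problem: str    # original programming problem
    reasoning: str  # step-by-step reasoning written by the teacher model
    solution: str   # Python solution

    def to_training_text(self) -> str:
        # Hypothetical template: problem first, then reasoning, then code.
        return (
            f"### Problem\n{self.problem}\n\n"
            f"### Reasoning\n{self.reasoning}\n\n"
            f"### Solution\n{self.solution}\n"
        )


example = ReasoningExample(
    problem="Write a function that returns the square of a number.",
    reasoning="1. The input is a single number.\n2. Multiply it by itself.",
    solution="def square(x):\n    return x * x",
)
print(example.to_training_text())
```

Placing the reasoning before the solution in the training text matches the behavior described in this card, where the model explains its approach before writing code.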
### Training Procedure
#### Fine-tuning Method
- **Technique**: QLoRA (Quantized Low-Rank Adaptation)
- **Quantization**: 4-bit quantization for memory efficiency
- **LoRA Configuration**:
  - Rank (r): 8
  - Alpha: 16
  - Target modules: q_proj, k_proj, v_proj, o_proj (attention layers)
  - Dropout: 0.05
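The settings above map onto the `peft` and `transformers` configuration objects roughly as follows. This is a sketch, not the author's training script: the rank, alpha, dropout, target modules, and 4-bit loading come from this card, while the quantization type, the compute dtype, and the `build_peft_model` helper are assumptions added for illustration.

```python
# Values listed in this card.
LORA_KWARGS = {
    "r": 8,
    "lora_alpha": 16,
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
    "task_type": "CAUSAL_LM",
}

# load_in_4bit follows the card; the other two settings are assumptions
# (common QLoRA choices), not confirmed by the author.
QUANT_KWARGS = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",         # assumption
    "bnb_4bit_compute_dtype": "float16",  # assumption
}


def build_peft_model(base_id: str = "Qwen/Qwen1.5-1.8B"):
    """Load the 4-bit base model and attach LoRA adapters.

    Hypothetical helper: downloads weights and needs a CUDA GPU with bitsandbytes.
    """
    import torch
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    quant = dict(QUANT_KWARGS)
    # Convert the dtype name into the torch dtype object.
    quant["bnb_4bit_compute_dtype"] = getattr(torch, quant["bnb_4bit_compute_dtype"])
    model = AutoModelForCausalLM.from_pretrained(
        base_id,
        quantization_config=BitsAndBytesConfig(**quant),
        device_map="auto",
    )
    model = prepare_model_for_kbit_training(model)
    return get_peft_model(model, LoraConfig(**LORA_KWARGS))
```

With alpha 16 and rank 8, the adapter update is scaled by `lora_alpha / r = 2`.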
#### Training Hyperparameters
- **Training epochs**: 3
- **Learning rate**: 2e-4
- **Optimizer**: paged_adamw_8bit
- **Batch size**: [Specify if known]
- **Training regime**: 4-bit quantized base weights (QLoRA) with LoRA adapters trained in mixed precision
- **Hardware**: Google Colab T4 GPU (free tier)
- **Framework**: PEFT 0.17.1, Transformers, bitsandbytes
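The hyperparameters above could be passed to `transformers.TrainingArguments` along these lines. This is a sketch under stated assumptions: the epochs, learning rate, and optimizer come from this card, while the output directory, `fp16`, and the logging interval are placeholders. The batch size is not stated here, so it is left at the library default rather than guessed.

```python
# Values stated in this card.
TRAINING_KWARGS = {
    "num_train_epochs": 3,
    "learning_rate": 2e-4,
    "optim": "paged_adamw_8bit",
}

# Not stated in this card; placeholders for illustration only.
ASSUMED_KWARGS = {
    "output_dir": "qwen15-code-reasoning-lora",  # hypothetical path
    "fp16": True,                                # assumption: usual choice on a T4
    "logging_steps": 10,                         # assumption
}


def build_training_arguments():
    """Hypothetical helper; constructing the object with fp16=True needs a GPU runtime."""
    from transformers import TrainingArguments

    return TrainingArguments(**TRAINING_KWARGS, **ASSUMED_KWARGS)
```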
#### Training Time
- Approximately [X hours] on Google Colab T4 GPU
## Evaluation
### Testing Data & Metrics
#### Testing Data
- **Test set size**: 7 diverse Python programming problems
- **Problem types**: Mix of algorithmic challenges from the training distribution
#### Metrics
- **Primary metric**: Pass@1 (functional correctness: the share of problems for which a single generated solution runs and produces correct results)
- **Secondary metric**: Reasoning structure presence (does output include step-by-step explanation?)
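A Pass@1 score of this kind can be computed with a small harness. The sketch below is one way to do it and is not the evaluation script used for this card; the `passes` and `pass_at_1` helpers and the two sample problems are hypothetical.

```python
def passes(code: str, tests: list[str]) -> bool:
    """Return True if the code runs and every assert-style test passes."""
    namespace: dict = {}
    try:
        exec(code, namespace)      # define the candidate function(s)
        for test in tests:
            exec(test, namespace)  # each test is a single assert statement
    except Exception:
        return False
    return True


def pass_at_1(samples: list[tuple[str, list[str]]]) -> float:
    """Fraction of problems whose single generated sample passes all its tests."""
    return sum(passes(code, tests) for code, tests in samples) / len(samples)


samples = [
    ("def add(a, b):\n    return a + b", ["assert add(2, 3) == 5"]),
    ("def double(x):\n    return x + 1", ["assert double(4) == 8"]),  # buggy on purpose
]
print(pass_at_1(samples))  # 0.5
```

Because `exec` runs generated code in the current process, a real harness would isolate it in a sandbox or subprocess with a timeout.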
### Results
| Metric | Base Model (Qwen 1.5 1.8B) | Fine-tuned Model |
|--------|---------------------------|------------------|
| Pass@1 | 75% | 100% |
| Reasoning Structure | Inconsistent | 100% |
**Key Findings**:
- **+25 percentage point improvement** in functional correctness
- **100% of outputs** now include structured step-by-step reasoning
- All 7 test cases passed successfully
**Important Note**: Results are based on a small test set (7 examples). Larger-scale evaluation needed to confirm generalization.
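The uncertainty that comes with a 7-problem test set can be quantified with a binomial confidence interval. The following sketch uses the Wilson score interval; the choice of interval and the `wilson_interval` helper are illustrative and not part of the original evaluation.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 for about 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


low, high = wilson_interval(7, 7)
print(f"{low:.2f} to {high:.2f}")  # 0.65 to 1.00
```

For 7 passes out of 7, the interval runs from about 0.65 to 1.00, so a perfect score on this test set is still compatible with a true pass rate well below 100%.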
## Environmental Impact
- **Hardware Type**: NVIDIA T4 GPU (Google Colab)
- **Hours used**: ~[X hours for fine-tuning]
- **Cloud Provider**: Google Cloud Platform
- **Compute Region**: [Specify if known]
- **Carbon Emitted**: Not measured; expected to be small because training used QLoRA on a single T4 GPU
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute).
## Technical Specifications
### Model Architecture
- **Base architecture**: Qwen 1.5 1.8B (Transformer decoder)
- **Fine-tuning method**: LoRA adapters on attention layers
- **Total parameters**: 1.8B (base) + ~4.7M (LoRA adapters)
- **Trainable parameters**: ~4.7M (0.26% of total)
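The adapter size can be cross-checked from the LoRA configuration: each adapted linear layer adds `rank * (d_in + d_out)` parameters. The sketch below applies that formula; the hidden size and layer count are assumptions about the base model and should be checked against its `config.json`.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA pair: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


# Assumed shapes, to be verified against the base model's config.json.
HIDDEN_SIZE = 2048  # assumption
NUM_LAYERS = 24     # assumption
RANK = 8            # from this card
TARGETS = ["q_proj", "k_proj", "v_proj", "o_proj"]  # from this card

per_layer = sum(lora_param_count(HIDDEN_SIZE, HIDDEN_SIZE, RANK) for _ in TARGETS)
total = per_layer * NUM_LAYERS
print(f"{total:,} LoRA parameters")
```

Under these assumed shapes the formula gives about 3.1M parameters, which differs from the ~4.7M reported above, so either the assumed shapes or the adapter settings differ from what was trained. Calling `print_trainable_parameters()` on the loaded PEFT model reports the actual figure.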
### Compute Infrastructure
#### Hardware
- GPU: NVIDIA T4 (16GB VRAM)
- Platform: Google Colab (free tier)
#### Software
- PEFT 0.17.1
- Transformers
- bitsandbytes (for 4-bit quantization)
- PyTorch
- Groq API (for synthetic data generation)
## Project Insights
### What Worked Well
- Cross-model knowledge distillation (8B teacher → 1.8B student)
- QLoRA enabled fine-tuning on free-tier GPU
- Structured prompts for synthetic data generation
- Teaching reasoning process alongside code generation
### Future Improvements
1. **Better teacher model**: Use Llama 3.1 70B for higher-quality synthetic data
2. **Data validation**: Verify all generated code executes correctly before training
3. **Larger dataset**: Scale to 5,000-10,000 examples
4. **Robust evaluation**: Test on 50-100 problems from benchmarks like HumanEval
5. **Higher LoRA rank**: Experiment with rank 16 or 32 for more capacity
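The data validation step proposed above could start with a filter that keeps only solutions that parse and run without error. This is a minimal sketch of that idea, not an existing part of the pipeline; the `solution_runs` helper and the record layout are hypothetical.

```python
import ast
import subprocess
import sys


def solution_runs(code: str, timeout: float = 5.0) -> bool:
    """True if the code parses and exits cleanly in a fresh interpreter."""
    try:
        ast.parse(code)  # cheap syntax check before spawning a process
    except SyntaxError:
        return False
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


records = [
    {"solution": "def square(x):\n    return x * x"},
    {"solution": "def broken(:\n    pass"},      # syntax error
    {"solution": "raise RuntimeError('boom')"},  # fails when executed
]
clean = [r for r in records if solution_runs(r["solution"])]
print(len(clean))  # 1
```

This only confirms that a solution executes; checking that it is correct would also require test cases for each problem.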
## Citation
If you use this model, please cite:
```bibtex
@misc{qwen15-code-reasoning,
author = {Rachit Verma},
title = {Qwen 1.5 1.8B Fine-tuned for Python Code Generation with Reasoning},
year = {2025},
publisher = {Hugging Face},
}
```
## Model Card Authors
Rachit Verma