---
library_name: transformers
tags:
- llama-3.2
- causal-lm
- code
- python
- peft
- qlora
---

# Model Card for llama32-1b-python-docstrings-qlora

A LoRA adapter, fine-tuned with QLoRA on top of `meta-llama/Llama-3.2-1B-Instruct`, for generating concise one-line Python docstrings from function bodies.

## Model Details

### Model Description

- **Developed by:** Abdullah Al-Housni
- **Model type:** Causal language model with LoRA/QLoRA adapters
- **Language(s):** Python code as input, English docstrings as output
- **License:** Same as `meta-llama/Llama-3.2-1B-Instruct` (Meta Llama 3.2 Community License)
- **Finetuned from model:** `meta-llama/Llama-3.2-1B-Instruct`

The model is trained to take a Python function definition and generate a concise, one-line docstring describing what the function does.

## Uses

### Direct Use

- Automatically generate one-line Python docstrings for functions.
- Improve or bootstrap documentation in Python codebases.
- Educational use for learning how to summarize code behavior.

Typical usage pattern:

- Input: Python function body (source code).
- Output: Single-sentence English description suitable as a docstring.

### Out-of-Scope Use

- Generating full, multi-paragraph API documentation.
- Security auditing or correctness guarantees for code.
- Use outside Python (e.g., other programming languages) without additional fine-tuning.
- Any safety-critical application where incorrect summaries could cause harm.

## Bias, Risks, and Limitations

- The model can produce **incorrect or incomplete summaries**, especially for complex or ambiguous functions.
- It may imitate noisy or low-quality patterns from the training data (e.g., overly short or cryptic docstrings).
- It does **not** understand project-specific context, invariants, or business logic; outputs should be reviewed by a human developer.

### Recommendations

- Use the model as an **assistive tool**, not an authoritative source.
- Always review and edit generated docstrings before committing to production code.
- For non-Python or highly domain-specific code, consider additional fine-tuning on in-domain examples.
## How to Get Started with the Model

Example with 🤗 Transformers and PEFT (LoRA adapter):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

base_model_id = "meta-llama/Llama-3.2-1B-Instruct"
adapter_id = "Abdul1102/llama32-1b-python-docstrings-qlora"

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(base_model_id, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)

def make_prompt(code: str) -> str:
    return f'Write a one-line Python docstring for this function:\n\n{code}\n\n"""'

code = "def add(a, b):\n    return a + b"
inputs = tokenizer(make_prompt(code), return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)

# Decode only the newly generated tokens so the prompt is not echoed back.
generated = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(generated, skip_special_tokens=True))
```

## Training Details

### Training Data

- Dataset: Python subset of CodeSearchNet (`Nan-Do/code-search-net-python`)
- Inputs: `code` column (full Python function body)
- Targets: First non-empty line of `docstring`
- A filtered subset of ~1,000–2,000 examples was used for efficient QLoRA fine-tuning (a data-preparation sketch is included in the appendix below)

### Training Procedure

- Objective: Causal language modeling (predict the docstring continuation)
- Method: QLoRA (4-bit quantized base model with LoRA adapters)
- Precision: 4-bit quantized weights, bf16 compute
- Epochs: 1
- Max sequence length: 256–512 tokens

#### Training Hyperparameters

- Learning rate: ~2e-4 (adapter weights only)
- Epochs: 1
- Optimizer: AdamW via Hugging Face `Trainer`
- LoRA rank: 16
- LoRA alpha: 32
- LoRA dropout: 0.05

A configuration sketch reflecting these settings is included in the appendix below.

---

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

Held-out test split from the same CodeSearchNet Python dataset, using the identical `code` → one-line docstring mapping.

#### Factors

- Function size and complexity
- Variety in docstring writing styles
- Presence of short or noisy docstrings

#### Metrics

- BLEU (sacreBLEU): strict n-gram overlap, sensitive to paraphrasing
- ROUGE (ROUGE-1 / ROUGE-2 / ROUGE-L): better suited to short summaries

A scoring sketch using these metrics is included in the appendix below.

### Results

Approximate performance on ~50 held-out samples:

- BLEU: ~12.4
- ROUGE-1: ~0.78
- ROUGE-2: ~0.74
- ROUGE-L: ~0.78

#### Summary

The model frequently reproduces or closely paraphrases the reference docstring. Occasional failures include echoing part of the prompt or returning an empty string. This is strong performance for a 1B model trained briefly on a small dataset.

---

## Model Examination

Not applicable.

---

## Environmental Impact

- Hardware Type: Google Colab GPU (T4/L4)
- Hours Used: ~0.5–1 hour total
- Cloud Provider: Google Colab
- Compute Region: US
- Carbon Emitted: Not estimated (very low due to minimal training time)

---

## Technical Specifications

### Model Architecture and Objective

- Base model: Llama 3.2 1B Instruct
- Architecture: Decoder-only transformer
- Objective: Causal language modeling
- Parameter-efficient fine-tuning using LoRA (rank 16)

### Compute Infrastructure

#### Hardware

Single Google Colab GPU (T4 or L4)

#### Software

- Python
- PyTorch
- Hugging Face Transformers
- PEFT
- bitsandbytes
- Datasets

---

## Citation

Not applicable.

---

## Glossary

Not applicable.

---

## More Information

See the Hugging Face model page for updates or usage examples.

---

## Model Card Authors

Abdullah Al-Housni

---

## Model Card Contact

Available through the Hugging Face model repository.
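
---

## Appendix: Reference Sketches

The snippets below are unofficial sketches that illustrate the setup described in this card. The exact training and evaluation scripts are not published here, so variable names, split names, filtering details, and any parameter not listed above are assumptions.

The *Training Data* section maps each example's `code` column to the first non-empty line of its `docstring`. A minimal data-preparation sketch with 🤗 Datasets (the subset size of 2,000 is an assumption within the ~1,000–2,000 range stated above):

```python
from datasets import load_dataset

# Python subset of CodeSearchNet, as named in the Training Data section.
# The "train" split name is an assumption.
ds = load_dataset("Nan-Do/code-search-net-python", split="train")

def add_target(example):
    # Target = first non-empty line of the original docstring.
    lines = (line.strip() for line in (example["docstring"] or "").splitlines())
    example["target"] = next((line for line in lines if line), "")
    return example

ds = ds.map(add_target)
ds = ds.filter(lambda ex: ex["target"] != "")     # drop examples without a usable docstring
small = ds.shuffle(seed=42).select(range(2_000))  # ~1,000-2,000 examples; 2,000 is assumed here
```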
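
The QLoRA setup under *Training Procedure* and *Training Hyperparameters* could be configured roughly as follows with `transformers`, `peft`, and `bitsandbytes`. The LoRA rank/alpha/dropout, learning rate, epoch count, and bf16 compute come from this card; the NF4 quantization type, target modules, and batch size are assumptions:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_model_id = "meta-llama/Llama-3.2-1B-Instruct"

# 4-bit quantized base weights with bf16 compute, as stated above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",          # assumption: quant type not specified in the card
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    base_model_id, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# LoRA hyperparameters from the card: rank 16, alpha 32, dropout 0.05.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
)
model = get_peft_model(model, lora_config)

training_args = TrainingArguments(
    output_dir="llama32-1b-python-docstrings-qlora",
    learning_rate=2e-4,                 # adapter weights only
    num_train_epochs=1,
    per_device_train_batch_size=4,      # assumption: batch size not stated in the card
    bf16=True,
    optim="adamw_torch",                # AdamW via the Hugging Face Trainer
)
```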
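
The BLEU and ROUGE numbers under *Evaluation* could be computed along these lines with the 🤗 `evaluate` library; the predictions and references below are placeholders for real model outputs and held-out reference docstrings:

```python
import evaluate

sacrebleu = evaluate.load("sacrebleu")
rouge = evaluate.load("rouge")

# Placeholders: in practice these come from model.generate() on the held-out split.
predictions = ["Return the sum of a and b."]
references = ["Return the sum of two numbers."]

bleu = sacrebleu.compute(predictions=predictions, references=[[r] for r in references])
rouge_scores = rouge.compute(predictions=predictions, references=references)

print(f"BLEU: {bleu['score']:.1f}")              # sacreBLEU reports a 0-100 scale
print(f"ROUGE-1: {rouge_scores['rouge1']:.2f}")
print(f"ROUGE-2: {rouge_scores['rouge2']:.2f}")
print(f"ROUGE-L: {rouge_scores['rougeL']:.2f}")
```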