---
library_name: transformers
tags: [quantization, qwen3, qlora, causal-lm, low-rank-adapters, 4bit, bitsandbytes, peft, efficient-finetuning]
---

# Qwen3-0.6B Quantized with QLoRA for Reasoning Tasks

This is a 4-bit quantized version of `Qwen/Qwen3-0.6B-Base`, fine-tuned with LoRA adapters on several MCQA-style reasoning datasets. Training used QLoRA, a parameter-efficient fine-tuning method that keeps the frozen base weights in 4-bit precision, giving a small memory footprint with minimal loss in accuracy.

## Model Details

### Model Description

This model is:
- A quantized version of `Qwen/Qwen3-0.6B-Base` using `bitsandbytes` 4-bit NormalFloat (NF4)
- Fine-tuned using Low-Rank Adaptation (LoRA) with rank 8
- Adapted to multiple-choice reasoning datasets such as AQuA-RAT and TheoremQA
- Fully compatible with Hugging Face Transformers

- **Developed by:** Ahmed Abdelmalek (EPFL CS-552 Project)
- **Model type:** Causal Language Model
- **Language(s):** English
- **License:** Apache 2.0
- **Fine-tuned from model:** `Qwen/Qwen3-0.6B-Base`

### Model Sources

- [Base model repository](https://huggingface.co/Qwen/Qwen3-0.6B-Base)

## Uses

### Direct Use

You can use this model directly for MCQA-style question answering via text generation (see the example under "How to Get Started with the Model").

### Out-of-Scope Use

- Not intended for open-ended generation or safety-critical applications
- Not intended for real-time or commercial deployment without evaluation

## Bias, Risks, and Limitations

- Inherits biases from its base model and training data (e.g., the reasoning datasets)
- May fail on adversarial or out-of-distribution logic tasks

### Recommendations

Evaluate the model against your specific reasoning task before production use.

## How to Get Started with the Model

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "your-username/MNLP_M2_quantized_model"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", trust_remote_code=True)

# MCQA-style prompt: question, options, then "Answer:" for the model to complete
prompt = """Question: What is 3 + 5?
Options:
A) 6
B) 8
C) 9
D) 10
Answer:"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Training Details

### Training Data

- Processed versions of AQuA-RAT, TheoremQA, and custom MCQA datasets
- Unified into a single format with rationale-enhanced prompts (a formatting sketch is shown below)

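The exact preprocessing code is not included here; the following is a minimal sketch of how an example might be rendered into a rationale-enhanced MCQA prompt. The field names (`question`, `options`, `rationale`, `correct`) and the template itself are illustrative assumptions.

```python
def to_mcqa_prompt(example: dict) -> dict:
    """Render one example into a rationale-enhanced MCQA training prompt.

    The field names below are assumptions about the processed datasets,
    used only for illustration.
    """
    options = "\n".join(
        f"{letter}) {text}" for letter, text in zip("ABCDE", example["options"])
    )
    text = (
        f"Question: {example['question']}\n"
        f"Options:\n{options}\n"
        f"Rationale: {example['rationale']}\n"
        f"Answer: {example['correct']}"
    )
    return {"text": text}

# Example record in the assumed unified schema:
sample = {
    "question": "What is 3 + 5?",
    "options": ["6", "8", "9", "10"],
    "rationale": "3 + 5 = 8.",
    "correct": "B",
}
print(to_mcqa_prompt(sample)["text"])
```
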
### Training Procedure

- **Precision:** fp16
- **Quantization:** 4-bit NF4 with double quantization and float16 compute
- **Adapter type:** LoRA (r=8, α=16, dropout=0.05)
- **Base model:** frozen; only the LoRA adapters are trained

A configuration sketch corresponding to these settings is shown below.

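A minimal sketch of the quantization and adapter setup, assuming standard `bitsandbytes` and `peft` usage; the `target_modules` list is an assumption rather than the exact setting used.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization with double quantization and float16 compute,
# matching the settings listed above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-0.6B-Base",
    quantization_config=bnb_config,
    device_map="auto",
)
base_model = prepare_model_for_kbit_training(base_model)

# LoRA adapters (r=8, alpha=16, dropout=0.05); target_modules is an assumption.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
```
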
#### Training Hyperparameters

- **Epochs:** 3
- **Batch size:** 4
- **Gradient accumulation steps:** 2
- **Optimizer:** paged_adamw_8bit

A possible `TrainingArguments` setup for these values is sketched below.

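This sketch only shows how the hyperparameters above map onto `transformers.TrainingArguments`; `output_dir`, `train_dataset`, and `data_collator` are placeholders, and `model` refers to the PEFT-wrapped model from the previous sketch.

```python
from transformers import Trainer, TrainingArguments

# Hyperparameters from the list above; output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="qwen3-0.6b-qlora-mcqa",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=2,
    optim="paged_adamw_8bit",
    fp16=True,
    logging_steps=50,
)

trainer = Trainer(
    model=model,                  # PEFT-wrapped 4-bit model from the previous sketch
    args=training_args,
    train_dataset=train_dataset,  # tokenized MCQA prompts (placeholder)
    data_collator=data_collator,  # e.g. DataCollatorForLanguageModeling(tokenizer, mlm=False)
)
trainer.train()
```
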
## Evaluation

### Testing Data

Validation set of 1,000 samples held out from the unified dataset.

### Metrics

- Accuracy / F1 (to be reported in the evaluation phase); a scoring sketch for MCQA accuracy is shown below

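A minimal sketch of how MCQA accuracy could be computed from model generations; the answer-extraction regex and the helper names are assumptions, not the project's actual evaluation code.

```python
import re

def extract_choice(generated_text: str) -> str | None:
    """Return the first answer letter (A-E) that follows 'Answer:' in the generation."""
    match = re.search(r"Answer:\s*([A-E])", generated_text)
    return match.group(1) if match else None

def mcqa_accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of examples where the extracted letter matches the gold letter."""
    correct = sum(
        extract_choice(pred) == ref for pred, ref in zip(predictions, references)
    )
    return correct / len(references)

# Example: two model generations scored against gold answers "B" and "C".
preds = ["Question: ...\nAnswer: B", "Question: ...\nAnswer: A"]
golds = ["B", "C"]
print(mcqa_accuracy(preds, golds))  # 0.5
```
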
## Environmental Impact

- **Hardware:** Google Colab Pro, NVIDIA A100 GPU
- **Hours used:** ~6–7 hours
- **Carbon emitted:** estimated with [MLCO2](https://mlco2.github.io/impact#compute)

## Technical Specifications

### Architecture

- Qwen3-0.6B base
- 28-layer transformer with rotary positional embeddings and 16 attention heads

### Compute Infrastructure

- **Hardware:** Colab A100 GPU, high-RAM runtime
- **Software:** Python 3.10, PyTorch 2.2.2, Transformers 4.51.3

## Contact

- **Author:** Ahmed Abdelmalek
- **Email:** ahmed.abdelmalek@epfl.ch