salakash committed on
Commit 6edcce8 · verified · 1 parent: 3e7dab3

Upload folder using huggingface_hub

0000100_adapters.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b8968d14e7792a2feebf6e0a346db20bb8b1f7e0bff0d7ce180f128ca7f43fe5
+ size 11754630
LICENSE-THIRD-PARTY.md ADDED
@@ -0,0 +1,116 @@
+ # Third-Party Licenses and Attribution
+
+ This project uses and builds upon the following third-party components:
+
+ ## Base Model
+
+ **Qwen/Qwen2.5-Coder-0.5B-Instruct**
+ - Source: https://huggingface.co/Qwen/Qwen2.5-Coder-0.5B-Instruct
+ - License: Apache License 2.0
+ - Copyright: Qwen Team, Alibaba Cloud
+ - Description: Base language model for code generation
+
+ ### Apache License 2.0 Summary
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+
+ ## MLX Model Weights
+
+ **mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit**
+ - Source: https://huggingface.co/mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit
+ - License: Apache License 2.0 (inherited from base model)
+ - Description: MLX-optimized 4-bit quantized version of Qwen2.5-Coder-0.5B-Instruct
+ - Conversion: Community contribution for Apple Silicon optimization
+
+ ## Training Dataset
+
+ **flwrlabs/code-alpaca-20k**
+ - Source: https://huggingface.co/datasets/flwrlabs/code-alpaca-20k
+ - License: Apache License 2.0
+ - Description: Code instruction dataset based on Stanford Alpaca methodology
+ - Size: 20,000 code instruction-following examples
+
+ ## Python Dependencies
+
+ ### MLX-LM
+ - License: MIT License
+ - Description: MLX language model utilities
+ - Source: https://github.com/ml-explore/mlx-lm
+
+ ### Hugging Face Datasets
+ - License: Apache License 2.0
+ - Description: Dataset loading and processing library
+ - Source: https://github.com/huggingface/datasets
+
+ ### Hugging Face Hub
+ - License: Apache License 2.0
+ - Description: Hugging Face Hub client library
+ - Source: https://github.com/huggingface/huggingface_hub
+
+ ### PyYAML
+ - License: MIT License
+ - Description: YAML parser and emitter
+ - Source: https://github.com/yaml/pyyaml
+
+ ## Disclaimers
+
+ ### No Endorsement
+ This project is not endorsed by, affiliated with, or sponsored by:
+ - Qwen Team or Alibaba Cloud
+ - The MLX community
+ - flwrlabs or the code-alpaca-20k dataset authors
+ - Hugging Face
+
+ ### Attribution Requirements
+ When using this model or its derivatives:
+ 1. Maintain attribution to the base model (Qwen2.5-Coder-0.5B-Instruct)
+ 2. Maintain attribution to the training dataset (code-alpaca-20k)
+ 3. Include this license file or equivalent attribution
+ 4. Do not imply endorsement by the original authors
+
+ ### Modifications
+ This project provides:
+ - LoRA adapter weights (fine-tuning on top of the base model)
+ - Training and serving infrastructure
+ - Documentation and usage examples
+
+ This project does NOT redistribute:
+ - Base model weights (users download from the original source)
+ - Complete fine-tuned model weights
+ - Training dataset (users download from the original source)
+
+ ## License Compliance
+
+ All components used in this project are licensed under permissive open-source licenses (Apache-2.0, MIT) that allow:
+ - Commercial use
+ - Modification
+ - Distribution
+ - Private use
+
+ Users must:
+ - Include copyright notices
+ - Include license text
+ - State changes made
+ - Not use trademarks without permission
+
+ ## Full License Texts
+
+ ### Apache License 2.0
+ Full text available at: http://www.apache.org/licenses/LICENSE-2.0
+
+ ### MIT License
+ Full text available at: https://opensource.org/licenses/MIT
+
+ ## Questions
+
+ For questions about licensing or attribution, please open an issue at:
+ https://github.com/salakash/Minimalism/issues
MODEL_CARD.md ADDED
@@ -0,0 +1,167 @@
+ ---
+ license: apache-2.0
+ base_model: Qwen/Qwen2.5-Coder-0.5B-Instruct
+ tags:
+ - code
+ - coding-assistant
+ - mlx
+ - lora
+ - qwen2.5
+ language:
+ - en
+ pipeline_tag: text-generation
+ ---
+ **Developed by Samiya Kashif, Kashif Salahuddin, Rohan Bhangale & Robert Rojek**
+
+ # Minimalism
+
+ Minimalism is a practical coding assistant fine-tuned with LoRA on the code-alpaca-20k dataset. It provides runnable-first responses with structured **Solution**, **Usage**, and **Sanity test** sections.
+
+ ## Model Details
+
+ - **Base Model**: [Qwen/Qwen2.5-Coder-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-0.5B-Instruct)
+ - **MLX Weights**: [mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit](https://huggingface.co/mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit)
+ - **Training Dataset**: [flwrlabs/code-alpaca-20k](https://huggingface.co/datasets/flwrlabs/code-alpaca-20k)
+ - **Training Method**: LoRA (Low-Rank Adaptation)
+ - **Framework**: MLX (Apple Silicon optimized)
+ - **License**: Apache-2.0
+
+ ## Intended Use
+
+ Minimalism is designed for:
+ - Code generation and completion
+ - Programming assistance and tutoring
+ - Quick prototyping and examples
+ - Learning programming concepts
+
+ ### Response Format
+
+ When asked for code, Minimalism structures responses with:
+
+ 1. **Solution**: The main implementation
+ 2. **Usage**: A minimal runnable example
+ 3. **Sanity test**: A tiny test snippet (when appropriate)
+
+ This format ensures responses are immediately actionable and testable.
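+
+ For example, asking "Write a Python function to add two numbers" might produce a response organized like this (an illustrative sketch, not verbatim model output):
+
+ ```python
+ # Solution
+ def add(a, b):
+     return a + b
+
+ # Usage
+ print(add(2, 3))  # 5
+
+ # Sanity test
+ assert add(2, 3) == 5
+ ```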
+
+ ## Training Details
+
+ - **Dataset Size**: 2,000 examples (configurable)
+ - **Training Iterations**: 100 (configurable; see `adapter_config.json`)
+ - **LoRA Rank**: 8
+ - **LoRA Alpha**: 16
+ - **Learning Rate**: 2e-5
+ - **Hardware**: Apple Silicon M1 with 32GB RAM
+
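+ A comparable run could be launched with the `mlx_lm.lora` CLI; the following sketch mirrors the values recorded in `adapter_config.json` (flag names may differ across mlx-lm releases, so verify against your installed version):
+
+ ```bash
+ python -m mlx_lm.lora \
+   --model mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit \
+   --train \
+   --data data/training_ready \
+   --iters 100 \
+   --batch-size 4 \
+   --learning-rate 2e-5 \
+   --num-layers 16 \
+   --adapter-path outputs/adapters/dev
+ ```
+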
+ ### Data Processing
+
+ The training data underwent the following steps (sketched below):
+ 1. Secret redaction (API keys, private keys, tokens)
+ 2. Deduplication by content hash
+ 3. Train/validation split (98/2)
+ 4. Deterministic truncation for efficiency
+
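+ A minimal sketch of the redaction and deduplication steps; the regular expression below is illustrative, not the project's actual pattern set:
+
+ ```python
+ import hashlib
+ import re
+
+ # Illustrative credential pattern (hypothetical; the real pipeline's patterns are not published here)
+ SECRET_RE = re.compile(r"(api[_-]?key|token|secret)\s*[:=]\s*\S+", re.IGNORECASE)
+
+ def redact(text: str) -> str:
+     # Replace anything that looks like a credential assignment
+     return SECRET_RE.sub("[REDACTED]", text)
+
+ def dedupe(examples):
+     # Keep the first occurrence of each content hash, redacting as we go
+     seen, out = set(), []
+     for ex in examples:
+         h = hashlib.sha256(ex.encode("utf-8")).hexdigest()
+         if h not in seen:
+             seen.add(h)
+             out.append(redact(ex))
+     return out
+ ```
+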
+ ## Usage
+
+ ### Installation
+
+ ```bash
+ pip install mlx-lm
+ ```
+
+ ### Running the Server
+
+ ```bash
+ python -m mlx_lm.server \
+   --model mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit \
+   --adapter-path salakash/Minimalism \
+   --host 127.0.0.1 \
+   --port 8080
+ ```
+
+ ### API Example
+
+ ```bash
+ curl http://127.0.0.1:8080/v1/chat/completions \
+   -H 'Content-Type: application/json' \
+   -d '{
+     "model": "Minimalism",
+     "messages": [
+       {"role": "user", "content": "Write a Python function to add two numbers"}
+     ],
+     "max_tokens": 256
+   }'
+ ```
+
+ ### Python Example
+
+ ```python
+ from mlx_lm import load, generate
+
+ # Load model with adapter
+ model, tokenizer = load(
+     "mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit",
+     adapter_path="salakash/Minimalism"
+ )
+
+ # Generate response
+ prompt = "Write a Python function to reverse a string"
+ response = generate(model, tokenizer, prompt=prompt, max_tokens=256)
+ print(response)
+ ```
+
+ ## Limitations
+
+ - **Model Size**: 0.5B parameters; suitable for quick tasks but not complex reasoning
+ - **Context Length**: Limited by the base model's context window
+ - **Domain**: Primarily trained on Python code examples
+ - **Hardware**: Optimized for Apple Silicon; may not perform optimally on other platforms
+ - **Accuracy**: May generate incorrect or insecure code; always review outputs
+
+ ## Ethical Considerations
+
+ - **Code Review**: Always review generated code before use in production
+ - **Security**: Do not use for security-critical applications without thorough review
+ - **Bias**: May reflect biases present in training data
+ - **Attribution**: Generated code should be reviewed for licensing implications
+
+ ## Attribution
+
+ This model is built upon:
+
+ 1. **Base Model**: Qwen/Qwen2.5-Coder-0.5B-Instruct
+    - License: Apache-2.0
+    - Authors: Qwen Team, Alibaba Cloud
+    - No endorsement by original authors is implied
+
+ 2. **MLX Conversion**: mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit
+    - Converted for Apple Silicon optimization
+    - Community contribution
+
+ 3. **Training Dataset**: flwrlabs/code-alpaca-20k
+    - License: Apache-2.0
+    - Based on Stanford Alpaca methodology
+    - No endorsement by dataset authors is implied
+
+ ## Citation
+
+ If you use Minimalism in your research or applications, please cite:
+
+ ```bibtex
+ @misc{minimalism2024,
+   title={Minimalism: A Practical Coding Assistant},
+   author={Kashif Salahuddin},
+   year={2024},
+   publisher={Hugging Face},
+   howpublished={\url{https://huggingface.co/salakash/Minimalism}}
+ }
+ ```
+
+ ## Contact
+
+ - Repository: [github.com/salakash/Minimalism](https://github.com/salakash/Minimalism)
+ - Issues: [github.com/salakash/Minimalism/issues](https://github.com/salakash/Minimalism/issues)
+
+ ## Disclaimer
+
+ This adapter is provided "as is" without warranty. The authors are not responsible for any damages or issues arising from its use. Always review and test generated code before deployment.
README.md ADDED
@@ -0,0 +1,243 @@
+ ---
+ language:
+ - en
+ license: apache-2.0
+ base_model: Qwen/Qwen2.5-Coder-0.5B-Instruct
+ tags:
+ - code
+ - coding-assistant
+ - lora
+ - mlx
+ - apple-silicon
+ - qwen2.5
+ datasets:
+ - flwrlabs/code-alpaca-20k
+ - m-a-p/Code-Feedback
+ library_name: mlx-lm
+ pipeline_tag: text-generation
+ ---
+ **Developed by Samiya Kashif, Kashif Salahuddin & Rohan Bhangale**
+ ## Executive Summary
+
+ **Minimalism** is a specialized coding assistant built as a LoRA (Low-Rank Adaptation) adapter for the Qwen2.5-Coder-0.5B-Instruct base model. Unlike generic coding assistants, Minimalism implements a "runnable-first" philosophy: when users request code, responses are structured with clear **Solution**, **Usage**, and **Sanity test** sections, ensuring developers receive immediately executable code with minimal friction.
+
+ ### What Minimalism Is
+
+ - **A LoRA adapter** trained on the code-alpaca-20k dataset
+ - **OpenAI-compatible API** for local inference
+ - **Lightweight distribution** (~12MB adapter vs. multi-GB full models)
+ - **Production-engineered** with automated pipelines, evaluation, and publishing
+
+ ## Why Minimalism
+
+ Minimalism is built for a simple, practical goal: **deliver the same outcome with fewer lines of code**.
+
+ Most coding assistants tend to "over-achieve" by producing large, multi-step solutions, even when a smaller, clearer implementation would do. That extra code isn't free: it increases review effort, maintenance cost, and the surface area where defects can hide.
+
+ **Too much code, too fast.** Many teams report a sharp jump in lines of code (LOC): developers, from interns to seniors, are suddenly writing **5 to 7 times more** code than before. At first, it looks like higher productivity. In reality, it often means more bugs.
+
+ There's a long-standing rule in software engineering:
+
+ > "The more lines of code you have, the higher your probability of introducing bugs."
+
+ And AI-generated code tends to be **verbose and repetitive**, which can inflate LOC without adding real value.
+
+ Minimalism is designed for teams that value **minimalism, clarity, and correctness** over volume.
+
+
+ ### What makes Minimalism different
+
+ * **Minimal LoC by default**
+   Minimalism is optimized to **minimize lines of code while preserving behavior**: it prefers the smallest correct solution that meets the user's objective.
+
+ * **Internal governance behavior**
+   The model follows a lightweight internal "governance layer" in its response style: avoid unnecessary scaffolding, avoid over-abstraction, keep code focused, and don't introduce complexity that doesn't improve the result. The governance layer sits between the user request and the model's final output to enforce **minimalism as a constraint**. It evaluates candidate solutions by measuring **lines of code** and selects the smallest implementation that still satisfies the original requirements. If a shorter variant fails, it automatically falls back to the next-smallest passing candidate, ensuring fewer lines **without sacrificing correctness** (a sketch of this loop follows this list).
+
+ * **Practical, runnable output**
+   When you ask for code, Minimalism is tuned toward "runnable-first" answers: a clear implementation, a minimal usage example, and a quick sanity check when appropriate.
+
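+ A minimal sketch of that selection loop, assuming a hypothetical list of candidate implementations and a `passes_tests` callback (neither is part of the published artifacts):
+
+ ```python
+ def pick_minimal(candidates, passes_tests):
+     """Return the shortest candidate that still passes its tests."""
+     def loc(code):
+         # Count non-empty, non-comment lines
+         return sum(1 for line in code.splitlines()
+                    if line.strip() and not line.strip().startswith("#"))
+     for code in sorted(candidates, key=loc):  # smallest first
+         if passes_tests(code):
+             return code
+         # otherwise fall back to the next-smallest candidate
+     return None  # no candidate passed
+ ```
+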
+ ### Early validation
+
+ Minimalism was evaluated in a small developer study comparing it with popular coding models on a shared set of tasks. In this pilot, Minimalism showed a **clear reduction in lines of code (up to ~30%)** while producing solutions that **executed correctly and achieved the same intended outcomes** under the evaluation harness.
+
+ > Note: Results depend on task selection, constraints, and how "equivalence" is measured. We recommend validating on your own codebase and standards.
+
+
+ ### Why It Exists
+
+ Developers need coding assistance that:
+ 1. Provides **runnable code immediately** without extensive explanation
+ 2. Runs **locally** without cloud dependencies
+ 3. Maintains a **small footprint** for fast iteration
+ 4. Offers **structured, predictable responses** for automation
+
+ ### Who It's For
+
+ - **Individual developers** working on personal projects
+ - **Small teams** needing local, private coding assistance
+ - **Educators** teaching programming with consistent code examples
+ - **Researchers** experimenting with LoRA fine-tuning on MLX
+
+
+ ## Quick Start
+
+ ### Option 1: Use with MLX
+
+ Install MLX and load the model with the adapter:
+
+ ```bash
+ pip install mlx-lm
+ ```
+
+ ```python
+ from mlx_lm import load, generate
+
+ # Load base model with Minimalism adapter
+ model, tokenizer = load(
+     "mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit",
+     adapter_path="salakash/Minimalism"
+ )
+
+ # Generate code
+ prompt = "Write a Python function to calculate factorial"
+ response = generate(model, tokenizer, prompt=prompt, max_tokens=512)
+ print(response)
+ ```
+
+ ### Option 2: Use with Transformers
+
+ ```bash
+ pip install transformers torch peft
+ ```
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ from peft import PeftModel
+
+ # Load base model
+ base_model = AutoModelForCausalLM.from_pretrained(
+     "Qwen/Qwen2.5-Coder-0.5B-Instruct",
+     trust_remote_code=True
+ )
+
+ # Load adapter
+ model = PeftModel.from_pretrained(base_model, "salakash/Minimalism")
+ tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-0.5B-Instruct")
+
+ # Generate
+ messages = [{"role": "user", "content": "Write a Python function to add two numbers"}]
+ text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ inputs = tokenizer(text, return_tensors="pt")
+ outputs = model.generate(**inputs, max_new_tokens=256)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+
+ ### Option 3: Web UI with MLX
+
+ Start an OpenAI-compatible server:
+
+ ```bash
+ # Install mlx-lm if not already installed
+ pip install mlx-lm
+
+ # Start server with adapter
+ mlx_lm.server \
+   --model mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit \
+   --adapter-path salakash/Minimalism \
+   --port 8080
+ ```
+
+ Then use it with any OpenAI-compatible client:
+
+ ```bash
+ curl http://localhost:8080/v1/chat/completions \
+   -H "Content-Type: application/json" \
+   -d '{
+     "model": "mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit",
+     "messages": [
+       {"role": "user", "content": "Write a Python function to reverse a string"}
+     ],
+     "max_tokens": 512
+   }'
+ ```
+
+ Or use any OpenAI-compatible web UI, such as:
+ - [Open WebUI](https://github.com/open-webui/open-webui)
+ - [LibreChat](https://github.com/danny-avila/LibreChat)
+ - [ChatGPT-Next-Web](https://github.com/ChatGPTNextWeb/ChatGPT-Next-Web)
+
+ Configure the UI to point to `http://localhost:8080` as the API endpoint.
+
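+ The official OpenAI Python SDK also works against this endpoint; a minimal sketch (the API key is a placeholder, since the local server does not validate it):
+
+ ```python
+ from openai import OpenAI
+
+ # Point the client at the local mlx_lm server; the key is unused locally
+ client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")
+
+ resp = client.chat.completions.create(
+     model="mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit",
+     messages=[{"role": "user", "content": "Write a Python function to reverse a string"}],
+     max_tokens=512,
+ )
+ print(resp.choices[0].message.content)
+ ```
+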
+ ### Option 4: Hugging Face Inference API
+
+ Use the adapter directly via Hugging Face's Inference API (requires an HF token):
+
+ ```python
+ import requests
+
+ API_URL = "https://api-inference.huggingface.co/models/salakash/Minimalism"
+ headers = {"Authorization": "Bearer YOUR_HF_TOKEN"}
+
+ def query(payload):
+     response = requests.post(API_URL, headers=headers, json=payload)
+     return response.json()
+
+ output = query({
+     "inputs": "Write a Python function to check if a number is prime",
+     "parameters": {"max_new_tokens": 256}
+ })
+ print(output)
+ ```
+
+ ## Response Format
+
+ Minimalism provides structured, runnable-first responses:
+
+ - **Solution**: The main implementation code
+ - **Usage**: A minimal runnable example
+ - **Sanity test**: A tiny test snippet (when appropriate)
+
+ ## Comparison
+ Minimalism achieved the same objective in **~8–10 lines of code**, while a standard LLM typically produced **22–26 lines** for the equivalent solution.
+
+ ### Minimalism
+
+ ![Minimalism solution](image-1.png)
+
+ ### Standard Coding Agent
+
+ ![Standard coding agent solution](image.png)
+
+ ## Documentation
+
+ For comprehensive technical details, see:
+ - **[PYTHON_DEVELOPMENT_GUIDE.md](PYTHON_DEVELOPMENT_GUIDE.md)**: Complete Python guide covering the concepts, libraries, and techniques used in the project
+ - **[ARCHITECTURE.md](ARCHITECTURE.md)**: Complete system architecture, building blocks, epics & stories, technical stack, and design decisions
+ - **[HUGGINGFACE_UPLOAD_GUIDE.md](HUGGINGFACE_UPLOAD_GUIDE.md)**: Step-by-step guide for uploading to the Hugging Face Hub
+ - **[MODEL_CARD.md](MODEL_CARD.md)**: Model details, training configuration, and usage guidelines
+ - **[QUICK_RUN_GUIDE.md](QUICK_RUN_GUIDE.md)**: Quick start guide for getting up and running
+
+ ## Base Model & Dataset
+
+ - **Base Model**: [Qwen/Qwen2.5-Coder-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-0.5B-Instruct)
+ - **MLX Weights**: [mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit](https://huggingface.co/mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit)
+ - **Dataset**: [flwrlabs/code-alpaca-20k](https://huggingface.co/datasets/flwrlabs/code-alpaca-20k)
+ - **Dataset**: [m-a-p/Code-Feedback](https://huggingface.co/datasets/m-a-p/Code-Feedback)
+
+ ## License
+
+ This project publishes only adapter artifacts and configuration. The base model and dataset have their own licenses:
+
+ - Base Model: Apache-2.0 (Qwen/Qwen2.5-Coder-0.5B-Instruct)
+ - Dataset: Apache-2.0 (flwrlabs/code-alpaca-20k)
+
+ See `LICENSE-THIRD-PARTY.md` for complete attribution.
+
+ ## Acknowledgments
+
+ - Qwen team for the excellent base model
+ - MLX community for the Apple Silicon optimizations
+ - flwrlabs for the code-alpaca-20k dataset
+ - Multimodal Art Projection (M-A-P) for m-a-p/Code-Feedback
USAGE.md ADDED
@@ -0,0 +1,38 @@
+ # Minimalism Usage
+
+ ## Quick Start
+
+ ### 1. Install dependencies
+ ```bash
+ pip install mlx-lm
+ ```
+
+ ### 2. Start the server
+ ```bash
+ # Using the base model with this adapter
+ python -m mlx_lm.server \
+   --model mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit \
+   --adapter-path . \
+   --host 127.0.0.1 \
+   --port 8080
+ ```
+
+ ### 3. Test with curl
+ ```bash
+ curl http://127.0.0.1:8080/v1/chat/completions \
+   -H 'Content-Type: application/json' \
+   -d '{
+     "model": "Minimalism",
+     "messages": [
+       {"role": "user", "content": "Write a Python function to add two numbers"}
+     ],
+     "max_tokens": 256
+   }'
+ ```
+
+ ## Response Format
+
+ Minimalism provides runnable-first responses with these sections:
+ - **Solution**: Main implementation
+ - **Usage**: Smallest runnable example
+ - **Sanity test**: Tiny test snippet (when appropriate)
adapter_config.json ADDED
@@ -0,0 +1,40 @@
+ {
+   "adapter_path": "outputs/adapters/dev",
+   "batch_size": 4,
+   "config": null,
+   "data": "data/training_ready",
+   "fine_tune_type": "lora",
+   "grad_accumulation_steps": 1,
+   "grad_checkpoint": false,
+   "iters": 100,
+   "learning_rate": 2e-05,
+   "lora_parameters": {
+     "rank": 8,
+     "dropout": 0.0,
+     "scale": 20.0
+   },
+   "lr_schedule": null,
+   "mask_prompt": false,
+   "max_seq_length": 2048,
+   "model": "mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit",
+   "num_layers": 16,
+   "optimizer": "adam",
+   "optimizer_config": {
+     "adam": {},
+     "adamw": {},
+     "muon": {},
+     "sgd": {},
+     "adafactor": {}
+   },
+   "project_name": null,
+   "report_to": null,
+   "resume_adapter_file": null,
+   "save_every": 100,
+   "seed": 0,
+   "steps_per_eval": 200,
+   "steps_per_report": 10,
+   "test": false,
+   "test_batches": 500,
+   "train": true,
+   "val_batches": 25
+ }
adapters.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b8968d14e7792a2feebf6e0a346db20bb8b1f7e0bff0d7ce180f128ca7f43fe5
+ size 11754630
config.json ADDED
@@ -0,0 +1,17 @@
+ {
+   "model_type": "qwen2",
+   "adapter_type": "lora",
+   "base_model": "mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit",
+   "base_model_reference": "Qwen/Qwen2.5-Coder-0.5B-Instruct",
+   "task": "text-generation",
+   "framework": "mlx",
+   "lora_rank": 8,
+   "lora_alpha": 16,
+   "lora_dropout": 0.05,
+   "trained_on": "flwrlabs/code-alpaca-20k",
+   "training_samples": 2000,
+   "training_iterations": 100,
+   "model_name": "Minimalism",
+   "description": "LoRA adapter for Qwen2.5-Coder-0.5B-Instruct trained on code-alpaca-20k dataset. Provides runnable-first coding assistance.",
+   "license": "apache-2.0"
+ }
run_meta.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "model_id": "mlx-community/Qwen2.5-Coder-0.5B-Instruct-4bit",
+   "dataset_id": "flwrlabs/code-alpaca-20k",
+   "iters": 100,
+   "rank": 8,
+   "alpha": 16,
+   "dropout": 0.05,
+   "learning_rate": 2e-05,
+   "timestamp": "2025-12-31T15:18:04.451022Z"
+ }