---
license: apache-2.0
language:
- en
library_name: transformers.js
tags:
- code
- python
- maincoder
- code-generation
- reinforcement-learning
- mcpo
- onnx
pipeline_tag: text-generation
base_model: Maincode/Maincoder-1B
---
<img src="https://huggingface.co/datasets/Maincode/assets/resolve/e51154e034201be1a5dad0e9c8de31d8b9f17643/maincoder_logo.png" alt="" width="1250"> |
|
|
|
|
|
[**Maincoder-1B-ONNX**](https://maincode.com/maincoder/) is the ONNX-optimized version of [Maincoder-1B](https://huggingface.co/Maincode/Maincoder-1B), a code-focused language model optimized for code generation and completion tasks. This version enables fast inference using ONNX Runtime in Python and runs directly in the browser with Transformers.js. |
|
|
|
|
|

# Key Features

- **ONNX Optimized**: Efficient inference with ONNX Runtime and KV-cache support.
- **Cross-Platform**: Runs in Python, Node.js, or directly in the browser.
- **Code Generation**: Tuned for Python code completion and generation tasks.
- **Compact Size**: 1 billion parameters, lightweight enough to run on consumer hardware.
- **SOTA Performance**: State-of-the-art results on the Python coding benchmarks HumanEval, HumanEval+, and MBPP+.

# Benchmark Results

<img src="https://huggingface.co/datasets/Maincode/assets/resolve/main/performance_h.png" alt="Benchmark Performance Across Baseline LLMs" width="1050">

| Model | HumanEval | HumanEval+ | MBPP+ | MMLU | GSM8K |
|---|---:|---:|---:|---:|---:|
| [Maincode/Maincoder-1B](https://huggingface.co/Maincode/Maincoder-1B) | **0.7622** | **0.7256** | **0.7090** | 0.3054 | 0.2976 |
| [deepseek-ai/deepseek-coder-1.3b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-1.3b-instruct) | 0.5610 | 0.5305 | 0.6217 | 0.2705 | 0.0413 |
| [HuggingFaceTB/SmolLM3-3B](https://huggingface.co/HuggingFaceTB/SmolLM3-3B) | 0.5366 | 0.5000 | 0.6799 | **0.5928** | 0.5505 |
| [Qwen/Qwen2.5-Coder-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-1.5B-Instruct) | 0.4634 | 0.4451 | 0.6561 | 0.4984 | 0.4944 |
| [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) | 0.4024 | 0.3780 | 0.5582 | 0.5571 | **0.6865** |

# Model Overview

Maincoder uses a modern transformer decoder architecture with:

- **Rotary Position Embeddings**: With theta of 1,000,000.
- **RMSNorm**: Pre-normalization for stable training.
- **Grouped Query Attention**: 4:1 ratio of query to key-value heads.
- **QK Normalization**: RMSNorm applied to attention queries and keys.
- **SwiGLU MLP**: Gated linear units with SiLU activation.

| Attribute | Value |
|-----------|-------|
| Parameters | 1B |
| Hidden Size | 1536 |
| Layers | 32 |
| Attention Heads | 16 (4 KV heads) |
| Head Dimension | 96 |
| Vocabulary Size | 151,936 |
| Context Length | 2,048 |
| Format | ONNX |
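
The head counts and dimensions above fit together as follows (a quick sanity check; the variable names below are ours for illustration, not the model's actual config keys):

```python
# Shape arithmetic for Maincoder's grouped-query attention.
# Names are illustrative, not the model's real config keys.
hidden_size = 1536
n_heads = 16      # query heads
n_kv_heads = 4    # key/value heads (4:1 GQA ratio)
head_dim = 96

assert n_heads * head_dim == hidden_size   # 16 * 96 = 1536
assert n_heads % n_kv_heads == 0           # each KV head serves 4 query heads

q_proj_dim = n_heads * head_dim            # 1536
kv_proj_dim = n_kv_heads * head_dim        # 384: the KV cache is 4x smaller
print(q_proj_dim, kv_proj_dim)
```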

# Usage

## Python (ONNX Runtime)

### Installation

```bash
pip install "optimum[onnxruntime]" transformers
```

For GPU acceleration:

```bash
pip install "optimum[onnxruntime-gpu]"
```

### Quick Start

```python
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

# Load the ONNX model with KV-cache support
model = ORTModelForCausalLM.from_pretrained(
    "Maincode/Maincoder-1B-ONNX",
    file_name="decoder_with_past_model.onnx",
    use_cache=True,
)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("Maincode/Maincoder-1B-ONNX")

# Code completion example
prompt = '''def fibonacci(n: int) -> int:
    """Return the n-th Fibonacci number."""
'''

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    temperature=0.2,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
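
Because the loaded model behaves like a regular `transformers` model, it can also be wrapped in a standard text-generation pipeline (a minimal sketch reusing the `model` and `tokenizer` from the Quick Start):

```python
from transformers import pipeline

# Wrap the ONNX model in the familiar pipeline API.
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
result = generator(prompt, max_new_tokens=64, do_sample=True, temperature=0.2)
print(result[0]["generated_text"])
```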

### GPU Acceleration

```python
from optimum.onnxruntime import ORTModelForCausalLM

model = ORTModelForCausalLM.from_pretrained(
    "Maincode/Maincoder-1B-ONNX",
    use_cache=True,
    file_name="decoder_with_past_model.onnx",
    provider="CUDAExecutionProvider",
)
```
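
If CUDA inference silently falls back to CPU, it helps to confirm which execution providers your ONNX Runtime build actually exposes (a small diagnostic sketch):

```python
import onnxruntime as ort

# Lists providers compiled into the installed onnxruntime package;
# "CUDAExecutionProvider" only appears with the onnxruntime-gpu build.
print(ort.get_available_providers())
```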

---

## JavaScript (Transformers.js)

### Installation

```bash
npm install @huggingface/transformers
```

### Node.js

```javascript
import { AutoModelForCausalLM, AutoTokenizer } from '@huggingface/transformers';

// Load the tokenizer and model
const tokenizer = await AutoTokenizer.from_pretrained('Maincode/Maincoder-1B-ONNX');
const model = await AutoModelForCausalLM.from_pretrained('Maincode/Maincoder-1B-ONNX', {
  subfolder: '.',
  model_file_name: 'decoder_with_past_model',
  use_external_data_format: true,
});

// Code completion example
const prompt = `def fibonacci(n: int) -> int:
    """Return the n-th Fibonacci number."""
`;

const inputs = await tokenizer(prompt);

const outputs = await model.generate({
  input_ids: inputs.input_ids,
  attention_mask: inputs.attention_mask,
  max_new_tokens: 128,
  temperature: 0.2,
  do_sample: true,
});

const decoded = tokenizer.batch_decode(outputs, { skip_special_tokens: true })[0];
console.log(decoded);
```

---

## Code Completion Examples

```python
# Function completion
prompt = '''def quicksort(arr: list) -> list:
    """Sort a list using the quicksort algorithm."""
'''

# Class completion
prompt = '''class BinarySearchTree:
    """A binary search tree implementation."""

    def __init__(self):
'''

# Algorithm implementation
prompt = '''def dijkstra(graph: dict, start: str, end: str) -> tuple:
    """Find the shortest path using Dijkstra's algorithm.

    Args:
        graph: Adjacency list representation of the graph
        start: Starting node
        end: Target node

    Returns:
        Tuple of (distance, path)
    """
'''
```
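
Any of these prompts can be fed through a small helper that reuses the `model` and `tokenizer` loaded in the Quick Start (a minimal sketch; the `complete` function is ours, not part of the model's API):

```python
def complete(prompt: str, max_new_tokens: int = 128) -> str:
    """Generate a code completion for the given prompt."""
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        temperature=0.2,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

print(complete(prompt))
```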

# Additional Notes

## Limitations

- Context length is limited to 2,048 tokens (see the truncation sketch below).
- Primarily optimized for Python; performance may vary on other languages.
- May generate code with bugs or security issues; always review generated code.
- Browser performance depends on device capabilities.
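
To stay within the context window, long prompts can be truncated at tokenization time. A minimal sketch (the 2,048 figure mirrors the context length above; reserving room for the completion is our suggestion):

```python
# Keep prompt + completion within the 2,048-token context window.
max_new_tokens = 128
inputs = tokenizer(
    prompt,
    return_tensors="pt",
    truncation=True,
    max_length=2048 - max_new_tokens,  # reserve room for generated tokens
)
```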

<div style="margin-left:14px; border-left:4px solid #3b82f6; background:rgba(59,130,246,0.08); padding:8px 10px; border-radius:8px; font-size:0.92em; margin:10px 0;">
<strong>Disclaimer</strong>: This model has <strong>not</strong> undergone any alignment or safety tuning (e.g., RLHF/RLAIF, DPO, or safety fine-tuning). Outputs may be unsafe or biased. Please use appropriate safeguards and evaluate carefully for your use case.
</div>

## License

This model is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).

## Citation

```bibtex
@misc{maincoder2025,
  title        = {Maincoder-1B: A High-Performance 1B Parameter Coding Model},
  author       = {Maincode Team},
  year         = {2025},
  organization = {Maincode},
  howpublished = {\url{https://huggingface.co/Maincode/Maincoder-1B}}
}
```

## Related Models

- [Maincode/Maincoder-1B](https://huggingface.co/Maincode/Maincoder-1B) - Original PyTorch model

## Contact

For questions, issues, or collaboration inquiries, please visit [Maincode](https://maincode.com).