---
language:
- en
license: mit
tags:
- ernie
- ernie-4.5
- math
- reasoning
- unsloth
- lora
- fine-tuned
datasets:
- nvidia/Nemotron-RL-math-OpenMathReasoning
base_model: unsloth/ERNIE-4.5-21B-A3B-PT
metrics:
- loss
model-index:
- name: naazimsnh02/ernie-45-math-finetuned
  results:
  - task:
      type: text-generation
      name: Mathematical Reasoning
    dataset:
      name: Nemotron-RL-math-OpenMathReasoning
      type: nvidia/Nemotron-RL-math-OpenMathReasoning
    metrics:
    - type: loss
      value: 0.6046
      name: Final Training Loss
    - type: loss
      value: 0.6114514470100403
      name: Final Validation Loss
    - type: loss
      value: 0.6114514470100403
      name: Best Validation Loss
---
# ERNIE-4.5 Fine-tuned for Mathematical Reasoning
This model is a fine-tuned version of unsloth/ERNIE-4.5-21B-A3B-PT on the nvidia/Nemotron-RL-math-OpenMathReasoning dataset.
## Model Description
This model specializes in solving complex mathematical problems including:
- Algebra (equations, factoring, systems)
- Calculus (derivatives, integrals)
- Geometry and trigonometry
- Word problems requiring multi-step reasoning
- Competition-level mathematics
## Training Details

### Training Data
- Dataset: nvidia/Nemotron-RL-math-OpenMathReasoning
- Training Samples: 7,600
- Evaluation Samples: 400
- Format: Conversational, rendered with the ERNIE-4.5 chat template (see the sketch below)
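
Roughly, each raw example is mapped into a chat-message structure before the ERNIE-4.5 chat template is applied. This is a minimal sketch; the split name and field names (`problem`, `solution`) are assumptions about the dataset schema, not taken from the actual training script:

```python
from datasets import load_dataset

# Load the source dataset (split and field names below are assumptions)
raw = load_dataset("nvidia/Nemotron-RL-math-OpenMathReasoning", split="train")

def to_conversation(example):
    # Map one raw example into the chat-message structure used for fine-tuning
    return {
        "messages": [
            {"role": "user", "content": example["problem"]},
            {"role": "assistant", "content": example["solution"]},
        ]
    }

train_data = raw.map(to_conversation)

# The tokenizer's ERNIE-4.5 chat template then renders each conversation
# into a single training string, e.g.:
# text = tokenizer.apply_chat_template(train_data[0]["messages"], tokenize=False)
```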
### Training Configuration
- Base Model: unsloth/ERNIE-4.5-21B-A3B-PT (21B parameters)
- Method: QLoRA (4-bit quantization + LoRA; see the configuration sketch after this list)
- LoRA Rank: 16
- LoRA Alpha: 16
- Trainable Parameters: 355,090,432 (3.11% of total)
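
As a rough sketch, this QLoRA setup can be reproduced with Unsloth along the following lines. The rank and alpha come from this card; the dropout value and target modules are assumptions (they are not recorded here), and the sequence length mirrors the usage example below:

```python
from unsloth import FastModel

# Load the base model in 4-bit (the "Q" in QLoRA)
model, tokenizer = FastModel.from_pretrained(
    model_name="unsloth/ERNIE-4.5-21B-A3B-PT",
    max_seq_length=2048,
    load_in_4bit=True,
    full_finetuning=False,
)

# Attach LoRA adapters with the rank and alpha reported above
model = FastModel.get_peft_model(
    model,
    r=16,              # LoRA rank
    lora_alpha=16,     # LoRA alpha
    lora_dropout=0.0,  # assumption: not recorded on this card
    target_modules=[   # assumption: typical attention/MLP projections
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)
```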
### Hyperparameters
- Batch Size: 4 (per device)
- Gradient Accumulation: 2
- Effective Batch Size: 8
- Learning Rate: 0.0002
- LR Scheduler: Cosine with warmup
- Warmup Ratio: 0.05
- Training Steps: 707 (stopped early, at the best validation loss)
- Optimizer: AdamW 8-bit
- Precision: BF16
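
The hyperparameters above correspond roughly to a TRL `SFTConfig` like the one below. This is a sketch under the library versions listed in the Framework section; the output directory is an assumption, and the eval/save cadence is taken from the checkpointing details further down:

```python
from trl import SFTConfig

# Training arguments implied by the hyperparameters above
training_args = SFTConfig(
    output_dir="outputs",            # assumption: actual path not recorded
    per_device_train_batch_size=4,
    gradient_accumulation_steps=2,   # effective batch size: 4 * 2 = 8
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    optim="adamw_8bit",
    bf16=True,
    eval_strategy="steps",
    eval_steps=100,                  # evaluation every 100 steps
    save_steps=100,                  # checkpoint every 100 steps
    load_best_model_at_end=True,     # keep the checkpoint with the best eval loss
    metric_for_best_model="eval_loss",
)
```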
## Training Results
- Final Training Loss: 0.6046
- Final Validation Loss: 0.6115
- Best Validation Loss: 0.6115
- Loss Improvement: 9.2% (from 0.6732 to 0.6115)
- Training Time: 4.64 hours
- GPU: NVIDIA A100-SXM4-40GB
- Peak Memory: 19.375 GB / 39.494 GB (49.058%)
## Framework
- Unsloth: training optimizations (reported by Unsloth as up to 2x faster with ~70% less memory)
- Modal: Serverless GPU infrastructure (40GB A100)
- Transformers: 4.56.2
- TRL: 0.22.2
## Usage

```python
from unsloth import FastModel

# Load the fine-tuned model
model, tokenizer = FastModel.from_pretrained(
    model_name="naazimsnh02/ernie-45-math-finetuned",
    max_seq_length=2048,
    load_in_4bit=True,
    full_finetuning=False,
)

# Prepare for inference
FastModel.for_inference(model)

# Solve a math problem
messages = [{
    "role": "user",
    "content": "Solve the equation: 2x² + 5x - 3 = 0",
}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt", padding=True).to("cuda")

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    pad_token_id=tokenizer.pad_token_id or tokenizer.eos_token_id,
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
## Example Output

**Input:**

```
Solve the equation: x² + 5x + 6 = 0
```

**Output:**

```
To solve x² + 5x + 6 = 0, we can factor:

Find two numbers that multiply to 6 and add to 5:
2 and 3 work because 2 × 3 = 6 and 2 + 3 = 5

Factored form:
(x + 2)(x + 3) = 0

Setting each factor to zero:
x + 2 = 0 → x = -2
x + 3 = 0 → x = -3

Therefore: \boxed{x = -2, -3}
```
## Training Progress
| Step | Training Loss | Validation Loss |
|---|---|---|
| 100 | 0.589 | 0.673 |
| 200 | 0.661 | 0.648 |
| 300 | 0.637 | 0.646 |
| 400 | 0.557 | 0.640 |
| 500 | 0.587 | 0.633 |
| 600 | 0.589 | 0.617 |
| 700 | 0.605 | 0.611 |
Training was stopped shortly after step 700, where validation loss reached its best value (0.6115).
## Training Infrastructure
- Platform: Modal (modal.com); see the launch sketch after this list
- GPU: 40GB A100
- Training Duration: ~4.6 hours
- Checkpointing: Every 100 steps
- Evaluation: Every 100 steps
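
For reference, a job like this can be launched on Modal roughly as follows. This is a minimal sketch: only the GPU type and library versions come from this card, while the app name, image contents, and timeout are assumptions.

```python
import modal

app = modal.App("ernie-45-math-finetune")  # assumption: app name

# Container image with the training dependencies (versions from the Framework section)
image = modal.Image.debian_slim().pip_install(
    "unsloth", "transformers==4.56.2", "trl==0.22.2", "datasets"
)

@app.function(gpu="A100-40GB", image=image, timeout=6 * 60 * 60)
def train():
    # Load the model, dataset, and trainer as sketched above, then train.
    ...

@app.local_entrypoint()
def main():
    train.remote()
```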
## Limitations
- Optimized for mathematical reasoning; may not perform as well on other domains
- Trained on English language problems only
- Best results with problems similar to training data format
- Requires GPU for inference (4-bit quantization)
## Citation

```bibtex
@misc{ernie45-math-2025,
  title={ERNIE-4.5 Fine-tuned for Mathematical Reasoning},
  author={naazimsnh02},
  year={2025},
  publisher={HuggingFace},
  howpublished={\url{https://huggingface.co/naazimsnh02/ernie-45-math-finetuned}}
}
```
## Acknowledgments
- ERNIE Team for the base model
- Unsloth for optimization framework
- NVIDIA for the Nemotron-RL dataset
- Modal for GPU infrastructure
- ERNIE AI Developer Challenge for the opportunity
## License
MIT License - See repository for details
Trained with ❤️ using Unsloth and Modal