language:
  - en
license: mit
tags:
  - ernie
  - ernie-4.5
  - math
  - reasoning
  - unsloth
  - lora
  - fine-tuned
datasets:
  - nvidia/Nemotron-RL-math-OpenMathReasoning
base_model: unsloth/ERNIE-4.5-21B-A3B-PT
metrics:
  - loss
model-index:
  - name: naazimsnh02/ernie-45-math-finetuned
    results:
      - task:
          type: text-generation
          name: Mathematical Reasoning
        dataset:
          name: Nemotron-RL-math-OpenMathReasoning
          type: nvidia/Nemotron-RL-math-OpenMathReasoning
        metrics:
          - type: loss
            value: 0.6046
            name: Final Training Loss
          - type: loss
            value: 0.6114514470100403
            name: Final Validation Loss
          - type: loss
            value: 0.6114514470100403
            name: Best Validation Loss

ERNIE-4.5 Fine-tuned for Mathematical Reasoning

This model is a fine-tuned version of unsloth/ERNIE-4.5-21B-A3B-PT on the nvidia/Nemotron-RL-math-OpenMathReasoning dataset.

Model Description

This model specializes in solving complex mathematical problems, including:

  • Algebra (equations, factoring, systems)
  • Calculus (derivatives, integrals)
  • Geometry and trigonometry
  • Word problems requiring multi-step reasoning
  • Competition-level mathematics

Training Details

Training Data

  • Dataset: nvidia/Nemotron-RL-math-OpenMathReasoning
  • Training Samples: 7,600
  • Evaluation Samples: 400
  • Format: Conversational, using the ERNIE-4.5 chat template (see the preprocessing sketch after this list)
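
Each sample was converted to the ERNIE-4.5 chat format before training. The snippet below is a minimal preprocessing sketch, not the exact script used for this run; the split name and the column names (problem, solution) are assumptions about the dataset schema.

from datasets import load_dataset
from transformers import AutoTokenizer

# Load the base tokenizer so the ERNIE-4.5 chat template is available.
tokenizer = AutoTokenizer.from_pretrained("unsloth/ERNIE-4.5-21B-A3B-PT")

# The split and column names below are assumptions about the dataset schema.
dataset = load_dataset("nvidia/Nemotron-RL-math-OpenMathReasoning", split="train")

def to_chat_text(example):
    messages = [
        {"role": "user", "content": example["problem"]},        # assumed column
        {"role": "assistant", "content": example["solution"]},  # assumed column
    ]
    # Render with the chat template so training text matches inference prompts.
    return {"text": tokenizer.apply_chat_template(messages, tokenize=False)}

dataset = dataset.map(to_chat_text)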

Training Configuration

  • Base Model: unsloth/ERNIE-4.5-21B-A3B-PT (21B parameters)
  • Method: QLoRA (4-bit quantization + LoRA adapters; see the configuration sketch after this list)
  • LoRA Rank: 16
  • LoRA Alpha: 16
  • Trainable Parameters: 355,090,432 (3.11% of total)
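
As a rough configuration sketch (not the exact training script), the setup above corresponds to loading the base model in 4-bit and attaching LoRA adapters with Unsloth. The target_modules list below is a common choice and an assumption for this run.

from unsloth import FastModel

# Load the 21B base model with 4-bit quantized weights (QLoRA).
model, tokenizer = FastModel.from_pretrained(
    model_name="unsloth/ERNIE-4.5-21B-A3B-PT",
    max_seq_length=2048,
    load_in_4bit=True,
    full_finetuning=False,
)

# Attach LoRA adapters; rank and alpha match the values listed above.
# The target_modules list is a typical choice and an assumption for this run.
model = FastModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)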

Hyperparameters

  • Batch Size: 4 (per device)
  • Gradient Accumulation: 2
  • Effective Batch Size: 8
  • Learning Rate: 0.0002
  • LR Scheduler: Cosine with warmup
  • Warmup Ratio: 0.05
  • Training Steps: 707 (training stopped early based on validation loss)
  • Optimizer: AdamW 8-bit
  • Precision: BF16
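
These hyperparameters map onto a TRL SFT setup roughly as follows. This is a sketch that reuses the tokenizer, model, and dataset from the two sketches above; the held-out split, seed, and output directory are assumptions.

from trl import SFTConfig, SFTTrainer

# The card lists 7,600 training and 400 evaluation samples; exactly how the
# subset was drawn is not stated, so this selection is an assumption.
subset = dataset.shuffle(seed=42).select(range(8000))
split = subset.train_test_split(test_size=400, seed=42)

training_args = SFTConfig(
    output_dir="outputs",               # assumption
    per_device_train_batch_size=4,
    gradient_accumulation_steps=2,      # effective batch size 8
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    optim="adamw_8bit",
    bf16=True,
    eval_strategy="steps",
    eval_steps=100,                     # evaluation every 100 steps
    save_steps=100,                     # checkpoint every 100 steps
    logging_steps=100,
    report_to="none",
)

trainer = SFTTrainer(
    model=model,                        # LoRA-wrapped model from the sketch above
    processing_class=tokenizer,
    train_dataset=split["train"],
    eval_dataset=split["test"],
    args=training_args,
)
trainer.train()                         # this run was stopped early at step 707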

Training Results

  • Final Training Loss: 0.6046
  • Final Validation Loss: 0.6115
  • Best Validation Loss: 0.6115
  • Loss Improvement: 9.2% (from 0.6732 to 0.6115)
  • Training Time: 4.64 hours
  • GPU: NVIDIA A100-SXM4-40GB
  • Peak Memory: 19.375 GB / 39.494 GB (49.058%)

Framework

  • Unsloth: 2x faster training, 70% less memory
  • Modal: Serverless GPU infrastructure (40GB A100)
  • Transformers: 4.56.2
  • TRL: 0.22.2
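
Since behavior can shift between releases, the pinned versions above can be sanity-checked at load time:

import transformers
import trl

print(transformers.__version__)  # 4.56.2 was used for this fine-tune
print(trl.__version__)           # 0.22.2 was used for this fine-tune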

Usage

from unsloth import FastModel

# Load the fine-tuned model
model, tokenizer = FastModel.from_pretrained(
    model_name="naazimsnh02/ernie-45-math-finetuned",
    max_seq_length=2048,
    load_in_4bit=True,
    full_finetuning=False,
)

# Prepare for inference
FastModel.for_inference(model)

# Solve a math problem
messages = [{
    "role": "user",
    "content": "Solve the equation: 2x² + 5x - 3 = 0"
}]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt", padding=True).to("cuda")

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    pad_token_id=tokenizer.pad_token_id or tokenizer.eos_token_id,
)

# Decode only the newly generated tokens (otherwise the prompt is echoed back)
generated = outputs[0][inputs["input_ids"].shape[1]:]
response = tokenizer.decode(generated, skip_special_tokens=True)
print(response)

Example Output

Input:

Solve the equation: x² + 5x + 6 = 0

Output:

To solve x² + 5x + 6 = 0, we can factor:

Find two numbers that multiply to 6 and add to 5:
2 and 3 work because 2 × 3 = 6 and 2 + 3 = 5

Factored form:
(x + 2)(x + 3) = 0

Setting each factor to zero:
x + 2 = 0  →  x = -2
x + 3 = 0  →  x = -3

Therefore: \boxed{x = -2, -3}
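
Solutions end with the answer in \boxed{...}, so the final result can be pulled out with a small helper (a convenience sketch, not part of the model or the training code):

import re

def extract_boxed(text):
    # Return the contents of the last \boxed{...} in the model output, if any.
    # Note: this simple pattern does not handle nested braces.
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1] if matches else None

print(extract_boxed(r"Therefore: \boxed{x = -2, -3}"))  # -> x = -2, -3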

Training Progress

| Step | Training Loss | Validation Loss |
|------|---------------|-----------------|
| 100  | 0.589         | 0.673           |
| 200  | 0.661         | 0.648           |
| 300  | 0.637         | 0.646           |
| 400  | 0.557         | 0.640           |
| 500  | 0.587         | 0.633           |
| 600  | 0.589         | 0.617           |
| 700  | 0.605         | 0.611           |

Training was stopped early at step 707; the final evaluation, at step 700, produced the best validation loss of 0.6115.

Training Infrastructure

  • Platform: Modal (modal.com; a minimal launch sketch follows this list)
  • GPU: 40GB A100
  • Training Duration: ~4.6 hours
  • Checkpointing: Every 100 steps
  • Evaluation: Every 100 steps
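
For reference, a Modal training job can be structured roughly as follows. The app name, image contents, and timeout below are assumptions; this is a sketch of the serverless setup rather than the exact deployment used.

import modal

# Hypothetical Modal app: the name, image contents, and timeout are assumptions.
image = modal.Image.debian_slim().pip_install("unsloth", "trl", "transformers", "datasets")
app = modal.App("ernie-math-finetune", image=image)

@app.function(gpu="A100", timeout=6 * 60 * 60)  # 40GB A100, generous timeout
def train():
    # Build the dataset, QLoRA model, and SFTTrainer as in the sketches above,
    # then call trainer.train(); checkpoints are written every 100 steps
    # via the save_steps setting.
    ...

@app.local_entrypoint()
def main():
    train.remote()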

Limitations

  • Optimized for mathematical reasoning; may not perform as well on other domains
  • Trained on English language problems only
  • Best results with problems similar to training data format
  • Requires a CUDA-capable GPU for inference when loaded in 4-bit

Citation

@misc{ernie45-math-2025,
  title={ERNIE-4.5 Fine-tuned for Mathematical Reasoning},
  author={naazimsnh02},
  year={2025},
  publisher={HuggingFace},
  howpublished={\url{https://huggingface.co/naazimsnh02/ernie-45-math-finetuned}}
}

Acknowledgments

  • ERNIE Team for the base model
  • Unsloth for optimization framework
  • NVIDIA for the Nemotron-RL dataset
  • Modal for GPU infrastructure
  • ERNIE AI Developer Challenge for the opportunity

License

MIT License - See repository for details


Trained with ❤️ using Unsloth and Modal