# TinyLLM: Character-Level Math Solver

## Model Description
TinyLLM is a highly compact, character-level Causal Language Model (based on the standard Transformer decoder architecture) trained specifically to solve single-digit math problems.
This model serves as a minimalist, educational example of how a standard LLM architecture can be trained from scratch on a very small, custom dataset.
## Key Features
- Architecture: Causal Transformer Decoder.
- Task: Character-level text generation (autoregressive).
- Input/Output: Solves problems formatted as `N op N` and generates the answer, e.g., `4 + 5 = 9<EOS>` (see the example sequences below).
- Custom Code Required: This is a custom PyTorch model and requires custom code (`model.py`, `tokenizer.py`) to be loaded by users.
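For illustration, sequences in this format look like the following (the exact spacing and end-of-sequence marker are determined by `tokenizer.py`):

```text
4 + 5 = 9<EOS>
7 - 3 = 4<EOS>
8 / 2 = 4<EOS>
```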
## How to Use (Inference)
To load and run this custom model, users must download the entire repository structure and use the provided custom code, specifically the `TinyLLM` class defined in `model.py` and the `CharacterTokenizer` in `tokenizer.py`.
### 1. Installation
First, ensure you have the required libraries installed:
```bash
pip install torch huggingface-hub
```
### 2. Load the Model and Tokenizer

```python
from huggingface_hub import snapshot_download
import torch
import os
import sys

# 1. Configuration: REPLACE with your repository ID
MODEL_ID = "anujbhatt4ai/tiny-math-llm"
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'

# 2. Download all files (code and weights)
local_path = snapshot_download(repo_id=MODEL_ID)

# 3. Import Custom Classes
# The downloaded path must be added to sys.path to allow custom imports
sys.path.append(local_path)
from model import TinyLLM
from tokenizer import CharacterTokenizer, generate_v1_data

# 4. Setup and Load Model
def load_tiny_llm():
    # In this minimal case, we hardcode the known config values
    vocab_size = 22
    block_size = 14

    # Initialize the model with the exact trained parameters
    model = TinyLLM(
        vocab_size=vocab_size,
        block_size=block_size,
        n_embed=64, n_head=4, n_layer=4, dropout=0.1
    ).to(DEVICE)

    # Load the trained weights
    weights_path = os.path.join(local_path, "pytorch_model.bin")
    model.load_state_dict(torch.load(weights_path, map_location=DEVICE))
    model.eval()

    # Initialize the tokenizer
    raw_data = generate_v1_data()
    tokenizer = CharacterTokenizer(raw_data)

    return model, tokenizer

# Use the loaded model and tokenizer in your own generation logic
model, tokenizer = load_tiny_llm()
print("Model loaded and ready for math inference!")
```
## Training Details
### Architecture Configuration
The `TinyLLM` is configured with the following parameters, derived from the `config.json` and `model.py` files:
| Parameter | Value | Description |
| :--- | :--- | :--- |
| **`vocab_size`** | `22` | The size of the character vocabulary. |
| **`block_size`** | `14` | The maximum sequence length (context window). |
| **`n_embed`** | `64` | Embedding dimension. |
| **`n_head`** | `4` | Number of attention heads. |
| **`n_layer`** | `4` | Number of Transformer decoder blocks. |
| **`dropout`** | `0.1` | Dropout rate. |
### Training Hyperparameters (from `train.py`)
| Parameter | Value |
| :--- | :--- |
| **`BATCH_SIZE`** | `32` |
| **`LEARNING_RATE`** | `1e-3` (AdamW) |
| **`EPOCHS`** | `100` |
| **`DEVICE`** | `cuda` if available, else `cpu` |
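For orientation, these hyperparameters would typically be wired together roughly as in the sketch below, building on the variables from the loading snippet above. `train.py` in the repository is the authoritative version; the `MathDataset` constructor signature and the model returning raw logits are assumptions here.

```python
# Rough sketch under assumed interfaces; see train.py for the real training loop.
import torch.nn.functional as F
from torch.utils.data import DataLoader
from dataset import MathDataset                    # custom class from the repository

BATCH_SIZE = 32
LEARNING_RATE = 1e-3
EPOCHS = 100

train_dataset = MathDataset(generate_v1_data(), tokenizer)   # constructor signature assumed
loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=LEARNING_RATE)

model.train()
for epoch in range(EPOCHS):
    for x, y in loader:                            # x: input ids, y: next-token targets
        x, y = x.to(DEVICE), y.to(DEVICE)
        logits = model(x)                          # assumed shape (B, T, vocab_size)
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)), y.view(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```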
### Dataset
The model was trained on an **exhaustive set of all single-digit math problems** (addition, subtraction, multiplication, and non-remainder division) whose result is also a single digit (0-9). The **`dataset.py`** file implements the input/target sequence shift required for next-token (causal language modeling) training, sketched below.
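As an illustration of that shift (interfaces assumed; the real logic lives in `dataset.py`):

```python
# Next-token shift illustration (tokenizer.encode is an assumed method name).
ids = tokenizer.encode("4 + 5 = 9")           # character ids for one problem string
x = torch.tensor(ids[:-1], dtype=torch.long)  # input:  all tokens except the last
y = torch.tensor(ids[1:],  dtype=torch.long)  # target: the same sequence shifted left by one
# At each position t, the model is trained to predict y[t] from x[0..t].
```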
---
## Repository Files
This flat repository contains all the source code needed for complete reproducibility.
| File Name | Description |
| :--- | :--- |
| **`pytorch_model.bin`** | The trained model weights. |
| **`config.json`** | Model configuration/hyperparameters. |
| **`model.py`** | **Core Logic:** Custom `TinyLLM` architecture definition. |
| **`tokenizer.py`** | **Core Logic:** Custom `CharacterTokenizer` and data generator. |
| **`dataset.py`** | Defines the `MathDataset` class and sequence shift logic. |
| **`train.py`** | The complete training script and final hyperparameters. |
| **`custom_run.py`** (or `run.py`) | Example script demonstrating how to use the model for generation. |
| **`README.md`** | This model card and documentation. |