MicroJulia

A minimal character-level GPT written in pure Julia with a scalar autograd engine. No external ML dependencies.

Architecture

  • 1 transformer layer, 4 attention heads
  • n_embd=16, block_size=64
  • RMSNorm, ReLU, causal self-attention with a KV cache for generation
  • Adam optimizer with linear LR decay
  • ~5K parameters
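As a sketch of the normalization step above: RMSNorm scales each vector by its root-mean-square and a learned gain. The names below (`rmsnorm`, `g`, `eps`) are illustrative, not taken from the MicroJulia source:

```julia
# RMSNorm: divide x by its root-mean-square, then scale by a learned gain g.
# (Illustrative sketch; names and defaults are assumptions, not from the repo.)
function rmsnorm(x::AbstractVector, g::AbstractVector; eps=1e-5)
    rms = sqrt(sum(abs2, x) / length(x) + eps)
    return g .* x ./ rms
end
```

With `g` initialized to ones this is a pure rescaling: a constant input vector maps to (approximately) a vector of ones.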

Vocabulary

28-token vocabulary: 27 characters (a–z plus space) and a BOS token.
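A hypothetical encoder/decoder for this vocabulary might look like the following; the index layout is an assumption, and the repo may order its tokens differently:

```julia
# 28-token vocab sketch: 'a'..'z' → 1..26, space → 27, BOS → 28
# (Julia is 1-indexed; this layout is assumed, not taken from the repo.)
const CHARS = [collect('a':'z'); ' ']
const BOS = length(CHARS) + 1                      # 28
encode(s::AbstractString) = [findfirst(==(c), CHARS) for c in s]
decode(ids) = String([CHARS[i] for i in ids if i != BOS])
```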

Training

  • Dataset: Aristotle's Rhetoric + Euclid's Elements (8,487 chunks)
  • Current checkpoint: step 150, val_loss=2.4315
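The linear LR decay used with Adam can be sketched as below; `lr_max` and `total_steps` are placeholder values for illustration, not read from the checkpoint:

```julia
# Linearly anneal the learning rate from lr_max at step 0 down to zero.
# (lr_max and total_steps are assumed values, not from the training config.)
lr_at(step; lr_max=1e-3, total_steps=1000) = lr_max * (1 - step / total_steps)
```

Under these placeholder values, step 150 (the current checkpoint) would sit at 85% of the peak rate.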

