MicroJulia
A minimal character-level GPT written in pure Julia on top of a scalar autograd engine, with no external ML dependencies.
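As a rough illustration of what "scalar autograd" means here, the sketch below shows a micrograd-style scalar node in Julia. The type `Value`, its fields, and `backward!` are assumptions for illustration, not this repo's actual API.

```julia
# Illustrative sketch of a scalar autograd node; names and fields
# are assumptions, not this repo's actual implementation.
mutable struct Value
    data::Float64
    grad::Float64
    _backward::Function
    _prev::Vector{Value}
end

Value(x::Real) = Value(Float64(x), 0.0, () -> nothing, Value[])

function Base.:+(a::Value, b::Value)
    out = Value(a.data + b.data, 0.0, () -> nothing, [a, b])
    out._backward = () -> begin
        a.grad += out.grad          # d(a+b)/da = 1
        b.grad += out.grad          # d(a+b)/db = 1
    end
    return out
end

function Base.:*(a::Value, b::Value)
    out = Value(a.data * b.data, 0.0, () -> nothing, [a, b])
    out._backward = () -> begin
        a.grad += b.data * out.grad # d(a*b)/da = b
        b.grad += a.data * out.grad # d(a*b)/db = a
    end
    return out
end

# Reverse-mode pass: topologically order the graph, then apply the
# chain rule from the output back to the leaves.
function backward!(root::Value)
    topo = Value[]
    visited = Set{Value}()
    function build(v::Value)
        if !(v in visited)
            push!(visited, v)
            foreach(build, v._prev)
            push!(topo, v)
        end
    end
    build(root)
    root.grad = 1.0
    for v in reverse(topo)
        v._backward()
    end
end
```

For example, with `a = Value(2.0); b = Value(3.0); c = a*b + a`, calling `backward!(c)` leaves `a.grad == 4.0` and `b.grad == 2.0`.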
Architecture
- 1 transformer layer, 4 attention heads
- n_embd=16, block_size=64
- RMSNorm, ReLU activations, and causal attention via the KV cache (each token attends only to previously cached keys/values); a sketch of RMSNorm follows this list
- Adam optimizer with linear LR decay (also sketched below)
- ~5K parameters
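To make two of the bullets above concrete, here is a plain-array sketch of RMSNorm and a linear decay schedule. The function names, the `eps` value, and the decay-to-zero endpoint are assumptions; only the general shapes follow the list above.

```julia
# RMSNorm: scale x by the reciprocal of its root-mean-square, then apply
# a learned per-dimension gain g. (The eps value here is an assumption.)
function rmsnorm(x::Vector{Float64}, g::Vector{Float64}; eps::Float64=1e-5)
    rms = sqrt(sum(abs2, x) / length(x) + eps)
    return (x ./ rms) .* g
end

# Linear LR decay: anneal from lr_max toward zero over max_steps.
# (Decaying all the way to zero is an assumption about the schedule.)
lr_at(step::Int, max_steps::Int, lr_max::Float64) = lr_max * (1 - step / max_steps)
```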
Vocabulary
27 characters (a-z plus space) plus a BOS token, for a vocabulary size of 28.
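A minimal sketch of the token mapping this implies; the exact index ordering (letters first, then space, then BOS last) is an assumption.

```julia
# 28-token vocabulary: indices 1-26 for 'a'-'z', 27 for space, 28 for BOS.
# The ordering is assumed; the counts match the text (26 + 1 + 1 = 28).
const CHARS = vcat(collect('a':'z'), [' '])
const BOS = 28

encode(c::Char) = c == ' ' ? 27 : (c - 'a') + 1
decode(i::Int)  = i == BOS ? "<bos>" : string(CHARS[i])
```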
Training
- Dataset: Aristotle's Rhetoric + Euclid's Elements (8,487 chunks)
- Current checkpoint: step 150, val_loss=2.4315
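For scale, assuming the loss is cross-entropy reported in nats, random guessing over the 28-token vocabulary gives a baseline of ln(28) ≈ 3.33, so 2.4315 at step 150 is already well below uniform:

```julia
# Uniform-guessing baseline over 28 tokens (natural log, i.e. nats):
baseline = log(28)   # ≈ 3.33
```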