# JuliaGPT
An experimental character-level GPT in pure Julia that explores minimal vocabularies inspired by ancient Greek scriptio continua. Built on scalar autograd with no external ML dependencies.
## Architecture
- 1 transformer layer, 4 attention heads
- n_embd=16, block_size=256
- RMSNorm, ReLU; causal self-attention with a KV cache for generation
- Adam optimizer with linear LR decay
- ~5K parameters
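The RMSNorm used above can be sketched in a few lines; the function and parameter names here are illustrative, not taken from the actual codebase:

```julia
# RMSNorm: scale x by the reciprocal root-mean-square of its entries.
# `g` is a learned per-dimension gain; `eps` guards against division by zero.
function rmsnorm(x::Vector{Float64}, g::Vector{Float64}; eps::Float64=1e-5)
    rms = sqrt(sum(abs2, x) / length(x) + eps)
    return g .* x ./ rms
end

# With a unit gain, the output has RMS ≈ 1 regardless of input scale:
y = rmsnorm(randn(16) .* 100, ones(16))
```

Unlike LayerNorm, RMSNorm skips the mean-subtraction step, which keeps the scalar-autograd graph smaller.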
## Vocabulary
28 characters (a-z, space, and period) plus a BOS token, for a vocabulary of 29. Numerals are spelled out as words, and all punctuation except the period is removed.
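A minimal sketch of this vocabulary and its encoder, assuming id assignments and names of my own choosing (the real codebase may order the table differently):

```julia
# 28 characters: 'a'..'z', space, period. Ids 1..28; BOS takes id 29.
const CHARS = vcat('a':'z', [' ', '.'])
const CHAR_TO_ID = Dict(c => i for (i, c) in enumerate(CHARS))
const BOS = 29

# Lowercase the input, drop anything outside the vocabulary (i.e. all
# punctuation except the period), and prepend BOS. Numeral-to-word
# conversion is assumed to happen upstream in preprocessing.
function encode(text::AbstractString)
    ids = [CHAR_TO_ID[c] for c in lowercase(text) if haskey(CHAR_TO_ID, c)]
    return vcat(BOS, ids)
end
```

For example, `encode("Ab.")` yields `[29, 1, 2, 28]` under this id assignment.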
## Training
- Dataset: Aristotle's Rhetoric + Euclid's Elements (8,461 chunks)
- Current checkpoint: step 650, val_loss=2.3414
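The linear LR decay mentioned above can be sketched as a one-line schedule; the hyperparameter values are illustrative placeholders, not the ones actually used:

```julia
# Learning rate falls linearly from lr_max at step 0 to zero at max_steps.
lr_at(step; lr_max=1e-3, max_steps=1000) = lr_max * (1 - step / max_steps)
```

At each optimizer step, the current rate is recomputed (e.g. `lr_at(650)` midway through a 1000-step run) and passed to the Adam update.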