JuliaGPT

An experimental character-level GPT written in pure Julia, exploring minimal vocabularies inspired by the scriptio continua of ancient Greek manuscripts. Built on a scalar autograd engine with no external ML dependencies.

Architecture

  • 1 transformer layer, 4 attention heads
  • n_embd=16, block_size=256
  • RMSNorm, ReLU activations, causal self-attention with a KV cache for inference
  • Adam optimizer with linear LR decay
  • ~5K parameters
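
The RMSNorm mentioned above can be sketched in a few lines of Julia. This is an illustrative implementation, not the repo's actual code; the function and argument names are assumptions. RMSNorm rescales a vector by the root mean square of its entries and applies a learned per-dimension gain:

```julia
# Minimal RMSNorm sketch (illustrative; not JuliaGPT's actual implementation).
# Normalizes x by its root mean square, then scales by a learned gain vector.
function rmsnorm(x::Vector{Float64}, gain::Vector{Float64}; eps::Float64=1e-5)
    rms = sqrt(sum(abs2, x) / length(x) + eps)  # root mean square of x
    return gain .* (x ./ rms)                    # elementwise rescale and gain
end
```

Unlike LayerNorm, RMSNorm skips mean-centering, which keeps the scalar-autograd graph smaller per normalization.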

Vocabulary

28 characters (a–z, space, and period) plus a BOS token, for a vocabulary of 29. During preprocessing, numerals are spelled out as words and all punctuation except the period is removed.
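
A tokenizer for this vocabulary can be sketched as follows. This is a hypothetical reconstruction from the description above, not the repo's code; the constant and function names are assumptions:

```julia
# Hypothetical sketch of the 29-token vocabulary: a-z, space, period, plus BOS.
const CHARS = vcat(collect('a':'z'), [' ', '.'])        # 28 printable characters
const CHAR2ID = Dict(c => i for (i, c) in enumerate(CHARS))
const BOS = 29                                           # BOS takes the last id

# Prepend BOS, then map each character to its integer id.
encode(s::AbstractString) = [BOS; [CHAR2ID[c] for c in s]]
# Drop BOS and map ids back to characters.
decode(ids::Vector{Int}) = String([CHARS[i] for i in ids if i != BOS])
```

With only 29 ids, the embedding table stays tiny, which is what keeps the parameter count near 5K.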

Training

  • Dataset: Aristotle's Rhetoric + Euclid's Elements (8,461 chunks)
  • Current checkpoint: step 650, val_loss=2.3414
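
The linear LR decay used with Adam can be sketched as a one-line schedule. This is a generic illustration, not the repo's code; the peak rate and total step count are placeholder assumptions:

```julia
# Hedged sketch of a linear learning-rate decay schedule (illustrative values).
# Decays linearly from lr_max at step 0 to zero at total_steps, never below zero.
linear_lr(step, total_steps, lr_max) = lr_max * max(0.0, 1 - step / total_steps)
```

For example, `linear_lr(325, 650, 1e-3)` falls halfway between the peak rate and zero.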
