AGIFORMER: Byte-Level Language Model with Neuroplasticity
Status: Phase 7 - Curriculum Learning ✅ Complete
Latest Achievement: 20K-step curriculum training with a 77% BPC reduction
A research implementation of a byte-level language model featuring:
- 🧠 Hebbian Memory with dynamic neuroplasticity
- 📚 Curriculum Learning (3-stage developmental approach)
- 🔄 System 2 Reasoning (iterative thinking loop)
- 🚀 Linear Complexity attention mechanism
Quick Start
Installation
pip install torch datasets tqdm
Training (Curriculum Learning)
python train_curriculum.py # 20K steps, 3 curriculum stages
Inference
python generate.py best_model_curriculum.pth
Testing
python test_recall.py best_model_curriculum.pth # Memory test
python inspect_reasoning.py # System 2 diagnostics
Architecture
Bytes → Encoder (RoPE, patches) → Hebbian Memory (dynamic λ) → Reasoning Loop (3 steps) → Local RNN (autoregressive) → Bytes
Core Components
- ByteLatentEncoder: Patches bytes into latent vectors with RoPE
- HebbianMemory: Fast weights with learnable decay + neuroplasticity (α)
- RecurrentReasoningBlock: 3-step iterative thinking loop (System 2)
- LocalAutoregressiveHead: GRU-based byte decoder
See docs/architecture.md for technical details; a minimal wiring sketch follows below.
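A minimal sketch of how these components might be chained in a forward pass. The import path and constructor signatures below are assumptions for illustration; the real modules live in this repository and take additional hyperparameters (see docs/architecture.md).

```python
import torch
# Hypothetical import path and constructor signatures; the actual modules
# in this repo take more hyperparameters than shown here.
from agiformer.model import (ByteLatentEncoder, HebbianMemory,
                             RecurrentReasoningBlock, LocalAutoregressiveHead)

d_model = 512
encoder = ByteLatentEncoder(d_model)                    # bytes -> latent patches (RoPE)
memory = HebbianMemory(d_model)                         # fast weights, learnable decay
reasoner = RecurrentReasoningBlock(d_model, n_steps=3)  # System 2 thinking loop
decoder = LocalAutoregressiveHead(d_model)              # GRU-based byte decoder

byte_ids = torch.randint(0, 256, (1, 1024))  # raw UTF-8 bytes as integers
z = encoder(byte_ids)   # (B, P, d_model) patch latents
z = memory(z)           # associative read/write through fast weights
z = reasoner(z)         # iterative latent refinement
logits = decoder(z)     # per-byte next-byte logits
```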
Features
✅ No Tokenization - Universal byte-level processing
✅ Linear Complexity - O(N) attention with Hebbian memory
✅ Neuroplasticity - Dynamic memory consolidation (α: 0.1 → 0.99, sketched after this list)
✅ Curriculum Learning - 3-stage developmental training
✅ Active Reasoning - Verified thinking loop (Δz = 12.7)
✅ AMP Compatible - Mixed precision training with stability fixes
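The linear-complexity and neuroplasticity items above rest on a fast-weight update of roughly the following form. This is a self-contained toy sketch with scalar λ (decay) and α (plasticity), not the repository's actual HebbianMemory, where the decay is learnable and α follows the curriculum schedule.

```python
import torch

def hebbian_fast_weights(k, v, q, lam=0.95, alpha=0.5):
    """Toy linear-attention-style fast-weight memory.

    k, v, q: (B, T, d) key/value/query streams.
    lam:     decay applied to the existing memory at each step.
    alpha:   plasticity, i.e. how strongly new associations are written
             (scheduled 0.1 -> 0.99 across the curriculum stages).
    Cost per step is one d x d outer-product write and one read,
    so total cost grows linearly with sequence length T.
    """
    B, T, d = k.shape
    memory = torch.zeros(B, d, d, dtype=k.dtype, device=k.device)
    outputs = []
    for t in range(T):
        # Write: decay old associations, add the new key-value outer product.
        memory = lam * memory + alpha * torch.einsum('bd,be->bde', k[:, t], v[:, t])
        # Read: retrieve the value currently associated with the query.
        outputs.append(torch.einsum('bd,bde->be', q[:, t], memory))
    return torch.stack(outputs, dim=1)  # (B, T, d)

x = torch.randn(2, 128, 64)
print(hebbian_fast_weights(x, x, x).shape)  # torch.Size([2, 128, 64])
```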
Curriculum Learning (Phase 7)
Training Stages
| Stage | Steps | Plasticity (α) | Data | Purpose |
|---|---|---|---|---|
| 1. Childhood | 0-3K | 0.10 | Dictionary | Lexical grounding |
| 2. Youth | 3K-8K | 0.50 | Stories | Syntactic scaffolding |
| 3. Adulthood | 8K-20K | 0.99 | Wikipedia | Semantic expansion |
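In code, the stage schedule above reduces to a simple step-to-stage lookup. The sketch below mirrors the table; the function name, dataset labels, and structure are illustrative and not the actual train_curriculum.py API.

```python
# Illustrative schedule; boundaries and alpha values taken from the table above.
CURRICULUM = [
    # (end step, plasticity alpha, data source, purpose)
    (3_000,  0.10, "dictionary", "lexical grounding"),      # 1. Childhood
    (8_000,  0.50, "stories",    "syntactic scaffolding"),  # 2. Youth
    (20_000, 0.99, "wikipedia",  "semantic expansion"),     # 3. Adulthood
]

def stage_config(step: int):
    """Return (alpha, data_source) for the current global training step."""
    for end_step, alpha, source, _purpose in CURRICULUM:
        if step < end_step:
            return alpha, source
    return CURRICULUM[-1][1], CURRICULUM[-1][2]  # stay in the final stage

print(stage_config(5_000))   # (0.5, 'stories')
print(stage_config(15_000))  # (0.99, 'wikipedia')
```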
Results (20K Steps - Turkish Training)
Metrics:
- Final BPC: 1.85 (↓77% from initialization)
- Best Val BPC: 1.78
- Training Time: ~50 minutes (CUDA GPU)
- Stability: 0 NaN occurrences across 20K steps
Progress:
Step 0: BPC = 8.04 (Random initialization)
Step 5K: BPC = 2.23 (Initial curriculum complete)
Step 10K: BPC = 1.98 (Mid-training)
Step 20K: BPC = 1.85 (Final)
Improvement: 6.19 BPC absolute reduction (77% relative)
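For reference, BPC here is the mean cross-entropy over byte targets converted from nats to bits (a sketch, assuming that is how the metric is computed in this repo). A uniform untrained predictor scores log2(256) = 8 bits per byte, consistent with the ~8.04 reported at random initialization.

```python
import math
import torch
import torch.nn.functional as F

def bits_per_byte(logits: torch.Tensor, targets: torch.Tensor) -> float:
    """Mean cross-entropy over byte targets, converted from nats to bits.

    logits:  (B, T, 256) unnormalized byte predictions.
    targets: (B, T) integer byte values in [0, 255].
    """
    nats = F.cross_entropy(logits.reshape(-1, 256), targets.reshape(-1))
    return nats.item() / math.log(2)

# Uniform logits -> log2(256) = 8 bits per byte.
logits = torch.zeros(1, 4, 256)
targets = torch.randint(0, 256, (1, 4))
print(bits_per_byte(logits, targets))  # ≈ 8.0
```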
Critical Fix: AMP Stability
Problem: Float16 overflow in Hebbian Memory with low plasticity (α=0.1)
Solution: Force float32 computation for memory module
@torch.amp.autocast('cuda', enabled=False)
def forward(self, x):
    input_dtype = x.dtype
    x = x.float()  # Bypass AMP: keep fast-weight updates in float32
    # ... Hebbian computation ...
    return out.to(input_dtype)  # Cast back to the caller's dtype
This fix enables stable training for 20K+ steps with AMP enabled; a training-loop sketch follows below.
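For context, this is roughly how the fix sits inside an AMP training step. The sketch below uses placeholder model/optimizer/loss names; the point is that autocast wraps the whole step while the memory module opts out internally as shown above.

```python
import torch

scaler = torch.amp.GradScaler('cuda')  # scales the loss for float16 gradients

def train_step(model, batch, targets, optimizer, loss_fn):
    optimizer.zero_grad(set_to_none=True)
    # The whole forward/backward runs under autocast; modules that disable
    # autocast internally (the Hebbian memory above) stay in float32.
    with torch.amp.autocast('cuda'):
        logits = model(batch)
        loss = loss_fn(logits, targets)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()
```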
Documentation
- Architecture Guide - Technical deep dive
- Training Guide - Training from scratch
- Inference Guide - Generation and sampling
- API Reference - Code documentation
- RFC 007: Curriculum Learning - Phase 7 design
Model Files
- best_model_curriculum.pth - Best checkpoint (Val BPC: 1.78)
- last_model_curriculum.pth - Final model state (20K steps)
- metrics_curriculum.json - Full training metrics
Next Steps
Recommended Improvements
- Extended Training: 30K-50K steps for further convergence
- Larger Model: increase to d_model=768, n_layers=8 (see the sketch after this list)
- Longer Context: extend the context window to 2048
- Fine-tuning: domain-specific Turkish datasets
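If the scale-up is pursued, the settings above might be collected in a configuration like the following. The field names are hypothetical and do not correspond to an actual config object in this repo; only the values mirror the recommendations.

```python
from dataclasses import dataclass

@dataclass
class ScaledUpConfig:
    # Hypothetical field names; values follow the recommendations above.
    d_model: int = 768              # wider model
    n_layers: int = 8               # deeper stack
    context_len: int = 2048         # longer window
    total_steps: int = 50_000       # extended training
    plasticity_final: float = 0.99  # keep the adulthood-stage alpha
```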
Research Directions
- Adaptive plasticity scheduling
- Multi-stage curriculum optimization
- Cross-lingual transfer learning
- Sparse Hebbian memory
Citation
@software{agiformer2025,
title={AGIFORMER: Byte-Level Language Model with Hebbian Memory and Neuroplasticity},
author={inkbytefo},
year={2025},
note={Phase 7: Curriculum Learning with Dynamic Plasticity},
url={https://github.com/inkbytefo/agi-former}
}
License
MIT License - see LICENSE file for details.
Acknowledgments
- Built with PyTorch
- Turkish Wikipedia dataset (trwiki)
- Turkish Dictionary dataset (TDK)
- Inspired by Fast Weights, Linear Transformers, and developmental neuroscience
Developer: inkbytefo
Phase: 7 (Curriculum Learning & Neuroplasticity)
Status: Production Ready ✅
Last Updated: 2025-11-23