
AGIFORMER: Byte-Level Language Model with Neuroplasticity

Status: Phase 7 - Curriculum Learning ✅ Complete
Latest Achievement: 20K-step curriculum training with a 77% BPC reduction

A research implementation of a byte-level language model featuring:

  • 🧠 Hebbian Memory with dynamic neuroplasticity
  • 📚 Curriculum Learning (3-stage developmental approach)
  • 🔄 System 2 Reasoning (iterative thinking loop)
  • 🚀 Linear Complexity attention mechanism

Quick Start

Installation

pip install torch datasets tqdm

Training (Curriculum Learning)

python train_curriculum.py  # 20K steps, 3 curriculum stages

Inference

python generate.py best_model_curriculum.pth

Testing

python test_recall.py best_model_curriculum.pth  # Memory test
python inspect_reasoning.py                      # System 2 diagnostics

Architecture

Bytes → Encoder (RoPE) → Hebbian Memory → Reasoning Loop → Local RNN → Bytes
         (Patches)        (Dynamic λ)       (3 steps)        (Autoregressive)

Core Components

  • ByteLatentEncoder: Patches bytes into latent vectors with RoPE
  • HebbianMemory: Fast weights with learnable decay + neuroplasticity (α)
  • RecurrentReasoningBlock: 3-step iterative thinking loop (System 2)
  • LocalAutoregressiveHead: GRU-based byte decoder

See docs/architecture.md for technical details.
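
As a rough illustration of the fast-weight idea behind HebbianMemory (a minimal sketch; hebbian_memory_step and its default values are illustrative, not the repo's API), each step decays a fixed-size memory matrix, writes a key-value outer product scaled by the plasticity α, and reads it with the current query. Because the memory does not grow with the sequence, the overall cost stays linear in sequence length:

import torch

def hebbian_memory_step(M, q, k, v, lam=0.9, alpha=0.5):
    """One fast-weight step: decay the memory, write a k→v association, read with q.

    M: (d, d) fast-weight matrix; q, k, v: (d,) vectors.
    lam stands in for the learnable decay and alpha for the plasticity.
    """
    M = lam * M + alpha * torch.outer(k, v)  # Hebbian outer-product write
    out = q @ M                              # associative read: O(d^2) per step, O(N) per sequence
    return out, M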

Features

  • No Tokenization - Universal byte-level processing
  • Linear Complexity - O(N) attention with Hebbian memory
  • Neuroplasticity - Dynamic memory consolidation (α: 0.1 → 0.99)
  • Curriculum Learning - 3-stage developmental training
  • Active Reasoning - Verified thinking loop (Δz = 12.7)
  • AMP Compatible - Mixed precision training with stability fixes
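
The "Active Reasoning" figure (Δz = 12.7) refers to how far the latent state moves across the thinking iterations. A hypothetical diagnostic in the spirit of inspect_reasoning.py (names and the distance measure are assumptions, not the script's actual code):

def reasoning_delta(block, z, n_steps=3):
    """Run the iterative thinking loop and report the total latent displacement (Δz)."""
    total_delta = 0.0
    for _ in range(n_steps):
        z_next = block(z)                          # one System 2 refinement step
        total_delta += (z_next - z).norm().item()  # how much this step changed the latent
        z = z_next
    return z, total_delta                          # a near-zero Δz would mean the loop is inert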

Curriculum Learning (Phase 7)

Training Stages

Stage           Steps    Plasticity (α)   Data         Purpose
1. Childhood    0-3K     0.10             Dictionary   Lexical grounding
2. Youth        3K-8K    0.50             Stories      Syntactic scaffolding
3. Adulthood    8K-20K   0.99             Wikipedia    Semantic expansion
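
A minimal sketch of how this schedule could be driven during training (step thresholds taken from the table above; plasticity_for_step is an illustrative helper, not necessarily what train_curriculum.py defines):

def plasticity_for_step(step: int) -> float:
    """Map the global training step to the curriculum plasticity α."""
    if step < 3_000:      # Stage 1: Childhood (dictionary data)
        return 0.10
    elif step < 8_000:    # Stage 2: Youth (stories)
        return 0.50
    else:                 # Stage 3: Adulthood (Wikipedia)
        return 0.99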

Results (20K Steps - Turkish Training)

Metrics:

  • Final BPC: 1.85 (↓77% from initialization)
  • Best Val BPC: 1.78
  • Training Time: ~50 minutes (CUDA GPU)
  • Stability: 0 NaN occurrences across 20K steps

Progress:

Step 0:     BPC = 8.04  (Random initialization)
Step 5K:    BPC = 2.23  (Initial curriculum complete)
Step 10K:   BPC = 1.98  (Mid-training)
Step 20K:   BPC = 1.85  (Final)

Improvement: 6.19 BPC absolute reduction (77% relative)
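
For reference, BPC (bits per character) is assumed here to be the usual conversion of the mean per-byte cross-entropy loss from nats to bits:

import math

def bpc_from_loss(ce_loss_nats: float) -> float:
    """Convert mean per-byte cross-entropy (in nats) to bits per character."""
    return ce_loss_nats / math.log(2)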

Critical Fix: AMP Stability

Problem: Float16 overflow in Hebbian Memory with low plasticity (α=0.1)
Solution: Force float32 computation for memory module

@torch.amp.autocast('cuda', enabled=False)
def forward(self, x):
    input_dtype = x.dtype       # remember the caller's (possibly float16) dtype
    x = x.float()               # bypass AMP: run the Hebbian update in full float32
    # ... Hebbian computation producing `out` ...
    return out.to(input_dtype)  # cast back so downstream AMP layers are unaffected

This fix enables stable 20K+ step training with AMP enabled.
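
For context, a minimal sketch of the surrounding AMP training step this fix targets, using the standard torch.amp scaler/autocast pattern (model, loader, criterion, and optimizer stand in for whatever train_curriculum.py actually builds):

import torch

def train_amp_steps(model, loader, criterion, optimizer, device='cuda'):
    """Run AMP training steps; the Hebbian memory opts out of autocast internally (see above)."""
    scaler = torch.amp.GradScaler(device)
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad(set_to_none=True)
        with torch.amp.autocast(device):   # most layers run in reduced precision
            logits = model(x)              # the memory module re-enters float32
            loss = criterion(logits, y)
        scaler.scale(loss).backward()      # scale the loss to avoid fp16 gradient underflow
        scaler.step(optimizer)
        scaler.update()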

Documentation

Model Files

  • best_model_curriculum.pth - Best checkpoint (Val BPC: 1.78)
  • last_model_curriculum.pth - Final model state (20K steps)
  • metrics_curriculum.json - Full training metrics

Next Steps

Recommended Improvements

  1. Extended Training: 30K-50K steps for further convergence
  2. Larger Model: Scale up to d_model=768 and n_layers=8
  3. Longer Context: Extend to a 2048-token window
  4. Fine-tuning: Domain-specific Turkish datasets

Research Directions

  • Adaptive plasticity scheduling
  • Multi-stage curriculum optimization
  • Cross-lingual transfer learning
  • Sparse Hebbian memory

Citation

@software{agiformer2025,
  title={AGIFORMER: Byte-Level Language Model with Hebbian Memory and Neuroplasticity},
  author={inkbytefo},
  year={2025},
  note={Phase 7: Curriculum Learning with Dynamic Plasticity},
  url={https://github.com/inkbytefo/agi-former}
}

License

MIT License - see LICENSE file for details.

Acknowledgments

  • Built with PyTorch
  • Turkish Wikipedia dataset (trwiki)
  • Turkish Dictionary dataset (TDK)
  • Inspired by Fast Weights, Linear Transformers, and developmental neuroscience

Developer: inkbytefo
Phase: 7 (Curriculum Learning & Neuroplasticity)
Status: Production Ready ✅
Last Updated: 2025-11-23
