SparseFlow v8

Efficient language model with sparse attention and persistent memory.

📊 Measured Metrics

Metric              Value
Parameters          71,359,746
Perplexity          14.77
Attention Sparsity  87.5%
Channel Sparsity    75.0%
Peak Memory         3.67 GB
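For reference, perplexity is conventionally the exponential of the mean per-token cross-entropy (negative log-likelihood), so the reported 14.77 corresponds to a mean loss of roughly 2.69 nats per token. A minimal sketch of that relationship (illustrative only, not taken from this repository):

```python
import math

def perplexity(nll_per_token):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return math.exp(sum(nll_per_token) / len(nll_per_token))

# A mean loss of ~2.692 nats/token gives a perplexity of ~14.76,
# matching the reported value up to rounding.
print(perplexity([2.692] * 4))
```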

πŸ—οΈ Architecture

  • Sparse Token Attention: each position attends only to its top-64 highest-scoring tokens
  • Sparse Channel FFN: activates only the top-128 hidden channels per token
  • Persistent Memory: 20,000 memory vectors
  • 8 Transformer layers with a 512-dimensional hidden size
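The three mechanisms above can be sketched as follows. This is a hypothetical NumPy illustration of the general top-k idea, not the model's actual implementation; the function names are invented here, and the k values are scaled down from the real top-64 / top-128 / 20,000 figures to keep the demo small:

```python
import numpy as np

def topk_mask(scores, k):
    """Keep the k largest entries per row; set the rest to -inf (pre-softmax)."""
    kth = np.sort(scores, axis=-1)[..., -k][..., None]  # k-th largest per row
    return np.where(scores >= kth, scores, -np.inf)

def sparse_attention(q, k_mat, v, top_k=4):
    # Each query attends only to its top_k highest-scoring keys
    # (the model uses top-64; top_k=4 keeps this demo small).
    scores = q @ k_mat.T / np.sqrt(q.shape[-1])
    masked = topk_mask(scores, top_k)
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def sparse_channel_ffn(x, w1, w2, top_c=8):
    # Keep only the top_c largest hidden activations per token
    # (the model activates top-128 channels; top_c=8 here).
    h = np.maximum(x @ w1, 0.0)                      # ReLU hidden layer
    kth = np.sort(h, axis=-1)[..., -top_c][..., None]
    h = np.where(h >= kth, h, 0.0)                   # zero out non-top channels
    return h @ w2

def memory_lookup(q, memory, top_m=2):
    # Retrieve the top_m most similar persistent memory vectors per query
    # (stand-in for the model's 20,000-vector memory).
    sims = q @ memory.T
    idx = np.argsort(sims, axis=-1)[..., -top_m:]
    return memory[idx].mean(axis=-2)

rng = np.random.default_rng(0)
T, d, d_ff = 16, 32, 64
x = rng.standard_normal((T, d))
out_attn = sparse_attention(x, x, x)
out_ffn = sparse_channel_ffn(x, rng.standard_normal((d, d_ff)),
                             rng.standard_normal((d_ff, d)))
out_mem = memory_lookup(x, rng.standard_normal((100, d)))
print(out_attn.shape, out_ffn.shape, out_mem.shape)  # (16, 32) (16, 32) (16, 32)
```

Masking to -inf before the softmax means the discarded positions receive exactly zero attention weight, which is what yields the quoted 87.5% attention sparsity when only 64 of 512 context tokens survive.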

📚 Training Data

Trained on open-source datasets only:

  • GSM8K, MATH (mathematics)
  • ARC, OpenBookQA, SciQ (science & reasoning)
  • CommonsenseQA, PIQA (common sense)
  • TriviaQA, Natural Questions (factual)
  • WikiText-103 (language modeling)

πŸ‘¨β€πŸ’» Author

Mike Amega (Ame Web Studio)
