Girinath11/MixtureofRecursionwithRouter

Girinath11 committed on Sep 4

Commit 5a4d89b · verified · 1 Parent(s): 5e91a95

Update README.md

Files changed (1)

README.md +1 -1

@@ -21,7 +21,7 @@ MixtureofRecursionwithRouter is tailored for technical domains, combining:
 ->Custom Tokenizer: Byte-pair encoding (BPE) with special tokens for code, math, and conversation roles (e.g., <user>, <assistant>).
 ->Adaptive Embeddings: Token embeddings with configurable positional encodings (learned, sinusoidal, or RoPE).
 ->Recursive Transformer: Multi-layered architecture with a RecursionRouter to dynamically adjust computation steps based on input complexity.
-->Ultra-Fast Training: Optimized for low loss (<2.0) and perplexity (<12) in 4-5 hours using mixed precision and cosine scheduling.
+->Ultra-Fast Training: Optimized for low loss (<2.0) and perplexity (<12) using mixed precision and cosine scheduling.
 ## Model Details
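
The changed line mentions cosine scheduling without showing what it computes. As a minimal sketch only (not the repository's training code), a cosine learning-rate schedule with optional linear warmup could look like the following. The function name `cosine_lr` and the values of `base_lr`, `min_lr`, and `warmup_steps` are hypothetical and do not come from the README.

```python
import math


def cosine_lr(step, total_steps, base_lr=3e-4, min_lr=3e-5, warmup_steps=0):
    """Return the learning rate for `step` (0-indexed).

    Hypothetical helper for illustration: linear warmup over `warmup_steps`,
    then cosine decay from `base_lr` down to `min_lr` at `total_steps`.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if warmup_steps > 0 and step < warmup_steps:
        # Linear ramp that reaches base_lr on the last warmup step.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, max(0.0, (step - warmup_steps) / decay_steps))
    # Cosine factor goes from 1 (start of decay) to 0 (end of decay).
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (base_lr - min_lr) * cosine


if __name__ == "__main__":
    # Print a few points of an assumed 1000-step schedule with 100 warmup steps.
    for s in (0, 99, 100, 550, 1000):
        print(s, cosine_lr(s, total_steps=1000, warmup_steps=100))
```

In a real training loop the returned value would be written to the optimizer's learning rate on every step; mixed precision is a separate concern and is not shown here.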