Update README.md
README.md
CHANGED
@@ -21,7 +21,7 @@ MixtureofRecursionwithRouter is tailored for technical domains, combining:
 ->Custom Tokenizer: Byte-pair encoding (BPE) with special tokens for code, math, and conversation roles (e.g., <user>, <assistant>).
 ->Adaptive Embeddings: Token embeddings with configurable positional encodings (learned, sinusoidal, or RoPE).
 ->Recursive Transformer: Multi-layered architecture with a RecursionRouter to dynamically adjust computation steps based on input complexity.
-->Ultra-Fast Training: Optimized for low loss (<2.0) and perplexity (<12)
+->Ultra-Fast Training: Optimized for low loss (<2.0) and perplexity (<12) using mixed precision and cosine scheduling.
 
 ## Model Details
 
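
The added line mentions cosine scheduling without showing what that schedule looks like. The following is a minimal sketch of a cosine learning-rate schedule with linear warmup, not the repository's actual training code. The function name `cosine_lr` and every default value (`peak_lr`, `min_lr`, `warmup_steps`) are hypothetical and chosen only for illustration. Mixed precision is left out because it depends on the training framework in use.

```python
import math


def cosine_lr(step, total_steps, peak_lr=3e-4, min_lr=3e-5, warmup_steps=100):
    """Learning rate at `step`: linear warmup, then cosine decay.

    All default values are illustrative assumptions, not values taken
    from the MixtureofRecursionwithRouter README.
    """
    if total_steps <= warmup_steps:
        raise ValueError("total_steps must exceed warmup_steps")
    if step < warmup_steps:
        # Ramp linearly from peak_lr / warmup_steps up to peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to 1.0 past the end.
    progress = min(1.0, (step - warmup_steps) / (total_steps - warmup_steps))
    # Half-cosine from peak_lr (progress = 0) down to min_lr (progress = 1).
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for s in (0, 99, 100, 550, 1000):
        print(s, cosine_lr(s, total_steps=1000))
```

In a training loop, a function like this would be evaluated once per optimizer step and its result assigned to the optimizer's learning rate; most frameworks also ship an equivalent built-in scheduler.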