Girinath11 committed on
Commit 5a4d89b · verified · 1 Parent(s): 5e91a95

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -21,7 +21,7 @@ MixtureofRecursionwithRouter is tailored for technical domains, combining:
  ->Custom Tokenizer: Byte-pair encoding (BPE) with special tokens for code, math, and conversation roles (e.g., <user>, <assistant>).
  ->Adaptive Embeddings: Token embeddings with configurable positional encodings (learned, sinusoidal, or RoPE).
  ->Recursive Transformer: Multi-layered architecture with a RecursionRouter to dynamically adjust computation steps based on input complexity.
- ->Ultra-Fast Training: Optimized for low loss (<2.0) and perplexity (<12) in 4-5 hours using mixed precision and cosine scheduling.
+ ->Ultra-Fast Training: Optimized for low loss (<2.0) and perplexity (<12) using mixed precision and cosine scheduling.
 
  ## Model Details
 