---
license: mit
language:
- en
tags:
- causal-lm
- scientific-language-model
- mathematics
- arxiv
- research
library_name: transformers
---

# KiteFish-A1-1.5B

**KiteFish-A1-1.5B** is a ~1.5B-parameter decoder-only transformer trained from scratch on raw arXiv LaTeX sources across mathematics, computer science, and theoretical physics.

📄 **Paper:** https://arxiv.org/abs/2602.17288
💻 **GitHub:** https://github.com/kitefishai/KiteFish-A1-1.5B-Math

This is a **base scientific language model** (not instruction-tuned).

## Overview

KiteFish-A1-1.5B explores what it takes to train a domain-specialized scientific language model directly from structured LaTeX archives.

**Training Scale**
- ~52B pretraining tokens
- ~5B additional post-training tokens
- ~200 GB processed scientific corpus
- LLaMA-compatible tokenizer (~102k vocab; see the sketch below)
- 2× NVIDIA A100 (80 GB) GPUs
- 24 experimental training runs

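Because the tokenizer is LLaMA-compatible, it can be loaded and inspected with the standard `transformers` API. A minimal sketch, assuming the tokenizer ships with the model repository used in the usage example below and that the ~102k vocabulary size is approximate:

```python
from transformers import AutoTokenizer

# Assumes the tokenizer is published alongside the weights at this repo id.
tokenizer = AutoTokenizer.from_pretrained("KiteFishAI/KiteFish-A1-1.5B-Math")

print(len(tokenizer))  # expected to be on the order of ~102k entries

# Inspect how a LaTeX fragment is segmented by the scientific tokenizer.
print(tokenizer.tokenize(r"\int_0^1 x^2 \, dx = \frac{1}{3}"))
```
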
The focus of this project is *scientific language modeling robustness*, not benchmark optimization.

## Model Architecture

- 24 Transformer layers
- Hidden size: 2048
- FFN size: 5504
- 16 attention heads
- Context length: 4096 (trained with 768-token sequences)
- Dense LLaMA-style architecture

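These numbers map onto a standard LLaMA-style configuration. A minimal sketch of the equivalent `LlamaConfig`; the vocabulary size is approximate and the positional-embedding and normalization details are assumptions, so treat the repository's `config.json` as authoritative:

```python
from transformers import LlamaConfig

# Approximate reconstruction of the architecture listed above; not the
# official config file shipped with the checkpoint.
config = LlamaConfig(
    num_hidden_layers=24,
    hidden_size=2048,
    intermediate_size=5504,
    num_attention_heads=16,
    max_position_embeddings=4096,  # trained with 768-token sequences
    vocab_size=102400,             # "~102k" per the card; exact size may differ
)
print(config)
```
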
**Optimization**
- AdamW
- Learning rate: 2e-4
- Warmup: 500 steps
- Weight decay: 0.1
- Gradient accumulation: 32
- bf16 mixed precision
- Gradient checkpointing enabled

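A rough sketch of how this recipe translates into PyTorch and `transformers` calls; the schedule shape and total step count are not stated on this card, so the linear schedule and horizon below are placeholders:

```python
import torch
from transformers import AutoModelForCausalLM, get_linear_schedule_with_warmup

model = AutoModelForCausalLM.from_pretrained("KiteFishAI/KiteFish-A1-1.5B-Math")
model.gradient_checkpointing_enable()  # gradient checkpointing, as listed above

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4, weight_decay=0.1)
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=500,
    num_training_steps=100_000,  # placeholder; the true horizon is not stated
)
# bf16 mixed precision and gradient accumulation of 32 would be handled by the
# training loop or a Trainer (bf16=True, gradient_accumulation_steps=32).
```
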
**Validation Perplexity:** ~4.2 (held-out scientific corpus)

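The held-out corpus itself is not distributed with this card, but perplexity on any scientific text can be reproduced from the standard causal-LM loss; a minimal sketch:

```python
import math
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "KiteFishAI/KiteFish-A1-1.5B-Math"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

# Stand-in text; substitute your own held-out LaTeX for a meaningful number.
text = r"Let $f, g \colon \mathbb{R} \to \mathbb{R}$ be continuous functions."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    loss = model(**inputs, labels=inputs["input_ids"]).loss  # mean token NLL

print(f"perplexity: {math.exp(loss.item()):.2f}")
```
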
## Intended Use

KiteFish-A1-1.5B is suitable for:

- Scientific text modeling research
- Mathematical language modeling experiments
- Pretraining initialization for domain fine-tuning
- Tokenization and symbolic modeling research
- Studying LaTeX structure modeling

It is **not optimized for:**

- Instruction following
- Chat-based applications
- General conversational AI
- Benchmark leaderboard performance

## Performance Notes

This model was trained under moderate compute constraints and without instruction tuning or alignment stages.

Observed characteristics:

- Strong familiarity with scientific writing style
- Stable LaTeX structural modeling
- Reasonable symbolic fluency
- Limited reasoning depth
- Low downstream benchmark accuracy without fine-tuning

Performance improves significantly with supervised fine-tuning (SFT), LoRA adaptation, or domain-specific instruction tuning.

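For example, a LoRA adapter can be attached with the `peft` library before running SFT on domain data. A minimal sketch, where the rank, alpha, dropout, and the LLaMA-style projection names in `target_modules` are illustrative assumptions rather than recommended settings:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("KiteFishAI/KiteFish-A1-1.5B-Math")

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # LLaMA-style names
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
# Train `model` with your usual SFT loop (or TRL's SFTTrainer) on domain data.
```
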
## Limitations

- Not instruction-tuned
- No RLHF or preference alignment
- Trained at 768-token sequence length
- Domain restricted to selected arXiv categories
- Not optimized for reasoning benchmarks
- General NLP benchmark scores may be low

This release is intended primarily for research and experimentation.

## Example Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "KiteFishAI/KiteFish-A1-1.5B-Math"

# Load the base model; no chat template applies (this is not an instruct model).
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# The model continues the prompt as plain scientific text.
prompt = "Prove that the sum of two continuous functions is continuous."
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=200)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Citation

If you use this model in your research, please cite:

```bibtex
@article{kitefish_a1_2026,
  title={KiteFish-A1: Training a Scientific Language Model from Raw LaTeX Archives},
  author={...},
  year={2026},
  eprint={2602.17288},
  archivePrefix={arXiv}
}
```