---
license: mit
language:
- en
tags:
- causal-lm
- scientific-language-model
- mathematics
- arxiv
- research
library_name: transformers
---
# KiteFish-A1-1.5B
**KiteFish-A1-1.5B** is a ~1.5B parameter decoder-only transformer trained from scratch on raw arXiv LaTeX sources across mathematics, computer science, and theoretical physics.
📄 **Paper:** https://arxiv.org/abs/2602.17288
💻 **GitHub:** https://github.com/kitefishai/KiteFish-A1-1.5B-Math
This is a **base scientific language model** (not instruction-tuned).
## Overview
KiteFish-A1-1.5B explores what it takes to train a domain-specialized scientific language model directly from structured LaTeX archives.
**Training Scale**
- ~52B pretraining tokens
- ~5B additional post-training tokens
- ~200GB processed scientific corpus
- LLaMA-compatible tokenizer (~102k vocab)
- 2× NVIDIA A100 (80GB) GPUs
- 24 experimental training runs
The focus of this project is *scientific language modeling robustness*, not benchmark optimization.
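The tokenizer can be inspected independently of the model. The snippet below is a small illustration (not taken from the original training code) of loading the released tokenizer and checking how it segments a LaTeX fragment; the repository id is the one used in the usage example further down, and the exact vocabulary size may differ slightly from the ~102k figure above.
```python
from transformers import AutoTokenizer

model_id = "KiteFishAI/KiteFish-A1-1.5B-Math"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Vocabulary size should be in the ~102k range noted above.
print("vocab size:", tokenizer.vocab_size)

# Inspect how a short LaTeX fragment is segmented into tokens.
latex = r"\int_0^1 x^2 \, dx = \frac{1}{3}"
tokens = tokenizer.tokenize(latex)
print(tokens, len(tokens))
```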
## Model Architecture
- 24 Transformer layers
- Hidden size: 2048
- FFN size: 5504
- 16 attention heads
- Context length: 4096 (trained with 768-token sequences)
- Dense LLaMA-style architecture (see the config sketch below)
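Assuming the standard Hugging Face LLaMA implementation (the exact shipped configuration is not reproduced here), the dimensions above would map onto a config roughly like the following sketch; values not listed in the card, such as `num_key_value_heads`, are guesses.
```python
from transformers import LlamaConfig

# Sketch only: maps the architecture bullets above onto a LLaMA-style config.
# Values not stated in the card are placeholders, not the shipped configuration.
config = LlamaConfig(
    vocab_size=102_000,            # "~102k vocab" (approximate)
    hidden_size=2048,
    intermediate_size=5504,        # FFN size
    num_hidden_layers=24,
    num_attention_heads=16,
    num_key_value_heads=16,        # assumption: full multi-head attention, no GQA
    max_position_embeddings=4096,  # context length (training used 768-token sequences)
)
print(config)
```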
**Optimization**
- AdamW
- Learning rate: 2e-4
- Warmup: 500 steps
- Weight decay: 0.1
- Gradient accumulation: 32
- bf16 mixed precision
- Gradient checkpointing enabled (see the training-arguments sketch below)
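One way to express these settings in a standard Hugging Face training setup is sketched below; `per_device_train_batch_size` and `max_steps` are illustrative placeholders, since the card does not state the actual batch size or total step count.
```python
from transformers import TrainingArguments

# Sketch: the optimization settings above expressed as TrainingArguments.
training_args = TrainingArguments(
    output_dir="kitefish-a1-pretrain",
    optim="adamw_torch",
    learning_rate=2e-4,
    warmup_steps=500,
    weight_decay=0.1,
    gradient_accumulation_steps=32,
    bf16=True,
    gradient_checkpointing=True,
    per_device_train_batch_size=4,  # placeholder, not the value actually used
    max_steps=100_000,              # placeholder, not the value actually used
    logging_steps=100,
)
```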
**Validation Perplexity:** ~4.2 (held-out scientific corpus)
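For reference, held-out perplexity for a causal language model is the exponential of the mean token-level cross-entropy. A minimal sketch of that computation (using a stand-in string rather than the actual evaluation corpus) looks like this:
```python
import math
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "KiteFishAI/KiteFish-A1-1.5B-Math"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

# Any held-out scientific text would go here; this string is just a stand-in.
text = r"Let $f, g : \mathbb{R} \to \mathbb{R}$ be continuous functions."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Passing labels makes the model return the mean cross-entropy loss.
    loss = model(**inputs, labels=inputs["input_ids"]).loss

print("perplexity:", math.exp(loss.item()))
```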
## Intended Use
KiteFish-A1-1.5B is suitable for:
- Scientific text modeling research
- Mathematical language modeling experiments
- Pretraining initialization for domain fine-tuning
- Tokenization and symbolic modeling research
- Studying LaTeX structure modeling
It is **not optimized for:**
- Instruction following
- Chat-based applications
- General conversational AI
- Benchmark leaderboard performance
## Performance Notes
This model was trained under moderate compute constraints and without instruction tuning or alignment stages.
Observed characteristics:
- Strong familiarity with scientific writing style
- Stable LaTeX structural modeling
- Reasonable symbolic fluency
- Limited reasoning depth
- Low downstream benchmark accuracy without fine-tuning
Performance improves significantly with supervised fine-tuning (SFT), LoRA adaptation, or domain-specific instruction tuning.
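As a starting point for the LoRA route, a minimal sketch with the `peft` library is shown below. The target module names assume the standard LLaMA projection naming (`q_proj`, `k_proj`, `v_proj`, `o_proj`), and the rank/alpha values are illustrative rather than recommendations from the authors.
```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("KiteFishAI/KiteFish-A1-1.5B-Math")

# Sketch: wrap the base model with LoRA adapters before supervised fine-tuning.
lora_config = LoraConfig(
    r=16,                 # illustrative rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```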
## Limitations
- Not instruction-tuned
- No RLHF or preference alignment
- Trained at 768-token sequence length
- Domain restricted to selected arXiv categories
- Not optimized for reasoning benchmarks
- General NLP benchmark scores may be low
This release is intended primarily for research and experimentation.
## Example Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
model_id = "KiteFishAI/KiteFish-A1-1.5B-Math"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
prompt = "Prove that the sum of two continuous functions is continuous."
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Citation
If you use this model in your research, please cite:
```
@article{kitefish_a1_2026,
  title={KiteFish-A1: Training a Scientific Language Model from Raw LaTeX Archives},
  author={...},
  year={2026},
  eprint={2602.17288},
  archivePrefix={arXiv}
}
```