---
license: mit
language:
- en
tags:
- causal-lm
- scientific-language-model
- arxiv
- mathematics
- research
library_name: transformers
---

# KiteFish-A1-1.5B

KiteFish-A1-1.5B is a ~1.5B parameter decoder-only transformer trained from scratch on raw arXiv LaTeX sources spanning mathematics, computer science, and theoretical physics.

This model is a **base scientific language model** and is not instruction-tuned.

---

## Overview

KiteFish-A1-1.5B was trained using approximately:

- **52.18B pretraining tokens**
- **5B post-training tokens**
- ~200GB of processed scientific corpus
- LLaMA-compatible tokenizer (~102k vocab)
- 2× NVIDIA A100 (80GB) GPUs
- 24 experimental runs for optimization stability

The goal of this model is to explore the practical challenges of training a domain-specialized scientific language model from raw LaTeX archives.

---

## Intended Use

This model is intended for:

- Scientific text modeling research
- Mathematical language modeling experiments
- Pretraining initialization for domain-specific fine-tuning
- Tokenization and symbolic modeling research

This model is **not optimized for:**

- General conversational AI
- Instruction following
- Chat-based interaction
- Benchmark competition

---

## Performance Notes

This is a base model trained from scratch under moderate compute constraints.

Observed characteristics:

- Strong familiarity with scientific writing style
- Stable LaTeX structure modeling
- Limited instruction-following ability
- Limited reasoning depth compared to large instruction-tuned models
- Modest downstream benchmark accuracy without fine-tuning

Users are encouraged to apply supervised fine-tuning (SFT) or LoRA-based adaptation for improved task performance.
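
A minimal LoRA sketch using the PEFT library is shown below. The rank, alpha, dropout, and target module names are illustrative assumptions (standard LLaMA-style attention projections), not settings published by the model authors.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load the base checkpoint (same id as in the usage example further down).
model = AutoModelForCausalLM.from_pretrained("KiteFishAI/KiteFish-A1-1.5B-Math")

# Illustrative LoRA hyperparameters; tune rank, alpha, and dropout for your task.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumes LLaMA-style module names
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```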

---

## Training Details

**Architecture**
- 24 layers
- Hidden size: 2048
- FFN size: 5504
- 16 attention heads
- Context length: 4096 (trained at 768 tokens)
- Dense LLaMA-style transformer
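
As a rough orientation, the dimensions above map onto a standard `LlamaConfig` along the following lines. This is a sketch, not the released checkpoint's configuration: the vocabulary size is an assumption from the "~102k" figure, and the shipped `config.json` is authoritative.

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Hypothetical configuration mirroring the numbers listed above.
config = LlamaConfig(
    vocab_size=102_400,             # assumption based on the ~102k-entry tokenizer
    hidden_size=2048,
    intermediate_size=5504,
    num_hidden_layers=24,
    num_attention_heads=16,
    max_position_embeddings=4096,   # context window; pretraining used 768-token sequences
)

# Randomly initialized model with this shape (for inspection only).
model = LlamaForCausalLM(config)
```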

**Optimization**
- AdamW
- Learning rate: 2e-4
- Warmup: 500 steps
- Weight decay: 0.1
- Gradient accumulation: 32
- Gradient checkpointing enabled
- Mixed precision (bf16)
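
These hyperparameters translate roughly into Hugging Face `TrainingArguments` as sketched below, assuming a standard `Trainer`-style pipeline. Per-device batch size, scheduler, and total steps are not stated in this card, so they are left at defaults; the output path is hypothetical.

```python
from transformers import TrainingArguments

# Sketch of the listed optimization settings; not the authors' training script.
training_args = TrainingArguments(
    output_dir="kitefish-a1-pretrain",  # hypothetical output path
    optim="adamw_torch",                # AdamW
    learning_rate=2e-4,
    warmup_steps=500,
    weight_decay=0.1,
    gradient_accumulation_steps=32,
    gradient_checkpointing=True,
    bf16=True,                          # mixed precision
)
```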

**Validation Perplexity**
- ~4.2 on held-out scientific corpus
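
Perplexity here is the exponential of the mean next-token cross-entropy on held-out text. A minimal sketch for measuring it on text of your choice is shown below; the evaluation text is a placeholder, not the authors' evaluation setup.

```python
import math

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "KiteFishAI/KiteFish-A1-1.5B-Math"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

# Placeholder evaluation text; substitute your own held-out documents.
text = r"\begin{theorem} Every bounded monotone sequence of real numbers converges. \end{theorem}"
enc = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # The model shifts labels internally and returns the mean cross-entropy loss.
    loss = model(**enc, labels=enc["input_ids"]).loss

print(f"perplexity: {math.exp(loss.item()):.2f}")
```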

---

## Limitations

- Not instruction-tuned
- Limited reasoning capabilities
- Trained at 768-token sequence length
- Domain restricted to selected arXiv categories
- No RLHF or preference alignment
- Not benchmark-optimized

Performance on general NLP benchmarks may be low.

---

## Example Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "KiteFishAI/KiteFish-A1-1.5B-Math"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# As a base model, KiteFish continues text rather than following instructions,
# so the output is a free-form continuation of the prompt.
prompt = "Prove that the sum of two continuous functions is continuous."
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=200)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```