SpoomplesMaxx Base — Qwen3-14B CPT
A continued pre-training (CPT) of Qwen3-14B-Base on a curated mix of fiction, character knowledge, prose, and domain-specific corpora. This release is the CPT checkpoint; further SFT and DPO stages follow.
Model Description
This model is part of the SpoomplesMaxx training pipeline: CPT → SFT → DPO
The CPT stage teaches the model general language patterns, domain knowledge, and writing styles by training on raw text corpora without chat templates. It grounds the model in character knowledge, narrative prose, multilingual content, and uncensored language before instruction tuning.
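As a rough illustration (not the actual pipeline code), CPT documents are tokenized as plain continuation text with no chat template or role tokens; those only appear later, at the SFT stage. The sample text below is made up for illustration.

```python
# Illustrative only: CPT samples are raw text, tokenized without a chat template.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-14B-Base")

sample = "Elara pressed her palm to the cold iron gate and whispered the old word."
input_ids = tokenizer(sample)["input_ids"]  # plain token IDs, ready for packing
```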
Training Data
CPT Curriculum (3 phases)
The prepared dataset (aimeri/spoomplesmaxx-cpt-small-Qwen3-14B-Base) was assembled from three curriculum phases, each with different repeat factors to control emphasis.
| Phase | Focus | Repeat | Key Sources |
|---|---|---|---|
| Phase 1: Core Knowledge | Characters, lore, world-building | 2× | Custom character cards, AO3 works, fictional character DBpedia, NSFW prose (Literotica, nsfwstory, NSFW-Stories), movie spoilers |
| Phase 2: Domain Prose | Writing quality, narrative style | 3× | Gutenberg prose, LongPage (long-form + planning traces), light novels, TV dialogue, FimFiction, TV Tropes, Brazilian news/law, Huberman Lab transcripts |
| Phase 3: Language Diversity | Robustness, multilingual | 1× | Toxic conversations (pile-toxicity-balanced series), Fandom wiki lore |
Total training samples: 50,000 (after applying dataset_limit; sequences repacked to 3072 tokens)
The prepared dataset is pre-tokenized and publicly available on HuggingFace. Private data (custom character cards) is included in the tokenized form only.
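For illustration, a rough sketch of how a curriculum with per-phase repeat factors could be assembled with 🤗 Datasets. The phase file names and columns below are hypothetical; the published aimeri/spoomplesmaxx-cpt-small-Qwen3-14B-Base dataset already has the repeats applied and is pre-tokenized and repacked.

```python
# Sketch of curriculum assembly with per-phase repeat factors (hypothetical files).
from datasets import load_dataset, concatenate_datasets

phases = [
    ("phase1_core_knowledge.jsonl", 2),      # characters, lore, world-building
    ("phase2_domain_prose.jsonl", 3),        # narrative style, long-form prose
    ("phase3_language_diversity.jsonl", 1),  # toxic conversations, fandom wiki lore
]

parts = []
for path, repeat in phases:
    ds = load_dataset("json", data_files=path, split="train")
    parts.extend([ds] * repeat)              # repeat factor controls emphasis

curriculum = concatenate_datasets(parts).shuffle(seed=42)
```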
Training Configuration
| Parameter | Value |
|---|---|
| Base Model | Qwen/Qwen3-14B-Base |
| Training Phase | CPT (Continued Pre-Training) |
| Epochs | 2 |
| Steps | ~782 |
| Batch Size (per device) | 4 |
| Gradient Accumulation | 8 |
| Effective Batch Size | 128 (4 × 8 × 4 GPUs) |
| Learning Rate | 3e-5 |
| LR Scheduler | Cosine with min LR (min_lr_rate=0.01 → floor 3e-7) |
| Warmup Ratio | 0.0 |
| Weight Decay | 0.1 |
| Max Gradient Norm | 1.0 |
| Max Sequence Length | 3072 |
| Precision | BF16 |
| Optimizer | 8-bit Paged AdamW |
| Gradient Checkpointing | Yes |
| Liger Kernel | Yes (fused lm_head + cross-entropy) |
| Dataset Repacking | Yes (stream mode, 3072 tokens) |
| DeepSpeed | ZeRO-3 (full parameter sharding) |
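A minimal sketch of a `TrainingArguments` mirroring the table above, assuming a recent 🤗 Transformers release (the `cosine_with_min_lr` scheduler, `use_liger_kernel`, and paged 8-bit AdamW all require newer versions). The output directory, DeepSpeed config path, and the surrounding training script are assumptions, not the actual pipeline.

```python
# Sketch only: hyperparameters from the table above; paths are hypothetical.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="spoomplesmaxx-cpt",
    num_train_epochs=2,
    per_device_train_batch_size=4,              # x 8 grad accum x 4 GPUs = 128 effective
    gradient_accumulation_steps=8,
    learning_rate=3e-5,
    lr_scheduler_type="cosine_with_min_lr",
    lr_scheduler_kwargs={"min_lr_rate": 0.01},  # floor = 3e-5 * 0.01 = 3e-7
    warmup_ratio=0.0,
    weight_decay=0.1,
    max_grad_norm=1.0,
    bf16=True,
    optim="paged_adamw_8bit",
    gradient_checkpointing=True,
    use_liger_kernel=True,                      # fused lm_head + cross-entropy
    deepspeed="ds_zero3.json",                  # hypothetical ZeRO-3 config path
)
```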
Hardware
- GPU: 4× NVIDIA H100
- Training Time: ~14 hours
Metrics
| Metric | Value |
|---|---|
| Train Loss | 1.735 |
| Train Perplexity | 5.668 |
| Samples/sec | 1.964 |
| Total FLOPs | 1.26 × 10¹⁸ |
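For reference, the reported perplexity is just the exponential of the mean train loss:

```python
import math
print(math.exp(1.735))  # ≈ 5.67, matching the reported train perplexity
```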
Intended Use
This CPT base model is intended as a foundation for the SpoomplesMaxx pipeline:
- Next step: SFT with chat/instruction data and persona injection
- Final step: DPO alignment with preference pairs
Use cases after full pipeline:
- Creative writing and fiction generation
- Character roleplay with consistent personas
- Uncensored conversational AI
- Multilingual content (English, Portuguese, some Italian/Spanish)
Not Recommended For:
- Direct use as a chat assistant (this is a base model — no instruction tuning yet)
- Factual Q&A or knowledge retrieval (CPT emphasizes narrative over factuality)
- Production safety-critical applications
Limitations
- This is a base model without instruction following. It will continue text, not answer questions.
- Domain knowledge is biased toward fiction, character knowledge, and creative writing.
- Contains uncensored/NSFW training data — outputs may include explicit content.
- Multilingual content is weighted toward English with some Portuguese/Brazilian content.
- 3072 token context window during CPT (base model supports 128K; longer contexts untested post-CPT).
How to Use
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "aimeri/spoomplesmaxx-base-qwen3-14b",
    dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("aimeri/spoomplesmaxx-base-qwen3-14b")

# Text continuation (base model, no chat template)
inputs = tokenizer("The castle stood silent against the darkening sky, its towers reaching toward clouds that promised rain. Inside,", return_tensors="pt").to(model.device)
# do_sample=True so that temperature/top_p actually take effect
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.8, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
Citation
If you use this model, please cite the base model and datasets:
```bibtex
@misc{qwen3-14b-base,
  title={Qwen3 Technical Report},
  author={Qwen Team},
  year={2025},
  url={https://huggingface.co/Qwen/Qwen3-14B-Base}
}
```
Acknowledgments
- Qwen Team for the Qwen3-14B-Base model
- PocketDoc for the Dans-Prosemaxx datasets, the DanChat format, and related datasets
- PJMixers-Dev for curated fiction and RP datasets
- HuggingFace for the Transformers and Accelerate libraries
- DeepSpeed for ZeRO-3 distributed training