SpoomplesMaxx Base — Qwen3-14B CPT

A continued pre-training (CPT) of Qwen3-14B-Base on a curated mix of fiction, character knowledge, prose, and domain-specific corpora. This is the base model — further SFT and DPO stages follow.

Model Description

This model is part of the SpoomplesMaxx training pipeline: CPT → SFT → DPO

The CPT stage teaches the model general language patterns, domain knowledge, and writing styles by training on raw text corpora without chat templates. It grounds the model in character knowledge, narrative prose, multilingual content, and uncensored language before instruction tuning.

Training Data

CPT Curriculum (3 phases)

The prepared dataset (aimeri/spoomplesmaxx-cpt-small-Qwen3-14B-Base) was assembled from three curriculum phases, each with different repeat factors to control emphasis.

| Phase | Focus | Repeat | Key Sources |
|---|---|---|---|
| Phase 1: Core Knowledge | Characters, lore, world-building | – | Custom character cards, AO3 works, fictional character DBpedia, NSFW prose (Literotica, nsfwstory, NSFW-Stories), movie spoilers |
| Phase 2: Domain Prose | Writing quality, narrative style | – | Gutenberg prose, LongPage (long-form + planning traces), light novels, TV dialogue, FimFiction, TV Tropes, Brazilian news/law, Huberman Lab transcripts |
| Phase 3: Language Diversity | Robustness, multilingual | – | Toxic conversations (pile-toxicity-balanced series), Fandom wiki lore |
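
To make the repeat-factor idea concrete, here is a purely illustrative sketch: repeating one phase's data before mixing upweights it relative to the others. The file names and repeat counts below are placeholders, not the actual sources or weights used.

from datasets import load_dataset, concatenate_datasets

# Hypothetical illustration of repeat-factor emphasis: Phase 1 is repeated 3x
# before mixing, so it contributes proportionally more tokens to the CPT run.
phase1 = load_dataset("json", data_files="phase1_core_knowledge.jsonl", split="train")
phase2 = load_dataset("json", data_files="phase2_domain_prose.jsonl", split="train")

mixed = concatenate_datasets([phase1] * 3 + [phase2]).shuffle(seed=42)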

Total training samples: 50,000 (after applying dataset_limit, repacked into 3072-token sequences)

The prepared dataset is pre-tokenized and publicly available on Hugging Face. Private data (custom character cards) is included only in tokenized form.
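
If you want to inspect the training data, a minimal sketch for pulling the prepared dataset from the Hub follows. The split and column names (e.g. input_ids) are assumptions; check the dataset card for the actual schema.

from datasets import load_dataset

# Load the prepared, pre-tokenized CPT dataset from the Hugging Face Hub.
# Split and column names ("train", "input_ids") are assumed; verify against the dataset viewer.
ds = load_dataset("aimeri/spoomplesmaxx-cpt-small-Qwen3-14B-Base", split="train")

print(ds)                          # row count and column names
print(len(ds[0]["input_ids"]))     # expected 3072 if samples are repacked to fixed-length blocks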

Training Configuration

| Parameter | Value |
|---|---|
| Base Model | Qwen/Qwen3-14B-Base |
| Training Phase | CPT (Continued Pre-Training) |
| Epochs | 2 |
| Steps | ~782 |
| Batch Size (per device) | 4 |
| Gradient Accumulation | 8 |
| Effective Batch Size | 128 (4 × 8 × 4 GPUs) |
| Learning Rate | 3e-5 |
| LR Scheduler | Cosine with min LR (min_lr_rate=0.01 → floor 3e-7) |
| Warmup Ratio | 0.0 |
| Weight Decay | 0.1 |
| Max Gradient Norm | 1.0 |
| Max Sequence Length | 3072 |
| Precision | BF16 |
| Optimizer | 8-bit Paged AdamW |
| Gradient Checkpointing | Yes |
| Liger Kernel | Yes (fused lm_head + cross-entropy) |
| Dataset Repacking | Yes (stream mode, 3072 tokens) |
| DeepSpeed | ZeRO-3 (full parameter sharding) |
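
For orientation, the table above maps fairly directly onto Hugging Face TrainingArguments. The sketch below is a reconstruction for reference only, not the actual training script; output_dir and the DeepSpeed config path are placeholders, and the scheduler, optimizer, and Liger options assume a recent transformers release.

from transformers import TrainingArguments

# Approximate reconstruction of the CPT hyperparameters above.
# Effective batch size: 4 (per device) x 8 (grad accum) x 4 (GPUs) = 128 sequences/step.
# Optimizer steps: 50,000 samples x 2 epochs / 128 ≈ 782, matching "~782" above.
args = TrainingArguments(
    output_dir="spoomplesmaxx-cpt",              # placeholder
    num_train_epochs=2,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    learning_rate=3e-5,
    lr_scheduler_type="cosine_with_min_lr",
    lr_scheduler_kwargs={"min_lr_rate": 0.01},   # LR floor = 3e-5 * 0.01 = 3e-7
    warmup_ratio=0.0,
    weight_decay=0.1,
    max_grad_norm=1.0,
    bf16=True,
    optim="paged_adamw_8bit",
    gradient_checkpointing=True,
    use_liger_kernel=True,                       # requires the liger-kernel package
    deepspeed="ds_zero3.json",                   # placeholder ZeRO-3 config path
)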

Hardware

  • GPU: 4× NVIDIA H100
  • Training Time: ~14 hours

Metrics

| Metric | Value |
|---|---|
| Train Loss | 1.735 |
| Train Perplexity | 5.668 |
| Samples/sec | 1.964 |
| Total FLOPs | 1.26 × 10¹⁸ |
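
The reported perplexity is simply the exponential of the mean cross-entropy train loss, which is easy to check:

import math

# perplexity = exp(mean cross-entropy loss)
print(math.exp(1.735))  # ≈ 5.67, consistent with the reported train perplexity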

Intended Use

This CPT base model is intended as a foundation for the SpoomplesMaxx pipeline:

  • Next step: SFT with chat/instruction data and persona injection
  • Final step: DPO alignment with preference pairs

Use cases after full pipeline:

  • Creative writing and fiction generation
  • Character roleplay with consistent personas
  • Uncensored conversational AI
  • Multilingual content (English, Portuguese, some Italian/Spanish)

Not Recommended For:

  • Direct use as a chat assistant (this is a base model — no instruction tuning yet)
  • Factual Q&A or knowledge retrieval (CPT emphasizes narrative over factuality)
  • Production safety-critical applications

Limitations

  • This is a base model without instruction following. It will continue text, not answer questions.
  • Domain knowledge is biased toward fiction, character knowledge, and creative writing.
  • Contains uncensored/NSFW training data — outputs may include explicit content.
  • Multilingual content is weighted toward English with some Portuguese/Brazilian content.
  • 3072-token context window during CPT (Qwen3-14B-Base natively supports a 32K context; longer contexts are untested post-CPT).

How to Use

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "aimeri/spoomplesmaxx-base-qwen3-14b",
    dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("aimeri/spoomplesmaxx-base-qwen3-14b")

# Text continuation (base model — no chat template)
inputs = tokenizer("The castle stood silent against the darkening sky, its towers reaching toward clouds that promised rain. Inside,", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.8, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
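
Passing do_sample=True forces sampled decoding regardless of the model's default generation config; without sampling, temperature and top_p are ignored. For creative continuation, higher temperature generally trades coherence for variety.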

Citation

If you use this model, please cite the base model and datasets:

@misc{qwen3-14b-base,
  title={Qwen3 Technical Report},
  author={Qwen Team},
  year={2025},
  url={https://huggingface.co/Qwen/Qwen3-14B-Base}
}

Acknowledgments

  • Qwen Team for the Qwen3-14B-Base model
  • PocketDoc for the Dans-Prosemaxx datasets, the DanChat format, and related datasets
  • PJMixers-Dev for curated fiction and RP datasets
  • Hugging Face for the Transformers and Accelerate libraries
  • DeepSpeed for ZeRO-3 distributed training
