aimeri committed · Commit 7eb1e01 · verified · 1 Parent(s): ff8747c

Upload folder using huggingface_hub
README.md DELETED
@@ -1,125 +0,0 @@
- ---
- license: mit
- base_model: Qwen/Qwen3-14B-Base
- tags:
- - cpt
- - continued-pretraining
- - roleplay
- - creative-writing
- - character-cards
- - fiction
- datasets:
- - nyuuzyou/fandom
- - gryffindor-ISWS/dbpedia_abstracts_fictional_characters_with_img
- - aimeri/ao3_works
- language:
- - en
- pipeline_tag: text-generation
- ---
-
- # SpoomplesMaxx Base
-
- A continued pre-training (CPT) checkpoint of [Qwen/Qwen3-14B-Base](https://huggingface.co/Qwen/Qwen3-14B-Base) fine-tuned on creative writing and roleplay data.
-
- ## Model Description
-
- This model is part of the SpoomplesMaxx training pipeline: **CPT → SFT → DPO**.
-
- The CPT stage teaches the model:
- - Character understanding and portrayal
- - Creative fiction writing patterns
- - Fandom/wiki-style lore knowledge
- - Dialogue patterns for roleplay
-
- ## Training Data
-
- ### Phase 1: Core Knowledge
-
- This checkpoint was trained on data focused on character knowledge and lore:
-
- | Dataset | Source | Samples | Description |
- |---------|--------|---------|-------------|
- | **Private Dataset** | Private | ~100k (50,000 sampled) | SillyTavern-style character cards with personality, scenario, and example dialogue, plus fanfics, essays about media and characters, short novels, and high-quality roleplay data |
- | [nyuuzyou/fandom](https://huggingface.co/datasets/nyuuzyou/fandom) | HuggingFace | 50,000 (sampled) | Fandom wiki articles with character/world lore |
- | [gryffindor-ISWS/dbpedia_abstracts_fictional_characters_with_img](https://huggingface.co/datasets/gryffindor-ISWS/dbpedia_abstracts_fictional_characters_with_img) | HuggingFace | 50,000 (sampled) | DBpedia abstracts of fictional characters |
-
- **Total training samples:** ~46k
-
-
- ## Training Configuration
-
- | Parameter | Value |
- |-----------|-------|
- | Base Model | `Qwen/Qwen3-14B-Base` |
- | Steps | 3000 |
- | Batch Size | 1 |
- | Gradient Accumulation | 16 |
- | **Effective Batch Size** | **16** |
- | Learning Rate | 1e-5 |
- | LR Scheduler | Cosine |
- | Warmup Ratio | 5% |
- | Max Sequence Length | 8192 |
- | Precision | BF16 |
- | Optimizer | 8-bit Paged AdamW |
- | Gradient Checkpointing | ✓ |
-
- ### Hardware
-
- - - **GPU:** 1× NVIDIA A800
- - - **Training Time:** ~6 hours for 1000 steps
-
-
- ## Intended Use
-
- This model is intended as a creative base model for further fine-tuning.
-
- ### Not Recommended For:
- - - Production deployment (use the final model after the full CPT → SFT → DPO pipeline)
- - - Direct chat/instruction following (this is a base-model continuation, not instruction-tuned)
-
- ## Limitations
-
- - - **No instruction tuning:** This model continues raw text; it does not follow chat or instruction formats
- - - **Private data bias:** Heavy weighting toward private character cards may introduce specific character patterns
- - - **NSFW content:** Training data includes creative fiction that may contain mature themes. No safety filtering was applied at this stage.
-
- ## How to Use
-
- ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer
-
- model = AutoModelForCausalLM.from_pretrained(
-     "aimeri/SpoomplesMaxx-Base",
-     dtype="auto",
-     device_map="auto",
- )
- tokenizer = AutoTokenizer.from_pretrained("aimeri/SpoomplesMaxx-Base")
-
- # CPT models continue text, not chat
- prompt = "The castle stood silent against the darkening sky, its towers reaching toward clouds that promised rain. Inside,"
-
- inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
- outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.8)
- print(tokenizer.decode(outputs[0], skip_special_tokens=True))
- ```
-
- ## Citation
-
- If you use this model, please cite the base model and datasets:
-
- ```bibtex
- @misc{qwen3-14b-base,
-   title={Qwen3-14B-Base},
-   author={Qwen Team},
-   year={2025},
-   publisher={Hugging Face},
-   url={https://huggingface.co/Qwen/Qwen3-14B-Base}
- }
- ```
-
- ## Acknowledgments
-
- - - [Qwen Team](https://huggingface.co/Qwen) for the excellent base model
- - - [nyuuzyou](https://huggingface.co/nyuuzyou) for the Fandom wiki dataset
-
- ---
all_results.json DELETED
@@ -1,15 +0,0 @@
- {
-     "epoch": 0.07193533013820576,
-     "eval/perplexity": 8.722142824245864,
-     "eval_loss": 2.165864944458008,
-     "eval_runtime": 19.455,
-     "eval_samples_per_second": 5.089,
-     "eval_steps_per_second": 5.089,
-     "perplexity": 8.722142824245864,
-     "total_flos": 8.662378235309568e+17,
-     "train_loss": 2.1757922117710113,
-     "train_perplexity": 8.809161079705012,
-     "train_runtime": 14536.9316,
-     "train_samples_per_second": 0.55,
-     "train_steps_per_second": 0.069
- }
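The perplexity values in these deleted metrics are simply exp(loss), the usual conversion for causal-LM evaluation. A quick check of the figures above:

```python
import math

# The reported perplexities are exp(loss), the standard causal-LM convention.
print(math.exp(2.165864944458008))   # ≈ 8.7221, matching "eval/perplexity"
print(math.exp(2.1757922117710113))  # ≈ 8.8092, matching "train_perplexity"
```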
eval_results.json DELETED
@@ -1,9 +0,0 @@
- {
-     "epoch": 0.07193533013820576,
-     "eval/perplexity": 8.722142824245864,
-     "eval_loss": 2.165864944458008,
-     "eval_runtime": 19.455,
-     "eval_samples_per_second": 5.089,
-     "eval_steps_per_second": 5.089,
-     "perplexity": 8.722142824245864
- }
hyperparameters.json DELETED
@@ -1,26 +0,0 @@
- {
-     "stage": "CPT",
-     "phase": "1",
-     "model_id": "aimeri/SpoomplesMaxx-CPT-2-Base",
-     "num_epochs": 1.0,
-     "max_steps": 1000,
-     "batch_size": 1,
-     "grad_accum": 8,
-     "effective_batch_size": 8,
-     "learning_rate": 1e-05,
-     "weight_decay": 0.01,
-     "warmup_ratio": 0.05,
-     "max_seq_length": 8192,
-     "seed": 42,
-     "sample_seed": null,
-     "max_samples_per_dataset": 50000,
-     "priority_datasets": null,
-     "priority_repeat": 50,
-     "cpt_datasets": [
-         "json:./data/character_cards/characters.jsonl",
-         "json:./data/ao3_works.jsonl",
-         "nyuuzyou/fandom",
-         "gryffindor-ISWS/dbpedia_abstracts_fictional_characters_with_img"
-     ],
-     "cache_key": "bf54897d9750"
- }
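For orientation, here is a minimal sketch of how these recorded hyperparameters would map onto `transformers.TrainingArguments`. This is a hypothetical reconstruction — the training script itself is not part of this commit — with the optimizer, precision, and gradient-checkpointing settings taken from the deleted README's configuration table and the `output_dir` chosen as a placeholder:

```python
from transformers import TrainingArguments

# Hypothetical mapping of the deleted hyperparameters.json onto TrainingArguments;
# the actual training script is not included in this repository.
args = TrainingArguments(
    output_dir="spoomplesmaxx-cpt",   # placeholder path
    max_steps=1000,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,    # effective batch size: 1 * 8 = 8
    learning_rate=1e-5,
    weight_decay=0.01,
    warmup_ratio=0.05,
    lr_scheduler_type="cosine",       # per the README's "Cosine" scheduler
    bf16=True,                        # per the README's BF16 precision
    gradient_checkpointing=True,
    optim="paged_adamw_8bit",         # per the README's 8-bit Paged AdamW
    seed=42,
)
```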
model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:e5a7408fe537c75552d493763bc97433f4d4fc188f732ff31c4adbaf70f9a585
+ oid sha256:45073d6d75a5fab2d83bea0bdaffc7e6fd054e4366b9adb227033faf6f1a4783
 size 29536666272
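Only the LFS pointer's sha256 changed between revisions; the payload size is identical. That size is consistent with a full BF16 checkpoint, assuming roughly 14.8B parameters for the Qwen3-14B architecture (a figure not stated in this repo):

```python
# Rough size check. Assumptions: weights stored in BF16 (2 bytes per parameter)
# and ~14.8B parameters for Qwen3-14B.
file_size = 29_536_666_272   # bytes, from the LFS pointer above
print(file_size / 2 / 1e9)   # ≈ 14.77B parameters implied by the file
print(14.8e9 * 2 / 1e9)      # ≈ 29.6 GB expected at BF16
```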
train_metrics.json DELETED
@@ -1,9 +0,0 @@
- {
-     "train_runtime": 14536.9316,
-     "train_samples_per_second": 0.55,
-     "train_steps_per_second": 0.069,
-     "total_flos": 8.662378235309568e+17,
-     "train_loss": 2.1757922117710113,
-     "epoch": 0.07193533013820576,
-     "train_perplexity": 8.809161079705012
- }
train_results.json DELETED
@@ -1,9 +0,0 @@
- {
-     "epoch": 0.07193533013820576,
-     "total_flos": 8.662378235309568e+17,
-     "train_loss": 2.1757922117710113,
-     "train_perplexity": 8.809161079705012,
-     "train_runtime": 14536.9316,
-     "train_samples_per_second": 0.55,
-     "train_steps_per_second": 0.069
- }