aimeri committed · Commit 7eb1e01 · verified · 1 Parent(s): ff8747c

Upload folder using huggingface_hub
README.md DELETED
@@ -1,125 +0,0 @@
- ---
- license: mit
- base_model: Qwen/Qwen3-14B-Base
- tags:
- - cpt
- - continued-pretraining
- - roleplay
- - creative-writing
- - character-cards
- - fiction
- datasets:
- - nyuuzyou/fandom
- - gryffindor-ISWS/dbpedia_abstracts_fictional_characters_with_img
- - aimeri/ao3_works
- language:
- - en
- pipeline_tag: text-generation
- ---
-
- # SpoomplesMaxx Base
-
- A continued pre-training (CPT) checkpoint of [Qwen/Qwen3-14B-Base](https://huggingface.co/Qwen/Qwen3-14B-Base) fine-tuned on creative writing and roleplay data.
-
- ## Model Description
-
- This model is part of the SpoomplesMaxx training pipeline: **CPT → SFT → DPO**.
-
- The CPT stage teaches the model:
- - Character understanding and portrayal
- - Creative fiction writing patterns
- - Fandom/wiki-style lore knowledge
- - Dialogue patterns for roleplay
-
- ## Training Data
-
- ### Phase 1: Core Knowledge
-
- This checkpoint was trained on data focused on character knowledge and lore:
-
- | Dataset | Source | Samples | Description |
- |---------|--------|---------|-------------|
- | **Private Dataset** | Private | ~100k (50,000 sampled) | SillyTavern-style character cards with personality, scenario, and example dialogue, plus fanfics, essays about media and characters, short novels, and high-quality roleplay data |
- | [nyuuzyou/fandom](https://huggingface.co/datasets/nyuuzyou/fandom) | HuggingFace | 50,000 (sampled) | Fandom wiki articles with character/world lore |
- | [gryffindor-ISWS/dbpedia_abstracts_fictional_characters_with_img](https://huggingface.co/datasets/gryffindor-ISWS/dbpedia_abstracts_fictional_characters_with_img) | HuggingFace | 50,000 (sampled) | DBpedia abstracts of fictional characters |
-
- **Total training samples:** ~46k
-
-
- ## Training Configuration
-
- | Parameter | Value |
- |-----------|-------|
- | Base Model | `Qwen/Qwen3-14B-Base` |
- | Steps | 3000 |
- | Batch Size | 1 |
- | Gradient Accumulation | 16 |
- | **Effective Batch Size** | **16** |
- | Learning Rate | 1e-5 |
- | LR Scheduler | Cosine |
- | Warmup Ratio | 5% |
- | Max Sequence Length | 8192 |
- | Precision | BF16 |
- | Optimizer | 8-bit Paged AdamW |
- | Gradient Checkpointing | ✓ |
-
- ### Hardware
-
- - - **GPU:** 1× NVIDIA A800
- - - **Training Time:** ~6 hours for 1000 steps
-
-
- ## Intended Use
-
- This model is intended as a creative base model for further fine-tuning.
-
- ### Not Recommended For:
- - - Production deployment (use the final model after the full CPT → SFT → DPO pipeline)
- - - Direct chat/instruction following (this is a base-model continuation, not instruction-tuned)
-
- ## Limitations
-
- - - **No instruction tuning:** This model continues raw text; it does not follow chat or instruction formats
- - - **Private data bias:** Heavy weighting toward private character cards may introduce specific character patterns
- - - **NSFW content:** Training data includes creative fiction that may contain mature themes. No safety filtering was applied at this stage.
-
- ## How to Use
-
- ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer
-
- model = AutoModelForCausalLM.from_pretrained(
-     "aimeri/SpoomplesMaxx-Base",
-     dtype="auto",
-     device_map="auto",
- )
- tokenizer = AutoTokenizer.from_pretrained("aimeri/SpoomplesMaxx-Base")
-
- # CPT models continue text, not chat
- prompt = "The castle stood silent against the darkening sky, its towers reaching toward clouds that promised rain. Inside,"
-
- inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
- outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.8)
- print(tokenizer.decode(outputs[0], skip_special_tokens=True))
- ```
-
- ## Citation
-
- If you use this model, please cite the base model and datasets:
-
- ```bibtex
- @misc{qwen3-14b-base,
-   title={Qwen3-14B-Base},
-   author={Qwen Team},
-   year={2025},
-   publisher={Hugging Face},
-   url={https://huggingface.co/Qwen/Qwen3-14B-Base}
- }
- ```
-
- ## Acknowledgments
-
- - - [Qwen Team](https://huggingface.co/Qwen) for the excellent base model
- - - [nyuuzyou](https://huggingface.co/nyuuzyou) for the Fandom wiki dataset
-
- ---
all_results.json DELETED
@@ -1,15 +0,0 @@
- {
-     "epoch": 0.07193533013820576,
-     "eval/perplexity": 8.722142824245864,
-     "eval_loss": 2.165864944458008,
-     "eval_runtime": 19.455,
-     "eval_samples_per_second": 5.089,
-     "eval_steps_per_second": 5.089,
-     "perplexity": 8.722142824245864,
-     "total_flos": 8.662378235309568e+17,
-     "train_loss": 2.1757922117710113,
-     "train_perplexity": 8.809161079705012,
-     "train_runtime": 14536.9316,
-     "train_samples_per_second": 0.55,
-     "train_steps_per_second": 0.069
- }
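The perplexity values in these deleted metrics are simply exp(loss), the usual conversion for causal-LM evaluation. A quick check of the figures above:

```python
import math

# The reported perplexities are exp(loss), the standard causal-LM convention.
print(math.exp(2.165864944458008))   # ≈ 8.7221, matching "eval/perplexity"
print(math.exp(2.1757922117710113))  # ≈ 8.8092, matching "train_perplexity"
```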
eval_results.json DELETED
@@ -1,9 +0,0 @@
- {
-     "epoch": 0.07193533013820576,
-     "eval/perplexity": 8.722142824245864,
-     "eval_loss": 2.165864944458008,
-     "eval_runtime": 19.455,
-     "eval_samples_per_second": 5.089,
-     "eval_steps_per_second": 5.089,
-     "perplexity": 8.722142824245864
- }
hyperparameters.json DELETED
@@ -1,26 +0,0 @@
- {
-     "stage": "CPT",
-     "phase": "1",
-     "model_id": "aimeri/SpoomplesMaxx-CPT-2-Base",
-     "num_epochs": 1.0,
-     "max_steps": 1000,
-     "batch_size": 1,
-     "grad_accum": 8,
-     "effective_batch_size": 8,
-     "learning_rate": 1e-05,
-     "weight_decay": 0.01,
-     "warmup_ratio": 0.05,
-     "max_seq_length": 8192,
-     "seed": 42,
-     "sample_seed": null,
-     "max_samples_per_dataset": 50000,
-     "priority_datasets": null,
-     "priority_repeat": 50,
-     "cpt_datasets": [
-         "json:./data/character_cards/characters.jsonl",
-         "json:./data/ao3_works.jsonl",
-         "nyuuzyou/fandom",
-         "gryffindor-ISWS/dbpedia_abstracts_fictional_characters_with_img"
-     ],
-     "cache_key": "bf54897d9750"
- }
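For orientation, here is a minimal sketch of how these recorded hyperparameters would map onto `transformers.TrainingArguments`. This is a hypothetical reconstruction — the training script itself is not part of this commit — with the optimizer, precision, and gradient-checkpointing settings taken from the deleted README's configuration table and the `output_dir` chosen as a placeholder:

```python
from transformers import TrainingArguments

# Hypothetical mapping of the deleted hyperparameters.json onto TrainingArguments;
# the actual training script is not included in this repository.
args = TrainingArguments(
    output_dir="spoomplesmaxx-cpt",   # placeholder path
    max_steps=1000,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,    # effective batch size: 1 * 8 = 8
    learning_rate=1e-5,
    weight_decay=0.01,
    warmup_ratio=0.05,
    lr_scheduler_type="cosine",       # per the README's "Cosine" scheduler
    bf16=True,                        # per the README's BF16 precision
    gradient_checkpointing=True,
    optim="paged_adamw_8bit",         # per the README's 8-bit Paged AdamW
    seed=42,
)
```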
model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:e5a7408fe537c75552d493763bc97433f4d4fc188f732ff31c4adbaf70f9a585
+ oid sha256:45073d6d75a5fab2d83bea0bdaffc7e6fd054e4366b9adb227033faf6f1a4783
 size 29536666272
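Only the LFS pointer's sha256 changed between revisions; the payload size is identical. That size is consistent with a full BF16 checkpoint, assuming roughly 14.8B parameters for the Qwen3-14B architecture (a figure not stated in this repo):

```python
# Rough size check. Assumptions: weights stored in BF16 (2 bytes per parameter)
# and ~14.8B parameters for Qwen3-14B.
file_size = 29_536_666_272   # bytes, from the LFS pointer above
print(file_size / 2 / 1e9)   # ≈ 14.77B parameters implied by the file
print(14.8e9 * 2 / 1e9)      # ≈ 29.6 GB expected at BF16
```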
train_metrics.json DELETED
@@ -1,9 +0,0 @@
- {
-     "train_runtime": 14536.9316,
-     "train_samples_per_second": 0.55,
-     "train_steps_per_second": 0.069,
-     "total_flos": 8.662378235309568e+17,
-     "train_loss": 2.1757922117710113,
-     "epoch": 0.07193533013820576,
-     "train_perplexity": 8.809161079705012
- }
train_results.json DELETED
@@ -1,9 +0,0 @@
- {
-     "epoch": 0.07193533013820576,
-     "total_flos": 8.662378235309568e+17,
-     "train_loss": 2.1757922117710113,
-     "train_perplexity": 8.809161079705012,
-     "train_runtime": 14536.9316,
-     "train_samples_per_second": 0.55,
-     "train_steps_per_second": 0.069
- }