Commit d30aab4 by Delta-Vector (verified) · 1 Parent(s): eea3d33

Update README.md

  1. README.md +347 -20
README.md CHANGED
@@ -1,35 +1,362 @@
1
  ---
 
 
 
2
  base_model:
3
- - unsloth/phi-4
4
- - NewEden/phi4-pt-out-r2
5
- library_name: transformers
6
  tags:
7
- - mergekit
8
- - merge
9
-
 
10
  ---
11
- # phi-pretrain-v2
12
 
13
- This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
 
 
14
 
15
- ## Merge Details
16
- ### Merge Method
17
 
18
- This model was merged using the Passthrough merge method using [unsloth/phi-4](https://huggingface.co/unsloth/phi-4) + [NewEden/phi4-pt-out-r2](https://huggingface.co/NewEden/phi4-pt-out-r2) as a base.
19
 
20
- ### Models Merged
 
 
 
 
 
21
 
22
- The following models were included in the merge:
23
 
24
 
25
- ### Configuration
26
 
27
- The following YAML configuration was used to produce this model:
28
 
 
 
 
 
29
  ```yaml
- base_model: unsloth/phi-4+NewEden/phi4-pt-out-r2
- dtype: bfloat16
- merge_method: passthrough
- models:
-   - model: unsloth/phi-4+NewEden/phi4-pt-out-r2
  ```
1
  ---
2
+ datasets:
3
+ - NewEden/Orion-Asstr-Stories-16K
4
+ - Mielikki/Erebus-87k
5
  base_model:
6
+ - Unsloth/phi-4
 
 
7
  tags:
8
+ - phi
9
+ - roleplay
10
+ - finetune
11
+ - storywriting
12
  ---
13
+ <!DOCTYPE html>
14
+ <style>
15
+ html, body {
16
+ background: black;
17
+ color: #c9d1d9 !important;
18
+ font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
19
+ margin: 0;
20
+ padding: 0;
21
+ min-height: 100vh;
22
+ }
23
+ .markdown-body {
24
+ color: white;
25
+ margin: 40px auto;
26
+ padding: 40px;
27
+ border-radius: 12px;
28
+ position: relative;
29
+ overflow: hidden;
30
+ }
31
+
32
+ .markdown-body::after {
33
+ content: '';
34
+ position: absolute;
35
+ top: 0;
36
+ left: 0;
37
+ width: 100%;
38
+ height: 100%;
39
+ background: #0c0f18; /* background color */
40
+ pointer-events: none;
41
+ z-index: -999;
42
+ }
43
+
44
+ h1, h2, h3 {
45
+ background: linear-gradient(45deg, #6e00ff, #00ffff);
46
+ -webkit-background-clip: text;
47
+ -webkit-text-fill-color: transparent;
48
+ border-bottom: 1px solid #333;
49
+ padding-bottom: 0.3em;
50
+ }
51
+
52
+ div[style*="border:2px solid #333"],
53
+ div[style*="border: 2px solid #333"],
54
+ div[style*="border:1px solid #333"],
55
+ div[style*="border: 1px solid #333"] {
56
+ background: rgba(22, 27, 34, 0.8) !important;
57
+ border: 2px solid #6e00ff !important;
58
+ box-shadow: 0 0 15px rgba(110, 0, 255, 0.5);
59
+ border-radius: 10px;
60
+ padding: 20px;
61
+ margin: 20px 0;
62
+ }
63
+
64
+ code {
65
+ background-color: #1a1a1a !important;
66
+ border-radius: 4px;
67
+ padding: 0.2em 0.4em;
68
+ color: #00ffff;
69
+ }
70
+
71
+ pre {
72
+ background-color: #1a1a1a !important;
73
+ border: 1px solid #333;
74
+ border-radius: 8px;
75
+ padding: 16px;
76
+ }
77
+
78
+ table {
79
+ width: 100%;
80
+ border-collapse: collapse;
81
+ margin: 20px 0;
82
+ background: rgba(0,0,0,0.2);
83
+ table-layout: fixed;
84
+ color: white;
85
+ }
86
+
87
+ th, td {
88
+ border: 1px solid #333;
89
+ padding: 12px;
90
+ text-align: center;
91
+ color: white;
92
+ }
93
+
94
+ th {
95
+ background: rgba(110, 0, 255, 0.1);
96
+ }
97
+
98
+ td:nth-child(1) {
99
+ width: 1%;
100
+ white-space: nowrap;
101
+ }
102
+
103
+ td:nth-child(2) {
104
+ width: 100%;
105
+ }
106
+
107
+ td > span {
108
+ display: block;
109
+ padding: 4px 8px;
110
+ background: rgba(110, 0, 255, 0.1);
111
+ border-radius: 4px;
112
+ transition: all 0.3s ease;
113
+ }
114
+
115
+ td > span:hover {
116
+ background: rgba(110, 0, 255, 0.2);
117
+ transform: translateY(-1px);
118
+ }
119
+
120
+ a {
121
+ color: #00ffff;
122
+ text-decoration: none;
123
+ transition: all 0.3s ease;
124
+ }
125
+
126
+ a:hover {
127
+ color: #6e00ff;
128
+ text-decoration: none;
129
+ }
130
+
131
+ hr {
132
+ border: 0;
133
+ height: 1px;
134
+ background: linear-gradient(90deg, transparent, #333, transparent);
135
+ margin: 40px 0;
136
+ }
137
+
138
+ img {
139
+ max-width: 100%;
140
+ border-radius: 10px;
141
+ }
142
+
143
+ details summary:hover {
144
+ color: #00ffff;
145
+ }
146
+
147
+ * {
148
+ color-scheme: dark !important;
149
+ }
150
+
151
+ .prose, .max-w-none, .px-4 {
152
+ background-color: transparent !important;
153
+ color: #c9d1d9 !important;
154
+ }
155
+ </style>
156
+ <body>
157
+ <div class="markdown-body">
158
+ <div align="center">
159
+
160
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/66c26b6fb01b19d8c3c2467b/o5WjJKA9f95ri9UzRxZQE.png" alt="Model Visualization" width="500px" style="border: 3px solid #333; box-shadow: 0 0 15px rgba(66, 0, 131, 0.5);" />
161
+
162
+ <br>
163
+ <br>
164
+
165
+ <div style="font-size:1.5em; font-weight:bold; background: linear-gradient(45deg, #6e00ff, #00ffff); -webkit-background-clip: text; -webkit-text-fill-color: transparent;">
166
+ Hamanasu 15B R2 PT
167
+ </div>
168
+
169
+ </div>
170
+
171
+ <div style="border:1px solid #333; border-radius:10px; padding:20px; margin:20px 0; background: rgba(0,0,0,0.4);">
172
+
173
+ ## 🌌 Overview
174
+
175
+ <i>This is the 1st pretrain of Phi-4, using the following:</i>
176
 
177
+ - `NewEden/Orion-LIT`
178
+
179
+ <i>This model has *not* been instruct-tuned. Its ability to converse may be reduced compared with the original model. If you would like to roleplay, please use the Instruct version.</i>
180
 
181
+ </div>
 
182
 
183
+ <div style="border:2px solid #333; border-radius:10px; padding:20px; background: rgba(0,0,0,0.2);">
184
 
185
+ ### ⚔️ Hardware
186
+ - 4x RTX 3090 GPUs
187
+ - Epochs: 1
188
+ - Base: `Unsloth/phi-4`
189
+ - Amount of Tokens: 500 Million
190
+ </div>
191
 
 
192
 
193
 
194
+ </div>
195
 
196
+ <div style="border: 2px solid #6e00ff; border-radius: 10px; padding: 20px; margin: 20px 0; box-shadow: 0 0 15px rgba(110, 0, 255, 0.5);">
197
 
198
+ ## Axolotl Config ꒰(˶• ᴗ •˶)꒱
199
+
200
+ <details>
201
+
202
  ```yaml
+ base_model: unsloth_phi-4
+ model_type: AutoModelForCausalLM
+ tokenizer_type: AutoTokenizer
+
+ #hub_model_id: NewEden/Phi4-pretrain
+ #hub_strategy: "all_checkpoints"
+ #push_dataset_to_hub:
+ #hf_use_auth_token: true
+
+ plugins:
+   - axolotl.integrations.liger.LigerPlugin
+ liger_rope: true
+ liger_rms_norm: true
+ liger_swiglu: true
+ liger_fused_linear_cross_entropy: true
+
+ #plugins:
+ # - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
+
+ #cut_cross_entropy: true
+
+ load_in_8bit: false
+ load_in_4bit: false
+ strict: false
+
+ datasets:
+   - path: Mielikki/Erebus-87k
+     type: completion
+     field: body
+   - path: NewEden/Orion-Asstr-Stories-16K
+     type: completion
+     field: content
+ shuffle_merged_datasets: true
+ dataset_prepared_path: prepared_data
+ val_set_size: 0.0
+ output_dir: ./phi4-pt-out-r2
+
+ sequence_len: 16384
+ sample_packing: true
+ pad_to_sequence_len: true
+
+ adapter: lora
+ lora_model_dir:
+ lora_r: 128
+ lora_alpha: 16
+ lora_dropout: 0.05
+ lora_target_modules:
+   - gate_proj
+   - down_proj
+   - up_proj
+   - q_proj
+   - v_proj
+   - k_proj
+   - o_proj
+
+ lora_modules_to_save:
+   - embed_tokens
+   - lm_head
+
+
+ wandb_project: mag-phi
+ wandb_entity:
+ wandb_watch:
+ wandb_name: attempt-02
+ wandb_log_model:
+
+ gradient_accumulation_steps: 4
+ micro_batch_size: 2
+ num_epochs: 1
+ optimizer: paged_ademamix_8bit
+ lr_scheduler: cosine
+ learning_rate: 0.00001
+
+ train_on_inputs: false
+ group_by_length: false
+ bf16: auto
+ fp16:
+ tf32: false
+
+ gradient_checkpointing: unsloth
+ early_stopping_patience:
+ resume_from_checkpoint:
+ local_rank:
+ logging_steps: 1
+ xformers_attention:
+ flash_attention: true
+
+ warmup_steps: 10
+ evals_per_epoch: 4
+ eval_table_size:
+ eval_max_new_tokens: 128
+ saves_per_epoch: 4
+ debug:
+ deepspeed: /workspace/axolotl/deepspeed_configs/zero3_bf16_cpuoffload_params.json
+ weight_decay: 0.01
+ fsdp:
+ fsdp_config:
  ```
301
+
302
+ </details>
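
As a rough sanity check on the config, the sketch below works out how many tokens one optimizer step covers and roughly how many steps one epoch takes. It rests on assumptions that the card does not state: that all four GPUs run data-parallel (which is how DeepSpeed ZeRO-3 normally operates), that sample packing fills every sequence to the full `sequence_len`, that the "500 Million" figure counts tokens after packing, and that the adapter uses the standard LoRA scaling of `lora_alpha / lora_r`. The function names are illustrative and are not part of Axolotl.

```python
import math

# Values copied from the Axolotl config and the Hardware list in this card.
MICRO_BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 4
NUM_GPUS = 4                 # "4x RTX 3090 GPUs"; assumed to be data-parallel
SEQUENCE_LEN = 16384
TOTAL_TOKENS = 500_000_000   # "Amount of Tokens: 500 Million" (approximate)
LORA_R = 128
LORA_ALPHA = 16


def sequences_per_step(micro_batch: int, grad_accum: int, num_gpus: int) -> int:
    """Packed sequences consumed by one optimizer step across all GPUs."""
    return micro_batch * grad_accum * num_gpus


def tokens_per_step(sequences: int, sequence_len: int) -> int:
    """Tokens per optimizer step, assuming every sequence is fully packed."""
    return sequences * sequence_len


def approx_optimizer_steps(total_tokens: int, per_step: int) -> int:
    """Optimizer steps needed to see total_tokens once, rounded up."""
    return math.ceil(total_tokens / per_step)


def lora_scale(alpha: int, r: int) -> float:
    """Standard LoRA scaling factor applied to the low-rank update."""
    return alpha / r


if __name__ == "__main__":
    seqs = sequences_per_step(MICRO_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS, NUM_GPUS)
    per_step = tokens_per_step(seqs, SEQUENCE_LEN)
    print("sequences per step:", seqs)
    print("tokens per step:", per_step)
    print("approx. steps per epoch:", approx_optimizer_steps(TOTAL_TOKENS, per_step))
    print("LoRA scale:", lora_scale(LORA_ALPHA, LORA_R))
```

Under those assumptions this gives 32 sequences and 524,288 tokens per optimizer step, about 954 steps for one epoch, and a LoRA scale of 0.125. The real step count reported by the trainer may differ if packing leaves padding or the token figure was measured differently.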
303
+ </div>
304
+
305
+
306
+ <div align="center">
307
+
308
+ <div style="border: 2px solid #6e00ff; border-radius: 10px; padding: 20px; margin: 20px 0; box-shadow: 0 0 15px rgba(110, 0, 255, 0.5);">
309
+
310
+ ## ⚡ Credits
311
+ <div style="display: flex; justify-content: center;">
312
+ <div style="display: grid; grid-template-columns: repeat(auto-fit, minmax(200px, 1fr)); gap: 10px; margin: 20px 0; max-width: 600px;">
313
+
314
+ <div style="border:1px solid #333; padding:10px; border-radius:5px; text-align:center; background: rgba(0,0,0,0.2); display: flex; align-items: center; justify-content: center;">
315
+ <a href="https://huggingface.co/lucyknada">
316
+ <img src="https://img.shields.io/badge/%F0%9F%8C%9F-Lucy_Knada-blueviolet" alt="Lucy Knada">
317
+ </a>
318
+ </div>
319
+
320
+ <div style="border:1px solid #333; padding:10px; border-radius:5px; text-align:center; background: rgba(0,0,0,0.2); display: flex; align-items: center; justify-content: center;">
321
+ <a href="https://huggingface.co/jeiku">
322
+ <img src="https://img.shields.io/badge/%E2%9A%94%EF%B8%8F-jeiku-blueviolet" alt="jeiku">
323
+ </a>
324
+ </div>
325
+
326
+ <div style="border:1px solid #333; padding:10px; border-radius:5px; text-align:center; background: rgba(0,0,0,0.2); display: flex; align-items: center; justify-content: center;">
327
+ <a href="https://huggingface.co/intervitens">
328
+ <img src="https://img.shields.io/badge/%F0%9F%9B%A1%EF%B8%8F-Intervitens-blueviolet" alt="Intervitens">
329
+ </a>
330
+ </div>
331
+
332
+ <div style="border:1px solid #333; padding:10px; border-radius:5px; text-align:center; background: rgba(0,0,0,0.2); display: flex; align-items: center; justify-content: center;">
333
+ <a href="https://huggingface.co/kalomaze">
334
+ <img src="https://img.shields.io/badge/%F0%9F%94%AE-Kalomaze-blueviolet" alt="Kalomaze">
335
+ </a>
336
+ </div>
337
+
338
+ <div style="border:1px solid #333; padding:10px; border-radius:5px; text-align:center; background: rgba(0,0,0,0.2); display: flex; align-items: center; justify-content: center;">
339
+ <a href="https://huggingface.co/kubernetes-bad">
340
+ <img src="https://img.shields.io/badge/%E2%9A%A1-Kubernetes_Bad-blueviolet" alt="Kubernetes Bad">
341
+ </a>
342
+ </div>
343
+
344
+ <div style="border:1px solid #333; padding:10px; border-radius:5px; text-align:center; background: rgba(0,0,0,0.2); display: flex; align-items: center; justify-content: center;">
345
+ <a href="https://huggingface.co/anthracite-org">
346
+ <img src="https://img.shields.io/badge/%F0%9F%8C%91-Anthracite-blueviolet" alt="Anthracite">
347
+ </a>
348
+ </div>
349
+ </div>
350
+ </div>
351
+ </div>
352
+
353
+ ---
354
+
355
+ <div align="center">
356
+ <div style="font-size:0.8em; opacity:0.8;">Made by</div>
357
+ <div style="font-size:1.2em; font-weight:bold; background: linear-gradient(45deg, #6e00ff, #00ffff); -webkit-background-clip: text; -webkit-text-fill-color: transparent;">Delta-Vector</div>
358
+ </div>
359
+
360
+ </div>
361
+ </body>
362
+ </html>