prince-canuma committed
Commit 39159bd (verified) · Parent: 8dae1fa

Upload folder using huggingface_hub

This view is limited to 50 files because the commit contains too many changes; see the raw diff for the full change set.
Files changed (50)
  1. README.md +8 -236
  2. config.json +20 -964
  3. model-00001-of-00046.safetensors +2 -2
  4. model-00002-of-00046.safetensors +2 -2
  5. model-00003-of-00046.safetensors +2 -2
  6. model-00004-of-00046.safetensors +2 -2
  7. model-00005-of-00046.safetensors +2 -2
  8. model-00006-of-00046.safetensors +2 -2
  9. model-00007-of-00046.safetensors +2 -2
  10. model-00008-of-00046.safetensors +2 -2
  11. model-00009-of-00046.safetensors +2 -2
  12. model-00010-of-00046.safetensors +2 -2
  13. model-00011-of-00046.safetensors +2 -2
  14. model-00012-of-00046.safetensors +2 -2
  15. model-00013-of-00046.safetensors +2 -2
  16. model-00014-of-00046.safetensors +2 -2
  17. model-00015-of-00046.safetensors +2 -2
  18. model-00016-of-00046.safetensors +2 -2
  19. model-00017-of-00046.safetensors +2 -2
  20. model-00018-of-00046.safetensors +2 -2
  21. model-00019-of-00046.safetensors +2 -2
  22. model-00020-of-00046.safetensors +2 -2
  23. model-00021-of-00046.safetensors +2 -2
  24. model-00022-of-00046.safetensors +2 -2
  25. model-00023-of-00046.safetensors +2 -2
  26. model-00024-of-00046.safetensors +2 -2
  27. model-00025-of-00046.safetensors +2 -2
  28. model-00026-of-00046.safetensors +2 -2
  29. model-00027-of-00046.safetensors +2 -2
  30. model-00028-of-00046.safetensors +2 -2
  31. model-00029-of-00046.safetensors +2 -2
  32. model-00030-of-00046.safetensors +2 -2
  33. model-00031-of-00046.safetensors +2 -2
  34. model-00032-of-00046.safetensors +2 -2
  35. model-00033-of-00046.safetensors +2 -2
  36. model-00034-of-00046.safetensors +2 -2
  37. model-00035-of-00046.safetensors +2 -2
  38. model-00036-of-00046.safetensors +2 -2
  39. model-00037-of-00046.safetensors +2 -2
  40. model-00038-of-00046.safetensors +2 -2
  41. model-00039-of-00046.safetensors +2 -2
  42. model-00040-of-00046.safetensors +2 -2
  43. model-00041-of-00046.safetensors +2 -2
  44. model-00042-of-00046.safetensors +2 -2
  45. model-00043-of-00046.safetensors +2 -2
  46. model-00044-of-00046.safetensors +2 -2
  47. model-00045-of-00046.safetensors +2 -2
  48. model-00046-of-00046.safetensors +2 -2
  49. model.safetensors.index.json +335 -3
  50. preprocessor_config.json +21 -0
README.md CHANGED
@@ -1,249 +1,21 @@
  ---
- library_name: mlx
+ library_name: transformers
  license: apache-2.0
  license_link: https://huggingface.co/Qwen/Qwen3.5-397B-A17B/blob/main/LICENSE
- base_model: Qwen/Qwen3.5-397B-A17B
- pipeline_tag: text-generation
+ pipeline_tag: image-text-to-text
  tags:
  - mlx
- - 4bit
- - quantized
- - qwen3_5_moe
- - moe
- - mixture-of-experts
- - text-generation
- - conversational
- - apple-silicon
- language:
- - multilingual
  ---

- # Qwen3.5-397B-A17B-4bit (MLX)
-
- 4-bit [MLX](https://github.com/ml-explore/mlx) quantized version of the **text** model from [Qwen/Qwen3.5-397B-A17B](https://huggingface.co/Qwen/Qwen3.5-397B-A17B).
-
- Portions of this card were copied or adapted from the original model card, authored by the Qwen team.
-
- ## Model Overview
-
- Qwen3.5-397B-A17B is Alibaba's latest flagship language model, featuring a hybrid architecture that combines Gated DeltaNet (linear attention) with sparse Mixture-of-Experts for high-throughput inference. Despite having 397B total parameters, only ~17B are activated per token, making it remarkably efficient for its capability level.
-
- This conversion provides a **text-only** 4-bit quantized version optimized for local inference on Apple Silicon Macs via the MLX framework. The vision encoder from the original multimodal model is not included — for image/video understanding, refer to the original [Qwen/Qwen3.5-397B-A17B](https://huggingface.co/Qwen/Qwen3.5-397B-A17B).
-
- ### Key Capabilities
-
- - **201 languages and dialects** with deep cultural and regional understanding
- - **262K native context** (extensible to 1M+ with YaRN)
- - **Thinking mode** with chain-of-thought reasoning (`<think>...</think>`)
- - **Tool use and agentic workflows** (MCP, function calling)
- - **Competitive benchmarks**: MMLU-Pro 87.8, SuperGPQA 70.4, C-Eval 93.0
-
- ## Architecture
-
- | Parameter | Value |
- |---|---|
- | Total Parameters | 397B |
- | Active Parameters | ~17B |
- | Hidden Size | 4,096 |
- | Layers | 60 |
- | Layer Layout | 15 × (3 × Gated DeltaNet + 1 × Full Attention), all with MoE FFN |
- | Total Experts | 512 |
- | Active Experts per Token | 10 routed + 1 shared |
- | Expert Intermediate Size | 1,024 |
- | Full Attention Heads | 32 Q / 2 KV (GQA), head dim 256 |
- | Linear Attention Heads | 16 QK / 64 V, head dim 128 |
- | Context Length | 262,144 tokens |
- | Vocab Size | 248,320 |
-
- ## Quantization Details
-
- | Parameter | Value |
- |---|---|
- | Method | Affine quantization |
- | Bits | 4-bit (weights) |
- | Group Size | 64 |
- | MoE Router Gates | 8-bit (preserved at higher precision) |
- | Model Size on Disk | ~223 GB |
-
- The MoE router gates (`mlp.gate` and `mlp.shared_expert_gate` for all 60 layers) are kept at 8-bit precision to preserve routing accuracy, which is critical for Mixture-of-Experts models.
-
- ## Requirements
-
- - Apple Silicon Mac with **at least 256 GB unified memory** (e.g., Mac Studio M3 Ultra 256GB+)
- - Python 3.10+
- - [`mlx-lm`](https://github.com/ml-explore/mlx-lm) from the `main` branch
-
- ## Installation
+ # mlx-community/Qwen3.5-397B-A17B-4bit
+ This model was converted to MLX format from [`Qwen/Qwen3.5-397B-A17B`]() using mlx-vlm version **0.3.12**.
+ Refer to the [original model card](https://huggingface.co/Qwen/Qwen3.5-397B-A17B) for more details on the model.
+ ## Use with mlx

  ```bash
- pip install git+https://github.com/ml-explore/mlx-lm
+ pip install -U mlx-vlm
  ```

- ## Usage
-
- ### Quick Start — Python API
-
- ```python
- from mlx_lm import load, generate
-
- model, tokenizer = load("mlx-community/Qwen3.5-397B-A17B-4bit")
-
- messages = [{"role": "user", "content": "Explain the Riemann hypothesis in simple terms."}]
- prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
-
- response = generate(
-     model,
-     tokenizer,
-     prompt=prompt,
-     max_tokens=4096,
-     verbose=True,
-     temp=0.6,
-     top_p=0.95,
- )
- ```
-
- ### Thinking Mode (Default)
-
- The model defaults to thinking mode, producing chain-of-thought reasoning inside `<think>...</think>` tags before the final answer:
-
- ```python
- from mlx_lm import load, generate
-
- model, tokenizer = load("mlx-community/Qwen3.5-397B-A17B-4bit")
-
- messages = [
-     {"role": "user", "content": "How many r's are in the word 'strawberry'?"}
- ]
- prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
-
- response = generate(
-     model,
-     tokenizer,
-     prompt=prompt,
-     max_tokens=8192,
-     verbose=True,
-     temp=0.6,
-     top_p=0.95,
- )
- ```
-
- ### Non-Thinking Mode
-
- For faster, more direct responses without chain-of-thought reasoning:
-
- ```python
- from mlx_lm import load, generate
-
- model, tokenizer = load("mlx-community/Qwen3.5-397B-A17B-4bit")
-
- messages = [
-     {"role": "user", "content": "Write a haiku about machine learning."}
- ]
- prompt = tokenizer.apply_chat_template(
-     messages,
-     add_generation_prompt=True,
-     enable_thinking=False,
- )
-
- response = generate(
-     model,
-     tokenizer,
-     prompt=prompt,
-     max_tokens=2048,
-     verbose=True,
-     temp=0.7,
-     top_p=0.8,
- )
- ```
-
- ### Command Line
-
- ```bash
- # Thinking mode (default)
- mlx_lm.generate \
-     --model mlx-community/Qwen3.5-397B-A17B-4bit \
-     --prompt "What are the key differences between TCP and UDP?" \
-     --max-tokens 4096 \
-     --temp 0.6 \
-     --top-p 0.95
-
- # Start a local chat server (OpenAI-compatible)
- mlx_lm.server --model mlx-community/Qwen3.5-397B-A17B-4bit
- ```
-
- ### Local OpenAI-Compatible Server
-
- Start the server:
-
- ```bash
- mlx_lm.server --model mlx-community/Qwen3.5-397B-A17B-4bit --port 8080
- ```
-
- Then query it with any OpenAI-compatible client:
-
- ```python
- from openai import OpenAI
-
- client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")
-
- response = client.chat.completions.create(
-     model="mlx-community/Qwen3.5-397B-A17B-4bit",
-     messages=[
-         {"role": "system", "content": "You are a helpful assistant."},
-         {"role": "user", "content": "Write a Python function to find all prime numbers up to n using the Sieve of Eratosthenes."},
-     ],
-     max_tokens=4096,
-     temperature=0.6,
-     top_p=0.95,
- )
- print(response.choices[0].message.content)
- ```
-
- Or with `curl`:
-
  ```bash
- curl http://localhost:8080/v1/chat/completions \
-     -H "Content-Type: application/json" \
-     -d '{
-         "model": "mlx-community/Qwen3.5-397B-A17B-4bit",
-         "messages": [{"role": "user", "content": "Hello!"}],
-         "max_tokens": 512,
-         "temperature": 0.6
-     }'
- ```
-
- ## Recommended Generation Parameters
-
- | Parameter | Thinking Mode | Non-Thinking Mode |
- |---|---|---|
- | `temperature` | 0.6 | 0.7 |
- | `top_p` | 0.95 | 0.8 |
- | `top_k` | 20 | 20 |
- | `presence_penalty` | 0.0 | 1.5 |
- | `repetition_penalty` | 1.0 | 1.0 |
- | `max_tokens` (general) | 32,768 | 32,768 |
- | `max_tokens` (math/code) | 81,920 | — |
-
- ## Tips
-
- - **Thinking mode** is best for complex reasoning, math, and coding tasks. The model will produce internal reasoning before answering.
- - **Non-thinking mode** is better for straightforward Q&A, creative writing, and conversational use where latency matters.
- - For **math problems**, append: *"Please reason step by step, and put your final answer within \boxed{}."*
- - For **multi-turn conversations**, the default chat template automatically strips thinking content from prior turns.
- - If running into **memory pressure**, consider closing other applications to free unified memory.
-
- ## Original Model
-
- This is a quantized version of [Qwen/Qwen3.5-397B-A17B](https://huggingface.co/Qwen/Qwen3.5-397B-A17B). Refer to the original model card for full benchmark results, training details, and the technical report.
-
- ## Citation
-
- ```bibtex
- @misc{qwen3.5,
-   title = {{Qwen3.5}: Towards Native Multimodal Agents},
-   author = {{Qwen Team}},
-   month = {February},
-   year = {2026},
-   url = {https://qwen.ai/blog?id=qwen3.5}
- }
+ python -m mlx_vlm.generate --model mlx-community/Qwen3.5-397B-A17B-4bit --max-tokens 100 --temperature 0.0 --prompt "Describe this image." --image <path_to_image>
  ```
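Two numbers in this commit can be sanity-checked offline: the previous card's "~223 GB" on-disk size, and the 120 per-layer router-gate overrides removed from `config.json` below. A minimal sketch, assuming the usual MLX affine layout of one fp16 scale and one fp16 bias per 64-weight group, and decimal gigabytes (the function name `quantized_size_gb` is ours, not from the repo):

```python
# Back-of-envelope size check for 4-bit, group-size-64 affine quantization,
# using the previous card's figures: 397B total parameters, ~223 GB on disk.

def quantized_size_gb(n_params, bits=4, group_size=64, scale_bits=16, bias_bits=16):
    # Effective bits per weight = 4-bit payload + per-group scale/bias overhead.
    effective_bits = bits + (scale_bits + bias_bits) / group_size
    return n_params * effective_bits / 8 / 1e9

print(round(quantized_size_gb(397e9)))  # → 223, matching the card's "~223 GB"

# The per-layer 8-bit router-gate overrides in config.json follow a strict
# pattern and can be regenerated for all 60 layers:
overrides = {
    f"language_model.model.layers.{i}.mlp.{gate}": {"group_size": 64, "bits": 8}
    for i in range(60)
    for gate in ("gate", "shared_expert_gate")
}
print(len(overrides))  # → 120 (2 gates × 60 layers)
```

The 0.5 extra bits per weight from group metadata are why a "4-bit" 397B model lands at ~223 GB rather than ~198 GB.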
config.json CHANGED
@@ -11,972 +11,12 @@
    "quantization": {
      "group_size": 64,
      "bits": 4,
-     "mode": "affine",
-     "language_model.model.layers.0.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.0.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.1.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.1.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.2.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.2.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.3.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.3.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.4.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.4.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.5.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.5.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.6.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.6.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.7.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.7.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.8.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.8.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.9.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.9.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.10.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.10.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.11.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.11.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.12.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.12.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.13.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.13.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.14.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.14.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.15.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.15.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.16.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.16.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.17.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.17.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.18.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.18.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.19.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.19.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.20.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.20.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.21.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.21.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.22.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.22.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.23.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.23.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.24.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.24.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.25.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.25.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.26.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.26.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.27.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.27.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.28.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.28.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.29.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.29.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.30.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.30.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.31.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.31.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.32.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.32.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.33.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.33.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.34.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.34.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.35.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.35.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.36.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.36.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.37.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.37.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.38.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.38.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.39.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.39.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.40.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.40.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.41.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.41.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.42.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.42.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.43.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.43.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.44.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.44.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.45.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.45.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.46.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.46.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.47.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.47.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.48.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.48.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.49.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.49.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.50.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.50.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.51.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.51.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.52.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.52.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.53.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.53.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.54.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.54.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.55.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.55.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.56.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.56.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.57.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.57.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.58.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.58.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.59.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.59.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     }
    },
    "quantization_config": {
      "group_size": 64,
      "bits": 4,
-     "mode": "affine",
-     "language_model.model.layers.0.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.0.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.1.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.1.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.2.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.2.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.3.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.3.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.4.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.4.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.5.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.5.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.6.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.6.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.7.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.7.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.8.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.8.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.9.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.9.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.10.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.10.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.11.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.11.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.12.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.12.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.13.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.13.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.14.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.14.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.15.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.15.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.16.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.16.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.17.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.17.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.18.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.18.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.19.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.19.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.20.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.20.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.21.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.21.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.22.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.22.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.23.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.23.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.24.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.24.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.25.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.25.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.26.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.26.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.27.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.27.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.28.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.28.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.29.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.29.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.30.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.30.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.31.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.31.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.32.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.32.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.33.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.33.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.34.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.34.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.35.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.35.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.36.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.36.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.37.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.37.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.38.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.38.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.39.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.39.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.40.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.40.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.41.mlp.gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.41.mlp.shared_expert_gate": {
-       "group_size": 64,
-       "bits": 8
-     },
-     "language_model.model.layers.42.mlp.gate": {
837
- "group_size": 64,
838
- "bits": 8
839
- },
840
- "language_model.model.layers.42.mlp.shared_expert_gate": {
841
- "group_size": 64,
842
- "bits": 8
843
- },
844
- "language_model.model.layers.43.mlp.gate": {
845
- "group_size": 64,
846
- "bits": 8
847
- },
848
- "language_model.model.layers.43.mlp.shared_expert_gate": {
849
- "group_size": 64,
850
- "bits": 8
851
- },
852
- "language_model.model.layers.44.mlp.gate": {
853
- "group_size": 64,
854
- "bits": 8
855
- },
856
- "language_model.model.layers.44.mlp.shared_expert_gate": {
857
- "group_size": 64,
858
- "bits": 8
859
- },
860
- "language_model.model.layers.45.mlp.gate": {
861
- "group_size": 64,
862
- "bits": 8
863
- },
864
- "language_model.model.layers.45.mlp.shared_expert_gate": {
865
- "group_size": 64,
866
- "bits": 8
867
- },
868
- "language_model.model.layers.46.mlp.gate": {
869
- "group_size": 64,
870
- "bits": 8
871
- },
872
- "language_model.model.layers.46.mlp.shared_expert_gate": {
873
- "group_size": 64,
874
- "bits": 8
875
- },
876
- "language_model.model.layers.47.mlp.gate": {
877
- "group_size": 64,
878
- "bits": 8
879
- },
880
- "language_model.model.layers.47.mlp.shared_expert_gate": {
881
- "group_size": 64,
882
- "bits": 8
883
- },
884
- "language_model.model.layers.48.mlp.gate": {
885
- "group_size": 64,
886
- "bits": 8
887
- },
888
- "language_model.model.layers.48.mlp.shared_expert_gate": {
889
- "group_size": 64,
890
- "bits": 8
891
- },
892
- "language_model.model.layers.49.mlp.gate": {
893
- "group_size": 64,
894
- "bits": 8
895
- },
896
- "language_model.model.layers.49.mlp.shared_expert_gate": {
897
- "group_size": 64,
898
- "bits": 8
899
- },
900
- "language_model.model.layers.50.mlp.gate": {
901
- "group_size": 64,
902
- "bits": 8
903
- },
904
- "language_model.model.layers.50.mlp.shared_expert_gate": {
905
- "group_size": 64,
906
- "bits": 8
907
- },
908
- "language_model.model.layers.51.mlp.gate": {
909
- "group_size": 64,
910
- "bits": 8
911
- },
912
- "language_model.model.layers.51.mlp.shared_expert_gate": {
913
- "group_size": 64,
914
- "bits": 8
915
- },
916
- "language_model.model.layers.52.mlp.gate": {
917
- "group_size": 64,
918
- "bits": 8
919
- },
920
- "language_model.model.layers.52.mlp.shared_expert_gate": {
921
- "group_size": 64,
922
- "bits": 8
923
- },
924
- "language_model.model.layers.53.mlp.gate": {
925
- "group_size": 64,
926
- "bits": 8
927
- },
928
- "language_model.model.layers.53.mlp.shared_expert_gate": {
929
- "group_size": 64,
930
- "bits": 8
931
- },
932
- "language_model.model.layers.54.mlp.gate": {
933
- "group_size": 64,
934
- "bits": 8
935
- },
936
- "language_model.model.layers.54.mlp.shared_expert_gate": {
937
- "group_size": 64,
938
- "bits": 8
939
- },
940
- "language_model.model.layers.55.mlp.gate": {
941
- "group_size": 64,
942
- "bits": 8
943
- },
944
- "language_model.model.layers.55.mlp.shared_expert_gate": {
945
- "group_size": 64,
946
- "bits": 8
947
- },
948
- "language_model.model.layers.56.mlp.gate": {
949
- "group_size": 64,
950
- "bits": 8
951
- },
952
- "language_model.model.layers.56.mlp.shared_expert_gate": {
953
- "group_size": 64,
954
- "bits": 8
955
- },
956
- "language_model.model.layers.57.mlp.gate": {
957
- "group_size": 64,
958
- "bits": 8
959
- },
960
- "language_model.model.layers.57.mlp.shared_expert_gate": {
961
- "group_size": 64,
962
- "bits": 8
963
- },
964
- "language_model.model.layers.58.mlp.gate": {
965
- "group_size": 64,
966
- "bits": 8
967
- },
968
- "language_model.model.layers.58.mlp.shared_expert_gate": {
969
- "group_size": 64,
970
- "bits": 8
971
- },
972
- "language_model.model.layers.59.mlp.gate": {
973
- "group_size": 64,
974
- "bits": 8
975
- },
976
- "language_model.model.layers.59.mlp.shared_expert_gate": {
977
- "group_size": 64,
978
- "bits": 8
979
- }
980
  },
981
  "text_config": {
982
  "attention_bias": false,
@@ -1080,14 +120,30 @@
  11,
  10
  ],
  "rope_theta": 10000000,
- "partial_rotary_factor": 0.25,
- "type": "default"
  }
  },
  "tie_word_embeddings": false,
  "transformers_version": "4.57.0.dev0",
  "video_token_id": 248057,
  "vision_end_token_id": 248054,
  "vision_start_token_id": 248053
  }
  "quantization": {
  "group_size": 64,
  "bits": 4,
+ "mode": "affine"
  },
  "quantization_config": {
  "group_size": 64,
  "bits": 4,
+ "mode": "affine"
  },
  "text_config": {
  "attention_bias": false,
  11,
  10
  ],
+ "rope_type": "default",
  "rope_theta": 10000000,
+ "partial_rotary_factor": 0.25
  }
  },
  "tie_word_embeddings": false,
  "transformers_version": "4.57.0.dev0",
  "video_token_id": 248057,
+ "vision_config": {
+ "deepstack_visual_indexes": [],
+ "depth": 27,
+ "hidden_act": "gelu_pytorch_tanh",
+ "hidden_size": 1152,
+ "in_channels": 3,
+ "initializer_range": 0.02,
+ "intermediate_size": 4304,
+ "model_type": "qwen3_5_moe",
+ "num_heads": 16,
+ "num_position_embeddings": 2304,
+ "out_hidden_size": 4096,
+ "patch_size": 16,
+ "spatial_merge_size": 2,
+ "temporal_patch_size": 2
+ },
  "vision_end_token_id": 248054,
  "vision_start_token_id": 248053
  }
model-00001-of-00046.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:5c9a28b5750dc17c9f1cf5d8c8a9a5682839a17680179362f163200c76dd4240
- size 4340497198
+ oid sha256:361799676d080074e65b29ce20cae9cde05b56831c372d487ed4887b301b26a8
+ size 5250456685
model-00002-of-00046.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:5dcf840a69486fc1a1d7f5a66e54bb1227e1887b0f68cbc11e38417734fc5ff8
- size 4907625691
+ oid sha256:b032c716ed386ef8ee090c1a1915b46c3c53b3899bca3be42cbd9885d4f991e8
+ size 4906575065
model-00003-of-00046.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:596c6caf10767f1908374ee29d296050621e64071677381c692cf43cec988950
- size 4900154294
+ oid sha256:ce73c3ca05a82486f4112e46a5fe80d965590c0527a5e6bd3b1fc2f0c95e3e5e
+ size 4899103668
model-00004-of-00046.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:9c7f89207ac733e4c46637e87b0eaaa438bcd65f8939581e0391cb1f069b8ced
- size 4983411456
+ oid sha256:182b3b9461f573e73d4d27696346ca3634216234c113bf3ea7c2fda32a77fa6c
+ size 4981310204
model-00005-of-00046.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:c0ff1ec39982dc9dd393fdf529e25ec751a7ff5f62633f1d0f95b8515303a2f3
- size 4907625683
+ oid sha256:c8452e7513c569caece2916750d6adf22bf1a8fb17c1bd7e2de72197c905098d
+ size 4906575057
model-00006-of-00046.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:e3261a10f09f2d7f6940e487fa386d0a6e28a099a303725ea3ad99b9756b995e
- size 4900154304
+ oid sha256:1647865d9f3a10eada5923056000a960c6c5c15d32b9e2fdaabfcc0498b984c7
+ size 4899103678
model-00007-of-00046.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:32b7b731e4f701864f7dc4fa436f2fbb433445aae3b788492ca1804801b7c170
- size 4983411398
+ oid sha256:32246ab4ff7a98e06ad6c0be7dde31aae0e75665fe3cc9bc8aa55bda16f0fcbd
+ size 4981310146
model-00008-of-00046.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:f01243846c0a73797d9b71805d447e97fe360a52b091b7a46091c1d7bfe7f170
- size 4907625736
+ oid sha256:c9cf28f7d83015f239fdfd9a8f4570c0623a9b9eaf848d2ab174675c40302eb5
+ size 4906575110
model-00009-of-00046.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:07cf03eeb0e5330be24517bfd174b74fa390496fa7ad34f2a333ff0448078a94
- size 4900154335
+ oid sha256:758f63d945a298653219c151ab8e474b587db4b8db29d08f1a17aafa1e845644
+ size 4899103709
model-00010-of-00046.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:dd812620470986741e852979b36204cf4d6662c5cc4ebd8fe58ab97863f68d69
- size 4983411534
+ oid sha256:fae3b1d52e103cac7bc7ada1db8d8513d5fe9131833b832256cc0fbbae2ad596
+ size 4981310282
model-00011-of-00046.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:35d94cd7575ccfc5de47304e352ca48b9e2cb070a9d30c695f129ca1ff26b9d0
- size 4907625787
+ oid sha256:e6cfec296e8ce24e3782a56eb5bbba256eb0e9749a06806431870654476430a6
+ size 4906575161
model-00012-of-00046.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:4b41683da8a88716fc09a276ffbc6d5f586b51aab4b50b699d7d92a0382ee68d
- size 4900154341
+ oid sha256:46deac6ed7bf0f54dc32a0f7168cc3a63093847f0086fb375db7fbbfcd553185
+ size 4899103715
model-00013-of-00046.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:cf76e4844b40ddbada0c73b0b9293efcdfb4c17b524aaab8eee5f5449dcb2d26
- size 4983411496
+ oid sha256:727767e29695e06e7041570c070b3a89835ac7a5dcff3fc907e0722319f7bd3a
+ size 4981310244
model-00014-of-00046.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:d17bf0f5160a667abe7882ec579be4190b7a9d49a34e2d1575a608b147165747
- size 4907625727
+ oid sha256:7c5b5666e91fa8424c69d85378faa92adb65d3682726f0e4c83b877bfff23544
+ size 4906575101
model-00015-of-00046.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:0835c4b9255019cf18953e5ddb53cf9ea4b859fb075e123dced68d92006d8ae7
- size 4900154347
+ oid sha256:a9b17bce9a5beb74ebede9dbe993abf2b435024fc3d63930959ccc7869553b73
+ size 4899103721
model-00016-of-00046.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:4ad763e4f81c670b0b541a8bd235a3c6dc8baaae387cbf618866cb79ff2562cf
- size 4983411512
+ oid sha256:9fd2152c2bd7e1344151b3364b64fe522bd27e7e461c5f422202cb3a5da4df3c
+ size 4981310260
model-00017-of-00046.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:ab8dffc66afe56d3e0854e015cad6c5804a253e3a0c2aa5d06a759b5de5abf79
- size 4907625723
+ oid sha256:41480b57a8f6675b750d286b80800ac56875055df06224fb6a62eb4f2497c8ed
+ size 4906575097
model-00018-of-00046.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:6c6cf934936edc29c7e8ecb9ab269e19d0b372134e20608c17bc6fd0ab90e9fe
- size 4900154337
+ oid sha256:cdcebab30ff3fbfd288efeb0ac582c35c98188689e267d020e365bdc88cf729c
+ size 4899103711
model-00019-of-00046.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:ba53cf84edc7ad5aa7f7c320980667a5a581a4803c48ce89ae0b7965f8c65c08
- size 4983411516
+ oid sha256:6948f546257ed4b09cf53782d36d193907e8ecf22f528729c66c66cbf09f2115
+ size 4981310264
model-00020-of-00046.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:305d58849d3703a9546b25c4150ca5bb845824aba3cc5e87f74340e4c3d211b4
- size 4907625775
+ oid sha256:f727102dd2c400483f56e319425c15a7b48ceabbf60515fba7a6f68bdc90a75f
+ size 4906575149
model-00021-of-00046.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:30b4207945236e718ecf672a027caea75a681ae18675fd0a7380fb7552a28cdb
- size 4900154317
+ oid sha256:1ce671cbba6e5a7d97513d08511269592b7f5f916375a75b4d4545b90a2c9822
+ size 4899103691
model-00022-of-00046.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:b25f41f408d9b126b434989a108f9af3afb0506872d7f13a193664fbaa4bd8f9
- size 4983411510
+ oid sha256:33ef1bc5e7658a63a38f8223d4adc889df80c6cfed4abe07709e1e2280f803f2
+ size 4981310258
model-00023-of-00046.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:fff7a036434adf442f85fa6f35118879a41cbd6c48d82ae2a7cada2c35c0f2b2
- size 4907625783
+ oid sha256:ca6c64a485e6c01aab595e35e30c5e4f6e58dc27d8d959fd3185d07bb3c5444d
+ size 4906575157
model-00024-of-00046.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:da885e4fc646e7f6f07e4d17f8adf430041bc352eb446a0b3eb15e994c4f75b6
- size 4900154347
+ oid sha256:80a1f2a69d7ac8ba1d116ba9abb3627cd522b10f894170f829a750ce128b8885
+ size 4899103721
model-00025-of-00046.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:cc581b9184114a9188f360cf93b39c5ac6d2e5bd2caa36ee017666f15f9ed1f9
- size 4983411532
+ oid sha256:a57048a75c863b353b02c55ed61ac408e22d935db6bc4530f2cbdf75018d504c
+ size 4981310280
model-00026-of-00046.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:654b46a75961997c606b1106bb3e027b3922830037dd6ef5de342a3cffa1c461
- size 4907625775
+ oid sha256:b0ad2519dce9ddfaa76f6a0379ba7c4f4549513fe7cd3310d94b74e0560417bf
+ size 4906575149
model-00027-of-00046.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:14d5f543cae9d2632bc3047c9b0de42ae64f6e481ddf31019ab21f34cd009da6
- size 4900154347
+ oid sha256:407d0d0ef6d01fd5138538dd8badbcc59f13c3261277d75a12cb4185acc9c613
+ size 4899103721
model-00028-of-00046.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:3ed47f3ef71f5d679ef8561b857af336c4ceab90a845b235a440d86e091ebae1
- size 4983411506
+ oid sha256:5bb4c6166f10203a29e309c9473b8d8a1929007ffc754770cdaec6fc6c9c5f7c
+ size 4981310254
model-00029-of-00046.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:8ad458c1dba5c70a9afefb2ee054a48ab66d1aed30f2f7e058f297ebe059e1bc
- size 4907625779
+ oid sha256:124c156fa19aeca7f721a88fc894b1bc7d36bf2f03c026c5698c0b76d947af06
+ size 4906575153
model-00030-of-00046.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:cd257dff03daa1c8405ab9c1d3c3496c4818a9b0926bcd8bf404d4b4f6d8c18b
- size 4900154331
+ oid sha256:562544a4d7d6203b1c0fe24aac470767f4564ba1399424d679be2af46bc00e6a
+ size 4899103705
model-00031-of-00046.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:b7f10d09732b6319fc4aa4260b420552d46a3eac4bde38aa5c7d98144607e8d6
- size 4983411506
+ oid sha256:1b785e123019df25ff40009f2fb308d53307fd830765238003284f356e6ca387
+ size 4981310254
model-00032-of-00046.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:3cee68ddecb3e6d641741538cc4c97350642ab96f89ec13e686525804be20807
- size 4907625715
+ oid sha256:d6a09d9ad648fbb2ef4db5ef090a4bff11c5dd188920868d721514fd1e07f1ed
+ size 4906575089
model-00033-of-00046.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:75a5add9b57b9c94412c67571cda2156ffe15a505fc5943e582eb9b1d493452b
- size 4900154343
+ oid sha256:d8dca928a363a3e8e763ab11e622b59f09a940f44cc537d600a267403a9b8084
+ size 4899103717
model-00034-of-00046.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:313ba37d3d18228c319561a4fb44e380ca1fd8e7a7c01c0f67f48cbc5556ce5a
- size 4983411486
+ oid sha256:90642ea0d7b03c10083682eefd2341b2d8b3630e845befe7e1344f0734cef0dd
+ size 4981310234
model-00035-of-00046.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:85588835585f95f5eaba7d922af13522497b7f0d1aa22c44679c81822c626f7c
- size 4907625761
+ oid sha256:4ca3580b7cf5fe11f3b18b1a0e1305493bee6a00a17b96dfbb53481a2c5017ea
+ size 4906575135
model-00036-of-00046.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:b54a7d707f3f4a7d05c855afab589ac8064b1528ce5afc4eb7a1e2e54079a44c
- size 4900154277
+ oid sha256:b08ef8225099efef7015208279301e0bc3613f524ff86177d22e320089036723
+ size 4899103651
model-00037-of-00046.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:9ab3cf7d734763ff7987c764248e1e2a38e6eea24a783e8c755d1ffc1e58956f
- size 4983411508
+ oid sha256:f2e70e7e4e1c66adcf3982307dee354852331eb80416207a65d64c986bde9a66
+ size 4981310256
model-00038-of-00046.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:1333e4841ca56979defd12ef6e4994e45553ad89680c91c260f56ae1d786f53e
- size 4907625703
+ oid sha256:e1e2af6748394a0acee2c750cf8bcc88970014f3f9fa14bc891edb1c931fc4d2
+ size 4906575077
model-00039-of-00046.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:e9eb67fc3e83bd0b2381872d2d98c2dad7c4d06647dbf09ef177f1f2f330a7af
- size 4900154343
+ oid sha256:c58e1f29f2cb616a312f2bb05b9a6d6ca2da2ea953a6fd2228e7af5f0a0cfb6b
+ size 4899103717
model-00040-of-00046.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:588d875aaa0fd42472c6ef81491b3e2e76d705a4e3c0b1620181fb054b69815c
- size 4983411540
+ oid sha256:02422272717572c7ce2c1eb4513103e9e688f2c8da52838f464d3f2a4c39eac9
+ size 4981310288
model-00041-of-00046.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:ba08212f76413dda6e1c8fe578edf21c93a8f6519fa49dc71f3b78a69b8f232e
- size 4907625721
+ oid sha256:be64e70bdb7e383cae9a523b077127ab70098a697262498dc66f7cc7d4fc6017
+ size 4906575095
model-00042-of-00046.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:13e94fabe68719c6b9a0c14ef94d1dd557409adafc366abd9a6f4c62234e7505
- size 4900154343
+ oid sha256:5083ca46270bf3ced3a025f47f9c11dccb7cd20a37890fb81e08f9d14f55c71d
+ size 4899103717
model-00043-of-00046.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:b9dd628b31d0fa64cf7450dc93e4520e4dcebc1e1e585db325a8a269ff8fe558
- size 4983411498
+ oid sha256:46cd310c0b15a95477097fe9943a66c5f3f26b629d215dad9e79b64eeeebdf5a
+ size 4981310246
model-00044-of-00046.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:ae90fcfbd7bdf0c3048dea983631a8f895f674db49a55ad5ffbed285145fc33d
- size 4907625783
+ oid sha256:655e1157645305f0dd81ae64e55ee33cc387ead90c0f3a565011301318203bd0
+ size 4906575157
model-00045-of-00046.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:cd995517c9460cf792adc9a96a4b5ecc95e23f72c0d60602b10110d4ef3b7a1d
- size 4900154347
+ oid sha256:e78b2adc5bcca197eef3b96fdcb126f52c14d26e68ee9b994cf0e320329aaeaa
+ size 4899103721
model-00046-of-00046.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:726cda5891e13d726197cd592f7586d11560db314e1adae6fcd270dddfaaab31
- size 1787181790
+ oid sha256:0a008c67647f5487518fa7148905fbe2b4536c37c000fece03b322732daaa0ec
+ size 1787179741
model.safetensors.index.json CHANGED
@@ -1,7 +1,6 @@
  {
  "metadata": {
- "total_size": 223011784832,
- "total_parameters": 396346344576
  },
  "weight_map": {
  "language_model.lm_head.biases": "model-00046-of-00046.safetensors",
@@ -2635,6 +2634,339 @@
  "language_model.model.layers.9.mlp.switch_mlp.up_proj.scales": "model-00008-of-00046.safetensors",
  "language_model.model.layers.9.mlp.switch_mlp.up_proj.weight": "model-00008-of-00046.safetensors",
  "language_model.model.layers.9.post_attention_layernorm.weight": "model-00007-of-00046.safetensors",
- "language_model.model.norm.weight": "model-00046-of-00046.safetensors"
  }
  }
  {
  "metadata": {
+ "total_size": 223860768352
  },
  "weight_map": {
  "language_model.lm_head.biases": "model-00046-of-00046.safetensors",
  "language_model.model.layers.9.mlp.switch_mlp.up_proj.scales": "model-00008-of-00046.safetensors",
  "language_model.model.layers.9.mlp.switch_mlp.up_proj.weight": "model-00008-of-00046.safetensors",
  "language_model.model.layers.9.post_attention_layernorm.weight": "model-00007-of-00046.safetensors",
+ "language_model.model.norm.weight": "model-00046-of-00046.safetensors",
+ "vision_tower.blocks.0.attn.proj.bias": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.0.attn.proj.weight": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.0.attn.qkv.bias": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.0.attn.qkv.weight": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.0.mlp.linear_fc1.bias": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.0.mlp.linear_fc1.weight": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.0.mlp.linear_fc2.bias": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.0.mlp.linear_fc2.weight": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.0.norm1.bias": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.0.norm1.weight": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.0.norm2.bias": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.0.norm2.weight": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.1.attn.proj.bias": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.1.attn.proj.weight": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.1.attn.qkv.bias": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.1.attn.qkv.weight": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.1.mlp.linear_fc1.bias": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.1.mlp.linear_fc1.weight": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.1.mlp.linear_fc2.bias": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.1.mlp.linear_fc2.weight": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.1.norm1.bias": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.1.norm1.weight": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.1.norm2.bias": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.1.norm2.weight": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.10.attn.proj.bias": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.10.attn.proj.weight": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.10.attn.qkv.bias": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.10.attn.qkv.weight": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.10.mlp.linear_fc1.bias": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.10.mlp.linear_fc1.weight": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.10.mlp.linear_fc2.bias": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.10.mlp.linear_fc2.weight": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.10.norm1.bias": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.10.norm1.weight": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.10.norm2.bias": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.10.norm2.weight": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.11.attn.proj.bias": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.11.attn.proj.weight": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.11.attn.qkv.bias": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.11.attn.qkv.weight": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.11.mlp.linear_fc1.bias": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.11.mlp.linear_fc1.weight": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.11.mlp.linear_fc2.bias": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.11.mlp.linear_fc2.weight": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.11.norm1.bias": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.11.norm1.weight": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.11.norm2.bias": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.11.norm2.weight": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.12.attn.proj.bias": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.12.attn.proj.weight": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.12.attn.qkv.bias": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.12.attn.qkv.weight": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.12.mlp.linear_fc1.bias": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.12.mlp.linear_fc1.weight": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.12.mlp.linear_fc2.bias": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.12.mlp.linear_fc2.weight": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.12.norm1.bias": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.12.norm1.weight": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.12.norm2.bias": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.12.norm2.weight": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.13.attn.proj.bias": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.13.attn.proj.weight": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.13.attn.qkv.bias": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.13.attn.qkv.weight": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.13.mlp.linear_fc1.bias": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.13.mlp.linear_fc1.weight": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.13.mlp.linear_fc2.bias": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.13.mlp.linear_fc2.weight": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.13.norm1.bias": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.13.norm1.weight": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.13.norm2.bias": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.13.norm2.weight": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.14.attn.proj.bias": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.14.attn.proj.weight": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.14.attn.qkv.bias": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.14.attn.qkv.weight": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.14.mlp.linear_fc1.bias": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.14.mlp.linear_fc1.weight": "model-00001-of-00046.safetensors",
+ "vision_tower.blocks.14.mlp.linear_fc2.bias": "model-00001-of-00046.safetensors",
2717
+ "vision_tower.blocks.14.mlp.linear_fc2.weight": "model-00001-of-00046.safetensors",
2718
+ "vision_tower.blocks.14.norm1.bias": "model-00001-of-00046.safetensors",
2719
+ "vision_tower.blocks.14.norm1.weight": "model-00001-of-00046.safetensors",
2720
+ "vision_tower.blocks.14.norm2.bias": "model-00001-of-00046.safetensors",
2721
+ "vision_tower.blocks.14.norm2.weight": "model-00001-of-00046.safetensors",
2722
+ "vision_tower.blocks.15.attn.proj.bias": "model-00001-of-00046.safetensors",
2723
+ "vision_tower.blocks.15.attn.proj.weight": "model-00001-of-00046.safetensors",
2724
+ "vision_tower.blocks.15.attn.qkv.bias": "model-00001-of-00046.safetensors",
2725
+ "vision_tower.blocks.15.attn.qkv.weight": "model-00001-of-00046.safetensors",
2726
+ "vision_tower.blocks.15.mlp.linear_fc1.bias": "model-00001-of-00046.safetensors",
2727
+ "vision_tower.blocks.15.mlp.linear_fc1.weight": "model-00001-of-00046.safetensors",
2728
+ "vision_tower.blocks.15.mlp.linear_fc2.bias": "model-00001-of-00046.safetensors",
2729
+ "vision_tower.blocks.15.mlp.linear_fc2.weight": "model-00001-of-00046.safetensors",
2730
+ "vision_tower.blocks.15.norm1.bias": "model-00001-of-00046.safetensors",
2731
+ "vision_tower.blocks.15.norm1.weight": "model-00001-of-00046.safetensors",
2732
+ "vision_tower.blocks.15.norm2.bias": "model-00001-of-00046.safetensors",
2733
+ "vision_tower.blocks.15.norm2.weight": "model-00001-of-00046.safetensors",
2734
+ "vision_tower.blocks.16.attn.proj.bias": "model-00001-of-00046.safetensors",
2735
+ "vision_tower.blocks.16.attn.proj.weight": "model-00001-of-00046.safetensors",
2736
+ "vision_tower.blocks.16.attn.qkv.bias": "model-00001-of-00046.safetensors",
2737
+ "vision_tower.blocks.16.attn.qkv.weight": "model-00001-of-00046.safetensors",
2738
+ "vision_tower.blocks.16.mlp.linear_fc1.bias": "model-00001-of-00046.safetensors",
2739
+ "vision_tower.blocks.16.mlp.linear_fc1.weight": "model-00001-of-00046.safetensors",
2740
+ "vision_tower.blocks.16.mlp.linear_fc2.bias": "model-00001-of-00046.safetensors",
2741
+ "vision_tower.blocks.16.mlp.linear_fc2.weight": "model-00001-of-00046.safetensors",
2742
+ "vision_tower.blocks.16.norm1.bias": "model-00001-of-00046.safetensors",
2743
+ "vision_tower.blocks.16.norm1.weight": "model-00001-of-00046.safetensors",
2744
+ "vision_tower.blocks.16.norm2.bias": "model-00001-of-00046.safetensors",
2745
+ "vision_tower.blocks.16.norm2.weight": "model-00001-of-00046.safetensors",
2746
+ "vision_tower.blocks.17.attn.proj.bias": "model-00001-of-00046.safetensors",
2747
+ "vision_tower.blocks.17.attn.proj.weight": "model-00001-of-00046.safetensors",
2748
+ "vision_tower.blocks.17.attn.qkv.bias": "model-00001-of-00046.safetensors",
2749
+ "vision_tower.blocks.17.attn.qkv.weight": "model-00001-of-00046.safetensors",
2750
+ "vision_tower.blocks.17.mlp.linear_fc1.bias": "model-00001-of-00046.safetensors",
2751
+ "vision_tower.blocks.17.mlp.linear_fc1.weight": "model-00001-of-00046.safetensors",
2752
+ "vision_tower.blocks.17.mlp.linear_fc2.bias": "model-00001-of-00046.safetensors",
2753
+ "vision_tower.blocks.17.mlp.linear_fc2.weight": "model-00001-of-00046.safetensors",
2754
+ "vision_tower.blocks.17.norm1.bias": "model-00001-of-00046.safetensors",
2755
+ "vision_tower.blocks.17.norm1.weight": "model-00001-of-00046.safetensors",
2756
+ "vision_tower.blocks.17.norm2.bias": "model-00001-of-00046.safetensors",
2757
+ "vision_tower.blocks.17.norm2.weight": "model-00001-of-00046.safetensors",
2758
+ "vision_tower.blocks.18.attn.proj.bias": "model-00001-of-00046.safetensors",
2759
+ "vision_tower.blocks.18.attn.proj.weight": "model-00001-of-00046.safetensors",
2760
+ "vision_tower.blocks.18.attn.qkv.bias": "model-00001-of-00046.safetensors",
2761
+ "vision_tower.blocks.18.attn.qkv.weight": "model-00001-of-00046.safetensors",
2762
+ "vision_tower.blocks.18.mlp.linear_fc1.bias": "model-00001-of-00046.safetensors",
2763
+ "vision_tower.blocks.18.mlp.linear_fc1.weight": "model-00001-of-00046.safetensors",
2764
+ "vision_tower.blocks.18.mlp.linear_fc2.bias": "model-00001-of-00046.safetensors",
2765
+ "vision_tower.blocks.18.mlp.linear_fc2.weight": "model-00001-of-00046.safetensors",
2766
+ "vision_tower.blocks.18.norm1.bias": "model-00001-of-00046.safetensors",
2767
+ "vision_tower.blocks.18.norm1.weight": "model-00001-of-00046.safetensors",
2768
+ "vision_tower.blocks.18.norm2.bias": "model-00001-of-00046.safetensors",
2769
+ "vision_tower.blocks.18.norm2.weight": "model-00001-of-00046.safetensors",
2770
+ "vision_tower.blocks.19.attn.proj.bias": "model-00001-of-00046.safetensors",
2771
+ "vision_tower.blocks.19.attn.proj.weight": "model-00001-of-00046.safetensors",
2772
+ "vision_tower.blocks.19.attn.qkv.bias": "model-00001-of-00046.safetensors",
2773
+ "vision_tower.blocks.19.attn.qkv.weight": "model-00001-of-00046.safetensors",
2774
+ "vision_tower.blocks.19.mlp.linear_fc1.bias": "model-00001-of-00046.safetensors",
2775
+ "vision_tower.blocks.19.mlp.linear_fc1.weight": "model-00001-of-00046.safetensors",
2776
+ "vision_tower.blocks.19.mlp.linear_fc2.bias": "model-00001-of-00046.safetensors",
2777
+ "vision_tower.blocks.19.mlp.linear_fc2.weight": "model-00001-of-00046.safetensors",
2778
+ "vision_tower.blocks.19.norm1.bias": "model-00001-of-00046.safetensors",
2779
+ "vision_tower.blocks.19.norm1.weight": "model-00001-of-00046.safetensors",
2780
+ "vision_tower.blocks.19.norm2.bias": "model-00001-of-00046.safetensors",
2781
+ "vision_tower.blocks.19.norm2.weight": "model-00001-of-00046.safetensors",
2782
+ "vision_tower.blocks.2.attn.proj.bias": "model-00001-of-00046.safetensors",
2783
+ "vision_tower.blocks.2.attn.proj.weight": "model-00001-of-00046.safetensors",
2784
+ "vision_tower.blocks.2.attn.qkv.bias": "model-00001-of-00046.safetensors",
2785
+ "vision_tower.blocks.2.attn.qkv.weight": "model-00001-of-00046.safetensors",
2786
+ "vision_tower.blocks.2.mlp.linear_fc1.bias": "model-00001-of-00046.safetensors",
2787
+ "vision_tower.blocks.2.mlp.linear_fc1.weight": "model-00001-of-00046.safetensors",
2788
+ "vision_tower.blocks.2.mlp.linear_fc2.bias": "model-00001-of-00046.safetensors",
2789
+ "vision_tower.blocks.2.mlp.linear_fc2.weight": "model-00001-of-00046.safetensors",
2790
+ "vision_tower.blocks.2.norm1.bias": "model-00001-of-00046.safetensors",
2791
+ "vision_tower.blocks.2.norm1.weight": "model-00001-of-00046.safetensors",
2792
+ "vision_tower.blocks.2.norm2.bias": "model-00001-of-00046.safetensors",
2793
+ "vision_tower.blocks.2.norm2.weight": "model-00001-of-00046.safetensors",
2794
+ "vision_tower.blocks.20.attn.proj.bias": "model-00001-of-00046.safetensors",
2795
+ "vision_tower.blocks.20.attn.proj.weight": "model-00001-of-00046.safetensors",
2796
+ "vision_tower.blocks.20.attn.qkv.bias": "model-00001-of-00046.safetensors",
2797
+ "vision_tower.blocks.20.attn.qkv.weight": "model-00001-of-00046.safetensors",
2798
+ "vision_tower.blocks.20.mlp.linear_fc1.bias": "model-00001-of-00046.safetensors",
2799
+ "vision_tower.blocks.20.mlp.linear_fc1.weight": "model-00001-of-00046.safetensors",
2800
+ "vision_tower.blocks.20.mlp.linear_fc2.bias": "model-00001-of-00046.safetensors",
2801
+ "vision_tower.blocks.20.mlp.linear_fc2.weight": "model-00001-of-00046.safetensors",
2802
+ "vision_tower.blocks.20.norm1.bias": "model-00001-of-00046.safetensors",
2803
+ "vision_tower.blocks.20.norm1.weight": "model-00001-of-00046.safetensors",
2804
+ "vision_tower.blocks.20.norm2.bias": "model-00001-of-00046.safetensors",
2805
+ "vision_tower.blocks.20.norm2.weight": "model-00001-of-00046.safetensors",
2806
+ "vision_tower.blocks.21.attn.proj.bias": "model-00001-of-00046.safetensors",
2807
+ "vision_tower.blocks.21.attn.proj.weight": "model-00001-of-00046.safetensors",
2808
+ "vision_tower.blocks.21.attn.qkv.bias": "model-00001-of-00046.safetensors",
2809
+ "vision_tower.blocks.21.attn.qkv.weight": "model-00001-of-00046.safetensors",
2810
+ "vision_tower.blocks.21.mlp.linear_fc1.bias": "model-00001-of-00046.safetensors",
2811
+ "vision_tower.blocks.21.mlp.linear_fc1.weight": "model-00001-of-00046.safetensors",
2812
+ "vision_tower.blocks.21.mlp.linear_fc2.bias": "model-00001-of-00046.safetensors",
2813
+ "vision_tower.blocks.21.mlp.linear_fc2.weight": "model-00001-of-00046.safetensors",
2814
+ "vision_tower.blocks.21.norm1.bias": "model-00001-of-00046.safetensors",
2815
+ "vision_tower.blocks.21.norm1.weight": "model-00001-of-00046.safetensors",
2816
+ "vision_tower.blocks.21.norm2.bias": "model-00001-of-00046.safetensors",
2817
+ "vision_tower.blocks.21.norm2.weight": "model-00001-of-00046.safetensors",
2818
+ "vision_tower.blocks.22.attn.proj.bias": "model-00001-of-00046.safetensors",
2819
+ "vision_tower.blocks.22.attn.proj.weight": "model-00001-of-00046.safetensors",
2820
+ "vision_tower.blocks.22.attn.qkv.bias": "model-00001-of-00046.safetensors",
2821
+ "vision_tower.blocks.22.attn.qkv.weight": "model-00001-of-00046.safetensors",
2822
+ "vision_tower.blocks.22.mlp.linear_fc1.bias": "model-00001-of-00046.safetensors",
2823
+ "vision_tower.blocks.22.mlp.linear_fc1.weight": "model-00001-of-00046.safetensors",
2824
+ "vision_tower.blocks.22.mlp.linear_fc2.bias": "model-00001-of-00046.safetensors",
2825
+ "vision_tower.blocks.22.mlp.linear_fc2.weight": "model-00001-of-00046.safetensors",
2826
+ "vision_tower.blocks.22.norm1.bias": "model-00001-of-00046.safetensors",
2827
+ "vision_tower.blocks.22.norm1.weight": "model-00001-of-00046.safetensors",
2828
+ "vision_tower.blocks.22.norm2.bias": "model-00001-of-00046.safetensors",
2829
+ "vision_tower.blocks.22.norm2.weight": "model-00001-of-00046.safetensors",
2830
+ "vision_tower.blocks.23.attn.proj.bias": "model-00001-of-00046.safetensors",
2831
+ "vision_tower.blocks.23.attn.proj.weight": "model-00001-of-00046.safetensors",
2832
+ "vision_tower.blocks.23.attn.qkv.bias": "model-00001-of-00046.safetensors",
2833
+ "vision_tower.blocks.23.attn.qkv.weight": "model-00001-of-00046.safetensors",
2834
+ "vision_tower.blocks.23.mlp.linear_fc1.bias": "model-00001-of-00046.safetensors",
2835
+ "vision_tower.blocks.23.mlp.linear_fc1.weight": "model-00001-of-00046.safetensors",
2836
+ "vision_tower.blocks.23.mlp.linear_fc2.bias": "model-00001-of-00046.safetensors",
2837
+ "vision_tower.blocks.23.mlp.linear_fc2.weight": "model-00001-of-00046.safetensors",
2838
+ "vision_tower.blocks.23.norm1.bias": "model-00001-of-00046.safetensors",
2839
+ "vision_tower.blocks.23.norm1.weight": "model-00001-of-00046.safetensors",
2840
+ "vision_tower.blocks.23.norm2.bias": "model-00001-of-00046.safetensors",
2841
+ "vision_tower.blocks.23.norm2.weight": "model-00001-of-00046.safetensors",
2842
+ "vision_tower.blocks.24.attn.proj.bias": "model-00001-of-00046.safetensors",
2843
+ "vision_tower.blocks.24.attn.proj.weight": "model-00001-of-00046.safetensors",
2844
+ "vision_tower.blocks.24.attn.qkv.bias": "model-00001-of-00046.safetensors",
2845
+ "vision_tower.blocks.24.attn.qkv.weight": "model-00001-of-00046.safetensors",
2846
+ "vision_tower.blocks.24.mlp.linear_fc1.bias": "model-00001-of-00046.safetensors",
2847
+ "vision_tower.blocks.24.mlp.linear_fc1.weight": "model-00001-of-00046.safetensors",
2848
+ "vision_tower.blocks.24.mlp.linear_fc2.bias": "model-00001-of-00046.safetensors",
2849
+ "vision_tower.blocks.24.mlp.linear_fc2.weight": "model-00001-of-00046.safetensors",
2850
+ "vision_tower.blocks.24.norm1.bias": "model-00001-of-00046.safetensors",
2851
+ "vision_tower.blocks.24.norm1.weight": "model-00001-of-00046.safetensors",
2852
+ "vision_tower.blocks.24.norm2.bias": "model-00001-of-00046.safetensors",
2853
+ "vision_tower.blocks.24.norm2.weight": "model-00001-of-00046.safetensors",
2854
+ "vision_tower.blocks.25.attn.proj.bias": "model-00001-of-00046.safetensors",
2855
+ "vision_tower.blocks.25.attn.proj.weight": "model-00001-of-00046.safetensors",
2856
+ "vision_tower.blocks.25.attn.qkv.bias": "model-00001-of-00046.safetensors",
2857
+ "vision_tower.blocks.25.attn.qkv.weight": "model-00001-of-00046.safetensors",
2858
+ "vision_tower.blocks.25.mlp.linear_fc1.bias": "model-00001-of-00046.safetensors",
2859
+ "vision_tower.blocks.25.mlp.linear_fc1.weight": "model-00001-of-00046.safetensors",
2860
+ "vision_tower.blocks.25.mlp.linear_fc2.bias": "model-00001-of-00046.safetensors",
2861
+ "vision_tower.blocks.25.mlp.linear_fc2.weight": "model-00001-of-00046.safetensors",
2862
+ "vision_tower.blocks.25.norm1.bias": "model-00001-of-00046.safetensors",
2863
+ "vision_tower.blocks.25.norm1.weight": "model-00001-of-00046.safetensors",
2864
+ "vision_tower.blocks.25.norm2.bias": "model-00001-of-00046.safetensors",
2865
+ "vision_tower.blocks.25.norm2.weight": "model-00001-of-00046.safetensors",
2866
+ "vision_tower.blocks.26.attn.proj.bias": "model-00001-of-00046.safetensors",
2867
+ "vision_tower.blocks.26.attn.proj.weight": "model-00001-of-00046.safetensors",
2868
+ "vision_tower.blocks.26.attn.qkv.bias": "model-00001-of-00046.safetensors",
2869
+ "vision_tower.blocks.26.attn.qkv.weight": "model-00001-of-00046.safetensors",
2870
+ "vision_tower.blocks.26.mlp.linear_fc1.bias": "model-00001-of-00046.safetensors",
2871
+ "vision_tower.blocks.26.mlp.linear_fc1.weight": "model-00001-of-00046.safetensors",
2872
+ "vision_tower.blocks.26.mlp.linear_fc2.bias": "model-00001-of-00046.safetensors",
2873
+ "vision_tower.blocks.26.mlp.linear_fc2.weight": "model-00001-of-00046.safetensors",
2874
+ "vision_tower.blocks.26.norm1.bias": "model-00001-of-00046.safetensors",
2875
+ "vision_tower.blocks.26.norm1.weight": "model-00001-of-00046.safetensors",
2876
+ "vision_tower.blocks.26.norm2.bias": "model-00001-of-00046.safetensors",
2877
+ "vision_tower.blocks.26.norm2.weight": "model-00001-of-00046.safetensors",
2878
+ "vision_tower.blocks.3.attn.proj.bias": "model-00001-of-00046.safetensors",
2879
+ "vision_tower.blocks.3.attn.proj.weight": "model-00001-of-00046.safetensors",
2880
+ "vision_tower.blocks.3.attn.qkv.bias": "model-00001-of-00046.safetensors",
2881
+ "vision_tower.blocks.3.attn.qkv.weight": "model-00001-of-00046.safetensors",
2882
+ "vision_tower.blocks.3.mlp.linear_fc1.bias": "model-00001-of-00046.safetensors",
2883
+ "vision_tower.blocks.3.mlp.linear_fc1.weight": "model-00001-of-00046.safetensors",
2884
+ "vision_tower.blocks.3.mlp.linear_fc2.bias": "model-00001-of-00046.safetensors",
2885
+ "vision_tower.blocks.3.mlp.linear_fc2.weight": "model-00001-of-00046.safetensors",
2886
+ "vision_tower.blocks.3.norm1.bias": "model-00001-of-00046.safetensors",
2887
+ "vision_tower.blocks.3.norm1.weight": "model-00001-of-00046.safetensors",
2888
+ "vision_tower.blocks.3.norm2.bias": "model-00001-of-00046.safetensors",
2889
+ "vision_tower.blocks.3.norm2.weight": "model-00001-of-00046.safetensors",
2890
+ "vision_tower.blocks.4.attn.proj.bias": "model-00001-of-00046.safetensors",
2891
+ "vision_tower.blocks.4.attn.proj.weight": "model-00001-of-00046.safetensors",
2892
+ "vision_tower.blocks.4.attn.qkv.bias": "model-00001-of-00046.safetensors",
2893
+ "vision_tower.blocks.4.attn.qkv.weight": "model-00001-of-00046.safetensors",
2894
+ "vision_tower.blocks.4.mlp.linear_fc1.bias": "model-00001-of-00046.safetensors",
2895
+ "vision_tower.blocks.4.mlp.linear_fc1.weight": "model-00001-of-00046.safetensors",
2896
+ "vision_tower.blocks.4.mlp.linear_fc2.bias": "model-00001-of-00046.safetensors",
2897
+ "vision_tower.blocks.4.mlp.linear_fc2.weight": "model-00001-of-00046.safetensors",
2898
+ "vision_tower.blocks.4.norm1.bias": "model-00001-of-00046.safetensors",
2899
+ "vision_tower.blocks.4.norm1.weight": "model-00001-of-00046.safetensors",
2900
+ "vision_tower.blocks.4.norm2.bias": "model-00001-of-00046.safetensors",
2901
+ "vision_tower.blocks.4.norm2.weight": "model-00001-of-00046.safetensors",
2902
+ "vision_tower.blocks.5.attn.proj.bias": "model-00001-of-00046.safetensors",
2903
+ "vision_tower.blocks.5.attn.proj.weight": "model-00001-of-00046.safetensors",
2904
+ "vision_tower.blocks.5.attn.qkv.bias": "model-00001-of-00046.safetensors",
2905
+ "vision_tower.blocks.5.attn.qkv.weight": "model-00001-of-00046.safetensors",
2906
+ "vision_tower.blocks.5.mlp.linear_fc1.bias": "model-00001-of-00046.safetensors",
2907
+ "vision_tower.blocks.5.mlp.linear_fc1.weight": "model-00001-of-00046.safetensors",
2908
+ "vision_tower.blocks.5.mlp.linear_fc2.bias": "model-00001-of-00046.safetensors",
2909
+ "vision_tower.blocks.5.mlp.linear_fc2.weight": "model-00001-of-00046.safetensors",
2910
+ "vision_tower.blocks.5.norm1.bias": "model-00001-of-00046.safetensors",
2911
+ "vision_tower.blocks.5.norm1.weight": "model-00001-of-00046.safetensors",
2912
+ "vision_tower.blocks.5.norm2.bias": "model-00001-of-00046.safetensors",
2913
+ "vision_tower.blocks.5.norm2.weight": "model-00001-of-00046.safetensors",
2914
+ "vision_tower.blocks.6.attn.proj.bias": "model-00001-of-00046.safetensors",
2915
+ "vision_tower.blocks.6.attn.proj.weight": "model-00001-of-00046.safetensors",
2916
+ "vision_tower.blocks.6.attn.qkv.bias": "model-00001-of-00046.safetensors",
2917
+ "vision_tower.blocks.6.attn.qkv.weight": "model-00001-of-00046.safetensors",
2918
+ "vision_tower.blocks.6.mlp.linear_fc1.bias": "model-00001-of-00046.safetensors",
2919
+ "vision_tower.blocks.6.mlp.linear_fc1.weight": "model-00001-of-00046.safetensors",
2920
+ "vision_tower.blocks.6.mlp.linear_fc2.bias": "model-00001-of-00046.safetensors",
2921
+ "vision_tower.blocks.6.mlp.linear_fc2.weight": "model-00001-of-00046.safetensors",
2922
+ "vision_tower.blocks.6.norm1.bias": "model-00001-of-00046.safetensors",
2923
+ "vision_tower.blocks.6.norm1.weight": "model-00001-of-00046.safetensors",
2924
+ "vision_tower.blocks.6.norm2.bias": "model-00001-of-00046.safetensors",
2925
+ "vision_tower.blocks.6.norm2.weight": "model-00001-of-00046.safetensors",
2926
+ "vision_tower.blocks.7.attn.proj.bias": "model-00001-of-00046.safetensors",
2927
+ "vision_tower.blocks.7.attn.proj.weight": "model-00001-of-00046.safetensors",
2928
+ "vision_tower.blocks.7.attn.qkv.bias": "model-00001-of-00046.safetensors",
2929
+ "vision_tower.blocks.7.attn.qkv.weight": "model-00001-of-00046.safetensors",
2930
+ "vision_tower.blocks.7.mlp.linear_fc1.bias": "model-00001-of-00046.safetensors",
2931
+ "vision_tower.blocks.7.mlp.linear_fc1.weight": "model-00001-of-00046.safetensors",
2932
+ "vision_tower.blocks.7.mlp.linear_fc2.bias": "model-00001-of-00046.safetensors",
2933
+ "vision_tower.blocks.7.mlp.linear_fc2.weight": "model-00001-of-00046.safetensors",
2934
+ "vision_tower.blocks.7.norm1.bias": "model-00001-of-00046.safetensors",
2935
+ "vision_tower.blocks.7.norm1.weight": "model-00001-of-00046.safetensors",
2936
+ "vision_tower.blocks.7.norm2.bias": "model-00001-of-00046.safetensors",
2937
+ "vision_tower.blocks.7.norm2.weight": "model-00001-of-00046.safetensors",
2938
+ "vision_tower.blocks.8.attn.proj.bias": "model-00001-of-00046.safetensors",
2939
+ "vision_tower.blocks.8.attn.proj.weight": "model-00001-of-00046.safetensors",
2940
+ "vision_tower.blocks.8.attn.qkv.bias": "model-00001-of-00046.safetensors",
2941
+ "vision_tower.blocks.8.attn.qkv.weight": "model-00001-of-00046.safetensors",
2942
+ "vision_tower.blocks.8.mlp.linear_fc1.bias": "model-00001-of-00046.safetensors",
2943
+ "vision_tower.blocks.8.mlp.linear_fc1.weight": "model-00001-of-00046.safetensors",
2944
+ "vision_tower.blocks.8.mlp.linear_fc2.bias": "model-00001-of-00046.safetensors",
2945
+ "vision_tower.blocks.8.mlp.linear_fc2.weight": "model-00001-of-00046.safetensors",
2946
+ "vision_tower.blocks.8.norm1.bias": "model-00001-of-00046.safetensors",
2947
+ "vision_tower.blocks.8.norm1.weight": "model-00001-of-00046.safetensors",
2948
+ "vision_tower.blocks.8.norm2.bias": "model-00001-of-00046.safetensors",
2949
+ "vision_tower.blocks.8.norm2.weight": "model-00001-of-00046.safetensors",
2950
+ "vision_tower.blocks.9.attn.proj.bias": "model-00001-of-00046.safetensors",
2951
+ "vision_tower.blocks.9.attn.proj.weight": "model-00001-of-00046.safetensors",
2952
+ "vision_tower.blocks.9.attn.qkv.bias": "model-00001-of-00046.safetensors",
2953
+ "vision_tower.blocks.9.attn.qkv.weight": "model-00001-of-00046.safetensors",
2954
+ "vision_tower.blocks.9.mlp.linear_fc1.bias": "model-00001-of-00046.safetensors",
2955
+ "vision_tower.blocks.9.mlp.linear_fc1.weight": "model-00001-of-00046.safetensors",
2956
+ "vision_tower.blocks.9.mlp.linear_fc2.bias": "model-00001-of-00046.safetensors",
2957
+ "vision_tower.blocks.9.mlp.linear_fc2.weight": "model-00001-of-00046.safetensors",
2958
+ "vision_tower.blocks.9.norm1.bias": "model-00001-of-00046.safetensors",
2959
+ "vision_tower.blocks.9.norm1.weight": "model-00001-of-00046.safetensors",
2960
+ "vision_tower.blocks.9.norm2.bias": "model-00001-of-00046.safetensors",
2961
+ "vision_tower.blocks.9.norm2.weight": "model-00001-of-00046.safetensors",
2962
+ "vision_tower.merger.linear_fc1.bias": "model-00001-of-00046.safetensors",
2963
+ "vision_tower.merger.linear_fc1.weight": "model-00001-of-00046.safetensors",
2964
+ "vision_tower.merger.linear_fc2.bias": "model-00001-of-00046.safetensors",
2965
+ "vision_tower.merger.linear_fc2.weight": "model-00001-of-00046.safetensors",
2966
+ "vision_tower.merger.norm.bias": "model-00001-of-00046.safetensors",
2967
+ "vision_tower.merger.norm.weight": "model-00001-of-00046.safetensors",
2968
+ "vision_tower.patch_embed.proj.bias": "model-00001-of-00046.safetensors",
2969
+ "vision_tower.patch_embed.proj.weight": "model-00001-of-00046.safetensors",
2970
+ "vision_tower.pos_embed.weight": "model-00001-of-00046.safetensors"
2971
  }
2972
  }
preprocessor_config.json ADDED
@@ -0,0 +1,21 @@
+ {
+ "size": {
+ "longest_edge": 16777216,
+ "shortest_edge": 65536
+ },
+ "patch_size": 16,
+ "temporal_patch_size": 2,
+ "merge_size": 2,
+ "image_mean": [
+ 0.5,
+ 0.5,
+ 0.5
+ ],
+ "image_std": [
+ 0.5,
+ 0.5,
+ 0.5
+ ],
+ "processor_class": "Qwen3VLProcessor",
+ "image_processor_type": "Qwen2VLImageProcessorFast"
+ }
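The `preprocessor_config.json` added above controls how input images are resized before the vision tower patches them. As a minimal sketch only (assuming this config follows the Qwen2-VL convention, where `shortest_edge`/`longest_edge` are total-pixel budgets rather than side lengths, and where each side is snapped to a multiple of `patch_size * merge_size`; the `smart_resize` name below is illustrative, not an API from this repo):

```python
import math

# Values taken from the preprocessor_config.json above.
PATCH_SIZE = 16
MERGE_SIZE = 2
MIN_PIXELS = 65536       # "shortest_edge": assumed total-pixel floor (256 x 256)
MAX_PIXELS = 16777216    # "longest_edge": assumed total-pixel ceiling (4096 x 4096)

def smart_resize(height, width, factor=PATCH_SIZE * MERGE_SIZE,
                 min_pixels=MIN_PIXELS, max_pixels=MAX_PIXELS):
    """Snap each side to a multiple of `factor` (32 here), then rescale the
    total area into [min_pixels, max_pixels] while preserving aspect ratio."""
    h = round(height / factor) * factor
    w = round(width / factor) * factor
    if h * w > max_pixels:
        # Image too large: shrink so the area fits under the budget.
        beta = math.sqrt((height * width) / max_pixels)
        h = math.floor(height / beta / factor) * factor
        w = math.floor(width / beta / factor) * factor
    elif h * w < min_pixels:
        # Image too small: grow so the area reaches the floor.
        beta = math.sqrt(min_pixels / (height * width))
        h = math.ceil(height * beta / factor) * factor
        w = math.ceil(width * beta / factor) * factor
    return h, w

h, w = smart_resize(1080, 1920)
# (h // 16) * (w // 16) ViT patches, divided by merge_size**2 after the merger.
merged_tokens = (h // PATCH_SIZE) * (w // PATCH_SIZE) // (MERGE_SIZE ** 2)
print(h, w, merged_tokens)
```

Under these assumptions a 1080x1920 frame stays near its native resolution (1088x1920), while anything above the 16.8-megapixel budget is scaled down; `temporal_patch_size: 2` applies to video frames and does not affect this per-image arithmetic.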