zengxianyu committed on
Commit
8573fa5
·
1 Parent(s): bc20114

upload submodules

Files changed (50)
This view is limited to 50 files because it contains too many changes.
  1. .gitattributes +1 -0
  2. .gitignore +1 -1
  3. .gitmodules +6 -0
  4. ComfyUI/custom_nodes/ComfyUI-GGUF +1 -0
  5. ComfyUI/custom_nodes/ComfyUI-GGUF/LICENSE +0 -201
  6. ComfyUI/custom_nodes/ComfyUI-GGUF/README.md +0 -49
  7. ComfyUI/custom_nodes/ComfyUI-GGUF/__init__.py +0 -9
  8. ComfyUI/custom_nodes/ComfyUI-GGUF/dequant.py +0 -248
  9. ComfyUI/custom_nodes/ComfyUI-GGUF/loader.py +0 -353
  10. ComfyUI/custom_nodes/ComfyUI-GGUF/nodes.py +0 -305
  11. ComfyUI/custom_nodes/ComfyUI-GGUF/ops.py +0 -281
  12. ComfyUI/custom_nodes/ComfyUI-GGUF/pyproject.toml +0 -14
  13. ComfyUI/custom_nodes/ComfyUI-GGUF/requirements.txt +0 -5
  14. ComfyUI/custom_nodes/ComfyUI-GGUF/tools/README.md +0 -93
  15. ComfyUI/custom_nodes/ComfyUI-GGUF/tools/convert.py +0 -365
  16. ComfyUI/custom_nodes/ComfyUI-GGUF/tools/fix_5d_tensors.py +0 -82
  17. ComfyUI/custom_nodes/ComfyUI-GGUF/tools/fix_lines_ending.py +0 -31
  18. ComfyUI/custom_nodes/ComfyUI-GGUF/tools/lcpp.patch +0 -451
  19. ComfyUI/custom_nodes/ComfyUI-GGUF/tools/read_tensors.py +0 -21
  20. ComfyUI/custom_nodes/cg-image-filter +1 -0
  21. ComfyUI/models/audio_encoders/put_audio_encoder_models_here +0 -0
  22. ComfyUI/models/checkpoints/put_checkpoints_here +0 -0
  23. ComfyUI/models/clip/put_clip_or_text_encoder_models_here +0 -0
  24. ComfyUI/models/clip_vision/put_clip_vision_models_here +0 -0
  25. ComfyUI/models/configs/anything_v3.yaml +0 -73
  26. ComfyUI/models/configs/v1-inference.yaml +0 -70
  27. ComfyUI/models/configs/v1-inference_clip_skip_2.yaml +0 -73
  28. ComfyUI/models/configs/v1-inference_clip_skip_2_fp16.yaml +0 -74
  29. ComfyUI/models/configs/v1-inference_fp16.yaml +0 -71
  30. ComfyUI/models/configs/v1-inpainting-inference.yaml +0 -71
  31. ComfyUI/models/configs/v2-inference-v.yaml +0 -68
  32. ComfyUI/models/configs/v2-inference-v_fp32.yaml +0 -68
  33. ComfyUI/models/configs/v2-inference.yaml +0 -67
  34. ComfyUI/models/configs/v2-inference_fp32.yaml +0 -67
  35. ComfyUI/models/configs/v2-inpainting-inference.yaml +0 -158
  36. ComfyUI/models/controlnet/put_controlnets_and_t2i_here +0 -0
  37. ComfyUI/models/diffusers/put_diffusers_models_here +0 -0
  38. ComfyUI/models/diffusion_models/put_diffusion_model_files_here +0 -0
  39. ComfyUI/models/embeddings/put_embeddings_or_textual_inversion_concepts_here +0 -0
  40. ComfyUI/models/gligen/put_gligen_models_here +0 -0
  41. ComfyUI/models/hypernetworks/put_hypernetworks_here +0 -0
  42. ComfyUI/models/loras/put_loras_here +0 -0
  43. ComfyUI/models/model_patches/put_model_patches_here +0 -0
  44. ComfyUI/models/photomaker/put_photomaker_models_here +0 -0
  45. ComfyUI/models/style_models/put_t2i_style_model_here +0 -0
  46. ComfyUI/models/text_encoders/put_text_encoder_files_here +0 -0
  47. ComfyUI/models/unet/put_unet_files_here +0 -0
  48. ComfyUI/models/upscale_models/put_esrgan_and_other_upscale_models_here +0 -0
  49. ComfyUI/models/vae/put_vae_here +0 -0
  50. ComfyUI/models/vae_approx/put_taesd_encoder_pth_and_taesd_decoder_pth_here +0 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ *.mp4 filter=lfs diff=lfs merge=lfs -text
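The added `*.mp4` rule routes video files through Git LFS, alongside the archive and TensorBoard patterns already present. As a rough illustration only, the sketch below approximates the four patterns visible in this hunk with `fnmatchcase` on a file's base name. `is_lfs_tracked` is a hypothetical helper, not part of Git or of this repository, and real gitattributes matching has additional rules for directories and `**` that are not modeled here.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns visible in this hunk of .gitattributes (the full file has more).
LFS_PATTERNS = ["*.zip", "*.zst", "*tfevents*", "*.mp4"]


def is_lfs_tracked(path: str) -> bool:
    """Approximate check: does the base name match one of the LFS patterns?"""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in LFS_PATTERNS)


if __name__ == "__main__":
    for candidate in ["assets/demo.mp4", "README.md", "runs/events.out.tfevents.123"]:
        print(candidate, is_lfs_tracked(candidate))
```

For an authoritative answer on a real checkout, `git check-attr filter -- <path>` reports the attribute Git actually applies.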
.gitignore CHANGED
@@ -1,3 +1,3 @@
1
  *.pyc
2
  *.gguf
3
- *.safetensors
 
3
+ *.safetensors
.gitmodules ADDED
@@ -0,0 +1,6 @@
1
+ [submodule "ComfyUI/custom_nodes/ComfyUI-GGUF"]
2
+ path = ComfyUI/custom_nodes/ComfyUI-GGUF
3
+ url = https://github.com/city96/ComfyUI-GGUF
4
+ [submodule "ComfyUI/custom_nodes/cg-image-filter"]
5
+ path = ComfyUI/custom_nodes/cg-image-filter
6
+ url = https://github.com/chrisgoringe/cg-image-filter
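With this file in place, the two custom node packs are tracked as submodules rather than as vendored sources, so a fresh clone typically needs `git submodule update --init` (or `git clone --recurse-submodules`) to populate them. The sketch below shows one way to read the path-to-URL mapping from the file with the standard library. `parse_gitmodules` is a hypothetical helper, and the embedded text is copied from the hunk above with indentation assumed.

```python
import configparser

# Content of .gitmodules as added in this commit (indentation is assumed).
GITMODULES_TEXT = """
[submodule "ComfyUI/custom_nodes/ComfyUI-GGUF"]
    path = ComfyUI/custom_nodes/ComfyUI-GGUF
    url = https://github.com/city96/ComfyUI-GGUF
[submodule "ComfyUI/custom_nodes/cg-image-filter"]
    path = ComfyUI/custom_nodes/cg-image-filter
    url = https://github.com/chrisgoringe/cg-image-filter
"""


def parse_gitmodules(text: str) -> dict[str, str]:
    """Map each submodule path to its URL."""
    parser = configparser.ConfigParser(interpolation=None)
    # Strip indentation so entries are never read as continuation lines.
    parser.read_string("\n".join(line.strip() for line in text.splitlines()))
    return {parser[name]["path"]: parser[name]["url"] for name in parser.sections()}


if __name__ == "__main__":
    for path, url in parse_gitmodules(GITMODULES_TEXT).items():
        print(f"{path} -> {url}")
```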
ComfyUI/custom_nodes/ComfyUI-GGUF ADDED
@@ -0,0 +1 @@
1
+ Subproject commit be2a08330d7ec232d684e50ab938870d7529471e
ComfyUI/custom_nodes/ComfyUI-GGUF/LICENSE DELETED
@@ -1,201 +0,0 @@
1
- Apache License
2
- Version 2.0, January 2004
3
- http://www.apache.org/licenses/
4
-
5
- TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6
-
7
- 1. Definitions.
8
-
9
- "License" shall mean the terms and conditions for use, reproduction,
10
- and distribution as defined by Sections 1 through 9 of this document.
11
-
12
- "Licensor" shall mean the copyright owner or entity authorized by
13
- the copyright owner that is granting the License.
14
-
15
- "Legal Entity" shall mean the union of the acting entity and all
16
- other entities that control, are controlled by, or are under common
17
- control with that entity. For the purposes of this definition,
18
- "control" means (i) the power, direct or indirect, to cause the
19
- direction or management of such entity, whether by contract or
20
- otherwise, or (ii) ownership of fifty percent (50%) or more of the
21
- outstanding shares, or (iii) beneficial ownership of such entity.
22
-
23
- "You" (or "Your") shall mean an individual or Legal Entity
24
- exercising permissions granted by this License.
25
-
26
- "Source" form shall mean the preferred form for making modifications,
27
- including but not limited to software source code, documentation
28
- source, and configuration files.
29
-
30
- "Object" form shall mean any form resulting from mechanical
31
- transformation or translation of a Source form, including but
32
- not limited to compiled object code, generated documentation,
33
- and conversions to other media types.
34
-
35
- "Work" shall mean the work of authorship, whether in Source or
36
- Object form, made available under the License, as indicated by a
37
- copyright notice that is included in or attached to the work
38
- (an example is provided in the Appendix below).
39
-
40
- "Derivative Works" shall mean any work, whether in Source or Object
41
- form, that is based on (or derived from) the Work and for which the
42
- editorial revisions, annotations, elaborations, or other modifications
43
- represent, as a whole, an original work of authorship. For the purposes
44
- of this License, Derivative Works shall not include works that remain
45
- separable from, or merely link (or bind by name) to the interfaces of,
46
- the Work and Derivative Works thereof.
47
-
48
- "Contribution" shall mean any work of authorship, including
49
- the original version of the Work and any modifications or additions
50
- to that Work or Derivative Works thereof, that is intentionally
51
- submitted to Licensor for inclusion in the Work by the copyright owner
52
- or by an individual or Legal Entity authorized to submit on behalf of
53
- the copyright owner. For the purposes of this definition, "submitted"
54
- means any form of electronic, verbal, or written communication sent
55
- to the Licensor or its representatives, including but not limited to
56
- communication on electronic mailing lists, source code control systems,
57
- and issue tracking systems that are managed by, or on behalf of, the
58
- Licensor for the purpose of discussing and improving the Work, but
59
- excluding communication that is conspicuously marked or otherwise
60
- designated in writing by the copyright owner as "Not a Contribution."
61
-
62
- "Contributor" shall mean Licensor and any individual or Legal Entity
63
- on behalf of whom a Contribution has been received by Licensor and
64
- subsequently incorporated within the Work.
65
-
66
- 2. Grant of Copyright License. Subject to the terms and conditions of
67
- this License, each Contributor hereby grants to You a perpetual,
68
- worldwide, non-exclusive, no-charge, royalty-free, irrevocable
69
- copyright license to reproduce, prepare Derivative Works of,
70
- publicly display, publicly perform, sublicense, and distribute the
71
- Work and such Derivative Works in Source or Object form.
72
-
73
- 3. Grant of Patent License. Subject to the terms and conditions of
74
- this License, each Contributor hereby grants to You a perpetual,
75
- worldwide, non-exclusive, no-charge, royalty-free, irrevocable
76
- (except as stated in this section) patent license to make, have made,
77
- use, offer to sell, sell, import, and otherwise transfer the Work,
78
- where such license applies only to those patent claims licensable
79
- by such Contributor that are necessarily infringed by their
80
- Contribution(s) alone or by combination of their Contribution(s)
81
- with the Work to which such Contribution(s) was submitted. If You
82
- institute patent litigation against any entity (including a
83
- cross-claim or counterclaim in a lawsuit) alleging that the Work
84
- or a Contribution incorporated within the Work constitutes direct
85
- or contributory patent infringement, then any patent licenses
86
- granted to You under this License for that Work shall terminate
87
- as of the date such litigation is filed.
88
-
89
- 4. Redistribution. You may reproduce and distribute copies of the
90
- Work or Derivative Works thereof in any medium, with or without
91
- modifications, and in Source or Object form, provided that You
92
- meet the following conditions:
93
-
94
- (a) You must give any other recipients of the Work or
95
- Derivative Works a copy of this License; and
96
-
97
- (b) You must cause any modified files to carry prominent notices
98
- stating that You changed the files; and
99
-
100
- (c) You must retain, in the Source form of any Derivative Works
101
- that You distribute, all copyright, patent, trademark, and
102
- attribution notices from the Source form of the Work,
103
- excluding those notices that do not pertain to any part of
104
- the Derivative Works; and
105
-
106
- (d) If the Work includes a "NOTICE" text file as part of its
107
- distribution, then any Derivative Works that You distribute must
108
- include a readable copy of the attribution notices contained
109
- within such NOTICE file, excluding those notices that do not
110
- pertain to any part of the Derivative Works, in at least one
111
- of the following places: within a NOTICE text file distributed
112
- as part of the Derivative Works; within the Source form or
113
- documentation, if provided along with the Derivative Works; or,
114
- within a display generated by the Derivative Works, if and
115
- wherever such third-party notices normally appear. The contents
116
- of the NOTICE file are for informational purposes only and
117
- do not modify the License. You may add Your own attribution
118
- notices within Derivative Works that You distribute, alongside
119
- or as an addendum to the NOTICE text from the Work, provided
120
- that such additional attribution notices cannot be construed
121
- as modifying the License.
122
-
123
- You may add Your own copyright statement to Your modifications and
124
- may provide additional or different license terms and conditions
125
- for use, reproduction, or distribution of Your modifications, or
126
- for any such Derivative Works as a whole, provided Your use,
127
- reproduction, and distribution of the Work otherwise complies with
128
- the conditions stated in this License.
129
-
130
- 5. Submission of Contributions. Unless You explicitly state otherwise,
131
- any Contribution intentionally submitted for inclusion in the Work
132
- by You to the Licensor shall be under the terms and conditions of
133
- this License, without any additional terms or conditions.
134
- Notwithstanding the above, nothing herein shall supersede or modify
135
- the terms of any separate license agreement you may have executed
136
- with Licensor regarding such Contributions.
137
-
138
- 6. Trademarks. This License does not grant permission to use the trade
139
- names, trademarks, service marks, or product names of the Licensor,
140
- except as required for reasonable and customary use in describing the
141
- origin of the Work and reproducing the content of the NOTICE file.
142
-
143
- 7. Disclaimer of Warranty. Unless required by applicable law or
144
- agreed to in writing, Licensor provides the Work (and each
145
- Contributor provides its Contributions) on an "AS IS" BASIS,
146
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147
- implied, including, without limitation, any warranties or conditions
148
- of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149
- PARTICULAR PURPOSE. You are solely responsible for determining the
150
- appropriateness of using or redistributing the Work and assume any
151
- risks associated with Your exercise of permissions under this License.
152
-
153
- 8. Limitation of Liability. In no event and under no legal theory,
154
- whether in tort (including negligence), contract, or otherwise,
155
- unless required by applicable law (such as deliberate and grossly
156
- negligent acts) or agreed to in writing, shall any Contributor be
157
- liable to You for damages, including any direct, indirect, special,
158
- incidental, or consequential damages of any character arising as a
159
- result of this License or out of the use or inability to use the
160
- Work (including but not limited to damages for loss of goodwill,
161
- work stoppage, computer failure or malfunction, or any and all
162
- other commercial damages or losses), even if such Contributor
163
- has been advised of the possibility of such damages.
164
-
165
- 9. Accepting Warranty or Additional Liability. While redistributing
166
- the Work or Derivative Works thereof, You may choose to offer,
167
- and charge a fee for, acceptance of support, warranty, indemnity,
168
- or other liability obligations and/or rights consistent with this
169
- License. However, in accepting such obligations, You may act only
170
- on Your own behalf and on Your sole responsibility, not on behalf
171
- of any other Contributor, and only if You agree to indemnify,
172
- defend, and hold each Contributor harmless for any liability
173
- incurred by, or claims asserted against, such Contributor by reason
174
- of your accepting any such warranty or additional liability.
175
-
176
- END OF TERMS AND CONDITIONS
177
-
178
- APPENDIX: How to apply the Apache License to your work.
179
-
180
- To apply the Apache License to your work, attach the following
181
- boilerplate notice, with the fields enclosed by brackets "[]"
182
- replaced with your own identifying information. (Don't include
183
- the brackets!) The text should be enclosed in the appropriate
184
- comment syntax for the file format. We also recommend that a
185
- file or class name and description of purpose be included on the
186
- same "printed page" as the copyright notice for easier
187
- identification within third-party archives.
188
-
189
- Copyright [yyyy] [name of copyright owner]
190
-
191
- Licensed under the Apache License, Version 2.0 (the "License");
192
- you may not use this file except in compliance with the License.
193
- You may obtain a copy of the License at
194
-
195
- http://www.apache.org/licenses/LICENSE-2.0
196
-
197
- Unless required by applicable law or agreed to in writing, software
198
- distributed under the License is distributed on an "AS IS" BASIS,
199
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200
- See the License for the specific language governing permissions and
201
- limitations under the License.
ComfyUI/custom_nodes/ComfyUI-GGUF/README.md DELETED
@@ -1,49 +0,0 @@
1
- # ComfyUI-GGUF
2
- GGUF Quantization support for native ComfyUI models
3
-
4
- This is currently very much WIP. These custom nodes provide support for model files stored in the GGUF format popularized by [llama.cpp](https://github.com/ggerganov/llama.cpp).
5
-
6
- While quantization wasn't feasible for regular UNET models (conv2d), transformer/DiT models such as flux seem less affected by quantization. This allows running it in much lower bits per weight variable bitrate quants on low-end GPUs. For further VRAM savings, a node to load a quantized version of the T5 text encoder is also included.
7
-
8
- ![Comfy_Flux1_dev_Q4_0_GGUF_1024](https://github.com/user-attachments/assets/70d16d97-c522-4ef4-9435-633f128644c8)
9
-
10
- Note: The "Force/Set CLIP Device" is **NOT** part of this node pack. Do not install it if you only have one GPU. Do not set it to cuda:0 then complain about OOM errors if you do not undestand what it is for. There is not need to copy the workflow above, just use your own workflow and replace the stock "Load Diffusion Model" with the "Unet Loader (GGUF)" node.
11
-
12
- ## Installation
13
-
14
- > [!IMPORTANT]
15
- > Make sure your ComfyUI is on a recent-enough version to support custom ops when loading the UNET-only.
16
-
17
- To install the custom node normally, git clone this repository into your custom nodes folder (`ComfyUI/custom_nodes`) and install the only dependency for inference (`pip install --upgrade gguf`)
18
-
19
- ```
20
- git clone https://github.com/city96/ComfyUI-GGUF
21
- ```
22
-
23
- To install the custom node on a standalone ComfyUI release, open a CMD inside the "ComfyUI_windows_portable" folder (where your `run_nvidia_gpu.bat` file is) and use the following commands:
24
-
25
- ```
26
- git clone https://github.com/city96/ComfyUI-GGUF ComfyUI/custom_nodes/ComfyUI-GGUF
27
- .\python_embeded\python.exe -s -m pip install -r .\ComfyUI\custom_nodes\ComfyUI-GGUF\requirements.txt
28
- ```
29
-
30
- On MacOS sequoia, torch 2.4.1 seems to be required, as 2.6.X nightly versions cause a "M1 buffer is not large enough" error. See [this issue](https://github.com/city96/ComfyUI-GGUF/issues/107) for more information/workarounds.
31
-
32
- ## Usage
33
-
34
- Simply use the GGUF Unet loader found under the `bootleg` category. Place the .gguf model files in your `ComfyUI/models/unet` folder.
35
-
36
- LoRA loading is experimental but it should work with just the built-in LoRA loader node(s).
37
-
38
- Pre-quantized models:
39
-
40
- - [flux1-dev GGUF](https://huggingface.co/city96/FLUX.1-dev-gguf)
41
- - [flux1-schnell GGUF](https://huggingface.co/city96/FLUX.1-schnell-gguf)
42
- - [stable-diffusion-3.5-large GGUF](https://huggingface.co/city96/stable-diffusion-3.5-large-gguf)
43
- - [stable-diffusion-3.5-large-turbo GGUF](https://huggingface.co/city96/stable-diffusion-3.5-large-turbo-gguf)
44
-
45
- Initial support for quantizing T5 has also been added recently, these can be used using the various `*CLIPLoader (gguf)` nodes which can be used inplace of the regular ones. For the CLIP model, use whatever model you were using before for CLIP. The loader can handle both types of files - `gguf` and regular `safetensors`/`bin`.
46
-
47
- - [t5_v1.1-xxl GGUF](https://huggingface.co/city96/t5-v1_1-xxl-encoder-gguf)
48
-
49
- See the instructions in the [tools](https://github.com/city96/ComfyUI-GGUF/tree/main/tools) folder for how to create your own quants.
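The README's point about "lower bits per weight" can be made concrete from the block layouts in the `dequant.py` that this commit also removes: each quantized block stores a fixed number of bytes for a fixed number of weights. The sketch below works through that arithmetic. The byte counts are read off the `split_block_dims` calls and the standard GGML block sizes; `BLOCK_LAYOUTS` and `bits_per_weight` are hypothetical names used only for illustration.

```python
# (bytes per block, weights per block) for some of the formats handled in dequant.py.
BLOCK_LAYOUTS = {
    "Q8_0": (2 + 32, 32),             # fp16 scale + 32 int8 values
    "Q5_1": (2 + 2 + 4 + 16, 32),     # scale, min, 32 high bits, 16 bytes of nibbles
    "Q5_0": (2 + 4 + 16, 32),         # scale, 32 high bits, 16 bytes of nibbles
    "Q4_1": (2 + 2 + 16, 32),         # scale, min, 16 bytes of nibbles
    "Q4_0": (2 + 16, 32),             # scale, 16 bytes of nibbles
    "Q6_K": (128 + 64 + 16 + 2, 256), # low bits, high bits, sub-block scales, fp16 scale
    "Q4_K": (2 + 2 + 12 + 128, 256),  # scale, min, packed 6-bit scales, nibbles
}


def bits_per_weight(qtype: str) -> float:
    """Average storage cost of one weight, including the per-block metadata."""
    n_bytes, n_weights = BLOCK_LAYOUTS[qtype]
    return n_bytes * 8 / n_weights


if __name__ == "__main__":
    for name in BLOCK_LAYOUTS:
        print(f"{name}: {bits_per_weight(name):.4f} bits per weight")
```

Compared with 16 bits per weight for fp16, a Q4_0 file needs roughly 28% of the storage for the same tensor, which is where the VRAM savings come from.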
ComfyUI/custom_nodes/ComfyUI-GGUF/__init__.py DELETED
@@ -1,9 +0,0 @@
- # only import if running as a custom node
- try:
-     import comfy.utils
- except ImportError:
-     pass
- else:
-     from .nodes import NODE_CLASS_MAPPINGS
-     NODE_DISPLAY_NAME_MAPPINGS = {k:v.TITLE for k,v in NODE_CLASS_MAPPINGS.items()}
-     __all__ = ['NODE_CLASS_MAPPINGS', 'NODE_DISPLAY_NAME_MAPPINGS']
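The removed `__init__.py` registers its nodes only when `comfy.utils` can be imported, so the package's tools can still be used outside ComfyUI. A minimal sketch of the same guard is shown below, assuming a hypothetical `optional_import` helper and a stand-in `ExampleNode` class. Neither exists in ComfyUI-GGUF, and `json` is used as the probe module only so that the sketch runs anywhere.

```python
import importlib


def optional_import(module_name: str):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


class ExampleNode:
    """Hypothetical stand-in for a node class that exposes a TITLE attribute."""
    TITLE = "Example Loader"


def build_display_names(class_mappings: dict) -> dict:
    """Derive display names from each node class's TITLE, as the removed file did."""
    return {key: cls.TITLE for key, cls in class_mappings.items()}


if __name__ == "__main__":
    # "json" stands in for comfy.utils here.
    if optional_import("json") is not None:
        print(build_display_names({"ExampleNode": ExampleNode}))
```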
ComfyUI/custom_nodes/ComfyUI-GGUF/dequant.py DELETED
@@ -1,248 +0,0 @@
- # (c) City96 || Apache-2.0 (apache.org/licenses/LICENSE-2.0)
- import gguf
- import torch
- from tqdm import tqdm
-
-
- TORCH_COMPATIBLE_QTYPES = (None, gguf.GGMLQuantizationType.F32, gguf.GGMLQuantizationType.F16)
-
- def is_torch_compatible(tensor):
-     return tensor is None or getattr(tensor, "tensor_type", None) in TORCH_COMPATIBLE_QTYPES
-
- def is_quantized(tensor):
-     return not is_torch_compatible(tensor)
-
- def dequantize_tensor(tensor, dtype=None, dequant_dtype=None):
-     qtype = getattr(tensor, "tensor_type", None)
-     oshape = getattr(tensor, "tensor_shape", tensor.shape)
-
-     if qtype in TORCH_COMPATIBLE_QTYPES:
-         return tensor.to(dtype)
-     elif qtype in dequantize_functions:
-         dequant_dtype = dtype if dequant_dtype == "target" else dequant_dtype
-         return dequantize(tensor.data, qtype, oshape, dtype=dequant_dtype).to(dtype)
-     else:
-         # this is incredibly slow
-         tqdm.write(f"Falling back to numpy dequant for qtype: {qtype}")
-         new = gguf.quants.dequantize(tensor.cpu().numpy(), qtype)
-         return torch.from_numpy(new).to(tensor.device, dtype=dtype)
-
- def dequantize(data, qtype, oshape, dtype=None):
-     """
-     Dequantize tensor back to usable shape/dtype
-     """
-     block_size, type_size = gguf.GGML_QUANT_SIZES[qtype]
-     dequantize_blocks = dequantize_functions[qtype]
-
-     rows = data.reshape(
-         (-1, data.shape[-1])
-     ).view(torch.uint8)
-
-     n_blocks = rows.numel() // type_size
-     blocks = rows.reshape((n_blocks, type_size))
-     blocks = dequantize_blocks(blocks, block_size, type_size, dtype)
-     return blocks.reshape(oshape)
-
- def to_uint32(x):
-     # no uint32 :(
-     x = x.view(torch.uint8).to(torch.int32)
-     return (x[:, 0] | x[:, 1] << 8 | x[:, 2] << 16 | x[:, 3] << 24).unsqueeze(1)
-
- def split_block_dims(blocks, *args):
-     n_max = blocks.shape[1]
-     dims = list(args) + [n_max - sum(args)]
-     return torch.split(blocks, dims, dim=1)
-
- # Full weights #
- def dequantize_blocks_BF16(blocks, block_size, type_size, dtype=None):
-     return (blocks.view(torch.int16).to(torch.int32) << 16).view(torch.float32)
-
- # Legacy Quants #
- def dequantize_blocks_Q8_0(blocks, block_size, type_size, dtype=None):
-     d, x = split_block_dims(blocks, 2)
-     d = d.view(torch.float16).to(dtype)
-     x = x.view(torch.int8)
-     return (d * x)
-
- def dequantize_blocks_Q5_1(blocks, block_size, type_size, dtype=None):
-     n_blocks = blocks.shape[0]
-
-     d, m, qh, qs = split_block_dims(blocks, 2, 2, 4)
-     d = d.view(torch.float16).to(dtype)
-     m = m.view(torch.float16).to(dtype)
-     qh = to_uint32(qh)
-
-     qh = qh.reshape((n_blocks, 1)) >> torch.arange(32, device=d.device, dtype=torch.int32).reshape(1, 32)
-     ql = qs.reshape((n_blocks, -1, 1, block_size // 2)) >> torch.tensor([0, 4], device=d.device, dtype=torch.uint8).reshape(1, 1, 2, 1)
-     qh = (qh & 1).to(torch.uint8)
-     ql = (ql & 0x0F).reshape((n_blocks, -1))
-
-     qs = (ql | (qh << 4))
-     return (d * qs) + m
-
- def dequantize_blocks_Q5_0(blocks, block_size, type_size, dtype=None):
-     n_blocks = blocks.shape[0]
-
-     d, qh, qs = split_block_dims(blocks, 2, 4)
-     d = d.view(torch.float16).to(dtype)
-     qh = to_uint32(qh)
-
-     qh = qh.reshape(n_blocks, 1) >> torch.arange(32, device=d.device, dtype=torch.int32).reshape(1, 32)
-     ql = qs.reshape(n_blocks, -1, 1, block_size // 2) >> torch.tensor([0, 4], device=d.device, dtype=torch.uint8).reshape(1, 1, 2, 1)
-
-     qh = (qh & 1).to(torch.uint8)
-     ql = (ql & 0x0F).reshape(n_blocks, -1)
-
-     qs = (ql | (qh << 4)).to(torch.int8) - 16
-     return (d * qs)
-
- def dequantize_blocks_Q4_1(blocks, block_size, type_size, dtype=None):
-     n_blocks = blocks.shape[0]
-
-     d, m, qs = split_block_dims(blocks, 2, 2)
-     d = d.view(torch.float16).to(dtype)
-     m = m.view(torch.float16).to(dtype)
-
-     qs = qs.reshape((n_blocks, -1, 1, block_size // 2)) >> torch.tensor([0, 4], device=d.device, dtype=torch.uint8).reshape(1, 1, 2, 1)
-     qs = (qs & 0x0F).reshape(n_blocks, -1)
-
-     return (d * qs) + m
-
- def dequantize_blocks_Q4_0(blocks, block_size, type_size, dtype=None):
-     n_blocks = blocks.shape[0]
-
-     d, qs = split_block_dims(blocks, 2)
-     d = d.view(torch.float16).to(dtype)
-
-     qs = qs.reshape((n_blocks, -1, 1, block_size // 2)) >> torch.tensor([0, 4], device=d.device, dtype=torch.uint8).reshape((1, 1, 2, 1))
-     qs = (qs & 0x0F).reshape((n_blocks, -1)).to(torch.int8) - 8
-     return (d * qs)
-
- # K Quants #
- QK_K = 256
- K_SCALE_SIZE = 12
-
- def get_scale_min(scales):
-     n_blocks = scales.shape[0]
-     scales = scales.view(torch.uint8)
-     scales = scales.reshape((n_blocks, 3, 4))
-
-     d, m, m_d = torch.split(scales, scales.shape[-2] // 3, dim=-2)
-
-     sc = torch.cat([d & 0x3F, (m_d & 0x0F) | ((d >> 2) & 0x30)], dim=-1)
-     min = torch.cat([m & 0x3F, (m_d >> 4) | ((m >> 2) & 0x30)], dim=-1)
-
-     return (sc.reshape((n_blocks, 8)), min.reshape((n_blocks, 8)))
-
- def dequantize_blocks_Q6_K(blocks, block_size, type_size, dtype=None):
-     n_blocks = blocks.shape[0]
-
-     ql, qh, scales, d, = split_block_dims(blocks, QK_K // 2, QK_K // 4, QK_K // 16)
-
-     scales = scales.view(torch.int8).to(dtype)
-     d = d.view(torch.float16).to(dtype)
-     d = (d * scales).reshape((n_blocks, QK_K // 16, 1))
-
-     ql = ql.reshape((n_blocks, -1, 1, 64)) >> torch.tensor([0, 4], device=d.device, dtype=torch.uint8).reshape((1, 1, 2, 1))
-     ql = (ql & 0x0F).reshape((n_blocks, -1, 32))
-     qh = qh.reshape((n_blocks, -1, 1, 32)) >> torch.tensor([0, 2, 4, 6], device=d.device, dtype=torch.uint8).reshape((1, 1, 4, 1))
-     qh = (qh & 0x03).reshape((n_blocks, -1, 32))
-     q = (ql | (qh << 4)).to(torch.int8) - 32
-     q = q.reshape((n_blocks, QK_K // 16, -1))
-
-     return (d * q).reshape((n_blocks, QK_K))
-
- def dequantize_blocks_Q5_K(blocks, block_size, type_size, dtype=None):
-     n_blocks = blocks.shape[0]
-
-     d, dmin, scales, qh, qs = split_block_dims(blocks, 2, 2, K_SCALE_SIZE, QK_K // 8)
-
-     d = d.view(torch.float16).to(dtype)
-     dmin = dmin.view(torch.float16).to(dtype)
-
-     sc, m = get_scale_min(scales)
-
-     d = (d * sc).reshape((n_blocks, -1, 1))
-     dm = (dmin * m).reshape((n_blocks, -1, 1))
-
-     ql = qs.reshape((n_blocks, -1, 1, 32)) >> torch.tensor([0, 4], device=d.device, dtype=torch.uint8).reshape((1, 1, 2, 1))
-     qh = qh.reshape((n_blocks, -1, 1, 32)) >> torch.tensor([i for i in range(8)], device=d.device, dtype=torch.uint8).reshape((1, 1, 8, 1))
-     ql = (ql & 0x0F).reshape((n_blocks, -1, 32))
-     qh = (qh & 0x01).reshape((n_blocks, -1, 32))
-     q = (ql | (qh << 4))
-
-     return (d * q - dm).reshape((n_blocks, QK_K))
-
- def dequantize_blocks_Q4_K(blocks, block_size, type_size, dtype=None):
-     n_blocks = blocks.shape[0]
-
-     d, dmin, scales, qs = split_block_dims(blocks, 2, 2, K_SCALE_SIZE)
-     d = d.view(torch.float16).to(dtype)
-     dmin = dmin.view(torch.float16).to(dtype)
-
-     sc, m = get_scale_min(scales)
-
-     d = (d * sc).reshape((n_blocks, -1, 1))
-     dm = (dmin * m).reshape((n_blocks, -1, 1))
-
-     qs = qs.reshape((n_blocks, -1, 1, 32)) >> torch.tensor([0, 4], device=d.device, dtype=torch.uint8).reshape((1, 1, 2, 1))
-     qs = (qs & 0x0F).reshape((n_blocks, -1, 32))
-
-     return (d * qs - dm).reshape((n_blocks, QK_K))
-
- def dequantize_blocks_Q3_K(blocks, block_size, type_size, dtype=None):
-     n_blocks = blocks.shape[0]
-
-     hmask, qs, scales, d = split_block_dims(blocks, QK_K // 8, QK_K // 4, 12)
-     d = d.view(torch.float16).to(dtype)
-
-     lscales, hscales = scales[:, :8], scales[:, 8:]
-     lscales = lscales.reshape((n_blocks, 1, 8)) >> torch.tensor([0, 4], device=d.device, dtype=torch.uint8).reshape((1, 2, 1))
-     lscales = lscales.reshape((n_blocks, 16))
-     hscales = hscales.reshape((n_blocks, 1, 4)) >> torch.tensor([0, 2, 4, 6], device=d.device, dtype=torch.uint8).reshape((1, 4, 1))
-     hscales = hscales.reshape((n_blocks, 16))
-     scales = (lscales & 0x0F) | ((hscales & 0x03) << 4)
-     scales = (scales.to(torch.int8) - 32)
-
-     dl = (d * scales).reshape((n_blocks, 16, 1))
-
-     ql = qs.reshape((n_blocks, -1, 1, 32)) >> torch.tensor([0, 2, 4, 6], device=d.device, dtype=torch.uint8).reshape((1, 1, 4, 1))
-     qh = hmask.reshape(n_blocks, -1, 1, 32) >> torch.tensor([i for i in range(8)], device=d.device, dtype=torch.uint8).reshape((1, 1, 8, 1))
-     ql = ql.reshape((n_blocks, 16, QK_K // 16)) & 3
-     qh = (qh.reshape((n_blocks, 16, QK_K // 16)) & 1) ^ 1
-     q = (ql.to(torch.int8) - (qh << 2).to(torch.int8))
-
-     return (dl * q).reshape((n_blocks, QK_K))
-
- def dequantize_blocks_Q2_K(blocks, block_size, type_size, dtype=None):
-     n_blocks = blocks.shape[0]
-
-     scales, qs, d, dmin = split_block_dims(blocks, QK_K // 16, QK_K // 4, 2)
-     d = d.view(torch.float16).to(dtype)
-     dmin = dmin.view(torch.float16).to(dtype)
-
-     # (n_blocks, 16, 1)
-     dl = (d * (scales & 0xF)).reshape((n_blocks, QK_K // 16, 1))
-     ml = (dmin * (scales >> 4)).reshape((n_blocks, QK_K // 16, 1))
-
-     shift = torch.tensor([0, 2, 4, 6], device=d.device, dtype=torch.uint8).reshape((1, 1, 4, 1))
-
-     qs = (qs.reshape((n_blocks, -1, 1, 32)) >> shift) & 3
-     qs = qs.reshape((n_blocks, QK_K // 16, 16))
-     qs = dl * qs - ml
-
-     return qs.reshape((n_blocks, -1))
-
- dequantize_functions = {
-     gguf.GGMLQuantizationType.BF16: dequantize_blocks_BF16,
-     gguf.GGMLQuantizationType.Q8_0: dequantize_blocks_Q8_0,
-     gguf.GGMLQuantizationType.Q5_1: dequantize_blocks_Q5_1,
-     gguf.GGMLQuantizationType.Q5_0: dequantize_blocks_Q5_0,
-     gguf.GGMLQuantizationType.Q4_1: dequantize_blocks_Q4_1,
-     gguf.GGMLQuantizationType.Q4_0: dequantize_blocks_Q4_0,
-     gguf.GGMLQuantizationType.Q6_K: dequantize_blocks_Q6_K,
-     gguf.GGMLQuantizationType.Q5_K: dequantize_blocks_Q5_K,
-     gguf.GGMLQuantizationType.Q4_K: dequantize_blocks_Q4_K,
-     gguf.GGMLQuantizationType.Q3_K: dequantize_blocks_Q3_K,
-     gguf.GGMLQuantizationType.Q2_K: dequantize_blocks_Q2_K,
- }
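The vectorized torch code in the removed file can be hard to follow, so here is a scalar sketch of two of its simpler decoders, written with the standard library only. It assumes the same layouts as `dequantize_blocks_BF16` and `dequantize_blocks_Q4_0`: a BF16 value is the upper 16 bits of a float32, and a Q4_0 block is an fp16 scale followed by 16 bytes whose low nibbles hold weights 0..15 and whose high nibbles hold weights 16..31, each offset by 8. The function names are hypothetical and this is not the module's implementation.

```python
import struct

QK4_0 = 32  # weights per Q4_0 block


def bf16_bits_to_float(bits: int) -> float:
    """Reinterpret a 16-bit BF16 pattern as float32 by shifting it into the top half."""
    return struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))[0]


def dequantize_q4_0_block(block: bytes) -> list[float]:
    """Decode one 18-byte Q4_0 block into 32 floats."""
    if len(block) != 2 + QK4_0 // 2:
        raise ValueError(f"expected {2 + QK4_0 // 2} bytes, got {len(block)}")
    (scale,) = struct.unpack("<e", block[:2])  # little-endian fp16 scale
    packed = block[2:]
    low = [(byte & 0x0F) - 8 for byte in packed]  # weights 0..15
    high = [(byte >> 4) - 8 for byte in packed]   # weights 16..31
    return [scale * q for q in low + high]


if __name__ == "__main__":
    print(bf16_bits_to_float(0x3F80))
    example = struct.pack("<e", 0.5) + bytes([0x8F] * 16)
    print(dequantize_q4_0_block(example))
```

The torch version performs the same nibble split for every block at once by broadcasting a `[0, 4]` shift tensor, which is why it reshapes to `(n_blocks, -1, 1, block_size // 2)` before shifting.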
ComfyUI/custom_nodes/ComfyUI-GGUF/loader.py DELETED
@@ -1,353 +0,0 @@
- # (c) City96 || Apache-2.0 (apache.org/licenses/LICENSE-2.0)
- import warnings
- import logging
- import torch
- import gguf
- import re
- import os
-
- from .ops import GGMLTensor
- from .dequant import is_quantized, dequantize_tensor
-
- IMG_ARCH_LIST = {"flux", "sd1", "sdxl", "sd3", "aura", "hidream", "cosmos", "ltxv", "hyvid", "wan", "lumina2", "qwen_image"}
- TXT_ARCH_LIST = {"t5", "t5encoder", "llama", "qwen2vl"}
- VIS_TYPE_LIST = {"clip-vision"}
-
- def get_orig_shape(reader, tensor_name):
-     field_key = f"comfy.gguf.orig_shape.{tensor_name}"
-     field = reader.get_field(field_key)
-     if field is None:
-         return None
-     # Has original shape metadata, so we try to decode it.
-     if len(field.types) != 2 or field.types[0] != gguf.GGUFValueType.ARRAY or field.types[1] != gguf.GGUFValueType.INT32:
-         raise TypeError(f"Bad original shape metadata for {field_key}: Expected ARRAY of INT32, got {field.types}")
-     return torch.Size(tuple(int(field.parts[part_idx][0]) for part_idx in field.data))
-
- def get_field(reader, field_name, field_type):
-     field = reader.get_field(field_name)
-     if field is None:
-         return None
-     elif field_type == str:
-         # extra check here as this is used for checking arch string
-         if len(field.types) != 1 or field.types[0] != gguf.GGUFValueType.STRING:
-             raise TypeError(f"Bad type for GGUF {field_name} key: expected string, got {field.types!r}")
-         return str(field.parts[field.data[-1]], encoding="utf-8")
-     elif field_type in [int, float, bool]:
-         return field_type(field.parts[field.data[-1]])
-     else:
-         raise TypeError(f"Unknown field type {field_type}")
-
- def get_list_field(reader, field_name, field_type):
-     field = reader.get_field(field_name)
-     if field is None:
-         return None
-     elif field_type == str:
-         return tuple(str(field.parts[part_idx], encoding="utf-8") for part_idx in field.data)
-     elif field_type in [int, float, bool]:
-         return tuple(field_type(field.parts[part_idx][0]) for part_idx in field.data)
-     else:
-         raise TypeError(f"Unknown field type {field_type}")
-
- def gguf_sd_loader(path, handle_prefix="model.diffusion_model.", return_arch=False, is_text_model=False):
-     """
-     Read state dict as fake tensors
-     """
-     reader = gguf.GGUFReader(path)
-
-     # filter and strip prefix
-     has_prefix = False
-     if handle_prefix is not None:
-         prefix_len = len(handle_prefix)
-         tensor_names = set(tensor.name for tensor in reader.tensors)
-         has_prefix = any(s.startswith(handle_prefix) for s in tensor_names)
-
-     tensors = []
-     for tensor in reader.tensors:
-         sd_key = tensor_name = tensor.name
-         if has_prefix:
-             if not tensor_name.startswith(handle_prefix):
-                 continue
-             sd_key = tensor_name[prefix_len:]
-         tensors.append((sd_key, tensor))
-
-     # detect and verify architecture
-     compat = None
-     arch_str = get_field(reader, "general.architecture", str)
76
- type_str = get_field(reader, "general.type", str)
77
- if arch_str in [None, "pig"]:
78
- if is_text_model:
79
- raise ValueError(f"This text model is incompatible with llama.cpp!\nConsider using the safetensors version\n({path})")
80
- compat = "sd.cpp" if arch_str is None else arch_str
81
- # import here to avoid changes to convert.py breaking regular models
82
- from .tools.convert import detect_arch
83
- try:
84
- arch_str = detect_arch(set(val[0] for val in tensors)).arch
85
- except Exception as e:
86
- raise ValueError(f"This model is not currently supported - ({e})")
87
- elif arch_str not in TXT_ARCH_LIST and is_text_model:
88
- if type_str not in VIS_TYPE_LIST:
89
- raise ValueError(f"Unexpected text model architecture type in GGUF file: {arch_str!r}")
90
- elif arch_str not in IMG_ARCH_LIST and not is_text_model:
91
- raise ValueError(f"Unexpected architecture type in GGUF file: {arch_str!r}")
92
-
93
- if compat:
94
- logging.warning(f"Warning: This gguf model file is loaded in compatibility mode '{compat}' [arch:{arch_str}]")
95
-
96
- # main loading loop
97
- state_dict = {}
98
- qtype_dict = {}
99
- for sd_key, tensor in tensors:
100
- tensor_name = tensor.name
101
- # torch_tensor = torch.from_numpy(tensor.data) # mmap
102
-
103
- # NOTE: line above replaced with this block to avoid persistent numpy warning about mmap
104
- with warnings.catch_warnings():
105
- warnings.filterwarnings("ignore", message="The given NumPy array is not writable")
106
- torch_tensor = torch.from_numpy(tensor.data) # mmap
107
-
108
- shape = get_orig_shape(reader, tensor_name)
109
- if shape is None:
110
- shape = torch.Size(tuple(int(v) for v in reversed(tensor.shape)))
111
- # Workaround for stable-diffusion.cpp SDXL detection.
112
- if compat == "sd.cpp" and arch_str == "sdxl":
113
- if any([tensor_name.endswith(x) for x in (".proj_in.weight", ".proj_out.weight")]):
114
- while len(shape) > 2 and shape[-1] == 1:
115
- shape = shape[:-1]
116
-
117
- # add to state dict
118
- if tensor.tensor_type in {gguf.GGMLQuantizationType.F32, gguf.GGMLQuantizationType.F16}:
119
- torch_tensor = torch_tensor.view(*shape)
120
- state_dict[sd_key] = GGMLTensor(torch_tensor, tensor_type=tensor.tensor_type, tensor_shape=shape)
121
-
122
- # keep track of loaded tensor types
123
- tensor_type_str = getattr(tensor.tensor_type, "name", repr(tensor.tensor_type))
124
- qtype_dict[tensor_type_str] = qtype_dict.get(tensor_type_str, 0) + 1
125
-
126
- # print loaded tensor type counts
127
- logging.info("gguf qtypes: " + ", ".join(f"{k} ({v})" for k, v in qtype_dict.items()))
128
-
129
- # mark largest tensor for vram estimation
130
- qsd = {k:v for k,v in state_dict.items() if is_quantized(v)}
131
- if len(qsd) > 0:
132
- max_key = max(qsd.keys(), key=lambda k: qsd[k].numel())
133
- state_dict[max_key].is_largest_weight = True
134
-
135
- if return_arch:
136
- return (state_dict, arch_str)
137
- return state_dict
138
-
139
- # for remapping llama.cpp -> original key names
140
- T5_SD_MAP = {
141
- "enc.": "encoder.",
142
- ".blk.": ".block.",
143
- "token_embd": "shared",
144
- "output_norm": "final_layer_norm",
145
- "attn_q": "layer.0.SelfAttention.q",
146
- "attn_k": "layer.0.SelfAttention.k",
147
- "attn_v": "layer.0.SelfAttention.v",
148
- "attn_o": "layer.0.SelfAttention.o",
149
- "attn_norm": "layer.0.layer_norm",
150
- "attn_rel_b": "layer.0.SelfAttention.relative_attention_bias",
151
- "ffn_up": "layer.1.DenseReluDense.wi_1",
152
- "ffn_down": "layer.1.DenseReluDense.wo",
153
- "ffn_gate": "layer.1.DenseReluDense.wi_0",
154
- "ffn_norm": "layer.1.layer_norm",
155
- }
156
-
157
- LLAMA_SD_MAP = {
158
- "blk.": "model.layers.",
159
- "attn_norm": "input_layernorm",
160
- "attn_q": "self_attn.q_proj",
161
- "attn_k": "self_attn.k_proj",
162
- "attn_v": "self_attn.v_proj",
163
- "attn_output": "self_attn.o_proj",
164
- "ffn_up": "mlp.up_proj",
165
- "ffn_down": "mlp.down_proj",
166
- "ffn_gate": "mlp.gate_proj",
167
- "ffn_norm": "post_attention_layernorm",
168
- "token_embd": "model.embed_tokens",
169
- "output_norm": "model.norm",
170
- "output.weight": "lm_head.weight",
171
- }
172
-
173
- CLIP_VISION_SD_MAP = {
174
- "mm.": "visual.merger.mlp.",
175
- "v.post_ln.": "visual.merger.ln_q.",
176
- "v.patch_embd": "visual.patch_embed.proj",
177
- "v.blk.": "visual.blocks.",
178
- "ffn_up": "mlp.up_proj",
179
- "ffn_down": "mlp.down_proj",
180
- "ffn_gate": "mlp.gate_proj",
181
- "attn_out.": "attn.proj.",
182
- "ln1.": "norm1.",
183
- "ln2.": "norm2.",
184
- }
185
-
186
- def sd_map_replace(raw_sd, key_map):
187
- sd = {}
188
- for k,v in raw_sd.items():
189
- for s,d in key_map.items():
190
- k = k.replace(s,d)
191
- sd[k] = v
192
- return sd
193
-
194
- def llama_permute(raw_sd, n_head, n_head_kv):
195
- # Reverse version of LlamaModel.permute in llama.cpp convert script
196
- sd = {}
197
- permute = lambda x,h: x.reshape(h, x.shape[0] // h // 2, 2, *x.shape[1:]).swapaxes(1, 2).reshape(x.shape)
198
- for k,v in raw_sd.items():
199
- if k.endswith(("q_proj.weight", "q_proj.bias")):
200
- v.data = permute(v.data, n_head)
201
- if k.endswith(("k_proj.weight", "k_proj.bias")):
202
- v.data = permute(v.data, n_head_kv)
203
- sd[k] = v
204
- return sd
205
-
206
- def strip_quant_suffix(name):
207
- pattern = r"[-_]?(?:ud-)?i?q[0-9]_[a-z0-9_\-]{1,8}$"
208
- match = re.search(pattern, name, re.IGNORECASE)
209
- if match:
210
- name = name[:match.start()]
211
- return name
212
-
213
- def gguf_mmproj_loader(path):
214
- # Reverse version of Qwen2VLVisionModel.modify_tensors
215
- logging.info("Attempting to find mmproj file for text encoder...")
216
-
217
- # get name to match w/o quant suffix
218
- tenc_fname = os.path.basename(path)
219
- tenc = os.path.splitext(tenc_fname)[0].lower()
220
- tenc = strip_quant_suffix(tenc)
221
-
222
- # try and find matching mmproj
223
- target = []
224
- root = os.path.dirname(path)
225
- for fname in os.listdir(root):
226
- name, ext = os.path.splitext(fname)
227
- if ext.lower() != ".gguf":
228
- continue
229
- if "mmproj" not in name.lower():
230
- continue
231
- if tenc in name.lower():
232
- target.append(fname)
233
-
234
- if len(target) == 0:
235
- logging.error(f"Error: Can't find mmproj file for '{tenc_fname}' (matching:'{tenc}')! Qwen-Image-Edit will be broken!")
236
- return {}
237
- if len(target) > 1:
238
- logging.error(f"Ambiguous mmproj for text encoder '{tenc_fname}', will use first match.")
239
-
240
- logging.info(f"Using mmproj '{target[0]}' for text encoder '{tenc_fname}'.")
241
- target = os.path.join(root, target[0])
242
- vsd = gguf_sd_loader(target, is_text_model=True)
243
-
244
- # concat 4D to 5D
245
- if "v.patch_embd.weight.1" in vsd:
246
- w1 = dequantize_tensor(vsd.pop("v.patch_embd.weight"), dtype=torch.float32)
247
- w2 = dequantize_tensor(vsd.pop("v.patch_embd.weight.1"), dtype=torch.float32)
248
- vsd["v.patch_embd.weight"] = torch.stack([w1, w2], dim=2)
249
-
250
- # run main replacement
251
- vsd = sd_map_replace(vsd, CLIP_VISION_SD_MAP)
252
-
253
- # handle split Q/K/V
254
- if "visual.blocks.0.attn_q.weight" in vsd:
255
- attns = {}
256
- # filter out attentions + group
257
- for k,v in vsd.items():
258
- if any(x in k for x in ["attn_q", "attn_k", "attn_v"]):
259
- k_attn, k_name = k.rsplit(".attn_", 1)
260
- k_attn += ".attn.qkv." + k_name.split(".")[-1]
261
- if k_attn not in attns:
262
- attns[k_attn] = {}
263
- attns[k_attn][k_name] = dequantize_tensor(
264
- v, dtype=(torch.bfloat16 if is_quantized(v) else torch.float16)
265
- )
266
-
267
- # recombine
268
- for k,v in attns.items():
269
- suffix = k.split(".")[-1]
270
- vsd[k] = torch.cat([
271
- v[f"q.{suffix}"],
272
- v[f"k.{suffix}"],
273
- v[f"v.{suffix}"],
274
- ], dim=0)
275
- del attns
276
-
277
- return vsd
278
-
279
- def gguf_tokenizer_loader(path, temb_shape):
280
- # convert gguf tokenizer to spiece
281
- logging.info("Attempting to recreate sentencepiece tokenizer from GGUF file metadata...")
282
- try:
283
- from sentencepiece import sentencepiece_model_pb2 as model
284
- except ImportError:
285
- raise ImportError("Please make sure sentencepiece and protobuf are installed.\npip install sentencepiece protobuf")
286
- spm = model.ModelProto()
287
-
288
- reader = gguf.GGUFReader(path)
289
-
290
- if get_field(reader, "tokenizer.ggml.model", str) == "t5":
291
- if temb_shape == (256384, 4096): # probably UMT5
292
- spm.trainer_spec.model_type = 1 # Unigram (do we have a T5 w/ BPE?)
293
- else:
294
- raise NotImplementedError("Unknown model, can't set tokenizer!")
295
- else:
296
- raise NotImplementedError("Unknown model, can't set tokenizer!")
297
-
298
- spm.normalizer_spec.add_dummy_prefix = get_field(reader, "tokenizer.ggml.add_space_prefix", bool)
299
- spm.normalizer_spec.remove_extra_whitespaces = get_field(reader, "tokenizer.ggml.remove_extra_whitespaces", bool)
300
-
301
- tokens = get_list_field(reader, "tokenizer.ggml.tokens", str)
302
- scores = get_list_field(reader, "tokenizer.ggml.scores", float)
303
- toktypes = get_list_field(reader, "tokenizer.ggml.token_type", int)
304
-
305
- for idx, (token, score, toktype) in enumerate(zip(tokens, scores, toktypes)):
306
- # # These aren't present in the original?
307
- # if toktype == 5 and idx >= temb_shape[0]%1000):
308
- # continue
309
-
310
- piece = spm.SentencePiece()
311
- piece.piece = token
312
- piece.score = score
313
- piece.type = toktype
314
- spm.pieces.append(piece)
315
-
316
- # unsure if any of these are correct
317
- spm.trainer_spec.byte_fallback = True
318
- spm.trainer_spec.vocab_size = len(tokens) # split off unused?
319
- spm.trainer_spec.max_sentence_length = 4096
320
- spm.trainer_spec.eos_id = get_field(reader, "tokenizer.ggml.eos_token_id", int)
321
- spm.trainer_spec.pad_id = get_field(reader, "tokenizer.ggml.padding_token_id", int)
322
-
323
- logging.info(f"Created tokenizer with vocab size of {len(spm.pieces)}")
324
- del reader
325
- return torch.ByteTensor(list(spm.SerializeToString()))
326
-
327
- def gguf_clip_loader(path):
328
- sd, arch = gguf_sd_loader(path, return_arch=True, is_text_model=True)
329
- if arch in {"t5", "t5encoder"}:
330
- temb_key = "token_embd.weight"
331
- if temb_key in sd and sd[temb_key].shape == (256384, 4096):
332
- # non-standard Comfy-Org tokenizer
333
- sd["spiece_model"] = gguf_tokenizer_loader(path, sd[temb_key].shape)
334
- # TODO: dequantizing token embed here is janky but otherwise we OOM due to tensor being massive.
335
- logging.warning(f"Dequantizing {temb_key} to prevent runtime OOM.")
336
- sd[temb_key] = dequantize_tensor(sd[temb_key], dtype=torch.float16)
337
- sd = sd_map_replace(sd, T5_SD_MAP)
338
- elif arch in {"llama", "qwen2vl"}:
339
- # TODO: pass model_options["vocab_size"] to loader somehow
340
- temb_key = "token_embd.weight"
341
- if temb_key in sd and sd[temb_key].shape[0] >= (64 * 1024):
342
- # See note above for T5.
343
- logging.warning(f"Dequantizing {temb_key} to prevent runtime OOM.")
344
- sd[temb_key] = dequantize_tensor(sd[temb_key], dtype=torch.float16)
345
- sd = sd_map_replace(sd, LLAMA_SD_MAP)
346
- if arch == "llama":
347
- sd = llama_permute(sd, 32, 8) # L3
348
- if arch == "qwen2vl":
349
- vsd = gguf_mmproj_loader(path)
350
- sd.update(vsd)
351
- else:
352
- pass
353
- return sd
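Two helpers in the deleted `loader.py` are plain string manipulation and can be tried without torch or gguf: `strip_quant_suffix`, which trims a trailing quantization tag from a file stem before the mmproj search, and `sd_map_replace`, which renames state-dict keys by successive substring replacement. The sketch below copies their logic from the diff. The file stems and the three-entry `T5_SUBSET` map are illustrative assumptions, not files or tables that the loader ships with.

```python
import re


def strip_quant_suffix(name: str) -> str:
    """Drop a trailing quant tag such as '-q4_k_m' or '-ud-q4_k_xl', if present."""
    pattern = r"[-_]?(?:ud-)?i?q[0-9]_[a-z0-9_\-]{1,8}$"
    match = re.search(pattern, name, re.IGNORECASE)
    return name[:match.start()] if match else name


def sd_map_replace(raw_sd: dict, key_map: dict) -> dict:
    """Apply every (old, new) substring replacement, in order, to each key."""
    sd = {}
    for key, value in raw_sd.items():
        for old, new in key_map.items():
            key = key.replace(old, new)
        sd[key] = value
    return sd


# Illustrative subset of the T5 key map shown in the diff.
T5_SUBSET = {
    "enc.": "encoder.",
    ".blk.": ".block.",
    "attn_q": "layer.0.SelfAttention.q",
}

if __name__ == "__main__":
    # Hypothetical file stems, lower-cased as the loader does before matching.
    for stem in ("qwen2.5-vl-7b-instruct-q4_k_m", "model-ud-q4_k_xl", "model-f16"):
        print(stem, "->", strip_quant_suffix(stem))
    print(sd_map_replace({"enc.blk.0.attn_q.weight": None}, T5_SUBSET))
```

Because the replacements run in dictionary order on the already-rewritten key, the order of entries in a key map matters whenever one pattern could match text produced by an earlier replacement.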
 
ComfyUI/custom_nodes/ComfyUI-GGUF/nodes.py DELETED
@@ -1,305 +0,0 @@
1
- # (c) City96 || Apache-2.0 (apache.org/licenses/LICENSE-2.0)
2
- import torch
3
- import logging
4
- import collections
5
-
6
- import nodes
7
- import comfy.sd
8
- import comfy.lora
9
- import comfy.float
10
- import comfy.utils
11
- import comfy.model_patcher
12
- import comfy.model_management
13
- import folder_paths
14
-
15
- from .ops import GGMLOps, move_patch_to_device
16
- from .loader import gguf_sd_loader, gguf_clip_loader
17
- from .dequant import is_quantized, is_torch_compatible
18
-
19
- def update_folder_names_and_paths(key, targets=[]):
20
- # check for existing key
21
- base = folder_paths.folder_names_and_paths.get(key, ([], {}))
22
- base = base[0] if isinstance(base[0], (list, set, tuple)) else []
23
- # find base key & add w/ fallback, sanity check + warning
24
- target = next((x for x in targets if x in folder_paths.folder_names_and_paths), targets[0])
25
- orig, _ = folder_paths.folder_names_and_paths.get(target, ([], {}))
26
- folder_paths.folder_names_and_paths[key] = (orig or base, {".gguf"})
27
- if base and base != orig:
28
- logging.warning(f"Unknown file list already present on key {key}: {base}")
29
-
30
- # Add custom keys for files ending in .gguf
31
- update_folder_names_and_paths("unet_gguf", ["diffusion_models", "unet"])
32
- update_folder_names_and_paths("clip_gguf", ["text_encoders", "clip"])
33
-
34
- class GGUFModelPatcher(comfy.model_patcher.ModelPatcher):
35
- patch_on_device = False
36
-
37
- def patch_weight_to_device(self, key, device_to=None, inplace_update=False):
38
- if key not in self.patches:
39
- return
40
- weight = comfy.utils.get_attr(self.model, key)
41
-
42
- patches = self.patches[key]
43
- if is_quantized(weight):
44
- out_weight = weight.to(device_to)
45
- patches = move_patch_to_device(patches, self.load_device if self.patch_on_device else self.offload_device)
46
- # TODO: do we ever have legitimate duplicate patches? (i.e. patch on top of patched weight)
47
- out_weight.patches = [(patches, key)]
48
- else:
49
- inplace_update = self.weight_inplace_update or inplace_update
50
- if key not in self.backup:
51
- self.backup[key] = collections.namedtuple('Dimension', ['weight', 'inplace_update'])(
52
- weight.to(device=self.offload_device, copy=inplace_update), inplace_update
53
- )
54
-
55
- if device_to is not None:
56
- temp_weight = comfy.model_management.cast_to_device(weight, device_to, torch.float32, copy=True)
57
- else:
58
- temp_weight = weight.to(torch.float32, copy=True)
59
-
60
- out_weight = comfy.lora.calculate_weight(patches, temp_weight, key)
61
- out_weight = comfy.float.stochastic_rounding(out_weight, weight.dtype)
62
-
63
- if inplace_update:
64
- comfy.utils.copy_to_param(self.model, key, out_weight)
65
- else:
66
- comfy.utils.set_attr_param(self.model, key, out_weight)
67
-
68
- def unpatch_model(self, device_to=None, unpatch_weights=True):
69
- if unpatch_weights:
70
- for p in self.model.parameters():
71
- if is_torch_compatible(p):
72
- continue
73
- patches = getattr(p, "patches", [])
74
- if len(patches) > 0:
75
- p.patches = []
76
- # TODO: Find another way to not unload after patches
77
- return super().unpatch_model(device_to=device_to, unpatch_weights=unpatch_weights)
78
-
79
- mmap_released = False
80
- def load(self, *args, force_patch_weights=False, **kwargs):
81
- # always call `patch_weight_to_device` even for lowvram
82
- super().load(*args, force_patch_weights=True, **kwargs)
83
-
84
- # make sure nothing stays linked to mmap after first load
85
- if not self.mmap_released:
86
- linked = []
87
- if kwargs.get("lowvram_model_memory", 0) > 0:
88
- for n, m in self.model.named_modules():
89
- if hasattr(m, "weight"):
90
- device = getattr(m.weight, "device", None)
91
- if device == self.offload_device:
92
- linked.append((n, m))
93
- continue
94
- if hasattr(m, "bias"):
95
- device = getattr(m.bias, "device", None)
96
- if device == self.offload_device:
97
- linked.append((n, m))
98
- continue
99
- if linked and self.load_device != self.offload_device:
100
- logging.info(f"Attempting to release mmap ({len(linked)})")
101
- for n, m in linked:
102
- # TODO: possible to OOM, find better way to detach
103
- m.to(self.load_device).to(self.offload_device)
104
- self.mmap_released = True
105
-
106
- def clone(self, *args, **kwargs):
107
- src_cls = self.__class__
108
- self.__class__ = GGUFModelPatcher
109
- n = super().clone(*args, **kwargs)
110
- n.__class__ = GGUFModelPatcher
111
- self.__class__ = src_cls
112
- # GGUF specific clone values below
113
- n.patch_on_device = getattr(self, "patch_on_device", False)
114
- if src_cls != GGUFModelPatcher:
115
- n.size = 0 # force recalc
116
- return n
117
-
118
- class UnetLoaderGGUF:
119
- @classmethod
120
- def INPUT_TYPES(s):
121
- unet_names = [x for x in folder_paths.get_filename_list("unet_gguf")]
122
- return {
123
- "required": {
124
- "unet_name": (unet_names,),
125
- }
126
- }
127
-
128
- RETURN_TYPES = ("MODEL",)
129
- FUNCTION = "load_unet"
130
- CATEGORY = "bootleg"
131
- TITLE = "Unet Loader (GGUF)"
132
-
133
- def load_unet(self, unet_name, dequant_dtype=None, patch_dtype=None, patch_on_device=None):
134
- ops = GGMLOps()
135
-
136
- if dequant_dtype in ("default", None):
137
- ops.Linear.dequant_dtype = None
138
- elif dequant_dtype in ["target"]:
139
- ops.Linear.dequant_dtype = dequant_dtype
140
- else:
141
- ops.Linear.dequant_dtype = getattr(torch, dequant_dtype)
142
-
143
- if patch_dtype in ("default", None):
144
- ops.Linear.patch_dtype = None
145
- elif patch_dtype in ["target"]:
146
- ops.Linear.patch_dtype = patch_dtype
147
- else:
148
- ops.Linear.patch_dtype = getattr(torch, patch_dtype)
149
-
150
- # init model
151
- unet_path = folder_paths.get_full_path("unet", unet_name)
152
- sd = gguf_sd_loader(unet_path)
153
- model = comfy.sd.load_diffusion_model_state_dict(
154
- sd, model_options={"custom_operations": ops}
155
- )
156
- if model is None:
157
- logging.error("ERROR UNSUPPORTED UNET {}".format(unet_path))
158
- raise RuntimeError("ERROR: Could not detect model type of: {}".format(unet_path))
159
- model = GGUFModelPatcher.clone(model)
160
- model.patch_on_device = patch_on_device
161
- return (model,)
162
-
163
- class UnetLoaderGGUFAdvanced(UnetLoaderGGUF):
164
- @classmethod
165
- def INPUT_TYPES(s):
166
- unet_names = [x for x in folder_paths.get_filename_list("unet_gguf")]
167
- return {
168
- "required": {
169
- "unet_name": (unet_names,),
170
- "dequant_dtype": (["default", "target", "float32", "float16", "bfloat16"], {"default": "default"}),
171
- "patch_dtype": (["default", "target", "float32", "float16", "bfloat16"], {"default": "default"}),
172
- "patch_on_device": ("BOOLEAN", {"default": False}),
173
- }
174
- }
175
- TITLE = "Unet Loader (GGUF/Advanced)"
176
-
177
- class CLIPLoaderGGUF:
178
- @classmethod
179
- def INPUT_TYPES(s):
180
- base = nodes.CLIPLoader.INPUT_TYPES()
181
- return {
182
- "required": {
183
- "clip_name": (s.get_filename_list(),),
184
- "type": base["required"]["type"],
185
- }
186
- }
187
-
188
- RETURN_TYPES = ("CLIP",)
189
- FUNCTION = "load_clip"
190
- CATEGORY = "bootleg"
191
- TITLE = "CLIPLoader (GGUF)"
192
-
193
- @classmethod
194
- def get_filename_list(s):
195
- files = []
196
- files += folder_paths.get_filename_list("clip")
197
- files += folder_paths.get_filename_list("clip_gguf")
198
- return sorted(files)
199
-
200
- def load_data(self, ckpt_paths):
201
- clip_data = []
202
- for p in ckpt_paths:
203
- if p.endswith(".gguf"):
204
- sd = gguf_clip_loader(p)
205
- else:
206
- sd = comfy.utils.load_torch_file(p, safe_load=True)
207
- if "scaled_fp8" in sd: # NOTE: Scaled FP8 would require different custom ops, but only one can be active
208
- raise NotImplementedError(f"Mixing scaled FP8 with GGUF is not supported! Use regular CLIP loader or switch model(s)\n({p})")
209
- clip_data.append(sd)
210
- return clip_data
211
-
212
- def load_patcher(self, clip_paths, clip_type, clip_data):
213
- clip = comfy.sd.load_text_encoder_state_dicts(
214
- clip_type = clip_type,
215
- state_dicts = clip_data,
216
- model_options = {
217
- "custom_operations": GGMLOps,
218
- "initial_device": comfy.model_management.text_encoder_offload_device()
219
- },
220
- embedding_directory = folder_paths.get_folder_paths("embeddings"),
221
- )
222
- clip.patcher = GGUFModelPatcher.clone(clip.patcher)
223
- return clip
224
-
225
- def load_clip(self, clip_name, type="stable_diffusion"):
226
- clip_path = folder_paths.get_full_path("clip", clip_name)
227
- clip_type = getattr(comfy.sd.CLIPType, type.upper(), comfy.sd.CLIPType.STABLE_DIFFUSION)
228
- return (self.load_patcher([clip_path], clip_type, self.load_data([clip_path])),)
229
-
230
- class DualCLIPLoaderGGUF(CLIPLoaderGGUF):
231
- @classmethod
232
- def INPUT_TYPES(s):
233
- base = nodes.DualCLIPLoader.INPUT_TYPES()
234
- file_options = (s.get_filename_list(), )
235
- return {
236
- "required": {
237
- "clip_name1": file_options,
238
- "clip_name2": file_options,
239
- "type": base["required"]["type"],
240
- }
241
- }
242
-
243
- TITLE = "DualCLIPLoader (GGUF)"
244
-
245
- def load_clip(self, clip_name1, clip_name2, type):
246
- clip_path1 = folder_paths.get_full_path("clip", clip_name1)
247
- clip_path2 = folder_paths.get_full_path("clip", clip_name2)
248
- clip_paths = (clip_path1, clip_path2)
249
- clip_type = getattr(comfy.sd.CLIPType, type.upper(), comfy.sd.CLIPType.STABLE_DIFFUSION)
250
- return (self.load_patcher(clip_paths, clip_type, self.load_data(clip_paths)),)
251
-
252
- class TripleCLIPLoaderGGUF(CLIPLoaderGGUF):
253
- @classmethod
254
- def INPUT_TYPES(s):
255
- file_options = (s.get_filename_list(), )
256
- return {
257
- "required": {
258
- "clip_name1": file_options,
259
- "clip_name2": file_options,
260
- "clip_name3": file_options,
261
- }
262
- }
263
-
264
- TITLE = "TripleCLIPLoader (GGUF)"
265
-
266
- def load_clip(self, clip_name1, clip_name2, clip_name3, type="sd3"):
267
- clip_path1 = folder_paths.get_full_path("clip", clip_name1)
268
- clip_path2 = folder_paths.get_full_path("clip", clip_name2)
269
- clip_path3 = folder_paths.get_full_path("clip", clip_name3)
270
- clip_paths = (clip_path1, clip_path2, clip_path3)
271
- clip_type = getattr(comfy.sd.CLIPType, type.upper(), comfy.sd.CLIPType.STABLE_DIFFUSION)
272
- return (self.load_patcher(clip_paths, clip_type, self.load_data(clip_paths)),)
273
-
274
- class QuadrupleCLIPLoaderGGUF(CLIPLoaderGGUF):
275
- @classmethod
276
- def INPUT_TYPES(s):
277
- file_options = (s.get_filename_list(), )
278
- return {
279
- "required": {
280
- "clip_name1": file_options,
281
- "clip_name2": file_options,
282
- "clip_name3": file_options,
283
- "clip_name4": file_options,
284
- }
285
- }
286
-
287
- TITLE = "QuadrupleCLIPLoader (GGUF)"
288
-
289
- def load_clip(self, clip_name1, clip_name2, clip_name3, clip_name4, type="stable_diffusion"):
290
- clip_path1 = folder_paths.get_full_path("clip", clip_name1)
291
- clip_path2 = folder_paths.get_full_path("clip", clip_name2)
292
- clip_path3 = folder_paths.get_full_path("clip", clip_name3)
293
- clip_path4 = folder_paths.get_full_path("clip", clip_name4)
294
- clip_paths = (clip_path1, clip_path2, clip_path3, clip_path4)
295
- clip_type = getattr(comfy.sd.CLIPType, type.upper(), comfy.sd.CLIPType.STABLE_DIFFUSION)
296
- return (self.load_patcher(clip_paths, clip_type, self.load_data(clip_paths)),)
297
-
298
- NODE_CLASS_MAPPINGS = {
299
- "UnetLoaderGGUF": UnetLoaderGGUF,
300
- "CLIPLoaderGGUF": CLIPLoaderGGUF,
301
- "DualCLIPLoaderGGUF": DualCLIPLoaderGGUF,
302
- "TripleCLIPLoaderGGUF": TripleCLIPLoaderGGUF,
303
- "QuadrupleCLIPLoaderGGUF": QuadrupleCLIPLoaderGGUF,
304
- "UnetLoaderGGUFAdvanced": UnetLoaderGGUFAdvanced,
305
- }
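The `update_folder_names_and_paths` helper near the top of the deleted `nodes.py` registers a `.gguf`-only view of an existing model folder, preferring the first target key that is already registered. A minimal sketch of that fallback logic follows, assuming a plain dictionary in place of ComfyUI's `folder_paths.folder_names_and_paths`. The function name `register_gguf_key` and the paths in the usage note are hypothetical, and the warning that the original logs for a pre-existing key is omitted.

```python
def register_gguf_key(registry: dict, key: str, targets: list) -> None:
    """Register `key` as a '.gguf'-only alias of the first known target folder list."""
    # Reuse any folder list already stored under `key`, if it looks valid.
    base = registry.get(key, ([], {}))
    base = base[0] if isinstance(base[0], (list, set, tuple)) else []
    # Prefer the first target that exists; otherwise fall back to targets[0].
    target = next((name for name in targets if name in registry), targets[0])
    orig, _ = registry.get(target, ([], {}))
    # Note: the folder list object is shared with the target entry, not copied.
    registry[key] = (orig or base, {".gguf"})
```

With a registry such as `{"unet": (["/models/unet"], {".safetensors"})}`, calling `register_gguf_key(registry, "unet_gguf", ["diffusion_models", "unet"])` leaves `registry["unet_gguf"]` pointing at `["/models/unet"]` with the extension set `{".gguf"}`.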
 
ComfyUI/custom_nodes/ComfyUI-GGUF/ops.py DELETED
@@ -1,281 +0,0 @@
1
- # (c) City96 || Apache-2.0 (apache.org/licenses/LICENSE-2.0)
2
- import gguf
3
- import torch
4
- import logging
5
-
6
- import comfy.ops
7
- import comfy.lora
8
- import comfy.model_management
9
- from .dequant import dequantize_tensor, is_quantized
10
-
11
- def chained_hasattr(obj, chained_attr):
12
- probe = obj
13
- for attr in chained_attr.split('.'):
14
- if hasattr(probe, attr):
15
- probe = getattr(probe, attr)
16
- else:
17
- return False
18
- return True
19
-
20
- # A backward and forward compatible way to get `torch.compiler.disable`.
21
- def get_torch_compiler_disable_decorator():
22
- def dummy_decorator(*args, **kwargs):
23
- def noop(x):
24
- return x
25
- return noop
26
-
27
- from packaging import version
28
-
29
- if not chained_hasattr(torch, "compiler.disable"):
30
- logging.info("ComfyUI-GGUF: Torch too old for torch.compile - bypassing")
31
- return dummy_decorator # torch too old
32
- elif version.parse(torch.__version__) >= version.parse("2.8"):
33
- logging.info("ComfyUI-GGUF: Allowing full torch compile")
34
- return dummy_decorator # torch compile works
35
- if chained_hasattr(torch, "_dynamo.config.nontraceable_tensor_subclasses"):
36
- logging.info("ComfyUI-GGUF: Allowing full torch compile (nightly)")
37
- return dummy_decorator # torch compile works, nightly before 2.8 release
38
- else:
39
- logging.info("ComfyUI-GGUF: Partial torch compile only, consider updating pytorch")
40
- return torch.compiler.disable
41
-
42
- torch_compiler_disable = get_torch_compiler_disable_decorator()
43
-
44
- class GGMLTensor(torch.Tensor):
45
- """
46
- Main tensor-like class for storing quantized weights
47
- """
48
- def __init__(self, *args, tensor_type, tensor_shape, patches=[], **kwargs):
49
- super().__init__()
50
- self.tensor_type = tensor_type
51
- self.tensor_shape = tensor_shape
52
- self.patches = patches
53
-
54
- def __new__(cls, *args, tensor_type, tensor_shape, patches=[], **kwargs):
55
- return super().__new__(cls, *args, **kwargs)
56
-
57
- def to(self, *args, **kwargs):
58
- new = super().to(*args, **kwargs)
59
- new.tensor_type = getattr(self, "tensor_type", None)
60
- new.tensor_shape = getattr(self, "tensor_shape", new.data.shape)
61
- new.patches = getattr(self, "patches", []).copy()
62
- return new
63
-
64
- def clone(self, *args, **kwargs):
65
- return self
66
-
67
- def detach(self, *args, **kwargs):
68
- return self
69
-
70
- def copy_(self, *args, **kwargs):
71
- # fixes .weight.copy_ in comfy/clip_model/CLIPTextModel
72
- try:
73
- return super().copy_(*args, **kwargs)
74
- except Exception as e:
75
- logging.warning(f"ignoring 'copy_' on tensor: {e}")
76
-
77
- def new_empty(self, size, *args, **kwargs):
78
- # Intel Arc fix, ref#50
79
- new_tensor = super().new_empty(size, *args, **kwargs)
80
- return GGMLTensor(
81
- new_tensor,
82
- tensor_type = getattr(self, "tensor_type", None),
83
- tensor_shape = size,
84
- patches = getattr(self, "patches", []).copy()
85
- )
86
-
87
- @property
88
- def shape(self):
89
- if not hasattr(self, "tensor_shape"):
90
- self.tensor_shape = self.size()
91
- return self.tensor_shape
92
-
93
- class GGMLLayer(torch.nn.Module):
94
- """
95
- This (should) be responsible for de-quantizing on the fly
96
- """
97
- comfy_cast_weights = True
98
- dequant_dtype = None
99
- patch_dtype = None
100
- largest_layer = False
101
- torch_compatible_tensor_types = {None, gguf.GGMLQuantizationType.F32, gguf.GGMLQuantizationType.F16}
102
-
103
- def is_ggml_quantized(self, *, weight=None, bias=None):
104
- if weight is None:
105
- weight = self.weight
106
- if bias is None:
107
- bias = self.bias
108
- return is_quantized(weight) or is_quantized(bias)
109
-
110
- def _load_from_state_dict(self, state_dict, prefix, *args, **kwargs):
111
- weight, bias = state_dict.get(f"{prefix}weight"), state_dict.get(f"{prefix}bias")
112
- # NOTE: using modified load for linear due to not initializing on creation, see GGMLOps todo
113
- if self.is_ggml_quantized(weight=weight, bias=bias) or isinstance(self, torch.nn.Linear):
114
- return self.ggml_load_from_state_dict(state_dict, prefix, *args, **kwargs)
115
- # Not strictly required, but fixes embedding shape mismatch. Threshold set in loader.py
116
- if isinstance(self, torch.nn.Embedding) and self.weight.shape[0] >= (64 * 1024):
117
- return self.ggml_load_from_state_dict(state_dict, prefix, *args, **kwargs)
118
- return super()._load_from_state_dict(state_dict, prefix, *args, **kwargs)
119
-
120
- def ggml_load_from_state_dict(self, state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs):
121
- prefix_len = len(prefix)
122
- for k,v in state_dict.items():
123
- if k[prefix_len:] == "weight":
124
- self.weight = torch.nn.Parameter(v, requires_grad=False)
125
- elif k[prefix_len:] == "bias" and v is not None:
126
- self.bias = torch.nn.Parameter(v, requires_grad=False)
127
- else:
128
- unexpected_keys.append(k)
129
-
130
- # For Linear layer with missing weight
131
- if self.weight is None and isinstance(self, torch.nn.Linear):
132
- v = torch.zeros(self.in_features, self.out_features)
133
- self.weight = torch.nn.Parameter(v, requires_grad=False)
134
- missing_keys.append(prefix+"weight")
135
-
136
- # for vram estimation (TODO: less fragile logic?)
137
- if getattr(self.weight, "is_largest_weight", False):
138
- self.largest_layer = True
139
-
140
- def _save_to_state_dict(self, *args, **kwargs):
141
- if self.is_ggml_quantized():
142
- return self.ggml_save_to_state_dict(*args, **kwargs)
143
- return super()._save_to_state_dict(*args, **kwargs)
144
-
145
- def ggml_save_to_state_dict(self, destination, prefix, keep_vars):
146
- # This is a fake state dict for vram estimation
147
- weight = torch.zeros_like(self.weight, device=torch.device("meta"))
148
- destination[prefix + "weight"] = weight
149
- if self.bias is not None:
150
- bias = torch.zeros_like(self.bias, device=torch.device("meta"))
151
- destination[prefix + "bias"] = bias
152
-
153
- # Take into account space required for dequantizing the largest tensor
154
- if self.largest_layer:
155
- shape = getattr(self.weight, "tensor_shape", self.weight.shape)
156
- dtype = self.dequant_dtype if self.dequant_dtype and self.dequant_dtype != "target" else torch.float16
157
- temp = torch.empty(*shape, device=torch.device("meta"), dtype=dtype)
158
- destination[prefix + "temp.weight"] = temp
159
-
160
- return
161
- # This would return the dequantized state dict
162
- destination[prefix + "weight"] = self.get_weight(self.weight)
163
- if bias is not None:
164
- destination[prefix + "bias"] = self.get_weight(self.bias)
165
-
166
- def get_weight(self, tensor, dtype):
167
- if tensor is None:
168
- return
169
-
170
- # consolidate and load patches to GPU in async
171
- patch_list = []
172
- device = tensor.device
173
- for patches, key in getattr(tensor, "patches", []):
174
- patch_list += move_patch_to_device(patches, device)
175
-
176
- # dequantize tensor while patches load
177
- weight = dequantize_tensor(tensor, dtype, self.dequant_dtype)
178
-
179
- # prevent propagating custom tensor class
180
- if isinstance(weight, GGMLTensor):
181
- weight = torch.Tensor(weight)
182
-
183
- # apply patches
184
- if len(patch_list) > 0:
185
- if self.patch_dtype is None:
186
- weight = comfy.lora.calculate_weight(patch_list, weight, key)
187
- else:
188
- # for testing, may degrade image quality
189
- patch_dtype = dtype if self.patch_dtype == "target" else self.patch_dtype
190
- weight = comfy.lora.calculate_weight(patch_list, weight, key, patch_dtype)
191
- return weight
192
-
193
- @torch_compiler_disable()
194
- def cast_bias_weight(s, input=None, dtype=None, device=None, bias_dtype=None):
195
- if input is not None:
196
- if dtype is None:
197
- dtype = getattr(input, "dtype", torch.float32)
198
- if bias_dtype is None:
199
- bias_dtype = dtype
200
- if device is None:
201
- device = input.device
202
-
203
- bias = None
204
- non_blocking = comfy.model_management.device_supports_non_blocking(device)
205
- if s.bias is not None:
206
- bias = s.get_weight(s.bias.to(device), dtype)
207
- bias = comfy.ops.cast_to(bias, bias_dtype, device, non_blocking=non_blocking, copy=False)
208
-
209
- weight = s.get_weight(s.weight.to(device), dtype)
210
- weight = comfy.ops.cast_to(weight, dtype, device, non_blocking=non_blocking, copy=False)
211
- return weight, bias
212
-
213
- def forward_comfy_cast_weights(self, input, *args, **kwargs):
214
- if self.is_ggml_quantized():
215
- out = self.forward_ggml_cast_weights(input, *args, **kwargs)
216
- else:
217
- out = super().forward_comfy_cast_weights(input, *args, **kwargs)
218
-
219
- # non-ggml forward might still propagate custom tensor class
220
- if isinstance(out, GGMLTensor):
221
- out = torch.Tensor(out)
222
- return out
223
-
224
- def forward_ggml_cast_weights(self, input):
225
- raise NotImplementedError
226
-
227
- class GGMLOps(comfy.ops.manual_cast):
228
- """
229
- Dequantize weights on the fly before doing the compute
230
- """
231
- class Linear(GGMLLayer, comfy.ops.manual_cast.Linear):
232
- def __init__(self, in_features, out_features, bias=True, device=None, dtype=None):
233
- torch.nn.Module.__init__(self)
234
- # TODO: better workaround for reserved memory spike on windows
235
- # Issue is with `torch.empty` still reserving the full memory for the layer
236
- # Windows doesn't over-commit memory so without this 24GB+ of pagefile is used
237
- self.in_features = in_features
238
- self.out_features = out_features
239
- self.weight = None
240
- self.bias = None
241
-
242
- def forward_ggml_cast_weights(self, input):
243
- weight, bias = self.cast_bias_weight(input)
244
- return torch.nn.functional.linear(input, weight, bias)
245
-
246
- class Conv2d(GGMLLayer, comfy.ops.manual_cast.Conv2d):
247
- def forward_ggml_cast_weights(self, input):
248
- weight, bias = self.cast_bias_weight(input)
249
- return self._conv_forward(input, weight, bias)
250
-
251
- class Embedding(GGMLLayer, comfy.ops.manual_cast.Embedding):
252
- def forward_ggml_cast_weights(self, input, out_dtype=None):
253
- output_dtype = out_dtype
254
- if self.weight.dtype == torch.float16 or self.weight.dtype == torch.bfloat16:
255
- out_dtype = None
256
- weight, _bias = self.cast_bias_weight(self, device=input.device, dtype=out_dtype)
257
- return torch.nn.functional.embedding(
258
- input, weight, self.padding_idx, self.max_norm, self.norm_type, self.scale_grad_by_freq, self.sparse
259
- ).to(dtype=output_dtype)
260
-
261
- class LayerNorm(GGMLLayer, comfy.ops.manual_cast.LayerNorm):
262
- def forward_ggml_cast_weights(self, input):
263
- if self.weight is None:
264
- return super().forward_comfy_cast_weights(input)
265
- weight, bias = self.cast_bias_weight(input)
266
- return torch.nn.functional.layer_norm(input, self.normalized_shape, weight, bias, self.eps)
267
-
268
- class GroupNorm(GGMLLayer, comfy.ops.manual_cast.GroupNorm):
269
- def forward_ggml_cast_weights(self, input):
270
- weight, bias = self.cast_bias_weight(input)
271
- return torch.nn.functional.group_norm(input, self.num_groups, weight, bias, self.eps)
272
-
273
- def move_patch_to_device(item, device):
274
- if isinstance(item, torch.Tensor):
275
- return item.to(device, non_blocking=True)
276
- elif isinstance(item, tuple):
277
- return tuple(move_patch_to_device(x, device) for x in item)
278
- elif isinstance(item, list):
279
- return [move_patch_to_device(x, device) for x in item]
280
- else:
281
- return item
 
ComfyUI/custom_nodes/ComfyUI-GGUF/pyproject.toml DELETED
@@ -1,14 +0,0 @@
1
- [project]
2
- name = "comfyui-gguf"
3
- description = "GGUF Quantization support for native ComfyUI models."
4
- version = "2.0.0" # 2.0.0 = GitHub main, 1.X.X = ComfyUI Registry
5
- license = { file = "LICENSE" }
6
- dependencies = ["gguf>=0.13.0", "sentencepiece", "protobuf"]
7
-
8
- [project.urls]
9
- Repository = "https://github.com/city96/ComfyUI-GGUF"
10
-
11
- [tool.comfy]
12
- PublisherId = "city96"
13
- DisplayName = "ComfyUI-GGUF"
14
- Icon = ""
 
ComfyUI/custom_nodes/ComfyUI-GGUF/requirements.txt DELETED
@@ -1,5 +0,0 @@
1
- # main
2
- gguf>=0.13.0
3
- # optional - tokenizer
4
- sentencepiece
5
- protobuf
 
ComfyUI/custom_nodes/ComfyUI-GGUF/tools/README.md DELETED
@@ -1,93 +0,0 @@
1
- ## Converting initial model
2
-
3
- To convert your initial safetensors/ckpt model to FP16/BF16 GGUF, run the following command:
4
-
5
- ```
6
- python convert.py --src E:\models\unet\flux1-dev.safetensors
7
- ```
8
- Make sure `gguf>=0.13.0` is installed for this step. Optionally, specify the output gguf file with the `--dst` arg.
9
-
10
- > [!NOTE]
11
- > Do not use the diffusers UNET format for flux; it won't work. Use the default/reference checkpoint key format instead. This is due to q/k/v being merged into one qkv key.
12
- > You can convert it by loading it in ComfyUI and saving it using the built-in "ModelSave" node.
13
-
14
- > [!WARNING]
15
- > For hunyuan video/wan 2.1, you will see a warning about 5D tensors. This means the script will first save a **non-functional** model to disk, which you can then quantize. I recommend saving these in a separate `raw` folder to avoid confusion.
16
- >
17
- > After quantization, you will have to run `fix_5d_tensors.py` manually to add back the missing key that was saved by the conversion code.
18
-
19
- ## Quantizing using custom llama.cpp
20
-
21
- Depending on your git settings, you may need to run the following script first in order to make sure the patch file is valid. It will convert Windows (CRLF) line endings to Unix (LF) ones.
22
-
23
- ```
24
- python fix_lines_ending.py
25
- ```
26
-
27
- Git clone llama.cpp into the current folder:
28
-
29
- ```
30
- git clone https://github.com/ggerganov/llama.cpp
31
- ```
32
-
33
- Check out the correct branch, then apply the custom patch needed to add image model support to the repo you just cloned.
34
-
35
- ```
36
- cd llama.cpp
37
- git checkout tags/b3962
38
- git apply ..\lcpp.patch
39
- ```
40
-
41
- Compile the llama-quantize binary. This example uses cmake; on Linux you can just use make.
42
-
43
- ### Visual Studio 2019, Linux, etc...
44
-
45
- ```
46
- mkdir build
47
- cmake -B build
48
- cmake --build build --config Debug -j10 --target llama-quantize
49
- cd ..
50
- ```
51
-
52
- ### Visual Studio 2022
53
-
54
- ```
55
- mkdir build
56
- cmake -B build -DCMAKE_CXX_STANDARD=17 -DCMAKE_CXX_STANDARD_REQUIRED=ON -DCMAKE_CXX_FLAGS="-std=c++17"
57
- ```
58
-
59
- Edit the `llama.cpp\common\log.cpp` file and insert two lines after the existing first line:
60
-
61
- ```
62
- #include "log.h"
63
-
64
- #define _SILENCE_CXX23_CHRONO_DEPRECATION_WARNING
65
- #include <chrono>
66
- ```
67
-
68
- Then you can build the project:
69
- ```
70
- cmake --build build --config Debug -j10 --target llama-quantize
71
- cd ..
72
- ```
73
-
74
- ### Quantize your model
75
-
76
-
77
- Now you can use the newly built binary to quantize your model to the desired format:
78
- ```
79
- llama.cpp\build\bin\Debug\llama-quantize.exe E:\models\unet\flux1-dev-BF16.gguf E:\models\unet\flux1-dev-Q4_K_S.gguf Q4_K_S
80
- ```
81
-
82
- You can extract the patch again with `git diff src\llama.cpp > lcpp.patch` if you wish to change something and contribute back.
83
-
84
- > [!WARNING]
85
- > For hunyuan video/wan 2.1, you will have to run `fix_5d_tensors.py` after the quantization step is done.
86
- >
87
- > Example usage: `fix_5d_tensors.py --src E:\models\video\raw\wan2.1-t2v-1.3b-Q8_0.gguf --dst E:\models\video\wan2.1-t2v-1.3b-Q8_0.gguf`
88
- >
89
- > By default, this also saves a `fix_5d_tensors_[arch].safetensors` file in the `ComfyUI-GGUF/tools` folder. It's recommended to delete this file after all models have been converted.
90
-
91
- > [!NOTE]
92
- > Do not quantize SDXL / SD1 / other Conv2D heavy models. If you do, make sure to **extract the UNET model first**.
93
- > This should be obvious, but also don't use the resulting llama-quantize binary with LLMs.
 
ComfyUI/custom_nodes/ComfyUI-GGUF/tools/convert.py DELETED
@@ -1,365 +0,0 @@
1
- # (c) City96 || Apache-2.0 (apache.org/licenses/LICENSE-2.0)
2
- import os
3
- import gguf
4
- import torch
5
- import logging
6
- import argparse
7
- from tqdm import tqdm
8
- from safetensors.torch import load_file, save_file
9
-
10
- QUANTIZATION_THRESHOLD = 1024
11
- REARRANGE_THRESHOLD = 512
12
- MAX_TENSOR_NAME_LENGTH = 127
13
- MAX_TENSOR_DIMS = 4
14
-
15
- class ModelTemplate:
16
- arch = "invalid" # string describing architecture
17
- shape_fix = False # whether to reshape tensors
18
- keys_detect = [] # list of lists to match in state dict
19
- keys_banned = [] # list of keys that should mark model as invalid for conversion
20
- keys_hiprec = [] # list of keys that need to be kept in fp32 for some reason
21
- keys_ignore = [] # list of strings to ignore keys by when found
22
-
23
- def handle_nd_tensor(self, key, data):
24
- raise NotImplementedError(f"Tensor detected that exceeds dims supported by C++ code! ({key} @ {data.shape})")
25
-
26
- class ModelFlux(ModelTemplate):
27
- arch = "flux"
28
- keys_detect = [
29
- ("transformer_blocks.0.attn.norm_added_k.weight",),
30
- ("double_blocks.0.img_attn.proj.weight",),
31
- ]
32
- keys_banned = ["transformer_blocks.0.attn.norm_added_k.weight",]
33
-
34
- class ModelSD3(ModelTemplate):
35
- arch = "sd3"
36
- keys_detect = [
37
- ("transformer_blocks.0.attn.add_q_proj.weight",),
38
- ("joint_blocks.0.x_block.attn.qkv.weight",),
39
- ]
40
- keys_banned = ["transformer_blocks.0.attn.add_q_proj.weight",]
41
-
42
- class ModelAura(ModelTemplate):
43
- arch = "aura"
44
- keys_detect = [
45
- ("double_layers.3.modX.1.weight",),
46
- ("joint_transformer_blocks.3.ff_context.out_projection.weight",),
47
- ]
48
- keys_banned = ["joint_transformer_blocks.3.ff_context.out_projection.weight",]
49
-
50
- class ModelHiDream(ModelTemplate):
51
- arch = "hidream"
52
- keys_detect = [
53
- (
54
- "caption_projection.0.linear.weight",
55
- "double_stream_blocks.0.block.ff_i.shared_experts.w3.weight"
56
- )
57
- ]
58
- keys_hiprec = [
59
- # nn.parameter, can't load from BF16 ver
60
- ".ff_i.gate.weight",
61
- "img_emb.emb_pos"
62
- ]
63
-
64
- class CosmosPredict2(ModelTemplate):
65
- arch = "cosmos"
66
- keys_detect = [
67
- (
68
- "blocks.0.mlp.layer1.weight",
69
- "blocks.0.adaln_modulation_cross_attn.1.weight",
70
- )
71
- ]
72
- keys_hiprec = ["pos_embedder"]
73
- keys_ignore = ["_extra_state", "accum_"]
74
-
75
- class ModelHyVid(ModelTemplate):
76
- arch = "hyvid"
77
- keys_detect = [
78
- (
79
- "double_blocks.0.img_attn_proj.weight",
80
- "txt_in.individual_token_refiner.blocks.1.self_attn_qkv.weight",
81
- )
82
- ]
83
-
84
- def handle_nd_tensor(self, key, data):
85
- # hacky but don't have any better ideas
86
- path = f"./fix_5d_tensors_{self.arch}.safetensors" # TODO: somehow get a path here??
87
- if os.path.isfile(path):
88
- raise RuntimeError(f"5D tensor fix file already exists! {path}")
89
- fsd = {key: torch.from_numpy(data)}
90
- tqdm.write(f"5D key found in state dict! Manual fix required! - {key} {data.shape}")
91
- save_file(fsd, path)
92
-
93
- class ModelWan(ModelHyVid):
94
- arch = "wan"
95
- keys_detect = [
96
- (
97
- "blocks.0.self_attn.norm_q.weight",
98
- "text_embedding.2.weight",
99
- "head.modulation",
100
- )
101
- ]
102
- keys_hiprec = [
103
- ".modulation" # nn.parameter, can't load from BF16 ver
104
- ]
105
-
106
- class ModelLTXV(ModelTemplate):
107
- arch = "ltxv"
108
- keys_detect = [
109
- (
110
- "adaln_single.emb.timestep_embedder.linear_2.weight",
111
- "transformer_blocks.27.scale_shift_table",
112
- "caption_projection.linear_2.weight",
113
- )
114
- ]
115
- keys_hiprec = [
116
- "scale_shift_table" # nn.parameter, can't load from BF16 base quant
117
- ]
118
-
119
- class ModelSDXL(ModelTemplate):
120
- arch = "sdxl"
121
- shape_fix = True
122
- keys_detect = [
123
- ("down_blocks.0.downsamplers.0.conv.weight", "add_embedding.linear_1.weight",),
124
- (
125
- "input_blocks.3.0.op.weight", "input_blocks.6.0.op.weight",
126
- "output_blocks.2.2.conv.weight", "output_blocks.5.2.conv.weight",
127
- ), # Non-diffusers
128
- ("label_emb.0.0.weight",),
129
- ]
130
-
131
- class ModelSD1(ModelTemplate):
132
- arch = "sd1"
133
- shape_fix = True
134
- keys_detect = [
135
- ("down_blocks.0.downsamplers.0.conv.weight",),
136
- (
137
- "input_blocks.3.0.op.weight", "input_blocks.6.0.op.weight", "input_blocks.9.0.op.weight",
138
- "output_blocks.2.1.conv.weight", "output_blocks.5.2.conv.weight", "output_blocks.8.2.conv.weight"
139
- ), # Non-diffusers
140
- ]
141
-
142
- class ModelLumina2(ModelTemplate):
143
- arch = "lumina2"
144
- keys_detect = [
145
- ("cap_embedder.1.weight", "context_refiner.0.attention.qkv.weight")
146
- ]
147
-
148
- arch_list = [ModelFlux, ModelSD3, ModelAura, ModelHiDream, CosmosPredict2,
149
- ModelLTXV, ModelHyVid, ModelWan, ModelSDXL, ModelSD1, ModelLumina2]
150
-
151
- def is_model_arch(model, state_dict):
152
- # check if model is correct
153
- matched = False
154
- invalid = False
155
- for match_list in model.keys_detect:
156
- if all(key in state_dict for key in match_list):
157
- matched = True
158
- invalid = any(key in state_dict for key in model.keys_banned)
159
- break
160
- assert not invalid, "Model architecture not allowed for conversion! (i.e. reference VS diffusers format)"
161
- return matched
162
-
163
- def detect_arch(state_dict):
164
- model_arch = None
165
- for arch in arch_list:
166
- if is_model_arch(arch, state_dict):
167
- model_arch = arch()
168
- break
169
- assert model_arch is not None, "Unknown model architecture!"
170
- return model_arch
171
-
172
- def parse_args():
173
- parser = argparse.ArgumentParser(description="Generate F16 GGUF files from single UNET")
174
- parser.add_argument("--src", required=True, help="Source model ckpt file.")
175
- parser.add_argument("--dst", help="Output unet gguf file.")
176
- args = parser.parse_args()
177
-
178
- if not os.path.isfile(args.src):
179
- parser.error("No input provided!")
180
-
181
- return args
182
-
183
- def strip_prefix(state_dict):
184
- # prefix for mixed state dict
185
- prefix = None
186
- for pfx in ["model.diffusion_model.", "model."]:
187
- if any([x.startswith(pfx) for x in state_dict.keys()]):
188
- prefix = pfx
189
- break
190
-
191
- # prefix for uniform state dict
192
- if prefix is None:
193
- for pfx in ["net."]:
194
- if all([x.startswith(pfx) for x in state_dict.keys()]):
195
- prefix = pfx
196
- break
197
-
198
- # strip prefix if found
199
- if prefix is not None:
200
- logging.info(f"State dict prefix found: '{prefix}'")
201
- sd = {}
202
- for k, v in state_dict.items():
203
- if prefix not in k:
204
- continue
205
- k = k.replace(prefix, "")
206
- sd[k] = v
207
- else:
208
- logging.debug("State dict has no prefix")
209
- sd = state_dict
210
-
211
- return sd
212
-
213
- def load_state_dict(path):
214
- if any(path.endswith(x) for x in [".ckpt", ".pt", ".bin", ".pth"]):
215
- state_dict = torch.load(path, map_location="cpu", weights_only=True)
216
- for subkey in ["model", "module"]:
217
- if subkey in state_dict:
218
- state_dict = state_dict[subkey]
219
- break
220
- if len(state_dict) < 20:
221
- raise RuntimeError(f"pt subkey load failed: {state_dict.keys()}")
222
- else:
223
- state_dict = load_file(path)
224
-
225
- return strip_prefix(state_dict)
226
-
227
- def handle_tensors(writer, state_dict, model_arch):
228
- name_lengths = tuple(sorted(
229
- ((key, len(key)) for key in state_dict.keys()),
230
- key=lambda item: item[1],
231
- reverse=True,
232
- ))
233
- if not name_lengths:
234
- return
235
- max_name_len = name_lengths[0][1]
236
- if max_name_len > MAX_TENSOR_NAME_LENGTH:
237
- bad_list = ", ".join(f"{key!r} ({namelen})" for key, namelen in name_lengths if namelen > MAX_TENSOR_NAME_LENGTH)
238
- raise ValueError(f"Can only handle tensor names up to {MAX_TENSOR_NAME_LENGTH} characters. Tensors exceeding the limit: {bad_list}")
239
- for key, data in tqdm(state_dict.items()):
240
- old_dtype = data.dtype
241
-
242
- if any(x in key for x in model_arch.keys_ignore):
243
- tqdm.write(f"Filtering ignored key: '{key}'")
244
- continue
245
-
246
- if data.dtype == torch.bfloat16:
247
- data = data.to(torch.float32).numpy()
248
- # this is so we don't break torch 2.0.X
249
- elif data.dtype in [getattr(torch, "float8_e4m3fn", "_invalid"), getattr(torch, "float8_e5m2", "_invalid")]:
250
- data = data.to(torch.float16).numpy()
251
- else:
252
- data = data.numpy()
253
-
254
- n_dims = len(data.shape)
255
- data_shape = data.shape
256
- if old_dtype == torch.bfloat16:
257
- data_qtype = gguf.GGMLQuantizationType.BF16
258
- # elif old_dtype == torch.float32:
259
- # data_qtype = gguf.GGMLQuantizationType.F32
260
- else:
261
- data_qtype = gguf.GGMLQuantizationType.F16
262
-
263
- # The max no. of dimensions that can be handled by the quantization code is 4
264
- if len(data.shape) > MAX_TENSOR_DIMS:
265
- model_arch.handle_nd_tensor(key, data)
266
- continue # needs to be added back later
267
-
268
- # get number of parameters (AKA elements) in this tensor
269
- n_params = 1
270
- for dim_size in data_shape:
271
- n_params *= dim_size
272
-
273
- if old_dtype in (torch.float32, torch.bfloat16):
274
- if n_dims == 1:
275
- # one-dimensional tensors should be kept in F32
276
- # also speeds up inference due to not dequantizing
277
- data_qtype = gguf.GGMLQuantizationType.F32
278
-
279
- elif n_params <= QUANTIZATION_THRESHOLD:
280
- # very small tensors
281
- data_qtype = gguf.GGMLQuantizationType.F32
282
-
283
- elif any(x in key for x in model_arch.keys_hiprec):
284
- # tensors that require max precision
285
- data_qtype = gguf.GGMLQuantizationType.F32
286
-
287
- if (model_arch.shape_fix # NEVER reshape for models such as flux
288
- and n_dims > 1 # Skip one-dimensional tensors
289
- and n_params >= REARRANGE_THRESHOLD # Only rearrange tensors meeting the size requirement
290
- and (n_params / 256).is_integer() # Rearranging only makes sense if total elements is divisible by 256
291
- and not (data.shape[-1] / 256).is_integer() # Only need to rearrange if the last dimension is not divisible by 256
292
- ):
293
- orig_shape = data.shape
294
- data = data.reshape(n_params // 256, 256)
295
- writer.add_array(f"comfy.gguf.orig_shape.{key}", tuple(int(dim) for dim in orig_shape))
296
-
297
- try:
298
- data = gguf.quants.quantize(data, data_qtype)
299
- except (AttributeError, gguf.QuantError) as e:
300
- tqdm.write(f"falling back to F16: {e}")
301
- data_qtype = gguf.GGMLQuantizationType.F16
302
- data = gguf.quants.quantize(data, data_qtype)
303
-
304
- new_name = key # do we need to rename?
305
-
306
- shape_str = f"{{{', '.join(str(n) for n in reversed(data.shape))}}}"
307
- tqdm.write(f"{f'%-{max_name_len + 4}s' % f'{new_name}'} {old_dtype} --> {data_qtype.name}, shape = {shape_str}")
308
-
309
- writer.add_tensor(new_name, data, raw_dtype=data_qtype)
310
-
311
- def convert_file(path, dst_path=None, interact=True, overwrite=False):
312
- # load & run model detection logic
313
- state_dict = load_state_dict(path)
314
- model_arch = detect_arch(state_dict)
315
- logging.info(f"* Architecture detected from input: {model_arch.arch}")
316
-
317
- # detect & set dtype for output file
318
- dtypes = [x.dtype for x in state_dict.values()]
319
- dtypes = {x:dtypes.count(x) for x in set(dtypes)}
320
- main_dtype = max(dtypes, key=dtypes.get)
321
-
322
- if main_dtype == torch.bfloat16:
323
- ftype_name = "BF16"
324
- ftype_gguf = gguf.LlamaFileType.MOSTLY_BF16
325
- # elif main_dtype == torch.float32:
326
- # ftype_name = "F32"
327
- # ftype_gguf = None
328
- else:
329
- ftype_name = "F16"
330
- ftype_gguf = gguf.LlamaFileType.MOSTLY_F16
331
-
332
- if dst_path is None:
333
- dst_path = f"{os.path.splitext(path)[0]}-{ftype_name}.gguf"
334
- elif "{ftype}" in dst_path: # lcpp logic
335
- dst_path = dst_path.replace("{ftype}", ftype_name)
336
-
337
- if os.path.isfile(dst_path) and not overwrite:
338
- if interact:
339
- input("Output exists enter to continue or ctrl+c to abort!")
340
- else:
341
- raise OSError("Output exists and overwriting is disabled!")
342
-
343
- # handle actual file
344
- writer = gguf.GGUFWriter(path=None, arch=model_arch.arch)
345
- writer.add_quantization_version(gguf.GGML_QUANT_VERSION)
346
- if ftype_gguf is not None:
347
- writer.add_file_type(ftype_gguf)
348
-
349
- handle_tensors(writer, state_dict, model_arch)
350
- writer.write_header_to_file(path=dst_path)
351
- writer.write_kv_data_to_file()
352
- writer.write_tensors_to_file(progress=True)
353
- writer.close()
354
-
355
- fix = f"./fix_5d_tensors_{model_arch.arch}.safetensors"
356
- if os.path.isfile(fix):
357
- logging.warning(f"\n### Warning! Fix file found at '{fix}'")
358
- logging.warning(" you most likely need to run 'fix_5d_tensors.py' after quantization.")
359
-
360
- return dst_path, model_arch
361
-
362
- if __name__ == "__main__":
363
- args = parse_args()
364
- convert_file(args.src, args.dst)
365
-
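The reshape condition in `handle_tensors` above can be restated as a standalone predicate. This is a minimal sketch and not part of the deleted file: the function name `should_rearrange` and the constant name `BLOCK_SIZE` are hypothetical, while the values 512 and 256 are copied from the listing.

```python
from math import prod

REARRANGE_THRESHOLD = 512  # same value as in the listing above
BLOCK_SIZE = 256           # hypothetical name; the listing hard-codes 256


def should_rearrange(shape, shape_fix):
    """Hypothetical helper mirroring the reshape conditions in handle_tensors."""
    n_params = prod(shape)  # total number of elements in the tensor
    return bool(
        shape_fix                            # only architectures that opt in (sd1, sdxl)
        and len(shape) > 1                   # skip one-dimensional tensors
        and n_params >= REARRANGE_THRESHOLD  # skip very small tensors
        and n_params % BLOCK_SIZE == 0       # total must split into rows of 256
        and shape[-1] % BLOCK_SIZE != 0      # last dimension is not already aligned
    )
```

For instance, a convolution weight of shape `(320, 4, 3, 3)` has 11520 elements, which is divisible by 256 while its last dimension is not, so under these conditions it would be flattened to `(45, 256)` and its original shape recorded under `comfy.gguf.orig_shape.<key>`.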
 
ComfyUI/custom_nodes/ComfyUI-GGUF/tools/fix_5d_tensors.py DELETED
@@ -1,82 +0,0 @@
1
- # (c) City96 || Apache-2.0 (apache.org/licenses/LICENSE-2.0)
2
- import os
3
- import gguf
4
- import torch
5
- import argparse
6
- from tqdm import tqdm
7
- from safetensors.torch import load_file
8
-
9
- def get_args():
10
- parser = argparse.ArgumentParser()
11
- parser.add_argument("--src", required=True)
12
- parser.add_argument("--dst", required=True)
13
- parser.add_argument("--fix", required=False, help="Defaults to ./fix_5d_tensors_[arch].safetensors")
14
- parser.add_argument("--overwrite", action="store_true")
15
- args = parser.parse_args()
16
-
17
- if not os.path.isfile(args.src):
18
- parser.error(f"Invalid source file '{args.src}'")
19
- if not args.overwrite and os.path.exists(args.dst):
20
- parser.error(f"Output exists, use '--overwrite' ({args.dst})")
21
-
22
- return args
23
-
24
- def get_arch_str(reader):
25
- field = reader.get_field("general.architecture")
26
- return str(field.parts[field.data[-1]], encoding="utf-8")
27
-
28
- def get_file_type(reader):
29
- field = reader.get_field("general.file_type")
30
- ft = int(field.parts[field.data[-1]])
31
- return gguf.LlamaFileType(ft)
32
-
33
- if __name__ == "__main__":
34
- args = get_args()
35
-
36
- # read existing
37
- reader = gguf.GGUFReader(args.src)
38
- arch = get_arch_str(reader)
39
- file_type = get_file_type(reader)
40
- print(f"Detected arch: '{arch}' (ftype: {str(file_type)})")
41
-
42
- # prep fix
43
- if args.fix is None:
44
- args.fix = f"./fix_5d_tensors_{arch}.safetensors"
45
-
46
- if not os.path.isfile(args.fix):
47
- raise OSError(f"No 5D tensor fix file: {args.fix}")
48
-
49
- sd5d = load_file(args.fix)
50
- sd5d = {k:v.numpy() for k,v in sd5d.items()}
51
- print("5D tensors:", sd5d.keys())
52
-
53
- # prep output
54
- writer = gguf.GGUFWriter(path=None, arch=arch)
55
- writer.add_quantization_version(gguf.GGML_QUANT_VERSION)
56
- writer.add_file_type(file_type)
57
-
58
- added = []
59
- def add_extra_key(writer, key, data):
60
- global added
61
- data_qtype = gguf.GGMLQuantizationType.F32
62
- data = gguf.quants.quantize(data, data_qtype)
63
- tqdm.write(f"Adding key {key} ({data.shape})")
64
- writer.add_tensor(key, data, raw_dtype=data_qtype)
65
- added.append(key)
66
-
67
- # main loop to add missing 5D tensor(s)
68
- for tensor in tqdm(reader.tensors):
69
- writer.add_tensor(tensor.name, tensor.data, raw_dtype=tensor.tensor_type)
70
- key5d = tensor.name.replace(".bias", ".weight")
71
- if key5d in sd5d.keys():
72
- add_extra_key(writer, key5d, sd5d[key5d])
73
-
74
- # brute force for any missed
75
- for key, data in sd5d.items():
76
- if key not in added:
77
- add_extra_key(writer, key, data)
78
-
79
- writer.write_header_to_file(path=args.dst)
80
- writer.write_kv_data_to_file()
81
- writer.write_tensors_to_file(progress=True)
82
- writer.close()
 
ComfyUI/custom_nodes/ComfyUI-GGUF/tools/fix_lines_ending.py DELETED
@@ -1,31 +0,0 @@
1
- import os
2
-
3
- files = ["lcpp.patch", "lcpp_sd3.patch"]
4
-
5
- def has_unix_line_endings(file_path):
6
- try:
7
- with open(file_path, 'rb') as file:
8
- content = file.read()
9
- return b'\r\n' not in content
10
- except Exception as e:
11
- print(f"Error checking '{file_path}': {e}")
12
- return False
13
-
14
- def convert_to_linux_format(file_path):
15
- try:
16
- with open(file_path, 'rb') as file:
17
- content = file.read().replace(b'\r\n', b'\n')
18
- with open(file_path, 'wb') as file:
19
- file.write(content)
20
- print(f"'{file_path}' converted to Linux line endings (LF).")
21
- except Exception as e:
22
- print(f"Error processing '{file_path}': {e}")
23
-
24
- for file in files:
25
- if os.path.exists(file):
26
- if has_unix_line_endings(file):
27
- print(f"'{file}' already has Unix line endings (LF). No conversion needed.")
28
- else:
29
- convert_to_linux_format(file)
30
- else:
31
- print(f"File '{file}' does not exist.")
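The check-then-convert logic of the script above can also be expressed on in-memory bytes, which makes it easy to test without touching the filesystem. This is a sketch under that assumption. The function `to_lf` is a hypothetical name, and `has_unix_line_endings` here takes bytes rather than a file path as in the listing.

```python
def has_unix_line_endings(content: bytes) -> bool:
    """Return True when the content contains no Windows (CRLF) line endings."""
    return b"\r\n" not in content


def to_lf(content: bytes) -> bytes:
    """Replace Windows (CRLF) line endings with Unix (LF) ones."""
    return content.replace(b"\r\n", b"\n")
```

Working on bytes rather than text mirrors the script's use of `'rb'` and `'wb'` modes, which avoids any implicit newline translation by Python's text layer.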
 
ComfyUI/custom_nodes/ComfyUI-GGUF/tools/lcpp.patch DELETED
@@ -1,451 +0,0 @@
1
- diff --git a/ggml/include/ggml.h b/ggml/include/ggml.h
2
- index de3c706f..0267c1fa 100644
3
- --- a/ggml/include/ggml.h
4
- +++ b/ggml/include/ggml.h
5
- @@ -223,7 +223,7 @@
6
- #define GGML_MAX_OP_PARAMS 64
7
-
8
- #ifndef GGML_MAX_NAME
9
- -# define GGML_MAX_NAME 64
10
- +# define GGML_MAX_NAME 128
11
- #endif
12
-
13
- #define GGML_DEFAULT_N_THREADS 4
14
- @@ -2449,6 +2449,7 @@ extern "C" {
15
-
16
- // manage tensor info
17
- GGML_API void gguf_add_tensor(struct gguf_context * ctx, const struct ggml_tensor * tensor);
18
- + GGML_API void gguf_set_tensor_ndim(struct gguf_context * ctx, const char * name, int n_dim);
19
- GGML_API void gguf_set_tensor_type(struct gguf_context * ctx, const char * name, enum ggml_type type);
20
- GGML_API void gguf_set_tensor_data(struct gguf_context * ctx, const char * name, const void * data, size_t size);
21
-
22
- diff --git a/ggml/src/ggml.c b/ggml/src/ggml.c
23
- index b16c462f..6d1568f1 100644
24
- --- a/ggml/src/ggml.c
25
- +++ b/ggml/src/ggml.c
26
- @@ -22960,6 +22960,14 @@ void gguf_add_tensor(
27
- ctx->header.n_tensors++;
28
- }
29
-
30
- +void gguf_set_tensor_ndim(struct gguf_context * ctx, const char * name, const int n_dim) {
31
- + const int idx = gguf_find_tensor(ctx, name);
32
- + if (idx < 0) {
33
- + GGML_ABORT("tensor not found");
34
- + }
35
- + ctx->infos[idx].n_dims = n_dim;
36
- +}
37
- +
38
- void gguf_set_tensor_type(struct gguf_context * ctx, const char * name, enum ggml_type type) {
39
- const int idx = gguf_find_tensor(ctx, name);
40
- if (idx < 0) {
41
- diff --git a/src/llama.cpp b/src/llama.cpp
42
- index 24e1f1f0..25db4c69 100644
43
- --- a/src/llama.cpp
44
- +++ b/src/llama.cpp
45
- @@ -205,6 +205,17 @@ enum llm_arch {
46
- LLM_ARCH_GRANITE,
47
- LLM_ARCH_GRANITE_MOE,
48
- LLM_ARCH_CHAMELEON,
49
- + LLM_ARCH_FLUX,
50
- + LLM_ARCH_SD1,
51
- + LLM_ARCH_SDXL,
52
- + LLM_ARCH_SD3,
53
- + LLM_ARCH_AURA,
54
- + LLM_ARCH_LTXV,
55
- + LLM_ARCH_HYVID,
56
- + LLM_ARCH_WAN,
57
- + LLM_ARCH_HIDREAM,
58
- + LLM_ARCH_COSMOS,
59
- + LLM_ARCH_LUMINA2,
60
- LLM_ARCH_UNKNOWN,
61
- };
62
-
63
- @@ -258,6 +269,17 @@ static const std::map<llm_arch, const char *> LLM_ARCH_NAMES = {
64
- { LLM_ARCH_GRANITE, "granite" },
65
- { LLM_ARCH_GRANITE_MOE, "granitemoe" },
66
- { LLM_ARCH_CHAMELEON, "chameleon" },
67
- + { LLM_ARCH_FLUX, "flux" },
68
- + { LLM_ARCH_SD1, "sd1" },
69
- + { LLM_ARCH_SDXL, "sdxl" },
70
- + { LLM_ARCH_SD3, "sd3" },
71
- + { LLM_ARCH_AURA, "aura" },
72
- + { LLM_ARCH_LTXV, "ltxv" },
73
- + { LLM_ARCH_HYVID, "hyvid" },
74
- + { LLM_ARCH_WAN, "wan" },
75
- + { LLM_ARCH_HIDREAM, "hidream" },
76
- + { LLM_ARCH_COSMOS, "cosmos" },
77
- + { LLM_ARCH_LUMINA2, "lumina2" },
78
- { LLM_ARCH_UNKNOWN, "(unknown)" },
79
- };
80
-
81
- @@ -1531,6 +1553,17 @@ static const std::map<llm_arch, std::map<llm_tensor, const char *>> LLM_TENSOR_N
82
- { LLM_TENSOR_ATTN_K_NORM, "blk.%d.attn_k_norm" },
83
- },
84
- },
85
- + { LLM_ARCH_FLUX, {}},
86
- + { LLM_ARCH_SD1, {}},
87
- + { LLM_ARCH_SDXL, {}},
88
- + { LLM_ARCH_SD3, {}},
89
- + { LLM_ARCH_AURA, {}},
90
- + { LLM_ARCH_LTXV, {}},
91
- + { LLM_ARCH_HYVID, {}},
92
- + { LLM_ARCH_WAN, {}},
93
- + { LLM_ARCH_HIDREAM, {}},
94
- + { LLM_ARCH_COSMOS, {}},
95
- + { LLM_ARCH_LUMINA2, {}},
96
- {
97
- LLM_ARCH_UNKNOWN,
98
- {
99
- @@ -5403,6 +5436,25 @@ static void llm_load_hparams(
100
- // get general kv
101
- ml.get_key(LLM_KV_GENERAL_NAME, model.name, false);
102
-
103
- + // Disable LLM metadata for image models
104
- + switch (model.arch) {
105
- + case LLM_ARCH_FLUX:
106
- + case LLM_ARCH_SD1:
107
- + case LLM_ARCH_SDXL:
108
- + case LLM_ARCH_SD3:
109
- + case LLM_ARCH_AURA:
110
- + case LLM_ARCH_LTXV:
111
- + case LLM_ARCH_HYVID:
112
- + case LLM_ARCH_WAN:
113
- + case LLM_ARCH_HIDREAM:
114
- + case LLM_ARCH_COSMOS:
115
- + case LLM_ARCH_LUMINA2:
116
- + model.ftype = ml.ftype;
117
- + return;
118
- + default:
119
- + break;
120
- + }
121
- +
122
- // get hparams kv
123
- ml.get_key(LLM_KV_VOCAB_SIZE, hparams.n_vocab, false) || ml.get_arr_n(LLM_KV_TOKENIZER_LIST, hparams.n_vocab);
124
-
125
- @@ -18016,6 +18068,134 @@ static void llama_tensor_dequantize_internal(
126
- workers.clear();
127
- }
128
-
129
- +static ggml_type img_tensor_get_type(quantize_state_internal & qs, ggml_type new_type, const ggml_tensor * tensor, llama_ftype ftype) {
130
- + // Special function for quantizing image model tensors
131
- + const std::string name = ggml_get_name(tensor);
132
- + const llm_arch arch = qs.model.arch;
133
- +
134
- + // Sanity check
135
- + if (
136
- + (name.find("model.diffusion_model.") != std::string::npos) ||
137
- + (name.find("first_stage_model.") != std::string::npos) ||
138
- + (name.find("single_transformer_blocks.") != std::string::npos) ||
139
- + (name.find("joint_transformer_blocks.") != std::string::npos)
140
- + ) {
141
- + throw std::runtime_error("Invalid input GGUF file. This is not a supported UNET model");
142
- + }
143
- +
144
- + // Unsupported quant types - exclude all IQ quants for now
145
- + if (ftype == LLAMA_FTYPE_MOSTLY_IQ2_XXS || ftype == LLAMA_FTYPE_MOSTLY_IQ2_XS ||
146
- + ftype == LLAMA_FTYPE_MOSTLY_IQ2_S || ftype == LLAMA_FTYPE_MOSTLY_IQ2_M ||
147
- + ftype == LLAMA_FTYPE_MOSTLY_IQ3_XXS || ftype == LLAMA_FTYPE_MOSTLY_IQ1_S ||
148
- + ftype == LLAMA_FTYPE_MOSTLY_IQ1_M || ftype == LLAMA_FTYPE_MOSTLY_IQ4_NL ||
149
- + ftype == LLAMA_FTYPE_MOSTLY_IQ4_XS || ftype == LLAMA_FTYPE_MOSTLY_IQ3_S ||
150
- + ftype == LLAMA_FTYPE_MOSTLY_IQ3_M || ftype == LLAMA_FTYPE_MOSTLY_Q4_0_4_4 ||
151
- + ftype == LLAMA_FTYPE_MOSTLY_Q4_0_4_8 || ftype == LLAMA_FTYPE_MOSTLY_Q4_0_8_8) {
152
- + throw std::runtime_error("Invalid quantization type for image model (Not supported)");
153
- + }
154
- +
155
- + if ( // Rules for to_v attention
156
- + (name.find("attn_v.weight") != std::string::npos) ||
157
- + (name.find(".to_v.weight") != std::string::npos) ||
158
- + (name.find(".v.weight") != std::string::npos) ||
159
- + (name.find(".attn.w1v.weight") != std::string::npos) ||
160
- + (name.find(".attn.w2v.weight") != std::string::npos) ||
161
- + (name.find("_attn.v_proj.weight") != std::string::npos)
162
- + ){
163
- + if (ftype == LLAMA_FTYPE_MOSTLY_Q2_K) {
164
- + new_type = GGML_TYPE_Q3_K;
165
- + }
166
- + else if (ftype == LLAMA_FTYPE_MOSTLY_Q3_K_M) {
167
- + new_type = qs.i_attention_wv < 2 ? GGML_TYPE_Q5_K : GGML_TYPE_Q4_K;
168
- + }
169
- + else if (ftype == LLAMA_FTYPE_MOSTLY_Q3_K_L) {
170
- + new_type = GGML_TYPE_Q5_K;
171
- + }
172
- + else if (ftype == LLAMA_FTYPE_MOSTLY_Q4_K_M || ftype == LLAMA_FTYPE_MOSTLY_Q5_K_M) {
173
- + new_type = GGML_TYPE_Q6_K;
174
- + }
175
- + else if (ftype == LLAMA_FTYPE_MOSTLY_Q4_K_S && qs.i_attention_wv < 4) {
176
- + new_type = GGML_TYPE_Q5_K;
177
- + }
178
- + ++qs.i_attention_wv;
179
- + } else if ( // Rules for fused qkv attention
180
- + (name.find("attn_qkv.weight") != std::string::npos) ||
181
- + (name.find("attn.qkv.weight") != std::string::npos) ||
182
- + (name.find("attention.qkv.weight") != std::string::npos)
183
- + ) {
184
- + if (ftype == LLAMA_FTYPE_MOSTLY_Q3_K_M || ftype == LLAMA_FTYPE_MOSTLY_Q3_K_L) {
185
- + new_type = GGML_TYPE_Q4_K;
186
- + }
187
- + else if (ftype == LLAMA_FTYPE_MOSTLY_Q4_K_M) {
188
- + new_type = GGML_TYPE_Q5_K;
189
- + }
190
- + else if (ftype == LLAMA_FTYPE_MOSTLY_Q5_K_M) {
191
- + new_type = GGML_TYPE_Q6_K;
192
- + }
193
- + } else if ( // Rules for ffn
194
- + (name.find("ffn_down") != std::string::npos) ||
195
- + ((name.find("experts.") != std::string::npos) && (name.find(".w2.weight") != std::string::npos)) ||
196
- + (name.find(".ffn.2.weight") != std::string::npos) || // is this even the right way around?
197
- + (name.find(".ff.net.2.weight") != std::string::npos) ||
198
- + (name.find(".mlp.layer2.weight") != std::string::npos) ||
199
- + (name.find(".adaln_modulation_mlp.2.weight") != std::string::npos) ||
200
- + (name.find(".feed_forward.w2.weight") != std::string::npos)
201
- + ) {
202
- + // TODO: add back `layer_info` with some model specific logic + logic further down
203
- + if (ftype == LLAMA_FTYPE_MOSTLY_Q3_K_M) {
204
- + new_type = GGML_TYPE_Q4_K;
205
- + }
206
- + else if (ftype == LLAMA_FTYPE_MOSTLY_Q3_K_L) {
207
- + new_type = GGML_TYPE_Q5_K;
208
- + }
209
- + else if (ftype == LLAMA_FTYPE_MOSTLY_Q4_K_S) {
210
- + new_type = GGML_TYPE_Q5_K;
211
- + }
212
- + else if (ftype == LLAMA_FTYPE_MOSTLY_Q4_K_M) {
213
- + new_type = GGML_TYPE_Q6_K;
214
- + }
215
- + else if (ftype == LLAMA_FTYPE_MOSTLY_Q5_K_M) {
216
- + new_type = GGML_TYPE_Q6_K;
217
- + }
218
- + else if (ftype == LLAMA_FTYPE_MOSTLY_Q4_0) {
219
- + new_type = GGML_TYPE_Q4_1;
220
- + }
221
- + else if (ftype == LLAMA_FTYPE_MOSTLY_Q5_0) {
222
- + new_type = GGML_TYPE_Q5_1;
223
- + }
224
- + ++qs.i_ffn_down;
225
- + }
226
- +
227
- + // Sanity check for row shape
228
- + bool convert_incompatible_tensor = false;
229
- + if (new_type == GGML_TYPE_Q2_K || new_type == GGML_TYPE_Q3_K || new_type == GGML_TYPE_Q4_K ||
230
- + new_type == GGML_TYPE_Q5_K || new_type == GGML_TYPE_Q6_K) {
231
- + int nx = tensor->ne[0];
232
- + int ny = tensor->ne[1];
233
- + if (nx % QK_K != 0) {
234
- + LLAMA_LOG_WARN("\n\n%s : tensor cols %d x %d are not divisible by %d, required for %s", __func__, nx, ny, QK_K, ggml_type_name(new_type));
235
- + convert_incompatible_tensor = true;
236
- + } else {
237
- + ++qs.n_k_quantized;
238
- + }
239
- + }
240
- + if (convert_incompatible_tensor) {
241
- + // TODO: Possibly reenable this in the future
242
- + // switch (new_type) {
243
- + // case GGML_TYPE_Q2_K:
244
- + // case GGML_TYPE_Q3_K:
245
- + // case GGML_TYPE_Q4_K: new_type = GGML_TYPE_Q5_0; break;
246
- + // case GGML_TYPE_Q5_K: new_type = GGML_TYPE_Q5_1; break;
247
- + // case GGML_TYPE_Q6_K: new_type = GGML_TYPE_Q8_0; break;
248
- + // default: throw std::runtime_error("\nUnsupported tensor size encountered\n");
249
- + // }
250
- + new_type = GGML_TYPE_F16;
251
- + LLAMA_LOG_WARN(" - using fallback quantization %s\n", ggml_type_name(new_type));
252
- + ++qs.n_fallback;
253
- + }
254
- + return new_type;
255
- +}
256
- +
257
- static ggml_type llama_tensor_get_type(quantize_state_internal & qs, ggml_type new_type, const ggml_tensor * tensor, llama_ftype ftype) {
258
- const std::string name = ggml_get_name(tensor);
259
-
260
- @@ -18513,7 +18693,9 @@ static void llama_model_quantize_internal(const std::string & fname_inp, const s
261
- if (llama_model_has_encoder(&model)) {
262
- n_attn_layer *= 3;
263
- }
264
- - GGML_ASSERT((qs.n_attention_wv == n_attn_layer) && "n_attention_wv is unexpected");
265
- + if (model.arch != LLM_ARCH_HYVID) { // TODO: Check why this fails
266
- + GGML_ASSERT((qs.n_attention_wv == n_attn_layer) && "n_attention_wv is unexpected");
267
- + }
268
- }
269
-
270
- size_t total_size_org = 0;
271
- @@ -18547,6 +18729,51 @@ static void llama_model_quantize_internal(const std::string & fname_inp, const s
272
- ctx_outs[i_split] = gguf_init_empty();
273
- }
274
- gguf_add_tensor(ctx_outs[i_split], tensor);
275
- + // SD3 pos_embed needs special fix as first dim is 1, which gets truncated here
276
- + if (model.arch == LLM_ARCH_SD3) {
277
- + const std::string name = ggml_get_name(tensor);
278
- + if (name == "pos_embed" && tensor->ne[2] == 1) {
279
- + const int n_dim = 3;
280
- + gguf_set_tensor_ndim(ctx_outs[i_split], "pos_embed", n_dim);
281
- + LLAMA_LOG_INFO("\n%s: Correcting pos_embed shape for SD3: [key:%s]\n", __func__, tensor->name);
282
- + }
283
- + }
284
- + // same goes for auraflow
285
- + if (model.arch == LLM_ARCH_AURA) {
286
- + const std::string name = ggml_get_name(tensor);
287
- + if (name == "positional_encoding" && tensor->ne[2] == 1) {
288
- + const int n_dim = 3;
289
- + gguf_set_tensor_ndim(ctx_outs[i_split], "positional_encoding", n_dim);
290
- + LLAMA_LOG_INFO("\n%s: Correcting positional_encoding shape for AuraFlow: [key:%s]\n", __func__, tensor->name);
291
- + }
292
- + if (name == "register_tokens" && tensor->ne[2] == 1) {
293
- + const int n_dim = 3;
294
- + gguf_set_tensor_ndim(ctx_outs[i_split], "register_tokens", n_dim);
295
- + LLAMA_LOG_INFO("\n%s: Correcting register_tokens shape for AuraFlow: [key:%s]\n", __func__, tensor->name);
296
- + }
297
- + }
298
- + // conv3d fails due to max dims - unsure what to do here as we never even reach this check
299
- + if (model.arch == LLM_ARCH_HYVID) {
300
- + const std::string name = ggml_get_name(tensor);
301
- + if (name == "img_in.proj.weight" && tensor->ne[5] != 1 ) {
302
- + throw std::runtime_error("img_in.proj.weight size failed for HyVid");
303
- + }
304
- + }
305
- + // All the modulation layers also have dim1, and I think conv3d fails here too but we segfault way before that...
306
- + if (model.arch == LLM_ARCH_WAN) {
307
- + const std::string name = ggml_get_name(tensor);
308
- + if (name.find(".modulation") != std::string::npos && tensor->ne[2] == 1) {
309
- + const int n_dim = 3;
310
- + gguf_set_tensor_ndim(ctx_outs[i_split], tensor->name, n_dim);
311
- + LLAMA_LOG_INFO("\n%s: Correcting shape for Wan: [key:%s]\n", __func__, tensor->name);
312
- + }
313
- + // FLF2V model only
314
- + if (name == "img_emb.emb_pos") {
315
- + const int n_dim = 3;
316
- + gguf_set_tensor_ndim(ctx_outs[i_split], tensor->name, n_dim);
317
- + LLAMA_LOG_INFO("\n%s: Correcting shape for Wan FLF2V: [key:%s]\n", __func__, tensor->name);
318
- + }
319
- + }
320
- }
321
-
322
- // Set split info if needed
323
- @@ -18647,6 +18874,110 @@ static void llama_model_quantize_internal(const std::string & fname_inp, const s
324
- // do not quantize relative position bias (T5)
325
- quantize &= name.find("attn_rel_b.weight") == std::string::npos;
326
-
327
- + // rules for image models
328
- + bool image_model = false;
329
- + if (model.arch == LLM_ARCH_FLUX) {
330
- + image_model = true;
331
- + quantize &= name.find("txt_in.") == std::string::npos;
332
- + quantize &= name.find("img_in.") == std::string::npos;
333
- + quantize &= name.find("time_in.") == std::string::npos;
334
- + quantize &= name.find("vector_in.") == std::string::npos;
335
- + quantize &= name.find("guidance_in.") == std::string::npos;
336
- + quantize &= name.find("final_layer.") == std::string::npos;
337
- + }
338
- + if (model.arch == LLM_ARCH_SD1 || model.arch == LLM_ARCH_SDXL) {
339
- + image_model = true;
340
- + quantize &= name.find("class_embedding.") == std::string::npos;
341
- + quantize &= name.find("time_embedding.") == std::string::npos;
342
- + quantize &= name.find("add_embedding.") == std::string::npos;
343
- + quantize &= name.find("time_embed.") == std::string::npos;
344
- + quantize &= name.find("label_emb.") == std::string::npos;
345
- + quantize &= name.find("conv_in.") == std::string::npos;
346
- + quantize &= name.find("conv_out.") == std::string::npos;
347
- + quantize &= name != "input_blocks.0.0.weight";
348
- + quantize &= name != "out.2.weight";
349
- + }
350
- + if (model.arch == LLM_ARCH_SD3) {
351
- + image_model = true;
352
- + quantize &= name.find("final_layer.") == std::string::npos;
353
- + quantize &= name.find("time_text_embed.") == std::string::npos;
354
- + quantize &= name.find("context_embedder.") == std::string::npos;
355
- + quantize &= name.find("t_embedder.") == std::string::npos;
356
- + quantize &= name.find("y_embedder.") == std::string::npos;
357
- + quantize &= name.find("x_embedder.") == std::string::npos;
358
- + quantize &= name != "proj_out.weight";
359
- + quantize &= name != "pos_embed";
360
- + }
361
- + if (model.arch == LLM_ARCH_AURA) {
362
- + image_model = true;
363
- + quantize &= name.find("t_embedder.") == std::string::npos;
364
- + quantize &= name.find("init_x_linear.") == std::string::npos;
365
- + quantize &= name != "modF.1.weight";
366
- + quantize &= name != "cond_seq_linear.weight";
367
- + quantize &= name != "final_linear.weight";
368
- + quantize &= name != "final_linear.weight";
369
- + quantize &= name != "positional_encoding";
370
- + quantize &= name != "register_tokens";
371
- + }
372
- + if (model.arch == LLM_ARCH_LTXV) {
373
- + image_model = true;
374
- + quantize &= name.find("adaln_single.") == std::string::npos;
375
- + quantize &= name.find("caption_projection.") == std::string::npos;
376
- + quantize &= name.find("patchify_proj.") == std::string::npos;
377
- + quantize &= name.find("proj_out.") == std::string::npos;
378
- + quantize &= name.find("scale_shift_table") == std::string::npos; // last block too
379
- + }
380
- + if (model.arch == LLM_ARCH_HYVID) {
381
- + image_model = true;
382
- + quantize &= name.find("txt_in.") == std::string::npos;
383
- + quantize &= name.find("img_in.") == std::string::npos;
384
- + quantize &= name.find("time_in.") == std::string::npos;
385
- + quantize &= name.find("vector_in.") == std::string::npos;
386
- + quantize &= name.find("guidance_in.") == std::string::npos;
387
- + quantize &= name.find("final_layer.") == std::string::npos;
388
- + }
389
- + if (model.arch == LLM_ARCH_WAN) {
390
- + image_model = true;
391
- + quantize &= name.find("modulation.") == std::string::npos;
392
- + quantize &= name.find("patch_embedding.") == std::string::npos;
393
- + quantize &= name.find("text_embedding.") == std::string::npos;
394
- + quantize &= name.find("time_projection.") == std::string::npos;
395
- + quantize &= name.find("time_embedding.") == std::string::npos;
396
- + quantize &= name.find("img_emb.") == std::string::npos;
397
- + quantize &= name.find("head.") == std::string::npos;
398
- + }
399
- + if (model.arch == LLM_ARCH_HIDREAM) {
400
- + image_model = true;
401
- + quantize &= name.find("p_embedder.") == std::string::npos;
402
- + quantize &= name.find("t_embedder.") == std::string::npos;
403
- + quantize &= name.find("x_embedder.") == std::string::npos;
404
- + quantize &= name.find("final_layer.") == std::string::npos;
405
- + quantize &= name.find(".ff_i.gate.weight") == std::string::npos;
406
- + quantize &= name.find("caption_projection.") == std::string::npos;
407
- + }
408
- + if (model.arch == LLM_ARCH_COSMOS) {
409
- + image_model = true;
410
- + quantize &= name.find("p_embedder.") == std::string::npos;
411
- + quantize &= name.find("t_embedder.") == std::string::npos;
412
- + quantize &= name.find("t_embedding_norm.") == std::string::npos;
413
- + quantize &= name.find("x_embedder.") == std::string::npos;
414
- + quantize &= name.find("pos_embedder.") == std::string::npos;
415
- + quantize &= name.find("final_layer.") == std::string::npos;
416
- + }
417
- + if (model.arch == LLM_ARCH_LUMINA2) {
418
- + image_model = true;
419
- + quantize &= name.find("t_embedder.") == std::string::npos;
420
- + quantize &= name.find("x_embedder.") == std::string::npos;
421
- + quantize &= name.find("final_layer.") == std::string::npos;
422
- + quantize &= name.find("cap_embedder.") == std::string::npos;
423
- + quantize &= name.find("context_refiner.") == std::string::npos;
424
- + quantize &= name.find("noise_refiner.") == std::string::npos;
425
- + }
426
- + // ignore 3D/4D tensors for image models as the code was never meant to handle these
427
- + if (image_model) {
428
- + quantize &= ggml_n_dims(tensor) == 2;
429
- + }
430
- +
431
- enum ggml_type new_type;
432
- void * new_data;
433
- size_t new_size;
434
- @@ -18655,6 +18986,9 @@ static void llama_model_quantize_internal(const std::string & fname_inp, const s
435
- new_type = default_type;
436
-
437
- // get more optimal quantization type based on the tensor shape, layer, etc.
438
- + if (image_model) {
439
- + new_type = img_tensor_get_type(qs, new_type, tensor, ftype);
440
- + } else {
441
- if (!params->pure && ggml_is_quantized(default_type)) {
442
- new_type = llama_tensor_get_type(qs, new_type, tensor, ftype);
443
- }
444
- @@ -18664,6 +18998,7 @@ static void llama_model_quantize_internal(const std::string & fname_inp, const s
445
- if (params->output_tensor_type < GGML_TYPE_COUNT && strcmp(tensor->name, "output.weight") == 0) {
446
- new_type = params->output_tensor_type;
447
- }
448
- + }
449
-
450
- // If we've decided to quantize to the same type the tensor is already
451
- // in then there's nothing to do.
ComfyUI/custom_nodes/ComfyUI-GGUF/tools/read_tensors.py DELETED
@@ -1,21 +0,0 @@
- #!/usr/bin/python3
- import os
- import sys
- import gguf
-
- def read_tensors(path):
-     reader = gguf.GGUFReader(path)
-     for tensor in reader.tensors:
-         if tensor.tensor_type == gguf.GGMLQuantizationType.F32:
-             continue
-         print(f"{str(tensor.tensor_type):32}: {tensor.name}")
-
- try:
-     path = sys.argv[1]
-     assert os.path.isfile(path), "Invalid path"
-     print(f"input: {path}")
- except Exception as e:
-     input(f"failed: {e}")
- else:
-     read_tensors(path)
-     input()
ComfyUI/custom_nodes/cg-image-filter ADDED
@@ -0,0 +1 @@
1
+ Subproject commit 2cd49e79af81f91e7758c5174b4ade7b168f9c85
ComfyUI/models/audio_encoders/put_audio_encoder_models_here DELETED
File without changes
ComfyUI/models/checkpoints/put_checkpoints_here DELETED
File without changes
ComfyUI/models/clip/put_clip_or_text_encoder_models_here DELETED
File without changes
ComfyUI/models/clip_vision/put_clip_vision_models_here DELETED
File without changes
ComfyUI/models/configs/anything_v3.yaml DELETED
@@ -1,73 +0,0 @@
1
- model:
2
- base_learning_rate: 1.0e-04
3
- target: ldm.models.diffusion.ddpm.LatentDiffusion
4
- params:
5
- linear_start: 0.00085
6
- linear_end: 0.0120
7
- num_timesteps_cond: 1
8
- log_every_t: 200
9
- timesteps: 1000
10
- first_stage_key: "jpg"
11
- cond_stage_key: "txt"
12
- image_size: 64
13
- channels: 4
14
- cond_stage_trainable: false # Note: different from the one we trained before
15
- conditioning_key: crossattn
16
- monitor: val/loss_simple_ema
17
- scale_factor: 0.18215
18
- use_ema: False
19
-
20
- scheduler_config: # 10000 warmup steps
21
- target: ldm.lr_scheduler.LambdaLinearScheduler
22
- params:
23
- warm_up_steps: [ 10000 ]
24
- cycle_lengths: [ 10000000000000 ] # incredibly large number to prevent corner cases
25
- f_start: [ 1.e-6 ]
26
- f_max: [ 1. ]
27
- f_min: [ 1. ]
28
-
29
- unet_config:
30
- target: ldm.modules.diffusionmodules.openaimodel.UNetModel
31
- params:
32
- image_size: 32 # unused
33
- in_channels: 4
34
- out_channels: 4
35
- model_channels: 320
36
- attention_resolutions: [ 4, 2, 1 ]
37
- num_res_blocks: 2
38
- channel_mult: [ 1, 2, 4, 4 ]
39
- num_heads: 8
40
- use_spatial_transformer: True
41
- transformer_depth: 1
42
- context_dim: 768
43
- use_checkpoint: True
44
- legacy: False
45
-
46
- first_stage_config:
47
- target: ldm.models.autoencoder.AutoencoderKL
48
- params:
49
- embed_dim: 4
50
- monitor: val/rec_loss
51
- ddconfig:
52
- double_z: true
53
- z_channels: 4
54
- resolution: 256
55
- in_channels: 3
56
- out_ch: 3
57
- ch: 128
58
- ch_mult:
59
- - 1
60
- - 2
61
- - 4
62
- - 4
63
- num_res_blocks: 2
64
- attn_resolutions: []
65
- dropout: 0.0
66
- lossconfig:
67
- target: torch.nn.Identity
68
-
69
- cond_stage_config:
70
- target: ldm.modules.encoders.modules.FrozenCLIPEmbedder
71
- params:
72
- layer: "hidden"
73
- layer_idx: -2
ComfyUI/models/configs/v1-inference.yaml DELETED
@@ -1,70 +0,0 @@
1
- model:
2
- base_learning_rate: 1.0e-04
3
- target: ldm.models.diffusion.ddpm.LatentDiffusion
4
- params:
5
- linear_start: 0.00085
6
- linear_end: 0.0120
7
- num_timesteps_cond: 1
8
- log_every_t: 200
9
- timesteps: 1000
10
- first_stage_key: "jpg"
11
- cond_stage_key: "txt"
12
- image_size: 64
13
- channels: 4
14
- cond_stage_trainable: false # Note: different from the one we trained before
15
- conditioning_key: crossattn
16
- monitor: val/loss_simple_ema
17
- scale_factor: 0.18215
18
- use_ema: False
19
-
20
- scheduler_config: # 10000 warmup steps
21
- target: ldm.lr_scheduler.LambdaLinearScheduler
22
- params:
23
- warm_up_steps: [ 10000 ]
24
- cycle_lengths: [ 10000000000000 ] # incredibly large number to prevent corner cases
25
- f_start: [ 1.e-6 ]
26
- f_max: [ 1. ]
27
- f_min: [ 1. ]
28
-
29
- unet_config:
30
- target: ldm.modules.diffusionmodules.openaimodel.UNetModel
31
- params:
32
- image_size: 32 # unused
33
- in_channels: 4
34
- out_channels: 4
35
- model_channels: 320
36
- attention_resolutions: [ 4, 2, 1 ]
37
- num_res_blocks: 2
38
- channel_mult: [ 1, 2, 4, 4 ]
39
- num_heads: 8
40
- use_spatial_transformer: True
41
- transformer_depth: 1
42
- context_dim: 768
43
- use_checkpoint: True
44
- legacy: False
45
-
46
- first_stage_config:
47
- target: ldm.models.autoencoder.AutoencoderKL
48
- params:
49
- embed_dim: 4
50
- monitor: val/rec_loss
51
- ddconfig:
52
- double_z: true
53
- z_channels: 4
54
- resolution: 256
55
- in_channels: 3
56
- out_ch: 3
57
- ch: 128
58
- ch_mult:
59
- - 1
60
- - 2
61
- - 4
62
- - 4
63
- num_res_blocks: 2
64
- attn_resolutions: []
65
- dropout: 0.0
66
- lossconfig:
67
- target: torch.nn.Identity
68
-
69
- cond_stage_config:
70
- target: ldm.modules.encoders.modules.FrozenCLIPEmbedder
ComfyUI/models/configs/v1-inference_clip_skip_2.yaml DELETED
@@ -1,73 +0,0 @@
1
- model:
2
- base_learning_rate: 1.0e-04
3
- target: ldm.models.diffusion.ddpm.LatentDiffusion
4
- params:
5
- linear_start: 0.00085
6
- linear_end: 0.0120
7
- num_timesteps_cond: 1
8
- log_every_t: 200
9
- timesteps: 1000
10
- first_stage_key: "jpg"
11
- cond_stage_key: "txt"
12
- image_size: 64
13
- channels: 4
14
- cond_stage_trainable: false # Note: different from the one we trained before
15
- conditioning_key: crossattn
16
- monitor: val/loss_simple_ema
17
- scale_factor: 0.18215
18
- use_ema: False
19
-
20
- scheduler_config: # 10000 warmup steps
21
- target: ldm.lr_scheduler.LambdaLinearScheduler
22
- params:
23
- warm_up_steps: [ 10000 ]
24
- cycle_lengths: [ 10000000000000 ] # incredibly large number to prevent corner cases
25
- f_start: [ 1.e-6 ]
26
- f_max: [ 1. ]
27
- f_min: [ 1. ]
28
-
29
- unet_config:
30
- target: ldm.modules.diffusionmodules.openaimodel.UNetModel
31
- params:
32
- image_size: 32 # unused
33
- in_channels: 4
34
- out_channels: 4
35
- model_channels: 320
36
- attention_resolutions: [ 4, 2, 1 ]
37
- num_res_blocks: 2
38
- channel_mult: [ 1, 2, 4, 4 ]
39
- num_heads: 8
40
- use_spatial_transformer: True
41
- transformer_depth: 1
42
- context_dim: 768
43
- use_checkpoint: True
44
- legacy: False
45
-
46
- first_stage_config:
47
- target: ldm.models.autoencoder.AutoencoderKL
48
- params:
49
- embed_dim: 4
50
- monitor: val/rec_loss
51
- ddconfig:
52
- double_z: true
53
- z_channels: 4
54
- resolution: 256
55
- in_channels: 3
56
- out_ch: 3
57
- ch: 128
58
- ch_mult:
59
- - 1
60
- - 2
61
- - 4
62
- - 4
63
- num_res_blocks: 2
64
- attn_resolutions: []
65
- dropout: 0.0
66
- lossconfig:
67
- target: torch.nn.Identity
68
-
69
- cond_stage_config:
70
- target: ldm.modules.encoders.modules.FrozenCLIPEmbedder
71
- params:
72
- layer: "hidden"
73
- layer_idx: -2
ComfyUI/models/configs/v1-inference_clip_skip_2_fp16.yaml DELETED
@@ -1,74 +0,0 @@
1
- model:
2
- base_learning_rate: 1.0e-04
3
- target: ldm.models.diffusion.ddpm.LatentDiffusion
4
- params:
5
- linear_start: 0.00085
6
- linear_end: 0.0120
7
- num_timesteps_cond: 1
8
- log_every_t: 200
9
- timesteps: 1000
10
- first_stage_key: "jpg"
11
- cond_stage_key: "txt"
12
- image_size: 64
13
- channels: 4
14
- cond_stage_trainable: false # Note: different from the one we trained before
15
- conditioning_key: crossattn
16
- monitor: val/loss_simple_ema
17
- scale_factor: 0.18215
18
- use_ema: False
19
-
20
- scheduler_config: # 10000 warmup steps
21
- target: ldm.lr_scheduler.LambdaLinearScheduler
22
- params:
23
- warm_up_steps: [ 10000 ]
24
- cycle_lengths: [ 10000000000000 ] # incredibly large number to prevent corner cases
25
- f_start: [ 1.e-6 ]
26
- f_max: [ 1. ]
27
- f_min: [ 1. ]
28
-
29
- unet_config:
30
- target: ldm.modules.diffusionmodules.openaimodel.UNetModel
31
- params:
32
- use_fp16: True
33
- image_size: 32 # unused
34
- in_channels: 4
35
- out_channels: 4
36
- model_channels: 320
37
- attention_resolutions: [ 4, 2, 1 ]
38
- num_res_blocks: 2
39
- channel_mult: [ 1, 2, 4, 4 ]
40
- num_heads: 8
41
- use_spatial_transformer: True
42
- transformer_depth: 1
43
- context_dim: 768
44
- use_checkpoint: True
45
- legacy: False
46
-
47
- first_stage_config:
48
- target: ldm.models.autoencoder.AutoencoderKL
49
- params:
50
- embed_dim: 4
51
- monitor: val/rec_loss
52
- ddconfig:
53
- double_z: true
54
- z_channels: 4
55
- resolution: 256
56
- in_channels: 3
57
- out_ch: 3
58
- ch: 128
59
- ch_mult:
60
- - 1
61
- - 2
62
- - 4
63
- - 4
64
- num_res_blocks: 2
65
- attn_resolutions: []
66
- dropout: 0.0
67
- lossconfig:
68
- target: torch.nn.Identity
69
-
70
- cond_stage_config:
71
- target: ldm.modules.encoders.modules.FrozenCLIPEmbedder
72
- params:
73
- layer: "hidden"
74
- layer_idx: -2
ComfyUI/models/configs/v1-inference_fp16.yaml DELETED
@@ -1,71 +0,0 @@
1
- model:
2
- base_learning_rate: 1.0e-04
3
- target: ldm.models.diffusion.ddpm.LatentDiffusion
4
- params:
5
- linear_start: 0.00085
6
- linear_end: 0.0120
7
- num_timesteps_cond: 1
8
- log_every_t: 200
9
- timesteps: 1000
10
- first_stage_key: "jpg"
11
- cond_stage_key: "txt"
12
- image_size: 64
13
- channels: 4
14
- cond_stage_trainable: false # Note: different from the one we trained before
15
- conditioning_key: crossattn
16
- monitor: val/loss_simple_ema
17
- scale_factor: 0.18215
18
- use_ema: False
19
-
20
- scheduler_config: # 10000 warmup steps
21
- target: ldm.lr_scheduler.LambdaLinearScheduler
22
- params:
23
- warm_up_steps: [ 10000 ]
24
- cycle_lengths: [ 10000000000000 ] # incredibly large number to prevent corner cases
25
- f_start: [ 1.e-6 ]
26
- f_max: [ 1. ]
27
- f_min: [ 1. ]
28
-
29
- unet_config:
30
- target: ldm.modules.diffusionmodules.openaimodel.UNetModel
31
- params:
32
- use_fp16: True
33
- image_size: 32 # unused
34
- in_channels: 4
35
- out_channels: 4
36
- model_channels: 320
37
- attention_resolutions: [ 4, 2, 1 ]
38
- num_res_blocks: 2
39
- channel_mult: [ 1, 2, 4, 4 ]
40
- num_heads: 8
41
- use_spatial_transformer: True
42
- transformer_depth: 1
43
- context_dim: 768
44
- use_checkpoint: True
45
- legacy: False
46
-
47
- first_stage_config:
48
- target: ldm.models.autoencoder.AutoencoderKL
49
- params:
50
- embed_dim: 4
51
- monitor: val/rec_loss
52
- ddconfig:
53
- double_z: true
54
- z_channels: 4
55
- resolution: 256
56
- in_channels: 3
57
- out_ch: 3
58
- ch: 128
59
- ch_mult:
60
- - 1
61
- - 2
62
- - 4
63
- - 4
64
- num_res_blocks: 2
65
- attn_resolutions: []
66
- dropout: 0.0
67
- lossconfig:
68
- target: torch.nn.Identity
69
-
70
- cond_stage_config:
71
- target: ldm.modules.encoders.modules.FrozenCLIPEmbedder
ComfyUI/models/configs/v1-inpainting-inference.yaml DELETED
@@ -1,71 +0,0 @@
1
- model:
2
- base_learning_rate: 7.5e-05
3
- target: ldm.models.diffusion.ddpm.LatentInpaintDiffusion
4
- params:
5
- linear_start: 0.00085
6
- linear_end: 0.0120
7
- num_timesteps_cond: 1
8
- log_every_t: 200
9
- timesteps: 1000
10
- first_stage_key: "jpg"
11
- cond_stage_key: "txt"
12
- image_size: 64
13
- channels: 4
14
- cond_stage_trainable: false # Note: different from the one we trained before
15
- conditioning_key: hybrid # important
16
- monitor: val/loss_simple_ema
17
- scale_factor: 0.18215
18
- finetune_keys: null
19
-
20
- scheduler_config: # 10000 warmup steps
21
- target: ldm.lr_scheduler.LambdaLinearScheduler
22
- params:
23
- warm_up_steps: [ 2500 ] # NOTE for resuming. use 10000 if starting from scratch
24
- cycle_lengths: [ 10000000000000 ] # incredibly large number to prevent corner cases
25
- f_start: [ 1.e-6 ]
26
- f_max: [ 1. ]
27
- f_min: [ 1. ]
28
-
29
- unet_config:
30
- target: ldm.modules.diffusionmodules.openaimodel.UNetModel
31
- params:
32
- image_size: 32 # unused
33
- in_channels: 9 # 4 data + 4 downscaled image + 1 mask
34
- out_channels: 4
35
- model_channels: 320
36
- attention_resolutions: [ 4, 2, 1 ]
37
- num_res_blocks: 2
38
- channel_mult: [ 1, 2, 4, 4 ]
39
- num_heads: 8
40
- use_spatial_transformer: True
41
- transformer_depth: 1
42
- context_dim: 768
43
- use_checkpoint: True
44
- legacy: False
45
-
46
- first_stage_config:
47
- target: ldm.models.autoencoder.AutoencoderKL
48
- params:
49
- embed_dim: 4
50
- monitor: val/rec_loss
51
- ddconfig:
52
- double_z: true
53
- z_channels: 4
54
- resolution: 256
55
- in_channels: 3
56
- out_ch: 3
57
- ch: 128
58
- ch_mult:
59
- - 1
60
- - 2
61
- - 4
62
- - 4
63
- num_res_blocks: 2
64
- attn_resolutions: []
65
- dropout: 0.0
66
- lossconfig:
67
- target: torch.nn.Identity
68
-
69
- cond_stage_config:
70
- target: ldm.modules.encoders.modules.FrozenCLIPEmbedder
71
-
ComfyUI/models/configs/v2-inference-v.yaml DELETED
@@ -1,68 +0,0 @@
- model:
- base_learning_rate: 1.0e-4
- target: ldm.models.diffusion.ddpm.LatentDiffusion
- params:
- parameterization: "v"
- linear_start: 0.00085
- linear_end: 0.0120
- num_timesteps_cond: 1
- log_every_t: 200
- timesteps: 1000
- first_stage_key: "jpg"
- cond_stage_key: "txt"
- image_size: 64
- channels: 4
- cond_stage_trainable: false
- conditioning_key: crossattn
- monitor: val/loss_simple_ema
- scale_factor: 0.18215
- use_ema: False # we set this to false because this is an inference only config
-
- unet_config:
- target: ldm.modules.diffusionmodules.openaimodel.UNetModel
- params:
- use_checkpoint: True
- use_fp16: True
- image_size: 32 # unused
- in_channels: 4
- out_channels: 4
- model_channels: 320
- attention_resolutions: [ 4, 2, 1 ]
- num_res_blocks: 2
- channel_mult: [ 1, 2, 4, 4 ]
- num_head_channels: 64 # need to fix for flash-attn
- use_spatial_transformer: True
- use_linear_in_transformer: True
- transformer_depth: 1
- context_dim: 1024
- legacy: False
-
- first_stage_config:
- target: ldm.models.autoencoder.AutoencoderKL
- params:
- embed_dim: 4
- monitor: val/rec_loss
- ddconfig:
- #attn_type: "vanilla-xformers"
- double_z: true
- z_channels: 4
- resolution: 256
- in_channels: 3
- out_ch: 3
- ch: 128
- ch_mult:
- - 1
- - 2
- - 4
- - 4
- num_res_blocks: 2
- attn_resolutions: []
- dropout: 0.0
- lossconfig:
- target: torch.nn.Identity
-
- cond_stage_config:
- target: ldm.modules.encoders.modules.FrozenOpenCLIPEmbedder
- params:
- freeze: True
- layer: "penultimate"
 
ComfyUI/models/configs/v2-inference-v_fp32.yaml DELETED
@@ -1,68 +0,0 @@
- model:
- base_learning_rate: 1.0e-4
- target: ldm.models.diffusion.ddpm.LatentDiffusion
- params:
- parameterization: "v"
- linear_start: 0.00085
- linear_end: 0.0120
- num_timesteps_cond: 1
- log_every_t: 200
- timesteps: 1000
- first_stage_key: "jpg"
- cond_stage_key: "txt"
- image_size: 64
- channels: 4
- cond_stage_trainable: false
- conditioning_key: crossattn
- monitor: val/loss_simple_ema
- scale_factor: 0.18215
- use_ema: False # we set this to false because this is an inference only config
-
- unet_config:
- target: ldm.modules.diffusionmodules.openaimodel.UNetModel
- params:
- use_checkpoint: True
- use_fp16: False
- image_size: 32 # unused
- in_channels: 4
- out_channels: 4
- model_channels: 320
- attention_resolutions: [ 4, 2, 1 ]
- num_res_blocks: 2
- channel_mult: [ 1, 2, 4, 4 ]
- num_head_channels: 64 # need to fix for flash-attn
- use_spatial_transformer: True
- use_linear_in_transformer: True
- transformer_depth: 1
- context_dim: 1024
- legacy: False
-
- first_stage_config:
- target: ldm.models.autoencoder.AutoencoderKL
- params:
- embed_dim: 4
- monitor: val/rec_loss
- ddconfig:
- #attn_type: "vanilla-xformers"
- double_z: true
- z_channels: 4
- resolution: 256
- in_channels: 3
- out_ch: 3
- ch: 128
- ch_mult:
- - 1
- - 2
- - 4
- - 4
- num_res_blocks: 2
- attn_resolutions: []
- dropout: 0.0
- lossconfig:
- target: torch.nn.Identity
-
- cond_stage_config:
- target: ldm.modules.encoders.modules.FrozenOpenCLIPEmbedder
- params:
- freeze: True
- layer: "penultimate"
 
ComfyUI/models/configs/v2-inference.yaml DELETED
@@ -1,67 +0,0 @@
- model:
- base_learning_rate: 1.0e-4
- target: ldm.models.diffusion.ddpm.LatentDiffusion
- params:
- linear_start: 0.00085
- linear_end: 0.0120
- num_timesteps_cond: 1
- log_every_t: 200
- timesteps: 1000
- first_stage_key: "jpg"
- cond_stage_key: "txt"
- image_size: 64
- channels: 4
- cond_stage_trainable: false
- conditioning_key: crossattn
- monitor: val/loss_simple_ema
- scale_factor: 0.18215
- use_ema: False # we set this to false because this is an inference only config
-
- unet_config:
- target: ldm.modules.diffusionmodules.openaimodel.UNetModel
- params:
- use_checkpoint: True
- use_fp16: True
- image_size: 32 # unused
- in_channels: 4
- out_channels: 4
- model_channels: 320
- attention_resolutions: [ 4, 2, 1 ]
- num_res_blocks: 2
- channel_mult: [ 1, 2, 4, 4 ]
- num_head_channels: 64 # need to fix for flash-attn
- use_spatial_transformer: True
- use_linear_in_transformer: True
- transformer_depth: 1
- context_dim: 1024
- legacy: False
-
- first_stage_config:
- target: ldm.models.autoencoder.AutoencoderKL
- params:
- embed_dim: 4
- monitor: val/rec_loss
- ddconfig:
- #attn_type: "vanilla-xformers"
- double_z: true
- z_channels: 4
- resolution: 256
- in_channels: 3
- out_ch: 3
- ch: 128
- ch_mult:
- - 1
- - 2
- - 4
- - 4
- num_res_blocks: 2
- attn_resolutions: []
- dropout: 0.0
- lossconfig:
- target: torch.nn.Identity
-
- cond_stage_config:
- target: ldm.modules.encoders.modules.FrozenOpenCLIPEmbedder
- params:
- freeze: True
- layer: "penultimate"
 
ComfyUI/models/configs/v2-inference_fp32.yaml DELETED
@@ -1,67 +0,0 @@
- model:
- base_learning_rate: 1.0e-4
- target: ldm.models.diffusion.ddpm.LatentDiffusion
- params:
- linear_start: 0.00085
- linear_end: 0.0120
- num_timesteps_cond: 1
- log_every_t: 200
- timesteps: 1000
- first_stage_key: "jpg"
- cond_stage_key: "txt"
- image_size: 64
- channels: 4
- cond_stage_trainable: false
- conditioning_key: crossattn
- monitor: val/loss_simple_ema
- scale_factor: 0.18215
- use_ema: False # we set this to false because this is an inference only config
-
- unet_config:
- target: ldm.modules.diffusionmodules.openaimodel.UNetModel
- params:
- use_checkpoint: True
- use_fp16: False
- image_size: 32 # unused
- in_channels: 4
- out_channels: 4
- model_channels: 320
- attention_resolutions: [ 4, 2, 1 ]
- num_res_blocks: 2
- channel_mult: [ 1, 2, 4, 4 ]
- num_head_channels: 64 # need to fix for flash-attn
- use_spatial_transformer: True
- use_linear_in_transformer: True
- transformer_depth: 1
- context_dim: 1024
- legacy: False
-
- first_stage_config:
- target: ldm.models.autoencoder.AutoencoderKL
- params:
- embed_dim: 4
- monitor: val/rec_loss
- ddconfig:
- #attn_type: "vanilla-xformers"
- double_z: true
- z_channels: 4
- resolution: 256
- in_channels: 3
- out_ch: 3
- ch: 128
- ch_mult:
- - 1
- - 2
- - 4
- - 4
- num_res_blocks: 2
- attn_resolutions: []
- dropout: 0.0
- lossconfig:
- target: torch.nn.Identity
-
- cond_stage_config:
- target: ldm.modules.encoders.modules.FrozenOpenCLIPEmbedder
- params:
- freeze: True
- layer: "penultimate"
 
ComfyUI/models/configs/v2-inpainting-inference.yaml DELETED
@@ -1,158 +0,0 @@
- model:
- base_learning_rate: 5.0e-05
- target: ldm.models.diffusion.ddpm.LatentInpaintDiffusion
- params:
- linear_start: 0.00085
- linear_end: 0.0120
- num_timesteps_cond: 1
- log_every_t: 200
- timesteps: 1000
- first_stage_key: "jpg"
- cond_stage_key: "txt"
- image_size: 64
- channels: 4
- cond_stage_trainable: false
- conditioning_key: hybrid
- scale_factor: 0.18215
- monitor: val/loss_simple_ema
- finetune_keys: null
- use_ema: False
-
- unet_config:
- target: ldm.modules.diffusionmodules.openaimodel.UNetModel
- params:
- use_checkpoint: True
- image_size: 32 # unused
- in_channels: 9
- out_channels: 4
- model_channels: 320
- attention_resolutions: [ 4, 2, 1 ]
- num_res_blocks: 2
- channel_mult: [ 1, 2, 4, 4 ]
- num_head_channels: 64 # need to fix for flash-attn
- use_spatial_transformer: True
- use_linear_in_transformer: True
- transformer_depth: 1
- context_dim: 1024
- legacy: False
-
- first_stage_config:
- target: ldm.models.autoencoder.AutoencoderKL
- params:
- embed_dim: 4
- monitor: val/rec_loss
- ddconfig:
- #attn_type: "vanilla-xformers"
- double_z: true
- z_channels: 4
- resolution: 256
- in_channels: 3
- out_ch: 3
- ch: 128
- ch_mult:
- - 1
- - 2
- - 4
- - 4
- num_res_blocks: 2
- attn_resolutions: [ ]
- dropout: 0.0
- lossconfig:
- target: torch.nn.Identity
-
- cond_stage_config:
- target: ldm.modules.encoders.modules.FrozenOpenCLIPEmbedder
- params:
- freeze: True
- layer: "penultimate"
-
-
- data:
- target: ldm.data.laion.WebDataModuleFromConfig
- params:
- tar_base: null # for concat as in LAION-A
- p_unsafe_threshold: 0.1
- filter_word_list: "data/filters.yaml"
- max_pwatermark: 0.45
- batch_size: 8
- num_workers: 6
- multinode: True
- min_size: 512
- train:
- shards:
- - "pipe:aws s3 cp s3://stability-aws/laion-a-native/part-0/{00000..18699}.tar -"
- - "pipe:aws s3 cp s3://stability-aws/laion-a-native/part-1/{00000..18699}.tar -"
- - "pipe:aws s3 cp s3://stability-aws/laion-a-native/part-2/{00000..18699}.tar -"
- - "pipe:aws s3 cp s3://stability-aws/laion-a-native/part-3/{00000..18699}.tar -"
- - "pipe:aws s3 cp s3://stability-aws/laion-a-native/part-4/{00000..18699}.tar -" #{00000-94333}.tar"
- shuffle: 10000
- image_key: jpg
- image_transforms:
- - target: torchvision.transforms.Resize
- params:
- size: 512
- interpolation: 3
- - target: torchvision.transforms.RandomCrop
- params:
- size: 512
- postprocess:
- target: ldm.data.laion.AddMask
- params:
- mode: "512train-large"
- p_drop: 0.25
- # NOTE use enough shards to avoid empty validation loops in workers
- validation:
- shards:
- - "pipe:aws s3 cp s3://deep-floyd-s3/datasets/laion_cleaned-part5/{93001..94333}.tar - "
- shuffle: 0
- image_key: jpg
- image_transforms:
- - target: torchvision.transforms.Resize
- params:
- size: 512
- interpolation: 3
- - target: torchvision.transforms.CenterCrop
- params:
- size: 512
- postprocess:
- target: ldm.data.laion.AddMask
- params:
- mode: "512train-large"
- p_drop: 0.25
-
- lightning:
- find_unused_parameters: True
- modelcheckpoint:
- params:
- every_n_train_steps: 5000
-
- callbacks:
- metrics_over_trainsteps_checkpoint:
- params:
- every_n_train_steps: 10000
-
- image_logger:
- target: main.ImageLogger
- params:
- enable_autocast: False
- disabled: False
- batch_frequency: 1000
- max_images: 4
- increase_log_steps: False
- log_first_step: False
- log_images_kwargs:
- use_ema_scope: False
- inpaint: False
- plot_progressive_rows: False
- plot_diffusion_rows: False
- N: 4
- unconditional_guidance_scale: 5.0
- unconditional_guidance_label: [""]
- ddim_steps: 50 # todo check these out for depth2img,
- ddim_eta: 0.0 # todo check these out for depth2img,
-
- trainer:
- benchmark: True
- val_check_interval: 5000000
- num_sanity_val_steps: 0
- accumulate_grad_batches: 1
 
ComfyUI/models/controlnet/put_controlnets_and_t2i_here DELETED
File without changes
ComfyUI/models/diffusers/put_diffusers_models_here DELETED
File without changes
ComfyUI/models/diffusion_models/put_diffusion_model_files_here DELETED
File without changes
ComfyUI/models/embeddings/put_embeddings_or_textual_inversion_concepts_here DELETED
File without changes
ComfyUI/models/gligen/put_gligen_models_here DELETED
File without changes
ComfyUI/models/hypernetworks/put_hypernetworks_here DELETED
File without changes
ComfyUI/models/loras/put_loras_here DELETED
File without changes
ComfyUI/models/model_patches/put_model_patches_here DELETED
File without changes
ComfyUI/models/photomaker/put_photomaker_models_here DELETED
File without changes
ComfyUI/models/style_models/put_t2i_style_model_here DELETED
File without changes
ComfyUI/models/text_encoders/put_text_encoder_files_here DELETED
File without changes
ComfyUI/models/unet/put_unet_files_here DELETED
File without changes
ComfyUI/models/upscale_models/put_esrgan_and_other_upscale_models_here DELETED
File without changes
ComfyUI/models/vae/put_vae_here DELETED
File without changes
ComfyUI/models/vae_approx/put_taesd_encoder_pth_and_taesd_decoder_pth_here DELETED
File without changes