Update README.md
README.md
## Quantization Process
If you want to create your own GGUF quantizations of HuggingFace models, use llama.cpp.

1. Clone llama.cpp *(at the time of writing, we used commit 9e20231)* and compile it.
```
cd llama.cpp && make
```
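The clone step itself isn't spelled out above; here is a minimal sketch, assuming the upstream ggerganov/llama.cpp repository and the commit noted in step 1 (the `requirements.txt` install for the conversion script's Python dependencies is an assumption; check the repo):

```
# Fetch llama.cpp and pin the commit noted above (assumed workflow)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
git checkout 9e20231
# Python dependencies for convert.py (assumed: see the repo's requirements.txt)
pip install -r requirements.txt
make
```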
2. Convert the HuggingFace model to GGUF format with FP16 precision.
```
python llama.cpp/convert.py ./original-models/Llama-2-13b-chat-german --outtype f16 --outfile ./converted_gguf/Llama-2-13b-chat-german-GGUF.fp16.bin
```
3. The converted FP16 GGUF model is then quantized further to 8 Bit and 5 Bit (K_M).
```
# Quantize GGUF (FP16) to 8 Bit and 5 Bit (K_M)
./llama.cpp/quantize Llama-2-13b-chat-german-GGUF.fp16.bin Llama-2-13b-chat-german-GGUF.q8_0.bin q8_0
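# The 5 Bit (K_M) variant is produced the same way; the output filename here is an assumption
./llama.cpp/quantize Llama-2-13b-chat-german-GGUF.fp16.bin Llama-2-13b-chat-german-GGUF.q5_k_m.bin q5_k_m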
```
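To check the result, the quantized file can be loaded directly with llama.cpp's `main` example binary; a minimal sketch, assuming the build from step 1 and an illustrative prompt:

```
# Quick smoke test of the 8 Bit quantization (prompt and token count are illustrative)
./llama.cpp/main -m Llama-2-13b-chat-german-GGUF.q8_0.bin -p "Wie geht es dir?" -n 128
```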