Upload complete model

README.md CHANGED

@@ -22,20 +22,22 @@ pipeline_tag: text-generation
| **q8.5** | 1.128 |

## Usage Notes

#### M3 Ultra 512GB RAM using [Inferencer app v1.7.3](https://inferencer.com)
* Expect ~16.5 tokens/s @ 1000 tokens
* Memory usage: ~450 GB
* For a larger context window (11k tokens) you can expand the RAM limit:
```bash
sudo sysctl iogpu.wired_limit_mb=507000
```
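
The `iogpu.wired_limit_mb` sysctl caps (in MB) how much unified memory macOS will wire for the GPU, so 507000 leaves only a few GB of the 512 GB for the system. The setting normally does not survive a reboot; a quick sketch of checking and restoring the default on recent Apple Silicon macOS:

```bash
# Read the current GPU wired-memory limit (0 means macOS uses its default cap)
sysctl iogpu.wired_limit_mb

# Restore the default cap once you are done
sudo sysctl iogpu.wired_limit_mb=0
```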

#### M3 Ultra 512GB RAM connected to MBP 128GB RAM using [Inferencer app v1.7.3](https://inferencer.com) with LAN distributed compute
* Expect ~13.7 tokens/s @ 1000 tokens
* Memory usage: MBP ~20GB + Mac Studio ~430GB (v1.7.4 will add support for dynamic memory splits)
* More RAM is available for a larger context window with this method

##### Quantized with a modified version of [MLX](https://github.com/ml-explore/mlx) 0.28
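
Because the q8.5 weights come from a modified MLX build, the stock toolchain may not reproduce (or even load) them. Purely as a reference point, a plain 8-bit MLX quantization with the upstream `mlx_lm` CLI looks roughly like the sketch below (output path and group size are illustrative, not the recipe used for this repo).

```bash
# Reference only: upstream mlx-lm 8-bit quantization. This repo used a modified
# MLX 0.28 with a q8.5 recipe that these stock flags do not reproduce, and stock
# mlx-lm may not yet support the DeepSeek-V3.2 architecture.
pip install mlx-lm
mlx_lm.convert --hf-path deepseek-ai/DeepSeek-V3.2-Speciale \
  --mlx-path ./DeepSeek-V3.2-Speciale-8bit-mlx \
  -q --q-bits 8 --q-group-size 64
```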

##### For more details see [demonstration video - coming soon](https://youtu.be/b6RgBIROK5o) or visit [DeepSeek-V3.2-Speciale](https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Speciale).

## Disclaimer