Upload complete model
README.md
CHANGED
@@ -23,11 +23,17 @@ pipeline_tag: text-generation
 
 ## Usage Notes
 
-*
+* With a single M3 Ultra 512GB RAM using [Inferencer app v1.7.3](https://inferencer.com)
+* Expect ~16.5 tokens/s @ 1000 tokens
 * Memory usage: ~450 GB
-* For a larger context window you can expand the
+* For a larger context window (11k tokens) you can expand the RAM limit:
 * sudo sysctl iogpu.wired_limit_mb=507000
-
+
+* With M3 Ultra 512GB RAM connected to MBP 128GB RAM using [Inferencer app v1.7.3](https://inferencer.com) with distributed compute
+* Expect ~13.7 tokens/s @ 1000 tokens
+* Memory usage: MBP ~20GB + Mac Studio ~430GB
+* More RAM available for larger context window using this method
+
 * Quantized with a modified version of [MLX](https://github.com/ml-explore/mlx) 0.28
 * For more details see [demonstration video - coming soon](https://youtu.be/b6RgBIROK5o) or visit [DeepSeek-V3.2-Speciale](https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Speciale).
 
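For context on the `sudo sysctl iogpu.wired_limit_mb=507000` line added above: on Apple Silicon this sysctl raises the ceiling on wired (GPU-resident) unified memory, here to 507,000 MB, just under the 512 GB fitted in the Mac Studio, leaving headroom above the ~450 GB model footprint for the KV cache of a longer context. Below is a minimal sketch of applying and reverting the setting; the reset value of 0 and the fact that the override does not survive a reboot are general macOS behaviour rather than anything stated on this card.

```sh
# Query the current wired memory limit (0 = macOS default policy)
sysctl iogpu.wired_limit_mb

# Raise the limit so up to ~507 GB of unified memory can stay wired for the GPU, as in the card
sudo sysctl iogpu.wired_limit_mb=507000

# Revert to the default policy (assumption: 0 restores it; a reboot also clears the override)
sudo sysctl iogpu.wired_limit_mb=0
```

Leaving a few GB unwired for macOS itself is commonly advised; pushing the limit all the way to the full 512 GB can make the system unresponsive.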