inferencerlabs committed
Commit 3d6a850 · verified · 1 parent: 1f42335

Upload complete model

Files changed (1): README.md (+9 -3)
README.md CHANGED
@@ -23,11 +23,17 @@ pipeline_tag: text-generation
 
 ## Usage Notes
 
-* Tested remotely over the network via a M3 Ultra 512GB RAM using [Inferencer app v1.7.3](https://inferencer.com)
+* With a single M3 Ultra 512GB RAM using [Inferencer app v1.7.3](https://inferencer.com)
+* Expect ~16.5 tokens/s @ 1000 tokens
 * Memory usage: ~450 GB
-* For a larger context window you can expand the VRAM limit:
+* For a larger context window (11k tokens) you can expand the RAM limit:
 * sudo sysctl iogpu.wired_limit_mb=507000
-* Expect ~16.5 tokens/s @ 1000 tokens
+
+* With M3 Ultra 512GB RAM connected to MBP 128GB RAM using [Inferencer app v1.7.3](https://inferencer.com) with distributed compute
+* Expect ~13.7 tokens/s @ 1000 tokens
+* Memory usage: MBP ~20GB + Mac Studio ~430GB
+* More RAM available for larger context window using this method
+
 * Quantized with a modified version of [MLX](https://github.com/ml-explore/mlx) 0.28
 * For more details see [demonstration video - coming soon](https://youtu.be/b6RgBIROK5o) or visit [DeepSeek-V3.2-Speciale](https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Speciale).
 
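
The `sysctl` bullet in the diff above raises macOS's GPU wired-memory limit so more of the machine's 512 GB can be wired for model weights. A minimal sketch of checking, raising, and later restoring that limit follows; the 507000 value comes from the README, while the reset-to-default step (writing 0) follows common MLX guidance and is an assumption here. The change requires admin rights and does not persist across reboots.

```
# Inspect the current GPU wired-memory limit (0 = macOS default).
sysctl iogpu.wired_limit_mb

# Raise the limit so ~507 GB can be wired for the model
# (value taken from the README; requires admin rights).
sudo sysctl iogpu.wired_limit_mb=507000

# Assumption: writing 0 restores the default limit, per common MLX guidance.
sudo sysctl iogpu.wired_limit_mb=0
```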
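
Since the README says the weights were quantized with a modified MLX 0.28 and tested through the Inferencer app, stock tooling is not guaranteed to load them. Purely as a hypothetical smoke test, a standard `mlx-lm` invocation might look like the sketch below; `<repo-or-local-path>` is a placeholder, and the flags shown are the stock `mlx_lm.generate` options, not anything confirmed by this commit.

```
# Hypothetical smoke test with stock mlx-lm; the README's weights were made
# with a *modified* MLX 0.28, so this may fail without the same patches.
pip install mlx-lm

# <repo-or-local-path> is a placeholder for this model's Hugging Face repo id
# or a local download directory.
mlx_lm.generate --model <repo-or-local-path> --prompt "Hello" --max-tokens 100
```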