swayamsingal committed · Commit bbb8b3e · verified · 1 Parent(s): 933fa99

Add model card with deployment instructions

Files changed (1):
  1. README.md +68 -0
README.md ADDED

---
language: en
tags:
- llm
- compression
- nanoquant
- quantization
- pruning
license: apache-2.0
datasets: []
model-index: []
---

# NanoQuant Compressed Model

## Model Description

This is a compressed version of [tencent/Hunyuan-MT-7B](https://huggingface.co/tencent/Hunyuan-MT-7B), created using NanoQuant, an advanced LLM compression toolkit.

## Compression Details

- **Compression Level**: medium
- **Size Reduction**: 77.0%
- **Techniques Used** (a reproduction sketch follows this list):
  - Quantization: 8-bit
  - Pruning: magnitude
  - LoRA: `{'r': 32, 'alpha': 32, 'dropout': 0.1}`
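
The snippet below is a minimal sketch of how the quantization and LoRA settings above could be expressed with the `transformers` and `peft` libraries; it is not NanoQuant's actual pipeline. The `target_modules` list is an illustrative assumption, and the magnitude-pruning step is omitted for brevity.

```python
# Sketch only: 8-bit loading plus a LoRA adapter matching the
# r/alpha/dropout values listed above. Requires transformers,
# peft, and bitsandbytes; NanoQuant's internals may differ.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(load_in_8bit=True)  # 8-bit weight quantization

model = AutoModelForCausalLM.from_pretrained(
    "tencent/Hunyuan-MT-7B",        # original base model
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=32,                           # rank, matching the card above
    lora_alpha=32,                  # scaling factor
    lora_dropout=0.1,
    target_modules=["q_proj", "v_proj"],  # illustrative assumption
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
```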

## Deployment Options

### Option 1: Direct Usage with Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the compressed weights and tokenizer from the NanoQuant
# output directory (or the corresponding Hub repository id).
model = AutoModelForCausalLM.from_pretrained("tencent_Hunyuan-MT-7B_nanoquant_medium")
tokenizer = AutoTokenizer.from_pretrained("tencent_Hunyuan-MT-7B_nanoquant_medium")
```
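
As a quick smoke test, a short generation run confirms the weights load correctly. The prompt below is illustrative; consult the original model card for the exact translation prompt template Hunyuan-MT expects.

```python
# Assumes `model` and `tokenizer` from the snippet above.
inputs = tokenizer("Translate the following text to English: 你好,世界", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```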

### Option 2: Ollama Deployment

This model is also available for Ollama:

```bash
ollama pull nanoquant-tencent-Hunyuan-MT-7B:medium
```
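
Once pulled, the model can be started in an interactive session (assuming the tag above is available to your Ollama instance):

```bash
# Start an interactive chat session with the compressed model.
ollama run nanoquant-tencent-Hunyuan-MT-7B:medium
```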

## Performance Characteristics

Due to the compression, this model:

- Requires significantly less storage space
- Has faster loading times
- Uses less memory during inference (a quick check is sketched below)
- Maintains most of the original model's capabilities
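
To verify the memory claim on your own hardware, `transformers` provides a footprint helper; this is a simple check, not part of the NanoQuant toolchain.

```python
from transformers import AutoModelForCausalLM

# Load the compressed model as in Option 1, then report the in-memory
# size of its parameters and buffers.
model = AutoModelForCausalLM.from_pretrained("tencent_Hunyuan-MT-7B_nanoquant_medium")
print(f"Memory footprint: {model.get_memory_footprint() / 1024**3:.2f} GiB")
```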

## Original Model

For information about the original model, please visit [tencent/Hunyuan-MT-7B](https://huggingface.co/tencent/Hunyuan-MT-7B).

## License

This model is released under the Apache 2.0 license.

## NanoQuant

NanoQuant is an advanced model compression system that achieves up to 99.95% size reduction while maintaining model performance. Learn more in the [NanoQuant Documentation](https://github.com/nanoquant/nanoquant).