grikdotnet committed on
Commit
1cad70c
·
verified ·
1 Parent(s): e7c20ef

Upload folder using huggingface_hub

README.md ADDED
@@ -0,0 +1,143 @@
1
+ # Parakeet TDT 0.6B V3 - FP16 ONNX
2
+
3
+ FP16 (half-precision) quantized version of the [Parakeet TDT 0.6B V3 ONNX model](https://huggingface.co/istupakov/parakeet-tdt-0.6b-v3-onnx).
4
+
5
+ ## Overview
6
+
7
+ This repository contains FP16-quantized ONNX models for NVIDIA's Parakeet TDT (Token-and-Duration Transducer) 0.6B V3, a multilingual automatic speech recognition (ASR) model.
8
+
9
+ **Key Benefits:**
10
+ - **About 50% smaller**: 1.25GB total vs 2.4GB original
11
+ - **Faster inference**: FP16 operations are hardware-accelerated on modern GPUs
12
+ - **Near-identical accuracy**: Minimal quality loss from the FP16 conversion
13
+ - **Drop-in replacement**: Compatible with the `onnx-asr` library via the `quantization='fp16'` parameter
14
+
15
+ ## Model Files
16
+
17
+ | File | Size | Description |
18
+ |------|------|-------------|
19
+ | `encoder-model.fp16.onnx` | 1.2GB | FP16 encoder model |
20
+ | `decoder_joint-model.fp16.onnx` | 35MB | FP16 decoder and joint network model |
21
+
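As a rough check on the size figures, the byte counts recorded in this commit's Git LFS pointers can be converted to gigabytes. This is only an illustrative sketch: the 2.4 GB value for the original is the approximate figure quoted in this README (assumed here to be decimal gigabytes), not a measured byte count.

```python
# Byte sizes taken from the Git LFS pointers in this commit.
ENCODER_BYTES = 1_238_960_452   # encoder-model.fp16.onnx
DECODER_BYTES = 36_266_140      # decoder_joint-model.fp16.onnx

# Approximate FP32 size as quoted in the README (assumption: decimal GB).
ORIGINAL_GB = 2.4

total_bytes = ENCODER_BYTES + DECODER_BYTES
total_gb = total_bytes / 1e9              # decimal gigabytes
reduction = 1 - total_gb / ORIGINAL_GB    # fraction saved vs. the original

print(f"FP16 total: {total_gb:.2f} GB")
print(f"Reduction:  {reduction:.0%}")
```

Under these assumptions the saving comes out slightly below the headline 50%, which is consistent with a few tensors and all file metadata not shrinking by exactly half.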
22
+ **Note:** You'll also need the supporting files from the [original repository](https://huggingface.co/istupakov/parakeet-tdt-0.6b-v3-onnx):
23
+ - `config.json` - Model configuration
24
+ - `vocab.txt` - Vocabulary file
25
+ - `nemo128.onnx` - Audio preprocessor model (128-bin log-mel feature extractor)
26
+
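Before loading the model, it can help to confirm that the model directory holds every file named above. The following is a minimal sketch using only the standard library; the helper name `missing_model_files` is hypothetical and not part of `onnx-asr`, and `./models/parakeet` is simply the example directory used in the Usage section.

```python
from pathlib import Path

# File names as listed in this README.
REQUIRED_FILES = [
    "encoder-model.fp16.onnx",
    "decoder_joint-model.fp16.onnx",
    "config.json",
    "vocab.txt",
    "nemo128.onnx",
]


def missing_model_files(model_dir):
    """Return the required file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Example directory from the Usage section; adjust to your own layout.
    missing = missing_model_files("./models/parakeet")
    print("missing:", missing or "none")
```

An empty result means all five files are present; anything else lists what still has to be copied or downloaded.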
27
+ ## Installation
28
+
29
+ ```bash
30
+ pip install onnx-asr
31
+ ```
32
+
33
+ ## Usage
34
+
35
+ ### Basic Usage
36
+
37
+ ```python
+ import onnx_asr
+
+ # Load the FP16 model
+ model = onnx_asr.load_model(
+     'nemo-parakeet-tdt-0.6b-v3',
+     './models/parakeet',      # Directory containing both FP32 and FP16 files
+     quantization='fp16',      # Use FP16 quantized models
+     cpu_preprocessing=False
+ )
+
+ # Recognize speech from an audio file
+ text = model.recognize('audio.wav')
+ print(text)
+ ```
52
+
53
+ ### With NumPy Arrays
54
+
55
+ ```python
+ import numpy as np
+
+ # Placeholder input: one second of random noise standing in for real audio.
+ # Real input should be a 16 kHz, mono, float32 waveform.
+ audio = np.random.randn(16000).astype(np.float32)
+
+ # Recognize (reuses the `model` loaded in Basic Usage)
+ text = model.recognize(audio)
+ ```
64
+
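To feed real audio instead of random noise, a WAV file can be decoded into the array layout described above. This is a sketch, not part of `onnx-asr`: the helper name `load_wav_mono16k` is hypothetical, it handles only 16-bit PCM input, and it assumes the model expects float32 samples scaled to roughly [-1, 1], which the README implies but does not state.

```python
import wave

import numpy as np


def load_wav_mono16k(path):
    """Read a 16-bit PCM, mono, 16 kHz WAV file as float32 samples in [-1, 1)."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            raise ValueError("expected mono audio")
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM")
        if wav.getframerate() != 16000:
            raise ValueError("expected a 16 kHz sample rate")
        frames = wav.readframes(wav.getnframes())
    # Interpret the raw bytes as little-endian int16, then scale to float32.
    samples = np.frombuffer(frames, dtype="<i2")
    return samples.astype(np.float32) / 32768.0
```

The returned array could then be passed to `model.recognize(...)` in place of the random placeholder. Files with another sample rate, channel count, or bit depth are rejected rather than silently converted.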
65
+ ### GPU Acceleration
66
+
67
+ FP16 models work best with GPU acceleration:
68
+
69
+ ```python
+ import onnx_asr
+
+ model = onnx_asr.load_model(
+     'nemo-parakeet-tdt-0.6b-v3',
+     './models/parakeet',
+     quantization='fp16',
+     providers=['CUDAExecutionProvider', 'CPUExecutionProvider'],  # GPU first, CPU as fallback
+     cpu_preprocessing=False
+ )
+ ```
78
+
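On machines where CUDA may or may not be present, the provider list can be filtered against what the runtime actually reports. The helper below is a hypothetical sketch (`pick_providers` is not an `onnx-asr` or ONNX Runtime function); it takes the available providers as a plain list so it runs without a GPU or ONNX Runtime installed.

```python
PREFERRED = ["CUDAExecutionProvider", "CPUExecutionProvider"]


def pick_providers(available, preferred=PREFERRED):
    """Keep the preferred providers that are actually available, in order."""
    chosen = [p for p in preferred if p in available]
    if not chosen:
        raise RuntimeError(f"none of {preferred} available; got {list(available)}")
    return chosen


# With ONNX Runtime installed, the list would typically come from:
#   import onnxruntime as ort
#   providers = pick_providers(ort.get_available_providers())
print(pick_providers(["CPUExecutionProvider"]))
```

The result can be passed as the `providers` argument shown above, so the same script works on both GPU and CPU-only hosts.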
79
+ ## How It Was Created
80
+
81
+ This FP16 model was created using a two-step process:
82
+
83
+ ### Step 1: FP32 → FP16 Conversion
84
+
85
+ ```python
+ import onnx
+ from onnxconverter_common import float16
+
+ # Load the FP32 encoder
+ model = onnx.load('encoder-model.onnx')
+ model_fp16 = float16.convert_float_to_float16(
+     model,
+     keep_io_types=True,        # Keep graph inputs/outputs as FP32
+     disable_shape_infer=True   # Skip shape inference, which can fail on models over 2 GB
+ )
+ onnx.save(model_fp16, 'encoder-model.fp16.onnx')
+ ```
97
+
98
+ ### Step 2: Fix Cast Operations
99
+
100
+ The initial conversion leaves some `Cast` operations targeting FP32, causing type mismatches. A post-processing script fixes these by converting internal `Cast(to=FLOAT)` operations to `Cast(to=FLOAT16)` while preserving output casts for compatibility.
101
+
102
+ See the conversion scripts:
103
+ - [`convert_to_fp16.py`](https://github.com/YOUR_USERNAME/YOUR_REPO/blob/main/convert_to_fp16.py)
104
+ - [`fix_fp16_casts.py`](https://github.com/YOUR_USERNAME/YOUR_REPO/blob/main/fix_fp16_casts.py)
105
+
106
+ ## Supported Languages
107
+
108
+ Supports 25 European languages (same as the original model):
109
+ - English, Spanish, French, German, Italian, Portuguese
110
+ - Russian, Polish, Ukrainian, Czech, Slovak
111
+ - Bulgarian, Croatian, Slovenian, Romanian
112
+ - Hungarian, Greek, Maltese
113
+ - Dutch, Swedish, Danish, Finnish
114
+ - Estonian, Latvian, Lithuanian
115
+
116
+ ## License
117
+
118
+ This model is licensed under **CC-BY-4.0** (Creative Commons Attribution 4.0), same as the original Parakeet model.
119
+
120
+
121
+ See the [Hugging Face repository](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3) of the original model for details.
122
+
123
+ ## Citation
124
+
125
+ If you use this model, please cite both the original Parakeet model and the ONNX conversion.
126
+
127
+ ## Credits
128
+
129
+ - **Original Model**: [NVIDIA Parakeet TDT 0.6B V3](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3)
130
+ - **ONNX Conversion**: [Ilya Stupakov](https://huggingface.co/istupakov/parakeet-tdt-0.6b-v3-onnx)
131
+ - **FP16 Quantization**: this repository
132
+
133
+ ## Related Links
134
+
135
+ - [Original Parakeet Model](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3)
136
+ - [ONNX FP32 Version](https://huggingface.co/istupakov/parakeet-tdt-0.6b-v3-onnx)
137
+ - [onnx-asr Library](https://pypi.org/project/onnx-asr/)
138
+
139
+ ## Support
140
+
141
+ For issues or questions:
142
+ - **Original model questions**: See [nvidia/parakeet-tdt-0.6b-v3](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3)
143
+ - **onnx-asr library**: See [onnx-asr documentation](https://pypi.org/project/onnx-asr/)
config.json ADDED
@@ -0,0 +1,5 @@
1
+ {
2
+ "model_type": "nemo-conformer-tdt",
3
+ "features_size": 128,
4
+ "subsampling_factor": 8
5
+ }
decoder_joint-model.fp16.onnx ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b33a73b7c1d71b9d5a0911f5cb478be3dcbf79f53355c531ab1cd1dcd68ad8ef
3
+ size 36266140
encoder-model.fp16.onnx ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a2bdeeb99cb7e5548818e823127b33854dd0c26f5d0c8da91effdd895ea0e717
3
+ size 1238960452
nemo128.onnx ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a9fde1486ebfcc08f328d75ad4610c67835fea58c73ba57e3209a6f6cf019e9f
3
+ size 139764
vocab.txt ADDED
The diff for this file is too large to render. See raw diff