Upload folder using huggingface_hub

Files changed (6)

README.md +143 -0
config.json +5 -0
decoder_joint-model.fp16.onnx +3 -0
encoder-model.fp16.onnx +3 -0
nemo128.onnx +3 -0
vocab.txt +0 -0

README.md ADDED

	@@ -0,0 +1,143 @@

+# Parakeet TDT 0.6B V3 - FP16 ONNX
+FP16 (half-precision) quantized version of the [Parakeet TDT 0.6B V3 ONNX model](https://huggingface.co/istupakov/parakeet-tdt-0.6b-v3-onnx).
+## Overview
+This repository contains FP16-quantized ONNX models for NVIDIA's Parakeet TDT (Token-and-Duration Transducer) 0.6B V3, a multilingual automatic speech recognition (ASR) model.
+**Key Benefits:**
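+- **About half the size**: ~1.28 GB (encoder + decoder) vs ~2.4 GB for the FP32 original
+- **Faster GPU inference**: FP16 math can be accelerated on GPUs with native half-precision support (e.g., NVIDIA Tensor Cores); on CPU, FP16 is often no faster than FP32
+- **Near-identical accuracy**: FP16 conversion typically causes only minimal quality loss
+- **Drop-in replacement**: Works with the `onnx-asr` library via the `quantization='fp16'` parameter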
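The size figures follow from simple arithmetic: FP32 stores each weight in 4 bytes and FP16 in 2. A back-of-the-envelope sketch, assuming a nominal 600 million parameters (the "0.6B" in the model name) and ignoring graph metadata, compared against the actual byte counts recorded in this commit's LFS pointers:

```python
# Back-of-the-envelope weight storage estimate (ignores graph structure/metadata).
PARAMS = 600_000_000  # ASSUMPTION: nominal "0.6B" parameter count

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weight_gb(params: int, dtype: str) -> float:
    """Approximate weight storage in decimal gigabytes."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


fp32_gb = weight_gb(PARAMS, "fp32")  # 2.4
fp16_gb = weight_gb(PARAMS, "fp16")  # 1.2

# Actual FP16 file sizes (bytes) from the Git LFS pointers in this commit.
actual_fp16_gb = (1_238_960_452 + 36_266_140) / 1e9  # encoder + decoder_joint

print(f"FP32 estimate: {fp32_gb:.2f} GB, FP16 estimate: {fp16_gb:.2f} GB, "
      f"FP16 files: {actual_fp16_gb:.2f} GB")
```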
+## Model Files
+| File | Size | Description |
+|------|------|-------------|
+| `encoder-model.fp16.onnx` | ~1.24 GB | FP16 encoder |
+| `decoder_joint-model.fp16.onnx` | ~36 MB | FP16 decoder and joint network |
+**Note:** This upload also includes the supporting files from the [original repository](https://huggingface.co/istupakov/parakeet-tdt-0.6b-v3-onnx), which are required alongside the FP16 models:
+- `config.json` - Model configuration
+- `vocab.txt` - Token vocabulary
+- `nemo128.onnx` - Audio preprocessor (128-bin log-mel features)
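Before calling `load_model`, a quick check that the local directory holds every file from this upload can save a confusing error later. A minimal sketch; `missing_model_files` is a hypothetical helper, and the file list simply mirrors this commit rather than any documented `onnx-asr` requirement:

```python
from pathlib import Path

# File names as uploaded in this commit (FP16 models + supporting files).
REQUIRED_FILES = (
    "encoder-model.fp16.onnx",
    "decoder_joint-model.fp16.onnx",
    "config.json",
    "vocab.txt",
    "nemo128.onnx",
)


def missing_model_files(model_dir: str) -> list[str]:
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

For example, `missing_model_files('./models/parakeet')` returns an empty list when everything is in place.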
+## Installation
+```bash
+pip install onnx-asr
+```
+## Usage
+### Basic Usage
+```python
+import onnx_asr
+# Load the FP16 model from a local directory
+model = onnx_asr.load_model(
+    'nemo-parakeet-tdt-0.6b-v3',
+    './models/parakeet',     # Directory containing the files from this repository
+    quantization='fp16',     # Selects the *.fp16.onnx model files
+    cpu_preprocessing=False  # Don't force the audio preprocessor onto the CPU
+)
+# Recognize speech from audio file
+text = model.recognize('audio.wav')
+print(text)
+```
+### With NumPy Arrays
+```python
+import numpy as np
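+# `model` is the object returned by onnx_asr.load_model() above.
+# Input: mono float32 samples at 16 kHz, scaled to [-1.0, 1.0].
+# Placeholder: one second of silence; replace with real audio.
+audio = np.zeros(16000, dtype=np.float32)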
+# Recognize
+text = model.recognize(audio)
+```
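One way to get real audio into that array shape is Python's standard `wave` module. A minimal sketch, assuming 16-bit PCM files that are already 16 kHz mono (it rejects anything else instead of resampling); `load_wav_16k_mono` is a hypothetical helper name:

```python
import wave

import numpy as np


def load_wav_16k_mono(path: str) -> np.ndarray:
    """Read a 16-bit PCM WAV file as float32 samples in [-1.0, 1.0).

    Hypothetical helper: assumes the file is already 16 kHz mono and
    raises ValueError otherwise instead of resampling or downmixing.
    """
    with wave.open(path, "rb") as wf:
        if wf.getframerate() != 16000 or wf.getnchannels() != 1:
            raise ValueError("expected 16 kHz mono audio")
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM samples")
        frames = wf.readframes(wf.getnframes())
    # Little-endian signed 16-bit integers -> floats scaled by 1/32768.
    pcm = np.frombuffer(frames, dtype="<i2")
    return pcm.astype(np.float32) / 32768.0
```

The result can be passed straight to the model: `text = model.recognize(load_wav_16k_mono('audio.wav'))`.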
+### GPU Acceleration
+FP16 models benefit most from GPU execution, which requires a CUDA-enabled ONNX Runtime build (e.g., the `onnxruntime-gpu` package):
+```python
+model = onnx_asr.load_model(
+    'nemo-parakeet-tdt-0.6b-v3',
+    './models/parakeet',
+    quantization='fp16',
+    providers=['CUDAExecutionProvider', 'CPUExecutionProvider'],  # GPU first
+    cpu_preprocessing=False
+)
+```
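It can help to build the provider list from what the installed ONNX Runtime actually reports, so the same script runs on machines with and without CUDA. A minimal sketch; `select_providers` is a hypothetical helper, and the preference order simply mirrors the example above:

```python
# Preferred order: CUDA first, CPU as the fallback.
PREFERRED_PROVIDERS = ("CUDAExecutionProvider", "CPUExecutionProvider")


def select_providers(available: list[str]) -> list[str]:
    """Keep the preferred providers that this ONNX Runtime build offers."""
    chosen = [p for p in PREFERRED_PROVIDERS if p in available]
    if not chosen:
        raise RuntimeError("no supported ONNX Runtime execution provider found")
    return chosen
```

With ONNX Runtime installed, pass `select_providers(onnxruntime.get_available_providers())` as the `providers` argument.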
+## How It Was Created
+This FP16 model was created using a two-step process:
+### Step 1: FP32 → FP16 Conversion
+```python
+from onnxconverter_common import float16
+import onnx
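+# Repeat the same steps for decoder_joint-model.onnx
+model = onnx.load('encoder-model.onnx')  # Also loads external weight data stored alongside
+model_fp16 = float16.convert_float_to_float16(
+    model,
+    keep_io_types=True,           # Keep graph inputs/outputs as FP32
+    disable_shape_infer=True      # Skip shape inference, which fails on models > 2 GB
+)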
+onnx.save(model_fp16, 'encoder-model.fp16.onnx')
+```
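FP16 can only represent magnitudes up to 65504, and magnitudes below about 3e-8 round to zero, so converters typically clamp weights before casting. The sketch below is loosely modeled on the clamping `onnxconverter_common` applies to initializers; `clamp_to_fp16` and its thresholds are illustrative assumptions, not the library's exact code:

```python
import numpy as np


def clamp_to_fp16(weights: np.ndarray,
                  min_positive_val: float = 1e-7,
                  max_finite_val: float = 1e4) -> np.ndarray:
    """Clamp FP32 weights into a safe range, then cast to float16.

    Tiny non-zero magnitudes are raised to min_positive_val so they do not
    flush to zero; large magnitudes are capped at max_finite_val.
    Exact zeros stay zero and signs are preserved.
    """
    w = np.asarray(weights, dtype=np.float32)
    sign = np.sign(w)
    mag = np.abs(w)
    mag = np.where((mag > 0) & (mag < min_positive_val), min_positive_val, mag)
    mag = np.minimum(mag, max_finite_val)
    return (sign * mag).astype(np.float16)
```

Without the clamp, a weight of `1e-9` would silently become `0` and `1e6` would become `inf` after the cast.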
+### Step 2: Fix Cast Operations
+The initial conversion leaves some internal `Cast` nodes that still output FP32 (`to=FLOAT`), so FP32 tensors end up feeding FP16 operators and cause type mismatches. A post-processing script fixes this by retargeting internal `Cast(to=FLOAT)` nodes to `Cast(to=FLOAT16)`, while leaving the casts on graph outputs unchanged so the model keeps FP32 outputs for compatibility.
+See the conversion scripts:
+- [`convert_to_fp16.py`](https://github.com/YOUR_USERNAME/YOUR_REPO/blob/main/convert_to_fp16.py)
+- [`fix_fp16_casts.py`](https://github.com/YOUR_USERNAME/YOUR_REPO/blob/main/fix_fp16_casts.py)
+## Supported Languages
+Supports the same 25 European languages as the original model:
+- Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian
+- Finnish, French, German, Greek, Hungarian, Italian, Latvian
+- Lithuanian, Maltese, Polish, Portuguese, Romanian, Russian
+- Slovak, Slovenian, Spanish, Swedish, Ukrainian
+## License
+This model is licensed under **CC-BY-4.0** (Creative Commons Attribution 4.0), same as the original Parakeet model.
+See the [original model card](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3) for details.
+## Citation
+If you use this model, please cite both the original Parakeet model and the ONNX conversion.
+## Credits
+- **Original Model**: [NVIDIA Parakeet TDT 0.6B V3](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3)
+- **ONNX Conversion**: [Igor Stupakov](https://huggingface.co/istupakov/parakeet-tdt-0.6b-v3-onnx)
+- **FP16 Quantization**: this repository
+## Related Links
+- [Original Parakeet Model](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3)
+- [ONNX FP32 Version](https://huggingface.co/istupakov/parakeet-tdt-0.6b-v3-onnx)
+- [onnx-asr Library](https://pypi.org/project/onnx-asr/)
+## Support
+For issues or questions:
+- **Original model questions**: See [nvidia/parakeet-tdt-0.6b-v3](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3)
+- **onnx-asr library**: See [onnx-asr documentation](https://pypi.org/project/onnx-asr/)

config.json ADDED

	@@ -0,0 +1,5 @@

+{
+    "model_type": "nemo-conformer-tdt",
+    "features_size": 128,
+    "subsampling_factor": 8
+}
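`features_size` presumably matches the 128 mel bins produced by `nemo128.onnx`, and `subsampling_factor` means the encoder emits one frame per 8 feature frames. A rough sketch of what that means in time, assuming a 10 ms feature hop (160 samples at 16 kHz, NeMo's usual setting, which `config.json` does not state); frame counts ignore edge padding:

```python
import json

# Contents of config.json from this commit.
config = json.loads(
    '{"model_type": "nemo-conformer-tdt", "features_size": 128, "subsampling_factor": 8}'
)

SAMPLE_RATE = 16_000  # Hz; input audio is 16 kHz mono
HOP_SAMPLES = 160     # ASSUMPTION: 10 ms feature hop, not stated in config.json


def approx_encoder_frames(num_samples: int) -> int:
    """Rough encoder frame count for a clip (ignores edge padding)."""
    feature_frames = num_samples // HOP_SAMPLES
    return feature_frames // config["subsampling_factor"]


def encoder_frame_seconds() -> float:
    """Audio duration covered by one encoder frame."""
    return HOP_SAMPLES * config["subsampling_factor"] / SAMPLE_RATE


print(approx_encoder_frames(10 * SAMPLE_RATE))  # 125 frames for 10 s of audio
print(encoder_frame_seconds())                  # 0.08
```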

decoder_joint-model.fp16.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b33a73b7c1d71b9d5a0911f5cb478be3dcbf79f53355c531ab1cd1dcd68ad8ef
+size 36266140

encoder-model.fp16.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a2bdeeb99cb7e5548818e823127b33854dd0c26f5d0c8da91effdd895ea0e717
+size 1238960452

nemo128.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a9fde1486ebfcc08f328d75ad4610c67835fea58c73ba57e3209a6f6cf019e9f
+size 139764
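The three `.onnx` entries are Git LFS pointer files: each records the spec `version`, the SHA-256 `oid`, and the `size` in bytes of the real blob. A minimal sketch for checking a downloaded file against its pointer, using only the standard library (`parse_lfs_pointer` and `matches_pointer` are hypothetical helpers):

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer ("key value" per line) into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(path: Path, pointer_text: str, chunk_size: int = 1 << 20) -> bool:
    """Check a file's size and SHA-256 digest against an LFS pointer."""
    fields = parse_lfs_pointer(pointer_text)
    algo, _, expected = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    if path.stat().st_size != int(fields["size"]):
        return False  # cheap size check before hashing a large file
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected
```

Hashing in 1 MiB chunks keeps memory flat even for the ~1.24 GB encoder file.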

vocab.txt ADDED

The diff for this file is too large to render.