---
license: apache-2.0
language:
- multilingual
- en
- hi
- es
- fr
- de
- it
- gu
- mr
pipeline_tag: automatic-speech-recognition
tags:
- nemo
- asr
- emotion
- age
- gender
- intent
- entity_recognition
datasets:
- MLCommons/peoples_speech
- fsicoli/common_voice_17_0
- ai4bharat/IndicVoices
- facebook/multilingual_librispeech
- openslr/librispeech_asr
base_model:
- nvidia/parakeet-ctc-0.6b
library_name: nemo
---
# parakeet-ctc-0.6b-with-meta
This is a multilingual Automatic Speech Recognition (ASR) model fine-tuned with NVIDIA NeMo. Unlike standard transcription models, it can tag intents, entities, and speaker attributes such as age, gender, and emotion directly in the transcription output, including in streaming settings.
## How to Use
You can use this model directly with the NeMo toolkit for inference.
```python
import nemo.collections.asr as nemo_asr
# Load the model from Hugging Face Hub
asr_model = nemo_asr.models.ASRModel.from_pretrained("WhissleAI/parakeet-ctc-0.6b-with-meta")
# Transcribe an audio file
transcriptions = asr_model.transcribe(["/path/to/your/audio.wav"])
print(transcriptions)
```
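Because the metadata is embedded inline in the transcript, you will typically want to post-process it. The exact tag scheme this model emits is not documented here, so the format below (`ENTITY_<TYPE> ... END` spans plus standalone `INTENT_*` and `EMOTION_*` tokens) is an illustrative assumption; inspect a few real transcripts to confirm the actual scheme before adapting this sketch:

```python
import re

# Hypothetical transcript: the tag format shown here is an assumption,
# not the confirmed output scheme of this model.
transcript = (
    "i want to fly to ENTITY_CITY new york END tomorrow "
    "INTENT_BOOK_FLIGHT EMOTION_NEUTRAL"
)

# Extract ENTITY_<TYPE> ... END spans as (type, text) pairs
entities = re.findall(r"ENTITY_(\w+)\s+(.*?)\s+END", transcript)

# Extract standalone intent and emotion tags
intent = re.search(r"INTENT_(\w+)", transcript)
emotion = re.search(r"EMOTION_(\w+)", transcript)

# Strip all tags to recover the plain transcription
plain = re.sub(
    r"ENTITY_\w+\s+|\s+END|\s*INTENT_\w+|\s*EMOTION_\w+", "", transcript
).strip()

print(entities)                            # [('CITY', 'new york')]
print(intent.group(1), emotion.group(1))   # BOOK_FLIGHT NEUTRAL
print(plain)                               # i want to fly to new york tomorrow
```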
This model can also be used with the inference server provided in the `PromptingNemo` repository.
See the [`scripts/asr/meta-asr`](https://github.com/WhissleAI/PromptingNemo/blob/main/scripts/asr/meta-asr) folder for fine-tuning and inference scripts.