---
license: apache-2.0
language:
- multilingual
- en
- hi
- es
- fr
- de
- it
- gu
- mr
pipeline_tag: automatic-speech-recognition
tags:
- nemo
- asr
- emotion
- age
- gender
- intent
- entity_recognition
datasets:
- MLCommons/peoples_speech
- fsicoli/common_voice_17_0
- ai4bharat/IndicVoices
- facebook/multilingual_librispeech
- openslr/librispeech_asr
base_model:
- nvidia/parakeet-ctc-0.6b
library_name: nemo
---

# parakeet-ctc-0.6b-with-meta

This is a multilingual Automatic Speech Recognition (ASR) model fine-tuned with NVIDIA NeMo. Unlike standard transcription models, it annotates transcripts with metadata such as intent, speaker attributes (age and gender), entities, and emotion, and can do so in a streaming setting.

## How to Use

You can use this model directly with the NeMo toolkit for inference.

```python
import nemo.collections.asr as nemo_asr

# Load the model from Hugging Face Hub
asr_model = nemo_asr.models.ASRModel.from_pretrained("WhissleAI/parakeet-ctc-0.6b-with-meta")

# Transcribe an audio file
transcriptions = asr_model.transcribe(["/path/to/your/audio.wav"])
print(transcriptions)
```
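The metadata is emitted as inline tags within the transcript text. The exact tag vocabulary depends on the fine-tuning data and is not documented here, so the tag names below (`ENTITY_*`, `AGE_*`, `GENDER_*`, `EMOTION_*`, `INTENT_*`) are illustrative assumptions, not this checkpoint's guaranteed output format. A minimal sketch of separating such tags from the spoken words:

```python
import re

# Hypothetical annotated transcript; the actual tag vocabulary produced
# by this checkpoint may differ.
transcript = (
    "ENTITY_PERSON maria END booked a flight "
    "INTENT_BOOK_FLIGHT AGE_30_45 GENDER_FEMALE EMOTION_NEUTRAL"
)

# Entity spans of the (assumed) form: ENTITY_<TYPE> <words> END
entities = re.findall(r"ENTITY_(\w+)\s+(.*?)\s+END", transcript)

# Utterance-level tags (assumed prefixes: AGE, GENDER, EMOTION, INTENT)
meta = {
    m.group(1).lower(): m.group(2)
    for m in re.finditer(r"\b(AGE|GENDER|EMOTION|INTENT)_([A-Z0-9_]+)\b", transcript)
}

print(entities)  # [('PERSON', 'maria')]
print(meta)      # {'intent': 'BOOK_FLIGHT', 'age': '30_45', ...}
```

Inspect a few real transcriptions from the model first to confirm the tag scheme before relying on any fixed regex.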

This model can also be used with the inference server provided in the `PromptingNemo` repository. See the [`scripts/asr/meta-asr`](https://github.com/WhissleAI/PromptingNemo/blob/main/scripts/asr/meta-asr) folder for fine-tuning and inference scripts.