---
license: apache-2.0
language:
- multilingual
- en
- hi
- es
- fr
- de
- it
- gu
- mr
pipeline_tag: automatic-speech-recognition
tags:
- nemo
- asr
- emotion
- age
- gender
- intent
- entity_recognition
datasets:
- MLCommons/peoples_speech
- fsicoli/common_voice_17_0
- ai4bharat/IndicVoices
- facebook/multilingual_librispeech
- openslr/librispeech_asr
base_model:
- nvidia/parakeet-ctc-0.6b
library_name: nemo
---

# parakeet-ctc-0.6b-with-meta

This is a multilingual Automatic Speech Recognition (ASR) model fine-tuned with NVIDIA NeMo. Unlike standard transcription models, it annotates transcripts with metadata such as intent, speaker attributes (age and gender), entities, and emotion, and can do so in a streaming setting.

## How to Use

You can use this model directly with the NeMo toolkit for inference.

```python
import nemo.collections.asr as nemo_asr

# Load the model from Hugging Face Hub
asr_model = nemo_asr.models.ASRModel.from_pretrained("WhissleAI/parakeet-ctc-0.6b-with-meta")

# Transcribe an audio file
transcriptions = asr_model.transcribe(["/path/to/your/audio.wav"])
print(transcriptions)
```
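The metadata is emitted as inline tags within the transcript text. The exact tag vocabulary depends on the fine-tuning data and is not documented here, so the tag names below (`ENTITY_*`, `AGE_*`, `GENDER_*`, `EMOTION_*`, `INTENT_*`) are illustrative assumptions, not this checkpoint's guaranteed output format. A minimal sketch of separating such tags from the spoken words:

```python
import re

# Hypothetical annotated transcript; the actual tag vocabulary produced
# by this checkpoint may differ.
transcript = (
    "ENTITY_PERSON maria END booked a flight "
    "INTENT_BOOK_FLIGHT AGE_30_45 GENDER_FEMALE EMOTION_NEUTRAL"
)

# Entity spans of the (assumed) form: ENTITY_<TYPE> <words> END
entities = re.findall(r"ENTITY_(\w+)\s+(.*?)\s+END", transcript)

# Utterance-level tags (assumed prefixes: AGE, GENDER, EMOTION, INTENT)
meta = {
    m.group(1).lower(): m.group(2)
    for m in re.finditer(r"\b(AGE|GENDER|EMOTION|INTENT)_([A-Z0-9_]+)\b", transcript)
}

print(entities)  # [('PERSON', 'maria')]
print(meta)      # {'intent': 'BOOK_FLIGHT', 'age': '30_45', ...}
```

Inspect a few real transcriptions from the model first to confirm the tag scheme before relying on any fixed regex.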

This model can also be used with the inference server provided in the `PromptingNemo` repository. See the [`scripts/asr/meta-asr`](https://github.com/WhissleAI/PromptingNemo/blob/main/scripts/asr/meta-asr) folder for fine-tuning and inference scripts.