---
license: apache-2.0
language:
  - multilingual
  - en
  - hi
  - es
  - fr
  - de
  - it
  - gu
  - mr
pipeline_tag: automatic-speech-recognition
tags:
  - nemo
  - asr
  - emotion
  - age
  - gender
  - intent
  - entity_recognition
datasets:
  - MLCommons/peoples_speech
  - fsicoli/common_voice_17_0
  - ai4bharat/IndicVoices
  - facebook/multilingual_librispeech
  - openslr/librispeech_asr
base_model:
  - nvidia/parakeet-ctc-0.6b
library_name: nemo
---

# parakeet-ctc-0.6b-with-meta

This is a multilingual Automatic Speech Recognition (ASR) model fine-tuned with NVIDIA NeMo. Unlike standard transcription models, it annotates transcripts with intent, entity, emotion, and speaker-attribute (age, gender) tags, and can do so in streaming use.

## How to Use

You can use this model directly with the NeMo toolkit for inference.

```python
import nemo.collections.asr as nemo_asr

# Load the model from Hugging Face Hub
asr_model = nemo_asr.models.ASRModel.from_pretrained("WhissleAI/parakeet-ctc-0.6b-with-meta")

# Transcribe an audio file
transcriptions = asr_model.transcribe(["/path/to/your/audio.wav"])
print(transcriptions)
```
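The metadata is emitted inline in the transcript as special tags. The exact tag vocabulary depends on the fine-tuning data, so the tag names below are illustrative assumptions, not the model's confirmed output format; this is a minimal sketch of separating such tags from the spoken words:

```python
import re

# Hypothetical tagged transcript; the real tag format may differ
# (see the PromptingNemo repository for specifics).
transcript = "EMOTION_NEUTRAL GENDER_MALE turn the lights off INTENT_CONTROL_DEVICE"

# Treat uppercase KEY_VALUE tokens as metadata; keep the rest as text.
tags = re.findall(r"\b[A-Z]+_[A-Z_]+\b", transcript)
words = [w for w in transcript.split() if w not in tags]

print(tags)             # ['EMOTION_NEUTRAL', 'GENDER_MALE', 'INTENT_CONTROL_DEVICE']
print(" ".join(words))  # turn the lights off
```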

This model can also be used with the inference server provided in the PromptingNemo repository. Fine-tuning and inference scripts are available at https://github.com/WhissleAI/PromptingNemo/scripts/asr/meta-asr.