Commit 3c99844
Parent(s): 5f243ee
Update README.md
README.md CHANGED
@@ -114,7 +114,7 @@ img {
 [](#datasets)
 
 
-This model extracts speaker embeddings from given speech, which
+This model extracts speaker embeddings from given speech, which is the backbone for speaker verification and diarization tasks.
 It is a "large" version of TitaNet (around 23M parameters) models.
 See the [model architecture](#model-architecture) section and [NeMo documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/asr/speaker_recognition/models.html#titanet) for complete architecture details.
 
@@ -141,7 +141,7 @@ speaker_model = nemo_asr.models.EncDecSpeakerLabelModel.from_pretrained("nvidia/
 Using
 
 ```python
-emb = speaker_model.get_embedding("nvidia/an255-fash-b.wav")
+emb = speaker_model.get_embedding("an255-fash-b.wav")
 ```
 
 ### Verifying two utterances (Speaker Verification)
@@ -149,7 +149,7 @@ emb = speaker_model.get_embedding("nvidia/an255-fash-b.wav")
 Now to check if two audio files are from the same speaker or not, simply do:
 
 ```python
-speaker_model.verify_speakers("
+speaker_model.verify_speakers("an255-fash-b.wav","cen7-fash-b.wav")
 ```
 
 ### Extracting Embeddings for more audio files
@@ -161,6 +161,7 @@ Write audio files to a `manifest.json` file with lines as in format:
 ```json
 {"audio_filepath": "<absolute path to dataset>/audio_file.wav", "duration": "duration of file in sec", "label": "speaker_id"}
 ```
+
 Then running following script will extract embeddings and writes to current working directory:
 ```shell
 python <NeMo_root>/examples/speaker_tasks/recognition/extract_speaker_embeddings.py --manifest=manifest.json
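For reference, the calls touched by this commit fit together as in the sketch below. This is a minimal, illustrative sketch only: the exact `from_pretrained` model identifier is truncated in the hunk header above, so a placeholder is used, and the audio paths are the sample files named in the README.

```python
# Illustrative sketch of the README workflow after this commit (not part of the diff).
import nemo.collections.asr as nemo_asr

# Placeholder model identifier; the full "nvidia/..." name is truncated in the hunk header above.
speaker_model = nemo_asr.models.EncDecSpeakerLabelModel.from_pretrained("nvidia/<titanet_model_name>")

# Extract a speaker embedding from one utterance (relative path, per this commit).
emb = speaker_model.get_embedding("an255-fash-b.wav")

# Check whether two utterances come from the same speaker.
same_speaker = speaker_model.verify_speakers("an255-fash-b.wav", "cen7-fash-b.wav")
print(same_speaker)
```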
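The manifest lines shown in the last hunk can be generated with a short script; the sketch below assumes the audio paths, durations, and speaker labels are already known (every value is a placeholder). Once `manifest.json` is written, the `extract_speaker_embeddings.py` command from the README runs against it.

```python
# Hypothetical helper that writes manifest.json in the one-JSON-object-per-line
# format expected by extract_speaker_embeddings.py; all values are placeholders.
import json

utterances = [
    {"audio_filepath": "/data/audio_file.wav", "duration": 3.2, "label": "speaker_0001"},
    {"audio_filepath": "/data/other_file.wav", "duration": 5.7, "label": "speaker_0002"},
]

with open("manifest.json", "w") as f:
    for entry in utterances:
        f.write(json.dumps(entry) + "\n")  # one JSON object per line
```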