Commit 3c99844
Parent(s): 5f243ee
Update README.md
README.md CHANGED
@@ -114,7 +114,7 @@ img {
 [](#datasets)
 
 
-This model extracts speaker embeddings from given speech, which
+This model extracts speaker embeddings from given speech, which is the backbone for speaker verification and diarization tasks.
 It is a "large" version of TitaNet (around 23M parameters) models.
 See the [model architecture](#model-architecture) section and [NeMo documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/asr/speaker_recognition/models.html#titanet) for complete architecture details.
 
@@ -141,7 +141,7 @@ speaker_model = nemo_asr.models.EncDecSpeakerLabelModel.from_pretrained("nvidia/
 Using
 
 ```python
-emb = speaker_model.get_embedding("nvidia/an255-fash-b.wav")
+emb = speaker_model.get_embedding("an255-fash-b.wav")
 ```
 
 ### Verifying two utterances (Speaker Verification)
@@ -149,7 +149,7 @@ emb = speaker_model.get_embedding("nvidia/an255-fash-b.wav")
 Now to check if two audio files are from the same speaker or not, simply do:
 
 ```python
-speaker_model.verify_speakers("
+speaker_model.verify_speakers("an255-fash-b.wav","cen7-fash-b.wav")
 ```
 
 ### Extracting Embeddings for more audio files
@@ -161,6 +161,7 @@ Write audio files to a `manifest.json` file with lines as in format:
 ```json
 {"audio_filepath": "<absolute path to dataset>/audio_file.wav", "duration": "duration of file in sec", "label": "speaker_id"}
 ```
+
 Then running following script will extract embeddings and writes to current working directory:
 ```shell
 python <NeMo_root>/examples/speaker_tasks/recognition/extract_speaker_embeddings.py --manifest=manifest.json
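For reference, the calls touched by this commit fit together as in the sketch below. This is a minimal, illustrative sketch only: the exact `from_pretrained` model identifier is truncated in the hunk header above, so a placeholder is used, and the audio paths are the sample files named in the README.

```python
# Illustrative sketch of the README workflow after this commit (not part of the diff).
import nemo.collections.asr as nemo_asr

# Placeholder model identifier; the full "nvidia/..." name is truncated in the hunk header above.
speaker_model = nemo_asr.models.EncDecSpeakerLabelModel.from_pretrained("nvidia/<titanet_model_name>")

# Extract a speaker embedding from one utterance (relative path, per this commit).
emb = speaker_model.get_embedding("an255-fash-b.wav")

# Check whether two utterances come from the same speaker.
same_speaker = speaker_model.verify_speakers("an255-fash-b.wav", "cen7-fash-b.wav")
print(same_speaker)
```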
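The manifest lines shown in the last hunk can be generated with a short script; the sketch below assumes the audio paths, durations, and speaker labels are already known (every value is a placeholder). Once `manifest.json` is written, the `extract_speaker_embeddings.py` command from the README runs against it.

```python
# Hypothetical helper that writes manifest.json in the one-JSON-object-per-line
# format expected by extract_speaker_embeddings.py; all values are placeholders.
import json

utterances = [
    {"audio_filepath": "/data/audio_file.wav", "duration": 3.2, "label": "speaker_0001"},
    {"audio_filepath": "/data/other_file.wav", "duration": 5.7, "label": "speaker_0002"},
]

with open("manifest.json", "w") as f:
    for entry in utterances:
        f.write(json.dumps(entry) + "\n")  # one JSON object per line
```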