mechanicalsea/efficient-tdnn

mechanicalsea committed on Aug 24, 2022

Commit 391404d · 1 Parent(s): 049a0c7

Update README.md

Files changed (1)

README.md +15 -16

@@ -42,18 +42,18 @@ The details of three subnets are:
 ## Compute your speaker embeddings
 ```python
-import torchaudio
+import torch
 from sugar.models import WrappedModel
-wav_file = f"{vox1_root}/id10270/x6uYqmx31kE/00001.wav"
-signal, fs =torchaudio.load(wav_file)
+wav_input_16khz = torch.randn(1,10000).cuda()
 repo_id = "mechanicalsea/efficient-tdnn"
 supernet_filename = "depth/depth.torchparams"
 subnet_filename = "depth/depth.ecapa-tdnn.3.512.512.512.512.5.3.3.3.1536.bn.tar"
-subnet, info = WrappedModel.from_pretrained(
-    repo_id=repo_id, supernet_filename=supernet_filename, subnet_filename=subnet_filename)
-embedding = subnet(signal)
+subnet, info = WrappedModel.from_pretrained(repo_id=repo_id, supernet_filename=supernet_filename, subnet_filename=subnet_filename)
+subnet = subnet.cuda()
+subnet = subnet.eval()
+embedding = subnet(wav_input_16khz)
 ```
 ## Inference on GPU
@@ -112,14 +112,13 @@ More details about EfficentTDNN can be found in the paper [EfficientTDNN](https:
 Please, cite EfficientTDNN if you use it for your research or business.
 ```bibtex
-@article{rwang-efficienttdnn-2021,
-  title={{EfficientTDNN}: Efficient Architecture Search for Speaker Recognition},
-  author={Rui Wang and Zhihua Wei and Haoran Duan and Shouling Ji and Yang Long and Zhen Hong},
-  journal={arXiv preprint arXiv:2103.13581},
-  year={2021},
-  eprint={2103.13581},
-  archivePrefix={arXiv},
-  primaryClass={eess.AS},
-  note={arXiv:2103.13581}
-}
+@article{wr-efficienttdnn-2022,
+  author={Wang, Rui and Wei, Zhihua and Duan, Haoran and Ji, Shouling and Long, Yang and Hong, Zhen},
+  journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
+  title={EfficientTDNN: Efficient Architecture Search for Speaker Recognition},
+  year={2022},
+  volume={30},
+  number={},
+  pages={2267-2279},
+  doi={10.1109/TASLP.2022.3182856}}
 ```
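
The README snippet in this commit stops once `embedding` is computed. As a minimal sketch of one common next step, two such embeddings are often compared with cosine similarity to judge whether two utterances come from the same speaker. The README does not state a scoring method at this point, so cosine scoring, the helper name `cosine_score`, and the toy vectors below are assumptions for illustration only. A real embedding would first be converted to a flat list, for example with `embedding.detach().cpu().flatten().tolist()`.

```python
import math
from typing import Sequence


def cosine_score(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine similarity of two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors standing in for real speaker embeddings (hypothetical data).
enroll = [1.0, 2.0, 3.0]
test_same = [2.0, 4.0, 6.0]     # same direction as enroll, so the score is close to 1
test_other = [-3.0, 0.0, 1.0]   # orthogonal to enroll, so the score is 0

print(cosine_score(enroll, test_same))
print(cosine_score(enroll, test_other))
```

A higher score suggests the same speaker; any accept/reject threshold would have to be tuned on held-out trials and is not specified here.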