| | --- |
| | license: mit |
| | tags: |
| | - audio |
| | - deep-learning |
| | - pytorch |
| | - generative-adversarial-network |
| | - codec |
| | - gans |
| | - compression-algorithm |
| | - audio-compression |
| | - RVQ |
| | --- |
| | |
| |
|
| | # Descript Audio Codec |
| |
|
| | With Descript Audio Codec, you can compress **44.1 kHz audio** into discrete codes at a **low 8 kbps bitrate**. <br> |
| | That's approximately **90x compression** while maintaining exceptional fidelity and minimizing artifacts. <br> |
| | Our universal model works on all domains (speech, environment, music, etc.), making it widely applicable to generative modeling of all audio. <br> |
| | It can be used as a drop-in replacement for EnCodec for all audio language modeling applications (such as AudioLMs, MusicLMs, MusicGen, etc.). <br> |
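
The roughly 90x figure can be sanity-checked with a short calculation. The sketch below assumes the uncompressed reference is 16-bit mono PCM; that bit depth and channel count are assumptions made for illustration and are not stated in this card.

```python
# Rough check of the "approximately 90x" compression claim.
# ASSUMPTION: the uncompressed baseline is 16-bit, single-channel PCM.
SAMPLE_RATE_HZ = 44_100   # 44.1 kHz, from the model card
BIT_DEPTH = 16            # assumed bits per sample
CHANNELS = 1              # assumed mono
CODEC_KBPS = 8.0          # target bitrate, from the model card


def pcm_bitrate_kbps(sample_rate_hz: int, bit_depth: int, channels: int) -> float:
    """Bitrate of uncompressed PCM audio in kilobits per second."""
    return sample_rate_hz * bit_depth * channels / 1000


raw_kbps = pcm_bitrate_kbps(SAMPLE_RATE_HZ, BIT_DEPTH, CHANNELS)
ratio = raw_kbps / CODEC_KBPS

print(f"Uncompressed PCM: {raw_kbps:.1f} kbps")   # 705.6 kbps
print(f"Compression ratio: {ratio:.1f}x")         # 88.2x
```

Under these assumptions the ratio comes out at about 88x, which is consistent with the rounded 90x quoted above.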
| |
|
| | ## Model Details |
| |
|
| | ### Model Description |
| |
|
| | - **License:** MIT |
| |
|
| | ### Model Sources |
| |
|
| | - **Repository:** [GitHub Repo](https://github.com/descriptinc/descript-audio-codec) |
| | - **Paper:** [arXiv Paper: High-Fidelity Audio Compression with Improved RVQGAN](http://arxiv.org/abs/2306.06546) |
| | - **Demo:** [Demo Site](https://descript.notion.site/Descript-Audio-Codec-11389fce0ce2419891d6591a68f814d5) |
| |
|
| | ## Uses |
| |
|
| | The model is intended for compressing audio files containing speech, music and environmental sounds. |
| |
|
| | ### Out-of-Scope Use |
| |
|
| | It is not intended to be used for compressing other file formats such as text, images, etc. |
| |
|
| | ## Bias, Risks, and Limitations |
| | Our model has difficulty reconstructing some challenging audio. It |
| | performs best on speech and has more issues with environmental sounds. It |
| | does not perfectly model some musical instruments, such as glockenspiel and synthesizers. |
| |
|
| |
|
| | ## How to Get Started with the Model |
| | This model is meant to be used with our official repo linked above. We release the model here for redundancy. |
| | Our code can pull the weights from their |
| | [original location on GitHub](https://github.com/descriptinc/descript-audio-codec/releases/download/0.0.1/weights.pth). |
| | Please refer to the official [README](https://github.com/descriptinc/descript-audio-codec#readme) for usage instructions. |
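
If you only need a local copy of the weights file, a fetch-and-cache step might look like the sketch below. It is not the download code used by the official repo: `fetch_weights` and the cache path in the usage comment are hypothetical names chosen for illustration, and only the URL comes from this card.

```python
import urllib.request
from pathlib import Path

# URL of the released weights, as linked in this model card.
WEIGHTS_URL = (
    "https://github.com/descriptinc/descript-audio-codec"
    "/releases/download/0.0.1/weights.pth"
)


def fetch_weights(dest: Path, url: str = WEIGHTS_URL) -> Path:
    """Download the weights to `dest` unless a file already exists there.

    Hypothetical helper for illustration; the official repo ships its own
    download logic, which should be preferred.
    """
    dest = Path(dest)
    if dest.exists():
        # Already cached: skip the network entirely.
        return dest
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Download to a temporary name first so that an interrupted transfer
    # never leaves a truncated file at the final path.
    tmp = dest.with_suffix(dest.suffix + ".part")
    urllib.request.urlretrieve(url, str(tmp))
    tmp.replace(dest)
    return dest


# Example usage (performs a network request on first call):
# path = fetch_weights(Path.home() / ".cache" / "dac_example" / "weights.pth")
```

Loading the downloaded file into a model still requires the code from the official repo, so follow its README for that step.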
| |
|
| | ## Citation |
| |
|
| | **BibTeX:** |
| |
|
| | ``` |
| | @misc{kumar2023highfidelity, |
| | title={High-Fidelity Audio Compression with Improved RVQGAN}, |
| | author={Rithesh Kumar and Prem Seetharaman and Alejandro Luebs and Ishaan Kumar and Kundan Kumar}, |
| | year={2023}, |
| | eprint={2306.06546}, |
| | archivePrefix={arXiv}, |
| | primaryClass={cs.SD} |
| | } |
| | ``` |