openbmb/VoxCPM-0.5B

@@ -222,16 +222,6 @@ VoxCPM achieves competitive results on public zero-shot TTS benchmarks:
 | **VoxCPM** | **3.40** | **4.04** | 12.9 | 66.1 | 3.59 | **7.89** | 64.3 | 3.74 |
 ## ⚠️ Risks and limitations
 - General Model Behavior: While VoxCPM has been trained on a large-scale dataset, it may still produce outputs that are unexpected, biased, or contain artifacts.
 - Potential for Misuse of Voice Cloning: VoxCPM's powerful zero-shot voice cloning capability can generate highly realistic synthetic speech. This technology could be misused for creating convincing deepfakes for purposes of impersonation, fraud, or spreading disinformation. Users of this model must not use it to create content that infringes upon the rights of individuals. It is strictly forbidden to use VoxCPM for any illegal or unethical purposes. We strongly recommend that any publicly shared content generated with this model be clearly marked as AI-generated.
@@ -242,37 +232,6 @@ VoxCPM achieves competitive results on public zero-shot TTS benchmarks:
 ## 📄 License
-The VoxCPM model weights and code are open-sourced under the [Apache-2.0](LICENSE) license.
-## 🙏 Acknowledgments
-We extend our sincere gratitude to the following works and resources for their inspiration and contributions:
-- [DiTAR](https://arxiv.org/abs/2502.03930) for the diffusion autoregressive backbone used in speech generation
-- [MiniCPM-4](https://github.com/OpenBMB/MiniCPM) for serving as the language model foundation
-- [CosyVoice](https://github.com/FunAudioLLM/CosyVoice) for the implementation of Flow Matching-based LocDiT
-- [DAC](https://github.com/descriptinc/descript-audio-codec) for providing the Audio VAE backbone
-## Institutions
-This project is developed by the following institutions:
-- <img src="assets/modelbest_logo.png" width="28px"> [ModelBest](https://modelbest.cn/)
-- <img src="assets/thuhcsi_logo.png" width="28px"> [THUHCSI](https://github.com/thuhcsi)
-## 📚 Citation
-If you find our model helpful, please consider citing our project 📝 and starring us ⭐️!
-```bib
-@misc{voxcpm2025,
-  author       = {Yixuan Zhou and Guoyang Zeng and Xin Liu and Xiang Li and Renjie Yu and Ziyang Wang and Runchuan Ye and Weiyue Sun and Jiancheng Gui and Kehan Li and Zhiyong Wu and Zhiyuan Liu},
-  title        = {{VoxCPM}},
-  year         = {2025},
-  howpublished = {\url{https://github.com/OpenBMB/VoxCPM}},
-  note         = {GitHub repository}
-}
-```

+The VoxCPM model weights and code are open-sourced under the Apache-2.0 license.