---
license: mit
language:
- en
- zh
base_model:
- Wan-AI/Wan2.2-S2V-14B
pipeline_tag: any-to-any
---

# RealVideo

RealVideo is a WebSocket-based video calling system that accepts text input. It uses the **GLM-4.5-AirX** and **GLM-TTS** models to generate spoken audio responses, and autoregressive diffusion to generate the corresponding video frames. The system is modular and fully functional, with a clean code structure.

Read more in our [blog post](https://z.ai/blog/realvideo).

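The flow described above (text, then a spoken reply, then matching video) can be pictured as a streaming pipeline. The following is a minimal sketch and not RealVideo's implementation: `generate_reply`, `synthesize_speech`, and `generate_frames` are hypothetical stubs standing in for GLM-4.5-AirX, GLM-TTS, and the diffusion video model, and splitting the reply into sentences is an assumption. It shows stages handing off partial results through a queue, and each new chunk of frames being generated with the previously produced frames as context, which is the autoregressive part.

```python
import asyncio

# Hypothetical stand-ins for the real models; none of these names come from RealVideo.
async def generate_reply(text: str) -> list[str]:
    """Stub for the LLM (GLM-4.5-AirX): returns the reply split into sentences."""
    return [f"You said: {text}.", "How can I help?"]

async def synthesize_speech(sentence: str) -> bytes:
    """Stub for the TTS model (GLM-TTS): fake 16-bit PCM, 2 bytes per character."""
    return b"\x00\x00" * len(sentence)

def generate_frames(audio: bytes, prev_frames: list[str], frames_per_chunk: int = 4) -> list[str]:
    """Stub for the video model. A real model would condition on the audio and on the
    frames generated so far; this stub only uses the history length to number frames."""
    start = len(prev_frames)
    return [f"frame{start + i}" for i in range(frames_per_chunk)]

async def run_turn(text: str) -> list[tuple[bytes, list[str]]]:
    """Run one conversational turn, pairing each audio chunk with its video frames."""
    audio_q: asyncio.Queue = asyncio.Queue()
    out: list[tuple[bytes, list[str]]] = []

    async def producer() -> None:
        for sentence in await generate_reply(text):
            await audio_q.put(await synthesize_speech(sentence))
        await audio_q.put(None)  # end-of-turn sentinel

    async def consumer() -> None:
        frames: list[str] = []  # history the next chunk is conditioned on
        while (audio := await audio_q.get()) is not None:
            chunk = generate_frames(audio, frames)
            frames.extend(chunk)
            out.append((audio, chunk))

    await asyncio.gather(producer(), consumer())
    return out

if __name__ == "__main__":
    for audio, frames in asyncio.run(run_turn("hi")):
        print(len(audio), frames)
```

Streaming sentence by sentence lets video generation for the first sentence start before speech synthesis for the rest of the reply has finished; a real system would likely also stream tokens out of the LLM.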
## Features

- **Text Input**: Supports text message input.
- **AI Voice Response**: Integrates the GLM-4.5-AirX and GLM-TTS models to generate voice responses.
- **Lip Sync**: Generates real-time conversational video from any input image and audio.
- **Real-time Communication**: WebSocket-based, real-time bidirectional communication.

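To illustrate the bidirectional WebSocket traffic, the sketch below defines a hypothetical JSON message schema: the client sends `text` messages, and the server streams back `media` messages that pair an audio chunk with its video frames. The message types, the field names (`type`, `content`, `seq`, `audio`, `frames`), and the base64 encoding are all assumptions made for this example; the actual protocol is defined in the RealVideo GitHub repository.

```python
import base64
import json

# Hypothetical message schema for illustration only; not RealVideo's actual protocol.
def encode_text_message(content: str) -> str:
    """Client -> server: one user text message."""
    return json.dumps({"type": "text", "content": content})

def encode_media_message(seq: int, audio: bytes, frames: list[bytes]) -> str:
    """Server -> client: one audio chunk plus its matching video frames.
    Binary payloads are base64-encoded so they fit in a JSON text frame."""
    return json.dumps({
        "type": "media",
        "seq": seq,  # lets the client play chunks back in order
        "audio": base64.b64encode(audio).decode("ascii"),
        "frames": [base64.b64encode(f).decode("ascii") for f in frames],
    })

def decode_message(raw: str) -> dict:
    """Parse either message type, restoring binary payloads; reject unknown types."""
    msg = json.loads(raw)
    if msg.get("type") == "media":
        msg["audio"] = base64.b64decode(msg["audio"])
        msg["frames"] = [base64.b64decode(f) for f in msg["frames"]]
    elif msg.get("type") != "text":
        raise ValueError(f"unknown message type: {msg.get('type')!r}")
    return msg
```

These helpers only build and parse strings; any WebSocket library can carry them as text frames. Base64 inflates binary data by about a third, so sending audio and frames as binary WebSocket frames is a reasonable alternative when bandwidth matters.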
## Quick Start

For installation and usage instructions, see our [GitHub repository](https://github.com/zai-org/RealVideo).

## Technical Highlights

- **Model Integration**: Enables quick, convenient voice cloning and turns text input into audio output.
- **Modular Design**: Clear code structure that is easy to maintain and extend.
- **Real-time Performance**: Optimized audio processing and real-time video generation algorithms.

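To make "real-time" concrete: audio has to be cut into windows that line up with the video frames each generation step produces, and each step has to finish faster than the video it covers takes to play. The sketch below assumes 16 kHz audio, 25 fps video, and 4 frames per autoregressive step; these numbers and the `StreamConfig`, `split_audio`, and `is_real_time` names are illustrative assumptions, not RealVideo's settings or API. With them, each frame corresponds to 16000 / 25 = 640 audio samples, and a 4-frame chunk covers 160 ms of playback, which is the time budget for generating it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StreamConfig:
    sample_rate: int = 16_000   # assumed audio sample rate (Hz)
    fps: int = 25               # assumed video frame rate
    frames_per_chunk: int = 4   # assumed frames generated per autoregressive step

    @property
    def samples_per_frame(self) -> int:
        if self.sample_rate % self.fps:
            raise ValueError("sample_rate must be divisible by fps")
        return self.sample_rate // self.fps

    @property
    def chunk_budget_s(self) -> float:
        """Playback time one chunk covers; generating it must take less than this."""
        return self.frames_per_chunk / self.fps

def split_audio(samples: list[int], cfg: StreamConfig) -> list[list[int]]:
    """Cut PCM samples into windows that each drive one chunk of video frames.
    The last window is zero-padded so every chunk gets a full set of frames."""
    window = cfg.samples_per_frame * cfg.frames_per_chunk
    chunks = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]   # slicing copies, input is untouched
        chunk += [0] * (window - len(chunk))    # pad only the final, short window
        chunks.append(chunk)
    return chunks

def is_real_time(gen_seconds: float, cfg: StreamConfig) -> bool:
    """True if one generation step finishes before its chunk would finish playing."""
    return gen_seconds < cfg.chunk_budget_s
```

For example, with these settings 6,000 samples (0.375 s of audio) split into three 2,560-sample windows, the last one zero-padded, and a step that takes 0.1 s fits inside the 0.16 s budget.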
## Acknowledgements

This project uses the following open-source libraries:

- [Self-Forcing](https://github.com/guandeh17/Self-Forcing)