---
title: Speaker Diarization
emoji: 🔥
colorFrom: blue
colorTo: blue
sdk: docker
pinned: false
license: mit
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
# Real-Time Speaker Diarization

This project implements real-time speaker diarization using WebRTC, FastAPI, and Gradio. It transcribes speech and identifies different speakers as they talk.

## Architecture

The system is split into two components:

1. **Model Server (Hugging Face Space)**: Runs the speech recognition and speaker diarization models
2. **Signaling Server (Render)**: Handles WebRTC signaling for direct audio streaming from the browser
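Conceptually, the signaling server forwards fixed-size PCM frames from the browser's audio stream to the model server's WebSocket. A minimal chunking helper illustrates the idea (hypothetical; the actual framing logic lives in `backend.py`, and the frame size used there may differ):

```python
def frame_audio(pcm: bytes, frame_bytes: int = 3200) -> list[bytes]:
    """Split raw PCM audio into fixed-size frames for streaming.

    3200 bytes = 100 ms of 16 kHz mono 16-bit audio, a common chunk
    size for streaming ASR (an assumption, not the project's exact value).
    """
    return [pcm[i:i + frame_bytes] for i in range(0, len(pcm), frame_bytes)]

# One second of 16 kHz 16-bit mono audio (32,000 bytes) → ten 100 ms frames.
frames = frame_audio(b"\x00" * 32000)
print(len(frames))  # → 10
```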
## Deployment Instructions

### Deploy the Model Server on Hugging Face Spaces

1. Create a new Space on Hugging Face (Docker SDK)
2. Upload all files from the `Speaker-Diarization` directory
3. In the Space settings:
   - Set Hardware to CPU (or GPU if available)
   - Set visibility to Public
   - Make sure the Docker SDK is selected

### Deploy the Signaling Server on Render

1. Create a new Render Web Service
2. Connect it to the GitHub repo containing the `render-signal` directory
3. Configure the service:
   - Build Command: `cd render-signal && pip install -r requirements.txt`
   - Start Command: `cd render-signal && python backend.py`
   - Environment: Python 3
   - Environment Variables:
     - `HF_SPACE_URL`: your Hugging Face Space URL (e.g., `your-username-speaker-diarization.hf.space`)
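As a sketch of how the signaling server might pick up this variable (assuming `backend.py` reads it via `os.environ`; the fallback value and exact derivation are placeholders, not the repo's code):

```python
import os

# Read the model-server host set in the Render dashboard. The fallback
# here is a hypothetical placeholder, not a live deployment.
HF_SPACE_URL = os.environ.get(
    "HF_SPACE_URL", "your-username-speaker-diarization.hf.space"
)

# Derive the WebSocket endpoint the relay connects to (the /ws_inference
# path is the one referenced later in this README).
API_WS = f"wss://{HF_SPACE_URL}/ws_inference"
```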
### Update the Configuration

After both services are deployed:

1. Update `ui.py` on your Hugging Face Space:
   - Change `RENDER_SIGNALING_URL` to your Render app URL (`wss://your-app.onrender.com/stream`)
   - Make sure `HF_SPACE_URL` matches your actual Hugging Face Space URL
2. Update `backend.py` on your Render service:
   - Set `API_WS` to your Hugging Face Space WebSocket URL (`wss://your-username-speaker-diarization.hf.space/ws_inference`)
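A quick sanity check for these two settings can catch typos before redeploying (a hypothetical helper, not part of the repo; the URLs below are the placeholder values from the steps above):

```python
from urllib.parse import urlparse

def check_ws_url(url: str, expected_path: str) -> bool:
    """Return True if url is a secure WebSocket URL with the expected path."""
    parts = urlparse(url)
    return parts.scheme == "wss" and parts.path == expected_path

RENDER_SIGNALING_URL = "wss://your-app.onrender.com/stream"
API_WS = "wss://your-username-speaker-diarization.hf.space/ws_inference"

assert check_ws_url(RENDER_SIGNALING_URL, "/stream")
assert check_ws_url(API_WS, "/ws_inference")
```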
## Usage

1. Open your Hugging Face Space URL in a web browser
2. Click "Start Listening" to begin
3. Speak into your microphone
4. The system transcribes your speech and labels each speaker in real time
## Technology Stack

- **Frontend**: Gradio UI with WebRTC for audio streaming
- **Signaling**: FastRTC on Render for WebRTC signaling
- **Backend**: FastAPI + WebSockets
- **Models**:
  - SpeechBrain ECAPA-TDNN for speaker embeddings
  - An automatic speech recognition (ASR) model for transcription
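The diarization step boils down to comparing each audio segment's speaker embedding against running speaker centroids: a segment close to a known centroid keeps that speaker label, and one below a similarity threshold starts a new speaker. A simplified sketch of that assignment logic (pure Python, with toy 2-D vectors standing in for real ECAPA-TDNN embeddings; the threshold value is illustrative):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def assign_speaker(embedding: list[float],
                   centroids: list[list[float]],
                   threshold: float = 0.7) -> int:
    """Return the index of the closest known speaker, or register a new one."""
    best, best_sim = None, threshold
    for i, c in enumerate(centroids):
        sim = cosine(embedding, c)
        if sim >= best_sim:
            best, best_sim = i, sim
    if best is None:                      # no centroid was similar enough
        centroids.append(list(embedding)) # start a new speaker
        return len(centroids) - 1
    return best

centroids: list[list[float]] = []
print(assign_speaker([1.0, 0.0], centroids))  # → 0 (first speaker seen)
print(assign_speaker([0.9, 0.1], centroids))  # → 0 (similar voice)
print(assign_speaker([0.0, 1.0], centroids))  # → 1 (new speaker)
```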
## License

MIT