Pyannote
Run Pyannote, optimized for the NPU on Qualcomm Snapdragon devices, with NexaSDK.
Quickstart
Install NexaSDK and create a free account at sdk.nexa.ai
Activate your device with your access token:
nexa config set license '<access_token>'
Run the model on Qualcomm NPU in one line:
nexa infer NexaAI/Pyannote-NPU
- Input: Enter the path to the input audio file.
- Output: Returns speaker diarization results, or reports an error if a required input cannot be found.
Model Description
pyannote-audio (Community Version) is an open-source speaker diarization model designed for accurate speaker segmentation and labeling in audio streams.
Developed by the Pyannote community, it combines audio processing, speaker embedding, and clustering into a unified framework, enabling robust speech segmentation on local machines without cloud dependency.
Features
- 🔊 End-to-End Diarization Pipeline — Automatically detects and labels who spoke when in an audio file.
- ⚡ Lightweight & Efficient — Optimized for real-time or batch processing on consumer hardware and GPUs.
- 🧠 Speaker Embedding & Clustering — Extracts rich speaker representations and groups them for identity separation.
- 🔧 Customizable & Modular — Easily integrates with PyTorch pipelines or modified components for research and prototyping.
- 🌍 Community-Driven & Transparent — Fully open and maintained by an active community of speech researchers and developers.
Use Cases
- Meeting Transcription: Segment conversations by speaker for clearer transcripts.
- Broadcast and Podcast Analysis: Attribute voices and structure long-form audio content.
- Call Center Analytics: Separate agent and customer segments for interaction insights.
- Research: Test diarization algorithms or contribute new speaker models.
- Voice Dataset Preparation: Preprocess large audio datasets for training ASR or emotion recognition systems.
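For the meeting transcription use case, one common way to combine diarization with a transcript is to give each timestamped word the speaker whose turn overlaps it the most. The sketch below illustrates that idea only; the tuple layouts, the `assign_speakers` helper, and the `UNKNOWN` label are hypothetical and do not reflect the NexaSDK output format or any ASR tool's API.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words, turns):
    """Label each word with the speaker whose turn overlaps it the most.

    words: list of (text, start, end) tuples (hypothetical ASR output)
    turns: list of (start, end, speaker) tuples (hypothetical diarization output)
    """
    labeled = []
    for text, w_start, w_end in words:
        best_speaker, best_overlap = "UNKNOWN", 0.0
        for t_start, t_end, speaker in turns:
            shared = overlap(w_start, w_end, t_start, t_end)
            if shared > best_overlap:
                best_speaker, best_overlap = speaker, shared
        labeled.append((text, best_speaker))
    return labeled


# Made-up example data, in seconds.
turns = [(0.0, 2.0, "SPEAKER_00"), (2.0, 5.0, "SPEAKER_01")]
words = [
    ("hello", 0.25, 0.75),
    ("there", 1.5, 2.25),   # straddles the turn boundary
    ("hi", 2.5, 3.0),
    ("noise", 6.0, 6.5),    # falls outside every turn
]
labeled = assign_speakers(words, turns)
```

A word that straddles a turn boundary goes to the speaker with the larger share of its duration, and a word outside every turn stays `UNKNOWN` rather than being forced onto a speaker.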
Inputs and Outputs
Input
- Audio file or stream
Output
- Speaker-labeled time segments
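As a rough illustration of what speaker-labeled time segments can look like once loaded into a program, the sketch below models each segment as a start time, an end time, and a speaker label, then totals speaking time per speaker. The `Segment` class, its field names, and the `SPEAKER_00`-style labels are assumptions made for this example; the exact output format of the NexaSDK command is not specified here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One speaker turn (hypothetical structure, not the NexaSDK format)."""
    start: float   # seconds from the beginning of the audio
    end: float     # seconds from the beginning of the audio
    speaker: str   # assumed label style, e.g. "SPEAKER_00"


def speaking_time(segments):
    """Return the total number of seconds spoken per speaker label."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg.speaker] += seg.end - seg.start
    return dict(totals)


# Made-up example data.
segments = [
    Segment(0.0, 4.5, "SPEAKER_00"),
    Segment(4.5, 7.0, "SPEAKER_01"),
    Segment(7.0, 9.5, "SPEAKER_00"),
]
totals = speaking_time(segments)
```

Keeping segments as plain start/end/label records makes downstream steps such as per-speaker statistics or transcript alignment straightforward.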
License
This repo is licensed under the Creative Commons Attribution–NonCommercial 4.0 (CC BY-NC 4.0) license, which allows use, sharing, and modification only for non-commercial purposes with proper attribution.
All NPU-related models, runtimes, and code in this project are protected under this non-commercial license and cannot be used in any commercial or revenue-generating applications.
Commercial licensing or enterprise usage requires a separate agreement.
For inquiries, please contact dev@nexa.ai.