
Pyannote

Run Pyannote on the NPU of Qualcomm Snapdragon devices with NexaSDK.

Quickstart

  1. Install NexaSDK and create a free account at sdk.nexa.ai

  2. Activate your device with your access token:

    nexa config set license '<access_token>'
    
  3. Run the model on Qualcomm NPU in one line:

    nexa infer NexaAI/Pyannote-NPU
    
  • Input: the path to an input audio file.
  • Output: speaker diarization results, or an error if a required input cannot be found.

Model Description

pyannote-audio (Community Version) is an open-source speaker diarization model designed for accurate speaker segmentation and labeling in audio streams.
Developed by the Pyannote community, it combines audio processing, speaker embedding, and clustering into a unified framework, enabling robust speaker segmentation on local machines without depending on cloud services.
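To make the embedding-and-clustering idea concrete, here is a deliberately simplified sketch. It is not pyannote's actual clustering algorithm, which is more involved. It assumes speaker embeddings have already been extracted as plain lists of floats, one per speech segment. The function names and the 0.7 similarity threshold are hypothetical choices for illustration. Each embedding joins the most similar existing cluster if its cosine similarity to that cluster's running-mean centroid clears the threshold, and otherwise starts a new speaker.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def cluster_embeddings(embeddings, threshold=0.7):
    """Hypothetical greedy clustering: return one integer speaker label per embedding.

    Each embedding is assigned to the cluster whose centroid is most similar,
    provided the similarity is >= threshold; otherwise it opens a new cluster.
    """
    centroids = []  # running mean embedding for each cluster
    counts = []     # number of embeddings assigned to each cluster
    labels = []
    for emb in embeddings:
        best, best_sim = None, threshold
        for k, centroid in enumerate(centroids):
            sim = cosine_similarity(emb, centroid)
            if sim >= best_sim:
                best, best_sim = k, sim
        if best is None:
            # No sufficiently similar speaker yet: start a new cluster.
            centroids.append(list(emb))
            counts.append(1)
            labels.append(len(centroids) - 1)
        else:
            # Update the chosen cluster's centroid as a running mean.
            n = counts[best]
            centroids[best] = [(c * n + e) / (n + 1) for c, e in zip(centroids[best], emb)]
            counts[best] = n + 1
            labels.append(best)
    return labels
```

For example, `cluster_embeddings([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.95]])` groups the first two and the last two embeddings into two speakers. A greedy single pass like this depends on input order. Offline diarization systems typically cluster all embeddings jointly instead.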

Features

  • 🔊 End-to-End Diarization Pipeline: Automatically detects and labels who spoke when in an audio file.
  • Lightweight & Efficient: Optimized for real-time or batch processing on consumer hardware and GPUs.
  • 🧠 Speaker Embedding & Clustering: Extracts rich speaker representations and groups them for identity separation.
  • 🔧 Customizable & Modular: Integrates easily with PyTorch pipelines or custom components for research and prototyping.
  • 🌍 Community-Driven & Transparent: Fully open and maintained by an active community of speech researchers and developers.

Use Cases

  • Meeting Transcription: Segment conversations by speaker for clearer transcripts.
  • Broadcast and Podcast Analysis: Attribute voices and structure long-form audio content.
  • Call Center Analytics: Separate agent and customer segments for interaction insights.
  • Research: Test diarization algorithms or contribute new speaker models.
  • Voice Dataset Preparation: Preprocess large audio datasets for training ASR or emotion recognition systems.

Inputs and Outputs

Input

  • Audio file or stream

Output

  • Speaker-labeled time segments
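Speaker-labeled time segments are easy to post-process once they are in a simple structure. The sketch below assumes each segment is a `(start, end, speaker)` triple in seconds. The `Segment` class, the `SPEAKER_00`-style labels, and both helper functions are hypothetical, and the actual NexaSDK output format may differ. The helpers total the speaking time per speaker and merge consecutive segments from the same speaker that are separated by a short pause.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Segment:
    start: float   # segment start time in seconds
    end: float     # segment end time in seconds
    speaker: str   # hypothetical speaker label, e.g. "SPEAKER_00"


def speaking_time(segments):
    """Total speaking time in seconds for each speaker label."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg.speaker] += seg.end - seg.start
    return dict(totals)


def merge_adjacent(segments, gap=0.5):
    """Merge consecutive same-speaker segments separated by at most `gap` seconds."""
    merged = []
    for seg in sorted(segments, key=lambda s: s.start):
        prev = merged[-1] if merged else None
        if prev and prev.speaker == seg.speaker and seg.start - prev.end <= gap:
            # Extend the previous segment instead of adding a new one.
            merged[-1] = Segment(prev.start, max(prev.end, seg.end), seg.speaker)
        else:
            merged.append(seg)
    return merged


segments = [
    Segment(0.0, 2.5, "SPEAKER_00"),
    Segment(2.8, 4.0, "SPEAKER_00"),
    Segment(4.2, 7.0, "SPEAKER_01"),
]
print(speaking_time(segments))
print(merge_adjacent(segments))
```

Merging short same-speaker gaps is a common clean-up step before aligning diarization output with a transcript, as in the meeting-transcription use case above.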

License

This repo is licensed under the Creative Commons Attribution–NonCommercial 4.0 (CC BY-NC 4.0) license, which allows use, sharing, and modification only for non-commercial purposes with proper attribution.
All NPU-related models, runtimes, and code in this project are protected under this non-commercial license and cannot be used in any commercial or revenue-generating applications.
Commercial licensing or enterprise usage requires a separate agreement.
For inquiries, please contact dev@nexa.ai.
