---
title: GigaAMv3 Preview
emoji: 🔥
colorFrom: red
colorTo: green
sdk: gradio
sdk_version: 6.0.0
app_file: app.py
pinned: false
license: mit
short_description: Interactive Gradio Space demonstrating ai-sage/GigaAM-v3 ASR
hf_oauth: true
hf_oauth_scopes:
  - read-repos
---
# GigaAM-v3 Gradio demo

This Space demonstrates the `ai-sage/GigaAM-v3` Russian ASR models, built on top of a Conformer encoder and a HuBERT-CTC objective. The demo lets you:
- upload or record audio (WAV/MP3/FLAC) directly in the browser,
- choose between the `ctc`, `rnnt`, `e2e_ctc`, and `e2e_rnnt` checkpoints,
- switch between a fast single-pass mode and a segmented long-form mode that returns timestamps.
The end-to-end variants (`e2e_*`) produce punctuated, normalized text, while the classic CTC/RNN-T checkpoints return raw transcriptions with lower latency. Long-form mode uses `model.transcribe_longform` and requires a Hugging Face token with access to `pyannote/segmentation-3.0`.
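Under the hood the app drives the GigaAM Python package. A minimal sketch of a single short-form pass, assuming the `load_model`/`transcribe` API shown in the upstream GigaAM README (the exact v3 checkpoint identifiers and the `transcribe_once` helper name are assumptions, not verified against this Space's `app.py`):

```python
def transcribe_once(gigaam, variant: str, audio_path: str) -> str:
    """Load one checkpoint ('ctc', 'rnnt', 'e2e_ctc', or 'e2e_rnnt')
    and run a single short-form pass over one clip.

    e2e_* variants return punctuated, normalized text; the classic
    ctc/rnnt checkpoints return raw lowercase transcriptions.
    """
    model = gigaam.load_model(variant)  # downloads weights on first call
    return model.transcribe(audio_path)
```

The `gigaam` module is passed in as an argument here only to keep the helper easy to test; in `app.py` it would simply be the imported `gigaam` package.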
## Short-form limits & audio pre-processing
- `model.transcribe` in GigaAM supports clips roughly up to 25 seconds, despite the UI limit of 150 seconds.
- All incoming audio (upload and microphone) is automatically converted to mono PCM at 16 kHz before inference, matching the recommendation from the official repo.
- If a clip exceeds the short-form limit, the app transparently switches to segmented mode (requires an auth token) instead of failing with "Too long wav file".
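The fallback described above can be sketched as a small stdlib-only dispatcher. `wav_duration_s` and `pick_mode` are hypothetical helper names (not taken from `app.py`), and the 25-second limit mirrors the rough figure noted above:

```python
import wave

SHORT_FORM_LIMIT_S = 25.0  # rough model.transcribe limit noted above

def wav_duration_s(path: str) -> float:
    """Duration of a PCM WAV file, via the stdlib wave module."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()

def pick_mode(duration_s: float, has_token: bool) -> str:
    """Mirror the app's fallback: short-form when the clip fits,
    otherwise segmented long-form (which needs an HF token for the
    pyannote segmentation model)."""
    if duration_s <= SHORT_FORM_LIMIT_S:
        return "transcribe"
    if has_token:
        return "transcribe_longform"
    raise ValueError("clip too long for short-form and no HF token configured")
```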
## Requirements
- Python 3.10
- PyTorch / torchaudio 2.8.0
- `transformers==4.57.1`, `gradio==6.0.0` (see `requirements.txt` for the full list)
- Optional: set `HF_TOKEN` (or `HUGGINGFACEHUB_API_TOKEN`) if you want to use segmented mode or access private weights.
## Running locally
```shell
python -m venv .venv
source .venv/bin/activate  # or .venv\Scripts\activate on Windows
pip install -r requirements.txt

# optional – needed for long-form segmentation
export HF_TOKEN=<your_hf_token>

python app.py
```
Open the printed URL (default `http://127.0.0.1:7860`) and start transcribing.
## Authentication & user tokens
This Space enables the Hugging Face OAuth flow (see Spaces OAuth docs). When you click the "Sign in with Hugging Face" button in the UI:
- The returned access token is stored only in your session and used to access `pyannote/segmentation-3.0` for long-form transcription.
- You can sign out at any time, or rely on the Space-level `HF_TOKEN` secret if provided by the maintainer.
- Without a token you can still run the short-form mode (< 25 s), but segmented transcription is disabled.
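The precedence between a signed-in user's token and the Space secret can be sketched as follows. `session_token` is a hypothetical helper name; the `.token` attribute matches what Gradio's `gr.OAuthToken` object exposes to event handlers:

```python
import os

def session_token(oauth_token, env=os.environ):
    """Prefer the signed-in user's OAuth token (a gr.OAuthToken-like
    object with a .token attribute, or None when signed out); fall
    back to the Space-level HF_TOKEN secret. Returning None disables
    only the segmented long-form mode, not short-form transcription."""
    user_token = getattr(oauth_token, "token", None)
    return user_token or env.get("HF_TOKEN")
```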
## Deploying to Hugging Face Spaces
- Keep the YAML front matter above so Spaces can infer the runtime.
- Upload `app.py`, `requirements.txt`, and `runtime.txt`.
- Configure an `HF_TOKEN` secret in Settings → Variables if you want segmented mode to work for everyone.
- Assign `CPU Upgrade` or GPU hardware for heavy, long-form workloads.
- (Optional) Leave `hf_oauth: true` in the metadata to enable the built-in "Sign in with HF" button powered by OAuth/OpenID Connect.
For more options (custom hardware, scaling, telemetry), review the Spaces configuration reference.