---
title: GigaAMv3 Preview
emoji: 🔥
colorFrom: red
colorTo: green
sdk: gradio
sdk_version: 6.0.0
app_file: app.py
pinned: false
license: mit
short_description: Interactive Gradio Space demonstrating ai-sage/GigaAM-v3 ASR
hf_oauth: true
hf_oauth_scopes:
  - read-repos
---

# GigaAM-v3 Gradio demo

This Space demonstrates the [`ai-sage/GigaAM-v3`](https://huggingface.co/ai-sage/GigaAM-v3) Russian ASR models, built on a Conformer encoder and a HuBERT-CTC objective. The demo lets you:

- upload or record audio (WAV/MP3/FLAC) directly in the browser,
- choose between the `ctc`, `rnnt`, `e2e_ctc`, and `e2e_rnnt` checkpoints,
- switch between a fast single-pass mode and a segmented long-form mode that returns timestamps.

The end-to-end variants (`e2e_*`) produce punctuated, normalized text, while the classic CTC/RNN-T checkpoints return raw transcriptions with lower latency. Long-form mode uses `model.transcribe_longform` and requires a Hugging Face token with access to [`pyannote/segmentation-3.0`](https://huggingface.co/pyannote/segmentation-3.0).

**Short-form limits & audio pre-processing**

- `model.transcribe` in GigaAM supports clips roughly up to **25 seconds**, despite the UI limit of 150 seconds.
- All incoming audio (uploads and microphone recordings) is automatically converted to mono PCM at 16 kHz before inference, matching the recommendation in the [official repo](https://github.com/salute-developers/GigaAM/).
- If a clip exceeds the short-form limit, the app transparently switches to segmented mode (which requires an auth token) instead of failing with "Too long wav file".

Both the audio conversion and the automatic mode switch are sketched under "Implementation sketches" at the end of this README.

## Requirements

- Python 3.10
- PyTorch / torchaudio 2.8.0
- `transformers==4.57.1`
- `gradio==6.0.0` (see `requirements.txt` for the full list)
- Optional: set `HF_TOKEN` (or `HUGGINGFACEHUB_API_TOKEN`) if you want to use the segmented mode or access private weights.

## Running locally

```bash
python -m venv .venv
source .venv/bin/activate  # or .venv\Scripts\activate on Windows
pip install -r requirements.txt
# optional – needed for long-form segmentation
export HF_TOKEN=<your token>
python app.py
```

Open the printed URL (default `http://127.0.0.1:7860`) and start transcribing.

## Authentication & user tokens

This Space enables the Hugging Face OAuth flow (see the [Spaces OAuth docs](https://huggingface.co/docs/hub/spaces-oauth)). When you click the "Sign in with Hugging Face" button in the UI:

- The returned access token is stored only in your session and used to access `pyannote/segmentation-3.0` for long-form transcription.
- You can sign out at any time, or rely on the Space-level `HF_TOKEN` secret if the maintainer has provided one.
- Without a token you can still run short-form mode (< 25 s), but segmented transcription is disabled.

A sketch of this token handling also appears at the end of this README.

## Deploying to Hugging Face Spaces

- Keep the YAML front matter above so Spaces can infer the runtime.
- Upload `app.py`, `requirements.txt`, and `runtime.txt`.
- Configure an `HF_TOKEN` secret in **Settings → Variables** if you want segmented mode to work for everyone.
- Assign `CPU Upgrade` or GPU hardware for heavy, long-form workloads.
- (Optional) Leave `hf_oauth: true` in the metadata to enable the built-in "Sign in with HF" button powered by OAuth/OpenID Connect.

For more options (custom hardware, scaling, telemetry), review the [Spaces configuration reference](https://huggingface.co/docs/hub/spaces-config-reference).
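
## Implementation sketches

The snippets below are illustrative sketches of the behaviour described above, not the actual `app.py` source; helper names such as `prepare_audio` are hypothetical.

The mono/16 kHz conversion applied to every upload and recording can be done with torchaudio before any checkpoint sees the audio:

```python
# Sketch of the mono/16 kHz conversion described above.
# `prepare_audio` is a hypothetical helper, not the real app.py code.
import torch
import torchaudio

TARGET_SR = 16_000  # GigaAM expects 16 kHz mono PCM

def prepare_audio(path: str) -> torch.Tensor:
    waveform, sr = torchaudio.load(path)  # (channels, samples)
    if waveform.size(0) > 1:              # downmix stereo to mono
        waveform = waveform.mean(dim=0, keepdim=True)
    if sr != TARGET_SR:                   # resample to the target rate
        waveform = torchaudio.functional.resample(waveform, sr, TARGET_SR)
    return waveform
```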
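
The automatic switch between single-pass and segmented transcription could look like the following, assuming the `gigaam` package from the official repo with its `load_model`, `transcribe`, and `transcribe_longform` entry points; the checkpoint names mirror this Space's dropdown and may differ between releases:

```python
# Sketch of the short-form/long-form dispatch; not the actual app.py logic.
import os
import gigaam  # package from the official GigaAM repo

SHORT_FORM_LIMIT_S = 25  # approximate model.transcribe limit noted above

model = gigaam.load_model("e2e_rnnt")  # or "ctc", "rnnt", "e2e_ctc"

def transcribe(path: str, duration_s: float) -> str:
    if duration_s <= SHORT_FORM_LIMIT_S:
        return model.transcribe(path)  # fast single pass
    # Segmented mode: pyannote/segmentation-3.0 is gated, so a token is required.
    if not os.getenv("HF_TOKEN"):
        raise RuntimeError("Set HF_TOKEN to enable long-form transcription")
    segments = model.transcribe_longform(path)
    # Each segment is expected to carry its text and time boundaries.
    return "\n".join(seg["transcription"] for seg in segments)
```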
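
Finally, a minimal sketch of how a signed-in user's OAuth token can reach the inference function in a Gradio app with `hf_oauth: true`. Gradio injects `gr.OAuthToken` arguments automatically (they are not listed in `inputs`); the handler here is illustrative:

```python
# Sketch of per-session token handling with Gradio's built-in OAuth support.
import os
import gradio as gr

def run_asr(audio_path: str, oauth_token: gr.OAuthToken | None = None) -> str:
    # Prefer the signed-in user's token; fall back to the Space secret.
    token = oauth_token.token if oauth_token else os.getenv("HF_TOKEN")
    if token is None:
        return "Short-form only: sign in or set HF_TOKEN for segmented mode."
    # ... pass `token` to the segmentation pipeline here ...
    return f"Would transcribe {audio_path} with segmentation enabled."

with gr.Blocks() as demo:
    gr.LoginButton()  # renders "Sign in with Hugging Face"
    audio = gr.Audio(type="filepath", label="Audio")
    text = gr.Textbox(label="Transcript")
    audio.change(run_asr, inputs=[audio], outputs=[text])

demo.launch()
```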