---
title: GigaAMv3 Preview
emoji: 🔥
colorFrom: red
colorTo: green
sdk: gradio
sdk_version: 6.0.0
app_file: app.py
pinned: false
license: mit
short_description: Interactive Gradio Space demonstrating ai-sage/GigaAM-v3 ASR
hf_oauth: true
hf_oauth_scopes:
- read-repos
---
# GigaAM-v3 Gradio demo
This Space demonstrates the [`ai-sage/GigaAM-v3`](https://huggingface.co/ai-sage/GigaAM-v3) Russian ASR models, built on a Conformer encoder with a HuBERT-CTC objective. The demo lets you:
- upload or record audio (WAV/MP3/FLAC) directly in the browser,
- choose between the `ctc`, `rnnt`, `e2e_ctc`, and `e2e_rnnt` checkpoints,
- switch between a fast single-pass mode and a segmented long-form mode that returns timestamps.
The end-to-end variants (`e2e_*`) produce punctuated, normalized text, while the classic CTC/RNN-T checkpoints return raw transcriptions with lower latency. Long-form mode uses `model.transcribe_longform` and requires a Hugging Face token with access to [`pyannote/segmentation-3.0`](https://huggingface.co/pyannote/segmentation-3.0).
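Under the hood the app calls the upstream [GigaAM](https://github.com/salute-developers/GigaAM/) package. Here is a minimal sketch of those calls following the usage shown in the official repo; the file paths are placeholders and the result field names follow the upstream README, so treat this as illustrative rather than the exact `app.py` code:

```python
# Illustrative only: mirrors upstream GigaAM usage, not app.py itself.
import gigaam

model = gigaam.load_model("e2e_rnnt")  # or "ctc", "rnnt", "e2e_ctc"

# Fast single-pass mode: clips up to ~25 s, returns plain text.
print(model.transcribe("example.wav"))  # placeholder path

# Segmented long-form mode: needs an HF token with access to
# pyannote/segmentation-3.0; yields utterances with timestamps.
for utterance in model.transcribe_longform("long_example.wav"):
    start, end = utterance["boundaries"]
    print(f"[{start:.1f}-{end:.1f}] {utterance['transcription']}")
```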
**Short-form limits & audio pre-processing**
- `model.transcribe` in GigaAM handles clips of roughly up to **25 seconds**, even though the UI accepts recordings up to 150 seconds.
- All incoming audio (upload and microphone) is automatically converted to mono PCM at 16 kHz before inference, matching the recommendation in the [official repo](https://github.com/salute-developers/GigaAM/) (see the sketch after this list).
- If a clip exceeds the short-form limit, the app transparently switches to segmented mode (requires an auth token) instead of failing with "Too long wav file".
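For reference, a hypothetical helper doing that conversion with `torchaudio` (already in `requirements.txt`); the function name is made up for illustration:

```python
# Hypothetical helper mirroring the app's pre-processing: any input
# clip is downmixed to mono and resampled to 16 kHz before inference.
import torch
import torchaudio

TARGET_SR = 16_000

def to_mono_16k(path: str) -> torch.Tensor:
    waveform, sr = torchaudio.load(path)      # (channels, samples)
    if waveform.size(0) > 1:                  # downmix to mono
        waveform = waveform.mean(dim=0, keepdim=True)
    if sr != TARGET_SR:                       # resample to 16 kHz
        waveform = torchaudio.functional.resample(waveform, sr, TARGET_SR)
    return waveform
```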
## Requirements
- Python 3.10
- PyTorch / torchaudio 2.8.0
- `transformers==4.57.1`
- `gradio==6.0.0` (see `requirements.txt` for the full list)
- Optional: set `HF_TOKEN` (or `HUGGINGFACEHUB_API_TOKEN`) if you want to use the segmented mode or access private weights.
## Running locally
```bash
python -m venv .venv
source .venv/bin/activate # or .venv\Scripts\activate on Windows
pip install -r requirements.txt
# optional – needed for long-form segmentation
export HF_TOKEN=<your_hf_token>
python app.py
```
Open the printed URL (default `http://127.0.0.1:7860`) and start transcribing.
## Authentication & user tokens
This Space enables the Hugging Face OAuth flow (see the [Spaces OAuth docs](https://huggingface.co/docs/hub/spaces-oauth)). When you click the "Sign in with Hugging Face" button in the UI:
- The returned access token is stored only in your session and is used to access `pyannote/segmentation-3.0` for long-form transcription (see the sketch after this list).
- You can sign out at any time, or rely on the space-level `HF_TOKEN` secret if provided by the maintainer.
- Without a token you can still run the short-form mode (< 25 s), but segmented transcription is disabled.
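A minimal sketch of how such a handler can consume the token, assuming Gradio's built-in OAuth helpers (`gr.LoginButton`, `gr.OAuthToken`); the handler body and component wiring are illustrative, not the actual `app.py`:

```python
# Sketch, not the actual app.py: shows per-user token pickup in Gradio.
import os
import gradio as gr

def transcribe_longform(audio_path: str, oauth_token: gr.OAuthToken | None) -> str:
    # Prefer the signed-in user's token; fall back to the Space-level secret.
    token = oauth_token.token if oauth_token else os.getenv("HF_TOKEN")
    if token is None:
        raise gr.Error("Sign in or set HF_TOKEN to use segmented mode.")
    return f"(would run segmented transcription of {audio_path} here)"

with gr.Blocks() as demo:
    gr.LoginButton()  # renders "Sign in with Hugging Face"
    audio = gr.Audio(type="filepath", label="Audio")
    text = gr.Textbox(label="Transcription")
    # gr.OAuthToken is injected automatically; it is not listed in inputs.
    audio.change(transcribe_longform, inputs=audio, outputs=text)

if __name__ == "__main__":
    demo.launch()
```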
## Deploying to Hugging Face Spaces
- Keep the YAML front matter above so Spaces can infer the runtime.
- Upload `app.py`, `requirements.txt`, and `runtime.txt`.
- Configure an `HF_TOKEN` secret in **Settings → Variables** if you want segmented mode to work for everyone (see the snippet below).
- Assign `CPU Upgrade` or GPU hardware for heavy, long-form workloads.
- (Optional) Leave `hf_oauth: true` in the metadata to enable the built-in "Sign in with HF" button powered by OAuth/OpenID Connect.
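The secret reaches the app as an environment variable; a one-line sketch of resolving it, using the env var names listed under Requirements:

```python
import os

# Secrets configured in the Space settings are exposed as env vars.
HF_TOKEN = os.getenv("HF_TOKEN") or os.getenv("HUGGINGFACEHUB_API_TOKEN")
```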
For more options (custom hardware, scaling, telemetry), review the [Spaces configuration reference](https://huggingface.co/docs/hub/spaces-config-reference).