---
title: GigaAMv3 Preview
emoji: 🔥
colorFrom: red
colorTo: green
sdk: gradio
sdk_version: 6.0.0
app_file: app.py
pinned: false
license: mit
short_description: Interactive Gradio Space demonstrating ai-sage/GigaAM-v3 ASR
hf_oauth: true
hf_oauth_scopes:
- read-repos

---

# GigaAM-v3 Gradio demo

This Space demonstrates the [`ai-sage/GigaAM-v3`](https://huggingface.co/ai-sage/GigaAM-v3) Russian ASR models, built on a Conformer encoder with a HuBERT-CTC objective. The demo lets you:

- upload or record audio (WAV/MP3/FLAC) directly in the browser,
- choose between the `ctc`, `rnnt`, `e2e_ctc`, and `e2e_rnnt` checkpoints,
- switch between a fast single-pass mode and a segmented long-form mode that returns timestamps.

The end-to-end variants (`e2e_*`) produce punctuated, normalized text, while the classic CTC/RNN-T checkpoints return raw transcriptions with lower latency. Long-form mode uses `model.transcribe_longform` and requires a Hugging Face token with access to [`pyannote/segmentation-3.0`](https://huggingface.co/pyannote/segmentation-3.0).
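
For reference, loading a checkpoint and calling both paths can look like the sketch below. It assumes the `gigaam` package API from the official repo; the checkpoint identifiers are taken from the list above and the exact v3 names passed to `load_model` may differ:

```python
import gigaam  # pip install gigaam, per the official repo

# Checkpoint name from the UI list above; exact v3 identifiers may differ.
model = gigaam.load_model("e2e_rnnt")

# Fast single-pass mode for short clips (roughly up to 25 s).
print(model.transcribe("sample.wav"))

# Segmented long-form mode; needs a token with access to
# pyannote/segmentation-3.0 (set HF_TOKEN before running).
for utterance in model.transcribe_longform("long_sample.wav"):
    start, end = utterance["boundaries"]
    print(f"[{start:.2f}-{end:.2f}] {utterance['transcription']}")
```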

**Short-form limits & audio pre-processing**

- `model.transcribe` in GigaAM handles clips of roughly up to **25 seconds**, even though the UI accepts recordings up to 150 seconds.
- All incoming audio (uploads and microphone recordings) is converted to mono PCM at 16 kHz before inference, matching the recommendation in the [official repo](https://github.com/salute-developers/GigaAM/); a sketch of this conversion follows the list.
- If a clip exceeds the short-form limit, the app transparently switches to segmented mode (which requires an auth token) instead of failing with "Too long wav file".
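
One way such a conversion can be implemented with torchaudio; the helper name here is hypothetical and `app.py` may do this differently:

```python
import torch
import torchaudio

def to_mono_16k(path: str, target_sr: int = 16_000) -> torch.Tensor:
    """Hypothetical helper: load audio, downmix to mono, resample to 16 kHz."""
    waveform, sr = torchaudio.load(path)  # shape: (channels, samples)
    if waveform.size(0) > 1:
        waveform = waveform.mean(dim=0, keepdim=True)  # downmix to mono
    if sr != target_sr:
        waveform = torchaudio.functional.resample(waveform, sr, target_sr)
    return waveform
```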

## Requirements

- Python 3.10
- PyTorch / torchaudio 2.8.0
- `transformers==4.57.1`
- `gradio==6.0.0` (see `requirements.txt` for the full list)
- Optional: set `HF_TOKEN` (or the legacy `HUGGINGFACEHUB_API_TOKEN`) to enable segmented mode or access private weights; a token-resolution sketch follows this list.
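
A helper consistent with that fallback order might look like this (the function name is hypothetical):

```python
import os

def resolve_hf_token() -> str | None:
    """Hypothetical helper: prefer HF_TOKEN, fall back to the legacy variable."""
    return os.environ.get("HF_TOKEN") or os.environ.get("HUGGINGFACEHUB_API_TOKEN")
```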

## Running locally

```bash
python -m venv .venv
source .venv/bin/activate  # or .venv\Scripts\activate on Windows
pip install -r requirements.txt

# optional – needed for long-form segmentation
export HF_TOKEN=<your_hf_token>

python app.py
```

Open the printed URL (default `http://127.0.0.1:7860`) and start transcribing.

## Authentication & user tokens

This Space enables the Hugging Face OAuth flow (see the [Spaces OAuth docs](https://huggingface.co/docs/hub/spaces-oauth)). When you click the "Sign in with Hugging Face" button in the UI:

- The returned access token is stored only in your session and used to access `pyannote/segmentation-3.0` for long-form transcription.
- You can sign out at any time, or rely on the space-level `HF_TOKEN` secret if provided by the maintainer.
- Without a token you can still run the short-form mode (< 25 s), but segmented transcription is disabled; the sketch below shows how the per-session token reaches the app.
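
A minimal sketch of how the per-session token can reach an event handler, assuming Gradio's built-in OAuth helpers (`gr.LoginButton` and a `gr.OAuthToken`-typed parameter); the actual wiring in `app.py` may differ:

```python
import gradio as gr

def transcribe(audio_path, oauth_token: gr.OAuthToken | None):
    # On Spaces, Gradio injects the signed-in user's token (or None) per
    # session; OAuthToken-typed parameters are not listed in `inputs`.
    if oauth_token is None:
        return "Short-form mode only: sign in to enable segmented transcription."
    token = oauth_token.token
    # ... pass `token` through to pyannote/segmentation-3.0 for long-form mode
    return f"(long-form transcription of {audio_path} would go here)"

with gr.Blocks() as demo:
    gr.LoginButton()  # renders the "Sign in with Hugging Face" button
    audio = gr.Audio(type="filepath")
    text = gr.Textbox(label="Transcription")
    audio.change(transcribe, inputs=audio, outputs=text)

demo.launch()
```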

## Deploying to Hugging Face Spaces

- Keep the YAML front matter above so Spaces can infer the runtime.
- Upload `app.py`, `requirements.txt`, and `runtime.txt` (a sample `runtime.txt` follows this list).
- Configure an `HF_TOKEN` secret in **Settings → Variables** if you want segmented mode to work for everyone.
- Assign `CPU Upgrade` or GPU hardware for heavy, long-form workloads.
- (Optional) Leave `hf_oauth: true` in the metadata to enable the built-in "Sign in with HF" button powered by OAuth/OpenID Connect.
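
If `runtime.txt` is used to pin the interpreter, it is a single line; the exact version string below is an assumption based on the Python 3.10 requirement above:

```
python-3.10
```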

For more options (custom hardware, scaling, telemetry), review the [Spaces configuration reference](https://huggingface.co/docs/hub/spaces-config-reference).