---
title: GigaAMv3 Preview
emoji: 🔥
colorFrom: red
colorTo: green
sdk: gradio
sdk_version: 6.0.0
app_file: app.py
pinned: false
license: mit
short_description: Interactive Gradio Space demonstrating ai-sage/GigaAM-v3 ASR
hf_oauth: true
hf_oauth_scopes:
  - read-repos
---

# GigaAM-v3 Gradio demo

This Space demonstrates the ai-sage/GigaAM-v3 Russian ASR models built on top of a Conformer encoder and HuBERT-CTC objective. The demo lets you:

- upload or record audio (WAV/MP3/FLAC) directly in the browser,
- choose between the `ctc`, `rnnt`, `e2e_ctc`, and `e2e_rnnt` checkpoints,
- switch between a fast single-pass mode and a segmented long-form mode that returns timestamps.

The end-to-end variants (`e2e_*`) produce punctuated, normalized text, while the classic CTC/RNN-T checkpoints return raw transcriptions with lower latency. Long-form mode uses `model.transcribe_longform` and requires a Hugging Face token with access to `pyannote/segmentation-3.0`.
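As a sketch, the difference between the checkpoint families can be expressed with a small helper. The function and constant names here are illustrative, not part of the GigaAM API:

```python
# Hypothetical helper: the e2e_* checkpoints return punctuated, normalized
# text, while the classic ctc/rnnt checkpoints return raw transcripts.
VARIANTS = ("ctc", "rnnt", "e2e_ctc", "e2e_rnnt")


def is_end_to_end(variant: str) -> bool:
    """True for checkpoints that punctuate and normalize their output."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown checkpoint: {variant!r}")
    return variant.startswith("e2e_")
```

A UI can use this to decide whether to warn users that the transcript will be raw lowercase text.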

## Short-form limits & audio pre-processing

- `model.transcribe` in GigaAM handles clips of roughly up to 25 seconds, even though the UI accepts uploads up to 150 seconds.
- All incoming audio (upload and microphone) is automatically converted to mono PCM at 16 kHz before inference, matching the recommendation from the official repo.
- If a clip exceeds the short-form limit, the app transparently switches to segmented mode (requires an auth token) instead of failing with "Too long wav file".

## Requirements

- Python 3.10
- PyTorch / torchaudio 2.8.0
- `transformers==4.57.1`
- `gradio==6.0.0` (see `requirements.txt` for the full list)
- Optional: set `HF_TOKEN` (or `HUGGINGFACEHUB_API_TOKEN`) if you want to use the segmented mode or access private weights.

## Running locally

```bash
python -m venv .venv
source .venv/bin/activate  # or .venv\Scripts\activate on Windows
pip install -r requirements.txt

# optional – needed for long-form segmentation
export HF_TOKEN=<your_hf_token>

python app.py
```

Open the printed URL (default `http://127.0.0.1:7860`) and start transcribing.

## Authentication & user tokens

This Space enables the Hugging Face OAuth flow (see Spaces OAuth docs). When you click the "Sign in with Hugging Face" button in the UI:

- The returned access token is stored only in your session and used to access `pyannote/segmentation-3.0` for long-form transcription.
- You can sign out at any time, or rely on the space-level `HF_TOKEN` secret if provided by the maintainer.
- Without a token you can still run the short-form mode (<25 s), but segmented transcription is disabled.

## Deploying to Hugging Face Spaces

- Keep the YAML front matter above so Spaces can infer the runtime.
- Upload `app.py`, `requirements.txt`, and `runtime.txt`.
- Configure an `HF_TOKEN` secret in Settings → Variables if you want segmented mode to work for everyone.
- Assign CPU Upgrade or GPU hardware for heavy, long-form workloads.
- (Optional) Leave `hf_oauth: true` in the metadata to enable the built-in "Sign in with HF" button powered by OAuth/OpenID Connect.

For more options (custom hardware, scaling, telemetry), review the Spaces configuration reference.