TrevorJS committed ef4c966 (verified, parent: c5952a2): Upload README.md with huggingface_hub

Files changed (1): README.md (+68 −0)
---
license: apache-2.0
language:
- en
- fr
- de
- es
- it
- pt
- nl
- pl
- ru
- ja
- ko
- zh
- ar
- hi
tags:
- speech
- asr
- voxtral
- gguf
- q4
- webgpu
- wasm
- streaming
library_name: burn
pipeline_tag: automatic-speech-recognition
base_model: mistralai/Voxtral-Mini-4B-Realtime-2602
---

# Voxtral Mini 4B Realtime — Q4 GGUF

Q4_0 quantized GGUF weights for [Voxtral Mini 4B Realtime](https://huggingface.co/mistralai/Voxtral-Mini-4B-Realtime-2602), converted for use with [voxtral-mini-realtime-rs](https://github.com/TrevorS/voxtral-mini-realtime-rs).

## Files

| File | Size | Description |
|------|------|-------------|
| `voxtral-q4.gguf` | 2.51 GB | Q4_0 quantized model weights (GGUF v3) |
| `tekken.json` | 14.9 MB | Tekken tokenizer (131,072 vocab) |
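
A GGUF file starts with a fixed little-endian header: the 4-byte magic `GGUF`, a uint32 format version, then uint64 tensor and metadata-KV counts. As a quick sanity check that a downloaded file really matches the "GGUF v3" label in the table, here is a minimal sketch (the helper name is illustrative, not part of the repo):

```python
import struct

def read_gguf_header(data: bytes):
    """Parse the fixed GGUF header: magic, version, tensor count, KV count."""
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", data, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    return version, n_tensors, n_kv

# Example against a synthetic header (version 3, 291 tensors, 24 KV pairs);
# for the real check, pass the first 24 bytes of voxtral-q4.gguf.
hdr = struct.pack("<4sIQQ", b"GGUF", 3, 291, 24)
print(read_gguf_header(hdr))  # (3, 291, 24)
```

If the version printed is not 3, the file is from a different GGUF revision than the one this repo targets.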

## Usage

### Native CLI

```bash
cargo run --features "wgpu,cli,hub" --bin voxtral-transcribe -- \
  --audio input.wav \
  --gguf models/voxtral-q4.gguf \
  --tokenizer models/voxtral/tekken.json
```

### Browser (WASM + WebGPU)

The Q4 GGUF is designed to run entirely client-side in a browser tab via WebGPU. See the [GitHub repo](https://github.com/TrevorS/voxtral-mini-realtime-rs) for the full WASM build and dev server setup.

## Quantization Details

- **Method**: Q4_0 (4-bit quantization, block size 32, 18 bytes per block)
- **Original model**: [mistralai/Voxtral-Mini-4B-Realtime-2602](https://huggingface.co/mistralai/Voxtral-Mini-4B-Realtime-2602) (~16 GB F32)
- **Quantized size**: ~2.5 GB (fits in browser memory)
- **Inference**: Custom WGSL shader for fused GPU dequantize + matmul
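
The figures above check out arithmetically: a Q4_0 block packs 32 weights into an f16 scale (2 bytes) plus 32 nibbles (16 bytes), i.e. 0.5625 bytes per weight. A back-of-envelope sketch (the ~4B parameter count is the model's nominal size, not an exact tensor count):

```python
# Q4_0 layout: per 32-weight block, one f16 scale (2 bytes)
# plus 32 packed 4-bit values (16 bytes) -> 18 bytes total.
BLOCK_SIZE = 32
BYTES_PER_BLOCK = 2 + BLOCK_SIZE // 2  # 18

bytes_per_weight = BYTES_PER_BLOCK / BLOCK_SIZE  # 0.5625

# Rough size if all ~4B parameters were Q4_0; the shipped 2.51 GB file
# is somewhat larger because some tensors stay at higher precision.
params = 4_000_000_000  # approximate
size_gb = params * bytes_per_weight / 1e9
print(f"{size_gb:.2f} GB")
```

Relative to ~16 GB of F32 weights, that is roughly a 7x reduction, which is what makes the in-browser use case feasible.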

## Links

- **Code**: [github.com/TrevorS/voxtral-mini-realtime-rs](https://github.com/TrevorS/voxtral-mini-realtime-rs)
- **Original model**: [mistralai/Voxtral-Mini-4B-Realtime-2602](https://huggingface.co/mistralai/Voxtral-Mini-4B-Realtime-2602)