Add vocoder model with config and documentation

Files changed (3)

Readme.md ADDED

+# Vocoder Model
+This directory contains a pre-trained HiFi-GAN vocoder that converts mel-spectrograms to audio waveforms.
+## Model Details
+- **File**: `vocoder.pt`
+- **Input**: Mel-spectrograms (80 mel bins, see `num_mels` in `config.json`)
+- **Output**: Audio waveform (22,050 Hz, see `sample_rate` in `config.json`)
+## Usage
+```python
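+import torch
+
+# Load the vocoder on CPU, assuming vocoder.pt stores the whole pickled module.
+# Recent PyTorch releases default to weights_only=True, which rejects pickled
+# modules, so it is disabled here. Only do this for a checkpoint you trust.
+vocoder = torch.load('vocoder.pt', map_location='cpu', weights_only=False)
+vocoder.eval()
+# Generate audio from a mel-spectrogram tensor
+# (assumed shape: [batch, 80, frames], matching num_mels in config.json)
+with torch.no_grad():
+    audio = vocoder(mel_spectrogram)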
+```
+## Dependencies
+- PyTorch
+- NumPy
+- An audio I/O library such as `soundfile` or `torchaudio` (for reading and writing waveforms)
+## Model Configuration
+See `config.json` for model architecture and training parameters.
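
The README lists an audio waveform as the model output but does not show how to store it. The following is a minimal sketch using only the Python standard library. It assumes the waveform is a mono sequence of floats in [-1.0, 1.0] at the 22,050 Hz rate given in `config.json`. `write_wav` is a hypothetical helper and is not part of this repository.

```python
import struct
import wave


def write_wav(path, samples, sample_rate=22050):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file.

    Hypothetical helper for illustration; not part of this repository.
    """
    frames = bytearray()
    for s in samples:
        # Clip to the valid range, then scale to a signed 16-bit integer.
        clipped = max(-1.0, min(1.0, float(s)))
        frames += struct.pack("<h", int(round(clipped * 32767)))

    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)            # mono
        wav_file.setsampwidth(2)            # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sample_rate)  # 22050 Hz per config.json
        wav_file.writeframes(bytes(frames))
```

If the model returns a tensor with singleton batch and channel dimensions (an assumption, since the README does not state the output shape), a call might look like `write_wav("out.wav", audio.squeeze().cpu().tolist())`.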

config.json ADDED

+{
+  "model_type": "hifigan",
+  "sample_rate": 22050,
+  "num_mels": 80,
+  "num_freq": 1025,
+  "n_fft": 1024,
+  "hop_size": 256,
+  "win_size": 1024,
+  "fmin": 0.0,
+  "fmax": 8000.0,
+  "upsample_rates": [8, 8, 2, 2],
+  "upsample_kernel_sizes": [16, 16, 4, 4],
+  "upsample_initial_channel": 512,
+  "resblock_kernel_sizes": [3, 7, 11],
+  "resblock_dilation_sizes": [
+    [1, 3, 5],
+    [1, 3, 5],
+    [1, 3, 5]
+  ],
+  "resblock_type": "1",
+  "use_spectral_norm": false,
+  "version": "1.0",
+  "authors": ["Arjit"],
+  "description": "HiFi-GAN vocoder for high-fidelity audio waveform generation from mel-spectrograms"
+}
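
One property of this config worth checking is that the upsampling strides multiply out to the hop size, so the generator emits `hop_size` samples per mel frame. The sketch below copies the relevant values from `config.json` and runs that check. The function names are illustrative, and the length formula assumes a generator that upsamples by exactly the product of its strides, as the reference HiFi-GAN generator does.

```python
import math

# Values copied from config.json above.
config = {
    "sample_rate": 22050,
    "hop_size": 256,
    "upsample_rates": [8, 8, 2, 2],
    "upsample_kernel_sizes": [16, 16, 4, 4],
}


def total_upsampling(cfg):
    """Product of all upsampling strides (samples generated per mel frame)."""
    return math.prod(cfg["upsample_rates"])


def expected_num_samples(cfg, num_frames):
    """Waveform length for `num_frames` mel frames, ignoring edge effects."""
    return num_frames * total_upsampling(cfg)


# 8 * 8 * 2 * 2 = 256, which must equal hop_size.
assert total_upsampling(config) == config["hop_size"]
# Each upsampling layer needs a matching kernel size.
assert len(config["upsample_rates"]) == len(config["upsample_kernel_sizes"])

num_frames = 100
num_samples = expected_num_samples(config, num_frames)
duration_s = num_samples / config["sample_rate"]
print(f"{num_frames} frames -> {num_samples} samples ({duration_s:.3f} s)")
```

If the product and `hop_size` disagreed, the generated audio would be stretched or compressed relative to the spectrogram it was conditioned on.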

vocoder.pt ADDED

+version https://git-lfs.github.com/spec/v1
+oid sha256:1d7a6861589e927e0fbdaa5849ca022258fe2b58a20cc7bfb8fb598ccf936169
+size 53845290
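
The three lines above are a Git LFS pointer, not the weights themselves. After downloading the real file, its size and SHA-256 digest can be compared against the pointer. The sketch below shows one way to do that with the standard library. `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers, and the local path `vocoder.pt` in the usage note is an assumption.

```python
import hashlib
import os


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer into {'oid': <hex sha256>, 'size': <int>}."""
    fields = {}
    for line in text.splitlines():
        # Tolerate a leading '+' as shown in the diff view.
        line = line.lstrip("+").strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return {"oid": digest, "size": int(fields["size"])}


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the pointer's size and SHA-256."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    sha = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in chunks so a ~54 MB file is never held in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha.update(chunk)
    return sha.hexdigest() == pointer["oid"]


POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:1d7a6861589e927e0fbdaa5849ca022258fe2b58a20cc7bfb8fb598ccf936169
size 53845290
"""
pointer = parse_lfs_pointer(POINTER_TEXT)
```

With the weights downloaded, `matches_pointer("vocoder.pt", pointer)` should return `True`. A `False` result usually means the download is incomplete or the file on disk is still the pointer itself.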