---
license: cc-by-sa-3.0
datasets:
- mnist
---

[WGAN-GP](https://arxiv.org/abs/1704.00028) model trained on the [MNIST dataset](https://www.tensorflow.org/datasets/catalog/mnist) using [JAX in Colab](https://colab.research.google.com/drive/1RzQfrc4Xf_pvGJD2PaNJyaURLh0nO4Fp?usp=sharing).

| Real Images | Generated Images |
| ------- | -------- |
| | |

# Training Progression
<video width="50%" controls>
<source src="https://cdn-uploads.huggingface.co/production/uploads/649f9483d76ca0fe679011c2/nX7L6xkjvAvaca5pHyTp0.mp4" type="video/mp4">
</video>

# Details
This model is based on [WGAN-GP](https://arxiv.org/abs/1704.00028).

The model was trained for ~9h40m on a GCE VM instance (n1-standard-4, 1 x NVIDIA T4).

The critic consists of 4 strided convolutional layers, which downsample the input, with Leaky ReLU activation. It uses neither batch normalization nor dropout.
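
As a rough illustration of a critic of this shape, here is a minimal pure-JAX sketch. The channel widths, 4x4 kernels, Leaky ReLU slope of 0.2, weight initialization, the lack of an activation after the last layer, and the final spatial mean are all assumptions made for this example; the notebook may differ on any of them.

```python
import jax
import jax.numpy as jnp

# Hypothetical sizes: the description only fixes "4 strided conv layers".
CRITIC_CHANNELS = (1, 32, 64, 128, 1)
CRITIC_KERNEL = 4


def init_critic(key):
    """Create one (kernel, bias) pair per convolutional layer."""
    params = []
    for c_in, c_out in zip(CRITIC_CHANNELS[:-1], CRITIC_CHANNELS[1:]):
        key, sub = jax.random.split(key)
        w = 0.02 * jax.random.normal(sub, (CRITIC_KERNEL, CRITIC_KERNEL, c_in, c_out))
        params.append((w, jnp.zeros((c_out,))))
    return params


def critic(params, x):
    """Map images of shape (N, 28, 28, 1) to one unbounded score per image."""
    for i, (w, b) in enumerate(params):
        # Stride 2 halves the spatial size: 28 -> 14 -> 7 -> 4 -> 2.
        x = jax.lax.conv_general_dilated(
            x, w, window_strides=(2, 2), padding="SAME",
            dimension_numbers=("NHWC", "HWIO", "NHWC")) + b
        if i < len(params) - 1:
            x = jax.nn.leaky_relu(x, negative_slope=0.2)
    # Assumed reduction: average the remaining 2x2 map into a scalar score.
    return x.mean(axis=(1, 2, 3))
```

Calling `critic(init_critic(jax.random.PRNGKey(0)), images)` on a batch of shape `(N, 28, 28, 1)` returns an array of shape `(N,)`. There is no normalization layer, in line with the description above.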

The generator consists of 4 transposed convolutional layers with ReLU activation and batch normalization.
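
A generator with this layout could look roughly like the pure-JAX sketch below. The latent size, kernel sizes, strides, channel widths, the `tanh` output, and the choice to skip batch normalization on the last layer are assumptions for illustration only. The batch normalization here uses plain batch statistics without running averages.

```python
import jax
import jax.numpy as jnp

LATENT_DIM = 64  # hypothetical latent size
# Hypothetical layer specs: (kernel, stride, padding, output channels).
GEN_LAYERS = ((7, 1, "VALID", 128), (4, 2, "SAME", 64),
              (4, 2, "SAME", 32), (3, 1, "SAME", 1))


def init_generator(key):
    """Create kernel plus batch-norm scale and bias for each layer."""
    params, c_in = [], LATENT_DIM
    for k, _, _, c_out in GEN_LAYERS:
        key, sub = jax.random.split(key)
        w = 0.02 * jax.random.normal(sub, (k, k, c_in, c_out))
        params.append({"w": w, "scale": jnp.ones((c_out,)), "bias": jnp.zeros((c_out,))})
        c_in = c_out
    return params


def batch_norm(x, scale, bias, eps=1e-5):
    """Normalize each channel over the batch and spatial axes."""
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    return scale * (x - mean) / jnp.sqrt(var + eps) + bias


def generator(params, z):
    """Map latent vectors of shape (N, LATENT_DIM) to images (N, 28, 28, 1)."""
    x = z[:, None, None, :]  # treat the latent vector as a 1x1 feature map
    for i, (p, (_, s, pad, _)) in enumerate(zip(params, GEN_LAYERS)):
        # Spatial size grows 1 -> 7 -> 14 -> 28 -> 28.
        x = jax.lax.conv_transpose(
            x, p["w"], strides=(s, s), padding=pad,
            dimension_numbers=("NHWC", "HWIO", "NHWC"))
        if i < len(GEN_LAYERS) - 1:
            x = jax.nn.relu(batch_norm(x, p["scale"], p["bias"]))
    return jnp.tanh(x)  # assumed output range [-1, 1]
```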

The learning rate was held constant at 1e-4 for the first 50,000 steps, followed by cosine annealing cycles with a peak learning rate of 1e-3.
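
One way to express such a schedule is sketched below. The cycle length of 10,000 steps, the floor of 0 at the end of each cycle, and the restart at the peak value are assumptions; only the constant rate, the 50,000-step boundary, and the peak come from the description above.

```python
import jax.numpy as jnp

BASE_LR = 1e-4           # constant phase (from the description)
PEAK_LR = 1e-3           # peak of each cosine cycle (from the description)
CONSTANT_STEPS = 50_000  # length of the constant phase (from the description)
CYCLE_STEPS = 10_000     # hypothetical cycle length
END_LR = 0.0             # hypothetical value at the end of each cycle


def learning_rate(step):
    """Constant rate first, then cosine cycles that restart at PEAK_LR."""
    step = jnp.asarray(step, dtype=jnp.float32)
    # Position inside the current cycle, in [0, 1).
    t = jnp.mod(step - CONSTANT_STEPS, CYCLE_STEPS) / CYCLE_STEPS
    cosine = END_LR + 0.5 * (PEAK_LR - END_LR) * (1.0 + jnp.cos(jnp.pi * t))
    return jnp.where(step < CONSTANT_STEPS, BASE_LR, cosine)
```

Because the function only uses `jax.numpy` operations on the step count, it can be passed to an optimizer that accepts a schedule callable.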

The gradient penalty coefficient (lambda) was set to 10, the same value as in the original paper.
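
To show where the coefficient enters, here is a minimal sketch of the WGAN-GP critic loss in JAX. The function names and the `critic_fn` interface (a callable mapping a batch of images to one score per image) are hypothetical; the penalty term itself follows the formulation in the WGAN-GP paper.

```python
import jax
import jax.numpy as jnp

LAMBDA = 10.0  # gradient penalty coefficient


def gradient_penalty(critic_fn, real, fake, key):
    """Penalize deviation of the critic's input-gradient norm from 1."""
    # One interpolation weight per sample, broadcast over the image axes.
    eps = jax.random.uniform(key, (real.shape[0],) + (1,) * (real.ndim - 1))
    mixed = eps * real + (1.0 - eps) * fake
    # Gradient of each sample's score with respect to that sample only.
    per_sample_grad = jax.vmap(jax.grad(lambda x: critic_fn(x[None])[0]))
    grads = per_sample_grad(mixed)
    sq_norm = jnp.sum(grads ** 2, axis=tuple(range(1, grads.ndim)))
    norms = jnp.sqrt(sq_norm + 1e-12)  # small constant keeps sqrt differentiable
    return LAMBDA * jnp.mean((norms - 1.0) ** 2)


def critic_loss(critic_fn, real, fake, key):
    """Wasserstein critic loss plus the gradient penalty."""
    wasserstein = jnp.mean(critic_fn(fake)) - jnp.mean(critic_fn(real))
    return wasserstein + gradient_penalty(critic_fn, real, fake, key)
```

For a linear critic whose input gradient has norm 1 everywhere, the penalty vanishes; if the norm is 2, the penalty is `LAMBDA * (2 - 1) ** 2`.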

For more details, please refer to the [Colab Notebook](https://colab.research.google.com/drive/1RzQfrc4Xf_pvGJD2PaNJyaURLh0nO4Fp?usp=sharing).