---
title: StoryKimi Zero
emoji: π
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.42.0
app_file: app.py
pinned: false
license: mit
hardware: zero-gpu
short_description: Generate stories with StoryKimi model using ZeroGPU
---

# StoryKimi Zero - DeepSeek V3 Inspired Model on ZeroGPU

A PyTorch implementation of a DeepSeek V3 inspired transformer model with Mixture of Experts (MoE), Latent Attention, and other advanced features, deployed on Hugging Face Spaces with ZeroGPU for efficient inference.

## Training Results & Model Weights

**View Training Report**: [StoryKimi Training Results on WandB](https://wandb.ai/rentio/DSV-Training/reports/SmolKimi-A-smaller-Kimi-K2---VmlldzoxMzYwNDQ4Mg?accessToken=lfs6n1y7gn8q0f0dwilta8yuwzxel45ztzbbcavwbqp7jsyv1p7cz9elflycv9fg)

**Pre-trained Weights**:

- **Hugging Face Model**: [YuvrajSingh9886/StoryKimi](https://huggingface.co/YuvrajSingh9886/StoryKimi)
- **WandB Checkpoints**: See the WandB report above for additional trained model checkpoints

## Features

- **ZeroGPU Integration**: Dynamic GPU allocation with NVIDIA H200 slices (70GB VRAM)
- **Latent Attention**: Efficient attention mechanism with compressed key-value representations
- **Mixture of Experts (MoE)**: 8 experts with top-2 routing and shared expert support (see the routing sketch after this list)
- **SwiGLU Activation**: Advanced activation function in expert layers
- **Sinusoidal Positional Embeddings**: Position encoding for sequence understanding
- **Interactive Interface**: User-friendly Gradio interface with real-time generation
- **Multiple Sampling Methods**: Top-k sampling with temperature control
- **Real-time Generation**: Fast inference with automatic scaling
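
The top-2 routing with a shared expert works roughly as in the minimal PyTorch sketch below. Module names, hidden sizes, and the dense dispatch loop are illustrative assumptions for clarity, not the exact StoryKimi implementation.

```python
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Illustrative top-2 MoE layer with a shared expert (not the actual StoryKimi code)."""
    def __init__(self, dim=384, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, n_experts, bias=False)          # per-token routing logits
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        ])
        # Shared expert: applied to every token regardless of routing.
        self.shared_expert = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x):                                 # x: (batch, seq, dim)
        logits = self.router(x)                           # (batch, seq, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)    # keep the 2 best experts per token
        weights = F.softmax(weights, dim=-1)              # normalize over the chosen experts
        out = self.shared_expert(x)
        # Dense dispatch for readability: every expert sees every token and
        # unrouted tokens are masked out; real implementations gather/scatter.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = (idx[..., k] == e).unsqueeze(-1).to(x.dtype)
                out = out + mask * weights[..., k:k + 1] * expert(x)
        return out
```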

## Model Architecture

### Default Configuration

- **Embedding Dimensions**: 384
- **Decoder Layers**: 6
- **Attention Heads**: 8
- **MoE Experts**: 8 (top-2 routing)
- **Block Size**: 128 tokens
- **Vocabulary Size**: Based on Llama-2-7b tokenizer (~32,000 tokens)
- **Latent Dimension**: 64 (for compressed attention)
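
For reference, these defaults can be collected into a single configuration object. The field names below are illustrative (only `embeddings_dims` and `experts` are confirmed by the trainer flags shown later in this README); treat this as a sketch rather than the actual config class.

```python
from dataclasses import dataclass

@dataclass
class StoryKimiConfig:
    embeddings_dims: int = 384   # model width
    decoder_layers: int = 6      # number of decoder blocks
    attention_heads: int = 8
    experts: int = 8             # MoE experts, routed top-2
    top_k_experts: int = 2
    block_size: int = 128        # maximum sequence length in tokens
    vocab_size: int = 32000      # Llama-2-7b tokenizer
    latent_dim: int = 64         # compressed key/value dimension
```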

### ZeroGPU Configuration

- **GPU Type**: NVIDIA H200 slice
- **Available VRAM**: 70GB per workload
- **Max Duration**: 120 seconds per generation
- **Deployment**: Hugging Face Spaces with automatic scaling

## Usage

1. **Enter your story prompt** in the text box
2. **Select a model checkpoint** (Checkpoint 2000 available)
3. **Adjust generation parameters**:
   - **Max Length**: 10-128 tokens
   - **Temperature**: 0.1-2.0 (creativity vs. coherence)
   - **Top-k**: 1-100 (vocabulary filtering)
4. **Click "Generate Text"** to create your AI-generated story
5. **Enjoy your personalized story!**

## Generation Tips

- **Lower temperature** (0.1-0.7) for more coherent and focused stories
- **Higher temperature** (0.8-2.0) for more creative and diverse outputs
- **Adjust top-k** to control vocabulary diversity and randomness (the sampling sketch after this list shows how temperature and top-k interact)
- **Use descriptive prompts** for better and more relevant results
- **Experiment with different lengths** to find your preferred story format
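
The temperature and top-k controls combine as in the generic sampling step below. This is a standard top-k/temperature sampler shown for illustration; it is not necessarily the app's exact code.

```python
import torch
import torch.nn.functional as F

def sample_next_token(logits: torch.Tensor, temperature: float = 0.7, top_k: int = 50):
    """Pick the next token id from a 1-D tensor of vocabulary logits."""
    logits = logits / max(temperature, 1e-5)                       # <1.0 sharpens, >1.0 flattens
    top_vals, top_idx = logits.topk(min(top_k, logits.size(-1)))   # keep only the k most likely tokens
    probs = F.softmax(top_vals, dim=-1)
    choice = torch.multinomial(probs, num_samples=1)               # sample within the filtered set
    return top_idx[choice]
```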

## ZeroGPU Benefits

- **Free GPU Access**: No cost for users to generate stories
- **Efficient Resource Usage**: GPU allocated only when needed for inference
- **Automatic Scaling**: Handles multiple concurrent users seamlessly
- **High Performance**: NVIDIA H200 acceleration for fast generation
- **No Setup Required**: Ready-to-use interface with pre-loaded model

## Technical Implementation

### Model Features

- **Latent Attention**: Compressed key-value representations for efficiency (sketched after this list)
- **Mixture of Experts**: 8 experts with intelligent routing
- **Advanced Activation**: SwiGLU for better performance
- **Positional Encoding**: Sinusoidal embeddings for sequence understanding
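
A rough sketch of what compressed key-value ("latent") attention can look like, assuming a DeepSeek-style down-projection of keys and values through a small latent bottleneck (the 64-dimensional latent listed above). Module names and exact wiring are assumptions, not the project's code.

```python
import torch.nn as nn
import torch.nn.functional as F

class LatentAttention(nn.Module):
    """Illustrative compressed-KV attention; dimensions match the defaults above."""
    def __init__(self, dim=384, n_heads=8, latent_dim=64):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, dim // n_heads
        self.q_proj = nn.Linear(dim, dim, bias=False)
        self.kv_down = nn.Linear(dim, latent_dim, bias=False)   # compress tokens into the latent space
        self.k_up = nn.Linear(latent_dim, dim, bias=False)      # expand latent back to per-head keys
        self.v_up = nn.Linear(latent_dim, dim, bias=False)      # ...and values
        self.out = nn.Linear(dim, dim, bias=False)

    def forward(self, x):                                       # x: (batch, seq, dim)
        b, t, _ = x.shape
        q = self.q_proj(x)
        latent = self.kv_down(x)                                # (batch, seq, latent_dim)
        k, v = self.k_up(latent), self.v_up(latent)
        split = lambda z: z.view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        y = F.scaled_dot_product_attention(split(q), split(k), split(v), is_causal=True)
        return self.out(y.transpose(1, 2).reshape(b, t, -1))
```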

### Deployment Features

- **ZeroGPU Decorator**: `@spaces.GPU(duration=120)` for dynamic allocation (see the sketch after this list)
- **Optimized Loading**: Efficient model loading and initialization
- **Error Handling**: Robust error management for better user experience
- **Real-time Feedback**: Live generation status and results
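
The ZeroGPU pattern is applied roughly as below: the model is built once at startup on CPU, and an H200 slice is attached only while the decorated function runs. The placeholder model and the elided tokenize/sample/decode steps stand in for the Space's actual loading and generation code.

```python
import spaces
import torch
import torch.nn as nn

# Placeholder for the real StoryKimi checkpoint, which app.py loads once at
# startup (on CPU, outside any GPU context).
model = nn.Linear(384, 32000)

@spaces.GPU(duration=120)   # a GPU slice is allocated only for the duration of this call
def generate_text(prompt: str, max_length: int = 128, temperature: float = 0.7, top_k: int = 50) -> str:
    model.to("cuda")        # move the preloaded model onto the allocated GPU
    with torch.no_grad():
        # ... tokenize the prompt, run autoregressive top-k sampling, decode ...
        return f"(generated continuation of: {prompt})"
```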

## Local Development

Want to run this locally or contribute? Check out the full repository:

**Source Code**: [YuvrajSingh-mist/SmolHub/StoryKimi](https://github.com/YuvrajSingh-mist/SmolHub/tree/main/StoryKimi)

### Quick Local Setup

```bash
# Clone the repository
git clone https://github.com/YuvrajSingh-mist/SmolHub.git
cd SmolHub/StoryKimi

# Install dependencies
chmod +x install.sh
./install.sh

# Run the Gradio interface
cd gradio
python app.py
```

### Training Your Own Model

```bash
# Set your HF token for Llama-2 tokenizer access
export HF_TOKEN="your_token_here"

# Basic training
python trainer.py

# Advanced training with custom parameters
python trainer.py --embeddings_dims 512 --experts 16 --epochs 5
```

## Model Performance

The model has been trained on diverse text data and shows strong performance in:

- **Story Generation**: Creative and coherent narrative creation
- **Text Continuation**: Natural extension of given prompts
- **Style Adaptation**: Adapting to different writing styles and genres
- **Character Development**: Creating consistent characters and dialogue

## Related Links

- **Full Project**: [SmolHub Repository](https://github.com/YuvrajSingh-mist/SmolHub)
- **Model Weights**: [HuggingFace Model](https://huggingface.co/YuvrajSingh9886/StoryKimi)
- **Training Report**: [WandB Results](https://wandb.ai/rentio/DSV-Training/reports/SmolKimi-A-smaller-Kimi-K2---VmlldzoxMzYwNDQ4Mg?accessToken=lfs6n1y7gn8q0f0dwilta8yuwzxel45ztzbbcavwbqp7jsyv1p7cz9elflycv9fg)
- **Other Models**: [SmolMixtral](https://github.com/YuvrajSingh-mist/SmolHub/tree/main/SmolMixtral), [SmolTransformer](https://github.com/YuvrajSingh-mist/SmolHub/tree/main/SmolTransformer)

## License

MIT License - See LICENSE file for details