---
title: StoryKimi Zero
emoji: πŸš€
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.42.0
app_file: app.py
pinned: false
license: mit
hardware: zero-gpu
short_description: Generate stories with StoryKimi model using ZeroGPU
---
# StoryKimi Zero - DeepSeek V3-Inspired Model on ZeroGPU
A PyTorch implementation of a DeepSeek V3-inspired transformer with Mixture of Experts (MoE), latent attention, and related architectural features, deployed on Hugging Face Spaces with ZeroGPU for efficient inference.
![StoryKimi Model](https://huggingface.co/YuvrajSingh9886/StoryKimi/resolve/main/images/image.png)
## πŸ“Š Training Results & Model Weights
**πŸ“ˆ View Training Report**: [StoryKimi Training Results on WandB](https://wandb.ai/rentio/DSV-Training/reports/SmolKimi-A-smaller-Kimi-K2---VmlldzoxMzYwNDQ4Mg?accessToken=lfs6n1y7gn8q0f0dwilta8yuwzxel45ztzbbcavwbqp7jsyv1p7cz9elflycv9fg)
**πŸ’Ύ Pre-trained Weights**:
- **Hugging Face Model**: [YuvrajSingh9886/StoryKimi](https://huggingface.co/YuvrajSingh9886/StoryKimi)
- **WandB Checkpoints**: Check the WandB report above for additional trained model checkpoints
## 🌟 Features
- **ZeroGPU Integration**: Dynamic GPU allocation with NVIDIA H200 slices (70GB VRAM)
- **Latent Attention**: Efficient attention mechanism with compressed key-value representations
- **Mixture of Experts (MoE)**: 8 experts with top-2 routing and shared expert support
- **SwiGLU Activation**: Gated activation function in the expert feed-forward layers
- **Sinusoidal Positional Embeddings**: Position encoding for sequence understanding
- **Interactive Interface**: User-friendly Gradio interface with real-time generation
- **Configurable Sampling**: Top-k sampling with temperature control
- **Real-time Generation**: Fast inference with automatic scaling
## πŸ”§ Model Architecture
### Default Configuration
- **Embedding Dimensions**: 384
- **Decoder Layers**: 6
- **Attention Heads**: 8
- **MoE Experts**: 8 (top-2 routing)
- **Block Size**: 128 tokens
- **Vocabulary Size**: Based on Llama-2-7b tokenizer (~32,000 tokens)
- **Latent Dimension**: 64 (for compressed attention)
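Taken together, these defaults correspond to a small config object along the following lines (a hypothetical sketch; the field names mirror `trainer.py` CLI flags such as `--embeddings_dims` and `--experts`, but check the repo for the authoritative definition):

```python
from dataclasses import dataclass

@dataclass
class StoryKimiConfig:
    # Hypothetical field names mirroring the README defaults; see
    # trainer.py in the repo for the authoritative configuration.
    embeddings_dims: int = 384   # model / embedding width
    decoder_layers: int = 6      # transformer decoder blocks
    attention_heads: int = 8     # heads per attention layer
    experts: int = 8             # MoE experts per layer
    top_k_experts: int = 2       # each token is routed to its top-2 experts
    block_size: int = 128        # maximum context length (tokens)
    vocab_size: int = 32000      # Llama-2-7b tokenizer vocabulary
    latent_dim: int = 64         # compressed key/value width for latent attention
```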
### ZeroGPU Configuration
- **GPU Type**: NVIDIA H200 slice
- **Available VRAM**: 70GB per workload
- **Max Duration**: 120 seconds per generation
- **Deployment**: Hugging Face Spaces with automatic scaling
## 🎯 Usage
1. **Enter your story prompt** in the text box
2. **Select model checkpoint** (Checkpoint 2000 available)
3. **Adjust generation parameters**:
- **Max Length**: 10-128 tokens
- **Temperature**: 0.1-2.0 (creativity vs coherence)
- **Top-k**: 1-100 (vocabulary filtering)
4. **Click "Generate Text"** to create your AI-generated story
5. **Enjoy your personalized story!**
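Prefer scripting over clicking? A running Space can also be called with `gradio_client` (a sketch: the Space ID, argument order, and endpoint name here are assumptions; check the Space's "Use via API" panel for the exact signature):

```python
# pip install gradio_client
from gradio_client import Client

# Space ID and parameter order are assumptions; verify against the
# Space's "Use via API" panel before relying on them.
client = Client("YuvrajSingh9886/StoryKimi-Zero")
result = client.predict(
    "Once upon a time in a floating city,",  # story prompt
    128,    # max length (tokens)
    0.7,    # temperature
    50,     # top-k
    api_name="/predict",
)
print(result)
```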
## πŸ’‘ Generation Tips
- **Lower temperature** (0.1-0.7) for more coherent and focused stories
- **Higher temperature** (0.8-2.0) for more creative and diverse outputs
- **Adjust top-k** to control vocabulary diversity and randomness
- **Use descriptive prompts** for better and more relevant results
- **Experiment with different lengths** to find your preferred story format
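To make these knobs concrete: temperature divides the logits before sampling (values below 1.0 sharpen the distribution, values above 1.0 flatten it), and top-k masks out everything except the k most likely tokens. A minimal PyTorch sketch of one sampling step:

```python
import torch

def sample_next_token(logits: torch.Tensor, temperature: float = 0.7, top_k: int = 50) -> int:
    """Sample one token id from a (vocab_size,) logits vector."""
    logits = logits / max(temperature, 1e-5)           # <1.0 sharpens, >1.0 flattens
    if top_k > 0:
        kth = torch.topk(logits, top_k).values[-1]     # k-th largest logit
        logits = logits.masked_fill(logits < kth, float("-inf"))
    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()

# e.g. next_id = sample_next_token(model_logits[-1], temperature=0.7, top_k=50)
```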
## πŸ”„ ZeroGPU Benefits
- **Free GPU Access**: No cost for users to generate stories
- **Efficient Resource Usage**: GPU allocated only when needed for inference
- **Automatic Scaling**: Handles multiple concurrent users seamlessly
- **High Performance**: NVIDIA H200 acceleration for fast generation
- **No Setup Required**: Ready-to-use interface with pre-loaded model
## πŸ—οΈ Technical Implementation
### Model Features
- **Latent Attention**: Compressed key-value representations for efficiency
- **Mixture of Experts**: 8 experts with top-2 routing (sketched below)
- **SwiGLU Activation**: Gated feed-forward activation inside each expert
- **Positional Encoding**: Sinusoidal embeddings for sequence understanding
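A minimal sketch of the top-2 MoE pattern with SwiGLU experts (illustrative only: the dimensions are the README defaults, the hidden width is a guess, and the shared expert used by the actual model is omitted):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUExpert(nn.Module):
    """One expert: SwiGLU feed-forward, down(silu(gate(x)) * up(x))."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))

class Top2MoE(nn.Module):
    """Route each token to its 2 highest-scoring experts and mix the outputs."""
    def __init__(self, dim: int = 384, hidden: int = 1024, n_experts: int = 8):
        super().__init__()
        self.router = nn.Linear(dim, n_experts, bias=False)
        self.experts = nn.ModuleList(SwiGLUExpert(dim, hidden) for _ in range(n_experts))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (n_tokens, dim)
        scores, idx = self.router(x).topk(2, dim=-1)      # both (n_tokens, 2)
        weights = scores.softmax(dim=-1)                  # normalize over the 2 picks
        out = torch.zeros_like(x)
        for slot in range(2):                             # 1st and 2nd choice
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                  # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# usage: mix 16 token vectors through the MoE layer
moe = Top2MoE()
print(moe(torch.randn(16, 384)).shape)  # torch.Size([16, 384])
```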
### Deployment Features
- **ZeroGPU Decorator**: `@spaces.GPU(duration=120)` for dynamic allocation
- **Optimized Loading**: Efficient model loading and initialization
- **Error Handling**: Robust error management for better user experience
- **Real-time Feedback**: Live generation status and results
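The core ZeroGPU pattern is a module-level model plus a decorated handler: the process starts on CPU, and a GPU is attached only while the decorated function runs. A simplified sketch (the placeholder model and generation stub stand in for this repo's actual code):

```python
import gradio as gr
import spaces
import torch

# Placeholder model so the sketch is self-contained; the real app
# loads the StoryKimi checkpoint here instead.
model = torch.nn.Linear(8, 8)

@spaces.GPU(duration=120)          # GPU is attached only while this call runs
def generate_story(prompt: str, max_length: int, temperature: float, top_k: int) -> str:
    model.to("cuda")               # CUDA is available inside the decorated call
    # ... run the actual sampling loop here ...
    return f"(generated continuation of: {prompt!r})"

demo = gr.Interface(
    fn=generate_story,
    inputs=[
        gr.Textbox(label="Prompt"),
        gr.Slider(10, 128, value=128, label="Max Length"),
        gr.Slider(0.1, 2.0, value=0.7, label="Temperature"),
        gr.Slider(1, 100, value=50, label="Top-k"),
    ],
    outputs=gr.Textbox(label="Story"),
)

if __name__ == "__main__":
    demo.launch()
```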
## πŸš€ Local Development
Want to run this locally or contribute? Check out the full repository:
**πŸ“ Source Code**: [YuvrajSingh-mist/SmolHub/StoryKimi](https://github.com/YuvrajSingh-mist/SmolHub/tree/main/StoryKimi)
### Quick Local Setup
```bash
# Clone the repository
git clone https://github.com/YuvrajSingh-mist/SmolHub.git
cd SmolHub/StoryKimi
# Install dependencies
chmod +x install.sh
./install.sh
# Run Gradio interface
cd gradio
python app.py
```
### Training Your Own Model
```bash
# Set your HF token for Llama-2 tokenizer access
export HF_TOKEN="your_token_here"
# Basic training
python trainer.py
# Advanced training with custom parameters
python trainer.py --embeddings_dims 512 --experts 16 --epochs 5
```
## πŸ“Š Model Performance
The model has been trained on diverse text data and shows strong performance in:
- **Story Generation**: Creative and coherent narrative creation
- **Text Continuation**: Natural extension of given prompts
- **Style Adaptation**: Adapting to different writing styles and genres
- **Character Development**: Creating consistent characters and dialogue
## πŸ”— Related Links
- **Full Project**: [SmolHub Repository](https://github.com/YuvrajSingh-mist/SmolHub)
- **Model Weights**: [HuggingFace Model](https://huggingface.co/YuvrajSingh9886/StoryKimi)
- **Training Report**: [WandB Results](https://wandb.ai/rentio/DSV-Training/reports/SmolKimi-A-smaller-Kimi-K2---VmlldzoxMzYwNDQ4Mg?accessToken=lfs6n1y7gn8q0f0dwilta8yuwzxel45ztzbbcavwbqp7jsyv1p7cz9elflycv9fg)
- **Other Models**: [SmolMixtral](https://github.com/YuvrajSingh-mist/SmolHub/tree/main/SmolMixtral), [SmolTransformer](https://github.com/YuvrajSingh-mist/SmolHub/tree/main/SmolTransformer)
## πŸ“ License
MIT License - see the LICENSE file for details.