---
title: StoryKimi Zero
emoji: π
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.42.0
app_file: app.py
pinned: false
license: mit
hardware: zero-gpu
short_description: Generate stories with StoryKimi model using ZeroGPU
---
# StoryKimi Zero - DeepSeek V3-Inspired Model on ZeroGPU

A PyTorch implementation of a DeepSeek V3-inspired transformer with Mixture of Experts (MoE), Latent Attention, and other advanced features, deployed on Hugging Face Spaces with ZeroGPU for efficient inference.
## Training Results & Model Weights

View Training Report: StoryKimi Training Results on WandB

Pre-trained Weights:
- Hugging Face Model: YuvrajSingh9886/StoryKimi
- WandB Checkpoints: Check the WandB report above for additional trained model checkpoints
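
The published weights can be pulled locally with the `huggingface_hub` client. A minimal sketch, assuming only the repo id shown above; inspect the downloaded directory for the actual checkpoint filenames:

```python
# Minimal sketch: download the StoryKimi model repo from the Hugging Face Hub.
# Only the repo id is taken from this README.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="YuvrajSingh9886/StoryKimi")
print(local_dir)  # path to the cached copy of the model repository
```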
## Features
- ZeroGPU Integration: Dynamic GPU allocation with NVIDIA H200 slices (70GB VRAM)
- Latent Attention: Efficient attention mechanism with compressed key-value representations
- Mixture of Experts (MoE): 8 experts with top-2 routing and shared expert support
- SWiGLU Activation: Advanced activation function in expert layers
- Sinusoidal Positional Embeddings: Position encoding for sequence understanding (see the sketch after this list)
- Interactive Interface: User-friendly Gradio interface with real-time generation
- Configurable Sampling: Top-k sampling with temperature control
- Real-time Generation: Fast inference with automatic scaling
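
The sinusoidal positional embeddings above follow the standard sin/cos formulation. A minimal, generic sketch (the function name is illustrative and the 128/384 defaults are taken from the configuration below, not from the repository's code):

```python
# Generic sinusoidal positional embeddings (standard sin/cos formulation).
# Illustrative only; not necessarily the exact implementation in StoryKimi.
import math
import torch

def sinusoidal_embeddings(block_size: int = 128, dim: int = 384) -> torch.Tensor:
    positions = torch.arange(block_size, dtype=torch.float32).unsqueeze(1)          # [T, 1]
    freqs = torch.exp(-math.log(10000.0) * torch.arange(0, dim, 2).float() / dim)   # [dim/2]
    pe = torch.zeros(block_size, dim)
    pe[:, 0::2] = torch.sin(positions * freqs)  # even channels: sine
    pe[:, 1::2] = torch.cos(positions * freqs)  # odd channels: cosine
    return pe  # added to the token embeddings before the decoder stack
```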
## Model Architecture

### Default Configuration
- Embedding Dimensions: 384
- Decoder Layers: 6
- Attention Heads: 8
- MoE Experts: 8 (top-2 routing)
- Block Size: 128 tokens
- Vocabulary Size: Based on Llama-2-7b tokenizer (~32,000 tokens)
- Latent Dimension: 64 (for compressed attention)
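
For reference, these defaults can be collected into a small config object. This is a hedged sketch with mostly hypothetical field names (`embeddings_dims` and `experts` mirror the `trainer.py` flags shown later; the rest are illustrative), not the repository's actual class:

```python
# Illustrative config object for the defaults listed above; field names are assumptions.
from dataclasses import dataclass

@dataclass
class StoryKimiConfig:
    embeddings_dims: int = 384   # embedding dimension
    decoder_layers: int = 6      # number of decoder blocks
    attn_heads: int = 8          # attention heads
    experts: int = 8             # MoE experts
    top_k_experts: int = 2       # experts routed per token
    block_size: int = 128        # context length in tokens
    vocab_size: int = 32000      # ~Llama-2-7b tokenizer vocabulary
    latent_dim: int = 64         # compressed KV dimension for latent attention
```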
### ZeroGPU Configuration
- GPU Type: NVIDIA H200 slice
- Available VRAM: 70GB per workload
- Max Duration: 120 seconds per generation
- Deployment: Hugging Face Spaces with automatic scaling
## Usage

1. Enter your story prompt in the text box
2. Select a model checkpoint (Checkpoint 2000 available)
3. Adjust the generation parameters:
   - Max Length: 10-128 tokens
   - Temperature: 0.1-2.0 (creativity vs. coherence)
   - Top-k: 1-100 (vocabulary filtering)
4. Click "Generate Text" to create your AI-generated story
5. Enjoy your personalized story!
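
If you would rather call the Space programmatically than click through the UI, the `gradio_client` package can drive it. Everything below except the package itself is an assumption (Space id, argument order, `api_name`); check the Space's "Use via API" panel for the real signature:

```python
# Hedged sketch: the Space id, argument order, and api_name are assumptions.
from gradio_client import Client

client = Client("YuvrajSingh9886/StoryKimi-Zero")  # hypothetical Space id
story = client.predict(
    "Once upon a time, a tiny robot learned to paint,",  # prompt
    "Checkpoint 2000",  # model checkpoint
    100,                # max length (tokens)
    0.7,                # temperature
    50,                 # top-k
    api_name="/predict",
)
print(story)
```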
## Generation Tips
- Lower temperature (0.1-0.7) for more coherent and focused stories
- Higher temperature (0.8-2.0) for more creative and diverse outputs
- Adjust top-k to control vocabulary diversity and randomness
- Use descriptive prompts for better and more relevant results
- Experiment with different lengths to find your preferred story format
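
To make the temperature and top-k trade-off concrete, here is a minimal, generic sampling sketch (not the app's exact code): temperature rescales the logits before the softmax, and top-k restricts sampling to the k most likely tokens.

```python
# Generic temperature + top-k sampling over a vector of next-token logits.
import torch

def sample_next_token(logits: torch.Tensor, temperature: float = 0.7, top_k: int = 50) -> int:
    logits = logits / max(temperature, 1e-5)          # low T sharpens, high T flattens
    top_vals, top_idx = torch.topk(logits, k=top_k)   # keep the k most likely tokens
    probs = torch.softmax(top_vals, dim=-1)
    choice = torch.multinomial(probs, num_samples=1)  # sample within the top-k set
    return top_idx[choice].item()
```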
## ZeroGPU Benefits
- Free GPU Access: No cost for users to generate stories
- Efficient Resource Usage: GPU allocated only when needed for inference
- Automatic Scaling: Handles multiple concurrent users seamlessly
- High Performance: NVIDIA H200 acceleration for fast generation
- No Setup Required: Ready-to-use interface with pre-loaded model
## Technical Implementation

### Model Features
- Latent Attention: Compressed key-value representations for efficiency
- Mixture of Experts: 8 experts with intelligent routing (sketched after this list)
- Advanced Activation: SWiGLU for better performance
- Positional Encoding: Sinusoidal embeddings for sequence understanding
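
The MoE and SwiGLU bullets can be illustrated with a generic top-2 routing layer. This is a minimal sketch under assumed dimensions (384-d embeddings, 8 experts); the shared expert and any load-balancing loss in the real model are omitted, and the module names are hypothetical:

```python
# Generic top-2 MoE layer with SwiGLU experts (illustrative, not the repo's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUExpert(nn.Module):
    def __init__(self, dim: int = 384, hidden: int = 1024):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden, bias=False)
        self.w_up = nn.Linear(dim, hidden, bias=False)
        self.w_down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))  # SwiGLU

class Top2MoE(nn.Module):
    def __init__(self, dim: int = 384, num_experts: int = 8):
        super().__init__()
        self.router = nn.Linear(dim, num_experts, bias=False)
        self.experts = nn.ModuleList(SwiGLUExpert(dim) for _ in range(num_experts))

    def forward(self, x):                                 # x: [tokens, dim]
        weights, idx = self.router(x).topk(2, dim=-1)     # 2 best experts per token
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(2):                             # naive dispatch loop, clarity over speed
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out
```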
### Deployment Features
- ZeroGPU Decorator: `@spaces.GPU(duration=120)` for dynamic allocation (see the sketch after this list)
- Optimized Loading: Efficient model loading and initialization
- Error Handling: Robust error management for better user experience
- Real-time Feedback: Live generation status and results
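
The ZeroGPU pattern is to load the model once at startup (on CPU), then request a GPU slice only for the duration of each call via the `spaces.GPU` decorator. A minimal sketch with an illustrative stand-in model (the real app loads the StoryKimi checkpoint here):

```python
# Minimal ZeroGPU sketch: the GPU exists only inside the decorated function.
import spaces
import torch
import torch.nn as nn

model = nn.Linear(384, 384)  # illustrative stand-in for the loaded StoryKimi model

@spaces.GPU(duration=120)    # ZeroGPU attaches an H200 slice for up to 120 s per call
def generate(prompt: str, max_length: int, temperature: float, top_k: int) -> str:
    model.to("cuda")         # move weights onto the freshly allocated GPU
    with torch.no_grad():
        ...                  # run tokenization and sampling on the GPU here
    return "generated story placeholder"
```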
## Local Development
Want to run this locally or contribute? Check out the full repository:
Source Code: YuvrajSingh-mist/SmolHub/StoryKimi

### Quick Local Setup
```bash
# Clone the repository
git clone https://github.com/YuvrajSingh-mist/SmolHub.git
cd SmolHub/StoryKimi

# Install dependencies
chmod +x install.sh
./install.sh

# Run Gradio interface
cd gradio
python app.py
```
### Training Your Own Model

```bash
# Set your HF token for Llama-2 tokenizer access
export HF_TOKEN="your_token_here"

# Basic training
python trainer.py

# Advanced training with custom parameters
python trainer.py --embeddings_dims 512 --experts 16 --epochs 5
```
## Model Performance
The model has been trained on diverse text data and shows strong performance in:
- Story Generation: Creative and coherent narrative creation
- Text Continuation: Natural extension of given prompts
- Style Adaptation: Adapting to different writing styles and genres
- Character Development: Creating consistent characters and dialogue
## Related Links
- Full Project: SmolHub Repository
- Model Weights: HuggingFace Model
- Training Report: WandB Results
- Other Models: SmolMixtral, SmolTransformer
## License
MIT License - See LICENSE file for details
