---
title: StoryKimi Zero
emoji: πŸš€
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.42.0
app_file: app.py
pinned: false
license: mit
hardware: zero-gpu
short_description: Generate stories with StoryKimi model using ZeroGPU
---

# StoryKimi Zero - DeepSeek V3-Inspired Model on ZeroGPU

A PyTorch implementation of a DeepSeek V3-inspired transformer model with Mixture of Experts (MoE), latent attention, and other advanced features, deployed on Hugging Face Spaces with ZeroGPU for efficient inference.

*(StoryKimi model banner image)*

## πŸ“Š Training Results & Model Weights

**πŸ“ˆ View Training Report:** StoryKimi Training Results on WandB

**πŸ’Ύ Pre-trained Weights:**

- **Hugging Face Model:** YuvrajSingh9886/StoryKimi
- **WandB Checkpoints:** see the WandB report above for additional trained model checkpoints

## 🌟 Features

- **ZeroGPU Integration:** Dynamic GPU allocation on NVIDIA H200 slices (70 GB VRAM)
- **Latent Attention:** Efficient attention mechanism with compressed key-value representations
- **Mixture of Experts (MoE):** 8 experts with top-2 routing and shared expert support
- **SwiGLU Activation:** Gated activation function in the expert layers
- **Sinusoidal Positional Embeddings:** Position encoding for sequence understanding
- **Interactive Interface:** User-friendly Gradio interface with real-time generation
- **Configurable Sampling:** Top-k sampling with temperature control
- **Real-time Generation:** Fast inference with automatic scaling

## πŸ”§ Model Architecture

### Default Configuration

The defaults below are grouped into a config sketch after this list.

- **Embedding Dimensions:** 384
- **Decoder Layers:** 6
- **Attention Heads:** 8
- **MoE Experts:** 8 (top-2 routing)
- **Block Size:** 128 tokens
- **Vocabulary Size:** ~32,000 tokens (Llama-2-7b tokenizer)
- **Latent Dimension:** 64 (for compressed attention)
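
For orientation, here is a hypothetical sketch of these defaults as a single config object. Field names follow the trainer flags shown later (such as `--embeddings_dims` and `--experts`), but this is not the repository's actual config class:

```python
from dataclasses import dataclass

@dataclass
class StoryKimiConfig:
    # Hypothetical container mirroring the defaults listed above.
    embeddings_dims: int = 384   # embedding / model width
    decoder_layers: int = 6      # number of decoder blocks
    attention_heads: int = 8     # attention heads per layer
    experts: int = 8             # MoE experts per layer
    top_k_experts: int = 2       # each token routes to its top-2 experts
    block_size: int = 128        # maximum context length in tokens
    vocab_size: int = 32000      # Llama-2-7b tokenizer vocabulary
    latent_dim: int = 64         # compressed KV width for latent attention
```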

### ZeroGPU Configuration

- **GPU Type:** NVIDIA H200 slice
- **Available VRAM:** 70 GB per workload
- **Max Duration:** 120 seconds per generation
- **Deployment:** Hugging Face Spaces with automatic scaling

## 🎯 Usage

1. Enter your story prompt in the text box
2. Select a model checkpoint (checkpoint 2000 is available)
3. Adjust the generation parameters:
   - **Max Length:** 10-128 tokens
   - **Temperature:** 0.1-2.0 (coherence vs. creativity)
   - **Top-k:** 1-100 (vocabulary filtering)
4. Click "Generate Text" to create your AI-generated story
5. Enjoy your personalized story!

## πŸ’‘ Generation Tips

- Lower temperature (0.1-0.7) for more coherent and focused stories
- Higher temperature (0.8-2.0) for more creative and diverse outputs
- Adjust top-k to control vocabulary diversity and randomness (see the sampling sketch below)
- Use descriptive prompts for better and more relevant results
- Experiment with different lengths to find your preferred story format
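
Under the hood, top-k sampling with temperature typically looks like the following minimal sketch (a generic implementation of the technique, not the app's exact code):

```python
import torch
import torch.nn.functional as F

def sample_next_token(logits: torch.Tensor, temperature: float = 0.7, top_k: int = 50) -> int:
    """Sample one token id from a [vocab_size] logits vector."""
    logits = logits / max(temperature, 1e-5)          # <1.0 sharpens, >1.0 flattens
    if top_k > 0:
        values, _ = torch.topk(logits, min(top_k, logits.size(-1)))
        logits[logits < values[-1]] = float("-inf")   # drop everything outside the top-k
    probs = F.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()
```

Generation repeats this step, appending each sampled token to the context, until the requested length is reached.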

## πŸ”„ ZeroGPU Benefits

- **Free GPU Access:** No cost for users to generate stories
- **Efficient Resource Usage:** GPU allocated only when needed for inference
- **Automatic Scaling:** Handles multiple concurrent users seamlessly
- **High Performance:** NVIDIA H200 acceleration for fast generation
- **No Setup Required:** Ready-to-use interface with a pre-loaded model

πŸ—οΈ Technical Implementation

Model Features

  • Latent Attention: Compressed key-value representations for efficiency
  • Mixture of Experts: 8 experts with intelligent routing
  • Advanced Activation: SWiGLU for better performance
  • Positional Encoding: Sinusoidal embeddings for sequence understanding
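
To make "compressed key-value representations" concrete: keys and values are squeezed through a small latent bottleneck (the 64-dimensional latent from the configuration above), so a KV cache can store the compressed latent instead of full-width keys and values. The following is an illustrative simplification in the spirit of DeepSeek's multi-head latent attention, not the repository's exact module:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentAttention(nn.Module):
    """Keys/values pass through a latent bottleneck before attention, so a
    KV cache would hold [seq, latent_dim] entries instead of [seq, dim]."""
    def __init__(self, dim: int = 384, heads: int = 8, latent_dim: int = 64):
        super().__init__()
        self.heads = heads
        self.q_proj = nn.Linear(dim, dim)
        self.kv_down = nn.Linear(dim, latent_dim)  # compress to the latent space
        self.k_up = nn.Linear(latent_dim, dim)     # expand keys back out
        self.v_up = nn.Linear(latent_dim, dim)     # expand values back out
        self.out = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        latent = self.kv_down(x)                   # [b, t, latent_dim], the cacheable part
        q = self.q_proj(x).view(b, t, self.heads, -1).transpose(1, 2)
        k = self.k_up(latent).view(b, t, self.heads, -1).transpose(1, 2)
        v = self.v_up(latent).view(b, t, self.heads, -1).transpose(1, 2)
        att = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.out(att.transpose(1, 2).reshape(b, t, d))
```

And a minimal sketch of a top-2 MoE layer with SwiGLU experts and a shared expert, reusing the imports above (the hidden width of 1024 is an assumption, not the repository's value):

```python
class SwiGLUExpert(nn.Module):
    """SwiGLU feed-forward: silu(W1 x) gates (W3 x), then W2 projects back."""
    def __init__(self, dim: int = 384, hidden: int = 1024):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden, bias=False)
        self.w3 = nn.Linear(dim, hidden, bias=False)
        self.w2 = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(F.silu(self.w1(x)) * self.w3(x))


class Top2MoE(nn.Module):
    """Routes each token to its two highest-scoring experts and blends their
    outputs by the softmaxed router scores; a shared expert always runs."""
    def __init__(self, dim: int = 384, n_experts: int = 8):
        super().__init__()
        self.router = nn.Linear(dim, n_experts, bias=False)
        self.experts = nn.ModuleList(SwiGLUExpert(dim) for _ in range(n_experts))
        self.shared = SwiGLUExpert(dim)            # shared expert support

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: [tokens, dim]
        weights, idx = self.router(x).topk(2, dim=-1)    # top-2 routing
        weights = weights.softmax(dim=-1)
        out = self.shared(x)
        for slot in range(2):                      # both routed experts per token
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out
```

Only two of the eight experts run per token, so the layer adds parameter capacity without a proportional increase in compute.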

### Deployment Features

- **ZeroGPU Decorator:** `@spaces.GPU(duration=120)` for dynamic allocation (example below)
- **Optimized Loading:** Efficient model loading and initialization
- **Error Handling:** Robust error management for a better user experience
- **Real-time Feedback:** Live generation status and results
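
The ZeroGPU pattern is to load the model on CPU at startup and request a GPU only around the decorated function. A minimal sketch follows; the model stand-in and function body are placeholders, not the Space's actual app.py:

```python
import spaces                        # ZeroGPU helper preinstalled on Spaces
import torch

# Loaded once at startup, on CPU, so no GPU is held while the app idles.
model = torch.nn.Identity()          # placeholder for the StoryKimi model

@spaces.GPU(duration=120)            # a GPU slice is attached only for this call
def generate(prompt: str) -> str:
    model.to("cuda")                 # the H200 slice is visible only in here
    # ... tokenize the prompt, sample tokens on the GPU, decode the result ...
    return prompt                    # placeholder return
```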

## πŸš€ Local Development

Want to run this locally or contribute? Check out the full repository:

**πŸ“ Source Code:** YuvrajSingh-mist/SmolHub/StoryKimi

### Quick Local Setup

```bash
# Clone the repository
git clone https://github.com/YuvrajSingh-mist/SmolHub.git
cd SmolHub/StoryKimi

# Install dependencies
chmod +x install.sh
./install.sh

# Run the Gradio interface
cd gradio
python app.py
```

### Training Your Own Model

```bash
# Set your HF token for Llama-2 tokenizer access
export HF_TOKEN="your_token_here"

# Basic training
python trainer.py

# Advanced training with custom parameters
python trainer.py --embeddings_dims 512 --experts 16 --epochs 5
```

## πŸ“Š Model Performance

The model has been trained on diverse text data and shows strong performance in:

- **Story Generation:** Creative and coherent narrative creation
- **Text Continuation:** Natural extension of given prompts
- **Style Adaptation:** Adapting to different writing styles and genres
- **Character Development:** Creating consistent characters and dialogue

πŸ“ License

MIT License - See LICENSE file for details