---
title: StoryKimi Zero
emoji: πŸš€
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.42.0
app_file: app.py
pinned: false
license: mit
hardware: zero-gpu
short_description: Generate stories with StoryKimi model using ZeroGPU
---
# StoryKimi Zero - DeepSeek V3-Inspired Model on ZeroGPU
A PyTorch implementation of a DeepSeek V3-inspired transformer with Mixture of Experts (MoE), latent attention, and related architectural features, deployed on Hugging Face Spaces with ZeroGPU for efficient inference.
![StoryKimi Model](https://huggingface.co/YuvrajSingh9886/StoryKimi/resolve/main/images/image.png)
## πŸ“Š Training Results & Model Weights
**πŸ“ˆ View Training Report**: [StoryKimi Training Results on WandB](https://wandb.ai/rentio/DSV-Training/reports/SmolKimi-A-smaller-Kimi-K2---VmlldzoxMzYwNDQ4Mg?accessToken=lfs6n1y7gn8q0f0dwilta8yuwzxel45ztzbbcavwbqp7jsyv1p7cz9elflycv9fg)
**πŸ’Ύ Pre-trained Weights**:
- **Hugging Face Model**: [YuvrajSingh9886/StoryKimi](https://huggingface.co/YuvrajSingh9886/StoryKimi)
- **WandB Checkpoints**: Check the WandB report above for additional trained model checkpoints
## 🌟 Features
- **ZeroGPU Integration**: Dynamic GPU allocation with NVIDIA H200 slices (70GB VRAM)
- **Latent Attention**: Efficient attention mechanism with compressed key-value representations
- **Mixture of Experts (MoE)**: 8 experts with top-2 routing and shared expert support
- **SwiGLU Activation**: Gated activation function in the expert feed-forward layers
- **Sinusoidal Positional Embeddings**: Position encoding for sequence understanding
- **Interactive Interface**: User-friendly Gradio interface with real-time generation
- **Configurable Sampling**: Top-k sampling with temperature control
- **Real-time Generation**: Fast inference with automatic scaling
## πŸ”§ Model Architecture
### Default Configuration
- **Embedding Dimensions**: 384
- **Decoder Layers**: 6
- **Attention Heads**: 8
- **MoE Experts**: 8 (top-2 routing)
- **Block Size**: 128 tokens
- **Vocabulary Size**: Based on Llama-2-7b tokenizer (~32,000 tokens)
- **Latent Dimension**: 64 (for compressed attention)
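Taken together, these defaults correspond to a small config object along the following lines (a hypothetical sketch; the field names mirror `trainer.py` CLI flags such as `--embeddings_dims` and `--experts`, but check the repo for the authoritative definition):

```python
from dataclasses import dataclass

@dataclass
class StoryKimiConfig:
    # Hypothetical field names mirroring the README defaults; see
    # trainer.py in the repo for the authoritative configuration.
    embeddings_dims: int = 384   # model / embedding width
    decoder_layers: int = 6      # transformer decoder blocks
    attention_heads: int = 8     # heads per attention layer
    experts: int = 8             # MoE experts per layer
    top_k_experts: int = 2       # each token is routed to its top-2 experts
    block_size: int = 128        # maximum context length (tokens)
    vocab_size: int = 32000      # Llama-2-7b tokenizer vocabulary
    latent_dim: int = 64         # compressed key/value width for latent attention
```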
### ZeroGPU Configuration
- **GPU Type**: NVIDIA H200 slice
- **Available VRAM**: 70GB per workload
- **Max Duration**: 120 seconds per generation
- **Deployment**: Hugging Face Spaces with automatic scaling
## 🎯 Usage
1. **Enter your story prompt** in the text box
2. **Select model checkpoint** (Checkpoint 2000 available)
3. **Adjust generation parameters**:
- **Max Length**: 10-128 tokens
- **Temperature**: 0.1-2.0 (creativity vs coherence)
- **Top-k**: 1-100 (vocabulary filtering)
4. **Click "Generate Text"** to create your AI-generated story
5. **Enjoy your personalized story!**
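Prefer scripting over clicking? A running Space can also be called with `gradio_client` (a sketch: the Space ID, argument order, and endpoint name here are assumptions; check the Space's "Use via API" panel for the exact signature):

```python
# pip install gradio_client
from gradio_client import Client

# Space ID and parameter order are assumptions; verify against the
# Space's "Use via API" panel before relying on them.
client = Client("YuvrajSingh9886/StoryKimi-Zero")
result = client.predict(
    "Once upon a time in a floating city,",  # story prompt
    128,    # max length (tokens)
    0.7,    # temperature
    50,     # top-k
    api_name="/predict",
)
print(result)
```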
## πŸ’‘ Generation Tips
- **Lower temperature** (0.1-0.7) for more coherent and focused stories
- **Higher temperature** (0.8-2.0) for more creative and diverse outputs
- **Adjust top-k** to control vocabulary diversity and randomness
- **Use descriptive prompts** for better and more relevant results
- **Experiment with different lengths** to find your preferred story format
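To make these knobs concrete: temperature divides the logits before sampling (values below 1.0 sharpen the distribution, values above 1.0 flatten it), and top-k masks out everything except the k most likely tokens. A minimal PyTorch sketch of one sampling step:

```python
import torch

def sample_next_token(logits: torch.Tensor, temperature: float = 0.7, top_k: int = 50) -> int:
    """Sample one token id from a (vocab_size,) logits vector."""
    logits = logits / max(temperature, 1e-5)           # <1.0 sharpens, >1.0 flattens
    if top_k > 0:
        kth = torch.topk(logits, top_k).values[-1]     # k-th largest logit
        logits = logits.masked_fill(logits < kth, float("-inf"))
    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()

# e.g. next_id = sample_next_token(model_logits[-1], temperature=0.7, top_k=50)
```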
## πŸ”„ ZeroGPU Benefits
- **Free GPU Access**: No cost for users to generate stories
- **Efficient Resource Usage**: GPU allocated only when needed for inference
- **Automatic Scaling**: Handles multiple concurrent users seamlessly
- **High Performance**: NVIDIA H200 acceleration for fast generation
- **No Setup Required**: Ready-to-use interface with pre-loaded model
## πŸ—οΈ Technical Implementation
### Model Features
- **Latent Attention**: Compressed key-value representations for efficiency
- **Mixture of Experts**: 8 experts with top-2 routing (sketched below)
- **SwiGLU Activation**: Gated feed-forward activation inside each expert
- **Positional Encoding**: Sinusoidal embeddings for sequence understanding
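A minimal sketch of the top-2 MoE pattern with SwiGLU experts (illustrative only: the dimensions are the README defaults, the hidden width is a guess, and the shared expert used by the actual model is omitted):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUExpert(nn.Module):
    """One expert: SwiGLU feed-forward, down(silu(gate(x)) * up(x))."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))

class Top2MoE(nn.Module):
    """Route each token to its 2 highest-scoring experts and mix the outputs."""
    def __init__(self, dim: int = 384, hidden: int = 1024, n_experts: int = 8):
        super().__init__()
        self.router = nn.Linear(dim, n_experts, bias=False)
        self.experts = nn.ModuleList(SwiGLUExpert(dim, hidden) for _ in range(n_experts))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (n_tokens, dim)
        scores, idx = self.router(x).topk(2, dim=-1)      # both (n_tokens, 2)
        weights = scores.softmax(dim=-1)                  # normalize over the 2 picks
        out = torch.zeros_like(x)
        for slot in range(2):                             # 1st and 2nd choice
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                  # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# usage: mix 16 token vectors through the MoE layer
moe = Top2MoE()
print(moe(torch.randn(16, 384)).shape)  # torch.Size([16, 384])
```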
### Deployment Features
- **ZeroGPU Decorator**: `@spaces.GPU(duration=120)` for dynamic allocation
- **Optimized Loading**: Efficient model loading and initialization
- **Error Handling**: Robust error management for better user experience
- **Real-time Feedback**: Live generation status and results
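The core ZeroGPU pattern is a module-level model plus a decorated handler: the process starts on CPU, and a GPU is attached only while the decorated function runs. A simplified sketch (the placeholder model and generation stub stand in for this repo's actual code):

```python
import gradio as gr
import spaces
import torch

# Placeholder model so the sketch is self-contained; the real app
# loads the StoryKimi checkpoint here instead.
model = torch.nn.Linear(8, 8)

@spaces.GPU(duration=120)          # GPU is attached only while this call runs
def generate_story(prompt: str, max_length: int, temperature: float, top_k: int) -> str:
    model.to("cuda")               # CUDA is available inside the decorated call
    # ... run the actual sampling loop here ...
    return f"(generated continuation of: {prompt!r})"

demo = gr.Interface(
    fn=generate_story,
    inputs=[
        gr.Textbox(label="Prompt"),
        gr.Slider(10, 128, value=128, label="Max Length"),
        gr.Slider(0.1, 2.0, value=0.7, label="Temperature"),
        gr.Slider(1, 100, value=50, label="Top-k"),
    ],
    outputs=gr.Textbox(label="Story"),
)

if __name__ == "__main__":
    demo.launch()
```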
## πŸš€ Local Development
Want to run this locally or contribute? Check out the full repository:
**πŸ“ Source Code**: [YuvrajSingh-mist/SmolHub/StoryKimi](https://github.com/YuvrajSingh-mist/SmolHub/tree/main/StoryKimi)
### Quick Local Setup
```bash
# Clone the repository
git clone https://github.com/YuvrajSingh-mist/SmolHub.git
cd SmolHub/StoryKimi
# Install dependencies
chmod +x install.sh
./install.sh
# Run Gradio interface
cd gradio
python app.py
```
### Training Your Own Model
```bash
# Set your HF token for Llama-2 tokenizer access
export HF_TOKEN="your_token_here"
# Basic training
python trainer.py
# Advanced training with custom parameters
python trainer.py --embeddings_dims 512 --experts 16 --epochs 5
```
## πŸ“Š Model Performance
The model has been trained on diverse text data and shows strong performance in:
- **Story Generation**: Creative and coherent narrative creation
- **Text Continuation**: Natural extension of given prompts
- **Style Adaptation**: Adapting to different writing styles and genres
- **Character Development**: Creating consistent characters and dialogue
## πŸ”— Related Links
- **Full Project**: [SmolHub Repository](https://github.com/YuvrajSingh-mist/SmolHub)
- **Model Weights**: [HuggingFace Model](https://huggingface.co/YuvrajSingh9886/StoryKimi)
- **Training Report**: [WandB Results](https://wandb.ai/rentio/DSV-Training/reports/SmolKimi-A-smaller-Kimi-K2---VmlldzoxMzYwNDQ4Mg?accessToken=lfs6n1y7gn8q0f0dwilta8yuwzxel45ztzbbcavwbqp7jsyv1p7cz9elflycv9fg)
- **Other Models**: [SmolMixtral](https://github.com/YuvrajSingh-mist/SmolHub/tree/main/SmolMixtral), [SmolTransformer](https://github.com/YuvrajSingh-mist/SmolHub/tree/main/SmolTransformer)
## πŸ“ License
MIT License - see the LICENSE file for details.