Deployment Guide for HuggingFace Space with ZeroGPU
Pre-Deployment Checklist
All code is ready. Here's what's configured:
- Model: microsoft/Phi-3-mini-4k-instruct (3.8B params)
- ZeroGPU support: enabled with the @spaces.GPU decorator
- Local/Space compatibility: auto-detects environment
- Usage tracking: 50 requests/day per user
- Requirements: all dependencies listed
- README: updated with instructions
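The ZeroGPU decorator plus local/Space auto-detection can be sketched roughly as below. Function names here are hypothetical; the app's actual code may differ.

```python
import os


def in_hf_space() -> bool:
    # Hugging Face Spaces inject SPACE_ID into the environment
    return "SPACE_ID" in os.environ


try:
    # The `spaces` package provides the @spaces.GPU decorator;
    # outside a ZeroGPU Space it has no effect on the wrapped function.
    import spaces
    gpu = spaces.GPU
except ImportError:
    # Local fallback: run the function as-is, without GPU scheduling
    def gpu(fn):
        return fn


@gpu
def generate(prompt: str) -> str:
    # Placeholder for the real Phi-3 inference call
    return "reply to: " + prompt
```

Because the fallback is a no-op decorator, the same file runs unchanged on your laptop and on the Space.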
Deployment Steps
Step 1: Push Code to Your Space
```
cd /Users/tom/code/cojournalist-data

# If not already initialized
git init
git remote add space https://huggingface.co/spaces/YOUR_USERNAME/cojournalist-data

# Or if already connected
git add .
git commit -m "Deploy Phi-3-mini with ZeroGPU and usage tracking"
git push space main
```
Step 2: Configure Space Hardware
- Go to your Space: https://huggingface.co/spaces/YOUR_USERNAME/cojournalist-data
- Click Settings (the gear icon in the top right)
- Scroll to the Hardware section
- Select ZeroGPU from the dropdown
- Click Save
- The Space will restart automatically
Step 3: Wait for Build
The Space will:
- Install dependencies (~2-3 minutes)
- Download Phi-3-mini model (~1-2 minutes, 7.6GB)
- Load model into memory (~30 seconds)
- Launch Gradio interface
Total build time: ~5-7 minutes
Step 4: Test Your Space
Once running, test with these queries:
- English: "Who are the parliamentarians from Zurich?"
- German: "Zeige mir aktuelle Abstimmungen zur Klimapolitik"
- French: "Qui sont les parlementaires de Zurich?"
- Italian: "Mostrami i voti recenti sulla politica climatica"
Space Settings Summary
Hardware
- Type: ZeroGPU
- Cost: FREE (included with Team plan)
- GPU: Nvidia H200 (70GB VRAM)
- Allocation: Dynamic (only when needed)
Environment Variables (Optional)
If you want to configure anything:
- HF_TOKEN: your HuggingFace token (needed for private models; not needed for Phi-3)
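In code, such a token is typically read from the environment and forwarded to the model loader. A minimal sketch (Phi-3 is public, so `None` is fine):

```python
import os

# None when the variable is unset, which is fine for public models like Phi-3
hf_token = os.environ.get("HF_TOKEN")

# For a private model, it would be passed through to the loader, e.g.:
#   AutoModelForCausalLM.from_pretrained(model_id, token=hf_token)
```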
Expected Behavior
First Request
- Takes ~5-10 seconds (GPU allocation + inference)
- Subsequent requests faster (~2-5 seconds)
Rate Limiting
- 50 requests per day per user IP
- Error message shown when limit reached
- Resets daily at midnight UTC
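A minimal sketch of an in-memory tracker with the behavior described above (per-IP daily counts, reset at midnight UTC); the Space's actual implementation may differ:

```python
from collections import defaultdict
from datetime import datetime, timezone

DAILY_LIMIT = 50


class UsageTracker:
    """In-memory per-IP counter; state is lost on Space restart."""

    def __init__(self, limit: int = DAILY_LIMIT):
        self.limit = limit
        self.day = self._today()
        self.counts = defaultdict(int)

    @staticmethod
    def _today():
        return datetime.now(timezone.utc).date()

    def allow(self, ip: str) -> bool:
        today = self._today()
        if today != self.day:
            # Midnight UTC has passed: reset all counters
            self.day = today
            self.counts.clear()
        if self.counts[ip] >= self.limit:
            return False  # caller should show the rate-limit error message
        self.counts[ip] += 1
        return True
```

Because the counts live in a plain dict, restarting the Space also resets everyone's quota, which matches the behavior noted in Troubleshooting below.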
Model Loading
- Happens once on Space startup
- Cached for subsequent requests
- No reload needed between requests
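The load-once behavior can be sketched with a module-level cache. A stub loader is used here instead of the real `transformers` call (which would download the 7.6GB model):

```python
from functools import lru_cache

LOAD_COUNT = 0  # only here to make the single-load behavior observable


@lru_cache(maxsize=1)
def get_model():
    # In the real app this would be something like:
    #   AutoModelForCausalLM.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
    global LOAD_COUNT
    LOAD_COUNT += 1
    return object()  # stand-in for the loaded model


# Every request reuses the same cached instance:
m1, m2 = get_model(), get_model()
```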
Troubleshooting
"Model not loading"
- Check Space logs for errors
- Verify ZeroGPU is selected in Hardware settings
- Ensure spaces>=0.28.0 is in requirements.txt
"Out of memory"
- This shouldn't happen with ZeroGPU (70GB VRAM)
- If it does, contact HF support
"Rate limit not working"
- Usage tracker uses in-memory storage
- Resets on Space restart
- IP-based tracking (works in production)
"Slow inference"
- First request allocates GPU (slower)
- Subsequent requests use cached allocation
- Normal: 2-5 seconds per request
Cost Breakdown
- Team Plan: $20/user/month (you already have this)
- ZeroGPU: FREE (included)
- Inference: FREE (no API calls)
- Storage: FREE (model cached by HF)
Total additional cost: $0/month
Updates & Maintenance
To update your Space:
```
# Make changes to code
git add .
git commit -m "Update: description of changes"
git push space main
```
Space will automatically rebuild and redeploy.
Monitoring Usage
Check your Space's metrics:
- Go to Space page
- Click "Analytics" tab
- View daily/weekly usage stats
Next Steps After Deployment
- Test all 4 languages
- Verify tool calling works
- Check rate limiting
- Monitor performance
- Adjust the system prompt if needed
- Fine-tune temperature/max_tokens if needed
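If you do tune generation settings, they are usually collected in one place so a single edit plus redeploy changes behavior everywhere. Illustrative values only, not the app's actual defaults:

```python
# Hypothetical generation settings; tweak, push, and compare outputs
GENERATION_KWARGS = {
    "max_new_tokens": 512,   # upper bound on reply length
    "temperature": 0.7,      # lower = more deterministic answers
    "do_sample": True,       # required for temperature to take effect
}
```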
Support
If you encounter issues:
- Check Space logs (Settings → Logs)
- HuggingFace Discord: https://discord.gg/huggingface
- HF Forums: https://discuss.huggingface.co/
You're ready to deploy!