# Deployment Guide for HuggingFace Space with ZeroGPU

## Pre-Deployment Checklist

All code is ready! Here's what's configured:

- Model: `microsoft/Phi-3-mini-4k-instruct` (3.8B params)
- ZeroGPU support: enabled with the `@spaces.GPU` decorator (sketched below)
- Local/Space compatibility: auto-detects the environment
- Usage tracking: 50 requests/day per user
- Requirements: all dependencies listed
- README: updated with instructions
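
For reference, here is a minimal sketch of how the `@spaces.GPU` decorator and the local/Space auto-detection typically fit together. It is illustrative rather than a copy of the project's `app.py`, and it assumes the `SPACE_ID` environment variable (set by HuggingFace inside a running Space) as the detection signal:

```python
import os

# SPACE_ID is set by HuggingFace inside a running Space; absent when local.
RUNNING_ON_SPACE = os.environ.get("SPACE_ID") is not None

if RUNNING_ON_SPACE:
    import spaces
    gpu = spaces.GPU      # requests a ZeroGPU slice for each decorated call
else:
    def gpu(fn):          # plain pass-through so the same code runs locally
        return fn

@gpu
def generate(prompt: str) -> str:
    # Real model inference goes here; on ZeroGPU the GPU is only attached
    # for the duration of this call.
    return f"echo: {prompt}"
```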
## Deployment Steps
### Step 1: Push Code to Your Space
```bash
cd /Users/tom/code/cojournalist-data
# If not already initialized
git init
git remote add space https://huggingface.co/spaces/YOUR_USERNAME/cojournalist-data
# If the Space remote is already connected, start here
git add .
git commit -m "Deploy Phi-3-mini with ZeroGPU and usage tracking"
git push space main
```
### Step 2: Configure Space Hardware
1. Go to your Space: `https://huggingface.co/spaces/YOUR_USERNAME/cojournalist-data`
2. Click **Settings** (gear icon in the top right)
3. Scroll to **Hardware** section
4. Select **ZeroGPU** from dropdown
5. Click **Save**
6. Space will restart automatically
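
If you prefer to script this step instead of clicking through the UI, recent `huggingface_hub` releases expose a hardware request API. This is a sketch under that assumption; the ZeroGPU enum value (`zero-a10g` at the time of writing) may differ across library versions:

```python
from huggingface_hub import HfApi, SpaceHardware

api = HfApi()  # uses the token from `huggingface-cli login` or HF_TOKEN
api.request_space_hardware(
    repo_id="YOUR_USERNAME/cojournalist-data",
    hardware=SpaceHardware.ZERO_A10G,  # ZeroGPU; verify against your installed version
)
```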
### Step 3: Wait for Build
The Space will:
1. Install dependencies (~2-3 minutes)
2. Download Phi-3-mini model (~1-2 minutes, 7.6GB)
3. Load model into memory (~30 seconds)
4. Launch Gradio interface
**Total build time: ~5-7 minutes**
### Step 4: Test Your Space
Once running, test with these queries (they can also be sent from a script; see the sketch after this list):
1. **English:** "Who are the parliamentarians from Zurich?"
2. **German:** "Zeige mir aktuelle Abstimmungen zur Klimapolitik" ("Show me current votes on climate policy")
3. **French:** "Qui sont les parlementaires de Zurich?" ("Who are the parliamentarians from Zurich?")
4. **Italian:** "Mostrami i voti recenti sulla politica climatica" ("Show me recent votes on climate policy")
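
To send these programmatically, `gradio_client` can call the Space's API. The endpoint name below is hypothetical, so list the real endpoints with `view_api()` first:

```python
from gradio_client import Client

client = Client("YOUR_USERNAME/cojournalist-data")
client.view_api()  # prints the endpoints and argument order exposed by the app

result = client.predict(
    "Who are the parliamentarians from Zurich?",
    api_name="/chat",  # hypothetical; replace with an endpoint shown by view_api()
)
print(result)
```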
## Space Settings Summary
### Hardware
- **Type:** ZeroGPU
- **Cost:** FREE (included with Team plan)
- **GPU:** Nvidia H200 (70GB VRAM)
- **Allocation:** Dynamic (only when needed)
### Environment Variables (Optional)
None are required for this setup; the only one you might add:
- `HF_TOKEN`: your HuggingFace token (only needed for private or gated models; Phi-3-mini is public)
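
If you do set `HF_TOKEN`, a common pattern (sketched here, not taken from the app) is to read it from the environment and pass it to the loaders; a missing token is fine for public models like Phi-3-mini:

```python
import os

from transformers import AutoTokenizer

token = os.environ.get("HF_TOKEN")  # None is acceptable for public models
tokenizer = AutoTokenizer.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    token=token,
)
```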
## Expected Behavior
### First Request
- Takes ~5-10 seconds (GPU allocation + inference)
- Subsequent requests faster (~2-5 seconds)
### Rate Limiting
- 50 requests per day per user IP
- Error message shown when limit reached
- Resets daily at midnight UTC
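
A minimal sketch of an in-memory, per-IP daily limiter matching the behavior above; the Space's actual tracker may differ:

```python
from collections import defaultdict
from datetime import datetime, timezone

DAILY_LIMIT = 50
_counts: defaultdict[tuple[str, str], int] = defaultdict(int)

def allow_request(client_ip: str) -> bool:
    """Return True if the caller is still under today's limit (UTC days)."""
    today = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    key = (client_ip, today)
    if _counts[key] >= DAILY_LIMIT:
        return False
    _counts[key] += 1
    return True
```

In a Gradio handler, the client IP is typically read from the `gr.Request` object passed to the function (`request.client.host`).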
### Model Loading
- Happens once on Space startup
- Cached for subsequent requests
- No reload needed between requests
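
A sketch of the load-once pattern (assumed, not copied from the app): the model is created at module import time, so every request reuses the same objects:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "microsoft/Phi-3-mini-4k-instruct"

# Runs once at Space startup; the ~7.6 GB download is cached on disk by HF.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
```

With ZeroGPU, the usual next step is to move the model to `"cuda"` right after loading; the `spaces` package attaches a real GPU only while a `@spaces.GPU`-decorated function runs.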
## Troubleshooting
### "Model not loading"
- Check Space logs for errors
- Verify ZeroGPU is selected in Hardware settings
- Ensure `spaces>=0.28.0` in requirements.txt
### "Out of memory"
- This shouldn't happen with ZeroGPU (70GB VRAM)
- If it does, contact HF support
### "Rate limit not working"
- The usage tracker uses in-memory storage
- Counts reset on every Space restart
- Tracking is keyed on the caller's IP address (reliable when running on the Space)
### "Slow inference"
- First request allocates GPU (slower)
- Subsequent requests use cached allocation
- Normal: 2-5 seconds per request
## Cost Breakdown
- **Team Plan:** $20/user/month (you already have this)
- **ZeroGPU:** FREE (included)
- **Inference:** FREE (no API calls)
- **Storage:** FREE (model cached by HF)
**Total additional cost: $0/month**
## Updates & Maintenance
To update your Space:
```bash
# Make changes to code
git add .
git commit -m "Update: description of changes"
git push space main
```
Space will automatically rebuild and redeploy.
## Monitoring Usage
Check your Space's metrics:
1. Go to Space page
2. Click "Analytics" tab
3. View daily/weekly usage stats
## Next Steps After Deployment

1. Test all 4 languages
2. Verify tool calling works
3. Check rate limiting
4. Monitor performance
5. Adjust the system prompt if needed
6. Fine-tune temperature/max_tokens if needed
## Support
If you encounter issues:
- Check Space logs (Settings → Logs)
- HuggingFace Discord: https://discord.gg/huggingface
- HF Forums: https://discuss.huggingface.co/
---
**You're ready to deploy!**