# Deployment Guide for HuggingFace Space with ZeroGPU

## Pre-Deployment Checklist

All code is ready! Here's what's configured:

- Model: `microsoft/Phi-3-mini-4k-instruct` (3.8B params)
- ZeroGPU support: enabled with the `@spaces.GPU` decorator (sketched below)
- Local/Space compatibility: auto-detects the environment
- Usage tracking: 50 requests/day per user
- Requirements: all dependencies listed
- README: updated with instructions
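
For reference, here is a minimal sketch of how the `@spaces.GPU` decorator and the local/Space auto-detection typically fit together. It is illustrative rather than a copy of the project's `app.py`, and it assumes the `SPACE_ID` environment variable (set by HuggingFace inside a running Space) as the detection signal:

```python
import os

# SPACE_ID is set by HuggingFace inside a running Space; absent when local.
RUNNING_ON_SPACE = os.environ.get("SPACE_ID") is not None

if RUNNING_ON_SPACE:
    import spaces
    gpu = spaces.GPU      # requests a ZeroGPU slice for each decorated call
else:
    def gpu(fn):          # plain pass-through so the same code runs locally
        return fn

@gpu
def generate(prompt: str) -> str:
    # Real model inference goes here; on ZeroGPU the GPU is only attached
    # for the duration of this call.
    return f"echo: {prompt}"
```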
## Deployment Steps
### Step 1: Push Code to Your Space
```bash
cd /Users/tom/code/cojournalist-data
# If not already initialized
git init
git remote add space https://huggingface.co/spaces/YOUR_USERNAME/cojournalist-data
# If the Space remote is already connected, start here
git add .
git commit -m "Deploy Phi-3-mini with ZeroGPU and usage tracking"
git push space main
```
### Step 2: Configure Space Hardware
1. Go to your Space: `https://huggingface.co/spaces/YOUR_USERNAME/cojournalist-data`
2. Click **Settings** (gear icon in the top right)
3. Scroll to **Hardware** section
4. Select **ZeroGPU** from dropdown
5. Click **Save**
6. Space will restart automatically
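
If you prefer to script this step instead of clicking through the UI, recent `huggingface_hub` releases expose a hardware request API. This is a sketch under that assumption; the ZeroGPU enum value (`zero-a10g` at the time of writing) may differ across library versions:

```python
from huggingface_hub import HfApi, SpaceHardware

api = HfApi()  # uses the token from `huggingface-cli login` or HF_TOKEN
api.request_space_hardware(
    repo_id="YOUR_USERNAME/cojournalist-data",
    hardware=SpaceHardware.ZERO_A10G,  # ZeroGPU; verify against your installed version
)
```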
### Step 3: Wait for Build
The Space will:
1. Install dependencies (~2-3 minutes)
2. Download Phi-3-mini model (~1-2 minutes, 7.6GB)
3. Load model into memory (~30 seconds)
4. Launch Gradio interface
**Total build time: ~5-7 minutes**
### Step 4: Test Your Space
Once running, test with these queries (they can also be sent from a script; see the sketch after this list):
1. **English:** "Who are the parliamentarians from Zurich?"
2. **German:** "Zeige mir aktuelle Abstimmungen zur Klimapolitik" ("Show me current votes on climate policy")
3. **French:** "Qui sont les parlementaires de Zurich?" ("Who are the parliamentarians from Zurich?")
4. **Italian:** "Mostrami i voti recenti sulla politica climatica" ("Show me recent votes on climate policy")
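
To send these programmatically, `gradio_client` can call the Space's API. The endpoint name below is hypothetical, so list the real endpoints with `view_api()` first:

```python
from gradio_client import Client

client = Client("YOUR_USERNAME/cojournalist-data")
client.view_api()  # prints the endpoints and argument order exposed by the app

result = client.predict(
    "Who are the parliamentarians from Zurich?",
    api_name="/chat",  # hypothetical; replace with an endpoint shown by view_api()
)
print(result)
```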
## Space Settings Summary
### Hardware
- **Type:** ZeroGPU
- **Cost:** FREE (included with Team plan)
- **GPU:** Nvidia H200 (70GB VRAM)
- **Allocation:** Dynamic (only when needed)
### Environment Variables (Optional)
None are required for this setup; the only one you might add:
- `HF_TOKEN`: your HuggingFace token (only needed for private or gated models; Phi-3-mini is public)
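
If you do set `HF_TOKEN`, a common pattern (sketched here, not taken from the app) is to read it from the environment and pass it to the loaders; a missing token is fine for public models like Phi-3-mini:

```python
import os

from transformers import AutoTokenizer

token = os.environ.get("HF_TOKEN")  # None is acceptable for public models
tokenizer = AutoTokenizer.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    token=token,
)
```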
## Expected Behavior
### First Request
- Takes ~5-10 seconds (GPU allocation + inference)
- Subsequent requests faster (~2-5 seconds)
### Rate Limiting
- 50 requests per day per user IP
- Error message shown when limit reached
- Resets daily at midnight UTC
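
A minimal sketch of an in-memory, per-IP daily limiter matching the behavior above; the Space's actual tracker may differ:

```python
from collections import defaultdict
from datetime import datetime, timezone

DAILY_LIMIT = 50
_counts: defaultdict[tuple[str, str], int] = defaultdict(int)

def allow_request(client_ip: str) -> bool:
    """Return True if the caller is still under today's limit (UTC days)."""
    today = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    key = (client_ip, today)
    if _counts[key] >= DAILY_LIMIT:
        return False
    _counts[key] += 1
    return True
```

In a Gradio handler, the client IP is typically read from the `gr.Request` object passed to the function (`request.client.host`).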
### Model Loading
- Happens once on Space startup
- Cached for subsequent requests
- No reload needed between requests
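
A sketch of the load-once pattern (assumed, not copied from the app): the model is created at module import time, so every request reuses the same objects:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "microsoft/Phi-3-mini-4k-instruct"

# Runs once at Space startup; the ~7.6 GB download is cached on disk by HF.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
```

With ZeroGPU, the usual next step is to move the model to `"cuda"` right after loading; the `spaces` package attaches a real GPU only while a `@spaces.GPU`-decorated function runs.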
## Troubleshooting
### "Model not loading"
- Check Space logs for errors
- Verify ZeroGPU is selected in Hardware settings
- Ensure `spaces>=0.28.0` in requirements.txt
### "Out of memory"
- This shouldn't happen with ZeroGPU (70GB VRAM)
- If it does, contact HF support
### "Rate limit not working"
- The usage tracker uses in-memory storage
- Counts reset on every Space restart
- Tracking is keyed on the caller's IP address (reliable when running on the Space)
### "Slow inference"
- First request allocates GPU (slower)
- Subsequent requests use cached allocation
- Normal: 2-5 seconds per request
## Cost Breakdown
- **Team Plan:** $20/user/month (you already have this)
- **ZeroGPU:** FREE (included)
- **Inference:** FREE (no API calls)
- **Storage:** FREE (model cached by HF)
**Total additional cost: $0/month**
## Updates & Maintenance
To update your Space:
```bash
# Make changes to code
git add .
git commit -m "Update: description of changes"
git push space main
```
Space will automatically rebuild and redeploy.
## Monitoring Usage
Check your Space's metrics:
1. Go to Space page
2. Click "Analytics" tab
3. View daily/weekly usage stats
## Next Steps After Deployment

1. Test all 4 languages
2. Verify tool calling works
3. Check rate limiting
4. Monitor performance
5. Adjust the system prompt if needed
6. Fine-tune temperature/max_tokens if needed
## Support
If you encounter issues:
- Check Space logs (Settings → Logs)
- HuggingFace Discord: https://discord.gg/huggingface
- HF Forums: https://discuss.huggingface.co/
---
**You're ready to deploy!**