
🚀 Deployment Guide for HuggingFace Space with ZeroGPU

✅ Pre-Deployment Checklist

All code is ready! Here's what's configured:

  • ✅ Model: microsoft/Phi-3-mini-4k-instruct (3.8B params)
  • ✅ ZeroGPU support: Enabled with @spaces.GPU decorator
  • ✅ Local/Space compatibility: Auto-detects environment
  • ✅ Usage tracking: 50 requests/day per user
  • ✅ Requirements: All dependencies listed
  • ✅ README: Updated with instructions
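The local/Space compatibility above is typically handled with a fallback decorator. A minimal sketch, assuming the standard `spaces` package behavior on HF Spaces (the `generate` function and its body are placeholders, not the app's actual inference code):

```python
# Fall back to a no-op decorator when the `spaces` package (available on
# HF Spaces) is not installed, so the same code runs locally and on a Space.
try:
    import spaces
    gpu = spaces.GPU          # requests a ZeroGPU slice for the decorated call
except ImportError:
    def gpu(fn):              # local runs: plain passthrough
        return fn

@gpu
def generate(prompt: str) -> str:
    # Placeholder body; the real app runs Phi-3 inference here.
    return f"echo: {prompt}"
```

Locally the decorator is inert; on a Space, `spaces.GPU` acquires the GPU only for the duration of the call.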

📋 Deployment Steps

Step 1: Push Code to Your Space

```bash
cd /Users/tom/code/cojournalist-data

# If not already initialized
git init
git remote add space https://huggingface.co/spaces/YOUR_USERNAME/cojournalist-data

# Or if already connected
git add .
git commit -m "Deploy Phi-3-mini with ZeroGPU and usage tracking"
git push space main
```

Step 2: Configure Space Hardware

  1. Go to your Space: https://huggingface.co/spaces/YOUR_USERNAME/cojournalist-data
  2. Click Settings (⚙️ icon in top right)
  3. Scroll to Hardware section
  4. Select ZeroGPU from dropdown
  5. Click Save
  6. Space will restart automatically

Step 3: Wait for Build

The Space will:

  1. Install dependencies (~2-3 minutes)
  2. Download Phi-3-mini model (~1-2 minutes, 7.6GB)
  3. Load model into memory (~30 seconds)
  4. Launch Gradio interface

Total build time: ~5-7 minutes

Step 4: Test Your Space

Once running, test with these queries:

  1. English: "Who are the parliamentarians from Zurich?"
  2. German: "Zeige mir aktuelle Abstimmungen zur Klimapolitik"
  3. French: "Qui sont les parlementaires de Zurich?"
  4. Italian: "Mostrami i voti recenti sulla politica climatica"

🔧 Space Settings Summary

Hardware

  • Type: ZeroGPU
  • Cost: FREE (included with Team plan)
  • GPU: Nvidia H200 (70GB VRAM)
  • Allocation: Dynamic (only when needed)

Environment Variables (Optional)

None are required, but you can optionally set:

  • HF_TOKEN: Your HuggingFace token (for private models, not needed for Phi-3)

📊 Expected Behavior

First Request

  • Takes ~5-10 seconds (GPU allocation + inference)
  • Subsequent requests faster (~2-5 seconds)

Rate Limiting

  • 50 requests per day per user IP
  • Error message shown when limit reached
  • Resets daily at midnight UTC
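The rate limiting described above can be sketched as an in-memory counter keyed by IP and UTC date; counts implicitly "reset" at midnight UTC because the date part of the key changes. This is an illustrative sketch, not the app's actual code (the `UsageTracker` name and `allow` method are assumptions):

```python
from collections import defaultdict
from datetime import datetime, timezone

class UsageTracker:
    """Per-IP daily request counter (in-memory, so it resets on restart)."""

    def __init__(self, limit: int = 50):
        self.limit = limit
        self.counts = defaultdict(int)   # (ip, utc_date) -> request count

    def allow(self, ip: str) -> bool:
        """Record one request; return False once the daily limit is reached."""
        key = (ip, datetime.now(timezone.utc).date())
        if self.counts[key] >= self.limit:
            return False
        self.counts[key] += 1
        return True
```

Because storage is a plain dict, restarting the Space clears all counts, which matches the behavior noted in Troubleshooting below.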

Model Loading

  • Happens once on Space startup
  • Cached for subsequent requests
  • No reload needed between requests
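Load-once behavior like this is usually a memoized loader. A minimal sketch of the pattern, with a cheap stand-in object in place of the real `from_pretrained` download (the `get_model` name is illustrative):

```python
from functools import lru_cache

@lru_cache(maxsize=1)
def get_model():
    # Runs only on the first call; every later call returns the cached object.
    # In the real app this would be something like:
    #   AutoModelForCausalLM.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
    print("loading model ...")
    return object()   # stand-in for the loaded model
```

The expensive load happens once at first use; subsequent requests reuse the cached instance with no reload.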

πŸ› Troubleshooting

"Model not loading"

  • Check Space logs for errors
  • Verify ZeroGPU is selected in Hardware settings
  • Ensure spaces>=0.28.0 in requirements.txt

"Out of memory"

  • This shouldn't happen with ZeroGPU (70GB VRAM)
  • If it does, contact HF support

"Rate limit not working"

  • Usage tracker uses in-memory storage
  • Resets on Space restart
  • IP-based tracking (works in production)

"Slow inference"

  • First request allocates GPU (slower)
  • Subsequent requests use cached allocation
  • Normal: 2-5 seconds per request

💰 Cost Breakdown

  • Team Plan: $20/user/month (you already have this)
  • ZeroGPU: FREE (included)
  • Inference: FREE (no API calls)
  • Storage: FREE (model cached by HF)

Total additional cost: $0/month 🎉

🔄 Updates & Maintenance

To update your Space:

```bash
# Make changes to code
git add .
git commit -m "Update: description of changes"
git push space main
```

Space will automatically rebuild and redeploy.

📈 Monitoring Usage

Check your Space's metrics:

  1. Go to Space page
  2. Click "Analytics" tab
  3. View daily/weekly usage stats

🎯 Next Steps After Deployment

  1. ✅ Test all 4 languages
  2. ✅ Verify tool calling works
  3. ✅ Check rate limiting
  4. ✅ Monitor performance
  5. 🔜 Adjust system prompt if needed
  6. 🔜 Fine-tune temperature/max_tokens if needed

📞 Support

If you encounter issues, check the Space's build and runtime logs first (Logs tab on the Space page), then search or ask on the HuggingFace forums: https://discuss.huggingface.co

You're ready to deploy! 🚀