Spaces: Running

Tom committed
Commit c7dcc92 · 1 Parent(s): 5a32df9

Deploy Phi-3-mini with ZeroGPU and 50 req/day limit

Files changed:
- DEPLOYMENT.md +154 -0
- README.md +49 -4
- app.py +123 -45
- requirements.txt +6 -0
- usage_tracker.py +75 -0
DEPLOYMENT.md (ADDED, +154 -0)

# 🚀 Deployment Guide for HuggingFace Space with ZeroGPU

## ✅ Pre-Deployment Checklist

All code is ready! Here's what's configured:

- ✅ Model: `microsoft/Phi-3-mini-4k-instruct` (3.8B params)
- ✅ ZeroGPU support: Enabled with `@spaces.GPU` decorator
- ✅ Local/Space compatibility: Auto-detects environment
- ✅ Usage tracking: 50 requests/day per user
- ✅ Requirements: All dependencies listed
- ✅ README: Updated with instructions

## 📋 Deployment Steps

### Step 1: Push Code to Your Space

```bash
cd /Users/tom/code/cojournalist-data

# If not already initialized
git init
git remote add space https://huggingface.co/spaces/YOUR_USERNAME/cojournalist-data

# Or if already connected
git add .
git commit -m "Deploy Phi-3-mini with ZeroGPU and usage tracking"
git push space main
```

### Step 2: Configure Space Hardware

1. Go to your Space: `https://huggingface.co/spaces/YOUR_USERNAME/cojournalist-data`
2. Click **Settings** (⚙️ icon in top right)
3. Scroll to the **Hardware** section
4. Select **ZeroGPU** from the dropdown
5. Click **Save**
6. The Space will restart automatically

### Step 3: Wait for Build

The Space will:
1. Install dependencies (~2-3 minutes)
2. Download the Phi-3-mini model (~1-2 minutes, 7.6GB)
3. Load the model into memory (~30 seconds)
4. Launch the Gradio interface

**Total build time: ~5-7 minutes**

### Step 4: Test Your Space

Once running, test with these queries:

1. **English:** "Who are the parliamentarians from Zurich?"
2. **German:** "Zeige mir aktuelle Abstimmungen zur Klimapolitik" ("Show me current votes on climate policy")
3. **French:** "Qui sont les parlementaires de Zurich?" ("Who are the parliamentarians from Zurich?")
4. **Italian:** "Mostrami i voti recenti sulla politica climatica" ("Show me the recent votes on climate policy")
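
If you prefer to test from a script, `gradio_client` can call the Space's API. This is a minimal sketch rather than part of the repo; the endpoint names and argument order depend on how the Gradio events are wired, so list them with `view_api()` before calling `predict()`:

```python
# Hypothetical smoke test from your machine (pip install gradio_client).
from gradio_client import Client

client = Client("YOUR_USERNAME/cojournalist-data")
client.view_api()  # prints the Space's callable endpoints and their signatures
```
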
## 🔧 Space Settings Summary

### Hardware
- **Type:** ZeroGPU
- **Cost:** FREE (included with Team plan)
- **GPU:** Nvidia H200 (70GB VRAM)
- **Allocation:** Dynamic (only when needed)

### Environment Variables (Optional)
If you want to configure anything:
- `HF_TOKEN`: Your HuggingFace token (for private models, not needed for Phi-3)

## 📊 Expected Behavior

### First Request
- Takes ~5-10 seconds (GPU allocation + inference)
- Subsequent requests are faster (~2-5 seconds)

### Rate Limiting
- 50 requests per day per user IP
- Error message shown when the limit is reached
- Resets daily at midnight UTC

### Model Loading
- Happens once on Space startup
- Cached for subsequent requests
- No reload needed between requests

## 🐛 Troubleshooting

### "Model not loading"
- Check Space logs for errors
- Verify ZeroGPU is selected in Hardware settings
- Ensure `spaces>=0.28.0` is in requirements.txt

### "Out of memory"
- This shouldn't happen with ZeroGPU (70GB VRAM)
- If it does, contact HF support

### "Rate limit not working"
- Usage tracker uses in-memory storage
- Resets on Space restart
- IP-based tracking (works in production)

### "Slow inference"
- First request allocates a GPU (slower)
- Subsequent requests use the cached allocation
- Normal: 2-5 seconds per request

## 💰 Cost Breakdown

- **Team Plan:** $20/user/month (you already have this)
- **ZeroGPU:** FREE (included)
- **Inference:** FREE (no API calls)
- **Storage:** FREE (model cached by HF)

**Total additional cost: $0/month** 🎉

## 🔄 Updates & Maintenance

To update your Space:
```bash
# Make changes to code
git add .
git commit -m "Update: description of changes"
git push space main
```

The Space will automatically rebuild and redeploy.

## 📈 Monitoring Usage

Check your Space's metrics:
1. Go to the Space page
2. Click the "Analytics" tab
3. View daily/weekly usage stats

## 🎯 Next Steps After Deployment

1. ✅ Test all 4 languages
2. ✅ Verify tool calling works
3. ✅ Check rate limiting
4. ✅ Monitor performance
5. 🔜 Adjust the system prompt if needed
6. 🔜 Fine-tune temperature/max_tokens if needed

## 📞 Support

If you encounter issues:
- Check Space logs (Settings → Logs)
- HuggingFace Discord: https://discord.gg/huggingface
- HF Forums: https://discuss.huggingface.co/

---

**You're ready to deploy! 🚀**

README.md (CHANGED, +49 -4)

```diff
@@ -1,13 +1,58 @@
 ---
 title: Cojournalist Data
-emoji:
-colorFrom:
-colorTo:
+emoji: 🏛️
+colorFrom: blue
+colorTo: purple
 sdk: gradio
 sdk_version: 5.49.1
 app_file: app.py
 pinned: false
-short_description: Data
+short_description: Swiss Parliamentary Data Chatbot with Phi-3-mini
 ---
 
+# 🏛️ CoJournalist Data
+
+A Swiss Parliamentary Data Chatbot powered by Phi-3-mini and the OpenParlData MCP server.
+
+## Features
+
+- 🤖 **Phi-3-mini-4k-instruct** - Efficient 3.8B parameter model running on ZeroGPU
+- 🌍 **Multilingual** - Support for English, German, French, and Italian
+- 🛠️ **Tool Calling** - Intelligent query routing to parliamentary data APIs
+- 🔒 **Rate Limited** - 50 requests per day per user for cost control
+- ⚡ **ZeroGPU** - FREE GPU inference for PRO users
+
+## Space Settings Required
+
+**IMPORTANT:** To run this Space, you need to configure the following in your HuggingFace Space settings:
+
+### 1. Hardware Selection
+- Go to **Settings** → **Hardware**
+- Select **ZeroGPU** (FREE for PRO users)
+- Save changes
+
+### 2. Environment Variables (Optional)
+If you want to use the OpenParlData API when it's available:
+- Add `HF_TOKEN` with your HuggingFace token
+
+## Usage
+
+Simply ask questions about Swiss parliamentary data in natural language:
+- "Who are the parliamentarians from Zurich?"
+- "Show me recent votes about climate policy"
+- "What motions were submitted about healthcare in 2024?"
+
+## Architecture
+
+- **Model:** microsoft/Phi-3-mini-4k-instruct (3.8B params)
+- **GPU:** ZeroGPU (H200) with dynamic allocation
+- **Framework:** Gradio + Transformers + PyTorch
+- **MCP Integration:** OpenParlData server for parliamentary data
+
+## Cost
+
+- **HF PRO:** $9/month (required for ZeroGPU)
+- **Inference:** FREE (included with PRO subscription)
+- **Total:** $9/month for unlimited usage within ZeroGPU quotas
+
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
```

app.py (CHANGED, +123 -45)

The hosted-inference path was removed: the previous version imported `InferenceClient` from `huggingface_hub` and printed a warning when `HF_TOKEN` was missing. It is replaced by loading Phi-3-mini locally with Transformers, wrapped in the ZeroGPU decorator when available. The updated hunks, with unchanged regions elided:

```python
"""
CoJournalist Data - Swiss Parliamentary Data Chatbot
Powered by Phi-3-mini and OpenParlData MCP
"""

import os
import json
import gradio as gr
from dotenv import load_dotenv
from mcp_integration import execute_mcp_query, OpenParlDataClient
import asyncio
from usage_tracker import UsageTracker
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Import spaces only if available (for HuggingFace Spaces)
try:
    import spaces
    SPACES_AVAILABLE = True
except ImportError:
    SPACES_AVAILABLE = False
    print("Running locally without ZeroGPU support")

# Load environment variables
load_dotenv()

# Initialize usage tracker with 50 requests per day limit
tracker = UsageTracker(daily_limit=50)

# Initialize model and tokenizer
MODEL_NAME = "microsoft/Phi-3-mini-4k-instruct"
print(f"Loading model: {MODEL_NAME}")
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)

# Detect device (MPS for Mac, CUDA for GPU, CPU fallback)
if torch.cuda.is_available():
    device = "cuda"
    dtype = torch.float16
elif hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
    device = "mps"
    dtype = torch.float16
else:
    device = "cpu"
    dtype = torch.float32

print(f"Using device: {device}")

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=dtype,
    device_map=device if device != "mps" else None,
    trust_remote_code=True
)

# Move to MPS explicitly (device_map does not cover MPS)
if device == "mps":
    model = model.to(device)

print(f"Model loaded successfully on {device}!")

# Available languages
LANGUAGES = {
    # ... unchanged ...
    "Italiano": "it"
}

# System prompt optimized for Phi-3-mini-4k-instruct
SYSTEM_PROMPT = """You are a helpful assistant that helps users query Swiss parliamentary data.

You have access to the following tools from the OpenParlData MCP server:
...
6. **openparldata_search_debates** - Search debate transcripts
   Parameters: query, date_from, date_to, speaker_id, language, limit

IMPORTANT: Your response MUST be valid JSON only. Do not include any explanatory text before or after the JSON. Do not wrap your response in code blocks or markdown formatting.

When a user asks a question about Swiss parliamentary data:
1. Analyze what information they need
2. Determine which tool(s) to use
3. Extract the relevant parameters from their question
4. Respond with ONLY a JSON object containing the tool call

Your response should be in this exact format:
{
...
"""

# ... (EXAMPLES dict unchanged) ...


def query_model_impl(message: str, language: str = "en") -> dict:
    """Query the Phi-3-mini model to interpret user intent and determine tool calls."""
    try:
        # Format the prompt with Phi-3's chat markers
        prompt = f"""<|system|>
{SYSTEM_PROMPT}<|end|>
<|user|>
Language: {language}
Question: {message}<|end|>
<|assistant|>
"""

        # Tokenize and generate
        inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=3072)
        inputs = {k: v.to(model.device) for k, v in inputs.items()}

        with torch.no_grad():
            outputs = model.generate(
                **inputs,
                max_new_tokens=500,
                temperature=0.3,
                do_sample=True,
                pad_token_id=tokenizer.eos_token_id
            )

        # Decode response
        full_response = tokenizer.decode(outputs[0], skip_special_tokens=True)

        # Extract only the assistant's reply (after the last <|assistant|>)
        if "<|assistant|>" in full_response:
            assistant_message = full_response.split("<|assistant|>")[-1].strip()
        else:
            assistant_message = full_response.strip()

        # Try to parse as JSON
        try:
            # Clean up response - enhanced for the Phi-3 model
            clean_response = assistant_message.strip()

            # Remove markdown code blocks
            if clean_response.startswith("```json"):
                clean_response = clean_response[7:]
            elif clean_response.startswith("```"):
                clean_response = clean_response[3:]

            if clean_response.endswith("```"):
                clean_response = clean_response[:-3]

            clean_response = clean_response.strip()

            # Find the first { or [ (start of JSON) to skip explanatory text
            json_start = min(
                clean_response.find('{') if '{' in clean_response else len(clean_response),
                clean_response.find('[') if '[' in clean_response else len(clean_response)
            )
            if json_start > 0:
                clean_response = clean_response[json_start:]

            return json.loads(clean_response)
        except json.JSONDecodeError:
            # If not valid JSON, treat as a natural language response
            return {"response": assistant_message}

    except Exception as e:
        return {"error": f"Error querying model: {str(e)}"}


# Apply the ZeroGPU decorator only when running on HuggingFace Spaces
if SPACES_AVAILABLE:
    query_model = spaces.GPU(duration=60)(query_model_impl)
else:
    query_model = query_model_impl


async def execute_tool_async(tool_name: str, arguments: dict, show_debug: bool) -> tuple:
    # ... unchanged ...


# Inside chat_response(message: str, history: list, language: str, show_debug: bool):
    # Get language code
    lang_code = LANGUAGES.get(language, "en")

    # Query the Phi-3 model to interpret intent
    model_response = query_model(message, lang_code)

    # Check if it's a direct response (no tool call needed)
    if "response" in model_response:
        return model_response["response"]

    # Check for error
    if "error" in model_response:
        return f"❌ {model_response['error']}"

    # Execute tool call
    if "tool" in model_response and "arguments" in model_response:
        tool_name = model_response["tool"]
        arguments = model_response["arguments"]
        explanation = model_response.get("explanation", "")

        # Ensure language is set in arguments
        if "language" not in arguments:
            # ... unchanged ...


# Inside `with gr.Blocks(css=custom_css, title="CoJournalist Data") as demo:`,
# the message-submission handler now takes a gr.Request for IP-based limiting:
    def respond(message, chat_history, language, show_debug, request: gr.Request):
        if not message.strip():
            return "", chat_history

        # Check usage limit
        user_id = request.client.host if request and hasattr(request, 'client') else "unknown"

        if not tracker.check_limit(user_id):
            bot_message = (
                "⚠️ Daily request limit reached. You have used all 50 requests for today. "
                "Please try again tomorrow.\n\n"
                "This limit helps us keep the service free and available for everyone."
            )
            chat_history.append((message, bot_message))
            return "", chat_history

        # Get bot response
        bot_message = chat_response(message, chat_history, language, show_debug)
        # ... unchanged ...


# The footer markdown in the Blocks UI gains rate-limit and attribution notes:
    """
    **Note:** This app uses the OpenParlData MCP server to access Swiss parliamentary data.
    Currently returning mock data while the OpenParlData API is in development.

    **Rate Limit:** 50 requests per day per user to keep the service free and accessible.

    Powered by [Phi-3-mini](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) on ZeroGPU and [Model Context Protocol (MCP)](https://modelcontextprotocol.io/)
    """
)
```

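For reference, this is the shape of tool call that the JSON parser above accepts. It is an illustrative sketch: the `tool`, `arguments`, and `explanation` keys are the ones `chat_response` reads, the argument values here are hypothetical, and the authoritative schema is the elided portion of `SYSTEM_PROMPT`.

```python
# Illustrative only: a tool call in the form chat_response() expects.
example_tool_call = {
    "tool": "openparldata_search_debates",
    "arguments": {"query": "Klimapolitik", "language": "de", "limit": 10},
    "explanation": "Searching debate transcripts about climate policy",
}
```
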
requirements.txt (CHANGED, +6 -0)

```diff
@@ -6,6 +6,12 @@ gradio>=5.49.1
 
 # Hugging Face
 huggingface-hub>=0.22.0
+transformers>=4.40.0
+torch>=2.0.0
+accelerate>=0.20.0
+
+# ZeroGPU support (required for HuggingFace Spaces deployment)
+spaces>=0.28.0
 
 # MCP Support
 mcp>=0.1.0
```

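A quick sanity check, run after the build or locally, that the new dependencies resolved. This is a throwaway snippet, not part of the app:

```python
# Confirm the added packages import and report their versions.
import torch
import transformers
import accelerate

print("torch", torch.__version__)
print("transformers", transformers.__version__)
print("accelerate", accelerate.__version__)

try:
    import spaces  # only needed on HuggingFace Spaces
    print("spaces available")
except ImportError:
    print("spaces not installed (fine for local runs)")
```
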
usage_tracker.py (ADDED, +75 -0)

```python
"""
Usage tracking module for rate limiting API requests.

This module provides a simple in-memory usage tracker that limits
the number of requests per user per day.
"""

from datetime import date, datetime
from typing import Dict


class UsageTracker:
    """Track and limit user requests on a daily basis."""

    def __init__(self, daily_limit: int = 100):
        """
        Initialize the usage tracker.

        Args:
            daily_limit: Maximum number of requests per user per day
        """
        self.daily_limit = daily_limit
        self.usage: Dict[date, Dict[str, int]] = {}

    def check_limit(self, user_id: str) -> bool:
        """
        Check if the user has exceeded their daily limit and increment the counter.

        Args:
            user_id: Unique identifier for the user (typically IP address)

        Returns:
            True if the request is allowed, False if the limit is exceeded
        """
        today = datetime.now().date()

        # Clean up old dates to prevent memory growth
        if today not in self.usage:
            self.usage = {today: {}}

        # Get current usage count for this user
        user_count = self.usage[today].get(user_id, 0)

        # Check if limit exceeded
        if user_count >= self.daily_limit:
            return False

        # Increment counter
        self.usage[today][user_id] = user_count + 1
        return True

    def get_usage(self, user_id: str) -> int:
        """
        Get the current usage count for a user today.

        Args:
            user_id: Unique identifier for the user

        Returns:
            Number of requests made today
        """
        today = datetime.now().date()
        return self.usage.get(today, {}).get(user_id, 0)

    def get_remaining(self, user_id: str) -> int:
        """
        Get remaining requests for a user today.

        Args:
            user_id: Unique identifier for the user

        Returns:
            Number of requests remaining today
        """
        return max(0, self.daily_limit - self.get_usage(user_id))
```

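A minimal sketch of the tracker's semantics, not part of the repo, assuming `usage_tracker.py` is on the import path:

```python
from usage_tracker import UsageTracker

tracker = UsageTracker(daily_limit=2)
ip = "203.0.113.7"  # example client address, as taken from gr.Request in app.py

print(tracker.check_limit(ip))    # True  -> first request allowed, count becomes 1
print(tracker.check_limit(ip))    # True  -> second request allowed, count becomes 2
print(tracker.check_limit(ip))    # False -> limit reached, request refused
print(tracker.get_remaining(ip))  # 0
```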