Spaces: Running

Tom committed
Commit c7dcc92 · 1 Parent(s): 5a32df9

Deploy Phi-3-mini with ZeroGPU and 50 req/day limit

Files changed:
- DEPLOYMENT.md +154 -0
- README.md +49 -4
- app.py +123 -45
- requirements.txt +6 -0
- usage_tracker.py +75 -0
DEPLOYMENT.md (ADDED, +154 -0)

# 🚀 Deployment Guide for HuggingFace Space with ZeroGPU

## ✅ Pre-Deployment Checklist

All code is ready! Here's what's configured:

- ✅ Model: `microsoft/Phi-3-mini-4k-instruct` (3.8B params)
- ✅ ZeroGPU support: Enabled with `@spaces.GPU` decorator
- ✅ Local/Space compatibility: Auto-detects environment
- ✅ Usage tracking: 50 requests/day per user
- ✅ Requirements: All dependencies listed
- ✅ README: Updated with instructions

## 📋 Deployment Steps

### Step 1: Push Code to Your Space

```bash
cd /Users/tom/code/cojournalist-data

# If not already initialized
git init
git remote add space https://huggingface.co/spaces/YOUR_USERNAME/cojournalist-data

# Or if already connected
git add .
git commit -m "Deploy Phi-3-mini with ZeroGPU and usage tracking"
git push space main
```

### Step 2: Configure Space Hardware

1. Go to your Space: `https://huggingface.co/spaces/YOUR_USERNAME/cojournalist-data`
2. Click **Settings** (⚙️ icon in top right)
3. Scroll to the **Hardware** section
4. Select **ZeroGPU** from the dropdown
5. Click **Save**
6. The Space will restart automatically

### Step 3: Wait for Build

The Space will:
1. Install dependencies (~2-3 minutes)
2. Download the Phi-3-mini model (~1-2 minutes, 7.6GB)
3. Load the model into memory (~30 seconds)
4. Launch the Gradio interface

**Total build time: ~5-7 minutes**

### Step 4: Test Your Space

Once running, test with these queries:

1. **English:** "Who are the parliamentarians from Zurich?"
2. **German:** "Zeige mir aktuelle Abstimmungen zur Klimapolitik" ("Show me current votes on climate policy")
3. **French:** "Qui sont les parlementaires de Zurich?" ("Who are the parliamentarians from Zurich?")
4. **Italian:** "Mostrami i voti recenti sulla politica climatica" ("Show me the recent votes on climate policy")
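
If you prefer to test from a script, `gradio_client` can call the Space's API. This is a minimal sketch rather than part of the repo; the endpoint names and argument order depend on how the Gradio events are wired, so list them with `view_api()` before calling `predict()`:

```python
# Hypothetical smoke test from your machine (pip install gradio_client).
from gradio_client import Client

client = Client("YOUR_USERNAME/cojournalist-data")
client.view_api()  # prints the Space's callable endpoints and their signatures
```
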
## 🔧 Space Settings Summary

### Hardware
- **Type:** ZeroGPU
- **Cost:** FREE (included with Team plan)
- **GPU:** Nvidia H200 (70GB VRAM)
- **Allocation:** Dynamic (only when needed)

### Environment Variables (Optional)
If you want to configure anything:
- `HF_TOKEN`: Your HuggingFace token (for private models, not needed for Phi-3)

## 📊 Expected Behavior

### First Request
- Takes ~5-10 seconds (GPU allocation + inference)
- Subsequent requests are faster (~2-5 seconds)

### Rate Limiting
- 50 requests per day per user IP
- Error message shown when the limit is reached
- Resets daily at midnight UTC

### Model Loading
- Happens once on Space startup
- Cached for subsequent requests
- No reload needed between requests

## 🐛 Troubleshooting

### "Model not loading"
- Check Space logs for errors
- Verify ZeroGPU is selected in Hardware settings
- Ensure `spaces>=0.28.0` is in requirements.txt

### "Out of memory"
- This shouldn't happen with ZeroGPU (70GB VRAM)
- If it does, contact HF support

### "Rate limit not working"
- Usage tracker uses in-memory storage
- Resets on Space restart
- IP-based tracking (works in production)

### "Slow inference"
- First request allocates a GPU (slower)
- Subsequent requests use the cached allocation
- Normal: 2-5 seconds per request

## 💰 Cost Breakdown

- **Team Plan:** $20/user/month (you already have this)
- **ZeroGPU:** FREE (included)
- **Inference:** FREE (no API calls)
- **Storage:** FREE (model cached by HF)

**Total additional cost: $0/month** 🎉

## 🔄 Updates & Maintenance

To update your Space:
```bash
# Make changes to code
git add .
git commit -m "Update: description of changes"
git push space main
```

The Space will automatically rebuild and redeploy.

## 📈 Monitoring Usage

Check your Space's metrics:
1. Go to the Space page
2. Click the "Analytics" tab
3. View daily/weekly usage stats

## 🎯 Next Steps After Deployment

1. ✅ Test all 4 languages
2. ✅ Verify tool calling works
3. ✅ Check rate limiting
4. ✅ Monitor performance
5. 🔜 Adjust the system prompt if needed
6. 🔜 Fine-tune temperature/max_tokens if needed

## 📞 Support

If you encounter issues:
- Check Space logs (Settings → Logs)
- HuggingFace Discord: https://discord.gg/huggingface
- HF Forums: https://discuss.huggingface.co/

---

**You're ready to deploy! 🚀**

README.md (CHANGED, +49 -4)

```diff
@@ -1,13 +1,58 @@
 ---
 title: Cojournalist Data
-emoji:
-colorFrom:
-colorTo:
+emoji: 🏛️
+colorFrom: blue
+colorTo: purple
 sdk: gradio
 sdk_version: 5.49.1
 app_file: app.py
 pinned: false
-short_description: Data
+short_description: Swiss Parliamentary Data Chatbot with Phi-3-mini
 ---
 
+# 🏛️ CoJournalist Data
+
+A Swiss Parliamentary Data Chatbot powered by Phi-3-mini and the OpenParlData MCP server.
+
+## Features
+
+- 🤖 **Phi-3-mini-4k-instruct** - Efficient 3.8B parameter model running on ZeroGPU
+- 🌍 **Multilingual** - Support for English, German, French, and Italian
+- 🛠️ **Tool Calling** - Intelligent query routing to parliamentary data APIs
+- 🔒 **Rate Limited** - 50 requests per day per user for cost control
+- ⚡ **ZeroGPU** - FREE GPU inference for PRO users
+
+## Space Settings Required
+
+**IMPORTANT:** To run this Space, you need to configure the following in your HuggingFace Space settings:
+
+### 1. Hardware Selection
+- Go to **Settings** → **Hardware**
+- Select **ZeroGPU** (FREE for PRO users)
+- Save changes
+
+### 2. Environment Variables (Optional)
+If you want to use the OpenParlData API when it's available:
+- Add `HF_TOKEN` with your HuggingFace token
+
+## Usage
+
+Simply ask questions about Swiss parliamentary data in natural language:
+- "Who are the parliamentarians from Zurich?"
+- "Show me recent votes about climate policy"
+- "What motions were submitted about healthcare in 2024?"
+
+## Architecture
+
+- **Model:** microsoft/Phi-3-mini-4k-instruct (3.8B params)
+- **GPU:** ZeroGPU (H200) with dynamic allocation
+- **Framework:** Gradio + Transformers + PyTorch
+- **MCP Integration:** OpenParlData server for parliamentary data
+
+## Cost
+
+- **HF PRO:** $9/month (required for ZeroGPU)
+- **Inference:** FREE (included with PRO subscription)
+- **Total:** $9/month for unlimited usage within ZeroGPU quotas
+
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
```

app.py (CHANGED, +123 -45)

The hosted-inference path was removed: the previous version imported `InferenceClient` from `huggingface_hub` and printed a warning when `HF_TOKEN` was missing. It is replaced by loading Phi-3-mini locally with Transformers, wrapped in the ZeroGPU decorator when available. The updated hunks, with unchanged regions elided:

```python
"""
CoJournalist Data - Swiss Parliamentary Data Chatbot
Powered by Phi-3-mini and OpenParlData MCP
"""

import os
import json
import gradio as gr
from dotenv import load_dotenv
from mcp_integration import execute_mcp_query, OpenParlDataClient
import asyncio
from usage_tracker import UsageTracker
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Import spaces only if available (for HuggingFace Spaces)
try:
    import spaces
    SPACES_AVAILABLE = True
except ImportError:
    SPACES_AVAILABLE = False
    print("Running locally without ZeroGPU support")

# Load environment variables
load_dotenv()

# Initialize usage tracker with 50 requests per day limit
tracker = UsageTracker(daily_limit=50)

# Initialize model and tokenizer
MODEL_NAME = "microsoft/Phi-3-mini-4k-instruct"
print(f"Loading model: {MODEL_NAME}")
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)

# Detect device (MPS for Mac, CUDA for GPU, CPU fallback)
if torch.cuda.is_available():
    device = "cuda"
    dtype = torch.float16
elif hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
    device = "mps"
    dtype = torch.float16
else:
    device = "cpu"
    dtype = torch.float32

print(f"Using device: {device}")

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=dtype,
    device_map=device if device != "mps" else None,
    trust_remote_code=True
)

# Move to MPS explicitly (device_map does not cover MPS)
if device == "mps":
    model = model.to(device)

print(f"Model loaded successfully on {device}!")

# Available languages
LANGUAGES = {
    # ... unchanged ...
    "Italiano": "it"
}

# System prompt optimized for Phi-3-mini-4k-instruct
SYSTEM_PROMPT = """You are a helpful assistant that helps users query Swiss parliamentary data.

You have access to the following tools from the OpenParlData MCP server:
...
6. **openparldata_search_debates** - Search debate transcripts
   Parameters: query, date_from, date_to, speaker_id, language, limit

IMPORTANT: Your response MUST be valid JSON only. Do not include any explanatory text before or after the JSON. Do not wrap your response in code blocks or markdown formatting.

When a user asks a question about Swiss parliamentary data:
1. Analyze what information they need
2. Determine which tool(s) to use
3. Extract the relevant parameters from their question
4. Respond with ONLY a JSON object containing the tool call

Your response should be in this exact format:
{
...
"""

# ... (EXAMPLES dict unchanged) ...


def query_model_impl(message: str, language: str = "en") -> dict:
    """Query the Phi-3-mini model to interpret user intent and determine tool calls."""
    try:
        # Format the prompt with Phi-3's chat markers
        prompt = f"""<|system|>
{SYSTEM_PROMPT}<|end|>
<|user|>
Language: {language}
Question: {message}<|end|>
<|assistant|>
"""

        # Tokenize and generate
        inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=3072)
        inputs = {k: v.to(model.device) for k, v in inputs.items()}

        with torch.no_grad():
            outputs = model.generate(
                **inputs,
                max_new_tokens=500,
                temperature=0.3,
                do_sample=True,
                pad_token_id=tokenizer.eos_token_id
            )

        # Decode response
        full_response = tokenizer.decode(outputs[0], skip_special_tokens=True)

        # Extract only the assistant's reply (after the last <|assistant|>)
        if "<|assistant|>" in full_response:
            assistant_message = full_response.split("<|assistant|>")[-1].strip()
        else:
            assistant_message = full_response.strip()

        # Try to parse as JSON
        try:
            # Clean up response - enhanced for the Phi-3 model
            clean_response = assistant_message.strip()

            # Remove markdown code blocks
            if clean_response.startswith("```json"):
                clean_response = clean_response[7:]
            elif clean_response.startswith("```"):
                clean_response = clean_response[3:]

            if clean_response.endswith("```"):
                clean_response = clean_response[:-3]

            clean_response = clean_response.strip()

            # Find the first { or [ (start of JSON) to skip explanatory text
            json_start = min(
                clean_response.find('{') if '{' in clean_response else len(clean_response),
                clean_response.find('[') if '[' in clean_response else len(clean_response)
            )
            if json_start > 0:
                clean_response = clean_response[json_start:]

            return json.loads(clean_response)
        except json.JSONDecodeError:
            # If not valid JSON, treat as a natural language response
            return {"response": assistant_message}

    except Exception as e:
        return {"error": f"Error querying model: {str(e)}"}


# Apply the ZeroGPU decorator only when running on HuggingFace Spaces
if SPACES_AVAILABLE:
    query_model = spaces.GPU(duration=60)(query_model_impl)
else:
    query_model = query_model_impl


async def execute_tool_async(tool_name: str, arguments: dict, show_debug: bool) -> tuple:
    # ... unchanged ...


# Inside chat_response(message: str, history: list, language: str, show_debug: bool):
    # Get language code
    lang_code = LANGUAGES.get(language, "en")

    # Query the Phi-3 model to interpret intent
    model_response = query_model(message, lang_code)

    # Check if it's a direct response (no tool call needed)
    if "response" in model_response:
        return model_response["response"]

    # Check for error
    if "error" in model_response:
        return f"❌ {model_response['error']}"

    # Execute tool call
    if "tool" in model_response and "arguments" in model_response:
        tool_name = model_response["tool"]
        arguments = model_response["arguments"]
        explanation = model_response.get("explanation", "")

        # Ensure language is set in arguments
        if "language" not in arguments:
            # ... unchanged ...


# Inside `with gr.Blocks(css=custom_css, title="CoJournalist Data") as demo:`,
# the message-submission handler now takes a gr.Request for IP-based limiting:
    def respond(message, chat_history, language, show_debug, request: gr.Request):
        if not message.strip():
            return "", chat_history

        # Check usage limit
        user_id = request.client.host if request and hasattr(request, 'client') else "unknown"

        if not tracker.check_limit(user_id):
            bot_message = (
                "⚠️ Daily request limit reached. You have used all 50 requests for today. "
                "Please try again tomorrow.\n\n"
                "This limit helps us keep the service free and available for everyone."
            )
            chat_history.append((message, bot_message))
            return "", chat_history

        # Get bot response
        bot_message = chat_response(message, chat_history, language, show_debug)
        # ... unchanged ...


# The footer markdown in the Blocks UI gains rate-limit and attribution notes:
    """
    **Note:** This app uses the OpenParlData MCP server to access Swiss parliamentary data.
    Currently returning mock data while the OpenParlData API is in development.

    **Rate Limit:** 50 requests per day per user to keep the service free and accessible.

    Powered by [Phi-3-mini](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) on ZeroGPU and [Model Context Protocol (MCP)](https://modelcontextprotocol.io/)
    """
)
```

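For reference, this is the shape of tool call that the JSON parser above accepts. It is an illustrative sketch: the `tool`, `arguments`, and `explanation` keys are the ones `chat_response` reads, the argument values here are hypothetical, and the authoritative schema is the elided portion of `SYSTEM_PROMPT`.

```python
# Illustrative only: a tool call in the form chat_response() expects.
example_tool_call = {
    "tool": "openparldata_search_debates",
    "arguments": {"query": "Klimapolitik", "language": "de", "limit": 10},
    "explanation": "Searching debate transcripts about climate policy",
}
```
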
requirements.txt (CHANGED, +6 -0)

```diff
@@ -6,6 +6,12 @@ gradio>=5.49.1
 
 # Hugging Face
 huggingface-hub>=0.22.0
+transformers>=4.40.0
+torch>=2.0.0
+accelerate>=0.20.0
+
+# ZeroGPU support (required for HuggingFace Spaces deployment)
+spaces>=0.28.0
 
 # MCP Support
 mcp>=0.1.0
```

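A quick sanity check, run after the build or locally, that the new dependencies resolved. This is a throwaway snippet, not part of the app:

```python
# Confirm the added packages import and report their versions.
import torch
import transformers
import accelerate

print("torch", torch.__version__)
print("transformers", transformers.__version__)
print("accelerate", accelerate.__version__)

try:
    import spaces  # only needed on HuggingFace Spaces
    print("spaces available")
except ImportError:
    print("spaces not installed (fine for local runs)")
```
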
usage_tracker.py (ADDED, +75 -0)

```python
"""
Usage tracking module for rate limiting API requests.

This module provides a simple in-memory usage tracker that limits
the number of requests per user per day.
"""

from datetime import date, datetime
from typing import Dict


class UsageTracker:
    """Track and limit user requests on a daily basis."""

    def __init__(self, daily_limit: int = 100):
        """
        Initialize the usage tracker.

        Args:
            daily_limit: Maximum number of requests per user per day
        """
        self.daily_limit = daily_limit
        self.usage: Dict[date, Dict[str, int]] = {}

    def check_limit(self, user_id: str) -> bool:
        """
        Check if the user has exceeded their daily limit and increment the counter.

        Args:
            user_id: Unique identifier for the user (typically IP address)

        Returns:
            True if the request is allowed, False if the limit is exceeded
        """
        today = datetime.now().date()

        # Clean up old dates to prevent memory growth
        if today not in self.usage:
            self.usage = {today: {}}

        # Get current usage count for this user
        user_count = self.usage[today].get(user_id, 0)

        # Check if limit exceeded
        if user_count >= self.daily_limit:
            return False

        # Increment counter
        self.usage[today][user_id] = user_count + 1
        return True

    def get_usage(self, user_id: str) -> int:
        """
        Get the current usage count for a user today.

        Args:
            user_id: Unique identifier for the user

        Returns:
            Number of requests made today
        """
        today = datetime.now().date()
        return self.usage.get(today, {}).get(user_id, 0)

    def get_remaining(self, user_id: str) -> int:
        """
        Get remaining requests for a user today.

        Args:
            user_id: Unique identifier for the user

        Returns:
            Number of requests remaining today
        """
        return max(0, self.daily_limit - self.get_usage(user_id))
```

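A minimal sketch of the tracker's semantics, not part of the repo, assuming `usage_tracker.py` is on the import path:

```python
from usage_tracker import UsageTracker

tracker = UsageTracker(daily_limit=2)
ip = "203.0.113.7"  # example client address, as taken from gr.Request in app.py

print(tracker.check_limit(ip))    # True  -> first request allowed, count becomes 1
print(tracker.check_limit(ip))    # True  -> second request allowed, count becomes 2
print(tracker.check_limit(ip))    # False -> limit reached, request refused
print(tracker.get_remaining(ip))  # 0
```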