Tom Claude committed on
Commit c5df650 · 1 Parent(s): c7dcc92

Switch to Llama-3.1-8B-Instruct via HF Inference Providers


Major changes:
- Replace local Phi-3 model with Llama-3.1-8B-Instruct via the Inference API (see the sketch below)
- Remove GPU dependencies (torch, transformers, accelerate, spaces)
- Use HuggingFace Inference Providers (Novita, etc.) for model hosting
- Enhance system prompt with explicit date format and enum value rules
- Reduce monthly cost from $40 to ~$12 (Team → PRO plan)
- Keep usage tracker (50 req/day per user) and MCP integration

Benefits:
- No more "model_pending_deploy" errors
- Native tool calling support via Llama-3.1
- Predictable costs with Inference Provider pay-per-use
- No ZeroGPU or Team plan required
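
The new call path in one glance - a minimal sketch that mirrors the app.py diff below (model name, max_tokens, temperature, and the HF_TOKEN variable are taken from that diff; the one-line system prompt here is a stand-in for the full SYSTEM_PROMPT):

import os
from huggingface_hub import InferenceClient

# Token must grant Inference Provider access (set in .env or Space secrets)
client = InferenceClient(token=os.getenv("HF_TOKEN"))

response = client.chat_completion(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[
        {"role": "system", "content": "Respond with valid JSON only."},  # stand-in prompt
        {"role": "user", "content": "Language: en\nQuestion: Find recent votes on climate policy."},
    ],
    max_tokens=500,
    temperature=0.3,
)
print(response.choices[0].message.content)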

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>

Files changed (3)
  1. README.md +19 -18
  2. app.py +50 -92
  3. requirements.txt +0 -6
README.md CHANGED
@@ -7,33 +7,31 @@ sdk: gradio
 sdk_version: 5.49.1
 app_file: app.py
 pinned: false
-short_description: Swiss Parliamentary Data Chatbot with Phi-3-mini
+short_description: Swiss Parliamentary Data Chatbot with Llama-3.1-8B
 ---
 
 # 🏛️ CoJournalist Data
 
-A Swiss Parliamentary Data Chatbot powered by Phi-3-mini and the OpenParlData MCP server.
+A Swiss Parliamentary Data Chatbot powered by Llama-3.1-8B-Instruct and the OpenParlData MCP server.
 
 ## Features
 
-- 🤖 **Phi-3-mini-4k-instruct** - Efficient 3.8B parameter model running on ZeroGPU
+- 🤖 **Llama-3.1-8B-Instruct** - Meta's 8B parameter model with native tool calling support
 - 🌍 **Multilingual** - Support for English, German, French, and Italian
 - 🛠️ **Tool Calling** - Intelligent query routing to parliamentary data APIs
 - 🔒 **Rate Limited** - 50 requests per day per user for cost control
-- ⚡ **ZeroGPU** - FREE GPU inference for PRO users
+- ⚡ **HF Inference Providers** - Fast inference via Novita and other providers
 
 ## Space Settings Required
 
-**IMPORTANT:** To run this Space, you need to configure the following in your HuggingFace Space settings:
+**IMPORTANT:** To run this Space, configure the following:
 
-### 1. Hardware Selection
-- Go to **Settings** → **Hardware**
-- Select **ZeroGPU** (FREE for PRO users)
-- Save changes
+### Environment Variables
+- **Required:** `HF_TOKEN` - Your HuggingFace token with Inference Provider access
+- Add this in Space Settings → Repository secrets
 
-### 2. Environment Variables (Optional)
-If you want to use the OpenParlData API when it's available:
-- Add `HF_TOKEN` with your HuggingFace token
+### Hardware
+- **CPU Basic** (Free) - Sufficient since inference happens via API
 
 ## Usage
 
@@ -44,15 +42,18 @@ Simply ask questions about Swiss parliamentary data in natural language:
 
 ## Architecture
 
-- **Model:** microsoft/Phi-3-mini-4k-instruct (3.8B params)
-- **GPU:** ZeroGPU (H200) with dynamic allocation
-- **Framework:** Gradio + Transformers + PyTorch
+- **Model:** meta-llama/Llama-3.1-8B-Instruct (8B params)
+- **Inference:** HuggingFace Inference Providers (Novita, etc.)
+- **Framework:** Gradio + HuggingFace Hub
 - **MCP Integration:** OpenParlData server for parliamentary data
 
 ## Cost
 
-- **HF PRO:** $9/month (required for ZeroGPU)
-- **Inference:** FREE (included with PRO subscription)
-- **Total:** $9/month for unlimited usage within ZeroGPU quotas
+- **HF PRO:** $9/month (recommended)
+- **Inference:** $2/month included credits + pay-per-use
+- **Estimated Total:** ~$12/month for typical usage (1,500 requests/month)
+- **Space Hardware:** FREE (CPU Basic)
+
+With 50 requests/day limit, costs stay predictable and affordable.
 
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
app.py CHANGED
@@ -1,63 +1,30 @@
 """
 CoJournalist Data - Swiss Parliamentary Data Chatbot
-Powered by Phi-3-mini and OpenParlData MCP
+Powered by Llama-3.1-8B-Instruct and OpenParlData MCP
 """
 
 import os
 import json
 import gradio as gr
+from huggingface_hub import InferenceClient
 from dotenv import load_dotenv
 from mcp_integration import execute_mcp_query, OpenParlDataClient
 import asyncio
 from usage_tracker import UsageTracker
-import torch
-from transformers import AutoModelForCausalLM, AutoTokenizer
-
-# Import spaces only if available (for HuggingFace Spaces)
-try:
-    import spaces
-    SPACES_AVAILABLE = True
-except ImportError:
-    SPACES_AVAILABLE = False
-    print("Running locally without ZeroGPU support")
 
 # Load environment variables
 load_dotenv()
 
+# Initialize Hugging Face Inference Client
+HF_TOKEN = os.getenv("HF_TOKEN")
+if not HF_TOKEN:
+    print("Warning: HF_TOKEN not found. Please set it in .env file or Hugging Face Space secrets.")
+
+client = InferenceClient(token=HF_TOKEN)
+
 # Initialize usage tracker with 50 requests per day limit
 tracker = UsageTracker(daily_limit=50)
 
-# Initialize model and tokenizer
-MODEL_NAME = "microsoft/Phi-3-mini-4k-instruct"
-print(f"Loading model: {MODEL_NAME}")
-tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
-
-# Detect device (MPS for Mac, CUDA for GPU, CPU fallback)
-if torch.cuda.is_available():
-    device = "cuda"
-    dtype = torch.float16
-elif hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
-    device = "mps"
-    dtype = torch.float16
-else:
-    device = "cpu"
-    dtype = torch.float32
-
-print(f"Using device: {device}")
-
-model = AutoModelForCausalLM.from_pretrained(
-    MODEL_NAME,
-    torch_dtype=dtype,
-    device_map=device if device != "mps" else None,
-    trust_remote_code=True
-)
-
-# Move to MPS if needed
-if device == "mps":
-    model = model.to(device)
-
-print(f"Model loaded successfully on {device}!")
-
 # Available languages
 LANGUAGES = {
     "English": "en",
@@ -66,7 +33,7 @@ LANGUAGES = {
     "Italiano": "it"
 }
 
-# System prompt optimized for Phi-3-mini-4k-instruct
+# System prompt for Llama-3.1-8B-Instruct
 SYSTEM_PROMPT = """You are a helpful assistant that helps users query Swiss parliamentary data.
 
 You have access to the following tools from the OpenParlData MCP server:
@@ -78,18 +45,29 @@ You have access to the following tools from the OpenParlData MCP server:
    Parameters: person_id, include_votes, include_motions, language
 
 3. **openparldata_search_votes** - Search parliamentary votes
-   Parameters: query (title/description), date_from (YYYY-MM-DD), date_to, vote_type, language, limit
+   Parameters:
+   - query (title/description)
+   - date_from (YYYY-MM-DD format, e.g., "2024-01-01")
+   - date_to (YYYY-MM-DD format, e.g., "2024-12-31" - NEVER use "now", always use actual date)
+   - vote_type (must be "final", "detail", or "overall")
+   - language, limit
 
 4. **openparldata_get_vote_details** - Get detailed vote information
    Parameters: vote_id, include_individual_votes, language
 
 5. **openparldata_search_motions** - Search motions and proposals
-   Parameters: query, status, date_from, date_to, submitter_id, language, limit
+   Parameters: query, status, date_from (YYYY-MM-DD), date_to (YYYY-MM-DD), submitter_id, language, limit
 
 6. **openparldata_search_debates** - Search debate transcripts
-   Parameters: query, date_from, date_to, speaker_id, language, limit
+   Parameters: query, date_from (YYYY-MM-DD), date_to (YYYY-MM-DD), speaker_id, language, limit
 
-IMPORTANT: Your response MUST be valid JSON only. Do not include any explanatory text before or after the JSON. Do not wrap your response in code blocks or markdown formatting.
+CRITICAL RULES:
+- All dates MUST be in YYYY-MM-DD format (e.g., "2024-12-31")
+- NEVER use "now", "today", or relative dates - always use actual YYYY-MM-DD dates
+- For "latest" queries, use date_from with a recent date like "2024-01-01" and NO date_to parameter
+- vote_type must ONLY be "final", "detail", or "overall" - no other values
+- Your response MUST be valid JSON only
+- Do NOT include explanatory text or markdown formatting
 
 When a user asks a question about Swiss parliamentary data:
 1. Analyze what information they need
@@ -155,55 +133,37 @@ EXAMPLES = {
 }
 
 
-def query_model_impl(message: str, language: str = "en") -> dict:
-    """Query Phi-3-mini model to interpret user intent and determine tool calls."""
+async def query_model_async(message: str, language: str = "en") -> dict:
+    """Query Llama-3.1-8B model via Inference Providers to interpret user intent and determine tool calls."""
 
     try:
-        # Format prompt for Phi-3
-        prompt = f"""<|system|>
-{SYSTEM_PROMPT}<|end|>
-<|user|>
-Language: {language}
-Question: {message}<|end|>
-<|assistant|>
-"""
-
-        # Tokenize and generate
-        inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=3072)
-        inputs = {k: v.to(model.device) for k, v in inputs.items()}
-
-        with torch.no_grad():
-            outputs = model.generate(
-                **inputs,
-                max_new_tokens=500,
-                temperature=0.3,
-                do_sample=True,
-                pad_token_id=tokenizer.eos_token_id
-            )
-
-        # Decode response
-        full_response = tokenizer.decode(outputs[0], skip_special_tokens=True)
-
-        # Extract only the assistant's response (after the last <|assistant|>)
-        if "<|assistant|>" in full_response:
-            assistant_message = full_response.split("<|assistant|>")[-1].strip()
-        else:
-            assistant_message = full_response.strip()
+        # Create messages for chat completion
+        messages = [
+            {"role": "system", "content": SYSTEM_PROMPT},
+            {"role": "user", "content": f"Language: {language}\nQuestion: {message}"}
+        ]
+
+        # Call Llama-3.1-8B via HuggingFace Inference Providers
+        response = client.chat_completion(
+            model="meta-llama/Llama-3.1-8B-Instruct",
+            messages=messages,
+            max_tokens=500,
+            temperature=0.3
+        )
+
+        # Extract response
+        assistant_message = response.choices[0].message.content
 
         # Try to parse as JSON
        try:
-            # Clean up response - enhanced for Phi-3 model
+            # Clean up response (sometimes models add markdown code blocks)
            clean_response = assistant_message.strip()
-
-            # Remove markdown code blocks
            if clean_response.startswith("```json"):
                clean_response = clean_response[7:]
-            elif clean_response.startswith("```"):
+            if clean_response.startswith("```"):
                clean_response = clean_response[3:]
-
            if clean_response.endswith("```"):
                clean_response = clean_response[:-3]
-
            clean_response = clean_response.strip()

            # Find first { or [ (start of JSON) to handle explanatory text
@@ -223,11 +183,9 @@ Question: {message}<|end|>
        return {"error": f"Error querying model: {str(e)}"}
 
 
-# Apply ZeroGPU decorator only when running on HuggingFace Spaces
-if SPACES_AVAILABLE:
-    query_model = spaces.GPU(duration=60)(query_model_impl)
-else:
-    query_model = query_model_impl
+def query_model(message: str, language: str = "en") -> dict:
+    """Synchronous wrapper for async model query."""
+    return asyncio.run(query_model_async(message, language))
 
 
 async def execute_tool_async(tool_name: str, arguments: dict, show_debug: bool) -> tuple:
@@ -417,9 +375,9 @@ with gr.Blocks(css=custom_css, title="CoJournalist Data") as demo:
    **Note:** This app uses the OpenParlData MCP server to access Swiss parliamentary data.
    Currently returning mock data while the OpenParlData API is in development.
 
-    **Rate Limit:** 50 requests per day per user to keep the service free and accessible.
+    **Rate Limit:** 50 requests per day per user to keep the service affordable and accessible.
 
-    Powered by [Phi-3-mini](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) on ZeroGPU and [Model Context Protocol (MCP)](https://modelcontextprotocol.io/)
+    Powered by [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) via HF Inference Providers and [Model Context Protocol (MCP)](https://modelcontextprotocol.io/)
    """
    )
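
For illustration, here is what a routed query could look like end to end under the new JSON-only contract. The "tool" and "arguments" key names are hypothetical (the real response schema lives in app.py's EXAMPLES block, which this commit does not touch); the date format and vote_type value follow the CRITICAL RULES added above.

# Hypothetical usage sketch - key names are illustrative, not confirmed by this diff
result = query_model("Which final votes on climate policy happened since January 2024?", language="en")
# Plausible parsed shape, per the CRITICAL RULES (date_from in YYYY-MM-DD, no date_to for "latest"):
# {"tool": "openparldata_search_votes",
#  "arguments": {"query": "climate policy", "date_from": "2024-01-01",
#                "vote_type": "final", "language": "en", "limit": 10}}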
 
requirements.txt CHANGED
@@ -6,12 +6,6 @@ gradio>=5.49.1
 
 # Hugging Face
 huggingface-hub>=0.22.0
-transformers>=4.40.0
-torch>=2.0.0
-accelerate>=0.20.0
-
-# ZeroGPU support (required for HuggingFace Spaces deployment)
-spaces>=0.28.0
 
 # MCP Support
 mcp>=0.1.0