Tom Claude committed
Commit · c5df650
Parent(s): c7dcc92
Switch to Llama-3.1-8B-Instruct via HF Inference Providers
Major changes:
- Replace Phi-3 local model with Llama-3.1-8B-Instruct via Inference API
- Remove GPU dependencies (torch, transformers, accelerate, spaces)
- Use HuggingFace Inference Providers (Novita, etc.) for model hosting
- Enhanced system prompt with explicit date format and enum value rules
- Reduce monthly cost from $40 to ~$12 (Team → PRO plan)
- Keep usage tracker (50 req/day per user) and MCP integration
Benefits:
- No more "model_pending_deploy" errors
- Native tool calling support via Llama-3.1
- Predictable costs with Inference Provider pay-per-use
- No ZeroGPU or Team plan required
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
- README.md +19 -18
- app.py +50 -92
- requirements.txt +0 -6
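
The heart of the change shows up in app.py below: the locally loaded Phi-3 pipeline collapses into a single chat-completion call against a hosted endpoint. A minimal sketch of the new call path, with the model name and sampling parameters taken from the diff (the prompt strings here are illustrative):

import os
from huggingface_hub import InferenceClient

# Token needs Inference Provider access; app.py reads it from the environment
client = InferenceClient(token=os.getenv("HF_TOKEN"))

response = client.chat_completion(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant ..."},  # SYSTEM_PROMPT in app.py
        {"role": "user", "content": "Language: en\nQuestion: What were the latest votes?"},
    ],
    max_tokens=500,
    temperature=0.3,
)
print(response.choices[0].message.content)  # the app expects a JSON tool call here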
README.md
CHANGED

@@ -7,33 +7,31 @@ sdk: gradio
 sdk_version: 5.49.1
 app_file: app.py
 pinned: false
-short_description: Swiss Parliamentary Data Chatbot with …
+short_description: Swiss Parliamentary Data Chatbot with Llama-3.1-8B
 ---

 # 🏛️ CoJournalist Data

-A Swiss Parliamentary Data Chatbot powered by …
+A Swiss Parliamentary Data Chatbot powered by Llama-3.1-8B-Instruct and the OpenParlData MCP server.

 ## Features

-- 🤖 **…
+- 🤖 **Llama-3.1-8B-Instruct** - Meta's 8B parameter model with native tool calling support
 - 🌍 **Multilingual** - Support for English, German, French, and Italian
 - 🛠️ **Tool Calling** - Intelligent query routing to parliamentary data APIs
 - 🔒 **Rate Limited** - 50 requests per day per user for cost control
-- ⚡ **…
+- ⚡ **HF Inference Providers** - Fast inference via Novita and other providers

 ## Space Settings Required

-**IMPORTANT:** To run this Space, …
+**IMPORTANT:** To run this Space, configure the following:

-### …
-- …
-- …
-- Save changes
+### Environment Variables
+- **Required:** `HF_TOKEN` - Your HuggingFace token with Inference Provider access
+  - Add this in Space Settings → Repository secrets

-### …
-…
-- Add `HF_TOKEN` with your HuggingFace token
+### Hardware
+- **CPU Basic** (Free) - Sufficient since inference happens via API

 ## Usage

@@ -44,15 +42,18 @@ Simply ask questions about Swiss parliamentary data in natural language:

 ## Architecture

-- **Model:** …
-- **…
-- **Framework:** Gradio + …
+- **Model:** meta-llama/Llama-3.1-8B-Instruct (8B params)
+- **Inference:** HuggingFace Inference Providers (Novita, etc.)
+- **Framework:** Gradio + HuggingFace Hub
 - **MCP Integration:** OpenParlData server for parliamentary data

 ## Cost

-- **HF PRO:** $9/month (…
-- **Inference:** …
-- **Total:** …
+- **HF PRO:** $9/month (recommended)
+- **Inference:** $2/month included credits + pay-per-use
+- **Estimated Total:** ~$12/month for typical usage (1,500 requests/month)
+- **Space Hardware:** FREE (CPU Basic)
+
+With 50 requests/day limit, costs stay predictable and affordable.

 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
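
The ~$12 estimate in the Cost section follows directly from the app's own rate limit. A back-of-envelope check (the $3 inference spend is an assumption, not provider billing data):

# 50 requests/day cap, enforced by the usage tracker
monthly_requests = 50 * 30          # = 1,500 requests/month, as the README states

hf_pro = 9.00                       # HF PRO subscription, $/month
inference_est = 3.00                # assumed pay-per-use on top of the $2 included credits
print(monthly_requests, hf_pro + inference_est)   # 1500 requests, ~$12.0/month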
app.py
CHANGED

@@ -1,63 +1,30 @@
 """
 CoJournalist Data - Swiss Parliamentary Data Chatbot
-Powered by …
+Powered by Llama-3.1-8B-Instruct and OpenParlData MCP
 """

 import os
 import json
 import gradio as gr
+from huggingface_hub import InferenceClient
 from dotenv import load_dotenv
 from mcp_integration import execute_mcp_query, OpenParlDataClient
 import asyncio
 from usage_tracker import UsageTracker
-import torch
-from transformers import AutoModelForCausalLM, AutoTokenizer
-
-# Import spaces only if available (for HuggingFace Spaces)
-try:
-    import spaces
-    SPACES_AVAILABLE = True
-except ImportError:
-    SPACES_AVAILABLE = False
-    print("Running locally without ZeroGPU support")

 # Load environment variables
 load_dotenv()

+# Initialize Hugging Face Inference Client
+HF_TOKEN = os.getenv("HF_TOKEN")
+if not HF_TOKEN:
+    print("Warning: HF_TOKEN not found. Please set it in .env file or Hugging Face Space secrets.")
+
+client = InferenceClient(token=HF_TOKEN)
+
 # Initialize usage tracker with 50 requests per day limit
 tracker = UsageTracker(daily_limit=50)

-# Initialize model and tokenizer
-MODEL_NAME = "microsoft/Phi-3-mini-4k-instruct"
-print(f"Loading model: {MODEL_NAME}")
-tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
-
-# Detect device (MPS for Mac, CUDA for GPU, CPU fallback)
-if torch.cuda.is_available():
-    device = "cuda"
-    dtype = torch.float16
-elif hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
-    device = "mps"
-    dtype = torch.float16
-else:
-    device = "cpu"
-    dtype = torch.float32
-
-print(f"Using device: {device}")
-
-model = AutoModelForCausalLM.from_pretrained(
-    MODEL_NAME,
-    torch_dtype=dtype,
-    device_map=device if device != "mps" else None,
-    trust_remote_code=True
-)
-
-# Move to MPS if needed
-if device == "mps":
-    model = model.to(device)
-
-print(f"Model loaded successfully on {device}!")
-
 # Available languages
 LANGUAGES = {
     "English": "en",
@@ -66,7 +33,7 @@ LANGUAGES = {
     "Italiano": "it"
 }

-# System prompt
+# System prompt for Llama-3.1-8B-Instruct
 SYSTEM_PROMPT = """You are a helpful assistant that helps users query Swiss parliamentary data.

 You have access to the following tools from the OpenParlData MCP server:
@@ -78,18 +45,29 @@ You have access to the following tools from the OpenParlData MCP server:
    Parameters: person_id, include_votes, include_motions, language

 3. **openparldata_search_votes** - Search parliamentary votes
-   Parameters: …
+   Parameters:
+   - query (title/description)
+   - date_from (YYYY-MM-DD format, e.g., "2024-01-01")
+   - date_to (YYYY-MM-DD format, e.g., "2024-12-31" - NEVER use "now", always use actual date)
+   - vote_type (must be "final", "detail", or "overall")
+   - language, limit

 4. **openparldata_get_vote_details** - Get detailed vote information
    Parameters: vote_id, include_individual_votes, language

 5. **openparldata_search_motions** - Search motions and proposals
-   Parameters: query, status, date_from, date_to, submitter_id, language, limit
+   Parameters: query, status, date_from (YYYY-MM-DD), date_to (YYYY-MM-DD), submitter_id, language, limit

 6. **openparldata_search_debates** - Search debate transcripts
-   Parameters: query, date_from, date_to, speaker_id, language, limit
+   Parameters: query, date_from (YYYY-MM-DD), date_to (YYYY-MM-DD), speaker_id, language, limit

-
+CRITICAL RULES:
+- All dates MUST be in YYYY-MM-DD format (e.g., "2024-12-31")
+- NEVER use "now", "today", or relative dates - always use actual YYYY-MM-DD dates
+- For "latest" queries, use date_from with a recent date like "2024-01-01" and NO date_to parameter
+- vote_type must ONLY be "final", "detail", or "overall" - no other values
+- Your response MUST be valid JSON only
+- Do NOT include explanatory text or markdown formatting

 When a user asks a question about Swiss parliamentary data:
 1. Analyze what information they need
@@ -155,55 +133,37 @@ EXAMPLES = {
 }


-def …
-    """Query …
+async def query_model_async(message: str, language: str = "en") -> dict:
+    """Query Llama-3.1-8B model via Inference Providers to interpret user intent and determine tool calls."""

     try:
-        # …
-        {SYSTEM_PROMPT}
-        … (old Phi-3 prompt template and generate() call, truncated in the diff view)
-            max_new_tokens=500,
-            temperature=0.3,
-            do_sample=True,
-            pad_token_id=tokenizer.eos_token_id
-        )
-
-        # Decode response
-        full_response = tokenizer.decode(outputs[0], skip_special_tokens=True)
-
-        # Extract only the assistant's response (after the last <|assistant|>)
-        if "<|assistant|>" in full_response:
-            assistant_message = full_response.split("<|assistant|>")[-1].strip()
-        else:
-            assistant_message = full_response.strip()
+        # Create messages for chat completion
+        messages = [
+            {"role": "system", "content": SYSTEM_PROMPT},
+            {"role": "user", "content": f"Language: {language}\nQuestion: {message}"}
+        ]
+
+        # Call Llama-3.1-8B via HuggingFace Inference Providers
+        response = client.chat_completion(
+            model="meta-llama/Llama-3.1-8B-Instruct",
+            messages=messages,
+            max_tokens=500,
+            temperature=0.3
+        )
+
+        # Extract response
+        assistant_message = response.choices[0].message.content

         # Try to parse as JSON
         try:
-            # Clean up response
+            # Clean up response (sometimes models add markdown code blocks)
             clean_response = assistant_message.strip()
-
-            # Remove markdown code blocks
             if clean_response.startswith("```json"):
                 clean_response = clean_response[7:]
-            …
+            if clean_response.startswith("```"):
                 clean_response = clean_response[3:]
-
             if clean_response.endswith("```"):
                 clean_response = clean_response[:-3]
-
             clean_response = clean_response.strip()

             # Find first { or [ (start of JSON) to handle explanatory text
@@ -223,11 +183,9 @@ Question: {message}<|end|>
         return {"error": f"Error querying model: {str(e)}"}


-… (lines truncated in the diff view)
-else:
-    query_model = query_model_impl
+def query_model(message: str, language: str = "en") -> dict:
+    """Synchronous wrapper for async model query."""
+    return asyncio.run(query_model_async(message, language))


 async def execute_tool_async(tool_name: str, arguments: dict, show_debug: bool) -> tuple:
@@ -417,9 +375,9 @@ with gr.Blocks(css=custom_css, title="CoJournalist Data") as demo:
     **Note:** This app uses the OpenParlData MCP server to access Swiss parliamentary data.
     Currently returning mock data while the OpenParlData API is in development.

-    **Rate Limit:** 50 requests per day per user to keep the service …
+    **Rate Limit:** 50 requests per day per user to keep the service affordable and accessible.

-    Powered by […
+    Powered by [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) via HF Inference Providers and [Model Context Protocol (MCP)](https://modelcontextprotocol.io/)
     """
     )
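
The fence-stripping logic that survives the rewrite tolerates replies wrapped in markdown code blocks before handing them to json.loads. A standalone illustration of those same steps (the reply string is invented; the real code additionally scans for the first { or [):

import json

def strip_fences(text: str) -> dict:
    # Mirrors the cleanup steps kept in query_model_async
    s = text.strip()
    if s.startswith("```json"):
        s = s[7:]
    if s.startswith("```"):
        s = s[3:]
    if s.endswith("```"):
        s = s[:-3]
    return json.loads(s.strip())

reply = '```json\n{"tool": "openparldata_search_votes", "arguments": {"date_from": "2024-01-01"}}\n```'
print(strip_fences(reply)["tool"])  # -> openparldata_search_votes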
requirements.txt
CHANGED

@@ -6,12 +6,6 @@ gradio>=5.49.1

 # Hugging Face
 huggingface-hub>=0.22.0
-transformers>=4.40.0
-torch>=2.0.0
-accelerate>=0.20.0
-
-# ZeroGPU support (required for HuggingFace Spaces deployment)
-spaces>=0.28.0

 # MCP Support
 mcp>=0.1.0