Implement engine-per-dataset architecture with argument sanitization and enhanced UI
Major architectural improvements:
- Refactor to engine-per-dataset pattern (ParliamentEngine, BFSEngine)
- Add comprehensive argument sanitization layer to prevent MCP validation errors
- Implement datacube knowledge base for BFS semantic search
- Add parliament cards display with smart pagination (10 items, auto-hide when <10)
- Add strategic logging for debugging MCP payloads
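The engine-per-dataset pattern described above can be sketched minimally as follows (class and method names are simplified placeholders; the full `DatasetEngine`/`ParliamentEngine` implementation appears in the app.py diff below):

```python
# Minimal sketch: each data source gets its own engine carrying its own
# prompt, allowed tool set, and argument sanitizer.
class Engine:
    def __init__(self, name: str, allowed_tools: set[str]):
        self.name = name
        self.allowed_tools = allowed_tools

    def sanitize(self, tool: str, args: dict) -> dict:
        raise NotImplementedError


class ParliamentEngine(Engine):
    ALLOWED_PARAMS = {"query", "limit", "language"}  # simplified schema

    def __init__(self):
        super().__init__("parliament", {"openparldata_search_votes"})

    def sanitize(self, tool: str, args: dict) -> dict:
        # Drop parameters the tool schema doesn't accept.
        return {k: v for k, v in args.items() if k in self.ALLOWED_PARAMS}


engine = ParliamentEngine()
print(engine.sanitize("openparldata_search_votes", {"query": "climate", "bogus": 1}))  # → {'query': 'climate'}
```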
BFS MCP enhancements:
- Create mcp_bfs/ with knowledge base mapping keywords to datacube IDs
- Map topics (population, employment, health, etc.) to specific datacubes
- Enable semantic search without relying on cryptic API IDs
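A keyword-to-datacube knowledge base of the kind described above might look like this sketch (the datacube IDs below are placeholders, not real BFS identifiers):

```python
# Hypothetical knowledge base mapping topic keywords to BFS datacube IDs,
# so queries can be answered without the user knowing the cryptic API IDs.
KNOWLEDGE_BASE = {
    "population": {"datacubes": ["px-x-placeholder-pop"], "keywords": ["population", "inhabitants"]},
    "employment": {"datacubes": ["px-x-placeholder-emp"], "keywords": ["employment", "jobs", "labour"]},
}


def find_datacubes(query: str) -> list[str]:
    """Return datacube IDs whose keywords appear in the query."""
    q = query.lower()
    hits: list[str] = []
    for topic in KNOWLEDGE_BASE.values():
        if any(kw in q for kw in topic["keywords"]):
            hits.extend(topic["datacubes"])
    return hits


print(find_datacubes("How many jobs were created?"))  # → ['px-x-placeholder-emp']
```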
UI improvements:
- Simplify BFS search results display (no duplicate listings)
- Add language-aware content display (de/fr/it preference with English fallback)
- Remove Example Questions and Debug Info sections
- Update title to "CoJournalist Swiss Data" with white text
- Simplify API dataset values to "openparldata" and "bfs"
- Change placeholder text to "(Choose a source on the right first)"
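The de/fr/it preference with fallback mentioned above follows the pattern used later in the diff; a standalone sketch:

```python
def pick_language(texts: dict, preferred: str) -> str:
    """Pick the preferred language, falling back de → fr → it.

    English is not provided by the API, so an 'en' preference falls back
    to German first (matching the behavior in the diff below).
    """
    if preferred == "en":
        preferred = "de"
    return (texts.get(preferred) or texts.get("de") or
            texts.get("fr") or texts.get("it") or "Untitled")


title = {"de": "Klimapolitik", "fr": "Politique climatique"}
print(pick_language(title, "it"))  # → 'Klimapolitik'
```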
Technical improvements:
- Add Pydantic-compatible type conversions (string→int for limit, string→enum for language/format)
- Implement tool-specific parameter filtering to prevent extra='forbid' errors
- Update prompts with explicit parameter constraints
- Enable Gradio API parameter support for dataset selection
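The coercions listed above (string→int for limit, validated enum for language, dropping unknown keys so Pydantic models configured with `extra='forbid'` don't reject the call) can be sketched as:

```python
# Sketch of the sanitization layer, assuming a simplified parameter schema.
ALLOWED = {"query", "limit", "language"}


def sanitize(args: dict) -> dict:
    out = {}
    for key, value in args.items():
        if key not in ALLOWED:
            continue  # prevents extra='forbid' validation errors
        if key == "limit":
            try:
                out[key] = max(1, min(100, int(value)))  # string→int, clamped to 1-100
            except (TypeError, ValueError):
                out[key] = 20  # default
        elif key == "language":
            # case-insensitive enum validation with English fallback
            out[key] = str(value).lower() if str(value).upper() in {"DE", "FR", "IT", "EN"} else "en"
        else:
            out[key] = value
    return out


print(sanitize({"limit": "250", "language": "DE", "surprise": True}))  # → {'limit': 100, 'language': 'de'}
```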
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- app.py +946 -243
- mcp/openparldata_mcp.py +2 -2
- mcp_bfs/MCP_USAGE.md +120 -0
- mcp_bfs/bfs_mcp_server.py +538 -0
- mcp_bfs/requirements.txt +4 -0
- mcp_bfs/test_bfs_api.py +99 -0
- mcp_integration.py +141 -3
- prompts/bfs.txt +34 -0
- prompts/parliament.txt +29 -0
- requirements.txt +3 -0
@@ -1,20 +1,37 @@
 """
-CoJournalist Data - Swiss Parliamentary Data Chatbot
-Powered by Llama-3.1-8B-Instruct
 """

 import os
 import json
 import gradio as gr
 from huggingface_hub import InferenceClient
 from dotenv import load_dotenv
-from mcp_integration import execute_mcp_query,
 import asyncio
 from usage_tracker import UsageTracker

 # Load environment variables
 load_dotenv()

 # Initialize Hugging Face Inference Client
 HF_TOKEN = os.getenv("HF_TOKEN")
 if not HF_TOKEN:
@@ -22,6 +39,572 @@ if not HF_TOKEN:

 client = InferenceClient(token=HF_TOKEN)

 # Initialize usage tracker with 50 requests per day limit
 tracker = UsageTracker(daily_limit=50)

@@ -33,79 +616,8 @@ LANGUAGES = {
     "Italiano": "it"
 }

-#
-
-
-You have access to the following tools from the OpenParlData MCP server:
-
-1. **openparldata_search_parliamentarians** - Search for Swiss parliamentarians
-   Parameters: query (name/party), canton (2-letter code), party, active_only, language, limit
-
-2. **openparldata_get_parliamentarian** - Get detailed info about a specific parliamentarian
-   Parameters: person_id, include_votes, include_motions, language
-
-3. **openparldata_search_votes** - Search parliamentary votes
-   Parameters:
-   - query (title/description)
-   - date_from (YYYY-MM-DD format, e.g., "2024-01-01")
-   - date_to (YYYY-MM-DD format, e.g., "2024-12-31" - NEVER use "now", always use actual date)
-   - vote_type (must be "final", "detail", or "overall")
-   - language, limit
-
-4. **openparldata_get_vote_details** - Get detailed vote information
-   Parameters: vote_id, include_individual_votes, language
-
-5. **openparldata_search_motions** - Search motions and proposals
-   Parameters: query, status, date_from (YYYY-MM-DD), date_to (YYYY-MM-DD), submitter_id, language, limit
-
-6. **openparldata_search_debates** - Search debate transcripts
-   Parameters: query, date_from (YYYY-MM-DD), date_to (YYYY-MM-DD), speaker_id, language, limit
-
-CRITICAL RULES:
-- All dates MUST be in YYYY-MM-DD format (e.g., "2024-12-31")
-- NEVER use "now", "today", or relative dates - always use actual YYYY-MM-DD dates
-- For "latest" queries, use date_from with a recent date like "2024-01-01" and NO date_to parameter
-- vote_type must ONLY be "final", "detail", or "overall" - no other values
-- Your response MUST be valid JSON only
-- Do NOT include explanatory text or markdown formatting
-
-When a user asks a question about Swiss parliamentary data:
-1. Analyze what information they need
-2. Determine which tool(s) to use
-3. Extract the relevant parameters from their question
-4. Respond with ONLY a JSON object containing the tool call
-
-Your response should be in this exact format:
-{
-    "tool": "tool_name",
-    "arguments": {
-        "param1": "value1",
-        "param2": "value2"
-    },
-    "explanation": "Brief explanation of what you're searching for"
-}
-
-If the user's question is not about Swiss parliamentary data or you cannot determine the right tool, respond with:
-{
-    "response": "Your natural language response here"
-}
-
-Example:
-User: "Who are the parliamentarians from Zurich?"
-Assistant:
-{
-    "tool": "openparldata_search_parliamentarians",
-    "arguments": {
-        "canton": "ZH",
-        "language": "en",
-        "limit": 20
-    },
-    "explanation": "Searching for active parliamentarians from Canton Zurich"
-}
-"""
-
-# Example queries
-EXAMPLES = {
     "en": [
         "Who are the parliamentarians from Zurich?",
         "Show me recent votes about climate policy",
@@ -132,132 +644,50 @@ EXAMPLES = {
     ]
 }

-
-
-
-    try:
-        # Create messages for chat completion
-        messages = [
-            {"role": "system", "content": SYSTEM_PROMPT},
-            {"role": "user", "content": f"Language: {language}\nQuestion: {message}"}
-        ]
-
-        # Call Llama-3.1-8B via HuggingFace Inference Providers
-        response = client.chat_completion(
-            model="meta-llama/Llama-3.1-8B-Instruct",
-            messages=messages,
-            max_tokens=500,
-            temperature=0.3
-        )
-
-        # Extract response
-        assistant_message = response.choices[0].message.content
-
-        # Try to parse as JSON
-        try:
-            # Clean up response (sometimes models add markdown code blocks)
-            clean_response = assistant_message.strip()
-            if clean_response.startswith("```json"):
-                clean_response = clean_response[7:]
-            if clean_response.startswith("```"):
-                clean_response = clean_response[3:]
-            if clean_response.endswith("```"):
-                clean_response = clean_response[:-3]
-            clean_response = clean_response.strip()
-
-            # Find first { or [ (start of JSON) to handle explanatory text
-            json_start = min(
-                clean_response.find('{') if '{' in clean_response else len(clean_response),
-                clean_response.find('[') if '[' in clean_response else len(clean_response)
-            )
-            if json_start > 0:
-                clean_response = clean_response[json_start:]
-
-            return json.loads(clean_response)
-        except json.JSONDecodeError:
-            # If not valid JSON, treat as natural language response
-            return {"response": assistant_message}
-
-    except Exception as e:
-        return {"error": f"Error querying model: {str(e)}"}
-
-
-def query_model(message: str, language: str = "en") -> dict:
-    """Synchronous wrapper for async model query."""
-    return asyncio.run(query_model_async(message, language))
-
-
-async def execute_tool_async(tool_name: str, arguments: dict, show_debug: bool) -> tuple:
-    """Execute MCP tool asynchronously."""
-    return await execute_mcp_query("", tool_name, arguments, show_debug)
-
-
-def chat_response(message: str, history: list, language: str, show_debug: bool) -> str:
     """
-    Main chat response function.
-
-    Args:
-        message: User's message
-        history: Chat history
-        language: Selected language
-        show_debug: Whether to show debug information
-
-    Returns:
-        Response string
     """
     try:
-
-

-
-
-
-        # Check if it's a direct response (no tool call needed)
-        if "response" in model_response:
-            return model_response["response"]
-
-        # Check for error
-        if "error" in model_response:
-            return f"❌ {model_response['error']}"
-
-        # Execute tool call
-        if "tool" in model_response and "arguments" in model_response:
-            tool_name = model_response["tool"]
-            arguments = model_response["arguments"]
-            explanation = model_response.get("explanation", "")
-
-            # Ensure language is set in arguments
-            if "language" not in arguments:
-                arguments["language"] = lang_code
-
-            # Execute the tool
-            try:
-                response, debug_info = asyncio.run(
-                    execute_tool_async(tool_name, arguments, show_debug)
-                )
-
-                # Build final response
-                final_response = ""
-
-                if explanation:
-                    final_response += f"*{explanation}*\n\n"
-
-                if show_debug and debug_info:
-                    final_response += f"### 🔧 Debug Information\n{debug_info}\n\n---\n\n"
-
-                final_response += f"### 📊 Results\n{response}"
-
-                return final_response
-
-            except Exception as e:
-                return f"❌ Error executing tool '{tool_name}': {str(e)}"
-
-        # Fallback
-        return "I couldn't determine how to process your request. Please try rephrasing your question."

     except Exception as e:
-        return f"❌ An error occurred: {str(e)}"


 # Custom CSS
@@ -269,19 +699,34 @@ custom_css = """
     text-align: center;
     padding: 20px;
    background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
-    color: white;
     border-radius: 10px;
     margin-bottom: 20px;
 }
 """

 # Build Gradio interface
-with gr.Blocks(css=custom_css, title="CoJournalist Data") as demo:
     gr.Markdown(
         """
         <div class="chatbot-header">
-            <h1
-            <p>
         </div>
         """
     )
@@ -291,12 +736,39 @@ with gr.Blocks(css=custom_css, title="CoJournalist Data") as demo:
             chatbot = gr.Chatbot(
                 height=500,
                 label="Chat with CoJournalist",
-                show_label=False
             )

             with gr.Row():
                 msg = gr.Textbox(
-                    placeholder="
                     show_label=False,
                     scale=4
                 )
@@ -305,6 +777,16 @@ with gr.Blocks(css=custom_css, title="CoJournalist Data") as demo:
         with gr.Column(scale=1):
             gr.Markdown("### ⚙️ Settings")

             language = gr.Radio(
                 choices=list(LANGUAGES.keys()),
                 value="English",
@@ -312,70 +794,291 @@ with gr.Blocks(css=custom_css, title="CoJournalist Data") as demo:
                 info="Select response language"
             )

-
-
-
-
-

-

-
-
-        lang_code = LANGUAGES.get(lang, "en")
-        return gr.update(
-            choices=EXAMPLES.get(lang_code, EXAMPLES["en"])
-        )

-
-
-
-
     )

-
-
-
-
     )

-
-
-
-

 # Check usage limit
 user_id = request.client.host if request and hasattr(request, 'client') else "unknown"

 if not tracker.check_limit(user_id):
-
-
-
-

-    # Get
-

-
-

-

-
-
-

-
-
-

     gr.Markdown(
         """
         ---
-        **
-

-        **Rate Limit:** 50 requests per day per user to keep the service affordable and accessible.

         Powered by [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) via HF Inference Providers and [Model Context Protocol (MCP)](https://modelcontextprotocol.io/)
         """
 """
+CoJournalist Data - Swiss Parliamentary Data & Statistics Chatbot
+Powered by Llama-3.1-8B-Instruct with OpenParlData and BFS MCP
 """

 import os
 import json
+import tempfile
+from datetime import datetime
+from pathlib import Path
 import gradio as gr
 from huggingface_hub import InferenceClient
 from dotenv import load_dotenv
+from mcp_integration import execute_mcp_query, execute_mcp_query_bfs
 import asyncio
 from usage_tracker import UsageTracker

 # Load environment variables
 load_dotenv()

+# Load system prompts from files
+PROMPTS_DIR = Path(__file__).parent / "prompts"
+
+def load_prompt(dataset_name: str) -> str:
+    """Load system prompt from file."""
+    prompt_file = PROMPTS_DIR / f"{dataset_name}.txt"
+    if not prompt_file.exists():
+        raise FileNotFoundError(f"Prompt file not found: {prompt_file}")
+    return prompt_file.read_text(encoding='utf-8')
+
+# Load prompts at startup
+PARLIAMENT_PROMPT = load_prompt("parliament")
+BFS_PROMPT = load_prompt("bfs")
+
 # Initialize Hugging Face Inference Client
 HF_TOKEN = os.getenv("HF_TOKEN")
 if not HF_TOKEN:

 client = InferenceClient(token=HF_TOKEN)

+class DatasetEngine:
+    """Dataset-specific orchestrator for LLM prompting and tool execution."""
+
+    def __init__(
+        self,
+        name: str,
+        display_name: str,
+        system_prompt: str,
+        routing_instruction: str,
+        allowed_tools: set[str],
+    ):
+        self.name = name
+        self.display_name = display_name
+        self.system_prompt = system_prompt
+        self.routing_instruction = routing_instruction
+        self.allowed_tools = allowed_tools
+
+    def build_messages(self, user_message: str, language_label: str, language_code: str) -> list[dict]:
+        """Construct chat completion messages with dataset-specific guardrails."""
+        routing_guardrails = (
+            f"TARGET_DATA_SOURCE: {self.display_name}\n"
+            f"{self.routing_instruction}\n"
+            'If the request requires a different data source, respond with '
+            '{"response": "Explain that the other dataset should be selected in the app."}'
+        )
+        return [
+            {"role": "system", "content": self.system_prompt},
+            {"role": "system", "content": routing_guardrails},
+            {
+                "role": "user",
+                "content": (
+                    f"Selected dataset: {self.display_name}\n"
+                    f"Language preference: {language_label} ({language_code})\n"
+                    f"Question: {user_message}"
+                ),
+            },
+        ]
+
+    @staticmethod
+    def _parse_model_response(raw_response: str) -> dict:
+        """Parse JSON (with cleanup) returned by the LLM."""
+        clean_response = raw_response.strip()
+        if clean_response.startswith("```json"):
+            clean_response = clean_response[7:]
+        if clean_response.startswith("```"):
+            clean_response = clean_response[3:]
+        if clean_response.endswith("```"):
+            clean_response = clean_response[:-3]
+        clean_response = clean_response.strip()
+
+        json_start_candidates = []
+        for ch in ("{", "["):
+            idx = clean_response.find(ch)
+            if idx != -1:
+                json_start_candidates.append(idx)
+        if json_start_candidates:
+            clean_response = clean_response[min(json_start_candidates):]
+
+        return json.loads(clean_response)
+
+    def query_model(self, user_message: str, language_label: str, language_code: str) -> dict:
+        """Call the LLM with dataset-constrained instructions."""
+        try:
+            messages = self.build_messages(user_message, language_label, language_code)
+            response = client.chat_completion(
+                model="meta-llama/Llama-3.1-8B-Instruct",
+                messages=messages,
+                max_tokens=500,
+                temperature=0.3,
+            )
+            assistant_message = response.choices[0].message.content
+            return self._parse_model_response(assistant_message)
+        except json.JSONDecodeError:
+            # Surface malformed responses to the user so they can retry.
+            return {"response": assistant_message}
+        except Exception as exc:
+            return {"error": f"Error querying model: {str(exc)}"}
+
+    def execute_tool(
+        self,
+        user_message: str,
+        tool_name: str,
+        arguments: dict,
+        show_debug: bool,
+    ) -> tuple[str, str | None]:
+        """Run the MCP tool for the dataset."""
+        raise NotImplementedError("execute_tool must be implemented by subclasses.")
+
+    def sanitize_arguments(self, tool_name: str, arguments: dict) -> dict:
+        """
+        Sanitize and validate tool arguments before execution.
+
+        Args:
+            tool_name: Name of the tool being called
+            arguments: Raw arguments from LLM
+
+        Returns:
+            Sanitized arguments dict with proper types and valid values
+        """
+        raise NotImplementedError("sanitize_arguments must be implemented by subclasses.")
+
+    def _compose_response_text(
+        self,
+        explanation: str,
+        debug_info: str | None,
+        show_debug: bool,
+        body: str,
+    ) -> str:
+        parts = []
+        if explanation:
+            parts.append(f"*{explanation}*")
+        if show_debug and debug_info:
+            parts.append(f"### 🔧 Debug Information\n{debug_info}\n\n---")
+        parts.append(body)
+        return "\n\n".join(parts)
+
+    def postprocess_tool_response(
+        self,
+        *,
+        response: str,
+        tool_name: str,
+        explanation: str,
+        debug_info: str | None,
+        show_debug: bool,
+        language_code: str,
+    ) -> tuple[str, str | None, dict, list]:
+        """Default dataset response handler."""
+        body = f"### 📊 Results\n{response}"
+        final_response = self._compose_response_text(explanation, debug_info, show_debug, body)
+        return final_response, None, {}, []
+
+    def respond(
+        self,
+        user_message: str,
+        language_label: str,
+        language_code: str,
+        show_debug: bool,
+    ) -> tuple[str, str | None, dict, list]:
+        """Entry point used by the Gradio handler."""
+        model_response = self.query_model(user_message, language_label, language_code)
+
+        if "response" in model_response:
+            return model_response["response"], None, {}, []
+
+        if "error" in model_response:
+            return f"❌ {model_response['error']}", None, {}, []
+
+        tool_name = model_response.get("tool")
+        arguments = model_response.get("arguments")
+
+        if not tool_name or not isinstance(arguments, dict):
+            return (
+                "I couldn't determine how to process your request. Please try rephrasing your question.",
+                None,
+                {},
+                [],
+            )
+
+        if tool_name not in self.allowed_tools:
+            allowed_list = ", ".join(sorted(self.allowed_tools))
+            warning = (
+                f"❌ Tool '{tool_name}' is not available for {self.display_name}. "
+                f"Allowed tools: {allowed_list}. Please adjust your request."
+            )
+            return warning, None, {}, []
+
+        if "language" not in arguments:
+            arguments["language"] = language_code
+
+        # Sanitize arguments before execution
+        arguments = self.sanitize_arguments(tool_name, arguments)
+        print(f"✅ [DatasetEngine] Sanitized arguments: {arguments}")
+
+        explanation = model_response.get("explanation", "")
+        response, debug_info = self.execute_tool(user_message, tool_name, arguments, show_debug)
+
+        return self.postprocess_tool_response(
+            response=response,
+            tool_name=tool_name,
+            explanation=explanation,
+            debug_info=debug_info,
+            show_debug=show_debug,
+            language_code=language_code,
+        )
+
+
class ParliamentEngine(DatasetEngine):
|
| 229 |
+
# Valid parameter names per tool
|
| 230 |
+
TOOL_PARAMS = {
|
| 231 |
+
"openparldata_search_parliamentarians": {
|
| 232 |
+
"query", "canton", "party", "active_only", "level", "language",
|
| 233 |
+
"limit", "offset", "response_format"
|
| 234 |
+
},
|
| 235 |
+
"openparldata_search_votes": {
|
| 236 |
+
"query", "date_from", "date_to", "parliament_id", "vote_type",
|
| 237 |
+
"level", "language", "limit", "offset", "response_format"
|
| 238 |
+
},
|
| 239 |
+
"openparldata_search_motions": {
|
| 240 |
+
"query", "submitter_id", "status", "date_from", "date_to",
|
| 241 |
+
"level", "language", "limit", "offset", "response_format"
|
| 242 |
+
},
|
| 243 |
+
"openparldata_search_debates": {
|
| 244 |
+
"query", "date_from", "date_to", "speaker_id", "topic",
|
| 245 |
+
"parliament_id", "level", "language", "limit", "offset", "response_format"
|
| 246 |
+
},
|
| 247 |
+
}
|
| 248 |
+
|
| 249 |
+
def __init__(self):
|
| 250 |
+
super().__init__(
|
| 251 |
+
name="parliament",
|
| 252 |
+
display_name="Swiss Parliament Data (OpenParlData)",
|
| 253 |
+
system_prompt=PARLIAMENT_PROMPT,
|
| 254 |
+
routing_instruction="Use only tools that begin with 'openparldata_'. Never mention BFS tools.",
|
| 255 |
+
allowed_tools={
|
| 256 |
+
"openparldata_search_parliamentarians",
|
| 257 |
+
"openparldata_search_votes",
|
| 258 |
+
"openparldata_search_motions",
|
| 259 |
+
"openparldata_search_debates",
|
| 260 |
+
},
|
| 261 |
+
)
|
| 262 |
+
|
| 263 |
+
def sanitize_arguments(self, tool_name: str, arguments: dict) -> dict:
|
| 264 |
+
"""Sanitize arguments for OpenParlData tools."""
|
| 265 |
+
sanitized = {}
|
| 266 |
+
valid_params = self.TOOL_PARAMS.get(tool_name, set())
|
| 267 |
+
|
| 268 |
+
for key, value in arguments.items():
|
| 269 |
+
# Skip extra fields not in the tool schema
|
| 270 |
+
if key not in valid_params:
|
| 271 |
+
print(f"⚠️ [ParliamentEngine] Skipping invalid parameter '{key}' for {tool_name}")
|
| 272 |
+
continue
|
| 273 |
+
|
| 274 |
+
# Type conversions
|
| 275 |
+
if key == "limit":
|
| 276 |
+
# Convert to int and clamp to 1-100
|
| 277 |
+
try:
|
| 278 |
+
limit_val = int(value) if isinstance(value, str) else value
|
| 279 |
+
sanitized[key] = max(1, min(100, limit_val))
|
| 280 |
+
except (ValueError, TypeError):
|
| 281 |
+
sanitized[key] = 20 # Default
|
| 282 |
+
elif key == "offset":
|
| 283 |
+
# Convert to int and ensure >= 0
|
| 284 |
+
try:
|
| 285 |
+
offset_val = int(value) if isinstance(value, str) else value
|
| 286 |
+
sanitized[key] = max(0, offset_val)
|
| 287 |
+
except (ValueError, TypeError):
|
| 288 |
+
sanitized[key] = 0 # Default
|
| 289 |
+
elif key == "language":
|
| 290 |
+
# Validate language enum (case-insensitive)
|
| 291 |
+
lang_upper = str(value).upper()
|
| 292 |
+
if lang_upper in ["DE", "FR", "IT", "EN"]:
|
| 293 |
+
sanitized[key] = lang_upper.lower()
|
| 294 |
+
else:
|
| 295 |
+
sanitized[key] = "en" # Default to English
|
| 296 |
+
elif key == "active_only":
|
| 297 |
+
# Convert to bool
|
| 298 |
+
sanitized[key] = bool(value)
|
| 299 |
+
else:
|
| 300 |
+
# Keep other values as-is
|
| 301 |
+
sanitized[key] = value
|
| 302 |
+
|
| 303 |
+
return sanitized
|
| 304 |
+
|
| 305 |
+
def execute_tool(
|
| 306 |
+
self,
|
| 307 |
+
user_message: str,
|
| 308 |
+
tool_name: str,
|
| 309 |
+
arguments: dict,
|
| 310 |
+
show_debug: bool,
|
| 311 |
+
) -> tuple[str, str | None]:
|
| 312 |
+
# DEBUG: Capture arguments before MCP call
|
| 313 |
+
print(f"\n🔍 [ParliamentEngine] execute_tool called:")
|
| 314 |
+
print(f" Tool: {tool_name}")
|
| 315 |
+
print(f" Arguments: {arguments}")
|
| 316 |
+
print(f" Argument types: {dict((k, type(v).__name__) for k, v in arguments.items())}")
|
| 317 |
+
return asyncio.run(execute_mcp_query(user_message, tool_name, arguments, show_debug))
|
| 318 |
+
|
    def postprocess_tool_response(
        self,
        *,
        response: str,
        tool_name: str,
        explanation: str,
        debug_info: str | None,
        show_debug: bool,
        language_code: str,
    ) -> tuple[str, str | None, dict, list]:
        """Parse OpenParlData JSON responses and create card data."""
        parliament_cards = []
        language_fallback = False

        # Try to parse JSON response
        try:
            data = json.loads(response)

            # Check if it's an OpenParlData response with a data array
            if isinstance(data, dict) and "data" in data and isinstance(data["data"], list):
                # Extract card info from each item
                for item in data["data"]:
                    if isinstance(item, dict):
                        # Get title in the user's preferred language, with fallback
                        title = "Untitled"
                        title_dict = item.get("affair_title") if "affair_title" in item else item.get("title")

                        if isinstance(title_dict, dict):
                            # Try the user's language first
                            if language_code == "en":
                                # English not available in the API; fall back to German
                                title = title_dict.get("de") or title_dict.get("fr") or title_dict.get("it") or "Untitled"
                                if title != "Untitled":
                                    language_fallback = True
                            else:
                                # Try the user's language, then fall back to de → fr → it
                                title = (title_dict.get(language_code) or
                                         title_dict.get("de") or
                                         title_dict.get("fr") or
                                         title_dict.get("it") or
                                         "Untitled")

                        # Get URL in the user's preferred language
                        url = "#"
                        if "url_external" in item and isinstance(item["url_external"], dict):
                            if language_code == "en":
                                url = item["url_external"].get("de") or item["url_external"].get("fr") or item["url_external"].get("it") or "#"
                            else:
                                url = (item["url_external"].get(language_code) or
                                       item["url_external"].get("de") or
                                       item["url_external"].get("fr") or
                                       item["url_external"].get("it") or
                                       "#")

                        # Add date if available
                        date_str = ""
                        if "date" in item:
                            date_str = item["date"][:10]  # Extract YYYY-MM-DD

                        parliament_cards.append({
                            "title": title,
                            "url": url,
                            "date": date_str
                        })

                # If we have cards, show a summary message
                if parliament_cards:
                    count = len(parliament_cards)
                    total = data.get("meta", {}).get("total_records", count)
                    body = f"### 🏛️ Parliament Results\n\nFound **{total}** result(s). Showing {count} items below:"

                    # Add language fallback notice for English users
                    if language_fallback and language_code == "en":
                        body += "\n\n*Note: English content is not available from the API. Results are displayed in German.*"
                else:
                    body = "### 🏛️ Parliament Results\n\nNo results found for your query."
            else:
                # Not a data response; show as-is
                body = f"### 📊 Results\n{response}"

        except json.JSONDecodeError:
            # Not JSON; treat as a text response
            body = f"### 📊 Results\n{response}"

        final_response = self._compose_response_text(explanation, debug_info, show_debug, body)
        return final_response, None, {}, parliament_cards

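The sanitization rules above can be exercised in isolation. A minimal sketch, with the clamping and language-enum logic pulled out into standalone helpers (the helper names are mine, not part of the app):

```python
def sanitize_limit(value, default=20):
    """Coerce a 'limit' argument to an int clamped to 1-100, as sanitize_arguments does."""
    try:
        limit_val = int(value) if isinstance(value, str) else value
        return max(1, min(100, limit_val))
    except (ValueError, TypeError):
        return default


def sanitize_language(value):
    """Normalize a language argument to one of de/fr/it/en, defaulting to 'en'."""
    lang_upper = str(value).upper()
    return lang_upper.lower() if lang_upper in ["DE", "FR", "IT", "EN"] else "en"


print(sanitize_limit("250"))    # string coerced, then clamped to 100
print(sanitize_language("De"))  # case-insensitive match -> 'de'
print(sanitize_language("xx"))  # unknown code falls back to 'en'
```

This mirrors why the layer exists: the LLM often emits limits as strings and language codes in mixed case, which Pydantic models with strict types would otherwise reject.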
class BFSEngine(DatasetEngine):
    # Valid parameter names per tool
    TOOL_PARAMS = {
        "bfs_search": {
            "keywords", "language"  # NO format parameter!
        },
        "bfs_query_data": {
            "datacube_id", "filters", "format", "language"
        },
    }

    def __init__(self):
        super().__init__(
            name="statistics",
            display_name="Swiss Statistics (BFS)",
            system_prompt=BFS_PROMPT,
            routing_instruction="Use only tools that begin with 'bfs_'. Never mention OpenParlData tools.",
            allowed_tools={
                "bfs_search",
                "bfs_query_data",
            },
        )

    def sanitize_arguments(self, tool_name: str, arguments: dict) -> dict:
        """Sanitize arguments for BFS tools."""
        sanitized = {}
        valid_params = self.TOOL_PARAMS.get(tool_name, set())

        for key, value in arguments.items():
            # Skip extra fields not in the tool schema
            if key not in valid_params:
                print(f"⚠️ [BFSEngine] Skipping invalid parameter '{key}' for {tool_name}")
                continue

            # Type conversions
            if key == "language":
                # Validate language enum (case-insensitive)
                lang_upper = str(value).upper()
                if lang_upper in ["DE", "FR", "IT", "EN"]:
                    sanitized[key] = lang_upper.lower()
                else:
                    sanitized[key] = "en"  # Default to English
            elif key == "format":
                # Validate and normalize format enum (only for bfs_query_data)
                if tool_name == "bfs_query_data":
                    format_upper = str(value).upper().replace("-", "_")
                    # Map common values to the DataFormat enum
                    format_map = {
                        "CSV": "csv",
                        "JSON": "json",
                        "JSON_STAT": "json-stat",
                        "JSON_STAT2": "json-stat2",
                        "PX": "px",
                    }
                    sanitized[key] = format_map.get(format_upper, "csv")  # Default to CSV
            else:
                # Keep other values as-is
                sanitized[key] = value

        # Add default format for bfs_query_data if not present
        if tool_name == "bfs_query_data" and "format" not in sanitized:
            sanitized["format"] = "csv"

        return sanitized

    def execute_tool(
        self,
        user_message: str,
        tool_name: str,
        arguments: dict,
        show_debug: bool,
    ) -> tuple[str, str | None]:
        # DEBUG: Capture arguments after sanitization
        print(f"\n🔍 [BFSEngine] execute_tool called:")
        print(f"  Tool: {tool_name}")
        print(f"  Arguments (sanitized): {arguments}")
        print(f"  Argument types: {dict((k, type(v).__name__) for k, v in arguments.items())}")
        return asyncio.run(execute_mcp_query_bfs(user_message, tool_name, arguments, show_debug))

    @staticmethod
    def _parse_datacube_choices(response: str) -> tuple[dict, list]:
        datacube_map: dict[str, str] = {}
        datacube_choices: list[str] = []
        import re

        lines = response.split('\n')
        i = 0
        while i < len(lines):
            line = lines[i]
            match = re.search(r'^\s*\d+\.\s+\*\*([^*]+)\*\*\s*$', line)
            if match:
                datacube_id = match.group(1).strip()
                description = datacube_id
                if i + 1 < len(lines):
                    next_line = lines[i + 1].strip()
                    if not next_line.startswith('↳') and next_line:
                        description = next_line
                    elif i + 2 < len(lines):
                        description = lines[i + 2].strip() or datacube_id
                if len(description) > 80:
                    description = description[:77] + "..."
                label = f"{description} ({datacube_id})"
                datacube_choices.append(label)
                datacube_map[label] = datacube_id
            i += 1
        return datacube_map, datacube_choices

    @staticmethod
    def _detect_csv(response: str) -> bool:
        lines = response.strip().split('\n')
        if len(lines) < 2:
            return False
        if ',' not in lines[0] or ',' not in lines[1]:
            return False
        prefix = response.lower()[:200]
        error_tokens = ["error", "no data", "no datacubes found", "try broader"]
        return not any(token in prefix for token in error_tokens)

    def postprocess_tool_response(
        self,
        *,
        response: str,
        tool_name: str,
        explanation: str,
        debug_info: str | None,
        show_debug: bool,
        language_code: str,
    ) -> tuple[str, str | None, dict, list]:
        csv_file_path = None
        datacube_map: dict[str, str] = {}
        datacube_choices: list[str] = []
        body = ""

        if tool_name == "bfs_query_data" and self._detect_csv(response):
            rows = response.count('\n')
            timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
            csv_filename = f"bfs_data_{timestamp}.csv"
            csv_file_path = os.path.join(tempfile.gettempdir(), csv_filename)
            with open(csv_file_path, 'w', encoding='utf-8') as f:
                f.write(response)
            body = (
                "### 📊 Data Ready\n"
                f"✅ CSV file generated with {rows} rows\n\n"
                "💾 **Download your data using the button below**"
            )
        else:
            if tool_name == "bfs_search" and "matching datacube" in response.lower():
                datacube_map, datacube_choices = self._parse_datacube_choices(response)

                # If we found datacubes, show a simple message instead of the full response
                if datacube_choices:
                    # Extract the search term from the explanation
                    import re
                    match = re.search(r'related to (.+)', explanation, re.IGNORECASE)
                    search_term = match.group(1).strip() if match else "your search"
                    body = f"### 📊 Available Datasets\n\nHere is the data available for **{search_term}**. Please select a dataset below to download:"
                else:
                    # No datacubes found; show the full error message
                    body = f"### 📊 Results\n{response}"
            else:
                body = f"### 📊 Results\n{response}"

        final_response = self._compose_response_text(explanation, debug_info, show_debug, body)
        return final_response, csv_file_path, datacube_map, datacube_choices

    def fetch_datacube_data(
        self,
        datacube_id: str,
        language_code: str,
        show_debug: bool,
    ) -> tuple[str, str | None]:
        response, debug_info = self.execute_tool(
            user_message=f"Get data for datacube {datacube_id}",
            tool_name="bfs_query_data",
            arguments={"datacube_id": datacube_id, "language": language_code},
            show_debug=show_debug,
        )
        if self._detect_csv(response):
            rows = response.count('\n')
            timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
            csv_filename = f"bfs_data_{timestamp}.csv"
            csv_file_path = os.path.join(tempfile.gettempdir(), csv_filename)
            with open(csv_file_path, 'w', encoding='utf-8') as f:
                f.write(response)
            message = (
                "### 📊 Data Ready\n"
                f"✅ CSV file generated with {rows} rows for datacube: `{datacube_id}`\n\n"
                "💾 **Download your data using the button below**"
            )
            if show_debug and debug_info:
                message = f"### 🔧 Debug Information\n{debug_info}\n\n---\n\n{message}"
            return message, csv_file_path
        error_message = f"❌ Error retrieving data:\n\n{response}"
        return error_message, None


DATASET_ENGINES: dict[str, DatasetEngine] = {
    "parliament": ParliamentEngine(),
    "statistics": BFSEngine(),
}

# Initialize usage tracker with 50 requests per day limit
tracker = UsageTracker(daily_limit=50)

    "Italiano": "it"
}

# Example queries for OpenParlData
OPENPARLDATA_EXAMPLES = {
    "en": [
        "Who are the parliamentarians from Zurich?",
        "Show me recent votes about climate policy",
        …
    ]
}

# Example queries for BFS (two-step workflow)
BFS_EXAMPLES = {
    "en": [
        "I want inflation data",
        "Show me population statistics",
        "I need employment data by canton",
        "Find energy consumption statistics"
    ],
    "de": [
        "Ich möchte Inflationsdaten",
        "Zeige mir Bevölkerungsstatistiken",
        "Ich brauche Beschäftigungsdaten nach Kanton",
        "Finde Energieverbrauchsstatistiken"
    ],
    "fr": [
        "Je veux des données sur l'inflation",
        "Montrez-moi les statistiques de population",
        "J'ai besoin de données sur l'emploi par canton",
        "Trouvez les statistiques de consommation d'énergie"
    ],
    "it": [
        "Voglio dati sull'inflazione",
        "Mostrami le statistiche sulla popolazione",
        "Ho bisogno di dati sull'occupazione per cantone",
        "Trova le statistiche sul consumo energetico"
    ]
}

# Keep backward compatibility
EXAMPLES = OPENPARLDATA_EXAMPLES

def chat_response(message: str, history: list, language: str, show_debug: bool, dataset: str = "parliament") -> tuple[str, str | None, dict, list]:
    """
    Main chat response function, routed through dataset-specific engines.
    """
    try:
        engine = DATASET_ENGINES.get(dataset)
        if not engine:
            return f"❌ Unknown dataset selected: {dataset}", None, {}, []

        language_code = LANGUAGES.get(language, "en")
        return engine.respond(message, language, language_code, show_debug)

    except Exception as e:
        return f"❌ An error occurred: {str(e)}", None, {}, []


# Custom CSS
    text-align: center;
    padding: 20px;
    background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
    color: white !important;
    border-radius: 10px;
    margin-bottom: 20px;
}
.chatbot-header h1 {
    color: white !important;
    margin: 0;
}
.chatbot-header p {
    color: white !important;
    margin: 10px 0 0 0;
}
"""


# Build Gradio interface
with gr.Blocks(css=custom_css, title="CoJournalist Swiss Data") as demo:
    # State to track datacube search results
    datacube_state = gr.State({})  # Maps display text → datacube_id

    # State to track parliament cards
    parliament_cards_state = gr.State([])  # List of card dicts
    parliament_page_state = gr.State(1)  # Current page number

    gr.Markdown(
        """
        <div class="chatbot-header">
            <h1>🇨🇭 CoJournalist Swiss Data</h1>
            <p>Query Swiss parliamentary and statistical data in natural language</p>
        </div>
        """
    )

            chatbot = gr.Chatbot(
                height=500,
                label="Chat with CoJournalist",
                show_label=False,
                type="messages"
            )

            # CSV download file component
            download_file = gr.File(
                label="📥 Download Data",
                visible=False,
                interactive=False
            )

            # Datacube selection (hidden by default, shown when a search returns results)
            with gr.Row(visible=False) as datacube_selection_row:
                with gr.Column(scale=4):
                    datacube_radio = gr.Radio(
                        label="📋 Select Datacube for Download",
                        choices=[],
                        visible=True
                    )
                with gr.Column(scale=1):
                    get_data_btn = gr.Button("📥 Get Data", variant="primary", size="lg")

            # Parliament cards display (hidden by default, shown when parliament results return)
            with gr.Column(visible=False) as parliament_cards_row:
                parliament_cards_html = gr.HTML("")
                with gr.Row():
                    prev_page_btn = gr.Button("◀ Previous", size="sm")
                    page_info = gr.Markdown("Page 1")
                    next_page_btn = gr.Button("Next ▶", size="sm")

            with gr.Row():
                msg = gr.Textbox(
                    placeholder="(Choose a source on the right first)",
                    show_label=False,
                    scale=4
                )

        with gr.Column(scale=1):
            gr.Markdown("### ⚙️ Settings")

            dataset = gr.Radio(
                choices=[
                    ("Swiss Parliament Data", "openparldata"),
                    ("Swiss Statistics (BFS)", "bfs")
                ],
                value="openparldata",
                label="Data Source",
                info="Choose which API to query"
            )

            language = gr.Radio(
                choices=list(LANGUAGES.keys()),
                value="English",
                …
                info="Select response language"
            )

    def ensure_message_history(history):
        """Normalize chat history to the format expected by gr.Chatbot(type='messages')."""
        normalized: list[dict] = []
        if not history:
            return normalized

        for entry in history:
            if isinstance(entry, dict):
                role = entry.get("role")
                content = entry.get("content", "")
                if role:
                    normalized.append({"role": role, "content": "" if content is None else str(content)})
            elif isinstance(entry, (tuple, list)) and len(entry) == 2:
                user, assistant = entry
                if user is not None:
                    normalized.append({"role": "user", "content": str(user)})
                if assistant is not None:
                    normalized.append({"role": "assistant", "content": str(assistant)})
        return normalized

    def append_message(history: list[dict], role: str, content: str | None):
        """Append a message to the normalized history."""
        history.append({"role": role, "content": "" if content is None else str(content)})

    def render_parliament_cards(cards: list[dict], page: int, items_per_page: int = 10) -> tuple[str, str, int, bool]:
        """Render parliament cards as HTML with pagination."""
        if not cards:
            return "", "No results", 1, False

        total_pages = (len(cards) + items_per_page - 1) // items_per_page
        page = max(1, min(page, total_pages))  # Clamp page to the valid range
        show_pagination = len(cards) > items_per_page

        start_idx = (page - 1) * items_per_page
        end_idx = min(start_idx + items_per_page, len(cards))
        page_cards = cards[start_idx:end_idx]

        # Generate HTML for the cards
        cards_html = '<div style="display: flex; flex-direction: column; gap: 15px;">'
        for card in page_cards:
            title = card.get("title", "Untitled")
            url = card.get("url", "#")
            date = card.get("date", "")

            # Truncate title if too long
            if len(title) > 120:
                title = title[:117] + "..."

            date_badge = f'<span style="background: #e0e0e0; padding: 4px 8px; border-radius: 4px; font-size: 12px; color: #666;">{date}</span>' if date else ''

            cards_html += f'''
            <a href="{url}" target="_blank" style="text-decoration: none;">
                <div style="
                    border: 1px solid #ddd;
                    border-radius: 8px;
                    padding: 16px;
                    background: white;
                    transition: all 0.2s;
                    cursor: pointer;
                ">
                    <div style="display: flex; justify-content: space-between; align-items: start; gap: 12px;">
                        <h3 style="margin: 0; color: #333; font-size: 16px; flex: 1;">{title}</h3>
                        {date_badge}
                    </div>
                </div>
            </a>
            '''
        cards_html += '</div>'

        page_info = f"Page {page} of {total_pages} ({len(cards)} total results)"

        return cards_html, page_info, page, show_pagination

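The pagination arithmetic in render_parliament_cards (ceiling division plus clamping to the valid page range) can be sketched on its own; the helper name here is illustrative, not part of the app:

```python
def page_bounds(n_items: int, page: int, per_page: int = 10) -> tuple[int, int, int]:
    """Return (clamped_page, start_idx, end_idx) for 1-based pagination."""
    total_pages = max(1, (n_items + per_page - 1) // per_page)  # ceiling division
    page = max(1, min(page, total_pages))  # clamp to the valid range
    start = (page - 1) * per_page
    end = min(start + per_page, n_items)
    return page, start, end


print(page_bounds(23, 3))   # (3, 20, 23) — last, partial page
print(page_bounds(23, 99))  # (3, 20, 23) — out-of-range page clamped
```

Clamping rather than raising keeps the Previous/Next buttons safe to click at either end of the result list.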
| 870 |
+
# Handle message submission
|
| 871 |
+
def respond(message, chat_history, language, dataset_choice, current_datacube_state, current_parliament_cards, current_page, request: gr.Request):
|
| 872 |
+
show_debug = False # Debug mode disabled in UI
|
| 873 |
+
chat_messages = ensure_message_history(chat_history)
|
| 874 |
|
| 875 |
+
if not message.strip():
|
| 876 |
+
return "", chat_messages, None, gr.update(visible=False), current_datacube_state, gr.update(), gr.update(visible=False), current_parliament_cards, current_page, "", "", gr.update(visible=False), gr.update(), gr.update()
|
|
|
|
|
|
|
|
|
|
|
|
|
| 877 |
|
| 878 |
+
# Check usage limit
|
| 879 |
+
user_id = request.client.host if request and hasattr(request, 'client') else "unknown"
|
| 880 |
+
|
| 881 |
+
append_message(chat_messages, "user", message)
|
| 882 |
+
|
| 883 |
+
if not tracker.check_limit(user_id):
|
| 884 |
+
bot_message = (
|
| 885 |
+
"⚠️ Daily request limit reached. You have used all 50 requests for today. "
|
| 886 |
+
"Please try again tomorrow.\n\nThis limit helps us keep the service free and available for everyone."
|
| 887 |
)
|
| 888 |
+
append_message(chat_messages, "assistant", bot_message)
|
| 889 |
+
return "", chat_messages, None, gr.update(visible=False), current_datacube_state, gr.update(), gr.update(visible=False), current_parliament_cards, current_page, "", "", gr.update(visible=False), gr.update(), gr.update()
|
| 890 |
+
|
| 891 |
+
# Map dataset choice to engine type
|
| 892 |
+
dataset_map = {
|
| 893 |
+
"openparldata": "parliament",
|
| 894 |
+
"bfs": "statistics"
|
| 895 |
+
}
|
| 896 |
+
dataset_type = dataset_map.get(dataset_choice, "parliament")
|
| 897 |
+
|
| 898 |
+
# Get bot response (returns tuple with optional CSV file and results data)
|
| 899 |
+
bot_message, csv_file, datacube_map, results_data = chat_response(
|
| 900 |
+
message, chat_messages, language, show_debug, dataset_type
|
| 901 |
+
)
|
| 902 |
|
| 903 |
+
append_message(chat_messages, "assistant", bot_message)
|
| 904 |
+
|
| 905 |
+
# Handle parliament cards (for Parliament dataset)
|
| 906 |
+
if dataset_type == "parliament" and results_data:
|
| 907 |
+
cards_html, page_info, page_num, show_pagination = render_parliament_cards(results_data, 1)
|
| 908 |
+
return (
|
| 909 |
+
"",
|
| 910 |
+
chat_messages,
|
| 911 |
+
None,
|
| 912 |
+
gr.update(visible=False),
|
| 913 |
+
current_datacube_state,
|
| 914 |
+
gr.update(),
|
| 915 |
+
gr.update(visible=False),
|
| 916 |
+
results_data, # parliament_cards_state
|
| 917 |
+
page_num, # parliament_page_state
|
| 918 |
+
cards_html, # parliament_cards_html
|
| 919 |
+
page_info, # page_info
|
| 920 |
+
gr.update(visible=True), # parliament_cards_row
|
| 921 |
+
gr.update(visible=show_pagination), # prev_page_btn
|
| 922 |
+
gr.update(visible=show_pagination) # next_page_btn
|
| 923 |
)
|
| 924 |
|
| 925 |
+
# Handle datacube search results (for BFS dataset)
|
| 926 |
+
if dataset_type == "statistics" and results_data:
|
| 927 |
+
return (
|
| 928 |
+
"",
|
| 929 |
+
chat_messages,
|
| 930 |
+
None,
|
| 931 |
+
gr.update(visible=False),
|
| 932 |
+
datacube_map,
|
| 933 |
+
gr.update(choices=results_data, value=None),
|
| 934 |
+
gr.update(visible=True),
|
| 935 |
+
current_parliament_cards,
|
| 936 |
+
current_page,
|
| 937 |
+
"",
|
| 938 |
+
"",
|
| 939 |
+
gr.update(visible=False),
|
| 940 |
+
gr.update(),
|
| 941 |
+
gr.update()
|
| 942 |
+
)
|
| 943 |
+
|
| 944 |
+
# Handle CSV download
|
| 945 |
+
if csv_file:
|
| 946 |
+
return (
|
| 947 |
+
"",
|
| 948 |
+
chat_messages,
|
| 949 |
+
csv_file,
|
| 950 |
+
gr.update(visible=True),
|
| 951 |
+
current_datacube_state,
|
| 952 |
+
gr.update(),
|
| 953 |
+
gr.update(visible=False),
|
| 954 |
+
current_parliament_cards,
|
| 955 |
+
current_page,
|
| 956 |
+
"",
|
| 957 |
+
"",
|
| 958 |
+
gr.update(visible=False),
|
| 959 |
+
gr.update(),
|
| 960 |
+
gr.update()
|
| 961 |
+
)
|
| 962 |
+
|
| 963 |
+
return (
|
| 964 |
+
"",
|
| 965 |
+
chat_messages,
|
| 966 |
+
None,
|
| 967 |
+
gr.update(visible=False),
|
| 968 |
+
current_datacube_state,
|
| 969 |
+
gr.update(),
|
| 970 |
+
gr.update(visible=False),
|
| 971 |
+
current_parliament_cards,
|
| 972 |
+
current_page,
|
| 973 |
+
"",
|
| 974 |
+
"",
|
| 975 |
+
gr.update(visible=False),
|
| 976 |
+
gr.update(),
|
| 977 |
+
gr.update()
|
| 978 |
+
)
|
| 979 |
+
|
| 980 |
+
# Handle parliament pagination
|
| 981 |
+
def prev_page(cards, current_page):
|
| 982 |
+
"""Go to previous page of parliament results."""
|
| 983 |
+
new_page = max(1, current_page - 1)
|
| 984 |
+
cards_html, page_info, page_num, show_pagination = render_parliament_cards(cards, new_page)
|
| 985 |
+
return cards_html, page_info, page_num
|
| 986 |
+
|
| 987 |
+
def next_page(cards, current_page):
|
| 988 |
+
"""Go to next page of parliament results."""
|
| 989 |
+
if not cards:
|
| 990 |
+
return "", "No results", current_page
|
| 991 |
+
total_pages = (len(cards) + 9) // 10 # 10 items per page
|
| 992 |
+
new_page = min(total_pages, current_page + 1)
|
| 993 |
+
cards_html, page_info, page_num, show_pagination = render_parliament_cards(cards, new_page)
|
| 994 |
+
return cards_html, page_info, page_num
|
| 995 |
+
|
| 996 |
+
# Handle "Get Data" button click for datacube selection
|
| 997 |
+
def fetch_datacube_data(selected_choice, current_datacube_state, chat_history, language, request: gr.Request):
|
| 998 |
+
show_debug = False # Debug mode disabled in UI
|
| 999 |
+
chat_messages = ensure_message_history(chat_history)
|
| 1000 |
+
user_message = f"Get Data: {selected_choice}" if selected_choice else "Get Data"
|
| 1001 |
+
append_message(chat_messages, "user", user_message)
|
| 1002 |
+
|
| 1003 |
+
if not selected_choice or not current_datacube_state:
|
| 1004 |
+
error_msg = "⚠️ Please select a datacube first."
|
| 1005 |
+
append_message(chat_messages, "assistant", error_msg)
|
| 1006 |
+
return chat_messages, None, gr.update(visible=False), gr.update(visible=False)
|
| 1007 |
|
| 1008 |
# Check usage limit
|
| 1009 |
user_id = request.client.host if request and hasattr(request, 'client') else "unknown"
|
| 1010 |
|
| 1011 |
if not tracker.check_limit(user_id):
|
| 1012 |
+
bot_message = (
|
| 1013 |
+
"⚠️ Daily request limit reached. You have used all 50 requests for today. "
|
| 1014 |
+
"Please try again tomorrow.\n\nThis limit helps us keep the service free and available for everyone."
|
| 1015 |
+
)
|
| 1016 |
+
append_message(chat_messages, "assistant", bot_message)
|
| 1017 |
+
return chat_messages, None, gr.update(visible=False), gr.update(visible=False)
|
| 1018 |
+
|
| 1019 |
+
# Get datacube ID from mapping
|
| 1020 |
+
datacube_id = current_datacube_state.get(selected_choice)
|
| 1021 |
+
|
| 1022 |
+
if not datacube_id:
|
| 1023 |
+
error_msg = "❌ Error: Could not find datacube ID for selected option."
|
| 1024 |
+
append_message(chat_messages, "assistant", error_msg)
|
| 1025 |
+
return chat_messages, None, gr.update(visible=False), gr.update(visible=False)
|
| 1026 |
|
| 1027 |
+
# Get language code
|
| 1028 |
+
lang_code = LANGUAGES.get(language, "en")
|
| 1029 |
|
| 1030 |
+
bfs_engine = DATASET_ENGINES.get("statistics")
|
| 1031 |
+
if not isinstance(bfs_engine, BFSEngine):
|
| 1032 |
+
error_msg = "❌ Error: BFS engine unavailable."
|
| 1033 |
+
append_message(chat_messages, "assistant", error_msg)
|
| 1034 |
+
return chat_messages, None, gr.update(visible=False), gr.update(visible=False)
|
| 1035 |
|
| 1036 |
+
bot_message, csv_file_path = bfs_engine.fetch_datacube_data(datacube_id, lang_code, show_debug)
|
| 1037 |
|
| 1038 |
+
append_message(chat_messages, "assistant", bot_message)
|
| 1039 |
+
if csv_file_path:
|
| 1040 |
+
return chat_messages, csv_file_path, gr.update(visible=True), gr.update(visible=False)
|
| 1041 |
|
| 1042 |
+
return chat_messages, None, gr.update(visible=False), gr.update(visible=False)
|
| 1043 |
+
|
| 1044 |
+
msg.submit(
|
| 1045 |
+
respond,
|
| 1046 |
+
[msg, chatbot, language, dataset, datacube_state, parliament_cards_state, parliament_page_state],
|
| 1047 |
+
+            [msg, chatbot, download_file, download_file, datacube_state, datacube_radio, datacube_selection_row,
+             parliament_cards_state, parliament_page_state, parliament_cards_html, page_info, parliament_cards_row,
+             prev_page_btn, next_page_btn]
+        )
+        submit.click(
+            respond,
+            [msg, chatbot, language, dataset, datacube_state, parliament_cards_state, parliament_page_state],
+            [msg, chatbot, download_file, download_file, datacube_state, datacube_radio, datacube_selection_row,
+             parliament_cards_state, parliament_page_state, parliament_cards_html, page_info, parliament_cards_row,
+             prev_page_btn, next_page_btn]
+        )
+        get_data_btn.click(
+            fetch_datacube_data,
+            [datacube_radio, datacube_state, chatbot, language],
+            [chatbot, download_file, download_file, datacube_selection_row]
+        )
+        prev_page_btn.click(
+            prev_page,
+            [parliament_cards_state, parliament_page_state],
+            [parliament_cards_html, page_info, parliament_page_state]
+        )
+        next_page_btn.click(
+            next_page,
+            [parliament_cards_state, parliament_page_state],
+            [parliament_cards_html, page_info, parliament_page_state]
+        )
 
     gr.Markdown(
         """
         ---
+        **Data Sources:**
+        - **Swiss Parliament Data:** OpenParlData MCP server for parliamentary information
+        - **Swiss Statistics (BFS):** Federal Statistical Office data via PxWeb API
 
+        **Rate Limit:** 50 requests per day per user (shared across both datasets) to keep the service affordable and accessible.
 
        Powered by [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) via HF Inference Providers and [Model Context Protocol (MCP)](https://modelcontextprotocol.io/)
        """
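The commit message describes an argument-sanitization layer that coerces LLM-produced tool arguments (strings for `limit`, mixed-case language codes) into the types the Pydantic input models accept. A minimal sketch of that idea, with hypothetical names not taken from `app.py`:

```python
# Hypothetical sketch of the argument-sanitization layer; the real helper
# names in app.py may differ.
VALID_LANGUAGES = {"de", "fr", "it", "en"}

def sanitize_arguments(tool_name: str, args: dict) -> dict:
    """Coerce LLM-produced argument types into what the Pydantic models expect."""
    clean = dict(args)
    # LLMs often emit numbers as strings; strict int fields would reject them.
    if "limit" in clean and isinstance(clean["limit"], str) and clean["limit"].isdigit():
        clean["limit"] = int(clean["limit"])
    # Normalize language values to the enum's lowercase codes, defaulting to English.
    if "language" in clean:
        lang = str(clean["language"]).lower()
        clean["language"] = lang if lang in VALID_LANGUAGES else "en"
    return clean

print(sanitize_arguments("bfs_search", {"limit": "10", "language": "DE"}))
# -> {'limit': 10, 'language': 'de'}
```

Running the coercion before Pydantic validation means a model that writes `"limit": "10"` no longer triggers an MCP validation error.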
@@ -599,5 +599,5 @@ async def search_debates(params: SearchDebatesInput) -> str:
 
 # Main execution
 if __name__ == "__main__":
-
-
+    # Run FastMCP server (synchronous, blocking call)
+    mcp.run()
@@ -0,0 +1,120 @@
+# Swiss BFS API MCP Server
+
+## Overview
+This MCP server provides access to ALL data from the Swiss Federal Statistical Office (BFS), not just population data. The BFS maintains comprehensive statistics on:
+
+- Population and Demographics
+- Territory and Environment
+- Work and Income
+- National Economy
+- Prices and Inflation
+- Industry and Services
+- Agriculture and Forestry
+- Energy
+- Construction and Housing
+- Tourism
+- Mobility and Transport
+- Social Security
+- Health
+- Education and Science
+- Crime and Criminal Justice
+
+## Installation
+
+```bash
+pip install -r requirements.txt
+```
+
+## Usage
+
+Run the MCP server:
+```bash
+python bfs_mcp_server.py
+```
+
+The server communicates via stdio and can be integrated with any MCP-compatible client.
+
+## Available Tools
+
+### 1. `bfs_list_datacubes`
+Browse available datacubes in the API hierarchy.
+- `path`: Category path (e.g., "px-x-01" for population, "" for root)
+- `language`: de/fr/it/en
+
+### 2. `bfs_get_metadata`
+Get detailed metadata about a specific datacube, including dimensions and available values.
+- `datacube_id`: The datacube identifier (e.g., "px-x-0102030000_101")
+- `language`: de/fr/it/en
+
+### 3. `bfs_query_data`
+Query any BFS datacube with custom filters.
+- `datacube_id`: The datacube identifier
+- `filters`: Array of filter objects with `code`, `filter` type, and `values`
+- `format`: Output format (csv/json/json-stat/json-stat2/px)
+- `language`: de/fr/it/en
+
+### 4. `bfs_search`
+Search for datacubes by topic keywords.
+- `keywords`: Search terms (e.g., "inflation", "education", "health")
+- `language`: de/fr/it/en
+
+### 5. `bfs_get_config`
+Get API configuration and limits.
+- `language`: de/fr/it/en
+
+## Example Usage Flow
+
+1. **Search for a topic:**
+   ```
+   bfs_search(keywords="inflation")
+   ```
+
+2. **Browse a category:**
+   ```
+   bfs_list_datacubes(path="px-x-05")  # Price statistics
+   ```
+
+3. **Get metadata for a specific datacube:**
+   ```
+   bfs_get_metadata(datacube_id="px-x-0502010000_104")
+   ```
+
+4. **Query data with filters:**
+   ```
+   bfs_query_data(
+       datacube_id="px-x-0502010000_104",
+       filters=[
+           {"code": "Zeit", "filter": "top", "values": ["12"]}
+       ],
+       format="csv"
+   )
+   ```
+
+## Category Codes
+
+Main statistical categories in the BFS system:
+- `px-x-01`: Population
+- `px-x-02`: Territory and Environment
+- `px-x-03`: Work and Income
+- `px-x-04`: National Economy
+- `px-x-05`: Prices
+- `px-x-06`: Industry and Services
+- `px-x-07`: Agriculture and Forestry
+- `px-x-08`: Energy
+- `px-x-09`: Construction and Housing
+- `px-x-10`: Tourism
+- `px-x-11`: Mobility and Transport
+- `px-x-13`: Social Security
+- `px-x-14`: Health
+- `px-x-15`: Education and Science
+- `px-x-19`: Crime and Criminal Justice
+
+## Integration with LLM Clients
+
+This MCP server is designed to work with any MCP-compatible LLM client. The server handles natural language understanding through the client, providing structured access to Swiss federal statistics.
+
+## API Documentation
+
+The underlying API is a PxWeb implementation (developed by Statistics Sweden).
+- Base URL: https://www.pxweb.bfs.admin.ch/api/v1/{language}/
+- Official BFS Website: https://www.bfs.admin.ch
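Step 4 of the usage flow above maps onto a JSON body POSTed to the PxWeb endpoint. A small sketch of how such a body can be assembled from filter dicts (the helper name is illustrative, not part of the server's API):

```python
# Illustrative helper: build the JSON body the PxWeb API expects for a data query.
def build_pxweb_query(filters, fmt="csv"):
    """filters: list of dicts with 'code', 'filter', and 'values' keys."""
    return {
        "query": [
            {"code": f["code"], "selection": {"filter": f["filter"], "values": f["values"]}}
            for f in filters
        ],
        "response": {"format": fmt},
    }

# Last 12 time periods of a price-statistics datacube, as CSV
body = build_pxweb_query([{"code": "Zeit", "filter": "top", "values": ["12"]}])
```

The resulting `body` would be sent with `httpx.post(url, json=body)` against the datacube's `.px` URL.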
@@ -0,0 +1,538 @@
+#!/usr/bin/env python3
+"""
+Swiss BFS API MCP Server
+Provides broad access to Swiss Federal Statistical Office data via the PxWeb API.
+
+Refactored to use FastMCP for consistency with the OpenParlData server.
+"""
+
+import asyncio
+import json
+import logging
+from typing import Dict, List, Any, Optional
+from enum import Enum
+import httpx
+from mcp.server.fastmcp import FastMCP
+from pydantic import BaseModel, Field, ConfigDict
+
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
+
+# Initialize FastMCP server
+mcp = FastMCP("swiss-bfs-api")
+
+# API Configuration
+BASE_URL = "https://www.pxweb.bfs.admin.ch/api/v1"
+
+class Language(str, Enum):
+    DE = "de"
+    FR = "fr"
+    IT = "it"
+    EN = "en"
+
+class DataFormat(str, Enum):
+    CSV = "csv"
+    JSON = "json"
+    JSON_STAT = "json-stat"
+    JSON_STAT2 = "json-stat2"
+    PX = "px"
+
+class FilterType(str, Enum):
+    ALL = "all"
+    ITEM = "item"
+    TOP = "top"
+
+# Datacube knowledge base: maps keywords to known datacube IDs with descriptions.
+# This enables semantic search, since the API itself only returns cryptic IDs.
+DATACUBE_KNOWLEDGE_BASE = {
+    # Population & Demographics (px-x-01)
+    "population": [
+        ("px-x-0102010000_101", "Permanent resident population by canton"),
+        ("px-x-0102020000_101", "Population by age and sex"),
+        ("px-x-0102020202_106", "Population statistics and scenarios"),
+        ("px-x-0102020300_101", "Population growth and change"),
+    ],
+    "demographics": [
+        ("px-x-0102010000_101", "Permanent resident population by canton"),
+        ("px-x-0102020000_101", "Population by age and sex"),
+    ],
+    "birth": [
+        ("px-x-0102020000_101", "Birth rates and statistics"),
+    ],
+    "death": [
+        ("px-x-0102020000_101", "Mortality rates and statistics"),
+    ],
+
+    # Employment & Labor (px-x-03)
+    "employment": [
+        ("px-x-0301000000_103", "Employment by sector"),
+        ("px-x-0301000000_104", "Employment statistics"),
+    ],
+    "unemployment": [
+        ("px-x-0301000000_103", "Unemployment rates"),
+    ],
+    "labor": [
+        ("px-x-0301000000_103", "Labor market statistics"),
+    ],
+    "work": [
+        ("px-x-0301000000_103", "Employment and work statistics"),
+    ],
+
+    # Prices & Inflation (px-x-05)
+    "inflation": [
+        ("px-x-0502010000_101", "Consumer price index (CPI)"),
+    ],
+    "prices": [
+        ("px-x-0502010000_101", "Price statistics and indices"),
+    ],
+    "cost": [
+        ("px-x-0502010000_101", "Cost of living indices"),
+    ],
+
+    # Income & Consumption (px-x-20)
+    "income": [
+        ("px-x-2105000000_101", "Income distribution"),
+        ("px-x-2105000000_102", "Household income"),
+    ],
+    "wages": [
+        ("px-x-2105000000_101", "Wage statistics"),
+    ],
+    "salary": [
+        ("px-x-2105000000_101", "Salary and compensation"),
+    ],
+
+    # Education (px-x-15)
+    "education": [
+        ("px-x-1502010000_101", "Education statistics"),
+        ("px-x-1502010100_101", "Students and schools"),
+    ],
+    "students": [
+        ("px-x-1502010100_101", "Student enrollment"),
+    ],
+    "schools": [
+        ("px-x-1502010100_101", "School statistics"),
+    ],
+    "university": [
+        ("px-x-1502010100_101", "Higher education statistics"),
+    ],
+
+    # Health (px-x-14)
+    "health": [
+        ("px-x-1404010100_101", "Health statistics"),
+        ("px-x-1404050000_101", "Healthcare costs"),
+    ],
+    "hospital": [
+        ("px-x-1404010100_101", "Hospital statistics"),
+    ],
+    "medical": [
+        ("px-x-1404010100_101", "Medical care statistics"),
+    ],
+
+    # Energy (px-x-07)
+    "energy": [
+        ("px-x-0702000000_101", "Energy statistics"),
+    ],
+    "electricity": [
+        ("px-x-0702000000_101", "Electricity production and consumption"),
+    ],
+    "power": [
+        ("px-x-0702000000_101", "Power generation"),
+    ],
+
+    # Housing (px-x-09)
+    "housing": [
+        ("px-x-0902020100_104", "Housing statistics"),
+    ],
+    "rent": [
+        ("px-x-0902020100_104", "Rental prices"),
+    ],
+    "construction": [
+        ("px-x-0902020100_104", "Construction statistics"),
+    ],
+}
+
+# Global HTTP client
+http_client: Optional[httpx.AsyncClient] = None
+
+def get_client() -> httpx.AsyncClient:
+    """Get or create the shared HTTP client."""
+    global http_client
+    if http_client is None:
+        http_client = httpx.AsyncClient(
+            timeout=60.0,
+            headers={
+                "User-Agent": "Mozilla/5.0 (compatible; BFS-MCP/1.0; +https://github.com/user/bfs-mcp)",
+                "Accept": "application/json",
+                "Accept-Language": "en,de,fr,it"
+            }
+        )
+    return http_client
+
+# Pydantic models for input validation
+
+class ListDatacubesInput(BaseModel):
+    """Input for listing BFS datacubes."""
+    model_config = ConfigDict(str_strip_whitespace=True, validate_assignment=True, extra='forbid')
+
+    path: str = Field("", description="Category path to explore (e.g., '' for root, 'px-x-01' for population)")
+    language: Language = Field(Language.EN, description="Response language")
+
+class GetMetadataInput(BaseModel):
+    """Input for getting datacube metadata."""
+    model_config = ConfigDict(str_strip_whitespace=True, validate_assignment=True, extra='forbid')
+
+    datacube_id: str = Field(..., description="The BFS datacube identifier (e.g., px-x-0102030000_101)", min_length=1)
+    language: Language = Field(Language.EN, description="Response language")
+
+class DimensionFilter(BaseModel):
+    """Filter for a single dimension."""
+    code: str = Field(..., description="Dimension code (e.g., 'Jahr', 'Region', 'Geschlecht')")
+    filter: FilterType = Field(..., description="Filter type")
+    values: List[str] = Field(..., description="Values to select")
+
+class QueryDataInput(BaseModel):
+    """Input for querying BFS datacube data."""
+    model_config = ConfigDict(str_strip_whitespace=True, validate_assignment=True, extra='forbid')
+
+    datacube_id: str = Field(..., description="The BFS datacube identifier", min_length=1)
+    filters: List[DimensionFilter] = Field(default=[], description="Query filters for dimensions")
+    format: DataFormat = Field(DataFormat.CSV, description="Response format")
+    language: Language = Field(Language.EN, description="Response language")
+
+class SearchDatacubesInput(BaseModel):
+    """Input for searching BFS datacubes."""
+    model_config = ConfigDict(str_strip_whitespace=True, validate_assignment=True, extra='forbid')
+
+    keywords: str = Field(..., description="Search keywords (e.g., 'inflation', 'employment', 'education', 'health')", min_length=1)
+    language: Language = Field(Language.EN, description="Response language")
+
+class GetConfigInput(BaseModel):
+    """Input for getting API configuration."""
+    model_config = ConfigDict(str_strip_whitespace=True, validate_assignment=True, extra='forbid')
+
+    language: Language = Field(Language.EN, description="Response language")
+
+# Tool implementations
+
+@mcp.tool(
+    name="bfs_list_datacubes",
+    annotations={
+        "title": "List BFS Datacubes",
+        "readOnlyHint": True,
+        "destructiveHint": False,
+        "idempotentHint": True,
+        "openWorldHint": True
+    }
+)
+async def list_datacubes(params: ListDatacubesInput) -> str:
+    """
+    List available datacubes from a BFS category path.
+
+    Browse the Swiss Federal Statistical Office data catalog by category.
+    The BFS API exposes datacube IDs at the root level.
+
+    Examples:
+    - List all datacubes: path=""
+    - Get a specific datacube: path="px-x-0102030000_101"
+    """
+    url = f"{BASE_URL}/{params.language.value}"
+    if params.path:
+        url += f"/{params.path}"
+
+    try:
+        client = get_client()
+        response = await client.get(url)
+        response.raise_for_status()
+        data = response.json()
+
+        result = "Available datacubes (showing first 50):\n\n"
+
+        if isinstance(data, list):
+            # Limit to the first 50 to avoid an overwhelming response
+            for item in data[:50]:
+                if isinstance(item, dict):
+                    dbid = item.get('dbid') or item.get('id', 'N/A')
+                    text = item.get('text', 'N/A')
+                    result += f"• **{dbid}**: {text}\n"
+                    if item.get('type') == 't':
+                        result += "  ↳ Use bfs_query_data with this datacube_id\n"
+
+            if len(data) > 50:
+                result += f"\n... and {len(data) - 50} more datacubes\n"
+        else:
+            result += json.dumps(data, indent=2)
+
+        return result
+
+    except Exception as e:
+        logger.error(f"Error listing datacubes: {e}")
+        return f"Error listing datacubes: {str(e)}"
+
+@mcp.tool(
+    name="bfs_get_metadata",
+    annotations={
+        "title": "Get BFS Datacube Metadata",
+        "readOnlyHint": True,
+        "destructiveHint": False,
+        "idempotentHint": True,
+        "openWorldHint": True
+    }
+)
+async def get_metadata(params: GetMetadataInput) -> str:
+    """
+    Get metadata about a BFS datacube, including dimensions and available values.
+
+    Returns detailed information about a specific datacube, including:
+    - Title and description
+    - Available dimensions (time, region, category, etc.)
+    - Possible values for each dimension
+    - Data structure information
+
+    Use this before querying data to understand what filters are available.
+    """
+    url = f"{BASE_URL}/{params.language.value}/{params.datacube_id}/{params.datacube_id}.px"
+
+    try:
+        client = get_client()
+        response = await client.get(url)
+        response.raise_for_status()
+        metadata = response.json()
+
+        result = f"Metadata for {params.datacube_id}:\n\n"
+
+        # Extract key information
+        if "title" in metadata:
+            result += f"Title: {metadata['title']}\n\n"
+
+        if "variables" in metadata:
+            result += "Available dimensions:\n"
+            for var in metadata["variables"]:
+                result += f"\n• {var.get('code', 'N/A')}: {var.get('text', 'N/A')}\n"
+                if "values" in var and len(var["values"]) <= 10:
+                    result += f"  Values: {', '.join(var['values'][:10])}\n"
+                elif "values" in var:
+                    result += f"  Values: {len(var['values'])} options available\n"
+
+        result += f"\n\nFull metadata:\n{json.dumps(metadata, indent=2)}"
+
+        return result
+
+    except Exception as e:
+        logger.error(f"Error fetching metadata: {e}")
+        return f"Error fetching metadata: {str(e)}"
+
+@mcp.tool(
+    name="bfs_query_data",
+    annotations={
+        "title": "Query BFS Datacube Data",
+        "readOnlyHint": True,
+        "destructiveHint": False,
+        "idempotentHint": True,
+        "openWorldHint": True
+    }
+)
+async def query_data(params: QueryDataInput) -> str:
+    """
+    Query any BFS datacube with custom filters.
+
+    Retrieve actual statistical data from a datacube. You can filter by:
+    - Time periods (years, months, quarters)
+    - Geographic regions (cantons, municipalities)
+    - Categories (age groups, sectors, types, etc.)
+
+    Returns data in the specified format (CSV, JSON, JSON-stat).
+
+    Note: If no filters are provided, the tool attempts to return recent data.
+    """
+    url = f"{BASE_URL}/{params.language.value}/{params.datacube_id}/{params.datacube_id}.px"
+
+    # Build query
+    query = {
+        "query": [],
+        "response": {"format": params.format.value}
+    }
+
+    # Convert filters to the PxWeb query format
+    for f in params.filters:
+        query["query"].append({
+            "code": f.code,
+            "selection": {
+                "filter": f.filter.value,
+                "values": f.values
+            }
+        })
+
+    # If no filters were given, try to fetch recent/limited data
+    if not params.filters:
+        # Fetch metadata first to find a time dimension
+        try:
+            client = get_client()
+            meta_response = await client.get(url)
+            if meta_response.status_code == 200:
+                metadata = meta_response.json()
+                # Look for a time-related dimension
+                for var in metadata.get("variables", []):
+                    if var.get("code", "").lower() in ["jahr", "year", "zeit", "time", "periode"]:
+                        query["query"] = [{
+                            "code": var["code"],
+                            "selection": {"filter": "top", "values": ["5"]}
+                        }]
+                        break
+        except Exception:
+            pass
+
+    try:
+        client = get_client()
+        response = await client.post(url, json=query)
+        response.raise_for_status()
+
+        if params.format == DataFormat.CSV:
+            return response.text
+        else:
+            return json.dumps(response.json(), indent=2)
+
+    except httpx.HTTPStatusError as e:
+        error_msg = f"HTTP Error {e.response.status_code}: "
+        try:
+            error_detail = e.response.json()
+            error_msg += json.dumps(error_detail, indent=2)
+        except Exception:
+            error_msg += e.response.text
+        logger.error(error_msg)
+        return error_msg
+    except Exception as e:
+        logger.error(f"Error querying data: {e}")
+        return f"Error querying data: {str(e)}"
+
+@mcp.tool(
+    name="bfs_search",
+    annotations={
+        "title": "Search BFS Datacubes",
+        "readOnlyHint": True,
+        "destructiveHint": False,
+        "idempotentHint": True,
+        "openWorldHint": True
+    }
+)
+async def search_datacubes(params: SearchDatacubesInput) -> str:
+    """
+    Search for BFS datacubes by topic keywords using the built-in knowledge base.
+
+    Find relevant datacubes for topics like:
+    - Population statistics
+    - Employment and unemployment
+    - Education and science
+    - Health statistics
+    - Economic indicators
+    - Inflation and prices
+    - Energy consumption
+    - Housing and construction
+
+    Returns matching datacubes with descriptions.
+    """
+    try:
+        # Search the knowledge base
+        keywords_lower = params.keywords.lower().strip()
+        matches = []
+
+        # Split the search string and match words against knowledge base keywords
+        search_words = [w for w in keywords_lower.split() if len(w) > 2]
+
+        for keyword, datacubes in DATACUBE_KNOWLEDGE_BASE.items():
+            # Match if any search word overlaps with the knowledge base keyword
+            if any(word in keyword for word in search_words) or any(keyword in word for word in search_words):
+                for datacube_id, description in datacubes:
+                    # Avoid duplicates
+                    if not any(m['id'] == datacube_id for m in matches):
+                        matches.append({
+                            'id': datacube_id,
+                            'text': description,
+                            'keyword': keyword
+                        })
+
+        # Format results
+        result = f"Search results for '{params.keywords}':\n\n"
+
+        if matches:
+            result += f"Found {len(matches)} matching datacube(s):\n\n"
+            for i, match in enumerate(matches[:20], 1):  # Limit to 20 results
+                result += f"{i}. **{match['id']}**\n"
+                result += f"   {match['text']}\n"
+                result += f"   ↳ To get data: use bfs_query_data(datacube_id='{match['id']}')\n"
+                result += "\n"
+
+            if len(matches) > 20:
+                result += f"... and {len(matches) - 20} more results (showing first 20)\n"
+        else:
+            result += "No datacubes found matching your keywords.\n\n"
+            result += "Try these topics: population, employment, unemployment, health, inflation, "
+            result += "education, energy, housing, income, wages, prices, cost\n"
+
+        return result
+
+    except Exception as e:
+        logger.error(f"Error searching datacubes: {e}")
+        return f"Error searching datacubes: {str(e)}"
+
+@mcp.tool(
+    name="bfs_get_config",
+    annotations={
+        "title": "Get BFS API Configuration",
+        "readOnlyHint": True,
+        "destructiveHint": False,
+        "idempotentHint": True,
+        "openWorldHint": True
+    }
+)
+async def get_config(params: GetConfigInput) -> str:
+    """
+    Get API configuration and limits.
+
+    Returns information about the BFS API, including:
+    - API version
+    - Rate limits
+    - Data access restrictions
+    - Available features
+    """
+    url = f"{BASE_URL}/{params.language.value}/?config"
+
+    try:
+        client = get_client()
+        response = await client.get(url)
+        response.raise_for_status()
+        config = response.json()
+
+        result = "BFS API Configuration:\n\n"
+        result += json.dumps(config, indent=2)
+
+        return result
+
+    except Exception as e:
+        logger.error(f"Error fetching config: {e}")
+        return f"Error fetching config: {str(e)}"
+
+# Cleanup function
+async def cleanup():
+    """Clean up resources on shutdown."""
+    global http_client
+    if http_client:
+        await http_client.aclose()
+        http_client = None
+
+# Main execution
+if __name__ == "__main__":
+    import atexit
+
+    # Register cleanup to run when the server exits
+    def cleanup_sync():
+        try:
+            asyncio.run(cleanup())
+        except Exception:
+            pass
+
+    atexit.register(cleanup_sync)
+
+    # Run FastMCP server (synchronous, blocking call)
+    mcp.run()
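The keyword-matching logic in `bfs_search` can be exercised in isolation. A standalone sketch with a trimmed-down knowledge base (same matching rule: a search word and a knowledge-base keyword match if either contains the other):

```python
# Trimmed knowledge base for demonstration; entries mirror the server's structure.
KNOWLEDGE_BASE = {
    "inflation": [("px-x-0502010000_101", "Consumer price index (CPI)")],
    "population": [("px-x-0102010000_101", "Permanent resident population by canton")],
}

def search_knowledge_base(keywords: str):
    """Return (datacube_id, description) pairs matching any search word."""
    words = [w for w in keywords.lower().split() if len(w) > 2]
    matches = []
    for keyword, cubes in KNOWLEDGE_BASE.items():
        # Substring match in either direction, e.g. "popul" matches "population"
        if any(w in keyword or keyword in w for w in words):
            for cube_id, desc in cubes:
                if all(m[0] != cube_id for m in matches):  # skip duplicates
                    matches.append((cube_id, desc))
    return matches

print(search_knowledge_base("swiss inflation data"))
# -> [('px-x-0502010000_101', 'Consumer price index (CPI)')]
```

Because matching happens against human-readable keywords rather than the cryptic PxWeb IDs, the LLM client can go from "how high is inflation?" straight to a queryable datacube.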
@@ -0,0 +1,4 @@
+# Swiss BFS MCP Server Requirements
+mcp>=0.1.0
+httpx>=0.24.0
+python-json-logger>=2.0.0
@@ -0,0 +1,99 @@
mcp_bfs/test_bfs_api.py (new file):

@@ -0,0 +1,99 @@
+#!/usr/bin/env python3
+"""
+Test script for Swiss BFS MCP Server
+Demonstrates direct API usage (not MCP protocol)
+"""
+
+import asyncio
+import httpx
+import json
+
+BASE_URL = "https://www.pxweb.bfs.admin.ch/api/v1"
+
+async def test_api():
+    """Test the BFS API directly to verify functionality"""
+
+    headers = {
+        "User-Agent": "Mozilla/5.0 (compatible; BFS-Test/1.0)",
+        "Accept": "application/json",
+        "Accept-Language": "en,de,fr,it"
+    }
+
+    async with httpx.AsyncClient(timeout=30.0, headers=headers) as client:
+
+        print("=" * 60)
+        print("Swiss BFS API Test")
+        print("=" * 60)
+
+        # 1. Test getting root categories
+        print("\n1. Getting root categories...")
+        try:
+            response = await client.get(f"{BASE_URL}/en")
+            data = response.json()
+            print(f"Found {len(data)} main categories")
+            for item in data[:5]:
+                if isinstance(item, dict):
+                    print(f"  - {item.get('id', 'N/A')}: {item.get('text', 'N/A')}")
+        except Exception as e:
+            print(f"Error: {e}")
+
+        # 2. Test getting metadata for population datacube
+        print("\n2. Getting metadata for population datacube...")
+        datacube_id = "px-x-0102030000_101"
+        try:
+            response = await client.get(f"{BASE_URL}/en/{datacube_id}/{datacube_id}.px")
+            metadata = response.json()
+            print(f"Datacube: {metadata.get('title', 'N/A')}")
+            if "variables" in metadata:
+                print("Variables available:")
+                for var in metadata["variables"]:
+                    print(f"  - {var.get('code', 'N/A')}: {var.get('text', 'N/A')}")
+        except Exception as e:
+            print(f"Error: {e}")
+
+        # 3. Test querying recent data
+        print("\n3. Querying recent population data...")
+        query = {
+            "query": [
+                {
+                    "code": "Jahr",
+                    "selection": {
+                        "filter": "top",
+                        "values": ["3"]
+                    }
+                }
+            ],
+            "response": {
+                "format": "json"
+            }
+        }
+
+        try:
+            response = await client.post(
+                f"{BASE_URL}/en/{datacube_id}/{datacube_id}.px",
+                json=query
+            )
+            data = response.json()
+            print("Successfully retrieved data")
+            print(f"Response keys: {list(data.keys())}")
+        except Exception as e:
+            print(f"Error: {e}")
+
+        # 4. Test browsing other categories
+        print("\n4. Browsing price statistics category...")
+        try:
+            response = await client.get(f"{BASE_URL}/en/px-x-05")
+            data = response.json()
+            print(f"Found {len(data)} items in price statistics")
+            for item in data[:3]:
+                if isinstance(item, dict):
+                    print(f"  - {item.get('id', 'N/A')}: {item.get('text', 'N/A')}")
+        except Exception as e:
+            print(f"Error: {e}")
+
+        print("\n" + "=" * 60)
+        print("Test completed")
+        print("=" * 60)
+
+if __name__ == "__main__":
+    asyncio.run(test_api())
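The query body in step 3 follows the PxWeb convention: a list of per-dimension selections plus a response-format block. A small helper making that shape explicit — `pxweb_query` is a hypothetical illustration, not part of this commit:

```python
def pxweb_query(selections, response_format="json"):
    """Build a PxWeb query body from {dimension_code: (filter, values)} pairs."""
    return {
        "query": [
            {"code": code, "selection": {"filter": flt, "values": list(values)}}
            for code, (flt, values) in selections.items()
        ],
        "response": {"format": response_format},
    }

# Same body as the hard-coded query above: newest 3 values of the "Jahr" dimension.
body = pxweb_query({"Jahr": ("top", ["3"])})
print(body)
```

POSTing this dict as JSON to the datacube URL is exactly what step 3 of the test script does.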
mcp_integration.py:

@@ -1,6 +1,6 @@
 """
-MCP Integration for OpenParlData
-Provides
+MCP Integration for OpenParlData and BFS
+Provides wrappers for connecting to the OpenParlData and BFS MCP servers
 and executing tools from the Gradio app.
 """

@@ -89,6 +89,109 @@ class OpenParlDataClient:
         # Wrap arguments in 'params' key as expected by MCP server
         tool_arguments = {"params": arguments}

+        # DEBUG: Log MCP payload before sending
+        print(f"\n📤 [OpenParlDataClient] Sending to MCP server:")
+        print(f"   Tool: {tool_name}")
+        print(f"   Wrapped payload: {tool_arguments}")
+        print(f"   Payload types: {dict((k, type(v).__name__) for k, v in tool_arguments.items())}")
+
+        # Call the tool
+        result = await self.session.call_tool(tool_name, arguments=tool_arguments)
+
+        # Extract text content from result
+        if result.content:
+            # MCP returns list of content blocks
+            text_parts = []
+            for content in result.content:
+                if hasattr(content, 'text'):
+                    text_parts.append(content.text)
+                elif isinstance(content, dict) and 'text' in content:
+                    text_parts.append(content['text'])
+            return "\n".join(text_parts)
+
+        return "No response from tool"
+
+    def get_tool_info(self) -> List[Dict[str, Any]]:
+        """Get information about available tools."""
+        return self.available_tools
+
+
+class BFSClient:
+    """Client for interacting with BFS MCP server."""
+
+    def __init__(self):
+        self.session: Optional[ClientSession] = None
+        self.available_tools: List[Dict[str, Any]] = []
+
+    async def connect(self):
+        """Connect to the MCP server."""
+        # Get the path to the BFS MCP server script
+        server_script = Path(__file__).parent / "mcp_bfs" / "bfs_mcp_server.py"
+
+        if not server_script.exists():
+            raise FileNotFoundError(f"BFS MCP server script not found at {server_script}")
+
+        # Server parameters for stdio connection
+        server_params = StdioServerParameters(
+            command=sys.executable,  # Python interpreter
+            args=[str(server_script)],
+            env=None
+        )
+
+        # Create stdio client context
+        self.stdio_context = stdio_client(server_params)
+        read, write = await self.stdio_context.__aenter__()
+
+        # Create session
+        self.session = ClientSession(read, write)
+        await self.session.__aenter__()
+
+        # Initialize and get available tools
+        await self.session.initialize()
+
+        # List available tools
+        tools_result = await self.session.list_tools()
+        self.available_tools = [
+            {
+                "name": tool.name,
+                "description": tool.description,
+                "input_schema": tool.inputSchema
+            }
+            for tool in tools_result.tools
+        ]
+
+        return self.available_tools
+
+    async def disconnect(self):
+        """Disconnect from the MCP server."""
+        if self.session:
+            await self.session.__aexit__(None, None, None)
+        if hasattr(self, 'stdio_context'):
+            await self.stdio_context.__aexit__(None, None, None)
+
+    async def call_tool(self, tool_name: str, arguments: Dict[str, Any]) -> str:
+        """
+        Call an MCP tool with given arguments.
+
+        Args:
+            tool_name: Name of the tool to call
+            arguments: Dictionary of arguments for the tool
+
+        Returns:
+            Tool response as string
+        """
+        if not self.session:
+            raise RuntimeError("Not connected to BFS MCP server. Call connect() first.")
+
+        # Wrap arguments in 'params' key as expected by MCP server
+        tool_arguments = {"params": arguments}
+
+        # DEBUG: Log MCP payload before sending
+        print(f"\n📤 [BFSClient] Sending to MCP server:")
+        print(f"   Tool: {tool_name}")
+        print(f"   Wrapped payload: {tool_arguments}")
+        print(f"   Payload types: {dict((k, type(v).__name__) for k, v in tool_arguments.items())}")
+
         # Call the tool
         result = await self.session.call_tool(tool_name, arguments=tool_arguments)

@@ -248,7 +351,7 @@ async def execute_mcp_query(
     show_debug: bool = False
 ) -> tuple[str, Optional[str]]:
     """
-    Execute any MCP tool query.
+    Execute any OpenParlData MCP tool query.

     Args:
         user_query: The original user question (for context)

@@ -274,3 +377,38 @@ async def execute_mcp_query(

     finally:
         await client.disconnect()
+
+
+async def execute_mcp_query_bfs(
+    user_query: str,
+    tool_name: str,
+    arguments: Dict[str, Any],
+    show_debug: bool = False
+) -> tuple[str, Optional[str]]:
+    """
+    Execute any BFS MCP tool query.
+
+    Args:
+        user_query: The original user question (for context)
+        tool_name: Name of the BFS MCP tool to call
+        arguments: Arguments for the tool
+        show_debug: Whether to return debug information
+
+    Returns:
+        Tuple of (response_text, debug_info)
+    """
+    client = BFSClient()
+
+    try:
+        await client.connect()
+
+        debug_info = None
+        if show_debug:
+            debug_info = f"**User Query:** {user_query}\n\n**Tool:** {tool_name}\n**Arguments:** ```json\n{json.dumps(arguments, indent=2)}\n```"
+
+        response = await client.call_tool(tool_name, arguments)
+
+        return response, debug_info
+
+    finally:
+        await client.disconnect()
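The debug logging above prints payload types because the commit's sanitization layer converts LLM-produced strings into the types the servers' Pydantic schemas expect (string→int for limit, string→enum for language/format). A minimal sketch of such a layer — `sanitize_arguments` and `ALLOWED_LANGUAGES` are illustrative names, not necessarily the commit's:

```python
# Illustrative sanitizer, not the commit's actual implementation.
ALLOWED_LANGUAGES = {"en", "de", "fr", "it"}

def sanitize_arguments(args):
    """Coerce LLM-produced argument strings into the types the tool schemas expect."""
    clean = dict(args)
    if "limit" in clean:
        # string -> int, clamped to the documented 1..100 range
        clean["limit"] = max(1, min(100, int(clean["limit"])))
    if "offset" in clean:
        clean["offset"] = max(0, int(clean["offset"]))
    if "language" in clean:
        lang = str(clean["language"]).lower()
        clean["language"] = lang if lang in ALLOWED_LANGUAGES else "en"
    return clean

print(sanitize_arguments({"limit": "250", "language": "DE"}))
# -> {'limit': 100, 'language': 'de'}
```

Running a pass like this before wrapping arguments in `{"params": ...}` is what keeps Pydantic validation on the server side from rejecting the call.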
prompts/bfs.txt (new file):

@@ -0,0 +1,34 @@
+You help users query Swiss Federal Statistical Office data. Return ONLY valid JSON. No markdown or explanations.
+
+Format:
+{"tool": "tool_name", "arguments": {...}, "explanation": "brief text"}
+
+OR for non-data questions:
+{"response": "your answer"}
+
+AVAILABLE TOOLS (Two-Step Workflow):
+
+STEP 1 - DISCOVERY:
+bfs_search
+  Params: keywords, language
+  Purpose: Search datacubes by keywords (inflation, population, employment, etc.)
+  Returns: List of matching datacubes with IDs and descriptions
+  NOTE: Does NOT accept "format" parameter!
+
+STEP 2 - DATA RETRIEVAL:
+bfs_query_data
+  Params: datacube_id, language, format (required), filters (optional list)
+  Purpose: Get actual data in specified format
+  Example: {"datacube_id": "px-x-0502010000_104", "format": "csv", "language": "en"}
+
+PARAMETER CONSTRAINTS:
+- language: lowercase "en", "de", "fr", or "it"
+- format (bfs_query_data only): "csv", "json", "json-stat", "json-stat2", or "px"
+- keywords: String describing what data to find
+- datacube_id: Exact ID from bfs_search results
+- ONLY use parameters listed for each tool. NO extra/undocumented parameters.
+
+WORKFLOW:
+1. User asks "I want inflation data" → Use bfs_search with keywords="inflation"
+2. Present datacube options to user (keep descriptions concise, max 1-2 sentences per datacube)
+3. User confirms which datacube → Use bfs_query_data with exact datacube_id → CSV download
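The warning that `bfs_search` does NOT accept a `format` parameter is exactly the failure mode the commit's tool-specific parameter filtering guards against: an MCP server whose Pydantic models use `extra='forbid'` rejects any undeclared key. A hedged sketch of that filtering — `TOOL_PARAMS` and `filter_params` are illustrative names, not the commit's actual code:

```python
# Declared parameters per tool; anything else is dropped before the MCP call.
# (Illustrative mapping based on the prompt above, not the commit's source.)
TOOL_PARAMS = {
    "bfs_search": {"keywords", "language"},
    "bfs_query_data": {"datacube_id", "language", "format", "filters"},
}

def filter_params(tool_name, args):
    """Keep only arguments the target tool declares, so extra='forbid' schemas accept the call."""
    allowed = TOOL_PARAMS.get(tool_name, set())
    return {k: v for k, v in args.items() if k in allowed}

print(filter_params("bfs_search", {"keywords": "inflation", "language": "en", "format": "csv"}))
# -> {'keywords': 'inflation', 'language': 'en'}
```

Here a stray `format` key that the LLM carried over from the data-retrieval step is silently dropped instead of triggering a validation error.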
prompts/parliament.txt (new file):

@@ -0,0 +1,29 @@
+You help users query Swiss parliamentary data. Return ONLY valid JSON. No markdown or explanations.
+
+Format:
+{"tool": "tool_name", "arguments": {...}, "explanation": "brief text"}
+
+OR for non-data questions:
+{"response": "your answer"}
+
+AVAILABLE TOOLS:
+1. openparldata_search_parliamentarians
+   Params: query, canton (2-letter uppercase like 'ZH'), party, language, limit, offset, response_format
+
+2. openparldata_search_votes
+   Params: query, date_from (YYYY-MM-DD), date_to (YYYY-MM-DD), language, limit, offset, response_format
+
+3. openparldata_search_motions
+   Params: query, date_from (YYYY-MM-DD), date_to (YYYY-MM-DD), status, language, limit, offset, response_format
+
+4. openparldata_search_debates
+   Params: query, date_from (YYYY-MM-DD), date_to (YYYY-MM-DD), language, limit, offset, response_format
+
+PARAMETER CONSTRAINTS:
+- limit: Integer between 1-100 (default 20). NEVER exceed 100.
+- language: lowercase "en", "de", "fr", or "it"
+- offset: Integer >= 0 for pagination
+- response_format: "json" or "markdown" (default "markdown")
+- ONLY use parameters listed for each tool. NO extra/undocumented parameters.
+
+Rules: Use YYYY-MM-DD dates. For "latest" use date_from="2024-01-01" only.
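For a request like "show me the latest climate motions", a model reply that satisfies both the JSON envelope and the parameter constraints above would parse like this (the argument values are an illustrative example, not output from the commit):

```python
import json

# Hypothetical model output obeying the prompt's envelope and constraints.
raw = (
    '{"tool": "openparldata_search_motions", '
    '"arguments": {"query": "climate", "date_from": "2024-01-01", '
    '"limit": 20, "language": "en", "response_format": "markdown"}, '
    '"explanation": "Search recent motions about climate"}'
)
call = json.loads(raw)
print(call["tool"], call["arguments"]["limit"])
```

Because the prompt forbids markdown fences around the JSON, a plain `json.loads` on the raw reply is enough; no stripping step is needed.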
requirements.txt:

@@ -19,5 +19,8 @@ pydantic>=2.0.0
 # Async Support
 anyio>=3.0.0

+# Logging
+python-json-logger>=2.0.0
+
 # Environment Variables
 python-dotenv>=1.0.0