Tom Claude committed on
Commit
81d39a3
·
1 Parent(s): c5df650

Implement engine-per-dataset architecture with argument sanitization and enhanced UI

Browse files

Major architectural improvements:
- Refactor to engine-per-dataset pattern (ParliamentEngine, BFSEngine)
- Add comprehensive argument sanitization layer to prevent MCP validation errors
- Implement datacube knowledge base for BFS semantic search
- Add parliament cards display with smart pagination (10 items per page; pagination controls auto-hide when there are fewer than 10 results)
- Add strategic logging for debugging MCP payloads

BFS MCP enhancements:
- Create mcp_bfs/ with knowledge base mapping keywords to datacube IDs
- Map topics (population, employment, health, etc.) to specific datacubes
- Enable semantic search without relying on cryptic API IDs (see the sketch after this list)
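
For illustration, a condensed sketch of the keyword → datacube mapping idea. The dictionary structure mirrors `DATACUBE_KNOWLEDGE_BASE` in `mcp_bfs/bfs_mcp_server.py`; the `lookup` helper below is a hypothetical illustration, not the server's actual `bfs_search` implementation:

```python
# Structure mirrors DATACUBE_KNOWLEDGE_BASE in mcp_bfs/bfs_mcp_server.py.
DATACUBE_KNOWLEDGE_BASE = {
    "population": [
        ("px-x-0102010000_101", "Permanent resident population by canton"),
    ],
    # ... further topics: employment, health, energy, prices, ...
}

def lookup(keywords: str) -> list[tuple[str, str]]:
    """Hypothetical helper: return (datacube_id, description) pairs for matching topics."""
    terms = [t.strip().lower() for t in keywords.split()]
    hits: list[tuple[str, str]] = []
    for topic, cubes in DATACUBE_KNOWLEDGE_BASE.items():
        # Match loosely so "populations" or "pop" still hits "population".
        if any(term and (term in topic or topic in term) for term in terms):
            hits.extend(cubes)
    return hits
```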

UI improvements:
- Simplify BFS search results display (no duplicate listings)
- Add language-aware content display (user's language preferred, with de → fr → it fallback; German shown when English is unavailable)
- Remove Example Questions and Debug Info sections
- Update title to "CoJournalist Swiss Data" with white text
- Simplify API dataset values to "openparldata" and "bfs"
- Change placeholder text to "(Choose a source on the right first)"

Technical improvements:
- Add Pydantic-compatible type conversions (string→int for limit, string→enum for language/format)
- Implement tool-specific parameter filtering to prevent extra='forbid' errors (see the sketch after this list)
- Update prompts with explicit parameter constraints
- Enable Gradio API parameter support for dataset selection
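
A minimal sketch of the sanitization rules described above, condensed from `ParliamentEngine.sanitize_arguments` in app.py in this commit (parameter names, defaults, and clamping taken from that code):

```python
# Condensed from ParliamentEngine.sanitize_arguments in app.py (this commit).
def sanitize(arguments: dict, valid_params: set[str]) -> dict:
    sanitized = {}
    for key, value in arguments.items():
        if key not in valid_params:
            continue  # drop extra fields so Pydantic's extra='forbid' never triggers
        if key == "limit":
            try:
                sanitized[key] = max(1, min(100, int(value)))  # string -> int, clamped to 1-100
            except (ValueError, TypeError):
                sanitized[key] = 20  # default used by the engine
        elif key == "language":
            lang = str(value).lower()
            sanitized[key] = lang if lang in {"de", "fr", "it", "en"} else "en"  # string -> enum value
        else:
            sanitized[key] = value
    return sanitized
```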

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

app.py CHANGED
@@ -1,20 +1,37 @@
1
  """
2
- CoJournalist Data - Swiss Parliamentary Data Chatbot
3
- Powered by Llama-3.1-8B-Instruct and OpenParlData MCP
4
  """
5
 
6
  import os
7
  import json
 
 
 
8
  import gradio as gr
9
  from huggingface_hub import InferenceClient
10
  from dotenv import load_dotenv
11
- from mcp_integration import execute_mcp_query, OpenParlDataClient
12
  import asyncio
13
  from usage_tracker import UsageTracker
14
 
15
  # Load environment variables
16
  load_dotenv()
17
 
18
  # Initialize Hugging Face Inference Client
19
  HF_TOKEN = os.getenv("HF_TOKEN")
20
  if not HF_TOKEN:
@@ -22,6 +39,572 @@ if not HF_TOKEN:
22
 
23
  client = InferenceClient(token=HF_TOKEN)
24
 
25
  # Initialize usage tracker with 50 requests per day limit
26
  tracker = UsageTracker(daily_limit=50)
27
 
@@ -33,79 +616,8 @@ LANGUAGES = {
33
  "Italiano": "it"
34
  }
35
 
36
- # System prompt for Llama-3.1-8B-Instruct
37
- SYSTEM_PROMPT = """You are a helpful assistant that helps users query Swiss parliamentary data.
38
-
39
- You have access to the following tools from the OpenParlData MCP server:
40
-
41
- 1. **openparldata_search_parliamentarians** - Search for Swiss parliamentarians
42
- Parameters: query (name/party), canton (2-letter code), party, active_only, language, limit
43
-
44
- 2. **openparldata_get_parliamentarian** - Get detailed info about a specific parliamentarian
45
- Parameters: person_id, include_votes, include_motions, language
46
-
47
- 3. **openparldata_search_votes** - Search parliamentary votes
48
- Parameters:
49
- - query (title/description)
50
- - date_from (YYYY-MM-DD format, e.g., "2024-01-01")
51
- - date_to (YYYY-MM-DD format, e.g., "2024-12-31" - NEVER use "now", always use actual date)
52
- - vote_type (must be "final", "detail", or "overall")
53
- - language, limit
54
-
55
- 4. **openparldata_get_vote_details** - Get detailed vote information
56
- Parameters: vote_id, include_individual_votes, language
57
-
58
- 5. **openparldata_search_motions** - Search motions and proposals
59
- Parameters: query, status, date_from (YYYY-MM-DD), date_to (YYYY-MM-DD), submitter_id, language, limit
60
-
61
- 6. **openparldata_search_debates** - Search debate transcripts
62
- Parameters: query, date_from (YYYY-MM-DD), date_to (YYYY-MM-DD), speaker_id, language, limit
63
-
64
- CRITICAL RULES:
65
- - All dates MUST be in YYYY-MM-DD format (e.g., "2024-12-31")
66
- - NEVER use "now", "today", or relative dates - always use actual YYYY-MM-DD dates
67
- - For "latest" queries, use date_from with a recent date like "2024-01-01" and NO date_to parameter
68
- - vote_type must ONLY be "final", "detail", or "overall" - no other values
69
- - Your response MUST be valid JSON only
70
- - Do NOT include explanatory text or markdown formatting
71
-
72
- When a user asks a question about Swiss parliamentary data:
73
- 1. Analyze what information they need
74
- 2. Determine which tool(s) to use
75
- 3. Extract the relevant parameters from their question
76
- 4. Respond with ONLY a JSON object containing the tool call
77
-
78
- Your response should be in this exact format:
79
- {
80
- "tool": "tool_name",
81
- "arguments": {
82
- "param1": "value1",
83
- "param2": "value2"
84
- },
85
- "explanation": "Brief explanation of what you're searching for"
86
- }
87
-
88
- If the user's question is not about Swiss parliamentary data or you cannot determine the right tool, respond with:
89
- {
90
- "response": "Your natural language response here"
91
- }
92
-
93
- Example:
94
- User: "Who are the parliamentarians from Zurich?"
95
- Assistant:
96
- {
97
- "tool": "openparldata_search_parliamentarians",
98
- "arguments": {
99
- "canton": "ZH",
100
- "language": "en",
101
- "limit": 20
102
- },
103
- "explanation": "Searching for active parliamentarians from Canton Zurich"
104
- }
105
- """
106
-
107
- # Example queries
108
- EXAMPLES = {
109
  "en": [
110
  "Who are the parliamentarians from Zurich?",
111
  "Show me recent votes about climate policy",
@@ -132,132 +644,50 @@ EXAMPLES = {
132
  ]
133
  }
134
 
135
 
136
- async def query_model_async(message: str, language: str = "en") -> dict:
137
- """Query Llama-3.1-8B model via Inference Providers to interpret user intent and determine tool calls."""
138
-
139
- try:
140
- # Create messages for chat completion
141
- messages = [
142
- {"role": "system", "content": SYSTEM_PROMPT},
143
- {"role": "user", "content": f"Language: {language}\nQuestion: {message}"}
144
- ]
145
-
146
- # Call Llama-3.1-8B via HuggingFace Inference Providers
147
- response = client.chat_completion(
148
- model="meta-llama/Llama-3.1-8B-Instruct",
149
- messages=messages,
150
- max_tokens=500,
151
- temperature=0.3
152
- )
153
-
154
- # Extract response
155
- assistant_message = response.choices[0].message.content
156
-
157
- # Try to parse as JSON
158
- try:
159
- # Clean up response (sometimes models add markdown code blocks)
160
- clean_response = assistant_message.strip()
161
- if clean_response.startswith("```json"):
162
- clean_response = clean_response[7:]
163
- if clean_response.startswith("```"):
164
- clean_response = clean_response[3:]
165
- if clean_response.endswith("```"):
166
- clean_response = clean_response[:-3]
167
- clean_response = clean_response.strip()
168
-
169
- # Find first { or [ (start of JSON) to handle explanatory text
170
- json_start = min(
171
- clean_response.find('{') if '{' in clean_response else len(clean_response),
172
- clean_response.find('[') if '[' in clean_response else len(clean_response)
173
- )
174
- if json_start > 0:
175
- clean_response = clean_response[json_start:]
176
-
177
- return json.loads(clean_response)
178
- except json.JSONDecodeError:
179
- # If not valid JSON, treat as natural language response
180
- return {"response": assistant_message}
181
-
182
- except Exception as e:
183
- return {"error": f"Error querying model: {str(e)}"}
184
-
185
-
186
- def query_model(message: str, language: str = "en") -> dict:
187
- """Synchronous wrapper for async model query."""
188
- return asyncio.run(query_model_async(message, language))
189
-
190
-
191
- async def execute_tool_async(tool_name: str, arguments: dict, show_debug: bool) -> tuple:
192
- """Execute MCP tool asynchronously."""
193
- return await execute_mcp_query("", tool_name, arguments, show_debug)
194
-
195
-
196
- def chat_response(message: str, history: list, language: str, show_debug: bool) -> str:
197
  """
198
- Main chat response function.
199
-
200
- Args:
201
- message: User's message
202
- history: Chat history
203
- language: Selected language
204
- show_debug: Whether to show debug information
205
-
206
- Returns:
207
- Response string
208
  """
209
  try:
210
- # Get language code
211
- lang_code = LANGUAGES.get(language, "en")
 
212
 
213
- # Query Phi-3 model to interpret intent
214
- model_response = query_model(message, lang_code)
215
-
216
- # Check if it's a direct response (no tool call needed)
217
- if "response" in model_response:
218
- return model_response["response"]
219
-
220
- # Check for error
221
- if "error" in model_response:
222
- return f"❌ {model_response['error']}"
223
-
224
- # Execute tool call
225
- if "tool" in model_response and "arguments" in model_response:
226
- tool_name = model_response["tool"]
227
- arguments = model_response["arguments"]
228
- explanation = model_response.get("explanation", "")
229
-
230
- # Ensure language is set in arguments
231
- if "language" not in arguments:
232
- arguments["language"] = lang_code
233
-
234
- # Execute the tool
235
- try:
236
- response, debug_info = asyncio.run(
237
- execute_tool_async(tool_name, arguments, show_debug)
238
- )
239
-
240
- # Build final response
241
- final_response = ""
242
-
243
- if explanation:
244
- final_response += f"*{explanation}*\n\n"
245
-
246
- if show_debug and debug_info:
247
- final_response += f"### 🔧 Debug Information\n{debug_info}\n\n---\n\n"
248
-
249
- final_response += f"### 📊 Results\n{response}"
250
-
251
- return final_response
252
-
253
- except Exception as e:
254
- return f"❌ Error executing tool '{tool_name}': {str(e)}"
255
-
256
- # Fallback
257
- return "I couldn't determine how to process your request. Please try rephrasing your question."
258
 
259
  except Exception as e:
260
- return f"❌ An error occurred: {str(e)}"
261
 
262
 
263
  # Custom CSS
@@ -269,19 +699,34 @@ custom_css = """
269
  text-align: center;
270
  padding: 20px;
271
  background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
272
- color: white;
273
  border-radius: 10px;
274
  margin-bottom: 20px;
275
  }
276
  """
277
 
278
  # Build Gradio interface
279
- with gr.Blocks(css=custom_css, title="CoJournalist Data") as demo:
280
  gr.Markdown(
281
  """
282
  <div class="chatbot-header">
283
- <h1>🏛️ CoJournalist Data</h1>
284
- <p>Ask questions about Swiss parliamentary data in natural language</p>
285
  </div>
286
  """
287
  )
@@ -291,12 +736,39 @@ with gr.Blocks(css=custom_css, title="CoJournalist Data") as demo:
291
  chatbot = gr.Chatbot(
292
  height=500,
293
  label="Chat with CoJournalist",
294
- show_label=False
 
295
  )
296
 
297
  with gr.Row():
298
  msg = gr.Textbox(
299
- placeholder="Ask a question about Swiss parliamentary data...",
300
  show_label=False,
301
  scale=4
302
  )
@@ -305,6 +777,16 @@ with gr.Blocks(css=custom_css, title="CoJournalist Data") as demo:
305
  with gr.Column(scale=1):
306
  gr.Markdown("### ⚙️ Settings")
307
308
  language = gr.Radio(
309
  choices=list(LANGUAGES.keys()),
310
  value="English",
@@ -312,70 +794,291 @@ with gr.Blocks(css=custom_css, title="CoJournalist Data") as demo:
312
  info="Select response language"
313
  )
314
 
315
- show_debug = gr.Checkbox(
316
- label="Show debug info",
317
- value=False,
318
- info="Display tool calls and parameters"
319
- )
320
 
321
- gr.Markdown("### 💡 Example Questions")
 
 
 
322
 
323
- # Dynamic examples based on language
324
- def update_examples(lang):
325
- lang_code = LANGUAGES.get(lang, "en")
326
- return gr.update(
327
- choices=EXAMPLES.get(lang_code, EXAMPLES["en"])
328
- )
329
 
330
- examples_dropdown = gr.Dropdown(
331
- choices=EXAMPLES["en"],
332
- label="Try these:",
333
- show_label=False
334
  )
335
 
336
- language.change(
337
- fn=update_examples,
338
- inputs=[language],
339
- outputs=[examples_dropdown]
340
  )
341
 
342
- # Handle message submission
343
- def respond(message, chat_history, language, show_debug, request: gr.Request):
344
- if not message.strip():
345
- return "", chat_history
346
 
347
  # Check usage limit
348
  user_id = request.client.host if request and hasattr(request, 'client') else "unknown"
349
 
350
  if not tracker.check_limit(user_id):
351
- remaining = tracker.get_remaining(user_id)
352
- bot_message = f"⚠️ Daily request limit reached. You have used all 50 requests for today. Please try again tomorrow.\n\nThis limit helps us keep the service free and available for everyone."
353
- chat_history.append((message, bot_message))
354
- return "", chat_history
 
356
- # Get bot response
357
- bot_message = chat_response(message, chat_history, language, show_debug)
358
 
359
- # Update chat history
360
- chat_history.append((message, bot_message))
 
 
 
361
 
362
- return "", chat_history
363
 
364
- # Handle example selection
365
- def use_example(example):
366
- return example
367
 
368
- msg.submit(respond, [msg, chatbot, language, show_debug], [msg, chatbot])
369
- submit.click(respond, [msg, chatbot, language, show_debug], [msg, chatbot])
370
- examples_dropdown.change(use_example, [examples_dropdown], [msg])
371
 
372
  gr.Markdown(
373
  """
374
  ---
375
- **Note:** This app uses the OpenParlData MCP server to access Swiss parliamentary data.
376
- Currently returning mock data while the OpenParlData API is in development.
 
377
 
378
- **Rate Limit:** 50 requests per day per user to keep the service affordable and accessible.
379
 
380
  Powered by [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) via HF Inference Providers and [Model Context Protocol (MCP)](https://modelcontextprotocol.io/)
381
  """
 
1
  """
2
+ CoJournalist Data - Swiss Parliamentary Data & Statistics Chatbot
3
+ Powered by Llama-3.1-8B-Instruct with OpenParlData and BFS MCP
4
  """
5
 
6
  import os
7
  import json
8
+ import tempfile
9
+ from datetime import datetime
10
+ from pathlib import Path
11
  import gradio as gr
12
  from huggingface_hub import InferenceClient
13
  from dotenv import load_dotenv
14
+ from mcp_integration import execute_mcp_query, execute_mcp_query_bfs
15
  import asyncio
16
  from usage_tracker import UsageTracker
17
 
18
  # Load environment variables
19
  load_dotenv()
20
 
21
+ # Load system prompts from files
22
+ PROMPTS_DIR = Path(__file__).parent / "prompts"
23
+
24
+ def load_prompt(dataset_name: str) -> str:
25
+ """Load system prompt from file."""
26
+ prompt_file = PROMPTS_DIR / f"{dataset_name}.txt"
27
+ if not prompt_file.exists():
28
+ raise FileNotFoundError(f"Prompt file not found: {prompt_file}")
29
+ return prompt_file.read_text(encoding='utf-8')
30
+
31
+ # Load prompts at startup
32
+ PARLIAMENT_PROMPT = load_prompt("parliament")
33
+ BFS_PROMPT = load_prompt("bfs")
34
+
35
  # Initialize Hugging Face Inference Client
36
  HF_TOKEN = os.getenv("HF_TOKEN")
37
  if not HF_TOKEN:
 
39
 
40
  client = InferenceClient(token=HF_TOKEN)
41
 
42
+ class DatasetEngine:
43
+ """Dataset-specific orchestrator for LLM prompting and tool execution."""
44
+
45
+ def __init__(
46
+ self,
47
+ name: str,
48
+ display_name: str,
49
+ system_prompt: str,
50
+ routing_instruction: str,
51
+ allowed_tools: set[str],
52
+ ):
53
+ self.name = name
54
+ self.display_name = display_name
55
+ self.system_prompt = system_prompt
56
+ self.routing_instruction = routing_instruction
57
+ self.allowed_tools = allowed_tools
58
+
59
+ def build_messages(self, user_message: str, language_label: str, language_code: str) -> list[dict]:
60
+ """Construct chat completion messages with dataset-specific guardrails."""
61
+ routing_guardrails = (
62
+ f"TARGET_DATA_SOURCE: {self.display_name}\n"
63
+ f"{self.routing_instruction}\n"
64
+ 'If the request requires a different data source, respond with '
65
+ '{"response": "Explain that the other dataset should be selected in the app."}'
66
+ )
67
+ return [
68
+ {"role": "system", "content": self.system_prompt},
69
+ {"role": "system", "content": routing_guardrails},
70
+ {
71
+ "role": "user",
72
+ "content": (
73
+ f"Selected dataset: {self.display_name}\n"
74
+ f"Language preference: {language_label} ({language_code})\n"
75
+ f"Question: {user_message}"
76
+ ),
77
+ },
78
+ ]
79
+
80
+ @staticmethod
81
+ def _parse_model_response(raw_response: str) -> dict:
82
+ """Parse JSON (with cleanup) returned by the LLM."""
83
+ clean_response = raw_response.strip()
84
+ if clean_response.startswith("```json"):
85
+ clean_response = clean_response[7:]
86
+ if clean_response.startswith("```"):
87
+ clean_response = clean_response[3:]
88
+ if clean_response.endswith("```"):
89
+ clean_response = clean_response[:-3]
90
+ clean_response = clean_response.strip()
91
+
92
+ json_start_candidates = []
93
+ for ch in ("{", "["):
94
+ idx = clean_response.find(ch)
95
+ if idx != -1:
96
+ json_start_candidates.append(idx)
97
+ if json_start_candidates:
98
+ clean_response = clean_response[min(json_start_candidates):]
99
+
100
+ return json.loads(clean_response)
101
+
102
+ def query_model(self, user_message: str, language_label: str, language_code: str) -> dict:
103
+ """Call the LLM with dataset-constrained instructions."""
104
+ try:
105
+ messages = self.build_messages(user_message, language_label, language_code)
106
+ response = client.chat_completion(
107
+ model="meta-llama/Llama-3.1-8B-Instruct",
108
+ messages=messages,
109
+ max_tokens=500,
110
+ temperature=0.3,
111
+ )
112
+ assistant_message = response.choices[0].message.content
113
+ return self._parse_model_response(assistant_message)
114
+ except json.JSONDecodeError:
115
+ # Surface malformed responses to the user so they can retry.
116
+ return {"response": assistant_message}
117
+ except Exception as exc:
118
+ return {"error": f"Error querying model: {str(exc)}"}
119
+
120
+ def execute_tool(
121
+ self,
122
+ user_message: str,
123
+ tool_name: str,
124
+ arguments: dict,
125
+ show_debug: bool,
126
+ ) -> tuple[str, str | None]:
127
+ """Run the MCP tool for the dataset."""
128
+ raise NotImplementedError("execute_tool must be implemented by subclasses.")
129
+
130
+ def sanitize_arguments(self, tool_name: str, arguments: dict) -> dict:
131
+ """
132
+ Sanitize and validate tool arguments before execution.
133
+
134
+ Args:
135
+ tool_name: Name of the tool being called
136
+ arguments: Raw arguments from LLM
137
+
138
+ Returns:
139
+ Sanitized arguments dict with proper types and valid values
140
+ """
141
+ raise NotImplementedError("sanitize_arguments must be implemented by subclasses.")
142
+
143
+ def _compose_response_text(
144
+ self,
145
+ explanation: str,
146
+ debug_info: str | None,
147
+ show_debug: bool,
148
+ body: str,
149
+ ) -> str:
150
+ parts = []
151
+ if explanation:
152
+ parts.append(f"*{explanation}*")
153
+ if show_debug and debug_info:
154
+ parts.append(f"### 🔧 Debug Information\n{debug_info}\n\n---")
155
+ parts.append(body)
156
+ return "\n\n".join(parts)
157
+
158
+ def postprocess_tool_response(
159
+ self,
160
+ *,
161
+ response: str,
162
+ tool_name: str,
163
+ explanation: str,
164
+ debug_info: str | None,
165
+ show_debug: bool,
166
+ language_code: str,
167
+ ) -> tuple[str, str | None, dict, list]:
168
+ """Default dataset response handler."""
169
+ body = f"### 📊 Results\n{response}"
170
+ final_response = self._compose_response_text(explanation, debug_info, show_debug, body)
171
+ return final_response, None, {}, []
172
+
173
+ def respond(
174
+ self,
175
+ user_message: str,
176
+ language_label: str,
177
+ language_code: str,
178
+ show_debug: bool,
179
+ ) -> tuple[str, str | None, dict, list]:
180
+ """Entry point used by the Gradio handler."""
181
+ model_response = self.query_model(user_message, language_label, language_code)
182
+
183
+ if "response" in model_response:
184
+ return model_response["response"], None, {}, []
185
+
186
+ if "error" in model_response:
187
+ return f"❌ {model_response['error']}", None, {}, []
188
+
189
+ tool_name = model_response.get("tool")
190
+ arguments = model_response.get("arguments")
191
+
192
+ if not tool_name or not isinstance(arguments, dict):
193
+ return (
194
+ "I couldn't determine how to process your request. Please try rephrasing your question.",
195
+ None,
196
+ {},
197
+ [],
198
+ )
199
+
200
+ if tool_name not in self.allowed_tools:
201
+ allowed_list = ", ".join(sorted(self.allowed_tools))
202
+ warning = (
203
+ f"❌ Tool '{tool_name}' is not available for {self.display_name}. "
204
+ f"Allowed tools: {allowed_list}. Please adjust your request."
205
+ )
206
+ return warning, None, {}, []
207
+
208
+ if "language" not in arguments:
209
+ arguments["language"] = language_code
210
+
211
+ # Sanitize arguments before execution
212
+ arguments = self.sanitize_arguments(tool_name, arguments)
213
+ print(f"✅ [DatasetEngine] Sanitized arguments: {arguments}")
214
+
215
+ explanation = model_response.get("explanation", "")
216
+ response, debug_info = self.execute_tool(user_message, tool_name, arguments, show_debug)
217
+
218
+ return self.postprocess_tool_response(
219
+ response=response,
220
+ tool_name=tool_name,
221
+ explanation=explanation,
222
+ debug_info=debug_info,
223
+ show_debug=show_debug,
224
+ language_code=language_code,
225
+ )
226
+
227
+
228
+ class ParliamentEngine(DatasetEngine):
229
+ # Valid parameter names per tool
230
+ TOOL_PARAMS = {
231
+ "openparldata_search_parliamentarians": {
232
+ "query", "canton", "party", "active_only", "level", "language",
233
+ "limit", "offset", "response_format"
234
+ },
235
+ "openparldata_search_votes": {
236
+ "query", "date_from", "date_to", "parliament_id", "vote_type",
237
+ "level", "language", "limit", "offset", "response_format"
238
+ },
239
+ "openparldata_search_motions": {
240
+ "query", "submitter_id", "status", "date_from", "date_to",
241
+ "level", "language", "limit", "offset", "response_format"
242
+ },
243
+ "openparldata_search_debates": {
244
+ "query", "date_from", "date_to", "speaker_id", "topic",
245
+ "parliament_id", "level", "language", "limit", "offset", "response_format"
246
+ },
247
+ }
248
+
249
+ def __init__(self):
250
+ super().__init__(
251
+ name="parliament",
252
+ display_name="Swiss Parliament Data (OpenParlData)",
253
+ system_prompt=PARLIAMENT_PROMPT,
254
+ routing_instruction="Use only tools that begin with 'openparldata_'. Never mention BFS tools.",
255
+ allowed_tools={
256
+ "openparldata_search_parliamentarians",
257
+ "openparldata_search_votes",
258
+ "openparldata_search_motions",
259
+ "openparldata_search_debates",
260
+ },
261
+ )
262
+
263
+ def sanitize_arguments(self, tool_name: str, arguments: dict) -> dict:
264
+ """Sanitize arguments for OpenParlData tools."""
265
+ sanitized = {}
266
+ valid_params = self.TOOL_PARAMS.get(tool_name, set())
267
+
268
+ for key, value in arguments.items():
269
+ # Skip extra fields not in the tool schema
270
+ if key not in valid_params:
271
+ print(f"⚠️ [ParliamentEngine] Skipping invalid parameter '{key}' for {tool_name}")
272
+ continue
273
+
274
+ # Type conversions
275
+ if key == "limit":
276
+ # Convert to int and clamp to 1-100
277
+ try:
278
+ limit_val = int(value) if isinstance(value, str) else value
279
+ sanitized[key] = max(1, min(100, limit_val))
280
+ except (ValueError, TypeError):
281
+ sanitized[key] = 20 # Default
282
+ elif key == "offset":
283
+ # Convert to int and ensure >= 0
284
+ try:
285
+ offset_val = int(value) if isinstance(value, str) else value
286
+ sanitized[key] = max(0, offset_val)
287
+ except (ValueError, TypeError):
288
+ sanitized[key] = 0 # Default
289
+ elif key == "language":
290
+ # Validate language enum (case-insensitive)
291
+ lang_upper = str(value).upper()
292
+ if lang_upper in ["DE", "FR", "IT", "EN"]:
293
+ sanitized[key] = lang_upper.lower()
294
+ else:
295
+ sanitized[key] = "en" # Default to English
296
+ elif key == "active_only":
297
+ # Convert to bool
298
+ sanitized[key] = bool(value)
299
+ else:
300
+ # Keep other values as-is
301
+ sanitized[key] = value
302
+
303
+ return sanitized
304
+
305
+ def execute_tool(
306
+ self,
307
+ user_message: str,
308
+ tool_name: str,
309
+ arguments: dict,
310
+ show_debug: bool,
311
+ ) -> tuple[str, str | None]:
312
+ # DEBUG: Capture arguments before MCP call
313
+ print(f"\n🔍 [ParliamentEngine] execute_tool called:")
314
+ print(f" Tool: {tool_name}")
315
+ print(f" Arguments: {arguments}")
316
+ print(f" Argument types: {dict((k, type(v).__name__) for k, v in arguments.items())}")
317
+ return asyncio.run(execute_mcp_query(user_message, tool_name, arguments, show_debug))
318
+
319
+ def postprocess_tool_response(
320
+ self,
321
+ *,
322
+ response: str,
323
+ tool_name: str,
324
+ explanation: str,
325
+ debug_info: str | None,
326
+ show_debug: bool,
327
+ language_code: str,
328
+ ) -> tuple[str, str | None, dict, list]:
329
+ """Parse OpenParlData JSON responses and create card data."""
330
+ parliament_cards = []
331
+ language_fallback = False
332
+
333
+ # Try to parse JSON response
334
+ try:
335
+ data = json.loads(response)
336
+
337
+ # Check if it's an OpenParlData response with data array
338
+ if isinstance(data, dict) and "data" in data and isinstance(data["data"], list):
339
+ # Extract card info from each item
340
+ for item in data["data"]:
341
+ if isinstance(item, dict):
342
+ # Get title in user's preferred language with fallback
343
+ title = "Untitled"
344
+ title_dict = item.get("affair_title") if "affair_title" in item else item.get("title")
345
+
346
+ if isinstance(title_dict, dict):
347
+ # Try user's language first
348
+ if language_code == "en":
349
+ # English not available in API, fallback to German
350
+ title = title_dict.get("de") or title_dict.get("fr") or title_dict.get("it") or "Untitled"
351
+ if title != "Untitled":
352
+ language_fallback = True
353
+ else:
354
+ # Try user's language, fallback to de → fr → it
355
+ title = (title_dict.get(language_code) or
356
+ title_dict.get("de") or
357
+ title_dict.get("fr") or
358
+ title_dict.get("it") or
359
+ "Untitled")
360
+
361
+ # Get URL in user's preferred language
362
+ url = "#"
363
+ if "url_external" in item and isinstance(item["url_external"], dict):
364
+ if language_code == "en":
365
+ url = item["url_external"].get("de") or item["url_external"].get("fr") or item["url_external"].get("it") or "#"
366
+ else:
367
+ url = (item["url_external"].get(language_code) or
368
+ item["url_external"].get("de") or
369
+ item["url_external"].get("fr") or
370
+ item["url_external"].get("it") or
371
+ "#")
372
+
373
+ # Add date if available
374
+ date_str = ""
375
+ if "date" in item:
376
+ date_str = item["date"][:10] # Extract YYYY-MM-DD
377
+
378
+ parliament_cards.append({
379
+ "title": title,
380
+ "url": url,
381
+ "date": date_str
382
+ })
383
+
384
+ # If we have cards, show a summary message
385
+ if parliament_cards:
386
+ count = len(parliament_cards)
387
+ total = data.get("meta", {}).get("total_records", count)
388
+ body = f"### 🏛️ Parliament Results\n\nFound **{total}** result(s). Showing {count} items below:"
389
+
390
+ # Add language fallback notice for English users
391
+ if language_fallback and language_code == "en":
392
+ body += "\n\n*Note: English content is not available from the API. Results are displayed in German.*"
393
+ else:
394
+ body = "### 🏛️ Parliament Results\n\nNo results found for your query."
395
+ else:
396
+ # Not a data response, show as-is
397
+ body = f"### 📊 Results\n{response}"
398
+
399
+ except json.JSONDecodeError:
400
+ # Not JSON, treat as text response
401
+ body = f"### 📊 Results\n{response}"
402
+
403
+ final_response = self._compose_response_text(explanation, debug_info, show_debug, body)
404
+ return final_response, None, {}, parliament_cards
405
+
406
+
407
+ class BFSEngine(DatasetEngine):
408
+ # Valid parameter names per tool
409
+ TOOL_PARAMS = {
410
+ "bfs_search": {
411
+ "keywords", "language" # NO format parameter!
412
+ },
413
+ "bfs_query_data": {
414
+ "datacube_id", "filters", "format", "language"
415
+ },
416
+ }
417
+
418
+ def __init__(self):
419
+ super().__init__(
420
+ name="statistics",
421
+ display_name="Swiss Statistics (BFS)",
422
+ system_prompt=BFS_PROMPT,
423
+ routing_instruction="Use only tools that begin with 'bfs_'. Never mention OpenParlData tools.",
424
+ allowed_tools={
425
+ "bfs_search",
426
+ "bfs_query_data",
427
+ },
428
+ )
429
+
430
+ def sanitize_arguments(self, tool_name: str, arguments: dict) -> dict:
431
+ """Sanitize arguments for BFS tools."""
432
+ sanitized = {}
433
+ valid_params = self.TOOL_PARAMS.get(tool_name, set())
434
+
435
+ for key, value in arguments.items():
436
+ # Skip extra fields not in the tool schema
437
+ if key not in valid_params:
438
+ print(f"⚠️ [BFSEngine] Skipping invalid parameter '{key}' for {tool_name}")
439
+ continue
440
+
441
+ # Type conversions
442
+ if key == "language":
443
+ # Validate language enum (case-insensitive)
444
+ lang_upper = str(value).upper()
445
+ if lang_upper in ["DE", "FR", "IT", "EN"]:
446
+ sanitized[key] = lang_upper.lower()
447
+ else:
448
+ sanitized[key] = "en" # Default to English
449
+ elif key == "format":
450
+ # Validate and normalize format enum (only for bfs_query_data)
451
+ if tool_name == "bfs_query_data":
452
+ format_upper = str(value).upper().replace("-", "_")
453
+ # Map common values to DataFormat enum
454
+ format_map = {
455
+ "CSV": "csv",
456
+ "JSON": "json",
457
+ "JSON_STAT": "json-stat",
458
+ "JSON_STAT2": "json-stat2",
459
+ "PX": "px",
460
+ }
461
+ sanitized[key] = format_map.get(format_upper, "csv") # Default to CSV
462
+ else:
463
+ # Keep other values as-is
464
+ sanitized[key] = value
465
+
466
+ # Add default format for bfs_query_data if not present
467
+ if tool_name == "bfs_query_data" and "format" not in sanitized:
468
+ sanitized["format"] = "csv"
469
+
470
+ return sanitized
471
+
472
+ def execute_tool(
473
+ self,
474
+ user_message: str,
475
+ tool_name: str,
476
+ arguments: dict,
477
+ show_debug: bool,
478
+ ) -> tuple[str, str | None]:
479
+ # DEBUG: Capture arguments after sanitization
480
+ print(f"\n🔍 [BFSEngine] execute_tool called:")
481
+ print(f" Tool: {tool_name}")
482
+ print(f" Arguments (sanitized): {arguments}")
483
+ print(f" Argument types: {dict((k, type(v).__name__) for k, v in arguments.items())}")
484
+ return asyncio.run(execute_mcp_query_bfs(user_message, tool_name, arguments, show_debug))
485
+
486
+ @staticmethod
487
+ def _parse_datacube_choices(response: str) -> tuple[dict, list]:
488
+ datacube_map: dict[str, str] = {}
489
+ datacube_choices: list[str] = []
490
+ import re
491
+
492
+ lines = response.split('\n')
493
+ i = 0
494
+ while i < len(lines):
495
+ line = lines[i]
496
+ match = re.search(r'^\s*\d+\.\s+\*\*([^*]+)\*\*\s*$', line)
497
+ if match:
498
+ datacube_id = match.group(1).strip()
499
+ description = datacube_id
500
+ if i + 1 < len(lines):
501
+ next_line = lines[i + 1].strip()
502
+ if not next_line.startswith('↳') and next_line:
503
+ description = next_line
504
+ elif i + 2 < len(lines):
505
+ description = lines[i + 2].strip() or datacube_id
506
+ if len(description) > 80:
507
+ description = description[:77] + "..."
508
+ label = f"{description} ({datacube_id})"
509
+ datacube_choices.append(label)
510
+ datacube_map[label] = datacube_id
511
+ i += 1
512
+ return datacube_map, datacube_choices
513
+
514
+ @staticmethod
515
+ def _detect_csv(response: str) -> bool:
516
+ lines = response.strip().split('\n')
517
+ if len(lines) < 2:
518
+ return False
519
+ if ',' not in lines[0] or ',' not in lines[1]:
520
+ return False
521
+ prefix = response.lower()[:200]
522
+ error_tokens = ["error", "no data", "no datacubes found", "try broader"]
523
+ return not any(token in prefix for token in error_tokens)
524
+
525
+ def postprocess_tool_response(
526
+ self,
527
+ *,
528
+ response: str,
529
+ tool_name: str,
530
+ explanation: str,
531
+ debug_info: str | None,
532
+ show_debug: bool,
533
+ language_code: str,
534
+ ) -> tuple[str, str | None, dict, list]:
535
+ csv_file_path = None
536
+ datacube_map: dict[str, str] = {}
537
+ datacube_choices: list[str] = []
538
+ body = ""
539
+
540
+ if tool_name == "bfs_query_data" and self._detect_csv(response):
541
+ rows = response.count('\n')
542
+ timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
543
+ csv_filename = f"bfs_data_{timestamp}.csv"
544
+ csv_file_path = os.path.join(tempfile.gettempdir(), csv_filename)
545
+ with open(csv_file_path, 'w', encoding='utf-8') as f:
546
+ f.write(response)
547
+ body = (
548
+ "### 📊 Data Ready\n"
549
+ f"✅ CSV file generated with {rows} rows\n\n"
550
+ "💾 **Download your data using the button below**"
551
+ )
552
+ else:
553
+ if tool_name == "bfs_search" and "matching datacube" in response.lower():
554
+ datacube_map, datacube_choices = self._parse_datacube_choices(response)
555
+
556
+ # If we found datacubes, show a simple message instead of the full response
557
+ if datacube_choices:
558
+ # Extract the search term from explanation
559
+ import re
560
+ match = re.search(r'related to (.+)', explanation, re.IGNORECASE)
561
+ search_term = match.group(1).strip() if match else "your search"
562
+ body = f"### 📊 Available Datasets\n\nHere is the data available for **{search_term}**. Please select a dataset below to download:"
563
+ else:
564
+ # No datacubes found, show the full error message
565
+ body = f"### 📊 Results\n{response}"
566
+ else:
567
+ body = f"### 📊 Results\n{response}"
568
+
569
+ final_response = self._compose_response_text(explanation, debug_info, show_debug, body)
570
+ return final_response, csv_file_path, datacube_map, datacube_choices
571
+
572
+ def fetch_datacube_data(
573
+ self,
574
+ datacube_id: str,
575
+ language_code: str,
576
+ show_debug: bool,
577
+ ) -> tuple[str, str | None]:
578
+ response, debug_info = self.execute_tool(
579
+ user_message=f"Get data for datacube {datacube_id}",
580
+ tool_name="bfs_query_data",
581
+ arguments={"datacube_id": datacube_id, "language": language_code},
582
+ show_debug=show_debug,
583
+ )
584
+ if self._detect_csv(response):
585
+ rows = response.count('\n')
586
+ timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
587
+ csv_filename = f"bfs_data_{timestamp}.csv"
588
+ csv_file_path = os.path.join(tempfile.gettempdir(), csv_filename)
589
+ with open(csv_file_path, 'w', encoding='utf-8') as f:
590
+ f.write(response)
591
+ message = (
592
+ "### 📊 Data Ready\n"
593
+ f"✅ CSV file generated with {rows} rows for datacube: `{datacube_id}`\n\n"
594
+ "💾 **Download your data using the button below**"
595
+ )
596
+ if show_debug and debug_info:
597
+ message = f"### 🔧 Debug Information\n{debug_info}\n\n---\n\n{message}"
598
+ return message, csv_file_path
599
+ error_message = f"❌ Error retrieving data:\n\n{response}"
600
+ return error_message, None
601
+
602
+
603
+ DATASET_ENGINES: dict[str, DatasetEngine] = {
604
+ "parliament": ParliamentEngine(),
605
+ "statistics": BFSEngine(),
606
+ }
607
+
608
  # Initialize usage tracker with 50 requests per day limit
609
  tracker = UsageTracker(daily_limit=50)
610
 
 
616
  "Italiano": "it"
617
  }
618
 
619
+ # Example queries for OpenParlData
620
+ OPENPARLDATA_EXAMPLES = {
621
  "en": [
622
  "Who are the parliamentarians from Zurich?",
623
  "Show me recent votes about climate policy",
 
644
  ]
645
  }
646
 
647
+ # Example queries for BFS (two-step workflow)
648
+ BFS_EXAMPLES = {
649
+ "en": [
650
+ "I want inflation data",
651
+ "Show me population statistics",
652
+ "I need employment data by canton",
653
+ "Find energy consumption statistics"
654
+ ],
655
+ "de": [
656
+ "Ich möchte Inflationsdaten",
657
+ "Zeige mir Bevölkerungsstatistiken",
658
+ "Ich brauche Beschäftigungsdaten nach Kanton",
659
+ "Finde Energieverbrauchsstatistiken"
660
+ ],
661
+ "fr": [
662
+ "Je veux des données sur l'inflation",
663
+ "Montrez-moi les statistiques de population",
664
+ "J'ai besoin de données sur l'emploi par canton",
665
+ "Trouvez les statistiques de consommation d'énergie"
666
+ ],
667
+ "it": [
668
+ "Voglio dati sull'inflazione",
669
+ "Mostrami le statistiche sulla popolazione",
670
+ "Ho bisogno di dati sull'occupazione per cantone",
671
+ "Trova le statistiche sul consumo energetico"
672
+ ]
673
+ }
674
 
675
+ # Keep backward compatibility
676
+ EXAMPLES = OPENPARLDATA_EXAMPLES
677
+ def chat_response(message: str, history: list, language: str, show_debug: bool, dataset: str = "parliament") -> tuple[str, str | None, dict, list]:
678
  """
679
+ Main chat response function routed through dataset-specific engines.
680
  """
681
  try:
682
+ engine = DATASET_ENGINES.get(dataset)
683
+ if not engine:
684
+ return f"❌ Unknown dataset selected: {dataset}", None, {}, []
685
 
686
+ language_code = LANGUAGES.get(language, "en")
687
+ return engine.respond(message, language, language_code, show_debug)
688
 
689
  except Exception as e:
690
+ return f"❌ An error occurred: {str(e)}", None, {}, []
691
 
692
 
693
  # Custom CSS
 
699
  text-align: center;
700
  padding: 20px;
701
  background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
702
+ color: white !important;
703
  border-radius: 10px;
704
  margin-bottom: 20px;
705
  }
706
+ .chatbot-header h1 {
707
+ color: white !important;
708
+ margin: 0;
709
+ }
710
+ .chatbot-header p {
711
+ color: white !important;
712
+ margin: 10px 0 0 0;
713
+ }
714
  """
715
 
716
  # Build Gradio interface
717
+ with gr.Blocks(css=custom_css, title="CoJournalist Swiss Data") as demo:
718
+ # State to track datacube search results
719
+ datacube_state = gr.State({}) # Maps display text → datacube_id
720
+
721
+ # State to track parliament cards
722
+ parliament_cards_state = gr.State([]) # List of card dicts
723
+ parliament_page_state = gr.State(1) # Current page number
724
+
725
  gr.Markdown(
726
  """
727
  <div class="chatbot-header">
728
+ <h1>🇨🇭 CoJournalist Swiss Data</h1>
729
+ <p>Query Swiss parliamentary and statistical data in natural language</p>
730
  </div>
731
  """
732
  )
 
736
  chatbot = gr.Chatbot(
737
  height=500,
738
  label="Chat with CoJournalist",
739
+ show_label=False,
740
+ type="messages"
741
  )
742
 
743
+ # CSV download file component
744
+ download_file = gr.File(
745
+ label="📥 Download Data",
746
+ visible=False,
747
+ interactive=False
748
+ )
749
+
750
+ # Datacube selection (hidden by default, shown when search returns results)
751
+ with gr.Row(visible=False) as datacube_selection_row:
752
+ with gr.Column(scale=4):
753
+ datacube_radio = gr.Radio(
754
+ label="📋 Select Datacube for Download",
755
+ choices=[],
756
+ visible=True
757
+ )
758
+ with gr.Column(scale=1):
759
+ get_data_btn = gr.Button("📥 Get Data", variant="primary", size="lg")
760
+
761
+ # Parliament cards display (hidden by default, shown when parliament results return)
762
+ with gr.Column(visible=False) as parliament_cards_row:
763
+ parliament_cards_html = gr.HTML("")
764
+ with gr.Row():
765
+ prev_page_btn = gr.Button("◀ Previous", size="sm")
766
+ page_info = gr.Markdown("Page 1")
767
+ next_page_btn = gr.Button("Next ▶", size="sm")
768
+
769
  with gr.Row():
770
  msg = gr.Textbox(
771
+ placeholder="(Choose a source on the right first)",
772
  show_label=False,
773
  scale=4
774
  )
 
777
  with gr.Column(scale=1):
778
  gr.Markdown("### ⚙️ Settings")
779
 
780
+ dataset = gr.Radio(
781
+ choices=[
782
+ ("Swiss Parliament Data", "openparldata"),
783
+ ("Swiss Statistics (BFS)", "bfs")
784
+ ],
785
+ value="openparldata",
786
+ label="Data Source",
787
+ info="Choose which API to query"
788
+ )
789
+
790
  language = gr.Radio(
791
  choices=list(LANGUAGES.keys()),
792
  value="English",
 
794
  info="Select response language"
795
  )
796
 
797
+ def ensure_message_history(history):
798
+ """Normalize chat history to the format expected by gr.Chatbot(type='messages')."""
799
+ normalized: list[dict] = []
800
+ if not history:
801
+ return normalized
802
+
803
+ for entry in history:
804
+ if isinstance(entry, dict):
805
+ role = entry.get("role")
806
+ content = entry.get("content", "")
807
+ if role:
808
+ normalized.append({"role": role, "content": "" if content is None else str(content)})
809
+ elif isinstance(entry, (tuple, list)) and len(entry) == 2:
810
+ user, assistant = entry
811
+ if user is not None:
812
+ normalized.append({"role": "user", "content": str(user)})
813
+ if assistant is not None:
814
+ normalized.append({"role": "assistant", "content": str(assistant)})
815
+ return normalized
816
+
817
+ def append_message(history: list[dict], role: str, content: str | None):
818
+ """Append a message to the normalized history."""
819
+ history.append({"role": role, "content": "" if content is None else str(content)})
820
+
821
+ def render_parliament_cards(cards: list[dict], page: int, items_per_page: int = 10) -> tuple[str, str, int, bool]:
822
+ """Render parliament cards as HTML with pagination."""
823
+ if not cards:
824
+ return "", "No results", 1, False
825
+
826
+ total_pages = (len(cards) + items_per_page - 1) // items_per_page
827
+ page = max(1, min(page, total_pages)) # Clamp page to valid range
828
+ show_pagination = len(cards) > items_per_page
829
+
830
+ start_idx = (page - 1) * items_per_page
831
+ end_idx = min(start_idx + items_per_page, len(cards))
832
+ page_cards = cards[start_idx:end_idx]
833
+
834
+ # Generate HTML for cards
835
+ cards_html = '<div style="display: flex; flex-direction: column; gap: 15px;">'
836
+ for card in page_cards:
837
+ title = card.get("title", "Untitled")
838
+ url = card.get("url", "#")
839
+ date = card.get("date", "")
840
+
841
+ # Truncate title if too long
842
+ if len(title) > 120:
843
+ title = title[:117] + "..."
844
+
845
+ date_badge = f'<span style="background: #e0e0e0; padding: 4px 8px; border-radius: 4px; font-size: 12px; color: #666;">{date}</span>' if date else ''
846
+
847
+ cards_html += f'''
848
+ <a href="{url}" target="_blank" style="text-decoration: none;">
849
+ <div style="
850
+ border: 1px solid #ddd;
851
+ border-radius: 8px;
852
+ padding: 16px;
853
+ background: white;
854
+ transition: all 0.2s;
855
+ cursor: pointer;
856
+ ">
857
+ <div style="display: flex; justify-content: space-between; align-items: start; gap: 12px;">
858
+ <h3 style="margin: 0; color: #333; font-size: 16px; flex: 1;">{title}</h3>
859
+ {date_badge}
860
+ </div>
861
+ </div>
862
+ </a>
863
+ '''
864
+ cards_html += '</div>'
865
+
866
+ page_info = f"Page {page} of {total_pages} ({len(cards)} total results)"
867
+
868
+ return cards_html, page_info, page, show_pagination
869
 
870
+ # Handle message submission
871
+ def respond(message, chat_history, language, dataset_choice, current_datacube_state, current_parliament_cards, current_page, request: gr.Request):
872
+ show_debug = False # Debug mode disabled in UI
873
+ chat_messages = ensure_message_history(chat_history)
874
 
875
+ if not message.strip():
876
+ return "", chat_messages, None, gr.update(visible=False), current_datacube_state, gr.update(), gr.update(visible=False), current_parliament_cards, current_page, "", "", gr.update(visible=False), gr.update(), gr.update()
 
 
 
 
877
 
878
+ # Check usage limit
879
+ user_id = request.client.host if request and hasattr(request, 'client') else "unknown"
880
+
881
+ append_message(chat_messages, "user", message)
882
+
883
+ if not tracker.check_limit(user_id):
884
+ bot_message = (
885
+ "⚠️ Daily request limit reached. You have used all 50 requests for today. "
886
+ "Please try again tomorrow.\n\nThis limit helps us keep the service free and available for everyone."
887
  )
888
+ append_message(chat_messages, "assistant", bot_message)
889
+ return "", chat_messages, None, gr.update(visible=False), current_datacube_state, gr.update(), gr.update(visible=False), current_parliament_cards, current_page, "", "", gr.update(visible=False), gr.update(), gr.update()
890
+
891
+ # Map dataset choice to engine type
892
+ dataset_map = {
893
+ "openparldata": "parliament",
894
+ "bfs": "statistics"
895
+ }
896
+ dataset_type = dataset_map.get(dataset_choice, "parliament")
897
+
898
+ # Get bot response (returns tuple with optional CSV file and results data)
899
+ bot_message, csv_file, datacube_map, results_data = chat_response(
900
+ message, chat_messages, language, show_debug, dataset_type
901
+ )
902
 
903
+ append_message(chat_messages, "assistant", bot_message)
904
+
905
+ # Handle parliament cards (for Parliament dataset)
906
+ if dataset_type == "parliament" and results_data:
907
+ cards_html, page_info, page_num, show_pagination = render_parliament_cards(results_data, 1)
908
+ return (
909
+ "",
910
+ chat_messages,
911
+ None,
912
+ gr.update(visible=False),
913
+ current_datacube_state,
914
+ gr.update(),
915
+ gr.update(visible=False),
916
+ results_data, # parliament_cards_state
917
+ page_num, # parliament_page_state
918
+ cards_html, # parliament_cards_html
919
+ page_info, # page_info
920
+ gr.update(visible=True), # parliament_cards_row
921
+ gr.update(visible=show_pagination), # prev_page_btn
922
+ gr.update(visible=show_pagination) # next_page_btn
923
  )
924
 
925
+ # Handle datacube search results (for BFS dataset)
926
+ if dataset_type == "statistics" and results_data:
927
+ return (
928
+ "",
929
+ chat_messages,
930
+ None,
931
+ gr.update(visible=False),
932
+ datacube_map,
933
+ gr.update(choices=results_data, value=None),
934
+ gr.update(visible=True),
935
+ current_parliament_cards,
936
+ current_page,
937
+ "",
938
+ "",
939
+ gr.update(visible=False),
940
+ gr.update(),
941
+ gr.update()
942
+ )
943
+
944
+ # Handle CSV download
945
+ if csv_file:
946
+ return (
947
+ "",
948
+ chat_messages,
949
+ csv_file,
950
+ gr.update(visible=True),
951
+ current_datacube_state,
952
+ gr.update(),
953
+ gr.update(visible=False),
954
+ current_parliament_cards,
955
+ current_page,
956
+ "",
957
+ "",
958
+ gr.update(visible=False),
959
+ gr.update(),
960
+ gr.update()
961
+ )
962
+
963
+ return (
964
+ "",
965
+ chat_messages,
966
+ None,
967
+ gr.update(visible=False),
968
+ current_datacube_state,
969
+ gr.update(),
970
+ gr.update(visible=False),
971
+ current_parliament_cards,
972
+ current_page,
973
+ "",
974
+ "",
975
+ gr.update(visible=False),
976
+ gr.update(),
977
+ gr.update()
978
+ )
979
+
980
+ # Handle parliament pagination
981
+ def prev_page(cards, current_page):
982
+ """Go to previous page of parliament results."""
983
+ new_page = max(1, current_page - 1)
984
+ cards_html, page_info, page_num, show_pagination = render_parliament_cards(cards, new_page)
985
+ return cards_html, page_info, page_num
986
+
987
+ def next_page(cards, current_page):
988
+ """Go to next page of parliament results."""
989
+ if not cards:
990
+ return "", "No results", current_page
991
+ total_pages = (len(cards) + 9) // 10 # 10 items per page
992
+ new_page = min(total_pages, current_page + 1)
993
+ cards_html, page_info, page_num, show_pagination = render_parliament_cards(cards, new_page)
994
+ return cards_html, page_info, page_num
995
+
996
+ # Handle "Get Data" button click for datacube selection
997
+ def fetch_datacube_data(selected_choice, current_datacube_state, chat_history, language, request: gr.Request):
998
+ show_debug = False # Debug mode disabled in UI
999
+ chat_messages = ensure_message_history(chat_history)
1000
+ user_message = f"Get Data: {selected_choice}" if selected_choice else "Get Data"
1001
+ append_message(chat_messages, "user", user_message)
1002
+
1003
+ if not selected_choice or not current_datacube_state:
1004
+ error_msg = "⚠️ Please select a datacube first."
1005
+ append_message(chat_messages, "assistant", error_msg)
1006
+ return chat_messages, None, gr.update(visible=False), gr.update(visible=False)
1007
 
1008
  # Check usage limit
1009
  user_id = request.client.host if request and hasattr(request, 'client') else "unknown"
1010
 
1011
  if not tracker.check_limit(user_id):
1012
+ bot_message = (
1013
+ "⚠️ Daily request limit reached. You have used all 50 requests for today. "
1014
+ "Please try again tomorrow.\n\nThis limit helps us keep the service free and available for everyone."
1015
+ )
1016
+ append_message(chat_messages, "assistant", bot_message)
1017
+ return chat_messages, None, gr.update(visible=False), gr.update(visible=False)
1018
+
1019
+ # Get datacube ID from mapping
1020
+ datacube_id = current_datacube_state.get(selected_choice)
1021
+
1022
+ if not datacube_id:
1023
+ error_msg = "❌ Error: Could not find datacube ID for selected option."
1024
+ append_message(chat_messages, "assistant", error_msg)
1025
+ return chat_messages, None, gr.update(visible=False), gr.update(visible=False)
1026
 
1027
+ # Get language code
1028
+ lang_code = LANGUAGES.get(language, "en")
1029
 
1030
+ bfs_engine = DATASET_ENGINES.get("statistics")
1031
+ if not isinstance(bfs_engine, BFSEngine):
1032
+ error_msg = "❌ Error: BFS engine unavailable."
1033
+ append_message(chat_messages, "assistant", error_msg)
1034
+ return chat_messages, None, gr.update(visible=False), gr.update(visible=False)
1035
 
1036
+ bot_message, csv_file_path = bfs_engine.fetch_datacube_data(datacube_id, lang_code, show_debug)
1037
 
1038
+ append_message(chat_messages, "assistant", bot_message)
1039
+ if csv_file_path:
1040
+ return chat_messages, csv_file_path, gr.update(visible=True), gr.update(visible=False)
1041
 
1042
+ return chat_messages, None, gr.update(visible=False), gr.update(visible=False)
1043
+
1044
+ msg.submit(
1045
+ respond,
1046
+ [msg, chatbot, language, dataset, datacube_state, parliament_cards_state, parliament_page_state],
1047
+ [msg, chatbot, download_file, download_file, datacube_state, datacube_radio, datacube_selection_row,
1048
+ parliament_cards_state, parliament_page_state, parliament_cards_html, page_info, parliament_cards_row,
1049
+ prev_page_btn, next_page_btn]
1050
+ )
1051
+ submit.click(
1052
+ respond,
1053
+ [msg, chatbot, language, dataset, datacube_state, parliament_cards_state, parliament_page_state],
1054
+ [msg, chatbot, download_file, download_file, datacube_state, datacube_radio, datacube_selection_row,
1055
+ parliament_cards_state, parliament_page_state, parliament_cards_html, page_info, parliament_cards_row,
1056
+ prev_page_btn, next_page_btn]
1057
+ )
1058
+ get_data_btn.click(
1059
+ fetch_datacube_data,
1060
+ [datacube_radio, datacube_state, chatbot, language],
1061
+ [chatbot, download_file, download_file, datacube_selection_row]
1062
+ )
1063
+ prev_page_btn.click(
1064
+ prev_page,
1065
+ [parliament_cards_state, parliament_page_state],
1066
+ [parliament_cards_html, page_info, parliament_page_state]
1067
+ )
1068
+ next_page_btn.click(
1069
+ next_page,
1070
+ [parliament_cards_state, parliament_page_state],
1071
+ [parliament_cards_html, page_info, parliament_page_state]
1072
+ )
1073
 
1074
  gr.Markdown(
1075
  """
1076
  ---
1077
+ **Data Sources:**
1078
+ - **Swiss Parliament Data:** OpenParlData MCP server for parliamentary information
1079
+ - **Swiss Statistics (BFS):** Federal Statistical Office data via PxWeb API
1080
 
1081
+ **Rate Limit:** 50 requests per day per user (shared across both datasets) to keep the service affordable and accessible.
1082
 
1083
  Powered by [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) via HF Inference Providers and [Model Context Protocol (MCP)](https://modelcontextprotocol.io/)
1084
  """
mcp/openparldata_mcp.py CHANGED
@@ -599,5 +599,5 @@ async def search_debates(params: SearchDebatesInput) -> str:
599
 
600
  # Main execution
601
  if __name__ == "__main__":
602
- import asyncio
603
- asyncio.run(mcp.run())
 
599
 
600
  # Main execution
601
  if __name__ == "__main__":
602
+ # Run FastMCP server (synchronous, blocking call)
603
+ mcp.run()
mcp_bfs/MCP_USAGE.md ADDED
@@ -0,0 +1,120 @@
1
+ # Swiss BFS API MCP Server
2
+
3
+ ## Overview
4
+ This MCP server provides access to ALL data from the Swiss Federal Statistical Office (BFS), not just population data. The BFS maintains comprehensive statistics on:
5
+
6
+ - Population and Demographics
7
+ - Territory and Environment
8
+ - Work and Income
9
+ - National Economy
10
+ - Prices and Inflation
11
+ - Industry and Services
12
+ - Agriculture and Forestry
13
+ - Energy
14
+ - Construction and Housing
15
+ - Tourism
16
+ - Mobility and Transport
17
+ - Social Security
18
+ - Health
19
+ - Education and Science
20
+ - Crime and Criminal Justice
21
+
22
+ ## Installation
23
+
24
+ ```bash
25
+ pip install -r requirements.txt
26
+ ```
27
+
28
+ ## Usage
29
+
30
+ Run the MCP server:
31
+ ```bash
32
+ python bfs_mcp_server.py
33
+ ```
34
+
35
+ The server communicates via stdio and can be integrated with any MCP-compatible client.
36
+
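As a rough illustration, here is a client-side sketch that drives the server from the Python MCP SDK's stdio client. It assumes the standard `mcp` package API and that the server script is launched from its own directory (adjust the command and path for your setup):

```python
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # Launch the BFS server as a stdio subprocess (command/path is an assumption).
    server = StdioServerParameters(command="python", args=["bfs_mcp_server.py"])
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # bfs_search only takes keywords and language (see the tool list below).
            result = await session.call_tool("bfs_search", {"keywords": "inflation", "language": "en"})
            print(result)

asyncio.run(main())
```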
37
+ ## Available Tools
38
+
39
+ ### 1. `bfs_list_datacubes`
40
+ Browse available datacubes in the API hierarchy.
41
+ - `path`: Category path (e.g., "px-x-01" for population, "" for root)
42
+ - `language`: de/fr/it/en
43
+
44
+ ### 2. `bfs_get_metadata`
45
+ Get detailed metadata about a specific datacube including dimensions and available values.
46
+ - `datacube_id`: The datacube identifier (e.g., "px-x-0102030000_101")
47
+ - `language`: de/fr/it/en
48
+
49
+ ### 3. `bfs_query_data`
50
+ Query any BFS datacube with custom filters.
51
+ - `datacube_id`: The datacube identifier
52
+ - `filters`: Array of filter objects with `code`, `filter` type, and `values`
53
+ - `format`: Output format (csv/json/json-stat/json-stat2/px)
54
+ - `language`: de/fr/it/en
55
+
56
+ ### 4. `bfs_search`
57
+ Search for datacubes by topic keywords.
58
+ - `keywords`: Search terms (e.g., "inflation", "education", "health")
59
+ - `language`: de/fr/it/en
60
+
61
+ ### 5. `bfs_get_config`
62
+ Get API configuration and limits.
63
+ - `language`: de/fr/it/en
64
+
65
+ ## Example Usage Flow
66
+
67
+ 1. **Search for a topic:**
68
+ ```
69
+ bfs_search(keywords="inflation")
70
+ ```
71
+
72
+ 2. **Browse a category:**
73
+ ```
74
+ bfs_list_datacubes(path="px-x-05") # Price statistics
75
+ ```
76
+
77
+ 3. **Get metadata for a specific datacube:**
78
+ ```
79
+ bfs_get_metadata(datacube_id="px-x-0502010000_104")
80
+ ```
81
+
82
+ 4. **Query data with filters:**
83
+ ```
84
+ bfs_query_data(
85
+ datacube_id="px-x-0502010000_104",
86
+ filters=[
87
+ {"code": "Zeit", "filter": "top", "values": ["12"]}
88
+ ],
89
+ format="csv"
90
+ )
91
+ ```
92
+
93
+ ## Category Codes
94
+
95
+ Main statistical categories in the BFS system:
96
+ - `px-x-01`: Population
97
+ - `px-x-02`: Territory and Environment
98
+ - `px-x-03`: Work and Income
99
+ - `px-x-04`: National Economy
100
+ - `px-x-05`: Prices
101
+ - `px-x-06`: Industry and Services
102
+ - `px-x-07`: Agriculture and Forestry
103
+ - `px-x-08`: Energy
104
+ - `px-x-09`: Construction and Housing
105
+ - `px-x-10`: Tourism
106
+ - `px-x-11`: Mobility and Transport
107
+ - `px-x-13`: Social Security
108
+ - `px-x-14`: Health
109
+ - `px-x-15`: Education and Science
110
+ - `px-x-19`: Crime and Criminal Justice
111
+
112
+ ## Integration with LLM Clients
113
+
114
+ This MCP server is designed to work with any MCP-compatible LLM client. Natural language understanding is handled by the client LLM; the server provides structured access to Swiss federal statistics.
115
+
116
+ ## API Documentation
117
+
118
+ The underlying API is a PxWeb implementation (developed by Statistics Sweden).
119
+ - Base URL: https://www.pxweb.bfs.admin.ch/api/v1/{language}/
120
+ - Official BFS Website: https://www.bfs.admin.ch
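+
+ For a quick sanity check against the API itself (outside MCP), the pattern used in `mcp_bfs/test_bfs_api.py` reduces to a few lines; this minimal sketch only lists the root entries of the English catalogue:
+
+ ```python
+ import asyncio
+ import httpx
+
+ async def main():
+     async with httpx.AsyncClient(timeout=30.0) as client:
+         # Root of the English catalogue: returns a list of datacube entries
+         root = (await client.get("https://www.pxweb.bfs.admin.ch/api/v1/en")).json()
+         print(f"{len(root)} entries at root level")
+
+ asyncio.run(main())
+ ```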
mcp_bfs/bfs_mcp_server.py ADDED
@@ -0,0 +1,538 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ Swiss BFS API MCP Server
4
+ Provides broad access to Swiss Federal Statistical Office data via PxWeb API
5
+
6
+ Refactored to use FastMCP for consistency with OpenParlData server.
7
+ """
8
+
9
+ import asyncio
10
+ import json
11
+ import logging
12
+ from typing import Dict, List, Any, Optional
13
+ from enum import Enum
14
+ import httpx
15
+ from mcp.server.fastmcp import FastMCP
16
+ from pydantic import BaseModel, Field, ConfigDict
17
+
18
+ logging.basicConfig(level=logging.INFO)
19
+ logger = logging.getLogger(__name__)
20
+
21
+ # Initialize FastMCP server
22
+ mcp = FastMCP("swiss-bfs-api")
23
+
24
+ # API Configuration
25
+ BASE_URL = "https://www.pxweb.bfs.admin.ch/api/v1"
26
+
27
+ class Language(str, Enum):
28
+ DE = "de"
29
+ FR = "fr"
30
+ IT = "it"
31
+ EN = "en"
32
+
33
+ class DataFormat(str, Enum):
34
+ CSV = "csv"
35
+ JSON = "json"
36
+ JSON_STAT = "json-stat"
37
+ JSON_STAT2 = "json-stat2"
38
+ PX = "px"
39
+
40
+ class FilterType(str, Enum):
41
+ ALL = "all"
42
+ ITEM = "item"
43
+ TOP = "top"
44
+
45
+ # Datacube knowledge base: Maps keywords to known datacube IDs with descriptions
46
+ # This helps with semantic search since the API only returns cryptic IDs
47
+ DATACUBE_KNOWLEDGE_BASE = {
48
+ # Population & Demographics (px-x-01)
49
+ "population": [
50
+ ("px-x-0102010000_101", "Permanent resident population by canton"),
51
+ ("px-x-0102020000_101", "Population by age and sex"),
52
+ ("px-x-0102020202_106", "Population statistics and scenarios"),
53
+ ("px-x-0102020300_101", "Population growth and change"),
54
+ ],
55
+ "demographics": [
56
+ ("px-x-0102010000_101", "Permanent resident population by canton"),
57
+ ("px-x-0102020000_101", "Population by age and sex"),
58
+ ],
59
+ "birth": [
60
+ ("px-x-0102020000_101", "Birth rates and statistics"),
61
+ ],
62
+ "death": [
63
+ ("px-x-0102020000_101", "Mortality rates and statistics"),
64
+ ],
65
+
66
+ # Employment & Labor (px-x-03)
67
+ "employment": [
68
+ ("px-x-0301000000_103", "Employment by sector"),
69
+ ("px-x-0301000000_104", "Employment statistics"),
70
+ ],
71
+ "unemployment": [
72
+ ("px-x-0301000000_103", "Unemployment rates"),
73
+ ],
74
+ "labor": [
75
+ ("px-x-0301000000_103", "Labor market statistics"),
76
+ ],
77
+ "work": [
78
+ ("px-x-0301000000_103", "Employment and work statistics"),
79
+ ],
80
+
81
+ # Prices & Inflation (px-x-05)
82
+ "inflation": [
83
+ ("px-x-0502010000_101", "Consumer price index (CPI)"),
84
+ ],
85
+ "prices": [
86
+ ("px-x-0502010000_101", "Price statistics and indices"),
87
+ ],
88
+ "cost": [
89
+ ("px-x-0502010000_101", "Cost of living indices"),
90
+ ],
91
+
92
+ # Income & Consumption (px-x-20)
93
+ "income": [
94
+ ("px-x-2105000000_101", "Income distribution"),
95
+ ("px-x-2105000000_102", "Household income"),
96
+ ],
97
+ "wages": [
98
+ ("px-x-2105000000_101", "Wage statistics"),
99
+ ],
100
+ "salary": [
101
+ ("px-x-2105000000_101", "Salary and compensation"),
102
+ ],
103
+
104
+ # Education (px-x-15)
105
+ "education": [
106
+ ("px-x-1502010000_101", "Education statistics"),
107
+ ("px-x-1502010100_101", "Students and schools"),
108
+ ],
109
+ "students": [
110
+ ("px-x-1502010100_101", "Student enrollment"),
111
+ ],
112
+ "schools": [
113
+ ("px-x-1502010100_101", "School statistics"),
114
+ ],
115
+ "university": [
116
+ ("px-x-1502010100_101", "Higher education statistics"),
117
+ ],
118
+
119
+ # Health (px-x-14)
120
+ "health": [
121
+ ("px-x-1404010100_101", "Health statistics"),
122
+ ("px-x-1404050000_101", "Healthcare costs"),
123
+ ],
124
+ "hospital": [
125
+ ("px-x-1404010100_101", "Hospital statistics"),
126
+ ],
127
+ "medical": [
128
+ ("px-x-1404010100_101", "Medical care statistics"),
129
+ ],
130
+
131
+ # Energy (px-x-07)
132
+ "energy": [
133
+ ("px-x-0702000000_101", "Energy statistics"),
134
+ ],
135
+ "electricity": [
136
+ ("px-x-0702000000_101", "Electricity production and consumption"),
137
+ ],
138
+ "power": [
139
+ ("px-x-0702000000_101", "Power generation"),
140
+ ],
141
+
142
+ # Housing (px-x-09)
143
+ "housing": [
144
+ ("px-x-0902020100_104", "Housing statistics"),
145
+ ],
146
+ "rent": [
147
+ ("px-x-0902020100_104", "Rental prices"),
148
+ ],
149
+ "construction": [
150
+ ("px-x-0902020100_104", "Construction statistics"),
151
+ ],
152
+ }
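+
+ # Example lookup (illustration only): a query containing "inflation" matches the
+ # "inflation" key above and resolves to px-x-0502010000_101 ("Consumer price index (CPI)"),
+ # which bfs_search returns as a candidate datacube_id for bfs_query_data.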
153
+
154
+ # Global HTTP client
155
+ http_client: Optional[httpx.AsyncClient] = None
156
+
157
+ def get_client() -> httpx.AsyncClient:
158
+ """Get or create HTTP client."""
159
+ global http_client
160
+ if http_client is None:
161
+ http_client = httpx.AsyncClient(
162
+ timeout=60.0,
163
+ headers={
164
+ "User-Agent": "Mozilla/5.0 (compatible; BFS-MCP/1.0; +https://github.com/user/bfs-mcp)",
165
+ "Accept": "application/json",
166
+ "Accept-Language": "en,de,fr,it"
167
+ }
168
+ )
169
+ return http_client
170
+
171
+ # Pydantic models for input validation
172
+
173
+ class ListDatacubesInput(BaseModel):
174
+ """Input for listing BFS datacubes."""
175
+ model_config = ConfigDict(str_strip_whitespace=True, validate_assignment=True, extra='forbid')
176
+
177
+ path: str = Field("", description="Category path to explore (e.g., '' for root, 'px-x-01' for population)")
178
+ language: Language = Field(Language.EN, description="Response language")
179
+
180
+ class GetMetadataInput(BaseModel):
181
+ """Input for getting datacube metadata."""
182
+ model_config = ConfigDict(str_strip_whitespace=True, validate_assignment=True, extra='forbid')
183
+
184
+ datacube_id: str = Field(..., description="The BFS datacube identifier (e.g., px-x-0102030000_101)", min_length=1)
185
+ language: Language = Field(Language.EN, description="Response language")
186
+
187
+ class DimensionFilter(BaseModel):
188
+ """Filter for a single dimension."""
189
+ code: str = Field(..., description="Dimension code (e.g., 'Jahr', 'Region', 'Geschlecht')")
190
+ filter: FilterType = Field(..., description="Filter type")
191
+ values: List[str] = Field(..., description="Values to select")
192
+
193
+ class QueryDataInput(BaseModel):
194
+ """Input for querying BFS datacube data."""
195
+ model_config = ConfigDict(str_strip_whitespace=True, validate_assignment=True, extra='forbid')
196
+
197
+ datacube_id: str = Field(..., description="The BFS datacube identifier", min_length=1)
198
+ filters: List[DimensionFilter] = Field(default=[], description="Query filters for dimensions")
199
+ format: DataFormat = Field(DataFormat.CSV, description="Response format")
200
+ language: Language = Field(Language.EN, description="Response language")
201
+
202
+ class SearchDatacubesInput(BaseModel):
203
+ """Input for searching BFS datacubes."""
204
+ model_config = ConfigDict(str_strip_whitespace=True, validate_assignment=True, extra='forbid')
205
+
206
+ keywords: str = Field(..., description="Search keywords (e.g., 'inflation', 'employment', 'education', 'health')", min_length=1)
207
+ language: Language = Field(Language.EN, description="Response language")
208
+
209
+ class GetConfigInput(BaseModel):
210
+ """Input for getting API configuration."""
211
+ model_config = ConfigDict(str_strip_whitespace=True, validate_assignment=True, extra='forbid')
212
+
213
+ language: Language = Field(Language.EN, description="Response language")
214
+
215
+ # Tool implementations
216
+
217
+ @mcp.tool(
218
+ name="bfs_list_datacubes",
219
+ annotations={
220
+ "title": "List BFS Datacubes",
221
+ "readOnlyHint": True,
222
+ "destructiveHint": False,
223
+ "idempotentHint": True,
224
+ "openWorldHint": True
225
+ }
226
+ )
227
+ async def list_datacubes(params: ListDatacubesInput) -> str:
228
+ """
229
+ List available datacubes from a BFS category path.
230
+
231
+ Browse the Swiss Federal Statistical Office data catalog by category.
232
+ The BFS API has datacube IDs at the root level.
233
+
234
+ Examples:
235
+ - List all datacubes: path=""
236
+ - Get specific datacube: path="px-x-0102030000_101"
237
+ """
238
+ url = f"{BASE_URL}/{params.language.value}"
239
+ if params.path:
240
+ url += f"/{params.path}"
241
+
242
+ try:
243
+ client = get_client()
244
+ response = await client.get(url)
245
+ response.raise_for_status()
246
+ data = response.json()
247
+
248
+ result = f"Available datacubes (showing first 50):\n\n"
249
+
250
+ if isinstance(data, list):
251
+ # Limit to first 50 to avoid overwhelming response
252
+ for item in data[:50]:
253
+ if isinstance(item, dict):
254
+ dbid = item.get('dbid') or item.get('id', 'N/A')
255
+ text = item.get('text', 'N/A')
256
+ result += f"• **{dbid}**: {text}\n"
257
+ if item.get('type') == 't':
258
+ result += " ↳ Use bfs_query_data with this datacube_id\n"
259
+
260
+ if len(data) > 50:
261
+ result += f"\n... and {len(data) - 50} more datacubes\n"
262
+ else:
263
+ result += json.dumps(data, indent=2)
264
+
265
+ return result
266
+
267
+ except Exception as e:
268
+ logger.error(f"Error listing datacubes: {e}")
269
+ return f"Error listing datacubes: {str(e)}"
270
+
271
+ @mcp.tool(
272
+ name="bfs_get_metadata",
273
+ annotations={
274
+ "title": "Get BFS Datacube Metadata",
275
+ "readOnlyHint": True,
276
+ "destructiveHint": False,
277
+ "idempotentHint": True,
278
+ "openWorldHint": True
279
+ }
280
+ )
281
+ async def get_metadata(params: GetMetadataInput) -> str:
282
+ """
283
+ Get metadata about a BFS datacube including dimensions and available values.
284
+
285
+ Returns detailed information about a specific datacube including:
286
+ - Title and description
287
+ - Available dimensions (time, region, category, etc.)
288
+ - Possible values for each dimension
289
+ - Data structure information
290
+
291
+ Use this before querying data to understand what filters are available.
292
+ """
293
+ url = f"{BASE_URL}/{params.language.value}/{params.datacube_id}/{params.datacube_id}.px"
294
+
295
+ try:
296
+ client = get_client()
297
+ response = await client.get(url)
298
+ response.raise_for_status()
299
+ metadata = response.json()
300
+
301
+ result = f"Metadata for {params.datacube_id}:\n\n"
302
+
303
+ # Extract key information
304
+ if "title" in metadata:
305
+ result += f"Title: {metadata['title']}\n\n"
306
+
307
+ if "variables" in metadata:
308
+ result += "Available dimensions:\n"
309
+ for var in metadata["variables"]:
310
+ result += f"\n• {var.get('code', 'N/A')}: {var.get('text', 'N/A')}\n"
311
+ if "values" in var and len(var["values"]) <= 10:
312
+ result += f" Values: {', '.join(var['values'][:10])}\n"
313
+ elif "values" in var:
314
+ result += f" Values: {len(var['values'])} options available\n"
315
+
316
+ result += f"\n\nFull metadata:\n{json.dumps(metadata, indent=2)}"
317
+
318
+ return result
319
+
320
+ except Exception as e:
321
+ logger.error(f"Error fetching metadata: {e}")
322
+ return f"Error fetching metadata: {str(e)}"
323
+
324
+ @mcp.tool(
325
+ name="bfs_query_data",
326
+ annotations={
327
+ "title": "Query BFS Datacube Data",
328
+ "readOnlyHint": True,
329
+ "destructiveHint": False,
330
+ "idempotentHint": True,
331
+ "openWorldHint": True
332
+ }
333
+ )
334
+ async def query_data(params: QueryDataInput) -> str:
335
+ """
336
+ Query any BFS datacube with custom filters.
337
+
338
+ Retrieve actual statistical data from a datacube. You can filter by:
339
+ - Time periods (years, months, quarters)
340
+ - Geographic regions (cantons, municipalities)
341
+ - Categories (age groups, sectors, types, etc.)
342
+
343
+ Returns data in the specified format (CSV, JSON, JSON-stat).
344
+
345
+ Note: If no filters are provided, will attempt to return recent data.
346
+ """
347
+ url = f"{BASE_URL}/{params.language.value}/{params.datacube_id}/{params.datacube_id}.px"
348
+
349
+ # Build query
350
+ query = {
351
+ "query": [],
352
+ "response": {"format": params.format.value}
353
+ }
354
+
355
+ # Convert filters to query format
356
+ for f in params.filters:
357
+ query["query"].append({
358
+ "code": f.code,
359
+ "selection": {
360
+ "filter": f.filter.value,
361
+ "values": f.values
362
+ }
363
+ })
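+
+ # For reference, a single filter {"code": "Zeit", "filter": "top", "values": ["12"]}
+ # with format "csv" yields the PxWeb payload:
+ #   {"query": [{"code": "Zeit", "selection": {"filter": "top", "values": ["12"]}}],
+ #    "response": {"format": "csv"}}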
364
+
365
+ # If no filters, try to get recent/limited data
366
+ if not params.filters:
367
+ # Try to get metadata first to find a time dimension
368
+ try:
369
+ client = get_client()
370
+ meta_response = await client.get(url)
371
+ if meta_response.status_code == 200:
372
+ metadata = meta_response.json()
373
+ # Look for time-related dimension
374
+ for var in metadata.get("variables", []):
375
+ if var.get("code", "").lower() in ["jahr", "year", "zeit", "time", "periode"]:
376
+ query["query"] = [{
377
+ "code": var["code"],
378
+ "selection": {"filter": "top", "values": ["5"]}
379
+ }]
380
+ break
381
+ except Exception:  # metadata lookup is best-effort; fall back to an unfiltered query
382
+ pass
383
+
384
+ try:
385
+ client = get_client()
386
+ response = await client.post(url, json=query)
387
+ response.raise_for_status()
388
+
389
+ if params.format == DataFormat.CSV:
390
+ return response.text
391
+ else:
392
+ return json.dumps(response.json(), indent=2)
393
+
394
+ except httpx.HTTPStatusError as e:
395
+ error_msg = f"HTTP Error {e.response.status_code}: "
396
+ try:
397
+ error_detail = e.response.json()
398
+ error_msg += json.dumps(error_detail, indent=2)
399
+ except Exception:  # response body was not JSON; fall back to raw text
400
+ error_msg += e.response.text
401
+ logger.error(error_msg)
402
+ return error_msg
403
+ except Exception as e:
404
+ logger.error(f"Error querying data: {e}")
405
+ return f"Error querying data: {str(e)}"
406
+
407
+ @mcp.tool(
408
+ name="bfs_search",
409
+ annotations={
410
+ "title": "Search BFS Datacubes",
411
+ "readOnlyHint": True,
412
+ "destructiveHint": False,
413
+ "idempotentHint": True,
414
+ "openWorldHint": True
415
+ }
416
+ )
417
+ async def search_datacubes(params: SearchDatacubesInput) -> str:
418
+ """
419
+ Search for BFS datacubes by topic keywords using built-in knowledge base.
420
+
421
+ Find relevant datacubes for topics like:
422
+ - Population statistics
423
+ - Employment and unemployment
424
+ - Education and science
425
+ - Health statistics
426
+ - Economic indicators
427
+ - Inflation and prices
428
+ - Energy consumption
429
+ - Housing and construction
430
+
431
+ Returns matching datacubes with descriptions.
432
+ """
433
+ try:
434
+ # Search in knowledge base
435
+ keywords_lower = params.keywords.lower().strip()
436
+ matches = []
437
+
438
+ # Split search keywords and match against knowledge base
439
+ search_words = [w for w in keywords_lower.split() if len(w) > 2]
440
+
441
+ # Check each keyword in knowledge base
442
+ for keyword, datacubes in DATACUBE_KNOWLEDGE_BASE.items():
443
+ # Match if any search word appears in the knowledge base keyword
444
+ if any(word in keyword for word in search_words) or any(keyword in word for word in search_words):
445
+ for datacube_id, description in datacubes:
446
+ # Avoid duplicates
447
+ if not any(m['id'] == datacube_id for m in matches):
448
+ matches.append({
449
+ 'id': datacube_id,
450
+ 'text': description,
451
+ 'keyword': keyword
452
+ })
453
+
454
+ # Format results
455
+ result = f"Search results for '{params.keywords}':\n\n"
456
+
457
+ if matches:
458
+ result += f"Found {len(matches)} matching datacube(s):\n\n"
459
+ for i, match in enumerate(matches[:20], 1): # Limit to 20 results
460
+ result += f"{i}. **{match['id']}**\n"
461
+ result += f" {match['text']}\n"
462
+ result += f" ↳ To get data: Use bfs_query_data(datacube_id='{match['id']}')\n"
463
+ result += "\n"
464
+
465
+ if len(matches) > 20:
466
+ result += f"... and {len(matches) - 20} more results (showing first 20)\n"
467
+ else:
468
+ result += "No datacubes found matching your keywords.\n\n"
469
+ result += "Try these topics: population, employment, unemployment, health, inflation, "
470
+ result += "education, energy, housing, income, wages, prices, cost\n"
471
+
472
+ return result
473
+
474
+ except Exception as e:
475
+ logger.error(f"Error searching datacubes: {e}")
476
+ return f"Error searching datacubes: {str(e)}"
477
+
478
+ @mcp.tool(
479
+ name="bfs_get_config",
480
+ annotations={
481
+ "title": "Get BFS API Configuration",
482
+ "readOnlyHint": True,
483
+ "destructiveHint": False,
484
+ "idempotentHint": True,
485
+ "openWorldHint": True
486
+ }
487
+ )
488
+ async def get_config(params: GetConfigInput) -> str:
489
+ """
490
+ Get API configuration and limits.
491
+
492
+ Returns information about the BFS API including:
493
+ - API version
494
+ - Rate limits
495
+ - Data access restrictions
496
+ - Available features
497
+ """
498
+ url = f"{BASE_URL}/{params.language.value}/?config"
499
+
500
+ try:
501
+ client = get_client()
502
+ response = await client.get(url)
503
+ response.raise_for_status()
504
+ config = response.json()
505
+
506
+ result = "BFS API Configuration:\n\n"
507
+ result += json.dumps(config, indent=2)
508
+
509
+ return result
510
+
511
+ except Exception as e:
512
+ logger.error(f"Error fetching config: {e}")
513
+ return f"Error fetching config: {str(e)}"
514
+
515
+ # Cleanup function
516
+ async def cleanup():
517
+ """Cleanup resources on shutdown."""
518
+ global http_client
519
+ if http_client:
520
+ await http_client.aclose()
521
+ http_client = None
522
+
523
+ # Main execution
524
+ if __name__ == "__main__":
525
+ import atexit
526
+
527
+ # Register cleanup to run when server exits
528
+ def cleanup_sync():
529
+ import asyncio
530
+ try:
531
+ asyncio.run(cleanup())
532
+ except Exception:  # best-effort cleanup; ignore errors during shutdown
533
+ pass
534
+
535
+ atexit.register(cleanup_sync)
536
+
537
+ # Run FastMCP server (synchronous, blocking call)
538
+ mcp.run()
mcp_bfs/requirements.txt ADDED
@@ -0,0 +1,4 @@
1
+ # Swiss BFS MCP Server Requirements
2
+ mcp>=0.1.0
3
+ httpx>=0.24.0
4
+ python-json-logger>=2.0.0
mcp_bfs/test_bfs_api.py ADDED
@@ -0,0 +1,99 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ Test script for Swiss BFS MCP Server
4
+ Demonstrates direct API usage (not MCP protocol)
5
+ """
6
+
7
+ import asyncio
8
+ import httpx
9
+ import json
10
+
11
+ BASE_URL = "https://www.pxweb.bfs.admin.ch/api/v1"
12
+
13
+ async def test_api():
14
+ """Test the BFS API directly to verify functionality"""
15
+
16
+ headers = {
17
+ "User-Agent": "Mozilla/5.0 (compatible; BFS-Test/1.0)",
18
+ "Accept": "application/json",
19
+ "Accept-Language": "en,de,fr,it"
20
+ }
21
+
22
+ async with httpx.AsyncClient(timeout=30.0, headers=headers) as client:
23
+
24
+ print("=" * 60)
25
+ print("Swiss BFS API Test")
26
+ print("=" * 60)
27
+
28
+ # 1. Test getting root categories
29
+ print("\n1. Getting root categories...")
30
+ try:
31
+ response = await client.get(f"{BASE_URL}/en")
32
+ data = response.json()
33
+ print(f"Found {len(data)} main categories")
34
+ for item in data[:5]:
35
+ if isinstance(item, dict):
36
+ print(f" - {item.get('id', 'N/A')}: {item.get('text', 'N/A')}")
37
+ except Exception as e:
38
+ print(f"Error: {e}")
39
+
40
+ # 2. Test getting metadata for population datacube
41
+ print("\n2. Getting metadata for population datacube...")
42
+ datacube_id = "px-x-0102030000_101"
43
+ try:
44
+ response = await client.get(f"{BASE_URL}/en/{datacube_id}/{datacube_id}.px")
45
+ metadata = response.json()
46
+ print(f"Datacube: {metadata.get('title', 'N/A')}")
47
+ if "variables" in metadata:
48
+ print("Variables available:")
49
+ for var in metadata["variables"]:
50
+ print(f" - {var.get('code', 'N/A')}: {var.get('text', 'N/A')}")
51
+ except Exception as e:
52
+ print(f"Error: {e}")
53
+
54
+ # 3. Test querying recent data
55
+ print("\n3. Querying recent population data...")
56
+ query = {
57
+ "query": [
58
+ {
59
+ "code": "Jahr",
60
+ "selection": {
61
+ "filter": "top",
62
+ "values": ["3"]
63
+ }
64
+ }
65
+ ],
66
+ "response": {
67
+ "format": "json"
68
+ }
69
+ }
70
+
71
+ try:
72
+ response = await client.post(
73
+ f"{BASE_URL}/en/{datacube_id}/{datacube_id}.px",
74
+ json=query
75
+ )
76
+ data = response.json()
77
+ print("Successfully retrieved data")
78
+ print(f"Response keys: {list(data.keys())}")
79
+ except Exception as e:
80
+ print(f"Error: {e}")
81
+
82
+ # 4. Test browsing other categories
83
+ print("\n4. Browsing price statistics category...")
84
+ try:
85
+ response = await client.get(f"{BASE_URL}/en/px-x-05")
86
+ data = response.json()
87
+ print(f"Found {len(data)} items in price statistics")
88
+ for item in data[:3]:
89
+ if isinstance(item, dict):
90
+ print(f" - {item.get('id', 'N/A')}: {item.get('text', 'N/A')}")
91
+ except Exception as e:
92
+ print(f"Error: {e}")
93
+
94
+ print("\n" + "=" * 60)
95
+ print("Test completed")
96
+ print("=" * 60)
97
+
98
+ if __name__ == "__main__":
99
+ asyncio.run(test_api())
mcp_integration.py CHANGED
@@ -1,6 +1,6 @@
1
  """
2
- MCP Integration for OpenParlData
3
- Provides a wrapper for connecting to the OpenParlData MCP server
4
  and executing tools from the Gradio app.
5
  """
6
 
@@ -89,6 +89,109 @@ class OpenParlDataClient:
89
  # Wrap arguments in 'params' key as expected by MCP server
90
  tool_arguments = {"params": arguments}
91

92
  # Call the tool
93
  result = await self.session.call_tool(tool_name, arguments=tool_arguments)
94
 
@@ -248,7 +351,7 @@ async def execute_mcp_query(
248
  show_debug: bool = False
249
  ) -> tuple[str, Optional[str]]:
250
  """
251
- Execute any MCP tool query.
252
 
253
  Args:
254
  user_query: The original user question (for context)
@@ -274,3 +377,38 @@ async def execute_mcp_query(
274
 
275
  finally:
276
  await client.disconnect()

1
  """
2
+ MCP Integration for OpenParlData and BFS
3
+ Provides wrappers for connecting to the OpenParlData and BFS MCP servers
4
  and executing tools from the Gradio app.
5
  """
6
 
 
89
  # Wrap arguments in 'params' key as expected by MCP server
90
  tool_arguments = {"params": arguments}
91
 
92
+ # DEBUG: Log MCP payload before sending
93
+ print(f"\n📤 [OpenParlDataClient] Sending to MCP server:")
94
+ print(f" Tool: {tool_name}")
95
+ print(f" Wrapped payload: {tool_arguments}")
96
+ print(f" Payload types: {dict((k, type(v).__name__) for k, v in tool_arguments.items())}")
97
+
98
+ # Call the tool
99
+ result = await self.session.call_tool(tool_name, arguments=tool_arguments)
100
+
101
+ # Extract text content from result
102
+ if result.content:
103
+ # MCP returns list of content blocks
104
+ text_parts = []
105
+ for content in result.content:
106
+ if hasattr(content, 'text'):
107
+ text_parts.append(content.text)
108
+ elif isinstance(content, dict) and 'text' in content:
109
+ text_parts.append(content['text'])
110
+ return "\n".join(text_parts)
111
+
112
+ return "No response from tool"
113
+
114
+ def get_tool_info(self) -> List[Dict[str, Any]]:
115
+ """Get information about available tools."""
116
+ return self.available_tools
117
+
118
+
119
+ class BFSClient:
120
+ """Client for interacting with BFS MCP server."""
121
+
122
+ def __init__(self):
123
+ self.session: Optional[ClientSession] = None
124
+ self.available_tools: List[Dict[str, Any]] = []
125
+
126
+ async def connect(self):
127
+ """Connect to the MCP server."""
128
+ # Get the path to the BFS MCP server script
129
+ server_script = Path(__file__).parent / "mcp_bfs" / "bfs_mcp_server.py"
130
+
131
+ if not server_script.exists():
132
+ raise FileNotFoundError(f"BFS MCP server script not found at {server_script}")
133
+
134
+ # Server parameters for stdio connection
135
+ server_params = StdioServerParameters(
136
+ command=sys.executable, # Python interpreter
137
+ args=[str(server_script)],
138
+ env=None
139
+ )
140
+
141
+ # Create stdio client context
142
+ self.stdio_context = stdio_client(server_params)
143
+ read, write = await self.stdio_context.__aenter__()
144
+
145
+ # Create session
146
+ self.session = ClientSession(read, write)
147
+ await self.session.__aenter__()
148
+
149
+ # Initialize and get available tools
150
+ await self.session.initialize()
151
+
152
+ # List available tools
153
+ tools_result = await self.session.list_tools()
154
+ self.available_tools = [
155
+ {
156
+ "name": tool.name,
157
+ "description": tool.description,
158
+ "input_schema": tool.inputSchema
159
+ }
160
+ for tool in tools_result.tools
161
+ ]
162
+
163
+ return self.available_tools
164
+
165
+ async def disconnect(self):
166
+ """Disconnect from the MCP server."""
167
+ if self.session:
168
+ await self.session.__aexit__(None, None, None)
169
+ if hasattr(self, 'stdio_context'):
170
+ await self.stdio_context.__aexit__(None, None, None)
171
+
172
+ async def call_tool(self, tool_name: str, arguments: Dict[str, Any]) -> str:
173
+ """
174
+ Call an MCP tool with given arguments.
175
+
176
+ Args:
177
+ tool_name: Name of the tool to call
178
+ arguments: Dictionary of arguments for the tool
179
+
180
+ Returns:
181
+ Tool response as string
182
+ """
183
+ if not self.session:
184
+ raise RuntimeError("Not connected to BFS MCP server. Call connect() first.")
185
+
186
+ # Wrap arguments in 'params' key as expected by MCP server
187
+ tool_arguments = {"params": arguments}
188
+
189
+ # DEBUG: Log MCP payload before sending
190
+ print(f"\n📤 [BFSClient] Sending to MCP server:")
191
+ print(f" Tool: {tool_name}")
192
+ print(f" Wrapped payload: {tool_arguments}")
193
+ print(f" Payload types: {dict((k, type(v).__name__) for k, v in tool_arguments.items())}")
194
+
195
  # Call the tool
196
  result = await self.session.call_tool(tool_name, arguments=tool_arguments)
197
 
 
351
  show_debug: bool = False
352
  ) -> tuple[str, Optional[str]]:
353
  """
354
+ Execute any OpenParlData MCP tool query.
355
 
356
  Args:
357
  user_query: The original user question (for context)
 
377
 
378
  finally:
379
  await client.disconnect()
380
+
381
+
382
+ async def execute_mcp_query_bfs(
383
+ user_query: str,
384
+ tool_name: str,
385
+ arguments: Dict[str, Any],
386
+ show_debug: bool = False
387
+ ) -> tuple[str, Optional[str]]:
388
+ """
389
+ Execute any BFS MCP tool query.
390
+
391
+ Args:
392
+ user_query: The original user question (for context)
393
+ tool_name: Name of the BFS MCP tool to call
394
+ arguments: Arguments for the tool
395
+ show_debug: Whether to return debug information
396
+
397
+ Returns:
398
+ Tuple of (response_text, debug_info)
399
+ """
400
+ client = BFSClient()
401
+
402
+ try:
403
+ await client.connect()
404
+
405
+ debug_info = None
406
+ if show_debug:
407
+ debug_info = f"**User Query:** {user_query}\n\n**Tool:** {tool_name}\n**Arguments:** ```json\n{json.dumps(arguments, indent=2)}\n```"
408
+
409
+ response = await client.call_tool(tool_name, arguments)
410
+
411
+ return response, debug_info
412
+
413
+ finally:
414
+ await client.disconnect()
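+
+ # Usage sketch (assumes an async context; a synchronous caller could wrap this in asyncio.run):
+ #   response, debug = await execute_mcp_query_bfs(
+ #       user_query="Swiss inflation",
+ #       tool_name="bfs_search",
+ #       arguments={"keywords": "inflation", "language": "en"},
+ #   )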
prompts/bfs.txt ADDED
@@ -0,0 +1,34 @@
1
+ You help users query Swiss Federal Statistical Office data. Return ONLY valid JSON. No markdown or explanations.
2
+
3
+ Format:
4
+ {"tool": "tool_name", "arguments": {...}, "explanation": "brief text"}
5
+
6
+ OR for non-data questions:
7
+ {"response": "your answer"}
8
+
9
+ AVAILABLE TOOLS (Two-Step Workflow):
10
+
11
+ STEP 1 - DISCOVERY:
12
+ bfs_search
13
+ Params: keywords, language
14
+ Purpose: Search datacubes by keywords (inflation, population, employment, etc.)
15
+ Returns: List of matching datacubes with IDs and descriptions
16
+ NOTE: Does NOT accept "format" parameter!
17
+
18
+ STEP 2 - DATA RETRIEVAL:
19
+ bfs_query_data
20
+ Params: datacube_id, language, format (required), filters (optional list)
21
+ Purpose: Get actual data in specified format
22
+ Example: {"datacube_id": "px-x-0502010000_104", "format": "csv", "language": "en"}
23
+
24
+ PARAMETER CONSTRAINTS:
25
+ - language: lowercase "en", "de", "fr", or "it"
26
+ - format (bfs_query_data only): "csv", "json", "json-stat", "json-stat2", or "px"
27
+ - keywords: String describing what data to find
28
+ - datacube_id: Exact ID from bfs_search results
29
+ - ONLY use parameters listed for each tool. NO extra/undocumented parameters.
30
+
31
+ WORKFLOW:
32
+ 1. User asks "I want inflation data" → Use bfs_search with keywords="inflation"
33
+ 2. Present datacube options to user (keep descriptions concise, max 1-2 sentences per datacube)
34
+ 3. User confirms which datacube → Use bfs_query_data with exact datacube_id → CSV download
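+
+ EXAMPLE (step 1 call for "I want inflation data"):
+ {"tool": "bfs_search", "arguments": {"keywords": "inflation", "language": "en"}, "explanation": "Searching BFS datacubes for inflation statistics"}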
prompts/parliament.txt ADDED
@@ -0,0 +1,29 @@
1
+ You help users query Swiss parliamentary data. Return ONLY valid JSON. No markdown or explanations.
2
+
3
+ Format:
4
+ {"tool": "tool_name", "arguments": {...}, "explanation": "brief text"}
5
+
6
+ OR for non-data questions:
7
+ {"response": "your answer"}
8
+
9
+ AVAILABLE TOOLS:
10
+ 1. openparldata_search_parliamentarians
11
+ Params: query, canton (2-letter uppercase like 'ZH'), party, language, limit, offset, response_format
12
+
13
+ 2. openparldata_search_votes
14
+ Params: query, date_from (YYYY-MM-DD), date_to (YYYY-MM-DD), language, limit, offset, response_format
15
+
16
+ 3. openparldata_search_motions
17
+ Params: query, date_from (YYYY-MM-DD), date_to (YYYY-MM-DD), status, language, limit, offset, response_format
18
+
19
+ 4. openparldata_search_debates
20
+ Params: query, date_from (YYYY-MM-DD), date_to (YYYY-MM-DD), language, limit, offset, response_format
21
+
22
+ PARAMETER CONSTRAINTS:
23
+ - limit: Integer between 1-100 (default 20). NEVER exceed 100.
24
+ - language: lowercase "en", "de", "fr", or "it"
25
+ - offset: Integer >= 0 for pagination
26
+ - response_format: "json" or "markdown" (default "markdown")
27
+ - ONLY use parameters listed for each tool. NO extra/undocumented parameters.
28
+
29
+ Rules: Use YYYY-MM-DD dates. For "latest" or "most recent" requests, set date_from="2024-01-01" and leave date_to unset.
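+
+ EXAMPLE:
+ {"tool": "openparldata_search_motions", "arguments": {"query": "climate", "date_from": "2024-01-01", "language": "en", "limit": 20}, "explanation": "Searching recent motions about climate"}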
requirements.txt CHANGED
@@ -19,5 +19,8 @@ pydantic>=2.0.0
19
  # Async Support
20
  anyio>=3.0.0
21

22
  # Environment Variables
23
  python-dotenv>=1.0.0
 
19
  # Async Support
20
  anyio>=3.0.0
21
 
22
+ # Logging
23
+ python-json-logger>=2.0.0
24
+
25
  # Environment Variables
26
  python-dotenv>=1.0.0