Tom Claude committed on
Commit
81d39a3
·
1 Parent(s): c5df650

Implement engine-per-dataset architecture with argument sanitization and enhanced UI

Browse files

Major architectural improvements:
- Refactor to engine-per-dataset pattern (ParliamentEngine, BFSEngine)
- Add comprehensive argument sanitization layer to prevent MCP validation errors
- Implement datacube knowledge base for BFS semantic search
- Add parliament cards display with smart pagination (10 items per page; pagination controls auto-hide when there are fewer than 10 results)
- Add strategic logging for debugging MCP payloads

BFS MCP enhancements:
- Create mcp_bfs/ with knowledge base mapping keywords to datacube IDs
- Map topics (population, employment, health, etc.) to specific datacubes
- Enable semantic search without relying on cryptic API IDs (see the sketch after this list)
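
For illustration, a condensed sketch of the keyword → datacube mapping idea. The dictionary structure mirrors `DATACUBE_KNOWLEDGE_BASE` in `mcp_bfs/bfs_mcp_server.py`; the `lookup` helper below is a hypothetical illustration, not the server's actual `bfs_search` implementation:

```python
# Structure mirrors DATACUBE_KNOWLEDGE_BASE in mcp_bfs/bfs_mcp_server.py.
DATACUBE_KNOWLEDGE_BASE = {
    "population": [
        ("px-x-0102010000_101", "Permanent resident population by canton"),
    ],
    # ... further topics: employment, health, energy, prices, ...
}

def lookup(keywords: str) -> list[tuple[str, str]]:
    """Hypothetical helper: return (datacube_id, description) pairs for matching topics."""
    terms = [t.strip().lower() for t in keywords.split()]
    hits: list[tuple[str, str]] = []
    for topic, cubes in DATACUBE_KNOWLEDGE_BASE.items():
        # Match loosely so "populations" or "pop" still hits "population".
        if any(term and (term in topic or topic in term) for term in terms):
            hits.extend(cubes)
    return hits
```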

UI improvements:
- Simplify BFS search results display (no duplicate listings)
- Add language-aware content display (user's language preferred, with de → fr → it fallback; German shown when English is unavailable)
- Remove Example Questions and Debug Info sections
- Update title to "CoJournalist Swiss Data" with white text
- Simplify API dataset values to "openparldata" and "bfs"
- Change placeholder text to "(Choose a source on the right first)"

Technical improvements:
- Add Pydantic-compatible type conversions (string→int for limit, string→enum for language/format)
- Implement tool-specific parameter filtering to prevent extra='forbid' errors (see the sketch after this list)
- Update prompts with explicit parameter constraints
- Enable Gradio API parameter support for dataset selection
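
A minimal sketch of the sanitization rules described above, condensed from `ParliamentEngine.sanitize_arguments` in app.py in this commit (parameter names, defaults, and clamping taken from that code):

```python
# Condensed from ParliamentEngine.sanitize_arguments in app.py (this commit).
def sanitize(arguments: dict, valid_params: set[str]) -> dict:
    sanitized = {}
    for key, value in arguments.items():
        if key not in valid_params:
            continue  # drop extra fields so Pydantic's extra='forbid' never triggers
        if key == "limit":
            try:
                sanitized[key] = max(1, min(100, int(value)))  # string -> int, clamped to 1-100
            except (ValueError, TypeError):
                sanitized[key] = 20  # default used by the engine
        elif key == "language":
            lang = str(value).lower()
            sanitized[key] = lang if lang in {"de", "fr", "it", "en"} else "en"  # string -> enum value
        else:
            sanitized[key] = value
    return sanitized
```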

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

app.py CHANGED
@@ -1,20 +1,37 @@
1
  """
2
- CoJournalist Data - Swiss Parliamentary Data Chatbot
3
- Powered by Llama-3.1-8B-Instruct and OpenParlData MCP
4
  """
5
 
6
  import os
7
  import json
 
 
 
8
  import gradio as gr
9
  from huggingface_hub import InferenceClient
10
  from dotenv import load_dotenv
11
- from mcp_integration import execute_mcp_query, OpenParlDataClient
12
  import asyncio
13
  from usage_tracker import UsageTracker
14
 
15
  # Load environment variables
16
  load_dotenv()
17
 
18
  # Initialize Hugging Face Inference Client
19
  HF_TOKEN = os.getenv("HF_TOKEN")
20
  if not HF_TOKEN:
@@ -22,6 +39,572 @@ if not HF_TOKEN:
22
 
23
  client = InferenceClient(token=HF_TOKEN)
24
 
25
  # Initialize usage tracker with 50 requests per day limit
26
  tracker = UsageTracker(daily_limit=50)
27
 
@@ -33,79 +616,8 @@ LANGUAGES = {
33
  "Italiano": "it"
34
  }
35
 
36
- # System prompt for Llama-3.1-8B-Instruct
37
- SYSTEM_PROMPT = """You are a helpful assistant that helps users query Swiss parliamentary data.
38
-
39
- You have access to the following tools from the OpenParlData MCP server:
40
-
41
- 1. **openparldata_search_parliamentarians** - Search for Swiss parliamentarians
42
- Parameters: query (name/party), canton (2-letter code), party, active_only, language, limit
43
-
44
- 2. **openparldata_get_parliamentarian** - Get detailed info about a specific parliamentarian
45
- Parameters: person_id, include_votes, include_motions, language
46
-
47
- 3. **openparldata_search_votes** - Search parliamentary votes
48
- Parameters:
49
- - query (title/description)
50
- - date_from (YYYY-MM-DD format, e.g., "2024-01-01")
51
- - date_to (YYYY-MM-DD format, e.g., "2024-12-31" - NEVER use "now", always use actual date)
52
- - vote_type (must be "final", "detail", or "overall")
53
- - language, limit
54
-
55
- 4. **openparldata_get_vote_details** - Get detailed vote information
56
- Parameters: vote_id, include_individual_votes, language
57
-
58
- 5. **openparldata_search_motions** - Search motions and proposals
59
- Parameters: query, status, date_from (YYYY-MM-DD), date_to (YYYY-MM-DD), submitter_id, language, limit
60
-
61
- 6. **openparldata_search_debates** - Search debate transcripts
62
- Parameters: query, date_from (YYYY-MM-DD), date_to (YYYY-MM-DD), speaker_id, language, limit
63
-
64
- CRITICAL RULES:
65
- - All dates MUST be in YYYY-MM-DD format (e.g., "2024-12-31")
66
- - NEVER use "now", "today", or relative dates - always use actual YYYY-MM-DD dates
67
- - For "latest" queries, use date_from with a recent date like "2024-01-01" and NO date_to parameter
68
- - vote_type must ONLY be "final", "detail", or "overall" - no other values
69
- - Your response MUST be valid JSON only
70
- - Do NOT include explanatory text or markdown formatting
71
-
72
- When a user asks a question about Swiss parliamentary data:
73
- 1. Analyze what information they need
74
- 2. Determine which tool(s) to use
75
- 3. Extract the relevant parameters from their question
76
- 4. Respond with ONLY a JSON object containing the tool call
77
-
78
- Your response should be in this exact format:
79
- {
80
- "tool": "tool_name",
81
- "arguments": {
82
- "param1": "value1",
83
- "param2": "value2"
84
- },
85
- "explanation": "Brief explanation of what you're searching for"
86
- }
87
-
88
- If the user's question is not about Swiss parliamentary data or you cannot determine the right tool, respond with:
89
- {
90
- "response": "Your natural language response here"
91
- }
92
-
93
- Example:
94
- User: "Who are the parliamentarians from Zurich?"
95
- Assistant:
96
- {
97
- "tool": "openparldata_search_parliamentarians",
98
- "arguments": {
99
- "canton": "ZH",
100
- "language": "en",
101
- "limit": 20
102
- },
103
- "explanation": "Searching for active parliamentarians from Canton Zurich"
104
- }
105
- """
106
-
107
- # Example queries
108
- EXAMPLES = {
109
  "en": [
110
  "Who are the parliamentarians from Zurich?",
111
  "Show me recent votes about climate policy",
@@ -132,132 +644,50 @@ EXAMPLES = {
132
  ]
133
  }
134
 
135
 
136
- async def query_model_async(message: str, language: str = "en") -> dict:
137
- """Query Llama-3.1-8B model via Inference Providers to interpret user intent and determine tool calls."""
138
-
139
- try:
140
- # Create messages for chat completion
141
- messages = [
142
- {"role": "system", "content": SYSTEM_PROMPT},
143
- {"role": "user", "content": f"Language: {language}\nQuestion: {message}"}
144
- ]
145
-
146
- # Call Llama-3.1-8B via HuggingFace Inference Providers
147
- response = client.chat_completion(
148
- model="meta-llama/Llama-3.1-8B-Instruct",
149
- messages=messages,
150
- max_tokens=500,
151
- temperature=0.3
152
- )
153
-
154
- # Extract response
155
- assistant_message = response.choices[0].message.content
156
-
157
- # Try to parse as JSON
158
- try:
159
- # Clean up response (sometimes models add markdown code blocks)
160
- clean_response = assistant_message.strip()
161
- if clean_response.startswith("```json"):
162
- clean_response = clean_response[7:]
163
- if clean_response.startswith("```"):
164
- clean_response = clean_response[3:]
165
- if clean_response.endswith("```"):
166
- clean_response = clean_response[:-3]
167
- clean_response = clean_response.strip()
168
-
169
- # Find first { or [ (start of JSON) to handle explanatory text
170
- json_start = min(
171
- clean_response.find('{') if '{' in clean_response else len(clean_response),
172
- clean_response.find('[') if '[' in clean_response else len(clean_response)
173
- )
174
- if json_start > 0:
175
- clean_response = clean_response[json_start:]
176
-
177
- return json.loads(clean_response)
178
- except json.JSONDecodeError:
179
- # If not valid JSON, treat as natural language response
180
- return {"response": assistant_message}
181
-
182
- except Exception as e:
183
- return {"error": f"Error querying model: {str(e)}"}
184
-
185
-
186
- def query_model(message: str, language: str = "en") -> dict:
187
- """Synchronous wrapper for async model query."""
188
- return asyncio.run(query_model_async(message, language))
189
-
190
-
191
- async def execute_tool_async(tool_name: str, arguments: dict, show_debug: bool) -> tuple:
192
- """Execute MCP tool asynchronously."""
193
- return await execute_mcp_query("", tool_name, arguments, show_debug)
194
-
195
-
196
- def chat_response(message: str, history: list, language: str, show_debug: bool) -> str:
197
  """
198
- Main chat response function.
199
-
200
- Args:
201
- message: User's message
202
- history: Chat history
203
- language: Selected language
204
- show_debug: Whether to show debug information
205
-
206
- Returns:
207
- Response string
208
  """
209
  try:
210
- # Get language code
211
- lang_code = LANGUAGES.get(language, "en")
 
212
 
213
- # Query Phi-3 model to interpret intent
214
- model_response = query_model(message, lang_code)
215
-
216
- # Check if it's a direct response (no tool call needed)
217
- if "response" in model_response:
218
- return model_response["response"]
219
-
220
- # Check for error
221
- if "error" in model_response:
222
- return f"❌ {model_response['error']}"
223
-
224
- # Execute tool call
225
- if "tool" in model_response and "arguments" in model_response:
226
- tool_name = model_response["tool"]
227
- arguments = model_response["arguments"]
228
- explanation = model_response.get("explanation", "")
229
-
230
- # Ensure language is set in arguments
231
- if "language" not in arguments:
232
- arguments["language"] = lang_code
233
-
234
- # Execute the tool
235
- try:
236
- response, debug_info = asyncio.run(
237
- execute_tool_async(tool_name, arguments, show_debug)
238
- )
239
-
240
- # Build final response
241
- final_response = ""
242
-
243
- if explanation:
244
- final_response += f"*{explanation}*\n\n"
245
-
246
- if show_debug and debug_info:
247
- final_response += f"### 🔧 Debug Information\n{debug_info}\n\n---\n\n"
248
-
249
- final_response += f"### 📊 Results\n{response}"
250
-
251
- return final_response
252
-
253
- except Exception as e:
254
- return f"❌ Error executing tool '{tool_name}': {str(e)}"
255
-
256
- # Fallback
257
- return "I couldn't determine how to process your request. Please try rephrasing your question."
258
 
259
  except Exception as e:
260
- return f"❌ An error occurred: {str(e)}"
261
 
262
 
263
  # Custom CSS
@@ -269,19 +699,34 @@ custom_css = """
269
  text-align: center;
270
  padding: 20px;
271
  background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
272
- color: white;
273
  border-radius: 10px;
274
  margin-bottom: 20px;
275
  }
276
  """
277
 
278
  # Build Gradio interface
279
- with gr.Blocks(css=custom_css, title="CoJournalist Data") as demo:
280
  gr.Markdown(
281
  """
282
  <div class="chatbot-header">
283
- <h1>🏛️ CoJournalist Data</h1>
284
- <p>Ask questions about Swiss parliamentary data in natural language</p>
285
  </div>
286
  """
287
  )
@@ -291,12 +736,39 @@ with gr.Blocks(css=custom_css, title="CoJournalist Data") as demo:
291
  chatbot = gr.Chatbot(
292
  height=500,
293
  label="Chat with CoJournalist",
294
- show_label=False
 
295
  )
296
 
297
  with gr.Row():
298
  msg = gr.Textbox(
299
- placeholder="Ask a question about Swiss parliamentary data...",
300
  show_label=False,
301
  scale=4
302
  )
@@ -305,6 +777,16 @@ with gr.Blocks(css=custom_css, title="CoJournalist Data") as demo:
305
  with gr.Column(scale=1):
306
  gr.Markdown("### ⚙️ Settings")
307
308
  language = gr.Radio(
309
  choices=list(LANGUAGES.keys()),
310
  value="English",
@@ -312,70 +794,291 @@ with gr.Blocks(css=custom_css, title="CoJournalist Data") as demo:
312
  info="Select response language"
313
  )
314
 
315
- show_debug = gr.Checkbox(
316
- label="Show debug info",
317
- value=False,
318
- info="Display tool calls and parameters"
319
- )
320
 
321
- gr.Markdown("### 💡 Example Questions")
 
 
 
322
 
323
- # Dynamic examples based on language
324
- def update_examples(lang):
325
- lang_code = LANGUAGES.get(lang, "en")
326
- return gr.update(
327
- choices=EXAMPLES.get(lang_code, EXAMPLES["en"])
328
- )
329
 
330
- examples_dropdown = gr.Dropdown(
331
- choices=EXAMPLES["en"],
332
- label="Try these:",
333
- show_label=False
334
  )
335
 
336
- language.change(
337
- fn=update_examples,
338
- inputs=[language],
339
- outputs=[examples_dropdown]
340
  )
341
 
342
- # Handle message submission
343
- def respond(message, chat_history, language, show_debug, request: gr.Request):
344
- if not message.strip():
345
- return "", chat_history
346
 
347
  # Check usage limit
348
  user_id = request.client.host if request and hasattr(request, 'client') else "unknown"
349
 
350
  if not tracker.check_limit(user_id):
351
- remaining = tracker.get_remaining(user_id)
352
- bot_message = f"⚠️ Daily request limit reached. You have used all 50 requests for today. Please try again tomorrow.\n\nThis limit helps us keep the service free and available for everyone."
353
- chat_history.append((message, bot_message))
354
- return "", chat_history
 
356
- # Get bot response
357
- bot_message = chat_response(message, chat_history, language, show_debug)
358
 
359
- # Update chat history
360
- chat_history.append((message, bot_message))
 
 
 
361
 
362
- return "", chat_history
363
 
364
- # Handle example selection
365
- def use_example(example):
366
- return example
367
 
368
- msg.submit(respond, [msg, chatbot, language, show_debug], [msg, chatbot])
369
- submit.click(respond, [msg, chatbot, language, show_debug], [msg, chatbot])
370
- examples_dropdown.change(use_example, [examples_dropdown], [msg])
371
 
372
  gr.Markdown(
373
  """
374
  ---
375
- **Note:** This app uses the OpenParlData MCP server to access Swiss parliamentary data.
376
- Currently returning mock data while the OpenParlData API is in development.
 
377
 
378
- **Rate Limit:** 50 requests per day per user to keep the service affordable and accessible.
379
 
380
  Powered by [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) via HF Inference Providers and [Model Context Protocol (MCP)](https://modelcontextprotocol.io/)
381
  """
 
1
  """
2
+ CoJournalist Data - Swiss Parliamentary Data & Statistics Chatbot
3
+ Powered by Llama-3.1-8B-Instruct with OpenParlData and BFS MCP
4
  """
5
 
6
  import os
7
  import json
8
+ import tempfile
9
+ from datetime import datetime
10
+ from pathlib import Path
11
  import gradio as gr
12
  from huggingface_hub import InferenceClient
13
  from dotenv import load_dotenv
14
+ from mcp_integration import execute_mcp_query, execute_mcp_query_bfs
15
  import asyncio
16
  from usage_tracker import UsageTracker
17
 
18
  # Load environment variables
19
  load_dotenv()
20
 
21
+ # Load system prompts from files
22
+ PROMPTS_DIR = Path(__file__).parent / "prompts"
23
+
24
+ def load_prompt(dataset_name: str) -> str:
25
+ """Load system prompt from file."""
26
+ prompt_file = PROMPTS_DIR / f"{dataset_name}.txt"
27
+ if not prompt_file.exists():
28
+ raise FileNotFoundError(f"Prompt file not found: {prompt_file}")
29
+ return prompt_file.read_text(encoding='utf-8')
30
+
31
+ # Load prompts at startup
32
+ PARLIAMENT_PROMPT = load_prompt("parliament")
33
+ BFS_PROMPT = load_prompt("bfs")
34
+
35
  # Initialize Hugging Face Inference Client
36
  HF_TOKEN = os.getenv("HF_TOKEN")
37
  if not HF_TOKEN:
 
39
 
40
  client = InferenceClient(token=HF_TOKEN)
41
 
42
+ class DatasetEngine:
43
+ """Dataset-specific orchestrator for LLM prompting and tool execution."""
44
+
45
+ def __init__(
46
+ self,
47
+ name: str,
48
+ display_name: str,
49
+ system_prompt: str,
50
+ routing_instruction: str,
51
+ allowed_tools: set[str],
52
+ ):
53
+ self.name = name
54
+ self.display_name = display_name
55
+ self.system_prompt = system_prompt
56
+ self.routing_instruction = routing_instruction
57
+ self.allowed_tools = allowed_tools
58
+
59
+ def build_messages(self, user_message: str, language_label: str, language_code: str) -> list[dict]:
60
+ """Construct chat completion messages with dataset-specific guardrails."""
61
+ routing_guardrails = (
62
+ f"TARGET_DATA_SOURCE: {self.display_name}\n"
63
+ f"{self.routing_instruction}\n"
64
+ 'If the request requires a different data source, respond with '
65
+ '{"response": "Explain that the other dataset should be selected in the app."}'
66
+ )
67
+ return [
68
+ {"role": "system", "content": self.system_prompt},
69
+ {"role": "system", "content": routing_guardrails},
70
+ {
71
+ "role": "user",
72
+ "content": (
73
+ f"Selected dataset: {self.display_name}\n"
74
+ f"Language preference: {language_label} ({language_code})\n"
75
+ f"Question: {user_message}"
76
+ ),
77
+ },
78
+ ]
79
+
80
+ @staticmethod
81
+ def _parse_model_response(raw_response: str) -> dict:
82
+ """Parse JSON (with cleanup) returned by the LLM."""
83
+ clean_response = raw_response.strip()
84
+ if clean_response.startswith("```json"):
85
+ clean_response = clean_response[7:]
86
+ if clean_response.startswith("```"):
87
+ clean_response = clean_response[3:]
88
+ if clean_response.endswith("```"):
89
+ clean_response = clean_response[:-3]
90
+ clean_response = clean_response.strip()
91
+
92
+ json_start_candidates = []
93
+ for ch in ("{", "["):
94
+ idx = clean_response.find(ch)
95
+ if idx != -1:
96
+ json_start_candidates.append(idx)
97
+ if json_start_candidates:
98
+ clean_response = clean_response[min(json_start_candidates):]
99
+
100
+ return json.loads(clean_response)
101
+
102
+ def query_model(self, user_message: str, language_label: str, language_code: str) -> dict:
103
+ """Call the LLM with dataset-constrained instructions."""
104
+ try:
105
+ messages = self.build_messages(user_message, language_label, language_code)
106
+ response = client.chat_completion(
107
+ model="meta-llama/Llama-3.1-8B-Instruct",
108
+ messages=messages,
109
+ max_tokens=500,
110
+ temperature=0.3,
111
+ )
112
+ assistant_message = response.choices[0].message.content
113
+ return self._parse_model_response(assistant_message)
114
+ except json.JSONDecodeError:
115
+ # Surface malformed responses to the user so they can retry.
116
+ return {"response": assistant_message}
117
+ except Exception as exc:
118
+ return {"error": f"Error querying model: {str(exc)}"}
119
+
120
+ def execute_tool(
121
+ self,
122
+ user_message: str,
123
+ tool_name: str,
124
+ arguments: dict,
125
+ show_debug: bool,
126
+ ) -> tuple[str, str | None]:
127
+ """Run the MCP tool for the dataset."""
128
+ raise NotImplementedError("execute_tool must be implemented by subclasses.")
129
+
130
+ def sanitize_arguments(self, tool_name: str, arguments: dict) -> dict:
131
+ """
132
+ Sanitize and validate tool arguments before execution.
133
+
134
+ Args:
135
+ tool_name: Name of the tool being called
136
+ arguments: Raw arguments from LLM
137
+
138
+ Returns:
139
+ Sanitized arguments dict with proper types and valid values
140
+ """
141
+ raise NotImplementedError("sanitize_arguments must be implemented by subclasses.")
142
+
143
+ def _compose_response_text(
144
+ self,
145
+ explanation: str,
146
+ debug_info: str | None,
147
+ show_debug: bool,
148
+ body: str,
149
+ ) -> str:
150
+ parts = []
151
+ if explanation:
152
+ parts.append(f"*{explanation}*")
153
+ if show_debug and debug_info:
154
+ parts.append(f"### 🔧 Debug Information\n{debug_info}\n\n---")
155
+ parts.append(body)
156
+ return "\n\n".join(parts)
157
+
158
+ def postprocess_tool_response(
159
+ self,
160
+ *,
161
+ response: str,
162
+ tool_name: str,
163
+ explanation: str,
164
+ debug_info: str | None,
165
+ show_debug: bool,
166
+ language_code: str,
167
+ ) -> tuple[str, str | None, dict, list]:
168
+ """Default dataset response handler."""
169
+ body = f"### 📊 Results\n{response}"
170
+ final_response = self._compose_response_text(explanation, debug_info, show_debug, body)
171
+ return final_response, None, {}, []
172
+
173
+ def respond(
174
+ self,
175
+ user_message: str,
176
+ language_label: str,
177
+ language_code: str,
178
+ show_debug: bool,
179
+ ) -> tuple[str, str | None, dict, list]:
180
+ """Entry point used by the Gradio handler."""
181
+ model_response = self.query_model(user_message, language_label, language_code)
182
+
183
+ if "response" in model_response:
184
+ return model_response["response"], None, {}, []
185
+
186
+ if "error" in model_response:
187
+ return f"❌ {model_response['error']}", None, {}, []
188
+
189
+ tool_name = model_response.get("tool")
190
+ arguments = model_response.get("arguments")
191
+
192
+ if not tool_name or not isinstance(arguments, dict):
193
+ return (
194
+ "I couldn't determine how to process your request. Please try rephrasing your question.",
195
+ None,
196
+ {},
197
+ [],
198
+ )
199
+
200
+ if tool_name not in self.allowed_tools:
201
+ allowed_list = ", ".join(sorted(self.allowed_tools))
202
+ warning = (
203
+ f"❌ Tool '{tool_name}' is not available for {self.display_name}. "
204
+ f"Allowed tools: {allowed_list}. Please adjust your request."
205
+ )
206
+ return warning, None, {}, []
207
+
208
+ if "language" not in arguments:
209
+ arguments["language"] = language_code
210
+
211
+ # Sanitize arguments before execution
212
+ arguments = self.sanitize_arguments(tool_name, arguments)
213
+ print(f"✅ [DatasetEngine] Sanitized arguments: {arguments}")
214
+
215
+ explanation = model_response.get("explanation", "")
216
+ response, debug_info = self.execute_tool(user_message, tool_name, arguments, show_debug)
217
+
218
+ return self.postprocess_tool_response(
219
+ response=response,
220
+ tool_name=tool_name,
221
+ explanation=explanation,
222
+ debug_info=debug_info,
223
+ show_debug=show_debug,
224
+ language_code=language_code,
225
+ )
226
+
227
+
228
+ class ParliamentEngine(DatasetEngine):
229
+ # Valid parameter names per tool
230
+ TOOL_PARAMS = {
231
+ "openparldata_search_parliamentarians": {
232
+ "query", "canton", "party", "active_only", "level", "language",
233
+ "limit", "offset", "response_format"
234
+ },
235
+ "openparldata_search_votes": {
236
+ "query", "date_from", "date_to", "parliament_id", "vote_type",
237
+ "level", "language", "limit", "offset", "response_format"
238
+ },
239
+ "openparldata_search_motions": {
240
+ "query", "submitter_id", "status", "date_from", "date_to",
241
+ "level", "language", "limit", "offset", "response_format"
242
+ },
243
+ "openparldata_search_debates": {
244
+ "query", "date_from", "date_to", "speaker_id", "topic",
245
+ "parliament_id", "level", "language", "limit", "offset", "response_format"
246
+ },
247
+ }
248
+
249
+ def __init__(self):
250
+ super().__init__(
251
+ name="parliament",
252
+ display_name="Swiss Parliament Data (OpenParlData)",
253
+ system_prompt=PARLIAMENT_PROMPT,
254
+ routing_instruction="Use only tools that begin with 'openparldata_'. Never mention BFS tools.",
255
+ allowed_tools={
256
+ "openparldata_search_parliamentarians",
257
+ "openparldata_search_votes",
258
+ "openparldata_search_motions",
259
+ "openparldata_search_debates",
260
+ },
261
+ )
262
+
263
+ def sanitize_arguments(self, tool_name: str, arguments: dict) -> dict:
264
+ """Sanitize arguments for OpenParlData tools."""
265
+ sanitized = {}
266
+ valid_params = self.TOOL_PARAMS.get(tool_name, set())
267
+
268
+ for key, value in arguments.items():
269
+ # Skip extra fields not in the tool schema
270
+ if key not in valid_params:
271
+ print(f"⚠️ [ParliamentEngine] Skipping invalid parameter '{key}' for {tool_name}")
272
+ continue
273
+
274
+ # Type conversions
275
+ if key == "limit":
276
+ # Convert to int and clamp to 1-100
277
+ try:
278
+ limit_val = int(value) if isinstance(value, str) else value
279
+ sanitized[key] = max(1, min(100, limit_val))
280
+ except (ValueError, TypeError):
281
+ sanitized[key] = 20 # Default
282
+ elif key == "offset":
283
+ # Convert to int and ensure >= 0
284
+ try:
285
+ offset_val = int(value) if isinstance(value, str) else value
286
+ sanitized[key] = max(0, offset_val)
287
+ except (ValueError, TypeError):
288
+ sanitized[key] = 0 # Default
289
+ elif key == "language":
290
+ # Validate language enum (case-insensitive)
291
+ lang_upper = str(value).upper()
292
+ if lang_upper in ["DE", "FR", "IT", "EN"]:
293
+ sanitized[key] = lang_upper.lower()
294
+ else:
295
+ sanitized[key] = "en" # Default to English
296
+ elif key == "active_only":
297
+ # Convert to bool
298
+ sanitized[key] = bool(value)
299
+ else:
300
+ # Keep other values as-is
301
+ sanitized[key] = value
302
+
303
+ return sanitized
304
+
305
+ def execute_tool(
306
+ self,
307
+ user_message: str,
308
+ tool_name: str,
309
+ arguments: dict,
310
+ show_debug: bool,
311
+ ) -> tuple[str, str | None]:
312
+ # DEBUG: Capture arguments before MCP call
313
+ print(f"\n🔍 [ParliamentEngine] execute_tool called:")
314
+ print(f" Tool: {tool_name}")
315
+ print(f" Arguments: {arguments}")
316
+ print(f" Argument types: {dict((k, type(v).__name__) for k, v in arguments.items())}")
317
+ return asyncio.run(execute_mcp_query(user_message, tool_name, arguments, show_debug))
318
+
319
+ def postprocess_tool_response(
320
+ self,
321
+ *,
322
+ response: str,
323
+ tool_name: str,
324
+ explanation: str,
325
+ debug_info: str | None,
326
+ show_debug: bool,
327
+ language_code: str,
328
+ ) -> tuple[str, str | None, dict, list]:
329
+ """Parse OpenParlData JSON responses and create card data."""
330
+ parliament_cards = []
331
+ language_fallback = False
332
+
333
+ # Try to parse JSON response
334
+ try:
335
+ data = json.loads(response)
336
+
337
+ # Check if it's an OpenParlData response with data array
338
+ if isinstance(data, dict) and "data" in data and isinstance(data["data"], list):
339
+ # Extract card info from each item
340
+ for item in data["data"]:
341
+ if isinstance(item, dict):
342
+ # Get title in user's preferred language with fallback
343
+ title = "Untitled"
344
+ title_dict = item.get("affair_title") if "affair_title" in item else item.get("title")
345
+
346
+ if isinstance(title_dict, dict):
347
+ # Try user's language first
348
+ if language_code == "en":
349
+ # English not available in API, fallback to German
350
+ title = title_dict.get("de") or title_dict.get("fr") or title_dict.get("it") or "Untitled"
351
+ if title != "Untitled":
352
+ language_fallback = True
353
+ else:
354
+ # Try user's language, fallback to de → fr → it
355
+ title = (title_dict.get(language_code) or
356
+ title_dict.get("de") or
357
+ title_dict.get("fr") or
358
+ title_dict.get("it") or
359
+ "Untitled")
360
+
361
+ # Get URL in user's preferred language
362
+ url = "#"
363
+ if "url_external" in item and isinstance(item["url_external"], dict):
364
+ if language_code == "en":
365
+ url = item["url_external"].get("de") or item["url_external"].get("fr") or item["url_external"].get("it") or "#"
366
+ else:
367
+ url = (item["url_external"].get(language_code) or
368
+ item["url_external"].get("de") or
369
+ item["url_external"].get("fr") or
370
+ item["url_external"].get("it") or
371
+ "#")
372
+
373
+ # Add date if available
374
+ date_str = ""
375
+ if "date" in item:
376
+ date_str = item["date"][:10] # Extract YYYY-MM-DD
377
+
378
+ parliament_cards.append({
379
+ "title": title,
380
+ "url": url,
381
+ "date": date_str
382
+ })
383
+
384
+ # If we have cards, show a summary message
385
+ if parliament_cards:
386
+ count = len(parliament_cards)
387
+ total = data.get("meta", {}).get("total_records", count)
388
+ body = f"### 🏛️ Parliament Results\n\nFound **{total}** result(s). Showing {count} items below:"
389
+
390
+ # Add language fallback notice for English users
391
+ if language_fallback and language_code == "en":
392
+ body += "\n\n*Note: English content is not available from the API. Results are displayed in German.*"
393
+ else:
394
+ body = "### 🏛️ Parliament Results\n\nNo results found for your query."
395
+ else:
396
+ # Not a data response, show as-is
397
+ body = f"### 📊 Results\n{response}"
398
+
399
+ except json.JSONDecodeError:
400
+ # Not JSON, treat as text response
401
+ body = f"### 📊 Results\n{response}"
402
+
403
+ final_response = self._compose_response_text(explanation, debug_info, show_debug, body)
404
+ return final_response, None, {}, parliament_cards
405
+
406
+
407
+ class BFSEngine(DatasetEngine):
408
+ # Valid parameter names per tool
409
+ TOOL_PARAMS = {
410
+ "bfs_search": {
411
+ "keywords", "language" # NO format parameter!
412
+ },
413
+ "bfs_query_data": {
414
+ "datacube_id", "filters", "format", "language"
415
+ },
416
+ }
417
+
418
+ def __init__(self):
419
+ super().__init__(
420
+ name="statistics",
421
+ display_name="Swiss Statistics (BFS)",
422
+ system_prompt=BFS_PROMPT,
423
+ routing_instruction="Use only tools that begin with 'bfs_'. Never mention OpenParlData tools.",
424
+ allowed_tools={
425
+ "bfs_search",
426
+ "bfs_query_data",
427
+ },
428
+ )
429
+
430
+ def sanitize_arguments(self, tool_name: str, arguments: dict) -> dict:
431
+ """Sanitize arguments for BFS tools."""
432
+ sanitized = {}
433
+ valid_params = self.TOOL_PARAMS.get(tool_name, set())
434
+
435
+ for key, value in arguments.items():
436
+ # Skip extra fields not in the tool schema
437
+ if key not in valid_params:
438
+ print(f"⚠️ [BFSEngine] Skipping invalid parameter '{key}' for {tool_name}")
439
+ continue
440
+
441
+ # Type conversions
442
+ if key == "language":
443
+ # Validate language enum (case-insensitive)
444
+ lang_upper = str(value).upper()
445
+ if lang_upper in ["DE", "FR", "IT", "EN"]:
446
+ sanitized[key] = lang_upper.lower()
447
+ else:
448
+ sanitized[key] = "en" # Default to English
449
+ elif key == "format":
450
+ # Validate and normalize format enum (only for bfs_query_data)
451
+ if tool_name == "bfs_query_data":
452
+ format_upper = str(value).upper().replace("-", "_")
453
+ # Map common values to DataFormat enum
454
+ format_map = {
455
+ "CSV": "csv",
456
+ "JSON": "json",
457
+ "JSON_STAT": "json-stat",
458
+ "JSON_STAT2": "json-stat2",
459
+ "PX": "px",
460
+ }
461
+ sanitized[key] = format_map.get(format_upper, "csv") # Default to CSV
462
+ else:
463
+ # Keep other values as-is
464
+ sanitized[key] = value
465
+
466
+ # Add default format for bfs_query_data if not present
467
+ if tool_name == "bfs_query_data" and "format" not in sanitized:
468
+ sanitized["format"] = "csv"
469
+
470
+ return sanitized
471
+
472
+ def execute_tool(
473
+ self,
474
+ user_message: str,
475
+ tool_name: str,
476
+ arguments: dict,
477
+ show_debug: bool,
478
+ ) -> tuple[str, str | None]:
479
+ # DEBUG: Capture arguments after sanitization
480
+ print(f"\n🔍 [BFSEngine] execute_tool called:")
481
+ print(f" Tool: {tool_name}")
482
+ print(f" Arguments (sanitized): {arguments}")
483
+ print(f" Argument types: {dict((k, type(v).__name__) for k, v in arguments.items())}")
484
+ return asyncio.run(execute_mcp_query_bfs(user_message, tool_name, arguments, show_debug))
485
+
486
+ @staticmethod
487
+ def _parse_datacube_choices(response: str) -> tuple[dict, list]:
488
+ datacube_map: dict[str, str] = {}
489
+ datacube_choices: list[str] = []
490
+ import re
491
+
492
+ lines = response.split('\n')
493
+ i = 0
494
+ while i < len(lines):
495
+ line = lines[i]
496
+ match = re.search(r'^\s*\d+\.\s+\*\*([^*]+)\*\*\s*$', line)
497
+ if match:
498
+ datacube_id = match.group(1).strip()
499
+ description = datacube_id
500
+ if i + 1 < len(lines):
501
+ next_line = lines[i + 1].strip()
502
+ if not next_line.startswith('↳') and next_line:
503
+ description = next_line
504
+ elif i + 2 < len(lines):
505
+ description = lines[i + 2].strip() or datacube_id
506
+ if len(description) > 80:
507
+ description = description[:77] + "..."
508
+ label = f"{description} ({datacube_id})"
509
+ datacube_choices.append(label)
510
+ datacube_map[label] = datacube_id
511
+ i += 1
512
+ return datacube_map, datacube_choices
513
+
514
+ @staticmethod
515
+ def _detect_csv(response: str) -> bool:
516
+ lines = response.strip().split('\n')
517
+ if len(lines) < 2:
518
+ return False
519
+ if ',' not in lines[0] or ',' not in lines[1]:
520
+ return False
521
+ prefix = response.lower()[:200]
522
+ error_tokens = ["error", "no data", "no datacubes found", "try broader"]
523
+ return not any(token in prefix for token in error_tokens)
524
+
525
+ def postprocess_tool_response(
526
+ self,
527
+ *,
528
+ response: str,
529
+ tool_name: str,
530
+ explanation: str,
531
+ debug_info: str | None,
532
+ show_debug: bool,
533
+ language_code: str,
534
+ ) -> tuple[str, str | None, dict, list]:
535
+ csv_file_path = None
536
+ datacube_map: dict[str, str] = {}
537
+ datacube_choices: list[str] = []
538
+ body = ""
539
+
540
+ if tool_name == "bfs_query_data" and self._detect_csv(response):
541
+ rows = response.count('\n')
542
+ timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
543
+ csv_filename = f"bfs_data_{timestamp}.csv"
544
+ csv_file_path = os.path.join(tempfile.gettempdir(), csv_filename)
545
+ with open(csv_file_path, 'w', encoding='utf-8') as f:
546
+ f.write(response)
547
+ body = (
548
+ "### 📊 Data Ready\n"
549
+ f"✅ CSV file generated with {rows} rows\n\n"
550
+ "💾 **Download your data using the button below**"
551
+ )
552
+ else:
553
+ if tool_name == "bfs_search" and "matching datacube" in response.lower():
554
+ datacube_map, datacube_choices = self._parse_datacube_choices(response)
555
+
556
+ # If we found datacubes, show a simple message instead of the full response
557
+ if datacube_choices:
558
+ # Extract the search term from explanation
559
+ import re
560
+ match = re.search(r'related to (.+)', explanation, re.IGNORECASE)
561
+ search_term = match.group(1).strip() if match else "your search"
562
+ body = f"### 📊 Available Datasets\n\nHere is the data available for **{search_term}**. Please select a dataset below to download:"
563
+ else:
564
+ # No datacubes found, show the full error message
565
+ body = f"### 📊 Results\n{response}"
566
+ else:
567
+ body = f"### 📊 Results\n{response}"
568
+
569
+ final_response = self._compose_response_text(explanation, debug_info, show_debug, body)
570
+ return final_response, csv_file_path, datacube_map, datacube_choices
571
+
572
+ def fetch_datacube_data(
573
+ self,
574
+ datacube_id: str,
575
+ language_code: str,
576
+ show_debug: bool,
577
+ ) -> tuple[str, str | None]:
578
+ response, debug_info = self.execute_tool(
579
+ user_message=f"Get data for datacube {datacube_id}",
580
+ tool_name="bfs_query_data",
581
+ arguments={"datacube_id": datacube_id, "language": language_code},
582
+ show_debug=show_debug,
583
+ )
584
+ if self._detect_csv(response):
585
+ rows = response.count('\n')
586
+ timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
587
+ csv_filename = f"bfs_data_{timestamp}.csv"
588
+ csv_file_path = os.path.join(tempfile.gettempdir(), csv_filename)
589
+ with open(csv_file_path, 'w', encoding='utf-8') as f:
590
+ f.write(response)
591
+ message = (
592
+ "### 📊 Data Ready\n"
593
+ f"✅ CSV file generated with {rows} rows for datacube: `{datacube_id}`\n\n"
594
+ "💾 **Download your data using the button below**"
595
+ )
596
+ if show_debug and debug_info:
597
+ message = f"### 🔧 Debug Information\n{debug_info}\n\n---\n\n{message}"
598
+ return message, csv_file_path
599
+ error_message = f"❌ Error retrieving data:\n\n{response}"
600
+ return error_message, None
601
+
602
+
603
+ DATASET_ENGINES: dict[str, DatasetEngine] = {
604
+ "parliament": ParliamentEngine(),
605
+ "statistics": BFSEngine(),
606
+ }
607
+
608
  # Initialize usage tracker with 50 requests per day limit
609
  tracker = UsageTracker(daily_limit=50)
610
 
 
616
  "Italiano": "it"
617
  }
618
 
619
+ # Example queries for OpenParlData
620
+ OPENPARLDATA_EXAMPLES = {
621
  "en": [
622
  "Who are the parliamentarians from Zurich?",
623
  "Show me recent votes about climate policy",
 
644
  ]
645
  }
646
 
647
+ # Example queries for BFS (two-step workflow)
648
+ BFS_EXAMPLES = {
649
+ "en": [
650
+ "I want inflation data",
651
+ "Show me population statistics",
652
+ "I need employment data by canton",
653
+ "Find energy consumption statistics"
654
+ ],
655
+ "de": [
656
+ "Ich möchte Inflationsdaten",
657
+ "Zeige mir Bevölkerungsstatistiken",
658
+ "Ich brauche Beschäftigungsdaten nach Kanton",
659
+ "Finde Energieverbrauchsstatistiken"
660
+ ],
661
+ "fr": [
662
+ "Je veux des données sur l'inflation",
663
+ "Montrez-moi les statistiques de population",
664
+ "J'ai besoin de données sur l'emploi par canton",
665
+ "Trouvez les statistiques de consommation d'énergie"
666
+ ],
667
+ "it": [
668
+ "Voglio dati sull'inflazione",
669
+ "Mostrami le statistiche sulla popolazione",
670
+ "Ho bisogno di dati sull'occupazione per cantone",
671
+ "Trova le statistiche sul consumo energetico"
672
+ ]
673
+ }
674
 
675
+ # Keep backward compatibility
676
+ EXAMPLES = OPENPARLDATA_EXAMPLES
677
+ def chat_response(message: str, history: list, language: str, show_debug: bool, dataset: str = "parliament") -> tuple[str, str | None, dict, list]:
678
  """
679
+ Main chat response function routed through dataset-specific engines.
680
  """
681
  try:
682
+ engine = DATASET_ENGINES.get(dataset)
683
+ if not engine:
684
+ return f"❌ Unknown dataset selected: {dataset}", None, {}, []
685
 
686
+ language_code = LANGUAGES.get(language, "en")
687
+ return engine.respond(message, language, language_code, show_debug)
688
 
689
  except Exception as e:
690
+ return f"❌ An error occurred: {str(e)}", None, {}, []
691
 
692
 
693
  # Custom CSS
 
699
  text-align: center;
700
  padding: 20px;
701
  background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
702
+ color: white !important;
703
  border-radius: 10px;
704
  margin-bottom: 20px;
705
  }
706
+ .chatbot-header h1 {
707
+ color: white !important;
708
+ margin: 0;
709
+ }
710
+ .chatbot-header p {
711
+ color: white !important;
712
+ margin: 10px 0 0 0;
713
+ }
714
  """
715
 
716
  # Build Gradio interface
717
+ with gr.Blocks(css=custom_css, title="CoJournalist Swiss Data") as demo:
718
+ # State to track datacube search results
719
+ datacube_state = gr.State({}) # Maps display text → datacube_id
720
+
721
+ # State to track parliament cards
722
+ parliament_cards_state = gr.State([]) # List of card dicts
723
+ parliament_page_state = gr.State(1) # Current page number
724
+
725
  gr.Markdown(
726
  """
727
  <div class="chatbot-header">
728
+ <h1>🇨🇭 CoJournalist Swiss Data</h1>
729
+ <p>Query Swiss parliamentary and statistical data in natural language</p>
730
  </div>
731
  """
732
  )
 
736
  chatbot = gr.Chatbot(
737
  height=500,
738
  label="Chat with CoJournalist",
739
+ show_label=False,
740
+ type="messages"
741
  )
742
 
743
+ # CSV download file component
744
+ download_file = gr.File(
745
+ label="📥 Download Data",
746
+ visible=False,
747
+ interactive=False
748
+ )
749
+
750
+ # Datacube selection (hidden by default, shown when search returns results)
751
+ with gr.Row(visible=False) as datacube_selection_row:
752
+ with gr.Column(scale=4):
753
+ datacube_radio = gr.Radio(
754
+ label="📋 Select Datacube for Download",
755
+ choices=[],
756
+ visible=True
757
+ )
758
+ with gr.Column(scale=1):
759
+ get_data_btn = gr.Button("📥 Get Data", variant="primary", size="lg")
760
+
761
+ # Parliament cards display (hidden by default, shown when parliament results return)
762
+ with gr.Column(visible=False) as parliament_cards_row:
763
+ parliament_cards_html = gr.HTML("")
764
+ with gr.Row():
765
+ prev_page_btn = gr.Button("◀ Previous", size="sm")
766
+ page_info = gr.Markdown("Page 1")
767
+ next_page_btn = gr.Button("Next ▶", size="sm")
768
+
769
  with gr.Row():
770
  msg = gr.Textbox(
771
+ placeholder="(Choose a source on the right first)",
772
  show_label=False,
773
  scale=4
774
  )
 
777
  with gr.Column(scale=1):
778
  gr.Markdown("### ⚙️ Settings")
779
 
780
+ dataset = gr.Radio(
781
+ choices=[
782
+ ("Swiss Parliament Data", "openparldata"),
783
+ ("Swiss Statistics (BFS)", "bfs")
784
+ ],
785
+ value="openparldata",
786
+ label="Data Source",
787
+ info="Choose which API to query"
788
+ )
789
+
790
  language = gr.Radio(
791
  choices=list(LANGUAGES.keys()),
792
  value="English",
 
794
  info="Select response language"
795
  )
796
 
797
+ def ensure_message_history(history):
798
+ """Normalize chat history to the format expected by gr.Chatbot(type='messages')."""
799
+ normalized: list[dict] = []
800
+ if not history:
801
+ return normalized
802
+
803
+ for entry in history:
804
+ if isinstance(entry, dict):
805
+ role = entry.get("role")
806
+ content = entry.get("content", "")
807
+ if role:
808
+ normalized.append({"role": role, "content": "" if content is None else str(content)})
809
+ elif isinstance(entry, (tuple, list)) and len(entry) == 2:
810
+ user, assistant = entry
811
+ if user is not None:
812
+ normalized.append({"role": "user", "content": str(user)})
813
+ if assistant is not None:
814
+ normalized.append({"role": "assistant", "content": str(assistant)})
815
+ return normalized
816
+
817
+ def append_message(history: list[dict], role: str, content: str | None):
818
+ """Append a message to the normalized history."""
819
+ history.append({"role": role, "content": "" if content is None else str(content)})
820
+
821
+ def render_parliament_cards(cards: list[dict], page: int, items_per_page: int = 10) -> tuple[str, str, int, bool]:
822
+ """Render parliament cards as HTML with pagination."""
823
+ if not cards:
824
+ return "", "No results", 1, False
825
+
826
+ total_pages = (len(cards) + items_per_page - 1) // items_per_page
827
+ page = max(1, min(page, total_pages)) # Clamp page to valid range
828
+ show_pagination = len(cards) > items_per_page
829
+
830
+ start_idx = (page - 1) * items_per_page
831
+ end_idx = min(start_idx + items_per_page, len(cards))
832
+ page_cards = cards[start_idx:end_idx]
833
+
834
+ # Generate HTML for cards
835
+ cards_html = '<div style="display: flex; flex-direction: column; gap: 15px;">'
836
+ for card in page_cards:
837
+ title = card.get("title", "Untitled")
838
+ url = card.get("url", "#")
839
+ date = card.get("date", "")
840
+
841
+ # Truncate title if too long
842
+ if len(title) > 120:
843
+ title = title[:117] + "..."
844
+
845
+ date_badge = f'<span style="background: #e0e0e0; padding: 4px 8px; border-radius: 4px; font-size: 12px; color: #666;">{date}</span>' if date else ''
846
+
847
+ cards_html += f'''
848
+ <a href="{url}" target="_blank" style="text-decoration: none;">
849
+ <div style="
850
+ border: 1px solid #ddd;
851
+ border-radius: 8px;
852
+ padding: 16px;
853
+ background: white;
854
+ transition: all 0.2s;
855
+ cursor: pointer;
856
+ ">
857
+ <div style="display: flex; justify-content: space-between; align-items: start; gap: 12px;">
858
+ <h3 style="margin: 0; color: #333; font-size: 16px; flex: 1;">{title}</h3>
859
+ {date_badge}
860
+ </div>
861
+ </div>
862
+ </a>
863
+ '''
864
+ cards_html += '</div>'
865
+
866
+ page_info = f"Page {page} of {total_pages} ({len(cards)} total results)"
867
+
868
+ return cards_html, page_info, page, show_pagination
869
 
870
+ # Handle message submission
871
+ def respond(message, chat_history, language, dataset_choice, current_datacube_state, current_parliament_cards, current_page, request: gr.Request):
872
+ show_debug = False # Debug mode disabled in UI
873
+ chat_messages = ensure_message_history(chat_history)
874
 
875
+ if not message.strip():
876
+ return "", chat_messages, None, gr.update(visible=False), current_datacube_state, gr.update(), gr.update(visible=False), current_parliament_cards, current_page, "", "", gr.update(visible=False), gr.update(), gr.update()
 
 
 
 
877
 
878
+ # Check usage limit
879
+ user_id = request.client.host if request and hasattr(request, 'client') else "unknown"
880
+
881
+ append_message(chat_messages, "user", message)
882
+
883
+ if not tracker.check_limit(user_id):
884
+ bot_message = (
885
+ "⚠️ Daily request limit reached. You have used all 50 requests for today. "
886
+ "Please try again tomorrow.\n\nThis limit helps us keep the service free and available for everyone."
887
  )
888
+ append_message(chat_messages, "assistant", bot_message)
889
+ return "", chat_messages, None, gr.update(visible=False), current_datacube_state, gr.update(), gr.update(visible=False), current_parliament_cards, current_page, "", "", gr.update(visible=False), gr.update(), gr.update()
890
+
891
+ # Map dataset choice to engine type
892
+ dataset_map = {
893
+ "openparldata": "parliament",
894
+ "bfs": "statistics"
895
+ }
896
+ dataset_type = dataset_map.get(dataset_choice, "parliament")
897
+
898
+ # Get bot response (returns tuple with optional CSV file and results data)
899
+ bot_message, csv_file, datacube_map, results_data = chat_response(
900
+ message, chat_messages, language, show_debug, dataset_type
901
+ )
902
 
903
+ append_message(chat_messages, "assistant", bot_message)
904
+
905
+ # Handle parliament cards (for Parliament dataset)
906
+ if dataset_type == "parliament" and results_data:
907
+ cards_html, page_info, page_num, show_pagination = render_parliament_cards(results_data, 1)
908
+ return (
909
+ "",
910
+ chat_messages,
911
+ None,
912
+ gr.update(visible=False),
913
+ current_datacube_state,
914
+ gr.update(),
915
+ gr.update(visible=False),
916
+ results_data, # parliament_cards_state
917
+ page_num, # parliament_page_state
918
+ cards_html, # parliament_cards_html
919
+ page_info, # page_info
920
+ gr.update(visible=True), # parliament_cards_row
921
+ gr.update(visible=show_pagination), # prev_page_btn
922
+ gr.update(visible=show_pagination) # next_page_btn
923
  )
924
 
925
+ # Handle datacube search results (for BFS dataset)
926
+ if dataset_type == "statistics" and results_data:
927
+ return (
928
+ "",
929
+ chat_messages,
930
+ None,
931
+ gr.update(visible=False),
932
+ datacube_map,
933
+ gr.update(choices=results_data, value=None),
934
+ gr.update(visible=True),
935
+ current_parliament_cards,
936
+ current_page,
937
+ "",
938
+ "",
939
+ gr.update(visible=False),
940
+ gr.update(),
941
+ gr.update()
942
+ )
943
+
944
+ # Handle CSV download
945
+ if csv_file:
946
+ return (
947
+ "",
948
+ chat_messages,
949
+ csv_file,
950
+ gr.update(visible=True),
951
+ current_datacube_state,
952
+ gr.update(),
953
+ gr.update(visible=False),
954
+ current_parliament_cards,
955
+ current_page,
956
+ "",
957
+ "",
958
+ gr.update(visible=False),
959
+ gr.update(),
960
+ gr.update()
961
+ )
962
+
963
+ return (
964
+ "",
965
+ chat_messages,
966
+ None,
967
+ gr.update(visible=False),
968
+ current_datacube_state,
969
+ gr.update(),
970
+ gr.update(visible=False),
971
+ current_parliament_cards,
972
+ current_page,
973
+ "",
974
+ "",
975
+ gr.update(visible=False),
976
+ gr.update(),
977
+ gr.update()
978
+ )
979
+
980
+ # Handle parliament pagination
981
+ def prev_page(cards, current_page):
982
+ """Go to previous page of parliament results."""
983
+ new_page = max(1, current_page - 1)
984
+ cards_html, page_info, page_num, show_pagination = render_parliament_cards(cards, new_page)
985
+ return cards_html, page_info, page_num
986
+
987
+ def next_page(cards, current_page):
988
+ """Go to next page of parliament results."""
989
+ if not cards:
990
+ return "", "No results", current_page
991
+ total_pages = (len(cards) + 9) // 10 # 10 items per page
992
+ new_page = min(total_pages, current_page + 1)
993
+ cards_html, page_info, page_num, show_pagination = render_parliament_cards(cards, new_page)
994
+ return cards_html, page_info, page_num
995
+
996
+ # Handle "Get Data" button click for datacube selection
997
+ def fetch_datacube_data(selected_choice, current_datacube_state, chat_history, language, request: gr.Request):
998
+ show_debug = False # Debug mode disabled in UI
999
+ chat_messages = ensure_message_history(chat_history)
1000
+ user_message = f"Get Data: {selected_choice}" if selected_choice else "Get Data"
1001
+ append_message(chat_messages, "user", user_message)
1002
+
1003
+ if not selected_choice or not current_datacube_state:
1004
+ error_msg = "⚠️ Please select a datacube first."
1005
+ append_message(chat_messages, "assistant", error_msg)
1006
+ return chat_messages, None, gr.update(visible=False), gr.update(visible=False)
1007
 
1008
  # Check usage limit
1009
  user_id = request.client.host if request and hasattr(request, 'client') else "unknown"
1010
 
1011
  if not tracker.check_limit(user_id):
1012
+ bot_message = (
1013
+ "⚠️ Daily request limit reached. You have used all 50 requests for today. "
1014
+ "Please try again tomorrow.\n\nThis limit helps us keep the service free and available for everyone."
1015
+ )
1016
+ append_message(chat_messages, "assistant", bot_message)
1017
+ return chat_messages, None, gr.update(visible=False), gr.update(visible=False)
1018
+
1019
+ # Get datacube ID from mapping
1020
+ datacube_id = current_datacube_state.get(selected_choice)
1021
+
1022
+ if not datacube_id:
1023
+ error_msg = "❌ Error: Could not find datacube ID for selected option."
1024
+ append_message(chat_messages, "assistant", error_msg)
1025
+ return chat_messages, None, gr.update(visible=False), gr.update(visible=False)
1026
 
1027
+ # Get language code
1028
+ lang_code = LANGUAGES.get(language, "en")
1029
 
1030
+ bfs_engine = DATASET_ENGINES.get("statistics")
1031
+ if not isinstance(bfs_engine, BFSEngine):
1032
+ error_msg = "❌ Error: BFS engine unavailable."
1033
+ append_message(chat_messages, "assistant", error_msg)
1034
+ return chat_messages, None, gr.update(visible=False), gr.update(visible=False)
1035
 
1036
+ bot_message, csv_file_path = bfs_engine.fetch_datacube_data(datacube_id, lang_code, show_debug)
1037
 
1038
+ append_message(chat_messages, "assistant", bot_message)
1039
+ if csv_file_path:
1040
+ return chat_messages, csv_file_path, gr.update(visible=True), gr.update(visible=False)
1041
 
1042
+ return chat_messages, None, gr.update(visible=False), gr.update(visible=False)
1043
+
1044
+ msg.submit(
1045
+ respond,
1046
+ [msg, chatbot, language, dataset, datacube_state, parliament_cards_state, parliament_page_state],
1047
+ [msg, chatbot, download_file, download_file, datacube_state, datacube_radio, datacube_selection_row,
1048
+ parliament_cards_state, parliament_page_state, parliament_cards_html, page_info, parliament_cards_row,
1049
+ prev_page_btn, next_page_btn]
1050
+ )
1051
+ submit.click(
1052
+ respond,
1053
+ [msg, chatbot, language, dataset, datacube_state, parliament_cards_state, parliament_page_state],
1054
+ [msg, chatbot, download_file, download_file, datacube_state, datacube_radio, datacube_selection_row,
1055
+ parliament_cards_state, parliament_page_state, parliament_cards_html, page_info, parliament_cards_row,
1056
+ prev_page_btn, next_page_btn]
1057
+ )
1058
+ get_data_btn.click(
1059
+ fetch_datacube_data,
1060
+ [datacube_radio, datacube_state, chatbot, language],
1061
+ [chatbot, download_file, download_file, datacube_selection_row]
1062
+ )
1063
+ prev_page_btn.click(
1064
+ prev_page,
1065
+ [parliament_cards_state, parliament_page_state],
1066
+ [parliament_cards_html, page_info, parliament_page_state]
1067
+ )
1068
+ next_page_btn.click(
1069
+ next_page,
1070
+ [parliament_cards_state, parliament_page_state],
1071
+ [parliament_cards_html, page_info, parliament_page_state]
1072
+ )
1073
 
1074
  gr.Markdown(
1075
  """
1076
  ---
1077
+ **Data Sources:**
1078
+ - **Swiss Parliament Data:** OpenParlData MCP server for parliamentary information
1079
+ - **Swiss Statistics (BFS):** Federal Statistical Office data via PxWeb API
1080
 
1081
+ **Rate Limit:** 50 requests per day per user (shared across both datasets) to keep the service affordable and accessible.
1082
 
1083
  Powered by [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) via HF Inference Providers and [Model Context Protocol (MCP)](https://modelcontextprotocol.io/)
1084
  """
mcp/openparldata_mcp.py CHANGED
@@ -599,5 +599,5 @@ async def search_debates(params: SearchDebatesInput) -> str:
599
 
600
  # Main execution
601
  if __name__ == "__main__":
602
- import asyncio
603
- asyncio.run(mcp.run())
 
599
 
600
  # Main execution
601
  if __name__ == "__main__":
602
+ # Run FastMCP server (synchronous, blocking call)
603
+ mcp.run()
mcp_bfs/MCP_USAGE.md ADDED
@@ -0,0 +1,120 @@
1
+ # Swiss BFS API MCP Server
2
+
3
+ ## Overview
4
+ This MCP server provides access to ALL data from the Swiss Federal Statistical Office (BFS), not just population data. The BFS maintains comprehensive statistics on:
5
+
6
+ - Population and Demographics
7
+ - Territory and Environment
8
+ - Work and Income
9
+ - National Economy
10
+ - Prices and Inflation
11
+ - Industry and Services
12
+ - Agriculture and Forestry
13
+ - Energy
14
+ - Construction and Housing
15
+ - Tourism
16
+ - Mobility and Transport
17
+ - Social Security
18
+ - Health
19
+ - Education and Science
20
+ - Crime and Criminal Justice
21
+
22
+ ## Installation
23
+
24
+ ```bash
25
+ pip install -r requirements.txt
26
+ ```
27
+
28
+ ## Usage
29
+
30
+ Run the MCP server:
31
+ ```bash
32
+ python bfs_mcp_server.py
33
+ ```
34
+
35
+ The server communicates via stdio and can be integrated with any MCP-compatible client.
36
+
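As a rough illustration, here is a client-side sketch that drives the server from the Python MCP SDK's stdio client. It assumes the standard `mcp` package API and that the server script is launched from its own directory (adjust the command and path for your setup):

```python
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # Launch the BFS server as a stdio subprocess (command/path is an assumption).
    server = StdioServerParameters(command="python", args=["bfs_mcp_server.py"])
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # bfs_search only takes keywords and language (see the tool list below).
            result = await session.call_tool("bfs_search", {"keywords": "inflation", "language": "en"})
            print(result)

asyncio.run(main())
```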
37
+ ## Available Tools
38
+
39
+ ### 1. `bfs_list_datacubes`
40
+ Browse available datacubes in the API hierarchy.
41
+ - `path`: Category path (e.g., "px-x-01" for population, "" for root)
42
+ - `language`: de/fr/it/en
43
+
44
+ ### 2. `bfs_get_metadata`
45
+ Get detailed metadata about a specific datacube including dimensions and available values.
46
+ - `datacube_id`: The datacube identifier (e.g., "px-x-0102030000_101")
47
+ - `language`: de/fr/it/en
48
+
49
+ ### 3. `bfs_query_data`
50
+ Query any BFS datacube with custom filters.
51
+ - `datacube_id`: The datacube identifier
52
+ - `filters`: Array of filter objects with `code`, `filter` type, and `values`
53
+ - `format`: Output format (csv/json/json-stat/json-stat2/px)
54
+ - `language`: de/fr/it/en
55
+
56
+ ### 4. `bfs_search`
57
+ Search for datacubes by topic keywords.
58
+ - `keywords`: Search terms (e.g., "inflation", "education", "health")
59
+ - `language`: de/fr/it/en
60
+
61
+ ### 5. `bfs_get_config`
62
+ Get API configuration and limits.
63
+ - `language`: de/fr/it/en
64
+
65
+ ## Example Usage Flow
66
+
67
+ 1. **Search for a topic:**
68
+ ```
69
+ bfs_search(keywords="inflation")
70
+ ```
71
+
72
+ 2. **Browse a category:**
73
+ ```
74
+ bfs_list_datacubes(path="px-x-05") # Price statistics
75
+ ```
76
+
77
+ 3. **Get metadata for a specific datacube:**
78
+ ```
79
+ bfs_get_metadata(datacube_id="px-x-0502010000_104")
80
+ ```
81
+
82
+ 4. **Query data with filters:**
83
+ ```
84
+ bfs_query_data(
85
+ datacube_id="px-x-0502010000_104",
86
+ filters=[
87
+ {"code": "Zeit", "filter": "top", "values": ["12"]}
88
+ ],
89
+ format="csv"
90
+ )
91
+ ```
92
+
93
+ ## Category Codes
94
+
95
+ Main statistical categories in the BFS system:
96
+ - `px-x-01`: Population
97
+ - `px-x-02`: Territory and Environment
98
+ - `px-x-03`: Work and Income
99
+ - `px-x-04`: National Economy
100
+ - `px-x-05`: Prices
101
+ - `px-x-06`: Industry and Services
102
+ - `px-x-07`: Agriculture and Forestry
103
+ - `px-x-08`: Energy
104
+ - `px-x-09`: Construction and Housing
105
+ - `px-x-10`: Tourism
106
+ - `px-x-11`: Mobility and Transport
107
+ - `px-x-13`: Social Security
108
+ - `px-x-14`: Health
109
+ - `px-x-15`: Education and Science
110
+ - `px-x-19`: Crime and Criminal Justice
111
+
112
+ ## Integration with LLM Clients
113
+
114
+ This MCP server is designed to work with any MCP-compatible LLM client. Natural language understanding is handled by the client LLM; the server provides structured access to Swiss federal statistics.
115
+
116
+ ## API Documentation
117
+
118
+ The underlying API is a PxWeb implementation (developed by Statistics Sweden).
119
+ - Base URL: https://www.pxweb.bfs.admin.ch/api/v1/{language}/
120
+ - Official BFS Website: https://www.bfs.admin.ch
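+
+ For a quick sanity check against the API itself (outside MCP), the pattern used in `mcp_bfs/test_bfs_api.py` reduces to a few lines; this minimal sketch only lists the root entries of the English catalogue:
+
+ ```python
+ import asyncio
+ import httpx
+
+ async def main():
+     async with httpx.AsyncClient(timeout=30.0) as client:
+         # Root of the English catalogue: returns a list of datacube entries
+         root = (await client.get("https://www.pxweb.bfs.admin.ch/api/v1/en")).json()
+         print(f"{len(root)} entries at root level")
+
+ asyncio.run(main())
+ ```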
mcp_bfs/bfs_mcp_server.py ADDED
@@ -0,0 +1,538 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ Swiss BFS API MCP Server
4
+ Provides broad access to Swiss Federal Statistical Office data via PxWeb API
5
+
6
+ Refactored to use FastMCP for consistency with OpenParlData server.
7
+ """
8
+
9
+ import asyncio
10
+ import json
11
+ import logging
12
+ from typing import Dict, List, Any, Optional
13
+ from enum import Enum
14
+ import httpx
15
+ from mcp.server.fastmcp import FastMCP
16
+ from pydantic import BaseModel, Field, ConfigDict
17
+
18
+ logging.basicConfig(level=logging.INFO)
19
+ logger = logging.getLogger(__name__)
20
+
21
+ # Initialize FastMCP server
22
+ mcp = FastMCP("swiss-bfs-api")
23
+
24
+ # API Configuration
25
+ BASE_URL = "https://www.pxweb.bfs.admin.ch/api/v1"
26
+
27
+ class Language(str, Enum):
28
+ DE = "de"
29
+ FR = "fr"
30
+ IT = "it"
31
+ EN = "en"
32
+
33
+ class DataFormat(str, Enum):
34
+ CSV = "csv"
35
+ JSON = "json"
36
+ JSON_STAT = "json-stat"
37
+ JSON_STAT2 = "json-stat2"
38
+ PX = "px"
39
+
40
+ class FilterType(str, Enum):
41
+ ALL = "all"
42
+ ITEM = "item"
43
+ TOP = "top"
44
+
45
+ # Datacube knowledge base: Maps keywords to known datacube IDs with descriptions
46
+ # This helps with semantic search since the API only returns cryptic IDs
47
+ DATACUBE_KNOWLEDGE_BASE = {
48
+ # Population & Demographics (px-x-01)
49
+ "population": [
50
+ ("px-x-0102010000_101", "Permanent resident population by canton"),
51
+ ("px-x-0102020000_101", "Population by age and sex"),
52
+ ("px-x-0102020202_106", "Population statistics and scenarios"),
53
+ ("px-x-0102020300_101", "Population growth and change"),
54
+ ],
55
+ "demographics": [
56
+ ("px-x-0102010000_101", "Permanent resident population by canton"),
57
+ ("px-x-0102020000_101", "Population by age and sex"),
58
+ ],
59
+ "birth": [
60
+ ("px-x-0102020000_101", "Birth rates and statistics"),
61
+ ],
62
+ "death": [
63
+ ("px-x-0102020000_101", "Mortality rates and statistics"),
64
+ ],
65
+
66
+ # Employment & Labor (px-x-03)
67
+ "employment": [
68
+ ("px-x-0301000000_103", "Employment by sector"),
69
+ ("px-x-0301000000_104", "Employment statistics"),
70
+ ],
71
+ "unemployment": [
72
+ ("px-x-0301000000_103", "Unemployment rates"),
73
+ ],
74
+ "labor": [
75
+ ("px-x-0301000000_103", "Labor market statistics"),
76
+ ],
77
+ "work": [
78
+ ("px-x-0301000000_103", "Employment and work statistics"),
79
+ ],
80
+
81
+ # Prices & Inflation (px-x-05)
82
+ "inflation": [
83
+ ("px-x-0502010000_101", "Consumer price index (CPI)"),
84
+ ],
85
+ "prices": [
86
+ ("px-x-0502010000_101", "Price statistics and indices"),
87
+ ],
88
+ "cost": [
89
+ ("px-x-0502010000_101", "Cost of living indices"),
90
+ ],
91
+
92
+ # Income & Consumption (px-x-20)
93
+ "income": [
94
+ ("px-x-2105000000_101", "Income distribution"),
95
+ ("px-x-2105000000_102", "Household income"),
96
+ ],
97
+ "wages": [
98
+ ("px-x-2105000000_101", "Wage statistics"),
99
+ ],
100
+ "salary": [
101
+ ("px-x-2105000000_101", "Salary and compensation"),
102
+ ],
103
+
104
+ # Education (px-x-15)
105
+ "education": [
106
+ ("px-x-1502010000_101", "Education statistics"),
107
+ ("px-x-1502010100_101", "Students and schools"),
108
+ ],
109
+ "students": [
110
+ ("px-x-1502010100_101", "Student enrollment"),
111
+ ],
112
+ "schools": [
113
+ ("px-x-1502010100_101", "School statistics"),
114
+ ],
115
+ "university": [
116
+ ("px-x-1502010100_101", "Higher education statistics"),
117
+ ],
118
+
119
+ # Health (px-x-14)
120
+ "health": [
121
+ ("px-x-1404010100_101", "Health statistics"),
122
+ ("px-x-1404050000_101", "Healthcare costs"),
123
+ ],
124
+ "hospital": [
125
+ ("px-x-1404010100_101", "Hospital statistics"),
126
+ ],
127
+ "medical": [
128
+ ("px-x-1404010100_101", "Medical care statistics"),
129
+ ],
130
+
131
+ # Energy (px-x-07)
132
+ "energy": [
133
+ ("px-x-0702000000_101", "Energy statistics"),
134
+ ],
135
+ "electricity": [
136
+ ("px-x-0702000000_101", "Electricity production and consumption"),
137
+ ],
138
+ "power": [
139
+ ("px-x-0702000000_101", "Power generation"),
140
+ ],
141
+
142
+ # Housing (px-x-09)
143
+ "housing": [
144
+ ("px-x-0902020100_104", "Housing statistics"),
145
+ ],
146
+ "rent": [
147
+ ("px-x-0902020100_104", "Rental prices"),
148
+ ],
149
+ "construction": [
150
+ ("px-x-0902020100_104", "Construction statistics"),
151
+ ],
152
+ }
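+
+ # Example lookup (illustration only): a query containing "inflation" matches the
+ # "inflation" key above and resolves to px-x-0502010000_101 ("Consumer price index (CPI)"),
+ # which bfs_search returns as a candidate datacube_id for bfs_query_data.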
153
+
154
+ # Global HTTP client
155
+ http_client: Optional[httpx.AsyncClient] = None
156
+
157
+ def get_client() -> httpx.AsyncClient:
158
+ """Get or create HTTP client."""
159
+ global http_client
160
+ if http_client is None:
161
+ http_client = httpx.AsyncClient(
162
+ timeout=60.0,
163
+ headers={
164
+ "User-Agent": "Mozilla/5.0 (compatible; BFS-MCP/1.0; +https://github.com/user/bfs-mcp)",
165
+ "Accept": "application/json",
166
+ "Accept-Language": "en,de,fr,it"
167
+ }
168
+ )
169
+ return http_client
170
+
171
+ # Pydantic models for input validation
172
+
173
+ class ListDatacubesInput(BaseModel):
174
+ """Input for listing BFS datacubes."""
175
+ model_config = ConfigDict(str_strip_whitespace=True, validate_assignment=True, extra='forbid')
176
+
177
+ path: str = Field("", description="Category path to explore (e.g., '' for root, 'px-x-01' for population)")
178
+ language: Language = Field(Language.EN, description="Response language")
179
+
180
+ class GetMetadataInput(BaseModel):
181
+ """Input for getting datacube metadata."""
182
+ model_config = ConfigDict(str_strip_whitespace=True, validate_assignment=True, extra='forbid')
183
+
184
+ datacube_id: str = Field(..., description="The BFS datacube identifier (e.g., px-x-0102030000_101)", min_length=1)
185
+ language: Language = Field(Language.EN, description="Response language")
186
+
187
+ class DimensionFilter(BaseModel):
188
+ """Filter for a single dimension."""
189
+ code: str = Field(..., description="Dimension code (e.g., 'Jahr', 'Region', 'Geschlecht')")
190
+ filter: FilterType = Field(..., description="Filter type")
191
+ values: List[str] = Field(..., description="Values to select")
192
+
193
+ class QueryDataInput(BaseModel):
194
+ """Input for querying BFS datacube data."""
195
+ model_config = ConfigDict(str_strip_whitespace=True, validate_assignment=True, extra='forbid')
196
+
197
+ datacube_id: str = Field(..., description="The BFS datacube identifier", min_length=1)
198
+ filters: List[DimensionFilter] = Field(default=[], description="Query filters for dimensions")
199
+ format: DataFormat = Field(DataFormat.CSV, description="Response format")
200
+ language: Language = Field(Language.EN, description="Response language")
201
+
202
+ class SearchDatacubesInput(BaseModel):
203
+ """Input for searching BFS datacubes."""
204
+ model_config = ConfigDict(str_strip_whitespace=True, validate_assignment=True, extra='forbid')
205
+
206
+ keywords: str = Field(..., description="Search keywords (e.g., 'inflation', 'employment', 'education', 'health')", min_length=1)
207
+ language: Language = Field(Language.EN, description="Response language")
208
+
209
+ class GetConfigInput(BaseModel):
210
+ """Input for getting API configuration."""
211
+ model_config = ConfigDict(str_strip_whitespace=True, validate_assignment=True, extra='forbid')
212
+
213
+ language: Language = Field(Language.EN, description="Response language")
214
+
215
+ # Tool implementations
216
+
217
+ @mcp.tool(
218
+ name="bfs_list_datacubes",
219
+ annotations={
220
+ "title": "List BFS Datacubes",
221
+ "readOnlyHint": True,
222
+ "destructiveHint": False,
223
+ "idempotentHint": True,
224
+ "openWorldHint": True
225
+ }
226
+ )
227
+ async def list_datacubes(params: ListDatacubesInput) -> str:
228
+ """
229
+ List available datacubes from a BFS category path.
230
+
231
+ Browse the Swiss Federal Statistical Office data catalog by category.
232
+ The BFS API has datacube IDs at the root level.
233
+
234
+ Examples:
235
+ - List all datacubes: path=""
236
+ - Get specific datacube: path="px-x-0102030000_101"
237
+ """
238
+ url = f"{BASE_URL}/{params.language.value}"
239
+ if params.path:
240
+ url += f"/{params.path}"
241
+
242
+ try:
243
+ client = get_client()
244
+ response = await client.get(url)
245
+ response.raise_for_status()
246
+ data = response.json()
247
+
248
+ result = f"Available datacubes (showing first 50):\n\n"
249
+
250
+ if isinstance(data, list):
251
+ # Limit to first 50 to avoid overwhelming response
252
+ for item in data[:50]:
253
+ if isinstance(item, dict):
254
+ dbid = item.get('dbid') or item.get('id', 'N/A')
255
+ text = item.get('text', 'N/A')
256
+ result += f"• **{dbid}**: {text}\n"
257
+ if item.get('type') == 't':
258
+ result += " ↳ Use bfs_query_data with this datacube_id\n"
259
+
260
+ if len(data) > 50:
261
+ result += f"\n... and {len(data) - 50} more datacubes\n"
262
+ else:
263
+ result += json.dumps(data, indent=2)
264
+
265
+ return result
266
+
267
+ except Exception as e:
268
+ logger.error(f"Error listing datacubes: {e}")
269
+ return f"Error listing datacubes: {str(e)}"
270
+
271
+ @mcp.tool(
272
+ name="bfs_get_metadata",
273
+ annotations={
274
+ "title": "Get BFS Datacube Metadata",
275
+ "readOnlyHint": True,
276
+ "destructiveHint": False,
277
+ "idempotentHint": True,
278
+ "openWorldHint": True
279
+ }
280
+ )
281
+ async def get_metadata(params: GetMetadataInput) -> str:
282
+ """
283
+ Get metadata about a BFS datacube including dimensions and available values.
284
+
285
+ Returns detailed information about a specific datacube including:
286
+ - Title and description
287
+ - Available dimensions (time, region, category, etc.)
288
+ - Possible values for each dimension
289
+ - Data structure information
290
+
291
+ Use this before querying data to understand what filters are available.
292
+ """
293
+ url = f"{BASE_URL}/{params.language.value}/{params.datacube_id}/{params.datacube_id}.px"
294
+
295
+ try:
296
+ client = get_client()
297
+ response = await client.get(url)
298
+ response.raise_for_status()
299
+ metadata = response.json()
300
+
301
+ result = f"Metadata for {params.datacube_id}:\n\n"
302
+
303
+ # Extract key information
304
+ if "title" in metadata:
305
+ result += f"Title: {metadata['title']}\n\n"
306
+
307
+ if "variables" in metadata:
308
+ result += "Available dimensions:\n"
309
+ for var in metadata["variables"]:
310
+ result += f"\n• {var.get('code', 'N/A')}: {var.get('text', 'N/A')}\n"
311
+ if "values" in var and len(var["values"]) <= 10:
312
+ result += f" Values: {', '.join(var['values'][:10])}\n"
313
+ elif "values" in var:
314
+ result += f" Values: {len(var['values'])} options available\n"
315
+
316
+ result += f"\n\nFull metadata:\n{json.dumps(metadata, indent=2)}"
317
+
318
+ return result
319
+
320
+ except Exception as e:
321
+ logger.error(f"Error fetching metadata: {e}")
322
+ return f"Error fetching metadata: {str(e)}"
323
+
324
+ @mcp.tool(
325
+ name="bfs_query_data",
326
+ annotations={
327
+ "title": "Query BFS Datacube Data",
328
+ "readOnlyHint": True,
329
+ "destructiveHint": False,
330
+ "idempotentHint": True,
331
+ "openWorldHint": True
332
+ }
333
+ )
334
+ async def query_data(params: QueryDataInput) -> str:
335
+ """
336
+ Query any BFS datacube with custom filters.
337
+
338
+ Retrieve actual statistical data from a datacube. You can filter by:
339
+ - Time periods (years, months, quarters)
340
+ - Geographic regions (cantons, municipalities)
341
+ - Categories (age groups, sectors, types, etc.)
342
+
343
+ Returns data in the specified format (CSV, JSON, JSON-stat).
344
+
345
+ Note: If no filters are provided, will attempt to return recent data.
346
+ """
347
+ url = f"{BASE_URL}/{params.language.value}/{params.datacube_id}/{params.datacube_id}.px"
348
+
349
+ # Build query
350
+ query = {
351
+ "query": [],
352
+ "response": {"format": params.format.value}
353
+ }
354
+
355
+ # Convert filters to query format
356
+ for f in params.filters:
357
+ query["query"].append({
358
+ "code": f.code,
359
+ "selection": {
360
+ "filter": f.filter.value,
361
+ "values": f.values
362
+ }
363
+ })
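+
+ # For reference, a single filter {"code": "Zeit", "filter": "top", "values": ["12"]}
+ # with format "csv" yields the PxWeb payload:
+ #   {"query": [{"code": "Zeit", "selection": {"filter": "top", "values": ["12"]}}],
+ #    "response": {"format": "csv"}}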
364
+
365
+ # If no filters, try to get recent/limited data
366
+ if not params.filters:
367
+ # Try to get metadata first to find a time dimension
368
+ try:
369
+ client = get_client()
370
+ meta_response = await client.get(url)
371
+ if meta_response.status_code == 200:
372
+ metadata = meta_response.json()
373
+ # Look for time-related dimension
374
+ for var in metadata.get("variables", []):
375
+ if var.get("code", "").lower() in ["jahr", "year", "zeit", "time", "periode"]:
376
+ query["query"] = [{
377
+ "code": var["code"],
378
+ "selection": {"filter": "top", "values": ["5"]}
379
+ }]
380
+ break
381
+ except Exception:  # metadata lookup is best-effort; fall back to an unfiltered query
382
+ pass
383
+
384
+ try:
385
+ client = get_client()
386
+ response = await client.post(url, json=query)
387
+ response.raise_for_status()
388
+
389
+ if params.format == DataFormat.CSV:
390
+ return response.text
391
+ else:
392
+ return json.dumps(response.json(), indent=2)
393
+
394
+ except httpx.HTTPStatusError as e:
395
+ error_msg = f"HTTP Error {e.response.status_code}: "
396
+ try:
397
+ error_detail = e.response.json()
398
+ error_msg += json.dumps(error_detail, indent=2)
399
+ except Exception:  # response body was not JSON; fall back to raw text
400
+ error_msg += e.response.text
401
+ logger.error(error_msg)
402
+ return error_msg
403
+ except Exception as e:
404
+ logger.error(f"Error querying data: {e}")
405
+ return f"Error querying data: {str(e)}"
406
+
407
+ @mcp.tool(
408
+ name="bfs_search",
409
+ annotations={
410
+ "title": "Search BFS Datacubes",
411
+ "readOnlyHint": True,
412
+ "destructiveHint": False,
413
+ "idempotentHint": True,
414
+ "openWorldHint": True
415
+ }
416
+ )
417
+ async def search_datacubes(params: SearchDatacubesInput) -> str:
418
+ """
419
+ Search for BFS datacubes by topic keywords using built-in knowledge base.
420
+
421
+ Find relevant datacubes for topics like:
422
+ - Population statistics
423
+ - Employment and unemployment
424
+ - Education and science
425
+ - Health statistics
426
+ - Economic indicators
427
+ - Inflation and prices
428
+ - Energy consumption
429
+ - Housing and construction
430
+
431
+ Returns matching datacubes with descriptions.
432
+ """
433
+ try:
434
+ # Search in knowledge base
435
+ keywords_lower = params.keywords.lower().strip()
436
+ matches = []
437
+
438
+ # Split search keywords and match against knowledge base
439
+ search_words = [w for w in keywords_lower.split() if len(w) > 2]
440
+
441
+ # Check each keyword in knowledge base
442
+ for keyword, datacubes in DATACUBE_KNOWLEDGE_BASE.items():
443
+ # Match if any search word appears in the knowledge base keyword
444
+ if any(word in keyword for word in search_words) or any(keyword in word for word in search_words):
445
+ for datacube_id, description in datacubes:
446
+ # Avoid duplicates
447
+ if not any(m['id'] == datacube_id for m in matches):
448
+ matches.append({
449
+ 'id': datacube_id,
450
+ 'text': description,
451
+ 'keyword': keyword
452
+ })
453
+
454
+ # Format results
455
+ result = f"Search results for '{params.keywords}':\n\n"
456
+
457
+ if matches:
458
+ result += f"Found {len(matches)} matching datacube(s):\n\n"
459
+ for i, match in enumerate(matches[:20], 1): # Limit to 20 results
460
+ result += f"{i}. **{match['id']}**\n"
461
+ result += f" {match['text']}\n"
462
+ result += f" ↳ To get data: Use bfs_query_data(datacube_id='{match['id']}')\n"
463
+ result += "\n"
464
+
465
+ if len(matches) > 20:
466
+ result += f"... and {len(matches) - 20} more results (showing first 20)\n"
467
+ else:
468
+ result += "No datacubes found matching your keywords.\n\n"
469
+ result += "Try these topics: population, employment, unemployment, health, inflation, "
470
+ result += "education, energy, housing, income, wages, prices, cost\n"
471
+
472
+ return result
473
+
474
+ except Exception as e:
475
+ logger.error(f"Error searching datacubes: {e}")
476
+ return f"Error searching datacubes: {str(e)}"
477
+
478
+ @mcp.tool(
479
+ name="bfs_get_config",
480
+ annotations={
481
+ "title": "Get BFS API Configuration",
482
+ "readOnlyHint": True,
483
+ "destructiveHint": False,
484
+ "idempotentHint": True,
485
+ "openWorldHint": True
486
+ }
487
+ )
488
+ async def get_config(params: GetConfigInput) -> str:
489
+ """
490
+ Get API configuration and limits.
491
+
492
+ Returns information about the BFS API including:
493
+ - API version
494
+ - Rate limits
495
+ - Data access restrictions
496
+ - Available features
497
+ """
498
+ url = f"{BASE_URL}/{params.language.value}/?config"
499
+
500
+ try:
501
+ client = get_client()
502
+ response = await client.get(url)
503
+ response.raise_for_status()
504
+ config = response.json()
505
+
506
+ result = "BFS API Configuration:\n\n"
507
+ result += json.dumps(config, indent=2)
508
+
509
+ return result
510
+
511
+ except Exception as e:
512
+ logger.error(f"Error fetching config: {e}")
513
+ return f"Error fetching config: {str(e)}"
514
+
515
+ # Cleanup function
516
+ async def cleanup():
517
+ """Cleanup resources on shutdown."""
518
+ global http_client
519
+ if http_client:
520
+ await http_client.aclose()
521
+ http_client = None
522
+
523
+ # Main execution
524
+ if __name__ == "__main__":
525
+ import atexit
526
+
527
+ # Register cleanup to run when server exits
528
+ def cleanup_sync():
529
+ import asyncio
530
+ try:
531
+ asyncio.run(cleanup())
532
+ except Exception:  # best-effort cleanup; ignore errors during shutdown
533
+ pass
534
+
535
+ atexit.register(cleanup_sync)
536
+
537
+ # Run FastMCP server (synchronous, blocking call)
538
+ mcp.run()
mcp_bfs/requirements.txt ADDED
@@ -0,0 +1,4 @@
1
+ # Swiss BFS MCP Server Requirements
2
+ mcp>=0.1.0
3
+ httpx>=0.24.0
4
+ python-json-logger>=2.0.0
mcp_bfs/test_bfs_api.py ADDED
@@ -0,0 +1,99 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ Test script for Swiss BFS MCP Server
4
+ Demonstrates direct API usage (not MCP protocol)
5
+ """
6
+
7
+ import asyncio
8
+ import httpx
9
+ import json
10
+
11
+ BASE_URL = "https://www.pxweb.bfs.admin.ch/api/v1"
12
+
13
+ async def test_api():
14
+ """Test the BFS API directly to verify functionality"""
15
+
16
+ headers = {
17
+ "User-Agent": "Mozilla/5.0 (compatible; BFS-Test/1.0)",
18
+ "Accept": "application/json",
19
+ "Accept-Language": "en,de,fr,it"
20
+ }
21
+
22
+ async with httpx.AsyncClient(timeout=30.0, headers=headers) as client:
23
+
24
+ print("=" * 60)
25
+ print("Swiss BFS API Test")
26
+ print("=" * 60)
27
+
28
+ # 1. Test getting root categories
29
+ print("\n1. Getting root categories...")
30
+ try:
31
+ response = await client.get(f"{BASE_URL}/en")
32
+ data = response.json()
33
+ print(f"Found {len(data)} main categories")
34
+ for item in data[:5]:
35
+ if isinstance(item, dict):
36
+ print(f" - {item.get('id', 'N/A')}: {item.get('text', 'N/A')}")
37
+ except Exception as e:
38
+ print(f"Error: {e}")
39
+
40
+ # 2. Test getting metadata for population datacube
41
+ print("\n2. Getting metadata for population datacube...")
42
+ datacube_id = "px-x-0102030000_101"
43
+ try:
44
+ response = await client.get(f"{BASE_URL}/en/{datacube_id}/{datacube_id}.px")
45
+ metadata = response.json()
46
+ print(f"Datacube: {metadata.get('title', 'N/A')}")
47
+ if "variables" in metadata:
48
+ print("Variables available:")
49
+ for var in metadata["variables"]:
50
+ print(f" - {var.get('code', 'N/A')}: {var.get('text', 'N/A')}")
51
+ except Exception as e:
52
+ print(f"Error: {e}")
53
+
54
+ # 3. Test querying recent data
55
+ print("\n3. Querying recent population data...")
56
+ query = {
57
+ "query": [
58
+ {
59
+ "code": "Jahr",
60
+ "selection": {
61
+ "filter": "top",
62
+ "values": ["3"]
63
+ }
64
+ }
65
+ ],
66
+ "response": {
67
+ "format": "json"
68
+ }
69
+ }
70
+
71
+ try:
72
+ response = await client.post(
73
+ f"{BASE_URL}/en/{datacube_id}/{datacube_id}.px",
74
+ json=query
75
+ )
76
+ data = response.json()
77
+ print("Successfully retrieved data")
78
+ print(f"Response keys: {list(data.keys())}")
79
+ except Exception as e:
80
+ print(f"Error: {e}")
81
+
82
+ # 4. Test browsing other categories
83
+ print("\n4. Browsing price statistics category...")
84
+ try:
85
+ response = await client.get(f"{BASE_URL}/en/px-x-05")
86
+ data = response.json()
87
+ print(f"Found {len(data)} items in price statistics")
88
+ for item in data[:3]:
89
+ if isinstance(item, dict):
90
+ print(f" - {item.get('id', 'N/A')}: {item.get('text', 'N/A')}")
91
+ except Exception as e:
92
+ print(f"Error: {e}")
93
+
94
+ print("\n" + "=" * 60)
95
+ print("Test completed")
96
+ print("=" * 60)
97
+
98
+ if __name__ == "__main__":
99
+ asyncio.run(test_api())
mcp_integration.py CHANGED
@@ -1,6 +1,6 @@
1
  """
2
- MCP Integration for OpenParlData
3
- Provides a wrapper for connecting to the OpenParlData MCP server
4
  and executing tools from the Gradio app.
5
  """
6
 
@@ -89,6 +89,109 @@ class OpenParlDataClient:
89
  # Wrap arguments in 'params' key as expected by MCP server
90
  tool_arguments = {"params": arguments}
91

92
  # Call the tool
93
  result = await self.session.call_tool(tool_name, arguments=tool_arguments)
94
 
@@ -248,7 +351,7 @@ async def execute_mcp_query(
248
  show_debug: bool = False
249
  ) -> tuple[str, Optional[str]]:
250
  """
251
- Execute any MCP tool query.
252
 
253
  Args:
254
  user_query: The original user question (for context)
@@ -274,3 +377,38 @@ async def execute_mcp_query(
274
 
275
  finally:
276
  await client.disconnect()

1
  """
2
+ MCP Integration for OpenParlData and BFS
3
+ Provides wrappers for connecting to the OpenParlData and BFS MCP servers
4
  and executing tools from the Gradio app.
5
  """
6
 
 
89
  # Wrap arguments in 'params' key as expected by MCP server
90
  tool_arguments = {"params": arguments}
91
 
92
+ # DEBUG: Log MCP payload before sending
93
+ print(f"\n📤 [OpenParlDataClient] Sending to MCP server:")
94
+ print(f" Tool: {tool_name}")
95
+ print(f" Wrapped payload: {tool_arguments}")
96
+ print(f" Payload types: {dict((k, type(v).__name__) for k, v in tool_arguments.items())}")
97
+
98
+ # Call the tool
99
+ result = await self.session.call_tool(tool_name, arguments=tool_arguments)
100
+
101
+ # Extract text content from result
102
+ if result.content:
103
+ # MCP returns list of content blocks
104
+ text_parts = []
105
+ for content in result.content:
106
+ if hasattr(content, 'text'):
107
+ text_parts.append(content.text)
108
+ elif isinstance(content, dict) and 'text' in content:
109
+ text_parts.append(content['text'])
110
+ return "\n".join(text_parts)
111
+
112
+ return "No response from tool"
113
+
114
+ def get_tool_info(self) -> List[Dict[str, Any]]:
115
+ """Get information about available tools."""
116
+ return self.available_tools
117
+
118
+
119
+ class BFSClient:
120
+ """Client for interacting with BFS MCP server."""
121
+
122
+ def __init__(self):
123
+ self.session: Optional[ClientSession] = None
124
+ self.available_tools: List[Dict[str, Any]] = []
125
+
126
+ async def connect(self):
127
+ """Connect to the MCP server."""
128
+ # Get the path to the BFS MCP server script
129
+ server_script = Path(__file__).parent / "mcp_bfs" / "bfs_mcp_server.py"
130
+
131
+ if not server_script.exists():
132
+ raise FileNotFoundError(f"BFS MCP server script not found at {server_script}")
133
+
134
+ # Server parameters for stdio connection
135
+ server_params = StdioServerParameters(
136
+ command=sys.executable, # Python interpreter
137
+ args=[str(server_script)],
138
+ env=None
139
+ )
140
+
141
+ # Create stdio client context
142
+ self.stdio_context = stdio_client(server_params)
143
+ read, write = await self.stdio_context.__aenter__()
144
+
145
+ # Create session
146
+ self.session = ClientSession(read, write)
147
+ await self.session.__aenter__()
148
+
149
+ # Initialize and get available tools
150
+ await self.session.initialize()
151
+
152
+ # List available tools
153
+ tools_result = await self.session.list_tools()
154
+ self.available_tools = [
155
+ {
156
+ "name": tool.name,
157
+ "description": tool.description,
158
+ "input_schema": tool.inputSchema
159
+ }
160
+ for tool in tools_result.tools
161
+ ]
162
+
163
+ return self.available_tools
164
+
165
+ async def disconnect(self):
166
+ """Disconnect from the MCP server."""
167
+ if self.session:
168
+ await self.session.__aexit__(None, None, None)
169
+ if hasattr(self, 'stdio_context'):
170
+ await self.stdio_context.__aexit__(None, None, None)
171
+
172
+ async def call_tool(self, tool_name: str, arguments: Dict[str, Any]) -> str:
173
+ """
174
+ Call an MCP tool with given arguments.
175
+
176
+ Args:
177
+ tool_name: Name of the tool to call
178
+ arguments: Dictionary of arguments for the tool
179
+
180
+ Returns:
181
+ Tool response as string
182
+ """
183
+ if not self.session:
184
+ raise RuntimeError("Not connected to BFS MCP server. Call connect() first.")
185
+
186
+ # Wrap arguments in 'params' key as expected by MCP server
187
+ tool_arguments = {"params": arguments}
188
+
189
+ # DEBUG: Log MCP payload before sending
190
+ print(f"\n📤 [BFSClient] Sending to MCP server:")
191
+ print(f" Tool: {tool_name}")
192
+ print(f" Wrapped payload: {tool_arguments}")
193
+ print(f" Payload types: {dict((k, type(v).__name__) for k, v in tool_arguments.items())}")
194
+
195
  # Call the tool
196
  result = await self.session.call_tool(tool_name, arguments=tool_arguments)
197
 
 
351
  show_debug: bool = False
352
  ) -> tuple[str, Optional[str]]:
353
  """
354
+ Execute any OpenParlData MCP tool query.
355
 
356
  Args:
357
  user_query: The original user question (for context)
 
377
 
378
  finally:
379
  await client.disconnect()
380
+
381
+
382
+ async def execute_mcp_query_bfs(
383
+ user_query: str,
384
+ tool_name: str,
385
+ arguments: Dict[str, Any],
386
+ show_debug: bool = False
387
+ ) -> tuple[str, Optional[str]]:
388
+ """
389
+ Execute any BFS MCP tool query.
390
+
391
+ Args:
392
+ user_query: The original user question (for context)
393
+ tool_name: Name of the BFS MCP tool to call
394
+ arguments: Arguments for the tool
395
+ show_debug: Whether to return debug information
396
+
397
+ Returns:
398
+ Tuple of (response_text, debug_info)
399
+ """
400
+ client = BFSClient()
401
+
402
+ try:
403
+ await client.connect()
404
+
405
+ debug_info = None
406
+ if show_debug:
407
+ debug_info = f"**User Query:** {user_query}\n\n**Tool:** {tool_name}\n**Arguments:** ```json\n{json.dumps(arguments, indent=2)}\n```"
408
+
409
+ response = await client.call_tool(tool_name, arguments)
410
+
411
+ return response, debug_info
412
+
413
+ finally:
414
+ await client.disconnect()
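+
+ # Usage sketch (assumes an async context; a synchronous caller could wrap this in asyncio.run):
+ #   response, debug = await execute_mcp_query_bfs(
+ #       user_query="Swiss inflation",
+ #       tool_name="bfs_search",
+ #       arguments={"keywords": "inflation", "language": "en"},
+ #   )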
prompts/bfs.txt ADDED
@@ -0,0 +1,34 @@
1
+ You help users query Swiss Federal Statistical Office data. Return ONLY valid JSON. No markdown or explanations.
2
+
3
+ Format:
4
+ {"tool": "tool_name", "arguments": {...}, "explanation": "brief text"}
5
+
6
+ OR for non-data questions:
7
+ {"response": "your answer"}
8
+
9
+ AVAILABLE TOOLS (Two-Step Workflow):
10
+
11
+ STEP 1 - DISCOVERY:
12
+ bfs_search
13
+ Params: keywords, language
14
+ Purpose: Search datacubes by keywords (inflation, population, employment, etc.)
15
+ Returns: List of matching datacubes with IDs and descriptions
16
+ NOTE: Does NOT accept "format" parameter!
17
+
18
+ STEP 2 - DATA RETRIEVAL:
19
+ bfs_query_data
20
+ Params: datacube_id, language, format (required), filters (optional list)
21
+ Purpose: Get actual data in specified format
22
+ Example: {"datacube_id": "px-x-0502010000_104", "format": "csv", "language": "en"}
23
+
24
+ PARAMETER CONSTRAINTS:
25
+ - language: lowercase "en", "de", "fr", or "it"
26
+ - format (bfs_query_data only): "csv", "json", "json-stat", "json-stat2", or "px"
27
+ - keywords: String describing what data to find
28
+ - datacube_id: Exact ID from bfs_search results
29
+ - ONLY use parameters listed for each tool. NO extra/undocumented parameters.
30
+
31
+ WORKFLOW:
32
+ 1. User asks "I want inflation data" → Use bfs_search with keywords="inflation"
33
+ 2. Present datacube options to user (keep descriptions concise, max 1-2 sentences per datacube)
34
+ 3. User confirms which datacube → Use bfs_query_data with exact datacube_id → CSV download
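+
+ EXAMPLE (step 1 call for "I want inflation data"):
+ {"tool": "bfs_search", "arguments": {"keywords": "inflation", "language": "en"}, "explanation": "Searching BFS datacubes for inflation statistics"}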
prompts/parliament.txt ADDED
@@ -0,0 +1,29 @@
1
+ You help users query Swiss parliamentary data. Return ONLY valid JSON. No markdown or explanations.
2
+
3
+ Format:
4
+ {"tool": "tool_name", "arguments": {...}, "explanation": "brief text"}
5
+
6
+ OR for non-data questions:
7
+ {"response": "your answer"}
8
+
9
+ AVAILABLE TOOLS:
10
+ 1. openparldata_search_parliamentarians
11
+ Params: query, canton (2-letter uppercase like 'ZH'), party, language, limit, offset, response_format
12
+
13
+ 2. openparldata_search_votes
14
+ Params: query, date_from (YYYY-MM-DD), date_to (YYYY-MM-DD), language, limit, offset, response_format
15
+
16
+ 3. openparldata_search_motions
17
+ Params: query, date_from (YYYY-MM-DD), date_to (YYYY-MM-DD), status, language, limit, offset, response_format
18
+
19
+ 4. openparldata_search_debates
20
+ Params: query, date_from (YYYY-MM-DD), date_to (YYYY-MM-DD), language, limit, offset, response_format
21
+
22
+ PARAMETER CONSTRAINTS:
23
+ - limit: Integer between 1-100 (default 20). NEVER exceed 100.
24
+ - language: lowercase "en", "de", "fr", or "it"
25
+ - offset: Integer >= 0 for pagination
26
+ - response_format: "json" or "markdown" (default "markdown")
27
+ - ONLY use parameters listed for each tool. NO extra/undocumented parameters.
28
+
29
+ Rules: Use YYYY-MM-DD dates. For "latest" or "most recent" requests, set date_from="2024-01-01" and leave date_to unset.
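+
+ EXAMPLE:
+ {"tool": "openparldata_search_motions", "arguments": {"query": "climate", "date_from": "2024-01-01", "language": "en", "limit": 20}, "explanation": "Searching recent motions about climate"}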
requirements.txt CHANGED
@@ -19,5 +19,8 @@ pydantic>=2.0.0
19
  # Async Support
20
  anyio>=3.0.0
21

22
  # Environment Variables
23
  python-dotenv>=1.0.0
 
19
  # Async Support
20
  anyio>=3.0.0
21
 
22
+ # Logging
23
+ python-json-logger>=2.0.0
24
+
25
  # Environment Variables
26
  python-dotenv>=1.0.0