# TraceMind MCP Server - Technical Architecture

This document provides a deep technical dive into the TraceMind MCP Server architecture, implementation details, and deployment configuration.

## Table of Contents

- [System Overview](#system-overview)
- [Project Structure](#project-structure)
- [Core Components](#core-components)
- [MCP Protocol Implementation](#mcp-protocol-implementation)
- [Gemini Integration](#gemini-integration)
- [Data Flow](#data-flow)
- [Deployment Architecture](#deployment-architecture)
- [Development Workflow](#development-workflow)
- [Performance Considerations](#performance-considerations)
- [Security](#security)

---
## System Overview

TraceMind MCP Server is a Gradio-based MCP (Model Context Protocol) server that provides AI-powered analysis tools for agent evaluation data. It serves as the backend intelligence layer for the TraceMind ecosystem.

### Technology Stack

| Component | Technology | Version | Purpose |
|-----------|-----------|---------|---------|
| **Framework** | Gradio | 6.x | Native MCP support with `@gr.mcp.*` decorators |
| **AI Model** | Google Gemini | 2.5 Flash Lite | AI-powered analysis and insights |
| **Data Source** | HuggingFace Datasets | Latest | Load evaluation datasets |
| **Protocol** | MCP | 1.0 | Model Context Protocol for tool exposure |
| **Transport** | SSE | - | Server-Sent Events for real-time communication |
| **Deployment** | Docker | - | HuggingFace Spaces containerized deployment |
| **Language** | Python | 3.10+ | Core implementation |

### Architecture Diagram
```
┌────────────────────────────────────────────────────────────────┐
│                     MCP Clients (External)                     │
│   - Claude Desktop                                             │
│   - VS Code (Continue, Cursor, Cline)                          │
│   - TraceMind-AI (Track 2)                                     │
└────────────────┬───────────────────────────────────────────────┘
                 │
                 │  MCP Protocol
                 │  (SSE Transport)
                 ▼
┌────────────────────────────────────────────────────────────────┐
│           TraceMind MCP Server (HuggingFace Spaces)            │
│                                                                │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │                   Gradio App (app.py)                   │  │
│  │   - MCP Server Endpoint (mcp_server=True)               │  │
│  │   - Testing UI (Gradio Blocks)                           │  │
│  │   - Configuration Management                             │  │
│  └─────────────┬───────────────────────────────────────────┘  │
│                │                                               │
│                ▼                                               │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │                MCP Tools (mcp_tools.py)                  │  │
│  │   - 11 Tools (@gr.mcp.tool())                            │  │
│  │   - 3 Resources (@gr.mcp.resource())                     │  │
│  │   - 3 Prompts (@gr.mcp.prompt())                         │  │
│  └─────────────┬───────────────────────────────────────────┘  │
│                │                                               │
│                ▼                                               │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │             Gemini Client (gemini_client.py)             │  │
│  │   - API Authentication                                   │  │
│  │   - Prompt Engineering                                   │  │
│  │   - Response Parsing                                     │  │
│  └─────────────┬───────────────────────────────────────────┘  │
│                │                                               │
└────────────────┼───────────────────────────────────────────────┘
                 │
                 ▼
        ┌──────────────────┐
        │  External APIs   │
        │  - Gemini API    │
        │  - HF Datasets   │
        └──────────────────┘
```
---

## Project Structure

```
TraceMind-mcp-server/
├── app.py               # Main entry point, Gradio UI
├── mcp_tools.py         # MCP tool implementations (11 tools + 3 resources + 3 prompts)
├── gemini_client.py     # Google Gemini API client
├── requirements.txt     # Python dependencies
├── Dockerfile           # Container configuration
├── .env.example         # Environment variable template
├── .gitignore           # Git ignore rules
├── README.md            # Project documentation
└── DOCUMENTATION.md     # Complete API reference

Total: 8 files (excluding docs)
Lines of Code: ~3,500 lines (breakdown below)
```

### File Sizes

| File | Lines | Purpose |
|------|-------|---------|
| `app.py` | ~1,200 | Gradio UI + MCP server setup + testing interface |
| `mcp_tools.py` | ~2,100 | All 17 MCP components (tools, resources, prompts) |
| `gemini_client.py` | ~200 | Gemini API integration |
| `requirements.txt` | ~20 | Dependencies |
| `Dockerfile` | ~30 | Deployment configuration |

---
## Core Components

### 1. app.py - Main Application

**Purpose**: Entry point for HuggingFace Spaces deployment; provides both the MCP server and a testing UI.

**Key Responsibilities**:
- Initialize the Gradio app with `mcp_server=True`
- Create a testing interface for all MCP tools
- Handle configuration (API keys, settings)
- Manage client connections

**Architecture**:

```python
# app.py structure
import gradio as gr
from gemini_client import GeminiClient
from mcp_tools import *  # All tool implementations

# 1. Initialize Gemini client (with fallback)
default_gemini_client = GeminiClient()

# 2. Create Gradio UI for testing
def create_gradio_ui():
    with gr.Blocks() as demo:
        # Settings tab for API key configuration
        # Tab for each MCP tool (11 tabs)
        # Tab for testing resources
        # Tab for testing prompts
        # API documentation tab
        ...
    return demo

# 3. Launch with MCP server enabled
if __name__ == "__main__":
    demo = create_gradio_ui()
    demo.launch(
        mcp_server=True,  # ← Enables the MCP endpoint
        share=False,
        server_name="0.0.0.0",
        server_port=7860
    )
```
**MCP Enablement**:
- `mcp_server=True` in `demo.launch()` automatically:
  - Exposes the `/gradio_api/mcp/sse` endpoint
  - Discovers all `@gr.mcp.tool()`, `@gr.mcp.resource()`, and `@gr.mcp.prompt()` decorated functions
  - Generates MCP tool schemas from function signatures and docstrings
  - Handles MCP protocol communication (SSE transport)

**Testing Interface**:
- **Settings Tab**: Configure Gemini API key and HF token
- **Tool Tabs** (11): One tab per tool for manual testing
  - Input fields for all parameters
  - Submit button
  - Output display (Markdown or JSON)
- **Resources Tab**: Test resource URIs
- **Prompts Tab**: Test prompt templates
- **API Documentation Tab**: Generated from tool docstrings

---
### 2. mcp_tools.py - MCP Components

**Purpose**: Implements all 17 MCP components (11 tools + 3 resources + 3 prompts).

**Structure**:

```python
# mcp_tools.py structure
import gradio as gr
from gemini_client import GeminiClient
from datasets import load_dataset

# ============ TOOLS (11) ============

@gr.mcp.tool()
async def analyze_leaderboard(...) -> str:
    """Tool docstring (becomes MCP description)"""
    # 1. Load data from HuggingFace
    # 2. Process/filter data
    # 3. Call Gemini for AI analysis
    # 4. Return formatted response
    pass

@gr.mcp.tool()
async def debug_trace(...) -> str:
    """Debug traces with AI assistance"""
    pass

# ... (9 more tools)

# ============ RESOURCES (3) ============

@gr.mcp.resource()
def get_leaderboard_data(uri: str) -> str:
    """URI: leaderboard://{repo}"""
    # Parse URI
    # Load dataset
    # Return raw JSON
    pass

@gr.mcp.resource()
def get_trace_data(uri: str) -> str:
    """URI: trace://{trace_id}/{repo}"""
    pass

@gr.mcp.resource()
def get_cost_data(uri: str) -> str:
    """URI: cost://model/{model_name}"""
    pass

# ============ PROMPTS (3) ============

@gr.mcp.prompt()
def analysis_prompt(analysis_type: str, ...) -> str:
    """Generate analysis prompt templates"""
    pass

@gr.mcp.prompt()
def debug_prompt(debug_type: str, ...) -> str:
    """Generate debug prompt templates"""
    pass

@gr.mcp.prompt()
def optimization_prompt(optimization_goal: str, ...) -> str:
    """Generate optimization prompt templates"""
    pass
```
**Design Patterns**:

1. **Decorator-Based Registration**:

```python
@gr.mcp.tool()  # Gradio automatically registers this as an MCP tool
async def tool_name(...) -> str:
    """Docstring becomes the tool description in the MCP schema"""
    pass
```

2. **Structured Docstrings**:

```python
"""
Brief one-line description.

Longer detailed description explaining purpose and behavior.

Args:
    param1 (type): Description of param1
    param2 (type): Description of param2. Default: value

Returns:
    type: Description of return value
"""
```

Gradio parses this to generate the MCP tool schema automatically.

3. **Error Handling**:

```python
try:
    # Tool implementation
    return result
except Exception as e:
    return f"❌ **Error**: {str(e)}"
```

All errors are returned as user-friendly strings.

4. **Async/Await**:

All tools are `async` for efficient I/O operations (API calls, dataset loading).

---
### 3. gemini_client.py - AI Integration

**Purpose**: Handles all interactions with the Google Gemini 2.5 Flash Lite API.

**Key Features**:
- API authentication
- Prompt engineering for different analysis types
- Response parsing and formatting
- Error handling and retries
- Token optimization

**Class Structure**:

```python
import json
from typing import Dict, Optional

import google.generativeai as genai


class GeminiClient:
    def __init__(self, api_key: str, model_name: str):
        """Initialize with API key and model"""
        self.api_key = api_key
        self.model = genai.GenerativeModel(model_name)
        self.generation_config = {
            "temperature": 0.7,
            "top_p": 0.95,
            "max_output_tokens": 4096,  # Optimized for HF Spaces
        }
        self.request_timeout = 30  # 30s timeout

    async def analyze_with_context(
        self,
        data: Dict,
        analysis_type: str,
        specific_question: Optional[str] = None
    ) -> str:
        """
        Core analysis method used by all AI-powered tools.

        Args:
            data: Data to analyze (dict or JSON)
            analysis_type: "leaderboard", "trace", "cost_estimate", "comparison", "results"
            specific_question: Optional specific question

        Returns:
            Markdown-formatted analysis
        """
        # 1. Build system prompt based on analysis_type
        system_prompt = self._get_system_prompt(analysis_type)

        # 2. Format data for context
        data_str = json.dumps(data, indent=2)

        # 3. Build user prompt
        user_prompt = f"{system_prompt}\n\nData:\n{data_str}"
        if specific_question:
            user_prompt += f"\n\nSpecific Question: {specific_question}"

        # 4. Call Gemini API
        response = await self.model.generate_content_async(
            user_prompt,
            generation_config=self.generation_config,
            request_options={"timeout": self.request_timeout}
        )

        # 5. Extract and return text
        return response.text

    def _get_system_prompt(self, analysis_type: str) -> str:
        """Get the specialized system prompt for each analysis type"""
        prompts = {
            "leaderboard": """You are an expert AI agent performance analyst.
Analyze evaluation leaderboard data and provide:
- Top performers by key metrics
- Trade-off analysis (cost vs accuracy)
- Trend identification
- Actionable recommendations
Format: Markdown with clear sections.""",

            "trace": """You are an expert at debugging AI agent executions.
Analyze OpenTelemetry trace data and:
- Answer specific questions about execution
- Identify performance bottlenecks
- Explain the reasoning chain
- Provide optimization suggestions
Format: Clear, concise explanation.""",

            "cost_estimate": """You are a cost optimization expert.
Analyze cost estimation data and provide:
- Detailed cost breakdown
- Hardware recommendations
- Cost optimization opportunities
- ROI analysis
Format: Structured breakdown with recommendations.""",

            # ... more prompts for other analysis types
        }
        return prompts.get(analysis_type, prompts["leaderboard"])
```
**Optimization Strategies**:
- **Token Reduction**: `max_output_tokens: 4096` (reduced from 8192) for faster responses
- **Request Timeout**: 30s timeout for HF Spaces compatibility
- **Temperature**: 0.7 for balanced creativity and consistency
- **Model Selection**: `gemini-2.5-flash-lite` for speed (can switch to `gemini-2.5-flash` for quality), as sketched below
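
A minimal sketch of making that switch configurable; the `GEMINI_MODEL` environment variable is an illustrative assumption, not part of the current code:

```python
import os
import google.generativeai as genai

# Hypothetical GEMINI_MODEL override; defaults to the faster flash-lite variant.
MODEL_NAME = os.getenv("GEMINI_MODEL", "gemini-2.5-flash-lite")

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel(
    MODEL_NAME,
    generation_config={
        "temperature": 0.7,        # balanced creativity and consistency
        "top_p": 0.95,
        "max_output_tokens": 4096, # keeps responses fast on HF Spaces
    },
)
```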
---

## MCP Protocol Implementation

### How Gradio's Native MCP Support Works

Gradio 6+ provides native MCP server capabilities through decorators and automatic schema generation.

**1. Tool Registration**:

```python
@gr.mcp.tool()  # ← This decorator tells Gradio to expose this function as an MCP tool
async def my_tool(param1: str, param2: int = 10) -> str:
    """
    Brief description (used in MCP tool schema).

    Args:
        param1 (str): Description of param1
        param2 (int): Description of param2. Default: 10

    Returns:
        str: Description of return value
    """
    return f"Result: {param1}, {param2}"
```

**What Gradio does automatically**:
- Parses the function signature to extract parameter names and types
- Parses the docstring to extract descriptions
- Generates the MCP tool schema:

```json
{
  "name": "my_tool",
  "description": "Brief description (used in MCP tool schema).",
  "inputSchema": {
    "type": "object",
    "properties": {
      "param1": {
        "type": "string",
        "description": "Description of param1"
      },
      "param2": {
        "type": "integer",
        "default": 10,
        "description": "Description of param2. Default: 10"
      }
    },
    "required": ["param1"]
  }
}
```
**2. Resource Registration**:

```python
@gr.mcp.resource()
def get_resource(uri: str) -> str:
    """
    Resource description.

    Args:
        uri (str): Resource URI (e.g., "leaderboard://repo/name")

    Returns:
        str: JSON data
    """
    # Parse URI
    # Load data
    # Return JSON string
    pass
```

**3. Prompt Registration**:

```python
@gr.mcp.prompt()
def generate_prompt(prompt_type: str, context: str) -> str:
    """
    Generate reusable prompt templates.

    Args:
        prompt_type (str): Type of prompt
        context (str): Context for prompt generation

    Returns:
        str: Generated prompt text
    """
    return f"Prompt template for {prompt_type} with {context}"
```
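
For completeness, here is a hedged sketch of how an MCP client could exercise resources and prompts, using the MCP Python SDK's SSE client; the resource URI and prompt name mirror the examples in this document and may differ from the deployed server:

```python
from mcp import ClientSession
from mcp.client.sse import sse_client

async def explore_server(url: str) -> None:
    # Open an SSE connection to the Gradio MCP endpoint
    async with sse_client(url) as (read_stream, write_stream):
        async with ClientSession(read_stream, write_stream) as session:
            await session.initialize()

            # Discover what the server exposes
            print(await session.list_resources())
            print(await session.list_prompts())

            # Read a resource by URI (example URI from this document)
            data = await session.read_resource(
                "leaderboard://kshitijthakkar/smoltrace-leaderboard"
            )

            # Render a prompt template with arguments (assumed prompt name)
            prompt = await session.get_prompt(
                "analysis_prompt", arguments={"analysis_type": "leaderboard"}
            )
```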
### MCP Endpoint URLs

When `demo.launch(mcp_server=True)` is called:

**SSE Endpoint** (Primary):
```
https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/sse
```

**Streamable HTTP Endpoint** (Alternative):
```
https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/
```

### Client Configuration

**Claude Desktop** (`claude_desktop_config.json`):

```json
{
  "mcpServers": {
    "tracemind": {
      "url": "https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/sse",
      "transport": "sse"
    }
  }
}
```
**Python MCP Client**:

```python
# Using the MCP Python SDK's SSE client (run inside an async function)
from mcp import ClientSession
from mcp.client.sse import sse_client

async with sse_client(
    "https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/sse"
) as (read_stream, write_stream):
    async with ClientSession(read_stream, write_stream) as session:
        await session.initialize()

        # List tools
        tools = await session.list_tools()

        # Call a tool
        result = await session.call_tool("analyze_leaderboard", arguments={
            "metric_focus": "cost",
            "top_n": 5
        })
```
---

## Gemini Integration

### API Configuration

**Environment Variable**:
```bash
GEMINI_API_KEY=your_api_key_here
```

**Initialization**:
```python
import os

import google.generativeai as genai

genai.configure(api_key=os.getenv("GEMINI_API_KEY"))
model = genai.GenerativeModel("gemini-2.5-flash-lite")
```

### Prompt Engineering Strategy

**1. System Prompts by Analysis Type**:

Each analysis type (leaderboard, trace, cost, comparison, results) has a specialized system prompt that:
- Defines the AI's role and expertise
- Specifies the output format (markdown, structured sections)
- Lists key insights to include
- Sets the tone (professional, concise, actionable)

**2. Context Injection**:

```python
user_prompt = f"""
{system_prompt}

Data to Analyze:
{json.dumps(data, indent=2)}

Specific Question: {question}
"""
```

**3. Output Formatting**:
- All responses in Markdown
- Clear sections: Top Performers, Key Insights, Trade-offs, Recommendations
- Bullet points for readability
- Code blocks for technical details

### Rate Limiting & Error Handling

**Rate Limits** (Gemini 2.5 Flash Lite free tier):
- 1,500 requests per day
- 1 request per second

**Error Handling Strategy**:

```python
import google.api_core.exceptions

try:
    response = await model.generate_content_async(...)
    return response.text
except google.api_core.exceptions.ResourceExhausted:
    return "❌ **Rate limit exceeded**. Please try again in a few seconds."
except google.api_core.exceptions.DeadlineExceeded:
    return "❌ **Request timeout**. The analysis is taking too long. Try with less data."
except Exception as e:
    return f"❌ **Error**: {str(e)}"
```
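
Beyond returning friendly messages, transient rate-limit errors can be retried with exponential backoff. A minimal sketch; the `generate_with_retry` helper is illustrative and not part of the current client:

```python
import asyncio

import google.api_core.exceptions

async def generate_with_retry(model, prompt, max_attempts: int = 3) -> str:
    """Retry Gemini calls on rate-limit errors with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            response = await model.generate_content_async(prompt)
            return response.text
        except google.api_core.exceptions.ResourceExhausted:
            if attempt == max_attempts - 1:
                return "❌ **Rate limit exceeded**. Please try again later."
            await asyncio.sleep(2 ** attempt)  # wait 1s, 2s, 4s, ...
```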
---

## Data Flow

### Tool Execution Flow

```
1. MCP Client (e.g., Claude Desktop, TraceMind-AI)
   └── Calls: analyze_leaderboard(metric_focus="cost", top_n=5)

2. Gradio MCP Server (app.py)
   └── Routes to: analyze_leaderboard() in mcp_tools.py

3. MCP Tool Function (mcp_tools.py)
   ├── Load data from HuggingFace Datasets
   │   └── ds = load_dataset("kshitijthakkar/smoltrace-leaderboard")
   │
   ├── Process/filter data
   │   └── Filter by time range, sort by metric
   │
   ├── Call Gemini Client
   │   └── gemini_client.analyze_with_context(data, "leaderboard")
   │
   └── Return formatted response

4. Gemini Client (gemini_client.py)
   ├── Build system prompt
   ├── Format data as JSON
   ├── Call Gemini API
   │   └── model.generate_content_async(prompt)
   └── Return AI-generated analysis

5. Response Path (back through the stack)
   └── Gemini → gemini_client → mcp_tool → Gradio → MCP Client

6. MCP Client (displays result to user)
   └── Shows markdown-formatted analysis
```
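
A condensed sketch of step 3 as code. The module-level client, the row field names, and the sorting logic are assumptions for illustration; the real `analyze_leaderboard` in `mcp_tools.py` may differ:

```python
import gradio as gr
from datasets import load_dataset
from gemini_client import GeminiClient

gemini_client = GeminiClient()  # assumed module-level client, as in app.py

@gr.mcp.tool()
async def analyze_leaderboard(metric_focus: str = "cost", top_n: int = 5) -> str:
    """Analyze leaderboard data with AI insights (condensed sketch)."""
    try:
        # 1. Load data from HuggingFace Datasets
        ds = load_dataset("kshitijthakkar/smoltrace-leaderboard", split="train")

        # 2. Process/filter: keep only the top N rows for the chosen metric
        rows = sorted(ds.to_list(), key=lambda r: r.get(metric_focus, 0))[:top_n]

        # 3. Call Gemini for AI analysis
        analysis = await gemini_client.analyze_with_context(
            {"metric_focus": metric_focus, "rows": rows}, "leaderboard"
        )

        # 4. Return the Markdown-formatted analysis
        return analysis
    except Exception as e:
        return f"❌ **Error**: {str(e)}"
```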
### Resource Access Flow

```
1. MCP Client
   └── Accesses: leaderboard://kshitijthakkar/smoltrace-leaderboard

2. Gradio MCP Server
   └── Routes to: get_leaderboard_data(uri)

3. Resource Function
   ├── Parse URI to extract repo name
   ├── Load dataset from HuggingFace
   ├── Convert to JSON
   └── Return raw JSON string

4. MCP Client
   └── Receives raw JSON data (no AI processing)
```
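
A minimal sketch of this handler; the exact URI parsing and dataset split used by the real `get_leaderboard_data` may differ:

```python
import json

import gradio as gr
from datasets import load_dataset

@gr.mcp.resource()
def get_leaderboard_data(uri: str) -> str:
    """Return raw leaderboard JSON for URIs of the form leaderboard://{owner}/{dataset}."""
    if not uri.startswith("leaderboard://"):
        return json.dumps({"error": f"Unsupported URI: {uri}"})

    repo = uri[len("leaderboard://"):]         # e.g. "kshitijthakkar/smoltrace-leaderboard"
    ds = load_dataset(repo, split="train")     # load the public dataset
    return json.dumps(ds.to_list(), indent=2)  # raw JSON, no AI processing
```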
---

## Deployment Architecture

### HuggingFace Spaces Deployment

- **Platform**: HuggingFace Spaces
- **SDK**: Docker (for custom dependencies)
- **Hardware**: CPU Basic (free tier) - sufficient for API calls and dataset loading
- **URL**: https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server

### Dockerfile

```dockerfile
# Base image
FROM python:3.10-slim

# Set working directory
WORKDIR /app

# Copy requirements
COPY requirements.txt .

# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy application files
COPY app.py .
COPY mcp_tools.py .
COPY gemini_client.py .

# Expose port
EXPOSE 7860

# Set environment variables
ENV GRADIO_SERVER_NAME="0.0.0.0"
ENV GRADIO_SERVER_PORT="7860"

# Run application
CMD ["python", "app.py"]
```
### Environment Variables (HF Spaces Secrets)

```bash
# Required
GEMINI_API_KEY=your_gemini_api_key_here

# Optional (for testing)
HF_TOKEN=your_huggingface_token_here
```

### Scaling Considerations

**Current Setup** (Free Tier):
- Hardware: CPU Basic
- Concurrent Users: ~10-20
- Request Latency: 2-5 seconds (AI analysis)
- Rate Limit: Gemini API (1,500 req/day)

**If Scaling Is Needed**:
1. **Upgrade Hardware**: CPU Basic → CPU Upgrade (2x performance)
2. **Caching**: Add Redis for caching frequent queries
3. **API Key Pool**: Rotate multiple Gemini API keys to raise the aggregate request quota
4. **Load Balancing**: Deploy multiple Spaces instances behind a load balancer
---

## Development Workflow

### Local Development Setup

```bash
# 1. Clone repository
git clone https://github.com/Mandark-droid/TraceMind-mcp-server.git
cd TraceMind-mcp-server

# 2. Create virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Configure environment
cp .env.example .env
# Edit .env with your API keys

# 5. Run locally
python app.py

# 6. Access
# - Gradio UI: http://localhost:7860
# - MCP Endpoint: http://localhost:7860/gradio_api/mcp/sse
```
### Testing MCP Tools

**Option 1: Gradio UI** (Easiest):

```
1. Run app.py
2. Open http://localhost:7860
3. Navigate to a tool tab (e.g., "Analyze Leaderboard")
4. Fill in parameters
5. Click the submit button
6. View results
```

**Option 2: Python MCP Client**:

```python
import asyncio

from mcp import ClientSession
from mcp.client.sse import sse_client

async def test_tool():
    async with sse_client("http://localhost:7860/gradio_api/mcp/sse") as (read_stream, write_stream):
        async with ClientSession(read_stream, write_stream) as session:
            await session.initialize()
            result = await session.call_tool("analyze_leaderboard", arguments={
                "metric_focus": "cost",
                "top_n": 3
            })
            print(result.content[0].text)

asyncio.run(test_tool())
```
### Adding New MCP Tools

**Step 1: Add the function to mcp_tools.py**:

```python
@gr.mcp.tool()
async def new_tool_name(
    param1: str,
    param2: int = 10
) -> str:
    """
    Brief description of what this tool does.

    Detailed explanation of the tool's purpose and behavior.

    Args:
        param1 (str): Description of param1 with examples
        param2 (int): Description of param2. Default: 10

    Returns:
        str: Description of what the function returns
    """
    try:
        # Implementation
        result = f"Processed: {param1} with {param2}"
        return result
    except Exception as e:
        return f"❌ **Error**: {str(e)}"
```

**Step 2: Add a testing UI to app.py** (optional):

```python
with gr.Tab("New Tool"):
    gr.Markdown("## New Tool Name")
    param1_input = gr.Textbox(label="Param 1")
    param2_input = gr.Number(label="Param 2", value=10)
    submit_btn = gr.Button("Execute")
    output = gr.Markdown()

    submit_btn.click(
        fn=new_tool_name,
        inputs=[param1_input, param2_input],
        outputs=output
    )
```

**Step 3: Test**:

```bash
python app.py
# Visit http://localhost:7860
# Test in the new tab
```

**Step 4: Deploy**:

```bash
git add mcp_tools.py app.py
git commit -m "feat: Add new_tool_name MCP tool"
git push origin main
# HF Spaces auto-deploys
```
---

## Performance Considerations

### 1. Token Optimization

**Problem**: Loading full datasets consumes excessive tokens during AI analysis.

**Solutions**:
- **get_top_performers**: Returns only the top N models (90% token reduction)
- **get_leaderboard_summary**: Returns aggregated stats (99% token reduction)
- **Data sampling**: Limit rows when loading datasets (`max_rows` parameter)

**Example**:

```python
# ❌ BAD: Loads 51 rows, ~50K tokens
full_data = load_dataset("kshitijthakkar/smoltrace-leaderboard")

# ✅ GOOD: Returns top 5, ~5K tokens (90% reduction)
top_5 = await get_top_performers(top_n=5)

# ✅ BETTER: Returns summary, ~500 tokens (99% reduction)
summary = await get_leaderboard_summary()
```

### 2. Async Operations

All tools are `async` for efficient I/O:

```python
@gr.mcp.tool()
async def tool_name(...):                       # ← async
    ds = load_dataset(...)                      # ← blocks on I/O
    result = await gemini_client.analyze(...)   # ← async API call
    return result
```

Benefits:
- Non-blocking API calls
- Multiple concurrent requests
- Better resource utilization
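
Note that `load_dataset` itself is a blocking call. A hedged sketch of keeping the event loop responsive by offloading it to a worker thread (not necessarily how the current tools do it):

```python
import asyncio

from datasets import load_dataset

async def load_dataset_async(repo: str, split: str = "train"):
    """Run the blocking load_dataset call in a worker thread."""
    return await asyncio.to_thread(load_dataset, repo, split=split)
```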
### 3. Caching (Future Enhancement)

**Current**: No caching (stateless)

**Future**: Add Redis for caching frequent queries

```python
import redis
from functools import wraps

redis_client = redis.Redis(...)

def cache_result(ttl=300):
    def decorator(func):
        @wraps(func)
        async def wrapper(*args, **kwargs):
            # Generate cache key
            cache_key = f"{func.__name__}:{hash((args, tuple(kwargs.items())))}"

            # Check cache
            cached = redis_client.get(cache_key)
            if cached:
                return cached.decode()

            # Execute function
            result = await func(*args, **kwargs)

            # Store in cache
            redis_client.setex(cache_key, ttl, result)
            return result
        return wrapper
    return decorator

@gr.mcp.tool()
@cache_result(ttl=300)  # 5-minute cache
async def analyze_leaderboard(...):
    pass
```
---

## Security

### API Key Management

**Storage**:
- Development: `.env` file (gitignored)
- Production: HuggingFace Spaces Secrets (encrypted)

**Access**:

```python
# gemini_client.py
import os

api_key = os.getenv("GEMINI_API_KEY")
if not api_key:
    raise ValueError("GEMINI_API_KEY not set")
```

**Never**:
- ❌ Hardcode API keys in source code
- ❌ Commit `.env` to git
- ❌ Expose keys in client-side JavaScript
- ❌ Log API keys in console/files

### Input Validation

**Dataset Repository Validation**:

```python
# Only allow datasets with the "smoltrace-" prefix
if "smoltrace-" not in dataset_repo:
    return "❌ Error: Dataset must contain the 'smoltrace-' prefix for security"
```

**Parameter Validation**:

```python
# Constrain ranges
top_n = max(1, min(20, top_n))          # Clamp between 1 and 20
max_rows = max(10, min(500, max_rows))  # Clamp between 10 and 500
```

### Rate Limiting

**Gemini API**:
- Free tier: 1,500 requests/day
- Enforced by Google (automatic)
- Errors are returned as user-friendly messages

**HuggingFace Datasets**:
- No rate limits for public datasets
- Private datasets require an HF token

---
## Related Documentation

- [README.md](PROPOSED_README_MCP_SERVER.md) - Overview and quick start
- [DOCUMENTATION.md](DOCUMENTATION_MCP_SERVER.md) - Complete API reference
- [TraceMind-AI Architecture](ARCHITECTURE_TRACEMIND_AI.md) - Client-side architecture

---

**Last Updated**: November 21, 2025
**Version**: 1.0.0
**Track**: Building MCP (Enterprise)