# TraceMind MCP Server - Technical Architecture
This document provides a deep technical dive into the TraceMind MCP Server architecture, implementation details, and deployment configuration.
## Table of Contents
- [System Overview](#system-overview)
- [Project Structure](#project-structure)
- [Core Components](#core-components)
- [MCP Protocol Implementation](#mcp-protocol-implementation)
- [Gemini Integration](#gemini-integration)
- [Data Flow](#data-flow)
- [Deployment Architecture](#deployment-architecture)
- [Development Workflow](#development-workflow)
- [Performance Considerations](#performance-considerations)
- [Security](#security)
---
## System Overview
TraceMind MCP Server is a Gradio-based MCP (Model Context Protocol) server that provides AI-powered analysis tools for agent evaluation data. It serves as the backend intelligence layer for the TraceMind ecosystem.
### Technology Stack
| Component | Technology | Version | Purpose |
|-----------|-----------|---------|---------|
| **Framework** | Gradio | 6.x | Native MCP support with `@gr.mcp.*` decorators |
| **AI Model** | Google Gemini | 2.5 Flash Lite | AI-powered analysis and insights |
| **Data Source** | HuggingFace Datasets | Latest | Load evaluation datasets |
| **Protocol** | MCP | 1.0 | Model Context Protocol for tool exposure |
| **Transport** | SSE | - | Server-Sent Events for real-time communication |
| **Deployment** | Docker | - | HuggingFace Spaces containerized deployment |
| **Language** | Python | 3.10+ | Core implementation |
### Architecture Diagram
```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ MCP Clients (External) β”‚
β”‚ - Claude Desktop β”‚
β”‚ - VS Code (Continue, Cursor, Cline) β”‚
β”‚ - TraceMind-AI (Track 2) β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚
β”‚ MCP Protocol
β”‚ (SSE Transport)
↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ TraceMind MCP Server (HuggingFace Spaces) β”‚
β”‚ β”‚
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚ β”‚ Gradio App (app.py) β”‚ β”‚
β”‚ β”‚ - MCP Server Endpoint (mcp_server=True) β”‚ β”‚
β”‚ β”‚ - Testing UI (Gradio Blocks) β”‚ β”‚
β”‚ β”‚ - Configuration Management β”‚ β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚ β”‚ β”‚
β”‚ ↓ β”‚
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚ β”‚ MCP Tools (mcp_tools.py) β”‚ β”‚
β”‚ β”‚ - 11 Tools (@gr.mcp.tool()) β”‚ β”‚
β”‚ β”‚ - 3 Resources (@gr.mcp.resource()) β”‚ β”‚
β”‚ β”‚ - 3 Prompts (@gr.mcp.prompt()) β”‚ β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚ β”‚ β”‚
β”‚ ↓ β”‚
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚ β”‚ Gemini Client (gemini_client.py) β”‚ β”‚
β”‚ β”‚ - API Authentication β”‚ β”‚
β”‚ β”‚ - Prompt Engineering β”‚ β”‚
β”‚ β”‚ - Response Parsing β”‚ β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚ β”‚ β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚
↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ External APIs β”‚
β”‚ - Gemini API β”‚
β”‚ - HF Datasets β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```
---
## Project Structure
```
TraceMind-mcp-server/
β”œβ”€β”€ app.py # Main entry point, Gradio UI
β”œβ”€β”€ mcp_tools.py # MCP tool implementations (11 tools + 3 resources + 3 prompts)
β”œβ”€β”€ gemini_client.py # Google Gemini API client
β”œβ”€β”€ requirements.txt # Python dependencies
β”œβ”€β”€ Dockerfile # Container configuration
β”œβ”€β”€ .env.example # Environment variable template
β”œβ”€β”€ .gitignore # Git ignore rules
β”œβ”€β”€ README.md # Project documentation
└── DOCUMENTATION.md # Complete API reference
Total: 8 files (excluding docs)
Lines of Code: ~3,500 lines (breakdown below)
```
### File Sizes
| File | Lines | Purpose |
|------|-------|---------|
| `app.py` | ~1,200 | Gradio UI + MCP server setup + testing interface |
| `mcp_tools.py` | ~2,100 | All 17 MCP components (tools, resources, prompts) |
| `gemini_client.py` | ~200 | Gemini API integration |
| `requirements.txt` | ~20 | Dependencies |
| `Dockerfile` | ~30 | Deployment configuration |
---
## Core Components
### 1. app.py - Main Application
**Purpose**: Entry point for the HuggingFace Spaces deployment; provides both the MCP server and a testing UI.
**Key Responsibilities**:
- Initialize Gradio app with `mcp_server=True`
- Create testing interface for all MCP tools
- Handle configuration (API keys, settings)
- Manage client connections
**Architecture**:
```python
# app.py structure
import gradio as gr
from gemini_client import GeminiClient
from mcp_tools import * # All tool implementations
# 1. Initialize Gemini client (with fallback)
default_gemini_client = GeminiClient()
# 2. Create Gradio UI for testing
def create_gradio_ui():
    with gr.Blocks() as demo:
        # Settings tab for API key configuration
        # Tab for each MCP tool (11 tabs)
        # Tab for testing resources
        # Tab for testing prompts
        # API documentation tab
        ...
    return demo

# 3. Launch with MCP server enabled
if __name__ == "__main__":
    demo = create_gradio_ui()
    demo.launch(
        mcp_server=True,  # ← Enables MCP endpoint
        share=False,
        server_name="0.0.0.0",
        server_port=7860
    )
```
**MCP Enablement**:
- `mcp_server=True` in `demo.launch()` automatically:
- Exposes `/gradio_api/mcp/sse` endpoint
- Discovers all `@gr.mcp.tool()`, `@gr.mcp.resource()`, `@gr.mcp.prompt()` decorated functions
- Generates MCP tool schemas from function signatures and docstrings
- Handles MCP protocol communication (SSE transport)
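Once the server is running, you can confirm that discovery worked by fetching the generated schema over HTTP. A minimal check, assuming the `/gradio_api/mcp/schema` route that recent Gradio versions expose (adjust the path if your Gradio version differs):

```python
# Quick sanity check that tool discovery worked.
# Assumes the /gradio_api/mcp/schema route exposed by recent Gradio versions.
import httpx

BASE_URL = "http://localhost:7860"  # or the deployed Space URL

resp = httpx.get(f"{BASE_URL}/gradio_api/mcp/schema", timeout=10)
resp.raise_for_status()
print(resp.json())  # names, descriptions, and input schemas of the registered tools
```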
**Testing Interface**:
- **Settings Tab**: Configure Gemini API key and HF token
- **Tool Tabs** (11): One tab per tool for manual testing
- Input fields for all parameters
- Submit button
- Output display (Markdown or JSON)
- **Resources Tab**: Test resource URIs
- **Prompts Tab**: Test prompt templates
- **API Documentation Tab**: Generated from tool docstrings
---
### 2. mcp_tools.py - MCP Components
**Purpose**: Implements all 17 MCP components (11 tools + 3 resources + 3 prompts).
**Structure**:
```python
# mcp_tools.py structure
import gradio as gr
from gemini_client import GeminiClient
from datasets import load_dataset

# ============ TOOLS (11) ============

@gr.mcp.tool()
async def analyze_leaderboard(...) -> str:
    """Tool docstring (becomes MCP description)"""
    # 1. Load data from HuggingFace
    # 2. Process/filter data
    # 3. Call Gemini for AI analysis
    # 4. Return formatted response
    pass

@gr.mcp.tool()
async def debug_trace(...) -> str:
    """Debug traces with AI assistance"""
    pass

# ... (9 more tools)

# ============ RESOURCES (3) ============

@gr.mcp.resource()
def get_leaderboard_data(uri: str) -> str:
    """URI: leaderboard://{repo}"""
    # Parse URI
    # Load dataset
    # Return raw JSON
    pass

@gr.mcp.resource()
def get_trace_data(uri: str) -> str:
    """URI: trace://{trace_id}/{repo}"""
    pass

@gr.mcp.resource()
def get_cost_data(uri: str) -> str:
    """URI: cost://model/{model_name}"""
    pass

# ============ PROMPTS (3) ============

@gr.mcp.prompt()
def analysis_prompt(analysis_type: str, ...) -> str:
    """Generate analysis prompt templates"""
    pass

@gr.mcp.prompt()
def debug_prompt(debug_type: str, ...) -> str:
    """Generate debug prompt templates"""
    pass

@gr.mcp.prompt()
def optimization_prompt(optimization_goal: str, ...) -> str:
    """Generate optimization prompt templates"""
    pass
```
**Design Patterns**:
1. **Decorator-Based Registration**:
```python
@gr.mcp.tool() # Gradio automatically registers as MCP tool
async def tool_name(...) -> str:
"""Docstring becomes tool description in MCP schema"""
pass
```
2. **Structured Docstrings**:
```python
"""
Brief one-line description.

Longer detailed description explaining purpose and behavior.

Args:
    param1 (type): Description of param1
    param2 (type): Description of param2. Default: value

Returns:
    type: Description of return value
"""
```
Gradio parses this to generate the MCP tool schema automatically.
3. **Error Handling**:
```python
try:
    # Tool implementation
    return result
except Exception as e:
    return f"❌ **Error**: {str(e)}"
```
All errors are returned as user-friendly strings.
4. **Async/Await**:
All tools are `async` for efficient I/O operations (API calls, dataset loading).
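Note that `load_dataset()` itself is a blocking call. A sketch of one way to keep it off the event loop using `asyncio.to_thread` — the tool name `summarize_dataset` is hypothetical, and the `GeminiClient` usage follows the class described in the next section:

```python
import asyncio
import os

import gradio as gr
from datasets import load_dataset
from gemini_client import GeminiClient

gemini = GeminiClient(api_key=os.getenv("GEMINI_API_KEY"), model_name="gemini-2.5-flash-lite")

@gr.mcp.tool()
async def summarize_dataset(dataset_repo: str) -> str:
    """Summarize an evaluation dataset with AI assistance (illustrative tool)."""
    # Run the blocking dataset download in a worker thread
    ds = await asyncio.to_thread(load_dataset, dataset_repo, split="train")
    # Await the async Gemini call directly
    return await gemini.analyze_with_context(
        data={"rows": ds.to_list()[:20]},  # sample rows to keep the prompt small
        analysis_type="leaderboard",
    )
```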
---
### 3. gemini_client.py - AI Integration
**Purpose**: Handles all interactions with Google Gemini 2.5 Flash Lite API.
**Key Features**:
- API authentication
- Prompt engineering for different analysis types
- Response parsing and formatting
- Error handling and retries
- Token optimization
**Class Structure**:
```python
import json
from typing import Dict, Optional

import google.generativeai as genai


class GeminiClient:
    def __init__(self, api_key: str, model_name: str):
        """Initialize with API key and model"""
        self.api_key = api_key
        self.model = genai.GenerativeModel(model_name)
        self.generation_config = {
            "temperature": 0.7,
            "top_p": 0.95,
            "max_output_tokens": 4096,  # Optimized for HF Spaces
        }
        self.request_timeout = 30  # 30s timeout

    async def analyze_with_context(
        self,
        data: Dict,
        analysis_type: str,
        specific_question: Optional[str] = None
    ) -> str:
        """
        Core analysis method used by all AI-powered tools

        Args:
            data: Data to analyze (dict or JSON)
            analysis_type: "leaderboard", "trace", "cost_estimate", "comparison", "results"
            specific_question: Optional specific question

        Returns:
            Markdown-formatted analysis
        """
        # 1. Build system prompt based on analysis_type
        system_prompt = self._get_system_prompt(analysis_type)

        # 2. Format data for context
        data_str = json.dumps(data, indent=2)

        # 3. Build user prompt
        user_prompt = f"{system_prompt}\n\nData:\n{data_str}"
        if specific_question:
            user_prompt += f"\n\nSpecific Question: {specific_question}"

        # 4. Call Gemini API
        response = await self.model.generate_content_async(
            user_prompt,
            generation_config=self.generation_config,
            request_options={"timeout": self.request_timeout}
        )

        # 5. Extract and return text
        return response.text

    def _get_system_prompt(self, analysis_type: str) -> str:
        """Get specialized system prompt for each analysis type"""
        prompts = {
            "leaderboard": """You are an expert AI agent performance analyst.
Analyze evaluation leaderboard data and provide:
- Top performers by key metrics
- Trade-off analysis (cost vs accuracy)
- Trend identification
- Actionable recommendations
Format: Markdown with clear sections.""",

            "trace": """You are an expert at debugging AI agent executions.
Analyze OpenTelemetry trace data and:
- Answer specific questions about execution
- Identify performance bottlenecks
- Explain reasoning chain
- Provide optimization suggestions
Format: Clear, concise explanation.""",

            "cost_estimate": """You are a cost optimization expert.
Analyze cost estimation data and provide:
- Detailed cost breakdown
- Hardware recommendations
- Cost optimization opportunities
- ROI analysis
Format: Structured breakdown with recommendations.""",
            # ... more prompts for other analysis types
        }
        return prompts.get(analysis_type, prompts["leaderboard"])
```
**Optimization Strategies**:
- **Token Reduction**: `max_output_tokens: 4096` (reduced from 8192) for faster responses
- **Request Timeout**: 30s timeout for HF Spaces compatibility
- **Temperature**: 0.7 for balanced creativity and consistency
- **Model Selection**: `gemini-2.5-flash-lite` for speed (can switch to `gemini-2.5-flash` for quality)
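Since only the model name changes when trading speed for quality, it can be made configurable at startup. A small sketch, assuming a `GEMINI_MODEL` environment variable (not part of the current configuration):

```python
import os
import google.generativeai as genai

# Default to the fast model; set GEMINI_MODEL=gemini-2.5-flash to prefer quality
model_name = os.getenv("GEMINI_MODEL", "gemini-2.5-flash-lite")
model = genai.GenerativeModel(model_name)
```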
---
## MCP Protocol Implementation
### How Gradio's Native MCP Support Works
Gradio 6+ provides native MCP server capabilities through decorators and automatic schema generation.
**1. Tool Registration**:
```python
@gr.mcp.tool() # ← This decorator tells Gradio to expose this as an MCP tool
async def my_tool(param1: str, param2: int = 10) -> str:
"""
Brief description (used in MCP tool schema).
Args:
param1 (str): Description of param1
param2 (int): Description of param2. Default: 10
Returns:
str: Description of return value
"""
return f"Result: {param1}, {param2}"
```
**What Gradio does automatically**:
- Parses function signature to extract parameter names and types
- Parses docstring to extract descriptions
- Generates MCP tool schema:
```json
{
  "name": "my_tool",
  "description": "Brief description (used in MCP tool schema).",
  "inputSchema": {
    "type": "object",
    "properties": {
      "param1": {
        "type": "string",
        "description": "Description of param1"
      },
      "param2": {
        "type": "integer",
        "default": 10,
        "description": "Description of param2. Default: 10"
      }
    },
    "required": ["param1"]
  }
}
```
**2. Resource Registration**:
```python
@gr.mcp.resource()
def get_resource(uri: str) -> str:
"""
Resource description.
Args:
uri (str): Resource URI (e.g., "leaderboard://repo/name")
Returns:
str: JSON data
"""
# Parse URI
# Load data
# Return JSON string
pass
```
**3. Prompt Registration**:
```python
@gr.mcp.prompt()
def generate_prompt(prompt_type: str, context: str) -> str:
"""
Generate reusable prompt templates.
Args:
prompt_type (str): Type of prompt
context (str): Context for prompt generation
Returns:
str: Generated prompt text
"""
return f"Prompt template for {prompt_type} with {context}"
```
### MCP Endpoint URLs
When `demo.launch(mcp_server=True)` is called:
**SSE Endpoint** (Primary):
```
https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/sse
```
**Streamable HTTP Endpoint** (Alternative):
```
https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/
```
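A quick way to confirm the SSE endpoint is reachable is to open a streaming request and inspect only the response headers — a sketch using `httpx`; the full protocol handshake is left to MCP clients:

```python
import httpx

URL = "https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/sse"

# SSE connections stay open, so stream the response and read only the headers
with httpx.stream("GET", URL, headers={"Accept": "text/event-stream"}, timeout=10) as r:
    print(r.status_code)                    # expect 200
    print(r.headers.get("content-type"))    # expect text/event-stream
```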
### Client Configuration
**Claude Desktop** (`claude_desktop_config.json`):
```json
{
  "mcpServers": {
    "tracemind": {
      "url": "https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/sse",
      "transport": "sse"
    }
  }
}
```
**Python MCP Client**:
```python
import asyncio
from mcp import ClientSession
from mcp.client.sse import sse_client

SERVER_URL = "https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/sse"

async def main():
    # Open the SSE transport, then run an MCP session over it
    async with sse_client(SERVER_URL) as (read_stream, write_stream):
        async with ClientSession(read_stream, write_stream) as session:
            await session.initialize()

            # List tools
            tools = await session.list_tools()

            # Call tool
            result = await session.call_tool("analyze_leaderboard", arguments={
                "metric_focus": "cost",
                "top_n": 5
            })
            print(result.content[0].text)

asyncio.run(main())
```
---
## Gemini Integration
### API Configuration
**Environment Variable**:
```bash
GEMINI_API_KEY=your_api_key_here
```
**Initialization**:
```python
import os
import google.generativeai as genai

genai.configure(api_key=os.getenv("GEMINI_API_KEY"))
model = genai.GenerativeModel("gemini-2.5-flash-lite")
```
### Prompt Engineering Strategy
**1. System Prompts by Analysis Type**:
Each analysis type (leaderboard, trace, cost, comparison, results) has a specialized system prompt that:
- Defines the AI's role and expertise
- Specifies output format (markdown, structured sections)
- Lists key insights to include
- Sets tone (professional, concise, actionable)
**2. Context Injection**:
```python
user_prompt = f"""
{system_prompt}
Data to Analyze:
{json.dumps(data, indent=2)}
Specific Question: {question}
"""
```
**3. Output Formatting**:
- All responses in Markdown
- Clear sections: Top Performers, Key Insights, Trade-offs, Recommendations
- Bullet points for readability
- Code blocks for technical details
### Rate Limiting & Error Handling
**Rate Limits** (Gemini 2.5 Flash Lite free tier):
- 1,500 requests per day
- 1 request per second
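The server does not currently pace its own Gemini calls, so concurrent tool invocations can exceed the per-second limit. A minimal sketch of a shared limiter (not part of `gemini_client.py` today) that spaces calls at least one second apart:

```python
import asyncio

class GeminiRateLimiter:
    """Serialize Gemini calls and space them at least `interval` seconds apart."""
    def __init__(self, interval: float = 1.0):
        self._interval = interval
        self._lock = asyncio.Lock()
        self._last = 0.0

    async def wait(self) -> None:
        async with self._lock:
            now = asyncio.get_running_loop().time()
            delay = self._interval - (now - self._last)
            if delay > 0:
                await asyncio.sleep(delay)
            self._last = asyncio.get_running_loop().time()

# One shared instance, e.g. a module-level object in gemini_client.py
rate_limiter = GeminiRateLimiter(interval=1.0)
```

`analyze_with_context()` would then `await rate_limiter.wait()` immediately before calling `generate_content_async()`.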
**Error Handling Strategy**:
```python
try:
    response = await model.generate_content_async(...)
    return response.text
except google.api_core.exceptions.ResourceExhausted:
    return "❌ **Rate limit exceeded**. Please try again in a few seconds."
except google.api_core.exceptions.DeadlineExceeded:
    return "❌ **Request timeout**. The analysis is taking too long. Try with less data."
except Exception as e:
    return f"❌ **Error**: {str(e)}"
```
---
## Data Flow
### Tool Execution Flow
```
1. MCP Client (e.g., Claude Desktop, TraceMind-AI)
   └─→ Calls: analyze_leaderboard(metric_focus="cost", top_n=5)

2. Gradio MCP Server (app.py)
   └─→ Routes to: analyze_leaderboard() in mcp_tools.py

3. MCP Tool Function (mcp_tools.py)
   β”œβ”€β†’ Load data from HuggingFace Datasets
   β”‚     └─→ ds = load_dataset("kshitijthakkar/smoltrace-leaderboard")
   β”‚
   β”œβ”€β†’ Process/filter data
   β”‚     └─→ Filter by time range, sort by metric
   β”‚
   β”œβ”€β†’ Call Gemini Client
   β”‚     └─→ gemini_client.analyze_with_context(data, "leaderboard")
   β”‚
   └─→ Return formatted response

4. Gemini Client (gemini_client.py)
   β”œβ”€β†’ Build system prompt
   β”œβ”€β†’ Format data as JSON
   β”œβ”€β†’ Call Gemini API
   β”‚     └─→ model.generate_content_async(prompt)
   └─→ Return AI-generated analysis

5. Response Path (back through stack)
   └─→ Gemini β†’ gemini_client β†’ mcp_tool β†’ Gradio β†’ MCP Client

6. MCP Client (displays result to user)
   └─→ Shows markdown-formatted analysis
```
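Condensed into code, the flow above looks roughly like the sketch below. The dataset split, the `total_cost` column name, and the module-level `gemini_client` instance are assumptions for illustration:

```python
import os

import gradio as gr
from datasets import load_dataset
from gemini_client import GeminiClient

# Module-level instance, as described in the gemini_client.py section
gemini_client = GeminiClient(api_key=os.getenv("GEMINI_API_KEY"), model_name="gemini-2.5-flash-lite")

@gr.mcp.tool()
async def analyze_leaderboard(metric_focus: str = "cost", top_n: int = 5) -> str:
    """Analyze the evaluation leaderboard with AI assistance."""
    try:
        # 1. Load data from HuggingFace Datasets
        ds = load_dataset("kshitijthakkar/smoltrace-leaderboard", split="train")
        df = ds.to_pandas()

        # 2. Process/filter data (column mapping is illustrative)
        column = {"cost": "total_cost"}.get(metric_focus, metric_focus)
        top = df.sort_values(column).head(top_n).to_dict(orient="records")

        # 3. Call Gemini for AI analysis
        analysis = await gemini_client.analyze_with_context(
            data={"top_models": top}, analysis_type="leaderboard"
        )

        # 4. Return formatted response
        return analysis
    except Exception as e:
        return f"❌ **Error**: {str(e)}"
```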
### Resource Access Flow
```
1. MCP Client
   └─→ Accesses: leaderboard://kshitijthakkar/smoltrace-leaderboard

2. Gradio MCP Server
   └─→ Routes to: get_leaderboard_data(uri)

3. Resource Function
   β”œβ”€β†’ Parse URI to extract repo name
   β”œβ”€β†’ Load dataset from HuggingFace
   β”œβ”€β†’ Convert to JSON
   └─→ Return raw JSON string

4. MCP Client
   └─→ Receives raw JSON data (no AI processing)
```
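A corresponding sketch of the resource side, assuming the `leaderboard://{owner}/{dataset}` URI shape listed earlier (the parsing and serialization details are illustrative):

```python
import json

import gradio as gr
from datasets import load_dataset

@gr.mcp.resource()
def get_leaderboard_data(uri: str) -> str:
    """Return the raw leaderboard rows as JSON (no AI processing)."""
    # "leaderboard://kshitijthakkar/smoltrace-leaderboard" -> "kshitijthakkar/smoltrace-leaderboard"
    repo = uri.removeprefix("leaderboard://")
    ds = load_dataset(repo, split="train")
    return json.dumps(ds.to_list())
```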
---
## Deployment Architecture
### HuggingFace Spaces Deployment
**Platform**: HuggingFace Spaces
**SDK**: Docker (for custom dependencies)
**Hardware**: CPU Basic (free tier) - sufficient for API calls and dataset loading
**URL**: https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server
### Dockerfile
```dockerfile
# Base image
FROM python:3.10-slim
# Set working directory
WORKDIR /app
# Copy requirements
COPY requirements.txt .
# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt
# Copy application files
COPY app.py .
COPY mcp_tools.py .
COPY gemini_client.py .
# Expose port
EXPOSE 7860
# Set environment variables
ENV GRADIO_SERVER_NAME="0.0.0.0"
ENV GRADIO_SERVER_PORT="7860"
# Run application
CMD ["python", "app.py"]
```
### Environment Variables (HF Spaces Secrets)
```bash
# Required
GEMINI_API_KEY=your_gemini_api_key_here
# Optional (for testing)
HF_TOKEN=your_huggingface_token_here
```
### Scaling Considerations
**Current Setup** (Free Tier):
- Hardware: CPU Basic
- Concurrent Users: ~10-20
- Request Latency: 2-5 seconds (AI analysis)
- Rate Limit: Gemini API (1,500 req/day)
**If Scaling Needed**:
1. **Upgrade Hardware**: CPU Basic β†’ CPU Upgrade (2x performance)
2. **Caching**: Add Redis for caching frequent queries
3. **API Key Pool**: Rotate multiple Gemini API keys to bypass rate limits
4. **Load Balancing**: Deploy multiple Spaces instances with load balancer
---
## Development Workflow
### Local Development Setup
```bash
# 1. Clone repository
git clone https://github.com/Mandark-droid/TraceMind-mcp-server.git
cd TraceMind-mcp-server
# 2. Create virtual environment
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# 3. Install dependencies
pip install -r requirements.txt
# 4. Configure environment
cp .env.example .env
# Edit .env with your API keys
# 5. Run locally
python app.py
# 6. Access
# - Gradio UI: http://localhost:7860
# - MCP Endpoint: http://localhost:7860/gradio_api/mcp/sse
```
### Testing MCP Tools
**Option 1: Gradio UI** (Easiest):
```
1. Run app.py
2. Open http://localhost:7860
3. Navigate to tool tab (e.g., "πŸ“Š Analyze Leaderboard")
4. Fill in parameters
5. Click submit button
6. View results
```
**Option 2: Python MCP Client**:
```python
import asyncio
from mcp import ClientSession
from mcp.client.sse import sse_client

async def test_tool():
    async with sse_client("http://localhost:7860/gradio_api/mcp/sse") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool("analyze_leaderboard", arguments={
                "metric_focus": "cost",
                "top_n": 3
            })
            print(result.content[0].text)

asyncio.run(test_tool())
```
### Adding New MCP Tools
**Step 1: Add function to mcp_tools.py**:
```python
@gr.mcp.tool()
async def new_tool_name(
    param1: str,
    param2: int = 10
) -> str:
    """
    Brief description of what this tool does.

    Detailed explanation of the tool's purpose and behavior.

    Args:
        param1 (str): Description of param1 with examples
        param2 (int): Description of param2. Default: 10

    Returns:
        str: Description of what the function returns
    """
    try:
        # Implementation
        result = f"Processed: {param1} with {param2}"
        return result
    except Exception as e:
        return f"❌ **Error**: {str(e)}"
```
**Step 2: Add testing UI to app.py** (optional):
```python
with gr.Tab("πŸ†• New Tool"):
gr.Markdown("## New Tool Name")
param1_input = gr.Textbox(label="Param 1")
param2_input = gr.Number(label="Param 2", value=10)
submit_btn = gr.Button("Execute")
output = gr.Markdown()
submit_btn.click(
fn=new_tool_name,
inputs=[param1_input, param2_input],
outputs=output
)
```
**Step 3: Test**:
```bash
python app.py
# Visit http://localhost:7860
# Test in new tab
```
**Step 4: Deploy**:
```bash
git add mcp_tools.py app.py
git commit -m "feat: Add new_tool_name MCP tool"
git push origin main
# HF Spaces auto-deploys
```
---
## Performance Considerations
### 1. Token Optimization
**Problem**: Loading full datasets consumes excessive tokens in AI analysis.
**Solutions**:
- **get_top_performers**: Returns only top N models (90% token reduction)
- **get_leaderboard_summary**: Returns aggregated stats (99% token reduction)
- **Data sampling**: Limit rows when loading datasets (max_rows parameter)
**Example**:
```python
# ❌ BAD: Loads 51 rows, ~50K tokens
full_data = load_dataset("kshitijthakkar/smoltrace-leaderboard")
# βœ… GOOD: Returns top 5, ~5K tokens (90% reduction)
top_5 = await get_top_performers(top_n=5)
# βœ… BETTER: Returns summary, ~500 tokens (99% reduction)
summary = await get_leaderboard_summary()
```
### 2. Async Operations
All tools are `async` for efficient I/O:
```python
@gr.mcp.tool()
async def tool_name(...): # ← async
    ds = load_dataset(...)                     # ← Blocks on I/O
    result = await gemini_client.analyze(...)  # ← async API call
    return result
```
Benefits:
- Non-blocking API calls
- Multiple concurrent requests
- Better resource utilization
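As a concrete example of the concurrency benefit, several tools can be awaited together with `asyncio.gather`; the `debug_trace` arguments below are placeholders:

```python
import asyncio
from mcp_tools import analyze_leaderboard, debug_trace

async def run_parallel_analyses():
    # Both calls spend most of their time waiting on Gemini / dataset I/O,
    # so awaiting them together lets the work overlap instead of queueing.
    leaderboard, trace = await asyncio.gather(
        analyze_leaderboard(metric_focus="cost", top_n=5),
        debug_trace(trace_id="example-trace-id"),
    )
    return leaderboard, trace

# asyncio.run(run_parallel_analyses())
```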
### 3. Caching (Future Enhancement)
**Current**: No caching (stateless)
**Future**: Add Redis for caching frequent queries
```python
import redis
from functools import wraps

redis_client = redis.Redis(...)

def cache_result(ttl=300):
    def decorator(func):
        @wraps(func)
        async def wrapper(*args, **kwargs):
            # Generate cache key
            cache_key = f"{func.__name__}:{hash((args, tuple(kwargs.items())))}"

            # Check cache
            cached = redis_client.get(cache_key)
            if cached:
                return cached.decode()

            # Execute function
            result = await func(*args, **kwargs)

            # Store in cache
            redis_client.setex(cache_key, ttl, result)
            return result
        return wrapper
    return decorator

@gr.mcp.tool()
@cache_result(ttl=300)  # 5-minute cache
async def analyze_leaderboard(...):
    pass
```
---
## Security
### API Key Management
**Storage**:
- Development: `.env` file (gitignored)
- Production: HuggingFace Spaces Secrets (encrypted)
**Access**:
```python
# gemini_client.py
api_key = os.getenv("GEMINI_API_KEY")
if not api_key:
raise ValueError("GEMINI_API_KEY not set")
```
**Never**:
- ❌ Hardcode API keys in source code
- ❌ Commit `.env` to git
- ❌ Expose keys in client-side JavaScript
- ❌ Log API keys in console/files
### Input Validation
**Dataset Repository Validation**:
```python
# Only allow "smoltrace-" prefix datasets
if "smoltrace-" not in dataset_repo:
return "❌ Error: Dataset must contain 'smoltrace-' prefix for security"
```
**Parameter Validation**:
```python
# Constrain ranges
top_n = max(1, min(20, top_n)) # Clamp between 1-20
max_rows = max(10, min(500, max_rows)) # Clamp between 10-500
```
### Rate Limiting
**Gemini API**:
- Free tier: 1,500 requests/day
- Handled by Google (automatic)
- Errors returned as user-friendly messages
**HuggingFace Datasets**:
- No rate limits for public datasets
- Private datasets require HF token
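For private datasets, the token from the `HF_TOKEN` secret can be passed directly to `load_dataset` (a sketch; recent versions of the `datasets` library use the `token` parameter, older releases used `use_auth_token`):

```python
import os
from datasets import load_dataset

# Public datasets need no token; private ones use the HF_TOKEN secret
ds = load_dataset(
    "kshitijthakkar/smoltrace-leaderboard",
    split="train",
    token=os.getenv("HF_TOKEN"),
)
```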
---
## Related Documentation
- [README.md](PROPOSED_README_MCP_SERVER.md) - Overview and quick start
- [DOCUMENTATION.md](DOCUMENTATION_MCP_SERVER.md) - Complete API reference
- [TraceMind-AI Architecture](ARCHITECTURE_TRACEMIND_AI.md) - Client-side architecture
---
**Last Updated**: November 21, 2025
**Version**: 1.0.0
**Track**: Building MCP (Enterprise)