kshitijthakkar committed on
Commit
6982f0b
·
1 Parent(s): e4b0c31

docs: Deploy final documentation package

Files changed (3)
  1. ARCHITECTURE.md +987 -0
  2. DOCUMENTATION.md +918 -0
  3. README.md +186 -770
ARCHITECTURE.md ADDED
@@ -0,0 +1,987 @@
1
+ # TraceMind MCP Server - Technical Architecture
2
+
3
+ This document provides a deep technical dive into the TraceMind MCP Server architecture, implementation details, and deployment configuration.
4
+
5
+ ## Table of Contents
6
+
7
+ - [System Overview](#system-overview)
8
+ - [Project Structure](#project-structure)
9
+ - [Core Components](#core-components)
10
+ - [MCP Protocol Implementation](#mcp-protocol-implementation)
11
+ - [Gemini Integration](#gemini-integration)
12
+ - [Data Flow](#data-flow)
13
+ - [Deployment Architecture](#deployment-architecture)
14
+ - [Development Workflow](#development-workflow)
15
+ - [Performance Considerations](#performance-considerations)
16
+ - [Security](#security)
17
+
18
+ ---
19
+
20
+ ## System Overview
21
+
22
+ TraceMind MCP Server is a Gradio-based MCP (Model Context Protocol) server that provides AI-powered analysis tools for agent evaluation data. It serves as the backend intelligence layer for the TraceMind ecosystem.
23
+
24
+ ### Technology Stack
25
+
26
+ | Component | Technology | Version | Purpose |
27
+ |-----------|-----------|---------|---------|
28
+ | **Framework** | Gradio | 6.x | Native MCP support with `@gr.mcp.*` decorators |
29
+ | **AI Model** | Google Gemini | 2.5 Flash Lite | AI-powered analysis and insights |
30
+ | **Data Source** | HuggingFace Datasets | Latest | Load evaluation datasets |
31
+ | **Protocol** | MCP | 1.0 | Model Context Protocol for tool exposure |
32
+ | **Transport** | SSE | - | Server-Sent Events for real-time communication |
33
+ | **Deployment** | Docker | - | HuggingFace Spaces containerized deployment |
34
+ | **Language** | Python | 3.10+ | Core implementation |
35
+
36
+ ### Architecture Diagram
37
+
38
+ ```
39
+ ┌──────────────────────────────────────────────────────────────┐
40
+ │ MCP Clients (External) │
41
+ │ - Claude Desktop │
42
+ │ - VS Code (Continue, Cursor, Cline) │
43
+ │ - TraceMind-AI (Track 2) │
44
+ └────────────────┬─────────────────────────────────────────────┘
45
+
46
+ │ MCP Protocol
47
+ │ (SSE Transport)
48
+
49
+ ┌──────────────────────────────────────────────────────────────┐
50
+ │ TraceMind MCP Server (HuggingFace Spaces) │
51
+ │ │
52
+ │ ┌──────────────────────────────────────────────────────┐ │
53
+ │ │ Gradio App (app.py) │ │
54
+ │ │ - MCP Server Endpoint (mcp_server=True) │ │
55
+ │ │ - Testing UI (Gradio Blocks) │ │
56
+ │ │ - Configuration Management │ │
57
+ │ └─────────────┬────────────────────────────────────────┘ │
58
+ │ │ │
59
+ │ ↓ │
60
+ │ ┌──────────────────────────────────────────────────────┐ │
61
+ │ │ MCP Tools (mcp_tools.py) │ │
62
+ │ │ - 11 Tools (@gr.mcp.tool()) │ │
63
+ │ │ - 3 Resources (@gr.mcp.resource()) │ │
64
+ │ │ - 3 Prompts (@gr.mcp.prompt()) │ │
65
+ │ └─────────────┬────────────────────────────────────────┘ │
66
+ │ │ │
67
+ │ ↓ │
68
+ │ ┌──────────────────────────────────────────────────────┐ │
69
+ │ │ Gemini Client (gemini_client.py) │ │
70
+ │ │ - API Authentication │ │
71
+ │ │ - Prompt Engineering │ │
72
+ │ │ - Response Parsing │ │
73
+ │ └─────────────┬────────────────────────────────────────┘ │
74
+ │ │ │
75
+ └────────────────┼──────────────────────────────────────────────┘
76
+
77
+
78
+ ┌────────────────┐
79
+ │ External APIs │
80
+ │ - Gemini API │
81
+ │ - HF Datasets │
82
+ └────────────────┘
83
+ ```
84
+
85
+ ---
86
+
87
+ ## Project Structure
88
+
89
+ ```
90
+ TraceMind-mcp-server/
91
+ ├── app.py # Main entry point, Gradio UI
92
+ ├── mcp_tools.py # MCP tool implementations (11 tools + 3 resources + 3 prompts)
93
+ ├── gemini_client.py # Google Gemini API client
94
+ ├── requirements.txt # Python dependencies
95
+ ├── Dockerfile # Container configuration
96
+ ├── .env.example # Environment variable template
97
+ ├── .gitignore # Git ignore rules
98
+ ├── README.md # Project documentation
99
+ └── DOCUMENTATION.md # Complete API reference
100
+
101
+ Total: 8 files (excluding docs)
102
+ Lines of Code: ~3,500 lines (breakdown below)
103
+ ```
104
+
105
+ ### File Sizes
106
+
107
+ | File | Lines | Purpose |
108
+ |------|-------|---------|
109
+ | `app.py` | ~1,200 | Gradio UI + MCP server setup + testing interface |
110
+ | `mcp_tools.py` | ~2,100 | All 17 MCP components (tools, resources, prompts) |
111
+ | `gemini_client.py` | ~200 | Gemini API integration |
112
+ | `requirements.txt` | ~20 | Dependencies |
113
+ | `Dockerfile` | ~30 | Deployment configuration |
114
+
115
+ ---
116
+
117
+ ## Core Components
118
+
119
+ ### 1. app.py - Main Application
120
+
121
+ **Purpose**: Entry point for HuggingFace Spaces deployment, provides both MCP server and testing UI.
122
+
123
+ **Key Responsibilities**:
124
+ - Initialize Gradio app with `mcp_server=True`
125
+ - Create testing interface for all MCP tools
126
+ - Handle configuration (API keys, settings)
127
+ - Manage client connections
128
+
129
+ **Architecture**:
130
+
131
+ ```python
132
+ # app.py structure
133
+ import gradio as gr
134
+ from gemini_client import GeminiClient
135
+ from mcp_tools import * # All tool implementations
136
+
137
+ # 1. Initialize Gemini client (with fallback)
138
+ default_gemini_client = GeminiClient()
139
+
140
+ # 2. Create Gradio UI for testing
141
+ def create_gradio_ui():
142
+ with gr.Blocks() as demo:
143
+ # Settings tab for API key configuration
144
+ # Tab for each MCP tool (11 tabs)
145
+ # Tab for testing resources
146
+ # Tab for testing prompts
147
+ # API documentation tab
148
+ return demo
149
+
150
+ # 3. Launch with MCP server enabled
151
+ if __name__ == "__main__":
152
+ demo = create_gradio_ui()
153
+ demo.launch(
154
+ mcp_server=True, # ← Enables MCP endpoint
155
+ share=False,
156
+ server_name="0.0.0.0",
157
+ server_port=7860
158
+ )
159
+ ```
160
+
161
+ **MCP Enablement**:
162
+ - `mcp_server=True` in `demo.launch()` automatically:
163
+ - Exposes `/gradio_api/mcp/sse` endpoint
164
+ - Discovers all `@gr.mcp.tool()`, `@gr.mcp.resource()`, `@gr.mcp.prompt()` decorated functions
165
+ - Generates MCP tool schemas from function signatures and docstrings
166
+ - Handles MCP protocol communication (SSE transport)
167
+
168
+ **Testing Interface**:
169
+ - **Settings Tab**: Configure Gemini API key and HF token
170
+ - **Tool Tabs** (11): One tab per tool for manual testing
171
+ - Input fields for all parameters
172
+ - Submit button
173
+ - Output display (Markdown or JSON)
174
+ - **Resources Tab**: Test resource URIs
175
+ - **Prompts Tab**: Test prompt templates
176
+ - **API Documentation Tab**: Generated from tool docstrings
177
+
178
+ ---
179
+
180
+ ### 2. mcp_tools.py - MCP Components
181
+
182
+ **Purpose**: Implements all 17 MCP components (11 tools + 3 resources + 3 prompts).
183
+
184
+ **Structure**:
185
+
186
+ ```python
187
+ # mcp_tools.py structure
188
+ import gradio as gr
189
+ from gemini_client import GeminiClient
190
+ from datasets import load_dataset
191
+
192
+ # ============ TOOLS (11) ============
193
+
194
+ @gr.mcp.tool()
195
+ async def analyze_leaderboard(...) -> str:
196
+ """Tool docstring (becomes MCP description)"""
197
+ # 1. Load data from HuggingFace
198
+ # 2. Process/filter data
199
+ # 3. Call Gemini for AI analysis
200
+ # 4. Return formatted response
201
+ pass
202
+
203
+ @gr.mcp.tool()
204
+ async def debug_trace(...) -> str:
205
+ """Debug traces with AI assistance"""
206
+ pass
207
+
208
+ # ... (9 more tools)
209
+
210
+ # ============ RESOURCES (3) ============
211
+
212
+ @gr.mcp.resource()
213
+ def get_leaderboard_data(uri: str) -> str:
214
+ """URI: leaderboard://{repo}"""
215
+ # Parse URI
216
+ # Load dataset
217
+ # Return raw JSON
218
+ pass
219
+
220
+ @gr.mcp.resource()
221
+ def get_trace_data(uri: str) -> str:
222
+ """URI: trace://{trace_id}/{repo}"""
223
+ pass
224
+
225
+ @gr.mcp.resource()
226
+ def get_cost_data(uri: str) -> str:
227
+ """URI: cost://model/{model_name}"""
228
+ pass
229
+
230
+ # ============ PROMPTS (3) ============
231
+
232
+ @gr.mcp.prompt()
233
+ def analysis_prompt(analysis_type: str, ...) -> str:
234
+ """Generate analysis prompt templates"""
235
+ pass
236
+
237
+ @gr.mcp.prompt()
238
+ def debug_prompt(debug_type: str, ...) -> str:
239
+ """Generate debug prompt templates"""
240
+ pass
241
+
242
+ @gr.mcp.prompt()
243
+ def optimization_prompt(optimization_goal: str, ...) -> str:
244
+ """Generate optimization prompt templates"""
245
+ pass
246
+ ```
247
+
248
+ **Design Patterns**:
249
+
250
+ 1. **Decorator-Based Registration**:
251
+ ```python
252
+ @gr.mcp.tool() # Gradio automatically registers as MCP tool
253
+ async def tool_name(...) -> str:
254
+ """Docstring becomes tool description in MCP schema"""
255
+ pass
256
+ ```
257
+
258
+ 2. **Structured Docstrings**:
259
+ ```python
260
+ """
261
+ Brief one-line description.
262
+
263
+ Longer detailed description explaining purpose and behavior.
264
+
265
+ Args:
266
+ param1 (type): Description of param1
267
+ param2 (type): Description of param2. Default: value
268
+
269
+ Returns:
270
+ type: Description of return value
271
+ """
272
+ ```
273
+ Gradio parses this docstring to generate the MCP tool schema automatically.
274
+
275
+ 3. **Error Handling**:
276
+ ```python
277
+ try:
278
+ # Tool implementation
279
+ return result
280
+ except Exception as e:
281
+ return f"❌ **Error**: {str(e)}"
282
+ ```
283
+ All errors returned as user-friendly strings.
284
+
285
+ 4. **Async/Await**:
286
+ All tools are `async` for efficient I/O operations (API calls, dataset loading).
287
+
288
+ ---
289
+
290
+ ### 3. gemini_client.py - AI Integration
291
+
292
+ **Purpose**: Handles all interactions with Google Gemini 2.5 Flash Lite API.
293
+
294
+ **Key Features**:
295
+ - API authentication
296
+ - Prompt engineering for different analysis types
297
+ - Response parsing and formatting
298
+ - Error handling and retries
299
+ - Token optimization
300
+
301
+ **Class Structure**:
302
+
303
+ ```python
304
+ class GeminiClient:
305
+ def __init__(self, api_key: str, model_name: str):
306
+ """Initialize with API key and model"""
307
+ self.api_key = api_key
308
+ self.model = genai.GenerativeModel(model_name)
309
+ self.generation_config = {
310
+ "temperature": 0.7,
311
+ "top_p": 0.95,
312
+ "max_output_tokens": 4096, # Optimized for HF Spaces
313
+ }
314
+ self.request_timeout = 30 # 30s timeout
315
+
316
+ async def analyze_with_context(
317
+ self,
318
+ data: Dict,
319
+ analysis_type: str,
320
+ specific_question: Optional[str] = None
321
+ ) -> str:
322
+ """
323
+ Core analysis method used by all AI-powered tools
324
+
325
+ Args:
326
+ data: Data to analyze (dict or JSON)
327
+ analysis_type: "leaderboard", "trace", "cost_estimate", "comparison", "results"
328
+ specific_question: Optional specific question
329
+
330
+ Returns:
331
+ Markdown-formatted analysis
332
+ """
333
+ # 1. Build system prompt based on analysis_type
334
+ system_prompt = self._get_system_prompt(analysis_type)
335
+
336
+ # 2. Format data for context
337
+ data_str = json.dumps(data, indent=2)
338
+
339
+ # 3. Build user prompt
340
+ user_prompt = f"{system_prompt}\n\nData:\n{data_str}"
341
+ if specific_question:
342
+ user_prompt += f"\n\nSpecific Question: {specific_question}"
343
+
344
+ # 4. Call Gemini API
345
+ response = await self.model.generate_content_async(
346
+ user_prompt,
347
+ generation_config=self.generation_config,
348
+ request_options={"timeout": self.request_timeout}
349
+ )
350
+
351
+ # 5. Extract and return text
352
+ return response.text
353
+
354
+ def _get_system_prompt(self, analysis_type: str) -> str:
355
+ """Get specialized system prompt for each analysis type"""
356
+ prompts = {
357
+ "leaderboard": """You are an expert AI agent performance analyst.
358
+ Analyze evaluation leaderboard data and provide:
359
+ - Top performers by key metrics
360
+ - Trade-off analysis (cost vs accuracy)
361
+ - Trend identification
362
+ - Actionable recommendations
363
+ Format: Markdown with clear sections.""",
364
+
365
+ "trace": """You are an expert at debugging AI agent executions.
366
+ Analyze OpenTelemetry trace data and:
367
+ - Answer specific questions about execution
368
+ - Identify performance bottlenecks
369
+ - Explain reasoning chain
370
+ - Provide optimization suggestions
371
+ Format: Clear, concise explanation.""",
372
+
373
+ "cost_estimate": """You are a cost optimization expert.
374
+ Analyze cost estimation data and provide:
375
+ - Detailed cost breakdown
376
+ - Hardware recommendations
377
+ - Cost optimization opportunities
378
+ - ROI analysis
379
+ Format: Structured breakdown with recommendations.""",
380
+
381
+ # ... more prompts for other analysis types
382
+ }
383
+ return prompts.get(analysis_type, prompts["leaderboard"])
384
+ ```
385
+
386
+ **Optimization Strategies**:
387
+ - **Token Reduction**: `max_output_tokens: 4096` (reduced from 8192) for faster responses
388
+ - **Request Timeout**: 30s timeout for HF Spaces compatibility
389
+ - **Temperature**: 0.7 for balanced creativity and consistency
390
+ - **Model Selection**: `gemini-2.5-flash-lite` for speed (can switch to `gemini-2.5-flash` for quality)
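+
+ Put together, a minimal client configuration reflecting these settings might look like the following sketch (the `GEMINI_MODEL` override is an assumption for illustration; the deployed client may hardcode the model name):
+
+ ```python
+ import os
+ import google.generativeai as genai
+
+ # Assumption: an optional GEMINI_MODEL env var to switch between speed and quality.
+ genai.configure(api_key=os.getenv("GEMINI_API_KEY"))
+ model = genai.GenerativeModel(
+     os.getenv("GEMINI_MODEL", "gemini-2.5-flash-lite"),
+     generation_config={
+         "temperature": 0.7,         # balanced creativity vs consistency
+         "top_p": 0.95,
+         "max_output_tokens": 4096,  # keep responses fast on HF Spaces
+     },
+ )
+ ```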
391
+
392
+ ---
393
+
394
+ ## MCP Protocol Implementation
395
+
396
+ ### How Gradio's Native MCP Support Works
397
+
398
+ Gradio 6+ provides native MCP server capabilities through decorators and automatic schema generation.
399
+
400
+ **1. Tool Registration**:
401
+ ```python
402
+ @gr.mcp.tool() # ← This decorator tells Gradio to expose this as an MCP tool
403
+ async def my_tool(param1: str, param2: int = 10) -> str:
404
+ """
405
+ Brief description (used in MCP tool schema).
406
+
407
+ Args:
408
+ param1 (str): Description of param1
409
+ param2 (int): Description of param2. Default: 10
410
+
411
+ Returns:
412
+ str: Description of return value
413
+ """
414
+ return f"Result: {param1}, {param2}"
415
+ ```
416
+
417
+ **What Gradio does automatically**:
418
+ - Parses function signature to extract parameter names and types
419
+ - Parses docstring to extract descriptions
420
+ - Generates MCP tool schema:
421
+ ```json
422
+ {
423
+ "name": "my_tool",
424
+ "description": "Brief description (used in MCP tool schema).",
425
+ "inputSchema": {
426
+ "type": "object",
427
+ "properties": {
428
+ "param1": {
429
+ "type": "string",
430
+ "description": "Description of param1"
431
+ },
432
+ "param2": {
433
+ "type": "integer",
434
+ "default": 10,
435
+ "description": "Description of param2. Default: 10"
436
+ }
437
+ },
438
+ "required": ["param1"]
439
+ }
440
+ }
441
+ ```
442
+
443
+ **2. Resource Registration**:
444
+ ```python
445
+ @gr.mcp.resource()
446
+ def get_resource(uri: str) -> str:
447
+ """
448
+ Resource description.
449
+
450
+ Args:
451
+ uri (str): Resource URI (e.g., "leaderboard://repo/name")
452
+
453
+ Returns:
454
+ str: JSON data
455
+ """
456
+ # Parse URI
457
+ # Load data
458
+ # Return JSON string
459
+ pass
460
+ ```
461
+
462
+ **3. Prompt Registration**:
463
+ ```python
464
+ @gr.mcp.prompt()
465
+ def generate_prompt(prompt_type: str, context: str) -> str:
466
+ """
467
+ Generate reusable prompt templates.
468
+
469
+ Args:
470
+ prompt_type (str): Type of prompt
471
+ context (str): Context for prompt generation
472
+
473
+ Returns:
474
+ str: Generated prompt text
475
+ """
476
+ return f"Prompt template for {prompt_type} with {context}"
477
+ ```
478
+
479
+ ### MCP Endpoint URLs
480
+
481
+ When `demo.launch(mcp_server=True)` is called:
482
+
483
+ **SSE Endpoint** (Primary):
484
+ ```
485
+ https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/sse
486
+ ```
487
+
488
+ **Streamable HTTP Endpoint** (Alternative):
489
+ ```
490
+ https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/
491
+ ```
492
+
493
+ ### Client Configuration
494
+
495
+ **Claude Desktop** (`claude_desktop_config.json`):
496
+ ```json
497
+ {
498
+ "mcpServers": {
499
+ "tracemind": {
500
+ "url": "https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/sse",
501
+ "transport": "sse"
502
+ }
503
+ }
504
+ }
505
+ ```
506
+
507
+ **Python MCP Client**:
508
+ ```python
509
+ from mcp import ClientSession, ServerParameters
510
+
511
+ session = ClientSession(
512
+ ServerParameters(
513
+ url="https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/sse",
514
+ transport="sse"
515
+ )
516
+ )
517
+ await session.__aenter__()
518
+
519
+ # List tools
520
+ tools = await session.list_tools()
521
+
522
+ # Call tool
523
+ result = await session.call_tool("analyze_leaderboard", arguments={
524
+ "metric_focus": "cost",
525
+ "top_n": 5
526
+ })
527
+ ```
528
+
529
+ ---
530
+
531
+ ## Gemini Integration
532
+
533
+ ### API Configuration
534
+
535
+ **Environment Variable**:
536
+ ```bash
537
+ GEMINI_API_KEY=your_api_key_here
538
+ ```
539
+
540
+ **Initialization**:
541
+ ```python
542
+ import google.generativeai as genai
543
+
544
+ genai.configure(api_key=os.getenv("GEMINI_API_KEY"))
545
+ model = genai.GenerativeModel("gemini-2.5-flash-lite")
546
+ ```
547
+
548
+ ### Prompt Engineering Strategy
549
+
550
+ **1. System Prompts by Analysis Type**:
551
+ Each analysis type (leaderboard, trace, cost, comparison, results) has a specialized system prompt that:
552
+ - Defines the AI's role and expertise
553
+ - Specifies output format (markdown, structured sections)
554
+ - Lists key insights to include
555
+ - Sets tone (professional, concise, actionable)
556
+
557
+ **2. Context Injection**:
558
+ ```python
559
+ user_prompt = f"""
560
+ {system_prompt}
561
+
562
+ Data to Analyze:
563
+ {json.dumps(data, indent=2)}
564
+
565
+ Specific Question: {question}
566
+ """
567
+ ```
568
+
569
+ **3. Output Formatting**:
570
+ - All responses in Markdown
571
+ - Clear sections: Top Performers, Key Insights, Trade-offs, Recommendations
572
+ - Bullet points for readability
573
+ - Code blocks for technical details
574
+
575
+ ### Rate Limiting & Error Handling
576
+
577
+ **Rate Limits** (Gemini 2.5 Flash Lite free tier):
578
+ - 1,500 requests per day
579
+ - 1 request per second
580
+
581
+ **Error Handling Strategy**:
582
+ ```python
583
+ try:
584
+ response = await model.generate_content_async(...)
585
+ return response.text
586
+ except google.api_core.exceptions.ResourceExhausted:
587
+ return "❌ **Rate limit exceeded**. Please try again in a few seconds."
588
+ except google.api_core.exceptions.DeadlineExceeded:
589
+ return "❌ **Request timeout**. The analysis is taking too long. Try with less data."
590
+ except Exception as e:
591
+ return f"❌ **Error**: {str(e)}"
592
+ ```
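+
+ The retries listed among `gemini_client.py`'s key features could be layered on top as a simple backoff loop (a sketch; the helper name and attempt counts are illustrative, not the exact production logic):
+
+ ```python
+ import asyncio
+ import google.api_core.exceptions as gexc
+
+ async def _generate_with_retry(model, prompt, config, attempts: int = 3) -> str:
+     """Retry transient rate-limit errors with exponential backoff (1s, 2s, 4s)."""
+     for attempt in range(attempts):
+         try:
+             response = await model.generate_content_async(prompt, generation_config=config)
+             return response.text
+         except gexc.ResourceExhausted:
+             if attempt == attempts - 1:
+                 raise
+             await asyncio.sleep(2 ** attempt)
+ ```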
593
+
594
+ ---
595
+
596
+ ## Data Flow
597
+
598
+ ### Tool Execution Flow
599
+
600
+ ```
601
+ 1. MCP Client (e.g., Claude Desktop, TraceMind-AI)
602
+ └─→ Calls: analyze_leaderboard(metric_focus="cost", top_n=5)
603
+
604
+ 2. Gradio MCP Server (app.py)
605
+ └─→ Routes to: analyze_leaderboard() in mcp_tools.py
606
+
607
+ 3. MCP Tool Function (mcp_tools.py)
608
+ ├─→ Load data from HuggingFace Datasets
609
+ │ └─→ ds = load_dataset("kshitijthakkar/smoltrace-leaderboard")
610
+
611
+ ├─→ Process/filter data
612
+ │ └─→ Filter by time range, sort by metric
613
+
614
+ ├─→ Call Gemini Client
615
+ │ └─→ gemini_client.analyze_with_context(data, "leaderboard")
616
+
617
+ └─→ Return formatted response
618
+
619
+ 4. Gemini Client (gemini_client.py)
620
+ ├─→ Build system prompt
621
+ ├─→ Format data as JSON
622
+ ├─→ Call Gemini API
623
+ │ └─→ model.generate_content_async(prompt)
624
+ └─→ Return AI-generated analysis
625
+
626
+ 5. Response Path (back through stack)
627
+ └─→ Gemini → gemini_client → mcp_tool → Gradio → MCP Client
628
+
629
+ 6. MCP Client (displays result to user)
630
+ └─→ Shows markdown-formatted analysis
631
+ ```
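+
+ For orientation, steps 3-4 above collapse into something like the following sketch (the sorting key, helper name, and `GeminiClient` constructor arguments are illustrative assumptions rather than the exact production code):
+
+ ```python
+ import os
+ from datasets import load_dataset
+ from gemini_client import GeminiClient
+
+ async def analyze_leaderboard_sketch(metric_focus: str = "cost", top_n: int = 5) -> str:
+     # 3a. Load and reduce the data (cheapest runs first for the "cost" focus)
+     rows = list(load_dataset("kshitijthakkar/smoltrace-leaderboard", split="train"))
+     rows.sort(key=lambda r: r.get("total_cost_usd", 0.0))
+     # 3b/4. Hand the trimmed payload to Gemini for analysis
+     gemini = GeminiClient(os.getenv("GEMINI_API_KEY"), "gemini-2.5-flash-lite")
+     return await gemini.analyze_with_context(
+         data={"metric_focus": metric_focus, "runs": rows[:top_n]},
+         analysis_type="leaderboard",
+     )
+ ```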
632
+
633
+ ### Resource Access Flow
634
+
635
+ ```
636
+ 1. MCP Client
637
+ └─→ Accesses: leaderboard://kshitijthakkar/smoltrace-leaderboard
638
+
639
+ 2. Gradio MCP Server
640
+ └─→ Routes to: get_leaderboard_data(uri)
641
+
642
+ 3. Resource Function
643
+ ├─→ Parse URI to extract repo name
644
+ ├─→ Load dataset from HuggingFace
645
+ ├─→ Convert to JSON
646
+ └─→ Return raw JSON string
647
+
648
+ 4. MCP Client
649
+ └─→ Receives raw JSON data (no AI processing)
650
+ ```
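+
+ A resource handler along these lines could implement step 3 (a minimal sketch; the helper name and the assumption that the repo exposes a `train` split are ours, not taken from the actual implementation):
+
+ ```python
+ import json
+ from datasets import load_dataset
+
+ def _leaderboard_resource_sketch(uri: str) -> str:
+     """Parse a leaderboard:// URI and return the rows as raw JSON (no AI processing)."""
+     repo = uri.replace("leaderboard://", "", 1)  # "username/dataset-name"
+     rows = load_dataset(repo, split="train")
+     return json.dumps([dict(row) for row in rows], default=str)
+ ```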
651
+
652
+ ---
653
+
654
+ ## Deployment Architecture
655
+
656
+ ### HuggingFace Spaces Deployment
657
+
658
+ **Platform**: HuggingFace Spaces
659
+ **SDK**: Docker (for custom dependencies)
660
+ **Hardware**: CPU Basic (free tier) - sufficient for API calls and dataset loading
661
+ **URL**: https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server
662
+
663
+ ### Dockerfile
664
+
665
+ ```dockerfile
666
+ # Base image
667
+ FROM python:3.10-slim
668
+
669
+ # Set working directory
670
+ WORKDIR /app
671
+
672
+ # Copy requirements
673
+ COPY requirements.txt .
674
+
675
+ # Install dependencies
676
+ RUN pip install --no-cache-dir -r requirements.txt
677
+
678
+ # Copy application files
679
+ COPY app.py .
680
+ COPY mcp_tools.py .
681
+ COPY gemini_client.py .
682
+
683
+ # Expose port
684
+ EXPOSE 7860
685
+
686
+ # Set environment variables
687
+ ENV GRADIO_SERVER_NAME="0.0.0.0"
688
+ ENV GRADIO_SERVER_PORT="7860"
689
+
690
+ # Run application
691
+ CMD ["python", "app.py"]
692
+ ```
693
+
694
+ ### Environment Variables (HF Spaces Secrets)
695
+
696
+ ```bash
697
+ # Required
698
+ GEMINI_API_KEY=your_gemini_api_key_here
699
+
700
+ # Optional (for testing)
701
+ HF_TOKEN=your_huggingface_token_here
702
+ ```
703
+
704
+ ### Scaling Considerations
705
+
706
+ **Current Setup** (Free Tier):
707
+ - Hardware: CPU Basic
708
+ - Concurrent Users: ~10-20
709
+ - Request Latency: 2-5 seconds (AI analysis)
710
+ - Rate Limit: Gemini API (1,500 req/day)
711
+
712
+ **If Scaling Needed**:
713
+ 1. **Upgrade Hardware**: CPU Basic → CPU Upgrade (2x performance)
714
+ 2. **Caching**: Add Redis for caching frequent queries
715
+ 3. **API Key Pool**: Rotate multiple Gemini API keys to bypass rate limits
716
+ 4. **Load Balancing**: Deploy multiple Spaces instances with load balancer
717
+
718
+ ---
719
+
720
+ ## Development Workflow
721
+
722
+ ### Local Development Setup
723
+
724
+ ```bash
725
+ # 1. Clone repository
726
+ git clone https://github.com/Mandark-droid/TraceMind-mcp-server.git
727
+ cd TraceMind-mcp-server
728
+
729
+ # 2. Create virtual environment
730
+ python -m venv venv
731
+ source venv/bin/activate # Windows: venv\Scripts\activate
732
+
733
+ # 3. Install dependencies
734
+ pip install -r requirements.txt
735
+
736
+ # 4. Configure environment
737
+ cp .env.example .env
738
+ # Edit .env with your API keys
739
+
740
+ # 5. Run locally
741
+ python app.py
742
+
743
+ # 6. Access
744
+ # - Gradio UI: http://localhost:7860
745
+ # - MCP Endpoint: http://localhost:7860/gradio_api/mcp/sse
746
+ ```
747
+
748
+ ### Testing MCP Tools
749
+
750
+ **Option 1: Gradio UI** (Easiest):
751
+ ```
752
+ 1. Run app.py
753
+ 2. Open http://localhost:7860
754
+ 3. Navigate to tool tab (e.g., "📊 Analyze Leaderboard")
755
+ 4. Fill in parameters
756
+ 5. Click submit button
757
+ 6. View results
758
+ ```
759
+
760
+ **Option 2: Python MCP Client**:
761
+ ```python
762
+ from mcp import ClientSession, ServerParameters
763
+
764
+ async def test_tool():
765
+ session = ClientSession(
766
+ ServerParameters(
767
+ url="http://localhost:7860/gradio_api/mcp/sse",
768
+ transport="sse"
769
+ )
770
+ )
771
+ await session.__aenter__()
772
+
773
+ result = await session.call_tool("analyze_leaderboard", {
774
+ "metric_focus": "cost",
775
+ "top_n": 3
776
+ })
777
+
778
+ print(result.content[0].text)
779
+
780
+ import asyncio
781
+ asyncio.run(test_tool())
782
+ ```
783
+
784
+ ### Adding New MCP Tools
785
+
786
+ **Step 1: Add function to mcp_tools.py**:
787
+ ```python
788
+ @gr.mcp.tool()
789
+ async def new_tool_name(
790
+ param1: str,
791
+ param2: int = 10
792
+ ) -> str:
793
+ """
794
+ Brief description of what this tool does.
795
+
796
+ Detailed explanation of the tool's purpose and behavior.
797
+
798
+ Args:
799
+ param1 (str): Description of param1 with examples
800
+ param2 (int): Description of param2. Default: 10
801
+
802
+ Returns:
803
+ str: Description of what the function returns
804
+ """
805
+ try:
806
+ # Implementation
807
+ result = f"Processed: {param1} with {param2}"
808
+ return result
809
+ except Exception as e:
810
+ return f"❌ **Error**: {str(e)}"
811
+ ```
812
+
813
+ **Step 2: Add testing UI to app.py** (optional):
814
+ ```python
815
+ with gr.Tab("🆕 New Tool"):
816
+ gr.Markdown("## New Tool Name")
817
+ param1_input = gr.Textbox(label="Param 1")
818
+ param2_input = gr.Number(label="Param 2", value=10)
819
+ submit_btn = gr.Button("Execute")
820
+ output = gr.Markdown()
821
+
822
+ submit_btn.click(
823
+ fn=new_tool_name,
824
+ inputs=[param1_input, param2_input],
825
+ outputs=output
826
+ )
827
+ ```
828
+
829
+ **Step 3: Test**:
830
+ ```bash
831
+ python app.py
832
+ # Visit http://localhost:7860
833
+ # Test in new tab
834
+ ```
835
+
836
+ **Step 4: Deploy**:
837
+ ```bash
838
+ git add mcp_tools.py app.py
839
+ git commit -m "feat: Add new_tool_name MCP tool"
840
+ git push origin main
841
+ # HF Spaces auto-deploys
842
+ ```
843
+
844
+ ---
845
+
846
+ ## Performance Considerations
847
+
848
+ ### 1. Token Optimization
849
+
850
+ **Problem**: Loading full datasets consumes excessive tokens in AI analysis.
851
+
852
+ **Solutions**:
853
+ - **get_top_performers**: Returns only top N models (90% token reduction)
854
+ - **get_leaderboard_summary**: Returns aggregated stats (99% token reduction)
855
+ - **Data sampling**: Limit rows when loading datasets (max_rows parameter)
856
+
857
+ **Example**:
858
+ ```python
859
+ # ❌ BAD: Loads 51 rows, ~50K tokens
860
+ full_data = load_dataset("kshitijthakkar/smoltrace-leaderboard")
861
+
862
+ # ✅ GOOD: Returns top 5, ~5K tokens (90% reduction)
863
+ top_5 = await get_top_performers(top_n=5)
864
+
865
+ # ✅ BETTER: Returns summary, ~500 tokens (99% reduction)
866
+ summary = await get_leaderboard_summary()
867
+ ```
868
+
869
+ ### 2. Async Operations
870
+
871
+ All tools are `async` for efficient I/O:
872
+ ```python
873
+ @gr.mcp.tool()
874
+ async def tool_name(...): # ← async
875
+ ds = load_dataset(...) # ← Blocks on I/O
876
+ result = await gemini_client.analyze(...) # ← async API call
877
+ return result
878
+ ```
879
+
880
+ Benefits:
881
+ - Non-blocking API calls
882
+ - Multiple concurrent requests
883
+ - Better resource utilization
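+
+ Note that `load_dataset` itself is synchronous; one way to keep the event loop responsive (a sketch, not necessarily what the server does today) is to offload it to a worker thread:
+
+ ```python
+ import asyncio
+ from datasets import load_dataset
+
+ async def load_dataset_async(repo: str, split: str = "train"):
+     # Run the blocking HuggingFace call in a worker thread so other
+     # MCP requests can be served while the download/parsing happens.
+     return await asyncio.to_thread(load_dataset, repo, split=split)
+ ```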
884
+
885
+ ### 3. Caching (Future Enhancement)
886
+
887
+ **Current**: No caching (stateless)
888
+ **Future**: Add Redis for caching frequent queries
889
+
890
+ ```python
891
+ import redis
892
+ from functools import wraps
893
+
894
+ redis_client = redis.Redis(...)
895
+
896
+ def cache_result(ttl=300):
897
+ def decorator(func):
898
+ @wraps(func)
899
+ async def wrapper(*args, **kwargs):
900
+ # Generate cache key
901
+ cache_key = f"{func.__name__}:{hash((args, tuple(kwargs.items())))}"
902
+
903
+ # Check cache
904
+ cached = redis_client.get(cache_key)
905
+ if cached:
906
+ return cached.decode()
907
+
908
+ # Execute function
909
+ result = await func(*args, **kwargs)
910
+
911
+ # Store in cache
912
+ redis_client.setex(cache_key, ttl, result)
913
+
914
+ return result
915
+ return wrapper
916
+ return decorator
917
+
918
+ @gr.mcp.tool()
919
+ @cache_result(ttl=300) # 5-minute cache
920
+ async def analyze_leaderboard(...):
921
+ pass
922
+ ```
923
+
924
+ ---
925
+
926
+ ## Security
927
+
928
+ ### API Key Management
929
+
930
+ **Storage**:
931
+ - Development: `.env` file (gitignored)
932
+ - Production: HuggingFace Spaces Secrets (encrypted)
933
+
934
+ **Access**:
935
+ ```python
936
+ # gemini_client.py
937
+ api_key = os.getenv("GEMINI_API_KEY")
938
+ if not api_key:
939
+ raise ValueError("GEMINI_API_KEY not set")
940
+ ```
941
+
942
+ **Never**:
943
+ - ❌ Hardcode API keys in source code
944
+ - ❌ Commit `.env` to git
945
+ - ❌ Expose keys in client-side JavaScript
946
+ - ❌ Log API keys in console/files
947
+
948
+ ### Input Validation
949
+
950
+ **Dataset Repository Validation**:
951
+ ```python
952
+ # Only allow "smoltrace-" prefix datasets
953
+ if "smoltrace-" not in dataset_repo:
954
+ return "❌ Error: Dataset must contain 'smoltrace-' prefix for security"
955
+ ```
956
+
957
+ **Parameter Validation**:
958
+ ```python
959
+ # Constrain ranges
960
+ top_n = max(1, min(20, top_n)) # Clamp between 1-20
961
+ max_rows = max(10, min(500, max_rows)) # Clamp between 10-500
962
+ ```
963
+
964
+ ### Rate Limiting
965
+
966
+ **Gemini API**:
967
+ - Free tier: 1,500 requests/day
968
+ - Handled by Google (automatic)
969
+ - Errors returned as user-friendly messages
970
+
971
+ **HuggingFace Datasets**:
972
+ - No rate limits for public datasets
973
+ - Private datasets require HF token
974
+
975
+ ---
976
+
977
+ ## Related Documentation
978
+
979
+ - [README.md](README.md) - Overview and quick start
980
+ - [DOCUMENTATION.md](DOCUMENTATION.md) - Complete API reference
981
+ - [TraceMind-AI Architecture](ARCHITECTURE_TRACEMIND_AI.md) - Client-side architecture
982
+
983
+ ---
984
+
985
+ **Last Updated**: November 21, 2025
986
+ **Version**: 1.0.0
987
+ **Track**: Building MCP (Enterprise)
DOCUMENTATION.md ADDED
@@ -0,0 +1,918 @@
1
+ # TraceMind MCP Server - Complete API Documentation
2
+
3
+ This document provides a comprehensive API reference for all MCP components exposed by the TraceMind MCP Server.
4
+
5
+ ## Table of Contents
6
+
7
+ - [MCP Tools (11)](#mcp-tools)
8
+ - [AI-Powered Analysis Tools](#ai-powered-analysis-tools)
9
+ - [Token-Optimized Tools](#token-optimized-tools)
10
+ - [Data Management Tools](#data-management-tools)
11
+ - [MCP Resources (3)](#mcp-resources)
12
+ - [MCP Prompts (3)](#mcp-prompts)
13
+ - [Error Handling](#error-handling)
14
+ - [Best Practices](#best-practices)
15
+
16
+ ---
17
+
18
+ ## MCP Tools
19
+
20
+ ### AI-Powered Analysis Tools
21
+
22
+ These tools use Google Gemini 2.5 Flash Lite to provide intelligent, context-aware analysis of agent evaluation data.
23
+
24
+ #### 1. analyze_leaderboard
25
+
26
+ Analyzes evaluation leaderboard data from HuggingFace datasets and generates AI-powered insights.
27
+
28
+ **Parameters:**
29
+ - `leaderboard_repo` (str): HuggingFace dataset repository
30
+ - Default: `"kshitijthakkar/smoltrace-leaderboard"`
31
+ - Format: `"username/dataset-name"`
32
+ - `metric_focus` (str): Primary metric to analyze
33
+ - Options: `"overall"`, `"accuracy"`, `"cost"`, `"latency"`, `"co2"`
34
+ - Default: `"overall"`
35
+ - `time_range` (str): Time period to analyze
36
+ - Options: `"last_week"`, `"last_month"`, `"all_time"`
37
+ - Default: `"last_week"`
38
+ - `top_n` (int): Number of top models to highlight
39
+ - Range: 1-20
40
+ - Default: 5
41
+
42
+ **Returns:** String containing AI-generated analysis with:
43
+ - Top performers by selected metric
44
+ - Trade-off analysis (e.g., accuracy vs cost)
45
+ - Trend identification
46
+ - Actionable recommendations
47
+
48
+ **Example Use Case:**
49
+ Before choosing a model for production, get AI-powered insights on which configuration offers the best cost/performance for your requirements.
50
+
51
+ **Example Call:**
52
+ ```python
53
+ result = await analyze_leaderboard(
54
+ leaderboard_repo="kshitijthakkar/smoltrace-leaderboard",
55
+ metric_focus="cost",
56
+ time_range="last_week",
57
+ top_n=5
58
+ )
59
+ ```
60
+
61
+ **Example Response:**
62
+ ```
63
+ Based on 247 evaluations in the past week:
64
+
65
+ Top Performers (Cost Focus):
66
+ 1. meta-llama/Llama-3.1-8B: $0.002 per run, 93.4% accuracy
67
+ 2. mistralai/Mistral-7B: $0.003 per run, 91.2% accuracy
68
+ 3. openai/gpt-3.5-turbo: $0.008 per run, 94.1% accuracy
69
+
70
+ Trade-off Analysis:
71
+ - Llama-3.1 offers best cost/performance ratio at 25x cheaper than GPT-4
72
+ - GPT-4 leads in accuracy (95.8%) but costs $0.05 per run
73
+ - For production with 1M runs/month: Llama-3.1 saves $48,000 vs GPT-4
74
+
75
+ Recommendations:
76
+ - Cost-sensitive: Use Llama-3.1-8B (93% accuracy, minimal cost)
77
+ - Accuracy-critical: Use GPT-4 (96% accuracy, premium cost)
78
+ - Balanced: Use GPT-3.5-Turbo (94% accuracy, moderate cost)
79
+ ```
80
+
81
+ ---
82
+
83
+ #### 2. debug_trace
84
+
85
+ Analyzes OpenTelemetry trace data and answers specific questions about agent execution.
86
+
87
+ **Parameters:**
88
+ - `trace_dataset` (str): HuggingFace dataset containing traces
89
+ - Format: `"username/smoltrace-traces-model"`
90
+ - Must contain "smoltrace-" prefix
91
+ - `trace_id` (str): Specific trace ID to analyze
92
+ - Format: `"trace_abc123"`
93
+ - `question` (str): Question about the trace
94
+ - Examples: "Why was tool X called twice?", "Which step took the most time?"
95
+ - `include_metrics` (bool): Include GPU metrics in analysis
96
+ - Default: `true`
97
+
98
+ **Returns:** String containing AI analysis of the trace with:
99
+ - Answer to the specific question
100
+ - Relevant span details
101
+ - Performance insights
102
+ - GPU metrics (if available and requested)
103
+
104
+ **Example Use Case:**
105
+ When an agent test fails, understand exactly what happened without manually parsing trace spans.
106
+
107
+ **Example Call:**
108
+ ```python
109
+ result = await debug_trace(
110
+ trace_dataset="kshitij/smoltrace-traces-gpt4",
111
+ trace_id="trace_abc123",
112
+ question="Why was the search tool called twice?",
113
+ include_metrics=True
114
+ )
115
+ ```
116
+
117
+ **Example Response:**
118
+ ```
119
+ Based on trace analysis:
120
+
121
+ Answer:
122
+ The agent called the search_web tool twice due to an iterative reasoning pattern:
123
+
124
+ 1. First call (span_003 at 14:23:19.000):
125
+ - Query: "weather in Tokyo"
126
+ - Duration: 890ms
127
+ - Result: 5 results, oldest was 2 days old
128
+
129
+ 2. Second call (span_005 at 14:23:21.200):
130
+ - Query: "latest weather in Tokyo"
131
+ - Duration: 1200ms
132
+ - Modified reasoning: LLM determined first results were stale
133
+
134
+ Performance Impact:
135
+ - Added 2.09s to total execution time
136
+ - Cost increase: +$0.0003 (tokens for second reasoning step)
137
+ - This is normal behavior for tool-calling agents with iterative reasoning
138
+
139
+ GPU Metrics:
140
+ - N/A (API model, no GPU used)
141
+ ```
142
+
143
+ ---
144
+
145
+ #### 3. estimate_cost
146
+
147
+ Predicts costs, duration, and environmental impact before running evaluations.
148
+
149
+ **Parameters:**
150
+ - `model` (str, required): Model name to evaluate
151
+ - Format: `"provider/model-name"` (e.g., `"openai/gpt-4"`, `"meta-llama/Llama-3.1-8B"`)
152
+ - `agent_type` (str): Type of agent evaluation
153
+ - Options: `"tool"`, `"code"`, `"both"`
154
+ - Default: `"both"`
155
+ - `num_tests` (int): Number of test cases
156
+ - Range: 1-10000
157
+ - Default: 100
158
+ - `hardware` (str): Hardware type
159
+ - Options: `"auto"`, `"cpu"`, `"gpu_a10"`, `"gpu_h200"`
160
+ - Default: `"auto"` (auto-selects based on model)
161
+
162
+ **Returns:** String containing cost estimate with:
163
+ - LLM API costs (for API models)
164
+ - HuggingFace Jobs compute costs (for local models)
165
+ - Estimated duration
166
+ - CO2 emissions estimate
167
+ - Hardware recommendations
168
+
169
+ **Example Use Case:**
170
+ Compare the cost of evaluating GPT-4 vs Llama-3.1 across 1000 tests before committing resources.
171
+
172
+ **Example Call:**
173
+ ```python
174
+ result = await estimate_cost(
175
+ model="openai/gpt-4",
176
+ agent_type="both",
177
+ num_tests=1000,
178
+ hardware="auto"
179
+ )
180
+ ```
181
+
182
+ **Example Response:**
183
+ ```
184
+ Cost Estimate for openai/gpt-4:
185
+
186
+ LLM API Costs:
187
+ - Estimated tokens per test: 1,500
188
+ - Token cost: $0.03/1K input, $0.06/1K output
189
+ - Total LLM cost: $50.00 (1000 tests)
190
+
191
+ Compute Costs:
192
+ - Recommended hardware: cpu-basic (API model)
193
+ - HF Jobs cost: ~$0.05/hr
194
+ - Estimated duration: 45 minutes
195
+ - Total compute cost: $0.04
196
+
197
+ Total Cost: $50.04
198
+ Cost per test: $0.05
199
+ CO2 emissions: ~0.5g (API calls, minimal compute)
200
+
201
+ Recommendations:
202
+ - This is an API model, CPU hardware is sufficient
203
+ - For cost optimization, consider Llama-3.1-8B (25x cheaper)
204
+ - Estimated runtime: 45 minutes for 1000 tests
205
+ ```
206
+
207
+ ---
208
+
209
+ #### 4. compare_runs
210
+
211
+ Compares two evaluation runs with AI-powered analysis across multiple dimensions.
212
+
213
+ **Parameters:**
214
+ - `run_id_1` (str, required): First run ID from leaderboard
215
+ - `run_id_2` (str, required): Second run ID from leaderboard
216
+ - `leaderboard_repo` (str): Leaderboard dataset repository
217
+ - Default: `"kshitijthakkar/smoltrace-leaderboard"`
218
+ - `focus` (str): Comparison focus area
219
+ - Options:
220
+ - `"comprehensive"`: All dimensions
221
+ - `"cost"`: Cost efficiency and ROI
222
+ - `"performance"`: Speed and accuracy trade-offs
223
+ - `"eco_friendly"`: Environmental impact
224
+ - Default: `"comprehensive"`
225
+
226
+ **Returns:** String containing AI comparison with:
227
+ - Success rate comparison with statistical significance
228
+ - Cost efficiency analysis
229
+ - Speed comparison
230
+ - Environmental impact (CO2 emissions)
231
+ - GPU efficiency (for GPU jobs)
232
+
233
+ **Example Use Case:**
234
+ After running evaluations with two different models, compare them head-to-head to determine which is better for production deployment.
235
+
236
+ **Example Call:**
237
+ ```python
238
+ result = await compare_runs(
239
+ run_id_1="run_abc123",
240
+ run_id_2="run_def456",
241
+ leaderboard_repo="kshitijthakkar/smoltrace-leaderboard",
242
+ focus="cost"
243
+ )
244
+ ```
245
+
246
+ **Example Response:**
247
+ ```
248
+ Comparison: GPT-4 vs Llama-3.1-8B (Cost Focus)
249
+
250
+ Success Rates:
251
+ - GPT-4: 95.8% (96/100 tests)
252
+ - Llama-3.1: 93.4% (93/100 tests)
253
+ - Difference: +2.4% for GPT-4 (statistically significant, p<0.05)
254
+
255
+ Cost Efficiency:
256
+ - GPT-4: $0.05 per test, $0.052 per successful test
257
+ - Llama-3.1: $0.002 per test, $0.0021 per successful test
258
+ - Cost ratio: GPT-4 is 25x more expensive
259
+
260
+ ROI Analysis:
261
+ - For 1M evaluations/month:
262
+ - GPT-4: $50,000/month, 958K successes
263
+ - Llama-3.1: $2,000/month, 934K successes
264
+ - GPT-4 provides 24K more successes for $48K more cost
265
+ - Cost per additional success: $2.00
266
+
267
+ Recommendation (Cost Focus):
268
+ Use Llama-3.1-8B for cost-sensitive workloads where 93% accuracy is acceptable.
269
+ Switch to GPT-4 only for accuracy-critical tasks where the 2.4% improvement justifies 25x cost.
270
+ ```
271
+
272
+ ---
273
+
274
+ #### 5. analyze_results
275
+
276
+ Analyzes detailed test results and provides optimization recommendations.
277
+
278
+ **Parameters:**
279
+ - `results_repo` (str, required): HuggingFace dataset containing results
280
+ - Format: `"username/smoltrace-results-model-timestamp"`
281
+ - Must contain "smoltrace-results-" prefix
282
+ - `analysis_focus` (str): Focus area for analysis
283
+ - Options: `"failures"`, `"performance"`, `"cost"`, `"comprehensive"`
284
+ - Default: `"comprehensive"`
285
+ - `max_rows` (int): Maximum test cases to analyze
286
+ - Range: 10-500
287
+ - Default: 100
288
+
289
+ **Returns:** String containing AI analysis with:
290
+ - Failure patterns and root causes
291
+ - Performance bottlenecks in specific test cases
292
+ - Cost optimization opportunities
293
+ - Tool usage patterns
294
+ - Task-specific insights (which types work well vs poorly)
295
+ - Actionable optimization recommendations
296
+
297
+ **Example Use Case:**
298
+ After running an evaluation, analyze the detailed test results to understand why certain tests are failing and get specific recommendations for improving success rate.
299
+
300
+ **Example Call:**
301
+ ```python
302
+ result = await analyze_results(
303
+ results_repo="kshitij/smoltrace-results-gpt4-20251120",
304
+ analysis_focus="failures",
305
+ max_rows=100
306
+ )
307
+ ```
308
+
309
+ **Example Response:**
310
+ ```
311
+ Analysis of Test Results (100 tests analyzed)
312
+
313
+ Overall Statistics:
314
+ - Success Rate: 89% (89/100 tests passed)
315
+ - Average Duration: 3.2s per test
316
+ - Total Cost: $4.50 ($0.045 per test)
317
+
318
+ Failure Analysis (11 failures):
319
+ 1. Tool Not Found (6 failures):
320
+ - Test IDs: task_012, task_045, task_067, task_089, task_091, task_093
321
+ - Pattern: All failed tests required the 'get_weather' tool
322
+ - Root Cause: Tool definition missing or incorrect name
323
+ - Fix: Ensure 'get_weather' tool is available in agent's tool list
324
+
325
+ 2. Timeout (3 failures):
326
+ - Test IDs: task_034, task_071, task_088
327
+ - Pattern: Complex multi-step tasks with >5 tool calls
328
+ - Root Cause: Exceeding 30s timeout limit
329
+ - Fix: Increase timeout to 60s or simplify complex tasks
330
+
331
+ 3. Incorrect Response (2 failures):
332
+ - Test IDs: task_056, task_072
333
+ - Pattern: Math calculation tasks
334
+ - Root Cause: Model hallucinating numbers instead of using calculator tool
335
+ - Fix: Update prompt to emphasize tool usage for calculations
336
+
337
+ Performance Insights:
338
+ - Fast tasks (<2s): 45 tests - Simple single-tool calls
339
+ - Slow tasks (>5s): 12 tests - Multi-step reasoning with 3+ tools
340
+ - Optimal duration: 2-3s for most tasks
341
+
342
+ Cost Optimization:
343
+ - High-cost tests: task_023 ($0.12) - Used 4K tokens
344
+ - Low-cost tests: task_087 ($0.008) - Used 180 tokens
345
+ - Recommendation: Optimize prompt to reduce token usage by 20%
346
+
347
+ Recommendations:
348
+ 1. Add missing 'get_weather' tool → Fixes 6 failures
349
+ 2. Increase timeout from 30s to 60s → Fixes 3 failures
350
+ 3. Strengthen calculator tool instruction → Fixes 2 failures
351
+ 4. Expected improvement: 89% → 100% success rate
352
+ ```
353
+
354
+ ---
355
+
356
+ ### Token-Optimized Tools
357
+
358
+ These tools are specifically designed to minimize token usage when querying leaderboard data.
359
+
360
+ #### 6. get_top_performers
361
+
362
+ Get top N performing models from leaderboard with 90% token reduction.
363
+
364
+ **Performance Optimization:** Returns only top N models instead of loading the full leaderboard dataset (51 runs), resulting in **90% token reduction**.
365
+
366
+ **When to Use:** Perfect for queries like "Which model is leading?", "Show me the top 5 models".
367
+
368
+ **Parameters:**
369
+ - `leaderboard_repo` (str): HuggingFace dataset repository
370
+ - Default: `"kshitijthakkar/smoltrace-leaderboard"`
371
+ - `metric` (str): Metric to rank by
372
+ - Options: `"success_rate"`, `"total_cost_usd"`, `"avg_duration_ms"`, `"co2_emissions_g"`
373
+ - Default: `"success_rate"`
374
+ - `top_n` (int): Number of top models to return
375
+ - Range: 1-20
376
+ - Default: 5
377
+
378
+ **Returns:** JSON string with:
379
+ - Metric used for ranking
380
+ - Ranking order (ascending/descending)
381
+ - Total runs in leaderboard
382
+ - Array of top performers with 10 essential fields
383
+
384
+ **Benefits:**
385
+ - ✅ Token Reduction: 90% fewer tokens vs full dataset
386
+ - ✅ Ready to Use: Properly formatted JSON
387
+ - ✅ Pre-Sorted: Already ranked by chosen metric
388
+ - ✅ Essential Data Only: 10 fields vs 20+ in full dataset
389
+
390
+ **Example Call:**
391
+ ```python
392
+ result = await get_top_performers(
393
+ leaderboard_repo="kshitijthakkar/smoltrace-leaderboard",
394
+ metric="total_cost_usd",
395
+ top_n=3
396
+ )
397
+ ```
398
+
399
+ **Example Response:**
400
+ ```json
401
+ {
402
+ "metric": "total_cost_usd",
403
+ "order": "ascending",
404
+ "total_runs": 51,
405
+ "top_performers": [
406
+ {
407
+ "run_id": "run_001",
408
+ "model": "meta-llama/Llama-3.1-8B",
409
+ "success_rate": 93.4,
410
+ "total_cost_usd": 0.002,
411
+ "avg_duration_ms": 2100,
412
+ "agent_type": "both",
413
+ "provider": "transformers",
414
+ "submitted_by": "kshitij",
415
+ "timestamp": "2025-11-20T10:30:00Z",
416
+ "total_tests": 100
417
+ },
418
+ ...
419
+ ]
420
+ }
421
+ ```
422
+
423
+ ---
424
+
425
+ #### 7. get_leaderboard_summary
426
+
427
+ Get high-level leaderboard statistics with 99% token reduction.
428
+
429
+ **Performance Optimization:** Returns only aggregated statistics instead of raw data, resulting in **99% token reduction**.
430
+
431
+ **When to Use:** Perfect for overview queries like "How many runs are in the leaderboard?", "What's the average success rate?".
432
+
433
+ **Parameters:**
434
+ - `leaderboard_repo` (str): HuggingFace dataset repository
435
+ - Default: `"kshitijthakkar/smoltrace-leaderboard"`
436
+
437
+ **Returns:** JSON string with:
438
+ - Total runs count
439
+ - Unique models and submitters
440
+ - Overall statistics (avg/best/worst success rates, avg cost, avg duration, total CO2)
441
+ - Breakdown by agent type
442
+ - Breakdown by provider
443
+ - Top 3 models by success rate
444
+
445
+ **Benefits:**
446
+ - ✅ Extreme Token Reduction: 99% fewer tokens
447
+ - ✅ Ready to Use: Properly formatted JSON
448
+ - ✅ Comprehensive Stats: Averages, distributions, breakdowns
449
+ - ✅ Quick Insights: Perfect for overview questions
450
+
451
+ **Example Call:**
452
+ ```python
453
+ result = await get_leaderboard_summary(
454
+ leaderboard_repo="kshitijthakkar/smoltrace-leaderboard"
455
+ )
456
+ ```
457
+
458
+ **Example Response:**
459
+ ```json
460
+ {
461
+ "total_runs": 51,
462
+ "unique_models": 12,
463
+ "unique_submitters": 3,
464
+ "overall_stats": {
465
+ "avg_success_rate": 89.2,
466
+ "best_success_rate": 95.8,
467
+ "worst_success_rate": 78.3,
468
+ "avg_cost_usd": 0.012,
469
+ "avg_duration_ms": 3200,
470
+ "total_co2_g": 45.6
471
+ },
472
+ "by_agent_type": {
473
+ "tool": {"count": 20, "avg_success_rate": 88.5},
474
+ "code": {"count": 18, "avg_success_rate": 87.2},
475
+ "both": {"count": 13, "avg_success_rate": 92.1}
476
+ },
477
+ "by_provider": {
478
+ "litellm": {"count": 30, "avg_success_rate": 91.3},
479
+ "transformers": {"count": 21, "avg_success_rate": 86.4}
480
+ },
481
+ "top_3_models": [
482
+ {"model": "openai/gpt-4", "success_rate": 95.8},
483
+ {"model": "anthropic/claude-3", "success_rate": 94.1},
484
+ {"model": "meta-llama/Llama-3.1-8B", "success_rate": 93.4}
485
+ ]
486
+ }
487
+ ```
488
+
489
+ ---
490
+
491
+ ### Data Management Tools
492
+
493
+ #### 8. get_dataset
494
+
495
+ Loads SMOLTRACE datasets from HuggingFace and returns raw data as JSON.
496
+
497
+ **⚠️ Important:** For leaderboard queries, prefer using `get_top_performers()` or `get_leaderboard_summary()` to avoid token bloat!
498
+
499
+ **Security Restriction:** Only datasets with "smoltrace-" in the repository name are allowed.
500
+
501
+ **Parameters:**
502
+ - `dataset_repo` (str, required): HuggingFace dataset repository
503
+ - Must contain "smoltrace-" prefix
504
+ - Format: `"username/smoltrace-type-model"`
505
+ - `split` (str): Dataset split to load
506
+ - Default: `"train"`
507
+ - `limit` (int): Maximum rows to return
508
+ - Range: 1-200
509
+ - Default: 100
510
+
511
+ **Returns:** JSON string with:
512
+ - Total rows in dataset
513
+ - List of column names
514
+ - Array of data rows (up to `limit`)
515
+
516
+ **Primary Use Cases:**
517
+ - Load `smoltrace-results-*` datasets for test case details
518
+ - Load `smoltrace-traces-*` datasets for OpenTelemetry data
519
+ - Load `smoltrace-metrics-*` datasets for GPU metrics
520
+ - **NOT recommended** for leaderboard queries (use optimized tools)
521
+
522
+ **Example Call:**
523
+ ```python
524
+ result = await get_dataset(
525
+ dataset_repo="kshitij/smoltrace-results-gpt4",
526
+ split="train",
527
+ limit=50
528
+ )
529
+ ```
530
+
531
+ ---
532
+
533
+ #### 9. generate_synthetic_dataset
534
+
535
+ Creates domain-specific test datasets for SMOLTRACE evaluations using AI.
536
+
537
+ **Parameters:**
538
+ - `domain` (str, required): Domain for tasks
539
+ - Examples: "e-commerce", "customer service", "finance", "healthcare"
540
+ - `tools` (list[str], required): Available tools
541
+ - Example: `["search_web", "get_weather", "calculator"]`
542
+ - `num_tasks` (int): Number of tasks to generate
543
+ - Range: 1-100
544
+ - Default: 20
545
+ - `difficulty_distribution` (str): Task difficulty mix
546
+ - Options: `"balanced"`, `"easy_only"`, `"medium_only"`, `"hard_only"`, `"progressive"`
547
+ - Default: `"balanced"`
548
+ - `agent_type` (str): Target agent type
549
+ - Options: `"tool"`, `"code"`, `"both"`
550
+ - Default: `"both"`
551
+
552
+ **Returns:** JSON string with:
553
+ - `dataset_info`: Metadata (domain, tools, counts, timestamp)
554
+ - `tasks`: Array of SMOLTRACE-formatted tasks
555
+ - `usage_instructions`: Guide for HuggingFace upload and SMOLTRACE usage
556
+
557
+ **SMOLTRACE Task Format:**
558
+ ```json
559
+ {
560
+ "id": "unique_identifier",
561
+ "prompt": "Clear, specific task for the agent",
562
+ "expected_tool": "tool_name",
563
+ "expected_tool_calls": 1,
564
+ "difficulty": "easy|medium|hard",
565
+ "agent_type": "tool|code",
566
+ "expected_keywords": ["keyword1", "keyword2"]
567
+ }
568
+ ```
569
+
570
+ **Difficulty Calibration:**
571
+ - **Easy** (40%): Single tool call, straightforward input
572
+ - **Medium** (40%): Multiple tool calls OR complex input parsing
573
+ - **Hard** (20%): Multiple tools, complex reasoning, edge cases
574
+
575
+ **Enterprise Use Cases:**
576
+ - Custom Tools: Benchmark proprietary APIs
577
+ - Industry-Specific: Generate tasks for finance, healthcare, legal
578
+ - Internal Workflows: Test company-specific processes
579
+
580
+ **Example Call:**
581
+ ```python
582
+ result = await generate_synthetic_dataset(
583
+ domain="customer service",
584
+ tools=["search_knowledge_base", "create_ticket", "send_email"],
585
+ num_tasks=50,
586
+ difficulty_distribution="balanced",
587
+ agent_type="tool"
588
+ )
589
+ ```
590
+
591
+ ---
592
+
593
+ #### 10. push_dataset_to_hub
594
+
595
+ Upload generated datasets to HuggingFace Hub with proper formatting.
596
+
597
+ **Parameters:**
598
+ - `dataset_name` (str, required): Repository name on HuggingFace
599
+ - Format: `"username/my-dataset"`
600
+ - `data` (str or list, required): Dataset content
601
+ - Can be JSON string or list of dictionaries
602
+ - `description` (str): Dataset description for card
603
+ - Default: Auto-generated
604
+ - `private` (bool): Make dataset private
605
+ - Default: `False`
606
+
607
+ **Returns:** Success message with dataset URL
608
+
609
+ **Example Workflow:**
610
+ 1. Generate synthetic dataset with `generate_synthetic_dataset`
611
+ 2. Review and modify tasks if needed
612
+ 3. Upload to HuggingFace with `push_dataset_to_hub`
613
+ 4. Use in SMOLTRACE evaluations or share with team
614
+
615
+ **Example Call:**
616
+ ```python
617
+ result = await push_dataset_to_hub(
618
+ dataset_name="kshitij/my-custom-evaluation",
619
+ data=generated_tasks,
620
+ description="Custom evaluation dataset for e-commerce agents",
621
+ private=False
622
+ )
623
+ ```
624
+
625
+ ---
626
+
627
+ #### 11. generate_prompt_template
628
+
629
+ Generate customized smolagents prompt template for a specific domain and tool set.
630
+
631
+ **Parameters:**
632
+ - `domain` (str, required): Domain for the prompt template
633
+ - Examples: `"finance"`, `"healthcare"`, `"customer_support"`, `"e-commerce"`
634
+ - `tool_names` (str, required): Comma-separated list of tool names
635
+ - Format: `"tool1,tool2,tool3"`
636
+ - Example: `"get_stock_price,calculate_roi,fetch_company_info"`
637
+ - `agent_type` (str): Agent type
638
+ - Options: `"tool"` (ToolCallingAgent), `"code"` (CodeAgent)
639
+ - Default: `"tool"`
640
+
641
+ **Returns:** JSON response containing:
642
+ - Customized YAML prompt template
643
+ - Metadata (domain, tools, agent_type, timestamp)
644
+ - Usage instructions
645
+
646
+ **Use Case:**
647
+ When you generate synthetic datasets with `generate_synthetic_dataset`, use this tool to create a matching prompt template that agents can use during evaluation. This ensures your evaluation setup is complete and ready to run.
648
+
649
+ **Integration:**
650
+ The generated prompt template can be included in your HuggingFace dataset card, making it easy for anyone to run evaluations with your dataset.
651
+
652
+ **Example Call:**
653
+ ```python
654
+ result = await generate_prompt_template(
655
+ domain="customer_support",
656
+ tool_names="search_knowledge_base,create_ticket,send_email,escalate_to_human",
657
+ agent_type="tool"
658
+ )
659
+ ```
660
+
661
+ **Example Response:**
662
+ ```json
663
+ {
664
+ "prompt_template": "---\nname: customer_support_agent\ndescription: An AI agent for customer support tasks...\n\ninstructions: |-\n You are a helpful customer support agent...\n \n Available tools:\n - search_knowledge_base: Search the knowledge base...\n - create_ticket: Create a support ticket...\n ...",
665
+ "metadata": {
666
+ "domain": "customer_support",
667
+ "tools": ["search_knowledge_base", "create_ticket", "send_email", "escalate_to_human"],
668
+ "agent_type": "tool",
669
+ "base_template": "ToolCallingAgent",
670
+ "timestamp": "2025-11-21T10:30:00Z"
671
+ },
672
+ "usage_instructions": "1. Save the prompt_template to a file (e.g., customer_support_prompt.yaml)\n2. Use with SMOLTRACE: smoltrace-eval --model your-model --prompt-file customer_support_prompt.yaml\n3. Or include in your dataset card for easy evaluation"
673
+ }
674
+ ```
675
+
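+ The first usage instruction can be scripted directly from the JSON response. A minimal sketch in Python (`raw_response` stands for the JSON string returned by the tool; the file-name scheme is just an example):
+
+ ```python
+ import json
+
+ def save_prompt_template(raw_response: str) -> str:
+     """Write the prompt_template field of a generate_prompt_template response to a YAML file."""
+     result = json.loads(raw_response)
+     path = f"{result['metadata']['domain']}_prompt.yaml"  # e.g. customer_support_prompt.yaml
+     with open(path, "w", encoding="utf-8") as f:
+         f.write(result["prompt_template"])
+     return path
+ ```
+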
676
+ ---
677
+
678
+ ## MCP Resources
679
+
680
+ Resources provide direct data access without AI analysis. Access via URI scheme.
681
+
682
+ ### 1. leaderboard://{repo}
683
+
684
+ Direct access to raw leaderboard data in JSON format.
685
+
686
+ **URI Format:**
687
+ ```
688
+ leaderboard://username/dataset-name
689
+ ```
690
+
691
+ **Example:**
692
+ ```
693
+ GET leaderboard://kshitijthakkar/smoltrace-leaderboard
694
+ ```
695
+
696
+ **Returns:** JSON array with all evaluation runs, including:
697
+ - run_id, model, agent_type, provider
698
+ - success_rate, total_tests, successful_tests, failed_tests
699
+ - avg_duration_ms, total_tokens, total_cost_usd, co2_emissions_g
700
+ - results_dataset, traces_dataset, metrics_dataset (references)
701
+ - timestamp, submitted_by, hf_job_id
702
+
703
+ ---
704
+
705
+ ### 2. trace://{trace_id}/{repo}
706
+
707
+ Direct access to trace data with OpenTelemetry spans.
708
+
709
+ **URI Format:**
710
+ ```
711
+ trace://trace_id/username/dataset-name
712
+ ```
713
+
714
+ **Example:**
715
+ ```
716
+ GET trace://trace_abc123/kshitij/agent-traces-gpt4
717
+ ```
718
+
719
+ **Returns:** JSON with:
720
+ - traceId
721
+ - spans array (spanId, parentSpanId, name, kind, startTime, endTime, attributes, status)
722
+
723
+ ---
724
+
725
+ ### 3. cost://model/{model_name}
726
+
727
+ Model pricing and hardware cost information.
728
+
729
+ **URI Format:**
730
+ ```
731
+ cost://model/provider/model-name
732
+ ```
733
+
734
+ **Example:**
735
+ ```
736
+ GET cost://model/openai/gpt-4
737
+ ```
738
+
739
+ **Returns:** JSON with:
740
+ - Model pricing (input/output token costs)
741
+ - Recommended hardware tier
742
+ - Estimated compute costs
743
+ - CO2 emissions per 1K tokens
744
+
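+ **Programmatic access (sketch):**
+
+ Resources can also be read from an MCP client session. A minimal sketch, assuming the official `mcp` Python SDK (`pip install mcp`) and the public SSE endpoint listed in the README; the URI is passed exactly as shown above:
+
+ ```python
+ import asyncio
+ from mcp import ClientSession
+ from mcp.client.sse import sse_client
+
+ SSE_URL = "https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/sse"
+
+ async def read_cost_resource() -> str:
+     async with sse_client(SSE_URL) as (read_stream, write_stream):
+         async with ClientSession(read_stream, write_stream) as session:
+             await session.initialize()
+             # Resources return raw JSON without any AI analysis
+             result = await session.read_resource("cost://model/openai/gpt-4")
+             return result.contents[0].text
+
+ print(asyncio.run(read_cost_resource()))
+ ```
+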
745
+ ---
746
+
747
+ ## MCP Prompts
748
+
749
+ Prompts provide reusable templates for standardized interactions.
750
+
751
+ ### 1. analysis_prompt
752
+
753
+ Templates for different analysis types.
754
+
755
+ **Parameters:**
756
+ - `analysis_type` (str): Type of analysis
757
+ - Options: `"leaderboard"`, `"cost"`, `"performance"`, `"trace"`
758
+ - `focus_area` (str): Specific focus
759
+ - Options: `"overall"`, `"cost"`, `"accuracy"`, `"speed"`, `"eco"`
760
+ - `detail_level` (str): Level of detail
761
+ - Options: `"summary"`, `"detailed"`, `"comprehensive"`
762
+
763
+ **Returns:** Formatted prompt string for use with AI tools
764
+
765
+ **Example:**
766
+ ```python
767
+ prompt = analysis_prompt(
768
+ analysis_type="leaderboard",
769
+ focus_area="cost",
770
+ detail_level="detailed"
771
+ )
772
+ # Returns: "Provide a detailed analysis of cost efficiency in the leaderboard..."
773
+ ```
774
+
775
+ ---
776
+
777
+ ### 2. debug_prompt
778
+
779
+ Templates for debugging scenarios.
780
+
781
+ **Parameters:**
782
+ - `debug_type` (str): Type of debugging
783
+ - Options: `"failure"`, `"performance"`, `"tool_calling"`, `"reasoning"`
784
+ - `context` (str): Additional context
785
+ - Options: `"test_failure"`, `"timeout"`, `"unexpected_tool"`, `"reasoning_loop"`
786
+
787
+ **Returns:** Formatted prompt string
788
+
789
+ **Example:**
790
+ ```python
791
+ prompt = debug_prompt(
792
+ debug_type="performance",
793
+ context="tool_calling"
794
+ )
795
+ # Returns: "Analyze tool calling performance. Identify which tools are slow..."
796
+ ```
797
+
798
+ ---
799
+
800
+ ### 3. optimization_prompt
801
+
802
+ Templates for optimization goals.
803
+
804
+ **Parameters:**
805
+ - `optimization_goal` (str): Optimization target
806
+ - Options: `"cost"`, `"speed"`, `"accuracy"`, `"co2"`
807
+ - `constraints` (str): Constraints to respect
808
+ - Options: `"maintain_quality"`, `"no_accuracy_loss"`, `"budget_limit"`, `"time_limit"`
809
+
810
+ **Returns:** Formatted prompt string
811
+
812
+ **Example:**
813
+ ```python
814
+ prompt = optimization_prompt(
815
+ optimization_goal="cost",
816
+ constraints="maintain_quality"
817
+ )
818
+ # Returns: "Analyze this evaluation setup and recommend cost optimizations..."
819
+ ```
820
+
821
+ ---
822
+
823
+ ## Error Handling
824
+
825
+ ### Common Error Responses
826
+
827
+ **Invalid Dataset Repository:**
828
+ ```json
829
+ {
830
+ "error": "Dataset must contain 'smoltrace-' prefix for security",
831
+ "provided": "username/invalid-dataset"
832
+ }
833
+ ```
834
+
835
+ **Dataset Not Found:**
836
+ ```json
837
+ {
838
+ "error": "Dataset not found on HuggingFace",
839
+ "repository": "username/smoltrace-nonexistent"
840
+ }
841
+ ```
842
+
843
+ **API Rate Limit:**
844
+ ```json
845
+ {
846
+ "error": "Gemini API rate limit exceeded",
847
+ "retry_after": 60
848
+ }
849
+ ```
850
+
851
+ **Invalid Parameters:**
852
+ ```json
853
+ {
854
+ "error": "Invalid parameter value",
855
+ "parameter": "top_n",
856
+ "value": 50,
857
+ "allowed_range": "1-20"
858
+ }
859
+ ```
860
+
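+ A minimal client-side sketch for handling these payloads (the helper name is illustrative, not part of the server API):
+
+ ```python
+ import json
+
+ def parse_tool_response(raw: str) -> dict:
+     """Parse a TraceMind tool response and raise on the error payloads documented above."""
+     payload = json.loads(raw)
+     if isinstance(payload, dict) and "error" in payload:
+         if "retry_after" in payload:
+             # Rate-limit errors carry a retry hint in seconds
+             raise RuntimeError(f"Rate limited, retry in {payload['retry_after']}s")
+         raise ValueError(payload["error"])
+     return payload
+ ```
+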
861
+ ---
862
+
863
+ ## Best Practices
864
+
865
+ ### 1. Token Optimization
866
+
867
+ **DO:**
868
+ - Use `get_top_performers()` for "top N" queries (90% token reduction; see the sketch below)
869
+ - Use `get_leaderboard_summary()` for overview queries (99% token reduction)
870
+ - Set appropriate `limit` when using `get_dataset()`
871
+
872
+ **DON'T:**
873
+ - Use `get_dataset()` for leaderboard queries (loads all 51 runs)
874
+ - Request more data than needed
875
+ - Ignore token optimization tools
876
+
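+ A short sketch of the preferred call order, written in the same style as the Example Call blocks above (the `limit` parameter name for `get_dataset` is assumed from its row-limit description):
+
+ ```python
+ # Overview question: aggregated stats only (99% fewer tokens than the full dataset)
+ summary = await get_leaderboard_summary(
+     leaderboard_repo="kshitijthakkar/smoltrace-leaderboard"
+ )
+
+ # "Top N" question: ranked, trimmed rows (90% fewer tokens)
+ top_cost = await get_top_performers(
+     leaderboard_repo="kshitijthakkar/smoltrace-leaderboard",
+     metric="total_cost_usd",
+     top_n=5,
+ )
+
+ # Full rows only for non-leaderboard datasets, and always with a small limit
+ traces = await get_dataset("username/smoltrace-traces-gpt4", limit=50)
+ ```
+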
877
+ ### 2. AI Tool Usage
878
+
879
+ **DO:**
880
+ - Use AI tools (`analyze_leaderboard`, `debug_trace`) for complex analysis
881
+ - Provide specific questions to `debug_trace` for focused answers
882
+ - Use `focus` parameter in `compare_runs` for targeted comparisons
883
+
884
+ **DON'T:**
885
+ - Use AI tools for simple data retrieval (use resources instead)
886
+ - Make vague requests (be specific for better results)
887
+
888
+ ### 3. Dataset Security
889
+
890
+ **DO:**
891
+ - Only use datasets with "smoltrace-" prefix
892
+ - Verify dataset exists before requesting
893
+ - Use public datasets or authenticate for private ones
894
+
895
+ **DON'T:**
896
+ - Try to access arbitrary HuggingFace datasets
897
+ - Share private dataset URLs without authentication
898
+
899
+ ### 4. Cost Management
900
+
901
+ **DO:**
902
+ - Use `estimate_cost` before running large evaluations
903
+ - Compare cost estimates across different models
904
+ - Consider token-optimized tools to reduce API costs
905
+
906
+ **DON'T:**
907
+ - Skip cost estimation for expensive operations
908
+ - Ignore hardware recommendations
909
+ - Overlook CO2 emissions in decision-making
910
+
911
+ ---
912
+
913
+ ## Support
914
+
915
+ For issues or questions:
916
+ - 📧 GitHub Issues: [TraceMind-mcp-server/issues](https://github.com/Mandark-droid/TraceMind-mcp-server/issues)
917
+ - 💬 HF Discord: `#agents-mcp-hackathon-winter25`
918
+ - 🏷️ Tag: `building-mcp-track-enterprise`
README.md CHANGED
@@ -23,497 +23,143 @@ tags:
23
  <img src="https://raw.githubusercontent.com/Mandark-droid/TraceMind-mcp-server/assets/Logo.png" alt="TraceMind MCP Server Logo" width="200"/>
24
  </p>
25
 
26
- **AI-Powered Analysis Tools for Agent Evaluation Data**
27
 
28
  [![MCP's 1st Birthday Hackathon](https://img.shields.io/badge/MCP%27s%201st%20Birthday-Hackathon-blue)](https://github.com/modelcontextprotocol)
29
- [![Track 1](https://img.shields.io/badge/Track-Building%20MCP%20(Enterprise)-blue)](https://github.com/modelcontextprotocol/hackathon)
30
- [![HF Space](https://img.shields.io/badge/HuggingFace-TraceMind--MCP--Server-yellow?logo=huggingface)](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server)
31
- [![Google Gemini](https://img.shields.io/badge/Powered%20by-Google%20Gemini%202.5%20Pro-orange)](https://ai.google.dev/)
32
 
33
  > **🎯 Track 1 Submission**: Building MCP (Enterprise)
34
  > **📅 MCP's 1st Birthday Hackathon**: November 14-30, 2025
35
 
36
- ## Overview
37
-
38
- TraceMind MCP Server is a Gradio-based MCP (Model Context Protocol) server that provides a complete MCP implementation with:
39
-
40
- ### 🏗️ **Built on Open Source Foundation**
41
 
42
- This MCP server is part of a complete agent evaluation ecosystem built on two foundational open-source projects:
43
 
44
- **🔭 TraceVerde (genai_otel_instrument)** - Automatic OpenTelemetry Instrumentation
45
- - **What**: Zero-code OTEL instrumentation for LLM frameworks (LiteLLM, Transformers, LangChain, etc.)
46
- - **Why**: Captures every LLM call, tool usage, and agent step automatically
47
- - **Links**: [GitHub](https://github.com/Mandark-droid/genai_otel_instrument) | [PyPI](https://pypi.org/project/genai-otel-instrument)
48
 
49
- **📊 SMOLTRACE** - Agent Evaluation Engine
50
- - **What**: Lightweight, production-ready evaluation framework with OTEL tracing built-in
51
- - **Why**: Generates structured datasets (leaderboard, results, traces, metrics) that this MCP server analyzes
52
- - **Links**: [GitHub](https://github.com/Mandark-droid/SMOLTRACE) | [PyPI](https://pypi.org/project/smoltrace/)
53
 
54
- **The Flow**: `TraceVerde` instruments your agents → `SMOLTRACE` evaluates them → `TraceMind MCP Server` provides AI-powered analysis of the results
55
 
56
  ---
57
 
58
- ### 🛠️ **9 AI-Powered & Optimized Tools**
59
- 1. **📊 analyze_leaderboard**: Generate AI-powered insights from evaluation leaderboard data
60
- 2. **🐛 debug_trace**: Debug specific agent execution traces using OpenTelemetry data with AI assistance
61
- 3. **💰 estimate_cost**: Predict evaluation costs before running with AI-powered recommendations
62
- 4. **⚖️ compare_runs**: Compare two evaluation runs with AI-powered analysis
63
- 5. **🏆 get_top_performers**: Get top N models from leaderboard (optimized for quick queries, avoids token bloat)
64
- 6. **📈 get_leaderboard_summary**: Get high-level leaderboard statistics (optimized for overview queries)
65
- 7. **📦 get_dataset**: Load SMOLTRACE datasets (smoltrace-* prefix only) as JSON for flexible analysis
66
- 8. **🧪 generate_synthetic_dataset**: Create domain-specific test datasets for SMOLTRACE evaluations (supports up to 100 tasks with parallel batched generation)
67
- 9. **📤 push_dataset_to_hub**: Upload generated datasets to HuggingFace Hub
68
-
69
- ### 📦 **3 Data Resources**
70
- 1. **leaderboard data**: Direct JSON access to evaluation results
71
- 2. **trace data**: Raw OpenTelemetry trace data with spans
72
- 3. **cost data**: Model pricing and hardware cost information
73
-
74
- ### 📝 **3 Prompt Templates**
75
- 1. **analysis prompts**: Standardized templates for different analysis types
76
- 2. **debug prompts**: Templates for debugging scenarios
77
- 3. **optimization prompts**: Templates for optimization goals
78
-
79
- All analysis is powered by **Google Gemini 2.5 Flash** for intelligent, context-aware insights.
80
-
81
  ## 🔗 Quick Links
82
 
83
- - **Gradio UI**: https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server
84
- - **MCP Endpoint (SSE - Recommended)**: `https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/sse`
85
- - **MCP Endpoint (Streamable HTTP)**: `https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/`
86
- - **Auto-Config**: Add `MCP-1st-Birthday/TraceMind-mcp-server` at https://huggingface.co/settings/mcp
87
-
88
- > 💡 **Tip**: Use the Auto-Config link above for the easiest setup! It generates the correct config for your MCP client automatically.
89
-
90
- ## 📱 Social Media & Demo
91
 
92
- **📢 Announcement Post**: [Coming Soon - X/LinkedIn post]
93
-
94
- **🎥 Demo Video**: [Coming Soon - YouTube/Loom link showing MCP server integration with Claude Desktop]
95
 
96
  ---
97
 
98
- ## Why This MCP Server?
99
-
100
- **Problem**: Agent evaluation generates massive amounts of data (leaderboards, traces, metrics), but developers struggle to:
101
- - Understand which models perform best for their use case
102
- - Debug why specific agent executions failed
103
- - Estimate costs before running expensive evaluations
104
-
105
- **Solution**: This MCP server provides AI-powered analysis tools that connect to HuggingFace datasets and deliver actionable insights in natural language.
106
-
107
- **Impact**: Developers can make informed decisions about agent configurations, debug issues faster, and optimize costs—all through a simple MCP interface.
108
-
109
- ## Features
110
-
111
- ### 🎯 Track 1 Compliance: Building MCP (Enterprise)
112
-
113
- - ✅ **Complete MCP Implementation**: Tools, Resources, AND Prompts
114
- - ✅ **MCP Standard Compliant**: Built with Gradio's native MCP support (`@gr.mcp.*` decorators)
115
- - ✅ **Production-Ready**: Deployable to HuggingFace Spaces with SSE transport
116
- - ✅ **Testing Interface**: Beautiful Gradio UI for testing all components
117
- - ✅ **Enterprise Focus**: Cost optimization, debugging, decision support, and custom dataset generation
118
- - ✅ **Google Gemini Powered**: Leverages Gemini 2.5 Flash for intelligent analysis
119
- - ✅ **17 Total Components**: 11 Tools + 3 Resources + 3 Prompts
120
-
121
- ### 🛠️ Eleven Production-Ready Tools
122
-
123
- #### 1. analyze_leaderboard
124
-
125
- Analyzes evaluation leaderboard data from HuggingFace datasets and provides:
126
- - Top performers by selected metric (accuracy, cost, latency, CO2)
127
- - Trade-off analysis (e.g., "GPT-4 is most accurate but Llama-3.1 is 25x cheaper")
128
- - Trend identification
129
- - Actionable recommendations
130
-
131
- **Example Use Case**: Before choosing a model for production, get AI-powered insights on which configuration offers the best cost/performance for your requirements.
132
-
133
- #### 2. debug_trace
134
-
135
- Analyzes OpenTelemetry trace data and answers specific questions like:
136
- - "Why was tool X called twice?"
137
- - "Which step took the most time?"
138
- - "Why did this test fail?"
139
-
140
- **Example Use Case**: When an agent test fails, understand exactly what happened without manually parsing trace spans.
141
-
142
- #### 3. estimate_cost
143
-
144
- Predicts costs before running evaluations:
145
- - LLM API costs (token-based)
146
- - HuggingFace Jobs compute costs
147
- - CO2 emissions estimate
148
- - Hardware recommendations
149
-
150
- **Example Use Case**: Compare the cost of evaluating GPT-4 vs Llama-3.1 across 1000 tests before committing resources.
151
-
152
- #### 4. compare_runs
153
-
154
- Compares two evaluation runs with AI-powered analysis across multiple dimensions:
155
- - Success rate comparison with statistical significance
156
- - Cost efficiency analysis (total cost, cost per test, cost per successful test)
157
- - Speed comparison (average duration, throughput)
158
- - Environmental impact (CO2 emissions per test)
159
- - GPU efficiency (for GPU jobs)
160
-
161
- **Focus Options**:
162
- - `comprehensive`: Complete comparison across all dimensions
163
- - `cost`: Detailed cost efficiency and ROI analysis
164
- - `performance`: Speed and accuracy trade-off analysis
165
- - `eco_friendly`: Environmental impact and carbon footprint comparison
166
-
167
- **Example Use Case**: After running evaluations with two different models, compare them head-to-head to determine which is better for production deployment based on your priorities (accuracy, cost, speed, or environmental impact).
168
-
169
- #### 5. get_top_performers
170
-
171
- Get top performing models from leaderboard with optimized token usage.
172
-
173
- **⚡ Performance Optimization**: This tool returns only the top N models (5-20 runs) instead of loading the full leaderboard dataset (51 runs), resulting in **90% token reduction** compared to using `get_dataset()`.
174
-
175
- **When to Use**: Perfect for queries like:
176
- - "Which model is leading?"
177
- - "Show me the top 5 models"
178
- - "What's the best model for cost efficiency?"
179
-
180
- **Parameters**:
181
- - `leaderboard_repo` (str): HuggingFace dataset repository (default: "kshitijthakkar/smoltrace-leaderboard")
182
- - `metric` (str): Metric to rank by - "success_rate", "total_cost_usd", "avg_duration_ms", or "co2_emissions_g" (default: "success_rate")
183
- - `top_n` (int): Number of top models to return (range: 1-20, default: 5)
184
-
185
- **Returns**: Properly formatted JSON with:
186
- - Metric used for ranking
187
- - Ranking order (ascending/descending)
188
- - Total runs in leaderboard
189
- - Array of top performers with essential fields only (10 fields vs 20+ in full dataset)
190
-
191
- **Benefits**:
192
- - ✅ **Token Reduction**: Returns 5-20 runs instead of all 51 runs (90% fewer tokens)
193
- - ✅ **Ready to Use**: Properly formatted JSON (no parsing needed, no string conversion issues)
194
- - ✅ **Pre-Sorted**: Already sorted by your chosen metric
195
- - ✅ **Essential Data Only**: Includes only 10 essential columns to minimize token usage
196
-
197
- **Example Use Case**: An agent needs to quickly answer "What are the top 3 most cost-effective models?" without consuming excessive tokens by loading the entire leaderboard dataset.
198
-
199
- #### 6. get_leaderboard_summary
200
-
201
- Get high-level leaderboard statistics without loading individual runs.
202
-
203
- **⚡ Performance Optimization**: This tool returns only aggregated statistics instead of raw data, resulting in **99% token reduction** compared to using `get_dataset()` on the full leaderboard.
204
-
205
- **When to Use**: Perfect for overview queries like:
206
- - "How many runs are in the leaderboard?"
207
- - "What's the average success rate across all models?"
208
- - "Give me an overview of evaluation results"
209
-
210
- **Parameters**:
211
- - `leaderboard_repo` (str): HuggingFace dataset repository (default: "kshitijthakkar/smoltrace-leaderboard")
212
-
213
- **Returns**: Properly formatted JSON with:
214
- - Total runs count
215
- - Unique models and submitters count
216
- - Overall statistics (avg/best/worst success rates, avg cost, avg duration, total CO2)
217
- - Breakdown by agent type (tool/code/both)
218
- - Breakdown by provider (litellm/transformers)
219
- - Top 3 models by success rate
220
-
221
- **Benefits**:
222
- - ✅ **Extreme Token Reduction**: Returns summary stats instead of 51 runs (99% fewer tokens)
223
- - ✅ **Ready to Use**: Properly formatted JSON (no parsing needed)
224
- - ✅ **Comprehensive Stats**: Includes averages, distributions, and breakdowns
225
- - ✅ **Quick Insights**: Perfect for "overview" and "summary" questions
226
-
227
- **Example Use Case**: An agent needs to provide a high-level overview of evaluation results without loading 51 individual runs and consuming 50K+ tokens.
228
-
229
- #### 7. get_dataset
230
-
231
- Loads SMOLTRACE datasets from HuggingFace and returns raw data as JSON:
232
- - Simple, flexible tool that returns complete dataset with metadata
233
- - Works with any dataset containing "smoltrace-" prefix
234
- - Returns total rows, columns list, and data array
235
- - Automatically sorts by timestamp if available
236
- - Configurable row limit (1-200) to manage token usage
237
-
238
- **⚠️ Important**: For leaderboard queries, **prefer using `get_top_performers()` or `get_leaderboard_summary()` instead** - they're specifically optimized to avoid token bloat!
239
-
240
- **Security Restriction**: Only datasets with "smoltrace-" in the repository name are allowed.
241
-
242
- **Primary Use Cases**:
243
- - Load `smoltrace-results-*` datasets to see individual test case details
244
- - Load `smoltrace-traces-*` datasets to access OpenTelemetry trace data
245
- - Load `smoltrace-metrics-*` datasets to get GPU performance data
246
- - For leaderboard queries: **Use `get_top_performers()` or `get_leaderboard_summary()` instead!**
247
-
248
- **Recommended Workflow**:
249
- 1. For overview: Use `get_leaderboard_summary()` (99% token reduction)
250
- 2. For top N queries: Use `get_top_performers()` (90% token reduction)
251
- 3. For specific run IDs: Use `get_dataset()` only when you need non-leaderboard datasets
252
-
253
- **Example Use Case**: When you need to load trace data or results data for a specific run, use `get_dataset("username/smoltrace-traces-gpt4")`. For leaderboard queries, use the optimized tools instead.
254
-
255
- #### 8. generate_synthetic_dataset
256
-
257
- Generates domain-specific synthetic test datasets for SMOLTRACE evaluations using Google Gemini 2.5 Flash:
258
- - AI-powered task generation tailored to your domain
259
- - Custom tool specifications
260
- - Configurable difficulty distribution (balanced, easy_only, medium_only, hard_only, progressive)
261
- - Target specific agent types (tool, code, or both)
262
- - Output follows SMOLTRACE task format exactly
263
- - Supports up to 100 tasks with parallel batched generation
264
-
265
- **SMOLTRACE Task Format**:
266
- Each generated task includes:
267
- ```json
268
- {
269
- "id": "unique_identifier",
270
- "prompt": "Clear, specific task for the agent",
271
- "expected_tool": "tool_name",
272
- "expected_tool_calls": 1,
273
- "difficulty": "easy|medium|hard",
274
- "agent_type": "tool|code",
275
- "expected_keywords": ["keyword1", "keyword2"]
276
- }
277
- ```
278
-
279
- **Enterprise Use Cases**:
280
- - **Custom Tools**: Create benchmarks for your proprietary APIs and tools
281
- - **Industry-Specific**: Generate tasks for finance, healthcare, legal, manufacturing, etc.
282
- - **Internal Workflows**: Test agents on company-specific processes
283
- - **Rapid Prototyping**: Quickly create evaluation datasets without manual curation
284
-
285
- **Difficulty Calibration**:
286
- - **Easy** (40%): Single tool call, straightforward input, clear expected output
287
- - **Medium** (40%): Multiple tool calls OR complex input parsing OR conditional logic
288
- - **Hard** (20%): Multiple tools, complex reasoning, edge cases, error handling
289
-
290
- **Output Includes**:
291
- - `dataset_info`: Metadata (domain, tools, counts, timestamp)
292
- - `tasks`: Ready-to-use SMOLTRACE task array
293
- - `usage_instructions`: Step-by-step guide for HuggingFace upload and SMOLTRACE usage
294
-
295
- **Example Use Case**: A financial services company wants to evaluate their customer service agent that uses custom tools for stock quotes, portfolio analysis, and transaction processing. They use this tool to generate 50 realistic tasks covering common customer inquiries across different difficulty levels, then run SMOLTRACE evaluations to benchmark different LLM models before deployment.
296
-
297
- #### 9. push_dataset_to_hub
298
-
299
- Upload generated datasets to HuggingFace Hub with proper formatting and metadata:
300
- - Automatically formats data for HuggingFace datasets library
301
- - Handles authentication via HF_TOKEN
302
- - Validates dataset structure before upload
303
- - Supports both public and private datasets
304
- - Adds comprehensive metadata (description, tags, license)
305
- - Creates dataset card with usage instructions
306
-
307
- **Parameters**:
308
- - `dataset_name`: Repository name on HuggingFace (e.g., "username/my-dataset")
309
- - `data`: Dataset content (list of dictionaries or JSON string)
310
- - `description`: Dataset description for the card
311
- - `private`: Whether to make the dataset private (default: False)
312
-
313
- **Example Workflow**:
314
- 1. Generate synthetic dataset with `generate_synthetic_dataset`
315
- 2. Review and modify tasks if needed
316
- 3. Upload to HuggingFace with `push_dataset_to_hub`
317
- 4. Use in SMOLTRACE evaluations or share with team
318
-
319
- **Example Use Case**: After generating a custom evaluation dataset for your domain, upload it to HuggingFace to share with your team, version control your benchmarks, or make it publicly available for the community.
320
-
321
-
322
- ## MCP Resources Usage
323
-
324
- Resources provide direct data access without AI analysis:
325
-
326
- ```python
327
- # Access leaderboard data
328
- GET leaderboard://kshitijthakkar/smoltrace-leaderboard
329
- # Returns: JSON with all evaluation runs
330
-
331
- # Access specific trace
332
- GET trace://trace_abc123/username/agent-traces-gpt4
333
- # Returns: JSON with trace spans and attributes
334
-
335
- # Get model cost information
336
- GET cost://model/openai/gpt-4
337
- # Returns: JSON with pricing and hardware costs
338
- ```
339
-
340
- ## MCP Prompts Usage
341
-
342
- Prompts provide reusable templates for standardized interactions:
343
-
344
- ```python
345
- # Get analysis prompt template
346
- analysis_prompt(analysis_type="leaderboard", focus_area="cost", detail_level="detailed")
347
- # Returns: "Provide a detailed analysis. Analyze cost efficiency in the leaderboard..."
348
-
349
- # Get debug prompt template
350
- debug_prompt(debug_type="performance", context="tool_calling")
351
- # Returns: "Analyze tool calling performance. Identify which tools are slow..."
352
-
353
- # Get optimization prompt template
354
- optimization_prompt(optimization_goal="cost", constraints="maintain_quality")
355
- # Returns: "Analyze this evaluation setup and recommend cost optimizations..."
356
- ```
357
-
358
- Use these prompts when interacting with the tools to get consistent, high-quality analysis.
359
-
360
- ## Quick Start
361
-
362
- ### 1. Installation
363
-
364
- ```bash
365
- git clone https://github.com/Mandark-droid/TraceMind-mcp-server.git
366
- cd TraceMind-mcp-server
367
-
368
- # Create virtual environment
369
- python -m venv venv
370
- source venv/bin/activate # On Windows: venv\Scripts\activate
371
-
372
- # Install dependencies (note: gradio[mcp] includes MCP support)
373
- pip install -r requirements.txt
374
- ```
375
-
376
- ### 2. Environment Setup
377
-
378
- Create `.env` file:
379
-
380
- ```bash
381
- cp .env.example .env
382
- # Edit .env and add your API keys
383
- ```
384
-
385
- Get your keys:
386
- - **Gemini API Key**: https://ai.google.dev/
387
- - **HuggingFace Token**: https://huggingface.co/settings/tokens
388
-
389
- ### 3. Run Locally
390
-
391
- ```bash
392
- python app.py
393
- ```
394
 
395
- Open http://localhost:7860 to test the tools via Gradio interface.
396
 
397
- ### 4. Test with Live Data
398
-
399
- Try the live example with real HuggingFace dataset:
400
-
401
- **In the Gradio UI, Tab "📊 Analyze Leaderboard":**
402
 
403
  ```
404
- Leaderboard Repository: kshitijthakkar/smoltrace-leaderboard
405
- Metric Focus: overall
406
- Time Range: last_week
407
- Top N Models: 5
 
 
 
 
 
 
 
 
 
 
 
 
408
  ```
409
 
410
- Click "🔍 Analyze" and get AI-powered insights from live data!
411
-
412
- ## 🎯 For Hackathon Judges & Visitors
413
-
414
- ### Using Your Own API Keys (Recommended)
415
-
416
- This MCP server has pre-configured API keys in HuggingFace Spaces Secrets for quick testing. However, **to prevent credit issues during evaluation**, we strongly recommend using your own API keys:
417
-
418
- #### Option 1: Configure in MCP Server UI (Simplest)
419
-
420
- 1. **Open the MCP Server Space**: https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server
421
- 2. Navigate to the **⚙️ Settings** tab
422
- 3. Enter your own **Gemini API Key** and **HuggingFace Token**
423
- 4. Click **"Save & Override Keys"**
424
- 5. ✅ Your keys will be used for all MCP tool calls in this session
425
-
426
- **Then you can**:
427
- - Use any tool in the tabs above
428
- - Connect from TraceMind-AI (it will automatically use your keys configured here)
429
- - Test with Claude Desktop (will use your keys)
430
-
431
- #### Option 2: For TraceMind-AI Integration
432
-
433
- If you're testing the complete TraceMind platform (Track 2 - MCP in Action):
434
-
435
- 1. **Configure MCP Server** (as described above)
436
- 2. **Open TraceMind-AI**: https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind
437
- 3. Navigate to **⚙️ Settings** in TraceMind-AI
438
- 4. Enter your API keys there as well
439
- 5. ✅ Both apps will use your keys
440
 
441
- ### Why Two Settings Screens?
 
442
 
443
- - **TraceMind-AI** (Track 2) is the user-facing UI - calls MCP server for intelligent analysis
444
- - **TraceMind MCP Server** (Track 1) is the backend service - provides MCP tools
445
- - They run in **separate browser sessions** → need separate configuration
446
- - Configuring both ensures your keys are used throughout the evaluation flow
447
 
448
- ### Getting Free API Keys
449
 
450
- Both APIs have generous free tiers perfect for hackathon evaluation:
 
 
451
 
452
- **Google Gemini API Key**:
453
- - Go to https://ai.google.dev/
454
- - Click "Get API Key" Create project → Generate key
455
- - **Free tier**: 1,500 requests/day
456
 
457
- **HuggingFace Token**:
458
- - Go to https://huggingface.co/settings/tokens
459
- - Click "New token" → Name it (e.g., "TraceMind Access")
460
- - **Permissions**:
461
- - Select "Read" for viewing datasets (sufficient for most tools)
462
- - Select "Write" if you want to use `push_dataset_to_hub` tool to upload synthetic datasets
463
- - **Recommended**: Use "Write" permissions for full functionality
464
- - No rate limits for public dataset access
465
 
466
- ### Default Configuration (If You Don't Configure)
467
 
468
- If you don't configure your own keys, the MCP server will use our pre-configured keys from HuggingFace Spaces Secrets. This is fine for quick testing, but please note:
469
- - Uses our API credits
470
- - May hit rate limits during high traffic
471
- - Recommended only for brief testing
472
 
473
- ## MCP Integration
 
 
 
 
 
474
 
475
- ### How It Works
 
 
476
 
477
- This Gradio app uses `mcp_server=True` in the launch configuration, which automatically:
478
- - Exposes all async functions with proper docstrings as MCP tools
479
- - Handles MCP protocol communication
480
- - Provides MCP interfaces via:
481
- - **Streamable HTTP** (recommended) - Modern streaming protocol
482
- - **SSE** (deprecated) - Server-Sent Events for legacy compatibility
483
 
484
- ### Connecting from MCP Clients
485
 
486
- Once deployed to HuggingFace Spaces, your MCP server will be available at:
 
 
 
487
 
488
- **🎯 MCP Endpoint (SSE - Recommended)**:
489
- ```
490
- https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/sse
491
- ```
492
 
493
- **MCP Endpoint (Streamable HTTP)**:
494
- ```
495
- https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/
496
- ```
497
-
498
- **Note**: Both SSE and Streamable HTTP endpoints are fully supported. The SSE endpoint is recommended for most MCP clients.
499
 
500
- ### Easiest Way to Connect
501
 
502
- **Recommended for all users** - HuggingFace provides an automatic configuration generator:
503
 
504
- 1. **Visit**: https://huggingface.co/settings/mcp (while logged in)
505
- 2. **Add Space**: Enter `MCP-1st-Birthday/TraceMind-mcp-server`
506
- 3. **Select Client**: Choose Claude Desktop, VSCode, Cursor, etc.
507
- 4. **Copy Config**: Get the auto-generated configuration snippet
508
- 5. **Paste & Restart**: Add to your client's config file and restart
509
 
510
- This automatically configures the correct endpoint URL and transport method for your chosen client!
511
 
512
- ### 🔧 Manual Configuration (Advanced)
 
 
 
 
513
 
514
- If you prefer to manually configure your MCP client:
515
 
516
- **Claude Desktop (`claude_desktop_config.json`)**:
517
  ```json
518
  {
519
  "mcpServers": {
@@ -525,7 +171,7 @@ If you prefer to manually configure your MCP client:
525
  }
526
  ```
527
 
528
- **VSCode / Cursor (`settings.json` or `.cursor/mcp.json`)**:
529
  ```json
530
  {
531
  "mcp.servers": {
@@ -537,375 +183,145 @@ If you prefer to manually configure your MCP client:
537
  }
538
  ```
539
 
540
- **Cline / Other MCP Clients**:
541
- - **URL**: `https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/sse`
542
- - **Transport**: `sse` (or use streamable HTTP endpoint with `streamable-http` transport)
543
-
544
- ### ❓ Connection FAQ
545
-
546
- **Q: Which endpoint should I use?**
547
- A: Use the **Streamable HTTP endpoint** (`/gradio_api/mcp/`) for all new connections. It's the modern, recommended protocol.
548
-
549
- **Q: My client only supports SSE. What should I do?**
550
- A: Use the SSE endpoint (`/gradio_api/mcp/sse`) for now, but note that it's deprecated. Consider upgrading your client if possible.
551
-
552
- **Q: What's the difference between the two transports?**
553
- A: Streamable HTTP is the newer, more efficient protocol with better error handling and performance. SSE is the legacy protocol being phased out.
554
-
555
- **Q: How do I test if my connection works?**
556
- A: After configuring your client, restart it and look for "tracemind" in your available MCP tools/servers. You should see 7 tools, 3 resources, and 3 prompts.
557
-
558
- **Q: Can I use this MCP server without authentication?**
559
- A: The MCP endpoint is publicly accessible. However, the tools may require HuggingFace datasets to be public or accessible with your HF token (configured server-side).
560
-
561
- ### Available MCP Components
562
-
563
- **Tools** (9):
564
- 1. **analyze_leaderboard**: AI-powered leaderboard analysis with Gemini 2.5 Flash
565
- 2. **debug_trace**: Trace debugging with AI insights
566
- 3. **estimate_cost**: Cost estimation with optimization recommendations
567
- 4. **compare_runs**: Compare two evaluation runs with AI-powered analysis
568
- 5. **get_top_performers**: Get top N models from leaderboard (optimized, 90% token reduction)
569
- 6. **get_leaderboard_summary**: Get leaderboard statistics (optimized, 99% token reduction)
570
- 7. **get_dataset**: Load SMOLTRACE datasets (smoltrace-* only) as JSON
571
- 8. **generate_synthetic_dataset**: Create domain-specific test datasets with AI
572
- 9. **push_dataset_to_hub**: Upload datasets to HuggingFace Hub
573
-
574
- **Resources** (3):
575
- 1. **leaderboard://{repo}**: Direct access to raw leaderboard data in JSON
576
- 2. **trace://{trace_id}/{repo}**: Direct access to trace data with spans
577
- 3. **cost://model/{model_name}**: Model pricing and hardware cost information
578
-
579
- **Prompts** (3):
580
- 1. **analysis_prompt**: Reusable templates for different analysis types
581
- 2. **debug_prompt**: Reusable templates for debugging scenarios
582
- 3. **optimization_prompt**: Reusable templates for optimization goals
583
-
584
- See full API documentation in the Gradio interface under "📖 API Documentation" tab.
585
-
586
- ## Architecture
587
 
 
588
  ```
589
- TraceMind-mcp-server/
590
- ├── app.py # Gradio UI + MCP server (mcp_server=True)
591
- ├── gemini_client.py # Google Gemini 2.5 Flash integration
592
- ├── mcp_tools.py # 7 tool implementations
593
- ├── requirements.txt # Python dependencies
594
- ├── .env.example # Environment variable template
595
- ├── .gitignore
596
- └── README.md
597
  ```
598
 
599
- **Key Technologies**:
600
- - **Gradio 6 with MCP support**: `gradio[mcp]` provides native MCP server capabilities
601
- - **Google Gemini 2.5 Flash**: Latest AI model for intelligent analysis
602
- - **HuggingFace Datasets**: Data source for evaluations
603
- - **Streamable HTTP Transport**: Modern streaming protocol for MCP communication (recommended)
604
- - **SSE Transport**: Server-Sent Events for legacy MCP compatibility (deprecated)
605
-
606
- ## Deploy to HuggingFace Spaces
607
-
608
- ### 1. Create Space
609
-
610
- Go to https://huggingface.co/new-space
611
-
612
- - **Space name**: `TraceMind-mcp-server`
613
- - **License**: AGPL-3.0
614
- - **SDK**: Gradio
615
- - **Hardware**: CPU Basic (free tier works fine)
616
-
617
- ### 2. Add Files
618
-
619
- Upload all files from this repository to your Space:
620
- - `app.py`
621
- - `gemini_client.py`
622
- - `mcp_tools.py`
623
- - `requirements.txt`
624
- - `README.md`
625
 
626
- ### 3. Add Secrets
627
 
628
- In Space settings → Variables and secrets, add:
629
- - `GEMINI_API_KEY`: Your Gemini API key
630
- - `HF_TOKEN`: Your HuggingFace token
631
-
632
- ### 4. Add Hackathon Tag
633
-
634
- In Space settings → Tags, add:
635
- - `building-mcp-track-enterprise`
636
-
637
- ### 5. Access Your MCP Server
638
-
639
- Your MCP server will be publicly available at:
640
-
641
- **Gradio UI**:
642
- ```
643
- https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server
644
- ```
645
-
646
- **MCP Endpoint (SSE - Recommended)**:
647
- ```
648
- https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/sse
649
- ```
650
-
651
- **MCP Endpoint (Streamable HTTP)**:
652
- ```
653
- https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/
654
- ```
655
-
656
- Use the **Easiest Way to Connect** section above to configure your MCP client automatically!
657
-
658
- ## Testing
659
-
660
- ### Test 1: Analyze Leaderboard (Live Data)
661
-
662
- ```bash
663
- # In Gradio UI - Tab "📊 Analyze Leaderboard":
664
- Repository: kshitijthakkar/smoltrace-leaderboard
665
- Metric: overall
666
- Time Range: last_week
667
- Top N: 5
668
- Click "🔍 Analyze"
669
- ```
670
-
671
- **Expected**: AI-generated analysis of top performing models from live HuggingFace dataset
672
-
673
- ### Test 2: Estimate Cost
674
-
675
- ```bash
676
- # In Gradio UI - Tab "💰 Estimate Cost":
677
- Model: openai/gpt-4
678
- Agent Type: both
679
- Number of Tests: 100
680
- Hardware: auto
681
- Click "💰 Estimate"
682
- ```
683
-
684
- **Expected**: Cost breakdown with LLM costs, HF Jobs costs, duration, and CO2 estimate
685
-
686
- ### Test 3: Debug Trace
687
-
688
- Note: This requires actual trace data from an evaluation run. For testing purposes, this will show an error about missing data, which is expected behavior.
689
-
690
- ## Hackathon Submission
691
-
692
- ### Track 1: Building MCP (Enterprise)
693
-
694
- **Tag**: `building-mcp-track-enterprise`
695
 
696
- **Why Enterprise Track?**
697
- - Solves real business problems (cost optimization, debugging, decision support)
698
- - Production-ready tools with clear ROI
699
- - Integrates with enterprise data infrastructure (HuggingFace datasets)
700
 
701
- **Technology Stack**
702
- - **AI Analysis**: Google Gemini 2.5 Flash for all intelligent insights
703
- - **MCP Framework**: Gradio 6 with native MCP support
704
- - **Data Source**: HuggingFace Datasets
705
- - **Transport**: Streamable HTTP (recommended) and SSE (deprecated)
706
 
707
- ## Related Project: TraceMind-AI (Track 2)
708
 
709
- This MCP server is designed to be consumed by **[TraceMind-AI](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind)** (separate submission for Track 2: MCP in Action).
710
 
711
- **Links**:
712
- - **Live Demo**: https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind
713
- - **GitHub**: https://github.com/Mandark-droid/TraceMind-AI
 
 
 
714
 
715
- TraceMind-AI is a Gradio-based agent evaluation platform that uses these MCP tools to provide:
716
- - AI-powered leaderboard insights with autonomous agent chat
717
- - Interactive trace debugging with MCP-powered Q&A
718
- - Real-time cost estimation and comparison
719
- - Complete evaluation workflow visualization
720
 
721
- ## File Descriptions
 
 
 
 
722
 
723
- ### app.py
724
- Main Gradio application with:
725
- - Testing UI for all 7 tools
726
- - MCP server enabled via `mcp_server=True`
727
- - API documentation
728
 
729
- ### gemini_client.py
730
- Google Gemini 2.5 Flash client that:
731
- - Handles API authentication
732
- - Provides specialized analysis methods for different data types
733
- - Formats prompts for optimal results
734
- - Uses `gemini-2.5-pro-latest` model (can switch to `gemini-2.5-flash-latest`)
735
 
736
- ### mcp_tools.py
737
- Complete MCP implementation with 13 components:
738
 
739
- **Tools** (9 async functions):
740
- - `analyze_leaderboard()`: AI-powered leaderboard analysis
741
- - `debug_trace()`: AI-powered trace debugging
742
- - `estimate_cost()`: AI-powered cost estimation
743
- - `compare_runs()`: AI-powered run comparison
744
- - `get_top_performers()`: Optimized tool to get top N models (90% token reduction)
745
- - `get_leaderboard_summary()`: Optimized tool for leaderboard statistics (99% token reduction)
746
- - `get_dataset()`: Load SMOLTRACE datasets as JSON (use optimized tools for leaderboard!)
747
- - `generate_synthetic_dataset()`: Create domain-specific test datasets with AI
748
- - `push_dataset_to_hub()`: Upload datasets to HuggingFace Hub
 
 
 
 
 
 
 
 
 
 
 
 
749
 
750
- **Resources** (3 decorated functions with `@gr.mcp.resource()`):
751
- - `get_leaderboard_data()`: Raw leaderboard JSON data
752
- - `get_trace_data()`: Raw trace JSON data with spans
753
- - `get_cost_data()`: Model pricing and hardware cost JSON
754
 
755
- **Prompts** (3 decorated functions with `@gr.mcp.prompt()`):
756
- - `analysis_prompt()`: Templates for different analysis types
757
- - `debug_prompt()`: Templates for debugging scenarios
758
- - `optimization_prompt()`: Templates for optimization goals
759
 
760
- Each function includes:
761
- - Appropriate decorator (`@gr.mcp.tool()`, `@gr.mcp.resource()`, or `@gr.mcp.prompt()`)
762
- - Detailed docstring with "Args:" section
763
- - Type hints for all parameters and return values
764
- - Descriptive function name (becomes the MCP component name)
765
 
766
- ## Environment Variables
767
 
768
- Required environment variables:
769
 
770
  ```bash
771
- GEMINI_API_KEY=your_gemini_api_key_here
772
- HF_TOKEN=your_huggingface_token_here
773
- ```
774
-
775
- ## Development
776
-
777
- ### Running Tests
778
 
779
- ```bash
780
- # Test Gemini client
781
- python -c "from gemini_client import GeminiClient; client = GeminiClient(); print('✅ Gemini client initialized')"
782
 
783
- # Test with live leaderboard data
784
  python app.py
785
- # Open browser, test "Analyze Leaderboard" tab
786
- ```
787
-
788
- ### Adding New Tools
789
-
790
- To add a new MCP tool (with Gradio's native MCP support):
791
-
792
- 1. **Add function to `mcp_tools.py`** with proper docstring:
793
- ```python
794
- async def your_new_tool(
795
- gemini_client: GeminiClient,
796
- param1: str,
797
- param2: int = 10
798
- ) -> str:
799
- """
800
- Brief description of what the tool does.
801
-
802
- Longer description explaining the tool's purpose and behavior.
803
-
804
- Args:
805
- gemini_client (GeminiClient): Initialized Gemini client for AI analysis
806
- param1 (str): Description of param1 with examples if helpful
807
- param2 (int): Description of param2. Default: 10
808
-
809
- Returns:
810
- str: Description of what the function returns
811
- """
812
- # Your implementation
813
- return result
814
- ```
815
-
816
- 2. **Add UI tab in `app.py`** (optional, for testing):
817
- ```python
818
- with gr.Tab("Your Tool"):
819
- # Add UI components
820
- # Wire up to your_new_tool()
821
  ```
822
 
823
- 3. That's it! Gradio automatically exposes it as an MCP tool based on:
824
- - Function name (becomes tool name)
825
- - Docstring (becomes tool description)
826
- - Args section (becomes parameter descriptions)
827
- - Type hints (become parameter types)
828
-
829
- ### Switching to Gemini 2.5 Flash
830
-
831
- For faster (but slightly less capable) responses, switch to Gemini 2.5 Flash:
832
-
833
- ```python
834
- # In app.py, change:
835
- gemini_client = GeminiClient(model_name="gemini-2.5-flash-latest")
836
- ```
837
-
838
- ## 🙏 Credits & Acknowledgments
839
-
840
- ### Hackathon Sponsors
841
 
842
- Special thanks to the sponsors of **MCP's 1st Birthday Hackathon** (November 14-30, 2025):
843
-
844
- - **🤗 HuggingFace** - Hosting platform and dataset infrastructure
845
- - **🧠 Google Gemini** - AI analysis powered by Gemini 2.5 Flash API
846
- - **⚡ Modal** - Serverless infrastructure partner
847
- - **🏢 Anthropic** - MCP protocol creators
848
- - **🎨 Gradio** - Native MCP framework support
849
- - **🎙️ ElevenLabs** - Audio AI capabilities
850
- - **🦙 SambaNova** - High-performance AI infrastructure
851
- - **🎯 Blaxel** - Additional compute credits
852
 
853
- ### Related Open Source Projects
854
 
855
- This MCP server builds upon our open source agent evaluation ecosystem:
 
 
 
856
 
857
- #### 📊 SMOLTRACE - Agent Evaluation Engine
858
- - **Description**: Lightweight, production-ready evaluation framework for AI agents with OpenTelemetry instrumentation
859
- - **GitHub**: [https://github.com/Mandark-droid/SMOLTRACE](https://github.com/Mandark-droid/SMOLTRACE)
860
- - **PyPI**: [https://pypi.org/project/smoltrace/](https://pypi.org/project/smoltrace/)
861
 
862
- #### 🔭 TraceVerde - GenAI OpenTelemetry Instrumentation
863
- - **Description**: Automatic OpenTelemetry instrumentation for LLM frameworks (LiteLLM, Transformers, LangChain, etc.)
864
- - **GitHub**: [https://github.com/Mandark-droid/genai_otel_instrument](https://github.com/Mandark-droid/genai_otel_instrument)
865
- - **PyPI**: [https://pypi.org/project/genai-otel-instrument](https://pypi.org/project/genai-otel-instrument)
866
 
867
- ### Built By
868
 
 
869
  **Track**: Building MCP (Enterprise)
870
  **Author**: Kshitij Thakkar
871
  **Powered by**: Google Gemini 2.5 Flash
872
  **Built with**: Gradio (native MCP support)
873
 
874
- ---
875
 
876
- ## 📄 License
877
 
878
- AGPL-3.0 License
879
 
880
- This project is licensed under the GNU Affero General Public License v3.0. See the LICENSE file for details.
881
 
882
  ---
883
 
884
- ## 💬 Support
885
-
886
- For issues or questions:
887
- - 📧 Open an issue on GitHub
888
- - 💬 Join the [HuggingFace Discord](https://discord.gg/huggingface) - Channel: `#agents-mcp-hackathon-winter25`
889
- - 🏷️ Tag `building-mcp-track-enterprise` for hackathon-related questions
890
- - 🐦 Follow us on X: [@TraceMindAI](https://twitter.com/TraceMindAI) (placeholder)
891
-
892
- ## Changelog
893
-
894
- ### v1.0.0 (2025-11-14)
895
- - Initial release for MCP Hackathon
896
- - **Complete MCP Implementation**: 17 components total
897
- - 11 AI-powered and optimized tools:
898
- - analyze_leaderboard, debug_trace, estimate_cost, compare_runs, analyze_results (AI-powered analysis)
899
- - get_top_performers, get_leaderboard_summary (optimized for token reduction)
900
- - get_dataset, generate_synthetic_dataset, generate_prompt_template, push_dataset_to_hub (data management)
901
- - 3 data resources (leaderboard, trace, cost data)
902
- - 3 prompt templates (analysis, debug, optimization)
903
- - Gradio native MCP support with decorators (`@gr.mcp.*`)
904
- - Google Gemini 2.5 Flash integration for all AI analysis
905
- - Live HuggingFace dataset integration
906
- - **Performance Optimizations**:
907
- - get_top_performers: 90% token reduction vs full leaderboard
908
- - get_leaderboard_summary: 99% token reduction vs full leaderboard
909
- - Proper JSON serialization (no string conversion issues)
910
- - SSE transport for MCP communication
911
- - Production-ready for HuggingFace Spaces deployment
 
23
  <img src="https://raw.githubusercontent.com/Mandark-droid/TraceMind-mcp-server/assets/Logo.png" alt="TraceMind MCP Server Logo" width="200"/>
24
  </p>
25
 
26
+ **AI-Powered Analysis Tools for Agent Evaluation**
27
 
28
  [![MCP's 1st Birthday Hackathon](https://img.shields.io/badge/MCP%27s%201st%20Birthday-Hackathon-blue)](https://github.com/modelcontextprotocol)
29
+ [![Track 1: Building MCP](https://img.shields.io/badge/Track-Building%20MCP%20(Enterprise)-blue)](https://github.com/modelcontextprotocol/hackathon)
30
+ [![Powered by Google Gemini](https://img.shields.io/badge/Powered%20by-Google%20Gemini%202.5%20Flash-orange)](https://ai.google.dev/)
 
31
 
32
  > **🎯 Track 1 Submission**: Building MCP (Enterprise)
33
  > **📅 MCP's 1st Birthday Hackathon**: November 14-30, 2025
34
 
35
+ ---
 
 
 
 
36
 
37
+ ## Why This MCP Server?
38
 
39
+ **Problem**: Agent evaluation generates mountains of data—leaderboards, traces, metrics—but developers struggle to extract actionable insights.
 
 
 
40
 
41
+ **Solution**: This MCP server provides **11 AI-powered tools** that transform raw evaluation data into clear answers:
42
+ - *"Which model is best for my use case?"*
43
+ - *"Why did this agent execution fail?"*
44
+ - *"How much will this evaluation cost?"*
45
 
46
+ **Powered by Google Gemini 2.5 Flash** for intelligent, context-aware analysis of agent performance data.
47
 
48
  ---
49
 
 
50
  ## 🔗 Quick Links
51
 
52
+ - **🌐 Live Demo**: [TraceMind-mcp-server Space](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server)
53
+ - **⚡ Auto-Config**: Add `MCP-1st-Birthday/TraceMind-mcp-server` at https://huggingface.co/settings/mcp
54
+ - **📖 Full Docs**: See [DOCUMENTATION.md](DOCUMENTATION.md) for complete technical reference
55
+ - **🎬 Quick Demo (5 min)**: [Watch on Loom](https://www.loom.com/share/d4d0003f06fa4327b46ba5c081bdf835)
56
+ - **📺 Full Demo (20 min)**: [Watch on Loom](https://www.loom.com/share/de559bb0aef749559c79117b7f951250)
 
 
 
57
 
58
+ **MCP Endpoints**:
59
+ - SSE (Recommended): `https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/sse`
60
+ - Streamable HTTP: `https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/`
61
 
62
  ---
63
 
64
+ ## The TraceMind Ecosystem
 
 
65
 
66
+ This MCP server is part of a **complete agent evaluation platform** built from four interconnected projects:
67
 
68
+ <p align="center">
69
+ <img src="https://raw.githubusercontent.com/Mandark-droid/TraceMind-AI/assets/TraceVerse_Logo.png" alt="TraceVerse Ecosystem" width="400"/>
70
+ </p>
 
 
71
 
72
  ```
73
+ 🔭 TraceVerde                        📊 SMOLTRACE
+ (genai_otel_instrument)              (Evaluation Engine)
+             ↓                                 ↓
+        Instruments                       Evaluates
+         LLM calls                          agents
+             ↓                                 ↓
+             └───────────────┬─────────────────┘
+                             ↓
+                    Generates Datasets
+             (leaderboard, traces, metrics)
+                             ↓
+             ┌───────────────┴─────────────────┐
+             ↓                                 ↓
+ 🛠️ TraceMind MCP Server              🧠 TraceMind-AI
+ (This Project - Track 1)             (UI Platform - Track 2)
+     Analyzes with AI                 Visualizes & Interacts
89
  ```
90
 
91
+ ### The Foundation
 
 
 
92
 
93
+ **🔭 TraceVerde** - Zero-code OpenTelemetry instrumentation for LLM frameworks
94
+ → [GitHub](https://github.com/Mandark-droid/genai_otel_instrument) | [PyPI](https://pypi.org/project/genai-otel-instrument)
95
 
96
+ **📊 SMOLTRACE** - Lightweight evaluation engine that generates structured datasets
97
+ → [GitHub](https://github.com/Mandark-droid/SMOLTRACE) | [PyPI](https://pypi.org/project/smoltrace/)
 
 
98
 
99
+ ### The Platform
100
 
101
+ **🛠️ TraceMind MCP Server** (This Project) - Provides MCP tools for AI-powered analysis
102
+ → **Track 1**: Building MCP (Enterprise)
103
+ → [Live Demo](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server) | [GitHub](https://github.com/Mandark-droid/TraceMind-mcp-server)
104
 
105
+ **🧠 TraceMind-AI** - Gradio UI that consumes MCP tools for interactive evaluation
106
+ → [Live Demo](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind) | [GitHub](https://github.com/Mandark-droid/TraceMind-AI)
107
+ → **Track 2**: MCP in Action (Enterprise)
 
108
 
109
+ ---
 
 
 
 
 
 
 
110
 
111
+ ## What's Included
112
 
113
+ ### 11 AI-Powered Tools
 
 
 
114
 
115
+ **Core Analysis** (AI-Powered by Gemini 2.5 Flash):
116
+ 1. **📊 analyze_leaderboard** - Generate insights from evaluation data
117
+ 2. **🐛 debug_trace** - Debug agent execution traces with AI assistance
118
+ 3. **💰 estimate_cost** - Predict costs before running evaluations
119
+ 4. **⚖️ compare_runs** - Compare two evaluation runs with AI analysis
120
+ 5. **📋 analyze_results** - Analyze detailed test results with optimization recommendations
121
 
122
+ **Token-Optimized Tools**:
123
+ 6. **🏆 get_top_performers** - Get top N models (90% token reduction vs. full dataset)
124
+ 7. **📈 get_leaderboard_summary** - High-level statistics (99% token reduction)
125
 
126
+ **Data Management**:
127
+ 8. **📦 get_dataset** - Load SMOLTRACE datasets as JSON
128
+ 9. **🧪 generate_synthetic_dataset** - Create domain-specific test datasets with AI (up to 100 tasks)
129
+ 10. **📤 push_dataset_to_hub** - Upload datasets to HuggingFace
130
+ 11. **📝 generate_prompt_template** - Generate customized smolagents prompt templates
 
131
 
132
+ ### 3 Data Resources
133
 
134
+ Direct JSON access without AI analysis:
135
+ - **leaderboard://{repo}** - Raw evaluation results
136
+ - **trace://{trace_id}/{repo}** - OpenTelemetry spans
137
+ - **cost://model/{model}** - Pricing information
138
 
139
+ ### 3 Prompt Templates
 
 
 
140
 
141
+ Standardized templates for consistent analysis:
142
+ - **analysis_prompt** - Different analysis types (leaderboard, cost, performance)
143
+ - **debug_prompt** - Debugging scenarios
144
+ - **optimization_prompt** - Optimization goals
 
 
145
 
146
+ **Total: 17 MCP Components** (11 + 3 + 3)
147
 
148
+ ---
149
 
150
+ ## Quick Start
 
 
 
 
151
 
152
+ ### 1. Connect to the Live Server
153
 
154
+ **Easiest Method** (Recommended):
155
+ 1. Visit https://huggingface.co/settings/mcp (while logged in)
156
+ 2. Add Space: `MCP-1st-Birthday/TraceMind-mcp-server`
157
+ 3. Select your MCP client (Claude Desktop, VSCode, Cursor, etc.)
158
+ 4. Copy the auto-generated config and paste into your client
159
 
160
+ **Manual Configuration** (Advanced):
161
 
162
+ For Claude Desktop (`claude_desktop_config.json`):
163
  ```json
164
  {
165
  "mcpServers": {
 
171
  }
172
  ```
173
 
174
+ For VSCode/Cursor (`settings.json`):
175
  ```json
176
  {
177
  "mcp.servers": {
 
183
  }
184
  ```
185
 
186
+ ### 2. Try It Out
 
 
 
187
 
188
+ Open your MCP client and try:
189
  ```
190
+ "Analyze the leaderboard at kshitijthakkar/smoltrace-leaderboard and show me the top 5 models"
 
191
  ```
192
 
193
+ You should see AI-powered insights generated by Gemini 2.5 Flash!
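+
+ Prefer scripting over chat? The same query can be made programmatically. A minimal sketch, assuming the official `mcp` Python SDK (`pip install mcp`) and the SSE endpoint above; tool names and parameters follow the documentation, though the exact registered names depend on how Gradio exposes them:
+
+ ```python
+ import asyncio
+
+ from mcp import ClientSession
+ from mcp.client.sse import sse_client
+
+ SSE_URL = "https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/sse"
+
+ async def main() -> None:
+     # Open an SSE connection and an MCP session against the public server
+     async with sse_client(SSE_URL) as (read_stream, write_stream):
+         async with ClientSession(read_stream, write_stream) as session:
+             await session.initialize()
+
+             # Discover the available tools
+             tools = await session.list_tools()
+             print("Tools:", [tool.name for tool in tools.tools])
+
+             # Ask for the top 5 models, ranked by success rate
+             result = await session.call_tool(
+                 "get_top_performers",
+                 {
+                     "leaderboard_repo": "kshitijthakkar/smoltrace-leaderboard",
+                     "metric": "success_rate",
+                     "top_n": 5,
+                 },
+             )
+             print(result.content[0].text)
+
+ asyncio.run(main())
+ ```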
 
 
 
194
 
195
+ ### 3. Using Your Own API Keys (Recommended)
196
 
197
+ To avoid rate limits during evaluation:
198
+ 1. Visit the [MCP Server Space](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server)
199
+ 2. Go to **⚙️ Settings** tab
200
+ 3. Enter your **Gemini API Key** and **HuggingFace Token**
201
+ 4. Click **"Save & Override Keys"**
 
 
 
202
 
203
+ **Get Free API Keys**:
204
+ - **Gemini**: https://ai.google.dev/ (1,500 requests/day free)
205
+ - **HuggingFace**: https://huggingface.co/settings/tokens (unlimited for public datasets)
 
206
 
207
+ ---
 
 
 
 
208
 
209
+ ## For Hackathon Judges
210
 
211
+ ### Track 1 Compliance
212
 
213
+ - **Complete MCP Implementation**: 11 Tools + 3 Resources + 3 Prompts (17 total)
214
+ - **MCP Standard Compliant**: Built with Gradio's native `@gr.mcp.*` decorators
215
+ - **Production-Ready**: Deployed to HuggingFace Spaces with SSE transport
216
+ - **Enterprise Focus**: Cost optimization, debugging, decision support
217
+ - **Google Gemini Powered**: All AI analysis uses Gemini 2.5 Flash
218
+ - **Interactive Testing**: Beautiful Gradio UI for testing all components
219
 
220
+ ### 🎯 Key Innovations
 
 
 
 
221
 
222
+ 1. **Token Optimization**: `get_top_performers` and `get_leaderboard_summary` reduce token usage by 90-99%
223
+ 2. **AI-Powered Synthetic Data**: Generate domain-specific test datasets + matching prompt templates
224
+ 3. **Complete Ecosystem**: Part of 4-project platform with TraceVerde → SMOLTRACE → MCP Server → TraceMind-AI
225
+ 4. **Real Data Integration**: Works with live HuggingFace datasets from SMOLTRACE evaluations
226
+ 5. **Test Results Analysis**: Deep-dive into individual test cases with `analyze_results` tool
227
 
228
+ ### 📹 Demo Materials
 
 
 
 
229
 
230
+ - **🎥 Demo Videos**: [Quick Demo (5 min)](https://www.loom.com/share/d4d0003f06fa4327b46ba5c081bdf835) | [Full Demo (20 min)](https://www.loom.com/share/de559bb0aef749559c79117b7f951250)
231
+ - **📢 Social Post**: [Coming Soon - Link to announcement]
 
 
 
 
232
 
233
+ ---
 
234
 
235
+ ## Documentation
236
+
237
+ **For quick evaluation**:
238
+ - Read this README for overview
239
+ - Visit the [Live Demo](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server) to test tools
240
+ - Use the Auto-Config link to connect your MCP client
241
+
242
+ **For deep dives**:
243
+ - [DOCUMENTATION.md](DOCUMENTATION.md) - Complete API reference
244
+ - Tool descriptions and parameters
245
+ - Resource URIs and schemas
246
+ - Prompt template details
247
+ - Example use cases
248
+ - [ARCHITECTURE.md](ARCHITECTURE.md) - Technical architecture
249
+ - Project structure
250
+ - MCP protocol implementation
251
+ - Gemini integration details
252
+ - Deployment guide
253
+ - [UI_GUIDE.md](UI_GUIDE.md) - Gradio interface walkthrough
254
+ - Tab-by-tab explanations
255
+ - Testing workflows
256
+ - Configuration options
257
 
258
+ ---
 
 
 
259
 
260
+ ## Technology Stack
 
 
 
261
 
262
+ - **AI Model**: Google Gemini 2.5 Flash (via Google AI SDK)
263
+ - **MCP Framework**: Gradio 6 with native MCP support (`@gr.mcp.*` decorators)
264
+ - **Data Source**: HuggingFace Datasets API
265
+ - **Transport**: SSE (recommended) + Streamable HTTP
266
+ - **Deployment**: HuggingFace Spaces (Docker SDK)
267
 
268
+ ---
269
 
270
+ ## Run Locally (Optional)
271
 
272
  ```bash
273
+ # Clone and setup
274
+ git clone https://github.com/Mandark-droid/TraceMind-mcp-server.git
275
+ cd TraceMind-mcp-server
276
+ python -m venv venv
277
+ source venv/bin/activate # Windows: venv\Scripts\activate
278
+ pip install -r requirements.txt
 
279
 
280
+ # Configure API keys
281
+ cp .env.example .env
282
+ # Edit .env with your GEMINI_API_KEY and HF_TOKEN
283
 
284
+ # Run the server
285
  python app.py
 
 
286
  ```
287
 
288
+ Visit http://localhost:7860 to test the tools via Gradio UI.
 
 
 
 
289
 
290
+ ---
 
 
 
 
 
 
 
 
 
291
 
292
+ ## Related Projects
293
 
294
+ **🧠 TraceMind-AI** (Track 2 - MCP in Action):
295
+ - Live Demo: https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind
296
+ - Consumes this MCP server for AI-powered agent evaluation UI
297
+ - Features autonomous agent chat, trace visualization, job submission
298
 
299
+ **📊 Foundation Libraries**:
300
+ - TraceVerde: https://github.com/Mandark-droid/genai_otel_instrument
301
+ - SMOLTRACE: https://github.com/Mandark-droid/SMOLTRACE
 
302
 
303
+ ---
 
 
 
304
 
305
+ ## Credits
306
 
307
+ **Built for**: MCP's 1st Birthday Hackathon (Nov 14-30, 2025)
308
  **Track**: Building MCP (Enterprise)
309
  **Author**: Kshitij Thakkar
310
  **Powered by**: Google Gemini 2.5 Flash
311
  **Built with**: Gradio (native MCP support)
312
 
313
+ **Sponsors**: HuggingFace • Google Gemini • Modal • Anthropic • Gradio • ElevenLabs • SambaNova • Blaxel
314
 
315
+ ---
316
 
317
+ ## License
318
 
319
+ AGPL-3.0 - See [LICENSE](LICENSE) for details
320
 
321
  ---
322
 
323
+ ## Support
324
+
325
+ - 📧 GitHub Issues: [TraceMind-mcp-server/issues](https://github.com/Mandark-droid/TraceMind-mcp-server/issues)
326
+ - 💬 HF Discord: `#mcp-1st-birthday-official🏆`
327
+ - 🏷️ Tag: `building-mcp-track-enterprise`