TraceMind-AI - MCP Integration Guide

This document explains how TraceMind-AI integrates with MCP servers to provide AI-powered agent evaluation.

Overview
Dual MCP Integration
Architecture
MCP Client Implementation
Agent Framework Integration
MCP Tools Usage
Development Guide

Overview

TraceMind-AI demonstrates enterprise MCP client usage as part of the Track 2: MCP in Action submission. It showcases two distinct patterns of MCP integration:

Direct MCP Client: a Python client that connects to a remote MCP server via SSE transport
Autonomous Agent: a smolagents-based agent that uses MCP tools for multi-step reasoning

Both patterns consume the same MCP server (TraceMind-mcp-server) to provide AI-powered analysis of agent evaluation data.

Dual MCP Integration

Pattern 1: Direct MCP Client Integration

Where: Leaderboard insights, cost estimation dialogs, trace debugging

How it works:

# TraceMind-AI calls MCP server directly
mcp_client = get_sync_mcp_client()
insights = mcp_client.analyze_leaderboard(
    metric_focus="overall",
    time_range="last_week",
    top_n=5
)
# Display insights in UI

Use cases:

Generate leaderboard insights when user clicks "Load Leaderboard"
Estimate costs when user clicks "Estimate Cost" in New Evaluation form
Debug traces when user asks questions in trace visualization

Advantages:

Direct, fast execution
Synchronous API (easy to integrate with Gradio)
Predictable, structured responses

Pattern 2: Autonomous Agent with MCP Tools

Where: Agent Chat tab

How it works:

# smolagents agent discovers and uses MCP tools autonomously
from smolagents import ToolCallingAgent, MCPClient

# Connect to the MCP server over SSE and fetch its tool list
mcp_client = MCPClient({"url": mcp_server_url, "transport": "sse"})

# Agent initialized with the tools exposed by the MCP server
agent = ToolCallingAgent(
    tools=mcp_client.get_tools(),
    model=model_client
)

# User asks question
result = agent.run("What are the top 3 models and their costs?")

# Agent plans:
#   1. Call get_top_performers MCP tool
#   2. Extract costs from results
#   3. Format and present to user

Use cases:

Answer complex questions requiring multi-step analysis
Compare models across multiple dimensions
Plan evaluation strategies with cost estimates
Provide recommendations based on leaderboard data

Advantages:

Natural language interface
Multi-step reasoning
Autonomous tool selection
Context-aware responses

Architecture

System Overview

┌─────────────────────────────────────────────────────────────┐
│ TraceMind-AI (Gradio App) - Track 2                         │
│                                                               │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ UI Layer (Gradio)                                       │ │
│ │  - Leaderboard tab                                      │ │
│ │  - Agent Chat tab                                       │ │
│ │  - New Evaluation tab                                   │ │
│ │  - Trace Visualization tab                              │ │
│ └────────────┬─────────────────────────────┬──────────────┘ │
│              ↓                             ↓                 │
│  ┌───────────────────────┐   ┌──────────────────────────┐  │
│  │ Direct MCP Client     │   │ Autonomous Agent         │  │
│  │ (sync_wrapper.py)     │   │ (smolagents)             │  │
│  │                       │   │                          │  │
│  │ - Synchronous API     │   │ - Multi-step reasoning   │  │
│  │ - Tool calling        │   │ - Tool discovery         │  │
│  │ - Error handling      │   │ - Context management     │  │
│  └───────────┬───────────┘   └─────────────┬────────────┘  │
│              └─────────────────┬─────────────┘               │
│                                ↓                             │
│                         MCP Protocol                         │
│                         (SSE Transport)                      │
└────────────────────────────────┬────────────────────────────┘
                                 ↓
┌─────────────────────────────────────────────────────────────┐
│ TraceMind MCP Server - Track 1                              │
│ https://huggingface.co/spaces/MCP-1st-Birthday/             │
│ TraceMind-mcp-server                                        │
│                                                               │
│ 11 AI-Powered Tools:                                        │
│  - analyze_leaderboard                                      │
│  - debug_trace                                              │
│  - estimate_cost                                            │
│  - compare_runs                                             │
│  - analyze_results                                          │
│  - get_top_performers                                       │
│  - get_leaderboard_summary                                  │
│  - get_dataset                                              │
│  - generate_synthetic_dataset                               │
│  - push_dataset_to_hub                                      │
│  - generate_prompt_template                                 │
└─────────────────────────────────────────────────────────────┘

MCP Client Implementation

File Structure

TraceMind-AI/
├── mcp_client/
│   ├── __init__.py
│   ├── client.py              # Async MCP client
│   └── sync_wrapper.py        # Synchronous wrapper for Gradio
├── agent/
│   ├── __init__.py
│   └── smolagents_setup.py    # Agent with MCP integration
└── app.py                     # Main Gradio app

Async MCP Client (`client.py`)

from contextlib import AsyncExitStack

from mcp import ClientSession
from mcp.client.sse import sse_client

class TraceMindMCPClient:
    """Async MCP client for TraceMind MCP Server"""

    def __init__(self, mcp_server_url: str):
        self.mcp_server_url = mcp_server_url
        self.session = None
        self.available_tools = {}
        # Keeps the SSE streams and the session open until disconnect()
        self._exit_stack = AsyncExitStack()

    async def connect(self):
        """Establish connection to MCP server via SSE"""
        # For HTTP-based MCP servers (HuggingFace Spaces): open the SSE
        # streams first, then run the MCP session on top of them
        read_stream, write_stream = await self._exit_stack.enter_async_context(
            sse_client(self.mcp_server_url)
        )
        self.session = await self._exit_stack.enter_async_context(
            ClientSession(read_stream, write_stream)
        )
        # Perform the MCP handshake before issuing any request
        await self.session.initialize()

        # List available tools
        tools_result = await self.session.list_tools()
        self.available_tools = {tool.name: tool for tool in tools_result.tools}

        print(f"Connected to MCP server. Available tools: {list(self.available_tools.keys())}")

    async def call_tool(self, tool_name: str, arguments: dict) -> str:
        """Call an MCP tool with given arguments"""
        if not self.session:
            raise RuntimeError("MCP client not connected. Call connect() first.")

        if tool_name not in self.available_tools:
            raise ValueError(f"Tool '{tool_name}' not available. Available: {list(self.available_tools.keys())}")

        # Call the tool
        result = await self.session.call_tool(tool_name, arguments=arguments)

        # Extract text response (first content item only)
        if result.content and len(result.content) > 0:
            return result.content[0].text
        return ""

    async def analyze_leaderboard(self, **kwargs) -> str:
        """Wrapper for analyze_leaderboard tool"""
        return await self.call_tool("analyze_leaderboard", kwargs)

    async def estimate_cost(self, **kwargs) -> str:
        """Wrapper for estimate_cost tool"""
        return await self.call_tool("estimate_cost", kwargs)

    async def debug_trace(self, **kwargs) -> str:
        """Wrapper for debug_trace tool"""
        return await self.call_tool("debug_trace", kwargs)

    async def compare_runs(self, **kwargs) -> str:
        """Wrapper for compare_runs tool"""
        return await self.call_tool("compare_runs", kwargs)

    async def get_top_performers(self, **kwargs) -> str:
        """Wrapper for get_top_performers tool"""
        return await self.call_tool("get_top_performers", kwargs)

    async def disconnect(self):
        """Close MCP connection"""
        # Closes the session and the SSE streams in reverse order
        await self._exit_stack.aclose()
        self.session = None
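
`call_tool` above returns only the first content item of a tool result. A result can carry several parts, and not every part is text. The following is a minimal sketch of a more defensive extraction, not part of TraceMind-AI: `join_text_parts` is a hypothetical helper name, and the stand-in objects only mimic the `type` and `text` attributes of MCP content items.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Optional


def join_text_parts(content: Optional[Iterable[Any]]) -> str:
    """Concatenate the text of every text part, skipping non-text parts."""
    texts = []
    for part in content or []:
        # Text parts expose a string `text`; image and other parts do not.
        text = getattr(part, "text", None)
        if isinstance(text, str):
            texts.append(text)
    return "\n".join(texts)


# Stand-ins for a tool result with mixed content (illustrative data only).
parts = [
    SimpleNamespace(type="text", text="Top model: A"),
    SimpleNamespace(type="image", data="..."),
    SimpleNamespace(type="text", text="Cheapest model: B"),
]
print(join_text_parts(parts))
```

If adopted, the helper would replace the `result.content[0].text` lookup inside `call_tool`.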

Synchronous Wrapper (`sync_wrapper.py`)

import asyncio
import os
from typing import Optional
from .client import TraceMindMCPClient

class SyncMCPClient:
    """Synchronous wrapper for async MCP client (Gradio-compatible)"""

    def __init__(self, mcp_server_url: str):
        self.mcp_server_url = mcp_server_url
        self.async_client = TraceMindMCPClient(mcp_server_url)
        self._connected = False

    def _run_async(self, coro):
        """Run async coroutine in sync context"""
        try:
            loop = asyncio.get_event_loop()
        except RuntimeError:
            loop = asyncio.new_event_loop()
            asyncio.set_event_loop(loop)

        return loop.run_until_complete(coro)

    def initialize(self):
        """Connect to MCP server"""
        if not self._connected:
            self._run_async(self.async_client.connect())
            self._connected = True

    def analyze_leaderboard(self, **kwargs) -> str:
        """Synchronous wrapper for analyze_leaderboard"""
        if not self._connected:
            self.initialize()
        return self._run_async(self.async_client.analyze_leaderboard(**kwargs))

    def estimate_cost(self, **kwargs) -> str:
        """Synchronous wrapper for estimate_cost"""
        if not self._connected:
            self.initialize()
        return self._run_async(self.async_client.estimate_cost(**kwargs))

    def debug_trace(self, **kwargs) -> str:
        """Synchronous wrapper for debug_trace"""
        if not self._connected:
            self.initialize()
        return self._run_async(self.async_client.debug_trace(**kwargs))

    # ... (similar wrappers for other tools)

# Global instance for use in Gradio app
_mcp_client: Optional[SyncMCPClient] = None

def get_sync_mcp_client() -> SyncMCPClient:
    """Get or create global sync MCP client instance"""
    global _mcp_client
    if _mcp_client is None:
        mcp_server_url = os.getenv(
            "MCP_SERVER_URL",
            "https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/sse"
        )
        _mcp_client = SyncMCPClient(mcp_server_url)
    return _mcp_client
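
`_run_async` reuses or creates an event loop in whichever thread happens to call it. Gradio runs synchronous handlers in worker threads, so one possible alternative is to keep a single event loop alive in a dedicated thread and submit every coroutine to it. The sketch below illustrates that pattern under stated assumptions: `BackgroundLoop` and `fake_tool_call` are hypothetical names, and the coroutine is a stand-in for a real MCP call.

```python
import asyncio
import threading


class BackgroundLoop:
    """Own one event loop in a daemon thread and run coroutines on it."""

    def __init__(self):
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def run(self, coro, timeout=None):
        """Block the calling thread until the coroutine finishes on the loop."""
        future = asyncio.run_coroutine_threadsafe(coro, self._loop)
        return future.result(timeout=timeout)

    def stop(self):
        """Stop the loop, wait for its thread to exit, then close the loop."""
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()


async def fake_tool_call(name: str) -> str:
    # Stand-in for an awaitable MCP call such as async_client.call_tool(...)
    await asyncio.sleep(0.01)
    return f"result from {name}"


runner = BackgroundLoop()
print(runner.run(fake_tool_call("analyze_leaderboard"), timeout=5))
runner.stop()
```

With this design every call runs on the same loop regardless of which thread invoked the handler, at the cost of one extra long-lived thread.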

Usage in Gradio App

# app.py
import gradio as gr
import pandas as pd
from datasets import load_dataset

from mcp_client.sync_wrapper import get_sync_mcp_client

# Initialize MCP client
mcp_client = get_sync_mcp_client()
mcp_client.initialize()

# Use in Gradio event handlers
def load_leaderboard():
    """Load leaderboard and generate AI insights"""
    # Load dataset (assumes the leaderboard rows are in the "train" split)
    ds = load_dataset("kshitijthakkar/smoltrace-leaderboard", split="train")
    df = ds.to_pandas()

    # Get AI insights from MCP server
    try:
        insights = mcp_client.analyze_leaderboard(
            metric_focus="overall",
            time_range="last_week",
            top_n=5
        )
    except Exception as e:
        insights = f"❌ Error generating insights: {str(e)}"

    return df, insights

# Gradio UI
with gr.Blocks() as app:
    with gr.Tab("📊 Leaderboard"):
        load_btn = gr.Button("Load Leaderboard")
        insights_md = gr.Markdown(label="AI Insights")
        leaderboard_table = gr.Dataframe()

        load_btn.click(
            fn=load_leaderboard,
            outputs=[leaderboard_table, insights_md]
        )

Agent Framework Integration

smolagents Setup

# agent/smolagents_setup.py
from smolagents import ToolCallingAgent, MCPClient, HfApiModel
import os

def create_agent():
    """Create smolagents agent with MCP tool access"""

    # 1. Configure MCP client (SSE transport)
    mcp_server_url = os.getenv(
        "MCP_SERVER_URL",
        "https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/sse"
    )

    mcp_client = MCPClient({"url": mcp_server_url, "transport": "sse"})

    # 2. Configure LLM
    model = HfApiModel(
        model_id="Qwen/Qwen2.5-Coder-32B-Instruct",
        token=os.getenv("HF_TOKEN")
    )

    # 3. Create agent with the tools discovered from the MCP server
    agent = ToolCallingAgent(
        tools=mcp_client.get_tools(),
        model=model,
        max_steps=10,
        verbosity_level=1
    )

    return agent

def run_agent_query(agent: ToolCallingAgent, query: str, show_reasoning: bool = False):
    """Run agent query and return response"""
    try:
        # Set verbosity based on show_reasoning flag
        if show_reasoning:
            agent.verbosity_level = 2  # Show tool execution logs
        else:
            agent.verbosity_level = 0  # Only show final answer

        # Run agent
        result = agent.run(query)

        return result
    except Exception as e:
        return f"❌ Agent error: {str(e)}"

Agent Chat UI

# app.py
from agent.smolagents_setup import create_agent, run_agent_query

# Initialize agent (once at startup)
agent = create_agent()

def agent_chat(message: str, history: list, show_reasoning: bool):
    """Handle agent chat interaction"""
    # Run agent query
    response = run_agent_query(agent, message, show_reasoning)

    # Update chat history
    history.append((message, response))

    return history, ""

# Gradio UI
with gr.Blocks() as app:
    with gr.Tab("🤖 Agent Chat"):
        gr.Markdown("## Autonomous Agent with MCP Tools")
        gr.Markdown("Ask questions about agent evaluations. The agent has access to all MCP tools.")

        chatbot = gr.Chatbot(label="Agent Chat")
        msg = gr.Textbox(label="Your Question", placeholder="What are the top 3 models and their costs?")
        show_reasoning = gr.Checkbox(label="Show Agent Reasoning", value=False)

        # Quick action buttons
        with gr.Row():
            quick_top = gr.Button("Quick: Top Models")
            quick_cost = gr.Button("Quick: Cost Estimate")
            quick_load = gr.Button("Quick: Load Leaderboard")

        # Event handlers
        msg.submit(agent_chat, [msg, chatbot, show_reasoning], [chatbot, msg])

        quick_top.click(
            lambda h, sr: agent_chat(
                "What are the top 5 models by success rate with their costs?",
                h,
                sr
            ),
            [chatbot, show_reasoning],
            [chatbot, msg]
        )

MCP Tools Usage

Tools Used in TraceMind-AI

Tool	Where Used	Purpose
`analyze_leaderboard`	Leaderboard tab	Generate AI insights when user loads leaderboard
`estimate_cost`	New Evaluation tab	Predict costs before submitting evaluation
`debug_trace`	Trace Visualization	Answer questions about execution traces
`compare_runs`	Compare Runs/Agent Chat	Compare two evaluation runs side-by-side
`analyze_results`	Agent Chat	Analyze detailed test results with optimization recommendations
`get_top_performers`	Agent Chat	Efficiently fetch top N models (90% token reduction)
`get_leaderboard_summary`	Agent Chat	Get high-level statistics (99% token reduction)
`get_dataset`	Agent Chat	Load SMOLTRACE datasets for detailed analysis

Example Tool Calls

Example 1: Leaderboard Insights

# User clicks "Load Leaderboard" button
def on_load_leaderboard():
    insights = mcp_client.analyze_leaderboard(
        leaderboard_repo="kshitijthakkar/smoltrace-leaderboard",
        metric_focus="overall",
        time_range="last_week",
        top_n=5
    )

    # Returning the text lets Gradio render it in the Markdown component
    # listed in the handler's outputs (insights_md)
    return insights

Example 2: Cost Estimation

# User fills New Evaluation form and clicks "Estimate Cost"
estimate = mcp_client.estimate_cost(
    model="meta-llama/Llama-3.1-8B",
    agent_type="both",
    num_tests=100,
    hardware="auto"
)

# Display in dialog
gr.Info(estimate)

Example 3: Agent Multi-Step Query

# User asks: "What are the top 3 models and how much do they cost?"

# Agent reasoning (internal):
#   Step 1: Need to get top models by success rate
#   → Call get_top_performers(metric="success_rate", top_n=3)
#
#   Step 2: Extract cost information from results
#   → Parse JSON response, get "total_cost_usd" field
#
#   Step 3: Format response for user
#   → Create markdown table with model names, success rates, costs

# Agent response:
"""
Here are the top 3 models by success rate:

1. **GPT-4**: 95.8% success rate, $0.05 per run
2. **Claude-3**: 94.1% success rate, $0.04 per run
3. **Llama-3.1-8B**: 93.4% success rate, $0.002 per run

GPT-4 leads in accuracy but is 25x more expensive than Llama-3.1.
For cost-sensitive workloads, Llama-3.1 offers the best value.
"""

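Steps 2 and 3 of the agent's reasoning above (parse the tool's JSON, then format a table) can be sketched as follows. The JSON shape is an assumption made for illustration: only the `total_cost_usd` field is named in this guide, while `model` and `success_rate` are hypothetical field names. The sample rows reuse figures from the example response.

```python
import json


def format_top_performers(raw_json: str) -> str:
    """Turn a JSON list of runs into a markdown table."""
    runs = json.loads(raw_json)
    lines = [
        "| Model | Success rate | Cost (USD) |",
        "|---|---|---|",
    ]
    for run in runs:
        lines.append(
            f"| {run['model']} | {run['success_rate']:.1f}% | ${run['total_cost_usd']} |"
        )
    return "\n".join(lines)


# Illustrative payload; the real tool output may be structured differently.
sample = json.dumps([
    {"model": "GPT-4", "success_rate": 95.8, "total_cost_usd": 0.05},
    {"model": "Llama-3.1-8B", "success_rate": 93.4, "total_cost_usd": 0.002},
])
print(format_top_performers(sample))
```
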
Development Guide

Adding New MCP Tool Integration

Add method to async client (client.py):

async def new_tool_name(self, **kwargs) -> str:
    """Wrapper for new_tool_name MCP tool"""
    return await self.call_tool("new_tool_name", kwargs)

Add synchronous wrapper (sync_wrapper.py):

def new_tool_name(self, **kwargs) -> str:
    """Synchronous wrapper for new_tool_name"""
    if not self._connected:
        self.initialize()
    return self._run_async(self.async_client.new_tool_name(**kwargs))

Use in Gradio app (app.py):

def handle_new_tool():
    result = mcp_client.new_tool_name(param1="value1", param2="value2")
    return result

Note: The agent discovers new tools from the MCP server automatically, so no agent code changes are needed.
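
The per-tool wrappers in the steps above all follow the same shape. One possible way to avoid writing them by hand is to forward unknown attribute names to `call_tool`. This is a sketch of that idea rather than the project's approach: `DynamicToolClient` and `FakeAsyncClient` are hypothetical names, and the fake client echoes the call instead of contacting an MCP server.

```python
import asyncio


class FakeAsyncClient:
    """Stand-in for the async client; echoes the call instead of using MCP."""

    async def call_tool(self, tool_name: str, arguments: dict) -> str:
        return f"{tool_name}({sorted(arguments.items())})"


class DynamicToolClient:
    """Expose every tool name as a synchronous method without per-tool code."""

    def __init__(self, async_client):
        self._async_client = async_client

    def __getattr__(self, tool_name: str):
        # __getattr__ is only consulted when normal attribute lookup fails.
        if tool_name.startswith("_"):
            raise AttributeError(tool_name)

        def call(**kwargs) -> str:
            # asyncio.run keeps the sketch short; a real wrapper would
            # reuse its existing event-loop helper instead.
            return asyncio.run(self._async_client.call_tool(tool_name, kwargs))

        return call


client = DynamicToolClient(FakeAsyncClient())
print(client.new_tool_name(param1="value1", param2="value2"))
```

The trade-off is discoverability: explicit wrappers show up in editors and documentation, while dynamic forwarding only fails at call time if a tool name is misspelled.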

Testing MCP Integration

Test 1: Connection

python -c "from mcp_client.sync_wrapper import get_sync_mcp_client; client = get_sync_mcp_client(); client.initialize(); print('✅ MCP client connected')"

Test 2: Tool Call

from mcp_client.sync_wrapper import get_sync_mcp_client

client = get_sync_mcp_client()
client.initialize()

result = client.analyze_leaderboard(
    metric_focus="cost",
    time_range="last_week",
    top_n=3
)

print(result)

Test 3: Agent

from agent.smolagents_setup import create_agent, run_agent_query

agent = create_agent()
response = run_agent_query(agent, "What are the top 3 models?", show_reasoning=True)
print(response)

Debugging MCP Issues

Issue: Connection timeout

Check: MCP server is running at specified URL
Check: Network connectivity to HuggingFace Spaces
Check: SSE transport is enabled on server

Issue: Tool not found

Check: MCP server has the tool implemented
Check: Tool name matches exactly (case-sensitive)
Check: Client initialized successfully (call initialize() first)

Issue: Agent not using MCP tools

Check: MCPClient is properly configured in agent setup
Check: Agent has max_steps > 0 to allow tool usage
Check: Query requires tool usage (not answerable from agent's knowledge alone)
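
For the connection timeout case, it can help to bound each connection attempt and retry a few times before giving up. The sketch below shows one way to do that with `asyncio.wait_for`. `connect_with_retry` is a hypothetical helper, the attempt count, timeout, and backoff values are arbitrary, and `flaky_connect` is a stand-in for the client's `connect()` coroutine.

```python
import asyncio


async def connect_with_retry(connect, attempts: int = 3, timeout: float = 10.0,
                             backoff: float = 0.5):
    """Await connect() with a per-attempt timeout and exponential backoff."""
    last_error = None
    for attempt in range(attempts):
        try:
            return await asyncio.wait_for(connect(), timeout=timeout)
        except (asyncio.TimeoutError, ConnectionError) as exc:
            last_error = exc
            if attempt < attempts - 1:
                # Wait longer after each failed attempt.
                await asyncio.sleep(backoff * (2 ** attempt))
    raise RuntimeError(f"MCP connection failed after {attempts} attempts") from last_error


# Demonstration with a stand-in that fails twice before succeeding.
calls = {"count": 0}


async def flaky_connect():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("server not ready")
    return "connected"


print(asyncio.run(connect_with_retry(flaky_connect, backoff=0.01)))
```

A Space that has gone to sleep may need longer than a single short timeout to wake up, which is the situation this retry loop is meant to absorb.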

Performance Considerations

Token Optimization

Problem: Loading the full leaderboard dataset consumes excessive tokens

Solution: Use token-optimized MCP tools

# ❌ BAD: Loads all 51 runs (50K+ tokens)
leaderboard = mcp_client.get_dataset("kshitijthakkar/smoltrace-leaderboard")

# ✅ GOOD: Returns only top 5 (5K tokens, 90% reduction)
top_performers = mcp_client.get_top_performers(top_n=5)

# ✅ BETTER: Returns summary stats (500 tokens, 99% reduction)
summary = mcp_client.get_leaderboard_summary()
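
The reduction percentages in the comments above follow from simple arithmetic on the approximate token counts. A small sketch, using the rounded figures quoted in this section (`token_reduction` is a hypothetical helper name):

```python
def token_reduction(baseline_tokens: int, optimized_tokens: int) -> float:
    """Percentage of tokens saved relative to the baseline."""
    return 100.0 * (baseline_tokens - optimized_tokens) / baseline_tokens


# Approximate figures from this section: full dataset vs. optimized tools.
print(token_reduction(50_000, 5_000))  # get_top_performers: 90.0
print(token_reduction(50_000, 500))    # get_leaderboard_summary: 99.0
```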

Caching

Problem: Repeated identical MCP calls waste time and credits

Solution: Implement client-side caching

from functools import lru_cache
import time

@lru_cache(maxsize=32)
def cached_analyze_leaderboard(metric_focus: str, time_range: str, top_n: int, cache_key: int):
    """Cached MCP call with TTL via cache_key"""
    return mcp_client.analyze_leaderboard(
        metric_focus=metric_focus,
        time_range=time_range,
        top_n=top_n
    )

# Use with 5-minute cache TTL
cache_key = int(time.time() // 300)  # Changes every 5 minutes
insights = cached_analyze_leaderboard("overall", "last_week", 5, cache_key)

Async Optimization

Problem: Sequential MCP calls block the UI

Solution: Use async for parallel calls

import asyncio

from datasets import load_dataset

async def load_leaderboard_with_insights():
    """Load leaderboard and insights in parallel"""
    # mcp_client here is the async TraceMindMCPClient, already connected on
    # this event loop. load_dataset is blocking, so it runs in a worker thread.
    leaderboard_task = asyncio.create_task(
        asyncio.to_thread(load_dataset, "kshitijthakkar/smoltrace-leaderboard")
    )
    insights_task = asyncio.create_task(mcp_client.analyze_leaderboard(metric_focus="overall"))

    # Wait for both to complete
    leaderboard, insights = await asyncio.gather(leaderboard_task, insights_task)

    return leaderboard, insights

Security Considerations

API Key Management

DO:

Store API keys in environment variables or HF Spaces secrets
Use session-only storage in Gradio (not server-side persistence)
Rotate keys regularly

DON'T:

Hardcode API keys in source code
Expose keys in client-side JavaScript
Log API keys in console or files
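
The DO and DON'T items above can be sketched as two small helpers: one that reads a key from the environment and fails early when it is missing, and one that masks a key before it reaches a log line. `load_api_key` and `mask_secret` are hypothetical names, and the token value is a placeholder set only so the demo runs.

```python
import os


def load_api_key(env_var: str = "HF_TOKEN") -> str:
    """Read a key from the environment and fail early if it is missing."""
    key = os.getenv(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; add it as an environment variable or Space secret"
        )
    return key


def mask_secret(secret: str, visible: int = 4) -> str:
    """Return a log-safe form that reveals only the last few characters."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


# Placeholder value for the demo only; never hardcode a real key.
os.environ.setdefault("HF_TOKEN", "hf_example_token_1234")
print(f"Using token {mask_secret(load_api_key())}")
```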

MCP Server Trust

Verify MCP server authenticity:

Use HTTPS URLs only
Verify the server's domain (Hugging Face Spaces are served from huggingface.co and *.hf.space)
Review MCP server code before connecting (open source)

Limit tool access:

Only connect to trusted MCP servers
Review tool permissions before use
Implement rate limiting for tool calls
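
Two of the checks above lend themselves to a short sketch: validating the server URL before connecting, and rate limiting tool calls on the client side. The code below is illustrative only. `ALLOWED_DOMAINS`, `is_trusted_mcp_url`, and `RateLimiter` are hypothetical names, and the allowlist and limits are assumptions to adapt to a given deployment.

```python
import time
from collections import deque
from urllib.parse import urlparse

# Assumption: only Hugging Face hosts are trusted in this sketch.
ALLOWED_DOMAINS = ("hf.space", "huggingface.co")


def is_trusted_mcp_url(url: str) -> bool:
    """Accept only HTTPS URLs whose host is an allowed domain or subdomain."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    if parsed.scheme != "https":
        return False
    return any(host == d or host.endswith("." + d) for d in ALLOWED_DOMAINS)


class RateLimiter:
    """Allow at most max_calls within any sliding window of period seconds."""

    def __init__(self, max_calls: int, period: float, clock=time.monotonic):
        self._max_calls = max_calls
        self._period = period
        self._clock = clock
        self._calls = deque()

    def allow(self) -> bool:
        now = self._clock()
        # Drop timestamps that have left the window.
        while self._calls and now - self._calls[0] >= self._period:
            self._calls.popleft()
        if len(self._calls) < self._max_calls:
            self._calls.append(now)
            return True
        return False


url = "https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/sse"
limiter = RateLimiter(max_calls=2, period=60.0)
print(is_trusted_mcp_url(url), [limiter.allow() for _ in range(3)])
```

A handler would check `limiter.allow()` before each tool call and return a friendly message instead of calling the server when it is `False`.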
