# TraceMind-AI - MCP Integration Guide
This document explains how TraceMind-AI integrates with MCP servers to provide AI-powered agent evaluation.
## Table of Contents
- Overview
- Dual MCP Integration
- Architecture
- MCP Client Implementation
- Agent Framework Integration
- MCP Tools Usage
- Development Guide
## Overview
TraceMind-AI demonstrates enterprise MCP client usage as part of the Track 2: MCP in Action submission. It showcases two distinct patterns of MCP integration:
- Direct MCP Client: Python-based client connecting to a remote MCP server via SSE transport
- Autonomous Agent: smolagents-based agent with access to MCP tools for multi-step reasoning
Both patterns consume the same MCP server (TraceMind-mcp-server) to provide AI-powered analysis of agent evaluation data.
## Dual MCP Integration

### Pattern 1: Direct MCP Client Integration

**Where**: Leaderboard insights, cost estimation dialogs, trace debugging

**How it works**:
```python
# TraceMind-AI calls MCP server directly
mcp_client = get_sync_mcp_client()
insights = mcp_client.analyze_leaderboard(
    metric_focus="overall",
    time_range="last_week",
    top_n=5
)

# Display insights in UI
```
**Use cases**:
- Generate leaderboard insights when the user clicks "Load Leaderboard"
- Estimate costs when the user clicks "Estimate Cost" in the New Evaluation form
- Debug traces when the user asks questions in trace visualization

**Advantages**:
- Direct, fast execution
- Synchronous API (easy to integrate with Gradio)
- Predictable, structured responses
### Pattern 2: Autonomous Agent with MCP Tools

**Where**: Agent Chat tab

**How it works**:
```python
# smolagents agent discovers and uses MCP tools autonomously
from smolagents import ToolCallingAgent, MCPClient

# Agent initialized with tools discovered from the MCP server
mcp_client = MCPClient({"url": mcp_server_url, "transport": "sse"})
agent = ToolCallingAgent(
    tools=mcp_client.get_tools(),  # Tools loaded from MCP server
    model=model_client
)

# User asks question
result = agent.run("What are the top 3 models and their costs?")

# Agent plans:
# 1. Call get_top_performers MCP tool
# 2. Extract costs from results
# 3. Format and present to user
```
**Use cases**:
- Answer complex questions requiring multi-step analysis
- Compare models across multiple dimensions
- Plan evaluation strategies with cost estimates
- Provide recommendations based on leaderboard data

**Advantages**:
- Natural language interface
- Multi-step reasoning
- Autonomous tool selection
- Context-aware responses
## Architecture

### System Overview
```
┌───────────────────────────────────────────────────────────────┐
│              TraceMind-AI (Gradio App) - Track 2              │
│                                                               │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │                    UI Layer (Gradio)                    │  │
│  │   - Leaderboard tab                                     │  │
│  │   - Agent Chat tab                                      │  │
│  │   - New Evaluation tab                                  │  │
│  │   - Trace Visualization tab                             │  │
│  └──────────────┬───────────────────────────┬──────────────┘  │
│                 │                           │                 │
│  ┌──────────────┴─────────┐  ┌──────────────┴─────────────┐   │
│  │   Direct MCP Client    │  │     Autonomous Agent       │   │
│  │   (sync_wrapper.py)    │  │       (smolagents)         │   │
│  │                        │  │                            │   │
│  │  - Synchronous API     │  │  - Multi-step reasoning    │   │
│  │  - Tool calling        │  │  - Tool discovery          │   │
│  │  - Error handling      │  │  - Context management      │   │
│  └───────────┬────────────┘  └──────────────┬─────────────┘   │
│              └───────────────┬──────────────┘                 │
│                              │                                │
│                        MCP Protocol                           │
│                       (SSE Transport)                         │
└──────────────────────────────┬────────────────────────────────┘
                               │
┌──────────────────────────────┴────────────────────────────────┐
│                TraceMind MCP Server - Track 1                 │
│        https://huggingface.co/spaces/MCP-1st-Birthday/        │
│                     TraceMind-mcp-server                      │
│                                                               │
│  11 AI-Powered Tools:                                         │
│  - analyze_leaderboard                                        │
│  - debug_trace                                                │
│  - estimate_cost                                              │
│  - compare_runs                                               │
│  - analyze_results                                            │
│  - get_top_performers                                         │
│  - get_leaderboard_summary                                    │
│  - get_dataset                                                │
│  - generate_synthetic_dataset                                 │
│  - push_dataset_to_hub                                        │
│  - generate_prompt_template                                   │
└───────────────────────────────────────────────────────────────┘
```
## MCP Client Implementation

### File Structure
```
TraceMind-AI/
├── mcp_client/
│   ├── __init__.py
│   ├── client.py            # Async MCP client
│   └── sync_wrapper.py      # Synchronous wrapper for Gradio
├── agent/
│   ├── __init__.py
│   └── smolagents_setup.py  # Agent with MCP integration
└── app.py                   # Main Gradio app
```
### Async MCP Client (`client.py`)
```python
from contextlib import AsyncExitStack

from mcp import ClientSession
from mcp.client.sse import sse_client


class TraceMindMCPClient:
    """Async MCP client for TraceMind MCP Server"""

    def __init__(self, mcp_server_url: str):
        self.mcp_server_url = mcp_server_url
        self.session = None
        self.available_tools = {}
        self._exit_stack = AsyncExitStack()

    async def connect(self):
        """Establish connection to MCP server via SSE"""
        # For HTTP-based MCP servers (HuggingFace Spaces)
        read_stream, write_stream = await self._exit_stack.enter_async_context(
            sse_client(self.mcp_server_url)
        )
        self.session = await self._exit_stack.enter_async_context(
            ClientSession(read_stream, write_stream)
        )
        await self.session.initialize()

        # List available tools
        tools_result = await self.session.list_tools()
        self.available_tools = {tool.name: tool for tool in tools_result.tools}
        print(f"Connected to MCP server. Available tools: {list(self.available_tools.keys())}")

    async def call_tool(self, tool_name: str, arguments: dict) -> str:
        """Call an MCP tool with given arguments"""
        if not self.session:
            raise RuntimeError("MCP client not connected. Call connect() first.")

        if tool_name not in self.available_tools:
            raise ValueError(
                f"Tool '{tool_name}' not available. "
                f"Available: {list(self.available_tools.keys())}"
            )

        # Call the tool
        result = await self.session.call_tool(tool_name, arguments=arguments)

        # Extract text response
        if result.content and len(result.content) > 0:
            return result.content[0].text
        return ""

    async def analyze_leaderboard(self, **kwargs) -> str:
        """Wrapper for analyze_leaderboard tool"""
        return await self.call_tool("analyze_leaderboard", kwargs)

    async def estimate_cost(self, **kwargs) -> str:
        """Wrapper for estimate_cost tool"""
        return await self.call_tool("estimate_cost", kwargs)

    async def debug_trace(self, **kwargs) -> str:
        """Wrapper for debug_trace tool"""
        return await self.call_tool("debug_trace", kwargs)

    async def compare_runs(self, **kwargs) -> str:
        """Wrapper for compare_runs tool"""
        return await self.call_tool("compare_runs", kwargs)

    async def get_top_performers(self, **kwargs) -> str:
        """Wrapper for get_top_performers tool"""
        return await self.call_tool("get_top_performers", kwargs)

    async def disconnect(self):
        """Close MCP connection"""
        await self._exit_stack.aclose()
        self.session = None
```
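For reference, a minimal standalone usage sketch of this async client (the URL is the same default that `get_sync_mcp_client()` uses below):

```python
import asyncio

from mcp_client.client import TraceMindMCPClient


async def main():
    client = TraceMindMCPClient(
        "https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/sse"
    )
    await client.connect()
    insights = await client.analyze_leaderboard(metric_focus="overall", top_n=3)
    print(insights)
    await client.disconnect()


asyncio.run(main())
```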
### Synchronous Wrapper (`sync_wrapper.py`)
```python
import asyncio
import os
from typing import Optional

from .client import TraceMindMCPClient


class SyncMCPClient:
    """Synchronous wrapper for async MCP client (Gradio-compatible)"""

    def __init__(self, mcp_server_url: str):
        self.mcp_server_url = mcp_server_url
        self.async_client = TraceMindMCPClient(mcp_server_url)
        self._connected = False
        # Dedicated event loop: the SSE session must stay bound to the same
        # loop across connect() and all subsequent tool calls
        self._loop = asyncio.new_event_loop()

    def _run_async(self, coro):
        """Run async coroutine on this client's event loop"""
        return self._loop.run_until_complete(coro)

    def initialize(self):
        """Connect to MCP server"""
        if not self._connected:
            self._run_async(self.async_client.connect())
            self._connected = True

    def analyze_leaderboard(self, **kwargs) -> str:
        """Synchronous wrapper for analyze_leaderboard"""
        if not self._connected:
            self.initialize()
        return self._run_async(self.async_client.analyze_leaderboard(**kwargs))

    def estimate_cost(self, **kwargs) -> str:
        """Synchronous wrapper for estimate_cost"""
        if not self._connected:
            self.initialize()
        return self._run_async(self.async_client.estimate_cost(**kwargs))

    def debug_trace(self, **kwargs) -> str:
        """Synchronous wrapper for debug_trace"""
        if not self._connected:
            self.initialize()
        return self._run_async(self.async_client.debug_trace(**kwargs))

    # ... (similar wrappers for other tools)


# Global instance for use in Gradio app
_mcp_client: Optional[SyncMCPClient] = None


def get_sync_mcp_client() -> SyncMCPClient:
    """Get or create global sync MCP client instance"""
    global _mcp_client
    if _mcp_client is None:
        mcp_server_url = os.getenv(
            "MCP_SERVER_URL",
            "https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/sse"
        )
        _mcp_client = SyncMCPClient(mcp_server_url)
    return _mcp_client
```
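One caveat worth flagging: Gradio runs event handlers on a thread pool, and a single asyncio event loop must never be driven from two threads at once. If concurrent handlers can share the global client, guarding `_run_async` with a lock is a minimal fix (a sketch; the `_lock` attribute is an addition, not part of the code above):

```python
import asyncio
import threading


class SyncMCPClient:
    def __init__(self, mcp_server_url: str):
        # ... fields as above, plus a lock for the shared event loop
        self._loop = asyncio.new_event_loop()
        self._lock = threading.Lock()

    def _run_async(self, coro):
        """Serialize loop access across Gradio's worker threads"""
        with self._lock:
            return self._loop.run_until_complete(coro)
```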
### Usage in Gradio App
```python
# app.py
import gradio as gr
import pandas as pd
from datasets import load_dataset

from mcp_client.sync_wrapper import get_sync_mcp_client

# Initialize MCP client
mcp_client = get_sync_mcp_client()
mcp_client.initialize()


# Use in Gradio event handlers
def load_leaderboard():
    """Load leaderboard and generate AI insights"""
    # Load dataset
    ds = load_dataset("kshitijthakkar/smoltrace-leaderboard", split="train")
    df = ds.to_pandas()

    # Get AI insights from MCP server
    try:
        insights = mcp_client.analyze_leaderboard(
            metric_focus="overall",
            time_range="last_week",
            top_n=5
        )
    except Exception as e:
        insights = f"❌ Error generating insights: {str(e)}"

    return df, insights


# Gradio UI
with gr.Blocks() as app:
    with gr.Tab("📊 Leaderboard"):
        load_btn = gr.Button("Load Leaderboard")
        insights_md = gr.Markdown(label="AI Insights")
        leaderboard_table = gr.Dataframe()

        load_btn.click(
            fn=load_leaderboard,
            outputs=[leaderboard_table, insights_md]
        )
```
## Agent Framework Integration

### smolagents Setup
```python
# agent/smolagents_setup.py
import os

from smolagents import ToolCallingAgent, MCPClient, HfApiModel


def create_agent():
    """Create smolagents agent with MCP tool access"""

    # 1. Configure MCP client
    mcp_server_url = os.getenv(
        "MCP_SERVER_URL",
        "https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/sse"
    )
    mcp_client = MCPClient({"url": mcp_server_url, "transport": "sse"})

    # 2. Configure LLM
    model = HfApiModel(
        model_id="Qwen/Qwen2.5-Coder-32B-Instruct",
        token=os.getenv("HF_TOKEN")
    )

    # 3. Create agent with MCP tools (discovered from the server)
    agent = ToolCallingAgent(
        tools=mcp_client.get_tools(),
        model=model,
        max_steps=10,
        verbosity_level=1
    )

    return agent


def run_agent_query(agent: ToolCallingAgent, query: str, show_reasoning: bool = False):
    """Run agent query and return response"""
    try:
        # Set verbosity based on show_reasoning flag
        if show_reasoning:
            agent.verbosity_level = 2  # Show tool execution logs
        else:
            agent.verbosity_level = 0  # Only show final answer

        # Run agent
        result = agent.run(query)
        return result

    except Exception as e:
        return f"❌ Agent error: {str(e)}"
```
### Agent Chat UI
```python
# app.py
import gradio as gr

from agent.smolagents_setup import create_agent, run_agent_query

# Initialize agent (once at startup)
agent = create_agent()


def agent_chat(message: str, history: list, show_reasoning: bool):
    """Handle agent chat interaction"""
    # Run agent query
    response = run_agent_query(agent, message, show_reasoning)

    # Update chat history
    history.append((message, response))
    return history, ""


# Gradio UI
with gr.Blocks() as app:
    with gr.Tab("🤖 Agent Chat"):
        gr.Markdown("## Autonomous Agent with MCP Tools")
        gr.Markdown("Ask questions about agent evaluations. The agent has access to all MCP tools.")

        chatbot = gr.Chatbot(label="Agent Chat")
        msg = gr.Textbox(label="Your Question", placeholder="What are the top 3 models and their costs?")
        show_reasoning = gr.Checkbox(label="Show Agent Reasoning", value=False)

        # Quick action buttons
        with gr.Row():
            quick_top = gr.Button("Quick: Top Models")
            quick_cost = gr.Button("Quick: Cost Estimate")
            quick_load = gr.Button("Quick: Load Leaderboard")

        # Event handlers
        msg.submit(agent_chat, [msg, chatbot, show_reasoning], [chatbot, msg])
        quick_top.click(
            lambda h, sr: agent_chat(
                "What are the top 5 models by success rate with their costs?",
                h,
                sr
            ),
            [chatbot, show_reasoning],
            [chatbot, msg]
        )
```
## MCP Tools Usage

### Tools Used in TraceMind-AI
| Tool | Where Used | Purpose |
|---|---|---|
| `analyze_leaderboard` | Leaderboard tab | Generate AI insights when user loads leaderboard |
| `estimate_cost` | New Evaluation tab | Predict costs before submitting evaluation |
| `debug_trace` | Trace Visualization | Answer questions about execution traces |
| `compare_runs` | Compare Runs / Agent Chat | Compare two evaluation runs side-by-side |
| `analyze_results` | Agent Chat | Analyze detailed test results with optimization recommendations |
| `get_top_performers` | Agent Chat | Efficiently fetch top N models (90% token reduction) |
| `get_leaderboard_summary` | Agent Chat | Get high-level statistics (99% token reduction) |
| `get_dataset` | Agent Chat | Load SMOLTRACE datasets for detailed analysis |
### Example Tool Calls

#### Example 1: Leaderboard Insights
```python
# User clicks "Load Leaderboard" button
insights = mcp_client.analyze_leaderboard(
    leaderboard_repo="kshitijthakkar/smoltrace-leaderboard",
    metric_focus="overall",
    time_range="last_week",
    top_n=5
)

# Display in Gradio Markdown component
insights_md.value = insights
```
#### Example 2: Cost Estimation
```python
# User fills New Evaluation form and clicks "Estimate Cost"
estimate = mcp_client.estimate_cost(
    model="meta-llama/Llama-3.1-8B",
    agent_type="both",
    num_tests=100,
    hardware="auto"
)

# Display in dialog
gr.Info(estimate)
```
#### Example 3: Agent Multi-Step Query
```python
# User asks: "What are the top 3 models and how much do they cost?"

# Agent reasoning (internal):
# Step 1: Need to get top models by success rate
#   → Call get_top_performers(metric="success_rate", top_n=3)
#
# Step 2: Extract cost information from results
#   → Parse JSON response, get "total_cost_usd" field
#
# Step 3: Format response for user
#   → Create markdown table with model names, success rates, costs

# Agent response:
"""
Here are the top 3 models by success rate:

1. **GPT-4**: 95.8% success rate, $0.05 per run
2. **Claude-3**: 94.1% success rate, $0.04 per run
3. **Llama-3.1-8B**: 93.4% success rate, $0.002 per run

GPT-4 leads in accuracy but is 25x more expensive than Llama-3.1.
For cost-sensitive workloads, Llama-3.1 offers the best value.
"""
```
## Development Guide

### Adding New MCP Tool Integration
- Add method to async client (`client.py`):

```python
async def new_tool_name(self, **kwargs) -> str:
    """Wrapper for new_tool_name MCP tool"""
    return await self.call_tool("new_tool_name", kwargs)
```
- Add synchronous wrapper (`sync_wrapper.py`):

```python
def new_tool_name(self, **kwargs) -> str:
    """Synchronous wrapper for new_tool_name"""
    if not self._connected:
        self.initialize()
    return self._run_async(self.async_client.new_tool_name(**kwargs))
```
- Use in Gradio app (`app.py`):

```python
def handle_new_tool():
    result = mcp_client.new_tool_name(param1="value1", param2="value2")
    return result
```
**Note**: The agent automatically discovers new tools from the MCP server, so no code changes are needed there (see the sketch below).
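Since tools are discovered at agent construction time, picking up a newly deployed server tool is just a matter of re-fetching the tool list and rebuilding the agent. A minimal sketch, reusing the `MCPClient` pattern from `create_agent()` (`mcp_server_url` and `model` are assumed to be configured as in that function):

```python
from smolagents import ToolCallingAgent, MCPClient

# Re-fetch the tool list; any tool newly added on the server appears here
mcp_client = MCPClient({"url": mcp_server_url, "transport": "sse"})
tools = mcp_client.get_tools()
print([t.name for t in tools])

# Rebuild the agent so the new tool becomes callable
agent = ToolCallingAgent(tools=tools, model=model, max_steps=10)
```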
### Testing MCP Integration

**Test 1: Connection**
```bash
python -c "from mcp_client.sync_wrapper import get_sync_mcp_client; client = get_sync_mcp_client(); client.initialize(); print('✅ MCP client connected')"
```
**Test 2: Tool Call**
```python
from mcp_client.sync_wrapper import get_sync_mcp_client

client = get_sync_mcp_client()
client.initialize()

result = client.analyze_leaderboard(
    metric_focus="cost",
    time_range="last_week",
    top_n=3
)
print(result)
```
**Test 3: Agent**
```python
from agent.smolagents_setup import create_agent, run_agent_query

agent = create_agent()
response = run_agent_query(agent, "What are the top 3 models?", show_reasoning=True)
print(response)
```
### Debugging MCP Issues

**Issue: Connection timeout**
- Check: MCP server is running at specified URL
- Check: Network connectivity to HuggingFace Spaces
- Check: SSE transport is enabled on server
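Before digging deeper, a quick reachability probe (a sketch using `requests`; `stream=True` makes the call return as soon as response headers arrive instead of blocking on the open-ended SSE stream):

```python
import requests

url = "https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/sse"

resp = requests.get(url, stream=True, timeout=10)
print(resp.status_code)  # 200 means the SSE endpoint is reachable
resp.close()
```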
**Issue: Tool not found**
- Check: MCP server has the tool implemented
- Check: Tool name matches exactly (case-sensitive)
- Check: Client initialized successfully (call `initialize()` first)
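To see exactly which names the server exposes, list them from a connected client (this reads the `available_tools` dict populated in `connect()` above):

```python
from mcp_client.sync_wrapper import get_sync_mcp_client

client = get_sync_mcp_client()
client.initialize()

# Exact, case-sensitive tool names as registered on the server
print(sorted(client.async_client.available_tools.keys()))
```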
**Issue: Agent not using MCP tools**
- Check: `MCPClient` is properly configured in agent setup
- Check: Agent has `max_steps > 0` to allow tool usage
- Check: Query requires tool usage (not answerable from agent's knowledge alone)
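To verify the agent actually received the MCP tools, print what `MCPClient.get_tools()` returns before the agent is constructed (mirroring `create_agent()` above):

```python
import os

from smolagents import MCPClient

mcp_server_url = os.getenv(
    "MCP_SERVER_URL",
    "https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/sse"
)
tools = MCPClient({"url": mcp_server_url, "transport": "sse"}).get_tools()

# An empty list here means the agent has nothing to call
print([tool.name for tool in tools])
```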
## Performance Considerations

### Token Optimization

**Problem**: Loading the full leaderboard dataset consumes excessive tokens.
**Solution**: Use token-optimized MCP tools.
```python
# ❌ BAD: Loads all 51 runs (50K+ tokens)
leaderboard = mcp_client.get_dataset("kshitijthakkar/smoltrace-leaderboard")

# ✅ GOOD: Returns only top 5 (5K tokens, 90% reduction)
top_performers = mcp_client.get_top_performers(top_n=5)

# ✅ BETTER: Returns summary stats (500 tokens, 99% reduction)
summary = mcp_client.get_leaderboard_summary()
```
### Caching

**Problem**: Repeated identical MCP calls waste time and credits.
**Solution**: Implement client-side caching.
```python
import time
from functools import lru_cache


@lru_cache(maxsize=32)
def cached_analyze_leaderboard(metric_focus: str, time_range: str, top_n: int, cache_key: int):
    """Cached MCP call with TTL via cache_key"""
    return mcp_client.analyze_leaderboard(
        metric_focus=metric_focus,
        time_range=time_range,
        top_n=top_n
    )


# Use with 5-minute cache TTL
cache_key = int(time.time() // 300)  # Changes every 5 minutes
insights = cached_analyze_leaderboard("overall", "last_week", 5, cache_key)
```
### Async Optimization

**Problem**: Sequential MCP calls block the UI.
**Solution**: Use async for parallel calls.
```python
import asyncio

from mcp_client.sync_wrapper import get_sync_mcp_client

# Use the underlying async client: create_task() needs coroutines,
# which the synchronous wrapper does not provide
async_client = get_sync_mcp_client().async_client


async def load_leaderboard_with_insights():
    """Load leaderboard and insights in parallel"""
    # Start both operations concurrently
    # (load_dataset_async is a stand-in for an async dataset loader)
    leaderboard_task = asyncio.create_task(
        load_dataset_async("kshitijthakkar/smoltrace-leaderboard")
    )
    insights_task = asyncio.create_task(
        async_client.analyze_leaderboard(metric_focus="overall")
    )

    # Wait for both to complete
    leaderboard, insights = await asyncio.gather(leaderboard_task, insights_task)
    return leaderboard, insights
```
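Gradio accepts `async def` functions as event handlers directly, so the parallel loader above can be wired in without going through the sync wrapper. A minimal sketch (assuming `load_leaderboard_with_insights` from the previous snippet):

```python
import gradio as gr

with gr.Blocks() as app:
    load_btn = gr.Button("Load Leaderboard")
    leaderboard_table = gr.Dataframe()
    insights_md = gr.Markdown()

    # Async handlers run on Gradio's own event loop, keeping the UI responsive
    load_btn.click(
        fn=load_leaderboard_with_insights,
        outputs=[leaderboard_table, insights_md]
    )
```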
## Security Considerations

### API Key Management

**DO:**
- Store API keys in environment variables or HF Spaces secrets
- Use session-only storage in Gradio (not server-side persistence)
- Rotate keys regularly
**DON'T:**
- Hardcode API keys in source code
- Expose keys in client-side JavaScript
- Log API keys in console or files
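A minimal sketch of these patterns (the env-var name `HF_TOKEN` matches the snippets above; `gr.State` gives per-session storage, so a user-supplied key never persists server-side):

```python
import os

import gradio as gr

# Read secrets from the environment (HF Spaces secrets), never from source code
hf_token = os.getenv("HF_TOKEN")

with gr.Blocks() as app:
    # Session-only storage: the value lives with this user's session
    user_api_key = gr.State(value=None)

    key_box = gr.Textbox(label="Your API Key", type="password")
    key_box.change(lambda k: k, inputs=key_box, outputs=user_api_key)
```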
### MCP Server Trust

**Verify MCP server authenticity:**
- Use HTTPS URLs only
- Verify domain ownership (huggingface.co spaces)
- Review MCP server code before connecting (open source)
**Limit tool access:**
- Only connect to trusted MCP servers
- Review tool permissions before use
- Implement rate limiting for tool calls
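For the last point, a minimal client-side rate limiter sketch (a hypothetical decorator, not part of the MCP SDK; it enforces a minimum interval between calls):

```python
import threading
import time
from functools import wraps

from mcp_client.sync_wrapper import get_sync_mcp_client

mcp_client = get_sync_mcp_client()


def rate_limited(calls_per_minute: int):
    """Allow at most calls_per_minute invocations by spacing calls out."""
    interval = 60.0 / calls_per_minute
    lock = threading.Lock()
    last_call = [0.0]

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            with lock:
                wait = last_call[0] + interval - time.monotonic()
                if wait > 0:
                    time.sleep(wait)
                last_call[0] = time.monotonic()
            return func(*args, **kwargs)
        return wrapper
    return decorator


@rate_limited(calls_per_minute=20)
def analyze_leaderboard_limited(**kwargs):
    return mcp_client.analyze_leaderboard(**kwargs)
```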
## Related Documentation
- USER_GUIDE.md - Complete UI walkthrough
- ARCHITECTURE.md - Technical architecture
- TraceMind MCP Server Documentation
*Last Updated: November 21, 2025*