---
title: TraceMind AI
emoji: 🧠
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
short_description: AI agent evaluation with MCP-powered intelligence
license: agpl-3.0
pinned: true
tags:
  - mcp-in-action-track-enterprise
  - agent-evaluation
  - mcp-client
  - leaderboard
  - gradio
---
# 🧠 TraceMind-AI

**Agent Evaluation Platform with MCP-Powered Intelligence**

🎯 **Track 2 Submission: MCP in Action (Enterprise)** | 🎂 **MCP's 1st Birthday Hackathon: November 14-30, 2025**
## Why TraceMind-AI?

**The Challenge:** Evaluating AI agents generates complex data across models, providers, and configurations. Making sense of it all is overwhelming.

**The Solution:** TraceMind-AI is your intelligent agent evaluation command center:

- 🏆 Live leaderboard with real-time performance data
- 🤖 Autonomous agent chat powered by MCP tools
- 💰 Smart cost estimation before you run evaluations
- 🔍 Deep trace analysis to debug agent behavior
- ☁️ Multi-cloud job submission (HuggingFace Jobs + Modal)

All powered by the Model Context Protocol for AI-driven insights at every step.
## 🚀 Try It Now

- 🌐 Live Demo: TraceMind-AI Space
- 🛠️ MCP Server: TraceMind-mcp-server (Track 1)
- 📖 Full Docs: See USER_GUIDE.md for the complete walkthrough
- 🎥 TraceMind-AI Full Demo (20 min): Watch on Loom
- 🎬 MCP Server Quick Demo (5 min): Watch on Loom
- 📺 MCP Server Full Demo (20 min): Watch on Loom
## The TraceMind Ecosystem

TraceMind-AI is the user-facing platform in a complete 4-project agent evaluation ecosystem:

```
    TraceVerde                        SMOLTRACE
(genai_otel_instrument)           (Evaluation Engine)
        │                                │
   Instruments                       Evaluates
    LLM calls                          agents
        │                                │
        └───────────────┬────────────────┘
                        │
               Generates Datasets
        (leaderboard, traces, metrics)
                        │
        ┌───────────────┴────────────────┐
        │                                │
🛠️ TraceMind MCP Server           🧠 TraceMind-AI
(Track 1 - Building MCP)     (This Project - Track 2)
   Provides AI Tools            Consumes MCP Tools
        └────────── MCP Protocol ────────┘
```
### The Foundation

**TraceVerde** - Automatic OpenTelemetry instrumentation for LLM frameworks → captures every LLM call, tool usage, and agent step → GitHub | PyPI

**SMOLTRACE** - Lightweight evaluation engine with built-in tracing → generates structured datasets (leaderboard, results, traces, metrics) → GitHub | PyPI

### The Platform

🛠️ **TraceMind MCP Server** - AI-powered analysis tools via MCP → Live Demo | GitHub → Track 1: Building MCP (Enterprise)

🧠 **TraceMind-AI (This Project)** - Interactive UI that consumes MCP tools → Track 2: MCP in Action (Enterprise)
## Why This Matters for Hugging Face

This ecosystem is built around Hugging Face, not just "using it":

- Every SMOLTRACE evaluation creates 4 structured datasets on the Hub (leaderboard, results, traces, metrics)
- TraceMind MCP Server and TraceMind-AI run as Hugging Face Spaces, using Gradio's MCP integration
- The stack is designed for `smolagents`: agents are evaluated, traced, and analyzed using HF's own agent framework
- Evaluations can be executed via HF Jobs, turning evaluations into real compute usage, not just local scripts

So TraceMind isn't just another agent demo. It's an opinionated blueprint for:

> "How Hugging Face models + Datasets + Spaces + Jobs + smolagents + MCP can work together as a complete agent evaluation and observability platform."
## Key Features

### 🎯 MCP Integration (Track 2)

TraceMind-AI demonstrates enterprise MCP client usage in two ways:

**1. Direct MCP Client Integration**

- Connects to TraceMind MCP Server via SSE transport
- Uses 5 AI-powered tools: `analyze_leaderboard`, `estimate_cost`, `debug_trace`, `compare_runs`, `analyze_results`
- Real-time insights powered by Google Gemini 2.5 Flash
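A minimal sketch of what that direct connection can look like with smolagents' `MCPClient`; the server URL is illustrative (Gradio MCP servers typically expose an SSE endpoint under `/gradio_api/mcp/sse`), so point it at the actual TraceMind MCP Server Space.

```python
# Sketch only: list the tools exposed by the TraceMind MCP Server over SSE.
# SERVER_URL is an illustrative placeholder, not the real Space endpoint.
from smolagents import MCPClient

SERVER_URL = "https://<tracemind-mcp-server-space>/gradio_api/mcp/sse"

with MCPClient({"url": SERVER_URL}) as tools:
    # Expect the five tools listed above (analyze_leaderboard, estimate_cost, ...)
    print([tool.name for tool in tools])
```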
**2. Autonomous Agent with MCP Tools**

- Built with the `smolagents` framework
- Agent has access to all MCP server tools
- Natural language queries → autonomous tool execution
- Example: "What are the top 3 models and how much do they cost?"
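Below is a hedged sketch of wiring such an agent together; the server URL and Gemini model id are assumptions about how the pieces could be connected, not the exact code in `app.py`.

```python
# Sketch only: an autonomous smolagents agent that can call the MCP server's tools.
# The server URL and model id are illustrative assumptions.
from smolagents import CodeAgent, LiteLLMModel, MCPClient

SERVER_URL = "https://<tracemind-mcp-server-space>/gradio_api/mcp/sse"

with MCPClient({"url": SERVER_URL}) as tools:
    agent = CodeAgent(
        tools=tools,
        model=LiteLLMModel(model_id="gemini/gemini-2.5-flash"),
    )
    # The agent decides which MCP tools to call to answer the question.
    print(agent.run("What are the top 3 models and how much do they cost?"))
```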
### 📊 Agent Evaluation Features

- Live Leaderboard: View all evaluation runs with sortable metrics
- Cost Estimation: Auto-select hardware and predict costs before running
- Trace Visualization: Deep-dive into OpenTelemetry traces with GPU metrics
- Multi-Cloud Jobs: Submit evaluations to HuggingFace Jobs or Modal
- Performance Analytics: GPU utilization, CO2 emissions, token tracking

### 💡 Smart Features

- Auto Hardware Selection: Based on model size and provider (see the sketch below)
- Real-time Job Monitoring: Track HuggingFace Jobs status
- Agent Reasoning Visibility: See step-by-step tool execution
- Quick Action Buttons: One-click common queries
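The auto-selection rules themselves aren't documented here, so the following is an illustrative heuristic only: the size thresholds, provider handling, and hardware flavor names are assumptions, not TraceMind-AI's actual logic.

```python
# Illustrative heuristic for auto hardware selection (assumptions, not the real rules).
def suggest_hardware(param_count_billions: float, provider: str) -> str:
    if provider != "huggingface":
        # Hosted API providers (OpenAI, Anthropic, ...) need no GPU on the job side.
        return "cpu-basic"
    if param_count_billions <= 8:
        return "a10g-small"
    if param_count_billions <= 34:
        return "a10g-large"
    return "a100-large"

print(suggest_hardware(8, "huggingface"))   # -> "a10g-small"
print(suggest_hardware(70, "huggingface"))  # -> "a100-large"
```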
## Quick Start

### Option 1: Use the Live Demo (Recommended)

1. Visit: https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind
2. Login: Sign in with your HuggingFace account
3. Explore: Browse the leaderboard, chat with the agent, visualize traces

### Option 2: Run Locally

```bash
# Clone and set up
git clone https://github.com/Mandark-droid/TraceMind-AI.git
cd TraceMind-AI
pip install -r requirements.txt

# Configure environment
cp .env.example .env
# Edit .env with your API keys (see the Configuration section)

# Run the app
python app.py
```

Then visit http://localhost:7860.
## Configuration

### For Viewing (Free)

Required:

- HuggingFace account (free)
- HuggingFace token with Read permissions

### For Submitting Jobs (Paid)

Required:

- ⚠️ HuggingFace Pro ($9/month) with a credit card
- HuggingFace token with Read + Write + Run Jobs permissions
- LLM provider API keys (OpenAI, Anthropic, etc.)

Optional (Modal Alternative):

- Modal account (pay-per-second, no subscription)
- Modal API token (MODAL_TOKEN_ID + MODAL_TOKEN_SECRET)
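Putting those keys together, a local `.env` might look like the sketch below. Only MODAL_TOKEN_ID and MODAL_TOKEN_SECRET are named above; the other variable names are assumptions, so treat `.env.example` in the repo as the authoritative list.

```bash
# Illustrative .env -- variable names other than MODAL_TOKEN_ID/MODAL_TOKEN_SECRET
# are assumptions; check .env.example for the real keys.
HF_TOKEN=hf_xxxxxxxxxxxxxxxx        # Read for viewing, Read+Write+Run Jobs for submitting
GEMINI_API_KEY=your-gemini-api-key  # powers the AI analysis features
MODAL_TOKEN_ID=ak-xxxxxxxx          # optional: only needed for Modal submissions
MODAL_TOKEN_SECRET=as-xxxxxxxx
```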
### Using Your Own API Keys (Recommended for Judges)

To prevent rate limits during evaluation:

**Step 1: Configure the MCP Server (required for AI tools)**

1. Visit: https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server
2. Go to the ⚙️ Settings tab
3. Enter: Gemini API Key + HuggingFace Token
4. Click "Save & Override Keys"

**Step 2: Configure TraceMind-AI (optional)**

1. Visit: https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind
2. Go to the ⚙️ Settings tab
3. Enter: Gemini API Key + HuggingFace Token
4. Click "Save API Keys"

Get free API keys:

- Gemini: https://ai.google.dev/ (1,500 requests/day)
- HuggingFace: https://huggingface.co/settings/tokens (unlimited for public datasets)
## For Hackathon Judges

### ✅ Track 2 Compliance

- MCP Client Integration: Connects to the remote MCP server via SSE transport
- Autonomous Agent: `smolagents` agent with MCP tool access
- Enterprise Focus: Cost optimization, job submission, performance analytics
- Production-Ready: Deployed to HuggingFace Spaces with OAuth authentication
- Real Data: Live HuggingFace datasets from SMOLTRACE evaluations

### 🎯 Key Innovations

- Dual MCP Integration: Both a direct MCP client and an autonomous agent with MCP tools
- Multi-Cloud Support: HuggingFace Jobs + Modal for serverless compute
- Auto Hardware Selection: Smart hardware recommendations based on model size
- Complete Ecosystem: Part of a 4-project platform demonstrating the full evaluation workflow
- Agent Reasoning Visibility: See step-by-step MCP tool execution
### 📹 Demo Materials

- 🎥 TraceMind-AI Full Demo (20 min): Watch on Loom - Complete walkthrough of all features
- 🎬 MCP Server Quick Demo (5 min): Watch on Loom - Quick intro to MCP tools
- 📺 MCP Server Full Demo (20 min): Watch on Loom - Deep dive into the MCP server
- 📝 Blog Post: Building TraceMind Ecosystem - Technical deep-dive
- 🔗 LinkedIn Post: TraceMind-AI Hackathon Submission - Final submission announcement
### 🧪 Testing Suggestions

1. Try the Agent Chat (🤖 Agent Chat tab):
   - "Analyze the current leaderboard and show me the top 5 models"
   - "Compare the costs of the top 3 models"
   - "Estimate the cost of running 100 tests with GPT-4"
2. Explore the Leaderboard (🏆 Leaderboard tab):
   - Click "Load Leaderboard" to see live data
   - Read the AI-generated insights (powered by the MCP server)
   - Click on a run to see detailed test results
3. Visualize Traces (select a run → View traces):
   - See OpenTelemetry waterfall diagrams
   - View the GPU metrics overlay (for GPU jobs)
   - Ask questions about the trace (MCP-powered debugging)
## What Can You Do?

### 📊 View & Analyze

- Browse the leaderboard with AI-powered insights
- Compare models side-by-side across metrics
- Analyze traces with interactive visualization
- Ask questions via the autonomous agent

### 💰 Estimate & Plan

- Get cost estimates before running evaluations
- Compare hardware options (CPU vs GPU tiers)
- Preview duration and CO2 emissions
- See recommendations from AI analysis

### 🚀 Submit & Monitor

- Submit evaluation jobs to HuggingFace or Modal
- Track job status in real time
- View results automatically when complete
- Download datasets for further analysis

### 🧪 Generate & Customize

- Generate synthetic datasets for custom domains and tools
- Create prompt templates optimized for your use case
- Push to the HuggingFace Hub with one click
- Test evaluations without writing code
## Documentation

For quick evaluation:

1. Read this README for an overview
2. Visit the Live Demo to try it
3. Check out the 🤖 Agent Chat tab for autonomous MCP usage

For deep dives:

- USER_GUIDE.md - Complete screen-by-screen walkthrough
  - Leaderboard tab usage
  - Agent chat interactions
  - Synthetic data generator
  - Job submission workflow
  - Trace visualization guide
- MCP_INTEGRATION.md - MCP client architecture
  - How TraceMind-AI connects to the MCP server
  - Agent framework integration (smolagents)
  - MCP tool usage examples
- JOB_SUBMISSION.md - Evaluation job guide
  - HuggingFace Jobs setup
  - Modal integration
  - Hardware selection guide
  - Cost optimization tips
- ARCHITECTURE.md - Technical architecture
  - Project structure
  - Data flow
  - Authentication
  - Deployment
## Technology Stack

- UI Framework: Gradio 5.49.1
- Agent Framework: smolagents 1.22.0+
- MCP Integration: MCP Python SDK + smolagents MCPClient
- Data Source: HuggingFace Datasets API
- Authentication: HuggingFace OAuth (planned)
- AI Models:
  - Agent: Google Gemini 2.5 Flash
  - MCP Server: Google Gemini 2.5 Flash
- Cloud Platforms: HuggingFace Jobs + Modal
## Example Workflows

### Workflow 1: Quick Analysis

1. Open TraceMind-AI
2. Go to 🤖 Agent Chat
3. Click "Quick: Top Models"
4. Watch the agent fetch the leaderboard and analyze the top performers
5. Ask a follow-up: "Which one is most cost-effective?"
### Workflow 2: Submit Evaluation Job

1. Go to ⚙️ Settings → configure your API keys
2. Go to 🚀 New Evaluation
3. Select a model (e.g., `meta-llama/Llama-3.1-8B`)
4. Choose infrastructure (HuggingFace Jobs or Modal)
5. Click "💰 Estimate Cost" to preview
6. Click "Submit Evaluation"
7. Monitor the job in the 📊 Job Monitoring tab
8. View results in the leaderboard when complete
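For a sense of what "Choose infrastructure: Modal" means under the hood, here is a hedged sketch of a Modal-style evaluation job. The dependencies, GPU choice, and function body are illustrative assumptions; TraceMind-AI assembles the real job from your settings and model selection.

```python
# Sketch only: what a Modal-backed evaluation job could look like.
# Dependencies, GPU type, and the function body are illustrative assumptions.
import modal

app = modal.App("smoltrace-eval-sketch")
image = modal.Image.debian_slim().pip_install("smoltrace")

@app.function(image=image, gpu="A10G", timeout=3600)
def run_evaluation(model_id: str) -> None:
    # In the real platform, SMOLTRACE would run the benchmark here and push
    # the leaderboard/results/traces/metrics datasets to the Hub.
    print(f"Evaluating {model_id} ...")

@app.local_entrypoint()
def main():
    run_evaluation.remote("meta-llama/Llama-3.1-8B")
```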
### Workflow 3: Debug Agent Behavior

1. Browse the 🏆 Leaderboard
2. Click on a run with failures
3. View the detailed test results
4. Click on a failed test to see its trace
5. Use the MCP-powered Q&A: "Why did this test fail?"
6. Get an AI analysis of the execution trace
### Workflow 4: Generate Custom Test Dataset

1. Go to 🔬 Synthetic Data Generator
2. Configure:
   - Domain: `finance`
   - Tools: `get_stock_price`, `calculate_profit`, `send_alert`
   - Number of tasks: `20`
   - Difficulty: `balanced`
3. Click "Generate Dataset"
4. Review the generated tasks and prompt template
5. Enter a repository name: `yourname/smoltrace-finance-tasks`
6. Click "Push to HuggingFace Hub"
7. Use your custom dataset in evaluations
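If you prefer to script the push step, it can look like the sketch below; the task fields are illustrative, since the generator defines the actual schema.

```python
# Sketch only: push a generated task set to the Hub with the datasets library.
# The column names here are illustrative -- the generator defines the real schema.
from datasets import Dataset

tasks = [
    {
        "task": "Check today's AAPL price and send an alert if profit exceeds 5%",
        "tools": ["get_stock_price", "calculate_profit", "send_alert"],
        "difficulty": "medium",
    },
]

Dataset.from_list(tasks).push_to_hub("yourname/smoltrace-finance-tasks")
```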
## Screenshots

See SCREENSHOTS.md for annotated screenshots of all screens.

## 🔗 Quick Links

### 📦 Component Links

| Component | Description | Links |
|---|---|---|
| TraceVerde | OTEL Instrumentation | GitHub • PyPI |
| SMOLTRACE | Evaluation Engine | GitHub • PyPI |
| MCP Server | Building MCP (Track 1) | HF Space • GitHub |
| TraceMind-AI | MCP in Action (Track 2) | HF Space • GitHub |
### 📢 Community Posts

- 🚀 TraceMind-AI Hackathon Submission - MCP's 1st Birthday Hackathon final submission
- 🚀 Building TraceMind Ecosystem Blog Post - Complete technical deep-dive into the TraceVerse ecosystem
- 🚀 TraceMind Teaser - MCP's 1st Birthday Hackathon announcement
- 🚀 SMOLTRACE Launch - Lightweight agent evaluation engine
- 🚀 TraceVerde Launch - Zero-code OTEL instrumentation for LLMs
- 🚀 TraceVerde 3K Downloads - Thank you to the community!
## Credits

- Built for: MCP's 1st Birthday Hackathon (Nov 14-30, 2025)
- Track: MCP in Action (Enterprise)
- Author: Kshitij Thakkar
- Powered by: TraceMind MCP Server + Gradio + smolagents
- Built with: Gradio 5.49.1 (MCP client integration)

Special Thanks:

- Eliseu Silva - For the gradio_htmlplus custom component that powers our interactive leaderboard table. Eliseu's timely help and collaboration during the hackathon were invaluable!

Sponsors: HuggingFace • Google Gemini • Modal • Anthropic • Gradio • OpenAI • Nebius • Hyperbolic • ElevenLabs • SambaNova • Blaxel
## License

AGPL-3.0 - See LICENSE for details.

## Support

- 📧 GitHub Issues: TraceMind-AI/issues
- 💬 HF Discord: `#mcp-1st-birthday-official`
- 🏷️ Tag: `mcp-in-action-track-enterprise`
- 🐦 Twitter: @TraceMindAI (placeholder)

Ready to evaluate your agents with AI-powered intelligence?

👉 Try the live demo: https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind