metadata
title: TraceMind AI
emoji: 🧠
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
short_description: AI agent evaluation with MCP-powered intelligence
license: agpl-3.0
pinned: true
tags:
  - mcp-in-action-track-enterprise
  - agent-evaluation
  - mcp-client
  - leaderboard
  - gradio

🧠 TraceMind-AI


Agent Evaluation Platform with MCP-Powered Intelligence

MCP's 1st Birthday Hackathon Track 2: MCP in Action Powered by Gradio

🎯 Track 2 Submission: MCP in Action (Enterprise)
📅 MCP's 1st Birthday Hackathon: November 14-30, 2025


Why TraceMind-AI?

The Challenge: Evaluating AI agents generates complex data across models, providers, and configurations. Making sense of it all is overwhelming.

The Solution: TraceMind-AI is your intelligent agent evaluation command center:

  • 📊 Live leaderboard with real-time performance data
  • 🤖 Autonomous agent chat powered by MCP tools
  • 💰 Smart cost estimation before you run evaluations
  • 🔍 Deep trace analysis to debug agent behavior
  • ☁️ Multi-cloud job submission (HuggingFace Jobs + Modal)

All powered by the Model Context Protocol for AI-driven insights at every step.


🚀 Try It Now


The TraceMind Ecosystem

TraceMind-AI is the user-facing platform in a complete 4-project agent evaluation ecosystem:

TraceVerse Ecosystem

🔭 TraceVerde                    📊 SMOLTRACE
(genai_otel_instrument)         (Evaluation Engine)
        ↓                               ↓
    Instruments                    Evaluates
    LLM calls                      agents
        ↓                               ↓
        └───────────┬───────────────────┘
                    ↓
            Generates Datasets
        (leaderboard, traces, metrics)
                    ↓
        ┌───────────┴───────────────────┐
        ↓                               ↓
🛠️ TraceMind MCP Server         🧠 TraceMind-AI
(Track 1 - Building MCP)        (This Project - Track 2)
Provides AI Tools               Consumes MCP Tools
        └───────── MCP Protocol ────────┘

The Foundation

🔭 TraceVerde - Automatic OpenTelemetry instrumentation for LLM frameworks → Captures every LLM call, tool usage, and agent step → GitHub | PyPI

📊 SMOLTRACE - Lightweight evaluation engine with built-in tracing → Generates structured datasets (leaderboard, results, traces, metrics) → GitHub | PyPI

The Platform

🛠️ TraceMind MCP Server - AI-powered analysis tools via MCP → Live Demo | GitHub → Track 1: Building MCP (Enterprise)

🧠 TraceMind-AI (This Project) - Interactive UI that consumes MCP tools → Track 2: MCP in Action (Enterprise)


Why This Matters for Hugging Face

This ecosystem is built around Hugging Face, not just "using it":

  • Every SMOLTRACE evaluation creates 4 structured datasets on the Hub (leaderboard, results, traces, metrics)
  • TraceMind MCP Server and TraceMind-AI run as Hugging Face Spaces, using Gradio's MCP integration
  • The stack is designed for smolagents – agents are evaluated, traced, and analyzed using HF's own agent framework
  • Evaluations can be executed via HF Jobs, turning evaluations into real compute usage, not just local scripts

So TraceMind isn't just another agent demo. It's an opinionated blueprint for:

"How Hugging Face models + Datasets + Spaces + Jobs + smolagents + MCP can work together as a complete agent evaluation and observability platform."


Key Features

🎯 MCP Integration (Track 2)

TraceMind-AI demonstrates enterprise MCP client usage in two ways:

1. Direct MCP Client Integration

  • Connects to TraceMind MCP Server via SSE transport
  • Uses 5 AI-powered tools: analyze_leaderboard, estimate_cost, debug_trace, compare_runs, analyze_results
  • Real-time insights powered by Google Gemini 2.5 Flash
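
The direct client connection described above can be sketched as follows, assuming the official MCP Python SDK (the `mcp` package) is installed. The `<owner>-<name>.hf.space` host pattern and the `/gradio_api/mcp/sse` path follow the usual Spaces and Gradio conventions but are assumptions here, and the helper names `space_sse_url` and `list_tool_names` are hypothetical; check the MCP server's own documentation for the exact endpoint.

```python
def space_sse_url(space_id: str) -> str:
    """Build an SSE endpoint URL for a Gradio MCP server hosted as a Space.

    Assumption: the Space is served at <owner>-<name>.hf.space and exposes
    Gradio's MCP endpoint at /gradio_api/mcp/sse.
    """
    owner, name = space_id.split("/", 1)
    host = f"{owner}-{name}".lower().replace("_", "-").replace(".", "-")
    return f"https://{host}.hf.space/gradio_api/mcp/sse"


async def list_tool_names(url: str) -> list[str]:
    """Connect over SSE and return the names of the tools the server exposes."""
    # Imported lazily so the URL helper works without the SDK installed.
    from mcp import ClientSession
    from mcp.client.sse import sse_client

    async with sse_client(url) as (read_stream, write_stream):
        async with ClientSession(read_stream, write_stream) as session:
            await session.initialize()  # MCP handshake before any request
            result = await session.list_tools()
            return [tool.name for tool in result.tools]
```

Running `asyncio.run(list_tool_names(space_sse_url("MCP-1st-Birthday/TraceMind-mcp-server")))` against a reachable server should return the tool names listed above, provided the endpoint assumption holds.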

2. Autonomous Agent with MCP Tools

  • Built with smolagents framework
  • Agent has access to all MCP server tools
  • Natural language queries → autonomous tool execution
  • Example: "What are the top 3 models and how much do they cost?"
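
A minimal sketch of the agent path, assuming `smolagents` is installed with its MCP and LiteLLM extras. The `GEMINI_API_KEY` variable name, the choice of `ToolCallingAgent`, and the helper names are assumptions made for illustration, not a description of how TraceMind-AI wires its agent.

```python
import os


def mcp_server_parameters(url: str) -> dict:
    """Connection parameters for a remote MCP server reached over SSE."""
    return {"url": url, "transport": "sse"}


def ask_agent(question: str, server_url: str) -> str:
    """Run one natural-language query through an agent that can call MCP tools."""
    # Imported lazily; requires smolagents with the mcp and litellm extras.
    from smolagents import LiteLLMModel, MCPClient, ToolCallingAgent

    # Assumption: Gemini is reached through LiteLLM with a GEMINI_API_KEY variable.
    model = LiteLLMModel(
        model_id="gemini/gemini-2.5-flash",
        api_key=os.environ["GEMINI_API_KEY"],
    )
    # MCPClient exposes the server's tools as smolagents tools for the agent.
    with MCPClient(mcp_server_parameters(server_url)) as tools:
        agent = ToolCallingAgent(tools=tools, model=model)
        return str(agent.run(question))
```

With this shape, `ask_agent("What are the top 3 models and how much do they cost?", server_url)` leaves the choice of which MCP tools to call to the agent.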

📊 Agent Evaluation Features

  • Live Leaderboard: View all evaluation runs with sortable metrics
  • Cost Estimation: Auto-select hardware and predict costs before running
  • Trace Visualization: Deep-dive into OpenTelemetry traces with GPU metrics
  • Multi-Cloud Jobs: Submit evaluations to HuggingFace Jobs or Modal
  • Performance Analytics: GPU utilization, CO2 emissions, token tracking

💡 Smart Features

  • Auto Hardware Selection: Based on model size and provider
  • Real-time Job Monitoring: Track HuggingFace Jobs status
  • Agent Reasoning Visibility: See step-by-step tool execution
  • Quick Action Buttons: One-click common queries

Quick Start

Option 1: Use the Live Demo (Recommended)

  1. Visit: https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind
  2. Login: Sign in with your HuggingFace account
  3. Explore: Browse the leaderboard, chat with the agent, visualize traces

Option 2: Run Locally

# Clone and setup
git clone https://github.com/Mandark-droid/TraceMind-AI.git
cd TraceMind-AI
pip install -r requirements.txt

# Configure environment
cp .env.example .env
# Edit .env with your API keys (see Configuration section)

# Run the app
python app.py

Visit http://localhost:7860


Configuration

For Viewing (Free)

Required:

  • HuggingFace account (free)
  • HuggingFace token with Read permissions

For Submitting Jobs (Paid)

Required:

  • ⚠️ HuggingFace Pro ($9/month) with credit card
  • HuggingFace token with Read + Write + Run Jobs permissions
  • LLM provider API keys (OpenAI, Anthropic, etc.)

Optional (Modal Alternative):

  • Modal account (pay-per-second, no subscription)
  • Modal API token (MODAL_TOKEN_ID + MODAL_TOKEN_SECRET)
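
A small pre-flight check along these lines can catch missing credentials before a job is submitted. `MODAL_TOKEN_ID` and `MODAL_TOKEN_SECRET` are the names given above; `HF_TOKEN` and the mode names are assumptions, so adjust them to whatever `.env.example` actually defines.

```python
import os

# Variables each usage mode needs. HF_TOKEN and the mode names are assumed.
REQUIRED = {
    "view": ["HF_TOKEN"],
    "submit_hf_jobs": ["HF_TOKEN"],
    "submit_modal": ["HF_TOKEN", "MODAL_TOKEN_ID", "MODAL_TOKEN_SECRET"],
}


def missing_variables(mode: str, env=None) -> list[str]:
    """Return the required variables that are unset or empty for a mode."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED[mode] if not env.get(name)]
```

For example, `missing_variables("submit_modal")` returns an empty list only when all three variables are set in the current environment.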

Using Your Own API Keys (Recommended for Judges)

To prevent rate limits during evaluation:

Step 1: Configure MCP Server (Required for AI tools)

  1. Visit: https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server
  2. Go to ⚙️ Settings tab
  3. Enter: Gemini API Key + HuggingFace Token
  4. Click "Save & Override Keys"

Step 2: Configure TraceMind-AI (Optional)

  1. Visit: https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind
  2. Go to ⚙️ Settings tab
  3. Enter: Gemini API Key + HuggingFace Token
  4. Click "Save API Keys"

Get Free API Keys:


For Hackathon Judges

✅ Track 2 Compliance

  • MCP Client Integration: Connects to remote MCP server via SSE transport
  • Autonomous Agent: smolagents agent with MCP tool access
  • Enterprise Focus: Cost optimization, job submission, performance analytics
  • Production-Ready: Deployed to HuggingFace Spaces with OAuth authentication
  • Real Data: Live HuggingFace datasets from SMOLTRACE evaluations

🎯 Key Innovations

  1. Dual MCP Integration: Both direct MCP client + autonomous agent with MCP tools
  2. Multi-Cloud Support: HuggingFace Jobs + Modal for serverless compute
  3. Auto Hardware Selection: Smart hardware recommendations based on model size
  4. Complete Ecosystem: Part of 4-project platform demonstrating full evaluation workflow
  5. Agent Reasoning Visibility: See step-by-step MCP tool execution

📹 Demo Materials

🧪 Testing Suggestions

1. Try the Agent Chat (🤖 Agent Chat tab):

  • "Analyze the current leaderboard and show me the top 5 models"
  • "Compare the costs of the top 3 models"
  • "Estimate the cost of running 100 tests with GPT-4"

2. Explore the Leaderboard (📊 Leaderboard tab):

  • Click "Load Leaderboard" to see live data
  • Read the AI-generated insights (powered by MCP server)
  • Click on a run to see detailed test results

3. Visualize Traces (Select a run → View traces):

  • See OpenTelemetry waterfall diagrams
  • View GPU metrics overlay (for GPU jobs)
  • Ask questions about the trace (MCP-powered debugging)
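
For readers unfamiliar with waterfall diagrams, the idea can be sketched in a few lines: each span becomes a bar whose offset and length reflect its start time and duration on a shared axis. The `Span` fields below are a simplified, hypothetical shape, not the schema of the stored OpenTelemetry traces.

```python
from dataclasses import dataclass


@dataclass
class Span:
    name: str
    start_ms: float
    end_ms: float
    depth: int = 0  # nesting level: 0 for the root span


def render_waterfall(spans: list[Span], width: int = 40) -> str:
    """Render spans as text rows with bars positioned on a shared time axis."""
    t0 = min(s.start_ms for s in spans)
    total = (max(s.end_ms for s in spans) - t0) or 1.0  # avoid division by zero
    rows = []
    for s in sorted(spans, key=lambda s: s.start_ms):
        offset = int((s.start_ms - t0) / total * width)
        length = max(1, int((s.end_ms - s.start_ms) / total * width))
        label = "  " * s.depth + s.name  # indent child spans under their parent
        rows.append(f"{label:<24}|{' ' * offset}{'#' * length}")
    return "\n".join(rows)
```

Printing `render_waterfall([Span("agent.run", 0, 1000), Span("llm.call", 50, 600, 1), Span("tool.search", 620, 950, 1)])` shows the root span as a full-width bar with the two child spans laid out beneath it in time order.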

What Can You Do?

📊 View & Analyze

  • Browse leaderboard with AI-powered insights
  • Compare models side-by-side across metrics
  • Analyze traces with interactive visualization
  • Ask questions via autonomous agent

💰 Estimate & Plan

  • Get cost estimates before running evaluations
  • Compare hardware options (CPU vs GPU tiers)
  • Preview duration and CO2 emissions
  • See recommendations from AI analysis
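
The arithmetic behind a pre-run estimate is simple enough to sketch. The functions below are an illustrative assumption about how such an estimate can be computed, not the MCP server's `estimate_cost` implementation; all prices are supplied by the caller, and none are built in.

```python
def estimate_api_cost(
    num_tests: int,
    avg_input_tokens: float,
    avg_output_tokens: float,
    usd_per_million_input: float,
    usd_per_million_output: float,
) -> float:
    """Estimated LLM API cost in USD for a batch of evaluation tests."""
    input_cost = num_tests * avg_input_tokens / 1_000_000 * usd_per_million_input
    output_cost = num_tests * avg_output_tokens / 1_000_000 * usd_per_million_output
    return round(input_cost + output_cost, 4)


def estimate_gpu_cost(duration_seconds: float, usd_per_hour: float) -> float:
    """Estimated compute cost in USD for a job billed by time."""
    return round(duration_seconds / 3600 * usd_per_hour, 4)
```

API-hosted models are dominated by the token term, while locally run models are dominated by the time-based term, which is why comparing hardware options matters mostly for the latter.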

🚀 Submit & Monitor

  • Submit evaluation jobs to HuggingFace or Modal
  • Track job status in real-time
  • View results automatically when complete
  • Download datasets for further analysis

🧪 Generate & Customize

  • Generate synthetic datasets for custom domains and tools
  • Create prompt templates optimized for your use case
  • Push to HuggingFace Hub with one click
  • Test evaluations without writing code

Documentation

For quick evaluation:

  • Read this README for overview
  • Visit the Live Demo to try it
  • Check out the 🤖 Agent Chat tab for autonomous MCP usage

For deep dives:

  • USER_GUIDE.md - Complete screen-by-screen walkthrough
    • Leaderboard tab usage
    • Agent chat interactions
    • Synthetic data generator
    • Job submission workflow
    • Trace visualization guide
  • MCP_INTEGRATION.md - MCP client architecture
    • How TraceMind-AI connects to MCP server
    • Agent framework integration (smolagents)
    • MCP tool usage examples
  • JOB_SUBMISSION.md - Evaluation job guide
    • HuggingFace Jobs setup
    • Modal integration
    • Hardware selection guide
    • Cost optimization tips
  • ARCHITECTURE.md - Technical architecture
    • Project structure
    • Data flow
    • Authentication
    • Deployment

Technology Stack

  • UI Framework: Gradio 5.49.1
  • Agent Framework: smolagents 1.22.0+
  • MCP Integration: MCP Python SDK + smolagents MCPClient
  • Data Source: HuggingFace Datasets API
  • Authentication: HuggingFace OAuth (planned)
  • AI Models:
    • Agent: Google Gemini 2.5 Flash
    • MCP Server: Google Gemini 2.5 Flash
  • Cloud Platforms: HuggingFace Jobs + Modal

Example Workflows

Workflow 1: Quick Analysis

  1. Open TraceMind-AI
  2. Go to 🤖 Agent Chat
  3. Click "Quick: Top Models"
  4. See agent fetch leaderboard and analyze top performers
  5. Ask follow-up: "Which one is most cost-effective?"

Workflow 2: Submit Evaluation Job

  1. Go to ⚙️ Settings → Configure API keys
  2. Go to 🚀 New Evaluation
  3. Select model (e.g., meta-llama/Llama-3.1-8B)
  4. Choose infrastructure (HuggingFace Jobs or Modal)
  5. Click "💰 Estimate Cost" to preview
  6. Click "Submit Evaluation"
  7. Monitor job in 📊 Job Monitoring tab
  8. View results in leaderboard when complete

Workflow 3: Debug Agent Behavior

  1. Browse 📊 Leaderboard
  2. Click on a run with failures
  3. View detailed test results
  4. Click on a failed test to see trace
  5. Use MCP-powered Q&A: "Why did this test fail?"
  6. Get AI analysis of the execution trace

Workflow 4: Generate Custom Test Dataset

  1. Go to 🔬 Synthetic Data Generator
  2. Configure:
    • Domain: finance
    • Tools: get_stock_price,calculate_profit,send_alert
    • Number of tasks: 20
    • Difficulty: balanced
  3. Click "Generate Dataset"
  4. Review generated tasks and prompt template
  5. Enter repository name: yourname/smoltrace-finance-tasks
  6. Click "Push to HuggingFace Hub"
  7. Use your custom dataset in evaluations
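
The same workflow can be approximated in code. This sketch builds placeholder task rows and pushes them with the `datasets` library; the field names (`id`, `domain`, `expected_tool`, `prompt`) are a hypothetical schema, and the real generator produces its tasks with an LLM rather than a template.

```python
def build_task_records(domain: str, tools: list[str], num_tasks: int) -> list[dict]:
    """Create placeholder task rows that cycle through the available tools."""
    return [
        {
            "id": f"{domain}-{i:03d}",
            "domain": domain,
            "expected_tool": tools[i % len(tools)],
            "prompt": f"[{domain}] task {i + 1} that should use {tools[i % len(tools)]}",
        }
        for i in range(num_tasks)
    ]


def push_tasks(records: list[dict], repo_id: str) -> None:
    """Upload the rows as a dataset; requires a token with write access."""
    # Imported lazily so the record builder works without `datasets` installed.
    from datasets import Dataset

    Dataset.from_list(records).push_to_hub(repo_id)
```

With the settings from the workflow above, `build_task_records("finance", ["get_stock_price", "calculate_profit", "send_alert"], 20)` yields 20 rows that could then be passed to `push_tasks(records, "yourname/smoltrace-finance-tasks")`.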

Screenshots

See SCREENSHOTS.md for annotated screenshots of all screens


🔗 Quick Links

📦 Component Links

| Component | Description | Links |
|---|---|---|
| TraceVerde | OTEL Instrumentation | GitHub • PyPI |
| SMOLTRACE | Evaluation Engine | GitHub • PyPI |
| MCP Server | Building MCP (Track 1) | HF Space • GitHub |
| TraceMind-AI | MCP in Action (Track 2) | HF Space • GitHub |

📢 Community Posts


Credits

Built for: MCP's 1st Birthday Hackathon (Nov 14-30, 2025)
Track: MCP in Action (Enterprise)
Author: Kshitij Thakkar
Powered by: TraceMind MCP Server + Gradio + smolagents
Built with: Gradio 5.49.1 (MCP client integration)

Special Thanks:

  • Eliseu Silva - For the gradio_htmlplus custom component that powers our interactive leaderboard table. Eliseu's timely help and collaboration during the hackathon were invaluable!

Sponsors: HuggingFace • Google Gemini • Modal • Anthropic • Gradio • OpenAI • Nebius • Hyperbolic • ElevenLabs • SambaNova • Blaxel


License

AGPL-3.0 - See LICENSE for details


Support

  • 📧 GitHub Issues: TraceMind-AI/issues
  • 💬 HF Discord: #mcp-1st-birthday-official🏆
  • 🏷️ Tag: mcp-in-action-track-enterprise
  • 🐦 Twitter: @TraceMindAI (placeholder)

Ready to evaluate your agents with AI-powered intelligence?

🌐 Try the live demo: https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind