---
title: TraceMind AI
emoji: 🧠
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
short_description: AI agent evaluation with MCP-powered intelligence
license: agpl-3.0
pinned: true
tags:
- mcp-in-action-track-enterprise
- agent-evaluation
- mcp-client
- leaderboard
- gradio
---
# 🧠 TraceMind-AI
<p align="center">
<img src="https://raw.githubusercontent.com/Mandark-droid/TraceMind-AI/assets/Logo.png" alt="TraceMind-AI Logo" width="200"/>
</p>
**Agent Evaluation Platform with MCP-Powered Intelligence**
> **🎯 Track 2 Submission**: MCP in Action (Enterprise)
>
> **📅 MCP's 1st Birthday Hackathon**: November 14-30, 2025

---
## Why TraceMind-AI?
**The Challenge**: Evaluating AI agents generates complex data across models, providers, and configurations. Making sense of it all is overwhelming.

**The Solution**: TraceMind-AI is your **intelligent agent evaluation command center**:

- 🏆 **Live leaderboard** with real-time performance data
- 🤖 **Autonomous agent chat** powered by MCP tools
- 💰 **Smart cost estimation** before you run evaluations
- 🔍 **Deep trace analysis** to debug agent behavior
- ☁️ **Multi-cloud job submission** (HuggingFace Jobs + Modal)

All powered by the **Model Context Protocol** for AI-driven insights at every step.

---
## 🚀 Try It Now

- **🌐 Live Demo**: [TraceMind-AI Space](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind)
- **🛠️ MCP Server**: [TraceMind-mcp-server](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server) (Track 1)
- **📚 Full Docs**: See [USER_GUIDE.md](USER_GUIDE.md) for a complete walkthrough
- **🎥 TraceMind-AI Full Demo (20 min)**: [Watch on Loom](https://www.loom.com/share/70b9689b57204da58b8fef0d23c304fe)
- **🎬 MCP Server Quick Demo (5 min)**: [Watch on Loom](https://www.loom.com/share/d4d0003f06fa4327b46ba5c081bdf835)
- **📺 MCP Server Full Demo (20 min)**: [Watch on Loom](https://www.loom.com/share/de559bb0aef749559c79117b7f951250)

---
## The TraceMind Ecosystem
TraceMind-AI is the **user-facing platform** in a complete 4-project agent evaluation ecosystem:
<p align="center">
<img src="https://raw.githubusercontent.com/Mandark-droid/TraceMind-AI/assets/TraceVerse_Logo.png" alt="TraceVerse Ecosystem" width="400"/>
<br/><br/>
</p>
```
🔭 TraceVerde                        📊 SMOLTRACE
(genai_otel_instrument)              (Evaluation Engine)
        │                                  │
   Instruments                         Evaluates
    LLM calls                           agents
        │                                  │
        └─────────────┬────────────────────┘
                      │
              Generates Datasets
        (leaderboard, traces, metrics)
                      │
        ┌─────────────┴────────────────────┐
        │                                  │
🛠️ TraceMind MCP Server              🧠 TraceMind-AI
(Track 1 - Building MCP)             (This Project - Track 2)
Provides AI Tools                    Consumes MCP Tools
        └──────── MCP Protocol ────────────┘
```
### The Foundation

**🔭 TraceVerde** - Automatic OpenTelemetry instrumentation for LLM frameworks
→ Captures every LLM call, tool usage, and agent step
→ [GitHub](https://github.com/Mandark-droid/genai_otel_instrument) | [PyPI](https://pypi.org/project/genai-otel-instrument)

**📊 SMOLTRACE** - Lightweight evaluation engine with built-in tracing
→ Generates structured datasets (leaderboard, results, traces, metrics)
→ [GitHub](https://github.com/Mandark-droid/SMOLTRACE) | [PyPI](https://pypi.org/project/smoltrace/)

### The Platform

**🛠️ TraceMind MCP Server** - AI-powered analysis tools via MCP
→ [Live Demo](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server) | [GitHub](https://github.com/Mandark-droid/TraceMind-mcp-server)
→ **Track 1**: Building MCP (Enterprise)

**🧠 TraceMind-AI** (This Project) - Interactive UI that consumes MCP tools
→ **Track 2**: MCP in Action (Enterprise)

---
## Why This Matters for Hugging Face
This ecosystem is built **around** Hugging Face, not just "using it":

- Every SMOLTRACE evaluation creates **4 structured `datasets` on the Hub** (leaderboard, results, traces, metrics)
- TraceMind MCP Server and TraceMind-AI run as **Hugging Face Spaces**, using **Gradio's MCP integration**
- The stack is designed for **`smolagents`**: agents are evaluated, traced, and analyzed using HF's own agent framework
- Evaluations can be executed via **HF Jobs**, so they run as real compute workloads rather than only as local scripts

So TraceMind isn't just another agent demo.

**It's an opinionated blueprint for:**

> **"How Hugging Face models + Datasets + Spaces + Jobs + smolagents + MCP can work together as a complete agent evaluation and observability platform."**
---
## Key Features
### 🎯 MCP Integration (Track 2)

TraceMind-AI demonstrates **enterprise MCP client usage** in two ways:

**1. Direct MCP Client Integration**
- Connects to the TraceMind MCP Server via SSE transport
- Uses 5 AI-powered tools: `analyze_leaderboard`, `estimate_cost`, `debug_trace`, `compare_runs`, `analyze_results`
- Real-time insights powered by Google Gemini 2.5 Flash

**2. Autonomous Agent with MCP Tools**
- Built with the `smolagents` framework
- Agent has access to all MCP server tools
- Natural language queries → autonomous tool execution
- Example: *"What are the top 3 models and how much do they cost?"*
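
As a rough illustration of the second pattern, the sketch below wires a `smolagents` `CodeAgent` to the MCP server over SSE. It is not TraceMind-AI's actual client code (see [MCP_INTEGRATION.md](MCP_INTEGRATION.md) for that). It assumes Gradio's default MCP endpoint path (`/gradio_api/mcp/sse`), the usual `owner-space.hf.space` subdomain scheme for Spaces, a `smolagents` release whose `MCPClient` accepts a `transport` key, and LiteLLM's `gemini/gemini-2.5-flash` model id; `space_mcp_sse_url` and `run_agent` are hypothetical names.

```python
import re
from typing import Optional


def space_mcp_sse_url(space_id: str) -> str:
    """Build the SSE endpoint of a Gradio MCP server hosted on a HF Space.

    Assumption: "owner/space" maps to the subdomain "owner-space" (lowercased,
    with "_" and "." replaced by "-"), and Gradio serves MCP at its default path.
    """
    owner, name = space_id.split("/", 1)
    subdomain = re.sub(r"[_.]", "-", f"{owner}-{name}".lower())
    return f"https://{subdomain}.hf.space/gradio_api/mcp/sse"


def run_agent(question: str, gemini_api_key: str, mcp_url: Optional[str] = None) -> str:
    """Hypothetical wiring: expose every MCP server tool to a smolagents CodeAgent."""
    # Imported lazily so the URL helper above works without smolagents installed.
    from smolagents import CodeAgent, LiteLLMModel, MCPClient

    url = mcp_url or space_mcp_sse_url("MCP-1st-Birthday/TraceMind-mcp-server")
    model = LiteLLMModel(model_id="gemini/gemini-2.5-flash", api_key=gemini_api_key)
    # The context manager opens the SSE session and yields the server's tools.
    with MCPClient({"url": url, "transport": "sse"}) as tools:
        agent = CodeAgent(tools=tools, model=model)
        return str(agent.run(question))
```

For example, `run_agent("What are the top 3 models and how much do they cost?", gemini_api_key="...")` lets the agent decide which MCP tools to call; passing `mcp_url` points the same code at a locally running server instead.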
### 📊 Agent Evaluation Features
- **Live Leaderboard**: View all evaluation runs with sortable metrics
- **Cost Estimation**: Auto-select hardware and predict costs before running
- **Trace Visualization**: Deep-dive into OpenTelemetry traces with GPU metrics
- **Multi-Cloud Jobs**: Submit evaluations to HuggingFace Jobs or Modal
- **Performance Analytics**: GPU utilization, CO2 emissions, token tracking
### 💡 Smart Features
- **Auto Hardware Selection**: Based on model size and provider
- **Real-time Job Monitoring**: Track HuggingFace Jobs status
- **Agent Reasoning Visibility**: See step-by-step tool execution
- **Quick Action Buttons**: One-click common queries
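
To make "auto hardware selection" concrete, here is a simple memory heuristic. This is a hypothetical sketch, not TraceMind's actual selector: it assumes fp16 weights (2 bytes per parameter) plus a 20% overhead, treats API-served providers (the names in `API_PROVIDERS` are assumptions) as needing only a CPU runner, and uses HF Jobs-style flavor names paired with the memory of a T4 (16 GB), A10G (24 GB), and A100 (80 GB).

```python
# Hypothetical hardware tiers: (flavor name, GPU memory in GB), smallest first.
GPU_TIERS = [("t4-small", 16), ("a10g-large", 24), ("a100-large", 80)]

# Providers that call a hosted API, so the job only needs a CPU runner (assumption).
API_PROVIDERS = {"litellm", "inference"}


def estimate_vram_gb(params_billion: float, bytes_per_param: float = 2.0,
                     overhead: float = 1.2) -> float:
    """Rough memory footprint for local inference: parameters x precision x overhead."""
    return params_billion * bytes_per_param * overhead


def select_hardware(provider: str, params_billion: float) -> str:
    """Pick the smallest tier whose GPU memory fits the estimated footprint."""
    if provider in API_PROVIDERS:
        return "cpu-basic"
    needed = estimate_vram_gb(params_billion)
    for flavor, memory_gb in GPU_TIERS:
        if needed <= memory_gb:
            return flavor
    raise ValueError(f"~{needed:.0f} GB needed; no single-GPU tier is large enough")
```

Under these assumptions an 8B-parameter model needs roughly 8 × 2 × 1.2 ≈ 19 GB and lands on `a10g-large`, while a 70B model raises `ValueError` because it would need multi-GPU hardware that this sketch does not model.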
---
## Quick Start
### Option 1: Use the Live Demo (Recommended)
1. **Visit**: https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind
2. **Login**: Sign in with your HuggingFace account
3. **Explore**: Browse the leaderboard, chat with the agent, visualize traces
### Option 2: Run Locally
```bash
# Clone and setup
git clone https://github.com/Mandark-droid/TraceMind-AI.git
cd TraceMind-AI
pip install -r requirements.txt
# Configure environment
cp .env.example .env
# Edit .env with your API keys (see Configuration section)
# Run the app
python app.py
```
Then open http://localhost:7860 (Gradio's default port) in your browser.

---
## Configuration
### For Viewing (Free)
**Required**:
- HuggingFace account (free)
- HuggingFace token with **Read** permissions
### For Submitting Jobs (Paid)
**Required**:
- ⚠️ **HuggingFace Pro** ($9/month; requires a credit card on file)
- HuggingFace token with **Read + Write + Run Jobs** permissions
- LLM provider API keys (OpenAI, Anthropic, etc.)

**Optional (Modal Alternative)**:
- Modal account (pay-per-second, no subscription)
- Modal API token (`MODAL_TOKEN_ID` + `MODAL_TOKEN_SECRET`)

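
Before submitting jobs, the requirements above can be checked as a small pre-flight step. This is a hypothetical sketch: `MODAL_TOKEN_ID` and `MODAL_TOKEN_SECRET` come from this README, while `HF_TOKEN` and `GEMINI_API_KEY` are assumed variable names, so check `.env.example` for the names the app actually reads. It only checks that values are present, not token permissions or Pro status.

```python
import os
from typing import Mapping

# Hypothetical mapping from intended use to the environment variables it needs.
REQUIRED_ENV = {
    "view": ["HF_TOKEN"],
    "agent_chat": ["HF_TOKEN", "GEMINI_API_KEY"],
    "submit_hf_jobs": ["HF_TOKEN"],
    "submit_modal": ["MODAL_TOKEN_ID", "MODAL_TOKEN_SECRET"],
}


def missing_env(mode: str, env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variables for `mode` that are unset or empty."""
    return [name for name in REQUIRED_ENV[mode] if not env.get(name)]
```

A caller could stop early with a clear message, for example by raising `SystemExit` when `missing_env("submit_modal")` returns a non-empty list.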
### Using Your Own API Keys (Recommended for Judges)
To avoid hitting shared rate limits during evaluation, configure your own keys:

**Step 1: Configure MCP Server** (Required for AI tools)
1. Visit: https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server
2. Go to the **⚙️ Settings** tab
3. Enter your **Gemini API Key** + **HuggingFace Token**
4. Click **"Save & Override Keys"**

**Step 2: Configure TraceMind-AI** (Optional)
1. Visit: https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind
2. Go to the **⚙️ Settings** tab
3. Enter your **Gemini API Key** + **HuggingFace Token**
4. Click **"Save API Keys"**

**Get Free API Keys**:
- **Gemini**: https://ai.google.dev/ (1,500 requests/day)
- **HuggingFace**: https://huggingface.co/settings/tokens (unlimited for public datasets)

---
## For Hackathon Judges
### ✅ Track 2 Compliance
- **MCP Client Integration**: Connects to remote MCP server via SSE transport
- **Autonomous Agent**: `smolagents` agent with MCP tool access
- **Enterprise Focus**: Cost optimization, job submission, performance analytics
- **Production-Ready**: Deployed to HuggingFace Spaces with OAuth authentication
- **Real Data**: Live HuggingFace datasets from SMOLTRACE evaluations
### 🎯 Key Innovations
1. **Dual MCP Integration**: Both direct MCP client + autonomous agent with MCP tools
2. **Multi-Cloud Support**: HuggingFace Jobs + Modal for serverless compute
3. **Auto Hardware Selection**: Smart hardware recommendations based on model size
4. **Complete Ecosystem**: Part of a 4-project platform demonstrating the full evaluation workflow
5. **Agent Reasoning Visibility**: See step-by-step MCP tool execution

### 📹 Demo Materials
- **🎥 TraceMind-AI Full Demo (20 min)**: [Watch on Loom](https://www.loom.com/share/70b9689b57204da58b8fef0d23c304fe) - Complete walkthrough of all features
- **🎬 MCP Server Quick Demo (5 min)**: [Watch on Loom](https://www.loom.com/share/d4d0003f06fa4327b46ba5c081bdf835) - Quick intro to MCP tools
- **📺 MCP Server Full Demo (20 min)**: [Watch on Loom](https://www.loom.com/share/de559bb0aef749559c79117b7f951250) - Deep dive into MCP server
- **📝 Blog Post**: [Building TraceMind Ecosystem](https://huggingface.co/blog/kshitijthakkar/tracemind-ecosystem) - Technical deep-dive
- **🔗 LinkedIn Post**: [TraceMind-AI Hackathon Submission](https://www.linkedin.com/posts/kshitij-thakkar-2061b924_mcp1stbirthdayhackathon-mcp-modelcontextprotocol-activity-7399775530218065920-owgR) - Final submission announcement
### 🧪 Testing Suggestions

**1. Try the Agent Chat** (🤖 Agent Chat tab):
- "Analyze the current leaderboard and show me the top 5 models"
- "Compare the costs of the top 3 models"
- "Estimate the cost of running 100 tests with GPT-4"

**2. Explore the Leaderboard** (🏆 Leaderboard tab):
- Click "Load Leaderboard" to see live data
- Read the AI-generated insights (powered by MCP server)
- Click on a run to see detailed test results

**3. Visualize Traces** (Select a run → View traces):
- See OpenTelemetry waterfall diagrams
- View GPU metrics overlay (for GPU jobs)
- Ask questions about the trace (MCP-powered debugging)
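
To show what a waterfall view is built from, here is a minimal sketch that lays out OpenTelemetry-style spans by nesting depth and start offset, and flags error spans worth asking the MCP-powered Q&A about. The flat span dictionaries (`span_id`, `parent_id`, `name`, `start_ns`, `end_ns`, `status`) are an assumed simplification, not the exact SMOLTRACE trace schema.

```python
from typing import Optional


def waterfall(spans: list[dict]) -> list[str]:
    """Render spans as indented rows with offset and duration (ms) from trace start."""
    if not spans:
        return []
    # Group children under their parent id; root spans have parent_id None.
    by_parent: dict[Optional[str], list[dict]] = {}
    for span in spans:
        by_parent.setdefault(span.get("parent_id"), []).append(span)
    trace_start = min(s["start_ns"] for s in spans)
    rows: list[str] = []

    def visit(parent_id: Optional[str], depth: int) -> None:
        # Children are drawn in start-time order, nested under their parent.
        for span in sorted(by_parent.get(parent_id, []), key=lambda s: s["start_ns"]):
            offset_ms = (span["start_ns"] - trace_start) / 1e6
            duration_ms = (span["end_ns"] - span["start_ns"]) / 1e6
            flag = "  <-- ERROR" if span.get("status") == "ERROR" else ""
            rows.append(f"{'  ' * depth}{span['name']} +{offset_ms:.0f}ms ({duration_ms:.0f}ms){flag}")
            visit(span["span_id"], depth + 1)

    visit(None, 0)
    return rows
```

Spans whose parent is missing from the list are skipped in this sketch; a production view would more likely attach them to the root so nothing disappears.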
---
## What Can You Do?
### 🔍 View & Analyze
- **Browse leaderboard** with AI-powered insights
- **Compare models** side-by-side across metrics
- **Analyze traces** with interactive visualization
- **Ask questions** via autonomous agent
### 💰 Estimate & Plan
- **Get cost estimates** before running evaluations
- **Compare hardware options** (CPU vs GPU tiers)
- **Preview duration** and CO2 emissions
- **See recommendations** from AI analysis
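
The arithmetic behind a pre-run estimate is simple once duration, hardware price, and power draw are known. This sketch assumes you supply the hourly rate, average power draw, and grid carbon intensity yourself; it does not reproduce TraceMind's pricing tables, and the numbers in the usage note are placeholders, not real quotes.

```python
from dataclasses import dataclass


@dataclass
class RunEstimate:
    hours: float
    cost_usd: float
    energy_kwh: float
    co2_grams: float


def estimate_run(duration_s: float, hourly_rate_usd: float,
                 avg_power_w: float, grid_gco2_per_kwh: float) -> RunEstimate:
    """Cost = hours x hourly rate; CO2 = energy (kWh) x grid carbon intensity."""
    hours = duration_s / 3600
    energy_kwh = avg_power_w * hours / 1000  # watts x hours -> kWh
    return RunEstimate(
        hours=hours,
        cost_usd=hours * hourly_rate_usd,
        energy_kwh=energy_kwh,
        co2_grams=energy_kwh * grid_gco2_per_kwh,
    )
```

With placeholder inputs, `estimate_run(1800, 1.00, 300, 400)` describes a 30-minute run at $1.00/hour drawing 300 W on a 400 gCO2/kWh grid: $0.50, 0.15 kWh, and about 60 g of CO2.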
### 🚀 Submit & Monitor
- **Submit evaluation jobs** to HuggingFace or Modal
- **Track job status** in real-time
- **View results** automatically when complete
- **Download datasets** for further analysis
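
Real-time job tracking usually reduces to polling a status endpoint until the job reaches a terminal state. The sketch below keeps the status lookup abstract: `get_status` is a hypothetical callable you would back with the HF Jobs or Modal client, and the stage names in `TERMINAL` are assumptions rather than either platform's exact vocabulary.

```python
import time
from typing import Callable

# Assumed terminal stage names; map your platform's statuses onto these.
TERMINAL = {"COMPLETED", "ERROR", "CANCELED"}


def wait_for_job(get_status: Callable[[], str], poll_s: float = 5.0,
                 max_poll_s: float = 60.0, timeout_s: float = 3600.0,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until a terminal status, doubling the interval up to max_poll_s."""
    waited = 0.0
    while True:
        status = get_status()
        if status in TERMINAL:
            return status
        if waited >= timeout_s:
            raise TimeoutError(f"job still {status!r} after {waited:.0f}s")
        sleep(poll_s)
        waited += poll_s
        poll_s = min(poll_s * 2, max_poll_s)
```

Injecting `sleep` keeps the loop testable without real delays, and the doubling interval avoids hammering the status API during long jobs.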
### 🧪 Generate & Customize
- **Generate synthetic datasets** for custom domains and tools
- **Create prompt templates** optimized for your use case
- **Push to HuggingFace Hub** with one click
- **Test evaluations** without writing code
---
## Documentation
**For quick evaluation**:
- Read this README for an overview
- Visit the [Live Demo](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind) to try it
- Check out the **🤖 Agent Chat** tab for autonomous MCP usage

**For deep dives**:
- [USER_GUIDE.md](USER_GUIDE.md) - Complete screen-by-screen walkthrough
  - Leaderboard tab usage
  - Agent chat interactions
  - Synthetic data generator
  - Job submission workflow
  - Trace visualization guide
- [MCP_INTEGRATION.md](MCP_INTEGRATION.md) - MCP client architecture
  - How TraceMind-AI connects to the MCP server
  - Agent framework integration (smolagents)
  - MCP tool usage examples
- [JOB_SUBMISSION.md](JOB_SUBMISSION.md) - Evaluation job guide
  - HuggingFace Jobs setup
  - Modal integration
  - Hardware selection guide
  - Cost optimization tips
- [ARCHITECTURE.md](ARCHITECTURE.md) - Technical architecture
  - Project structure
  - Data flow
  - Authentication
  - Deployment
---
## Technology Stack
- **UI Framework**: Gradio 5.49.1
- **Agent Framework**: smolagents 1.22.0+
- **MCP Integration**: MCP Python SDK + smolagents MCPClient
- **Data Source**: HuggingFace Datasets API
- **Authentication**: HuggingFace OAuth (planned)
- **AI Models**:
  - Agent: Google Gemini 2.5 Flash
  - MCP Server: Google Gemini 2.5 Flash
- **Cloud Platforms**: HuggingFace Jobs + Modal
---
## Example Workflows
### Workflow 1: Quick Analysis
1. Open TraceMind-AI
2. Go to **🤖 Agent Chat**
3. Click **"Quick: Top Models"**
4. See agent fetch leaderboard and analyze top performers
5. Ask follow-up: *"Which one is most cost-effective?"*
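
Questions like "top 3 models" or "most cost-effective" map onto simple operations over leaderboard rows. This hypothetical sketch assumes each row has `model`, `success_rate` (0 to 100), and `total_cost_usd` fields, which may not match the real leaderboard dataset columns; the sample rows are made-up placeholders, not real results.

```python
def top_models(rows: list[dict], k: int = 3) -> list[str]:
    """Highest success rate first; ties broken by lower total cost."""
    ranked = sorted(rows, key=lambda r: (-r["success_rate"], r["total_cost_usd"]))
    return [r["model"] for r in ranked[:k]]


def most_cost_effective(rows: list[dict], min_success_rate: float = 0.0) -> str:
    """Most success-rate points per dollar among rows meeting the threshold."""
    eligible = [r for r in rows if r["success_rate"] >= min_success_rate]

    def points_per_dollar(r: dict) -> float:
        cost = r["total_cost_usd"]
        return float("inf") if cost == 0 else r["success_rate"] / cost

    # max() raises ValueError if no row meets the threshold.
    return max(eligible, key=points_per_dollar)["model"]


# Placeholder rows for illustration only.
rows = [
    {"model": "model-a", "success_rate": 92.0, "total_cost_usd": 4.00},
    {"model": "model-b", "success_rate": 88.0, "total_cost_usd": 0.50},
    {"model": "model-c", "success_rate": 92.0, "total_cost_usd": 2.00},
    {"model": "model-d", "success_rate": 70.0, "total_cost_usd": 0.10},
]
```

With these rows, `top_models(rows)` returns `["model-c", "model-a", "model-b"]` and `most_cost_effective(rows)` picks `model-d`. A pure points-per-dollar ratio favors cheap but weak runs, which is why adding `min_success_rate=80` (giving `model-b`) is usually the more useful question.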
### Workflow 2: Submit Evaluation Job
1. Go to **⚙️ Settings** → Configure API keys
2. Go to **🚀 New Evaluation**
3. Select model (e.g., `meta-llama/Llama-3.1-8B`)
4. Choose infrastructure (HuggingFace Jobs or Modal)
5. Click **"💰 Estimate Cost"** to preview
6. Click **"Submit Evaluation"**
7. Monitor the job in the **📈 Job Monitoring** tab
8. View results in leaderboard when complete
### Workflow 3: Debug Agent Behavior
1. Browse **🏆 Leaderboard**
2. Click on a run with failures
3. View **detailed test results**
4. Click on a failed test to see trace
5. Use MCP-powered Q&A: *"Why did this test fail?"*
6. Get AI analysis of the execution trace
### Workflow 4: Generate Custom Test Dataset
1. Go to **🔬 Synthetic Data Generator**
2. Configure:
   - Domain: `finance`
   - Tools: `get_stock_price,calculate_profit,send_alert`
   - Number of tasks: `20`
   - Difficulty: `balanced`
3. Click **"Generate Dataset"**
4. Review the generated tasks and prompt template
5. Enter a repository name: `yourname/smoltrace-finance-tasks`
6. Click **"Push to HuggingFace Hub"**
7. Use your custom dataset in evaluations
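
The generator's inputs in step 2 can be turned into a task plan before any LLM is involved. This is a hypothetical sketch: it assumes "balanced" means an even split across easy, medium, and hard tasks, and that uploading goes through the `datasets` library's `Dataset.from_list(...).push_to_hub(...)`; the real generator's task schema and difficulty semantics may differ.

```python
def parse_tools(spec: str) -> list[str]:
    """Split a comma-separated tool list, dropping blanks and surrounding spaces."""
    return [t.strip() for t in spec.split(",") if t.strip()]


def difficulty_plan(n_tasks: int, levels=("easy", "medium", "hard")) -> dict[str, int]:
    """Spread n_tasks evenly over levels; earlier levels absorb the remainder."""
    base, extra = divmod(n_tasks, len(levels))
    return {lvl: base + (1 if i < extra else 0) for i, lvl in enumerate(levels)}


def push_tasks(tasks: list[dict], repo_id: str) -> None:
    """Upload generated task records to the Hub (needs `datasets` and a write token)."""
    from datasets import Dataset  # imported lazily; only needed for the upload

    Dataset.from_list(tasks).push_to_hub(repo_id)
```

For the inputs above, `parse_tools("get_stock_price,calculate_profit,send_alert")` yields three tool names and `difficulty_plan(20)` returns `{"easy": 7, "medium": 7, "hard": 6}`; once tasks are generated, `push_tasks(tasks, "yourname/smoltrace-finance-tasks")` would upload them.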
---
## Screenshots
*See [SCREENSHOTS.md](SCREENSHOTS.md) for annotated screenshots of all screens*

---
## 🔗 Quick Links

### 📦 Component Links

| Component | Description | Links |
|-----------|-------------|-------|
| **TraceVerde** | OTEL Instrumentation | [GitHub](https://github.com/Mandark-droid/genai_otel_instrument) • [PyPI](https://pypi.org/project/genai-otel-instrument) |
| **SMOLTRACE** | Evaluation Engine | [GitHub](https://github.com/Mandark-droid/SMOLTRACE) • [PyPI](https://pypi.org/project/smoltrace/) |
| **MCP Server** | Building MCP (Track 1) | [HF Space](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server) • [GitHub](https://github.com/Mandark-droid/TraceMind-mcp-server) |
| **TraceMind-AI** | MCP in Action (Track 2) | [HF Space](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind) • [GitHub](https://github.com/Mandark-droid/TraceMind-AI) |

### 📢 Community Posts

- 🔗 [**TraceMind-AI Hackathon Submission**](https://www.linkedin.com/posts/kshitij-thakkar-2061b924_mcp1stbirthdayhackathon-mcp-modelcontextprotocol-activity-7399775530218065920-owgR) - MCP's 1st Birthday Hackathon final submission
- 🔗 [**Building TraceMind Ecosystem Blog Post**](https://huggingface.co/blog/kshitijthakkar/tracemind-ecosystem) - Complete technical deep-dive into the TraceVerse ecosystem
- 🔗 [**TraceMind Teaser**](https://www.linkedin.com/posts/kshitij-thakkar-2061b924_mcpsfirstbirthdayhackathon-mcpsfirstbirthdayhackathon-activity-7395686529270013952-g_id) - MCP's 1st Birthday Hackathon announcement
- 🔗 [**SMOLTRACE Launch**](https://www.linkedin.com/posts/kshitij-thakkar-2061b924_ai-machinelearning-llm-activity-7394350375908126720-im_T) - Lightweight agent evaluation engine
- 🔗 [**TraceVerde Launch**](https://www.linkedin.com/posts/kshitij-thakkar-2061b924_genai-opentelemetry-observability-activity-7390339855135813632-wqEg) - Zero-code OTEL instrumentation for LLMs
- 🔗 [**TraceVerde 3K Downloads**](https://www.linkedin.com/posts/kshitij-thakkar-2061b924_thank-you-open-source-community-a-week-activity-7392205780592132096-nu6U) - Thank you to the community!
---
## Credits
**Built for**: MCP's 1st Birthday Hackathon (Nov 14-30, 2025)

**Track**: MCP in Action (Enterprise)

**Author**: Kshitij Thakkar

**Powered by**: TraceMind MCP Server + Gradio + smolagents

**Built with**: Gradio 5.49.1 (MCP client integration)

**Special Thanks**:
- **[Eliseu Silva](https://huggingface.co/elismasilva)** - For the [gradio_htmlplus](https://huggingface.co/spaces/elismasilva/gradio_htmlplus) custom component that powers our interactive leaderboard table. Eliseu's timely help and collaboration during the hackathon were invaluable!

**Sponsors**: HuggingFace • Google Gemini • Modal • Anthropic • Gradio • OpenAI • Nebius • Hyperbolic • ElevenLabs • SambaNova • Blaxel

---
## License
AGPL-3.0 - See [LICENSE](LICENSE) for details

---
## Support
- 📧 GitHub Issues: [TraceMind-AI/issues](https://github.com/Mandark-droid/TraceMind-AI/issues)
- 💬 HF Discord: `#mcp-1st-birthday-official`
- 🏷️ Tag: `mcp-in-action-track-enterprise`
- 🐦 Twitter: [@TraceMindAI](https://twitter.com/TraceMindAI) (placeholder)
---
**Ready to evaluate your agents with AI-powered intelligence?**

👉 **Try the live demo**: https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind