---
title: TraceMind MCP Server
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: true
license: agpl-3.0
short_description: MCP server for agent evaluation with Gemini 2.5 Flash
tags:
  - building-mcp-track-enterprise
  - mcp
  - gradio
  - gemini
  - agent-evaluation
  - leaderboard
---

# TraceMind MCP Server

<p align="center">
  <img src="https://raw.githubusercontent.com/Mandark-droid/TraceMind-mcp-server/assets/Logo.png" alt="TraceMind MCP Server Logo" width="200"/>
</p>

**AI-Powered Analysis Tools for Agent Evaluation**

[![MCP's 1st Birthday Hackathon](https://img.shields.io/badge/MCP%27s%201st%20Birthday-Hackathon-blue)](https://github.com/modelcontextprotocol)
[![Track 1: Building MCP](https://img.shields.io/badge/Track-Building%20MCP%20(Enterprise)-blue)](https://github.com/modelcontextprotocol/hackathon)
[![Powered by Google Gemini](https://img.shields.io/badge/Powered%20by-Google%20Gemini%202.5%20Flash-orange)](https://ai.google.dev/)

> **🎯 Track 1 Submission**: Building MCP (Enterprise)
>
> **📅 MCP's 1st Birthday Hackathon**: November 14-30, 2025

---

## Why This MCP Server?

**Problem**: Agent evaluation generates mountains of data (leaderboards, traces, metrics), but developers struggle to extract actionable insights from it.

**Solution**: This MCP server provides **11 AI-powered tools** that transform raw evaluation data into clear answers:
- *"Which model is best for my use case?"*
- *"Why did this agent execution fail?"*
- *"How much will this evaluation cost?"*

**Powered by Google Gemini 2.5 Flash** for intelligent, context-aware analysis of agent performance data.

---

## 🔗 Quick Links

- **🌐 Live Demo**: [TraceMind-mcp-server Space](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server)
- **⚑ Auto-Config**: Add `MCP-1st-Birthday/TraceMind-mcp-server` at https://huggingface.co/settings/mcp
- **📖 Full Docs**: See [DOCUMENTATION.md](DOCUMENTATION.md) for the complete technical reference
- **🎬 Quick Demo (5 min)**: [Watch on Loom](https://www.loom.com/share/d4d0003f06fa4327b46ba5c081bdf835)
- **📺 Full Demo (20 min)**: [Watch on Loom](https://www.loom.com/share/de559bb0aef749559c79117b7f951250)

### Social Media

Read the announcement and join the discussion:
- **📝 Blog Post**: [Building TraceMind Ecosystem](https://huggingface.co/blog/kshitijthakkar/tracemind-ecosystem) - Complete technical deep-dive into the TraceVerse ecosystem
- **Twitter/X**: [View on X](https://x.com/Mandark12921244/status/1993279134156607594?s=20)
- **LinkedIn**: [View on LinkedIn](https://www.linkedin.com/posts/kshitij-thakkar-2061b924_mcp-modelcontextprotocol-aiagents-activity-7399052013524647936-wgkA)
- **Hugging Face Discord**: [Read on Discord](https://discord.com/channels/879548962464493619/1439001549492719726/1442838638307180656)


**MCP Endpoints**:
- SSE (Recommended): `https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/sse`
- Streamable HTTP: `https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/`
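For a sense of what travels over the Streamable HTTP endpoint, here is a minimal standard-library sketch that builds (but does not send) the JSON-RPC 2.0 `initialize` request an MCP client opens a session with. The protocol version string and client name are assumptions for illustration; in practice an MCP client or the official MCP SDKs perform this handshake for you.

```python
import json
import urllib.request

# Streamable HTTP endpoint listed above.
MCP_URL = "https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/"


def build_initialize_request(url: str = MCP_URL) -> urllib.request.Request:
    """Build, but do not send, an MCP `initialize` call as a JSON-RPC 2.0 POST."""
    payload = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            # Assumption: a protocol revision string; the server may negotiate another.
            "protocolVersion": "2025-03-26",
            "capabilities": {},
            # Hypothetical client identity.
            "clientInfo": {"name": "example-client", "version": "0.1.0"},
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP responses may be plain JSON or an SSE stream.
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


request = build_initialize_request()
print(request.get_method(), request.full_url)
# Sending it would be: urllib.request.urlopen(request)
```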

---

## The TraceMind Ecosystem

This MCP server is part of a **complete agent evaluation platform** built from four interconnected projects:

<p align="center">
  <img src="https://raw.githubusercontent.com/Mandark-droid/TraceMind-AI/assets/TraceVerse_Logo.png" alt="TraceVerse Ecosystem" width="400"/>
</p>

```
🔭 TraceVerde                    📊 SMOLTRACE
(genai_otel_instrument)         (Evaluation Engine)
        ↓                               ↓
    Instruments                    Evaluates
    LLM calls                      agents
        ↓                               ↓
        └───────────┬───────────────────┘
                    ↓
            Generates Datasets
        (leaderboard, traces, metrics)
                    ↓
        ┌───────────┴───────────────────┐
        ↓                               ↓
🛠️ TraceMind MCP Server         🧠 TraceMind-AI
(This Project - Track 1)        (UI Platform - Track 2)
Analyzes with AI                Visualizes & Interacts
```

### The Foundation

**🔭 TraceVerde** - Zero-code OpenTelemetry instrumentation for LLM frameworks
→ [GitHub](https://github.com/Mandark-droid/genai_otel_instrument) | [PyPI](https://pypi.org/project/genai-otel-instrument)

**📊 SMOLTRACE** - Lightweight evaluation engine that generates structured datasets
→ [GitHub](https://github.com/Mandark-droid/SMOLTRACE) | [PyPI](https://pypi.org/project/smoltrace/)

### The Platform

**🛠️ TraceMind MCP Server** (This Project) - Provides MCP tools for AI-powered analysis
→ **Track 1**: Building MCP (Enterprise)
→ [Live Demo](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server) | [GitHub](https://github.com/Mandark-droid/TraceMind-mcp-server)

**🧠 TraceMind-AI** - Gradio UI that consumes MCP tools for interactive evaluation
→ **Track 2**: MCP in Action (Enterprise)
→ [Live Demo](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind) | [GitHub](https://github.com/Mandark-droid/TraceMind-AI)

---

## Why This Matters for Hugging Face

This ecosystem is built **around** Hugging Face, not just "using it":

- Every SMOLTRACE evaluation creates **4 structured `datasets` on the Hub** (leaderboard, results, traces, metrics)
- TraceMind MCP Server and TraceMind-AI run as **Hugging Face Spaces**, using **Gradio's MCP integration**
- The stack is designed for **`smolagents`** – agents are evaluated, traced, and analyzed using HF's own agent framework
- Evaluations can be executed via **HF Jobs**, turning evaluations into real compute usage, not just local scripts

So TraceMind isn't just another MCP server demo.
**It's an opinionated blueprint for:**

> **"How Hugging Face models + Datasets + Spaces + Jobs + smolagents + MCP can work together as a complete agent evaluation and observability platform."**

---

## What's Included

### 11 AI-Powered Tools

**Core Analysis** (AI-powered by Gemini 2.5 Flash):

1. **📊 analyze_leaderboard** - Generate insights from evaluation data
2. **🐛 debug_trace** - Debug agent execution traces with AI assistance
3. **💰 estimate_cost** - Predict costs before running evaluations
4. **⚖️ compare_runs** - Compare two evaluation runs with AI analysis
5. **📋 analyze_results** - Analyze detailed test results with optimization recommendations

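Cost prediction of the kind `estimate_cost` performs ultimately reduces to token counts multiplied by per-token prices. The sketch below shows only that arithmetic; the per-million-token prices, token counts, and the `ModelPrice`/`estimate_run_cost` names are hypothetical placeholders, not real pricing and not the tool's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class ModelPrice:
    # Placeholder USD prices per 1M tokens (assumptions, not real pricing).
    input_per_million: float
    output_per_million: float


def estimate_run_cost(price: ModelPrice, num_tests: int,
                      avg_input_tokens: int, avg_output_tokens: int) -> float:
    """Estimate the total USD cost of an evaluation run with `num_tests` tasks."""
    input_cost = num_tests * avg_input_tokens * price.input_per_million / 1_000_000
    output_cost = num_tests * avg_output_tokens * price.output_per_million / 1_000_000
    return input_cost + output_cost


# Hypothetical model priced at $1.00 input / $4.00 output per 1M tokens.
price = ModelPrice(input_per_million=1.00, output_per_million=4.00)
total = estimate_run_cost(price, num_tests=100, avg_input_tokens=2_000, avg_output_tokens=500)
print(f"${total:.2f}")  # → $0.40
```

Here 200,000 input tokens cost $0.20 and 50,000 output tokens cost another $0.20; a real estimate would also account for multi-step agent loops, which multiply token usage per task.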
**Token-Optimized Tools**:

6. **🏆 get_top_performers** - Get top N models (90% token reduction vs. full dataset)
7. **📈 get_leaderboard_summary** - High-level statistics (99% token reduction)

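The token savings of `get_top_performers` come from returning a few ranked rows with a handful of fields instead of the whole leaderboard. A minimal sketch of that idea, assuming hypothetical column names (`model`, `success_rate`, `total_cost_usd`, `notes`) and toy rows rather than real SMOLTRACE data; serialized JSON length stands in as a rough proxy for token count.

```python
import json


def top_performers(rows: list[dict], metric: str, n: int,
                   keep: tuple[str, ...]) -> list[dict]:
    """Return the top `n` rows ranked by `metric`, keeping only the `keep` fields."""
    ranked = sorted(rows, key=lambda r: r[metric], reverse=True)
    return [{k: r[k] for k in keep} for r in ranked[:n]]


# Toy leaderboard; a real one has more rows and many more columns.
rows = [
    {"model": f"model-{i}", "success_rate": i / 20, "total_cost_usd": 0.01 * i,
     "notes": "verbose per-run metadata " * 10}
    for i in range(20)
]

top = top_performers(rows, metric="success_rate", n=3, keep=("model", "success_rate"))
print([r["model"] for r in top])  # → ['model-19', 'model-18', 'model-17']
print(len(json.dumps(top)), "vs", len(json.dumps(rows)), "characters of JSON")
```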
**Data Management**:

8. **📦 get_dataset** - Load SMOLTRACE datasets as JSON
9. **🧪 generate_synthetic_dataset** - Create domain-specific test datasets with AI (up to 100 tasks)
10. **📤 push_dataset_to_hub** - Upload datasets to HuggingFace
11. **📝 generate_prompt_template** - Generate customized smolagents prompt templates

### 3 Data Resources

Direct JSON access without AI analysis:
- **leaderboard://{repo}** - Raw evaluation results
- **trace://{trace_id}/{repo}** - OpenTelemetry spans
- **cost://model/{model}** - Pricing information
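The URI templates above can be filled in mechanically and requested through the standard MCP `resources/read` method. The helpers below are a hypothetical sketch (not part of this server's code) that builds the URIs and the JSON-RPC request body; the repo, trace ID, and model values are placeholders.

```python
import json


def leaderboard_uri(repo: str) -> str:
    return f"leaderboard://{repo}"


def trace_uri(trace_id: str, repo: str) -> str:
    return f"trace://{trace_id}/{repo}"


def cost_uri(model: str) -> str:
    return f"cost://model/{model}"


def resources_read(uri: str, request_id: int = 1) -> str:
    """Serialize an MCP `resources/read` JSON-RPC 2.0 request for `uri`."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "resources/read",
        "params": {"uri": uri},
    })


print(leaderboard_uri("kshitijthakkar/smoltrace-leaderboard"))
# → leaderboard://kshitijthakkar/smoltrace-leaderboard
print(resources_read(trace_uri("example-trace-id", "example-user/example-traces")))
```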

### 3 Prompt Templates

Standardized templates for consistent analysis:
- **analysis_prompt** - Different analysis types (leaderboard, cost, performance)
- **debug_prompt** - Debugging scenarios
- **optimization_prompt** - Optimization goals
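Prompt templates are fetched with the MCP `prompts/get` method, which takes the template name plus a map of string arguments. In the sketch below, the argument name `analysis_type` and its value are assumptions for illustration only; [DOCUMENTATION.md](DOCUMENTATION.md) lists the real parameters.

```python
import json


def prompts_get(name: str, arguments: dict[str, str], request_id: int = 1) -> dict:
    """Build an MCP `prompts/get` JSON-RPC 2.0 request; argument values must be strings."""
    if not all(isinstance(v, str) for v in arguments.values()):
        raise TypeError("MCP prompt arguments must be strings")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "prompts/get",
        "params": {"name": name, "arguments": arguments},
    }


# Hypothetical argument name and value for the analysis_prompt template.
request = prompts_get("analysis_prompt", {"analysis_type": "cost"})
print(json.dumps(request, indent=2))
```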

**Total: 17 MCP Components** (11 + 3 + 3)

---

## Quick Start

### 1. Connect to the Live Server

**Easiest Method** (Recommended):
1. Visit https://huggingface.co/settings/mcp (while logged in)
2. Add Space: `MCP-1st-Birthday/TraceMind-mcp-server`
3. Select your MCP client (Claude Desktop, VSCode, Cursor, etc.)
4. Copy the auto-generated config and paste into your client

**Manual Configuration** (Advanced):

For Claude Desktop (`claude_desktop_config.json`), which reaches remote servers through the `mcp-remote` bridge (requires Node.js):
```json
{
  "mcpServers": {
    "tracemind": {
      "command": "npx",
      "args": [
        "mcp-remote",
        "https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/sse"
      ]
    }
  }
}
```

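For VS Code (`.vscode/mcp.json`):
```json
{
  "servers": {
    "tracemind": {
      "type": "http",
      "url": "https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/"
    }
  }
}
```

For Cursor, add the same URL as an `mcpServers` entry in `~/.cursor/mcp.json`: `"tracemind": { "url": "https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/" }`.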

### 2. Try It Out

Open your MCP client and try:
```
"Analyze the leaderboard at kshitijthakkar/smoltrace-leaderboard and show me the top 5 models"
```

You should see AI-powered insights generated by Gemini 2.5 Flash!
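Behind a natural-language request like this, the client invokes the server's tools through MCP's `tools/call` JSON-RPC method. A sketch of such a payload for `get_top_performers`, assuming the tool is exposed under that name; the argument names `leaderboard_repo` and `top_n` are hypothetical, so check the tool's input schema (via `tools/list` or [DOCUMENTATION.md](DOCUMENTATION.md)) before relying on them.

```python
import json


def tools_call(name: str, arguments: dict, request_id: int = 2) -> bytes:
    """Encode an MCP `tools/call` JSON-RPC 2.0 request body."""
    body = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(body).encode("utf-8")


# Hypothetical argument names for illustration only.
payload = tools_call(
    "get_top_performers",
    {"leaderboard_repo": "kshitijthakkar/smoltrace-leaderboard", "top_n": 5},
)
print(payload.decode("utf-8"))
```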

### 3. Using Your Own API Keys (Recommended)

To avoid rate limits during evaluation:
1. Visit the [MCP Server Space](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server)
2. Go to the **⚙️ Settings** tab
3. Enter your **Gemini API Key** and **HuggingFace Token**
4. Click **"Save & Override Keys"**

**Get Free API Keys**:
- **Gemini**: https://ai.google.dev/ (free tier available; daily request limits vary by model)
- **HuggingFace**: https://huggingface.co/settings/tokens (free; a read token is enough for public datasets, pushing datasets needs write access)

---

## For Hackathon Judges

### ✅ Track 1 Compliance

- **Complete MCP Implementation**: 11 Tools + 3 Resources + 3 Prompts (17 total)
- **MCP Standard Compliant**: Built with Gradio's native `@gr.mcp.*` decorators
- **Production-Ready**: Deployed to HuggingFace Spaces with SSE transport
- **Enterprise Focus**: Cost optimization, debugging, decision support
- **Google Gemini Powered**: All AI analysis uses Gemini 2.5 Flash
- **Interactive Testing**: Beautiful Gradio UI for testing all components

### 🎯 Key Innovations

1. **Token Optimization**: `get_top_performers` and `get_leaderboard_summary` reduce token usage by 90-99%
2. **AI-Powered Synthetic Data**: Generate domain-specific test datasets + matching prompt templates
3. **Complete Ecosystem**: Part of a 4-project platform: TraceVerde → SMOLTRACE → MCP Server → TraceMind-AI
4. **Real Data Integration**: Works with live HuggingFace datasets from SMOLTRACE evaluations
5. **Test Results Analysis**: Deep-dive into individual test cases with `analyze_results` tool

### 📹 Demo Materials

- **🎬 Quick Demo (5 min)**: [Watch on Loom](https://www.loom.com/share/d4d0003f06fa4327b46ba5c081bdf835)
- **📺 Full Demo (20 min)**: [Watch on Loom](https://www.loom.com/share/de559bb0aef749559c79117b7f951250)
- **Twitter/X**: [View on X](https://x.com/Mandark12921244/status/1993279134156607594?s=20)
- **LinkedIn**: [View on LinkedIn](https://www.linkedin.com/posts/kshitij-thakkar-2061b924_mcp-modelcontextprotocol-aiagents-activity-7399052013524647936-wgkA)
- **Hugging Face Discord**: [Read on Discord](https://discord.com/channels/879548962464493619/1439001549492719726/1442838638307180656)

---

## Documentation

**For quick evaluation**:
- Read this README for overview
- Visit the [Live Demo](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server) to test tools
- Use the Auto-Config link to connect your MCP client

**For deep dives**:
- [DOCUMENTATION.md](DOCUMENTATION.md) - Complete API reference
  - Tool descriptions and parameters
  - Resource URIs and schemas
  - Prompt template details
  - Example use cases
- [ARCHITECTURE.md](ARCHITECTURE.md) - Technical architecture
  - Project structure
  - MCP protocol implementation
  - Gemini integration details
  - Deployment guide

---

## Technology Stack

- **AI Model**: Google Gemini 2.5 Flash (via Google AI SDK)
- **MCP Framework**: Gradio 6 with native MCP support (`@gr.mcp.*` decorators)
- **Data Source**: HuggingFace Datasets API
- **Transport**: SSE (recommended) + Streamable HTTP
- **Deployment**: HuggingFace Spaces (Docker SDK)

---

## Run Locally (Optional)

```bash
# Clone and setup
git clone https://github.com/Mandark-droid/TraceMind-mcp-server.git
cd TraceMind-mcp-server
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt

# Configure API keys
cp .env.example .env
# Edit .env with your GEMINI_API_KEY and HF_TOKEN

# Run the server
python app.py
```

Visit http://localhost:7860 to test the tools via Gradio UI.
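The server needs `GEMINI_API_KEY` and `HF_TOKEN` at startup. How `app.py` actually loads them is not shown here (projects often use `python-dotenv`); the standard-library sketch below only illustrates the idea of reading simple `KEY=value` lines from `.env` and failing fast when a required key is missing. The `load_env_file` name is hypothetical.

```python
from __future__ import annotations

import os
from pathlib import Path

REQUIRED_KEYS = ("GEMINI_API_KEY", "HF_TOKEN")


def load_env_file(path: str | Path = ".env") -> dict[str, str]:
    """Parse simple KEY=value lines into os.environ (no export or multiline support)."""
    loaded: dict[str, str] = {}
    env_path = Path(path)
    if env_path.exists():
        for raw in env_path.read_text(encoding="utf-8").splitlines():
            line = raw.strip()
            # Skip blanks, comments, and lines without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            key = key.strip()
            value = value.strip().strip('"').strip("'")
            # Variables already set in the environment take precedence over the file.
            os.environ.setdefault(key, value)
            loaded[key] = value
    missing = [k for k in REQUIRED_KEYS if not os.environ.get(k)]
    if missing:
        raise RuntimeError(f"Missing required keys: {', '.join(missing)}")
    return loaded
```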

---

## Related Projects

**🧠 TraceMind-AI** (Track 2 - MCP in Action):
- Live Demo: https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind
- Consumes this MCP server for AI-powered agent evaluation UI
- Features autonomous agent chat, trace visualization, job submission

**📊 Foundation Libraries**:
- TraceVerde: https://github.com/Mandark-droid/genai_otel_instrument
- SMOLTRACE: https://github.com/Mandark-droid/SMOLTRACE

---

## Credits

- **Built for**: MCP's 1st Birthday Hackathon (Nov 14-30, 2025)
- **Track**: Building MCP (Enterprise)
- **Author**: Kshitij Thakkar
- **Powered by**: Google Gemini 2.5 Flash
- **Built with**: Gradio (native MCP support)

**Sponsors**: HuggingFace • Google Gemini • Modal • Anthropic • Gradio • OpenAI • Nebius • Hyperbolic • ElevenLabs • SambaNova • Blaxel

---

## License

AGPL-3.0 - See [LICENSE](LICENSE) for details

---

## Support

- 📧 GitHub Issues: [TraceMind-mcp-server/issues](https://github.com/Mandark-droid/TraceMind-mcp-server/issues)
- 💬 HF Discord: `#mcp-1st-birthday-official🏆`
- 🏷️ Tag: `building-mcp-track-enterprise`