
	---
	title: TraceMind AI
	emoji: 🧠
	colorFrom: indigo
	colorTo: purple
	sdk: gradio
	sdk_version: 5.49.1
	app_file: app.py
	short_description: AI agent evaluation with MCP-powered intelligence
	license: agpl-3.0
	pinned: true
	tags:
	- mcp-in-action-track-enterprise
	- agent-evaluation
	- mcp-client
	- leaderboard
	- gradio
	---

	# 🧠 TraceMind-AI

	<p align="center">
	<img src="https://raw.githubusercontent.com/Mandark-droid/TraceMind-AI/assets/Logo.png" alt="TraceMind-AI Logo" width="200"/>
	</p>

	Agent Evaluation Platform with MCP-Powered Intelligence

	[![MCP's 1st Birthday Hackathon](https://img.shields.io/badge/MCP%27s%201st%20Birthday-Hackathon-blue)](https://github.com/modelcontextprotocol)
	[![Track 2: MCP in Action](https://img.shields.io/badge/Track-MCP%20in%20Action%20(Enterprise)-purple)](https://github.com/modelcontextprotocol/hackathon)
	[![Powered by Gradio](https://img.shields.io/badge/Powered%20by-Gradio-orange)](https://gradio.app/)

> 🎯 Track 2 Submission: MCP in Action (Enterprise)
>
> 📅 MCP's 1st Birthday Hackathon: November 14-30, 2025

	---

	## Why TraceMind-AI?

	The Challenge: Evaluating AI agents generates complex data across models, providers, and configurations. Making sense of it all is overwhelming.

	The Solution: TraceMind-AI is your intelligent agent evaluation command center:
	- 📊 Live leaderboard with real-time performance data
	- 🤖 Autonomous agent chat powered by MCP tools
	- 💰 Smart cost estimation before you run evaluations
	- 🔍 Deep trace analysis to debug agent behavior
	- ☁️ Multi-cloud job submission (HuggingFace Jobs + Modal)

	All powered by the Model Context Protocol for AI-driven insights at every step.

	---

	## 🚀 Try It Now

	- 🌐 Live Demo: [TraceMind-AI Space](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind)
	- 🛠️ MCP Server: [TraceMind-mcp-server](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server) (Track 1)
	- 📖 Full Docs: See [USER_GUIDE.md](USER_GUIDE.md) for complete walkthrough
	- 🎥 TraceMind-AI Full Demo (20 min): [Watch on Loom](https://www.loom.com/share/70b9689b57204da58b8fef0d23c304fe)
	- 🎬 MCP Server Quick Demo (5 min): [Watch on Loom](https://www.loom.com/share/d4d0003f06fa4327b46ba5c081bdf835)
	- 📺 MCP Server Full Demo (20 min): [Watch on Loom](https://www.loom.com/share/de559bb0aef749559c79117b7f951250)

	---

	## The TraceMind Ecosystem

	TraceMind-AI is the user-facing platform in a complete 4-project agent evaluation ecosystem:

	<p align="center">
	<img src="https://raw.githubusercontent.com/Mandark-droid/TraceMind-AI/assets/TraceVerse_Logo.png" alt="TraceVerse Ecosystem" width="400"/>
	<br/><br/>
	</p>

```
🔭 TraceVerde                    📊 SMOLTRACE
(genai_otel_instrument)          (Evaluation Engine)
        ↓                                ↓
   Instruments                       Evaluates
    LLM calls                          agents
        ↓                                ↓
        └───────────────┬────────────────┘
                        ↓
               Generates Datasets
         (leaderboard, traces, metrics)
                        ↓
        ┌───────────────┴────────────────┐
        ↓                                ↓
🛠️ TraceMind MCP Server          🧠 TraceMind-AI
(Track 1 - Building MCP)         (This Project - Track 2)
  Provides AI Tools               Consumes MCP Tools
        └───────── MCP Protocol ─────────┘
```

	### The Foundation

	🔭 TraceVerde - Automatic OpenTelemetry instrumentation for LLM frameworks
	→ Captures every LLM call, tool usage, and agent step
	→ [GitHub](https://github.com/Mandark-droid/genai_otel_instrument) \| [PyPI](https://pypi.org/project/genai-otel-instrument)

	📊 SMOLTRACE - Lightweight evaluation engine with built-in tracing
	→ Generates structured datasets (leaderboard, results, traces, metrics)
	→ [GitHub](https://github.com/Mandark-droid/SMOLTRACE) \| [PyPI](https://pypi.org/project/smoltrace/)

	### The Platform

	🛠️ TraceMind MCP Server - AI-powered analysis tools via MCP
	→ [Live Demo](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server) \| [GitHub](https://github.com/Mandark-droid/TraceMind-mcp-server)
	→ Track 1: Building MCP (Enterprise)

	🧠 TraceMind-AI (This Project) - Interactive UI that consumes MCP tools
	→ Track 2: MCP in Action (Enterprise)

	---

	## Why This Matters for Hugging Face

This ecosystem is built around Hugging Face rather than merely using it:

	- Every SMOLTRACE evaluation creates 4 structured `datasets` on the Hub (leaderboard, results, traces, metrics)
	- TraceMind MCP Server and TraceMind-AI run as Hugging Face Spaces, using Gradio's MCP integration
	- The stack is designed for `smolagents` – agents are evaluated, traced, and analyzed using HF's own agent framework
- Evaluations can run on HF Jobs, so they become real compute workloads rather than local scripts

	So TraceMind isn't just another agent demo.
	It's an opinionated blueprint for:

	> "How Hugging Face models + Datasets + Spaces + Jobs + smolagents + MCP can work together as a complete agent evaluation and observability platform."

	---

	## Key Features

	### 🎯 MCP Integration (Track 2)

	TraceMind-AI demonstrates enterprise MCP client usage in two ways:

1. Direct MCP Client Integration
   - Connects to TraceMind MCP Server via SSE transport
   - Uses 5 AI-powered tools: `analyze_leaderboard`, `estimate_cost`, `debug_trace`, `compare_runs`, `analyze_results`
   - Real-time insights powered by Google Gemini 2.5 Flash

2. Autonomous Agent with MCP Tools
   - Built with `smolagents` framework
   - Agent has access to all MCP server tools
   - Natural language queries → autonomous tool execution
   - Example: "What are the top 3 models and how much do they cost?"
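
A minimal sketch of the client side, assuming the MCP server Space exposes Gradio's standard MCP SSE endpoint (`/gradio_api/mcp/sse`) and that its public subdomain follows the usual `<owner>-<space>.hf.space` pattern; neither URL is taken from this project's code. `space_sse_url` and `list_mcp_tool_names` are hypothetical helpers, and the connection uses smolagents' `MCPClient`, which needs the `mcp` extra installed.

```python
def space_sse_url(space_id: str) -> str:
    """Build the assumed Gradio MCP SSE endpoint for a public HF Space."""
    if space_id.count("/") != 1:
        raise ValueError(f"expected 'owner/space', got {space_id!r}")
    # Assumed subdomain rule: lowercase, with "/", "_" and "." mapped to "-".
    subdomain = space_id.lower()
    for ch in "/_.":
        subdomain = subdomain.replace(ch, "-")
    return f"https://{subdomain}.hf.space/gradio_api/mcp/sse"


def list_mcp_tool_names(space_id: str) -> list[str]:
    """Connect over SSE and list the tool names the server advertises."""
    # Lazy import: needs `pip install "smolagents[mcp]"` and network access.
    from smolagents import MCPClient

    params = {"url": space_sse_url(space_id), "transport": "sse"}
    with MCPClient(params) as tools:
        return [tool.name for tool in tools]


if __name__ == "__main__":
    print(space_sse_url("MCP-1st-Birthday/TraceMind-mcp-server"))
```

Inside the `with` block, the same `tools` list can be passed to a `smolagents` agent, for example `CodeAgent(tools=tools, model=...)`, which is the general shape of an agent that calls MCP tools autonomously; the model (Gemini 2.5 Flash, per this README) is configured separately.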

	### 📊 Agent Evaluation Features

	- Live Leaderboard: View all evaluation runs with sortable metrics
	- Cost Estimation: Auto-select hardware and predict costs before running
	- Trace Visualization: Deep-dive into OpenTelemetry traces with GPU metrics
	- Multi-Cloud Jobs: Submit evaluations to HuggingFace Jobs or Modal
	- Performance Analytics: GPU utilization, CO2 emissions, token tracking
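
Sortable leaderboard metrics and the agent's "top 3 models and their cost" query both come down to ranking leaderboard rows. This sketch assumes each row is a dict with hypothetical `model`, `success_rate` and `total_cost_usd` fields; the real SMOLTRACE leaderboard dataset may use different column names.

```python
from typing import Iterable


def top_models(rows: Iterable[dict], n: int = 3,
               metric: str = "success_rate") -> list[dict]:
    """Return the n best rows by `metric`, breaking ties by lower cost."""
    # Higher metric first; among equal metrics, the cheaper run ranks higher.
    ranked = sorted(rows, key=lambda r: (-r[metric], r["total_cost_usd"]))
    return ranked[:n]


if __name__ == "__main__":
    runs = [  # made-up example rows, not real evaluation results
        {"model": "model-a", "success_rate": 0.82, "total_cost_usd": 1.40},
        {"model": "model-b", "success_rate": 0.91, "total_cost_usd": 2.10},
        {"model": "model-c", "success_rate": 0.91, "total_cost_usd": 0.95},
        {"model": "model-d", "success_rate": 0.67, "total_cost_usd": 0.30},
    ]
    for row in top_models(runs):
        print(row["model"], row["success_rate"], row["total_cost_usd"])
```

With the made-up rows above, `model-c` ranks ahead of `model-b` because both score 0.91 and `model-c` costs less.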

	### 💡 Smart Features

	- Auto Hardware Selection: Based on model size and provider
	- Real-time Job Monitoring: Track HuggingFace Jobs status
	- Agent Reasoning Visibility: See step-by-step tool execution
	- Quick Action Buttons: One-click common queries
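
The README only says hardware is chosen based on model size and provider. As a hypothetical illustration of such a rule (not TraceMind's actual logic), the sketch below sends API-hosted models to a CPU flavor and sizes a GPU tier for locally loaded weights. The provider set, the 2-bytes-per-parameter estimate and the 1.2× overhead are assumptions; the flavor names follow Hugging Face hardware naming but should be checked against the current Jobs list.

```python
# Hypothetical GPU tiers: (HF Jobs flavor name, GPU memory in GB), smallest first.
GPU_TIERS = [
    ("t4-small", 16),
    ("a10g-small", 24),
    ("a10g-largex2", 48),  # total across two GPUs; needs multi-GPU loading
    ("a100-large", 80),
]

# Hypothetical provider labels whose models run behind a remote API.
HOSTED_PROVIDERS = {"openai", "anthropic", "gemini"}


def select_hardware(params_billion: float, provider: str,
                    overhead: float = 1.2) -> str:
    """Pick the smallest tier whose GPU memory fits the model's weights."""
    if provider.lower() in HOSTED_PROVIDERS:
        # Inference happens remotely; the job only runs the evaluation harness.
        return "cpu-basic"
    # fp16/bf16 weights take about 2 bytes per parameter; `overhead` is a
    # rough allowance for activations and KV cache.
    needed_gb = params_billion * 2 * overhead
    for flavor, memory_gb in GPU_TIERS:
        if needed_gb <= memory_gb:
            return flavor
    raise ValueError(f"~{needed_gb:.0f} GB exceeds every tier; consider quantization")
```

Under these assumptions an 8B-parameter model loaded locally needs about 19 GB, more than a T4's 16 GB, so `select_hardware(8, "transformers")` returns `"a10g-small"`.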

	---

	## Quick Start

	### Option 1: Use the Live Demo (Recommended)

	1. Visit: https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind
	2. Login: Sign in with your HuggingFace account
	3. Explore: Browse the leaderboard, chat with the agent, visualize traces

	### Option 2: Run Locally

	```bash
	# Clone and setup
	git clone https://github.com/Mandark-droid/TraceMind-AI.git
	cd TraceMind-AI
	pip install -r requirements.txt

	# Configure environment
	cp .env.example .env
	# Edit .env with your API keys (see Configuration section)

	# Run the app
	python app.py
	```

	Visit http://localhost:7860

	---

	## Configuration

	### For Viewing (Free)

	Required:
	- HuggingFace account (free)
	- HuggingFace token with Read permissions

	### For Submitting Jobs (Paid)

	Required:
	- ⚠️ HuggingFace Pro ($9/month) with credit card
	- HuggingFace token with Read + Write + Run Jobs permissions
	- LLM provider API keys (OpenAI, Anthropic, etc.)

	Optional (Modal Alternative):
	- Modal account (pay-per-second, no subscription)
	- Modal API token (MODAL_TOKEN_ID + MODAL_TOKEN_SECRET)
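
Before submitting jobs, it can help to fail fast when credentials are missing. This sketch checks environment variables per backend: `MODAL_TOKEN_ID` and `MODAL_TOKEN_SECRET` come from the list above, and `HF_TOKEN` is the conventional Hugging Face variable, but the grouping is an assumption, so align it with the app's `.env.example`. It only checks that values are present; it cannot verify that a token has the Read + Write + Run Jobs permissions.

```python
import os
from collections.abc import Mapping

# Hypothetical grouping of the variables each backend needs.
REQUIRED_VARS = {
    "hf_jobs": ["HF_TOKEN"],
    "modal": ["MODAL_TOKEN_ID", "MODAL_TOKEN_SECRET"],
}


def missing_credentials(backend: str,
                        env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARS[backend] if not env.get(name)]


if __name__ == "__main__":
    for backend in REQUIRED_VARS:
        missing = missing_credentials(backend)
        print(backend, "OK" if not missing else f"missing: {', '.join(missing)}")
```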

	### Using Your Own API Keys (Recommended for Judges)

To avoid running into the shared keys' rate limits during evaluation:

	Step 1: Configure MCP Server (Required for AI tools)
	1. Visit: https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server
	2. Go to ⚙️ Settings tab
	3. Enter: Gemini API Key + HuggingFace Token
	4. Click "Save & Override Keys"

	Step 2: Configure TraceMind-AI (Optional)
	1. Visit: https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind
	2. Go to ⚙️ Settings tab
	3. Enter: Gemini API Key + HuggingFace Token
	4. Click "Save API Keys"

	Get Free API Keys:
	- Gemini: https://ai.google.dev/ (1,500 requests/day)
	- HuggingFace: https://huggingface.co/settings/tokens (unlimited for public datasets)

	---

	## For Hackathon Judges

	### ✅ Track 2 Compliance

	- MCP Client Integration: Connects to remote MCP server via SSE transport
	- Autonomous Agent: `smolagents` agent with MCP tool access
	- Enterprise Focus: Cost optimization, job submission, performance analytics
	- Production-Ready: Deployed to HuggingFace Spaces with OAuth authentication
	- Real Data: Live HuggingFace datasets from SMOLTRACE evaluations

	### 🎯 Key Innovations

	1. Dual MCP Integration: Both direct MCP client + autonomous agent with MCP tools
	2. Multi-Cloud Support: HuggingFace Jobs + Modal for serverless compute
	3. Auto Hardware Selection: Smart hardware recommendations based on model size
	4. Complete Ecosystem: Part of 4-project platform demonstrating full evaluation workflow
	5. Agent Reasoning Visibility: See step-by-step MCP tool execution

	### 📹 Demo Materials

	- 🎥 TraceMind-AI Full Demo (20 min): [Watch on Loom](https://www.loom.com/share/70b9689b57204da58b8fef0d23c304fe) - Complete walkthrough of all features
	- 🎬 MCP Server Quick Demo (5 min): [Watch on Loom](https://www.loom.com/share/d4d0003f06fa4327b46ba5c081bdf835) - Quick intro to MCP tools
	- 📺 MCP Server Full Demo (20 min): [Watch on Loom](https://www.loom.com/share/de559bb0aef749559c79117b7f951250) - Deep dive into MCP server
	- 📝 Blog Post: [Building TraceMind Ecosystem](https://huggingface.co/blog/kshitijthakkar/tracemind-ecosystem) - Technical deep-dive
	- 🚀 LinkedIn Post: [TraceMind-AI Hackathon Submission](https://www.linkedin.com/posts/kshitij-thakkar-2061b924_mcp1stbirthdayhackathon-mcp-modelcontextprotocol-activity-7399775530218065920-owgR) - Final submission announcement

	### 🧪 Testing Suggestions

1. Try the Agent Chat (🤖 Agent Chat tab):
   - "Analyze the current leaderboard and show me the top 5 models"
   - "Compare the costs of the top 3 models"
   - "Estimate the cost of running 100 tests with GPT-4"

2. Explore the Leaderboard (📊 Leaderboard tab):
   - Click "Load Leaderboard" to see live data
   - Read the AI-generated insights (powered by MCP server)
   - Click on a run to see detailed test results

3. Visualize Traces (Select a run → View traces):
   - See OpenTelemetry waterfall diagrams
   - View GPU metrics overlay (for GPU jobs)
   - Ask questions about the trace (MCP-powered debugging)
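
A waterfall view places each span on a shared timeline. This sketch assumes spans have already been flattened into dicts with hypothetical `span_id`, `parent_id`, `name`, `start_ns` and `end_ns` fields (OpenTelemetry records span start and end times in nanoseconds) and computes each span's offset from the trace start, its duration and its nesting depth.

```python
def waterfall_rows(spans: list[dict]) -> list[tuple[str, int, float, float]]:
    """Return (name, depth, offset_ms, duration_ms) rows ordered by start time."""
    by_id = {s["span_id"]: s for s in spans}
    trace_start = min(s["start_ns"] for s in spans)

    def depth(span: dict) -> int:
        # Walk up parent links; a parent missing from the list ends the walk.
        d = 0
        while span.get("parent_id") in by_id:
            span = by_id[span["parent_id"]]
            d += 1
        return d

    rows = []
    for s in sorted(spans, key=lambda s: s["start_ns"]):
        offset_ms = (s["start_ns"] - trace_start) / 1e6
        duration_ms = (s["end_ns"] - s["start_ns"]) / 1e6
        rows.append((s["name"], depth(s), offset_ms, duration_ms))
    return rows


if __name__ == "__main__":
    demo = [  # made-up spans for illustration
        {"span_id": "a", "parent_id": None, "name": "agent.run", "start_ns": 0, "end_ns": 5_000_000},
        {"span_id": "b", "parent_id": "a", "name": "llm.call", "start_ns": 1_000_000, "end_ns": 3_000_000},
    ]
    for name, d, off, dur in waterfall_rows(demo):
        print("  " * d + f"{name:<12} +{off:6.1f} ms {dur:6.1f} ms")
```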

	---

	## What Can You Do?

	### 📊 View & Analyze

	- Browse leaderboard with AI-powered insights
	- Compare models side-by-side across metrics
	- Analyze traces with interactive visualization
	- Ask questions via autonomous agent

	### 💰 Estimate & Plan

	- Get cost estimates before running evaluations
	- Compare hardware options (CPU vs GPU tiers)
	- Preview duration and CO2 emissions
	- See recommendations from AI analysis
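
The figures in a pre-run estimate can be reasoned about with simple arithmetic. This sketch assumes hardware is billed per hour and API calls per token; every rate, the power draw and the grid carbon intensity are inputs you supply, not prices or figures taken from TraceMind, Hugging Face or Modal.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    duration_h: float
    cost_usd: float
    co2_g: float


def estimate_run(num_tests: int, seconds_per_test: float,
                 hourly_rate_usd: float, power_watts: float,
                 grid_g_per_kwh: float, tokens_per_test: int = 0,
                 usd_per_million_tokens: float = 0.0) -> Estimate:
    """Rough pre-run estimate: hardware time + token spend + emissions."""
    duration_h = num_tests * seconds_per_test / 3600
    hardware_cost = duration_h * hourly_rate_usd
    token_cost = num_tests * tokens_per_test * usd_per_million_tokens / 1_000_000
    # Energy (kWh) = average power (W) x time (h) / 1000; CO2 = energy x grid intensity.
    energy_kwh = power_watts * duration_h / 1000
    return Estimate(duration_h, hardware_cost + token_cost, energy_kwh * grid_g_per_kwh)
```

For example, 100 tests at 36 s each on hardware billed at a hypothetical $1/hour, plus 2,000 tokens per test at a hypothetical $5 per million tokens, come to 1 hour and $2.00; at an assumed 300 W and 400 gCO2/kWh that is about 120 g of CO2.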

	### 🚀 Submit & Monitor

	- Submit evaluation jobs to HuggingFace or Modal
	- Track job status in real-time
	- View results automatically when complete
	- Download datasets for further analysis

	### 🧪 Generate & Customize

	- Generate synthetic datasets for custom domains and tools
	- Create prompt templates optimized for your use case
	- Push to HuggingFace Hub with one click
	- Test evaluations without writing code

	---

	## Documentation

	For quick evaluation:
	- Read this README for overview
	- Visit the [Live Demo](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind) to try it
	- Check out the 🤖 Agent Chat tab for autonomous MCP usage

	For deep dives:
- [USER_GUIDE.md](USER_GUIDE.md) - Complete screen-by-screen walkthrough
  - Leaderboard tab usage
  - Agent chat interactions
  - Synthetic data generator
  - Job submission workflow
  - Trace visualization guide
- [MCP_INTEGRATION.md](MCP_INTEGRATION.md) - MCP client architecture
  - How TraceMind-AI connects to MCP server
  - Agent framework integration (smolagents)
  - MCP tool usage examples
- [JOB_SUBMISSION.md](JOB_SUBMISSION.md) - Evaluation job guide
  - HuggingFace Jobs setup
  - Modal integration
  - Hardware selection guide
  - Cost optimization tips
- [ARCHITECTURE.md](ARCHITECTURE.md) - Technical architecture
  - Project structure
  - Data flow
  - Authentication
  - Deployment

	---

	## Technology Stack

	- UI Framework: Gradio 5.49.1
	- Agent Framework: smolagents 1.22.0+
	- MCP Integration: MCP Python SDK + smolagents MCPClient
	- Data Source: HuggingFace Datasets API
	- Authentication: HuggingFace OAuth (planned)
- AI Models:
  - Agent: Google Gemini 2.5 Flash
  - MCP Server: Google Gemini 2.5 Flash
	- Cloud Platforms: HuggingFace Jobs + Modal

	---

	## Example Workflows

	### Workflow 1: Quick Analysis
	1. Open TraceMind-AI
	2. Go to 🤖 Agent Chat
	3. Click "Quick: Top Models"
	4. See agent fetch leaderboard and analyze top performers
	5. Ask follow-up: "Which one is most cost-effective?"

	### Workflow 2: Submit Evaluation Job
	1. Go to ⚙️ Settings → Configure API keys
	2. Go to 🚀 New Evaluation
	3. Select model (e.g., `meta-llama/Llama-3.1-8B`)
	4. Choose infrastructure (HuggingFace Jobs or Modal)
	5. Click "💰 Estimate Cost" to preview
	6. Click "Submit Evaluation"
	7. Monitor job in 📊 Job Monitoring tab
	8. View results in leaderboard when complete
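
Job monitoring amounts to polling a status source until the job reaches a terminal state. This sketch keeps the status source pluggable: `get_status` is a hypothetical callable (in practice it might wrap a Hugging Face Jobs or Modal status lookup), and the terminal stage names are assumptions to adjust to whatever the backend actually reports.

```python
import time
from typing import Callable

# Assumed terminal stage names; check the backend's real values.
TERMINAL_STAGES = {"COMPLETED", "ERROR", "CANCELED"}


def wait_for_job(get_status: Callable[[], str], poll_s: float = 10.0,
                 max_poll_s: float = 120.0, timeout_s: float = 3600.0,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until a terminal stage, doubling the wait between checks up to a cap."""
    deadline = time.monotonic() + timeout_s
    delay = poll_s
    while True:
        stage = get_status()
        if stage in TERMINAL_STAGES:
            return stage
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"job still {stage} after {timeout_s:.0f}s")
        sleep(delay)
        delay = min(delay * 2, max_poll_s)
```

Exponential backoff keeps early checks responsive while avoiding hammering the status API during long GPU runs; injecting `sleep` also makes the loop easy to test without waiting.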

	### Workflow 3: Debug Agent Behavior
	1. Browse 📊 Leaderboard
	2. Click on a run with failures
	3. View detailed test results
	4. Click on a failed test to see trace
	5. Use MCP-powered Q&A: "Why did this test fail?"
	6. Get AI analysis of the execution trace
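
Before asking "Why did this test fail?", it can help to see which spans failed and the chain of calls that led to each one. This sketch assumes spans flattened into dicts with hypothetical `span_id`, `parent_id`, `name` and `status` fields. OpenTelemetry's span status codes are UNSET, OK and ERROR; how the SMOLTRACE traces dataset serializes them is an assumption here.

```python
def error_paths(spans: list[dict]) -> list[list[str]]:
    """For each span whose status is ERROR, return span names from root to it."""
    by_id = {s["span_id"]: s for s in spans}
    paths = []
    for span in spans:
        if span.get("status") != "ERROR":
            continue
        # Climb parent links, collecting names, until reaching a root span.
        path = [span["name"]]
        parent = by_id.get(span.get("parent_id"))
        while parent is not None:
            path.append(parent["name"])
            parent = by_id.get(parent.get("parent_id"))
        paths.append(list(reversed(path)))
    return paths
```

The deepest path usually points at the span worth asking about; ancestor spans may also be marked ERROR if the failure propagated up.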

	### Workflow 4: Generate Custom Test Dataset
	1. Go to 🔬 Synthetic Data Generator
2. Configure:
   - Domain: `finance`
   - Tools: `get_stock_price,calculate_profit,send_alert`
   - Number of tasks: `20`
   - Difficulty: `balanced`
	3. Click "Generate Dataset"
	4. Review generated tasks and prompt template
	5. Enter repository name: `yourname/smoltrace-finance-tasks`
	6. Click "Push to HuggingFace Hub"
	7. Use your custom dataset in evaluations
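
The generator form takes tools as a comma-separated string and needs a repository ID in `owner/name` form before pushing. This sketch validates those inputs and normalizes them into a config dict; the dict keys and the simplified repo-ID pattern are hypothetical and do not describe the app's internal format.

```python
import re

# Hub repo IDs look like "owner/name"; this pattern is a simplified check.
REPO_ID = re.compile(r"^[A-Za-z0-9][\w.-]*/[A-Za-z0-9][\w.-]*$")


def build_generator_config(domain: str, tools: str, num_tasks: int,
                           difficulty: str, repo_id: str) -> dict:
    """Validate generator inputs and return a hypothetical config dict."""
    tool_names = [t.strip() for t in tools.split(",") if t.strip()]
    if not tool_names:
        raise ValueError("at least one tool name is required")
    if len(set(tool_names)) != len(tool_names):
        raise ValueError("duplicate tool names")
    if num_tasks < 1:
        raise ValueError("num_tasks must be positive")
    if not REPO_ID.match(repo_id):
        raise ValueError(f"not an owner/name repo ID: {repo_id!r}")
    return {
        "domain": domain.strip().lower(),
        "tools": tool_names,
        "num_tasks": num_tasks,
        "difficulty": difficulty,
        "repo_id": repo_id,
    }
```

Calling `build_generator_config("finance", "get_stock_price,calculate_profit,send_alert", 20, "balanced", "yourname/smoltrace-finance-tasks")` mirrors the Workflow 4 inputs and yields a three-item `tools` list.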

	---

	## Screenshots

	See [SCREENSHOTS.md](SCREENSHOTS.md) for annotated screenshots of all screens

	---

	## 🔗 Quick Links

	### 📦 Component Links

| Component | Description | Links |
|-----------|-------------|-------|
| TraceVerde | OTEL Instrumentation | [GitHub](https://github.com/Mandark-droid/genai_otel_instrument) • [PyPI](https://pypi.org/project/genai-otel-instrument) |
| SMOLTRACE | Evaluation Engine | [GitHub](https://github.com/Mandark-droid/SMOLTRACE) • [PyPI](https://pypi.org/project/smoltrace/) |
| MCP Server | Building MCP (Track 1) | [HF Space](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server) • [GitHub](https://github.com/Mandark-droid/TraceMind-mcp-server) |
| TraceMind-AI | MCP in Action (Track 2) | [HF Space](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind) • [GitHub](https://github.com/Mandark-droid/TraceMind-AI) |

	### 📢 Community Posts

	- 🚀 [TraceMind-AI Hackathon Submission](https://www.linkedin.com/posts/kshitij-thakkar-2061b924_mcp1stbirthdayhackathon-mcp-modelcontextprotocol-activity-7399775530218065920-owgR) - MCP's 1st Birthday Hackathon final submission
	- 📝 [Building TraceMind Ecosystem Blog Post](https://huggingface.co/blog/kshitijthakkar/tracemind-ecosystem) - Complete technical deep-dive into the TraceVerse ecosystem
	- 🎉 [TraceMind Teaser](https://www.linkedin.com/posts/kshitij-thakkar-2061b924_mcpsfirstbirthdayhackathon-mcpsfirstbirthdayhackathon-activity-7395686529270013952-g_id) - MCP's 1st Birthday Hackathon announcement
	- 📊 [SMOLTRACE Launch](https://www.linkedin.com/posts/kshitij-thakkar-2061b924_ai-machinelearning-llm-activity-7394350375908126720-im_T) - Lightweight agent evaluation engine
	- 🔭 [TraceVerde Launch](https://www.linkedin.com/posts/kshitij-thakkar-2061b924_genai-opentelemetry-observability-activity-7390339855135813632-wqEg) - Zero-code OTEL instrumentation for LLMs
	- 🙏 [TraceVerde 3K Downloads](https://www.linkedin.com/posts/kshitij-thakkar-2061b924_thank-you-open-source-community-a-week-activity-7392205780592132096-nu6U) - Thank you to the community!

	---

	## Credits

- Built for: MCP's 1st Birthday Hackathon (Nov 14-30, 2025)
- Track: MCP in Action (Enterprise)
- Author: Kshitij Thakkar
- Powered by: TraceMind MCP Server + Gradio + smolagents
- Built with: Gradio 5.49.1 (MCP client integration)

	Special Thanks:
	- [Eliseu Silva](https://huggingface.co/elismasilva) - For the [gradio_htmlplus](https://huggingface.co/spaces/elismasilva/gradio_htmlplus) custom component that powers our interactive leaderboard table. Eliseu's timely help and collaboration during the hackathon was invaluable!

	Sponsors: HuggingFace • Google Gemini • Modal • Anthropic • Gradio • OpenAI • Nebius • Hyperbolic • ElevenLabs • SambaNova • Blaxel

	---

	## License

	AGPL-3.0 - See [LICENSE](LICENSE) for details

	---

	## Support

	- 📧 GitHub Issues: [TraceMind-AI/issues](https://github.com/Mandark-droid/TraceMind-AI/issues)
	- 💬 HF Discord: `#mcp-1st-birthday-official🏆`
	- 🏷️ Tag: `mcp-in-action-track-enterprise`
	- 🐦 Twitter: [@TraceMindAI](https://twitter.com/TraceMindAI) (placeholder)

	---

	Ready to evaluate your agents with AI-powered intelligence?

	🌐 Try the live demo: https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind