"""
Documentation Screen for TraceMind-AI

Comprehensive documentation for the TraceMind ecosystem
"""
import gradio as gr


def create_about_tab():
    """Create the About tab with ecosystem overview"""
    return gr.Markdown("""
# 🧠 TraceMind Ecosystem
<div align="center">
<img src="https://raw.githubusercontent.com/Mandark-droid/TraceMind-AI/assets/Logo.png" alt="TraceMind Logo" width="300"/>
</div>
<br/>
**The Complete AI Agent Evaluation Platform**
<div align="center" style="display: flex; flex-wrap: wrap; justify-content: center; gap: 5px;">
<a href="https://github.com/modelcontextprotocol"><img src="https://img.shields.io/badge/MCP%27s%201st%20Birthday-Hackathon-blue" alt="MCP's 1st Birthday Hackathon"></a>
<a href="https://github.com/modelcontextprotocol/hackathon"><img src="https://img.shields.io/badge/Track-MCP%20in%20Action%20(Enterprise)-purple" alt="Track 2"></a>
<a href="https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind"><img src="https://img.shields.io/badge/HuggingFace-TraceMind-yellow?logo=huggingface" alt="HF Space"></a>
<a href="https://gradio.app/"><img src="https://img.shields.io/badge/Powered%20by-Gradio-orange" alt="Powered by Gradio"></a>
</div>
> **🎯 Track 2 Submission**: MCP in Action (Enterprise)
> **📅 MCP's 1st Birthday Hackathon**: November 14-30, 2025
TraceMind is a comprehensive ecosystem for evaluating, monitoring, and optimizing AI agents. Built on open-source foundations and powered by the Model Context Protocol (MCP), TraceMind provides everything you need for production-grade agent evaluation.
---
## 📖 Table of Contents
- [Architecture Overview](#️-architecture-overview)
- [The Complete Flow](#-the-complete-flow)
- [Key Features](#-key-features)
- [Built for MCP's 1st Birthday Hackathon](#-built-for-mcps-1st-birthday-hackathon)
- [Quick Links](#-quick-links)
- [Documentation Navigation](#-documentation-navigation)
- [Getting Started](#-getting-started)
- [Contributing](#-contributing)
- [Acknowledgments](#-acknowledgments)
---
<details open>
<summary><h2>🏗️ Architecture Overview</h2></summary>
<div align="center">
<img src="https://raw.githubusercontent.com/Mandark-droid/TraceMind-AI/assets/TraceVerse_Logo.png" alt="TraceVerse Ecosystem" width="500"/>
</div>
<br/>
The TraceMind ecosystem consists of four integrated components:
```
┌───────────────────────────────────────────────────────────┐
│                    TraceMind Ecosystem                    │
├───────────────────────────────────────────────────────────┤
│                                                           │
│  1️⃣ TraceVerde (genai_otel_instrument)                    │
│     └─ Automatic OpenTelemetry Instrumentation            │
│     └─ Zero-code tracing for LLM frameworks               │
│                                                           │
│  2️⃣ SMOLTRACE                                             │
│     └─ Lightweight Agent Evaluation Engine                │
│     └─ Generates structured datasets                      │
│                                                           │
│  3️⃣ TraceMind-MCP-Server                                  │
│     └─ MCP Server (Track 1: Building MCP)                 │
│     └─ Provides intelligent analysis tools                │
│                                                           │
│  4️⃣ TraceMind-AI (This App!)                              │
│     └─ Gradio UI (Track 2: MCP in Action)                 │
│     └─ Visualizes data + consumes MCP tools               │
│                                                           │
└───────────────────────────────────────────────────────────┘
```
</details>
---
<details open>
<summary><h2>🔄 The Complete Flow</h2></summary>
### 1. **Instrument Your Agents** (TraceVerde)
```python
import genai_otel

# Zero-code instrumentation
genai_otel.instrument()

# Your agent code runs normally, but is now traced!
agent.run("What's the weather in Tokyo?")
```
### 2. **Evaluate with SMOLTRACE**
```bash
# Run comprehensive evaluation
smoltrace-eval \\
    --model openai/gpt-4 \\
    --agent-type both \\
    --enable-otel
```
### 3. **Analyze Results** (This UI)
- View leaderboard rankings
- Compare model performance
- Explore detailed traces
- Ask questions with MCP-powered chat
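The leaderboard the UI renders is a regular HuggingFace dataset, so it can also be pulled programmatically. A minimal sketch, assuming the default leaderboard repo referenced later in these docs and a `train` split:
```python
from datasets import load_dataset

# Default leaderboard repo used elsewhere in these docs; swap in your own
lb = load_dataset("kshitijthakkar/smoltrace-leaderboard", split="train")

# Top 3 runs by success rate
top = sorted(lb, key=lambda r: r["success_rate"], reverse=True)[:3]
for row in top:
    print(row["model"], row["success_rate"], row["total_cost_usd"])
```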
</details>
---
<details open>
<summary><h2>🎯 Key Features</h2></summary>
### For Developers
- ✅ **Zero-code Instrumentation**: Just import and go
- ✅ **Framework Agnostic**: Works with LiteLLM, Transformers, LangChain, CrewAI, etc.
- ✅ **Production Ready**: Lightweight, minimal overhead
- ✅ **Standards Compliant**: Uses OpenTelemetry conventions
### For Researchers
- ✅ **Comprehensive Metrics**: Token usage, costs, latency, GPU utilization
- ✅ **Reproducible Results**: Structured datasets on HuggingFace
- ✅ **Model Comparison**: Side-by-side analysis
- ✅ **Trace Visualization**: Step-by-step agent execution
### For Organizations
- ✅ **Cost Transparency**: Real-time cost tracking and estimation
- ✅ **Sustainability**: CO2 emissions monitoring (TraceVerde)
- ✅ **MCP Integration**: Connect to intelligent analysis tools
- ✅ **HuggingFace Native**: Seamless dataset integration
</details>
---
## 🏆 Built for MCP's 1st Birthday Hackathon
TraceMind demonstrates the complete MCP ecosystem:
**Track 1 (Building MCP)**: [TraceMind-mcp-server](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server)
- Provides MCP tools for leaderboard analysis, cost estimation, trace debugging
**Track 2 (MCP in Action)**: TraceMind-AI (this app!)
- Consumes MCP servers for autonomous agent chat and intelligent insights
---
## 🔗 Quick Links
### 📦 Component Links
| Component | Description | Links |
|-----------|-------------|-------|
| **TraceVerde** | OTEL Instrumentation | [GitHub](https://github.com/Mandark-droid/genai_otel_instrument) • [PyPI](https://pypi.org/project/genai-otel-instrument) |
| **SMOLTRACE** | Evaluation Engine | [GitHub](https://github.com/Mandark-droid/SMOLTRACE) • [PyPI](https://pypi.org/project/smoltrace/) |
| **MCP Server** | Building MCP (Track 1) | [HF Space](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server) |
| **TraceMind-AI** | MCP in Action (Track 2) | [HF Space](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind) |
### 📢 Community Posts
- 🎉 [**TraceMind Teaser**](https://www.linkedin.com/posts/kshitij-thakkar-2061b924_mcpsfirstbirthdayhackathon-mcpsfirstbirthdayhackathon-activity-7395686529270013952-g_id) - MCP's 1st Birthday Hackathon announcement
- 📊 [**SMOLTRACE Launch**](https://www.linkedin.com/posts/kshitij-thakkar-2061b924_ai-machinelearning-llm-activity-7394350375908126720-im_T) - Lightweight agent evaluation engine
- 🔭 [**TraceVerde Launch**](https://www.linkedin.com/posts/kshitij-thakkar-2061b924_genai-opentelemetry-observability-activity-7390339855135813632-wqEg) - Zero-code OTEL instrumentation for LLMs
- 🙏 [**TraceVerde 3K Downloads**](https://www.linkedin.com/posts/kshitij-thakkar-2061b924_thank-you-open-source-community-a-week-activity-7392205780592132096-nu6U) - Thank you to the community!
---
## 📚 Documentation Navigation
Use the tabs above to explore detailed documentation for each component:
- **About**: This overview (you are here)
- **TraceVerde**: OpenTelemetry instrumentation for LLMs
- **SmolTrace**: Agent evaluation engine
- **TraceMind-MCP-Server**: MCP server implementation details
---
<details open>
<summary><h2>💡 Getting Started</h2></summary>
### Quick Start (5 minutes)
```bash
# 1. Install TraceVerde for instrumentation
pip install genai-otel-instrument

# 2. Install SMOLTRACE for evaluation
pip install smoltrace

# 3. Run your first evaluation
smoltrace-eval --model openai/gpt-4 --agent-type tool

# 4. View results in TraceMind-AI (this UI!)
```
### Learn More
- Read component-specific docs in the tabs above
- Try the **Agent Chat** for interactive queries
- Explore the **Leaderboard** to see real evaluation data
- Check the **Trace Detail** screen for deep inspection
</details>
---
## 🤝 Contributing
All components are open source under AGPL-3.0:
- Report issues on GitHub
- Submit pull requests
- Share your evaluation results
- Join the community discussions
---
## 👏 Acknowledgments
Built with ❤️ for **MCP's 1st Birthday Hackathon** by **Kshitij Thakkar**
Special thanks to:
- **Anthropic** - For the Model Context Protocol
- **Gradio Team** - For Gradio 6 with MCP integration
- **HuggingFace** - For Spaces and dataset infrastructure
- **Google** - For Gemini API access
- **OpenTelemetry** - For standardized observability
---
*Last Updated: November 2025*
""")


def create_traceverde_tab():
    """Create the TraceVerde documentation tab"""
    return gr.Markdown("""
# 🔭 TraceVerde (genai_otel_instrument)
<div align="center">
<img src="https://raw.githubusercontent.com/Mandark-droid/genai_otel_instrument/main/.github/images/Logo.jpg" alt="TraceVerde Logo" width="400"/>
</div>
<br/>
<div align="center" style="display: flex; flex-wrap: wrap; justify-content: center; gap: 5px;">
<a href="https://badge.fury.io/py/genai-otel-instrument"><img src="https://badge.fury.io/py/genai-otel-instrument.svg" alt="PyPI version"></a>
<a href="https://pypi.org/project/genai-otel-instrument/"><img src="https://img.shields.io/pypi/pyversions/genai-otel-instrument.svg" alt="Python Versions"></a>
<a href="https://www.gnu.org/licenses/agpl-3.0"><img src="https://img.shields.io/badge/License-AGPL%203.0-blue.svg" alt="License"></a>
<a href="https://pepy.tech/project/genai-otel-instrument"><img src="https://static.pepy.tech/badge/genai-otel-instrument" alt="Downloads"></a>
<a href="https://pepy.tech/project/genai-otel-instrument"><img src="https://static.pepy.tech/badge/genai-otel-instrument/month" alt="Downloads/Month"></a>
</div>
<div align="center" style="display: flex; flex-wrap: wrap; justify-content: center; gap: 5px;">
<a href="https://github.com/Mandark-droid/genai_otel_instrument"><img src="https://img.shields.io/github/stars/Mandark-droid/genai_otel_instrument?style=social" alt="GitHub Stars"></a>
<a href="https://github.com/Mandark-droid/genai_otel_instrument"><img src="https://img.shields.io/github/forks/Mandark-droid/genai_otel_instrument?style=social" alt="GitHub Forks"></a>
<a href="https://github.com/Mandark-droid/genai_otel_instrument/issues"><img src="https://img.shields.io/github/issues/Mandark-droid/genai_otel_instrument" alt="GitHub Issues"></a>
</div>
<div align="center" style="display: flex; flex-wrap: wrap; justify-content: center; gap: 5px;">
<a href="https://opentelemetry.io/"><img src="https://img.shields.io/badge/OpenTelemetry-1.20%2B-blueviolet" alt="OpenTelemetry"></a>
<a href="https://opentelemetry.io/docs/specs/semconv/gen-ai/"><img src="https://img.shields.io/badge/OTel%20Semconv-GenAI%20v1.28-orange" alt="Semantic Conventions"></a>
<a href="https://github.com/psf/black"><img src="https://img.shields.io/badge/code%20style-black-000000.svg" alt="Code Style: Black"></a>
</div>
**Automatic OpenTelemetry Instrumentation for LLM Applications**
---
## 📖 Table of Contents
- [What is TraceVerde?](#what-is-traceverde)
- [Installation](#-installation)
- [Quick Start](#-quick-start)
- [Supported Frameworks](#-supported-frameworks)
- [What Gets Captured?](#-what-gets-captured)
- [CO2 Emissions Tracking](#-co2-emissions-tracking)
- [Advanced Configuration](#-advanced-configuration)
- [Integration with SMOLTRACE](#-integration-with-smoltrace)
- [Use Cases](#-use-cases)
- [OpenTelemetry Standards](#-opentelemetry-standards)
- [Resources](#-resources)
- [Troubleshooting](#-troubleshooting)
- [License](#-license)
- [Contributing](#-contributing)
---
## What is TraceVerde?
TraceVerde is a **zero-code** OpenTelemetry instrumentation library for GenAI applications. It automatically captures:
- 🔹 Every LLM call (token usage, cost, latency)
- 🔹 Tool executions and results
- 🔹 Agent reasoning steps
- 🔹 GPU metrics (utilization, memory, temperature)
- 🔹 CO2 emissions (via CodeCarbon integration)
All with **one import statement** - no code changes required!
---
## 📦 Installation
```bash
pip install genai-otel-instrument

# With GPU metrics support
pip install genai-otel-instrument[gpu]

# With CO2 emissions tracking
pip install genai-otel-instrument[carbon]

# All features
pip install genai-otel-instrument[all]
```
---
<details open>
<summary><h2>🚀 Quick Start</h2></summary>
### Basic Usage
**Option 1: Environment Variables (No code changes)**
```bash
export OTEL_SERVICE_NAME=my-llm-app
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
python your_app.py
```
**Option 2: One line of code**
```python
import genai_otel
genai_otel.instrument()

# Your existing code works unchanged
import openai

client = openai.OpenAI()
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}]
)
# Traces are automatically captured and exported!
```
**Option 3: With OpenTelemetry Setup**
```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# 1. Set up OpenTelemetry (one-time setup)
trace.set_tracer_provider(TracerProvider())
span_processor = SimpleSpanProcessor(ConsoleSpanExporter())
trace.get_tracer_provider().add_span_processor(span_processor)

# 2. Instrument all LLM frameworks (one line!)
import genai_otel
genai_otel.instrument()

# 3. Use your LLM framework normally - it's now traced!
from litellm import completion

response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}]
)
# Traces are automatically captured and exported!
```
</details>
---
## 🎯 Supported Frameworks
TraceVerde automatically instruments:
| Framework | Status | Import Required |
|-----------|--------|-----------------|
| **LiteLLM** | ✅ Full Support | `from litellm import completion` |
| **Transformers** | ✅ Full Support | `from transformers import pipeline` |
| **LangChain** | ✅ Full Support | `from langchain import ...` |
| **CrewAI** | ✅ Full Support | `from crewai import Agent` |
| **smolagents** | ✅ Full Support | `from smolagents import ...` |
| **OpenAI SDK** | ✅ Full Support | `from openai import OpenAI` |
**No code changes needed** - just import and use as normal!
---
<details>
<summary><h2>📊 What Gets Captured?</h2></summary>
### LLM Spans
Every LLM call creates a span with:
```json
{
  "span_name": "LLM Call - Reasoning",
  "attributes": {
    "gen_ai.system": "openai",
    "gen_ai.request.model": "gpt-4",
    "gen_ai.operation.name": "chat",
    "gen_ai.usage.prompt_tokens": 78,
    "gen_ai.usage.completion_tokens": 156,
    "gen_ai.usage.total_tokens": 234,
    "gen_ai.usage.cost.total": 0.0012,
    "gen_ai.response.finish_reasons": ["stop"],
    "gen_ai.request.temperature": 0.7
  }
}
```
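To see these attributes without standing up a collector, the OTel SDK's in-memory exporter works well in tests. A minimal sketch (the attribute keys follow the span example above):
```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.sdk.trace.export.in_memory_span_exporter import InMemorySpanExporter

# Collect spans in memory so gen_ai.* attributes can be inspected directly
exporter = InMemorySpanExporter()
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(exporter))
trace.set_tracer_provider(provider)

import genai_otel
genai_otel.instrument()

# ... run an LLM call here ...

for span in exporter.get_finished_spans():
    tokens = span.attributes.get("gen_ai.usage.total_tokens")
    if tokens is not None:
        print(span.name, tokens, span.attributes.get("gen_ai.usage.cost.total"))
```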
### Tool Spans
Tool executions are traced with:
```json
{
  "span_name": "Tool Call - get_weather",
  "attributes": {
    "tool.name": "get_weather",
    "tool.input": "{\\"location\\": \\"Tokyo\\"}",
    "tool.output": "{\\"temp\\": \\"18°C\\"}",
    "tool.latency_ms": 890
  }
}
```
### GPU Metrics
When enabled, captures real-time GPU data:
```json
{
  "metrics": [
    {
      "name": "gen_ai.gpu.utilization",
      "value": 67.5,
      "unit": "%",
      "timestamp": "2025-11-18T14:23:00Z"
    },
    {
      "name": "gen_ai.gpu.memory.used",
      "value": 512.34,
      "unit": "MiB"
    }
  ]
}
```
</details>
---
## 🌱 CO2 Emissions Tracking
TraceVerde integrates with CodeCarbon for sustainability monitoring:
```python
import genai_otel

# Enable CO2 tracking
genai_otel.instrument(enable_carbon_tracking=True)

# Your LLM calls now track carbon emissions!
```
**Captured Metrics:**
- 🌍 CO2 emissions (grams)
- ⚡ Energy consumed (kWh)
- 📍 Geographic region
- 💻 Hardware type (CPU/GPU)
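Under the hood this relies on CodeCarbon's `EmissionsTracker`. For reference, a standalone sketch of what that integration measures:
```python
from codecarbon import EmissionsTracker

# What TraceVerde wires in for you, shown standalone for reference
tracker = EmissionsTracker(project_name="llm-eval")
tracker.start()
# ... run your LLM workload here ...
emissions_kg = tracker.stop()  # returns kg CO2-equivalent
print(f"CO2: {emissions_kg * 1000:.2f} g")
```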
---
## 🔧 Advanced Configuration
### Custom Exporters
```python
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Export to Jaeger/Tempo/etc.
trace.set_tracer_provider(TracerProvider())
otlp_exporter = OTLPSpanExporter(endpoint="http://localhost:4317")
span_processor = BatchSpanProcessor(otlp_exporter)
trace.get_tracer_provider().add_span_processor(span_processor)

import genai_otel
genai_otel.instrument()
```
### GPU Metrics
```python
# Enable GPU monitoring (requires pynvml)
import genai_otel
genai_otel.instrument(
    enable_gpu_metrics=True,
    gpu_poll_interval=1.0  # seconds
)
```
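The GPU poller reads NVML counters at each interval. A rough standalone sketch of the equivalent `pynvml` reads (illustrative, not TraceVerde's actual implementation):
```python
import pynvml

# Roughly what a GPU poller samples each interval (NVIDIA GPUs only)
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
util = pynvml.nvmlDeviceGetUtilizationRates(handle).gpu        # percent
mem = pynvml.nvmlDeviceGetMemoryInfo(handle).used / (1024**2)  # MiB
print(f"gpu.utilization={util}%  gpu.memory.used={mem:.0f} MiB")
pynvml.nvmlShutdown()
```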
---
## 📈 Integration with SMOLTRACE
TraceVerde powers SMOLTRACE's evaluation capabilities:
```python
# SMOLTRACE automatically uses TraceVerde for instrumentation
from smoltrace import evaluate_agent

results = evaluate_agent(
    model="gpt-4",
    agent_type="tool",
    enable_otel=True  # Uses TraceVerde under the hood!
)
```
---
## 🎯 Use Cases
### 1. Development & Debugging
```python
# See exactly what your agent is doing
import genai_otel
genai_otel.instrument()

# Run your agent
agent.run("Complex task")

# View traces in console or Jaeger
```
### 2. Production Monitoring
```python
# Export to your observability platform
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

trace.set_tracer_provider(TracerProvider())
otlp_exporter = OTLPSpanExporter(endpoint="https://your-otel-collector")
trace.get_tracer_provider().add_span_processor(BatchSpanProcessor(otlp_exporter))

import genai_otel
genai_otel.instrument()
```
### 3. Cost Analysis
```python
# Track costs across all LLM calls
import genai_otel
genai_otel.instrument()

# Analyze cost per user/session/feature
# All costs automatically captured in span attributes
```
### 4. Sustainability Reporting
```python
# Monitor environmental impact
import genai_otel
genai_otel.instrument(
    enable_carbon_tracking=True,
    enable_gpu_metrics=True
)

# Generate CO2 reports from trace data
```
---
## 📐 OpenTelemetry Standards
TraceVerde follows the **Gen AI Semantic Conventions**:
- ✅ Consistent attribute naming (`gen_ai.*`)
- ✅ Standard span structure
- ✅ Compatible with all OTEL collectors
- ✅ Works with Jaeger, Tempo, Datadog, New Relic, etc.
---
## 🔗 Resources
- **GitHub**: [github.com/Mandark-droid/genai_otel_instrument](https://github.com/Mandark-droid/genai_otel_instrument)
- **PyPI**: [pypi.org/project/genai-otel-instrument](https://pypi.org/project/genai-otel-instrument)
- **Examples**: [github.com/Mandark-droid/genai_otel_instrument/examples](https://github.com/Mandark-droid/genai_otel_instrument/tree/main/examples)
- **OpenTelemetry Docs**: [opentelemetry.io](https://opentelemetry.io)
---
## 🐛 Troubleshooting
### Common Issues
**Q: Traces not appearing?**
```python
# Make sure you set up a tracer provider first
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider

trace.set_tracer_provider(TracerProvider())
```
**Q: GPU metrics not working?**
```bash
# Install GPU support
pip install genai-otel-instrument[gpu]

# Verify NVIDIA drivers are installed
nvidia-smi
```
**Q: How do I configure options?**
```python
# Use environment variables or pass options to instrument()
import genai_otel
genai_otel.instrument(enable_gpu_metrics=True)
```
---
## 📄 License
**AGPL-3.0** - Open source and free to use
---
## 🤝 Contributing
Contributions welcome!
- Report bugs on GitHub Issues
- Submit PRs for new framework support
- Share your use cases
---
*TraceVerde - Making AI agents observable, one trace at a time* 🔭
""")


def create_smoltrace_tab():
    """Create the SMOLTRACE documentation tab"""
    return gr.Markdown("""
# 📊 SMOLTRACE
<div align="center">
<img src="https://raw.githubusercontent.com/Mandark-droid/SMOLTRACE/main/.github/images/Logo.png" alt="SMOLTRACE Logo" width="400"/>
</div>
<br/>
**Lightweight Agent Evaluation Engine with Built-in OpenTelemetry Tracing**
<div align="center" style="display: flex; flex-wrap: wrap; justify-content: center; gap: 5px;">
<a href="https://www.python.org/downloads/"><img src="https://img.shields.io/badge/Python-3.10%2B-blue" alt="Python"></a>
<a href="https://github.com/Mandark-droid/SMOLTRACE/blob/main/LICENSE"><img src="https://img.shields.io/badge/License-AGPL--3.0-blue.svg" alt="License"></a>
<a href="https://badge.fury.io/py/smoltrace"><img src="https://badge.fury.io/py/smoltrace.svg" alt="PyPI version"></a>
<a href="https://pepy.tech/project/smoltrace"><img src="https://static.pepy.tech/badge/smoltrace" alt="Downloads"></a>
<a href="https://pepy.tech/project/smoltrace"><img src="https://static.pepy.tech/badge/smoltrace/month" alt="Downloads/Month"></a>
</div>
<div align="center" style="display: flex; flex-wrap: wrap; justify-content: center; gap: 5px;">
<a href="https://github.com/psf/black"><img src="https://img.shields.io/badge/code%20style-black-000000.svg" alt="Code style: black"></a>
<a href="https://pycqa.github.io/isort/"><img src="https://img.shields.io/badge/%20imports-isort-%231674b1?style=flat&labelColor=ef8336" alt="Imports: isort"></a>
<a href="https://github.com/Mandark-droid/SMOLTRACE/actions?query=workflow%3Atest"><img src="https://img.shields.io/github/actions/workflow/status/Mandark-droid/SMOLTRACE/test.yml?branch=main&label=tests" alt="Tests"></a>
<a href="https://huggingface.co/docs/smoltrace/en/index"><img src="https://img.shields.io/badge/docs-stable-blue.svg" alt="Docs"></a>
</div>
---
## 📖 Table of Contents
- [What is SMOLTRACE?](#what-is-smoltrace)
- [Installation](#-installation)
- [Quick Start](#-quick-start)
- [Evaluation Types](#-evaluation-types)
- [What Gets Generated?](#-what-gets-generated)
- [Configuration Options](#-configuration-options)
- [Integration with HuggingFace Jobs](#️-integration-with-huggingface-jobs)
- [Integration with TraceMind-AI](#-integration-with-tracemind-ai)
- [Best Practices](#-best-practices)
- [Cost Estimation](#-cost-estimation)
- [Architecture](#-architecture)
- [Resources](#-resources)
- [Troubleshooting](#-troubleshooting)
- [License](#-license)
- [Contributing](#-contributing)
---
## What is SMOLTRACE?
SMOLTRACE is a **production-ready** evaluation framework for AI agents that:
- ✅ Evaluates agents across tool usage, code execution, and both
- ✅ Supports both API models (via LiteLLM) and local models (via Transformers)
- ✅ Automatically captures OpenTelemetry traces using TraceVerde
- ✅ Generates structured datasets for HuggingFace
- ✅ Tracks costs, GPU metrics, and CO2 emissions
**Goal**: Become HuggingFace's standard agent evaluation platform
---
## 📦 Installation
```bash
# Basic installation
pip install smoltrace

# With OpenTelemetry support
pip install smoltrace[otel]

# With GPU metrics
pip install smoltrace[otel,gpu]

# Everything
pip install smoltrace[all]
```
---
<details open>
<summary><h2>🚀 Quick Start</h2></summary>
### Command Line
```bash
# Evaluate GPT-4 as a tool agent
smoltrace-eval \\
    --model openai/gpt-4 \\
    --provider litellm \\
    --agent-type tool \\
    --enable-otel

# Evaluate a local Llama model
smoltrace-eval \\
    --model meta-llama/Llama-3.1-8B \\
    --provider transformers \\
    --agent-type both \\
    --enable-otel \\
    --enable-gpu-metrics
```
### Python API
```python
from smoltrace import evaluate_agent

# Run evaluation
results = evaluate_agent(
    model="openai/gpt-4",
    provider="litellm",
    agent_type="tool",
    enable_otel=True,
    num_tests=100
)

# Access results
print(f"Success Rate: {results.success_rate}%")
print(f"Total Cost: ${results.total_cost}")
print(f"Avg Duration: {results.avg_duration_ms}ms")

# Upload to HuggingFace
results.upload_to_hf(
    results_repo="username/agent-results-gpt4",
    traces_repo="username/agent-traces-gpt4",
    leaderboard_repo="username/agent-leaderboard"
)
```
</details>
---
## 🎯 Evaluation Types
### 1. Tool Agent
Tests ability to use external tools:
```bash
smoltrace-eval --model openai/gpt-4 --agent-type tool
```
**Example Task**: "What's the weather in Tokyo?"
- Agent must call `get_weather` tool
- Verify correct tool selection
- Check response quality
### 2. Code Agent
Tests code generation and execution:
```bash
smoltrace-eval --model openai/gpt-4 --agent-type code
```
**Example Task**: "Calculate the sum of the first 10 prime numbers"
- Agent must generate Python code
- Execute code safely
- Return correct result
### 3. Both (Combined)
Tests comprehensive agent capabilities:
```bash
smoltrace-eval --model openai/gpt-4 --agent-type both
```
**Tests both tool usage AND code generation**
---
<details>
<summary><h2>📊 What Gets Generated?</h2></summary>
SMOLTRACE creates **4 structured datasets** on HuggingFace:
### 1. Leaderboard Dataset
Aggregate statistics for all evaluation runs:
```python
{
    "run_id": "uuid",
    "model": "openai/gpt-4",
    "agent_type": "tool",
    "provider": "litellm",

    # Performance
    "success_rate": 95.8,
    "total_tests": 100,
    "avg_duration_ms": 3200.0,

    # Cost & Resources
    "total_tokens": 15000,
    "total_cost_usd": 0.05,
    "co2_emissions_g": 0.22,
    "gpu_utilization_avg": 67.5,

    # Dataset References
    "results_dataset": "username/agent-results-gpt4",
    "traces_dataset": "username/agent-traces-gpt4",
    "metrics_dataset": "username/agent-metrics-gpt4",

    # Metadata
    "timestamp": "2025-11-18T14:23:00Z",
    "submitted_by": "username"
}
```
### 2. Results Dataset
Individual test case results:
```python
{
    "run_id": "uuid",
    "task_id": "task_001",
    "test_index": 0,

    # Test Case
    "prompt": "What's the weather in Tokyo?",
    "expected_tool": "get_weather",

    # Result
    "success": True,
    "response": "The weather in Tokyo is 18°C and clear.",
    "tool_called": "get_weather",

    # Metrics
    "execution_time_ms": 2450.0,
    "total_tokens": 234,
    "cost_usd": 0.0012,

    # Trace Reference
    "trace_id": "trace_abc123"
}
```
### 3. Traces Dataset
Full OpenTelemetry traces:
```python
{
    "trace_id": "trace_abc123",
    "run_id": "uuid",
    "spans": [
        {
            "spanId": "span_001",
            "name": "Agent Execution",
            "startTime": "2025-11-18T14:23:01.000Z",
            "endTime": "2025-11-18T14:23:03.450Z",
            "attributes": {
                "agent.type": "tool",
                "gen_ai.system": "openai",
                "gen_ai.request.model": "gpt-4"
            }
        },
        # ... more spans ...
    ]
}
```
### 4. Metrics Dataset
GPU metrics and performance data:
```python
{
    "run_id": "uuid",
    "trace_id": "trace_abc123",
    "metrics": [
        {
            "name": "gen_ai.gpu.utilization",
            "value": 67.5,
            "unit": "%",
            "timestamp": "2025-11-18T14:23:01.000Z"
        },
        {
            "name": "gen_ai.co2.emissions",
            "value": 0.22,
            "unit": "gCO2e"
        }
    ]
}
```
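Because the four datasets share `run_id` and `trace_id` keys, they can be joined for ad-hoc analysis. A sketch (the repo names are the hypothetical ones from the examples above):
```python
from datasets import load_dataset

# Hypothetical repo names matching the schema examples above
results = load_dataset("username/agent-results-gpt4", split="train")
traces = load_dataset("username/agent-traces-gpt4", split="train")

# Index traces by trace_id, then walk failed tests to their spans
by_id = {t["trace_id"]: t for t in traces}
for r in results:
    if not r["success"]:
        spans = by_id[r["trace_id"]]["spans"]
        print(r["task_id"], r["prompt"], "->", len(spans), "spans")
```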
</details>
---
## 🔧 Configuration Options
### Model Selection
```bash
# API Models (via LiteLLM)
--model openai/gpt-4
--model anthropic/claude-3-5-sonnet
--model google/gemini-pro

# Local Models (via Transformers)
--model meta-llama/Llama-3.1-8B
--model mistralai/Mistral-7B-v0.1
```
### Provider Selection
```bash
--provider litellm       # For API models
--provider transformers  # For local models
```
### Hardware Selection
Hardware is selected in HuggingFace Jobs configuration (`hardware:` field in job.yaml), not via CLI flags.
SMOLTRACE automatically detects available resources:
- API models (via litellm) → Uses CPU
- Local models (via transformers) → Uses available GPU if present
### OpenTelemetry Options
```bash
--enable-otel             # Enable tracing
--enable-gpu-metrics      # Capture GPU data
--enable-carbon-tracking  # Track CO2 emissions
```
---
## 🏗️ Integration with HuggingFace Jobs
SMOLTRACE works seamlessly with HuggingFace Jobs for running evaluations on cloud infrastructure.
### ⚠️ Requirements to Submit Jobs
**IMPORTANT**: To submit jobs via the TraceMind UI or HF CLI, you must:
1. **🔑 HuggingFace Pro Account**
   - You must be a HuggingFace Pro user
   - **Credit card required** to pay for compute usage
   - Sign up at: https://huggingface.co/pricing
2. **🎫 HuggingFace Token Permissions**
   - Your HF token needs **Read + Write** permissions
   - Token must have **"Run Jobs"** permission enabled
   - Create/update token at: https://huggingface.co/settings/tokens
   - ⚠️ Read-only tokens will **NOT** work for job submission
3. **💳 Billing**
   - You will be charged for compute usage
   - Pricing: https://huggingface.co/pricing#spaces-pricing
   - Monitor usage at: https://huggingface.co/settings/billing
### Example Job Configuration
```yaml
# job.yaml
name: SMOLTRACE Evaluation
hardware: gpu-a10  # Use gpu-h200 for 70B+ models
environment:
  MODEL: meta-llama/Llama-3.1-8B
  HF_TOKEN: ${{ secrets.HF_TOKEN }}
command: |
  pip install smoltrace[otel,gpu]
  smoltrace-eval \\
    --model $MODEL \\
    --provider transformers \\
    --agent-type both \\
    --enable-otel \\
    --enable-gpu-metrics \\
    --results-repo ${{ username }}/agent-results \\
    --leaderboard-repo huggingface/smolagents-leaderboard
```
### Hardware Tiers & Pricing
- 🔧 **cpu-basic**: API models (OpenAI, Anthropic via LiteLLM) - ~$0.05/hr
- 🎮 **t4-small**: Small models (4B-8B) - ~$0.60/hr
- 🔧 **a10g-small**: Medium models (7B-13B) - ~$1.10/hr
- 🚀 **a100-large**: Large models (70B+) - ~$3.00/hr
**Pricing**: See https://huggingface.co/pricing#spaces-pricing
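As a sanity check before submitting, the hourly rates above translate into job cost with simple arithmetic (a sketch; the rates are approximate and may change):
```python
# Back-of-envelope job cost from the hourly rates above (approximate)
rate_per_hour = {"cpu-basic": 0.05, "t4-small": 0.60, "a10g-small": 1.10, "a100-large": 3.00}

def job_cost(hardware: str, avg_seconds_per_test: float, num_tests: int) -> float:
    hours = avg_seconds_per_test * num_tests / 3600
    return rate_per_hour[hardware] * hours

# e.g. 1000 tests at ~3 s each on an A10G: ~0.83 h -> ~$0.92
print(f"${job_cost('a10g-small', 3.0, 1000):.2f}")
```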
### Benefits
- 📊 **Automatic Upload**: Results → HuggingFace datasets
- 🔄 **Reproducible**: Same environment every time
- ⚡ **Optimized Compute**: Right hardware for your model size
- 💰 **Pay-per-use**: Only pay for actual compute time
---
## 📈 Integration with TraceMind-AI
SMOLTRACE datasets power the TraceMind-AI interface:
```
SMOLTRACE Evaluation
         ↓
 4 Datasets Created
         ↓
┌────────┴────────┐
│                 │
│  TraceMind-AI   │ ← You are here!
│   (Gradio UI)   │
│                 │
└─────────────────┘
```
**What TraceMind-AI Shows:**
- 📊 **Leaderboard**: All evaluation runs
- 🔍 **Run Detail**: Individual test cases
- 🕵️ **Trace Detail**: OpenTelemetry visualization
- 🤖 **Agent Chat**: MCP-powered analysis
---
## 🎯 Best Practices
### 1. Start Small
```bash
# Test with 10 cases first
smoltrace-eval --model openai/gpt-4 --num-tests 10

# Scale up after validation
smoltrace-eval --model openai/gpt-4 --num-tests 1000
```
### 2. Choose Appropriate Hardware in HF Jobs
Hardware selection happens in your HuggingFace Jobs configuration:
```yaml
# For API models (OpenAI, Anthropic, etc.)
hardware: cpu-basic

# For 7B-13B local models
hardware: gpu-a10

# For 70B+ local models
hardware: gpu-h200
```
### 3. Enable Full Observability
```bash
# Capture everything
smoltrace-eval \\
    --model your-model \\
    --enable-otel \\
    --enable-gpu-metrics \\
    --enable-carbon-tracking
```
### 4. Organize Your Datasets
```bash
# Use descriptive repo names
--results-repo username/results-gpt4-tool-20251118
--traces-repo username/traces-gpt4-tool-20251118
--leaderboard-repo username/agent-leaderboard
```
---
## 🔍 Cost Estimation
Before running evaluations, estimate costs:
```python
from smoltrace import estimate_cost

# API model
api_cost = estimate_cost(
    model="openai/gpt-4",
    num_tests=1000,
    agent_type="tool"
)
print(f"Estimated cost: ${api_cost.total_cost}")

# GPU job
gpu_cost = estimate_cost(
    model="meta-llama/Llama-3.1-8B",
    num_tests=1000,
    hardware="gpu_h200"
)
print(f"Estimated cost: ${gpu_cost.total_cost}")
print(f"Estimated time: {gpu_cost.duration_minutes} minutes")
```
---
## 📐 Architecture
```
┌─────────────────────────────────────────┐
│              SMOLTRACE Core             │
├─────────────────────────────────────────┤
│                                         │
│  ┌──────────────┐    ┌──────────────┐   │
│  │   LiteLLM    │    │ Transformers │   │
│  │   Provider   │    │   Provider   │   │
│  └──────┬───────┘    └──────┬───────┘   │
│         │                   │           │
│         └────────┬──────────┘           │
│                  ↓                      │
│         ┌──────────────┐                │
│         │  TraceVerde  │                │
│         │    (OTEL)    │                │
│         └──────┬───────┘                │
│                ↓                        │
│         ┌──────────────┐                │
│         │   Dataset    │                │
│         │  Generator   │                │
│         └──────┬───────┘                │
│                ↓                        │
│     ┌───────────────────────┐           │
│     │  HuggingFace Upload   │           │
│     └───────────────────────┘           │
│                                         │
└─────────────────────────────────────────┘
```
---
## 🔗 Resources
- **GitHub**: [github.com/Mandark-droid/SMOLTRACE](https://github.com/Mandark-droid/SMOLTRACE)
- **PyPI**: [pypi.org/project/smoltrace](https://pypi.org/project/smoltrace/)
- **Documentation**: [SMOLTRACE README](https://github.com/Mandark-droid/SMOLTRACE#readme)
---
## 🐛 Troubleshooting
### Common Issues
**Q: Evaluation running slowly?**
```bash
# For local models, select GPU hardware in your HF Jobs config (job.yaml):
#   hardware: gpu-h200
# Or reduce the test count:
--num-tests 10
```
**Q: Traces not captured?**
```bash
# Make sure OTEL is enabled
--enable-otel
```
**Q: Upload to HF failing?**
```bash
# Check HF token
export HF_TOKEN=your_token_here

# Verify repo exists or allow auto-create
```
---
## 📄 License
**AGPL-3.0** - Open source and free to use
---
## 🤝 Contributing
We welcome contributions!
- Add new agent types
- Support more frameworks
- Improve evaluation metrics
- Optimize performance
---
*SMOLTRACE - Lightweight evaluation for heavyweight results* 📊
""")


def create_mcp_server_tab():
    """Create the TraceMind-MCP-Server documentation tab"""
    return gr.Markdown("""
# 🔌 TraceMind-MCP-Server
<div align="center">
<img src="https://raw.githubusercontent.com/Mandark-droid/TraceMind-mcp-server/assets/Logo.png" alt="TraceMind MCP Server Logo" width="300"/>
</div>
<br/>
**Building MCP: Intelligent Analysis Tools for Agent Evaluation**
<div align="center" style="display: flex; flex-wrap: wrap; justify-content: center; gap: 5px;">
<a href="https://github.com/modelcontextprotocol"><img src="https://img.shields.io/badge/MCP%27s%201st%20Birthday-Hackathon-blue" alt="MCP's 1st Birthday Hackathon"></a>
<a href="https://github.com/modelcontextprotocol/hackathon"><img src="https://img.shields.io/badge/Track-Building%20MCP%20(Enterprise)-blue" alt="Track 1"></a>
<a href="https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server"><img src="https://img.shields.io/badge/HuggingFace-TraceMind--MCP--Server-yellow?logo=huggingface" alt="HF Space"></a>
<a href="https://ai.google.dev/"><img src="https://img.shields.io/badge/Powered%20by-Google%20Gemini%202.5%20Pro-orange" alt="Google Gemini"></a>
</div>
> **🎯 Track 1 Submission**: Building MCP (Enterprise)
> **📅 MCP's 1st Birthday Hackathon**: November 14-30, 2025
---
## 📖 Table of Contents
- [What is TraceMind-MCP-Server?](#what-is-tracemind-mcp-server)
- [MCP Tools Provided](#️-mcp-tools-provided)
  - [analyze_leaderboard](#1-analyze_leaderboard)
  - [estimate_cost](#2-estimate_cost)
  - [debug_trace](#3-debug_trace)
  - [compare_runs](#4-compare_runs)
  - [analyze_results](#5-analyze_results)
- [Accessing the MCP Server](#-accessing-the-mcp-server)
- [Use Cases](#-use-cases)
- [Architecture](#️-architecture)
- [Configuration](#-configuration)
- [Dataset Requirements](#-dataset-requirements)
- [Learning Resources](#-learning-resources)
- [Troubleshooting](#-troubleshooting)
- [Links](#-links)
- [License](#-license)
- [Contributing](#-contributing)
- [MCP's 1st Birthday Hackathon](#-mcps-1st-birthday-hackathon)
---
## What is TraceMind-MCP-Server?
TraceMind-MCP-Server is a **Track 1 (Building MCP)** submission that provides MCP tools for intelligent agent evaluation analysis.
**Key Features:**
- 🤖 Powered by Google Gemini 2.5 Pro
- 🔌 Standards-compliant MCP implementation
- 📊 Analyzes HuggingFace evaluation datasets
- 💡 Provides actionable insights and recommendations
- 🌐 Accessible via SSE transport for Gradio integration
---
<details>
<summary><h2>🛠️ MCP Tools Provided</h2></summary>
### 1. `analyze_leaderboard`
**Purpose**: Generate AI-powered insights about evaluation leaderboard data
**Input Schema:**
```json
{
  "leaderboard_repo": "string",  // HF dataset (default: kshitijthakkar/smoltrace-leaderboard)
  "metric_focus": "string",      // "overall" | "accuracy" | "cost" | "latency" | "co2"
  "time_range": "string",        // "last_week" | "last_month" | "all_time"
  "top_n": "integer"             // Number of top models to highlight
}
```
**What It Does:**
1. Fetches leaderboard dataset from HuggingFace
2. Filters by time range
3. Analyzes trends based on metric focus
4. Uses Gemini to generate insights
5. Returns markdown-formatted analysis
**Example Output:**
```markdown
Based on 247 evaluations in the past week:

**Top Performers:**
- GPT-4 leads in accuracy at 95.8% but costs $0.05 per run
- Llama-3.1-8B offers best cost/performance at 93.4% accuracy for $0.002
- Qwen3-MoE is fastest at 1.7s average duration

**Trends:**
- API models dominate accuracy rankings
- GPU models are 10x more cost-effective
- H200 jobs show 2x faster execution vs A10

**Recommendations:**
- For production: Consider Llama-3.1-8B for cost-sensitive workloads
- For maximum accuracy: GPT-4 remains state-of-the-art
- For eco-friendly: Claude-3-Haiku has lowest CO2 emissions
```
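Any MCP client that speaks SSE can call this tool directly. A sketch using the official MCP Python SDK (assuming the SDK's `sse_client` transport works against this Gradio endpoint):
```python
import asyncio

from mcp import ClientSession
from mcp.client.sse import sse_client

URL = "https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/sse"

async def main():
    async with sse_client(URL) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "analyze_leaderboard",
                arguments={"metric_focus": "overall", "time_range": "last_week"},
            )
            print(result.content)

asyncio.run(main())
```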
---
### 2. `estimate_cost`
**Purpose**: Estimate evaluation costs with hardware recommendations
**Input Schema:**
```json
{
  "model": "string",       // Model name (e.g., "openai/gpt-4")
  "agent_type": "string",  // "tool" | "code" | "both"
  "num_tests": "integer",  // Number of test cases (default: 100)
  "hardware": "string"     // "cpu" | "gpu_a10" | "gpu_h200" (optional)
}
```
**What It Does:**
1. Determines if model is API or local
2. Calculates token usage estimates
3. Computes costs (API pricing or GPU time)
4. Estimates duration and CO2 emissions
5. Provides hardware recommendations
**Example Output:**
```markdown
## Cost Estimation: openai/gpt-4 (Tool Agent, 100 tests)

**Hardware**: CPU (API model)

**Cost Breakdown:**
- Total Tokens: ~15,000
- Prompt Tokens: ~5,000 ($0.03)
- Completion Tokens: ~10,000 ($0.06)
- **Total Cost: $0.09**

**Time Estimate:**
- Average per test: 3.2s
- Total duration: ~5.3 minutes

**CO2 Emissions:**
- Estimated: 0.45g CO2e

**Recommendations:**
- ✅ Good choice for accuracy-critical applications
- ⚠️ Consider Llama-3.1-8B for cost savings (10x cheaper)
- 💡 Use caching to reduce repeated API calls
```
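The API-cost arm of this calculation is simple token arithmetic. A sketch of the idea (the per-1K prices here are placeholder assumptions, not the tool's actual pricing tables, so the resulting figure will differ from the example above):
```python
# Placeholder per-1K-token prices (assumptions; the tool uses its own tables)
PRICES = {"openai/gpt-4": (0.03, 0.06)}  # (prompt, completion) USD per 1K tokens

def api_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    p, c = PRICES[model]
    return prompt_tokens / 1000 * p + completion_tokens / 1000 * c

# ~5,000 prompt and ~10,000 completion tokens across a run
print(f"${api_cost('openai/gpt-4', 5000, 10000):.2f}")
```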
---
### 3. `debug_trace`
**Purpose**: Answer questions about agent execution traces
**Input Schema:**
```json
{
  "trace_dataset": "string",    // HF dataset with OTEL traces
  "trace_id": "string",         // Specific trace to analyze
  "question": "string",         // Question about the trace
  "include_metrics": "boolean"  // Include GPU metrics (default: true)
}
```
**What It Does:**
1. Fetches trace data from HuggingFace
2. Parses OpenTelemetry spans
3. Analyzes execution flow
4. Uses Gemini to answer questions
5. Provides span-level details
**Example Output:**
```markdown
## Why was the tool called twice?

Based on trace analysis for `trace_abc123`:

**First Tool Call (span_003)**:
- Time: 14:23:19.000
- Tool: `search_web`
- Input: {"query": "latest AI news"}
- Result: 5 results returned
- Issue: Results were 2 days old

**Second Tool Call (span_005)**:
- Time: 14:23:21.200
- Tool: `search_web`
- Input: {"query": "latest AI news today"}
- Reasoning: LLM determined first results were outdated
- Duration: 1200ms

**Why Twice?**
The agent's reasoning chain shows it initially received outdated results.
The LLM then decided to refine the query with the "today" keyword to get
more recent data.

**Performance Impact:**
- Added 2.09s to total execution
- Cost increase: +$0.0003
- This is normal for agents with iterative reasoning

**Recommendation:**
Consider adding date filters to initial tool calls to avoid retries.
```
---
### 4. `compare_runs`
**Purpose**: Side-by-side comparison of evaluation runs
**Input Schema:**
```json
{
  "leaderboard_repo": "string",  // HF leaderboard dataset
  "run_id_1": "string",          // First run ID
  "run_id_2": "string",          // Second run ID
  "comparison_focus": "string"   // "overall" | "cost" | "accuracy" | "speed"
}
```
**What It Does:**
1. Fetches data for both runs
2. Compares key metrics
3. Identifies strengths/weaknesses
4. Provides recommendations
**Example Output:**
```markdown
## Comparison: GPT-4 vs Llama-3.1-8B

| Metric | GPT-4 | Llama-3.1-8B | Winner |
|--------|-------|--------------|--------|
| Success Rate | 95.8% | 93.4% | GPT-4 (+2.4%) |
| Avg Duration | 3.2s | 2.1s | Llama (34% faster) |
| Cost per Run | $0.05 | $0.002 | Llama (25x cheaper) |
| CO2 Emissions | 0.22g | 0.08g | Llama (64% less) |

**Analysis:**
- GPT-4 has a slight accuracy edge but at a significant cost premium
- Llama-3.1-8B offers an excellent cost/performance ratio
- For 1000 runs: GPT-4 costs $50, Llama costs $2

**Recommendation:**
Use Llama-3.1-8B for production unless 95%+ accuracy is critical.
Consider a hybrid approach: Llama for routine tasks, GPT-4 for complex ones.
```
---
### 5. `analyze_results`
**Purpose**: Deep dive into test case results
**Input Schema:**
```json
{
  "results_repo": "string",  // HF results dataset
  "run_id": "string",        // Run to analyze
  "focus": "string"          // "failures" | "successes" | "all"
}
```
**What It Does:**
1. Loads results dataset
2. Filters by success/failure
3. Identifies patterns
4. Suggests optimizations
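A rough local approximation of the same failure-pattern step, against the Results Dataset schema documented in the SmolTrace tab (repo name hypothetical):
```python
from collections import Counter
from datasets import load_dataset

# Hypothetical results repo; fields follow the Results Dataset schema
results = load_dataset("username/agent-results-gpt4", split="train")
failures = [r for r in results if not r["success"]]

# Which expected tools fail most often?
print(Counter(r["expected_tool"] for r in failures).most_common(5))
```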
</details>
---
## 🌐 Accessing the MCP Server
### Via TraceMind-AI (This App!)
The **Agent Chat** screen uses TraceMind-MCP-Server automatically:
```python
# Happens automatically in the Chat screen
from mcp_client.sync_wrapper import get_sync_mcp_client

mcp = get_sync_mcp_client()
insights = mcp.analyze_leaderboard(
    metric_focus="overall",
    time_range="last_week"
)
```
### Via SSE Endpoint (for smolagents)
```python
from smolagents import InferenceClientModel, MCPClient, ToolCallingAgent

# Connect to the MCP server via SSE
mcp_client = MCPClient(
    {"url": "https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/sse"}
)
try:
    # Create an agent with the server's MCP tools
    agent = ToolCallingAgent(
        tools=mcp_client.get_tools(),
        model=InferenceClientModel()
    )
    agent.run("Analyze the leaderboard and show top 3 models")
finally:
    mcp_client.disconnect()
```
### Via MCP SDK (for other clients)
```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# For local development
server_params = StdioServerParameters(command="python", args=["-m", "mcp_tools"])

async def main():
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Call tools
            result = await session.call_tool(
                "analyze_leaderboard",
                arguments={"metric_focus": "cost"}
            )

asyncio.run(main())
```
---
## 🎯 Use Cases
### 1. Interactive Analysis (Agent Chat)
Ask natural language questions:
- "What are the top 3 models by accuracy?"
- "Compare GPT-4 and Claude-3 on cost"
- "Why is this agent slow?"
### 2. Automated Insights (Leaderboard)
Get AI summaries automatically:
- Weekly trend reports
- Cost optimization recommendations
- Performance alerts
### 3. Debugging (Trace Detail)
Understand agent behavior:
- "Why did the agent fail?"
- "Which tool took the longest?"
- "Why was the same tool called twice?"
### 4. Planning (Cost Estimator)
Before running evaluations:
- "How much will 1000 tests cost?"
- "Should I use A10 or H200?"
- "What's the CO2 impact?"
---
## 🏗️ Architecture
```
┌─────────────────────────────────────────────────────┐
│          TraceMind-MCP-Server (HF Space)            │
├─────────────────────────────────────────────────────┤
│                                                     │
│  ┌─────────────────┐        ┌──────────────────┐    │
│  │   Gradio App    │        │   MCP Protocol   │    │
│  │   (UI + SSE)    │◄──────►│     Handler      │    │
│  └─────────────────┘        └────────┬─────────┘    │
│                                      │              │
│                             ┌────────▼─────────┐    │
│                             │   Tool Router    │    │
│                             └────────┬─────────┘    │
│                                      │              │
│        ┌─────────────────────────────┼─────────┐    │
│        │                             │         │    │
│  ┌─────▼───────┐  ┌─────────────────▼──┐  ┌────▼───┐│
│  │ Leaderboard │  │   Cost Estimator   │  │ Trace  ││
│  │  Analyzer   │  │                    │  │Debugger││
│  └─────┬───────┘  └──────────┬─────────┘  └────┬───┘│
│        │                     │                 │    │
│        └─────────────────────┴─────────────────┘    │
│                              │                      │
│                    ┌─────────▼──────────┐           │
│                    │   Gemini 2.5 Pro   │           │
│                    │  (Analysis Engine) │           │
│                    └────────────────────┘           │
│                                                     │
└─────────────────────────────────────────────────────┘
                     │
                     │  MCP Protocol (SSE)
                     │
                     ▼
          ┌──────────────────────────┐
          │    TraceMind-AI (UI)     │
          │    Agent Chat Screen     │
          └──────────────────────────┘
```
---
## 🔧 Configuration
### Environment Variables
```env
# Google Gemini API (required)
GEMINI_API_KEY=your_api_key_here

# HuggingFace Token (for dataset access)
HF_TOKEN=your_token_here

# Default Leaderboard (optional)
DEFAULT_LEADERBOARD_REPO=kshitijthakkar/smoltrace-leaderboard
```
---
## 📊 Dataset Requirements
MCP tools expect datasets with specific schemas:
### Leaderboard Dataset
```python
{
    "run_id": "string",
    "model": "string",
    "success_rate": "float",
    "total_cost_usd": "float",
    "timestamp": "string",
    # ... other metrics
}
```
### Results Dataset
```python
{
    "run_id": "string",
    "task_id": "string",
    "success": "boolean",
    "trace_id": "string",
    # ... other fields
}
```
### Traces Dataset
```python
{
    "trace_id": "string",
    "spans": [
        {
            "spanId": "string",
            "name": "string",
            "attributes": {},
            # ... OTEL format
        }
    ]
}
```
---
## 🎓 Learning Resources
### MCP Documentation
- [Model Context Protocol Spec](https://modelcontextprotocol.io)
- [MCP Python SDK](https://github.com/modelcontextprotocol/python-sdk)
- [Gradio MCP Integration](https://www.gradio.app/guides/creating-a-custom-chatbot-with-blocks#model-context-protocol-mcp)
### Implementation Examples
- **This Server**: [HF Space Code](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server/tree/main)
- **Client Integration**: [TraceMind-AI mcp_client/](https://github.com/Mandark-droid/TraceMind-AI/tree/main/mcp_client)
---
## 🐛 Troubleshooting
### Common Issues
**Q: MCP tools not appearing?**
```bash
# Verify MCP_SERVER_URL is correct
echo $MCP_SERVER_URL
# Should be: https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/sse
```
**Q: "Failed to load dataset" error?**
```bash
# Check HF token
export HF_TOKEN=your_token_here

# Verify the dataset exists and is readable
python -c "from huggingface_hub import dataset_info; print(dataset_info('kshitijthakkar/smoltrace-leaderboard'))"
```
**Q: Gemini API errors?**
```bash
# Verify the API key (the Gemini REST API expects the x-goog-api-key header, not a Bearer token)
curl -H "x-goog-api-key: $GEMINI_API_KEY" https://generativelanguage.googleapis.com/v1beta/models

# Check rate limits (10 requests/minute on free tier)
```
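Given the free-tier limit, clients calling the server's Gemini-backed tools may want simple backoff on HTTP 429. A sketch using plain `requests` (illustrative, not part of the server):
```python
import time

import requests

URL = "https://generativelanguage.googleapis.com/v1beta/models"

def list_models(api_key: str, retries: int = 3) -> dict:
    # Retry with exponential backoff when rate limited (HTTP 429)
    for attempt in range(retries):
        resp = requests.get(URL, headers={"x-goog-api-key": api_key})
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp.json()
        time.sleep(2 ** attempt)  # back off: 1s, 2s, 4s
    raise RuntimeError("rate limited after retries")
```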
---
## 🔗 Links
- **Live Server**: [HF Space](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server)
- **Source Code**: [GitHub](https://github.com/Mandark-droid/TraceMind-mcp-server)
- **Client (This App)**: [TraceMind-AI](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind)
- **MCP Spec**: [modelcontextprotocol.io](https://modelcontextprotocol.io)
---
## 📄 License
**AGPL-3.0** - Open source and free to use
---
## 🤝 Contributing
Help improve TraceMind-MCP-Server:
- Add new MCP tools
- Improve analysis quality
- Optimize performance
- Add support for more datasets
---
## 🏆 MCP's 1st Birthday Hackathon
**Track 1 Submission: Building MCP (Enterprise)**
TraceMind-MCP-Server demonstrates:
- ✅ Standards-compliant MCP implementation
- ✅ SSE transport for Gradio integration
- ✅ Real-world use case (agent evaluation)
- ✅ Gemini 2.5 Pro integration
- ✅ Production-ready deployment on HF Spaces
**Used by**: TraceMind-AI (Track 2) for autonomous agent chat
---
*TraceMind-MCP-Server - Intelligent analysis, one tool at a time* 🔌
""")


def create_documentation_screen():
    """
    Create the complete documentation screen with tabs

    Returns:
        gr.Column: Gradio Column component for documentation (can be shown/hidden)
    """
    with gr.Column(visible=False) as documentation_interface:
        gr.Markdown("""
# 📚 TraceMind Documentation
Comprehensive documentation for the entire TraceMind ecosystem
""")
        with gr.Tabs():
            with gr.Tab("📖 About"):
                create_about_tab()
            with gr.Tab("🔭 TraceVerde"):
                create_traceverde_tab()
            with gr.Tab("📊 SmolTrace"):
                create_smoltrace_tab()
            with gr.Tab("🔌 TraceMind-MCP-Server"):
                create_mcp_server_tab()
        gr.Markdown("""
---
### 💡 Quick Navigation
- **Getting Started**: Start with the "About" tab for an ecosystem overview
- **Instrumentation**: See "TraceVerde" for adding observability to your agents
- **Evaluation**: Check "SmolTrace" for running evaluations
- **MCP Integration**: Explore "TraceMind-MCP-Server" for intelligent analysis
### 🔗 External Resources
- [GitHub Organization](https://github.com/Mandark-droid)
- [HuggingFace Spaces](https://huggingface.co/MCP-1st-Birthday)
- [MCP Specification](https://modelcontextprotocol.io)
*Built with ❤️ for MCP's 1st Birthday Hackathon*
""")
    return documentation_interface


if __name__ == "__main__":
    # For standalone testing
    with gr.Blocks() as demo:
        doc_screen = create_documentation_screen()
        # Make it visible for standalone testing
        doc_screen.visible = True
    demo.launch()