| """ | |
| Documentation Screen for TraceMind-AI | |
| Comprehensive documentation for the TraceMind ecosystem | |
| """ | |
| import gradio as gr | |
| def create_about_tab(): | |
| """Create the About tab with ecosystem overview""" | |
| return gr.Markdown(""" | |
| # 🧠 TraceMind Ecosystem | |
| <div align="center"> | |
| <img src="https://raw.githubusercontent.com/Mandark-droid/TraceMind-AI/assets/Logo.png" alt="TraceMind Logo" width="300"/> | |
| </div> | |
| <br/> | |
| **The Complete AI Agent Evaluation Platform** | |
| <div align="center" style="display: flex; flex-wrap: wrap; justify-content: center; gap: 5px;"> | |
| <a href="https://github.com/modelcontextprotocol"><img src="https://img.shields.io/badge/MCP%27s%201st%20Birthday-Hackathon-blue" alt="MCP's 1st Birthday Hackathon"></a> | |
| <a href="https://github.com/modelcontextprotocol/hackathon"><img src="https://img.shields.io/badge/Track-MCP%20in%20Action%20(Enterprise)-purple" alt="Track 2"></a> | |
| <a href="https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind"><img src="https://img.shields.io/badge/HuggingFace-TraceMind-yellow?logo=huggingface" alt="HF Space"></a> | |
| <a href="https://gradio.app/"><img src="https://img.shields.io/badge/Powered%20by-Gradio-orange" alt="Powered by Gradio"></a> | |
| </div> | |
| > **🎯 Track 2 Submission**: MCP in Action (Enterprise) | |
| > **📅 MCP's 1st Birthday Hackathon**: November 14-30, 2025 | |
| TraceMind is a comprehensive ecosystem for evaluating, monitoring, and optimizing AI agents. Built on open-source foundations and powered by the Model Context Protocol (MCP), TraceMind provides everything you need for production-grade agent evaluation. | |
| --- | |
| ## 📖 Table of Contents | |
| - [Architecture Overview](#️-architecture-overview) | |
| - [The Complete Flow](#-the-complete-flow) | |
| - [Key Features](#-key-features) | |
| - [Built for MCP's 1st Birthday Hackathon](#-built-for-mcps-1st-birthday-hackathon) | |
| - [Quick Links](#-quick-links) | |
| - [Documentation Navigation](#-documentation-navigation) | |
| - [Getting Started](#-getting-started) | |
| - [Contributing](#-contributing) | |
| - [Acknowledgments](#-acknowledgments) | |
| --- | |
| <details open> | |
| <summary><h2>🏗️ Architecture Overview</h2></summary> | |
| <div align="center"> | |
| <img src="https://raw.githubusercontent.com/Mandark-droid/TraceMind-AI/assets/TraceVerse_Logo.png" alt="TraceVerse Ecosystem" width="500"/> | |
| </div> | |
| <br/> | |
| The TraceMind ecosystem consists of four integrated components: | |
| ``` | |
| ┌─────────────────────────────────────────────────────────────┐ | |
| │ TraceMind Ecosystem │ | |
| ├─────────────────────────────────────────────────────────────┤ | |
| │ │ | |
| │ 1️⃣ TraceVerde (genai_otel_instrument) │ | |
| │ └─ Automatic OpenTelemetry Instrumentation │ | |
| │ └─ Zero-code tracing for LLM frameworks │ | |
| │ │ | |
| │ 2️⃣ SMOLTRACE │ | |
| │ └─ Lightweight Agent Evaluation Engine │ | |
| │ └─ Generates structured datasets │ | |
| │ │ | |
| │ 3️⃣ TraceMind-MCP-Server │ | |
| │ └─ MCP Server (Track 1: Building MCP) │ | |
| │ └─ Provides intelligent analysis tools │ | |
| │ │ | |
| │ 4️⃣ TraceMind-AI (This App!) │ | |
| │ └─ Gradio UI (Track 2: MCP in Action) │ | |
| │ └─ Visualizes data + consumes MCP tools │ | |
| │ │ | |
| └─────────────────────────────────────────────────────────────┘ | |
| ``` | |
| </details> | |
| --- | |
| <details open> | |
| <summary><h2>🔄 The Complete Flow</h2></summary> | |
| ### 1. **Instrument Your Agents** (TraceVerde) | |
| ```python | |
| import genai_otel | |
| # Zero-code instrumentation | |
| genai_otel.instrument() | |
| # Your agent code runs normally, but now traced! | |
| agent.run("What's the weather in Tokyo?") | |
| ``` | |
| ### 2. **Evaluate with SMOLTRACE** | |
| ```bash | |
| # Run comprehensive evaluation | |
| smoltrace-eval \\ | |
| --model openai/gpt-4 \\ | |
| --agent-type both \\ | |
| --enable-otel | |
| ``` | |
| ### 3. **Analyze Results** (This UI) | |
| - View leaderboard rankings | |
| - Compare model performance | |
| - Explore detailed traces | |
| - Ask questions with MCP-powered chat | |
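The same data can be pulled programmatically; a minimal sketch, assuming the public leaderboard repo `kshitijthakkar/smoltrace-leaderboard` referenced later in these docs:

```python
from datasets import load_dataset

# Load the leaderboard dataset that powers this UI
leaderboard = load_dataset("kshitijthakkar/smoltrace-leaderboard", split="train")

# Rank runs by success rate, mirroring the Leaderboard screen
top = sorted(leaderboard, key=lambda r: r["success_rate"], reverse=True)[:5]
for row in top:
    print(row["model"], row["success_rate"], row["total_cost_usd"])
```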
</details>

---

<details open>
<summary><h2>🎯 Key Features</h2></summary>

### For Developers

- ✅ **Zero-code Instrumentation**: Just import and go
- ✅ **Framework Agnostic**: Works with LiteLLM, Transformers, HF Inference, Ollama, etc.
- ✅ **Production Ready**: Lightweight, minimal overhead
- ✅ **Standards Compliant**: Uses OpenTelemetry conventions

### For Researchers

- ✅ **Comprehensive Metrics**: Token usage, costs, latency, GPU utilization
- ✅ **Reproducible Results**: Structured datasets on HuggingFace
- ✅ **Model Comparison**: Side-by-side analysis
- ✅ **Trace Visualization**: Step-by-step agent execution

### For Organizations

- ✅ **Cost Transparency**: Real-time cost tracking and estimation
- ✅ **Sustainability**: CO2 emissions monitoring (TraceVerde)
- ✅ **MCP Integration**: Connect to intelligent analysis tools
- ✅ **HuggingFace Native**: Seamless dataset integration

</details>

---

## 🏆 Built for MCP's 1st Birthday Hackathon

TraceMind demonstrates the complete MCP ecosystem:

**Track 1 (Building MCP)**: [TraceMind-mcp-server](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server)
- Provides MCP tools for leaderboard analysis, cost estimation, trace debugging

**Track 2 (MCP in Action)**: TraceMind-AI (this app!)
- Consumes MCP servers for autonomous agent chat and intelligent insights

---

## 🔗 Quick Links

### 📦 Component Links

| Component | Description | Links |
|-----------|-------------|-------|
| **TraceVerde** | OTEL Instrumentation | [GitHub](https://github.com/Mandark-droid/genai_otel_instrument) • [PyPI](https://pypi.org/project/genai-otel-instrument) |
| **SMOLTRACE** | Evaluation Engine | [GitHub](https://github.com/Mandark-droid/SMOLTRACE) • [PyPI](https://pypi.org/project/smoltrace/) |
| **MCP Server** | Building MCP (Track 1) | [HF Space](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server) |
| **TraceMind-AI** | MCP in Action (Track 2) | [HF Space](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind) |

### 📢 Community Posts

- 🎉 [**TraceMind Teaser**](https://www.linkedin.com/posts/kshitij-thakkar-2061b924_mcpsfirstbirthdayhackathon-mcpsfirstbirthdayhackathon-activity-7395686529270013952-g_id) - MCP's 1st Birthday Hackathon announcement
- 📊 [**SMOLTRACE Launch**](https://www.linkedin.com/posts/kshitij-thakkar-2061b924_ai-machinelearning-llm-activity-7394350375908126720-im_T) - Lightweight agent evaluation engine
- 🔭 [**TraceVerde Launch**](https://www.linkedin.com/posts/kshitij-thakkar-2061b924_genai-opentelemetry-observability-activity-7390339855135813632-wqEg) - Zero-code OTEL instrumentation for LLMs
- 🙏 [**TraceVerde 3K Downloads**](https://www.linkedin.com/posts/kshitij-thakkar-2061b924_thank-you-open-source-community-a-week-activity-7392205780592132096-nu6U) - Thank you to the community!

---

## 📚 Documentation Navigation

Use the tabs above to explore detailed documentation for each component:

- **About**: This overview (you are here)
- **TraceVerde**: OpenTelemetry instrumentation for LLMs
- **SmolTrace**: Agent evaluation engine
- **TraceMind-MCP-Server**: MCP server implementation details

---

<details open>
<summary><h2>💡 Getting Started</h2></summary>

### Quick Start (5 minutes)

```bash
# 1. Install TraceVerde for instrumentation
pip install genai-otel-instrument

# 2. Install SMOLTRACE for evaluation
pip install smoltrace

# 3. Run your first evaluation
smoltrace-eval --model openai/gpt-4 --agent-type tool

# 4. View results in TraceMind-AI (this UI!)
```

### Learn More

- Read component-specific docs in the tabs above
- Try the **Agent Chat** for interactive queries
- Explore the **Leaderboard** to see real evaluation data
- Check the **Trace Detail** screen for deep inspection

</details>

---

## 🤝 Contributing

All components are open source under AGPL-3.0:

- Report issues on GitHub
- Submit pull requests
- Share your evaluation results
- Join the community discussions

---

## 👏 Acknowledgments

Built with ❤️ for **MCP's 1st Birthday Hackathon** by **Kshitij Thakkar**

Special thanks to:

- **Anthropic** - For the Model Context Protocol
- **Gradio Team** - For Gradio 6 with MCP integration
- **HuggingFace** - For Spaces and dataset infrastructure
- **Google** - For Gemini API access
- **OpenTelemetry** - For standardized observability
- **[Eliseu Silva](https://huggingface.co/elismasilva)** - For the [gradio_htmlplus](https://huggingface.co/spaces/elismasilva/gradio_htmlplus) custom component that powers our interactive leaderboard table. Eliseu's timely help and collaboration during the hackathon were invaluable!
---

*Last Updated: November 2025*
""")


def create_traceverde_tab():
    """Create the TraceVerde documentation tab"""
    return gr.Markdown("""
# 🔭 TraceVerde (genai_otel_instrument)

<div align="center">
<img src="https://raw.githubusercontent.com/Mandark-droid/genai_otel_instrument/main/.github/images/Logo.jpg" alt="TraceVerde Logo" width="400"/>
</div>
<br/>

<div align="center" style="display: flex; flex-wrap: wrap; justify-content: center; gap: 5px;">
<a href="https://badge.fury.io/py/genai-otel-instrument"><img src="https://badge.fury.io/py/genai-otel-instrument.svg" alt="PyPI version"></a>
<a href="https://pypi.org/project/genai-otel-instrument/"><img src="https://img.shields.io/pypi/pyversions/genai-otel-instrument.svg" alt="Python Versions"></a>
<a href="https://www.gnu.org/licenses/agpl-3.0"><img src="https://img.shields.io/badge/License-AGPL%203.0-blue.svg" alt="License"></a>
<a href="https://pepy.tech/project/genai-otel-instrument"><img src="https://static.pepy.tech/badge/genai-otel-instrument" alt="Downloads"></a>
<a href="https://pepy.tech/project/genai-otel-instrument"><img src="https://static.pepy.tech/badge/genai-otel-instrument/month" alt="Downloads/Month"></a>
</div>

<div align="center" style="display: flex; flex-wrap: wrap; justify-content: center; gap: 5px;">
<a href="https://github.com/Mandark-droid/genai_otel_instrument"><img src="https://img.shields.io/github/stars/Mandark-droid/genai_otel_instrument?style=social" alt="GitHub Stars"></a>
<a href="https://github.com/Mandark-droid/genai_otel_instrument"><img src="https://img.shields.io/github/forks/Mandark-droid/genai_otel_instrument?style=social" alt="GitHub Forks"></a>
<a href="https://github.com/Mandark-droid/genai_otel_instrument/issues"><img src="https://img.shields.io/github/issues/Mandark-droid/genai_otel_instrument" alt="GitHub Issues"></a>
</div>

<div align="center" style="display: flex; flex-wrap: wrap; justify-content: center; gap: 5px;">
<a href="https://opentelemetry.io/"><img src="https://img.shields.io/badge/OpenTelemetry-1.20%2B-blueviolet" alt="OpenTelemetry"></a>
<a href="https://opentelemetry.io/docs/specs/semconv/gen-ai/"><img src="https://img.shields.io/badge/OTel%20Semconv-GenAI%20v1.28-orange" alt="Semantic Conventions"></a>
<a href="https://github.com/psf/black"><img src="https://img.shields.io/badge/code%20style-black-000000.svg" alt="Code Style: Black"></a>
</div>

**Automatic OpenTelemetry Instrumentation for LLM Applications**

---

## 📖 Table of Contents

- [What is TraceVerde?](#what-is-traceverde)
- [Installation](#-installation)
- [Quick Start](#-quick-start)
- [Supported Frameworks](#-supported-frameworks)
- [What Gets Captured?](#-what-gets-captured)
- [CO2 Emissions Tracking](#-co2-emissions-tracking)
- [Advanced Configuration](#-advanced-configuration)
- [Integration with SMOLTRACE](#-integration-with-smoltrace)
- [Use Cases](#-use-cases)
- [OpenTelemetry Standards](#-opentelemetry-standards)
- [Resources](#-resources)
- [Troubleshooting](#-troubleshooting)
- [License](#-license)
- [Contributing](#-contributing)

---

## What is TraceVerde?

TraceVerde is a **zero-code** OpenTelemetry instrumentation library for GenAI applications. It automatically captures:

- 🔹 Every LLM call (token usage, cost, latency)
- 🔹 Tool executions and results
- 🔹 Agent reasoning steps
- 🔹 GPU metrics (utilization, memory, temperature)
- 🔹 CO2 emissions (via CodeCarbon integration)

All with **one import statement** - no code changes required!

---

## 📦 Installation

```bash
pip install genai-otel-instrument

# With GPU metrics support
pip install genai-otel-instrument[gpu]

# With CO2 emissions tracking
pip install genai-otel-instrument[carbon]

# All features
pip install genai-otel-instrument[all]
```

---

<details open>
<summary><h2>🚀 Quick Start</h2></summary>

### Basic Usage

**Option 1: Environment Variables (No code changes)**

```bash
export OTEL_SERVICE_NAME=my-llm-app
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
python your_app.py
```

**Option 2: One line of code**

```python
import genai_otel

genai_otel.instrument()

# Your existing code works unchanged
import openai

client = openai.OpenAI()
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}]
)
# Traces are automatically captured and exported!
```

**Option 3: With OpenTelemetry Setup**

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# 1. Setup OpenTelemetry (one-time setup)
trace.set_tracer_provider(TracerProvider())
span_processor = SimpleSpanProcessor(ConsoleSpanExporter())
trace.get_tracer_provider().add_span_processor(span_processor)

# 2. Instrument all LLM frameworks (one line!)
import genai_otel
genai_otel.instrument()

# 3. Use your LLM framework normally - it's now traced!
from litellm import completion

response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}]
)
# Traces are automatically captured and exported!
```

</details>

---

<details>
<summary><h2>🎯 Supported Frameworks</h2></summary>

TraceVerde automatically instruments **30+ LLM frameworks and providers**:

### 🔥 Popular Frameworks

| Framework | Status | Description |
|-----------|--------|-------------|
| **LiteLLM** | ✅ Full Support | Universal LLM gateway (100+ models) |
| **Transformers** | ✅ Full Support | HuggingFace models |
| **LangChain** | ✅ Full Support | LLM application framework |
| **LangGraph** | ✅ Full Support | LangChain graph-based workflows |
| **CrewAI** | ✅ Full Support | Multi-agent orchestration |
| **smolagents** | ✅ Full Support | HuggingFace agent framework |
| **LlamaIndex** | ✅ Full Support | Data framework for LLMs |

### 🏢 LLM Providers

| Provider | Status | Description |
|----------|--------|-------------|
| **OpenAI** | ✅ Full Support | GPT-4, GPT-3.5, etc. |
| **Anthropic** | ✅ Full Support | Claude models |
| **Google AI** | ✅ Full Support | Gemini models |
| **Cohere** | ✅ Full Support | Command models |
| **Mistral AI** | ✅ Full Support | Mistral models |
| **Groq** | ✅ Full Support | Fast LLM inference |
| **Together AI** | ✅ Full Support | Open source models |
| **Anyscale** | ✅ Full Support | Ray-based LLM serving |
| **Replicate** | ✅ Full Support | Open source model API |
| **SambaNova** | ✅ Full Support | Enterprise AI platform |
| **Hyperbolic** | ✅ Full Support | Decentralized AI |

### ☁️ Cloud AI Services

| Service | Status | Description |
|---------|--------|-------------|
| **Azure OpenAI** | ✅ Full Support | Azure-hosted OpenAI |
| **AWS Bedrock** | ✅ Full Support | Amazon LLM service |
| **Bedrock Agents** | ✅ Full Support | AWS agent framework |
| **Vertex AI** | ✅ Full Support | Google Cloud AI |

### 🤖 Agent & Workflow Frameworks

| Framework | Status | Description |
|-----------|--------|-------------|
| **AutoGen** | ✅ Full Support | Microsoft agent framework |
| **OpenAI Agents** | ✅ Full Support | OpenAI assistants API |
| **Pydantic AI** | ✅ Full Support | Type-safe agent framework |
| **DSPy** | ✅ Full Support | Programming framework for LMs |
| **Haystack** | ✅ Full Support | NLP framework |
| **Guardrails AI** | ✅ Full Support | LLM validation framework |
| **Instructor** | ✅ Full Support | Structured LLM outputs |

### 🖥️ Local & Self-Hosted

| Provider | Status | Description |
|----------|--------|-------------|
| **Ollama** | ✅ Full Support | Local LLM runtime |

**No code changes needed** - just import and use as normal!

</details>

---

<details>
<summary><h2>📊 What Gets Captured?</h2></summary>

### LLM Spans

Every LLM call creates a span with:

```json
{
  "span_name": "LLM Call - Reasoning",
  "attributes": {
    "gen_ai.system": "openai",
    "gen_ai.request.model": "gpt-4",
    "gen_ai.operation.name": "chat",
    "gen_ai.usage.prompt_tokens": 78,
    "gen_ai.usage.completion_tokens": 156,
    "gen_ai.usage.total_tokens": 234,
    "gen_ai.usage.cost.total": 0.0012,
    "gen_ai.response.finish_reasons": ["stop"],
    "gen_ai.request.temperature": 0.7
  }
}
```

### Tool Spans

Tool executions are traced with:

```json
{
  "span_name": "Tool Call - get_weather",
  "attributes": {
    "tool.name": "get_weather",
    "tool.input": "{\\"location\\": \\"Tokyo\\"}",
    "tool.output": "{\\"temp\\": \\"18°C\\"}",
    "tool.latency_ms": 890
  }
}
```

### GPU Metrics

When enabled, captures real-time GPU data:

```json
{
  "metrics": [
    {
      "name": "gen_ai.gpu.utilization",
      "value": 67.5,
      "unit": "%",
      "timestamp": "2025-11-18T14:23:00Z"
    },
    {
      "name": "gen_ai.gpu.memory.used",
      "value": 512.34,
      "unit": "MiB"
    }
  ]
}
```

</details>

---

## 🌱 CO2 Emissions Tracking

TraceVerde integrates with CodeCarbon for sustainability monitoring:

```python
import genai_otel

# Enable CO2 tracking
genai_otel.instrument(enable_carbon_tracking=True)

# Your LLM calls now track carbon emissions!
```

**Captured Metrics:**

- 🌍 CO2 emissions (grams)
- ⚡ Energy consumed (kWh)
- 📍 Geographic region
- 💻 Hardware type (CPU/GPU)
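These measurements also surface in the SMOLTRACE metrics dataset as `gen_ai.co2.emissions` entries (see the SmolTrace tab); a minimal sketch of aggregating them, with a placeholder repo name:

```python
from datasets import load_dataset

# Placeholder repo; use the metrics dataset produced by your own runs
metrics = load_dataset("username/agent-metrics-gpt4", split="train")

total_co2_g = sum(
    m["value"]
    for row in metrics
    for m in row["metrics"]
    if m["name"] == "gen_ai.co2.emissions"
)
print(f"Total emissions: {total_co2_g:.2f} gCO2e")
```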
---

## 🔧 Advanced Configuration

### Custom Exporters
```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Export to Jaeger/Tempo/etc. via an OTLP collector endpoint
trace.set_tracer_provider(TracerProvider())
otlp_exporter = OTLPSpanExporter(endpoint="http://localhost:4317")
span_processor = BatchSpanProcessor(otlp_exporter)
trace.get_tracer_provider().add_span_processor(span_processor)

import genai_otel
genai_otel.instrument()
```
### GPU Metrics

```python
# Enable GPU monitoring (requires pynvml)
import genai_otel

genai_otel.instrument(
    enable_gpu_metrics=True,
    gpu_poll_interval=1.0  # seconds
)
```

---

## 📈 Integration with SMOLTRACE

TraceVerde powers SMOLTRACE's evaluation capabilities:

```python
# SMOLTRACE automatically uses TraceVerde for instrumentation
from smoltrace import evaluate_agent

results = evaluate_agent(
    model="gpt-4",
    agent_type="tool",
    enable_otel=True  # Uses TraceVerde under the hood!
)
```

---

## 🎯 Use Cases

### 1. Development & Debugging

```python
# See exactly what your agent is doing
import genai_otel
genai_otel.instrument()

# Run your agent
agent.run("Complex task")

# View traces in console or Jaeger
```

### 2. Production Monitoring

```python
# Export to your observability platform
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

otlp_exporter = OTLPSpanExporter(endpoint="https://your-otel-collector")
# ... setup processor ...

import genai_otel
genai_otel.instrument()
```

### 3. Cost Analysis

```python
# Track costs across all LLM calls
import genai_otel
genai_otel.instrument()

# Analyze cost per user/session/feature
# All costs automatically captured in span attributes
```

### 4. Sustainability Reporting

```python
# Monitor environmental impact
import genai_otel

genai_otel.instrument(
    enable_carbon_tracking=True,
    enable_gpu_metrics=True
)

# Generate CO2 reports from trace data
```

---

## 📐 OpenTelemetry Standards

TraceVerde follows the **Gen AI Semantic Conventions**:

- ✅ Consistent attribute naming (`gen_ai.*`)
- ✅ Standard span structure
- ✅ Compatible with all OTEL collectors
- ✅ Works with Jaeger, Tempo, Datadog, New Relic, etc.

---

## 🔗 Resources

- **GitHub**: [github.com/Mandark-droid/genai_otel_instrument](https://github.com/Mandark-droid/genai_otel_instrument)
- **PyPI**: [pypi.org/project/genai-otel-instrument](https://pypi.org/project/genai-otel-instrument)
- **Examples**: [github.com/Mandark-droid/genai_otel_instrument/examples](https://github.com/Mandark-droid/genai_otel_instrument/tree/main/examples)
- **OpenTelemetry Docs**: [opentelemetry.io](https://opentelemetry.io)

---

## 🐛 Troubleshooting

### Common Issues
**Q: Traces not appearing?**

```python
# Make sure you set up a tracer provider with an exporter first
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
```
**Q: GPU metrics not working?**

```bash
# Install GPU support
pip install genai-otel-instrument[gpu]

# Verify NVIDIA drivers installed
nvidia-smi
```
**Q: How do I configure other options?**
```python
# Use environment variables or pass options to instrument()
import genai_otel
genai_otel.instrument(enable_gpu_metrics=True)
```

---

## 📄 License

**AGPL-3.0** - Open source and free to use

---

## 🤝 Contributing

Contributions welcome!

- Report bugs on GitHub Issues
- Submit PRs for new framework support
- Share your use cases

---

*TraceVerde - Making AI agents observable, one trace at a time* 🔭
""")


def create_smoltrace_tab():
    """Create the SMOLTRACE documentation tab"""
    return gr.Markdown("""
# 📊 SMOLTRACE

<div align="center">
<img src="https://raw.githubusercontent.com/Mandark-droid/SMOLTRACE/main/.github/images/Logo.png" alt="SMOLTRACE Logo" width="400"/>
</div>
<br/>

**Lightweight Agent Evaluation Engine with Built-in OpenTelemetry Tracing**

<div align="center" style="display: flex; flex-wrap: wrap; justify-content: center; gap: 5px;">
<a href="https://www.python.org/downloads/"><img src="https://img.shields.io/badge/Python-3.10%2B-blue" alt="Python"></a>
<a href="https://github.com/Mandark-droid/SMOLTRACE/blob/main/LICENSE"><img src="https://img.shields.io/badge/License-AGPL--3.0-blue.svg" alt="License"></a>
<a href="https://badge.fury.io/py/smoltrace"><img src="https://badge.fury.io/py/smoltrace.svg" alt="PyPI version"></a>
<a href="https://pepy.tech/project/smoltrace"><img src="https://static.pepy.tech/badge/smoltrace" alt="Downloads"></a>
<a href="https://pepy.tech/project/smoltrace"><img src="https://static.pepy.tech/badge/smoltrace/month" alt="Downloads/Month"></a>
</div>

<div align="center" style="display: flex; flex-wrap: wrap; justify-content: center; gap: 5px;">
<a href="https://github.com/psf/black"><img src="https://img.shields.io/badge/code%20style-black-000000.svg" alt="Code style: black"></a>
<a href="https://pycqa.github.io/isort/"><img src="https://img.shields.io/badge/%20imports-isort-%231674b1?style=flat&labelColor=ef8336" alt="Imports: isort"></a>
<a href="https://github.com/Mandark-droid/SMOLTRACE/actions?query=workflow%3Atest"><img src="https://img.shields.io/github/actions/workflow/status/Mandark-droid/SMOLTRACE/test.yml?branch=main&label=tests" alt="Tests"></a>
<a href="https://huggingface.co/docs/smoltrace/en/index"><img src="https://img.shields.io/badge/docs-stable-blue.svg" alt="Docs"></a>
</div>

---

## 📖 Table of Contents

- [What is SMOLTRACE?](#what-is-smoltrace)
- [Installation](#-installation)
- [Quick Start](#-quick-start)
- [Evaluation Types](#-evaluation-types)
- [What Gets Generated?](#-what-gets-generated)
- [Configuration Options](#-configuration-options)
- [Integration with HuggingFace Jobs](#️-integration-with-huggingface-jobs)
- [Integration with TraceMind-AI](#-integration-with-tracemind-ai)
- [Best Practices](#-best-practices)
- [Cost Estimation](#-cost-estimation)
- [Architecture](#-architecture)
- [Resources](#-resources)
- [Troubleshooting](#-troubleshooting)
- [License](#-license)
- [Contributing](#-contributing)

---

## What is SMOLTRACE?

SMOLTRACE is a **production-ready** evaluation framework for AI agents that:

- ✅ Evaluates agents across tool usage, code execution, and both
- ✅ Supports both API models (via LiteLLM) and local models (via Transformers)
- ✅ Automatically captures OpenTelemetry traces using TraceVerde
- ✅ Generates structured datasets for HuggingFace
- ✅ Tracks costs, GPU metrics, and CO2 emissions

**Goal**: Become HuggingFace's standard agent evaluation platform

---

## 📦 Installation

```bash
# Basic installation
pip install smoltrace

# With OpenTelemetry support
pip install smoltrace[otel]

# With GPU metrics
pip install smoltrace[otel,gpu]

# Everything
pip install smoltrace[all]
```

---

<details open>
<summary><h2>🚀 Quick Start</h2></summary>

### Command Line

```bash
# Evaluate GPT-4 as a tool agent
smoltrace-eval \\
  --model openai/gpt-4 \\
  --provider litellm \\
  --agent-type tool \\
  --enable-otel

# Evaluate local Llama model
smoltrace-eval \\
  --model meta-llama/Llama-3.1-8B \\
  --provider transformers \\
  --agent-type both \\
  --enable-otel \\
  --enable-gpu-metrics
```

### Python API

```python
from smoltrace import evaluate_agent

# Run evaluation
results = evaluate_agent(
    model="openai/gpt-4",
    provider="litellm",
    agent_type="tool",
    enable_otel=True,
    num_tests=100
)

# Access results
print(f"Success Rate: {results.success_rate}%")
print(f"Total Cost: ${results.total_cost}")
print(f"Avg Duration: {results.avg_duration_ms}ms")

# Upload to HuggingFace
results.upload_to_hf(
    results_repo="username/agent-results-gpt4",
    traces_repo="username/agent-traces-gpt4",
    leaderboard_repo="username/agent-leaderboard"
)
```

</details>

---

## 🎯 Evaluation Types

### 1. Tool Agent

Tests ability to use external tools:

```bash
smoltrace-eval --model gpt-4 --agent-type tool
```

**Example Task**: "What's the weather in Tokyo?"

- Agent must call `get_weather` tool
- Verify correct tool selection
- Check response quality

### 2. Code Agent

Tests code generation and execution:

```bash
smoltrace-eval --model gpt-4 --agent-type code
```

**Example Task**: "Calculate the sum of first 10 prime numbers"

- Agent must generate Python code
- Execute code safely
- Return correct result

### 3. Both (Combined)

Tests comprehensive agent capabilities:

```bash
smoltrace-eval --model gpt-4 --agent-type both
```

**Tests both tool usage AND code generation**

---

<details>
<summary><h2>📊 What Gets Generated?</h2></summary>

SMOLTRACE creates **4 structured datasets** on HuggingFace:

### 1. Leaderboard Dataset

Aggregate statistics for all evaluation runs:

```python
{
    "run_id": "uuid",
    "model": "openai/gpt-4",
    "agent_type": "tool",
    "provider": "litellm",

    # Performance
    "success_rate": 95.8,
    "total_tests": 100,
    "avg_duration_ms": 3200.0,

    # Cost & Resources
    "total_tokens": 15000,
    "total_cost_usd": 0.05,
    "co2_emissions_g": 0.22,
    "gpu_utilization_avg": 67.5,

    # Dataset References
    "results_dataset": "username/agent-results-gpt4",
    "traces_dataset": "username/agent-traces-gpt4",
    "metrics_dataset": "username/agent-metrics-gpt4",

    # Metadata
    "timestamp": "2025-11-18T14:23:00Z",
    "submitted_by": "username"
}
```

### 2. Results Dataset

Individual test case results:

```python
{
    "run_id": "uuid",
    "task_id": "task_001",
    "test_index": 0,

    # Test Case
    "prompt": "What's the weather in Tokyo?",
    "expected_tool": "get_weather",

    # Result
    "success": true,
    "response": "The weather in Tokyo is 18°C and clear.",
    "tool_called": "get_weather",

    # Metrics
    "execution_time_ms": 2450.0,
    "total_tokens": 234,
    "cost_usd": 0.0012,

    # Trace Reference
    "trace_id": "trace_abc123"
}
```

### 3. Traces Dataset

Full OpenTelemetry traces:

```python
{
    "trace_id": "trace_abc123",
    "run_id": "uuid",
    "spans": [
        {
            "spanId": "span_001",
            "name": "Agent Execution",
            "startTime": "2025-11-18T14:23:01.000Z",
            "endTime": "2025-11-18T14:23:03.450Z",
            "attributes": {
                "agent.type": "tool",
                "gen_ai.system": "openai",
                "gen_ai.request.model": "gpt-4"
            }
        },
        # ... more spans ...
    ]
}
```

### 4. Metrics Dataset

GPU metrics and performance data:

```python
{
    "run_id": "uuid",
    "trace_id": "trace_abc123",
    "metrics": [
        {
            "name": "gen_ai.gpu.utilization",
            "value": 67.5,
            "unit": "%",
            "timestamp": "2025-11-18T14:23:01.000Z"
        },
        {
            "name": "gen_ai.co2.emissions",
            "value": 0.22,
            "unit": "gCO2e"
        }
    ]
}
```

</details>

---

## 🔧 Configuration Options

### Model Selection

```bash
# API Models (via LiteLLM)
--model openai/gpt-4
--model anthropic/claude-3-5-sonnet
--model google/gemini-pro

# Local Models (via Transformers)
--model meta-llama/Llama-3.1-8B
--model mistralai/Mistral-7B-v0.1
```

### Provider Selection

```bash
--provider litellm        # For API models
--provider transformers   # For local models
```

### Hardware Selection
Hardware is selected in the HuggingFace Jobs configuration (the `hardware:` field in `job.yaml`), not via CLI flags; a minimal fragment is sketched after the list below.
SMOLTRACE automatically detects available resources:

- API models (via litellm) → Uses CPU
- Local models (via transformers) → Uses available GPU if present
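A minimal sketch of the relevant `job.yaml` fragment (hardware names follow the examples later on this page; the full job file appears under Integration with HuggingFace Jobs):

```yaml
# job.yaml (fragment) - pick one hardware line
hardware: cpu-basic   # API models via LiteLLM
# hardware: gpu-a10   # 7B-13B local models
# hardware: gpu-h200  # 70B+ local models
```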
### OpenTelemetry Options

```bash
--enable-otel             # Enable tracing
--enable-gpu-metrics      # Capture GPU data
--enable-carbon-tracking  # Track CO2 emissions
```

---

## 🏗️ Integration with HuggingFace Jobs

SMOLTRACE works seamlessly with HuggingFace Jobs for running evaluations on cloud infrastructure.

### ⚠️ Requirements to Submit Jobs
**IMPORTANT**: To submit jobs via the TraceMind UI or the HF CLI, you must:
1. **🔑 HuggingFace Pro Account**
   - You must be a HuggingFace Pro user
   - **Credit card required** to pay for compute usage
   - Sign up at: https://huggingface.co/pricing

2. **🎫 HuggingFace Token Permissions**
   - Your HF token needs **Read + Write** permissions
   - Token must have **"Run Jobs"** permission enabled
   - Create/update token at: https://huggingface.co/settings/tokens
   - ⚠️ Read-only tokens will **NOT** work for job submission

3. **💳 Billing**
   - You will be charged for compute usage
   - Pricing: https://huggingface.co/pricing#spaces-pricing
   - Monitor usage at: https://huggingface.co/settings/billing

### Example Job Configuration

```yaml
# job.yaml
name: SMOLTRACE Evaluation
hardware: gpu-a10  # Use gpu-h200 for 70B+ models
environment:
  MODEL: meta-llama/Llama-3.1-8B
  HF_TOKEN: ${{ secrets.HF_TOKEN }}
command: |
  pip install smoltrace[otel,gpu]
  smoltrace-eval \\
    --model $MODEL \\
    --provider transformers \\
    --agent-type both \\
    --enable-otel \\
    --enable-gpu-metrics \\
    --results-repo ${{ username }}/agent-results \\
    --leaderboard-repo huggingface/smolagents-leaderboard
```

### Hardware Selection

- 🔧 **cpu-basic**: API models (OpenAI, Anthropic via LiteLLM) - ~$0.05/hr
- 🎮 **t4-small**: Small models (4B-8B) - ~$0.60/hr
- 🔧 **a10g-small**: Medium models (7B-13B) - ~$1.10/hr
- 🚀 **a100-large**: Large models (70B+) - ~$3.00/hr

**Pricing**: See https://huggingface.co/pricing#spaces-pricing

### Benefits

- 📊 **Automatic Upload**: Results → HuggingFace datasets
- 🔄 **Reproducible**: Same environment every time
- ⚡ **Optimized Compute**: Right hardware for your model size
- 💰 **Pay-per-use**: Only pay for actual compute time

---

## 📈 Integration with TraceMind-AI

SMOLTRACE datasets power the TraceMind-AI interface:

```
 SMOLTRACE Evaluation
          ↓
  4 Datasets Created
          ↓
 ┌────────┴────────┐
 │                 │
 │  TraceMind-AI   │  ← You are here!
 │   (Gradio UI)   │
 │                 │
 └─────────────────┘
```

**What TraceMind-AI Shows:**

- 📊 **Leaderboard**: All evaluation runs
- 🔍 **Run Detail**: Individual test cases
- 🕵️ **Trace Detail**: OpenTelemetry visualization
- 🤖 **Agent Chat**: MCP-powered analysis

---

## 🎯 Best Practices

### 1. Start Small

```bash
# Test with 10 runs first
smoltrace-eval --model gpt-4 --num-tests 10

# Scale up after validation
smoltrace-eval --model gpt-4 --num-tests 1000
```

### 2. Choose Appropriate Hardware in HF Jobs

Hardware selection happens in your HuggingFace Jobs configuration:

```yaml
# For API models (OpenAI, Anthropic, etc.)
hardware: cpu-basic

# For 7B-13B local models
hardware: gpu-a10

# For 70B+ local models
hardware: gpu-h200
```

### 3. Enable Full Observability

```bash
# Capture everything
smoltrace-eval \\
  --model your-model \\
  --enable-otel \\
  --enable-gpu-metrics \\
  --enable-carbon-tracking
```

### 4. Organize Your Datasets

```bash
# Use descriptive repo names
--results-repo username/results-gpt4-tool-20251118
--traces-repo username/traces-gpt4-tool-20251118
--leaderboard-repo username/agent-leaderboard
```

---

## 🔍 Cost Estimation

Before running evaluations, estimate costs:

```python
from smoltrace import estimate_cost

# API model
api_cost = estimate_cost(
    model="openai/gpt-4",
    num_tests=1000,
    agent_type="tool"
)
print(f"Estimated cost: ${api_cost.total_cost}")

# GPU job
gpu_cost = estimate_cost(
    model="meta-llama/Llama-3.1-8B",
    num_tests=1000,
    hardware="gpu_h200"
)
print(f"Estimated cost: ${gpu_cost.total_cost}")
print(f"Estimated time: {gpu_cost.duration_minutes} minutes")
```

---

## 📐 Architecture

```
┌─────────────────────────────────────────┐
│             SMOLTRACE Core              │
├─────────────────────────────────────────┤
│                                         │
│  ┌──────────────┐    ┌──────────────┐   │
│  │   LiteLLM    │    │ Transformers │   │
│  │   Provider   │    │   Provider   │   │
│  └──────┬───────┘    └──────┬───────┘   │
│         │                   │           │
│         └─────────┬─────────┘           │
│                   ↓                     │
│            ┌──────────────┐             │
│            │  TraceVerde  │             │
│            │    (OTEL)    │             │
│            └──────┬───────┘             │
│                   ↓                     │
│            ┌──────────────┐             │
│            │   Dataset    │             │
│            │  Generator   │             │
│            └──────┬───────┘             │
│                   ↓                     │
│       ┌───────────────────────┐         │
│       │  HuggingFace Upload   │         │
│       └───────────────────────┘         │
│                                         │
└─────────────────────────────────────────┘
```

---

## 🔗 Resources

- **GitHub**: [github.com/Mandark-droid/SMOLTRACE](https://github.com/Mandark-droid/SMOLTRACE)
- **PyPI**: [pypi.org/project/smoltrace](https://pypi.org/project/smoltrace/)
- **Documentation**: [SMOLTRACE README](https://github.com/Mandark-droid/SMOLTRACE#readme)

---

## 🐛 Troubleshooting

### Common Issues
**Q: Evaluation running slowly?**

For local models, select faster GPU hardware in your HF Jobs configuration (e.g. `hardware: gpu-h200`), or reduce the test count:

```bash
--num-tests 10
```
**Q: Traces not captured?**

```bash
# Make sure OTEL is enabled
--enable-otel
```
**Q: Upload to HF failing?**

```bash
# Check that your HF token is set and valid
export HF_TOKEN=your_token_here
huggingface-cli whoami

# Verify the repo exists or allow auto-create
```
---

## 📄 License

**AGPL-3.0** - Open source and free to use

---

## 🤝 Contributing

We welcome contributions!

- Add new agent types
- Support more frameworks
- Improve evaluation metrics
- Optimize performance

---

*SMOLTRACE - Lightweight evaluation for heavyweight results* 📊
""")


def create_mcp_server_tab():
    """Create the TraceMind-MCP-Server documentation tab"""
    return gr.Markdown("""
# 🔌 TraceMind-MCP-Server

<div align="center">
<img src="https://raw.githubusercontent.com/Mandark-droid/TraceMind-mcp-server/assets/Logo.png" alt="TraceMind MCP Server Logo" width="300"/>
</div>
<br/>

**Building MCP: Intelligent Analysis Tools for Agent Evaluation**

<div align="center" style="display: flex; flex-wrap: wrap; justify-content: center; gap: 5px;">
<a href="https://github.com/modelcontextprotocol"><img src="https://img.shields.io/badge/MCP%27s%201st%20Birthday-Hackathon-blue" alt="MCP's 1st Birthday Hackathon"></a>
<a href="https://github.com/modelcontextprotocol/hackathon"><img src="https://img.shields.io/badge/Track-Building%20MCP%20(Enterprise)-blue" alt="Track 1"></a>
<a href="https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server"><img src="https://img.shields.io/badge/HuggingFace-TraceMind--MCP--Server-yellow?logo=huggingface" alt="HF Space"></a>
<a href="https://ai.google.dev/"><img src="https://img.shields.io/badge/Powered%20by-Google%20Gemini%202.5%20Pro-orange" alt="Google Gemini"></a>
</div>

> **🎯 Track 1 Submission**: Building MCP (Enterprise)
> **📅 MCP's 1st Birthday Hackathon**: November 14-30, 2025

---

## 📖 Table of Contents

- [What is TraceMind-MCP-Server?](#what-is-tracemind-mcp-server)
- [MCP Tools Provided](#️-mcp-tools-provided)
  - [analyze_leaderboard](#1-analyze_leaderboard)
  - [estimate_cost](#2-estimate_cost)
  - [debug_trace](#3-debug_trace)
  - [compare_runs](#4-compare_runs)
  - [analyze_results](#5-analyze_results)
- [Accessing the MCP Server](#-accessing-the-mcp-server)
- [Use Cases](#-use-cases)
- [Architecture](#️-architecture)
- [Configuration](#-configuration)
- [Dataset Requirements](#-dataset-requirements)
- [Learning Resources](#-learning-resources)
- [Troubleshooting](#-troubleshooting)
- [Links](#-links)
- [License](#-license)
- [Contributing](#-contributing)
- [MCP's 1st Birthday Hackathon](#-mcps-1st-birthday-hackathon)

---

## What is TraceMind-MCP-Server?

TraceMind-MCP-Server is a **Track 1 (Building MCP)** submission that provides MCP tools for intelligent agent evaluation analysis.

**Key Features:**

- 🤖 Powered by Google Gemini 2.5 Pro
- 🔌 Standards-compliant MCP implementation
- 📊 Analyzes HuggingFace evaluation datasets
- 💡 Provides actionable insights and recommendations
- 🌐 Accessible via SSE transport for Gradio integration

---

<details>
<summary><h2>🛠️ MCP Tools Provided</h2></summary>

### 1. `analyze_leaderboard`

**Purpose**: Generate AI-powered insights about evaluation leaderboard data

**Input Schema:**

```json
{
  "leaderboard_repo": "string",  // HF dataset (default: kshitijthakkar/smoltrace-leaderboard)
  "metric_focus": "string",      // "overall" | "accuracy" | "cost" | "latency" | "co2"
  "time_range": "string",        // "last_week" | "last_month" | "all_time"
  "top_n": "integer"             // Number of top models to highlight
}
```

**What It Does:**

1. Fetches leaderboard dataset from HuggingFace
2. Filters by time range
3. Analyzes trends based on metric focus
4. Uses Gemini to generate insights
5. Returns markdown-formatted analysis

**Example Output:**

```markdown
Based on 247 evaluations in the past week:

**Top Performers:**
- GPT-4 leads in accuracy at 95.8% but costs $0.05 per run
- Llama-3.1-8B offers best cost/performance at 93.4% accuracy for $0.002
- Qwen3-MoE is fastest at 1.7s average duration

**Trends:**
- API models dominate accuracy rankings
- GPU models are 10x more cost-effective
- H200 jobs show 2x faster execution vs A10

**Recommendations:**
- For production: Consider Llama-3.1-8B for cost-sensitive workloads
- For maximum accuracy: GPT-4 remains state-of-the-art
- For eco-friendly: Claude-3-Haiku has lowest CO2 emissions
```

---

### 2. `estimate_cost`

**Purpose**: Estimate evaluation costs with hardware recommendations

**Input Schema:**

```json
{
  "model": "string",       // Model name (e.g., "openai/gpt-4")
  "agent_type": "string",  // "tool" | "code" | "both"
  "num_tests": "integer",  // Number of test cases (default: 100)
  "hardware": "string"     // "cpu" | "gpu_a10" | "gpu_h200" (optional)
}
```

**What It Does:**

1. Determines if model is API or local
2. Calculates token usage estimates
3. Computes costs (API pricing or GPU time)
4. Estimates duration and CO2 emissions
5. Provides hardware recommendations

**Example Output:**

```markdown
## Cost Estimation: openai/gpt-4 (Tool Agent, 100 tests)

**Hardware**: CPU (API model)

**Cost Breakdown:**
- Total Tokens: ~15,000
- Prompt Tokens: ~5,000 ($0.03)
- Completion Tokens: ~10,000 ($0.06)
- **Total Cost: $0.09**

**Time Estimate:**
- Average per test: 3.2s
- Total duration: ~5.3 minutes

**CO2 Emissions:**
- Estimated: 0.45g CO2e

**Recommendations:**
- ✅ Good choice for accuracy-critical applications
- ⚠️ Consider Llama-3.1-8B for cost savings (10x cheaper)
- 💡 Use caching to reduce repeated API calls
```

---

### 3. `debug_trace`

**Purpose**: Answer questions about agent execution traces

**Input Schema:**

```json
{
  "trace_dataset": "string",    // HF dataset with OTEL traces
  "trace_id": "string",         // Specific trace to analyze
  "question": "string",         // Question about the trace
  "include_metrics": "boolean"  // Include GPU metrics (default: true)
}
```

**What It Does:**

1. Fetches trace data from HuggingFace
2. Parses OpenTelemetry spans
3. Analyzes execution flow
4. Uses Gemini to answer questions
5. Provides span-level details

**Example Output:**

```markdown
## Why was the tool called twice?

Based on trace analysis for `trace_abc123`:

**First Tool Call (span_003)**:
- Time: 14:23:19.000
- Tool: `search_web`
- Input: {"query": "latest AI news"}
- Result: 5 results returned
- Issue: Results were 2 days old

**Second Tool Call (span_005)**:
- Time: 14:23:21.200
- Tool: `search_web`
- Input: {"query": "latest AI news today"}
- Reasoning: LLM determined first results were outdated
- Duration: 1200ms

**Why Twice?**
The agent's reasoning chain shows it initially received outdated results.
The LLM then decided to refine the query with "today" keyword to get
more recent data.

**Performance Impact:**
- Added 2.09s to total execution
- Cost increase: +$0.0003
- This is normal for agents with iterative reasoning

**Recommendation:**
Consider adding date filters to initial tool calls to avoid retries.
```

---

### 4. `compare_runs`

**Purpose**: Side-by-side comparison of evaluation runs

**Input Schema:**

```json
{
  "leaderboard_repo": "string",  // HF leaderboard dataset
  "run_id_1": "string",          // First run ID
  "run_id_2": "string",          // Second run ID
  "comparison_focus": "string"   // "overall" | "cost" | "accuracy" | "speed"
}
```

**What It Does:**

1. Fetches data for both runs
2. Compares key metrics
3. Identifies strengths/weaknesses
4. Provides recommendations

**Example Output:**

```markdown
## Comparison: GPT-4 vs Llama-3.1-8B

| Metric | GPT-4 | Llama-3.1-8B | Winner |
|--------|-------|--------------|--------|
| Success Rate | 95.8% | 93.4% | GPT-4 (+2.4%) |
| Avg Duration | 3.2s | 2.1s | Llama (+34% faster) |
| Cost per Run | $0.05 | $0.002 | Llama (25x cheaper) |
| CO2 Emissions | 0.22g | 0.08g | Llama (64% less) |

**Analysis:**
- GPT-4 has slight accuracy edge but at significant cost premium
- Llama-3.1-8B offers excellent cost/performance ratio
- For 1000 runs: GPT-4 costs $50, Llama costs $2

**Recommendation:**
Use Llama-3.1-8B for production unless 95%+ accuracy is critical.
Consider hybrid approach: Llama for routine tasks, GPT-4 for complex ones.
```

---

### 5. `analyze_results`

**Purpose**: Deep dive into test case results

**Input Schema:**

```json
{
  "results_repo": "string",  // HF results dataset
  "run_id": "string",        // Run to analyze
  "focus": "string"          // "failures" | "successes" | "all"
}
```

**What It Does:**

1. Loads results dataset
2. Filters by success/failure
3. Identifies patterns
4. Suggests optimizations
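As a minimal sketch, it can be called like the other tools through an active MCP session (see "Via MCP SDK" under Accessing the MCP Server below; the repo name and run ID are placeholders):

```python
# Assumes an initialized ClientSession named `session` (see "Via MCP SDK" below)
result = await session.call_tool(
    "analyze_results",
    arguments={
        "results_repo": "username/agent-results-gpt4",
        "run_id": "uuid",
        "focus": "failures",
    },
)
```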
</details>

---

## 🌐 Accessing the MCP Server

### Via TraceMind-AI (This App!)

The **Agent Chat** screen uses TraceMind-MCP-Server automatically:

```python
# Happens automatically in the Chat screen
from mcp_client.sync_wrapper import get_sync_mcp_client

mcp = get_sync_mcp_client()
insights = mcp.analyze_leaderboard(
    metric_focus="overall",
    time_range="last_week"
)
```

### Via SSE Endpoint (for smolagents)
```python
from smolagents import InferenceClientModel, MCPClient, ToolCallingAgent

# Connect to the MCP server via SSE
mcp_client = MCPClient(
    {
        "url": "https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/sse",
        "transport": "sse",
    }
)

# Create an agent that can call the server's tools
agent = ToolCallingAgent(
    tools=mcp_client.get_tools(),
    model=InferenceClientModel(),
)

agent.run("Analyze the leaderboard and show top 3 models")
```
### Via MCP SDK (for other clients)
```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# For local development
server_params = StdioServerParameters(
    command="python",
    args=["-m", "mcp_tools"]
)

async def main():
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Call tools
            result = await session.call_tool(
                "analyze_leaderboard",
                arguments={"metric_focus": "cost"}
            )

asyncio.run(main())
```
---

## 🎯 Use Cases

### 1. Interactive Analysis (Agent Chat)

Ask natural language questions:

- "What are the top 3 models by accuracy?"
- "Compare GPT-4 and Claude-3 on cost"
- "Why is this agent slow?"

### 2. Automated Insights (Leaderboard)

Get AI summaries automatically:

- Weekly trend reports
- Cost optimization recommendations
- Performance alerts

### 3. Debugging (Trace Detail)

Understand agent behavior:

- "Why did the agent fail?"
- "Which tool took the longest?"
- "Why was the same tool called twice?"

### 4. Planning (Cost Estimator)

Before running evaluations:

- "How much will 1000 tests cost?"
- "Should I use A10 or H200?"
- "What's the CO2 impact?"

---

## 🏗️ Architecture

```
┌──────────────────────────────────────────────────────────┐
│             TraceMind-MCP-Server (HF Space)              │
├──────────────────────────────────────────────────────────┤
│                                                          │
│   ┌─────────────────┐          ┌──────────────────┐      │
│   │   Gradio App    │          │   MCP Protocol   │      │
│   │   (UI + SSE)    │◄────────►│     Handler      │      │
│   └─────────────────┘          └────────┬─────────┘      │
│                                         │                │
│                                ┌────────▼─────────┐      │
│                                │   Tool Router    │      │
│                                └────────┬─────────┘      │
│                                         │                │
│        ┌──────────────────────┬────────┴────────┐        │
│        │                      │                 │        │
│  ┌─────▼───────┐    ┌─────────▼─────────┐ ┌─────▼─────┐  │
│  │ Leaderboard │    │  Cost Estimator   │ │   Trace   │  │
│  │  Analyzer   │    │                   │ │ Debugger  │  │
│  └─────┬───────┘    └─────────┬─────────┘ └─────┬─────┘  │
│        │                      │                 │        │
│        └──────────────────────┼─────────────────┘        │
│                               │                          │
│                     ┌─────────▼──────────┐               │
│                     │   Gemini 2.5 Pro   │               │
│                     │  (Analysis Engine) │               │
│                     └────────────────────┘               │
│                                                          │
└──────────────────────────────────────────────────────────┘
                             │
                             │  MCP Protocol (SSE)
                             │
                             ▼
               ┌──────────────────────────┐
               │    TraceMind-AI (UI)     │
               │    Agent Chat Screen     │
               └──────────────────────────┘
```

---

## 🔧 Configuration

### Environment Variables

```env
# Google Gemini API (required)
GEMINI_API_KEY=your_api_key_here

# HuggingFace Token (for dataset access)
HF_TOKEN=your_token_here

# Default Leaderboard (optional)
DEFAULT_LEADERBOARD_REPO=kshitijthakkar/smoltrace-leaderboard
```

---

## 📊 Dataset Requirements

MCP tools expect datasets with specific schemas:

### Leaderboard Dataset

```python
{
    "run_id": "string",
    "model": "string",
    "success_rate": "float",
    "total_cost_usd": "float",
    "timestamp": "string",
    # ... other metrics
}
```

### Results Dataset

```python
{
    "run_id": "string",
    "task_id": "string",
    "success": "boolean",
    "trace_id": "string",
    # ... other fields
}
```

### Traces Dataset

```python
{
    "trace_id": "string",
    "spans": [
        {
            "spanId": "string",
            "name": "string",
            "attributes": {},
            # ... OTEL format
        }
    ]
}
```
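To illustrate consuming these schemas, here is a minimal sketch that fetches one trace by ID (the dataset name and trace ID are placeholders):

```python
from datasets import load_dataset

traces = load_dataset("username/agent-traces-gpt4", split="train")

# Find a single trace and walk its OTEL spans
trace = next(row for row in traces if row["trace_id"] == "trace_abc123")
for span in trace["spans"]:
    print(span["spanId"], span["name"])
```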
| --- | |
| ## 🎓 Learning Resources | |
| ### MCP Documentation | |
| - [Model Context Protocol Spec](https://modelcontextprotocol.io) | |
| - [MCP Python SDK](https://github.com/modelcontextprotocol/python-sdk) | |
| - [Gradio MCP Integration](https://www.gradio.app/guides/creating-a-custom-chatbot-with-blocks#model-context-protocol-mcp) | |
| ### Implementation Examples | |
| - **This Server**: [HF Space Code](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server/tree/main) | |
| - **Client Integration**: [TraceMind-AI mcp_client/](https://github.com/Mandark-droid/TraceMind-AI/tree/main/mcp_client) | |
| --- | |
| ## 🐛 Troubleshooting | |
| ### Common Issues | |
| **Q: MCP tools not appearing?** | |
| ```bash | |
| # Verify MCP_SERVER_URL is correct | |
| echo $MCP_SERVER_URL | |
| # Should be: https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/sse | |
| ``` | |
| **Q: "Failed to load dataset" error?** | |
| ```bash | |
| # Check HF token | |
| export HF_TOKEN=your_token_here | |
| # Verify dataset exists | |
| huggingface-cli repo info kshitijthakkar/smoltrace-leaderboard | |
| ``` | |
| **Q: Gemini API errors?** | |
| ```bash | |
| # Verify API key | |
| curl -H "Authorization: Bearer $GEMINI_API_KEY" \\ | |
| https://generativelanguage.googleapis.com/v1beta/models | |
| # Check rate limits (10 requests/minute on free tier) | |
| ``` | |
| --- | |
| ## 🔗 Links | |
| - **Live Server**: [HF Space](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server) | |
| - **Source Code**: [GitHub](https://github.com/Mandark-droid/TraceMind-mcp-server) | |
| - **Client (This App)**: [TraceMind-AI](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind) | |
| - **MCP Spec**: [modelcontextprotocol.io](https://modelcontextprotocol.io) | |
| --- | |
| ## 📄 License | |
| **AGPL-3.0** - Open source and free to use | |
| --- | |
| ## 🤝 Contributing | |
| Help improve TraceMind-MCP-Server: | |
| - Add new MCP tools | |
| - Improve analysis quality | |
| - Optimize performance | |
| - Add support for more datasets | |
| --- | |
| ## 🏆 MCP's 1st Birthday Hackathon | |
| **Track 1 Submission: Building MCP (Enterprise)** | |
| TraceMind-MCP-Server demonstrates: | |
| - ✅ Standards-compliant MCP implementation | |
| - ✅ SSE transport for Gradio integration | |
| - ✅ Real-world use case (agent evaluation) | |
| - ✅ Gemini 2.5 Pro integration | |
| - ✅ Production-ready deployment on HF Spaces | |
| **Used by**: TraceMind-AI (Track 2) for autonomous agent chat | |
| --- | |
| *TraceMind-MCP-Server - Intelligent analysis, one tool at a time* 🔌 | |
| """) | |
| def create_job_submission_tab(): | |
| """Create the Job Submission tab with full details about Modal and HF Jobs""" | |
| return gr.Markdown(""" | |
| # ☁️ Job Submission | |
| **Run SMOLTRACE Evaluations on Cloud Infrastructure** | |
| TraceMind-AI provides seamless integration with two cloud compute platforms, allowing you to run agent evaluations with automated hardware selection, cost estimation, and real-time monitoring. | |
| --- | |
| ## 📋 Table of Contents | |
| - [Platform Overview](#-platform-overview) | |
| - [HuggingFace Jobs Integration](#-huggingface-jobs-integration) | |
| - [Modal Integration](#-modal-integration) | |
| - [Hardware Auto-Selection](#-hardware-auto-selection) | |
| - [Cost Estimation](#-cost-estimation) | |
| - [Job Monitoring](#-job-monitoring) | |
| - [Step-by-Step Guide](#-step-by-step-guide) | |
| - [Troubleshooting](#-troubleshooting) | |
| --- | |
| ## 🌟 Platform Overview | |
| ### Supported Platforms | |
| | Platform | Best For | Pricing Model | GPU Options | Free Tier | | |
| |----------|----------|---------------|-------------|-----------| | |
| | **HuggingFace Jobs** | Managed infrastructure, dataset integration | Per-hour | T4, L4, A10, A100, V5e | ❌ ($9/mo Pro required) | | |
| | **Modal** | Serverless compute, pay-per-second | Per-second | T4, L4, A10, A100-80GB, H200 | ✅ Free credits available | | |
| ### Key Differences | |
| **HuggingFace Jobs**: | |
| - ✅ Native HuggingFace ecosystem integration | |
| - ✅ Managed infrastructure with guaranteed availability | |
| - ✅ Built-in dataset storage and versioning | |
| - ⚠️ Requires Pro account ($9/month) | |
| - ⚠️ Per-hour billing (minimum 1 hour charge) | |
| **Modal**: | |
| - ✅ Serverless architecture (no minimum charges) | |
| - ✅ Pay-per-second billing (more cost-effective for short jobs) | |
| - ✅ Latest GPUs (H200 available) | |
| - ✅ Free tier with credits | |
| - ⚠️ Requires separate account setup | |
| - ⚠️ Container cold start time (~2-3 minutes first run) | |
| --- | |
| ## 🤗 HuggingFace Jobs Integration | |
| ### Requirements | |
| **1. HuggingFace Pro Account** | |
| - Cost: $9/month | |
| - Sign up: https://huggingface.co/pricing | |
| - Includes compute credits and priority support | |
| **2. HuggingFace Token with Run Jobs Permission** | |
| ``` | |
| Steps to create token: | |
| 1. Visit: https://huggingface.co/settings/tokens | |
| 2. Click "New token" | |
| 3. Name: "TraceMind Evaluation" | |
| 4. Permissions: | |
|    ✅ Read (view datasets) | |
|    ✅ Write (upload results) | |
|    ✅ Run Jobs (submit evaluation jobs) ⚠️ REQUIRED | |
| 5. Copy token (starts with hf_) | |
| 6. Save in TraceMind Settings | |
| ``` | |
| ### Hardware Options | |
| | Hardware | vCPUs | GPU | Memory | Best For | Price/hr | | |
| |----------|-------|-----|--------|----------|----------| | |
| | `cpu-basic` | 2 | - | 16 GB | API models (OpenAI, Anthropic) | ~$0.05 | | |
| | `cpu-upgrade` | 8 | - | 32 GB | API models (high volume) | ~$0.10 | | |
| | `t4-small` | 4 | T4 (16GB) | 16 GB | Small models (4B-8B) | ~$0.60 | | |
| | `t4-medium` | 8 | T4 (16GB) | 32 GB | Small models (batched) | ~$1.00 | | |
| | `a10g-small` | 4 | A10G (24GB) | 32 GB | Medium models (7B-13B) | ~$1.10 | | |
| | `a10g-large` | 12 | A10G (24GB) | 92 GB | Medium models (high memory) | ~$1.50 | | |
| | `a100-large` | 12 | A100 (80GB) | 142 GB | Large models (70B+) | ~$3.00 | | |
| | `v5e-1x1` | 4 | TPU v5e | 16 GB | TPU-optimized workloads | ~$1.20 | | |
| Full pricing: https://huggingface.co/pricing#spaces-pricing | |
| ### Auto-Selection Logic | |
| When you select `hardware: auto`, TraceMind applies this logic: | |
| ```python | |
| # API models (LiteLLM/Inference) | |
| if provider in ["litellm", "inference"]: | |
|     hardware = "cpu-basic" | |
| # Local models (Transformers) | |
| elif "70b" in model.lower() or "65b" in model.lower(): | |
|     hardware = "a100-large"  # Large models | |
| elif "13b" in model.lower() or "34b" in model.lower(): | |
|     hardware = "a10g-large"  # Medium models | |
| elif "7b" in model.lower() or "8b" in model.lower() or "4b" in model.lower(): | |
|     hardware = "t4-small"  # Small models | |
| else: | |
|     hardware = "t4-small"  # Default | |
| ``` | |
| ### Job Workflow | |
| ``` | |
| 1. Configure Settings | |
|    └─> Add HF Token (with Run Jobs permission) | |
|    └─> Add LLM provider API keys | |
| 2. Create Evaluation | |
|    └─> Select "HuggingFace Jobs" as infrastructure | |
|    └─> Choose model and configuration | |
|    └─> Hardware auto-selected or manually chosen | |
| 3. Submit Job | |
|    └─> TraceMind validates credentials | |
|    └─> Submits job via HF Jobs API | |
|    └─> Returns job ID for monitoring | |
| 4. Job Execution | |
|    └─> Container built with dependencies | |
|    └─> SMOLTRACE runs evaluation | |
|    └─> Results uploaded to HF datasets | |
|    └─> Leaderboard updated automatically | |
| 5. Monitor Progress | |
|    └─> Track at: https://huggingface.co/jobs | |
|    └─> Or use Job Monitoring tab in TraceMind | |
| ``` | |
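| Programmatically, step 3 corresponds to the HF Jobs API. A hypothetical sketch only (it assumes `huggingface_hub` exposes a `run_job` helper with `image`/`command`/`flavor` parameters; check the Jobs docs for the exact signature): | |
| ```python | |
| from huggingface_hub import run_job  # assumed helper from the Jobs API | |
| job = run_job( | |
|     image="python:3.12",  # container image for the job | |
|     command=["smoltrace-eval", "--model", "openai/gpt-4", "--provider", "litellm"], | |
|     flavor="cpu-basic",   # hardware flavor chosen by auto-selection | |
| ) | |
| print(job)  # keep the returned job ID for the Job Monitoring tab | |
| ``` | |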
| --- | |
| ## ⚡ Modal Integration | |
| ### Requirements | |
| **1. Modal Account** | |
| - Free tier: $30 free credits per month | |
| - Sign up: https://modal.com | |
| **2. Modal API Credentials** | |
| ``` | |
| Steps to get credentials: | |
| 1. Visit: https://modal.com/settings/tokens | |
| 2. Click "Create token" | |
| 3. Copy: | |
|    - Token ID (starts with ak-) | |
|    - Token Secret (starts with as-) | |
| 4. Save in TraceMind Settings: | |
|    - MODAL_TOKEN_ID: ak-xxxxx | |
|    - MODAL_TOKEN_SECRET: as-xxxxx | |
| ``` | |
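| For local use, the same credential pair can be configured via the Modal CLI (flags per Modal's current CLI): | |
| ```bash | |
| # Stores the token pair in ~/.modal.toml for local runs | |
| modal token set --token-id ak-xxxxx --token-secret as-xxxxx | |
| ``` | |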
| ### Hardware Options | |
| | Hardware | GPU | Memory | Best For | Price/sec | Equivalent $/hr | | |
| |----------|-----|--------|----------|-----------|-----------------| | |
| | `CPU` | - | 16 GB | API models | ~$0.0001 | ~$0.36 | | |
| | `T4` | T4 (16GB) | 16 GB | Small models (4B-8B) | ~$0.0002 | ~$0.72 | | |
| | `L4` | L4 (24GB) | 24 GB | Small-medium models | ~$0.0004 | ~$1.44 | | |
| | `A10G` | A10G (24GB) | 32 GB | Medium models (7B-13B) | ~$0.0006 | ~$2.16 | | |
| | `L40S` | L40S (48GB) | 48 GB | Large models (optimized) | ~$0.0012 | ~$4.32 | | |
| | `A100` | A100 (40GB) | 64 GB | Large models | ~$0.0020 | ~$7.20 | | |
| | `A100-80GB` | A100 (80GB) | 128 GB | Very large models (70B+) | ~$0.0030 | ~$10.80 | | |
| | `H100` | H100 (80GB) | 192 GB | Latest generation inference | ~$0.0040 | ~$14.40 | | |
| | `H200` | H200 (141GB) | 256 GB | Cutting-edge, highest memory | ~$0.0050 | ~$18.00 | | |
| Full pricing: https://modal.com/pricing | |
| **💡 Cost Advantage**: Modal's per-second billing is more cost-effective for jobs <1 hour! | |
| ### Auto-Selection Logic | |
| When you select `hardware: auto`, TraceMind applies this logic: | |
| ```python | |
| # API models | |
| if provider in ["litellm", "inference"]: | |
|     gpu = None  # CPU only | |
| # Local models (Transformers) | |
| elif "70b" in model.lower() or "65b" in model.lower(): | |
|     gpu = "A100-80GB"  # Large models need 80GB | |
| elif "13b" in model.lower() or "34b" in model.lower(): | |
|     gpu = "A10G"  # Medium models | |
| elif "7b" in model.lower() or "8b" in model.lower(): | |
|     gpu = "A10G"  # Small models efficient on A10G | |
| else: | |
|     gpu = "A10G"  # Default | |
| ``` | |
| ### Modal-Specific Features | |
| **Dynamic Python Version Matching** | |
| ```python | |
| import sys | |
| # Automatically matches your environment | |
| python_version = f"{sys.version_info.major}.{sys.version_info.minor}" | |
| # Example: "3.10" on HF Space, "3.12" locally | |
| ``` | |
| **Optimized Docker Images** | |
| ```python | |
| # GPU jobs: CUDA-optimized base image | |
| image = modal.Image.from_registry( | |
|     "nvidia/cuda:12.6.0-cudnn-devel-ubuntu22.04", add_python=python_version | |
| ) | |
| # CPU jobs: lightweight Debian base | |
| image = modal.Image.debian_slim(python_version=python_version) | |
| ``` | |
| **Smart Package Installation** | |
| ```python | |
| # GPU jobs get full stack | |
| packages = [ | |
|     "smoltrace", | |
|     "transformers", | |
|     "torch", | |
|     "accelerate",    # For device_map | |
|     "bitsandbytes",  # For quantization | |
|     "hf_transfer",   # Fast downloads | |
|     "nvidia-ml-py",  # GPU metrics | |
| ] | |
| # CPU jobs get minimal dependencies | |
| packages = ["smoltrace", "litellm", "ddgs"] | |
| ``` | |
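| A package list like this gets baked into the job image before execution. A minimal sketch using Modal's `Image.pip_install` (variable names follow the snippets above): | |
| ```python | |
| # Build the CPU job image and install the evaluation dependencies | |
| image = modal.Image.debian_slim(python_version=python_version).pip_install(*packages) | |
| ``` | |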
| ### Job Workflow | |
| ``` | |
| 1. Configure Settings | |
|    └─> Add Modal Token ID + Secret | |
|    └─> Add HF Token (for dataset upload) | |
|    └─> Add LLM provider API keys | |
| 2. Create Evaluation | |
|    └─> Select "Modal" as infrastructure | |
|    └─> Choose model and configuration | |
|    └─> Hardware auto-selected | |
| 3. Submit Job | |
|    └─> TraceMind creates dynamic Modal app | |
|    └─> Submits job in background thread | |
|    └─> Returns Modal Call ID | |
| 4. Job Execution | |
|    └─> Image builds (or uses cache) | |
|    └─> Model downloads to Modal storage | |
|    └─> SMOLTRACE runs evaluation | |
|    └─> Results uploaded to HF datasets | |
| 5. Monitor Progress | |
|    └─> Track at: https://modal.com/apps | |
|    └─> View real-time streaming logs | |
| ``` | |
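| Step 3 is worth a closer look. A minimal sketch of the submit-in-background pattern (the app and function names are illustrative, not TraceMind's actual code): | |
| ```python | |
| import subprocess | |
| import threading | |
| import modal | |
| app = modal.App("tracemind-eval-demo")  # hypothetical app name | |
| @app.function(gpu="A10G", timeout=3600) | |
| def run_evaluation(command: str) -> int: | |
|     return subprocess.run(command, shell=True).returncode | |
| def submit_in_background(command: str) -> None: | |
|     def _target(): | |
|         # .remote() blocks until the function finishes, so the app stays | |
|         # alive for the whole job (see Troubleshooting issue 9 on .spawn()) | |
|         with app.run(): | |
|             run_evaluation.remote(command) | |
|     threading.Thread(target=_target, daemon=True).start() | |
| ``` | |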
| --- | |
| ## 🎯 Hardware Auto-Selection | |
| ### How It Works | |
| TraceMind **automatically selects optimal hardware** based on: | |
| 1. **Provider type**: LiteLLM/Inference (API) vs Transformers (local) | |
| 2. **Model size**: Extracted from model name (e.g., "70b", "13b", "8b") | |
| 3. **Platform**: Modal or HuggingFace Jobs | |
| ### Selection Matrix | |
| | Model Type | Model Size | HF Jobs | Modal | | |
| |------------|------------|---------|-------| | |
| | API (OpenAI, Anthropic) | Any | `cpu-basic` | `CPU` | | |
| | Transformers | 4B-8B | `t4-small` | `A10G` | | |
| | Transformers | 13B-34B | `a10g-large` | `A10G` | | |
| | Transformers | 70B+ | `a100-large` | `A100-80GB` | | |
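| Both platform rules reduce to a single size lookup parsed from the model name. An illustrative sketch (function and argument names are mine, not TraceMind's actual code): | |
| ```python | |
| import re | |
| def pick_hardware(model: str, provider: str, platform: str) -> str: | |
|     """Illustrative auto-selection: size is parsed from the model name.""" | |
|     if provider in ("litellm", "inference"):      # API models need no GPU | |
|         return "cpu-basic" if platform == "hf_jobs" else "CPU" | |
|     match = re.search(r"(\d+)b", model.lower())   # e.g. "Llama-3.1-70B" -> 70 | |
|     params = int(match.group(1)) if match else 8  # assume "small" if unknown | |
|     if params >= 65: | |
|         return "a100-large" if platform == "hf_jobs" else "A100-80GB" | |
|     if params >= 13: | |
|         return "a10g-large" if platform == "hf_jobs" else "A10G" | |
|     return "t4-small" if platform == "hf_jobs" else "A10G" | |
| # e.g. pick_hardware("meta-llama/Llama-3.1-70B", "transformers", "modal") -> "A100-80GB" | |
| ``` | |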
| ### Override Auto-Selection | |
| You can manually select hardware if needed: | |
| ``` | |
| Reasons to override: | |
| - You know your model needs more memory | |
| - You want to test performance on different GPUs | |
| - You want to optimize cost vs speed tradeoff | |
| ``` | |
| ### Cost Estimation Shows Auto-Selection | |
| When you click **"💰 Estimate Cost"** with `auto` hardware: | |
| **Modal Example**: | |
| ``` | |
| Hardware: auto → **A100-80GB** (Modal) | |
| Estimated Cost: $0.45 | |
| Duration: 15 minutes | |
| ``` | |
| **HF Jobs Example**: | |
| ``` | |
| Hardware: auto → **a100-large** (HF Jobs) | |
| Estimated Cost: $0.75 | |
| Duration: 15 minutes | |
| ``` | |
| --- | |
| ## 💰 Cost Estimation | |
| ### How Cost Estimation Works | |
| TraceMind provides **AI-powered cost estimation** before you submit jobs: | |
| **Data Sources**: | |
| 1. **Historical Data** (preferred): Analyzes past runs from leaderboard | |
| 2. **MCP Server** (fallback): Uses `estimate_cost` MCP tool with Gemini 2.5 Pro | |
| ### Estimation Process | |
| ``` | |
| 1. User clicks "💰 Estimate Cost" | |
| 2. TraceMind checks for historical data | |
|    └─> If found: Use average cost/duration from past runs | |
|    └─> If not found: Call MCP Server for AI analysis | |
| 3. Auto-selection applied | |
|    └─> Determines actual hardware that will be used | |
|    └─> Maps to pricing table | |
| 4. Display estimate | |
|    └─> Cost breakdown | |
|    └─> Duration estimate | |
|    └─> Hardware details | |
| ``` | |
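| In code, the fallback amounts to something like this sketch (`load_leaderboard_runs` and `mcp_client` are illustrative placeholders, not TraceMind's actual helpers): | |
| ```python | |
| def estimate_cost(model: str, hardware: str) -> dict: | |
|     runs = load_leaderboard_runs(model)  # past runs for this model, if any | |
|     if runs:  # 1) historical path (preferred) | |
|         return { | |
|             "source": f"historical ({len(runs)} past runs)", | |
|             "cost_usd": sum(r["cost_usd"] for r in runs) / len(runs), | |
|             "minutes": sum(r["duration_min"] for r in runs) / len(runs), | |
|         } | |
|     # 2) fallback: the MCP server's estimate_cost tool (Gemini 2.5 Pro analysis) | |
|     return mcp_client.call_tool("estimate_cost", {"model": model, "hardware": hardware}) | |
| ``` | |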
| ### Cost Estimate Components | |
| **Historical Data Estimate**: | |
| ```markdown | |
| ## 💰 Cost Estimate | |
| **📊 Historical Data (5 past runs)** | |
| | Metric | Value | | |
| |--------|-------| | |
| | Model | meta-llama/Llama-3.1-70B | | |
| | Hardware | auto → **A100-80GB** (Modal) | | |
| | Estimated Cost | $0.45 | | |
| | Duration | 15.2 minutes | | |
| --- | |
| *Based on 5 previous evaluation runs in the leaderboard.* | |
| ``` | |
| **MCP AI Estimate**: | |
| ```markdown | |
| ## 💰 Cost Estimate - AI Analysis | |
| **🤖 Powered by MCP Server + Gemini 2.5 Pro** | |
| *This estimate was generated by AI analysis since no historical | |
| data is available for this model.* | |
| **Hardware**: auto → **A100-80GB** (Modal) | |
| --- | |
| Based on the model size (70B parameters) and evaluation | |
| configuration, I estimate: | |
| **Cost Breakdown**: | |
| - Model download: ~5 minutes @ $0.0030/sec = $0.90 | |
| - Evaluation (100 tests): ~10 minutes @ $0.0030/sec = $1.80 | |
| - **Total estimated cost**: $2.70 | |
| **Duration**: 15-20 minutes | |
| **Recommendations**: | |
| - For cost savings, consider using A10G with quantization | |
| - For faster inference, H200 reduces duration to ~8 minutes | |
| ``` | |
| ### Accuracy of Estimates | |
| **Historical estimates**: ±10% accuracy | |
| - Based on actual past runs | |
| - Accounts for model-specific behavior | |
| **MCP AI estimates**: ±30% accuracy | |
| - Uses model knowledge and heuristics | |
| - Conservative (tends to overestimate) | |
| **Factors affecting accuracy**: | |
| - Model download time varies (network speed, caching) | |
| - Evaluation complexity depends on dataset | |
| - GPU availability can affect queue time | |
| --- | |
| ## 🔍 Job Monitoring | |
| ### HuggingFace Jobs Monitoring | |
| **Built-in Tab**: Go to **"🔍 Job Monitoring"** in TraceMind | |
| **Features**: | |
| ``` | |
| 📋 Inspect Job | |
|    └─> Enter HF Job ID | |
|    └─> View status, hardware, timestamps | |
|    └─> See next steps based on status | |
| 📜 Job Logs | |
|    └─> Load execution logs | |
|    └─> Auto-refresh option | |
|    └─> Search and filter | |
| 📑 Recent Jobs | |
|    └─> List your recent jobs | |
|    └─> Quick status overview | |
|    └─> Click to inspect | |
| ``` | |
| **Job Statuses**: | |
| - ⏳ **QUEUED**: Waiting to start | |
| - 🔄 **STARTING**: Initializing (1-2 min) | |
| - ▶️ **RUNNING**: Executing evaluation | |
| - ✅ **SUCCEEDED**: Completed successfully | |
| - ❌ **FAILED**: Error occurred (check logs) | |
| - 🚫 **CANCELLED**: Manually stopped | |
| **External Monitoring**: | |
| - HF Dashboard: https://huggingface.co/jobs | |
| - CLI: `hf jobs ps` and `hf jobs logs <job_id>` | |
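| For example (the job ID comes from the submission confirmation): | |
| ```bash | |
| hf jobs ps                            # list your recent jobs | |
| hf jobs logs kshitijthakkar/def67890  # stream logs for one job | |
| ``` | |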
| ### Modal Monitoring | |
| **Modal Dashboard**: https://modal.com/apps | |
| **Features**: | |
| - Real-time streaming logs | |
| - GPU utilization graphs | |
| - Cost tracking | |
| - Container status | |
| **Log Visibility**: | |
| TraceMind streams Modal job output, so you see lines like these in real time: | |
| ``` | |
| ================================================================================ | |
| Starting SMOLTRACE evaluation on Modal | |
| Command: smoltrace-eval --model Qwen/Qwen3-8B ... | |
| Python version: 3.10.0 | |
| GPU: NVIDIA A10 | |
| GPU Memory: 23.68 GB | |
| ================================================================================ | |
| Note: Model download may take several minutes for large models (14B = ~28GB) | |
| Downloading and initializing model... | |
| [Download progress bars appear here] | |
| [Evaluation progress appears here] | |
| ================================================================================ | |
| EVALUATION COMPLETED | |
| Return code: 0 | |
| ================================================================================ | |
| ``` | |
| ### Expected Duration | |
| **CPU Jobs (API Models)**: | |
| - Queue time: <1 minute | |
| - Execution: 2-5 minutes | |
| - **Total**: ~5 minutes | |
| **GPU Jobs (Local Models)**: | |
| - Queue time: 1-3 minutes | |
| - Image build: 2-5 minutes (first run, then cached) | |
| - Model download: 5-15 minutes (14B = ~10 min, 70B = ~15 min) | |
| - Evaluation: 3-10 minutes (depends on dataset size) | |
| - **Total**: 15-30 minutes | |
| **Pro Tip**: Modal caches images and models, so subsequent runs are **much faster** (skip image build and model download). | |
| --- | |
| ## 📝 Step-by-Step Guide | |
| ### Complete Workflow Example | |
| **Scenario**: Evaluate GPT-4 via LiteLLM on HuggingFace Jobs | |
| #### Step 1: Configure API Keys | |
| ``` | |
| 1. Go to "⚙️ Settings" tab | |
| 2. Under "HuggingFace Configuration": | |
|    - HF Token: [your token with Run Jobs permission] | |
|    - Click "Save API Keys" | |
| 3. Under "LLM Provider API Keys": | |
|    - OpenAI API Key: [your key] | |
|    - Click "Save API Keys" | |
| ``` | |
| #### Step 2: Navigate to New Evaluation | |
| ``` | |
| 1. Click "🚀 New Evaluation" in sidebar | |
| 2. You'll see the evaluation form with multiple sections | |
| ``` | |
| #### Step 3: Configure Evaluation | |
| **Infrastructure**: | |
| ``` | |
| Infrastructure Provider: HuggingFace Jobs | |
| Hardware: auto (will select cpu-basic) | |
| ``` | |
| **Model Configuration**: | |
| ``` | |
| Model: openai/gpt-4 | |
| Provider: litellm | |
| ``` | |
| **Agent Configuration**: | |
| ``` | |
| Agent Type: both (tool + code) | |
| Search Provider: duckduckgo | |
| Tools: python_interpreter, visit_webpage, duckduckgo_search | |
| ``` | |
| **Test Configuration**: | |
| ``` | |
| Dataset: kshitijthakkar/smoltrace-tasks | |
| Split: train | |
| Difficulty: all | |
| Parallel Workers: 1 | |
| ``` | |
| **Output & Monitoring**: | |
| ``` | |
| Output Format: hub (HuggingFace datasets) | |
| Enable OTEL: ✅ | |
| Enable GPU Metrics: ✅ (N/A for CPU) | |
| Timeout: 1h | |
| ``` | |
| #### Step 4: Estimate Cost | |
| ``` | |
| 1. Click "💰 Estimate Cost" | |
| 2. Review estimate: | |
|    - Hardware: auto → **cpu-basic** (HF Jobs) | |
|    - Cost: ~$0.08 | |
|    - Duration: ~3 minutes | |
| ``` | |
| #### Step 5: Submit Job | |
| ``` | |
| 1. Click "Submit Evaluation" | |
| 2. Confirmation appears: | |
|    ✅ Job submitted successfully! | |
|    Job Details: | |
|    - Run ID: job_abc12345 | |
|    - HF Job ID: kshitijthakkar/def67890 | |
|    - Hardware: cpu-basic | |
|    - Platform: HuggingFace Jobs | |
| ``` | |
| #### Step 6: Monitor Job | |
| **Option A: TraceMind Job Monitoring** | |
| ``` | |
| 1. Go to "🔍 Job Monitoring" tab | |
| 2. Click "📋 Inspect Job" | |
| 3. Paste HF Job ID: kshitijthakkar/def67890 | |
| 4. Click "🔍 Inspect Job" | |
| 5. View status and click "📥 Load Logs" | |
| ``` | |
| **Option B: HuggingFace Dashboard** | |
| ``` | |
| 1. Visit: https://huggingface.co/jobs | |
| 2. Find your job by ID or timestamp | |
| 3. View logs and status | |
| ``` | |
| #### Step 7: View Results | |
| ``` | |
| When job completes (SUCCEEDED): | |
| 1. Go to "📊 Leaderboard" tab | |
| 2. Click "Load Leaderboard" | |
| 3. Find your run (job_abc12345) | |
| 4. Click row to view detailed results | |
| ``` | |
| --- | |
| ## 🔧 Troubleshooting | |
| ### Common Issues & Solutions | |
| #### 1. "Modal package not installed" | |
| **Error**: | |
| ``` | |
| Modal package not installed. Install with: pip install modal | |
| ``` | |
| **Solution**: | |
| ```bash | |
| pip install "modal>=0.64.0"  # quoted so the shell doesn't treat '>' as a redirect | |
| ``` | |
| #### 2. "HuggingFace token not configured" | |
| **Error**: | |
| ``` | |
| HuggingFace token not configured. Please set HF_TOKEN in Settings. | |
| ``` | |
| **Solution**: | |
| 1. Get token from: https://huggingface.co/settings/tokens | |
| 2. Add in Settings → HuggingFace Configuration | |
| 3. Ensure permissions include **Read**, **Write**, and **Run Jobs** | |
| #### 3. "Modal authentication failed" | |
| **Error**: | |
| ``` | |
| Modal authentication failed. Please verify your MODAL_TOKEN_ID | |
| and MODAL_TOKEN_SECRET in Settings. | |
| ``` | |
| **Solution**: | |
| 1. Get credentials from: https://modal.com/settings/tokens | |
| 2. Add both: | |
|    - MODAL_TOKEN_ID (starts with `ak-`) | |
|    - MODAL_TOKEN_SECRET (starts with `as-`) | |
| 3. Save and retry | |
| #### 4. "Job failed - Python version mismatch" | |
| **Error** (in Modal logs): | |
| ``` | |
| The 'submit_modal_job.<locals>.run_evaluation' Function | |
| was defined with Python 3.12, but its Image has 3.10. | |
| ``` | |
| **Solution**: | |
| This is automatically fixed in the latest version! TraceMind now dynamically matches Python versions. | |
| If still occurring: | |
| 1. Pull latest code: `git pull origin main` | |
| 2. Restart app | |
| #### 5. "Fast download using 'hf_transfer' is enabled but package not available" | |
| **Error** (in Modal logs): | |
| ``` | |
| ValueError: Fast download using 'hf_transfer' is enabled but | |
| 'hf_transfer' package is not available. | |
| ``` | |
| **Solution**: | |
| This is automatically fixed in the latest version! TraceMind now includes `hf_transfer` in GPU job packages. | |
| If still occurring: | |
| 1. Pull latest code | |
| 2. Modal will rebuild image with new dependencies | |
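| As a stop-gap you can also disable the fast-download path. `HF_HUB_ENABLE_HF_TRANSFER` is huggingface_hub's standard switch, though it must be set in the job's environment (not just locally) to take effect: | |
| ```bash | |
| export HF_HUB_ENABLE_HF_TRANSFER=0 | |
| ``` | |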
| #### 6. "Job stuck at 'Downloading model'" | |
| **Symptoms**: | |
| - Logs show "Downloading and initializing model..." | |
| - No progress for 10+ minutes | |
| **Explanation**: | |
| - Large models (14B+) take 10-15 minutes to download | |
| - This is normal! Model size: 28GB for 14B, 140GB for 70B | |
| **Solution**: | |
| - Be patient - download is in progress (Modal's network is fast) | |
| - Future runs will be cached and start instantly | |
| - Check Modal dashboard for download progress | |
| #### 7. "Job completed but no results in leaderboard" | |
| **Symptoms**: | |
| - Job status shows SUCCEEDED | |
| - No entry in leaderboard | |
| **Possible Causes**: | |
| 1. Results uploaded to different user's namespace | |
| 2. Leaderboard not refreshed | |
| 3. Job failed during result upload | |
| **Solution**: | |
| ``` | |
| 1. Refresh leaderboard: Click "Load Leaderboard" | |
| 2. Check HF dataset repos: | |
|    - kshitijthakkar/smoltrace-leaderboard | |
|    - kshitijthakkar/smoltrace-results-<timestamp> | |
| 3. Verify HF token has Write permission | |
| 4. Check job logs for upload errors | |
| ``` | |
| #### 8. "Cannot submit job - HuggingFace Pro required" | |
| **Error**: | |
| ``` | |
| HuggingFace Pro Account ($9/month) required to submit jobs. | |
| Free accounts cannot submit jobs. | |
| ``` | |
| **Solution**: | |
| - Option A: Upgrade to HF Pro: https://huggingface.co/pricing | |
| - Option B: Use Modal instead (has free tier with credits) | |
| #### 9. "Modal job exits after image build" | |
| **Symptoms**: | |
| - Logs show: "Stopping app - local entrypoint completed" | |
| - Job ends without running evaluation | |
| **Solution**: | |
| This was a known issue (fixed in the latest version). The problem was calling `.spawn()` inside a `with app.run()` context: `.spawn()` returns immediately, so the local entrypoint finished and the app shut down before the evaluation ran. | |
| The current implementation calls `.remote()` in a background thread; `.remote()` blocks until the function returns, keeping the app alive for the full job. | |
| If still occurring: | |
| 1. Pull latest code: `git pull origin main` | |
| 2. Restart app | |
| 3. Resubmit job | |
| #### 10. "Cost estimate shows wrong hardware" | |
| **Symptoms**: | |
| - Selected Modal with 70B model | |
| - Cost estimate shows "a10g-small" instead of "A100-80GB" | |
| **Solution**: | |
| This was a known issue (fixed in the latest version). Cost estimation now applies platform-specific auto-selection logic. | |
| Verify fix: | |
| 1. Pull latest code | |
| 2. Click "💰 Estimate Cost" | |
| 3. Should show: `auto → **A100-80GB** (Modal)` | |
| --- | |
| ## 📞 Getting Help | |
| ### Resources | |
| **Documentation**: | |
| - TraceMind Docs: This tab! | |
| - SMOLTRACE Docs: [GitHub](https://github.com/Mandark-droid/SMOLTRACE) | |
| - Modal Docs: https://modal.com/docs | |
| - HF Jobs Docs: https://huggingface.co/docs/hub/spaces-sdks-docker | |
| **Community**: | |
| - GitHub Issues: [TraceMind-AI Issues](https://github.com/Mandark-droid/TraceMind-AI/issues) | |
| - LinkedIn: [@kshitij-thakkar](https://www.linkedin.com/in/kshitij-thakkar-2061b924) | |
| **Support**: | |
| - For TraceMind bugs: Open GitHub issue | |
| - For Modal issues: https://modal.com/docs/support | |
| - For HF Jobs issues: https://discuss.huggingface.co/ | |
| --- | |
| *TraceMind-AI - Multi-cloud agent evaluation made simple* ☁️ | |
| """) | |
| def create_documentation_screen(): | |
| """ | |
| Create the complete documentation screen with tabs | |
| Returns: | |
| gr.Column: Gradio Column component for documentation (can be shown/hidden) | |
| """ | |
| with gr.Column(visible=False) as documentation_interface: | |
| gr.Markdown(""" | |
| # 📚 TraceMind Documentation | |
| Comprehensive documentation for the entire TraceMind ecosystem | |
| """) | |
| with gr.Tabs(): | |
| with gr.Tab("📖 About"): | |
| create_about_tab() | |
| with gr.Tab("🔭 TraceVerde"): | |
| create_traceverde_tab() | |
| with gr.Tab("📊 SmolTrace"): | |
| create_smoltrace_tab() | |
| with gr.Tab("🔌 TraceMind-MCP-Server"): | |
| create_mcp_server_tab() | |
| with gr.Tab("☁️ Job Submission"): | |
| create_job_submission_tab() | |
| gr.Markdown(""" | |
| --- | |
| ### 💡 Quick Navigation | |
| - **Getting Started**: Start with the "About" tab for ecosystem overview | |
| - **Instrumentation**: See "TraceVerde" for adding observability to your agents | |
| - **Evaluation**: Check "SmolTrace" for running evaluations | |
| - **MCP Integration**: Explore "TraceMind-MCP-Server" for intelligent analysis | |
| ### 🔗 External Resources | |
| - [GitHub Organization](https://github.com/Mandark-droid) | |
| - [HuggingFace Spaces](https://huggingface.co/MCP-1st-Birthday) | |
| - [MCP Specification](https://modelcontextprotocol.io) | |
| *Built with ❤️ for MCP's 1st Birthday Hackathon* | |
| """) | |
| return documentation_interface | |
| if __name__ == "__main__": | |
| # For standalone testing | |
| with gr.Blocks() as demo: | |
| doc_screen = create_documentation_screen() | |
| # Make it visible for standalone testing | |
| doc_screen.visible = True | |
| demo.launch() | |