Commit e4b0c31
Parent(s): 0795b68

fix: Correct Gemini model name to 2.5 Flash

Files changed:
- README.md +11 -11
- app.py +8 -8
- gemini_client.py +1 -1
- mcp_tools.py +8 -8
README.md CHANGED

@@ -7,7 +7,7 @@ sdk: docker
 app_port: 7860
 pinned: true
 license: agpl-3.0
-short_description: MCP server for agent evaluation with Gemini 2.5 Pro
+short_description: MCP server for agent evaluation with Gemini 2.5 Flash
 tags:
 - building-mcp-track-enterprise
 - mcp

@@ -76,7 +76,7 @@ This MCP server is part of a complete agent evaluation ecosystem built on two fo
 2. **debug prompts**: Templates for debugging scenarios
 3. **optimization prompts**: Templates for optimization goals

-All analysis is powered by **Google Gemini 2.5 Pro** for intelligent, context-aware insights.
+All analysis is powered by **Google Gemini 2.5 Flash** for intelligent, context-aware insights.

 ## Quick Links

@@ -115,7 +115,7 @@ All analysis is powered by **Google Gemini 2.5 Pro** for intelligent, context-aware insights.
 - ✅ **Production-Ready**: Deployable to HuggingFace Spaces with SSE transport
 - ✅ **Testing Interface**: Beautiful Gradio UI for testing all components
 - ✅ **Enterprise Focus**: Cost optimization, debugging, decision support, and custom dataset generation
-- ✅ **Google Gemini Powered**: Leverages Gemini 2.5 Pro for intelligent analysis
+- ✅ **Google Gemini Powered**: Leverages Gemini 2.5 Flash for intelligent analysis
 - ✅ **17 Total Components**: 11 Tools + 3 Resources + 3 Prompts

 ### Eleven Production-Ready Tools

@@ -254,7 +254,7 @@ Loads SMOLTRACE datasets from HuggingFace and returns raw data as JSON:

 #### 8. generate_synthetic_dataset

-Generates domain-specific synthetic test datasets for SMOLTRACE evaluations using Google Gemini 2.5 Pro:
+Generates domain-specific synthetic test datasets for SMOLTRACE evaluations using Google Gemini 2.5 Flash:
 - AI-powered task generation tailored to your domain
 - Custom tool specifications
 - Configurable difficulty distribution (balanced, easy_only, medium_only, hard_only, progressive)

@@ -561,7 +561,7 @@ A: The MCP endpoint is publicly accessible. However, the tools may require Huggi
 ### Available MCP Components

 **Tools** (9):
-1. **analyze_leaderboard**: AI-powered leaderboard analysis with Gemini 2.5 Pro
+1. **analyze_leaderboard**: AI-powered leaderboard analysis with Gemini 2.5 Flash
 2. **debug_trace**: Trace debugging with AI insights
 3. **estimate_cost**: Cost estimation with optimization recommendations
 4. **compare_runs**: Compare two evaluation runs with AI-powered analysis

@@ -588,7 +588,7 @@ See full API documentation in the Gradio interface under "API Documentation
 ```
 TraceMind-mcp-server/
 ├── app.py # Gradio UI + MCP server (mcp_server=True)
-├── gemini_client.py # Google Gemini 2.5 Pro integration
+├── gemini_client.py # Google Gemini 2.5 Flash integration
 ├── mcp_tools.py # 7 tool implementations
 ├── requirements.txt # Python dependencies
 ├── .env.example # Environment variable template

@@ -598,7 +598,7 @@ TraceMind-mcp-server/

 **Key Technologies**:
 - **Gradio 6 with MCP support**: `gradio[mcp]` provides native MCP server capabilities
-- **Google Gemini 2.5 Pro**: Latest AI model for intelligent analysis
+- **Google Gemini 2.5 Flash**: Latest AI model for intelligent analysis
 - **HuggingFace Datasets**: Data source for evaluations
 - **Streamable HTTP Transport**: Modern streaming protocol for MCP communication (recommended)
 - **SSE Transport**: Server-Sent Events for legacy MCP compatibility (deprecated)

@@ -699,7 +699,7 @@ Note: This requires actual trace data from an evaluation run. For testing purpos
 - Integrates with enterprise data infrastructure (HuggingFace datasets)

 **Technology Stack**
-- **AI Analysis**: Google Gemini 2.5 Pro for all intelligent insights
+- **AI Analysis**: Google Gemini 2.5 Flash for all intelligent insights
 - **MCP Framework**: Gradio 6 with native MCP support
 - **Data Source**: HuggingFace Datasets
 - **Transport**: Streamable HTTP (recommended) and SSE (deprecated)

@@ -727,7 +727,7 @@ Main Gradio application with:
 - API documentation

 ### gemini_client.py
-Google Gemini 2.5 Pro client that:
+Google Gemini 2.5 Flash client that:
 - Handles API authentication
 - Provides specialized analysis methods for different data types
 - Formats prompts for optimal results

@@ -842,7 +842,7 @@ gemini_client = GeminiClient(model_name="gemini-2.5-flash-latest")
 Special thanks to the sponsors of **MCP's 1st Birthday Hackathon** (November 14-30, 2025):

 - **HuggingFace** - Hosting platform and dataset infrastructure
-- **Google Gemini** - AI analysis powered by Gemini 2.5 Pro API
+- **Google Gemini** - AI analysis powered by Gemini 2.5 Flash API
 - **Modal** - Serverless infrastructure partner
 - **Anthropic** - MCP protocol creators
 - **Gradio** - Native MCP framework support

@@ -901,7 +901,7 @@ For issues or questions:
 - 3 data resources (leaderboard, trace, cost data)
 - 3 prompt templates (analysis, debug, optimization)
 - Gradio native MCP support with decorators (`@gr.mcp.*`)
-- Google Gemini 2.5 Pro integration for all AI analysis
+- Google Gemini 2.5 Flash integration for all AI analysis
 - Live HuggingFace dataset integration
 - **Performance Optimizations**:
   - get_top_performers: 90% token reduction vs full leaderboard
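The README hunk for generate_synthetic_dataset lists five difficulty distributions (balanced, easy_only, medium_only, hard_only, progressive). One way such a setting could map task counts to difficulty levels is sketched below; the proportions and the `difficulty_counts` helper are assumptions for illustration, not the tool's documented behavior.

```python
# Hypothetical sketch of the difficulty distributions named in the README.
# The exact proportions are assumptions, not the tool's actual behavior.
def difficulty_counts(distribution: str, n_tasks: int) -> dict[str, int]:
    weights = {
        "balanced":    {"easy": 1, "medium": 1, "hard": 1},
        "easy_only":   {"easy": 1, "medium": 0, "hard": 0},
        "medium_only": {"easy": 0, "medium": 1, "hard": 0},
        "hard_only":   {"easy": 0, "medium": 0, "hard": 1},
        # "progressive": weighted toward easy tasks, tapering to hard.
        "progressive": {"easy": 3, "medium": 2, "hard": 1},
    }[distribution]
    total = sum(weights.values())
    counts = {k: (n_tasks * v) // total for k, v in weights.items()}
    # Fold any rounding remainder into "easy" so counts sum to n_tasks.
    counts["easy"] += n_tasks - sum(counts.values())
    return counts

print(difficulty_counts("balanced", 9))  # {'easy': 3, 'medium': 3, 'hard': 3}
```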
app.py CHANGED

@@ -22,7 +22,7 @@ Architecture:
 ↓ MCP Endpoint (Gradio SSE)
 ↓ TraceMind MCP Server (this file)
 ↓ Tools (mcp_tools.py)
-↓ Google Gemini 2.5 Pro API
+↓ Google Gemini 2.5 Flash API

 For Track 1: Building MCP Servers - Enterprise Category
 https://huggingface.co/MCP-1st-Birthday

@@ -139,7 +139,7 @@ def create_gradio_ui():
 #### TraceMind MCP Server (This Project)
 **Track 1: Building MCP (Enterprise)**
 - Provides AI-powered MCP tools for analyzing evaluation data
-- Uses Google Gemini 2.5 Pro for intelligent insights
+- Uses Google Gemini 2.5 Flash for intelligent insights
 - 11 tools + 3 resources + 3 prompts
 - [HF Space](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server)

@@ -170,7 +170,7 @@ def create_gradio_ui():

 TraceMind MCP Server provides intelligent analysis tools for agent evaluation data through the Model Context Protocol (MCP).

-**Powered by**: Google Gemini 2.5 Pro
+**Powered by**: Google Gemini 2.5 Flash

 **[Quick Demo (5 min)](https://www.loom.com/share/d4d0003f06fa4327b46ba5c081bdf835)** | **[Full Demo (20 min)](https://www.loom.com/share/de559bb0aef749559c79117b7f951250)**

@@ -265,7 +265,7 @@ def create_gradio_ui():
 Analyze agent evaluation leaderboard and generate AI-powered insights.

 This tool loads agent evaluation data from HuggingFace datasets and uses
-Google Gemini 2.5 Pro to provide intelligent analysis of top performers,
+Google Gemini 2.5 Flash to provide intelligent analysis of top performers,
 trends, cost/performance trade-offs, and actionable recommendations.

 Args:

@@ -327,7 +327,7 @@ def create_gradio_ui():
 Debug a specific agent execution trace using OpenTelemetry data.

 This tool analyzes OpenTelemetry trace data from agent executions and uses
-Google Gemini 2.5 Pro to answer specific questions about the execution flow,
+Google Gemini 2.5 Flash to answer specific questions about the execution flow,
 identify bottlenecks, explain agent behavior, and provide debugging insights.

 Args:

@@ -409,7 +409,7 @@ def create_gradio_ui():
 Estimate the cost, duration, and CO2 emissions of running agent evaluations.

 This tool predicts costs before running evaluations by calculating LLM API costs,
-HuggingFace Jobs compute costs, and CO2 emissions. Uses Google Gemini 2.5 Pro
+HuggingFace Jobs compute costs, and CO2 emissions. Uses Google Gemini 2.5 Flash
 to provide detailed cost breakdown and optimization recommendations.

 Args:

@@ -486,7 +486,7 @@ def create_gradio_ui():
 Compare two evaluation runs and generate AI-powered comparative analysis.

 This tool fetches data for two evaluation runs from the leaderboard and uses
-Google Gemini 2.5 Pro to provide intelligent comparison across multiple dimensions:
+Google Gemini 2.5 Flash to provide intelligent comparison across multiple dimensions:
 success rate, cost efficiency, speed, environmental impact, and use case recommendations.

 Args:

@@ -1683,7 +1683,7 @@ if __name__ == "__main__":
 logger.info(" ✓ 7 AI-Powered Tools (Leaderboard + Trace + Cost + Dataset)")
 logger.info(" ✓ 3 Real-Time Resources (leaderboard, trace, cost data)")
 logger.info(" ✓ 3 Prompt Templates (analysis, debug, optimization)")
-logger.info(" ✓ Google Gemini 2.5 Pro - Intelligent Analysis")
+logger.info(" ✓ Google Gemini 2.5 Flash - Intelligent Analysis")
 logger.info(" ✓ HuggingFace Dataset Integration")
 logger.info(" ✓ SMOLTRACE Format Support")
 logger.info(" ✓ Synthetic Dataset Generation")
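The estimate_cost docstring in this diff combines three quantities: LLM API cost, HuggingFace Jobs compute cost, and CO2 emissions. The arithmetic could look like the sketch below; every rate here is a made-up assumption for illustration (the tool's actual pricing tables and emission factors are not shown in this commit).

```python
# Hypothetical cost model for an evaluation run. All rates below are
# assumptions for illustration, not the tool's actual pricing data.
def estimate_cost(n_tasks: int, tokens_per_task: int,
                  usd_per_1k_tokens: float = 0.0003,   # assumed API rate
                  usd_per_compute_hour: float = 0.60,  # assumed HF Jobs rate
                  hours: float = 1.0,
                  kg_co2_per_hour: float = 0.02) -> dict[str, float]:
    # LLM cost scales with total tokens; compute and CO2 scale with wall time.
    llm_cost = n_tasks * tokens_per_task / 1000 * usd_per_1k_tokens
    compute_cost = hours * usd_per_compute_hour
    return {
        "llm_usd": round(llm_cost, 4),
        "compute_usd": round(compute_cost, 4),
        "total_usd": round(llm_cost + compute_cost, 4),
        "co2_kg": round(hours * kg_co2_per_hour, 4),
    }

print(estimate_cost(100, 2000))
```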
gemini_client.py CHANGED

@@ -1,7 +1,7 @@
 """
 Gemini Client for TraceMind MCP Server

-Handles all interactions with Google Gemini 2.5 Pro API
+Handles all interactions with Google Gemini 2.5 Flash API
 """

 import os
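This commit edits the same model-name string in dozens of docstrings, comments, and UI labels across four files, which is exactly the drift a single constant avoids. A minimal sketch of that pattern (the `GeminiClient` here is a hypothetical stand-in, not the project's actual class; the model string is taken from the diff context above):

```python
# Hypothetical sketch: keep the model name in one place so a rename like
# this commit's "Pro" -> "Flash" fix becomes a one-line change.
GEMINI_MODEL = "gemini-2.5-flash-latest"

class GeminiClient:
    """Stand-in client; stores the configured model name."""

    def __init__(self, model_name: str = GEMINI_MODEL):
        self.model_name = model_name

    def describe(self) -> str:
        # UI strings derive from the constant instead of hard-coding
        # "Gemini 2.5 Flash" in every file.
        return f"Powered by Google Gemini ({self.model_name})"

client = GeminiClient()
print(client.describe())  # Powered by Google Gemini (gemini-2.5-flash-latest)
```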
mcp_tools.py CHANGED

@@ -27,7 +27,7 @@ docstrings, and type hints.
 debug_prompt - Standardized templates for debugging scenarios
 optimization_prompt - Standardized templates for optimization goals

-All AI analysis powered by Google Gemini 2.5 Pro.
+All AI analysis powered by Google Gemini 2.5 Flash.
 Track 1: Building MCP Servers - Enterprise Category
 """

@@ -61,7 +61,7 @@ async def analyze_leaderboard(
 DO NOT use the leaderboard:// resource for questions - use this tool instead!
 The resource only returns raw JSON data without any analysis.

-This tool uses Google Gemini 2.5 Pro to provide intelligent analysis of
+This tool uses Google Gemini 2.5 Flash to provide intelligent analysis of
 agent evaluation results, including top performers, trends, cost/performance
 trade-offs, and actionable recommendations.

@@ -166,7 +166,7 @@ async def debug_trace(
 DO NOT use the trace:// resource for questions - use this tool instead!
 The resource only returns raw OTEL JSON data without any analysis.

-This tool uses Google Gemini 2.5 Pro to analyze OpenTelemetry trace data and
+This tool uses Google Gemini 2.5 Flash to analyze OpenTelemetry trace data and
 provide intelligent debugging insights, step-by-step breakdowns, and answers
 to specific questions about execution flow.

@@ -267,7 +267,7 @@ async def estimate_cost(
 DO NOT use the cost:// resource for estimates - use this tool instead!
 The resource only returns raw pricing tables without calculations.

-This tool uses Google Gemini 2.5 Pro to calculate LLM API costs, HuggingFace
+This tool uses Google Gemini 2.5 Flash to calculate LLM API costs, HuggingFace
 Jobs compute costs, CO2 emissions, and provide intelligent cost breakdowns with
 optimization recommendations.

@@ -490,7 +490,7 @@ async def compare_runs(
 Compare two evaluation runs and generate AI-powered comparative analysis.

 This tool fetches data for two evaluation runs from the leaderboard and uses
-Google Gemini 2.5 Pro to provide intelligent comparison across multiple dimensions:
+Google Gemini 2.5 Flash to provide intelligent comparison across multiple dimensions:
 success rate, cost efficiency, speed, environmental impact, and use case recommendations.

 Args:

@@ -693,7 +693,7 @@ async def analyze_results(
 - Analyze which types of tasks work well vs poorly

 This tool analyzes individual test case results (not aggregate leaderboard data)
-and uses Google Gemini 2.5 Pro to provide actionable optimization recommendations.
+and uses Google Gemini 2.5 Flash to provide actionable optimization recommendations.

 Args:
 results_repo (str): HuggingFace dataset repository containing results (e.g., "username/smoltrace-results-gpt4-20251114")

@@ -1462,7 +1462,7 @@ async def generate_synthetic_dataset(
 """
 Generate domain-specific synthetic test datasets for SMOLTRACE evaluations using AI.

-This tool uses Google Gemini 2.5 Pro to create realistic, domain-specific evaluation
+This tool uses Google Gemini 2.5 Flash to create realistic, domain-specific evaluation
 tasks that follow the SMOLTRACE task dataset format. Perfect for creating custom
 benchmarks when standard datasets don't fit your use case.

@@ -2158,7 +2158,7 @@ Start your response with the YAML content immediately."""
 "agent_type": agent_type,
 "template_name": template_name,
 "base_template_url": template_url,
-"customization_method": "Google Gemini 2.5 Pro"
+"customization_method": "Google Gemini 2.5 Flash"
 },
 "prompt_template": customized_template,
 "usage_instructions": f"""
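The compare_runs docstring in this diff names four comparison dimensions: success rate, cost efficiency, speed, and environmental impact. A hedged sketch of how two runs could be tabulated across those dimensions (the metric keys and run records below are assumptions, not the tool's actual data schema):

```python
# Hypothetical per-dimension comparison of two evaluation runs; metric
# names are assumptions based on the dimensions in the docstring.
def compare_runs(run_a: dict, run_b: dict) -> dict[str, str]:
    # success_rate: higher is better; cost, duration, CO2: lower is better.
    higher_is_better = {"success_rate": True, "cost_usd": False,
                        "duration_s": False, "co2_kg": False}
    winners = {}
    for metric, higher in higher_is_better.items():
        a, b = run_a[metric], run_b[metric]
        if a == b:
            winners[metric] = "tie"
        else:
            winners[metric] = "A" if (a > b) == higher else "B"
    return winners

run_a = {"success_rate": 0.92, "cost_usd": 1.40, "duration_s": 310, "co2_kg": 0.05}
run_b = {"success_rate": 0.88, "cost_usd": 0.90, "duration_s": 280, "co2_kg": 0.04}
print(compare_runs(run_a, run_b))
# {'success_rate': 'A', 'cost_usd': 'B', 'duration_s': 'B', 'co2_kg': 'B'}
```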