kshitijthakkar committed
Commit e4b0c31 · 1 Parent(s): 0795b68

fix: Correct Gemini model name to 2.5 Flash

Files changed (4)
  1. README.md +11 -11
  2. app.py +8 -8
  3. gemini_client.py +1 -1
  4. mcp_tools.py +8 -8
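All four files receive the same substitution of "2.5 Pro" with "2.5 Flash". The following Python check is a minimal sketch, not part of this commit, that could confirm a sweep like this left no stale mentions behind. The regex, the function names `find_stale`/`scan_repo`, and the assumption that the files sit in the current directory are illustrative only.

```python
import re
from pathlib import Path

# Hypothetical pattern for the old display name that this commit replaces.
STALE = re.compile(r"Gemini 2\.5 Pro")


def find_stale(texts: dict[str, str]) -> list[tuple[str, int, str]]:
    """Return (filename, 1-based line number, stripped line) for each stale mention."""
    hits = []
    for name, text in texts.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if STALE.search(line):
                hits.append((name, lineno, line.strip()))
    return hits


def scan_repo(root: str, names: list[str]) -> list[tuple[str, int, str]]:
    """Read the named files under root, skipping any that are missing, and scan them."""
    texts = {}
    for name in names:
        path = Path(root) / name
        if path.is_file():
            texts[name] = path.read_text(encoding="utf-8")
    return find_stale(texts)


if __name__ == "__main__":
    # File list taken from this commit; assumed to be run from the repository root.
    files = ["README.md", "app.py", "gemini_client.py", "mcp_tools.py"]
    for name, lineno, line in scan_repo(".", files):
        print(f"{name}:{lineno}: {line}")
```

An empty result means no line in those files still mentions the old name. Plain `grep -n "Gemini 2.5 Pro"` does the same job from a shell.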
README.md CHANGED
@@ -7,7 +7,7 @@ sdk: docker
  app_port: 7860
  pinned: true
  license: agpl-3.0
- short_description: MCP server for agent evaluation with Gemini 2.5 Pro
+ short_description: MCP server for agent evaluation with Gemini 2.5 Flash
  tags:
  - building-mcp-track-enterprise
  - mcp
@@ -76,7 +76,7 @@ This MCP server is part of a complete agent evaluation ecosystem built on two fo
  2. **debug prompts**: Templates for debugging scenarios
  3. **optimization prompts**: Templates for optimization goals
 
- All analysis is powered by **Google Gemini 2.5 Pro** for intelligent, context-aware insights.
+ All analysis is powered by **Google Gemini 2.5 Flash** for intelligent, context-aware insights.
 
  ## 🔗 Quick Links
 
@@ -115,7 +115,7 @@ All analysis is powered by **Google Gemini 2.5 Pro** for intelligent, context-aw
  - ✅ **Production-Ready**: Deployable to HuggingFace Spaces with SSE transport
  - ✅ **Testing Interface**: Beautiful Gradio UI for testing all components
  - ✅ **Enterprise Focus**: Cost optimization, debugging, decision support, and custom dataset generation
- - ✅ **Google Gemini Powered**: Leverages Gemini 2.5 Pro for intelligent analysis
+ - ✅ **Google Gemini Powered**: Leverages Gemini 2.5 Flash for intelligent analysis
  - ✅ **17 Total Components**: 11 Tools + 3 Resources + 3 Prompts
 
  ### 🛠️ Eleven Production-Ready Tools
@@ -254,7 +254,7 @@ Loads SMOLTRACE datasets from HuggingFace and returns raw data as JSON:
 
  #### 8. generate_synthetic_dataset
 
- Generates domain-specific synthetic test datasets for SMOLTRACE evaluations using Google Gemini 2.5 Pro:
+ Generates domain-specific synthetic test datasets for SMOLTRACE evaluations using Google Gemini 2.5 Flash:
  - AI-powered task generation tailored to your domain
  - Custom tool specifications
  - Configurable difficulty distribution (balanced, easy_only, medium_only, hard_only, progressive)
@@ -561,7 +561,7 @@ A: The MCP endpoint is publicly accessible. However, the tools may require Huggi
  ### Available MCP Components
 
  **Tools** (9):
- 1. **analyze_leaderboard**: AI-powered leaderboard analysis with Gemini 2.5 Pro
+ 1. **analyze_leaderboard**: AI-powered leaderboard analysis with Gemini 2.5 Flash
  2. **debug_trace**: Trace debugging with AI insights
  3. **estimate_cost**: Cost estimation with optimization recommendations
  4. **compare_runs**: Compare two evaluation runs with AI-powered analysis
@@ -588,7 +588,7 @@ See full API documentation in the Gradio interface under "📖 API Documentation
  ```
  TraceMind-mcp-server/
  ├── app.py # Gradio UI + MCP server (mcp_server=True)
- ├── gemini_client.py # Google Gemini 2.5 Pro integration
+ ├── gemini_client.py # Google Gemini 2.5 Flash integration
  ├── mcp_tools.py # 7 tool implementations
  ├── requirements.txt # Python dependencies
  ├── .env.example # Environment variable template
@@ -598,7 +598,7 @@ TraceMind-mcp-server/
 
  **Key Technologies**:
  - **Gradio 6 with MCP support**: `gradio[mcp]` provides native MCP server capabilities
- - **Google Gemini 2.5 Pro**: Latest AI model for intelligent analysis
+ - **Google Gemini 2.5 Flash**: Latest AI model for intelligent analysis
  - **HuggingFace Datasets**: Data source for evaluations
  - **Streamable HTTP Transport**: Modern streaming protocol for MCP communication (recommended)
  - **SSE Transport**: Server-Sent Events for legacy MCP compatibility (deprecated)
@@ -699,7 +699,7 @@ Note: This requires actual trace data from an evaluation run. For testing purpos
  - Integrates with enterprise data infrastructure (HuggingFace datasets)
 
  **Technology Stack**
- - **AI Analysis**: Google Gemini 2.5 Pro for all intelligent insights
+ - **AI Analysis**: Google Gemini 2.5 Flash for all intelligent insights
  - **MCP Framework**: Gradio 6 with native MCP support
  - **Data Source**: HuggingFace Datasets
  - **Transport**: Streamable HTTP (recommended) and SSE (deprecated)
@@ -727,7 +727,7 @@ Main Gradio application with:
  - API documentation
 
  ### gemini_client.py
- Google Gemini 2.5 Pro client that:
+ Google Gemini 2.5 Flash client that:
  - Handles API authentication
  - Provides specialized analysis methods for different data types
  - Formats prompts for optimal results
@@ -842,7 +842,7 @@ gemini_client = GeminiClient(model_name="gemini-2.5-flash-latest")
  Special thanks to the sponsors of **MCP's 1st Birthday Hackathon** (November 14-30, 2025):
 
  - **🤗 HuggingFace** - Hosting platform and dataset infrastructure
- - **🧠 Google Gemini** - AI analysis powered by Gemini 2.5 Pro API
+ - **🧠 Google Gemini** - AI analysis powered by Gemini 2.5 Flash API
  - **⚑ Modal** - Serverless infrastructure partner
  - **🏒 Anthropic** - MCP protocol creators
  - **🎨 Gradio** - Native MCP framework support
@@ -901,7 +901,7 @@ For issues or questions:
  - 3 data resources (leaderboard, trace, cost data)
  - 3 prompt templates (analysis, debug, optimization)
  - Gradio native MCP support with decorators (`@gr.mcp.*`)
- - Google Gemini 2.5 Pro integration for all AI analysis
+ - Google Gemini 2.5 Flash integration for all AI analysis
  - Live HuggingFace dataset integration
  - **Performance Optimizations**:
  - get_top_performers: 90% token reduction vs full leaderboard
app.py CHANGED
@@ -22,7 +22,7 @@ Architecture:
  → MCP Endpoint (Gradio SSE)
  → TraceMind MCP Server (this file)
  → Tools (mcp_tools.py)
- → Google Gemini 2.5 Pro API
+ → Google Gemini 2.5 Flash API
 
  For Track 1: Building MCP Servers - Enterprise Category
  https://huggingface.co/MCP-1st-Birthday
@@ -139,7 +139,7 @@ def create_gradio_ui():
  #### 🤖 TraceMind MCP Server (This Project)
  **Track 1: Building MCP (Enterprise)**
  - Provides AI-powered MCP tools for analyzing evaluation data
- - Uses Google Gemini 2.5 Pro for intelligent insights
+ - Uses Google Gemini 2.5 Flash for intelligent insights
  - 11 tools + 3 resources + 3 prompts
  - [HF Space](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server)
 
@@ -170,7 +170,7 @@ def create_gradio_ui():
 
  TraceMind MCP Server provides intelligent analysis tools for agent evaluation data through the Model Context Protocol (MCP).
 
- **Powered by**: Google Gemini 2.5 Pro
+ **Powered by**: Google Gemini 2.5 Flash
 
  **🎬 [Quick Demo (5 min)](https://www.loom.com/share/d4d0003f06fa4327b46ba5c081bdf835)** | **📺 [Full Demo (20 min)](https://www.loom.com/share/de559bb0aef749559c79117b7f951250)**
 
@@ -265,7 +265,7 @@ def create_gradio_ui():
  Analyze agent evaluation leaderboard and generate AI-powered insights.
 
  This tool loads agent evaluation data from HuggingFace datasets and uses
- Google Gemini 2.5 Pro to provide intelligent analysis of top performers,
+ Google Gemini 2.5 Flash to provide intelligent analysis of top performers,
  trends, cost/performance trade-offs, and actionable recommendations.
 
  Args:
@@ -327,7 +327,7 @@ def create_gradio_ui():
  Debug a specific agent execution trace using OpenTelemetry data.
 
  This tool analyzes OpenTelemetry trace data from agent executions and uses
- Google Gemini 2.5 Pro to answer specific questions about the execution flow,
+ Google Gemini 2.5 Flash to answer specific questions about the execution flow,
  identify bottlenecks, explain agent behavior, and provide debugging insights.
 
  Args:
@@ -409,7 +409,7 @@ def create_gradio_ui():
  Estimate the cost, duration, and CO2 emissions of running agent evaluations.
 
  This tool predicts costs before running evaluations by calculating LLM API costs,
- HuggingFace Jobs compute costs, and CO2 emissions. Uses Google Gemini 2.5 Pro
+ HuggingFace Jobs compute costs, and CO2 emissions. Uses Google Gemini 2.5 Flash
  to provide detailed cost breakdown and optimization recommendations.
 
  Args:
@@ -486,7 +486,7 @@ def create_gradio_ui():
  Compare two evaluation runs and generate AI-powered comparative analysis.
 
  This tool fetches data for two evaluation runs from the leaderboard and uses
- Google Gemini 2.5 Pro to provide intelligent comparison across multiple dimensions:
+ Google Gemini 2.5 Flash to provide intelligent comparison across multiple dimensions:
  success rate, cost efficiency, speed, environmental impact, and use case recommendations.
 
  Args:
@@ -1683,7 +1683,7 @@ if __name__ == "__main__":
  logger.info(" ✓ 7 AI-Powered Tools (Leaderboard + Trace + Cost + Dataset)")
  logger.info(" ✓ 3 Real-Time Resources (leaderboard, trace, cost data)")
  logger.info(" ✓ 3 Prompt Templates (analysis, debug, optimization)")
- logger.info(" ✓ Google Gemini 2.5 Pro - Intelligent Analysis")
+ logger.info(" ✓ Google Gemini 2.5 Flash - Intelligent Analysis")
  logger.info(" ✓ HuggingFace Dataset Integration")
  logger.info(" ✓ SMOLTRACE Format Support")
  logger.info(" ✓ Synthetic Dataset Generation")
gemini_client.py CHANGED
@@ -1,7 +1,7 @@
  """
  Gemini Client for TraceMind MCP Server
 
- Handles all interactions with Google Gemini 2.5 Pro API
+ Handles all interactions with Google Gemini 2.5 Flash API
  """
 
  import os
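Every hunk in this commit edits a hand-written display string. The README hunk header, however, shows the client being configured with `GeminiClient(model_name="gemini-2.5-flash-latest")`. The code below is a hedged sketch of deriving display text from the configured model id so that strings like the `app.py` log banner cannot drift from it. `display_name`, `banner_line`, and `MODEL_ID` are hypothetical names, not part of TraceMind.

```python
def display_name(model_id: str) -> str:
    """Derive a readable name, e.g. 'gemini-2.5-flash' -> 'Gemini 2.5 Flash'.

    Drops any 'latest' alias part and empty parts; parts starting with a digit
    (such as the version '2.5') are kept verbatim, others are capitalized.
    """
    parts = [p for p in model_id.split("-") if p and p != "latest"]
    return " ".join(p if p[0].isdigit() else p.capitalize() for p in parts)


# Hypothetical single source of truth; the real client may store this differently.
MODEL_ID = "gemini-2.5-flash-latest"


def banner_line(model_id: str = MODEL_ID) -> str:
    # Mirrors the wording of the hard-coded logger.info line in app.py.
    return f" ✓ Google {display_name(model_id)} - Intelligent Analysis"


if __name__ == "__main__":
    print(banner_line())
```

Docstrings cannot be generated this way, because an f-string in docstring position is not treated as a docstring. Text like the `mcp_tools.py` docstrings would still need manual edits when the model changes.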
mcp_tools.py CHANGED
@@ -27,7 +27,7 @@ docstrings, and type hints.
  debug_prompt - Standardized templates for debugging scenarios
  optimization_prompt - Standardized templates for optimization goals
 
- All AI analysis powered by Google Gemini 2.5 Pro.
+ All AI analysis powered by Google Gemini 2.5 Flash.
  Track 1: Building MCP Servers - Enterprise Category
  """
 
@@ -61,7 +61,7 @@ async def analyze_leaderboard(
  DO NOT use the leaderboard:// resource for questions - use this tool instead!
  The resource only returns raw JSON data without any analysis.
 
- This tool uses Google Gemini 2.5 Pro to provide intelligent analysis of
+ This tool uses Google Gemini 2.5 Flash to provide intelligent analysis of
  agent evaluation results, including top performers, trends, cost/performance
  trade-offs, and actionable recommendations.
 
@@ -166,7 +166,7 @@ async def debug_trace(
  DO NOT use the trace:// resource for questions - use this tool instead!
  The resource only returns raw OTEL JSON data without any analysis.
 
- This tool uses Google Gemini 2.5 Pro to analyze OpenTelemetry trace data and
+ This tool uses Google Gemini 2.5 Flash to analyze OpenTelemetry trace data and
  provide intelligent debugging insights, step-by-step breakdowns, and answers
  to specific questions about execution flow.
 
@@ -267,7 +267,7 @@ async def estimate_cost(
  DO NOT use the cost:// resource for estimates - use this tool instead!
  The resource only returns raw pricing tables without calculations.
 
- This tool uses Google Gemini 2.5 Pro to calculate LLM API costs, HuggingFace
+ This tool uses Google Gemini 2.5 Flash to calculate LLM API costs, HuggingFace
  Jobs compute costs, CO2 emissions, and provide intelligent cost breakdowns with
  optimization recommendations.
 
@@ -490,7 +490,7 @@ async def compare_runs(
  Compare two evaluation runs and generate AI-powered comparative analysis.
 
  This tool fetches data for two evaluation runs from the leaderboard and uses
- Google Gemini 2.5 Pro to provide intelligent comparison across multiple dimensions:
+ Google Gemini 2.5 Flash to provide intelligent comparison across multiple dimensions:
  success rate, cost efficiency, speed, environmental impact, and use case recommendations.
 
  Args:
@@ -693,7 +693,7 @@ async def analyze_results(
  - Analyze which types of tasks work well vs poorly
 
  This tool analyzes individual test case results (not aggregate leaderboard data)
- and uses Google Gemini 2.5 Pro to provide actionable optimization recommendations.
+ and uses Google Gemini 2.5 Flash to provide actionable optimization recommendations.
 
  Args:
  results_repo (str): HuggingFace dataset repository containing results (e.g., "username/smoltrace-results-gpt4-20251114")
@@ -1462,7 +1462,7 @@ async def generate_synthetic_dataset(
  """
  Generate domain-specific synthetic test datasets for SMOLTRACE evaluations using AI.
 
- This tool uses Google Gemini 2.5 Pro to create realistic, domain-specific evaluation
+ This tool uses Google Gemini 2.5 Flash to create realistic, domain-specific evaluation
  tasks that follow the SMOLTRACE task dataset format. Perfect for creating custom
  benchmarks when standard datasets don't fit your use case.
 
@@ -2158,7 +2158,7 @@ Start your response with the YAML content immediately."""
  "agent_type": agent_type,
  "template_name": template_name,
  "base_template_url": template_url,
- "customization_method": "Google Gemini 2.5 Pro"
+ "customization_method": "Google Gemini 2.5 Flash"
  },
  "prompt_template": customized_template,
  "usage_instructions": f"""