Commit e4b0c31
Parent(s): 0795b68

fix: Correct Gemini model name to 2.5 Flash

Files changed:
- README.md +11 -11
- app.py +8 -8
- gemini_client.py +1 -1
- mcp_tools.py +8 -8
README.md CHANGED

@@ -7,7 +7,7 @@ sdk: docker
 app_port: 7860
 pinned: true
 license: agpl-3.0
-short_description: MCP server for agent evaluation with Gemini 2.5 Pro
+short_description: MCP server for agent evaluation with Gemini 2.5 Flash
 tags:
 - building-mcp-track-enterprise
 - mcp

@@ -76,7 +76,7 @@ This MCP server is part of a complete agent evaluation ecosystem built on two fo
 2. **debug prompts**: Templates for debugging scenarios
 3. **optimization prompts**: Templates for optimization goals

-All analysis is powered by **Google Gemini 2.5 Pro** for intelligent, context-aware insights.
+All analysis is powered by **Google Gemini 2.5 Flash** for intelligent, context-aware insights.

 ## Quick Links

@@ -115,7 +115,7 @@ All analysis is powered by **Google Gemini 2.5 Pro** for intelligent, context-aware insights.
 - ✅ **Production-Ready**: Deployable to HuggingFace Spaces with SSE transport
 - ✅ **Testing Interface**: Beautiful Gradio UI for testing all components
 - ✅ **Enterprise Focus**: Cost optimization, debugging, decision support, and custom dataset generation
-- ✅ **Google Gemini Powered**: Leverages Gemini 2.5 Pro for intelligent analysis
+- ✅ **Google Gemini Powered**: Leverages Gemini 2.5 Flash for intelligent analysis
 - ✅ **17 Total Components**: 11 Tools + 3 Resources + 3 Prompts

 ### Eleven Production-Ready Tools

@@ -254,7 +254,7 @@ Loads SMOLTRACE datasets from HuggingFace and returns raw data as JSON:

 #### 8. generate_synthetic_dataset

-Generates domain-specific synthetic test datasets for SMOLTRACE evaluations using Google Gemini 2.5 Pro:
+Generates domain-specific synthetic test datasets for SMOLTRACE evaluations using Google Gemini 2.5 Flash:
 - AI-powered task generation tailored to your domain
 - Custom tool specifications
 - Configurable difficulty distribution (balanced, easy_only, medium_only, hard_only, progressive)

@@ -561,7 +561,7 @@ A: The MCP endpoint is publicly accessible. However, the tools may require Huggi
 ### Available MCP Components

 **Tools** (9):
-1. **analyze_leaderboard**: AI-powered leaderboard analysis with Gemini 2.5 Pro
+1. **analyze_leaderboard**: AI-powered leaderboard analysis with Gemini 2.5 Flash
 2. **debug_trace**: Trace debugging with AI insights
 3. **estimate_cost**: Cost estimation with optimization recommendations
 4. **compare_runs**: Compare two evaluation runs with AI-powered analysis

@@ -588,7 +588,7 @@ See full API documentation in the Gradio interface under "API Documentation
 ```
 TraceMind-mcp-server/
 ├── app.py # Gradio UI + MCP server (mcp_server=True)
-├── gemini_client.py # Google Gemini 2.5 Pro integration
+├── gemini_client.py # Google Gemini 2.5 Flash integration
 ├── mcp_tools.py # 7 tool implementations
 ├── requirements.txt # Python dependencies
 ├── .env.example # Environment variable template

@@ -598,7 +598,7 @@ TraceMind-mcp-server/

 **Key Technologies**:
 - **Gradio 6 with MCP support**: `gradio[mcp]` provides native MCP server capabilities
-- **Google Gemini 2.5 Pro**: Latest AI model for intelligent analysis
+- **Google Gemini 2.5 Flash**: Latest AI model for intelligent analysis
 - **HuggingFace Datasets**: Data source for evaluations
 - **Streamable HTTP Transport**: Modern streaming protocol for MCP communication (recommended)
 - **SSE Transport**: Server-Sent Events for legacy MCP compatibility (deprecated)

@@ -699,7 +699,7 @@ Note: This requires actual trace data from an evaluation run. For testing purpos
 - Integrates with enterprise data infrastructure (HuggingFace datasets)

 **Technology Stack**
-- **AI Analysis**: Google Gemini 2.5 Pro for all intelligent insights
+- **AI Analysis**: Google Gemini 2.5 Flash for all intelligent insights
 - **MCP Framework**: Gradio 6 with native MCP support
 - **Data Source**: HuggingFace Datasets
 - **Transport**: Streamable HTTP (recommended) and SSE (deprecated)

@@ -727,7 +727,7 @@ Main Gradio application with:
 - API documentation

 ### gemini_client.py
-Google Gemini 2.5 Pro client that:
+Google Gemini 2.5 Flash client that:
 - Handles API authentication
 - Provides specialized analysis methods for different data types
 - Formats prompts for optimal results

@@ -842,7 +842,7 @@ gemini_client = GeminiClient(model_name="gemini-2.5-flash-latest")
 Special thanks to the sponsors of **MCP's 1st Birthday Hackathon** (November 14-30, 2025):

 - **HuggingFace** - Hosting platform and dataset infrastructure
-- **Google Gemini** - AI analysis powered by Gemini 2.5 Pro API
+- **Google Gemini** - AI analysis powered by Gemini 2.5 Flash API
 - **Modal** - Serverless infrastructure partner
 - **Anthropic** - MCP protocol creators
 - **Gradio** - Native MCP framework support

@@ -901,7 +901,7 @@ For issues or questions:
 - 3 data resources (leaderboard, trace, cost data)
 - 3 prompt templates (analysis, debug, optimization)
 - Gradio native MCP support with decorators (`@gr.mcp.*`)
-- Google Gemini 2.5 Pro integration for all AI analysis
+- Google Gemini 2.5 Flash integration for all AI analysis
 - Live HuggingFace dataset integration
 - **Performance Optimizations**:
   - get_top_performers: 90% token reduction vs full leaderboard
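The README hunk for generate_synthetic_dataset lists five difficulty distributions (balanced, easy_only, medium_only, hard_only, progressive). One way such a setting could map task counts to difficulty levels is sketched below; the proportions and the `difficulty_counts` helper are assumptions for illustration, not the tool's documented behavior.

```python
# Hypothetical sketch of the difficulty distributions named in the README.
# The exact proportions are assumptions, not the tool's actual behavior.
def difficulty_counts(distribution: str, n_tasks: int) -> dict[str, int]:
    weights = {
        "balanced":    {"easy": 1, "medium": 1, "hard": 1},
        "easy_only":   {"easy": 1, "medium": 0, "hard": 0},
        "medium_only": {"easy": 0, "medium": 1, "hard": 0},
        "hard_only":   {"easy": 0, "medium": 0, "hard": 1},
        # "progressive": weighted toward easy tasks, tapering to hard.
        "progressive": {"easy": 3, "medium": 2, "hard": 1},
    }[distribution]
    total = sum(weights.values())
    counts = {k: (n_tasks * v) // total for k, v in weights.items()}
    # Fold any rounding remainder into "easy" so counts sum to n_tasks.
    counts["easy"] += n_tasks - sum(counts.values())
    return counts

print(difficulty_counts("balanced", 9))  # {'easy': 3, 'medium': 3, 'hard': 3}
```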
app.py CHANGED

@@ -22,7 +22,7 @@ Architecture:
 ↓ MCP Endpoint (Gradio SSE)
 ↓ TraceMind MCP Server (this file)
 ↓ Tools (mcp_tools.py)
-↓ Google Gemini 2.5 Pro API
+↓ Google Gemini 2.5 Flash API

 For Track 1: Building MCP Servers - Enterprise Category
 https://huggingface.co/MCP-1st-Birthday

@@ -139,7 +139,7 @@ def create_gradio_ui():
 #### TraceMind MCP Server (This Project)
 **Track 1: Building MCP (Enterprise)**
 - Provides AI-powered MCP tools for analyzing evaluation data
-- Uses Google Gemini 2.5 Pro for intelligent insights
+- Uses Google Gemini 2.5 Flash for intelligent insights
 - 11 tools + 3 resources + 3 prompts
 - [HF Space](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server)

@@ -170,7 +170,7 @@ def create_gradio_ui():

 TraceMind MCP Server provides intelligent analysis tools for agent evaluation data through the Model Context Protocol (MCP).

-**Powered by**: Google Gemini 2.5 Pro
+**Powered by**: Google Gemini 2.5 Flash

 **[Quick Demo (5 min)](https://www.loom.com/share/d4d0003f06fa4327b46ba5c081bdf835)** | **[Full Demo (20 min)](https://www.loom.com/share/de559bb0aef749559c79117b7f951250)**

@@ -265,7 +265,7 @@ def create_gradio_ui():
 Analyze agent evaluation leaderboard and generate AI-powered insights.

 This tool loads agent evaluation data from HuggingFace datasets and uses
-Google Gemini 2.5 Pro to provide intelligent analysis of top performers,
+Google Gemini 2.5 Flash to provide intelligent analysis of top performers,
 trends, cost/performance trade-offs, and actionable recommendations.

 Args:

@@ -327,7 +327,7 @@ def create_gradio_ui():
 Debug a specific agent execution trace using OpenTelemetry data.

 This tool analyzes OpenTelemetry trace data from agent executions and uses
-Google Gemini 2.5 Pro to answer specific questions about the execution flow,
+Google Gemini 2.5 Flash to answer specific questions about the execution flow,
 identify bottlenecks, explain agent behavior, and provide debugging insights.

 Args:

@@ -409,7 +409,7 @@ def create_gradio_ui():
 Estimate the cost, duration, and CO2 emissions of running agent evaluations.

 This tool predicts costs before running evaluations by calculating LLM API costs,
-HuggingFace Jobs compute costs, and CO2 emissions. Uses Google Gemini 2.5 Pro
+HuggingFace Jobs compute costs, and CO2 emissions. Uses Google Gemini 2.5 Flash
 to provide detailed cost breakdown and optimization recommendations.

 Args:

@@ -486,7 +486,7 @@ def create_gradio_ui():
 Compare two evaluation runs and generate AI-powered comparative analysis.

 This tool fetches data for two evaluation runs from the leaderboard and uses
-Google Gemini 2.5 Pro to provide intelligent comparison across multiple dimensions:
+Google Gemini 2.5 Flash to provide intelligent comparison across multiple dimensions:
 success rate, cost efficiency, speed, environmental impact, and use case recommendations.

 Args:

@@ -1683,7 +1683,7 @@ if __name__ == "__main__":
 logger.info(" ✓ 7 AI-Powered Tools (Leaderboard + Trace + Cost + Dataset)")
 logger.info(" ✓ 3 Real-Time Resources (leaderboard, trace, cost data)")
 logger.info(" ✓ 3 Prompt Templates (analysis, debug, optimization)")
-logger.info(" ✓ Google Gemini 2.5 Pro - Intelligent Analysis")
+logger.info(" ✓ Google Gemini 2.5 Flash - Intelligent Analysis")
 logger.info(" ✓ HuggingFace Dataset Integration")
 logger.info(" ✓ SMOLTRACE Format Support")
 logger.info(" ✓ Synthetic Dataset Generation")
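The estimate_cost docstring in this diff combines three quantities: LLM API cost, HuggingFace Jobs compute cost, and CO2 emissions. The arithmetic could look like the sketch below; every rate here is a made-up assumption for illustration (the tool's actual pricing tables and emission factors are not shown in this commit).

```python
# Hypothetical cost model for an evaluation run. All rates below are
# assumptions for illustration, not the tool's actual pricing data.
def estimate_cost(n_tasks: int, tokens_per_task: int,
                  usd_per_1k_tokens: float = 0.0003,   # assumed API rate
                  usd_per_compute_hour: float = 0.60,  # assumed HF Jobs rate
                  hours: float = 1.0,
                  kg_co2_per_hour: float = 0.02) -> dict[str, float]:
    # LLM cost scales with total tokens; compute and CO2 scale with wall time.
    llm_cost = n_tasks * tokens_per_task / 1000 * usd_per_1k_tokens
    compute_cost = hours * usd_per_compute_hour
    return {
        "llm_usd": round(llm_cost, 4),
        "compute_usd": round(compute_cost, 4),
        "total_usd": round(llm_cost + compute_cost, 4),
        "co2_kg": round(hours * kg_co2_per_hour, 4),
    }

print(estimate_cost(100, 2000))
```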
gemini_client.py CHANGED

@@ -1,7 +1,7 @@
 """
 Gemini Client for TraceMind MCP Server

-Handles all interactions with Google Gemini 2.5 Pro API
+Handles all interactions with Google Gemini 2.5 Flash API
 """

 import os
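This commit edits the same model-name string in dozens of docstrings, comments, and UI labels across four files, which is exactly the drift a single constant avoids. A minimal sketch of that pattern (the `GeminiClient` here is a hypothetical stand-in, not the project's actual class; the model string is taken from the diff context above):

```python
# Hypothetical sketch: keep the model name in one place so a rename like
# this commit's "Pro" -> "Flash" fix becomes a one-line change.
GEMINI_MODEL = "gemini-2.5-flash-latest"

class GeminiClient:
    """Stand-in client; stores the configured model name."""

    def __init__(self, model_name: str = GEMINI_MODEL):
        self.model_name = model_name

    def describe(self) -> str:
        # UI strings derive from the constant instead of hard-coding
        # "Gemini 2.5 Flash" in every file.
        return f"Powered by Google Gemini ({self.model_name})"

client = GeminiClient()
print(client.describe())  # Powered by Google Gemini (gemini-2.5-flash-latest)
```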
mcp_tools.py CHANGED

@@ -27,7 +27,7 @@ docstrings, and type hints.
 debug_prompt - Standardized templates for debugging scenarios
 optimization_prompt - Standardized templates for optimization goals

-All AI analysis powered by Google Gemini 2.5 Pro.
+All AI analysis powered by Google Gemini 2.5 Flash.
 Track 1: Building MCP Servers - Enterprise Category
 """

@@ -61,7 +61,7 @@ async def analyze_leaderboard(
 DO NOT use the leaderboard:// resource for questions - use this tool instead!
 The resource only returns raw JSON data without any analysis.

-This tool uses Google Gemini 2.5 Pro to provide intelligent analysis of
+This tool uses Google Gemini 2.5 Flash to provide intelligent analysis of
 agent evaluation results, including top performers, trends, cost/performance
 trade-offs, and actionable recommendations.

@@ -166,7 +166,7 @@ async def debug_trace(
 DO NOT use the trace:// resource for questions - use this tool instead!
 The resource only returns raw OTEL JSON data without any analysis.

-This tool uses Google Gemini 2.5 Pro to analyze OpenTelemetry trace data and
+This tool uses Google Gemini 2.5 Flash to analyze OpenTelemetry trace data and
 provide intelligent debugging insights, step-by-step breakdowns, and answers
 to specific questions about execution flow.

@@ -267,7 +267,7 @@ async def estimate_cost(
 DO NOT use the cost:// resource for estimates - use this tool instead!
 The resource only returns raw pricing tables without calculations.

-This tool uses Google Gemini 2.5 Pro to calculate LLM API costs, HuggingFace
+This tool uses Google Gemini 2.5 Flash to calculate LLM API costs, HuggingFace
 Jobs compute costs, CO2 emissions, and provide intelligent cost breakdowns with
 optimization recommendations.

@@ -490,7 +490,7 @@ async def compare_runs(
 Compare two evaluation runs and generate AI-powered comparative analysis.

 This tool fetches data for two evaluation runs from the leaderboard and uses
-Google Gemini 2.5 Pro to provide intelligent comparison across multiple dimensions:
+Google Gemini 2.5 Flash to provide intelligent comparison across multiple dimensions:
 success rate, cost efficiency, speed, environmental impact, and use case recommendations.

 Args:

@@ -693,7 +693,7 @@ async def analyze_results(
 - Analyze which types of tasks work well vs poorly

 This tool analyzes individual test case results (not aggregate leaderboard data)
-and uses Google Gemini 2.5 Pro to provide actionable optimization recommendations.
+and uses Google Gemini 2.5 Flash to provide actionable optimization recommendations.

 Args:
 results_repo (str): HuggingFace dataset repository containing results (e.g., "username/smoltrace-results-gpt4-20251114")

@@ -1462,7 +1462,7 @@ async def generate_synthetic_dataset(
 """
 Generate domain-specific synthetic test datasets for SMOLTRACE evaluations using AI.

-This tool uses Google Gemini 2.5 Pro to create realistic, domain-specific evaluation
+This tool uses Google Gemini 2.5 Flash to create realistic, domain-specific evaluation
 tasks that follow the SMOLTRACE task dataset format. Perfect for creating custom
 benchmarks when standard datasets don't fit your use case.

@@ -2158,7 +2158,7 @@ Start your response with the YAML content immediately."""
 "agent_type": agent_type,
 "template_name": template_name,
 "base_template_url": template_url,
-"customization_method": "Google Gemini 2.5 Pro"
+"customization_method": "Google Gemini 2.5 Flash"
 },
 "prompt_template": customized_template,
 "usage_instructions": f"""
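The compare_runs docstring in this diff names four comparison dimensions: success rate, cost efficiency, speed, and environmental impact. A hedged sketch of how two runs could be tabulated across those dimensions (the metric keys and run records below are assumptions, not the tool's actual data schema):

```python
# Hypothetical per-dimension comparison of two evaluation runs; metric
# names are assumptions based on the dimensions in the docstring.
def compare_runs(run_a: dict, run_b: dict) -> dict[str, str]:
    # success_rate: higher is better; cost, duration, CO2: lower is better.
    higher_is_better = {"success_rate": True, "cost_usd": False,
                        "duration_s": False, "co2_kg": False}
    winners = {}
    for metric, higher in higher_is_better.items():
        a, b = run_a[metric], run_b[metric]
        if a == b:
            winners[metric] = "tie"
        else:
            winners[metric] = "A" if (a > b) == higher else "B"
    return winners

run_a = {"success_rate": 0.92, "cost_usd": 1.40, "duration_s": 310, "co2_kg": 0.05}
run_b = {"success_rate": 0.88, "cost_usd": 0.90, "duration_s": 280, "co2_kg": 0.04}
print(compare_runs(run_a, run_b))
# {'success_rate': 'A', 'cost_usd': 'B', 'duration_s': 'B', 'co2_kg': 'B'}
```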