Spaces:

MCP-1st-Birthday
/

TraceMind-mcp-server

Running

Mandark-droid commited on 20 days ago

Commit

4a16168

1 Parent(s): fbd2ae8

Add synthetic dataset generation tools for custom SMOLTRACE evaluations

Enable users to create domain-specific test datasets when standard benchmarks don't fit their use case. Enterprise users can now generate custom evaluation datasets for proprietary tools, industry-specific workflows, and specialized agent capabilities.

Key features:
- generate_synthetic_dataset: AI-powered generation of SMOLTRACE-format tasks (5-100 tasks)
- Parallel batched generation: Automatically splits large requests into concurrent batches
- Extended timeout: 120s per batch to support 100-task generations
- push_dataset_to_hub: Direct upload to HuggingFace with naming validation
- Complete API documentation for both new tools

Technical improvements:
- Parallel execution with asyncio.gather for 5x speedup on large datasets
- Fair distribution of difficulty/agent_type across batches
- Partial success handling: continues if some batches fail
- Switch to gemini-2.5-flash-lite for cost efficiency

Files changed (4) hide show

README.md +8 -7
app.py +364 -28
gemini_client.py +2 -2
mcp_tools.py +497 -5

README.md CHANGED Viewed

@@ -5,7 +5,7 @@ colorFrom: blue
 colorTo: purple
 sdk: docker
 app_port: 7860
-pinned: false
 license: agpl-3.0
 short_description: MCP server for agent evaluation with Gemini 2.5 Pro
 tags:
@@ -32,13 +32,14 @@ tags:
 TraceMind MCP Server is a Gradio-based MCP (Model Context Protocol) server that provides a complete MCP implementation with:
-### 🛠️ **6 AI-Powered Tools**
 1. **📊 analyze_leaderboard**: Generate insights from evaluation leaderboard data
 2. **🐛 debug_trace**: Debug specific agent execution traces using OpenTelemetry data
 3. **💰 estimate_cost**: Predict evaluation costs before running
 4. **⚖️ compare_runs**: Compare two evaluation runs with AI-powered analysis
-5. **🔍 analyze_results**: Deep dive into test results with optimization recommendations
-6. **📦 get_dataset**: Load SMOLTRACE datasets (smoltrace-* prefix only) as JSON for flexible analysis
 ### 📦 **3 Data Resources**
 1. **leaderboard data**: Direct JSON access to evaluation results
@@ -93,11 +94,11 @@ All analysis is powered by **Google Gemini 2.5 Pro** for intelligent, context-aw
 - ✅ **MCP Standard Compliant**: Built with Gradio's native MCP support (`@gr.mcp.*` decorators)
 - ✅ **Production-Ready**: Deployable to HuggingFace Spaces with SSE transport
 - ✅ **Testing Interface**: Beautiful Gradio UI for testing all components
-- ✅ **Enterprise Focus**: Cost optimization, debugging, and decision support
 - ✅ **Google Gemini Powered**: Leverages Gemini 2.5 Pro for intelligent analysis
-- ✅ **11 Total Components**: 5 Tools + 3 Resources + 3 Prompts
-### 🛠️ Five Production-Ready Tools
 #### 1. analyze_leaderboard

 colorTo: purple
 sdk: docker
 app_port: 7860
+pinned: true
 license: agpl-3.0
 short_description: MCP server for agent evaluation with Gemini 2.5 Pro
 tags:
 TraceMind MCP Server is a Gradio-based MCP (Model Context Protocol) server that provides a complete MCP implementation with:
+### 🛠️ **7 AI-Powered Tools**
 1. **📊 analyze_leaderboard**: Generate insights from evaluation leaderboard data
 2. **🐛 debug_trace**: Debug specific agent execution traces using OpenTelemetry data
 3. **💰 estimate_cost**: Predict evaluation costs before running
 4. **⚖️ compare_runs**: Compare two evaluation runs with AI-powered analysis
+5. **📦 get_dataset**: Load SMOLTRACE datasets (smoltrace-* prefix only) as JSON for flexible analysis
+6. **🧪 generate_synthetic_dataset**: Create domain-specific test datasets for SMOLTRACE evaluations (supports up to 100 tasks with parallel batched generation)
+7. **📤 push_dataset_to_hub**: Upload generated datasets to HuggingFace Hub
 ### 📦 **3 Data Resources**
 1. **leaderboard data**: Direct JSON access to evaluation results
 - ✅ **MCP Standard Compliant**: Built with Gradio's native MCP support (`@gr.mcp.*` decorators)
 - ✅ **Production-Ready**: Deployable to HuggingFace Spaces with SSE transport
 - ✅ **Testing Interface**: Beautiful Gradio UI for testing all components
+- ✅ **Enterprise Focus**: Cost optimization, debugging, decision support, and custom dataset generation
 - ✅ **Google Gemini Powered**: Leverages Gemini 2.5 Pro for intelligent analysis
+- ✅ **13 Total Components**: 7 Tools + 3 Resources + 3 Prompts
+### 🛠️ Seven Production-Ready Tools
 #### 1. analyze_leaderboard

app.py CHANGED Viewed

@@ -1,19 +1,49 @@
 """
-TraceMind MCP Server - Gradio Interface with MCP Support
-This server provides AI-powered analysis tools for agent evaluation data:
-1. analyze_leaderboard: Summarize trends and insights from leaderboard
-2. debug_trace: Debug specific agent execution traces
-3. estimate_cost: Predict evaluation costs before running
-4. compare_runs: Compare two evaluation runs with AI-powered analysis
-5. get_dataset: Load any HuggingFace dataset as JSON for flexible analysis
 """
 import os
 import gradio as gr
 from typing import Optional, Dict, Any
 from datetime import datetime
 # Local imports
 from gemini_client import GeminiClient
 from mcp_tools import (
@@ -21,8 +51,9 @@ from mcp_tools import (
     debug_trace,
     estimate_cost,
     compare_runs,
-    analyze_results,
-    get_dataset
 )
 # Initialize default Gemini client (fallback if user doesn't provide key)
@@ -42,15 +73,16 @@ def create_gradio_ui():
         **AI-Powered Analysis for Agent Evaluation Data**
-        This server provides **6 MCP Tools + 3 MCP Resources + 3 MCP Prompts**:
         ### MCP Tools (AI-Powered)
         - 📊 **Analyze Leaderboard**: Get insights from evaluation results
         - 🐛 **Debug Trace**: Understand what happened in a specific test
         - 💰 **Estimate Cost**: Predict evaluation costs before running
         - ⚖️ **Compare Runs**: Compare two evaluation runs with AI-powered analysis
-        - 🔍 **Analyze Results**: Deep dive into test results with optimization recommendations
         - 📦 **Get Dataset**: Load any HuggingFace dataset as JSON for flexible analysis
         ### MCP Resources (Data Access)
         - 📊 **leaderboard://{repo}**: Raw leaderboard data
@@ -493,12 +525,181 @@ def create_gradio_ui():
                     outputs=[dataset_output]
                 )
-            # Tab 6: MCP Resources & Prompts
             with gr.Tab("🔌 MCP Resources & Prompts"):
                 gr.Markdown("""
                 ## MCP Resources & Prompts
-                Beyond the 5 MCP Tools, this server also exposes **MCP Resources** and **MCP Prompts**
                 that MCP clients can use directly.
                 ### MCP Resources (Read-Only Data Access)
@@ -751,7 +952,7 @@ def create_gradio_ui():
                     outputs=[prompt_output]
                 )
-            # Tab 7: API Documentation
             with gr.Tab("📖 API Documentation"):
                 gr.Markdown("""
                 ## MCP Tool Specifications
@@ -842,6 +1043,95 @@ def create_gradio_ui():
                 ---
                 ## MCP Integration
                 This Gradio app is MCP-enabled. When deployed to HuggingFace Spaces, it can be accessed via MCP clients.
@@ -854,8 +1144,8 @@ def create_gradio_ui():
                 ### What's Exposed via MCP:
-                #### 5 MCP Tools (AI-Powered)
-                The five tools above (`analyze_leaderboard`, `debug_trace`, `estimate_cost`, `compare_runs`, `get_dataset`)
                 are automatically exposed as MCP tools and can be called from any MCP client.
                 #### 3 MCP Resources (Data Access)
@@ -891,14 +1181,60 @@ def create_gradio_ui():
     return demo
 if __name__ == "__main__":
-    # Create Gradio interface
-    demo = create_gradio_ui()
-    # Launch with MCP server enabled
-    # share=True creates a temporary public HTTPS URL for testing with Claude Code
-    demo.launch(
-        server_name="0.0.0.0",
-        server_port=7860,
-        #share=True,  # Creates temporary HTTPS URL (e.g., https://abc123.gradio.live)
-        mcp_server=True  # Enable MCP server functionality
-    )

 """
+TraceMind MCP Server - Hugging Face Space Entry Point (Track 1)
+This file serves as the entry point for HuggingFace Space deployment.
+Exposes 7 AI-powered MCP tools + 3 Resources + 3 Prompts via Gradio's native MCP support.
+Architecture:
+    User → MCP Client (Claude Desktop, Continue, Cline, etc.)
+         → MCP Endpoint (Gradio SSE)
+         → TraceMind MCP Server (this file)
+         → Tools (mcp_tools.py)
+         → Google Gemini 2.5 Pro API
+For Track 1: Building MCP Servers - Enterprise Category
+https://huggingface.co/MCP-1st-Birthday
+Tools Provided:
+    📊 analyze_leaderboard - AI-powered leaderboard analysis
+    🐛 debug_trace - Debug agent execution traces with AI
+    💰 estimate_cost - Predict evaluation costs before running
+    ⚖️ compare_runs - Compare evaluation runs with AI analysis
+    📦 get_dataset - Load SMOLTRACE datasets as JSON
+    🧪 generate_synthetic_dataset - Create domain-specific test datasets
+    📤 push_dataset_to_hub - Upload datasets to HuggingFace Hub
+Compatible with:
+- Claude Desktop (via Gradio MCP support)
+- Continue.dev (VS Code extension)
+- Cline (VS Code extension)
+- Any MCP client supporting Gradio's MCP protocol
 """
 import os
+import logging
 import gradio as gr
 from typing import Optional, Dict, Any
 from datetime import datetime
+# Configure logging
+logging.basicConfig(
+    level=logging.INFO,
+    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
+    handlers=[logging.StreamHandler()]
+)
+logger = logging.getLogger(__name__)
 # Local imports
 from gemini_client import GeminiClient
 from mcp_tools import (
     debug_trace,
     estimate_cost,
     compare_runs,
+    get_dataset,
+    generate_synthetic_dataset,
+    push_dataset_to_hub
 )
 # Initialize default Gemini client (fallback if user doesn't provide key)
         **AI-Powered Analysis for Agent Evaluation Data**
+        This server provides **7 MCP Tools + 3 MCP Resources + 3 MCP Prompts**:
         ### MCP Tools (AI-Powered)
         - 📊 **Analyze Leaderboard**: Get insights from evaluation results
         - 🐛 **Debug Trace**: Understand what happened in a specific test
         - 💰 **Estimate Cost**: Predict evaluation costs before running
         - ⚖️ **Compare Runs**: Compare two evaluation runs with AI-powered analysis
         - 📦 **Get Dataset**: Load any HuggingFace dataset as JSON for flexible analysis
+        - 🧪 **Generate Synthetic Dataset**: Create domain-specific test datasets for SMOLTRACE
+        - 📤 **Push to Hub**: Upload generated datasets to HuggingFace Hub
         ### MCP Resources (Data Access)
         - 📊 **leaderboard://{repo}**: Raw leaderboard data
                     outputs=[dataset_output]
                 )
+            # Tab 6: Generate Synthetic Dataset
+            with gr.Tab("🧪 Generate Synthetic Dataset"):
+                gr.Markdown("""
+                ## Create Domain-Specific Test Datasets for SMOLTRACE
+                Use AI to generate synthetic evaluation tasks tailored to your domain and tools.
+                Perfect for creating custom benchmarks when standard datasets don't fit your use case.
+                **🎯 Enterprise Use Case**: Quickly create evaluation datasets for:
+                - Custom tools and APIs your agents use
+                - Industry-specific domains (finance, healthcare, legal, etc.)
+                - Internal workflows and processes
+                - Specialized agent capabilities
+                **Output Format**: SMOLTRACE-compatible task dataset ready for HuggingFace upload
+                """)
+                with gr.Row():
+                    with gr.Column():
+                        synth_domain = gr.Textbox(
+                            label="Domain",
+                            placeholder="e.g., finance, healthcare, travel, ecommerce, customer_support",
+                            value="travel",
+                            info="The domain/industry for your synthetic tasks"
+                        )
+                        synth_tools = gr.Textbox(
+                            label="Tool Names (comma-separated)",
+                            placeholder="e.g., get_weather,search_flights,book_hotel,currency_converter",
+                            value="get_weather,search_flights,book_hotel",
+                            info="Names of tools your agent can use",
+                            lines=2
+                        )
+                        synth_num_tasks = gr.Slider(
+                            label="Number of Tasks",
+                            minimum=5,
+                            maximum=100,
+                            value=10,
+                            step=1,
+                            info="Total number of synthetic tasks to generate"
+                        )
+                        synth_difficulty = gr.Dropdown(
+                            label="Difficulty Distribution",
+                            choices=["balanced", "easy_only", "medium_only", "hard_only", "progressive"],
+                            value="balanced",
+                            info="How to distribute task difficulty"
+                        )
+                        synth_agent_type = gr.Dropdown(
+                            label="Agent Type",
+                            choices=["both", "tool", "code"],
+                            value="both",
+                            info="Target agent type for the tasks"
+                        )
+                        synth_button = gr.Button("🧪 Generate Synthetic Dataset", variant="primary", size="lg")
+                    with gr.Column():
+                        synth_output = gr.JSON(label="Generated Dataset (JSON)")
+                        gr.Markdown("""
+                        ### 📝 Next Steps
+                        After generation:
+                        1. **Copy the `tasks` array** from the JSON output above
+                        2. **Use the "Push to Hub" tab** to upload directly to HuggingFace
+                        3. **Or upload manually** following the instructions in the output
+                        **💡 Tip**: The generated dataset includes usage instructions and follows SMOLTRACE naming convention!
+                        """)
+                async def run_generate_synthetic(domain, tools, num_tasks, difficulty, agent_type):
+                    """Generate synthetic dataset with async support."""
+                    try:
+                        import json
+                        result = await generate_synthetic_dataset(
+                            domain=domain,
+                            tool_names=tools,
+                            num_tasks=int(num_tasks),
+                            difficulty_distribution=difficulty,
+                            agent_type=agent_type
+                        )
+                        return json.loads(result)
+                    except Exception as e:
+                        return {"error": str(e)}
+                synth_button.click(
+                    fn=run_generate_synthetic,
+                    inputs=[synth_domain, synth_tools, synth_num_tasks, synth_difficulty, synth_agent_type],
+                    outputs=[synth_output]
+                )
+            # Tab 7: Push Dataset to Hub
+            with gr.Tab("📤 Push to Hub"):
+                gr.Markdown("""
+                ## Upload Generated Dataset to HuggingFace Hub
+                Upload your synthetic dataset (from the previous tab or any SMOLTRACE-format dataset)
+                directly to HuggingFace Hub.
+                **Requirements**:
+                - HuggingFace account
+                - API token with write permissions ([Get one here](https://huggingface.co/settings/tokens))
+                - Dataset in SMOLTRACE format
+                **Naming Convention**: `{username}/smoltrace-{domain}-tasks` or `{username}/smoltrace-{domain}-tasks-v1`
+                """)
+                with gr.Row():
+                    with gr.Column():
+                        push_dataset_json = gr.Textbox(
+                            label="Dataset JSON (tasks array)",
+                            placeholder='[{"id": "task_001", "prompt": "...", "expected_tool": "...", ...}]',
+                            info="Paste the 'tasks' array from generate_synthetic_dataset output",
+                            lines=10
+                        )
+                        push_repo_name = gr.Textbox(
+                            label="Repository Name",
+                            placeholder="your-username/smoltrace-finance-tasks",
+                            info="HuggingFace repo name (follow SMOLTRACE convention)",
+                            value=""
+                        )
+                        push_hf_token = gr.Textbox(
+                            label="HuggingFace Token",
+                            placeholder="hf_...",
+                            info="API token with write permissions",
+                            type="password"
+                        )
+                        push_private = gr.Checkbox(
+                            label="Make dataset private",
+                            value=False,
+                            info="Private datasets are only visible to you"
+                        )
+                        push_button = gr.Button("📤 Push to HuggingFace Hub", variant="primary", size="lg")
+                    with gr.Column():
+                        push_output = gr.JSON(label="Upload Result")
+                        gr.Markdown("""
+                        ### 🎉 After Upload
+                        Once uploaded, you can:
+                        1. **View your dataset** at the URL provided in the output
+                        2. **Use in SMOLTRACE** evaluations with the command shown
+                        3. **Share with your team** (if public) or manage access (if private)
+                        **Example**: After uploading to `company/smoltrace-finance-tasks`:
+                        ```bash
+                        smoltrace-eval --model openai/gpt-4 --dataset-name company/smoltrace-finance-tasks
+                        ```
+                        """)
+                async def run_push_dataset(dataset_json, repo_name, hf_token, private):
+                    """Push dataset to hub with async support."""
+                    try:
+                        import json
+                        result = await push_dataset_to_hub(
+                            dataset_json=dataset_json,
+                            repo_name=repo_name,
+                            hf_token=hf_token,
+                            private=private
+                        )
+                        return json.loads(result)
+                    except Exception as e:
+                        return {"error": str(e)}
+                push_button.click(
+                    fn=run_push_dataset,
+                    inputs=[push_dataset_json, push_repo_name, push_hf_token, push_private],
+                    outputs=[push_output]
+                )
+            # Tab 9: MCP Resources & Prompts
             with gr.Tab("🔌 MCP Resources & Prompts"):
                 gr.Markdown("""
                 ## MCP Resources & Prompts
+                Beyond the 7 MCP Tools, this server also exposes **MCP Resources** and **MCP Prompts**
                 that MCP clients can use directly.
                 ### MCP Resources (Read-Only Data Access)
                     outputs=[prompt_output]
                 )
+            # Tab 10: API Documentation
             with gr.Tab("📖 API Documentation"):
                 gr.Markdown("""
                 ## MCP Tool Specifications
                 ---
+                ### 6. generate_synthetic_dataset
+                **Description**: Generate domain-specific synthetic test datasets for SMOLTRACE evaluations using AI
+                **Parameters**:
+                - `domain` (str, required): The domain for synthetic tasks (e.g., "finance", "healthcare", "travel", "ecommerce", "customer_support")
+                - `tool_names` (str, required): Comma-separated list of tool names to include (e.g., "get_weather,search_web,calculator")
+                - `num_tasks` (int): Number of synthetic tasks to generate (default: 10, range: 5-100)
+                - `difficulty_distribution` (str): How to distribute task difficulty (default: "balanced")
+                  - Options: "balanced" (40% easy, 40% medium, 20% hard), "easy_only", "medium_only", "hard_only", "progressive" (50% easy, 30% medium, 20% hard)
+                - `agent_type` (str): Target agent type for tasks (default: "both")
+                  - Options: "tool" (ToolCallingAgent), "code" (CodeAgent), "both" (50/50 mix)
+                **Returns**: JSON object with dataset_info (including batch statistics), tasks array (SMOLTRACE format), and usage_instructions
+                **🚀 Batched Generation**:
+                - Requests >20 tasks are automatically split into parallel batches
+                - Each batch generates up to 20 tasks concurrently
+                - Example: 100 tasks = 5 parallel batches (20 tasks each)
+                - Timeout: 120 seconds per batch
+                - Token limit: 8,192 per batch (40,960 total for 100 tasks)
+                **Performance**:
+                - 5-20 tasks: Single batch, ~30-60 seconds
+                - 21-100 tasks: Multiple parallel batches, ~60-120 seconds per batch
+                **SMOLTRACE Task Format**:
+                Each task includes: `id`, `prompt`, `expected_tool`, `expected_tool_calls` (optional), `difficulty`, `agent_type`, `expected_keywords` (optional)
+                **Use Cases**:
+                - Create custom evaluation datasets for industry-specific domains
+                - Test agents with proprietary tools and APIs
+                - Generate benchmarks for internal workflows
+                - Rapid prototyping of evaluation scenarios
+                ---
+                ### 7. push_dataset_to_hub
+                **Description**: Push a generated synthetic dataset to HuggingFace Hub
+                **Parameters**:
+                - `dataset_json` (str, required): JSON string containing the tasks array from generate_synthetic_dataset
+                - `repo_name` (str, required): HuggingFace repository name following SMOLTRACE naming convention
+                  - Format: `{username}/smoltrace-{domain}-tasks` or `{username}/smoltrace-{domain}-tasks-v{version}`
+                  - Examples: `kshitij/smoltrace-finance-tasks`, `kshitij/smoltrace-healthcare-tasks-v2`
+                - `hf_token` (str, required): HuggingFace API token with write permissions
+                - `private` (bool): Whether to create a private repository (default: False)
+                **Returns**: JSON object with upload status, repository URL, and dataset information
+                **Validation**:
+                - ✅ Checks SMOLTRACE naming convention (`smoltrace-` prefix required)
+                - ✅ Validates all tasks have required fields (id, prompt, expected_tool, difficulty, agent_type)
+                - ✅ Verifies HuggingFace token has write permissions
+                - ✅ Handles repository creation if it doesn't exist
+                **Workflow**:
+                1. Generate synthetic dataset using `generate_synthetic_dataset`
+                2. Extract the `tasks` array from the response JSON
+                3. Convert tasks array to JSON string
+                4. Call `push_dataset_to_hub` with the JSON string and desired repo name
+                5. Share the dataset URL with your team or use in SMOLTRACE evaluations
+                **Example Integration**:
+                ```python
+                # Step 1: Generate dataset
+                result = generate_synthetic_dataset(
+                    domain="finance",
+                    tool_names="get_stock_price,calculate_roi,fetch_company_info",
+                    num_tasks=50
+                )
+                # Step 2: Extract tasks
+                import json
+                data = json.loads(result)
+                tasks_json = json.dumps(data["tasks"])
+                # Step 3: Push to HuggingFace
+                push_result = push_dataset_to_hub(
+                    dataset_json=tasks_json,
+                    repo_name="your-username/smoltrace-finance-tasks",
+                    hf_token="hf_xxx",
+                    private=False
+                )
+                ```
+                ---
                 ## MCP Integration
                 This Gradio app is MCP-enabled. When deployed to HuggingFace Spaces, it can be accessed via MCP clients.
                 ### What's Exposed via MCP:
+                #### 7 MCP Tools (AI-Powered)
+                The seven tools above (`analyze_leaderboard`, `debug_trace`, `estimate_cost`, `compare_runs`, `get_dataset`, `generate_synthetic_dataset`, `push_dataset_to_hub`)
                 are automatically exposed as MCP tools and can be called from any MCP client.
                 #### 3 MCP Resources (Data Access)
     return demo
 if __name__ == "__main__":
+    logger.info("=" * 70)
+    logger.info("TraceMind MCP Server - HuggingFace Space (Track 1)")
+    logger.info("=" * 70)
+    logger.info("MCP Server: TraceMind Agent Evaluation Platform v1.0.0")
+    logger.info("Protocol: Model Context Protocol (MCP)")
+    logger.info("Transport: Gradio Native MCP Support (SSE)")
+    logger.info("MCP Endpoint: https://kshitijthakkar-tracemind-mcp-server.hf.space/gradio_api/mcp/")
+    logger.info("=" * 70)
+    logger.info("Features:")
+    logger.info("  ✓ 7 AI-Powered Tools (Leaderboard + Trace + Cost + Dataset)")
+    logger.info("  ✓ 3 Real-Time Resources (leaderboard, trace, cost data)")
+    logger.info("  ✓ 3 Prompt Templates (analysis, debug, optimization)")
+    logger.info("  ✓ Google Gemini 2.5 Pro - Intelligent Analysis")
+    logger.info("  ✓ HuggingFace Dataset Integration")
+    logger.info("  ✓ SMOLTRACE Format Support")
+    logger.info("  ✓ Synthetic Dataset Generation")
+    logger.info("=" * 70)
+    logger.info("Tool Categories:")
+    logger.info("  📊 Analysis: analyze_leaderboard, compare_runs")
+    logger.info("  🐛 Debugging: debug_trace")
+    logger.info("  �� Cost: estimate_cost")
+    logger.info("  📦 Data: get_dataset")
+    logger.info("  🧪 Generation: generate_synthetic_dataset, push_dataset_to_hub")
+    logger.info("=" * 70)
+    logger.info("Compatible Clients:")
+    logger.info("  • Claude Desktop")
+    logger.info("  • Continue.dev (VS Code)")
+    logger.info("  • Cline (VS Code)")
+    logger.info("  • Any MCP-compatible client")
+    logger.info("=" * 70)
+    logger.info("How to Connect (Claude Desktop/HF MCP Client):")
+    logger.info("  1. Go to https://huggingface.co/settings/mcp")
+    logger.info("  2. Add Space: kshitijthakkar-tracemind-mcp-server")
+    logger.info("  3. Start using TraceMind tools in your MCP client!")
+    logger.info("=" * 70)
+    logger.info("Starting Gradio UI + MCP Server on 0.0.0.0:7860...")
+    logger.info("Waiting for connections...")
+    logger.info("=" * 70)
+    try:
+        # Create Gradio interface
+        demo = create_gradio_ui()
+        # Launch with MCP server enabled
+        demo.launch(
+            server_name="0.0.0.0",
+            server_port=7860,
+            mcp_server=True  # Enable MCP server functionality
+        )
+    except Exception as e:
+        logger.error(f"Failed to start server: {e}")
+        logger.error("Check that:")
+        logger.error("  1. GEMINI_API_KEY environment variable is set")
+        logger.error("  2. Port 7860 is available")
+        logger.error("  3. All dependencies are installed")
+        raise

gemini_client.py CHANGED Viewed

@@ -12,13 +12,13 @@ import json
 class GeminiClient:
     """Client for Google Gemini API"""
-    def __init__(self, api_key: Optional[str] = None, model_name: str = "gemini-2.5-flash"):
         """
         Initialize Gemini client
         Args:
             api_key: Gemini API key (defaults to GEMINI_API_KEY env var)
-            model_name: Model to use (default: gemini-2.5-flash, can also use gemini-2.5-flash-lite)
         """
         self.api_key = api_key or os.getenv("GEMINI_API_KEY")
         if not self.api_key:

 class GeminiClient:
     """Client for Google Gemini API"""
+    def __init__(self, api_key: Optional[str] = None, model_name: str = "gemini-2.5-flash-lite"):
         """
         Initialize Gemini client
         Args:
             api_key: Gemini API key (defaults to GEMINI_API_KEY env var)
+            model_name: Model to use (default: gemini-2.5-flash-lite, can also use gemini-2.5-flash)
         """
         self.api_key = api_key or os.getenv("GEMINI_API_KEY")
         if not self.api_key:

mcp_tools.py CHANGED Viewed

@@ -1,14 +1,34 @@
 """
-MCP Tool Implementations for TraceMind
-Implements:
-- 5 MCP Tools: analyze_leaderboard, debug_trace, estimate_cost, compare_runs, get_dataset
-- 3 MCP Resources: leaderboard data, trace data, cost data
-- 3 MCP Prompts: analysis prompts, debug prompts, optimization prompts
 With Gradio's native MCP support (mcp_server=True), these are automatically
 exposed based on decorators (@gr.mcp.tool, @gr.mcp.resource, @gr.mcp.prompt),
 docstrings, and type hints.
 """
 import os
@@ -1114,3 +1134,475 @@ def optimization_prompt(
     template = templates.get(optimization_goal, {}).get(constraints, templates["cost"]["maintain_quality"])
     return template

 """
+MCP Tool Implementations for TraceMind MCP Server
+This module implements 13 MCP components (7 Tools + 3 Resources + 3 Prompts) for
+AI-powered agent evaluation analysis.
 With Gradio's native MCP support (mcp_server=True), these are automatically
 exposed based on decorators (@gr.mcp.tool, @gr.mcp.resource, @gr.mcp.prompt),
 docstrings, and type hints.
+🛠️ Tools (7 AI-Powered):
+    📊 analyze_leaderboard - Get AI insights from evaluation leaderboard data
+    🐛 debug_trace - Debug agent execution traces with AI assistance
+    💰 estimate_cost - Predict evaluation costs with AI recommendations
+    ⚖️ compare_runs - Compare two evaluation runs with AI analysis
+    📦 get_dataset - Load SMOLTRACE datasets as JSON for flexible analysis
+    🧪 generate_synthetic_dataset - Create domain-specific test datasets
+    📤 push_dataset_to_hub - Upload datasets to HuggingFace Hub
+📦 Resources (3 Data Access):
+    leaderboard://{repo} - Raw leaderboard data in JSON format
+    trace://{trace_id}/{repo} - Raw OpenTelemetry trace data
+    cost://model/{model_name} - Model pricing and hardware cost data
+📝 Prompts (3 Templates):
+    analysis_prompt - Standardized templates for analysis requests
+    debug_prompt - Standardized templates for debugging scenarios
+    optimization_prompt - Standardized templates for optimization goals
+All AI analysis powered by Google Gemini 2.5 Pro.
+Track 1: Building MCP Servers - Enterprise Category
 """
 import os
     template = templates.get(optimization_goal, {}).get(constraints, templates["cost"]["maintain_quality"])
     return template
+# ========================================
+# NEW TOOLS: Synthetic Dataset Generation
+# ========================================
+@gr.mcp.tool()
+async def generate_synthetic_dataset(
+    domain: str,
+    tool_names: str,
+    num_tasks: int = 10,
+    difficulty_distribution: str = "balanced",
+    agent_type: str = "both"
+) -> str:
+    """
+    Generate domain-specific synthetic test datasets for SMOLTRACE evaluations using AI.
+    This tool uses Google Gemini 2.5 Pro to create realistic, domain-specific evaluation
+    tasks that follow the SMOLTRACE task dataset format. Perfect for creating custom
+    benchmarks when standard datasets don't fit your use case.
+    **🚀 Batched Generation for Scale**:
+    - Requests >20 tasks are automatically split into parallel batches
+    - Utilizes Gemini's large context window efficiently
+    - Supports up to 100 tasks with 120s timeout per batch
+    - Example: 100 tasks = 5 parallel batches (20 tasks each)
+    **Enterprise Use Case**: Quickly create evaluation datasets for:
+    - Custom tools and APIs your agents use
+    - Industry-specific domains (finance, healthcare, legal, manufacturing, etc.)
+    - Internal workflows and business processes
+    - Specialized agent capabilities
+    **Security**: Requires GEMINI_API_KEY environment variable.
+    Args:
+        domain (str): The domain for synthetic tasks (e.g., "finance", "healthcare", "travel", "ecommerce", "customer_support")
+        tool_names (str): Comma-separated list of tool names to include (e.g., "get_weather,search_web,calculator")
+        num_tasks (int): Number of synthetic tasks to generate. Must be between 5 and 100. Default: 10
+                        - 5-20 tasks: Single batch (fast, ~30-60s)
+                        - 21-100 tasks: Multiple parallel batches (slower, ~60-120s per batch)
+        difficulty_distribution (str): How to distribute task difficulty. Options: "balanced" (40% easy, 40% medium, 20% hard), "easy_only", "medium_only", "hard_only", "progressive" (50% easy, 30% medium, 20% hard). Default: "balanced"
+        agent_type (str): Target agent type for tasks. Options: "tool" (ToolCallingAgent), "code" (CodeAgent), "both" (50/50 mix). Default: "both"
+    Returns:
+        str: JSON-formatted response with dataset_info (including batch statistics), tasks array (SMOLTRACE format), and usage_instructions
+    """
+    try:
+        # Initialize Gemini client
+        gemini_client = GeminiClient()
+        # Validate inputs
+        if num_tasks < 5 or num_tasks > 100:
+            return json.dumps({
+                "error": "num_tasks must be between 5 and 100",
+                "num_tasks_provided": num_tasks
+            }, indent=2)
+        # Parse tool names
+        tools = [tool.strip() for tool in tool_names.split(",") if tool.strip()]
+        if len(tools) == 0:
+            return json.dumps({
+                "error": "At least one tool name must be provided",
+                "tool_names_provided": tool_names
+            }, indent=2)
+        # Calculate distributions
+        difficulty_counts = _calculate_difficulty_distribution(num_tasks, difficulty_distribution)
+        agent_type_counts = _calculate_agent_type_distribution(num_tasks, agent_type)
+        # Create generation prompt
+        generation_prompt = f"""You are an expert at creating synthetic evaluation datasets for AI agents.
+Generate {num_tasks} synthetic test tasks for the **{domain}** domain following the SMOLTRACE task format.
+**Available Tools**: {", ".join(tools)}
+**Difficulty Distribution**:
+- Easy ({difficulty_counts['easy']} tasks): Single tool call, straightforward input, clear expected output
+- Medium ({difficulty_counts['medium']} tasks): Multiple tool calls OR complex input parsing OR conditional logic
+- Hard ({difficulty_counts['hard']} tasks): Multiple tools, complex reasoning, edge cases, error handling
+**Agent Type Distribution**:
+- Tool Agent ({agent_type_counts['tool']} tasks): Uses ToolCallingAgent - declarative tool calling
+- Code Agent ({agent_type_counts['code']} tasks): Uses CodeAgent - writes Python code with tools
+**SMOLTRACE Task Format** (required structure):
+```json
+{{
+  "id": "string - unique identifier like '{domain.lower()}_{{tool}}_{{number}}'",
+  "prompt": "string - clear, specific task description",
+  "expected_tool": "string - the tool name that should be used",
+  "expected_tool_calls": "integer - how many times the tool should be called (optional, default 1)",
+  "difficulty": "string - 'easy', 'medium', or 'hard'",
+  "agent_type": "string - 'tool' or 'code'",
+  "expected_keywords": "array of strings - keywords expected in response (optional)"
+}}
+```
+**Generation Guidelines**:
+1. **Domain Specificity**: Make tasks realistic and specific to the {domain} domain
+2. **Tool Usage**: Ensure each task requires using one of: {", ".join(tools)}
+3. **Prompt Quality**: Write clear, unambiguous prompts that an agent can execute
+4. **Expected Keywords**: Include 2-4 expected keywords for validation (optional but recommended)
+5. **Variety**: Vary the tasks to cover different aspects of the domain
+**IMPORTANT**: Return ONLY a valid JSON array of tasks. No explanatory text, no markdown formatting, no code blocks. Just the raw JSON array starting with [ and ending with ].
+Generate exactly {num_tasks} tasks:"""
+        print(f"[GENERATE_SYNTHETIC_DATASET] Generating {num_tasks} tasks for domain '{domain}'...")
+        print(f"[GENERATE_SYNTHETIC_DATASET] Tools: {', '.join(tools)}")
+        # Import required modules
+        import asyncio
+        import google.generativeai as genai
+        # Determine batching strategy
+        # Gemini can handle ~20 tasks per call with 8192 token output limit
+        TASKS_PER_BATCH = 20
+        num_batches = (num_tasks + TASKS_PER_BATCH - 1) // TASKS_PER_BATCH  # Ceiling division
+        if num_batches > 1:
+            print(f"[GENERATE_SYNTHETIC_DATASET] Large request detected. Splitting into {num_batches} parallel batches...")
+        # Create batch generation tasks
+        async def generate_batch(batch_num: int, batch_size: int, batch_difficulty: dict, batch_agent_type: dict):
+            """Generate a single batch of tasks"""
+            batch_prompt = f"""You are an expert at creating synthetic evaluation datasets for AI agents.
+Generate {batch_size} synthetic test tasks for the **{domain}** domain following the SMOLTRACE task format.
+**Available Tools**: {", ".join(tools)}
+**Difficulty Distribution for this batch**:
+- Easy ({batch_difficulty['easy']} tasks): Single tool call, straightforward input, clear expected output
+- Medium ({batch_difficulty['medium']} tasks): Multiple tool calls OR complex input parsing OR conditional logic
+- Hard ({batch_difficulty['hard']} tasks): Multiple tools, complex reasoning, edge cases, error handling
+**Agent Type Distribution for this batch**:
+- Tool Agent ({batch_agent_type['tool']} tasks): Uses ToolCallingAgent - declarative tool calling
+- Code Agent ({batch_agent_type['code']} tasks): Uses CodeAgent - writes Python code with tools
+**SMOLTRACE Task Format** (required structure):
+```json
+{{
+  "id": "string - unique identifier like '{domain.lower()}_{{tool}}_batch{batch_num}_{{number}}'",
+  "prompt": "string - clear, specific task description",
+  "expected_tool": "string - the tool name that should be used",
+  "expected_tool_calls": "integer - how many times the tool should be called (optional, default 1)",
+  "difficulty": "string - 'easy', 'medium', or 'hard'",
+  "agent_type": "string - 'tool' or 'code'",
+  "expected_keywords": "array of strings - keywords expected in response (optional)"
+}}
+```
+**Generation Guidelines**:
+1. **Domain Specificity**: Make tasks realistic and specific to the {domain} domain
+2. **Tool Usage**: Ensure each task requires using one of: {", ".join(tools)}
+3. **Prompt Quality**: Write clear, unambiguous prompts that an agent can execute
+4. **Expected Keywords**: Include 2-4 expected keywords for validation (optional but recommended)
+5. **Variety**: Vary the tasks to cover different aspects of the domain
+6. **Unique IDs**: Include 'batch{batch_num}' in task IDs to ensure uniqueness across batches
+**IMPORTANT**: Return ONLY a valid JSON array of tasks. No explanatory text, no markdown formatting, no code blocks. Just the raw JSON array starting with [ and ending with ].
+Generate exactly {batch_size} tasks:"""
+            generation_config = {
+                "temperature": 0.8,  # Higher for creativity and diversity
+                "top_p": 0.95,
+                "top_k": 40,
+                "max_output_tokens": 8192,
+            }
+            try:
+                response = await asyncio.wait_for(
+                    gemini_client.model.generate_content_async(
+                        batch_prompt,
+                        generation_config=generation_config
+                    ),
+                    timeout=120.0  # 120 seconds per batch for larger datasets
+                )
+                return response.text, None
+            except Exception as e:
+                return None, str(e)
+        # Split difficulty and agent type distributions across batches
+        def split_distribution(total_counts: dict, num_batches: int, batch_num: int, remaining_tasks: int):
+            """Split distribution counts across batches fairly"""
+            batch_counts = {}
+            for key, total in total_counts.items():
+                # Calculate fair share for this batch
+                base_share = total // num_batches
+                extra = 1 if batch_num < (total % num_batches) else 0
+                batch_counts[key] = min(base_share + extra, remaining_tasks)
+            return batch_counts
+        # Generate all batches in parallel
+        batch_tasks = []
+        remaining_tasks = num_tasks
+        for batch_num in range(num_batches):
+            batch_size = min(TASKS_PER_BATCH, remaining_tasks)
+            # Calculate distributions for this batch
+            batch_difficulty = split_distribution(difficulty_counts, num_batches, batch_num, batch_size)
+            batch_agent_type = split_distribution(agent_type_counts, num_batches, batch_num, batch_size)
+            batch_tasks.append(generate_batch(batch_num, batch_size, batch_difficulty, batch_agent_type))
+            remaining_tasks -= batch_size
+        print(f"[GENERATE_SYNTHETIC_DATASET] Executing {num_batches} parallel Gemini API calls...")
+        # Execute all batches in parallel
+        batch_results = await asyncio.gather(*batch_tasks)
+        # Combine and validate results
+        all_tasks = []
+        errors = []
+        for batch_num, (response_text, error) in enumerate(batch_results):
+            if error:
+                errors.append(f"Batch {batch_num} failed: {error}")
+                continue
+            try:
+                # Clean response (remove markdown if present)
+                cleaned_response = response_text.strip()
+                if cleaned_response.startswith("```"):
+                    import re
+                    match = re.search(r'```(?:json)?\s*\n(.*?)\n```', cleaned_response, re.DOTALL)
+                    if match:
+                        cleaned_response = match.group(1)
+                # Parse JSON
+                batch_tasks_parsed = json.loads(cleaned_response)
+                if not isinstance(batch_tasks_parsed, list):
+                    errors.append(f"Batch {batch_num} did not return a JSON array")
+                    continue
+                all_tasks.extend(batch_tasks_parsed)
+            except json.JSONDecodeError as e:
+                errors.append(f"Batch {batch_num} JSON parsing failed: {str(e)}")
+        # Check if we got enough tasks
+        if len(all_tasks) == 0:
+            return json.dumps({
+                "error": "All batches failed to generate tasks",
+                "batch_errors": errors,
+                "suggestion": "Check GEMINI_API_KEY and try again"
+            }, indent=2)
+        if errors:
+            print(f"[GENERATE_SYNTHETIC_DATASET] Warning: Some batches failed: {errors}")
+        print(f"[GENERATE_SYNTHETIC_DATASET] Successfully generated {len(all_tasks)} tasks across {num_batches} batch(es)")
+        # Validate required fields for all tasks
+        synthetic_tasks = all_tasks
+        required_fields = ["id", "prompt", "expected_tool", "difficulty", "agent_type"]
+        for i, task in enumerate(synthetic_tasks):
+            missing_fields = [field for field in required_fields if field not in task]
+            if missing_fields:
+                return json.dumps({
+                    "error": f"Task {i} is missing required fields: {missing_fields}",
+                    "task": task
+                }, indent=2)
+        # Return formatted dataset with metadata
+        result = {
+            "dataset_info": {
+                "domain": domain,
+                "tools": tools,
+                "num_tasks_requested": num_tasks,
+                "num_tasks_generated": len(synthetic_tasks),
+                "num_batches": num_batches,
+                "batches_succeeded": num_batches - len(errors),
+                "batches_failed": len(errors) if errors else 0,
+                "batch_errors": errors if errors else None,
+                "difficulty_distribution": difficulty_counts,
+                "agent_type_distribution": agent_type_counts,
+                "generated_at": datetime.now().isoformat(),
+                "smoltrace_naming_convention": f"{{username}}/smoltrace-{domain.lower()}-tasks",
+                "warning": f"⚠️ {len(errors)} batch(es) failed. Generated {len(synthetic_tasks)}/{num_tasks} tasks." if errors else None
+            },
+            "tasks": synthetic_tasks,
+            "usage_instructions": {
+                "format": "SMOLTRACE task dataset format",
+                "naming_convention": f"Follow SMOLTRACE naming: {{username}}/smoltrace-{domain.lower()}-tasks or {{username}}/smoltrace-{domain.lower()}-tasks-v1 for versioning",
+                "how_to_upload": [
+                    "Option 1: Use the push_dataset_to_hub tool in this MCP server",
+                    "Option 2: Manual upload with Python code (see example_code below)"
+                ],
+                "example_code": f"""from datasets import Dataset
+# Extract tasks from this response
+tasks = result["tasks"]
+# Create and push to HuggingFace (following SMOLTRACE naming convention)
+dataset = Dataset.from_list(tasks)
+dataset.push_to_hub("your-username/smoltrace-{domain.lower()}-tasks")
+# Use in SMOLTRACE evaluation
+# smoltrace-eval --model openai/gpt-4 --dataset-name your-username/smoltrace-{domain.lower()}-tasks"""
+            }
+        }
+        return json.dumps(result, indent=2, default=str)
+    except Exception as e:
+        return json.dumps({
+            "error": f"Failed to generate synthetic dataset: {str(e)}",
+            "domain": domain,
+            "tools": tool_names
+        }, indent=2)
+@gr.mcp.tool()
+async def push_dataset_to_hub(
+    dataset_json: str,
+    repo_name: str,
+    hf_token: str,
+    private: bool = False
+) -> str:
+    """
+    Push a generated synthetic dataset to HuggingFace Hub.
+    This tool uploads datasets created by generate_synthetic_dataset (or any SMOLTRACE-format
+    dataset) to HuggingFace Hub, making them ready for use in SMOLTRACE evaluations.
+    **Naming Convention**: Repo name should follow SMOLTRACE convention:
+    - Format: {username}/smoltrace-{domain}-tasks or {username}/smoltrace-{domain}-tasks-v{version}
+    - Examples: "mycompany/smoltrace-finance-tasks", "alice/smoltrace-healthcare-tasks-v2"
+    **Security**: Requires valid HuggingFace token with write permissions.
+    Args:
+        dataset_json (str): JSON string containing the tasks array (from generate_synthetic_dataset output, use the "tasks" field)
+        repo_name (str): HuggingFace repository name following SMOLTRACE naming: {username}/smoltrace-{domain}-tasks
+        hf_token (str): HuggingFace API token with write permissions (get from https://huggingface.co/settings/tokens)
+        private (bool): Whether to create a private dataset. Default: False (public)
+    Returns:
+        str: JSON response with upload status, dataset URL, and next steps
+    """
+    try:
+        from huggingface_hub import HfApi
+        # Validate repo name follows SMOLTRACE convention
+        if "smoltrace-" not in repo_name and "-tasks" not in repo_name:
+            return json.dumps({
+                "warning": "Repository name doesn't follow SMOLTRACE naming convention",
+                "expected_format": "{username}/smoltrace-{domain}-tasks or {username}/smoltrace-{domain}-tasks-v{version}",
+                "your_repo_name": repo_name,
+                "recommendation": "Consider renaming to follow the convention for consistency with SMOLTRACE ecosystem",
+                "proceeding": "Continuing with upload..."
+            }, indent=2)
+        # Parse dataset JSON
+        try:
+            tasks = json.loads(dataset_json)
+            if not isinstance(tasks, list):
+                return json.dumps({
+                    "error": "dataset_json must be a JSON array of tasks",
+                    "type_received": str(type(tasks))
+                }, indent=2)
+        except json.JSONDecodeError as e:
+            return json.dumps({
+                "error": "Invalid JSON in dataset_json",
+                "parse_error": str(e)
+            }, indent=2)
+        # Validate task structure
+        required_fields = ["id", "prompt", "expected_tool", "difficulty", "agent_type"]
+        for i, task in enumerate(tasks):
+            missing_fields = [field for field in required_fields if field not in task]
+            if missing_fields:
+                return json.dumps({
+                    "error": f"Task {i} is missing required SMOLTRACE fields: {missing_fields}",
+                    "task": task
+                }, indent=2)
+        # Create dataset and push to hub
+        from datasets import Dataset
+        dataset = Dataset.from_list(tasks)
+        print(f"[PUSH_DATASET_TO_HUB] Uploading {len(tasks)} tasks to {repo_name}...")
+        # Push to hub
+        dataset.push_to_hub(
+            repo_name,
+            token=hf_token,
+            private=private
+        )
+        # Return success response
+        result = {
+            "status": "success",
+            "message": f"Successfully uploaded {len(tasks)} tasks to HuggingFace Hub",
+            "dataset_info": {
+                "repository": repo_name,
+                "num_tasks": len(tasks),
+                "visibility": "private" if private else "public",
+                "dataset_url": f"https://huggingface.co/datasets/{repo_name}"
+            },
+            "next_steps": {
+                "view_dataset": f"https://huggingface.co/datasets/{repo_name}",
+                "use_in_smoltrace": f"smoltrace-eval --model openai/gpt-4 --dataset-name {repo_name}",
+                "share_with_team": f"Team members can access at https://huggingface.co/datasets/{repo_name}" if not private else "Dataset is private - share access via HuggingFace settings"
+            }
+        }
+        return json.dumps(result, indent=2)
+    except ImportError:
+        return json.dumps({
+            "error": "Required packages not installed",
+            "missing_packages": "datasets, huggingface_hub",
+            "install_command": "pip install datasets huggingface_hub"
+        }, indent=2)
+    except Exception as e:
+        return json.dumps({
+            "error": f"Failed to push dataset to hub: {str(e)}",
+            "repo_name": repo_name
+        }, indent=2)
+# Helper functions for synthetic dataset generation
+def _calculate_difficulty_distribution(num_tasks: int, difficulty_distribution: str) -> dict:
+    """Calculate how many tasks of each difficulty to generate."""
+    if difficulty_distribution == "balanced":
+        easy = int(num_tasks * 0.4)
+        medium = int(num_tasks * 0.4)
+        hard = num_tasks - easy - medium
+    elif difficulty_distribution == "easy_only":
+        easy, medium, hard = num_tasks, 0, 0
+    elif difficulty_distribution == "medium_only":
+        easy, medium, hard = 0, num_tasks, 0
+    elif difficulty_distribution == "hard_only":
+        easy, medium, hard = 0, 0, num_tasks
+    elif difficulty_distribution == "progressive":
+        easy = int(num_tasks * 0.5)
+        medium = int(num_tasks * 0.3)
+        hard = num_tasks - easy - medium
+    else:
+        # Default to balanced
+        easy = int(num_tasks * 0.4)
+        medium = int(num_tasks * 0.4)
+        hard = num_tasks - easy - medium
+    return {"easy": easy, "medium": medium, "hard": hard}
+def _calculate_agent_type_distribution(num_tasks: int, agent_type: str) -> dict:
+    """Calculate how many tasks for each agent type to generate."""
+    if agent_type == "tool":
+        return {"tool": num_tasks, "code": 0}
+    elif agent_type == "code":
+        return {"tool": 0, "code": num_tasks}
+    elif agent_type == "both":
+        tool_count = num_tasks // 2
+        code_count = num_tasks - tool_count
+        return {"tool": tool_count, "code": code_count}
+    else:
+        # Default to both
+        tool_count = num_tasks // 2
+        code_count = num_tasks - tool_count
+        return {"tool": tool_count, "code": code_count}