kshitijthakkar committed on
Commit 64af94c · 1 Parent(s): 3788b21

docs: Add missing tools 6 & 7 descriptions and fix tool counts


- Add detailed descriptions for tools 6 (generate_synthetic_dataset) and 7 (push_dataset_to_hub)
- Fix all tool count discrepancies (6 -> 7) throughout README
- Update Available MCP Components section with all 7 tools
- Update Related Project section with TraceMind-AI links
- Fix mcp_tools.py component count (12 -> 13)
- Update changelog with correct tool count

Files changed (1)
  1. README.md +91 -17
README.md CHANGED
 
@@ -190,6 +190,73 @@ Loads SMOLTRACE datasets from HuggingFace and returns raw data as JSON:
 
  **Example Use Case**: When the user asks "Can you provide me with the list of last 10 runIds and model names?", the LLM loads the leaderboard dataset and extracts the requested information from the JSON response.
 
+ #### 6. generate_synthetic_dataset
+
+ Generates domain-specific synthetic test datasets for SMOLTRACE evaluations using Google Gemini 2.5 Pro:
+ - AI-powered task generation tailored to your domain
+ - Custom tool specifications
+ - Configurable difficulty distribution (balanced, easy_only, medium_only, hard_only, progressive)
+ - Target specific agent types (tool, code, or both)
+ - Output follows SMOLTRACE task format exactly
+ - Supports up to 100 tasks with parallel batched generation
+
+ **SMOLTRACE Task Format**:
+ Each generated task includes:
+ ```json
+ {
+ "id": "unique_identifier",
+ "prompt": "Clear, specific task for the agent",
+ "expected_tool": "tool_name",
+ "expected_tool_calls": 1,
+ "difficulty": "easy|medium|hard",
+ "agent_type": "tool|code",
+ "expected_keywords": ["keyword1", "keyword2"]
+ }
+ ```
+
+ **Enterprise Use Cases**:
+ - **Custom Tools**: Create benchmarks for your proprietary APIs and tools
+ - **Industry-Specific**: Generate tasks for finance, healthcare, legal, manufacturing, etc.
+ - **Internal Workflows**: Test agents on company-specific processes
+ - **Rapid Prototyping**: Quickly create evaluation datasets without manual curation
+
+ **Difficulty Calibration**:
+ - **Easy** (40%): Single tool call, straightforward input, clear expected output
+ - **Medium** (40%): Multiple tool calls OR complex input parsing OR conditional logic
+ - **Hard** (20%): Multiple tools, complex reasoning, edge cases, error handling
+
+ **Output Includes**:
+ - `dataset_info`: Metadata (domain, tools, counts, timestamp)
+ - `tasks`: Ready-to-use SMOLTRACE task array
+ - `usage_instructions`: Step-by-step guide for HuggingFace upload and SMOLTRACE usage
+
+ **Example Use Case**: A financial services company wants to evaluate their customer service agent that uses custom tools for stock quotes, portfolio analysis, and transaction processing. They use this tool to generate 50 realistic tasks covering common customer inquiries across different difficulty levels, then run SMOLTRACE evaluations to benchmark different LLM models before deployment.
+
+ #### 7. push_dataset_to_hub
+
+ Upload generated datasets to HuggingFace Hub with proper formatting and metadata:
+ - Automatically formats data for HuggingFace datasets library
+ - Handles authentication via HF_TOKEN
+ - Validates dataset structure before upload
+ - Supports both public and private datasets
+ - Adds comprehensive metadata (description, tags, license)
+ - Creates dataset card with usage instructions
+
+ **Parameters**:
+ - `dataset_name`: Repository name on HuggingFace (e.g., "username/my-dataset")
+ - `data`: Dataset content (list of dictionaries or JSON string)
+ - `description`: Dataset description for the card
+ - `private`: Whether to make the dataset private (default: False)
+
+ **Example Workflow**:
+ 1. Generate synthetic dataset with `generate_synthetic_dataset`
+ 2. Review and modify tasks if needed
+ 3. Upload to HuggingFace with `push_dataset_to_hub`
+ 4. Use in SMOLTRACE evaluations or share with team
+
+ **Example Use Case**: After generating a custom evaluation dataset for your domain, upload it to HuggingFace to share with your team, version control your benchmarks, or make it publicly available for the community.
+
+
  ## MCP Resources Usage
 
  Resources provide direct data access without AI analysis:
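The SMOLTRACE task format added in the hunk above is a flat JSON object, so a generated dataset can be sanity-checked before upload with a few lines of Python. The snippet below is an illustrative sketch, not code from this repository; the field names come from the format shown above, while the helper function and the sample task are hypothetical.

```python
# Sanity-check one generated task against the SMOLTRACE task format shown above.
# The field names come from the README; this helper and the sample task are
# illustrative, not part of the TraceMind-mcp-server codebase.

REQUIRED_FIELDS = {
    "id": str,
    "prompt": str,
    "expected_tool": str,
    "expected_tool_calls": int,
    "difficulty": str,
    "agent_type": str,
    "expected_keywords": list,
}

def validate_task(task: dict) -> list[str]:
    """Return a list of problems found in one task (empty list = valid)."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in task:
            problems.append(f"missing field: {field}")
        elif not isinstance(task[field], expected_type):
            problems.append(f"{field} should be {expected_type.__name__}")
    if task.get("difficulty") not in {"easy", "medium", "hard"}:
        problems.append("difficulty must be easy, medium, or hard")
    if task.get("agent_type") not in {"tool", "code"}:
        problems.append("agent_type must be tool or code")
    return problems

example = {
    "id": "fin_001",
    "prompt": "Get the latest quote for ACME and summarize the day's change.",
    "expected_tool": "get_stock_quote",
    "expected_tool_calls": 1,
    "difficulty": "easy",
    "agent_type": "tool",
    "expected_keywords": ["ACME", "quote"],
}
assert validate_task(example) == []
```

Under the balanced calibration described above (40% easy, 40% medium, 20% hard), a 50-task dataset like the one in the financial-services example works out to roughly 20 easy, 20 medium, and 10 hard tasks.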
 
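For reference, the upload step that `push_dataset_to_hub` performs can also be done by hand with the `datasets` library, following the Example Workflow above. The sketch below assumes the tasks are already in the SMOLTRACE format; the repository name is a placeholder, and this is not the server's actual implementation.

```python
# Illustrative manual upload with the `datasets` library (not the server's
# push_dataset_to_hub implementation). The repo name below is a placeholder.
import os

from datasets import Dataset

tasks = [
    {
        "id": "fin_001",
        "prompt": "Get the latest quote for ACME and summarize the day's change.",
        "expected_tool": "get_stock_quote",
        "expected_tool_calls": 1,
        "difficulty": "easy",
        "agent_type": "tool",
        "expected_keywords": ["ACME", "quote"],
    },
]

dataset = Dataset.from_list(tasks)
dataset.push_to_hub(
    "username/my-smoltrace-benchmark",  # placeholder repository name
    private=False,                      # set True for a private dataset
    token=os.environ["HF_TOKEN"],       # authentication via HF_TOKEN, as noted above
)
```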
@@ -363,20 +430,21 @@ A: Use the SSE endpoint (`/gradio_api/mcp/sse`) for now, but note that it's depr
  A: Streamable HTTP is the newer, more efficient protocol with better error handling and performance. SSE is the legacy protocol being phased out.
 
  **Q: How do I test if my connection works?**
- A: After configuring your client, restart it and look for "tracemind" in your available MCP tools/servers. You should see 6 tools, 3 resources, and 3 prompts.
+ A: After configuring your client, restart it and look for "tracemind" in your available MCP tools/servers. You should see 7 tools, 3 resources, and 3 prompts.
 
  **Q: Can I use this MCP server without authentication?**
  A: The MCP endpoint is publicly accessible. However, the tools may require HuggingFace datasets to be public or accessible with your HF token (configured server-side).
 
  ### Available MCP Components
 
- **Tools** (6):
+ **Tools** (7):
  1. **analyze_leaderboard**: AI-powered leaderboard analysis with Gemini 2.5 Pro
  2. **debug_trace**: Trace debugging with AI insights
  3. **estimate_cost**: Cost estimation with optimization recommendations
  4. **compare_runs**: Compare two evaluation runs with AI-powered analysis
- 5. **analyze_results**: Deep dive into test results with optimization recommendations
- 6. **get_dataset**: Load SMOLTRACE datasets (smoltrace-* only) as JSON
+ 5. **get_dataset**: Load SMOLTRACE datasets (smoltrace-* only) as JSON
+ 6. **generate_synthetic_dataset**: Create domain-specific test datasets with AI
+ 7. **push_dataset_to_hub**: Upload datasets to HuggingFace Hub
 
  **Resources** (3):
  1. **leaderboard://{repo}**: Direct access to raw leaderboard data in JSON
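One way to run the connection check described in the Q&A above is to list the server's tools programmatically. The sketch below assumes the official MCP Python SDK (the `mcp` package) and the deprecated SSE endpoint mentioned in the hunk header; the URL is a placeholder for your deployed Space's `/gradio_api/mcp/sse` address. A successful run should report the 7 tools listed above.

```python
# Illustrative connection check, assuming the official MCP Python SDK (`mcp`).
# The URL is a placeholder; point it at the deployed Space's SSE endpoint.
import asyncio

from mcp import ClientSession
from mcp.client.sse import sse_client

SERVER_URL = "https://<your-space>.hf.space/gradio_api/mcp/sse"  # placeholder

async def main() -> None:
    async with sse_client(SERVER_URL) as (read_stream, write_stream):
        async with ClientSession(read_stream, write_stream) as session:
            await session.initialize()
            result = await session.list_tools()
            # Expect 7 tools, from analyze_leaderboard through push_dataset_to_hub.
            print([tool.name for tool in result.tools])

asyncio.run(main())
```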
 
@@ -396,7 +464,7 @@ See full API documentation in the Gradio interface under "📖 API Documentation
  TraceMind-mcp-server/
  ├── app.py # Gradio UI + MCP server (mcp_server=True)
  ├── gemini_client.py # Google Gemini 2.5 Pro integration
- ├── mcp_tools.py # 3 tool implementations
+ ├── mcp_tools.py # 7 tool implementations
  ├── requirements.txt # Python dependencies
  ├── .env.example # Environment variable template
  ├── .gitignore
 
@@ -511,20 +579,25 @@ Note: This requires actual trace data from an evaluation run. For testing purpos
  - **Data Source**: HuggingFace Datasets
  - **Transport**: Streamable HTTP (recommended) and SSE (deprecated)
 
- ## Related Project: TraceMind UI (Track 2)
+ ## Related Project: TraceMind-AI (Track 2)
+
+ This MCP server is designed to be consumed by **[TraceMind-AI](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind)** (separate submission for Track 2: MCP in Action).
 
- This MCP server is designed to be consumed by **TraceMind UI** (separate submission for Track 2: MCP in Action).
+ **Links**:
+ - **Live Demo**: https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind
+ - **GitHub**: https://github.com/Mandark-droid/TraceMind-AI
 
- TraceMind UI is a Gradio-based agent evaluation platform that uses these MCP tools to provide:
- - AI-powered leaderboard insights
- - Interactive trace debugging
- - Pre-evaluation cost estimates
+ TraceMind-AI is a Gradio-based agent evaluation platform that uses these MCP tools to provide:
+ - AI-powered leaderboard insights with autonomous agent chat
+ - Interactive trace debugging with MCP-powered Q&A
+ - Real-time cost estimation and comparison
+ - Complete evaluation workflow visualization
 
  ## File Descriptions
 
  ### app.py
  Main Gradio application with:
- - Testing UI for all 6 tools
+ - Testing UI for all 7 tools
  - MCP server enabled via `mcp_server=True`
  - API documentation
 
 
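As context for the app.py description in the hunk above, Gradio exposes an app's functions as MCP tools when `mcp_server=True` is passed to `launch()`. The sketch below is a minimal illustration, not the repository's actual app.py; the single function shown is a stand-in for the seven real tools.

```python
# Minimal illustration of the mcp_server=True launch flag described above.
# This is not the repository's app.py; the function is a stand-in.
import gradio as gr

def analyze_leaderboard(repo: str) -> str:
    """Stand-in for one of the 7 tools exposed over MCP."""
    return f"Analysis for {repo} would be produced by Gemini 2.5 Pro here."

demo = gr.Interface(fn=analyze_leaderboard, inputs="text", outputs="text")

# mcp_server=True also serves the app as an MCP server, including the
# /gradio_api/mcp/sse endpoint mentioned earlier in the README.
demo.launch(mcp_server=True)
```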
@@ -536,15 +609,16 @@ Google Gemini 2.5 Pro client that:
  - Uses `gemini-2.5-pro-latest` model (can switch to `gemini-2.5-flash-latest`)
 
  ### mcp_tools.py
- Complete MCP implementation with 12 components:
+ Complete MCP implementation with 13 components:
 
- **Tools** (6 async functions):
+ **Tools** (7 async functions):
  - `analyze_leaderboard()`: AI-powered leaderboard analysis
  - `debug_trace()`: AI-powered trace debugging
  - `estimate_cost()`: AI-powered cost estimation
  - `compare_runs()`: AI-powered run comparison
- - `analyze_results()`: AI-powered results analysis with optimization recommendations
  - `get_dataset()`: Load SMOLTRACE datasets as JSON
+ - `generate_synthetic_dataset()`: Create domain-specific test datasets with AI
+ - `push_dataset_to_hub()`: Upload datasets to HuggingFace Hub
 
  **Resources** (3 decorated functions with `@gr.mcp.resource()`):
  - `get_leaderboard_data()`: Raw leaderboard JSON data
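To make the component breakdown above concrete, the sketch below shows the general shape of mcp_tools.py as described: async tool functions plus resource functions decorated with `@gr.mcp.resource()`. It is illustrative only; the bodies are stubs, and the decorator is shown exactly as the README names it, without guessing its arguments.

```python
# Structural sketch of the mcp_tools.py layout described above (not actual code).
import gradio as gr

async def get_dataset(repo: str) -> str:
    """Tool: load a SMOLTRACE dataset (smoltrace-* only) and return raw JSON."""
    raise NotImplementedError  # the real implementation loads the HF dataset

@gr.mcp.resource()  # decorator name taken from the README; arguments not shown there
def get_leaderboard_data(repo: str) -> str:
    """Resource backing leaderboard://{repo}: raw leaderboard JSON data."""
    raise NotImplementedError
```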
 
@@ -692,8 +766,8 @@ For issues or questions:
 
  ### v1.0.0 (2025-11-14)
  - Initial release for MCP Hackathon
- - **Complete MCP Implementation**: 11 components total
- - 5 AI-powered tools (analyze_leaderboard, debug_trace, estimate_cost, compare_runs, get_dataset)
+ - **Complete MCP Implementation**: 13 components total
+ - 7 AI-powered tools (analyze_leaderboard, debug_trace, estimate_cost, compare_runs, get_dataset, generate_synthetic_dataset, push_dataset_to_hub)
  - 3 data resources (leaderboard, trace, cost data)
  - 3 prompt templates (analysis, debug, optimization)
  - Gradio native MCP support with decorators (`@gr.mcp.*`)