Commit 1d45733
Parent(s): 1e21c93
fix: Distinguish between AI-powered and data-retrieval MCP tool return types
MCP tools have two different return types:
1. AI-powered tools (analyze_leaderboard, debug_trace, estimate_cost, etc.)
→ Return markdown text strings, use directly
2. Data-retrieval tools (get_top_performers, get_leaderboard_summary, etc.)
→ Return Python dict strings, must parse with ast.literal_eval()
Updated rule #4 to clearly document which tools return which types and
how to handle each correctly. This fixes the SyntaxError when trying to
parse markdown as Python dicts.
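A minimal sketch of the distinction, assuming the `run_*` tool wrappers are available in the agent's execution environment; the argument shown is a placeholder, not the tools' real signature:

```python
import ast

# AI-powered tool: returns a markdown report, so the string is used as-is.
report = run_analyze_leaderboard("owner/leaderboard")  # placeholder argument

# Data-retrieval tool: returns a Python dict rendered as a string, so parse it before indexing.
raw = run_get_top_performers("owner/leaderboard")  # placeholder argument
top = ast.literal_eval(raw) if isinstance(raw, str) else raw
```

ast.literal_eval() is used here rather than json.loads(), presumably because the dict repr uses single quotes and True/False/None, which are not valid JSON.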
- prompts/code_agent.yaml +9 -7
prompts/code_agent.yaml
CHANGED
@@ -241,13 +241,15 @@ system_prompt: |-
      - For overview questions (e.g., "how many runs", "average success rate"): Use `run_get_leaderboard_summary()` (99% token savings!)
      - For leaderboard analysis with AI insights: Use `run_analyze_leaderboard()`
      - ONLY use `run_get_dataset()` for non-leaderboard datasets (traces, results, metrics)
-     - **IMPORTANT - MCP Tool
-
-
-
-
-
-
+     - **IMPORTANT - MCP Tool Return Types**:
+       - **AI-powered tools** (analyze_leaderboard, debug_trace, estimate_cost, compare_runs, analyze_results) return **markdown text strings** - use directly, no parsing needed
+       - **Data tools** (get_top_performers, get_leaderboard_summary, get_dataset, generate_synthetic_dataset, push_dataset_to_hub) return **Python dict strings** - MUST parse with ast.literal_eval():
+         ```python
+         import ast
+         result_raw = run_get_top_performers(...)
+         result = ast.literal_eval(result_raw) if isinstance(result_raw, str) else result_raw
+         ```
+       - Use json.dumps() to convert dicts to JSON strings (e.g., for push_dataset_to_hub input).
   5. Call a tool only when needed, and never re-do a tool call that you previously did with the exact same parameters.
   6. Don't name any new variable with the same name as a tool: for instance don't name a variable 'final_answer'.
   7. Never create any notional variables in our code, as having these in your logs will derail you from the true variables.
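The last added bullet points at json.dumps() for tools that expect a JSON string as input; a rough sketch of that direction, where the payload shape and the way run_push_dataset_to_hub receives it are illustrative assumptions:

```python
import json

# Hypothetical payload; the real schema expected by push_dataset_to_hub is not shown in this diff.
records = [{"prompt": "2 + 2?", "completion": "4"}]

# Convert the Python object to a JSON string before handing it to the tool.
payload = json.dumps(records)
# run_push_dataset_to_hub(payload)  # call shape is an assumption, so it is left commented out
```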