kshitijthakkar committed on
Commit 1d45733 · 1 Parent(s): 1e21c93

fix: Distinguish between AI-powered and data-retrieval MCP tool return types


MCP tools have two different return types:
1. AI-powered tools (analyze_leaderboard, debug_trace, estimate_cost, etc.)
→ Return markdown text strings, use directly
2. Data-retrieval tools (get_top_performers, get_leaderboard_summary, etc.)
→ Return Python dict strings, must parse with ast.literal_eval()

Updated rule #4 to document clearly which tools return which type and how to
handle each correctly. This fixes the SyntaxError raised when markdown output
was parsed as a Python dict.
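
In code-agent terms the fix boils down to the following pattern (the dict keys below are illustrative placeholders, not real tool output):

```python
import ast
import json

# AI-powered tools (e.g. run_analyze_leaderboard) return markdown text:
# use the returned string directly, no parsing needed.

# Data-retrieval tools (e.g. run_get_top_performers) return str(dict) payloads:
raw = "{'runs': 12, 'avg_success_rate': 0.83}"  # illustrative single-quoted dict string
data = ast.literal_eval(raw) if isinstance(raw, str) else raw
print(data['runs'])  # -> 12

# When a tool expects a JSON string as input (e.g. push_dataset_to_hub),
# convert the parsed dict with json.dumps():
payload = json.dumps(data)
```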

Files changed (1)
  1. prompts/code_agent.yaml +9 -7
prompts/code_agent.yaml CHANGED
@@ -241,13 +241,15 @@ system_prompt: |-
  - For overview questions (e.g., "how many runs", "average success rate"): Use `run_get_leaderboard_summary()` (99% token savings!)
  - For leaderboard analysis with AI insights: Use `run_analyze_leaderboard()`
  - ONLY use `run_get_dataset()` for non-leaderboard datasets (traces, results, metrics)
- - **IMPORTANT - MCP Tool Returns**: MCP tools return STRING representations of Python dicts (with single quotes). ALWAYS use this pattern:
- ```python
- import ast
- result_raw = run_tool(...)
- result = ast.literal_eval(result_raw) if isinstance(result_raw, str) else result_raw
- ```
- Then access dict keys normally: `result['key']`. Use json.dumps() when converting dict to JSON string (e.g., for push_dataset_to_hub).
+ - **IMPORTANT - MCP Tool Return Types**:
+ - **AI-powered tools** (analyze_leaderboard, debug_trace, estimate_cost, compare_runs, analyze_results) return **markdown text strings** - use directly, no parsing needed
+ - **Data tools** (get_top_performers, get_leaderboard_summary, get_dataset, generate_synthetic_dataset, push_dataset_to_hub) return **Python dict strings** - MUST parse with ast.literal_eval():
+ ```python
+ import ast
+ result_raw = run_get_top_performers(...)
+ result = ast.literal_eval(result_raw) if isinstance(result_raw, str) else result_raw
+ ```
+ - Use json.dumps() to convert dicts to JSON strings (e.g., for push_dataset_to_hub input).
  5. Call a tool only when needed, and never re-do a tool call that you previously did with the exact same parameters.
  6. Don't name any new variable with the same name as a tool: for instance don't name a variable 'final_answer'.
  7. Never create any notional variables in our code, as having these in your logs will derail you from the true variables.
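
For reference, the rule points at `ast.literal_eval()` rather than `json.loads()` because the data tools return Python's single-quoted dict repr, which is not valid JSON. A minimal sketch of the difference (payload invented for illustration):

```python
import ast
import json

raw = "{'status': 'ok', 'count': 3}"  # str(dict) with single quotes, as the data tools return

print(ast.literal_eval(raw)['count'])  # -> 3

try:
    json.loads(raw)  # fails: JSON requires double-quoted keys and strings
except json.JSONDecodeError as exc:
    print(f"json.loads rejects it: {exc}")
```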