kshitijthakkar committed
Commit 60c4817 · 1 Parent(s): 3001796

feat: Use real token estimates from actual evaluation data

Updated token usage estimates based on analysis of real evaluation results
from kshitijthakkar/smoltrace-results-20251117_104845:

Old estimates (way too low):
- tool: 350 total tokens
- code: 700 total tokens
- both: 900 total tokens

New estimates (from real data):
- tool: 12,629 avg tokens (36x higher!)
- code: 17,202 avg tokens (25x higher!)
- both: 14,833 avg tokens
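
The multipliers quoted above can be checked with a few lines of arithmetic. This is a minimal sketch: the names `old_totals`, `new_totals`, and `ratios` are illustrative and do not appear in mcp_tools.py; the numbers are the totals listed in this commit message.

```python
# Total tokens per test, copied from the old and new estimates above.
old_totals = {"tool": 350, "code": 700, "both": 900}
new_totals = {"tool": 12_629, "code": 17_202, "both": 14_833}

# Ratio of the new estimate to the old one for each agent type.
ratios = {agent: new_totals[agent] / old_totals[agent] for agent in old_totals}

for agent, ratio in ratios.items():
    print(f"{agent}: {ratio:.1f}x higher")
```

Rounded to whole numbers, this gives about 36x for tool, 25x for code, and 16x for both.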

Input/output split: estimated at 60/40 based on typical agent patterns
(large contexts with system prompts, tool outputs, and reasoning chains)
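
The per-type input and output values in the diff are consistent with a 60/40 split of each average total. One way to reproduce them, assuming simple rounding, is sketched here; `split_tokens` and its `input_share` parameter are hypothetical helpers for illustration, not part of mcp_tools.py.

```python
def split_tokens(total: int, input_share: float = 0.6) -> dict:
    """Split an average total token count into input and output estimates."""
    input_tokens = round(total * input_share)
    # Give the remainder to output so the two parts always sum to the total.
    return {"input": input_tokens, "output": total - input_tokens}


# Average totals per agent type, taken from the commit message.
token_estimates = {
    "tool": split_tokens(12_629),
    "code": split_tokens(17_202),
    "both": split_tokens(14_833),
}

for agent, tokens in token_estimates.items():
    print(agent, tokens)
```

Note that the 60/40 ratio is an estimate rather than a measured value, so keeping it as a single parameter makes it easy to revise if real input/output counts become available.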

This dramatically improves cost estimation accuracy, especially for API models
where token costs dominate.

Files changed (1)
  1. mcp_tools.py +16 -7
mcp_tools.py CHANGED
@@ -344,14 +344,23 @@ async def estimate_cost(
     else:
         model_cost = {"input_cost_per_token": 0, "output_cost_per_token": 0}  # Local model

-    # Estimate token usage per test
-    # Tool agent: ~200 tokens input, ~150 output
-    # Code agent: ~300 tokens input, ~400 output
-    # Both: ~400 tokens input, ~500 output
+    # Estimate token usage per test (based on real data from kshitijthakkar/smoltrace-results-20251117_104845)
+    # These are averages from actual agent evaluation runs
+    # Input/output split estimated at 60/40 based on typical agent patterns
+    # (agents have large context with system prompts, tool outputs, etc.)
     token_estimates = {
-        "tool": {"input": 200, "output": 150},
-        "code": {"input": 300, "output": 400},
-        "both": {"input": 400, "output": 500}
+        "tool": {
+            "input": 7577,   # 60% of 12,629 avg total tokens
+            "output": 5052   # 40% of 12,629 avg total tokens
+        },
+        "code": {
+            "input": 10321,  # 60% of 17,202 avg total tokens
+            "output": 6881   # 40% of 17,202 avg total tokens
+        },
+        "both": {
+            "input": 8900,   # 60% of 14,833 avg total tokens
+            "output": 5933   # 40% of 14,833 avg total tokens
+        }
     }

     tokens_per_test = token_estimates[agent_type]
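
To illustrate why the larger estimates matter for API models, the sketch below combines a `tokens_per_test` entry with a `model_cost` dictionary of the shape shown in the hunk. How `estimate_cost` actually combines these values is not visible in this diff, so `estimate_test_cost` is an assumed helper, and the per-token prices are hypothetical placeholders rather than real pricing.

```python
def estimate_test_cost(tokens_per_test: dict, model_cost: dict) -> float:
    """Estimate the cost of one test as tokens multiplied by per-token price."""
    input_cost = tokens_per_test["input"] * model_cost["input_cost_per_token"]
    output_cost = tokens_per_test["output"] * model_cost["output_cost_per_token"]
    return input_cost + output_cost


# Hypothetical prices for illustration only (not taken from any real model).
example_model_cost = {"input_cost_per_token": 1e-6, "output_cost_per_token": 2e-6}
# Local models are priced at zero, as in the hunk above.
local_model_cost = {"input_cost_per_token": 0, "output_cost_per_token": 0}

old_tool = {"input": 200, "output": 150}     # estimate removed by this commit
new_tool = {"input": 7577, "output": 5052}   # estimate added by this commit

old_cost = estimate_test_cost(old_tool, example_model_cost)
new_cost = estimate_test_cost(new_tool, example_model_cost)
local_cost = estimate_test_cost(new_tool, local_model_cost)

print(f"old: {old_cost:.6f}  new: {new_cost:.6f}  local: {local_cost:.6f}")
```

Under these placeholder prices the per-test estimate for a tool agent rises from about 0.0005 to about 0.0177, while the local-model estimate stays at zero because both per-token costs are zero.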