kshitijthakkar committed
Commit 3fbacd1 · 1 Parent(s): 83ebb04

docs: Add comprehensive Modal and HF Jobs documentation

README:
- Updated Features section with Multi-Cloud Evaluation and Smart Cost Estimation
- Added Modal as infrastructure option alongside HF Jobs
- Updated requirements section with Modal account setup
- Added hardware comparison table for both platforms
- Updated job submission workflow with both platforms
- Added expected duration estimates

Documentation Screen:
- Created new 'Job Submission' tab with complete guide
- Platform comparison: HF Jobs vs Modal
- Detailed hardware options and pricing for both platforms
- Auto-selection logic with code examples
- Cost estimation deep-dive (historical vs MCP AI)
- Job monitoring guide for both platforms
- Step-by-step workflow example
- Comprehensive troubleshooting section (10 common issues)
- Coverage: setup, execution, monitoring, debugging

Files changed (2):

1. README.md (+46 -16)
2. screens/documentation.py (+793 -0)
README.md CHANGED

@@ -62,8 +62,9 @@ This platform is part of a complete agent evaluation ecosystem built on two foun
 - **📊 Real-time Leaderboard**: Live evaluation data from HuggingFace datasets
 - **🤖 Autonomous Agent Chat**: Interactive agent powered by smolagents with MCP tools (Track 2)
 - **💬 MCP Integration**: AI-powered analysis using remote MCP servers
-- **💰 Cost Estimation**: Calculate evaluation costs for different models and configurations
-- **🔍 Trace Visualization**: Detailed OpenTelemetry trace analysis
+- **☁️ Multi-Cloud Evaluation**: Submit jobs to HuggingFace Jobs or Modal (H200, A100, A10 GPUs)
+- **💰 Smart Cost Estimation**: Auto-select hardware and predict costs before running evaluations
+- **🔍 Trace Visualization**: Detailed OpenTelemetry trace analysis with GPU metrics
 - **📈 Performance Metrics**: GPU utilization, CO2 emissions, token usage tracking
 - **🧠 Agent Reasoning**: View step-by-step agent planning and tool execution
 
@@ -180,11 +181,13 @@ If you don't configure your own keys:
 
 ## 🚀 Submitting Evaluation Jobs
 
-TraceMind-AI allows you to submit evaluation jobs directly from the UI to HuggingFace Jobs infrastructure.
+TraceMind-AI allows you to submit evaluation jobs to **two cloud platforms**:
+- **HuggingFace Jobs**: Managed compute with H200, A100, A10, T4 GPUs
+- **Modal**: Serverless GPU compute with pay-per-second pricing
 
 ### ⚠️ Requirements for Job Submission
 
-**IMPORTANT**: To submit evaluation jobs, you need:
+**For HuggingFace Jobs:**
 
 1. **HuggingFace Pro Account** ($9/month)
    - Sign up at: https://huggingface.co/pricing
@@ -199,6 +202,19 @@ TraceMind-AI allows you to submit evaluation jobs directly from the UI to Huggin
    - ✅ **Run Jobs** (submit evaluation jobs)
    - ⚠️ Read-only tokens will NOT work
 
+**For Modal (Optional Alternative):**
+
+1. **Modal Account** (Free tier available)
+   - Sign up at: https://modal.com
+   - Generate API token at: https://modal.com/settings/tokens
+   - Pay-per-second billing (no monthly subscription)
+
+2. **Configure Modal Credentials in Settings**
+   - MODAL_TOKEN_ID (starts with `ak-`)
+   - MODAL_TOKEN_SECRET (starts with `as-`)
+
+**Both Platforms Require:**
+
 3. **Model Provider API Keys**
    - OpenAI, Anthropic, Google, etc.
    - Configure in Settings → LLM Provider API Keys
@@ -206,44 +222,58 @@ TraceMind-AI allows you to submit evaluation jobs directly from the UI to Huggin
 
 ### Hardware Options & Pricing
 
-TraceMind auto-selects hardware based on your model:
+TraceMind **auto-selects optimal hardware** based on your model size and provider:
 
+**HuggingFace Jobs:**
 - **cpu-basic**: API models (OpenAI, Anthropic) - ~$0.05/hr
 - **t4-small**: Small models (4B-8B parameters) - ~$0.60/hr
 - **a10g-small**: Medium models (7B-13B) - ~$1.10/hr
 - **a100-large**: Large models (70B+) - ~$3.00/hr
+- Pricing: https://huggingface.co/pricing#spaces-pricing
 
-Full pricing: https://huggingface.co/pricing#spaces-pricing
+**Modal:**
+- **CPU**: API models - ~$0.0001/sec
+- **A10G**: Small-medium models (7B-13B) - ~$0.0006/sec
+- **A100-80GB**: Large models (70B+) - ~$0.0030/sec
+- **H200**: Fastest inference - ~$0.0050/sec
+- Pricing: https://modal.com/pricing
 
 ### How to Submit a Job
 
 1. **Configure API Keys** (Settings tab):
-   - Add HF Token (with Run Jobs permission)
-   - Add Modal API credentials (optional, for Modal execution)
+   - Add HF Token (with Run Jobs permission) - **required for both platforms**
+   - Add Modal credentials (MODAL_TOKEN_ID + MODAL_TOKEN_SECRET) - **for Modal only**
    - Add LLM provider keys (OpenAI, Anthropic, etc.)
 
 2. **Create Evaluation** (New Evaluation tab):
-   - Select infrastructure: HuggingFace Jobs or Modal
+   - **Select infrastructure**: HuggingFace Jobs or Modal
    - Choose model and agent type
-   - Configure hardware (or use "auto")
+   - Configure hardware (or use **"auto"** for smart selection)
    - Set timeout (default: 1h)
+   - Click "💰 Estimate Cost" to preview cost/duration
   - Click "Submit Evaluation"
 
 3. **Monitor Job**:
-   - View job ID and status
-   - Track at: https://huggingface.co/jobs
+   - View job ID and status in confirmation screen
+   - **HF Jobs**: Track at https://huggingface.co/jobs or use Job Monitoring tab
+   - **Modal**: Track at https://modal.com/apps
   - Results automatically appear in leaderboard when complete
 
 ### What Happens During a Job
 
-1. Job starts on HuggingFace infrastructure
-2. SMOLTRACE evaluates your model with OpenTelemetry tracing
-3. Results uploaded to 4 HuggingFace datasets:
+1. Job starts on selected infrastructure (HF Jobs or Modal)
+2. Docker container built with required dependencies
+3. SMOLTRACE evaluates your model with OpenTelemetry tracing
+4. Results uploaded to 4 HuggingFace datasets:
    - Leaderboard entry (summary stats)
   - Results dataset (test case details)
   - Traces dataset (OTEL spans)
   - Metrics dataset (GPU metrics, CO2 emissions)
-4. Results appear in TraceMind leaderboard automatically
+5. Results appear in TraceMind leaderboard automatically
+
+**Expected Duration:**
+- CPU jobs (API models): 2-5 minutes
+- GPU jobs (local models): 15-30 minutes (includes model download)
 
 ## Configuration
screens/documentation.py CHANGED

@@ -1791,6 +1791,796 @@ TraceMind-MCP-Server demonstrates:
     """)
 
 def create_documentation_screen():
     """
     Create the complete documentation screen with tabs
@@ -1818,6 +2608,9 @@ def create_documentation_screen():
     with gr.Tab("🔌 TraceMind-MCP-Server"):
         create_mcp_server_tab()
 
     gr.Markdown("""
     ---
 
     """)
 
+def create_job_submission_tab():
+    """Create the Job Submission tab with full details about Modal and HF Jobs"""
+    return gr.Markdown("""
+# ☁️ Job Submission
+
+**Run SMOLTRACE Evaluations on Cloud Infrastructure**
+
+TraceMind-AI provides seamless integration with two cloud compute platforms, allowing you to run agent evaluations with automated hardware selection, cost estimation, and real-time monitoring.
+
+---
+
+## 📋 Table of Contents
+
+- [Platform Overview](#-platform-overview)
+- [HuggingFace Jobs Integration](#-huggingface-jobs-integration)
+- [Modal Integration](#-modal-integration)
+- [Hardware Auto-Selection](#-hardware-auto-selection)
+- [Cost Estimation](#-cost-estimation)
+- [Job Monitoring](#-job-monitoring)
+- [Step-by-Step Guide](#-step-by-step-guide)
+- [Troubleshooting](#-troubleshooting)
+
+---
+
+## 🌟 Platform Overview
+
+### Supported Platforms
+
+| Platform | Best For | Pricing Model | GPU Options | Free Tier |
+|----------|----------|---------------|-------------|-----------|
+| **HuggingFace Jobs** | Managed infrastructure, dataset integration | Per-hour | T4, L4, A10, A100, V5e | ❌ ($9/mo Pro required) |
+| **Modal** | Serverless compute, pay-per-second | Per-second | T4, L4, A10, A100-80GB, H200 | ✅ Free credits available |
+
+### Key Differences
+
+**HuggingFace Jobs**:
+- ✅ Native HuggingFace ecosystem integration
+- ✅ Managed infrastructure with guaranteed availability
+- ✅ Built-in dataset storage and versioning
+- ⚠️ Requires Pro account ($9/month)
+- ⚠️ Per-hour billing (minimum 1 hour charge)
+
+**Modal**:
+- ✅ Serverless architecture (no minimum charges)
+- ✅ Pay-per-second billing (more cost-effective for short jobs)
+- ✅ Latest GPUs (H200 available)
+- ✅ Free tier with credits
+- ⚠️ Requires separate account setup
+- ⚠️ Container cold start time (~2-3 minutes first run)
+
+---
+
+## 🤗 HuggingFace Jobs Integration
+
+### Requirements
+
+**1. HuggingFace Pro Account**
+- Cost: $9/month
+- Sign up: https://huggingface.co/pricing
+- Includes compute credits and priority support
+
+**2. HuggingFace Token with Run Jobs Permission**
+```
+Steps to create token:
+1. Visit: https://huggingface.co/settings/tokens
+2. Click "New token"
+3. Name: "TraceMind Evaluation"
+4. Permissions:
+   ✅ Read (view datasets)
+   ✅ Write (upload results)
+   ✅ Run Jobs (submit evaluation jobs) ⚠️ REQUIRED
+5. Copy token (starts with hf_)
+6. Save in TraceMind Settings
+```
+
+### Hardware Options
+
+| Hardware | vCPUs | GPU | Memory | Best For | Price/hr |
+|----------|-------|-----|--------|----------|----------|
+| `cpu-basic` | 2 | - | 16 GB | API models (OpenAI, Anthropic) | ~$0.05 |
+| `cpu-upgrade` | 8 | - | 32 GB | API models (high volume) | ~$0.10 |
+| `t4-small` | 4 | T4 (16GB) | 16 GB | Small models (4B-8B) | ~$0.60 |
+| `t4-medium` | 8 | T4 (16GB) | 32 GB | Small models (batched) | ~$1.00 |
+| `a10g-small` | 4 | A10G (24GB) | 32 GB | Medium models (7B-13B) | ~$1.10 |
+| `a10g-large` | 12 | A10G (24GB) | 92 GB | Medium models (high memory) | ~$1.50 |
+| `a100-large` | 12 | A100 (80GB) | 142 GB | Large models (70B+) | ~$3.00 |
+| `v5e-1x1` | 4 | TPU v5e | 16 GB | TPU-optimized workloads | ~$1.20 |
+
+Full pricing: https://huggingface.co/pricing#spaces-pricing
+
+### Auto-Selection Logic
+
+When you select `hardware: auto`, TraceMind applies this logic:
+
+```python
+# API models (LiteLLM/Inference)
+if provider in ["litellm", "inference"]:
+    hardware = "cpu-basic"
+
+# Local models (Transformers)
+elif "70b" in model.lower() or "65b" in model.lower():
+    hardware = "a100-large"  # Large models
+elif "13b" in model.lower() or "34b" in model.lower():
+    hardware = "a10g-large"  # Medium models
+elif "7b" in model.lower() or "8b" in model.lower() or "4b" in model.lower():
+    hardware = "t4-small"  # Small models
+else:
+    hardware = "t4-small"  # Default
+```
+
+### Job Workflow
+
+```
+1. Configure Settings
+   └─> Add HF Token (with Run Jobs permission)
+   └─> Add LLM provider API keys
+
+2. Create Evaluation
+   └─> Select "HuggingFace Jobs" as infrastructure
+   └─> Choose model and configuration
+   └─> Hardware auto-selected or manually chosen
+
+3. Submit Job
+   └─> TraceMind validates credentials
+   └─> Submits job via HF Jobs API
+   └─> Returns job ID for monitoring
+
+4. Job Execution
+   └─> Container built with dependencies
+   └─> SMOLTRACE runs evaluation
+   └─> Results uploaded to HF datasets
+   └─> Leaderboard updated automatically
+
+5. Monitor Progress
+   └─> Track at: https://huggingface.co/jobs
+   └─> Or use Job Monitoring tab in TraceMind
+```
+
+---
+
+## ⚡ Modal Integration
+
+### Requirements
+
+**1. Modal Account**
+- Free tier: $30 free credits per month
+- Sign up: https://modal.com
+
+**2. Modal API Credentials**
+```
+Steps to get credentials:
+1. Visit: https://modal.com/settings/tokens
+2. Click "Create token"
+3. Copy:
+   - Token ID (starts with ak-)
+   - Token Secret (starts with as-)
+4. Save in TraceMind Settings:
+   - MODAL_TOKEN_ID: ak-xxxxx
+   - MODAL_TOKEN_SECRET: as-xxxxx
+```
+
+### Hardware Options
+
+| Hardware | GPU | Memory | Best For | Price/sec | Equivalent $/hr |
+|----------|-----|--------|----------|-----------|-----------------|
+| `CPU` | - | 16 GB | API models | ~$0.0001 | ~$0.36 |
+| `T4` | T4 (16GB) | 16 GB | Small models (4B-8B) | ~$0.0002 | ~$0.72 |
+| `L4` | L4 (24GB) | 24 GB | Small-medium models | ~$0.0004 | ~$1.44 |
+| `A10G` | A10G (24GB) | 32 GB | Medium models (7B-13B) | ~$0.0006 | ~$2.16 |
+| `L40S` | L40S (48GB) | 48 GB | Large models (optimized) | ~$0.0012 | ~$4.32 |
+| `A100` | A100 (40GB) | 64 GB | Large models | ~$0.0020 | ~$7.20 |
+| `A100-80GB` | A100 (80GB) | 128 GB | Very large models (70B+) | ~$0.0030 | ~$10.80 |
+| `H100` | H100 (80GB) | 192 GB | Latest generation inference | ~$0.0040 | ~$14.40 |
+| `H200` | H200 (141GB) | 256 GB | Cutting-edge, highest memory | ~$0.0050 | ~$18.00 |
+
+Full pricing: https://modal.com/pricing
+
+**💡 Cost Advantage**: Modal's per-second billing is more cost-effective for jobs under 1 hour!
+
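To make the billing difference concrete, here is a rough comparison of the two models for an A10G-class GPU, using the approximate rates quoted above; the rates, the round-up-to-a-full-hour behavior, and the function names are illustrative assumptions, not TraceMind code or live pricing:

```python
import math

# Approximate rates from the tables above (illustrative only).
HF_A10G_PER_HOUR = 1.10      # HF Jobs a10g-small, billed per hour
MODAL_A10G_PER_SEC = 0.0006  # Modal A10G, billed per second

def hf_jobs_cost(minutes: float) -> float:
    """Per-hour billing: partial hours assumed rounded up to a full hour."""
    return round(math.ceil(minutes / 60) * HF_A10G_PER_HOUR, 2)

def modal_cost(minutes: float) -> float:
    """Per-second billing: pay only for the seconds actually used."""
    return round(minutes * 60 * MODAL_A10G_PER_SEC, 2)

# A 20-minute evaluation on an A10G-class GPU:
print(hf_jobs_cost(20))  # 1.1  (a full hour is billed)
print(modal_cost(20))    # 0.72 (1200 s * $0.0006)
```

For runs under an hour the per-second model wins; once a job fills most of an hour the gap narrows.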
+### Auto-Selection Logic
+
+When you select `hardware: auto`, TraceMind applies this logic:
+
+```python
+# API models
+if provider in ["litellm", "inference"]:
+    gpu = None  # CPU only
+
+# Local models (Transformers)
+elif "70b" in model.lower() or "65b" in model.lower():
+    gpu = "A100-80GB"  # Large models need 80GB
+elif "13b" in model.lower() or "34b" in model.lower():
+    gpu = "A10G"  # Medium models
+elif "7b" in model.lower() or "8b" in model.lower():
+    gpu = "A10G"  # Small models efficient on A10G
+else:
+    gpu = "A10G"  # Default
+```
+
+### Modal-Specific Features
+
+**Dynamic Python Version Matching**
+```python
+# Automatically matches your environment
+python_version = f"{sys.version_info.major}.{sys.version_info.minor}"
+# Example: "3.10" on HF Space, "3.12" locally
+```
+
+**Optimized Docker Images**
+```python
+# GPU jobs: CUDA-optimized base
+image = "nvidia/cuda:12.6.0-cudnn-devel-ubuntu22.04"
+
+# CPU jobs: Lightweight
+image = "debian-slim"
+```
+
+**Smart Package Installation**
+```python
+# GPU jobs get the full stack
+packages = [
+    "smoltrace",
+    "transformers",
+    "torch",
+    "accelerate",    # For device_map
+    "bitsandbytes",  # For quantization
+    "hf_transfer",   # Fast downloads
+    "nvidia-ml-py",  # GPU metrics
+]
+
+# CPU jobs get minimal dependencies
+packages = ["smoltrace", "litellm", "ddgs"]
+```
+
+### Job Workflow
+
+```
+1. Configure Settings
+   └─> Add Modal Token ID + Secret
+   └─> Add HF Token (for dataset upload)
+   └─> Add LLM provider API keys
+
+2. Create Evaluation
+   └─> Select "Modal" as infrastructure
+   └─> Choose model and configuration
+   └─> Hardware auto-selected
+
+3. Submit Job
+   └─> TraceMind creates dynamic Modal app
+   └─> Submits job in background thread
+   └─> Returns Modal Call ID
+
+4. Job Execution
+   └─> Image builds (or uses cache)
+   └─> Model downloads to Modal storage
+   └─> SMOLTRACE runs evaluation
+   └─> Results uploaded to HF datasets
+
+5. Monitor Progress
+   └─> Track at: https://modal.com/apps
+   └─> View real-time streaming logs
+```
+
+---
+
+## 🎯 Hardware Auto-Selection
+
+### How It Works
+
+TraceMind **automatically selects optimal hardware** based on:
+1. **Provider type**: LiteLLM/Inference (API) vs Transformers (local)
+2. **Model size**: Extracted from model name (e.g., "70b", "13b", "8b")
+3. **Platform**: Modal or HuggingFace Jobs
+
+### Selection Matrix
+
+| Model Type | Model Size | HF Jobs | Modal |
+|------------|------------|---------|-------|
+| API (OpenAI, Anthropic) | Any | `cpu-basic` | `CPU` |
+| Transformers | 4B-8B | `t4-small` | `A10G` |
+| Transformers | 13B-34B | `a10g-large` | `A10G` |
+| Transformers | 70B+ | `a100-large` | `A100-80GB` |
+
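The matrix above can be sketched as one small helper; `select_hardware` is a hypothetical name and the branches simply restate the documented matrix, not TraceMind's actual implementation:

```python
def select_hardware(provider: str, model: str, platform: str) -> str:
    """Pick hardware per the documented selection matrix.

    `platform` is "hf" for HuggingFace Jobs or "modal" for Modal.
    Illustrative sketch only; the real auto-selector may differ.
    """
    name = model.lower()
    if provider in ("litellm", "inference"):           # API models
        return "cpu-basic" if platform == "hf" else "CPU"
    if "70b" in name or "65b" in name:                 # 70B+
        return "a100-large" if platform == "hf" else "A100-80GB"
    if "13b" in name or "34b" in name:                 # 13B-34B
        return "a10g-large" if platform == "hf" else "A10G"
    return "t4-small" if platform == "hf" else "A10G"  # 4B-8B and default

print(select_hardware("transformers", "meta-llama/Llama-3.1-70B", "modal"))  # A100-80GB
print(select_hardware("litellm", "openai/gpt-4", "hf"))                      # cpu-basic
```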
+### Override Auto-Selection
+
+You can manually select hardware if needed:
+
+```
+Reasons to override:
+- You know your model needs more memory
+- You want to test performance on different GPUs
+- You want to optimize the cost vs speed tradeoff
+```
+
+### Cost Estimation Shows Auto-Selection
+
+When you click **"💰 Estimate Cost"** with `auto` hardware:
+
+**Modal Example**:
+```
+Hardware: auto → **A100-80GB** (Modal)
+Estimated Cost: $0.45
+Duration: 15 minutes
+```
+
+**HF Jobs Example**:
+```
+Hardware: auto → **a100-large** (HF Jobs)
+Estimated Cost: $0.75
+Duration: 15 minutes
+```
+
+---
+
+## 💰 Cost Estimation
+
+### How Cost Estimation Works
+
+TraceMind provides **AI-powered cost estimation** before you submit jobs.
+
+**Data Sources**:
+1. **Historical Data** (preferred): Analyzes past runs from the leaderboard
+2. **MCP Server** (fallback): Uses the `estimate_cost` MCP tool with Gemini 2.5 Pro
+
+### Estimation Process
+
+```
+1. User clicks "💰 Estimate Cost"
+
+2. TraceMind checks for historical data
+   └─> If found: Use average cost/duration from past runs
+   └─> If not found: Call MCP Server for AI analysis
+
+3. Auto-selection applied
+   └─> Determines actual hardware that will be used
+   └─> Maps to pricing table
+
+4. Display estimate
+   └─> Cost breakdown
+   └─> Duration estimate
+   └─> Hardware details
+```
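A minimal sketch of the historical-first, AI-fallback decision above; `estimate_cost`, the `history` mapping, and the `ai_estimator` callback are hypothetical stand-ins for the leaderboard lookup and the MCP tool:

```python
from statistics import mean

def estimate_cost(model: str, history: dict, ai_estimator) -> tuple[float, str]:
    """Prefer the mean cost of past runs; fall back to AI analysis otherwise.

    `history` maps model name -> list of past run costs; `ai_estimator` is a
    stand-in for the MCP `estimate_cost` tool. Both are illustrative.
    """
    runs = history.get(model, [])
    if runs:  # historical data available
        return round(mean(runs), 2), f"historical ({len(runs)} past runs)"
    return ai_estimator(model), "MCP AI analysis"  # fallback path

history = {"meta-llama/Llama-3.1-70B": [0.42, 0.47, 0.45, 0.44, 0.47]}
print(estimate_cost("meta-llama/Llama-3.1-70B", history, lambda m: 2.70))
# (0.45, 'historical (5 past runs)')
print(estimate_cost("new/unseen-model", history, lambda m: 2.70))
# (2.7, 'MCP AI analysis')
```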
+
+### Cost Estimate Components
+
+**Historical Data Estimate**:
+```markdown
+## 💰 Cost Estimate
+
+**📊 Historical Data (5 past runs)**
+
+| Metric | Value |
+|--------|-------|
+| Model | meta-llama/Llama-3.1-70B |
+| Hardware | auto → **A100-80GB** (Modal) |
+| Estimated Cost | $0.45 |
+| Duration | 15.2 minutes |
+
+---
+
+*Based on 5 previous evaluation runs in the leaderboard.*
+```
+
+**MCP AI Estimate**:
+```markdown
+## 💰 Cost Estimate - AI Analysis
+
+**🤖 Powered by MCP Server + Gemini 2.5 Pro**
+
+*This estimate was generated by AI analysis since no historical
+data is available for this model.*
+
+**Hardware**: auto → **A100-80GB** (Modal)
+
+---
+
+Based on the model size (70B parameters) and evaluation
+configuration, I estimate:
+
+**Cost Breakdown**:
+- Model download: ~5 minutes @ $0.0030/sec = $0.90
+- Evaluation (100 tests): ~10 minutes @ $0.0030/sec = $1.80
+- **Total estimated cost**: $2.70
+
+**Duration**: 15-20 minutes
+
+**Recommendations**:
+- For cost savings, consider using A10G with quantization
+- For faster inference, H200 reduces duration to ~8 minutes
+```
+
+### Accuracy of Estimates
+
+**Historical estimates**: ±10% accuracy
+- Based on actual past runs
+- Accounts for model-specific behavior
+
+**MCP AI estimates**: ±30% accuracy
+- Uses model knowledge and heuristics
+- Conservative (tends to overestimate)
+
+**Factors affecting accuracy**:
+- Model download time varies (network speed, caching)
+- Evaluation complexity depends on dataset
+- GPU availability can affect queue time
+
+---
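The accuracy bands above translate into simple low/high ranges around a point estimate; a one-function sketch (illustrative helper, not TraceMind code):

```python
def estimate_range(estimate: float, accuracy: float) -> tuple[float, float]:
    """Turn a point estimate and a ± accuracy fraction into a (low, high) band."""
    return estimate * (1 - accuracy), estimate * (1 + accuracy)

# MCP AI estimate of $2.70 at ±30% accuracy:
low, high = estimate_range(2.70, 0.30)
print(f"${low:.2f} - ${high:.2f}")  # $1.89 - $3.51
```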
+
+## 🔍 Job Monitoring
+
+### HuggingFace Jobs Monitoring
+
+**Built-in Tab**: Go to **"🔍 Job Monitoring"** in TraceMind
+
+**Features**:
+```
+📋 Inspect Job
+   └─> Enter HF Job ID
+   └─> View status, hardware, timestamps
+   └─> See next steps based on status
+
+📜 Job Logs
+   └─> Load execution logs
+   └─> Auto-refresh option
+   └─> Search and filter
+
+📡 Recent Jobs
+   └─> List your recent jobs
+   └─> Quick status overview
+   └─> Click to inspect
+```
+
+**Job Statuses**:
+- ⏳ **QUEUED**: Waiting to start
+- 🔄 **STARTING**: Initializing (1-2 min)
+- ▶️ **RUNNING**: Executing evaluation
+- ✅ **SUCCEEDED**: Completed successfully
+- ❌ **FAILED**: Error occurred (check logs)
+- 🚫 **CANCELLED**: Manually stopped
+
+**External Monitoring**:
+- HF Dashboard: https://huggingface.co/jobs
+- CLI: `hf jobs ps` and `hf jobs logs <job_id>`
+
+### Modal Monitoring
+
+**Modal Dashboard**: https://modal.com/apps
+
+**Features**:
+- Real-time streaming logs
+- GPU utilization graphs
+- Cost tracking
+- Container status
+
+**Log Visibility**:
+TraceMind streams output from Modal jobs, so you see in real time:
+```
+================================================================================
+Starting SMOLTRACE evaluation on Modal
+Command: smoltrace-eval --model Qwen/Qwen3-8B ...
+Python version: 3.10.0
+GPU: NVIDIA A10
+GPU Memory: 23.68 GB
+================================================================================
+
+Note: Model download may take several minutes for large models (14B = ~28GB)
+Downloading and initializing model...
+
+[Download progress bars appear here]
+[Evaluation progress appears here]
+
+================================================================================
+EVALUATION COMPLETED
+Return code: 0
+================================================================================
+```
+
+### Expected Duration
+
+**CPU Jobs (API Models)**:
+- Queue time: <1 minute
+- Execution: 2-5 minutes
+- **Total**: ~5 minutes
+
+**GPU Jobs (Local Models)**:
+- Queue time: 1-3 minutes
+- Image build: 2-5 minutes (first run, then cached)
+- Model download: 5-15 minutes (14B = ~10 min, 70B = ~15 min)
+- Evaluation: 3-10 minutes (depends on dataset size)
+- **Total**: 15-30 minutes
+
+**Pro Tip**: Modal caches images and models, so subsequent runs are **much faster** (they skip the image build and model download).
+
+---
+
+## 📝 Step-by-Step Guide
+
+### Complete Workflow Example
+
+**Scenario**: Evaluate GPT-4 via LiteLLM on HuggingFace Jobs
+
+#### Step 1: Configure API Keys
+
+```
+1. Go to "⚙️ Settings" tab
+2. Under "HuggingFace Configuration":
+   - HF Token: [your token with Run Jobs permission]
+   - Click "Save API Keys"
+3. Under "LLM Provider API Keys":
+   - OpenAI API Key: [your key]
+   - Click "Save API Keys"
+```
+
+#### Step 2: Navigate to New Evaluation
+
+```
+1. Click "🚀 New Evaluation" in sidebar
+2. You'll see the evaluation form with multiple sections
+```
+
+#### Step 3: Configure Evaluation
+
+**Infrastructure**:
+```
+Infrastructure Provider: HuggingFace Jobs
+Hardware: auto (will select cpu-basic)
+```
+
+**Model Configuration**:
+```
+Model: openai/gpt-4
+Provider: litellm
+```
+
+**Agent Configuration**:
+```
+Agent Type: both (tool + code)
+Search Provider: duckduckgo
+Tools: python_interpreter, visit_webpage, duckduckgo_search
+```
+
+**Test Configuration**:
+```
+Dataset: kshitijthakkar/smoltrace-tasks
+Split: train
+Difficulty: all
+Parallel Workers: 1
+```
+
+**Output & Monitoring**:
+```
+Output Format: hub (HuggingFace datasets)
+Enable OTEL: ✅
+Enable GPU Metrics: ✅ (N/A for CPU)
+Timeout: 1h
+```
+
+#### Step 4: Estimate Cost
+
+```
+1. Click "💰 Estimate Cost"
+2. Review estimate:
+   - Hardware: auto → **cpu-basic** (HF Jobs)
+   - Cost: ~$0.08
+   - Duration: ~3 minutes
+```
+
+#### Step 5: Submit Job
+
+```
+1. Click "Submit Evaluation"
+2. Confirmation appears:
+   ✅ Job submitted successfully!
+
+   Job Details:
+   - Run ID: job_abc12345
+   - HF Job ID: kshitijthakkar/def67890
+   - Hardware: cpu-basic
+   - Platform: HuggingFace Jobs
+```
+
+#### Step 6: Monitor Job
+
+**Option A: TraceMind Job Monitoring**
+```
+1. Go to "🔍 Job Monitoring" tab
+2. Click "📋 Inspect Job"
+3. Paste HF Job ID: kshitijthakkar/def67890
+4. Click "🔍 Inspect Job"
+5. View status and click "📥 Load Logs"
+```
+
+**Option B: HuggingFace Dashboard**
+```
+1. Visit: https://huggingface.co/jobs
+2. Find your job by ID or timestamp
+3. View logs and status
+```
+
+#### Step 7: View Results
+
+```
+When job completes (SUCCEEDED):
+1. Go to "📊 Leaderboard" tab
+2. Click "Load Leaderboard"
+3. Find your run (job_abc12345)
+4. Click row to view detailed results
+```
+
+---
+
+## 🔧 Troubleshooting
+
+### Common Issues & Solutions
+
+#### 1. "Modal package not installed"
+
+**Error**:
+```
+Modal package not installed. Install with: pip install modal
+```
+
+**Solution**:
+```bash
+pip install "modal>=0.64.0"
+```
+
+#### 2. "HuggingFace token not configured"
+
+**Error**:
+```
+HuggingFace token not configured. Please set HF_TOKEN in Settings.
+```
+
+**Solution**:
+1. Get a token from: https://huggingface.co/settings/tokens
+2. Add it in Settings → HuggingFace Configuration
+3. Ensure permissions include **Read**, **Write**, and **Run Jobs**
+
+#### 3. "Modal authentication failed"
+
+**Error**:
+```
+Modal authentication failed. Please verify your MODAL_TOKEN_ID
+and MODAL_TOKEN_SECRET in Settings.
+```
+
+**Solution**:
+1. Get credentials from: https://modal.com/settings/tokens
+2. Add both:
+   - MODAL_TOKEN_ID (starts with `ak-`)
+   - MODAL_TOKEN_SECRET (starts with `as-`)
+3. Save and retry
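Before retrying, a quick local sanity check on the documented `ak-`/`as-` prefixes can catch swapped or truncated values; the helper below is a hypothetical sketch, not part of TraceMind, and it cannot verify the tokens against Modal itself:

```python
import os

def modal_credentials_look_valid(env=os.environ) -> list[str]:
    """Return a list of problems found with the Modal credential variables.

    Only checks the documented prefixes (`ak-` for the ID, `as-` for the
    secret); an empty list means the values at least look plausible.
    """
    problems = []
    if not env.get("MODAL_TOKEN_ID", "").startswith("ak-"):
        problems.append("MODAL_TOKEN_ID missing or does not start with 'ak-'")
    if not env.get("MODAL_TOKEN_SECRET", "").startswith("as-"):
        problems.append("MODAL_TOKEN_SECRET missing or does not start with 'as-'")
    return problems

print(modal_credentials_look_valid(
    {"MODAL_TOKEN_ID": "ak-xxxxx", "MODAL_TOKEN_SECRET": "as-xxxxx"}
))  # [] -> prefixes look right
```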
+
2449
+ #### 4. "Job failed - Python version mismatch"
2450
+
2451
+ **Error** (in Modal logs):
2452
+ ```
2453
+ The 'submit_modal_job.<locals>.run_evaluation' Function
2454
+ was defined with Python 3.12, but its Image has 3.10.
2455
+ ```
2456
+
2457
+ **Solution**:
2458
+ This is automatically fixed in the latest version! TraceMind now dynamically matches Python versions.
2459
+
2460
+ If still occurring:
2461
+ 1. Pull latest code: `git pull origin main`
2462
+ 2. Restart app
2463
+
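For reference, the version-matching fix boils down to deriving the submitting interpreter's `major.minor` string and pinning the Modal image to it. A minimal sketch; the `modal.Image.debian_slim(python_version=...)` call shown in the comment is Modal's way of pinning an image's Python, and the surrounding code is illustrative rather than TraceMind's exact implementation:

```python
import sys

# Build a "major.minor" version string for the interpreter that is
# submitting the job, e.g. "3.12".
python_version = f"{sys.version_info.major}.{sys.version_info.minor}"

# With Modal installed, the image can then be pinned to the same
# version, so the function and its image never disagree:
#   image = modal.Image.debian_slim(python_version=python_version)
print(python_version)
```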
+ #### 5. "Fast download using 'hf_transfer' is enabled but package not available"
+
+ **Error** (in Modal logs):
+ ```
+ ValueError: Fast download using 'hf_transfer' is enabled but
+ 'hf_transfer' package is not available.
+ ```
+
+ **Solution**:
+ This is automatically fixed in the latest version! TraceMind now includes `hf_transfer` in GPU job packages.
+
+ If still occurring:
+ 1. Pull latest code
+ 2. Modal will rebuild the image with the new dependencies
+
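If rebuilding the image immediately is not an option, one workaround is to turn the fast-download path off so `huggingface_hub` falls back to its standard downloader; `HF_HUB_ENABLE_HF_TRANSFER` is the environment variable it checks:

```shell
# Disable hf_transfer so downloads use the standard (slower) path
# instead of failing on the missing package.
export HF_HUB_ENABLE_HF_TRANSFER=0
```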
+ #### 6. "Job stuck at 'Downloading model'"
+
+ **Symptoms**:
+ - Logs show "Downloading and initializing model..."
+ - No progress for 10+ minutes
+
+ **Explanation**:
+ - Large models (14B+) take 10-15 minutes to download
+ - This is normal! Model size: 28GB for 14B, 140GB for 70B
+
+ **Solution**:
+ - Be patient - the download is in progress (Modal's network is fast)
+ - Future runs will use the cache and start instantly
+ - Check the Modal dashboard for download progress
+
+ #### 7. "Job completed but no results in leaderboard"
+
+ **Symptoms**:
+ - Job status shows SUCCEEDED
+ - No entry in leaderboard
+
+ **Possible Causes**:
+ 1. Results uploaded to a different user's namespace
+ 2. Leaderboard not refreshed
+ 3. Job failed during result upload
+
+ **Solution**:
+ ```
+ 1. Refresh leaderboard: Click "Load Leaderboard"
+ 2. Check HF dataset repos:
+    - kshitijthakkar/smoltrace-leaderboard
+    - kshitijthakkar/smoltrace-results-<timestamp>
+ 3. Verify HF token has Write permission
+ 4. Check job logs for upload errors
+ ```
+
+ #### 8. "Cannot submit job - HuggingFace Pro required"
+
+ **Error**:
+ ```
+ HuggingFace Pro Account ($9/month) required to submit jobs.
+ Free accounts cannot submit jobs.
+ ```
+
+ **Solution**:
+ - Option A: Upgrade to HF Pro: https://huggingface.co/pricing
+ - Option B: Use Modal instead (has a free tier with credits)
+
+ #### 9. "Modal job exits after image build"
+
+ **Symptoms**:
+ - Logs show: "Stopping app - local entrypoint completed"
+ - Job ends without running evaluation
+
+ **Solution**:
+ This was a known issue (fixed in the latest version). The problem was using `.spawn()` inside the `with app.run()` context.
+
+ The current implementation calls `.remote()` in a background thread, which ensures the job completes.
+
+ If still occurring:
+ 1. Pull latest code: `git pull origin main`
+ 2. Restart the app
+ 3. Resubmit the job
+
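The shape of that fix can be sketched as follows; `run_evaluation` and `config` are stand-ins for TraceMind's actual function handle and arguments, not its real code:

```python
import threading

def submit_in_background(run_evaluation, config):
    # .remote() blocks until the remote function returns, so running it
    # on a background thread keeps submission non-blocking while the
    # Modal app stays alive for the whole job. Using .spawn() inside
    # `with app.run():` returns immediately instead, and the context
    # manager then stops the app before the evaluation finishes.
    worker = threading.Thread(target=lambda: run_evaluation.remote(config))
    worker.start()
    return worker
```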
+ #### 10. "Cost estimate shows wrong hardware"
+
+ **Symptoms**:
+ - Selected Modal with 70B model
+ - Cost estimate shows "a10g-small" instead of "A100-80GB"
+
+ **Solution**:
+ This was a known issue (fixed in latest version). Cost estimation now applies platform-specific auto-selection logic.
+
+ Verify fix:
+ 1. Pull latest code
+ 2. Click "πŸ’° Estimate Cost"
+ 3. Should show: `auto β†’ **A100-80GB** (Modal)`
+
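Conceptually, the fix makes the cost estimator run the same per-platform auto-selection as the job submitter. A hypothetical sketch of that mapping; the hardware names match the tables earlier in this guide, but the 30B threshold and function name are illustrative, not TraceMind's exact rule:

```python
def resolve_hardware(platform: str, requested: str, model_size_b: float) -> str:
    # Explicit hardware choices pass through untouched; only "auto"
    # triggers platform-specific selection.
    if requested != "auto":
        return requested
    if platform == "modal":
        # Modal uses GPU names: large models need the 80 GB A100.
        return "A100-80GB" if model_size_b >= 30 else "A10G"
    # HF Jobs uses flavor names instead of GPU names.
    return "a10g-large" if model_size_b >= 30 else "a10g-small"
```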
+ ---
+
+ ## πŸ“ž Getting Help
+
+ ### Resources
+
+ **Documentation**:
+ - TraceMind Docs: This tab!
+ - SMOLTRACE Docs: [GitHub](https://github.com/Mandark-droid/SMOLTRACE)
+ - Modal Docs: https://modal.com/docs
+ - HF Jobs Docs: https://huggingface.co/docs/hub/spaces-sdks-docker
+
+ **Community**:
+ - GitHub Issues: [TraceMind-AI Issues](https://github.com/Mandark-droid/TraceMind-AI/issues)
+ - LinkedIn: [@kshitij-thakkar](https://www.linkedin.com/in/kshitij-thakkar-2061b924)
+
+ **Support**:
+ - For TraceMind bugs: Open a GitHub issue
+ - For Modal issues: https://modal.com/docs/support
+ - For HF Jobs issues: https://discuss.huggingface.co/
+
+ ---
+
+ *TraceMind-AI - Multi-cloud agent evaluation made simple* ☁️
+ """)
+
+
 def create_documentation_screen():
     """
     Create the complete documentation screen with tabs

     with gr.Tab("πŸ”Œ TraceMind-MCP-Server"):
         create_mcp_server_tab()

+    with gr.Tab("☁️ Job Submission"):
+        create_job_submission_tab()
+
     gr.Markdown("""
     ---