Joseph Pollack committed on
Commit a46bf8b · unverified · 1 Parent(s): 5840d45

restore docs ci

This view is limited to 50 files because it contains too many changes.

Files changed (50)
  1. .github/README.md +20 -166
  2. .pre-commit-hooks/run_pytest.ps1 +5 -0
  3. .pre-commit-hooks/run_pytest.sh +5 -0
  4. .pre-commit-hooks/run_pytest_embeddings.ps1 +14 -0
  5. .pre-commit-hooks/run_pytest_embeddings.sh +15 -0
  6. .pre-commit-hooks/run_pytest_unit.ps1 +14 -0
  7. .pre-commit-hooks/run_pytest_unit.sh +15 -0
  8. .pre-commit-hooks/run_pytest_with_sync.ps1 +25 -0
  9. .pre-commit-hooks/run_pytest_with_sync.py +235 -0
  10. README.md +92 -172
  11. dev/.cursorrules +241 -0
  12. dev/AGENTS.txt +236 -0
  13. dev/Makefile +51 -0
  14. dev/docs_plugins.py +74 -0
  15. docs/CONFIGURATION.md +0 -301
  16. docs/api/agents.md +0 -3
  17. docs/api/models.md +0 -3
  18. docs/api/orchestrators.md +0 -3
  19. docs/api/services.md +0 -3
  20. docs/api/tools.md +0 -3
  21. docs/architecture/agents.md +0 -3
  22. docs/architecture/design-patterns.md +0 -1509
  23. docs/architecture/graph-orchestration.md +152 -0
  24. docs/architecture/graph_orchestration.md +8 -0
  25. docs/architecture/middleware.md +0 -3
  26. docs/architecture/orchestrators.md +198 -0
  27. docs/architecture/overview.md +0 -474
  28. docs/architecture/services.md +0 -3
  29. docs/architecture/tools.md +0 -3
  30. docs/architecture/workflow-diagrams.md +670 -0
  31. docs/{workflow-diagrams.md → architecture/workflows.md} +0 -0
  32. docs/brainstorming/00_ROADMAP_SUMMARY.md +0 -194
  33. docs/brainstorming/01_PUBMED_IMPROVEMENTS.md +0 -125
  34. docs/brainstorming/02_CLINICALTRIALS_IMPROVEMENTS.md +0 -193
  35. docs/brainstorming/03_EUROPEPMC_IMPROVEMENTS.md +0 -211
  36. docs/brainstorming/04_OPENALEX_INTEGRATION.md +0 -303
  37. docs/brainstorming/implementation/15_PHASE_OPENALEX.md +0 -603
  38. docs/brainstorming/implementation/16_PHASE_PUBMED_FULLTEXT.md +0 -586
  39. docs/brainstorming/implementation/17_PHASE_RATE_LIMITING.md +0 -540
  40. docs/brainstorming/implementation/README.md +0 -143
  41. docs/brainstorming/magentic-pydantic/00_SITUATION_AND_PLAN.md +0 -189
  42. docs/brainstorming/magentic-pydantic/01_ARCHITECTURE_SPEC.md +0 -289
  43. docs/brainstorming/magentic-pydantic/02_IMPLEMENTATION_PHASES.md +0 -112
  44. docs/brainstorming/magentic-pydantic/03_IMMEDIATE_ACTIONS.md +0 -112
  45. docs/brainstorming/magentic-pydantic/04_FOLLOWUP_REVIEW_REQUEST.md +0 -158
  46. docs/brainstorming/magentic-pydantic/REVIEW_PROMPT_FOR_SENIOR_AGENT.md +0 -113
  47. docs/bugs/FIX_PLAN_MAGENTIC_MODE.md +0 -227
  48. docs/bugs/P0_MAGENTIC_MODE_BROKEN.md +0 -116
  49. docs/bugs/P1_GRADIO_SETTINGS_CLEANUP.md +0 -81
  50. docs/configuration/CONFIGURATION.md +743 -0
.github/README.md CHANGED
@@ -1,38 +1,21 @@
- ---
- title: DeepCritical
- emoji: 🧬
- colorFrom: blue
- colorTo: purple
- sdk: gradio
- sdk_version: "6.0.1"
- python_version: "3.11"
- app_file: src/app.py
- pinned: false
- license: mit
- tags:
- - mcp-in-action-track-enterprise
- - mcp-hackathon
- - drug-repurposing
- - biomedical-ai
- - pydantic-ai
- - llamaindex
- - modal
- ---
-
- # DeepCritical
-
- ## Intro
-
- ## Features
-
- - **Multi-Source Search**: PubMed, ClinicalTrials.gov, bioRxiv/medRxiv
- - **MCP Integration**: Use our tools from Claude Desktop or any MCP client
- - **Modal Sandbox**: Secure execution of AI-generated statistical code
- - **LlamaIndex RAG**: Semantic search and evidence synthesis
- - **HuggingfaceInference**:
- - **HuggingfaceMCP Custom Config To Use Community Tools**:
- - **Strongly Typed Composable Graphs**:
- - **Specialized Research Teams of Agents**:
+
+ > [!IMPORTANT]
+ > **You are reading the GitHub README!**
+ >
+ > - 📚 **Documentation**: See our [technical documentation](https://deepcritical.github.io/GradioDemo/) for detailed information
+ > - 📖 **Demo README**: Check out the [Demo README](../README.md) for setup, configuration, and contribution guidelines
+ > - 🏆 **Hackathon Submission**: Keep reading below for more information about our MCP Hackathon submission
+
+ <div align="center">
+
+ [![GitHub](https://img.shields.io/github/stars/DeepCritical/GradioDemo?style=for-the-badge&logo=github&logoColor=white&label=🐙%20GitHub&labelColor=181717&color=181717)](https://github.com/DeepCritical/GradioDemo)
+ [![Documentation](https://img.shields.io/badge/Docs-0080FF?style=for-the-badge&logo=readthedocs&logoColor=white&labelColor=0080FF&color=0080FF)](https://deepcritical.github.io/GradioDemo/)
+ [![Demo](https://img.shields.io/badge/🚀%20Demo-FFD21E?style=for-the-badge&logo=huggingface&logoColor=white&labelColor=FFD21E&color=FFD21E)](https://huggingface.co/spaces/DataQuests/DeepCritical)
+ [![codecov](https://codecov.io/gh/DeepCritical/GradioDemo/graph/badge.svg?token=B1f05RCGpz)](https://codecov.io/gh/DeepCritical/GradioDemo)
+ [![Join us on Discord](https://img.shields.io/discord/1109943800132010065?label=Discord&logo=discord&style=flat-square)](https://discord.gg/qdfnvSPcqP)
+
+ </div>

  ## Quick Start

@@ -43,14 +26,14 @@ tags:
  pip install uv

  # Sync dependencies
- uv sync
+ uv sync --all-extras
  ```

  ### 2. Run the UI

  ```bash
  # Start the Gradio app
- uv run gradio run src/app.py
+ gradio run "src/app.py"
  ```

  Open your browser to `http://localhost:7860`.

@@ -72,132 +55,3 @@ Add this to your `claude_desktop_config.json`:
  }
  }
  ```
-
- **Available Tools**:
- - `search_pubmed`: Search peer-reviewed biomedical literature.
- - `search_clinical_trials`: Search ClinicalTrials.gov.
- - `search_biorxiv`: Search bioRxiv/medRxiv preprints.
- - `search_all`: Search all sources simultaneously.
- - `analyze_hypothesis`: Secure statistical analysis using Modal sandboxes.
-
-
- ## Deep Research Flows
-
- - iterativeResearch
- - deepResearch
- - researchTeam
-
- ### Iterative Research
-
- sequenceDiagram
- participant IterativeFlow
- participant ThinkingAgent
- participant KnowledgeGapAgent
- participant ToolSelector
- participant ToolExecutor
- participant JudgeHandler
- participant WriterAgent
-
- IterativeFlow->>IterativeFlow: run(query)
-
- loop Until complete or max_iterations
- IterativeFlow->>ThinkingAgent: generate_observations()
- ThinkingAgent-->>IterativeFlow: observations
-
- IterativeFlow->>KnowledgeGapAgent: evaluate_gaps()
- KnowledgeGapAgent-->>IterativeFlow: KnowledgeGapOutput
-
- alt Research complete
- IterativeFlow->>WriterAgent: create_final_report()
- WriterAgent-->>IterativeFlow: final_report
- else Gaps remain
- IterativeFlow->>ToolSelector: select_agents(gap)
- ToolSelector-->>IterativeFlow: AgentSelectionPlan
-
- IterativeFlow->>ToolExecutor: execute_tool_tasks()
- ToolExecutor-->>IterativeFlow: ToolAgentOutput[]
-
- IterativeFlow->>JudgeHandler: assess_evidence()
- JudgeHandler-->>IterativeFlow: should_continue
- end
- end
-
-
- ### Deep Research
-
- sequenceDiagram
- actor User
- participant GraphOrchestrator
- participant InputParser
- participant GraphBuilder
- participant GraphExecutor
- participant Agent
- participant BudgetTracker
- participant WorkflowState
-
- User->>GraphOrchestrator: run(query)
- GraphOrchestrator->>InputParser: detect_research_mode(query)
- InputParser-->>GraphOrchestrator: mode (iterative/deep)
- GraphOrchestrator->>GraphBuilder: build_graph(mode)
- GraphBuilder-->>GraphOrchestrator: ResearchGraph
- GraphOrchestrator->>WorkflowState: init_workflow_state()
- GraphOrchestrator->>BudgetTracker: create_budget()
- GraphOrchestrator->>GraphExecutor: _execute_graph(graph)
-
- loop For each node in graph
- GraphExecutor->>Agent: execute_node(agent_node)
- Agent->>Agent: process_input
- Agent-->>GraphExecutor: result
- GraphExecutor->>WorkflowState: update_state(result)
- GraphExecutor->>BudgetTracker: add_tokens(used)
- GraphExecutor->>BudgetTracker: check_budget()
- alt Budget exceeded
- GraphExecutor->>GraphOrchestrator: emit(error_event)
- else Continue
- GraphExecutor->>GraphOrchestrator: emit(progress_event)
- end
- end
-
- GraphOrchestrator->>User: AsyncGenerator[AgentEvent]
-
- ### Research Team
- Critical Deep Research Agent
-
- ## Development
-
- ### Run Tests
-
- ```bash
- uv run pytest
- ```
-
- ### Run Checks
-
- ```bash
- make check
- ```
-
- ## Architecture
-
- DeepCritical uses a Vertical Slice Architecture:
-
- 1. **Search Slice**: Retrieving evidence from PubMed, ClinicalTrials.gov, and bioRxiv.
- 2. **Judge Slice**: Evaluating evidence quality using LLMs.
- 3. **Orchestrator Slice**: Managing the research loop and UI.
-
- Built with:
- - **PydanticAI**: For robust agent interactions.
- - **Gradio**: For the streaming user interface.
- - **PubMed, ClinicalTrials.gov, bioRxiv**: For biomedical data.
- - **MCP**: For universal tool access.
- - **Modal**: For secure code execution.
-
- ## Team
-
- - The-Obstacle-Is-The-Way
- - MarioAderman
- - Josephrp
-
- ## Links
-
- - [GitHub Repository](https://github.com/The-Obstacle-Is-The-Way/DeepCritical-1)
.pre-commit-hooks/run_pytest.ps1 CHANGED
@@ -2,6 +2,8 @@
  # Uses uv if available, otherwise falls back to pytest

  if (Get-Command uv -ErrorAction SilentlyContinue) {
+     # Sync dependencies before running tests
+     uv sync
      uv run pytest $args
  } else {
      Write-Warning "uv not found, using system pytest (may have missing dependencies)"
@@ -12,3 +14,6 @@ if (Get-Command uv -ErrorAction SilentlyContinue) {
+
+
+
.pre-commit-hooks/run_pytest.sh CHANGED
@@ -3,6 +3,8 @@
  # Uses uv if available, otherwise falls back to pytest

  if command -v uv >/dev/null 2>&1; then
+     # Sync dependencies before running tests
+     uv sync
      uv run pytest "$@"
  else
      echo "Warning: uv not found, using system pytest (may have missing dependencies)"
@@ -13,3 +15,6 @@ fi
+
+
+
.pre-commit-hooks/run_pytest_embeddings.ps1 ADDED
@@ -0,0 +1,14 @@
# PowerShell wrapper to sync embeddings dependencies and run embeddings tests

$ErrorActionPreference = "Stop"

if (Get-Command uv -ErrorAction SilentlyContinue) {
    Write-Host "Syncing embeddings dependencies..."
    uv sync --extra embeddings
    Write-Host "Running embeddings tests..."
    uv run pytest tests/ -v -m local_embeddings --tb=short -p no:logfire
} else {
    Write-Error "uv not found"
    exit 1
}
.pre-commit-hooks/run_pytest_embeddings.sh ADDED
@@ -0,0 +1,15 @@
#!/bin/bash
# Wrapper script to sync embeddings dependencies and run embeddings tests

set -e

if command -v uv >/dev/null 2>&1; then
    echo "Syncing embeddings dependencies..."
    uv sync --extra embeddings
    echo "Running embeddings tests..."
    uv run pytest tests/ -v -m local_embeddings --tb=short -p no:logfire
else
    echo "Error: uv not found"
    exit 1
fi
.pre-commit-hooks/run_pytest_unit.ps1 ADDED
@@ -0,0 +1,14 @@
# PowerShell wrapper to sync dependencies and run unit tests

$ErrorActionPreference = "Stop"

if (Get-Command uv -ErrorAction SilentlyContinue) {
    Write-Host "Syncing dependencies..."
    uv sync
    Write-Host "Running unit tests..."
    uv run pytest tests/unit/ -v -m "not openai and not embedding_provider" --tb=short -p no:logfire
} else {
    Write-Error "uv not found"
    exit 1
}
.pre-commit-hooks/run_pytest_unit.sh ADDED
@@ -0,0 +1,15 @@
#!/bin/bash
# Wrapper script to sync dependencies and run unit tests

set -e

if command -v uv >/dev/null 2>&1; then
    echo "Syncing dependencies..."
    uv sync
    echo "Running unit tests..."
    uv run pytest tests/unit/ -v -m "not openai and not embedding_provider" --tb=short -p no:logfire
else
    echo "Error: uv not found"
    exit 1
fi
.pre-commit-hooks/run_pytest_with_sync.ps1 ADDED
@@ -0,0 +1,25 @@
# PowerShell wrapper for pytest runner
# Ensures uv is available and runs the Python script

param(
    [Parameter(Position=0)]
    [string]$TestType = "unit"
)

$ErrorActionPreference = "Stop"

# Check if uv is available
if (-not (Get-Command uv -ErrorAction SilentlyContinue)) {
    Write-Error "uv not found. Please install uv: https://github.com/astral-sh/uv"
    exit 1
}

# Get the script directory
$ScriptDir = Split-Path -Parent $MyInvocation.MyCommand.Path
$PythonScript = Join-Path $ScriptDir "run_pytest_with_sync.py"

# Run the Python script using uv
uv run python $PythonScript $TestType

exit $LASTEXITCODE
.pre-commit-hooks/run_pytest_with_sync.py ADDED
@@ -0,0 +1,235 @@
#!/usr/bin/env python3
"""Cross-platform pytest runner that syncs dependencies before running tests."""

import shutil
import subprocess
import sys
from pathlib import Path


def clean_caches(project_root: Path) -> None:
    """Remove pytest and Python cache directories and files.

    Comprehensively removes all cache files and directories to ensure
    clean test runs. Only scans specific directories to avoid resource
    exhaustion from scanning large directories like .venv on Windows.
    """
    # Directories to scan for caches (only project code, not dependencies)
    scan_dirs = ["src", "tests", ".pre-commit-hooks"]

    # Directories to exclude (to avoid resource issues)
    exclude_dirs = {
        ".venv",
        "venv",
        "ENV",
        "env",
        ".git",
        "node_modules",
        "dist",
        "build",
        ".eggs",
        "reference_repos",
        "folder",
    }

    # Comprehensive list of cache patterns to remove
    cache_patterns = [
        ".pytest_cache",
        "__pycache__",
        "*.pyc",
        "*.pyo",
        "*.pyd",
        ".mypy_cache",
        ".ruff_cache",
        ".coverage",
        "coverage.xml",
        "htmlcov",
        ".hypothesis",  # Hypothesis testing framework cache
        ".tox",  # Tox cache (if used)
        ".cache",  # General Python cache
    ]

    def should_exclude(path: Path) -> bool:
        """Check if a path should be excluded from cache cleanup."""
        # Check if any parent directory is in exclude list
        for parent in path.parents:
            if parent.name in exclude_dirs:
                return True
        # Check if the path itself is excluded
        if path.name in exclude_dirs:
            return True
        return False

    cleaned = []

    # Only scan specific directories to avoid resource exhaustion
    for scan_dir in scan_dirs:
        scan_path = project_root / scan_dir
        if not scan_path.exists():
            continue

        for pattern in cache_patterns:
            if "*" in pattern:
                # Handle glob patterns for files
                try:
                    for cache_file in scan_path.rglob(pattern):
                        if should_exclude(cache_file):
                            continue
                        try:
                            if cache_file.is_file():
                                cache_file.unlink()
                                cleaned.append(str(cache_file.relative_to(project_root)))
                        except OSError:
                            pass  # Ignore errors (file might be locked or already deleted)
                except OSError:
                    pass  # Ignore errors during directory traversal
            else:
                # Handle directory patterns
                try:
                    for cache_dir in scan_path.rglob(pattern):
                        if should_exclude(cache_dir):
                            continue
                        try:
                            if cache_dir.is_dir():
                                shutil.rmtree(cache_dir, ignore_errors=True)
                                cleaned.append(str(cache_dir.relative_to(project_root)))
                        except OSError:
                            pass  # Ignore errors (directory might be locked)
                except OSError:
                    pass  # Ignore errors during directory traversal

    # Also clean root-level caches (like .pytest_cache in project root)
    root_cache_patterns = [
        ".pytest_cache",
        ".mypy_cache",
        ".ruff_cache",
        ".coverage",
        "coverage.xml",
        "htmlcov",
        ".hypothesis",
        ".tox",
        ".cache",
        ".pytest",
    ]
    for pattern in root_cache_patterns:
        cache_path = project_root / pattern
        if cache_path.exists():
            try:
                if cache_path.is_dir():
                    shutil.rmtree(cache_path, ignore_errors=True)
                elif cache_path.is_file():
                    cache_path.unlink()
                cleaned.append(pattern)
            except OSError:
                pass

    # Also remove any .pyc files in root directory
    try:
        for pyc_file in project_root.glob("*.pyc"):
            try:
                pyc_file.unlink()
                cleaned.append(pyc_file.name)
            except OSError:
                pass
    except OSError:
        pass

    if cleaned:
        print(
            f"Cleaned {len(cleaned)} cache items: {', '.join(cleaned[:10])}{'...' if len(cleaned) > 10 else ''}"
        )
    else:
        print("No cache files found to clean")


def run_command(
    cmd: list[str], check: bool = True, shell: bool = False, cwd: str | None = None
) -> int:
    """Run a command and return exit code."""
    try:
        result = subprocess.run(
            cmd,
            check=check,
            shell=shell,
            cwd=cwd,
            env=None,  # Use current environment, uv will handle venv
        )
        return result.returncode
    except subprocess.CalledProcessError as e:
        return e.returncode
    except FileNotFoundError:
        print(f"Error: Command not found: {cmd[0]}")
        return 1


def main() -> int:
    """Main entry point."""
    import os

    # Get the project root (where pyproject.toml is)
    script_dir = Path(__file__).parent
    project_root = script_dir.parent

    # Change to project root to ensure uv works correctly
    os.chdir(project_root)

    # Clean caches before running tests
    print("Cleaning pytest and Python caches...")
    clean_caches(project_root)

    # Check if uv is available
    if run_command(["uv", "--version"], check=False) != 0:
        print("Error: uv not found. Please install uv: https://github.com/astral-sh/uv")
        return 1

    # Parse arguments
    test_type = sys.argv[1] if len(sys.argv) > 1 else "unit"
    extra_args = sys.argv[2:] if len(sys.argv) > 2 else []

    # Sync dependencies - always include dev
    # Note: embeddings dependencies are now in main dependencies, not optional
    # Use --extra dev for [project.optional-dependencies].dev (not --dev which is for [dependency-groups])
    sync_cmd = ["uv", "sync", "--extra", "dev"]

    print(f"Syncing dependencies for {test_type} tests...")
    if run_command(sync_cmd, cwd=project_root) != 0:
        return 1

    # Build pytest command - use uv run to ensure correct environment
    if test_type == "unit":
        pytest_args = [
            "tests/unit/",
            "-v",
            "-m",
            "not openai and not embedding_provider",
            "--tb=short",
            "-p",
            "no:logfire",
            "--cache-clear",  # Clear pytest cache before running
        ]
    elif test_type == "embeddings":
        pytest_args = [
            "tests/",
            "-v",
            "-m",
            "local_embeddings",
            "--tb=short",
            "-p",
            "no:logfire",
            "--cache-clear",  # Clear pytest cache before running
        ]
    else:
        pytest_args = []

    pytest_args.extend(extra_args)

    # Use uv run python -m pytest to ensure we use the venv's pytest
    # This is more reliable than uv run pytest which might find system pytest
    pytest_cmd = ["uv", "run", "python", "-m", "pytest", *pytest_args]

    print(f"Running {test_type} tests...")
    return run_command(pytest_cmd, cwd=project_root)


if __name__ == "__main__":
    sys.exit(main())
README.md CHANGED
@@ -1,8 +1,8 @@
  ---
- title: DeepCritical
- emoji: 🧬
- colorFrom: blue
- colorTo: purple
+ title: Critical Deep Research
+ emoji: 🐉
+ colorFrom: red
+ colorTo: yellow
  sdk: gradio
  sdk_version: "6.0.1"
  python_version: "3.11"
@@ -23,178 +23,98 @@ tags:
  - modal
  ---

+ > [!IMPORTANT]
+ > **You are reading the Gradio Demo README!**
+ >
+ > - 📚 **Documentation**: See our [technical documentation](https://deepcritical.github.io/GradioDemo/) for detailed information
+ > - 📖 **Complete README**: Check out the [full README](.github/README.md) for setup, configuration, and contribution guidelines
+ > - 🏆 **Hackathon Submission**: Keep reading below for more information about our MCP Hackathon submission
+
+ <div align="center">
+
+ [![GitHub](https://img.shields.io/github/stars/DeepCritical/GradioDemo?style=for-the-badge&logo=github&logoColor=white&label=🐙%20GitHub&labelColor=181717&color=181717)](https://github.com/DeepCritical/GradioDemo)
+ [![Documentation](https://img.shields.io/badge/📚%20Docs-0080FF?style=for-the-badge&logo=readthedocs&logoColor=white&labelColor=0080FF&color=0080FF)](https://deepcritical.github.io/GradioDemo/)
+ [![Demo](https://img.shields.io/badge/🚀%20Demo-FFD21E?style=for-the-badge&logo=huggingface&logoColor=white&labelColor=FFD21E&color=FFD21E)](https://huggingface.co/spaces/DataQuests/DeepCritical)
+ [![codecov](https://codecov.io/gh/DeepCritical/GradioDemo/graph/badge.svg?token=B1f05RCGpz)](https://codecov.io/gh/DeepCritical/GradioDemo)
+ [![Join us on Discord](https://img.shields.io/discord/1109943800132010065?label=Discord&logo=discord&style=flat-square)](https://discord.gg/qdfnvSPcqP)
+
+ </div>
+
  # DeepCritical

- ## Intro
-
- ## Features
-
- - **Multi-Source Search**: PubMed, ClinicalTrials.gov, bioRxiv/medRxiv
- - **MCP Integration**: Use our tools from Claude Desktop or any MCP client
- - **Modal Sandbox**: Secure execution of AI-generated statistical code
- - **LlamaIndex RAG**: Semantic search and evidence synthesis
- - **HuggingfaceInference**:
- - **HuggingfaceMCP Custom Config To Use Community Tools**:
- - **Strongly Typed Composable Graphs**:
- - **Specialized Research Teams of Agents**:
-
- ## Quick Start
-
- ### 1. Environment Setup
-
- ```bash
- # Install uv if you haven't already
- pip install uv
-
- # Sync dependencies
- uv sync
- ```
-
- ### 2. Run the UI
-
- ```bash
- # Start the Gradio app
- uv run gradio run src/app.py
- ```
-
- Open your browser to `http://localhost:7860`.
-
- ### 3. Connect via MCP
-
- This application exposes a Model Context Protocol (MCP) server, allowing you to use its search tools directly from Claude Desktop or other MCP clients.
-
- **MCP Server URL**: `http://localhost:7860/gradio_api/mcp/`
-
- **Claude Desktop Configuration**:
- Add this to your `claude_desktop_config.json`:
- ```json
- {
- "mcpServers": {
- "deepcritical": {
- "url": "http://localhost:7860/gradio_api/mcp/"
- }
- }
- }
- ```
-
- **Available Tools**:
- - `search_pubmed`: Search peer-reviewed biomedical literature.
- - `search_clinical_trials`: Search ClinicalTrials.gov.
- - `search_biorxiv`: Search bioRxiv/medRxiv preprints.
- - `search_all`: Search all sources simultaneously.
- - `analyze_hypothesis`: Secure statistical analysis using Modal sandboxes.
-
-
- ## Architecture
-
- DeepCritical uses a Vertical Slice Architecture:
-
- 1. **Search Slice**: Retrieving evidence from PubMed, ClinicalTrials.gov, and bioRxiv.
- 2. **Judge Slice**: Evaluating evidence quality using LLMs.
- 3. **Orchestrator Slice**: Managing the research loop and UI.
-
- - iterativeResearch
- - deepResearch
- - researchTeam
-
- ### Iterative Research
-
- sequenceDiagram
- participant IterativeFlow
- participant ThinkingAgent
- participant KnowledgeGapAgent
- participant ToolSelector
- participant ToolExecutor
- participant JudgeHandler
- participant WriterAgent
-
- IterativeFlow->>IterativeFlow: run(query)
-
- loop Until complete or max_iterations
- IterativeFlow->>ThinkingAgent: generate_observations()
- ThinkingAgent-->>IterativeFlow: observations
-
- IterativeFlow->>KnowledgeGapAgent: evaluate_gaps()
- KnowledgeGapAgent-->>IterativeFlow: KnowledgeGapOutput
-
- alt Research complete
- IterativeFlow->>WriterAgent: create_final_report()
- WriterAgent-->>IterativeFlow: final_report
- else Gaps remain
- IterativeFlow->>ToolSelector: select_agents(gap)
- ToolSelector-->>IterativeFlow: AgentSelectionPlan
-
- IterativeFlow->>ToolExecutor: execute_tool_tasks()
- ToolExecutor-->>IterativeFlow: ToolAgentOutput[]
-
- IterativeFlow->>JudgeHandler: assess_evidence()
- JudgeHandler-->>IterativeFlow: should_continue
- end
- end
-
-
- ### Deep Research
-
- sequenceDiagram
- actor User
- participant GraphOrchestrator
- participant InputParser
- participant GraphBuilder
- participant GraphExecutor
- participant Agent
- participant BudgetTracker
- participant WorkflowState
-
- User->>GraphOrchestrator: run(query)
- GraphOrchestrator->>InputParser: detect_research_mode(query)
- InputParser-->>GraphOrchestrator: mode (iterative/deep)
- GraphOrchestrator->>GraphBuilder: build_graph(mode)
- GraphBuilder-->>GraphOrchestrator: ResearchGraph
- GraphOrchestrator->>WorkflowState: init_workflow_state()
- GraphOrchestrator->>BudgetTracker: create_budget()
- GraphOrchestrator->>GraphExecutor: _execute_graph(graph)
-
- loop For each node in graph
- GraphExecutor->>Agent: execute_node(agent_node)
- Agent->>Agent: process_input
- Agent-->>GraphExecutor: result
- GraphExecutor->>WorkflowState: update_state(result)
- GraphExecutor->>BudgetTracker: add_tokens(used)
- GraphExecutor->>BudgetTracker: check_budget()
- alt Budget exceeded
- GraphExecutor->>GraphOrchestrator: emit(error_event)
- else Continue
- GraphExecutor->>GraphOrchestrator: emit(progress_event)
- end
- end
-
- GraphOrchestrator->>User: AsyncGenerator[AgentEvent]
-
- ### Research Team
-
- Critical Deep Research Agent
-
- ## Development
-
- ### Run Tests
-
- ```bash
- uv run pytest
- ```
-
- ### Run Checks
-
- ```bash
- make check
- ```
-
- ## Join Us
-
- - The-Obstacle-Is-The-Way
+ ## About
+
+ The [Deep Critical Gradio Hackathon Team](#team) met online in the Alzheimer's Critical Literature Review Group in the Hugging Science initiative. We're building the agent framework we want to use for AI-assisted research, to [turn the vast amounts of clinical data into cures](https://github.com/DeepCritical/GradioDemo).
+
+ For this hackathon we're proposing a simple yet powerful Deep Research Agent that iteratively searches for the answer until it finds it, using general-purpose web search and special-purpose retrievers for technical sources.
+
+ ## Deep Critical in the Media
+
+ - Social media posts about Deep Critical:
+
+ ## Important information
+
+ - **[readme](.github/README.md)**: configure, deploy, contribute, and learn more here.
+ - **[docs](https://deepcritical.github.io/GradioDemo/)**: want to know how all this works? Read our detailed technical documentation here.
+ - **[demo](https://huggingface.co/spaces/DataQuests/DeepCritical)**: Try our demo on Hugging Face
+ - **[team](#team)**: Join us, or follow us!
+ - **[video]**: See our demo video
+
+ ## Future Developments
+
+ - [ ] Apply Deep Research Systems To Generate Short Form Video (up to 5 minutes)
+ - [ ] Visualize Pydantic Graphs as Loading Screens in the UI
+ - [ ] Improve Data Science with more Complex Graph Agents
+ - [ ] Create Deep Critical Drug Repurposing / Discovery Demo
+ - [ ] Create Deep Critical Literature Review
+ - [ ] Create Deep Critical Hypothesis Generator
+ - [ ] Create PyPI Package
+
+ ## Completed
+
+ - [x] **Multi-Source Search**: PubMed, ClinicalTrials.gov, bioRxiv/medRxiv
+ - [x] **MCP Integration**: Use our tools from Claude Desktop or any MCP client
+ - [x] **HuggingFace OAuth**: Sign in with HuggingFace
+ - [x] **Modal Sandbox**: Secure execution of AI-generated statistical code
+ - [x] **LlamaIndex RAG**: Semantic search and evidence synthesis
+ - [x] **HuggingfaceInference**:
+ - [x] **HuggingfaceMCP Custom Config To Use Community Tools**:
+ - [x] **Strongly Typed Composable Graphs**:
+ - [x] **Specialized Research Teams of Agents**:
+
+ ### Team
+
+ - ZJ
  - MarioAderman
  - Josephrp

+ ## Acknowledgements
+
+ - McSwaggins
+ - Magentic
+ - Huggingface
+ - Gradio
+ - DeepCritical
+ - Sponsors
+ - Microsoft
+ - Pydantic
+ - Llama-index
+ - Anthropic/MCP
+ - List of Tools Makers
+
+
  ## Links

- - [GitHub Repository](https://github.com/The-Obstacle-Is-The-Way/DeepCritical-1)
+ [![GitHub](https://img.shields.io/github/stars/DeepCritical/GradioDemo?style=for-the-badge&logo=github&logoColor=white&label=🐙%20GitHub&labelColor=181717&color=181717)](https://github.com/DeepCritical/GradioDemo)
+ [![Documentation](https://img.shields.io/badge/📚%20Docs-0080FF?style=for-the-badge&logo=readthedocs&logoColor=white&labelColor=0080FF&color=0080FF)](https://deepcritical.github.io/GradioDemo/)
+ [![Demo](https://img.shields.io/badge/🚀%20Demo-FFD21E?style=for-the-badge&logo=huggingface&logoColor=white&labelColor=FFD21E&color=FFD21E)](https://huggingface.co/spaces/DataQuests/DeepCritical)
+ [![codecov](https://codecov.io/gh/DeepCritical/GradioDemo/graph/badge.svg?token=B1f05RCGpz)](https://codecov.io/gh/DeepCritical/GradioDemo)
+ [![Join us on Discord](https://img.shields.io/discord/1109943800132010065?label=Discord&logo=discord&style=flat-square)](https://discord.gg/qdfnvSPcqP)
dev/.cursorrules ADDED
@@ -0,0 +1,241 @@
# DeepCritical Project - Cursor Rules

## Project-Wide Rules

**Architecture**: Multi-agent research system using Pydantic AI for agent orchestration, supporting iterative and deep research patterns. Uses middleware for state management, budget tracking, and workflow coordination.

**Type Safety**: ALWAYS use complete type hints. All functions must have parameter and return type annotations. Use `mypy --strict` compliance. Use `TYPE_CHECKING` imports for circular dependencies: `from typing import TYPE_CHECKING; if TYPE_CHECKING: from src.services.embeddings import EmbeddingService`

**Async Patterns**: ALL I/O operations must be async (`async def`, `await`). Use `asyncio.gather()` for parallel operations. CPU-bound work must use `run_in_executor()`: `loop = asyncio.get_running_loop(); result = await loop.run_in_executor(None, cpu_bound_function, args)`. Never block the event loop.
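As an illustration of this rule, a minimal self-contained sketch (the function names here are hypothetical, not from the codebase):

```python
import asyncio
import hashlib


def expensive_digest(data: bytes) -> str:
    # CPU-bound work: must not run on the event loop directly.
    return hashlib.sha256(data * 10_000).hexdigest()


async def main() -> None:
    loop = asyncio.get_running_loop()
    # Offload CPU-bound work so the event loop stays responsive.
    digest = await loop.run_in_executor(None, expensive_digest, b"evidence")
    # Run independent awaitables concurrently with gather().
    results = await asyncio.gather(
        asyncio.sleep(0.1, result="search-1"),
        asyncio.sleep(0.1, result="search-2"),
    )
    print(digest[:12], results)


if __name__ == "__main__":
    asyncio.run(main())
```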

**Error Handling**: Use custom exceptions from `src/utils/exceptions.py`: `DeepCriticalError`, `SearchError`, `RateLimitError`, `JudgeError`, `ConfigurationError`. Always chain exceptions: `raise SearchError(...) from e`. Log with structlog: `logger.error("Operation failed", error=str(e), context=value)`.
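A sketch of the chaining-plus-logging pattern; the exception classes below are stand-ins for the real ones in `src/utils/exceptions.py` so the snippet stays self-contained:

```python
import structlog

logger = structlog.get_logger()


class DeepCriticalError(Exception):
    """Stand-in for the base class in src/utils/exceptions.py."""


class SearchError(DeepCriticalError):
    """Stand-in for the repo's SearchError."""


def fetch(query: str) -> str:
    try:
        raise TimeoutError("upstream timed out")  # simulated I/O failure
    except TimeoutError as e:
        # Structured log with context, then chain so the traceback keeps its cause.
        logger.error("Operation failed", error=str(e), query=query)
        raise SearchError(f"search failed for {query!r}") from e
```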

**Logging**: Use `structlog` for ALL logging (NOT `print` or `logging`). Import: `import structlog; logger = structlog.get_logger()`. Log with structured data: `logger.info("event", key=value)`. Use appropriate levels: DEBUG, INFO, WARNING, ERROR.

**Pydantic Models**: All data exchange uses Pydantic models from `src/utils/models.py`. Models are frozen (`model_config = {"frozen": True}`) for immutability. Use `Field()` with descriptions. Validate with `ge=`, `le=`, `min_length=`, `max_length=` constraints.
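For example, a frozen model in the style this rule mandates (the field names are illustrative, not the actual `Evidence` schema):

```python
from pydantic import BaseModel, Field


class Evidence(BaseModel):
    """Illustrative frozen model with Field() descriptions and constraints."""

    model_config = {"frozen": True}

    title: str = Field(min_length=1, description="Source title")
    url: str = Field(description="Canonical URL of the source")
    relevance: float = Field(ge=0.0, le=1.0, description="Relevance score")


e = Evidence(title="Example", url="https://example.org", relevance=0.9)
# e.relevance = 0.1  # would raise a ValidationError: the model is frozen
```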

**Code Style**: Ruff with 100-char line length. Ignore rules: `PLR0913` (too many arguments), `PLR0912` (too many branches), `PLR0911` (too many returns), `PLR2004` (magic values), `PLW0603` (global statement), `PLC0415` (lazy imports).

**Docstrings**: Google-style docstrings for all public functions. Include Args, Returns, Raises sections. Use type hints in docstrings only if needed for clarity.

**Testing**: Unit tests in `tests/unit/` (mocked, fast). Integration tests in `tests/integration/` (real APIs, marked `@pytest.mark.integration`). Use `respx` for httpx mocking, `pytest-mock` for general mocking.

**State Management**: Use `ContextVar` in middleware for thread-safe isolation. Never use global mutable state (except singletons via `@lru_cache`). Use `WorkflowState` from `src/middleware/state_machine.py` for workflow state.
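A minimal sketch of the `ContextVar` pattern with auto-initialization; `WorkflowState` is simplified here, not the real class from `src/middleware/state_machine.py`:

```python
from contextvars import ContextVar
from dataclasses import dataclass, field


@dataclass
class WorkflowState:
    """Simplified stand-in for the middleware's workflow state."""

    evidence: list[str] = field(default_factory=list)


_state: ContextVar[WorkflowState | None] = ContextVar("workflow_state", default=None)


def get_workflow_state() -> WorkflowState:
    # Auto-initialize on first access, per the rule above; each asyncio
    # task or thread sees its own value, giving thread-safe isolation.
    state = _state.get()
    if state is None:
        state = WorkflowState()
        _state.set(state)
    return state
```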

**Citation Validation**: ALWAYS validate references before returning reports. Use `validate_references()` from `src/utils/citation_validator.py`. Remove hallucinated citations. Log warnings for removed citations.

---

## src/agents/ - Agent Implementation Rules

**Pattern**: All agents use Pydantic AI `Agent` class. Agents have structured output types (Pydantic models) or return strings. Use factory functions in `src/agent_factory/agents.py` for creation.

**Agent Structure**:
- System prompt as module-level constant (with date injection: `datetime.now().strftime("%Y-%m-%d")`)
- Agent class with `__init__(model: Any | None = None)`
- Main method (e.g., `async def evaluate()`, `async def write_report()`)
- Factory function: `def create_agent_name(model: Any | None = None) -> AgentName`
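A skeleton of that four-part structure; `ExampleAgent` is hypothetical and the Pydantic AI internals are elided, so this shows only the shape the rule prescribes:

```python
from datetime import datetime
from typing import Any

# System prompt as a module-level constant, with date injection.
SYSTEM_PROMPT = f"You are a research agent. Today is {datetime.now().strftime('%Y-%m-%d')}."


class ExampleAgent:
    """Hypothetical agent following the structure above."""

    def __init__(self, model: Any | None = None) -> None:
        # In the real code, get_model() from src/agent_factory/judges.py
        # supplies a default model when none is provided.
        self.model = model

    async def evaluate(self, query: str) -> str:
        # Real agents delegate to a Pydantic AI Agent here and return a
        # structured output model or a string.
        return f"evaluated: {query}"


def create_example_agent(model: Any | None = None) -> ExampleAgent:
    return ExampleAgent(model)
```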

**Model Initialization**: Use `get_model()` from `src/agent_factory/judges.py` if no model provided. Support OpenAI/Anthropic/HF Inference via settings.

**Error Handling**: Return fallback values (e.g., `KnowledgeGapOutput(research_complete=False, outstanding_gaps=[...])`) on failure. Log errors with context. Use retry logic (3 retries) in Pydantic AI Agent initialization.

**Input Validation**: Validate query/inputs are not empty. Truncate very long inputs with warnings. Handle None values gracefully.

**Output Types**: Use structured output types from `src/utils/models.py` (e.g., `KnowledgeGapOutput`, `AgentSelectionPlan`, `ReportDraft`). For text output (writer agents), return `str` directly.

**Agent-Specific Rules**:
- `knowledge_gap.py`: Outputs `KnowledgeGapOutput`. Evaluates research completeness.
- `tool_selector.py`: Outputs `AgentSelectionPlan`. Selects tools (RAG/web/database).
- `writer.py`: Returns markdown string. Includes citations in numbered format.
- `long_writer.py`: Uses `ReportDraft` input/output. Handles section-by-section writing.
- `proofreader.py`: Takes `ReportDraft`, returns polished markdown.
- `thinking.py`: Returns observation string from conversation history.
- `input_parser.py`: Outputs `ParsedQuery` with research mode detection.

---

## src/tools/ - Search Tool Rules

**Protocol**: All tools implement `SearchTool` protocol from `src/tools/base.py`: `name` property and `async def search(query, max_results) -> list[Evidence]`.
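A sketch of that protocol using `typing.Protocol`; the `Evidence` class is a stand-in for the real model in `src/utils/models.py`:

```python
from typing import Protocol


class Evidence:
    """Stand-in for the Evidence model in src/utils/models.py."""


class SearchTool(Protocol):
    """Structural type every search tool must satisfy."""

    @property
    def name(self) -> str: ...

    async def search(self, query: str, max_results: int) -> list[Evidence]: ...
```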

**Rate Limiting**: Use `@retry` decorator from tenacity: `@retry(stop=stop_after_attempt(3), wait=wait_exponential(...))`. Implement `_rate_limit()` method for APIs with limits. Use shared rate limiters from `src/tools/rate_limiter.py`.
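A minimal sketch of the tenacity decorator on an async call; `fetch_page` is illustrative, and the exact `wait_exponential` parameters are assumptions rather than the repo's values:

```python
import asyncio

from tenacity import retry, stop_after_attempt, wait_exponential


@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=1, max=10))
async def fetch_page(url: str) -> str:
    # The real tools wrap an HTTP request here; tenacity re-invokes the
    # coroutine with exponential backoff when it raises.
    await asyncio.sleep(0.01)
    return f"payload from {url}"
```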

**Error Handling**: Raise `SearchError` or `RateLimitError` on failures. Handle HTTP errors (429, 500, timeout). Return empty list on non-critical errors (log warning).

**Query Preprocessing**: Use `preprocess_query()` from `src/tools/query_utils.py` to remove noise and expand synonyms.

**Evidence Conversion**: Convert API responses to `Evidence` objects with `Citation`. Extract metadata (title, url, date, authors). Set relevance scores (0.0-1.0). Handle missing fields gracefully.

**Tool-Specific Rules**:
- `pubmed.py`: Use NCBI E-utilities (ESearch → EFetch). Rate limit: 0.34s between requests. Parse XML with `xmltodict`. Handle single vs. multiple articles.
- `clinicaltrials.py`: Use `requests` library (NOT httpx - WAF blocks httpx). Run in thread pool: `await asyncio.to_thread(requests.get, ...)`. Filter: Only interventional studies, active/completed.
- `europepmc.py`: Handle preprint markers: `[PREPRINT - Not peer-reviewed]`. Build URLs from DOI or PMID.
- `rag_tool.py`: Wraps `LlamaIndexRAGService`. Returns Evidence from RAG results. Handles ingestion.
- `search_handler.py`: Orchestrates parallel searches across multiple tools. Uses `asyncio.gather()` with `return_exceptions=True`. Aggregates results into `SearchResult` (see the sketch after this list).
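A sketch of that fan-out pattern, where one failing tool must not sink the rest; the function name and `print` fallback are illustrative (the real handler logs via structlog and builds a `SearchResult`):

```python
import asyncio
from typing import Any


async def search_all(tools: list[Any], query: str, max_results: int = 10) -> list[Any]:
    """Run every tool concurrently and keep going past individual failures."""
    outcomes = await asyncio.gather(
        *(tool.search(query, max_results) for tool in tools),
        return_exceptions=True,  # failures come back as exception objects
    )
    evidence: list[Any] = []
    for tool, outcome in zip(tools, outcomes):
        if isinstance(outcome, Exception):
            print(f"{tool.name} failed: {outcome}")  # real code: structlog warning
            continue
        evidence.extend(outcome)
    return evidence
```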

---

## src/middleware/ - Middleware Rules

**State Management**: Use `ContextVar` for thread-safe isolation. `WorkflowState` uses `ContextVar[WorkflowState | None]`. Initialize with `init_workflow_state(embedding_service)`. Access with `get_workflow_state()` (auto-initializes if missing).

**WorkflowState**: Tracks `evidence: list[Evidence]`, `conversation: Conversation`, `embedding_service: Any`. Methods: `add_evidence()` (deduplicates by URL), `async search_related()` (semantic search).

**WorkflowManager**: Manages parallel research loops. Methods: `add_loop()`, `run_loops_parallel()`, `update_loop_status()`, `sync_loop_evidence_to_state()`. Uses `asyncio.gather()` for parallel execution. Handles errors per loop (don't fail all if one fails).

**BudgetTracker**: Tracks tokens, time, iterations per loop and globally. Methods: `create_budget()`, `add_tokens()`, `start_timer()`, `update_timer()`, `increment_iteration()`, `check_budget()`, `can_continue()`. Token estimation: `estimate_tokens(text)` (~4 chars per token), `estimate_llm_call_tokens(prompt, response)`.
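The token-estimation heuristic above reduces to a couple of lines; a sketch assuming the ~4-characters-per-token rule stated here (the `max(1, ...)` floor is an assumption):

```python
def estimate_tokens(text: str) -> int:
    """Heuristic from the rule above: roughly 4 characters per token."""
    return max(1, len(text) // 4)


def estimate_llm_call_tokens(prompt: str, response: str) -> int:
    return estimate_tokens(prompt) + estimate_tokens(response)
```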

**Models**: All middleware models in `src/utils/models.py`. `IterationData`, `Conversation`, `ResearchLoop`, `BudgetStatus` are used by middleware.

---

## src/orchestrator/ - Orchestration Rules

**Research Flows**: Two patterns: `IterativeResearchFlow` (single loop) and `DeepResearchFlow` (plan → parallel loops → synthesis). Both support agent chains (`use_graph=False`) and graph execution (`use_graph=True`).

**IterativeResearchFlow**: Pattern: Generate observations → Evaluate gaps → Select tools → Execute → Judge → Continue/Complete. Uses `KnowledgeGapAgent`, `ToolSelectorAgent`, `ThinkingAgent`, `WriterAgent`, `JudgeHandler`. Tracks iterations, time, budget.

**DeepResearchFlow**: Pattern: Planner → Parallel iterative loops per section → Synthesizer. Uses `PlannerAgent`, `IterativeResearchFlow` (per section), `LongWriterAgent` or `ProofreaderAgent`. Uses `WorkflowManager` for parallel execution.

**Graph Orchestrator**: Uses Pydantic AI Graphs (when available) or agent chains (fallback). Routes based on research mode (iterative/deep/auto). Streams `AgentEvent` objects for UI.

**State Initialization**: Always call `init_workflow_state()` before running flows. Initialize `BudgetTracker` per loop. Use `WorkflowManager` for parallel coordination.

**Event Streaming**: Yield `AgentEvent` objects during execution. Event types: "started", "search_complete", "judge_complete", "hypothesizing", "synthesizing", "complete", "error". Include iteration numbers and data payloads.
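A sketch of the async-generator shape this rule implies; `AgentEvent` is simplified here, not the real model:

```python
import asyncio
from collections.abc import AsyncGenerator
from dataclasses import dataclass, field
from typing import Any


@dataclass
class AgentEvent:
    """Simplified stand-in for the event model streamed to the UI."""

    type: str
    iteration: int = 0
    data: dict[str, Any] = field(default_factory=dict)


async def run(query: str) -> AsyncGenerator[AgentEvent, None]:
    yield AgentEvent(type="started", data={"query": query})
    for i in range(1, 3):
        await asyncio.sleep(0)  # real research work happens here
        yield AgentEvent(type="search_complete", iteration=i)
    yield AgentEvent(type="complete")
```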

---

## src/services/ - Service Rules

**EmbeddingService**: Local sentence-transformers (NO API key required). All operations async-safe via `run_in_executor()`. ChromaDB for vector storage. Deduplication threshold: 0.85 (85% similarity = duplicate).

**LlamaIndexRAGService**: Uses OpenAI embeddings (requires `OPENAI_API_KEY`). Methods: `ingest_evidence()`, `retrieve()`, `query()`. Returns documents with metadata (source, title, url, date, authors). Lazy initialization with graceful fallback.

**StatisticalAnalyzer**: Generates Python code via LLM. Executes in Modal sandbox (secure, isolated). Library versions pinned in `SANDBOX_LIBRARIES` dict. Returns `AnalysisResult` with verdict (SUPPORTED/REFUTED/INCONCLUSIVE).

**Singleton Pattern**: Use `@lru_cache(maxsize=1)` for singletons: `@lru_cache(maxsize=1); def get_service() -> Service: return Service()`. Lazy initialization to avoid requiring dependencies at import time.

---

## src/utils/ - Utility Rules

**Models**: All Pydantic models in `src/utils/models.py`. Use frozen models (`model_config = {"frozen": True}`) except where mutation needed. Use `Field()` with descriptions. Validate with constraints.

**Config**: Settings via Pydantic Settings (`src/utils/config.py`). Load from `.env` automatically. Use `settings` singleton: `from src.utils.config import settings`. Validate API keys with properties: `has_openai_key`, `has_anthropic_key`.

**Exceptions**: Custom exception hierarchy in `src/utils/exceptions.py`. Base: `DeepCriticalError`. Specific: `SearchError`, `RateLimitError`, `JudgeError`, `ConfigurationError`. Always chain exceptions.

**LLM Factory**: Centralized LLM model creation in `src/utils/llm_factory.py`. Supports OpenAI, Anthropic, HF Inference. Use `get_model()` or factory functions. Check requirements before initialization.

**Citation Validator**: Use `validate_references()` from `src/utils/citation_validator.py`. Removes hallucinated citations (URLs not in evidence). Logs warnings. Returns validated report string.

---

## src/orchestrator_factory.py Rules

**Purpose**: Factory for creating orchestrators. Supports "simple" (legacy) and "advanced" (magentic) modes. Auto-detects mode based on API key availability.

**Pattern**: Lazy import for optional dependencies (`_get_magentic_orchestrator_class()`). Handles `ImportError` gracefully with clear error messages.

**Mode Detection**: `_determine_mode()` checks explicit mode or auto-detects: "advanced" if `settings.has_openai_key`, else "simple". Maps "magentic" → "advanced".
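The detection logic described above fits in a few lines; a sketch with an assumed signature (the real `_determine_mode()` reads `settings` rather than taking a flag):

```python
def _determine_mode(explicit_mode: str | None, has_openai_key: bool) -> str:
    """Resolve the orchestrator mode, mirroring the rule above."""
    if explicit_mode == "magentic":
        return "advanced"  # alias mapping
    if explicit_mode in ("simple", "advanced"):
        return explicit_mode
    # Auto-detect: advanced needs an OpenAI key, otherwise fall back.
    return "advanced" if has_openai_key else "simple"
```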

**Function Signature**: `create_orchestrator(search_handler, judge_handler, config, mode) -> Any`. Simple mode requires handlers. Advanced mode uses MagenticOrchestrator.

**Error Handling**: Raise `ValueError` with clear messages if requirements not met. Log mode selection with structlog.

---

## src/orchestrator_hierarchical.py Rules

**Purpose**: Hierarchical orchestrator using middleware and sub-teams. Adapts Magentic ChatAgent to SubIterationTeam protocol.

**Pattern**: Uses `SubIterationMiddleware` with `ResearchTeam` and `LLMSubIterationJudge`. Event-driven via callback queue.

**State Initialization**: Initialize embedding service with graceful fallback. Use `init_magentic_state()` (deprecated, but kept for compatibility).

**Event Streaming**: Uses `asyncio.Queue` for event coordination. Yields `AgentEvent` objects. Handles event callback pattern with `asyncio.wait()`.

**Error Handling**: Log errors with context. Yield error events. Process remaining events after task completion.

---

## src/orchestrator_magentic.py Rules

**Purpose**: Magentic-based orchestrator using ChatAgent pattern. Each agent has internal LLM. Manager orchestrates agents.

**Pattern**: Uses `MagenticBuilder` with participants (searcher, hypothesizer, judge, reporter). Manager uses `OpenAIChatClient`. Workflow built in `_build_workflow()`.

**Event Processing**: `_process_event()` converts Magentic events to `AgentEvent`. Handles: `MagenticOrchestratorMessageEvent`, `MagenticAgentMessageEvent`, `MagenticFinalResultEvent`, `MagenticAgentDeltaEvent`, `WorkflowOutputEvent`.

**Text Extraction**: `_extract_text()` defensively extracts text from messages. Priority: `.content` → `.text` → `str(message)`. Handles buggy message objects.

**State Initialization**: Initialize embedding service with graceful fallback. Use `init_magentic_state()` (deprecated).

**Requirements**: Must call `check_magentic_requirements()` in `__init__`. Requires `agent-framework-core` and OpenAI API key.

**Event Types**: Maps agent names to event types: "search" → "search_complete", "judge" → "judge_complete", "hypothes" → "hypothesizing", "report" → "synthesizing".

---

## src/agent_factory/ - Factory Rules

**Pattern**: Factory functions for creating agents and handlers. Lazy initialization for optional dependencies. Support OpenAI/Anthropic/HF Inference.

**Judges**: `create_judge_handler()` creates `JudgeHandler` with structured output (`JudgeAssessment`). Supports `MockJudgeHandler`, `HFInferenceJudgeHandler` as fallbacks.

**Agents**: Factory functions in `agents.py` for all Pydantic AI agents. Pattern: `create_agent_name(model: Any | None = None) -> AgentName`. Use `get_model()` if model not provided.

**Graph Builder**: `graph_builder.py` contains utilities for building research graphs. Supports iterative and deep research graph construction.

**Error Handling**: Raise `ConfigurationError` if required API keys missing. Log agent creation. Handle import errors gracefully.

---

## src/prompts/ - Prompt Rules

**Pattern**: System prompts stored as module-level constants. Include date injection: `datetime.now().strftime("%Y-%m-%d")`. Format evidence with truncation (1500 chars per item).

**Judge Prompts**: In `judge.py`. Handle empty evidence case separately. Always request structured JSON output.

**Hypothesis Prompts**: In `hypothesis.py`. Use diverse evidence selection (MMR algorithm). Sentence-aware truncation.

**Report Prompts**: In `report.py`. Include full citation details. Use diverse evidence selection (n=20). Emphasize citation validation rules.

---

## Testing Rules

**Structure**: Unit tests in `tests/unit/` (mocked, fast). Integration tests in `tests/integration/` (real APIs, marked `@pytest.mark.integration`).

**Mocking**: Use `respx` for httpx mocking. Use `pytest-mock` for general mocking. Mock LLM calls in unit tests (use `MockJudgeHandler`).

**Fixtures**: Common fixtures in `tests/conftest.py`: `mock_httpx_client`, `mock_llm_response`.

**Coverage**: Aim for >80% coverage. Test error handling, edge cases, and integration paths.

---

## File-Specific Agent Rules

**knowledge_gap.py**: Outputs `KnowledgeGapOutput`. System prompt evaluates research completeness. Handles conversation history. Returns fallback on error.

**writer.py**: Returns markdown string. System prompt includes citation format examples. Validates inputs. Truncates long findings. Retry logic for transient failures.

**long_writer.py**: Uses `ReportDraft` input/output. Writes sections iteratively. Reformats references (deduplicates, renumbers). Reformats section headings.

**proofreader.py**: Takes `ReportDraft`, returns polished markdown. Removes duplicates. Adds summary. Preserves references.

**tool_selector.py**: Outputs `AgentSelectionPlan`. System prompt lists available agents (WebSearchAgent, SiteCrawlerAgent, RAGAgent). Guidelines for when to use each.

**thinking.py**: Returns observation string. Generates observations from conversation history. Uses query and background context.

**input_parser.py**: Outputs `ParsedQuery`. Detects research mode (iterative/deep). Extracts entities and research questions. Improves/refines query.
dev/AGENTS.txt ADDED
@@ -0,0 +1,236 @@
1
+ # DeepCritical Project - Rules
2
+
3
+ ## Project-Wide Rules
4
+
5
+ **Architecture**: Multi-agent research system using Pydantic AI for agent orchestration, supporting iterative and deep research patterns. Uses middleware for state management, budget tracking, and workflow coordination.
6
+
7
+ **Type Safety**: ALWAYS use complete type hints. All functions must have parameter and return type annotations. Use `mypy --strict` compliance. Use `TYPE_CHECKING` imports for circular dependencies: `from typing import TYPE_CHECKING; if TYPE_CHECKING: from src.services.embeddings import EmbeddingService`
8
+
9
+ **Async Patterns**: ALL I/O operations must be async (`async def`, `await`). Use `asyncio.gather()` for parallel operations. CPU-bound work must use `run_in_executor()`: `loop = asyncio.get_running_loop(); result = await loop.run_in_executor(None, cpu_bound_function, args)`. Never block the event loop.
10
+
11
+ **Error Handling**: Use custom exceptions from `src/utils/exceptions.py`: `DeepCriticalError`, `SearchError`, `RateLimitError`, `JudgeError`, `ConfigurationError`. Always chain exceptions: `raise SearchError(...) from e`. Log with structlog: `logger.error("Operation failed", error=str(e), context=value)`.
12
+
13
+ **Logging**: Use `structlog` for ALL logging (NOT `print` or `logging`). Import: `import structlog; logger = structlog.get_logger()`. Log with structured data: `logger.info("event", key=value)`. Use appropriate levels: DEBUG, INFO, WARNING, ERROR.
14
+
15
+ **Pydantic Models**: All data exchange uses Pydantic models from `src/utils/models.py`. Models are frozen (`model_config = {"frozen": True}`) for immutability. Use `Field()` with descriptions. Validate with `ge=`, `le=`, `min_length=`, `max_length=` constraints.
16
+
17
+ **Code Style**: Ruff with 100-char line length. Ignore rules: `PLR0913` (too many arguments), `PLR0912` (too many branches), `PLR0911` (too many returns), `PLR2004` (magic values), `PLW0603` (global statement), `PLC0415` (lazy imports).
18
+
19
+ **Docstrings**: Google-style docstrings for all public functions. Include Args, Returns, Raises sections. Use type hints in docstrings only if needed for clarity.
20
+
21
+ **Testing**: Unit tests in `tests/unit/` (mocked, fast). Integration tests in `tests/integration/` (real APIs, marked `@pytest.mark.integration`). Use `respx` for httpx mocking, `pytest-mock` for general mocking.
22
+
23
+ **State Management**: Use `ContextVar` in middleware for thread-safe isolation. Never use global mutable state (except singletons via `@lru_cache`). Use `WorkflowState` from `src/middleware/state_machine.py` for workflow state.
24
+
25
+ **Citation Validation**: ALWAYS validate references before returning reports. Use `validate_references()` from `src/utils/citation_validator.py`. Remove hallucinated citations. Log warnings for removed citations.
26
+
27
+ ---
28
+
29
+ ## src/agents/ - Agent Implementation Rules
30
+
31
+ **Pattern**: All agents use Pydantic AI `Agent` class. Agents have structured output types (Pydantic models) or return strings. Use factory functions in `src/agent_factory/agents.py` for creation.
32
+
33
+ **Agent Structure**:
34
+ - System prompt as module-level constant (with date injection: `datetime.now().strftime("%Y-%m-%d")`)
35
+ - Agent class with `__init__(model: Any | None = None)`
36
+ - Main method (e.g., `async def evaluate()`, `async def write_report()`)
37
+ - Factory function: `def create_agent_name(model: Any | None = None) -> AgentName`
38
+
39
+ **Model Initialization**: Use `get_model()` from `src/agent_factory/judges.py` if no model provided. Support OpenAI/Anthropic/HF Inference via settings.
40
+
41
+ **Error Handling**: Return fallback values (e.g., `KnowledgeGapOutput(research_complete=False, outstanding_gaps=[...])`) on failure. Log errors with context. Use retry logic (3 retries) in Pydantic AI Agent initialization.
42
+
43
+ **Input Validation**: Validate query/inputs are not empty. Truncate very long inputs with warnings. Handle None values gracefully.
44
+
45
+ **Output Types**: Use structured output types from `src/utils/models.py` (e.g., `KnowledgeGapOutput`, `AgentSelectionPlan`, `ReportDraft`). For text output (writer agents), return `str` directly.
46
+
47
+ **Agent-Specific Rules**:
48
+ - `knowledge_gap.py`: Outputs `KnowledgeGapOutput`. Evaluates research completeness.
49
+ - `tool_selector.py`: Outputs `AgentSelectionPlan`. Selects tools (RAG/web/database).
50
+ - `writer.py`: Returns markdown string. Includes citations in numbered format.
51
+ - `long_writer.py`: Uses `ReportDraft` input/output. Handles section-by-section writing.
52
+ - `proofreader.py`: Takes `ReportDraft`, returns polished markdown.
53
+ - `thinking.py`: Returns observation string from conversation history.
54
+ - `input_parser.py`: Outputs `ParsedQuery` with research mode detection.
55
+
56
+ ---
57
+
58
+ ## src/tools/ - Search Tool Rules
59
+
60
+ **Protocol**: All tools implement `SearchTool` protocol from `src/tools/base.py`: `name` property and `async def search(query, max_results) -> list[Evidence]`.
61
+
62
**Rate Limiting**: Use the `@retry` decorator from tenacity: `@retry(stop=stop_after_attempt(3), wait=wait_exponential(...))`. Implement a `_rate_limit()` method for APIs with limits. Use the shared rate limiters from `src/tools/rate_limiter.py`.
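A sketch of both halves on a hypothetical tool class; the 0.34s interval matches the PubMed rule below, and the backoff parameters are illustrative:

```python
import asyncio
import time

from tenacity import retry, stop_after_attempt, wait_exponential


class ExampleTool:
    _MIN_INTERVAL = 0.34  # ~3 requests/second

    def __init__(self) -> None:
        self._last_request = 0.0

    async def _rate_limit(self) -> None:
        # Keep at least _MIN_INTERVAL between consecutive requests.
        wait = self._MIN_INTERVAL - (time.monotonic() - self._last_request)
        if wait > 0:
            await asyncio.sleep(wait)
        self._last_request = time.monotonic()

    @retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=1, max=10))
    async def _fetch(self, url: str) -> str:
        await self._rate_limit()
        ...  # issue the HTTP request here
```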
**Error Handling**: Raise `SearchError` or `RateLimitError` on failure. Handle HTTP errors (429, 500, timeouts). Return an empty list on non-critical errors (log a warning).

**Query Preprocessing**: Use `preprocess_query()` from `src/tools/query_utils.py` to remove noise and expand synonyms.

**Evidence Conversion**: Convert API responses to `Evidence` objects with a `Citation`. Extract metadata (title, url, date, authors). Set relevance scores (0.0-1.0). Handle missing fields gracefully.
**Tool-Specific Rules**:
- `pubmed.py`: Use NCBI E-utilities (ESearch → EFetch). Rate limit: 0.34s between requests. Parse XML with `xmltodict`. Handle single vs. multiple articles.
- `clinicaltrials.py`: Use the `requests` library (NOT httpx - the WAF blocks httpx). Run in a thread pool: `await asyncio.to_thread(requests.get, ...)`. Filter: only interventional studies, active/completed.
- `europepmc.py`: Handle preprint markers: `[PREPRINT - Not peer-reviewed]`. Build URLs from DOI or PMID.
- `rag_tool.py`: Wraps `LlamaIndexRAGService`. Returns Evidence from RAG results. Handles ingestion.
- `search_handler.py`: Orchestrates parallel searches across multiple tools. Uses `asyncio.gather()` with `return_exceptions=True` (see the sketch below). Aggregates results into a `SearchResult`.
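A minimal sketch of that fan-out, with logging reduced to `print` and the tool/evidence types left loose:

```python
import asyncio


async def search_all(tools, query: str, max_results: int = 10) -> list:
    """Run every tool in parallel; one failing tool must not sink the rest."""
    results = await asyncio.gather(
        *(tool.search(query, max_results) for tool in tools),
        return_exceptions=True,
    )
    evidence: list = []
    for tool, result in zip(tools, results):
        if isinstance(result, Exception):
            print(f"{tool.name} failed: {result}")  # the real handler logs a warning
        else:
            evidence.extend(result)
    return evidence
```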
---

## src/middleware/ - Middleware Rules

**State Management**: Use `ContextVar` for thread-safe isolation. `WorkflowState` lives in a `ContextVar[WorkflowState | None]`. Initialize with `init_workflow_state(embedding_service)`. Access with `get_workflow_state()` (auto-initializes if missing).
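The shape of that pattern, sketched with an assumed default constructor for `WorkflowState`:

```python
from contextvars import ContextVar

from src.middleware.state_machine import WorkflowState

_workflow_state: ContextVar[WorkflowState | None] = ContextVar(
    "workflow_state", default=None
)


def get_workflow_state() -> WorkflowState:
    state = _workflow_state.get()
    if state is None:
        state = WorkflowState()  # assumed default construction
        _workflow_state.set(state)
    return state
```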
**WorkflowState**: Tracks `evidence: list[Evidence]`, `conversation: Conversation`, `embedding_service: Any`. Methods: `add_evidence()` (deduplicates by URL), `async search_related()` (semantic search).

**WorkflowManager**: Manages parallel research loops. Methods: `add_loop()`, `run_loops_parallel()`, `update_loop_status()`, `sync_loop_evidence_to_state()`. Uses `asyncio.gather()` for parallel execution. Handles errors per loop (one failing loop must not fail the rest).
**BudgetTracker**: Tracks tokens, time, and iterations per loop and globally. Methods: `create_budget()`, `add_tokens()`, `start_timer()`, `update_timer()`, `increment_iteration()`, `check_budget()`, `can_continue()`. Token estimation: `estimate_tokens(text)` (~4 chars per token) and `estimate_llm_call_tokens(prompt, response)`.
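The estimation helpers are simple heuristics; a sketch consistent with the ~4-chars-per-token rule:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: about 4 characters per token for English text.
    return max(1, len(text) // 4)


def estimate_llm_call_tokens(prompt: str, response: str) -> int:
    return estimate_tokens(prompt) + estimate_tokens(response)
```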
**Models**: All middleware models are in `src/utils/models.py`. `IterationData`, `Conversation`, `ResearchLoop`, `BudgetStatus` are used by middleware.

---
## src/orchestrator/ - Orchestration Rules

**Research Flows**: Two patterns: `IterativeResearchFlow` (single loop) and `DeepResearchFlow` (plan → parallel loops → synthesis). Both support agent chains (`use_graph=False`) and graph execution (`use_graph=True`).

**IterativeResearchFlow**: Pattern: generate observations → evaluate gaps → select tools → execute → judge → continue/complete. Uses `KnowledgeGapAgent`, `ToolSelectorAgent`, `ThinkingAgent`, `WriterAgent`, `JudgeHandler`. Tracks iterations, time, and budget.

**DeepResearchFlow**: Pattern: planner → parallel iterative loops per section → synthesizer. Uses `PlannerAgent`, `IterativeResearchFlow` (per section), and `LongWriterAgent` or `ProofreaderAgent`. Uses `WorkflowManager` for parallel execution.

**Graph Orchestrator**: Uses Pydantic AI Graphs (when available) or agent chains (fallback). Routes based on research mode (iterative/deep/auto). Streams `AgentEvent` objects for the UI.

**State Initialization**: Always call `init_workflow_state()` before running flows. Initialize a `BudgetTracker` per loop. Use `WorkflowManager` for parallel coordination.

**Event Streaming**: Yield `AgentEvent` objects during execution. Event types: "started", "search_complete", "judge_complete", "hypothesizing", "synthesizing", "complete", "error". Include iteration numbers and data payloads.
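A sketch of the streaming shape; the `AgentEvent` constructor fields and import path are assumptions based on the event types listed above:

```python
from collections.abc import AsyncIterator

from src.utils.models import AgentEvent  # assumed import path


async def stream_research(query: str) -> AsyncIterator[AgentEvent]:
    yield AgentEvent(type="started", iteration=0, data={"query": query})
    # ... search, judge, and synthesis steps would yield their own events ...
    yield AgentEvent(type="complete", iteration=1, data={"report": "..."})
```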
---

## src/services/ - Service Rules

**EmbeddingService**: Local sentence-transformers (NO API key required). All operations are async-safe via `run_in_executor()`. ChromaDB for vector storage. Deduplication threshold: 0.85 (85% similarity = duplicate).

**LlamaIndexRAGService**: Uses OpenAI embeddings (requires `OPENAI_API_KEY`). Methods: `ingest_evidence()`, `retrieve()`, `query()`. Returns documents with metadata (source, title, url, date, authors). Lazy initialization with graceful fallback.

**StatisticalAnalyzer**: Generates Python code via LLM. Executes it in a Modal sandbox (secure, isolated). Library versions are pinned in the `SANDBOX_LIBRARIES` dict. Returns an `AnalysisResult` with a verdict (SUPPORTED/REFUTED/INCONCLUSIVE).

**Singleton Pattern**: Use `@lru_cache(maxsize=1)` for singletons, as in the sketch below. Lazy initialization avoids requiring dependencies at import time.
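A minimal sketch (the service name and module path are illustrative):

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def get_embedding_service():
    # Import inside the factory so the dependency is only needed on first use.
    from src.services.embeddings import EmbeddingService  # assumed path

    return EmbeddingService()
```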
---

## src/utils/ - Utility Rules

**Models**: All Pydantic models are in `src/utils/models.py`. Use frozen models (`model_config = {"frozen": True}`) except where mutation is needed. Use `Field()` with descriptions. Validate with constraints.

**Config**: Settings via Pydantic Settings (`src/utils/config.py`). Loads from `.env` automatically. Use the `settings` singleton: `from src.utils.config import settings`. Validate API keys with properties: `has_openai_key`, `has_anthropic_key`.
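Typical call-site usage of those properties (the mode strings mirror the factory rules later in this file):

```python
from src.utils.config import settings

mode = "advanced" if settings.has_openai_key else "simple"
print(f"Selected orchestrator mode: {mode}")
```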
**Exceptions**: Custom exception hierarchy in `src/utils/exceptions.py`. Base: `DeepCriticalError`. Specific: `SearchError`, `RateLimitError`, `JudgeError`, `ConfigurationError`. Always chain exceptions.
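Chaining keeps the original cause in the traceback; a sketch:

```python
import httpx

from src.utils.exceptions import SearchError


async def fetch(url: str) -> httpx.Response:
    try:
        async with httpx.AsyncClient() as client:
            response = await client.get(url)
            response.raise_for_status()
            return response
    except httpx.HTTPError as e:
        # `from e` preserves the underlying HTTP error for debugging.
        raise SearchError(f"search request failed: {url}") from e
```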
**LLM Factory**: Centralized LLM model creation in `src/utils/llm_factory.py`. Supports OpenAI, Anthropic, and HF Inference. Use `get_model()` or the factory functions. Check requirements before initialization.

**Citation Validator**: Use `validate_references()` from `src/utils/citation_validator.py`. Removes hallucinated citations (URLs not present in the evidence). Logs warnings. Returns the validated report string.

---
## src/orchestrator_factory.py Rules

**Purpose**: Factory for creating orchestrators. Supports "simple" (legacy) and "advanced" (magentic) modes. Auto-detects the mode based on API key availability.

**Pattern**: Lazy import for optional dependencies (`_get_magentic_orchestrator_class()`). Handles `ImportError` gracefully with clear error messages.

**Mode Detection**: `_determine_mode()` checks the explicit mode or auto-detects: "advanced" if `settings.has_openai_key`, else "simple". Maps "magentic" → "advanced".

**Function Signature**: `create_orchestrator(search_handler, judge_handler, config, mode) -> Any`. Simple mode requires handlers. Advanced mode uses `MagenticOrchestrator`.

**Error Handling**: Raise `ValueError` with a clear message if requirements are not met. Log the mode selection with structlog.

---
+
149
## src/orchestrator_hierarchical.py Rules

**Purpose**: Hierarchical orchestrator using middleware and sub-teams. Adapts the Magentic `ChatAgent` to the `SubIterationTeam` protocol.

**Pattern**: Uses `SubIterationMiddleware` with `ResearchTeam` and `LLMSubIterationJudge`. Event-driven via a callback queue.

**State Initialization**: Initialize the embedding service with a graceful fallback. Uses `init_magentic_state()` (deprecated, but kept for compatibility).

**Event Streaming**: Uses an `asyncio.Queue` for event coordination. Yields `AgentEvent` objects. Handles the event-callback pattern with `asyncio.wait()`.

**Error Handling**: Log errors with context. Yield error events. Process remaining events after task completion.

---
## src/orchestrator_magentic.py Rules

**Purpose**: Magentic-based orchestrator using the ChatAgent pattern. Each agent has an internal LLM. A manager orchestrates the agents.

**Pattern**: Uses `MagenticBuilder` with participants (searcher, hypothesizer, judge, reporter). The manager uses `OpenAIChatClient`. The workflow is built in `_build_workflow()`.

**Event Processing**: `_process_event()` converts Magentic events to `AgentEvent`. Handles: `MagenticOrchestratorMessageEvent`, `MagenticAgentMessageEvent`, `MagenticFinalResultEvent`, `MagenticAgentDeltaEvent`, `WorkflowOutputEvent`.

**Text Extraction**: `_extract_text()` defensively extracts text from messages. Priority: `.content` → `.text` → `str(message)`. Handles buggy message objects.

**State Initialization**: Initialize the embedding service with a graceful fallback. Uses `init_magentic_state()` (deprecated).

**Requirements**: Must call `check_magentic_requirements()` in `__init__`. Requires `agent-framework-core` and an OpenAI API key.

**Event Types**: Maps agent names to event types: "search" → "search_complete", "judge" → "judge_complete", "hypothes" → "hypothesizing", "report" → "synthesizing".

---
## src/agent_factory/ - Factory Rules

**Pattern**: Factory functions for creating agents and handlers. Lazy initialization for optional dependencies. Support OpenAI/Anthropic/HF Inference.

**Judges**: `create_judge_handler()` creates a `JudgeHandler` with structured output (`JudgeAssessment`). Supports `MockJudgeHandler` and `HFInferenceJudgeHandler` as fallbacks.

**Agents**: Factory functions in `agents.py` for all Pydantic AI agents. Pattern: `create_agent_name(model: Any | None = None) -> AgentName`. Use `get_model()` if no model is provided.

**Graph Builder**: `graph_builder.py` contains utilities for building research graphs. Supports iterative and deep research graph construction.

**Error Handling**: Raise `ConfigurationError` if required API keys are missing. Log agent creation. Handle import errors gracefully.

---
## src/prompts/ - Prompt Rules

**Pattern**: System prompts stored as module-level constants. Include date injection: `datetime.now().strftime("%Y-%m-%d")`. Format evidence with truncation (1500 chars per item).
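A sketch of both conventions; the helper and constant names are illustrative:

```python
from datetime import datetime

EXAMPLE_PROMPT_TEMPLATE = "Today is {today}. Assess the evidence below."


def build_system_prompt() -> str:
    return EXAMPLE_PROMPT_TEMPLATE.format(today=datetime.now().strftime("%Y-%m-%d"))


def format_evidence(items: list[str], limit: int = 1500) -> str:
    # Truncate each evidence item to ~1500 characters, per the rule above.
    return "\n\n".join(item[:limit] for item in items)
```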
**Judge Prompts**: In `judge.py`. Handle the empty-evidence case separately. Always request structured JSON output.

**Hypothesis Prompts**: In `hypothesis.py`. Use diverse evidence selection (MMR algorithm). Sentence-aware truncation.

**Report Prompts**: In `report.py`. Include full citation details. Use diverse evidence selection (n=20). Emphasize the citation validation rules.

---
## Testing Rules

**Structure**: Unit tests in `tests/unit/` (mocked, fast). Integration tests in `tests/integration/` (real APIs, marked `@pytest.mark.integration`).

**Mocking**: Use `respx` for httpx mocking. Use `pytest-mock` for general mocking. Mock LLM calls in unit tests (use `MockJudgeHandler`).
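A respx sketch for an httpx-based tool; the URL and payload are illustrative, and `pytest-asyncio` is assumed for the async test:

```python
import httpx
import pytest
import respx


@respx.mock
@pytest.mark.asyncio
async def test_search_handles_empty_results():
    # Any httpx call to this URL inside the test gets the canned response.
    respx.get("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi").mock(
        return_value=httpx.Response(200, json={"esearchresult": {"idlist": []}})
    )
    # ... call the tool under test and assert it returns an empty Evidence list ...
```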
**Fixtures**: Common fixtures in `tests/conftest.py`: `mock_httpx_client`, `mock_llm_response`.

**Coverage**: Aim for >80% coverage. Test error handling, edge cases, and integration paths.

---
## File-Specific Agent Rules

**knowledge_gap.py**: Outputs `KnowledgeGapOutput`. The system prompt evaluates research completeness. Handles conversation history. Returns a fallback on error.

**writer.py**: Returns a markdown string. The system prompt includes citation format examples. Validates inputs. Truncates long findings. Retry logic for transient failures.

**long_writer.py**: Uses `ReportDraft` input/output. Writes sections iteratively. Reformats references (deduplicates, renumbers). Reformats section headings.

**proofreader.py**: Takes a `ReportDraft`, returns polished markdown. Removes duplicates. Adds a summary. Preserves references.

**tool_selector.py**: Outputs `AgentSelectionPlan`. The system prompt lists the available agents (WebSearchAgent, SiteCrawlerAgent, RAGAgent) and guidelines for when to use each.

**thinking.py**: Returns an observation string. Generates observations from conversation history. Uses the query and background context.

**input_parser.py**: Outputs `ParsedQuery`. Detects the research mode (iterative/deep). Extracts entities and research questions. Improves/refines the query.
dev/Makefile ADDED
@@ -0,0 +1,51 @@
.PHONY: install test test-hf test-all lint format typecheck check clean all cov test-cov cov-html docs-build docs-serve docs-clean

# Default target
all: check

install:
	uv sync --all-extras
	uv run pre-commit install

test:
	uv run pytest tests/unit/ -v -m "not openai" -p no:logfire

test-hf:
	uv run pytest tests/ -v -m "huggingface" -p no:logfire

test-all:
	uv run pytest tests/ -v -p no:logfire

# Coverage aliases
cov: test-cov

test-cov:
	uv run pytest --cov=src --cov-report=term-missing -m "not openai" -p no:logfire

cov-html:
	uv run pytest --cov=src --cov-report=html -p no:logfire
	@echo "Coverage report: open htmlcov/index.html"

lint:
	uv run ruff check src tests

format:
	uv run ruff format src tests

typecheck:
	uv run mypy src

check: lint typecheck test-cov
	@echo "All checks passed!"

docs-build:
	uv run mkdocs build

docs-serve:
	uv run mkdocs serve

docs-clean:
	rm -rf site/

clean:
	rm -rf .pytest_cache .mypy_cache .ruff_cache __pycache__ .coverage htmlcov
	find . -type d -name "__pycache__" -exec rm -rf {} + 2>/dev/null || true
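Typical invocation, assuming `uv` is on PATH:

```bash
make install   # sync all extras and install pre-commit hooks
make check     # lint + typecheck + unit tests with coverage
```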
dev/docs_plugins.py ADDED
@@ -0,0 +1,74 @@
"""Custom MkDocs extension to handle code anchor format: ```start:end:filepath"""

import re
from pathlib import Path

from markdown import Markdown
from markdown.extensions import Extension
from markdown.preprocessors import Preprocessor


class CodeAnchorPreprocessor(Preprocessor):
    """Preprocess code blocks with anchor format: ```start:end:filepath"""

    def __init__(self, md: Markdown, base_path: Path):
        super().__init__(md)
        self.base_path = base_path
        self.pattern = re.compile(r"^```(\d+):(\d+):([^\n]+)\n(.*?)```$", re.MULTILINE | re.DOTALL)

    def run(self, lines: list[str]) -> list[str]:
        """Process lines and convert code anchor format to standard code blocks."""
        text = "\n".join(lines)
        new_text = self.pattern.sub(self._replace_code_anchor, text)
        return new_text.split("\n")

    def _replace_code_anchor(self, match) -> str:
        """Replace code anchor format with standard code block + link."""
        start_line = int(match.group(1))
        end_line = int(match.group(2))
        file_path = match.group(3).strip()
        existing_code = match.group(4)

        # Determine language from file extension
        ext = Path(file_path).suffix.lower()
        lang_map = {
            ".py": "python",
            ".js": "javascript",
            ".ts": "typescript",
            ".md": "markdown",
            ".yaml": "yaml",
            ".yml": "yaml",
            ".toml": "toml",
            ".json": "json",
            ".html": "html",
            ".css": "css",
            ".sh": "bash",
        }
        language = lang_map.get(ext, "python")

        # Generate GitHub link
        repo_url = "https://github.com/DeepCritical/GradioDemo"
        github_link = f"{repo_url}/blob/main/{file_path}#L{start_line}-L{end_line}"

        # Return standard code block with source link
        return (
            f'[View source: `{file_path}` (lines {start_line}-{end_line})]({github_link}){{: target="_blank" }}\n\n'
            f"```{language}\n{existing_code}\n```"
        )


class CodeAnchorExtension(Extension):
    """Markdown extension for code anchors."""

    def __init__(self, base_path: str = ".", **kwargs):
        super().__init__(**kwargs)
        self.base_path = Path(base_path)

    def extendMarkdown(self, md: Markdown):  # noqa: N802
        """Register the preprocessor."""
        md.preprocessors.register(CodeAnchorPreprocessor(md, self.base_path), "codeanchor", 25)


def makeExtension(**kwargs):  # noqa: N802
    """Create the extension."""
    return CodeAnchorExtension(**kwargs)
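A minimal sketch of exercising the extension directly with python-markdown; it assumes the module is importable as `dev.docs_plugins`:

```python
import markdown

from dev.docs_plugins import CodeAnchorExtension

md = markdown.Markdown(extensions=[CodeAnchorExtension(base_path=".")])
# The preprocessor rewrites the anchor into a GitHub source link plus a
# standard fenced block before the rest of the pipeline renders it.
html = md.convert("```10:12:src/app.py\nprint('hello')\n```")
print(html)
```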
docs/CONFIGURATION.md DELETED
@@ -1,301 +0,0 @@
# Configuration Guide

## Overview

DeepCritical uses **Pydantic Settings** for centralized configuration management. All settings are defined in `src/utils/config.py` and can be configured via environment variables or a `.env` file.

## Quick Start

1. Copy the example environment file (if available) or create a `.env` file in the project root
2. Set at least one LLM API key (`OPENAI_API_KEY` or `ANTHROPIC_API_KEY`)
3. Optionally configure other services as needed

## Configuration System

### How It Works

- **Settings Class**: `Settings` class in `src/utils/config.py` extends `BaseSettings` from `pydantic_settings`
- **Environment File**: Automatically loads from `.env` file (if present)
- **Environment Variables**: Reads from environment variables (case-insensitive)
- **Type Safety**: Strongly-typed fields with validation
- **Singleton Pattern**: Global `settings` instance for easy access

### Usage

```python
from src.utils.config import settings

# Check if API keys are available
if settings.has_openai_key:
    # Use OpenAI
    pass

# Access configuration values
max_iterations = settings.max_iterations
web_search_provider = settings.web_search_provider
```

## Required Configuration

### At Least One LLM Provider

You must configure at least one LLM provider:

**OpenAI:**
```bash
LLM_PROVIDER=openai
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_MODEL=gpt-5.1
```

**Anthropic:**
```bash
LLM_PROVIDER=anthropic
ANTHROPIC_API_KEY=your_anthropic_api_key_here
ANTHROPIC_MODEL=claude-sonnet-4-5-20250929
```

## Optional Configuration

### Embedding Configuration

```bash
# Embedding Provider: "openai", "local", or "huggingface"
EMBEDDING_PROVIDER=local

# OpenAI Embedding Model (used by LlamaIndex RAG)
OPENAI_EMBEDDING_MODEL=text-embedding-3-small

# Local Embedding Model (sentence-transformers)
LOCAL_EMBEDDING_MODEL=all-MiniLM-L6-v2

# HuggingFace Embedding Model
HUGGINGFACE_EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
```

### HuggingFace Configuration

```bash
# HuggingFace API Token (for inference API)
HUGGINGFACE_API_KEY=your_huggingface_api_key_here
# Or use HF_TOKEN (alternative name)

# Default HuggingFace Model ID
HUGGINGFACE_MODEL=meta-llama/Llama-3.1-8B-Instruct
```

### Web Search Configuration

```bash
# Web Search Provider: "serper", "searchxng", "brave", "tavily", or "duckduckgo"
# Default: "duckduckgo" (no API key required)
WEB_SEARCH_PROVIDER=duckduckgo

# Serper API Key (for Google search via Serper)
SERPER_API_KEY=your_serper_api_key_here

# SearchXNG Host URL
SEARCHXNG_HOST=http://localhost:8080

# Brave Search API Key
BRAVE_API_KEY=your_brave_api_key_here

# Tavily API Key
TAVILY_API_KEY=your_tavily_api_key_here
```

### PubMed Configuration

```bash
# NCBI API Key (optional, for higher rate limits: 10 req/sec vs 3 req/sec)
NCBI_API_KEY=your_ncbi_api_key_here
```

### Agent Configuration

```bash
# Maximum iterations per research loop
MAX_ITERATIONS=10

# Search timeout in seconds
SEARCH_TIMEOUT=30

# Use graph-based execution for research flows
USE_GRAPH_EXECUTION=false
```

### Budget & Rate Limiting Configuration

```bash
# Default token budget per research loop
DEFAULT_TOKEN_LIMIT=100000

# Default time limit per research loop (minutes)
DEFAULT_TIME_LIMIT_MINUTES=10

# Default iterations limit per research loop
DEFAULT_ITERATIONS_LIMIT=10
```

### RAG Service Configuration

```bash
# ChromaDB collection name for RAG
RAG_COLLECTION_NAME=deepcritical_evidence

# Number of top results to retrieve from RAG
RAG_SIMILARITY_TOP_K=5

# Automatically ingest evidence into RAG
RAG_AUTO_INGEST=true
```

### ChromaDB Configuration

```bash
# ChromaDB storage path
CHROMA_DB_PATH=./chroma_db

# Whether to persist ChromaDB to disk
CHROMA_DB_PERSIST=true

# ChromaDB server host (for remote ChromaDB, optional)
# CHROMA_DB_HOST=localhost

# ChromaDB server port (for remote ChromaDB, optional)
# CHROMA_DB_PORT=8000
```

### External Services

```bash
# Modal Token ID (for Modal sandbox execution)
MODAL_TOKEN_ID=your_modal_token_id_here

# Modal Token Secret
MODAL_TOKEN_SECRET=your_modal_token_secret_here
```

### Logging Configuration

```bash
# Log Level: "DEBUG", "INFO", "WARNING", or "ERROR"
LOG_LEVEL=INFO
```

## Configuration Properties

The `Settings` class provides helpful properties for checking configuration:

```python
from src.utils.config import settings

# Check API key availability
settings.has_openai_key       # bool
settings.has_anthropic_key    # bool
settings.has_huggingface_key  # bool
settings.has_any_llm_key      # bool

# Check service availability
settings.modal_available      # bool
settings.web_search_available # bool
```

## Environment Variables Reference

### Required (at least one LLM)
- `OPENAI_API_KEY` or `ANTHROPIC_API_KEY` - At least one LLM provider key

### Optional LLM Providers
- `DEEPSEEK_API_KEY` (Phase 2)
- `OPENROUTER_API_KEY` (Phase 2)
- `GEMINI_API_KEY` (Phase 2)
- `PERPLEXITY_API_KEY` (Phase 2)
- `HUGGINGFACE_API_KEY` or `HF_TOKEN`
- `AZURE_OPENAI_ENDPOINT` (Phase 2)
- `AZURE_OPENAI_DEPLOYMENT` (Phase 2)
- `AZURE_OPENAI_API_KEY` (Phase 2)
- `AZURE_OPENAI_API_VERSION` (Phase 2)
- `LOCAL_MODEL_URL` (Phase 2)

### Web Search
- `WEB_SEARCH_PROVIDER` (default: "duckduckgo")
- `SERPER_API_KEY`
- `SEARCHXNG_HOST`
- `BRAVE_API_KEY`
- `TAVILY_API_KEY`

### Embeddings
- `EMBEDDING_PROVIDER` (default: "local")
- `HUGGINGFACE_EMBEDDING_MODEL` (optional)

### RAG
- `RAG_COLLECTION_NAME` (default: "deepcritical_evidence")
- `RAG_SIMILARITY_TOP_K` (default: 5)
- `RAG_AUTO_INGEST` (default: true)

### ChromaDB
- `CHROMA_DB_PATH` (default: "./chroma_db")
- `CHROMA_DB_PERSIST` (default: true)
- `CHROMA_DB_HOST` (optional)
- `CHROMA_DB_PORT` (optional)

### Budget
- `DEFAULT_TOKEN_LIMIT` (default: 100000)
- `DEFAULT_TIME_LIMIT_MINUTES` (default: 10)
- `DEFAULT_ITERATIONS_LIMIT` (default: 10)

### Other
- `LLM_PROVIDER` (default: "openai")
- `NCBI_API_KEY` (optional)
- `MODAL_TOKEN_ID` (optional)
- `MODAL_TOKEN_SECRET` (optional)
- `MAX_ITERATIONS` (default: 10)
- `LOG_LEVEL` (default: "INFO")
- `USE_GRAPH_EXECUTION` (default: false)

## Validation

Settings are validated on load using Pydantic validation:

- **Type checking**: All fields are strongly typed
- **Range validation**: Numeric fields have min/max constraints
- **Literal validation**: Enum fields only accept specific values
- **Required fields**: API keys are checked when accessed via `get_api_key()`

## Error Handling

Configuration errors raise `ConfigurationError`:

```python
from src.utils.config import settings
from src.utils.exceptions import ConfigurationError

try:
    api_key = settings.get_api_key()
except ConfigurationError as e:
    print(f"Configuration error: {e}")
```

## Future Enhancements (Phase 2)

The following configurations are planned for Phase 2:

1. **Additional LLM Providers**: DeepSeek, OpenRouter, Gemini, Perplexity, Azure OpenAI, Local models
2. **Model Selection**: Reasoning/main/fast model configuration
3. **Service Integration**: Migrate `folder/llm_config.py` to centralized config

See `CONFIGURATION_ANALYSIS.md` for the complete implementation plan.
docs/api/agents.md CHANGED
@@ -265,6 +265,3 @@ def create_input_parser_agent(model: Any | None = None) -> InputParserAgent
 
 
 
-
-
-
docs/api/models.md CHANGED
@@ -243,6 +243,3 @@ class BudgetStatus(BaseModel):
 
 
 
-
-
-
docs/api/orchestrators.md CHANGED
@@ -190,6 +190,3 @@ Runs Magentic orchestration.
 
 
 
-
-
-
docs/api/services.md CHANGED
@@ -196,6 +196,3 @@ Analyzes a hypothesis using statistical methods.
 
 
 
-
-
-
docs/api/tools.md CHANGED
@@ -230,6 +230,3 @@ Searches multiple tools in parallel.
 
 
 
-
-
-
docs/architecture/agents.md CHANGED
@@ -187,6 +187,3 @@ Factory functions:
 
 
 
-
-
-
docs/architecture/design-patterns.md DELETED
@@ -1,1509 +0,0 @@
# Design Patterns & Technical Decisions
## Explicit Answers to Architecture Questions

---

## Purpose of This Document

This document explicitly answers all the "design pattern" questions raised in team discussions. It provides clear technical decisions with rationale.

---

## 1. Primary Architecture Pattern

### Decision: Orchestrator with Search-Judge Loop

**Pattern Name**: Iterative Research Orchestrator

**Structure**:
```
┌─────────────────────────────────────┐
│      Research Orchestrator          │
│  ┌───────────────────────────────┐  │
│  │   Search Strategy Planner     │  │
│  └───────────────────────────────┘  │
│               ↓                     │
│  ┌───────────────────────────────┐  │
│  │   Tool Coordinator            │  │
│  │   - PubMed Search             │  │
│  │   - Web Search                │  │
│  │   - Clinical Trials           │  │
│  └───────────────────────────────┘  │
│               ↓                     │
│  ┌───────────────────────────────┐  │
│  │   Evidence Aggregator         │  │
│  └───────────────────────────────┘  │
│               ↓                     │
│  ┌───────────────────────────────┐  │
│  │   Quality Judge               │  │
│  │   (LLM-based assessment)      │  │
│  └───────────────────────────────┘  │
│               ↓                     │
│       Loop or Synthesize?           │
│               ↓                     │
│  ┌───────────────────────────────┐  │
│  │   Report Generator            │  │
│  └───────────────────────────────┘  │
└─────────────────────────────────────┘
```

**Why NOT single-agent?**
- Need coordinated multi-tool queries
- Need iterative refinement
- Need quality assessment between searches

**Why NOT pure ReAct?**
- Medical research requires structured workflow
- Need explicit quality gates
- Want deterministic tool selection

**Why THIS pattern?**
- Clear separation of concerns
- Testable components
- Easy to debug
- Proven in similar systems

---

## 2. Tool Selection & Orchestration Pattern

### Decision: Static Tool Registry with Dynamic Selection

**Pattern**:
```python
class ToolRegistry:
    """Central registry of available research tools"""
    tools = {
        'pubmed': PubMedSearchTool(),
        'web': WebSearchTool(),
        'trials': ClinicalTrialsTool(),
        'drugs': DrugInfoTool(),
    }

class Orchestrator:
    def select_tools(self, question: str, iteration: int) -> List[Tool]:
        """Dynamically choose tools based on context"""
        if iteration == 0:
            # First pass: broad search
            return [tools['pubmed'], tools['web']]
        else:
            # Refinement: targeted search
            return self.judge.recommend_tools(question, context)
```

**Why NOT on-the-fly agent factories?**
- 6-day timeline (too complex)
- Tools are known upfront
- Simpler to test and debug

**Why NOT single tool?**
- Need multiple evidence sources
- Different tools for different info types
- Better coverage

**Why THIS pattern?**
- Balance flexibility vs simplicity
- Tools can be added easily
- Selection logic is transparent

---

## 3. Judge Pattern

### Decision: Dual-Judge System (Quality + Budget)

**Pattern**:
```python
class QualityJudge:
    """LLM-based evidence quality assessment"""

    def is_sufficient(self, question: str, evidence: List[Evidence]) -> bool:
        """Main decision: do we have enough?"""
        return (
            self.has_mechanism_explanation(evidence) and
            self.has_drug_candidates(evidence) and
            self.has_clinical_evidence(evidence) and
            self.confidence_score(evidence) > threshold
        )

    def identify_gaps(self, question: str, evidence: List[Evidence]) -> List[str]:
        """What's missing?"""
        gaps = []
        if not self.has_mechanism_explanation(evidence):
            gaps.append("disease mechanism")
        if not self.has_drug_candidates(evidence):
            gaps.append("potential drug candidates")
        if not self.has_clinical_evidence(evidence):
            gaps.append("clinical trial data")
        return gaps

class BudgetJudge:
    """Resource constraint enforcement"""

    def should_stop(self, state: ResearchState) -> bool:
        """Hard limits"""
        return (
            state.tokens_used >= max_tokens or
            state.iterations >= max_iterations or
            state.time_elapsed >= max_time
        )
```

**Why NOT just LLM judge?**
- Cost control (prevent runaway queries)
- Time bounds (hackathon demo needs to be fast)
- Safety (prevent infinite loops)

**Why NOT just token budget?**
- Want early exit when answer is good
- Quality matters, not just quantity
- Better user experience

**Why THIS pattern?**
- Best of both worlds
- Clear separation (quality vs resources)
- Each judge has single responsibility

---

## 4. Break/Stopping Pattern

### Decision: Four-Tier Break Conditions

**Pattern**:
```python
def should_continue(state: ResearchState) -> bool:
    """Multi-tier stopping logic"""

    # Tier 1: Quality-based (ideal stop)
    if quality_judge.is_sufficient(state.question, state.evidence):
        state.stop_reason = "sufficient_evidence"
        return False

    # Tier 2: Budget-based (cost control)
    if state.tokens_used >= config.max_tokens:
        state.stop_reason = "token_budget_exceeded"
        return False

    # Tier 3: Iteration-based (safety)
    if state.iterations >= config.max_iterations:
        state.stop_reason = "max_iterations_reached"
        return False

    # Tier 4: Time-based (demo friendly)
    if state.time_elapsed >= config.max_time:
        state.stop_reason = "timeout"
        return False

    return True  # Continue researching
```

**Configuration**:
```toml
[research.limits]
max_tokens = 50000        # ~$0.50 at Claude pricing
max_iterations = 5        # Reasonable depth
max_time_seconds = 120    # 2 minutes for demo
judge_threshold = 0.8     # Quality confidence score
```

**Why multiple conditions?**
- Defense in depth
- Different failure modes
- Graceful degradation

**Why these specific limits?**
- Tokens: Balances cost vs quality
- Iterations: Enough for refinement, not too deep
- Time: Fast enough for live demo
- Judge: High bar for quality

---

## 5. State Management Pattern

### Decision: Pydantic State Machine with Checkpoints

**Pattern**:
```python
class ResearchState(BaseModel):
    """Immutable state snapshots"""
    query_id: str
    question: str
    iteration: int = 0
    evidence: List[Evidence] = []
    tokens_used: int = 0
    search_history: List[SearchQuery] = []
    stop_reason: Optional[str] = None
    created_at: datetime
    updated_at: datetime

class StateManager:
    def save_checkpoint(self, state: ResearchState) -> None:
        """Save state to disk"""
        path = Path(f".deepresearch/checkpoints/{state.query_id}_iter{state.iteration}.json")
        path.write_text(state.model_dump_json(indent=2))

    def load_checkpoint(self, query_id: str, iteration: int) -> ResearchState:
        """Resume from checkpoint"""
        path = Path(f".deepresearch/checkpoints/{query_id}_iter{iteration}.json")
        return ResearchState.model_validate_json(path.read_text())
```

**Directory Structure**:
```
.deepresearch/
├── state/
│   └── current_123.json         # Active research state
├── checkpoints/
│   ├── query_123_iter0.json     # Checkpoint after iteration 0
│   ├── query_123_iter1.json     # Checkpoint after iteration 1
│   └── query_123_iter2.json     # Checkpoint after iteration 2
└── workspace/
    └── query_123/
        ├── papers/              # Downloaded PDFs
        ├── search_results/      # Raw search results
        └── analysis/            # Intermediate analysis
```

**Why Pydantic?**
- Type safety
- Validation
- Easy serialization
- Integration with Pydantic AI

**Why checkpoints?**
- Resume interrupted research
- Debugging (inspect state at each iteration)
- Cost savings (don't re-query)
- Demo resilience

---

## 6. Tool Interface Pattern

### Decision: Async Unified Tool Protocol

**Pattern**:
```python
from typing import Protocol, Optional, List, Dict
import asyncio

import httpx

class ResearchTool(Protocol):
    """Standard async interface all tools must implement"""

    async def search(
        self,
        query: str,
        max_results: int = 10,
        filters: Optional[Dict] = None
    ) -> List[Evidence]:
        """Execute search and return structured evidence"""
        ...

    def get_metadata(self) -> ToolMetadata:
        """Tool capabilities and requirements"""
        ...

class PubMedSearchTool:
    """Concrete async implementation"""

    def __init__(self):
        self._rate_limiter = asyncio.Semaphore(3)  # cap concurrency (~3 req/sec)
        self._cache: Dict[str, List[Evidence]] = {}

    async def search(self, query: str, max_results: int = 10, **kwargs) -> List[Evidence]:
        # Check cache first
        cache_key = f"{query}:{max_results}"
        if cache_key in self._cache:
            return self._cache[cache_key]

        async with self._rate_limiter:
            # 1. Query PubMed E-utilities API (async httpx)
            async with httpx.AsyncClient() as client:
                response = await client.get(
                    "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
                    params={"db": "pubmed", "term": query, "retmax": max_results}
                )
            # 2. Parse XML response
            # 3. Extract: title, abstract, authors, citations
            # 4. Convert to Evidence objects
            evidence_list = self._parse_response(response.text)

        # Cache results
        self._cache[cache_key] = evidence_list
        return evidence_list

    def get_metadata(self) -> ToolMetadata:
        return ToolMetadata(
            name="PubMed",
            description="Biomedical literature search",
            rate_limit="3 requests/second",
            requires_api_key=False
        )
```

**Parallel Tool Execution**:
```python
async def search_all_tools(query: str, tools: List[ResearchTool]) -> List[Evidence]:
    """Run all tool searches in parallel"""
    tasks = [tool.search(query) for tool in tools]
    results = await asyncio.gather(*tasks, return_exceptions=True)

    # Flatten and filter errors
    evidence = []
    for result in results:
        if isinstance(result, Exception):
            logger.warning(f"Tool failed: {result}")
        else:
            evidence.extend(result)
    return evidence
```

**Why Async?**
- Tools are I/O bound (network calls)
- Parallel execution = faster searches
- Better UX (streaming progress)
- Standard in 2025 Python

**Why Protocol?**
- Loose coupling
- Easy to add new tools
- Testable with mocks
- Clear contract

**Why NOT abstract base class?**
- More Pythonic (PEP 544)
- Duck typing friendly
- Runtime checking with isinstance

---

## 7. Report Generation Pattern

### Decision: Structured Output with Citations

**Pattern**:
```python
class DrugCandidate(BaseModel):
    name: str
    mechanism: str
    evidence_quality: Literal["strong", "moderate", "weak"]
    clinical_status: str  # "FDA approved", "Phase 2", etc.
    citations: List[Citation]

class ResearchReport(BaseModel):
    query: str
    disease_mechanism: str
    candidates: List[DrugCandidate]
    methodology: str  # How we searched
    confidence: float
    sources_used: List[str]
    generated_at: datetime

    def to_markdown(self) -> str:
        """Human-readable format"""
        ...

    def to_json(self) -> str:
        """Machine-readable format"""
        ...
```

**Output Example**:
```markdown
# Research Report: Long COVID Fatigue

## Disease Mechanism
Long COVID fatigue is associated with mitochondrial dysfunction
and persistent inflammation [1, 2].

## Drug Candidates

### 1. Coenzyme Q10 (CoQ10) - STRONG EVIDENCE
- **Mechanism**: Mitochondrial support, ATP production
- **Status**: FDA approved (supplement)
- **Evidence**: 2 randomized controlled trials showing fatigue reduction
- **Citations**:
  - Smith et al. (2023) - PubMed: 12345678
  - Johnson et al. (2023) - PubMed: 87654321

### 2. Low-dose Naltrexone (LDN) - MODERATE EVIDENCE
- **Mechanism**: Anti-inflammatory, immune modulation
- **Status**: FDA approved (different indication)
- **Evidence**: 3 case studies, 1 ongoing Phase 2 trial
- **Citations**: ...

## Methodology
- Searched PubMed: 45 papers reviewed
- Searched Web: 12 sources
- Clinical trials: 8 trials identified
- Total iterations: 3
- Tokens used: 12,450

## Confidence: 85%

## Sources
- PubMed E-utilities
- ClinicalTrials.gov
- OpenFDA Database
```

**Why structured?**
- Parseable by other systems
- Consistent format
- Easy to validate
- Good for datasets

**Why markdown?**
- Human-readable
- Renders nicely in Gradio
- Easy to convert to PDF
- Standard format

---

## 8. Error Handling Pattern

### Decision: Graceful Degradation with Fallbacks

**Pattern**:
```python
class ResearchAgent:
    def research(self, question: str) -> ResearchReport:
        try:
            return self._research_with_retry(question)
        except TokenBudgetExceeded:
            # Return partial results
            return self._synthesize_partial(state)
        except ToolFailure as e:
            # Try alternate tools
            return self._research_with_fallback(question, failed_tool=e.tool)
        except Exception as e:
            # Log and return error report
            logger.error(f"Research failed: {e}")
            return self._error_report(question, error=e)
```

**Why NOT fail fast?**
- Hackathon demo must be robust
- Partial results better than nothing
- Good user experience

**Why NOT silent failures?**
- Need visibility for debugging
- User should know limitations
- Honest about confidence

---

## 9. Configuration Pattern

### Decision: Hydra-inspired but Simpler

**Pattern**:
```toml
# config.toml

[research]
max_iterations = 5
max_tokens = 50000
max_time_seconds = 120
judge_threshold = 0.85

[tools]
enabled = ["pubmed", "web", "trials"]

[tools.pubmed]
max_results = 20
rate_limit = 3  # per second

[tools.web]
engine = "serpapi"
max_results = 10

[llm]
provider = "anthropic"
model = "claude-3-5-sonnet-20241022"
temperature = 0.1

[output]
format = "markdown"
include_citations = true
include_methodology = true
```

**Loading**:
```python
from pathlib import Path
import tomllib

def load_config() -> dict:
    config_path = Path("config.toml")
    with open(config_path, "rb") as f:
        return tomllib.load(f)
```

**Why NOT full Hydra?**
- Simpler for hackathon
- Easier to understand
- Faster to modify
- Can upgrade later

**Why TOML?**
- Human-readable
- Standard (PEP 680)
- Better than YAML edge cases
- Native in Python 3.11+

---

## 10. Testing Pattern

### Decision: Three-Level Testing Strategy

**Pattern**:
```python
# Level 1: Unit tests (fast, isolated)
@pytest.mark.asyncio
async def test_pubmed_tool():
    tool = PubMedSearchTool()
    results = await tool.search("aspirin cardiovascular")
    assert len(results) > 0
    assert all(isinstance(r, Evidence) for r in results)

# Level 2: Integration tests (tools + agent)
def test_research_loop():
    agent = ResearchAgent(config=test_config)
    report = agent.research("aspirin repurposing")
    assert report.candidates
    assert report.confidence > 0

# Level 3: End-to-end tests (full system)
def test_full_workflow():
    # Simulate user query through Gradio UI
    response = gradio_app.predict("test query")
    assert "Drug Candidates" in response
```

**Why three levels?**
- Fast feedback (unit tests)
- Confidence (integration tests)
- Reality check (e2e tests)

**Test Data**:
```python
# tests/fixtures/
- mock_pubmed_response.xml
- mock_web_results.json
- sample_research_query.txt
- expected_report.md
```

---

## 11. Judge Prompt Templates

### Decision: Structured JSON Output with Domain-Specific Criteria

**Quality Judge System Prompt**:
```python
QUALITY_JUDGE_SYSTEM = """You are a medical research quality assessor specializing in drug repurposing.
Your task is to evaluate if collected evidence is sufficient to answer a drug repurposing question.

You assess evidence against four criteria specific to drug repurposing research:
1. MECHANISM: Understanding of the disease's molecular/cellular mechanisms
2. CANDIDATES: Identification of potential drug candidates with known mechanisms
3. EVIDENCE: Clinical or preclinical evidence supporting repurposing
4. SOURCES: Quality and credibility of sources (peer-reviewed > preprints > web)

You MUST respond with valid JSON only. No other text."""
```

**Quality Judge User Prompt**:
```python
QUALITY_JUDGE_USER = """
## Research Question
{question}

## Evidence Collected (Iteration {iteration} of {max_iterations})
{evidence_summary}

## Token Budget
Used: {tokens_used} / {max_tokens}

## Your Assessment

Evaluate the evidence and respond with this exact JSON structure:

```json
{{
  "assessment": {{
    "mechanism_score": <0-10>,
    "mechanism_reasoning": "<Step-by-step analysis of mechanism understanding>",
    "candidates_score": <0-10>,
    "candidates_found": ["<drug1>", "<drug2>", ...],
    "evidence_score": <0-10>,
    "evidence_reasoning": "<Critical evaluation of clinical/preclinical support>",
    "sources_score": <0-10>,
    "sources_breakdown": {{
      "peer_reviewed": <count>,
      "clinical_trials": <count>,
      "preprints": <count>,
      "other": <count>
    }}
  }},
  "overall_confidence": <0.0-1.0>,
  "sufficient": <true/false>,
  "gaps": ["<missing info 1>", "<missing info 2>"],
  "recommended_searches": ["<search query 1>", "<search query 2>"],
  "recommendation": "<continue|synthesize>"
}}
```

Decision rules:
- sufficient=true if overall_confidence >= 0.8 AND mechanism_score >= 6 AND candidates_score >= 6
- sufficient=true if remaining budget < 10% (must synthesize with what we have)
- Otherwise, provide recommended_searches to fill gaps
"""
```

**Report Synthesis Prompt**:
```python
SYNTHESIS_PROMPT = """You are a medical research synthesizer creating a drug repurposing report.

## Research Question
{question}

## Collected Evidence
{all_evidence}

## Judge Assessment
{final_assessment}

## Your Task
Create a comprehensive research report with this structure:

1. **Executive Summary** (2-3 sentences)
2. **Disease Mechanism** - What we understand about the condition
3. **Drug Candidates** - For each candidate:
   - Drug name and current FDA status
   - Proposed mechanism for this condition
   - Evidence quality (strong/moderate/weak)
   - Key citations
4. **Methodology** - How we searched (tools used, queries, iterations)
5. **Limitations** - What we couldn't find or verify
6. **Confidence Score** - Overall confidence in findings

Format as Markdown. Include PubMed IDs as citations [PMID: 12345678].
Be scientifically accurate. Do not hallucinate drug names or mechanisms.
If evidence is weak, say so clearly."""
```

**Why Structured JSON?**
- Parseable by code (not just LLM output)
- Consistent format for logging/debugging
- Can trigger specific actions (continue vs synthesize)
- Testable with expected outputs

**Why Domain-Specific Criteria?**
- Generic "is this good?" prompts fail
- Drug repurposing has specific requirements
- Physician on team validated criteria
- Maps to real research workflow

---

## 12. MCP Server Integration (Hackathon Track)

### Decision: Tools as MCP Servers for Reusability

**Why MCP?**
- Hackathon has dedicated MCP track
- Makes our tools reusable by others
- Standard protocol (Model Context Protocol)
- Future-proof (industry adoption growing)

**Architecture**:
```
┌─────────────────────────────────────────────────┐
│              DeepCritical Agent                 │
│        (uses tools directly OR via MCP)         │
└─────────────────────────────────────────────────┘
                      │
         ┌────────────┼────────────┐
         ↓            ↓            ↓
┌─────────────┐ ┌──────────┐ ┌───────────────┐
│ PubMed MCP  │ │ Web MCP  │ │ Trials MCP    │
│ Server      │ │ Server   │ │ Server        │
└─────────────┘ └──────────┘ └───────────────┘
         │            │            │
         ↓            ↓            ↓
   PubMed API    Brave/DDG   ClinicalTrials.gov
```

**PubMed MCP Server Implementation**:
```python
# src/mcp_servers/pubmed_server.py
from fastmcp import FastMCP

mcp = FastMCP("PubMed Research Tool")

@mcp.tool()
async def search_pubmed(
    query: str,
    max_results: int = 10,
    date_range: str = "5y"
) -> dict:
    """
    Search PubMed for biomedical literature.

    Args:
        query: Search terms (supports PubMed syntax like [MeSH])
        max_results: Maximum papers to return (default 10, max 100)
        date_range: Time filter - "1y", "5y", "10y", or "all"

    Returns:
        dict with papers list containing title, abstract, authors, pmid, date
    """
    tool = PubMedSearchTool()
    results = await tool.search(query, max_results)
    return {
        "query": query,
        "count": len(results),
        "papers": [r.model_dump() for r in results]
    }

@mcp.tool()
async def get_paper_details(pmid: str) -> dict:
    """
    Get full details for a specific PubMed paper.

    Args:
        pmid: PubMed ID (e.g., "12345678")

    Returns:
        Full paper metadata including abstract, MeSH terms, references
    """
    tool = PubMedSearchTool()
    return await tool.get_details(pmid)

if __name__ == "__main__":
    mcp.run()
```

**Running the MCP Server**:
```bash
# Start the server
python -m src.mcp_servers.pubmed_server

# Or with uvx (recommended)
uvx fastmcp run src/mcp_servers/pubmed_server.py

# Note: fastmcp uses stdio transport by default, which is perfect
# for local integration with Claude Desktop or the main agent.
```

**Claude Desktop Integration** (for demo):
```json
// ~/Library/Application Support/Claude/claude_desktop_config.json
{
  "mcpServers": {
    "pubmed": {
      "command": "python",
      "args": ["-m", "src.mcp_servers.pubmed_server"],
      "cwd": "/path/to/deepcritical"
    }
  }
}
```

**Why FastMCP?**
- Simple decorator syntax
- Handles protocol complexity
- Good docs and examples
- Works with Claude Desktop and API

**MCP Track Submission Requirements**:
- [ ] At least one tool as MCP server
- [ ] README with setup instructions
- [ ] Demo showing MCP usage
- [ ] Bonus: Multiple tools as MCP servers

---

## 13. Gradio UI Pattern (Hackathon Track)

### Decision: Streaming Progress with Modern UI

**Pattern**:
```python
import gradio as gr
from collections.abc import AsyncGenerator

async def research_with_streaming(question: str) -> AsyncGenerator[str, None]:
    """Stream research progress to UI"""
    yield "🔍 Starting research...\n\n"

    agent = ResearchAgent()

    async for event in agent.research_stream(question):
        match event.type:
            case "search_start":
                yield f"📚 Searching {event.tool}...\n"
            case "search_complete":
                yield f"✅ Found {event.count} results from {event.tool}\n"
            case "judge_thinking":
                yield "🤔 Evaluating evidence quality...\n"
            case "judge_decision":
                yield f"📊 Confidence: {event.confidence:.0%}\n"
            case "iteration_complete":
                yield f"🔄 Iteration {event.iteration} complete\n\n"
            case "synthesis_start":
                yield "📝 Generating report...\n"
            case "complete":
                yield f"\n---\n\n{event.report}"

# Gradio 5 UI
with gr.Blocks(theme=gr.themes.Soft()) as demo:
    gr.Markdown("# 🔬 DeepCritical: Drug Repurposing Research Agent")
    gr.Markdown("Ask a question about potential drug repurposing opportunities.")

    with gr.Row():
        with gr.Column(scale=2):
            question = gr.Textbox(
                label="Research Question",
                placeholder="What existing drugs might help treat long COVID fatigue?",
                lines=2
            )
            examples = gr.Examples(
                examples=[
                    "What existing drugs might help treat long COVID fatigue?",
                    "Find existing drugs that might slow Alzheimer's progression",
                    "Which diabetes drugs show promise for cancer treatment?"
                ],
                inputs=question
            )
            submit = gr.Button("🚀 Start Research", variant="primary")

        with gr.Column(scale=3):
            output = gr.Markdown(label="Research Progress & Report")

    submit.click(
        fn=research_with_streaming,
        inputs=question,
        outputs=output,
    )

demo.launch()
```

**Why Streaming?**
- User sees progress, not loading spinner
- Builds trust (system is working)
- Better UX for long operations
- Gradio 5 native support

**Why gr.Markdown Output?**
- Research reports are markdown
- Renders citations nicely
- Code blocks for methodology
- Tables for drug comparisons

---

## Summary: Design Decision Table

| # | Question | Decision | Why |
|---|----------|----------|-----|
| 1 | **Architecture** | Orchestrator with search-judge loop | Clear, testable, proven |
| 2 | **Tools** | Static registry, dynamic selection | Balance flexibility vs simplicity |
| 3 | **Judge** | Dual (quality + budget) | Quality + cost control |
| 4 | **Stopping** | Four-tier conditions | Defense in depth |
| 5 | **State** | Pydantic + checkpoints | Type-safe, resumable |
| 6 | **Tool Interface** | Async Protocol + parallel execution | Fast I/O, modern Python |
| 7 | **Output** | Structured + Markdown | Human & machine readable |
| 8 | **Errors** | Graceful degradation + fallbacks | Robust for demo |
| 9 | **Config** | TOML (Hydra-inspired) | Simple, standard |
| 10 | **Testing** | Three levels | Fast feedback + confidence |
| 11 | **Judge Prompts** | Structured JSON + domain criteria | Parseable, medical-specific |
| 12 | **MCP** | Tools as MCP servers | Hackathon track, reusability |
| 13 | **UI** | Gradio 5 streaming | Progress visibility, modern UX |

---
## Answers to Specific Questions

### "What's the orchestrator pattern?"
**Answer**: See Section 1 - Iterative Research Orchestrator with search-judge loop

### "LLM-as-judge or token budget?"
**Answer**: Both - See Section 3 (Dual-Judge System) and Section 4 (Four-Tier Break Conditions)

### "What's the break pattern?"
**Answer**: See Section 4 - Four stopping conditions: quality threshold, token budget, max iterations, and timeout

### "Should we use agent factories?"
**Answer**: No - See Section 2. Static tool registry is simpler for 6-day timeline

### "How do we handle state?"
**Answer**: See Section 5 - Pydantic state machine with checkpoints

---
- ## Appendix: Complete Data Models
954
-
955
- ```python
- # src/deepresearch/models.py
- from pydantic import BaseModel, Field
- from typing import List, Optional, Literal
- from datetime import datetime
-
- class Citation(BaseModel):
-     """Reference to a source"""
-     source_type: Literal["pubmed", "web", "trial", "fda"]
-     identifier: str  # PMID, URL, NCT number, etc.
-     title: str
-     authors: Optional[List[str]] = None
-     date: Optional[str] = None
-     url: Optional[str] = None
-
- class Evidence(BaseModel):
-     """Single piece of evidence from search"""
-     content: str
-     source: Citation
-     relevance_score: float = Field(ge=0, le=1)
-     evidence_type: Literal["mechanism", "candidate", "clinical", "safety"]
-
- class DrugCandidate(BaseModel):
-     """Potential drug for repurposing"""
-     name: str
-     generic_name: Optional[str] = None
-     mechanism: str
-     current_indications: List[str]
-     proposed_mechanism: str
-     evidence_quality: Literal["strong", "moderate", "weak"]
-     fda_status: str
-     citations: List[Citation]
-
- class JudgeAssessment(BaseModel):
-     """Output from quality judge"""
-     mechanism_score: int = Field(ge=0, le=10)
-     candidates_score: int = Field(ge=0, le=10)
-     evidence_score: int = Field(ge=0, le=10)
-     sources_score: int = Field(ge=0, le=10)
-     overall_confidence: float = Field(ge=0, le=1)
-     sufficient: bool
-     gaps: List[str]
-     recommended_searches: List[str]
-     recommendation: Literal["continue", "synthesize"]
-
- class ResearchState(BaseModel):
-     """Complete state of a research session"""
-     query_id: str
-     question: str
-     iteration: int = 0
-     evidence: List[Evidence] = []
-     assessments: List[JudgeAssessment] = []
-     tokens_used: int = 0
-     search_history: List[str] = []
-     stop_reason: Optional[str] = None
-     created_at: datetime = Field(default_factory=datetime.utcnow)
-     updated_at: datetime = Field(default_factory=datetime.utcnow)
-
- class ResearchReport(BaseModel):
-     """Final output report"""
-     query: str
-     executive_summary: str
-     disease_mechanism: str
-     candidates: List[DrugCandidate]
-     methodology: str
-     limitations: str
-     confidence: float
-     sources_used: int
-     tokens_used: int
-     iterations: int
-     generated_at: datetime = Field(default_factory=datetime.utcnow)
-
-     def to_markdown(self) -> str:
-         """Render as markdown for Gradio"""
-         md = f"# Research Report: {self.query}\n\n"
-         md += f"## Executive Summary\n{self.executive_summary}\n\n"
-         md += f"## Disease Mechanism\n{self.disease_mechanism}\n\n"
-         md += "## Drug Candidates\n\n"
-         for i, drug in enumerate(self.candidates, 1):
-             md += f"### {i}. {drug.name} - {drug.evidence_quality.upper()} EVIDENCE\n"
-             md += f"- **Mechanism**: {drug.proposed_mechanism}\n"
-             md += f"- **FDA Status**: {drug.fda_status}\n"
-             md += f"- **Current Uses**: {', '.join(drug.current_indications)}\n"
-             md += f"- **Citations**: {len(drug.citations)} sources\n\n"
-         md += f"## Methodology\n{self.methodology}\n\n"
-         md += f"## Limitations\n{self.limitations}\n\n"
-         md += f"## Confidence: {self.confidence:.0%}\n"
-         return md
- ```
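-
- A quick usage sketch (hypothetical values; assumes only the models above):
-
- ```python
- # Assemble a report and render it for the Gradio Markdown component
- report = ResearchReport(
-     query="What existing drugs might help treat long COVID fatigue?",
-     executive_summary="Three candidates with moderate evidence...",
-     disease_mechanism="Mitochondrial dysfunction and neuroinflammation...",
-     candidates=[],  # filled in by the synthesis step
-     methodology="Iterative search-judge loop over PubMed + web",
-     limitations="Preprints included; not a substitute for clinical review",
-     confidence=0.72,
-     sources_used=24,
-     tokens_used=41_000,
-     iterations=3,
- )
- markdown = report.to_markdown()  # feed straight into gr.Markdown
- ```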
-
- ---
-
- ## 14. Alternative Frameworks Considered
-
- We researched major agent frameworks before settling on our stack. Here's why we chose what we chose, and what we'd steal if we're shipping like animals and have time for Gucci upgrades.
-
- ### Frameworks Evaluated
-
- | Framework | Repo | What It Does |
- |-----------|------|--------------|
- | **Microsoft AutoGen** | [github.com/microsoft/autogen](https://github.com/microsoft/autogen) | Multi-agent orchestration, complex workflows |
- | **Claude Agent SDK** | [github.com/anthropics/claude-agent-sdk-python](https://github.com/anthropics/claude-agent-sdk-python) | Anthropic's official agent framework |
- | **Pydantic AI** | [github.com/pydantic/pydantic-ai](https://github.com/pydantic/pydantic-ai) | Type-safe agents, structured outputs |
-
- ### Why NOT AutoGen (Microsoft)?
-
- **Pros:**
- - Battle-tested multi-agent orchestration
- - `reflect_on_tool_use` - model reviews its own tool results
- - `max_tool_iterations` - built-in iteration limits
- - Concurrent tool execution
- - Rich ecosystem (AutoGen Studio, benchmarks)
-
- **Cons for MVP:**
- - Heavy dependency tree (50+ packages)
- - Complex configuration (YAML + Python)
- - Overkill for a single-agent search-judge loop
- - Learning curve eats into the 6-day timeline
-
- **Verdict:** Great for multi-agent systems. Overkill for our MVP.
-
- ### Why NOT Claude Agent SDK (Anthropic)?
-
- **Pros:**
- - Official Anthropic framework
- - Clean `@tool` decorator pattern
- - In-process MCP servers (no subprocess)
- - Hooks for pre/post tool execution
- - Direct Claude Code integration
-
- **Cons for MVP:**
- - Requires the Claude Code CLI to be bundled
- - Node.js dependency for some features
- - Designed for the Claude Code ecosystem, not standalone agents
- - Less flexible for custom LLM providers
-
- **Verdict:** Would be great if we were building ON Claude Code. We're building a standalone agent.
-
- ### Why Pydantic AI + FastMCP (Our Choice)
-
- **Pros:**
- - ✅ Simple, Pythonic API
- - ✅ Native async/await
- - ✅ Type-safe with Pydantic
- - ✅ Works with any LLM provider
- - ✅ FastMCP for clean MCP servers
- - ✅ Minimal dependencies
- - ✅ Can ship the MVP in 6 days
-
- **Cons:**
- - Newer framework (less battle-tested)
- - Smaller ecosystem
- - May need to build more from scratch
-
- **Verdict:** Right tool for the job. Ship fast, iterate later.
-
- ---
-
- ## 15. Stretch Goals: Gucci Bangers (If We're Shipping Like Animals)
-
- If the MVP ships early and we're crushing it, here's what we'd steal from other frameworks:
-
- ### Tier 1: Quick Wins (2-4 hours each)
-
- #### From Claude Agent SDK: `@tool` Decorator Pattern
- Replace our Protocol-based tools with cleaner decorators:
-
- ```python
- # CURRENT (Protocol-based)
- class PubMedSearchTool:
-     async def search(self, query: str, max_results: int = 10) -> List[Evidence]:
-         ...
-
- # UPGRADE (Decorator-based, stolen from Claude SDK)
- import json
-
- from claude_agent_sdk import tool
-
- @tool("search_pubmed", "Search PubMed for biomedical papers", {
-     "query": str,
-     "max_results": int
- })
- async def search_pubmed(args):
-     results = await _do_pubmed_search(args["query"], args["max_results"])
-     return {"content": [{"type": "text", "text": json.dumps(results)}]}
- ```
-
- **Why it's Gucci:** Cleaner syntax, automatic schema generation, less boilerplate.
-
- #### From AutoGen: Reflect on Tool Use
- Add a reflection step where the model reviews its own tool results:
-
- ```python
- # CURRENT: Judge evaluates evidence
- assessment = await judge.assess(question, evidence)
-
- # UPGRADE: Add reflection step (stolen from AutoGen)
- class ReflectiveJudge:
-     async def assess_with_reflection(self, question, evidence, tool_results):
-         # First pass: raw assessment
-         initial = await self._assess(question, evidence)
-
-         # Reflection: "Did I use the tools correctly?"
-         reflection = await self._reflect_on_tool_use(tool_results)
-
-         # Final: combine assessment + reflection
-         return self._combine(initial, reflection)
- ```
-
- **Why it's Gucci:** Catches tool misuse, improves accuracy, more robust judge.
-
- ### Tier 2: Medium Lifts (4-8 hours each)
-
- #### From AutoGen: Concurrent Tool Execution
- Run multiple tools in parallel with proper error handling:
-
- ```python
- import asyncio
-
- # CURRENT: bare asyncio.gather - parallel, but no timeouts or cancellation
- results = await asyncio.gather(*[tool.search(query) for tool in tools])
-
- # UPGRADE: AutoGen-style with cancellation + timeout
- from autogen_core import CancellationToken
-
- async def execute_tools_concurrent(tools, query, timeout=30):
-     token = CancellationToken()
-
-     async def run_with_timeout(tool):
-         try:
-             return await asyncio.wait_for(
-                 tool.search(query, cancellation_token=token),
-                 timeout=timeout
-             )
-         except asyncio.TimeoutError:
-             token.cancel()  # Cancel other tools
-             return ToolError(f"{tool.name} timed out")
-
-     return await asyncio.gather(*[run_with_timeout(t) for t in tools])
- ```
-
- **Why it's Gucci:** Proper timeout handling, cancellation propagation, production-ready.
-
- #### From Claude SDK: Hooks System
- Add pre/post hooks for logging, validation, cost tracking:
-
- ```python
- # UPGRADE: Hook system (stolen from Claude SDK)
- import logging
-
- logger = logging.getLogger(__name__)
-
- class HookManager:
-     async def pre_tool_use(self, tool_name, args):
-         """Called before every tool execution"""
-         logger.info(f"Calling {tool_name} with {args}")
-         self.cost_tracker.start_timer()
-
-     async def post_tool_use(self, tool_name, result, duration):
-         """Called after every tool execution"""
-         self.cost_tracker.record(tool_name, duration)
-         if result.is_error:
-             self.error_tracker.record(tool_name, result.error)
- ```
-
- **Why it's Gucci:** Observability, debugging, cost tracking, production-ready.
-
- ### Tier 3: Big Lifts (Post-Hackathon)
-
- #### Full AutoGen Integration
- If we want multi-agent capabilities later:
-
- ```python
- # POST-HACKATHON: Multi-agent drug repurposing
- from autogen_agentchat import AssistantAgent, GroupChat
-
- literature_agent = AssistantAgent(
-     name="LiteratureReviewer",
-     tools=[pubmed_search, web_search],
-     system_message="You search and summarize medical literature."
- )
-
- mechanism_agent = AssistantAgent(
-     name="MechanismAnalyzer",
-     tools=[pathway_db, protein_db],
-     system_message="You analyze disease mechanisms and drug targets."
- )
-
- synthesis_agent = AssistantAgent(
-     name="ReportSynthesizer",
-     system_message="You synthesize findings into actionable reports."
- )
-
- # Orchestrate multi-agent workflow
- group_chat = GroupChat(
-     agents=[literature_agent, mechanism_agent, synthesis_agent],
-     max_round=10
- )
- ```
-
- **Why it's Gucci:** True multi-agent collaboration, specialized roles, scalable.
-
- ---
-
- ## Priority Order for Stretch Goals
-
- | Priority | Feature | Source | Effort | Impact |
- |----------|---------|--------|--------|--------|
- | 1 | `@tool` decorator | Claude SDK | 2 hrs | High - cleaner code |
- | 2 | Reflect on tool use | AutoGen | 3 hrs | High - better accuracy |
- | 3 | Hooks system | Claude SDK | 4 hrs | Medium - observability |
- | 4 | Concurrent + cancellation | AutoGen | 4 hrs | Medium - robustness |
- | 5 | Multi-agent | AutoGen | 8+ hrs | Post-hackathon |
-
- ---
-
- ## The Bottom Line
-
- ```
- ┌─────────────────────────────────────────────────────────────┐
- │ MVP (Days 1-4): Pydantic AI + FastMCP                       │
- │  - Ship working drug repurposing agent                      │
- │  - Search-judge loop with PubMed + Web                      │
- │  - Gradio UI with streaming                                 │
- │  - MCP server for hackathon track                           │
- ├─────────────────────────────────────────────────────────────┤
- │ If Crushing It (Days 5-6): Steal the Gucci                  │
- │  - @tool decorators from Claude SDK                         │
- │  - Reflect on tool use from AutoGen                         │
- │  - Hooks for observability                                  │
- ├─────────────────────────────────────────────────────────────┤
- │ Post-Hackathon: Full AutoGen Integration                    │
- │  - Multi-agent workflows                                    │
- │  - Specialized agent roles                                  │
- │  - Production-grade orchestration                           │
- └─────────────────────────────────────────────────────────────┘
- ```
-
- **Ship MVP first. Steal bangers if time. Scale later.**
-
- ---
-
- ## 16. Reference Implementation Resources
-
- We've cloned production-ready repos into `reference_repos/` that we can vendor, copy from, or just USE directly. This section documents what's available and how to leverage it.
-
- ### Cloned Repositories
-
- | Repository | Location | What It Provides |
- |------------|----------|------------------|
- | **pydanticai-research-agent** | `reference_repos/pydanticai-research-agent/` | Complete PydanticAI agent with Brave Search |
- | **pubmed-mcp-server** | `reference_repos/pubmed-mcp-server/` | Production-grade PubMed MCP server (TypeScript) |
- | **autogen-microsoft** | `reference_repos/autogen-microsoft/` | Microsoft's multi-agent framework |
- | **claude-agent-sdk** | `reference_repos/claude-agent-sdk/` | Anthropic's agent SDK with @tool decorator |
-
- ### 🔥 CHEAT CODE: Production PubMed MCP Already Exists
-
- The `pubmed-mcp-server` is **production-grade** and has EVERYTHING we need:
-
- ```bash
- # Already available tools in pubmed-mcp-server:
- pubmed_search_articles      # Search PubMed with filters, date ranges
- pubmed_fetch_contents       # Get full article details by PMID
- pubmed_article_connections  # Find citations, related articles
- pubmed_research_agent       # Generate research plan outlines
- pubmed_generate_chart       # Create PNG charts from data
- ```
-
- **Option 1: Use it directly via npx**
- ```json
- {
-   "mcpServers": {
-     "pubmed": {
-       "command": "npx",
-       "args": ["@cyanheads/pubmed-mcp-server"],
-       "env": { "NCBI_API_KEY": "your_key" }
-     }
-   }
- }
- ```
-
- **Option 2: Vendor the logic into Python**
- The TypeScript code in `reference_repos/pubmed-mcp-server/src/` shows exactly how to (see the sketch after this list):
- - Construct PubMed E-utilities queries
- - Handle rate limiting (3/sec without key, 10/sec with key)
- - Parse XML responses
- - Extract article metadata
-
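- A minimal Python sketch of the same E-utilities search step (endpoint and parameters per NCBI's public docs; error handling and the follow-up efetch call omitted):
-
- ```python
- import httpx
-
- async def esearch_pubmed(query: str, retmax: int = 10, api_key: str | None = None) -> list[str]:
-     """Return PMIDs for a query via NCBI esearch; pass them to efetch for details."""
-     params = {"db": "pubmed", "term": query, "retmax": retmax, "retmode": "json"}
-     if api_key:
-         params["api_key"] = api_key  # lifts the rate limit from 3/sec to 10/sec
-     async with httpx.AsyncClient() as client:
-         resp = await client.get(
-             "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi", params=params
-         )
-         resp.raise_for_status()
-         return resp.json()["esearchresult"]["idlist"]
- ```
-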
- ### PydanticAI Research Agent Patterns
-
- The `pydanticai-research-agent` repo provides copy-paste patterns:
-
- **Agent Definition** (`agents/research_agent.py`):
- ```python
- from pydantic_ai import Agent, RunContext
- from dataclasses import dataclass
- from typing import Any, Dict, List, Optional
-
- @dataclass
- class ResearchAgentDependencies:
-     brave_api_key: str
-     session_id: Optional[str] = None
-
- research_agent = Agent(
-     get_llm_model(),
-     deps_type=ResearchAgentDependencies,
-     system_prompt=SYSTEM_PROMPT
- )
-
- @research_agent.tool
- async def search_web(
-     ctx: RunContext[ResearchAgentDependencies],
-     query: str,
-     max_results: int = 10
- ) -> List[Dict[str, Any]]:
-     """Search with context access via ctx.deps"""
-     results = await search_web_tool(ctx.deps.brave_api_key, query, max_results)
-     return results
- ```
-
- **Brave Search Tool** (`tools/brave_search.py`):
- ```python
- import httpx
-
- async def search_web_tool(api_key: str, query: str, count: int = 10) -> List[Dict]:
-     headers = {"X-Subscription-Token": api_key, "Accept": "application/json"}
-     async with httpx.AsyncClient() as client:
-         response = await client.get(
-             "https://api.search.brave.com/res/v1/web/search",
-             headers=headers,
-             params={"q": query, "count": count},
-             timeout=30.0
-         )
-         response.raise_for_status()  # Handle 429 rate limit, 401 auth errors
-         data = response.json()
-         return data.get("web", {}).get("results", [])
- ```
-
- **Pydantic Models** (`models/research_models.py`):
- ```python
- class BraveSearchResult(BaseModel):
-     title: str
-     url: str
-     description: str
-     score: float = Field(ge=0.0, le=1.0)
- ```
-
-
1393
- From [deepwiki.com/microsoft/agent-framework](https://deepwiki.com/microsoft/agent-framework/3.4-workflows-and-orchestration):
1394
-
1395
- #### Sequential Orchestration
1396
- ```
1397
- Agent A → Agent B → Agent C (each receives prior outputs)
1398
- ```
1399
- **Use when:** Tasks have dependencies, results inform next steps.
1400
-
1401
- #### Concurrent (Fan-out/Fan-in)
1402
- ```
1403
- ┌→ Agent A ─┐
1404
- Dispatcher ├→ Agent B ─┼→ Aggregator
1405
- └→ Agent C ─┘
1406
- ```
1407
- **Use when:** Independent tasks can run in parallel, results need consolidation.
1408
- **Our use:** Parallel PubMed + Web search.
1409
-
1410
- #### Handoff Orchestration
1411
- ```
1412
- Coordinator → routes to → Specialist A, B, or C based on request
1413
- ```
1414
- **Use when:** Router decides which search strategy based on query type.
1415
- **Our use:** Route "mechanism" vs "clinical trial" vs "drug info" queries.
1416
-
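- A minimal handoff sketch (plain keyword routing standing in for an LLM router; the strategy names are assumptions):
-
- ```python
- def route_query(question: str) -> str:
-     """Pick a search strategy for a query (sketch)."""
-     q = question.lower()
-     if "trial" in q or "phase" in q:
-         return "clinical_trial_search"
-     if "mechanism" in q or "pathway" in q:
-         return "mechanism_search"
-     return "drug_info_search"  # default specialist
- ```
-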
- #### HITL (Human-in-the-Loop)
- ```
- Agent → RequestInfoEvent → Human validates → Agent continues
- ```
- **Use when:** Critical judgment points need human validation.
- **Our use:** Optional "approve drug candidates before synthesis" step.
-
- ### Recommended Hybrid Pattern for Our Agent
-
- Based on all the research, here's our recommended implementation:
-
- ```
- ┌─────────────────────────────────────────────────────────┐
- │ 1. ROUTER (Handoff Pattern)                             │
- │    - Analyze query type                                 │
- │    - Choose search strategy                             │
- ├─────────────────────────────────────────────────────────┤
- │ 2. SEARCH (Concurrent Pattern)                          │
- │    - Fan-out to PubMed + Web in parallel                │
- │    - Timeout handling per AutoGen patterns              │
- │    - Aggregate results                                  │
- ├─────────────────────────────────────────────────────────┤
- │ 3. JUDGE (Sequential + Budget)                          │
- │    - Quality assessment                                 │
- │    - Token/iteration budget check                       │
- │    - Recommend: continue or synthesize                  │
- ├─────────────────────────────────────────────────────────┤
- │ 4. SYNTHESIZE (Final Agent)                             │
- │    - Generate research report                           │
- │    - Include citations                                  │
- │    - Stream to Gradio UI                                │
- └─────────────────────────────────────────────────────────┘
- ```
-
- ### Quick Start: Minimal Implementation Path
-
- **Day 1-2: Core Loop**
- 1. Copy `search_web_tool` from `pydanticai-research-agent/tools/brave_search.py`
- 2. Implement PubMed search (reference `pubmed-mcp-server/src/` for E-utilities patterns)
- 3. Wire up the basic search-judge loop
-
- **Day 3: Judge + State**
- 1. Implement the quality judge with JSON structured output
- 2. Add the budget judge
- 3. Add Pydantic state management
-
- **Day 4: UI + MCP**
- 1. Gradio streaming UI
- 2. Wrap the PubMed tool as a FastMCP server
-
- **Day 5-6: Polish + Deploy**
- 1. HuggingFace Spaces deployment
- 2. Demo video
- 3. Stretch goals if time
-
- ---
-
- ## 17. External Resources & MCP Servers
-
- ### Available PubMed MCP Servers (Community)
-
- | Server | Author | Features | Link |
- |--------|--------|----------|------|
- | **pubmed-mcp-server** | cyanheads | Full E-utilities, research agent, charts | [GitHub](https://github.com/cyanheads/pubmed-mcp-server) |
- | **BioMCP** | GenomOncology | PubMed + ClinicalTrials + MyVariant | [GitHub](https://github.com/genomoncology/biomcp) |
- | **PubMed-MCP-Server** | JackKuo666 | Basic search, metadata access | [GitHub](https://github.com/JackKuo666/PubMed-MCP-Server) |
-
- ### Web Search Options
-
- | Tool | Free Tier | API Key | Async Support |
- |------|-----------|---------|---------------|
- | **Brave Search** | 2000/month | Required | Yes (httpx) |
- | **DuckDuckGo** | Unlimited | No | Yes (duckduckgo-search) |
- | **SerpAPI** | None | Required | Yes |
-
- **Recommended:** Start with DuckDuckGo (free, no key), upgrade to Brave for production.
-
- ```python
- # DuckDuckGo search (no API key needed!) - DDGS is blocking, so run it off the event loop
- import asyncio
- from typing import Dict, List
-
- from duckduckgo_search import DDGS
-
- async def search_ddg(query: str, max_results: int = 10) -> List[Dict]:
-     def _search() -> List[Dict]:
-         with DDGS() as ddgs:
-             return list(ddgs.text(query, max_results=max_results))
-     results = await asyncio.to_thread(_search)
-     return [{"title": r["title"], "url": r["href"], "description": r["body"]} for r in results]
- ```
-
- ---
-
- **Document Status**: Official Architecture Spec
- **Review Score**: 100/100 (Ironclad Gucci Banger Edition)
- **Sections**: 17 design patterns + data models appendix + reference repos + stretch goals
- **Last Updated**: November 2025
docs/architecture/graph-orchestration.md ADDED
@@ -0,0 +1,152 @@
+ # Graph Orchestration Architecture
+
+ ## Overview
+
+ Phase 4 implements a graph-based orchestration system for research workflows using Pydantic AI agents as nodes. This enables better parallel execution, conditional routing, and state management compared to simple agent chains.
+
+ ## Graph Structure
+
+ ### Nodes
+
+ Graph nodes represent different stages in the research workflow:
+
+ 1. **Agent Nodes**: Execute Pydantic AI agents
+    - Input: Prompt/query
+    - Output: Structured or unstructured response
+    - Examples: `KnowledgeGapAgent`, `ToolSelectorAgent`, `ThinkingAgent`
+
+ 2. **State Nodes**: Update or read workflow state
+    - Input: Current state
+    - Output: Updated state
+    - Examples: Update evidence, update conversation history
+
+ 3. **Decision Nodes**: Make routing decisions based on conditions
+    - Input: Current state/results
+    - Output: Next node ID
+    - Examples: Continue research vs. complete research
+
+ 4. **Parallel Nodes**: Execute multiple nodes concurrently
+    - Input: List of node IDs
+    - Output: Aggregated results
+    - Examples: Parallel iterative research loops
+
+ ### Edges
+
+ Edges define transitions between nodes:
+
+ 1. **Sequential Edges**: Always traversed (no condition)
+    - From: Source node
+    - To: Target node
+    - Condition: None (always True)
+
+ 2. **Conditional Edges**: Traversed based on a condition
+    - From: Source node
+    - To: Target node
+    - Condition: Callable that returns bool
+    - Example: If research complete → go to writer, else → continue loop
+
+ 3. **Parallel Edges**: Used for parallel execution branches
+    - From: Parallel node
+    - To: Multiple target nodes
+    - Execution: All targets run concurrently
+
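+ A minimal sketch of these structures (illustrative only; not the repo's actual class definitions):
+
+ ```python
+ from dataclasses import dataclass
+ from typing import Any, Callable
+
+ @dataclass
+ class Node:
+     id: str
+     kind: str  # "agent" | "state" | "decision" | "parallel"
+
+ @dataclass
+ class Edge:
+     source: str
+     target: str
+     condition: Callable[[dict[str, Any]], bool] = lambda state: True  # sequential by default
+
+ # Conditional edge: hand off to the writer once research is complete
+ to_writer = Edge("knowledge_gap", "writer", condition=lambda s: s["research_complete"])
+ ```
+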
+ ## Graph Patterns
+
+ ### Iterative Research Graph
+
+ ```
+ [Input] → [Thinking] → [Knowledge Gap] → [Decision: Complete?]
+                                            ↓ No           ↓ Yes
+                                      [Tool Selector]    [Writer]
+                                            ↓
+                                     [Execute Tools] → [Loop Back]
+ ```
+
+ ### Deep Research Graph
+
+ ```
+ [Input] → [Planner] → [Parallel Iterative Loops] → [Synthesizer]
+                          ↓         ↓         ↓
+                      [Loop1]   [Loop2]   [Loop3]
+ ```
+
+ ## State Management
+
+ State is managed via `WorkflowState` using `ContextVar` for thread-safe isolation:
+
+ - **Evidence**: Collected evidence from searches
+ - **Conversation**: Iteration history (gaps, tool calls, findings, thoughts)
+ - **Embedding Service**: For semantic search
+
+ State transitions occur at state nodes, which update the global workflow state.
+
+ ## Execution Flow
+
+ 1. **Graph Construction**: Build the graph from nodes and edges
+ 2. **Graph Validation**: Ensure the graph is valid (no cycles, all nodes reachable)
+ 3. **Graph Execution**: Traverse the graph from the entry node
+ 4. **Node Execution**: Execute each node based on its type
+ 5. **Edge Evaluation**: Determine next node(s) based on edges
+ 6. **Parallel Execution**: Use `asyncio.gather()` for parallel nodes
+ 7. **State Updates**: Update state at state nodes
+ 8. **Event Streaming**: Yield events during execution for the UI
+
+ ## Conditional Routing
+
+ Decision nodes evaluate conditions and return next node IDs:
+
+ - **Knowledge Gap Decision**: If `research_complete` → writer, else → tool selector
+ - **Budget Decision**: If budget exceeded → exit, else → continue
+ - **Iteration Decision**: If max iterations → exit, else → continue
+
+ ## Parallel Execution
+
+ Parallel nodes execute multiple nodes concurrently:
+
+ - Each parallel branch runs independently
+ - Results are aggregated after all branches complete
+ - State is synchronized after parallel execution
+ - Errors in one branch don't stop other branches
+
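+ A sketch of the aggregation step (`return_exceptions=True` is what keeps one failed branch from cancelling the others):
+
+ ```python
+ import asyncio
+ from typing import Any, Coroutine
+
+ async def run_parallel(branches: list[Coroutine[Any, Any, Any]]) -> tuple[list[Any], list[BaseException]]:
+     """Run parallel-node branches concurrently and split results from errors."""
+     results = await asyncio.gather(*branches, return_exceptions=True)
+     ok = [r for r in results if not isinstance(r, BaseException)]
+     errors = [r for r in results if isinstance(r, BaseException)]
+     return ok, errors  # errors become error events; ok results merge into state
+ ```
+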
+ ## Budget Enforcement
+
+ Budget constraints are enforced at decision nodes:
+
+ - **Token Budget**: Track LLM token usage
+ - **Time Budget**: Track elapsed time
+ - **Iteration Budget**: Track iteration count
+
+ If any budget is exceeded, execution routes to the exit node.
+
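+ A sketch of the check a budget decision node might run (the state field names are assumptions):
+
+ ```python
+ import time
+
+ def within_budget(state: dict, max_tokens: int, max_seconds: float, max_iterations: int) -> bool:
+     """True while every budget still has headroom."""
+     return (
+         state["tokens_used"] < max_tokens
+         and (time.monotonic() - state["started_at"]) < max_seconds
+         and state["iteration"] < max_iterations
+     )
+
+ # At the decision node:
+ #   next_node = "continue" if within_budget(state, 50_000, 300.0, 5) else "exit"
+ ```
+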
+ ## Error Handling
+
+ Errors are handled at multiple levels:
+
+ 1. **Node Level**: Catch errors in individual node execution
+ 2. **Graph Level**: Handle errors during graph traversal
+ 3. **State Level**: Roll back state changes on error
+
+ Errors are logged and yield error events for the UI.
+
+ ## Backward Compatibility
+
+ Graph execution is optional via a feature flag:
+
+ - `USE_GRAPH_EXECUTION=true`: Use graph-based execution
+ - `USE_GRAPH_EXECUTION=false`: Use agent chain execution (existing)
+
+ This allows gradual migration and fallback if needed.
+
docs/architecture/graph_orchestration.md CHANGED
@@ -137,6 +137,14 @@ Graph execution is optional via feature flag:
 
 This allows gradual migration and fallback if needed.
 
+ ## See Also
+
+ - [Orchestrators](orchestrators.md) - Overview of all orchestrator patterns
+ - [Workflows](workflows.md) - Workflow diagrams and patterns
+ - [Workflow Diagrams](workflow-diagrams.md) - Detailed workflow diagrams
+ - [API Reference - Orchestrators](../api/orchestrators.md) - API documentation
+
docs/architecture/middleware.md CHANGED
@@ -137,6 +137,3 @@ All middleware components use `ContextVar` for thread-safe isolation:
 
 
 
-
-
-
docs/architecture/orchestrators.md ADDED
@@ -0,0 +1,198 @@
+ # Orchestrators Architecture
+
+ DeepCritical supports multiple orchestration patterns for research workflows.
+
+ ## Research Flows
+
+ ### IterativeResearchFlow
+
+ **File**: `src/orchestrator/research_flow.py`
+
+ **Pattern**: Generate observations → Evaluate gaps → Select tools → Execute → Judge → Continue/Complete
+
+ **Agents Used**:
+ - `KnowledgeGapAgent`: Evaluates research completeness
+ - `ToolSelectorAgent`: Selects tools for addressing gaps
+ - `ThinkingAgent`: Generates observations
+ - `WriterAgent`: Creates final report
+ - `JudgeHandler`: Assesses evidence sufficiency
+
+ **Features**:
+ - Tracks iterations, time, budget
+ - Supports graph execution (`use_graph=True`) and agent chains (`use_graph=False`)
+ - Iterates until research is complete or constraints are met
+
+ **Usage**:
+ ```python
+ from src.orchestrator.research_flow import IterativeResearchFlow
+
+ flow = IterativeResearchFlow(
+     search_handler=search_handler,
+     judge_handler=judge_handler,
+     use_graph=False
+ )
+
+ async for event in flow.run(query):
+     # Handle events
+     pass
+ ```
+
+ ### DeepResearchFlow
+
+ **File**: `src/orchestrator/research_flow.py`
+
+ **Pattern**: Planner → Parallel iterative loops per section → Synthesizer
+
+ **Agents Used**:
+ - `PlannerAgent`: Breaks query into report sections
+ - `IterativeResearchFlow`: Per-section research (parallel)
+ - `LongWriterAgent` or `ProofreaderAgent`: Final synthesis
+
+ **Features**:
+ - Uses `WorkflowManager` for parallel execution
+ - Budget tracking per section and globally
+ - State synchronization across parallel loops
+ - Supports graph execution and agent chains
+
+ **Usage**:
+ ```python
+ from src.orchestrator.research_flow import DeepResearchFlow
+
+ flow = DeepResearchFlow(
+     search_handler=search_handler,
+     judge_handler=judge_handler,
+     use_graph=True
+ )
+
+ async for event in flow.run(query):
+     # Handle events
+     pass
+ ```
+
+ ## Graph Orchestrator
+
+ **File**: `src/orchestrator/graph_orchestrator.py`
+
+ **Purpose**: Graph-based execution using Pydantic AI agents as nodes
+
+ **Features**:
+ - Uses Pydantic AI Graphs (when available) or agent chains (fallback)
+ - Routes based on research mode (iterative/deep/auto)
+ - Streams `AgentEvent` objects for the UI
+
+ **Node Types**:
+ - **Agent Nodes**: Execute Pydantic AI agents
+ - **State Nodes**: Update or read workflow state
+ - **Decision Nodes**: Make routing decisions
+ - **Parallel Nodes**: Execute multiple nodes concurrently
+
+ **Edge Types**:
+ - **Sequential Edges**: Always traversed
+ - **Conditional Edges**: Traversed based on condition
+ - **Parallel Edges**: Used for parallel execution branches
+
+ ## Orchestrator Factory
+
+ **File**: `src/orchestrator_factory.py`
+
+ **Purpose**: Factory for creating orchestrators
+
+ **Modes**:
+ - **Simple**: Legacy orchestrator (backward compatible)
+ - **Advanced**: Magentic orchestrator (requires OpenAI API key)
+ - **Auto-detect**: Chooses based on API key availability
+
+ **Usage**:
+ ```python
+ from src.orchestrator_factory import create_orchestrator
+
+ orchestrator = create_orchestrator(
+     search_handler=search_handler,
+     judge_handler=judge_handler,
+     config={},
+     mode="advanced"  # or "simple" or None for auto-detect
+ )
+ ```
+
+ ## Magentic Orchestrator
+
+ **File**: `src/orchestrator_magentic.py`
+
+ **Purpose**: Multi-agent coordination using Microsoft Agent Framework
+
+ **Features**:
+ - Uses `agent-framework-core`
+ - ChatAgent pattern with internal LLMs per agent
+ - `MagenticBuilder` with participants: searcher, hypothesizer, judge, reporter
+ - Manager orchestrates agents via `OpenAIChatClient`
+ - Requires OpenAI API key (function calling support)
+ - Event-driven: converts Magentic events to `AgentEvent` for UI streaming
+
+ **Requirements**:
+ - `agent-framework-core` package
+ - OpenAI API key
+
+ ## Hierarchical Orchestrator
+
+ **File**: `src/orchestrator_hierarchical.py`
+
+ **Purpose**: Hierarchical orchestrator using middleware and sub-teams
+
+ **Features**:
+ - Uses `SubIterationMiddleware` with `ResearchTeam` and `LLMSubIterationJudge`
+ - Adapts Magentic ChatAgent to the `SubIterationTeam` protocol
+ - Event-driven via `asyncio.Queue` for coordination
+ - Supports sub-iteration patterns for complex research tasks
+
+ ## Legacy Simple Mode
+
+ **File**: `src/legacy_orchestrator.py`
+
+ **Purpose**: Linear search-judge-synthesize loop
+
+ **Features**:
+ - Uses `SearchHandlerProtocol` and `JudgeHandlerProtocol`
+ - Generator-based design yielding `AgentEvent` objects
+ - Backward compatibility for simple use cases
+
+ ## State Initialization
+
+ All orchestrators must initialize workflow state:
+
+ ```python
+ from src.middleware.state_machine import init_workflow_state
+ from src.services.embeddings import get_embedding_service
+
+ embedding_service = get_embedding_service()
+ init_workflow_state(embedding_service)
+ ```
+
+ ## Event Streaming
+
+ All orchestrators yield `AgentEvent` objects:
+
+ **Event Types**:
+ - `started`: Research started
+ - `search_complete`: Search completed
+ - `judge_complete`: Evidence evaluation completed
+ - `hypothesizing`: Generating hypotheses
+ - `synthesizing`: Synthesizing results
+ - `complete`: Research completed
+ - `error`: Error occurred
+
+ **Event Structure**:
+ ```python
+ class AgentEvent:
+     type: str
+     iteration: int | None
+     data: dict[str, Any]
+ ```
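+
+ Consuming the stream looks the same for every orchestrator (a sketch, assuming the fields above; `log` and `update_progress_ui` are placeholders):
+
+ ```python
+ async for event in orchestrator.run(query):
+     if event.type == "error":
+         log.error("iteration %s failed: %s", event.iteration, event.data)
+     elif event.type == "complete":
+         report = event.data.get("report")  # final synthesized output
+     else:
+         update_progress_ui(event)  # stream intermediate progress
+ ```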
+
+ ## See Also
+
+ - [Graph Orchestration](graph-orchestration.md) - Graph-based execution details
+ - [Graph Orchestration (Detailed)](graph_orchestration.md) - Detailed graph architecture
+ - [Workflows](workflows.md) - Workflow diagrams and patterns
+ - [Workflow Diagrams](workflow-diagrams.md) - Detailed workflow diagrams
+ - [API Reference - Orchestrators](../api/orchestrators.md) - API documentation
+
docs/architecture/overview.md DELETED
@@ -1,474 +0,0 @@
- # DeepCritical: Medical Drug Repurposing Research Agent
- ## Project Overview
-
- ---
-
- ## Executive Summary
-
- **DeepCritical** is a deep research agent designed to accelerate medical drug repurposing research by autonomously searching, analyzing, and synthesizing evidence from multiple biomedical databases.
-
- ### The Problem We Solve
-
- Drug repurposing - finding new therapeutic uses for existing FDA-approved drugs - can take years of manual literature review. Researchers must:
- - Search thousands of papers across multiple databases
- - Identify molecular mechanisms
- - Find relevant clinical trials
- - Assess safety profiles
- - Synthesize evidence into actionable insights
-
- **DeepCritical automates this process, cutting hours of work down to minutes.**
-
- ### What Is Drug Repurposing?
-
- **Simple Explanation:**
- Using existing approved drugs to treat NEW diseases they weren't originally designed for.
-
- **Real Examples:**
- - **Viagra** (sildenafil): Originally for heart disease → Now treats erectile dysfunction
- - **Thalidomide**: Once banned → Now treats multiple myeloma
- - **Aspirin**: Pain reliever → Heart attack prevention
- - **Metformin**: Diabetes drug → Being tested for aging/longevity
-
- **Why It Matters:**
- - Faster than developing new drugs (years vs decades)
- - Cheaper (known safety profiles)
- - Lower risk (already FDA approved)
- - Immediate patient benefit potential
-
- ---
-
- ## Core Use Case
-
- ### Primary Query Type
- > "What existing drugs might help treat [disease/condition]?"
-
- ### Example Queries
-
- 1. **Long COVID Fatigue**
-    - Query: "What existing drugs might help treat long COVID fatigue?"
-    - Agent searches: PubMed, clinical trials, drug databases
-    - Output: List of candidate drugs with mechanisms + evidence + citations
-
- 2. **Alzheimer's Disease**
-    - Query: "Find existing drugs that target beta-amyloid pathways"
-    - Agent identifies: Disease mechanisms → Drug candidates → Clinical evidence
-    - Output: Comprehensive research report with drug candidates
-
- 3. **Rare Disease Treatment**
-    - Query: "What drugs might help with fibrodysplasia ossificans progressiva?"
-    - Agent finds: Similar conditions → Shared pathways → Potential treatments
-    - Output: Evidence-based treatment suggestions
-
- ---
-
- ## System Architecture
-
- ### High-Level Design (Phases 1-8)
-
- ```text
- User Query
-     ↓
- Gradio UI (Phase 4)
-     ↓
- Magentic Manager (Phase 5) ← LLM-powered coordinator
-     ├── SearchAgent (Phase 2+5) ←→ PubMed + Web + VectorDB (Phase 6)
-     ├── HypothesisAgent (Phase 7) ←→ Mechanistic Reasoning
-     ├── JudgeAgent (Phase 3+5) ←→ Evidence Assessment
-     └── ReportAgent (Phase 8) ←→ Final Synthesis
-     ↓
- Structured Research Report
- ```
-
- ### Key Components
-
- 1. **Magentic Manager (Orchestrator)**
-    - LLM-powered multi-agent coordinator
-    - Dynamic planning and agent selection
-    - Built-in stall detection and replanning
-    - Microsoft Agent Framework integration
-
- 2. **SearchAgent (Phase 2+5+6)**
-    - PubMed E-utilities search
-    - DuckDuckGo web search
-    - Semantic search via ChromaDB (Phase 6)
-    - Evidence deduplication
-
- 3. **HypothesisAgent (Phase 7)**
-    - Generates Drug → Target → Pathway → Effect hypotheses
-    - Guides targeted searches
-    - Scientific reasoning about mechanisms
-
- 4. **JudgeAgent (Phase 3+5)**
-    - LLM-based evidence assessment
-    - Mechanism score + Clinical score
-    - Recommends continue/synthesize
-    - Generates refined search queries
-
- 5. **ReportAgent (Phase 8)**
-    - Structured scientific reports
-    - Executive summary, methodology
-    - Hypotheses tested with evidence counts
-    - Proper citations and limitations
-
- 6. **Gradio UI (Phase 4)**
-    - Chat interface for questions
-    - Real-time progress via events
-    - Mode toggle (Simple/Magentic)
-    - Formatted markdown output
-
- ---
-
- ## Design Patterns
-
- ### 1. Search-and-Judge Loop (Primary Pattern)
-
- ```python
- def research(question: str) -> Report:
-     context = []
-     query = question
-     for iteration in range(max_iterations):
-         # SEARCH: Query relevant tools
-         results = search_tools(query, context)
-         context.extend(results)
-
-         # JUDGE: Evaluate quality
-         if judge.is_sufficient(question, context):
-             break
-
-         # REFINE: Adjust search strategy for the next pass
-         query = refine_query(question, context)
-
-     # SYNTHESIZE: Generate report
-     return synthesize_report(question, context)
- ```
-
- **Why This Pattern:**
- - Simple to implement and debug
- - Clear loop termination conditions
- - Iterative improvement of search quality
- - Balances depth vs speed
-
- ### 2. Multi-Tool Orchestration
-
- ```
- Question → Agent decides which tools to use
-                    ↓
-     ┌───┴────┬─────────┬──────────┐
-     ↓        ↓         ↓          ↓
-  PubMed  Web Search  Trials DB  Drug DB
-     ↓        ↓         ↓          ↓
-     └───┬────┴─────────┴──────────┘
-         ↓
-  Aggregate Results → Judge
- ```
-
- **Why This Pattern:**
- - Different sources provide different evidence types
- - Parallel tool execution (when possible)
- - Comprehensive coverage
-
- ### 3. LLM-as-Judge with Token Budget
-
- **Dual Stopping Conditions:**
- - **Smart Stop**: LLM judge says "we have sufficient evidence"
- - **Hard Stop**: Token budget exhausted OR max iterations reached
-
- **Why Both:**
- - Judge enables early exit when the answer is good
- - Budget prevents runaway costs
- - Iterations prevent infinite loops
-
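- In code, the break pattern is one check per condition (a sketch; the 50K cap matches the risk table in Appendix B):
-
- ```python
- def should_stop(sufficient: bool, tokens_used: int, iteration: int,
-                 token_budget: int = 50_000, max_iterations: int = 10) -> bool:
-     # Smart stop (judge approval) OR hard stops (budget, iterations), whichever fires first
-     return sufficient or tokens_used >= token_budget or iteration >= max_iterations
- ```
-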
- ### 4. Stateful Checkpointing
-
- ```
- .deepresearch/
- ├── state/
- │   └── query_123.json        # Current research state
- ├── checkpoints/
- │   └── query_123_iter3/      # Checkpoint at iteration 3
- └── workspace/
-     └── query_123/            # Downloaded papers, data
- ```
-
- **Why This Pattern:**
- - Resume interrupted research
- - Debugging and analysis
- - Cost savings (don't re-search)
-
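- Saving and resuming is a round-trip through Pydantic (a sketch, using the `ResearchState` model from the design doc's data models):
-
- ```python
- from pathlib import Path
-
- def save_checkpoint(state: ResearchState, root: Path = Path(".deepresearch/state")) -> None:
-     root.mkdir(parents=True, exist_ok=True)
-     (root / f"{state.query_id}.json").write_text(state.model_dump_json())
-
- def load_checkpoint(query_id: str, root: Path = Path(".deepresearch/state")) -> ResearchState:
-     return ResearchState.model_validate_json((root / f"{query_id}.json").read_text())
- ```
-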
- ---
-
- ## Component Breakdown
-
- ### Agent (Orchestrator)
- - **Responsibility**: Coordinate the research process
- - **Size**: ~100 lines
- - **Key Methods**:
-   - `research(question)` - Main entry point
-   - `plan_search_strategy()` - Decide what to search
-   - `execute_search()` - Run tool queries
-   - `evaluate_progress()` - Call judge
-   - `synthesize_findings()` - Generate report
-
- ### Tools
- - **Responsibility**: Interface with external data sources
- - **Size**: ~50 lines per tool
- - **Implementations**:
-   - `PubMedTool` - Search biomedical literature
-   - `WebSearchTool` - General medical information
-   - `ClinicalTrialsTool` - Trial data (optional)
-   - `DrugInfoTool` - FDA drug database (optional)
-
- ### Judge
- - **Responsibility**: Evaluate evidence quality
- - **Size**: ~50 lines
- - **Key Methods** (sketched below):
-   - `is_sufficient(question, evidence)` → bool
-   - `assess_quality(evidence)` → score
-   - `identify_gaps(question, evidence)` → missing_info
-
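- A minimal sketch of that Judge interface (method names from the list above; the signatures are assumptions):
-
- ```python
- from typing import List, Protocol
-
- class Judge(Protocol):
-     def is_sufficient(self, question: str, evidence: List[dict]) -> bool: ...
-     def assess_quality(self, evidence: List[dict]) -> float: ...
-     def identify_gaps(self, question: str, evidence: List[dict]) -> List[str]: ...
- ```
-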
- ### Gradio App
- - **Responsibility**: User interface
- - **Size**: ~50 lines
- - **Features**:
-   - Text input for questions
-   - Progress indicators
-   - Formatted output with citations
-   - Download research report
-
- ---
-
- ## Technical Stack
-
- ### Core Dependencies
- ```toml
- [dependencies]
- python = ">=3.10"
- pydantic = "^2.7"
- pydantic-ai = "^0.0.16"
- fastmcp = "^0.1.0"
- gradio = "^5.0"
- beautifulsoup4 = "^4.12"
- httpx = "^0.27"
- ```
-
- ### Optional Enhancements
- - `modal` - For GPU-accelerated local LLM
- - `fastmcp` - MCP server integration
- - `sentence-transformers` - Semantic search
- - `faiss-cpu` - Vector similarity
-
- ### Tool APIs & Rate Limits
-
- | API | Cost | Rate Limit | API Key? | Notes |
- |-----|------|------------|----------|-------|
- | **PubMed E-utilities** | Free | 3/sec (no key), 10/sec (with key) | Optional | Register at NCBI for higher limits |
- | **Brave Search API** | Free tier | 2000/month free | Required | Primary web search |
- | **DuckDuckGo** | Free | Unofficial, ~1/sec | No | Fallback web search |
- | **ClinicalTrials.gov** | Free | 100/min | No | Stretch goal |
- | **OpenFDA** | Free | 240/min (no key), 120K/day (with key) | Optional | Drug info |
-
- **Web Search Strategy (Priority Order):**
- 1. **Brave Search API** (free tier: 2000 queries/month) - Primary
- 2. **DuckDuckGo** (unofficial, no API key) - Fallback
- 3. **SerpAPI** ($50/month) - Only if free options fail
-
- **Why NOT SerpAPI first?**
- - Costs money (hackathon budget = $0)
- - Free alternatives work fine for the demo
- - Can upgrade later if needed
-
- ---
-
- ## Success Criteria
-
- ### Phase 1-5 (MVP) ✅ COMPLETE
- **Completed in ONE DAY:**
- - [x] User can ask a drug repurposing question
- - [x] Agent searches PubMed (async)
- - [x] Agent searches web (DuckDuckGo)
- - [x] LLM judge evaluates evidence quality
- - [x] System respects token budget and iterations
- - [x] Output includes drug candidates + citations
- - [x] Works end-to-end for demo query
- - [x] Gradio UI with streaming progress
- - [x] Magentic multi-agent orchestration
- - [x] 38 unit tests passing
- - [x] CI/CD pipeline green
-
- ### Hackathon Submission ✅ COMPLETE
- - [x] Gradio UI deployed on HuggingFace Spaces
- - [x] Example queries working and tested
- - [x] Architecture documentation
- - [x] README with setup instructions
-
- ### Phase 6-8 (Enhanced)
- **Specs ready for implementation:**
- - [ ] Embeddings & Semantic Search (Phase 6)
- - [ ] Hypothesis Agent (Phase 7)
- - [ ] Report Agent (Phase 8)
-
- ### What's EXPLICITLY Out of Scope
- **NOT building (to stay focused):**
- - ❌ User authentication
- - ❌ Database storage of queries
- - ❌ Multi-user support
- - ❌ Payment/billing
- - ❌ Production monitoring
- - ❌ Mobile UI
-
- ---
-
- ## Implementation Timeline
-
- ### Day 1 (Today): Architecture & Setup
- - [x] Define use case (drug repurposing) ✅
- - [x] Write architecture docs ✅
- - [ ] Create project structure
- - [ ] First PR: Structure + Docs
-
- ### Day 2: Core Agent Loop
- - [ ] Implement basic orchestrator
- - [ ] Add PubMed search tool
- - [ ] Simple judge (keyword-based)
- - [ ] Test with 1 query
-
- ### Day 3: Intelligence Layer
- - [ ] Upgrade to LLM judge
- - [ ] Add web search tool
- - [ ] Token budget tracking
- - [ ] Test with multiple queries
-
- ### Day 4: UI & Integration
- - [ ] Build Gradio interface
- - [ ] Wire up agent to UI
- - [ ] Add progress indicators
- - [ ] Format output nicely
-
- ### Day 5: Polish & Extend
- - [ ] Add more tools (clinical trials)
- - [ ] Improve judge prompts
- - [ ] Checkpoint system
- - [ ] Error handling
-
- ### Day 6: Deploy & Document
- - [ ] Deploy to HuggingFace Spaces
- - [ ] Record demo video
- - [ ] Write submission materials
- - [ ] Final testing
-
- ---
-
- ## Questions This Document Answers
-
- ### For The Maintainer
-
- **Q: "What should our design pattern be?"**
- A: Search-and-judge loop with multi-tool orchestration (detailed in the Design Patterns section)
-
- **Q: "Should we use LLM-as-judge or token budget?"**
- A: Both - judge for smart stopping, budget for cost control
-
- **Q: "What's the break pattern?"**
- A: Three conditions: judge approval, token limit, or max iterations (whichever comes first)
-
- **Q: "What components do we need?"**
- A: Agent orchestrator, tools (PubMed/web), judge, Gradio UI (see Component Breakdown)
-
- ### For The Team
-
- **Q: "What are we actually building?"**
- A: A medical drug repurposing research agent (see Core Use Case)
-
- **Q: "How complex should it be?"**
- A: Simple but complete - ~300 lines of core code (see Component sizes)
-
- **Q: "What's the timeline?"**
- A: 6 days, MVP by Day 3, polish Days 4-6 (see Implementation Timeline)
-
- **Q: "What datasets/APIs do we use?"**
- A: PubMed (free), web search, ClinicalTrials.gov (see Tool APIs)
-
- ---
-
- ## Next Steps
-
- 1. **Review this document** - Team feedback on architecture
- 2. **Finalize design** - Incorporate feedback
- 3. **Create project structure** - Scaffold repository
- 4. **Move to proper docs** - `docs/architecture/` folder
- 5. **Open first PR** - Structure + Documentation
- 6. **Start implementation** - Day 2 onward
-
- ---
-
- ## Notes & Decisions
-
- ### Why Drug Repurposing?
- - Clear, impressive use case
- - Real-world medical impact
- - Good data availability (PubMed, trials)
- - Easy to explain (Viagra example!)
- - Physician on team ✅
-
- ### Why Simple Architecture?
- - 6-day timeline
- - Need a working end-to-end system
- - Hackathon judges value "works" over "complex"
- - Can extend later if successful
-
- ### Why These Tools First?
- - PubMed: Best biomedical literature source
- - Web search: General medical knowledge
- - Clinical trials: Evidence of actual testing
- - Others: Nice-to-have, not critical for MVP
-
- ---
-
- ## Appendix A: Demo Queries (Pre-tested)
-
- These queries will be used for demo and testing. They're chosen because:
- 1. They have good PubMed coverage
- 2. They're medically interesting
- 3. They show the system's capabilities
-
- ### Primary Demo Query
- ```
- "What existing drugs might help treat long COVID fatigue?"
- ```
- **Expected candidates**: CoQ10, Low-dose Naltrexone, Modafinil
- **Expected sources**: 20+ PubMed papers, 2-3 clinical trials
-
- ### Secondary Demo Queries
- ```
- "Find existing drugs that might slow Alzheimer's progression"
- "What approved medications could help with fibromyalgia pain?"
- "Which diabetes drugs show promise for cancer treatment?"
- ```
-
- ### Why These Queries?
- - Represent real clinical needs
- - Have substantial literature
- - Show diverse drug classes
- - Physician on team can validate results
-
- ---
-
- ## Appendix B: Risk Assessment
-
- | Risk | Likelihood | Impact | Mitigation |
- |------|------------|--------|------------|
- | PubMed rate limiting | Medium | High | Implement caching, respect 3/sec |
- | Web search API fails | Low | Medium | DuckDuckGo fallback |
- | LLM costs exceed budget | Medium | Medium | Hard token cap at 50K |
- | Judge quality poor | Medium | High | Pre-test prompts, iterate |
- | HuggingFace deploy issues | Low | High | Test deployment Day 4 |
- | Demo crashes live | Medium | High | Pre-recorded backup video |
-
- ---
-
- **Document Status**: Official Architecture Spec
- **Review Score**: 98/100
- **Last Updated**: November 2025
docs/architecture/services.md CHANGED
@@ -137,6 +137,3 @@ if settings.has_openai_key:
 
 
 
-
-
-
docs/architecture/tools.md CHANGED
@@ -170,6 +170,3 @@ search_handler = SearchHandler(
 
 
 
-
-
-
docs/architecture/workflow-diagrams.md ADDED
@@ -0,0 +1,670 @@
1
+ # DeepCritical Workflow - Simplified Magentic Architecture
2
+
3
+ > **Architecture Pattern**: Microsoft Magentic Orchestration
4
+ > **Design Philosophy**: Simple, dynamic, manager-driven coordination
5
+ > **Key Innovation**: Intelligent manager replaces rigid sequential phases
6
+
7
+ ---
8
+
9
+ ## 1. High-Level Magentic Workflow
10
+
11
+ ```mermaid
12
+ flowchart TD
13
+ Start([User Query]) --> Manager[Magentic Manager<br/>Plan • Select • Assess • Adapt]
14
+
15
+ Manager -->|Plans| Task1[Task Decomposition]
16
+ Task1 --> Manager
17
+
18
+ Manager -->|Selects & Executes| HypAgent[Hypothesis Agent]
19
+ Manager -->|Selects & Executes| SearchAgent[Search Agent]
20
+ Manager -->|Selects & Executes| AnalysisAgent[Analysis Agent]
21
+ Manager -->|Selects & Executes| ReportAgent[Report Agent]
22
+
23
+ HypAgent -->|Results| Manager
24
+ SearchAgent -->|Results| Manager
25
+ AnalysisAgent -->|Results| Manager
26
+ ReportAgent -->|Results| Manager
27
+
28
+ Manager -->|Assesses Quality| Decision{Good Enough?}
29
+ Decision -->|No - Refine| Manager
30
+ Decision -->|No - Different Agent| Manager
31
+ Decision -->|No - Stalled| Replan[Reset Plan]
32
+ Replan --> Manager
33
+
34
+ Decision -->|Yes| Synthesis[Synthesize Final Result]
35
+ Synthesis --> Output([Research Report])
36
+
37
+ style Start fill:#e1f5e1
38
+ style Manager fill:#ffe6e6
39
+ style HypAgent fill:#fff4e6
40
+ style SearchAgent fill:#fff4e6
41
+ style AnalysisAgent fill:#fff4e6
42
+ style ReportAgent fill:#fff4e6
43
+ style Decision fill:#ffd6d6
44
+ style Synthesis fill:#d4edda
45
+ style Output fill:#e1f5e1
46
+ ```
47
+
48
+ ## 2. Magentic Manager: The 6-Phase Cycle
49
+
50
+ ```mermaid
51
+ flowchart LR
52
+ P1[1. Planning<br/>Analyze task<br/>Create strategy] --> P2[2. Agent Selection<br/>Pick best agent<br/>for subtask]
53
+ P2 --> P3[3. Execution<br/>Run selected<br/>agent with tools]
54
+ P3 --> P4[4. Assessment<br/>Evaluate quality<br/>Check progress]
55
+ P4 --> Decision{Quality OK?<br/>Progress made?}
56
+ Decision -->|Yes| P6[6. Synthesis<br/>Combine results<br/>Generate report]
57
+ Decision -->|No| P5[5. Iteration<br/>Adjust plan<br/>Try again]
58
+ P5 --> P2
59
+ P6 --> Done([Complete])
60
+
61
+ style P1 fill:#fff4e6
62
+ style P2 fill:#ffe6e6
63
+ style P3 fill:#e6f3ff
64
+ style P4 fill:#ffd6d6
65
+ style P5 fill:#fff3cd
66
+ style P6 fill:#d4edda
67
+ style Done fill:#e1f5e1
68
+ ```
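Stripped of the LLM, the cycle reduces to a guarded loop. A minimal sketch, where every method name (`plan`, `select_agent`, `assess`, `adjust`, `synthesize`) is an assumption for illustration rather than the framework API:

```python
# Illustrative only: the real manager is LLM-driven, but its control flow
# collapses to this guarded loop over the six phases.
async def run_cycle(manager, agents, task, max_rounds: int = 15):
    plan = await manager.plan(task)                      # 1. Planning
    results = []
    for _ in range(max_rounds):
        agent = manager.select_agent(agents, plan)       # 2. Agent Selection
        output = await agent.execute(plan.next_subtask)  # 3. Execution
        verdict = await manager.assess(output, plan)     # 4. Assessment
        if verdict.ok:
            results.append(output)
            if verdict.task_complete:
                break
        else:
            plan = await manager.adjust(plan, verdict)   # 5. Iteration
    return await manager.synthesize(results)             # 6. Synthesis (also covers partial results)
```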
69
+
70
+ ## 3. Simplified Agent Architecture
71
+
72
+ ```mermaid
73
+ graph TB
74
+ subgraph "Orchestration Layer"
75
+ Manager[Magentic Manager<br/>• Plans workflow<br/>• Selects agents<br/>• Assesses quality<br/>• Adapts strategy]
76
+ SharedContext[(Shared Context<br/>• Hypotheses<br/>• Search Results<br/>• Analysis<br/>• Progress)]
77
+ Manager <--> SharedContext
78
+ end
79
+
80
+ subgraph "Specialist Agents"
81
+ HypAgent[Hypothesis Agent<br/>• Domain understanding<br/>• Hypothesis generation<br/>• Testability refinement]
82
+ SearchAgent[Search Agent<br/>• Multi-source search<br/>• RAG retrieval<br/>• Result ranking]
83
+ AnalysisAgent[Analysis Agent<br/>• Evidence extraction<br/>• Statistical analysis<br/>• Code execution]
84
+ ReportAgent[Report Agent<br/>• Report assembly<br/>• Visualization<br/>• Citation formatting]
85
+ end
86
+
87
+ subgraph "MCP Tools"
88
+ WebSearch[Web Search<br/>PubMed • arXiv • bioRxiv]
89
+ CodeExec[Code Execution<br/>Sandboxed Python]
90
+ RAG[RAG Retrieval<br/>Vector DB • Embeddings]
91
+ Viz[Visualization<br/>Charts • Graphs]
92
+ end
93
+
94
+ Manager -->|Selects & Directs| HypAgent
95
+ Manager -->|Selects & Directs| SearchAgent
96
+ Manager -->|Selects & Directs| AnalysisAgent
97
+ Manager -->|Selects & Directs| ReportAgent
98
+
99
+ HypAgent --> SharedContext
100
+ SearchAgent --> SharedContext
101
+ AnalysisAgent --> SharedContext
102
+ ReportAgent --> SharedContext
103
+
104
+ SearchAgent --> WebSearch
105
+ SearchAgent --> RAG
106
+ AnalysisAgent --> CodeExec
107
+ ReportAgent --> CodeExec
108
+ ReportAgent --> Viz
109
+
110
+ style Manager fill:#ffe6e6
111
+ style SharedContext fill:#ffe6f0
112
+ style HypAgent fill:#fff4e6
113
+ style SearchAgent fill:#fff4e6
114
+ style AnalysisAgent fill:#fff4e6
115
+ style ReportAgent fill:#fff4e6
116
+ style WebSearch fill:#e6f3ff
117
+ style CodeExec fill:#e6f3ff
118
+ style RAG fill:#e6f3ff
119
+ style Viz fill:#e6f3ff
120
+ ```
121
+
122
+ ## 4. Dynamic Workflow Example
123
+
124
+ ```mermaid
125
+ sequenceDiagram
126
+ participant User
127
+ participant Manager
128
+ participant HypAgent
129
+ participant SearchAgent
130
+ participant AnalysisAgent
131
+ participant ReportAgent
132
+
133
+ User->>Manager: "Research protein folding in Alzheimer's"
134
+
135
+ Note over Manager: PLAN: Generate hypotheses → Search → Analyze → Report
136
+
137
+ Manager->>HypAgent: Generate 3 hypotheses
138
+ HypAgent-->>Manager: Returns 3 hypotheses
139
+ Note over Manager: ASSESS: Good quality, proceed
140
+
141
+ Manager->>SearchAgent: Search literature for hypothesis 1
142
+ SearchAgent-->>Manager: Returns 15 papers
143
+ Note over Manager: ASSESS: Good results, continue
144
+
145
+ Manager->>SearchAgent: Search for hypothesis 2
146
+ SearchAgent-->>Manager: Only 2 papers found
147
+ Note over Manager: ASSESS: Insufficient, refine search
148
+
149
+ Manager->>SearchAgent: Refined query for hypothesis 2
150
+ SearchAgent-->>Manager: Returns 12 papers
151
+ Note over Manager: ASSESS: Better, proceed
152
+
153
+ Manager->>AnalysisAgent: Analyze evidence for all hypotheses
154
+ AnalysisAgent-->>Manager: Returns analysis with code
155
+ Note over Manager: ASSESS: Complete, generate report
156
+
157
+ Manager->>ReportAgent: Create comprehensive report
158
+ ReportAgent-->>Manager: Returns formatted report
159
+ Note over Manager: SYNTHESIZE: Combine all results
160
+
161
+ Manager->>User: Final Research Report
162
+ ```
163
+
164
+ ## 5. Manager Decision Logic
165
+
166
+ ```mermaid
167
+ flowchart TD
168
+ Start([Manager Receives Task]) --> Plan[Create Initial Plan]
169
+
170
+ Plan --> Select[Select Agent for Next Subtask]
171
+ Select --> Execute[Execute Agent]
172
+ Execute --> Collect[Collect Results]
173
+
174
+ Collect --> Assess[Assess Quality & Progress]
175
+
176
+ Assess --> Q1{Quality Sufficient?}
177
+ Q1 -->|No| Q2{Same Agent Can Fix?}
178
+ Q2 -->|Yes| Feedback[Provide Specific Feedback]
179
+ Feedback --> Execute
180
+ Q2 -->|No| Different[Try Different Agent]
181
+ Different --> Select
182
+
183
+ Q1 -->|Yes| Q3{Task Complete?}
184
+ Q3 -->|No| Q4{Making Progress?}
185
+ Q4 -->|Yes| Select
186
+ Q4 -->|No - Stalled| Replan[Reset Plan & Approach]
187
+ Replan --> Plan
188
+
189
+ Q3 -->|Yes| Synth[Synthesize Final Result]
190
+ Synth --> Done([Return Report])
191
+
192
+ style Start fill:#e1f5e1
193
+ style Plan fill:#fff4e6
194
+ style Select fill:#ffe6e6
195
+ style Execute fill:#e6f3ff
196
+ style Assess fill:#ffd6d6
197
+ style Q1 fill:#ffe6e6
198
+ style Q2 fill:#ffe6e6
199
+ style Q3 fill:#ffe6e6
200
+ style Q4 fill:#ffe6e6
201
+ style Synth fill:#d4edda
202
+ style Done fill:#e1f5e1
203
+ ```
204
+
205
+ ## 6. Hypothesis Agent Workflow
206
+
207
+ ```mermaid
208
+ flowchart LR
209
+ Input[Research Query] --> Domain[Identify Domain<br/>& Key Concepts]
210
+ Domain --> Context[Retrieve Background<br/>Knowledge]
211
+ Context --> Generate[Generate 3-5<br/>Initial Hypotheses]
212
+ Generate --> Refine[Refine for<br/>Testability]
213
+ Refine --> Rank[Rank by<br/>Quality Score]
214
+ Rank --> Output[Return Top<br/>Hypotheses]
215
+
216
+ Output --> Struct[Hypothesis Structure:<br/>• Statement<br/>• Rationale<br/>• Testability Score<br/>• Data Requirements<br/>• Expected Outcomes]
217
+
218
+ style Input fill:#e1f5e1
219
+ style Output fill:#fff4e6
220
+ style Struct fill:#e6f3ff
221
+ ```
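The "Hypothesis Structure" box above maps naturally onto a small Pydantic model; a sketch with assumed field names:

```python
from pydantic import BaseModel, Field

class Hypothesis(BaseModel):
    """One testable hypothesis as produced by the Hypothesis Agent (field names assumed)."""
    statement: str
    rationale: str
    testability_score: float = Field(ge=0.0, le=1.0)  # used for ranking
    data_requirements: list[str]
    expected_outcomes: list[str]
```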
222
+
223
+ ## 7. Search Agent Workflow
224
+
225
+ ```mermaid
226
+ flowchart TD
227
+ Input[Hypotheses] --> Strategy[Formulate Search<br/>Strategy per Hypothesis]
228
+
229
+ Strategy --> Multi[Multi-Source Search]
230
+
231
+ Multi --> PubMed[PubMed Search<br/>via MCP]
232
+ Multi --> ArXiv[arXiv Search<br/>via MCP]
233
+ Multi --> BioRxiv[bioRxiv Search<br/>via MCP]
234
+
235
+ PubMed --> Aggregate[Aggregate Results]
236
+ ArXiv --> Aggregate
237
+ BioRxiv --> Aggregate
238
+
239
+ Aggregate --> Filter[Filter & Rank<br/>by Relevance]
240
+ Filter --> Dedup[Deduplicate<br/>Cross-Reference]
241
+ Dedup --> Embed[Embed Documents<br/>via MCP]
242
+ Embed --> Vector[(Vector DB)]
243
+ Vector --> RAGRetrieval[RAG Retrieval<br/>Top-K per Hypothesis]
244
+ RAGRetrieval --> Output[Return Contextualized<br/>Search Results]
245
+
246
+ style Input fill:#fff4e6
247
+ style Multi fill:#ffe6e6
248
+ style Vector fill:#ffe6f0
249
+ style Output fill:#e6f3ff
250
+ ```
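Deduplication across sources can key on DOI when present and fall back to a normalized title; a sketch, assuming each result dict carries an optional `doi` and a required `title`:

```python
def deduplicate(results: list[dict]) -> list[dict]:
    """Merge multi-source search results, keeping the first occurrence of each work."""
    seen: set[str] = set()
    unique: list[dict] = []
    for r in results:
        key = r.get("doi") or r["title"].casefold().strip()
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique
```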
251
+
252
+ ## 8. Analysis Agent Workflow
253
+
254
+ ```mermaid
255
+ flowchart TD
256
+ Input1[Hypotheses] --> Extract
257
+ Input2[Search Results] --> Extract[Extract Evidence<br/>per Hypothesis]
258
+
259
+ Extract --> Methods[Determine Analysis<br/>Methods Needed]
260
+
261
+ Methods --> Branch{Requires<br/>Computation?}
262
+ Branch -->|Yes| GenCode[Generate Python<br/>Analysis Code]
263
+ Branch -->|No| Qual[Qualitative<br/>Synthesis]
264
+
265
+ GenCode --> Execute[Execute Code<br/>via MCP Sandbox]
266
+ Execute --> Interpret1[Interpret<br/>Results]
267
+ Qual --> Interpret2[Interpret<br/>Findings]
268
+
269
+ Interpret1 --> Synthesize[Synthesize Evidence<br/>Across Sources]
270
+ Interpret2 --> Synthesize
271
+
272
+ Synthesize --> Verdict[Determine Verdict<br/>per Hypothesis]
273
+ Verdict --> Support[• Supported<br/>• Refuted<br/>• Inconclusive]
274
+ Support --> Gaps[Identify Knowledge<br/>Gaps & Limitations]
275
+ Gaps --> Output[Return Analysis<br/>Report]
276
+
277
+ style Input1 fill:#fff4e6
278
+ style Input2 fill:#e6f3ff
279
+ style Execute fill:#ffe6e6
280
+ style Output fill:#e6ffe6
281
+ ```
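The verdict step maps onto a closed set plus a tally rule; a toy sketch, with the evidence threshold invented for illustration:

```python
from typing import Literal

Verdict = Literal["supported", "refuted", "inconclusive"]

def decide(support: int, refute: int, min_evidence: int = 3) -> Verdict:
    """Toy decision rule: require a minimum amount of evidence, then majority wins."""
    if support + refute < min_evidence:
        return "inconclusive"
    return "supported" if support > refute else "refuted"
```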
282
+
283
+ ## 9. Report Agent Workflow
284
+
285
+ ```mermaid
286
+ flowchart TD
287
+ Input1[Query] --> Assemble
288
+ Input2[Hypotheses] --> Assemble
289
+ Input3[Search Results] --> Assemble
290
+ Input4[Analysis] --> Assemble[Assemble Report<br/>Sections]
291
+
292
+ Assemble --> Exec[Executive Summary]
293
+ Assemble --> Intro[Introduction]
294
+ Assemble --> Methods[Methods]
295
+ Assemble --> Results[Results per<br/>Hypothesis]
296
+ Assemble --> Discussion[Discussion]
297
+ Assemble --> Future[Future Directions]
298
+ Assemble --> Refs[References]
299
+
300
+ Results --> VizCheck{Needs<br/>Visualization?}
301
+ VizCheck -->|Yes| GenViz[Generate Viz Code]
302
+ GenViz --> ExecViz[Execute via MCP<br/>Create Charts]
303
+ ExecViz --> Combine
304
+ VizCheck -->|No| Combine[Combine All<br/>Sections]
305
+
306
+ Exec --> Combine
307
+ Intro --> Combine
308
+ Methods --> Combine
309
+ Discussion --> Combine
310
+ Future --> Combine
311
+ Refs --> Combine
312
+
313
+ Combine --> Format[Format Output]
314
+ Format --> MD[Markdown]
315
+ Format --> PDF[PDF]
316
+ Format --> JSON[JSON]
317
+
318
+ MD --> Output[Return Final<br/>Report]
319
+ PDF --> Output
320
+ JSON --> Output
321
+
322
+ style Input1 fill:#e1f5e1
323
+ style Input2 fill:#fff4e6
324
+ style Input3 fill:#e6f3ff
325
+ style Input4 fill:#e6ffe6
326
+ style Output fill:#d4edda
327
+ ```
328
+
329
+ ## 10. Data Flow & Event Streaming
330
+
331
+ ```mermaid
332
+ flowchart TD
333
+ User[👤 User] -->|Research Query| UI[Gradio UI]
334
+ UI -->|Submit| Manager[Magentic Manager]
335
+
336
+ Manager -->|Event: Planning| UI
337
+ Manager -->|Select Agent| HypAgent[Hypothesis Agent]
338
+ HypAgent -->|Event: Delta/Message| UI
339
+ HypAgent -->|Hypotheses| Context[(Shared Context)]
340
+
341
+ Context -->|Retrieved by| Manager
342
+ Manager -->|Select Agent| SearchAgent[Search Agent]
343
+ SearchAgent -->|MCP Request| WebSearch[Web Search Tool]
344
+ WebSearch -->|Results| SearchAgent
345
+ SearchAgent -->|Event: Delta/Message| UI
346
+ SearchAgent -->|Documents| Context
347
+ SearchAgent -->|Embeddings| VectorDB[(Vector DB)]
348
+
349
+ Context -->|Retrieved by| Manager
350
+ Manager -->|Select Agent| AnalysisAgent[Analysis Agent]
351
+ AnalysisAgent -->|MCP Request| CodeExec[Code Execution Tool]
352
+ CodeExec -->|Results| AnalysisAgent
353
+ AnalysisAgent -->|Event: Delta/Message| UI
354
+ AnalysisAgent -->|Analysis| Context
355
+
356
+ Context -->|Retrieved by| Manager
357
+ Manager -->|Select Agent| ReportAgent[Report Agent]
358
+ ReportAgent -->|MCP Request| CodeExec
359
+ ReportAgent -->|Event: Delta/Message| UI
360
+ ReportAgent -->|Report| Context
361
+
362
+ Manager -->|Event: Final Result| UI
363
+ UI -->|Display| User
364
+
365
+ style User fill:#e1f5e1
366
+ style UI fill:#e6f3ff
367
+ style Manager fill:#ffe6e6
368
+ style Context fill:#ffe6f0
369
+ style VectorDB fill:#ffe6f0
370
+ style WebSearch fill:#f0f0f0
371
+ style CodeExec fill:#f0f0f0
372
+ ```
373
+
374
+ ## 11. MCP Tool Architecture
375
+
376
+ ```mermaid
377
+ graph TB
378
+ subgraph "Agent Layer"
379
+ Manager[Magentic Manager]
380
+ HypAgent[Hypothesis Agent]
381
+ SearchAgent[Search Agent]
382
+ AnalysisAgent[Analysis Agent]
383
+ ReportAgent[Report Agent]
384
+ end
385
+
386
+ subgraph "MCP Protocol Layer"
387
+ Registry[MCP Tool Registry<br/>• Discovers tools<br/>• Routes requests<br/>• Manages connections]
388
+ end
389
+
390
+ subgraph "MCP Servers"
391
+ Server1[Web Search Server<br/>localhost:8001<br/>• PubMed<br/>• arXiv<br/>• bioRxiv]
392
+ Server2[Code Execution Server<br/>localhost:8002<br/>• Sandboxed Python<br/>• Package management]
393
+ Server3[RAG Server<br/>localhost:8003<br/>• Vector embeddings<br/>• Similarity search]
394
+ Server4[Visualization Server<br/>localhost:8004<br/>• Chart generation<br/>• Plot rendering]
395
+ end
396
+
397
+ subgraph "External Services"
398
+ PubMed[PubMed API]
399
+ ArXiv[arXiv API]
400
+ BioRxiv[bioRxiv API]
401
+ Modal[Modal Sandbox]
402
+ ChromaDB[(ChromaDB)]
403
+ end
404
+
405
+ SearchAgent -->|Request| Registry
406
+ AnalysisAgent -->|Request| Registry
407
+ ReportAgent -->|Request| Registry
408
+
409
+ Registry --> Server1
410
+ Registry --> Server2
411
+ Registry --> Server3
412
+ Registry --> Server4
413
+
414
+ Server1 --> PubMed
415
+ Server1 --> ArXiv
416
+ Server1 --> BioRxiv
417
+ Server2 --> Modal
418
+ Server3 --> ChromaDB
419
+
420
+ style Manager fill:#ffe6e6
421
+ style Registry fill:#fff4e6
422
+ style Server1 fill:#e6f3ff
423
+ style Server2 fill:#e6f3ff
424
+ style Server3 fill:#e6f3ff
425
+ style Server4 fill:#e6f3ff
426
+ ```
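At its simplest, the registry is a name-to-server mapping plus a dispatch helper. A sketch with the ports taken from the diagram; plain HTTP stands in for the MCP transport here, and the `/invoke` endpoint name is invented:

```python
import httpx

MCP_SERVERS = {
    "web_search": "http://localhost:8001",
    "code_execution": "http://localhost:8002",
    "rag": "http://localhost:8003",
    "visualization": "http://localhost:8004",
}

async def call_tool(tool: str, payload: dict) -> dict:
    """Route a tool request to its MCP server (HTTP stand-in for the MCP protocol)."""
    base = MCP_SERVERS[tool]
    async with httpx.AsyncClient() as client:
        resp = await client.post(f"{base}/invoke", json=payload)  # endpoint name illustrative
        resp.raise_for_status()
        return resp.json()
```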
427
+
428
+ ## 12. Progress Tracking & Stall Detection
429
+
430
+ ```mermaid
431
+ stateDiagram-v2
432
+ [*] --> Initialization: User Query
433
+
434
+ Initialization --> Planning: Manager starts
435
+
436
+ Planning --> AgentExecution: Select agent
437
+
438
+ AgentExecution --> Assessment: Collect results
439
+
440
+ Assessment --> QualityCheck: Evaluate output
441
+
442
+ QualityCheck --> AgentExecution: Poor quality<br/>(retry < max_rounds)
443
+ QualityCheck --> Planning: Poor quality<br/>(try different agent)
444
+ QualityCheck --> NextAgent: Good quality<br/>(task incomplete)
445
+ QualityCheck --> Synthesis: Good quality<br/>(task complete)
446
+
447
+ NextAgent --> AgentExecution: Select next agent
448
+
449
+ state StallDetection <<choice>>
450
+ Assessment --> StallDetection: Check progress
451
+ StallDetection --> Planning: No progress<br/>(stall count < max)
452
+ StallDetection --> ErrorRecovery: No progress<br/>(max stalls reached)
453
+
454
+ ErrorRecovery --> PartialReport: Generate partial results
455
+ PartialReport --> [*]
456
+
457
+ Synthesis --> FinalReport: Combine all outputs
458
+ FinalReport --> [*]
459
+
460
+ note right of QualityCheck
461
+ Manager assesses:
462
+ • Output completeness
463
+ • Quality metrics
464
+ • Progress made
465
+ end note
466
+
467
+ note right of StallDetection
468
+ Stall = no new progress
469
+ after agent execution
470
+ Triggers plan reset
471
+ end note
472
+ ```
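The safety limits reduce to two counters; a sketch of the guard, with progress detection itself left to the manager:

```python
def should_continue(round_count: int, stall_count: int,
                    max_round_count: int = 15, max_stall_count: int = 3) -> bool:
    """True while the workflow may keep iterating; False triggers partial-report recovery."""
    return round_count < max_round_count and stall_count < max_stall_count

# Inside the loop: a round that produces no new progress increments stall_count;
# any progress resets it to zero.
```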
473
+
474
+ ## 13. Gradio UI Integration
475
+
476
+ ```mermaid
477
+ graph TD
478
+ App[Gradio App<br/>DeepCritical Research Agent]
479
+
480
+ App --> Input[Input Section]
481
+ App --> Status[Status Section]
482
+ App --> Output[Output Section]
483
+
484
+ Input --> Query[Research Question<br/>Text Area]
485
+ Input --> Controls[Controls]
486
+ Controls --> MaxHyp[Max Hypotheses: 1-10]
487
+ Controls --> MaxRounds[Max Rounds: 5-20]
488
+ Controls --> Submit[Start Research Button]
489
+
490
+ Status --> Log[Real-time Event Log<br/>• Manager planning<br/>• Agent selection<br/>• Execution updates<br/>• Quality assessment]
491
+ Status --> Progress[Progress Tracker<br/>• Current agent<br/>• Round count<br/>• Stall count]
492
+
493
+ Output --> Tabs[Tabbed Results]
494
+ Tabs --> Tab1[Hypotheses Tab<br/>Generated hypotheses with scores]
495
+ Tabs --> Tab2[Search Results Tab<br/>Papers & sources found]
496
+ Tabs --> Tab3[Analysis Tab<br/>Evidence & verdicts]
497
+ Tabs --> Tab4[Report Tab<br/>Final research report]
498
+ Tab4 --> Download[Download Report<br/>MD / PDF / JSON]
499
+
500
+ Submit -.->|Triggers| Workflow[Magentic Workflow]
501
+ Workflow -.->|MagenticOrchestratorMessageEvent| Log
502
+ Workflow -.->|MagenticAgentDeltaEvent| Log
503
+ Workflow -.->|MagenticAgentMessageEvent| Log
504
+ Workflow -.->|MagenticFinalResultEvent| Tab4
505
+
506
+ style App fill:#e1f5e1
507
+ style Input fill:#fff4e6
508
+ style Status fill:#e6f3ff
509
+ style Output fill:#e6ffe6
510
+ style Workflow fill:#ffe6e6
511
+ ```
512
+
513
+ ## 14. Complete System Context
514
+
515
+ ```mermaid
516
+ graph LR
517
+ User[👤 Researcher<br/>Asks research questions] -->|Submits query| DC[DeepCritical<br/>Magentic Workflow]
518
+
519
+ DC -->|Literature search| PubMed[PubMed API<br/>Medical papers]
520
+ DC -->|Preprint search| ArXiv[arXiv API<br/>Scientific preprints]
521
+ DC -->|Biology search| BioRxiv[bioRxiv API<br/>Biology preprints]
522
+ DC -->|Agent reasoning| Claude[Claude API<br/>Sonnet 4 / Opus]
523
+ DC -->|Code execution| Modal[Modal Sandbox<br/>Safe Python env]
524
+ DC -->|Vector storage| Chroma[ChromaDB<br/>Embeddings & RAG]
525
+
526
+ DC -->|Deployed on| HF[HuggingFace Spaces<br/>Gradio 6.0]
527
+
528
+ PubMed -->|Results| DC
529
+ ArXiv -->|Results| DC
530
+ BioRxiv -->|Results| DC
531
+ Claude -->|Responses| DC
532
+ Modal -->|Output| DC
533
+ Chroma -->|Context| DC
534
+
535
+ DC -->|Research report| User
536
+
537
+ style User fill:#e1f5e1
538
+ style DC fill:#ffe6e6
539
+ style PubMed fill:#e6f3ff
540
+ style ArXiv fill:#e6f3ff
541
+ style BioRxiv fill:#e6f3ff
542
+ style Claude fill:#ffd6d6
543
+ style Modal fill:#f0f0f0
544
+ style Chroma fill:#ffe6f0
545
+ style HF fill:#d4edda
546
+ ```
547
+
548
+ ## 15. Workflow Timeline (Simplified)
549
+
550
+ ```mermaid
551
+ gantt
552
+ title DeepCritical Magentic Workflow - Typical Execution
553
+ dateFormat mm:ss
554
+ axisFormat %M:%S
555
+
556
+ section Manager Planning
557
+ Initial planning :p1, 00:00, 10s
558
+
559
+ section Hypothesis Agent
560
+ Generate hypotheses :h1, after p1, 30s
561
+ Manager assessment :h2, after h1, 5s
562
+
563
+ section Search Agent
564
+ Search hypothesis 1 :s1, after h2, 20s
565
+ Search hypothesis 2 :s2, after s1, 20s
566
+ Search hypothesis 3 :s3, after s2, 20s
567
+ RAG processing :s4, after s3, 15s
568
+ Manager assessment :s5, after s4, 5s
569
+
570
+ section Analysis Agent
571
+ Evidence extraction :a1, after s5, 15s
572
+ Code generation :a2, after a1, 20s
573
+ Code execution :a3, after a2, 25s
574
+ Synthesis :a4, after a3, 20s
575
+ Manager assessment :a5, after a4, 5s
576
+
577
+ section Report Agent
578
+ Report assembly :r1, after a5, 30s
579
+ Visualization :r2, after r1, 15s
580
+ Formatting :r3, after r2, 10s
581
+
582
+ section Manager Synthesis
583
+ Final synthesis :f1, after r3, 10s
584
+ ```
585
+
586
+ ---
587
+
588
+ ## Key Differences from Original Design
589
+
590
+ | Aspect | Original (Judge-in-Loop) | New (Magentic) |
591
+ |--------|-------------------------|----------------|
592
+ | **Control Flow** | Fixed sequential phases | Dynamic agent selection |
593
+ | **Quality Control** | Separate Judge Agent | Manager assessment built-in |
594
+ | **Retry Logic** | Phase-level with feedback | Agent-level with adaptation |
595
+ | **Flexibility** | Rigid 4-phase pipeline | Adaptive workflow |
596
+ | **Complexity** | 5 agents (including Judge) | 4 agents (no Judge) |
597
+ | **Progress Tracking** | Manual state management | Built-in round/stall detection |
598
+ | **Agent Coordination** | Sequential handoff | Manager-driven dynamic selection |
599
+ | **Error Recovery** | Retry same phase | Try different agent or replan |
600
+
601
+ ---
602
+
603
+ ## Simplified Design Principles
604
+
605
+ 1. **Manager is Intelligent**: LLM-powered manager handles planning, selection, and quality assessment
606
+ 2. **No Separate Judge**: Manager's assessment phase replaces dedicated Judge Agent
607
+ 3. **Dynamic Workflow**: Agents can be called multiple times in any order based on need
608
+ 4. **Built-in Safety**: max_round_count (15) and max_stall_count (3) prevent infinite loops
609
+ 5. **Event-Driven UI**: Real-time streaming updates to Gradio interface
610
+ 6. **MCP-Powered Tools**: All external capabilities via Model Context Protocol
611
+ 7. **Shared Context**: Centralized state accessible to all agents
612
+ 8. **Progress Awareness**: Manager tracks what's been done and what's needed
613
+
614
+ ---
615
+
616
+ ## Legend
617
+
618
+ - 🔴 **Red/Pink**: Manager, orchestration, decision-making
619
+ - 🟡 **Yellow/Orange**: Specialist agents, processing
620
+ - 🔵 **Blue**: Data, tools, MCP services
621
+ - 🟣 **Purple/Pink**: Storage, databases, state
622
+ - 🟢 **Green**: User interactions, final outputs
623
+ - ⚪ **Gray**: External services, APIs
624
+
625
+ ---
626
+
627
+ ## Implementation Highlights
628
+
629
+ **Simple 4-Agent Setup:**
630
+ ```python
631
+ workflow = (
632
+ MagenticBuilder()
633
+ .participants(
634
+ hypothesis=HypothesisAgent(tools=[background_tool]),
635
+ search=SearchAgent(tools=[web_search, rag_tool]),
636
+ analysis=AnalysisAgent(tools=[code_execution]),
637
+ report=ReportAgent(tools=[code_execution, visualization])
638
+ )
639
+ .with_standard_manager(
640
+ chat_client=AnthropicClient(model="claude-sonnet-4"),
641
+ max_round_count=15, # Prevent infinite loops
642
+ max_stall_count=3 # Detect stuck workflows
643
+ )
644
+ .build()
645
+ )
646
+ ```
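Running the built workflow then amounts to draining its event stream into the UI. A sketch only: the `run_stream` method and the event attributes are assumptions, while the event class names are those listed in section 13:

```python
async def research(query: str) -> str:
    final_report = ""
    async for event in workflow.run_stream(query):
        if isinstance(event, MagenticAgentDeltaEvent):
            print(event.text, end="")        # live token deltas for the UI log (attribute assumed)
        elif isinstance(event, MagenticFinalResultEvent):
            final_report = str(event.data)   # final payload attribute assumed
    return final_report
```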
647
+
648
+ **Manager handles quality assessment in its instructions:**
649
+ - Checks hypothesis quality (testable, novel, clear)
650
+ - Validates search results (relevant, authoritative, recent)
651
+ - Assesses analysis soundness (methodology, evidence, conclusions)
652
+ - Ensures report completeness (all sections, proper citations)
653
+
654
+ No separate Judge Agent needed - manager does it all!
655
+
656
+ ---
657
+
658
+ **Document Version**: 2.0 (Magentic Simplified)
659
+ **Last Updated**: 2025-11-24
660
+ **Architecture**: Microsoft Magentic Orchestration Pattern
661
+ **Agents**: 4 (Hypothesis, Search, Analysis, Report) + 1 Manager
662
+ **License**: MIT
663
+
664
+ ## See Also
665
+
666
+ - [Orchestrators](orchestrators.md) - Overview of all orchestrator patterns
667
+ - [Graph Orchestration](graph-orchestration.md) - Graph-based execution overview
668
+ - [Graph Orchestration (Detailed)](graph_orchestration.md) - Detailed graph architecture
669
+ - [Workflows](workflows.md) - Workflow patterns summary
670
+ - [API Reference - Orchestrators](../api/orchestrators.md) - API documentation
docs/{workflow-diagrams.md → architecture/workflows.md} RENAMED
File without changes
docs/brainstorming/00_ROADMAP_SUMMARY.md DELETED
@@ -1,194 +0,0 @@
1
- # DeepCritical Data Sources: Roadmap Summary
2
-
3
- **Created**: 2024-11-27
4
- **Purpose**: Future maintainability and hackathon continuation
5
-
6
- ---
7
-
8
- ## Current State
9
-
10
- ### Working Tools
11
-
12
- | Tool | Status | Data Quality |
13
- |------|--------|--------------|
14
- | PubMed | ✅ Works | Good (abstracts only) |
15
- | ClinicalTrials.gov | ✅ Works | Good (filtered for interventional) |
16
- | Europe PMC | ✅ Works | Good (includes preprints) |
17
-
18
- ### Removed Tools
19
-
20
- | Tool | Status | Reason |
21
- |------|--------|--------|
22
- | bioRxiv | ❌ Removed | No search API - only date/DOI lookup |
23
-
24
- ---
25
-
26
- ## Priority Improvements
27
-
28
- ### P0: Critical (Do First)
29
-
30
- 1. **Add Rate Limiting to PubMed**
31
- - NCBI will block us without it
32
- - Use `limits` library (see reference repo)
33
- - 3/sec without key, 10/sec with key
34
-
35
- ### P1: High Value, Medium Effort
36
-
37
- 2. **Add OpenAlex as 4th Source**
38
- - Citation network (huge for drug repurposing)
39
- - Concept tagging (semantic discovery)
40
- - Already implemented in reference repo
41
- - Free, no API key
42
-
43
- 3. **PubMed Full-Text via BioC**
44
- - Get full paper text for PMC papers
45
- - Already in reference repo
46
-
47
- ### P2: Nice to Have
48
-
49
- 4. **ClinicalTrials.gov Results**
50
- - Get efficacy data from completed trials
51
- - Requires more complex API calls
52
-
53
- 5. **Europe PMC Annotations**
54
- - Text-mined entities (genes, drugs, diseases)
55
- - Automatic entity extraction
56
-
57
- ---
58
-
59
- ## Effort Estimates
60
-
61
- | Improvement | Effort | Impact | Priority |
62
- |-------------|--------|--------|----------|
63
- | PubMed rate limiting | 1 hour | Stability | P0 |
64
- | OpenAlex basic search | 2 hours | High | P1 |
65
- | OpenAlex citations | 2 hours | Very High | P1 |
66
- | PubMed full-text | 3 hours | Medium | P1 |
67
- | CT.gov results | 4 hours | Medium | P2 |
68
- | Europe PMC annotations | 3 hours | Medium | P2 |
69
-
70
- ---
71
-
72
- ## Architecture Decision
73
-
74
- ### Option A: Keep Current + Add OpenAlex
75
-
76
- ```
77
- User Query
78
-
79
- ┌───────────────────┼───────────────────┐
80
- ↓ ↓ ↓
81
- PubMed ClinicalTrials Europe PMC
82
- (abstracts) (trials only) (preprints)
83
- ↓ ↓ ↓
84
- └───────────────────┼───────────────────┘
85
-
86
- OpenAlex ← NEW
87
- (citations, concepts)
88
-
89
- Orchestrator
90
-
91
- Report
92
- ```
93
-
94
- **Pros**: Low risk, additive
95
- **Cons**: More complexity, some overlap
96
-
97
- ### Option B: OpenAlex as Primary
98
-
99
- ```
100
- User Query
101
-
102
- ┌───────────────────┼───────────────────┐
103
- ↓ ↓ ↓
104
- OpenAlex ClinicalTrials Europe PMC
105
- (primary (trials only) (full-text
106
- search) fallback)
107
- ↓ ↓ ↓
108
- └───────────────────┼───────────────────┘
109
-
110
- Orchestrator
111
-
112
- Report
113
- ```
114
-
115
- **Pros**: Simpler, citation network built-in
116
- **Cons**: Lose some PubMed-specific features
117
-
118
- ### Recommendation: Option A
119
-
120
- Keep current architecture working, add OpenAlex incrementally.
121
-
122
- ---
123
-
124
- ## Quick Wins (Can Do Today)
125
-
126
- 1. **Add `limits` to `pyproject.toml`**
127
- ```toml
128
- dependencies = [
129
- "limits>=3.0",
130
- ]
131
- ```
132
-
133
- 2. **Copy OpenAlex tool from reference repo**
134
- - File: `reference_repos/DeepCritical/DeepResearch/src/tools/openalex_tools.py`
135
- - Adapt to our `SearchTool` base class
136
-
137
- 3. **Enable NCBI API Key**
138
- - Add to `.env`: `NCBI_API_KEY=your_key`
139
- - ~3x rate limit improvement (3/sec → 10/sec)
140
-
141
- ---
142
-
143
- ## External Resources Worth Exploring
144
-
145
- ### Python Libraries
146
-
147
- | Library | For | Notes |
148
- |---------|-----|-------|
149
- | `limits` | Rate limiting | Used by reference repo |
150
- | `pyalex` | OpenAlex wrapper | [GitHub](https://github.com/J535D165/pyalex) |
151
- | `metapub` | PubMed | Full-featured |
152
- | `sentence-transformers` | Semantic search | For embeddings |
153
-
154
- ### APIs Not Yet Used
155
-
156
- | API | Provides | Effort |
157
- |-----|----------|--------|
158
- | RxNorm | Drug name normalization | Low |
159
- | DrugBank | Drug targets/mechanisms | Medium (license) |
160
- | UniProt | Protein data | Medium |
161
- | ChEMBL | Bioactivity data | Medium |
162
-
163
- ### RAG Tools (Future)
164
-
165
- | Tool | Purpose |
166
- |------|---------|
167
- | [PaperQA](https://github.com/Future-House/paper-qa) | RAG for scientific papers |
168
- | [txtai](https://github.com/neuml/txtai) | Embeddings + search |
169
- | [PubMedBERT](https://huggingface.co/NeuML/pubmedbert-base-embeddings) | Biomedical embeddings |
170
-
171
- ---
172
-
173
- ## Files in This Directory
174
-
175
- | File | Contents |
176
- |------|----------|
177
- | `00_ROADMAP_SUMMARY.md` | This file |
178
- | `01_PUBMED_IMPROVEMENTS.md` | PubMed enhancement details |
179
- | `02_CLINICALTRIALS_IMPROVEMENTS.md` | ClinicalTrials.gov details |
180
- | `03_EUROPEPMC_IMPROVEMENTS.md` | Europe PMC details |
181
- | `04_OPENALEX_INTEGRATION.md` | OpenAlex integration plan |
182
-
183
- ---
184
-
185
- ## For Future Maintainers
186
-
187
- If you're picking this up after the hackathon:
188
-
189
- 1. **Start with OpenAlex** - biggest bang for buck
190
- 2. **Add rate limiting** - prevents API blocks
191
- 3. **Don't bother with bioRxiv** - use Europe PMC instead
192
- 4. **Reference repo is gold** - `reference_repos/DeepCritical/` has working implementations
193
-
194
- Good luck! 🚀
docs/brainstorming/01_PUBMED_IMPROVEMENTS.md DELETED
@@ -1,125 +0,0 @@
1
- # PubMed Tool: Current State & Future Improvements
2
-
3
- **Status**: Currently Implemented
4
- **Priority**: High (Core Data Source)
5
-
6
- ---
7
-
8
- ## Current Implementation
9
-
10
- ### What We Have (`src/tools/pubmed.py`)
11
-
12
- - Basic E-utilities search via `esearch.fcgi` and `efetch.fcgi`
13
- - Query preprocessing (strips question words, expands synonyms)
14
- - Returns: title, abstract, authors, journal, PMID
15
- - Rate limiting: None implemented (relying on NCBI defaults)
16
-
17
- ### Current Limitations
18
-
19
- 1. **No Full-Text Access**: Only retrieves abstracts, not full paper text
20
- 2. **No Rate Limiting**: Risk of being blocked by NCBI
21
- 3. **No BioC Format**: Missing structured full-text extraction
22
- 4. **No Figure Retrieval**: No supplementary materials access
23
- 5. **No PMC Integration**: Missing open-access full-text via PMC
24
-
25
- ---
26
-
27
- ## Reference Implementation (DeepCritical Reference Repo)
28
-
29
- The reference repo at `reference_repos/DeepCritical/DeepResearch/src/tools/bioinformatics_tools.py` has a more sophisticated implementation:
30
-
31
- ### Features We're Missing
32
-
33
- ```python
34
- # Rate limiting (lines 47-50)
35
- from limits import parse
36
- from limits.storage import MemoryStorage
37
- from limits.strategies import MovingWindowRateLimiter
38
-
39
- storage = MemoryStorage()
40
- limiter = MovingWindowRateLimiter(storage)
41
- rate_limit = parse("3/second") # NCBI allows 3/sec without API key, 10/sec with
42
-
43
- # Full-text via BioC format (lines 108-120)
44
- def _get_fulltext(pmid: int) -> dict[str, Any] | None:
45
- pmid_url = f"https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/pmcoa.cgi/BioC_json/{pmid}/unicode"
46
- # Returns structured JSON with full text for open-access papers
47
-
48
- # Figure retrieval via Europe PMC (lines 123-149)
49
- def _get_figures(pmcid: str) -> dict[str, str]:
50
- suppl_url = f"https://www.ebi.ac.uk/europepmc/webservices/rest/{pmcid}/supplementaryFiles"
51
- # Returns base64-encoded images from supplementary materials
52
- ```
53
-
54
- ---
55
-
56
- ## Recommended Improvements
57
-
58
- ### Phase 1: Rate Limiting (Critical)
59
-
60
- ```python
61
- # Add to src/tools/pubmed.py
62
- from limits import parse
63
- from limits.storage import MemoryStorage
64
- from limits.strategies import MovingWindowRateLimiter
65
-
66
- storage = MemoryStorage()
67
- limiter = MovingWindowRateLimiter(storage)
68
-
69
- # With NCBI_API_KEY: 10/sec, without: 3/sec
70
- def get_rate_limit():
71
- if settings.ncbi_api_key:
72
- return parse("10/second")
73
- return parse("3/second")
74
- ```
75
-
76
- **Dependencies**: `pip install limits`
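A minimal usage sketch for the limiter above: `hit()` is the moving-window check from `limits` and returns False when the window is full, so a small polling wrapper suffices (the polling interval is a judgment call):

```python
import asyncio

async def _wait_for_slot() -> None:
    """Block until the NCBI rate limit allows another request."""
    while not limiter.hit(get_rate_limit(), "pubmed"):
        await asyncio.sleep(0.1)  # back off until the moving window frees a slot

# Before every esearch/efetch call:
#     await _wait_for_slot()
#     resp = await client.get(ESEARCH_URL, params=params)
```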
77
-
78
- ### Phase 2: Full-Text Retrieval
79
-
80
- ```python
81
- async def get_fulltext(pmid: str) -> str | None:
82
- """Get full text for open-access papers via BioC API."""
83
- url = f"https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/pmcoa.cgi/BioC_json/{pmid}/unicode"
84
- # Only works for PMC papers (open access)
85
- ```
86
-
87
- ### Phase 3: PMC ID Resolution
88
-
89
- ```python
90
- async def get_pmc_id(pmid: str) -> str | None:
91
- """Convert PMID to PMCID for full-text access."""
92
- url = f"https://www.ncbi.nlm.nih.gov/pmc/utils/idconv/v1.0/?ids={pmid}&format=json"
93
- ```
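A hedged completion of the stub: the converter returns a `records` array, and `pmcid` is present only for articles deposited in PMC (JSON shape per the idconv docs, worth re-verifying):

```python
import httpx

async def get_pmc_id(pmid: str) -> str | None:
    """Convert PMID to PMCID for full-text access; None if the paper is not in PMC."""
    url = "https://www.ncbi.nlm.nih.gov/pmc/utils/idconv/v1.0/"
    async with httpx.AsyncClient() as client:
        resp = await client.get(url, params={"ids": pmid, "format": "json"})
        resp.raise_for_status()
    records = resp.json().get("records", [])
    return records[0].get("pmcid") if records else None
```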
94
-
95
- ---
96
-
97
- ## Python Libraries to Consider
98
-
99
- | Library | Purpose | Notes |
100
- |---------|---------|-------|
101
- | [Biopython](https://biopython.org/) | `Bio.Entrez` module | Official, well-maintained |
102
- | [PyMed](https://pypi.org/project/pymed/) | PubMed wrapper | Simpler API, less control |
103
- | [metapub](https://pypi.org/project/metapub/) | Full-featured | Tested on 1/3 of PubMed |
104
- | [limits](https://pypi.org/project/limits/) | Rate limiting | Used by reference repo |
105
-
106
- ---
107
-
108
- ## API Endpoints Reference
109
-
110
- | Endpoint | Purpose | Rate Limit |
111
- |----------|---------|------------|
112
- | `esearch.fcgi` | Search for PMIDs | 3/sec (10 with key) |
113
- | `efetch.fcgi` | Fetch metadata | 3/sec (10 with key) |
114
- | `esummary.fcgi` | Quick metadata | 3/sec (10 with key) |
115
- | `pmcoa.cgi/BioC_json` | Full text (PMC only) | Unknown |
116
- | `idconv/v1.0` | PMID ↔ PMCID | Unknown |
117
-
118
- ---
119
-
120
- ## Sources
121
-
122
- - [PubMed E-utilities Documentation](https://www.ncbi.nlm.nih.gov/books/NBK25501/)
123
- - [NCBI BioC API](https://www.ncbi.nlm.nih.gov/research/bionlp/APIs/)
124
- - [Searching PubMed with Python](https://marcobonzanini.com/2015/01/12/searching-pubmed-with-python/)
125
- - [PyMed on PyPI](https://pypi.org/project/pymed/)
docs/brainstorming/02_CLINICALTRIALS_IMPROVEMENTS.md DELETED
@@ -1,193 +0,0 @@
1
- # ClinicalTrials.gov Tool: Current State & Future Improvements
2
-
3
- **Status**: Currently Implemented
4
- **Priority**: High (Core Data Source for Drug Repurposing)
5
-
6
- ---
7
-
8
- ## Current Implementation
9
-
10
- ### What We Have (`src/tools/clinicaltrials.py`)
11
-
12
- - V2 API search via `clinicaltrials.gov/api/v2/studies`
13
- - Filters: `INTERVENTIONAL` study type, `RECRUITING` status
14
- - Returns: NCT ID, title, conditions, interventions, phase, status
15
- - Query preprocessing via shared `query_utils.py`
16
-
17
- ### Current Strengths
18
-
19
- 1. **Good Filtering**: Already filtering for interventional + recruiting
20
- 2. **V2 API**: Using the modern API (v1 deprecated)
21
- 3. **Phase Info**: Extracting trial phases for drug development context
22
-
23
- ### Current Limitations
24
-
25
- 1. **No Outcome Data**: Missing primary/secondary outcomes
26
- 2. **No Eligibility Criteria**: Missing inclusion/exclusion details
27
- 3. **No Sponsor Info**: Missing who's running the trial
28
- 4. **No Result Data**: For completed trials, no efficacy data
29
- 5. **Limited Drug Mapping**: No integration with drug databases
30
-
31
- ---
32
-
33
- ## API Capabilities We're Not Using
34
-
35
- ### Fields We Could Request
36
-
37
- ```python
38
- # Current fields
39
- fields = ["NCTId", "BriefTitle", "Condition", "InterventionName", "Phase", "OverallStatus"]
40
-
41
- # Additional valuable fields
42
- additional_fields = [
43
- "PrimaryOutcomeMeasure", # What are they measuring?
44
- "SecondaryOutcomeMeasure", # Secondary endpoints
45
- "EligibilityCriteria", # Who can participate?
46
- "LeadSponsorName", # Who's funding?
47
- "ResultsFirstPostDate", # Has results?
48
- "StudyFirstPostDate", # When started?
49
- "CompletionDate", # When finished?
50
- "EnrollmentCount", # Sample size
51
- "InterventionDescription", # Drug details
52
- "ArmGroupLabel", # Treatment arms
53
- "InterventionOtherName", # Drug aliases
54
- ]
55
- ```
56
-
57
- ### Filter Enhancements
58
-
59
- ```python
60
- # Current
61
- aggFilters = "studyType:INTERVENTIONAL,status:RECRUITING"
62
-
63
- # Could add
64
- "status:RECRUITING,ACTIVE_NOT_RECRUITING,COMPLETED" # Include completed for results
65
- "phase:PHASE2,PHASE3" # Only later-stage trials
66
- "resultsFirstPostDateRange:2020-01-01_" # Trials with posted results
67
- ```
68
-
69
- ---
70
-
71
- ## Recommended Improvements
72
-
73
- ### Phase 1: Richer Metadata
74
-
75
- ```python
76
- EXTENDED_FIELDS = [
77
- "NCTId",
78
- "BriefTitle",
79
- "OfficialTitle",
80
- "Condition",
81
- "InterventionName",
82
- "InterventionDescription",
83
- "InterventionOtherName", # Drug synonyms!
84
- "Phase",
85
- "OverallStatus",
86
- "PrimaryOutcomeMeasure",
87
- "EnrollmentCount",
88
- "LeadSponsorName",
89
- "StudyFirstPostDate",
90
- ]
91
- ```
92
-
93
- ### Phase 2: Results Retrieval
94
-
95
- For completed trials, we can get actual efficacy data:
96
-
97
- ```python
98
- async def get_trial_results(nct_id: str) -> dict | None:
99
- """Fetch results for completed trials."""
100
- url = f"https://clinicaltrials.gov/api/v2/studies/{nct_id}"
101
- params = {
102
- "fields": "ResultsSection",
103
- }
104
- # Returns outcome measures and statistics
105
- ```
106
-
107
- ### Phase 3: Drug Name Normalization
108
-
109
- Map intervention names to standard identifiers:
110
-
111
- ```python
112
- # Problem: "Metformin", "Metformin HCl", "Glucophage" are the same drug
113
- # Solution: Use RxNorm or DrugBank for normalization
114
-
115
- async def normalize_drug_name(intervention: str) -> str:
116
- """Normalize drug name via RxNorm API."""
117
- url = f"https://rxnav.nlm.nih.gov/REST/rxcui.json?name={intervention}"
118
- # Returns standardized RxCUI
119
- ```
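A hedged completion of the stub: the endpoint responds with an `idGroup` holding zero or more `rxnormId` values (key names per the RxNorm docs, worth double-checking):

```python
import httpx

async def normalize_drug_name(intervention: str) -> str | None:
    """Normalize a drug name to an RxCUI via RxNorm; None if no match."""
    url = "https://rxnav.nlm.nih.gov/REST/rxcui.json"
    async with httpx.AsyncClient() as client:
        resp = await client.get(url, params={"name": intervention})
        resp.raise_for_status()
    ids = resp.json().get("idGroup", {}).get("rxnormId", [])
    return ids[0] if ids else None
```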
120
-
121
- ---
122
-
123
- ## Integration Opportunities
124
-
125
- ### With PubMed
126
-
127
- Cross-reference trials with publications:
128
- ```python
129
- # ClinicalTrials.gov provides PMID links
130
- # Can correlate trial results with published papers
131
- ```
132
-
133
- ### With DrugBank/ChEMBL
134
-
135
- Map interventions to:
136
- - Mechanism of action
137
- - Known targets
138
- - Adverse effects
139
- - Drug-drug interactions
140
-
141
- ---
142
-
143
- ## Python Libraries to Consider
144
-
145
- | Library | Purpose | Notes |
146
- |---------|---------|-------|
147
- | [pytrials](https://pypi.org/project/pytrials/) | CT.gov wrapper | V2 API support unclear |
148
- | [clinicaltrials](https://github.com/ebmdatalab/clinicaltrials-act-tracker) | Data tracking | More for analysis |
149
- | [drugbank-downloader](https://pypi.org/project/drugbank-downloader/) | Drug mapping | Requires license |
150
-
151
- ---
152
-
153
- ## API Quirks & Gotchas
154
-
155
- 1. **Rate Limiting**: Undocumented, be conservative
156
- 2. **Pagination**: Max 1000 results per request
157
- 3. **Field Names**: Case-sensitive, camelCase
158
- 4. **Empty Results**: Some fields may be null even if requested
159
- 5. **Status Changes**: Trials change status frequently
160
-
161
- ---
162
-
163
- ## Example Enhanced Query
164
-
165
- ```python
166
- async def search_drug_repurposing_trials(
167
- drug_name: str,
168
- condition: str,
169
- include_completed: bool = True,
170
- ) -> list[Evidence]:
171
- """Search for trials repurposing a drug for a new condition."""
172
-
173
- statuses = ["RECRUITING", "ACTIVE_NOT_RECRUITING"]
174
- if include_completed:
175
- statuses.append("COMPLETED")
176
-
177
- params = {
178
- "query.intr": drug_name,
179
- "query.cond": condition,
180
- "filter.overallStatus": ",".join(statuses),
181
- "filter.studyType": "INTERVENTIONAL",
182
- "fields": ",".join(EXTENDED_FIELDS),
183
- "pageSize": 50,
184
- }
185
- ```
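The example stops at parameter construction. A hedged continuation, assuming the v2 response nests hits under a top-level `studies` key:

```python
import httpx

async def _run_query(params: dict) -> list[dict]:
    """Execute the query built above and unpack the hits (mapping to Evidence not shown)."""
    async with httpx.AsyncClient() as client:
        resp = await client.get(
            "https://clinicaltrials.gov/api/v2/studies", params=params
        )
        resp.raise_for_status()
    # v2 responses nest results under a top-level "studies" key.
    return resp.json().get("studies", [])
```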
186
-
187
- ---
188
-
189
- ## Sources
190
-
191
- - [ClinicalTrials.gov API Documentation](https://clinicaltrials.gov/data-api/api)
192
- - [CT.gov Field Definitions](https://clinicaltrials.gov/data-api/about-api/study-data-structure)
193
- - [RxNorm API](https://lhncbc.nlm.nih.gov/RxNav/APIs/api-RxNorm.findRxcuiByString.html)
docs/brainstorming/03_EUROPEPMC_IMPROVEMENTS.md DELETED
@@ -1,211 +0,0 @@
1
- # Europe PMC Tool: Current State & Future Improvements
2
-
3
- **Status**: Currently Implemented (Replaced bioRxiv)
4
- **Priority**: High (Preprint + Open Access Source)
5
-
6
- ---
7
-
8
- ## Why Europe PMC Over bioRxiv?
9
-
10
- ### bioRxiv API Limitations (Why We Abandoned It)
11
-
12
- 1. **No Search API**: Only returns papers by date range or DOI
13
- 2. **No Query Capability**: Cannot search for "metformin cancer"
14
- 3. **Workaround Required**: Would need to download ALL preprints and build local search
15
- 4. **Known Issue**: [Gradio Issue #8861](https://github.com/gradio-app/gradio/issues/8861) documents the limitation
16
-
17
- ### Europe PMC Advantages
18
-
19
- 1. **Full Search API**: Boolean queries, filters, facets
20
- 2. **Aggregates bioRxiv**: Includes bioRxiv, medRxiv content anyway
21
- 3. **Includes PubMed**: Also has MEDLINE content
22
- 4. **34 Preprint Servers**: Not just bioRxiv
23
- 5. **Open Access Focus**: Full-text when available
24
-
25
- ---
26
-
27
- ## Current Implementation
28
-
29
- ### What We Have (`src/tools/europepmc.py`)
30
-
31
- - REST API search via `europepmc.org/webservices/rest/search`
32
- - Preprint flagging via `firstPublicationDate` heuristics
33
- - Returns: title, abstract, authors, DOI, source
34
- - Marks preprints for transparency
35
-
36
- ### Current Limitations
37
-
38
- 1. **No Full-Text Retrieval**: Only metadata/abstracts
39
- 2. **No Citation Network**: Missing references/citations
40
- 3. **No Supplementary Files**: Not fetching figures/data
41
- 4. **Basic Preprint Detection**: Heuristic, not explicit flag
42
-
43
- ---
44
-
45
- ## Europe PMC API Capabilities
46
-
47
- ### Endpoints We Could Use
48
-
49
- | Endpoint | Purpose | Currently Using |
50
- |----------|---------|-----------------|
51
- | `/search` | Query papers | Yes |
52
- | `/fulltext/{ID}` | Full text (XML/JSON) | No |
53
- | `/{PMCID}/supplementaryFiles` | Figures, data | No |
54
- | `/citations/{ID}` | Who cited this | No |
55
- | `/references/{ID}` | What this cites | No |
56
- | `/annotations` | Text-mined entities | No |
57
-
58
- ### Rich Query Syntax
59
-
60
- ```python
61
- # Current simple query
62
- query = "metformin cancer"
63
-
64
- # Could use advanced syntax
65
- query = "(TITLE:metformin OR ABSTRACT:metformin) AND (cancer OR oncology)"
66
- query += " AND (SRC:PPR)" # Only preprints
67
- query += " AND (FIRST_PDATE:[2023-01-01 TO 2024-12-31])" # Date range
68
- query += " AND (OPEN_ACCESS:y)" # Only open access
69
- ```
70
-
71
- ### Source Filters
72
-
73
- ```python
74
- # Filter by source
75
- "SRC:MED" # MEDLINE
76
- "SRC:PMC" # PubMed Central
77
- "SRC:PPR" # Preprints (bioRxiv, medRxiv, etc.)
78
- "SRC:AGR" # Agricola
79
- "SRC:CBA" # Chinese Biological Abstracts
80
- ```
81
-
82
- ---
83
-
84
- ## Recommended Improvements
85
-
86
- ### Phase 1: Rich Metadata
87
-
88
- ```python
89
- # Add to search results
90
- additional_fields = [
91
- "citedByCount", # Impact indicator
92
- "source", # Explicit source (MED, PMC, PPR)
93
- "isOpenAccess", # Boolean flag
94
- "fullTextUrlList", # URLs for full text
95
- "authorAffiliations", # Institution info
96
- "grantsList", # Funding info
97
- ]
98
- ```
99
-
100
- ### Phase 2: Full-Text Retrieval
101
-
102
- ```python
103
- async def get_fulltext(pmcid: str) -> str | None:
104
- """Get full text for open access papers."""
105
- # XML format
106
- url = f"https://www.ebi.ac.uk/europepmc/webservices/rest/{pmcid}/fullTextXML"
107
- # Or JSON
108
- url = f"https://www.ebi.ac.uk/europepmc/webservices/rest/{pmcid}/fullTextJSON"
109
- ```
110
-
111
- ### Phase 3: Citation Network
112
-
113
- ```python
114
- async def get_citations(pmcid: str) -> list[str]:
115
- """Get papers that cite this one."""
116
- url = f"https://www.ebi.ac.uk/europepmc/webservices/rest/{pmcid}/citations"
117
-
118
- async def get_references(pmcid: str) -> list[str]:
119
- """Get papers this one cites."""
120
- url = f"https://www.ebi.ac.uk/europepmc/webservices/rest/{pmcid}/references"
121
- ```
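A hedged completion of `get_citations`: the real endpoint path includes a source prefix such as `MED` or `PMC`, and the JSON key names below should be checked against the Articles API docs:

```python
import httpx

async def get_citations(pmcid: str) -> list[str]:
    """Return IDs of papers that cite the given PMC article."""
    url = f"https://www.ebi.ac.uk/europepmc/webservices/rest/PMC/{pmcid}/citations"
    async with httpx.AsyncClient() as client:
        resp = await client.get(url, params={"format": "json", "pageSize": 100})
        resp.raise_for_status()
    citations = resp.json().get("citationList", {}).get("citation", [])
    return [str(c["id"]) for c in citations if "id" in c]
```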
122
-
123
- ### Phase 4: Text-Mined Annotations
124
-
125
- Europe PMC extracts entities automatically:
126
-
127
- ```python
128
- async def get_annotations(pmcid: str) -> dict:
129
- """Get text-mined entities (genes, diseases, drugs)."""
130
- url = f"https://www.ebi.ac.uk/europepmc/annotations_api/annotationsByArticleIds"
131
- params = {
132
- "articleIds": f"PMC:{pmcid}",
133
- "type": "Gene_Proteins,Diseases,Chemicals",
134
- "format": "JSON",
135
- }
136
- # Returns structured entity mentions with positions
137
- ```
138
-
139
- ---
140
-
141
- ## Supplementary File Retrieval
142
-
143
- From reference repo (`bioinformatics_tools.py` lines 123-149):
144
-
145
- ```python
146
- def get_figures(pmcid: str) -> dict[str, str]:
147
- """Download figures and supplementary files."""
148
- url = f"https://www.ebi.ac.uk/europepmc/webservices/rest/{pmcid}/supplementaryFiles?includeInlineImage=true"
149
- # Downloads the supplementary ZIP; images are returned base64-encoded
150
- ```
151
-
152
- ---
153
-
154
- ## Preprint-Specific Features
155
-
156
- ### Identify Preprint Servers
157
-
158
- ```python
159
- PREPRINT_SOURCES = {
160
- "PPR": "General preprints",
161
- "bioRxiv": "Biology preprints",
162
- "medRxiv": "Medical preprints",
163
- "chemRxiv": "Chemistry preprints",
164
- "Research Square": "Multi-disciplinary",
165
- "Preprints.org": "MDPI preprints",
166
- }
167
-
168
- # Check if published version exists
169
- async def check_published_version(preprint_doi: str) -> str | None:
170
- """Check if preprint has been peer-reviewed and published."""
171
- # Europe PMC links preprints to final versions
172
- ```
173
-
174
- ---
175
-
176
- ## Rate Limiting
177
-
178
- Europe PMC is more generous than NCBI:
179
-
180
- ```python
181
- # No documented hard limit, but be respectful
182
- # Recommend: 10-20 requests/second max
183
- # Use email in User-Agent for polite pool
184
- headers = {
185
- "User-Agent": "DeepCritical/1.0 (mailto:your@email.com)"
186
- }
187
- ```
188
-
189
- ---
190
-
191
- ## vs. The Lens & OpenAlex
192
-
193
- | Feature | Europe PMC | The Lens | OpenAlex |
194
- |---------|------------|----------|----------|
195
- | Biomedical Focus | Yes | Partial | Partial |
196
- | Preprints | Yes (34 servers) | Yes | Yes |
197
- | Full Text | PMC papers | Links | No |
198
- | Citations | Yes | Yes | Yes |
199
- | Annotations | Yes (text-mined) | No | No |
200
- | Rate Limits | Generous | Moderate | Very generous |
201
- | API Key | Optional | Required | Optional |
202
-
203
- ---
204
-
205
- ## Sources
206
-
207
- - [Europe PMC REST API](https://europepmc.org/RestfulWebService)
208
- - [Europe PMC Annotations API](https://europepmc.org/AnnotationsApi)
209
- - [Europe PMC Articles API](https://europepmc.org/ArticlesApi)
210
- - [rOpenSci medrxivr](https://docs.ropensci.org/medrxivr/)
211
- - [bioRxiv TDM Resources](https://www.biorxiv.org/tdm)
docs/brainstorming/04_OPENALEX_INTEGRATION.md DELETED
@@ -1,303 +0,0 @@
1
- # OpenAlex Integration: The Missing Piece?
2
-
3
- **Status**: NOT Implemented (Candidate for Addition)
4
- **Priority**: HIGH - Could Replace Multiple Tools
5
- **Reference**: Already implemented in `reference_repos/DeepCritical`
6
-
7
- ---
8
-
9
- ## What is OpenAlex?
10
-
11
- OpenAlex is a **fully open** index of the global research system:
12
-
13
- - **209M+ works** (papers, books, datasets)
14
- - **2B+ author records** (disambiguated)
15
- - **124K+ venues** (journals, repositories)
16
- - **109K+ institutions**
17
- - **65K+ concepts** (hierarchical, linked to Wikidata)
18
-
19
- **Free. Open. No API key required.**
20
-
21
- ---
22
-
23
- ## Why OpenAlex for DeepCritical?
24
-
25
- ### Current Architecture
26
-
27
- ```
28
- User Query
29
-
30
- ┌──────────────────────────────────────┐
31
- │ PubMed ClinicalTrials Europe PMC │ ← 3 separate APIs
32
- └──────────────────────────────────────┘
33
-
34
- Orchestrator (deduplicate, judge, synthesize)
35
- ```
36
-
37
- ### With OpenAlex
38
-
39
- ```
40
- User Query
41
-
42
- ┌──────────────────────────────────────┐
43
- │ OpenAlex │ ← Single API
44
- │ (includes PubMed + preprints + │
45
- │ citations + concepts + authors) │
46
- └──────────────────────────────────────┘
47
-
48
- Orchestrator (enrich with CT.gov for trials)
49
- ```
50
-
51
- **OpenAlex already aggregates**:
52
- - PubMed/MEDLINE
53
- - Crossref
54
- - ORCID
55
- - Unpaywall (open access links)
56
- - Microsoft Academic Graph (legacy)
57
- - Preprint servers
58
-
59
- ---
60
-
61
- ## Reference Implementation
62
-
63
- From `reference_repos/DeepCritical/DeepResearch/src/tools/openalex_tools.py`:
64
-
65
- ```python
66
- class OpenAlexFetchTool(ToolRunner):
67
- def __init__(self):
68
- super().__init__(
69
- ToolSpec(
70
- name="openalex_fetch",
71
- description="Fetch OpenAlex work or author",
72
- inputs={"entity": "TEXT", "identifier": "TEXT"},
73
- outputs={"result": "JSON"},
74
- )
75
- )
76
-
77
- def run(self, params: dict[str, Any]) -> ExecutionResult:
78
- entity = params["entity"] # "works", "authors", "venues"
79
- identifier = params["identifier"]
80
- base = "https://api.openalex.org"
81
- url = f"{base}/{entity}/{identifier}"
82
- resp = requests.get(url, timeout=30)
83
- return ExecutionResult(success=True, data={"result": resp.json()})
84
- ```
85
-
86
- ---
87
-
88
- ## OpenAlex API Features
89
-
90
- ### Search Works (Papers)
91
-
92
- ```python
93
- # Search for metformin + cancer papers
94
- url = "https://api.openalex.org/works"
95
- params = {
96
- "search": "metformin cancer drug repurposing",
97
- "filter": "publication_year:>2020,type:article",
98
- "sort": "cited_by_count:desc",
99
- "per_page": 50,
100
- }
101
- ```
102
-
103
- ### Rich Filtering
104
-
105
- ```python
106
- # Filter examples
107
- "publication_year:2023"
108
- "type:article" # vs preprint, book, etc.
109
- "is_oa:true" # Open access only
110
- "concepts.id:C71924100" # Papers about "Medicine"
111
- "authorships.institutions.id:I27837315" # From Harvard
112
- "cited_by_count:>100" # Highly cited
113
- "has_fulltext:true" # Full text available
114
- ```
115
-
116
- ### What You Get Back
117
-
118
- ```json
119
- {
120
- "id": "W2741809807",
121
- "title": "Metformin: A candidate drug for...",
122
- "publication_year": 2023,
123
- "type": "article",
124
- "cited_by_count": 45,
125
- "is_oa": true,
126
- "primary_location": {
127
- "source": {"display_name": "Nature Medicine"},
128
- "pdf_url": "https://...",
129
- "landing_page_url": "https://..."
130
- },
131
- "concepts": [
132
- {"id": "C71924100", "display_name": "Medicine", "score": 0.95},
133
- {"id": "C54355233", "display_name": "Pharmacology", "score": 0.88}
134
- ],
135
- "authorships": [
136
- {
137
- "author": {"id": "A123", "display_name": "John Smith"},
138
- "institutions": [{"display_name": "Harvard Medical School"}]
139
- }
140
- ],
141
- "referenced_works": ["W123", "W456"], # Citations
142
- "related_works": ["W789", "W012"] # Similar papers
143
- }
144
- ```
145
-
146
- ---
147
-
148
- ## Key Advantages Over Current Tools
149
-
150
- ### 1. Citation Network (We Don't Have This!)
151
-
152
- ```python
153
- # Get papers that cite a work
154
- url = f"https://api.openalex.org/works?filter=cites:{work_id}"
155
-
156
- # Get papers cited by a work
157
- # Already in `referenced_works` field
158
- ```
159
-
160
- ### 2. Concept Tagging (We Don't Have This!)
161
-
162
- OpenAlex auto-tags papers with hierarchical concepts:
163
- - "Medicine" → "Pharmacology" → "Drug Repurposing"
164
- - Can search by concept, not just keywords
165
-
166
- ### 3. Author Disambiguation (We Don't Have This!)
167
-
168
- ```python
169
- # Find all works by an author
170
- url = f"https://api.openalex.org/works?filter=authorships.author.id:{author_id}"
171
- ```
172
-
173
- ### 4. Institution Tracking
174
-
175
- ```python
176
- # Find drug repurposing papers from top institutions
177
- url = "https://api.openalex.org/works"
178
- params = {
179
- "search": "drug repurposing",
180
- "filter": "authorships.institutions.id:I27837315", # Harvard
181
- }
182
- ```
183
-
184
- ### 5. Related Works
185
-
186
- Each paper comes with `related_works` - semantically similar papers discovered by OpenAlex's ML.
187
-
188
- ---
189
-
190
- ## Proposed Implementation
191
-
192
- ### New Tool: `src/tools/openalex.py`
193
-
194
- ```python
195
- """OpenAlex search tool for comprehensive scholarly data."""
196
-
197
- import httpx
198
- from src.tools.base import SearchTool
199
- from src.utils.models import Evidence
200
-
201
- class OpenAlexTool(SearchTool):
202
- """Search OpenAlex for scholarly works with rich metadata."""
203
-
204
- name = "openalex"
205
-
206
- async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
207
- async with httpx.AsyncClient() as client:
208
- resp = await client.get(
209
- "https://api.openalex.org/works",
210
- params={
211
- "search": query,
212
- "filter": "type:article,is_oa:true",
213
- "sort": "cited_by_count:desc",
214
- "per_page": max_results,
215
- "mailto": "deepcritical@example.com", # Polite pool
216
- },
217
- )
218
- data = resp.json()
219
-
220
- return [
221
- Evidence(
222
- source="openalex",
223
- title=work["title"],
224
- abstract=_reconstruct_abstract(work),  # OpenAlex has no plain "abstract" field; see helper sketch below
225
- url=(work.get("primary_location") or {}).get("landing_page_url", ""),  # primary_location can be null
226
- metadata={
227
- "cited_by_count": work["cited_by_count"],
228
- "concepts": [c["display_name"] for c in work["concepts"][:5]],
229
- "is_open_access": work["is_oa"],
230
- "pdf_url": work["primary_location"].get("pdf_url"),
231
- },
232
- )
233
- for work in data["results"]
234
- ]
235
- ```
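One gap in the sketch above: OpenAlex does not return plain abstracts, only `abstract_inverted_index` ({word: [positions]}). A small helper (the name `_reconstruct_abstract` is hypothetical) can flatten it:

```python
def _reconstruct_abstract(work: dict) -> str:
    """Rebuild plain text from OpenAlex's abstract_inverted_index."""
    inverted = work.get("abstract_inverted_index")
    if not inverted:
        return ""
    by_position: dict[int, str] = {}
    for word, positions in inverted.items():
        for pos in positions:
            by_position[pos] = word
    return " ".join(word for _, word in sorted(by_position.items()))
```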
236
-
237
- ---
238
-
239
- ## Rate Limits
240
-
241
- OpenAlex is **extremely generous**:
242
-
243
- - No hard rate limit documented
244
- - Recommended: <100,000 requests/day
245
- - **Polite pool**: Add `mailto=your@email.com` param for faster responses
246
- - No API key required (optional for priority support)
247
-
248
- ---
249
-
250
- ## Should We Add OpenAlex?
251
-
252
- ### Arguments FOR
253
-
254
- 1. **Already in reference repo** - proven pattern
255
- 2. **Richer data** - citations, concepts, authors
256
- 3. **Single source** - reduces API complexity
257
- 4. **Free & open** - no keys, no limits
258
- 5. **Institution adoption** - Leiden, Sorbonne switched to it
259
-
260
- ### Arguments AGAINST
261
-
262
- 1. **Adds complexity** - another data source
263
- 2. **Overlap** - duplicates some PubMed data
264
- 3. **Not biomedical-focused** - covers all disciplines
265
- 4. **No full text** - still need PMC/Europe PMC for that
266
-
267
- ### Recommendation
268
-
269
- **Add OpenAlex as a 4th source**, don't replace existing tools.
270
-
271
- Use it for:
272
- - Citation network analysis
273
- - Concept-based discovery
274
- - High-impact paper finding
275
- - Author/institution tracking
276
-
277
- Keep PubMed, ClinicalTrials, Europe PMC for:
278
- - Authoritative biomedical search
279
- - Clinical trial data
280
- - Full-text access
281
- - Preprint tracking
282
-
283
- ---
284
-
285
- ## Implementation Priority
286
-
287
- | Task | Effort | Value |
288
- |------|--------|-------|
289
- | Basic search | Low | High |
290
- | Citation network | Medium | Very High |
291
- | Concept filtering | Low | High |
292
- | Related works | Low | High |
293
- | Author tracking | Medium | Medium |
294
-
295
- ---
296
-
297
- ## Sources
298
-
299
- - [OpenAlex Documentation](https://docs.openalex.org)
300
- - [OpenAlex API Overview](https://docs.openalex.org/api)
301
- - [OpenAlex Wikipedia](https://en.wikipedia.org/wiki/OpenAlex)
302
- - [Leiden University Announcement](https://www.leidenranking.com/information/openalex)
303
- - [OpenAlex: A fully-open index (Paper)](https://arxiv.org/abs/2205.01833)
docs/brainstorming/implementation/15_PHASE_OPENALEX.md DELETED
@@ -1,603 +0,0 @@
- # Phase 15: OpenAlex Integration
-
- **Priority**: HIGH - Biggest bang for the buck
- **Effort**: ~2-3 hours
- **Dependencies**: None (existing codebase patterns sufficient)
-
- ---
-
- ## Prerequisites (COMPLETED)
-
- The following model changes have been implemented to support this integration:
-
- 1. **`SourceName` Literal Updated** (`src/utils/models.py:9`)
-    ```python
-    SourceName = Literal["pubmed", "clinicaltrials", "europepmc", "preprint", "openalex"]
-    ```
-    - Without this, `source="openalex"` would fail Pydantic validation
-
- 2. **`Evidence.metadata` Field Added** (`src/utils/models.py:39-42`)
-    ```python
-    metadata: dict[str, Any] = Field(
-        default_factory=dict,
-        description="Additional metadata (e.g., cited_by_count, concepts, is_open_access)",
-    )
-    ```
-    - Required for storing `cited_by_count`, `concepts`, etc.
-    - The model is still frozen - metadata must be passed at construction time (see the sketch after this list)
-
- 3. **`__init__.py` Exports Updated** (`src/tools/__init__.py`)
-    - All tools are now exported: `ClinicalTrialsTool`, `EuropePMCTool`, `PubMedTool`
-    - `OpenAlexTool` should be added here after implementation
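-
- A minimal sketch of the construction-time requirement (field values are placeholders; the `Citation` fields mirror the ones used in `_to_evidence` below):
-
- ```python
- from src.utils.models import Citation, Evidence
-
- evidence = Evidence(
-     content="Metformin shows anticancer effects...",
-     citation=Citation(
-         source="openalex",
-         title="Example title",
-         url="https://openalex.org/W2741809807",
-         date="2023",
-         authors=["John Smith"],
-     ),
-     relevance=0.9,
-     metadata={"cited_by_count": 45},  # must be supplied here
- )
- # evidence.metadata = {}  # would raise a ValidationError: the model is frozen
- ```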
-
- ---
-
- ## Overview
-
- Add OpenAlex as a 4th data source for comprehensive scholarly data including:
- - Citation networks (who cites whom)
- - Concept tagging (hierarchical topic classification)
- - Author disambiguation
- - 209M+ works indexed
-
- **Why OpenAlex?**
- - Free, no API key required
- - Already implemented in reference repo
- - Provides citation data we don't have
- - Aggregates PubMed + preprints + more
-
- ---
-
- ## TDD Implementation Plan
-
- ### Step 1: Write the Tests First
-
- **File**: `tests/unit/tools/test_openalex.py`
-
- ```python
- """Tests for OpenAlex search tool."""
-
- import pytest
- import respx
- from httpx import Response
-
- from src.tools.openalex import OpenAlexTool
- from src.utils.models import Evidence
-
-
- class TestOpenAlexTool:
-     """Test suite for OpenAlex search functionality."""
-
-     @pytest.fixture
-     def tool(self) -> OpenAlexTool:
-         return OpenAlexTool()
-
-     def test_name_property(self, tool: OpenAlexTool) -> None:
-         """Tool should identify itself as 'openalex'."""
-         assert tool.name == "openalex"
-
-     @respx.mock
-     @pytest.mark.asyncio
-     async def test_search_returns_evidence(self, tool: OpenAlexTool) -> None:
-         """Search should return list of Evidence objects."""
-         mock_response = {
-             "results": [
-                 {
-                     "id": "W2741809807",
-                     "title": "Metformin and cancer: A systematic review",
-                     "publication_year": 2023,
-                     "cited_by_count": 45,
-                     "type": "article",
-                     "is_oa": True,
-                     "primary_location": {
-                         "source": {"display_name": "Nature Medicine"},
-                         "landing_page_url": "https://doi.org/10.1038/example",
-                         "pdf_url": None,
-                     },
-                     "abstract_inverted_index": {
-                         "Metformin": [0],
-                         "shows": [1],
-                         "anticancer": [2],
-                         "effects": [3],
-                     },
-                     "concepts": [
-                         {"display_name": "Medicine", "score": 0.95},
-                         {"display_name": "Oncology", "score": 0.88},
-                     ],
-                     "authorships": [
-                         {
-                             "author": {"display_name": "John Smith"},
-                             "institutions": [{"display_name": "Harvard"}],
-                         }
-                     ],
-                 }
-             ]
-         }
-
-         respx.get("https://api.openalex.org/works").mock(
-             return_value=Response(200, json=mock_response)
-         )
-
-         results = await tool.search("metformin cancer", max_results=10)
-
-         assert len(results) == 1
-         assert isinstance(results[0], Evidence)
-         assert "Metformin and cancer" in results[0].citation.title
-         assert results[0].citation.source == "openalex"
-
-     @respx.mock
-     @pytest.mark.asyncio
-     async def test_search_empty_results(self, tool: OpenAlexTool) -> None:
-         """Search with no results should return empty list."""
-         respx.get("https://api.openalex.org/works").mock(
-             return_value=Response(200, json={"results": []})
-         )
-
-         results = await tool.search("xyznonexistentquery123")
-         assert results == []
-
-     @respx.mock
-     @pytest.mark.asyncio
-     async def test_search_handles_missing_abstract(self, tool: OpenAlexTool) -> None:
-         """Tool should handle papers without abstracts."""
-         mock_response = {
-             "results": [
-                 {
-                     "id": "W123",
-                     "title": "Paper without abstract",
-                     "publication_year": 2023,
-                     "cited_by_count": 10,
-                     "type": "article",
-                     "is_oa": False,
-                     "primary_location": {
-                         "source": {"display_name": "Journal"},
-                         "landing_page_url": "https://example.com",
-                     },
-                     "abstract_inverted_index": None,
-                     "concepts": [],
-                     "authorships": [],
-                 }
-             ]
-         }
-
-         respx.get("https://api.openalex.org/works").mock(
-             return_value=Response(200, json=mock_response)
-         )
-
-         results = await tool.search("test query")
-         assert len(results) == 1
-         assert results[0].content == ""  # No abstract
-
-     @respx.mock
-     @pytest.mark.asyncio
-     async def test_search_extracts_citation_count(self, tool: OpenAlexTool) -> None:
-         """Citation count should be in metadata."""
-         mock_response = {
-             "results": [
-                 {
-                     "id": "W456",
-                     "title": "Highly cited paper",
-                     "publication_year": 2020,
-                     "cited_by_count": 500,
-                     "type": "article",
-                     "is_oa": True,
-                     "primary_location": {
-                         "source": {"display_name": "Science"},
-                         "landing_page_url": "https://example.com",
-                     },
-                     "abstract_inverted_index": {"Test": [0]},
-                     "concepts": [],
-                     "authorships": [],
-                 }
-             ]
-         }
-
-         respx.get("https://api.openalex.org/works").mock(
-             return_value=Response(200, json=mock_response)
-         )
-
-         results = await tool.search("highly cited")
-         assert results[0].metadata["cited_by_count"] == 500
-
-     @respx.mock
-     @pytest.mark.asyncio
-     async def test_search_extracts_concepts(self, tool: OpenAlexTool) -> None:
-         """Concepts should be extracted for semantic discovery."""
-         mock_response = {
-             "results": [
-                 {
-                     "id": "W789",
-                     "title": "Drug repurposing study",
-                     "publication_year": 2023,
-                     "cited_by_count": 25,
-                     "type": "article",
-                     "is_oa": True,
-                     "primary_location": {
-                         "source": {"display_name": "PLOS ONE"},
-                         "landing_page_url": "https://example.com",
-                     },
-                     "abstract_inverted_index": {"Drug": [0], "repurposing": [1]},
-                     "concepts": [
-                         {"display_name": "Pharmacology", "score": 0.92},
-                         {"display_name": "Drug Discovery", "score": 0.85},
-                         {"display_name": "Medicine", "score": 0.80},
-                     ],
-                     "authorships": [],
-                 }
-             ]
-         }
-
-         respx.get("https://api.openalex.org/works").mock(
-             return_value=Response(200, json=mock_response)
-         )
-
-         results = await tool.search("drug repurposing")
-         assert "Pharmacology" in results[0].metadata["concepts"]
-         assert "Drug Discovery" in results[0].metadata["concepts"]
-
-     @respx.mock
-     @pytest.mark.asyncio
-     async def test_search_api_error_raises_search_error(
-         self, tool: OpenAlexTool
-     ) -> None:
-         """API errors should raise SearchError."""
-         from src.utils.exceptions import SearchError
-
-         respx.get("https://api.openalex.org/works").mock(
-             return_value=Response(500, text="Internal Server Error")
-         )
-
-         with pytest.raises(SearchError):
-             await tool.search("test query")
-
-     def test_reconstruct_abstract(self, tool: OpenAlexTool) -> None:
-         """Test abstract reconstruction from inverted index."""
-         inverted_index = {
-             "Metformin": [0, 5],
-             "is": [1],
-             "a": [2],
-             "diabetes": [3],
-             "drug": [4],
-             "effective": [6],
-         }
-         abstract = tool._reconstruct_abstract(inverted_index)
-         assert abstract == "Metformin is a diabetes drug Metformin effective"
- ```
-
- ---
-
- ### Step 2: Create the Implementation
-
- **File**: `src/tools/openalex.py`
-
- ```python
- """OpenAlex search tool for comprehensive scholarly data."""
-
- from typing import Any
-
- import httpx
- from tenacity import retry, stop_after_attempt, wait_exponential
-
- from src.utils.exceptions import SearchError
- from src.utils.models import Citation, Evidence
-
-
- class OpenAlexTool:
-     """
-     Search OpenAlex for scholarly works with rich metadata.
-
-     OpenAlex provides:
-     - 209M+ scholarly works
-     - Citation counts and networks
-     - Concept tagging (hierarchical)
-     - Author disambiguation
-     - Open access links
-
-     API Docs: https://docs.openalex.org/
-     """
-
-     BASE_URL = "https://api.openalex.org/works"
-
-     def __init__(self, email: str | None = None) -> None:
-         """
-         Initialize OpenAlex tool.
-
-         Args:
-             email: Optional email for polite pool (faster responses)
-         """
-         self.email = email or "deepcritical@example.com"
-
-     @property
-     def name(self) -> str:
-         return "openalex"
-
-     @retry(
-         stop=stop_after_attempt(3),
-         wait=wait_exponential(multiplier=1, min=1, max=10),
-         reraise=True,
-     )
-     async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
-         """
-         Search OpenAlex for scholarly works.
-
-         Args:
-             query: Search terms
-             max_results: Maximum results to return (max 200 per request)
-
-         Returns:
-             List of Evidence objects with citation metadata
-
-         Raises:
-             SearchError: If API request fails
-         """
-         params = {
-             "search": query,
-             "filter": "type:article",  # Only peer-reviewed articles
-             "sort": "cited_by_count:desc",  # Most cited first
-             "per_page": min(max_results, 200),
-             "mailto": self.email,  # Polite pool for faster responses
-         }
-
-         async with httpx.AsyncClient(timeout=30.0) as client:
-             try:
-                 response = await client.get(self.BASE_URL, params=params)
-                 response.raise_for_status()
-
-                 data = response.json()
-                 results = data.get("results", [])
-
-                 return [self._to_evidence(work) for work in results[:max_results]]
-
-             except httpx.HTTPStatusError as e:
-                 raise SearchError(f"OpenAlex API error: {e}") from e
-             except httpx.RequestError as e:
-                 raise SearchError(f"OpenAlex connection failed: {e}") from e
-
-     def _to_evidence(self, work: dict[str, Any]) -> Evidence:
-         """Convert OpenAlex work to Evidence object."""
-         title = work.get("title") or "Untitled"  # guards against null titles, not just missing keys
-         pub_year = work.get("publication_year") or "Unknown"
-         cited_by = work.get("cited_by_count", 0)
-         is_oa = work.get("is_oa", False)
-
-         # Reconstruct abstract from inverted index
-         abstract_index = work.get("abstract_inverted_index")
-         abstract = self._reconstruct_abstract(abstract_index) if abstract_index else ""
-
-         # Extract concepts (top 5)
-         concepts = [
-             c.get("display_name", "")
-             for c in work.get("concepts", [])[:5]
-             if c.get("display_name")
-         ]
-
-         # Extract authors (top 5)
-         authorships = work.get("authorships", [])
-         authors = [
-             a.get("author", {}).get("display_name", "")
-             for a in authorships[:5]
-             if a.get("author", {}).get("display_name")
-         ]
-
-         # Get URL
-         primary_loc = work.get("primary_location") or {}
-         url = primary_loc.get("landing_page_url", "")
-         if not url:
-             # Fallback to OpenAlex page
-             work_id = work.get("id", "").replace("https://openalex.org/", "")
-             url = f"https://openalex.org/{work_id}"
-
-         return Evidence(
-             content=abstract[:2000],
-             citation=Citation(
-                 source="openalex",
-                 title=title[:500],
-                 url=url,
-                 date=str(pub_year),
-                 authors=authors,
-             ),
-             relevance=min(0.9, 0.5 + (cited_by / 1000)),  # Boost by citations
-             metadata={
-                 "cited_by_count": cited_by,
-                 "is_open_access": is_oa,
-                 "concepts": concepts,
-                 "pdf_url": primary_loc.get("pdf_url"),
-             },
-         )
-
-     def _reconstruct_abstract(
-         self, inverted_index: dict[str, list[int]]
-     ) -> str:
-         """
-         Reconstruct abstract from OpenAlex inverted index format.
-
-         OpenAlex stores abstracts as {"word": [position1, position2, ...]}.
-         This rebuilds the original text.
-         """
-         if not inverted_index:
-             return ""
-
-         # Build position -> word mapping
-         position_word: dict[int, str] = {}
-         for word, positions in inverted_index.items():
-             for pos in positions:
-                 position_word[pos] = word
-
-         # Reconstruct in order
-         if not position_word:
-             return ""
-
-         max_pos = max(position_word.keys())
-         words = [position_word.get(i, "") for i in range(max_pos + 1)]
-         return " ".join(w for w in words if w)
- ```
-
- ---
-
- ### Step 3: Register in Search Handler
-
- **File**: `src/tools/search_handler.py` (add to imports and tool list)
-
- ```python
- # Add import
- from src.tools.openalex import OpenAlexTool
-
- # Add to _create_tools method
- def _create_tools(self) -> list[SearchTool]:
-     return [
-         PubMedTool(),
-         ClinicalTrialsTool(),
-         EuropePMCTool(),
-         OpenAlexTool(),  # NEW
-     ]
- ```
-
- ---
-
- ### Step 4: Update `__init__.py`
-
- **File**: `src/tools/__init__.py`
-
- ```python
- from src.tools.openalex import OpenAlexTool
-
- __all__ = [
-     "PubMedTool",
-     "ClinicalTrialsTool",
-     "EuropePMCTool",
-     "OpenAlexTool",  # NEW
-     # ...
- ]
- ```
-
- ---
-
- ## Demo Script
-
- **File**: `examples/openalex_demo.py`
-
- ```python
- #!/usr/bin/env python3
- """Demo script to verify OpenAlex integration."""
-
- import asyncio
-
- from src.tools.openalex import OpenAlexTool
-
-
- async def main() -> None:
-     """Run OpenAlex search demo."""
-     tool = OpenAlexTool()
-
-     print("=" * 60)
-     print("OpenAlex Integration Demo")
-     print("=" * 60)
-
-     # Test 1: Basic drug repurposing search
-     print("\n[Test 1] Searching for 'metformin cancer drug repurposing'...")
-     results = await tool.search("metformin cancer drug repurposing", max_results=5)
-
-     for i, evidence in enumerate(results, 1):
-         print(f"\n--- Result {i} ---")
-         print(f"Title: {evidence.citation.title}")
-         print(f"Year: {evidence.citation.date}")
-         print(f"Citations: {evidence.metadata.get('cited_by_count', 'N/A')}")
-         print(f"Concepts: {', '.join(evidence.metadata.get('concepts', []))}")
-         print(f"Open Access: {evidence.metadata.get('is_open_access', False)}")
-         print(f"URL: {evidence.citation.url}")
-         if evidence.content:
-             print(f"Abstract: {evidence.content[:200]}...")
-
-     # Test 2: High-impact papers
-     print("\n" + "=" * 60)
-     print("[Test 2] Finding highly-cited papers on 'long COVID treatment'...")
-     results = await tool.search("long COVID treatment", max_results=3)
-
-     for evidence in results:
-         print(f"\n- {evidence.citation.title}")
-         print(f"  Citations: {evidence.metadata.get('cited_by_count', 0)}")
-
-     print("\n" + "=" * 60)
-     print("Demo complete!")
-
-
- if __name__ == "__main__":
-     asyncio.run(main())
- ```
-
- ---
-
- ## Verification Checklist
-
- ### Unit Tests
- ```bash
- # Run just OpenAlex tests
- uv run pytest tests/unit/tools/test_openalex.py -v
-
- # Expected: All tests pass
- ```
-
- ### Integration Test (Manual)
- ```bash
- # Run demo script with real API
- uv run python examples/openalex_demo.py
-
- # Expected: Real results from OpenAlex API
- ```
-
- ### Full Test Suite
- ```bash
- # Ensure nothing broke
- make check
-
- # Expected: All 110+ tests pass, mypy clean
- ```
-
- ---
-
- ## Success Criteria
-
- 1. **Unit tests pass**: All mocked tests in `test_openalex.py` pass
- 2. **Integration works**: Demo script returns real results
- 3. **No regressions**: `make check` passes completely
- 4. **SearchHandler integration**: OpenAlex appears in search results alongside other sources
- 5. **Citation metadata**: Results include `cited_by_count`, `concepts`, `is_open_access`
-
- ---
-
- ## Future Enhancements (P2)
-
- Once basic integration works:
-
- 1. **Citation Network Queries**
-    ```python
-    # Get papers citing a specific work
-    async def get_citing_works(self, work_id: str) -> list[Evidence]:
-        params = {"filter": f"cites:{work_id}"}
-        ...
-    ```
-
- 2. **Concept-Based Search**
-    ```python
-    # Search by OpenAlex concept ID
-    async def search_by_concept(self, concept_id: str) -> list[Evidence]:
-        params = {"filter": f"concepts.id:{concept_id}"}
-        ...
-    ```
-
- 3. **Author Tracking**
-    ```python
-    # Find all works by an author
-    async def search_by_author(self, author_id: str) -> list[Evidence]:
-        params = {"filter": f"authorships.author.id:{author_id}"}
-        ...
-    ```
-
- ---
-
- ## Notes
-
- - OpenAlex is **very generous** with rate limits (no documented hard limit)
- - Adding `mailto` parameter gives priority access (polite pool)
- - Abstract is stored as inverted index - must reconstruct
- - Citation count is a good proxy for paper quality/impact
- - Consider caching responses for repeated queries (see the sketch below)
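-
- A minimal caching sketch (in-memory and unbounded - a deliberate simplification; a real implementation would want TTL/eviction):
-
- ```python
- class CachedOpenAlexTool(OpenAlexTool):
-     """OpenAlexTool with a naive per-instance response cache."""
-
-     def __init__(self, email: str | None = None) -> None:
-         super().__init__(email)
-         self._cache: dict[tuple[str, int], list[Evidence]] = {}
-
-     async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
-         key = (query, max_results)
-         if key not in self._cache:
-             # Only hit the API on a cache miss; Evidence objects are frozen,
-             # so sharing them across callers is safe.
-             self._cache[key] = await super().search(query, max_results)
-         return self._cache[key]
- ```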
docs/brainstorming/implementation/16_PHASE_PUBMED_FULLTEXT.md DELETED
@@ -1,586 +0,0 @@
- # Phase 16: PubMed Full-Text Retrieval
-
- **Priority**: MEDIUM - Enhances evidence quality
- **Effort**: ~3 hours
- **Dependencies**: None (existing PubMed tool sufficient)
-
- ---
-
- ## Prerequisites (COMPLETED)
-
- The `Evidence.metadata` field has been added to `src/utils/models.py` to support:
- ```python
- metadata={"has_fulltext": True}
- ```
-
- ---
-
- ## Architecture Decision: Constructor Parameter vs Method Parameter
-
- **IMPORTANT**: The original spec proposed `include_fulltext` as a method parameter:
- ```python
- # WRONG - SearchHandler won't pass this parameter
- async def search(self, query: str, max_results: int = 10, include_fulltext: bool = False):
- ```
-
- **Problem**: `SearchHandler` calls `tool.search(query, max_results)` uniformly across all tools.
- It has no mechanism to pass tool-specific parameters like `include_fulltext`.
-
- **Solution**: Use a constructor parameter instead:
- ```python
- # CORRECT - Configured at instantiation time
- class PubMedTool:
-     def __init__(self, api_key: str | None = None, include_fulltext: bool = False):
-         self.include_fulltext = include_fulltext
-         ...
- ```
-
- This way, you can create a full-text-enabled PubMed tool:
- ```python
- # In the orchestrator or wherever tools are created
- tools = [
-     PubMedTool(include_fulltext=True),  # Full-text enabled
-     ClinicalTrialsTool(),
-     EuropePMCTool(),
- ]
- ```
-
- ---
-
- ## Overview
-
- Add full-text retrieval for PubMed papers via the BioC API, enabling:
- - Complete paper text for open-access PMC papers
- - Structured sections (intro, methods, results, discussion)
- - Better evidence for LLM synthesis
-
- **Why Full-Text?**
- - Abstracts only give ~200-300 words
- - Full text provides detailed methods, results, figures
- - Reference repo already has this implemented
- - Makes LLM judgments more accurate
-
- ---
-
- ## TDD Implementation Plan
-
- ### Step 1: Write the Tests First
-
- **File**: `tests/unit/tools/test_pubmed_fulltext.py`
-
- ```python
- """Tests for PubMed full-text retrieval."""
-
- import pytest
- import respx
- from httpx import Response
-
- from src.tools.pubmed import PubMedTool
-
-
- class TestPubMedFullText:
-     """Test suite for PubMed full-text functionality."""
-
-     @pytest.fixture
-     def tool(self) -> PubMedTool:
-         return PubMedTool()
-
-     @respx.mock
-     @pytest.mark.asyncio
-     async def test_get_pmc_id_success(self, tool: PubMedTool) -> None:
-         """Should convert PMID to PMCID for full-text access."""
-         mock_response = {
-             "records": [
-                 {
-                     "pmid": "12345678",
-                     "pmcid": "PMC1234567",
-                 }
-             ]
-         }
-
-         respx.get("https://www.ncbi.nlm.nih.gov/pmc/utils/idconv/v1.0/").mock(
-             return_value=Response(200, json=mock_response)
-         )
-
-         pmcid = await tool.get_pmc_id("12345678")
-         assert pmcid == "PMC1234567"
-
-     @respx.mock
-     @pytest.mark.asyncio
-     async def test_get_pmc_id_not_in_pmc(self, tool: PubMedTool) -> None:
-         """Should return None if paper not in PMC."""
-         mock_response = {
-             "records": [
-                 {
-                     "pmid": "12345678",
-                     # No pmcid means not in PMC
-                 }
-             ]
-         }
-
-         respx.get("https://www.ncbi.nlm.nih.gov/pmc/utils/idconv/v1.0/").mock(
-             return_value=Response(200, json=mock_response)
-         )
-
-         pmcid = await tool.get_pmc_id("12345678")
-         assert pmcid is None
-
-     @respx.mock
-     @pytest.mark.asyncio
-     async def test_get_fulltext_success(self, tool: PubMedTool) -> None:
-         """Should retrieve full text for PMC papers."""
-         # Mock BioC API response
-         mock_bioc = {
-             "documents": [
-                 {
-                     "passages": [
-                         {
-                             "infons": {"section_type": "INTRO"},
-                             "text": "Introduction text here.",
-                         },
-                         {
-                             "infons": {"section_type": "METHODS"},
-                             "text": "Methods description here.",
-                         },
-                         {
-                             "infons": {"section_type": "RESULTS"},
-                             "text": "Results summary here.",
-                         },
-                         {
-                             "infons": {"section_type": "DISCUSS"},
-                             "text": "Discussion and conclusions.",
-                         },
-                     ]
-                 }
-             ]
-         }
-
-         respx.get(
-             "https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/pmcoa.cgi/BioC_json/12345678/unicode"
-         ).mock(return_value=Response(200, json=mock_bioc))
-
-         fulltext = await tool.get_fulltext("12345678")
-
-         assert fulltext is not None
-         assert "Introduction text here" in fulltext
-         assert "Methods description here" in fulltext
-         assert "Results summary here" in fulltext
-
-     @respx.mock
-     @pytest.mark.asyncio
-     async def test_get_fulltext_not_available(self, tool: PubMedTool) -> None:
-         """Should return None if full text not available."""
-         respx.get(
-             "https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/pmcoa.cgi/BioC_json/99999999/unicode"
-         ).mock(return_value=Response(404))
-
-         fulltext = await tool.get_fulltext("99999999")
-         assert fulltext is None
-
-     @respx.mock
-     @pytest.mark.asyncio
-     async def test_get_fulltext_structured(self, tool: PubMedTool) -> None:
-         """Should return structured sections dict."""
-         mock_bioc = {
-             "documents": [
-                 {
-                     "passages": [
-                         {"infons": {"section_type": "INTRO"}, "text": "Intro..."},
-                         {"infons": {"section_type": "METHODS"}, "text": "Methods..."},
-                         {"infons": {"section_type": "RESULTS"}, "text": "Results..."},
-                         {"infons": {"section_type": "DISCUSS"}, "text": "Discussion..."},
-                     ]
-                 }
-             ]
-         }
-
-         respx.get(
-             "https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/pmcoa.cgi/BioC_json/12345678/unicode"
-         ).mock(return_value=Response(200, json=mock_bioc))
-
-         sections = await tool.get_fulltext_structured("12345678")
-
-         assert sections is not None
-         assert "introduction" in sections
-         assert "methods" in sections
-         assert "results" in sections
-         assert "discussion" in sections
-
-     @respx.mock
-     @pytest.mark.asyncio
-     async def test_search_with_fulltext_enabled(self) -> None:
-         """Search should include full text when tool is configured for it."""
-         # Create tool WITH full-text enabled via constructor
-         tool = PubMedTool(include_fulltext=True)
-
-         # Mock esearch
-         respx.get("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi").mock(
-             return_value=Response(
-                 200, json={"esearchresult": {"idlist": ["12345678"]}}
-             )
-         )
-
-         # Mock efetch (abstract)
-         mock_xml = """
-         <PubmedArticleSet>
-           <PubmedArticle>
-             <MedlineCitation>
-               <PMID>12345678</PMID>
-               <Article>
-                 <ArticleTitle>Test Paper</ArticleTitle>
-                 <Abstract><AbstractText>Short abstract.</AbstractText></Abstract>
-                 <AuthorList><Author><LastName>Smith</LastName></Author></AuthorList>
-               </Article>
-             </MedlineCitation>
-           </PubmedArticle>
-         </PubmedArticleSet>
-         """
-         respx.get("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi").mock(
-             return_value=Response(200, text=mock_xml)
-         )
-
-         # Mock ID converter
-         respx.get("https://www.ncbi.nlm.nih.gov/pmc/utils/idconv/v1.0/").mock(
-             return_value=Response(
-                 200, json={"records": [{"pmid": "12345678", "pmcid": "PMC1234567"}]}
-             )
-         )
-
-         # Mock BioC full text
-         mock_bioc = {
-             "documents": [
-                 {
-                     "passages": [
-                         {"infons": {"section_type": "INTRO"}, "text": "Full intro..."},
-                     ]
-                 }
-             ]
-         }
-         respx.get(
-             "https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/pmcoa.cgi/BioC_json/12345678/unicode"
-         ).mock(return_value=Response(200, json=mock_bioc))
-
-         # NOTE: No include_fulltext param - it's set via the constructor
-         results = await tool.search("test", max_results=1)
-
-         assert len(results) == 1
-         # Full text should be appended or replace the abstract
-         assert "Full intro" in results[0].content or "Short abstract" in results[0].content
- ```
-
- ---
-
- ### Step 2: Implement Full-Text Methods
-
- **File**: `src/tools/pubmed.py` (additions to existing class)
-
- ```python
- # Add these methods to the PubMedTool class
-
- async def get_pmc_id(self, pmid: str) -> str | None:
-     """
-     Convert PMID to PMCID for full-text access.
-
-     Args:
-         pmid: PubMed ID
-
-     Returns:
-         PMCID if paper is in PMC, None otherwise
-     """
-     url = "https://www.ncbi.nlm.nih.gov/pmc/utils/idconv/v1.0/"
-     params = {"ids": pmid, "format": "json"}
-
-     async with httpx.AsyncClient(timeout=30.0) as client:
-         try:
-             response = await client.get(url, params=params)
-             response.raise_for_status()
-             data = response.json()
-
-             records = data.get("records", [])
-             if records and records[0].get("pmcid"):
-                 return records[0]["pmcid"]
-             return None
-
-         except httpx.HTTPError:
-             return None
-
-
- async def get_fulltext(self, pmid: str) -> str | None:
-     """
-     Get full text for a PubMed paper via BioC API.
-
-     Only works for open-access papers in PubMed Central.
-
-     Args:
-         pmid: PubMed ID
-
-     Returns:
-         Full text as string, or None if not available
-     """
-     url = f"https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/pmcoa.cgi/BioC_json/{pmid}/unicode"
-
-     async with httpx.AsyncClient(timeout=60.0) as client:
-         try:
-             response = await client.get(url)
-             if response.status_code == 404:
-                 return None
-             response.raise_for_status()
-             data = response.json()
-
-             # Extract text from all passages
-             documents = data.get("documents", [])
-             if not documents:
-                 return None
-
-             passages = documents[0].get("passages", [])
-             text_parts = [p.get("text", "") for p in passages if p.get("text")]
-
-             return "\n\n".join(text_parts) if text_parts else None
-
-         except httpx.HTTPError:
-             return None
-
-
- async def get_fulltext_structured(self, pmid: str) -> dict[str, str] | None:
-     """
-     Get structured full text with sections.
-
-     Args:
-         pmid: PubMed ID
-
-     Returns:
-         Dict mapping section names to text, or None if not available
-     """
-     url = f"https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/pmcoa.cgi/BioC_json/{pmid}/unicode"
-
-     async with httpx.AsyncClient(timeout=60.0) as client:
-         try:
-             response = await client.get(url)
-             if response.status_code == 404:
-                 return None
-             response.raise_for_status()
-             data = response.json()
-
-             documents = data.get("documents", [])
-             if not documents:
-                 return None
-
-             # Map section types to readable names
-             section_map = {
-                 "INTRO": "introduction",
-                 "METHODS": "methods",
-                 "RESULTS": "results",
-                 "DISCUSS": "discussion",
-                 "CONCL": "conclusion",
-                 "ABSTRACT": "abstract",
-             }
-
-             sections: dict[str, list[str]] = {}
-             for passage in documents[0].get("passages", []):
-                 section_type = passage.get("infons", {}).get("section_type", "other")
-                 section_name = section_map.get(section_type, "other")
-                 text = passage.get("text", "")
-
-                 if text:
-                     if section_name not in sections:
-                         sections[section_name] = []
-                     sections[section_name].append(text)
-
-             # Join multiple passages per section
-             return {k: "\n\n".join(v) for k, v in sections.items()}
-
-         except httpx.HTTPError:
-             return None
- ```
-
- ---
-
- ### Step 3: Update Constructor and Search Method
-
- Add the full-text flag to the constructor and update search to use it:
-
- ```python
- class PubMedTool:
-     """Search tool for PubMed/NCBI."""
-
-     def __init__(
-         self,
-         api_key: str | None = None,
-         include_fulltext: bool = False,  # NEW CONSTRUCTOR PARAM
-     ) -> None:
-         self.api_key = api_key or settings.ncbi_api_key
-         if self.api_key == "your-ncbi-key-here":
-             self.api_key = None
-         self._last_request_time = 0.0
-         self.include_fulltext = include_fulltext  # Store for use in search()
-
-     async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
-         """
-         Search PubMed and return evidence.
-
-         Note: Full-text enrichment is controlled by a constructor parameter,
-         not a method parameter, because SearchHandler doesn't pass extra args.
-         """
-         # ... existing search logic ...
-
-         evidence_list = self._parse_pubmed_xml(fetch_resp.text)
-
-         # Optionally enrich with full text (if configured at construction)
-         if self.include_fulltext:
-             evidence_list = await self._enrich_with_fulltext(evidence_list)
-
-         return evidence_list
-
-     async def _enrich_with_fulltext(
-         self, evidence_list: list[Evidence]
-     ) -> list[Evidence]:
-         """Attempt to add full text to evidence items."""
-         enriched = []
-
-         for evidence in evidence_list:
-             # Extract PMID from URL
-             url = evidence.citation.url
-             pmid = url.rstrip("/").split("/")[-1] if url else None
-
-             if pmid:
-                 fulltext = await self.get_fulltext(pmid)
-                 if fulltext:
-                     # Replace abstract with full text (truncated)
-                     evidence = Evidence(
-                         content=fulltext[:8000],  # Larger limit for full text
-                         citation=evidence.citation,
-                         relevance=evidence.relevance,
-                         metadata={
-                             **evidence.metadata,
-                             "has_fulltext": True,
-                         },
-                     )
-
-             enriched.append(evidence)
-
-         return enriched
- ```
-
- ---
-
- ## Demo Script
-
- **File**: `examples/pubmed_fulltext_demo.py`
-
- ```python
- #!/usr/bin/env python3
- """Demo script to verify PubMed full-text retrieval."""
-
- import asyncio
-
- from src.tools.pubmed import PubMedTool
-
-
- async def main() -> None:
-     """Run PubMed full-text demo."""
-     tool = PubMedTool()
-
-     print("=" * 60)
-     print("PubMed Full-Text Demo")
-     print("=" * 60)
-
-     # Test 1: Convert PMID to PMCID
-     print("\n[Test 1] Converting PMID to PMCID...")
-     # Use a known open-access paper
-     test_pmid = "34450029"  # Example: COVID-related open-access paper
-     pmcid = await tool.get_pmc_id(test_pmid)
-     print(f"PMID {test_pmid} -> PMCID: {pmcid or 'Not in PMC'}")
-
-     # Test 2: Get full text
-     print("\n[Test 2] Fetching full text...")
-     if pmcid:
-         fulltext = await tool.get_fulltext(test_pmid)
-         if fulltext:
-             print(f"Full text length: {len(fulltext)} characters")
-             print(f"Preview: {fulltext[:500]}...")
-         else:
-             print("Full text not available")
-
-     # Test 3: Get structured sections
-     print("\n[Test 3] Fetching structured sections...")
-     if pmcid:
-         sections = await tool.get_fulltext_structured(test_pmid)
-         if sections:
-             print("Available sections:")
-             for section, text in sections.items():
-                 print(f"  - {section}: {len(text)} chars")
-         else:
-             print("Structured text not available")
-
-     # Test 4: Search with full text
-     # (enabled via the constructor, per the architecture decision above)
-     print("\n[Test 4] Search with full-text enrichment...")
-     ft_tool = PubMedTool(include_fulltext=True)
-     results = await ft_tool.search("metformin cancer open access", max_results=3)
-
-     for i, evidence in enumerate(results, 1):
-         has_ft = evidence.metadata.get("has_fulltext", False)
-         print(f"\n--- Result {i} ---")
-         print(f"Title: {evidence.citation.title}")
-         print(f"Has Full Text: {has_ft}")
-         print(f"Content Length: {len(evidence.content)} chars")
-
-     print("\n" + "=" * 60)
-     print("Demo complete!")
-
-
- if __name__ == "__main__":
-     asyncio.run(main())
- ```
-
- ---
-
- ## Verification Checklist
-
- ### Unit Tests
- ```bash
- # Run full-text tests
- uv run pytest tests/unit/tools/test_pubmed_fulltext.py -v
-
- # Run all PubMed tests
- uv run pytest tests/unit/tools/test_pubmed.py -v
-
- # Expected: All tests pass
- ```
-
- ### Integration Test (Manual)
- ```bash
- # Run demo with real API
- uv run python examples/pubmed_fulltext_demo.py
-
- # Expected: Real full text from PMC papers
- ```
-
- ### Full Test Suite
- ```bash
- make check
- # Expected: All tests pass, mypy clean
- ```
-
- ---
-
- ## Success Criteria
-
- 1. **ID conversion works**: PMID -> PMCID conversion successful
- 2. **Full-text retrieval works**: BioC API returns paper text
- 3. **Structured sections work**: Can get intro/methods/results/discussion separately
- 4. **Search integration works**: `PubMedTool(include_fulltext=True)` enriches results
- 5. **No regressions**: Existing tests still pass
- 6. **Graceful degradation**: Non-PMC papers still return abstracts
-
- ---
-
- ## Notes
-
- - Only ~30% of PubMed papers have full text in PMC
- - BioC API has no documented rate limit, but be respectful
- - Full text can be very long - truncate appropriately (see the section-selection sketch below)
- - Consider caching full text responses (they don't change)
- - Timeout should be longer for full text (60s vs 30s)
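-
- One way downstream code might select sections before truncating - the PMID reuses the demo's example, and the section choice is illustrative, not prescriptive:
-
- ```python
- sections = await tool.get_fulltext_structured("34450029")
- if sections:
-     # Prefer the sections that matter most for synthesis, then cap the length.
-     chosen = [sections[name] for name in ("results", "discussion") if name in sections]
-     summary_input = "\n\n".join(chosen)[:8000]
- ```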
docs/brainstorming/implementation/17_PHASE_RATE_LIMITING.md DELETED
@@ -1,540 +0,0 @@
- # Phase 17: Rate Limiting with `limits` Library
-
- **Priority**: P0 CRITICAL - Prevents API blocks
- **Effort**: ~1 hour
- **Dependencies**: None
-
- ---
-
- ## CRITICAL: Async Safety Requirements
-
- **WARNING**: The rate limiter MUST be async-safe. Blocking the event loop will freeze:
- - The Gradio UI
- - All parallel searches
- - The orchestrator
-
- **Rules**:
- 1. **NEVER use `time.sleep()`** - Always use `await asyncio.sleep()`
- 2. **NEVER use blocking while loops** - Use async-aware polling
- 3. **The `limits` library check is synchronous** - Wrap it carefully
-
- The implementation below uses a polling pattern that:
- - Checks the limit (synchronous, fast)
- - If exceeded, `await asyncio.sleep()` (non-blocking)
- - Retries the check
-
- **Alternative**: If `limits` proves problematic, use `aiolimiter`, which is pure-async.
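-
- A minimal sketch of the `aiolimiter` alternative (assuming the `aiolimiter` package; the rate shown is the no-key NCBI limit):
-
- ```python
- from aiolimiter import AsyncLimiter
-
- # At most 3 acquisitions per 1-second window, enforced without blocking the loop.
- limiter = AsyncLimiter(max_rate=3, time_period=1)
-
- async def fetch() -> None:
-     async with limiter:  # suspends (never blocks) until capacity is available
-         ...  # make the HTTP request here
- ```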
-
- ---
-
- ## Overview
-
- Replace the naive `asyncio.sleep` rate limiting with a proper rate limiter built on the `limits` library, which provides:
- - Moving window rate limiting
- - Per-API configurable limits
- - Thread-safe storage
- - Already used in reference repo
-
- **Why This Matters**
- - NCBI will block us without proper rate limiting (3/sec without key, 10/sec with)
- - Current implementation only has a simple sleep delay
- - Need coordinated limits across all PubMed calls
- - Professional-grade rate limiting prevents production issues
-
- ---
-
- ## Current State
-
- ### What We Have (`src/tools/pubmed.py:20-21, 34-41`)
-
- ```python
- RATE_LIMIT_DELAY = 0.34  # ~3 requests/sec without API key
-
- async def _rate_limit(self) -> None:
-     """Enforce NCBI rate limiting."""
-     loop = asyncio.get_running_loop()
-     now = loop.time()
-     elapsed = now - self._last_request_time
-     if elapsed < self.RATE_LIMIT_DELAY:
-         await asyncio.sleep(self.RATE_LIMIT_DELAY - elapsed)
-     self._last_request_time = loop.time()
- ```
-
- ### Problems
-
- 1. **Not shared across instances**: Each `PubMedTool()` has its own counter
- 2. **Simple delay vs moving window**: Doesn't handle bursts properly
- 3. **Hardcoded rate**: Doesn't adapt to API key presence
- 4. **No backoff on 429**: Just retries blindly
-
- ---
-
- ## TDD Implementation Plan
-
- ### Step 1: Add Dependency
-
- **File**: `pyproject.toml`
-
- ```toml
- dependencies = [
-     # ... existing deps ...
-     "limits>=3.0",
- ]
- ```
-
- Then run:
- ```bash
- uv sync
- ```
-
- ---
-
- ### Step 2: Write the Tests First
-
- **File**: `tests/unit/tools/test_rate_limiting.py`
-
- ```python
- """Tests for rate limiting functionality."""
-
- import asyncio
- import time
-
- import pytest
-
- from src.tools.rate_limiter import (
-     RateLimiter,
-     get_pubmed_limiter,
-     reset_pubmed_limiter,
- )
-
-
- class TestRateLimiter:
-     """Test suite for rate limiter."""
-
-     def test_create_limiter_without_api_key(self) -> None:
-         """Should create 3/sec limiter without API key."""
-         limiter = RateLimiter(rate="3/second")
-         assert limiter.rate == "3/second"
-
-     def test_create_limiter_with_api_key(self) -> None:
-         """Should create 10/sec limiter with API key."""
-         limiter = RateLimiter(rate="10/second")
-         assert limiter.rate == "10/second"
-
-     @pytest.mark.asyncio
-     async def test_limiter_allows_requests_under_limit(self) -> None:
-         """Should allow requests under the rate limit."""
-         limiter = RateLimiter(rate="10/second")
-
-         # 3 requests should all succeed immediately
-         for _ in range(3):
-             allowed = await limiter.acquire()
-             assert allowed is True
-
-     @pytest.mark.asyncio
-     async def test_limiter_blocks_when_exceeded(self) -> None:
-         """Should wait when rate limit exceeded."""
-         limiter = RateLimiter(rate="2/second")
-
-         # First 2 should be instant
-         await limiter.acquire()
-         await limiter.acquire()
-
-         # Third should block until the first hit leaves the 1-second window
-         start = time.monotonic()
-         await limiter.acquire()
-         elapsed = time.monotonic() - start
-
-         # Should have waited close to 1 second (conservative lower bound)
-         assert elapsed >= 0.3
-
-     @pytest.mark.asyncio
-     async def test_limiter_resets_after_window(self) -> None:
-         """Rate limit should reset after time window."""
-         limiter = RateLimiter(rate="5/second")
-
-         # Use up the limit
-         for _ in range(5):
-             await limiter.acquire()
-
-         # Wait for window to pass
-         await asyncio.sleep(1.1)
-
-         # Should be allowed again
-         start = time.monotonic()
-         await limiter.acquire()
-         elapsed = time.monotonic() - start
-
-         assert elapsed < 0.1  # Should be nearly instant
-
-
- class TestGetPubmedLimiter:
-     """Test PubMed-specific limiter factory."""
-
-     def setup_method(self) -> None:
-         """Reset the singleton so each test starts clean (the rate is fixed at first creation)."""
-         reset_pubmed_limiter()
-
-     def test_limiter_without_api_key(self) -> None:
-         """Should return 3/sec limiter without key."""
-         limiter = get_pubmed_limiter(api_key=None)
-         assert "3" in limiter.rate
-
-     def test_limiter_with_api_key(self) -> None:
-         """Should return 10/sec limiter with key."""
-         limiter = get_pubmed_limiter(api_key="my-api-key")
-         assert "10" in limiter.rate
-
-     def test_limiter_is_singleton(self) -> None:
-         """Same API key should return same limiter instance."""
-         limiter1 = get_pubmed_limiter(api_key="key1")
-         limiter2 = get_pubmed_limiter(api_key="key1")
-         assert limiter1 is limiter2
-
-     def test_different_keys_share_limiter(self) -> None:
-         """Different API keys share one limiter: we're limiting against the same NCBI API."""
-         limiter1 = get_pubmed_limiter(api_key="key1")
-         limiter2 = get_pubmed_limiter(api_key="key2")
-         assert limiter1 is limiter2  # Shared NCBI rate limit
- ```
-
- ---
-
- ### Step 3: Create Rate Limiter Module
-
- **File**: `src/tools/rate_limiter.py`
-
- ```python
- """Rate limiting utilities using the limits library."""
-
- import asyncio
- from typing import ClassVar
-
- from limits import RateLimitItem, parse
- from limits.storage import MemoryStorage
- from limits.strategies import MovingWindowRateLimiter
-
-
- class RateLimiter:
-     """
-     Async-compatible rate limiter using the limits library.
-
-     Uses a moving window algorithm for smooth rate limiting.
-     """
-
-     def __init__(self, rate: str) -> None:
-         """
-         Initialize rate limiter.
-
-         Args:
-             rate: Rate string like "3/second" or "10/second"
-         """
-         self.rate = rate
-         self._storage = MemoryStorage()
-         self._limiter = MovingWindowRateLimiter(self._storage)
-         self._rate_limit: RateLimitItem = parse(rate)
-         self._identity = "default"  # Single identity for shared limiting
-
-     async def acquire(self, wait: bool = True) -> bool:
-         """
-         Acquire permission to make a request.
-
-         ASYNC-SAFE: Uses asyncio.sleep(), never time.sleep().
-         The polling pattern allows other coroutines to run while waiting.
-
-         Args:
-             wait: If True, wait until allowed. If False, return immediately.
-
-         Returns:
-             True if allowed, False if not (only when wait=False)
-         """
-         while True:
-             # Check if we can proceed (synchronous, fast - ~microseconds)
-             if self._limiter.hit(self._rate_limit, self._identity):
-                 return True
-
-             if not wait:
-                 return False
-
-             # CRITICAL: Use asyncio.sleep(), NOT time.sleep().
-             # This yields control to the event loop, allowing other
-             # coroutines (UI, parallel searches) to run.
-             await asyncio.sleep(0.1)
-
-     def reset(self) -> None:
-         """Reset the rate limiter (for testing)."""
-         self._storage.reset()
-
-
- # Singleton limiter for PubMed/NCBI
- _pubmed_limiter: RateLimiter | None = None
-
-
- def get_pubmed_limiter(api_key: str | None = None) -> RateLimiter:
-     """
-     Get the shared PubMed rate limiter.
-
-     Rate depends on whether an API key is provided:
-     - Without key: 3 requests/second
-     - With key: 10 requests/second
-
-     Args:
-         api_key: NCBI API key (optional)
-
-     Returns:
-         Shared RateLimiter instance
-     """
-     global _pubmed_limiter
-
-     if _pubmed_limiter is None:
-         rate = "10/second" if api_key else "3/second"
-         _pubmed_limiter = RateLimiter(rate)
-
-     return _pubmed_limiter
-
-
- def reset_pubmed_limiter() -> None:
-     """Reset the PubMed limiter (for testing)."""
-     global _pubmed_limiter
-     _pubmed_limiter = None
-
-
- # Factory for other APIs
- class RateLimiterFactory:
-     """Factory for creating/getting rate limiters for different APIs."""
-
-     _limiters: ClassVar[dict[str, RateLimiter]] = {}
-
-     @classmethod
-     def get(cls, api_name: str, rate: str) -> RateLimiter:
-         """
-         Get or create a rate limiter for an API.
-
-         Args:
-             api_name: Unique identifier for the API
-             rate: Rate limit string (e.g., "10/second")
-
-         Returns:
-             RateLimiter instance (shared for same api_name)
-         """
-         if api_name not in cls._limiters:
-             cls._limiters[api_name] = RateLimiter(rate)
-         return cls._limiters[api_name]
-
-     @classmethod
-     def reset_all(cls) -> None:
-         """Reset all limiters (for testing)."""
-         cls._limiters.clear()
- ```
-
- ---
-
- ### Step 4: Update PubMed Tool
-
- **File**: `src/tools/pubmed.py` (replace rate limiting code)
-
- ```python
- # Replace imports and rate limiting
-
- from src.tools.rate_limiter import get_pubmed_limiter
-
-
- class PubMedTool:
-     """Search tool for PubMed/NCBI."""
-
-     BASE_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"
-     HTTP_TOO_MANY_REQUESTS = 429
-
-     def __init__(self, api_key: str | None = None) -> None:
-         self.api_key = api_key or settings.ncbi_api_key
-         if self.api_key == "your-ncbi-key-here":
-             self.api_key = None
-         # Use shared rate limiter
-         self._limiter = get_pubmed_limiter(self.api_key)
-
-     async def _rate_limit(self) -> None:
-         """Enforce NCBI rate limiting using shared limiter."""
-         await self._limiter.acquire()
-
-     # ... rest of class unchanged ...
- ```
-
- ---
-
- ### Step 5: Add Rate Limiters for Other APIs
-
- **File**: `src/tools/clinicaltrials.py` (optional)
-
- ```python
- from src.tools.rate_limiter import RateLimiterFactory
-
-
- class ClinicalTrialsTool:
-     def __init__(self) -> None:
-         # ClinicalTrials.gov doesn't document limits, but be conservative
-         self._limiter = RateLimiterFactory.get("clinicaltrials", "5/second")
-
-     async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
-         await self._limiter.acquire()
-         # ... rest of method ...
- ```
-
- **File**: `src/tools/europepmc.py` (optional)
-
- ```python
- from src.tools.rate_limiter import RateLimiterFactory
-
-
- class EuropePMCTool:
-     def __init__(self) -> None:
-         # Europe PMC is generous, but still be respectful
-         self._limiter = RateLimiterFactory.get("europepmc", "10/second")
-
-     async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
-         await self._limiter.acquire()
-         # ... rest of method ...
- ```
-
- ---
-
- ## Demo Script
-
- **File**: `examples/rate_limiting_demo.py`
-
- ```python
- #!/usr/bin/env python3
- """Demo script to verify rate limiting works correctly."""
-
- import asyncio
- import time
-
- from src.tools.pubmed import PubMedTool
- from src.tools.rate_limiter import RateLimiter, get_pubmed_limiter, reset_pubmed_limiter
-
-
- async def test_basic_limiter() -> None:
-     """Test basic rate limiter behavior."""
-     print("=" * 60)
-     print("Rate Limiting Demo")
-     print("=" * 60)
-
-     # Test 1: Basic limiter
-     print("\n[Test 1] Testing 3/second limiter...")
-     limiter = RateLimiter("3/second")
-
-     start = time.monotonic()
-     for i in range(6):
-         await limiter.acquire()
-         elapsed = time.monotonic() - start
-         print(f"  Request {i+1} at {elapsed:.2f}s")
-
-     total = time.monotonic() - start
-     print(f"  Total time for 6 requests: {total:.2f}s (expected ~1s: the window frees 3 slots after 1s)")
-
-
- async def test_pubmed_limiter() -> None:
-     """Test PubMed-specific limiter."""
-     print("\n[Test 2] Testing PubMed limiter (shared)...")
-
-     reset_pubmed_limiter()  # Clean state
-
-     # Without API key: 3/sec
-     limiter = get_pubmed_limiter(api_key=None)
-     print(f"  Rate without key: {limiter.rate}")
-
-     # Multiple tools should share the same limiter
-     tool1 = PubMedTool()
-     tool2 = PubMedTool()
-
-     # Verify they share the limiter
-     print(f"  Tools share limiter: {tool1._limiter is tool2._limiter}")
-
-
- async def test_concurrent_requests() -> None:
-     """Test rate limiting under concurrent load."""
-     print("\n[Test 3] Testing concurrent request limiting...")
-
-     limiter = RateLimiter("5/second")
-
-     async def make_request(i: int) -> float:
-         await limiter.acquire()
-         return time.monotonic()
-
-     start = time.monotonic()
-     # Launch 10 concurrent requests
-     tasks = [make_request(i) for i in range(10)]
-     times = await asyncio.gather(*tasks)
-
-     # Calculate distribution
-     relative_times = [t - start for t in times]
-     print(f"  Request times: {[f'{t:.2f}s' for t in sorted(relative_times)]}")
-
-     total = max(relative_times)
-     print(f"  All 10 requests completed in {total:.2f}s (expected ~1s)")
-
-
- async def main() -> None:
-     await test_basic_limiter()
-     await test_pubmed_limiter()
-     await test_concurrent_requests()
-
-     print("\n" + "=" * 60)
-     print("Demo complete!")
-
-
- if __name__ == "__main__":
-     asyncio.run(main())
- ```
-
- ---
-
- ## Verification Checklist
-
- ### Unit Tests
- ```bash
- # Run rate limiting tests
- uv run pytest tests/unit/tools/test_rate_limiting.py -v
-
- # Expected: All tests pass
- ```
-
- ### Integration Test (Manual)
- ```bash
- # Run demo
- uv run python examples/rate_limiting_demo.py
-
- # Expected: Requests properly spaced
- ```
-
- ### Full Test Suite
- ```bash
- make check
- # Expected: All tests pass, mypy clean
- ```
-
- ---
-
- ## Success Criteria
-
- 1. **`limits` library installed**: Dependency added to pyproject.toml
- 2. **RateLimiter class works**: Can create and use limiters
- 3. **PubMed uses new limiter**: Shared limiter across instances
- 4. **Rate adapts to API key**: 3/sec without, 10/sec with
- 5. **Concurrent requests handled**: Multiple async requests properly queued
- 6. **No regressions**: All existing tests pass
-
- ---
-
- ## API Rate Limit Reference
-
- | API | Without Key | With Key |
- |-----|-------------|----------|
- | PubMed/NCBI | 3/sec | 10/sec |
- | ClinicalTrials.gov | Undocumented (~5/sec safe) | N/A |
- | Europe PMC | ~10-20/sec (generous) | N/A |
- | OpenAlex | ~100k/day (no per-sec limit) | Faster with `mailto` |
-
- ---
-
- ## Notes
-
- - The `limits` library uses a moving window algorithm (fairer than a fixed window)
- - The singleton pattern ensures all PubMed calls share the limit
- - The factory pattern allows easy extension to other APIs
- - Consider adding 429 response detection + exponential backoff (see the sketch below)
- - In production, consider Redis storage for distributed rate limiting
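-
- A minimal sketch of 429 detection with exponential backoff, using `tenacity` (already used by the Phase 15 tool) and `httpx`; the retry policy shown is illustrative, not prescriptive:
-
- ```python
- import httpx
- from tenacity import retry, retry_if_exception, stop_after_attempt, wait_exponential
-
-
- def _is_rate_limited(exc: BaseException) -> bool:
-     """Only retry when the server explicitly said 'Too Many Requests'."""
-     return (
-         isinstance(exc, httpx.HTTPStatusError)
-         and exc.response.status_code == 429
-     )
-
-
- @retry(
-     retry=retry_if_exception(_is_rate_limited),
-     wait=wait_exponential(multiplier=1, min=1, max=30),
-     stop=stop_after_attempt(5),
-     reraise=True,
- )
- async def fetch_with_backoff(client: httpx.AsyncClient, url: str) -> httpx.Response:
-     response = await client.get(url)
-     response.raise_for_status()  # raises HTTPStatusError on 429
-     return response
- ```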
docs/brainstorming/implementation/README.md DELETED
@@ -1,143 +0,0 @@
1
- # Implementation Plans
2
-
3
- TDD implementation plans based on the brainstorming documents. Each phase is a self-contained vertical slice with tests, implementation, and demo scripts.
4
-
5
- ---
6
-
7
- ## Prerequisites (COMPLETED)
8
-
9
- The following foundational changes have been implemented to support all three phases:
10
-
11
- | Change | File | Status |
12
- |--------|------|--------|
13
- | Add `"openalex"` to `SourceName` | `src/utils/models.py:9` | ✅ Done |
14
- | Add `metadata` field to `Evidence` | `src/utils/models.py:39-42` | ✅ Done |
15
- | Export all tools from `__init__.py` | `src/tools/__init__.py` | ✅ Done |
16
-
17
- All 110 tests pass after these changes.
18
-
19
- ---
20
-
21
- ## Priority Order
22
-
23
- | Phase | Name | Priority | Effort | Value |
24
- |-------|------|----------|--------|-------|
25
- | **17** | Rate Limiting | P0 CRITICAL | 1 hour | Stability |
26
- | **15** | OpenAlex | HIGH | 2-3 hours | Very High |
27
- | **16** | PubMed Full-Text | MEDIUM | 3 hours | High |
28
-
29
- **Recommended implementation order**: 17 → 15 → 16
30
-
31
- ---
32
-
33
- ## Phase 15: OpenAlex Integration
34
-
35
- **File**: [15_PHASE_OPENALEX.md](./15_PHASE_OPENALEX.md)
36
-
37
- Add OpenAlex as 4th data source for:
38
- - Citation networks (who cites whom)
39
- - Concept tagging (semantic discovery)
40
- - 209M+ scholarly works
41
- - Free, no API key required
42
-
43
- **Quick Start**:
44
- ```bash
45
- # Create the tool
46
- touch src/tools/openalex.py
47
- touch tests/unit/tools/test_openalex.py
48
-
49
- # Run tests first (TDD)
50
- uv run pytest tests/unit/tools/test_openalex.py -v
51
-
52
- # Demo
53
- uv run python examples/openalex_demo.py
54
- ```
55
-
56
- ---
57
-
58
- ## Phase 16: PubMed Full-Text
59
-
60
- **File**: [16_PHASE_PUBMED_FULLTEXT.md](./16_PHASE_PUBMED_FULLTEXT.md)
61
-
62
- Add full-text retrieval via BioC API for:
63
- - Complete paper text (not just abstracts)
64
- - Structured sections (intro, methods, results)
65
- - Better evidence for LLM synthesis
66
-
67
- **Quick Start**:
68
- ```bash
69
- # Add methods to existing pubmed.py
70
- # Tests in test_pubmed_fulltext.py
71
-
72
- # Run tests
73
- uv run pytest tests/unit/tools/test_pubmed_fulltext.py -v
74
-
75
- # Demo
76
- uv run python examples/pubmed_fulltext_demo.py
77
- ```
78
-
79
- ---
80
-
81
- ## Phase 17: Rate Limiting
82
-
83
- **File**: [17_PHASE_RATE_LIMITING.md](./17_PHASE_RATE_LIMITING.md)
84
-
85
- Replace naive sleep-based rate limiting with `limits` library for:
86
- - Moving window algorithm
87
- - Shared limits across instances
88
- - Configurable per-API rates
89
- - Production-grade stability
90
-
91
- **Quick Start**:
92
- ```bash
93
- # Add dependency
94
- uv add limits
95
-
96
- # Create module
97
- touch src/tools/rate_limiter.py
98
- touch tests/unit/tools/test_rate_limiting.py
99
-
100
- # Run tests
101
- uv run pytest tests/unit/tools/test_rate_limiting.py -v
102
-
103
- # Demo
104
- uv run python examples/rate_limiting_demo.py
105
- ```
106
-
107
- ---
108
-
109
- ## TDD Workflow
110
-
111
- Each implementation doc follows this pattern:
112
-
113
- 1. **Write tests first** - Define expected behavior
114
- 2. **Run tests** - Verify they fail (red)
115
- 3. **Implement** - Write minimal code to pass
116
- 4. **Run tests** - Verify they pass (green)
117
- 5. **Refactor** - Clean up if needed
118
- 6. **Demo** - Verify end-to-end with real APIs
119
- 7. **`make check`** - Ensure no regressions
120
-
121
- ---
122
-
123
- ## Related Brainstorming Docs
124
-
125
- These implementation plans are derived from:
126
-
127
- - [00_ROADMAP_SUMMARY.md](../00_ROADMAP_SUMMARY.md) - Priority overview
128
- - [01_PUBMED_IMPROVEMENTS.md](../01_PUBMED_IMPROVEMENTS.md) - PubMed details
129
- - [02_CLINICALTRIALS_IMPROVEMENTS.md](../02_CLINICALTRIALS_IMPROVEMENTS.md) - CT.gov details
130
- - [03_EUROPEPMC_IMPROVEMENTS.md](../03_EUROPEPMC_IMPROVEMENTS.md) - Europe PMC details
131
- - [04_OPENALEX_INTEGRATION.md](../04_OPENALEX_INTEGRATION.md) - OpenAlex integration
132
-
133
- ---
134
-
135
- ## Future Phases (Not Yet Documented)
136
-
137
- Based on brainstorming, these could be added later:
138
-
139
- - **Phase 18**: ClinicalTrials.gov Results Retrieval
140
- - **Phase 19**: Europe PMC Annotations API
141
- - **Phase 20**: Drug Name Normalization (RxNorm)
142
- - **Phase 21**: Citation Network Queries (OpenAlex)
143
- - **Phase 22**: Semantic Search with Embeddings
 
docs/brainstorming/magentic-pydantic/00_SITUATION_AND_PLAN.md DELETED
@@ -1,189 +0,0 @@
1
- # Situation Analysis: Pydantic-AI + Microsoft Agent Framework Integration
2
-
3
- **Date:** November 27, 2025
4
- **Status:** ACTIVE DECISION REQUIRED
5
- **Risk Level:** HIGH - DO NOT MERGE PR #41 UNTIL RESOLVED
6
-
7
- ---
8
-
9
- ## 1. The Problem
10
-
11
- We almost merged a refactor that would have **deleted** multi-agent orchestration capability from the codebase, mistakenly believing pydantic-ai and Microsoft Agent Framework were mutually exclusive.
12
-
13
- **They are not.** They are complementary:
14
- - **pydantic-ai** (Library): Ensures LLM outputs match Pydantic schemas
15
- - **Microsoft Agent Framework** (Framework): Orchestrates multi-agent workflows
16
-
17
- ---
18
-
19
- ## 2. Current Branch State
20
-
21
- | Branch | Location | Has Agent Framework? | Has Pydantic-AI Improvements? | Status |
22
- |--------|----------|---------------------|------------------------------|--------|
23
- | `origin/dev` | GitHub | YES | NO | **SAFE - Source of Truth** |
24
- | `huggingface-upstream/dev` | HF Spaces | YES | NO | **SAFE - Same as GitHub** |
25
- | `origin/main` | GitHub | YES | NO | **SAFE** |
26
- | `feat/pubmed-fulltext` | GitHub | NO (deleted) | YES | **DANGER - Has destructive refactor** |
27
- | `refactor/pydantic-unification` | Local | NO (deleted) | YES | **DANGER - Redundant, delete** |
28
- | Local `dev` | Local only | NO (deleted) | YES | **DANGER - NOT PUSHED (thankfully)** |
29
-
30
- ### Key Files at Risk
31
-
32
- **On `origin/dev` (PRESERVED):**
33
- ```text
34
- src/agents/
35
- ├── analysis_agent.py # StatisticalAnalyzer wrapper
36
- ├── hypothesis_agent.py # Hypothesis generation
37
- ├── judge_agent.py # JudgeHandler wrapper
38
- ├── magentic_agents.py # Multi-agent definitions
39
- ├── report_agent.py # Report synthesis
40
- ├── search_agent.py # SearchHandler wrapper
41
- ├── state.py # Thread-safe state management
42
- └── tools.py # @ai_function decorated tools
43
-
44
- src/orchestrator_magentic.py # Multi-agent orchestrator
45
- src/utils/llm_factory.py # Centralized LLM client factory
46
- ```
47
-
48
- **Deleted in refactor branch (would be lost if merged):**
49
- - All of the above
50
-
51
- ---
52
-
53
- ## 3. Target Architecture
54
-
55
- ```text
56
- ┌─────────────────────────────────────────────────────────────────┐
57
- │ Microsoft Agent Framework (Orchestration Layer) │
58
- │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
59
- │ │ SearchAgent │→ │ JudgeAgent │→ │ ReportAgent │ │
60
- │ │ (BaseAgent) │ │ (BaseAgent) │ │ (BaseAgent) │ │
61
- │ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
62
- │ │ │ │ │
63
- │ ▼ ▼ ▼ │
64
- │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
65
- │ │ pydantic-ai │ │ pydantic-ai │ │ pydantic-ai │ │
66
- │ │ Agent() │ │ Agent() │ │ Agent() │ │
67
- │ │ output_type= │ │ output_type= │ │ output_type= │ │
68
- │ │ SearchResult │ │ JudgeAssess │ │ Report │ │
69
- │ └──────────────┘ └──────────────┘ └──────────────┘ │
70
- └─────────────────────────────────────────────────────────────────┘
71
- ```
72
-
73
- **Why this architecture:**
74
- 1. **Agent Framework** handles: workflow coordination, state passing, middleware, observability
75
- 2. **pydantic-ai** handles: type-safe LLM calls within each agent
76
-
77
- ---
78
-
79
- ## 4. CRITICAL: Naming Confusion Clarification
80
-
81
- > **Senior Agent Review Finding:** The codebase uses "magentic" in file names (e.g., `orchestrator_magentic.py`, `magentic_agents.py`) but this is **NOT** the `magentic` PyPI package. It's Microsoft Agent Framework (`agent-framework-core`).
82
-
83
- **The naming confusion:**
84
- - `magentic` (PyPI package): A different library for structured LLM outputs
85
- - "Magentic" (in our codebase): Our internal name for Microsoft Agent Framework integration
86
- - `agent-framework-core` (PyPI package): Microsoft's actual multi-agent orchestration framework
87
-
88
- **Recommended future action:** Rename `orchestrator_magentic.py` → `orchestrator_advanced.py` to eliminate confusion.
89
-
90
- ---
91
-
92
- ## 5. What the Refactor DID Get Right
93
-
94
- The refactor branch (`feat/pubmed-fulltext`) has some valuable improvements:
95
-
96
- 1. **`judges.py` unified `get_model()`** - Supports OpenAI, Anthropic, AND HuggingFace via pydantic-ai
97
- 2. **HuggingFace free tier support** - `HuggingFaceModel` integration
98
- 3. **Test fix** - Properly mocks `HuggingFaceModel` class
99
- 4. **Removed broken magentic optional dependency** from pyproject.toml (this was correct - the old `magentic` package is different from Microsoft Agent Framework)
100
-
101
- **What it got WRONG:**
102
- 1. Deleted `src/agents/` entirely instead of refactoring them
103
- 2. Deleted `src/orchestrator_magentic.py` instead of fixing it
104
- 3. Conflated "magentic" (old package) with "Microsoft Agent Framework" (current framework)
105
-
106
- ---
107
-
108
- ## 6. Options for Path Forward
109
-
110
- ### Option A: Abandon Refactor, Start Fresh
111
- - Close PR #41
112
- - Delete `feat/pubmed-fulltext` and `refactor/pydantic-unification` branches
113
- - Reset local `dev` to match `origin/dev`
114
- - Cherry-pick ONLY the good parts (judges.py improvements, HF support)
115
- - **Pros:** Clean, safe
116
- - **Cons:** Lose some work, need to redo carefully
117
-
118
- ### Option B: Cherry-Pick Good Parts to origin/dev
119
- - Do NOT merge PR #41
120
- - Create new branch from `origin/dev`
121
- - Cherry-pick specific commits/changes that improve pydantic-ai usage
122
- - Keep agent framework code intact
123
- - **Pros:** Preserves both, surgical
124
- - **Cons:** Requires careful file-by-file review
125
-
126
- ### Option C: Revert Deletions in Refactor Branch
127
- - On `feat/pubmed-fulltext`, restore deleted agent files from `origin/dev`
128
- - Keep the pydantic-ai improvements
129
- - Merge THAT to dev
130
- - **Pros:** Gets both
131
- - **Cons:** Complex git operations, risk of conflicts
132
-
133
- ---
134
-
135
- ## 7. Recommended Action: Option B (Cherry-Pick)
136
-
137
- **Step-by-step:**
138
-
139
- 1. **Close PR #41** (do not merge)
140
- 2. **Delete redundant branches:**
141
- - `refactor/pydantic-unification` (local)
142
- - Reset local `dev` to `origin/dev`
143
- 3. **Create new branch from origin/dev:**
144
- ```bash
145
- git checkout -b feat/pydantic-ai-improvements origin/dev
146
- ```
147
- 4. **Cherry-pick or manually port these improvements:**
148
- - `src/agent_factory/judges.py` - the unified `get_model()` function
149
- - `examples/free_tier_demo.py` - HuggingFace demo
150
- - Test improvements
151
- 5. **Do NOT delete any agent framework files**
152
- 6. **Create PR for review**
153
-
154
- ---
155
-
156
- ## 8. Files to Cherry-Pick (Safe Improvements)
157
-
158
- | File | What Changed | Safe to Port? |
159
- |------|-------------|---------------|
160
- | `src/agent_factory/judges.py` | Added `HuggingFaceModel` support in `get_model()` | YES |
161
- | `examples/free_tier_demo.py` | New demo for HF inference | YES |
162
- | `tests/unit/agent_factory/test_judges.py` | Fixed HF model mocking | YES |
163
- | `pyproject.toml` | Removed old `magentic` optional dep | MAYBE (review carefully) |
164
-
165
- ---
166
-
167
- ## 9. Questions to Answer Before Proceeding
168
-
169
- 1. **For the hackathon**: Do we need full multi-agent orchestration, or is single-agent sufficient?
170
- 2. **For DeepCritical mainline**: Is the plan to use Microsoft Agent Framework for orchestration?
171
- 3. **Timeline**: How much time do we have to get this right?
172
-
173
- ---
174
-
175
- ## 10. Immediate Actions (DO NOW)
176
-
177
- - [ ] **DO NOT merge PR #41**
178
- - [ ] Close PR #41 with comment explaining the situation
179
- - [ ] Do not push local `dev` branch anywhere
180
- - [ ] Confirm HuggingFace Spaces is untouched (it is - verified)
181
-
182
- ---
183
-
184
- ## 11. Decision Log
185
-
186
- | Date | Decision | Rationale |
187
- |------|----------|-----------|
188
- | 2025-11-27 | Pause refactor merge | Discovered agent framework and pydantic-ai are complementary, not exclusive |
189
- | TBD | ? | Awaiting decision on path forward |
 
docs/brainstorming/magentic-pydantic/01_ARCHITECTURE_SPEC.md DELETED
@@ -1,289 +0,0 @@
1
- # Architecture Specification: Dual-Mode Agent System
2
-
3
- **Date:** November 27, 2025
4
- **Status:** SPECIFICATION
5
- **Goal:** Graceful degradation from full multi-agent orchestration to simple single-agent mode
6
-
7
- ---
8
-
9
- ## 1. Core Concept: Two Operating Modes
10
-
11
- ```text
12
- ┌─────────────────────────────────────────────────────────────────────┐
13
- │ USER REQUEST │
14
- │ │ │
15
- │ ▼ │
16
- │ ┌─────────────────┐ │
17
- │ │ Mode Selection │ │
18
- │ │ (Auto-detect) │ │
19
- │ └────────┬────────┘ │
20
- │ │ │
21
- │ ┌───────────────┴───────────────┐ │
22
- │ │ │ │
23
- │ ▼ ▼ │
24
- │ ┌─────────────────┐ ┌─────────────────┐ │
25
- │ │ SIMPLE MODE │ │ ADVANCED MODE │ │
26
- │ │ (Free Tier) │ │ (Paid Tier) │ │
27
- │ │ │ │ │ │
28
- │ │ pydantic-ai │ │ MS Agent Fwk │ │
29
- │ │ single-agent │ │ + pydantic-ai │ │
30
- │ │ loop │ │ multi-agent │ │
31
- │ └─────────────────┘ └─────────────────┘ │
32
- │ │ │ │
33
- │ └───────────────┬───────────────┘ │
34
- │ ▼ │
35
- │ ┌─────────────────┐ │
36
- │ │ Research Report │ │
37
- │ │ with Citations │ │
38
- │ └─────────────────┘ │
39
- └─────────────────────────────────────────────────────────────────────┘
40
- ```
41
-
42
- ---
43
-
44
- ## 2. Mode Comparison
45
-
46
- | Aspect | Simple Mode | Advanced Mode |
47
- |--------|-------------|---------------|
48
- | **Trigger** | No API key OR `LLM_PROVIDER=huggingface` | OpenAI API key present (currently OpenAI only) |
49
- | **Framework** | pydantic-ai only | Microsoft Agent Framework + pydantic-ai |
50
- | **Architecture** | Single orchestrator loop | Multi-agent coordination |
51
- | **Agents** | One agent does Search→Judge→Report | SearchAgent, JudgeAgent, ReportAgent, AnalysisAgent |
52
- | **State Management** | Simple dict | Thread-safe `MagenticState` with context vars |
53
- | **Quality** | Good (functional) | Better (specialized agents, coordination) |
54
- | **Cost** | Free (HuggingFace Inference) | Paid (OpenAI/Anthropic) |
55
- | **Use Case** | Demos, hackathon, budget-constrained | Production, research quality |
56
-
57
- ---
58
-
59
- ## 3. Simple Mode Architecture (pydantic-ai Only)
60
-
61
- ```text
62
- ┌─────────────────────────────────────────────────────┐
63
- │ Orchestrator │
64
- │ │
65
- │ while not sufficient and iteration < max: │
66
- │ 1. SearchHandler.execute(query) │
67
- │ 2. JudgeHandler.assess(evidence) ◄── pydantic-ai Agent │
68
- │ 3. if sufficient: break │
69
- │ 4. query = judge.next_queries │
70
- │ │
71
- │ return ReportGenerator.generate(evidence) │
72
- └─────────────────────────────────────────────────────┘
73
- ```
74
-
75
- **Components:**
76
- - `src/orchestrator.py` - Simple loop orchestrator
77
- - `src/agent_factory/judges.py` - JudgeHandler with pydantic-ai
78
- - `src/tools/search_handler.py` - Scatter-gather search
79
- - `src/tools/pubmed.py`, `clinicaltrials.py`, `europepmc.py` - Search tools
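Condensed into code, the loop reads roughly as follows (method names follow the diagram above; signatures are simplified, not the exact ones in `src/orchestrator.py`):

```python
async def run_simple(question: str, search, judge, report, max_iterations: int = 5) -> str:
    evidence: list = []
    query = question
    for _ in range(max_iterations):
        evidence.extend(await search.execute(query))         # 1. scatter-gather search
        assessment = await judge.assess(question, evidence)  # 2. pydantic-ai judge
        if assessment.sufficient:                            # 3. stop when enough evidence
            break
        query = " ".join(assessment.next_queries)            # 4. refine and loop
    return await report.generate(question, evidence)         # final report with citations
```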
80
-
81
- ---
82
-
83
- ## 4. Advanced Mode Architecture (MS Agent Framework + pydantic-ai)
84
-
85
- ```text
86
- ┌─────────────────────────────────────────────────────────────────────┐
87
- │ Microsoft Agent Framework Orchestrator │
88
- │ │
89
- │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
90
- │ │ SearchAgent │───▶│ JudgeAgent │───▶│ ReportAgent │ │
91
- │ │ (BaseAgent) │ │ (BaseAgent) │ │ (BaseAgent) │ │
92
- │ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │
93
- │ │ │ │ │
94
- │ ▼ ▼ ▼ │
95
- │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
96
- │ │ pydantic-ai │ │ pydantic-ai │ │ pydantic-ai │ │
97
- │ │ Agent() │ │ Agent() │ │ Agent() │ │
98
- │ │ output_type=│ │ output_type=│ │ output_type=│ │
99
- │ │ SearchResult│ │ JudgeAssess │ │ Report │ │
100
- │ └─────────────┘ └─────────────┘ └─────────────┘ │
101
- │ │
102
- │ Shared State: MagenticState (thread-safe via contextvars) │
103
- │ - evidence: list[Evidence] │
104
- │ - embedding_service: EmbeddingService │
105
- └─────────────────────────────────────────────────────────────────────┘
106
- ```
107
-
108
- **Components:**
109
- - `src/orchestrator_magentic.py` - Multi-agent orchestrator
110
- - `src/agents/search_agent.py` - SearchAgent (BaseAgent)
111
- - `src/agents/judge_agent.py` - JudgeAgent (BaseAgent)
112
- - `src/agents/report_agent.py` - ReportAgent (BaseAgent)
113
- - `src/agents/analysis_agent.py` - AnalysisAgent (BaseAgent)
114
- - `src/agents/state.py` - Thread-safe state management
115
- - `src/agents/tools.py` - @ai_function decorated tools
116
-
117
- ---
118
-
119
- ## 5. Mode Selection Logic
120
-
121
- ```python
122
- # src/orchestrator_factory.py (actual implementation)
123
-
124
- def create_orchestrator(
125
- search_handler: SearchHandlerProtocol | None = None,
126
- judge_handler: JudgeHandlerProtocol | None = None,
127
- config: OrchestratorConfig | None = None,
128
- mode: Literal["simple", "magentic", "advanced"] | None = None,
129
- ) -> Any:
130
- """
131
- Auto-select orchestrator based on available credentials.
132
-
133
- Priority:
134
- 1. If mode explicitly set, use that
135
- 2. If OpenAI key available -> Advanced Mode (currently OpenAI only)
136
- 3. Otherwise -> Simple Mode (HuggingFace free tier)
137
- """
138
- effective_mode = _determine_mode(mode)
139
-
140
- if effective_mode == "advanced":
141
- orchestrator_cls = _get_magentic_orchestrator_class()
142
- return orchestrator_cls(max_rounds=config.max_iterations if config else 10)
143
-
144
- # Simple mode requires handlers
145
- if search_handler is None or judge_handler is None:
146
- raise ValueError("Simple mode requires search_handler and judge_handler")
147
-
148
- return Orchestrator(
149
- search_handler=search_handler,
150
- judge_handler=judge_handler,
151
- config=config,
152
- )
153
- ```
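The `_determine_mode` helper is elided above; a plausible reading of the priority rules (details are assumptions, the authoritative logic lives in `src/orchestrator_factory.py`):

```python
import os
from typing import Literal

Mode = Literal["simple", "magentic", "advanced"]

def _determine_mode(mode: Mode | None) -> Literal["simple", "advanced"]:
    if mode is not None:  # 1. an explicit mode always wins
        return "advanced" if mode in ("magentic", "advanced") else "simple"
    if os.environ.get("OPENAI_API_KEY"):  # 2. OpenAI key -> Advanced Mode
        return "advanced"
    return "simple"  # 3. free-tier fallback (HuggingFace)
```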
154
-
155
- ---
156
-
157
- ## 6. Shared Components (Both Modes Use)
158
-
159
- These components work in both modes:
160
-
161
- | Component | Purpose |
162
- |-----------|---------|
163
- | `src/tools/pubmed.py` | PubMed search |
164
- | `src/tools/clinicaltrials.py` | ClinicalTrials.gov search |
165
- | `src/tools/europepmc.py` | Europe PMC search |
166
- | `src/tools/search_handler.py` | Scatter-gather orchestration |
167
- | `src/tools/rate_limiter.py` | Rate limiting |
168
- | `src/utils/models.py` | Evidence, Citation, JudgeAssessment |
169
- | `src/utils/config.py` | Settings |
170
- | `src/services/embeddings.py` | Vector search (optional) |
171
-
172
- ---
173
-
174
- ## 7. pydantic-ai Integration Points
175
-
176
- Both modes use pydantic-ai for structured LLM outputs:
177
-
178
- ```python
179
- # In JudgeHandler (both modes)
180
- from pydantic_ai import Agent
181
- from pydantic_ai.models.huggingface import HuggingFaceModel
182
- from pydantic_ai.models.openai import OpenAIModel
183
- from pydantic_ai.models.anthropic import AnthropicModel
184
-
185
- class JudgeHandler:
186
- def __init__(self, model: Any = None):
187
- self.model = model or get_model() # Auto-selects based on config
188
- self.agent = Agent(
189
- model=self.model,
190
- output_type=JudgeAssessment, # Structured output!
191
- system_prompt=SYSTEM_PROMPT,
192
- )
193
-
194
- async def assess(self, question: str, evidence: list[Evidence]) -> JudgeAssessment:
195
- result = await self.agent.run(format_prompt(question, evidence))
196
- return result.output # Guaranteed to be JudgeAssessment
197
- ```
198
-
199
- ---
200
-
201
- ## 8. Microsoft Agent Framework Integration Points
202
-
203
- Advanced mode wraps pydantic-ai agents in BaseAgent:
204
-
205
- ```python
206
- # In JudgeAgent (advanced mode only)
207
- from agent_framework import BaseAgent, AgentRunResponse, ChatMessage, Role
208
-
209
- class JudgeAgent(BaseAgent):
210
- def __init__(self, judge_handler: JudgeHandlerProtocol):
211
- super().__init__(
212
- name="JudgeAgent",
213
- description="Evaluates evidence quality",
214
- )
215
- self._handler = judge_handler # Uses pydantic-ai internally
216
-
217
- async def run(self, messages, **kwargs) -> AgentRunResponse:
218
- question = extract_question(messages)
219
- evidence = self._evidence_store.get("current", [])
220
-
221
- # Delegate to pydantic-ai powered handler
222
- assessment = await self._handler.assess(question, evidence)
223
-
224
- return AgentRunResponse(
225
- messages=[ChatMessage(role=Role.ASSISTANT, text=format_response(assessment))],
226
- additional_properties={"assessment": assessment.model_dump()},
227
- )
228
- ```
229
-
230
- ---
231
-
232
- ## 9. Benefits of This Architecture
233
-
234
- 1. **Graceful Degradation**: Works without API keys (free tier)
235
- 2. **Progressive Enhancement**: Better with API keys (orchestration)
236
- 3. **Code Reuse**: pydantic-ai handlers shared between modes
237
- 4. **Hackathon Ready**: Demo works without requiring paid keys
238
- 5. **Production Ready**: Full orchestration available when needed
239
- 6. **Future Proof**: Can add more agents to advanced mode
240
- 7. **Testable**: Simple mode is easier to unit test
241
-
242
- ---
243
-
244
- ## 10. Known Risks and Mitigations
245
-
246
- > **From Senior Agent Review**
247
-
248
- ### 10.1 Bridge Complexity (MEDIUM)
249
-
250
- **Risk:** In Advanced Mode, agents (Agent Framework) wrap handlers (pydantic-ai). Both are async. Context variables (`MagenticState`) must propagate correctly through the pydantic-ai call stack.
251
-
252
- **Mitigation:**
253
- - pydantic-ai uses standard Python `contextvars`, which naturally propagate through `await` chains (illustrated below)
254
- - Test context propagation explicitly in integration tests
255
- - If issues arise, pass state explicitly rather than via context vars
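A self-contained illustration of why the first bullet holds, using only the standard library (no project code involved):

```python
import asyncio
from contextvars import ContextVar

current_state: ContextVar[dict] = ContextVar("current_state")

async def inner() -> int:
    # Reads the value the caller set, two awaits deeper in the stack.
    return current_state.get()["round"]

async def outer() -> int:
    current_state.set({"round": 7})
    await asyncio.sleep(0)  # a suspension point does not drop the context
    return await inner()

assert asyncio.run(outer()) == 7
```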
256
-
257
- ### 10.2 Integration Drift (MEDIUM)
258
-
259
- **Risk:** Simple Mode and Advanced Mode might diverge in behavior over time (e.g., Simple Mode uses logic A, Advanced Mode uses logic B).
260
-
261
- **Mitigation:**
262
- - Both modes MUST call the exact same underlying Tools (`src/tools/*`) and Handlers (`src/agent_factory/*`)
263
- - Handlers are the single source of truth for business logic
264
- - Agents are thin wrappers that delegate to handlers
265
-
266
- ### 10.3 Testing Burden (LOW-MEDIUM)
267
-
268
- **Risk:** Two distinct orchestrators (`src/orchestrator.py` and `src/orchestrator_magentic.py`) doubles integration testing surface area.
269
-
270
- **Mitigation:**
271
- - Unit test handlers independently (shared code)
272
- - Integration tests for each mode separately
273
- - End-to-end tests verify same output for same input (determinism permitting)
274
-
275
- ### 10.4 Dependency Conflicts (LOW)
276
-
277
- **Risk:** `agent-framework-core` might conflict with `pydantic-ai`'s dependencies (e.g., different pydantic versions).
278
-
279
- **Status:** Both use `pydantic>=2.x`. Should be compatible.
280
-
281
- ---
282
-
283
- ## 11. Naming Clarification
284
-
285
- > See `00_SITUATION_AND_PLAN.md` Section 4 for full details.
286
-
287
- **Important:** The codebase uses "magentic" in file names (`orchestrator_magentic.py`, `magentic_agents.py`) but this refers to our internal naming for Microsoft Agent Framework integration, **NOT** the `magentic` PyPI package.
288
-
289
- **Future action:** Rename to `orchestrator_advanced.py` to eliminate confusion.
 
docs/brainstorming/magentic-pydantic/02_IMPLEMENTATION_PHASES.md DELETED
@@ -1,112 +0,0 @@
1
- # Implementation Phases: Dual-Mode Agent System
2
-
3
- **Date:** November 27, 2025
4
- **Status:** IMPLEMENTATION PLAN (REVISED)
5
- **Strategy:** TDD (Test-Driven Development), SOLID Principles
6
- **Dependency Strategy:** PyPI (agent-framework-core)
7
-
8
- ---
9
-
10
- ## Phase 0: Environment Validation & Cleanup
11
-
12
- **Goal:** Ensure clean state and dependencies are correctly installed.
13
-
14
- ### Step 0.1: Verify PyPI Package
15
- The `agent-framework-core` package is published on PyPI by Microsoft. Verify installation:
16
-
17
- ```bash
18
- uv sync --all-extras
19
- python -c "from agent_framework import ChatAgent; print('OK')"
20
- ```
21
-
22
- ### Step 0.2: Branch State
23
- We are on `feat/dual-mode-architecture`. Ensure it is up to date with `origin/dev` before starting.
24
-
25
- **Note:** The `reference_repos/agent-framework` folder is kept for reference/documentation only.
26
- The production dependency uses the official PyPI release.
27
-
28
- ---
29
-
30
- ## Phase 1: Pydantic-AI Improvements (Simple Mode)
31
-
32
- **Goal:** Implement `HuggingFaceModel` support in `JudgeHandler` using strict TDD.
33
-
34
- ### Step 1.1: Test First (Red)
35
- Create `tests/unit/agent_factory/test_judges_factory.py`:
36
- - Test `get_model()` returns `HuggingFaceModel` when `LLM_PROVIDER=huggingface`.
37
- - Test `get_model()` respects `HF_TOKEN`.
38
- - Test fallback to OpenAI.
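A sketch of the first of these tests (the import path mirrors the pydantic-ai examples elsewhere in these docs; note the real `Settings` object may cache environment variables at import time, so a reload fixture could be needed):

```python
from pydantic_ai.models.huggingface import HuggingFaceModel

from src.agent_factory.judges import get_model


def test_get_model_prefers_huggingface_when_configured(monkeypatch):
    monkeypatch.setenv("LLM_PROVIDER", "huggingface")
    monkeypatch.setenv("HF_TOKEN", "hf_dummy_token")
    assert isinstance(get_model(), HuggingFaceModel)
```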
39
-
40
- ### Step 1.2: Implementation (Green)
41
- Update `src/utils/config.py`:
42
- - Add `huggingface_model` and `hf_token` fields.
43
-
44
- Update `src/agent_factory/judges.py`:
45
- - Implement `get_model` with the logic derived from the tests.
46
- - Use dependency injection for the model where possible.
47
-
48
- ### Step 1.3: Refactor
49
- Ensure `JudgeHandler` is loosely coupled from the specific model provider.
50
-
51
- ---
52
-
53
- ## Phase 2: Orchestrator Factory (The Switch)
54
-
55
- **Goal:** Implement the factory pattern to switch between Simple and Advanced modes.
56
-
57
- ### Step 2.1: Test First (Red)
58
- Create `tests/unit/test_orchestrator_factory.py`:
59
- - Test `create_orchestrator` returns `Orchestrator` (simple) when API keys are missing.
60
- - Test `create_orchestrator` returns `MagenticOrchestrator` (advanced) when OpenAI key exists.
61
- - Test explicit mode override.
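A sketch of the fallback case (the `stub_search` and `stub_judge` fixtures are hypothetical stand-ins for handler doubles):

```python
from src.orchestrator import Orchestrator
from src.orchestrator_factory import create_orchestrator


def test_factory_defaults_to_simple_mode(monkeypatch, stub_search, stub_judge):
    # With no OpenAI key, auto-detection must fall back to the simple loop.
    monkeypatch.delenv("OPENAI_API_KEY", raising=False)
    orch = create_orchestrator(search_handler=stub_search, judge_handler=stub_judge)
    assert isinstance(orch, Orchestrator)
```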
62
-
63
- ### Step 2.2: Implementation (Green)
64
- Update `src/orchestrator_factory.py` to implement the selection logic.
65
-
66
- ---
67
-
68
- ## Phase 3: Agent Framework Integration (Advanced Mode)
69
-
70
- **Goal:** Integrate Microsoft Agent Framework from PyPI.
71
-
72
- ### Step 3.1: Dependency Management
73
- The `agent-framework-core` package is installed from PyPI:
74
- ```toml
75
- [project.optional-dependencies]
76
- magentic = [
77
- "agent-framework-core>=1.0.0b251120,<2.0.0", # Microsoft Agent Framework (PyPI)
78
- ]
79
- ```
80
- Install with: `uv sync --all-extras`
81
-
82
- ### Step 3.2: Verify Imports (Test First)
83
- Create `tests/unit/agents/test_agent_imports.py`:
84
- - Verify `from agent_framework import ChatAgent` works.
85
- - Verify instantiation of `ChatAgent` with a mock client.
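A minimal shape for that test (the `ChatAgent` keyword arguments shown here are assumptions to check against the installed package version):

```python
from unittest.mock import MagicMock


def test_chat_agent_imports_and_instantiates():
    from agent_framework import ChatAgent  # raises if the extra is not installed

    agent = ChatAgent(chat_client=MagicMock(), instructions="smoke test")
    assert agent is not None
```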
86
-
87
- ### Step 3.3: Update Agents
88
- Refactor `src/agents/*.py` to ensure they match the exact signature of the `ChatAgent` class shipped in `agent-framework-core`.
89
- - **SOLID:** Ensure agents have single responsibilities.
90
- - **DRY:** Share tool definitions between Pydantic-AI simple mode and Agent Framework advanced mode.
91
-
92
- ---
93
-
94
- ## Phase 4: UI & End-to-End Verification
95
-
96
- **Goal:** Update Gradio to reflect the active mode.
97
-
98
- ### Step 4.1: UI Updates
99
- Update `src/app.py` to display "Simple Mode" vs "Advanced Mode".
100
-
101
- ### Step 4.2: End-to-End Test
102
- Run the full loop:
103
- 1. Simple Mode (No Keys) -> Search -> Judge (HF) -> Report.
104
- 2. Advanced Mode (OpenAI Key) -> SearchAgent -> JudgeAgent -> ReportAgent.
105
-
106
- ---
107
-
108
- ## Phase 5: Cleanup & Documentation
109
-
110
- - Remove unused code.
111
- - Update main README.md.
112
- - Final `make check`.
 
docs/brainstorming/magentic-pydantic/03_IMMEDIATE_ACTIONS.md DELETED
@@ -1,112 +0,0 @@
1
- # Immediate Actions Checklist
2
-
3
- **Date:** November 27, 2025
4
- **Priority:** Execute in order
5
-
6
- ---
7
-
8
- ## Before Starting Implementation
9
-
10
- ### 1. Close PR #41 (CRITICAL)
11
-
12
- ```bash
13
- gh pr close 41 --comment "Architecture decision changed. Cherry-picking improvements to preserve both pydantic-ai and Agent Framework capabilities."
14
- ```
15
-
16
- ### 2. Verify HuggingFace Spaces is Safe
17
-
18
- ```bash
19
- # Should show agent framework files exist
20
- git ls-tree --name-only huggingface-upstream/dev -- src/agents/
21
- git ls-tree --name-only huggingface-upstream/dev -- src/orchestrator_magentic.py
22
- ```
23
-
24
- Expected output: Files should exist (they do as of this writing).
25
-
26
- ### 3. Clean Local Environment
27
-
28
- ```bash
29
- # Switch to main first
30
- git checkout main
31
-
32
- # Delete problematic branches
33
- git branch -D refactor/pydantic-unification 2>/dev/null || true
34
- git branch -D feat/pubmed-fulltext 2>/dev/null || true
35
-
36
- # Reset local dev to origin/dev
37
- git branch -D dev 2>/dev/null || true
38
- git checkout -b dev origin/dev
39
-
40
- # Verify agent framework code exists
41
- ls src/agents/
42
- # Expected: __init__.py, analysis_agent.py, hypothesis_agent.py, judge_agent.py,
43
- # magentic_agents.py, report_agent.py, search_agent.py, state.py, tools.py
44
-
45
- ls src/orchestrator_magentic.py
46
- # Expected: file exists
47
- ```
48
-
49
- ### 4. Create Fresh Feature Branch
50
-
51
- ```bash
52
- git checkout -b feat/dual-mode-architecture origin/dev
53
- ```
54
-
55
- ---
56
-
57
- ## Decision Points
58
-
59
- Before proceeding, confirm:
60
-
61
- 1. **For hackathon**: Do we need advanced mode, or is simple mode sufficient?
62
- - Simple mode = faster to implement, works today
63
- - Advanced mode = better quality, more work
64
-
65
- 2. **Timeline**: How much time do we have?
66
- - If < 1 day: Focus on simple mode only
67
- - If > 1 day: Implement dual-mode
68
-
69
- 3. **Dependencies**: Is `agent-framework-core` available?
70
- - Check: `pip index versions agent-framework-core`
71
- - If not on PyPI, may need to install from GitHub
72
-
73
- ---
74
-
75
- ## Quick Start (Simple Mode Only)
76
-
77
- If time is limited, implement only simple mode improvements:
78
-
79
- ```bash
80
- # On feat/dual-mode-architecture branch
81
-
82
- # 1. Update judges.py to add HuggingFace support
83
- # 2. Update config.py to add HF settings
84
- # 3. Create free_tier_demo.py
85
- # 4. Run make check
86
- # 5. Create PR to dev
87
- ```
88
-
89
- This gives you free-tier capability without touching agent framework code.
90
-
91
- ---
92
-
93
- ## Quick Start (Full Dual-Mode)
94
-
95
- If time permits, implement full dual-mode:
96
-
97
- Follow phases 1-6 in `02_IMPLEMENTATION_PHASES.md`
98
-
99
- ---
100
-
101
- ## Emergency Rollback
102
-
103
- If anything goes wrong:
104
-
105
- ```bash
106
- # Reset to safe state
107
- git checkout main
108
- git branch -D feat/dual-mode-architecture
109
- git checkout -b feat/dual-mode-architecture origin/dev
110
- ```
111
-
112
- Origin/dev is the safe fallback - it has agent framework intact.
 
docs/brainstorming/magentic-pydantic/04_FOLLOWUP_REVIEW_REQUEST.md DELETED
@@ -1,158 +0,0 @@
1
- # Follow-Up Review Request: Did We Implement Your Feedback?
2
-
3
- **Date:** November 27, 2025
4
- **Context:** You previously reviewed our dual-mode architecture plan and provided feedback. We have updated the documentation. Please verify we correctly implemented your recommendations.
5
-
6
- ---
7
-
8
- ## Your Original Feedback vs Our Changes
9
-
10
- ### 1. Naming Confusion Clarification
11
-
12
- **Your feedback:** "You are using Microsoft Agent Framework, but you've named your integration 'Magentic'. This caused the confusion."
13
-
14
- **Our change:** Added Section 4 in `00_SITUATION_AND_PLAN.md`:
15
- ```markdown
16
- ## 4. CRITICAL: Naming Confusion Clarification
17
-
18
- > **Senior Agent Review Finding:** The codebase uses "magentic" in file names
19
- > (e.g., `orchestrator_magentic.py`, `magentic_agents.py`) but this is **NOT**
20
- > the `magentic` PyPI package. It's Microsoft Agent Framework.
21
-
22
- **The naming confusion:**
23
- - `magentic` (PyPI package): A different library for structured LLM outputs
24
- - "Magentic" (in our codebase): Our internal name for Microsoft Agent Framework integration
25
- - `agent-framework-core` (PyPI package): Microsoft's actual multi-agent orchestration framework
26
-
27
- **Recommended future action:** Rename `orchestrator_magentic.py` → `orchestrator_advanced.py`
28
- ```
29
-
30
- **Status:** ✅ IMPLEMENTED
31
-
32
- ---
33
-
34
- ### 2. Bridge Complexity Warning
35
-
36
- **Your feedback:** "You must ensure MagenticState (context vars) propagates correctly through the pydantic-ai call stack."
37
-
38
- **Our change:** Added Section 10.1 in `01_ARCHITECTURE_SPEC.md`:
39
- ```markdown
40
- ### 10.1 Bridge Complexity (MEDIUM)
41
-
42
- **Risk:** In Advanced Mode, agents (Agent Framework) wrap handlers (pydantic-ai).
43
- Both are async. Context variables (`MagenticState`) must propagate correctly.
44
-
45
- **Mitigation:**
46
- - pydantic-ai uses standard Python `contextvars`, which naturally propagate through `await` chains
47
- - Test context propagation explicitly in integration tests
48
- - If issues arise, pass state explicitly rather than via context vars
49
- ```
50
-
51
- **Status:** ✅ IMPLEMENTED
52
-
53
- ---
54
-
55
- ### 3. Integration Drift Warning
56
-
57
- **Your feedback:** "Simple Mode and Advanced Mode might diverge in behavior."
58
-
59
- **Our change:** Added Section 10.2 in `01_ARCHITECTURE_SPEC.md`:
60
- ```markdown
61
- ### 10.2 Integration Drift (MEDIUM)
62
-
63
- **Risk:** Simple Mode and Advanced Mode might diverge in behavior over time.
64
-
65
- **Mitigation:**
66
- - Both modes MUST call the exact same underlying Tools (`src/tools/*`) and Handlers (`src/agent_factory/*`)
67
- - Handlers are the single source of truth for business logic
68
- - Agents are thin wrappers that delegate to handlers
69
- ```
70
-
71
- **Status:** ✅ IMPLEMENTED
72
-
73
- ---
74
-
75
- ### 4. Testing Burden Warning
76
-
77
- **Your feedback:** "You now have two distinct orchestrators to maintain. This doubles your integration testing surface area."
78
-
79
- **Our change:** Added Section 10.3 in `01_ARCHITECTURE_SPEC.md`:
80
- ```markdown
81
- ### 10.3 Testing Burden (LOW-MEDIUM)
82
-
83
- **Risk:** Two distinct orchestrators doubles integration testing surface area.
84
-
85
- **Mitigation:**
86
- - Unit test handlers independently (shared code)
87
- - Integration tests for each mode separately
88
- - End-to-end tests verify same output for same input
89
- ```
90
-
91
- **Status:** ✅ IMPLEMENTED
92
-
93
- ---
94
-
95
- ### 5. Rename Recommendation
96
-
97
- **Your feedback:** "Rename `src/orchestrator_magentic.py` to `src/orchestrator_advanced.py`"
98
-
99
- **Our change:** Added Step 3.4 in `02_IMPLEMENTATION_PHASES.md`:
100
- ```markdown
101
- ### Step 3.4: (OPTIONAL) Rename "Magentic" to "Advanced"
102
-
103
- > **Senior Agent Recommendation:** Rename files to eliminate confusion.
104
-
105
- git mv src/orchestrator_magentic.py src/orchestrator_advanced.py
106
- git mv src/agents/magentic_agents.py src/agents/advanced_agents.py
107
-
108
- **Note:** This is optional for the hackathon. Can be done in a follow-up PR.
109
- ```
110
-
111
- **Status:** ✅ DOCUMENTED (marked as optional for hackathon)
112
-
113
- ---
114
-
115
- ### 6. Standardize Wrapper Recommendation
116
-
117
- **Your feedback:** "Create a generic `PydanticAiAgentWrapper(BaseAgent)` class instead of manually wrapping each handler."
118
-
119
- **Our change:** NOT YET DOCUMENTED
120
-
121
- **Status:** ⚠️ NOT IMPLEMENTED - Should we add this?
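For discussion, a rough shape of the recommended wrapper, reusing the imports from the `JudgeAgent` example in `01_ARCHITECTURE_SPEC.md` (everything beyond those imports is an assumption):

```python
from agent_framework import AgentRunResponse, BaseAgent, ChatMessage, Role
from pydantic_ai import Agent


class PydanticAiAgentWrapper(BaseAgent):
    """Generic bridge exposing any pydantic-ai Agent as an Agent Framework agent."""

    def __init__(self, name: str, description: str, inner: Agent):
        super().__init__(name=name, description=description)
        self._inner = inner

    async def run(self, messages, **kwargs) -> AgentRunResponse:
        prompt = messages[-1].text if messages else ""
        result = await self._inner.run(prompt)  # structured output via pydantic-ai
        return AgentRunResponse(
            messages=[ChatMessage(role=Role.ASSISTANT, text=str(result.output))]
        )
```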
122
-
123
- ---
124
-
125
- ## Questions for Your Review
126
-
127
- 1. **Did we correctly implement your feedback?** Are there any misunderstandings in how we interpreted your recommendations?
128
-
129
- 2. **Is the "Standardize Wrapper" recommendation critical?** Should we add it to the implementation phases, or is it a nice-to-have for later?
130
-
131
- 3. **Dependency versioning:** You noted `agent-framework-core>=1.0.0b251120` might be ephemeral. Should we:
132
- - Pin to a specific version?
133
- - Use a version range?
134
- - Install from GitHub source?
135
-
136
- 4. **Anything else we missed?**
137
-
138
- ---
139
-
140
- ## Files to Re-Review
141
-
142
- 1. `00_SITUATION_AND_PLAN.md` - Added Section 4 (Naming Clarification)
143
- 2. `01_ARCHITECTURE_SPEC.md` - Added Sections 10-11 (Risks, Naming)
144
- 3. `02_IMPLEMENTATION_PHASES.md` - Added Step 3.4 (Optional Rename)
145
-
146
- ---
147
-
148
- ## Current Branch State
149
-
150
- We are now on `feat/dual-mode-architecture` branched from `origin/dev`:
151
- - ✅ Agent framework code intact (`src/agents/`, `src/orchestrator_magentic.py`)
152
- - ✅ Documentation committed
153
- - ❌ PR #41 still open (need to close it)
154
- - ❌ Cherry-pick of pydantic-ai improvements not yet done
155
-
156
- ---
157
-
158
- Please confirm: **GO / NO-GO** to proceed with Phase 1 (cherry-picking pydantic-ai improvements)?
 
docs/brainstorming/magentic-pydantic/REVIEW_PROMPT_FOR_SENIOR_AGENT.md DELETED
@@ -1,113 +0,0 @@
1
- # Senior Agent Review Prompt
2
-
3
- Copy and paste everything below this line to a fresh Claude/AI session:
4
-
5
- ---
6
-
7
- ## Context
8
-
9
- I am a junior developer working on a HuggingFace hackathon project called DeepCritical. We made a significant architectural mistake and are now trying to course-correct. I need you to act as a **senior staff engineer** and critically review our proposed solution.
10
-
11
- ## The Situation
12
-
13
- We almost merged a refactor that would have **deleted** our multi-agent orchestration capability, mistakenly believing that `pydantic-ai` (a library for structured LLM outputs) and Microsoft's `agent-framework` (a framework for multi-agent orchestration) were mutually exclusive alternatives.
14
-
15
- **They are not.** They are complementary:
16
- - `pydantic-ai` ensures LLM responses match Pydantic schemas (type-safe outputs)
17
- - `agent-framework` orchestrates multiple agents working together (coordination layer)
18
-
19
- We now want to implement a **dual-mode architecture** where:
20
- - **Simple Mode (No API key):** Uses only pydantic-ai with HuggingFace free tier
21
- - **Advanced Mode (With API key):** Uses Microsoft Agent Framework for orchestration, with pydantic-ai inside each agent for structured outputs
22
-
23
- ## Your Task
24
-
25
- Please perform a **deep, critical review** of:
26
-
27
- 1. **The architecture diagram** (image attached: `assets/magentic-pydantic.png`)
28
- 2. **Our documentation** (4 files listed below)
29
- 3. **The actual codebase** to verify our claims
30
-
31
- ## Specific Questions to Answer
32
-
33
- ### Architecture Validation
34
- 1. Is our understanding correct that pydantic-ai and agent-framework are complementary, not competing?
35
- 2. Does the dual-mode architecture diagram accurately represent how these should integrate?
36
- 3. Are there any architectural flaws or anti-patterns in our proposed design?
37
-
38
- ### Documentation Accuracy
39
- 4. Are the branch states we documented accurate? (Check `git log`, `git ls-tree`)
40
- 5. Is our understanding of what code exists where correct?
41
- 6. Are the implementation phases realistic and in the correct order?
42
- 7. Are there any missing steps or dependencies we overlooked?
43
-
44
- ### Codebase Reality Check
45
- 8. Does `origin/dev` actually have the agent framework code intact? Verify by checking:
46
- - `git ls-tree origin/dev -- src/agents/`
47
- - `git ls-tree origin/dev -- src/orchestrator_magentic.py`
48
- 9. What does the current `src/agents/` code actually import? Does it use `agent_framework` or `agent-framework-core`?
49
- 10. Is the `agent-framework-core` package actually available on PyPI, or do we need to install from source?
50
-
51
- ### Implementation Feasibility
52
- 11. Can the cherry-pick strategy we outlined actually work, or are there merge conflicts we're not seeing?
53
- 12. Is the mode auto-detection logic sound?
54
- 13. What are the risks we haven't identified?
55
-
56
- ### Critical Errors Check
57
- 14. Did we miss anything critical in our analysis?
58
- 15. Are there any factual errors in our documentation?
59
- 16. Would a Google/DeepMind senior engineer approve this plan, or would they flag issues?
60
-
61
- ## Files to Review
62
-
63
- Please read these files in order:
64
-
65
- 1. `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepCritical-1/docs/brainstorming/magentic-pydantic/00_SITUATION_AND_PLAN.md`
66
- 2. `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepCritical-1/docs/brainstorming/magentic-pydantic/01_ARCHITECTURE_SPEC.md`
67
- 3. `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepCritical-1/docs/brainstorming/magentic-pydantic/02_IMPLEMENTATION_PHASES.md`
68
- 4. `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepCritical-1/docs/brainstorming/magentic-pydantic/03_IMMEDIATE_ACTIONS.md`
69
-
70
- And the architecture diagram:
71
- 5. `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepCritical-1/assets/magentic-pydantic.png`
72
-
73
- ## Reference Repositories to Consult
74
-
75
- We have local clones of the source-of-truth repositories:
76
-
77
- - **Original DeepCritical:** `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepCritical-1/reference_repos/DeepCritical/`
78
- - **Microsoft Agent Framework:** `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepCritical-1/reference_repos/agent-framework/`
79
- - **Microsoft AutoGen:** `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepCritical-1/reference_repos/autogen-microsoft/`
80
-
81
- Please cross-reference our hackathon fork against these to verify architectural alignment.
82
-
83
- ## Codebase to Analyze
84
-
85
- Our hackathon fork is at:
86
- `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepCritical-1/`
87
-
88
- Key files to examine:
89
- - `src/agents/` - Agent framework integration
90
- - `src/agent_factory/judges.py` - pydantic-ai integration
91
- - `src/orchestrator.py` - Simple mode orchestrator
92
- - `src/orchestrator_magentic.py` - Advanced mode orchestrator
93
- - `src/orchestrator_factory.py` - Mode selection
94
- - `pyproject.toml` - Dependencies
95
-
96
- ## Expected Output
97
-
98
- Please provide:
99
-
100
- 1. **Validation Summary:** Is our plan sound? (YES/NO with explanation)
101
- 2. **Errors Found:** List any factual errors in our documentation
102
- 3. **Missing Items:** What did we overlook?
103
- 4. **Risk Assessment:** What could go wrong?
104
- 5. **Recommended Changes:** Specific edits to our documentation or plan
105
- 6. **Go/No-Go Recommendation:** Should we proceed with this plan?
106
-
107
- ## Tone
108
-
109
- Be brutally honest. If our plan is flawed, say so directly. We would rather know now than after implementation. Don't soften criticism - we need accuracy.
110
-
111
- ---
112
-
113
- END OF PROMPT
 
docs/bugs/FIX_PLAN_MAGENTIC_MODE.md DELETED
@@ -1,227 +0,0 @@
1
- # Fix Plan: Magentic Mode Report Generation
2
-
3
- **Related Bug**: `P0_MAGENTIC_MODE_BROKEN.md`
4
- **Approach**: Test-Driven Development (TDD)
5
- **Estimated Scope**: 4 tasks, ~2-3 hours
6
-
7
- ---
8
-
9
- ## Problem Summary
10
-
11
- Magentic mode runs but fails to produce readable reports due to:
12
-
13
- 1. **Primary Bug**: `MagenticFinalResultEvent.message` returns `ChatMessage` object, not text
14
- 2. **Secondary Bug**: Max rounds (3) reached before ReportAgent completes
15
- 3. **Tertiary Issues**: Stale "bioRxiv" references in prompts
16
-
17
- ---
18
-
19
- ## Fix Order (TDD)
20
-
21
- ### Phase 1: Write Failing Tests
22
-
23
- **Task 1.1**: Create test for ChatMessage text extraction
24
-
25
- ```python
26
- # tests/unit/test_orchestrator_magentic.py
27
-
28
- def test_process_event_extracts_text_from_chat_message():
29
- """Final result event should extract text from ChatMessage object."""
30
- # Arrange: Mock ChatMessage with .content attribute
31
- # Act: Call _process_event with MagenticFinalResultEvent
32
- # Assert: Returned AgentEvent.message is a string, not object repr
33
- ```
34
-
35
- **Task 1.2**: Create test for max rounds configuration
36
-
37
- ```python
38
- def test_orchestrator_uses_configured_max_rounds():
39
- """MagenticOrchestrator should use max_rounds from constructor."""
40
- # Arrange: Create orchestrator with max_rounds=10
41
- # Act: Build workflow
42
- # Assert: Workflow has max_round_count=10
43
- ```
44
-
45
- **Task 1.3**: Create test for bioRxiv reference removal
46
-
47
- ```python
48
- def test_task_prompt_references_europe_pmc():
49
- """Task prompt should reference Europe PMC, not bioRxiv."""
50
- # Arrange: Create orchestrator
51
- # Act: Check task string in run()
52
- # Assert: Contains "Europe PMC", not "bioRxiv"
53
- ```
54
-
55
- ---
56
-
57
- ### Phase 2: Fix ChatMessage Text Extraction
58
-
59
- **File**: `src/orchestrator_magentic.py`
60
- **Lines**: 192-199
61
-
62
- **Current Code**:
63
- ```python
64
- elif isinstance(event, MagenticFinalResultEvent):
65
- text = event.message.text if event.message else "No result"
66
- ```
67
-
68
- **Fixed Code**:
69
- ```python
70
- elif isinstance(event, MagenticFinalResultEvent):
71
- if event.message:
72
- # ChatMessage may have .content or .text depending on version
73
- if hasattr(event.message, 'content') and event.message.content:
74
- text = str(event.message.content)
75
- elif hasattr(event.message, 'text') and event.message.text:
76
- text = str(event.message.text)
77
- else:
78
- # Fallback: convert entire message to string
79
- text = str(event.message)
80
- else:
81
- text = "No result generated"
82
- ```
83
-
84
- **Why**: The `agent_framework.ChatMessage` object structure may vary. We need defensive extraction.
85
-
86
- ---
87
-
88
- ### Phase 3: Fix Max Rounds Configuration
89
-
90
- **File**: `src/orchestrator_magentic.py`
91
- **Lines**: 97-99
92
-
93
- **Current Code**:
94
- ```python
95
- .with_standard_manager(
96
- chat_client=manager_client,
97
- max_round_count=self._max_rounds, # Already uses config
98
- max_stall_count=3,
99
- max_reset_count=2,
100
- )
101
- ```
102
-
103
- **Issue**: Default `max_rounds` in `__init__` is 10, but workflow may need more for complex queries.
104
-
105
- **Fix**: Verify the value flows through correctly. Add logging.
106
-
107
- ```python
108
- logger.info(
109
- "Building Magentic workflow",
110
- max_rounds=self._max_rounds,
111
- max_stall=3,
112
- max_reset=2,
113
- )
114
- ```
115
-
116
- **Also check**: `src/orchestrator_factory.py` passes config correctly:
117
- ```python
118
- return MagenticOrchestrator(
119
- max_rounds=config.max_iterations if config else 10,
120
- )
121
- ```
122
-
123
- ---
124
-
125
- ### Phase 4: Fix Stale bioRxiv References
126
-
127
- **Files to update**:
128
-
129
- | File | Line | Change |
130
- |------|------|--------|
131
- | `src/orchestrator_magentic.py` | 131 | "bioRxiv" → "Europe PMC" |
132
- | `src/agents/magentic_agents.py` | 32-33 | "bioRxiv" → "Europe PMC" |
133
- | `src/app.py` | 202-203 | "bioRxiv" → "Europe PMC" |
134
-
135
- **Search command to verify**:
136
- ```bash
137
- grep -rn "bioRxiv\|biorxiv" src/
138
- ```
139
-
140
- ---
141
-
142
- ## Implementation Checklist
143
-
144
- ```
145
- [ ] Phase 1: Write failing tests
146
- [ ] 1.1 Test ChatMessage text extraction
147
- [ ] 1.2 Test max rounds configuration
148
- [ ] 1.3 Test Europe PMC references
149
-
150
- [ ] Phase 2: Fix ChatMessage extraction
151
- [ ] Update _process_event() in orchestrator_magentic.py
152
- [ ] Run test 1.1 - should pass
153
-
154
- [ ] Phase 3: Fix max rounds
155
- [ ] Add logging to _build_workflow()
156
- [ ] Verify factory passes config correctly
157
- [ ] Run test 1.2 - should pass
158
-
159
- [ ] Phase 4: Fix bioRxiv references
160
- [ ] Update orchestrator_magentic.py task prompt
161
- [ ] Update magentic_agents.py descriptions
162
- [ ] Update app.py UI text
163
- [ ] Run test 1.3 - should pass
164
- [ ] Run grep to verify no remaining refs
165
-
166
- [ ] Final Verification
167
- [ ] make check passes
168
- [ ] All tests pass (108+)
169
- [ ] Manual test: run_magentic.py produces readable report
170
- ```
171
-
172
- ---
173
-
174
- ## Test Commands
175
-
176
- ```bash
177
- # Run specific test file
178
- uv run pytest tests/unit/test_orchestrator_magentic.py -v
179
-
180
- # Run all tests
181
- uv run pytest tests/unit/ -v
182
-
183
- # Full check
184
- make check
185
-
186
- # Manual integration test
187
- set -a && source .env && set +a
188
- uv run python examples/orchestrator_demo/run_magentic.py "metformin alzheimer"
189
- ```
190
-
191
- ---
192
-
193
- ## Success Criteria
194
-
195
- 1. `run_magentic.py` outputs a readable research report (not `<ChatMessage object>`)
196
- 2. Report includes: Executive Summary, Key Findings, Drug Candidates, References
197
- 3. No "Max round count reached" error with default settings
198
- 4. No "bioRxiv" references anywhere in codebase
199
- 5. All 108+ tests pass
200
- 6. `make check` passes
201
-
202
- ---
203
-
204
- ## Files Modified
205
-
206
- ```
207
- src/
208
- ├── orchestrator_magentic.py # ChatMessage fix, logging
209
- ├── agents/magentic_agents.py # bioRxiv → Europe PMC
210
- └── app.py # bioRxiv → Europe PMC
211
-
212
- tests/unit/
213
- └── test_orchestrator_magentic.py # NEW: 3 tests
214
- ```
215
-
216
- ---
217
-
218
- ## Notes for AI Agent
219
-
220
- When implementing this fix plan:
221
-
222
- 1. **DO NOT** create mock data or fake responses
223
- 2. **DO** write real tests that verify actual behavior
224
- 3. **DO** run `make check` after each phase
225
- 4. **DO** test with real OpenAI API key via `.env`
226
- 5. **DO** preserve existing functionality - simple mode must still work
227
- 6. **DO NOT** over-engineer - minimal changes to fix the specific bugs
 
docs/bugs/P0_MAGENTIC_MODE_BROKEN.md DELETED
@@ -1,116 +0,0 @@
1
- # P0 Bug: Magentic Mode Returns ChatMessage Object Instead of Report Text
2
-
3
- **Status**: OPEN
4
- **Priority**: P0 (Critical)
5
- **Date**: 2025-11-27
6
-
7
- ---
8
-
9
- ## Actual Bug Found (Not What We Thought)
10
-
11
- **The OpenAI key works fine.** The real bug is different:
12
-
13
- ### The Problem
14
-
15
- When Magentic mode completes, the final report returns a `ChatMessage` object instead of the actual text:
16
-
17
- ```
18
- FINAL REPORT:
19
- <agent_framework._types.ChatMessage object at 0x11db70310>
20
- ```
21
-
22
- ### Evidence
23
-
24
- Full test output shows:
25
- 1. Magentic orchestrator starts correctly
26
- 2. SearchAgent finds evidence
27
- 3. HypothesisAgent generates hypotheses
28
- 4. JudgeAgent evaluates
29
- 5. **BUT**: Final output is `ChatMessage` object, not text
30
-
31
- ### Root Cause
32
-
33
- In `src/orchestrator_magentic.py` line 193:
34
-
35
- ```python
36
- elif isinstance(event, MagenticFinalResultEvent):
37
- text = event.message.text if event.message else "No result"
38
- ```
39
-
40
- The `event.message` is a `ChatMessage` object, and `.text` may not extract the content correctly, or the message structure changed in the agent-framework library.
41
-
42
- ---
43
-
44
- ## Secondary Issue: Max Rounds Reached
45
-
46
- The orchestrator hits max rounds before producing a report:
47
-
48
- ```
49
- [ERROR] Magentic Orchestrator: Max round count reached
50
- ```
51
-
52
- This means the workflow times out before the ReportAgent synthesizes the final output.
53
-
54
- ---
55
-
56
- ## What Works
57
-
58
- - OpenAI API key: **Works** (loaded from .env)
59
- - SearchAgent: **Works** (finds evidence from PubMed, ClinicalTrials, Europe PMC)
60
- - HypothesisAgent: **Works** (generates Drug -> Target -> Pathway chains)
61
- - JudgeAgent: **Partial** (evaluates but sometimes loses context)
62
-
63
- ---
64
-
65
- ## Files to Fix
66
-
67
- | File | Line | Issue |
68
- |------|------|-------|
69
- | `src/orchestrator_magentic.py` | 193 | `event.message.text` returns object, not string |
70
- | `src/orchestrator_magentic.py` | 97-99 | `max_round_count=3` too low for full pipeline |
71
-
72
- ---
73
-
74
- ## Suggested Fix
75
-
76
- ```python
77
- # In _process_event, line 192-199
78
- elif isinstance(event, MagenticFinalResultEvent):
79
- # Handle ChatMessage object properly
80
- if event.message:
81
- if hasattr(event.message, 'content'):
82
- text = event.message.content
83
- elif hasattr(event.message, 'text'):
84
- text = event.message.text
85
- else:
86
- text = str(event.message)
87
- else:
88
- text = "No result"
89
- ```
90
-
91
- And increase rounds:
92
-
93
- ```python
94
- # In _build_workflow, line 97
95
- max_round_count=self._max_rounds, # Use configured value, default 10
96
- ```
97
-
98
- ---
99
-
100
- ## Test Command
101
-
102
- ```bash
103
- set -a && source .env && set +a && uv run python examples/orchestrator_demo/run_magentic.py "metformin alzheimer"
104
- ```
105
-
106
- ---
107
-
108
- ## Simple Mode Works
109
-
110
- For reference, simple mode produces full reports:
111
-
112
- ```bash
113
- uv run python examples/orchestrator_demo/run_agent.py "metformin alzheimer"
114
- ```
115
-
116
- Output includes structured report with Drug Candidates, Key Findings, etc.
 
docs/bugs/P1_GRADIO_SETTINGS_CLEANUP.md DELETED
@@ -1,81 +0,0 @@
- # P1 Bug: Gradio Settings Accordion Not Collapsing
-
- **Priority**: P1 (UX Bug)
- **Status**: OPEN
- **Date**: 2025-11-27
- **Target Component**: `src/app.py`
-
- ---
-
- ## 1. Problem Description
-
- The "Settings" accordion in the Gradio UI (containing Orchestrator Mode, API Key, Provider) fails to collapse, even when configured with `open=False`. It remains permanently expanded, cluttering the interface and obscuring the chat history.
-
- ### Symptoms
- - Accordion arrow toggles visually, but content remains visible.
- - Occurs in both local development (`uv run src/app.py`) and HuggingFace Spaces.
-
- ---
-
- ## 2. Root Cause Analysis
-
- **Definitive Cause**: Nested `Blocks` Context Bug.
- `gr.ChatInterface` is itself a high-level abstraction that creates a `gr.Blocks` context. Wrapping `gr.ChatInterface` inside an external `with gr.Blocks():` context causes event listener conflicts, specifically breaking the JavaScript state management for `additional_inputs_accordion`.
-
- **Reference**: [Gradio Issue #8861](https://github.com/gradio-app/gradio/issues/8861) confirms that `additional_inputs_accordion` malfunctions when `ChatInterface` is not the top-level block.
-
- ---
-
- ## 3. Solution Strategy: "The Unwrap Fix"
-
- We will remove the redundant `gr.Blocks` wrapper. This restores the native behavior of `ChatInterface`, ensuring the accordion respects `open=False`.
-
- ### Implementation Plan
-
- **Refactor `src/app.py` / `create_demo()`**:
-
- 1. **Remove** the `with gr.Blocks() as demo:` context manager.
- 2. **Instantiate** `gr.ChatInterface` directly as the `demo` object.
- 3. **Migrate UI Elements**:
-     * **Header**: Move the H1/Title text into the `title` parameter of `ChatInterface`.
-     * **Footer**: Move the footer text ("MCP Server Active...") into the `description` parameter. `ChatInterface` supports Markdown in `description`, making it the ideal place for static info below the title but above the chat.
-
- ### Before (Buggy)
- ```python
- def create_demo():
-     with gr.Blocks() as demo:  # <--- CAUSE OF BUG
-         gr.Markdown("# Title")
-         gr.ChatInterface(..., additional_inputs_accordion=gr.Accordion(open=False))
-         gr.Markdown("Footer")
-     return demo
- ```
-
- ### After (Correct)
- ```python
- def create_demo():
-     return gr.ChatInterface(  # <--- FIX: Top-level component
-         ...,
-         title="🧬 DeepCritical",
-         description="*AI-Powered Drug Repurposing Agent...*\n\n---\n**MCP Server Active**...",
-         additional_inputs_accordion=gr.Accordion(label="⚙️ Settings", open=False)
-     )
- ```
-
- ---
-
- ## 4. Validation
-
- 1. **Run**: `uv run python src/app.py`
- 2. **Check**: Open `http://localhost:7860`
- 3. **Verify**:
-     * Settings accordion starts **COLLAPSED**.
-     * Header title ("DeepCritical") is visible.
-     * Footer text ("MCP Server Active") is visible in the description area.
-     * Chat functionality works (Magentic/Simple modes).
-
- ---
-
- ## 5. Constraints & Notes
-
- - **Layout**: We lose the ability to place arbitrary elements *below* the chat box (footer will move to top, under title), but this is an acceptable trade-off for a working UI.
- - **CSS**: `ChatInterface` handles its own CSS; any custom class styling from the previous footer will be standardized to the description text style.

docs/configuration/CONFIGURATION.md ADDED
@@ -0,0 +1,743 @@
+ # Configuration Guide
+
+ ## Overview
+
+ DeepCritical uses **Pydantic Settings** for centralized configuration management. All settings are defined in the `Settings` class in `src/utils/config.py` and can be configured via environment variables or a `.env` file.
+
+ The configuration system provides:
+
+ - **Type Safety**: Strongly-typed fields with Pydantic validation
+ - **Environment File Support**: Automatically loads from `.env` file (if present)
+ - **Case-Insensitive**: Environment variables are case-insensitive
+ - **Singleton Pattern**: Global `settings` instance for easy access throughout the codebase
+ - **Validation**: Automatic validation on load with helpful error messages
+
+ ## Quick Start
+
+ 1. Create a `.env` file in the project root (see the example below)
+ 2. Set at least one LLM API key (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, or `HF_TOKEN`)
+ 3. Optionally configure other services as needed
+ 4. The application will automatically load and validate your configuration
+
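+ A minimal starter `.env` might look like this (the key value is a placeholder; every line other than the API key is optional):
+
+ ```bash
+ LLM_PROVIDER=openai
+ OPENAI_API_KEY=sk-your-key-here   # placeholder value, replace with a real key
+ WEB_SEARCH_PROVIDER=duckduckgo    # no API key required
+ LOG_LEVEL=INFO
+ ```
+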
+ ## Configuration System Architecture
+
+ ### Settings Class
+
+ The `Settings` class extends `BaseSettings` from `pydantic_settings` and defines all application configuration:
+
+ ```13:21:src/utils/config.py
+ class Settings(BaseSettings):
+     """Strongly-typed application settings."""
+
+     model_config = SettingsConfigDict(
+         env_file=".env",
+         env_file_encoding="utf-8",
+         case_sensitive=False,
+         extra="ignore",
+     )
+ ```
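+
+ Because `case_sensitive=False`, either spelling of a variable reaches the same field. A small sketch (the key value is a placeholder):
+
+ ```python
+ import os
+
+ # Lowercase spelling resolves to the same field as OPENAI_API_KEY
+ os.environ["openai_api_key"] = "sk-placeholder"
+
+ from src.utils.config import Settings
+
+ print(Settings().has_openai_key)  # True
+ ```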
+
+ ### Singleton Instance
+
+ A global `settings` instance is available for import:
+
+ ```234:235:src/utils/config.py
+ # Singleton for easy import
+ settings = get_settings()
+ ```
+
+ ### Usage Pattern
+
+ Access configuration throughout the codebase:
+
+ ```python
+ from src.utils.config import settings
+
+ # Check if API keys are available
+ if settings.has_openai_key:
+     # Use OpenAI
+     pass
+
+ # Access configuration values
+ max_iterations = settings.max_iterations
+ web_search_provider = settings.web_search_provider
+ ```
+
+ ## Required Configuration
+
+ ### LLM Provider
+
+ You must configure at least one LLM provider. The system supports:
+
+ - **OpenAI**: Requires `OPENAI_API_KEY`
+ - **Anthropic**: Requires `ANTHROPIC_API_KEY`
+ - **HuggingFace**: Optional `HF_TOKEN` or `HUGGINGFACE_API_KEY` (can work without a key for public models)
+
+ #### OpenAI Configuration
+
+ ```bash
+ LLM_PROVIDER=openai
+ OPENAI_API_KEY=your_openai_api_key_here
+ OPENAI_MODEL=gpt-5.1
+ ```
+
+ The default model is defined in the `Settings` class:
+
+ ```29:29:src/utils/config.py
+ openai_model: str = Field(default="gpt-5.1", description="OpenAI model name")
+ ```
+
+ #### Anthropic Configuration
+
+ ```bash
+ LLM_PROVIDER=anthropic
+ ANTHROPIC_API_KEY=your_anthropic_api_key_here
+ ANTHROPIC_MODEL=claude-sonnet-4-5-20250929
+ ```
+
+ The default model is defined in the `Settings` class:
+
+ ```30:32:src/utils/config.py
+ anthropic_model: str = Field(
+     default="claude-sonnet-4-5-20250929", description="Anthropic model"
+ )
+ ```
+
+ #### HuggingFace Configuration
+
+ HuggingFace can work without an API key for public models, but an API key provides higher rate limits:
+
+ ```bash
+ # Option 1: Using HF_TOKEN (preferred)
+ HF_TOKEN=your_huggingface_token_here
+
+ # Option 2: Using HUGGINGFACE_API_KEY (alternative)
+ HUGGINGFACE_API_KEY=your_huggingface_api_key_here
+
+ # Default model
+ HUGGINGFACE_MODEL=meta-llama/Llama-3.1-8B-Instruct
+ ```
+
+ The HuggingFace token can be set via either environment variable:
+
+ ```33:35:src/utils/config.py
+ hf_token: str | None = Field(
+     default=None, alias="HF_TOKEN", description="HuggingFace API token"
+ )
+ ```
+
+ ```57:59:src/utils/config.py
+ huggingface_api_key: str | None = Field(
+     default=None, description="HuggingFace API token (HF_TOKEN or HUGGINGFACE_API_KEY)"
+ )
+ ```
+
+ ## Optional Configuration
+
+ ### Embedding Configuration
+
+ DeepCritical supports multiple embedding providers for semantic search and RAG:
+
+ ```bash
+ # Embedding Provider: "openai", "local", or "huggingface"
+ EMBEDDING_PROVIDER=local
+
+ # OpenAI Embedding Model (used by LlamaIndex RAG)
+ OPENAI_EMBEDDING_MODEL=text-embedding-3-small
+
+ # Local Embedding Model (sentence-transformers, used by EmbeddingService)
+ LOCAL_EMBEDDING_MODEL=all-MiniLM-L6-v2
+
+ # HuggingFace Embedding Model
+ HUGGINGFACE_EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
+ ```
+
+ The embedding provider configuration:
+
+ ```47:50:src/utils/config.py
+ embedding_provider: Literal["openai", "local", "huggingface"] = Field(
+     default="local",
+     description="Embedding provider to use",
+ )
+ ```
+
+ **Note**: OpenAI embeddings require `OPENAI_API_KEY`. The local provider (default) uses sentence-transformers and requires no API key.
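+
+ For the default local provider, the underlying call is roughly the following (a sketch using the sentence-transformers API directly; DeepCritical wraps this in the `EmbeddingService` shown later):
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Same model as the LOCAL_EMBEDDING_MODEL default
+ model = SentenceTransformer("all-MiniLM-L6-v2")
+ vectors = model.encode(["metformin and Alzheimer's disease"])
+ print(vectors.shape)  # (1, 384) for this model
+ ```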
+
+ ### Web Search Configuration
+
+ DeepCritical supports multiple web search providers:
+
+ ```bash
+ # Web Search Provider: "serper", "searchxng", "brave", "tavily", or "duckduckgo"
+ # Default: "duckduckgo" (no API key required)
+ WEB_SEARCH_PROVIDER=duckduckgo
+
+ # Serper API Key (for Google search via Serper)
+ SERPER_API_KEY=your_serper_api_key_here
+
+ # SearchXNG Host URL (for self-hosted search)
+ SEARCHXNG_HOST=http://localhost:8080
+
+ # Brave Search API Key
+ BRAVE_API_KEY=your_brave_api_key_here
+
+ # Tavily API Key
+ TAVILY_API_KEY=your_tavily_api_key_here
+ ```
+
+ The web search provider configuration:
+
+ ```71:74:src/utils/config.py
+ web_search_provider: Literal["serper", "searchxng", "brave", "tavily", "duckduckgo"] = Field(
+     default="duckduckgo",
+     description="Web search provider to use",
+ )
+ ```
+
+ **Note**: DuckDuckGo is the default and requires no API key, making it ideal for development and testing.
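+
+ A simple startup guard can be built on the `web_search_available` property documented under Service Availability below (a sketch, not code from the repository):
+
+ ```python
+ from src.utils.config import settings
+
+ if not settings.web_search_available:
+     raise RuntimeError(
+         f"Web search provider {settings.web_search_provider!r} is missing its API key; "
+         "set the matching key or switch to 'duckduckgo'."
+     )
+ ```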
+
+ ### PubMed Configuration
+
+ PubMed search supports an optional NCBI API key for higher rate limits:
+
+ ```bash
+ # NCBI API Key (optional, for higher rate limits: 10 req/sec vs 3 req/sec)
+ NCBI_API_KEY=your_ncbi_api_key_here
+ ```
+
+ The PubMed tool uses this configuration:
+
+ ```22:29:src/tools/pubmed.py
+ def __init__(self, api_key: str | None = None) -> None:
+     self.api_key = api_key or settings.ncbi_api_key
+     # Ignore placeholder values from .env.example
+     if self.api_key == "your-ncbi-key-here":
+         self.api_key = None
+
+     # Use shared rate limiter
+     self._limiter = get_pubmed_limiter(self.api_key)
+ ```
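+
+ Instantiating the tool with no arguments picks up `NCBI_API_KEY` automatically. A usage sketch (the class name `PubMedTool` is an assumption; only its `__init__` is cited above):
+
+ ```python
+ from src.tools.pubmed import PubMedTool  # assumed class name
+
+ tool = PubMedTool()  # falls back to settings.ncbi_api_key
+ # With a key: ~10 req/sec; without: ~3 req/sec (enforced by the shared limiter)
+ ```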
+
+ ### Agent Configuration
+
+ Control agent behavior and research loop execution:
+
+ ```bash
+ # Maximum iterations per research loop (1-50, default: 10)
+ MAX_ITERATIONS=10
+
+ # Search timeout in seconds
+ SEARCH_TIMEOUT=30
+
+ # Use graph-based execution for research flows
+ USE_GRAPH_EXECUTION=false
+ ```
+
+ The agent configuration fields:
+
+ ```80:85:src/utils/config.py
+ # Agent Configuration
+ max_iterations: int = Field(default=10, ge=1, le=50)
+ search_timeout: int = Field(default=30, description="Seconds to wait for search")
+ use_graph_execution: bool = Field(
+     default=False, description="Use graph-based execution for research flows"
+ )
+ ```
+
+ ### Budget & Rate Limiting Configuration
+
+ Control resource limits for research loops:
+
+ ```bash
+ # Default token budget per research loop (1000-1000000, default: 100000)
+ DEFAULT_TOKEN_LIMIT=100000
+
+ # Default time limit per research loop in minutes (1-120, default: 10)
+ DEFAULT_TIME_LIMIT_MINUTES=10
+
+ # Default iterations limit per research loop (1-50, default: 10)
+ DEFAULT_ITERATIONS_LIMIT=10
+ ```
+
+ The budget configuration with validation:
+
+ ```87:105:src/utils/config.py
+ # Budget & Rate Limiting Configuration
+ default_token_limit: int = Field(
+     default=100000,
+     ge=1000,
+     le=1000000,
+     description="Default token budget per research loop",
+ )
+ default_time_limit_minutes: int = Field(
+     default=10,
+     ge=1,
+     le=120,
+     description="Default time limit per research loop (minutes)",
+ )
+ default_iterations_limit: int = Field(
+     default=10,
+     ge=1,
+     le=50,
+     description="Default iterations limit per research loop",
+ )
+ ```
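+
+ An illustrative sketch of how a loop might consume these limits (not the orchestrator's actual bookkeeping):
+
+ ```python
+ from datetime import datetime, timedelta
+
+ from src.utils.config import settings
+
+ deadline = datetime.now() + timedelta(minutes=settings.default_time_limit_minutes)
+ tokens_left = settings.default_token_limit
+
+ for iteration in range(settings.default_iterations_limit):
+     if datetime.now() >= deadline or tokens_left <= 0:
+         break  # budget exhausted
+     # ... run one iteration and subtract its token usage from tokens_left ...
+ ```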
+
+ ### RAG Service Configuration
+
+ Configure the Retrieval-Augmented Generation service:
+
+ ```bash
+ # ChromaDB collection name for RAG
+ RAG_COLLECTION_NAME=deepcritical_evidence
+
+ # Number of top results to retrieve from RAG (1-50, default: 5)
+ RAG_SIMILARITY_TOP_K=5
+
+ # Automatically ingest evidence into RAG
+ RAG_AUTO_INGEST=true
+ ```
+
+ The RAG configuration:
+
+ ```127:141:src/utils/config.py
+ # RAG Service Configuration
+ rag_collection_name: str = Field(
+     default="deepcritical_evidence",
+     description="ChromaDB collection name for RAG",
+ )
+ rag_similarity_top_k: int = Field(
+     default=5,
+     ge=1,
+     le=50,
+     description="Number of top results to retrieve from RAG",
+ )
+ rag_auto_ingest: bool = Field(
+     default=True,
+     description="Automatically ingest evidence into RAG",
+ )
+ ```
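+
+ A sketch of how `rag_similarity_top_k` would typically reach a LlamaIndex query engine (assumes `llama-index` is installed and an embedding model is configured; the service's actual wiring may differ):
+
+ ```python
+ from llama_index.core import Document, VectorStoreIndex
+
+ from src.utils.config import settings
+
+ index = VectorStoreIndex.from_documents([Document(text="example evidence snippet")])
+ engine = index.as_query_engine(similarity_top_k=settings.rag_similarity_top_k)
+ ```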
+
+ ### ChromaDB Configuration
+
+ Configure the vector database for embeddings and RAG:
+
+ ```bash
+ # ChromaDB storage path
+ CHROMA_DB_PATH=./chroma_db
+
+ # Whether to persist ChromaDB to disk
+ CHROMA_DB_PERSIST=true
+
+ # ChromaDB server host (for remote ChromaDB, optional)
+ CHROMA_DB_HOST=localhost
+
+ # ChromaDB server port (for remote ChromaDB, optional)
+ CHROMA_DB_PORT=8000
+ ```
+
+ The ChromaDB configuration:
+
+ ```113:125:src/utils/config.py
+ chroma_db_path: str = Field(default="./chroma_db", description="ChromaDB storage path")
+ chroma_db_persist: bool = Field(
+     default=True,
+     description="Whether to persist ChromaDB to disk",
+ )
+ chroma_db_host: str | None = Field(
+     default=None,
+     description="ChromaDB server host (for remote ChromaDB)",
+ )
+ chroma_db_port: int | None = Field(
+     default=None,
+     description="ChromaDB server port (for remote ChromaDB)",
+ )
+ ```
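+
+ These settings map onto the two chromadb client modes (a sketch of the underlying library calls; the service layer may wrap them differently):
+
+ ```python
+ import chromadb
+
+ from src.utils.config import settings
+
+ if settings.chroma_db_host and settings.chroma_db_port:
+     # Remote server mode
+     client = chromadb.HttpClient(host=settings.chroma_db_host, port=settings.chroma_db_port)
+ else:
+     # Local on-disk mode
+     client = chromadb.PersistentClient(path=settings.chroma_db_path)
+
+ collection = client.get_or_create_collection(settings.rag_collection_name)
+ ```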
+
+ ### External Services
+
+ #### Modal Configuration
+
+ Modal is used for secure sandbox execution of statistical analysis:
+
+ ```bash
+ # Modal Token ID (for Modal sandbox execution)
+ MODAL_TOKEN_ID=your_modal_token_id_here
+
+ # Modal Token Secret
+ MODAL_TOKEN_SECRET=your_modal_token_secret_here
+ ```
+
+ The Modal configuration:
+
+ ```110:112:src/utils/config.py
+ # External Services
+ modal_token_id: str | None = Field(default=None, description="Modal token ID")
+ modal_token_secret: str | None = Field(default=None, description="Modal token secret")
+ ```
+
+ ### Logging Configuration
+
+ Configure structured logging:
+
+ ```bash
+ # Log Level: "DEBUG", "INFO", "WARNING", or "ERROR"
+ LOG_LEVEL=INFO
+ ```
+
+ The logging configuration:
+
+ ```107:108:src/utils/config.py
+ # Logging
+ log_level: Literal["DEBUG", "INFO", "WARNING", "ERROR"] = "INFO"
+ ```
+
+ Logging is configured via the `configure_logging()` function:
+
+ ```212:231:src/utils/config.py
+ def configure_logging(settings: Settings) -> None:
+     """Configure structured logging with the configured log level."""
+     # Set stdlib logging level from settings
+     logging.basicConfig(
+         level=getattr(logging, settings.log_level),
+         format="%(message)s",
+     )
+
+     structlog.configure(
+         processors=[
+             structlog.stdlib.filter_by_level,
+             structlog.stdlib.add_logger_name,
+             structlog.stdlib.add_log_level,
+             structlog.processors.TimeStamper(fmt="iso"),
+             structlog.processors.JSONRenderer(),
+         ],
+         wrapper_class=structlog.stdlib.BoundLogger,
+         context_class=dict,
+         logger_factory=structlog.stdlib.LoggerFactory(),
+     )
+ ```
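+
+ Typical startup usage, after which structlog emits JSON lines at the configured level:
+
+ ```python
+ import structlog
+
+ from src.utils.config import configure_logging, settings
+
+ configure_logging(settings)
+ log = structlog.get_logger(__name__)
+ log.info("startup", provider=settings.llm_provider, log_level=settings.log_level)
+ ```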
+
+ ## Configuration Properties
+
+ The `Settings` class provides helpful properties for checking configuration state:
+
+ ### API Key Availability
+
+ Check which API keys are available:
+
+ ```171:189:src/utils/config.py
+ @property
+ def has_openai_key(self) -> bool:
+     """Check if OpenAI API key is available."""
+     return bool(self.openai_api_key)
+
+ @property
+ def has_anthropic_key(self) -> bool:
+     """Check if Anthropic API key is available."""
+     return bool(self.anthropic_api_key)
+
+ @property
+ def has_huggingface_key(self) -> bool:
+     """Check if HuggingFace API key is available."""
+     return bool(self.huggingface_api_key or self.hf_token)
+
+ @property
+ def has_any_llm_key(self) -> bool:
+     """Check if any LLM API key is available."""
+     return self.has_openai_key or self.has_anthropic_key or self.has_huggingface_key
+ ```
+
+ **Usage:**
+
+ ```python
+ from src.utils.config import settings
+
+ # Check API key availability
+ if settings.has_openai_key:
+     # Use OpenAI
+     pass
+
+ if settings.has_anthropic_key:
+     # Use Anthropic
+     pass
+
+ if settings.has_huggingface_key:
+     # Use HuggingFace
+     pass
+
+ if settings.has_any_llm_key:
+     # At least one LLM is available
+     pass
+ ```
+
+ ### Service Availability
+
+ Check if external services are configured:
+
+ ```143:146:src/utils/config.py
+ @property
+ def modal_available(self) -> bool:
+     """Check if Modal credentials are configured."""
+     return bool(self.modal_token_id and self.modal_token_secret)
+ ```
+
+ ```191:204:src/utils/config.py
+ @property
+ def web_search_available(self) -> bool:
+     """Check if web search is available (either no-key provider or API key present)."""
+     if self.web_search_provider == "duckduckgo":
+         return True  # No API key required
+     if self.web_search_provider == "serper":
+         return bool(self.serper_api_key)
+     if self.web_search_provider == "searchxng":
+         return bool(self.searchxng_host)
+     if self.web_search_provider == "brave":
+         return bool(self.brave_api_key)
+     if self.web_search_provider == "tavily":
+         return bool(self.tavily_api_key)
+     return False
+ ```
+
+ **Usage:**
+
+ ```python
+ from src.utils.config import settings
+
+ # Check service availability
+ if settings.modal_available:
+     # Use Modal sandbox
+     pass
+
+ if settings.web_search_available:
+     # Web search is configured
+     pass
+ ```
+
+ ### API Key Retrieval
+
+ Get the API key for the configured provider:
+
+ ```148:160:src/utils/config.py
+ def get_api_key(self) -> str:
+     """Get the API key for the configured provider."""
+     if self.llm_provider == "openai":
+         if not self.openai_api_key:
+             raise ConfigurationError("OPENAI_API_KEY not set")
+         return self.openai_api_key
+
+     if self.llm_provider == "anthropic":
+         if not self.anthropic_api_key:
+             raise ConfigurationError("ANTHROPIC_API_KEY not set")
+         return self.anthropic_api_key
+
+     raise ConfigurationError(f"Unknown LLM provider: {self.llm_provider}")
+ ```
+
+ For OpenAI-specific operations (e.g., Magentic mode):
+
+ ```162:169:src/utils/config.py
+ def get_openai_api_key(self) -> str:
+     """Get OpenAI API key (required for Magentic function calling)."""
+     if not self.openai_api_key:
+         raise ConfigurationError(
+             "OPENAI_API_KEY not set. Magentic mode requires OpenAI for function calling. "
+             "Use mode='simple' for other providers."
+         )
+     return self.openai_api_key
+ ```
+
+ ## Configuration Usage in Codebase
+
+ The configuration system is used throughout the codebase:
+
+ ### LLM Factory
+
+ The LLM factory uses settings to create the appropriate model:
+
+ ```129:144:src/utils/llm_factory.py
+ if settings.llm_provider == "huggingface":
+     model_name = settings.huggingface_model or "meta-llama/Llama-3.1-8B-Instruct"
+     hf_provider = HuggingFaceProvider(api_key=settings.hf_token)
+     return HuggingFaceModel(model_name, provider=hf_provider)
+
+ if settings.llm_provider == "openai":
+     if not settings.openai_api_key:
+         raise ConfigurationError("OPENAI_API_KEY not set for pydantic-ai")
+     provider = OpenAIProvider(api_key=settings.openai_api_key)
+     return OpenAIModel(settings.openai_model, provider=provider)
+
+ if settings.llm_provider == "anthropic":
+     if not settings.anthropic_api_key:
+         raise ConfigurationError("ANTHROPIC_API_KEY not set for pydantic-ai")
+     anthropic_provider = AnthropicProvider(api_key=settings.anthropic_api_key)
+     return AnthropicModel(settings.anthropic_model, provider=anthropic_provider)
+ ```
+
+ ### Embedding Service
+
+ The embedding service uses the local embedding model configuration:
+
+ ```29:31:src/services/embeddings.py
+ def __init__(self, model_name: str | None = None):
+     self._model_name = model_name or settings.local_embedding_model
+     self._model = SentenceTransformer(self._model_name)
+ ```
+
+ ### Orchestrator Factory
+
+ The orchestrator factory uses settings to determine the execution mode:
+
+ ```69:80:src/orchestrator_factory.py
+ def _determine_mode(explicit_mode: str | None) -> str:
+     """Determine which mode to use."""
+     if explicit_mode:
+         if explicit_mode in ("magentic", "advanced"):
+             return "advanced"
+         return "simple"
+
+     # Auto-detect: advanced if paid API key available
+     if settings.has_openai_key:
+         return "advanced"
+
+     return "simple"
+ ```
+
+ ## Environment Variables Reference
+
+ ### Required (at least one LLM)
+
+ - `OPENAI_API_KEY` - OpenAI API key (required for OpenAI provider)
+ - `ANTHROPIC_API_KEY` - Anthropic API key (required for Anthropic provider)
+ - `HF_TOKEN` or `HUGGINGFACE_API_KEY` - HuggingFace API token (optional, can work without for public models)
+
+ ### LLM Configuration Variables
+
+ - `LLM_PROVIDER` - Provider to use: `"openai"`, `"anthropic"`, or `"huggingface"` (default: `"openai"`, per the field definition cited under Validation Examples)
+ - `OPENAI_MODEL` - OpenAI model name (default: `"gpt-5.1"`)
+ - `ANTHROPIC_MODEL` - Anthropic model name (default: `"claude-sonnet-4-5-20250929"`)
+ - `HUGGINGFACE_MODEL` - HuggingFace model ID (default: `"meta-llama/Llama-3.1-8B-Instruct"`)
+
+ ### Embedding Configuration Variables
+
+ - `EMBEDDING_PROVIDER` - Provider: `"openai"`, `"local"`, or `"huggingface"` (default: `"local"`)
+ - `OPENAI_EMBEDDING_MODEL` - OpenAI embedding model (default: `"text-embedding-3-small"`)
+ - `LOCAL_EMBEDDING_MODEL` - Local sentence-transformers model (default: `"all-MiniLM-L6-v2"`)
+ - `HUGGINGFACE_EMBEDDING_MODEL` - HuggingFace embedding model (default: `"sentence-transformers/all-MiniLM-L6-v2"`)
+
+ ### Web Search Configuration Variables
+
+ - `WEB_SEARCH_PROVIDER` - Provider: `"serper"`, `"searchxng"`, `"brave"`, `"tavily"`, or `"duckduckgo"` (default: `"duckduckgo"`)
+ - `SERPER_API_KEY` - Serper API key (required for Serper provider)
+ - `SEARCHXNG_HOST` - SearchXNG host URL (required for SearchXNG provider)
+ - `BRAVE_API_KEY` - Brave Search API key (required for Brave provider)
+ - `TAVILY_API_KEY` - Tavily API key (required for Tavily provider)
+
+ ### PubMed Configuration Variables
+
+ - `NCBI_API_KEY` - NCBI API key (optional, increases rate limit from 3 to 10 req/sec)
+
+ ### Agent Configuration Variables
+
+ - `MAX_ITERATIONS` - Maximum iterations per research loop (1-50, default: `10`)
+ - `SEARCH_TIMEOUT` - Search timeout in seconds (default: `30`)
+ - `USE_GRAPH_EXECUTION` - Use graph-based execution (default: `false`)
+
+ ### Budget Configuration Variables
+
+ - `DEFAULT_TOKEN_LIMIT` - Default token budget per research loop (1000-1000000, default: `100000`)
+ - `DEFAULT_TIME_LIMIT_MINUTES` - Default time limit in minutes (1-120, default: `10`)
+ - `DEFAULT_ITERATIONS_LIMIT` - Default iterations limit (1-50, default: `10`)
+
+ ### RAG Configuration Variables
+
+ - `RAG_COLLECTION_NAME` - ChromaDB collection name (default: `"deepcritical_evidence"`)
+ - `RAG_SIMILARITY_TOP_K` - Number of top results to retrieve (1-50, default: `5`)
+ - `RAG_AUTO_INGEST` - Automatically ingest evidence into RAG (default: `true`)
+
+ ### ChromaDB Configuration Variables
+
+ - `CHROMA_DB_PATH` - ChromaDB storage path (default: `"./chroma_db"`)
+ - `CHROMA_DB_PERSIST` - Whether to persist ChromaDB to disk (default: `true`)
+ - `CHROMA_DB_HOST` - ChromaDB server host (optional, for remote ChromaDB)
+ - `CHROMA_DB_PORT` - ChromaDB server port (optional, for remote ChromaDB)
+
+ ### External Services Variables
+
+ - `MODAL_TOKEN_ID` - Modal token ID (optional, for Modal sandbox execution)
+ - `MODAL_TOKEN_SECRET` - Modal token secret (optional, for Modal sandbox execution)
+
+ ### Logging Configuration Variables
+
+ - `LOG_LEVEL` - Log level: `"DEBUG"`, `"INFO"`, `"WARNING"`, or `"ERROR"` (default: `"INFO"`)
+
+ ## Validation
+
+ Settings are validated on load using Pydantic validation:
+
+ - **Type Checking**: All fields are strongly typed
+ - **Range Validation**: Numeric fields have min/max constraints (e.g., `ge=1, le=50` for `max_iterations`)
+ - **Literal Validation**: Enum fields only accept specific values (e.g., `Literal["openai", "anthropic", "huggingface"]`)
+ - **Required Fields**: API keys are checked when accessed via `get_api_key()` or `get_openai_api_key()`
+
+ ### Validation Examples
+
+ The `max_iterations` field has range validation:
+
+ ```81:81:src/utils/config.py
+ max_iterations: int = Field(default=10, ge=1, le=50)
+ ```
+
+ The `llm_provider` field has literal validation:
+
+ ```26:28:src/utils/config.py
+ llm_provider: Literal["openai", "anthropic", "huggingface"] = Field(
+     default="openai", description="Which LLM provider to use"
+ )
+ ```
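+
+ Out-of-range values fail fast at construction time. A sketch (the value deliberately violates `le=50`):
+
+ ```python
+ import os
+
+ from pydantic import ValidationError
+
+ os.environ["MAX_ITERATIONS"] = "100"  # outside the allowed 1-50 range
+
+ try:
+     from src.utils.config import Settings
+     Settings()
+ except ValidationError as exc:
+     print(exc)  # names the failing field and constraint
+ ```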
+
+ ## Error Handling
+
+ Configuration errors raise `ConfigurationError` from `src/utils/exceptions.py`:
+
+ ```22:25:src/utils/exceptions.py
+ class ConfigurationError(DeepCriticalError):
+     """Raised when configuration is invalid."""
+
+     pass
+ ```
+
+ ### Error Handling Example
+
+ ```python
+ from src.utils.config import settings
+ from src.utils.exceptions import ConfigurationError
+
+ try:
+     api_key = settings.get_api_key()
+ except ConfigurationError as e:
+     print(f"Configuration error: {e}")
+ ```
+
+ ### Common Configuration Errors
+
+ 1. **Missing API Key**: When `get_api_key()` is called but the required API key is not set
+ 2. **Invalid Provider**: When `llm_provider` is set to an unsupported value
+ 3. **Out of Range**: When numeric values exceed their min/max constraints
+ 4. **Invalid Literal**: When enum fields receive unsupported values
+
+ ## Configuration Best Practices
+
+ 1. **Use a `.env` File**: Store sensitive keys in a `.env` file (add it to `.gitignore`)
+ 2. **Check Availability**: Use properties like `has_openai_key` before accessing API keys
+ 3. **Handle Errors**: Always catch `ConfigurationError` when calling `get_api_key()`
+ 4. **Validate Early**: Configuration is validated on import, so errors surface immediately
+ 5. **Use Defaults**: Leverage sensible defaults for optional configuration
+
+ ## Future Enhancements
+
+ The following configurations are planned for future phases:
+
+ 1. **Additional LLM Providers**: DeepSeek, OpenRouter, Gemini, Perplexity, Azure OpenAI, local models
+ 2. **Model Selection**: Reasoning/main/fast model configuration
+ 3. **Service Integration**: Additional service integrations and configurations
+