# Implementation Roadmap: DeepCritical (Vertical Slices)

**Philosophy:** AI-Native Engineering, Vertical Slice Architecture, TDD, Modern Tooling (2025).

This roadmap defines the execution strategy to deliver **DeepCritical** effectively. We reject "overplanning" in favor of **ironclad, testable vertical slices**. Each phase delivers a fully functional slice of end-to-end value.

---

## The 2025 "Gucci" Tooling Stack

We are using the bleeding edge of Python engineering to ensure speed, safety, and developer joy.

| Category | Tool | Why? |
|----------|------|------|
| **Package Manager** | **`uv`** | Rust-based, 10-100x faster than pip/poetry. Manages Python versions, venvs, and deps. |
| **Linting/Format** | **`ruff`** | Rust-based, instant. Replaces black, isort, flake8. |
| **Type Checking** | **`mypy`** | Strict static typing. Run via `uv run mypy`. |
| **Testing** | **`pytest`** | The standard. |
| **Test Plugins** | **`pytest-sugar`** | Instant feedback, progress bars. "Gucci" visuals. |
| **Test Plugins** | **`pytest-asyncio`** | Essential for our async agent loop. |
| **Test Plugins** | **`pytest-cov`** | Coverage reporting to ensure TDD adherence. |
| **Git Hooks** | **`pre-commit`** | Enforce ruff/mypy before commit. |

---

## Architecture: Vertical Slices

Instead of horizontal layers (e.g., "Building the Database Layer"), we build **Vertical Slices**.
Each slice implements a feature from **Entry Point (UI/API) -> Logic -> Data/External**.

### Directory Structure (Maintainer's Structure)

```bash
src/
├── app.py                      # Entry point (Gradio UI)
├── orchestrator.py             # Agent loop (Search -> Judge -> Loop)
├── agent_factory/              # Agent creation and judges
│   ├── __init__.py
│   ├── agents.py               # PydanticAI agent definitions
│   └── judges.py               # JudgeHandler for evidence assessment
├── tools/                      # Search tools
│   ├── __init__.py
│   ├── pubmed.py               # PubMed E-utilities tool
│   ├── clinicaltrials.py       # ClinicalTrials.gov API
│   ├── biorxiv.py              # bioRxiv/medRxiv preprints
│   ├── code_execution.py       # Modal sandbox execution
│   └── search_handler.py       # Orchestrates multiple tools
├── prompts/                    # Prompt templates
│   ├── __init__.py
│   └── judge.py                # Judge prompts
├── utils/                      # Shared utilities
│   ├── __init__.py
│   ├── config.py               # Settings/configuration
│   ├── exceptions.py           # Custom exceptions
│   ├── models.py               # Shared Pydantic models
│   ├── dataloaders.py          # Data loading utilities
│   └── parsers.py              # Parsing utilities
├── middleware/                 # (Future: middleware components)
├── database_services/          # (Future: database integrations)
└── retrieval_factory/          # (Future: RAG components)

tests/
├── unit/
│   ├── tools/
│   │   ├── test_pubmed.py
│   │   ├── test_clinicaltrials.py
│   │   ├── test_biorxiv.py
│   │   └── test_search_handler.py
│   ├── agent_factory/
│   │   └── test_judges.py
│   └── test_orchestrator.py
└── integration/
    └── test_pubmed_live.py
```

---

## Phased Execution Plan

### **Phase 1: Foundation & Tooling (Day 1)**

*Goal: A rock-solid, CI-ready environment with `uv` and `pytest` configured.*

- [ ] Initialize `pyproject.toml` with `uv`.
- [ ] Configure `ruff` (strict) and `mypy` (strict).
- [ ] Set up `pytest` with sugar and coverage.
- [ ] Implement `src/utils/config.py` (Configuration Slice).
- [ ] Implement `src/utils/exceptions.py` (Custom exceptions).
- **Deliverable**: A repo that passes CI with `uv run pytest`.
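
As a rough illustration of the configuration slice, the sketch below loads settings from environment variables and fails fast on bad values. It uses only the standard library; the field names, environment variable names, defaults, and the `ConfigurationError` class are all assumptions made for this example, and the actual `src/utils/config.py` may well be built on Pydantic instead.

```python
"""Hypothetical sketch of a configuration slice (standard library only)."""
import os
from dataclasses import dataclass
from typing import Mapping, Optional


class ConfigurationError(Exception):
    """Raised when a setting is missing or malformed (illustrative name)."""


@dataclass(frozen=True)
class Settings:
    # Every field name and default here is an assumption for illustration.
    llm_api_key: Optional[str] = None
    max_iterations: int = 5
    max_results_per_tool: int = 10

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "Settings":
        """Build settings from environment variables, validating numbers."""
        source = os.environ if env is None else env
        try:
            max_iterations = int(source.get("MAX_ITERATIONS", "5"))
            max_results = int(source.get("MAX_RESULTS_PER_TOOL", "10"))
        except ValueError as exc:
            raise ConfigurationError(f"Invalid numeric setting: {exc}") from exc
        if max_iterations < 1:
            raise ConfigurationError("MAX_ITERATIONS must be at least 1")
        return cls(
            llm_api_key=source.get("LLM_API_KEY"),
            max_iterations=max_iterations,
            max_results_per_tool=max_results,
        )
```

Passing the environment as a mapping keeps the unit test free of global state: `Settings.from_env({"MAX_ITERATIONS": "3"})` needs no monkeypatching.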

### **Phase 2: The "Search" Vertical Slice (Day 2)**

*Goal: Agent can receive a query and get raw results from PubMed/Web.*

- [ ] **TDD**: Write test for `SearchHandler`.
- [ ] Implement `src/tools/pubmed.py` (PubMed E-utilities).
- [ ] Implement `src/tools/websearch.py` (DuckDuckGo).
- [ ] Implement `src/tools/search_handler.py` (Orchestrates tools).
- [ ] Implement `src/utils/models.py` (Evidence, Citation, SearchResult).
- **Deliverable**: Function that takes "long covid" -> returns `List[Evidence]`.
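
One possible shape for this slice is sketched below: a handler fans a query out to several tools concurrently and merges whatever comes back. The `SearchTool` protocol, the `execute` method name, the `FakeTool` test double, and the dataclass fields are assumptions for illustration; plain dataclasses stand in for the Pydantic models so the example runs without third-party packages.

```python
import asyncio
from dataclasses import dataclass
from typing import List, Protocol


@dataclass(frozen=True)
class Citation:
    source: str  # e.g. "pubmed"
    title: str
    url: str


@dataclass(frozen=True)
class Evidence:
    content: str
    citation: Citation


class SearchTool(Protocol):
    """Hypothetical interface that each search tool would satisfy."""

    name: str

    async def search(self, query: str, max_results: int) -> List[Evidence]: ...


class SearchHandler:
    """Runs every tool concurrently and merges the results that succeed."""

    def __init__(self, tools: List[SearchTool]) -> None:
        self.tools = tools

    async def execute(self, query: str, max_results: int = 10) -> List[Evidence]:
        results = await asyncio.gather(
            *(tool.search(query, max_results) for tool in self.tools),
            return_exceptions=True,  # one failing source should not sink the rest
        )
        evidence: List[Evidence] = []
        for result in results:
            if isinstance(result, BaseException):
                continue  # a real handler would log or collect the error
            evidence.extend(result)
        return evidence


class FakeTool:
    """Test double standing in for a network-backed tool."""

    def __init__(self, name: str, fail: bool = False) -> None:
        self.name = name
        self.fail = fail

    async def search(self, query: str, max_results: int) -> List[Evidence]:
        if self.fail:
            raise RuntimeError(f"{self.name} unavailable")
        citation = Citation(self.name, f"{query} study", f"https://example.org/{self.name}")
        return [Evidence(f"Result for {query}", citation)]
```

Writing the test against `FakeTool` first keeps the unit suite offline; live calls belong in `tests/integration/`.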

### **Phase 3: The "Judge" Vertical Slice (Day 3)**

*Goal: Agent can decide if evidence is sufficient.*

- [ ] **TDD**: Write test for `JudgeHandler` (Mocked LLM).
- [ ] Implement `src/prompts/judge.py` (Structured outputs).
- [ ] Implement `src/agent_factory/judges.py` (LLM interaction).
- **Deliverable**: Function that takes `List[Evidence]` -> returns `JudgeAssessment`.
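
The "mocked LLM" test can be pictured roughly as follows. This is a minimal sketch, assuming the judge receives an injectable async callable and that the model replies with JSON; the `LLMCall` alias, the `JudgeAssessment` fields, and the fallback behaviour are illustrative choices, not the project's confirmed design. Evidence is simplified to plain strings here.

```python
import asyncio
import json
from dataclasses import dataclass, field
from typing import Awaitable, Callable, List

# Hypothetical type: any async function mapping a prompt to raw model text.
LLMCall = Callable[[str], Awaitable[str]]


@dataclass(frozen=True)
class JudgeAssessment:
    sufficient: bool
    reasoning: str
    next_queries: List[str] = field(default_factory=list)


class JudgeHandler:
    def __init__(self, llm: LLMCall) -> None:
        self.llm = llm

    async def assess(self, question: str, evidence: List[str]) -> JudgeAssessment:
        if not evidence:
            # No evidence yet: skip the model call and ask for more searching.
            return JudgeAssessment(False, "No evidence collected yet.", [question])
        prompt = f"Question: {question}\nEvidence:\n" + "\n".join(evidence)
        raw = await self.llm(prompt)
        try:
            data = json.loads(raw)
            return JudgeAssessment(
                sufficient=bool(data["sufficient"]),
                reasoning=str(data["reasoning"]),
                next_queries=list(data.get("next_queries", [])),
            )
        except (json.JSONDecodeError, KeyError, TypeError):
            # Malformed output counts as "not sufficient" instead of crashing.
            return JudgeAssessment(False, "Unparseable judge output.", [question])


async def mock_llm(prompt: str) -> str:
    """Mocked LLM for tests: always reports the evidence as sufficient."""
    return json.dumps({"sufficient": True, "reasoning": "Two RCTs agree."})
```

Injecting the model call makes the handler testable without network access or API keys: `JudgeHandler(mock_llm)` behaves deterministically.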

### **Phase 4: The "Loop" & UI Slice (Day 4)**

*Goal: End-to-End User Value.*

- [ ] Implement `src/orchestrator.py` (Connects Search + Judge loops).
- [ ] Build `src/app.py` (Gradio with Streaming).
- **Deliverable**: Working DeepCritical Agent on HuggingFace.
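
The Search -> Judge -> Loop cycle might look something like the async generator below, which yields progress messages that a streaming UI could display as they arrive. The function names, the `(sufficient, next_queries)` tuple returned by the judge, and the message strings are assumptions for this sketch; evidence is reduced to strings and the Gradio wiring is omitted.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, List, Tuple

# Hypothetical callables standing in for SearchHandler and JudgeHandler.
SearchFn = Callable[[str], Awaitable[List[str]]]
JudgeFn = Callable[[str, List[str]], Awaitable[Tuple[bool, List[str]]]]


async def run_loop(
    question: str, search: SearchFn, judge: JudgeFn, max_iterations: int = 3
) -> AsyncIterator[str]:
    """Yield progress messages so a UI can stream them as they happen."""
    evidence: List[str] = []
    queries = [question]
    for iteration in range(1, max_iterations + 1):
        for query in queries:
            evidence.extend(await search(query))
        yield f"iteration {iteration}: {len(evidence)} evidence items"
        sufficient, queries = await judge(question, evidence)
        if sufficient:
            yield "done: evidence judged sufficient"
            return
        if not queries:
            queries = [question]  # nothing suggested; retry the original question
    yield "done: iteration budget exhausted"


async def collect(stream: AsyncIterator[str]) -> List[str]:
    """Drain the stream into a list (handy in tests)."""
    return [message async for message in stream]
```

The iteration cap guarantees termination even when the judge never reports sufficiency, which matters for a hosted demo with a cost budget.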

---

### **Phase 5: Magentic Integration** ✅ COMPLETE

*Goal: Upgrade orchestrator to use Microsoft Agent Framework patterns.*

- [x] Wrap SearchHandler as `AgentProtocol` (SearchAgent) with strict protocol compliance.
- [x] Wrap JudgeHandler as `AgentProtocol` (JudgeAgent) with strict protocol compliance.
- [x] Implement `MagenticOrchestrator` using `MagenticBuilder`.
- [x] Create factory pattern for switching implementations.
- **Deliverable**: Same API, better multi-agent orchestration engine.
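
The "factory pattern for switching implementations" could be as small as a registry keyed by mode name. The sketch below is illustrative only: the protocol, class names, `describe` method, and mode strings are assumptions, and the Magentic class is a stub that does not call the Microsoft Agent Framework.

```python
from typing import Callable, Dict, Protocol


class OrchestratorProtocol(Protocol):
    """Hypothetical shared interface; the real method names may differ."""

    def describe(self) -> str: ...


class SimpleOrchestrator:
    def describe(self) -> str:
        return "simple search-judge loop"


class MagenticOrchestratorStub:
    """Placeholder only: it does not call the Microsoft Agent Framework."""

    def describe(self) -> str:
        return "magentic multi-agent workflow"


_REGISTRY: Dict[str, Callable[[], OrchestratorProtocol]] = {
    "simple": SimpleOrchestrator,
    "magentic": MagenticOrchestratorStub,
}


def create_orchestrator(mode: str = "simple") -> OrchestratorProtocol:
    """Return the implementation registered under `mode`."""
    try:
        return _REGISTRY[mode]()
    except KeyError:
        known = ", ".join(sorted(_REGISTRY))
        raise ValueError(f"Unknown mode {mode!r}; expected one of: {known}") from None
```

Because callers depend only on the protocol, the UI code stays unchanged when the engine is swapped, which is what "same API" asks for.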

---

### **Phase 6: Embeddings & Semantic Search**

*Goal: Add vector search for semantic evidence retrieval.*

- [ ] Implement `EmbeddingService` with ChromaDB.
- [ ] Add semantic deduplication to SearchAgent.
- [ ] Enable semantic search for related evidence.
- [ ] Store embeddings in shared context.
- **Deliverable**: Find semantically related papers, not just keyword matches.
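
Semantic deduplication comes down to comparing embedding vectors and dropping near-duplicates. The sketch below shows the idea in pure Python with cosine similarity and a greedy filter; it deliberately leaves out ChromaDB and any embedding model, and the `0.9` threshold and function names are assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def deduplicate(
    items: List[Tuple[str, Sequence[float]]], threshold: float = 0.9
) -> List[str]:
    """Keep an item only if it is not too similar to one already kept."""
    kept: List[Tuple[str, Sequence[float]]] = []
    for text, vector in items:
        if all(cosine_similarity(vector, seen) < threshold for _, seen in kept):
            kept.append((text, vector))
    return [text for text, _ in kept]
```

This greedy pass is quadratic in the number of kept items, which is acceptable for a few dozen search results; a vector store would take over at larger scale.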

---

### **Phase 7: Hypothesis Agent**

*Goal: Generate scientific hypotheses to guide targeted searches.*

- [ ] Implement `MechanismHypothesis` and `HypothesisAssessment` models.
- [ ] Implement `HypothesisAgent` for mechanistic reasoning.
- [ ] Add hypothesis-driven search queries.
- [ ] Integrate into Magentic workflow.
- **Deliverable**: Drug → Target → Pathway → Effect hypotheses that guide research.
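
To make "hypothesis-driven search queries" concrete, the sketch below models one mechanism chain and derives a query per link. The field names and both methods are assumptions for illustration, and a plain dataclass stands in for the Pydantic `MechanismHypothesis` model named in the checklist.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MechanismHypothesis:
    # Field names follow the Drug -> Target -> Pathway -> Effect chain;
    # they are assumptions, not the project's actual schema.
    drug: str
    target: str
    pathway: str
    effect: str

    def as_chain(self) -> str:
        """Render the hypothesis as a readable chain."""
        return " -> ".join([self.drug, self.target, self.pathway, self.effect])

    def search_queries(self) -> List[str]:
        """One query per link, so each step can be checked separately."""
        return [
            f"{self.drug} {self.target}",
            f"{self.target} {self.pathway}",
            f"{self.pathway} {self.effect}",
        ]
```

Splitting the chain into per-link queries lets the search step look for evidence on each causal step rather than for the whole chain at once.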

---

### **Phase 8: Report Agent**

*Goal: Generate structured scientific reports with proper citations.*

- [ ] Implement `ResearchReport` model with all sections.
- [ ] Implement `ReportAgent` for synthesis.
- [ ] Include methodology, limitations, formatted references.
- [ ] Integrate as final synthesis step in Magentic workflow.
- **Deliverable**: Publication-quality research reports.
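
A report model with formatted references might be sketched as follows. The section names mirror the checklist (methodology, limitations, references), but the exact fields, the `Reference` class, and the Markdown layout are assumptions for this example; dataclasses again stand in for the Pydantic `ResearchReport` model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Reference:
    title: str
    url: str


@dataclass
class ResearchReport:
    # Section names are assumptions based on the checklist above.
    title: str
    summary: str
    methodology: str
    limitations: List[str] = field(default_factory=list)
    references: List[Reference] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render the report as Markdown with a numbered reference list."""
        lines = [f"# {self.title}", "", self.summary]
        lines += ["", "## Methodology", "", self.methodology]
        lines += ["", "## Limitations", ""]
        lines += [f"- {item}" for item in self.limitations] or ["- None stated"]
        lines += ["", "## References", ""]
        lines += [
            f"{number}. [{ref.title}]({ref.url})"
            for number, ref in enumerate(self.references, 1)
        ]
        return "\n".join(lines)
```

Keeping references as structured objects, rather than free text produced by the model, means every citation in the rendered report traces back to a retrieved source.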

---

## Complete Architecture (Phases 1-8)

```text
User Query
    ↓
Gradio UI (Phase 4)
    ↓
Magentic Manager (Phase 5)
    ├── SearchAgent (Phase 2+5) ←→ PubMed + Web + VectorDB (Phase 6)
    ├── HypothesisAgent (Phase 7) ←→ Mechanistic Reasoning
    ├── JudgeAgent (Phase 3+5) ←→ Evidence Assessment
    └── ReportAgent (Phase 8) ←→ Final Synthesis
    ↓
Structured Research Report
```

---

## Spec Documents

### Core Platform (Phases 1-8)

1. **[Phase 1 Spec: Foundation](01_phase_foundation.md)** ✅
2. **[Phase 2 Spec: Search Slice](02_phase_search.md)** ✅
3. **[Phase 3 Spec: Judge Slice](03_phase_judge.md)** ✅
4. **[Phase 4 Spec: UI & Loop](04_phase_ui.md)** ✅
5. **[Phase 5 Spec: Magentic Integration](05_phase_magentic.md)** ✅
6. **[Phase 6 Spec: Embeddings & Semantic Search](06_phase_embeddings.md)** ✅
7. **[Phase 7 Spec: Hypothesis Agent](07_phase_hypothesis.md)** ✅
8. **[Phase 8 Spec: Report Agent](08_phase_report.md)** ✅

### Multi-Source Search (Phases 9-11)

9. **[Phase 9 Spec: Remove DuckDuckGo](09_phase_source_cleanup.md)** ✅
10. **[Phase 10 Spec: ClinicalTrials.gov](10_phase_clinicaltrials.md)** ✅
11. **[Phase 11 Spec: bioRxiv Preprints](11_phase_biorxiv.md)** ✅

### Hackathon Integration (Phases 12-14)

12. **[Phase 12 Spec: MCP Server](12_phase_mcp_server.md)** ✅ COMPLETE
13. **[Phase 13 Spec: Modal Pipeline](13_phase_modal_integration.md)** 📝 P1 - $2,500
14. **[Phase 14 Spec: Demo & Submission](14_phase_demo_submission.md)** 📝 P0 - REQUIRED

---

## Progress Summary

| Phase | Status | Deliverable |
|-------|--------|-------------|
| Phase 1: Foundation | ✅ COMPLETE | CI-ready repo with uv/pytest |
| Phase 2: Search | ✅ COMPLETE | PubMed + Web search |
| Phase 3: Judge | ✅ COMPLETE | LLM evidence assessment |
| Phase 4: UI & Loop | ✅ COMPLETE | Working Gradio app |
| Phase 5: Magentic | ✅ COMPLETE | Multi-agent orchestration |
| Phase 6: Embeddings | ✅ COMPLETE | Semantic search + ChromaDB |
| Phase 7: Hypothesis | ✅ COMPLETE | Mechanistic reasoning chains |
| Phase 8: Report | ✅ COMPLETE | Structured scientific reports |
| Phase 9: Source Cleanup | ✅ COMPLETE | Remove DuckDuckGo |
| Phase 10: ClinicalTrials | ✅ COMPLETE | ClinicalTrials.gov API |
| Phase 11: bioRxiv | ✅ COMPLETE | Preprint search |
| Phase 12: MCP Server | ✅ COMPLETE | MCP protocol integration |
| Phase 13: Modal Pipeline | 📝 SPEC READY | Sandboxed code execution |
| Phase 14: Demo & Submit | 📝 SPEC READY | Hackathon submission |

*Phases 1-12 COMPLETE. Phases 13-14 for hackathon prizes.*

---

## Hackathon Prize Potential

| Award | Amount | Requirement | Phase |
|-------|--------|-------------|-------|
| Track 2: MCP in Action (1st) | $2,500 | MCP server working | 12 |
| Modal Innovation | $2,500 | Sandbox demo ready | 13 |
| LlamaIndex | $1,000 | Using RAG | ✅ Done |
| Community Choice | $1,000 | Great demo video | 14 |
| **Total Potential** | **$7,000** | | |

**Deadline: November 30, 2025 11:59 PM UTC**