# Implementation Roadmap: DeepCritical (Vertical Slices)
**Philosophy:** AI-Native Engineering, Vertical Slice Architecture, TDD, Modern Tooling (2025).
This roadmap defines the execution strategy to deliver **DeepCritical** effectively. We reject "overplanning" in favor of **ironclad, testable vertical slices**. Each phase delivers a fully functional slice of end-to-end value.
---
## The 2025 "Gucci" Tooling Stack
We are using the bleeding edge of Python engineering to ensure speed, safety, and developer joy.
| Category | Tool | Why? |
|----------|------|------|
| **Package Manager** | **`uv`** | Rust-based, 10-100x faster than pip/poetry. Manages Python versions, virtual environments, and dependencies. |
| **Linting/Format** | **`ruff`** | Rust-based, instant. Replaces black, isort, flake8. |
| **Type Checking** | **`mypy`** | Strict static typing. Run via `uv run mypy`. |
| **Testing** | **`pytest`** | The standard. |
| **Test Plugins** | **`pytest-sugar`** | Instant feedback, progress bars. "Gucci" visuals. |
| **Test Plugins** | **`pytest-asyncio`** | Essential for our async agent loop. |
| **Test Plugins** | **`pytest-cov`** | Coverage reporting to ensure TDD adherence. |
| **Git Hooks** | **`pre-commit`** | Enforce ruff/mypy before commit. |
---
## Architecture: Vertical Slices
Instead of horizontal layers (e.g., "Building the Database Layer"), we build **Vertical Slices**.
Each slice implements a feature from **Entry Point (UI/API) -> Logic -> Data/External**.
### Directory Structure (Maintainer's Structure)
```bash
src/
├── app.py                    # Entry point (Gradio UI)
├── orchestrator.py           # Agent loop (Search -> Judge -> Loop)
├── agent_factory/            # Agent creation and judges
│   ├── __init__.py
│   ├── agents.py             # PydanticAI agent definitions
│   └── judges.py             # JudgeHandler for evidence assessment
├── tools/                    # Search tools
│   ├── __init__.py
│   ├── pubmed.py             # PubMed E-utilities tool
│   ├── clinicaltrials.py     # ClinicalTrials.gov API
│   ├── biorxiv.py            # bioRxiv/medRxiv preprints
│   ├── code_execution.py     # Modal sandbox execution
│   └── search_handler.py     # Orchestrates multiple tools
├── prompts/                  # Prompt templates
│   ├── __init__.py
│   └── judge.py              # Judge prompts
├── utils/                    # Shared utilities
│   ├── __init__.py
│   ├── config.py             # Settings/configuration
│   ├── exceptions.py         # Custom exceptions
│   ├── models.py             # Shared Pydantic models
│   ├── dataloaders.py        # Data loading utilities
│   └── parsers.py            # Parsing utilities
├── middleware/               # (Future: middleware components)
├── database_services/        # (Future: database integrations)
└── retrieval_factory/        # (Future: RAG components)
tests/
├── unit/
│   ├── tools/
│   │   ├── test_pubmed.py
│   │   ├── test_clinicaltrials.py
│   │   ├── test_biorxiv.py
│   │   └── test_search_handler.py
│   ├── agent_factory/
│   │   └── test_judges.py
│   └── test_orchestrator.py
└── integration/
    └── test_pubmed_live.py
```
---
## Phased Execution Plan
### **Phase 1: Foundation & Tooling (Day 1)**
*Goal: A rock-solid, CI-ready environment with `uv` and `pytest` configured.*
- [ ] Initialize `pyproject.toml` with `uv`.
- [ ] Configure `ruff` (strict) and `mypy` (strict).
- [ ] Set up `pytest` with sugar and coverage.
- [ ] Implement `src/utils/config.py` (Configuration Slice).
- [ ] Implement `src/utils/exceptions.py` (Custom exceptions).
- **Deliverable**: A repo that passes CI with `uv run pytest`.
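
The configuration slice and custom exceptions can be quite small. The following is a minimal sketch only: the setting names (`LLM_API_KEY`, `MAX_ITERATIONS`), the exception names, and the use of a stdlib dataclass instead of a Pydantic settings class are all assumptions made for illustration, not the project's actual `config.py`.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass


class DeepCriticalError(Exception):
    """Base class for application errors (hypothetical name)."""


class ConfigurationError(DeepCriticalError):
    """Raised when a required setting is missing or malformed."""


@dataclass(frozen=True)
class Settings:
    """Immutable runtime settings, read once at startup."""

    llm_api_key: str
    max_iterations: int = 5


def load_settings(env: Mapping[str, str] | None = None) -> Settings:
    """Build Settings from a mapping; defaults to the process environment.

    Accepting a mapping keeps the function trivially testable.
    """
    source = os.environ if env is None else env

    api_key = source.get("LLM_API_KEY", "")  # assumed variable name
    if not api_key:
        raise ConfigurationError("LLM_API_KEY is not set")

    raw_iterations = source.get("MAX_ITERATIONS", "5")  # assumed variable name
    try:
        max_iterations = int(raw_iterations)
    except ValueError as exc:
        raise ConfigurationError(
            f"MAX_ITERATIONS must be an integer, got {raw_iterations!r}"
        ) from exc

    return Settings(llm_api_key=api_key, max_iterations=max_iterations)
```

Passing an explicit mapping in tests (for example `load_settings({"LLM_API_KEY": "test"})`) avoids patching the real environment.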
### **Phase 2: The "Search" Vertical Slice (Day 2)**
*Goal: Agent can receive a query and get raw results from PubMed/Web.*
- [ ] **TDD**: Write test for `SearchHandler`.
- [ ] Implement `src/tools/pubmed.py` (PubMed E-utilities).
- [ ] Implement `src/tools/websearch.py` (DuckDuckGo).
- [ ] Implement `src/tools/search_handler.py` (Orchestrates tools).
- [ ] Implement `src/utils/models.py` (Evidence, Citation, SearchResult).
- **Deliverable**: Function that takes "long covid" -> returns `List[Evidence]`.
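
One way the search slice could fit together is sketched below. The field names on `Evidence`, `Citation`, and `SearchResult`, the `SearchTool` protocol, and the `execute` method are assumptions for illustration; stdlib dataclasses stand in for the Pydantic models so the sketch has no dependencies. The idea shown is fanning a query out to several tools concurrently and recording per-tool failures instead of failing the whole search.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field
from typing import Protocol


@dataclass(frozen=True)
class Citation:
    source: str  # e.g. "pubmed"
    title: str
    url: str


@dataclass(frozen=True)
class Evidence:
    content: str
    citation: Citation


@dataclass
class SearchResult:
    query: str
    evidence: list[Evidence] = field(default_factory=list)
    errors: list[str] = field(default_factory=list)


class SearchTool(Protocol):
    """Hypothetical interface each tool (PubMed, etc.) would satisfy."""

    name: str

    async def search(self, query: str, max_results: int) -> list[Evidence]: ...


class SearchHandler:
    """Runs every tool concurrently and merges the results."""

    def __init__(self, tools: list[SearchTool]) -> None:
        self._tools = tools

    async def execute(self, query: str, max_results: int = 10) -> SearchResult:
        # return_exceptions=True keeps one failing tool from cancelling the rest.
        outcomes = await asyncio.gather(
            *(tool.search(query, max_results) for tool in self._tools),
            return_exceptions=True,
        )
        result = SearchResult(query=query)
        for tool, outcome in zip(self._tools, outcomes):
            if isinstance(outcome, BaseException):
                result.errors.append(f"{tool.name}: {outcome}")
            else:
                result.evidence.extend(outcome)
        return result
```

A unit test can pass in fake tools that return canned `Evidence`, which keeps the TDD step free of network calls.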
### **Phase 3: The "Judge" Vertical Slice (Day 3)**
*Goal: Agent can decide if evidence is sufficient.*
- [ ] **TDD**: Write test for `JudgeHandler` (Mocked LLM).
- [ ] Implement `src/prompts/judge.py` (Structured outputs).
- [ ] Implement `src/agent_factory/judges.py` (LLM interaction).
- **Deliverable**: Function that takes `List[Evidence]` -> returns `JudgeAssessment`.
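
To illustrate how a judge can be tested against a mocked LLM, here is a hedged sketch. The `LLMCall` type, the `JudgeAssessment` fields, and the JSON reply format are assumptions, and evidence is simplified to plain strings; the real implementation may rely on PydanticAI structured outputs instead of manual JSON parsing.

```python
from __future__ import annotations

import asyncio
import json
from dataclasses import dataclass
from typing import Awaitable, Callable

# Hypothetical seam: any coroutine that maps a prompt to raw model text.
LLMCall = Callable[[str], Awaitable[str]]


@dataclass(frozen=True)
class JudgeAssessment:
    sufficient: bool
    reasoning: str
    next_queries: list[str]


def build_prompt(question: str, evidence: list[str]) -> str:
    """Ask for a JSON verdict so the reply can be parsed mechanically."""
    numbered = "\n".join(f"[{i}] {item}" for i, item in enumerate(evidence, start=1))
    return (
        f"Question: {question}\n"
        f"Evidence:\n{numbered}\n"
        'Reply with JSON: {"sufficient": bool, "reasoning": str, "next_queries": [str]}'
    )


class JudgeHandler:
    """Asks the injected LLM whether the evidence answers the question."""

    def __init__(self, llm: LLMCall) -> None:
        self._llm = llm

    async def assess(self, question: str, evidence: list[str]) -> JudgeAssessment:
        raw = await self._llm(build_prompt(question, evidence))
        try:
            data = json.loads(raw)
            return JudgeAssessment(
                sufficient=data["sufficient"] is True,
                reasoning=str(data["reasoning"]),
                next_queries=[str(q) for q in data.get("next_queries", [])],
            )
        except (json.JSONDecodeError, KeyError, TypeError, AttributeError):
            # Fail closed: unparseable output never counts as "sufficient".
            return JudgeAssessment(False, "Judge output could not be parsed.", [question])


async def _demo() -> None:
    async def canned_llm(prompt: str) -> str:
        # Stand-in for a real model call.
        return json.dumps(
            {"sufficient": False, "reasoning": "Too few trials.", "next_queries": ["long covid RCT"]}
        )

    print(await JudgeHandler(canned_llm).assess("long covid treatments", ["one case report"]))


if __name__ == "__main__":
    asyncio.run(_demo())
```

Injecting the LLM as a plain callable means the test suite never needs an API key, and the failure path (malformed model output) is as easy to cover as the happy path.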
### **Phase 4: The "Loop" & UI Slice (Day 4)**
*Goal: End-to-End User Value.*
- [ ] Implement `src/orchestrator.py` (Connects Search + Judge loops).
- [ ] Build `src/app.py` (Gradio with Streaming).
- **Deliverable**: Working DeepCritical Agent on HuggingFace.
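
The Search -> Judge -> Loop cycle can be sketched as an async generator, which maps naturally onto a streaming UI because each yielded string is a progress update. Everything named here (`run_loop`, `Verdict`, the `SearchFn`/`JudgeFn` callables, the event strings) is hypothetical and simplified; it is not the project's `orchestrator.py`.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Awaitable, Callable


@dataclass(frozen=True)
class Verdict:
    sufficient: bool
    next_queries: list[str]


# Hypothetical seams standing in for SearchHandler and JudgeHandler.
SearchFn = Callable[[str], Awaitable[list[str]]]
JudgeFn = Callable[[str, list[str]], Awaitable[Verdict]]


async def run_loop(
    question: str,
    search: SearchFn,
    judge: JudgeFn,
    max_iterations: int = 3,
) -> AsyncIterator[str]:
    """Search, judge, and repeat until the judge is satisfied or budget runs out."""
    evidence: list[str] = []
    queries = [question]
    for iteration in range(1, max_iterations + 1):
        for query in queries:
            evidence.extend(await search(query))
        yield f"iteration {iteration}: {len(evidence)} evidence items"

        verdict = await judge(question, evidence)
        if verdict.sufficient:
            yield "complete: evidence judged sufficient"
            return
        # Fall back to the original question if the judge suggests nothing.
        queries = verdict.next_queries or [question]
    yield "stopped: iteration budget exhausted"


async def _demo() -> None:
    async def fake_search(query: str) -> list[str]:
        return [f"result for {query!r}"]

    async def fake_judge(question: str, evidence: list[str]) -> Verdict:
        return Verdict(sufficient=len(evidence) >= 2, next_queries=["follow-up"])

    async for event in run_loop("long covid", fake_search, fake_judge):
        print(event)


if __name__ == "__main__":
    asyncio.run(_demo())
```

The hard iteration cap matters: without it, a judge that never returns "sufficient" would loop (and spend tokens) indefinitely.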
---
### **Phase 5: Magentic Integration** ✅ COMPLETE
*Goal: Upgrade orchestrator to use Microsoft Agent Framework patterns.*
- [x] Wrap SearchHandler as `AgentProtocol` (SearchAgent) with strict protocol compliance.
- [x] Wrap JudgeHandler as `AgentProtocol` (JudgeAgent) with strict protocol compliance.
- [x] Implement `MagenticOrchestrator` using `MagenticBuilder`.
- [x] Create factory pattern for switching implementations.
- **Deliverable**: Same API, better multi-agent orchestration engine.
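
The "factory pattern for switching implementations" might look roughly like the sketch below. It deliberately avoids the real Microsoft Agent Framework API: `MagenticOrchestratorStub`, `SimpleOrchestrator`, the `Orchestrator` protocol, and the mode strings are placeholders invented for illustration, and only the switching mechanism is the point.

```python
from __future__ import annotations

from typing import Callable, Protocol


class Orchestrator(Protocol):
    """The shared API both implementations expose (hypothetical)."""

    def run(self, question: str) -> str: ...


class SimpleOrchestrator:
    """Stands in for the original Search -> Judge loop."""

    def run(self, question: str) -> str:
        return f"[simple] {question}"


class MagenticOrchestratorStub:
    """Placeholder for the multi-agent implementation; no framework calls."""

    def run(self, question: str) -> str:
        return f"[magentic] {question}"


# Registry of zero-argument constructors, keyed by mode name.
_REGISTRY: dict[str, Callable[[], Orchestrator]] = {
    "simple": SimpleOrchestrator,
    "magentic": MagenticOrchestratorStub,
}


def create_orchestrator(mode: str = "simple") -> Orchestrator:
    """Return the orchestrator for `mode`, or raise ValueError if unknown."""
    try:
        factory = _REGISTRY[mode]
    except KeyError:
        raise ValueError(
            f"unknown orchestrator mode {mode!r}; expected one of {sorted(_REGISTRY)}"
        ) from None
    return factory()
```

Because callers depend only on the `Orchestrator` protocol, the UI code does not change when the engine behind it does, which is what "same API" in the deliverable implies.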
---
### **Phase 6: Embeddings & Semantic Search**
*Goal: Add vector search for semantic evidence retrieval.*
- [ ] Implement `EmbeddingService` with ChromaDB.
- [ ] Add semantic deduplication to SearchAgent.
- [ ] Enable semantic search for related evidence.
- [ ] Store embeddings in shared context.
- **Deliverable**: Find semantically related papers, not just keyword matches.
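
Semantic deduplication usually reduces to comparing embedding vectors. The sketch below shows the idea with plain Python lists and cosine similarity; it assumes embeddings have already been computed elsewhere, does not call ChromaDB, and uses an arbitrary threshold of 0.9 and a hypothetical `deduplicate` helper.

```python
from __future__ import annotations

import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def deduplicate(
    items: list[tuple[str, list[float]]],
    threshold: float = 0.9,
) -> list[str]:
    """Keep each text unless it is too similar to one already kept.

    `items` pairs a text with its precomputed embedding. Order is preserved,
    so the first occurrence of a near-duplicate cluster wins.
    """
    kept: list[tuple[str, list[float]]] = []
    for text, vector in items:
        if all(cosine_similarity(vector, seen) < threshold for _, seen in kept):
            kept.append((text, vector))
    return [text for text, _ in kept]
```

This pairwise scan is quadratic in the number of items, which is fine for a few hundred search results; a vector store becomes worthwhile when the corpus grows or must persist across queries.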
---
### **Phase 7: Hypothesis Agent**
*Goal: Generate scientific hypotheses to guide targeted searches.*
- [ ] Implement `MechanismHypothesis` and `HypothesisAssessment` models.
- [ ] Implement `HypothesisAgent` for mechanistic reasoning.
- [ ] Add hypothesis-driven search queries.
- [ ] Integrate into Magentic workflow.
- **Deliverable**: Drug → Target → Pathway → Effect hypotheses that guide research.
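
A hypothesis model following the Drug → Target → Pathway → Effect chain could be as simple as the sketch below. Only the class name `MechanismHypothesis` comes from the plan; the fields, the `confidence` range, and the query-generation helper are assumptions, and a stdlib dataclass stands in for a Pydantic model.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MechanismHypothesis:
    """One mechanistic chain linking a drug to an effect (fields assumed)."""

    drug: str
    target: str
    pathway: str
    effect: str
    confidence: float = 0.5  # assumed scale: 0.0 (speculative) to 1.0 (established)

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence must be in [0, 1], got {self.confidence}")

    def chain(self) -> str:
        """Human-readable form of the hypothesis."""
        return " -> ".join((self.drug, self.target, self.pathway, self.effect))

    def search_queries(self) -> list[str]:
        """One query per link in the chain, so each step can be checked separately."""
        return [
            f"{self.drug} {self.target}",
            f"{self.target} {self.pathway}",
            f"{self.pathway} {self.effect}",
        ]
```

Splitting the chain into one query per link is the "hypothesis-driven search" idea in miniature: weak evidence for a single link points at exactly where the hypothesis needs more support.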
---
### **Phase 8: Report Agent**
*Goal: Generate structured scientific reports with proper citations.*
- [ ] Implement `ResearchReport` model with all sections.
- [ ] Implement `ReportAgent` for synthesis.
- [ ] Include methodology, limitations, formatted references.
- [ ] Integrate as final synthesis step in Magentic workflow.
- **Deliverable**: Publication-quality research reports.
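
As a rough illustration of a report model with methodology, limitations, and formatted references, consider the sketch below. The class name `ResearchReport` is from the plan, but the specific fields, the `Reference` type, and the Markdown layout are assumptions chosen for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Reference:
    authors: str
    title: str
    year: int
    url: str


@dataclass
class ResearchReport:
    """Structured report; the section set here is an assumed minimum."""

    title: str
    summary: str
    methodology: str
    limitations: list[str] = field(default_factory=list)
    references: list[Reference] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render the report with numbered references."""
        lines = [
            f"# {self.title}",
            "",
            "## Summary",
            self.summary,
            "",
            "## Methodology",
            self.methodology,
            "",
            "## Limitations",
        ]
        lines += [f"- {item}" for item in self.limitations]
        lines += ["", "## References"]
        lines += [
            f"{i}. {ref.authors} ({ref.year}). {ref.title}. {ref.url}"
            for i, ref in enumerate(self.references, start=1)
        ]
        return "\n".join(lines)
```

Keeping the report as structured data and rendering it at the end means the same object can feed the Gradio UI, a file export, or an API response without re-parsing text.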
---
## Complete Architecture (Phases 1-8)
```text
User Query
    ↓
Gradio UI (Phase 4)
    ↓
Magentic Manager (Phase 5)
    ├── SearchAgent (Phase 2+5) ── PubMed + Web + VectorDB (Phase 6)
    ├── HypothesisAgent (Phase 7) ── Mechanistic Reasoning
    ├── JudgeAgent (Phase 3+5) ── Evidence Assessment
    └── ReportAgent (Phase 8) ── Final Synthesis
    ↓
Structured Research Report
```
---
## Spec Documents
### Core Platform (Phases 1-8)
1. **[Phase 1 Spec: Foundation](01_phase_foundation.md)** ✅
2. **[Phase 2 Spec: Search Slice](02_phase_search.md)** ✅
3. **[Phase 3 Spec: Judge Slice](03_phase_judge.md)** ✅
4. **[Phase 4 Spec: UI & Loop](04_phase_ui.md)** ✅
5. **[Phase 5 Spec: Magentic Integration](05_phase_magentic.md)** ✅
6. **[Phase 6 Spec: Embeddings & Semantic Search](06_phase_embeddings.md)** ✅
7. **[Phase 7 Spec: Hypothesis Agent](07_phase_hypothesis.md)** ✅
8. **[Phase 8 Spec: Report Agent](08_phase_report.md)** ✅
### Multi-Source Search (Phases 9-11)
9. **[Phase 9 Spec: Remove DuckDuckGo](09_phase_source_cleanup.md)** ✅
10. **[Phase 10 Spec: ClinicalTrials.gov](10_phase_clinicaltrials.md)** ✅
11. **[Phase 11 Spec: bioRxiv Preprints](11_phase_biorxiv.md)** ✅
### Hackathon Integration (Phases 12-14)
12. **[Phase 12 Spec: MCP Server](12_phase_mcp_server.md)** ✅ COMPLETE
13. **[Phase 13 Spec: Modal Pipeline](13_phase_modal_integration.md)** P1 - $2,500
14. **[Phase 14 Spec: Demo & Submission](14_phase_demo_submission.md)** P0 - REQUIRED
---
## Progress Summary
| Phase | Status | Deliverable |
|-------|--------|-------------|
| Phase 1: Foundation | ✅ COMPLETE | CI-ready repo with uv/pytest |
| Phase 2: Search | ✅ COMPLETE | PubMed + Web search |
| Phase 3: Judge | ✅ COMPLETE | LLM evidence assessment |
| Phase 4: UI & Loop | ✅ COMPLETE | Working Gradio app |
| Phase 5: Magentic | ✅ COMPLETE | Multi-agent orchestration |
| Phase 6: Embeddings | ✅ COMPLETE | Semantic search + ChromaDB |
| Phase 7: Hypothesis | ✅ COMPLETE | Mechanistic reasoning chains |
| Phase 8: Report | ✅ COMPLETE | Structured scientific reports |
| Phase 9: Source Cleanup | ✅ COMPLETE | Remove DuckDuckGo |
| Phase 10: ClinicalTrials | ✅ COMPLETE | ClinicalTrials.gov API |
| Phase 11: bioRxiv | ✅ COMPLETE | Preprint search |
| Phase 12: MCP Server | ✅ COMPLETE | MCP protocol integration |
| Phase 13: Modal Pipeline | SPEC READY | Sandboxed code execution |
| Phase 14: Demo & Submit | SPEC READY | Hackathon submission |
*Phases 1-12 are complete. Phases 13-14 target the hackathon prizes.*
---
## Hackathon Prize Potential
| Award | Amount | Requirement | Phase |
|-------|--------|-------------|-------|
| Track 2: MCP in Action (1st) | $2,500 | MCP server working | 12 |
| Modal Innovation | $2,500 | Sandbox demo ready | 13 |
| LlamaIndex | $1,000 | Using RAG | ✅ Done |
| Community Choice | $1,000 | Great demo video | 14 |
| **Total Potential** | **$7,000** | | |
**Deadline: November 30, 2025 11:59 PM UTC**