Joseph Pollack
# Implementation Roadmap: DeepCritical (Vertical Slices)
**Philosophy:** AI-Native Engineering, Vertical Slice Architecture, TDD, Modern Tooling (2025).
This roadmap defines the execution strategy to deliver **DeepCritical** effectively. We reject "overplanning" in favor of **ironclad, testable vertical slices**. Each phase delivers a fully functional slice of end-to-end value.
---
## The 2025 "Gucci" Tooling Stack
We are using the bleeding edge of Python engineering to ensure speed, safety, and developer joy.
| Category | Tool | Why? |
|----------|------|------|
| **Package Manager** | **`uv`** | Rust-based, 10-100x faster than pip/poetry. Manages Python versions, venvs, and deps. |
| **Linting/Format** | **`ruff`** | Rust-based, instant. Replaces black, isort, flake8. |
| **Type Checking** | **`mypy`** | Strict static typing. Run via `uv run mypy`. |
| **Testing** | **`pytest`** | The standard. |
| **Test Plugins** | **`pytest-sugar`** | Instant feedback, progress bars. "Gucci" visuals. |
| **Test Plugins** | **`pytest-asyncio`** | Essential for our async agent loop. |
| **Test Plugins** | **`pytest-cov`** | Coverage reporting to ensure TDD adherence. |
| **Git Hooks** | **`pre-commit`** | Enforce ruff/mypy before commit. |
---
## Architecture: Vertical Slices
Instead of horizontal layers (e.g., "Building the Database Layer"), we build **Vertical Slices**.
Each slice implements a feature from **Entry Point (UI/API) -> Logic -> Data/External**.
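Here is a minimal illustration of one slice cutting through all three layers. It assumes the data/external layer sits behind a small `typing.Protocol`, so the logic can be unit-tested against an in-memory fake before any real API exists. `SearchSource`, `FakeSource`, `search_slice`, and `handle_request` are hypothetical names, not modules from this project's layout.

```python
from typing import Protocol


class SearchSource(Protocol):
    """Data/External layer: anything that returns raw hits for a query."""

    def fetch(self, query: str) -> list[str]: ...


class FakeSource:
    """In-memory test double standing in for a real API client."""

    def __init__(self, hits: list[str]) -> None:
        self._hits = hits

    def fetch(self, query: str) -> list[str]:
        # Case-insensitive substring match as a stand-in for a real search.
        return [hit for hit in self._hits if query.lower() in hit.lower()]


def search_slice(query: str, source: SearchSource) -> list[str]:
    """Logic layer: validate the query, call the source, drop duplicate hits."""
    cleaned = query.strip()
    if not cleaned:
        raise ValueError("query must not be empty")
    results: list[str] = []
    for hit in source.fetch(cleaned):
        if hit not in results:
            results.append(hit)
    return results


def handle_request(raw_query: str, source: SearchSource) -> str:
    """Entry point layer: what a UI or API handler would call and render."""
    hits = search_slice(raw_query, source)
    return "\n".join(hits) if hits else "No results."
```

Swapping `FakeSource` for a real client changes only the bottom layer. The entry point and logic stay untouched, which is what makes each slice independently testable.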
### Directory Structure (Maintainer's Structure)
```text
src/
├── app.py                  # Entry point (Gradio UI)
├── orchestrator.py         # Agent loop (Search -> Judge -> Loop)
├── agent_factory/          # Agent creation and judges
│   ├── __init__.py
│   ├── agents.py           # PydanticAI agent definitions
│   └── judges.py           # JudgeHandler for evidence assessment
├── tools/                  # Search tools
│   ├── __init__.py
│   ├── pubmed.py           # PubMed E-utilities tool
│   ├── clinicaltrials.py   # ClinicalTrials.gov API
│   ├── biorxiv.py          # bioRxiv/medRxiv preprints
│   ├── code_execution.py   # Modal sandbox execution
│   └── search_handler.py   # Orchestrates multiple tools
├── prompts/                # Prompt templates
│   ├── __init__.py
│   └── judge.py            # Judge prompts
├── utils/                  # Shared utilities
│   ├── __init__.py
│   ├── config.py           # Settings/configuration
│   ├── exceptions.py       # Custom exceptions
│   ├── models.py           # Shared Pydantic models
│   ├── dataloaders.py      # Data loading utilities
│   └── parsers.py          # Parsing utilities
├── middleware/             # (Future: middleware components)
├── database_services/      # (Future: database integrations)
└── retrieval_factory/      # (Future: RAG components)
tests/
├── unit/
│   ├── tools/
│   │   ├── test_pubmed.py
│   │   ├── test_clinicaltrials.py
│   │   ├── test_biorxiv.py
│   │   └── test_search_handler.py
│   ├── agent_factory/
│   │   └── test_judges.py
│   └── test_orchestrator.py
└── integration/
    └── test_pubmed_live.py
```
---
## Phased Execution Plan
### **Phase 1: Foundation & Tooling (Day 1)** ✅ COMPLETE
*Goal: A rock-solid, CI-ready environment with `uv` and `pytest` configured.*
- [x] Initialize `pyproject.toml` with `uv`.
- [x] Configure `ruff` (strict) and `mypy` (strict).
- [x] Set up `pytest` with `pytest-sugar` and `pytest-cov`.
- [x] Implement `src/utils/config.py` (Configuration Slice).
- [x] Implement `src/utils/exceptions.py` (Custom exceptions).
- **Deliverable**: A repo that passes CI with `uv run pytest`.
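Below is a rough sketch of what the configuration and exception slice could look like. It uses only the standard library; the real `config.py` may well use Pydantic settings instead. The environment variable names `LLM_API_KEY`, `MAX_ITERATIONS`, and `LOG_LEVEL`, and all class names, are hypothetical.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass


class DeepCriticalError(Exception):
    """Base class for all project-specific errors (hypothetical name)."""


class ConfigurationError(DeepCriticalError):
    """Raised when required settings are missing or invalid."""


class SearchError(DeepCriticalError):
    """Raised when an upstream search tool fails."""


@dataclass(frozen=True)
class Settings:
    llm_api_key: str
    max_iterations: int = 5
    log_level: str = "INFO"

    @classmethod
    def from_env(cls, env: Mapping[str, str] | None = None) -> Settings:
        """Build settings from os.environ, or from an injected mapping in tests."""
        source = os.environ if env is None else env
        api_key = source.get("LLM_API_KEY", "").strip()
        if not api_key:
            raise ConfigurationError("LLM_API_KEY is not set")
        raw_iterations = source.get("MAX_ITERATIONS", "5")
        try:
            max_iterations = int(raw_iterations)
        except ValueError as exc:
            raise ConfigurationError(
                f"MAX_ITERATIONS must be an integer, got {raw_iterations!r}"
            ) from exc
        if max_iterations < 1:
            raise ConfigurationError("MAX_ITERATIONS must be >= 1")
        return cls(
            llm_api_key=api_key,
            max_iterations=max_iterations,
            log_level=source.get("LOG_LEVEL", "INFO").upper(),
        )
```

Accepting an injected mapping keeps the settings testable without touching `os.environ`. A single base exception lets callers catch every project error with one `except` clause.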
### **Phase 2: The "Search" Vertical Slice (Day 2)** ✅ COMPLETE
*Goal: Agent can receive a query and get raw results from PubMed/Web.*
- [x] **TDD**: Write test for `SearchHandler`.
- [x] Implement `src/tools/pubmed.py` (PubMed E-utilities).
- [x] Implement `src/tools/websearch.py` (DuckDuckGo; later removed in Phase 9).
- [x] Implement `src/tools/search_handler.py` (Orchestrates tools).
- [x] Implement `src/utils/models.py` (Evidence, Citation, SearchResult).
- **Deliverable**: Function that takes "long covid" -> returns `List[Evidence]`.
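The Phase 2 deliverable can be sketched as a handler that fans a query out to several tools concurrently with `asyncio.gather` and merges their output. This sketch uses standard-library dataclasses in place of the Pydantic models named above. It also assumes each tool exposes a `search(query, max_results)` coroutine. The field names, `SearchTool`, and `FakeTool` are illustrative, not the project's actual API.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field
from typing import Protocol


@dataclass(frozen=True)
class Citation:
    source: str  # which tool produced it, e.g. "pubmed"
    title: str
    url: str


@dataclass(frozen=True)
class Evidence:
    content: str
    citation: Citation


@dataclass
class SearchResult:
    query: str
    evidence: list[Evidence] = field(default_factory=list)
    errors: list[str] = field(default_factory=list)


class SearchTool(Protocol):
    name: str

    async def search(self, query: str, max_results: int) -> list[Evidence]: ...


class SearchHandler:
    """Fans one query out to every tool concurrently and merges the results."""

    def __init__(self, tools: list[SearchTool], max_results_per_tool: int = 10) -> None:
        self.tools = tools
        self.max_results_per_tool = max_results_per_tool

    async def execute(self, query: str) -> SearchResult:
        # return_exceptions=True: one failing tool must not sink the others.
        outcomes = await asyncio.gather(
            *(tool.search(query, self.max_results_per_tool) for tool in self.tools),
            return_exceptions=True,
        )
        result = SearchResult(query=query)
        seen_urls: set[str] = set()
        for tool, outcome in zip(self.tools, outcomes):
            if isinstance(outcome, BaseException):
                result.errors.append(f"{tool.name}: {outcome}")
                continue
            for item in outcome:
                # De-duplicate across tools by citation URL.
                if item.citation.url not in seen_urls:
                    seen_urls.add(item.citation.url)
                    result.evidence.append(item)
        return result


class FakeTool:
    """Test double for TDD: returns canned evidence or raises on demand."""

    def __init__(self, name: str, items: list[Evidence] | None = None, fail: bool = False) -> None:
        self.name = name
        self.items = items or []
        self.fail = fail

    async def search(self, query: str, max_results: int) -> list[Evidence]:
        if self.fail:
            raise RuntimeError("upstream unavailable")
        return self.items[:max_results]
```

`FakeTool` is the kind of double the TDD step could start from. The handler's test can assert on de-duplication and on partial failure (one tool raising) without any network calls.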
### **Phase 3: The "Judge" Vertical Slice (Day 3)** ✅ COMPLETE
*Goal: Agent can decide if evidence is sufficient.*
- [x] **TDD**: Write test for `JudgeHandler` (Mocked LLM).
- [x] Implement `src/prompts/judge.py` (Structured outputs).
- [x] Implement `src/agent_factory/judges.py` (LLM interaction).
- **Deliverable**: Function that takes `List[Evidence]` -> returns `JudgeAssessment`.
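Here is a minimal sketch of a judge that can be tested with a mocked LLM. It assumes the model is injected as an async callable that returns JSON text. The roadmap's structured-outputs approach would likely replace the manual `json.loads` validation shown here. `JudgeAssessment`'s fields, `build_judge_prompt`, and `fake_llm` are hypothetical, and evidence is simplified to plain strings.

```python
from __future__ import annotations

import json
from collections.abc import Awaitable, Callable
from dataclasses import dataclass

LLMCall = Callable[[str], Awaitable[str]]  # prompt -> raw JSON text


@dataclass(frozen=True)
class JudgeAssessment:
    sufficient: bool
    confidence: float
    reasoning: str
    next_queries: list[str]


def build_judge_prompt(question: str, evidence: list[str]) -> str:
    """Number each evidence item so the model can refer to it."""
    numbered = "\n".join(f"[{i}] {text}" for i, text in enumerate(evidence, start=1))
    return (
        f"Question: {question}\n\nEvidence:\n{numbered}\n\n"
        'Reply with JSON: {"sufficient": bool, "confidence": 0-1, '
        '"reasoning": str, "next_queries": [str]}'
    )


class JudgeHandler:
    def __init__(self, llm: LLMCall) -> None:
        self._llm = llm

    async def assess(self, question: str, evidence: list[str]) -> JudgeAssessment:
        raw = await self._llm(build_judge_prompt(question, evidence))
        data = json.loads(raw)  # raises ValueError on malformed output
        confidence = float(data["confidence"])
        if not 0.0 <= confidence <= 1.0:
            raise ValueError(f"confidence out of range: {confidence}")
        return JudgeAssessment(
            sufficient=bool(data["sufficient"]),
            confidence=confidence,
            reasoning=str(data["reasoning"]),
            next_queries=[str(q) for q in data.get("next_queries", [])],
        )


async def fake_llm(prompt: str) -> str:
    """Mocked LLM for unit tests: ignores the prompt, returns a canned reply."""
    return json.dumps({
        "sufficient": False,
        "confidence": 0.4,
        "reasoning": "Only one small study.",
        "next_queries": ["long covid randomized trial"],
    })
```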
### **Phase 4: The "Loop" & UI Slice (Day 4)** ✅ COMPLETE
*Goal: End-to-End User Value.*
- [x] Implement `src/orchestrator.py` (Connects Search + Judge loops).
- [x] Build `src/app.py` (Gradio with Streaming).
- **Deliverable**: Working DeepCritical Agent on Hugging Face.
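One way to sketch the Search -> Judge loop with streaming is an async generator that yields a status line after each step, with search and judge injected as coroutines. This is an illustrative assumption, not the project's `orchestrator.py`. `run_research`, `Verdict`, and the message strings are hypothetical, and the Gradio wiring is omitted.

```python
from __future__ import annotations

from collections.abc import AsyncIterator, Awaitable, Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    sufficient: bool
    next_query: str | None = None  # judge's suggestion for the next search


SearchFn = Callable[[str], Awaitable[list[str]]]
JudgeFn = Callable[[str, list[str]], Awaitable[Verdict]]


async def run_research(
    question: str, search: SearchFn, judge: JudgeFn, max_iterations: int = 3
) -> AsyncIterator[str]:
    """Alternate Search -> Judge, streaming one status line per iteration."""
    evidence: list[str] = []
    query = question
    for iteration in range(1, max_iterations + 1):
        for item in await search(query):
            if item not in evidence:  # keep evidence unique across iterations
                evidence.append(item)
        yield f"Iteration {iteration}: {len(evidence)} evidence items"
        verdict = await judge(question, evidence)
        if verdict.sufficient:
            yield "Evidence sufficient: synthesizing answer"
            return
        # Fall back to the original question if the judge gave no refinement.
        query = verdict.next_query or question
    yield f"Stopped after {max_iterations} iterations: evidence may be incomplete"
```

Because the loop is a generator, a UI layer that accepts generator handlers can render each yielded line as it arrives. The `max_iterations` cap guarantees termination even when the judge is never satisfied.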
---
### **Phase 5: Magentic Integration** ✅ COMPLETE
*Goal: Upgrade orchestrator to use Microsoft Agent Framework patterns.*
- [x] Wrap SearchHandler as `AgentProtocol` (SearchAgent) with strict protocol compliance.
- [x] Wrap JudgeHandler as `AgentProtocol` (JudgeAgent) with strict protocol compliance.
- [x] Implement `MagenticOrchestrator` using `MagenticBuilder`.
- [x] Create factory pattern for switching implementations.
- **Deliverable**: Same API, better multi-agent orchestration engine.
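The factory idea can be sketched with a simple registry. Callers ask for an orchestrator by mode and only ever call `run()`, so swapping engines keeps the same API. The stubs below do not use the Microsoft Agent Framework. `Orchestrator`, `MagenticOrchestratorStub`, `create_orchestrator`, and the mode strings are hypothetical placeholders, and the real orchestrators are likely async.

```python
from __future__ import annotations

from collections.abc import Callable
from typing import Protocol


class Orchestrator(Protocol):
    def run(self, query: str) -> str: ...


class SimpleOrchestrator:
    """Placeholder for the original Search -> Judge loop."""

    def run(self, query: str) -> str:
        return f"simple:{query}"


class MagenticOrchestratorStub:
    """Placeholder for the multi-agent engine (built with MagenticBuilder per the roadmap)."""

    def run(self, query: str) -> str:
        return f"magentic:{query}"


# Registry of zero-argument constructors keyed by mode name.
_REGISTRY: dict[str, Callable[[], Orchestrator]] = {
    "simple": SimpleOrchestrator,
    "magentic": MagenticOrchestratorStub,
}


def create_orchestrator(mode: str = "simple") -> Orchestrator:
    """Return an orchestrator for `mode`; callers depend only on `run()`."""
    try:
        factory = _REGISTRY[mode]
    except KeyError:
        raise ValueError(
            f"unknown orchestrator mode {mode!r}; expected one of {sorted(_REGISTRY)}"
        ) from None
    return factory()
```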
---
### **Phase 6: Embeddings & Semantic Search** ✅ COMPLETE
*Goal: Add vector search for semantic evidence retrieval.*
- [x] Implement `EmbeddingService` with ChromaDB.
- [x] Add semantic deduplication to SearchAgent.
- [x] Enable semantic search for related evidence.
- [x] Store embeddings in shared context.
- **Deliverable**: Find semantically related papers, not just keyword matches.
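Semantic deduplication can be sketched as a greedy filter. Each item is embedded and kept only if its cosine similarity to every already-kept item stays below a threshold. This sketch deliberately leaves out ChromaDB and any real embedding model. The `embed` callable, the `0.9` default threshold, and the function names are assumptions for illustration.

```python
from __future__ import annotations

import math
from collections.abc import Callable, Sequence

Vector = Sequence[float]
EmbedFn = Callable[[str], Vector]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """dot(a, b) / (|a| * |b|), defined as 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_dedupe(texts: list[str], embed: EmbedFn, threshold: float = 0.9) -> list[str]:
    """Keep each text only if it is not too similar to any text already kept."""
    kept: list[str] = []
    kept_vectors: list[Vector] = []
    for text in texts:
        vector = embed(text)
        if all(cosine_similarity(vector, other) < threshold for other in kept_vectors):
            kept.append(text)
            kept_vectors.append(vector)
    return kept
```

The linear scan is O(n²) in the number of items. A vector store such as ChromaDB would replace it with an indexed nearest-neighbour query over stored embeddings.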
---
### **Phase 7: Hypothesis Agent** ✅ COMPLETE
*Goal: Generate scientific hypotheses to guide targeted searches.*
- [x] Implement `MechanismHypothesis` and `HypothesisAssessment` models.
- [x] Implement `HypothesisAgent` for mechanistic reasoning.
- [x] Add hypothesis-driven search queries.
- [x] Integrate into Magentic workflow.
- **Deliverable**: Drug → Target → Pathway → Effect hypotheses that guide research.
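Here is a sketch of how a mechanistic hypothesis could drive targeted searches. Each link of the Drug → Target → Pathway → Effect chain becomes one query, and queries from more confident hypotheses go first. The field names, the `0.5` confidence cutoff, and the query format are hypothetical. `HypothesisAssessment` is not modelled here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MechanismHypothesis:
    drug: str
    target: str
    pathway: str
    effect: str
    confidence: float  # 0-1, as estimated by the hypothesis agent

    def chain(self) -> str:
        return f"{self.drug} -> {self.target} -> {self.pathway} -> {self.effect}"

    def search_queries(self) -> list[str]:
        """Turn each link of the chain into a targeted literature query."""
        return [
            f"{self.drug} {self.target}",
            f"{self.target} {self.pathway}",
            f"{self.pathway} {self.effect}",
        ]


def queries_for(hypotheses: list[MechanismHypothesis], min_confidence: float = 0.5) -> list[str]:
    """Order queries by hypothesis confidence, skipping weak hypotheses and duplicates."""
    ranked = sorted(
        (h for h in hypotheses if h.confidence >= min_confidence),
        key=lambda h: h.confidence,
        reverse=True,
    )
    queries: list[str] = []
    for hypothesis in ranked:
        for query in hypothesis.search_queries():
            if query not in queries:
                queries.append(query)
    return queries
```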
---
### **Phase 8: Report Agent** ✅ COMPLETE
*Goal: Generate structured scientific reports with proper citations.*
- [x] Implement `ResearchReport` model with all sections.
- [x] Implement `ReportAgent` for synthesis.
- [x] Include methodology, limitations, and formatted references.
- [x] Integrate as final synthesis step in Magentic workflow.
- **Deliverable**: Publication-quality research reports.
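Here is a minimal sketch of a report model that renders its sections and a numbered reference list to Markdown. The section set, the `Reference` fields, and the citation format (first three authors, then "et al.") are assumptions for illustration, not the project's `ResearchReport` schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Reference:
    authors: list[str]
    title: str
    source: str  # journal, trial registry, or preprint server
    year: int
    url: str

    def format(self) -> str:
        # Truncate long author lists to the first three plus "et al."
        if len(self.authors) > 3:
            names = ", ".join(self.authors[:3]) + ", et al."
        else:
            names = ", ".join(self.authors)
        return f"{names} ({self.year}). {self.title}. {self.source}. {self.url}"


@dataclass
class ResearchReport:
    title: str
    summary: str
    methodology: str
    findings: list[str]
    limitations: list[str]
    references: list[Reference] = field(default_factory=list)

    def to_markdown(self) -> str:
        lines = [f"# {self.title}", "", "## Summary", self.summary, ""]
        lines += ["## Methodology", self.methodology, "", "## Findings"]
        lines += [f"- {finding}" for finding in self.findings]
        lines += ["", "## Limitations"]
        lines += [f"- {limitation}" for limitation in self.limitations]
        lines += ["", "## References"]
        lines += [f"{i}. {ref.format()}" for i, ref in enumerate(self.references, start=1)]
        return "\n".join(lines)
```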
---
## Complete Architecture (Phases 1-8)
```text
User Query
    ↓
Gradio UI (Phase 4)
    ↓
Magentic Manager (Phase 5)
    ├── SearchAgent (Phase 2+5)   ←→ PubMed + Web + VectorDB (Phase 6)
    ├── HypothesisAgent (Phase 7) ←→ Mechanistic Reasoning
    ├── JudgeAgent (Phase 3+5)    ←→ Evidence Assessment
    └── ReportAgent (Phase 8)     ←→ Final Synthesis
    ↓
Structured Research Report
```
---
## Spec Documents
### Core Platform (Phases 1-8)
1. **[Phase 1 Spec: Foundation](01_phase_foundation.md)** ✅
2. **[Phase 2 Spec: Search Slice](02_phase_search.md)** ✅
3. **[Phase 3 Spec: Judge Slice](03_phase_judge.md)** ✅
4. **[Phase 4 Spec: UI & Loop](04_phase_ui.md)** ✅
5. **[Phase 5 Spec: Magentic Integration](05_phase_magentic.md)** ✅
6. **[Phase 6 Spec: Embeddings & Semantic Search](06_phase_embeddings.md)** ✅
7. **[Phase 7 Spec: Hypothesis Agent](07_phase_hypothesis.md)** ✅
8. **[Phase 8 Spec: Report Agent](08_phase_report.md)** ✅
### Multi-Source Search (Phases 9-11)
9. **[Phase 9 Spec: Remove DuckDuckGo](09_phase_source_cleanup.md)** ✅
10. **[Phase 10 Spec: ClinicalTrials.gov](10_phase_clinicaltrials.md)** ✅
11. **[Phase 11 Spec: bioRxiv Preprints](11_phase_biorxiv.md)** ✅
### Hackathon Integration (Phases 12-14)
12. **[Phase 12 Spec: MCP Server](12_phase_mcp_server.md)** ✅ COMPLETE
13. **[Phase 13 Spec: Modal Pipeline](13_phase_modal_integration.md)** 📝 P1 - $2,500
14. **[Phase 14 Spec: Demo & Submission](14_phase_demo_submission.md)** 📝 P0 - REQUIRED
---
## Progress Summary
| Phase | Status | Deliverable |
|-------|--------|-------------|
| Phase 1: Foundation | ✅ COMPLETE | CI-ready repo with uv/pytest |
| Phase 2: Search | ✅ COMPLETE | PubMed + Web search |
| Phase 3: Judge | ✅ COMPLETE | LLM evidence assessment |
| Phase 4: UI & Loop | ✅ COMPLETE | Working Gradio app |
| Phase 5: Magentic | ✅ COMPLETE | Multi-agent orchestration |
| Phase 6: Embeddings | ✅ COMPLETE | Semantic search + ChromaDB |
| Phase 7: Hypothesis | ✅ COMPLETE | Mechanistic reasoning chains |
| Phase 8: Report | ✅ COMPLETE | Structured scientific reports |
| Phase 9: Source Cleanup | ✅ COMPLETE | Remove DuckDuckGo |
| Phase 10: ClinicalTrials | ✅ COMPLETE | ClinicalTrials.gov API |
| Phase 11: bioRxiv | ✅ COMPLETE | Preprint search |
| Phase 12: MCP Server | ✅ COMPLETE | MCP protocol integration |
| Phase 13: Modal Pipeline | 📝 SPEC READY | Sandboxed code execution |
| Phase 14: Demo & Submit | 📝 SPEC READY | Hackathon submission |
*Phases 1-12 COMPLETE. Phases 13-14 for hackathon prizes.*
---
## Hackathon Prize Potential
| Award | Amount | Requirement | Phase |
|-------|--------|-------------|-------|
| Track 2: MCP in Action (1st) | $2,500 | MCP server working | 12 |
| Modal Innovation | $2,500 | Sandbox demo ready | 13 |
| LlamaIndex | $1,000 | Using RAG | ✅ Done |
| Community Choice | $1,000 | Great demo video | 14 |
| **Total Potential** | **$7,000** | | |
**Deadline: November 30, 2025 11:59 PM UTC**