Spaces:
Running
Running
| # Implementation Roadmap: DeepCritical (Vertical Slices) | |
| **Philosophy:** AI-Native Engineering, Vertical Slice Architecture, TDD, Modern Tooling (2025). | |
| This roadmap defines the execution strategy to deliver **DeepCritical** effectively. We reject "overplanning" in favor of **ironclad, testable vertical slices**. Each phase delivers a fully functional slice of end-to-end value. | |
| --- | |
| ## The 2025 "Gucci" Tooling Stack | |
| We are using the bleeding edge of Python engineering to ensure speed, safety, and developer joy. | |
| | Category | Tool | Why? | | |
| |----------|------|------| | |
| | **Package Manager** | **`uv`** | Rust-based, 10-100x faster than pip/poetry. Manages python versions, venvs, and deps. | | |
| | **Linting/Format** | **`ruff`** | Rust-based, instant. Replaces black, isort, flake8. | | |
| | **Type Checking** | **`mypy`** | Strict static typing. Run via `uv run mypy`. | | |
| | **Testing** | **`pytest`** | The standard. | | |
| | **Test Plugins** | **`pytest-sugar`** | Instant feedback, progress bars. "Gucci" visuals. | | |
| | **Test Plugins** | **`pytest-asyncio`** | Essential for our async agent loop. | | |
| | **Test Plugins** | **`pytest-cov`** | Coverage reporting to ensure TDD adherence. | | |
| | **Git Hooks** | **`pre-commit`** | Enforce ruff/mypy before commit. | | |
| --- | |
| ## Architecture: Vertical Slices | |
| Instead of horizontal layers (e.g., "Building the Database Layer"), we build **Vertical Slices**. | |
| Each slice implements a feature from **Entry Point (UI/API) -> Logic -> Data/External**. | |
| ### Directory Structure (Maintainer's Structure) | |
| ```bash | |
| src/ | |
| βββ app.py # Entry point (Gradio UI) | |
| βββ orchestrator.py # Agent loop (Search -> Judge -> Loop) | |
| βββ agent_factory/ # Agent creation and judges | |
| β βββ __init__.py | |
| β βββ agents.py # PydanticAI agent definitions | |
| β βββ judges.py # JudgeHandler for evidence assessment | |
| βββ tools/ # Search tools | |
| β βββ __init__.py | |
| β βββ pubmed.py # PubMed E-utilities tool | |
| β βββ clinicaltrials.py # ClinicalTrials.gov API | |
| β βββ biorxiv.py # bioRxiv/medRxiv preprints | |
| β βββ code_execution.py # Modal sandbox execution | |
| β βββ search_handler.py # Orchestrates multiple tools | |
| βββ prompts/ # Prompt templates | |
| β βββ __init__.py | |
| β βββ judge.py # Judge prompts | |
| βββ utils/ # Shared utilities | |
| β βββ __init__.py | |
| β βββ config.py # Settings/configuration | |
| β βββ exceptions.py # Custom exceptions | |
| β βββ models.py # Shared Pydantic models | |
| β βββ dataloaders.py # Data loading utilities | |
| β βββ parsers.py # Parsing utilities | |
| βββ middleware/ # (Future: middleware components) | |
| βββ database_services/ # (Future: database integrations) | |
| βββ retrieval_factory/ # (Future: RAG components) | |
| tests/ | |
| βββ unit/ | |
| β βββ tools/ | |
| β β βββ test_pubmed.py | |
| β β βββ test_clinicaltrials.py | |
| β β βββ test_biorxiv.py | |
| β β βββ test_search_handler.py | |
| β βββ agent_factory/ | |
| β β βββ test_judges.py | |
| β βββ test_orchestrator.py | |
| βββ integration/ | |
| βββ test_pubmed_live.py | |
| ``` | |
| --- | |
| ## Phased Execution Plan | |
| ### **Phase 1: Foundation & Tooling (Day 1)** | |
| *Goal: A rock-solid, CI-ready environment with `uv` and `pytest` configured.* | |
| - [ ] Initialize `pyproject.toml` with `uv`. | |
| - [ ] Configure `ruff` (strict) and `mypy` (strict). | |
| - [ ] Set up `pytest` with sugar and coverage. | |
| - [ ] Implement `src/utils/config.py` (Configuration Slice). | |
| - [ ] Implement `src/utils/exceptions.py` (Custom exceptions). | |
| - **Deliverable**: A repo that passes CI with `uv run pytest`. | |
| ### **Phase 2: The "Search" Vertical Slice (Day 2)** | |
| *Goal: Agent can receive a query and get raw results from PubMed/Web.* | |
| - [ ] **TDD**: Write test for `SearchHandler`. | |
| - [ ] Implement `src/tools/pubmed.py` (PubMed E-utilities). | |
| - [ ] Implement `src/tools/websearch.py` (DuckDuckGo). | |
| - [ ] Implement `src/tools/search_handler.py` (Orchestrates tools). | |
| - [ ] Implement `src/utils/models.py` (Evidence, Citation, SearchResult). | |
| - **Deliverable**: Function that takes "long covid" -> returns `List[Evidence]`. | |
| ### **Phase 3: The "Judge" Vertical Slice (Day 3)** | |
| *Goal: Agent can decide if evidence is sufficient.* | |
| - [ ] **TDD**: Write test for `JudgeHandler` (Mocked LLM). | |
| - [ ] Implement `src/prompts/judge.py` (Structured outputs). | |
| - [ ] Implement `src/agent_factory/judges.py` (LLM interaction). | |
| - **Deliverable**: Function that takes `List[Evidence]` -> returns `JudgeAssessment`. | |
| ### **Phase 4: The "Loop" & UI Slice (Day 4)** | |
| *Goal: End-to-End User Value.* | |
| - [ ] Implement `src/orchestrator.py` (Connects Search + Judge loops). | |
| - [ ] Build `src/app.py` (Gradio with Streaming). | |
| - **Deliverable**: Working DeepCritical Agent on HuggingFace. | |
| --- | |
| ### **Phase 5: Magentic Integration** β COMPLETE | |
| *Goal: Upgrade orchestrator to use Microsoft Agent Framework patterns.* | |
| - [x] Wrap SearchHandler as `AgentProtocol` (SearchAgent) with strict protocol compliance. | |
| - [x] Wrap JudgeHandler as `AgentProtocol` (JudgeAgent) with strict protocol compliance. | |
| - [x] Implement `MagenticOrchestrator` using `MagenticBuilder`. | |
| - [x] Create factory pattern for switching implementations. | |
| - **Deliverable**: Same API, better multi-agent orchestration engine. | |
| --- | |
| ### **Phase 6: Embeddings & Semantic Search** | |
| *Goal: Add vector search for semantic evidence retrieval.* | |
| - [ ] Implement `EmbeddingService` with ChromaDB. | |
| - [ ] Add semantic deduplication to SearchAgent. | |
| - [ ] Enable semantic search for related evidence. | |
| - [ ] Store embeddings in shared context. | |
| - **Deliverable**: Find semantically related papers, not just keyword matches. | |
| --- | |
| ### **Phase 7: Hypothesis Agent** | |
| *Goal: Generate scientific hypotheses to guide targeted searches.* | |
| - [ ] Implement `MechanismHypothesis` and `HypothesisAssessment` models. | |
| - [ ] Implement `HypothesisAgent` for mechanistic reasoning. | |
| - [ ] Add hypothesis-driven search queries. | |
| - [ ] Integrate into Magentic workflow. | |
| - **Deliverable**: Drug β Target β Pathway β Effect hypotheses that guide research. | |
| --- | |
| ### **Phase 8: Report Agent** | |
| *Goal: Generate structured scientific reports with proper citations.* | |
| - [ ] Implement `ResearchReport` model with all sections. | |
| - [ ] Implement `ReportAgent` for synthesis. | |
| - [ ] Include methodology, limitations, formatted references. | |
| - [ ] Integrate as final synthesis step in Magentic workflow. | |
| - **Deliverable**: Publication-quality research reports. | |
| --- | |
| ## Complete Architecture (Phases 1-8) | |
| ```text | |
| User Query | |
| β | |
| Gradio UI (Phase 4) | |
| β | |
| Magentic Manager (Phase 5) | |
| βββ SearchAgent (Phase 2+5) ββ PubMed + Web + VectorDB (Phase 6) | |
| βββ HypothesisAgent (Phase 7) ββ Mechanistic Reasoning | |
| βββ JudgeAgent (Phase 3+5) ββ Evidence Assessment | |
| βββ ReportAgent (Phase 8) ββ Final Synthesis | |
| β | |
| Structured Research Report | |
| ``` | |
| --- | |
| ## Spec Documents | |
| ### Core Platform (Phases 1-8) | |
| 1. **[Phase 1 Spec: Foundation](01_phase_foundation.md)** β | |
| 2. **[Phase 2 Spec: Search Slice](02_phase_search.md)** β | |
| 3. **[Phase 3 Spec: Judge Slice](03_phase_judge.md)** β | |
| 4. **[Phase 4 Spec: UI & Loop](04_phase_ui.md)** β | |
| 5. **[Phase 5 Spec: Magentic Integration](05_phase_magentic.md)** β | |
| 6. **[Phase 6 Spec: Embeddings & Semantic Search](06_phase_embeddings.md)** β | |
| 7. **[Phase 7 Spec: Hypothesis Agent](07_phase_hypothesis.md)** β | |
| 8. **[Phase 8 Spec: Report Agent](08_phase_report.md)** β | |
| ### Multi-Source Search (Phases 9-11) | |
| 9. **[Phase 9 Spec: Remove DuckDuckGo](09_phase_source_cleanup.md)** β | |
| 10. **[Phase 10 Spec: ClinicalTrials.gov](10_phase_clinicaltrials.md)** β | |
| 11. **[Phase 11 Spec: bioRxiv Preprints](11_phase_biorxiv.md)** β | |
| ### Hackathon Integration (Phases 12-14) | |
| 12. **[Phase 12 Spec: MCP Server](12_phase_mcp_server.md)** β COMPLETE | |
| 13. **[Phase 13 Spec: Modal Pipeline](13_phase_modal_integration.md)** π P1 - $2,500 | |
| 14. **[Phase 14 Spec: Demo & Submission](14_phase_demo_submission.md)** π P0 - REQUIRED | |
| --- | |
| ## Progress Summary | |
| | Phase | Status | Deliverable | | |
| |-------|--------|-------------| | |
| | Phase 1: Foundation | β COMPLETE | CI-ready repo with uv/pytest | | |
| | Phase 2: Search | β COMPLETE | PubMed + Web search | | |
| | Phase 3: Judge | β COMPLETE | LLM evidence assessment | | |
| | Phase 4: UI & Loop | β COMPLETE | Working Gradio app | | |
| | Phase 5: Magentic | β COMPLETE | Multi-agent orchestration | | |
| | Phase 6: Embeddings | β COMPLETE | Semantic search + ChromaDB | | |
| | Phase 7: Hypothesis | β COMPLETE | Mechanistic reasoning chains | | |
| | Phase 8: Report | β COMPLETE | Structured scientific reports | | |
| | Phase 9: Source Cleanup | β COMPLETE | Remove DuckDuckGo | | |
| | Phase 10: ClinicalTrials | β COMPLETE | ClinicalTrials.gov API | | |
| | Phase 11: bioRxiv | β COMPLETE | Preprint search | | |
| | Phase 12: MCP Server | β COMPLETE | MCP protocol integration | | |
| | Phase 13: Modal Pipeline | π SPEC READY | Sandboxed code execution | | |
| | Phase 14: Demo & Submit | π SPEC READY | Hackathon submission | | |
| *Phases 1-12 COMPLETE. Phases 13-14 for hackathon prizes.* | |
| --- | |
| ## Hackathon Prize Potential | |
| | Award | Amount | Requirement | Phase | | |
| |-------|--------|-------------|-------| | |
| | Track 2: MCP in Action (1st) | $2,500 | MCP server working | 12 | | |
| | Modal Innovation | $2,500 | Sandbox demo ready | 13 | | |
| | LlamaIndex | $1,000 | Using RAG | β Done | | |
| | Community Choice | $1,000 | Great demo video | 14 | | |
| | **Total Potential** | **$7,000** | | | | |
| **Deadline: November 30, 2025 11:59 PM UTC** | |