Implementation Roadmap: DeepCritical (Vertical Slices)
Philosophy: AI-Native Engineering, Vertical Slice Architecture, TDD, Modern Tooling (2025).
This roadmap defines the execution strategy to deliver DeepCritical effectively. We reject "overplanning" in favor of ironclad, testable vertical slices. Each phase delivers a fully functional slice of end-to-end value.
The 2025 "Gucci" Tooling Stack
We are using the bleeding edge of Python engineering to ensure speed, safety, and developer joy.
| Category | Tool | Why? |
|---|---|---|
| Package Manager | uv | Rust-based, 10-100x faster than pip/poetry. Manages Python versions, venvs, and deps. |
| Linting/Format | ruff | Rust-based, instant. Replaces black, isort, flake8. |
| Type Checking | mypy | Strict static typing. Run via `uv run mypy`. |
| Testing | pytest | The standard. |
| Test Plugins | pytest-sugar | Instant feedback, progress bars. "Gucci" visuals. |
| Test Plugins | pytest-asyncio | Essential for our async agent loop. |
| Test Plugins | pytest-cov | Coverage reporting to ensure TDD adherence. |
| Git Hooks | pre-commit | Enforce ruff/mypy before commit. |
Architecture: Vertical Slices
Instead of horizontal layers (e.g., "Building the Database Layer"), we build Vertical Slices. Each slice implements a feature from Entry Point (UI/API) -> Logic -> Data/External.
Directory Structure (Maintainer's Structure)
src/
├── app.py                      # Entry point (Gradio UI)
├── orchestrator.py             # Agent loop (Search -> Judge -> Loop)
├── agent_factory/              # Agent creation and judges
│   ├── __init__.py
│   ├── agents.py               # PydanticAI agent definitions
│   └── judges.py               # JudgeHandler for evidence assessment
├── tools/                      # Search tools
│   ├── __init__.py
│   ├── pubmed.py               # PubMed E-utilities tool
│   ├── clinicaltrials.py       # ClinicalTrials.gov API
│   ├── biorxiv.py              # bioRxiv/medRxiv preprints
│   ├── code_execution.py       # Modal sandbox execution
│   └── search_handler.py       # Orchestrates multiple tools
├── prompts/                    # Prompt templates
│   ├── __init__.py
│   └── judge.py                # Judge prompts
├── utils/                      # Shared utilities
│   ├── __init__.py
│   ├── config.py               # Settings/configuration
│   ├── exceptions.py           # Custom exceptions
│   ├── models.py               # Shared Pydantic models
│   ├── dataloaders.py          # Data loading utilities
│   └── parsers.py              # Parsing utilities
├── middleware/                 # (Future: middleware components)
├── database_services/          # (Future: database integrations)
└── retrieval_factory/          # (Future: RAG components)
tests/
├── unit/
│   ├── tools/
│   │   ├── test_pubmed.py
│   │   ├── test_clinicaltrials.py
│   │   ├── test_biorxiv.py
│   │   └── test_search_handler.py
│   ├── agent_factory/
│   │   └── test_judges.py
│   └── test_orchestrator.py
└── integration/
    └── test_pubmed_live.py
Phased Execution Plan
Phase 1: Foundation & Tooling (Day 1)
Goal: A rock-solid, CI-ready environment with uv and pytest configured.
- Initialize `pyproject.toml` with `uv`.
- Configure `ruff` (strict) and `mypy` (strict).
- Set up `pytest` with sugar and coverage.
- Implement `src/utils/config.py` (Configuration Slice).
- Implement `src/utils/exceptions.py` (Custom exceptions).
- Deliverable: A repo that passes CI with `uv run pytest`.
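As a rough illustration of the configuration slice, the sketch below pairs a typed settings object with a small exception hierarchy. It uses only the standard library so it runs anywhere; the class names (`DeepCriticalError`, `ConfigurationError`, `Settings`), the field names, and the environment variable names (`LLM_API_KEY`, `MAX_ITERATIONS`) are all assumptions made for this example and may not match the real `config.py` or `exceptions.py`.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping


class DeepCriticalError(Exception):
    """Base class for project errors (hypothetical name)."""


class ConfigurationError(DeepCriticalError):
    """Raised when a setting is missing or invalid (hypothetical name)."""


@dataclass(frozen=True)
class Settings:
    """Typed settings read from the environment; field names are assumptions."""

    llm_api_key: str
    max_iterations: int = 5

    @classmethod
    def from_env(cls, env: Mapping[str, str] | None = None) -> Settings:
        # Accept an explicit mapping so tests never touch the real environment.
        source = os.environ if env is None else env
        key = source.get("LLM_API_KEY", "")
        if not key:
            raise ConfigurationError("LLM_API_KEY is not set")
        raw = source.get("MAX_ITERATIONS", "5")
        try:
            max_iterations = int(raw)
        except ValueError as exc:
            raise ConfigurationError(
                f"MAX_ITERATIONS must be an integer, got {raw!r}"
            ) from exc
        return cls(llm_api_key=key, max_iterations=max_iterations)
```

Passing the environment in as a mapping keeps the slice testable in isolation, which fits the TDD approach of this roadmap.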
Phase 2: The "Search" Vertical Slice (Day 2)
Goal: Agent can receive a query and get raw results from PubMed/Web.
- TDD: Write test for `SearchHandler`.
- Implement `src/tools/pubmed.py` (PubMed E-utilities).
- Implement `src/tools/websearch.py` (DuckDuckGo).
- Implement `src/tools/search_handler.py` (Orchestrates tools).
- Implement `src/utils/models.py` (Evidence, Citation, SearchResult).
- Deliverable: Function that takes "long covid" -> returns `List[Evidence]`.
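The Phase 2 deliverable can be pictured as a handler that fans a query out to several tools concurrently and merges whatever comes back. The sketch below is one possible shape under stated assumptions: plain dataclasses stand in for the project's Pydantic models, `StubTool` is a hypothetical offline stand-in for the real PubMed client, and the method name `execute` and its parameters are guesses rather than the actual `SearchHandler` API.

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol, Sequence


@dataclass(frozen=True)
class Citation:
    source: str
    title: str
    url: str


@dataclass(frozen=True)
class Evidence:
    content: str
    citation: Citation


class SearchTool(Protocol):
    name: str

    async def search(self, query: str, max_results: int) -> list[Evidence]: ...


class StubTool:
    """Offline stand-in for a real tool (hypothetical, for illustration only)."""

    def __init__(self, name: str, fail: bool = False) -> None:
        self.name = name
        self.fail = fail

    async def search(self, query: str, max_results: int) -> list[Evidence]:
        if self.fail:
            raise RuntimeError(f"{self.name} unavailable")
        return [
            Evidence(
                content=f"{self.name} result {i} for {query!r}",
                citation=Citation(self.name, f"Paper {i}", f"https://example.org/{self.name}/{i}"),
            )
            for i in range(max_results)
        ]


class SearchHandler:
    """Runs every tool concurrently and skips tools that raise."""

    def __init__(self, tools: Sequence[SearchTool]) -> None:
        self.tools = list(tools)

    async def execute(self, query: str, max_results_per_tool: int = 2) -> list[Evidence]:
        results = await asyncio.gather(
            *(tool.search(query, max_results_per_tool) for tool in self.tools),
            return_exceptions=True,  # one failing source should not sink the query
        )
        evidence: list[Evidence] = []
        for result in results:
            if isinstance(result, BaseException):
                continue
            evidence.extend(result)
        return evidence
```

A unit test can then call `asyncio.run(SearchHandler([...]).execute("long covid"))` against stub tools without any network access.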
Phase 3: The "Judge" Vertical Slice (Day 3)
Goal: Agent can decide if evidence is sufficient.
- TDD: Write test for `JudgeHandler` (Mocked LLM).
- Implement `src/prompts/judge.py` (Structured outputs).
- Implement `src/agent_factory/judges.py` (LLM interaction).
- Deliverable: Function that takes `List[Evidence]` -> returns `JudgeAssessment`.
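To show what "test with a mocked LLM" might look like for this slice, here is a minimal sketch in which the judge receives its LLM as an injected async callable. The fields of `JudgeAssessment`, the `assess` method, the `build_prompt` helper, and the `FakeLLM` test double are all assumptions for illustration; the real handler talks to an LLM through PydanticAI and may be structured differently.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable


@dataclass
class JudgeAssessment:
    # Field names are assumptions, not the project's actual schema.
    sufficient: bool
    reasoning: str
    next_queries: list[str] = field(default_factory=list)


LLMCall = Callable[[str], Awaitable[JudgeAssessment]]


def build_prompt(question: str, evidence: list[str]) -> str:
    """Render the question and evidence into a single prompt string."""
    bullet_list = "\n".join(f"- {item}" for item in evidence)
    return f"Question: {question}\nEvidence:\n{bullet_list}\nIs this sufficient?"


class JudgeHandler:
    def __init__(self, llm: LLMCall) -> None:
        self._llm = llm

    async def assess(self, question: str, evidence: list[str]) -> JudgeAssessment:
        # Short-circuit: no point paying for an LLM call with nothing to judge.
        if not evidence:
            return JudgeAssessment(False, "No evidence collected yet.", [question])
        return await self._llm(build_prompt(question, evidence))


class FakeLLM:
    """Test double that records prompts and returns a canned assessment."""

    def __init__(self, canned: JudgeAssessment) -> None:
        self.canned = canned
        self.prompts: list[str] = []

    async def __call__(self, prompt: str) -> JudgeAssessment:
        self.prompts.append(prompt)
        return self.canned
```

Injecting the LLM call keeps the unit test deterministic: the test asserts on the prompt the judge built and on how it handles the canned answer, not on model behavior.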
Phase 4: The "Loop" & UI Slice (Day 4)
Goal: End-to-End User Value.
- Implement `src/orchestrator.py` (Connects Search + Judge loops).
- Build `src/app.py` (Gradio with Streaming).
- Deliverable: Working DeepCritical Agent on HuggingFace.
Phase 5: Magentic Integration ✅ COMPLETE
Goal: Upgrade orchestrator to use Microsoft Agent Framework patterns.
- Wrap SearchHandler as `AgentProtocol` (SearchAgent) with strict protocol compliance.
- Wrap JudgeHandler as `AgentProtocol` (JudgeAgent) with strict protocol compliance.
- Implement `MagenticOrchestrator` using `MagenticBuilder`.
- Create factory pattern for switching implementations.
- Deliverable: Same API, better multi-agent orchestration engine.
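The "factory pattern for switching implementations" could look roughly like the sketch below: callers depend on a small protocol, and a registry maps a mode string to a concrete orchestrator. This deliberately avoids any Microsoft Agent Framework calls. `MagenticOrchestratorStub` is a placeholder invented for the example, and the `Orchestrator` protocol, the mode names, and `create_orchestrator` are likewise assumptions.

```python
from typing import Callable, Protocol


class Orchestrator(Protocol):
    """The shared API both implementations expose (assumed shape)."""

    def run(self, query: str) -> str: ...


class SimpleOrchestrator:
    """Stands in for the original Search + Judge loop."""

    def run(self, query: str) -> str:
        return f"[simple] {query}"


class MagenticOrchestratorStub:
    """Placeholder only: the real class would be assembled via the framework's builder."""

    def run(self, query: str) -> str:
        return f"[magentic] {query}"


_REGISTRY: dict[str, Callable[[], Orchestrator]] = {
    "simple": SimpleOrchestrator,
    "magentic": MagenticOrchestratorStub,
}


def create_orchestrator(mode: str = "simple") -> Orchestrator:
    """Return the orchestrator registered under `mode`."""
    try:
        factory = _REGISTRY[mode]
    except KeyError:
        known = ", ".join(sorted(_REGISTRY))
        raise ValueError(f"unknown mode {mode!r}; expected one of: {known}") from None
    return factory()
```

Because the UI only ever calls `create_orchestrator(...).run(...)`, swapping engines is a one-argument change, which is what "same API" in the deliverable implies.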
Phase 6: Embeddings & Semantic Search
Goal: Add vector search for semantic evidence retrieval.
- Implement `EmbeddingService` with ChromaDB.
- Add semantic deduplication to SearchAgent.
- Enable semantic search for related evidence.
- Store embeddings in shared context.
- Deliverable: Find semantically related papers, not just keyword matches.
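Semantic deduplication boils down to dropping any item whose embedding is too close to one already kept. The sketch below shows that logic with cosine similarity. Note the swap: it uses a toy bag-of-words `embed` function so it runs without any model or database, whereas the real `EmbeddingService` would use learned embeddings stored in ChromaDB. The function names and the `0.9` threshold are assumptions.

```python
import math
from collections import Counter


def embed(text: str) -> Counter[str]:
    """Toy embedding: lowercase word counts. A real service would use a model."""
    return Counter(text.lower().split())


def cosine(a: Counter[str], b: Counter[str]) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def deduplicate(texts: list[str], threshold: float = 0.9) -> list[str]:
    """Keep the first occurrence of each group of near-duplicate texts."""
    kept: list[tuple[str, Counter[str]]] = []
    for text in texts:
        vector = embed(text)
        if all(cosine(vector, other) < threshold for _, other in kept):
            kept.append((text, vector))
    return [text for text, _ in kept]
```

This greedy pass compares each candidate against everything kept so far, which is quadratic in the worst case; a vector store's nearest-neighbour query serves the same purpose at scale.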
Phase 7: Hypothesis Agent
Goal: Generate scientific hypotheses to guide targeted searches.
- Implement `MechanismHypothesis` and `HypothesisAssessment` models.
- Implement `HypothesisAgent` for mechanistic reasoning.
- Add hypothesis-driven search queries.
- Integrate into Magentic workflow.
- Deliverable: Drug → Target → Pathway → Effect hypotheses that guide research.
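One way to picture a hypothesis that "guides targeted searches" is a small model holding the four links of the chain, plus a method that turns each adjacent pair into a query. The sketch below uses a dataclass rather than Pydantic to stay dependency-free. The field names, the `confidence` field, and the `chain` and `search_queries` methods are assumptions, not the project's actual `MechanismHypothesis` schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MechanismHypothesis:
    # Field names mirror the Drug -> Target -> Pathway -> Effect chain (assumed).
    drug: str
    target: str
    pathway: str
    effect: str
    confidence: float = 0.5

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")

    def chain(self) -> str:
        """Human-readable form of the mechanistic chain."""
        return " → ".join((self.drug, self.target, self.pathway, self.effect))

    def search_queries(self) -> list[str]:
        """One query per adjacent link, so each step can be checked on its own."""
        links = (self.drug, self.target, self.pathway, self.effect)
        return [f"{left} {right}" for left, right in zip(links, links[1:])]
```

For an illustrative hypothesis such as `MechanismHypothesis("metformin", "AMPK", "mTOR signaling", "reduced inflammation")`, `search_queries()` produces three queries, one per link, that a search agent could run to look for evidence supporting or contradicting each step.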
Phase 8: Report Agent
Goal: Generate structured scientific reports with proper citations.
- Implement `ResearchReport` model with all sections.
- Implement `ReportAgent` for synthesis.
- Include methodology, limitations, formatted references.
- Integrate as final synthesis step in Magentic workflow.
- Deliverable: Publication-quality research reports.
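As a sketch of what a structured report model with formatted references might involve, the example below renders a report to markdown with numbered citations. The methodology, limitations, and references sections come from the list above; the remaining fields (`title`, `summary`), the `Reference` shape, and the `to_markdown` method are assumptions, and the full set of sections in the real `ResearchReport` is not specified here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Reference:
    title: str
    source: str
    url: str


@dataclass
class ResearchReport:
    # Only methodology, limitations, and references are named in the roadmap;
    # the other fields are placeholders for this example.
    title: str
    summary: str
    methodology: str
    limitations: list[str]
    references: list[Reference] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render the report with a numbered reference list."""
        lines = [
            f"# {self.title}",
            "",
            "## Summary",
            self.summary,
            "",
            "## Methodology",
            self.methodology,
            "",
            "## Limitations",
        ]
        lines += [f"- {item}" for item in self.limitations]
        lines += ["", "## References"]
        lines += [
            f"[{number}] {ref.title}. {ref.source}. {ref.url}"
            for number, ref in enumerate(self.references, start=1)
        ]
        return "\n".join(lines)
```

Keeping the report as typed data and rendering it in a separate step means the LLM only has to fill fields, while citation numbering and layout stay deterministic.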
Complete Architecture (Phases 1-8)
User Query
    ↓
Gradio UI (Phase 4)
    ↓
Magentic Manager (Phase 5)
    ├── SearchAgent (Phase 2+5) ── PubMed + Web + VectorDB (Phase 6)
    ├── HypothesisAgent (Phase 7) ── Mechanistic Reasoning
    ├── JudgeAgent (Phase 3+5) ── Evidence Assessment
    └── ReportAgent (Phase 8) ── Final Synthesis
    ↓
Structured Research Report
Spec Documents
Core Platform (Phases 1-8)
- Phase 1 Spec: Foundation ✅
- Phase 2 Spec: Search Slice ✅
- Phase 3 Spec: Judge Slice ✅
- Phase 4 Spec: UI & Loop ✅
- Phase 5 Spec: Magentic Integration ✅
- Phase 6 Spec: Embeddings & Semantic Search ✅
- Phase 7 Spec: Hypothesis Agent ✅
- Phase 8 Spec: Report Agent ✅
Multi-Source Search (Phases 9-11)
- Phase 9 Spec: Remove DuckDuckGo ✅
- Phase 10 Spec: ClinicalTrials.gov ✅
- Phase 11 Spec: bioRxiv Preprints ✅
Hackathon Integration (Phases 12-14)
- Phase 12 Spec: MCP Server ✅ COMPLETE
- Phase 13 Spec: Modal Pipeline (P1, $2,500)
- Phase 14 Spec: Demo & Submission (P0, REQUIRED)
Progress Summary
| Phase | Status | Deliverable |
|---|---|---|
| Phase 1: Foundation | ✅ COMPLETE | CI-ready repo with uv/pytest |
| Phase 2: Search | ✅ COMPLETE | PubMed + Web search |
| Phase 3: Judge | ✅ COMPLETE | LLM evidence assessment |
| Phase 4: UI & Loop | ✅ COMPLETE | Working Gradio app |
| Phase 5: Magentic | ✅ COMPLETE | Multi-agent orchestration |
| Phase 6: Embeddings | ✅ COMPLETE | Semantic search + ChromaDB |
| Phase 7: Hypothesis | ✅ COMPLETE | Mechanistic reasoning chains |
| Phase 8: Report | ✅ COMPLETE | Structured scientific reports |
| Phase 9: Source Cleanup | ✅ COMPLETE | Remove DuckDuckGo |
| Phase 10: ClinicalTrials | ✅ COMPLETE | ClinicalTrials.gov API |
| Phase 11: bioRxiv | ✅ COMPLETE | Preprint search |
| Phase 12: MCP Server | ✅ COMPLETE | MCP protocol integration |
| Phase 13: Modal Pipeline | SPEC READY | Sandboxed code execution |
| Phase 14: Demo & Submit | SPEC READY | Hackathon submission |
Phases 1-12 are COMPLETE. Phases 13-14 remain and target the hackathon prizes.
Hackathon Prize Potential
| Award | Amount | Requirement | Phase |
|---|---|---|---|
| Track 2: MCP in Action (1st) | $2,500 | MCP server working | 12 |
| Modal Innovation | $2,500 | Sandbox demo ready | 13 |
| LlamaIndex | $1,000 | Using RAG | ✅ Done |
| Community Choice | $1,000 | Great demo video | 14 |
| Total Potential | $7,000 | | |
Deadline: November 30, 2025 11:59 PM UTC