Joseph Pollack

Implementation Roadmap: DeepCritical (Vertical Slices)

Philosophy: AI-Native Engineering, Vertical Slice Architecture, TDD, Modern Tooling (2025).

This roadmap defines the execution strategy to deliver DeepCritical effectively. We reject "overplanning" in favor of ironclad, testable vertical slices. Each phase delivers a fully functional slice of end-to-end value.


The 2025 "Gucci" Tooling Stack

We are using the bleeding edge of Python engineering to ensure speed, safety, and developer joy.

| Category | Tool | Why? |
| --- | --- | --- |
| Package Manager | uv | Rust-based, 10-100x faster than pip/poetry. Manages Python versions, venvs, and deps. |
| Linting/Format | ruff | Rust-based, instant. Replaces black, isort, flake8. |
| Type Checking | mypy | Strict static typing. Run via uv run mypy. |
| Testing | pytest | The standard. |
| Test Plugins | pytest-sugar | Instant feedback, progress bars. "Gucci" visuals. |
| Test Plugins | pytest-asyncio | Essential for our async agent loop. |
| Test Plugins | pytest-cov | Coverage reporting to ensure TDD adherence. |
| Git Hooks | pre-commit | Enforce ruff/mypy before commit. |

Architecture: Vertical Slices

Instead of horizontal layers (e.g., "Building the Database Layer"), we build Vertical Slices. Each slice implements a feature from Entry Point (UI/API) -> Logic -> Data/External.

Directory Structure (Maintainer's Structure)

src/
├── app.py                      # Entry point (Gradio UI)
├── orchestrator.py             # Agent loop (Search -> Judge -> Loop)
├── agent_factory/              # Agent creation and judges
│   ├── __init__.py
│   ├── agents.py               # PydanticAI agent definitions
│   └── judges.py               # JudgeHandler for evidence assessment
├── tools/                      # Search tools
│   ├── __init__.py
│   ├── pubmed.py               # PubMed E-utilities tool
│   ├── clinicaltrials.py       # ClinicalTrials.gov API
│   ├── biorxiv.py              # bioRxiv/medRxiv preprints
│   ├── code_execution.py       # Modal sandbox execution
│   └── search_handler.py       # Orchestrates multiple tools
├── prompts/                    # Prompt templates
│   ├── __init__.py
│   └── judge.py                # Judge prompts
├── utils/                      # Shared utilities
│   ├── __init__.py
│   ├── config.py               # Settings/configuration
│   ├── exceptions.py           # Custom exceptions
│   ├── models.py               # Shared Pydantic models
│   ├── dataloaders.py          # Data loading utilities
│   └── parsers.py              # Parsing utilities
├── middleware/                 # (Future: middleware components)
├── database_services/          # (Future: database integrations)
└── retrieval_factory/          # (Future: RAG components)

tests/
├── unit/
│   ├── tools/
│   │   ├── test_pubmed.py
│   │   ├── test_clinicaltrials.py
│   │   ├── test_biorxiv.py
│   │   └── test_search_handler.py
│   ├── agent_factory/
│   │   └── test_judges.py
│   └── test_orchestrator.py
└── integration/
    └── test_pubmed_live.py

Phased Execution Plan

Phase 1: Foundation & Tooling (Day 1)

Goal: A rock-solid, CI-ready environment with uv and pytest configured.

  • Initialize pyproject.toml with uv.
  • Configure ruff (strict) and mypy (strict).
  • Set up pytest with sugar and coverage.
  • Implement src/utils/config.py (Configuration Slice).
  • Implement src/utils/exceptions.py (Custom exceptions).
  • Deliverable: A repo that passes CI with uv run pytest.
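
As an illustration of the Configuration Slice, here is a minimal sketch of what src/utils/config.py and src/utils/exceptions.py might contain. It uses only the standard library so it runs anywhere; the setting names (LLM_API_KEY, MAX_ITERATIONS), the Settings fields, and ConfigurationError are assumptions made for this example, not names taken from the project.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


class ConfigurationError(Exception):
    """Hypothetical custom exception for missing or invalid settings."""


@dataclass(frozen=True)
class Settings:
    llm_api_key: str         # hypothetical setting
    max_iterations: int = 5  # hypothetical default for the agent loop


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables, failing fast on bad input."""
    env = os.environ if env is None else env

    api_key = env.get("LLM_API_KEY", "")
    if not api_key:
        raise ConfigurationError("LLM_API_KEY is not set")

    raw_iterations = env.get("MAX_ITERATIONS", "5")
    try:
        max_iterations = int(raw_iterations)
    except ValueError as exc:
        raise ConfigurationError(
            f"MAX_ITERATIONS must be an integer, got {raw_iterations!r}"
        ) from exc

    return Settings(llm_api_key=api_key, max_iterations=max_iterations)
```

Passing the environment in as a mapping keeps the function easy to unit test: a test can call load_settings({"LLM_API_KEY": "test"}) without touching the real process environment.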

Phase 2: The "Search" Vertical Slice (Day 2)

Goal: Agent can receive a query and get raw results from PubMed/Web.

  • TDD: Write test for SearchHandler.
  • Implement src/tools/pubmed.py (PubMed E-utilities).
  • Implement src/tools/websearch.py (DuckDuckGo; later removed in Phase 9).
  • Implement src/tools/search_handler.py (Orchestrates tools).
  • Implement src/utils/models.py (Evidence, Citation, SearchResult).
  • Deliverable: Function that takes "long covid" -> returns List[Evidence].
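
The Search slice can be sketched roughly as follows. This is a simplified, offline stand-in: Evidence and Citation are modeled as plain dataclasses rather than Pydantic models, the field names are guesses, and StubTool replaces the real PubMed client so the example needs no network access. The execute method name is also an assumption.

```python
import asyncio
from dataclasses import dataclass
from typing import Iterable, List, Protocol


@dataclass(frozen=True)
class Citation:
    source: str  # e.g. "pubmed"
    title: str
    url: str


@dataclass(frozen=True)
class Evidence:
    content: str
    citation: Citation


class SearchTool(Protocol):
    name: str

    async def search(self, query: str, max_results: int) -> List[Evidence]: ...


class StubTool:
    """Offline stand-in for a real tool such as the PubMed client."""

    def __init__(self, name: str, fail: bool = False) -> None:
        self.name = name
        self.fail = fail

    async def search(self, query: str, max_results: int) -> List[Evidence]:
        if self.fail:
            raise RuntimeError(f"{self.name} unavailable")
        return [
            Evidence(
                content=f"{self.name} result {i} for {query}",
                citation=Citation(self.name, f"Title {i}", f"https://example.org/{self.name}/{i}"),
            )
            for i in range(max_results)
        ]


class SearchHandler:
    """Runs all tools concurrently and merges whatever succeeds."""

    def __init__(self, tools: Iterable[SearchTool]) -> None:
        self.tools = list(tools)

    async def execute(self, query: str, max_results_per_tool: int = 2) -> List[Evidence]:
        results = await asyncio.gather(
            *(tool.search(query, max_results_per_tool) for tool in self.tools),
            return_exceptions=True,  # one failing source should not sink the search
        )
        evidence: List[Evidence] = []
        for result in results:
            if isinstance(result, BaseException):
                continue  # a real handler would likely log or report this
            evidence.extend(result)
        return evidence
```

With these stubs, asyncio.run(SearchHandler([StubTool("pubmed"), StubTool("web", fail=True)]).execute("long covid")) returns only the PubMed evidence, which is the behavior a TDD test for SearchHandler could pin down first.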

Phase 3: The "Judge" Vertical Slice (Day 3)

Goal: Agent can decide if evidence is sufficient.

  • TDD: Write test for JudgeHandler (Mocked LLM).
  • Implement src/prompts/judge.py (Structured outputs).
  • Implement src/agent_factory/judges.py (LLM interaction).
  • Deliverable: Function that takes List[Evidence] -> returns JudgeAssessment.
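
A rough sketch of the Judge slice with a mocked LLM is shown below. The JudgeAssessment fields, the assess method name, and the JSON reply format are assumptions for illustration; the project uses PydanticAI for structured outputs, which this standard-library version only imitates by parsing a JSON string.

```python
import asyncio
import json
from dataclasses import dataclass, field
from typing import Awaitable, Callable, List


@dataclass(frozen=True)
class Evidence:
    content: str
    source: str


@dataclass(frozen=True)
class JudgeAssessment:
    sufficient: bool
    reasoning: str
    next_queries: List[str] = field(default_factory=list)


# Any async callable that maps a prompt to a reply; easy to mock in tests.
LLMCall = Callable[[str], Awaitable[str]]


class JudgeHandler:
    def __init__(self, llm: LLMCall) -> None:
        self._llm = llm

    async def assess(self, question: str, evidence: List[Evidence]) -> JudgeAssessment:
        if not evidence:
            # Nothing to judge yet, so skip the LLM call entirely.
            return JudgeAssessment(False, "No evidence collected yet.", [question])

        bullets = "\n".join(f"- [{e.source}] {e.content}" for e in evidence)
        prompt = (
            f"Question: {question}\nEvidence:\n{bullets}\n"
            "Reply as JSON with keys: sufficient, reasoning, next_queries."
        )
        raw = await self._llm(prompt)
        try:
            data = json.loads(raw)
            return JudgeAssessment(
                sufficient=bool(data["sufficient"]),
                reasoning=str(data["reasoning"]),
                next_queries=[str(q) for q in data.get("next_queries", [])],
            )
        except (json.JSONDecodeError, KeyError, TypeError, AttributeError):
            # Treat an unparseable reply as "keep searching" rather than crashing.
            return JudgeAssessment(False, "Judge reply could not be parsed.", [question])


async def fake_llm(prompt: str) -> str:
    """Mocked LLM used in place of a real model call."""
    return json.dumps(
        {"sufficient": True, "reasoning": "Two consistent sources.", "next_queries": []}
    )
```

Injecting the LLM as a callable is one way to satisfy the "Mocked LLM" requirement: the unit test swaps in fake_llm and asserts on the returned JudgeAssessment without any network traffic.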

Phase 4: The "Loop" & UI Slice (Day 4)

Goal: End-to-End User Value.

  • Implement src/orchestrator.py (Connects Search + Judge loops).
  • Build src/app.py (Gradio with Streaming).
  • Deliverable: Working DeepCritical Agent on HuggingFace.
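
The Search -> Judge -> Loop flow could look something like the sketch below. It is written as an async generator so a streaming UI can display progress events as they arrive. The function name run_loop, the event strings, and the simplified search and judge signatures are all assumptions for this example; no Gradio code is included.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, List, Tuple

SearchFn = Callable[[str], Awaitable[List[str]]]
JudgeFn = Callable[[str, List[str]], Awaitable[Tuple[bool, List[str]]]]


async def run_loop(
    question: str,
    search: SearchFn,
    judge: JudgeFn,
    max_iterations: int = 3,
) -> AsyncIterator[str]:
    """Search, judge, and repeat until evidence suffices or the budget runs out."""
    evidence: List[str] = []
    queries = [question]

    for iteration in range(1, max_iterations + 1):
        for query in queries:
            for item in await search(query):
                if item not in evidence:  # naive exact-match deduplication
                    evidence.append(item)
        yield f"iteration {iteration}: {len(evidence)} evidence items"

        sufficient, next_queries = await judge(question, evidence)
        if sufficient:
            yield "done: evidence judged sufficient"
            return
        # Fall back to the original question if the judge suggests nothing.
        queries = next_queries or [question]

    yield "stopped: iteration budget exhausted"


# Stubs standing in for the real Search and Judge slices.
async def stub_search(query: str) -> List[str]:
    return [f"paper about {query}"]


async def stub_judge(question: str, evidence: List[str]) -> Tuple[bool, List[str]]:
    return (len(evidence) >= 2, ["follow-up query"])


async def collect_events(question: str) -> List[str]:
    return [event async for event in run_loop(question, stub_search, stub_judge)]
```

The hard iteration cap matters here: without it, a judge that never reports sufficient evidence would keep the loop (and the LLM bill) running indefinitely.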

Phase 5: Magentic Integration ✅ COMPLETE

Goal: Upgrade orchestrator to use Microsoft Agent Framework patterns.

  • Wrap SearchHandler as AgentProtocol (SearchAgent) with strict protocol compliance.
  • Wrap JudgeHandler as AgentProtocol (JudgeAgent) with strict protocol compliance.
  • Implement MagenticOrchestrator using MagenticBuilder.
  • Create factory pattern for switching implementations.
  • Deliverable: Same API, better multi-agent orchestration engine.
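
The factory pattern for switching implementations might be sketched like this. Both orchestrator classes here are placeholders that only echo their input; in particular, this MagenticOrchestrator does not call the Microsoft Agent Framework or MagenticBuilder, and the mode names and create_orchestrator function are assumptions for illustration.

```python
from typing import Callable, Dict, Protocol


class Orchestrator(Protocol):
    """The shared API that every implementation must expose."""

    def run(self, query: str) -> str: ...


class SimpleOrchestrator:
    """Placeholder for the Phase 4 search/judge loop."""

    def run(self, query: str) -> str:
        return f"[simple] {query}"


class MagenticOrchestrator:
    """Placeholder only; the real class would delegate to the agent framework."""

    def run(self, query: str) -> str:
        return f"[magentic] {query}"


_REGISTRY: Dict[str, Callable[[], Orchestrator]] = {
    "simple": SimpleOrchestrator,
    "magentic": MagenticOrchestrator,
}


def create_orchestrator(mode: str = "simple") -> Orchestrator:
    """Return the orchestrator registered under `mode`."""
    try:
        factory = _REGISTRY[mode]
    except KeyError:
        raise ValueError(
            f"unknown orchestrator mode {mode!r}; expected one of {sorted(_REGISTRY)}"
        ) from None
    return factory()
```

Because callers depend only on the Orchestrator protocol, the UI can switch engines with a single string (for example from configuration) and keep the same API, which is the stated deliverable of this phase.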

Phase 6: Embeddings & Semantic Search

Goal: Add vector search for semantic evidence retrieval.

  • Implement EmbeddingService with ChromaDB.
  • Add semantic deduplication to SearchAgent.
  • Enable semantic search for related evidence.
  • Store embeddings in shared context.
  • Deliverable: Find semantically related papers, not just keyword matches.
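
To make semantic deduplication concrete, here is a toy sketch based on cosine similarity. It deliberately swaps in a bag-of-words vector for the real embedding model and keeps vectors in a Python list instead of ChromaDB, so it captures only the thresholding idea; the function names and the 0.9 threshold are assumptions.

```python
import math
from collections import Counter
from typing import Dict, List


def embed(text: str) -> Dict[str, float]:
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return {word: float(count) for word, count in Counter(text.lower().split()).items()}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def deduplicate(texts: List[str], threshold: float = 0.9) -> List[str]:
    """Keep a text only if it is not too similar to anything already kept."""
    kept: List[str] = []
    kept_vectors: List[Dict[str, float]] = []
    for text in texts:
        vector = embed(text)
        if all(cosine(vector, other) < threshold for other in kept_vectors):
            kept.append(text)
            kept_vectors.append(vector)
    return kept
```

With a real embedding model, the same loop would also catch paraphrases that share few words, which is what distinguishes semantic deduplication from exact or keyword matching.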

Phase 7: Hypothesis Agent

Goal: Generate scientific hypotheses to guide targeted searches.

  • Implement MechanismHypothesis and HypothesisAssessment models.
  • Implement HypothesisAgent for mechanistic reasoning.
  • Add hypothesis-driven search queries.
  • Integrate into Magentic workflow.
  • Deliverable: Drug → Target → Pathway → Effect hypotheses that guide research.
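
One possible shape for the hypothesis model and its hypothesis-driven queries is sketched below. Only the MechanismHypothesis name and the Drug, Target, Pathway, Effect chain come from this roadmap; the field names, the confidence field, and both methods are assumptions, and a dataclass stands in for the Pydantic model.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MechanismHypothesis:
    drug: str
    target: str
    pathway: str
    effect: str
    confidence: float = 0.5  # hypothetical field, expected in [0, 1]

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")

    def chain(self) -> str:
        """Render the mechanistic chain as a single readable string."""
        return " -> ".join([self.drug, self.target, self.pathway, self.effect])

    def to_search_queries(self) -> List[str]:
        """One query per link in the chain, so each step can be checked on its own."""
        return [
            f"{self.drug} {self.target}",
            f"{self.target} {self.pathway}",
            f"{self.pathway} {self.effect}",
        ]
```

For example, MechanismHypothesis("DrugX", "TargetY", "PathwayZ", "EffectW").to_search_queries() yields three targeted queries (placeholder names, not real biology), each of which the search agent could run to look for evidence supporting or contradicting one link.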

Phase 8: Report Agent

Goal: Generate structured scientific reports with proper citations.

  • Implement ResearchReport model with all sections.
  • Implement ReportAgent for synthesis.
  • Include methodology, limitations, formatted references.
  • Integrate as final synthesis step in Magentic workflow.
  • Deliverable: Publication-quality research reports.
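
A structured report with formatted references could be modeled along these lines. The section names follow the bullets above (methodology, limitations, references); everything else, including the Reference fields, the numbered citation style, and the render method, is an assumption made for this sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Reference:
    authors: str
    title: str
    source: str
    year: int
    url: str


@dataclass
class ResearchReport:
    title: str
    question: str
    methodology: str
    findings: List[str]
    limitations: List[str]
    references: List[Reference] = field(default_factory=list)

    def format_references(self) -> List[str]:
        """Number references in order so findings can cite them as [1], [2], ..."""
        return [
            f"[{i}] {r.authors} ({r.year}). {r.title}. {r.source}. {r.url}"
            for i, r in enumerate(self.references, start=1)
        ]

    def render(self) -> str:
        """Assemble all sections into one plain-text report."""
        lines = [self.title, "", f"Research question: {self.question}", ""]
        lines += ["Methodology", self.methodology, "", "Findings"]
        lines += [f"- {finding}" for finding in self.findings]
        lines += ["", "Limitations"]
        lines += [f"- {limitation}" for limitation in self.limitations]
        lines += ["", "References"]
        lines += self.format_references()
        return "\n".join(lines)
```

Keeping the report as typed data until the final render step means the ReportAgent's output can be validated (for instance, that every report has a limitations section) before anything is shown to the user.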

Complete Architecture (Phases 1-8)

User Query
    ↓
Gradio UI (Phase 4)
    ↓
Magentic Manager (Phase 5)
    ├── SearchAgent (Phase 2+5) ←→ PubMed + Web + VectorDB (Phase 6)
    ├── HypothesisAgent (Phase 7) ←→ Mechanistic Reasoning
    ├── JudgeAgent (Phase 3+5) ←→ Evidence Assessment
    └── ReportAgent (Phase 8) ←→ Final Synthesis
    ↓
Structured Research Report

Spec Documents

Core Platform (Phases 1-8)

  1. Phase 1 Spec: Foundation ✅
  2. Phase 2 Spec: Search Slice ✅
  3. Phase 3 Spec: Judge Slice ✅
  4. Phase 4 Spec: UI & Loop ✅
  5. Phase 5 Spec: Magentic Integration ✅
  6. Phase 6 Spec: Embeddings & Semantic Search ✅
  7. Phase 7 Spec: Hypothesis Agent ✅
  8. Phase 8 Spec: Report Agent ✅

Multi-Source Search (Phases 9-11)

  1. Phase 9 Spec: Remove DuckDuckGo ✅
  2. Phase 10 Spec: ClinicalTrials.gov ✅
  3. Phase 11 Spec: bioRxiv Preprints ✅

Hackathon Integration (Phases 12-14)

  1. Phase 12 Spec: MCP Server ✅ COMPLETE
  2. Phase 13 Spec: Modal Pipeline 📝 P1 - $2,500
  3. Phase 14 Spec: Demo & Submission 📝 P0 - REQUIRED

Progress Summary

| Phase | Status | Deliverable |
| --- | --- | --- |
| Phase 1: Foundation | ✅ COMPLETE | CI-ready repo with uv/pytest |
| Phase 2: Search | ✅ COMPLETE | PubMed + Web search |
| Phase 3: Judge | ✅ COMPLETE | LLM evidence assessment |
| Phase 4: UI & Loop | ✅ COMPLETE | Working Gradio app |
| Phase 5: Magentic | ✅ COMPLETE | Multi-agent orchestration |
| Phase 6: Embeddings | ✅ COMPLETE | Semantic search + ChromaDB |
| Phase 7: Hypothesis | ✅ COMPLETE | Mechanistic reasoning chains |
| Phase 8: Report | ✅ COMPLETE | Structured scientific reports |
| Phase 9: Source Cleanup | ✅ COMPLETE | Remove DuckDuckGo |
| Phase 10: ClinicalTrials | ✅ COMPLETE | ClinicalTrials.gov API |
| Phase 11: bioRxiv | ✅ COMPLETE | Preprint search |
| Phase 12: MCP Server | ✅ COMPLETE | MCP protocol integration |
| Phase 13: Modal Pipeline | 📝 SPEC READY | Sandboxed code execution |
| Phase 14: Demo & Submit | 📝 SPEC READY | Hackathon submission |

Phases 1-12 are COMPLETE. Phases 13-14 remain and target the hackathon prizes.


Hackathon Prize Potential

| Award | Amount | Requirement | Phase |
| --- | --- | --- | --- |
| Track 2: MCP in Action (1st) | $2,500 | MCP server working | 12 |
| Modal Innovation | $2,500 | Sandbox demo ready | 13 |
| LlamaIndex | $1,000 | Using RAG | ✅ Done |
| Community Choice | $1,000 | Great demo video | 14 |
| Total Potential | $7,000 | | |

Deadline: November 30, 2025 11:59 PM UTC