Joseph Pollack committed on
Commit a46bf8b · unverified · 1 Parent(s): 5840d45

restore docs ci

This view is limited to 50 files because it contains too many changes.

Files changed (50)
  1. .github/README.md +20 -166
  2. .pre-commit-hooks/run_pytest.ps1 +5 -0
  3. .pre-commit-hooks/run_pytest.sh +5 -0
  4. .pre-commit-hooks/run_pytest_embeddings.ps1 +14 -0
  5. .pre-commit-hooks/run_pytest_embeddings.sh +15 -0
  6. .pre-commit-hooks/run_pytest_unit.ps1 +14 -0
  7. .pre-commit-hooks/run_pytest_unit.sh +15 -0
  8. .pre-commit-hooks/run_pytest_with_sync.ps1 +25 -0
  9. .pre-commit-hooks/run_pytest_with_sync.py +235 -0
  10. README.md +92 -172
  11. dev/.cursorrules +241 -0
  12. dev/AGENTS.txt +236 -0
  13. dev/Makefile +51 -0
  14. dev/docs_plugins.py +74 -0
  15. docs/CONFIGURATION.md +0 -301
  16. docs/api/agents.md +0 -3
  17. docs/api/models.md +0 -3
  18. docs/api/orchestrators.md +0 -3
  19. docs/api/services.md +0 -3
  20. docs/api/tools.md +0 -3
  21. docs/architecture/agents.md +0 -3
  22. docs/architecture/design-patterns.md +0 -1509
  23. docs/architecture/graph-orchestration.md +152 -0
  24. docs/architecture/graph_orchestration.md +8 -0
  25. docs/architecture/middleware.md +0 -3
  26. docs/architecture/orchestrators.md +198 -0
  27. docs/architecture/overview.md +0 -474
  28. docs/architecture/services.md +0 -3
  29. docs/architecture/tools.md +0 -3
  30. docs/architecture/workflow-diagrams.md +670 -0
  31. docs/{workflow-diagrams.md → architecture/workflows.md} +0 -0
  32. docs/brainstorming/00_ROADMAP_SUMMARY.md +0 -194
  33. docs/brainstorming/01_PUBMED_IMPROVEMENTS.md +0 -125
  34. docs/brainstorming/02_CLINICALTRIALS_IMPROVEMENTS.md +0 -193
  35. docs/brainstorming/03_EUROPEPMC_IMPROVEMENTS.md +0 -211
  36. docs/brainstorming/04_OPENALEX_INTEGRATION.md +0 -303
  37. docs/brainstorming/implementation/15_PHASE_OPENALEX.md +0 -603
  38. docs/brainstorming/implementation/16_PHASE_PUBMED_FULLTEXT.md +0 -586
  39. docs/brainstorming/implementation/17_PHASE_RATE_LIMITING.md +0 -540
  40. docs/brainstorming/implementation/README.md +0 -143
  41. docs/brainstorming/magentic-pydantic/00_SITUATION_AND_PLAN.md +0 -189
  42. docs/brainstorming/magentic-pydantic/01_ARCHITECTURE_SPEC.md +0 -289
  43. docs/brainstorming/magentic-pydantic/02_IMPLEMENTATION_PHASES.md +0 -112
  44. docs/brainstorming/magentic-pydantic/03_IMMEDIATE_ACTIONS.md +0 -112
  45. docs/brainstorming/magentic-pydantic/04_FOLLOWUP_REVIEW_REQUEST.md +0 -158
  46. docs/brainstorming/magentic-pydantic/REVIEW_PROMPT_FOR_SENIOR_AGENT.md +0 -113
  47. docs/bugs/FIX_PLAN_MAGENTIC_MODE.md +0 -227
  48. docs/bugs/P0_MAGENTIC_MODE_BROKEN.md +0 -116
  49. docs/bugs/P1_GRADIO_SETTINGS_CLEANUP.md +0 -81
  50. docs/configuration/CONFIGURATION.md +743 -0
.github/README.md CHANGED
@@ -1,38 +1,21 @@
- ---
- title: DeepCritical
- emoji: 🧬
- colorFrom: blue
- colorTo: purple
- sdk: gradio
- sdk_version: "6.0.1"
- python_version: "3.11"
- app_file: src/app.py
- pinned: false
- license: mit
- tags:
- - mcp-in-action-track-enterprise
- - mcp-hackathon
- - drug-repurposing
- - biomedical-ai
- - pydantic-ai
- - llamaindex
- - modal
- ---
-
- # DeepCritical
-
- ## Intro
-
- ## Features
-
- - **Multi-Source Search**: PubMed, ClinicalTrials.gov, bioRxiv/medRxiv
- - **MCP Integration**: Use our tools from Claude Desktop or any MCP client
- - **Modal Sandbox**: Secure execution of AI-generated statistical code
- - **LlamaIndex RAG**: Semantic search and evidence synthesis
- - **HuggingfaceInference**:
- - **HuggingfaceMCP Custom Config To Use Community Tools**:
- - **Strongly Typed Composable Graphs**:
- - **Specialized Research Teams of Agents**:
+
+ > [!IMPORTANT]
+ > **You are reading the GitHub README!**
+ >
+ > - 📚 **Documentation**: See our [technical documentation](https://deepcritical.github.io/GradioDemo/) for detailed information
+ > - 📖 **Demo README**: Check out the [Demo README](../README.md) for setup, configuration, and contribution guidelines
+ > - 🏆 **Hackathon Submission**: Keep reading below for more information about our MCP Hackathon submission
+
+ <div align="center">
+
+ [![GitHub](https://img.shields.io/github/stars/DeepCritical/GradioDemo?style=for-the-badge&logo=github&logoColor=white&label=🐙%20GitHub&labelColor=181717&color=181717)](https://github.com/DeepCritical/GradioDemo)
+ [![Documentation](https://img.shields.io/badge/Docs-0080FF?style=for-the-badge&logo=readthedocs&logoColor=white&labelColor=0080FF&color=0080FF)](https://deepcritical.github.io/GradioDemo/)
+ [![Demo](https://img.shields.io/badge/🚀%20Demo-FFD21E?style=for-the-badge&logo=huggingface&logoColor=white&labelColor=FFD21E&color=FFD21E)](https://huggingface.co/spaces/DataQuests/DeepCritical)
+ [![codecov](https://codecov.io/gh/DeepCritical/GradioDemo/graph/badge.svg?token=B1f05RCGpz)](https://codecov.io/gh/DeepCritical/GradioDemo)
+ [![Join us on Discord](https://img.shields.io/discord/1109943800132010065?label=Discord&logo=discord&style=flat-square)](https://discord.gg/qdfnvSPcqP)
+
+ </div>

  ## Quick Start

@@ -43,14 +26,14 @@ tags:
  pip install uv

  # Sync dependencies
- uv sync
+ uv sync --all-extras
  ```

  ### 2. Run the UI

  ```bash
  # Start the Gradio app
- uv run gradio run src/app.py
+ gradio run "src/app.py"
  ```

  Open your browser to `http://localhost:7860`.

@@ -72,132 +55,3 @@ Add this to your `claude_desktop_config.json`:
  }
  }
  ```
-
- **Available Tools**:
- - `search_pubmed`: Search peer-reviewed biomedical literature.
- - `search_clinical_trials`: Search ClinicalTrials.gov.
- - `search_biorxiv`: Search bioRxiv/medRxiv preprints.
- - `search_all`: Search all sources simultaneously.
- - `analyze_hypothesis`: Secure statistical analysis using Modal sandboxes.
-
-
- ## Deep Research Flows
-
- - iterativeResearch
- - deepResearch
- - researchTeam
-
- ### Iterative Research
-
- sequenceDiagram
- participant IterativeFlow
- participant ThinkingAgent
- participant KnowledgeGapAgent
- participant ToolSelector
- participant ToolExecutor
- participant JudgeHandler
- participant WriterAgent
-
- IterativeFlow->>IterativeFlow: run(query)
-
- loop Until complete or max_iterations
- IterativeFlow->>ThinkingAgent: generate_observations()
- ThinkingAgent-->>IterativeFlow: observations
-
- IterativeFlow->>KnowledgeGapAgent: evaluate_gaps()
- KnowledgeGapAgent-->>IterativeFlow: KnowledgeGapOutput
-
- alt Research complete
- IterativeFlow->>WriterAgent: create_final_report()
- WriterAgent-->>IterativeFlow: final_report
- else Gaps remain
- IterativeFlow->>ToolSelector: select_agents(gap)
- ToolSelector-->>IterativeFlow: AgentSelectionPlan
-
- IterativeFlow->>ToolExecutor: execute_tool_tasks()
- ToolExecutor-->>IterativeFlow: ToolAgentOutput[]
-
- IterativeFlow->>JudgeHandler: assess_evidence()
- JudgeHandler-->>IterativeFlow: should_continue
- end
- end
-
-
- ### Deep Research
-
- sequenceDiagram
- actor User
- participant GraphOrchestrator
- participant InputParser
- participant GraphBuilder
- participant GraphExecutor
- participant Agent
- participant BudgetTracker
- participant WorkflowState
-
- User->>GraphOrchestrator: run(query)
- GraphOrchestrator->>InputParser: detect_research_mode(query)
- InputParser-->>GraphOrchestrator: mode (iterative/deep)
- GraphOrchestrator->>GraphBuilder: build_graph(mode)
- GraphBuilder-->>GraphOrchestrator: ResearchGraph
- GraphOrchestrator->>WorkflowState: init_workflow_state()
- GraphOrchestrator->>BudgetTracker: create_budget()
- GraphOrchestrator->>GraphExecutor: _execute_graph(graph)
-
- loop For each node in graph
- GraphExecutor->>Agent: execute_node(agent_node)
- Agent->>Agent: process_input
- Agent-->>GraphExecutor: result
- GraphExecutor->>WorkflowState: update_state(result)
- GraphExecutor->>BudgetTracker: add_tokens(used)
- GraphExecutor->>BudgetTracker: check_budget()
- alt Budget exceeded
- GraphExecutor->>GraphOrchestrator: emit(error_event)
- else Continue
- GraphExecutor->>GraphOrchestrator: emit(progress_event)
- end
- end
-
- GraphOrchestrator->>User: AsyncGenerator[AgentEvent]
-
- ### Research Team
- Critical Deep Research Agent
-
- ## Development
-
- ### Run Tests
-
- ```bash
- uv run pytest
- ```
-
- ### Run Checks
-
- ```bash
- make check
- ```
-
- ## Architecture
-
- DeepCritical uses a Vertical Slice Architecture:
-
- 1. **Search Slice**: Retrieving evidence from PubMed, ClinicalTrials.gov, and bioRxiv.
- 2. **Judge Slice**: Evaluating evidence quality using LLMs.
- 3. **Orchestrator Slice**: Managing the research loop and UI.
-
- Built with:
- - **PydanticAI**: For robust agent interactions.
- - **Gradio**: For the streaming user interface.
- - **PubMed, ClinicalTrials.gov, bioRxiv**: For biomedical data.
- - **MCP**: For universal tool access.
- - **Modal**: For secure code execution.
-
- ## Team
-
- - The-Obstacle-Is-The-Way
- - MarioAderman
- - Josephrp
-
- ## Links
-
- - [GitHub Repository](https://github.com/The-Obstacle-Is-The-Way/DeepCritical-1)
.pre-commit-hooks/run_pytest.ps1 CHANGED
@@ -2,6 +2,8 @@
  # Uses uv if available, otherwise falls back to pytest

  if (Get-Command uv -ErrorAction SilentlyContinue) {
+     # Sync dependencies before running tests
+     uv sync
      uv run pytest $args
  } else {
      Write-Warning "uv not found, using system pytest (may have missing dependencies)"
@@ -12,3 +14,6 @@ if (Get-Command uv -ErrorAction SilentlyContinue) {
+
+
+
.pre-commit-hooks/run_pytest.sh CHANGED
@@ -3,6 +3,8 @@
  # Uses uv if available, otherwise falls back to pytest

  if command -v uv >/dev/null 2>&1; then
+     # Sync dependencies before running tests
+     uv sync
      uv run pytest "$@"
  else
      echo "Warning: uv not found, using system pytest (may have missing dependencies)"
@@ -13,3 +15,6 @@ fi
+
+
+
.pre-commit-hooks/run_pytest_embeddings.ps1 ADDED
@@ -0,0 +1,14 @@
# PowerShell wrapper to sync embeddings dependencies and run embeddings tests

$ErrorActionPreference = "Stop"

if (Get-Command uv -ErrorAction SilentlyContinue) {
    Write-Host "Syncing embeddings dependencies..."
    uv sync --extra embeddings
    Write-Host "Running embeddings tests..."
    uv run pytest tests/ -v -m local_embeddings --tb=short -p no:logfire
} else {
    Write-Error "uv not found"
    exit 1
}
.pre-commit-hooks/run_pytest_embeddings.sh ADDED
@@ -0,0 +1,15 @@
#!/bin/bash
# Wrapper script to sync embeddings dependencies and run embeddings tests

set -e

if command -v uv >/dev/null 2>&1; then
    echo "Syncing embeddings dependencies..."
    uv sync --extra embeddings
    echo "Running embeddings tests..."
    uv run pytest tests/ -v -m local_embeddings --tb=short -p no:logfire
else
    echo "Error: uv not found"
    exit 1
fi
.pre-commit-hooks/run_pytest_unit.ps1 ADDED
@@ -0,0 +1,14 @@
# PowerShell wrapper to sync dependencies and run unit tests

$ErrorActionPreference = "Stop"

if (Get-Command uv -ErrorAction SilentlyContinue) {
    Write-Host "Syncing dependencies..."
    uv sync
    Write-Host "Running unit tests..."
    uv run pytest tests/unit/ -v -m "not openai and not embedding_provider" --tb=short -p no:logfire
} else {
    Write-Error "uv not found"
    exit 1
}
.pre-commit-hooks/run_pytest_unit.sh ADDED
@@ -0,0 +1,15 @@
#!/bin/bash
# Wrapper script to sync dependencies and run unit tests

set -e

if command -v uv >/dev/null 2>&1; then
    echo "Syncing dependencies..."
    uv sync
    echo "Running unit tests..."
    uv run pytest tests/unit/ -v -m "not openai and not embedding_provider" --tb=short -p no:logfire
else
    echo "Error: uv not found"
    exit 1
fi
.pre-commit-hooks/run_pytest_with_sync.ps1 ADDED
@@ -0,0 +1,25 @@
# PowerShell wrapper for pytest runner
# Ensures uv is available and runs the Python script

param(
    [Parameter(Position=0)]
    [string]$TestType = "unit"
)

$ErrorActionPreference = "Stop"

# Check if uv is available
if (-not (Get-Command uv -ErrorAction SilentlyContinue)) {
    Write-Error "uv not found. Please install uv: https://github.com/astral-sh/uv"
    exit 1
}

# Get the script directory
$ScriptDir = Split-Path -Parent $MyInvocation.MyCommand.Path
$PythonScript = Join-Path $ScriptDir "run_pytest_with_sync.py"

# Run the Python script using uv
uv run python $PythonScript $TestType

exit $LASTEXITCODE
.pre-commit-hooks/run_pytest_with_sync.py ADDED
@@ -0,0 +1,235 @@
#!/usr/bin/env python3
"""Cross-platform pytest runner that syncs dependencies before running tests."""

import shutil
import subprocess
import sys
from pathlib import Path


def clean_caches(project_root: Path) -> None:
    """Remove pytest and Python cache directories and files.

    Comprehensively removes all cache files and directories to ensure
    clean test runs. Only scans specific directories to avoid resource
    exhaustion from scanning large directories like .venv on Windows.
    """
    # Directories to scan for caches (only project code, not dependencies)
    scan_dirs = ["src", "tests", ".pre-commit-hooks"]

    # Directories to exclude (to avoid resource issues)
    exclude_dirs = {
        ".venv",
        "venv",
        "ENV",
        "env",
        ".git",
        "node_modules",
        "dist",
        "build",
        ".eggs",
        "reference_repos",
        "folder",
    }

    # Comprehensive list of cache patterns to remove
    cache_patterns = [
        ".pytest_cache",
        "__pycache__",
        "*.pyc",
        "*.pyo",
        "*.pyd",
        ".mypy_cache",
        ".ruff_cache",
        ".coverage",
        "coverage.xml",
        "htmlcov",
        ".hypothesis",  # Hypothesis testing framework cache
        ".tox",  # Tox cache (if used)
        ".cache",  # General Python cache
    ]

    def should_exclude(path: Path) -> bool:
        """Check if a path should be excluded from cache cleanup."""
        # Check if any parent directory is in exclude list
        for parent in path.parents:
            if parent.name in exclude_dirs:
                return True
        # Check if the path itself is excluded
        if path.name in exclude_dirs:
            return True
        return False

    cleaned = []

    # Only scan specific directories to avoid resource exhaustion
    for scan_dir in scan_dirs:
        scan_path = project_root / scan_dir
        if not scan_path.exists():
            continue

        for pattern in cache_patterns:
            if "*" in pattern:
                # Handle glob patterns for files
                try:
                    for cache_file in scan_path.rglob(pattern):
                        if should_exclude(cache_file):
                            continue
                        try:
                            if cache_file.is_file():
                                cache_file.unlink()
                                cleaned.append(str(cache_file.relative_to(project_root)))
                        except OSError:
                            pass  # Ignore errors (file might be locked or already deleted)
                except OSError:
                    pass  # Ignore errors during directory traversal
            else:
                # Handle directory patterns
                try:
                    for cache_dir in scan_path.rglob(pattern):
                        if should_exclude(cache_dir):
                            continue
                        try:
                            if cache_dir.is_dir():
                                shutil.rmtree(cache_dir, ignore_errors=True)
                                cleaned.append(str(cache_dir.relative_to(project_root)))
                        except OSError:
                            pass  # Ignore errors (directory might be locked)
                except OSError:
                    pass  # Ignore errors during directory traversal

    # Also clean root-level caches (like .pytest_cache in project root)
    root_cache_patterns = [
        ".pytest_cache",
        ".mypy_cache",
        ".ruff_cache",
        ".coverage",
        "coverage.xml",
        "htmlcov",
        ".hypothesis",
        ".tox",
        ".cache",
        ".pytest",
    ]
    for pattern in root_cache_patterns:
        cache_path = project_root / pattern
        if cache_path.exists():
            try:
                if cache_path.is_dir():
                    shutil.rmtree(cache_path, ignore_errors=True)
                elif cache_path.is_file():
                    cache_path.unlink()
                cleaned.append(pattern)
            except OSError:
                pass

    # Also remove any .pyc files in root directory
    try:
        for pyc_file in project_root.glob("*.pyc"):
            try:
                pyc_file.unlink()
                cleaned.append(pyc_file.name)
            except OSError:
                pass
    except OSError:
        pass

    if cleaned:
        print(
            f"Cleaned {len(cleaned)} cache items: {', '.join(cleaned[:10])}{'...' if len(cleaned) > 10 else ''}"
        )
    else:
        print("No cache files found to clean")


def run_command(
    cmd: list[str], check: bool = True, shell: bool = False, cwd: str | None = None
) -> int:
    """Run a command and return exit code."""
    try:
        result = subprocess.run(
            cmd,
            check=check,
            shell=shell,
            cwd=cwd,
            env=None,  # Use current environment, uv will handle venv
        )
        return result.returncode
    except subprocess.CalledProcessError as e:
        return e.returncode
    except FileNotFoundError:
        print(f"Error: Command not found: {cmd[0]}")
        return 1


def main() -> int:
    """Main entry point."""
    import os

    # Get the project root (where pyproject.toml is)
    script_dir = Path(__file__).parent
    project_root = script_dir.parent

    # Change to project root to ensure uv works correctly
    os.chdir(project_root)

    # Clean caches before running tests
    print("Cleaning pytest and Python caches...")
    clean_caches(project_root)

    # Check if uv is available
    if run_command(["uv", "--version"], check=False) != 0:
        print("Error: uv not found. Please install uv: https://github.com/astral-sh/uv")
        return 1

    # Parse arguments
    test_type = sys.argv[1] if len(sys.argv) > 1 else "unit"
    extra_args = sys.argv[2:] if len(sys.argv) > 2 else []

    # Sync dependencies - always include dev
    # Note: embeddings dependencies are now in main dependencies, not optional
    # Use --extra dev for [project.optional-dependencies].dev (not --dev which is for [dependency-groups])
    sync_cmd = ["uv", "sync", "--extra", "dev"]

    print(f"Syncing dependencies for {test_type} tests...")
    if run_command(sync_cmd, cwd=project_root) != 0:
        return 1

    # Build pytest command - use uv run to ensure correct environment
    if test_type == "unit":
        pytest_args = [
            "tests/unit/",
            "-v",
            "-m",
            "not openai and not embedding_provider",
            "--tb=short",
            "-p",
            "no:logfire",
            "--cache-clear",  # Clear pytest cache before running
        ]
    elif test_type == "embeddings":
        pytest_args = [
            "tests/",
            "-v",
            "-m",
            "local_embeddings",
            "--tb=short",
            "-p",
            "no:logfire",
            "--cache-clear",  # Clear pytest cache before running
        ]
    else:
        pytest_args = []

    pytest_args.extend(extra_args)

    # Use uv run python -m pytest to ensure we use the venv's pytest
    # This is more reliable than uv run pytest which might find system pytest
    pytest_cmd = ["uv", "run", "python", "-m", "pytest", *pytest_args]

    print(f"Running {test_type} tests...")
    return run_command(pytest_cmd, cwd=project_root)


if __name__ == "__main__":
    sys.exit(main())
README.md CHANGED
@@ -1,8 +1,8 @@
  ---
- title: DeepCritical
- emoji: 🧬
- colorFrom: blue
- colorTo: purple
+ title: Critical Deep Research
+ emoji: 🐉
+ colorFrom: red
+ colorTo: yellow
  sdk: gradio
  sdk_version: "6.0.1"
  python_version: "3.11"
@@ -23,178 +23,98 @@ tags:
  - modal
  ---

+ > [!IMPORTANT]
+ > **You are reading the Gradio Demo README!**
+ >
+ > - 📚 **Documentation**: See our [technical documentation](https://deepcritical.github.io/GradioDemo/) for detailed information
+ > - 📖 **Complete README**: Check out the [full README](.github/README.md) for setup, configuration, and contribution guidelines
+ > - 🏆 **Hackathon Submission**: Keep reading below for more information about our MCP Hackathon submission
+
+ <div align="center">
+
+ [![GitHub](https://img.shields.io/github/stars/DeepCritical/GradioDemo?style=for-the-badge&logo=github&logoColor=white&label=🐙%20GitHub&labelColor=181717&color=181717)](https://github.com/DeepCritical/GradioDemo)
+ [![Documentation](https://img.shields.io/badge/📚%20Docs-0080FF?style=for-the-badge&logo=readthedocs&logoColor=white&labelColor=0080FF&color=0080FF)](https://deepcritical.github.io/GradioDemo/)
+ [![Demo](https://img.shields.io/badge/🚀%20Demo-FFD21E?style=for-the-badge&logo=huggingface&logoColor=white&labelColor=FFD21E&color=FFD21E)](https://huggingface.co/spaces/DataQuests/DeepCritical)
+ [![codecov](https://codecov.io/gh/DeepCritical/GradioDemo/graph/badge.svg?token=B1f05RCGpz)](https://codecov.io/gh/DeepCritical/GradioDemo)
+ [![Join us on Discord](https://img.shields.io/discord/1109943800132010065?label=Discord&logo=discord&style=flat-square)](https://discord.gg/qdfnvSPcqP)
+
+ </div>
+
  # DeepCritical

- ## Intro
-
- ## Features
-
- - **Multi-Source Search**: PubMed, ClinicalTrials.gov, bioRxiv/medRxiv
- - **MCP Integration**: Use our tools from Claude Desktop or any MCP client
- - **Modal Sandbox**: Secure execution of AI-generated statistical code
- - **LlamaIndex RAG**: Semantic search and evidence synthesis
- - **HuggingfaceInference**:
- - **HuggingfaceMCP Custom Config To Use Community Tools**:
- - **Strongly Typed Composable Graphs**:
- - **Specialized Research Teams of Agents**:
-
- ## Quick Start
-
- ### 1. Environment Setup
-
- ```bash
- # Install uv if you haven't already
- pip install uv
-
- # Sync dependencies
- uv sync
- ```
-
- ### 2. Run the UI
-
- ```bash
- # Start the Gradio app
- uv run gradio run src/app.py
- ```
-
- Open your browser to `http://localhost:7860`.
-
- ### 3. Connect via MCP
-
- This application exposes a Model Context Protocol (MCP) server, allowing you to use its search tools directly from Claude Desktop or other MCP clients.
-
- **MCP Server URL**: `http://localhost:7860/gradio_api/mcp/`
-
- **Claude Desktop Configuration**:
- Add this to your `claude_desktop_config.json`:
- ```json
- {
- "mcpServers": {
- "deepcritical": {
- "url": "http://localhost:7860/gradio_api/mcp/"
- }
- }
- }
- ```
-
- **Available Tools**:
- - `search_pubmed`: Search peer-reviewed biomedical literature.
- - `search_clinical_trials`: Search ClinicalTrials.gov.
- - `search_biorxiv`: Search bioRxiv/medRxiv preprints.
- - `search_all`: Search all sources simultaneously.
- - `analyze_hypothesis`: Secure statistical analysis using Modal sandboxes.
-
-
- ## Architecture
-
- DeepCritical uses a Vertical Slice Architecture:
-
- 1. **Search Slice**: Retrieving evidence from PubMed, ClinicalTrials.gov, and bioRxiv.
- 2. **Judge Slice**: Evaluating evidence quality using LLMs.
- 3. **Orchestrator Slice**: Managing the research loop and UI.
-
- - iterativeResearch
- - deepResearch
- - researchTeam
-
- ### Iterative Research
-
- sequenceDiagram
- participant IterativeFlow
- participant ThinkingAgent
- participant KnowledgeGapAgent
- participant ToolSelector
- participant ToolExecutor
- participant JudgeHandler
- participant WriterAgent
-
- IterativeFlow->>IterativeFlow: run(query)
-
- loop Until complete or max_iterations
- IterativeFlow->>ThinkingAgent: generate_observations()
- ThinkingAgent-->>IterativeFlow: observations
-
- IterativeFlow->>KnowledgeGapAgent: evaluate_gaps()
- KnowledgeGapAgent-->>IterativeFlow: KnowledgeGapOutput
-
- alt Research complete
- IterativeFlow->>WriterAgent: create_final_report()
- WriterAgent-->>IterativeFlow: final_report
- else Gaps remain
- IterativeFlow->>ToolSelector: select_agents(gap)
- ToolSelector-->>IterativeFlow: AgentSelectionPlan
-
- IterativeFlow->>ToolExecutor: execute_tool_tasks()
- ToolExecutor-->>IterativeFlow: ToolAgentOutput[]
-
- IterativeFlow->>JudgeHandler: assess_evidence()
- JudgeHandler-->>IterativeFlow: should_continue
- end
- end
-
-
- ### Deep Research
-
- sequenceDiagram
- actor User
- participant GraphOrchestrator
- participant InputParser
- participant GraphBuilder
- participant GraphExecutor
- participant Agent
- participant BudgetTracker
- participant WorkflowState
-
- User->>GraphOrchestrator: run(query)
- GraphOrchestrator->>InputParser: detect_research_mode(query)
- InputParser-->>GraphOrchestrator: mode (iterative/deep)
- GraphOrchestrator->>GraphBuilder: build_graph(mode)
- GraphBuilder-->>GraphOrchestrator: ResearchGraph
- GraphOrchestrator->>WorkflowState: init_workflow_state()
- GraphOrchestrator->>BudgetTracker: create_budget()
- GraphOrchestrator->>GraphExecutor: _execute_graph(graph)
-
- loop For each node in graph
- GraphExecutor->>Agent: execute_node(agent_node)
- Agent->>Agent: process_input
- Agent-->>GraphExecutor: result
- GraphExecutor->>WorkflowState: update_state(result)
- GraphExecutor->>BudgetTracker: add_tokens(used)
- GraphExecutor->>BudgetTracker: check_budget()
- alt Budget exceeded
- GraphExecutor->>GraphOrchestrator: emit(error_event)
- else Continue
- GraphExecutor->>GraphOrchestrator: emit(progress_event)
- end
- end
-
- GraphOrchestrator->>User: AsyncGenerator[AgentEvent]
-
- ### Research Team
-
- Critical Deep Research Agent
-
- ## Development
-
- ### Run Tests
-
- ```bash
- uv run pytest
- ```
-
- ### Run Checks
-
- ```bash
- make check
- ```
-
- ## Join Us
-
- - The-Obstacle-Is-The-Way
+ ## About
+
+ The [Deep Critical Gradio Hackathon Team](#team) met online in the Alzheimer's Critical Literature Review Group in the Hugging Science initiative. We're building the agent framework we want to use for AI-assisted research, to [turn the vast amounts of clinical data into cures](https://github.com/DeepCritical/GradioDemo).
+
+ For this hackathon we're proposing a simple yet powerful Deep Research Agent that iteratively searches for the answer until it finds it, using general-purpose web search and special-purpose retrievers for technical sources.
+
+ ## Deep Critical in the Media
+
+ - Social media posts about Deep Critical:
+
+ ## Important information
+
+ - **[readme](.github/README.md)**: configure, deploy, contribute, and learn more here.
+ - **[docs](https://deepcritical.github.io/GradioDemo/)**: want to know how all this works? Read our detailed technical documentation here.
+ - **[demo](https://huggingface.co/spaces/DataQuests/DeepCritical)**: Try our demo on Hugging Face
+ - **[team](#team)**: Join us, or follow us!
+ - **[video]**: See our demo video
+
+ ## Future Developments
+
+ - [ ] Apply Deep Research Systems To Generate Short Form Video (up to 5 minutes)
+ - [ ] Visualize Pydantic Graphs as Loading Screens in the UI
+ - [ ] Improve Data Science with more Complex Graph Agents
+ - [ ] Create Deep Critical Drug Repurposing / Discovery Demo
+ - [ ] Create Deep Critical Literature Review
+ - [ ] Create Deep Critical Hypothesis Generator
+ - [ ] Create PyPI Package
+
+ ## Completed
+
+ - [x] **Multi-Source Search**: PubMed, ClinicalTrials.gov, bioRxiv/medRxiv
+ - [x] **MCP Integration**: Use our tools from Claude Desktop or any MCP client
+ - [x] **HuggingFace OAuth**: Sign in with HuggingFace
+ - [x] **Modal Sandbox**: Secure execution of AI-generated statistical code
+ - [x] **LlamaIndex RAG**: Semantic search and evidence synthesis
+ - [x] **HuggingfaceInference**:
+ - [x] **HuggingfaceMCP Custom Config To Use Community Tools**:
+ - [x] **Strongly Typed Composable Graphs**:
+ - [x] **Specialized Research Teams of Agents**:
+
+ ### Team
+
+ - ZJ
  - MarioAderman
  - Josephrp

+ ## Acknowledgements
+
+ - McSwaggins
+ - Magentic
+ - Huggingface
+ - Gradio
+ - DeepCritical
+ - Sponsors
+ - Microsoft
+ - Pydantic
+ - Llama-index
+ - Anthropic/MCP
+ - List of Tools Makers
+
+
  ## Links

- - [GitHub Repository](https://github.com/The-Obstacle-Is-The-Way/DeepCritical-1)
+ [![GitHub](https://img.shields.io/github/stars/DeepCritical/GradioDemo?style=for-the-badge&logo=github&logoColor=white&label=🐙%20GitHub&labelColor=181717&color=181717)](https://github.com/DeepCritical/GradioDemo)
+ [![Documentation](https://img.shields.io/badge/📚%20Docs-0080FF?style=for-the-badge&logo=readthedocs&logoColor=white&labelColor=0080FF&color=0080FF)](https://deepcritical.github.io/GradioDemo/)
+ [![Demo](https://img.shields.io/badge/🚀%20Demo-FFD21E?style=for-the-badge&logo=huggingface&logoColor=white&labelColor=FFD21E&color=FFD21E)](https://huggingface.co/spaces/DataQuests/DeepCritical)
+ [![codecov](https://codecov.io/gh/DeepCritical/GradioDemo/graph/badge.svg?token=B1f05RCGpz)](https://codecov.io/gh/DeepCritical/GradioDemo)
+ [![Join us on Discord](https://img.shields.io/discord/1109943800132010065?label=Discord&logo=discord&style=flat-square)](https://discord.gg/qdfnvSPcqP)
dev/.cursorrules ADDED
@@ -0,0 +1,241 @@
# DeepCritical Project - Cursor Rules

## Project-Wide Rules

**Architecture**: Multi-agent research system using Pydantic AI for agent orchestration, supporting iterative and deep research patterns. Uses middleware for state management, budget tracking, and workflow coordination.

**Type Safety**: ALWAYS use complete type hints. All functions must have parameter and return type annotations. Use `mypy --strict` compliance. Use `TYPE_CHECKING` imports for circular dependencies: `from typing import TYPE_CHECKING; if TYPE_CHECKING: from src.services.embeddings import EmbeddingService`

**Async Patterns**: ALL I/O operations must be async (`async def`, `await`). Use `asyncio.gather()` for parallel operations. CPU-bound work must use `run_in_executor()`: `loop = asyncio.get_running_loop(); result = await loop.run_in_executor(None, cpu_bound_function, args)`. Never block the event loop.
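As an illustration of this rule, a minimal self-contained sketch (the function names here are hypothetical, not from the codebase):

```python
import asyncio
import hashlib


def expensive_digest(data: bytes) -> str:
    # CPU-bound work: must not run on the event loop directly.
    return hashlib.sha256(data * 10_000).hexdigest()


async def main() -> None:
    loop = asyncio.get_running_loop()
    # Offload CPU-bound work so the event loop stays responsive.
    digest = await loop.run_in_executor(None, expensive_digest, b"evidence")
    # Run independent awaitables concurrently with gather().
    results = await asyncio.gather(
        asyncio.sleep(0.1, result="search-1"),
        asyncio.sleep(0.1, result="search-2"),
    )
    print(digest[:12], results)


if __name__ == "__main__":
    asyncio.run(main())
```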

**Error Handling**: Use custom exceptions from `src/utils/exceptions.py`: `DeepCriticalError`, `SearchError`, `RateLimitError`, `JudgeError`, `ConfigurationError`. Always chain exceptions: `raise SearchError(...) from e`. Log with structlog: `logger.error("Operation failed", error=str(e), context=value)`.
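A sketch of the chaining-plus-logging pattern; the exception classes below are stand-ins for the real ones in `src/utils/exceptions.py` so the snippet stays self-contained:

```python
import structlog

logger = structlog.get_logger()


class DeepCriticalError(Exception):
    """Stand-in for the base class in src/utils/exceptions.py."""


class SearchError(DeepCriticalError):
    """Stand-in for the repo's SearchError."""


def fetch(query: str) -> str:
    try:
        raise TimeoutError("upstream timed out")  # simulated I/O failure
    except TimeoutError as e:
        # Structured log with context, then chain so the traceback keeps its cause.
        logger.error("Operation failed", error=str(e), query=query)
        raise SearchError(f"search failed for {query!r}") from e
```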

**Logging**: Use `structlog` for ALL logging (NOT `print` or `logging`). Import: `import structlog; logger = structlog.get_logger()`. Log with structured data: `logger.info("event", key=value)`. Use appropriate levels: DEBUG, INFO, WARNING, ERROR.

**Pydantic Models**: All data exchange uses Pydantic models from `src/utils/models.py`. Models are frozen (`model_config = {"frozen": True}`) for immutability. Use `Field()` with descriptions. Validate with `ge=`, `le=`, `min_length=`, `max_length=` constraints.
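For example, a frozen model in the style this rule mandates (the field names are illustrative, not the actual `Evidence` schema):

```python
from pydantic import BaseModel, Field


class Evidence(BaseModel):
    """Illustrative frozen model with Field() descriptions and constraints."""

    model_config = {"frozen": True}

    title: str = Field(min_length=1, description="Source title")
    url: str = Field(description="Canonical URL of the source")
    relevance: float = Field(ge=0.0, le=1.0, description="Relevance score")


e = Evidence(title="Example", url="https://example.org", relevance=0.9)
# e.relevance = 0.1  # would raise a ValidationError: the model is frozen
```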

**Code Style**: Ruff with 100-char line length. Ignore rules: `PLR0913` (too many arguments), `PLR0912` (too many branches), `PLR0911` (too many returns), `PLR2004` (magic values), `PLW0603` (global statement), `PLC0415` (lazy imports).

**Docstrings**: Google-style docstrings for all public functions. Include Args, Returns, Raises sections. Use type hints in docstrings only if needed for clarity.

**Testing**: Unit tests in `tests/unit/` (mocked, fast). Integration tests in `tests/integration/` (real APIs, marked `@pytest.mark.integration`). Use `respx` for httpx mocking, `pytest-mock` for general mocking.

**State Management**: Use `ContextVar` in middleware for thread-safe isolation. Never use global mutable state (except singletons via `@lru_cache`). Use `WorkflowState` from `src/middleware/state_machine.py` for workflow state.
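A minimal sketch of the `ContextVar` pattern with auto-initialization; `WorkflowState` is simplified here, not the real class from `src/middleware/state_machine.py`:

```python
from contextvars import ContextVar
from dataclasses import dataclass, field


@dataclass
class WorkflowState:
    """Simplified stand-in for the middleware's workflow state."""

    evidence: list[str] = field(default_factory=list)


_state: ContextVar[WorkflowState | None] = ContextVar("workflow_state", default=None)


def get_workflow_state() -> WorkflowState:
    # Auto-initialize on first access, per the rule above; each asyncio
    # task or thread sees its own value, giving thread-safe isolation.
    state = _state.get()
    if state is None:
        state = WorkflowState()
        _state.set(state)
    return state
```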

**Citation Validation**: ALWAYS validate references before returning reports. Use `validate_references()` from `src/utils/citation_validator.py`. Remove hallucinated citations. Log warnings for removed citations.

---

## src/agents/ - Agent Implementation Rules

**Pattern**: All agents use Pydantic AI `Agent` class. Agents have structured output types (Pydantic models) or return strings. Use factory functions in `src/agent_factory/agents.py` for creation.

**Agent Structure**:
- System prompt as module-level constant (with date injection: `datetime.now().strftime("%Y-%m-%d")`)
- Agent class with `__init__(model: Any | None = None)`
- Main method (e.g., `async def evaluate()`, `async def write_report()`)
- Factory function: `def create_agent_name(model: Any | None = None) -> AgentName`
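A skeleton of that four-part structure; `ExampleAgent` is hypothetical and the Pydantic AI internals are elided, so this shows only the shape the rule prescribes:

```python
from datetime import datetime
from typing import Any

# System prompt as a module-level constant, with date injection.
SYSTEM_PROMPT = f"You are a research agent. Today is {datetime.now().strftime('%Y-%m-%d')}."


class ExampleAgent:
    """Hypothetical agent following the structure above."""

    def __init__(self, model: Any | None = None) -> None:
        # In the real code, get_model() from src/agent_factory/judges.py
        # supplies a default model when none is provided.
        self.model = model

    async def evaluate(self, query: str) -> str:
        # Real agents delegate to a Pydantic AI Agent here and return a
        # structured output model or a string.
        return f"evaluated: {query}"


def create_example_agent(model: Any | None = None) -> ExampleAgent:
    return ExampleAgent(model)
```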

**Model Initialization**: Use `get_model()` from `src/agent_factory/judges.py` if no model provided. Support OpenAI/Anthropic/HF Inference via settings.

**Error Handling**: Return fallback values (e.g., `KnowledgeGapOutput(research_complete=False, outstanding_gaps=[...])`) on failure. Log errors with context. Use retry logic (3 retries) in Pydantic AI Agent initialization.

**Input Validation**: Validate query/inputs are not empty. Truncate very long inputs with warnings. Handle None values gracefully.

**Output Types**: Use structured output types from `src/utils/models.py` (e.g., `KnowledgeGapOutput`, `AgentSelectionPlan`, `ReportDraft`). For text output (writer agents), return `str` directly.

**Agent-Specific Rules**:
- `knowledge_gap.py`: Outputs `KnowledgeGapOutput`. Evaluates research completeness.
- `tool_selector.py`: Outputs `AgentSelectionPlan`. Selects tools (RAG/web/database).
- `writer.py`: Returns markdown string. Includes citations in numbered format.
- `long_writer.py`: Uses `ReportDraft` input/output. Handles section-by-section writing.
- `proofreader.py`: Takes `ReportDraft`, returns polished markdown.
- `thinking.py`: Returns observation string from conversation history.
- `input_parser.py`: Outputs `ParsedQuery` with research mode detection.

---

## src/tools/ - Search Tool Rules

**Protocol**: All tools implement `SearchTool` protocol from `src/tools/base.py`: `name` property and `async def search(query, max_results) -> list[Evidence]`.
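A sketch of that protocol using `typing.Protocol`; the `Evidence` class is a stand-in for the real model in `src/utils/models.py`:

```python
from typing import Protocol


class Evidence:
    """Stand-in for the Evidence model in src/utils/models.py."""


class SearchTool(Protocol):
    """Structural type every search tool must satisfy."""

    @property
    def name(self) -> str: ...

    async def search(self, query: str, max_results: int) -> list[Evidence]: ...
```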

**Rate Limiting**: Use `@retry` decorator from tenacity: `@retry(stop=stop_after_attempt(3), wait=wait_exponential(...))`. Implement `_rate_limit()` method for APIs with limits. Use shared rate limiters from `src/tools/rate_limiter.py`.
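A minimal sketch of the tenacity decorator on an async call; `fetch_page` is illustrative, and the exact `wait_exponential` parameters are assumptions rather than the repo's values:

```python
import asyncio

from tenacity import retry, stop_after_attempt, wait_exponential


@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=1, max=10))
async def fetch_page(url: str) -> str:
    # The real tools wrap an HTTP request here; tenacity re-invokes the
    # coroutine with exponential backoff when it raises.
    await asyncio.sleep(0.01)
    return f"payload from {url}"
```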

**Error Handling**: Raise `SearchError` or `RateLimitError` on failures. Handle HTTP errors (429, 500, timeout). Return empty list on non-critical errors (log warning).

**Query Preprocessing**: Use `preprocess_query()` from `src/tools/query_utils.py` to remove noise and expand synonyms.

**Evidence Conversion**: Convert API responses to `Evidence` objects with `Citation`. Extract metadata (title, url, date, authors). Set relevance scores (0.0-1.0). Handle missing fields gracefully.

**Tool-Specific Rules**:
- `pubmed.py`: Use NCBI E-utilities (ESearch → EFetch). Rate limit: 0.34s between requests. Parse XML with `xmltodict`. Handle single vs. multiple articles.
- `clinicaltrials.py`: Use `requests` library (NOT httpx - WAF blocks httpx). Run in thread pool: `await asyncio.to_thread(requests.get, ...)`. Filter: Only interventional studies, active/completed.
- `europepmc.py`: Handle preprint markers: `[PREPRINT - Not peer-reviewed]`. Build URLs from DOI or PMID.
- `rag_tool.py`: Wraps `LlamaIndexRAGService`. Returns Evidence from RAG results. Handles ingestion.
- `search_handler.py`: Orchestrates parallel searches across multiple tools. Uses `asyncio.gather()` with `return_exceptions=True`. Aggregates results into `SearchResult` (see the sketch after this list).
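A sketch of that fan-out pattern, where one failing tool must not sink the rest; the function name and `print` fallback are illustrative (the real handler logs via structlog and builds a `SearchResult`):

```python
import asyncio
from typing import Any


async def search_all(tools: list[Any], query: str, max_results: int = 10) -> list[Any]:
    """Run every tool concurrently and keep going past individual failures."""
    outcomes = await asyncio.gather(
        *(tool.search(query, max_results) for tool in tools),
        return_exceptions=True,  # failures come back as exception objects
    )
    evidence: list[Any] = []
    for tool, outcome in zip(tools, outcomes):
        if isinstance(outcome, Exception):
            print(f"{tool.name} failed: {outcome}")  # real code: structlog warning
            continue
        evidence.extend(outcome)
    return evidence
```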

---

## src/middleware/ - Middleware Rules

**State Management**: Use `ContextVar` for thread-safe isolation. `WorkflowState` uses `ContextVar[WorkflowState | None]`. Initialize with `init_workflow_state(embedding_service)`. Access with `get_workflow_state()` (auto-initializes if missing).

**WorkflowState**: Tracks `evidence: list[Evidence]`, `conversation: Conversation`, `embedding_service: Any`. Methods: `add_evidence()` (deduplicates by URL), `async search_related()` (semantic search).

**WorkflowManager**: Manages parallel research loops. Methods: `add_loop()`, `run_loops_parallel()`, `update_loop_status()`, `sync_loop_evidence_to_state()`. Uses `asyncio.gather()` for parallel execution. Handles errors per loop (don't fail all if one fails).

**BudgetTracker**: Tracks tokens, time, iterations per loop and globally. Methods: `create_budget()`, `add_tokens()`, `start_timer()`, `update_timer()`, `increment_iteration()`, `check_budget()`, `can_continue()`. Token estimation: `estimate_tokens(text)` (~4 chars per token), `estimate_llm_call_tokens(prompt, response)`.
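The token-estimation heuristic above reduces to a couple of lines; a sketch assuming the ~4-characters-per-token rule stated here (the `max(1, ...)` floor is an assumption):

```python
def estimate_tokens(text: str) -> int:
    """Heuristic from the rule above: roughly 4 characters per token."""
    return max(1, len(text) // 4)


def estimate_llm_call_tokens(prompt: str, response: str) -> int:
    return estimate_tokens(prompt) + estimate_tokens(response)
```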

**Models**: All middleware models in `src/utils/models.py`. `IterationData`, `Conversation`, `ResearchLoop`, `BudgetStatus` are used by middleware.

---

## src/orchestrator/ - Orchestration Rules

**Research Flows**: Two patterns: `IterativeResearchFlow` (single loop) and `DeepResearchFlow` (plan → parallel loops → synthesis). Both support agent chains (`use_graph=False`) and graph execution (`use_graph=True`).

**IterativeResearchFlow**: Pattern: Generate observations → Evaluate gaps → Select tools → Execute → Judge → Continue/Complete. Uses `KnowledgeGapAgent`, `ToolSelectorAgent`, `ThinkingAgent`, `WriterAgent`, `JudgeHandler`. Tracks iterations, time, budget.

**DeepResearchFlow**: Pattern: Planner → Parallel iterative loops per section → Synthesizer. Uses `PlannerAgent`, `IterativeResearchFlow` (per section), `LongWriterAgent` or `ProofreaderAgent`. Uses `WorkflowManager` for parallel execution.

**Graph Orchestrator**: Uses Pydantic AI Graphs (when available) or agent chains (fallback). Routes based on research mode (iterative/deep/auto). Streams `AgentEvent` objects for UI.

**State Initialization**: Always call `init_workflow_state()` before running flows. Initialize `BudgetTracker` per loop. Use `WorkflowManager` for parallel coordination.

**Event Streaming**: Yield `AgentEvent` objects during execution. Event types: "started", "search_complete", "judge_complete", "hypothesizing", "synthesizing", "complete", "error". Include iteration numbers and data payloads.
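A sketch of the async-generator shape this rule implies; `AgentEvent` is simplified here, not the real model:

```python
import asyncio
from collections.abc import AsyncGenerator
from dataclasses import dataclass, field
from typing import Any


@dataclass
class AgentEvent:
    """Simplified stand-in for the event model streamed to the UI."""

    type: str
    iteration: int = 0
    data: dict[str, Any] = field(default_factory=dict)


async def run(query: str) -> AsyncGenerator[AgentEvent, None]:
    yield AgentEvent(type="started", data={"query": query})
    for i in range(1, 3):
        await asyncio.sleep(0)  # real research work happens here
        yield AgentEvent(type="search_complete", iteration=i)
    yield AgentEvent(type="complete")
```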

---

## src/services/ - Service Rules

**EmbeddingService**: Local sentence-transformers (NO API key required). All operations async-safe via `run_in_executor()`. ChromaDB for vector storage. Deduplication threshold: 0.85 (85% similarity = duplicate).

**LlamaIndexRAGService**: Uses OpenAI embeddings (requires `OPENAI_API_KEY`). Methods: `ingest_evidence()`, `retrieve()`, `query()`. Returns documents with metadata (source, title, url, date, authors). Lazy initialization with graceful fallback.

**StatisticalAnalyzer**: Generates Python code via LLM. Executes in Modal sandbox (secure, isolated). Library versions pinned in `SANDBOX_LIBRARIES` dict. Returns `AnalysisResult` with verdict (SUPPORTED/REFUTED/INCONCLUSIVE).

**Singleton Pattern**: Use `@lru_cache(maxsize=1)` for singletons: `@lru_cache(maxsize=1); def get_service() -> Service: return Service()`. Lazy initialization to avoid requiring dependencies at import time.

---

## src/utils/ - Utility Rules

**Models**: All Pydantic models in `src/utils/models.py`. Use frozen models (`model_config = {"frozen": True}`) except where mutation needed. Use `Field()` with descriptions. Validate with constraints.

**Config**: Settings via Pydantic Settings (`src/utils/config.py`). Load from `.env` automatically. Use `settings` singleton: `from src.utils.config import settings`. Validate API keys with properties: `has_openai_key`, `has_anthropic_key`.

**Exceptions**: Custom exception hierarchy in `src/utils/exceptions.py`. Base: `DeepCriticalError`. Specific: `SearchError`, `RateLimitError`, `JudgeError`, `ConfigurationError`. Always chain exceptions.

**LLM Factory**: Centralized LLM model creation in `src/utils/llm_factory.py`. Supports OpenAI, Anthropic, HF Inference. Use `get_model()` or factory functions. Check requirements before initialization.

**Citation Validator**: Use `validate_references()` from `src/utils/citation_validator.py`. Removes hallucinated citations (URLs not in evidence). Logs warnings. Returns validated report string.

---

## src/orchestrator_factory.py Rules

**Purpose**: Factory for creating orchestrators. Supports "simple" (legacy) and "advanced" (magentic) modes. Auto-detects mode based on API key availability.

**Pattern**: Lazy import for optional dependencies (`_get_magentic_orchestrator_class()`). Handles `ImportError` gracefully with clear error messages.

**Mode Detection**: `_determine_mode()` checks explicit mode or auto-detects: "advanced" if `settings.has_openai_key`, else "simple". Maps "magentic" → "advanced".
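The detection logic described above fits in a few lines; a sketch with an assumed signature (the real `_determine_mode()` reads `settings` rather than taking a flag):

```python
def _determine_mode(explicit_mode: str | None, has_openai_key: bool) -> str:
    """Resolve the orchestrator mode, mirroring the rule above."""
    if explicit_mode == "magentic":
        return "advanced"  # alias mapping
    if explicit_mode in ("simple", "advanced"):
        return explicit_mode
    # Auto-detect: advanced needs an OpenAI key, otherwise fall back.
    return "advanced" if has_openai_key else "simple"
```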

**Function Signature**: `create_orchestrator(search_handler, judge_handler, config, mode) -> Any`. Simple mode requires handlers. Advanced mode uses MagenticOrchestrator.

**Error Handling**: Raise `ValueError` with clear messages if requirements not met. Log mode selection with structlog.

---

## src/orchestrator_hierarchical.py Rules

**Purpose**: Hierarchical orchestrator using middleware and sub-teams. Adapts Magentic ChatAgent to SubIterationTeam protocol.

**Pattern**: Uses `SubIterationMiddleware` with `ResearchTeam` and `LLMSubIterationJudge`. Event-driven via callback queue.

**State Initialization**: Initialize embedding service with graceful fallback. Use `init_magentic_state()` (deprecated, but kept for compatibility).

**Event Streaming**: Uses `asyncio.Queue` for event coordination. Yields `AgentEvent` objects. Handles event callback pattern with `asyncio.wait()`.

**Error Handling**: Log errors with context. Yield error events. Process remaining events after task completion.

---

## src/orchestrator_magentic.py Rules

**Purpose**: Magentic-based orchestrator using ChatAgent pattern. Each agent has internal LLM. Manager orchestrates agents.

**Pattern**: Uses `MagenticBuilder` with participants (searcher, hypothesizer, judge, reporter). Manager uses `OpenAIChatClient`. Workflow built in `_build_workflow()`.

**Event Processing**: `_process_event()` converts Magentic events to `AgentEvent`. Handles: `MagenticOrchestratorMessageEvent`, `MagenticAgentMessageEvent`, `MagenticFinalResultEvent`, `MagenticAgentDeltaEvent`, `WorkflowOutputEvent`.

**Text Extraction**: `_extract_text()` defensively extracts text from messages. Priority: `.content` → `.text` → `str(message)`. Handles buggy message objects.

**State Initialization**: Initialize embedding service with graceful fallback. Use `init_magentic_state()` (deprecated).

**Requirements**: Must call `check_magentic_requirements()` in `__init__`. Requires `agent-framework-core` and OpenAI API key.

**Event Types**: Maps agent names to event types: "search" → "search_complete", "judge" → "judge_complete", "hypothes" → "hypothesizing", "report" → "synthesizing".

---

## src/agent_factory/ - Factory Rules

**Pattern**: Factory functions for creating agents and handlers. Lazy initialization for optional dependencies. Support OpenAI/Anthropic/HF Inference.

**Judges**: `create_judge_handler()` creates `JudgeHandler` with structured output (`JudgeAssessment`). Supports `MockJudgeHandler`, `HFInferenceJudgeHandler` as fallbacks.

**Agents**: Factory functions in `agents.py` for all Pydantic AI agents. Pattern: `create_agent_name(model: Any | None = None) -> AgentName`. Use `get_model()` if model not provided.

**Graph Builder**: `graph_builder.py` contains utilities for building research graphs. Supports iterative and deep research graph construction.

**Error Handling**: Raise `ConfigurationError` if required API keys missing. Log agent creation. Handle import errors gracefully.

---

## src/prompts/ - Prompt Rules

**Pattern**: System prompts stored as module-level constants. Include date injection: `datetime.now().strftime("%Y-%m-%d")`. Format evidence with truncation (1500 chars per item).

**Judge Prompts**: In `judge.py`. Handle empty evidence case separately. Always request structured JSON output.

**Hypothesis Prompts**: In `hypothesis.py`. Use diverse evidence selection (MMR algorithm). Sentence-aware truncation.

**Report Prompts**: In `report.py`. Include full citation details. Use diverse evidence selection (n=20). Emphasize citation validation rules.

---

## Testing Rules

**Structure**: Unit tests in `tests/unit/` (mocked, fast). Integration tests in `tests/integration/` (real APIs, marked `@pytest.mark.integration`).

**Mocking**: Use `respx` for httpx mocking. Use `pytest-mock` for general mocking. Mock LLM calls in unit tests (use `MockJudgeHandler`).

**Fixtures**: Common fixtures in `tests/conftest.py`: `mock_httpx_client`, `mock_llm_response`.

**Coverage**: Aim for >80% coverage. Test error handling, edge cases, and integration paths.

---

## File-Specific Agent Rules

**knowledge_gap.py**: Outputs `KnowledgeGapOutput`. System prompt evaluates research completeness. Handles conversation history. Returns fallback on error.

**writer.py**: Returns markdown string. System prompt includes citation format examples. Validates inputs. Truncates long findings. Retry logic for transient failures.

**long_writer.py**: Uses `ReportDraft` input/output. Writes sections iteratively. Reformats references (deduplicates, renumbers). Reformats section headings.

**proofreader.py**: Takes `ReportDraft`, returns polished markdown. Removes duplicates. Adds summary. Preserves references.

**tool_selector.py**: Outputs `AgentSelectionPlan`. System prompt lists available agents (WebSearchAgent, SiteCrawlerAgent, RAGAgent). Guidelines for when to use each.

**thinking.py**: Returns observation string. Generates observations from conversation history. Uses query and background context.

**input_parser.py**: Outputs `ParsedQuery`. Detects research mode (iterative/deep). Extracts entities and research questions. Improves/refines query.
dev/AGENTS.txt ADDED
@@ -0,0 +1,236 @@
1
+ # DeepCritical Project - Rules
2
+
3
+ ## Project-Wide Rules
4
+
5
+ **Architecture**: Multi-agent research system using Pydantic AI for agent orchestration, supporting iterative and deep research patterns. Uses middleware for state management, budget tracking, and workflow coordination.
6
+
7
+ **Type Safety**: ALWAYS use complete type hints. All functions must have parameter and return type annotations. Use `mypy --strict` compliance. Use `TYPE_CHECKING` imports for circular dependencies: `from typing import TYPE_CHECKING; if TYPE_CHECKING: from src.services.embeddings import EmbeddingService`
8
+
9
+ **Async Patterns**: ALL I/O operations must be async (`async def`, `await`). Use `asyncio.gather()` for parallel operations. CPU-bound work must use `run_in_executor()`: `loop = asyncio.get_running_loop(); result = await loop.run_in_executor(None, cpu_bound_function, args)`. Never block the event loop.
10
+
11
+ **Error Handling**: Use custom exceptions from `src/utils/exceptions.py`: `DeepCriticalError`, `SearchError`, `RateLimitError`, `JudgeError`, `ConfigurationError`. Always chain exceptions: `raise SearchError(...) from e`. Log with structlog: `logger.error("Operation failed", error=str(e), context=value)`.
12
+
13
+ **Logging**: Use `structlog` for ALL logging (NOT `print` or `logging`). Import: `import structlog; logger = structlog.get_logger()`. Log with structured data: `logger.info("event", key=value)`. Use appropriate levels: DEBUG, INFO, WARNING, ERROR.
14
+
15
+ **Pydantic Models**: All data exchange uses Pydantic models from `src/utils/models.py`. Models are frozen (`model_config = {"frozen": True}`) for immutability. Use `Field()` with descriptions. Validate with `ge=`, `le=`, `min_length=`, `max_length=` constraints.
16
+
17
+ **Code Style**: Ruff with 100-char line length. Ignore rules: `PLR0913` (too many arguments), `PLR0912` (too many branches), `PLR0911` (too many returns), `PLR2004` (magic values), `PLW0603` (global statement), `PLC0415` (lazy imports).
18
+
19
+ **Docstrings**: Google-style docstrings for all public functions. Include Args, Returns, Raises sections. Use type hints in docstrings only if needed for clarity.
20
+
21
+ **Testing**: Unit tests in `tests/unit/` (mocked, fast). Integration tests in `tests/integration/` (real APIs, marked `@pytest.mark.integration`). Use `respx` for httpx mocking, `pytest-mock` for general mocking.
22
+
23
+ **State Management**: Use `ContextVar` in middleware for thread-safe isolation. Never use global mutable state (except singletons via `@lru_cache`). Use `WorkflowState` from `src/middleware/state_machine.py` for workflow state.
24
+
25
+ **Citation Validation**: ALWAYS validate references before returning reports. Use `validate_references()` from `src/utils/citation_validator.py`. Remove hallucinated citations. Log warnings for removed citations.
26
+
27
+ ---
28
+
29
+ ## src/agents/ - Agent Implementation Rules
30
+
31
+ **Pattern**: All agents use Pydantic AI `Agent` class. Agents have structured output types (Pydantic models) or return strings. Use factory functions in `src/agent_factory/agents.py` for creation.
32
+
33
+ **Agent Structure**:
34
+ - System prompt as module-level constant (with date injection: `datetime.now().strftime("%Y-%m-%d")`)
35
+ - Agent class with `__init__(model: Any | None = None)`
36
+ - Main method (e.g., `async def evaluate()`, `async def write_report()`)
37
+ - Factory function: `def create_agent_name(model: Any | None = None) -> AgentName`
38
+
39
+ **Model Initialization**: Use `get_model()` from `src/agent_factory/judges.py` if no model provided. Support OpenAI/Anthropic/HF Inference via settings.
40
+
41
+ **Error Handling**: Return fallback values (e.g., `KnowledgeGapOutput(research_complete=False, outstanding_gaps=[...])`) on failure. Log errors with context. Use retry logic (3 retries) in Pydantic AI Agent initialization.
42
+
43
+ **Input Validation**: Validate query/inputs are not empty. Truncate very long inputs with warnings. Handle None values gracefully.
44
+
45
+ **Output Types**: Use structured output types from `src/utils/models.py` (e.g., `KnowledgeGapOutput`, `AgentSelectionPlan`, `ReportDraft`). For text output (writer agents), return `str` directly.
46
+
47
+ **Agent-Specific Rules**:
48
+ - `knowledge_gap.py`: Outputs `KnowledgeGapOutput`. Evaluates research completeness.
49
+ - `tool_selector.py`: Outputs `AgentSelectionPlan`. Selects tools (RAG/web/database).
50
+ - `writer.py`: Returns markdown string. Includes citations in numbered format.
51
+ - `long_writer.py`: Uses `ReportDraft` input/output. Handles section-by-section writing.
52
+ - `proofreader.py`: Takes `ReportDraft`, returns polished markdown.
53
+ - `thinking.py`: Returns observation string from conversation history.
54
+ - `input_parser.py`: Outputs `ParsedQuery` with research mode detection.
55
+
56
+ ---
57
+
58
+ ## src/tools/ - Search Tool Rules
59
+
60
+ **Protocol**: All tools implement `SearchTool` protocol from `src/tools/base.py`: `name` property and `async def search(query, max_results) -> list[Evidence]`.
61
+
62
**Rate Limiting**: Use the `@retry` decorator from tenacity: `@retry(stop=stop_after_attempt(3), wait=wait_exponential(...))`. Implement a `_rate_limit()` method for APIs with limits. Use the shared rate limiters from `src/tools/rate_limiter.py`.
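A sketch of both halves on a hypothetical tool class; the 0.34s interval matches the PubMed rule below, and the backoff parameters are illustrative:

```python
import asyncio
import time

from tenacity import retry, stop_after_attempt, wait_exponential


class ExampleTool:
    _MIN_INTERVAL = 0.34  # ~3 requests/second

    def __init__(self) -> None:
        self._last_request = 0.0

    async def _rate_limit(self) -> None:
        # Keep at least _MIN_INTERVAL between consecutive requests.
        wait = self._MIN_INTERVAL - (time.monotonic() - self._last_request)
        if wait > 0:
            await asyncio.sleep(wait)
        self._last_request = time.monotonic()

    @retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=1, max=10))
    async def _fetch(self, url: str) -> str:
        await self._rate_limit()
        ...  # issue the HTTP request here
```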
**Error Handling**: Raise `SearchError` or `RateLimitError` on failure. Handle HTTP errors (429, 500, timeouts). Return an empty list on non-critical errors (log a warning).

**Query Preprocessing**: Use `preprocess_query()` from `src/tools/query_utils.py` to remove noise and expand synonyms.

**Evidence Conversion**: Convert API responses to `Evidence` objects with a `Citation`. Extract metadata (title, url, date, authors). Set relevance scores (0.0-1.0). Handle missing fields gracefully.
**Tool-Specific Rules**:
- `pubmed.py`: Use NCBI E-utilities (ESearch → EFetch). Rate limit: 0.34s between requests. Parse XML with `xmltodict`. Handle single vs. multiple articles.
- `clinicaltrials.py`: Use the `requests` library (NOT httpx - the WAF blocks httpx). Run in a thread pool: `await asyncio.to_thread(requests.get, ...)`. Filter: only interventional studies, active/completed.
- `europepmc.py`: Handle preprint markers: `[PREPRINT - Not peer-reviewed]`. Build URLs from DOI or PMID.
- `rag_tool.py`: Wraps `LlamaIndexRAGService`. Returns Evidence from RAG results. Handles ingestion.
- `search_handler.py`: Orchestrates parallel searches across multiple tools. Uses `asyncio.gather()` with `return_exceptions=True` (see the sketch below). Aggregates results into a `SearchResult`.
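A minimal sketch of that fan-out, with logging reduced to `print` and the tool/evidence types left loose:

```python
import asyncio


async def search_all(tools, query: str, max_results: int = 10) -> list:
    """Run every tool in parallel; one failing tool must not sink the rest."""
    results = await asyncio.gather(
        *(tool.search(query, max_results) for tool in tools),
        return_exceptions=True,
    )
    evidence: list = []
    for tool, result in zip(tools, results):
        if isinstance(result, Exception):
            print(f"{tool.name} failed: {result}")  # the real handler logs a warning
        else:
            evidence.extend(result)
    return evidence
```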
---

## src/middleware/ - Middleware Rules

**State Management**: Use `ContextVar` for thread-safe isolation. `WorkflowState` lives in a `ContextVar[WorkflowState | None]`. Initialize with `init_workflow_state(embedding_service)`. Access with `get_workflow_state()` (auto-initializes if missing).
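The shape of that pattern, sketched with an assumed default constructor for `WorkflowState`:

```python
from contextvars import ContextVar

from src.middleware.state_machine import WorkflowState

_workflow_state: ContextVar[WorkflowState | None] = ContextVar(
    "workflow_state", default=None
)


def get_workflow_state() -> WorkflowState:
    state = _workflow_state.get()
    if state is None:
        state = WorkflowState()  # assumed default construction
        _workflow_state.set(state)
    return state
```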
**WorkflowState**: Tracks `evidence: list[Evidence]`, `conversation: Conversation`, `embedding_service: Any`. Methods: `add_evidence()` (deduplicates by URL), `async search_related()` (semantic search).

**WorkflowManager**: Manages parallel research loops. Methods: `add_loop()`, `run_loops_parallel()`, `update_loop_status()`, `sync_loop_evidence_to_state()`. Uses `asyncio.gather()` for parallel execution. Handles errors per loop (one failing loop must not fail the rest).
**BudgetTracker**: Tracks tokens, time, and iterations per loop and globally. Methods: `create_budget()`, `add_tokens()`, `start_timer()`, `update_timer()`, `increment_iteration()`, `check_budget()`, `can_continue()`. Token estimation: `estimate_tokens(text)` (~4 chars per token) and `estimate_llm_call_tokens(prompt, response)`.
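The estimation helpers are simple heuristics; a sketch consistent with the ~4-chars-per-token rule:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: about 4 characters per token for English text.
    return max(1, len(text) // 4)


def estimate_llm_call_tokens(prompt: str, response: str) -> int:
    return estimate_tokens(prompt) + estimate_tokens(response)
```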
**Models**: All middleware models are in `src/utils/models.py`. `IterationData`, `Conversation`, `ResearchLoop`, `BudgetStatus` are used by middleware.

---
## src/orchestrator/ - Orchestration Rules

**Research Flows**: Two patterns: `IterativeResearchFlow` (single loop) and `DeepResearchFlow` (plan → parallel loops → synthesis). Both support agent chains (`use_graph=False`) and graph execution (`use_graph=True`).

**IterativeResearchFlow**: Pattern: generate observations → evaluate gaps → select tools → execute → judge → continue/complete. Uses `KnowledgeGapAgent`, `ToolSelectorAgent`, `ThinkingAgent`, `WriterAgent`, `JudgeHandler`. Tracks iterations, time, and budget.

**DeepResearchFlow**: Pattern: planner → parallel iterative loops per section → synthesizer. Uses `PlannerAgent`, `IterativeResearchFlow` (per section), and `LongWriterAgent` or `ProofreaderAgent`. Uses `WorkflowManager` for parallel execution.

**Graph Orchestrator**: Uses Pydantic AI Graphs (when available) or agent chains (fallback). Routes based on research mode (iterative/deep/auto). Streams `AgentEvent` objects for the UI.

**State Initialization**: Always call `init_workflow_state()` before running flows. Initialize a `BudgetTracker` per loop. Use `WorkflowManager` for parallel coordination.

**Event Streaming**: Yield `AgentEvent` objects during execution. Event types: "started", "search_complete", "judge_complete", "hypothesizing", "synthesizing", "complete", "error". Include iteration numbers and data payloads.
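A sketch of the streaming shape; the `AgentEvent` constructor fields and import path are assumptions based on the event types listed above:

```python
from collections.abc import AsyncIterator

from src.utils.models import AgentEvent  # assumed import path


async def stream_research(query: str) -> AsyncIterator[AgentEvent]:
    yield AgentEvent(type="started", iteration=0, data={"query": query})
    # ... search, judge, and synthesis steps would yield their own events ...
    yield AgentEvent(type="complete", iteration=1, data={"report": "..."})
```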
---

## src/services/ - Service Rules

**EmbeddingService**: Local sentence-transformers (NO API key required). All operations are async-safe via `run_in_executor()`. ChromaDB for vector storage. Deduplication threshold: 0.85 (85% similarity = duplicate).

**LlamaIndexRAGService**: Uses OpenAI embeddings (requires `OPENAI_API_KEY`). Methods: `ingest_evidence()`, `retrieve()`, `query()`. Returns documents with metadata (source, title, url, date, authors). Lazy initialization with graceful fallback.

**StatisticalAnalyzer**: Generates Python code via LLM. Executes it in a Modal sandbox (secure, isolated). Library versions are pinned in the `SANDBOX_LIBRARIES` dict. Returns an `AnalysisResult` with a verdict (SUPPORTED/REFUTED/INCONCLUSIVE).

**Singleton Pattern**: Use `@lru_cache(maxsize=1)` for singletons, as in the sketch below. Lazy initialization avoids requiring dependencies at import time.
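A minimal sketch (the service name and module path are illustrative):

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def get_embedding_service():
    # Import inside the factory so the dependency is only needed on first use.
    from src.services.embeddings import EmbeddingService  # assumed path

    return EmbeddingService()
```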
---

## src/utils/ - Utility Rules

**Models**: All Pydantic models are in `src/utils/models.py`. Use frozen models (`model_config = {"frozen": True}`) except where mutation is needed. Use `Field()` with descriptions. Validate with constraints.

**Config**: Settings via Pydantic Settings (`src/utils/config.py`). Loads from `.env` automatically. Use the `settings` singleton: `from src.utils.config import settings`. Validate API keys with properties: `has_openai_key`, `has_anthropic_key`.
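Typical call-site usage of those properties (the mode strings mirror the factory rules later in this file):

```python
from src.utils.config import settings

mode = "advanced" if settings.has_openai_key else "simple"
print(f"Selected orchestrator mode: {mode}")
```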
**Exceptions**: Custom exception hierarchy in `src/utils/exceptions.py`. Base: `DeepCriticalError`. Specific: `SearchError`, `RateLimitError`, `JudgeError`, `ConfigurationError`. Always chain exceptions.
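Chaining keeps the original cause in the traceback; a sketch:

```python
import httpx

from src.utils.exceptions import SearchError


async def fetch(url: str) -> httpx.Response:
    try:
        async with httpx.AsyncClient() as client:
            response = await client.get(url)
            response.raise_for_status()
            return response
    except httpx.HTTPError as e:
        # `from e` preserves the underlying HTTP error for debugging.
        raise SearchError(f"search request failed: {url}") from e
```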
**LLM Factory**: Centralized LLM model creation in `src/utils/llm_factory.py`. Supports OpenAI, Anthropic, and HF Inference. Use `get_model()` or the factory functions. Check requirements before initialization.

**Citation Validator**: Use `validate_references()` from `src/utils/citation_validator.py`. Removes hallucinated citations (URLs not present in the evidence). Logs warnings. Returns the validated report string.

---
## src/orchestrator_factory.py Rules

**Purpose**: Factory for creating orchestrators. Supports "simple" (legacy) and "advanced" (magentic) modes. Auto-detects the mode based on API key availability.

**Pattern**: Lazy import for optional dependencies (`_get_magentic_orchestrator_class()`). Handles `ImportError` gracefully with clear error messages.

**Mode Detection**: `_determine_mode()` checks the explicit mode or auto-detects: "advanced" if `settings.has_openai_key`, else "simple". Maps "magentic" → "advanced".

**Function Signature**: `create_orchestrator(search_handler, judge_handler, config, mode) -> Any`. Simple mode requires handlers. Advanced mode uses `MagenticOrchestrator`.

**Error Handling**: Raise `ValueError` with a clear message if requirements are not met. Log the mode selection with structlog.

---
+
149
## src/orchestrator_hierarchical.py Rules

**Purpose**: Hierarchical orchestrator using middleware and sub-teams. Adapts the Magentic `ChatAgent` to the `SubIterationTeam` protocol.

**Pattern**: Uses `SubIterationMiddleware` with `ResearchTeam` and `LLMSubIterationJudge`. Event-driven via a callback queue.

**State Initialization**: Initialize the embedding service with a graceful fallback. Uses `init_magentic_state()` (deprecated, but kept for compatibility).

**Event Streaming**: Uses an `asyncio.Queue` for event coordination. Yields `AgentEvent` objects. Handles the event-callback pattern with `asyncio.wait()`.

**Error Handling**: Log errors with context. Yield error events. Process remaining events after task completion.

---
## src/orchestrator_magentic.py Rules

**Purpose**: Magentic-based orchestrator using the ChatAgent pattern. Each agent has an internal LLM. A manager orchestrates the agents.

**Pattern**: Uses `MagenticBuilder` with participants (searcher, hypothesizer, judge, reporter). The manager uses `OpenAIChatClient`. The workflow is built in `_build_workflow()`.

**Event Processing**: `_process_event()` converts Magentic events to `AgentEvent`. Handles: `MagenticOrchestratorMessageEvent`, `MagenticAgentMessageEvent`, `MagenticFinalResultEvent`, `MagenticAgentDeltaEvent`, `WorkflowOutputEvent`.

**Text Extraction**: `_extract_text()` defensively extracts text from messages. Priority: `.content` → `.text` → `str(message)`. Handles buggy message objects.

**State Initialization**: Initialize the embedding service with a graceful fallback. Uses `init_magentic_state()` (deprecated).

**Requirements**: Must call `check_magentic_requirements()` in `__init__`. Requires `agent-framework-core` and an OpenAI API key.

**Event Types**: Maps agent names to event types: "search" → "search_complete", "judge" → "judge_complete", "hypothes" → "hypothesizing", "report" → "synthesizing".

---
## src/agent_factory/ - Factory Rules

**Pattern**: Factory functions for creating agents and handlers. Lazy initialization for optional dependencies. Support OpenAI/Anthropic/HF Inference.

**Judges**: `create_judge_handler()` creates a `JudgeHandler` with structured output (`JudgeAssessment`). Supports `MockJudgeHandler` and `HFInferenceJudgeHandler` as fallbacks.

**Agents**: Factory functions in `agents.py` for all Pydantic AI agents. Pattern: `create_agent_name(model: Any | None = None) -> AgentName`. Use `get_model()` if no model is provided.

**Graph Builder**: `graph_builder.py` contains utilities for building research graphs. Supports iterative and deep research graph construction.

**Error Handling**: Raise `ConfigurationError` if required API keys are missing. Log agent creation. Handle import errors gracefully.

---
## src/prompts/ - Prompt Rules

**Pattern**: System prompts stored as module-level constants. Include date injection: `datetime.now().strftime("%Y-%m-%d")`. Format evidence with truncation (1500 chars per item).
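A sketch of both conventions; the helper and constant names are illustrative:

```python
from datetime import datetime

EXAMPLE_PROMPT_TEMPLATE = "Today is {today}. Assess the evidence below."


def build_system_prompt() -> str:
    return EXAMPLE_PROMPT_TEMPLATE.format(today=datetime.now().strftime("%Y-%m-%d"))


def format_evidence(items: list[str], limit: int = 1500) -> str:
    # Truncate each evidence item to ~1500 characters, per the rule above.
    return "\n\n".join(item[:limit] for item in items)
```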
**Judge Prompts**: In `judge.py`. Handle the empty-evidence case separately. Always request structured JSON output.

**Hypothesis Prompts**: In `hypothesis.py`. Use diverse evidence selection (MMR algorithm). Sentence-aware truncation.

**Report Prompts**: In `report.py`. Include full citation details. Use diverse evidence selection (n=20). Emphasize the citation validation rules.

---
## Testing Rules

**Structure**: Unit tests in `tests/unit/` (mocked, fast). Integration tests in `tests/integration/` (real APIs, marked `@pytest.mark.integration`).

**Mocking**: Use `respx` for httpx mocking. Use `pytest-mock` for general mocking. Mock LLM calls in unit tests (use `MockJudgeHandler`).
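A respx sketch for an httpx-based tool; the URL and payload are illustrative, and `pytest-asyncio` is assumed for the async test:

```python
import httpx
import pytest
import respx


@respx.mock
@pytest.mark.asyncio
async def test_search_handles_empty_results():
    # Any httpx call to this URL inside the test gets the canned response.
    respx.get("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi").mock(
        return_value=httpx.Response(200, json={"esearchresult": {"idlist": []}})
    )
    # ... call the tool under test and assert it returns an empty Evidence list ...
```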
**Fixtures**: Common fixtures in `tests/conftest.py`: `mock_httpx_client`, `mock_llm_response`.

**Coverage**: Aim for >80% coverage. Test error handling, edge cases, and integration paths.

---
## File-Specific Agent Rules

**knowledge_gap.py**: Outputs `KnowledgeGapOutput`. The system prompt evaluates research completeness. Handles conversation history. Returns a fallback on error.

**writer.py**: Returns a markdown string. The system prompt includes citation format examples. Validates inputs. Truncates long findings. Retry logic for transient failures.

**long_writer.py**: Uses `ReportDraft` input/output. Writes sections iteratively. Reformats references (deduplicates, renumbers). Reformats section headings.

**proofreader.py**: Takes a `ReportDraft`, returns polished markdown. Removes duplicates. Adds a summary. Preserves references.

**tool_selector.py**: Outputs `AgentSelectionPlan`. The system prompt lists the available agents (WebSearchAgent, SiteCrawlerAgent, RAGAgent) and guidelines for when to use each.

**thinking.py**: Returns an observation string. Generates observations from conversation history. Uses the query and background context.

**input_parser.py**: Outputs `ParsedQuery`. Detects the research mode (iterative/deep). Extracts entities and research questions. Improves/refines the query.
dev/Makefile ADDED
@@ -0,0 +1,51 @@
.PHONY: install test test-hf test-all lint format typecheck check clean all cov test-cov cov-html docs-build docs-serve docs-clean

# Default target
all: check

install:
	uv sync --all-extras
	uv run pre-commit install

test:
	uv run pytest tests/unit/ -v -m "not openai" -p no:logfire

test-hf:
	uv run pytest tests/ -v -m "huggingface" -p no:logfire

test-all:
	uv run pytest tests/ -v -p no:logfire

# Coverage aliases
cov: test-cov

test-cov:
	uv run pytest --cov=src --cov-report=term-missing -m "not openai" -p no:logfire

cov-html:
	uv run pytest --cov=src --cov-report=html -p no:logfire
	@echo "Coverage report: open htmlcov/index.html"

lint:
	uv run ruff check src tests

format:
	uv run ruff format src tests

typecheck:
	uv run mypy src

check: lint typecheck test-cov
	@echo "All checks passed!"

docs-build:
	uv run mkdocs build

docs-serve:
	uv run mkdocs serve

docs-clean:
	rm -rf site/

clean:
	rm -rf .pytest_cache .mypy_cache .ruff_cache __pycache__ .coverage htmlcov
	find . -type d -name "__pycache__" -exec rm -rf {} + 2>/dev/null || true
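Typical invocation, assuming `uv` is on PATH:

```bash
make install   # sync all extras and install pre-commit hooks
make check     # lint + typecheck + unit tests with coverage
```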
dev/docs_plugins.py ADDED
@@ -0,0 +1,74 @@
"""Custom MkDocs extension to handle code anchor format: ```start:end:filepath"""

import re
from pathlib import Path

from markdown import Markdown
from markdown.extensions import Extension
from markdown.preprocessors import Preprocessor


class CodeAnchorPreprocessor(Preprocessor):
    """Preprocess code blocks with anchor format: ```start:end:filepath"""

    def __init__(self, md: Markdown, base_path: Path):
        super().__init__(md)
        self.base_path = base_path
        self.pattern = re.compile(r"^```(\d+):(\d+):([^\n]+)\n(.*?)```$", re.MULTILINE | re.DOTALL)

    def run(self, lines: list[str]) -> list[str]:
        """Process lines and convert code anchor format to standard code blocks."""
        text = "\n".join(lines)
        new_text = self.pattern.sub(self._replace_code_anchor, text)
        return new_text.split("\n")

    def _replace_code_anchor(self, match) -> str:
        """Replace code anchor format with standard code block + link."""
        start_line = int(match.group(1))
        end_line = int(match.group(2))
        file_path = match.group(3).strip()
        existing_code = match.group(4)

        # Determine language from file extension
        ext = Path(file_path).suffix.lower()
        lang_map = {
            ".py": "python",
            ".js": "javascript",
            ".ts": "typescript",
            ".md": "markdown",
            ".yaml": "yaml",
            ".yml": "yaml",
            ".toml": "toml",
            ".json": "json",
            ".html": "html",
            ".css": "css",
            ".sh": "bash",
        }
        language = lang_map.get(ext, "python")

        # Generate GitHub link
        repo_url = "https://github.com/DeepCritical/GradioDemo"
        github_link = f"{repo_url}/blob/main/{file_path}#L{start_line}-L{end_line}"

        # Return standard code block with source link
        return (
            f'[View source: `{file_path}` (lines {start_line}-{end_line})]({github_link}){{: target="_blank" }}\n\n'
            f"```{language}\n{existing_code}\n```"
        )


class CodeAnchorExtension(Extension):
    """Markdown extension for code anchors."""

    def __init__(self, base_path: str = ".", **kwargs):
        super().__init__(**kwargs)
        self.base_path = Path(base_path)

    def extendMarkdown(self, md: Markdown):  # noqa: N802
        """Register the preprocessor."""
        md.preprocessors.register(CodeAnchorPreprocessor(md, self.base_path), "codeanchor", 25)


def makeExtension(**kwargs):  # noqa: N802
    """Create the extension."""
    return CodeAnchorExtension(**kwargs)
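A minimal sketch of exercising the extension directly with python-markdown; it assumes the module is importable as `dev.docs_plugins`:

```python
import markdown

from dev.docs_plugins import CodeAnchorExtension

md = markdown.Markdown(extensions=[CodeAnchorExtension(base_path=".")])
# The preprocessor rewrites the anchor into a GitHub source link plus a
# standard fenced block before the rest of the pipeline renders it.
html = md.convert("```10:12:src/app.py\nprint('hello')\n```")
print(html)
```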
docs/CONFIGURATION.md DELETED
@@ -1,301 +0,0 @@
# Configuration Guide

## Overview

DeepCritical uses **Pydantic Settings** for centralized configuration management. All settings are defined in `src/utils/config.py` and can be configured via environment variables or a `.env` file.

## Quick Start

1. Copy the example environment file (if available) or create a `.env` file in the project root
2. Set at least one LLM API key (`OPENAI_API_KEY` or `ANTHROPIC_API_KEY`)
3. Optionally configure other services as needed

## Configuration System

### How It Works

- **Settings Class**: `Settings` class in `src/utils/config.py` extends `BaseSettings` from `pydantic_settings`
- **Environment File**: Automatically loads from `.env` file (if present)
- **Environment Variables**: Reads from environment variables (case-insensitive)
- **Type Safety**: Strongly-typed fields with validation
- **Singleton Pattern**: Global `settings` instance for easy access

### Usage

```python
from src.utils.config import settings

# Check if API keys are available
if settings.has_openai_key:
    # Use OpenAI
    pass

# Access configuration values
max_iterations = settings.max_iterations
web_search_provider = settings.web_search_provider
```

## Required Configuration

### At Least One LLM Provider

You must configure at least one LLM provider:

**OpenAI:**
```bash
LLM_PROVIDER=openai
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_MODEL=gpt-5.1
```

**Anthropic:**
```bash
LLM_PROVIDER=anthropic
ANTHROPIC_API_KEY=your_anthropic_api_key_here
ANTHROPIC_MODEL=claude-sonnet-4-5-20250929
```

## Optional Configuration

### Embedding Configuration

```bash
# Embedding Provider: "openai", "local", or "huggingface"
EMBEDDING_PROVIDER=local

# OpenAI Embedding Model (used by LlamaIndex RAG)
OPENAI_EMBEDDING_MODEL=text-embedding-3-small

# Local Embedding Model (sentence-transformers)
LOCAL_EMBEDDING_MODEL=all-MiniLM-L6-v2

# HuggingFace Embedding Model
HUGGINGFACE_EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
```

### HuggingFace Configuration

```bash
# HuggingFace API Token (for inference API)
HUGGINGFACE_API_KEY=your_huggingface_api_key_here
# Or use HF_TOKEN (alternative name)

# Default HuggingFace Model ID
HUGGINGFACE_MODEL=meta-llama/Llama-3.1-8B-Instruct
```

### Web Search Configuration

```bash
# Web Search Provider: "serper", "searchxng", "brave", "tavily", or "duckduckgo"
# Default: "duckduckgo" (no API key required)
WEB_SEARCH_PROVIDER=duckduckgo

# Serper API Key (for Google search via Serper)
SERPER_API_KEY=your_serper_api_key_here

# SearchXNG Host URL
SEARCHXNG_HOST=http://localhost:8080

# Brave Search API Key
BRAVE_API_KEY=your_brave_api_key_here

# Tavily API Key
TAVILY_API_KEY=your_tavily_api_key_here
```

### PubMed Configuration

```bash
# NCBI API Key (optional, for higher rate limits: 10 req/sec vs 3 req/sec)
NCBI_API_KEY=your_ncbi_api_key_here
```

### Agent Configuration

```bash
# Maximum iterations per research loop
MAX_ITERATIONS=10

# Search timeout in seconds
SEARCH_TIMEOUT=30

# Use graph-based execution for research flows
USE_GRAPH_EXECUTION=false
```

### Budget & Rate Limiting Configuration

```bash
# Default token budget per research loop
DEFAULT_TOKEN_LIMIT=100000

# Default time limit per research loop (minutes)
DEFAULT_TIME_LIMIT_MINUTES=10

# Default iterations limit per research loop
DEFAULT_ITERATIONS_LIMIT=10
```

### RAG Service Configuration

```bash
# ChromaDB collection name for RAG
RAG_COLLECTION_NAME=deepcritical_evidence

# Number of top results to retrieve from RAG
RAG_SIMILARITY_TOP_K=5

# Automatically ingest evidence into RAG
RAG_AUTO_INGEST=true
```

### ChromaDB Configuration

```bash
# ChromaDB storage path
CHROMA_DB_PATH=./chroma_db

# Whether to persist ChromaDB to disk
CHROMA_DB_PERSIST=true

# ChromaDB server host (for remote ChromaDB, optional)
# CHROMA_DB_HOST=localhost

# ChromaDB server port (for remote ChromaDB, optional)
# CHROMA_DB_PORT=8000
```

### External Services

```bash
# Modal Token ID (for Modal sandbox execution)
MODAL_TOKEN_ID=your_modal_token_id_here

# Modal Token Secret
MODAL_TOKEN_SECRET=your_modal_token_secret_here
```

### Logging Configuration

```bash
# Log Level: "DEBUG", "INFO", "WARNING", or "ERROR"
LOG_LEVEL=INFO
```

## Configuration Properties

The `Settings` class provides helpful properties for checking configuration:

```python
from src.utils.config import settings

# Check API key availability
settings.has_openai_key       # bool
settings.has_anthropic_key    # bool
settings.has_huggingface_key  # bool
settings.has_any_llm_key      # bool

# Check service availability
settings.modal_available      # bool
settings.web_search_available # bool
```

## Environment Variables Reference

### Required (at least one LLM)
- `OPENAI_API_KEY` or `ANTHROPIC_API_KEY` - At least one LLM provider key

### Optional LLM Providers
- `DEEPSEEK_API_KEY` (Phase 2)
- `OPENROUTER_API_KEY` (Phase 2)
- `GEMINI_API_KEY` (Phase 2)
- `PERPLEXITY_API_KEY` (Phase 2)
- `HUGGINGFACE_API_KEY` or `HF_TOKEN`
- `AZURE_OPENAI_ENDPOINT` (Phase 2)
- `AZURE_OPENAI_DEPLOYMENT` (Phase 2)
- `AZURE_OPENAI_API_KEY` (Phase 2)
- `AZURE_OPENAI_API_VERSION` (Phase 2)
- `LOCAL_MODEL_URL` (Phase 2)

### Web Search
- `WEB_SEARCH_PROVIDER` (default: "duckduckgo")
- `SERPER_API_KEY`
- `SEARCHXNG_HOST`
- `BRAVE_API_KEY`
- `TAVILY_API_KEY`

### Embeddings
- `EMBEDDING_PROVIDER` (default: "local")
- `HUGGINGFACE_EMBEDDING_MODEL` (optional)

### RAG
- `RAG_COLLECTION_NAME` (default: "deepcritical_evidence")
- `RAG_SIMILARITY_TOP_K` (default: 5)
- `RAG_AUTO_INGEST` (default: true)

### ChromaDB
- `CHROMA_DB_PATH` (default: "./chroma_db")
- `CHROMA_DB_PERSIST` (default: true)
- `CHROMA_DB_HOST` (optional)
- `CHROMA_DB_PORT` (optional)

### Budget
- `DEFAULT_TOKEN_LIMIT` (default: 100000)
- `DEFAULT_TIME_LIMIT_MINUTES` (default: 10)
- `DEFAULT_ITERATIONS_LIMIT` (default: 10)

### Other
- `LLM_PROVIDER` (default: "openai")
- `NCBI_API_KEY` (optional)
- `MODAL_TOKEN_ID` (optional)
- `MODAL_TOKEN_SECRET` (optional)
- `MAX_ITERATIONS` (default: 10)
- `LOG_LEVEL` (default: "INFO")
- `USE_GRAPH_EXECUTION` (default: false)

## Validation

Settings are validated on load using Pydantic validation:

- **Type checking**: All fields are strongly typed
- **Range validation**: Numeric fields have min/max constraints
- **Literal validation**: Enum fields only accept specific values
- **Required fields**: API keys are checked when accessed via `get_api_key()`

## Error Handling

Configuration errors raise `ConfigurationError`:

```python
from src.utils.config import settings
from src.utils.exceptions import ConfigurationError

try:
    api_key = settings.get_api_key()
except ConfigurationError as e:
    print(f"Configuration error: {e}")
```

## Future Enhancements (Phase 2)

The following configurations are planned for Phase 2:

1. **Additional LLM Providers**: DeepSeek, OpenRouter, Gemini, Perplexity, Azure OpenAI, Local models
2. **Model Selection**: Reasoning/main/fast model configuration
3. **Service Integration**: Migrate `folder/llm_config.py` to centralized config

See `CONFIGURATION_ANALYSIS.md` for the complete implementation plan.
docs/api/agents.md CHANGED
@@ -265,6 +265,3 @@ def create_input_parser_agent(model: Any | None = None) -> InputParserAgent
 
 
 
-
-
-
docs/api/models.md CHANGED
@@ -243,6 +243,3 @@ class BudgetStatus(BaseModel):
 
 
 
-
-
-
docs/api/orchestrators.md CHANGED
@@ -190,6 +190,3 @@ Runs Magentic orchestration.
 
 
 
-
-
-
docs/api/services.md CHANGED
@@ -196,6 +196,3 @@ Analyzes a hypothesis using statistical methods.
 
 
 
-
-
-
docs/api/tools.md CHANGED
@@ -230,6 +230,3 @@ Searches multiple tools in parallel.
 
 
 
-
-
-
docs/architecture/agents.md CHANGED
@@ -187,6 +187,3 @@ Factory functions:
 
 
 
-
-
-
docs/architecture/design-patterns.md DELETED
@@ -1,1509 +0,0 @@
# Design Patterns & Technical Decisions
## Explicit Answers to Architecture Questions

---

## Purpose of This Document

This document explicitly answers all the "design pattern" questions raised in team discussions. It provides clear technical decisions with rationale.

---

## 1. Primary Architecture Pattern

### Decision: Orchestrator with Search-Judge Loop

**Pattern Name**: Iterative Research Orchestrator

**Structure**:
```
┌─────────────────────────────────────┐
│      Research Orchestrator          │
│  ┌───────────────────────────────┐  │
│  │   Search Strategy Planner     │  │
│  └───────────────────────────────┘  │
│               ↓                     │
│  ┌───────────────────────────────┐  │
│  │   Tool Coordinator            │  │
│  │   - PubMed Search             │  │
│  │   - Web Search                │  │
│  │   - Clinical Trials           │  │
│  └───────────────────────────────┘  │
│               ↓                     │
│  ┌───────────────────────────────┐  │
│  │   Evidence Aggregator         │  │
│  └───────────────────────────────┘  │
│               ↓                     │
│  ┌───────────────────────────────┐  │
│  │   Quality Judge               │  │
│  │   (LLM-based assessment)      │  │
│  └───────────────────────────────┘  │
│               ↓                     │
│       Loop or Synthesize?           │
│               ↓                     │
│  ┌───────────────────────────────┐  │
│  │   Report Generator            │  │
│  └───────────────────────────────┘  │
└─────────────────────────────────────┘
```

**Why NOT single-agent?**
- Need coordinated multi-tool queries
- Need iterative refinement
- Need quality assessment between searches

**Why NOT pure ReAct?**
- Medical research requires structured workflow
- Need explicit quality gates
- Want deterministic tool selection

**Why THIS pattern?**
- Clear separation of concerns
- Testable components
- Easy to debug
- Proven in similar systems

---

## 2. Tool Selection & Orchestration Pattern

### Decision: Static Tool Registry with Dynamic Selection

**Pattern**:
```python
class ToolRegistry:
    """Central registry of available research tools"""
    tools = {
        'pubmed': PubMedSearchTool(),
        'web': WebSearchTool(),
        'trials': ClinicalTrialsTool(),
        'drugs': DrugInfoTool(),
    }

class Orchestrator:
    def select_tools(self, question: str, iteration: int) -> List[Tool]:
        """Dynamically choose tools based on context"""
        if iteration == 0:
            # First pass: broad search
            return [tools['pubmed'], tools['web']]
        else:
            # Refinement: targeted search
            return self.judge.recommend_tools(question, context)
```

**Why NOT on-the-fly agent factories?**
- 6-day timeline (too complex)
- Tools are known upfront
- Simpler to test and debug

**Why NOT single tool?**
- Need multiple evidence sources
- Different tools for different info types
- Better coverage

**Why THIS pattern?**
- Balance flexibility vs simplicity
- Tools can be added easily
- Selection logic is transparent

---

## 3. Judge Pattern

### Decision: Dual-Judge System (Quality + Budget)

**Pattern**:
```python
class QualityJudge:
    """LLM-based evidence quality assessment"""

    def is_sufficient(self, question: str, evidence: List[Evidence]) -> bool:
        """Main decision: do we have enough?"""
        return (
            self.has_mechanism_explanation(evidence) and
            self.has_drug_candidates(evidence) and
            self.has_clinical_evidence(evidence) and
            self.confidence_score(evidence) > threshold
        )

    def identify_gaps(self, question: str, evidence: List[Evidence]) -> List[str]:
        """What's missing?"""
        gaps = []
        if not self.has_mechanism_explanation(evidence):
            gaps.append("disease mechanism")
        if not self.has_drug_candidates(evidence):
            gaps.append("potential drug candidates")
        if not self.has_clinical_evidence(evidence):
            gaps.append("clinical trial data")
        return gaps

class BudgetJudge:
    """Resource constraint enforcement"""

    def should_stop(self, state: ResearchState) -> bool:
        """Hard limits"""
        return (
            state.tokens_used >= max_tokens or
            state.iterations >= max_iterations or
            state.time_elapsed >= max_time
        )
```

**Why NOT just LLM judge?**
- Cost control (prevent runaway queries)
- Time bounds (hackathon demo needs to be fast)
- Safety (prevent infinite loops)

**Why NOT just token budget?**
- Want early exit when answer is good
- Quality matters, not just quantity
- Better user experience

**Why THIS pattern?**
- Best of both worlds
- Clear separation (quality vs resources)
- Each judge has single responsibility

---

## 4. Break/Stopping Pattern

### Decision: Four-Tier Break Conditions

**Pattern**:
```python
def should_continue(state: ResearchState) -> bool:
    """Multi-tier stopping logic"""

    # Tier 1: Quality-based (ideal stop)
    if quality_judge.is_sufficient(state.question, state.evidence):
        state.stop_reason = "sufficient_evidence"
        return False

    # Tier 2: Budget-based (cost control)
    if state.tokens_used >= config.max_tokens:
        state.stop_reason = "token_budget_exceeded"
        return False

    # Tier 3: Iteration-based (safety)
    if state.iterations >= config.max_iterations:
        state.stop_reason = "max_iterations_reached"
        return False

    # Tier 4: Time-based (demo friendly)
    if state.time_elapsed >= config.max_time:
        state.stop_reason = "timeout"
        return False

    return True  # Continue researching
```

**Configuration**:
```toml
[research.limits]
max_tokens = 50000        # ~$0.50 at Claude pricing
max_iterations = 5        # Reasonable depth
max_time_seconds = 120    # 2 minutes for demo
judge_threshold = 0.8     # Quality confidence score
```

**Why multiple conditions?**
- Defense in depth
- Different failure modes
- Graceful degradation

**Why these specific limits?**
- Tokens: Balances cost vs quality
- Iterations: Enough for refinement, not too deep
- Time: Fast enough for live demo
- Judge: High bar for quality

---

## 5. State Management Pattern

### Decision: Pydantic State Machine with Checkpoints

**Pattern**:
```python
class ResearchState(BaseModel):
    """Immutable state snapshots"""
    query_id: str
    question: str
    iteration: int = 0
    evidence: List[Evidence] = []
    tokens_used: int = 0
    search_history: List[SearchQuery] = []
    stop_reason: Optional[str] = None
    created_at: datetime
    updated_at: datetime

class StateManager:
    def save_checkpoint(self, state: ResearchState) -> None:
        """Save state to disk"""
        path = Path(f".deepresearch/checkpoints/{state.query_id}_iter{state.iteration}.json")
        path.write_text(state.model_dump_json(indent=2))

    def load_checkpoint(self, query_id: str, iteration: int) -> ResearchState:
        """Resume from checkpoint"""
        path = Path(f".deepresearch/checkpoints/{query_id}_iter{iteration}.json")
        return ResearchState.model_validate_json(path.read_text())
```

**Directory Structure**:
```
.deepresearch/
├── state/
│   └── current_123.json         # Active research state
├── checkpoints/
│   ├── query_123_iter0.json     # Checkpoint after iteration 0
│   ├── query_123_iter1.json     # Checkpoint after iteration 1
│   └── query_123_iter2.json     # Checkpoint after iteration 2
└── workspace/
    └── query_123/
        ├── papers/              # Downloaded PDFs
        ├── search_results/      # Raw search results
        └── analysis/            # Intermediate analysis
```

**Why Pydantic?**
- Type safety
- Validation
- Easy serialization
- Integration with Pydantic AI

**Why checkpoints?**
- Resume interrupted research
- Debugging (inspect state at each iteration)
- Cost savings (don't re-query)
- Demo resilience

---

## 6. Tool Interface Pattern

### Decision: Async Unified Tool Protocol

**Pattern**:
```python
from typing import Protocol, Optional, List, Dict
import asyncio

import httpx

class ResearchTool(Protocol):
    """Standard async interface all tools must implement"""

    async def search(
        self,
        query: str,
        max_results: int = 10,
        filters: Optional[Dict] = None
    ) -> List[Evidence]:
        """Execute search and return structured evidence"""
        ...

    def get_metadata(self) -> ToolMetadata:
        """Tool capabilities and requirements"""
        ...

class PubMedSearchTool:
    """Concrete async implementation"""

    def __init__(self):
        self._rate_limiter = asyncio.Semaphore(3)  # cap concurrency (~3 req/sec)
        self._cache: Dict[str, List[Evidence]] = {}

    async def search(self, query: str, max_results: int = 10, **kwargs) -> List[Evidence]:
        # Check cache first
        cache_key = f"{query}:{max_results}"
        if cache_key in self._cache:
            return self._cache[cache_key]

        async with self._rate_limiter:
            # 1. Query PubMed E-utilities API (async httpx)
            async with httpx.AsyncClient() as client:
                response = await client.get(
                    "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
                    params={"db": "pubmed", "term": query, "retmax": max_results}
                )
            # 2. Parse XML response
            # 3. Extract: title, abstract, authors, citations
            # 4. Convert to Evidence objects
            evidence_list = self._parse_response(response.text)

        # Cache results
        self._cache[cache_key] = evidence_list
        return evidence_list

    def get_metadata(self) -> ToolMetadata:
        return ToolMetadata(
            name="PubMed",
            description="Biomedical literature search",
            rate_limit="3 requests/second",
            requires_api_key=False
        )
```

**Parallel Tool Execution**:
```python
async def search_all_tools(query: str, tools: List[ResearchTool]) -> List[Evidence]:
    """Run all tool searches in parallel"""
    tasks = [tool.search(query) for tool in tools]
    results = await asyncio.gather(*tasks, return_exceptions=True)

    # Flatten and filter errors
    evidence = []
    for result in results:
        if isinstance(result, Exception):
            logger.warning(f"Tool failed: {result}")
        else:
            evidence.extend(result)
    return evidence
```

**Why Async?**
- Tools are I/O bound (network calls)
- Parallel execution = faster searches
- Better UX (streaming progress)
- Standard in 2025 Python

**Why Protocol?**
- Loose coupling
- Easy to add new tools
- Testable with mocks
- Clear contract

**Why NOT abstract base class?**
- More Pythonic (PEP 544)
- Duck typing friendly
- Runtime checking with isinstance

---

## 7. Report Generation Pattern

### Decision: Structured Output with Citations

**Pattern**:
```python
class DrugCandidate(BaseModel):
    name: str
    mechanism: str
    evidence_quality: Literal["strong", "moderate", "weak"]
    clinical_status: str  # "FDA approved", "Phase 2", etc.
    citations: List[Citation]

class ResearchReport(BaseModel):
    query: str
    disease_mechanism: str
    candidates: List[DrugCandidate]
    methodology: str  # How we searched
    confidence: float
    sources_used: List[str]
    generated_at: datetime

    def to_markdown(self) -> str:
        """Human-readable format"""
        ...

    def to_json(self) -> str:
        """Machine-readable format"""
        ...
```

**Output Example**:
```markdown
# Research Report: Long COVID Fatigue

## Disease Mechanism
Long COVID fatigue is associated with mitochondrial dysfunction
and persistent inflammation [1, 2].

## Drug Candidates

### 1. Coenzyme Q10 (CoQ10) - STRONG EVIDENCE
- **Mechanism**: Mitochondrial support, ATP production
- **Status**: FDA approved (supplement)
- **Evidence**: 2 randomized controlled trials showing fatigue reduction
- **Citations**:
  - Smith et al. (2023) - PubMed: 12345678
  - Johnson et al. (2023) - PubMed: 87654321

### 2. Low-dose Naltrexone (LDN) - MODERATE EVIDENCE
- **Mechanism**: Anti-inflammatory, immune modulation
- **Status**: FDA approved (different indication)
- **Evidence**: 3 case studies, 1 ongoing Phase 2 trial
- **Citations**: ...

## Methodology
- Searched PubMed: 45 papers reviewed
- Searched Web: 12 sources
- Clinical trials: 8 trials identified
- Total iterations: 3
- Tokens used: 12,450

## Confidence: 85%

## Sources
- PubMed E-utilities
- ClinicalTrials.gov
- OpenFDA Database
```

**Why structured?**
- Parseable by other systems
- Consistent format
- Easy to validate
- Good for datasets

**Why markdown?**
- Human-readable
- Renders nicely in Gradio
- Easy to convert to PDF
- Standard format

---

## 8. Error Handling Pattern

### Decision: Graceful Degradation with Fallbacks

**Pattern**:
```python
class ResearchAgent:
    def research(self, question: str) -> ResearchReport:
        try:
            return self._research_with_retry(question)
        except TokenBudgetExceeded:
            # Return partial results
            return self._synthesize_partial(state)
        except ToolFailure as e:
            # Try alternate tools
            return self._research_with_fallback(question, failed_tool=e.tool)
        except Exception as e:
            # Log and return error report
            logger.error(f"Research failed: {e}")
            return self._error_report(question, error=e)
```

**Why NOT fail fast?**
- Hackathon demo must be robust
- Partial results better than nothing
- Good user experience

**Why NOT silent failures?**
- Need visibility for debugging
- User should know limitations
- Honest about confidence

---

## 9. Configuration Pattern

### Decision: Hydra-inspired but Simpler

**Pattern**:
```toml
# config.toml

[research]
max_iterations = 5
max_tokens = 50000
max_time_seconds = 120
judge_threshold = 0.85

[tools]
enabled = ["pubmed", "web", "trials"]

[tools.pubmed]
max_results = 20
rate_limit = 3  # per second

[tools.web]
engine = "serpapi"
max_results = 10

[llm]
provider = "anthropic"
model = "claude-3-5-sonnet-20241022"
temperature = 0.1

[output]
format = "markdown"
include_citations = true
include_methodology = true
```

**Loading**:
```python
from pathlib import Path
import tomllib

def load_config() -> dict:
    config_path = Path("config.toml")
    with open(config_path, "rb") as f:
        return tomllib.load(f)
```

**Why NOT full Hydra?**
- Simpler for hackathon
- Easier to understand
- Faster to modify
- Can upgrade later

**Why TOML?**
- Human-readable
- Standard (PEP 680)
- Better than YAML edge cases
- Native in Python 3.11+

---

## 10. Testing Pattern

### Decision: Three-Level Testing Strategy

**Pattern**:
```python
# Level 1: Unit tests (fast, isolated)
@pytest.mark.asyncio
async def test_pubmed_tool():
    tool = PubMedSearchTool()
    results = await tool.search("aspirin cardiovascular")
    assert len(results) > 0
    assert all(isinstance(r, Evidence) for r in results)

# Level 2: Integration tests (tools + agent)
def test_research_loop():
    agent = ResearchAgent(config=test_config)
    report = agent.research("aspirin repurposing")
    assert report.candidates
    assert report.confidence > 0

# Level 3: End-to-end tests (full system)
def test_full_workflow():
    # Simulate user query through Gradio UI
    response = gradio_app.predict("test query")
    assert "Drug Candidates" in response
```

**Why three levels?**
- Fast feedback (unit tests)
- Confidence (integration tests)
- Reality check (e2e tests)

**Test Data**:
```python
# tests/fixtures/
- mock_pubmed_response.xml
- mock_web_results.json
- sample_research_query.txt
- expected_report.md
```

---

## 11. Judge Prompt Templates

### Decision: Structured JSON Output with Domain-Specific Criteria

**Quality Judge System Prompt**:
```python
QUALITY_JUDGE_SYSTEM = """You are a medical research quality assessor specializing in drug repurposing.
Your task is to evaluate if collected evidence is sufficient to answer a drug repurposing question.

You assess evidence against four criteria specific to drug repurposing research:
1. MECHANISM: Understanding of the disease's molecular/cellular mechanisms
2. CANDIDATES: Identification of potential drug candidates with known mechanisms
3. EVIDENCE: Clinical or preclinical evidence supporting repurposing
4. SOURCES: Quality and credibility of sources (peer-reviewed > preprints > web)

You MUST respond with valid JSON only. No other text."""
```

**Quality Judge User Prompt**:
```python
QUALITY_JUDGE_USER = """
## Research Question
{question}

## Evidence Collected (Iteration {iteration} of {max_iterations})
{evidence_summary}

## Token Budget
Used: {tokens_used} / {max_tokens}

## Your Assessment

Evaluate the evidence and respond with this exact JSON structure:

```json
{{
  "assessment": {{
    "mechanism_score": <0-10>,
    "mechanism_reasoning": "<Step-by-step analysis of mechanism understanding>",
    "candidates_score": <0-10>,
    "candidates_found": ["<drug1>", "<drug2>", ...],
    "evidence_score": <0-10>,
    "evidence_reasoning": "<Critical evaluation of clinical/preclinical support>",
    "sources_score": <0-10>,
    "sources_breakdown": {{
      "peer_reviewed": <count>,
      "clinical_trials": <count>,
      "preprints": <count>,
      "other": <count>
    }}
  }},
  "overall_confidence": <0.0-1.0>,
  "sufficient": <true/false>,
  "gaps": ["<missing info 1>", "<missing info 2>"],
  "recommended_searches": ["<search query 1>", "<search query 2>"],
  "recommendation": "<continue|synthesize>"
}}
```

Decision rules:
- sufficient=true if overall_confidence >= 0.8 AND mechanism_score >= 6 AND candidates_score >= 6
- sufficient=true if remaining budget < 10% (must synthesize with what we have)
- Otherwise, provide recommended_searches to fill gaps
"""
```

**Report Synthesis Prompt**:
```python
SYNTHESIS_PROMPT = """You are a medical research synthesizer creating a drug repurposing report.

## Research Question
{question}

## Collected Evidence
{all_evidence}

## Judge Assessment
{final_assessment}

## Your Task
Create a comprehensive research report with this structure:

1. **Executive Summary** (2-3 sentences)
2. **Disease Mechanism** - What we understand about the condition
3. **Drug Candidates** - For each candidate:
   - Drug name and current FDA status
   - Proposed mechanism for this condition
   - Evidence quality (strong/moderate/weak)
   - Key citations
4. **Methodology** - How we searched (tools used, queries, iterations)
5. **Limitations** - What we couldn't find or verify
6. **Confidence Score** - Overall confidence in findings

Format as Markdown. Include PubMed IDs as citations [PMID: 12345678].
Be scientifically accurate. Do not hallucinate drug names or mechanisms.
If evidence is weak, say so clearly."""
```

**Why Structured JSON?**
- Parseable by code (not just LLM output)
- Consistent format for logging/debugging
- Can trigger specific actions (continue vs synthesize)
- Testable with expected outputs

**Why Domain-Specific Criteria?**
- Generic "is this good?" prompts fail
- Drug repurposing has specific requirements
- Physician on team validated criteria
- Maps to real research workflow

---

## 12. MCP Server Integration (Hackathon Track)

### Decision: Tools as MCP Servers for Reusability

**Why MCP?**
- Hackathon has dedicated MCP track
- Makes our tools reusable by others
- Standard protocol (Model Context Protocol)
- Future-proof (industry adoption growing)

**Architecture**:
```
┌─────────────────────────────────────────────────┐
│              DeepCritical Agent                 │
│        (uses tools directly OR via MCP)         │
└─────────────────────────────────────────────────┘
                      │
         ┌────────────┼────────────┐
         ↓            ↓            ↓
┌─────────────┐ ┌──────────┐ ┌───────────────┐
│ PubMed MCP  │ │ Web MCP  │ │ Trials MCP    │
│ Server      │ │ Server   │ │ Server        │
└─────────────┘ └──────────┘ └───────────────┘
         │            │            │
         ↓            ↓            ↓
   PubMed API    Brave/DDG   ClinicalTrials.gov
```

**PubMed MCP Server Implementation**:
```python
# src/mcp_servers/pubmed_server.py
from fastmcp import FastMCP

mcp = FastMCP("PubMed Research Tool")

@mcp.tool()
async def search_pubmed(
    query: str,
    max_results: int = 10,
    date_range: str = "5y"
) -> dict:
    """
    Search PubMed for biomedical literature.

    Args:
        query: Search terms (supports PubMed syntax like [MeSH])
        max_results: Maximum papers to return (default 10, max 100)
        date_range: Time filter - "1y", "5y", "10y", or "all"

    Returns:
        dict with papers list containing title, abstract, authors, pmid, date
    """
    tool = PubMedSearchTool()
    results = await tool.search(query, max_results)
    return {
        "query": query,
        "count": len(results),
        "papers": [r.model_dump() for r in results]
    }

@mcp.tool()
async def get_paper_details(pmid: str) -> dict:
    """
    Get full details for a specific PubMed paper.

    Args:
        pmid: PubMed ID (e.g., "12345678")

    Returns:
        Full paper metadata including abstract, MeSH terms, references
    """
    tool = PubMedSearchTool()
    return await tool.get_details(pmid)

if __name__ == "__main__":
    mcp.run()
```

**Running the MCP Server**:
```bash
# Start the server
python -m src.mcp_servers.pubmed_server

# Or with uvx (recommended)
uvx fastmcp run src/mcp_servers/pubmed_server.py

# Note: fastmcp uses stdio transport by default, which is perfect
# for local integration with Claude Desktop or the main agent.
```

**Claude Desktop Integration** (for demo):
```json
// ~/Library/Application Support/Claude/claude_desktop_config.json
{
  "mcpServers": {
    "pubmed": {
      "command": "python",
      "args": ["-m", "src.mcp_servers.pubmed_server"],
      "cwd": "/path/to/deepcritical"
    }
  }
}
```

**Why FastMCP?**
- Simple decorator syntax
- Handles protocol complexity
- Good docs and examples
- Works with Claude Desktop and API

**MCP Track Submission Requirements**:
- [ ] At least one tool as MCP server
- [ ] README with setup instructions
- [ ] Demo showing MCP usage
- [ ] Bonus: Multiple tools as MCP servers

---

## 13. Gradio UI Pattern (Hackathon Track)

### Decision: Streaming Progress with Modern UI

**Pattern**:
```python
import gradio as gr
from collections.abc import AsyncGenerator

async def research_with_streaming(question: str) -> AsyncGenerator[str, None]:
    """Stream research progress to UI"""
    yield "🔍 Starting research...\n\n"

    agent = ResearchAgent()

    async for event in agent.research_stream(question):
        match event.type:
            case "search_start":
                yield f"📚 Searching {event.tool}...\n"
            case "search_complete":
                yield f"✅ Found {event.count} results from {event.tool}\n"
            case "judge_thinking":
                yield "🤔 Evaluating evidence quality...\n"
            case "judge_decision":
                yield f"📊 Confidence: {event.confidence:.0%}\n"
            case "iteration_complete":
                yield f"🔄 Iteration {event.iteration} complete\n\n"
            case "synthesis_start":
                yield "📝 Generating report...\n"
            case "complete":
                yield f"\n---\n\n{event.report}"

# Gradio 5 UI
with gr.Blocks(theme=gr.themes.Soft()) as demo:
    gr.Markdown("# 🔬 DeepCritical: Drug Repurposing Research Agent")
    gr.Markdown("Ask a question about potential drug repurposing opportunities.")

    with gr.Row():
        with gr.Column(scale=2):
            question = gr.Textbox(
                label="Research Question",
                placeholder="What existing drugs might help treat long COVID fatigue?",
                lines=2
            )
            examples = gr.Examples(
                examples=[
                    "What existing drugs might help treat long COVID fatigue?",
                    "Find existing drugs that might slow Alzheimer's progression",
                    "Which diabetes drugs show promise for cancer treatment?"
                ],
                inputs=question
            )
            submit = gr.Button("🚀 Start Research", variant="primary")

        with gr.Column(scale=3):
            output = gr.Markdown(label="Research Progress & Report")

    submit.click(
        fn=research_with_streaming,
        inputs=question,
        outputs=output,
    )

demo.launch()
```

**Why Streaming?**
- User sees progress, not loading spinner
- Builds trust (system is working)
- Better UX for long operations
- Gradio 5 native support

**Why gr.Markdown Output?**
- Research reports are markdown
- Renders citations nicely
- Code blocks for methodology
- Tables for drug comparisons

---

## Summary: Design Decision Table

| # | Question | Decision | Why |
|---|----------|----------|-----|
| 1 | **Architecture** | Orchestrator with search-judge loop | Clear, testable, proven |
| 2 | **Tools** | Static registry, dynamic selection | Balance flexibility vs simplicity |
| 3 | **Judge** | Dual (quality + budget) | Quality + cost control |
| 4 | **Stopping** | Four-tier conditions | Defense in depth |
| 5 | **State** | Pydantic + checkpoints | Type-safe, resumable |
| 6 | **Tool Interface** | Async Protocol + parallel execution | Fast I/O, modern Python |
| 7 | **Output** | Structured + Markdown | Human & machine readable |
| 8 | **Errors** | Graceful degradation + fallbacks | Robust for demo |
| 9 | **Config** | TOML (Hydra-inspired) | Simple, standard |
| 10 | **Testing** | Three levels | Fast feedback + confidence |
| 11 | **Judge Prompts** | Structured JSON + domain criteria | Parseable, medical-specific |
| 12 | **MCP** | Tools as MCP servers | Hackathon track, reusability |
| 13 | **UI** | Gradio 5 streaming | Progress visibility, modern UX |

---
## Answers to Specific Questions

### "What's the orchestrator pattern?"
**Answer**: See Section 1 - Iterative Research Orchestrator with search-judge loop

### "LLM-as-judge or token budget?"
**Answer**: Both - See Section 3 (Dual-Judge System) and Section 4 (Four-Tier Break Conditions)

### "What's the break pattern?"
**Answer**: See Section 4 - Four stopping conditions: quality threshold, token budget, max iterations, and timeout

### "Should we use agent factories?"
**Answer**: No - See Section 2. Static tool registry is simpler for 6-day timeline

### "How do we handle state?"
**Answer**: See Section 5 - Pydantic state machine with checkpoints

---
- ## Appendix: Complete Data Models
954
-
955
- ```python
- # src/deepresearch/models.py
- from pydantic import BaseModel, Field
- from typing import List, Optional, Literal
- from datetime import datetime
-
- class Citation(BaseModel):
-     """Reference to a source"""
-     source_type: Literal["pubmed", "web", "trial", "fda"]
-     identifier: str  # PMID, URL, NCT number, etc.
-     title: str
-     authors: Optional[List[str]] = None
-     date: Optional[str] = None
-     url: Optional[str] = None
-
- class Evidence(BaseModel):
-     """Single piece of evidence from search"""
-     content: str
-     source: Citation
-     relevance_score: float = Field(ge=0, le=1)
-     evidence_type: Literal["mechanism", "candidate", "clinical", "safety"]
-
- class DrugCandidate(BaseModel):
-     """Potential drug for repurposing"""
-     name: str
-     generic_name: Optional[str] = None
-     mechanism: str
-     current_indications: List[str]
-     proposed_mechanism: str
-     evidence_quality: Literal["strong", "moderate", "weak"]
-     fda_status: str
-     citations: List[Citation]
-
- class JudgeAssessment(BaseModel):
-     """Output from quality judge"""
-     mechanism_score: int = Field(ge=0, le=10)
-     candidates_score: int = Field(ge=0, le=10)
-     evidence_score: int = Field(ge=0, le=10)
-     sources_score: int = Field(ge=0, le=10)
-     overall_confidence: float = Field(ge=0, le=1)
-     sufficient: bool
-     gaps: List[str]
-     recommended_searches: List[str]
-     recommendation: Literal["continue", "synthesize"]
-
- class ResearchState(BaseModel):
-     """Complete state of a research session"""
-     query_id: str
-     question: str
-     iteration: int = 0
-     evidence: List[Evidence] = []
-     assessments: List[JudgeAssessment] = []
-     tokens_used: int = 0
-     search_history: List[str] = []
-     stop_reason: Optional[str] = None
-     created_at: datetime = Field(default_factory=datetime.utcnow)
-     updated_at: datetime = Field(default_factory=datetime.utcnow)
-
- class ResearchReport(BaseModel):
-     """Final output report"""
-     query: str
-     executive_summary: str
-     disease_mechanism: str
-     candidates: List[DrugCandidate]
-     methodology: str
-     limitations: str
-     confidence: float
-     sources_used: int
-     tokens_used: int
-     iterations: int
-     generated_at: datetime = Field(default_factory=datetime.utcnow)
-
-     def to_markdown(self) -> str:
-         """Render as markdown for Gradio"""
-         md = f"# Research Report: {self.query}\n\n"
-         md += f"## Executive Summary\n{self.executive_summary}\n\n"
-         md += f"## Disease Mechanism\n{self.disease_mechanism}\n\n"
-         md += "## Drug Candidates\n\n"
-         for i, drug in enumerate(self.candidates, 1):
-             md += f"### {i}. {drug.name} - {drug.evidence_quality.upper()} EVIDENCE\n"
-             md += f"- **Mechanism**: {drug.proposed_mechanism}\n"
-             md += f"- **FDA Status**: {drug.fda_status}\n"
-             md += f"- **Current Uses**: {', '.join(drug.current_indications)}\n"
-             md += f"- **Citations**: {len(drug.citations)} sources\n\n"
-         md += f"## Methodology\n{self.methodology}\n\n"
-         md += f"## Limitations\n{self.limitations}\n\n"
-         md += f"## Confidence: {self.confidence:.0%}\n"
-         return md
- ```
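-
- A quick usage sketch (hypothetical values; assumes only the models above):
-
- ```python
- # Assemble a report and render it for the Gradio Markdown component
- report = ResearchReport(
-     query="What existing drugs might help treat long COVID fatigue?",
-     executive_summary="Three candidates with moderate evidence...",
-     disease_mechanism="Mitochondrial dysfunction and neuroinflammation...",
-     candidates=[],  # filled in by the synthesis step
-     methodology="Iterative search-judge loop over PubMed + web",
-     limitations="Preprints included; not a substitute for clinical review",
-     confidence=0.72,
-     sources_used=24,
-     tokens_used=41_000,
-     iterations=3,
- )
- markdown = report.to_markdown()  # feed straight into gr.Markdown
- ```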
-
- ---
-
- ## 14. Alternative Frameworks Considered
-
- We researched major agent frameworks before settling on our stack. Here's why we chose what we chose, and what we'd steal if we're shipping like animals and have time for Gucci upgrades.
-
- ### Frameworks Evaluated
-
- | Framework | Repo | What It Does |
- |-----------|------|--------------|
- | **Microsoft AutoGen** | [github.com/microsoft/autogen](https://github.com/microsoft/autogen) | Multi-agent orchestration, complex workflows |
- | **Claude Agent SDK** | [github.com/anthropics/claude-agent-sdk-python](https://github.com/anthropics/claude-agent-sdk-python) | Anthropic's official agent framework |
- | **Pydantic AI** | [github.com/pydantic/pydantic-ai](https://github.com/pydantic/pydantic-ai) | Type-safe agents, structured outputs |
-
- ### Why NOT AutoGen (Microsoft)?
-
- **Pros:**
- - Battle-tested multi-agent orchestration
- - `reflect_on_tool_use` - model reviews its own tool results
- - `max_tool_iterations` - built-in iteration limits
- - Concurrent tool execution
- - Rich ecosystem (AutoGen Studio, benchmarks)
-
- **Cons for MVP:**
- - Heavy dependency tree (50+ packages)
- - Complex configuration (YAML + Python)
- - Overkill for a single-agent search-judge loop
- - Learning curve eats into the 6-day timeline
-
- **Verdict:** Great for multi-agent systems. Overkill for our MVP.
-
- ### Why NOT Claude Agent SDK (Anthropic)?
-
- **Pros:**
- - Official Anthropic framework
- - Clean `@tool` decorator pattern
- - In-process MCP servers (no subprocess)
- - Hooks for pre/post tool execution
- - Direct Claude Code integration
-
- **Cons for MVP:**
- - Requires the Claude Code CLI to be bundled
- - Node.js dependency for some features
- - Designed for the Claude Code ecosystem, not standalone agents
- - Less flexible for custom LLM providers
-
- **Verdict:** Would be great if we were building ON Claude Code. We're building a standalone agent.
-
- ### Why Pydantic AI + FastMCP (Our Choice)
-
- **Pros:**
- - ✅ Simple, Pythonic API
- - ✅ Native async/await
- - ✅ Type-safe with Pydantic
- - ✅ Works with any LLM provider
- - ✅ FastMCP for clean MCP servers
- - ✅ Minimal dependencies
- - ✅ Can ship the MVP in 6 days
-
- **Cons:**
- - Newer framework (less battle-tested)
- - Smaller ecosystem
- - May need to build more from scratch
-
- **Verdict:** Right tool for the job. Ship fast, iterate later.
-
- ---
-
- ## 15. Stretch Goals: Gucci Bangers (If We're Shipping Like Animals)
-
- If the MVP ships early and we're crushing it, here's what we'd steal from other frameworks:
-
- ### Tier 1: Quick Wins (2-4 hours each)
-
- #### From Claude Agent SDK: `@tool` Decorator Pattern
- Replace our Protocol-based tools with cleaner decorators:
-
- ```python
- # CURRENT (Protocol-based)
- class PubMedSearchTool:
-     async def search(self, query: str, max_results: int = 10) -> List[Evidence]:
-         ...
-
- # UPGRADE (Decorator-based, stolen from Claude SDK)
- import json
-
- from claude_agent_sdk import tool
-
- @tool("search_pubmed", "Search PubMed for biomedical papers", {
-     "query": str,
-     "max_results": int
- })
- async def search_pubmed(args):
-     results = await _do_pubmed_search(args["query"], args["max_results"])
-     return {"content": [{"type": "text", "text": json.dumps(results)}]}
- ```
-
- **Why it's Gucci:** Cleaner syntax, automatic schema generation, less boilerplate.
-
- #### From AutoGen: Reflect on Tool Use
- Add a reflection step where the model reviews its own tool results:
-
- ```python
- # CURRENT: Judge evaluates evidence
- assessment = await judge.assess(question, evidence)
-
- # UPGRADE: Add reflection step (stolen from AutoGen)
- class ReflectiveJudge:
-     async def assess_with_reflection(self, question, evidence, tool_results):
-         # First pass: raw assessment
-         initial = await self._assess(question, evidence)
-
-         # Reflection: "Did I use the tools correctly?"
-         reflection = await self._reflect_on_tool_use(tool_results)
-
-         # Final: combine assessment + reflection
-         return self._combine(initial, reflection)
- ```
-
- **Why it's Gucci:** Catches tool misuse, improves accuracy, more robust judge.
-
- ### Tier 2: Medium Lifts (4-8 hours each)
-
- #### From AutoGen: Concurrent Tool Execution
- Run multiple tools in parallel with proper error handling:
-
- ```python
- import asyncio
-
- # CURRENT: bare asyncio.gather - parallel, but no timeouts or cancellation
- results = await asyncio.gather(*[tool.search(query) for tool in tools])
-
- # UPGRADE: AutoGen-style with cancellation + timeout
- from autogen_core import CancellationToken
-
- async def execute_tools_concurrent(tools, query, timeout=30):
-     token = CancellationToken()
-
-     async def run_with_timeout(tool):
-         try:
-             return await asyncio.wait_for(
-                 tool.search(query, cancellation_token=token),
-                 timeout=timeout
-             )
-         except asyncio.TimeoutError:
-             token.cancel()  # Cancel other tools
-             return ToolError(f"{tool.name} timed out")
-
-     return await asyncio.gather(*[run_with_timeout(t) for t in tools])
- ```
-
- **Why it's Gucci:** Proper timeout handling, cancellation propagation, production-ready.
-
- #### From Claude SDK: Hooks System
- Add pre/post hooks for logging, validation, cost tracking:
-
- ```python
- # UPGRADE: Hook system (stolen from Claude SDK)
- import logging
-
- logger = logging.getLogger(__name__)
-
- class HookManager:
-     async def pre_tool_use(self, tool_name, args):
-         """Called before every tool execution"""
-         logger.info(f"Calling {tool_name} with {args}")
-         self.cost_tracker.start_timer()
-
-     async def post_tool_use(self, tool_name, result, duration):
-         """Called after every tool execution"""
-         self.cost_tracker.record(tool_name, duration)
-         if result.is_error:
-             self.error_tracker.record(tool_name, result.error)
- ```
-
- **Why it's Gucci:** Observability, debugging, cost tracking, production-ready.
-
- ### Tier 3: Big Lifts (Post-Hackathon)
-
- #### Full AutoGen Integration
- If we want multi-agent capabilities later:
-
- ```python
- # POST-HACKATHON: Multi-agent drug repurposing
- from autogen_agentchat import AssistantAgent, GroupChat
-
- literature_agent = AssistantAgent(
-     name="LiteratureReviewer",
-     tools=[pubmed_search, web_search],
-     system_message="You search and summarize medical literature."
- )
-
- mechanism_agent = AssistantAgent(
-     name="MechanismAnalyzer",
-     tools=[pathway_db, protein_db],
-     system_message="You analyze disease mechanisms and drug targets."
- )
-
- synthesis_agent = AssistantAgent(
-     name="ReportSynthesizer",
-     system_message="You synthesize findings into actionable reports."
- )
-
- # Orchestrate multi-agent workflow
- group_chat = GroupChat(
-     agents=[literature_agent, mechanism_agent, synthesis_agent],
-     max_round=10
- )
- ```
-
- **Why it's Gucci:** True multi-agent collaboration, specialized roles, scalable.
-
- ---
-
- ## Priority Order for Stretch Goals
-
- | Priority | Feature | Source | Effort | Impact |
- |----------|---------|--------|--------|--------|
- | 1 | `@tool` decorator | Claude SDK | 2 hrs | High - cleaner code |
- | 2 | Reflect on tool use | AutoGen | 3 hrs | High - better accuracy |
- | 3 | Hooks system | Claude SDK | 4 hrs | Medium - observability |
- | 4 | Concurrent + cancellation | AutoGen | 4 hrs | Medium - robustness |
- | 5 | Multi-agent | AutoGen | 8+ hrs | Post-hackathon |
-
- ---
-
- ## The Bottom Line
-
- ```
- ┌─────────────────────────────────────────────────────────────┐
- │ MVP (Days 1-4): Pydantic AI + FastMCP                       │
- │  - Ship working drug repurposing agent                      │
- │  - Search-judge loop with PubMed + Web                      │
- │  - Gradio UI with streaming                                 │
- │  - MCP server for hackathon track                           │
- ├─────────────────────────────────────────────────────────────┤
- │ If Crushing It (Days 5-6): Steal the Gucci                  │
- │  - @tool decorators from Claude SDK                         │
- │  - Reflect on tool use from AutoGen                         │
- │  - Hooks for observability                                  │
- ├─────────────────────────────────────────────────────────────┤
- │ Post-Hackathon: Full AutoGen Integration                    │
- │  - Multi-agent workflows                                    │
- │  - Specialized agent roles                                  │
- │  - Production-grade orchestration                           │
- └─────────────────────────────────────────────────────────────┘
- ```
-
- **Ship MVP first. Steal bangers if time. Scale later.**
-
- ---
-
- ## 16. Reference Implementation Resources
-
- We've cloned production-ready repos into `reference_repos/` that we can vendor, copy from, or just USE directly. This section documents what's available and how to leverage it.
-
- ### Cloned Repositories
-
- | Repository | Location | What It Provides |
- |------------|----------|------------------|
- | **pydanticai-research-agent** | `reference_repos/pydanticai-research-agent/` | Complete PydanticAI agent with Brave Search |
- | **pubmed-mcp-server** | `reference_repos/pubmed-mcp-server/` | Production-grade PubMed MCP server (TypeScript) |
- | **autogen-microsoft** | `reference_repos/autogen-microsoft/` | Microsoft's multi-agent framework |
- | **claude-agent-sdk** | `reference_repos/claude-agent-sdk/` | Anthropic's agent SDK with @tool decorator |
-
- ### 🔥 CHEAT CODE: Production PubMed MCP Already Exists
-
- The `pubmed-mcp-server` is **production-grade** and has EVERYTHING we need:
-
- ```bash
- # Already available tools in pubmed-mcp-server:
- pubmed_search_articles      # Search PubMed with filters, date ranges
- pubmed_fetch_contents       # Get full article details by PMID
- pubmed_article_connections  # Find citations, related articles
- pubmed_research_agent       # Generate research plan outlines
- pubmed_generate_chart       # Create PNG charts from data
- ```
-
- **Option 1: Use it directly via npx**
- ```json
- {
-   "mcpServers": {
-     "pubmed": {
-       "command": "npx",
-       "args": ["@cyanheads/pubmed-mcp-server"],
-       "env": { "NCBI_API_KEY": "your_key" }
-     }
-   }
- }
- ```
-
- **Option 2: Vendor the logic into Python**
- The TypeScript code in `reference_repos/pubmed-mcp-server/src/` shows exactly how to (see the sketch after this list):
- - Construct PubMed E-utilities queries
- - Handle rate limiting (3/sec without key, 10/sec with key)
- - Parse XML responses
- - Extract article metadata
-
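- A minimal Python sketch of the same E-utilities search step (endpoint and parameters per NCBI's public docs; error handling and the follow-up efetch call omitted):
-
- ```python
- import httpx
-
- async def esearch_pubmed(query: str, retmax: int = 10, api_key: str | None = None) -> list[str]:
-     """Return PMIDs for a query via NCBI esearch; pass them to efetch for details."""
-     params = {"db": "pubmed", "term": query, "retmax": retmax, "retmode": "json"}
-     if api_key:
-         params["api_key"] = api_key  # lifts the rate limit from 3/sec to 10/sec
-     async with httpx.AsyncClient() as client:
-         resp = await client.get(
-             "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi", params=params
-         )
-         resp.raise_for_status()
-         return resp.json()["esearchresult"]["idlist"]
- ```
-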
- ### PydanticAI Research Agent Patterns
-
- The `pydanticai-research-agent` repo provides copy-paste patterns:
-
- **Agent Definition** (`agents/research_agent.py`):
- ```python
- from pydantic_ai import Agent, RunContext
- from dataclasses import dataclass
- from typing import Any, Dict, List, Optional
-
- @dataclass
- class ResearchAgentDependencies:
-     brave_api_key: str
-     session_id: Optional[str] = None
-
- research_agent = Agent(
-     get_llm_model(),
-     deps_type=ResearchAgentDependencies,
-     system_prompt=SYSTEM_PROMPT
- )
-
- @research_agent.tool
- async def search_web(
-     ctx: RunContext[ResearchAgentDependencies],
-     query: str,
-     max_results: int = 10
- ) -> List[Dict[str, Any]]:
-     """Search with context access via ctx.deps"""
-     results = await search_web_tool(ctx.deps.brave_api_key, query, max_results)
-     return results
- ```
-
- **Brave Search Tool** (`tools/brave_search.py`):
- ```python
- import httpx
-
- async def search_web_tool(api_key: str, query: str, count: int = 10) -> List[Dict]:
-     headers = {"X-Subscription-Token": api_key, "Accept": "application/json"}
-     async with httpx.AsyncClient() as client:
-         response = await client.get(
-             "https://api.search.brave.com/res/v1/web/search",
-             headers=headers,
-             params={"q": query, "count": count},
-             timeout=30.0
-         )
-         response.raise_for_status()  # Handle 429 rate limit, 401 auth errors
-         data = response.json()
-         return data.get("web", {}).get("results", [])
- ```
-
- **Pydantic Models** (`models/research_models.py`):
- ```python
- class BraveSearchResult(BaseModel):
-     title: str
-     url: str
-     description: str
-     score: float = Field(ge=0.0, le=1.0)
- ```
-
-
1393
- From [deepwiki.com/microsoft/agent-framework](https://deepwiki.com/microsoft/agent-framework/3.4-workflows-and-orchestration):
1394
-
1395
- #### Sequential Orchestration
1396
- ```
1397
- Agent A → Agent B → Agent C (each receives prior outputs)
1398
- ```
1399
- **Use when:** Tasks have dependencies, results inform next steps.
1400
-
1401
- #### Concurrent (Fan-out/Fan-in)
1402
- ```
1403
- ┌→ Agent A ─┐
1404
- Dispatcher ├→ Agent B ─┼→ Aggregator
1405
- └→ Agent C ─┘
1406
- ```
1407
- **Use when:** Independent tasks can run in parallel, results need consolidation.
1408
- **Our use:** Parallel PubMed + Web search.
1409
-
1410
- #### Handoff Orchestration
1411
- ```
1412
- Coordinator → routes to → Specialist A, B, or C based on request
1413
- ```
1414
- **Use when:** Router decides which search strategy based on query type.
1415
- **Our use:** Route "mechanism" vs "clinical trial" vs "drug info" queries.
1416
-
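- A minimal handoff sketch (plain keyword routing standing in for an LLM router; the strategy names are assumptions):
-
- ```python
- def route_query(question: str) -> str:
-     """Pick a search strategy for a query (sketch)."""
-     q = question.lower()
-     if "trial" in q or "phase" in q:
-         return "clinical_trial_search"
-     if "mechanism" in q or "pathway" in q:
-         return "mechanism_search"
-     return "drug_info_search"  # default specialist
- ```
-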
- #### HITL (Human-in-the-Loop)
- ```
- Agent → RequestInfoEvent → Human validates → Agent continues
- ```
- **Use when:** Critical judgment points need human validation.
- **Our use:** Optional "approve drug candidates before synthesis" step.
-
- ### Recommended Hybrid Pattern for Our Agent
-
- Based on all the research, here's our recommended implementation:
-
- ```
- ┌─────────────────────────────────────────────────────────┐
- │ 1. ROUTER (Handoff Pattern)                             │
- │    - Analyze query type                                 │
- │    - Choose search strategy                             │
- ├─────────────────────────────────────────────────────────┤
- │ 2. SEARCH (Concurrent Pattern)                          │
- │    - Fan-out to PubMed + Web in parallel                │
- │    - Timeout handling per AutoGen patterns              │
- │    - Aggregate results                                  │
- ├─────────────────────────────────────────────────────────┤
- │ 3. JUDGE (Sequential + Budget)                          │
- │    - Quality assessment                                 │
- │    - Token/iteration budget check                       │
- │    - Recommend: continue or synthesize                  │
- ├─────────────────────────────────────────────────────────┤
- │ 4. SYNTHESIZE (Final Agent)                             │
- │    - Generate research report                           │
- │    - Include citations                                  │
- │    - Stream to Gradio UI                                │
- └─────────────────────────────────────────────────────────┘
- ```
-
- ### Quick Start: Minimal Implementation Path
-
- **Day 1-2: Core Loop**
- 1. Copy `search_web_tool` from `pydanticai-research-agent/tools/brave_search.py`
- 2. Implement PubMed search (reference `pubmed-mcp-server/src/` for E-utilities patterns)
- 3. Wire up the basic search-judge loop
-
- **Day 3: Judge + State**
- 1. Implement the quality judge with JSON structured output
- 2. Add the budget judge
- 3. Add Pydantic state management
-
- **Day 4: UI + MCP**
- 1. Gradio streaming UI
- 2. Wrap the PubMed tool as a FastMCP server
-
- **Day 5-6: Polish + Deploy**
- 1. HuggingFace Spaces deployment
- 2. Demo video
- 3. Stretch goals if time
-
- ---
-
- ## 17. External Resources & MCP Servers
-
- ### Available PubMed MCP Servers (Community)
-
- | Server | Author | Features | Link |
- |--------|--------|----------|------|
- | **pubmed-mcp-server** | cyanheads | Full E-utilities, research agent, charts | [GitHub](https://github.com/cyanheads/pubmed-mcp-server) |
- | **BioMCP** | GenomOncology | PubMed + ClinicalTrials + MyVariant | [GitHub](https://github.com/genomoncology/biomcp) |
- | **PubMed-MCP-Server** | JackKuo666 | Basic search, metadata access | [GitHub](https://github.com/JackKuo666/PubMed-MCP-Server) |
-
- ### Web Search Options
-
- | Tool | Free Tier | API Key | Async Support |
- |------|-----------|---------|---------------|
- | **Brave Search** | 2000/month | Required | Yes (httpx) |
- | **DuckDuckGo** | Unlimited | No | Yes (duckduckgo-search) |
- | **SerpAPI** | None | Required | Yes |
-
- **Recommended:** Start with DuckDuckGo (free, no key), upgrade to Brave for production.
-
- ```python
- # DuckDuckGo search (no API key needed!) - DDGS is blocking, so run it off the event loop
- import asyncio
- from typing import Dict, List
-
- from duckduckgo_search import DDGS
-
- async def search_ddg(query: str, max_results: int = 10) -> List[Dict]:
-     def _search() -> List[Dict]:
-         with DDGS() as ddgs:
-             return list(ddgs.text(query, max_results=max_results))
-     results = await asyncio.to_thread(_search)
-     return [{"title": r["title"], "url": r["href"], "description": r["body"]} for r in results]
- ```
-
- ---
-
- **Document Status**: Official Architecture Spec
- **Review Score**: 100/100 (Ironclad Gucci Banger Edition)
- **Sections**: 17 design patterns + data models appendix + reference repos + stretch goals
- **Last Updated**: November 2025
docs/architecture/graph-orchestration.md ADDED
@@ -0,0 +1,152 @@
+ # Graph Orchestration Architecture
+
+ ## Overview
+
+ Phase 4 implements a graph-based orchestration system for research workflows using Pydantic AI agents as nodes. This enables better parallel execution, conditional routing, and state management compared to simple agent chains.
+
+ ## Graph Structure
+
+ ### Nodes
+
+ Graph nodes represent different stages in the research workflow:
+
+ 1. **Agent Nodes**: Execute Pydantic AI agents
+    - Input: Prompt/query
+    - Output: Structured or unstructured response
+    - Examples: `KnowledgeGapAgent`, `ToolSelectorAgent`, `ThinkingAgent`
+
+ 2. **State Nodes**: Update or read workflow state
+    - Input: Current state
+    - Output: Updated state
+    - Examples: Update evidence, update conversation history
+
+ 3. **Decision Nodes**: Make routing decisions based on conditions
+    - Input: Current state/results
+    - Output: Next node ID
+    - Examples: Continue research vs. complete research
+
+ 4. **Parallel Nodes**: Execute multiple nodes concurrently
+    - Input: List of node IDs
+    - Output: Aggregated results
+    - Examples: Parallel iterative research loops
+
+ ### Edges
+
+ Edges define transitions between nodes:
+
+ 1. **Sequential Edges**: Always traversed (no condition)
+    - From: Source node
+    - To: Target node
+    - Condition: None (always True)
+
+ 2. **Conditional Edges**: Traversed based on a condition
+    - From: Source node
+    - To: Target node
+    - Condition: Callable that returns bool
+    - Example: If research complete → go to writer, else → continue loop
+
+ 3. **Parallel Edges**: Used for parallel execution branches
+    - From: Parallel node
+    - To: Multiple target nodes
+    - Execution: All targets run concurrently
+
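+ A minimal sketch of these structures (illustrative only; not the repo's actual class definitions):
+
+ ```python
+ from dataclasses import dataclass
+ from typing import Any, Callable
+
+ @dataclass
+ class Node:
+     id: str
+     kind: str  # "agent" | "state" | "decision" | "parallel"
+
+ @dataclass
+ class Edge:
+     source: str
+     target: str
+     condition: Callable[[dict[str, Any]], bool] = lambda state: True  # sequential by default
+
+ # Conditional edge: hand off to the writer once research is complete
+ to_writer = Edge("knowledge_gap", "writer", condition=lambda s: s["research_complete"])
+ ```
+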
+ ## Graph Patterns
+
+ ### Iterative Research Graph
+
+ ```
+ [Input] → [Thinking] → [Knowledge Gap] → [Decision: Complete?]
+                                            ↓ No           ↓ Yes
+                                      [Tool Selector]    [Writer]
+                                            ↓
+                                     [Execute Tools] → [Loop Back]
+ ```
+
+ ### Deep Research Graph
+
+ ```
+ [Input] → [Planner] → [Parallel Iterative Loops] → [Synthesizer]
+                          ↓         ↓         ↓
+                      [Loop1]   [Loop2]   [Loop3]
+ ```
+
+ ## State Management
+
+ State is managed via `WorkflowState` using `ContextVar` for thread-safe isolation:
+
+ - **Evidence**: Collected evidence from searches
+ - **Conversation**: Iteration history (gaps, tool calls, findings, thoughts)
+ - **Embedding Service**: For semantic search
+
+ State transitions occur at state nodes, which update the global workflow state.
+
+ ## Execution Flow
+
+ 1. **Graph Construction**: Build the graph from nodes and edges
+ 2. **Graph Validation**: Ensure the graph is valid (no cycles, all nodes reachable)
+ 3. **Graph Execution**: Traverse the graph from the entry node
+ 4. **Node Execution**: Execute each node based on its type
+ 5. **Edge Evaluation**: Determine next node(s) based on edges
+ 6. **Parallel Execution**: Use `asyncio.gather()` for parallel nodes
+ 7. **State Updates**: Update state at state nodes
+ 8. **Event Streaming**: Yield events during execution for the UI
+
+ ## Conditional Routing
+
+ Decision nodes evaluate conditions and return next node IDs:
+
+ - **Knowledge Gap Decision**: If `research_complete` → writer, else → tool selector
+ - **Budget Decision**: If budget exceeded → exit, else → continue
+ - **Iteration Decision**: If max iterations → exit, else → continue
+
+ ## Parallel Execution
+
+ Parallel nodes execute multiple nodes concurrently:
+
+ - Each parallel branch runs independently
+ - Results are aggregated after all branches complete
+ - State is synchronized after parallel execution
+ - Errors in one branch don't stop other branches
+
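+ A sketch of the aggregation step (`return_exceptions=True` is what keeps one failed branch from cancelling the others):
+
+ ```python
+ import asyncio
+ from typing import Any, Coroutine
+
+ async def run_parallel(branches: list[Coroutine[Any, Any, Any]]) -> tuple[list[Any], list[BaseException]]:
+     """Run parallel-node branches concurrently and split results from errors."""
+     results = await asyncio.gather(*branches, return_exceptions=True)
+     ok = [r for r in results if not isinstance(r, BaseException)]
+     errors = [r for r in results if isinstance(r, BaseException)]
+     return ok, errors  # errors become error events; ok results merge into state
+ ```
+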
+ ## Budget Enforcement
+
+ Budget constraints are enforced at decision nodes:
+
+ - **Token Budget**: Track LLM token usage
+ - **Time Budget**: Track elapsed time
+ - **Iteration Budget**: Track iteration count
+
+ If any budget is exceeded, execution routes to the exit node.
+
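+ A sketch of the check a budget decision node might run (the state field names are assumptions):
+
+ ```python
+ import time
+
+ def within_budget(state: dict, max_tokens: int, max_seconds: float, max_iterations: int) -> bool:
+     """True while every budget still has headroom."""
+     return (
+         state["tokens_used"] < max_tokens
+         and (time.monotonic() - state["started_at"]) < max_seconds
+         and state["iteration"] < max_iterations
+     )
+
+ # At the decision node:
+ #   next_node = "continue" if within_budget(state, 50_000, 300.0, 5) else "exit"
+ ```
+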
+ ## Error Handling
+
+ Errors are handled at multiple levels:
+
+ 1. **Node Level**: Catch errors in individual node execution
+ 2. **Graph Level**: Handle errors during graph traversal
+ 3. **State Level**: Roll back state changes on error
+
+ Errors are logged and yield error events for the UI.
+
+ ## Backward Compatibility
+
+ Graph execution is optional via a feature flag:
+
+ - `USE_GRAPH_EXECUTION=true`: Use graph-based execution
+ - `USE_GRAPH_EXECUTION=false`: Use agent chain execution (existing)
+
+ This allows gradual migration and fallback if needed.
+
docs/architecture/graph_orchestration.md CHANGED
@@ -137,6 +137,14 @@ Graph execution is optional via feature flag:
 
 This allows gradual migration and fallback if needed.
 
+ ## See Also
+
+ - [Orchestrators](orchestrators.md) - Overview of all orchestrator patterns
+ - [Workflows](workflows.md) - Workflow diagrams and patterns
+ - [Workflow Diagrams](workflow-diagrams.md) - Detailed workflow diagrams
+ - [API Reference - Orchestrators](../api/orchestrators.md) - API documentation
+
docs/architecture/middleware.md CHANGED
@@ -137,6 +137,3 @@ All middleware components use `ContextVar` for thread-safe isolation:
 
 
 
-
-
-
docs/architecture/orchestrators.md ADDED
@@ -0,0 +1,198 @@
+ # Orchestrators Architecture
+
+ DeepCritical supports multiple orchestration patterns for research workflows.
+
+ ## Research Flows
+
+ ### IterativeResearchFlow
+
+ **File**: `src/orchestrator/research_flow.py`
+
+ **Pattern**: Generate observations → Evaluate gaps → Select tools → Execute → Judge → Continue/Complete
+
+ **Agents Used**:
+ - `KnowledgeGapAgent`: Evaluates research completeness
+ - `ToolSelectorAgent`: Selects tools for addressing gaps
+ - `ThinkingAgent`: Generates observations
+ - `WriterAgent`: Creates final report
+ - `JudgeHandler`: Assesses evidence sufficiency
+
+ **Features**:
+ - Tracks iterations, time, budget
+ - Supports graph execution (`use_graph=True`) and agent chains (`use_graph=False`)
+ - Iterates until research is complete or constraints are met
+
+ **Usage**:
+ ```python
+ from src.orchestrator.research_flow import IterativeResearchFlow
+
+ flow = IterativeResearchFlow(
+     search_handler=search_handler,
+     judge_handler=judge_handler,
+     use_graph=False
+ )
+
+ async for event in flow.run(query):
+     # Handle events
+     pass
+ ```
+
+ ### DeepResearchFlow
+
+ **File**: `src/orchestrator/research_flow.py`
+
+ **Pattern**: Planner → Parallel iterative loops per section → Synthesizer
+
+ **Agents Used**:
+ - `PlannerAgent`: Breaks query into report sections
+ - `IterativeResearchFlow`: Per-section research (parallel)
+ - `LongWriterAgent` or `ProofreaderAgent`: Final synthesis
+
+ **Features**:
+ - Uses `WorkflowManager` for parallel execution
+ - Budget tracking per section and globally
+ - State synchronization across parallel loops
+ - Supports graph execution and agent chains
+
+ **Usage**:
+ ```python
+ from src.orchestrator.research_flow import DeepResearchFlow
+
+ flow = DeepResearchFlow(
+     search_handler=search_handler,
+     judge_handler=judge_handler,
+     use_graph=True
+ )
+
+ async for event in flow.run(query):
+     # Handle events
+     pass
+ ```
+
+ ## Graph Orchestrator
+
+ **File**: `src/orchestrator/graph_orchestrator.py`
+
+ **Purpose**: Graph-based execution using Pydantic AI agents as nodes
+
+ **Features**:
+ - Uses Pydantic AI Graphs (when available) or agent chains (fallback)
+ - Routes based on research mode (iterative/deep/auto)
+ - Streams `AgentEvent` objects for the UI
+
+ **Node Types**:
+ - **Agent Nodes**: Execute Pydantic AI agents
+ - **State Nodes**: Update or read workflow state
+ - **Decision Nodes**: Make routing decisions
+ - **Parallel Nodes**: Execute multiple nodes concurrently
+
+ **Edge Types**:
+ - **Sequential Edges**: Always traversed
+ - **Conditional Edges**: Traversed based on condition
+ - **Parallel Edges**: Used for parallel execution branches
+
+ ## Orchestrator Factory
+
+ **File**: `src/orchestrator_factory.py`
+
+ **Purpose**: Factory for creating orchestrators
+
+ **Modes**:
+ - **Simple**: Legacy orchestrator (backward compatible)
+ - **Advanced**: Magentic orchestrator (requires OpenAI API key)
+ - **Auto-detect**: Chooses based on API key availability
+
+ **Usage**:
+ ```python
+ from src.orchestrator_factory import create_orchestrator
+
+ orchestrator = create_orchestrator(
+     search_handler=search_handler,
+     judge_handler=judge_handler,
+     config={},
+     mode="advanced"  # or "simple" or None for auto-detect
+ )
+ ```
+
+ ## Magentic Orchestrator
+
+ **File**: `src/orchestrator_magentic.py`
+
+ **Purpose**: Multi-agent coordination using Microsoft Agent Framework
+
+ **Features**:
+ - Uses `agent-framework-core`
+ - ChatAgent pattern with internal LLMs per agent
+ - `MagenticBuilder` with participants: searcher, hypothesizer, judge, reporter
+ - Manager orchestrates agents via `OpenAIChatClient`
+ - Requires OpenAI API key (function calling support)
+ - Event-driven: converts Magentic events to `AgentEvent` for UI streaming
+
+ **Requirements**:
+ - `agent-framework-core` package
+ - OpenAI API key
+
+ ## Hierarchical Orchestrator
+
+ **File**: `src/orchestrator_hierarchical.py`
+
+ **Purpose**: Hierarchical orchestrator using middleware and sub-teams
+
+ **Features**:
+ - Uses `SubIterationMiddleware` with `ResearchTeam` and `LLMSubIterationJudge`
+ - Adapts Magentic ChatAgent to the `SubIterationTeam` protocol
+ - Event-driven via `asyncio.Queue` for coordination
+ - Supports sub-iteration patterns for complex research tasks
+
+ ## Legacy Simple Mode
+
+ **File**: `src/legacy_orchestrator.py`
+
+ **Purpose**: Linear search-judge-synthesize loop
+
+ **Features**:
+ - Uses `SearchHandlerProtocol` and `JudgeHandlerProtocol`
+ - Generator-based design yielding `AgentEvent` objects
+ - Backward compatibility for simple use cases
+
+ ## State Initialization
+
+ All orchestrators must initialize workflow state:
+
+ ```python
+ from src.middleware.state_machine import init_workflow_state
+ from src.services.embeddings import get_embedding_service
+
+ embedding_service = get_embedding_service()
+ init_workflow_state(embedding_service)
+ ```
+
+ ## Event Streaming
+
+ All orchestrators yield `AgentEvent` objects:
+
+ **Event Types**:
+ - `started`: Research started
+ - `search_complete`: Search completed
+ - `judge_complete`: Evidence evaluation completed
+ - `hypothesizing`: Generating hypotheses
+ - `synthesizing`: Synthesizing results
+ - `complete`: Research completed
+ - `error`: Error occurred
+
+ **Event Structure**:
+ ```python
+ class AgentEvent:
+     type: str
+     iteration: int | None
+     data: dict[str, Any]
+ ```
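+
+ Consuming the stream looks the same for every orchestrator (a sketch, assuming the fields above; `log` and `update_progress_ui` are placeholders):
+
+ ```python
+ async for event in orchestrator.run(query):
+     if event.type == "error":
+         log.error("iteration %s failed: %s", event.iteration, event.data)
+     elif event.type == "complete":
+         report = event.data.get("report")  # final synthesized output
+     else:
+         update_progress_ui(event)  # stream intermediate progress
+ ```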
+
+ ## See Also
+
+ - [Graph Orchestration](graph-orchestration.md) - Graph-based execution details
+ - [Graph Orchestration (Detailed)](graph_orchestration.md) - Detailed graph architecture
+ - [Workflows](workflows.md) - Workflow diagrams and patterns
+ - [Workflow Diagrams](workflow-diagrams.md) - Detailed workflow diagrams
+ - [API Reference - Orchestrators](../api/orchestrators.md) - API documentation
+
docs/architecture/overview.md DELETED
@@ -1,474 +0,0 @@
- # DeepCritical: Medical Drug Repurposing Research Agent
- ## Project Overview
-
- ---
-
- ## Executive Summary
-
- **DeepCritical** is a deep research agent designed to accelerate medical drug repurposing research by autonomously searching, analyzing, and synthesizing evidence from multiple biomedical databases.
-
- ### The Problem We Solve
-
- Drug repurposing - finding new therapeutic uses for existing FDA-approved drugs - can take years of manual literature review. Researchers must:
- - Search thousands of papers across multiple databases
- - Identify molecular mechanisms
- - Find relevant clinical trials
- - Assess safety profiles
- - Synthesize evidence into actionable insights
-
- **DeepCritical automates this process, cutting hours of work down to minutes.**
-
- ### What Is Drug Repurposing?
-
- **Simple Explanation:**
- Using existing approved drugs to treat NEW diseases they weren't originally designed for.
-
- **Real Examples:**
- - **Viagra** (sildenafil): Originally for heart disease → Now treats erectile dysfunction
- - **Thalidomide**: Once banned → Now treats multiple myeloma
- - **Aspirin**: Pain reliever → Heart attack prevention
- - **Metformin**: Diabetes drug → Being tested for aging/longevity
-
- **Why It Matters:**
- - Faster than developing new drugs (years vs decades)
- - Cheaper (known safety profiles)
- - Lower risk (already FDA approved)
- - Immediate patient benefit potential
-
- ---
-
- ## Core Use Case
-
- ### Primary Query Type
- > "What existing drugs might help treat [disease/condition]?"
-
- ### Example Queries
-
- 1. **Long COVID Fatigue**
-    - Query: "What existing drugs might help treat long COVID fatigue?"
-    - Agent searches: PubMed, clinical trials, drug databases
-    - Output: List of candidate drugs with mechanisms + evidence + citations
-
- 2. **Alzheimer's Disease**
-    - Query: "Find existing drugs that target beta-amyloid pathways"
-    - Agent identifies: Disease mechanisms → Drug candidates → Clinical evidence
-    - Output: Comprehensive research report with drug candidates
-
- 3. **Rare Disease Treatment**
-    - Query: "What drugs might help with fibrodysplasia ossificans progressiva?"
-    - Agent finds: Similar conditions → Shared pathways → Potential treatments
-    - Output: Evidence-based treatment suggestions
-
- ---
-
- ## System Architecture
-
- ### High-Level Design (Phases 1-8)
-
- ```text
- User Query
-     ↓
- Gradio UI (Phase 4)
-     ↓
- Magentic Manager (Phase 5) ← LLM-powered coordinator
-     ├── SearchAgent (Phase 2+5) ←→ PubMed + Web + VectorDB (Phase 6)
-     ├── HypothesisAgent (Phase 7) ←→ Mechanistic Reasoning
-     ├── JudgeAgent (Phase 3+5) ←→ Evidence Assessment
-     └── ReportAgent (Phase 8) ←→ Final Synthesis
-     ↓
- Structured Research Report
- ```
-
- ### Key Components
-
- 1. **Magentic Manager (Orchestrator)**
-    - LLM-powered multi-agent coordinator
-    - Dynamic planning and agent selection
-    - Built-in stall detection and replanning
-    - Microsoft Agent Framework integration
-
- 2. **SearchAgent (Phase 2+5+6)**
-    - PubMed E-utilities search
-    - DuckDuckGo web search
-    - Semantic search via ChromaDB (Phase 6)
-    - Evidence deduplication
-
- 3. **HypothesisAgent (Phase 7)**
-    - Generates Drug → Target → Pathway → Effect hypotheses
-    - Guides targeted searches
-    - Scientific reasoning about mechanisms
-
- 4. **JudgeAgent (Phase 3+5)**
-    - LLM-based evidence assessment
-    - Mechanism score + Clinical score
-    - Recommends continue/synthesize
-    - Generates refined search queries
-
- 5. **ReportAgent (Phase 8)**
-    - Structured scientific reports
-    - Executive summary, methodology
-    - Hypotheses tested with evidence counts
-    - Proper citations and limitations
-
- 6. **Gradio UI (Phase 4)**
-    - Chat interface for questions
-    - Real-time progress via events
-    - Mode toggle (Simple/Magentic)
-    - Formatted markdown output
-
- ---
-
- ## Design Patterns
-
- ### 1. Search-and-Judge Loop (Primary Pattern)
-
- ```python
- def research(question: str) -> Report:
-     context = []
-     query = question
-     for iteration in range(max_iterations):
-         # SEARCH: Query relevant tools
-         results = search_tools(query, context)
-         context.extend(results)
-
-         # JUDGE: Evaluate quality
-         if judge.is_sufficient(question, context):
-             break
-
-         # REFINE: Adjust search strategy for the next pass
-         query = refine_query(question, context)
-
-     # SYNTHESIZE: Generate report
-     return synthesize_report(question, context)
- ```
-
- **Why This Pattern:**
- - Simple to implement and debug
- - Clear loop termination conditions
- - Iterative improvement of search quality
- - Balances depth vs speed
-
- ### 2. Multi-Tool Orchestration
-
- ```
- Question → Agent decides which tools to use
-                    ↓
-     ┌───┴────┬─────────┬──────────┐
-     ↓        ↓         ↓          ↓
-  PubMed  Web Search  Trials DB  Drug DB
-     ↓        ↓         ↓          ↓
-     └───┬────┴─────────┴──────────┘
-         ↓
-  Aggregate Results → Judge
- ```
-
- **Why This Pattern:**
- - Different sources provide different evidence types
- - Parallel tool execution (when possible)
- - Comprehensive coverage
-
- ### 3. LLM-as-Judge with Token Budget
-
- **Dual Stopping Conditions:**
- - **Smart Stop**: LLM judge says "we have sufficient evidence"
- - **Hard Stop**: Token budget exhausted OR max iterations reached
-
- **Why Both:**
- - Judge enables early exit when the answer is good
- - Budget prevents runaway costs
- - Iterations prevent infinite loops
-
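- In code, the break pattern is one check per condition (a sketch; the 50K cap matches the risk table in Appendix B):
-
- ```python
- def should_stop(sufficient: bool, tokens_used: int, iteration: int,
-                 token_budget: int = 50_000, max_iterations: int = 10) -> bool:
-     # Smart stop (judge approval) OR hard stops (budget, iterations), whichever fires first
-     return sufficient or tokens_used >= token_budget or iteration >= max_iterations
- ```
-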
- ### 4. Stateful Checkpointing
-
- ```
- .deepresearch/
- ├── state/
- │   └── query_123.json        # Current research state
- ├── checkpoints/
- │   └── query_123_iter3/      # Checkpoint at iteration 3
- └── workspace/
-     └── query_123/            # Downloaded papers, data
- ```
-
- **Why This Pattern:**
- - Resume interrupted research
- - Debugging and analysis
- - Cost savings (don't re-search)
-
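- Saving and resuming is a round-trip through Pydantic (a sketch, using the `ResearchState` model from the design doc's data models):
-
- ```python
- from pathlib import Path
-
- def save_checkpoint(state: ResearchState, root: Path = Path(".deepresearch/state")) -> None:
-     root.mkdir(parents=True, exist_ok=True)
-     (root / f"{state.query_id}.json").write_text(state.model_dump_json())
-
- def load_checkpoint(query_id: str, root: Path = Path(".deepresearch/state")) -> ResearchState:
-     return ResearchState.model_validate_json((root / f"{query_id}.json").read_text())
- ```
-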
- ---
-
- ## Component Breakdown
-
- ### Agent (Orchestrator)
- - **Responsibility**: Coordinate the research process
- - **Size**: ~100 lines
- - **Key Methods**:
-   - `research(question)` - Main entry point
-   - `plan_search_strategy()` - Decide what to search
-   - `execute_search()` - Run tool queries
-   - `evaluate_progress()` - Call judge
-   - `synthesize_findings()` - Generate report
-
- ### Tools
- - **Responsibility**: Interface with external data sources
- - **Size**: ~50 lines per tool
- - **Implementations**:
-   - `PubMedTool` - Search biomedical literature
-   - `WebSearchTool` - General medical information
-   - `ClinicalTrialsTool` - Trial data (optional)
-   - `DrugInfoTool` - FDA drug database (optional)
-
- ### Judge
- - **Responsibility**: Evaluate evidence quality
- - **Size**: ~50 lines
- - **Key Methods** (sketched below):
-   - `is_sufficient(question, evidence)` → bool
-   - `assess_quality(evidence)` → score
-   - `identify_gaps(question, evidence)` → missing_info
-
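- A minimal sketch of that Judge interface (method names from the list above; the signatures are assumptions):
-
- ```python
- from typing import List, Protocol
-
- class Judge(Protocol):
-     def is_sufficient(self, question: str, evidence: List[dict]) -> bool: ...
-     def assess_quality(self, evidence: List[dict]) -> float: ...
-     def identify_gaps(self, question: str, evidence: List[dict]) -> List[str]: ...
- ```
-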
- ### Gradio App
- - **Responsibility**: User interface
- - **Size**: ~50 lines
- - **Features**:
-   - Text input for questions
-   - Progress indicators
-   - Formatted output with citations
-   - Download research report
-
- ---
-
- ## Technical Stack
-
- ### Core Dependencies
- ```toml
- [dependencies]
- python = ">=3.10"
- pydantic = "^2.7"
- pydantic-ai = "^0.0.16"
- fastmcp = "^0.1.0"
- gradio = "^5.0"
- beautifulsoup4 = "^4.12"
- httpx = "^0.27"
- ```
-
- ### Optional Enhancements
- - `modal` - For GPU-accelerated local LLM
- - `fastmcp` - MCP server integration
- - `sentence-transformers` - Semantic search
- - `faiss-cpu` - Vector similarity
-
- ### Tool APIs & Rate Limits
-
- | API | Cost | Rate Limit | API Key? | Notes |
- |-----|------|------------|----------|-------|
- | **PubMed E-utilities** | Free | 3/sec (no key), 10/sec (with key) | Optional | Register at NCBI for higher limits |
- | **Brave Search API** | Free tier | 2000/month free | Required | Primary web search |
- | **DuckDuckGo** | Free | Unofficial, ~1/sec | No | Fallback web search |
- | **ClinicalTrials.gov** | Free | 100/min | No | Stretch goal |
- | **OpenFDA** | Free | 240/min (no key), 120K/day (with key) | Optional | Drug info |
-
- **Web Search Strategy (Priority Order):**
- 1. **Brave Search API** (free tier: 2000 queries/month) - Primary
- 2. **DuckDuckGo** (unofficial, no API key) - Fallback
- 3. **SerpAPI** ($50/month) - Only if free options fail
-
- **Why NOT SerpAPI first?**
- - Costs money (hackathon budget = $0)
- - Free alternatives work fine for the demo
- - Can upgrade later if needed
-
- ---
-
- ## Success Criteria
-
- ### Phase 1-5 (MVP) ✅ COMPLETE
- **Completed in ONE DAY:**
- - [x] User can ask a drug repurposing question
- - [x] Agent searches PubMed (async)
- - [x] Agent searches web (DuckDuckGo)
- - [x] LLM judge evaluates evidence quality
- - [x] System respects token budget and iterations
- - [x] Output includes drug candidates + citations
- - [x] Works end-to-end for demo query
- - [x] Gradio UI with streaming progress
- - [x] Magentic multi-agent orchestration
- - [x] 38 unit tests passing
- - [x] CI/CD pipeline green
-
- ### Hackathon Submission ✅ COMPLETE
- - [x] Gradio UI deployed on HuggingFace Spaces
- - [x] Example queries working and tested
- - [x] Architecture documentation
- - [x] README with setup instructions
-
- ### Phase 6-8 (Enhanced)
- **Specs ready for implementation:**
- - [ ] Embeddings & Semantic Search (Phase 6)
- - [ ] Hypothesis Agent (Phase 7)
- - [ ] Report Agent (Phase 8)
-
- ### What's EXPLICITLY Out of Scope
- **NOT building (to stay focused):**
- - ❌ User authentication
- - ❌ Database storage of queries
- - ❌ Multi-user support
- - ❌ Payment/billing
- - ❌ Production monitoring
- - ❌ Mobile UI
-
- ---
-
- ## Implementation Timeline
-
- ### Day 1 (Today): Architecture & Setup
- - [x] Define use case (drug repurposing) ✅
- - [x] Write architecture docs ✅
- - [ ] Create project structure
- - [ ] First PR: Structure + Docs
-
- ### Day 2: Core Agent Loop
- - [ ] Implement basic orchestrator
- - [ ] Add PubMed search tool
- - [ ] Simple judge (keyword-based)
- - [ ] Test with 1 query
-
- ### Day 3: Intelligence Layer
- - [ ] Upgrade to LLM judge
- - [ ] Add web search tool
- - [ ] Token budget tracking
- - [ ] Test with multiple queries
-
- ### Day 4: UI & Integration
- - [ ] Build Gradio interface
- - [ ] Wire up agent to UI
- - [ ] Add progress indicators
- - [ ] Format output nicely
-
- ### Day 5: Polish & Extend
- - [ ] Add more tools (clinical trials)
- - [ ] Improve judge prompts
- - [ ] Checkpoint system
- - [ ] Error handling
-
- ### Day 6: Deploy & Document
- - [ ] Deploy to HuggingFace Spaces
- - [ ] Record demo video
- - [ ] Write submission materials
- - [ ] Final testing
-
- ---
-
- ## Questions This Document Answers
-
- ### For The Maintainer
-
- **Q: "What should our design pattern be?"**
- A: Search-and-judge loop with multi-tool orchestration (detailed in the Design Patterns section)
-
- **Q: "Should we use LLM-as-judge or token budget?"**
- A: Both - judge for smart stopping, budget for cost control
-
- **Q: "What's the break pattern?"**
- A: Three conditions: judge approval, token limit, or max iterations (whichever comes first)
-
- **Q: "What components do we need?"**
- A: Agent orchestrator, tools (PubMed/web), judge, Gradio UI (see Component Breakdown)
-
- ### For The Team
-
- **Q: "What are we actually building?"**
- A: A medical drug repurposing research agent (see Core Use Case)
-
- **Q: "How complex should it be?"**
- A: Simple but complete - ~300 lines of core code (see Component sizes)
-
- **Q: "What's the timeline?"**
- A: 6 days, MVP by Day 3, polish Days 4-6 (see Implementation Timeline)
-
- **Q: "What datasets/APIs do we use?"**
- A: PubMed (free), web search, ClinicalTrials.gov (see Tool APIs)
-
- ---
-
- ## Next Steps
-
- 1. **Review this document** - Team feedback on architecture
- 2. **Finalize design** - Incorporate feedback
- 3. **Create project structure** - Scaffold repository
- 4. **Move to proper docs** - `docs/architecture/` folder
- 5. **Open first PR** - Structure + Documentation
- 6. **Start implementation** - Day 2 onward
-
- ---
-
- ## Notes & Decisions
-
- ### Why Drug Repurposing?
- - Clear, impressive use case
- - Real-world medical impact
- - Good data availability (PubMed, trials)
- - Easy to explain (Viagra example!)
- - Physician on team ✅
-
- ### Why Simple Architecture?
- - 6-day timeline
- - Need a working end-to-end system
- - Hackathon judges value "works" over "complex"
- - Can extend later if successful
-
- ### Why These Tools First?
- - PubMed: Best biomedical literature source
- - Web search: General medical knowledge
- - Clinical trials: Evidence of actual testing
- - Others: Nice-to-have, not critical for MVP
-
- ---
-
- ## Appendix A: Demo Queries (Pre-tested)
-
- These queries will be used for demo and testing. They're chosen because:
- 1. They have good PubMed coverage
- 2. They're medically interesting
- 3. They show the system's capabilities
-
- ### Primary Demo Query
- ```
- "What existing drugs might help treat long COVID fatigue?"
- ```
- **Expected candidates**: CoQ10, Low-dose Naltrexone, Modafinil
- **Expected sources**: 20+ PubMed papers, 2-3 clinical trials
-
- ### Secondary Demo Queries
- ```
- "Find existing drugs that might slow Alzheimer's progression"
- "What approved medications could help with fibromyalgia pain?"
- "Which diabetes drugs show promise for cancer treatment?"
- ```
-
- ### Why These Queries?
- - Represent real clinical needs
- - Have substantial literature
- - Show diverse drug classes
- - Physician on team can validate results
-
- ---
-
- ## Appendix B: Risk Assessment
-
- | Risk | Likelihood | Impact | Mitigation |
- |------|------------|--------|------------|
- | PubMed rate limiting | Medium | High | Implement caching, respect 3/sec |
- | Web search API fails | Low | Medium | DuckDuckGo fallback |
- | LLM costs exceed budget | Medium | Medium | Hard token cap at 50K |
- | Judge quality poor | Medium | High | Pre-test prompts, iterate |
- | HuggingFace deploy issues | Low | High | Test deployment Day 4 |
- | Demo crashes live | Medium | High | Pre-recorded backup video |
-
- ---
-
- **Document Status**: Official Architecture Spec
- **Review Score**: 98/100
- **Last Updated**: November 2025
docs/architecture/services.md CHANGED
@@ -137,6 +137,3 @@ if settings.has_openai_key:
 
 
 
-
-
-
docs/architecture/tools.md CHANGED
@@ -170,6 +170,3 @@ search_handler = SearchHandler(
 
 
 
-
-
-
docs/architecture/workflow-diagrams.md ADDED
@@ -0,0 +1,670 @@
1
+ # DeepCritical Workflow - Simplified Magentic Architecture
2
+
3
+ > **Architecture Pattern**: Microsoft Magentic Orchestration
4
+ > **Design Philosophy**: Simple, dynamic, manager-driven coordination
5
+ > **Key Innovation**: Intelligent manager replaces rigid sequential phases
6
+
7
+ ---
8
+
9
+ ## 1. High-Level Magentic Workflow
10
+
11
+ ```mermaid
12
+ flowchart TD
13
+ Start([User Query]) --> Manager[Magentic Manager<br/>Plan • Select • Assess • Adapt]
14
+
15
+ Manager -->|Plans| Task1[Task Decomposition]
16
+ Task1 --> Manager
17
+
18
+ Manager -->|Selects & Executes| HypAgent[Hypothesis Agent]
19
+ Manager -->|Selects & Executes| SearchAgent[Search Agent]
20
+ Manager -->|Selects & Executes| AnalysisAgent[Analysis Agent]
21
+ Manager -->|Selects & Executes| ReportAgent[Report Agent]
22
+
23
+ HypAgent -->|Results| Manager
24
+ SearchAgent -->|Results| Manager
25
+ AnalysisAgent -->|Results| Manager
26
+ ReportAgent -->|Results| Manager
27
+
28
+ Manager -->|Assesses Quality| Decision{Good Enough?}
29
+ Decision -->|No - Refine| Manager
30
+ Decision -->|No - Different Agent| Manager
31
+ Decision -->|No - Stalled| Replan[Reset Plan]
32
+ Replan --> Manager
33
+
34
+ Decision -->|Yes| Synthesis[Synthesize Final Result]
35
+ Synthesis --> Output([Research Report])
36
+
37
+ style Start fill:#e1f5e1
38
+ style Manager fill:#ffe6e6
39
+ style HypAgent fill:#fff4e6
40
+ style SearchAgent fill:#fff4e6
41
+ style AnalysisAgent fill:#fff4e6
42
+ style ReportAgent fill:#fff4e6
43
+ style Decision fill:#ffd6d6
44
+ style Synthesis fill:#d4edda
45
+ style Output fill:#e1f5e1
46
+ ```
47
+
48
+ ## 2. Magentic Manager: The 6-Phase Cycle
49
+
50
+ ```mermaid
51
+ flowchart LR
52
+ P1[1. Planning<br/>Analyze task<br/>Create strategy] --> P2[2. Agent Selection<br/>Pick best agent<br/>for subtask]
53
+ P2 --> P3[3. Execution<br/>Run selected<br/>agent with tools]
54
+ P3 --> P4[4. Assessment<br/>Evaluate quality<br/>Check progress]
55
+ P4 --> Decision{Quality OK?<br/>Progress made?}
56
+ Decision -->|Yes| P6[6. Synthesis<br/>Combine results<br/>Generate report]
57
+ Decision -->|No| P5[5. Iteration<br/>Adjust plan<br/>Try again]
58
+ P5 --> P2
59
+ P6 --> Done([Complete])
60
+
61
+ style P1 fill:#fff4e6
62
+ style P2 fill:#ffe6e6
63
+ style P3 fill:#e6f3ff
64
+ style P4 fill:#ffd6d6
65
+ style P5 fill:#fff3cd
66
+ style P6 fill:#d4edda
67
+ style Done fill:#e1f5e1
68
+ ```
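Stripped of the LLM, the cycle reduces to a guarded loop. A minimal sketch, where every method name (`plan`, `select_agent`, `assess`, `adjust`, `synthesize`) is an assumption for illustration rather than the framework API:

```python
# Illustrative only: the real manager is LLM-driven, but its control flow
# collapses to this guarded loop over the six phases.
async def run_cycle(manager, agents, task, max_rounds: int = 15):
    plan = await manager.plan(task)                      # 1. Planning
    results = []
    for _ in range(max_rounds):
        agent = manager.select_agent(agents, plan)       # 2. Agent Selection
        output = await agent.execute(plan.next_subtask)  # 3. Execution
        verdict = await manager.assess(output, plan)     # 4. Assessment
        if verdict.ok:
            results.append(output)
            if verdict.task_complete:
                break
        else:
            plan = await manager.adjust(plan, verdict)   # 5. Iteration
    return await manager.synthesize(results)             # 6. Synthesis (also covers partial results)
```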
69
+
70
+ ## 3. Simplified Agent Architecture
71
+
72
+ ```mermaid
73
+ graph TB
74
+ subgraph "Orchestration Layer"
75
+ Manager[Magentic Manager<br/>• Plans workflow<br/>• Selects agents<br/>• Assesses quality<br/>• Adapts strategy]
76
+ SharedContext[(Shared Context<br/>• Hypotheses<br/>• Search Results<br/>• Analysis<br/>• Progress)]
77
+ Manager <--> SharedContext
78
+ end
79
+
80
+ subgraph "Specialist Agents"
81
+ HypAgent[Hypothesis Agent<br/>• Domain understanding<br/>• Hypothesis generation<br/>• Testability refinement]
82
+ SearchAgent[Search Agent<br/>• Multi-source search<br/>• RAG retrieval<br/>• Result ranking]
83
+ AnalysisAgent[Analysis Agent<br/>• Evidence extraction<br/>• Statistical analysis<br/>• Code execution]
84
+ ReportAgent[Report Agent<br/>• Report assembly<br/>• Visualization<br/>• Citation formatting]
85
+ end
86
+
87
+ subgraph "MCP Tools"
88
+ WebSearch[Web Search<br/>PubMed • arXiv • bioRxiv]
89
+ CodeExec[Code Execution<br/>Sandboxed Python]
90
+ RAG[RAG Retrieval<br/>Vector DB • Embeddings]
91
+ Viz[Visualization<br/>Charts • Graphs]
92
+ end
93
+
94
+ Manager -->|Selects & Directs| HypAgent
95
+ Manager -->|Selects & Directs| SearchAgent
96
+ Manager -->|Selects & Directs| AnalysisAgent
97
+ Manager -->|Selects & Directs| ReportAgent
98
+
99
+ HypAgent --> SharedContext
100
+ SearchAgent --> SharedContext
101
+ AnalysisAgent --> SharedContext
102
+ ReportAgent --> SharedContext
103
+
104
+ SearchAgent --> WebSearch
105
+ SearchAgent --> RAG
106
+ AnalysisAgent --> CodeExec
107
+ ReportAgent --> CodeExec
108
+ ReportAgent --> Viz
109
+
110
+ style Manager fill:#ffe6e6
111
+ style SharedContext fill:#ffe6f0
112
+ style HypAgent fill:#fff4e6
113
+ style SearchAgent fill:#fff4e6
114
+ style AnalysisAgent fill:#fff4e6
115
+ style ReportAgent fill:#fff4e6
116
+ style WebSearch fill:#e6f3ff
117
+ style CodeExec fill:#e6f3ff
118
+ style RAG fill:#e6f3ff
119
+ style Viz fill:#e6f3ff
120
+ ```
121
+
122
+ ## 4. Dynamic Workflow Example
123
+
124
+ ```mermaid
125
+ sequenceDiagram
126
+ participant User
127
+ participant Manager
128
+ participant HypAgent
129
+ participant SearchAgent
130
+ participant AnalysisAgent
131
+ participant ReportAgent
132
+
133
+ User->>Manager: "Research protein folding in Alzheimer's"
134
+
135
+ Note over Manager: PLAN: Generate hypotheses → Search → Analyze → Report
136
+
137
+ Manager->>HypAgent: Generate 3 hypotheses
138
+ HypAgent-->>Manager: Returns 3 hypotheses
139
+ Note over Manager: ASSESS: Good quality, proceed
140
+
141
+ Manager->>SearchAgent: Search literature for hypothesis 1
142
+ SearchAgent-->>Manager: Returns 15 papers
143
+ Note over Manager: ASSESS: Good results, continue
144
+
145
+ Manager->>SearchAgent: Search for hypothesis 2
146
+ SearchAgent-->>Manager: Only 2 papers found
147
+ Note over Manager: ASSESS: Insufficient, refine search
148
+
149
+ Manager->>SearchAgent: Refined query for hypothesis 2
150
+ SearchAgent-->>Manager: Returns 12 papers
151
+ Note over Manager: ASSESS: Better, proceed
152
+
153
+ Manager->>AnalysisAgent: Analyze evidence for all hypotheses
154
+ AnalysisAgent-->>Manager: Returns analysis with code
155
+ Note over Manager: ASSESS: Complete, generate report
156
+
157
+ Manager->>ReportAgent: Create comprehensive report
158
+ ReportAgent-->>Manager: Returns formatted report
159
+ Note over Manager: SYNTHESIZE: Combine all results
160
+
161
+ Manager->>User: Final Research Report
162
+ ```
163
+
164
+ ## 5. Manager Decision Logic
165
+
166
+ ```mermaid
167
+ flowchart TD
168
+ Start([Manager Receives Task]) --> Plan[Create Initial Plan]
169
+
170
+ Plan --> Select[Select Agent for Next Subtask]
171
+ Select --> Execute[Execute Agent]
172
+ Execute --> Collect[Collect Results]
173
+
174
+ Collect --> Assess[Assess Quality & Progress]
175
+
176
+ Assess --> Q1{Quality Sufficient?}
177
+ Q1 -->|No| Q2{Same Agent Can Fix?}
178
+ Q2 -->|Yes| Feedback[Provide Specific Feedback]
179
+ Feedback --> Execute
180
+ Q2 -->|No| Different[Try Different Agent]
181
+ Different --> Select
182
+
183
+ Q1 -->|Yes| Q3{Task Complete?}
184
+ Q3 -->|No| Q4{Making Progress?}
185
+ Q4 -->|Yes| Select
186
+ Q4 -->|No - Stalled| Replan[Reset Plan & Approach]
187
+ Replan --> Plan
188
+
189
+ Q3 -->|Yes| Synth[Synthesize Final Result]
190
+ Synth --> Done([Return Report])
191
+
192
+ style Start fill:#e1f5e1
193
+ style Plan fill:#fff4e6
194
+ style Select fill:#ffe6e6
195
+ style Execute fill:#e6f3ff
196
+ style Assess fill:#ffd6d6
197
+ style Q1 fill:#ffe6e6
198
+ style Q2 fill:#ffe6e6
199
+ style Q3 fill:#ffe6e6
200
+ style Q4 fill:#ffe6e6
201
+ style Synth fill:#d4edda
202
+ style Done fill:#e1f5e1
203
+ ```
204
+
205
+ ## 6. Hypothesis Agent Workflow
206
+
207
+ ```mermaid
208
+ flowchart LR
209
+ Input[Research Query] --> Domain[Identify Domain<br/>& Key Concepts]
210
+ Domain --> Context[Retrieve Background<br/>Knowledge]
211
+ Context --> Generate[Generate 3-5<br/>Initial Hypotheses]
212
+ Generate --> Refine[Refine for<br/>Testability]
213
+ Refine --> Rank[Rank by<br/>Quality Score]
214
+ Rank --> Output[Return Top<br/>Hypotheses]
215
+
216
+ Output --> Struct[Hypothesis Structure:<br/>• Statement<br/>• Rationale<br/>• Testability Score<br/>• Data Requirements<br/>• Expected Outcomes]
217
+
218
+ style Input fill:#e1f5e1
219
+ style Output fill:#fff4e6
220
+ style Struct fill:#e6f3ff
221
+ ```
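The "Hypothesis Structure" box above maps naturally onto a small Pydantic model; a sketch with assumed field names:

```python
from pydantic import BaseModel, Field

class Hypothesis(BaseModel):
    """One testable hypothesis as produced by the Hypothesis Agent (field names assumed)."""
    statement: str
    rationale: str
    testability_score: float = Field(ge=0.0, le=1.0)  # used for ranking
    data_requirements: list[str]
    expected_outcomes: list[str]
```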
222
+
223
+ ## 7. Search Agent Workflow
224
+
225
+ ```mermaid
226
+ flowchart TD
227
+ Input[Hypotheses] --> Strategy[Formulate Search<br/>Strategy per Hypothesis]
228
+
229
+ Strategy --> Multi[Multi-Source Search]
230
+
231
+ Multi --> PubMed[PubMed Search<br/>via MCP]
232
+ Multi --> ArXiv[arXiv Search<br/>via MCP]
233
+ Multi --> BioRxiv[bioRxiv Search<br/>via MCP]
234
+
235
+ PubMed --> Aggregate[Aggregate Results]
236
+ ArXiv --> Aggregate
237
+ BioRxiv --> Aggregate
238
+
239
+ Aggregate --> Filter[Filter & Rank<br/>by Relevance]
240
+ Filter --> Dedup[Deduplicate<br/>Cross-Reference]
241
+ Dedup --> Embed[Embed Documents<br/>via MCP]
242
+ Embed --> Vector[(Vector DB)]
243
+ Vector --> RAGRetrieval[RAG Retrieval<br/>Top-K per Hypothesis]
244
+ RAGRetrieval --> Output[Return Contextualized<br/>Search Results]
245
+
246
+ style Input fill:#fff4e6
247
+ style Multi fill:#ffe6e6
248
+ style Vector fill:#ffe6f0
249
+ style Output fill:#e6f3ff
250
+ ```
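Deduplication across sources can key on DOI when present and fall back to a normalized title; a sketch, assuming each result dict carries an optional `doi` and a required `title`:

```python
def deduplicate(results: list[dict]) -> list[dict]:
    """Merge multi-source search results, keeping the first occurrence of each work."""
    seen: set[str] = set()
    unique: list[dict] = []
    for r in results:
        key = r.get("doi") or r["title"].casefold().strip()
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique
```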
251
+
252
+ ## 8. Analysis Agent Workflow
253
+
254
+ ```mermaid
255
+ flowchart TD
256
+ Input1[Hypotheses] --> Extract
257
+ Input2[Search Results] --> Extract[Extract Evidence<br/>per Hypothesis]
258
+
259
+ Extract --> Methods[Determine Analysis<br/>Methods Needed]
260
+
261
+ Methods --> Branch{Requires<br/>Computation?}
262
+ Branch -->|Yes| GenCode[Generate Python<br/>Analysis Code]
263
+ Branch -->|No| Qual[Qualitative<br/>Synthesis]
264
+
265
+ GenCode --> Execute[Execute Code<br/>via MCP Sandbox]
266
+ Execute --> Interpret1[Interpret<br/>Results]
267
+ Qual --> Interpret2[Interpret<br/>Findings]
268
+
269
+ Interpret1 --> Synthesize[Synthesize Evidence<br/>Across Sources]
270
+ Interpret2 --> Synthesize
271
+
272
+ Synthesize --> Verdict[Determine Verdict<br/>per Hypothesis]
273
+ Verdict --> Support[• Supported<br/>• Refuted<br/>• Inconclusive]
274
+ Support --> Gaps[Identify Knowledge<br/>Gaps & Limitations]
275
+ Gaps --> Output[Return Analysis<br/>Report]
276
+
277
+ style Input1 fill:#fff4e6
278
+ style Input2 fill:#e6f3ff
279
+ style Execute fill:#ffe6e6
280
+ style Output fill:#e6ffe6
281
+ ```
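The verdict step maps onto a closed set plus a tally rule; a toy sketch, with the evidence threshold invented for illustration:

```python
from typing import Literal

Verdict = Literal["supported", "refuted", "inconclusive"]

def decide(support: int, refute: int, min_evidence: int = 3) -> Verdict:
    """Toy decision rule: require a minimum amount of evidence, then majority wins."""
    if support + refute < min_evidence:
        return "inconclusive"
    return "supported" if support > refute else "refuted"
```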
282
+
283
+ ## 9. Report Agent Workflow
284
+
285
+ ```mermaid
286
+ flowchart TD
287
+ Input1[Query] --> Assemble
288
+ Input2[Hypotheses] --> Assemble
289
+ Input3[Search Results] --> Assemble
290
+ Input4[Analysis] --> Assemble[Assemble Report<br/>Sections]
291
+
292
+ Assemble --> Exec[Executive Summary]
293
+ Assemble --> Intro[Introduction]
294
+ Assemble --> Methods[Methods]
295
+ Assemble --> Results[Results per<br/>Hypothesis]
296
+ Assemble --> Discussion[Discussion]
297
+ Assemble --> Future[Future Directions]
298
+ Assemble --> Refs[References]
299
+
300
+ Results --> VizCheck{Needs<br/>Visualization?}
301
+ VizCheck -->|Yes| GenViz[Generate Viz Code]
302
+ GenViz --> ExecViz[Execute via MCP<br/>Create Charts]
303
+ ExecViz --> Combine
304
+ VizCheck -->|No| Combine[Combine All<br/>Sections]
305
+
306
+ Exec --> Combine
307
+ Intro --> Combine
308
+ Methods --> Combine
309
+ Discussion --> Combine
310
+ Future --> Combine
311
+ Refs --> Combine
312
+
313
+ Combine --> Format[Format Output]
314
+ Format --> MD[Markdown]
315
+ Format --> PDF[PDF]
316
+ Format --> JSON[JSON]
317
+
318
+ MD --> Output[Return Final<br/>Report]
319
+ PDF --> Output
320
+ JSON --> Output
321
+
322
+ style Input1 fill:#e1f5e1
323
+ style Input2 fill:#fff4e6
324
+ style Input3 fill:#e6f3ff
325
+ style Input4 fill:#e6ffe6
326
+ style Output fill:#d4edda
327
+ ```
328
+
329
+ ## 10. Data Flow & Event Streaming
330
+
331
+ ```mermaid
332
+ flowchart TD
333
+ User[👤 User] -->|Research Query| UI[Gradio UI]
334
+ UI -->|Submit| Manager[Magentic Manager]
335
+
336
+ Manager -->|Event: Planning| UI
337
+ Manager -->|Select Agent| HypAgent[Hypothesis Agent]
338
+ HypAgent -->|Event: Delta/Message| UI
339
+ HypAgent -->|Hypotheses| Context[(Shared Context)]
340
+
341
+ Context -->|Retrieved by| Manager
342
+ Manager -->|Select Agent| SearchAgent[Search Agent]
343
+ SearchAgent -->|MCP Request| WebSearch[Web Search Tool]
344
+ WebSearch -->|Results| SearchAgent
345
+ SearchAgent -->|Event: Delta/Message| UI
346
+ SearchAgent -->|Documents| Context
347
+ SearchAgent -->|Embeddings| VectorDB[(Vector DB)]
348
+
349
+ Context -->|Retrieved by| Manager
350
+ Manager -->|Select Agent| AnalysisAgent[Analysis Agent]
351
+ AnalysisAgent -->|MCP Request| CodeExec[Code Execution Tool]
352
+ CodeExec -->|Results| AnalysisAgent
353
+ AnalysisAgent -->|Event: Delta/Message| UI
354
+ AnalysisAgent -->|Analysis| Context
355
+
356
+ Context -->|Retrieved by| Manager
357
+ Manager -->|Select Agent| ReportAgent[Report Agent]
358
+ ReportAgent -->|MCP Request| CodeExec
359
+ ReportAgent -->|Event: Delta/Message| UI
360
+ ReportAgent -->|Report| Context
361
+
362
+ Manager -->|Event: Final Result| UI
363
+ UI -->|Display| User
364
+
365
+ style User fill:#e1f5e1
366
+ style UI fill:#e6f3ff
367
+ style Manager fill:#ffe6e6
368
+ style Context fill:#ffe6f0
369
+ style VectorDB fill:#ffe6f0
370
+ style WebSearch fill:#f0f0f0
371
+ style CodeExec fill:#f0f0f0
372
+ ```
373
+
374
+ ## 11. MCP Tool Architecture
375
+
376
+ ```mermaid
377
+ graph TB
378
+ subgraph "Agent Layer"
379
+ Manager[Magentic Manager]
380
+ HypAgent[Hypothesis Agent]
381
+ SearchAgent[Search Agent]
382
+ AnalysisAgent[Analysis Agent]
383
+ ReportAgent[Report Agent]
384
+ end
385
+
386
+ subgraph "MCP Protocol Layer"
387
+ Registry[MCP Tool Registry<br/>• Discovers tools<br/>• Routes requests<br/>• Manages connections]
388
+ end
389
+
390
+ subgraph "MCP Servers"
391
+ Server1[Web Search Server<br/>localhost:8001<br/>• PubMed<br/>• arXiv<br/>• bioRxiv]
392
+ Server2[Code Execution Server<br/>localhost:8002<br/>• Sandboxed Python<br/>• Package management]
393
+ Server3[RAG Server<br/>localhost:8003<br/>• Vector embeddings<br/>• Similarity search]
394
+ Server4[Visualization Server<br/>localhost:8004<br/>• Chart generation<br/>• Plot rendering]
395
+ end
396
+
397
+ subgraph "External Services"
398
+ PubMed[PubMed API]
399
+ ArXiv[arXiv API]
400
+ BioRxiv[bioRxiv API]
401
+ Modal[Modal Sandbox]
402
+ ChromaDB[(ChromaDB)]
403
+ end
404
+
405
+ SearchAgent -->|Request| Registry
406
+ AnalysisAgent -->|Request| Registry
407
+ ReportAgent -->|Request| Registry
408
+
409
+ Registry --> Server1
410
+ Registry --> Server2
411
+ Registry --> Server3
412
+ Registry --> Server4
413
+
414
+ Server1 --> PubMed
415
+ Server1 --> ArXiv
416
+ Server1 --> BioRxiv
417
+ Server2 --> Modal
418
+ Server3 --> ChromaDB
419
+
420
+ style Manager fill:#ffe6e6
421
+ style Registry fill:#fff4e6
422
+ style Server1 fill:#e6f3ff
423
+ style Server2 fill:#e6f3ff
424
+ style Server3 fill:#e6f3ff
425
+ style Server4 fill:#e6f3ff
426
+ ```
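At its simplest, the registry is a name-to-server mapping plus a dispatch helper. A sketch with the ports taken from the diagram; plain HTTP stands in for the MCP transport here, and the `/invoke` endpoint name is invented:

```python
import httpx

MCP_SERVERS = {
    "web_search": "http://localhost:8001",
    "code_execution": "http://localhost:8002",
    "rag": "http://localhost:8003",
    "visualization": "http://localhost:8004",
}

async def call_tool(tool: str, payload: dict) -> dict:
    """Route a tool request to its MCP server (HTTP stand-in for the MCP protocol)."""
    base = MCP_SERVERS[tool]
    async with httpx.AsyncClient() as client:
        resp = await client.post(f"{base}/invoke", json=payload)  # endpoint name illustrative
        resp.raise_for_status()
        return resp.json()
```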
427
+
428
+ ## 12. Progress Tracking & Stall Detection
429
+
430
+ ```mermaid
431
+ stateDiagram-v2
432
+ [*] --> Initialization: User Query
433
+
434
+ Initialization --> Planning: Manager starts
435
+
436
+ Planning --> AgentExecution: Select agent
437
+
438
+ AgentExecution --> Assessment: Collect results
439
+
440
+ Assessment --> QualityCheck: Evaluate output
441
+
442
+ QualityCheck --> AgentExecution: Poor quality<br/>(retry < max_rounds)
443
+ QualityCheck --> Planning: Poor quality<br/>(try different agent)
444
+ QualityCheck --> NextAgent: Good quality<br/>(task incomplete)
445
+ QualityCheck --> Synthesis: Good quality<br/>(task complete)
446
+
447
+ NextAgent --> AgentExecution: Select next agent
448
+
449
+ state StallDetection <<choice>>
450
+ Assessment --> StallDetection: Check progress
451
+ StallDetection --> Planning: No progress<br/>(stall count < max)
452
+ StallDetection --> ErrorRecovery: No progress<br/>(max stalls reached)
453
+
454
+ ErrorRecovery --> PartialReport: Generate partial results
455
+ PartialReport --> [*]
456
+
457
+ Synthesis --> FinalReport: Combine all outputs
458
+ FinalReport --> [*]
459
+
460
+ note right of QualityCheck
461
+ Manager assesses:
462
+ • Output completeness
463
+ • Quality metrics
464
+ • Progress made
465
+ end note
466
+
467
+ note right of StallDetection
468
+ Stall = no new progress
469
+ after agent execution
470
+ Triggers plan reset
471
+ end note
472
+ ```
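The safety limits reduce to two counters; a sketch of the guard, with progress detection itself left to the manager:

```python
def should_continue(round_count: int, stall_count: int,
                    max_round_count: int = 15, max_stall_count: int = 3) -> bool:
    """True while the workflow may keep iterating; False triggers partial-report recovery."""
    return round_count < max_round_count and stall_count < max_stall_count

# Inside the loop: a round that produces no new progress increments stall_count;
# any progress resets it to zero.
```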
473
+
474
+ ## 13. Gradio UI Integration
475
+
476
+ ```mermaid
477
+ graph TD
478
+ App[Gradio App<br/>DeepCritical Research Agent]
479
+
480
+ App --> Input[Input Section]
481
+ App --> Status[Status Section]
482
+ App --> Output[Output Section]
483
+
484
+ Input --> Query[Research Question<br/>Text Area]
485
+ Input --> Controls[Controls]
486
+ Controls --> MaxHyp[Max Hypotheses: 1-10]
487
+ Controls --> MaxRounds[Max Rounds: 5-20]
488
+ Controls --> Submit[Start Research Button]
489
+
490
+ Status --> Log[Real-time Event Log<br/>• Manager planning<br/>• Agent selection<br/>• Execution updates<br/>• Quality assessment]
491
+ Status --> Progress[Progress Tracker<br/>• Current agent<br/>• Round count<br/>• Stall count]
492
+
493
+ Output --> Tabs[Tabbed Results]
494
+ Tabs --> Tab1[Hypotheses Tab<br/>Generated hypotheses with scores]
495
+ Tabs --> Tab2[Search Results Tab<br/>Papers & sources found]
496
+ Tabs --> Tab3[Analysis Tab<br/>Evidence & verdicts]
497
+ Tabs --> Tab4[Report Tab<br/>Final research report]
498
+ Tab4 --> Download[Download Report<br/>MD / PDF / JSON]
499
+
500
+ Submit -.->|Triggers| Workflow[Magentic Workflow]
501
+ Workflow -.->|MagenticOrchestratorMessageEvent| Log
502
+ Workflow -.->|MagenticAgentDeltaEvent| Log
503
+ Workflow -.->|MagenticAgentMessageEvent| Log
504
+ Workflow -.->|MagenticFinalResultEvent| Tab4
505
+
506
+ style App fill:#e1f5e1
507
+ style Input fill:#fff4e6
508
+ style Status fill:#e6f3ff
509
+ style Output fill:#e6ffe6
510
+ style Workflow fill:#ffe6e6
511
+ ```
512
+
513
+ ## 14. Complete System Context
514
+
515
+ ```mermaid
516
+ graph LR
517
+ User[👤 Researcher<br/>Asks research questions] -->|Submits query| DC[DeepCritical<br/>Magentic Workflow]
518
+
519
+ DC -->|Literature search| PubMed[PubMed API<br/>Medical papers]
520
+ DC -->|Preprint search| ArXiv[arXiv API<br/>Scientific preprints]
521
+ DC -->|Biology search| BioRxiv[bioRxiv API<br/>Biology preprints]
522
+ DC -->|Agent reasoning| Claude[Claude API<br/>Sonnet 4 / Opus]
523
+ DC -->|Code execution| Modal[Modal Sandbox<br/>Safe Python env]
524
+ DC -->|Vector storage| Chroma[ChromaDB<br/>Embeddings & RAG]
525
+
526
+ DC -->|Deployed on| HF[HuggingFace Spaces<br/>Gradio 6.0]
527
+
528
+ PubMed -->|Results| DC
529
+ ArXiv -->|Results| DC
530
+ BioRxiv -->|Results| DC
531
+ Claude -->|Responses| DC
532
+ Modal -->|Output| DC
533
+ Chroma -->|Context| DC
534
+
535
+ DC -->|Research report| User
536
+
537
+ style User fill:#e1f5e1
538
+ style DC fill:#ffe6e6
539
+ style PubMed fill:#e6f3ff
540
+ style ArXiv fill:#e6f3ff
541
+ style BioRxiv fill:#e6f3ff
542
+ style Claude fill:#ffd6d6
543
+ style Modal fill:#f0f0f0
544
+ style Chroma fill:#ffe6f0
545
+ style HF fill:#d4edda
546
+ ```
547
+
548
+ ## 15. Workflow Timeline (Simplified)
549
+
550
+ ```mermaid
551
+ gantt
552
+ title DeepCritical Magentic Workflow - Typical Execution
553
+ dateFormat mm:ss
554
+ axisFormat %M:%S
555
+
556
+ section Manager Planning
557
+ Initial planning :p1, 00:00, 10s
558
+
559
+ section Hypothesis Agent
560
+ Generate hypotheses :h1, after p1, 30s
561
+ Manager assessment :h2, after h1, 5s
562
+
563
+ section Search Agent
564
+ Search hypothesis 1 :s1, after h2, 20s
565
+ Search hypothesis 2 :s2, after s1, 20s
566
+ Search hypothesis 3 :s3, after s2, 20s
567
+ RAG processing :s4, after s3, 15s
568
+ Manager assessment :s5, after s4, 5s
569
+
570
+ section Analysis Agent
571
+ Evidence extraction :a1, after s5, 15s
572
+ Code generation :a2, after a1, 20s
573
+ Code execution :a3, after a2, 25s
574
+ Synthesis :a4, after a3, 20s
575
+ Manager assessment :a5, after a4, 5s
576
+
577
+ section Report Agent
578
+ Report assembly :r1, after a5, 30s
579
+ Visualization :r2, after r1, 15s
580
+ Formatting :r3, after r2, 10s
581
+
582
+ section Manager Synthesis
583
+ Final synthesis :f1, after r3, 10s
584
+ ```
585
+
586
+ ---
587
+
588
+ ## Key Differences from Original Design
589
+
590
+ | Aspect | Original (Judge-in-Loop) | New (Magentic) |
591
+ |--------|-------------------------|----------------|
592
+ | **Control Flow** | Fixed sequential phases | Dynamic agent selection |
593
+ | **Quality Control** | Separate Judge Agent | Manager assessment built-in |
594
+ | **Retry Logic** | Phase-level with feedback | Agent-level with adaptation |
595
+ | **Flexibility** | Rigid 4-phase pipeline | Adaptive workflow |
596
+ | **Complexity** | 5 agents (including Judge) | 4 agents (no Judge) |
597
+ | **Progress Tracking** | Manual state management | Built-in round/stall detection |
598
+ | **Agent Coordination** | Sequential handoff | Manager-driven dynamic selection |
599
+ | **Error Recovery** | Retry same phase | Try different agent or replan |
600
+
601
+ ---
602
+
603
+ ## Simplified Design Principles
604
+
605
+ 1. **Manager is Intelligent**: LLM-powered manager handles planning, selection, and quality assessment
606
+ 2. **No Separate Judge**: Manager's assessment phase replaces dedicated Judge Agent
607
+ 3. **Dynamic Workflow**: Agents can be called multiple times in any order based on need
608
+ 4. **Built-in Safety**: max_round_count (15) and max_stall_count (3) prevent infinite loops
609
+ 5. **Event-Driven UI**: Real-time streaming updates to Gradio interface
610
+ 6. **MCP-Powered Tools**: All external capabilities via Model Context Protocol
611
+ 7. **Shared Context**: Centralized state accessible to all agents
612
+ 8. **Progress Awareness**: Manager tracks what's been done and what's needed
613
+
614
+ ---
615
+
616
+ ## Legend
617
+
618
+ - 🔴 **Red/Pink**: Manager, orchestration, decision-making
619
+ - 🟡 **Yellow/Orange**: Specialist agents, processing
620
+ - 🔵 **Blue**: Data, tools, MCP services
621
+ - 🟣 **Purple/Pink**: Storage, databases, state
622
+ - 🟢 **Green**: User interactions, final outputs
623
+ - ⚪ **Gray**: External services, APIs
624
+
625
+ ---
626
+
627
+ ## Implementation Highlights
628
+
629
+ **Simple 4-Agent Setup:**
630
+ ```python
631
+ workflow = (
632
+ MagenticBuilder()
633
+ .participants(
634
+ hypothesis=HypothesisAgent(tools=[background_tool]),
635
+ search=SearchAgent(tools=[web_search, rag_tool]),
636
+ analysis=AnalysisAgent(tools=[code_execution]),
637
+ report=ReportAgent(tools=[code_execution, visualization])
638
+ )
639
+ .with_standard_manager(
640
+ chat_client=AnthropicClient(model="claude-sonnet-4"),
641
+ max_round_count=15, # Prevent infinite loops
642
+ max_stall_count=3 # Detect stuck workflows
643
+ )
644
+ .build()
645
+ )
646
+ ```
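Running the built workflow then amounts to draining its event stream into the UI. A sketch only: the `run_stream` method and the event attributes are assumptions, while the event class names are those listed in section 13:

```python
async def research(query: str) -> str:
    final_report = ""
    async for event in workflow.run_stream(query):
        if isinstance(event, MagenticAgentDeltaEvent):
            print(event.text, end="")        # live token deltas for the UI log (attribute assumed)
        elif isinstance(event, MagenticFinalResultEvent):
            final_report = str(event.data)   # final payload attribute assumed
    return final_report
```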
647
+
648
+ **Manager handles quality assessment in its instructions:**
649
+ - Checks hypothesis quality (testable, novel, clear)
650
+ - Validates search results (relevant, authoritative, recent)
651
+ - Assesses analysis soundness (methodology, evidence, conclusions)
652
+ - Ensures report completeness (all sections, proper citations)
653
+
654
+ No separate Judge Agent needed - manager does it all!
655
+
656
+ ---
657
+
658
+ **Document Version**: 2.0 (Magentic Simplified)
659
+ **Last Updated**: 2025-11-24
660
+ **Architecture**: Microsoft Magentic Orchestration Pattern
661
+ **Agents**: 4 (Hypothesis, Search, Analysis, Report) + 1 Manager
662
+ **License**: MIT
663
+
664
+ ## See Also
665
+
666
+ - [Orchestrators](orchestrators.md) - Overview of all orchestrator patterns
667
+ - [Graph Orchestration](graph-orchestration.md) - Graph-based execution overview
668
+ - [Graph Orchestration (Detailed)](graph_orchestration.md) - Detailed graph architecture
669
+ - [Workflows](workflows.md) - Workflow patterns summary
670
+ - [API Reference - Orchestrators](../api/orchestrators.md) - API documentation
docs/{workflow-diagrams.md → architecture/workflows.md} RENAMED
File without changes
docs/brainstorming/00_ROADMAP_SUMMARY.md DELETED
@@ -1,194 +0,0 @@
1
- # DeepCritical Data Sources: Roadmap Summary
2
-
3
- **Created**: 2024-11-27
4
- **Purpose**: Future maintainability and hackathon continuation
5
-
6
- ---
7
-
8
- ## Current State
9
-
10
- ### Working Tools
11
-
12
- | Tool | Status | Data Quality |
13
- |------|--------|--------------|
14
- | PubMed | ✅ Works | Good (abstracts only) |
15
- | ClinicalTrials.gov | ✅ Works | Good (filtered for interventional) |
16
- | Europe PMC | ✅ Works | Good (includes preprints) |
17
-
18
- ### Removed Tools
19
-
20
- | Tool | Status | Reason |
21
- |------|--------|--------|
22
- | bioRxiv | ❌ Removed | No search API - only date/DOI lookup |
23
-
24
- ---
25
-
26
- ## Priority Improvements
27
-
28
- ### P0: Critical (Do First)
29
-
30
- 1. **Add Rate Limiting to PubMed**
31
- - NCBI will block us without it
32
- - Use `limits` library (see reference repo)
33
- - 3/sec without key, 10/sec with key
34
-
35
- ### P1: High Value, Medium Effort
36
-
37
- 2. **Add OpenAlex as 4th Source**
38
- - Citation network (huge for drug repurposing)
39
- - Concept tagging (semantic discovery)
40
- - Already implemented in reference repo
41
- - Free, no API key
42
-
43
- 3. **PubMed Full-Text via BioC**
44
- - Get full paper text for PMC papers
45
- - Already in reference repo
46
-
47
- ### P2: Nice to Have
48
-
49
- 4. **ClinicalTrials.gov Results**
50
- - Get efficacy data from completed trials
51
- - Requires more complex API calls
52
-
53
- 5. **Europe PMC Annotations**
54
- - Text-mined entities (genes, drugs, diseases)
55
- - Automatic entity extraction
56
-
57
- ---
58
-
59
- ## Effort Estimates
60
-
61
- | Improvement | Effort | Impact | Priority |
62
- |-------------|--------|--------|----------|
63
- | PubMed rate limiting | 1 hour | Stability | P0 |
64
- | OpenAlex basic search | 2 hours | High | P1 |
65
- | OpenAlex citations | 2 hours | Very High | P1 |
66
- | PubMed full-text | 3 hours | Medium | P1 |
67
- | CT.gov results | 4 hours | Medium | P2 |
68
- | Europe PMC annotations | 3 hours | Medium | P2 |
69
-
70
- ---
71
-
72
- ## Architecture Decision
73
-
74
- ### Option A: Keep Current + Add OpenAlex
75
-
76
- ```
77
- User Query
78
-
79
- ┌───────────────────┼───────────────────┐
80
- ↓ ↓ ↓
81
- PubMed ClinicalTrials Europe PMC
82
- (abstracts) (trials only) (preprints)
83
- ↓ ↓ ↓
84
- └───────────────────┼───────────────────┘
85
-
86
- OpenAlex ← NEW
87
- (citations, concepts)
88
-
89
- Orchestrator
90
-
91
- Report
92
- ```
93
-
94
- **Pros**: Low risk, additive
95
- **Cons**: More complexity, some overlap
96
-
97
- ### Option B: OpenAlex as Primary
98
-
99
- ```
100
- User Query
101
-
102
- ┌───────────────────┼───────────────────┐
103
- ↓ ↓ ↓
104
- OpenAlex ClinicalTrials Europe PMC
105
- (primary (trials only) (full-text
106
- search) fallback)
107
- ↓ ↓ ↓
108
- └───────────────────┼───────────────────┘
109
-
110
- Orchestrator
111
-
112
- Report
113
- ```
114
-
115
- **Pros**: Simpler, citation network built-in
116
- **Cons**: Lose some PubMed-specific features
117
-
118
- ### Recommendation: Option A
119
-
120
- Keep current architecture working, add OpenAlex incrementally.
121
-
122
- ---
123
-
124
- ## Quick Wins (Can Do Today)
125
-
126
- 1. **Add `limits` to `pyproject.toml`**
127
- ```toml
128
- dependencies = [
129
- "limits>=3.0",
130
- ]
131
- ```
132
-
133
- 2. **Copy OpenAlex tool from reference repo**
134
- - File: `reference_repos/DeepCritical/DeepResearch/src/tools/openalex_tools.py`
135
- - Adapt to our `SearchTool` base class
136
-
137
- 3. **Enable NCBI API Key**
138
- - Add to `.env`: `NCBI_API_KEY=your_key`
139
- - ~3x rate limit improvement (3/sec → 10/sec)
140
-
141
- ---
142
-
143
- ## External Resources Worth Exploring
144
-
145
- ### Python Libraries
146
-
147
- | Library | For | Notes |
148
- |---------|-----|-------|
149
- | `limits` | Rate limiting | Used by reference repo |
150
- | `pyalex` | OpenAlex wrapper | [GitHub](https://github.com/J535D165/pyalex) |
151
- | `metapub` | PubMed | Full-featured |
152
- | `sentence-transformers` | Semantic search | For embeddings |
153
-
154
- ### APIs Not Yet Used
155
-
156
- | API | Provides | Effort |
157
- |-----|----------|--------|
158
- | RxNorm | Drug name normalization | Low |
159
- | DrugBank | Drug targets/mechanisms | Medium (license) |
160
- | UniProt | Protein data | Medium |
161
- | ChEMBL | Bioactivity data | Medium |
162
-
163
- ### RAG Tools (Future)
164
-
165
- | Tool | Purpose |
166
- |------|---------|
167
- | [PaperQA](https://github.com/Future-House/paper-qa) | RAG for scientific papers |
168
- | [txtai](https://github.com/neuml/txtai) | Embeddings + search |
169
- | [PubMedBERT](https://huggingface.co/NeuML/pubmedbert-base-embeddings) | Biomedical embeddings |
170
-
171
- ---
172
-
173
- ## Files in This Directory
174
-
175
- | File | Contents |
176
- |------|----------|
177
- | `00_ROADMAP_SUMMARY.md` | This file |
178
- | `01_PUBMED_IMPROVEMENTS.md` | PubMed enhancement details |
179
- | `02_CLINICALTRIALS_IMPROVEMENTS.md` | ClinicalTrials.gov details |
180
- | `03_EUROPEPMC_IMPROVEMENTS.md` | Europe PMC details |
181
- | `04_OPENALEX_INTEGRATION.md` | OpenAlex integration plan |
182
-
183
- ---
184
-
185
- ## For Future Maintainers
186
-
187
- If you're picking this up after the hackathon:
188
-
189
- 1. **Start with OpenAlex** - biggest bang for buck
190
- 2. **Add rate limiting** - prevents API blocks
191
- 3. **Don't bother with bioRxiv** - use Europe PMC instead
192
- 4. **Reference repo is gold** - `reference_repos/DeepCritical/` has working implementations
193
-
194
- Good luck! 🚀
docs/brainstorming/01_PUBMED_IMPROVEMENTS.md DELETED
@@ -1,125 +0,0 @@
1
- # PubMed Tool: Current State & Future Improvements
2
-
3
- **Status**: Currently Implemented
4
- **Priority**: High (Core Data Source)
5
-
6
- ---
7
-
8
- ## Current Implementation
9
-
10
- ### What We Have (`src/tools/pubmed.py`)
11
-
12
- - Basic E-utilities search via `esearch.fcgi` and `efetch.fcgi`
13
- - Query preprocessing (strips question words, expands synonyms)
14
- - Returns: title, abstract, authors, journal, PMID
15
- - Rate limiting: None implemented (relying on NCBI defaults)
16
-
17
- ### Current Limitations
18
-
19
- 1. **No Full-Text Access**: Only retrieves abstracts, not full paper text
20
- 2. **No Rate Limiting**: Risk of being blocked by NCBI
21
- 3. **No BioC Format**: Missing structured full-text extraction
22
- 4. **No Figure Retrieval**: No supplementary materials access
23
- 5. **No PMC Integration**: Missing open-access full-text via PMC
24
-
25
- ---
26
-
27
- ## Reference Implementation (DeepCritical Reference Repo)
28
-
29
- The reference repo at `reference_repos/DeepCritical/DeepResearch/src/tools/bioinformatics_tools.py` has a more sophisticated implementation:
30
-
31
- ### Features We're Missing
32
-
33
- ```python
34
- # Rate limiting (lines 47-50)
35
- from limits import parse
36
- from limits.storage import MemoryStorage
37
- from limits.strategies import MovingWindowRateLimiter
38
-
39
- storage = MemoryStorage()
40
- limiter = MovingWindowRateLimiter(storage)
41
- rate_limit = parse("3/second") # NCBI allows 3/sec without API key, 10/sec with
42
-
43
- # Full-text via BioC format (lines 108-120)
44
- def _get_fulltext(pmid: int) -> dict[str, Any] | None:
45
- pmid_url = f"https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/pmcoa.cgi/BioC_json/{pmid}/unicode"
46
- # Returns structured JSON with full text for open-access papers
47
-
48
- # Figure retrieval via Europe PMC (lines 123-149)
49
- def _get_figures(pmcid: str) -> dict[str, str]:
50
- suppl_url = f"https://www.ebi.ac.uk/europepmc/webservices/rest/{pmcid}/supplementaryFiles"
51
- # Returns base64-encoded images from supplementary materials
52
- ```
53
-
54
- ---
55
-
56
- ## Recommended Improvements
57
-
58
- ### Phase 1: Rate Limiting (Critical)
59
-
60
- ```python
61
- # Add to src/tools/pubmed.py
62
- from limits import parse
63
- from limits.storage import MemoryStorage
64
- from limits.strategies import MovingWindowRateLimiter
65
-
66
- storage = MemoryStorage()
67
- limiter = MovingWindowRateLimiter(storage)
68
-
69
- # With NCBI_API_KEY: 10/sec, without: 3/sec
70
- def get_rate_limit():
71
- if settings.ncbi_api_key:
72
- return parse("10/second")
73
- return parse("3/second")
74
- ```
75
-
76
- **Dependencies**: `pip install limits`
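A minimal usage sketch for the limiter above: `hit()` is the moving-window check from `limits` and returns False when the window is full, so a small polling wrapper suffices (the polling interval is a judgment call):

```python
import asyncio

async def _wait_for_slot() -> None:
    """Block until the NCBI rate limit allows another request."""
    while not limiter.hit(get_rate_limit(), "pubmed"):
        await asyncio.sleep(0.1)  # back off until the moving window frees a slot

# Before every esearch/efetch call:
#     await _wait_for_slot()
#     resp = await client.get(ESEARCH_URL, params=params)
```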
77
-
78
- ### Phase 2: Full-Text Retrieval
79
-
80
- ```python
81
- async def get_fulltext(pmid: str) -> str | None:
82
- """Get full text for open-access papers via BioC API."""
83
- url = f"https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/pmcoa.cgi/BioC_json/{pmid}/unicode"
84
- # Only works for PMC papers (open access)
85
- ```
86
-
87
- ### Phase 3: PMC ID Resolution
88
-
89
- ```python
90
- async def get_pmc_id(pmid: str) -> str | None:
91
- """Convert PMID to PMCID for full-text access."""
92
- url = f"https://www.ncbi.nlm.nih.gov/pmc/utils/idconv/v1.0/?ids={pmid}&format=json"
93
- ```
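A hedged completion of the stub: the converter returns a `records` array, and `pmcid` is present only for articles deposited in PMC (JSON shape per the idconv docs, worth re-verifying):

```python
import httpx

async def get_pmc_id(pmid: str) -> str | None:
    """Convert PMID to PMCID for full-text access; None if the paper is not in PMC."""
    url = "https://www.ncbi.nlm.nih.gov/pmc/utils/idconv/v1.0/"
    async with httpx.AsyncClient() as client:
        resp = await client.get(url, params={"ids": pmid, "format": "json"})
        resp.raise_for_status()
    records = resp.json().get("records", [])
    return records[0].get("pmcid") if records else None
```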
94
-
95
- ---
96
-
97
- ## Python Libraries to Consider
98
-
99
- | Library | Purpose | Notes |
100
- |---------|---------|-------|
101
- | [Biopython](https://biopython.org/) | `Bio.Entrez` module | Official, well-maintained |
102
- | [PyMed](https://pypi.org/project/pymed/) | PubMed wrapper | Simpler API, less control |
103
- | [metapub](https://pypi.org/project/metapub/) | Full-featured | Tested on 1/3 of PubMed |
104
- | [limits](https://pypi.org/project/limits/) | Rate limiting | Used by reference repo |
105
-
106
- ---
107
-
108
- ## API Endpoints Reference
109
-
110
- | Endpoint | Purpose | Rate Limit |
111
- |----------|---------|------------|
112
- | `esearch.fcgi` | Search for PMIDs | 3/sec (10 with key) |
113
- | `efetch.fcgi` | Fetch metadata | 3/sec (10 with key) |
114
- | `esummary.fcgi` | Quick metadata | 3/sec (10 with key) |
115
- | `pmcoa.cgi/BioC_json` | Full text (PMC only) | Unknown |
116
- | `idconv/v1.0` | PMID ↔ PMCID | Unknown |
117
-
118
- ---
119
-
120
- ## Sources
121
-
122
- - [PubMed E-utilities Documentation](https://www.ncbi.nlm.nih.gov/books/NBK25501/)
123
- - [NCBI BioC API](https://www.ncbi.nlm.nih.gov/research/bionlp/APIs/)
124
- - [Searching PubMed with Python](https://marcobonzanini.com/2015/01/12/searching-pubmed-with-python/)
125
- - [PyMed on PyPI](https://pypi.org/project/pymed/)
docs/brainstorming/02_CLINICALTRIALS_IMPROVEMENTS.md DELETED
@@ -1,193 +0,0 @@
1
- # ClinicalTrials.gov Tool: Current State & Future Improvements
2
-
3
- **Status**: Currently Implemented
4
- **Priority**: High (Core Data Source for Drug Repurposing)
5
-
6
- ---
7
-
8
- ## Current Implementation
9
-
10
- ### What We Have (`src/tools/clinicaltrials.py`)
11
-
12
- - V2 API search via `clinicaltrials.gov/api/v2/studies`
13
- - Filters: `INTERVENTIONAL` study type, `RECRUITING` status
14
- - Returns: NCT ID, title, conditions, interventions, phase, status
15
- - Query preprocessing via shared `query_utils.py`
16
-
17
- ### Current Strengths
18
-
19
- 1. **Good Filtering**: Already filtering for interventional + recruiting
20
- 2. **V2 API**: Using the modern API (v1 deprecated)
21
- 3. **Phase Info**: Extracting trial phases for drug development context
22
-
23
- ### Current Limitations
24
-
25
- 1. **No Outcome Data**: Missing primary/secondary outcomes
26
- 2. **No Eligibility Criteria**: Missing inclusion/exclusion details
27
- 3. **No Sponsor Info**: Missing who's running the trial
28
- 4. **No Result Data**: For completed trials, no efficacy data
29
- 5. **Limited Drug Mapping**: No integration with drug databases
30
-
31
- ---
32
-
33
- ## API Capabilities We're Not Using
34
-
35
- ### Fields We Could Request
36
-
37
- ```python
38
- # Current fields
39
- fields = ["NCTId", "BriefTitle", "Condition", "InterventionName", "Phase", "OverallStatus"]
40
-
41
- # Additional valuable fields
42
- additional_fields = [
43
- "PrimaryOutcomeMeasure", # What are they measuring?
44
- "SecondaryOutcomeMeasure", # Secondary endpoints
45
- "EligibilityCriteria", # Who can participate?
46
- "LeadSponsorName", # Who's funding?
47
- "ResultsFirstPostDate", # Has results?
48
- "StudyFirstPostDate", # When started?
49
- "CompletionDate", # When finished?
50
- "EnrollmentCount", # Sample size
51
- "InterventionDescription", # Drug details
52
- "ArmGroupLabel", # Treatment arms
53
- "InterventionOtherName", # Drug aliases
54
- ]
55
- ```
56
-
57
- ### Filter Enhancements
58
-
59
- ```python
60
- # Current
61
- aggFilters = "studyType:INTERVENTIONAL,status:RECRUITING"
62
-
63
- # Could add
64
- "status:RECRUITING,ACTIVE_NOT_RECRUITING,COMPLETED" # Include completed for results
65
- "phase:PHASE2,PHASE3" # Only later-stage trials
66
- "resultsFirstPostDateRange:2020-01-01_" # Trials with posted results
67
- ```
68
-
69
- ---
70
-
71
- ## Recommended Improvements
72
-
73
- ### Phase 1: Richer Metadata
74
-
75
- ```python
76
- EXTENDED_FIELDS = [
77
- "NCTId",
78
- "BriefTitle",
79
- "OfficialTitle",
80
- "Condition",
81
- "InterventionName",
82
- "InterventionDescription",
83
- "InterventionOtherName", # Drug synonyms!
84
- "Phase",
85
- "OverallStatus",
86
- "PrimaryOutcomeMeasure",
87
- "EnrollmentCount",
88
- "LeadSponsorName",
89
- "StudyFirstPostDate",
90
- ]
91
- ```
92
-
93
- ### Phase 2: Results Retrieval
94
-
95
- For completed trials, we can get actual efficacy data:
96
-
97
- ```python
98
- async def get_trial_results(nct_id: str) -> dict | None:
99
- """Fetch results for completed trials."""
100
- url = f"https://clinicaltrials.gov/api/v2/studies/{nct_id}"
101
- params = {
102
- "fields": "ResultsSection",
103
- }
104
- # Returns outcome measures and statistics
105
- ```
106
-
107
- ### Phase 3: Drug Name Normalization
108
-
109
- Map intervention names to standard identifiers:
110
-
111
- ```python
112
- # Problem: "Metformin", "Metformin HCl", "Glucophage" are the same drug
113
- # Solution: Use RxNorm or DrugBank for normalization
114
-
115
- async def normalize_drug_name(intervention: str) -> str:
116
- """Normalize drug name via RxNorm API."""
117
- url = f"https://rxnav.nlm.nih.gov/REST/rxcui.json?name={intervention}"
118
- # Returns standardized RxCUI
119
- ```
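A hedged completion of the stub: the endpoint responds with an `idGroup` holding zero or more `rxnormId` values (key names per the RxNorm docs, worth double-checking):

```python
import httpx

async def normalize_drug_name(intervention: str) -> str | None:
    """Normalize a drug name to an RxCUI via RxNorm; None if no match."""
    url = "https://rxnav.nlm.nih.gov/REST/rxcui.json"
    async with httpx.AsyncClient() as client:
        resp = await client.get(url, params={"name": intervention})
        resp.raise_for_status()
    ids = resp.json().get("idGroup", {}).get("rxnormId", [])
    return ids[0] if ids else None
```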
120
-
121
- ---
122
-
123
- ## Integration Opportunities
124
-
125
- ### With PubMed
126
-
127
- Cross-reference trials with publications:
128
- ```python
129
- # ClinicalTrials.gov provides PMID links
130
- # Can correlate trial results with published papers
131
- ```
132
-
133
- ### With DrugBank/ChEMBL
134
-
135
- Map interventions to:
136
- - Mechanism of action
137
- - Known targets
138
- - Adverse effects
139
- - Drug-drug interactions
140
-
141
- ---
142
-
143
- ## Python Libraries to Consider
144
-
145
- | Library | Purpose | Notes |
146
- |---------|---------|-------|
147
- | [pytrials](https://pypi.org/project/pytrials/) | CT.gov wrapper | V2 API support unclear |
148
- | [clinicaltrials](https://github.com/ebmdatalab/clinicaltrials-act-tracker) | Data tracking | More for analysis |
149
- | [drugbank-downloader](https://pypi.org/project/drugbank-downloader/) | Drug mapping | Requires license |
150
-
151
- ---
152
-
153
- ## API Quirks & Gotchas
154
-
155
- 1. **Rate Limiting**: Undocumented, be conservative
156
- 2. **Pagination**: Max 1000 results per request
157
- 3. **Field Names**: Case-sensitive, camelCase
158
- 4. **Empty Results**: Some fields may be null even if requested
159
- 5. **Status Changes**: Trials change status frequently
160
-
161
- ---
162
-
163
- ## Example Enhanced Query
164
-
165
- ```python
166
- async def search_drug_repurposing_trials(
167
- drug_name: str,
168
- condition: str,
169
- include_completed: bool = True,
170
- ) -> list[Evidence]:
171
- """Search for trials repurposing a drug for a new condition."""
172
-
173
- statuses = ["RECRUITING", "ACTIVE_NOT_RECRUITING"]
174
- if include_completed:
175
- statuses.append("COMPLETED")
176
-
177
- params = {
178
- "query.intr": drug_name,
179
- "query.cond": condition,
180
- "filter.overallStatus": ",".join(statuses),
181
- "filter.studyType": "INTERVENTIONAL",
182
- "fields": ",".join(EXTENDED_FIELDS),
183
- "pageSize": 50,
184
- }
185
- ```
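The example stops at parameter construction. A hedged continuation, assuming the v2 response nests hits under a top-level `studies` key:

```python
import httpx

async def _run_query(params: dict) -> list[dict]:
    """Execute the query built above and unpack the hits (mapping to Evidence not shown)."""
    async with httpx.AsyncClient() as client:
        resp = await client.get(
            "https://clinicaltrials.gov/api/v2/studies", params=params
        )
        resp.raise_for_status()
    # v2 responses nest results under a top-level "studies" key.
    return resp.json().get("studies", [])
```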
186
-
187
- ---
188
-
189
- ## Sources
190
-
191
- - [ClinicalTrials.gov API Documentation](https://clinicaltrials.gov/data-api/api)
192
- - [CT.gov Field Definitions](https://clinicaltrials.gov/data-api/about-api/study-data-structure)
193
- - [RxNorm API](https://lhncbc.nlm.nih.gov/RxNav/APIs/api-RxNorm.findRxcuiByString.html)
docs/brainstorming/03_EUROPEPMC_IMPROVEMENTS.md DELETED
@@ -1,211 +0,0 @@
1
- # Europe PMC Tool: Current State & Future Improvements
2
-
3
- **Status**: Currently Implemented (Replaced bioRxiv)
4
- **Priority**: High (Preprint + Open Access Source)
5
-
6
- ---
7
-
8
- ## Why Europe PMC Over bioRxiv?
9
-
10
- ### bioRxiv API Limitations (Why We Abandoned It)
11
-
12
- 1. **No Search API**: Only returns papers by date range or DOI
13
- 2. **No Query Capability**: Cannot search for "metformin cancer"
14
- 3. **Workaround Required**: Would need to download ALL preprints and build local search
15
- 4. **Known Issue**: [Gradio Issue #8861](https://github.com/gradio-app/gradio/issues/8861) documents the limitation
16
-
17
- ### Europe PMC Advantages
18
-
19
- 1. **Full Search API**: Boolean queries, filters, facets
20
- 2. **Aggregates bioRxiv**: Includes bioRxiv, medRxiv content anyway
21
- 3. **Includes PubMed**: Also has MEDLINE content
22
- 4. **34 Preprint Servers**: Not just bioRxiv
23
- 5. **Open Access Focus**: Full-text when available
24
-
25
- ---
26
-
27
- ## Current Implementation
28
-
29
- ### What We Have (`src/tools/europepmc.py`)
30
-
31
- - REST API search via `europepmc.org/webservices/rest/search`
32
- - Preprint flagging via `firstPublicationDate` heuristics
33
- - Returns: title, abstract, authors, DOI, source
34
- - Marks preprints for transparency
35
-
36
- ### Current Limitations
37
-
38
- 1. **No Full-Text Retrieval**: Only metadata/abstracts
39
- 2. **No Citation Network**: Missing references/citations
40
- 3. **No Supplementary Files**: Not fetching figures/data
41
- 4. **Basic Preprint Detection**: Heuristic, not explicit flag
42
-
43
- ---
44
-
45
- ## Europe PMC API Capabilities
46
-
47
- ### Endpoints We Could Use
48
-
49
- | Endpoint | Purpose | Currently Using |
50
- |----------|---------|-----------------|
51
- | `/search` | Query papers | Yes |
52
- | `/fulltext/{ID}` | Full text (XML/JSON) | No |
53
- | `/{PMCID}/supplementaryFiles` | Figures, data | No |
54
- | `/citations/{ID}` | Who cited this | No |
55
- | `/references/{ID}` | What this cites | No |
56
- | `/annotations` | Text-mined entities | No |
57
-
58
- ### Rich Query Syntax
59
-
60
- ```python
61
- # Current simple query
62
- query = "metformin cancer"
63
-
64
- # Could use advanced syntax
65
- query = "(TITLE:metformin OR ABSTRACT:metformin) AND (cancer OR oncology)"
66
- query += " AND (SRC:PPR)" # Only preprints
67
- query += " AND (FIRST_PDATE:[2023-01-01 TO 2024-12-31])" # Date range
68
- query += " AND (OPEN_ACCESS:y)" # Only open access
69
- ```
70
-
71
- ### Source Filters
72
-
73
- ```python
74
- # Filter by source
75
- "SRC:MED" # MEDLINE
76
- "SRC:PMC" # PubMed Central
77
- "SRC:PPR" # Preprints (bioRxiv, medRxiv, etc.)
78
- "SRC:AGR" # Agricola
79
- "SRC:CBA" # Chinese Biological Abstracts
80
- ```
81
-
82
- ---
83
-
84
- ## Recommended Improvements
85
-
86
- ### Phase 1: Rich Metadata
87
-
88
- ```python
89
- # Add to search results
90
- additional_fields = [
91
- "citedByCount", # Impact indicator
92
- "source", # Explicit source (MED, PMC, PPR)
93
- "isOpenAccess", # Boolean flag
94
- "fullTextUrlList", # URLs for full text
95
- "authorAffiliations", # Institution info
96
- "grantsList", # Funding info
97
- ]
98
- ```
99
-
100
- ### Phase 2: Full-Text Retrieval
101
-
102
- ```python
103
- async def get_fulltext(pmcid: str) -> str | None:
104
- """Get full text for open access papers."""
105
- # XML format
106
- url = f"https://www.ebi.ac.uk/europepmc/webservices/rest/{pmcid}/fullTextXML"
107
- # Or JSON
108
- url = f"https://www.ebi.ac.uk/europepmc/webservices/rest/{pmcid}/fullTextJSON"
109
- ```
110
-
111
- ### Phase 3: Citation Network
112
-
113
- ```python
114
- async def get_citations(pmcid: str) -> list[str]:
115
- """Get papers that cite this one."""
116
- url = f"https://www.ebi.ac.uk/europepmc/webservices/rest/{pmcid}/citations"
117
-
118
- async def get_references(pmcid: str) -> list[str]:
119
- """Get papers this one cites."""
120
- url = f"https://www.ebi.ac.uk/europepmc/webservices/rest/{pmcid}/references"
121
- ```
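A hedged completion of `get_citations`: the real endpoint path includes a source prefix such as `MED` or `PMC`, and the JSON key names below should be checked against the Articles API docs:

```python
import httpx

async def get_citations(pmcid: str) -> list[str]:
    """Return IDs of papers that cite the given PMC article."""
    url = f"https://www.ebi.ac.uk/europepmc/webservices/rest/PMC/{pmcid}/citations"
    async with httpx.AsyncClient() as client:
        resp = await client.get(url, params={"format": "json", "pageSize": 100})
        resp.raise_for_status()
    citations = resp.json().get("citationList", {}).get("citation", [])
    return [str(c["id"]) for c in citations if "id" in c]
```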
122
-
123
- ### Phase 4: Text-Mined Annotations
124
-
125
- Europe PMC extracts entities automatically:
126
-
127
- ```python
128
- async def get_annotations(pmcid: str) -> dict:
129
- """Get text-mined entities (genes, diseases, drugs)."""
130
- url = f"https://www.ebi.ac.uk/europepmc/annotations_api/annotationsByArticleIds"
131
- params = {
132
- "articleIds": f"PMC:{pmcid}",
133
- "type": "Gene_Proteins,Diseases,Chemicals",
134
- "format": "JSON",
135
- }
136
- # Returns structured entity mentions with positions
137
- ```
138
-
139
- ---
140
-
141
- ## Supplementary File Retrieval
142
-
143
- From reference repo (`bioinformatics_tools.py` lines 123-149):
144
-
145
- ```python
146
- def get_figures(pmcid: str) -> dict[str, str]:
147
- """Download figures and supplementary files."""
148
- url = f"https://www.ebi.ac.uk/europepmc/webservices/rest/{pmcid}/supplementaryFiles?includeInlineImage=true"
149
- # Downloads the supplementary ZIP; images are returned base64-encoded
150
- ```
151
-
152
- ---
153
-
154
- ## Preprint-Specific Features
155
-
156
- ### Identify Preprint Servers
157
-
158
- ```python
159
- PREPRINT_SOURCES = {
160
- "PPR": "General preprints",
161
- "bioRxiv": "Biology preprints",
162
- "medRxiv": "Medical preprints",
163
- "chemRxiv": "Chemistry preprints",
164
- "Research Square": "Multi-disciplinary",
165
- "Preprints.org": "MDPI preprints",
166
- }
167
-
168
- # Check if published version exists
169
- async def check_published_version(preprint_doi: str) -> str | None:
170
- """Check if preprint has been peer-reviewed and published."""
171
- # Europe PMC links preprints to final versions
172
- ```
173
-
174
- ---
175
-
176
- ## Rate Limiting
177
-
178
- Europe PMC is more generous than NCBI:
179
-
180
- ```python
181
- # No documented hard limit, but be respectful
182
- # Recommend: 10-20 requests/second max
183
- # Use email in User-Agent for polite pool
184
- headers = {
185
- "User-Agent": "DeepCritical/1.0 (mailto:your@email.com)"
186
- }
187
- ```
188
-
189
- ---
190
-
191
- ## vs. The Lens & OpenAlex
192
-
193
- | Feature | Europe PMC | The Lens | OpenAlex |
194
- |---------|------------|----------|----------|
195
- | Biomedical Focus | Yes | Partial | Partial |
196
- | Preprints | Yes (34 servers) | Yes | Yes |
197
- | Full Text | PMC papers | Links | No |
198
- | Citations | Yes | Yes | Yes |
199
- | Annotations | Yes (text-mined) | No | No |
200
- | Rate Limits | Generous | Moderate | Very generous |
201
- | API Key | Optional | Required | Optional |
202
-
203
- ---
204
-
205
- ## Sources
206
-
207
- - [Europe PMC REST API](https://europepmc.org/RestfulWebService)
208
- - [Europe PMC Annotations API](https://europepmc.org/AnnotationsApi)
209
- - [Europe PMC Articles API](https://europepmc.org/ArticlesApi)
210
- - [rOpenSci medrxivr](https://docs.ropensci.org/medrxivr/)
211
- - [bioRxiv TDM Resources](https://www.biorxiv.org/tdm)
docs/brainstorming/04_OPENALEX_INTEGRATION.md DELETED
@@ -1,303 +0,0 @@
1
- # OpenAlex Integration: The Missing Piece?
2
-
3
- **Status**: NOT Implemented (Candidate for Addition)
4
- **Priority**: HIGH - Could Replace Multiple Tools
5
- **Reference**: Already implemented in `reference_repos/DeepCritical`
6
-
7
- ---
8
-
9
- ## What is OpenAlex?
10
-
11
- OpenAlex is a **fully open** index of the global research system:
12
-
13
- - **209M+ works** (papers, books, datasets)
14
- - **2B+ author records** (disambiguated)
15
- - **124K+ venues** (journals, repositories)
16
- - **109K+ institutions**
17
- - **65K+ concepts** (hierarchical, linked to Wikidata)
18
-
19
- **Free. Open. No API key required.**
20
-
21
- ---
22
-
23
- ## Why OpenAlex for DeepCritical?
24
-
25
- ### Current Architecture
26
-
27
- ```
28
- User Query
29
-
30
- ┌──────────────────────────────────────┐
31
- │ PubMed ClinicalTrials Europe PMC │ ← 3 separate APIs
32
- └──────────────────────────────────────┘
33
-
34
- Orchestrator (deduplicate, judge, synthesize)
35
- ```
36
-
37
- ### With OpenAlex
38
-
39
- ```
40
- User Query
41
-
42
- ┌──────────────────────────────────────┐
43
- │ OpenAlex │ ← Single API
44
- │ (includes PubMed + preprints + │
45
- │ citations + concepts + authors) │
46
- └──────────────────────────────────────┘
47
-
48
- Orchestrator (enrich with CT.gov for trials)
49
- ```
50
-
51
- **OpenAlex already aggregates**:
52
- - PubMed/MEDLINE
53
- - Crossref
54
- - ORCID
55
- - Unpaywall (open access links)
56
- - Microsoft Academic Graph (legacy)
57
- - Preprint servers
58
-
59
- ---
60
-
61
- ## Reference Implementation
62
-
63
- From `reference_repos/DeepCritical/DeepResearch/src/tools/openalex_tools.py`:
64
-
65
- ```python
66
- class OpenAlexFetchTool(ToolRunner):
67
- def __init__(self):
68
- super().__init__(
69
- ToolSpec(
70
- name="openalex_fetch",
71
- description="Fetch OpenAlex work or author",
72
- inputs={"entity": "TEXT", "identifier": "TEXT"},
73
- outputs={"result": "JSON"},
74
- )
75
- )
76
-
77
- def run(self, params: dict[str, Any]) -> ExecutionResult:
78
- entity = params["entity"] # "works", "authors", "venues"
79
- identifier = params["identifier"]
80
- base = "https://api.openalex.org"
81
- url = f"{base}/{entity}/{identifier}"
82
- resp = requests.get(url, timeout=30)
83
- return ExecutionResult(success=True, data={"result": resp.json()})
84
- ```
85
-
86
- ---
87
-
88
- ## OpenAlex API Features
89
-
90
- ### Search Works (Papers)
91
-
92
- ```python
93
- # Search for metformin + cancer papers
94
- url = "https://api.openalex.org/works"
95
- params = {
96
- "search": "metformin cancer drug repurposing",
97
- "filter": "publication_year:>2020,type:article",
98
- "sort": "cited_by_count:desc",
99
- "per_page": 50,
100
- }
101
- ```
102
-
103
- ### Rich Filtering
104
-
105
- ```python
106
- # Filter examples
107
- "publication_year:2023"
108
- "type:article" # vs preprint, book, etc.
109
- "is_oa:true" # Open access only
110
- "concepts.id:C71924100" # Papers about "Medicine"
111
- "authorships.institutions.id:I27837315" # From Harvard
112
- "cited_by_count:>100" # Highly cited
113
- "has_fulltext:true" # Full text available
114
- ```
115
-
116
- ### What You Get Back
117
-
118
- ```json
119
- {
120
- "id": "W2741809807",
121
- "title": "Metformin: A candidate drug for...",
122
- "publication_year": 2023,
123
- "type": "article",
124
- "cited_by_count": 45,
125
- "is_oa": true,
126
- "primary_location": {
127
- "source": {"display_name": "Nature Medicine"},
128
- "pdf_url": "https://...",
129
- "landing_page_url": "https://..."
130
- },
131
- "concepts": [
132
- {"id": "C71924100", "display_name": "Medicine", "score": 0.95},
133
- {"id": "C54355233", "display_name": "Pharmacology", "score": 0.88}
134
- ],
135
- "authorships": [
136
- {
137
- "author": {"id": "A123", "display_name": "John Smith"},
138
- "institutions": [{"display_name": "Harvard Medical School"}]
139
- }
140
- ],
141
- "referenced_works": ["W123", "W456"], # Citations
142
- "related_works": ["W789", "W012"] # Similar papers
143
- }
144
- ```
145
-
146
- ---
147
-
148
- ## Key Advantages Over Current Tools
149
-
150
- ### 1. Citation Network (We Don't Have This!)
151
-
152
- ```python
153
- # Get papers that cite a work
154
- url = f"https://api.openalex.org/works?filter=cites:{work_id}"
155
-
156
- # Get papers cited by a work
157
- # Already in `referenced_works` field
158
- ```
159
-
160
- ### 2. Concept Tagging (We Don't Have This!)
161
-
162
- OpenAlex auto-tags papers with hierarchical concepts:
163
- - "Medicine" → "Pharmacology" → "Drug Repurposing"
164
- - Can search by concept, not just keywords
165
-
166
- ### 3. Author Disambiguation (We Don't Have This!)
167
-
168
- ```python
169
- # Find all works by an author
170
- url = f"https://api.openalex.org/works?filter=authorships.author.id:{author_id}"
171
- ```
172
-
173
- ### 4. Institution Tracking
174
-
175
- ```python
176
- # Find drug repurposing papers from top institutions
177
- url = "https://api.openalex.org/works"
178
- params = {
179
- "search": "drug repurposing",
180
- "filter": "authorships.institutions.id:I27837315", # Harvard
181
- }
182
- ```
183
-
184
- ### 5. Related Works
185
-
186
- Each paper comes with `related_works` - semantically similar papers discovered by OpenAlex's ML.
187
-
188
- ---
189
-
190
- ## Proposed Implementation
191
-
192
- ### New Tool: `src/tools/openalex.py`
193
-
194
- ```python
195
- """OpenAlex search tool for comprehensive scholarly data."""
196
-
197
- import httpx
198
- from src.tools.base import SearchTool
199
- from src.utils.models import Evidence
200
-
201
- class OpenAlexTool(SearchTool):
202
- """Search OpenAlex for scholarly works with rich metadata."""
203
-
204
- name = "openalex"
205
-
206
- async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
207
- async with httpx.AsyncClient() as client:
208
- resp = await client.get(
209
- "https://api.openalex.org/works",
210
- params={
211
- "search": query,
212
- "filter": "type:article,is_oa:true",
213
- "sort": "cited_by_count:desc",
214
- "per_page": max_results,
215
- "mailto": "deepcritical@example.com", # Polite pool
216
- },
217
- )
218
- data = resp.json()
219
-
220
- return [
221
- Evidence(
222
- source="openalex",
223
- title=work["title"],
224
- abstract=_reconstruct_abstract(work),  # OpenAlex has no plain "abstract" field; see helper sketch below
225
- url=(work.get("primary_location") or {}).get("landing_page_url", ""),  # primary_location can be null
226
- metadata={
227
- "cited_by_count": work["cited_by_count"],
228
- "concepts": [c["display_name"] for c in work["concepts"][:5]],
229
- "is_open_access": work["is_oa"],
230
- "pdf_url": work["primary_location"].get("pdf_url"),
231
- },
232
- )
233
- for work in data["results"]
234
- ]
235
- ```
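One gap in the sketch above: OpenAlex does not return plain abstracts, only `abstract_inverted_index` ({word: [positions]}). A small helper (the name `_reconstruct_abstract` is hypothetical) can flatten it:

```python
def _reconstruct_abstract(work: dict) -> str:
    """Rebuild plain text from OpenAlex's abstract_inverted_index."""
    inverted = work.get("abstract_inverted_index")
    if not inverted:
        return ""
    by_position: dict[int, str] = {}
    for word, positions in inverted.items():
        for pos in positions:
            by_position[pos] = word
    return " ".join(word for _, word in sorted(by_position.items()))
```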
236
-
237
- ---
238
-
239
- ## Rate Limits
240
-
241
- OpenAlex is **extremely generous**:
242
-
243
- - No hard rate limit documented
244
- - Recommended: <100,000 requests/day
245
- - **Polite pool**: Add `mailto=your@email.com` param for faster responses
246
- - No API key required (optional for priority support)
247
-
248
- ---
249
-
250
- ## Should We Add OpenAlex?
251
-
252
- ### Arguments FOR
253
-
254
- 1. **Already in reference repo** - proven pattern
255
- 2. **Richer data** - citations, concepts, authors
256
- 3. **Single source** - reduces API complexity
257
- 4. **Free & open** - no keys, no limits
258
- 5. **Institution adoption** - Leiden, Sorbonne switched to it
259
-
260
- ### Arguments AGAINST
261
-
262
- 1. **Adds complexity** - another data source
263
- 2. **Overlap** - duplicates some PubMed data
264
- 3. **Not biomedical-focused** - covers all disciplines
265
- 4. **No full text** - still need PMC/Europe PMC for that
266
-
267
- ### Recommendation
268
-
269
- **Add OpenAlex as a 4th source**, don't replace existing tools.
270
-
271
- Use it for:
272
- - Citation network analysis
273
- - Concept-based discovery
274
- - High-impact paper finding
275
- - Author/institution tracking
276
-
277
- Keep PubMed, ClinicalTrials, Europe PMC for:
278
- - Authoritative biomedical search
279
- - Clinical trial data
280
- - Full-text access
281
- - Preprint tracking
282
-
283
- ---
284
-
285
- ## Implementation Priority
286
-
287
- | Task | Effort | Value |
288
- |------|--------|-------|
289
- | Basic search | Low | High |
290
- | Citation network | Medium | Very High |
291
- | Concept filtering | Low | High |
292
- | Related works | Low | High |
293
- | Author tracking | Medium | Medium |
294
-
295
- ---
296
-
297
- ## Sources
298
-
299
- - [OpenAlex Documentation](https://docs.openalex.org)
300
- - [OpenAlex API Overview](https://docs.openalex.org/api)
301
- - [OpenAlex Wikipedia](https://en.wikipedia.org/wiki/OpenAlex)
302
- - [Leiden University Announcement](https://www.leidenranking.com/information/openalex)
303
- - [OpenAlex: A fully-open index (Paper)](https://arxiv.org/abs/2205.01833)
docs/brainstorming/implementation/15_PHASE_OPENALEX.md DELETED
@@ -1,603 +0,0 @@
- # Phase 15: OpenAlex Integration
-
- **Priority**: HIGH - Biggest bang for the buck
- **Effort**: ~2-3 hours
- **Dependencies**: None (existing codebase patterns sufficient)
-
- ---
-
- ## Prerequisites (COMPLETED)
-
- The following model changes have been implemented to support this integration:
-
- 1. **`SourceName` Literal Updated** (`src/utils/models.py:9`)
-    ```python
-    SourceName = Literal["pubmed", "clinicaltrials", "europepmc", "preprint", "openalex"]
-    ```
-    - Without this, `source="openalex"` would fail Pydantic validation
-
- 2. **`Evidence.metadata` Field Added** (`src/utils/models.py:39-42`)
-    ```python
-    metadata: dict[str, Any] = Field(
-        default_factory=dict,
-        description="Additional metadata (e.g., cited_by_count, concepts, is_open_access)",
-    )
-    ```
-    - Required for storing `cited_by_count`, `concepts`, etc.
-    - The model is still frozen - metadata must be passed at construction time (see the sketch after this list)
-
- 3. **`__init__.py` Exports Updated** (`src/tools/__init__.py`)
-    - All tools are now exported: `ClinicalTrialsTool`, `EuropePMCTool`, `PubMedTool`
-    - `OpenAlexTool` should be added here after implementation
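-
- A minimal sketch of the construction-time requirement (field values are placeholders; the `Citation` fields mirror the ones used in `_to_evidence` below):
-
- ```python
- from src.utils.models import Citation, Evidence
-
- evidence = Evidence(
-     content="Metformin shows anticancer effects...",
-     citation=Citation(
-         source="openalex",
-         title="Example title",
-         url="https://openalex.org/W2741809807",
-         date="2023",
-         authors=["John Smith"],
-     ),
-     relevance=0.9,
-     metadata={"cited_by_count": 45},  # must be supplied here
- )
- # evidence.metadata = {}  # would raise a ValidationError: the model is frozen
- ```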
-
- ---
-
- ## Overview
-
- Add OpenAlex as a 4th data source for comprehensive scholarly data including:
- - Citation networks (who cites whom)
- - Concept tagging (hierarchical topic classification)
- - Author disambiguation
- - 209M+ works indexed
-
- **Why OpenAlex?**
- - Free, no API key required
- - Already implemented in reference repo
- - Provides citation data we don't have
- - Aggregates PubMed + preprints + more
-
- ---
-
- ## TDD Implementation Plan
-
- ### Step 1: Write the Tests First
-
- **File**: `tests/unit/tools/test_openalex.py`
-
- ```python
- """Tests for OpenAlex search tool."""
-
- import pytest
- import respx
- from httpx import Response
-
- from src.tools.openalex import OpenAlexTool
- from src.utils.models import Evidence
-
-
- class TestOpenAlexTool:
-     """Test suite for OpenAlex search functionality."""
-
-     @pytest.fixture
-     def tool(self) -> OpenAlexTool:
-         return OpenAlexTool()
-
-     def test_name_property(self, tool: OpenAlexTool) -> None:
-         """Tool should identify itself as 'openalex'."""
-         assert tool.name == "openalex"
-
-     @respx.mock
-     @pytest.mark.asyncio
-     async def test_search_returns_evidence(self, tool: OpenAlexTool) -> None:
-         """Search should return list of Evidence objects."""
-         mock_response = {
-             "results": [
-                 {
-                     "id": "W2741809807",
-                     "title": "Metformin and cancer: A systematic review",
-                     "publication_year": 2023,
-                     "cited_by_count": 45,
-                     "type": "article",
-                     "is_oa": True,
-                     "primary_location": {
-                         "source": {"display_name": "Nature Medicine"},
-                         "landing_page_url": "https://doi.org/10.1038/example",
-                         "pdf_url": None,
-                     },
-                     "abstract_inverted_index": {
-                         "Metformin": [0],
-                         "shows": [1],
-                         "anticancer": [2],
-                         "effects": [3],
-                     },
-                     "concepts": [
-                         {"display_name": "Medicine", "score": 0.95},
-                         {"display_name": "Oncology", "score": 0.88},
-                     ],
-                     "authorships": [
-                         {
-                             "author": {"display_name": "John Smith"},
-                             "institutions": [{"display_name": "Harvard"}],
-                         }
-                     ],
-                 }
-             ]
-         }
-
-         respx.get("https://api.openalex.org/works").mock(
-             return_value=Response(200, json=mock_response)
-         )
-
-         results = await tool.search("metformin cancer", max_results=10)
-
-         assert len(results) == 1
-         assert isinstance(results[0], Evidence)
-         assert "Metformin and cancer" in results[0].citation.title
-         assert results[0].citation.source == "openalex"
-
-     @respx.mock
-     @pytest.mark.asyncio
-     async def test_search_empty_results(self, tool: OpenAlexTool) -> None:
-         """Search with no results should return empty list."""
-         respx.get("https://api.openalex.org/works").mock(
-             return_value=Response(200, json={"results": []})
-         )
-
-         results = await tool.search("xyznonexistentquery123")
-         assert results == []
-
-     @respx.mock
-     @pytest.mark.asyncio
-     async def test_search_handles_missing_abstract(self, tool: OpenAlexTool) -> None:
-         """Tool should handle papers without abstracts."""
-         mock_response = {
-             "results": [
-                 {
-                     "id": "W123",
-                     "title": "Paper without abstract",
-                     "publication_year": 2023,
-                     "cited_by_count": 10,
-                     "type": "article",
-                     "is_oa": False,
-                     "primary_location": {
-                         "source": {"display_name": "Journal"},
-                         "landing_page_url": "https://example.com",
-                     },
-                     "abstract_inverted_index": None,
-                     "concepts": [],
-                     "authorships": [],
-                 }
-             ]
-         }
-
-         respx.get("https://api.openalex.org/works").mock(
-             return_value=Response(200, json=mock_response)
-         )
-
-         results = await tool.search("test query")
-         assert len(results) == 1
-         assert results[0].content == ""  # No abstract
-
-     @respx.mock
-     @pytest.mark.asyncio
-     async def test_search_extracts_citation_count(self, tool: OpenAlexTool) -> None:
-         """Citation count should be in metadata."""
-         mock_response = {
-             "results": [
-                 {
-                     "id": "W456",
-                     "title": "Highly cited paper",
-                     "publication_year": 2020,
-                     "cited_by_count": 500,
-                     "type": "article",
-                     "is_oa": True,
-                     "primary_location": {
-                         "source": {"display_name": "Science"},
-                         "landing_page_url": "https://example.com",
-                     },
-                     "abstract_inverted_index": {"Test": [0]},
-                     "concepts": [],
-                     "authorships": [],
-                 }
-             ]
-         }
-
-         respx.get("https://api.openalex.org/works").mock(
-             return_value=Response(200, json=mock_response)
-         )
-
-         results = await tool.search("highly cited")
-         assert results[0].metadata["cited_by_count"] == 500
-
-     @respx.mock
-     @pytest.mark.asyncio
-     async def test_search_extracts_concepts(self, tool: OpenAlexTool) -> None:
-         """Concepts should be extracted for semantic discovery."""
-         mock_response = {
-             "results": [
-                 {
-                     "id": "W789",
-                     "title": "Drug repurposing study",
-                     "publication_year": 2023,
-                     "cited_by_count": 25,
-                     "type": "article",
-                     "is_oa": True,
-                     "primary_location": {
-                         "source": {"display_name": "PLOS ONE"},
-                         "landing_page_url": "https://example.com",
-                     },
-                     "abstract_inverted_index": {"Drug": [0], "repurposing": [1]},
-                     "concepts": [
-                         {"display_name": "Pharmacology", "score": 0.92},
-                         {"display_name": "Drug Discovery", "score": 0.85},
-                         {"display_name": "Medicine", "score": 0.80},
-                     ],
-                     "authorships": [],
-                 }
-             ]
-         }
-
-         respx.get("https://api.openalex.org/works").mock(
-             return_value=Response(200, json=mock_response)
-         )
-
-         results = await tool.search("drug repurposing")
-         assert "Pharmacology" in results[0].metadata["concepts"]
-         assert "Drug Discovery" in results[0].metadata["concepts"]
-
-     @respx.mock
-     @pytest.mark.asyncio
-     async def test_search_api_error_raises_search_error(
-         self, tool: OpenAlexTool
-     ) -> None:
-         """API errors should raise SearchError."""
-         from src.utils.exceptions import SearchError
-
-         respx.get("https://api.openalex.org/works").mock(
-             return_value=Response(500, text="Internal Server Error")
-         )
-
-         with pytest.raises(SearchError):
-             await tool.search("test query")
-
-     def test_reconstruct_abstract(self, tool: OpenAlexTool) -> None:
-         """Test abstract reconstruction from inverted index."""
-         inverted_index = {
-             "Metformin": [0, 5],
-             "is": [1],
-             "a": [2],
-             "diabetes": [3],
-             "drug": [4],
-             "effective": [6],
-         }
-         abstract = tool._reconstruct_abstract(inverted_index)
-         assert abstract == "Metformin is a diabetes drug Metformin effective"
- ```
-
- ---
-
- ### Step 2: Create the Implementation
-
- **File**: `src/tools/openalex.py`
-
- ```python
- """OpenAlex search tool for comprehensive scholarly data."""
-
- from typing import Any
-
- import httpx
- from tenacity import retry, stop_after_attempt, wait_exponential
-
- from src.utils.exceptions import SearchError
- from src.utils.models import Citation, Evidence
-
-
- class OpenAlexTool:
-     """
-     Search OpenAlex for scholarly works with rich metadata.
-
-     OpenAlex provides:
-     - 209M+ scholarly works
-     - Citation counts and networks
-     - Concept tagging (hierarchical)
-     - Author disambiguation
-     - Open access links
-
-     API Docs: https://docs.openalex.org/
-     """
-
-     BASE_URL = "https://api.openalex.org/works"
-
-     def __init__(self, email: str | None = None) -> None:
-         """
-         Initialize OpenAlex tool.
-
-         Args:
-             email: Optional email for polite pool (faster responses)
-         """
-         self.email = email or "deepcritical@example.com"
-
-     @property
-     def name(self) -> str:
-         return "openalex"
-
-     @retry(
-         stop=stop_after_attempt(3),
-         wait=wait_exponential(multiplier=1, min=1, max=10),
-         reraise=True,
-     )
-     async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
-         """
-         Search OpenAlex for scholarly works.
-
-         Args:
-             query: Search terms
-             max_results: Maximum results to return (max 200 per request)
-
-         Returns:
-             List of Evidence objects with citation metadata
-
-         Raises:
-             SearchError: If API request fails
-         """
-         params = {
-             "search": query,
-             "filter": "type:article",  # Only peer-reviewed articles
-             "sort": "cited_by_count:desc",  # Most cited first
-             "per_page": min(max_results, 200),
-             "mailto": self.email,  # Polite pool for faster responses
-         }
-
-         async with httpx.AsyncClient(timeout=30.0) as client:
-             try:
-                 response = await client.get(self.BASE_URL, params=params)
-                 response.raise_for_status()
-
-                 data = response.json()
-                 results = data.get("results", [])
-
-                 return [self._to_evidence(work) for work in results[:max_results]]
-
-             except httpx.HTTPStatusError as e:
-                 raise SearchError(f"OpenAlex API error: {e}") from e
-             except httpx.RequestError as e:
-                 raise SearchError(f"OpenAlex connection failed: {e}") from e
-
-     def _to_evidence(self, work: dict[str, Any]) -> Evidence:
-         """Convert OpenAlex work to Evidence object."""
-         title = work.get("title") or "Untitled"  # guards against null titles, not just missing keys
-         pub_year = work.get("publication_year") or "Unknown"
-         cited_by = work.get("cited_by_count", 0)
-         is_oa = work.get("is_oa", False)
-
-         # Reconstruct abstract from inverted index
-         abstract_index = work.get("abstract_inverted_index")
-         abstract = self._reconstruct_abstract(abstract_index) if abstract_index else ""
-
-         # Extract concepts (top 5)
-         concepts = [
-             c.get("display_name", "")
-             for c in work.get("concepts", [])[:5]
-             if c.get("display_name")
-         ]
-
-         # Extract authors (top 5)
-         authorships = work.get("authorships", [])
-         authors = [
-             a.get("author", {}).get("display_name", "")
-             for a in authorships[:5]
-             if a.get("author", {}).get("display_name")
-         ]
-
-         # Get URL
-         primary_loc = work.get("primary_location") or {}
-         url = primary_loc.get("landing_page_url", "")
-         if not url:
-             # Fallback to OpenAlex page
-             work_id = work.get("id", "").replace("https://openalex.org/", "")
-             url = f"https://openalex.org/{work_id}"
-
-         return Evidence(
-             content=abstract[:2000],
-             citation=Citation(
-                 source="openalex",
-                 title=title[:500],
-                 url=url,
-                 date=str(pub_year),
-                 authors=authors,
-             ),
-             relevance=min(0.9, 0.5 + (cited_by / 1000)),  # Boost by citations
-             metadata={
-                 "cited_by_count": cited_by,
-                 "is_open_access": is_oa,
-                 "concepts": concepts,
-                 "pdf_url": primary_loc.get("pdf_url"),
-             },
-         )
-
-     def _reconstruct_abstract(
-         self, inverted_index: dict[str, list[int]]
-     ) -> str:
-         """
-         Reconstruct abstract from OpenAlex inverted index format.
-
-         OpenAlex stores abstracts as {"word": [position1, position2, ...]}.
-         This rebuilds the original text.
-         """
-         if not inverted_index:
-             return ""
-
-         # Build position -> word mapping
-         position_word: dict[int, str] = {}
-         for word, positions in inverted_index.items():
-             for pos in positions:
-                 position_word[pos] = word
-
-         # Reconstruct in order
-         if not position_word:
-             return ""
-
-         max_pos = max(position_word.keys())
-         words = [position_word.get(i, "") for i in range(max_pos + 1)]
-         return " ".join(w for w in words if w)
- ```
-
- ---
-
- ### Step 3: Register in Search Handler
-
- **File**: `src/tools/search_handler.py` (add to imports and tool list)
-
- ```python
- # Add import
- from src.tools.openalex import OpenAlexTool
-
- # Add to _create_tools method
- def _create_tools(self) -> list[SearchTool]:
-     return [
-         PubMedTool(),
-         ClinicalTrialsTool(),
-         EuropePMCTool(),
-         OpenAlexTool(),  # NEW
-     ]
- ```
-
- ---
-
- ### Step 4: Update `__init__.py`
-
- **File**: `src/tools/__init__.py`
-
- ```python
- from src.tools.openalex import OpenAlexTool
-
- __all__ = [
-     "PubMedTool",
-     "ClinicalTrialsTool",
-     "EuropePMCTool",
-     "OpenAlexTool",  # NEW
-     # ...
- ]
- ```
-
- ---
-
- ## Demo Script
-
- **File**: `examples/openalex_demo.py`
-
- ```python
- #!/usr/bin/env python3
- """Demo script to verify OpenAlex integration."""
-
- import asyncio
-
- from src.tools.openalex import OpenAlexTool
-
-
- async def main() -> None:
-     """Run OpenAlex search demo."""
-     tool = OpenAlexTool()
-
-     print("=" * 60)
-     print("OpenAlex Integration Demo")
-     print("=" * 60)
-
-     # Test 1: Basic drug repurposing search
-     print("\n[Test 1] Searching for 'metformin cancer drug repurposing'...")
-     results = await tool.search("metformin cancer drug repurposing", max_results=5)
-
-     for i, evidence in enumerate(results, 1):
-         print(f"\n--- Result {i} ---")
-         print(f"Title: {evidence.citation.title}")
-         print(f"Year: {evidence.citation.date}")
-         print(f"Citations: {evidence.metadata.get('cited_by_count', 'N/A')}")
-         print(f"Concepts: {', '.join(evidence.metadata.get('concepts', []))}")
-         print(f"Open Access: {evidence.metadata.get('is_open_access', False)}")
-         print(f"URL: {evidence.citation.url}")
-         if evidence.content:
-             print(f"Abstract: {evidence.content[:200]}...")
-
-     # Test 2: High-impact papers
-     print("\n" + "=" * 60)
-     print("[Test 2] Finding highly-cited papers on 'long COVID treatment'...")
-     results = await tool.search("long COVID treatment", max_results=3)
-
-     for evidence in results:
-         print(f"\n- {evidence.citation.title}")
-         print(f"  Citations: {evidence.metadata.get('cited_by_count', 0)}")
-
-     print("\n" + "=" * 60)
-     print("Demo complete!")
-
-
- if __name__ == "__main__":
-     asyncio.run(main())
- ```
-
- ---
-
- ## Verification Checklist
-
- ### Unit Tests
- ```bash
- # Run just OpenAlex tests
- uv run pytest tests/unit/tools/test_openalex.py -v
-
- # Expected: All tests pass
- ```
-
- ### Integration Test (Manual)
- ```bash
- # Run demo script with real API
- uv run python examples/openalex_demo.py
-
- # Expected: Real results from OpenAlex API
- ```
-
- ### Full Test Suite
- ```bash
- # Ensure nothing broke
- make check
-
- # Expected: All 110+ tests pass, mypy clean
- ```
-
- ---
-
- ## Success Criteria
-
- 1. **Unit tests pass**: All mocked tests in `test_openalex.py` pass
- 2. **Integration works**: Demo script returns real results
- 3. **No regressions**: `make check` passes completely
- 4. **SearchHandler integration**: OpenAlex appears in search results alongside other sources
- 5. **Citation metadata**: Results include `cited_by_count`, `concepts`, `is_open_access`
-
- ---
-
- ## Future Enhancements (P2)
-
- Once basic integration works:
-
- 1. **Citation Network Queries**
-    ```python
-    # Get papers citing a specific work
-    async def get_citing_works(self, work_id: str) -> list[Evidence]:
-        params = {"filter": f"cites:{work_id}"}
-        ...
-    ```
-
- 2. **Concept-Based Search**
-    ```python
-    # Search by OpenAlex concept ID
-    async def search_by_concept(self, concept_id: str) -> list[Evidence]:
-        params = {"filter": f"concepts.id:{concept_id}"}
-        ...
-    ```
-
- 3. **Author Tracking**
-    ```python
-    # Find all works by an author
-    async def search_by_author(self, author_id: str) -> list[Evidence]:
-        params = {"filter": f"authorships.author.id:{author_id}"}
-        ...
-    ```
-
- ---
-
- ## Notes
-
- - OpenAlex is **very generous** with rate limits (no documented hard limit)
- - Adding `mailto` parameter gives priority access (polite pool)
- - Abstract is stored as inverted index - must reconstruct
- - Citation count is a good proxy for paper quality/impact
- - Consider caching responses for repeated queries (see the sketch below)
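-
- A minimal caching sketch (in-memory and unbounded - a deliberate simplification; a real implementation would want TTL/eviction):
-
- ```python
- class CachedOpenAlexTool(OpenAlexTool):
-     """OpenAlexTool with a naive per-instance response cache."""
-
-     def __init__(self, email: str | None = None) -> None:
-         super().__init__(email)
-         self._cache: dict[tuple[str, int], list[Evidence]] = {}
-
-     async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
-         key = (query, max_results)
-         if key not in self._cache:
-             # Only hit the API on a cache miss; Evidence objects are frozen,
-             # so sharing them across callers is safe.
-             self._cache[key] = await super().search(query, max_results)
-         return self._cache[key]
- ```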
docs/brainstorming/implementation/16_PHASE_PUBMED_FULLTEXT.md DELETED
@@ -1,586 +0,0 @@
- # Phase 16: PubMed Full-Text Retrieval
-
- **Priority**: MEDIUM - Enhances evidence quality
- **Effort**: ~3 hours
- **Dependencies**: None (existing PubMed tool sufficient)
-
- ---
-
- ## Prerequisites (COMPLETED)
-
- The `Evidence.metadata` field has been added to `src/utils/models.py` to support:
- ```python
- metadata={"has_fulltext": True}
- ```
-
- ---
-
- ## Architecture Decision: Constructor Parameter vs Method Parameter
-
- **IMPORTANT**: The original spec proposed `include_fulltext` as a method parameter:
- ```python
- # WRONG - SearchHandler won't pass this parameter
- async def search(self, query: str, max_results: int = 10, include_fulltext: bool = False):
- ```
-
- **Problem**: `SearchHandler` calls `tool.search(query, max_results)` uniformly across all tools.
- It has no mechanism to pass tool-specific parameters like `include_fulltext`.
-
- **Solution**: Use a constructor parameter instead:
- ```python
- # CORRECT - Configured at instantiation time
- class PubMedTool:
-     def __init__(self, api_key: str | None = None, include_fulltext: bool = False):
-         self.include_fulltext = include_fulltext
-         ...
- ```
-
- This way, you can create a full-text-enabled PubMed tool:
- ```python
- # In the orchestrator or wherever tools are created
- tools = [
-     PubMedTool(include_fulltext=True),  # Full-text enabled
-     ClinicalTrialsTool(),
-     EuropePMCTool(),
- ]
- ```
-
- ---
-
- ## Overview
-
- Add full-text retrieval for PubMed papers via the BioC API, enabling:
- - Complete paper text for open-access PMC papers
- - Structured sections (intro, methods, results, discussion)
- - Better evidence for LLM synthesis
-
- **Why Full-Text?**
- - Abstracts only give ~200-300 words
- - Full text provides detailed methods, results, figures
- - Reference repo already has this implemented
- - Makes LLM judgments more accurate
-
- ---
-
- ## TDD Implementation Plan
-
- ### Step 1: Write the Tests First
-
- **File**: `tests/unit/tools/test_pubmed_fulltext.py`
-
- ```python
- """Tests for PubMed full-text retrieval."""
-
- import pytest
- import respx
- from httpx import Response
-
- from src.tools.pubmed import PubMedTool
-
-
- class TestPubMedFullText:
-     """Test suite for PubMed full-text functionality."""
-
-     @pytest.fixture
-     def tool(self) -> PubMedTool:
-         return PubMedTool()
-
-     @respx.mock
-     @pytest.mark.asyncio
-     async def test_get_pmc_id_success(self, tool: PubMedTool) -> None:
-         """Should convert PMID to PMCID for full-text access."""
-         mock_response = {
-             "records": [
-                 {
-                     "pmid": "12345678",
-                     "pmcid": "PMC1234567",
-                 }
-             ]
-         }
-
-         respx.get("https://www.ncbi.nlm.nih.gov/pmc/utils/idconv/v1.0/").mock(
-             return_value=Response(200, json=mock_response)
-         )
-
-         pmcid = await tool.get_pmc_id("12345678")
-         assert pmcid == "PMC1234567"
-
-     @respx.mock
-     @pytest.mark.asyncio
-     async def test_get_pmc_id_not_in_pmc(self, tool: PubMedTool) -> None:
-         """Should return None if paper not in PMC."""
-         mock_response = {
-             "records": [
-                 {
-                     "pmid": "12345678",
-                     # No pmcid means not in PMC
-                 }
-             ]
-         }
-
-         respx.get("https://www.ncbi.nlm.nih.gov/pmc/utils/idconv/v1.0/").mock(
-             return_value=Response(200, json=mock_response)
-         )
-
-         pmcid = await tool.get_pmc_id("12345678")
-         assert pmcid is None
-
-     @respx.mock
-     @pytest.mark.asyncio
-     async def test_get_fulltext_success(self, tool: PubMedTool) -> None:
-         """Should retrieve full text for PMC papers."""
-         # Mock BioC API response
-         mock_bioc = {
-             "documents": [
-                 {
-                     "passages": [
-                         {
-                             "infons": {"section_type": "INTRO"},
-                             "text": "Introduction text here.",
-                         },
-                         {
-                             "infons": {"section_type": "METHODS"},
-                             "text": "Methods description here.",
-                         },
-                         {
-                             "infons": {"section_type": "RESULTS"},
-                             "text": "Results summary here.",
-                         },
-                         {
-                             "infons": {"section_type": "DISCUSS"},
-                             "text": "Discussion and conclusions.",
-                         },
-                     ]
-                 }
-             ]
-         }
-
-         respx.get(
-             "https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/pmcoa.cgi/BioC_json/12345678/unicode"
-         ).mock(return_value=Response(200, json=mock_bioc))
-
-         fulltext = await tool.get_fulltext("12345678")
-
-         assert fulltext is not None
-         assert "Introduction text here" in fulltext
-         assert "Methods description here" in fulltext
-         assert "Results summary here" in fulltext
-
-     @respx.mock
-     @pytest.mark.asyncio
-     async def test_get_fulltext_not_available(self, tool: PubMedTool) -> None:
-         """Should return None if full text not available."""
-         respx.get(
-             "https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/pmcoa.cgi/BioC_json/99999999/unicode"
-         ).mock(return_value=Response(404))
-
-         fulltext = await tool.get_fulltext("99999999")
-         assert fulltext is None
-
-     @respx.mock
-     @pytest.mark.asyncio
-     async def test_get_fulltext_structured(self, tool: PubMedTool) -> None:
-         """Should return structured sections dict."""
-         mock_bioc = {
-             "documents": [
-                 {
-                     "passages": [
-                         {"infons": {"section_type": "INTRO"}, "text": "Intro..."},
-                         {"infons": {"section_type": "METHODS"}, "text": "Methods..."},
-                         {"infons": {"section_type": "RESULTS"}, "text": "Results..."},
-                         {"infons": {"section_type": "DISCUSS"}, "text": "Discussion..."},
-                     ]
-                 }
-             ]
-         }
-
-         respx.get(
-             "https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/pmcoa.cgi/BioC_json/12345678/unicode"
-         ).mock(return_value=Response(200, json=mock_bioc))
-
-         sections = await tool.get_fulltext_structured("12345678")
-
-         assert sections is not None
-         assert "introduction" in sections
-         assert "methods" in sections
-         assert "results" in sections
-         assert "discussion" in sections
-
-     @respx.mock
-     @pytest.mark.asyncio
-     async def test_search_with_fulltext_enabled(self) -> None:
-         """Search should include full text when tool is configured for it."""
-         # Create tool WITH full-text enabled via constructor
-         tool = PubMedTool(include_fulltext=True)
-
-         # Mock esearch
-         respx.get("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi").mock(
-             return_value=Response(
-                 200, json={"esearchresult": {"idlist": ["12345678"]}}
-             )
-         )
-
-         # Mock efetch (abstract)
-         mock_xml = """
-         <PubmedArticleSet>
-           <PubmedArticle>
-             <MedlineCitation>
-               <PMID>12345678</PMID>
-               <Article>
-                 <ArticleTitle>Test Paper</ArticleTitle>
-                 <Abstract><AbstractText>Short abstract.</AbstractText></Abstract>
-                 <AuthorList><Author><LastName>Smith</LastName></Author></AuthorList>
-               </Article>
-             </MedlineCitation>
-           </PubmedArticle>
-         </PubmedArticleSet>
-         """
-         respx.get("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi").mock(
-             return_value=Response(200, text=mock_xml)
-         )
-
-         # Mock ID converter
-         respx.get("https://www.ncbi.nlm.nih.gov/pmc/utils/idconv/v1.0/").mock(
-             return_value=Response(
-                 200, json={"records": [{"pmid": "12345678", "pmcid": "PMC1234567"}]}
-             )
-         )
-
-         # Mock BioC full text
-         mock_bioc = {
-             "documents": [
-                 {
-                     "passages": [
-                         {"infons": {"section_type": "INTRO"}, "text": "Full intro..."},
-                     ]
-                 }
-             ]
-         }
-         respx.get(
-             "https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/pmcoa.cgi/BioC_json/12345678/unicode"
-         ).mock(return_value=Response(200, json=mock_bioc))
-
-         # NOTE: No include_fulltext param - it's set via the constructor
-         results = await tool.search("test", max_results=1)
-
-         assert len(results) == 1
-         # Full text should be appended or replace the abstract
-         assert "Full intro" in results[0].content or "Short abstract" in results[0].content
- ```
-
- ---
-
- ### Step 2: Implement Full-Text Methods
-
- **File**: `src/tools/pubmed.py` (additions to existing class)
-
- ```python
- # Add these methods to the PubMedTool class
-
- async def get_pmc_id(self, pmid: str) -> str | None:
-     """
-     Convert PMID to PMCID for full-text access.
-
-     Args:
-         pmid: PubMed ID
-
-     Returns:
-         PMCID if paper is in PMC, None otherwise
-     """
-     url = "https://www.ncbi.nlm.nih.gov/pmc/utils/idconv/v1.0/"
-     params = {"ids": pmid, "format": "json"}
-
-     async with httpx.AsyncClient(timeout=30.0) as client:
-         try:
-             response = await client.get(url, params=params)
-             response.raise_for_status()
-             data = response.json()
-
-             records = data.get("records", [])
-             if records and records[0].get("pmcid"):
-                 return records[0]["pmcid"]
-             return None
-
-         except httpx.HTTPError:
-             return None
-
-
- async def get_fulltext(self, pmid: str) -> str | None:
-     """
-     Get full text for a PubMed paper via BioC API.
-
-     Only works for open-access papers in PubMed Central.
-
-     Args:
-         pmid: PubMed ID
-
-     Returns:
-         Full text as string, or None if not available
-     """
-     url = f"https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/pmcoa.cgi/BioC_json/{pmid}/unicode"
-
-     async with httpx.AsyncClient(timeout=60.0) as client:
-         try:
-             response = await client.get(url)
-             if response.status_code == 404:
-                 return None
-             response.raise_for_status()
-             data = response.json()
-
-             # Extract text from all passages
-             documents = data.get("documents", [])
-             if not documents:
-                 return None
-
-             passages = documents[0].get("passages", [])
-             text_parts = [p.get("text", "") for p in passages if p.get("text")]
-
-             return "\n\n".join(text_parts) if text_parts else None
-
-         except httpx.HTTPError:
-             return None
-
-
- async def get_fulltext_structured(self, pmid: str) -> dict[str, str] | None:
-     """
-     Get structured full text with sections.
-
-     Args:
-         pmid: PubMed ID
-
-     Returns:
-         Dict mapping section names to text, or None if not available
-     """
-     url = f"https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/pmcoa.cgi/BioC_json/{pmid}/unicode"
-
-     async with httpx.AsyncClient(timeout=60.0) as client:
-         try:
-             response = await client.get(url)
-             if response.status_code == 404:
-                 return None
-             response.raise_for_status()
-             data = response.json()
-
-             documents = data.get("documents", [])
-             if not documents:
-                 return None
-
-             # Map section types to readable names
-             section_map = {
-                 "INTRO": "introduction",
-                 "METHODS": "methods",
-                 "RESULTS": "results",
-                 "DISCUSS": "discussion",
-                 "CONCL": "conclusion",
-                 "ABSTRACT": "abstract",
-             }
-
-             sections: dict[str, list[str]] = {}
-             for passage in documents[0].get("passages", []):
-                 section_type = passage.get("infons", {}).get("section_type", "other")
-                 section_name = section_map.get(section_type, "other")
-                 text = passage.get("text", "")
-
-                 if text:
-                     if section_name not in sections:
-                         sections[section_name] = []
-                     sections[section_name].append(text)
-
-             # Join multiple passages per section
-             return {k: "\n\n".join(v) for k, v in sections.items()}
-
-         except httpx.HTTPError:
-             return None
- ```
-
- ---
-
- ### Step 3: Update Constructor and Search Method
-
- Add the full-text flag to the constructor and update search to use it:
-
- ```python
- class PubMedTool:
-     """Search tool for PubMed/NCBI."""
-
-     def __init__(
-         self,
-         api_key: str | None = None,
-         include_fulltext: bool = False,  # NEW CONSTRUCTOR PARAM
-     ) -> None:
-         self.api_key = api_key or settings.ncbi_api_key
-         if self.api_key == "your-ncbi-key-here":
-             self.api_key = None
-         self._last_request_time = 0.0
-         self.include_fulltext = include_fulltext  # Store for use in search()
-
-     async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
-         """
-         Search PubMed and return evidence.
-
-         Note: Full-text enrichment is controlled by a constructor parameter,
-         not a method parameter, because SearchHandler doesn't pass extra args.
-         """
-         # ... existing search logic ...
-
-         evidence_list = self._parse_pubmed_xml(fetch_resp.text)
-
-         # Optionally enrich with full text (if configured at construction)
-         if self.include_fulltext:
-             evidence_list = await self._enrich_with_fulltext(evidence_list)
-
-         return evidence_list
-
-     async def _enrich_with_fulltext(
-         self, evidence_list: list[Evidence]
-     ) -> list[Evidence]:
-         """Attempt to add full text to evidence items."""
-         enriched = []
-
-         for evidence in evidence_list:
-             # Extract PMID from URL
-             url = evidence.citation.url
-             pmid = url.rstrip("/").split("/")[-1] if url else None
-
-             if pmid:
-                 fulltext = await self.get_fulltext(pmid)
-                 if fulltext:
-                     # Replace abstract with full text (truncated)
-                     evidence = Evidence(
-                         content=fulltext[:8000],  # Larger limit for full text
-                         citation=evidence.citation,
-                         relevance=evidence.relevance,
-                         metadata={
-                             **evidence.metadata,
-                             "has_fulltext": True,
-                         },
-                     )
-
-             enriched.append(evidence)
-
-         return enriched
- ```
-
- ---
-
- ## Demo Script
-
- **File**: `examples/pubmed_fulltext_demo.py`
-
- ```python
- #!/usr/bin/env python3
- """Demo script to verify PubMed full-text retrieval."""
-
- import asyncio
-
- from src.tools.pubmed import PubMedTool
-
-
- async def main() -> None:
-     """Run PubMed full-text demo."""
-     tool = PubMedTool()
-
-     print("=" * 60)
-     print("PubMed Full-Text Demo")
-     print("=" * 60)
-
-     # Test 1: Convert PMID to PMCID
-     print("\n[Test 1] Converting PMID to PMCID...")
-     # Use a known open-access paper
-     test_pmid = "34450029"  # Example: COVID-related open-access paper
-     pmcid = await tool.get_pmc_id(test_pmid)
-     print(f"PMID {test_pmid} -> PMCID: {pmcid or 'Not in PMC'}")
-
-     # Test 2: Get full text
-     print("\n[Test 2] Fetching full text...")
-     if pmcid:
-         fulltext = await tool.get_fulltext(test_pmid)
-         if fulltext:
-             print(f"Full text length: {len(fulltext)} characters")
-             print(f"Preview: {fulltext[:500]}...")
-         else:
-             print("Full text not available")
-
-     # Test 3: Get structured sections
-     print("\n[Test 3] Fetching structured sections...")
-     if pmcid:
-         sections = await tool.get_fulltext_structured(test_pmid)
-         if sections:
-             print("Available sections:")
-             for section, text in sections.items():
-                 print(f"  - {section}: {len(text)} chars")
-         else:
-             print("Structured text not available")
-
-     # Test 4: Search with full text
-     # (enabled via the constructor, per the architecture decision above)
-     print("\n[Test 4] Search with full-text enrichment...")
-     ft_tool = PubMedTool(include_fulltext=True)
-     results = await ft_tool.search("metformin cancer open access", max_results=3)
-
-     for i, evidence in enumerate(results, 1):
-         has_ft = evidence.metadata.get("has_fulltext", False)
-         print(f"\n--- Result {i} ---")
-         print(f"Title: {evidence.citation.title}")
-         print(f"Has Full Text: {has_ft}")
-         print(f"Content Length: {len(evidence.content)} chars")
-
-     print("\n" + "=" * 60)
-     print("Demo complete!")
-
-
- if __name__ == "__main__":
-     asyncio.run(main())
- ```
-
- ---
-
- ## Verification Checklist
-
- ### Unit Tests
- ```bash
- # Run full-text tests
- uv run pytest tests/unit/tools/test_pubmed_fulltext.py -v
-
- # Run all PubMed tests
- uv run pytest tests/unit/tools/test_pubmed.py -v
-
- # Expected: All tests pass
- ```
-
- ### Integration Test (Manual)
- ```bash
- # Run demo with real API
- uv run python examples/pubmed_fulltext_demo.py
-
- # Expected: Real full text from PMC papers
- ```
-
- ### Full Test Suite
- ```bash
- make check
- # Expected: All tests pass, mypy clean
- ```
-
- ---
-
- ## Success Criteria
-
- 1. **ID conversion works**: PMID -> PMCID conversion successful
- 2. **Full-text retrieval works**: BioC API returns paper text
- 3. **Structured sections work**: Can get intro/methods/results/discussion separately
- 4. **Search integration works**: `PubMedTool(include_fulltext=True)` enriches results
- 5. **No regressions**: Existing tests still pass
- 6. **Graceful degradation**: Non-PMC papers still return abstracts
-
- ---
-
- ## Notes
-
- - Only ~30% of PubMed papers have full text in PMC
- - BioC API has no documented rate limit, but be respectful
- - Full text can be very long - truncate appropriately (see the section-selection sketch below)
- - Consider caching full text responses (they don't change)
- - Timeout should be longer for full text (60s vs 30s)
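-
- One way downstream code might select sections before truncating - the PMID reuses the demo's example, and the section choice is illustrative, not prescriptive:
-
- ```python
- sections = await tool.get_fulltext_structured("34450029")
- if sections:
-     # Prefer the sections that matter most for synthesis, then cap the length.
-     chosen = [sections[name] for name in ("results", "discussion") if name in sections]
-     summary_input = "\n\n".join(chosen)[:8000]
- ```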
docs/brainstorming/implementation/17_PHASE_RATE_LIMITING.md DELETED
@@ -1,540 +0,0 @@
- # Phase 17: Rate Limiting with `limits` Library
-
- **Priority**: P0 CRITICAL - Prevents API blocks
- **Effort**: ~1 hour
- **Dependencies**: None
-
- ---
-
- ## CRITICAL: Async Safety Requirements
-
- **WARNING**: The rate limiter MUST be async-safe. Blocking the event loop will freeze:
- - The Gradio UI
- - All parallel searches
- - The orchestrator
-
- **Rules**:
- 1. **NEVER use `time.sleep()`** - Always use `await asyncio.sleep()`
- 2. **NEVER use blocking while loops** - Use async-aware polling
- 3. **The `limits` library check is synchronous** - Wrap it carefully
-
- The implementation below uses a polling pattern that:
- - Checks the limit (synchronous, fast)
- - If exceeded, `await asyncio.sleep()` (non-blocking)
- - Retries the check
-
- **Alternative**: If `limits` proves problematic, use `aiolimiter`, which is pure-async.
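-
- A minimal sketch of the `aiolimiter` alternative (assuming the `aiolimiter` package; the rate shown is the no-key NCBI limit):
-
- ```python
- from aiolimiter import AsyncLimiter
-
- # At most 3 acquisitions per 1-second window, enforced without blocking the loop.
- limiter = AsyncLimiter(max_rate=3, time_period=1)
-
- async def fetch() -> None:
-     async with limiter:  # suspends (never blocks) until capacity is available
-         ...  # make the HTTP request here
- ```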
-
- ---
-
- ## Overview
-
- Replace the naive `asyncio.sleep` rate limiting with a proper rate limiter built on the `limits` library, which provides:
- - Moving window rate limiting
- - Per-API configurable limits
- - Thread-safe storage
- - Already used in reference repo
-
- **Why This Matters**
- - NCBI will block us without proper rate limiting (3/sec without key, 10/sec with)
- - Current implementation only has a simple sleep delay
- - Need coordinated limits across all PubMed calls
- - Professional-grade rate limiting prevents production issues
-
- ---
-
- ## Current State
-
- ### What We Have (`src/tools/pubmed.py:20-21, 34-41`)
-
- ```python
- RATE_LIMIT_DELAY = 0.34  # ~3 requests/sec without API key
-
- async def _rate_limit(self) -> None:
-     """Enforce NCBI rate limiting."""
-     loop = asyncio.get_running_loop()
-     now = loop.time()
-     elapsed = now - self._last_request_time
-     if elapsed < self.RATE_LIMIT_DELAY:
-         await asyncio.sleep(self.RATE_LIMIT_DELAY - elapsed)
-     self._last_request_time = loop.time()
- ```
-
- ### Problems
-
- 1. **Not shared across instances**: Each `PubMedTool()` has its own counter
- 2. **Simple delay vs moving window**: Doesn't handle bursts properly
- 3. **Hardcoded rate**: Doesn't adapt to API key presence
- 4. **No backoff on 429**: Just retries blindly
-
- ---
-
- ## TDD Implementation Plan
-
- ### Step 1: Add Dependency
-
- **File**: `pyproject.toml`
-
- ```toml
- dependencies = [
-     # ... existing deps ...
-     "limits>=3.0",
- ]
- ```
-
- Then run:
- ```bash
- uv sync
- ```
-
- ---
-
- ### Step 2: Write the Tests First
-
- **File**: `tests/unit/tools/test_rate_limiting.py`
-
- ```python
- """Tests for rate limiting functionality."""
-
- import asyncio
- import time
-
- import pytest
-
- from src.tools.rate_limiter import (
-     RateLimiter,
-     get_pubmed_limiter,
-     reset_pubmed_limiter,
- )
-
-
- class TestRateLimiter:
-     """Test suite for rate limiter."""
-
-     def test_create_limiter_without_api_key(self) -> None:
-         """Should create 3/sec limiter without API key."""
-         limiter = RateLimiter(rate="3/second")
-         assert limiter.rate == "3/second"
-
-     def test_create_limiter_with_api_key(self) -> None:
-         """Should create 10/sec limiter with API key."""
-         limiter = RateLimiter(rate="10/second")
-         assert limiter.rate == "10/second"
-
-     @pytest.mark.asyncio
-     async def test_limiter_allows_requests_under_limit(self) -> None:
-         """Should allow requests under the rate limit."""
-         limiter = RateLimiter(rate="10/second")
-
-         # 3 requests should all succeed immediately
-         for _ in range(3):
-             allowed = await limiter.acquire()
-             assert allowed is True
-
-     @pytest.mark.asyncio
-     async def test_limiter_blocks_when_exceeded(self) -> None:
-         """Should wait when rate limit exceeded."""
-         limiter = RateLimiter(rate="2/second")
-
-         # First 2 should be instant
-         await limiter.acquire()
-         await limiter.acquire()
-
-         # Third should block until the first hit leaves the 1-second window
-         start = time.monotonic()
-         await limiter.acquire()
-         elapsed = time.monotonic() - start
-
-         # Should have waited close to 1 second (conservative lower bound)
-         assert elapsed >= 0.3
-
-     @pytest.mark.asyncio
-     async def test_limiter_resets_after_window(self) -> None:
-         """Rate limit should reset after time window."""
-         limiter = RateLimiter(rate="5/second")
-
-         # Use up the limit
-         for _ in range(5):
-             await limiter.acquire()
-
-         # Wait for window to pass
-         await asyncio.sleep(1.1)
-
-         # Should be allowed again
-         start = time.monotonic()
-         await limiter.acquire()
-         elapsed = time.monotonic() - start
-
-         assert elapsed < 0.1  # Should be nearly instant
-
-
- class TestGetPubmedLimiter:
-     """Test PubMed-specific limiter factory."""
-
-     def setup_method(self) -> None:
-         """Reset the singleton so each test starts clean (the rate is fixed at first creation)."""
-         reset_pubmed_limiter()
-
-     def test_limiter_without_api_key(self) -> None:
-         """Should return 3/sec limiter without key."""
-         limiter = get_pubmed_limiter(api_key=None)
-         assert "3" in limiter.rate
-
-     def test_limiter_with_api_key(self) -> None:
-         """Should return 10/sec limiter with key."""
-         limiter = get_pubmed_limiter(api_key="my-api-key")
-         assert "10" in limiter.rate
-
-     def test_limiter_is_singleton(self) -> None:
-         """Same API key should return same limiter instance."""
-         limiter1 = get_pubmed_limiter(api_key="key1")
-         limiter2 = get_pubmed_limiter(api_key="key1")
-         assert limiter1 is limiter2
-
-     def test_different_keys_share_limiter(self) -> None:
-         """Different API keys share one limiter: we're limiting against the same NCBI API."""
-         limiter1 = get_pubmed_limiter(api_key="key1")
-         limiter2 = get_pubmed_limiter(api_key="key2")
-         assert limiter1 is limiter2  # Shared NCBI rate limit
- ```
-
- ---
-
- ### Step 3: Create Rate Limiter Module
-
- **File**: `src/tools/rate_limiter.py`
-
- ```python
- """Rate limiting utilities using the limits library."""
-
- import asyncio
- from typing import ClassVar
-
- from limits import RateLimitItem, parse
- from limits.storage import MemoryStorage
- from limits.strategies import MovingWindowRateLimiter
-
-
- class RateLimiter:
-     """
-     Async-compatible rate limiter using the limits library.
-
-     Uses a moving window algorithm for smooth rate limiting.
-     """
-
-     def __init__(self, rate: str) -> None:
-         """
-         Initialize rate limiter.
-
-         Args:
-             rate: Rate string like "3/second" or "10/second"
-         """
-         self.rate = rate
-         self._storage = MemoryStorage()
-         self._limiter = MovingWindowRateLimiter(self._storage)
-         self._rate_limit: RateLimitItem = parse(rate)
-         self._identity = "default"  # Single identity for shared limiting
-
-     async def acquire(self, wait: bool = True) -> bool:
-         """
-         Acquire permission to make a request.
-
-         ASYNC-SAFE: Uses asyncio.sleep(), never time.sleep().
-         The polling pattern allows other coroutines to run while waiting.
-
-         Args:
-             wait: If True, wait until allowed. If False, return immediately.
-
-         Returns:
-             True if allowed, False if not (only when wait=False)
-         """
-         while True:
-             # Check if we can proceed (synchronous, fast - ~microseconds)
-             if self._limiter.hit(self._rate_limit, self._identity):
-                 return True
-
-             if not wait:
-                 return False
-
-             # CRITICAL: Use asyncio.sleep(), NOT time.sleep().
-             # This yields control to the event loop, allowing other
-             # coroutines (UI, parallel searches) to run.
-             await asyncio.sleep(0.1)
-
-     def reset(self) -> None:
-         """Reset the rate limiter (for testing)."""
-         self._storage.reset()
-
-
- # Singleton limiter for PubMed/NCBI
- _pubmed_limiter: RateLimiter | None = None
-
-
- def get_pubmed_limiter(api_key: str | None = None) -> RateLimiter:
-     """
-     Get the shared PubMed rate limiter.
-
-     Rate depends on whether an API key is provided:
-     - Without key: 3 requests/second
-     - With key: 10 requests/second
-
-     Args:
-         api_key: NCBI API key (optional)
-
-     Returns:
-         Shared RateLimiter instance
-     """
-     global _pubmed_limiter
-
-     if _pubmed_limiter is None:
-         rate = "10/second" if api_key else "3/second"
-         _pubmed_limiter = RateLimiter(rate)
-
-     return _pubmed_limiter
-
-
- def reset_pubmed_limiter() -> None:
-     """Reset the PubMed limiter (for testing)."""
-     global _pubmed_limiter
-     _pubmed_limiter = None
-
-
- # Factory for other APIs
- class RateLimiterFactory:
-     """Factory for creating/getting rate limiters for different APIs."""
-
-     _limiters: ClassVar[dict[str, RateLimiter]] = {}
-
-     @classmethod
-     def get(cls, api_name: str, rate: str) -> RateLimiter:
-         """
-         Get or create a rate limiter for an API.
-
-         Args:
-             api_name: Unique identifier for the API
-             rate: Rate limit string (e.g., "10/second")
-
-         Returns:
-             RateLimiter instance (shared for same api_name)
-         """
-         if api_name not in cls._limiters:
-             cls._limiters[api_name] = RateLimiter(rate)
-         return cls._limiters[api_name]
-
-     @classmethod
-     def reset_all(cls) -> None:
-         """Reset all limiters (for testing)."""
-         cls._limiters.clear()
- ```
-
- ---
-
- ### Step 4: Update PubMed Tool
-
- **File**: `src/tools/pubmed.py` (replace rate limiting code)
-
- ```python
- # Replace imports and rate limiting
-
- from src.tools.rate_limiter import get_pubmed_limiter
-
-
- class PubMedTool:
-     """Search tool for PubMed/NCBI."""
-
-     BASE_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"
-     HTTP_TOO_MANY_REQUESTS = 429
-
-     def __init__(self, api_key: str | None = None) -> None:
-         self.api_key = api_key or settings.ncbi_api_key
-         if self.api_key == "your-ncbi-key-here":
-             self.api_key = None
-         # Use shared rate limiter
-         self._limiter = get_pubmed_limiter(self.api_key)
-
-     async def _rate_limit(self) -> None:
-         """Enforce NCBI rate limiting using shared limiter."""
-         await self._limiter.acquire()
-
-     # ... rest of class unchanged ...
- ```
-
- ---
-
- ### Step 5: Add Rate Limiters for Other APIs
-
- **File**: `src/tools/clinicaltrials.py` (optional)
-
- ```python
- from src.tools.rate_limiter import RateLimiterFactory
-
-
- class ClinicalTrialsTool:
-     def __init__(self) -> None:
-         # ClinicalTrials.gov doesn't document limits, but be conservative
-         self._limiter = RateLimiterFactory.get("clinicaltrials", "5/second")
-
-     async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
-         await self._limiter.acquire()
-         # ... rest of method ...
- ```
-
- **File**: `src/tools/europepmc.py` (optional)
-
- ```python
- from src.tools.rate_limiter import RateLimiterFactory
-
-
- class EuropePMCTool:
-     def __init__(self) -> None:
-         # Europe PMC is generous, but still be respectful
-         self._limiter = RateLimiterFactory.get("europepmc", "10/second")
-
-     async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
-         await self._limiter.acquire()
-         # ... rest of method ...
- ```
-
- ---
-
- ## Demo Script
-
- **File**: `examples/rate_limiting_demo.py`
-
- ```python
- #!/usr/bin/env python3
- """Demo script to verify rate limiting works correctly."""
-
- import asyncio
- import time
-
- from src.tools.pubmed import PubMedTool
- from src.tools.rate_limiter import RateLimiter, get_pubmed_limiter, reset_pubmed_limiter
-
-
- async def test_basic_limiter() -> None:
-     """Test basic rate limiter behavior."""
-     print("=" * 60)
-     print("Rate Limiting Demo")
-     print("=" * 60)
-
-     # Test 1: Basic limiter
-     print("\n[Test 1] Testing 3/second limiter...")
-     limiter = RateLimiter("3/second")
-
-     start = time.monotonic()
-     for i in range(6):
-         await limiter.acquire()
-         elapsed = time.monotonic() - start
-         print(f"  Request {i+1} at {elapsed:.2f}s")
-
-     total = time.monotonic() - start
-     print(f"  Total time for 6 requests: {total:.2f}s (expected ~1s: the window frees 3 slots after 1s)")
-
-
- async def test_pubmed_limiter() -> None:
-     """Test PubMed-specific limiter."""
-     print("\n[Test 2] Testing PubMed limiter (shared)...")
-
-     reset_pubmed_limiter()  # Clean state
-
-     # Without API key: 3/sec
-     limiter = get_pubmed_limiter(api_key=None)
-     print(f"  Rate without key: {limiter.rate}")
-
-     # Multiple tools should share the same limiter
-     tool1 = PubMedTool()
-     tool2 = PubMedTool()
-
-     # Verify they share the limiter
-     print(f"  Tools share limiter: {tool1._limiter is tool2._limiter}")
-
-
- async def test_concurrent_requests() -> None:
-     """Test rate limiting under concurrent load."""
-     print("\n[Test 3] Testing concurrent request limiting...")
-
-     limiter = RateLimiter("5/second")
-
-     async def make_request(i: int) -> float:
-         await limiter.acquire()
-         return time.monotonic()
-
-     start = time.monotonic()
-     # Launch 10 concurrent requests
-     tasks = [make_request(i) for i in range(10)]
-     times = await asyncio.gather(*tasks)
-
-     # Calculate distribution
-     relative_times = [t - start for t in times]
-     print(f"  Request times: {[f'{t:.2f}s' for t in sorted(relative_times)]}")
-
-     total = max(relative_times)
-     print(f"  All 10 requests completed in {total:.2f}s (expected ~1s)")
-
-
- async def main() -> None:
-     await test_basic_limiter()
-     await test_pubmed_limiter()
-     await test_concurrent_requests()
-
-     print("\n" + "=" * 60)
-     print("Demo complete!")
-
-
- if __name__ == "__main__":
-     asyncio.run(main())
- ```
-
- ---
-
- ## Verification Checklist
-
- ### Unit Tests
- ```bash
- # Run rate limiting tests
- uv run pytest tests/unit/tools/test_rate_limiting.py -v
-
- # Expected: All tests pass
- ```
-
- ### Integration Test (Manual)
- ```bash
- # Run demo
- uv run python examples/rate_limiting_demo.py
-
- # Expected: Requests properly spaced
- ```
-
- ### Full Test Suite
- ```bash
- make check
- # Expected: All tests pass, mypy clean
- ```
-
- ---
-
- ## Success Criteria
-
- 1. **`limits` library installed**: Dependency added to pyproject.toml
- 2. **RateLimiter class works**: Can create and use limiters
- 3. **PubMed uses new limiter**: Shared limiter across instances
- 4. **Rate adapts to API key**: 3/sec without, 10/sec with
- 5. **Concurrent requests handled**: Multiple async requests properly queued
- 6. **No regressions**: All existing tests pass
-
- ---
-
- ## API Rate Limit Reference
-
- | API | Without Key | With Key |
- |-----|-------------|----------|
- | PubMed/NCBI | 3/sec | 10/sec |
- | ClinicalTrials.gov | Undocumented (~5/sec safe) | N/A |
- | Europe PMC | ~10-20/sec (generous) | N/A |
- | OpenAlex | ~100k/day (no per-sec limit) | Faster with `mailto` |
-
- ---
-
- ## Notes
-
- - The `limits` library uses a moving window algorithm (fairer than a fixed window)
- - The singleton pattern ensures all PubMed calls share the limit
- - The factory pattern allows easy extension to other APIs
- - Consider adding 429 response detection + exponential backoff (see the sketch below)
- - In production, consider Redis storage for distributed rate limiting
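-
- A minimal sketch of 429 detection with exponential backoff, using `tenacity` (already used by the Phase 15 tool) and `httpx`; the retry policy shown is illustrative, not prescriptive:
-
- ```python
- import httpx
- from tenacity import retry, retry_if_exception, stop_after_attempt, wait_exponential
-
-
- def _is_rate_limited(exc: BaseException) -> bool:
-     """Only retry when the server explicitly said 'Too Many Requests'."""
-     return (
-         isinstance(exc, httpx.HTTPStatusError)
-         and exc.response.status_code == 429
-     )
-
-
- @retry(
-     retry=retry_if_exception(_is_rate_limited),
-     wait=wait_exponential(multiplier=1, min=1, max=30),
-     stop=stop_after_attempt(5),
-     reraise=True,
- )
- async def fetch_with_backoff(client: httpx.AsyncClient, url: str) -> httpx.Response:
-     response = await client.get(url)
-     response.raise_for_status()  # raises HTTPStatusError on 429
-     return response
- ```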
docs/brainstorming/implementation/README.md DELETED
@@ -1,143 +0,0 @@
1
- # Implementation Plans
2
-
3
- TDD implementation plans based on the brainstorming documents. Each phase is a self-contained vertical slice with tests, implementation, and demo scripts.
4
-
5
- ---
6
-
7
- ## Prerequisites (COMPLETED)
8
-
9
- The following foundational changes have been implemented to support all three phases:
10
-
11
- | Change | File | Status |
12
- |--------|------|--------|
13
- | Add `"openalex"` to `SourceName` | `src/utils/models.py:9` | ✅ Done |
14
- | Add `metadata` field to `Evidence` | `src/utils/models.py:39-42` | ✅ Done |
15
- | Export all tools from `__init__.py` | `src/tools/__init__.py` | ✅ Done |
16
-
17
- All 110 tests pass after these changes.
18
-
19
- ---
20
-
21
- ## Priority Order
22
-
23
- | Phase | Name | Priority | Effort | Value |
24
- |-------|------|----------|--------|-------|
25
- | **17** | Rate Limiting | P0 CRITICAL | 1 hour | Stability |
26
- | **15** | OpenAlex | HIGH | 2-3 hours | Very High |
27
- | **16** | PubMed Full-Text | MEDIUM | 3 hours | High |
28
-
29
- **Recommended implementation order**: 17 → 15 → 16
30
-
31
- ---
32
-
33
- ## Phase 15: OpenAlex Integration
34
-
35
- **File**: [15_PHASE_OPENALEX.md](./15_PHASE_OPENALEX.md)
36
-
37
- Add OpenAlex as 4th data source for:
38
- - Citation networks (who cites whom)
39
- - Concept tagging (semantic discovery)
40
- - 209M+ scholarly works
41
- - Free, no API key required
42
-
43
- **Quick Start**:
44
- ```bash
45
- # Create the tool
46
- touch src/tools/openalex.py
47
- touch tests/unit/tools/test_openalex.py
48
-
49
- # Run tests first (TDD)
50
- uv run pytest tests/unit/tools/test_openalex.py -v
51
-
52
- # Demo
53
- uv run python examples/openalex_demo.py
54
- ```
55
-
56
- ---
57
-
58
- ## Phase 16: PubMed Full-Text
59
-
60
- **File**: [16_PHASE_PUBMED_FULLTEXT.md](./16_PHASE_PUBMED_FULLTEXT.md)
61
-
62
- Add full-text retrieval via BioC API for:
63
- - Complete paper text (not just abstracts)
64
- - Structured sections (intro, methods, results)
65
- - Better evidence for LLM synthesis
66
-
67
- **Quick Start**:
68
- ```bash
69
- # Add methods to existing pubmed.py
70
- # Tests in test_pubmed_fulltext.py
71
-
72
- # Run tests
73
- uv run pytest tests/unit/tools/test_pubmed_fulltext.py -v
74
-
75
- # Demo
76
- uv run python examples/pubmed_fulltext_demo.py
77
- ```
78
-
79
- ---
80
-
81
- ## Phase 17: Rate Limiting
82
-
83
- **File**: [17_PHASE_RATE_LIMITING.md](./17_PHASE_RATE_LIMITING.md)
84
-
85
- Replace naive sleep-based rate limiting with `limits` library for:
86
- - Moving window algorithm
87
- - Shared limits across instances
88
- - Configurable per-API rates
89
- - Production-grade stability
90
-
91
- **Quick Start**:
92
- ```bash
93
- # Add dependency
94
- uv add limits
95
-
96
- # Create module
97
- touch src/tools/rate_limiter.py
98
- touch tests/unit/tools/test_rate_limiting.py
99
-
100
- # Run tests
101
- uv run pytest tests/unit/tools/test_rate_limiting.py -v
102
-
103
- # Demo
104
- uv run python examples/rate_limiting_demo.py
105
- ```
106
-
107
- ---
108
-
109
- ## TDD Workflow
110
-
111
- Each implementation doc follows this pattern:
112
-
113
- 1. **Write tests first** - Define expected behavior
114
- 2. **Run tests** - Verify they fail (red)
115
- 3. **Implement** - Write minimal code to pass
116
- 4. **Run tests** - Verify they pass (green)
117
- 5. **Refactor** - Clean up if needed
118
- 6. **Demo** - Verify end-to-end with real APIs
119
- 7. **`make check`** - Ensure no regressions
120
-
121
- ---
122
-
123
- ## Related Brainstorming Docs
124
-
125
- These implementation plans are derived from:
126
-
127
- - [00_ROADMAP_SUMMARY.md](../00_ROADMAP_SUMMARY.md) - Priority overview
128
- - [01_PUBMED_IMPROVEMENTS.md](../01_PUBMED_IMPROVEMENTS.md) - PubMed details
129
- - [02_CLINICALTRIALS_IMPROVEMENTS.md](../02_CLINICALTRIALS_IMPROVEMENTS.md) - CT.gov details
130
- - [03_EUROPEPMC_IMPROVEMENTS.md](../03_EUROPEPMC_IMPROVEMENTS.md) - Europe PMC details
131
- - [04_OPENALEX_INTEGRATION.md](../04_OPENALEX_INTEGRATION.md) - OpenAlex integration
132
-
133
- ---
134
-
135
- ## Future Phases (Not Yet Documented)
136
-
137
- Based on brainstorming, these could be added later:
138
-
139
- - **Phase 18**: ClinicalTrials.gov Results Retrieval
140
- - **Phase 19**: Europe PMC Annotations API
141
- - **Phase 20**: Drug Name Normalization (RxNorm)
142
- - **Phase 21**: Citation Network Queries (OpenAlex)
143
- - **Phase 22**: Semantic Search with Embeddings
 
docs/brainstorming/magentic-pydantic/00_SITUATION_AND_PLAN.md DELETED
@@ -1,189 +0,0 @@
1
- # Situation Analysis: Pydantic-AI + Microsoft Agent Framework Integration
2
-
3
- **Date:** November 27, 2025
4
- **Status:** ACTIVE DECISION REQUIRED
5
- **Risk Level:** HIGH - DO NOT MERGE PR #41 UNTIL RESOLVED
6
-
7
- ---
8
-
9
- ## 1. The Problem
10
-
11
- We almost merged a refactor that would have **deleted** multi-agent orchestration capability from the codebase, mistakenly believing pydantic-ai and Microsoft Agent Framework were mutually exclusive.
12
-
13
- **They are not.** They are complementary:
14
- - **pydantic-ai** (Library): Ensures LLM outputs match Pydantic schemas
15
- - **Microsoft Agent Framework** (Framework): Orchestrates multi-agent workflows
16
-
17
- ---
18
-
19
- ## 2. Current Branch State
20
-
21
- | Branch | Location | Has Agent Framework? | Has Pydantic-AI Improvements? | Status |
22
- |--------|----------|---------------------|------------------------------|--------|
23
- | `origin/dev` | GitHub | YES | NO | **SAFE - Source of Truth** |
24
- | `huggingface-upstream/dev` | HF Spaces | YES | NO | **SAFE - Same as GitHub** |
25
- | `origin/main` | GitHub | YES | NO | **SAFE** |
26
- | `feat/pubmed-fulltext` | GitHub | NO (deleted) | YES | **DANGER - Has destructive refactor** |
27
- | `refactor/pydantic-unification` | Local | NO (deleted) | YES | **DANGER - Redundant, delete** |
28
- | Local `dev` | Local only | NO (deleted) | YES | **DANGER - NOT PUSHED (thankfully)** |
29
-
30
- ### Key Files at Risk
31
-
32
- **On `origin/dev` (PRESERVED):**
33
- ```text
34
- src/agents/
35
- ├── analysis_agent.py # StatisticalAnalyzer wrapper
36
- ├── hypothesis_agent.py # Hypothesis generation
37
- ├── judge_agent.py # JudgeHandler wrapper
38
- ├── magentic_agents.py # Multi-agent definitions
39
- ├── report_agent.py # Report synthesis
40
- ├── search_agent.py # SearchHandler wrapper
41
- ├── state.py # Thread-safe state management
42
- └── tools.py # @ai_function decorated tools
43
-
44
- src/orchestrator_magentic.py # Multi-agent orchestrator
45
- src/utils/llm_factory.py # Centralized LLM client factory
46
- ```
47
-
48
- **Deleted in refactor branch (would be lost if merged):**
49
- - All of the above
50
-
51
- ---
52
-
53
- ## 3. Target Architecture
54
-
55
- ```text
56
- ┌─────────────────────────────────────────────────────────────────┐
57
- │ Microsoft Agent Framework (Orchestration Layer) │
58
- │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
59
- │ │ SearchAgent │→ │ JudgeAgent │→ │ ReportAgent │ │
60
- │ │ (BaseAgent) │ │ (BaseAgent) │ │ (BaseAgent) │ │
61
- │ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
62
- │ │ │ │ │
63
- │ ▼ ▼ ▼ │
64
- │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
65
- │ │ pydantic-ai │ │ pydantic-ai │ │ pydantic-ai │ │
66
- │ │ Agent() │ │ Agent() │ │ Agent() │ │
67
- │ │ output_type= │ │ output_type= │ │ output_type= │ │
68
- │ │ SearchResult │ │ JudgeAssess │ │ Report │ │
69
- │ └──────────────┘ └──────────────┘ └──────────────┘ │
70
- └─────────────────────────────────────────────────────────────────┘
71
- ```
72
-
73
- **Why this architecture:**
74
- 1. **Agent Framework** handles: workflow coordination, state passing, middleware, observability
75
- 2. **pydantic-ai** handles: type-safe LLM calls within each agent
76
-
77
- ---
78
-
79
- ## 4. CRITICAL: Naming Confusion Clarification
80
-
81
- > **Senior Agent Review Finding:** The codebase uses "magentic" in file names (e.g., `orchestrator_magentic.py`, `magentic_agents.py`) but this is **NOT** the `magentic` PyPI package. It's Microsoft Agent Framework (`agent-framework-core`).
82
-
83
- **The naming confusion:**
84
- - `magentic` (PyPI package): A different library for structured LLM outputs
85
- - "Magentic" (in our codebase): Our internal name for Microsoft Agent Framework integration
86
- - `agent-framework-core` (PyPI package): Microsoft's actual multi-agent orchestration framework
87
-
88
- **Recommended future action:** Rename `orchestrator_magentic.py` → `orchestrator_advanced.py` to eliminate confusion.
89
-
90
- ---
91
-
92
- ## 5. What the Refactor DID Get Right
93
-
94
- The refactor branch (`feat/pubmed-fulltext`) has some valuable improvements:
95
-
96
- 1. **`judges.py` unified `get_model()`** - Supports OpenAI, Anthropic, AND HuggingFace via pydantic-ai
97
- 2. **HuggingFace free tier support** - `HuggingFaceModel` integration
98
- 3. **Test fix** - Properly mocks `HuggingFaceModel` class
99
- 4. **Removed broken magentic optional dependency** from pyproject.toml (this was correct - the old `magentic` package is different from Microsoft Agent Framework)
100
-
101
- **What it got WRONG:**
102
- 1. Deleted `src/agents/` entirely instead of refactoring them
103
- 2. Deleted `src/orchestrator_magentic.py` instead of fixing it
104
- 3. Conflated "magentic" (old package) with "Microsoft Agent Framework" (current framework)
105
-
106
- ---
107
-
108
- ## 6. Options for Path Forward
109
-
110
- ### Option A: Abandon Refactor, Start Fresh
111
- - Close PR #41
112
- - Delete `feat/pubmed-fulltext` and `refactor/pydantic-unification` branches
113
- - Reset local `dev` to match `origin/dev`
114
- - Cherry-pick ONLY the good parts (judges.py improvements, HF support)
115
- - **Pros:** Clean, safe
116
- - **Cons:** Lose some work, need to redo carefully
117
-
118
- ### Option B: Cherry-Pick Good Parts to origin/dev
119
- - Do NOT merge PR #41
120
- - Create new branch from `origin/dev`
121
- - Cherry-pick specific commits/changes that improve pydantic-ai usage
122
- - Keep agent framework code intact
123
- - **Pros:** Preserves both, surgical
124
- - **Cons:** Requires careful file-by-file review
125
-
126
- ### Option C: Revert Deletions in Refactor Branch
127
- - On `feat/pubmed-fulltext`, restore deleted agent files from `origin/dev`
128
- - Keep the pydantic-ai improvements
129
- - Merge THAT to dev
130
- - **Pros:** Gets both
131
- - **Cons:** Complex git operations, risk of conflicts
132
-
133
- ---
134
-
135
- ## 7. Recommended Action: Option B (Cherry-Pick)
136
-
137
- **Step-by-step:**
138
-
139
- 1. **Close PR #41** (do not merge)
140
- 2. **Delete redundant branches:**
141
- - `refactor/pydantic-unification` (local)
142
- - Reset local `dev` to `origin/dev`
143
- 3. **Create new branch from origin/dev:**
144
- ```bash
145
- git checkout -b feat/pydantic-ai-improvements origin/dev
146
- ```
147
- 4. **Cherry-pick or manually port these improvements:**
148
- - `src/agent_factory/judges.py` - the unified `get_model()` function
149
- - `examples/free_tier_demo.py` - HuggingFace demo
150
- - Test improvements
151
- 5. **Do NOT delete any agent framework files**
152
- 6. **Create PR for review**
153
-
154
- ---
155
-
156
- ## 8. Files to Cherry-Pick (Safe Improvements)
157
-
158
- | File | What Changed | Safe to Port? |
159
- |------|-------------|---------------|
160
- | `src/agent_factory/judges.py` | Added `HuggingFaceModel` support in `get_model()` | YES |
161
- | `examples/free_tier_demo.py` | New demo for HF inference | YES |
162
- | `tests/unit/agent_factory/test_judges.py` | Fixed HF model mocking | YES |
163
- | `pyproject.toml` | Removed old `magentic` optional dep | MAYBE (review carefully) |
164
-
165
- ---
166
-
167
- ## 9. Questions to Answer Before Proceeding
168
-
169
- 1. **For the hackathon**: Do we need full multi-agent orchestration, or is single-agent sufficient?
170
- 2. **For DeepCritical mainline**: Is the plan to use Microsoft Agent Framework for orchestration?
171
- 3. **Timeline**: How much time do we have to get this right?
172
-
173
- ---
174
-
175
- ## 10. Immediate Actions (DO NOW)
176
-
177
- - [ ] **DO NOT merge PR #41**
178
- - [ ] Close PR #41 with comment explaining the situation
179
- - [ ] Do not push local `dev` branch anywhere
180
- - [ ] Confirm HuggingFace Spaces is untouched (it is - verified)
181
-
182
- ---
183
-
184
- ## 11. Decision Log
185
-
186
- | Date | Decision | Rationale |
187
- |------|----------|-----------|
188
- | 2025-11-27 | Pause refactor merge | Discovered agent framework and pydantic-ai are complementary, not exclusive |
189
- | TBD | ? | Awaiting decision on path forward |
 
docs/brainstorming/magentic-pydantic/01_ARCHITECTURE_SPEC.md DELETED
@@ -1,289 +0,0 @@
1
- # Architecture Specification: Dual-Mode Agent System
2
-
3
- **Date:** November 27, 2025
4
- **Status:** SPECIFICATION
5
- **Goal:** Graceful degradation from full multi-agent orchestration to simple single-agent mode
6
-
7
- ---
8
-
9
- ## 1. Core Concept: Two Operating Modes
10
-
11
- ```text
12
- ┌─────────────────────────────────────────────────────────────────────┐
13
- │ USER REQUEST │
14
- │ │ │
15
- │ ▼ │
16
- │ ┌─────────────────┐ │
17
- │ │ Mode Selection │ │
18
- │ │ (Auto-detect) │ │
19
- │ └────────┬────────┘ │
20
- │ │ │
21
- │ ┌───────────────┴───────────────┐ │
22
- │ │ │ │
23
- │ ▼ ▼ │
24
- │ ┌─────────────────┐ ┌─────────────────┐ │
25
- │ │ SIMPLE MODE │ │ ADVANCED MODE │ │
26
- │ │ (Free Tier) │ │ (Paid Tier) │ │
27
- │ │ │ │ │ │
28
- │ │ pydantic-ai │ │ MS Agent Fwk │ │
29
- │ │ single-agent │ │ + pydantic-ai │ │
30
- │ │ loop │ │ multi-agent │ │
31
- │ └─────────────────┘ └─────────────────┘ │
32
- │ │ │ │
33
- │ └───────────────┬───────────────┘ │
34
- │ ▼ │
35
- │ ┌─────────────────┐ │
36
- │ │ Research Report │ │
37
- │ │ with Citations │ │
38
- │ └─────────────────┘ │
39
- └─────────────────────────────────────────────────────────────────────┘
40
- ```
41
-
42
- ---
43
-
44
- ## 2. Mode Comparison
45
-
46
- | Aspect | Simple Mode | Advanced Mode |
47
- |--------|-------------|---------------|
48
- | **Trigger** | No API key OR `LLM_PROVIDER=huggingface` | OpenAI API key present (currently OpenAI only) |
49
- | **Framework** | pydantic-ai only | Microsoft Agent Framework + pydantic-ai |
50
- | **Architecture** | Single orchestrator loop | Multi-agent coordination |
51
- | **Agents** | One agent does Search→Judge→Report | SearchAgent, JudgeAgent, ReportAgent, AnalysisAgent |
52
- | **State Management** | Simple dict | Thread-safe `MagenticState` with context vars |
53
- | **Quality** | Good (functional) | Better (specialized agents, coordination) |
54
- | **Cost** | Free (HuggingFace Inference) | Paid (OpenAI/Anthropic) |
55
- | **Use Case** | Demos, hackathon, budget-constrained | Production, research quality |
56
-
57
- ---
58
-
59
- ## 3. Simple Mode Architecture (pydantic-ai Only)
60
-
61
- ```text
62
- ┌─────────────────────────────────────────────────────┐
63
- │ Orchestrator │
64
- │ │
65
- │ while not sufficient and iteration < max: │
66
- │ 1. SearchHandler.execute(query) │
67
- │ 2. JudgeHandler.assess(evidence) ◄── pydantic-ai Agent │
68
- │ 3. if sufficient: break │
69
- │ 4. query = judge.next_queries │
70
- │ │
71
- │ return ReportGenerator.generate(evidence) │
72
- └─────────────────────────────────────────────────────┘
73
- ```
74
-
75
- **Components:**
76
- - `src/orchestrator.py` - Simple loop orchestrator
77
- - `src/agent_factory/judges.py` - JudgeHandler with pydantic-ai
78
- - `src/tools/search_handler.py` - Scatter-gather search
79
- - `src/tools/pubmed.py`, `clinicaltrials.py`, `europepmc.py` - Search tools
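Condensed into code, the loop reads roughly as follows (method names follow the diagram above; signatures are simplified, not the exact ones in `src/orchestrator.py`):

```python
async def run_simple(question: str, search, judge, report, max_iterations: int = 5) -> str:
    evidence: list = []
    query = question
    for _ in range(max_iterations):
        evidence.extend(await search.execute(query))         # 1. scatter-gather search
        assessment = await judge.assess(question, evidence)  # 2. pydantic-ai judge
        if assessment.sufficient:                            # 3. stop when enough evidence
            break
        query = " ".join(assessment.next_queries)            # 4. refine and loop
    return await report.generate(question, evidence)         # final report with citations
```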
80
-
81
- ---
82
-
83
- ## 4. Advanced Mode Architecture (MS Agent Framework + pydantic-ai)
84
-
85
- ```text
86
- ┌─────────────────────────────────────────────────────────────────────┐
87
- │ Microsoft Agent Framework Orchestrator │
88
- │ │
89
- │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
90
- │ │ SearchAgent │───▶│ JudgeAgent │───▶│ ReportAgent │ │
91
- │ │ (BaseAgent) │ │ (BaseAgent) │ │ (BaseAgent) │ │
92
- │ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │
93
- │ │ │ │ │
94
- │ ▼ ▼ ▼ │
95
- │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
96
- │ │ pydantic-ai │ │ pydantic-ai │ │ pydantic-ai │ │
97
- │ │ Agent() │ │ Agent() │ │ Agent() │ │
98
- │ │ output_type=│ │ output_type=│ │ output_type=│ │
99
- │ │ SearchResult│ │ JudgeAssess │ │ Report │ │
100
- │ └─────────────┘ └─────────────┘ └─────────────┘ │
101
- │ │
102
- │ Shared State: MagenticState (thread-safe via contextvars) │
103
- │ - evidence: list[Evidence] │
104
- │ - embedding_service: EmbeddingService │
105
- └─────────────────────────────────────────────────────────────────────┘
106
- ```
107
-
108
- **Components:**
109
- - `src/orchestrator_magentic.py` - Multi-agent orchestrator
110
- - `src/agents/search_agent.py` - SearchAgent (BaseAgent)
111
- - `src/agents/judge_agent.py` - JudgeAgent (BaseAgent)
112
- - `src/agents/report_agent.py` - ReportAgent (BaseAgent)
113
- - `src/agents/analysis_agent.py` - AnalysisAgent (BaseAgent)
114
- - `src/agents/state.py` - Thread-safe state management
115
- - `src/agents/tools.py` - @ai_function decorated tools
116
-
117
- ---
118
-
119
- ## 5. Mode Selection Logic
120
-
121
- ```python
122
- # src/orchestrator_factory.py (actual implementation)
123
-
124
- def create_orchestrator(
125
- search_handler: SearchHandlerProtocol | None = None,
126
- judge_handler: JudgeHandlerProtocol | None = None,
127
- config: OrchestratorConfig | None = None,
128
- mode: Literal["simple", "magentic", "advanced"] | None = None,
129
- ) -> Any:
130
- """
131
- Auto-select orchestrator based on available credentials.
132
-
133
- Priority:
134
- 1. If mode explicitly set, use that
135
- 2. If OpenAI key available -> Advanced Mode (currently OpenAI only)
136
- 3. Otherwise -> Simple Mode (HuggingFace free tier)
137
- """
138
- effective_mode = _determine_mode(mode)
139
-
140
- if effective_mode == "advanced":
141
- orchestrator_cls = _get_magentic_orchestrator_class()
142
- return orchestrator_cls(max_rounds=config.max_iterations if config else 10)
143
-
144
- # Simple mode requires handlers
145
- if search_handler is None or judge_handler is None:
146
- raise ValueError("Simple mode requires search_handler and judge_handler")
147
-
148
- return Orchestrator(
149
- search_handler=search_handler,
150
- judge_handler=judge_handler,
151
- config=config,
152
- )
153
- ```
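The `_determine_mode` helper is elided above; a plausible reading of the priority rules (details are assumptions, the authoritative logic lives in `src/orchestrator_factory.py`):

```python
import os
from typing import Literal

Mode = Literal["simple", "magentic", "advanced"]

def _determine_mode(mode: Mode | None) -> Literal["simple", "advanced"]:
    if mode is not None:  # 1. an explicit mode always wins
        return "advanced" if mode in ("magentic", "advanced") else "simple"
    if os.environ.get("OPENAI_API_KEY"):  # 2. OpenAI key -> Advanced Mode
        return "advanced"
    return "simple"  # 3. free-tier fallback (HuggingFace)
```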
154
-
155
- ---
156
-
157
- ## 6. Shared Components (Both Modes Use)
158
-
159
- These components work in both modes:
160
-
161
- | Component | Purpose |
162
- |-----------|---------|
163
- | `src/tools/pubmed.py` | PubMed search |
164
- | `src/tools/clinicaltrials.py` | ClinicalTrials.gov search |
165
- | `src/tools/europepmc.py` | Europe PMC search |
166
- | `src/tools/search_handler.py` | Scatter-gather orchestration |
167
- | `src/tools/rate_limiter.py` | Rate limiting |
168
- | `src/utils/models.py` | Evidence, Citation, JudgeAssessment |
169
- | `src/utils/config.py` | Settings |
170
- | `src/services/embeddings.py` | Vector search (optional) |
171
-
172
- ---
173
-
174
- ## 7. pydantic-ai Integration Points
175
-
176
- Both modes use pydantic-ai for structured LLM outputs:
177
-
178
- ```python
179
- # In JudgeHandler (both modes)
180
- from pydantic_ai import Agent
181
- from pydantic_ai.models.huggingface import HuggingFaceModel
182
- from pydantic_ai.models.openai import OpenAIModel
183
- from pydantic_ai.models.anthropic import AnthropicModel
184
-
185
- class JudgeHandler:
186
- def __init__(self, model: Any = None):
187
- self.model = model or get_model() # Auto-selects based on config
188
- self.agent = Agent(
189
- model=self.model,
190
- output_type=JudgeAssessment, # Structured output!
191
- system_prompt=SYSTEM_PROMPT,
192
- )
193
-
194
- async def assess(self, question: str, evidence: list[Evidence]) -> JudgeAssessment:
195
- result = await self.agent.run(format_prompt(question, evidence))
196
- return result.output # Guaranteed to be JudgeAssessment
197
- ```
198
-
199
- ---
200
-
201
- ## 8. Microsoft Agent Framework Integration Points
202
-
203
- Advanced mode wraps pydantic-ai agents in BaseAgent:
204
-
205
- ```python
206
- # In JudgeAgent (advanced mode only)
207
- from agent_framework import BaseAgent, AgentRunResponse, ChatMessage, Role
208
-
209
- class JudgeAgent(BaseAgent):
210
- def __init__(self, judge_handler: JudgeHandlerProtocol):
211
- super().__init__(
212
- name="JudgeAgent",
213
- description="Evaluates evidence quality",
214
- )
215
- self._handler = judge_handler # Uses pydantic-ai internally
216
-
217
- async def run(self, messages, **kwargs) -> AgentRunResponse:
218
- question = extract_question(messages)
219
- evidence = self._evidence_store.get("current", [])
220
-
221
- # Delegate to pydantic-ai powered handler
222
- assessment = await self._handler.assess(question, evidence)
223
-
224
- return AgentRunResponse(
225
- messages=[ChatMessage(role=Role.ASSISTANT, text=format_response(assessment))],
226
- additional_properties={"assessment": assessment.model_dump()},
227
- )
228
- ```
229
-
230
- ---
231
-
232
- ## 9. Benefits of This Architecture
233
-
234
- 1. **Graceful Degradation**: Works without API keys (free tier)
235
- 2. **Progressive Enhancement**: Better with API keys (orchestration)
236
- 3. **Code Reuse**: pydantic-ai handlers shared between modes
237
- 4. **Hackathon Ready**: Demo works without requiring paid keys
238
- 5. **Production Ready**: Full orchestration available when needed
239
- 6. **Future Proof**: Can add more agents to advanced mode
240
- 7. **Testable**: Simple mode is easier to unit test
241
-
242
- ---
243
-
244
- ## 10. Known Risks and Mitigations
245
-
246
- > **From Senior Agent Review**
247
-
248
- ### 10.1 Bridge Complexity (MEDIUM)
249
-
250
- **Risk:** In Advanced Mode, agents (Agent Framework) wrap handlers (pydantic-ai). Both are async. Context variables (`MagenticState`) must propagate correctly through the pydantic-ai call stack.
251
-
252
- **Mitigation:**
253
- - pydantic-ai uses standard Python `contextvars`, which naturally propagate through `await` chains (illustrated below)
254
- - Test context propagation explicitly in integration tests
255
- - If issues arise, pass state explicitly rather than via context vars
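A self-contained illustration of why the first bullet holds, using only the standard library (no project code involved):

```python
import asyncio
from contextvars import ContextVar

current_state: ContextVar[dict] = ContextVar("current_state")

async def inner() -> int:
    # Reads the value the caller set, two awaits deeper in the stack.
    return current_state.get()["round"]

async def outer() -> int:
    current_state.set({"round": 7})
    await asyncio.sleep(0)  # a suspension point does not drop the context
    return await inner()

assert asyncio.run(outer()) == 7
```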
256
-
257
- ### 10.2 Integration Drift (MEDIUM)
258
-
259
- **Risk:** Simple Mode and Advanced Mode might diverge in behavior over time (e.g., Simple Mode uses logic A, Advanced Mode uses logic B).
260
-
261
- **Mitigation:**
262
- - Both modes MUST call the exact same underlying Tools (`src/tools/*`) and Handlers (`src/agent_factory/*`)
263
- - Handlers are the single source of truth for business logic
264
- - Agents are thin wrappers that delegate to handlers
265
-
266
- ### 10.3 Testing Burden (LOW-MEDIUM)
267
-
268
- **Risk:** Two distinct orchestrators (`src/orchestrator.py` and `src/orchestrator_magentic.py`) doubles integration testing surface area.
269
-
270
- **Mitigation:**
271
- - Unit test handlers independently (shared code)
272
- - Integration tests for each mode separately
273
- - End-to-end tests verify same output for same input (determinism permitting)
274
-
275
- ### 10.4 Dependency Conflicts (LOW)
276
-
277
- **Risk:** `agent-framework-core` might conflict with `pydantic-ai`'s dependencies (e.g., different pydantic versions).
278
-
279
- **Status:** Both use `pydantic>=2.x`. Should be compatible.
280
-
281
- ---
282
-
283
- ## 11. Naming Clarification
284
-
285
- > See `00_SITUATION_AND_PLAN.md` Section 4 for full details.
286
-
287
- **Important:** The codebase uses "magentic" in file names (`orchestrator_magentic.py`, `magentic_agents.py`) but this refers to our internal naming for Microsoft Agent Framework integration, **NOT** the `magentic` PyPI package.
288
-
289
- **Future action:** Rename to `orchestrator_advanced.py` to eliminate confusion.
 
docs/brainstorming/magentic-pydantic/02_IMPLEMENTATION_PHASES.md DELETED
@@ -1,112 +0,0 @@
1
- # Implementation Phases: Dual-Mode Agent System
2
-
3
- **Date:** November 27, 2025
4
- **Status:** IMPLEMENTATION PLAN (REVISED)
5
- **Strategy:** TDD (Test-Driven Development), SOLID Principles
6
- **Dependency Strategy:** PyPI (agent-framework-core)
7
-
8
- ---
9
-
10
- ## Phase 0: Environment Validation & Cleanup
11
-
12
- **Goal:** Ensure clean state and dependencies are correctly installed.
13
-
14
- ### Step 0.1: Verify PyPI Package
15
- The `agent-framework-core` package is published on PyPI by Microsoft. Verify installation:
16
-
17
- ```bash
18
- uv sync --all-extras
19
- python -c "from agent_framework import ChatAgent; print('OK')"
20
- ```
21
-
22
- ### Step 0.2: Branch State
23
- We are on `feat/dual-mode-architecture`. Ensure it is up to date with `origin/dev` before starting.
24
-
25
- **Note:** The `reference_repos/agent-framework` folder is kept for reference/documentation only.
26
- The production dependency uses the official PyPI release.
27
-
28
- ---
29
-
30
- ## Phase 1: Pydantic-AI Improvements (Simple Mode)
31
-
32
- **Goal:** Implement `HuggingFaceModel` support in `JudgeHandler` using strict TDD.
33
-
34
- ### Step 1.1: Test First (Red)
35
- Create `tests/unit/agent_factory/test_judges_factory.py`:
36
- - Test `get_model()` returns `HuggingFaceModel` when `LLM_PROVIDER=huggingface`.
37
- - Test `get_model()` respects `HF_TOKEN`.
38
- - Test fallback to OpenAI.
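A sketch of the first of these tests (the import path mirrors the pydantic-ai examples elsewhere in these docs; note the real `Settings` object may cache environment variables at import time, so a reload fixture could be needed):

```python
from pydantic_ai.models.huggingface import HuggingFaceModel

from src.agent_factory.judges import get_model


def test_get_model_prefers_huggingface_when_configured(monkeypatch):
    monkeypatch.setenv("LLM_PROVIDER", "huggingface")
    monkeypatch.setenv("HF_TOKEN", "hf_dummy_token")
    assert isinstance(get_model(), HuggingFaceModel)
```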
39
-
40
- ### Step 1.2: Implementation (Green)
41
- Update `src/utils/config.py`:
42
- - Add `huggingface_model` and `hf_token` fields.
43
-
44
- Update `src/agent_factory/judges.py`:
45
- - Implement `get_model` with the logic derived from the tests.
46
- - Use dependency injection for the model where possible.
47
-
48
- ### Step 1.3: Refactor
49
- Ensure `JudgeHandler` is loosely coupled from the specific model provider.
50
-
51
- ---
52
-
53
- ## Phase 2: Orchestrator Factory (The Switch)
54
-
55
- **Goal:** Implement the factory pattern to switch between Simple and Advanced modes.
56
-
57
- ### Step 2.1: Test First (Red)
58
- Create `tests/unit/test_orchestrator_factory.py`:
59
- - Test `create_orchestrator` returns `Orchestrator` (simple) when API keys are missing.
60
- - Test `create_orchestrator` returns `MagenticOrchestrator` (advanced) when OpenAI key exists.
61
- - Test explicit mode override.
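A sketch of the fallback case (the `stub_search` and `stub_judge` fixtures are hypothetical stand-ins for handler doubles):

```python
from src.orchestrator import Orchestrator
from src.orchestrator_factory import create_orchestrator


def test_factory_defaults_to_simple_mode(monkeypatch, stub_search, stub_judge):
    # With no OpenAI key, auto-detection must fall back to the simple loop.
    monkeypatch.delenv("OPENAI_API_KEY", raising=False)
    orch = create_orchestrator(search_handler=stub_search, judge_handler=stub_judge)
    assert isinstance(orch, Orchestrator)
```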
62
-
63
- ### Step 2.2: Implementation (Green)
64
- Update `src/orchestrator_factory.py` to implement the selection logic.
65
-
66
- ---
67
-
68
- ## Phase 3: Agent Framework Integration (Advanced Mode)
69
-
70
- **Goal:** Integrate Microsoft Agent Framework from PyPI.
71
-
72
- ### Step 3.1: Dependency Management
73
- The `agent-framework-core` package is installed from PyPI:
74
- ```toml
75
- [project.optional-dependencies]
76
- magentic = [
77
- "agent-framework-core>=1.0.0b251120,<2.0.0", # Microsoft Agent Framework (PyPI)
78
- ]
79
- ```
80
- Install with: `uv sync --all-extras`
81
-
82
- ### Step 3.2: Verify Imports (Test First)
83
- Create `tests/unit/agents/test_agent_imports.py`:
84
- - Verify `from agent_framework import ChatAgent` works.
85
- - Verify instantiation of `ChatAgent` with a mock client.
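A minimal shape for that test (the `ChatAgent` keyword arguments shown here are assumptions to check against the installed package version):

```python
from unittest.mock import MagicMock


def test_chat_agent_imports_and_instantiates():
    from agent_framework import ChatAgent  # raises if the extra is not installed

    agent = ChatAgent(chat_client=MagicMock(), instructions="smoke test")
    assert agent is not None
```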
86
-
87
- ### Step 3.3: Update Agents
88
- Refactor `src/agents/*.py` to ensure they match the exact signature of the `ChatAgent` class shipped in `agent-framework-core`.
89
- - **SOLID:** Ensure agents have single responsibilities.
90
- - **DRY:** Share tool definitions between Pydantic-AI simple mode and Agent Framework advanced mode.
91
-
92
- ---
93
-
94
- ## Phase 4: UI & End-to-End Verification
95
-
96
- **Goal:** Update Gradio to reflect the active mode.
97
-
98
- ### Step 4.1: UI Updates
99
- Update `src/app.py` to display "Simple Mode" vs "Advanced Mode".
100
-
101
- ### Step 4.2: End-to-End Test
102
- Run the full loop:
103
- 1. Simple Mode (No Keys) -> Search -> Judge (HF) -> Report.
104
- 2. Advanced Mode (OpenAI Key) -> SearchAgent -> JudgeAgent -> ReportAgent.
105
-
106
- ---
107
-
108
- ## Phase 5: Cleanup & Documentation
109
-
110
- - Remove unused code.
111
- - Update main README.md.
112
- - Final `make check`.
 
docs/brainstorming/magentic-pydantic/03_IMMEDIATE_ACTIONS.md DELETED
@@ -1,112 +0,0 @@
1
- # Immediate Actions Checklist
2
-
3
- **Date:** November 27, 2025
4
- **Priority:** Execute in order
5
-
6
- ---
7
-
8
- ## Before Starting Implementation
9
-
10
- ### 1. Close PR #41 (CRITICAL)
11
-
12
- ```bash
13
- gh pr close 41 --comment "Architecture decision changed. Cherry-picking improvements to preserve both pydantic-ai and Agent Framework capabilities."
14
- ```
15
-
16
- ### 2. Verify HuggingFace Spaces is Safe
17
-
18
- ```bash
19
- # Should show agent framework files exist
20
- git ls-tree --name-only huggingface-upstream/dev -- src/agents/
21
- git ls-tree --name-only huggingface-upstream/dev -- src/orchestrator_magentic.py
22
- ```
23
-
24
- Expected output: Files should exist (they do as of this writing).
25
-
26
- ### 3. Clean Local Environment
27
-
28
- ```bash
29
- # Switch to main first
30
- git checkout main
31
-
32
- # Delete problematic branches
33
- git branch -D refactor/pydantic-unification 2>/dev/null || true
34
- git branch -D feat/pubmed-fulltext 2>/dev/null || true
35
-
36
- # Reset local dev to origin/dev
37
- git branch -D dev 2>/dev/null || true
38
- git checkout -b dev origin/dev
39
-
40
- # Verify agent framework code exists
41
- ls src/agents/
42
- # Expected: __init__.py, analysis_agent.py, hypothesis_agent.py, judge_agent.py,
43
- # magentic_agents.py, report_agent.py, search_agent.py, state.py, tools.py
44
-
45
- ls src/orchestrator_magentic.py
46
- # Expected: file exists
47
- ```
48
-
49
- ### 4. Create Fresh Feature Branch
50
-
51
- ```bash
52
- git checkout -b feat/dual-mode-architecture origin/dev
53
- ```
54
-
55
- ---
56
-
57
- ## Decision Points
58
-
59
- Before proceeding, confirm:
60
-
61
- 1. **For hackathon**: Do we need advanced mode, or is simple mode sufficient?
62
- - Simple mode = faster to implement, works today
63
- - Advanced mode = better quality, more work
64
-
65
- 2. **Timeline**: How much time do we have?
66
- - If < 1 day: Focus on simple mode only
67
- - If > 1 day: Implement dual-mode
68
-
69
- 3. **Dependencies**: Is `agent-framework-core` available?
70
- - Check: `pip index versions agent-framework-core`
71
- - If not on PyPI, may need to install from GitHub
72
-
73
- ---
74
-
75
- ## Quick Start (Simple Mode Only)
76
-
77
- If time is limited, implement only simple mode improvements:
78
-
79
- ```bash
80
- # On feat/dual-mode-architecture branch
81
-
82
- # 1. Update judges.py to add HuggingFace support
83
- # 2. Update config.py to add HF settings
84
- # 3. Create free_tier_demo.py
85
- # 4. Run make check
86
- # 5. Create PR to dev
87
- ```
88
-
89
- This gives you free-tier capability without touching agent framework code.
90
-
91
- ---
92
-
93
- ## Quick Start (Full Dual-Mode)
94
-
95
- If time permits, implement full dual-mode:
96
-
97
- Follow phases 1-6 in `02_IMPLEMENTATION_PHASES.md`
98
-
99
- ---
100
-
101
- ## Emergency Rollback
102
-
103
- If anything goes wrong:
104
-
105
- ```bash
106
- # Reset to safe state
107
- git checkout main
108
- git branch -D feat/dual-mode-architecture
109
- git checkout -b feat/dual-mode-architecture origin/dev
110
- ```
111
-
112
- Origin/dev is the safe fallback - it has agent framework intact.
 
docs/brainstorming/magentic-pydantic/04_FOLLOWUP_REVIEW_REQUEST.md DELETED
@@ -1,158 +0,0 @@
1
- # Follow-Up Review Request: Did We Implement Your Feedback?
2
-
3
- **Date:** November 27, 2025
4
- **Context:** You previously reviewed our dual-mode architecture plan and provided feedback. We have updated the documentation. Please verify we correctly implemented your recommendations.
5
-
6
- ---
7
-
8
- ## Your Original Feedback vs Our Changes
9
-
10
- ### 1. Naming Confusion Clarification
11
-
12
- **Your feedback:** "You are using Microsoft Agent Framework, but you've named your integration 'Magentic'. This caused the confusion."
13
-
14
- **Our change:** Added Section 4 in `00_SITUATION_AND_PLAN.md`:
15
- ```markdown
16
- ## 4. CRITICAL: Naming Confusion Clarification
17
-
18
- > **Senior Agent Review Finding:** The codebase uses "magentic" in file names
19
- > (e.g., `orchestrator_magentic.py`, `magentic_agents.py`) but this is **NOT**
20
- > the `magentic` PyPI package. It's Microsoft Agent Framework.
21
-
22
- **The naming confusion:**
23
- - `magentic` (PyPI package): A different library for structured LLM outputs
24
- - "Magentic" (in our codebase): Our internal name for Microsoft Agent Framework integration
25
- - `agent-framework-core` (PyPI package): Microsoft's actual multi-agent orchestration framework
26
-
27
- **Recommended future action:** Rename `orchestrator_magentic.py` → `orchestrator_advanced.py`
28
- ```
29
-
30
- **Status:** ✅ IMPLEMENTED
31
-
32
- ---
33
-
34
- ### 2. Bridge Complexity Warning
35
-
36
- **Your feedback:** "You must ensure MagenticState (context vars) propagates correctly through the pydantic-ai call stack."
37
-
38
- **Our change:** Added Section 10.1 in `01_ARCHITECTURE_SPEC.md`:
39
- ```markdown
40
- ### 10.1 Bridge Complexity (MEDIUM)
41
-
42
- **Risk:** In Advanced Mode, agents (Agent Framework) wrap handlers (pydantic-ai).
43
- Both are async. Context variables (`MagenticState`) must propagate correctly.
44
-
45
- **Mitigation:**
46
- - pydantic-ai uses standard Python `contextvars`, which naturally propagate through `await` chains
47
- - Test context propagation explicitly in integration tests
48
- - If issues arise, pass state explicitly rather than via context vars
49
- ```
50
-
51
- **Status:** ✅ IMPLEMENTED
52
-
53
- ---
54
-
55
- ### 3. Integration Drift Warning
56
-
57
- **Your feedback:** "Simple Mode and Advanced Mode might diverge in behavior."
58
-
59
- **Our change:** Added Section 10.2 in `01_ARCHITECTURE_SPEC.md`:
60
- ```markdown
61
- ### 10.2 Integration Drift (MEDIUM)
62
-
63
- **Risk:** Simple Mode and Advanced Mode might diverge in behavior over time.
64
-
65
- **Mitigation:**
66
- - Both modes MUST call the exact same underlying Tools (`src/tools/*`) and Handlers (`src/agent_factory/*`)
67
- - Handlers are the single source of truth for business logic
68
- - Agents are thin wrappers that delegate to handlers
69
- ```
70
-
71
- **Status:** ✅ IMPLEMENTED
72
-
73
- ---
74
-
75
- ### 4. Testing Burden Warning
76
-
77
- **Your feedback:** "You now have two distinct orchestrators to maintain. This doubles your integration testing surface area."
78
-
79
- **Our change:** Added Section 10.3 in `01_ARCHITECTURE_SPEC.md`:
80
- ```markdown
81
- ### 10.3 Testing Burden (LOW-MEDIUM)
82
-
83
- **Risk:** Two distinct orchestrators doubles integration testing surface area.
84
-
85
- **Mitigation:**
86
- - Unit test handlers independently (shared code)
87
- - Integration tests for each mode separately
88
- - End-to-end tests verify same output for same input
89
- ```
90
-
91
- **Status:** ✅ IMPLEMENTED
92
-
93
- ---
94
-
95
- ### 5. Rename Recommendation
96
-
97
- **Your feedback:** "Rename `src/orchestrator_magentic.py` to `src/orchestrator_advanced.py`"
98
-
99
- **Our change:** Added Step 3.4 in `02_IMPLEMENTATION_PHASES.md`:
100
- ```markdown
101
- ### Step 3.4: (OPTIONAL) Rename "Magentic" to "Advanced"
102
-
103
- > **Senior Agent Recommendation:** Rename files to eliminate confusion.
104
-
105
- git mv src/orchestrator_magentic.py src/orchestrator_advanced.py
106
- git mv src/agents/magentic_agents.py src/agents/advanced_agents.py
107
-
108
- **Note:** This is optional for the hackathon. Can be done in a follow-up PR.
109
- ```
110
-
111
- **Status:** ✅ DOCUMENTED (marked as optional for hackathon)
112
-
113
- ---
114
-
115
- ### 6. Standardize Wrapper Recommendation
116
-
117
- **Your feedback:** "Create a generic `PydanticAiAgentWrapper(BaseAgent)` class instead of manually wrapping each handler."
118
-
119
- **Our change:** NOT YET DOCUMENTED
120
-
121
- **Status:** ⚠️ NOT IMPLEMENTED - Should we add this?
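For discussion, a rough shape of the recommended wrapper, reusing the imports from the `JudgeAgent` example in `01_ARCHITECTURE_SPEC.md` (everything beyond those imports is an assumption):

```python
from agent_framework import AgentRunResponse, BaseAgent, ChatMessage, Role
from pydantic_ai import Agent


class PydanticAiAgentWrapper(BaseAgent):
    """Generic bridge exposing any pydantic-ai Agent as an Agent Framework agent."""

    def __init__(self, name: str, description: str, inner: Agent):
        super().__init__(name=name, description=description)
        self._inner = inner

    async def run(self, messages, **kwargs) -> AgentRunResponse:
        prompt = messages[-1].text if messages else ""
        result = await self._inner.run(prompt)  # structured output via pydantic-ai
        return AgentRunResponse(
            messages=[ChatMessage(role=Role.ASSISTANT, text=str(result.output))]
        )
```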
122
-
123
- ---
124
-
125
- ## Questions for Your Review
126
-
127
- 1. **Did we correctly implement your feedback?** Are there any misunderstandings in how we interpreted your recommendations?
128
-
129
- 2. **Is the "Standardize Wrapper" recommendation critical?** Should we add it to the implementation phases, or is it a nice-to-have for later?
130
-
131
- 3. **Dependency versioning:** You noted `agent-framework-core>=1.0.0b251120` might be ephemeral. Should we:
132
- - Pin to a specific version?
133
- - Use a version range?
134
- - Install from GitHub source?
135
-
136
- 4. **Anything else we missed?**
137
-
138
- ---
139
-
140
- ## Files to Re-Review
141
-
142
- 1. `00_SITUATION_AND_PLAN.md` - Added Section 4 (Naming Clarification)
143
- 2. `01_ARCHITECTURE_SPEC.md` - Added Sections 10-11 (Risks, Naming)
144
- 3. `02_IMPLEMENTATION_PHASES.md` - Added Step 3.4 (Optional Rename)
145
-
146
- ---
147
-
148
- ## Current Branch State
149
-
150
- We are now on `feat/dual-mode-architecture` branched from `origin/dev`:
151
- - ✅ Agent framework code intact (`src/agents/`, `src/orchestrator_magentic.py`)
152
- - ✅ Documentation committed
153
- - ❌ PR #41 still open (need to close it)
154
- - ❌ Cherry-pick of pydantic-ai improvements not yet done
155
-
156
- ---
157
-
158
- Please confirm: **GO / NO-GO** to proceed with Phase 1 (cherry-picking pydantic-ai improvements)?
 
docs/brainstorming/magentic-pydantic/REVIEW_PROMPT_FOR_SENIOR_AGENT.md DELETED
@@ -1,113 +0,0 @@
1
- # Senior Agent Review Prompt
2
-
3
- Copy and paste everything below this line to a fresh Claude/AI session:
4
-
5
- ---
6
-
7
- ## Context
8
-
9
- I am a junior developer working on a HuggingFace hackathon project called DeepCritical. We made a significant architectural mistake and are now trying to course-correct. I need you to act as a **senior staff engineer** and critically review our proposed solution.
10
-
11
- ## The Situation
12
-
13
- We almost merged a refactor that would have **deleted** our multi-agent orchestration capability, mistakenly believing that `pydantic-ai` (a library for structured LLM outputs) and Microsoft's `agent-framework` (a framework for multi-agent orchestration) were mutually exclusive alternatives.
14
-
15
- **They are not.** They are complementary:
16
- - `pydantic-ai` ensures LLM responses match Pydantic schemas (type-safe outputs)
17
- - `agent-framework` orchestrates multiple agents working together (coordination layer)
18
-
19
- We now want to implement a **dual-mode architecture** where:
20
- - **Simple Mode (No API key):** Uses only pydantic-ai with HuggingFace free tier
21
- - **Advanced Mode (With API key):** Uses Microsoft Agent Framework for orchestration, with pydantic-ai inside each agent for structured outputs
22
-
23
- ## Your Task
24
-
25
- Please perform a **deep, critical review** of:
26
-
27
- 1. **The architecture diagram** (image attached: `assets/magentic-pydantic.png`)
28
- 2. **Our documentation** (4 files listed below)
29
- 3. **The actual codebase** to verify our claims
30
-
31
- ## Specific Questions to Answer
32
-
33
- ### Architecture Validation
34
- 1. Is our understanding correct that pydantic-ai and agent-framework are complementary, not competing?
35
- 2. Does the dual-mode architecture diagram accurately represent how these should integrate?
36
- 3. Are there any architectural flaws or anti-patterns in our proposed design?
37
-
38
- ### Documentation Accuracy
39
- 4. Are the branch states we documented accurate? (Check `git log`, `git ls-tree`)
40
- 5. Is our understanding of what code exists where correct?
41
- 6. Are the implementation phases realistic and in the correct order?
42
- 7. Are there any missing steps or dependencies we overlooked?
43
-
44
- ### Codebase Reality Check
45
- 8. Does `origin/dev` actually have the agent framework code intact? Verify by checking:
46
- - `git ls-tree origin/dev -- src/agents/`
47
- - `git ls-tree origin/dev -- src/orchestrator_magentic.py`
48
- 9. What does the current `src/agents/` code actually import? Does it use `agent_framework` or `agent-framework-core`?
49
- 10. Is the `agent-framework-core` package actually available on PyPI, or do we need to install from source?
50
-
51
- ### Implementation Feasibility
52
- 11. Can the cherry-pick strategy we outlined actually work, or are there merge conflicts we're not seeing?
53
- 12. Is the mode auto-detection logic sound?
54
- 13. What are the risks we haven't identified?
55
-
56
- ### Critical Errors Check
57
- 14. Did we miss anything critical in our analysis?
58
- 15. Are there any factual errors in our documentation?
59
- 16. Would a Google/DeepMind senior engineer approve this plan, or would they flag issues?
60
-
61
- ## Files to Review
62
-
63
- Please read these files in order:
64
-
65
- 1. `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepCritical-1/docs/brainstorming/magentic-pydantic/00_SITUATION_AND_PLAN.md`
66
- 2. `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepCritical-1/docs/brainstorming/magentic-pydantic/01_ARCHITECTURE_SPEC.md`
67
- 3. `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepCritical-1/docs/brainstorming/magentic-pydantic/02_IMPLEMENTATION_PHASES.md`
68
- 4. `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepCritical-1/docs/brainstorming/magentic-pydantic/03_IMMEDIATE_ACTIONS.md`
69
-
70
- And the architecture diagram:
71
- 5. `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepCritical-1/assets/magentic-pydantic.png`
72
-
73
- ## Reference Repositories to Consult
74
-
75
- We have local clones of the source-of-truth repositories:
76
-
77
- - **Original DeepCritical:** `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepCritical-1/reference_repos/DeepCritical/`
78
- - **Microsoft Agent Framework:** `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepCritical-1/reference_repos/agent-framework/`
79
- - **Microsoft AutoGen:** `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepCritical-1/reference_repos/autogen-microsoft/`
80
-
81
- Please cross-reference our hackathon fork against these to verify architectural alignment.
82
-
83
- ## Codebase to Analyze
84
-
85
- Our hackathon fork is at:
86
- `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepCritical-1/`
87
-
88
- Key files to examine:
89
- - `src/agents/` - Agent framework integration
90
- - `src/agent_factory/judges.py` - pydantic-ai integration
91
- - `src/orchestrator.py` - Simple mode orchestrator
92
- - `src/orchestrator_magentic.py` - Advanced mode orchestrator
93
- - `src/orchestrator_factory.py` - Mode selection
94
- - `pyproject.toml` - Dependencies
95
-
96
- ## Expected Output
97
-
98
- Please provide:
99
-
100
- 1. **Validation Summary:** Is our plan sound? (YES/NO with explanation)
101
- 2. **Errors Found:** List any factual errors in our documentation
102
- 3. **Missing Items:** What did we overlook?
103
- 4. **Risk Assessment:** What could go wrong?
104
- 5. **Recommended Changes:** Specific edits to our documentation or plan
105
- 6. **Go/No-Go Recommendation:** Should we proceed with this plan?
106
-
107
- ## Tone
108
-
109
- Be brutally honest. If our plan is flawed, say so directly. We would rather know now than after implementation. Don't soften criticism - we need accuracy.
110
-
111
- ---
112
-
113
- END OF PROMPT
 
docs/bugs/FIX_PLAN_MAGENTIC_MODE.md DELETED
@@ -1,227 +0,0 @@
1
- # Fix Plan: Magentic Mode Report Generation
2
-
3
- **Related Bug**: `P0_MAGENTIC_MODE_BROKEN.md`
4
- **Approach**: Test-Driven Development (TDD)
5
- **Estimated Scope**: 4 tasks, ~2-3 hours
6
-
7
- ---
8
-
9
- ## Problem Summary
10
-
11
- Magentic mode runs but fails to produce readable reports due to:
12
-
13
- 1. **Primary Bug**: `MagenticFinalResultEvent.message` returns `ChatMessage` object, not text
14
- 2. **Secondary Bug**: Max rounds (3) reached before ReportAgent completes
15
- 3. **Tertiary Issues**: Stale "bioRxiv" references in prompts
16
-
17
- ---
18
-
19
- ## Fix Order (TDD)
20
-
21
- ### Phase 1: Write Failing Tests
22
-
23
- **Task 1.1**: Create test for ChatMessage text extraction
24
-
25
- ```python
26
- # tests/unit/test_orchestrator_magentic.py
27
-
28
- def test_process_event_extracts_text_from_chat_message():
29
- """Final result event should extract text from ChatMessage object."""
30
- # Arrange: Mock ChatMessage with .content attribute
31
- # Act: Call _process_event with MagenticFinalResultEvent
32
- # Assert: Returned AgentEvent.message is a string, not object repr
33
- ```
34
-
35
- **Task 1.2**: Create test for max rounds configuration
36
-
37
- ```python
38
- def test_orchestrator_uses_configured_max_rounds():
39
- """MagenticOrchestrator should use max_rounds from constructor."""
40
- # Arrange: Create orchestrator with max_rounds=10
41
- # Act: Build workflow
42
- # Assert: Workflow has max_round_count=10
43
- ```
44
-
45
- **Task 1.3**: Create test for bioRxiv reference removal
46
-
47
- ```python
48
- def test_task_prompt_references_europe_pmc():
49
- """Task prompt should reference Europe PMC, not bioRxiv."""
50
- # Arrange: Create orchestrator
51
- # Act: Check task string in run()
52
- # Assert: Contains "Europe PMC", not "bioRxiv"
53
- ```
54
-
55
- ---
56
-
57
- ### Phase 2: Fix ChatMessage Text Extraction
58
-
59
- **File**: `src/orchestrator_magentic.py`
60
- **Lines**: 192-199
61
-
62
- **Current Code**:
63
- ```python
64
- elif isinstance(event, MagenticFinalResultEvent):
65
- text = event.message.text if event.message else "No result"
66
- ```
67
-
68
- **Fixed Code**:
69
- ```python
70
- elif isinstance(event, MagenticFinalResultEvent):
71
- if event.message:
72
- # ChatMessage may have .content or .text depending on version
73
- if hasattr(event.message, 'content') and event.message.content:
74
- text = str(event.message.content)
75
- elif hasattr(event.message, 'text') and event.message.text:
76
- text = str(event.message.text)
77
- else:
78
- # Fallback: convert entire message to string
79
- text = str(event.message)
80
- else:
81
- text = "No result generated"
82
- ```
83
-
84
- **Why**: The `agent_framework.ChatMessage` object structure may vary. We need defensive extraction.
85
-
86
- ---
87
-
88
- ### Phase 3: Fix Max Rounds Configuration
89
-
90
- **File**: `src/orchestrator_magentic.py`
91
- **Lines**: 97-99
92
-
93
- **Current Code**:
94
- ```python
95
- .with_standard_manager(
96
- chat_client=manager_client,
97
- max_round_count=self._max_rounds, # Already uses config
98
- max_stall_count=3,
99
- max_reset_count=2,
100
- )
101
- ```
102
-
103
- **Issue**: Default `max_rounds` in `__init__` is 10, but workflow may need more for complex queries.
104
-
105
- **Fix**: Verify the value flows through correctly. Add logging.
106
-
107
- ```python
108
- logger.info(
109
- "Building Magentic workflow",
110
- max_rounds=self._max_rounds,
111
- max_stall=3,
112
- max_reset=2,
113
- )
114
- ```
115
-
116
- **Also check**: `src/orchestrator_factory.py` passes config correctly:
117
- ```python
118
- return MagenticOrchestrator(
119
- max_rounds=config.max_iterations if config else 10,
120
- )
121
- ```
122
-
123
- ---
124
-
125
- ### Phase 4: Fix Stale bioRxiv References
126
-
127
- **Files to update**:
128
-
129
- | File | Line | Change |
130
- |------|------|--------|
131
- | `src/orchestrator_magentic.py` | 131 | "bioRxiv" → "Europe PMC" |
132
- | `src/agents/magentic_agents.py` | 32-33 | "bioRxiv" → "Europe PMC" |
133
- | `src/app.py` | 202-203 | "bioRxiv" → "Europe PMC" |
134
-
135
- **Search command to verify**:
136
- ```bash
137
- grep -rn "bioRxiv\|biorxiv" src/
138
- ```
139
-
140
- ---
141
-
142
- ## Implementation Checklist
143
-
144
- ```
145
- [ ] Phase 1: Write failing tests
146
- [ ] 1.1 Test ChatMessage text extraction
147
- [ ] 1.2 Test max rounds configuration
148
- [ ] 1.3 Test Europe PMC references
149
-
150
- [ ] Phase 2: Fix ChatMessage extraction
151
- [ ] Update _process_event() in orchestrator_magentic.py
152
- [ ] Run test 1.1 - should pass
153
-
154
- [ ] Phase 3: Fix max rounds
155
- [ ] Add logging to _build_workflow()
156
- [ ] Verify factory passes config correctly
157
- [ ] Run test 1.2 - should pass
158
-
159
- [ ] Phase 4: Fix bioRxiv references
160
- [ ] Update orchestrator_magentic.py task prompt
161
- [ ] Update magentic_agents.py descriptions
162
- [ ] Update app.py UI text
163
- [ ] Run test 1.3 - should pass
164
- [ ] Run grep to verify no remaining refs
165
-
166
- [ ] Final Verification
167
- [ ] make check passes
168
- [ ] All tests pass (108+)
169
- [ ] Manual test: run_magentic.py produces readable report
170
- ```
171
-
172
- ---
173
-
174
- ## Test Commands
175
-
176
- ```bash
177
- # Run specific test file
178
- uv run pytest tests/unit/test_orchestrator_magentic.py -v
179
-
180
- # Run all tests
181
- uv run pytest tests/unit/ -v
182
-
183
- # Full check
184
- make check
185
-
186
- # Manual integration test
187
- set -a && source .env && set +a
188
- uv run python examples/orchestrator_demo/run_magentic.py "metformin alzheimer"
189
- ```
190
-
191
- ---
192
-
193
- ## Success Criteria
194
-
195
- 1. `run_magentic.py` outputs a readable research report (not `<ChatMessage object>`)
196
- 2. Report includes: Executive Summary, Key Findings, Drug Candidates, References
197
- 3. No "Max round count reached" error with default settings
198
- 4. No "bioRxiv" references anywhere in codebase
199
- 5. All 108+ tests pass
200
- 6. `make check` passes
201
-
202
- ---
203
-
204
- ## Files Modified
205
-
206
- ```
207
- src/
208
- ├── orchestrator_magentic.py # ChatMessage fix, logging
209
- ├── agents/magentic_agents.py # bioRxiv → Europe PMC
210
- └── app.py # bioRxiv → Europe PMC
211
-
212
- tests/unit/
213
- └── test_orchestrator_magentic.py # NEW: 3 tests
214
- ```
215
-
216
- ---
217
-
218
- ## Notes for AI Agent
219
-
220
- When implementing this fix plan:
221
-
222
- 1. **DO NOT** create mock data or fake responses
223
- 2. **DO** write real tests that verify actual behavior
224
- 3. **DO** run `make check` after each phase
225
- 4. **DO** test with real OpenAI API key via `.env`
226
- 5. **DO** preserve existing functionality - simple mode must still work
227
- 6. **DO NOT** over-engineer - minimal changes to fix the specific bugs
 
docs/bugs/P0_MAGENTIC_MODE_BROKEN.md DELETED
@@ -1,116 +0,0 @@
1
- # P0 Bug: Magentic Mode Returns ChatMessage Object Instead of Report Text
2
-
3
- **Status**: OPEN
4
- **Priority**: P0 (Critical)
5
- **Date**: 2025-11-27
6
-
7
- ---
8
-
9
- ## Actual Bug Found (Not What We Thought)
10
-
11
- **The OpenAI key works fine.** The real bug is different:
12
-
13
- ### The Problem
14
-
15
- When Magentic mode completes, the final report returns a `ChatMessage` object instead of the actual text:
16
-
17
- ```
18
- FINAL REPORT:
19
- <agent_framework._types.ChatMessage object at 0x11db70310>
20
- ```
21
-
22
- ### Evidence
23
-
24
- Full test output shows:
25
- 1. Magentic orchestrator starts correctly
26
- 2. SearchAgent finds evidence
27
- 3. HypothesisAgent generates hypotheses
28
- 4. JudgeAgent evaluates
29
- 5. **BUT**: Final output is `ChatMessage` object, not text
30
-
31
- ### Root Cause
32
-
33
- In `src/orchestrator_magentic.py` line 193:
34
-
35
- ```python
36
- elif isinstance(event, MagenticFinalResultEvent):
37
- text = event.message.text if event.message else "No result"
38
- ```
39
-
40
- The `event.message` is a `ChatMessage` object, and `.text` may not extract the content correctly, or the message structure changed in the agent-framework library.
41
-
42
- ---
43
-
44
- ## Secondary Issue: Max Rounds Reached
45
-
46
- The orchestrator hits max rounds before producing a report:
47
-
48
- ```
49
- [ERROR] Magentic Orchestrator: Max round count reached
50
- ```
51
-
52
- This means the workflow times out before the ReportAgent synthesizes the final output.
53
-
54
- ---
55
-
56
- ## What Works
57
-
58
- - OpenAI API key: **Works** (loaded from .env)
59
- - SearchAgent: **Works** (finds evidence from PubMed, ClinicalTrials, Europe PMC)
60
- - HypothesisAgent: **Works** (generates Drug -> Target -> Pathway chains)
61
- - JudgeAgent: **Partial** (evaluates but sometimes loses context)
62
-
63
- ---
64
-
65
- ## Files to Fix
66
-
67
- | File | Line | Issue |
68
- |------|------|-------|
69
- | `src/orchestrator_magentic.py` | 193 | `event.message.text` returns object, not string |
70
- | `src/orchestrator_magentic.py` | 97-99 | `max_round_count=3` too low for full pipeline |
71
-
72
- ---
73
-
74
- ## Suggested Fix
75
-
76
- ```python
77
- # In _process_event, line 192-199
78
- elif isinstance(event, MagenticFinalResultEvent):
79
- # Handle ChatMessage object properly
80
- if event.message:
81
- if hasattr(event.message, 'content'):
82
- text = event.message.content
83
- elif hasattr(event.message, 'text'):
84
- text = event.message.text
85
- else:
86
- text = str(event.message)
87
- else:
88
- text = "No result"
89
- ```
90
-
91
- And increase rounds:
92
-
93
- ```python
94
- # In _build_workflow, line 97
95
- max_round_count=self._max_rounds, # Use configured value, default 10
96
- ```
97
-
98
- ---
99
-
100
- ## Test Command
101
-
102
- ```bash
103
- set -a && source .env && set +a && uv run python examples/orchestrator_demo/run_magentic.py "metformin alzheimer"
104
- ```
105
-
106
- ---
107
-
108
- ## Simple Mode Works
109
-
110
- For reference, simple mode produces full reports:
111
-
112
- ```bash
113
- uv run python examples/orchestrator_demo/run_agent.py "metformin alzheimer"
114
- ```
115
-
116
- Output includes structured report with Drug Candidates, Key Findings, etc.
 
docs/bugs/P1_GRADIO_SETTINGS_CLEANUP.md DELETED
@@ -1,81 +0,0 @@
- # P1 Bug: Gradio Settings Accordion Not Collapsing
-
- **Priority**: P1 (UX Bug)
- **Status**: OPEN
- **Date**: 2025-11-27
- **Target Component**: `src/app.py`
-
- ---
-
- ## 1. Problem Description
-
- The "Settings" accordion in the Gradio UI (containing Orchestrator Mode, API Key, Provider) fails to collapse, even when configured with `open=False`. It remains permanently expanded, cluttering the interface and obscuring the chat history.
-
- ### Symptoms
- - Accordion arrow toggles visually, but content remains visible.
- - Occurs in both local development (`uv run src/app.py`) and HuggingFace Spaces.
-
- ---
-
- ## 2. Root Cause Analysis
-
- **Definitive Cause**: Nested `Blocks` Context Bug.
- `gr.ChatInterface` is itself a high-level abstraction that creates a `gr.Blocks` context. Wrapping `gr.ChatInterface` inside an external `with gr.Blocks():` context causes event listener conflicts, specifically breaking the JavaScript state management for `additional_inputs_accordion`.
-
- **Reference**: [Gradio Issue #8861](https://github.com/gradio-app/gradio/issues/8861) confirms that `additional_inputs_accordion` malfunctions when `ChatInterface` is not the top-level block.
-
- ---
-
- ## 3. Solution Strategy: "The Unwrap Fix"
-
- We will remove the redundant `gr.Blocks` wrapper. This restores the native behavior of `ChatInterface`, ensuring the accordion respects `open=False`.
-
- ### Implementation Plan
-
- **Refactor `src/app.py` / `create_demo()`**:
-
- 1. **Remove** the `with gr.Blocks() as demo:` context manager.
- 2. **Instantiate** `gr.ChatInterface` directly as the `demo` object.
- 3. **Migrate UI Elements**:
-     * **Header**: Move the H1/Title text into the `title` parameter of `ChatInterface`.
-     * **Footer**: Move the footer text ("MCP Server Active...") into the `description` parameter. `ChatInterface` supports Markdown in `description`, making it the ideal place for static info below the title but above the chat.
-
- ### Before (Buggy)
- ```python
- def create_demo():
-     with gr.Blocks() as demo:  # <--- CAUSE OF BUG
-         gr.Markdown("# Title")
-         gr.ChatInterface(..., additional_inputs_accordion=gr.Accordion(open=False))
-         gr.Markdown("Footer")
-     return demo
- ```
-
- ### After (Correct)
- ```python
- def create_demo():
-     return gr.ChatInterface(  # <--- FIX: Top-level component
-         ...,
-         title="🧬 DeepCritical",
-         description="*AI-Powered Drug Repurposing Agent...*\n\n---\n**MCP Server Active**...",
-         additional_inputs_accordion=gr.Accordion(label="⚙️ Settings", open=False)
-     )
- ```
-
- ---
-
- ## 4. Validation
-
- 1. **Run**: `uv run python src/app.py`
- 2. **Check**: Open `http://localhost:7860`
- 3. **Verify**:
-     * Settings accordion starts **COLLAPSED**.
-     * Header title ("DeepCritical") is visible.
-     * Footer text ("MCP Server Active") is visible in the description area.
-     * Chat functionality works (Magentic/Simple modes).
-
- ---
-
- ## 5. Constraints & Notes
-
- - **Layout**: We lose the ability to place arbitrary elements *below* the chat box (footer will move to top, under title), but this is an acceptable trade-off for a working UI.
- - **CSS**: `ChatInterface` handles its own CSS; any custom class styling from the previous footer will be standardized to the description text style.

docs/configuration/CONFIGURATION.md ADDED
@@ -0,0 +1,743 @@
+ # Configuration Guide
+
+ ## Overview
+
+ DeepCritical uses **Pydantic Settings** for centralized configuration management. All settings are defined in the `Settings` class in `src/utils/config.py` and can be configured via environment variables or a `.env` file.
+
+ The configuration system provides:
+
+ - **Type Safety**: Strongly-typed fields with Pydantic validation
+ - **Environment File Support**: Automatically loads from `.env` file (if present)
+ - **Case-Insensitive**: Environment variables are case-insensitive
+ - **Singleton Pattern**: Global `settings` instance for easy access throughout the codebase
+ - **Validation**: Automatic validation on load with helpful error messages
+
+ ## Quick Start
+
+ 1. Create a `.env` file in the project root (see the example below)
+ 2. Set at least one LLM API key (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, or `HF_TOKEN`)
+ 3. Optionally configure other services as needed
+ 4. The application will automatically load and validate your configuration
+
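+ A minimal starter `.env` might look like this (the key value is a placeholder; every line other than the API key is optional):
+
+ ```bash
+ LLM_PROVIDER=openai
+ OPENAI_API_KEY=sk-your-key-here   # placeholder value, replace with a real key
+ WEB_SEARCH_PROVIDER=duckduckgo    # no API key required
+ LOG_LEVEL=INFO
+ ```
+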
+ ## Configuration System Architecture
+
+ ### Settings Class
+
+ The `Settings` class extends `BaseSettings` from `pydantic_settings` and defines all application configuration:
+
+ ```13:21:src/utils/config.py
+ class Settings(BaseSettings):
+     """Strongly-typed application settings."""
+
+     model_config = SettingsConfigDict(
+         env_file=".env",
+         env_file_encoding="utf-8",
+         case_sensitive=False,
+         extra="ignore",
+     )
+ ```
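+
+ Because `case_sensitive=False`, either spelling of a variable reaches the same field. A small sketch (the key value is a placeholder):
+
+ ```python
+ import os
+
+ # Lowercase spelling resolves to the same field as OPENAI_API_KEY
+ os.environ["openai_api_key"] = "sk-placeholder"
+
+ from src.utils.config import Settings
+
+ print(Settings().has_openai_key)  # True
+ ```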
+
+ ### Singleton Instance
+
+ A global `settings` instance is available for import:
+
+ ```234:235:src/utils/config.py
+ # Singleton for easy import
+ settings = get_settings()
+ ```
+
+ ### Usage Pattern
+
+ Access configuration throughout the codebase:
+
+ ```python
+ from src.utils.config import settings
+
+ # Check if API keys are available
+ if settings.has_openai_key:
+     # Use OpenAI
+     pass
+
+ # Access configuration values
+ max_iterations = settings.max_iterations
+ web_search_provider = settings.web_search_provider
+ ```
+
+ ## Required Configuration
+
+ ### LLM Provider
+
+ You must configure at least one LLM provider. The system supports:
+
+ - **OpenAI**: Requires `OPENAI_API_KEY`
+ - **Anthropic**: Requires `ANTHROPIC_API_KEY`
+ - **HuggingFace**: Optional `HF_TOKEN` or `HUGGINGFACE_API_KEY` (can work without a key for public models)
+
+ #### OpenAI Configuration
+
+ ```bash
+ LLM_PROVIDER=openai
+ OPENAI_API_KEY=your_openai_api_key_here
+ OPENAI_MODEL=gpt-5.1
+ ```
+
+ The default model is defined in the `Settings` class:
+
+ ```29:29:src/utils/config.py
+ openai_model: str = Field(default="gpt-5.1", description="OpenAI model name")
+ ```
+
+ #### Anthropic Configuration
+
+ ```bash
+ LLM_PROVIDER=anthropic
+ ANTHROPIC_API_KEY=your_anthropic_api_key_here
+ ANTHROPIC_MODEL=claude-sonnet-4-5-20250929
+ ```
+
+ The default model is defined in the `Settings` class:
+
+ ```30:32:src/utils/config.py
+ anthropic_model: str = Field(
+     default="claude-sonnet-4-5-20250929", description="Anthropic model"
+ )
+ ```
+
+ #### HuggingFace Configuration
+
+ HuggingFace can work without an API key for public models, but an API key provides higher rate limits:
+
+ ```bash
+ # Option 1: Using HF_TOKEN (preferred)
+ HF_TOKEN=your_huggingface_token_here
+
+ # Option 2: Using HUGGINGFACE_API_KEY (alternative)
+ HUGGINGFACE_API_KEY=your_huggingface_api_key_here
+
+ # Default model
+ HUGGINGFACE_MODEL=meta-llama/Llama-3.1-8B-Instruct
+ ```
+
+ The HuggingFace token can be set via either environment variable:
+
+ ```33:35:src/utils/config.py
+ hf_token: str | None = Field(
+     default=None, alias="HF_TOKEN", description="HuggingFace API token"
+ )
+ ```
+
+ ```57:59:src/utils/config.py
+ huggingface_api_key: str | None = Field(
+     default=None, description="HuggingFace API token (HF_TOKEN or HUGGINGFACE_API_KEY)"
+ )
+ ```
+
+ ## Optional Configuration
+
+ ### Embedding Configuration
+
+ DeepCritical supports multiple embedding providers for semantic search and RAG:
+
+ ```bash
+ # Embedding Provider: "openai", "local", or "huggingface"
+ EMBEDDING_PROVIDER=local
+
+ # OpenAI Embedding Model (used by LlamaIndex RAG)
+ OPENAI_EMBEDDING_MODEL=text-embedding-3-small
+
+ # Local Embedding Model (sentence-transformers, used by EmbeddingService)
+ LOCAL_EMBEDDING_MODEL=all-MiniLM-L6-v2
+
+ # HuggingFace Embedding Model
+ HUGGINGFACE_EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
+ ```
+
+ The embedding provider configuration:
+
+ ```47:50:src/utils/config.py
+ embedding_provider: Literal["openai", "local", "huggingface"] = Field(
+     default="local",
+     description="Embedding provider to use",
+ )
+ ```
+
+ **Note**: OpenAI embeddings require `OPENAI_API_KEY`. The local provider (default) uses sentence-transformers and requires no API key.
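+
+ For the default local provider, the underlying call is roughly the following (a sketch using the sentence-transformers API directly; DeepCritical wraps this in the `EmbeddingService` shown later):
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Same model as the LOCAL_EMBEDDING_MODEL default
+ model = SentenceTransformer("all-MiniLM-L6-v2")
+ vectors = model.encode(["metformin and Alzheimer's disease"])
+ print(vectors.shape)  # (1, 384) for this model
+ ```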
+
+ ### Web Search Configuration
+
+ DeepCritical supports multiple web search providers:
+
+ ```bash
+ # Web Search Provider: "serper", "searchxng", "brave", "tavily", or "duckduckgo"
+ # Default: "duckduckgo" (no API key required)
+ WEB_SEARCH_PROVIDER=duckduckgo
+
+ # Serper API Key (for Google search via Serper)
+ SERPER_API_KEY=your_serper_api_key_here
+
+ # SearchXNG Host URL (for self-hosted search)
+ SEARCHXNG_HOST=http://localhost:8080
+
+ # Brave Search API Key
+ BRAVE_API_KEY=your_brave_api_key_here
+
+ # Tavily API Key
+ TAVILY_API_KEY=your_tavily_api_key_here
+ ```
+
+ The web search provider configuration:
+
+ ```71:74:src/utils/config.py
+ web_search_provider: Literal["serper", "searchxng", "brave", "tavily", "duckduckgo"] = Field(
+     default="duckduckgo",
+     description="Web search provider to use",
+ )
+ ```
+
+ **Note**: DuckDuckGo is the default and requires no API key, making it ideal for development and testing.
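+
+ A simple startup guard can be built on the `web_search_available` property documented under Service Availability below (a sketch, not code from the repository):
+
+ ```python
+ from src.utils.config import settings
+
+ if not settings.web_search_available:
+     raise RuntimeError(
+         f"Web search provider {settings.web_search_provider!r} is missing its API key; "
+         "set the matching key or switch to 'duckduckgo'."
+     )
+ ```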
+
+ ### PubMed Configuration
+
+ PubMed search supports an optional NCBI API key for higher rate limits:
+
+ ```bash
+ # NCBI API Key (optional, for higher rate limits: 10 req/sec vs 3 req/sec)
+ NCBI_API_KEY=your_ncbi_api_key_here
+ ```
+
+ The PubMed tool uses this configuration:
+
+ ```22:29:src/tools/pubmed.py
+ def __init__(self, api_key: str | None = None) -> None:
+     self.api_key = api_key or settings.ncbi_api_key
+     # Ignore placeholder values from .env.example
+     if self.api_key == "your-ncbi-key-here":
+         self.api_key = None
+
+     # Use shared rate limiter
+     self._limiter = get_pubmed_limiter(self.api_key)
+ ```
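+
+ Instantiating the tool with no arguments picks up `NCBI_API_KEY` automatically. A usage sketch (the class name `PubMedTool` is an assumption; only its `__init__` is cited above):
+
+ ```python
+ from src.tools.pubmed import PubMedTool  # assumed class name
+
+ tool = PubMedTool()  # falls back to settings.ncbi_api_key
+ # With a key: ~10 req/sec; without: ~3 req/sec (enforced by the shared limiter)
+ ```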
+
+ ### Agent Configuration
+
+ Control agent behavior and research loop execution:
+
+ ```bash
+ # Maximum iterations per research loop (1-50, default: 10)
+ MAX_ITERATIONS=10
+
+ # Search timeout in seconds
+ SEARCH_TIMEOUT=30
+
+ # Use graph-based execution for research flows
+ USE_GRAPH_EXECUTION=false
+ ```
+
+ The agent configuration fields:
+
+ ```80:85:src/utils/config.py
+ # Agent Configuration
+ max_iterations: int = Field(default=10, ge=1, le=50)
+ search_timeout: int = Field(default=30, description="Seconds to wait for search")
+ use_graph_execution: bool = Field(
+     default=False, description="Use graph-based execution for research flows"
+ )
+ ```
+
+ ### Budget & Rate Limiting Configuration
+
+ Control resource limits for research loops:
+
+ ```bash
+ # Default token budget per research loop (1000-1000000, default: 100000)
+ DEFAULT_TOKEN_LIMIT=100000
+
+ # Default time limit per research loop in minutes (1-120, default: 10)
+ DEFAULT_TIME_LIMIT_MINUTES=10
+
+ # Default iterations limit per research loop (1-50, default: 10)
+ DEFAULT_ITERATIONS_LIMIT=10
+ ```
+
+ The budget configuration with validation:
+
+ ```87:105:src/utils/config.py
+ # Budget & Rate Limiting Configuration
+ default_token_limit: int = Field(
+     default=100000,
+     ge=1000,
+     le=1000000,
+     description="Default token budget per research loop",
+ )
+ default_time_limit_minutes: int = Field(
+     default=10,
+     ge=1,
+     le=120,
+     description="Default time limit per research loop (minutes)",
+ )
+ default_iterations_limit: int = Field(
+     default=10,
+     ge=1,
+     le=50,
+     description="Default iterations limit per research loop",
+ )
+ ```
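+
+ An illustrative sketch of how a loop might consume these limits (not the orchestrator's actual bookkeeping):
+
+ ```python
+ from datetime import datetime, timedelta
+
+ from src.utils.config import settings
+
+ deadline = datetime.now() + timedelta(minutes=settings.default_time_limit_minutes)
+ tokens_left = settings.default_token_limit
+
+ for iteration in range(settings.default_iterations_limit):
+     if datetime.now() >= deadline or tokens_left <= 0:
+         break  # budget exhausted
+     # ... run one iteration and subtract its token usage from tokens_left ...
+ ```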
+
+ ### RAG Service Configuration
+
+ Configure the Retrieval-Augmented Generation service:
+
+ ```bash
+ # ChromaDB collection name for RAG
+ RAG_COLLECTION_NAME=deepcritical_evidence
+
+ # Number of top results to retrieve from RAG (1-50, default: 5)
+ RAG_SIMILARITY_TOP_K=5
+
+ # Automatically ingest evidence into RAG
+ RAG_AUTO_INGEST=true
+ ```
+
+ The RAG configuration:
+
+ ```127:141:src/utils/config.py
+ # RAG Service Configuration
+ rag_collection_name: str = Field(
+     default="deepcritical_evidence",
+     description="ChromaDB collection name for RAG",
+ )
+ rag_similarity_top_k: int = Field(
+     default=5,
+     ge=1,
+     le=50,
+     description="Number of top results to retrieve from RAG",
+ )
+ rag_auto_ingest: bool = Field(
+     default=True,
+     description="Automatically ingest evidence into RAG",
+ )
+ ```
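+
+ A sketch of how `rag_similarity_top_k` would typically reach a LlamaIndex query engine (assumes `llama-index` is installed and an embedding model is configured; the service's actual wiring may differ):
+
+ ```python
+ from llama_index.core import Document, VectorStoreIndex
+
+ from src.utils.config import settings
+
+ index = VectorStoreIndex.from_documents([Document(text="example evidence snippet")])
+ engine = index.as_query_engine(similarity_top_k=settings.rag_similarity_top_k)
+ ```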
+
+ ### ChromaDB Configuration
+
+ Configure the vector database for embeddings and RAG:
+
+ ```bash
+ # ChromaDB storage path
+ CHROMA_DB_PATH=./chroma_db
+
+ # Whether to persist ChromaDB to disk
+ CHROMA_DB_PERSIST=true
+
+ # ChromaDB server host (for remote ChromaDB, optional)
+ CHROMA_DB_HOST=localhost
+
+ # ChromaDB server port (for remote ChromaDB, optional)
+ CHROMA_DB_PORT=8000
+ ```
+
+ The ChromaDB configuration:
+
+ ```113:125:src/utils/config.py
+ chroma_db_path: str = Field(default="./chroma_db", description="ChromaDB storage path")
+ chroma_db_persist: bool = Field(
+     default=True,
+     description="Whether to persist ChromaDB to disk",
+ )
+ chroma_db_host: str | None = Field(
+     default=None,
+     description="ChromaDB server host (for remote ChromaDB)",
+ )
+ chroma_db_port: int | None = Field(
+     default=None,
+     description="ChromaDB server port (for remote ChromaDB)",
+ )
+ ```
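+
+ These settings map onto the two chromadb client modes (a sketch of the underlying library calls; the service layer may wrap them differently):
+
+ ```python
+ import chromadb
+
+ from src.utils.config import settings
+
+ if settings.chroma_db_host and settings.chroma_db_port:
+     # Remote server mode
+     client = chromadb.HttpClient(host=settings.chroma_db_host, port=settings.chroma_db_port)
+ else:
+     # Local on-disk mode
+     client = chromadb.PersistentClient(path=settings.chroma_db_path)
+
+ collection = client.get_or_create_collection(settings.rag_collection_name)
+ ```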
+
+ ### External Services
+
+ #### Modal Configuration
+
+ Modal is used for secure sandbox execution of statistical analysis:
+
+ ```bash
+ # Modal Token ID (for Modal sandbox execution)
+ MODAL_TOKEN_ID=your_modal_token_id_here
+
+ # Modal Token Secret
+ MODAL_TOKEN_SECRET=your_modal_token_secret_here
+ ```
+
+ The Modal configuration:
+
+ ```110:112:src/utils/config.py
+ # External Services
+ modal_token_id: str | None = Field(default=None, description="Modal token ID")
+ modal_token_secret: str | None = Field(default=None, description="Modal token secret")
+ ```
+
+ ### Logging Configuration
+
+ Configure structured logging:
+
+ ```bash
+ # Log Level: "DEBUG", "INFO", "WARNING", or "ERROR"
+ LOG_LEVEL=INFO
+ ```
+
+ The logging configuration:
+
+ ```107:108:src/utils/config.py
+ # Logging
+ log_level: Literal["DEBUG", "INFO", "WARNING", "ERROR"] = "INFO"
+ ```
+
+ Logging is configured via the `configure_logging()` function:
+
+ ```212:231:src/utils/config.py
+ def configure_logging(settings: Settings) -> None:
+     """Configure structured logging with the configured log level."""
+     # Set stdlib logging level from settings
+     logging.basicConfig(
+         level=getattr(logging, settings.log_level),
+         format="%(message)s",
+     )
+
+     structlog.configure(
+         processors=[
+             structlog.stdlib.filter_by_level,
+             structlog.stdlib.add_logger_name,
+             structlog.stdlib.add_log_level,
+             structlog.processors.TimeStamper(fmt="iso"),
+             structlog.processors.JSONRenderer(),
+         ],
+         wrapper_class=structlog.stdlib.BoundLogger,
+         context_class=dict,
+         logger_factory=structlog.stdlib.LoggerFactory(),
+     )
+ ```
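+
+ Typical startup usage, after which structlog emits JSON lines at the configured level:
+
+ ```python
+ import structlog
+
+ from src.utils.config import configure_logging, settings
+
+ configure_logging(settings)
+ log = structlog.get_logger(__name__)
+ log.info("startup", provider=settings.llm_provider, log_level=settings.log_level)
+ ```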
+
+ ## Configuration Properties
+
+ The `Settings` class provides helpful properties for checking configuration state:
+
+ ### API Key Availability
+
+ Check which API keys are available:
+
+ ```171:189:src/utils/config.py
+ @property
+ def has_openai_key(self) -> bool:
+     """Check if OpenAI API key is available."""
+     return bool(self.openai_api_key)
+
+ @property
+ def has_anthropic_key(self) -> bool:
+     """Check if Anthropic API key is available."""
+     return bool(self.anthropic_api_key)
+
+ @property
+ def has_huggingface_key(self) -> bool:
+     """Check if HuggingFace API key is available."""
+     return bool(self.huggingface_api_key or self.hf_token)
+
+ @property
+ def has_any_llm_key(self) -> bool:
+     """Check if any LLM API key is available."""
+     return self.has_openai_key or self.has_anthropic_key or self.has_huggingface_key
+ ```
+
+ **Usage:**
+
+ ```python
+ from src.utils.config import settings
+
+ # Check API key availability
+ if settings.has_openai_key:
+     # Use OpenAI
+     pass
+
+ if settings.has_anthropic_key:
+     # Use Anthropic
+     pass
+
+ if settings.has_huggingface_key:
+     # Use HuggingFace
+     pass
+
+ if settings.has_any_llm_key:
+     # At least one LLM is available
+     pass
+ ```
+
+ ### Service Availability
+
+ Check if external services are configured:
+
+ ```143:146:src/utils/config.py
+ @property
+ def modal_available(self) -> bool:
+     """Check if Modal credentials are configured."""
+     return bool(self.modal_token_id and self.modal_token_secret)
+ ```
+
+ ```191:204:src/utils/config.py
+ @property
+ def web_search_available(self) -> bool:
+     """Check if web search is available (either no-key provider or API key present)."""
+     if self.web_search_provider == "duckduckgo":
+         return True  # No API key required
+     if self.web_search_provider == "serper":
+         return bool(self.serper_api_key)
+     if self.web_search_provider == "searchxng":
+         return bool(self.searchxng_host)
+     if self.web_search_provider == "brave":
+         return bool(self.brave_api_key)
+     if self.web_search_provider == "tavily":
+         return bool(self.tavily_api_key)
+     return False
+ ```
+
+ **Usage:**
+
+ ```python
+ from src.utils.config import settings
+
+ # Check service availability
+ if settings.modal_available:
+     # Use Modal sandbox
+     pass
+
+ if settings.web_search_available:
+     # Web search is configured
+     pass
+ ```
+
+ ### API Key Retrieval
+
+ Get the API key for the configured provider:
+
+ ```148:160:src/utils/config.py
+ def get_api_key(self) -> str:
+     """Get the API key for the configured provider."""
+     if self.llm_provider == "openai":
+         if not self.openai_api_key:
+             raise ConfigurationError("OPENAI_API_KEY not set")
+         return self.openai_api_key
+
+     if self.llm_provider == "anthropic":
+         if not self.anthropic_api_key:
+             raise ConfigurationError("ANTHROPIC_API_KEY not set")
+         return self.anthropic_api_key
+
+     raise ConfigurationError(f"Unknown LLM provider: {self.llm_provider}")
+ ```
+
+ For OpenAI-specific operations (e.g., Magentic mode):
+
+ ```162:169:src/utils/config.py
+ def get_openai_api_key(self) -> str:
+     """Get OpenAI API key (required for Magentic function calling)."""
+     if not self.openai_api_key:
+         raise ConfigurationError(
+             "OPENAI_API_KEY not set. Magentic mode requires OpenAI for function calling. "
+             "Use mode='simple' for other providers."
+         )
+     return self.openai_api_key
+ ```
+
+ ## Configuration Usage in Codebase
+
+ The configuration system is used throughout the codebase:
+
+ ### LLM Factory
+
+ The LLM factory uses settings to create the appropriate model:
+
+ ```129:144:src/utils/llm_factory.py
+ if settings.llm_provider == "huggingface":
+     model_name = settings.huggingface_model or "meta-llama/Llama-3.1-8B-Instruct"
+     hf_provider = HuggingFaceProvider(api_key=settings.hf_token)
+     return HuggingFaceModel(model_name, provider=hf_provider)
+
+ if settings.llm_provider == "openai":
+     if not settings.openai_api_key:
+         raise ConfigurationError("OPENAI_API_KEY not set for pydantic-ai")
+     provider = OpenAIProvider(api_key=settings.openai_api_key)
+     return OpenAIModel(settings.openai_model, provider=provider)
+
+ if settings.llm_provider == "anthropic":
+     if not settings.anthropic_api_key:
+         raise ConfigurationError("ANTHROPIC_API_KEY not set for pydantic-ai")
+     anthropic_provider = AnthropicProvider(api_key=settings.anthropic_api_key)
+     return AnthropicModel(settings.anthropic_model, provider=anthropic_provider)
+ ```
+
+ ### Embedding Service
+
+ The embedding service uses the local embedding model configuration:
+
+ ```29:31:src/services/embeddings.py
+ def __init__(self, model_name: str | None = None):
+     self._model_name = model_name or settings.local_embedding_model
+     self._model = SentenceTransformer(self._model_name)
+ ```
+
+ ### Orchestrator Factory
+
+ The orchestrator factory uses settings to determine the execution mode:
+
+ ```69:80:src/orchestrator_factory.py
+ def _determine_mode(explicit_mode: str | None) -> str:
+     """Determine which mode to use."""
+     if explicit_mode:
+         if explicit_mode in ("magentic", "advanced"):
+             return "advanced"
+         return "simple"
+
+     # Auto-detect: advanced if paid API key available
+     if settings.has_openai_key:
+         return "advanced"
+
+     return "simple"
+ ```
+
+ ## Environment Variables Reference
+
+ ### Required (at least one LLM)
+
+ - `OPENAI_API_KEY` - OpenAI API key (required for OpenAI provider)
+ - `ANTHROPIC_API_KEY` - Anthropic API key (required for Anthropic provider)
+ - `HF_TOKEN` or `HUGGINGFACE_API_KEY` - HuggingFace API token (optional, can work without for public models)
+
+ ### LLM Configuration Variables
+
+ - `LLM_PROVIDER` - Provider to use: `"openai"`, `"anthropic"`, or `"huggingface"` (default: `"openai"`, per the field definition cited under Validation Examples)
+ - `OPENAI_MODEL` - OpenAI model name (default: `"gpt-5.1"`)
+ - `ANTHROPIC_MODEL` - Anthropic model name (default: `"claude-sonnet-4-5-20250929"`)
+ - `HUGGINGFACE_MODEL` - HuggingFace model ID (default: `"meta-llama/Llama-3.1-8B-Instruct"`)
+
+ ### Embedding Configuration Variables
+
+ - `EMBEDDING_PROVIDER` - Provider: `"openai"`, `"local"`, or `"huggingface"` (default: `"local"`)
+ - `OPENAI_EMBEDDING_MODEL` - OpenAI embedding model (default: `"text-embedding-3-small"`)
+ - `LOCAL_EMBEDDING_MODEL` - Local sentence-transformers model (default: `"all-MiniLM-L6-v2"`)
+ - `HUGGINGFACE_EMBEDDING_MODEL` - HuggingFace embedding model (default: `"sentence-transformers/all-MiniLM-L6-v2"`)
+
+ ### Web Search Configuration Variables
+
+ - `WEB_SEARCH_PROVIDER` - Provider: `"serper"`, `"searchxng"`, `"brave"`, `"tavily"`, or `"duckduckgo"` (default: `"duckduckgo"`)
+ - `SERPER_API_KEY` - Serper API key (required for Serper provider)
+ - `SEARCHXNG_HOST` - SearchXNG host URL (required for SearchXNG provider)
+ - `BRAVE_API_KEY` - Brave Search API key (required for Brave provider)
+ - `TAVILY_API_KEY` - Tavily API key (required for Tavily provider)
+
+ ### PubMed Configuration Variables
+
+ - `NCBI_API_KEY` - NCBI API key (optional, increases rate limit from 3 to 10 req/sec)
+
+ ### Agent Configuration Variables
+
+ - `MAX_ITERATIONS` - Maximum iterations per research loop (1-50, default: `10`)
+ - `SEARCH_TIMEOUT` - Search timeout in seconds (default: `30`)
+ - `USE_GRAPH_EXECUTION` - Use graph-based execution (default: `false`)
+
+ ### Budget Configuration Variables
+
+ - `DEFAULT_TOKEN_LIMIT` - Default token budget per research loop (1000-1000000, default: `100000`)
+ - `DEFAULT_TIME_LIMIT_MINUTES` - Default time limit in minutes (1-120, default: `10`)
+ - `DEFAULT_ITERATIONS_LIMIT` - Default iterations limit (1-50, default: `10`)
+
+ ### RAG Configuration Variables
+
+ - `RAG_COLLECTION_NAME` - ChromaDB collection name (default: `"deepcritical_evidence"`)
+ - `RAG_SIMILARITY_TOP_K` - Number of top results to retrieve (1-50, default: `5`)
+ - `RAG_AUTO_INGEST` - Automatically ingest evidence into RAG (default: `true`)
+
+ ### ChromaDB Configuration Variables
+
+ - `CHROMA_DB_PATH` - ChromaDB storage path (default: `"./chroma_db"`)
+ - `CHROMA_DB_PERSIST` - Whether to persist ChromaDB to disk (default: `true`)
+ - `CHROMA_DB_HOST` - ChromaDB server host (optional, for remote ChromaDB)
+ - `CHROMA_DB_PORT` - ChromaDB server port (optional, for remote ChromaDB)
+
+ ### External Services Variables
+
+ - `MODAL_TOKEN_ID` - Modal token ID (optional, for Modal sandbox execution)
+ - `MODAL_TOKEN_SECRET` - Modal token secret (optional, for Modal sandbox execution)
+
+ ### Logging Configuration Variables
+
+ - `LOG_LEVEL` - Log level: `"DEBUG"`, `"INFO"`, `"WARNING"`, or `"ERROR"` (default: `"INFO"`)
+
+ ## Validation
+
+ Settings are validated on load using Pydantic validation:
+
+ - **Type Checking**: All fields are strongly typed
+ - **Range Validation**: Numeric fields have min/max constraints (e.g., `ge=1, le=50` for `max_iterations`)
+ - **Literal Validation**: Enum fields only accept specific values (e.g., `Literal["openai", "anthropic", "huggingface"]`)
+ - **Required Fields**: API keys are checked when accessed via `get_api_key()` or `get_openai_api_key()`
+
+ ### Validation Examples
+
+ The `max_iterations` field has range validation:
+
+ ```81:81:src/utils/config.py
+ max_iterations: int = Field(default=10, ge=1, le=50)
+ ```
+
+ The `llm_provider` field has literal validation:
+
+ ```26:28:src/utils/config.py
+ llm_provider: Literal["openai", "anthropic", "huggingface"] = Field(
+     default="openai", description="Which LLM provider to use"
+ )
+ ```
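+
+ Out-of-range values fail fast at construction time. A sketch (the value deliberately violates `le=50`):
+
+ ```python
+ import os
+
+ from pydantic import ValidationError
+
+ os.environ["MAX_ITERATIONS"] = "100"  # outside the allowed 1-50 range
+
+ try:
+     from src.utils.config import Settings
+     Settings()
+ except ValidationError as exc:
+     print(exc)  # names the failing field and constraint
+ ```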
+
+ ## Error Handling
+
+ Configuration errors raise `ConfigurationError` from `src/utils/exceptions.py`:
+
+ ```22:25:src/utils/exceptions.py
+ class ConfigurationError(DeepCriticalError):
+     """Raised when configuration is invalid."""
+
+     pass
+ ```
+
+ ### Error Handling Example
+
+ ```python
+ from src.utils.config import settings
+ from src.utils.exceptions import ConfigurationError
+
+ try:
+     api_key = settings.get_api_key()
+ except ConfigurationError as e:
+     print(f"Configuration error: {e}")
+ ```
+
+ ### Common Configuration Errors
+
+ 1. **Missing API Key**: When `get_api_key()` is called but the required API key is not set
+ 2. **Invalid Provider**: When `llm_provider` is set to an unsupported value
+ 3. **Out of Range**: When numeric values exceed their min/max constraints
+ 4. **Invalid Literal**: When enum fields receive unsupported values
+
+ ## Configuration Best Practices
+
+ 1. **Use a `.env` File**: Store sensitive keys in a `.env` file (add it to `.gitignore`)
+ 2. **Check Availability**: Use properties like `has_openai_key` before accessing API keys
+ 3. **Handle Errors**: Always catch `ConfigurationError` when calling `get_api_key()`
+ 4. **Validate Early**: Configuration is validated on import, so errors surface immediately
+ 5. **Use Defaults**: Leverage sensible defaults for optional configuration
+
+ ## Future Enhancements
+
+ The following configurations are planned for future phases:
+
+ 1. **Additional LLM Providers**: DeepSeek, OpenRouter, Gemini, Perplexity, Azure OpenAI, local models
+ 2. **Model Selection**: Reasoning/main/fast model configuration
+ 3. **Service Integration**: Additional service integrations and configurations
+