# Phase 7 Implementation Spec: Hypothesis Agent

**Goal**: Add an agent that generates scientific hypotheses to guide targeted searches.

**Philosophy**: "Don't just find evidence; understand the mechanisms."

**Prerequisite**: Phase 6 complete (Embeddings working)

---
## 1. Why Hypothesis Agent?

Current limitation: **Search is reactive, not hypothesis-driven.**

Current flow:

1. User asks about "metformin alzheimer"
2. Search finds papers
3. Judge says "need more evidence"
4. Search again with slightly different keywords

With Hypothesis Agent:

1. User asks about "metformin alzheimer"
2. Search finds initial papers
3. **Hypothesis Agent analyzes**: "Evidence suggests metformin → AMPK activation → autophagy → amyloid clearance"
4. Search can now target: "metformin AMPK", "autophagy neurodegeneration", "amyloid clearance drugs"

**Key insight**: Scientific research is hypothesis-driven. The agent should think like a researcher.
---

## 2. Architecture

### Current (Phase 6)

```
User Query → Magentic Manager
    ├── SearchAgent → Evidence
    └── JudgeAgent → Sufficient? → Synthesize/Continue
```

### Phase 7

```
User Query → Magentic Manager
    ├── SearchAgent → Evidence
    ├── HypothesisAgent → Mechanistic Hypotheses   ← NEW
    └── JudgeAgent → Sufficient? → Synthesize/Continue
                          ↑
        Uses hypotheses to guide next search
```
### Shared Context Enhancement

```python
evidence_store = {
    "current": [],
    "embeddings": {},
    "vector_index": None,
    "hypotheses": [],          # NEW: Generated hypotheses
    "tested_hypotheses": [],   # NEW: Hypotheses with supporting/contradicting evidence
}
```
---

## 3. Hypothesis Model

### 3.1 Data Model (`src/utils/models.py`)
```python
class MechanismHypothesis(BaseModel):
    """A scientific hypothesis about drug mechanism."""

    drug: str = Field(description="The drug being studied")
    target: str = Field(description="Molecular target (e.g., AMPK, mTOR)")
    pathway: str = Field(description="Biological pathway affected")
    effect: str = Field(description="Downstream effect on disease")
    confidence: float = Field(ge=0, le=1, description="Confidence in hypothesis")
    supporting_evidence: list[str] = Field(
        default_factory=list,
        description="PMIDs or URLs supporting this hypothesis"
    )
    contradicting_evidence: list[str] = Field(
        default_factory=list,
        description="PMIDs or URLs contradicting this hypothesis"
    )
    search_suggestions: list[str] = Field(
        default_factory=list,
        description="Suggested searches to test this hypothesis"
    )

    def to_search_queries(self) -> list[str]:
        """Generate search queries to test this hypothesis."""
        return [
            f"{self.drug} {self.target}",
            f"{self.target} {self.pathway}",
            f"{self.pathway} {self.effect}",
            *self.search_suggestions,
        ]
```
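For illustration, here is a minimal usage sketch of `to_search_queries()`. The field values are hypothetical examples, not prescribed by this spec:

```python
# Illustrative only: a hand-built hypothesis and the queries it yields.
hypothesis = MechanismHypothesis(
    drug="Metformin",
    target="AMPK",
    pathway="autophagy activation",
    effect="amyloid clearance",
    confidence=0.7,
    search_suggestions=["metformin AMPK brain"],
)

print(hypothesis.to_search_queries())
# ['Metformin AMPK', 'AMPK autophagy activation',
#  'autophagy activation amyloid clearance', 'metformin AMPK brain']
```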
### 3.2 Hypothesis Assessment

```python
class HypothesisAssessment(BaseModel):
    """Assessment of evidence against hypotheses."""

    hypotheses: list[MechanismHypothesis]
    primary_hypothesis: MechanismHypothesis | None = Field(
        default=None,
        description="Most promising hypothesis based on current evidence"
    )
    knowledge_gaps: list[str] = Field(
        description="What we don't know yet"
    )
    recommended_searches: list[str] = Field(
        description="Searches to fill knowledge gaps"
    )
```
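As a sketch of how an orchestration step might consume an assessment, the helper below (hypothetical, not part of the spec) collapses the primary hypothesis and the recommended searches into a de-duplicated query list:

```python
def next_search_queries(assessment: HypothesisAssessment, max_queries: int = 5) -> list[str]:
    """Sketch: derive the next round of searches from an assessment."""
    queries: list[str] = []
    if assessment.primary_hypothesis is not None:
        queries.extend(assessment.primary_hypothesis.to_search_queries())
    queries.extend(assessment.recommended_searches)

    # De-duplicate while preserving order, then cap the round size.
    seen: set[str] = set()
    unique: list[str] = []
    for q in queries:
        if q not in seen:
            seen.add(q)
            unique.append(q)
    return unique[:max_queries]
```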
---

## 4. Implementation

### 4.0 Text Utilities (`src/utils/text_utils.py`)

> **Why These Utilities?**
>
> The original spec used arbitrary truncation (`evidence[:10]` and `content[:300]`).
> This loses important information randomly. These utilities provide:
> 1. **Sentence-aware truncation** - cuts at sentence boundaries, not mid-word
> 2. **Diverse evidence selection** - uses embeddings to select varied evidence (MMR)
```python
"""Text processing utilities for evidence handling."""

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from src.services.embeddings import EmbeddingService
    from src.utils.models import Evidence


def truncate_at_sentence(text: str, max_chars: int = 300) -> str:
    """Truncate text at sentence boundary, preserving meaning.

    Args:
        text: The text to truncate
        max_chars: Maximum characters (default 300)

    Returns:
        Text truncated at last complete sentence within limit
    """
    if len(text) <= max_chars:
        return text

    # Find truncation point
    truncated = text[:max_chars]

    # Look for sentence endings: . ! ? followed by space or newline
    for sep in ['. ', '! ', '? ', '.\n', '!\n', '?\n']:
        last_sep = truncated.rfind(sep)
        if last_sep > max_chars // 2:  # Don't truncate too aggressively
            return text[:last_sep + 1].strip()

    # Fallback: find last period
    last_period = truncated.rfind('.')
    if last_period > max_chars // 2:
        return text[:last_period + 1].strip()

    # Last resort: truncate at word boundary
    last_space = truncated.rfind(' ')
    if last_space > 0:
        return text[:last_space].strip() + "..."

    return truncated + "..."


async def select_diverse_evidence(
    evidence: list["Evidence"],
    n: int,
    query: str,
    embeddings: "EmbeddingService | None" = None
) -> list["Evidence"]:
    """Select n most diverse and relevant evidence items.

    Uses Maximal Marginal Relevance (MMR) when embeddings are available,
    falls back to relevance_score sorting otherwise.

    Args:
        evidence: All available evidence
        n: Number of items to select
        query: Original query for relevance scoring
        embeddings: Optional EmbeddingService for semantic diversity

    Returns:
        Selected evidence items, diverse and relevant
    """
    if not evidence:
        return []
    if n >= len(evidence):
        return evidence

    # Fallback: sort by relevance score if no embeddings
    if embeddings is None:
        return sorted(
            evidence,
            key=lambda e: e.relevance_score,
            reverse=True
        )[:n]

    # MMR: Maximal Marginal Relevance for diverse selection
    # Score = λ * relevance - (1 - λ) * max_similarity_to_selected
    lambda_param = 0.7  # Balance relevance vs diversity

    from numpy import dot
    from numpy.linalg import norm

    def cosine(a, b) -> float:
        return float(dot(a, b) / (norm(a) * norm(b)))

    # Get query embedding and all evidence embeddings
    query_emb = await embeddings.embed(query)
    evidence_embs = await embeddings.embed_batch([e.content for e in evidence])

    # Compute relevance scores (cosine similarity to query)
    relevance_scores = [cosine(query_emb, emb) for emb in evidence_embs]

    # Greedy MMR selection
    selected_indices: list[int] = []
    remaining = set(range(len(evidence)))

    for _ in range(n):
        best_score = float('-inf')
        best_idx = -1
        for idx in remaining:
            # Relevance component
            relevance = relevance_scores[idx]
            # Diversity component: max similarity to already selected
            if selected_indices:
                max_sim = max(
                    cosine(evidence_embs[idx], evidence_embs[sel])
                    for sel in selected_indices
                )
            else:
                max_sim = 0.0
            # MMR score
            mmr_score = lambda_param * relevance - (1 - lambda_param) * max_sim
            if mmr_score > best_score:
                best_score = mmr_score
                best_idx = idx
        if best_idx >= 0:
            selected_indices.append(best_idx)
            remaining.remove(best_idx)

    return [evidence[i] for i in selected_indices]
```
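A quick behavioral sketch of both utilities. The values are illustrative, and the `evidence` list is assumed to hold Phase 6 `Evidence` items:

```python
text = "Metformin activates AMPK. This inhibits mTOR. Autophagy increases markedly."
print(truncate_at_sentence(text, max_chars=50))
# "Metformin activates AMPK. This inhibits mTOR."

# Inside an async context; with embeddings=None the selection falls back to
# sorting by relevance_score instead of running MMR.
selected = await select_diverse_evidence(
    evidence, n=5, query="metformin alzheimer", embeddings=None
)
```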
### 4.1 Hypothesis Prompts (`src/prompts/hypothesis.py`)

```python
"""Prompts for Hypothesis Agent."""

from src.utils.text_utils import select_diverse_evidence, truncate_at_sentence

SYSTEM_PROMPT = """You are a biomedical research scientist specializing in drug repurposing.

Your role is to generate mechanistic hypotheses based on evidence.

A good hypothesis:
1. Proposes a MECHANISM: Drug → Target → Pathway → Effect
2. Is TESTABLE: Can be supported or refuted by literature search
3. Is SPECIFIC: Names actual molecular targets and pathways
4. Generates SEARCH QUERIES: Helps find more evidence

Example hypothesis format:
- Drug: Metformin
- Target: AMPK (AMP-activated protein kinase)
- Pathway: mTOR inhibition → autophagy activation
- Effect: Enhanced clearance of amyloid-beta in Alzheimer's
- Confidence: 0.7
- Search suggestions: ["metformin AMPK brain", "autophagy amyloid clearance"]

Be specific. Use actual gene/protein names when possible."""


async def format_hypothesis_prompt(
    query: str,
    evidence: list,
    embeddings=None
) -> str:
    """Format prompt for hypothesis generation.

    Uses smart evidence selection instead of arbitrary truncation.

    Args:
        query: The research query
        evidence: All collected evidence
        embeddings: Optional EmbeddingService for diverse selection
    """
    # Select diverse, relevant evidence (not arbitrary first 10)
    selected = await select_diverse_evidence(
        evidence, n=10, query=query, embeddings=embeddings
    )

    # Format with sentence-aware truncation
    evidence_text = "\n".join([
        f"- **{e.citation.title}** ({e.citation.source}): {truncate_at_sentence(e.content, 300)}"
        for e in selected
    ])

    return f"""Based on the following evidence about "{query}", generate mechanistic hypotheses.

## Evidence ({len(selected)} papers selected for diversity)
{evidence_text}

## Task
1. Identify potential drug targets mentioned in the evidence
2. Propose mechanism hypotheses (Drug → Target → Pathway → Effect)
3. Rate confidence based on evidence strength
4. Suggest searches to test each hypothesis

Generate 2-4 hypotheses, prioritized by confidence."""
```
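A minimal call-site sketch, assuming the shared `evidence_store` and a Phase 6 `embedding_service` from earlier sections. Note that the formatter is async because diverse selection may embed evidence:

```python
prompt = await format_hypothesis_prompt(
    query="metformin alzheimer",
    evidence=evidence_store["current"],
    embeddings=embedding_service,  # omit to fall back to relevance sorting
)
# The prompt contains up to 10 diverse snippets, each truncated at a sentence
# boundary, followed by the hypothesis-generation task.
```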
### 4.2 Hypothesis Agent (`src/agents/hypothesis_agent.py`)

```python
"""Hypothesis agent for mechanistic reasoning."""

from collections.abc import AsyncIterable
from typing import TYPE_CHECKING, Any

from agent_framework import (
    AgentRunResponse,
    AgentRunResponseUpdate,
    AgentThread,
    BaseAgent,
    ChatMessage,
    Role,
)
from pydantic_ai import Agent

from src.prompts.hypothesis import SYSTEM_PROMPT, format_hypothesis_prompt
from src.utils.config import settings
from src.utils.models import Evidence, HypothesisAssessment

if TYPE_CHECKING:
    from src.services.embeddings import EmbeddingService


class HypothesisAgent(BaseAgent):
    """Generates mechanistic hypotheses based on evidence."""

    def __init__(
        self,
        evidence_store: dict[str, list[Evidence]],
        embedding_service: "EmbeddingService | None" = None,  # NEW: for diverse selection
    ) -> None:
        super().__init__(
            name="HypothesisAgent",
            description="Generates scientific hypotheses about drug mechanisms to guide research",
        )
        self._evidence_store = evidence_store
        self._embeddings = embedding_service  # Used for MMR evidence selection
        self._agent = Agent(
            model=settings.llm_provider,  # Uses configured LLM
            output_type=HypothesisAssessment,
            system_prompt=SYSTEM_PROMPT,
        )

    async def run(
        self,
        messages: str | ChatMessage | list[str] | list[ChatMessage] | None = None,
        *,
        thread: AgentThread | None = None,
        **kwargs: Any,
    ) -> AgentRunResponse:
        """Generate hypotheses based on current evidence."""
        # Extract query
        query = self._extract_query(messages)

        # Get current evidence
        evidence = self._evidence_store.get("current", [])
        if not evidence:
            return AgentRunResponse(
                messages=[ChatMessage(
                    role=Role.ASSISTANT,
                    text="No evidence available yet. Search for evidence first."
                )],
                response_id="hypothesis-no-evidence",
            )

        # Generate hypotheses with diverse evidence selection
        # NOTE: format_hypothesis_prompt is now async
        prompt = await format_hypothesis_prompt(
            query, evidence, embeddings=self._embeddings
        )
        result = await self._agent.run(prompt)
        assessment = result.output

        # Store hypotheses in shared context
        existing = self._evidence_store.get("hypotheses", [])
        self._evidence_store["hypotheses"] = existing + assessment.hypotheses

        # Format response
        response_text = self._format_response(assessment)

        return AgentRunResponse(
            messages=[ChatMessage(role=Role.ASSISTANT, text=response_text)],
            response_id=f"hypothesis-{len(assessment.hypotheses)}",
            additional_properties={"assessment": assessment.model_dump()},
        )

    def _format_response(self, assessment: HypothesisAssessment) -> str:
        """Format hypothesis assessment as markdown."""
        lines = ["## Generated Hypotheses\n"]

        for i, h in enumerate(assessment.hypotheses, 1):
            lines.append(f"### Hypothesis {i} (Confidence: {h.confidence:.0%})")
            lines.append(f"**Mechanism**: {h.drug} → {h.target} → {h.pathway} → {h.effect}")
            lines.append(f"**Suggested searches**: {', '.join(h.search_suggestions)}\n")

        if assessment.primary_hypothesis:
            lines.append("### Primary Hypothesis")
            h = assessment.primary_hypothesis
            lines.append(f"{h.drug} → {h.target} → {h.pathway} → {h.effect}\n")

        if assessment.knowledge_gaps:
            lines.append("### Knowledge Gaps")
            for gap in assessment.knowledge_gaps:
                lines.append(f"- {gap}")

        if assessment.recommended_searches:
            lines.append("\n### Recommended Next Searches")
            for search in assessment.recommended_searches:
                lines.append(f"- `{search}`")

        return "\n".join(lines)

    def _extract_query(self, messages) -> str:
        """Extract query from messages."""
        if isinstance(messages, str):
            return messages
        elif isinstance(messages, ChatMessage):
            return messages.text or ""
        elif isinstance(messages, list):
            for msg in reversed(messages):
                if isinstance(msg, ChatMessage) and msg.role == Role.USER:
                    return msg.text or ""
                elif isinstance(msg, str):
                    return msg
        return ""

    async def run_stream(
        self,
        messages: str | ChatMessage | list[str] | list[ChatMessage] | None = None,
        *,
        thread: AgentThread | None = None,
        **kwargs: Any,
    ) -> AsyncIterable[AgentRunResponseUpdate]:
        """Streaming wrapper."""
        result = await self.run(messages, thread=thread, **kwargs)
        yield AgentRunResponseUpdate(
            messages=result.messages,
            response_id=result.response_id
        )
```
### 4.3 Update MagenticOrchestrator

Add HypothesisAgent to the workflow:

```python
# In MagenticOrchestrator.__init__
self._hypothesis_agent = HypothesisAgent(self._evidence_store)

# In workflow building
workflow = (
    MagenticBuilder()
    .participants(
        searcher=search_agent,
        hypothesizer=self._hypothesis_agent,  # NEW
        judge=judge_agent,
    )
    .with_standard_manager(...)
    .build()
)

# Update task instruction
task = f"""Research drug repurposing opportunities for: {query}

Workflow:
1. SearchAgent: Find initial evidence from PubMed and web
2. HypothesisAgent: Generate mechanistic hypotheses (Drug → Target → Pathway → Effect)
3. SearchAgent: Use hypothesis-suggested queries for targeted search
4. JudgeAgent: Evaluate if evidence supports hypotheses
5. Repeat until confident or max rounds

Focus on:
- Identifying specific molecular targets
- Understanding mechanism of action
- Finding supporting/contradicting evidence for hypotheses
"""
```
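The Magentic manager decides when each participant runs, but the data flow behind step 3 can be sketched as follows. This is an assumed helper, not part of the spec: hypothesis-suggested queries are read back out of the shared store and handed to the searcher.

```python
async def run_targeted_searches(evidence_store: dict, search_agent) -> None:
    """Sketch: issue follow-up searches derived from stored hypotheses."""
    for hypothesis in evidence_store.get("hypotheses", []):
        # Cap queries per hypothesis to keep each round bounded.
        for query in hypothesis.to_search_queries()[:3]:
            await search_agent.run(query)
```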
---

## 5. Directory Structure After Phase 7

```
src/
├── agents/
│   ├── search_agent.py
│   ├── judge_agent.py
│   └── hypothesis_agent.py   # NEW
├── prompts/
│   ├── judge.py
│   └── hypothesis.py         # NEW
├── services/
│   └── embeddings.py
└── utils/
    └── models.py             # Updated with hypothesis models
```
---

## 6. Tests

### 6.1 Unit Tests (`tests/unit/agents/test_hypothesis_agent.py`)

```python
"""Unit tests for HypothesisAgent."""

from unittest.mock import AsyncMock, MagicMock, patch

import pytest

from src.agents.hypothesis_agent import HypothesisAgent
from src.utils.models import Citation, Evidence, HypothesisAssessment, MechanismHypothesis


@pytest.fixture
def sample_evidence():
    return [
        Evidence(
            content="Metformin activates AMPK, which inhibits mTOR signaling...",
            citation=Citation(
                source="pubmed",
                title="Metformin and AMPK",
                url="https://pubmed.ncbi.nlm.nih.gov/12345/",
                date="2023"
            )
        )
    ]


@pytest.fixture
def mock_assessment():
    return HypothesisAssessment(
        hypotheses=[
            MechanismHypothesis(
                drug="Metformin",
                target="AMPK",
                pathway="mTOR inhibition",
                effect="Reduced cancer cell proliferation",
                confidence=0.75,
                search_suggestions=["metformin AMPK cancer", "mTOR cancer therapy"]
            )
        ],
        primary_hypothesis=None,
        knowledge_gaps=["Clinical trial data needed"],
        recommended_searches=["metformin clinical trial cancer"]
    )


@pytest.mark.asyncio
async def test_hypothesis_agent_generates_hypotheses(sample_evidence, mock_assessment):
    """HypothesisAgent should generate mechanistic hypotheses."""
    store = {"current": sample_evidence, "hypotheses": []}

    with patch("src.agents.hypothesis_agent.Agent") as MockAgent:
        mock_result = MagicMock()
        mock_result.output = mock_assessment
        MockAgent.return_value.run = AsyncMock(return_value=mock_result)

        agent = HypothesisAgent(store)
        response = await agent.run("metformin cancer")

    assert "AMPK" in response.messages[0].text
    assert len(store["hypotheses"]) == 1


@pytest.mark.asyncio
async def test_hypothesis_agent_no_evidence():
    """HypothesisAgent should handle empty evidence gracefully."""
    store = {"current": [], "hypotheses": []}
    agent = HypothesisAgent(store)

    response = await agent.run("test query")

    assert "No evidence" in response.messages[0].text
```
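### 6.2 Text Utility Tests (sketch)

The text utilities deserve their own coverage. A sketch, assuming a `tests/unit/utils/test_text_utils.py` file, might look like:

```python
"""Unit tests for text utilities (sketch)."""

import pytest

from src.utils.models import Citation, Evidence
from src.utils.text_utils import select_diverse_evidence, truncate_at_sentence


def test_truncate_at_sentence_cuts_on_boundary():
    text = "First sentence here. Second sentence here. Third sentence follows after."
    result = truncate_at_sentence(text, max_chars=50)
    assert result == "First sentence here. Second sentence here."


@pytest.mark.asyncio
async def test_select_diverse_evidence_without_embeddings():
    evidence = [
        Evidence(
            content="Metformin activates AMPK in hepatocytes.",
            citation=Citation(
                source="pubmed",
                title="Metformin and AMPK",
                url="https://pubmed.ncbi.nlm.nih.gov/12345/",
                date="2023",
            ),
        )
    ]
    # Empty input returns an empty list.
    assert await select_diverse_evidence([], n=3, query="q", embeddings=None) == []
    # n >= len(evidence) returns everything unchanged.
    selected = await select_diverse_evidence(evidence, n=3, query="metformin", embeddings=None)
    assert selected == evidence
```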
---

## 7. Definition of Done

Phase 7 is **COMPLETE** when:

1. `MechanismHypothesis` and `HypothesisAssessment` models implemented
2. `HypothesisAgent` generates hypotheses from evidence
3. Hypotheses stored in shared context
4. Search queries generated from hypotheses
5. Magentic workflow includes HypothesisAgent
6. All unit tests pass

---
## 8. Value Delivered

| Before (Phase 6) | After (Phase 7) |
|------------------|-----------------|
| Reactive search | Hypothesis-driven search |
| Generic queries | Mechanism-targeted queries |
| No scientific reasoning | Drug → Target → Pathway → Effect |
| Judge says "need more" | Hypothesis says "search for X to test Y" |

**Real example improvement:**

- Query: "metformin alzheimer"
- Before: "metformin alzheimer mechanism", "metformin brain"
- After: "metformin AMPK activation", "AMPK autophagy neurodegeneration", "autophagy amyloid clearance"

The search becomes **scientifically targeted** rather than a series of keyword variations.