Phase 7 Implementation Spec: Hypothesis Agent
Goal: Add an agent that generates scientific hypotheses to guide targeted searches.
Philosophy: "Don't just find evidence; understand the mechanisms."
Prerequisite: Phase 6 complete (Embeddings working)
1. Why Hypothesis Agent?
Current limitation: Search is reactive, not hypothesis-driven.
Current flow:
- User asks about "metformin alzheimer"
- Search finds papers
- Judge says "need more evidence"
- Search again with slightly different keywords
With Hypothesis Agent:
- User asks about "metformin alzheimer"
- Search finds initial papers
- Hypothesis Agent analyzes: "Evidence suggests metformin → AMPK activation → autophagy → amyloid clearance"
- Search can now target: "metformin AMPK", "autophagy neurodegeneration", "amyloid clearance drugs"
Key insight: Scientific research is hypothesis-driven. The agent should think like a researcher.
2. Architecture
Current (Phase 6)
User Query → Magentic Manager
             ├── SearchAgent → Evidence
             └── JudgeAgent → Sufficient? → Synthesize/Continue
Phase 7
User Query → Magentic Manager
             ├── SearchAgent → Evidence
             ├── HypothesisAgent → Mechanistic Hypotheses   ← NEW
             └── JudgeAgent → Sufficient? → Synthesize/Continue
                       ↓
             Uses hypotheses to guide next search
Shared Context Enhancement
evidence_store = {
"current": [],
"embeddings": {},
"vector_index": None,
"hypotheses": [], # NEW: Generated hypotheses
"tested_hypotheses": [], # NEW: Hypotheses with supporting/contradicting evidence
}
3. Hypothesis Model
3.1 Data Model (src/utils/models.py)
class MechanismHypothesis(BaseModel):
"""A scientific hypothesis about drug mechanism."""
drug: str = Field(description="The drug being studied")
target: str = Field(description="Molecular target (e.g., AMPK, mTOR)")
pathway: str = Field(description="Biological pathway affected")
effect: str = Field(description="Downstream effect on disease")
confidence: float = Field(ge=0, le=1, description="Confidence in hypothesis")
supporting_evidence: list[str] = Field(
default_factory=list,
description="PMIDs or URLs supporting this hypothesis"
)
contradicting_evidence: list[str] = Field(
default_factory=list,
description="PMIDs or URLs contradicting this hypothesis"
)
search_suggestions: list[str] = Field(
default_factory=list,
description="Suggested searches to test this hypothesis"
)
def to_search_queries(self) -> list[str]:
"""Generate search queries to test this hypothesis."""
return [
f"{self.drug} {self.target}",
f"{self.target} {self.pathway}",
f"{self.pathway} {self.effect}",
*self.search_suggestions
]
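For orientation, a minimal usage sketch of the model (field values are illustrative, not taken from real evidence):
hypothesis = MechanismHypothesis(
    drug="Metformin",
    target="AMPK",
    pathway="mTOR inhibition",
    effect="Enhanced amyloid-beta clearance",
    confidence=0.7,
    search_suggestions=["metformin AMPK brain"],
)
print(hypothesis.to_search_queries())
# ['Metformin AMPK', 'AMPK mTOR inhibition',
#  'mTOR inhibition Enhanced amyloid-beta clearance', 'metformin AMPK brain']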
3.2 Hypothesis Assessment
class HypothesisAssessment(BaseModel):
"""Assessment of evidence against hypotheses."""
hypotheses: list[MechanismHypothesis]
primary_hypothesis: MechanismHypothesis | None = Field(
description="Most promising hypothesis based on current evidence"
)
knowledge_gaps: list[str] = Field(
description="What we don't know yet"
)
recommended_searches: list[str] = Field(
description="Searches to fill knowledge gaps"
)
4. Implementation
4.0 Text Utilities (src/utils/text_utils.py)
Why These Utilities?
The original spec used arbitrary truncation (evidence[:10] and content[:300]), which loses important information at random. These utilities provide:
- Sentence-aware truncation - cuts at sentence boundaries, not mid-word
- Diverse evidence selection - uses embeddings to select varied evidence (MMR)
"""Text processing utilities for evidence handling."""
from typing import TYPE_CHECKING
if TYPE_CHECKING:
from src.services.embeddings import EmbeddingService
from src.utils.models import Evidence
def truncate_at_sentence(text: str, max_chars: int = 300) -> str:
"""Truncate text at sentence boundary, preserving meaning.
Args:
text: The text to truncate
max_chars: Maximum characters (default 300)
Returns:
Text truncated at last complete sentence within limit
"""
if len(text) <= max_chars:
return text
# Find truncation point
truncated = text[:max_chars]
# Look for sentence endings: . ! ? followed by space or end
for sep in ['. ', '! ', '? ', '.\n', '!\n', '?\n']:
last_sep = truncated.rfind(sep)
if last_sep > max_chars // 2: # Don't truncate too aggressively
return text[:last_sep + 1].strip()
# Fallback: find last period
last_period = truncated.rfind('.')
if last_period > max_chars // 2:
return text[:last_period + 1].strip()
# Last resort: truncate at word boundary
last_space = truncated.rfind(' ')
if last_space > 0:
return text[:last_space].strip() + "..."
return truncated + "..."
async def select_diverse_evidence(
evidence: list["Evidence"],
n: int,
query: str,
embeddings: "EmbeddingService | None" = None
) -> list["Evidence"]:
"""Select n most diverse and relevant evidence items.
Uses Maximal Marginal Relevance (MMR) when embeddings available,
falls back to relevance_score sorting otherwise.
Args:
evidence: All available evidence
n: Number of items to select
query: Original query for relevance scoring
embeddings: Optional EmbeddingService for semantic diversity
Returns:
Selected evidence items, diverse and relevant
"""
if not evidence:
return []
if n >= len(evidence):
return evidence
# Fallback: sort by relevance score if no embeddings
if embeddings is None:
return sorted(
evidence,
key=lambda e: e.relevance_score,
reverse=True
)[:n]
# MMR: Maximal Marginal Relevance for diverse selection
# Score = λ * relevance - (1 - λ) * max_similarity_to_selected
lambda_param = 0.7 # Balance relevance vs diversity
# Get query embedding
query_emb = await embeddings.embed(query)
# Get all evidence embeddings
evidence_embs = await embeddings.embed_batch([e.content for e in evidence])
# Compute relevance scores (cosine similarity to query)
from numpy import dot
from numpy.linalg import norm
cosine = lambda a, b: float(dot(a, b) / (norm(a) * norm(b)))
relevance_scores = [cosine(query_emb, emb) for emb in evidence_embs]
# Greedy MMR selection
selected_indices: list[int] = []
remaining = set(range(len(evidence)))
for _ in range(n):
best_score = float('-inf')
best_idx = -1
for idx in remaining:
# Relevance component
relevance = relevance_scores[idx]
# Diversity component: max similarity to already selected
if selected_indices:
max_sim = max(
cosine(evidence_embs[idx], evidence_embs[sel])
for sel in selected_indices
)
else:
max_sim = 0
# MMR score
mmr_score = lambda_param * relevance - (1 - lambda_param) * max_sim
if mmr_score > best_score:
best_score = mmr_score
best_idx = idx
if best_idx >= 0:
selected_indices.append(best_idx)
remaining.remove(best_idx)
return [evidence[i] for i in selected_indices]
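Both helpers degrade gracefully. A short illustrative sketch of how they behave (the variable names all_evidence and embedding_service are assumptions for the example):
# Sentence-aware truncation cuts at the last complete sentence within the limit
text = "Metformin activates AMPK. AMPK inhibits mTOR signaling. This promotes autophagy."
truncate_at_sentence(text, max_chars=60)
# -> "Metformin activates AMPK. AMPK inhibits mTOR signaling."

# Diverse selection: with an EmbeddingService it runs MMR; with embeddings=None
# it falls back to sorting by relevance_score, so it is safe to call either way.
selected = await select_diverse_evidence(
    all_evidence, n=10, query="metformin alzheimer", embeddings=embedding_service
)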
4.1 Hypothesis Prompts (src/prompts/hypothesis.py)
"""Prompts for Hypothesis Agent."""
from src.utils.text_utils import truncate_at_sentence, select_diverse_evidence
SYSTEM_PROMPT = """You are a biomedical research scientist specializing in drug repurposing.
Your role is to generate mechanistic hypotheses based on evidence.
A good hypothesis:
1. Proposes a MECHANISM: Drug → Target → Pathway → Effect
2. Is TESTABLE: Can be supported or refuted by literature search
3. Is SPECIFIC: Names actual molecular targets and pathways
4. Generates SEARCH QUERIES: Helps find more evidence
Example hypothesis format:
- Drug: Metformin
- Target: AMPK (AMP-activated protein kinase)
- Pathway: mTOR inhibition → autophagy activation
- Effect: Enhanced clearance of amyloid-beta in Alzheimer's
- Confidence: 0.7
- Search suggestions: ["metformin AMPK brain", "autophagy amyloid clearance"]
Be specific. Use actual gene/protein names when possible."""
async def format_hypothesis_prompt(
query: str,
evidence: list,
embeddings=None
) -> str:
"""Format prompt for hypothesis generation.
Uses smart evidence selection instead of arbitrary truncation.
Args:
query: The research query
evidence: All collected evidence
embeddings: Optional EmbeddingService for diverse selection
"""
# Select diverse, relevant evidence (not arbitrary first 10)
selected = await select_diverse_evidence(
evidence, n=10, query=query, embeddings=embeddings
)
# Format with sentence-aware truncation
evidence_text = "\n".join([
f"- **{e.citation.title}** ({e.citation.source}): {truncate_at_sentence(e.content, 300)}"
for e in selected
])
return f"""Based on the following evidence about "{query}", generate mechanistic hypotheses.
## Evidence ({len(selected)} papers selected for diversity)
{evidence_text}
## Task
1. Identify potential drug targets mentioned in the evidence
2. Propose mechanism hypotheses (Drug → Target → Pathway → Effect)
3. Rate confidence based on evidence strength
4. Suggest searches to test each hypothesis
Generate 2-4 hypotheses, prioritized by confidence."""
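Note that the prompt builder is async because diverse selection may need to embed the evidence, so callers must await it. A minimal sketch, assuming evidence is the list from the shared store:
prompt = await format_hypothesis_prompt(
    "metformin alzheimer", evidence, embeddings=None  # embeddings are optional
)
# The rendered prompt contains up to 10 evidence bullets, each truncated at a
# sentence boundary, followed by the four-step task description above.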
4.2 Hypothesis Agent (src/agents/hypothesis_agent.py)
"""Hypothesis agent for mechanistic reasoning."""
from collections.abc import AsyncIterable
from typing import TYPE_CHECKING, Any
from agent_framework import (
AgentRunResponse,
AgentRunResponseUpdate,
AgentThread,
BaseAgent,
ChatMessage,
Role,
)
from pydantic_ai import Agent
from src.prompts.hypothesis import SYSTEM_PROMPT, format_hypothesis_prompt
from src.utils.config import settings
from src.utils.models import Evidence, HypothesisAssessment
if TYPE_CHECKING:
from src.services.embeddings import EmbeddingService
class HypothesisAgent(BaseAgent):
"""Generates mechanistic hypotheses based on evidence."""
def __init__(
self,
evidence_store: dict[str, list[Evidence]],
embedding_service: "EmbeddingService | None" = None, # NEW: for diverse selection
) -> None:
super().__init__(
name="HypothesisAgent",
description="Generates scientific hypotheses about drug mechanisms to guide research",
)
self._evidence_store = evidence_store
self._embeddings = embedding_service # Used for MMR evidence selection
self._agent = Agent(
model=settings.llm_provider, # Uses configured LLM
output_type=HypothesisAssessment,
system_prompt=SYSTEM_PROMPT,
)
async def run(
self,
messages: str | ChatMessage | list[str] | list[ChatMessage] | None = None,
*,
thread: AgentThread | None = None,
**kwargs: Any,
) -> AgentRunResponse:
"""Generate hypotheses based on current evidence."""
# Extract query
query = self._extract_query(messages)
# Get current evidence
evidence = self._evidence_store.get("current", [])
if not evidence:
return AgentRunResponse(
messages=[ChatMessage(
role=Role.ASSISTANT,
text="No evidence available yet. Search for evidence first."
)],
response_id="hypothesis-no-evidence",
)
# Generate hypotheses with diverse evidence selection
# NOTE: format_hypothesis_prompt is now async
prompt = await format_hypothesis_prompt(
query, evidence, embeddings=self._embeddings
)
result = await self._agent.run(prompt)
assessment = result.output
# Store hypotheses in shared context
existing = self._evidence_store.get("hypotheses", [])
self._evidence_store["hypotheses"] = existing + assessment.hypotheses
# Format response
response_text = self._format_response(assessment)
return AgentRunResponse(
messages=[ChatMessage(role=Role.ASSISTANT, text=response_text)],
response_id=f"hypothesis-{len(assessment.hypotheses)}",
additional_properties={"assessment": assessment.model_dump()},
)
def _format_response(self, assessment: HypothesisAssessment) -> str:
"""Format hypothesis assessment as markdown."""
lines = ["## Generated Hypotheses\n"]
for i, h in enumerate(assessment.hypotheses, 1):
lines.append(f"### Hypothesis {i} (Confidence: {h.confidence:.0%})")
lines.append(f"**Mechanism**: {h.drug} β {h.target} β {h.pathway} β {h.effect}")
lines.append(f"**Suggested searches**: {', '.join(h.search_suggestions)}\n")
if assessment.primary_hypothesis:
lines.append(f"### Primary Hypothesis")
h = assessment.primary_hypothesis
lines.append(f"{h.drug} β {h.target} β {h.pathway} β {h.effect}\n")
if assessment.knowledge_gaps:
lines.append("### Knowledge Gaps")
for gap in assessment.knowledge_gaps:
lines.append(f"- {gap}")
if assessment.recommended_searches:
lines.append("\n### Recommended Next Searches")
for search in assessment.recommended_searches:
lines.append(f"- `{search}`")
return "\n".join(lines)
def _extract_query(self, messages) -> str:
"""Extract query from messages."""
if isinstance(messages, str):
return messages
elif isinstance(messages, ChatMessage):
return messages.text or ""
elif isinstance(messages, list):
for msg in reversed(messages):
if isinstance(msg, ChatMessage) and msg.role == Role.USER:
return msg.text or ""
elif isinstance(msg, str):
return msg
return ""
async def run_stream(
self,
messages: str | ChatMessage | list[str] | list[ChatMessage] | None = None,
*,
thread: AgentThread | None = None,
**kwargs: Any,
) -> AsyncIterable[AgentRunResponseUpdate]:
"""Streaming wrapper."""
result = await self.run(messages, thread=thread, **kwargs)
yield AgentRunResponseUpdate(
messages=result.messages,
response_id=result.response_id
)
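A hedged standalone sketch of driving the agent outside the Magentic workflow (assumes an async context and a pre-populated evidence list named collected_evidence):
store = {"current": collected_evidence, "hypotheses": []}
agent = HypothesisAgent(store, embedding_service=None)  # embedding service is optional
response = await agent.run("metformin alzheimer")
print(response.messages[0].text)   # markdown hypothesis report
print(len(store["hypotheses"]))    # hypotheses accumulated in the shared context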
4.3 Update MagenticOrchestrator
Add HypothesisAgent to the workflow:
# In MagenticOrchestrator.__init__
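# Optional: pass the Phase 6 EmbeddingService via embedding_service= to enable MMR evidence selection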
self._hypothesis_agent = HypothesisAgent(self._evidence_store)
# In workflow building
workflow = (
MagenticBuilder()
.participants(
searcher=search_agent,
hypothesizer=self._hypothesis_agent, # NEW
judge=judge_agent,
)
.with_standard_manager(...)
.build()
)
# Update task instruction
task = f"""Research drug repurposing opportunities for: {query}
Workflow:
1. SearchAgent: Find initial evidence from PubMed and web
2. HypothesisAgent: Generate mechanistic hypotheses (Drug → Target → Pathway → Effect)
3. SearchAgent: Use hypothesis-suggested queries for targeted search
4. JudgeAgent: Evaluate if evidence supports hypotheses
5. Repeat until confident or max rounds
Focus on:
- Identifying specific molecular targets
- Understanding mechanism of action
- Finding supporting/contradicting evidence for hypotheses
"""
5. Directory Structure After Phase 7
src/
├── agents/
│   ├── search_agent.py
│   ├── judge_agent.py
│   └── hypothesis_agent.py    # NEW
├── prompts/
│   ├── judge.py
│   └── hypothesis.py          # NEW
├── services/
│   └── embeddings.py
└── utils/
    └── models.py              # Updated with hypothesis models
6. Tests
6.1 Unit Tests (tests/unit/agents/test_hypothesis_agent.py)
"""Unit tests for HypothesisAgent."""
import pytest
from unittest.mock import AsyncMock, MagicMock, patch
from src.agents.hypothesis_agent import HypothesisAgent
from src.utils.models import Citation, Evidence, HypothesisAssessment, MechanismHypothesis
@pytest.fixture
def sample_evidence():
return [
Evidence(
content="Metformin activates AMPK, which inhibits mTOR signaling...",
citation=Citation(
source="pubmed",
title="Metformin and AMPK",
url="https://pubmed.ncbi.nlm.nih.gov/12345/",
date="2023"
)
)
]
@pytest.fixture
def mock_assessment():
return HypothesisAssessment(
hypotheses=[
MechanismHypothesis(
drug="Metformin",
target="AMPK",
pathway="mTOR inhibition",
effect="Reduced cancer cell proliferation",
confidence=0.75,
search_suggestions=["metformin AMPK cancer", "mTOR cancer therapy"]
)
],
primary_hypothesis=None,
knowledge_gaps=["Clinical trial data needed"],
recommended_searches=["metformin clinical trial cancer"]
)
@pytest.mark.asyncio
async def test_hypothesis_agent_generates_hypotheses(sample_evidence, mock_assessment):
"""HypothesisAgent should generate mechanistic hypotheses."""
store = {"current": sample_evidence, "hypotheses": []}
with patch("src.agents.hypothesis_agent.Agent") as MockAgent:
mock_result = MagicMock()
mock_result.output = mock_assessment
MockAgent.return_value.run = AsyncMock(return_value=mock_result)
agent = HypothesisAgent(store)
response = await agent.run("metformin cancer")
assert "AMPK" in response.messages[0].text
assert len(store["hypotheses"]) == 1
@pytest.mark.asyncio
async def test_hypothesis_agent_no_evidence():
"""HypothesisAgent should handle empty evidence gracefully."""
store = {"current": [], "hypotheses": []}
agent = HypothesisAgent(store)
response = await agent.run("test query")
assert "No evidence" in response.messages[0].text
7. Definition of Done
Phase 7 is COMPLETE when:
- MechanismHypothesis and HypothesisAssessment models implemented
- HypothesisAgent generates hypotheses from evidence
- Hypotheses stored in shared context
- Search queries generated from hypotheses
- Magentic workflow includes HypothesisAgent
- All unit tests pass
8. Value Delivered
| Before (Phase 6) | After (Phase 7) |
|---|---|
| Reactive search | Hypothesis-driven search |
| Generic queries | Mechanism-targeted queries |
| No scientific reasoning | Drug → Target → Pathway → Effect |
| Judge says "need more" | Hypothesis says "search for X to test Y" |
Real example improvement:
- Query: "metformin alzheimer"
- Before: "metformin alzheimer mechanism", "metformin brain"
- After: "metformin AMPK activation", "AMPK autophagy neurodegeneration", "autophagy amyloid clearance"
The search becomes scientifically targeted rather than keyword variations.