Joseph Pollack

Tools Architecture

DeepCritical implements a protocol-based search tool system for retrieving evidence from multiple sources.

SearchTool Protocol

All tools implement the SearchTool protocol from src/tools/base.py:

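from typing import Protocol

# Evidence is the project's evidence model; its import is omitted here.
class SearchTool(Protocol):
    @property
    def name(self) -> str: ...

    async def search(
        self,
        query: str,
        max_results: int = 10,
    ) -> list[Evidence]: ...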
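
Because SearchTool is a typing.Protocol, a tool only has to match its shape and does not need to inherit from it. The sketch below shows this structural typing. EchoTool is hypothetical, and Evidence is a simplified stand-in; the real model has more fields. The protocol is also marked @runtime_checkable, which is an assumption made here so that isinstance() can serve as a quick conformance check.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Protocol, runtime_checkable


@dataclass
class Evidence:
    # Hypothetical, simplified stand-in for the project's Evidence model.
    content: str
    url: str
    metadata: dict = field(default_factory=dict)


@runtime_checkable  # assumption: added only so isinstance() works below
class SearchTool(Protocol):
    @property
    def name(self) -> str: ...

    async def search(self, query: str, max_results: int = 10) -> list[Evidence]: ...


class EchoTool:
    """Hypothetical tool: no base class, it simply matches the protocol's shape."""

    @property
    def name(self) -> str:
        return "echo"

    async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
        results = [Evidence(content=f"{query} #{i}", url=f"https://example.org/{i}") for i in range(3)]
        return results[:max_results]


tool = EchoTool()
print(isinstance(tool, SearchTool))  # True
print(len(asyncio.run(tool.search("metformin", max_results=2))))  # 2
```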

Rate Limiting

All tools wrap search() in the @retry decorator from tenacity, so transient failures are retried with exponential backoff:

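from tenacity import retry, stop_after_attempt, wait_exponential

@retry(
    stop=stop_after_attempt(3),  # give up after the third attempt
    wait=wait_exponential(...),  # exponentially growing wait between attempts
)
async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
    # Implementation
    ...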
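
The sketch below shows what "three attempts with exponential waits" amounts to. It is a hand-rolled, stdlib-only equivalent, not tenacity's implementation. retry_async, its parameters and flaky_search are hypothetical names used for illustration.

```python
import asyncio
import functools


def retry_async(attempts: int = 3, base_delay: float = 0.01, max_delay: float = 1.0):
    """Hypothetical helper: retry an async function, doubling the wait after each failure."""
    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return await func(*args, **kwargs)
                except Exception:
                    if attempt == attempts:
                        raise  # out of attempts: surface the last error
                    # Waits of base_delay, 2*base_delay, 4*base_delay, ... capped at max_delay.
                    await asyncio.sleep(min(base_delay * 2 ** (attempt - 1), max_delay))
        return wrapper
    return decorator


attempts_log: list[int] = []


@retry_async(attempts=3)
async def flaky_search(query: str) -> list[str]:
    attempts_log.append(1)
    if len(attempts_log) < 3:  # fail on the first two calls
        raise ConnectionError("transient failure")
    return [f"result for {query}"]


print(asyncio.run(flaky_search("aspirin")), len(attempts_log))  # ['result for aspirin'] 3
```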

Tools that call rate-limited APIs also implement a _rate_limit() method and use the shared rate limiters in src/tools/rate_limiter.py.

Error Handling

Tools raise custom exceptions:

  • SearchError: General search failures
  • RateLimitError: Rate limit exceeded

Tools handle HTTP errors (such as 429 and 500 responses) and timeouts. On non-critical errors they log a warning and return an empty list.
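
The text above does not spell out which failures count as critical. The sketch below assumes the following policy:

- 429 is raised as RateLimitError.
- 5xx responses and timeouts are non-critical and degrade to an empty list.
- Any other non-200 status raises SearchError.

handle_response and with_timeout_fallback are hypothetical helpers. RateLimitError subclassing SearchError is also an assumption.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


class SearchError(Exception):
    """General search failure."""


class RateLimitError(SearchError):
    """Rate limit exceeded (assumed here to subclass SearchError)."""


def handle_response(tool: str, status: int, items: list[str]) -> list[str]:
    """Hypothetical policy mapping an HTTP status to results or an exception."""
    if status == 429:
        # Critical: surface it so a retry wrapper or the caller can back off.
        raise RateLimitError(f"{tool}: rate limit exceeded")
    if status >= 500:
        # Non-critical: log a warning and degrade to "no results".
        logger.warning("%s: server error %d, returning no results", tool, status)
        return []
    if status != 200:
        raise SearchError(f"{tool}: unexpected HTTP status {status}")
    return items


def with_timeout_fallback(tool: str, call: Callable[[], list[str]]) -> list[str]:
    """Treat a timeout as non-critical: warn and return an empty list."""
    try:
        return call()
    except TimeoutError:  # a real tool would catch its HTTP client's timeout class
        logger.warning("%s: request timed out, returning no results", tool)
        return []
```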

Query Preprocessing

Tools use preprocess_query() from src/tools/query_utils.py to:

  • Remove noise from queries
  • Expand synonyms
  • Normalize query format
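
The real word lists and synonym tables in query_utils.py are not shown here. The sketch below therefore uses small hypothetical ones (NOISE_WORDS, SYNONYMS) only to show the three steps in order: normalize, strip noise, then expand synonyms into OR groups.

```python
import re

# Hypothetical tables; the project's actual lists are not documented here.
NOISE_WORDS = {"what", "is", "the", "of", "on", "does", "a", "an", "for", "how", "can"}
SYNONYMS = {"metformin": ["glucophage"], "cancer": ["neoplasm", "tumor"]}


def preprocess_query_sketch(query: str) -> str:
    # 1. Normalize: lowercase, turn punctuation (except hyphens) into spaces, split.
    tokens = re.sub(r"[^\w\s-]", " ", query.lower()).split()
    # 2. Remove noise words that add nothing to a literature search.
    tokens = [t for t in tokens if t not in NOISE_WORDS]
    # 3. Expand known terms into OR groups.
    parts = []
    for t in tokens:
        alts = SYNONYMS.get(t)
        parts.append(f"({' OR '.join([t, *alts])})" if alts else t)
    return " ".join(parts)


print(preprocess_query_sketch("What is the effect of Metformin on cancer?"))
# effect (metformin OR glucophage) (cancer OR neoplasm OR tumor)
```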

Evidence Conversion

All tools convert API responses to Evidence objects with:

  • Citation: Title, URL, date, authors
  • content: Evidence text
  • relevance_score: relevance in the range 0.0 to 1.0
  • metadata: Additional metadata

Fields missing from an API response are filled with default values instead of causing an error.
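
A conversion step like the one described can be sketched with two dataclasses and a converter. The raw record keys (title, url, date, authors, abstract, score), the default values ("Untitled", "Unknown", 0.5) and the clamping of the score are all hypothetical. They are chosen only to show missing fields falling back to defaults and relevance staying within 0.0 to 1.0.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Citation:
    title: str
    url: str
    date: str = "Unknown"
    authors: list[str] = field(default_factory=list)


@dataclass
class Evidence:
    citation: Citation
    content: str
    relevance_score: float = 0.5
    metadata: dict[str, Any] = field(default_factory=dict)


def to_evidence(record: dict[str, Any], source: str) -> Evidence:
    """Convert a raw API record (hypothetical keys) to Evidence, defaulting missing fields."""
    citation = Citation(
        title=record.get("title") or "Untitled",
        url=record.get("url") or "",
        date=record.get("date") or "Unknown",
        authors=list(record.get("authors") or []),
    )
    raw_score = record.get("score")
    score = 0.5 if raw_score is None else float(raw_score)
    return Evidence(
        citation=citation,
        content=record.get("abstract") or "",
        relevance_score=min(max(score, 0.0), 1.0),  # clamp into [0.0, 1.0]
        metadata={"source": source},
    )


ev = to_evidence({"title": "Metformin and ageing", "score": 1.7}, source="pubmed")
print(ev.relevance_score, ev.citation.date, ev.citation.authors)  # 1.0 Unknown []
```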

Tool Implementations

PubMed Tool

File: src/tools/pubmed.py

API: NCBI E-utilities (ESearch → EFetch)

Rate Limiting:

  • 0.34s between requests (3 req/sec without API key)
  • 0.1s between requests (10 req/sec with NCBI API key)
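
The two-step E-utilities flow and the key-dependent spacing can be sketched as follows:

1. ESearch returns PMIDs for a query.
2. EFetch retrieves the XML records for those PMIDs.

The endpoint paths and parameters (db, term, retmax, retmode, id, api_key) are standard E-utilities parameters. The helper names are hypothetical, and the sketch builds URLs only without making any network calls.

```python
from typing import Optional
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def min_interval(api_key: Optional[str]) -> float:
    # 3 req/s without a key -> 0.34 s spacing; 10 req/s with a key -> 0.1 s.
    return 0.1 if api_key else 0.34


def esearch_url(query: str, max_results: int, api_key: Optional[str] = None) -> str:
    """Step 1: ask ESearch for matching PMIDs."""
    params = {"db": "pubmed", "term": query, "retmax": max_results, "retmode": "json"}
    if api_key:
        params["api_key"] = api_key
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def efetch_url(pmids: list[str], api_key: Optional[str] = None) -> str:
    """Step 2: fetch full records (XML) for the PMIDs returned by ESearch."""
    params = {"db": "pubmed", "id": ",".join(pmids), "retmode": "xml"}
    if api_key:
        params["api_key"] = api_key
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"


print(esearch_url("metformin aging", 5))
print(efetch_url(["12345", "67890"]), min_interval(None))
```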

Features:

  • XML parsing with xmltodict
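  • Handles responses containing a single article as well as multiple articles (xmltodict returns a dict for one element and a list for several)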
  • Query preprocessing
  • Evidence conversion with metadata extraction
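
The single-versus-multiple case matters because xmltodict turns one repeated element into a dict and several into a list. The sketch below normalizes that shape and pulls out PMID and title, using hand-written dicts that mimic xmltodict output for PubMed XML. extract_articles and pmid_and_title are hypothetical helper names.

```python
from typing import Any


def extract_articles(parsed: dict[str, Any]) -> list[dict[str, Any]]:
    """Return PubmedArticle entries as a list, whether there was one or many."""
    articles = (parsed.get("PubmedArticleSet") or {}).get("PubmedArticle") or []
    if isinstance(articles, dict):  # exactly one <PubmedArticle>
        articles = [articles]
    return articles


def pmid_and_title(article: dict[str, Any]) -> tuple[str, str]:
    citation = article.get("MedlineCitation", {})
    pmid = citation.get("PMID", "")
    # <PMID Version="1">123</PMID> becomes {"@Version": "1", "#text": "123"}.
    if isinstance(pmid, dict):
        pmid = pmid.get("#text", "")
    title = citation.get("Article", {}).get("ArticleTitle") or "Untitled"
    if isinstance(title, dict):  # titles with inline markup also become dicts
        title = title.get("#text", "Untitled")
    return pmid, title


one = {"PubmedArticleSet": {"PubmedArticle": {
    "MedlineCitation": {"PMID": {"@Version": "1", "#text": "111"}, "Article": {"ArticleTitle": "A"}}}}}
print([pmid_and_title(a) for a in extract_articles(one)])  # [('111', 'A')]
```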

ClinicalTrials Tool

File: src/tools/clinicaltrials.py

API: ClinicalTrials.gov API v2

Important: Uses the requests library (not httpx) because the site's WAF blocks the httpx TLS fingerprint.

Execution: requests is synchronous, so each call runs in a worker thread to avoid blocking the event loop: await asyncio.to_thread(requests.get, ...)
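
The thread-pool pattern can be tried without network access by substituting a blocking stand-in for requests.get. blocking_get and fetch_in_thread are hypothetical names. asyncio.to_thread (Python 3.9+) passes its extra arguments through to the function.

```python
import asyncio
import time
from typing import Any, Callable


def blocking_get(url: str, params: dict[str, Any]) -> dict[str, Any]:
    # Hypothetical stand-in for requests.get(...).json(): blocks its thread.
    time.sleep(0.05)
    return {"url": url, "params": params}


async def fetch_in_thread(get: Callable[..., dict[str, Any]], url: str, params: dict[str, Any]) -> dict[str, Any]:
    # Runs the blocking call in the default thread pool so the event loop
    # stays free to drive the other search tools meanwhile.
    return await asyncio.to_thread(get, url, params)


async def main() -> list[dict[str, Any]]:
    # The three blocking calls overlap in worker threads instead of running back to back.
    return await asyncio.gather(
        *(fetch_in_thread(blocking_get, "https://example.org/api", {"page": i}) for i in range(3))
    )


results = asyncio.run(main())
print([r["params"]["page"] for r in results])  # [0, 1, 2]
```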

Filtering:

  • Only interventional studies
  • Only studies with status COMPLETED, ACTIVE_NOT_RECRUITING, RECRUITING, or ENROLLING_BY_INVITATION
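
The text does not say whether these filters are sent as API query parameters or applied after the response arrives. This sketch assumes client-side filtering over v2 study JSON, with study type under protocolSection.designModule.studyType and status under protocolSection.statusModule.overallStatus. Those field paths should be checked against the API documentation. keep_study and filter_studies are hypothetical.

```python
from typing import Any

ALLOWED_STATUSES = {"COMPLETED", "ACTIVE_NOT_RECRUITING", "RECRUITING", "ENROLLING_BY_INVITATION"}


def keep_study(study: dict[str, Any]) -> bool:
    """True for interventional studies whose overall status is allowed."""
    protocol = study.get("protocolSection", {})
    study_type = protocol.get("designModule", {}).get("studyType")
    status = protocol.get("statusModule", {}).get("overallStatus")
    return study_type == "INTERVENTIONAL" and status in ALLOWED_STATUSES


def filter_studies(studies: list[dict[str, Any]]) -> list[dict[str, Any]]:
    return [s for s in studies if keep_study(s)]
```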

Features:

  • Parses nested JSON structure
  • Extracts trial metadata
  • Evidence conversion
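
Parsing the nested v2 JSON mostly means walking module by module with safe defaults. The sketch below assumes the following v2 field names: identificationModule.nctId, briefTitle, statusModule.overallStatus, designModule.phases and conditionsModule.conditions. It also assumes the trial URL format https://clinicaltrials.gov/study/{nctId}. trial_summary and its output keys are hypothetical.

```python
from typing import Any


def trial_summary(study: dict[str, Any]) -> dict[str, Any]:
    """Flatten the metadata of one v2 study record, defaulting anything missing."""
    protocol = study.get("protocolSection", {})
    ident = protocol.get("identificationModule", {})
    nct_id = ident.get("nctId", "")
    return {
        "nct_id": nct_id,
        "title": ident.get("briefTitle", "Untitled trial"),
        "status": protocol.get("statusModule", {}).get("overallStatus", "UNKNOWN"),
        "phases": protocol.get("designModule", {}).get("phases", []),
        "conditions": protocol.get("conditionsModule", {}).get("conditions", []),
        "url": f"https://clinicaltrials.gov/study/{nct_id}" if nct_id else "",
    }
```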

Europe PMC Tool

File: src/tools/europepmc.py

API: Europe PMC REST API

Features:

  • Labels preprints with the marker [PREPRINT - Not peer-reviewed]
  • Builds URLs from DOI or PMID
  • Checks pubTypeList for preprint detection
  • Includes both preprints and peer-reviewed articles
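
The preprint and URL handling can be sketched over Europe PMC search-result dicts (fields pubTypeList, doi, pmid, abstractText, title). Several details are assumptions:

- the exact "Preprint" pubType string;
- preferring DOI over PMID;
- linking PMIDs to PubMed;
- prefixing the marker to the content.

The helper names are hypothetical.

```python
from typing import Any

PREPRINT_MARKER = "[PREPRINT - Not peer-reviewed]"


def is_preprint(result: dict[str, Any]) -> bool:
    pub_types = (result.get("pubTypeList") or {}).get("pubType") or []
    if isinstance(pub_types, str):  # tolerate a single value instead of a list
        pub_types = [pub_types]
    return any(t.lower() == "preprint" for t in pub_types)


def build_url(result: dict[str, Any]) -> str:
    # Prefer the DOI; fall back to a PubMed link built from the PMID.
    if result.get("doi"):
        return f"https://doi.org/{result['doi']}"
    if result.get("pmid"):
        return f"https://pubmed.ncbi.nlm.nih.gov/{result['pmid']}/"
    return ""


def label_content(result: dict[str, Any]) -> str:
    text = result.get("abstractText") or result.get("title") or ""
    return f"{PREPRINT_MARKER} {text}" if is_preprint(result) else text
```

Peer-reviewed articles pass through unchanged, so both kinds of result can be returned side by side.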

RAG Tool

File: src/tools/rag_tool.py

Purpose: Semantic search within collected evidence

Implementation: Wraps LlamaIndexRAGService

Features:

  • Returns Evidence from RAG results
  • Handles evidence ingestion
  • Semantic similarity search
  • Metadata preservation
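
The wrapping described above is an adapter: the tool forwards a query to the RAG service and turns each hit into Evidence, carrying over the metadata stored at ingestion. LlamaIndexRAGService is not reproduced here. FakeRAGService is a hypothetical stand-in, and it uses Jaccard word overlap as a crude substitute for embedding similarity, so it is lexical rather than semantic.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Evidence:
    # Simplified stand-in for the project's Evidence model.
    content: str
    relevance_score: float
    metadata: dict[str, Any] = field(default_factory=dict)


class FakeRAGService:
    """Hypothetical stand-in for LlamaIndexRAGService."""

    def __init__(self) -> None:
        self.docs: list[tuple[str, dict[str, Any]]] = []

    def ingest(self, text: str, metadata: dict[str, Any]) -> None:
        self.docs.append((text, metadata))

    def retrieve(self, query: str, top_k: int) -> list[tuple[str, float, dict[str, Any]]]:
        q = set(query.lower().split())
        scored = []
        for text, meta in self.docs:
            words = set(text.lower().split())
            union = q | words
            scored.append((text, len(q & words) / len(union) if union else 0.0, meta))
        scored.sort(key=lambda item: item[1], reverse=True)  # best match first
        return scored[:top_k]


class RAGTool:
    name = "rag"

    def __init__(self, service: FakeRAGService) -> None:
        self.service = service

    async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
        hits = self.service.retrieve(query, top_k=max_results)
        # Copy the ingestion-time metadata onto each Evidence.
        return [Evidence(content=t, relevance_score=s, metadata=dict(m)) for t, s, m in hits]


svc = FakeRAGService()
svc.ingest("metformin lowers blood glucose", {"source": "pubmed"})
svc.ingest("statins reduce cholesterol", {"source": "europepmc"})
print(asyncio.run(RAGTool(svc).search("metformin glucose", max_results=1)))
```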

Search Handler

File: src/tools/search_handler.py

Purpose: Orchestrates parallel searches across multiple tools

Features:

  • Uses asyncio.gather() with return_exceptions=True
  • Aggregates results into SearchResult
  • Handles tool failures gracefully
  • Deduplicates results by URL
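
The orchestration pattern can be sketched as follows:

1. Fan out with asyncio.gather(..., return_exceptions=True).
2. Record failed tools instead of aborting.
3. Keep the first Evidence seen for each URL.

The SearchResult fields, search_all and FakeTool are hypothetical; the project's SearchResult may carry more fields.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Sequence


@dataclass
class Evidence:
    url: str
    content: str


@dataclass
class SearchResult:
    evidence: list[Evidence]
    sources_searched: list[str]
    errors: list[str] = field(default_factory=list)


async def search_all(tools: Sequence[Any], query: str, max_results: int = 10) -> SearchResult:
    # return_exceptions=True puts a failing tool's exception in its result slot
    # instead of raising it, so the other tools' results are still collected.
    outcomes = await asyncio.gather(
        *(tool.search(query, max_results) for tool in tools), return_exceptions=True
    )
    evidence: list[Evidence] = []
    sources: list[str] = []
    errors: list[str] = []
    seen_urls: set[str] = set()
    for tool, outcome in zip(tools, outcomes):
        if isinstance(outcome, BaseException):
            errors.append(f"{tool.name}: {outcome}")
            continue
        sources.append(tool.name)
        for ev in outcome:
            if ev.url not in seen_urls:  # deduplicate by URL, first occurrence wins
                seen_urls.add(ev.url)
                evidence.append(ev)
    return SearchResult(evidence=evidence, sources_searched=sources, errors=errors)


class FakeTool:
    """Hypothetical tool returning fixed URLs, or failing when fail=True."""

    def __init__(self, name: str, urls: list[str], fail: bool = False) -> None:
        self.name, self.urls, self.fail = name, urls, fail

    async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
        if self.fail:
            raise RuntimeError("service unavailable")
        return [Evidence(url=u, content=query) for u in self.urls[:max_results]]


result = asyncio.run(search_all(
    [FakeTool("pubmed", ["u1", "u2"]), FakeTool("europepmc", ["u2", "u3"]), FakeTool("trials", [], fail=True)],
    "metformin",
))
print([e.url for e in result.evidence], result.sources_searched, result.errors)
```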

Tool Registration

Tools are registered by passing instances to the SearchHandler constructor:

from src.tools.pubmed import PubMedTool
from src.tools.clinicaltrials import ClinicalTrialsTool
from src.tools.europepmc import EuropePMCTool
from src.tools.search_handler import SearchHandler

search_handler = SearchHandler(
    tools=[
        PubMedTool(),
        ClinicalTrialsTool(),
        EuropePMCTool(),
    ]
)

See Also