# Services API Reference

This page documents the API for DeepCritical services.

## EmbeddingService

**Module**: `src.services.embeddings`

**Purpose**: Local sentence-transformers for semantic search and deduplication.

### Methods

#### `embed`

[EmbeddingService.embed](../src/services/embeddings.py) (line 55)

Generates an embedding for a text string.

**Parameters**:

- `text`: Text to embed

**Returns**: Embedding vector as a list of floats.

#### `embed_batch`

```python
async def embed_batch(self, texts: list[str]) -> list[list[float]]
```

Generates embeddings for multiple texts.

**Parameters**:

- `texts`: List of texts to embed

**Returns**: List of embedding vectors.

#### `similarity`

```python
async def similarity(self, text1: str, text2: str) -> float
```

Calculates the similarity between two texts.

**Parameters**:

- `text1`: First text
- `text2`: Second text

**Returns**: Similarity score (0.0-1.0).

#### `find_duplicates`

```python
async def find_duplicates(
    self,
    texts: list[str],
    threshold: float = 0.85
) -> list[tuple[int, int]]
```

Finds duplicate texts based on a similarity threshold.

**Parameters**:

- `texts`: List of texts to check
- `threshold`: Similarity threshold (default: 0.85)

**Returns**: List of `(index1, index2)` tuples for duplicate pairs.

#### `add_evidence`

```python
async def add_evidence(
    self,
    evidence_id: str,
    content: str,
    metadata: dict[str, Any]
) -> None
```

Adds evidence to the vector store for semantic search.

**Parameters**:

- `evidence_id`: Unique identifier for the evidence
- `content`: Evidence text content
- `metadata`: Additional metadata dictionary

#### `search_similar`

```python
async def search_similar(
    self,
    query: str,
    n_results: int = 5
) -> list[dict[str, Any]]
```

Finds semantically similar evidence.

**Parameters**:

- `query`: Search query string
- `n_results`: Number of results to return (default: 5)

**Returns**: List of dictionaries with `id`, `content`, `metadata`, and `distance` keys.
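The pairwise logic behind `find_duplicates` can be sketched in plain Python using cosine similarity over embedding vectors. This is a minimal stand-in, not the service's actual implementation — the real service produces its vectors with sentence-transformers, while here the vectors are supplied directly:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0


def find_duplicate_pairs(
    vectors: list[list[float]], threshold: float = 0.85
) -> list[tuple[int, int]]:
    """Return (i, j) index pairs whose vectors meet the similarity threshold."""
    pairs = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine_similarity(vectors[i], vectors[j]) >= threshold:
                pairs.append((i, j))
    return pairs
```

With real embeddings, near-identical sentences yield vectors whose cosine similarity approaches 1.0, so a 0.85 default catches paraphrases while leaving unrelated texts (similarity near 0) alone.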
#### `deduplicate`

```python
async def deduplicate(
    self,
    new_evidence: list[Evidence],
    threshold: float = 0.9
) -> list[Evidence]
```

Removes semantically duplicate evidence.

**Parameters**:

- `new_evidence`: List of evidence items to deduplicate
- `threshold`: Similarity threshold (default: 0.9; items at least 90% similar count as duplicates)

**Returns**: List of unique evidence items (those not already in the vector store).

### Factory Function

#### `get_embedding_service`

```python
@lru_cache(maxsize=1)
def get_embedding_service() -> EmbeddingService
```

Returns the singleton EmbeddingService instance.

## LlamaIndexRAGService

**Module**: `src.services.rag`

**Purpose**: Retrieval-Augmented Generation using LlamaIndex.

### Methods

#### `ingest_evidence`

[LlamaIndexRAGService.ingest_evidence](../src/services/llamaindex_rag.py) (line 290)

Ingests evidence into the RAG service.

**Parameters**:

- `evidence_list`: List of Evidence objects to ingest

**Note**: Supports multiple embedding providers (OpenAI, local sentence-transformers, Hugging Face).

#### `retrieve`

```python
def retrieve(
    self,
    query: str,
    top_k: int | None = None
) -> list[dict[str, Any]]
```

Retrieves relevant documents for a query.

**Parameters**:

- `query`: Search query string
- `top_k`: Number of top results to return (defaults to `similarity_top_k` from the constructor)

**Returns**: List of dictionaries with `text`, `score`, and `metadata` keys.

#### `query`

```python
def query(
    self,
    query_str: str,
    top_k: int | None = None
) -> str
```

Queries the RAG service and returns a synthesized response.

**Parameters**:

- `query_str`: Query string
- `top_k`: Number of results to use (defaults to `similarity_top_k` from the constructor)

**Returns**: Synthesized response string.

**Raises**:

- `ConfigurationError`: If no LLM API key is available for query synthesis

#### `ingest_documents`

```python
def ingest_documents(self, documents: list[Any]) -> None
```

Ingests raw LlamaIndex Documents.
**Parameters**:

- `documents`: List of LlamaIndex Document objects

#### `clear_collection`

```python
def clear_collection(self) -> None
```

Clears all documents from the collection.

### Factory Function

#### `get_rag_service`

```python
def get_rag_service(
    collection_name: str = "deepcritical_evidence",
    oauth_token: str | None = None,
    **kwargs: Any
) -> LlamaIndexRAGService
```

Gets or creates a RAG service instance.

**Parameters**:

- `collection_name`: Name of the ChromaDB collection (default: `"deepcritical_evidence"`)
- `oauth_token`: Optional OAuth token from Hugging Face login (takes priority over environment variables)
- `**kwargs`: Additional arguments for LlamaIndexRAGService (e.g., `use_openai_embeddings=False`)

**Returns**: Configured LlamaIndexRAGService instance.

**Note**: By default, uses local embeddings (sentence-transformers), which require no API keys.

## StatisticalAnalyzer

**Module**: `src.services.statistical_analyzer`

**Purpose**: Secure execution of AI-generated statistical code.

### Methods

#### `analyze`

```python
async def analyze(
    self,
    query: str,
    evidence: list[Evidence],
    hypothesis: dict[str, Any] | None = None
) -> AnalysisResult
```

Analyzes a research question using statistical methods.

**Parameters**:

- `query`: The research question
- `evidence`: List of Evidence objects to analyze
- `hypothesis`: Optional hypothesis dict with `drug`, `target`, `pathway`, `effect`, and `confidence` keys

**Returns**: `AnalysisResult` with:

- `verdict`: SUPPORTED, REFUTED, or INCONCLUSIVE
- `confidence`: Confidence in the verdict (0.0-1.0)
- `statistical_evidence`: Summary of statistical findings
- `code_generated`: Python code that was executed
- `execution_output`: Output from code execution
- `key_takeaways`: Key takeaways from the analysis
- `limitations`: List of limitations

**Note**: Requires Modal credentials for sandbox execution.
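To illustrate the three verdict labels, here is a minimal sketch of how a significance test's output might map onto them. This mapping and its thresholds are hypothetical — the actual decision logic inside `StatisticalAnalyzer.analyze` is generated and executed per query and is not specified in this reference:

```python
def to_verdict(p_value: float, effect_size: float, alpha: float = 0.05) -> str:
    """Hypothetical mapping from test statistics to a verdict label.

    A significant effect in the hypothesized direction is SUPPORTED,
    a significant effect in the opposite direction is REFUTED,
    and a non-significant result is INCONCLUSIVE.
    """
    if p_value < alpha:
        return "SUPPORTED" if effect_size > 0 else "REFUTED"
    return "INCONCLUSIVE"
```

In practice the service also returns a separate `confidence` score alongside the verdict, so a SUPPORTED verdict from weak evidence can still carry low confidence.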
## See Also

- [Architecture - Services](../architecture/services.md) - Architecture overview
- [Configuration](../configuration/index.md) - Service configuration