Joseph Pollack
# Services API Reference
This page documents the API for DeepCritical services.
## EmbeddingService
**Module**: `src.services.embeddings`
**Purpose**: Local sentence-transformers embeddings for semantic search and deduplication.
### Methods
#### `embed`
<!--codeinclude-->
[EmbeddingService.embed](../src/services/embeddings.py) start_line:55 end_line:55
<!--/codeinclude-->
Generates an embedding for a text string.
**Parameters**:
- `text`: Text to embed
**Returns**: Embedding vector as a list of floats.
#### `embed_batch`
```python
async def embed_batch(self, texts: list[str]) -> list[list[float]]
```
Generates embeddings for multiple texts.
**Parameters**:
- `texts`: List of texts to embed
**Returns**: List of embedding vectors.
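The async calling pattern can be sketched with a stand-in embedder; `FakeEmbedder` below is purely illustrative (the real service loads a sentence-transformers model), but it mirrors the documented signatures:

```python
import asyncio


class FakeEmbedder:
    """Stand-in with the same interface as EmbeddingService (illustrative only)."""

    async def embed(self, text: str) -> list[float]:
        # The real service encodes with a sentence-transformers model;
        # this toy vector just keeps the example self-contained.
        return [float(len(text)), float(text.count(" "))]

    async def embed_batch(self, texts: list[str]) -> list[list[float]]:
        return [await self.embed(t) for t in texts]


async def main() -> None:
    svc = FakeEmbedder()
    vectors = await svc.embed_batch(["hello world", "hi"])
    assert len(vectors) == 2                   # one vector per input text
    assert len(vectors[0]) == len(vectors[1])  # fixed dimensionality


asyncio.run(main())
```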
#### `similarity`
```python
async def similarity(self, text1: str, text2: str) -> float
```
Calculates similarity between two texts.
**Parameters**:
- `text1`: First text
- `text2`: Second text
**Returns**: Similarity score (0.0-1.0).
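The score is presumably cosine similarity over the two embedding vectors (the exact metric is an assumption of this sketch); the computation itself is a few lines of plain Python:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # same direction -> 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal -> 0.0
```

Cosine similarity ranges over [-1.0, 1.0] in general; for typical text embeddings the scores land in the 0.0-1.0 range the method documents.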
#### `find_duplicates`
```python
async def find_duplicates(
self,
texts: list[str],
threshold: float = 0.85
) -> list[tuple[int, int]]
```
Finds duplicate texts based on a similarity threshold.
**Parameters**:
- `texts`: List of texts to check
- `threshold`: Similarity threshold (default: 0.85)
**Returns**: List of (index1, index2) tuples for duplicate pairs.
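The pairwise contract can be sketched in plain Python; `find_duplicate_pairs` below is a hypothetical stand-in that compares every pair of precomputed vectors against the threshold, matching the documented return type:

```python
import math
from itertools import combinations


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))


def find_duplicate_pairs(
    vectors: list[list[float]], threshold: float = 0.85
) -> list[tuple[int, int]]:
    """Return (i, j) index pairs whose cosine similarity meets the threshold."""
    return [
        (i, j)
        for i, j in combinations(range(len(vectors)), 2)
        if cosine(vectors[i], vectors[j]) >= threshold
    ]


vecs = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
print(find_duplicate_pairs(vecs))  # [(0, 1)] - only the near-parallel pair
```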
#### `add_evidence`
```python
async def add_evidence(
self,
evidence_id: str,
content: str,
metadata: dict[str, Any]
) -> None
```
Adds evidence to the vector store for semantic search.
**Parameters**:
- `evidence_id`: Unique identifier for the evidence
- `content`: Evidence text content
- `metadata`: Additional metadata dictionary
#### `search_similar`
```python
async def search_similar(
self,
query: str,
n_results: int = 5
) -> list[dict[str, Any]]
```
Finds semantically similar evidence.
**Parameters**:
- `query`: Search query string
- `n_results`: Number of results to return (default: 5)
**Returns**: List of dictionaries with `id`, `content`, `metadata`, and `distance` keys.
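The result dictionaries are plain data and easy to post-process; for example, ranking hypothetical hits by `distance` (in ChromaDB's default L2 metric smaller means closer, which this sketch assumes):

```python
# Hypothetical results in the documented shape; contents are illustrative only.
results = [
    {"id": "ev2", "content": "Drug B inhibits ...", "metadata": {"source": "pubmed"}, "distance": 0.42},
    {"id": "ev1", "content": "Drug A binds ...", "metadata": {"source": "pubmed"}, "distance": 0.17},
]

# Sort by distance (smaller = more similar) and keep the ids in rank order.
ranked = sorted(results, key=lambda r: r["distance"])
print([r["id"] for r in ranked])  # ['ev1', 'ev2']
```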
#### `deduplicate`
```python
async def deduplicate(
self,
new_evidence: list[Evidence],
threshold: float = 0.9
) -> list[Evidence]
```
Removes semantically duplicate evidence.
**Parameters**:
- `new_evidence`: List of evidence items to deduplicate
- `threshold`: Similarity threshold (default: 0.9; items at least 90% similar are treated as duplicates)
**Returns**: List of unique evidence items (those not already in the vector store).
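The contract can be illustrated with a tiny in-memory stand-in (not the real implementation): an item is kept only if its similarity to every stored item, and to every earlier item in the batch, stays below the threshold:

```python
from typing import Callable


def dedupe(
    new_items: list[str],
    stored: list[str],
    similarity: Callable[[str, str], float],
    threshold: float = 0.9,
) -> list[str]:
    """Keep only items below the similarity threshold against everything seen."""
    unique: list[str] = []
    for item in new_items:
        if all(similarity(item, s) < threshold for s in stored):
            unique.append(item)
            stored = stored + [item]  # later batch items are checked against this one too
    return unique


def sim(a: str, b: str) -> float:
    # Toy metric: identical strings are duplicates, everything else is unrelated.
    return 1.0 if a == b else 0.0


print(dedupe(["a", "b", "a"], ["b"], sim))  # ['a']
```

Here `"b"` is dropped because it is already stored, and the second `"a"` is dropped as a within-batch duplicate.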
### Factory Function
#### `get_embedding_service`
```python
@lru_cache(maxsize=1)
def get_embedding_service() -> EmbeddingService
```
Returns a singleton `EmbeddingService` instance.
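`@lru_cache(maxsize=1)` on a zero-argument factory is a standard singleton idiom in Python: the first call constructs the instance and every later call returns the cached one. A minimal demonstration:

```python
from functools import lru_cache


class Service:
    """Placeholder for an expensive-to-construct service."""


@lru_cache(maxsize=1)
def get_service() -> Service:
    return Service()  # constructed once, then served from the cache


print(get_service() is get_service())  # True - both calls hit the cache
```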
## LlamaIndexRAGService
**Module**: `src.services.rag`
**Purpose**: Retrieval-Augmented Generation using LlamaIndex.
### Methods
#### `ingest_evidence`
<!--codeinclude-->
[LlamaIndexRAGService.ingest_evidence](../src/services/llamaindex_rag.py) start_line:290 end_line:290
<!--/codeinclude-->
Ingests evidence into the RAG service.
**Parameters**:
- `evidence_list`: List of Evidence objects to ingest
**Note**: Supports multiple embedding providers (OpenAI, local sentence-transformers, Hugging Face).
#### `retrieve`
```python
def retrieve(
self,
query: str,
top_k: int | None = None
) -> list[dict[str, Any]]
```
Retrieves relevant documents for a query.
**Parameters**:
- `query`: Search query string
- `top_k`: Number of top results to return (defaults to `similarity_top_k` from constructor)
**Returns**: List of dictionaries with `text`, `score`, and `metadata` keys.
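Because `retrieve` returns plain dictionaries, callers can assemble a prompt context themselves without going through `query`; a sketch over hypothetical hits:

```python
# Hypothetical hits in the documented shape; texts and scores are illustrative.
hits = [
    {"text": "Metformin activates AMPK.", "score": 0.91, "metadata": {"source": "pubmed"}},
    {"text": "AMPK regulates lipid metabolism.", "score": 0.84, "metadata": {"source": "pubmed"}},
]

# Concatenate the retrieved passages into a single context string for a prompt.
context = "\n\n".join(f"[{h['score']:.2f}] {h['text']}" for h in hits)
print(context.splitlines()[0])  # [0.91] Metformin activates AMPK.
```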
#### `query`
```python
def query(
self,
query_str: str,
top_k: int | None = None
) -> str
```
Queries the RAG service and returns a synthesized response.
**Parameters**:
- `query_str`: Query string
- `top_k`: Number of results to use (defaults to `similarity_top_k` from constructor)
**Returns**: Synthesized response string.
**Raises**:
- `ConfigurationError`: If no LLM API key is available for query synthesis
#### `ingest_documents`
```python
def ingest_documents(self, documents: list[Any]) -> None
```
Ingests raw LlamaIndex Documents.
**Parameters**:
- `documents`: List of LlamaIndex Document objects
#### `clear_collection`
```python
def clear_collection(self) -> None
```
Clears all documents from the collection.
### Factory Function
#### `get_rag_service`
```python
def get_rag_service(
collection_name: str = "deepcritical_evidence",
oauth_token: str | None = None,
**kwargs: Any
) -> LlamaIndexRAGService
```
Gets or creates a RAG service instance.
**Parameters**:
- `collection_name`: Name of the ChromaDB collection (default: "deepcritical_evidence")
- `oauth_token`: Optional OAuth token from HuggingFace login (takes priority over env vars)
- `**kwargs`: Additional arguments for LlamaIndexRAGService (e.g., `use_openai_embeddings=False`)
**Returns**: Configured LlamaIndexRAGService instance.
**Note**: By default, uses local embeddings (sentence-transformers) which require no API keys.
## StatisticalAnalyzer
**Module**: `src.services.statistical_analyzer`
**Purpose**: Secure execution of AI-generated statistical code.
### Methods
#### `analyze`
```python
async def analyze(
self,
query: str,
evidence: list[Evidence],
hypothesis: dict[str, Any] | None = None
) -> AnalysisResult
```
Analyzes a research question using statistical methods.
**Parameters**:
- `query`: The research question
- `evidence`: List of Evidence objects to analyze
- `hypothesis`: Optional hypothesis dict with `drug`, `target`, `pathway`, `effect`, `confidence` keys
**Returns**: `AnalysisResult` with:
- `verdict`: SUPPORTED, REFUTED, or INCONCLUSIVE
- `confidence`: Confidence in verdict (0.0-1.0)
- `statistical_evidence`: Summary of statistical findings
- `code_generated`: Python code that was executed
- `execution_output`: Output from code execution
- `key_takeaways`: Key takeaways from analysis
- `limitations`: List of limitations
**Note**: Requires Modal credentials for sandbox execution.
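A typical consumer branches on the verdict and confidence; the dataclass below is a hypothetical stand-in mirroring a few of the documented fields, not the real `AnalysisResult` model:

```python
from dataclasses import dataclass, field


@dataclass
class FakeAnalysisResult:
    """Stand-in echoing a subset of the documented AnalysisResult fields."""

    verdict: str        # SUPPORTED, REFUTED, or INCONCLUSIVE
    confidence: float   # 0.0-1.0
    limitations: list[str] = field(default_factory=list)


result = FakeAnalysisResult(verdict="SUPPORTED", confidence=0.8)

# Treat low-confidence verdicts as inconclusive before acting on them.
effective = result.verdict if result.confidence >= 0.5 else "INCONCLUSIVE"
print(effective)  # SUPPORTED
```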
## See Also
- [Architecture - Services](../architecture/services.md) - Architecture overview
- [Configuration](../configuration/index.md) - Service configuration