Services API Reference
This page documents the API for DeepCritical services.
EmbeddingService
Module: src.services.embeddings
Purpose: Local sentence-transformers for semantic search and deduplication.
Methods
embed
async def embed(self, text: str) -> list[float]
Generates embedding for a text string.
Parameters:
text: Text to embed
Returns: Embedding vector as list of floats.
embed_batch
async def embed_batch(self, texts: list[str]) -> list[list[float]]
Generates embeddings for multiple texts.
Parameters:
texts: List of texts to embed
Returns: List of embedding vectors.
similarity
async def similarity(self, text1: str, text2: str) -> float
Calculates similarity between two texts.
Parameters:
text1: First text
text2: Second text
Returns: Similarity score (0.0-1.0).
find_duplicates
async def find_duplicates(
self,
texts: list[str],
threshold: float = 0.85
) -> list[tuple[int, int]]
Finds duplicate texts based on similarity threshold.
Parameters:
texts: List of texts to check
threshold: Similarity threshold (default: 0.85)
Returns: List of (index1, index2) tuples for duplicate pairs.
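Taken together, similarity and find_duplicates amount to thresholded pairwise cosine similarity over embeddings. A minimal self-contained sketch of that logic (raw vectors stand in for model embeddings; this is an illustration, not the service's actual implementation):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def find_duplicate_pairs(
    vectors: list[list[float]], threshold: float = 0.85
) -> list[tuple[int, int]]:
    """Return every (i, j) pair, i < j, whose similarity meets the threshold."""
    return [
        (i, j)
        for i in range(len(vectors))
        for j in range(i + 1, len(vectors))
        if cosine(vectors[i], vectors[j]) >= threshold
    ]
```

With real embeddings the vectors come from the model, but the thresholding step is the same.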
add_evidence
async def add_evidence(
self,
evidence_id: str,
content: str,
metadata: dict[str, Any]
) -> None
Adds evidence to vector store for semantic search.
Parameters:
evidence_id: Unique identifier for the evidence
content: Evidence text content
metadata: Additional metadata dictionary
search_similar
async def search_similar(
self,
query: str,
n_results: int = 5
) -> list[dict[str, Any]]
Finds semantically similar evidence.
Parameters:
query: Search query string
n_results: Number of results to return (default: 5)
Returns: List of dictionaries with id, content, metadata, and distance keys.
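The return shape can be illustrated with a toy in-memory store. Here `toy_search`, its store, and the example records are hypothetical stand-ins for the real vector-store lookup; distance is modeled as 1 − cosine similarity, so lower means closer:

```python
import math

def _cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical in-memory stand-in for the vector store.
STORE = [
    {"id": "ev1", "content": "aspirin inhibits COX-1",
     "metadata": {"source": "pubmed"}, "vector": [1.0, 0.0]},
    {"id": "ev2", "content": "ibuprofen inhibits COX-2",
     "metadata": {"source": "pubmed"}, "vector": [0.9, 0.4]},
    {"id": "ev3", "content": "unrelated finding",
     "metadata": {"source": "web"}, "vector": [0.0, 1.0]},
]

def toy_search(query_vector: list[float], n_results: int = 5) -> list[dict]:
    """Rank stored evidence by distance (lower = closer) and return the top n."""
    results = [
        {"id": e["id"], "content": e["content"], "metadata": e["metadata"],
         "distance": 1.0 - _cosine(query_vector, e["vector"])}
        for e in STORE
    ]
    results.sort(key=lambda r: r["distance"])
    return results[:n_results]
```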
deduplicate
async def deduplicate(
self,
new_evidence: list[Evidence],
threshold: float = 0.9
) -> list[Evidence]
Removes semantically duplicate evidence.
Parameters:
new_evidence: List of evidence items to deduplicate
threshold: Similarity threshold (default: 0.9; items that are at least 90% similar count as duplicates)
Returns: List of unique evidence items (not already in vector store).
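The filtering semantics can be sketched with plain strings: a new item is kept only if nothing already stored (or already kept in this batch) crosses the threshold. Token-overlap (Jaccard) similarity stands in for embedding similarity here, purely for illustration:

```python
def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; a stand-in for embedding similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def dedupe(new_items: list[str], existing: list[str], threshold: float = 0.9) -> list[str]:
    """Keep only items whose similarity to every stored item stays below the threshold."""
    unique: list[str] = []
    for item in new_items:
        if all(jaccard(item, seen) < threshold for seen in existing + unique):
            unique.append(item)
    return unique
```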
Factory Function
get_embedding_service
@lru_cache(maxsize=1)
def get_embedding_service() -> EmbeddingService
Returns singleton EmbeddingService instance.
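The `@lru_cache(maxsize=1)` pattern makes a zero-argument factory return the same instance on every call, so the (expensive) model load happens once per process. A minimal sketch with a stand-in service class:

```python
from functools import lru_cache

class _StubService:
    """Stand-in for EmbeddingService; constructing the real one loads a model."""

@lru_cache(maxsize=1)
def get_service() -> _StubService:
    # First call constructs the instance; later calls return the cached one.
    return _StubService()
```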
LlamaIndexRAGService
Module: src.services.rag
Purpose: Retrieval-Augmented Generation using LlamaIndex.
Methods
ingest_evidence
def ingest_evidence(self, evidence_list: list[Evidence]) -> None
Ingests evidence into RAG service.
Parameters:
evidence_list: List of Evidence objects to ingest
Note: Supports multiple embedding providers (OpenAI, local sentence-transformers, Hugging Face).
retrieve
def retrieve(
self,
query: str,
top_k: int | None = None
) -> list[dict[str, Any]]
Retrieves relevant documents for a query.
Parameters:
query: Search query string
top_k: Number of top results to return (defaults to similarity_top_k from the constructor)
Returns: List of dictionaries with text, score, and metadata keys.
query
def query(
self,
query_str: str,
top_k: int | None = None
) -> str
Queries RAG service and returns synthesized response.
Parameters:
query_str: Query string
top_k: Number of results to use (defaults to similarity_top_k from the constructor)
Returns: Synthesized response string.
Raises:
ConfigurationError: If no LLM API key is available for query synthesis
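Callers can guard the synthesis step with a credential check along these lines. Both the `ConfigurationError` class and the `OPENAI_API_KEY` variable name are illustrative stand-ins, not the project's actual definitions:

```python
import os

class ConfigurationError(Exception):
    """Stand-in for the project's configuration exception."""

def require_llm_key() -> str:
    """Return an LLM API key or raise; the env var name here is an assumption."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise ConfigurationError("No LLM API key available for query synthesis")
    return key
```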
ingest_documents
def ingest_documents(self, documents: list[Any]) -> None
Ingests raw LlamaIndex Documents.
Parameters:
documents: List of LlamaIndex Document objects
clear_collection
def clear_collection(self) -> None
Clears all documents from the collection.
Factory Function
get_rag_service
def get_rag_service(
collection_name: str = "deepcritical_evidence",
oauth_token: str | None = None,
**kwargs: Any
) -> LlamaIndexRAGService
Gets or creates a RAG service instance.
Parameters:
collection_name: Name of the ChromaDB collection (default: "deepcritical_evidence")
oauth_token: Optional OAuth token from HuggingFace login (takes priority over environment variables)
**kwargs: Additional arguments for LlamaIndexRAGService (e.g., use_openai_embeddings=False)
Returns: Configured LlamaIndexRAGService instance.
Note: By default, uses local embeddings (sentence-transformers) which require no API keys.
StatisticalAnalyzer
Module: src.services.statistical_analyzer
Purpose: Secure execution of AI-generated statistical code.
Methods
analyze
async def analyze(
self,
query: str,
evidence: list[Evidence],
hypothesis: dict[str, Any] | None = None
) -> AnalysisResult
Analyzes a research question using statistical methods.
Parameters:
query: The research question
evidence: List of Evidence objects to analyze
hypothesis: Optional hypothesis dict with drug, target, pathway, effect, and confidence keys
Returns: AnalysisResult with:
verdict: SUPPORTED, REFUTED, or INCONCLUSIVE
confidence: Confidence in the verdict (0.0-1.0)
statistical_evidence: Summary of statistical findings
code_generated: Python code that was executed
execution_output: Output from code execution
key_takeaways: Key takeaways from the analysis
limitations: List of limitations
Note: Requires Modal credentials for sandbox execution.
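The documented fields can be mirrored with a dataclass; this sketch is illustrative of the result shape only (the real AnalysisResult may define extra validation or methods):

```python
from dataclasses import dataclass, field

@dataclass
class AnalysisResultSketch:
    """Illustrative mirror of the documented AnalysisResult fields."""
    verdict: str                  # "SUPPORTED", "REFUTED", or "INCONCLUSIVE"
    confidence: float             # 0.0-1.0
    statistical_evidence: str
    code_generated: str
    execution_output: str
    key_takeaways: list[str] = field(default_factory=list)
    limitations: list[str] = field(default_factory=list)

    def is_conclusive(self) -> bool:
        # INCONCLUSIVE verdicts signal the analysis could not decide either way.
        return self.verdict in ("SUPPORTED", "REFUTED")
```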
See Also
- Architecture - Services - Architecture overview
- Configuration - Service configuration