Services API Reference
This page documents the API for DeepCritical services.
EmbeddingService
Module: src.services.embeddings
Purpose: Local sentence-transformers for semantic search and deduplication.
Methods
embed
async def embed(self, text: str) -> list[float]
Generates embedding for a text string.
Parameters:
text: Text to embed
Returns: Embedding vector as list of floats.
embed_batch
async def embed_batch(self, texts: list[str]) -> list[list[float]]
Generates embeddings for multiple texts.
Parameters:
texts: List of texts to embed
Returns: List of embedding vectors.
similarity
async def similarity(self, text1: str, text2: str) -> float
Calculates similarity between two texts.
Parameters:
text1: First text
text2: Second text
Returns: Similarity score (0.0-1.0).
find_duplicates
async def find_duplicates(
self,
texts: list[str],
threshold: float = 0.85
) -> list[tuple[int, int]]
Finds duplicate texts based on similarity threshold.
Parameters:
texts: List of texts to check
threshold: Similarity threshold (default: 0.85)
Returns: List of (index1, index2) tuples for duplicate pairs.
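Taken together, similarity and find_duplicates amount to thresholded pairwise cosine similarity over embeddings. A minimal self-contained sketch of that logic (raw vectors stand in for model embeddings; this is an illustration, not the service's actual implementation):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def find_duplicate_pairs(
    vectors: list[list[float]], threshold: float = 0.85
) -> list[tuple[int, int]]:
    """Return every (i, j) pair, i < j, whose similarity meets the threshold."""
    return [
        (i, j)
        for i in range(len(vectors))
        for j in range(i + 1, len(vectors))
        if cosine(vectors[i], vectors[j]) >= threshold
    ]
```

With real embeddings the vectors come from the model, but the thresholding step is the same.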
add_evidence
async def add_evidence(
self,
evidence_id: str,
content: str,
metadata: dict[str, Any]
) -> None
Adds evidence to vector store for semantic search.
Parameters:
evidence_id: Unique identifier for the evidence
content: Evidence text content
metadata: Additional metadata dictionary
search_similar
async def search_similar(
self,
query: str,
n_results: int = 5
) -> list[dict[str, Any]]
Finds semantically similar evidence.
Parameters:
query: Search query string
n_results: Number of results to return (default: 5)
Returns: List of dictionaries with id, content, metadata, and distance keys.
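The return shape can be illustrated with a toy in-memory store. Here `toy_search`, its store, and the example records are hypothetical stand-ins for the real vector-store lookup; distance is modeled as 1 − cosine similarity, so lower means closer:

```python
import math

def _cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical in-memory stand-in for the vector store.
STORE = [
    {"id": "ev1", "content": "aspirin inhibits COX-1",
     "metadata": {"source": "pubmed"}, "vector": [1.0, 0.0]},
    {"id": "ev2", "content": "ibuprofen inhibits COX-2",
     "metadata": {"source": "pubmed"}, "vector": [0.9, 0.4]},
    {"id": "ev3", "content": "unrelated finding",
     "metadata": {"source": "web"}, "vector": [0.0, 1.0]},
]

def toy_search(query_vector: list[float], n_results: int = 5) -> list[dict]:
    """Rank stored evidence by distance (lower = closer) and return the top n."""
    results = [
        {"id": e["id"], "content": e["content"], "metadata": e["metadata"],
         "distance": 1.0 - _cosine(query_vector, e["vector"])}
        for e in STORE
    ]
    results.sort(key=lambda r: r["distance"])
    return results[:n_results]
```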
deduplicate
async def deduplicate(
self,
new_evidence: list[Evidence],
threshold: float = 0.9
) -> list[Evidence]
Removes semantically duplicate evidence.
Parameters:
new_evidence: List of evidence items to deduplicate
threshold: Similarity threshold (default: 0.9; items that are at least 90% similar count as duplicates)
Returns: List of unique evidence items (not already in vector store).
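The filtering semantics can be sketched with plain strings: a new item is kept only if nothing already stored (or already kept in this batch) crosses the threshold. Token-overlap (Jaccard) similarity stands in for embedding similarity here, purely for illustration:

```python
def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; a stand-in for embedding similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def dedupe(new_items: list[str], existing: list[str], threshold: float = 0.9) -> list[str]:
    """Keep only items whose similarity to every stored item stays below the threshold."""
    unique: list[str] = []
    for item in new_items:
        if all(jaccard(item, seen) < threshold for seen in existing + unique):
            unique.append(item)
    return unique
```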
Factory Function
get_embedding_service
@lru_cache(maxsize=1)
def get_embedding_service() -> EmbeddingService
Returns singleton EmbeddingService instance.
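The `@lru_cache(maxsize=1)` pattern makes a zero-argument factory return the same instance on every call, so the (expensive) model load happens once per process. A minimal sketch with a stand-in service class:

```python
from functools import lru_cache

class _StubService:
    """Stand-in for EmbeddingService; constructing the real one loads a model."""

@lru_cache(maxsize=1)
def get_service() -> _StubService:
    # First call constructs the instance; later calls return the cached one.
    return _StubService()
```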
LlamaIndexRAGService
Module: src.services.rag
Purpose: Retrieval-Augmented Generation using LlamaIndex.
Methods
ingest_evidence
def ingest_evidence(self, evidence_list: list[Evidence]) -> None
Ingests evidence into RAG service.
Parameters:
evidence_list: List of Evidence objects to ingest
Note: Supports multiple embedding providers (OpenAI, local sentence-transformers, Hugging Face).
retrieve
def retrieve(
self,
query: str,
top_k: int | None = None
) -> list[dict[str, Any]]
Retrieves relevant documents for a query.
Parameters:
query: Search query string
top_k: Number of top results to return (defaults to similarity_top_k from the constructor)
Returns: List of dictionaries with text, score, and metadata keys.
query
def query(
self,
query_str: str,
top_k: int | None = None
) -> str
Queries RAG service and returns synthesized response.
Parameters:
query_str: Query string
top_k: Number of results to use (defaults to similarity_top_k from the constructor)
Returns: Synthesized response string.
Raises:
ConfigurationError: If no LLM API key is available for query synthesis
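Callers can guard the synthesis step with a credential check along these lines. Both the `ConfigurationError` class and the `OPENAI_API_KEY` variable name are illustrative stand-ins, not the project's actual definitions:

```python
import os

class ConfigurationError(Exception):
    """Stand-in for the project's configuration exception."""

def require_llm_key() -> str:
    """Return an LLM API key or raise; the env var name here is an assumption."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise ConfigurationError("No LLM API key available for query synthesis")
    return key
```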
ingest_documents
def ingest_documents(self, documents: list[Any]) -> None
Ingests raw LlamaIndex Documents.
Parameters:
documents: List of LlamaIndex Document objects
clear_collection
def clear_collection(self) -> None
Clears all documents from the collection.
Factory Function
get_rag_service
def get_rag_service(
collection_name: str = "deepcritical_evidence",
oauth_token: str | None = None,
**kwargs: Any
) -> LlamaIndexRAGService
Gets or creates a RAG service instance.
Parameters:
collection_name: Name of the ChromaDB collection (default: "deepcritical_evidence")
oauth_token: Optional OAuth token from HuggingFace login (takes priority over environment variables)
**kwargs: Additional arguments for LlamaIndexRAGService (e.g., use_openai_embeddings=False)
Returns: Configured LlamaIndexRAGService instance.
Note: By default, uses local embeddings (sentence-transformers) which require no API keys.
StatisticalAnalyzer
Module: src.services.statistical_analyzer
Purpose: Secure execution of AI-generated statistical code.
Methods
analyze
async def analyze(
self,
query: str,
evidence: list[Evidence],
hypothesis: dict[str, Any] | None = None
) -> AnalysisResult
Analyzes a research question using statistical methods.
Parameters:
query: The research question
evidence: List of Evidence objects to analyze
hypothesis: Optional hypothesis dict with drug, target, pathway, effect, and confidence keys
Returns: AnalysisResult with:
verdict: SUPPORTED, REFUTED, or INCONCLUSIVE
confidence: Confidence in the verdict (0.0-1.0)
statistical_evidence: Summary of statistical findings
code_generated: Python code that was executed
execution_output: Output from code execution
key_takeaways: Key takeaways from the analysis
limitations: List of limitations
Note: Requires Modal credentials for sandbox execution.
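The documented fields can be mirrored with a dataclass; this sketch is illustrative of the result shape only (the real AnalysisResult may define extra validation or methods):

```python
from dataclasses import dataclass, field

@dataclass
class AnalysisResultSketch:
    """Illustrative mirror of the documented AnalysisResult fields."""
    verdict: str                  # "SUPPORTED", "REFUTED", or "INCONCLUSIVE"
    confidence: float             # 0.0-1.0
    statistical_evidence: str
    code_generated: str
    execution_output: str
    key_takeaways: list[str] = field(default_factory=list)
    limitations: list[str] = field(default_factory=list)

    def is_conclusive(self) -> bool:
        # INCONCLUSIVE verdicts signal the analysis could not decide either way.
        return self.verdict in ("SUPPORTED", "REFUTED")
```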
See Also
- Architecture - Services - Architecture overview
- Configuration - Service configuration