metadata
title: Articles Search Engine
emoji: π
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.45.0
app_file: frontend/app.py
python_version: '3.12'
pinned: false
Articles Search Engine
A compact, production-style RAG pipeline. It ingests RSS articles from Substack, Medium, and other top publications, stores them in Postgres (Supabase), indexes dense and sparse embeddings in Qdrant, and exposes search and answer endpoints via FastAPI with a simple Gradio UI.
How it works (brief)
- Ingest RSS → Supabase:
  - Prefect flow (`src/pipelines/flows/rss_ingestion_flow.py`) reads feeds from `src/configs/feeds_rss.yaml`, parses articles, and writes them to Postgres using SQLAlchemy models.
- Embed + index in Qdrant:
  - Content is chunked, embedded (e.g., BAAI bge models), and upserted to a Qdrant collection with payload indexes for filtering and hybrid search.
  - Collection and indexes are created via utilities in `src/infrastructure/qdrant/`.
- Search + generate:
  - FastAPI (`src/api/main.py`) exposes search endpoints (keyword, semantic, hybrid) and assembles answers with citations.
  - LLM providers are pluggable with fallback (OpenRouter, OpenAI, Hugging Face).
  - Opik is used for evaluation.
- UI + deploy:
  - Gradio app for quick local search (`frontend/app.py`).
  - Containerization with Docker and optional deploy to Google Cloud Run.
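The provider fallback mentioned under "Search + generate" can be pictured roughly as follows. This is a minimal sketch, not the project's actual code: the `Provider` signature, `complete_with_fallback`, and the two stand-in providers are hypothetical names introduced for illustration, and a real implementation would call the OpenRouter, OpenAI, or Hugging Face clients instead.

```python
from typing import Callable, Sequence

# Hypothetical provider signature: takes a prompt, returns the completion text.
Provider = Callable[[str], str]


def complete_with_fallback(
    prompt: str, providers: Sequence[tuple[str, Provider]]
) -> tuple[str, str]:
    """Try each (name, provider) pair in order.

    Returns (provider_name, answer) from the first provider that succeeds,
    or raises RuntimeError listing every failure.
    """
    errors: list[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real client would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stand-in providers for illustration only; they do not call any real API.
def flaky_provider(prompt: str) -> str:
    raise TimeoutError("upstream timed out")


def echo_provider(prompt: str) -> str:
    return f"answer to: {prompt}"


used, answer = complete_with_fallback(
    "What is hybrid search?",
    [("openrouter", flaky_provider), ("openai", echo_provider)],
)
print(used, "->", answer)
```

Keeping providers as an ordered list of plain callables means the fallback order is just the list order, which could be driven by whichever API keys are configured.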
Tech stack
- Python 3.12, FastAPI, Prefect, SQLAlchemy
- Supabase (Postgres) for articles
- Qdrant for vector search (dense + sparse/hybrid)
- OpenRouter / OpenAI / Hugging Face for LLM completion, Opik for LLM Evaluation
- Gradio UI, Docker, Google Cloud Run
- Config via Pydantic Settings; `uv` or `pip` for deps
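Hybrid search over dense and sparse vectors needs some way to merge two ranked result lists. One common technique is reciprocal rank fusion (RRF). The README does not say which fusion method this project uses, so the sketch below is an assumption for illustration only; the document IDs and the constant `k = 60` are likewise illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked ID lists; each list contributes 1 / (k + rank) per document."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_hits = ["a", "b", "c"]   # hypothetical IDs from the dense (semantic) search
sparse_hits = ["c", "a", "d"]  # hypothetical IDs from the sparse (keyword) search
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

Documents that appear near the top of both lists rise to the front, while documents found by only one retriever are still kept further down.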
Run locally (minimal)
- Configure environment (either `.env` or shell). Key variables (Pydantic nested with `__`):
  - Supabase: `SUPABASE_DB__HOST`, `SUPABASE_DB__PORT`, `SUPABASE_DB__NAME`, `SUPABASE_DB__USER`, `SUPABASE_DB__PASSWORD`
  - Qdrant: `QDRANT__URL`, `QDRANT__API_KEY`
  - LLM (choose one): `OPENROUTER__API_KEY` or `OPENAI__API_KEY` or `HUGGING_FACE__API_KEY`
  - Optional CORS: `ALLOWED_ORIGINS`
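To make the `__` naming convention concrete, here is a stdlib-only sketch that groups such variables by prefix and reports which ones are missing. It illustrates the convention and is not how Pydantic Settings is implemented. The helper names `nest_env` and `missing_vars` are hypothetical, and treating every listed variable as required is an assumption.

```python
from typing import Mapping

# Variable names come from the list above; treating all of them as required
# is an assumption made for this sketch.
REQUIRED = [
    "SUPABASE_DB__HOST", "SUPABASE_DB__PORT", "SUPABASE_DB__NAME",
    "SUPABASE_DB__USER", "SUPABASE_DB__PASSWORD",
    "QDRANT__URL", "QDRANT__API_KEY",
]
LLM_KEYS = ["OPENROUTER__API_KEY", "OPENAI__API_KEY", "HUGGING_FACE__API_KEY"]


def nest_env(env: Mapping[str, str], delimiter: str = "__") -> dict[str, dict[str, str]]:
    """Group PREFIX__FIELD variables into {prefix: {field: value}}, lowercased."""
    nested: dict[str, dict[str, str]] = {}
    for key, value in env.items():
        prefix, sep, field = key.partition(delimiter)
        if prefix and sep and field:
            nested.setdefault(prefix.lower(), {})[field.lower()] = value
    return nested


def missing_vars(env: Mapping[str, str]) -> list[str]:
    """Return required names that are unset; at least one LLM key must be present."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if not any(env.get(name) for name in LLM_KEYS):
        missing.append(" or ".join(LLM_KEYS))
    return missing


# Sample values for illustration; pass os.environ to check a real shell.
sample_env = {
    "SUPABASE_DB__HOST": "localhost",
    "QDRANT__URL": "http://localhost:6333",
}
print(nest_env(sample_env))
print(missing_vars(sample_env))
```

Running `missing_vars(os.environ)` before starting the API is a cheap way to catch a misnamed variable, such as a single underscore where the double underscore is expected.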
- Install dependencies:
# with uv
uv venv && source .venv/bin/activate
uv pip install -r requirements.txt
# or with pip
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
- Initialize storage:
python src/infrastructure/supabase/create_db.py
python src/infrastructure/qdrant/create_collection.py
python src/infrastructure/qdrant/create_indexes.py
- Ingest and embed:
python src/pipelines/flows/rss_ingestion_flow.py
python src/pipelines/flows/embeddings_ingestion_flow.py
- Start services:
# REST API
uvicorn src.api.main:app --reload
# Gradio UI (optional)
python frontend/app.py
Project structure (high-level)
- `src/api/`: FastAPI app, routes, middleware, exceptions
- `src/infrastructure/supabase/`: DB init and sessions
- `src/infrastructure/qdrant/`: Vector store and collection utilities
- `src/pipelines/`: Prefect flows and tasks for ingestion/embeddings
- `src/models/`: SQL and vector models
- `frontend/`: Gradio UI
- `configs/`: RSS feeds config