title: GeoQuery
emoji: π
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
license: mit
app_port: 7860
GeoQuery
Territorial Intelligence Platform - Natural language interface for geospatial data analysis powered by LLMs and DuckDB Spatial.
✨ What is GeoQuery?
GeoQuery transforms geographic data analysis by combining Large Language Models with spatial databases. Simply ask questions in natural language and get instant maps, charts, and insights.
Example: "Show me hospitals in Panama City" → Interactive map with 45 hospital locations, automatically styled with 🏥 icons.
Key Capabilities
- 🗣️ Conversational Queries - Natural language instead of SQL or GIS interfaces
- 🗺️ Auto-Visualization - Smart choropleth maps, point markers, and heatmaps
- Dynamic Charts - Automatic bar, pie, and line chart generation
- Semantic Discovery - Finds relevant datasets from 100+ options using AI embeddings
- 🧩 Multi-Step Analysis - Complex queries are automatically decomposed and executed
- 💡 Thinking Transparency - See the LLM's reasoning process in real time
- 🎨 Custom Point Styles - Icon markers for POI, circle points for large datasets
🎬 Quick Demo
Try These Queries
| Query | What You Get |
|---|---|
| "Show me all provinces colored by area" | Choropleth map with size-based gradient |
| "Where are the universities?" | Point map with icon markers |
| "Compare hospital count vs school count by province" | Multi-step analysis with side-by-side bar charts |
| "Show intersections in David as circle points" | 1,288 traffic intersections as simple colored circles |
| "Population density in Veraguas" | H3 hexagon heatmap (33K cells) |
🏗️ Architecture
```text
┌──────────────────────────────────────────────────────────┐
│                    Frontend (Next.js)                    │
│      Chat Interface │ Leaflet Maps │ Data Explorer       │
└────────────────────────┬─────────────────────────────────┘
                         │ (SSE Streaming)
┌────────────────────────┴─────────────────────────────────┐
│                    Backend (FastAPI)                     │
│   Intent Detection → Semantic Search → SQL Generation    │
│          ↓                  ↓                ↓           │
│   Gemini LLM   DataCatalog (Embeddings)  DuckDB Spatial  │
└──────────────────────────────────────────────────────────┘
```
It supports dynamic dataset discovery via semantic embeddings + LLM-generated spatial SQL.
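As a rough illustration of embedding-based dataset discovery, the sketch below ranks catalog entries by cosine similarity to a query vector. It is not the project's implementation: the three-dimensional vectors, the `hypothetical_roads` entry, and the `rank_datasets` helper are made up for the example, whereas the real system works with sentence-transformers embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_datasets(query_vec, catalog, top_k=2):
    """Return the names of the top_k datasets most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in catalog.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]

# Toy 3-dimensional vectors, invented for illustration only.
TOY_CATALOG = {
    "panama_healthsites_geojson": [0.9, 0.1, 0.0],
    "pan_admin2": [0.1, 0.9, 0.1],
    "hypothetical_roads": [0.0, 0.2, 0.9],
}

query = [0.8, 0.2, 0.1]  # stand-in for an embedded "hospitals" question
print(rank_datasets(query, TOY_CATALOG))
```

Only the best-matching table names would then be handed to the LLM for SQL generation, which keeps the prompt small even with 100+ datasets in the catalog.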
Quick Start
Prerequisites
- Python 3.11+
- Node.js 18+
- Google AI API key (free to obtain)
Installation
```bash
# 1. Clone repository
git clone https://github.com/GerardCB/GeoQuery.git
cd GeoQuery

# 2. Backend setup
cd backend
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -e .

# 3. Configure API key
export GEMINI_API_KEY="your-api-key-here"

# 4. Start backend
uvicorn backend.main:app --reload --host 0.0.0.0 --port 8000

# 5. Frontend setup (new terminal)
cd frontend
npm install
npm run dev
```
Done!
Open http://localhost:3000 and start querying!
For the detailed setup guide, see SETUP.md.
Project Structure
```text
GeoQuery/
├── backend/
│   ├── api/                      # FastAPI endpoints
│   │   └── endpoints/            # /chat, /catalog, /schema
│   ├── core/                     # Core services
│   │   ├── llm_gateway.py        # Gemini API integration
│   │   ├── geo_engine.py         # DuckDB Spatial wrapper
│   │   ├── semantic_search.py    # Embedding-based discovery
│   │   ├── data_catalog.py       # Dataset metadata management
│   │   ├── query_planner.py      # Multi-step query orchestration
│   │   └── prompts.py            # LLM system instructions
│   ├── services/                 # Business logic
│   │   ├── executor.py           # Query pipeline orchestrator
│   │   └── response_formatter.py # GeoJSON/chart formatting
│   ├── data/                     # Datasets and metadata
│   │   ├── catalog.json          # Dataset registry
│   │   ├── embeddings.npy        # Vector embeddings
│   │   ├── osm/                  # OpenStreetMap data
│   │   ├── admin/                # Administrative boundaries
│   │   ├── global/               # Global datasets (Kontur, etc.)
│   │   └── socioeconomic/        # World Bank, poverty data
│   └── scripts/                  # Data ingestion scripts
│       ├── download_geofabrik.py
│       ├── download_hdx_panama.py
│       └── stri_catalog_scraper.py
├── frontend/
│   └── src/
│       ├── app/                  # Next.js App Router pages
│       └── components/
│           ├── ChatPanel.tsx     # Chat interface with SSE
│           ├── MapViewer.tsx     # Leaflet map with layers
│           └── DataExplorer.tsx  # Tabular data view
└── docs/                         # Technical documentation
    ├── backend/                  # Backend deep-dives
    ├── frontend/                 # Frontend architecture
    └── data/                     # Data system docs
```
🔧 Technology Stack
| Layer | Technology | Purpose |
|---|---|---|
| LLM | Google Gemini 2.0 | Intent detection, SQL generation, explanations |
| Backend | Python 3.11 + FastAPI | Async HTTP server with SSE streaming |
| Database | DuckDB with Spatial | In-memory spatial analytics |
| Frontend | Next.js 15 + React 18 | Server-side rendering + interactive UI |
| Maps | Leaflet 1.9 | Interactive web maps |
| Embeddings | sentence-transformers | Semantic dataset search |
| Data | GeoJSON + Parquet | Standardized geospatial formats |
Available Datasets
GeoQuery currently includes 100+ datasets across multiple categories:
Administrative
- Panama provinces, districts, corregimientos (HDX 2021)
- Comarca boundaries
- Electoral districts
Infrastructure
- Roads and highways (OpenStreetMap)
- Hospitals and health facilities (986 locations)
- Universities and schools (200+ institutions)
- Airports, ports, power plants
Socioeconomic
- World Bank development indicators
- Multidimensional poverty index (MPI)
- Population density (Kontur H3 hexagons - 33K cells)
Natural Environment
- Protected areas (STRI GIS Portal)
- Forest cover and land use
- Rivers and water bodies
Full Dataset List | Adding New Data
💡 How It Works
- User Query: "Show me hospitals in Panama City"
- Intent Detection: LLM classifies as MAP_REQUEST
- Semantic Search: Finds `panama_healthsites_geojson` via embeddings
- SQL Generation: LLM creates: `SELECT name, geom FROM panama_healthsites_geojson WHERE ST_Intersects(geom, (SELECT geom FROM pan_admin2 WHERE adm2_name = 'Panamá'))`
- Execution: DuckDB Spatial runs query → 45 features
- Visualization: Auto-styled map with 🏥 icons
- Explanation: LLM streams natural language summary
Streaming: See the LLM's thinking process in real time via Server-Sent Events.
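To make the streaming step more concrete, here is a minimal sketch of how reasoning chunks and a final answer could be framed as Server-Sent Events. The event names `thought` and `answer`, the payload shape, and both helper functions are assumptions for illustration; the actual backend may structure its stream differently.

```python
import json

def format_sse(event, data):
    """Serialize one Server-Sent Events message: an event name plus a JSON payload."""
    payload = json.dumps(data, ensure_ascii=False)
    # A blank line terminates each SSE message.
    return f"event: {event}\ndata: {payload}\n\n"

def stream_answer(thoughts, answer):
    """Yield reasoning chunks first, then the final answer, as SSE frames."""
    for chunk in thoughts:
        yield format_sse("thought", {"text": chunk})
    yield format_sse("answer", {"text": answer})

frames = list(stream_answer(["Looking for a hospital dataset"], "Found 45 hospitals"))
print("".join(frames))
```

A generator like `stream_answer` is the kind of object a streaming HTTP response can iterate over, so the browser receives each frame as soon as it is produced.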
Detailed Data Flow | LLM Integration
🗺️ Advanced Features
Choropleth Maps
Automatically detects numeric columns and creates color gradients:
- Linear scale: For area, count
- Logarithmic scale: For population, density
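The difference between the two scales can be sketched as follows. This is an illustrative example, not GeoQuery's styling code: the `normalize` and `to_hex` helpers, the color endpoints, and the sample values are all assumptions.

```python
import math

def normalize(values, scale="linear"):
    """Map numbers onto [0, 1] using a linear or logarithmic scale."""
    if scale == "log":
        # A log scale requires strictly positive inputs.
        values = [math.log10(v) for v in values]
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def to_hex(t, start=(239, 243, 255), end=(8, 81, 156)):
    """Linearly interpolate between two RGB colors and return a hex string."""
    r, g, b = (round(s + (e - s) * t) for s, e in zip(start, end))
    return f"#{r:02x}{g:02x}{b:02x}"

populations = [1_000, 10_000, 100_000]  # invented values
print([to_hex(t) for t in normalize(populations, scale="linear")])
print([to_hex(t) for t in normalize(populations, scale="log")])
```

With skewed data such as population, the linear scale pushes most features toward the light end of the gradient, while the logarithmic scale spreads them more evenly.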
Point Visualization Modes
- Icon markers (e.g. 🏥, ⛪): For categorical POI (<500 points)
- Circle points (●): For large datasets like intersections (>500 points)
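The choice between the two modes comes down to a size threshold, which might be expressed like this. The function name and the handling of exactly 500 points are assumptions, since the limits above leave that boundary case open.

```python
ICON_THRESHOLD = 500  # taken from the limits quoted above

def choose_point_style(feature_count, threshold=ICON_THRESHOLD):
    """Pick a rendering mode from the number of point features."""
    # Assumption: a layer with exactly `threshold` points still gets icons.
    return "icon" if feature_count <= threshold else "circle"

print(choose_point_style(45))    # hospitals in the example query
print(choose_point_style(1288))  # intersections in David
```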
Spatial Operations
- Intersection: "Find hospitals within protected areas"
- Difference: "Show me areas outside national parks"
- Buffer: "Show 5km radius around hospitals"
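In GeoQuery these operations run as LLM-generated spatial SQL inside DuckDB. Purely to illustrate what a "5km radius" test means, the sketch below uses a plain-Python haversine distance check as a stand-in for a database buffer operation. The coordinates and helper names are invented for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def within_radius(center, points, radius_km=5.0):
    """Keep the points that fall inside a circular buffer around `center`."""
    return [p for p in points if haversine_km(center[0], center[1], p[0], p[1]) <= radius_km]

hospital = (8.98, -79.52)                      # made-up coordinates
candidates = [(8.99, -79.53), (9.35, -79.90)]  # one nearby, one far away
print(within_radius(hospital, candidates))
```

A spatial database performs the same kind of test on full geometries and can use indexes to avoid comparing every pair of features.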
Multi-Step Queries
Complex questions are automatically decomposed into steps. For example, "Compare population density with hospital coverage by province" becomes:
- Calculate population per province
- Count hospitals per province
- Compute ratios
- Generate comparison chart
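The decomposition above can be sketched as a small pipeline in which two independent per-province results are combined into a ratio and then ordered for charting. The province names, numbers, and the `people_per_hospital` helper are invented for illustration; in the real system each step would come from its own generated query.

```python
def people_per_hospital(population, hospitals):
    """Combine two per-province results into a ratio, skipping provinces with no hospitals."""
    ratios = {}
    for province, people in population.items():
        count = hospitals.get(province, 0)
        if count > 0:
            ratios[province] = people / count
    return ratios

# Steps 1 and 2 would each come from their own query; the values here are invented.
population = {"Province A": 120_000, "Province B": 45_000, "Province C": 8_000}
hospitals = {"Province A": 12, "Province B": 3}

# Step 3: compute ratios. Step 4: order the rows for a bar chart.
ratios = people_per_hospital(population, hospitals)
chart_rows = sorted(ratios.items(), key=lambda item: item[1], reverse=True)
print(chart_rows)
```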
Documentation
| Document | Description |
|---|---|
| ARCHITECTURE.md | System design, components, decisions |
| SETUP.md | Development environment setup |
| docs/backend/CORE_SERVICES.md | Backend services reference |
| docs/backend/API_ENDPOINTS.md | API endpoint documentation |
| docs/frontend/COMPONENTS.md | React component architecture |
| docs/DATA_FLOW.md | End-to-end request walkthrough |
License
MIT License - see LICENSE for details.
Acknowledgments
Data Sources:
- OpenStreetMap - Infrastructure and POI data
- Humanitarian Data Exchange (HDX) - Administrative boundaries
- World Bank Open Data - Socioeconomic indicators
- Kontur Population Dataset - H3 population grid
- STRI GIS Portal - Environmental datasets
Technologies:
- Google Gemini - LLM API
- DuckDB - Fast in-process analytics
- Leaflet - Interactive maps
- Next.js - React framework