title: GeoQuery
emoji: 🌍
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
license: mit
app_port: 7860

GeoQuery

🌍🤖

Territorial Intelligence Platform - Natural language interface for geospatial data analysis powered by LLMs and DuckDB Spatial.



✨ What is GeoQuery?

GeoQuery transforms geographic data analysis by combining Large Language Models with spatial databases. Simply ask questions in natural language and get instant maps, charts, and insights.

Example: "Show me hospitals in Panama City" → Interactive map with 45 hospital locations, automatically styled with 🏥 icons.

Key Capabilities

  • 🗣️ Conversational Queries - Natural language instead of SQL or GIS interfaces
  • 🗺️ Auto-Visualization - Smart choropleth maps, point markers, and heatmaps
  • 📊 Dynamic Charts - Automatic bar, pie, and line chart generation
  • 🔍 Semantic Discovery - Finds relevant datasets from 100+ options using AI embeddings
  • 🧩 Multi-Step Analysis - Complex queries automatically decomposed and executed
  • 💡 Thinking Transparency - See the LLM's reasoning process in real-time
  • 🎨 Custom Point Styles - Icon markers for POI, circle points for large datasets

🎬 Quick Demo

Try These Queries

| Query | What You Get |
|-------|--------------|
| "Show me all provinces colored by area" | Choropleth map with size-based gradient |
| "Where are the universities?" | Point map with 🎓 icons |
| "Compare hospital count vs school count by province" | Multi-step analysis with side-by-side bar charts |
| "Show intersections in David as circle points" | 1,288 traffic intersections as simple colored circles |
| "Population density in Veraguas" | H3 hexagon heatmap (33K cells) |

🏗️ Architecture

┌──────────────────────────────────────────────────────────┐
│                    Frontend (Next.js)                    │
│   Chat Interface  │  Leaflet Maps  │  Data Explorer      │
└────────────────────────┬─────────────────────────────────┘
                         │ (SSE Streaming)
┌────────────────────────┴─────────────────────────────────┐
│                  Backend (FastAPI)                       │
│  Intent Detection → Semantic Search → SQL Generation     │
│         ↓                ↓                  ↓            │
│    Gemini LLM    DataCatalog (Embeddings) DuckDB Spatial │
└──────────────────────────────────────────────────────────┘

The backend discovers relevant datasets dynamically through semantic embeddings, then answers each query with LLM-generated spatial SQL.

📖 Detailed Architecture


🚀 Quick Start

Prerequisites

Installation

# 1. Clone repository
git clone https://github.com/GerardCB/GeoQuery.git
cd GeoQuery

# 2. Backend setup
cd backend
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -e .

# 3. Configure API key
export GEMINI_API_KEY="your-api-key-here"

# 4. Start backend
uvicorn backend.main:app --reload --host 0.0.0.0 --port 8000

# 5. Frontend setup (new terminal)
cd frontend
npm install
npm run dev

🎉 Done!

Open http://localhost:3000 and start querying!

📘 Detailed Setup Guide


📂 Project Structure

GeoQuery/
├── backend/
│   ├── api/                    # FastAPI endpoints
│   │   └── endpoints/          # /chat, /catalog, /schema
│   ├── core/                   # Core services
│   │   ├── llm_gateway.py      # Gemini API integration
│   │   ├── geo_engine.py       # DuckDB Spatial wrapper
│   │   ├── semantic_search.py  # Embedding-based discovery
│   │   ├── data_catalog.py     # Dataset metadata management
│   │   ├── query_planner.py    # Multi-step query orchestration
│   │   └── prompts.py          # LLM system instructions
│   ├── services/               # Business logic
│   │   ├── executor.py         # Query pipeline orchestrator
│   │   └── response_formatter.py # GeoJSON/chart formatting
│   ├── data/                   # Datasets and metadata
│   │   ├── catalog.json        # Dataset registry
│   │   ├── embeddings.npy      # Vector embeddings
│   │   ├── osm/                # OpenStreetMap data
│   │   ├── admin/              # Administrative boundaries
│   │   ├── global/             # Global datasets (Kontur, etc.)
│   │   └── socioeconomic/      # World Bank, poverty data
│   └── scripts/                # Data ingestion scripts
│       ├── download_geofabrik.py
│       ├── download_hdx_panama.py
│       └── stri_catalog_scraper.py
├── frontend/
│   └── src/
│       ├── app/                # Next.js App Router pages
│       └── components/
│           ├── ChatPanel.tsx   # Chat interface with SSE
│           ├── MapViewer.tsx   # Leaflet map with layers
│           └── DataExplorer.tsx # Tabular data view
└── docs/                       # Technical documentation
    ├── backend/                # Backend deep-dives
    ├── frontend/               # Frontend architecture
    └── data/                   # Data system docs

🔧 Technology Stack

| Layer | Technology | Purpose |
|-------|------------|---------|
| LLM | Google Gemini 2.0 | Intent detection, SQL generation, explanations |
| Backend | Python 3.11 + FastAPI | Async HTTP server with SSE streaming |
| Database | DuckDB with Spatial | In-memory spatial analytics |
| Frontend | Next.js 15 + React 18 | Server-side rendering + interactive UI |
| Maps | Leaflet 1.9 | Interactive web maps |
| Embeddings | sentence-transformers | Semantic dataset search |
| Data | GeoJSON + Parquet | Standardized geospatial formats |

📊 Available Datasets

GeoQuery currently includes 100+ datasets across multiple categories:

Administrative

  • Panama provinces, districts, corregimientos (HDX 2021)
  • Comarca boundaries
  • Electoral districts

Infrastructure

  • Roads and highways (OpenStreetMap)
  • Hospitals and health facilities (986 locations)
  • Universities and schools (200+ institutions)
  • Airports, ports, power plants

Socioeconomic

  • World Bank development indicators
  • Multidimensional poverty index (MPI)
  • Population density (Kontur H3 hexagons - 33K cells)

Natural Environment

  • Protected areas (STRI GIS Portal)
  • Forest cover and land use
  • Rivers and water bodies

📖 Full Dataset List | Adding New Data


💡 How It Works

  1. User Query: "Show me hospitals in Panama City"
  2. Intent Detection: LLM classifies as MAP_REQUEST
  3. Semantic Search: Finds panama_healthsites_geojson via embeddings
  4. SQL Generation: LLM creates: SELECT name, geom FROM panama_healthsites_geojson WHERE ST_Intersects(geom, (SELECT geom FROM pan_admin2 WHERE adm2_name = 'Panamá'))
  5. Execution: DuckDB Spatial runs query → 45 features
  6. Visualization: Auto-styled map with 🏥 icons
  7. Explanation: LLM streams natural language summary
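Step 3 can be pictured as a nearest-neighbour lookup over dataset embeddings. The sketch below is illustrative only and is not GeoQuery's actual code: the toy 3-dimensional vectors and the `cosine_similarity` and `rank_datasets` helpers are assumptions, standing in for the sentence-transformers embeddings the project stores in `embeddings.npy`.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def rank_datasets(query_vec, catalog_vecs, top_k=3):
    """Return up to top_k dataset names, most similar to the query first."""
    scored = [
        (cosine_similarity(query_vec, vec), name)
        for name, vec in catalog_vecs.items()
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]


# Toy vectors standing in for real embeddings (hypothetical values).
catalog = {
    "panama_healthsites_geojson": [0.9, 0.1, 0.0],
    "pan_admin2": [0.1, 0.9, 0.1],
}
query = [0.8, 0.2, 0.0]  # pretend embedding of "hospitals in Panama City"

print(rank_datasets(query, catalog, top_k=1))
```

With real embeddings the same ranking would normally be done as a single matrix product rather than a Python loop.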

Streaming: See the LLM's thinking process in real-time via Server-Sent Events.
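On the wire, a Server-Sent Events stream is plain text: `event:` and `data:` fields, with a blank line closing each event. A minimal parser might look like the sketch below. The event names `thought` and `result` and their payloads are hypothetical; the actual event schema of the `/chat` endpoint is not described in this README.

```python
def parse_sse(stream_text):
    """Split a Server-Sent Events payload into (event, data) tuples."""
    events = []
    event_name, data_lines = "message", []
    for line in stream_text.splitlines():
        if line == "":
            # A blank line dispatches the event collected so far.
            if data_lines:
                events.append((event_name, "\n".join(data_lines)))
            event_name, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        elif line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # Only a single leading space is stripped from the value.
            if value.startswith(" "):
                value = value[1:]
            data_lines.append(value)
    # An event without its closing blank line is incomplete and is dropped.
    return events


# Hypothetical stream for illustration.
sample = (
    "event: thought\n"
    "data: Looking for health datasets\n"
    "\n"
    "event: result\n"
    'data: {"features": 45}\n'
    "\n"
)

for name, data in parse_sse(sample):
    print(name, data)
```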

📖 Detailed Data Flow | LLM Integration


🗺️ Advanced Features

Choropleth Maps

Automatically detects numeric columns and creates color gradients:

  • Linear scale: For area, count
  • Logarithmic scale: For population, density
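To see why skewed values such as population benefit from a logarithmic scale, here is a small sketch of both normalizations. The `normalize` and `blend` helpers and the two RGB endpoints are assumptions made for illustration, not GeoQuery's styling code.

```python
import math


def normalize(value, vmin, vmax, scale="linear"):
    """Map value in [vmin, vmax] to a 0..1 position on a color ramp."""
    if scale == "log":
        # A log scale needs strictly positive inputs.
        value, vmin, vmax = math.log10(value), math.log10(vmin), math.log10(vmax)
    if vmax == vmin:
        return 0.0
    t = (value - vmin) / (vmax - vmin)
    return min(1.0, max(0.0, t))  # clamp out-of-range values


def blend(t, start=(239, 243, 255), end=(8, 81, 156)):
    """Linearly interpolate between two RGB colors (light to dark blue)."""
    return tuple(round(s + (e - s) * t) for s, e in zip(start, end))


# A value of 100 on a 1..10,000 range:
linear_t = normalize(100, 1, 10_000)        # close to 0, almost the lightest color
log_t = normalize(100, 1, 10_000, "log")    # halfway along the ramp
print(blend(linear_t), blend(log_t))
```

On the linear scale nearly every feature would fall into the lightest shades, while the logarithmic scale spreads the same values across the ramp.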

Point Visualization Modes

  • Icon markers 🏥🎓⛪: For categorical POI (<500 points)
  • Circle points ⭕: For large datasets like intersections (>500 points)
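The choice between the two modes comes down to a feature-count threshold. A hedged sketch follows: the `point_style` function is hypothetical, and treating exactly 500 features as circles is an assumption, since the list above leaves that case open.

```python
ICON_LIMIT = 500  # threshold taken from the list above


def point_style(feature_count, limit=ICON_LIMIT):
    """Pick a rendering mode from the number of point features."""
    # Assumption: exactly `limit` features falls on the circle side.
    return "icon" if feature_count < limit else "circle"


print(point_style(45))     # a few dozen hospitals
print(point_style(1288))   # the intersections example
```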

Spatial Operations

  • Intersection: "Find hospitals within protected areas"
  • Difference: "Show me areas outside national parks"
  • Buffer: "Show 5km radius around hospitals"

Multi-Step Queries

Complex questions are automatically decomposed into steps:

  • "Compare population density with hospital coverage by province"
    1. Calculate population per province
    2. Count hospitals per province
    3. Compute ratios
    4. Generate comparison chart
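Once the per-province results of steps 1 and 2 are available, step 3 is a simple join on the province name. The sketch below assumes both results arrive as plain dictionaries; the `hospitals_per_100k` helper, the province names, and the numbers are made up for illustration.

```python
def hospitals_per_100k(population, hospital_counts):
    """Combine two per-province results into hospitals per 100,000 residents."""
    ratios = {}
    for province, people in population.items():
        if people <= 0:
            continue  # skip provinces with no usable population figure
        hospitals = hospital_counts.get(province, 0)
        ratios[province] = round(hospitals / people * 100_000, 2)
    return ratios


# Made-up numbers, not real data.
population = {"Province A": 200_000, "Province B": 50_000}
hospital_counts = {"Province A": 10, "Province B": 5}

print(hospitals_per_100k(population, hospital_counts))
```

The resulting mapping is the kind of input a comparison chart in step 4 would consume.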

📚 Documentation

| Document | Description |
|----------|-------------|
| ARCHITECTURE.md | System design, components, decisions |
| SETUP.md | Development environment setup |
| docs/backend/CORE_SERVICES.md | Backend services reference |
| docs/backend/API_ENDPOINTS.md | API endpoint documentation |
| docs/frontend/COMPONENTS.md | React component architecture |
| docs/DATA_FLOW.md | End-to-end request walkthrough |

📄 License

MIT License - see LICENSE for details.


🙏 Acknowledgments

Data Sources:

Technologies: