---
title: GeoQuery
emoji: 🌍
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
license: mit
app_port: 7860
---
# GeoQuery 🌍🤖
**Territorial Intelligence Platform** - Natural language interface for geospatial data analysis powered by LLMs and DuckDB Spatial.
![Status](https://img.shields.io/badge/Status-Active-success) ![Python](https://img.shields.io/badge/Python-3.11+-blue) ![Next.js](https://img.shields.io/badge/Next.js-15-black) ![License](https://img.shields.io/badge/License-MIT-green)
---
## ✨ What is GeoQuery?
GeoQuery transforms geographic data analysis by combining **Large Language Models** with **spatial databases**. Simply ask questions in natural language and get instant maps, charts, and insights.
**Example**: *"Show me hospitals in Panama City"* → Interactive map with 45 hospital locations, automatically styled with 🏥 icons.
### Key Capabilities
- 🗣️ **Conversational Queries** - Natural language instead of SQL or GIS interfaces
- 🗺️ **Auto-Visualization** - Smart choropleth maps, point markers, and heatmaps
- 📊 **Dynamic Charts** - Automatic bar, pie, and line chart generation
- 🔍 **Semantic Discovery** - Finds relevant datasets from 100+ options using AI embeddings
- 🧩 **Multi-Step Analysis** - Complex queries automatically decomposed and executed
- 💡 **Thinking Transparency** - See the LLM's reasoning process in real-time
- 🎨 **Custom Point Styles** - Icon markers for POI, circle points for large datasets
---
## 🎬 Quick Demo
### Try These Queries
| Query | What You Get |
|-------|--------------|
| "Show me all provinces colored by area" | Choropleth map with size-based gradient |
| "Where are the universities?" | Point map with 🎓 icons |
| "Compare hospital count vs school count by province" | Multi-step analysis with side-by-side bar charts |
| "Show intersections in David as circle points" | 1,288 traffic intersections as simple colored circles |
| "Population density in Veraguas" | H3 hexagon heatmap (33K cells) |
---
## 🏗️ Architecture
```
┌─────────────────────────────────────────────────────────────┐
│                     Frontend (Next.js)                      │
│        Chat Interface │ Leaflet Maps │ Data Explorer        │
└────────────────────────┬────────────────────────────────────┘
                         │ (SSE Streaming)
┌────────────────────────┴────────────────────────────────────┐
│                      Backend (FastAPI)                      │
│ Intent Detection → Semantic Search   →   SQL Generation     │
│        ↓                  ↓                    ↓            │
│    Gemini LLM  DataCatalog (Embeddings)  DuckDB Spatial     │
└─────────────────────────────────────────────────────────────┘
```
The backend discovers relevant datasets dynamically through semantic embeddings, then answers each query with LLM-generated spatial SQL.
📖 **[Detailed Architecture](ARCHITECTURE.md)**
---
## 🚀 Quick Start
### Prerequisites
- **Python 3.11+**
- **Node.js 18+**
- **Google AI API Key** ([Get one free](https://aistudio.google.com/app/apikey))
### Installation
```bash
# 1. Clone repository
git clone https://github.com/GerardCB/GeoQuery.git
cd GeoQuery
# 2. Backend setup
cd backend
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -e .
# 3. Configure API key
export GEMINI_API_KEY="your-api-key-here"
# 4. Start backend
uvicorn backend.main:app --reload --host 0.0.0.0 --port 8000
# 5. Frontend setup (new terminal)
cd frontend
npm install
npm run dev
```
### 🎉 Done!
Open **http://localhost:3000** and start querying!
📘 **[Detailed Setup Guide](SETUP.md)**
---
## 📂 Project Structure
```
GeoQuery/
├── backend/
│   ├── api/                        # FastAPI endpoints
│   │   └── endpoints/              # /chat, /catalog, /schema
│   ├── core/                       # Core services
│   │   ├── llm_gateway.py          # Gemini API integration
│   │   ├── geo_engine.py           # DuckDB Spatial wrapper
│   │   ├── semantic_search.py      # Embedding-based discovery
│   │   ├── data_catalog.py         # Dataset metadata management
│   │   ├── query_planner.py        # Multi-step query orchestration
│   │   └── prompts.py              # LLM system instructions
│   ├── services/                   # Business logic
│   │   ├── executor.py             # Query pipeline orchestrator
│   │   └── response_formatter.py   # GeoJSON/chart formatting
│   ├── data/                       # Datasets and metadata
│   │   ├── catalog.json            # Dataset registry
│   │   ├── embeddings.npy          # Vector embeddings
│   │   ├── osm/                    # OpenStreetMap data
│   │   ├── admin/                  # Administrative boundaries
│   │   ├── global/                 # Global datasets (Kontur, etc.)
│   │   └── socioeconomic/          # World Bank, poverty data
│   └── scripts/                    # Data ingestion scripts
│       ├── download_geofabrik.py
│       ├── download_hdx_panama.py
│       └── stri_catalog_scraper.py
├── frontend/
│   └── src/
│       ├── app/                    # Next.js App Router pages
│       └── components/
│           ├── ChatPanel.tsx       # Chat interface with SSE
│           ├── MapViewer.tsx       # Leaflet map with layers
│           └── DataExplorer.tsx    # Tabular data view
└── docs/                           # Technical documentation
    ├── backend/                    # Backend deep-dives
    ├── frontend/                   # Frontend architecture
    └── data/                       # Data system docs
```
---
## 🔧 Technology Stack
| Layer | Technology | Purpose |
|-------|-----------|---------|
| **LLM** | Google Gemini 2.0 | Intent detection, SQL generation, explanations |
| **Backend** | Python 3.11 + FastAPI | Async HTTP server with SSE streaming |
| **Database** | DuckDB with Spatial | In-memory spatial analytics |
| **Frontend** | Next.js 15 + React 18 | Server-side rendering + interactive UI |
| **Maps** | Leaflet 1.9 | Interactive web maps |
| **Embeddings** | sentence-transformers | Semantic dataset search |
| **Data** | GeoJSON + Parquet | Standardized geospatial formats |
---
## 📊 Available Datasets
GeoQuery currently includes 100+ datasets across multiple categories:
### Administrative
- Panama provinces, districts, corregimientos (HDX 2021)
- Comarca boundaries
- Electoral districts
### Infrastructure
- Roads and highways (OpenStreetMap)
- Hospitals and health facilities (986 locations)
- Universities and schools (200+ institutions)
- Airports, ports, power plants
### Socioeconomic
- World Bank development indicators
- Multidimensional poverty index (MPI)
- Population density (Kontur H3 hexagons - 33K cells)
### Natural Environment
- Protected areas (STRI GIS Portal)
- Forest cover and land use
- Rivers and water bodies
📖 **[Full Dataset List](docs/data/DATASET_SOURCES.md)** | **[Adding New Data](docs/backend/SCRIPTS.md)**
---
## 💡 How It Works
1. **User Query**: "Show me hospitals in Panama City"
2. **Intent Detection**: LLM classifies as MAP_REQUEST
3. **Semantic Search**: Finds `panama_healthsites_geojson` via embeddings
4. **SQL Generation**: LLM creates: `SELECT name, geom FROM panama_healthsites_geojson WHERE ST_Intersects(geom, (SELECT geom FROM pan_admin2 WHERE adm2_name = 'Panamá'))`
5. **Execution**: DuckDB Spatial runs query → 45 features
6. **Visualization**: Auto-styled map with 🏥 icons
7. **Explanation**: LLM streams natural language summary
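The semantic search in step 3 can be illustrated with a small cosine-similarity ranking. This is only a sketch: the three-dimensional vectors are toy values standing in for real sentence embeddings, `panama_universities` is a hypothetical dataset name, and `rank_datasets` is not a function from the GeoQuery codebase.

```python
import math

# Toy 3-dimensional vectors standing in for real sentence embeddings.
# "panama_universities" is a hypothetical dataset name used for illustration.
CATALOG_EMBEDDINGS = {
    "panama_healthsites_geojson": [0.9, 0.1, 0.0],
    "panama_universities": [0.1, 0.9, 0.0],
    "pan_admin2": [0.0, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_datasets(query_vector, catalog, top_k=2):
    """Score every dataset against the query and keep the best top_k."""
    scored = [
        (name, cosine_similarity(query_vector, vector))
        for name, vector in catalog.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Pretend embedding of "Show me hospitals in Panama City".
query_vector = [0.8, 0.2, 0.1]
best = rank_datasets(query_vector, CATALOG_EMBEDDINGS)
print(best[0][0])
```

With these toy vectors the health sites dataset ranks first, which is the table the generated SQL in step 4 then queries.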
**Streaming**: See the LLM's thinking process in real-time via Server-Sent Events.
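For readers unfamiliar with Server-Sent Events, the sketch below parses a raw SSE payload into `(event, data)` pairs. The field syntax (`event:`, `data:`, blank line as terminator) follows the SSE wire format, but the event names `thinking` and `result` and the payload contents are assumptions for illustration, not GeoQuery's documented protocol.

```python
def parse_sse(stream_text):
    """Split a Server-Sent Events payload into (event, data) tuples.

    An event is emitted when a blank line is reached. A trailing event
    that is not terminated by a blank line is dropped.
    """
    events = []
    event_name, data_lines = "message", []
    for line in stream_text.splitlines():
        if line == "":
            if data_lines:
                events.append((event_name, "\n".join(data_lines)))
            event_name, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # A single leading space after the colon is not part of the value.
            data_lines.append(value[1:] if value.startswith(" ") else value)
    return events


# Hypothetical stream: event names and payloads are made up for this example.
sample = (
    "event: thinking\n"
    "data: Looking for health datasets\n"
    "\n"
    "event: result\n"
    'data: {"features": 45}\n'
    "\n"
)
events = parse_sse(sample)
for name, data in events:
    print(name, data)
```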
📖 **[Detailed Data Flow](docs/DATA_FLOW.md)** | **[LLM Integration](docs/backend/LLM_INTEGRATION.md)**
---
## 🗺️ Advanced Features
### Choropleth Maps
Automatically detects numeric columns and creates color gradients:
- **Linear scale**: For area, count
- **Logarithmic scale**: For population, density
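To see why skewed columns such as population benefit from a logarithmic scale, here is a minimal sketch that normalizes values to the 0 to 1 range of a color ramp. The function name `normalize` and the sample values are hypothetical, and GeoQuery's actual styling code may differ.

```python
import math


def normalize(values, scale="linear"):
    """Map values to [0, 1] so they can index into a color gradient."""
    if scale == "log":
        if min(values) <= 0:
            raise ValueError("log scale needs strictly positive values")
        values = [math.log10(v) for v in values]
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values equal: put every feature in the middle of the ramp.
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up population figures spanning three orders of magnitude.
populations = [1_000, 10_000, 1_000_000]
linear = normalize(populations, "linear")
logarithmic = normalize(populations, "log")
print(linear)
print(logarithmic)
```

On the linear scale the two smaller values land almost on the same color, while the logarithmic scale spreads them across the ramp.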
### Point Visualization Modes
- **Icon markers** 🏥🎓⛪: For categorical POI (<500 points)
- **Circle points** ⭕: For large datasets like intersections (>500 points)
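The choice between the two modes can be sketched as a simple threshold rule with an optional user override (as in "Show intersections in David as circle points"). The names below are hypothetical, and treating exactly 500 points as the circle case is an assumption, since the thresholds above leave that boundary open.

```python
ICON_POINT_LIMIT = 500  # threshold taken from the list above


def choose_point_style(feature_count, requested_style=None):
    """Pick "icon" or "circle" for a point layer.

    An explicit request from the user wins; otherwise small layers get
    icon markers and large ones get plain circles.
    """
    if requested_style in ("icon", "circle"):
        return requested_style
    # Assumption: a layer with exactly 500 points falls back to circles.
    return "icon" if feature_count < ICON_POINT_LIMIT else "circle"


print(choose_point_style(45))            # hospitals in Panama City
print(choose_point_style(1288))          # intersections in David
print(choose_point_style(45, "circle"))  # explicit user request
```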
### Spatial Operations
- Intersection: "Find hospitals within protected areas"
- Difference: "Show me areas outside national parks"
- Buffer: "Show 5km radius around hospitals"
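The queries below sketch what the generated SQL for these three operations might look like in DuckDB Spatial. Table and column names (`hospitals`, `protected_areas`, `national_parks`, `provinces`, `adm1_name`) are hypothetical, the SQL the LLM actually produces will vary, and the buffer example assumes data stored as longitude/latitude in EPSG:4326 and reprojected to UTM zone 17N (EPSG:32617) so that the distance is in meters. Check function signatures against the DuckDB Spatial documentation before relying on them.

```python
# Hypothetical SQL templates; table and column names are illustrative only.
SPATIAL_SQL = {
    # "Find hospitals within protected areas"
    "intersection": """
        SELECT h.name, h.geom
        FROM hospitals AS h
        JOIN protected_areas AS p
          ON ST_Intersects(h.geom, p.geom)
    """,
    # "Show me areas outside national parks"
    "difference": """
        SELECT a.adm1_name,
               ST_Difference(
                   a.geom,
                   (SELECT ST_Union_Agg(geom) FROM national_parks)
               ) AS geom
        FROM provinces AS a
    """,
    # "Show 5km radius around hospitals": buffer in a metric CRS,
    # then project back to longitude/latitude for the map.
    "buffer": """
        SELECT name,
               ST_Transform(
                   ST_Buffer(
                       ST_Transform(geom, 'EPSG:4326', 'EPSG:32617', always_xy := true),
                       5000
                   ),
                   'EPSG:32617', 'EPSG:4326', always_xy := true
               ) AS geom
        FROM hospitals
    """,
}


def spatial_sql(operation):
    """Return the SQL template for an operation, stripped of padding."""
    return SPATIAL_SQL[operation].strip()


print(spatial_sql("intersection"))
```

Buffering directly on EPSG:4326 geometries would interpret the distance in degrees, which is why the sketch reprojects first.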
### Multi-Step Queries
Complex questions are decomposed automatically into simpler steps. For example:
- "Compare population density with hospital coverage by province"
  1. Calculate population per province
  2. Count hospitals per province
  3. Compute ratios
  4. Generate comparison chart
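Steps 1 to 3 of such a plan amount to joining two per-province results and deriving a ratio. The sketch below shows that combination step only. The function name, the "hospitals per 100,000 residents" metric, and the sample figures are all hypothetical and not taken from GeoQuery's data or code.

```python
def hospitals_per_100k(population_by_province, hospitals_by_province):
    """Join two per-province results and compute hospitals per 100,000 people."""
    rows = []
    for province, population in population_by_province.items():
        hospitals = hospitals_by_province.get(province, 0)
        # Guard against provinces with no recorded population.
        ratio = hospitals * 100_000 / population if population else None
        rows.append(
            {
                "province": province,
                "population": population,
                "hospitals": hospitals,
                "hospitals_per_100k": ratio,
            }
        )
    # Best-covered provinces first, ready to feed a bar chart.
    rows.sort(key=lambda row: row["hospitals_per_100k"] or 0, reverse=True)
    return rows


# Made-up intermediate results standing in for steps 1 and 2.
population = {"Province A": 200_000, "Province B": 50_000}
hospitals = {"Province A": 10, "Province B": 5}
comparison = hospitals_per_100k(population, hospitals)
for row in comparison:
    print(row["province"], row["hospitals_per_100k"])
```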
---
## 📚 Documentation
| Document | Description |
|----------|-------------|
| **[ARCHITECTURE.md](ARCHITECTURE.md)** | System design, components, decisions |
| **[SETUP.md](SETUP.md)** | Development environment setup |
| **[docs/backend/CORE_SERVICES.md](docs/backend/CORE_SERVICES.md)** | Backend services reference |
| **[docs/backend/API_ENDPOINTS.md](docs/backend/API_ENDPOINTS.md)** | API endpoint documentation |
| **[docs/frontend/COMPONENTS.md](docs/frontend/COMPONENTS.md)** | React component architecture |
| **[docs/DATA_FLOW.md](docs/DATA_FLOW.md)** | End-to-end request walkthrough |
---
## 📄 License
MIT License - see **[LICENSE](LICENSE)** for details.
---
## 🙏 Acknowledgments
**Data Sources**:
- [OpenStreetMap](https://www.openstreetmap.org/) - Infrastructure and POI data
- [Humanitarian Data Exchange (HDX)](https://data.humdata.org/) - Administrative boundaries
- [World Bank Open Data](https://data.worldbank.org/) - Socioeconomic indicators
- [Kontur Population Dataset](https://data.humdata.org/organization/kontur) - H3 population grid
- [STRI GIS Portal](https://stridata-si.opendata.arcgis.com/) - Environmental datasets
**Technologies**:
- [Google Gemini](https://ai.google.dev/) - LLM API
- [DuckDB](https://duckdb.org/) - Fast in-process analytics
- [Leaflet](https://leafletjs.com/) - Interactive maps
- [Next.js](https://nextjs.org/) - React framework