---
title: GeoQuery
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
license: mit
app_port: 7860
---

# GeoQuery

**Territorial Intelligence Platform** - Natural language interface for geospatial data analysis, powered by LLMs and DuckDB Spatial.

---

## ✨ What is GeoQuery?

GeoQuery combines **Large Language Models** with **spatial databases** to make geographic data analysis conversational. Ask a question in natural language and get back maps, charts, and insights.

**Example**: *"Show me hospitals in Panama City"* → an interactive map of 45 hospital locations, automatically styled with 🏥 icons.

### Key Capabilities

- 🗣️ **Conversational Queries** - Natural language instead of SQL or GIS interfaces
- 🗺️ **Auto-Visualization** - Smart choropleth maps, point markers, and heatmaps
- **Dynamic Charts** - Automatic bar, pie, and line chart generation
- **Semantic Discovery** - Finds relevant datasets among 100+ options using AI embeddings
- 🧩 **Multi-Step Analysis** - Complex queries automatically decomposed and executed
- 💡 **Thinking Transparency** - See the LLM's reasoning process in real time
- 🎨 **Custom Point Styles** - Icon markers for POI, circle points for large datasets

---

## Quick Demo

### Try These Queries

| Query | What You Get |
|-------|--------------|
| "Show me all provinces colored by area" | Choropleth map with an area-based color gradient |
| "Where are the universities?" | Point map with icon markers |
| "Compare hospital count vs school count by province" | Multi-step analysis with side-by-side bar charts |
| "Show intersections in David as circle points" | 1,288 traffic intersections as simple colored circles |
| "Population density in Veraguas" | H3 hexagon heatmap (33K cells) |

---

## Architecture

```
┌──────────────────────────────────────────────────────────┐
│                    Frontend (Next.js)                    │
│      Chat Interface │ Leaflet Maps │ Data Explorer       │
└───────────────────────┬──────────────────────────────────┘
                        │ (SSE Streaming)
┌───────────────────────┴──────────────────────────────────┐
│                    Backend (FastAPI)                     │
│ Intent Detection → Semantic Search → SQL Generation      │
│       ↓                   ↓                  ↓           │
│  Gemini LLM   DataCatalog (Embeddings)   DuckDB Spatial  │
└──────────────────────────────────────────────────────────┘
```

GeoQuery discovers relevant datasets dynamically through semantic embeddings, then answers the question with LLM-generated spatial SQL.
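
Conceptually, embedding-based discovery is a nearest-neighbour lookup over the catalog. The sketch below is a hypothetical illustration, not the code in `semantic_search.py`: it assumes every dataset already has an embedding vector (GeoQuery stores these in `data/embeddings.npy` and lists sentence-transformers for semantic search) and ranks datasets by cosine similarity to the embedded question. The `rank_datasets` helper and the toy 3-dimensional vectors are invented for the example.

```python
from __future__ import annotations

import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_datasets(query_vec: list[float], catalog: dict[str, list[float]], top_k: int = 3):
    """Return the top_k (table_name, score) pairs, best match first.

    `catalog` maps a table name to its embedding (hypothetical structure).
    """
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in catalog.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy embeddings; real sentence-transformers vectors have hundreds of dimensions.
catalog = {
    "panama_healthsites_geojson": [0.9, 0.1, 0.0],
    "pan_admin2": [0.1, 0.9, 0.1],
    "osm_roads": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # stand-in for the embedding of "hospitals in Panama City"
print(rank_datasets(query, catalog, top_k=2))
```

With real embeddings this loop would usually be vectorized (for example with NumPy), but the ranking logic stays the same.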

**[Detailed Architecture](ARCHITECTURE.md)**

---

## Quick Start

### Prerequisites

- **Python 3.11+**
- **Node.js 18+**
- **Google AI API Key** ([Get one free](https://aistudio.google.com/app/apikey))

### Installation

```bash
# 1. Clone repository
git clone https://github.com/GerardCB/GeoQuery.git
cd GeoQuery

# 2. Backend setup
cd backend
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -e .

# 3. Configure API key
export GEMINI_API_KEY="your-api-key-here"

# 4. Start backend
uvicorn backend.main:app --reload --host 0.0.0.0 --port 8000

# 5. Frontend setup (new terminal, from the repository root)
cd frontend
npm install
npm run dev
```

### Done!

Open **http://localhost:3000** and start querying!

**[Detailed Setup Guide](SETUP.md)**

---

## Project Structure

```
GeoQuery/
├── backend/
│   ├── api/                      # FastAPI endpoints
│   │   └── endpoints/            # /chat, /catalog, /schema
│   ├── core/                     # Core services
│   │   ├── llm_gateway.py        # Gemini API integration
│   │   ├── geo_engine.py         # DuckDB Spatial wrapper
│   │   ├── semantic_search.py    # Embedding-based discovery
│   │   ├── data_catalog.py       # Dataset metadata management
│   │   ├── query_planner.py      # Multi-step query orchestration
│   │   └── prompts.py            # LLM system instructions
│   ├── services/                 # Business logic
│   │   ├── executor.py           # Query pipeline orchestrator
│   │   └── response_formatter.py # GeoJSON/chart formatting
│   ├── data/                     # Datasets and metadata
│   │   ├── catalog.json          # Dataset registry
│   │   ├── embeddings.npy        # Vector embeddings
│   │   ├── osm/                  # OpenStreetMap data
│   │   ├── admin/                # Administrative boundaries
│   │   ├── global/               # Global datasets (Kontur, etc.)
│   │   └── socioeconomic/        # World Bank, poverty data
│   └── scripts/                  # Data ingestion scripts
│       ├── download_geofabrik.py
│       ├── download_hdx_panama.py
│       └── stri_catalog_scraper.py
├── frontend/
│   └── src/
│       ├── app/                  # Next.js App Router pages
│       └── components/
│           ├── ChatPanel.tsx     # Chat interface with SSE
│           ├── MapViewer.tsx     # Leaflet map with layers
│           └── DataExplorer.tsx  # Tabular data view
└── docs/                         # Technical documentation
    ├── backend/                  # Backend deep-dives
    ├── frontend/                 # Frontend architecture
    └── data/                     # Data system docs
```

---

## 🔧 Technology Stack

| Layer | Technology | Purpose |
|-------|-----------|---------|
| **LLM** | Google Gemini 2.0 | Intent detection, SQL generation, explanations |
| **Backend** | Python 3.11 + FastAPI | Async HTTP server with SSE streaming |
| **Database** | DuckDB with Spatial | In-memory spatial analytics |
| **Frontend** | Next.js 15 + React 18 | Server-side rendering + interactive UI |
| **Maps** | Leaflet 1.9 | Interactive web maps |
| **Embeddings** | sentence-transformers | Semantic dataset search |
| **Data** | GeoJSON + Parquet | Standardized geospatial formats |

---

## Available Datasets

GeoQuery currently includes 100+ datasets across multiple categories:

### Administrative

- Panama provinces, districts, corregimientos (HDX 2021)
- Comarca boundaries
- Electoral districts

### Infrastructure

- Roads and highways (OpenStreetMap)
- Hospitals and health facilities (986 locations)
- Universities and schools (200+ institutions)
- Airports, ports, power plants

### Socioeconomic

- World Bank development indicators
- Multidimensional poverty index (MPI)
- Population density (Kontur H3 hexagons - 33K cells)

### Natural Environment

- Protected areas (STRI GIS Portal)
- Forest cover and land use
- Rivers and water bodies

**[Full Dataset List](docs/data/DATASET_SOURCES.md)** | **[Adding New Data](docs/backend/SCRIPTS.md)**

---

## 💡 How It Works

1. **User Query**: "Show me hospitals in Panama City"
2. **Intent Detection**: The LLM classifies the request as `MAP_REQUEST`
3. **Semantic Search**: Embeddings identify `panama_healthsites_geojson` as the relevant dataset
4. **SQL Generation**: The LLM writes `SELECT name, geom FROM panama_healthsites_geojson WHERE ST_Intersects(geom, (SELECT geom FROM pan_admin2 WHERE adm2_name = 'Panamá'))`
5. **Execution**: DuckDB Spatial runs the query → 45 features
6. **Visualization**: Auto-styled map with 🏥 icons
7. **Explanation**: The LLM streams a natural language summary
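
Steps 2 through 5 form a straight chain of stages. The following minimal sketch shows only that control flow; it is not GeoQuery's `services/executor.py`. Every function below is a hypothetical stub standing in for the Gemini, DataCatalog, and DuckDB calls (a real system asks the LLM for the intent rather than matching keywords), and the `OTHER` intent label is a placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class QueryResult:
    intent: str
    table: str
    sql: str
    rows: list = field(default_factory=list)


# --- Hypothetical stubs standing in for Gemini, the DataCatalog, and DuckDB ---
def detect_intent(question: str) -> str:
    return "MAP_REQUEST" if "show" in question.lower() else "OTHER"


def find_dataset(question: str) -> str:
    return "panama_healthsites_geojson"


def generate_sql(question: str, table: str) -> str:
    return f"SELECT name, geom FROM {table}"


def execute_sql(sql: str) -> list:
    return [{"name": "Hospital A"}, {"name": "Hospital B"}]


def run_pipeline(question: str) -> QueryResult:
    """Intent detection -> dataset search -> SQL generation -> execution."""
    intent = detect_intent(question)
    table = find_dataset(question)
    sql = generate_sql(question, table)
    rows = execute_sql(sql)
    return QueryResult(intent=intent, table=table, sql=sql, rows=rows)


result = run_pipeline("Show me hospitals in Panama City")
print(result.intent, result.table, len(result.rows))
```

Keeping each stage a separate function makes it easy to swap one (for example the SQL generator) without touching the others; visualization and explanation would consume `QueryResult` downstream.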

**Streaming**: The LLM's thinking process is streamed to the browser in real time via Server-Sent Events (SSE).
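
Server-Sent Events is a plain-text protocol: the server writes `field: value` lines, and a blank line ends each event. As a hedged illustration of what a client receives, here is a minimal parser following the core rules of the SSE format (`event:` and `data:` fields, `:` comment lines, multi-line data joined with newlines); it ignores `id:` and `retry:`. It is not GeoQuery's frontend code, and the `thinking`/`answer` event names in the usage example are invented.

```python
from __future__ import annotations

from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (event_type, data) pairs from an iterable of SSE text lines."""
    event_type, data_lines = "", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the pending event, if it carries any data.
            if data_lines:
                yield event_type or "message", "\n".join(data_lines)
            event_type, data_lines = "", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        name, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space after the colon is dropped
        if name == "event":
            event_type = value
        elif name == "data":
            data_lines.append(value)
        # id:, retry:, and unknown fields are ignored in this sketch.
    # An event not terminated by a blank line is discarded, as the format requires.


stream = [
    "event: thinking\n",
    "data: Searching the catalog...\n",
    "\n",
    "event: answer\n",
    "data: Found 45 hospitals\n",
    "\n",
]
for event, data in parse_sse(stream):
    print(event, "->", data)
```

Events without an `event:` line default to the type `message`, which is why a simple stream needs nothing but `data:` lines.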

**[Detailed Data Flow](docs/DATA_FLOW.md)** | **[LLM Integration](docs/backend/LLM_INTEGRATION.md)**

---

## 🗺️ Advanced Features

### Choropleth Maps

Numeric columns are detected automatically and rendered as color gradients:

- **Linear scale**: For area and counts
- **Logarithmic scale**: For skewed values such as population and density
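
To see why the scale choice matters: population-like values span several orders of magnitude, so equal-width bins on a linear scale put almost every region in the lowest class. A minimal sketch (not GeoQuery's styling code; the five-class binning and the `class_index` name are assumptions) that assigns each value a color-class index on either scale:

```python
import math


def class_index(value, vmin, vmax, n_classes=5, log=False):
    """Map value to a class 0..n_classes-1 using equal-width bins.

    With log=True the bins are equal-width in log10 space, which spreads out
    skewed data such as population or density (all values must be > 0).
    """
    if log:
        value, vmin, vmax = math.log10(value), math.log10(vmin), math.log10(vmax)
    if vmax == vmin:
        return 0
    fraction = (value - vmin) / (vmax - vmin)
    return min(int(fraction * n_classes), n_classes - 1)


populations = [1_000, 10_000, 100_000, 1_000_000]
low, high = min(populations), max(populations)
linear = [class_index(p, low, high) for p in populations]
logarithmic = [class_index(p, low, high, log=True) for p in populations]
print("linear:", linear)  # the three smallest values all land in class 0
print("log:   ", logarithmic)
```

Only the binning differs between the two scales; mapping each class index to an actual color ramp is left out.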

### Point Visualization Modes

- **Icon markers** 🏥 ⛪: For categorical POI (<500 points)
- **Circle points**: For large datasets like intersections (>500 points)
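
The point-style rule above amounts to a threshold on feature count. A sketch under stated assumptions: the 500 cut-off comes from this README, but the `choose_point_style` function, the style dictionaries, the circle radius, and the choice to treat exactly 500 points as a large dataset are illustrative, not taken from the codebase.

```python
from __future__ import annotations

ICON_THRESHOLD = 500  # cut-off from this README; handling of exactly 500 is assumed


def choose_point_style(feature_count: int, category: str | None = None) -> dict:
    """Return a hypothetical style spec: icons for small POI sets, circles otherwise."""
    if category is not None and feature_count < ICON_THRESHOLD:
        return {"mode": "icon", "category": category}
    # Large or uncategorized layers fall back to lightweight circle markers.
    return {"mode": "circle", "radius": 4}  # radius in pixels, arbitrary choice


print(choose_point_style(45, category="hospital"))
print(choose_point_style(1288))
```

Circle markers are cheaper for Leaflet to draw than per-feature icon markers, which is presumably why large layers such as the 1,288 intersections use them.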

### Spatial Operations

- Intersection: "Find hospitals within protected areas"
- Difference: "Show me areas outside national parks"
- Buffer: "Show 5km radius around hospitals"
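
In DuckDB Spatial these requests become SQL with functions such as `ST_Intersects`, `ST_Difference`, and `ST_Buffer`. One subtlety with a "5km radius" is units: on longitude/latitude data a planar buffer distance is measured in degrees, not metres. As an illustration of the metric question only (a swapped-in technique, not how GeoQuery computes buffers), the sketch below finds which points lie within 5 km of a hospital using the haversine great-circle distance; the `within_radius` helper and all coordinates are made up.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(points, centers, radius_km=5.0):
    """Names of points lying within radius_km of at least one center."""
    return [
        name
        for name, (lon, lat) in points.items()
        if any(haversine_km(lon, lat, clon, clat) <= radius_km for clon, clat in centers)
    ]


hospitals = [(-79.52, 8.98)]  # hypothetical hospital location near Panama City
schools = {"school_a": (-79.50, 8.99), "school_b": (-79.80, 9.30)}
print(within_radius(schools, hospitals))
```

For point features, "inside a 5 km buffer" and "within 5 km" select the same points; buffering polygons or lines still requires a projected CRS or a geodesic-aware function.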

### Multi-Step Queries

Complex questions are decomposed automatically. For example, "Compare population density with hospital coverage by province" becomes:

1. Calculate population per province
2. Count hospitals per province
3. Compute ratios
4. Generate comparison chart
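
A decomposed query is essentially a small dependency chain in which later steps consume earlier results. The sketch below is hypothetical: GeoQuery's planner (`core/query_planner.py`) runs each step as generated SQL, whereas here every step is a plain Python function over invented figures for two fictional provinces. It covers steps 1 to 3; the resulting ratios are what a comparison chart would plot.

```python
from typing import Callable, Dict, List, Tuple

# A step is (result_name, function of all earlier results).
Step = Tuple[str, Callable[[Dict[str, object]], object]]


def run_plan(steps: List[Step]) -> Dict[str, object]:
    """Execute steps in order; each step can read every earlier result by name."""
    results: Dict[str, object] = {}
    for name, fn in steps:
        results[name] = fn(results)
    return results


# Invented figures for two fictional provinces, standing in for SQL results.
plan: List[Step] = [
    ("population", lambda r: {"Province A": 1_500_000, "Province B": 250_000}),
    ("hospitals", lambda r: {"Province A": 60, "Province B": 15}),
    ("hospitals_per_100k", lambda r: {
        prov: round(r["hospitals"][prov] / r["population"][prov] * 100_000, 1)
        for prov in r["population"]
    }),
]

results = run_plan(plan)
print(results["hospitals_per_100k"])
```

Passing earlier results by name keeps each step independent of how its inputs were produced, so a step can be re-run or replaced on its own.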

---

## Documentation

| Document | Description |
|----------|-------------|
| **[ARCHITECTURE.md](ARCHITECTURE.md)** | System design, components, decisions |
| **[SETUP.md](SETUP.md)** | Development environment setup |
| **[docs/backend/CORE_SERVICES.md](docs/backend/CORE_SERVICES.md)** | Backend services reference |
| **[docs/backend/API_ENDPOINTS.md](docs/backend/API_ENDPOINTS.md)** | API endpoint documentation |
| **[docs/frontend/COMPONENTS.md](docs/frontend/COMPONENTS.md)** | React component architecture |
| **[docs/DATA_FLOW.md](docs/DATA_FLOW.md)** | End-to-end request walkthrough |

---

## License

MIT License - see **[LICENSE](LICENSE)** for details.

---

## Acknowledgments

**Data Sources**:

- [OpenStreetMap](https://www.openstreetmap.org/) - Infrastructure and POI data
- [Humanitarian Data Exchange (HDX)](https://data.humdata.org/) - Administrative boundaries
- [World Bank Open Data](https://data.worldbank.org/) - Socioeconomic indicators
- [Kontur Population Dataset](https://data.humdata.org/organization/kontur) - H3 population grid
- [STRI GIS Portal](https://stridata-si.opendata.arcgis.com/) - Environmental datasets

**Technologies**:

- [Google Gemini](https://ai.google.dev/) - LLM API
- [DuckDB](https://duckdb.org/) - Fast in-process analytics
- [Leaflet](https://leafletjs.com/) - Interactive maps
- [Next.js](https://nextjs.org/) - React framework