---
title: GeoQuery
emoji: 🌍
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
license: mit
app_port: 7860
---

# GeoQuery 🌍🤖

**Territorial Intelligence Platform** - A natural language interface for geospatial data analysis, powered by LLMs and DuckDB Spatial.

![Status](https://img.shields.io/badge/Status-Active-success)
![Python](https://img.shields.io/badge/Python-3.11+-blue)
![Next.js](https://img.shields.io/badge/Next.js-15-black)
![License](https://img.shields.io/badge/License-MIT-green)

---

## ✨ What is GeoQuery?

GeoQuery transforms geographic data analysis by combining **Large Language Models** with **spatial databases**. Ask questions in natural language and get instant maps, charts, and insights.

**Example**: *"Show me hospitals in Panama City"* → an interactive map of 45 hospital locations, automatically styled with 🏥 icons.

### Key Capabilities

- 🗣️ **Conversational Queries** - Natural language instead of SQL or GIS interfaces
- 🗺️ **Auto-Visualization** - Smart choropleth maps, point markers, and heatmaps
- 📊 **Dynamic Charts** - Automatic bar, pie, and line chart generation
- 🔍 **Semantic Discovery** - Finds relevant datasets among 100+ options using AI embeddings
- 🧩 **Multi-Step Analysis** - Complex queries are automatically decomposed and executed
- 💡 **Thinking Transparency** - See the LLM's reasoning process in real time
- 🎨 **Custom Point Styles** - Icon markers for POI, circle points for large datasets

---

## 🎬 Quick Demo

### Try These Queries

| Query | What You Get |
|-------|--------------|
| "Show me all provinces colored by area" | Choropleth map with size-based gradient |
| "Where are the universities?" | Point map with 🎓 icons |
| "Compare hospital count vs school count by province" | Multi-step analysis with side-by-side bar charts |
| "Show intersections in David as circle points" | 1,288 traffic intersections as simple colored circles |
| "Population density in Veraguas" | H3 hexagon heatmap (33K cells) |

---

## 🏗️ Architecture

```
┌───────────────────────────────────────────────────────────┐
│                    Frontend (Next.js)                     │
│     Chat Interface  │  Leaflet Maps  │  Data Explorer     │
└─────────────────────────────┬─────────────────────────────┘
                              │ (SSE Streaming)
┌─────────────────────────────┴─────────────────────────────┐
│                     Backend (FastAPI)                     │
│    Intent Detection → Semantic Search → SQL Generation    │
│           ↓                  ↓                   ↓        │
│      Gemini LLM  DataCatalog (Embeddings)  DuckDB Spatial │
└───────────────────────────────────────────────────────────┘
```

GeoQuery discovers relevant datasets dynamically through semantic embeddings and answers questions with LLM-generated spatial SQL.

📖 **[Detailed Architecture](ARCHITECTURE.md)**

---

## 🚀 Quick Start

### Prerequisites

- **Python 3.11+**
- **Node.js 18+**
- **Google AI API Key** ([Get one free](https://aistudio.google.com/app/apikey))

### Installation

```bash
# 1. Clone repository
git clone https://github.com/GerardCB/GeoQuery.git
cd GeoQuery

# 2. Backend setup
cd backend
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -e .

# 3. Configure API key
export GEMINI_API_KEY="your-api-key-here"

# 4. Start backend
uvicorn backend.main:app --reload --host 0.0.0.0 --port 8000

# 5. Frontend setup (new terminal, from the repository root)
cd frontend
npm install
npm run dev
```

### 🎉 Done!

Open **http://localhost:3000** and start querying!

📘 **[Detailed Setup Guide](SETUP.md)**

---

## 📂 Project Structure

```
GeoQuery/
├── backend/
│   ├── api/                      # FastAPI endpoints
│   │   └── endpoints/            # /chat, /catalog, /schema
│   ├── core/                     # Core services
│   │   ├── llm_gateway.py        # Gemini API integration
│   │   ├── geo_engine.py         # DuckDB Spatial wrapper
│   │   ├── semantic_search.py    # Embedding-based discovery
│   │   ├── data_catalog.py       # Dataset metadata management
│   │   ├── query_planner.py      # Multi-step query orchestration
│   │   └── prompts.py            # LLM system instructions
│   ├── services/                 # Business logic
│   │   ├── executor.py           # Query pipeline orchestrator
│   │   └── response_formatter.py # GeoJSON/chart formatting
│   ├── data/                     # Datasets and metadata
│   │   ├── catalog.json          # Dataset registry
│   │   ├── embeddings.npy        # Vector embeddings
│   │   ├── osm/                  # OpenStreetMap data
│   │   ├── admin/                # Administrative boundaries
│   │   ├── global/               # Global datasets (Kontur, etc.)
│   │   └── socioeconomic/        # World Bank, poverty data
│   └── scripts/                  # Data ingestion scripts
│       ├── download_geofabrik.py
│       ├── download_hdx_panama.py
│       └── stri_catalog_scraper.py
├── frontend/
│   └── src/
│       ├── app/                  # Next.js App Router pages
│       └── components/
│           ├── ChatPanel.tsx     # Chat interface with SSE
│           ├── MapViewer.tsx     # Leaflet map with layers
│           └── DataExplorer.tsx  # Tabular data view
└── docs/                         # Technical documentation
    ├── backend/                  # Backend deep-dives
    ├── frontend/                 # Frontend architecture
    └── data/                     # Data system docs
```

---

## 🔧 Technology Stack

| Layer | Technology | Purpose |
|-------|-----------|---------|
| **LLM** | Google Gemini 2.0 | Intent detection, SQL generation, explanations |
| **Backend** | Python 3.11 + FastAPI | Async HTTP server with SSE streaming |
| **Database** | DuckDB with Spatial | In-memory spatial analytics |
| **Frontend** | Next.js 15 + React 18 | Server-side rendering + interactive UI |
| **Maps** | Leaflet 1.9 | Interactive web maps |
| **Embeddings** | sentence-transformers | Semantic dataset search |
| **Data** | GeoJSON + Parquet | Standardized geospatial formats |

---

## 📊 Available Datasets

GeoQuery currently includes 100+ datasets across multiple categories:

### Administrative

- Panama provinces, districts, corregimientos (HDX 2021)
- Comarca boundaries
- Electoral districts

### Infrastructure

- Roads and highways (OpenStreetMap)
- Hospitals and health facilities (986 locations)
- Universities and schools (200+ institutions)
- Airports, ports, power plants

### Socioeconomic

- World Bank development indicators
- Multidimensional poverty index (MPI)
- Population density (Kontur H3 hexagons - 33K cells)

### Natural Environment

- Protected areas (STRI GIS Portal)
- Forest cover and land use
- Rivers and water bodies

📖 **[Full Dataset List](docs/data/DATASET_SOURCES.md)** | **[Adding New Data](docs/backend/SCRIPTS.md)**

---

## 💡 How It Works
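The numbered steps below amount to a short chain of function calls. The Python sketch that follows shows only that control flow and is not GeoQuery's actual code: the Gemini and embedding calls are replaced by stubs (a keyword rule and hand-made vectors), and every name in it (`Dataset`, `classify_intent`, `find_dataset`, `build_sql`, `point_style`, `run_pipeline`, the `DATA_QUERY` label, the `toy_provinces` table) is hypothetical. Only `MAP_REQUEST`, `panama_healthsites_geojson`, `pan_admin2`, `adm2_name`, and the 500-point threshold are taken from this README.

```python
"""Hypothetical sketch of the GeoQuery request pipeline (not the project's real code).

The LLM and the embedding model are replaced by stubs so the control flow runs offline.
"""
import math
from dataclasses import dataclass


@dataclass
class Dataset:
    table: str               # table name queried in DuckDB
    embedding: list[float]   # precomputed embedding of the dataset description


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify_intent(query: str) -> str:
    """Step 2 stub: a keyword rule standing in for the LLM intent classifier.

    "DATA_QUERY" is a hypothetical label for non-map requests.
    """
    map_words = ("show", "where", "map")
    return "MAP_REQUEST" if any(w in query.lower() for w in map_words) else "DATA_QUERY"


def find_dataset(query_vec: list[float], catalog: list[Dataset]) -> Dataset:
    """Step 3: return the catalog entry whose embedding is most similar to the query."""
    return max(catalog, key=lambda d: cosine(query_vec, d.embedding))


def build_sql(table: str, adm2_name: str) -> str:
    """Step 4 stub: a fixed template in place of LLM-generated SQL.

    Interpolating text into SQL like this is for illustration only.
    """
    return (
        f"SELECT name, geom FROM {table} "
        "WHERE ST_Intersects(geom, (SELECT geom FROM pan_admin2 "
        f"WHERE adm2_name = '{adm2_name}'))"
    )


def point_style(n_points: int, threshold: int = 500) -> str:
    """Step 6: icon markers for small POI sets, plain circles for large ones.

    The README says "<500" and ">500"; treating exactly 500 as circles is an assumption.
    """
    return "icon" if n_points < threshold else "circle"


def run_pipeline(query: str, query_vec: list[float], catalog: list[Dataset], adm2_name: str) -> dict:
    """Chain steps 2-4; step 5 would hand the returned SQL to DuckDB Spatial."""
    intent = classify_intent(query)
    dataset = find_dataset(query_vec, catalog)
    return {"intent": intent, "table": dataset.table, "sql": build_sql(dataset.table, adm2_name)}


if __name__ == "__main__":
    # Toy 3-dimensional vectors; a real embedding model produces much longer ones.
    catalog = [
        Dataset("panama_healthsites_geojson", [0.9, 0.1, 0.0]),
        Dataset("toy_provinces", [0.1, 0.9, 0.2]),  # hypothetical table
    ]
    result = run_pipeline("Show me hospitals in Panama City", [0.8, 0.2, 0.1], catalog, "Panamá")
    print(result["intent"], result["table"])
    print(result["sql"])
    print(point_style(45), point_style(1288))
```

The stubs are deliberately trivial so the flow stays visible; according to the Technology Stack table, the real roles of `classify_intent`/`build_sql` and of the toy vectors are played by Gemini and sentence-transformers respectively.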
1. **User Query**: "Show me hospitals in Panama City"
2. **Intent Detection**: The LLM classifies the request as `MAP_REQUEST`
3. **Semantic Search**: Embedding similarity finds `panama_healthsites_geojson`
4. **SQL Generation**: The LLM writes `SELECT name, geom FROM panama_healthsites_geojson WHERE ST_Intersects(geom, (SELECT geom FROM pan_admin2 WHERE adm2_name = 'Panamá'))`
5. **Execution**: DuckDB Spatial runs the query and returns 45 features
6. **Visualization**: The map is styled automatically with 🏥 icons
7. **Explanation**: The LLM streams a natural-language summary

**Streaming**: The LLM's reasoning is streamed to the browser in real time via Server-Sent Events (SSE).

📖 **[Detailed Data Flow](docs/DATA_FLOW.md)** | **[LLM Integration](docs/backend/LLM_INTEGRATION.md)**

---

## 🗺️ Advanced Features

### Choropleth Maps

GeoQuery automatically detects numeric columns and builds color gradients:

- **Linear scale**: for area and counts
- **Logarithmic scale**: for population and density

### Point Visualization Modes

- **Icon markers** 🏥🎓⛪: for categorical POI (fewer than 500 points)
- **Circle points** ⭕: for large datasets such as intersections (more than 500 points)

### Spatial Operations

- **Intersection**: "Find hospitals within protected areas"
- **Difference**: "Show me areas outside national parks"
- **Buffer**: "Show 5km radius around hospitals"

### Multi-Step Queries

Complex questions are automatically decomposed into a sequence of steps. For example, "Compare population density with hospital coverage by province" becomes:

1. Calculate population per province
2. Count hospitals per province
3. Compute ratios
4.
   Generate comparison chart

---

## 📚 Documentation

| Document | Description |
|----------|-------------|
| **[ARCHITECTURE.md](ARCHITECTURE.md)** | System design, components, decisions |
| **[SETUP.md](SETUP.md)** | Development environment setup |
| **[docs/backend/CORE_SERVICES.md](docs/backend/CORE_SERVICES.md)** | Backend services reference |
| **[docs/backend/API_ENDPOINTS.md](docs/backend/API_ENDPOINTS.md)** | API endpoint documentation |
| **[docs/frontend/COMPONENTS.md](docs/frontend/COMPONENTS.md)** | React component architecture |
| **[docs/DATA_FLOW.md](docs/DATA_FLOW.md)** | End-to-end request walkthrough |

---

## 📄 License

MIT License - see **[LICENSE](LICENSE)** for details.

---

## 🙏 Acknowledgments

**Data Sources**:

- [OpenStreetMap](https://www.openstreetmap.org/) - Infrastructure and POI data
- [Humanitarian Data Exchange (HDX)](https://data.humdata.org/) - Administrative boundaries
- [World Bank Open Data](https://data.worldbank.org/) - Socioeconomic indicators
- [Kontur Population Dataset](https://data.humdata.org/organization/kontur) - H3 population grid
- [STRI GIS Portal](https://stridata-si.opendata.arcgis.com/) - Environmental datasets

**Technologies**:

- [Google Gemini](https://ai.google.dev/) - LLM API
- [DuckDB](https://duckdb.org/) - Fast in-process analytics
- [Leaflet](https://leafletjs.com/) - Interactive maps
- [Next.js](https://nextjs.org/) - React framework