---
title: GeoQuery
emoji: 🌍
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
license: mit
app_port: 7860
---
# GeoQuery
**Territorial Intelligence Platform** - Natural language interface for geospatial data analysis powered by LLMs and DuckDB Spatial.
   
---
## What is GeoQuery?
GeoQuery transforms geographic data analysis by combining **Large Language Models** with **spatial databases**. Simply ask questions in natural language and get instant maps, charts, and insights.
**Example**: *"Show me hospitals in Panama City"* → Interactive map with 45 hospital locations, automatically styled with 🏥 icons.
### Key Capabilities
- **Conversational Queries** - Natural language instead of SQL or GIS interfaces
- **Auto-Visualization** - Smart choropleth maps, point markers, and heatmaps
- **Dynamic Charts** - Automatic bar, pie, and line chart generation
- **Semantic Discovery** - Finds relevant datasets among 100+ options using AI embeddings
- **Multi-Step Analysis** - Complex queries automatically decomposed and executed
- **Thinking Transparency** - See the LLM's reasoning process in real time
- **Custom Point Styles** - Icon markers for POI, circle points for large datasets
---
## Quick Demo
### Try These Queries
| Query | What You Get |
|-------|--------------|
| "Show me all provinces colored by area" | Choropleth map with size-based gradient |
| "Where are the universities?" | Point map with university icon markers |
| "Compare hospital count vs school count by province" | Multi-step analysis with side-by-side bar charts |
| "Show intersections in David as circle points" | 1,288 traffic intersections as simple colored circles |
| "Population density in Veraguas" | H3 hexagon heatmap (33K cells) |
---
## Architecture
```
┌──────────────────────────────────────────────────────────┐
│                    Frontend (Next.js)                    │
│      Chat Interface │ Leaflet Maps │ Data Explorer       │
└────────────────────────┬─────────────────────────────────┘
                         │ (SSE Streaming)
┌────────────────────────┴─────────────────────────────────┐
│                    Backend (FastAPI)                     │
│   Intent Detection → Semantic Search → SQL Generation    │
│           ↓                 ↓                 ↓          │
│      Gemini LLM        DataCatalog     DuckDB Spatial    │
│                       (Embeddings)                       │
└──────────────────────────────────────────────────────────┘
```
GeoQuery discovers datasets dynamically through semantic embeddings and answers questions with LLM-generated spatial SQL.
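The sketch below shows the semantic-search step in miniature: it ranks catalog entries by cosine similarity to a query embedding. It assumes every dataset already has a vector. The project keeps these in `embeddings.npy`, produced with sentence-transformers. The toy 3-dimensional vectors, the `panama_universities` table name, and the `rank_datasets` helper are hypothetical, and the real `semantic_search.py` may score or filter differently.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_datasets(query_vec, catalog, top_k=3):
    """Return the top_k (table_name, score) pairs, best match first.

    `catalog` is a hypothetical mapping of table name -> embedding vector.
    """
    scored = [(name, cosine(query_vec, vec)) for name, vec in catalog.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional "embeddings"; real sentence embeddings are much larger.
catalog = {
    "panama_healthsites_geojson": [0.9, 0.1, 0.0],
    "pan_admin2": [0.1, 0.9, 0.1],
    "panama_universities": [0.2, 0.1, 0.9],
}
query = [0.8, 0.2, 0.1]  # pretend embedding of "hospitals in Panama City"
print(rank_datasets(query, catalog, top_k=1))
```

With these toy vectors, the health-sites table ranks first. In practice, the top few matches would be handed to the LLM as candidate tables for SQL generation.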
**[Detailed Architecture](ARCHITECTURE.md)**
---
## Quick Start
### Prerequisites
- **Python 3.11+**
- **Node.js 18+**
- **Google AI API Key** ([Get one free](https://aistudio.google.com/app/apikey))
### Installation
```bash
# 1. Clone repository
git clone https://github.com/GerardCB/GeoQuery.git
cd GeoQuery
# 2. Backend setup
cd backend
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -e .
# 3. Configure API key
export GEMINI_API_KEY="your-api-key-here"
# 4. Start backend
uvicorn backend.main:app --reload --host 0.0.0.0 --port 8000
# 5. Frontend setup (new terminal, from the repository root)
cd frontend
npm install
npm run dev
```
### Done!
Open **http://localhost:3000** and start querying!
**[Detailed Setup Guide](SETUP.md)**
---
## Project Structure
```
GeoQuery/
├── backend/
│   ├── api/                      # FastAPI endpoints
│   │   └── endpoints/            # /chat, /catalog, /schema
│   ├── core/                     # Core services
│   │   ├── llm_gateway.py        # Gemini API integration
│   │   ├── geo_engine.py         # DuckDB Spatial wrapper
│   │   ├── semantic_search.py    # Embedding-based discovery
│   │   ├── data_catalog.py       # Dataset metadata management
│   │   ├── query_planner.py      # Multi-step query orchestration
│   │   └── prompts.py            # LLM system instructions
│   ├── services/                 # Business logic
│   │   ├── executor.py           # Query pipeline orchestrator
│   │   └── response_formatter.py # GeoJSON/chart formatting
│   ├── data/                     # Datasets and metadata
│   │   ├── catalog.json          # Dataset registry
│   │   ├── embeddings.npy        # Vector embeddings
│   │   ├── osm/                  # OpenStreetMap data
│   │   ├── admin/                # Administrative boundaries
│   │   ├── global/               # Global datasets (Kontur, etc.)
│   │   └── socioeconomic/        # World Bank, poverty data
│   └── scripts/                  # Data ingestion scripts
│       ├── download_geofabrik.py
│       ├── download_hdx_panama.py
│       └── stri_catalog_scraper.py
├── frontend/
│   └── src/
│       ├── app/                  # Next.js App Router pages
│       └── components/
│           ├── ChatPanel.tsx     # Chat interface with SSE
│           ├── MapViewer.tsx     # Leaflet map with layers
│           └── DataExplorer.tsx  # Tabular data view
└── docs/                         # Technical documentation
    ├── backend/                  # Backend deep-dives
    ├── frontend/                 # Frontend architecture
    └── data/                     # Data system docs
```
---
## Technology Stack
| Layer | Technology | Purpose |
|-------|-----------|---------|
| **LLM** | Google Gemini 2.0 | Intent detection, SQL generation, explanations |
| **Backend** | Python 3.11 + FastAPI | Async HTTP server with SSE streaming |
| **Database** | DuckDB with Spatial | In-memory spatial analytics |
| **Frontend** | Next.js 15 + React 18 | Server-side rendering + interactive UI |
| **Maps** | Leaflet 1.9 | Interactive web maps |
| **Embeddings** | sentence-transformers | Semantic dataset search |
| **Data** | GeoJSON + Parquet | Standardized geospatial formats |
---
## Available Datasets
GeoQuery currently includes 100+ datasets across multiple categories:
### Administrative
- Panama provinces, districts, corregimientos (HDX 2021)
- Comarca boundaries
- Electoral districts
### Infrastructure
- Roads and highways (OpenStreetMap)
- Hospitals and health facilities (986 locations)
- Universities and schools (200+ institutions)
- Airports, ports, power plants
### Socioeconomic
- World Bank development indicators
- Multidimensional poverty index (MPI)
- Population density (Kontur H3 hexagons - 33K cells)
### Natural Environment
- Protected areas (STRI GIS Portal)
- Forest cover and land use
- Rivers and water bodies
**[Full Dataset List](docs/data/DATASET_SOURCES.md)** | **[Adding New Data](docs/backend/SCRIPTS.md)**
---
## How It Works
1. **User Query**: "Show me hospitals in Panama City"
2. **Intent Detection**: The LLM classifies the query as `MAP_REQUEST`
3. **Semantic Search**: Finds `panama_healthsites_geojson` via embeddings
4. **SQL Generation**: The LLM writes: `SELECT name, geom FROM panama_healthsites_geojson WHERE ST_Intersects(geom, (SELECT geom FROM pan_admin2 WHERE adm2_name = 'Panamá'))`
5. **Execution**: DuckDB Spatial runs the query → 45 features
6. **Visualization**: Auto-styled map with 🏥 icons
7. **Explanation**: LLM streams natural language summary
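Steps 2 to 5 can be sketched as a small orchestrator in which each stage is passed in as a function. This is an illustration, not the project's `executor.py`. The `run_pipeline` name, the stage signatures, and the lambda stand-ins (which replace the Gemini calls and DuckDB execution) are all hypothetical. Steps 6 and 7 are left out.

```python
from typing import Any, Callable


def run_pipeline(
    question: str,
    detect_intent: Callable[[str], str],
    find_dataset: Callable[[str], str],
    generate_sql: Callable[[str, str, str], str],
    execute: Callable[[str], list],
) -> dict[str, Any]:
    """Run steps 2-5 in order, passing each stage's output to the next."""
    intent = detect_intent(question)             # 2. intent detection
    table = find_dataset(question)               # 3. semantic search
    sql = generate_sql(question, intent, table)  # 4. SQL generation
    rows = execute(sql)                          # 5. execution
    return {"intent": intent, "table": table, "sql": sql, "rows": rows}


# Hypothetical stand-ins for the LLM and database calls.
result = run_pipeline(
    "Show me hospitals in Panama City",
    detect_intent=lambda q: "MAP_REQUEST",
    find_dataset=lambda q: "panama_healthsites_geojson",
    generate_sql=lambda q, intent, table: f"SELECT name, geom FROM {table}",
    execute=lambda sql: [{"name": "Hospital A"}, {"name": "Hospital B"}],
)
print(result["intent"], result["table"], len(result["rows"]))
```

Injecting the stages keeps the ordering logic testable without an API key or database. The real stages would call Gemini and DuckDB Spatial.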
**Streaming**: See the LLM's thinking process in real-time via Server-Sent Events.
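Server-Sent Events are plain text. Each event may have an `event:` line, has one `data:` line per line of payload, and ends with a blank line. The minimal framing helper below is a sketch. The `format_sse` name and the `thinking`/`answer` event names are hypothetical, not GeoQuery's actual event types.

```python
import json
from typing import Optional


def format_sse(data, event: Optional[str] = None) -> str:
    """Frame one Server-Sent Event.

    Non-string payloads are JSON-encoded. Each payload line gets its own
    `data:` field, and a blank line terminates the event.
    """
    payload = data if isinstance(data, str) else json.dumps(data)
    lines = [f"event: {event}"] if event is not None else []
    lines += [f"data: {line}" for line in (payload.splitlines() or [""])]
    return "\n".join(lines) + "\n\n"


# Hypothetical events: a reasoning update, then part of the final answer.
print(format_sse({"step": "intent", "value": "MAP_REQUEST"}, event="thinking"), end="")
print(format_sse("Found 45 hospitals.", event="answer"), end="")
```

On the server, FastAPI can stream such strings from a generator with `StreamingResponse(..., media_type="text/event-stream")`. In the browser, `EventSource` or a `fetch` stream reader parses them. This README does not say whether the `/chat` endpoint uses exactly these pieces.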
**[Detailed Data Flow](docs/DATA_FLOW.md)** | **[LLM Integration](docs/backend/LLM_INTEGRATION.md)**
---
## Advanced Features
### Choropleth Maps
Automatically detects numeric columns and creates color gradients:
- **Linear scale**: for areas and counts
- **Logarithmic scale**: for population and density
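One way to picture the linear vs. logarithmic choice: split the value range into equal intervals, either directly or in log10 space. The sketch below makes assumptions the README does not state: the `color_class` function, five classes, and equal-interval breaks are all hypothetical. The README does not say how many classes GeoQuery uses or how its breaks are computed.

```python
import math


def color_class(value: float, vmin: float, vmax: float,
                n_classes: int = 5, log: bool = False) -> int:
    """Map a value to an equal-interval class index in [0, n_classes - 1].

    With log=True the intervals are equal in log10 space, which spreads out
    skewed variables such as population. Log scaling needs positive values.
    """
    if log:
        if min(value, vmin, vmax) <= 0:
            raise ValueError("log scale requires positive values")
        value, vmin, vmax = math.log10(value), math.log10(vmin), math.log10(vmax)
    if vmax == vmin:
        return 0
    t = (value - vmin) / (vmax - vmin)  # position in the range, 0..1
    t = min(max(t, 0.0), 1.0)           # clamp out-of-range values
    return min(int(t * n_classes), n_classes - 1)


# Population-like values spanning 100 to 1,000,000 people.
print(color_class(1_000, 100, 1_000_000))            # linear scale
print(color_class(1_000, 100, 1_000_000, log=True))  # log scale
```

Take a range of 100 to 1,000,000 and an area with 1,000 people. It falls in class 0 on the linear scale but class 1 on the log scale. On the linear scale, almost every area collapses into the lowest color, which is why skewed variables read better logarithmically.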
### Point Visualization Modes
- **Icon markers** (e.g. 🏥): for categorical POI layers (<500 points)
- **Circle points**: for large datasets such as intersections (>500 points)
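The rendering rule above can be expressed as a tiny selector. The 500-point threshold comes from this README. Two details are assumptions, as is the `point_style` name: exactly 500 points is treated as "large", and an explicit request such as "as circle points" overrides the size rule.

```python
from typing import Optional

ICON_LIMIT = 500  # point-count threshold quoted in this README


def point_style(n_points: int, requested: Optional[str] = None) -> str:
    """Choose 'icon' or 'circle' rendering for a point layer.

    Assumption: an explicit request overrides the size rule, and exactly
    ICON_LIMIT points counts as a large dataset.
    """
    if requested in ("icon", "circle"):
        return requested
    return "icon" if n_points < ICON_LIMIT else "circle"


print(point_style(45))    # e.g. hospitals in Panama City
print(point_style(1288))  # e.g. intersections in David
```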
### Spatial Operations
- Intersection: "Find hospitals within protected areas"
- Difference: "Show me areas outside national parks"
- Buffer: "Show 5km radius around hospitals"
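In GeoQuery, these operations run as SQL in DuckDB Spatial. The library-free sketch below illustrates what a "5 km radius" means for lon/lat data by filtering points on great-circle (haversine) distance. It swaps in a spherical-distance check for a true polygon buffer. The `within_radius` helper and the sample coordinates are hypothetical. The same caveat applies to any implementation: a radius in kilometres must be measured with a spherical distance or in a metric projection, not in raw degrees.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in kilometres between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(center, points, radius_km=5.0):
    """Return the (lon, lat) points inside a circular buffer around center."""
    lon0, lat0 = center
    return [p for p in points if haversine_km(lon0, lat0, p[0], p[1]) <= radius_km]


# Hypothetical: a point near central Panama City and two candidate sites.
center = (-79.52, 8.98)
sites = [(-79.52, 9.00), (-79.52, 9.10)]  # about 2.2 km and 13.3 km north
print(within_radius(center, sites))
```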
### Multi-Step Queries
Complex questions are automatically decomposed into steps:
- "Compare population density with hospital coverage by province"
  1. Calculate population per province
  2. Count hospitals per province
  3. Compute ratios
  4. Generate comparison chart
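Steps 1 and 2 would typically be SQL aggregations, for example a `COUNT(*)` grouped by province. The sketch below covers only what happens once both results exist: it joins them by province name and computes one possible ratio for the chart. The function name, the per-100,000 normalization, and the numbers are hypothetical and do not come from GeoQuery's datasets.

```python
def hospitals_per_100k(population: dict[str, int],
                       hospitals: dict[str, int]) -> dict[str, float]:
    """Step 3: turn two per-province aggregates into a coverage ratio.

    Provinces missing from either input (or with zero population) are skipped.
    """
    return {
        province: hospitals[province] * 100_000 / pop
        for province, pop in population.items()
        if province in hospitals and pop > 0
    }


# Made-up step 1 and step 2 outputs, for illustration only.
population = {"Veraguas": 250_000, "Coclé": 200_000}
hospitals = {"Veraguas": 10, "Coclé": 4}

ratios = hospitals_per_100k(population, hospitals)
# Step 4: sort for a bar chart, best-covered province first.
for province, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{province}: {ratio:.1f} hospitals per 100k people")
```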
---
## Documentation
| Document | Description |
|----------|-------------|
| **[ARCHITECTURE.md](ARCHITECTURE.md)** | System design, components, decisions |
| **[SETUP.md](SETUP.md)** | Development environment setup |
| **[docs/backend/CORE_SERVICES.md](docs/backend/CORE_SERVICES.md)** | Backend services reference |
| **[docs/backend/API_ENDPOINTS.md](docs/backend/API_ENDPOINTS.md)** | API endpoint documentation |
| **[docs/frontend/COMPONENTS.md](docs/frontend/COMPONENTS.md)** | React component architecture |
| **[docs/DATA_FLOW.md](docs/DATA_FLOW.md)** | End-to-end request walkthrough |
---
## License
MIT License - see **[LICENSE](LICENSE)** for details.
---
## Acknowledgments
**Data Sources**:
- [OpenStreetMap](https://www.openstreetmap.org/) - Infrastructure and POI data
- [Humanitarian Data Exchange (HDX)](https://data.humdata.org/) - Administrative boundaries
- [World Bank Open Data](https://data.worldbank.org/) - Socioeconomic indicators
- [Kontur Population Dataset](https://data.humdata.org/organization/kontur) - H3 population grid
- [STRI GIS Portal](https://stridata-si.opendata.arcgis.com/) - Environmental datasets
**Technologies**:
- [Google Gemini](https://ai.google.dev/) - LLM API
- [DuckDB](https://duckdb.org/) - Fast in-process analytics
- [Leaflet](https://leafletjs.com/) - Interactive maps
- [Next.js](https://nextjs.org/) - React framework