# Dataset Sources
Complete list of datasets available in GeoQuery with source attributions.
---
## Administrative Boundaries
### Panama Admin Levels (HDX)
**Source**: Humanitarian Data Exchange
**Provider**: INEC (National Institute of Statistics and Census)
**Year**: 2021
**URL**: https://data.humdata.org/dataset/panama-administrative-boundaries
**Files**:
- `hdx/pan_admin1_2021.geojson` - 10 provinces + comarcas
- `hdx/pan_admin2_2021.geojson` - 81 districts
- `hdx/pan_admin3_2021.geojson` - 679 corregimientos
**License**: Creative Commons Attribution
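
After downloading, a quick sanity check is to compare each file's feature count with the figures listed above. The sketch below assumes the files live under `backend/data/` (as in "Adding New Datasets") and that each one is a GeoJSON FeatureCollection. `count_features` and `check_counts` are illustrative helpers, not part of the GeoQuery backend. Run from the repository root, `check_counts(Path("backend/data"), EXPECTED_COUNTS)` should return an empty dict when every count matches.

```python
import json
from pathlib import Path

# Counts restated from the file list above. admin1 is left out because its
# entry ("10 provinces + comarcas") does not state a single total.
EXPECTED_COUNTS = {
    "hdx/pan_admin2_2021.geojson": 81,
    "hdx/pan_admin3_2021.geojson": 679,
}


def count_features(path: Path) -> int:
    """Return the number of features in a GeoJSON FeatureCollection file."""
    with path.open(encoding="utf-8") as fh:
        collection = json.load(fh)
    if collection.get("type") != "FeatureCollection":
        raise ValueError(f"{path} is not a GeoJSON FeatureCollection")
    return len(collection.get("features", []))


def check_counts(data_dir: Path, expected: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Return {relative_path: (expected, actual)} for every file whose count differs."""
    mismatches = {}
    for rel_path, want in expected.items():
        got = count_features(data_dir / rel_path)
        if got != want:
            mismatches[rel_path] = (want, got)
    return mismatches
```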
---
## Infrastructure
### Roads (OpenStreetMap via Geofabrik)
**Source**: OpenStreetMap
**Provider**: Geofabrik
**URL**: https://download.geofabrik.de/central-america/panama.html
**Files**:
- `osm/roads.geojson` - Highway network (motorways, primary, secondary roads)
**License**: ODbL (Open Database License)
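
To work with only the road classes named above, filter on each road's OSM `highway` value. This sketch assumes each feature keeps that value in a property named `highway`; Geofabrik's shapefile extracts call the equivalent column `fclass`, so the key is a parameter. `filter_by_class`, `count_by_class`, and `MAJOR_ROADS` are hypothetical names. `count_by_class` is handy for checking which classes a given export actually contains before choosing a filter.

```python
from collections import Counter
from typing import Iterable, Iterator

# OSM highway classes named in the file description above.
MAJOR_ROADS = frozenset({"motorway", "primary", "secondary"})


def filter_by_class(
    features: Iterable[dict], classes: frozenset[str], tag_key: str = "highway"
) -> Iterator[dict]:
    """Yield features whose `tag_key` property is one of `classes`."""
    for feature in features:
        props = feature.get("properties") or {}
        if props.get(tag_key) in classes:
            yield feature


def count_by_class(features: Iterable[dict], tag_key: str = "highway") -> Counter:
    """Count features per road class; features without the property count under None."""
    return Counter((f.get("properties") or {}).get(tag_key) for f in features)
```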
### Healthcare (Healthsites.io)
**Source**: Healthsites.io / OpenStreetMap
**URL**: https://healthsites.io/
**Files**:
- `osm/healthsites.geojson` - 986 healthcare facilities
**License**: ODbL
### Education (OpenStreetMap)
**Source**: OpenStreetMap
**Files**:
- `osm/universities.geojson` - 67 universities
- `osm/schools.geojson` - Schools and educational facilities
**License**: ODbL
### Other POI (OpenStreetMap)
**Source**: OpenStreetMap
**Files**:
- `osm/traffic.geojson` - Traffic signals and intersections
- `osm/amenities.geojson` - Various amenities
- `osm/buildings.geojson` - Building footprints
**License**: ODbL
---
## Socioeconomic
### World Bank Development Indicators
**Source**: World Bank Open Data
**URL**: https://data.worldbank.org/
**Files**:
- `worldbank/indicators.geojson` - Country-level indicators joined with geometry
**Indicators Available**:
- GDP per capita
- Life expectancy
- Access to electricity
- Internet users (% of population)
- And more...
**License**: Creative Commons Attribution 4.0
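
"Joined with geometry" amounts to a key-based join between indicator records and country features. A minimal sketch, assuming country polygons keyed by an ISO 3166-1 alpha-3 code in an `ISO_A3` property (the field Natural Earth uses) and indicator values already loaded into a dict. `join_indicators` is a hypothetical helper, not the pipeline that produced `worldbank/indicators.geojson`. It returns new feature dicts, so the source layer is left unchanged, and countries without indicator data keep their original properties.

```python
from typing import Iterable


def join_indicators(
    features: Iterable[dict],
    indicators: dict[str, dict[str, float]],
    feature_key: str = "ISO_A3",
) -> list[dict]:
    """Attach indicator values to each feature whose code matches a key in `indicators`.

    `indicators` maps an ISO3 code to {indicator_name: value}.
    Returns new feature dicts; the input features are not modified.
    """
    joined = []
    for feature in features:
        props = dict(feature.get("properties") or {})  # copy so the input stays intact
        props.update(indicators.get(props.get(feature_key), {}))
        joined.append({**feature, "properties": props})
    return joined
```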
### Multidimensional Poverty Index (MPI)
**Source**: UNDP / Government of Panama
**Files**:
- `socioeconomic/mpi_panama.geojson` - Poverty index by district
**License**: Open Data
### Province Socioeconomic Data
**Source**: INEC Census 2023 (processed)
**Files**:
- `socioeconomic/province_socioeconomic.geojson` - Province-level statistics
**Metrics**:
- Population estimates
- Area
- Demographics
---
## Population
### Kontur Population Dataset
**Source**: Kontur
**Provider**: Kontur (derived in part from Meta/Facebook population estimates)
**URL**: https://data.humdata.org/organization/kontur
**Files**:
- `kontur/kontur_population_PA_20220630.geojson` - 33,000+ H3 hexagons
**Description**: High-resolution population density grid built on the H3 hexagonal spatial index
**License**: Creative Commons Attribution International
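
Because each hexagon carries its own head count, national or regional totals are plain sums over features. This sketch assumes each feature stores its H3 index and population in properties named `h3` and `population`; check the processed file if the names differ. The helper names are hypothetical. H3 cells at a single resolution are only approximately equal in area, so ranking cells by head count is a close proxy for ranking them by density.

```python
import heapq
from typing import Iterable


def _props(feature: dict) -> dict:
    return feature.get("properties") or {}


def total_population(features: Iterable[dict], pop_key: str = "population") -> float:
    """Sum population over all hexagons; missing values count as zero."""
    return sum(float(_props(f).get(pop_key) or 0) for f in features)


def most_populated_cells(
    features: Iterable[dict], n: int = 5, pop_key: str = "population", id_key: str = "h3"
) -> list[tuple[str, float]]:
    """Return the n (h3_index, population) pairs with the largest population."""
    pairs = ((_props(f).get(id_key), float(_props(f).get(pop_key) or 0)) for f in features)
    return heapq.nlargest(n, pairs, key=lambda pair: pair[1])
```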
---
## Environmental
### STRI GIS Portal
**Source**: Smithsonian Tropical Research Institute
**URL**: https://stridata-si.opendata.arcgis.com/
**Files**:
- `stri/protected_areas_2025.geojson` - Protected areas
- `stri/forest_cover_2021.geojson` - Forest cover classification
**License**: CC BY 4.0
---
## Global Datasets
### Natural Earth
**Source**: Natural Earth Data
**URL**: https://www.naturalearthdata.com/
**Files**:
- `global/countries_110m.geojson` - Country boundaries (low resolution)
**License**: Public Domain
---
## Dataset Statistics
| Category | Datasets | Total Features |
|----------|----------|----------------|
| Administrative | 3 | ~770 |
| Infrastructure | 8 | ~50,000 |
| Socioeconomic | 3 | ~100 |
| Population | 1 | 33,000 |
| Environmental | 2 | ~500 |
| Global | 1 | 177 |

**Total**: 18 datasets (sum of the rows above), ~85,000 features
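
The totals can be re-derived from the table rows. This sketch restates the rows, taking the approximate feature counts at face value, and sums them; rerunning it after editing a row is a quick way to keep the **Total** line in sync.

```python
# Rows restated from the Dataset Statistics table: category -> (datasets, approx. features).
STATS = {
    "Administrative": (3, 770),
    "Infrastructure": (8, 50_000),
    "Socioeconomic": (3, 100),
    "Population": (1, 33_000),
    "Environmental": (2, 500),
    "Global": (1, 177),
}


def totals(stats: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Return (total datasets, total features) across all categories."""
    datasets = sum(count for count, _ in stats.values())
    features = sum(feats for _, feats in stats.values())
    return datasets, features


datasets, features = totals(STATS)
print(datasets, features)  # 18 84547
```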
---
## Data Update Schedule
| Dataset | Update Frequency | Last Updated |
|---------|-----------------|--------------|
| OSM Data | Monthly | 2026-01 |
| Admin Boundaries | Yearly | 2021 |
| Kontur Population | Quarterly | 2022-06 |
| STRI Environmental | As released | 2025 |
| World Bank | Annually | 2023 |
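
The schedule also supports a simple staleness check. A minimal sketch, assuming "Last Updated" values are `YYYY` or `YYYY-MM` and that a dataset is due for a refresh once it is more than one update period old. The month thresholds and helper names are assumptions. "As released" is never flagged because it has no fixed interval, and a year-only value is read as January of that year.

```python
from datetime import date

# Maximum expected age in months per update frequency (assumed thresholds).
FREQUENCY_MONTHS = {"Monthly": 1, "Quarterly": 3, "Yearly": 12, "Annually": 12, "As released": None}

# Rows restated from the Data Update Schedule table: (dataset, frequency, last updated).
SCHEDULE = [
    ("OSM Data", "Monthly", "2026-01"),
    ("Admin Boundaries", "Yearly", "2021"),
    ("Kontur Population", "Quarterly", "2022-06"),
    ("STRI Environmental", "As released", "2025"),
    ("World Bank", "Annually", "2023"),
]


def parse_period(text: str) -> date:
    """Parse 'YYYY' or 'YYYY-MM' into the first day of that period."""
    parts = text.split("-")
    month = int(parts[1]) if len(parts) > 1 else 1
    return date(int(parts[0]), month, 1)


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months from `earlier` to `later`, ignoring the day of month."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def is_stale(frequency: str, last_updated: str, today: date) -> bool:
    limit = FREQUENCY_MONTHS[frequency]
    if limit is None:
        return False
    return months_between(parse_period(last_updated), today) > limit


def stale_datasets(rows: list[tuple[str, str, str]], today: date) -> list[str]:
    return [name for name, freq, last in rows if is_stale(freq, last, today)]
```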
---
## Adding New Datasets
See [../backend/SCRIPTS.md](../backend/SCRIPTS.md) for data ingestion procedures.
### Quick Steps
1. Download the GeoJSON file.
2. Place it in the appropriate `backend/data/` subdirectory.
3. Add an entry to `backend/data/catalog.json`:
   ```json
   "my_dataset": {
     "path": "category/my_dataset.geojson",
     "description": "Short description",
     "semantic_description": "Detailed description for AI",
     "categories": ["category"],
     "tags": ["tag1", "tag2"]
   }
   ```
4. Regenerate embeddings:
   ```bash
   # Run from the repository root so that the `backend` package is importable
   rm -f backend/data/embeddings.npy
   python -c "from backend.core.semantic_search import get_semantic_search; get_semantic_search()"
   ```
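
A malformed catalog entry is easiest to catch before regenerating embeddings. This sketch checks every entry for the keys used in the step 3 example and confirms that `path` resolves to a file under the directory holding `catalog.json`. It assumes `catalog.json` is a top-level JSON object mapping dataset names to entries, and treating every example key as required is an assumption about what the backend expects. `validate_entry` and `validate_catalog` are hypothetical helpers. `validate_catalog(Path("backend/data/catalog.json"))` returns an empty list when nothing is wrong.

```python
import json
from pathlib import Path

# Keys used by the example entry in step 3; all are treated as required here.
REQUIRED_KEYS = {"path", "description", "semantic_description", "categories", "tags"}


def validate_entry(name: str, entry: dict, data_dir: Path) -> list[str]:
    """Return human-readable problems found in one catalog entry."""
    problems = []
    missing = REQUIRED_KEYS - set(entry)
    if missing:
        problems.append(f"{name}: missing keys {sorted(missing)}")
    for key in ("categories", "tags"):
        value = entry.get(key)
        if value is not None and not (isinstance(value, list) and all(isinstance(v, str) for v in value)):
            problems.append(f"{name}: '{key}' must be a list of strings")
    path = entry.get("path")
    if isinstance(path, str) and not (data_dir / path).is_file():
        problems.append(f"{name}: file not found: {data_dir / path}")
    return problems


def validate_catalog(catalog_path: Path) -> list[str]:
    """Validate every entry; paths are resolved relative to the catalog's directory."""
    catalog = json.loads(catalog_path.read_text(encoding="utf-8"))
    problems = []
    for name, entry in catalog.items():
        problems.extend(validate_entry(name, entry, catalog_path.parent))
    return problems
```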
---
## Data Licenses Summary
- **OpenStreetMap**: ODbL (share-alike, attribution required)
- **HDX/Government**: CC BY (attribution required)
- **World Bank**: CC BY 4.0
- **Natural Earth**: Public Domain
- **STRI**: CC BY 4.0
- **Kontur**: CC BY International
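**All licenses in this summary permit commercial use. Each requires attribution except Natural Earth (public domain), and ODbL data (OpenStreetMap, Healthsites.io) is additionally share-alike.**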
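
When a result combines several sources, the obligations are the union of their license terms. This sketch restates the summary above as data and reports which sources need attribution and whether any share-alike (ODbL) source is involved. `LicenseTerms`, `SOURCE_LICENSES`, and `obligations` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicenseTerms:
    name: str
    attribution: bool
    share_alike: bool


# Restated from the Data Licenses Summary above.
SOURCE_LICENSES = {
    "OpenStreetMap": LicenseTerms("ODbL", attribution=True, share_alike=True),
    "HDX/Government": LicenseTerms("CC BY", attribution=True, share_alike=False),
    "World Bank": LicenseTerms("CC BY 4.0", attribution=True, share_alike=False),
    "Natural Earth": LicenseTerms("Public Domain", attribution=False, share_alike=False),
    "STRI": LicenseTerms("CC BY 4.0", attribution=True, share_alike=False),
    "Kontur": LicenseTerms("CC BY International", attribution=True, share_alike=False),
}


def obligations(sources: list[str]) -> dict:
    """Summarize attribution and share-alike duties for a result built from `sources`."""
    names = sorted(set(sources))  # de-duplicate and fix a stable order
    return {
        "attribute": [n for n in names if SOURCE_LICENSES[n].attribution],
        "share_alike": any(SOURCE_LICENSES[n].share_alike for n in names),
    }
```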
---
## Attribution in App
GeoQuery automatically generates citations for query results:
```json
{
  "data_citations": [
    "Administrative boundary data from HDX/INEC, 2021",
    "Healthcare facilities from OpenStreetMap via Healthsites.io"
  ]
}
```
These appear in the chat response for user transparency.
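
One way such a list could be assembled is to look up a citation string for each dataset a query touched, skip unknown datasets, and de-duplicate in first-use order. The `CITATIONS` mapping, its dataset keys, and `build_citations` are hypothetical; how GeoQuery actually stores or derives citation text is not described here.

```python
import json
from typing import Iterable

# Hypothetical dataset-name -> citation mapping, using the example strings above.
CITATIONS = {
    "pan_admin1": "Administrative boundary data from HDX/INEC, 2021",
    "healthsites": "Healthcare facilities from OpenStreetMap via Healthsites.io",
}


def build_citations(datasets_used: Iterable[str], citations: dict[str, str] = CITATIONS) -> dict:
    """Return {"data_citations": [...]} in first-use order, without duplicates."""
    ordered: list[str] = []
    for name in datasets_used:
        text = citations.get(name)
        if text and text not in ordered:
            ordered.append(text)
    return {"data_citations": ordered}


print(json.dumps(build_citations(["pan_admin1", "healthsites", "pan_admin1"]), indent=2))
```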
---
## Next Steps
- **Ingestion Scripts**: [../backend/SCRIPTS.md](../backend/SCRIPTS.md)