| # Dataset Sources | |
| Complete list of datasets available in GeoQuery with source attributions. | |
| --- | |
| ## Administrative Boundaries | |
| ### Panama Admin Levels (HDX) | |
| **Source**: Humanitarian Data Exchange | |
| **Provider**: INEC (National Institute of Statistics and Census) | |
| **Year**: 2021 | |
| **URL**: https://data.humdata.org/dataset/panama-administrative-boundaries | |
| **Files**: | |
| - `hdx/pan_admin1_2021.geojson` - 10 provinces + comarcas | |
| - `hdx/pan_admin2_2021.geojson` - 81 districts | |
| - `hdx/pan_admin3_2021.geojson` - 679 corregimientos | |
| **License**: Creative Commons Attribution | |
| --- | |
| ## Infrastructure | |
| ### Roads (OpenStreetMap via Geofabrik) | |
| **Source**: OpenStreetMap | |
| **Provider**: Geofabrik | |
| **URL**: https://download.geofabrik.de/central-america/panama.html | |
| **Files**: | |
| - `osm/roads.geojson` - Highway network (motorways, primary, secondary roads) | |
| **License**: ODbL (Open Database License) | |
| ### Healthcare (Healthsites.io) | |
| **Source**: Healthsites.io / OpenStreetMap | |
| **URL**: https://healthsites.io/ | |
| **Files**: | |
| - `osm/healthsites.geojson` - 986 healthcare facilities | |
| **License**: ODbL | |
| ### Education (OpenStreetMap) | |
| **Source**: OpenStreetMap | |
| **Files**: | |
| - `osm/universities.geojson` - 67 universities | |
| - `osm/schools.geojson` - Schools and educational facilities | |
| **License**: ODbL | |
| ### Other POI (OpenStreetMap) | |
| **Source**: OpenStreetMap | |
| **Files**: | |
| - `osm/traffic.geojson` - Traffic signals and intersections | |
| - `osm/amenities.geojson` - Various amenities | |
| - `osm/buildings.geojson` - Building footprints | |
| **License**: ODbL | |
| --- | |
| ## Socioeconomic | |
| ### World Bank Development Indicators | |
| **Source**: World Bank Open Data | |
| **URL**: https://data.worldbank.org/ | |
| **Files**: | |
| - `worldbank/indicators.geojson` - Country-level indicators joined with geometry | |
| **Indicators Available**: | |
| - GDP per capita | |
| - Life expectancy | |
| - Access to electricity | |
| - Internet users (% of population) | |
| - And more... | |
| **License**: Creative Commons Attribution 4.0 | |
| ### Multidimensional Poverty Index (MPI) | |
| **Source**: UNDP / Government of Panama | |
| **Files**: | |
| - `socioeconomic/mpi_panama.geojson` - Poverty index by district | |
| **License**: Open Data | |
| ### Province Socioeconomic Data | |
| **Source**: INEC Census 2023 (processed) | |
| **Files**: | |
| - `socioeconomic/province_socioeconomic.geojson` - Province-level statistics | |
| **Metrics**: | |
| - Population estimates | |
| - Area | |
| - Demographics | |
| --- | |
| ## Population | |
| ### Kontur Population Dataset | |
| **Source**: Kontur | |
| **Provider**: Kontur, drawing on Meta/Facebook population estimates | |
| **URL**: https://data.humdata.org/organization/kontur | |
| **Files**: | |
| - `kontur/kontur_population_PA_20220630.geojson` - 33,000+ H3 hexagons | |
| **Description**: High-resolution population density grid using H3 spatial index | |
| **License**: Creative Commons Attribution International | |
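Because each hexagon is an ordinary GeoJSON feature, population totals can be computed with the standard library alone. The following is a minimal sketch, not GeoQuery's own code: the function name `total_population` is hypothetical, and the property name `population` is an assumption about the file's schema that should be checked against the actual features.

```python
import json


def total_population(geojson_path, field="population"):
    """Sum a numeric property over all features in a GeoJSON FeatureCollection.

    The default field name "population" is an assumption about the Kontur schema.
    """
    with open(geojson_path, encoding="utf-8") as fh:
        collection = json.load(fh)

    total = 0
    for feature in collection.get("features", []):
        # "properties" may be null in GeoJSON, so fall back to an empty dict.
        value = (feature.get("properties") or {}).get(field)
        if value is not None:
            total += value
    return total
```

Calling `total_population("backend/data/kontur/kontur_population_PA_20220630.geojson")` would return the sum over every hexagon that carries the property; hexagons without it are skipped rather than treated as zero-population errors.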
| --- | |
| ## Environmental | |
| ### STRI GIS Portal | |
| **Source**: Smithsonian Tropical Research Institute | |
| **URL**: https://stridata-si.opendata.arcgis.com/ | |
| **Files**: | |
| - `stri/protected_areas_2025.geojson` - Protected areas | |
| - `stri/forest_cover_2021.geojson` - Forest cover classification | |
| **License**: CC BY 4.0 | |
| --- | |
| ## Global Datasets | |
| ### Natural Earth | |
| **Source**: Natural Earth Data | |
| **URL**: https://www.naturalearthdata.com/ | |
| **Files**: | |
| - `global/countries_110m.geojson` - Country boundaries (low resolution) | |
| **License**: Public Domain | |
| --- | |
| ## Dataset Statistics | |
| | Category | Datasets | Total Features | | |
| |----------|----------|----------------| | |
| | Administrative | 3 | ~770 | | |
| | Infrastructure | 8 | ~50,000 | | |
| | Socioeconomic | 3 | ~100 | | |
| | Population | 1 | ~33,000 | | |
| | Environmental | 2 | ~500 | | |
| | Global | 1 | 177 | | |
| **Total**: 18 datasets across the categories above, ~85,000 features | |
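Feature counts like these can be re-derived from the files themselves. The sketch below is one way to do it under stated assumptions: it groups by the first subdirectory under the data directory (for example `osm` or `hdx`) rather than by the categories in the table, and the function name `count_features` is hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def count_features(data_dir):
    """Return {subdirectory name: total GeoJSON feature count} under data_dir."""
    totals = Counter()
    for path in sorted(Path(data_dir).rglob("*.geojson")):
        with path.open(encoding="utf-8") as fh:
            collection = json.load(fh)
        # Group by the first path component below data_dir, e.g. "osm".
        # A file placed directly in data_dir is grouped under its own name.
        group = path.relative_to(data_dir).parts[0]
        # A FeatureCollection keeps its features in the "features" array.
        totals[group] += len(collection.get("features", []))
    return dict(totals)
```

Running `count_features("backend/data")` from the repository root gives per-directory totals that can be compared against the approximate figures in the table.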
| --- | |
| ## Data Update Schedule | |
| | Dataset | Update Frequency | Last Updated | | |
| |---------|-----------------|--------------| | |
| | OSM Data | Monthly | 2026-01 | | |
| | Admin Boundaries | Annually | 2021 | | |
| | Kontur Population | Quarterly | 2022-06 | | |
| | STRI Environmental | As released | 2025 | | |
| | World Bank | Annually | 2023 | | |
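The schedule can also be checked mechanically. This is a minimal sketch, assuming each frequency label maps to a fixed number of months and that a year-only "Last Updated" value is read as January of that year; `INTERVAL_MONTHS`, `months_between`, and `is_overdue` are hypothetical names, not part of GeoQuery.

```python
from datetime import date

# Update intervals in months, based on the labels in the schedule table.
# "As released" has no fixed interval, so it is deliberately absent.
INTERVAL_MONTHS = {"Monthly": 1, "Quarterly": 3, "Annually": 12, "Yearly": 12}


def months_between(start, end):
    """Whole calendar months from start to end, ignoring the day of month."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def is_overdue(frequency, last_updated, today):
    """Return True when more than one update interval has passed."""
    interval = INTERVAL_MONTHS.get(frequency)
    if interval is None:
        # No fixed cadence (e.g. "As released"): never flagged as overdue.
        return False
    return months_between(last_updated, today) > interval
```

For example, `is_overdue("Quarterly", date(2022, 6, 1), date(2026, 1, 1))` evaluates to `True`, because 43 months have passed against a 3-month interval.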
| --- | |
| ## Adding New Datasets | |
| See [../backend/SCRIPTS.md](../backend/SCRIPTS.md) for data ingestion procedures. | |
| ### Quick Steps | |
| 1. Download GeoJSON file | |
| 2. Place in appropriate `backend/data/` subdirectory | |
| 3. Add entry to `backend/data/catalog.json`: | |
| ```json | |
| "my_dataset": { | |
| "path": "category/my_dataset.geojson", | |
| "description": "Short description", | |
| "semantic_description": "Detailed description for AI", | |
| "categories": ["category"], | |
| "tags": ["tag1", "tag2"] | |
| } | |
| ``` | |
| 4. Regenerate embeddings: | |
| ```bash | |
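| # Remove the cached embeddings (-f: no error if the file is already gone) | |
| rm -f backend/data/embeddings.npy | |
| # Re-initialize semantic search to regenerate the embeddings | |
| python -c "from backend.core.semantic_search import get_semantic_search; get_semantic_search()" | |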
| ``` | |
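Step 3 can be scripted instead of edited by hand. The sketch below assumes `catalog.json` is a single JSON object keyed by dataset name, as the entry format above suggests, and treats every field in that example as required; both points are assumptions, and `add_catalog_entry` is a hypothetical helper rather than part of the ingestion scripts.

```python
import json
from pathlib import Path

# Fields shown in the catalog entry example; requiring all of them is an assumption.
REQUIRED_FIELDS = ("path", "description", "semantic_description", "categories", "tags")


def add_catalog_entry(catalog_path, key, entry):
    """Add one dataset entry to the catalog file and return the updated catalog."""
    missing = [field for field in REQUIRED_FIELDS if field not in entry]
    if missing:
        raise ValueError(f"entry is missing fields: {missing}")

    catalog_file = Path(catalog_path)
    catalog = json.loads(catalog_file.read_text(encoding="utf-8"))
    if key in catalog:
        # Refuse to overwrite an existing dataset silently.
        raise KeyError(f"dataset {key!r} is already in the catalog")

    catalog[key] = entry
    catalog_file.write_text(
        json.dumps(catalog, indent=2, ensure_ascii=False) + "\n", encoding="utf-8"
    )
    return catalog
```

Validating before writing means a malformed entry never reaches the file, and refusing duplicate keys avoids replacing an existing dataset by accident. The embeddings still need to be regenerated afterwards, as in step 4.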
| --- | |
| ## Data Licenses Summary | |
| - **OpenStreetMap**: ODbL (share-alike, attribution required) | |
| - **HDX/Government**: CC BY (attribution required) | |
| - **World Bank**: CC BY 4.0 | |
| - **Natural Earth**: Public Domain | |
| - **STRI**: CC BY 4.0 | |
| - **Kontur**: CC BY International | |
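| **All of the licenses above permit commercial use. Attribution is required for everything except the public-domain Natural Earth data, and ODbL adds share-alike obligations.** | |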
| --- | |
| ## Attribution in App | |
| GeoQuery automatically generates citations for query results: | |
| ```json | |
| { | |
| "data_citations": [ | |
| "Administrative boundary data from HDX/INEC, 2021", | |
| "Healthcare facilities from OpenStreetMap via Healthsites.io" | |
| ] | |
| } | |
| ``` | |
| These citations are included in the chat response so users can see where the data behind each result comes from. | |
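How GeoQuery builds this list internally is not documented here. As an illustration only, a payload of this shape could be assembled from a lookup table; the dataset keys, the `CITATIONS` mapping, and `build_citations` below are all hypothetical, and only the two citation strings come from the example above.

```python
# Hypothetical lookup from dataset key to citation text.
CITATIONS = {
    "pan_admin1": "Administrative boundary data from HDX/INEC, 2021",
    "healthsites": "Healthcare facilities from OpenStreetMap via Healthsites.io",
}


def build_citations(dataset_keys):
    """Return de-duplicated citations, in first-use order, for the datasets a query used."""
    citations = []
    for key in dataset_keys:
        citation = CITATIONS.get(key)
        # Skip unknown datasets and avoid repeating a citation.
        if citation and citation not in citations:
            citations.append(citation)
    return {"data_citations": citations}
```

With this sketch, `build_citations(["pan_admin1", "healthsites", "pan_admin1"])` yields the same two-item `data_citations` list as the JSON example, since the repeated key is collapsed.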
| --- | |
| ## Next Steps | |
| - **Ingestion Scripts**: [../backend/SCRIPTS.md](../backend/SCRIPTS.md) | |