# Dataset Sources

Complete list of datasets available in GeoQuery with source attributions.

---

## Administrative Boundaries

### Panama Admin Levels (HDX)

**Source**: Humanitarian Data Exchange
**Provider**: INEC (National Institute of Statistics and Census)
**Year**: 2021
**URL**: https://data.humdata.org/dataset/panama-administrative-boundaries

**Files**:
- `hdx/pan_admin1_2021.geojson` - 10 provinces + comarcas
- `hdx/pan_admin2_2021.geojson` - 81 districts
- `hdx/pan_admin3_2021.geojson` - 679 corregimientos

**License**: Creative Commons Attribution

---

## Infrastructure

### Roads (OpenStreetMap via Geofabrik)

**Source**: OpenStreetMap
**Provider**: Geofabrik
**URL**: https://download.geofabrik.de/central-america/panama.html

**Files**:
- `osm/roads.geojson` - Highway network (motorways, primary, secondary roads)

**License**: ODbL (Open Database License)

### Healthcare (Healthsites.io)

**Source**: Healthsites.io / OpenStreetMap
**URL**: https://healthsites.io/

**Files**:
- `osm/healthsites.geojson` - 986 healthcare facilities

**License**: ODbL

### Education (OpenStreetMap)

**Source**: OpenStreetMap

**Files**:
- `osm/universities.geojson` - 67 universities
- `osm/schools.geojson` - Schools and educational facilities

**License**: ODbL

### Other POI (OpenStreetMap)

**Files**:
- `osm/traffic.geojson` - Traffic signals and intersections
- `osm/amenities.geojson` - Various amenities
- `osm/buildings.geojson` - Building footprints

---

## Socioeconomic

### World Bank Development Indicators

**Source**: World Bank Open Data
**URL**: https://data.worldbank.org/

**Files**:
- `worldbank/indicators.geojson` - Country-level indicators joined with geometry

**Indicators Available**:
- GDP per capita
- Life expectancy
- Access to electricity
- Internet users (% of population)
- And more...

**License**: Creative Commons Attribution 4.0

### Multidimensional Poverty Index (MPI)

**Source**: UNDP / Government of Panama

**Files**:
- `socioeconomic/mpi_panama.geojson` - Poverty index by district

**License**: Open Data

### Province Socioeconomic Data

**Source**: INEC Census 2023 (processed)

**Files**:
- `socioeconomic/province_socioeconomic.geojson` - Province-level statistics

**Metrics**:
- Population estimates
- Area
- Demographics

---

## Population

### Kontur Population Dataset

**Source**: Kontur
**Provider**: Meta/Facebook population estimates
**URL**: https://data.humdata.org/organization/kontur

**Files**:
- `kontur/kontur_population_PA_20220630.geojson` - 33,000+ H3 hexagons

**Description**: High-resolution population density grid using the H3 spatial index

**License**: Creative Commons Attribution International

---

## Environmental

### STRI GIS Portal

**Source**: Smithsonian Tropical Research Institute
**URL**: https://stridata-si.opendata.arcgis.com/

**Files**:
- `stri/protected_areas_2025.geojson` - Protected areas
- `stri/forest_cover_2021.geojson` - Forest cover classification

**License**: CC BY 4.0

---

## Global Datasets

### Natural Earth

**Source**: Natural Earth Data
**URL**: https://www.naturalearthdata.com/

**Files**:
- `global/countries_110m.geojson` - Country boundaries (low resolution)

**License**: Public Domain

---

## Dataset Statistics

| Category | Datasets | Total Features |
|----------|----------|----------------|
| Administrative | 3 | ~770 |
| Infrastructure | 8 | ~50,000 |
| Socioeconomic | 3 | ~100 |
| Population | 1 | 33,000 |
| Environmental | 2 | ~500 |
| Global | 1 | 177 |

**Total**: 18 datasets, ~85,000 features

---

## Data Update Schedule

| Dataset | Update Frequency | Last Updated |
|---------|-----------------|--------------|
| OSM Data | Monthly | 2026-01 |
| Admin Boundaries | Yearly | 2021 |
| Kontur Population | Quarterly | 2022-06 |
| STRI Environmental | As released | 2025 |
| World Bank | Yearly | 2023 |

---

## Adding New Datasets

See [../backend/SCRIPTS.md](../backend/SCRIPTS.md) for data ingestion procedures.

### Quick Steps

1. Download the GeoJSON file
2. Place it in the appropriate `backend/data/` subdirectory
3. Add an entry to `backend/data/catalog.json`:

   ```json
   "my_dataset": {
     "path": "category/my_dataset.geojson",
     "description": "Short description",
     "semantic_description": "Detailed description for AI",
     "categories": ["category"],
     "tags": ["tag1", "tag2"]
   }
   ```

4. Regenerate embeddings:

   ```bash
   rm backend/data/embeddings.npy
   python -c "from backend.core.semantic_search import get_semantic_search; get_semantic_search()"
   ```

---

## Data Licenses Summary

- **OpenStreetMap**: ODbL (share-alike, attribution required)
- **HDX/Government**: CC BY (attribution required)
- **World Bank**: CC BY 4.0
- **Natural Earth**: Public Domain
- **STRI**: CC BY 4.0
- **Kontur**: CC BY International

**All datasets permit commercial use with proper attribution.**

---

## Attribution in App

GeoQuery automatically generates citations for query results:

```json
{
  "data_citations": [
    "Administrative boundary data from HDX/INEC, 2021",
    "Healthcare facilities from OpenStreetMap via Healthsites.io"
  ]
}
```

These appear in the chat response for user transparency.

---

## Next Steps

- **Ingestion Scripts**: [../backend/SCRIPTS.md](../backend/SCRIPTS.md)
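
As a rough illustration of step 3 under Quick Steps, the sketch below registers a catalog entry programmatically instead of editing `catalog.json` by hand. The helper name `register_dataset` is hypothetical, and the sketch assumes that `catalog.json` is a single JSON object keyed by dataset name and that all five fields from the example entry are required; none of this is confirmed GeoQuery behavior.

```python
import json
from pathlib import Path

# Fields taken from the example entry in Quick Steps. Treating all of them
# as required is an assumption of this sketch.
REQUIRED_KEYS = ("path", "description", "semantic_description", "categories", "tags")


def register_dataset(catalog_path, data_dir, key, entry):
    """Add `entry` under `key` in the catalog file and return the updated catalog.

    Hypothetical helper, not part of GeoQuery.
    """
    missing = [k for k in REQUIRED_KEYS if k not in entry]
    if missing:
        raise ValueError(f"entry is missing keys: {missing}")

    # The entry's path is resolved relative to the data directory.
    geojson_file = Path(data_dir) / entry["path"]
    if not geojson_file.is_file():
        raise FileNotFoundError(f"GeoJSON file not found: {geojson_file}")

    catalog_file = Path(catalog_path)
    # Assumption: the catalog is one JSON object keyed by dataset name.
    if catalog_file.exists():
        catalog = json.loads(catalog_file.read_text(encoding="utf-8"))
    else:
        catalog = {}

    # Refuse to overwrite an existing dataset silently.
    if key in catalog:
        raise KeyError(f"dataset {key!r} is already registered")

    catalog[key] = entry
    catalog_file.write_text(
        json.dumps(catalog, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return catalog
```

A call might look like `register_dataset("backend/data/catalog.json", "backend/data", "my_dataset", entry)`, where `entry` is a dict with the fields shown in Quick Steps. The embeddings would still need to be regenerated afterwards, as described in step 4.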