| # Data Flow: End-to-End Request Processing | |
| This document provides a detailed walkthrough of how a user query flows through the GeoQuery system from input to visualization. | |
| --- | |
| ## Overview | |
| ``` | |
| User Query β Intent Detection β Semantic Search β SQL Generation β | |
| Query Execution β Result Formatting β Explanation β Map Rendering | |
| ``` | |
| **Timeline**: 2-8 seconds for typical queries | |
| --- | |
| ## Step-by-Step Walkthrough | |
| ### Example Query | |
| **User Input**: *"Show me hospitals in Panama City"* | |
| --- | |
| ### Step 1: Frontend - User Submits Query | |
| **Component**: `ChatPanel.tsx` | |
| ```typescript | |
| const handleSubmit = async (message: string) => { | |
| // Add user message to chat | |
| setMessages(prev => [...prev, { role: 'user', content: message }]); | |
| // Send to backend via SSE | |
| const response = await fetch('http://localhost:8000/api/chat', { | |
| method: 'POST', | |
| headers: {'Content-Type': 'application/json'}, | |
| body: JSON.stringify({ message, history }) | |
| }); | |
| // Start streaming response | |
| const reader = response.body.getReader(); | |
| ... | |
| }; | |
| ``` | |
| **Request Payload**: | |
| ```json | |
| { | |
| "message": "Show me hospitals in Panama City", | |
| "history": [] | |
| } | |
| ``` | |
| --- | |
| ### Step 2: Backend - API Endpoint Receives Request | |
| **File**: `backend/api/endpoints/chat.py` | |
| ```python | |
| @router.post("/chat") | |
| async def chat(request: ChatRequest): | |
| # Initialize executor | |
| executor = QueryExecutor() | |
| # Process query with streaming | |
| async for event in executor.process_query_stream( | |
| request.message, | |
| request.history | |
| ): | |
| yield sse_format(event) | |
| ``` | |
| **Action**: Routes to `QueryExecutor.process_query_stream()` | |
| --- | |
| ### Step 3: Intent Detection | |
| **Service**: `LLMGateway.detect_intent()` | |
| **File**: `backend/core/llm_gateway.py` | |
| **LLM Prompt**: | |
| ``` | |
| Analyze this user query and determine the best output type. | |
| User Query: "Show me hospitals in Panama City" | |
| THINK STEP BY STEP: | |
| 1. What is the user asking for? | |
| 2. Does this require geographic visualization (map)? | |
| ... | |
| Respond with ONLY ONE of these exact words: | |
| - GENERAL_CHAT | |
| - DATA_QUERY | |
| - MAP_REQUEST | |
| - SPATIAL_OP | |
| - STAT_QUERY | |
| ``` | |
| **Gemini Response**: | |
| ``` | |
| Thinking: The user wants to SEE hospitals on a map, explicitly asks to "show" | |
| Response: MAP_REQUEST | |
| ``` | |
| **Streaming to Frontend**: | |
| ```json | |
| { | |
| "event": "intent", | |
| "data": {"intent": "MAP_REQUEST"} | |
| } | |
| ``` | |
| **Frontend**: Displays intent badge in chat | |
| --- | |
| ### Step 4: Semantic Discovery | |
| **Service**: `SemanticSearch.search_table_names()` | |
| **File**: `backend/core/semantic_search.py` | |
| **Process**: | |
| 1. Convert query to embedding vector (384 dimensions) | |
| ```python | |
| query_embedding = model.encode("Show me hospitals in Panama City") | |
| ``` | |
| 2. Calculate cosine similarity with all dataset embeddings | |
| ```python | |
| similarities = cosine_similarity(query_embedding, catalog_embeddings) | |
| ``` | |
| 3. Return top-k matches | |
| ```python | |
| top_k_indices = np.argsort(similarities)[-15:][::-1] | |
| ``` | |
| **Result**: | |
| ```python | |
| [ | |
| "panama_healthsites_geojson", # similarity: 0.89 | |
| "osm_amenities", # similarity: 0.76 | |
| "panama_hospitals", # similarity: 0.74 | |
| "osm_healthcare", # similarity: 0.71 | |
| ... | |
| ] | |
| ``` | |
| **Streaming to Frontend**: | |
| ```json | |
| { | |
| "event": "status", | |
| "data": {"status": "π Searching data catalog..."} | |
| } | |
| ``` | |
| **Performance**: <10ms for 100+ datasets | |
| --- | |
| ### Step 5: Table Schema Retrieval | |
| **Service**: `GeoEngine.ensure_table_loaded()` | |
| **File**: `backend/core/geo_engine.py` | |
| **Process**: | |
| 1. Check if table already loaded in DuckDB | |
| ```python | |
| if "panama_healthsites_geojson" not in self.loaded_tables: | |
| ``` | |
| 2. Load GeoJSON file | |
| ```python | |
| gdf = gpd.read_file("backend/data/osm/healthsites.geojson") | |
| ``` | |
| 3. Register in DuckDB | |
| ```python | |
| conn.register("panama_healthsites_geojson", gdf) | |
| self.loaded_tables["panama_healthsites_geojson"] = gdf | |
| ``` | |
| **Schema Retrieved**: | |
| ``` | |
| Table: panama_healthsites_geojson | |
| Columns: osm_id, name, amenity, operator, addr:city, geom | |
| Row count: 986 | |
| Geometry type: Point | |
| ``` | |
| **Streaming to Frontend**: | |
| ```json | |
| { | |
| "event": "status", | |
| "data": {"status": "πΎ Loading tables: panama_healthsites_geojson..."} | |
| } | |
| ``` | |
| --- | |
| ### Step 6: SQL Generation | |
| **Service**: `LLMGateway.generate_analytical_sql()` | |
| **File**: `backend/core/llm_gateway.py` | |
| **Prompt to Gemini**: | |
| ``` | |
| You are a DuckDB SQL expert. Generate a valid DuckDB SQL query. | |
| Table: panama_healthsites_geojson | |
| Columns: osm_id, name, amenity, operator, addr:city, geom | |
| Row count: 986 | |
| User Request: "Show me hospitals in Panama City" | |
| Rules: | |
| 1. Return ONLY the SQL query | |
| 2. ALWAYS include 'geom' for map visualization | |
| 3. Use ILIKE for case-insensitive matching | |
| 4. Use ST_Intersects for spatial filtering | |
| ... | |
| Generate SQL: | |
| ``` | |
| **Gemini Thinking** (streamed): | |
| ``` | |
| "I need to: | |
| 1. Filter for amenity='hospital' | |
| 2. Filter for Panama City (district name is 'PanamΓ‘') | |
| 3. Use spatial intersection with admin boundaries | |
| 4. Include geometry for map rendering" | |
| ``` | |
| **Gemini Response**: | |
| ```sql | |
| SELECT | |
| name, | |
| amenity, | |
| operator, | |
| geom | |
| FROM panama_healthsites_geojson | |
| WHERE amenity = 'hospital' | |
| AND ST_Intersects( | |
| geom, | |
| (SELECT geom FROM pan_admin2 WHERE adm2_name = 'PanamΓ‘') | |
| ) | |
| ``` | |
| **Streaming to Frontend**: | |
| ```json | |
| { | |
| "event": "chunk", | |
| "data": {"type": "thought", "content": "I need to filter for hospitals..."} | |
| } | |
| ``` | |
| **Performance**: ~1 second for SQL generation | |
| --- | |
| ### Step 7: Query Execution | |
| **Service**: `GeoEngine.execute_spatial_query()` | |
| **File**: `backend/core/geo_engine.py` | |
| **Execution**: | |
| ```python | |
| try: | |
| result = conn.execute(sql).fetchdf() | |
| geojson = json.loads(result.to_json()) | |
| except Exception as e: | |
| # Try SQL correction | |
| corrected_sql = await llm.correct_sql(query, sql, str(e), schema) | |
| result = conn.execute(corrected_sql).fetchdf() | |
| ``` | |
| **Result**: | |
| ```python | |
| # GeoDataFrame with 45 rows | |
| name amenity operator geom | |
| 0 Hospital Santo TomΓ‘s hospital MINSA POINT(...) | |
| 1 Hospital del NiΓ±o hospital CSS POINT(...) | |
| ... | |
| ``` | |
| **Convert to GeoJSON**: | |
| ```json | |
| { | |
| "type": "FeatureCollection", | |
| "features": [ | |
| { | |
| "type": "Feature", | |
| "geometry": {"type": "Point", "coordinates": [-79.5, 8.98]}, | |
| "properties": {"name": "Hospital Santo TomΓ‘s", "amenity": "hospital"} | |
| }, | |
| ... | |
| ] | |
| } | |
| ``` | |
| **Streaming to Frontend**: | |
| ```json | |
| { | |
| "event": "status", | |
| "data": {"status": "β Found 45 results"} | |
| } | |
| ``` | |
| **Performance**: 100ms - 2s depending on data size | |
| --- | |
| ### Step 8: Result Formatting | |
| **Service**: `ResponseFormatter.format_geojson_layer()` | |
| **File**: `backend/services/response_formatter.py` | |
| **Layer Name Generation**: | |
| ```python | |
| layer_info = await llm.generate_layer_name(query, sql) | |
| # Returns: {"name": "Hospitals in Panama City", "emoji": "π₯", "pointStyle": "icon"} | |
| ``` | |
| **GeoJSON Enhancement**: | |
| ```python | |
| geojson["properties"] = { | |
| "layer_id": "abc123", | |
| "layer_name": "Hospitals in Panama City", | |
| "style": { | |
| "color": "#E63946", | |
| "fillColor": "#E63946", | |
| "opacity": 0.8, | |
| "fillOpacity": 0.4 | |
| }, | |
| "pointMarker": { | |
| "icon": "π₯", | |
| "style": "icon", | |
| "color": "#E63946", | |
| "size": 32 | |
| }, | |
| "choropleth": {"enabled": false} | |
| } | |
| ``` | |
| **Auto-Detection**: | |
| - Detects geometry type (Point) | |
| - Checks for numeric columns (none meaningful) | |
| - Configures point marker style based on `pointStyle: "icon"` | |
| --- | |
| ### Step 9: Explanation Generation | |
| **Service**: `LLMGateway.stream_explanation()` | |
| **File**: `backend/core/llm_gateway.py` | |
| **Prompt to Gemini**: | |
| ``` | |
| Explain the results of this data query to the user. | |
| User Question: "Show me hospitals in Panama City" | |
| SQL Query: SELECT name, amenity, geom FROM ... WHERE amenity='hospital'... | |
| Data Result Summary: Found 45 features (Points) | |
| Instructions: | |
| 1. Keep response concise | |
| 2. Only describe ACTUAL data returned | |
| 3. Cite data source | |
| 4. Speak as GeoQuery | |
| ``` | |
| **Gemini Response** (streamed): | |
| ``` | |
| "I have located 45 hospitals within Panama City district. | |
| The map displays each hospital as a π₯ icon. You can click on any | |
| marker to see details including the hospital name and operator. | |
| **Key facilities include**: | |
| - Hospital Santo TomΓ‘s (MINSA) | |
| - Hospital del NiΓ±o (CSS) | |
| - Hospital Punta Pacifica (Private) | |
| Source: Healthcare facility data from OpenStreetMap via Healthsites.io" | |
| ``` | |
| **Streaming to Frontend**: | |
| ```json | |
| { | |
| "event": "chunk", | |
| "data": {"type": "text", "content": "I have located 45 hospitals..."} | |
| } | |
| ``` | |
| **Performance**: ~1-2 seconds for explanation | |
| --- | |
| ### Step 10: Final Result Event | |
| **Event Sent**: | |
| ```json | |
| { | |
| "event": "result", | |
| "data": { | |
| "response": "I have located 45 hospitals within Panama City...", | |
| "sql_query": "SELECT name, amenity, geom FROM ...", | |
| "geojson": { /* GeoJSON with 45 features */ }, | |
| "chart_data": null, | |
| "raw_data": [ /* 45 rows of data */ ], | |
| "data_citations": [ | |
| "Healthcare facility data from OpenStreetMap via Healthsites.io" | |
| ] | |
| } | |
| } | |
| ``` | |
| --- | |
| ### Step 11: Frontend - Map Rendering | |
| **Component**: `MapViewer.tsx` | |
| **Process**: | |
| 1. Receive GeoJSON from result event | |
| 2. Create new MapLayer | |
| ```typescript | |
| const newLayer: MapLayer = { | |
| id: geojson.properties.layer_id, | |
| name: geojson.properties.layer_name, | |
| data: geojson, | |
| visible: true, | |
| style: geojson.properties.style, | |
| pointMarker: geojson.properties.pointMarker | |
| }; | |
| ``` | |
| 3. Render with Leaflet | |
| ```typescript | |
| <GeoJSON | |
| data={layer.data} | |
| pointToLayer={(feature, latlng) => { | |
| if (layer.pointMarker?.style === "icon") { | |
| return L.marker(latlng, { | |
| icon: L.divIcon({ | |
| html: `<div style="font-size: 32px">${layer.pointMarker.icon}</div>` | |
| }) | |
| }); | |
| } | |
| }} | |
| /> | |
| ``` | |
| 4. Auto-fit bounds to show all hospitals | |
| 5. Display layer in legend panel | |
| **Result**: Interactive map with 45 hospital markers (π₯ icons) | |
| --- | |
| ## Performance Breakdown | |
| | Step | Service | Time | Async | | |
| |------|---------|------|-------| | |
| | 1. Frontend Submit | - | <10ms | - | | |
| | 2. API Routing | FastAPI | <5ms | - | | |
| | 3. Intent Detection | Gemini | ~500ms | β | | |
| | 4. Semantic Search | SentenceTransformer | <10ms | β | | |
| | 5. Schema Loading | DuckDB | 50-200ms | - | | |
| | 6. SQL Generation | Gemini | ~1s | β Streamed | | |
| | 7. Query Execution | DuckDB | 100ms-2s | - | | |
| | 8. Formatting | Python | 10-50ms | - | | |
| | 9. Explanation | Gemini | ~1s | β Streamed | | |
| | 10. Frontend Render | Leaflet | 50-200ms | - | | |
| **Total**: 2-5 seconds (perception: faster due to streaming) | |
| --- | |
| ## Error Handling Flow | |
| ### SQL Execution Failure | |
| ``` | |
| SQL Error β Extract Error Message β Send to LLM β Generate Corrected SQL β | |
| Retry Execution β If Still Fails β Return Error to User | |
| ``` | |
| **Example**: | |
| ```python | |
| try: | |
| result = execute_query(sql) | |
| except Exception as e: | |
| # Error: column "hospitals" does not exist | |
| corrected_sql = await llm.correct_sql(query, sql, str(e), schema) | |
| # LLM fixes: hospitals β panama_healthsites_geojson | |
| result = execute_query(corrected_sql) | |
| ``` | |
| ### Data Unavailable | |
| ``` | |
| LLM Realizes Data Missing β Returns Special Marker β | |
| System Detects Marker β Returns Helpful Error Message | |
| ``` | |
| **Example**: | |
| ```sql | |
| -- ERROR: DATA_UNAVAILABLE | |
| -- Requested: crime statistics | |
| -- Available: admin boundaries, hospitals, schools | |
| ``` | |
| --- | |
| ## Streaming Architecture | |
| **Benefits of SSE (Server-Sent Events)**: | |
| 1. **Progressive Disclosure**: User sees thinking process | |
| 2. **Faster Perceived Performance**: Content streams in | |
| 3. **Transparency**: Shows "why" behind answers | |
| 4. **Simple Protocol**: HTTP-based, works everywhere | |
| **Event Types**: | |
| - `status`: Processing updates ("π Searching...", "β‘ Executing...") | |
| - `intent`: Detected intent category | |
| - `chunk`: Streamed content (thought or text) | |
| - `result`: Final payload with all data | |
| --- | |
| ## Complex Query Flow | |
| For queries requiring multiple steps (e.g., "Compare hospital density with school density by province"): | |
| 1. **Complexity Detection**: QueryPlanner identifies multi-dataset query | |
| 2. **Step Decomposition**: Break into atomic steps | |
| - Step 1: Count hospitals per province | |
| - Step 2: Count schools per province | |
| - Step 3: Calculate ratios | |
| 3. **Parallel Execution**: Execute independent steps concurrently | |
| 4. **Result Combination**: Merge results for final answer | |
| 5. **Unified Explanation**: LLM explains combined analysis | |
| See `backend/core/query_planner.py` for implementation. | |
| --- | |
| ## Next Steps | |
| - **Backend Services**: [backend/CORE_SERVICES.md](backend/CORE_SERVICES.md) | |
| - **API Reference**: [backend/API_ENDPOINTS.md](backend/API_ENDPOINTS.md) | |
| - **Frontend Components**: [frontend/COMPONENTS.md](frontend/COMPONENTS.md) | |