Data Flow: End-to-End Request Processing
This document provides a detailed walkthrough of how a user query flows through the GeoQuery system from input to visualization.
Overview
User Query → Intent Detection → Semantic Search → SQL Generation →
Query Execution → Result Formatting → Explanation → Map Rendering
Timeline: 2-8 seconds for typical queries
Step-by-Step Walkthrough
Example Query
User Input: "Show me hospitals in Panama City"
Step 1: Frontend - User Submits Query
Component: ChatPanel.tsx
const handleSubmit = async (message: string) => {
  // Add user message to chat
  setMessages(prev => [...prev, { role: 'user', content: message }]);

  // Send to backend via SSE
  const response = await fetch('http://localhost:8000/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ message, history })
  });

  // Start streaming response
  const reader = response.body.getReader();
  ...
};
Request Payload:
{
  "message": "Show me hospitals in Panama City",
  "history": []
}
Step 2: Backend - API Endpoint Receives Request
File: backend/api/endpoints/chat.py
@router.post("/chat")
async def chat(request: ChatRequest):
# Initialize executor
executor = QueryExecutor()
# Process query with streaming
async for event in executor.process_query_stream(
request.message,
request.history
):
yield sse_format(event)
Action: Routes to QueryExecutor.process_query_stream()
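For reference, a minimal sketch of the request model this endpoint could validate against, assuming the field names from the payload above; the element type of `history` is inferred from the frontend message shape and the actual ChatRequest definition may differ:

```python
from typing import List
from pydantic import BaseModel


class ChatMessage(BaseModel):
    role: str       # "user" or "assistant"
    content: str


class ChatRequest(BaseModel):
    message: str                      # the new user query
    history: List[ChatMessage] = []   # prior conversation turns
```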
Step 3: Intent Detection
Service: LLMGateway.detect_intent()
File: backend/core/llm_gateway.py
LLM Prompt:
Analyze this user query and determine the best output type.
User Query: "Show me hospitals in Panama City"
THINK STEP BY STEP:
1. What is the user asking for?
2. Does this require geographic visualization (map)?
...
Respond with ONLY ONE of these exact words:
- GENERAL_CHAT
- DATA_QUERY
- MAP_REQUEST
- SPATIAL_OP
- STAT_QUERY
Gemini Response:
Thinking: The user wants to SEE hospitals on a map, explicitly asks to "show"
Response: MAP_REQUEST
Streaming to Frontend:
{
  "event": "intent",
  "data": {"intent": "MAP_REQUEST"}
}
Frontend: Displays intent badge in chat
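As a rough illustration of this step, the sketch below maps the model reply onto one of the five allowed labels. The `generate` callable stands in for the actual Gemini call inside LLMGateway, and the prompt is abbreviated; both are assumptions, not the real implementation.

```python
# Hypothetical intent classification; `generate` is an assumed async LLM helper.
VALID_INTENTS = {"GENERAL_CHAT", "DATA_QUERY", "MAP_REQUEST", "SPATIAL_OP", "STAT_QUERY"}

INTENT_PROMPT = """Analyze this user query and determine the best output type.
User Query: "{query}"
Respond with ONLY ONE of these exact words:
GENERAL_CHAT, DATA_QUERY, MAP_REQUEST, SPATIAL_OP, STAT_QUERY"""


async def detect_intent(generate, query: str) -> str:
    reply = await generate(INTENT_PROMPT.format(query=query))
    # Keep only the final token so stray "thinking" text does not break parsing
    intent = reply.strip().split()[-1].strip(".").upper()
    return intent if intent in VALID_INTENTS else "GENERAL_CHAT"
```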
Step 4: Semantic Discovery
Service: SemanticSearch.search_table_names()
File: backend/core/semantic_search.py
Process:
1. Convert the query to an embedding vector (384 dimensions)
   query_embedding = model.encode("Show me hospitals in Panama City")
2. Calculate cosine similarity with all dataset embeddings
   similarities = cosine_similarity(query_embedding, catalog_embeddings)
3. Return the top-k matches
   top_k_indices = np.argsort(similarities)[-15:][::-1]
Result:
[
  "panama_healthsites_geojson",   # similarity: 0.89
  "osm_amenities",                # similarity: 0.76
  "panama_hospitals",             # similarity: 0.74
  "osm_healthcare",               # similarity: 0.71
  ...
]
Streaming to Frontend:
{
  "event": "status",
  "data": {"status": "📚 Searching data catalog..."}
}
Performance: <10ms for 100+ datasets
Step 5: Table Schema Retrieval
Service: GeoEngine.ensure_table_loaded()
File: backend/core/geo_engine.py
Process:
1. Check whether the table is already loaded in DuckDB
   if "panama_healthsites_geojson" not in self.loaded_tables:
2. Load the GeoJSON file
   gdf = gpd.read_file("backend/data/osm/healthsites.geojson")
3. Register it in DuckDB
   conn.register("panama_healthsites_geojson", gdf)
   self.loaded_tables["panama_healthsites_geojson"] = gdf
Schema Retrieved:
Table: panama_healthsites_geojson
Columns: osm_id, name, amenity, operator, addr:city, geom
Row count: 986
Geometry type: Point
Streaming to Frontend:
{
"event": "status",
"data": {"status": "💾 Loading tables: panama_healthsites_geojson..."}
}
Step 6: SQL Generation
Service: LLMGateway.generate_analytical_sql()
File: backend/core/llm_gateway.py
Prompt to Gemini:
You are a DuckDB SQL expert. Generate a valid DuckDB SQL query.
Table: panama_healthsites_geojson
Columns: osm_id, name, amenity, operator, addr:city, geom
Row count: 986
User Request: "Show me hospitals in Panama City"
Rules:
1. Return ONLY the SQL query
2. ALWAYS include 'geom' for map visualization
3. Use ILIKE for case-insensitive matching
4. Use ST_Intersects for spatial filtering
...
Generate SQL:
Gemini Thinking (streamed):
"I need to:
1. Filter for amenity='hospital'
2. Filter for Panama City (district name is 'Panamá')
3. Use spatial intersection with admin boundaries
4. Include geometry for map rendering"
Gemini Response:
SELECT
    name,
    amenity,
    operator,
    geom
FROM panama_healthsites_geojson
WHERE amenity = 'hospital'
  AND ST_Intersects(
        geom,
        (SELECT geom FROM pan_admin2 WHERE adm2_name = 'Panamá')
      )
Streaming to Frontend:
{
  "event": "chunk",
  "data": {"type": "thought", "content": "I need to filter for hospitals..."}
}
Performance: ~1 second for SQL generation
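For illustration, a minimal sketch of how this step could interpolate the schema into the prompt and clean the model reply before execution. The `generate` helper and the abbreviated prompt are assumptions; the actual LLMGateway prompt is the longer one shown above.

```python
# Hypothetical SQL generation flow: build the prompt, call the LLM, and strip
# any markdown fences the model wraps around the SQL.
from typing import List

SQL_PROMPT = """You are a DuckDB SQL expert. Generate a valid DuckDB SQL query.
Table: {table}
Columns: {columns}
User Request: "{query}"
Return ONLY the SQL query."""


async def generate_analytical_sql(generate, query: str, table: str, columns: List[str]) -> str:
    prompt = SQL_PROMPT.format(table=table, columns=", ".join(columns), query=query)
    sql = (await generate(prompt)).strip()
    # Models often wrap SQL in ```sql ... ``` fences; remove them before execution
    if sql.startswith("```"):
        sql = sql.strip("`")
        if sql.lower().startswith("sql"):
            sql = sql[3:]
    return sql.strip()
```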
Step 7: Query Execution
Service: GeoEngine.execute_spatial_query()
File: backend/core/geo_engine.py
Execution:
try:
    result = conn.execute(sql).fetchdf()
except Exception as e:
    # Try SQL correction with the LLM, then retry once
    corrected_sql = await llm.correct_sql(query, sql, str(e), schema)
    result = conn.execute(corrected_sql).fetchdf()

geojson = json.loads(result.to_json())
Result:
# GeoDataFrame with 45 rows
name amenity operator geom
0 Hospital Santo Tomás hospital MINSA POINT(...)
1 Hospital del Niño hospital CSS POINT(...)
...
Convert to GeoJSON:
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {"type": "Point", "coordinates": [-79.5, 8.98]},
      "properties": {"name": "Hospital Santo Tomás", "amenity": "hospital"}
    },
    ...
  ]
}
Streaming to Frontend:
{
  "event": "status",
  "data": {"status": "✅ Found 45 results"}
}
Performance: 100ms - 2s depending on data size
Step 8: Result Formatting
Service: ResponseFormatter.format_geojson_layer()
File: backend/services/response_formatter.py
Layer Name Generation:
layer_info = await llm.generate_layer_name(query, sql)
# Returns: {"name": "Hospitals in Panama City", "emoji": "🏥", "pointStyle": "icon"}
GeoJSON Enhancement:
geojson["properties"] = {
"layer_id": "abc123",
"layer_name": "Hospitals in Panama City",
"style": {
"color": "#E63946",
"fillColor": "#E63946",
"opacity": 0.8,
"fillOpacity": 0.4
},
"pointMarker": {
"icon": "🏥",
"style": "icon",
"color": "#E63946",
"size": 32
},
"choropleth": {"enabled": false}
}
Auto-Detection:
- Detects geometry type (Point)
- Checks for numeric columns (none meaningful)
- Configures the point marker style based on pointStyle: "icon" (a minimal sketch of this auto-detection follows below)
Step 9: Explanation Generation
Service: LLMGateway.stream_explanation()
File: backend/core/llm_gateway.py
Prompt to Gemini:
Explain the results of this data query to the user.
User Question: "Show me hospitals in Panama City"
SQL Query: SELECT name, amenity, geom FROM ... WHERE amenity='hospital'...
Data Result Summary: Found 45 features (Points)
Instructions:
1. Keep response concise
2. Only describe ACTUAL data returned
3. Cite data source
4. Speak as GeoQuery
Gemini Response (streamed):
"I have located 45 hospitals within Panama City district.
The map displays each hospital as a 🏥 icon. You can click on any
marker to see details including the hospital name and operator.
**Key facilities include**:
- Hospital Santo Tomás (MINSA)
- Hospital del Niño (CSS)
- Hospital Punta Pacifica (Private)
Source: Healthcare facility data from OpenStreetMap via Healthsites.io"
Streaming to Frontend:
{
  "event": "chunk",
  "data": {"type": "text", "content": "I have located 45 hospitals..."}
}
Performance: ~1-2 seconds for explanation
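Conceptually, this step relays the model's streamed output as "chunk" events as it arrives, as in the sketch below. `stream_generate` is an assumed async generator wrapping the streaming LLM call, not the actual LLMGateway method signature.

```python
# Illustrative relay of streamed explanation text to the SSE channel.
async def stream_explanation(stream_generate, prompt: str):
    async for piece in stream_generate(prompt):
        yield {"event": "chunk", "data": {"type": "text", "content": piece}}
```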
Step 10: Final Result Event
Event Sent:
{
  "event": "result",
  "data": {
    "response": "I have located 45 hospitals within Panama City...",
    "sql_query": "SELECT name, amenity, geom FROM ...",
    "geojson": { /* GeoJSON with 45 features */ },
    "chart_data": null,
    "raw_data": [ /* 45 rows of data */ ],
    "data_citations": [
      "Healthcare facility data from OpenStreetMap via Healthsites.io"
    ]
  }
}
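On the backend side, assembling this event is essentially a matter of packaging the outputs of the previous steps. A hypothetical helper, with keys mirroring the payload above:

```python
def build_result_event(response_text, sql, geojson, raw_rows, citations):
    # Hypothetical assembly of the final "result" event
    return {
        "event": "result",
        "data": {
            "response": response_text,
            "sql_query": sql,
            "geojson": geojson,
            "chart_data": None,      # populated only for chart-type answers
            "raw_data": raw_rows,
            "data_citations": citations,
        },
    }
```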
Step 11: Frontend - Map Rendering
Component: MapViewer.tsx
Process:
1. Receive GeoJSON from the result event
2. Create a new MapLayer
   const newLayer: MapLayer = {
     id: geojson.properties.layer_id,
     name: geojson.properties.layer_name,
     data: geojson,
     visible: true,
     style: geojson.properties.style,
     pointMarker: geojson.properties.pointMarker
   };
3. Render with Leaflet
   <GeoJSON
     data={layer.data}
     pointToLayer={(feature, latlng) => {
       if (layer.pointMarker?.style === "icon") {
         return L.marker(latlng, {
           icon: L.divIcon({
             html: `<div style="font-size: 32px">${layer.pointMarker.icon}</div>`
           })
         });
       }
     }}
   />
4. Auto-fit bounds to show all hospitals
5. Display the layer in the legend panel
Result: Interactive map with 45 hospital markers (🏥 icons)
Performance Breakdown
| Step | Service | Time | Async |
|---|---|---|---|
| 1. Frontend Submit | - | <10ms | - |
| 2. API Routing | FastAPI | <5ms | - |
| 3. Intent Detection | Gemini | ~500ms | ✓ |
| 4. Semantic Search | SentenceTransformer | <10ms | ✓ |
| 5. Schema Loading | DuckDB | 50-200ms | - |
| 6. SQL Generation | Gemini | ~1s | ✓ Streamed |
| 7. Query Execution | DuckDB | 100ms-2s | - |
| 8. Formatting | Python | 10-50ms | - |
| 9. Explanation | Gemini | ~1s | ✓ Streamed |
| 10. Frontend Render | Leaflet | 50-200ms | - |
Total: 2-5 seconds (perceived as faster thanks to streaming)
Error Handling Flow
SQL Execution Failure
SQL Error → Extract Error Message → Send to LLM → Generate Corrected SQL →
Retry Execution → If Still Fails → Return Error to User
Example:
try:
    result = execute_query(sql)
except Exception as e:
    # Error: column "hospitals" does not exist
    corrected_sql = await llm.correct_sql(query, sql, str(e), schema)
    # LLM fixes: hospitals → panama_healthsites_geojson
    result = execute_query(corrected_sql)
Data Unavailable
LLM Realizes Data Missing → Returns Special Marker →
System Detects Marker → Returns Helpful Error Message
Example:
-- ERROR: DATA_UNAVAILABLE
-- Requested: crime statistics
-- Available: admin boundaries, hospitals, schools
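A minimal sketch of how the system might detect this marker before executing anything, assuming the LLM returns it as a SQL comment block like the example above; the function name and message wording are illustrative:

```python
from typing import Optional


def check_data_unavailable(sql: str) -> Optional[str]:
    if "DATA_UNAVAILABLE" in sql.upper():
        # Surface the Requested/Available hints from the comment block, if present
        hints = [line.lstrip("- ").strip() for line in sql.splitlines() if line.startswith("--")]
        return "I don't have a dataset for that request. " + " ".join(hints[1:])
    return None
```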
Streaming Architecture
Benefits of SSE (Server-Sent Events):
- Progressive Disclosure: User sees thinking process
- Faster Perceived Performance: Content streams in
- Transparency: Shows "why" behind answers
- Simple Protocol: HTTP-based, works everywhere
Event Types:
- status: Processing updates ("🔍 Searching...", "⚡ Executing...")
- intent: Detected intent category
- chunk: Streamed content (thought or text)
- result: Final payload with all data
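For context, SSE frames are plain "event:" and "data:" lines terminated by a blank line. A minimal sketch of a formatter for the event dictionaries shown throughout this document (the real sse_format helper may differ):

```python
import json


def sse_format(event: dict) -> str:
    # Serialize one {"event": ..., "data": ...} dictionary as an SSE frame
    return f"event: {event['event']}\ndata: {json.dumps(event['data'], ensure_ascii=False)}\n\n"
```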
Complex Query Flow
For queries requiring multiple steps (e.g., "Compare hospital density with school density by province"):
- Complexity Detection: QueryPlanner identifies multi-dataset query
- Step Decomposition: Break into atomic steps
- Step 1: Count hospitals per province
- Step 2: Count schools per province
- Step 3: Calculate ratios
- Parallel Execution: Execute independent steps concurrently
- Result Combination: Merge results for final answer
- Unified Explanation: LLM explains combined analysis
See backend/core/query_planner.py for implementation.
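The parallel-execution idea can be sketched with asyncio.gather as below. This is an illustration under stated assumptions: `run_sql` is an assumed async helper that executes one generated SQL step and returns a DataFrame, and the column names (adm1_name, hospitals, schools) are hypothetical, not the actual QueryPlanner interface.

```python
import asyncio


async def execute_plan(run_sql, hospital_sql: str, school_sql: str):
    # Steps 1 and 2 are independent, so they run concurrently
    hospitals, schools = await asyncio.gather(run_sql(hospital_sql), run_sql(school_sql))
    # Step 3: combine the partial results and compute the ratio per province
    merged = hospitals.merge(schools, on="adm1_name")
    merged["hospital_school_ratio"] = merged["hospitals"] / merged["schools"]
    return merged
```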
Next Steps
- Backend Services: backend/CORE_SERVICES.md
- API Reference: backend/API_ENDPOINTS.md
- Frontend Components: frontend/COMPONENTS.md