# Dual-Path RAG Architecture

## Overview

Dual-Path RAG is an architecture tuned for legal chatbots. It splits query handling into two paths:

- **Fast Path**: Golden dataset (200 most common queries) → <200ms, target accuracy 100% (responses are pre-verified)
- **Slow Path**: Full RAG pipeline → 4-8s, target accuracy 99.99%
## Architecture

```
User Query
    ↓
Intent Classification
    ↓
Dual-Path Router
    ├─ Keyword Router (exact/fuzzy match)
    ├─ Semantic Similarity Search (threshold 0.85)
    └─ LLM Router (optional, for edge cases)
    ↓
┌─────────────────┬─────────────────┐
│   Fast Path     │   Slow Path     │
│   (<200ms)      │   (4-8s)        │
│                 │                 │
│ Golden Dataset  │ Full RAG:       │
│ - Exact match   │ - Hybrid Search │
│ - Fuzzy match   │ - Top 20 docs   │
│ - Similarity    │ - LLM Generation│
│                 │ - Guardrails    │
│ 100% accuracy   │ 99.99% accuracy │
└─────────────────┴─────────────────┘
    ↓
Response + Routing Log
```
## Components

### 1. Database Models

**GoldenQuery**: Stores verified queries and responses

- `query`, `query_normalized`, `query_embedding`
- `intent`, `response_message`, `response_data`
- `verified_by`, `usage_count`, `accuracy_score`

**QueryRoutingLog**: Logs routing decisions for monitoring

- `route` (fast_path/slow_path)
- `router_method` (keyword/similarity/llm/default)
- `response_time_ms`, `similarity_score`

### 2. Router Components

**KeywordRouter**: Fast keyword-based matching

- Exact match (normalized query)
- Fuzzy match (70% word overlap)
- ~1-5ms latency
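
The two matching rules above can be illustrated with a minimal sketch. It is not the project's `KeywordRouter`: the function names, the normalization steps, and the definition of "word overlap" (shared words divided by the larger word count) are assumptions made for illustration only.

```python
import re
import unicodedata
from typing import Dict, Optional, Tuple


def normalize(query: str) -> str:
    """Apply Unicode NFC, lowercase, drop punctuation, collapse whitespace (assumed rules)."""
    text = unicodedata.normalize("NFC", query).lower()
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())


def keyword_match(
    query: str,
    golden: Dict[str, int],  # hypothetical index: normalized query -> golden query ID
    min_overlap: float = 0.7,
) -> Optional[Tuple[int, str, float]]:
    """Return (golden_query_id, method, score), or None when nothing matches."""
    normalized = normalize(query)

    # Exact match on the normalized text.
    if normalized in golden:
        return golden[normalized], "exact", 1.0

    # Fuzzy match: keep the golden query with the highest word overlap.
    query_words = set(normalized.split())
    best: Optional[Tuple[int, str, float]] = None
    for text, query_id in golden.items():
        golden_words = set(text.split())
        if not query_words or not golden_words:
            continue
        shared = len(query_words & golden_words)
        score = shared / max(len(query_words), len(golden_words))
        if score >= min_overlap and (best is None or score > best[2]):
            best = (query_id, "fuzzy", score)
    return best
```

Dividing by the larger word count keeps a very short query from matching a long golden query on one or two shared words. A linear scan is adequate for a dataset of a few hundred entries.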
**DualPathRouter**: Main router with hybrid logic

- Step 1: Keyword routing (fastest)
- Step 2: Semantic similarity (threshold 0.85)
- Step 3: LLM router fallback (optional)
- Default: Slow Path
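
The cascade above can be sketched as a single function. Everything here is hypothetical scaffolding rather than the actual `DualPathRouter`: `RouteDecision`, `route`, and the callables `keyword_lookup`, `embed`, and `llm_router` are assumed names, and the embeddings are assumed to be plain float vectors compared with cosine similarity.

```python
import math
from dataclasses import dataclass
from typing import Callable, Mapping, Optional, Sequence


@dataclass
class RouteDecision:
    path: str        # "fast_path" or "slow_path"
    method: str      # "keyword", "similarity", "llm", or "default"
    confidence: float
    golden_query_id: Optional[int] = None


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def route(
    query: str,
    keyword_lookup: Callable[[str], Optional[int]],
    embed: Callable[[str], Sequence[float]],
    golden_embeddings: Mapping[int, Sequence[float]],
    threshold: float = 0.85,
    llm_router: Optional[Callable[[str], Optional[RouteDecision]]] = None,
) -> RouteDecision:
    # Step 1: keyword routing is the cheapest check, so it runs first.
    hit = keyword_lookup(query)
    if hit is not None:
        return RouteDecision("fast_path", "keyword", 1.0, hit)

    # Step 2: semantic similarity against the stored golden embeddings.
    if golden_embeddings:
        query_vec = embed(query)
        best_id, best_score = max(
            ((qid, cosine_similarity(query_vec, vec)) for qid, vec in golden_embeddings.items()),
            key=lambda pair: pair[1],
        )
        if best_score >= threshold:
            return RouteDecision("fast_path", "similarity", best_score, best_id)

    # Step 3: optional LLM router for edge cases.
    if llm_router is not None:
        decision = llm_router(query)
        if decision is not None:
            return decision

    # Default: fall through to the full RAG pipeline.
    return RouteDecision("slow_path", "default", 0.0)
```

Ordering the checks from cheapest to most expensive means most Fast Path hits never pay for an embedding call.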
### 3. Path Handlers

**FastPathHandler**: Returns cached responses from golden dataset

- Increments usage count
- Returns verified response instantly

**SlowPathHandler**: Full RAG pipeline

- Hybrid search (BM25 + vector)
- Top 20 documents
- LLM generation with structured output
- Auto-save high-quality responses to golden dataset
## Setup

### 1. Run Migration

```bash
cd backend/hue_portal
python manage.py migrate core
```

### 2. Import Initial Golden Dataset

```bash
# Import from JSON file
python manage.py manage_golden_dataset import --file golden_queries.json --format json

# Or import from CSV
python manage.py manage_golden_dataset import --file golden_queries.csv --format csv
```

### 3. Generate Embeddings (for semantic search)

```bash
# Generate embeddings for all queries
python manage.py manage_golden_dataset update_embeddings

# Or for a specific query
python manage.py manage_golden_dataset update_embeddings --query-id 123
```
## Management Commands

### Import Queries

```bash
python manage.py manage_golden_dataset import \
  --file golden_queries.json \
  --format json \
  --verify-by legal_expert \
  --skip-embeddings  # Skip if embeddings will be generated later
```

### Verify Query

```bash
python manage.py manage_golden_dataset verify \
  --query-id 123 \
  --verify-by gpt4 \
  --accuracy 1.0
```

### Update Embeddings

```bash
python manage.py manage_golden_dataset update_embeddings \
  --batch-size 10
```

### View Statistics

```bash
python manage.py manage_golden_dataset stats
```

### Export Dataset

```bash
python manage.py manage_golden_dataset export \
  --file exported_queries.json \
  --active-only
```

### Delete Query

```bash
# Soft delete (deactivate)
python manage.py manage_golden_dataset delete --query-id 123 --soft

# Hard delete
python manage.py manage_golden_dataset delete --query-id 123
```
## API Endpoints

### Chat Endpoint (unchanged)

```
POST /api/chatbot/chat/
{
  "message": "Mức phạt vượt đèn đỏ là bao nhiêu?",
  "session_id": "optional-uuid",
  "reset_session": false
}
```

Response includes routing metadata:

```json
{
  "message": "...",
  "intent": "search_fine",
  "results": [...],
  "_source": "fast_path",  // or "slow_path"
  "_routing": {
    "path": "fast_path",
    "method": "keyword",
    "confidence": 1.0
  },
  "_golden_query_id": 123  // if fast_path
}
```

### Analytics Endpoint

```
GET /api/chatbot/analytics/?days=7&type=all
```

Returns:

- `routing`: Fast/Slow path statistics
- `golden_dataset`: Golden dataset stats
- `performance`: P50/P95/P99 response times
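
For readers unfamiliar with the `performance` figures, here is a minimal sketch of how P50/P95/P99 can be derived from logged `response_time_ms` values. It uses the nearest-rank definition of a percentile, which is an assumption; the analytics endpoint may compute them differently, and `percentile` and `latency_summary` are hypothetical helper names.

```python
import math
from typing import Dict, Sequence


def percentile(values: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("percentile() needs at least one value")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_summary(times_ms: Sequence[float]) -> Dict[str, float]:
    """Summarize response times as P50/P95/P99."""
    return {f"p{pct}": percentile(times_ms, pct) for pct in (50, 95, 99)}
```

With roughly 80% of traffic on the Fast Path, P50 mostly reflects Fast Path latency, while P95 and P99 are dominated by Slow Path requests.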
## Golden Dataset Format

### JSON Format

```json
[
  {
    "query": "Mức phạt vượt đèn đỏ là bao nhiêu?",
    "intent": "search_fine",
    "response_message": "Mức phạt vượt đèn đỏ là từ 200.000 - 400.000 VNĐ...",
    "response_data": {
      "message": "...",
      "intent": "search_fine",
      "results": [...],
      "count": 1
    },
    "verified_by": "legal_expert",
    "accuracy_score": 1.0
  }
]
```
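
Before running the import command it can help to check a file against this shape. The sketch below is a hypothetical pre-flight check, not part of `manage_golden_dataset`. Treating `query`, `intent`, `response_message`, and `response_data` as required, and requiring `accuracy_score` to lie in [0, 1], are assumptions based on the examples in this section.

```python
import json
from typing import List

# Assumed required keys, taken from the column list of the CSV example.
REQUIRED_FIELDS = ("query", "intent", "response_message", "response_data")


def validate_entries(raw_json: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means the file looks importable."""
    entries = json.loads(raw_json)
    if not isinstance(entries, list):
        return ["top-level value must be a JSON array"]

    problems = []
    for index, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {index}: expected an object")
            continue
        missing = [field for field in REQUIRED_FIELDS if field not in entry]
        if missing:
            problems.append(f"entry {index}: missing {', '.join(missing)}")
        score = entry.get("accuracy_score", 1.0)
        if not isinstance(score, (int, float)) or not 0.0 <= score <= 1.0:
            problems.append(f"entry {index}: accuracy_score must be between 0 and 1")
    return problems
```

A file passes when `validate_entries(open("golden_queries.json", encoding="utf-8").read())` returns an empty list.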
### CSV Format

```csv
query,intent,response_message,response_data
"Mức phạt vượt đèn đỏ là bao nhiêu?","search_fine","Mức phạt...","{\"message\":\"...\",\"results\":[...]}"
```
## Monitoring

### Routing Statistics

```python
from hue_portal.chatbot.analytics import get_routing_stats

stats = get_routing_stats(days=7)
print(f"Fast Path: {stats['fast_path_percentage']:.1f}%")
print(f"Slow Path: {stats['slow_path_percentage']:.1f}%")
print(f"Fast Path Avg Time: {stats['fast_path_avg_time_ms']:.1f}ms")
print(f"Slow Path Avg Time: {stats['slow_path_avg_time_ms']:.1f}ms")
```

### Golden Dataset Stats

```python
from hue_portal.chatbot.analytics import get_golden_dataset_stats

stats = get_golden_dataset_stats()
print(f"Active queries: {stats['active_queries']}")
print(f"Embedding coverage: {stats['embedding_coverage']:.1f}%")
```
## Best Practices

### 1. Building Golden Dataset

- Start with the 50-100 most common queries from logs
- Verify each response manually or with a strong LLM (GPT-4/Claude)
- Add queries gradually based on usage patterns
- Target: 200 queries covering 80% of traffic

### 2. Verification Process

- **Weekly review**: Check the top 20 most-used queries
- **Monthly audit**: Review all queries for accuracy
- **Update embeddings**: Regenerate whenever new queries are added
- **Version control**: Track changes with the `version` field

### 3. Tuning Similarity Threshold

- Default: 0.85 (conservative, high precision)
- Lower (0.75): More queries go to Fast Path, but with a higher risk of false matches
- Higher (0.90): Fewer false matches, but more queries go to Slow Path
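
One way to make this trade-off concrete is to sweep candidate thresholds over a hand-labeled sample of routing decisions. The sketch below assumes such a sample exists as `(similarity_score, match_was_correct)` pairs; `sweep_thresholds` is a hypothetical helper, not part of the project, and no real measurements are implied.

```python
from typing import List, Sequence, Tuple


def sweep_thresholds(
    samples: Sequence[Tuple[float, bool]],  # (similarity score, was the best match correct?)
    thresholds: Sequence[float] = (0.75, 0.85, 0.90),
) -> List[Tuple[float, float, float]]:
    """For each threshold, return (threshold, fast_path_share, false_match_rate)."""
    rows = []
    for threshold in thresholds:
        # Queries at or above the threshold would be answered from the golden dataset.
        routed = [correct for score, correct in samples if score >= threshold]
        fast_share = len(routed) / len(samples) if samples else 0.0
        false_rate = routed.count(False) / len(routed) if routed else 0.0
        rows.append((threshold, fast_share, false_rate))
    return rows
```

Choosing the lowest threshold whose false-match rate is still acceptable for legal answers gives the largest Fast Path share without giving up precision.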
### 4. Auto-Save from Slow Path

Slow Path automatically saves a response as a golden dataset candidate when all of the following hold:

- Confidence >= 0.95
- Has results
- Message length > 50 chars
- Not already in golden dataset

Review auto-saved queries weekly and verify them before activating.
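
The four criteria translate directly into a predicate. This is a minimal sketch with hypothetical names (`should_auto_save` and its parameters), assuming the caller has already checked whether the query exists in the golden dataset.

```python
from typing import Sequence


def should_auto_save(
    confidence: float,
    results: Sequence[object],
    message: str,
    already_golden: bool,
    min_confidence: float = 0.95,
    min_message_chars: int = 50,
) -> bool:
    """Return True only when every auto-save criterion is met."""
    return (
        confidence >= min_confidence
        and len(results) > 0
        and len(message) > min_message_chars  # strictly longer than 50 characters
        and not already_golden
    )
```

Because auto-saved entries still need human review, storing them as inactive until they are verified keeps unreviewed answers out of the Fast Path.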
## Troubleshooting

### Fast Path not matching

1. Check if query is normalized correctly
2. Verify golden query exists: `GoldenQuery.objects.filter(query_normalized=...)`
3. Check similarity threshold (may be too high)
4. Ensure embeddings are generated: `update_embeddings`

### Slow performance

1. Check routing logs: `QueryRoutingLog.objects.filter(route='fast_path')`
2. Verify Fast Path percentage (should be ~80%)
3. Check embedding model loading time
4. Monitor database query performance

### Low accuracy

1. Review golden dataset verification
2. Check `accuracy_score` of golden queries
3. Monitor Slow Path responses for quality
4. Update golden queries with better responses
## Expected Performance

- **Fast Path**: <200ms (target: <100ms)
- **Slow Path**: 4-8s (full RAG pipeline)
- **Overall**: 80% of queries in <200ms, 20% of queries in 4-8s
- **Cache Hit Rate**: 75-85% (Fast Path usage)
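
These figures imply a blended average latency that is easy to work out. As a rough sketch, taking 200ms as the Fast Path upper bound and an 80/20 split, the mean lands between about 0.96s and 1.76s depending on where Slow Path falls in its 4-8s range. `blended_latency_ms` is a hypothetical helper for this arithmetic.

```python
def blended_latency_ms(fast_share: float, fast_ms: float, slow_ms: float) -> float:
    """Traffic-weighted mean latency across the two paths."""
    if not 0.0 <= fast_share <= 1.0:
        raise ValueError("fast_share must be between 0 and 1")
    return fast_share * fast_ms + (1.0 - fast_share) * slow_ms


# Best and worst case for an 80% Fast Path share.
best_case = blended_latency_ms(0.80, 200, 4000)
worst_case = blended_latency_ms(0.80, 200, 8000)
print(f"Blended mean latency: {best_case:.0f}-{worst_case:.0f} ms")
```

The same function shows why the hit rate matters: at a 75% Fast Path share the worst case rises to about 2.15s.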
## Next Steps

1. Import initial 200 common queries
2. Generate embeddings for all queries
3. Monitor routing statistics for 1 week
4. Tune similarity threshold based on metrics
5. Expand golden dataset based on usage patterns