# Dual-Path RAG Architecture
## Overview
Dual-Path RAG is an architecture optimized for legal chatbots. It splits query handling into two paths:
- **Fast Path**: Golden dataset (200 common queries) → <200ms, 100% accuracy
- **Slow Path**: Full RAG pipeline → 4-8s, 99.99% accuracy
## Architecture
```
User Query
    ↓
Intent Classification
    ↓
Dual-Path Router
  ├─ Keyword Router (exact/fuzzy match)
  ├─ Semantic Similarity Search (threshold 0.85)
  └─ LLM Router (optional, for edge cases)
    ↓
┌──────────────────┬──────────────────┐
│ Fast Path        │ Slow Path        │
│ (<200ms)         │ (4-8s)           │
│                  │                  │
│ Golden Dataset   │ Full RAG:        │
│ - Exact match    │ - Hybrid Search  │
│ - Fuzzy match    │ - Top 20 docs    │
│ - Similarity     │ - LLM Generation │
│                  │ - Guardrails     │
│ 100% accuracy    │ 99.99% accuracy  │
└──────────────────┴──────────────────┘
    ↓
Response + Routing Log
```
## Components
### 1. Database Models
**GoldenQuery**: Stores verified queries and responses
- `query`, `query_normalized`, `query_embedding`
- `intent`, `response_message`, `response_data`
- `verified_by`, `usage_count`, `accuracy_score`
**QueryRoutingLog**: Logs routing decisions for monitoring
- `route` (fast_path/slow_path)
- `router_method` (keyword/similarity/llm/default)
- `response_time_ms`, `similarity_score`
### 2. Router Components
**KeywordRouter**: Fast keyword-based matching
- Exact match (normalized query)
- Fuzzy match (70% word overlap)
- ~1-5ms latency
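
The exact and fuzzy matching steps can be sketched roughly as follows. This is a minimal illustration, not the project's `KeywordRouter`: the normalization rules, the helper names (`normalize_query`, `jaccard_overlap`, `keyword_match`), and the choice of Jaccard overlap as the meaning of "70% word overlap" are all assumptions.

```python
import re
import unicodedata
from typing import List, Optional, Tuple


def normalize_query(text: str) -> str:
    """Lowercase, drop punctuation, collapse whitespace (assumed rules)."""
    text = unicodedata.normalize("NFC", text).lower()
    text = re.sub(r"[^\w\s]", " ", text)  # \w is Unicode-aware, so accented letters survive
    return " ".join(text.split())


def jaccard_overlap(a: str, b: str) -> float:
    """Shared words divided by all distinct words of two normalized strings."""
    words_a, words_b = set(a.split()), set(b.split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def keyword_match(
    query: str, golden_queries: List[str], min_overlap: float = 0.7
) -> Optional[Tuple[str, str, float]]:
    """Return (golden_query, method, score), or None when nothing matches."""
    normalized = normalize_query(query)

    # Exact match on the normalized form comes first.
    for golden in golden_queries:
        if normalize_query(golden) == normalized:
            return golden, "exact", 1.0

    # Otherwise take the best fuzzy candidate if it clears the overlap bar.
    scored = [
        (jaccard_overlap(normalized, normalize_query(g)), g) for g in golden_queries
    ]
    if scored:
        score, golden = max(scored, key=lambda pair: pair[0])
        if score >= min_overlap:
            return golden, "fuzzy", score
    return None
```

A linear scan like this is only reasonable for a small dataset (a few hundred queries); an indexed lookup on `query_normalized` would be the natural choice for the exact-match step.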
**DualPathRouter**: Main router with hybrid logic
- Step 1: Keyword routing (fastest)
- Step 2: Semantic similarity (threshold 0.85)
- Step 3: LLM router fallback (optional)
- Default: Slow Path
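
The cascade above can be sketched as a single function that tries each matcher in order and falls back to the Slow Path. The `RouteDecision` type, the `route_query` function, and the matcher signature are hypothetical names used for illustration; the actual `DualPathRouter` may be organized differently.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Assumed matcher contract: return (golden_query_id, score) or None.
Matcher = Callable[[str], Optional[Tuple[int, float]]]


@dataclass
class RouteDecision:
    route: str                      # "fast_path" or "slow_path"
    method: str                     # "keyword", "similarity", "llm", or "default"
    confidence: float
    golden_query_id: Optional[int] = None


def route_query(
    query: str,
    keyword_matcher: Matcher,
    similarity_matcher: Matcher,
    llm_matcher: Optional[Matcher] = None,
    similarity_threshold: float = 0.85,
) -> RouteDecision:
    # Step 1: keyword routing (cheapest check, so it runs first).
    hit = keyword_matcher(query)
    if hit is not None:
        return RouteDecision("fast_path", "keyword", hit[1], hit[0])

    # Step 2: semantic similarity, accepted only at or above the threshold.
    hit = similarity_matcher(query)
    if hit is not None and hit[1] >= similarity_threshold:
        return RouteDecision("fast_path", "similarity", hit[1], hit[0])

    # Step 3: optional LLM router for edge cases.
    if llm_matcher is not None:
        hit = llm_matcher(query)
        if hit is not None:
            return RouteDecision("fast_path", "llm", hit[1], hit[0])

    # Default: Slow Path. A confidence of 0.0 here is a placeholder assumption.
    return RouteDecision("slow_path", "default", 0.0)
```

Ordering the checks from cheapest to most expensive means most Fast Path hits never pay for an embedding lookup or an LLM call.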
### 3. Path Handlers
**FastPathHandler**: Returns cached responses from golden dataset
- Increments usage count
- Returns verified response instantly
**SlowPathHandler**: Full RAG pipeline
- Hybrid search (BM25 + vector)
- Top 20 documents
- LLM generation with structured output
- Auto-save high-quality responses to golden dataset
## Setup
### 1. Run Migration
```bash
cd backend/hue_portal
python manage.py migrate core
```
### 2. Import Initial Golden Dataset
```bash
# Import from JSON file
python manage.py manage_golden_dataset import --file golden_queries.json --format json
# Or import from CSV
python manage.py manage_golden_dataset import --file golden_queries.csv --format csv
```
### 3. Generate Embeddings (for semantic search)
```bash
# Generate embeddings for all queries
python manage.py manage_golden_dataset update_embeddings
# Or for specific query
python manage.py manage_golden_dataset update_embeddings --query-id 123
```
## Management Commands
### Import Queries
```bash
python manage.py manage_golden_dataset import \
--file golden_queries.json \
--format json \
--verify-by legal_expert \
--skip-embeddings # Skip if embeddings will be generated later
```
### Verify Query
```bash
python manage.py manage_golden_dataset verify \
--query-id 123 \
--verify-by gpt4 \
--accuracy 1.0
```
### Update Embeddings
```bash
python manage.py manage_golden_dataset update_embeddings \
--batch-size 10
```
### View Statistics
```bash
python manage.py manage_golden_dataset stats
```
### Export Dataset
```bash
python manage.py manage_golden_dataset export \
--file exported_queries.json \
--active-only
```
### Delete Query
```bash
# Soft delete (deactivate)
python manage.py manage_golden_dataset delete --query-id 123 --soft
# Hard delete
python manage.py manage_golden_dataset delete --query-id 123
```
## API Endpoints
### Chat Endpoint (unchanged)
```
POST /api/chatbot/chat/
{
"message": "Mức phạt vượt đèn đỏ là bao nhiêu?",
"session_id": "optional-uuid",
"reset_session": false
}
```
Response includes routing metadata:
```json
{
"message": "...",
"intent": "search_fine",
"results": [...],
"_source": "fast_path", // or "slow_path"
"_routing": {
"path": "fast_path",
"method": "keyword",
"confidence": 1.0
},
"_golden_query_id": 123 // if fast_path
}
```
### Analytics Endpoint
```
GET /api/chatbot/analytics/?days=7&type=all
```
Returns:
- `routing`: Fast/Slow path statistics
- `golden_dataset`: Golden dataset stats
- `performance`: P50/P95/P99 response times
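
For reference, P50/P95/P99 values like those in `performance` could be derived from logged `response_time_ms` values as in the sketch below. The function name and the interpolation method are assumptions; the analytics module may compute percentiles differently (for example, in the database).

```python
from statistics import quantiles
from typing import Dict, Sequence


def latency_percentiles(response_times_ms: Sequence[float]) -> Dict[str, float]:
    """Return P50/P95/P99 using linear interpolation between observations."""
    if len(response_times_ms) < 2:
        raise ValueError("need at least two samples to compute percentiles")
    # n=100 yields 99 cut points; cut point k (1-based) is the k-th percentile.
    cuts = quantiles(response_times_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}
```

With a mixed Fast/Slow workload, P50 mostly reflects the Fast Path while P95 and P99 are driven by Slow Path requests, so the three numbers are best read together.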
## Golden Dataset Format
### JSON Format
```json
[
{
"query": "Mức phạt vượt đèn đỏ là bao nhiêu?",
"intent": "search_fine",
"response_message": "Mức phạt vượt đèn đỏ là từ 200.000 - 400.000 VNĐ...",
"response_data": {
"message": "...",
"intent": "search_fine",
"results": [...],
"count": 1
},
"verified_by": "legal_expert",
"accuracy_score": 1.0
}
]
```
### CSV Format
```csv
query,intent,response_message,response_data
"Mức phạt vượt đèn đỏ là bao nhiêu?","search_fine","Mức phạt...","{\"message\":\"...\",\"results\":[...]}"
```
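
Note that the example row escapes quotes inside `response_data` with backslashes rather than doubling them. A minimal sketch of reading such a file with Python's `csv` module follows, assuming the importer expects this backslash-escaped dialect; `load_golden_csv` is a hypothetical helper, not part of the management command, and the sample uses an empty `results` list so that the JSON is valid.

```python
import csv
import io
import json
from typing import Any, Dict, List


def load_golden_csv(text: str) -> List[Dict[str, Any]]:
    """Parse golden-query CSV text and decode the JSON in response_data."""
    reader = csv.DictReader(
        io.StringIO(text),
        escapechar="\\",    # treat \" as a literal quote inside a quoted field
        doublequote=False,  # quotes are not escaped by doubling in this dialect
    )
    rows = []
    for row in reader:
        row["response_data"] = json.loads(row["response_data"])
        rows.append(row)
    return rows


sample = r'''query,intent,response_message,response_data
"Mức phạt vượt đèn đỏ là bao nhiêu?","search_fine","Mức phạt...","{\"message\":\"...\",\"results\":[]}"
'''

rows = load_golden_csv(sample)
print(rows[0]["intent"], rows[0]["response_data"]["results"])
```

If the file is produced by a spreadsheet tool instead, quotes are usually doubled (`""`), in which case the default `csv` dialect applies and the two extra arguments should be dropped.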
## Monitoring
### Routing Statistics
```python
from hue_portal.chatbot.analytics import get_routing_stats
stats = get_routing_stats(days=7)
print(f"Fast Path: {stats['fast_path_percentage']:.1f}%")
print(f"Slow Path: {stats['slow_path_percentage']:.1f}%")
print(f"Fast Path Avg Time: {stats['fast_path_avg_time_ms']:.1f}ms")
print(f"Slow Path Avg Time: {stats['slow_path_avg_time_ms']:.1f}ms")
```
### Golden Dataset Stats
```python
from hue_portal.chatbot.analytics import get_golden_dataset_stats
stats = get_golden_dataset_stats()
print(f"Active queries: {stats['active_queries']}")
print(f"Embedding coverage: {stats['embedding_coverage']:.1f}%")
```
## Best Practices
### 1. Building Golden Dataset
- Start with the 50-100 most common queries from logs
- Verify each response manually or with a strong LLM (GPT-4/Claude)
- Add queries gradually based on usage patterns
- Target: 200 queries covering 80% of traffic
### 2. Verification Process
- **Weekly review**: Check top 20 most-used queries
- **Monthly audit**: Review all queries for accuracy
- **Update embeddings**: When adding new queries
- **Version control**: Track changes with `version` field
### 3. Tuning Similarity Threshold
- Default: 0.85 (conservative, high precision)
- Lower (0.75): More queries go to Fast Path, but risk false matches
- Higher (0.90): Fewer false matches, but more queries go to Slow Path
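
To make the trade-off concrete, the sketch below scores a query embedding against golden embeddings and applies the threshold. It assumes cosine similarity and plain Python lists as embeddings; the function names and the toy two-dimensional vectors are illustrative only.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(
    query_embedding: Sequence[float],
    golden_embeddings: Dict[int, Sequence[float]],
    threshold: float = 0.85,
) -> Optional[Tuple[int, float]]:
    """Return (golden_id, score) for the best match at or above threshold."""
    best_id, best_score = None, -1.0
    for golden_id, embedding in golden_embeddings.items():
        score = cosine_similarity(query_embedding, embedding)
        if score > best_score:
            best_id, best_score = golden_id, score
    if best_id is not None and best_score >= threshold:
        return best_id, best_score
    return None


# Toy data: the closest golden query scores about 0.8 against this query.
golden = {1: [0.8, 0.6], 2: [0.0, 1.0]}
query = [1.0, 0.0]

print(best_match(query, golden, threshold=0.85))  # rejected, goes to Slow Path
print(best_match(query, golden, threshold=0.75))  # accepted, served from Fast Path
```

The same query flips from Slow Path to Fast Path when the threshold drops from 0.85 to 0.75, which is exactly the region where false matches need to be checked against routing logs before lowering the default.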
### 4. Auto-Save from Slow Path
Slow Path automatically saves high-quality responses:
- Confidence >= 0.95
- Has results
- Message length > 50 chars
- Not already in golden dataset
Review auto-saved queries weekly and verify before activating.
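
The criteria above can be expressed as a single predicate. This is a sketch assuming all four conditions must hold together; `should_auto_save` and its parameters are hypothetical names, and the real check lives in the Slow Path handler.

```python
from typing import Collection, Sequence


def should_auto_save(
    confidence: float,
    results: Sequence[object],
    message: str,
    normalized_query: str,
    existing_normalized_queries: Collection[str],
) -> bool:
    """True when a Slow Path response qualifies as a golden-dataset candidate."""
    return (
        confidence >= 0.95                                       # high confidence
        and len(results) > 0                                     # has results
        and len(message) > 50                                    # non-trivial answer
        and normalized_query not in existing_normalized_queries  # not a duplicate
    )
```

Passing `existing_normalized_queries` as a set keeps the duplicate check constant-time; in the Django code this would more likely be an `exists()` query on `query_normalized`.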
## Troubleshooting
### Fast Path not matching
1. Check if query is normalized correctly
2. Verify golden query exists: `GoldenQuery.objects.filter(query_normalized=...)`
3. Check similarity threshold (may be too high)
4. Ensure embeddings are generated: `update_embeddings`
### Slow performance
1. Check routing logs: `QueryRoutingLog.objects.filter(route='fast_path')`
2. Verify Fast Path percentage (should be ~80%)
3. Check embedding model loading time
4. Monitor database query performance
### Low accuracy
1. Review golden dataset verification
2. Check `accuracy_score` of golden queries
3. Monitor Slow Path responses for quality
4. Update golden queries with better responses
## Expected Performance
- **Fast Path**: <200ms (target: <100ms)
- **Slow Path**: 4-8s (full RAG pipeline)
- **Overall**: 80% of queries <200ms, 20% of queries 4-8s
- **Cache Hit Rate**: 75-85% (Fast Path usage)
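
As a rough check on what these targets imply for average latency, the mean can be computed as a weighted sum of the two paths. The helper below is illustrative, and it pessimistically treats every Fast Path query as taking the full 200ms.

```python
def expected_latency_ms(fast_share: float, fast_ms: float, slow_ms: float) -> float:
    """Mean latency for a traffic mix of Fast Path and Slow Path queries."""
    return fast_share * fast_ms + (1.0 - fast_share) * slow_ms


# 80% Fast Path at 200 ms, 20% Slow Path at 4 s or 8 s.
low = expected_latency_ms(0.80, 200, 4000)   # about 960 ms
high = expected_latency_ms(0.80, 200, 8000)  # about 1760 ms
print(f"Expected mean latency: {low:.0f}-{high:.0f} ms")
```

Even at an 80% Fast Path share, the Slow Path accounts for most of the mean, so increasing the share of queries served from the golden dataset lowers average latency more than shaving milliseconds off the Fast Path.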
## Next Steps
1. Import initial 200 common queries
2. Generate embeddings for all queries
3. Monitor routing statistics for 1 week
4. Tune similarity threshold based on metrics
5. Expand golden dataset based on usage patterns