Dual-Path RAG Architecture
Overview
Dual-Path RAG is an architecture optimized for legal chatbots. It separates query handling into two paths:
- Fast Path: Golden dataset (200 common questions) → <200ms, 100% accuracy
- Slow Path: Full RAG pipeline → 4-8s, 99.99% accuracy
Architecture
User Query
↓
Intent Classification
↓
Dual-Path Router
├─ Keyword Router (exact/fuzzy match)
├─ Semantic Similarity Search (threshold 0.85)
└─ LLM Router (optional, for edge cases)
↓
┌─────────────────┬─────────────────┐
│ Fast Path │ Slow Path │
│ (<200ms) │ (4-8s) │
│ │ │
│ Golden Dataset │ Full RAG: │
│ - Exact match │ - Hybrid Search │
│ - Fuzzy match │ - Top 20 docs │
│ - Similarity │ - LLM Generation │
│ │ - Guardrails │
│ 100% accuracy │ 99.99% accuracy │
└─────────────────┴─────────────────┘
↓
Response + Routing Log
Components
1. Database Models
GoldenQuery: Stores verified queries and responses
- query, query_normalized, query_embedding
- intent, response_message, response_data
- verified_by, usage_count, accuracy_score
QueryRoutingLog: Logs routing decisions for monitoring
- route (fast_path/slow_path)
- router_method (keyword/similarity/llm/default)
- response_time_ms, similarity_score
2. Router Components
KeywordRouter: Fast keyword-based matching
- Exact match (normalized query)
- Fuzzy match (70% word overlap)
- ~1-5ms latency
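The two matching modes above can be sketched in a few lines. This is an illustration only, not the project's KeywordRouter: the normalization rules (lowercasing, punctuation stripping, whitespace collapsing) and the definition of "word overlap" (share of the golden query's distinct words found in the user query) are assumptions, and the function names are hypothetical.

```python
import re
import unicodedata


def normalize_query(text: str) -> str:
    """Lowercase, drop punctuation, collapse whitespace (assumed normalization)."""
    text = unicodedata.normalize("NFC", text).lower()
    # Replace anything that is not a word character or whitespace with a space.
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())


def word_overlap(query: str, candidate: str) -> float:
    """Fraction of the candidate's distinct words that also occur in the query."""
    q_words = set(normalize_query(query).split())
    c_words = set(normalize_query(candidate).split())
    if not c_words:
        return 0.0
    return len(q_words & c_words) / len(c_words)


def keyword_match(query, golden_queries, fuzzy_threshold=0.7):
    """Return (golden_query, method), or (None, None) when nothing matches."""
    normalized = normalize_query(query)
    by_normalized = {normalize_query(g): g for g in golden_queries}
    # Exact match on the normalized form.
    if normalized in by_normalized:
        return by_normalized[normalized], "exact"
    # Fuzzy match: best word overlap, accepted at or above the threshold.
    best = max(golden_queries, key=lambda g: word_overlap(query, g), default=None)
    if best is not None and word_overlap(query, best) >= fuzzy_threshold:
        return best, "fuzzy"
    return None, None
```

A dictionary lookup keeps the exact-match step constant-time; the fuzzy step here scans every golden query, which is acceptable for a dataset of a few hundred entries.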
DualPathRouter: Main router with hybrid logic
- Step 1: Keyword routing (fastest)
- Step 2: Semantic similarity (threshold 0.85)
- Step 3: LLM router fallback (optional)
- Default: Slow Path
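The cascade above could look roughly like the following. It is a simplified sketch under stated assumptions: golden queries are plain dicts with `id`, `query_normalized`, and `query_embedding` keys, step 1 only shows the exact-match case, similarity is cosine similarity over precomputed vectors, and `RoutingDecision`, `route_query`, and `llm_router` are hypothetical names rather than the project's actual API.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class RoutingDecision:
    route: str                          # "fast_path" or "slow_path"
    method: str                         # "keyword", "similarity", "llm", or "default"
    golden_id: Optional[int] = None
    similarity: Optional[float] = None


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def route_query(query_norm, query_vec, golden, threshold=0.85, llm_router=None):
    # Step 1: keyword routing (exact match on the normalized query only).
    for g in golden:
        if g["query_normalized"] == query_norm:
            return RoutingDecision("fast_path", "keyword", g["id"])

    # Step 2: semantic similarity against golden queries that have embeddings.
    best_id, best_score = None, -1.0
    for g in golden:
        vec = g.get("query_embedding")
        if vec is None:
            continue
        score = cosine(query_vec, vec)
        if score > best_score:
            best_id, best_score = g["id"], score
    if best_id is not None and best_score >= threshold:
        return RoutingDecision("fast_path", "similarity", best_id, best_score)

    # Step 3: optional LLM router; expected to return a golden query id or None.
    if llm_router is not None:
        llm_id = llm_router(query_norm)
        if llm_id is not None:
            return RoutingDecision("fast_path", "llm", llm_id)

    # Default: fall through to the Slow Path, keeping the best score for logging.
    score = best_score if best_id is not None else None
    return RoutingDecision("slow_path", "default", similarity=score)
```

Ordering the checks from cheapest to most expensive means most Fast Path hits never pay for an embedding comparison or an LLM call.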
3. Path Handlers
FastPathHandler: Returns cached responses from golden dataset
- Increments usage count
- Returns verified response instantly
SlowPathHandler: Full RAG pipeline
- Hybrid search (BM25 + vector)
- Top 20 documents
- LLM generation with structured output
- Auto-save high-quality responses to golden dataset
Setup
1. Run Migration
cd backend/hue_portal
python manage.py migrate core
2. Import Initial Golden Dataset
# Import from JSON file
python manage.py manage_golden_dataset import --file golden_queries.json --format json
# Or import from CSV
python manage.py manage_golden_dataset import --file golden_queries.csv --format csv
3. Generate Embeddings (for semantic search)
# Generate embeddings for all queries
python manage.py manage_golden_dataset update_embeddings
# Or for specific query
python manage.py manage_golden_dataset update_embeddings --query-id 123
Management Commands
Import Queries
python manage.py manage_golden_dataset import \
--file golden_queries.json \
--format json \
--verify-by legal_expert \
--skip-embeddings # Skip if embeddings will be generated later
Verify Query
python manage.py manage_golden_dataset verify \
--query-id 123 \
--verify-by gpt4 \
--accuracy 1.0
Update Embeddings
python manage.py manage_golden_dataset update_embeddings \
--batch-size 10
View Statistics
python manage.py manage_golden_dataset stats
Export Dataset
python manage.py manage_golden_dataset export \
--file exported_queries.json \
--active-only
Delete Query
# Soft delete (deactivate)
python manage.py manage_golden_dataset delete --query-id 123 --soft
# Hard delete
python manage.py manage_golden_dataset delete --query-id 123
API Endpoints
Chat Endpoint (unchanged)
POST /api/chatbot/chat/
{
"message": "Mức phạt vượt đèn đỏ là bao nhiêu?",
"session_id": "optional-uuid",
"reset_session": false
}
Response includes routing metadata:
{
"message": "...",
"intent": "search_fine",
"results": [...],
"_source": "fast_path", // or "slow_path"
"_routing": {
"path": "fast_path",
"method": "keyword",
"confidence": 1.0
},
"_golden_query_id": 123 // if fast_path
}
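For scripted checks, the request and the routing metadata can be handled with the standard library alone. The sketch below assumes a local development server at `http://localhost:8000` and no authentication or CSRF requirement, neither of which is stated above; `build_chat_request` and `routing_summary` are hypothetical helper names.

```python
import json
import urllib.request


def build_chat_request(message, session_id=None, reset_session=False,
                       base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for the chat endpoint."""
    payload = {"message": message, "reset_session": reset_session}
    if session_id is not None:
        payload["session_id"] = session_id
    return urllib.request.Request(
        url=f"{base_url}/api/chatbot/chat/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def routing_summary(response: dict) -> str:
    """Condense the routing metadata of a decoded chat response into one line."""
    routing = response.get("_routing", {})
    path = routing.get("path", response.get("_source", "unknown"))
    method = routing.get("method", "unknown")
    golden_id = response.get("_golden_query_id")
    suffix = f" (golden query {golden_id})" if golden_id is not None else ""
    return f"{path} via {method}{suffix}"
```

Sending the request is then a matter of passing it to `urllib.request.urlopen` and decoding the body with `json.load`.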
Analytics Endpoint
GET /api/chatbot/analytics/?days=7&type=all
Returns:
- routing: Fast/Slow path statistics
- golden_dataset: Golden dataset stats
- performance: P50/P95/P99 response times
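As a point of reference for the performance figures, percentiles such as P50/P95/P99 can be derived from logged `response_time_ms` values with the standard library. This is a sketch of one possible calculation, not necessarily how the analytics endpoint computes them; the function name and the choice of the "inclusive" interpolation method are assumptions.

```python
from statistics import quantiles


def latency_percentiles(times_ms):
    """Return P50/P95/P99 for a list of response times in milliseconds."""
    if len(times_ms) < 2:
        raise ValueError("need at least two samples to compute percentiles")
    # n=100 yields 99 cut points; cut point k (1-based) is the k-th percentile.
    cuts = quantiles(times_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}
```

Because Fast Path and Slow Path latencies differ by more than an order of magnitude, computing these per route is usually more informative than a single combined figure.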
Golden Dataset Format
JSON Format
[
{
"query": "Mức phạt vượt đèn đỏ là bao nhiêu?",
"intent": "search_fine",
"response_message": "Mức phạt vượt đèn đỏ là từ 200.000 - 400.000 VNĐ...",
"response_data": {
"message": "...",
"intent": "search_fine",
"results": [...],
"count": 1
},
"verified_by": "legal_expert",
"accuracy_score": 1.0
}
]
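Before running an import, a quick structural check of the file can catch malformed entries early. The sketch below is a hypothetical pre-import validator, not part of `manage_golden_dataset`; it assumes the four columns listed in the CSV format (`query`, `intent`, `response_message`, `response_data`) are the required fields and that `accuracy_score` must lie between 0 and 1. Note that the `...` placeholders in the sample above must be replaced with real values before the file is valid JSON.

```python
import json

# Assumed required fields, taken from the CSV column list.
REQUIRED_FIELDS = ("query", "intent", "response_message", "response_data")


def validate_entries(raw_json: str):
    """Return a list of human-readable problems; an empty list means no issues found."""
    entries = json.loads(raw_json)
    if not isinstance(entries, list):
        return ["top-level JSON value must be a list of entries"]
    errors = []
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            errors.append(f"entry {i}: must be a JSON object")
            continue
        missing = [f for f in REQUIRED_FIELDS if f not in entry]
        if missing:
            errors.append(f"entry {i}: missing {', '.join(missing)}")
        # accuracy_score is optional here; when present it must be in [0, 1].
        score = entry.get("accuracy_score", 1.0)
        if not isinstance(score, (int, float)) or not 0.0 <= score <= 1.0:
            errors.append(f"entry {i}: accuracy_score must be between 0 and 1")
    return errors
```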
CSV Format
query,intent,response_message,response_data
"Mức phạt vượt đèn đỏ là bao nhiêu?","search_fine","Mức phạt...","{""message"":""..."",""results"":[...]}"
Monitoring
Routing Statistics
from hue_portal.chatbot.analytics import get_routing_stats
stats = get_routing_stats(days=7)
print(f"Fast Path: {stats['fast_path_percentage']:.1f}%")
print(f"Slow Path: {stats['slow_path_percentage']:.1f}%")
print(f"Fast Path Avg Time: {stats['fast_path_avg_time_ms']:.1f}ms")
print(f"Slow Path Avg Time: {stats['slow_path_avg_time_ms']:.1f}ms")
Golden Dataset Stats
from hue_portal.chatbot.analytics import get_golden_dataset_stats
stats = get_golden_dataset_stats()
print(f"Active queries: {stats['active_queries']}")
print(f"Embedding coverage: {stats['embedding_coverage']:.1f}%")
Best Practices
1. Building Golden Dataset
- Start with 50-100 most common queries from logs
- Verify each response manually or with a strong LLM (GPT-4/Claude)
- Add queries gradually based on usage patterns
- Target: 200 queries covering 80% of traffic
2. Verification Process
- Weekly review: Check top 20 most-used queries
- Monthly audit: Review all queries for accuracy
- Update embeddings: When adding new queries
- Version control: Track changes with the version field
3. Tuning Similarity Threshold
- Default: 0.85 (conservative, high precision)
- Lower (0.75): More queries go to Fast Path, but risk false matches
- Higher (0.90): Fewer false matches, but more queries go to Slow Path
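The trade-off between the three settings can be measured on a small hand-labelled sample. The following sketch assumes you have collected pairs of (similarity score, whether the matched golden query was actually the right answer), for example by reviewing routing logs; the function name and the input shape are hypothetical.

```python
def sweep_thresholds(scored_pairs, thresholds=(0.75, 0.85, 0.90)):
    """scored_pairs: list of (similarity_score, is_correct_match) tuples."""
    total = len(scored_pairs)
    rows = []
    for t in thresholds:
        # Matches that would be accepted onto the Fast Path at this threshold.
        accepted = [ok for score, ok in scored_pairs if score >= t]
        fast_share = len(accepted) / total if total else 0.0
        # Precision is undefined when nothing is accepted.
        precision = sum(accepted) / len(accepted) if accepted else None
        rows.append({"threshold": t, "fast_path_share": fast_share, "precision": precision})
    return rows
```

A threshold is a reasonable choice when lowering it further starts to reduce precision faster than it increases the Fast Path share.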
4. Auto-Save from Slow Path
Slow Path automatically saves high-quality responses:
- Confidence >= 0.95
- Has results
- Message length > 50 chars
- Not already in golden dataset
Review auto-saved queries weekly and verify before activating.
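The four criteria translate directly into a single predicate. This is a sketch of the rule as listed, with hypothetical parameter names; in particular, checking "not already in golden dataset" against a set of normalized queries is an assumption about how duplicates are detected.

```python
def should_auto_save(confidence, results, message, normalized_query,
                     existing_normalized, min_confidence=0.95, min_length=50):
    """Return True when a Slow Path response meets all four auto-save criteria."""
    return (
        confidence >= min_confidence                      # Confidence >= 0.95
        and len(results) > 0                              # Has results
        and len(message) > min_length                     # Message length > 50 chars
        and normalized_query not in existing_normalized   # Not already saved
    )
```

Keeping the thresholds as keyword arguments makes it easy to tighten them if the weekly review finds too many low-quality candidates.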
Troubleshooting
Fast Path not matching
- Check if query is normalized correctly
- Verify golden query exists: GoldenQuery.objects.filter(query_normalized=...)
- Check similarity threshold (may be too high)
- Ensure embeddings are generated: update_embeddings
Slow performance
- Check routing logs: QueryRoutingLog.objects.filter(route='fast_path')
- Verify Fast Path percentage (should be ~80%)
- Check embedding model loading time
- Monitor database query performance
Low accuracy
- Review golden dataset verification
- Check accuracy_score of golden queries
- Monitor Slow Path responses for quality
- Update golden queries with better responses
Expected Performance
- Fast Path: <200ms (target: <100ms)
- Slow Path: 4-8s (full RAG pipeline)
- Overall: 80% of queries <200ms, 20% of queries 4-8s
- Cache Hit Rate: 75-85% (Fast Path usage)
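These targets imply a mean latency that is dominated by the Slow Path share. The short calculation below is a back-of-the-envelope sketch: it treats each path as having a single representative latency, and the 6000 ms Slow Path figure is simply the midpoint of the 4-8s range, not a measured value.

```python
def expected_latency_ms(fast_share, fast_ms, slow_ms):
    """Weighted mean latency for a given Fast Path share (0.0 to 1.0)."""
    return fast_share * fast_ms + (1 - fast_share) * slow_ms


# 80% of queries at the 200 ms Fast Path bound, 20% at an assumed 6000 ms.
mean_ms = expected_latency_ms(0.80, 200, 6000)
print(f"Expected mean latency: {mean_ms:.0f} ms")
```

With these inputs the mean is about 1.36 s, of which roughly 1.2 s comes from the 20% of queries on the Slow Path, so raising the Fast Path share reduces average latency far more than shaving milliseconds off Fast Path lookups.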
Next Steps
- Import initial 200 common queries
- Generate embeddings for all queries
- Monitor routing statistics for 1 week
- Tune similarity threshold based on metrics
- Expand golden dataset based on usage patterns