Dual-Path RAG Architecture

Overview

Dual-Path RAG is an architecture optimized for a legal chatbot. It separates query handling into two paths:

  • Fast Path: Golden dataset (200 common queries) → <200ms, 100% accuracy
  • Slow Path: Full RAG pipeline → 4-8s, 99.99% accuracy

Architecture

User Query
    ↓
Intent Classification
    ↓
Dual-Path Router
    ├─ Keyword Router (exact/fuzzy match)
    ├─ Semantic Similarity Search (threshold 0.85)
    └─ LLM Router (optional, for edge cases)
    ↓
┌─────────────────┬─────────────────┐
│   Fast Path     │   Slow Path      │
│   (<200ms)      │   (4-8s)         │
│                 │                  │
│ Golden Dataset  │ Full RAG:        │
│ - Exact match   │ - Hybrid Search  │
│ - Fuzzy match   │ - Top 20 docs    │
│ - Similarity    │ - LLM Generation │
│                 │ - Guardrails     │
│ 100% accuracy   │ 99.99% accuracy  │
└─────────────────┴─────────────────┘
    ↓
Response + Routing Log

Components

1. Database Models

GoldenQuery: Stores verified queries and responses

  • query, query_normalized, query_embedding
  • intent, response_message, response_data
  • verified_by, usage_count, accuracy_score

QueryRoutingLog: Logs routing decisions for monitoring

  • route (fast_path/slow_path)
  • router_method (keyword/similarity/llm/default)
  • response_time_ms, similarity_score

2. Router Components

KeywordRouter: Fast keyword-based matching

  • Exact match (normalized query)
  • Fuzzy match (70% word overlap)
  • ~1-5ms latency
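
The two matching modes above can be sketched as follows. This is a minimal illustration, not the project's KeywordRouter: the normalization rules, the helper names (`normalize`, `word_overlap`, `keyword_match`), and the choice to measure overlap relative to the words of the incoming query are all assumptions.

```python
import re
import unicodedata


def normalize(query: str) -> str:
    """Assumed normalization: NFC, lowercase, drop punctuation, collapse spaces."""
    text = unicodedata.normalize("NFC", query).lower()
    text = re.sub(r"[^\w\s]", " ", text)  # \w is Unicode-aware, so accented letters survive
    return " ".join(text.split())


def word_overlap(a: str, b: str) -> float:
    """Fraction of the distinct words in `a` that also occur in `b`."""
    words_a, words_b = set(a.split()), set(b.split())
    if not words_a:
        return 0.0
    return len(words_a & words_b) / len(words_a)


def keyword_match(query: str, golden: dict, min_overlap: float = 0.7):
    """Return (normalized golden key, method) or (None, None).

    `golden` is a hypothetical in-memory dict keyed by normalized query text.
    """
    q = normalize(query)
    if q in golden:  # exact match on the normalized form
        return q, "exact"
    best_key, best_score = None, 0.0
    for key in golden:  # fuzzy match: best word overlap above the cutoff
        score = word_overlap(q, key)
        if score > best_score:
            best_key, best_score = key, score
    if best_score >= min_overlap:
        return best_key, "fuzzy"
    return None, None
```

A linear scan like this is only reasonable for a small golden dataset (a few hundred entries); a larger one would need an index.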

DualPathRouter: Main router with hybrid logic

  • Step 1: Keyword routing (fastest)
  • Step 2: Semantic similarity (threshold 0.85)
  • Step 3: LLM router fallback (optional)
  • Default: Slow Path
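
The step order above amounts to a cascade that stops at the first confident match. The sketch below illustrates that control flow under stated assumptions: `RouteDecision`, `route`, and the callables `keyword_lookup`, `embed`, and `llm_router` are hypothetical names, and plain-Python cosine similarity stands in for whatever vector search the project actually uses.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence


@dataclass
class RouteDecision:
    route: str                       # "fast_path" or "slow_path"
    method: str                      # "keyword", "similarity", "llm" or "default"
    confidence: float
    golden_id: Optional[int] = None


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def route(
    query: str,
    keyword_lookup: Callable[[str], Optional[int]],
    embed: Callable[[str], Sequence[float]],
    golden_embeddings: Dict[int, Sequence[float]],
    threshold: float = 0.85,
    llm_router: Optional[Callable[[str], Optional[int]]] = None,
) -> RouteDecision:
    # Step 1: keyword routing (cheapest check first)
    golden_id = keyword_lookup(query)
    if golden_id is not None:
        return RouteDecision("fast_path", "keyword", 1.0, golden_id)

    # Step 2: semantic similarity against stored golden embeddings
    query_vec = embed(query)
    best_id, best_sim = None, 0.0
    for gid, vec in golden_embeddings.items():
        sim = cosine(query_vec, vec)
        if sim > best_sim:
            best_id, best_sim = gid, sim
    if best_id is not None and best_sim >= threshold:
        return RouteDecision("fast_path", "similarity", best_sim, best_id)

    # Step 3: optional LLM router for edge cases
    if llm_router is not None:
        golden_id = llm_router(query)
        if golden_id is not None:
            return RouteDecision("fast_path", "llm", best_sim, golden_id)

    # Default: nothing matched confidently, so run the full RAG pipeline
    return RouteDecision("slow_path", "default", best_sim)
```

Ordering the checks from cheapest to most expensive keeps the common case within the Fast Path latency budget, since the embedding model and the LLM are only invoked when the keyword step misses.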

3. Path Handlers

FastPathHandler: Returns cached responses from golden dataset

  • Increments usage count
  • Returns verified response instantly

SlowPathHandler: Full RAG pipeline

  • Hybrid search (BM25 + vector)
  • Top 20 documents
  • LLM generation with structured output
  • Auto-save high-quality responses to golden dataset

Setup

1. Run Migration

cd backend/hue_portal
python manage.py migrate core

2. Import Initial Golden Dataset

# Import from JSON file
python manage.py manage_golden_dataset import --file golden_queries.json --format json

# Or import from CSV
python manage.py manage_golden_dataset import --file golden_queries.csv --format csv

3. Generate Embeddings (for semantic search)

# Generate embeddings for all queries
python manage.py manage_golden_dataset update_embeddings

# Or for a specific query
python manage.py manage_golden_dataset update_embeddings --query-id 123

Management Commands

Import Queries

python manage.py manage_golden_dataset import \
    --file golden_queries.json \
    --format json \
    --verify-by legal_expert \
    --skip-embeddings  # Skip if embeddings will be generated later

Verify Query

python manage.py manage_golden_dataset verify \
    --query-id 123 \
    --verify-by gpt4 \
    --accuracy 1.0

Update Embeddings

python manage.py manage_golden_dataset update_embeddings \
    --batch-size 10

View Statistics

python manage.py manage_golden_dataset stats

Export Dataset

python manage.py manage_golden_dataset export \
    --file exported_queries.json \
    --active-only

Delete Query

# Soft delete (deactivate)
python manage.py manage_golden_dataset delete --query-id 123 --soft

# Hard delete
python manage.py manage_golden_dataset delete --query-id 123

API Endpoints

Chat Endpoint (unchanged)

POST /api/chatbot/chat/
{
  "message": "Mức phạt vượt đèn đỏ là bao nhiêu?",
  "session_id": "optional-uuid",
  "reset_session": false
}

Response includes routing metadata:

{
  "message": "...",
  "intent": "search_fine",
  "results": [...],
  "_source": "fast_path",  // or "slow_path"
  "_routing": {
    "path": "fast_path",
    "method": "keyword",
    "confidence": 1.0
  },
  "_golden_query_id": 123  // if fast_path
}

Analytics Endpoint

GET /api/chatbot/analytics/?days=7&type=all

Returns:

  • routing: Fast/Slow path statistics
  • golden_dataset: Golden dataset stats
  • performance: P50/P95/P99 response times

Golden Dataset Format

JSON Format

[
  {
    "query": "Mức phạt vượt đèn đỏ là bao nhiêu?",
    "intent": "search_fine",
    "response_message": "Mức phạt vượt đèn đỏ là từ 200.000 - 400.000 VNĐ...",
    "response_data": {
      "message": "...",
      "intent": "search_fine",
      "results": [...],
      "count": 1
    },
    "verified_by": "legal_expert",
    "accuracy_score": 1.0
  }
]

CSV Format

query,intent,response_message,response_data
"Mức phạt vượt đèn đỏ là bao nhiêu?","search_fine","Mức phạt...","{\"message\":\"...\",\"results\":[...]}"

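The CSV row above escapes the quotes inside `response_data` with backslashes rather than doubling them, which Python's `csv` module does not handle with its default settings. The sketch below shows one way to read such a file, assuming the column names from the header above; `load_golden_csv` is a hypothetical helper, not the project's importer.

```python
import csv
import io
import json
from typing import Dict, List


def load_golden_csv(text: str) -> List[Dict]:
    """Parse golden-query CSV text whose embedded quotes are backslash-escaped."""
    reader = csv.DictReader(
        io.StringIO(text),
        escapechar="\\",    # treat \" as a literal quote inside a quoted field
        doublequote=False,  # quotes are not escaped by doubling in this format
    )
    rows = []
    for row in reader:
        raw = row.get("response_data") or ""
        # response_data holds a JSON document serialized into a single cell
        row["response_data"] = json.loads(raw) if raw else {}
        rows.append(row)
    return rows
```

One caveat with this approach: the reader drops every backslash it treats as an escape, so JSON escape sequences such as `\n` inside `response_data` would not survive. Doubling the quotes (`""`) in the file and using the default dialect avoids that problem.
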
Monitoring

Routing Statistics

from hue_portal.chatbot.analytics import get_routing_stats

stats = get_routing_stats(days=7)
print(f"Fast Path: {stats['fast_path_percentage']:.1f}%")
print(f"Slow Path: {stats['slow_path_percentage']:.1f}%")
print(f"Fast Path Avg Time: {stats['fast_path_avg_time_ms']:.1f}ms")
print(f"Slow Path Avg Time: {stats['slow_path_avg_time_ms']:.1f}ms")

Golden Dataset Stats

from hue_portal.chatbot.analytics import get_golden_dataset_stats

stats = get_golden_dataset_stats()
print(f"Active queries: {stats['active_queries']}")
print(f"Embedding coverage: {stats['embedding_coverage']:.1f}%")

Best Practices

1. Building Golden Dataset

  • Start with 50-100 most common queries from logs
  • Verify each response manually or with a strong LLM (GPT-4/Claude)
  • Add queries gradually based on usage patterns
  • Target: 200 queries covering 80% of traffic

2. Verification Process

  • Weekly review: Check top 20 most-used queries
  • Monthly audit: Review all queries for accuracy
  • Update embeddings: When adding new queries
  • Version control: Track changes with version field

3. Tuning Similarity Threshold

  • Default: 0.85 (conservative, high precision)
  • Lower (0.75): More queries go to Fast Path, but risk false matches
  • Higher (0.90): Fewer false matches, but more queries go to Slow Path
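
The trade-off above can be measured rather than guessed if a sample of routed queries has been labeled by hand. The sketch below assumes such a sample exists as `(best_similarity, is_correct_match)` pairs; both the data shape and the `evaluate_threshold` helper are hypothetical.

```python
from typing import List, Optional, Tuple


def evaluate_threshold(
    samples: List[Tuple[float, bool]], threshold: float
) -> Tuple[float, Optional[float]]:
    """Return (fast_path_share, precision) for a candidate threshold.

    samples: (best similarity to any golden query, whether that match was correct).
    precision is None when no sample reaches the threshold.
    """
    routed = [ok for sim, ok in samples if sim >= threshold]
    fast_share = len(routed) / len(samples) if samples else 0.0
    precision = sum(routed) / len(routed) if routed else None
    return fast_share, precision


# Hand-labeled toy sample, for illustration only
samples = [(0.95, True), (0.88, True), (0.80, False), (0.78, True), (0.60, False)]
for t in (0.75, 0.85, 0.90):
    share, precision = evaluate_threshold(samples, t)
    print(f"threshold={t:.2f} fast_path_share={share:.0%} precision={precision}")
```

Lowering the threshold raises the Fast Path share but admits more false matches; for a legal chatbot, a wrong cached answer is usually costlier than a slow correct one, which is the argument for the conservative default.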

4. Auto-Save from Slow Path

Slow Path automatically saves a response to the golden dataset when it meets all of these criteria:

  • Confidence >= 0.95
  • Has results
  • Message length > 50 chars
  • Not already in golden dataset

Review auto-saved queries weekly and verify before activating.
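
The auto-save criteria can be expressed as a single predicate. The sketch below is an illustration only: `should_auto_save` is a hypothetical name, the response is assumed to be a dict with `message` and `results` keys (as in the chat response shown earlier), and the duplicate check is assumed to compare normalized query text.

```python
from typing import Collection, Dict


def should_auto_save(
    response: Dict,
    confidence: float,
    query_normalized: str,
    existing_normalized: Collection[str],
) -> bool:
    """Return True only if every auto-save criterion holds."""
    return (
        confidence >= 0.95                              # high-confidence answer
        and bool(response.get("results"))               # has at least one result
        and len(response.get("message", "")) > 50       # message longer than 50 chars
        and query_normalized not in existing_normalized  # not already in the golden dataset
    )
```

Passing this check should only create a candidate entry; the weekly review described above is what promotes it to an active golden query.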

Troubleshooting

Fast Path not matching

  1. Check if query is normalized correctly
  2. Verify golden query exists: GoldenQuery.objects.filter(query_normalized=...)
  3. Check similarity threshold (may be too high)
  4. Ensure embeddings are generated: update_embeddings

Slow performance

  1. Check routing logs: QueryRoutingLog.objects.filter(route='fast_path')
  2. Verify Fast Path percentage (should be ~80%)
  3. Check embedding model loading time
  4. Monitor database query performance

Low accuracy

  1. Review golden dataset verification
  2. Check accuracy_score of golden queries
  3. Monitor Slow Path responses for quality
  4. Update golden queries with better responses

Expected Performance

  • Fast Path: <200ms (target: <100ms)
  • Slow Path: 4-8s (full RAG pipeline)
  • Overall: 80% of queries under 200ms, 20% of queries in 4-8s
  • Cache Hit Rate: 75-85% (Fast Path usage)
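
To see what the 80/20 split implies for average latency, the targets above can be combined into a weighted mean. This is plain arithmetic on the stated targets, not a measurement; `blended_latency_ms` is a hypothetical helper.

```python
def blended_latency_ms(fast_pct: int, fast_ms: float, slow_ms: float) -> float:
    """Mean latency when fast_pct percent of queries take fast_ms and the rest take slow_ms."""
    return (fast_pct * fast_ms + (100 - fast_pct) * slow_ms) / 100


# Using the upper Fast Path bound (200 ms) and both ends of the Slow Path range
best_case = blended_latency_ms(80, 200, 4000)   # 960.0 ms
worst_case = blended_latency_ms(80, 200, 8000)  # 1760.0 ms
print(f"Mean latency: {best_case:.0f}-{worst_case:.0f} ms")
```

Even at an 80% Fast Path rate, the mean stays dominated by the Slow Path, so raising Fast Path coverage moves the average far more than shaving milliseconds off Fast Path lookups.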

Next Steps

  1. Import initial 200 common queries
  2. Generate embeddings for all queries
  3. Monitor routing statistics for 1 week
  4. Tune similarity threshold based on metrics
  5. Expand golden dataset based on usage patterns