# Dual-Path RAG Architecture

## Overview

Dual-Path RAG is an architecture optimized for legal chatbots that splits query handling into two paths:

- **Fast Path**: Golden dataset (200 common queries) → <200ms, 100% accuracy
- **Slow Path**: Full RAG pipeline → 4-8s, 99.99% accuracy

## Architecture

```
User Query
    ↓
Intent Classification
    ↓
Dual-Path Router
    ├─ Keyword Router (exact/fuzzy match)
    ├─ Semantic Similarity Search (threshold 0.85)
    └─ LLM Router (optional, for edge cases)
    ↓
┌─────────────────┬──────────────────┐
│   Fast Path     │   Slow Path      │
│   (<200ms)      │   (4-8s)         │
│                 │                  │
│ Golden Dataset  │ Full RAG:        │
│ - Exact match   │ - Hybrid Search  │
│ - Fuzzy match   │ - Top 20 docs    │
│ - Similarity    │ - LLM Generation │
│                 │ - Guardrails     │
│ 100% accuracy   │ 99.99% accuracy  │
└─────────────────┴──────────────────┘
    ↓
Response + Routing Log
```

## Components

### 1. Database Models

**GoldenQuery**: Stores verified queries and responses
- `query`, `query_normalized`, `query_embedding`
- `intent`, `response_message`, `response_data`
- `verified_by`, `usage_count`, `accuracy_score`

**QueryRoutingLog**: Logs routing decisions for monitoring
- `route` (fast_path/slow_path)
- `router_method` (keyword/similarity/llm/default)
- `response_time_ms`, `similarity_score`

### 2. Router Components

**KeywordRouter**: Fast keyword-based matching
- Exact match (normalized query)
- Fuzzy match (70% word overlap)
- ~1-5ms latency
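
The two matching modes above can be sketched roughly as follows. This is a minimal illustration, not the actual `KeywordRouter`: the function names, the normalization rules, and the definition of overlap (shared words divided by the golden query's word count) are all assumptions.

```python
import re
import unicodedata


def normalize_query(text: str) -> str:
    """Lowercase, drop punctuation, collapse whitespace (assumed rules)."""
    # NFC keeps Vietnamese letters as single precomposed characters.
    text = unicodedata.normalize("NFC", text).lower()
    text = re.sub(r"[^\w\s]", " ", text)  # \w is Unicode-aware in Python 3
    return " ".join(text.split())


def word_overlap(query: str, golden: str) -> float:
    """Fraction of the golden query's words that also occur in the query."""
    q_words = set(normalize_query(query).split())
    g_words = set(normalize_query(golden).split())
    if not g_words:
        return 0.0
    return len(q_words & g_words) / len(g_words)


def keyword_match(query: str, golden_queries: list[str], min_overlap: float = 0.7):
    """Return (golden_query, method) for the best match, or None."""
    normalized = normalize_query(query)
    best, best_score = None, 0.0
    for golden in golden_queries:
        if normalize_query(golden) == normalized:
            return golden, "exact"
        score = word_overlap(query, golden)
        if score > best_score:
            best, best_score = golden, score
    if best is not None and best_score >= min_overlap:
        return best, "fuzzy"
    return None
```

With this definition, a reordered or shortened question still matches as long as it shares at least 70% of the golden query's words; a production router would likely precompute the normalized forms rather than normalizing on every call.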

**DualPathRouter**: Main router with hybrid logic
- Step 1: Keyword routing (fastest)
- Step 2: Semantic similarity (threshold 0.85)
- Step 3: LLM router fallback (optional)
- Default: Slow Path
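
The cascade above could look something like the sketch below. It is self-contained and uses plain Python lists as embeddings; `RouteDecision`, `route`, the `golden` mapping, and the `llm_router` callback are hypothetical names, and the keyword step is reduced to an exact match for brevity. The confidence reported for the LLM and default branches is also an assumption.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class RouteDecision:
    path: str                       # "fast_path" or "slow_path"
    method: str                     # "keyword", "similarity", "llm", or "default"
    confidence: float
    golden_id: Optional[int] = None


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def route(
    query: str,
    query_embedding: Sequence[float],
    golden: dict,                   # id -> (normalized query, embedding)
    threshold: float = 0.85,
    llm_router: Optional[Callable[[str], Optional[int]]] = None,
) -> RouteDecision:
    normalized = " ".join(query.lower().split())

    # Step 1: keyword routing (exact match only in this sketch).
    for gid, (text, _) in golden.items():
        if text == normalized:
            return RouteDecision("fast_path", "keyword", 1.0, gid)

    # Step 2: semantic similarity against every golden embedding.
    best_id, best_score = None, 0.0
    for gid, (_, embedding) in golden.items():
        score = cosine_similarity(query_embedding, embedding)
        if score > best_score:
            best_id, best_score = gid, score
    if best_id is not None and best_score >= threshold:
        return RouteDecision("fast_path", "similarity", best_score, best_id)

    # Step 3: optional LLM router; it returns a golden id or None.
    if llm_router is not None:
        gid = llm_router(query)
        if gid is not None:
            return RouteDecision("fast_path", "llm", best_score, gid)

    # Default: fall through to the full RAG pipeline.
    return RouteDecision("slow_path", "default", best_score)
```

Ordering the steps from cheapest to most expensive means most Fast Path hits never pay for an embedding comparison or an LLM call.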

### 3. Path Handlers

**FastPathHandler**: Returns cached responses from golden dataset
- Increments usage count
- Returns verified response instantly

**SlowPathHandler**: Full RAG pipeline
- Hybrid search (BM25 + vector)
- Top 20 documents
- LLM generation with structured output
- Auto-save high-quality responses to golden dataset

## Setup

### 1. Run Migration

```bash
cd backend/hue_portal
python manage.py migrate core
```

### 2. Import Initial Golden Dataset

```bash
# Import from JSON file
python manage.py manage_golden_dataset import --file golden_queries.json --format json

# Or import from CSV
python manage.py manage_golden_dataset import --file golden_queries.csv --format csv
```

### 3. Generate Embeddings (for semantic search)

```bash
# Generate embeddings for all queries
python manage.py manage_golden_dataset update_embeddings

# Or for a specific query
python manage.py manage_golden_dataset update_embeddings --query-id 123
```

## Management Commands

### Import Queries

```bash
python manage.py manage_golden_dataset import \
    --file golden_queries.json \
    --format json \
    --verify-by legal_expert \
    --skip-embeddings  # Skip if embeddings will be generated later
```

### Verify Query

```bash
python manage.py manage_golden_dataset verify \
    --query-id 123 \
    --verify-by gpt4 \
    --accuracy 1.0
```

### Update Embeddings

```bash
python manage.py manage_golden_dataset update_embeddings \
    --batch-size 10
```

### View Statistics

```bash
python manage.py manage_golden_dataset stats
```

### Export Dataset

```bash
python manage.py manage_golden_dataset export \
    --file exported_queries.json \
    --active-only
```

### Delete Query

```bash
# Soft delete (deactivate)
python manage.py manage_golden_dataset delete --query-id 123 --soft

# Hard delete
python manage.py manage_golden_dataset delete --query-id 123
```

## API Endpoints

### Chat Endpoint (unchanged)

```
POST /api/chatbot/chat/
{
  "message": "Mức phạt vượt đèn đỏ là bao nhiêu?",
  "session_id": "optional-uuid",
  "reset_session": false
}
```

Response includes routing metadata:
```json
{
  "message": "...",
  "intent": "search_fine",
  "results": [...],
  "_source": "fast_path",  // or "slow_path"
  "_routing": {
    "path": "fast_path",
    "method": "keyword",
    "confidence": 1.0
  },
  "_golden_query_id": 123  // if fast_path
}
```

### Analytics Endpoint

```
GET /api/chatbot/analytics/?days=7&type=all
```

Returns:
- `routing`: Fast/Slow path statistics
- `golden_dataset`: Golden dataset stats
- `performance`: P50/P95/P99 response times
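
For reference, P50/P95/P99 values like those in the `performance` block can be computed from logged `response_time_ms` values as in the sketch below. The helper name `latency_percentiles` is hypothetical, and the endpoint's actual percentile method is not documented here; this version uses the standard library's inclusive quantile method.

```python
import statistics


def latency_percentiles(response_times_ms):
    """Return P50/P95/P99 over the observed samples (inclusive method)."""
    if len(response_times_ms) < 2:
        raise ValueError("need at least two samples")
    # n=100 yields 99 cut points; cuts[k - 1] is the k-th percentile.
    cuts = statistics.quantiles(response_times_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}
```

Because Fast Path and Slow Path latencies differ by more than an order of magnitude, it is usually more informative to compute these per route than over all requests combined.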

## Golden Dataset Format

### JSON Format

```json
[
  {
    "query": "Mức phạt vượt đèn đỏ là bao nhiêu?",
    "intent": "search_fine",
    "response_message": "Mức phạt vượt đèn đỏ là từ 200.000 - 400.000 VNĐ...",
    "response_data": {
      "message": "...",
      "intent": "search_fine",
      "results": [...],
      "count": 1
    },
    "verified_by": "legal_expert",
    "accuracy_score": 1.0
  }
]
```

### CSV Format

```csv
query,intent,response_message,response_data
"Mức phạt vượt đèn đỏ là bao nhiêu?","search_fine","Mức phạt...","{\"message\":\"...\",\"results\":[...]}"
```

## Monitoring

### Routing Statistics

```python
from hue_portal.chatbot.analytics import get_routing_stats

stats = get_routing_stats(days=7)
print(f"Fast Path: {stats['fast_path_percentage']:.1f}%")
print(f"Slow Path: {stats['slow_path_percentage']:.1f}%")
print(f"Fast Path Avg Time: {stats['fast_path_avg_time_ms']:.1f}ms")
print(f"Slow Path Avg Time: {stats['slow_path_avg_time_ms']:.1f}ms")
```

### Golden Dataset Stats

```python
from hue_portal.chatbot.analytics import get_golden_dataset_stats

stats = get_golden_dataset_stats()
print(f"Active queries: {stats['active_queries']}")
print(f"Embedding coverage: {stats['embedding_coverage']:.1f}%")
```

## Best Practices

### 1. Building Golden Dataset

- Start with the 50-100 most common queries from logs
- Verify each response manually or with a strong LLM (GPT-4/Claude)
- Add queries gradually based on usage patterns
- Target: 200 queries covering 80% of traffic

### 2. Verification Process

- **Weekly review**: Check top 20 most-used queries
- **Monthly audit**: Review all queries for accuracy
- **Update embeddings**: When adding new queries
- **Version control**: Track changes with the `version` field

### 3. Tuning Similarity Threshold

- Default: 0.85 (conservative, high precision)
- Lower (0.75): More queries go to Fast Path, but risk false matches
- Higher (0.90): Fewer false matches, but more queries go to Slow Path
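
One way to choose a threshold is to hand-label a sample of logged queries and see how each candidate value trades Fast Path share against false matches. The sketch below assumes such a sample exists as `(best_similarity, match_is_correct)` pairs; the function name and input format are hypothetical.

```python
def sweep_thresholds(samples, thresholds=(0.75, 0.85, 0.90)):
    """Return (threshold, fast_path_share, false_match_rate) per threshold.

    samples: list of (best_similarity, match_is_correct) pairs taken from
    a hand-labeled sample of routing logs (assumed input format).
    """
    rows = []
    for t in thresholds:
        # Labels of the queries that would be routed to Fast Path at t.
        routed = [ok for score, ok in samples if score >= t]
        fast_share = len(routed) / len(samples) if samples else 0.0
        false_rate = routed.count(False) / len(routed) if routed else 0.0
        rows.append((t, fast_share, false_rate))
    return rows
```

For a legal chatbot a false match returns a verified answer to the wrong question, so it generally makes sense to pick the lowest threshold whose false-match rate on the labeled sample is still zero.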

### 4. Auto-Save from Slow Path

Slow Path automatically saves high-quality responses:
- Confidence >= 0.95
- Has results
- Message length > 50 chars
- Not already in golden dataset

Review auto-saved queries weekly and verify before activating.
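
The four criteria above amount to a simple predicate. The sketch below is illustrative only; the function name and argument shapes are assumptions, not the `SlowPathHandler` API.

```python
def should_auto_save(confidence, results, message, normalized_query, existing_normalized):
    """Return True when a Slow Path response meets all auto-save criteria.

    existing_normalized: set of normalized queries already in the golden
    dataset (assumed to be available to the caller).
    """
    return (
        confidence >= 0.95
        and len(results) > 0
        and len(message) > 50
        and normalized_query not in existing_normalized
    )
```

Saving candidates as inactive until a reviewer verifies them keeps unreviewed answers out of the Fast Path.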

## Troubleshooting

### Fast Path not matching

1. Check if query is normalized correctly
2. Verify golden query exists: `GoldenQuery.objects.filter(query_normalized=...)`
3. Check similarity threshold (may be too high)
4. Ensure embeddings are generated: `update_embeddings`

### Slow performance

1. Check routing logs: `QueryRoutingLog.objects.filter(route='fast_path')`
2. Verify Fast Path percentage (should be ~80%)
3. Check embedding model loading time
4. Monitor database query performance

### Low accuracy

1. Review golden dataset verification
2. Check `accuracy_score` of golden queries
3. Monitor Slow Path responses for quality
4. Update golden queries with better responses

## Expected Performance

- **Fast Path**: <200ms (target: <100ms)
- **Slow Path**: 4-8s (full RAG pipeline)
- **Overall**: 80% of queries <200ms, 20% of queries 4-8s
- **Cache Hit Rate**: 75-85% (Fast Path usage)
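
As a quick sanity check on these figures, the blended mean latency follows directly from the traffic split. The sketch below plugs in the numbers above, using the 200ms Fast Path bound as a worst case; the helper name is hypothetical.

```python
def blended_mean_ms(fast_share, fast_ms, slow_ms):
    """Mean latency for a traffic mix of Fast Path and Slow Path requests."""
    return fast_share * fast_ms + (1 - fast_share) * slow_ms


# 80% Fast Path at 200 ms, 20% Slow Path at 4-8 s.
low = blended_mean_ms(0.80, 200, 4000)    # 160 + 800, roughly 960 ms
high = blended_mean_ms(0.80, 200, 8000)   # 160 + 1600, roughly 1760 ms
print(f"Blended mean: {low:.0f}-{high:.0f} ms")
```

The mean is dominated by the Slow Path term, so raising the Fast Path share has far more effect on average latency than shaving milliseconds off the Fast Path itself.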

## Next Steps

1. Import initial 200 common queries
2. Generate embeddings for all queries
3. Monitor routing statistics for 1 week
4. Tune similarity threshold based on metrics
5. Expand golden dataset based on usage patterns