Chatbot Speed and Accuracy Optimization
Created: 2025-01-27
1. Analysis of Current Bottlenecks
1.1 Intent Classification
Problems:
- Loops over many keywords on every call (fine_keywords: 9 items, fine_single_words: 7 items)
- Calls _remove_accents() repeatedly for the same keyword
- No pre-compiled regex patterns
Impact: ~5-10ms per query
1.2 Search Pipeline
Problems:
- list(queryset) loads ALL objects into memory before searching
- TF-IDF vectorization runs over the entire dataset on every call
- No early exit when a good result is found
- Query expansion hits the database on every call
Impact: ~100-500ms for large datasets
1.3 LLM Generation
Problems:
- The prompt is rebuilt on every call (no caching)
- No streaming responses
- max_new_tokens=150 is acceptable but could be tuned further
- Generated responses are not cached
Impact: ~1-5s for the local model, ~2-10s for an API provider
1.4 No Response Caching
Problems:
- The same query is reprocessed from scratch every time
- Search results are not cached
- Intent classification results are not cached
Impact: ~100-500ms wasted on duplicate queries
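These figures are rough estimates. Before investing in the optimizations below, it is worth confirming where the time actually goes. A minimal timing sketch (the `timed` decorator and the stage labels are illustrative, not existing project code) that can be wrapped around `classify_intent`, `search_with_ml`, and the LLM call:

```python
import functools
import logging
import time

logger = logging.getLogger(__name__)


def timed(label):
    """Log how long the wrapped function takes, tagged with a stage label."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                elapsed_ms = (time.perf_counter() - start) * 1000
                logger.info("[PERF] %s took %.1fms", label, elapsed_ms)
        return wrapper
    return decorator
```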
2. Intent Classification Optimization
2.1 Pre-compile Keyword Patterns
# backend/hue_portal/core/chatbot.py
import re
from functools import lru_cache
from typing import Tuple

class Chatbot:
    def __init__(self):
        self.intent_classifier = None
        self.vectorizer = None
        # Pre-compile keyword patterns
        self._compile_keyword_patterns()
        self._train_classifier()

    def _compile_keyword_patterns(self):
        """Pre-compile regex patterns for faster matching."""
        # Fine keywords (multi-word first, then single)
        self.fine_patterns_multi = [
            re.compile(r'\b' + re.escape(kw) + r'\b', re.IGNORECASE)
            for kw in ["mức phạt", "vi phạm", "đèn đỏ", "nồng độ cồn",
                       "mũ bảo hiểm", "tốc độ", "bằng lái", "vượt đèn"]
        ]
        self.fine_patterns_single = [
            re.compile(r'\b' + re.escape(kw) + r'\b', re.IGNORECASE)
            for kw in ["phạt", "vượt", "đèn", "mức"]
        ]
        # Pre-compute accent-free versions
        self.fine_keywords_ascii = [self._remove_accents(kw) for kw in
                                    ["mức phạt", "vi phạm", "đèn đỏ", ...]]
        # Procedure, Office, Advisory patterns...
        # Similar pattern compilation

    @lru_cache(maxsize=1000)
    def classify_intent(self, query: str) -> Tuple[str, float]:
        """Cached intent classification."""
        query_lower = query.lower().strip()
        # Fast path: check compiled patterns
        for pattern in self.fine_patterns_multi:
            if pattern.search(query_lower):
                return ("search_fine", 0.95)
        # ... rest of logic
Benefits:
- Cuts intent classification time by roughly 50%
- Caches results for duplicate queries
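One caveat not covered above: with `@lru_cache` on an instance method, `self` becomes part of the cache key and the cache keeps a reference to the instance, which is only acceptable because the chatbot is created once (as the `get_chatbot()` accessor used in section 12 suggests). The wrapper also exposes hit/miss counters, useful for verifying that duplicate queries really are served from the cache. A usage sketch, assuming that `get_chatbot()` singleton accessor:

```python
chatbot = get_chatbot()  # assumed singleton accessor
chatbot.classify_intent("Mức phạt vượt đèn đỏ là bao nhiêu?")
chatbot.classify_intent("Mức phạt vượt đèn đỏ là bao nhiêu?")  # second call should hit the cache

info = chatbot.classify_intent.cache_info()  # CacheInfo(hits, misses, maxsize, currsize)
print(f"hits={info.hits} misses={info.misses} currsize={info.currsize}")

# If the keyword patterns or the trained classifier are reloaded at runtime, invalidate the cache:
chatbot.classify_intent.cache_clear()
```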
2.2 Early Exit Strategy
def _keyword_based_intent(self, query: str) -> Tuple[str, float]:
    query_lower = query.lower().strip()
    # Fast path: check the most common intents first.
    # Fine queries are the most common → check them first.
    if any(pattern.search(query_lower) for pattern in self.fine_patterns_multi):
        return ("search_fine", 0.95)
    # Early exit for very short queries (likely a greeting)
    if len(query.split()) <= 2:
        if any(greeting in query_lower for greeting in ["xin chào", "chào", "hello"]):
            return ("greeting", 0.9)
    # ... rest
3. Search Pipeline Optimization
3.1 Limit the QuerySet Before Loading
# backend/hue_portal/core/search_ml.py
def search_with_ml(queryset, query, text_fields, top_k=20, min_score=0.1, use_hybrid=True):
    if not query:
        return queryset[:top_k]

    # OPTIMIZATION: limit the queryset early for large datasets.
    # Only search the first N records if the dataset is huge.
    MAX_SEARCH_CANDIDATES = 1000
    total_count = queryset.count()
    if total_count > MAX_SEARCH_CANDIDATES:
        # Use database-level filtering first:
        # try a match on the primary text field.
        primary_field = text_fields[0] if text_fields else None
        if primary_field:
            exact_matches = queryset.filter(
                **{f"{primary_field}__icontains": query}
            )[:top_k * 2]
            if exact_matches.count() >= top_k:
                # Enough direct matches, return them without ML scoring
                return exact_matches[:top_k]
        # Limit candidates for ML search
        queryset = queryset[:MAX_SEARCH_CANDIDATES]

    # Continue with existing search logic...
3.2 Cache Search Results
# backend/hue_portal/core/search_ml.py
from functools import lru_cache
import hashlib
import json


def _get_query_hash(query: str, model_name: str, text_fields: tuple) -> str:
    """Generate hash for query caching."""
    key = f"{query}|{model_name}|{':'.join(text_fields)}"
    return hashlib.md5(key.encode()).hexdigest()


# Cache search results for 1 hour
@lru_cache(maxsize=500)
def _cached_search(query_hash: str, queryset_ids: tuple, top_k: int):
    """Cached search results."""
    # This will be called with the actual queryset in a wrapper
    pass


def search_with_ml(queryset, query, text_fields, top_k=20, min_score=0.1, use_hybrid=True):
    # Check cache first
    query_hash = _get_query_hash(query, queryset.model.__name__, tuple(text_fields))
    # Try to get results from the cache (only valid if the queryset has not changed).
    # Note: full caching requires tracking queryset state.
    # ... existing search logic
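The `lru_cache` sketch above cannot hold Django objects directly without tracking queryset state. One way to complete it, shown here only as a sketch, is to cache the ranked primary keys in Django's cache framework and rehydrate them on a hit; `search_with_ml_cached` and `CACHE_TTL` are illustrative names, not existing functions:

```python
from django.core.cache import cache

CACHE_TTL = 3600  # seconds


def search_with_ml_cached(queryset, query, text_fields, top_k=20, **kwargs):
    """Wrap search_with_ml with a Django-cache layer keyed on the query hash."""
    cache_key = (
        f"mlsearch:{_get_query_hash(query, queryset.model.__name__, tuple(text_fields))}:{top_k}"
    )
    cached_ids = cache.get(cache_key)
    if cached_ids is not None:
        # Rehydrate objects from primary keys, preserving the cached ranking order
        objs = {obj.pk: obj for obj in queryset.model.objects.filter(pk__in=cached_ids)}
        return [objs[pk] for pk in cached_ids if pk in objs]

    results = search_with_ml(queryset, query, text_fields, top_k=top_k, **kwargs)
    cache.set(cache_key, [obj.pk for obj in results], CACHE_TTL)
    return results
```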
3.3 Optimize TF-IDF Calculation
# Pre-compute TF-IDF vectors for common queries.
# Use incremental TF-IDF instead of recalculating.
from typing import List

from sklearn.feature_extraction.text import TfidfVectorizer
import numpy as np


class CachedTfidfVectorizer:
    """TF-IDF vectorizer with caching."""

    def __init__(self):
        self.vectorizer = None
        self.doc_vectors = None
        self.doc_ids = None

    def fit_transform_cached(self, documents: List[str], doc_ids: List[int]):
        """Fit and cache document vectors."""
        if self.doc_ids == tuple(doc_ids):
            # Same documents, reuse cached vectors
            return self.doc_vectors
        # New documents, recompute
        self.vectorizer = TfidfVectorizer(
            analyzer='word',
            ngram_range=(1, 2),
            min_df=1,
            max_df=0.95,
            lowercase=True
        )
        self.doc_vectors = self.vectorizer.fit_transform(documents)
        self.doc_ids = tuple(doc_ids)
        return self.doc_vectors
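For completeness, a usage sketch of how the cached vectors would be consumed when ranking candidates; `rank_documents` and the module-level `_tfidf_cache` are illustrative names, and the cosine similarity comes from scikit-learn:

```python
from sklearn.metrics.pairwise import cosine_similarity

_tfidf_cache = CachedTfidfVectorizer()


def rank_documents(documents, doc_ids, query, top_k=20):
    """Score documents against the query using cached TF-IDF vectors."""
    doc_vectors = _tfidf_cache.fit_transform_cached(documents, doc_ids)
    query_vector = _tfidf_cache.vectorizer.transform([query])
    scores = cosine_similarity(query_vector, doc_vectors)[0]
    ranked = sorted(zip(doc_ids, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]
```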
3.4 Early Exit on Exact Match
def search_with_ml(queryset, query, text_fields, top_k=20, min_score=0.1, use_hybrid=True):
    # OPTIMIZATION: check exact matches first (fastest path)
    query_normalized = normalize_text(query)

    # Try an exact match on the primary field
    primary_field = text_fields[0] if text_fields else None
    if primary_field:
        exact_qs = queryset.filter(**{f"{primary_field}__iexact": query})
        if exact_qs.exists():
            # Found an exact match, return immediately
            return exact_qs[:top_k]

        # Try case-insensitive contains (faster than ML)
        contains_qs = queryset.filter(**{f"{primary_field}__icontains": query})
        if 0 < contains_qs.count() <= top_k * 2:
            # Small non-empty result set, return directly;
            # an empty set should still fall through to ML search
            return contains_qs[:top_k]

    # Only use ML search if there are no good direct matches
    # ... existing ML search logic
4. LLM Generation Optimization
4.1 Prompt Caching
# backend/hue_portal/chatbot/llm_integration.py
import hashlib
import time
from functools import lru_cache
from typing import Any, Dict, List, Optional


class LLMGenerator:
    def __init__(self, provider: Optional[str] = None):
        self.provider = provider or LLM_PROVIDER
        self.prompt_cache = {}    # Cache prompts by hash
        self.response_cache = {}  # Cache responses

    def _get_prompt_hash(self, query: str, documents: List[Any]) -> str:
        """Generate hash for prompt caching."""
        doc_ids = [getattr(doc, 'id', None) for doc in documents[:5]]
        key = f"{query}|{doc_ids}"
        return hashlib.md5(key.encode()).hexdigest()

    def generate_answer(self, query: str, context: Optional[List[Dict]], documents: Optional[List[Any]]):
        if not self.is_available():
            return None

        # Check cache first
        prompt_hash = self._get_prompt_hash(query, documents or [])
        if prompt_hash in self.response_cache:
            cached_response = self.response_cache[prompt_hash]
            # Check whether the cached entry is still valid (< 1 hour old)
            if cached_response.get('timestamp', 0) > time.time() - 3600:
                return cached_response['response']

        # Build the prompt (may itself be cached)
        prompt = self._build_prompt(query, context, documents)
        response = self._generate_from_prompt(prompt, context=context)

        # Cache the response
        if response:
            self.response_cache[prompt_hash] = {
                'response': response,
                'timestamp': time.time()
            }
        return response
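One detail the sketch above leaves open: `response_cache` is a plain dict and grows without bound in a long-lived process. A hypothetical helper (not existing code) that caps its size with a simple oldest-first eviction:

```python
MAX_CACHE_ENTRIES = 1000  # illustrative limit


def _store_response(self, prompt_hash: str, response: str) -> None:
    """Store a response, evicting the oldest ~10% of entries when the cache is full."""
    if len(self.response_cache) >= MAX_CACHE_ENTRIES:
        oldest = sorted(self.response_cache.items(), key=lambda item: item[1]['timestamp'])
        for key, _ in oldest[: max(1, MAX_CACHE_ENTRIES // 10)]:
            del self.response_cache[key]
    self.response_cache[prompt_hash] = {'response': response, 'timestamp': time.time()}
```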
4.2 Optimize Local Model Generation
def _generate_local(self, prompt: str) -> Optional[str]:
    # OPTIMIZATION: use faster generation parameters
    with torch.no_grad():
        outputs = self.local_model.generate(
            **inputs,
            max_new_tokens=100,  # Reduced from 150
            temperature=0.5,     # Note: ignored when do_sample=False
            top_p=0.8,           # Note: ignored when do_sample=False
            do_sample=False,     # Greedy decoding (faster, deterministic)
            use_cache=True,
            pad_token_id=self.local_tokenizer.eos_token_id,
            repetition_penalty=1.1,
            # OPTIMIZATION: stop as soon as the model emits EOS
            eos_token_id=self.local_tokenizer.eos_token_id,
        )
4.3 Streaming Response (for better UX)
# For API endpoints, support streaming
def generate_answer_streaming(self, query: str, context, documents):
    """Generate an answer with streaming for better UX."""
    prompt = self._build_prompt(query, context, documents)
    if self.provider == LLM_PROVIDER_LOCAL:
        # Stream tokens from the local model
        for token in self._generate_local_streaming(prompt):
            yield token
    elif self.provider == LLM_PROVIDER_OPENAI:
        # Use the OpenAI streaming API
        for chunk in self.client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
            stream=True,
        ):
            delta = chunk.choices[0].delta.content
            if delta:  # some chunks carry no content
                yield delta
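DRF's `Response` does not stream, so exposing this generator over HTTP would use Django's `StreamingHttpResponse` in a plain view. A minimal sketch, assuming an `llm_generator` instance and URL wiring that do not exist yet:

```python
from django.http import StreamingHttpResponse


def chat_stream(request):
    """Stream generated tokens to the client as plain text (sketch)."""
    query = request.GET.get("q", "")
    token_stream = llm_generator.generate_answer_streaming(query, context=None, documents=None)
    return StreamingHttpResponse(token_stream, content_type="text/plain; charset=utf-8")
```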
5. Response Caching Strategy
5.1 Multi-level Caching
# backend/hue_portal/core/cache_utils.py
from functools import lru_cache
from django.core.cache import cache
import hashlib
import json


class ChatbotCache:
    """Multi-level caching for chatbot responses."""

    CACHE_TIMEOUT = 3600  # 1 hour

    @staticmethod
    def get_cache_key(query: str, intent: str, session_id: str = None) -> str:
        """Generate cache key."""
        key_parts = [query.lower().strip(), intent]
        if session_id:
            key_parts.append(session_id)
        key_str = "|".join(key_parts)
        return f"chatbot:{hashlib.md5(key_str.encode()).hexdigest()}"

    @staticmethod
    def get_cached_response(query: str, intent: str, session_id: str = None):
        """Get cached response."""
        cache_key = ChatbotCache.get_cache_key(query, intent, session_id)
        return cache.get(cache_key)

    @staticmethod
    def set_cached_response(query: str, intent: str, response: dict, session_id: str = None):
        """Cache response."""
        cache_key = ChatbotCache.get_cache_key(query, intent, session_id)
        cache.set(cache_key, response, ChatbotCache.CACHE_TIMEOUT)

    @staticmethod
    def get_cached_search_results(query: str, model_name: str, text_fields: tuple):
        """Get cached search results."""
        key = f"search:{hashlib.md5(f'{query}|{model_name}|{text_fields}'.encode()).hexdigest()}"
        return cache.get(key)

    @staticmethod
    def set_cached_search_results(query: str, model_name: str, text_fields: tuple, results):
        """Cache search results."""
        key = f"search:{hashlib.md5(f'{query}|{model_name}|{text_fields}'.encode()).hexdigest()}"
        cache.set(key, results, ChatbotCache.CACHE_TIMEOUT)
5.2 Integration into the Chatbot
# backend/hue_portal/core/chatbot.py
from .cache_utils import ChatbotCache


class Chatbot:
    def generate_response(self, query: str, session_id: str = None) -> Dict[str, Any]:
        query = query.strip()

        # Classify intent
        intent, confidence = self.classify_intent(query)

        # Check cache first
        cached_response = ChatbotCache.get_cached_response(query, intent, session_id)
        if cached_response:
            return cached_response

        # ... existing logic

        # Cache the response before returning
        response = {
            "message": message,
            "intent": intent,
            "confidence": confidence,
            "results": search_result["results"],
            "count": search_result["count"]
        }
        ChatbotCache.set_cached_response(query, intent, response, session_id)
        return response
6. Query Expansion Optimization
6.1 Cache Synonyms
# backend/hue_portal/core/search_ml.py
from functools import lru_cache
from typing import List

from django.core.cache import cache


@lru_cache(maxsize=1)
def get_all_synonyms():
    """Get all synonyms (cached)."""
    return list(Synonym.objects.all())


def expand_query_with_synonyms(query: str) -> List[str]:
    """Expand query using cached synonyms."""
    query_normalized = normalize_text(query)
    expanded = [query_normalized]

    # Use cached synonyms
    synonyms = get_all_synonyms()
    for synonym in synonyms:
        keyword = normalize_text(synonym.keyword)
        alias = normalize_text(synonym.alias)
        if keyword in query_normalized:
            expanded.append(query_normalized.replace(keyword, alias))
        if alias in query_normalized:
            expanded.append(query_normalized.replace(alias, keyword))
    return list(set(expanded))
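Because `lru_cache(maxsize=1)` never expires on its own, the synonym cache should be invalidated whenever the Synonym table changes. A sketch using Django signals; the registration point and the import path for `Synonym` are assumptions:

```python
from django.db.models.signals import post_save, post_delete
from django.dispatch import receiver

from hue_portal.core.models import Synonym  # assumed model location


@receiver([post_save, post_delete], sender=Synonym)
def _invalidate_synonym_cache(sender, **kwargs):
    # lru_cache exposes cache_clear() on the wrapped function
    get_all_synonyms.cache_clear()
```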
7. Database Query Optimization
7.1 Use select_related / prefetch_related
# backend/hue_portal/core/chatbot.py
def search_by_intent(self, intent: str, query: str, limit: int = 5):
    if intent == "search_fine":
        qs = Fine.objects.all().select_related('decree')  # If the model has this FK
        # ... rest
    elif intent == "search_legal":
        qs = LegalSection.objects.all().select_related('document')
        # ... rest
7.2 Add Database Indexes
# backend/hue_portal/core/models.py
class Fine(models.Model):
    name = models.CharField(max_length=500, db_index=True)  # Add index
    code = models.CharField(max_length=50, db_index=True)   # Add index

    class Meta:
        indexes = [
            models.Index(fields=['name', 'code']),
            models.Index(fields=['min_fine', 'max_fine']),
        ]
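These index changes only take effect after a migration. For the `Meta.indexes` part, `makemigrations` should produce something close to the following (a sketch; the dependency entry and auto-generated index names will differ):

```python
from django.db import migrations, models


class Migration(migrations.Migration):
    dependencies = [("core", "0001_initial")]  # placeholder dependency

    operations = [
        migrations.AddIndex(
            model_name="fine",
            index=models.Index(fields=["name", "code"], name="core_fine_name_code_idx"),
        ),
        migrations.AddIndex(
            model_name="fine",
            index=models.Index(fields=["min_fine", "max_fine"], name="core_fine_range_idx"),
        ),
    ]
```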
8. Frontend Optimization
8.1 Debounce Search Input
// frontend/src/pages/Chat.tsx
const [input, setInput] = useState('')
const debouncedInput = useDebounce(input, 300) // Wait 300ms after the last keystroke

useEffect(() => {
  if (debouncedInput) {
    // Trigger search suggestions
  }
}, [debouncedInput])
8.2 Optimistic UI Updates
const handleSend = async (messageText?: string) => {
  const textToSend = messageText ?? input
  // Show the user message immediately (optimistic update)
  setMessages(prev => [...prev, {
    role: 'user',
    content: textToSend,
    timestamp: new Date()
  }])
  // Then fetch the response
  const response = await chat(textToSend, sessionId)
  // Update the message list with the actual response
}
9. Monitoring & Metrics
9.1 Add Performance Logging
# backend/hue_portal/chatbot/views.py
import time
from django.utils import timezone


@api_view(["POST"])
def chat(request: Request) -> Response:
    start_time = time.time()

    # ... existing logic

    # Log performance metrics
    elapsed = time.time() - start_time
    logger.info(f"[PERF] Chat response time: {elapsed:.3f}s | Intent: {intent} | Results: {count}")

    # Track slow queries
    if elapsed > 2.0:
        logger.warning(f"[SLOW] Query took {elapsed:.3f}s: {message[:100]}")

    return Response(response)
9.2 Track Cache Hit Rate
class ChatbotCache:
    cache_hits = 0
    cache_misses = 0

    @staticmethod
    def get_cached_response(query: str, intent: str, session_id: str = None):
        cached = cache.get(ChatbotCache.get_cache_key(query, intent, session_id))
        if cached:
            ChatbotCache.cache_hits += 1
            return cached
        ChatbotCache.cache_misses += 1
        return None

    @staticmethod
    def get_cache_stats():
        total = ChatbotCache.cache_hits + ChatbotCache.cache_misses
        if total == 0:
            return {"hit_rate": 0, "hits": 0, "misses": 0}
        return {
            "hit_rate": ChatbotCache.cache_hits / total,
            "hits": ChatbotCache.cache_hits,
            "misses": ChatbotCache.cache_misses
        }
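To make the hit rate visible without attaching a debugger, the stats could be exposed through a small debug endpoint; `cache_stats` and its URL wiring are hypothetical:

```python
from django.http import JsonResponse


def cache_stats(request):
    """Return chatbot cache hit/miss statistics as JSON (debug only)."""
    return JsonResponse(ChatbotCache.get_cache_stats())
```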
10. Expected Performance Improvements
| Optimization | Current | Optimized | Improvement |
|---|---|---|---|
| Intent Classification | 5-10ms | 1-3ms | 70% faster |
| Search (small dataset) | 50-100ms | 10-30ms | 70% faster |
| Search (large dataset) | 200-500ms | 50-150ms | 70% faster |
| LLM Generation (cached) | 1-5s | 0.01-0.1s | 99% faster |
| LLM Generation (uncached) | 1-5s | 0.8-4s | 20% faster |
| Total Response (cached) | 100-500ms | 10-50ms | 90% faster |
| Total Response (uncached) | 1-6s | 0.5-3s | 50% faster |
11. Implementation Priority
Phase 1: Quick Wins (1-2 days)
- ✅ Add response caching (Django cache)
- ✅ Pre-compile keyword patterns
- ✅ Cache synonyms
- ✅ Add database indexes
- ✅ Early exit for exact matches
Phase 2: Medium Impact (3-5 days)
- ✅ Limit QuerySet before loading
- ✅ Optimize TF-IDF calculation
- ✅ Prompt caching for LLM
- ✅ Optimize local model generation
- ✅ Add performance logging
Phase 3: Advanced (1-2 weeks)
- ✅ Streaming responses
- ✅ Incremental TF-IDF
- ✅ Advanced caching strategies
- ✅ Query result pre-computation
12. Testing Performance
# backend/scripts/benchmark_chatbot.py
import time
import statistics


def benchmark_chatbot():
    chatbot = get_chatbot()
    test_queries = [
        "Mức phạt vượt đèn đỏ là bao nhiêu?",
        "Thủ tục đăng ký cư trú cần gì?",
        "Địa chỉ công an phường ở đâu?",
        # ... more queries
    ]

    times = []
    for query in test_queries:
        start = time.time()
        response = chatbot.generate_response(query)
        elapsed = time.time() - start
        times.append(elapsed)
        print(f"Query: {query[:50]}... | Time: {elapsed:.3f}s")

    print(f"\nAverage: {statistics.mean(times):.3f}s")
    print(f"Median: {statistics.median(times):.3f}s")
    print(f"P95: {statistics.quantiles(times, n=20)[18]:.3f}s")
Conclusion
With the optimizations above, the chatbot should be:
- 50-90% faster for cached queries
- 20-70% faster for uncached queries
- More accurate, thanks to early exit and exact matching
- More scalable, thanks to database indexes and query limiting