Intent Classification Improvements

This document describes the improvements made to intent classification in Plan 5.

The query "Cảnh báo lừa đảo giả danh công an" ("Warning about scams impersonating the police") was being classified as search_office instead of search_advisory.

- Keyword conflict: the keyword "công an" ("police") appears both in search_office queries and in search_advisory queries.
- Order of checks: the code checked has_office_keywords before has_advisory_keywords, so office keywords matched first.
- Limited training data: the search_advisory intent had only 7 examples, fewer than other intents.

File: backend/hue_portal/chatbot/chatbot.py

- Changed the order of checks: has_advisory_keywords is now checked before has_office_keywords.
- Added more advisory keywords: "mạo danh" (impersonation), "thủ đoạn" (tricks), "cảnh giác" (vigilance).
- As a result, a query that contains both advisory and office keywords is matched as an advisory query.
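
The reordering described above can be sketched as follows. This is a minimal illustration, not the actual code in chatbot.py: the function name `classify_by_keywords`, the helper `_norm`, and the keyword lists are assumptions. Only "công an", "mạo danh", "thủ đoạn", and "cảnh giác" are named in this document; the remaining keywords are made up for the example.

```python
import unicodedata
from typing import Optional


def _norm(text: str) -> str:
    """Normalize to NFC and lowercase so Vietnamese diacritics compare reliably."""
    return unicodedata.normalize("NFC", text).lower()


# Hypothetical keyword lists (assumption: the real lists live in chatbot.py).
ADVISORY_KEYWORDS = tuple(
    _norm(k) for k in ("cảnh báo", "lừa đảo", "mạo danh", "thủ đoạn", "cảnh giác")
)
OFFICE_KEYWORDS = tuple(_norm(k) for k in ("công an", "trụ sở", "địa chỉ"))


def classify_by_keywords(query: str) -> Optional[str]:
    """Return an intent from keyword rules, or None when no rule matches."""
    text = _norm(query)
    has_advisory_keywords = any(k in text for k in ADVISORY_KEYWORDS)
    has_office_keywords = any(k in text for k in OFFICE_KEYWORDS)

    # Advisory is checked first, so a query containing both kinds of
    # keyword resolves to search_advisory rather than search_office.
    if has_advisory_keywords:
        return "search_advisory"
    if has_office_keywords:
        return "search_office"
    return None
```

With this ordering, a query that mentions "công an" alongside an advisory keyword no longer falls through to the office branch, while a query that only mentions "công an" still resolves to search_office.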

File: backend/hue_portal/chatbot/training/intent_dataset.json

Expanded search_advisory examples from 7 to 23 examples
Added specific examples:
- "cảnh báo lừa đảo giả danh công an"
- "mạo danh cán bộ công an"
- "lừa đảo mạo danh"
- And 15 more variations
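
A quick way to spot an under-represented intent before retraining is to count examples per intent. The sketch below assumes a flat list of `{"text", "intent"}` records; the real layout of intent_dataset.json is not shown in this document, and the search_office example and both function names are invented for illustration.

```python
import json
from collections import Counter

# Assumed schema: a list of {"text": ..., "intent": ...} records.
# The search_office row is a made-up placeholder.
SAMPLE_DATASET = json.loads("""
[
  {"text": "cảnh báo lừa đảo giả danh công an", "intent": "search_advisory"},
  {"text": "mạo danh cán bộ công an", "intent": "search_advisory"},
  {"text": "lừa đảo mạo danh", "intent": "search_advisory"},
  {"text": "địa chỉ công an phường", "intent": "search_office"}
]
""")


def examples_per_intent(records):
    """Count how many training examples each intent has."""
    return Counter(record["intent"] for record in records)


def underrepresented(records, min_examples):
    """Return the intents with fewer than min_examples examples, sorted by name."""
    counts = examples_per_intent(records)
    return sorted(intent for intent, n in counts.items() if n < min_examples)
```

Calling `underrepresented(SAMPLE_DATASET, 2)` on this sample flags search_office, since it has a single example.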

Test queries that now work correctly:

Added _serialize_document in backend/hue_portal/chatbot/chatbot.py so RAG responses return JSON-safe payloads (no more TypeError: Object of type type is not JSON serializable when embeddings include model instances).
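
The kind of conversion such a helper performs can be sketched as below. This is not the actual _serialize_document implementation, which is not shown here; `to_json_safe` is a hypothetical name, and the choice to replace classes with their names and other unknown objects with `str()` is an assumption.

```python
import json
from typing import Any

# Types that json.dumps handles natively as scalar values.
JSON_SCALARS = (str, int, float, bool, type(None))


def to_json_safe(value: Any) -> Any:
    """Recursively convert a value into something json.dumps accepts."""
    if isinstance(value, JSON_SCALARS):
        return value
    if isinstance(value, dict):
        # JSON object keys must be strings.
        return {str(k): to_json_safe(v) for k, v in value.items()}
    if isinstance(value, (list, tuple, set)):
        return [to_json_safe(v) for v in value]
    if isinstance(value, type):
        # A class object (for example a model class) triggers
        # "Object of type type is not JSON serializable"; keep its name only.
        return value.__name__
    # Fallback for any other object: use its string representation.
    return str(value)
```

For example, a document dict that carries a class under some key can be passed through `to_json_safe` and then to `json.dumps` without raising `TypeError`.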
Re-tested intents end-to-end via scripts/test_api_endpoint.py (6 queries spanning all intents):
- Result: 6/6 passed, 100% intent accuracy.
- Latency: average ~3.7 s; the first call warms up keepitreal/vietnamese-sbert-v2, and subsequent calls take ≤1.8 s.
Health checklist before testing:
1. POSTGRES_HOST=localhost POSTGRES_PORT=5433 ../../.venv/bin/python manage.py runserver 0.0.0.0:8090
2. API_BASE_URL=http://localhost:8090 python scripts/test_api_endpoint.py
3. Watch server logs for any serialization warnings (none observed after fix).
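
The pass count and average latency reported above can be computed with a small harness like the one below. It is a sketch only and does not reproduce scripts/test_api_endpoint.py: `evaluate_intents` and its `classify` argument are hypothetical, and `classify` stands in for whatever function sends a query to the API and returns the predicted intent.

```python
import time
from typing import Callable, Dict, Iterable, Tuple


def evaluate_intents(
    classify: Callable[[str], str],
    cases: Iterable[Tuple[str, str]],
) -> Dict[str, float]:
    """Run each (query, expected_intent) case and summarize accuracy and latency."""
    passed = 0
    latencies = []
    for query, expected in cases:
        start = time.perf_counter()
        predicted = classify(query)
        latencies.append(time.perf_counter() - start)
        if predicted == expected:
            passed += 1

    total = len(latencies)
    return {
        "passed": passed,
        "total": total,
        "accuracy": passed / total if total else 0.0,
        "avg_latency_s": sum(latencies) / total if total else 0.0,
    }
```

Because the first request includes model warm-up, it may be worth reporting the first-call latency separately from the average of the remaining calls.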

- backend/hue_portal/chatbot/training/intent_dataset.json: enhanced training data
- backend/hue_portal/chatbot/chatbot.py: improved keyword matching logic
- backend/hue_portal/chatbot/training/artifacts/intent_model.joblib: retrained model