YanBoChen committed · Commit bb34199 · 1 parent: d603ef9
Update deployment documentation and implement cloud data loading for customization pipeline
- Revise deployment steps for HuggingFace Spaces
- Add cloud data loading functionality in customization pipeline
- Create generate_mapping_json.py for mapping CSV to JSON
- Implement tests for customization and full integration pipelines
Next_after_pushlargefiles_to_dataset.md
CHANGED
@@ -1,50 +1,333 @@
(Removed lines: only the following fragments of the previous version survive.)
-2. **Test**: standalone functionality of the cloud loader
-3. **Confirm**: Dataset connection works
-2. **Test**: cloud loading in the core retrieval system
-3. **Confirm**: General pipeline works normally
-- **Incremental integration**: make sure existing functionality is not broken
-- **Fast rollback**: revert immediately if a problem appears
+# Next Steps: Deploy to HuggingFace Spaces - Phase 5
+
+## 📋 Current Status
+
+### ✅ Completed stages:
+- **Stage 1**: Basic cloud loader functionality ✅
+- **Stage 2**: General Pipeline cloud loading ✅
+- **Stage 3**: Customization Pipeline cloud loading ✅
+- **Stage 4**: Full integration test ✅
+
+### 📊 Cloud loading verification:
+- **Dataset Repository**: `ybchen928/oncall-guide-ai-models`
+- **Total data size**: ~1.6GB (models/ + customization_data/)
+- **Download performance**: ~2 minutes on first run, served from cache afterwards
+- **Functional completeness**: both pipelines work normally
+
+---
+
+## Phase 5: Deploy to HuggingFace Spaces
+
+### 🎯 Deployment goal
+**Push the code, without large files, to Spaces and load the data from the cloud**
+
+---
+
+## Step 5.1: Prepare the deployment files
+
+### Check the current Git status
+```bash
+# Confirm you are on the correct branch
+git branch
+# Expected output: * HuggingFace_space_dataset_deployment
+
+# Check file status
+git status
+```
+
+### Confirm the list of files to deploy
+**✅ Files that must be included:**
+```
+├── README.md ✅ (Spaces configuration)
+├── app.py ✅ (main application)
+├── requirements.txt ✅ (dependency list)
+├── .gitattributes ✅ (Git configuration)
+└── src/ ✅ (core code)
+    ├── user_prompt.py
+    ├── retrieval.py (modified to support cloud loading)
+    ├── generation.py
+    ├── llm_clients.py
+    ├── medical_conditions.py
+    ├── data_processing.py
+    └── cloud_loader.py (new)
+```
+
+**✅ Customization code:**
+```
+└── customization/ ✅ (code only)
+    ├── customization_pipeline.py (modified to support cloud loading)
+    ├── generate_embeddings.py
+    ├── test/
+    └── src/ (20 .py files)
+        ├── cloud_config.py (new)
+        ├── indexing/storage.py
+        ├── indexing/annoy_manager.py
+        └── other code files
+```
+
+**❌ Files that must not be included:**
+```
+❌ models/ (moved to Dataset)
+❌ customization/processing/ (moved to Dataset)
+❌ evaluation/, tests/, docs/ (development only)
+❌ dataset/, onCallGuideAIvenv/ (local environment)
+❌ .env (sensitive information)
+❌ test_stage*.py (test scripts)
+```
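The exclusion list above can be checked mechanically before committing. The following is a minimal sketch, assuming the staged paths are available as a list of strings (for example from `git diff --cached --name-only`). The function name `find_forbidden_paths`, the pattern list, and the sample file names are illustrative and not part of the repository.

```python
from fnmatch import fnmatchcase

# Patterns derived from the "must not be included" list; adjust as needed.
FORBIDDEN_PATTERNS = [
    "models/*",
    "customization/processing/*",
    "evaluation/*",
    "tests/*",
    "docs/*",
    "dataset/*",
    "onCallGuideAIvenv/*",
    ".env",
    "test_stage*.py",
]


def find_forbidden_paths(paths):
    """Return the paths that match any excluded pattern, in input order."""
    return [
        path
        for path in paths
        if any(fnmatchcase(path, pattern) for pattern in FORBIDDEN_PATTERNS)
    ]


if __name__ == "__main__":
    # Hypothetical staged files, for illustration only.
    staged = ["app.py", "src/cloud_loader.py", "models/index.ann", ".env"]
    for path in find_forbidden_paths(staged):
        print(f"should not be committed: {path}")
```

Patterns are matched against the whole repository-relative path, so `docs/*` does not flag files under `customization/docs/`.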
+
+---
+
+## Step 5.2: Verify the configuration files
+
+### Check requirements.txt
+```bash
+grep "huggingface-hub" requirements.txt
+# Make sure it contains: huggingface-hub>=0.33,<0.35
+```
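The same check can be scripted so that it fails loudly when the pin is missing. This is a rough sketch under the assumption that `requirements.txt` uses plain `name<specifier>` lines. The helper `find_requirement` is hypothetical, and it only extracts the specifier text; it does not evaluate version ranges.

```python
import re


def find_requirement(requirements_text, package):
    """Return the version specifier written for `package`, or None if absent."""
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        match = re.match(r"([A-Za-z0-9_.\-]+)\s*(.*)", line)
        if match and match.group(1).lower() == package.lower():
            return match.group(2).strip()
    return None


if __name__ == "__main__":
    # Sample content for illustration; the real file is read from disk in practice.
    text = "gradio==5.38.0\nhuggingface-hub>=0.33,<0.35  # pinned for Spaces\n"
    print(find_requirement(text, "huggingface-hub"))
```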
+
+### Check the README.md YAML frontmatter
+```yaml
+---
+title: OnCall.ai - Medical Emergency Assistant
+emoji: 🏥
+colorFrom: red
+colorTo: blue
+sdk: gradio
+sdk_version: "5.38.0"
+app_file: app.py
+python_version: "3.11"
+pinned: false
+license: mit
+tags:
+- medical
+- healthcare
+- RAG
+- emergency
+- clinical-guidance
+- gradio
+---
+```
+
+### Check the environment variable settings
+**Confirm in Spaces Settings:**
+- `HF_TOKEN`: your HuggingFace API token ✅ (already set)
+- `USE_CLOUD_DATA`: `true` (automatically uses cloud mode)
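How `cloud_loader.py` interprets `USE_CLOUD_DATA` is not shown in this commit, so the following is only a sketch of one reasonable way to turn the variable into a boolean. The function name `use_cloud_data` and the set of accepted spellings are assumptions.

```python
import os


def use_cloud_data(environ=None, default=False):
    """Interpret the USE_CLOUD_DATA environment variable as a boolean flag."""
    env = os.environ if environ is None else environ
    value = env.get("USE_CLOUD_DATA")
    if value is None:
        return default
    # Accepted "true" spellings are an assumption, not taken from the repository.
    return value.strip().lower() in {"1", "true", "yes", "on"}


if __name__ == "__main__":
    print("cloud mode:", use_cloud_data())
```

Passing the environment as a parameter keeps the function easy to test without mutating `os.environ`.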
+
+---
+
+## Step 5.3: Prepare the Git commit
+
+### Temporarily move the local large-file folder out of the way
+```bash
+# Back up the local processing folder (if it exists)
+if [ -d "customization/processing" ]; then
+    mv customization/processing customization_processing_backup
+    echo "✅ Backed up customization/processing"
+fi
+
+# Confirm the large files are not tracked by Git
+git status | grep "models\|processing"
+# You should not see any models/ or processing/ files
+```
+
+### Stage the modified code files
+```bash
+# Add all code changes
+git add src/cloud_loader.py
+git add src/retrieval.py
+git add customization/src/cloud_config.py
+git add customization/customization_pipeline.py
+git add requirements.txt
+git add README.md
+git add app.py
+git add .gitattributes
+
+# Check the staging area
+git status
+```
+
+---
+
+## Step 5.4: Commit and push to Spaces
+
+### Git commit
+```bash
+git commit -m "Implement cloud data loading for HuggingFace Spaces deployment
+
+- Add cloud_loader.py for core system data loading
+- Add customization cloud_config.py for hospital-specific data
+- Modify retrieval.py to use cloud data loading
+- Modify customization_pipeline.py to use preloading
+- Support both local and cloud deployment modes
+- Tested with full integration verification"
+```
+
+### Check the push target
+```bash
+# Confirm the remote configuration
+git remote -v
+# Expected: hf git@hf.co:spaces/ybchen928/oncall-guide-ai
+
+# Check the size of the files to be pushed
+du -sh src/ customization/src/ *.py *.txt *.md
+# Make sure the total is < 1GB
+```
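`du -sh` prints one size per argument, so the total still has to be added up by hand. A small sketch that sums the sizes and compares them with the 1GB limit is shown below. The function name `total_size_bytes` is illustrative, and it measures apparent file sizes, which can differ slightly from the disk usage that `du` reports.

```python
import os

ONE_GB = 1024 ** 3


def total_size_bytes(paths):
    """Sum the sizes of the given files and of all files under the given directories."""
    total = 0
    for path in paths:
        if os.path.isfile(path):
            total += os.path.getsize(path)
            continue
        for root, _dirs, files in os.walk(path):
            for name in files:
                full_path = os.path.join(root, name)
                if not os.path.islink(full_path):  # skip symlinks
                    total += os.path.getsize(full_path)
    return total


if __name__ == "__main__":
    size = total_size_bytes(["src", "customization/src", "app.py", "requirements.txt", "README.md"])
    status = "OK" if size < ONE_GB else "too large"
    print(f"{size / 1024 ** 2:.1f} MB ({status})")
```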
+
+### Push to Spaces
+```bash
+git push hf HuggingFace_space_dataset_deployment:main --force
+```
+
+---
+
+## Step 5.5: Monitor the deployment
+
+### Check the build status
+1. **Go to Spaces**: https://huggingface.co/spaces/ybchen928/oncall-guide-ai
+2. **Click the "App" tab** to view build progress
+3. **Watch the Logs**:
+   - Dependency installation progress
+   - Cloud file download progress
+   - System initialization status
+
+### Expected build process
+```
+Phase 1: Installing dependencies (2-3 minutes)
+├── Installing Python packages from requirements.txt
+├── Setting up Gradio environment
+└── Configuring Python 3.11 environment
+
+Phase 2: Application startup (3-5 minutes)
+├── Downloading models/ from Dataset (1.5GB)
+├── Downloading customization_data/ from Dataset (150MB)
+├── Initializing retrieval systems
+└── Starting Gradio interface
+
+Phase 3: Ready for use
+├── App status: "Running" 🟢
+├── Interface accessible
+└── All features available
+```
+
+---
+
+## 🚨 Troubleshooting guide
+
+### Common problems and solutions
+
+#### 1. Dependency installation fails
+**Symptom**: the build hangs at "Installing dependencies"
+**Fix**:
+```bash
+# Check installed packages for broken or conflicting dependencies
+pip check
+
+# Confirm version compatibility
+pip install --dry-run -r requirements.txt
+```
+
+#### 2. Cloud file download fails
+**Symptom**: "404 Not Found" or "Access denied"
+**Fix**:
+- Confirm the Dataset Repository is Public
+- Check the `HF_TOKEN` environment variable
+- Verify the file paths in the Dataset
+
+#### 3. Out of memory
+**Symptom**: "Out of memory" or build timeout
+**Fix**:
+- Consider upgrading the Spaces hardware (CPU basic → CPU upgrade)
+- Or implement lazy loading (load large files only when needed)
+
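The lazy-loading option mentioned for the out-of-memory case can be sketched with `functools.cached_property`: the expensive object is built on first access and reused afterwards. The class `LazyResources` and its `load_index` argument are hypothetical names and not part of the project's code.

```python
from functools import cached_property


class LazyResources:
    """Holds an expensive resource and loads it only on first access."""

    def __init__(self, load_index):
        # load_index: any zero-argument callable that returns the loaded object
        self._load_index = load_index

    @cached_property
    def index(self):
        # Runs once; the result is then stored on the instance.
        return self._load_index()


if __name__ == "__main__":
    resources = LazyResources(lambda: {"loaded": True})  # stand-in loader
    print(resources.index)  # the loader runs here, not at construction time
```

Nothing is loaded when `LazyResources` is constructed, so startup memory stays low until a query actually needs the index.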
+#### 4. Gradio interface fails to start
+**Symptom**: the build succeeds but the app cannot be reached
+**Fix**:
+- Check the launch() configuration in app.py
+- Make sure no port/host settings are hard-coded
+
+#### 5. Customization feature not working
+**Symptom**: Hospital Only mode does not work
+**Fix**:
+- Check the dataset_repo name in customization/src/cloud_config.py
+- Confirm that all processing files are in the Dataset
+
+---
+
+## Step 5.6: Functional verification
+
+### Online test checklist
+**Verification to run after a successful deployment:**
+
+#### Basic functionality tests
+- [ ] **General Mode**: test basic medical queries
+- [ ] **Hospital Mode**: test the customization feature
+- [ ] **Combined Mode**: test the combined feature
+
+#### Performance tests
+- [ ] **First load time**: record the cold-start time
+- [ ] **Query response time**: test warm-start performance
+- [ ] **Memory usage**: monitor system resources
+
+#### Error handling tests
+- [ ] **Invalid queries**: test that non-medical queries are rejected
+- [ ] **Network errors**: simulate a failed file download and check how it is handled
+- [ ] **Inconsistent data**: test fault tolerance when some files are missing
+
+---
+
+## 📊 Deployment success metrics
+
+### Technical metrics
+- **Build time**: < 10 minutes
+- **First load**: < 5 minutes
+- **Query response**: < 3 seconds
+- **Memory usage**: < 4GB (within the free tier)
+
+### Functional metrics
+- **General Pipeline**: returns medical advice normally
+- **Hospital Pipeline**: returns hospital-specific advice normally
+- **Combined Mode**: combined results are correct
+- **Error handling**: appropriate error messages and fallback behaviour
+
+---
+
+## 🎯 Follow-up steps after deployment
+
+### Documentation updates
+- [ ] Update README.md to describe the cloud deployment architecture
+- [ ] Create a user guide
+- [ ] Record known limitations and caveats
+
+### Monitoring and maintenance
+- [ ] Set up usage monitoring for the Space
+- [ ] Check the Dataset Repository status regularly
+- [ ] Monitor build and runtime logs
+
+### Expansion plans
+- [ ] Consider implementing file version control
+- [ ] Evaluate hardware upgrade needs
+- [ ] Plan cloud migration of additional features
+
+---
+
+## ⚠️ Important notes
+
+1. **First-user experience**: the first user has to wait for the files to download (2-5 minutes)
+2. **Shared cache**: all users share HuggingFace's file cache
+3. **Cost**: free Spaces have usage-time limits; consider upgrading the plan
+4. **Data sync**: when the Dataset is updated, the Space must clear its cache and download again
+5. **Backup strategy**: keep the local development environment intact in case an emergency rollback is needed
+
+---
+
+## 🚀 Execution checklist
+
+Confirm before deploying:
+- [ ] Stage 4 integration test passes
+- [ ] Git status is clean (no large files)
+- [ ] requirements.txt contains all dependencies
+- [ ] README.md YAML configuration is correct
+- [ ] Environment variables are set
+- [ ] Dataset Repository is accessible
+
+Once everything is ready, run the Phase 5 deployment steps!
ToDo_huggingface_deployment.md
CHANGED
@@ -572,3 +572,22 @@ mv customization_processing_backup customization/processing
 - [ ] Test the cloud integration locally
 - [ ] Deploy to Spaces (without large files)
 - [ ] Verify that everything works correctly
+## ✅ Stage test results update
+
+### Stage 1 test ✅ passed
+- Cloud loader connection works
+- Dataset Repository access succeeded
+
+### Stage 2 test ✅ passed
+- Core retrieval system loads from the cloud correctly
+- General Pipeline fully verified
+- Performance: Emergency (84.5M) and Treatment (331M) files downloaded successfully
+
+### Stage 3 test ✅ passed
+- Customization Pipeline loads from the cloud correctly
+- Preloading of 10 files (~150MB) succeeded
+- Hospital-specific functionality fully verified
+- Test results: "chest pain" (36 results), "emergency treatment" (59 results)
+- System loaded: 110 tags, 3,784 chunks
+
+**Current status: preparing the Stage 4 integration test**
customization/customization_pipeline.py
CHANGED
@@ -87,14 +87,17 @@ def retrieve_document_chunks(query: str, top_k: int = 5, llm_client=None) -> Lis
     # Load model and existing embeddings
     embedding_model = load_biomedbert_model()
 
-    # Load from
-
+    # Load processing data from cloud or local with preloading
+    from cloud_config import customization_loader
+
+    # Preload all processing files and get directory paths
+    embeddings_dir, indices_dir = customization_loader.preload_all_processing_files()
 
     # Load the saved system with ANNOY indices
     document_index, tag_embeddings, doc_tag_mapping, chunk_embeddings, annoy_manager = \
         load_document_system_with_annoy(
-            input_dir=
-            annoy_dir=
+            input_dir=embeddings_dir,
+            annoy_dir=indices_dir
         )
 
     if annoy_manager is None:
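`cloud_config.py` itself is not part of this diff, so the behaviour of `preload_all_processing_files()` can only be inferred from its call site: it returns an embeddings directory and an indices directory. The sketch below shows one way such a loader could switch between cloud and local data. The class name, the `download_dir` callable, the default local root, and the subfolder names `embeddings` and `indices` are all assumptions.

```python
import os
from pathlib import Path


class ProcessingLoaderSketch:
    """Illustrative stand-in for the loader in cloud_config.py (not the real class)."""

    def __init__(self, download_dir, local_root="customization/processing", use_cloud=None):
        # download_dir: callable(subfolder) -> local path of the fetched folder.
        # In a real deployment it could wrap a Hub download; here it is injected.
        self._download_dir = download_dir
        self._local_root = Path(local_root)
        if use_cloud is None:
            use_cloud = os.getenv("USE_CLOUD_DATA", "false").lower() == "true"
        self.use_cloud = use_cloud

    def preload_all_processing_files(self):
        """Return (embeddings_dir, indices_dir) as strings."""
        if self.use_cloud:
            return (
                str(self._download_dir("embeddings")),
                str(self._download_dir("indices")),
            )
        return (
            str(self._local_root / "embeddings"),
            str(self._local_root / "indices"),
        )


if __name__ == "__main__":
    loader = ProcessingLoaderSketch(lambda sub: f"/tmp/cache/{sub}", use_cloud=True)
    print(loader.preload_all_processing_files())
```

Injecting the download function keeps the mode switch testable without network access.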
customization/processing/generate_mapping_json.py
ADDED
@@ -0,0 +1,115 @@
+#!/usr/bin/env python3
+"""
+Generate mapping.json from combined_er_symptoms_diagnoses.csv
+This script creates the mapping file needed for the customization pipeline.
+"""
+
+import csv
+import json
+import os
+from pathlib import Path
+
+def csv_to_mapping_json():
+    """Convert CSV to mapping.json format"""
+
+    # Define paths
+    processing_dir = Path(__file__).parent
+    customization_dir = processing_dir.parent
+    csv_path = customization_dir / "docs" / "combined_er_symptoms_diagnoses.csv"
+    output_path = processing_dir / "mapping.json"
+
+    # Read CSV and convert to mapping format
+    mappings = []
+
+    with open(csv_path, 'r', encoding='utf-8-sig') as csvfile:  # Handle BOM
+        reader = csv.DictReader(csvfile)
+
+        for row in reader:
+            # Skip empty rows
+            if not row.get('PDF Abbreviation'):
+                continue
+
+            # Extract symptoms and diagnoses
+            symptoms_raw = row['ER Symptom (Surface)'].strip()
+            diagnoses_raw = row['Underlying Diagnosis (Core)'].strip()
+
+            # Split symptoms by comma and clean
+            symptoms = [s.strip() for s in symptoms_raw.split(',') if s.strip()]
+
+            # Split diagnoses by comma and clean
+            diagnoses = [d.strip() for d in diagnoses_raw.split(',') if d.strip()]
+
+            # Create PDF filename based on abbreviation
+            pdf_name = get_pdf_filename(row['PDF Abbreviation'])
+
+            # Create mapping entry
+            mapping = {
+                "pdf": pdf_name,
+                "symptoms": symptoms,
+                "diagnoses": diagnoses
+            }
+
+            mappings.append(mapping)
+
+    # Write to JSON file
+    with open(output_path, 'w', encoding='utf-8') as jsonfile:
+        json.dump(mappings, jsonfile, indent=2, ensure_ascii=False)
+
+    print(f"✅ Generated mapping.json with {len(mappings)} entries")
+    print(f"📄 Output saved to: {output_path}")
+
+    # Verify all PDFs exist
+    docs_dir = customization_dir / "docs"
+    missing_pdfs = []
+
+    for mapping in mappings:
+        pdf_path = docs_dir / mapping['pdf']
+        if not pdf_path.exists():
+            missing_pdfs.append(mapping['pdf'])
+
+    if missing_pdfs:
+        print(f"\n⚠️ Warning: {len(missing_pdfs)} PDF files not found:")
+        for pdf in missing_pdfs[:5]:  # Show first 5
+            print(f"  - {pdf}")
+        if len(missing_pdfs) > 5:
+            print(f"  ... and {len(missing_pdfs) - 5} more")
+    else:
+        print("\n✅ All PDF files found in docs directory")
+
+    return mappings
+
+def get_pdf_filename(abbreviation):
+    """Convert abbreviation to actual PDF filename based on files in docs directory"""
+
+    # Mapping of abbreviations to actual PDF filenames
+    pdf_mapping = {
+        "SpinalCordEmergencies": "Recognizing Spinal Cord Emergencies.pdf",
+        "DizzinessApproach": "*Dizziness - A Diagnostic Approach.pdf",
+        "CodeHeadache": "*Code Headache - Development of a protocol for optimizing headache management in the emergency room.pdf",
+        "EarlyAFTherapy": "Early Rhythm-Control Therapy in Patients with Atrial Fibrillation.pdf",
+        "2024ESC_AF_Guidelines": "2024 ESC Guidelines for the management of atrial fibrillation developed in collaboration with the European Association for Cardio-Thoracic Surgery.pdf",
+        "PregnancyBleeding_ED": "What assessment, intervention and diagnostics should women with early pregnancy bleeding receive in the emergency department and when A scoping review and synthesis of evidence.pdf",
+        "UGIB_Guideline": "acg_clinical_guideline__upper_gastrointestinal_and.14.pdf",
+        "PulmonaryEmbolism": "Acute Pulmonary Embolism A Review.pdf",
+        "CAP_Review": "Community-Acquired Pneumonia.pdf",
+        "AcuteIschemicStroke_Guideline": "Guidelines for the Early Management of Patients With Acute Ischemic Stroke.pdf",
+        "ChestPain_Guideline_2021": "2021 Guideline for the Evaluation and Diagnosis of Chest Pain.pdf",
+        "FUO_Neutropenia_2024": "2024 update of the AGIHO guideline on diagnosis and empirical treatment of fever of unknown origin (FUO) in adult neutropenic patients with solid tumours and hematological malignancies.pdf",
+        "Eclampsia_ER_Management": "*Management of eclampsia in the accident and emergency department.pdf",
+        "UTI_Mazzulli": "Diagnosis and Management of simple and complicated urinary tract infections (UTIs).pdf",
+        "Pediatric_Seizures_2016": "J Paediatrics Child Health - 2016 - Lawton - Seizures in the paediatric emergency department.pdf",
+        "PregnancyLoss_Review": "A REVIEW OF THE MANAGEMENT OF LOSS OF PREGNANCY IN THE EMERGENCY DEPARTMENT.pdf",
+        "FUO_Children": "Update on Fever of Unknown Origin in Children Focus on Etiologies and Clinical Apporach.pdf",
+        # New entries based on actual files in docs directory
+        "MyastheniaGravis": "[Transition of Japanese clinical guidelines for myasthenia gravis].pdf",
+        "AcutePorphyrias": "AGA Clinical Practice Update on Diagnosis and Management of Acute Hepatic Porphyrias- Expert Review.pdf",
+        "Botulism": "Clinical Guidelines for Diagnosis and Treatment of Botulism, 2021.pdf",
+        "WilsonsDisease": "EASL-ERN Clinical Practice Guidelines on Wilsons disease.pdf",
+        "HereditaryAngioedema": "The international WAO:EAACI guideline for the management of hereditary angioedema-The 2021 revision and update.pdf",
+    }
+
+    # Return mapped filename or create a generic one based on abbreviation
+    return pdf_mapping.get(abbreviation, f"{abbreviation}.pdf")
+
+if __name__ == "__main__":
+    csv_to_mapping_json()
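To see what one row of the CSV becomes, the row-parsing logic of the script can be exercised on an in-memory sample. This is a simplified sketch: the CSV content below is made up for illustration, `rows_to_mappings` is a hypothetical helper, and it uses only the generic `<abbreviation>.pdf` fallback instead of the script's filename table.

```python
import csv
import io


def rows_to_mappings(csv_text):
    """Parse CSV text with the script's three columns into mapping entries."""
    mappings = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if not row.get("PDF Abbreviation"):
            continue  # skip empty rows, as the script does
        mappings.append({
            "pdf": row["PDF Abbreviation"] + ".pdf",  # generic fallback name only
            "symptoms": [s.strip() for s in row["ER Symptom (Surface)"].split(",") if s.strip()],
            "diagnoses": [d.strip() for d in row["Underlying Diagnosis (Core)"].split(",") if d.strip()],
        })
    return mappings


# Made-up sample data: one valid row followed by one empty row.
SAMPLE_CSV = (
    "PDF Abbreviation,ER Symptom (Surface),Underlying Diagnosis (Core)\n"
    'DemoGuide,"chest pain, dyspnea",pulmonary embolism\n'
    ",,\n"
)

if __name__ == "__main__":
    for entry in rows_to_mappings(SAMPLE_CSV):
        print(entry)
```

Quoted cells keep their internal commas, which is why the symptoms cell can be split into a list after parsing.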
test_stage3_customization.py
ADDED
@@ -0,0 +1,47 @@
+#!/usr/bin/env python3
+"""Stage 3 test: Customization Pipeline cloud loading."""
+
+import os
+import sys
+from pathlib import Path
+
+# Set the environment variable to test cloud mode
+os.environ['USE_CLOUD_DATA'] = 'true'
+
+# Add import paths
+current_dir = Path(__file__).parent
+src_dir = current_dir / "src"
+sys.path.insert(0, str(src_dir))
+sys.path.insert(0, str(current_dir))
+
+def test_customization_pipeline():
+    """Test Customization Pipeline cloud loading."""
+    print("🧪 Stage 3 test: Customization Pipeline cloud loading...")
+
+    try:
+        from customization.customization_pipeline import retrieve_document_chunks
+        print("✅ customization_pipeline module imported successfully")
+
+        # Test the customization pipeline (this triggers the cloud download)
+        print("🏥 Testing a customization query...")
+        results = retrieve_document_chunks("chest pain", top_k=3)
+        print(f"✅ Customization search succeeded, returned {len(results)} results")
+
+        # Test another query
+        print("🏥 Testing another customization query...")
+        results2 = retrieve_document_chunks("emergency treatment", top_k=5)
+        print(f"✅ Second query succeeded, returned {len(results2)} results")
+
+        print("🎉 Stage 3 test passed: Customization Pipeline cloud loading works!")
+        return True
+
+    except Exception as e:
+        print(f"❌ Stage 3 test failed: {e}")
+        import traceback
+        traceback.print_exc()
+        return False
+
+if __name__ == "__main__":
+    success = test_customization_pipeline()
+    print(f"\n📋 Test result: {'passed' if success else 'failed'}")
+    sys.exit(0 if success else 1)
test_stage4_full_integration.py
ADDED
@@ -0,0 +1,186 @@
+#!/usr/bin/env python3
+"""Stage 4 test: full integration test, simulating the app.py flow."""
+
+import os
+import sys
+import time
+from pathlib import Path
+from typing import Dict, Any
+
+# Set the environment variable to test cloud mode
+os.environ['USE_CLOUD_DATA'] = 'true'
+
+# Add import paths
+current_dir = Path(__file__).parent
+src_dir = current_dir / "src"
+sys.path.insert(0, str(src_dir))
+sys.path.insert(0, str(current_dir))
+
+def test_full_integration():
+    """Full integration test."""
+    print("🧪 Stage 4 test: full integration test...")
+    print("=" * 60)
+
+    results = {}
+
+    try:
+        # Test 1: Core system initialization
+        print("\n🔧 Test 1: core system initialization")
+        start_time = time.time()
+
+        from user_prompt import UserPromptProcessor
+        from retrieval import BasicRetrievalSystem
+
+        processor = UserPromptProcessor()
+        retrieval_system = BasicRetrievalSystem()
+
+        init_time = time.time() - start_time
+        print(f"✅ Core system initialized ({init_time:.2f}s)")
+        results['core_init_time'] = init_time
+
+        # Test 2: General Pipeline test
+        print("\n🔍 Test 2: General Pipeline")
+        test_queries = [
+            "chest pain emergency",
+            "heart attack symptoms",
+            "respiratory distress"
+        ]
+
+        for query in test_queries:
+            start_time = time.time()
+            search_results = retrieval_system.search(query, top_k=5)
+            search_time = time.time() - start_time
+
+            result_count = len(search_results.get('processed_results', []))
+            print(f"  📊 '{query}': {result_count} results ({search_time:.3f}s)")
+
+        print("✅ General Pipeline test complete")
+
+        # Test 3: Customization Pipeline test
+        print("\n🏥 Test 3: Customization Pipeline")
+        try:
+            from customization.customization_pipeline import retrieve_document_chunks
+
+            for query in test_queries:
+                start_time = time.time()
+                custom_results = retrieve_document_chunks(query, top_k=3)
+                custom_time = time.time() - start_time
+
+                print(f"  🏥 '{query}': {len(custom_results)} results ({custom_time:.3f}s)")
+
+            print("✅ Customization Pipeline test complete")
+            results['customization_available'] = True
+
+        except Exception as e:
+            print(f"❌ Customization Pipeline error: {e}")
+            results['customization_available'] = False
+
+        # Test 4: Combined mode simulation (like app.py)
+        print("\n🔄 Test 4: Combined Mode simulation")
+        test_query = "chest pain emergency treatment"
+
+        # Step 1: UserPromptProcessor (correct method name)
+        start_time = time.time()
+        extraction_result = processor.extract_condition_keywords(test_query)
+        extraction_time = time.time() - start_time
+        print(f"  📝 Condition extraction: {extraction_time:.3f}s")
+        print(f"     Condition: {extraction_result.get('condition', 'None')}")
+        print(f"     Emergency keywords: {extraction_result.get('emergency_keywords', 'None')}")
+        print(f"     Treatment keywords: {extraction_result.get('treatment_keywords', 'None')}")
+
+        # Step 2: General retrieval
+        start_time = time.time()
+        general_results = retrieval_system.search(test_query, top_k=5)
+        general_time = time.time() - start_time
+        general_count = len(general_results.get('processed_results', []))
+        print(f"  🔍 General retrieval: {general_count} results ({general_time:.3f}s)")
+
+        # Step 3: Customization retrieval (if available)
+        if results['customization_available']:
+            start_time = time.time()
+            custom_results = retrieve_document_chunks(test_query, top_k=3)
+            custom_time = time.time() - start_time
+            print(f"  🏥 Hospital retrieval: {len(custom_results)} results ({custom_time:.3f}s)")
+
+        print("✅ Combined Mode simulation complete")
+
+        # Test 5: Performance comparison
+        print("\n⚡ Test 5: performance test (warm start)")
+        queries_for_speed = ["emergency", "treatment", "chest pain"]
+
+        for query in queries_for_speed:
+            # General pipeline speed
+            start_time = time.time()
+            retrieval_system.search(query, top_k=3)
+            general_speed = time.time() - start_time
+
+            # Customization pipeline speed (if available)
+            if results['customization_available']:
+                start_time = time.time()
+                retrieve_document_chunks(query, top_k=3)
+                custom_speed = time.time() - start_time
+                print(f"  ⚡ '{query}': General {general_speed:.3f}s, Hospital {custom_speed:.3f}s")
+            else:
+                print(f"  ⚡ '{query}': General {general_speed:.3f}s")
+
+        print("✅ Performance test complete")
+
+        print("\n" + "=" * 60)
+        print("🎉 Stage 4 integration test fully passed!")
+        print("📊 Summary:")
+        print(f"  - Core system initialization time: {results['core_init_time']:.2f}s")
+        print(f"  - Customization feature: {'available' if results['customization_available'] else 'unavailable'}")
+        print("  - Both pipelines can load data from the cloud")
+        print("  - System integration is complete")
+
+        return True
+
+    except Exception as e:
+        print(f"❌ Stage 4 integration test failed: {e}")
+        import traceback
+        traceback.print_exc()
+        return False
+
+def test_environment_switching():
+    """Test switching modes through the environment variable."""
+    print("\n🔄 Extra test: environment variable switching")
+
+    try:
+        # Test cloud mode
+        os.environ['USE_CLOUD_DATA'] = 'true'
+        from cloud_loader import CloudDataLoader
+        loader_cloud = CloudDataLoader()
+        print(f"  ☁️ Cloud mode: {loader_cloud.use_cloud}")
+
+        # Test local mode
+        os.environ['USE_CLOUD_DATA'] = 'false'
+        loader_local = CloudDataLoader()
+        print(f"  💻 Local mode: {loader_local.use_cloud}")
+
+        # Reset to cloud mode
+        os.environ['USE_CLOUD_DATA'] = 'true'
+
+        print("✅ Environment variable switching test passed")
+        return True
+
+    except Exception as e:
+        print(f"❌ Environment variable switching test failed: {e}")
+        return False
+
+if __name__ == "__main__":
+    print("🚀 Starting Stage 4: full integration test")
+
+    # Main integration test
+    integration_success = test_full_integration()
+
+    # Environment switching test
+    env_success = test_environment_switching()
+
+    overall_success = integration_success and env_success
+
+    print(f"\n📋 Stage 4 overall result: {'fully passed' if overall_success else 'partially failed'}")
+
+    if overall_success:
+        print("🎯 Ready for Stage 5: deploy to Spaces")
+
+    sys.exit(0 if overall_success else 1)
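The integration test repeats the `start_time = time.time()` pattern around every call. As an optional refactoring sketch, not part of the commit, a small context manager can record each duration under a label. The name `timed` and the use of `time.perf_counter()` in place of `time.time()` are assumptions.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Record the wall-clock duration of the enclosed block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Stored even if the block raises, so failures still report a duration.
        results[label] = time.perf_counter() - start


if __name__ == "__main__":
    timings = {}
    with timed("demo_sum", timings):
        sum(range(100_000))  # stand-in for retrieval_system.search(...)
    print(f"demo_sum: {timings['demo_sum']:.3f}s")
```

`time.perf_counter()` is monotonic, which makes it a safer choice than `time.time()` for measuring short intervals.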