---
title: Phi-3.5-MoE Expert Assistant
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
entrypoint: start.sh
startup_duration_timeout: 600
pinned: false
license: mit
short_description: AI assistant with expert routing and CPU/GPU support
models:
- microsoft/Phi-3.5-MoE-instruct
---

# 🤖 Phi-3.5-MoE Expert Assistant

An AI assistant built on Microsoft's Phi-3.5-MoE-instruct model. It routes each query to a specialized expert and runs on both CPU and GPU hardware.

## 🚀 Key Features

- **🧠 Expert Routing**: Classifies each query and routes it to a specialized expert (Code, Math, Reasoning, Multilingual, or General)
- **🔧 Environment Adaptive**: Runs on both CPU and GPU environments
- **🛡️ Robust Dependency Management**: Installs dependencies conditionally, based on the detected environment
- **📦 Fault Tolerance**: Falls back to a working configuration when dependencies are missing
- **⚡ Performance Optimized**: Applies environment-specific settings on CPU and GPU

## 🔧 Recent Fixes

- ✅ **Missing Dependencies**: Added `einops` to requirements, conditional `flash_attn` installation
- ✅ **Deprecated Parameters**: Replaced every use of the deprecated `torch_dtype` argument with `dtype`
- ✅ **CPU Compatibility**: Automatic CPU-safe model revision selection
- ✅ **Error Handling**: Comprehensive fallback mechanisms
- ✅ **Security**: Updated to Gradio 4.44.0+ for security fixes
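
The dependency and dtype fixes above can be illustrated with a small sketch. This is not the code in `app.py`: the helper names `has_module` and `build_load_kwargs` are hypothetical, and the specific values (`bfloat16` on GPU, `float32` on CPU, `eager` attention as the fallback) are assumptions chosen for illustration. The `dtype` keyword applies to recent `transformers` releases; older releases expect `torch_dtype`.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if the top-level module `name` can be imported here."""
    return importlib.util.find_spec(name) is not None


def build_load_kwargs(use_gpu: bool, flash_attn_available: bool) -> dict:
    """Hypothetical helper: pick keyword arguments for loading the model.

    The values below are illustrative assumptions, not the Space's settings.
    """
    if use_gpu:
        return {
            "dtype": "bfloat16",  # assumed GPU precision
            "attn_implementation": (
                "flash_attention_2" if flash_attn_available else "eager"
            ),
        }
    # CPU path: full precision and no flash attention
    return {"dtype": "float32", "attn_implementation": "eager"}


if __name__ == "__main__":
    kwargs = build_load_kwargs(
        use_gpu=False,
        flash_attn_available=has_module("flash_attn"),
    )
    print(kwargs)
```

The resulting dictionary could then be unpacked into a model-loading call, so that a missing `flash_attn` package changes a setting instead of raising an import error.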

## 🏗️ Architecture

```
app.py              # Main application entry point
preinstall.py       # Pre-installation script for dependencies
model_patch.py      # Patch for handling missing dependencies
start.sh            # Startup script
requirements.txt    # Core dependencies
```

## 🎯 How It Works

1. **Environment Detection**: Automatically detects CPU vs GPU environment
2. **Dependency Management**: Installs required dependencies based on environment
3. **Model Configuration**: Uses optimal settings for each environment
4. **Expert Routing**: Classifies queries and routes to appropriate expert
5. **Graceful Fallbacks**: Works even when dependencies are missing
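
Steps 1 and 4 can be sketched roughly as follows. The keyword lists, the function names `detect_device` and `classify_query`, and the keyword-matching approach itself are assumptions made for illustration; the classifier in `app.py` may work differently.

```python
import re

# Hypothetical keyword lists, one per expert named in this README.
EXPERT_KEYWORDS = {
    "Code": ("python", "function", "bug", "compile", "code"),
    "Math": ("solve", "equation", "integral", "derivative", "calculate"),
    "Reasoning": ("why", "explain", "reason", "logic"),
    "Multilingual": ("translate", "translation"),
}


def detect_device() -> str:
    """Return "cuda" if PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except (ImportError, OSError):
        # torch is missing or failed to load: fall back to CPU
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def classify_query(query: str) -> str:
    """Pick the first expert whose keywords appear in the query."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    for expert, keywords in EXPERT_KEYWORDS.items():
        if words & set(keywords):
            return expert
    return "General"  # default when nothing matches


if __name__ == "__main__":
    print(detect_device())
    print(classify_query("Fix this Python function"))
```

Keyword matching is the simplest possible router; it is cheap and predictable, but it checks experts in dictionary order, so a query that matches several experts goes to the first one listed.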

## 📊 Performance

| Environment | Startup | Memory | Tokens/sec |
|-------------|---------|--------|------------|
| **CPU**     | 3-5 min | 8-12 GB | 2-5 |
| **GPU**     | 2-3 min | 16-20 GB | 15-30 |
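
To compare your own hardware against the tokens/sec column, a timing helper along these lines may be useful. It is a sketch only: `tokens_per_second` is a hypothetical name, and `generate` stands in for whatever callable runs generation and returns the number of new tokens it produced.

```python
import time
from typing import Callable


def tokens_per_second(
    generate: Callable[[], int],
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Time one call to `generate` and return generated tokens per second.

    `generate` must return the number of tokens it produced.
    """
    start = clock()
    n_tokens = generate()
    elapsed = clock() - start
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    return n_tokens / elapsed


if __name__ == "__main__":
    def fake_generate() -> int:
        time.sleep(0.1)  # stand-in for real model generation
        return 20

    print(f"{tokens_per_second(fake_generate):.1f} tokens/sec")
```

Measuring a single call includes prompt processing time, so short generations will report a lower rate than long ones.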

## 🔍 Troubleshooting

If you encounter issues:

1. Check the startup logs for dependency installation errors
2. Verify the pre-installation script executed successfully
3. Ensure all required packages are installed
4. Try the fallback mode if model loading fails
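
For step 3, a quick way to see which packages are present is to query the installed distribution metadata. The sketch below uses only the standard library; the function name `installed_versions` and the package list are assumptions based on the dependencies mentioned in this README, so adjust the list to match `requirements.txt`.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if absent."""
    report = {}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    # Assumed package list; flash-attn is expected to be absent on CPU.
    wanted = ["torch", "transformers", "gradio", "einops", "flash-attn"]
    for name, version in installed_versions(wanted).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

A missing `flash-attn` entry on a CPU Space is expected and does not by itself indicate a problem.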

---

**Built with ❤️ for reliable, production-ready AI applications**