# HuggingFace Spaces Deployment Guide

## Overview

This application is configured to run on **HuggingFace Spaces** using local model inference (no external API calls required).

---

## Quick Setup

### 1. Create a New Space

1. Go to https://huggingface.co/new-space
2. Choose **Gradio** as the SDK
3. Select **GPU** hardware (T4 or better recommended)
4. Name your Space (e.g., `transcriptor-ai`)

### 2. Upload Your Code

Upload all files from this directory to your Space, or connect a Git repository.

### 3. Configure Space Settings (Optional)

Go to **Settings → Variables** in your Space and add:
| Variable | Value | Description |
|----------|-------|-------------|
| `DEBUG_MODE` | `True` or `False` | Enable detailed logging |
| `LLM_TEMPERATURE` | `0.7` | Model creativity (0.0-1.0) |
| `LLM_TIMEOUT` | `120` | Timeout in seconds |
| `LOCAL_MODEL` | `microsoft/Phi-3-mini-4k-instruct` | Model to use |

**Note:** All settings have sensible defaults; you don't need to set these unless you want to customize.
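The app can read these variables with standard environment lookups. A minimal sketch of how that might look (the `env_bool` helper is illustrative, not necessarily the app's actual code):

```python
import os

def env_bool(name: str, default: bool) -> bool:
    """Read a True/False Spaces Variable, falling back to a default."""
    return os.getenv(name, str(default)).strip().lower() in ("1", "true", "yes")

# Defaults mirror the table above; every variable is optional.
DEBUG_MODE = env_bool("DEBUG_MODE", False)
LLM_TEMPERATURE = float(os.getenv("LLM_TEMPERATURE", "0.7"))
LLM_TIMEOUT = int(os.getenv("LLM_TIMEOUT", "120"))
LOCAL_MODEL = os.getenv("LOCAL_MODEL", "microsoft/Phi-3-mini-4k-instruct")
```

Because every lookup has a default, the app starts fine with no variables set at all.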
---

## Hardware Requirements

### Recommended: GPU (T4 or better)

- **Phi-3-mini-4k-instruct**: 3.8B params, ~8GB GPU RAM
- Processing speed: ~30-60 seconds per transcript chunk
- **Best for:** Production use with multiple users

### Alternative: CPU (not recommended)

- Will work, but very slowly (5-10 minutes per chunk)
- Only suitable for testing
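The ~8GB figure follows from a back-of-the-envelope rule: each parameter stored in fp16 takes 2 bytes, so the weights alone need roughly params × 2 bytes, plus headroom for activations and runtime overhead. A quick sanity check (illustrative arithmetic only):

```python
def fp16_weight_gb(params_billions: float) -> float:
    """Approximate GPU memory for model weights in fp16 (2 bytes per param)."""
    # 1e9 params * 2 bytes = 2 GB per billion params
    return params_billions * 2.0

print(fp16_weight_gb(3.8))  # 7.6 -> ~8 GB once runtime overhead is added
print(fp16_weight_gb(7.0))  # 14.0 -> why the 7B models below want more than a T4 comfortably offers
```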
---

## Supported Models

You can change the model by setting the `LOCAL_MODEL` variable:

### Small & Fast (Recommended for Free Tier)

```
LOCAL_MODEL=microsoft/Phi-3-mini-4k-instruct (Default - 3.8B params)
```

### Medium (Better quality, needs more GPU)

```
LOCAL_MODEL=mistralai/Mistral-7B-Instruct-v0.3 (7B params)
```

### Alternatives

```
LOCAL_MODEL=HuggingFaceH4/zephyr-7b-beta (7B params, good instruction following)
LOCAL_MODEL=TinyLlama/TinyLlama-1.1B-Chat-v1.0 (1.1B params, very fast but lower quality)
```
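A small helper can validate the `LOCAL_MODEL` variable against the list above before the (slow, multi-GB) model load begins. A sketch, assuming a catalog like this exists in the app's config; the dict and function names are illustrative:

```python
import os

# Param counts (billions) for the tiers listed above.
SUPPORTED_MODELS = {
    "microsoft/Phi-3-mini-4k-instruct": 3.8,
    "mistralai/Mistral-7B-Instruct-v0.3": 7.0,
    "HuggingFaceH4/zephyr-7b-beta": 7.0,
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0": 1.1,
}

def resolve_model(default: str = "microsoft/Phi-3-mini-4k-instruct") -> str:
    """Return the configured model id, falling back to the default if unknown.

    The returned id would then be passed to the local inference backend,
    e.g. transformers' text-generation pipeline.
    """
    model_id = os.getenv("LOCAL_MODEL", default)
    return model_id if model_id in SUPPORTED_MODELS else default
```

Falling back to the default rather than crashing means a typo in a Spaces Variable degrades gracefully instead of taking the Space down.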
---

## Configuration Files

### ✅ Required Files

- `app.py` - Main application
- `requirements.txt` - Python dependencies
- `llm.py`, `extractors.py`, etc. - Core modules

### ⚠️ NOT Needed for Spaces

- `.env` file - Use Spaces Variables instead
- Local database files
- API keys (unless using external APIs)
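The authoritative dependency list is the `requirements.txt` in this directory; as a rough guide, local Gradio + transformers inference typically needs at least something like the following (versions and extras are an assumption, not this repo's actual pins):

```
gradio
transformers
torch
accelerate
```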
---

## Environment Configuration

The app automatically detects if it's running on HuggingFace Spaces and uses local model inference by default.

**Default Configuration (no .env needed):**

```python
USE_HF_API = False     # Don't use HF Inference API
USE_LMSTUDIO = False   # Don't use LM Studio
LLM_BACKEND = "local"  # Use local transformers
DEBUG_MODE = False     # Disable debug logs
```

**To override:** Set Spaces Variables (Settings → Variables)
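Spaces detection typically keys off environment variables that HuggingFace sets inside every Space container, such as `SPACE_ID`. A sketch of how such a check could work (the app's actual detection logic may differ):

```python
import os

def running_on_spaces() -> bool:
    """HuggingFace Spaces sets SPACE_ID (and SPACE_HOST) in the container env."""
    return "SPACE_ID" in os.environ

# Default to local transformers inference when on Spaces,
# otherwise respect an explicitly configured backend.
LLM_BACKEND = "local" if running_on_spaces() else os.getenv("LLM_BACKEND", "local")
```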
---
## Troubleshooting

### Issue: "Out of Memory" Error

**Solution:** Switch to a smaller model:

```
LOCAL_MODEL=TinyLlama/TinyLlama-1.1B-Chat-v1.0
```
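OOM recovery can also be automated with a fallback chain that retries progressively smaller models. A sketch under stated assumptions: `load_model` stands in for whatever loader the app actually uses, and PyTorch surfaces CUDA OOM as a `RuntimeError` subclass:

```python
FALLBACK_CHAIN = [
    "microsoft/Phi-3-mini-4k-instruct",
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
]

def load_with_fallback(load_model, models=FALLBACK_CHAIN):
    """Try each model id in order; return (model_id, model) for the first that fits."""
    last_error = None
    for model_id in models:
        try:
            return model_id, load_model(model_id)
        except RuntimeError as exc:  # CUDA OOM is raised as a RuntimeError
            last_error = exc
    raise RuntimeError("No model in the fallback chain fit in memory") from last_error
```

The trade-off: silent downgrades can mask a misconfigured GPU tier, so the chosen model id should be logged prominently at startup.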
### Issue: Very Slow Processing

**Solution:**

1. Make sure you selected **GPU** hardware (not CPU)
2. Check Space logs for "Model loaded on cuda" confirmation
3. If on CPU, upgrade to a GPU tier

### Issue: Quality Score 0.00

**Causes:**

1. Model not loaded properly (check logs for "[Local Model] Loading...")
2. GPU out of memory (model falls back to CPU)
3. Timeout too short (increase `LLM_TIMEOUT`)

**Debug Steps:**

1. Set `DEBUG_MODE=True` in Spaces Variables
2. Check logs for detailed error messages
3. Look for "[Local Model] ✅ Generated X characters"

### Issue: Model Downloads Every Time

**Solution:** HuggingFace Spaces caches models automatically, but the first load takes 2-5 minutes.

- Subsequent starts are faster (~30 seconds)
- Don't restart the Space unnecessarily
---

## Performance Optimization

### 1. Reduce Context Window

Edit `llm.py` line 399:

```python
max_length=2000  # Reduce from 3500 for faster processing
```

### 2. Lower Token Limit

Set a Spaces Variable:

```
MAX_TOKENS_PER_REQUEST=800  # Default is 1500
```

### 3. Use a Smaller Model

```
LOCAL_MODEL=TinyLlama/TinyLlama-1.1B-Chat-v1.0
```

### 4. Disable Debug Mode

```
DEBUG_MODE=False
```
---

## Monitoring

### View Logs

1. Go to your Space
2. Click the **Logs** tab at the top
3. Look for startup messages:

```
✅ Configuration loaded for HuggingFace Spaces
🚀 TranscriptorAI Enterprise - LLM Backend: local
[Local Model] Loading microsoft/Phi-3-mini-4k-instruct...
[Local Model] ✅ Model loaded on cuda:0
```

### Check Processing

During analysis, you should see:

```
[Local Model] Generating (1500 max tokens, temp=0.7)...
[Local Model] ✅ Generated 1247 characters
[LLM Debug] ✅ Successfully extracted JSON with 7 fields
```

---
## Cost Estimation

### Free Tier (CPU)

- ⚠️ Very slow, but free
- ~5-10 minutes per transcript

### GPU (T4) - ~$0.60/hour

- ⚡ Fast processing
- ~30-60 seconds per transcript
- Space sleeps after inactivity (saves money)

### Persistent GPU (Upgraded)

- Always on for instant access
- Higher cost, but the best user experience

---
## Security Notes

1. **No API Keys Needed:** Everything runs locally
2. **Private Processing:** Data never leaves your Space
3. **Secrets Management:** Use Spaces Secrets (not Variables) for sensitive data
4. **Model Access:** Phi-3 and most other models don't require gated access

---

## Next Steps

1. ✅ Upload code to your Space
2. ✅ Select GPU hardware
3. ✅ Wait for the first model download (~2-5 min)
4. ✅ Test with a sample transcript
5. �df Share your Space URL!

---
## Support

- **HuggingFace Spaces Docs:** https://huggingface.co/docs/hub/spaces
- **Transformers Docs:** https://huggingface.co/docs/transformers
- **GPU Pricing:** https://huggingface.co/pricing

---

**Last Updated:** October 2025