VedaMD-Backend-v2/docs/implementation-plan/stable-deployment-plan.md
# Stable Deployment Plan: Public Testing
This document outlines a robust strategy for deploying the VedaMD Clinical Assistant for public testing, with the backend hosted on Hugging Face Spaces and the frontend on Netlify.
## 1. Background and Motivation
Previous deployment attempts have been plagued by resource exhaustion and dependency conflicts on Hugging Face Spaces. The primary issue was attempting to perform a heavy, one-time build task (creating the vector store) during application startup in a resource-constrained environment.
This new plan decouples the build process from the runtime process, which is a standard best practice for deploying ML applications.
## 2. High-Level Architecture
- **Vector Store Creation**: A local script will generate the FAISS index and associated metadata.
- **Artifact Hosting**: The generated vector store artifacts will be uploaded to a new model repository on the Hugging Face Hub using Git LFS.
- **Backend (Hugging Face Space)**: A lightweight FastAPI application that downloads the vector store from the Hub and serves the RAG API. It will not perform any on-the-fly processing.
- **Frontend (Netlify)**: The existing Next.js application, configured to point to the new, stable backend API endpoint.
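The backend's download-then-serve flow can be sketched as follows. This is a minimal illustration, not the actual `simple_vector_store.py` code: the index filename and directory layout are assumptions, while the repository id matches the one used in Task 2 below.

```python
from pathlib import Path

def index_path(local_dir: str, filename: str = "index.faiss") -> Path:
    """Resolve the on-disk location of the FAISS index (hypothetical layout)."""
    return Path(local_dir) / filename

# At startup, the backend would fetch the pre-built artifacts once and load
# them, rather than computing embeddings. Requires network access plus the
# `huggingface_hub` and `faiss-cpu` packages:
#
# from huggingface_hub import snapshot_download
# import faiss
#
# local_dir = snapshot_download(repo_id="sniro23/VedaMD-Vector-Store",
#                               repo_type="model")
# index = faiss.read_index(str(index_path(local_dir)))
```

Because `snapshot_download` caches files locally, restarts of the Space only re-download artifacts when the Hub repository changes.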
## 3. Key Advantages of This Approach
- **Reliability**: The backend will have a fast and predictable startup time, as it's only downloading files, not computing them.
- **Scalability**: The heavy lifting is done offline. The online component is lightweight and can handle API requests efficiently.
- **Maintainability**: Separating concerns makes debugging and updating each component (vector store, backend, frontend) much easier.
- **Cost-Effectiveness**: We can continue to use the free tiers for both Hugging Face Spaces and Netlify.
## 4. High-Level Task Breakdown
This plan is broken down into clear, verifiable steps. We will proceed one step at a time.
- [x] **Task 1: Pre-compute Vector Store Locally**
- *Update*: A complete vector store was found at `src/vector_store`. We can use this existing artifact and do not need to re-compute it.
- [x] **Task 2: Upload Vector Store to Hugging Face Hub**
- *Update*: Successfully uploaded the vector store files to the `sniro23/VedaMD-Vector-Store` repository on the Hugging Face Hub.
- [x] **Task 3: Refactor Backend to Load from Hub**
- *Update*: The backend has been successfully refactored. The `simple_vector_store.py` and `groq_medical_rag.py` modules now load the pre-computed index directly from the Hub. The `Dockerfile` and `requirements.txt` have been streamlined, removing all build-time dependencies.
- [x] **Task 4: Deploy a New, Clean Backend Space**
- *Update*: Successfully created and deployed a new, private Docker-based Space at `sniro23/VedaMD-Backend-v2`. The application is now running and logs can be monitored on the Hub.
- [x] **Task 5: Configure and Deploy Frontend to Netlify**
- *Update*: The frontend has been configured to connect to the new backend endpoint. The changes have been pushed, and a new deployment has been triggered on Netlify. The application should now be fully operational.
---
**Deployment Complete!** The VedaMD Clinical Assistant is now running with a stable, decoupled architecture.
## 5. Post-Deployment Issues & Fixes
- **Issue (2024-07-26):** Encountered a persistent `PermissionError: [Errno 13] Permission denied` on Hugging Face Spaces. The application, running as a non-root user, could not write to the cache directory (`/app/data`) because it was created by the `root` user during the Docker build.
- **Solution:** The `Dockerfile` was updated to explicitly create a non-root `user`, create the `/app/data` directory, and then transfer ownership of that directory to the `user` with `chown`. This ensures the application has the necessary write permissions at runtime.
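A minimal sketch of the relevant `Dockerfile` portion is shown below. The user name, UID, and base image are illustrative, not copied from the actual file; the key point is that `/app/data` is created and `chown`ed before the image switches to the non-root user.

```dockerfile
FROM python:3.11-slim

# Create a non-root user, create the runtime cache directory,
# and hand ownership to that user so it is writable after startup.
RUN useradd -m -u 1000 user \
    && mkdir -p /app/data \
    && chown -R user:user /app/data

USER user
WORKDIR /app
# ... copy application code and install dependencies as the non-root user ...
```

Without the `chown`, directories created during `docker build` default to `root` ownership, which is exactly the `PermissionError` seen at runtime on Spaces.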