# Stable Deployment Plan: Public Testing

This document outlines a reliable and robust strategy for deploying the VedaMD Clinical Assistant for public testing, with the backend hosted on Hugging Face Spaces and the frontend on Netlify.

## 1. Background and Motivation

Previous deployment attempts have been plagued by resource exhaustion and dependency conflicts on Hugging Face Spaces. The primary issue was attempting to perform a heavy, one-time build task (creating the vector store) during application startup in a resource-constrained environment.

This new plan decouples the build process from the runtime process, which is a standard best practice for deploying ML applications.
## 2. High-Level Architecture

- **Vector Store Creation**: A local script will generate the FAISS index and associated metadata.
- **Artifact Hosting**: The generated vector store artifacts will be uploaded to a new model repository on the Hugging Face Hub using Git LFS.
- **Backend (Hugging Face Space)**: A lightweight FastAPI application that downloads the vector store from the Hub and serves the RAG API. It will not perform any on-the-fly processing. (A loading sketch follows this list.)
- **Frontend (Netlify)**: The existing Next.js application, configured to point to the new, stable backend API endpoint.
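To make the runtime side concrete, here is a minimal sketch of how the backend could fetch and load the pre-built index at startup. It assumes the artifacts are stored as an `index.faiss` file plus a `metadata.json` file and uses the standard `huggingface_hub` and `faiss` APIs; the actual layout and loading logic live in `simple_vector_store.py` and may differ.

```python
# Minimal startup sketch: pull the pre-built artifacts from the Hub and load them.
# File names ("index.faiss", "metadata.json") are assumptions for illustration.
import json
import os

import faiss
from huggingface_hub import snapshot_download


def load_vector_store(repo_id: str = "sniro23/VedaMD-Vector-Store"):
    """Download the pre-computed artifacts from the Hub and load them into memory."""
    # Download (or reuse from the local cache) the repo snapshot; no embedding work here.
    local_dir = snapshot_download(repo_id=repo_id, repo_type="model")

    # Load the FAISS index and the chunk metadata that maps vectors back to text.
    index = faiss.read_index(os.path.join(local_dir, "index.faiss"))
    with open(os.path.join(local_dir, "metadata.json"), encoding="utf-8") as f:
        metadata = json.load(f)
    return index, metadata


if __name__ == "__main__":
    index, metadata = load_vector_store()
    print(f"Loaded index with {index.ntotal} vectors")
```

Because `snapshot_download` caches files locally and skips files that are already present for the requested revision, restarts only re-download when the repository changes, which keeps startup fast and predictable.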
## 3. Key Advantages of This Approach

- **Reliability**: The backend will have a fast and predictable startup time, as it's only downloading files, not computing them.
- **Scalability**: The heavy lifting is done offline. The online component is lightweight and can handle API requests efficiently.
- **Maintainability**: Separating concerns makes debugging and updating each component (vector store, backend, frontend) much easier.
- **Cost-Effectiveness**: We can continue to use the free tiers for both Hugging Face Spaces and Netlify.
## 4. High-Level Task Breakdown

This plan is broken down into clear, verifiable steps. We will proceed one step at a time.

- [x] **Task 1: Pre-compute Vector Store Locally**
  - *Update*: A complete vector store was found at `src/vector_store`. We can use this existing artifact and do not need to re-compute it.
- [x] **Task 2: Upload Vector Store to Hugging Face Hub**
  - *Update*: Successfully uploaded the vector store files to the `sniro23/VedaMD-Vector-Store` repository on the Hugging Face Hub (see the upload sketch after this list).
- [x] **Task 3: Refactor Backend to Load from Hub**
  - *Update*: The backend has been successfully refactored. The `simple_vector_store.py` and `groq_medical_rag.py` modules now load the pre-computed index directly from the Hub. The `Dockerfile` and `requirements.txt` have been streamlined, removing all build-time dependencies.
- [x] **Task 4: Deploy a New, Clean Backend Space**
  - *Update*: Successfully created and deployed a new, private Docker-based Space at `sniro23/VedaMD-Backend-v2`. The application is now running and logs can be monitored on the Hub.
- [x] **Task 5: Configure and Deploy Frontend to Netlify**
  - *Update*: The frontend has been configured to connect to the new backend endpoint. The changes have been pushed, and a new deployment has been triggered on Netlify. The application should now be fully operational.
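For reproducibility, the one-time upload in Task 2 can be performed with the `huggingface_hub` client. The sketch below assumes the existing artifact at `src/vector_store` is uploaded as-is and that the user is already authenticated (e.g. via `huggingface-cli login`); the commit message and repo settings are illustrative.

```python
# One-time upload of the pre-computed vector store to the Hub (Task 2).
# Assumes you are logged in (`huggingface-cli login`); commit message is illustrative.
from huggingface_hub import HfApi

api = HfApi()

# Create the model repo if it does not exist yet, then push the local folder.
# Large binary files (the FAISS index) are stored via the Hub's Git LFS backend.
api.create_repo("sniro23/VedaMD-Vector-Store", repo_type="model", exist_ok=True)
api.upload_folder(
    folder_path="src/vector_store",
    repo_id="sniro23/VedaMD-Vector-Store",
    repo_type="model",
    commit_message="Add pre-computed FAISS vector store",
)
```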
---

**Deployment Complete!** The VedaMD Clinical Assistant is now running with a stable, decoupled architecture.
## 5. Post-Deployment Issues & Fixes

- **Issue (2024-07-26):** Encountered a persistent `PermissionError: [Errno 13] Permission denied` on Hugging Face Spaces. The application, running as a non-root user, could not write to the cache directory (`/app/data`) because it was created by the `root` user during the Docker build.
- **Solution:** The `Dockerfile` was updated to explicitly create a non-root `user`, create the `/app/data` directory, and then transfer ownership of that directory to the `user` with `chown`. This ensures the application has the necessary write permissions at runtime.
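A minimal excerpt illustrating this fix is shown below. It is a sketch, not the project's actual `Dockerfile`: the base image, username, and UID are assumptions; the point is the `useradd`/`mkdir`/`chown`/`USER` pattern.

```dockerfile
# Illustrative excerpt only; base image, username, and UID are assumptions.
FROM python:3.11-slim

WORKDIR /app
COPY . /app

# Create a non-root user and the cache directory, then hand ownership of the
# directory to that user so the application can write to /app/data at runtime.
RUN useradd --create-home --uid 1000 user \
    && mkdir -p /app/data \
    && chown -R user:user /app/data

USER user
```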