
Stable Deployment Plan: Public Testing

This document outlines a robust strategy for deploying the VedaMD Clinical Assistant for public testing, with the backend hosted on Hugging Face Spaces and the frontend on Netlify.

1. Background and Motivation

Previous deployment attempts have been plagued by resource exhaustion and dependency conflicts on Hugging Face Spaces. The primary issue was attempting to perform a heavy, one-time build task (creating the vector store) during application startup in a resource-constrained environment.

This new plan decouples the build process from the runtime process, which is a standard best practice for deploying ML applications.

2. High-Level Architecture

  • Vector Store Creation: A local script will generate the FAISS index and associated metadata.
  • Artifact Hosting: The generated vector store artifacts will be uploaded to a new model repository on the Hugging Face Hub using Git LFS.
  • Backend (Hugging Face Space): A lightweight FastAPI application that downloads the vector store from the Hub and serves the RAG API. It performs no on-the-fly processing (see the startup sketch after this list).
  • Frontend (Netlify): The existing Next.js application, configured to point to the new, stable backend API endpoint.
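
To make the startup path concrete, here is a minimal sketch of the download-then-load pattern, assuming the backend uses huggingface_hub's snapshot_download and a FAISS index file named index.faiss; the filename is illustrative, while the repository name matches Task 2 below.

```python
import faiss
from huggingface_hub import snapshot_download

# Download the pre-built vector store from the Hub instead of
# computing it at startup.
local_dir = snapshot_download(repo_id="sniro23/VedaMD-Vector-Store")

# Load the pre-computed FAISS index; no embedding work happens here.
index = faiss.read_index(f"{local_dir}/index.faiss")
```

snapshot_download returns the local path of the downloaded snapshot and reuses a cached copy when one is present, which is what keeps startup fast and predictable.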

3. Key Advantages of This Approach

  • Reliability: The backend will have a fast and predictable startup time, as it's only downloading files, not computing them.
  • Scalability: The heavy lifting is done offline. The online component is lightweight and can handle API requests efficiently.
  • Maintainability: Separating concerns makes debugging and updating each component (vector store, backend, frontend) much easier.
  • Cost-Effectiveness: We can continue to use the free tiers for both Hugging Face Spaces and Netlify.

4. High-Level Task Breakdown

This plan is broken down into clear, verifiable steps. We will proceed one step at a time.

  • Task 1: Pre-compute Vector Store Locally
    • Update: A complete vector store was found at src/vector_store. We can use this existing artifact and do not need to re-compute it.
  • Task 2: Upload Vector Store to Hugging Face Hub
    • Update: Successfully uploaded the vector store files to the sniro23/VedaMD-Vector-Store repository on the Hugging Face Hub (a sketch of this step follows the task list).
  • Task 3: Refactor Backend to Load from Hub
    • Update: The backend has been successfully refactored. The simple_vector_store.py and groq_medical_rag.py modules now load the pre-computed index directly from the Hub (the loading pattern is sketched in Section 2). The Dockerfile and requirements.txt have been streamlined, removing all build-time dependencies.
  • Task 4: Deploy a New, Clean Backend Space
    • Update: Successfully created and deployed a new, private Docker-based Space at sniro23/VedaMD-Backend-v2. The application is now running, and logs can be monitored on the Hub (a Space-creation sketch also follows the task list).
  • Task 5: Configure and Deploy Frontend to Netlify
    • Update: The frontend has been configured to connect to the new backend endpoint. The changes have been pushed, and a new deployment has been triggered on Netlify. The application should now be fully operational.
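
As referenced in Task 2, here is a hypothetical sketch of the upload step using huggingface_hub. upload_folder routes large binaries through the Hub's LFS-backed storage; the folder path comes from Task 1, and an authenticated session (huggingface-cli login or an HF_TOKEN environment variable) is assumed.

```python
from huggingface_hub import HfApi

api = HfApi()

# Create the target model repo if it does not exist yet.
api.create_repo("sniro23/VedaMD-Vector-Store", repo_type="model", exist_ok=True)

# Push the locally pre-computed artifacts (Task 1) to the Hub.
api.upload_folder(
    folder_path="src/vector_store",
    repo_id="sniro23/VedaMD-Vector-Store",
    repo_type="model",
)
```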
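The Task 4 Space creation can likewise be done programmatically; this sketch assumes the Space was created through the Python client rather than the web UI (pushing the application code to the Space is a separate step).

```python
from huggingface_hub import HfApi

# Create a private, Docker-based Space for the backend.
HfApi().create_repo(
    "sniro23/VedaMD-Backend-v2",
    repo_type="space",
    space_sdk="docker",
    private=True,
)
```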

Deployment Complete! The VedaMD Clinical Assistant is now running with a stable, decoupled architecture.

5. Post-Deployment Issues & Fixes

  • Issue (2024-07-26): Encountered a persistent PermissionError: [Errno 13] Permission denied on Hugging Face Spaces. The application, running as a non-root user, could not write to the cache directory (/app/data) because it was created by the root user during the Docker build.
  • Solution: The Dockerfile was updated to explicitly create a non-root user, create the /app/data directory, and then transfer ownership of that directory to the user with chown, ensuring the application has the necessary write permissions at runtime. A sketch of the relevant Dockerfile section follows.
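
This fragment is illustrative: the base image and user name are assumptions, and only the create-user / mkdir / chown / USER sequence reflects the documented fix.

```dockerfile
# Illustrative fragment: base image and user name are assumptions.
FROM python:3.11-slim
WORKDIR /app

# Create a non-root user and the cache directory, then hand ownership
# of the directory to that user so it is writable at runtime.
RUN useradd --create-home appuser \
    && mkdir -p /app/data \
    && chown -R appuser:appuser /app/data

COPY --chown=appuser:appuser . /app

# Drop privileges: the application runs as appuser from here on.
USER appuser
```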