VedaMD-Backend-v2/docs/implementation-plan/stable-deployment-plan.md
# Stable Deployment Plan: Public Testing
This document outlines a robust strategy for deploying the VedaMD Clinical Assistant for public testing, with the backend hosted on Hugging Face Spaces and the frontend on Netlify.
## 1. Background and Motivation
Previous deployment attempts have been plagued by resource exhaustion and dependency conflicts on Hugging Face Spaces. The primary issue was attempting to perform a heavy, one-time build task (creating the vector store) during application startup in a resource-constrained environment.
This new plan decouples the build process from the runtime process, which is a standard best practice for deploying ML applications.
## 2. High-Level Architecture
- **Vector Store Creation**: A local script will generate the FAISS index and associated metadata.
- **Artifact Hosting**: The generated vector store artifacts will be uploaded to a new model repository on the Hugging Face Hub using Git LFS.
- **Backend (Hugging Face Space)**: A lightweight FastAPI application that downloads the vector store from the Hub and serves the RAG API. It will not perform any on-the-fly processing.
- **Frontend (Netlify)**: The existing Next.js application, configured to point to the new, stable backend API endpoint.
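The backend's download-then-serve flow can be sketched as follows. This is a minimal illustration, not the actual `simple_vector_store.py` code: the index filename and directory layout are assumptions, while the repository id matches the one used in Task 2 below.

```python
from pathlib import Path

def index_path(local_dir: str, filename: str = "index.faiss") -> Path:
    """Resolve the on-disk location of the FAISS index (hypothetical layout)."""
    return Path(local_dir) / filename

# At startup, the backend would fetch the pre-built artifacts once and load
# them, rather than computing embeddings. Requires network access plus the
# `huggingface_hub` and `faiss-cpu` packages:
#
# from huggingface_hub import snapshot_download
# import faiss
#
# local_dir = snapshot_download(repo_id="sniro23/VedaMD-Vector-Store",
#                               repo_type="model")
# index = faiss.read_index(str(index_path(local_dir)))
```

Because `snapshot_download` caches files locally, restarts of the Space only re-download artifacts when the Hub repository changes.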
## 3. Key Advantages of This Approach
- **Reliability**: The backend will have a fast and predictable startup time, as it's only downloading files, not computing them.
- **Scalability**: The heavy lifting is done offline. The online component is lightweight and can handle API requests efficiently.
- **Maintainability**: Separating concerns makes debugging and updating each component (vector store, backend, frontend) much easier.
- **Cost-Effectiveness**: We can continue to use the free tiers for both Hugging Face Spaces and Netlify.
## 4. High-Level Task Breakdown
This plan is broken down into clear, verifiable steps. We will proceed one step at a time.
- [x] **Task 1: Pre-compute Vector Store Locally**
- *Update*: A complete vector store was found at `src/vector_store`. We can use this existing artifact and do not need to re-compute it.
- [x] **Task 2: Upload Vector Store to Hugging Face Hub**
- *Update*: Successfully uploaded the vector store files to the `sniro23/VedaMD-Vector-Store` repository on the Hugging Face Hub.
- [x] **Task 3: Refactor Backend to Load from Hub**
- *Update*: The backend has been successfully refactored. The `simple_vector_store.py` and `groq_medical_rag.py` modules now load the pre-computed index directly from the Hub. The `Dockerfile` and `requirements.txt` have been streamlined, removing all build-time dependencies.
- [x] **Task 4: Deploy a New, Clean Backend Space**
- *Update*: Successfully created and deployed a new, private Docker-based Space at `sniro23/VedaMD-Backend-v2`. The application is now running and logs can be monitored on the Hub.
- [x] **Task 5: Configure and Deploy Frontend to Netlify**
- *Update*: The frontend has been configured to connect to the new backend endpoint. The changes have been pushed, and a new deployment has been triggered on Netlify. The application should now be fully operational.
---
**Deployment Complete!** The VedaMD Clinical Assistant is now running with a stable, decoupled architecture.
## 5. Post-Deployment Issues & Fixes
- **Issue (2024-07-26):** Encountered a persistent `PermissionError: [Errno 13] Permission denied` on Hugging Face Spaces. The application, running as a non-root user, could not write to the cache directory (`/app/data`) because it was created by the `root` user during the Docker build.
- **Solution:** The `Dockerfile` was updated to explicitly create a non-root `user`, create the `/app/data` directory, and then transfer ownership of that directory to the `user` with `chown`. This ensures the application has the necessary write permissions at runtime.
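A minimal sketch of the relevant `Dockerfile` portion is shown below. The user name, UID, and base image are illustrative, not copied from the actual file; the key point is that `/app/data` is created and `chown`ed before the image switches to the non-root user.

```dockerfile
FROM python:3.11-slim

# Create a non-root user, create the runtime cache directory,
# and hand ownership to that user so it is writable after startup.
RUN useradd -m -u 1000 user \
    && mkdir -p /app/data \
    && chown -R user:user /app/data

USER user
WORKDIR /app
# ... copy application code and install dependencies as the non-root user ...
```

Without the `chown`, directories created during `docker build` default to `root` ownership, which is exactly the `PermissionError` seen at runtime on Spaces.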