Stable Deployment Plan: Public Testing
This document outlines a reliable and robust strategy for deploying the VedaMD Clinical Assistant for public testing, with the backend hosted on Hugging Face Spaces and the frontend on Netlify.
1. Background and Motivation
Previous deployment attempts have been plagued by resource exhaustion and dependency conflicts on Hugging Face Spaces. The primary issue was attempting to perform a heavy, one-time build task (creating the vector store) during application startup in a resource-constrained environment.
This new plan decouples the build process from the runtime process, which is a standard best practice for deploying ML applications.
2. High-Level Architecture
- Vector Store Creation: A local script will generate the FAISS index and associated metadata.
- Artifact Hosting: The generated vector store artifacts will be uploaded to a new model repository on the Hugging Face Hub using Git LFS.
- Backend (Hugging Face Space): A lightweight FastAPI application that downloads the vector store from the Hub and serves the RAG API. It will not perform any on-the-fly processing.
- Frontend (Netlify): The existing Next.js application, configured to point to the new, stable backend API endpoint.
3. Key Advantages of This Approach
- Reliability: The backend will have a fast and predictable startup time, as it's only downloading files, not computing them.
- Scalability: The heavy lifting is done offline. The online component is lightweight and can handle API requests efficiently.
- Maintainability: Separating concerns makes debugging and updating each component (vector store, backend, frontend) much easier.
- Cost-Effectiveness: We can continue to use the free tiers for both Hugging Face Spaces and Netlify.
4. High-Level Task Breakdown
This plan is broken down into clear, verifiable steps. We will proceed one step at a time.
- Task 1: Pre-compute Vector Store Locally
  - Update: A complete vector store was found at `src/vector_store`. We can use this existing artifact and do not need to re-compute it.
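For illustration of what "pre-computing a vector store" produces, here is a dependency-light sketch of the build-and-search idea using NumPy and cosine similarity. The real project uses a FAISS index, and the file names and shapes below are assumptions, not the contents of `src/vector_store`:

```python
# Sketch: persist normalized embeddings plus per-row metadata offline,
# so the runtime only has to load and search them.
import json
import numpy as np

def build_store(embeddings: np.ndarray, metadata: list, out_prefix: str) -> None:
    """Normalize embeddings and write the store artifacts to disk."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    np.save(out_prefix + ".npy", embeddings / norms)       # vectors
    with open(out_prefix + ".json", "w") as f:
        json.dump(metadata, f)                             # aligned metadata

def search(store: np.ndarray, query_vec: np.ndarray, k: int = 3) -> list:
    """Cosine-similarity top-k over a store of normalized vectors."""
    q = query_vec / np.linalg.norm(query_vec)
    scores = store @ q
    return list(np.argsort(-scores)[:k])
```

The point of the plan is that `build_store` runs once on a developer machine, while the Space only ever calls the cheap `search` path.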
- Task 2: Upload Vector Store to Hugging Face Hub
  - Update: Successfully uploaded the vector store files to the `sniro23/VedaMD-Vector-Store` repository on the Hugging Face Hub.
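An upload like this can be scripted with `huggingface_hub`; a minimal sketch follows. The repo id matches the document, but the local directory default and token handling are assumptions, and a write-scoped `HF_TOKEN` is required:

```python
# Sketch: push pre-computed vector store artifacts to a Hub model repo.
import os

REPO_ID = "sniro23/VedaMD-Vector-Store"  # repo named in this document

def upload_vector_store(local_dir: str = "src/vector_store") -> None:
    """Upload the local artifact folder to the Hub (large files go via LFS)."""
    from huggingface_hub import HfApi  # lazy import: no network at import time

    api = HfApi(token=os.environ["HF_TOKEN"])  # write-access token assumed
    api.create_repo(REPO_ID, repo_type="model", exist_ok=True)
    api.upload_folder(repo_id=REPO_ID, repo_type="model", folder_path=local_dir)

if __name__ == "__main__":
    upload_vector_store()
```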
- Task 3: Refactor Backend to Load from Hub
  - Update: The backend has been successfully refactored. The `simple_vector_store.py` and `groq_medical_rag.py` modules now load the pre-computed index directly from the Hub. The `Dockerfile` and `requirements.txt` have been streamlined, removing all build-time dependencies.
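The refactored load path amounts to "download two files, open them". A sketch of that step is below; the artifact file names (`index.faiss`, `metadata.json`) and the function name are assumptions, not the actual module contents:

```python
# Sketch: load a pre-built FAISS index from the Hub at startup.
REPO_ID = "sniro23/VedaMD-Vector-Store"  # repo named in this document

def load_precomputed_index(cache_dir: str = "/app/data"):
    """Fetch the pre-built index and metadata; nothing is parsed or embedded here."""
    from huggingface_hub import hf_hub_download  # lazy import: no network at import time
    import faiss  # runtime dependency only; no build-time tooling needed

    index_path = hf_hub_download(REPO_ID, "index.faiss", cache_dir=cache_dir)    # file name assumed
    meta_path = hf_hub_download(REPO_ID, "metadata.json", cache_dir=cache_dir)   # file name assumed
    return faiss.read_index(index_path), meta_path
```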
- Task 4: Deploy a New, Clean Backend Space
  - Update: Successfully created and deployed a new, private Docker-based Space at `sniro23/VedaMD-Backend-v2`. The application is now running and logs can be monitored on the Hub.
- Task 5: Configure and Deploy Frontend to Netlify
  - Update: The frontend has been configured to connect to the new backend endpoint. The changes have been pushed, and a new deployment has been triggered on Netlify. The application should now be fully operational.
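For reference, pointing a Next.js frontend at a Spaces-hosted backend typically means setting a build-time environment variable on Netlify. The variable name and URL below are assumptions, not values taken from the project (`*.hf.space` is the conventional public URL form for a Space named `sniro23/VedaMD-Backend-v2`):

```
# Netlify → Site settings → Environment variables (variable name assumed)
NEXT_PUBLIC_API_URL=https://sniro23-vedamd-backend-v2.hf.space
```

Because `NEXT_PUBLIC_*` variables are inlined at build time, changing the backend URL requires re-deploying the frontend.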
Deployment Complete! The VedaMD Clinical Assistant is now running with a stable, decoupled architecture.
5. Post-Deployment Issues & Fixes
- Issue (2024-07-26): Encountered a persistent `PermissionError: [Errno 13] Permission denied` on Hugging Face Spaces. The application, running as a non-root user, could not write to the cache directory (`/app/data`) because it was created by the `root` user during the Docker build.
- Solution: The `Dockerfile` was updated to explicitly create a non-root `user`, create the `/app/data` directory, and then transfer ownership of that directory to `user` with `chown`. This ensures the application has the necessary write permissions at runtime.
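The fix can be sketched as a `Dockerfile` fragment. The base image, module name in the start command, and exact layout are assumptions (Spaces conventionally runs Docker containers as UID 1000 and serves on port 7860); the essential part is the `mkdir`/`chown` pair executed as root before switching to `USER user`:

```dockerfile
FROM python:3.11-slim

# Create a non-root user; Spaces runs containers as UID 1000 by convention.
RUN useradd -m -u 1000 user

WORKDIR /app
COPY --chown=user . /app

# Create the cache directory as root, then hand ownership to the runtime user.
RUN mkdir -p /app/data && chown -R user:user /app/data

USER user
ENV HF_HOME=/app/data

# Start command assumed: a FastAPI app served by uvicorn on the Spaces port.
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
```

Setting `HF_HOME` (or an equivalent cache variable) to the owned directory ensures `huggingface_hub` downloads land somewhere writable.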