## Containerized Installation for Inference on Linux GPU Servers
Ensure Docker is installed and ready (requires `sudo`); you can skip this step if your system is already capable of running NVIDIA containers. The example here is for Ubuntu; see NVIDIA Containers for more examples.
```bash
distribution=$(. /etc/os-release; echo $ID$VERSION_ID) \
    && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
    && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
        sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
        sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit-base
sudo apt install nvidia-container-runtime
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```
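Before building, it can help to confirm that Docker can actually see the GPU. A minimal sanity check, not part of the upstream steps; the exact CUDA image tag is an assumption, any CUDA base image you have will do:

```bash
# Optional sanity check: the image tag is illustrative; any CUDA base image works.
docker run --rm --runtime=nvidia nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```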
Build the container image:

```bash
docker build -t h2ogpt .
```

Run the container (you can also use `finetune.py` and all of its parameters as shown above for training). For the fine-tuned h2oGPT with 20 billion parameters:
```bash
docker run --runtime=nvidia --shm-size=64g -p 7860:7860 \
    -v ${HOME}/.cache:/root/.cache --rm h2ogpt -it generate.py \
    --base_model=h2oai/h2ogpt-oasst1-512-20b
```

If you have a private Hugging Face token, you can instead run:
```bash
docker run --runtime=nvidia --shm-size=64g --entrypoint=bash -p 7860:7860 \
    -e HUGGINGFACE_API_TOKEN=<HUGGINGFACE_API_TOKEN> \
    -v ${HOME}/.cache:/root/.cache --rm h2ogpt -it \
    -c 'huggingface-cli login --token $HUGGINGFACE_API_TOKEN && python3.10 generate.py --base_model=h2oai/h2ogpt-oasst1-512-20b --use_auth_token=True'
```
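As an alternative sketch, recent versions of `huggingface_hub` also read the `HF_TOKEN` environment variable, which may let you skip the explicit login; whether this works depends on the library versions inside the image, so treat it as an assumption:

```bash
# Assumption: huggingface_hub inside the image honors HF_TOKEN, making the login step unnecessary.
docker run --runtime=nvidia --shm-size=64g -p 7860:7860 \
    -e HF_TOKEN=<HUGGINGFACE_API_TOKEN> \
    -v ${HOME}/.cache:/root/.cache --rm h2ogpt -it generate.py \
    --base_model=h2oai/h2ogpt-oasst1-512-20b --use_auth_token=True
```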
For your own fine-tuned model, starting from the `gpt-neox-20b` foundation model for example:

```bash
docker run --runtime=nvidia --shm-size=64g -p 7860:7860 \
    -v ${HOME}/.cache:/root/.cache --rm h2ogpt -it generate.py \
    --base_model=EleutherAI/gpt-neox-20b \
    --lora_weights=h2ogpt_lora_weights --prompt_type=human_bot
```

Open https://localhost:7860 in the browser.
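To confirm the server is up without a browser, a quick check from the host can be run; this is a convenience, not part of the upstream instructions:

```bash
# Optional check that the server responds; -k tolerates a self-signed certificate.
curl -k -s -o /dev/null -w "%{http_code}\n" https://localhost:7860
```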
## Docker Compose Setup & Inference
(Optional) Change the desired model and weights under `environment` in `docker-compose.yml`.
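For orientation only, the relevant section of `docker-compose.yml` typically looks something like the excerpt below; the service and variable names here are hypothetical, so check the actual file in the repository:

```yaml
# Hypothetical excerpt: names are illustrative, not copied from the repository's file.
services:
  h2ogpt:
    environment:
      - BASE_MODEL=h2oai/h2ogpt-oasst1-512-20b
      - LORA_WEIGHTS=h2ogpt_lora_weights
```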
Build and run the container:

```bash
docker-compose up -d --build
```

Open https://localhost:7860 in the browser.

See the logs:
```bash
docker-compose logs -f
```

Clean everything up:
```bash
docker-compose down --volumes --rmi all
```