Deployment notes for Hugging Face Spaces

  1. HF_TOKEN secret
  • Create a Hugging Face token at https://huggingface.co/settings/tokens
  • The token must have write permission so the workflow can create and push Spaces
  • In GitHub, go to Settings -> Secrets and variables -> Actions -> New repository secret
    • Name: HF_TOKEN
    • Value:
  2. Streamlit compatibility
  • The workflow creates the Space with space_sdk='streamlit', so it runs as a Streamlit app.
  • Hugging Face Spaces runs streamlit_app.py or app.py by default; this repo contains streamlit_app.py to be explicit.
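
  A Streamlit Space is configured through YAML front matter at the top of the Space's README.md. A minimal sketch (the title is illustrative):

  ```yaml
  ---
  title: OCR Demo              # illustrative title
  sdk: streamlit
  app_file: streamlit_app.py   # explicit entry point, matching this repo
  ---
  ```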
  3. System dependencies
  • Some OCR engines require system packages (e.g., Tesseract binary, system libs for PaddlePaddle). Hugging Face's Streamlit SDK does not allow installing system packages.
  • If you need system packages, use a Docker-based Space (set space_sdk='docker' and add a Dockerfile that installs required system packages).
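
  As a sketch, a Dockerfile for such a Space might look like the following (package names are assumptions; Spaces route traffic to port 7860 by default):

  ```dockerfile
  FROM python:3.11-slim

  # System packages the OCR engines need: the Tesseract binary, plus libgl1 and
  # libglib2.0-0, which are common OpenCV/PaddlePaddle runtime dependencies.
  RUN apt-get update && apt-get install -y --no-install-recommends \
          tesseract-ocr libgl1 libglib2.0-0 \
      && rm -rf /var/lib/apt/lists/*

  WORKDIR /app
  COPY requirements.txt .
  RUN pip install --no-cache-dir -r requirements.txt
  COPY . .

  # Spaces expect the app to listen on port 7860
  EXPOSE 7860
  CMD ["streamlit", "run", "streamlit_app.py", "--server.port", "7860", "--server.address", "0.0.0.0"]
  ```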
  4. LLM / Ollama
  • The app optionally uses Ollama for LLM features. Ollama is not installed by default in Spaces; LLM features are disabled when the ollama binary is absent.
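
  One way to implement this graceful degradation is to probe for the ollama binary at startup and fall back to the raw OCR output when it is missing. This is a hypothetical sketch, not the app's actual code:

  ```python
  import shutil

  # True only when an `ollama` executable is on PATH.
  OLLAMA_AVAILABLE = shutil.which("ollama") is not None

  def llm_cleanup(text: str) -> str:
      """Optionally post-process OCR text with an LLM; no-op when Ollama is absent."""
      if not OLLAMA_AVAILABLE:
          return text  # LLM features disabled: return raw OCR output unchanged
      # Here the app would call the local Ollama server (e.g. via the
      # `ollama` Python client); the placeholder just returns the input.
      return text
  ```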
  5. Tesseract
  • Ensure Tesseract is available in the environment or use the Docker approach to install it.
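
  A quick way to verify Tesseract is reachable from Python is to look it up on PATH; the pytesseract wiring below is an assumption about which binding the OCR engines use:

  ```python
  import shutil
  from typing import Optional

  def find_tesseract() -> Optional[str]:
      """Return the path to the tesseract binary, or None if it is not installed."""
      return shutil.which("tesseract")

  # Hypothetical wiring, if pytesseract is the binding in use:
  # import pytesseract
  # pytesseract.pytesseract.tesseract_cmd = find_tesseract() or "tesseract"
  ```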
  6. Running CI/CD
  • Once the HF_TOKEN secret is set, pushing to main triggers the GitHub Actions workflow .github/workflows/deploy_to_hf.yml, which creates the Space and uploads the repository.
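
  A minimal sketch of such a workflow using the huggingface_hub client (the Space id is a placeholder; the actual deploy_to_hf.yml may differ):

  ```yaml
  name: Deploy to Hugging Face Spaces
  on:
    push:
      branches: [main]
  jobs:
    deploy:
      runs-on: ubuntu-latest
      steps:
        - uses: actions/checkout@v4
        - run: pip install huggingface_hub
        - env:
            HF_TOKEN: ${{ secrets.HF_TOKEN }}
          run: |
            python - <<'EOF'
            import os
            from huggingface_hub import HfApi

            api = HfApi(token=os.environ["HF_TOKEN"])
            # "your-username/your-space" is a placeholder Space id.
            api.create_repo("your-username/your-space", repo_type="space",
                            space_sdk="docker", exist_ok=True)
            api.upload_folder(folder_path=".", repo_id="your-username/your-space",
                              repo_type="space")
            EOF
  ```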

Note: This repository includes a Dockerfile and the CI workflow is configured to create a Docker-based Space (space_sdk='docker'). The Dockerfile installs system dependencies such as Tesseract so the OCR engines can run inside the Space container.

  7. Troubleshooting
  • If the deployment fails, open the Actions run logs to see the error and adjust the workflow or repository accordingly.