
Warbler CDA Docker Build Performance

Build Configuration

  • Dockerfile: Minimal FractalStat testing setup
  • Base Image: python:3.11-slim
  • Build Context Optimization: .dockerignore excludes cache files and large directories
  • Dependency Strategy: Minimal ML dependencies for FractalStat testing

Performance Measurements

Optimized Build Results (Windows with WSL)

✅ FINAL OPTIMIZED BUILD: 38.4 seconds (~40 seconds)
├── Base Image Pull: 3.7 seconds
├── System Dependencies: 20.5 seconds (git install)
├── Dependencies (pip install): 5.8 seconds
│   - pydantic>=2.0.0 (only needed library!)
│   - pytest>=7.0.0 (testing framework)
├── Code Copy: 0.2 seconds
├── Layer Export: 6.4 seconds
└── Image Unpack: 1.7 seconds

Performance Improvement Achieved

🚀 Optimization Results:

  • Build Time Reduction: 94% (601.6 s → 38.4 s)
  • Pip Install Reduction: 98% (295.6 s → 5.8 s)
  • Build Context Size: 556 B after the final .dockerignore reduction
  • Expected Image Size: ~250 MB (vs. 12.29 GB for the original bloated image)
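
The two percentages can be checked directly from the timings reported above. This is only a worked calculation on the figures already given, with results rounded to the nearest percent:

```latex
\begin{align*}
\text{build time reduction} &= \frac{601.6 - 38.4}{601.6} = \frac{563.2}{601.6} \approx 0.936 \approx 94\% \\
\text{pip install reduction} &= \frac{295.6 - 5.8}{295.6} = \frac{289.8}{295.6} \approx 0.980 \approx 98\%
\end{align*}
```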

📊 Bottleneck Eliminated:

  • Removed the PyTorch/Transformers dependency chain, which caused 98% of the bloat
  • FractalStat modules require no ML libraries
  • The modules are pure Python, using dataclasses, enums, typing, and json

🔍 Root Cause Identified: The original bloat was caused by transformers[torch] pulling in:

  • PyTorch CPU (~1GB)
  • 100+ optional dependencies (~11GB)
  • All unnecessary for FractalStat core functionality

Recommendations for Faster Builds

For Development Builds

  1. Use cached layers - Base image and system dependencies rarely change
  2. Separate dependency layers - Cache pip installs when code changes frequently
  3. Minimal dependencies - Only install what's needed for testing FractalStat specifically
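
The three development recommendations above can be sketched as a single-stage Dockerfile ordered from least to most frequently changing layers. This is a minimal illustration, not the project's actual Dockerfile: the base image, the git install, and the two pinned packages come from the measurements in this document, while the working directory, the `COPY . .` layout, and the `pytest` command are assumptions.

```dockerfile
# Sketch only: the /app path, source layout, and CMD are assumptions.
FROM python:3.11-slim

# System dependencies change rarely, so they sit in an early, cacheable layer.
RUN apt-get update \
    && apt-get install -y --no-install-recommends git \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Python dependencies get their own layer ahead of the source code,
# so editing code does not invalidate the cached pip install.
RUN pip install --no-cache-dir "pydantic>=2.0.0" "pytest>=7.0.0"

# Source code is copied last because it changes most often.
# The .dockerignore file keeps cache files and large directories out of this step.
COPY . .

# Hypothetical default command: run the test suite.
CMD ["pytest"]
```

With this ordering, a code-only change should reuse the cached base image, system dependency, and pip layers and rebuild only from the `COPY` step onward.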

For Production Builds

  1. Multi-stage builds - Separate testing and runtime images
  2. Dependency optimization - Use Docker layer caching more effectively
  3. Alternative base images - Consider smaller Python images or compiled binaries
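
A multi-stage layout for the first recommendation might look like the sketch below. The stage names (`base`, `test`, `runtime`), the `/app` path, and the runtime command are hypothetical, and the git install is left out on the assumption that it is not needed in every stage.

```dockerfile
# Sketch only: stage names, paths, and commands are assumptions.

# Shared base: runtime dependency and source code.
FROM python:3.11-slim AS base
WORKDIR /app
RUN pip install --no-cache-dir "pydantic>=2.0.0"
COPY . .

# Test stage: adds pytest on top of the shared base.
FROM base AS test
RUN pip install --no-cache-dir "pytest>=7.0.0"
CMD ["pytest"]

# Runtime stage: built from base, so pytest never reaches the runtime image.
FROM base AS runtime
CMD ["python"]
```

A specific stage can be selected at build time with the `--target` flag, for example `docker build --target test -t warbler-cda:test .`, where the image tag is only an illustrative name.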

Testing Results

  • ✅ All 70 FractalStat entity tests pass
  • ✅ FractalStat coordinates and entities work correctly
  • ✅ RAG bridge integration functions properly
  • ✅ Container startup and imports work as expected

Performance Notes

  • First-time build with the original ML dependencies: ~10 minutes (601.6 s); the optimized build measured 38.4 s
  • Subsequent builds: Should be faster with Docker layer caching
  • Network dependency: Download times vary by internet connection
  • WSL overhead: Minimal impact on overall build time