Robin L. M. Cheung, MBA committed on
Commit 01504c4 · 1 Parent(s): 0b57493

feat: Add local CUDA support, MCP server, Spaces GPU selection, and stacking roadmap


- Remove ZeroGPU dependency, optimize for local CUDA (4090/3090/3070ti)
- Add MCP server (mcp_server.py) with sharp_predict, list_outputs tools
- Add hardware_config.py for Spaces GPU selection with persistence
- Add Settings tab in Gradio UI for hardware configuration
- Support all HuggingFace Spaces GPUs (ZeroGPU through A100)
- Enable Gradio API by default (show_api=True)
- Add comprehensive WARP.md with codebase map and documentation
- Complete multi-image stacking roadmap with implementation phases

New files:
- WARP.md: Project guidance for WARP/AI assistants
- mcp_server.py: MCP server for programmatic access
- hardware_config.py: GPU hardware selection module

Environment:
- SHARP_PORT (default: 49200) for Gradio
- SHARP_MCP_PORT (default: 49201) for MCP
- CUDA_VISIBLE_DEVICES for multi-GPU selection

Files changed (8)
  1. .gitignore +1 -0
  2. WARP.md +344 -0
  3. app.py +145 -3
  4. hardware_config.py +252 -0
  5. mcp_server.py +224 -0
  6. model_utils.py +71 -20
  7. pyproject.toml +2 -1
  8. requirements.txt +2 -1
.gitignore CHANGED
@@ -217,3 +217,4 @@ __marimo__/
 
 # Kilo Code
 .kilocode/
+.hardware_config.json
WARP.md ADDED
@@ -0,0 +1,344 @@
# WARP.md

This file provides guidance to WARP (warp.dev) when working with code in this repository.

## Project Overview

SHARP (Single-image 3D Gaussian scene prediction) Gradio demo. Wraps Apple's SHARP model to predict 3D Gaussian scenes from single images, export `.ply` files, and optionally render camera trajectory videos.

Optimized for local CUDA (4090/3090/3070 Ti) or HuggingFace Spaces GPUs. Includes an MCP server for programmatic access.

## Development Commands

```bash
# Install dependencies (uses uv package manager)
uv sync

# Run the Gradio app (port 49200 by default)
uv run python app.py

# Run MCP server (stdio transport)
uv run python mcp_server.py

# Lint with ruff
uv run ruff check .
uv run ruff format .
```

## Codebase Map

```
ml-sharp/
├── app.py                            # Gradio UI (tabs: Run, Examples, About, Settings)
│   ├── build_demo()                  # Main UI builder
│   ├── run_sharp()                   # Inference entrypoint called by UI
│   └── discover_examples()           # Load precompiled examples
├── model_utils.py                    # Core inference + rendering
│   ├── ModelWrapper                  # Checkpoint loading, predictor caching
│   │   ├── predict_to_ply()          # Image → Gaussians → PLY
│   │   └── render_video()            # Gaussians → MP4 trajectory
│   ├── PredictionOutputs             # Dataclass for inference results
│   ├── configure_gpu_mode()          # Switch between local/Spaces GPU
│   └── predict_and_maybe_render_gpu  # Module-level entrypoint
├── hardware_config.py                # GPU hardware selection & persistence
│   ├── HardwareConfig                # Dataclass with mode, hardware, duration
│   ├── get_hardware_choices()        # Dropdown options
│   └── SPACES_HARDWARE_SPECS         # HF Spaces GPU specs & pricing
├── mcp_server.py                     # MCP server for programmatic access
│   ├── sharp_predict                 # Tool: image → PLY + video
│   ├── list_outputs                  # Tool: list generated files
│   └── sharp://info                  # Resource: GPU status, config
├── assets/examples/                  # Precompiled example outputs
├── outputs/                          # Runtime outputs (PLY, MP4)
├── .hardware_config.json             # Persisted hardware settings
├── pyproject.toml                    # Dependencies (uv)
└── WARP.md                           # This file
```

### Data Flow

```
Image → load_rgb() → predict_image() → Gaussians3D → save_ply() → PLY
                                            │
                                            └→ render_video() → MP4
```

## Architecture

### Core Files

- `app.py` — Gradio UI with tabs for Run/Examples/About/Settings. Handles example discovery from `assets/examples/` via `manifest.json` or filename conventions.
- `model_utils.py` — SHARP model wrapper with checkpoint loading (HF Hub → CDN fallback), inference via `predict_to_ply()`, and CUDA video rendering via `render_video()`.
- `hardware_config.py` — GPU hardware selection between local CUDA and HuggingFace Spaces. Persists to `.hardware_config.json`.
- `mcp_server.py` — MCP server exposing the `sharp_predict` tool and the `sharp://info` resource.

### Key Patterns

**Local CUDA mode**: Model kept on GPU by default (`SHARP_KEEP_MODEL_ON_DEVICE=1`) for better performance on dedicated GPUs.

**Spaces GPU mode**: Uses the `@spaces.GPU` decorator for dynamic GPU allocation on HuggingFace Spaces. Configurable via the Settings tab.

**Checkpoint resolution order** (sketched below):
1. `SHARP_CHECKPOINT_PATH` env var
2. HF Hub cache
3. HF Hub download
4. Upstream CDN via `torch.hub`

**Video rendering**: Requires CUDA (gsplat). Falls back gracefully on CPU-only systems by returning `None` for the video path.

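A minimal sketch of that fallback chain. The helper name and CDN URL here are hypothetical; the real logic lives in `model_utils.py` and may differ in details:

```python
# Hypothetical sketch of the checkpoint fallback chain described above.
import os
from pathlib import Path

import torch
from huggingface_hub import hf_hub_download


def _resolve_checkpoint() -> Path:
    override = os.getenv("SHARP_CHECKPOINT_PATH")
    if override and Path(override).exists():
        return Path(override)  # 1. explicit local override

    repo_id = os.getenv("SHARP_HF_REPO_ID", "apple/Sharp")
    filename = os.getenv("SHARP_HF_FILENAME", "sharp_2572gikvuh.pt")
    try:
        # 2./3. HF Hub cache hit, else download into the cache
        return Path(hf_hub_download(repo_id=repo_id, filename=filename))
    except Exception:
        # 4. last resort: upstream CDN via torch.hub (URL is a placeholder)
        dst = Path("checkpoints") / filename
        dst.parent.mkdir(exist_ok=True)
        torch.hub.download_url_to_file(f"https://example.com/{filename}", str(dst))
        return dst
```
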
## Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `SHARP_PORT` | `49200` | Gradio server port |
| `SHARP_MCP_PORT` | `49201` | MCP server port |
| `SHARP_CHECKPOINT_PATH` | — | Override local checkpoint path |
| `SHARP_HF_REPO_ID` | `apple/Sharp` | HuggingFace repo |
| `SHARP_HF_FILENAME` | `sharp_2572gikvuh.pt` | Checkpoint filename |
| `SHARP_KEEP_MODEL_ON_DEVICE` | `1` | Keep model on GPU (set `0` to free VRAM) |
| `CUDA_VISIBLE_DEVICES` | — | GPU selection (e.g., `0` or `0,1`) |

## Gradio API

API is enabled by default. Access at `http://localhost:49200/?view=api`.

### Endpoint: `/api/run_sharp`

```python
import requests

response = requests.post(
    "http://localhost:49200/api/run_sharp",
    json={
        "data": [
            "/path/to/image.jpg",  # image_path
            "rotate_forward",      # trajectory_type
            0,                     # output_long_side (0 = match input)
            60,                    # num_frames
            30,                    # fps
            True,                  # render_video
        ]
    },
)
result = response.json()["data"]
video_path, ply_path, status = result
```

## MCP Server

Run the MCP server for integration with AI agents:

```bash
uv run python mcp_server.py
```

### MCP Config (for clients like Warp)

```json
{
  "mcpServers": {
    "sharp": {
      "command": "uv",
      "args": ["run", "python", "mcp_server.py"],
      "cwd": "/home/robin/CascadeProjects/ml-sharp"
    }
  }
}
```

### Tools

- `sharp_predict(image_path, render_video=True, trajectory_type="rotate_forward", ...)` — Run inference
- `list_outputs()` — List generated PLY/MP4 files

### Resources

- `sharp://info` — GPU status, configuration
- `sharp://help` — Usage documentation

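For a quick programmatic smoke test, a sketch using the official `mcp` Python SDK's stdio client (assumes `mcp>=1.0`; the image path is a placeholder):

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main() -> None:
    # Spawn the server over stdio, exactly as an MCP client would
    params = StdioServerParameters(command="uv", args=["run", "python", "mcp_server.py"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "sharp_predict",
                {"image_path": "/abs/path/to/image.jpg", "render_video": False},
            )
            print(result.content)


asyncio.run(main())
```
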
## Multi-GPU Configuration

Select a GPU via environment variable:

```bash
# Use GPU 0 (e.g., 4090)
CUDA_VISIBLE_DEVICES=0 uv run python app.py

# Use GPU 1 (e.g., 3090)
CUDA_VISIBLE_DEVICES=1 uv run python app.py
```

## HuggingFace Spaces GPU

The app supports HuggingFace Spaces paid GPUs for faster inference or larger models. Configure via the **Settings** tab.

### Available Hardware

| Hardware | VRAM | Price/hr | Best For |
|----------|------|----------|----------|
| ZeroGPU (H200) | 70GB | Free (PRO) | Demos, dynamic allocation |
| T4 small | 16GB | $0.40 | Light workloads |
| T4 medium | 16GB | $0.60 | Standard workloads |
| L4x1 | 24GB | $0.80 | Standard inference |
| L4x4 | 96GB | $3.80 | Multi-GPU |
| L40Sx1 | 48GB | $1.80 | Large models |
| L40Sx4 | 192GB | $8.30 | Very large models |
| A10G small | 24GB | $1.00 | Balanced |
| A10G large | 24GB | $1.50 | More CPU/RAM |
| A100 large | 80GB | $2.50 | Maximum VRAM |

### Deploying to Spaces

1. Push to a HuggingFace Space
2. Set hardware in the Space settings (or use `suggested_hardware` in README.md)
3. The app auto-detects the Spaces environment via the `SPACE_ID` env var

### README.md Metadata for Spaces

```yaml
---
title: SHARP - 3D Gaussian Scene Prediction
emoji: 🔪
colorFrom: purple
colorTo: indigo
sdk: gradio
sdk_version: 6.2.0
python_version: 3.13.11
app_file: app.py
suggested_hardware: l4x1  # or zero-gpu, a100-large, etc.
startup_duration_timeout: 1h
preload_from_hub:
  - apple/Sharp sharp_2572gikvuh.pt
---
```

## Examples System

Place precompiled outputs in `assets/examples/`:
- `<name>.{jpg,png,webp}` + `<name>.mp4` + `<name>.ply`
- Or define `assets/examples/manifest.json` with `{label, image, video, ply}` entries, e.g. the sketch below

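A plausible `manifest.json`, assuming a top-level list of entries; the exact schema is whatever `discover_examples()` in `app.py` accepts:

```json
[
  {
    "label": "Living room",
    "image": "living_room.jpg",
    "video": "living_room.mp4",
    "ply": "living_room.ply"
  }
]
```
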
## Multi-Image Stacking Roadmap

SHARP predicts 3D Gaussians from a single image. To "stack" multiple images into a unified scene:

### Required Components

1. **Pose Estimation** (`multi_view.py`)
   - Estimate relative camera poses between images
   - Options: COLMAP, hloc, or PnP-based
   - Transform each prediction to a common world frame

2. **Gaussian Merging** (`gaussian_merge.py`)
   - Concatenate Gaussian parameters (means, covariances, colors, opacities)
   - Deduplicate overlapping regions via density-based filtering
   - Optional: fine-tune the merged scene with a photometric loss

3. **UI Changes**
   - Multi-upload widget
   - Alignment preview/validation
   - Progress indicator for multi-image processing

### Data Structures

```python
@dataclass
class AlignedGaussians:
    gaussians: Gaussians3D
    world_transform: torch.Tensor  # 4x4 SE(3)
    source_image: Path


def merge_gaussians(aligned: list[AlignedGaussians]) -> Gaussians3D:
    # 1. Transform each Gaussian's means by world_transform
    # 2. Concatenate all parameters
    # 3. Density-based pruning in overlapping regions
    ...
```

### Dependencies to Add

- `pycolmap` or `hloc` for pose estimation
- `open3d` for point cloud operations (optional)

### Implementation Phases

#### Phase 1: Basic Multi-Image Pipeline
- [ ] Add `multi_view.py` with `estimate_relative_pose(img1, img2)` using feature matching
- [ ] Add `gaussian_merge.py` with naive concatenation (no dedup)
- [ ] UI: multi-file upload in a new "Stack" tab
- [ ] Export merged PLY

#### Phase 2: Pose Estimation Options
- [ ] Integrate COLMAP sparse reconstruction for >2 images
- [ ] Add hloc (Hierarchical Localization) as a lightweight alternative
- [ ] Fallback: manual pose input for known camera rigs

#### Phase 3: Gaussian Deduplication
- [ ] Implement KD-tree based nearest-neighbor pruning (see sketch below)
- [ ] Merge overlapping Gaussians by averaging parameters
- [ ] Add confidence weighting based on view angle

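A sketch of how the Phase 3 pruning could start, using `scipy` (assumed greedy keep-first policy and radius; not part of the codebase yet):

```python
import numpy as np
from scipy.spatial import cKDTree


def prune_duplicates(means: np.ndarray, radius: float = 0.01) -> np.ndarray:
    """Greedy radius-based pruning over (N, 3) Gaussian means (in meters).

    Keeps the first Gaussian seen and drops later ones within `radius` of it.
    """
    tree = cKDTree(means)
    keep = np.ones(len(means), dtype=bool)
    for i in range(len(means)):
        if not keep[i]:
            continue
        for j in tree.query_ball_point(means[i], r=radius):
            if j > i:
                keep[j] = False  # duplicate of an earlier, kept Gaussian
    return np.flatnonzero(keep)
```
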
#### Phase 4: Refinement (Optional)
- [ ] Photometric loss optimization on the merged scene
- [ ] Iterative alignment refinement
- [ ] Support for depth priors from stereo/MVS

### API Design

```python
# multi_view.py
def estimate_poses(
    images: list[Path],
    method: Literal["colmap", "hloc", "pnp"] = "hloc",
) -> list[np.ndarray]:  # list of 4x4 world-to-camera transforms
    ...


# gaussian_merge.py
def merge_scenes(
    predictions: list[PredictionOutputs],
    poses: list[np.ndarray],
    deduplicate: bool = True,
    dedup_radius: float = 0.01,  # meters
) -> Gaussians3D:
    ...


# app.py (Stack tab)
def run_stack(
    images: list[str],  # Gradio multi-file upload
    pose_method: str,
    deduplicate: bool,
) -> tuple[str | None, str | None, str]:  # video, ply, status
    ...
```

### MCP Extension

```python
# mcp_server.py additions
@mcp.tool()
def sharp_stack(
    image_paths: list[str],
    pose_method: str = "hloc",
    deduplicate: bool = True,
    render_video: bool = True,
) -> dict:
    """Stack multiple images into a unified 3D Gaussian scene."""
    ...
```

### Technical Considerations

**Coordinate Systems** (see the sketch after this list):
- SHARP outputs Gaussians in camera-centric coordinates
- These must be transformed to the world frame using the estimated poses
- Convention: Y-up, -Z forward (OpenGL style)

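A minimal sketch of that world-frame transform for the means only (assumes a 4x4 camera-to-world matrix; rotating the covariances is omitted here):

```python
import torch


def means_to_world(means_cam: torch.Tensor, T_wc: torch.Tensor) -> torch.Tensor:
    """Map (N, 3) camera-frame Gaussian means into the world frame.

    T_wc is a 4x4 camera-to-world SE(3) transform.
    """
    R, t = T_wc[:3, :3], T_wc[:3, 3]
    return means_cam @ R.T + t  # rotate, then translate
```
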
**Memory Management**:
- Each SHARP prediction uses roughly 50-200MB of GPU memory
- Batch processing with model unload between predictions (see the loop below)
- Consider a streaming merge for >10 images

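A hedged sketch of such a batch loop; `predict_fn` is a stand-in for whatever wraps `ModelWrapper.predict_to_ply()`, whose exact signature may differ:

```python
import gc
from typing import Callable

import torch


def predict_sequential(predict_fn: Callable[[str], str], image_paths: list[str]) -> list[str]:
    ply_paths: list[str] = []
    for path in image_paths:
        ply_paths.append(predict_fn(path))
        gc.collect()
        if torch.cuda.is_available():
            torch.cuda.empty_cache()  # release cached VRAM between predictions
    return ply_paths
```
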
**Quality Metrics**:
- Reprojection error for pose validation (see the sketch below)
- Gaussian density histogram for coverage analysis
- Visual comparison with ground truth (if available)
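
For the first metric, a standard pinhole reprojection-error check (illustrative; assumes known intrinsics `K` and a world-to-camera transform `T_cw`):

```python
import numpy as np


def mean_reprojection_error(
    pts_world: np.ndarray,    # (N, 3) world points
    T_cw: np.ndarray,         # 4x4 world-to-camera transform
    K: np.ndarray,            # 3x3 camera intrinsics
    uv_observed: np.ndarray,  # (N, 2) matched pixel coordinates
) -> float:
    pts_cam = pts_world @ T_cw[:3, :3].T + T_cw[:3, 3]
    proj = pts_cam @ K.T
    uv = proj[:, :2] / proj[:, 2:3]  # perspective divide
    return float(np.linalg.norm(uv - uv_observed, axis=1).mean())
```
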
app.py CHANGED
@@ -29,7 +29,22 @@ from typing import Final
 
 import gradio as gr
 
-from model_utils import TrajectoryType, predict_and_maybe_render_gpu
+import os
+
+from model_utils import (
+    TrajectoryType,
+    predict_and_maybe_render_gpu,
+    configure_gpu_mode,
+    get_gpu_status,
+)
+from hardware_config import (
+    get_hardware_choices,
+    parse_hardware_choice,
+    get_config,
+    update_config,
+    SPACES_HARDWARE_SPECS,
+    is_running_on_spaces,
+)
 
 # -----------------------------------------------------------------------------
 # Paths & constants
@@ -42,6 +57,7 @@ EXAMPLES_DIR: Final[Path] = ASSETS_DIR / "examples"
 
 IMAGE_EXTS: Final[tuple[str, ...]] = (".png", ".jpg", ".jpeg", ".webp")
 DEFAULT_QUEUE_MAX_SIZE: Final[int] = 32
+DEFAULT_PORT: Final[int] = int(os.getenv("SHARP_PORT", "49200"))
 
 THEME: Final = gr.themes.Soft(
     primary_hue="indigo",
@@ -239,6 +255,68 @@ def _validate_image(image_path: str | None) -> None:
         raise gr.Error("Upload an image first.")
 
 
+# -----------------------------------------------------------------------------
+# Hardware Configuration
+# -----------------------------------------------------------------------------
+
+
+def _get_current_hardware_value() -> str:
+    """Get current hardware choice value for dropdown."""
+    config = get_config()
+    if config.mode == "local":
+        return "local"
+    return f"spaces:{config.spaces_hardware}"
+
+
+def _format_gpu_status() -> str:
+    """Format GPU status as markdown."""
+    status = get_gpu_status()
+    config = get_config()
+
+    lines = ["### Current Status"]
+    lines.append(f"- **Mode:** {'Local CUDA' if config.mode == 'local' else 'HuggingFace Spaces'}")
+
+    if config.mode == "spaces":
+        hw_spec = SPACES_HARDWARE_SPECS.get(config.spaces_hardware, {})
+        lines.append(f"- **Spaces Hardware:** {hw_spec.get('name', config.spaces_hardware)}")
+        lines.append(f"- **VRAM:** {hw_spec.get('vram', 'N/A')}")
+        lines.append(f"- **Price:** {hw_spec.get('price', 'N/A')}")
+        lines.append(f"- **Duration:** {config.spaces_duration}s")
+    else:
+        lines.append(f"- **CUDA Available:** {'✅ Yes' if status['cuda_available'] else '❌ No'}")
+        lines.append(f"- **Spaces Module:** {'✅ Installed' if status['spaces_available'] else '❌ Not installed'}")
+
+    if status['devices']:
+        lines.append("\n### Local GPUs")
+        for dev in status['devices']:
+            lines.append(f"- **GPU {dev['index']}:** {dev['name']} ({dev['total_memory_gb']}GB)")
+
+    if is_running_on_spaces():
+        lines.append("\n⚠️ *Running on HuggingFace Spaces*")
+
+    return "\n".join(lines)
+
+
+def _apply_hardware_config(choice: str, duration: int) -> str:
+    """Apply hardware configuration and return status."""
+    mode, spaces_hw = parse_hardware_choice(choice)
+
+    # Update config
+    update_config(
+        mode=mode,
+        spaces_hardware=spaces_hw if spaces_hw else "zero-gpu",
+        spaces_duration=duration,
+    )
+
+    # Configure GPU mode in model_utils
+    configure_gpu_mode(
+        use_spaces=(mode == "spaces"),
+        duration=duration,
+    )
+
+    return _format_gpu_status()
+
+
 def run_sharp(
     image_path: str | None,
     trajectory_type: TrajectoryType,
@@ -354,7 +432,7 @@ def build_demo() -> gr.Blocks:
             )
 
             render_toggle = gr.Checkbox(
-                label="Render MP4 (CUDA / ZeroGPU only)",
+                label="Render MP4 (requires CUDA)",
                 value=True,
             )
 
@@ -490,6 +568,65 @@ def build_demo() -> gr.Blocks:
                """.strip()
            )
 
+        with gr.Tab("⚙️ Settings", id="settings"):
+            with gr.Column(elem_id="settings-panel"):
+                gr.Markdown("### GPU Hardware Selection")
+                gr.Markdown(
+                    "Select local CUDA or HuggingFace Spaces GPU for inference. "
+                    "Spaces GPUs require deploying to HuggingFace Spaces."
+                )
+
+                with gr.Row():
+                    with gr.Column(scale=3):
+                        hw_dropdown = gr.Dropdown(
+                            label="Hardware",
+                            choices=get_hardware_choices(),
+                            value=_get_current_hardware_value(),
+                            interactive=True,
+                        )
+
+                        duration_slider = gr.Slider(
+                            label="Spaces GPU Duration (seconds)",
+                            info="Max time for @spaces.GPU decorator (ZeroGPU only)",
+                            minimum=60,
+                            maximum=300,
+                            step=30,
+                            value=get_config().spaces_duration,
+                            interactive=True,
+                        )
+
+                        apply_btn = gr.Button("Apply & Save", variant="primary")
+
+                    with gr.Column(scale=2):
+                        hw_status = gr.Markdown(
+                            value=_format_gpu_status(),
+                            elem_id="hw-status",
+                        )
+
+                apply_btn.click(
+                    fn=_apply_hardware_config,
+                    inputs=[hw_dropdown, duration_slider],
+                    outputs=[hw_status],
+                )
+
+                gr.Markdown(
+                    """
+                    ---
+                    ### Spaces Hardware Reference
+
+                    | Hardware | VRAM | Price | Best For |
+                    |----------|------|-------|----------|
+                    | ZeroGPU (H200) | 70GB | Free (PRO) | Demos, dynamic allocation |
+                    | T4 small/medium | 16GB | $0.40-0.60/hr | Light workloads |
+                    | L4x1 | 24GB | $0.80/hr | Standard inference |
+                    | L40Sx1 | 48GB | $1.80/hr | Large models |
+                    | A10G large | 24GB | $1.50/hr | Balanced cost/performance |
+                    | A100 large | 80GB | $2.50/hr | Maximum VRAM |
+
+                    *Prices as of Dec 2024. See [HuggingFace Spaces GPU docs](https://huggingface.co/docs/hub/spaces-gpus).*
+                    """
+                )
+
     demo.queue(max_size=DEFAULT_QUEUE_MAX_SIZE, default_concurrency_limit=1)
     return demo
 
@@ -497,4 +634,9 @@ def build_demo() -> gr.Blocks:
 demo = build_demo()
 
 if __name__ == "__main__":
-    demo.launch(theme=THEME, css=CSS)
+    demo.launch(
+        theme=THEME,
+        css=CSS,
+        server_port=DEFAULT_PORT,
+        show_api=True,
+    )
hardware_config.py ADDED
@@ -0,0 +1,252 @@
"""Hardware configuration for local CUDA and HuggingFace Spaces GPU selection.

This module provides:
- Hardware mode selection (local CUDA vs Spaces GPU)
- Persistent configuration via JSON file
- HuggingFace Spaces GPU hardware options

Spaces GPU pricing (as of Dec 2024):
- ZeroGPU (H200): Free (PRO subscribers), dynamic allocation
- T4-small: $0.40/hr, 16GB VRAM
- T4-medium: $0.60/hr, 16GB VRAM
- L4x1: $0.80/hr, 24GB VRAM
- L4x4: $3.80/hr, 96GB VRAM
- L40Sx1: $1.80/hr, 48GB VRAM
- L40Sx4: $8.30/hr, 192GB VRAM
- A10G-small: $1.00/hr, 24GB VRAM
- A10G-large: $1.50/hr, 24GB VRAM
- A100-large: $2.50/hr, 80GB VRAM
"""

from __future__ import annotations

import json
import os
from dataclasses import dataclass, field
from pathlib import Path
from typing import Final, Literal

# Hardware mode: local CUDA or HuggingFace Spaces
HardwareMode = Literal["local", "spaces"]

# Spaces hardware flavors (from HF docs)
SpacesHardware = Literal[
    "zero-gpu",      # ZeroGPU (H200, dynamic, free for PRO)
    "t4-small",      # Nvidia T4 small
    "t4-medium",     # Nvidia T4 medium
    "l4x1",          # 1x Nvidia L4
    "l4x4",          # 4x Nvidia L4
    "l40s-x1",       # 1x Nvidia L40S
    "l40s-x4",       # 4x Nvidia L40S
    "a10g-small",    # Nvidia A10G small
    "a10g-large",    # Nvidia A10G large
    "a10g-largex2",  # 2x Nvidia A10G large
    "a10g-largex4",  # 4x Nvidia A10G large
    "a100-large",    # Nvidia A100 large (80GB)
]

# Hardware specs for display
SPACES_HARDWARE_SPECS: Final[dict[str, dict]] = {
    "zero-gpu": {
        "name": "ZeroGPU (H200)",
        "vram": "70GB",
        "price": "Free (PRO)",
        "description": "Dynamic allocation, best for demos",
    },
    "t4-small": {
        "name": "Nvidia T4 small",
        "vram": "16GB",
        "price": "$0.40/hr",
        "description": "4 vCPU, 15GB RAM",
    },
    "t4-medium": {
        "name": "Nvidia T4 medium",
        "vram": "16GB",
        "price": "$0.60/hr",
        "description": "8 vCPU, 30GB RAM",
    },
    "l4x1": {
        "name": "1x Nvidia L4",
        "vram": "24GB",
        "price": "$0.80/hr",
        "description": "8 vCPU, 30GB RAM",
    },
    "l4x4": {
        "name": "4x Nvidia L4",
        "vram": "96GB",
        "price": "$3.80/hr",
        "description": "48 vCPU, 186GB RAM",
    },
    "l40s-x1": {
        "name": "1x Nvidia L40S",
        "vram": "48GB",
        "price": "$1.80/hr",
        "description": "8 vCPU, 62GB RAM",
    },
    "l40s-x4": {
        "name": "4x Nvidia L40S",
        "vram": "192GB",
        "price": "$8.30/hr",
        "description": "48 vCPU, 382GB RAM",
    },
    "a10g-small": {
        "name": "Nvidia A10G small",
        "vram": "24GB",
        "price": "$1.00/hr",
        "description": "4 vCPU, 14GB RAM",
    },
    "a10g-large": {
        "name": "Nvidia A10G large",
        "vram": "24GB",
        "price": "$1.50/hr",
        "description": "12 vCPU, 46GB RAM",
    },
    "a10g-largex2": {
        "name": "2x Nvidia A10G large",
        "vram": "48GB",
        "price": "$3.00/hr",
        "description": "24 vCPU, 92GB RAM",
    },
    "a10g-largex4": {
        "name": "4x Nvidia A10G large",
        "vram": "96GB",
        "price": "$5.00/hr",
        "description": "48 vCPU, 184GB RAM",
    },
    "a100-large": {
        "name": "Nvidia A100 large",
        "vram": "80GB",
        "price": "$2.50/hr",
        "description": "12 vCPU, 142GB RAM, best for large models",
    },
}

CONFIG_FILE: Final[Path] = Path(__file__).resolve().parent / ".hardware_config.json"


@dataclass
class HardwareConfig:
    """Persistent hardware configuration."""

    mode: HardwareMode = "local"
    spaces_hardware: SpacesHardware = "zero-gpu"
    spaces_duration: int = 180  # seconds for @spaces.GPU decorator
    local_device: str = "auto"  # auto, cuda, cpu, mps
    keep_model_on_device: bool = True

    def to_dict(self) -> dict:
        return {
            "mode": self.mode,
            "spaces_hardware": self.spaces_hardware,
            "spaces_duration": self.spaces_duration,
            "local_device": self.local_device,
            "keep_model_on_device": self.keep_model_on_device,
        }

    @classmethod
    def from_dict(cls, data: dict) -> "HardwareConfig":
        return cls(
            mode=data.get("mode", "local"),
            spaces_hardware=data.get("spaces_hardware", "zero-gpu"),
            spaces_duration=data.get("spaces_duration", 180),
            local_device=data.get("local_device", "auto"),
            keep_model_on_device=data.get("keep_model_on_device", True),
        )

    def save(self, path: Path = CONFIG_FILE) -> None:
        """Save configuration to JSON file."""
        path.write_text(json.dumps(self.to_dict(), indent=2))

    @classmethod
    def load(cls, path: Path = CONFIG_FILE) -> "HardwareConfig":
        """Load configuration from JSON file, or return defaults."""
        if path.exists():
            try:
                data = json.loads(path.read_text())
                return cls.from_dict(data)
            except Exception:
                pass
        return cls()


def get_hardware_choices() -> list[tuple[str, str]]:
    """Get hardware choices for Gradio dropdown.

    Returns list of (display_name, value) tuples.
    """
    choices = [
        ("🖥️ Local CUDA (auto-detect)", "local"),
    ]

    for hw_id, spec in SPACES_HARDWARE_SPECS.items():
        label = f"☁️ {spec['name']} - {spec['vram']} VRAM ({spec['price']})"
        choices.append((label, f"spaces:{hw_id}"))

    return choices


def parse_hardware_choice(choice: str) -> tuple[HardwareMode, SpacesHardware | None]:
    """Parse hardware choice string into mode and hardware type."""
    if choice == "local":
        return "local", None
    elif choice.startswith("spaces:"):
        hw = choice.replace("spaces:", "")
        return "spaces", hw  # type: ignore
    else:
        return "local", None


def is_running_on_spaces() -> bool:
    """Check if we're running on HuggingFace Spaces."""
    return os.getenv("SPACE_ID") is not None


def get_spaces_module():
    """Import and return the spaces module if available."""
    try:
        import spaces
        return spaces
    except ImportError:
        return None


# Global config instance
_config: HardwareConfig | None = None


def get_config() -> HardwareConfig:
    """Get the global hardware configuration."""
    global _config
    if _config is None:
        _config = HardwareConfig.load()
    return _config


def update_config(
    mode: HardwareMode | None = None,
    spaces_hardware: SpacesHardware | None = None,
    spaces_duration: int | None = None,
    local_device: str | None = None,
    keep_model_on_device: bool | None = None,
    save: bool = True,
) -> HardwareConfig:
    """Update and optionally save the hardware configuration."""
    global _config
    config = get_config()

    if mode is not None:
        config.mode = mode
    if spaces_hardware is not None:
        config.spaces_hardware = spaces_hardware
    if spaces_duration is not None:
        config.spaces_duration = spaces_duration
    if local_device is not None:
        config.local_device = local_device
    if keep_model_on_device is not None:
        config.keep_model_on_device = keep_model_on_device

    if save:
        config.save()

    _config = config
    return config
mcp_server.py ADDED
@@ -0,0 +1,224 @@
"""SHARP MCP Server for programmatic access to 3D Gaussian prediction.

Run standalone:
    uv run python mcp_server.py

Or integrate with MCP clients via stdio transport.
"""

from __future__ import annotations

import json
import os
from pathlib import Path
from typing import Literal

import torch
from mcp.server.fastmcp import FastMCP

from model_utils import (
    DEFAULT_OUTPUTS_DIR,
    ModelWrapper,
    TrajectoryType,
    get_global_model,
)

MCP_PORT: int = int(os.getenv("SHARP_MCP_PORT", "49201"))

mcp = FastMCP(
    "sharp",
    description="SHARP: Single-image 3D Gaussian scene prediction",
)

# -----------------------------------------------------------------------------
# Tools
# -----------------------------------------------------------------------------


@mcp.tool()
def sharp_predict(
    image_path: str,
    render_video: bool = True,
    trajectory_type: TrajectoryType = "rotate_forward",
    num_frames: int = 60,
    fps: int = 30,
    output_long_side: int | None = None,
) -> dict:
    """Predict 3D Gaussians from a single image.

    Args:
        image_path: Absolute path to input image (jpg/png/webp).
        render_video: Whether to render a camera trajectory video (requires CUDA).
        trajectory_type: Camera trajectory type (swipe/shake/rotate/rotate_forward).
        num_frames: Number of frames for video rendering.
        fps: Frames per second for video.
        output_long_side: Output resolution (longest side). None = match input.

    Returns:
        dict with keys:
        - ply_path: Path to exported PLY file
        - video_path: Path to rendered MP4 (or null if not rendered)
        - cuda_available: Whether CUDA was available
    """
    image_path_obj = Path(image_path)
    if not image_path_obj.exists():
        raise FileNotFoundError(f"Image not found: {image_path}")

    model = get_global_model()
    video_path, ply_path = model.predict_and_maybe_render(
        image_path_obj,
        trajectory_type=trajectory_type,
        num_frames=num_frames,
        fps=fps,
        output_long_side=output_long_side,
        render_video=render_video,
    )

    return {
        "ply_path": str(ply_path),
        "video_path": str(video_path) if video_path else None,
        "cuda_available": torch.cuda.is_available(),
    }


@mcp.tool()
def sharp_render(
    ply_path: str,
    trajectory_type: TrajectoryType = "rotate_forward",
    num_frames: int = 60,
    fps: int = 30,
    output_long_side: int | None = None,
) -> dict:
    """Render a video from an existing PLY file.

    Note: This requires re-predicting from the original image since Gaussians
    are not stored in standard PLY format. For now, returns an error.
    Future versions may support loading Gaussians from PLY.

    Args:
        ply_path: Path to PLY file (from previous prediction).
        trajectory_type: Camera trajectory type.
        num_frames: Number of frames.
        fps: Frames per second.
        output_long_side: Output resolution.

    Returns:
        dict with error message (feature not yet implemented).
    """
    return {
        "error": "Rendering from PLY not yet implemented. Use sharp_predict with render_video=True.",
        "hint": "PLY files store only point data, not the full Gaussian parameters needed for rendering.",
    }


@mcp.tool()
def list_outputs() -> dict:
    """List all generated output files (PLY and MP4).

    Returns:
        dict with keys:
        - outputs_dir: Path to outputs directory
        - ply_files: List of PLY file paths
        - video_files: List of MP4 file paths
    """
    outputs_dir = DEFAULT_OUTPUTS_DIR
    ply_files = sorted(outputs_dir.glob("*.ply"))
    video_files = sorted(outputs_dir.glob("*.mp4"))

    return {
        "outputs_dir": str(outputs_dir),
        "ply_files": [str(f) for f in ply_files],
        "video_files": [str(f) for f in video_files],
    }


# -----------------------------------------------------------------------------
# Resources
# -----------------------------------------------------------------------------


@mcp.resource("sharp://info")
def get_info() -> str:
    """Get SHARP server info including GPU status and configuration."""
    cuda_available = torch.cuda.is_available()
    gpu_info = []

    if cuda_available:
        for i in range(torch.cuda.device_count()):
            props = torch.cuda.get_device_properties(i)
            gpu_info.append({
                "index": i,
                "name": props.name,
                "total_memory_gb": round(props.total_memory / (1024**3), 2),
                "compute_capability": f"{props.major}.{props.minor}",
            })

    info = {
        "model": "SHARP (Apple ml-sharp)",
        "description": "Single-image 3D Gaussian scene prediction",
        "cuda_available": cuda_available,
        "cuda_device_count": torch.cuda.device_count() if cuda_available else 0,
        "gpus": gpu_info,
        "outputs_dir": str(DEFAULT_OUTPUTS_DIR),
        "checkpoint_sources": [
            "SHARP_CHECKPOINT_PATH env var",
            "HuggingFace Hub (apple/Sharp)",
            "Upstream CDN (torch.hub)",
        ],
        "env_vars": {
            "SHARP_CHECKPOINT_PATH": os.getenv("SHARP_CHECKPOINT_PATH", "(not set)"),
            "SHARP_KEEP_MODEL_ON_DEVICE": os.getenv("SHARP_KEEP_MODEL_ON_DEVICE", "1"),
            "CUDA_VISIBLE_DEVICES": os.getenv("CUDA_VISIBLE_DEVICES", "(not set)"),
        },
    }

    return json.dumps(info, indent=2)


@mcp.resource("sharp://help")
def get_help() -> str:
    """Get usage help for the SHARP MCP server."""
    help_text = """
# SHARP MCP Server

## Tools

### sharp_predict
Predict 3D Gaussians from a single image.

Parameters:
- image_path (required): Absolute path to input image
- render_video: Whether to render MP4 (default: true, requires CUDA)
- trajectory_type: swipe | shake | rotate | rotate_forward (default: rotate_forward)
- num_frames: Number of video frames (default: 60)
- fps: Video frame rate (default: 30)
- output_long_side: Output resolution, null = match input

### list_outputs
List all generated PLY and MP4 files.

## Resources

### sharp://info
Server info, GPU status, configuration.

### sharp://help
This help text.

## Environment Variables

- SHARP_MCP_PORT: MCP server port (default: 49201)
- SHARP_CHECKPOINT_PATH: Local checkpoint path override
- SHARP_KEEP_MODEL_ON_DEVICE: Keep model on GPU (default: 1)
- CUDA_VISIBLE_DEVICES: GPU selection (e.g., "0" or "0,1")
"""
    return help_text.strip()


# -----------------------------------------------------------------------------
# Main
# -----------------------------------------------------------------------------

if __name__ == "__main__":
    # Run as stdio transport for MCP clients
    mcp.run()
model_utils.py CHANGED
@@ -23,10 +23,13 @@ from typing import Final, Literal
 
 import torch
 
+# Optional Spaces GPU support (for HuggingFace Spaces deployment)
 try:
     import spaces
-except Exception:  # pragma: no cover
+    _SPACES_AVAILABLE = True
+except ImportError:
     spaces = None  # type: ignore[assignment]
+    _SPACES_AVAILABLE = False
 
 try:
     # Prefer HF cache / Hub downloads (works with Spaces `preload_from_hub`).
@@ -175,15 +178,19 @@ class ModelWrapper:
 
         self.device_preference = device_preference
 
-        # For ZeroGPU, it's safer to not keep large tensors on CUDA across calls.
+        # Local CUDA: keep model on device by default for better performance
        if keep_model_on_device is None:
-            keep_env = (
-                os.getenv("SHARP_KEEP_MODEL_ON_DEVICE")
-            )
-            self.keep_model_on_device = keep_env == "1"
+            keep_env = os.getenv("SHARP_KEEP_MODEL_ON_DEVICE", "1")
+            self.keep_model_on_device = keep_env != "0"
        else:
            self.keep_model_on_device = keep_model_on_device
 
+        # Support CUDA device selection via env var
+        cuda_device = os.getenv("CUDA_VISIBLE_DEVICES")
+        if cuda_device and device_preference == "auto":
+            # Let PyTorch handle device mapping via CUDA_VISIBLE_DEVICES
+            pass
+
        self._lock = threading.RLock()
        self._predictor: torch.nn.Module | None = None
        self._predictor_device: torch.device | None = None
@@ -560,16 +567,8 @@
 
 
 # -----------------------------------------------------------------------------
-# ZeroGPU entrypoints
+# Module-level entrypoints
 # -----------------------------------------------------------------------------
-#
-# IMPORTANT: Do NOT decorate bound instance methods with `@spaces.GPU` on ZeroGPU.
-# The wrapper uses multiprocessing queues and pickles args/kwargs. If `self` is
-# included, Python will try to pickle the whole instance. ModelWrapper contains
-# a threading.RLock (not pickleable) and the model itself should not be pickled.
-#
-# Expose module-level functions that accept only pickleable arguments and
-# create/cache the ModelWrapper inside the GPU worker process.
 
 DEFAULT_OUTPUTS_DIR: Final[Path] = _ensure_dir(Path(__file__).resolve().parent / "outputs")
 
@@ -605,8 +604,60 @@ def predict_and_maybe_render(
     )
 
 
-# Export the GPU-wrapped callable (or a no-op wrapper locally).
-if spaces is not None:
-    predict_and_maybe_render_gpu = spaces.GPU(duration=180)(predict_and_maybe_render)
-else:  # pragma: no cover
-    predict_and_maybe_render_gpu = predict_and_maybe_render
+# -----------------------------------------------------------------------------
+# GPU-wrapped entrypoint (Spaces or local)
+# -----------------------------------------------------------------------------
+
+
+def _create_spaces_gpu_wrapper(duration: int = 180):
+    """Create a Spaces GPU-wrapped version of predict_and_maybe_render.
+
+    This is called dynamically based on hardware configuration.
+    """
+    if spaces is not None and _SPACES_AVAILABLE:
+        return spaces.GPU(duration=duration)(predict_and_maybe_render)
+    return predict_and_maybe_render
+
+
+# Default export: use local CUDA unless explicitly configured for Spaces.
+# The actual wrapper is created dynamically based on hardware_config.
+predict_and_maybe_render_gpu = predict_and_maybe_render
+
+
+def configure_gpu_mode(use_spaces: bool = False, duration: int = 180) -> None:
+    """Configure the GPU mode at runtime.
+
+    Args:
+        use_spaces: If True and spaces module available, use @spaces.GPU decorator
+        duration: Duration for @spaces.GPU decorator (seconds)
+    """
+    global predict_and_maybe_render_gpu
+
+    if use_spaces and _SPACES_AVAILABLE and spaces is not None:
+        predict_and_maybe_render_gpu = spaces.GPU(duration=duration)(predict_and_maybe_render)
+    else:
+        predict_and_maybe_render_gpu = predict_and_maybe_render
+
+
+def get_gpu_status() -> dict:
+    """Get current GPU status information."""
+    import torch
+
+    status = {
+        "cuda_available": torch.cuda.is_available(),
+        "spaces_available": _SPACES_AVAILABLE,
+        "device_count": torch.cuda.device_count() if torch.cuda.is_available() else 0,
+        "devices": [],
+    }
+
+    if torch.cuda.is_available():
+        for i in range(torch.cuda.device_count()):
+            props = torch.cuda.get_device_properties(i)
+            status["devices"].append({
+                "index": i,
+                "name": props.name,
+                "total_memory_gb": round(props.total_memory / (1024**3), 2),
+                "compute_capability": f"{props.major}.{props.minor}",
+            })
+
+    return status
pyproject.toml CHANGED
@@ -7,8 +7,9 @@ requires-python = ">=3.13"
 dependencies = [
     "gradio==6.1.0",
     "huggingface-hub>=1.2.3",
+    "mcp>=1.0.0",
     "sharp",
-    "spaces==0.44.0",
+    "spaces>=0.30.0",
     "torch>=2.9.1",
     "torchvision>=0.24.1",
 ]
requirements.txt CHANGED
@@ -1,6 +1,7 @@
 gradio==6.2.0
-spaces==0.44.0
 huggingface_hub>=1.2.3
+spaces>=0.30.0
 torch
 torchvision
 sharp @ git+https://github.com/apple/ml-sharp.git@cdb4ddc6796402bee5487c7312260f2edd8bd5f0
+mcp>=1.0.0