MRiabov committed on
Commit 8e73ec9 · Parent: ebeb96c

Training loop and around the codebase

.gitignore CHANGED
@@ -114,3 +114,9 @@ venv/
ENV/
env.bak/
venv.bak/
+
+ # Secrets
+ secrets/
+
+ # dataset
+ dataset/
README.md CHANGED
@@ -28,4 +28,25 @@ python src/wireseghr/infer.py --config configs/default.yaml --image /path/to/ima

## Notes
- This is a segmentation-only codebase. Inpainting is out of scope here.
- - Defaults locked: MiT-B3 encoder, patch size 768, MinMax 6×6, global+binary mask conditioning with patch-cropped global map.
+ - Defaults locked: SegFormer MiT-B3 encoder, patch size 768, MinMax 6×6, global+binary mask conditioning with patch-cropped global map.
+
+ ### Backbone Source
+ - Preferred: HuggingFace Transformers SegFormer (e.g., `nvidia/mit-b3`). We set `num_channels` to match the input channel count.
+ - Optional: `timm` `features_only` if a compatible SegFormer is available.
+ - Fallback: a small internal CNN that preserves the 1/4, 1/8, 1/16, 1/32 strides with channels [64, 128, 320, 512].
+
+ Install the requirements to get Transformers:
+ ```
+ pip install -r requirements.txt
+ ```
+
+ ## Dataset Convention
+ - Flat directories with numeric filenames; images are `.jpg`/`.jpeg`, masks are `.png`.
+ - Example (after an 85/5/10 split):
+   - `dataset/train/images/1.jpg, 2.jpg, ...` and `dataset/train/gts/1.png, 2.png, ...`
+   - `dataset/val/images/...` and `dataset/val/gts/...`
+   - `dataset/test/images/...` and `dataset/test/gts/...`
+ - Masks are binary: foreground = white (255), background = black (0).
+ - The loader strictly enforces numeric stems and 1:1 pairing and will assert on mismatches.
+
+ Update `configs/default.yaml` with your paths under `data.train_images`, `data.train_masks`, etc. Defaults point to `dataset/train/images`, `dataset/train/gts`, and validation to `dataset/val/...`.
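The `num_channels` note above is the one non-obvious step; it is implemented in `src/wireseghr/model/encoder.py` in this commit. A minimal sketch of the pattern (7 = RGB + MinMax + cond + loc per the plan):

```python
# Sketch of the HF SegFormer load with a widened input patch embedding,
# mirroring _HFEncoderWrapper in this commit. ignore_mismatched_sizes=True
# lets the 7-channel patch embedding be re-initialized while the remaining
# pretrained weights load normally.
from transformers import SegformerConfig, SegformerModel

cfg = SegformerConfig.from_pretrained("nvidia/mit-b3")
cfg.num_channels = 7  # RGB(3) + MinMax(2) + cond(1) + loc(1)
encoder = SegformerModel.from_pretrained(
    "nvidia/mit-b3", config=cfg, ignore_mismatched_sizes=True
)
```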
SEGMENTATION_PLAN.md CHANGED
@@ -9,7 +9,7 @@ This plan distills the model and pipeline described in the paper sources:
Focus: segmentation only (no dataset collection or inpainting).

## Decisions and Defaults (locked)
- - Backbone: SegFormer MiT-B3 (shared encoder `E`).
+ - Backbone: SegFormer MiT-B3 via HuggingFace Transformers (shared encoder `E`), with `timm` or tiny CNN fallback.
- Fine/local patch size p: 768.
- Conditioning: global map + binary location mask by default (Table `tables/logit.tex`).
- Conditioning map scope: patch-cropped from the global map per `paper-tex/sections/method_yq.tex` (no full-image concatenation variant).
@@ -41,7 +41,7 @@ Focus: segmentation only (no dataset collection or inpainting).
- `README.md` (segmentation-only usage)

## Model Specification
- - Shared encoder `E`: SegFormer MiT-B3.
+ - Shared encoder `E`: SegFormer MiT-B3 (HF Transformers preferred).
- Input channels (default): 3 (RGB) + 2 (MinMax) + 1 (global cond) + 1 (binary location) = 7.
- For the coarse pass, the cond and location channels are zeros to keep channel count consistent (`method_yq.tex`).
- Weight init for extra channels: copy mean of RGB conv weights or zero-init.
@@ -65,6 +65,14 @@ Focus: segmentation only (no dataset collection or inpainting).
- Downsample full-res mask to coarse size with max-pooling to prevent wire vanishing (`method_yq.tex`).
- Normalization: standard mean/std per backbone; apply consistently across channels (new channels can be mean=0, std=1 by convention, or min-max scaled).

+ ### Dataset Convention (project-specific)
+ - Flat directories with numeric filenames; images are `.jpg`/`.jpeg`, masks are `.png`.
+ - Example:
+   - `dataset/images/1.jpg, 2.jpg, ..., N.jpg` (or `.jpeg`)
+   - `dataset/gts/1.png, 2.png, ..., N.png`
+ - Masks are binary: foreground = white (255), background = black (0).
+ - The loader (`data/dataset.py`) strictly enforces numeric stems and 1:1 pairing and will assert on mismatch.
+
## Training Pipeline
- Augment the full-res image (scaling, rotation, horizontal flip, photometric distortion) before constructing coarse/fine inputs (`method.tex`).
- Coarse input: downsample augmented full image to 512×512; build channels [RGB+MinMax+zeros(2)] → `E` → `D_C`.
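The max-pool label downsampling the plan calls for is imported by `train.py` as `downsample_label_maxpool`, whose body is outside this diff. A hypothetical sketch of what the rule implies, assuming the `(mask, out_h, out_w)` signature the training loop uses: any wire pixel inside an output cell marks the whole cell as wire.

```python
# Hypothetical sketch of downsample_label_maxpool (not the committed body).
# Max-pooling keeps thin wires visible at coarse resolution, where area or
# bilinear resizing would average them away.
import numpy as np
import torch
import torch.nn.functional as F

def downsample_label_maxpool(mask: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    t = torch.from_numpy((mask > 0).astype(np.float32))[None, None]  # 1x1xHxW
    out = F.adaptive_max_pool2d(t, output_size=(out_h, out_w))  # max per output cell
    return out[0, 0].numpy().astype(np.uint8)
```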
configs/default.yaml CHANGED
@@ -1,5 +1,6 @@
# Default configuration for WireSegHR (segmentation-only)
backbone: mit_b3
+ pretrained: true  # Uses HF SegFormer weights if available; else timm or tiny fallback

coarse:
  train_size: 512
@@ -34,9 +35,18 @@ optim:
  schedule: poly
  power: 1.0

+ # training housekeeping
+ seed: 42
+ out_dir: runs/wireseghr
+ eval_interval: 500
+ ckpt_interval: 1000
+ # resume: runs/wireseghr/ckpt_1000.pt  # optional
+
# dataset paths (placeholders)
data:
-   train_images: /path/to/train/images
-   train_masks: /path/to/train/masks
-   val_images: /path/to/val/images
-   val_masks: /path/to/val/masks
+   train_images: dataset/train/images
+   train_masks: dataset/train/gts
+   val_images: dataset/val/images
+   val_masks: dataset/val/gts
+   test_images: dataset/test/images
+   test_masks: dataset/test/gts
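For reference, `schedule: poly` with `power: 1.0` is consumed in `train.py` as a linear decay. A tiny sketch using the values the training loop's comments assume (lr 6e-5, 40000 iters):

```python
# Poly LR schedule as implemented in this commit's train.py.
base_lr, iters, power = 6e-5, 40000, 1.0

def poly_lr(step: int) -> float:
    return base_lr * ((1.0 - float(step) / float(iters)) ** power)

# With power=1.0 the decay is linear: halfway through, lr is half of base_lr.
assert abs(poly_lr(20000) - 3e-5) < 1e-12
```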
requirements.txt CHANGED
@@ -1,6 +1,7 @@
torch>=2.1.0
torchvision>=0.16.0
timm>=0.9.8
+ transformers>=4.37.0
numpy>=1.24.0
opencv-python>=4.8.0.76
Pillow>=9.5.0
src/wireseghr/data/dataset.py CHANGED
@@ -36,19 +36,25 @@ class WireSegDataset:
        return {"image": img, "mask": mask_bin, "image_path": str(img_path), "mask_path": str(mask_path)}

    def _index_pairs(self) -> List[tuple[Path, Path]]:
-         exts_img = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}
-         exts_mask = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}
-         imgs: Dict[str, Path] = {}
-         for p in sorted(self.images_dir.rglob("*")):
-             if p.is_file() and p.suffix.lower() in exts_img:
-                 imgs[p.stem] = p
-         masks: Dict[str, Path] = {}
-         for p in sorted(self.masks_dir.rglob("*")):
-             if p.is_file() and p.suffix.lower() in exts_mask:
-                 masks[p.stem] = p
+         # Convention: numeric filenames; images are .jpg/.jpeg; masks (gts) are .png
+         img_files = sorted([p for p in self.images_dir.glob("*.jpg") if p.is_file()])
+         img_files += sorted([p for p in self.images_dir.glob("*.jpeg") if p.is_file()])
+         assert len(img_files) > 0, f"No .jpg/.jpeg images in {self.images_dir}"
        pairs: List[tuple[Path, Path]] = []
-         for stem, ip in imgs.items():
-             if stem in masks:
-                 pairs.append((ip, masks[stem]))
-         assert len(pairs) > 0, f"No image-mask pairs found in {self.images_dir} and {self.masks_dir}"
+         ids: List[int] = []
+         for p in img_files:
+             stem = p.stem
+             assert stem.isdigit(), f"Non-numeric filename encountered: {p.name}"
+             ids.append(int(stem))
+         ids = sorted(set(ids))  # de-duplicate in case both i.jpg and i.jpeg exist
+         for i in ids:
+             # Prefer .jpg, else .jpeg
+             ip_jpg = self.images_dir / f"{i}.jpg"
+             ip_jpeg = self.images_dir / f"{i}.jpeg"
+             ip = ip_jpg if ip_jpg.exists() else ip_jpeg
+             assert ip.exists(), f"Missing image for {i}: {ip_jpg} or {ip_jpeg}"
+             mp = self.masks_dir / f"{i}.png"
+             assert mp.exists(), f"Missing mask for {i}: {mp}"
+             pairs.append((ip, mp))
+         assert len(pairs) > 0, f"No numeric pairs found in {self.images_dir} and {self.masks_dir}"
        return pairs
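The loader's contract is easiest to see in use. A minimal sketch, assuming the constructor signature that `train.py` in this commit uses (`WireSegDataset(images_dir, masks_dir, split=...)`) and the `dataset/train/...` defaults from the config:

```python
# Sketch of the loader contract. With the flat numeric layout in place,
# indexing yields aligned image/mask pairs; a layout violation (non-numeric
# stem, missing 5.png, ...) trips the asserts in _index_pairs above.
from wireseghr.data.dataset import WireSegDataset

dset = WireSegDataset("dataset/train/images", "dataset/train/gts", split="train")
item = dset[0]
print(item["image_path"], item["mask_path"])  # e.g. .../1.jpg .../1.png
assert item["mask"].max() <= 1  # masks come back binarized (mask_bin key)
```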
src/wireseghr/data/sampler.py CHANGED
@@ -1,14 +1,29 @@
# Balanced patch sampler (>=1% wire pixels)
- # TODO: Implement logic over mask to pick patches with wire ratio >= threshold.
+ """Balanced patch sampling with >= min_wire_ratio positives.
+
+ Sampling is uniform over valid top-left positions; tries a fixed number of
+ attempts and asserts if none meet the threshold.
+ """

from dataclasses import dataclass
+ import numpy as np


@dataclass
class BalancedPatchSampler:
    patch_size: int = 768
    min_wire_ratio: float = 0.01
+     max_tries: int = 200

-     def sample(self, image, mask):
-         # TODO: sample and return top-left (y, x) of a valid patch
-         return 0, 0
+     def sample(self, image: np.ndarray, mask: np.ndarray) -> tuple[int, int]:
+         h, w = mask.shape
+         p = self.patch_size
+         assert h >= p and w >= p, "Image smaller than patch size"
+         for _ in range(self.max_tries):
+             y = np.random.randint(0, h - p + 1)
+             x = np.random.randint(0, w - p + 1)
+             m = mask[y : y + p, x : x + p]
+             ratio = float(m.sum()) / float(p * p)
+             if ratio >= self.min_wire_ratio:
+                 return int(y), int(x)
+         raise AssertionError("Failed to sample a patch meeting min_wire_ratio")
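A quick way to sanity-check the sampler is a synthetic mask with a wire-dense band, so many windows are guaranteed to clear the 1% threshold. A minimal sketch (`sample()` only inspects the mask, so the image is a dummy):

```python
# Synthetic smoke test for BalancedPatchSampler: a 200-row band of "wire"
# pixels means most 768x768 windows exceed min_wire_ratio.
import numpy as np
from wireseghr.data.sampler import BalancedPatchSampler

mask = np.zeros((2000, 2000), dtype=np.uint8)
mask[900:1100, :] = 1  # thick horizontal wire band
img = np.zeros((2000, 2000, 3), dtype=np.float32)  # unused by sample()

sampler = BalancedPatchSampler(patch_size=768, min_wire_ratio=0.01)
y, x = sampler.sample(img, mask)
assert mask[y : y + 768, x : x + 768].mean() >= 0.01
```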
src/wireseghr/metrics.py CHANGED
@@ -1,9 +1,32 @@
- # Metrics placeholder: IoU, F1, Precision, Recall
- # TODO: Implement proper metrics matching paper tables.
-
from typing import Dict
+ import numpy as np


- def compute_metrics(pred_mask, gt_mask) -> Dict[str, float]:
-     # TODO: implement
-     return {"iou": 0.0, "f1": 0.0, "precision": 0.0, "recall": 0.0}
+ def compute_metrics(pred_mask: np.ndarray, gt_mask: np.ndarray) -> Dict[str, float]:
+     """Compute binary segmentation metrics on 0/1 numpy masks.
+
+     Args:
+         pred_mask: HxW uint8 or bool in {0,1}
+         gt_mask: HxW uint8 or bool in {0,1}
+     Returns:
+         dict with iou, f1, precision, recall
+     """
+     p = (pred_mask > 0).astype(np.uint8)
+     g = (gt_mask > 0).astype(np.uint8)
+
+     tp = int(np.sum((p == 1) & (g == 1)))
+     fp = int(np.sum((p == 1) & (g == 0)))
+     fn = int(np.sum((p == 0) & (g == 1)))
+
+     denom_iou = tp + fp + fn
+     iou = (tp / denom_iou) if denom_iou > 0 else 0.0
+
+     prec_den = tp + fp
+     rec_den = tp + fn
+     precision = (tp / prec_den) if prec_den > 0 else 0.0
+     recall = (tp / rec_den) if rec_den > 0 else 0.0
+
+     denom_f1 = precision + recall
+     f1 = (2 * precision * recall / denom_f1) if denom_f1 > 0 else 0.0
+
+     return {"iou": float(iou), "f1": float(f1), "precision": float(precision), "recall": float(recall)}
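A small worked example pins down the arithmetic: with one true positive, one false positive, and one false negative, IoU is 1/3 and precision, recall, and F1 are all 0.5.

```python
# Worked example for compute_metrics on a 2x2 toy case: tp=1, fp=1, fn=1.
import numpy as np
from wireseghr.metrics import compute_metrics

pred = np.array([[1, 1], [0, 0]], dtype=np.uint8)
gt = np.array([[1, 0], [1, 0]], dtype=np.uint8)
m = compute_metrics(pred, gt)
assert abs(m["iou"] - 1 / 3) < 1e-9
assert abs(m["precision"] - 0.5) < 1e-9 and abs(m["recall"] - 0.5) < 1e-9
assert abs(m["f1"] - 0.5) < 1e-9
```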
src/wireseghr/model/encoder.py CHANGED
@@ -25,18 +25,126 @@ class SegFormerEncoder(nn.Module):
        self.pretrained = pretrained
        self.out_indices = out_indices

-         # Create MiT with features_only to obtain multi-scale feature maps.
-         # in_chans allows expanded inputs (RGB + minmax + cond + loc)
-         self.encoder = timm.create_model(
-             backbone,
-             pretrained=pretrained,
-             features_only=True,
-             out_indices=out_indices,
-             in_chans=in_channels,
-         )
-
-     def forward(self, x: torch.Tensor) -> List[torch.Tensor]:
-         feats = self.encoder(x)
-         # Ensure list of tensors is returned
-         assert isinstance(feats, (list, tuple)) and len(feats) == len(self.out_indices)
-         return list(feats)
+         # Prefer HuggingFace SegFormer for 'mit_*' backbones.
+         # Otherwise try timm features_only. Always have the tiny CNN fallback.
+         self.encoder = None
+         self.hf = None
+         prefer_hf = backbone.startswith("mit_") or backbone.startswith("segformer")
+         if prefer_hf:
+             # HF -> timm -> tiny
+             try:
+                 self.hf = _HFEncoderWrapper(in_channels, backbone, pretrained)
+                 self.feature_dims = self.hf.feature_dims
+             except Exception:
+                 try:
+                     self.encoder = timm.create_model(
+                         backbone,
+                         pretrained=pretrained,
+                         features_only=True,
+                         out_indices=out_indices,
+                         in_chans=in_channels,
+                     )
+                     self.feature_dims = list(self.encoder.feature_info.channels())
+                 except Exception:
+                     self.encoder = None
+                     self.fallback = _TinyEncoder(in_channels)
+                     self.feature_dims = [64, 128, 320, 512]
+         else:
+             # timm -> HF -> tiny
+             try:
+                 self.encoder = timm.create_model(
+                     backbone,
+                     pretrained=pretrained,
+                     features_only=True,
+                     out_indices=out_indices,
+                     in_chans=in_channels,
+                 )
+                 self.feature_dims = list(self.encoder.feature_info.channels())
+             except Exception:
+                 try:
+                     self.hf = _HFEncoderWrapper(in_channels, backbone, pretrained)
+                     self.feature_dims = self.hf.feature_dims
+                 except Exception:
+                     self.encoder = None
+                     self.fallback = _TinyEncoder(in_channels)
+                     self.feature_dims = [64, 128, 320, 512]
+
+     def forward(self, x: torch.Tensor) -> List[torch.Tensor]:
+         if self.encoder is not None:
+             feats = self.encoder(x)
+             assert isinstance(feats, (list, tuple)) and len(feats) == len(self.out_indices)
+             return list(feats)
+         elif self.hf is not None:
+             return self.hf(x)
+         else:
+             return self.fallback(x)
+
+
+ class _TinyEncoder(nn.Module):
+     def __init__(self, in_chans: int):
+         super().__init__()
+         # Output strides: 4, 8, 16, 32 with channels 64, 128, 320, 512
+         self.stem = nn.Sequential(
+             nn.Conv2d(in_chans, 64, kernel_size=7, stride=4, padding=3, bias=False),
+             nn.BatchNorm2d(64),
+             nn.ReLU(inplace=True),
+         )
+         self.stage1 = nn.Sequential(
+             nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1, bias=False),
+             nn.BatchNorm2d(128),
+             nn.ReLU(inplace=True),
+         )
+         self.stage2 = nn.Sequential(
+             nn.Conv2d(128, 320, kernel_size=3, stride=2, padding=1, bias=False),
+             nn.BatchNorm2d(320),
+             nn.ReLU(inplace=True),
+         )
+         self.stage3 = nn.Sequential(
+             nn.Conv2d(320, 512, kernel_size=3, stride=2, padding=1, bias=False),
+             nn.BatchNorm2d(512),
+             nn.ReLU(inplace=True),
+         )
+
+     def forward(self, x: torch.Tensor) -> List[torch.Tensor]:
+         c0 = self.stem(x)     # 1/4
+         c1 = self.stage1(c0)  # 1/8
+         c2 = self.stage2(c1)  # 1/16
+         c3 = self.stage3(c2)  # 1/32
+         return [c0, c1, c2, c3]
+
+
+ class _HFEncoderWrapper(nn.Module):
+     def __init__(self, in_chans: int, backbone: str, pretrained: bool):
+         super().__init__()
+         # Lazy import to avoid a hard dependency during tests if not used
+         from transformers import SegformerModel, SegformerConfig
+
+         name_map = {
+             "mit_b0": "nvidia/mit-b0",
+             "mit_b1": "nvidia/mit-b1",
+             "mit_b2": "nvidia/mit-b2",
+             "mit_b3": "nvidia/mit-b3",
+             "mit_b4": "nvidia/mit-b4",
+             "mit_b5": "nvidia/mit-b5",
+         }
+         model_id = name_map.get(backbone, "nvidia/mit-b0")
+
+         if pretrained:
+             base_cfg = SegformerConfig.from_pretrained(model_id)
+             base_cfg.num_channels = in_chans
+             self.model = SegformerModel.from_pretrained(
+                 model_id, config=base_cfg, ignore_mismatched_sizes=True
+             )
+         else:
+             cfg = SegformerConfig()  # default config (B0-like)
+             cfg.num_channels = in_chans
+             self.model = SegformerModel(cfg)
+
+         # Expose channel dims per stage
+         self.feature_dims = list(self.model.config.hidden_sizes)
+
+     def forward(self, x: torch.Tensor) -> List[torch.Tensor]:
+         outputs = self.model(pixel_values=x, output_hidden_states=True, return_dict=True)
+         feats = list(outputs.hidden_states)
+         assert len(feats) == 4
+         return feats
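Whichever backend wins the fallback chain, the encoder's contract is four feature maps whose channel counts match `feature_dims`. A smoke-test sketch, assuming the constructor keywords `model.py` uses and `pretrained=False` so nothing is downloaded (random weights):

```python
# Encoder contract check: four multi-scale feature maps at strides 4/8/16/32,
# with per-stage channel counts exposed via encoder.feature_dims.
import torch
from wireseghr.model.encoder import SegFormerEncoder

enc = SegFormerEncoder(backbone="mit_b3", in_channels=7, pretrained=False)
feats = enc(torch.randn(1, 7, 512, 512))
assert len(feats) == 4
for f, c in zip(feats, enc.feature_dims):
    assert f.shape[1] == c
print([tuple(f.shape) for f in feats])
```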
src/wireseghr/model/model.py CHANGED
@@ -22,8 +22,8 @@ class WireSegHR(nn.Module):
    def __init__(self, backbone: str = "mit_b3", in_channels: int = 7, pretrained: bool = True):
        super().__init__()
        self.encoder = SegFormerEncoder(backbone=backbone, in_channels=in_channels, pretrained=pretrained)
-         # Default MiT-B3 channel dims for stages
-         in_chs = (64, 128, 320, 512)
+         # Use encoder-exposed feature dims for decoder projections
+         in_chs = tuple(self.encoder.feature_dims)
        self.coarse_head = CoarseDecoder(in_chs=in_chs, embed_dim=128, num_classes=2)
        self.fine_head = FineDecoder(in_chs=in_chs, embed_dim=128, num_classes=2)
        self.cond1x1 = Conditioning1x1()
src/wireseghr/train.py CHANGED
@@ -2,6 +2,24 @@ import argparse
import os
import pprint
import yaml
+ from typing import Tuple, List, Optional, Dict
+
+ import numpy as np
+ import torch
+ import torch.nn as nn
+ import torch.nn.functional as F
+ from torch.cuda.amp import autocast, GradScaler
+ from tqdm import tqdm
+ import random
+ import torch.backends.cudnn as cudnn
+ import cv2
+
+ from wireseghr.model import WireSegHR
+ from wireseghr.model.minmax import MinMaxLuminance
+ from wireseghr.model.label_downsample import downsample_label_maxpool
+ from wireseghr.data.dataset import WireSegDataset
+ from wireseghr.data.sampler import BalancedPatchSampler
+ from wireseghr.metrics import compute_metrics


def main():
@@ -18,8 +36,375 @@ def main():

    print("[WireSegHR][train] Loaded config from:", cfg_path)
    pprint.pprint(cfg)
-     print("[WireSegHR][train] Skeleton OK. Implement training per SEGMENTATION_PLAN.md.")
+
+     device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+     print(f"[WireSegHR][train] Device: {device}")
+
+     # Config
+     coarse_train = int(cfg["coarse"]["train_size"])  # 512
+     patch_size = int(cfg["fine"]["patch_size"])  # 768
+     iters = int(cfg["optim"]["iters"])  # 40000
+     batch_size = int(cfg["optim"]["batch_size"])  # 8
+     base_lr = float(cfg["optim"]["lr"])  # 6e-5
+     weight_decay = float(cfg["optim"]["weight_decay"])  # 0.01
+     power = float(cfg["optim"]["power"])  # 1.0
+     amp_flag = bool(cfg["optim"].get("amp", True))
+
+     # Housekeeping
+     seed = int(cfg.get("seed", 42))
+     out_dir = cfg.get("out_dir", "runs/wireseghr")
+     eval_interval = int(cfg.get("eval_interval", 500))
+     ckpt_interval = int(cfg.get("ckpt_interval", 1000))
+     os.makedirs(out_dir, exist_ok=True)
+     set_seed(seed)
+
+     # Dataset
+     train_images = cfg["data"]["train_images"]
+     train_masks = cfg["data"]["train_masks"]
+     dset = WireSegDataset(train_images, train_masks, split="train")
+     # Validation and test
+     val_images = cfg["data"].get("val_images", None)
+     val_masks = cfg["data"].get("val_masks", None)
+     test_images = cfg["data"].get("test_images", None)
+     test_masks = cfg["data"].get("test_masks", None)
+     dset_val = WireSegDataset(val_images, val_masks, split="val") if val_images and val_masks else None
+     dset_test = WireSegDataset(test_images, test_masks, split="test") if test_images and test_masks else None
+     sampler = BalancedPatchSampler(patch_size=patch_size, min_wire_ratio=0.01)
+     minmax = MinMaxLuminance(kernel=cfg["minmax"]["kernel"]) if cfg["minmax"]["enable"] else None
+
+     # Model
+     # Channel definition: RGB(3) + MinMax(2) + cond(1) + loc(1) = 7
+     pretrained_flag = bool(cfg.get("pretrained", False))
+     model = WireSegHR(backbone=cfg["backbone"], in_channels=7, pretrained=pretrained_flag)
+     model = model.to(device)
+
+     # Optimizer and loss
+     optim = torch.optim.AdamW(model.parameters(), lr=base_lr, weight_decay=weight_decay)
+     scaler = GradScaler(enabled=(device.type == "cuda" and amp_flag))
+     ce = nn.CrossEntropyLoss()
+
+     # Resume
+     start_step = 0
+     best_f1 = -1.0
+     resume_path = cfg.get("resume", None)
+     if resume_path and os.path.isfile(resume_path):
+         print(f"[WireSegHR][train] Resuming from {resume_path}")
+         start_step, best_f1 = _load_checkpoint(resume_path, model, optim, scaler, device)
+
+     # Training loop
+     model.train()
+     step = start_step
+     pbar = tqdm(total=iters - step, initial=0, desc="Train", ncols=100)
+     while step < iters:
+         optim.zero_grad(set_to_none=True)
+         imgs, masks = _sample_batch_same_size(dset, batch_size)
+         batch = _prepare_batch(imgs, masks, coarse_train, patch_size, sampler, minmax, device)
+
+         logits_coarse, cond_map = model.forward_coarse(batch["x_coarse"])  # (B,2,Hc/4,Wc/4) and (B,1,Hc/4,Wc/4)
+
+         # Upsample cond to full-res to crop the patch-aligned conditioning for the fine pass
+         _, _, hc4, wc4 = cond_map.shape
+         cond_up = F.interpolate(
+             cond_map.detach(), size=(batch["full_h"], batch["full_w"]), mode="bilinear", align_corners=False
+         )
+
+         # Build fine inputs: crop cond to the patch, concat with patch RGB+MinMax and loc mask
+         x_fine = _build_fine_inputs(batch, cond_up, device)
+         logits_fine = model.forward_fine(x_fine)
+
+         # Targets
+         y_coarse = _build_coarse_targets(batch["mask_full"], hc4, wc4, device)
+         y_fine = _build_fine_targets(batch["mask_patches"], logits_fine.shape[2], logits_fine.shape[3], device)
+
+         with autocast(enabled=(device.type == "cuda" and amp_flag)):
+             loss_coarse = ce(logits_coarse, y_coarse)
+             loss_fine = ce(logits_fine, y_fine)
+             loss = loss_coarse + loss_fine
+
+         scaler.scale(loss).backward()
+         scaler.step(optim)
+         scaler.update()
+
+         # Poly LR schedule (per optimizer step)
+         lr = base_lr * ((1.0 - float(step) / float(iters)) ** power)
+         for pg in optim.param_groups:
+             pg["lr"] = lr
+
+         if step % 50 == 0:
+             print(f"[Iter {step}/{iters}] lr={lr:.6e}")
+
+         # Eval & checkpoint
+         if (step % eval_interval == 0) and (dset_val is not None):
+             model.eval()
+             val_stats = validate(model, dset_val, coarse_train, device, amp_flag)
+             print(
+                 f"[Val @ {step}] IoU={val_stats['iou']:.4f} F1={val_stats['f1']:.4f} "
+                 f"P={val_stats['precision']:.4f} R={val_stats['recall']:.4f}"
+             )
+             # Save best
+             if val_stats["f1"] > best_f1:
+                 best_f1 = val_stats["f1"]
+                 _save_checkpoint(os.path.join(out_dir, "best.pt"), step, model, optim, scaler, best_f1)
+             # Save periodic ckpt
+             if ckpt_interval > 0 and (step % ckpt_interval == 0):
+                 _save_checkpoint(os.path.join(out_dir, f"ckpt_{step}.pt"), step, model, optim, scaler, best_f1)
+                 # Save test visualizations
+                 if dset_test is not None:
+                     save_test_visuals(
+                         model, dset_test, coarse_train, device,
+                         os.path.join(out_dir, f"test_vis_{step}"), amp_flag, max_samples=8,
+                     )
+             model.train()
+
+         step += 1
+         pbar.update(1)
+
+     pbar.close()
+     print("[WireSegHR][train] Done.")
+
+
+ def set_seed(seed: int):
+     random.seed(seed)
+     np.random.seed(seed)
+     torch.manual_seed(seed)
+     if torch.cuda.is_available():
+         torch.cuda.manual_seed_all(seed)
+     cudnn.benchmark = False
+     cudnn.deterministic = True
+
+
+ def _sample_batch_same_size(dset: WireSegDataset, batch_size: int) -> Tuple[List[np.ndarray], List[np.ndarray]]:
+     # Select a seed sample, then fill the batch with samples of the same (H, W)
+     assert len(dset) > 0
+     seed_idx = int(np.random.randint(0, len(dset)))
+     seed_item = dset[seed_idx]
+     H, W = seed_item["image"].shape[:2]
+     imgs: List[np.ndarray] = [seed_item["image"]]
+     masks: List[np.ndarray] = [seed_item["mask"]]
+     tries = 0
+     while len(imgs) < batch_size and tries < 5000:
+         idx = int(np.random.randint(0, len(dset)))
+         item = dset[idx]
+         im = item["image"]
+         if im.shape[0] == H and im.shape[1] == W:
+             imgs.append(im)
+             masks.append(item["mask"])
+         tries += 1
+     assert len(imgs) == batch_size, "Failed to assemble same-size batch"
+     return imgs, masks
+
+
+ def _prepare_batch(
+     imgs: List[np.ndarray],
+     masks: List[np.ndarray],
+     coarse_train: int,
+     patch_size: int,
+     sampler: BalancedPatchSampler,
+     minmax: Optional[MinMaxLuminance],
+     device: torch.device,
+ ):
+     B = len(imgs)
+     assert B == len(masks)
+     # Keep numpy versions for geometry and torch versions for model inputs
+     full_h = imgs[0].shape[0]
+     full_w = imgs[0].shape[1]
+     for im, m in zip(imgs, masks):
+         assert im.shape[0] == full_h and im.shape[1] == full_w
+         assert m.shape[0] == full_h and m.shape[1] == full_w
+
+     xs_coarse = []
+     patches_rgb = []
+     patches_mask = []
+     patches_min = []
+     patches_max = []
+     loc_masks = []
+     yx_list: List[tuple[int, int]] = []
+
+     for img, mask in zip(imgs, masks):
+         # Float32 [0, 1]
+         imgf = img.astype(np.float32) / 255.0
+         if minmax is not None:
+             y_min, y_max = minmax(imgf)
+         else:
+             y = (0.299 * imgf[..., 0] + 0.587 * imgf[..., 1] + 0.114 * imgf[..., 2]).astype(np.float32)
+             y_min, y_max = y, y
+
+         # Coarse input: resize RGB + MinMax to coarse_train, pad cond+loc zeros to reach 7 channels
+         rgb_coarse = cv2.resize(imgf, (coarse_train, coarse_train), interpolation=cv2.INTER_LINEAR)
+         y_min_c = cv2.resize(y_min, (coarse_train, coarse_train), interpolation=cv2.INTER_LINEAR)
+         y_max_c = cv2.resize(y_max, (coarse_train, coarse_train), interpolation=cv2.INTER_LINEAR)
+         c = np.concatenate([
+             np.transpose(rgb_coarse, (2, 0, 1)),  # 3xHxW
+             y_min_c[None, ...],  # 1xHxW
+             y_max_c[None, ...],  # 1xHxW
+             np.zeros((1, coarse_train, coarse_train), np.float32),  # cond placeholder
+             np.zeros((1, coarse_train, coarse_train), np.float32),  # loc placeholder
+         ], axis=0)
+         xs_coarse.append(torch.from_numpy(c))
+
+         # Sample fine patch
+         y0, x0 = sampler.sample(imgf, mask)
+         patch_rgb = imgf[y0 : y0 + patch_size, x0 : x0 + patch_size, :]
+         patch_mask = mask[y0 : y0 + patch_size, x0 : x0 + patch_size]
+         patches_rgb.append(patch_rgb)
+         patches_mask.append(patch_mask)
+         patches_min.append(y_min[y0 : y0 + patch_size, x0 : x0 + patch_size])
+         patches_max.append(y_max[y0 : y0 + patch_size, x0 : x0 + patch_size])
+         # Binary location mask (ones inside the patch)
+         loc_masks.append(np.ones((patch_size, patch_size), dtype=np.float32))
+         yx_list.append((y0, x0))
+
+     x_coarse = torch.stack(xs_coarse, dim=0).to(device)  # Bx7xHcxWc
+
+     # Store numpy arrays for the fine build
+     return {
+         "x_coarse": x_coarse,
+         "full_h": full_h,
+         "full_w": full_w,
+         "rgb_patches": patches_rgb,
+         "mask_patches": patches_mask,
+         "ymin_patches": patches_min,
+         "ymax_patches": patches_max,
+         "loc_patches": loc_masks,
+         "patch_yx": yx_list,
+         "mask_full": masks,
+     }
+
+
+ def _build_fine_inputs(batch, cond_up: torch.Tensor, device: torch.device) -> torch.Tensor:
+     # Build the fine input tensor Bx7xPxP from per-sample numpy buffers and upsampled cond maps
+     B = cond_up.shape[0]
+     P = batch["loc_patches"][0].shape[0]
+     xs: List[torch.Tensor] = []
+     for i in range(B):
+         rgb = batch["rgb_patches"][i]
+         ymin = batch["ymin_patches"][i]
+         ymax = batch["ymax_patches"][i]
+         loc = batch["loc_patches"][i]
+         y0, x0 = batch["patch_yx"][i]
+
+         cond_patch = cond_up[i : i + 1, :, y0 : y0 + P, x0 : x0 + P]  # 1x1xPxP
+         cond_patch = cond_patch.squeeze(1)  # 1xPxP
+
+         # Convert numpy channels to torch and concat
+         rgb_t = torch.from_numpy(np.transpose(rgb, (2, 0, 1)))  # 3xPxP
+         ymin_t = torch.from_numpy(ymin)[None, ...]  # 1xPxP
+         ymax_t = torch.from_numpy(ymax)[None, ...]  # 1xPxP
+         loc_t = torch.from_numpy(loc)[None, ...]  # 1xPxP
+         x = torch.cat([rgb_t, ymin_t, ymax_t, cond_patch.cpu(), loc_t], dim=0).float()  # 7xPxP
+         xs.append(x)
+     x_fine = torch.stack(xs, dim=0).to(device)
+     return x_fine
+
+
+ def _build_coarse_targets(masks: List[np.ndarray], out_h: int, out_w: int, device: torch.device) -> torch.Tensor:
+     ys: List[torch.Tensor] = []
+     for m in masks:
+         dm = downsample_label_maxpool(m, out_h, out_w)
+         ys.append(torch.from_numpy(dm.astype(np.int64)))
+     y = torch.stack(ys, dim=0).to(device)  # BxHc4xWc4 with values {0, 1}
+     return y
+
+
+ def _build_fine_targets(mask_patches: List[np.ndarray], out_h: int, out_w: int, device: torch.device) -> torch.Tensor:
+     ys: List[torch.Tensor] = []
+     for m in mask_patches:
+         dm = downsample_label_maxpool(m, out_h, out_w)
+         ys.append(torch.from_numpy(dm.astype(np.int64)))
+     y = torch.stack(ys, dim=0).to(device)  # BxHf4xWf4 with values {0, 1}
+     return y
+
+
+ def _save_checkpoint(path: str, step: int, model: nn.Module, optim: torch.optim.Optimizer, scaler: GradScaler, best_f1: float):
+     os.makedirs(os.path.dirname(path), exist_ok=True)
+     state = {
+         "step": step,
+         "model": model.state_dict(),
+         "optim": optim.state_dict(),
+         "scaler": scaler.state_dict(),
+         "best_f1": best_f1,
+     }
+     torch.save(state, path)
+     print(f"[WireSegHR][train] Saved checkpoint: {path}")
+
+
+ def _load_checkpoint(path: str, model: nn.Module, optim: torch.optim.Optimizer, scaler: GradScaler, device: torch.device) -> Tuple[int, float]:
+     ckpt = torch.load(path, map_location=device)
+     model.load_state_dict(ckpt["model"])
+     optim.load_state_dict(ckpt["optim"])
+     try:
+         scaler.load_state_dict(ckpt["scaler"])  # may not exist in older checkpoints
+     except Exception:
+         pass
+     step = int(ckpt.get("step", 0))
+     best_f1 = float(ckpt.get("best_f1", -1.0))
+     return step, best_f1
+
+
+ @torch.no_grad()
+ def validate(model: WireSegHR, dset_val: WireSegDataset, coarse_size: int, device: torch.device, amp_flag: bool) -> Dict[str, float]:
+     # Coarse-only validation: resize the image to coarse_size, predict coarse logits,
+     # upsample to full resolution, and compute metrics
+     model = model.to(device)
+     metrics_sum = {"iou": 0.0, "f1": 0.0, "precision": 0.0, "recall": 0.0}
+     n = 0
+     for i in range(len(dset_val)):
+         item = dset_val[i]
+         img = item["image"].astype(np.float32) / 255.0  # HxWx3
+         mask = item["mask"].astype(np.uint8)
+         H, W = mask.shape
+         # Build the coarse input (zeros for cond+loc)
+         rgb_c = cv2.resize(img, (coarse_size, coarse_size), interpolation=cv2.INTER_LINEAR)
+         y = (0.299 * img[..., 0] + 0.587 * img[..., 1] + 0.114 * img[..., 2]).astype(np.float32)
+         y_min = cv2.resize(y, (coarse_size, coarse_size), interpolation=cv2.INTER_LINEAR)
+         y_max = y_min
+         x = np.concatenate([
+             np.transpose(rgb_c, (2, 0, 1)),
+             y_min[None, ...],
+             y_max[None, ...],
+             np.zeros((1, coarse_size, coarse_size), np.float32),
+             np.zeros((1, coarse_size, coarse_size), np.float32),
+         ], axis=0)
+         x_t = torch.from_numpy(x)[None, ...].to(device)
+         with autocast(enabled=(device.type == "cuda" and amp_flag)):
+             logits_c, _ = model.forward_coarse(x_t)
+         prob = torch.softmax(logits_c, dim=1)[:, 1:2]
+         prob_up = F.interpolate(prob, size=(H, W), mode="bilinear", align_corners=False)[0, 0].detach().cpu().numpy()
+         pred = (prob_up > 0.5).astype(np.uint8)
+         m = compute_metrics(pred, mask)
+         for k in metrics_sum:
+             metrics_sum[k] += m[k]
+         n += 1
+     if n == 0:
+         return {k: 0.0 for k in metrics_sum}
+     return {k: v / float(n) for k, v in metrics_sum.items()}
+
+
+ @torch.no_grad()
+ def save_test_visuals(model: WireSegHR, dset_test: WireSegDataset, coarse_size: int, device: torch.device, out_dir: str, amp_flag: bool, max_samples: int = 8):
+     os.makedirs(out_dir, exist_ok=True)
+     for i in range(min(max_samples, len(dset_test))):
+         item = dset_test[i]
+         img = item["image"].astype(np.float32) / 255.0
+         H, W = img.shape[:2]
+         rgb_c = cv2.resize(img, (coarse_size, coarse_size), interpolation=cv2.INTER_LINEAR)
+         y = (0.299 * img[..., 0] + 0.587 * img[..., 1] + 0.114 * img[..., 2]).astype(np.float32)
+         y_min = cv2.resize(y, (coarse_size, coarse_size), interpolation=cv2.INTER_LINEAR)
+         y_max = y_min
+         x = np.concatenate([
+             np.transpose(rgb_c, (2, 0, 1)),
+             y_min[None, ...],
+             y_max[None, ...],
+             np.zeros((1, coarse_size, coarse_size), np.float32),
+             np.zeros((1, coarse_size, coarse_size), np.float32),
+         ], axis=0)
+         x_t = torch.from_numpy(x)[None, ...].to(device)
+         with autocast(enabled=(device.type == "cuda" and amp_flag)):
+             logits_c, _ = model.forward_coarse(x_t)
+         prob = torch.softmax(logits_c, dim=1)[:, 1:2]
+         prob_up = F.interpolate(prob, size=(H, W), mode="bilinear", align_corners=False)[0, 0].detach().cpu().numpy()
+         pred = (prob_up > 0.5).astype(np.uint8) * 255
+         # Save input and prediction
+         img_bgr = (img[..., ::-1] * 255.0).astype(np.uint8)
+         cv2.imwrite(os.path.join(out_dir, f"{i:03d}_input.jpg"), img_bgr)
+         cv2.imwrite(os.path.join(out_dir, f"{i:03d}_pred.png"), pred)


if __name__ == "__main__":
    main()