# Code Structure of CUDA operators

This folder contains all non-Python code for MMCV custom ops. Please follow the same architecture if you want to add new ops.

## Directory Tree
```folder
.
├── common
│   ├── box_iou_rotated_utils.hpp
│   ├── parrots_cpp_helper.hpp
│   ├── parrots_cuda_helper.hpp
│   ├── pytorch_cpp_helper.hpp
│   ├── pytorch_cuda_helper.hpp
│   ├── pytorch_device_registry.hpp
│   ├── cuda
│   │   ├── common_cuda_helper.hpp
│   │   ├── parrots_cudawarpfunction.cuh
│   │   ├── ...
│   │   └── ops_cuda_kernel.cuh
│   ├── mps
│   │   ├── MPSLibrary.h
│   │   ├── ...
│   │   └── MPSUtils.h
│   ├── mlu
│   │   └── ...
│   └── utils
│       └── ...
├── parrots
│   ├── ...
│   ├── ops.cpp
│   ├── ops_parrots.cpp
│   └── ops_pytorch.h
└── pytorch
    ├── info.cpp
    ├── pybind.cpp
    ├── ...
    ├── ops.cpp
    ├── cuda
    │   ├── ...
    │   └── ops_cuda.cu
    ├── cpu
    │   ├── ...
    │   └── ops.cpp
    ├── mps
    │   ├── ...
    │   └── op_mps.mm
    └── mlu
        ├── ...
        └── op_mlu.cpp
```
## Components

- `common`: This directory contains all shared code and tools.
  - `cuda`: The CUDA kernels that can be shared by all backends. **HIP** kernels also live here since they have similar syntax.
  - `mps`: The tools used to support MPS ops. **NOTE** that MPS support is **experimental**.
  - `mlu`: The MLU kernels used to support [Cambricon](https://www.cambricon.com/) devices.
  - `utils`: The kernels and utilities of spconv.
- `parrots`: **Parrots** is a deep learning framework for model training and inference. Parrots custom ops are placed in this directory.
- `pytorch`: **PyTorch** custom ops are supported by binding C++ to Python with **pybind11**. The op implementations and binding code are placed in this directory. Implementations for different devices are selected at runtime through the device registry defined in `common/pytorch_device_registry.hpp` (see the sketch after this list).
  - `cuda`: This directory contains the CUDA kernel launchers, which feed the memory pointers of tensors to the CUDA kernels in `common/cuda`. The launchers provide the C++ interface to the CUDA implementations of the corresponding custom ops.
  - `cpu`: This directory contains the CPU implementations of the corresponding custom ops.
  - `mlu`: This directory contains the launchers of the MLU kernels.
  - `mps`: MPS op implementations and launchers.
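The device registry deserves a brief illustration. The following is a loose sketch of the idea behind `REGISTER_DEVICE_IMPL` and `DISPATCH_DEVICE_IMPL`, not the actual contents of `pytorch_device_registry.hpp`: each op keeps a table from device type to function pointer, implementations insert themselves at static-initialization time, and dispatch looks up the device of the input tensor at runtime.

```c++
// Loose sketch of the registry idea; the real implementation in
// common/pytorch_device_registry.hpp is more general.
#include <ATen/ATen.h>
#include <map>

using Tensor = at::Tensor;
using ForwardFn = Tensor (*)(Tensor, Tensor);

// One table per op: device type -> implementation.
static std::map<c10::DeviceType, ForwardFn>& new_ops_table() {
  static std::map<c10::DeviceType, ForwardFn> table;
  return table;
}

// What REGISTER_DEVICE_IMPL boils down to: a static object whose
// constructor inserts the implementation before main() runs, e.g.
//   static Registrar reg(c10::DeviceType::CUDA, new_ops_forward_cuda);
struct Registrar {
  Registrar(c10::DeviceType device, ForwardFn fn) {
    new_ops_table()[device] = fn;
  }
};

// What DISPATCH_DEVICE_IMPL boils down to: route the call by the
// device type of the input tensor.
Tensor dispatch_new_ops_forward(Tensor input, Tensor output) {
  return new_ops_table().at(input.device().type())(input, output);
}
```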
## How to add new PyTorch ops?

1. (Optional) Add a shared kernel in `common` to support special hardware platforms.

   ```c++
   // src/common/cuda/new_ops_cuda_kernel.cuh
   template <typename T>
   __global__ void new_ops_forward_cuda_kernel(const T* input, T* output, ...) {
     // forward here
   }
   ```
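   To make the template concrete, here is a minimal, hypothetical elementwise kernel. The op, its name, and its arguments are invented for illustration; `CUDA_1D_KERNEL_LOOP` is assumed to match the grid-stride loop helper in `common/cuda/common_cuda_helper.hpp` and is reproduced here so the sketch stands alone.

   ```c++
   // Hypothetical example: scale every element by two.
   // Assumed to mirror the helper in common/cuda/common_cuda_helper.hpp.
   #define CUDA_1D_KERNEL_LOOP(i, n)                                  \
     for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < (n);     \
          i += blockDim.x * gridDim.x)

   template <typename T>
   __global__ void scale_by_two_forward_cuda_kernel(const T* input, T* output,
                                                    const int num_elements) {
     // Each thread strides over the flattened tensor.
     CUDA_1D_KERNEL_LOOP(i, num_elements) {
       output[i] = input[i] * static_cast<T>(2);
     }
   }
   ```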
   Add the CUDA kernel launcher in `pytorch/cuda`.

   ```c++
   // src/pytorch/cuda
   #include <new_ops_cuda_kernel.cuh>
   // pytorch_cuda_helper.hpp pulls in Tensor and the CUDA helpers
   // (e.g. GET_BLOCKS, THREADS_PER_BLOCK) used below.
   #include "pytorch_cuda_helper.hpp"

   void NewOpsForwardCUDAKernelLauncher(Tensor input, Tensor output, ...) {
     // initialize
     at::cuda::CUDAGuard device_guard(input.device());
     cudaStream_t stream = at::cuda::getCurrentCUDAStream();
     ...
     AT_DISPATCH_FLOATING_TYPES_AND_HALF(
         input.scalar_type(), "new_ops_forward_cuda_kernel", ([&] {
           new_ops_forward_cuda_kernel<scalar_t>
               <<<GET_BLOCKS(output_size), THREADS_PER_BLOCK, 0, stream>>>(
                   input.data_ptr<scalar_t>(), output.data_ptr<scalar_t>(), ...);
         }));
     AT_CUDA_CHECK(cudaGetLastError());
   }
   ```
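   The launcher is deliberately thin: it sets the device guard and stream, extracts the raw data pointers from the tensors, and forwards them to the shared kernel in `common/cuda`. Keeping the math in the shared `.cuh` is what lets other backends such as Parrots reuse the same kernel.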
2. Register the implementation for different devices.

   ```c++
   // src/pytorch/cuda/cudabind.cpp
   ...
   Tensor new_ops_forward_cuda(Tensor input, Tensor output, ...) {
     // implement cuda forward here
     // use `NewOpsForwardCUDAKernelLauncher` here
   }
   // declare the interface here.
   Tensor new_ops_forward_impl(Tensor input, Tensor output, ...);
   // register the implementation for the given device (CUDA here).
   REGISTER_DEVICE_IMPL(new_ops_forward_impl, CUDA, new_ops_forward_cuda);
   ```
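   Other backends follow the same pattern. As a sketch (the file name and function below are hypothetical, assuming the CPU file mirrors the CUDA one), a CPU implementation in `pytorch/cpu` would register itself against the same interface:

   ```c++
   // src/pytorch/cpu/new_ops.cpp (hypothetical sketch)
   Tensor new_ops_forward_cpu(Tensor input, Tensor output, ...) {
     // implement the plain C++ forward here
   }
   // declare the same interface...
   Tensor new_ops_forward_impl(Tensor input, Tensor output, ...);
   // ...and register the CPU implementation for it.
   REGISTER_DEVICE_IMPL(new_ops_forward_impl, CPU, new_ops_forward_cpu);
   ```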
3. Add the op implementation in the `pytorch` directory, which selects among the registered implementations according to the device type.

   ```c++
   // src/pytorch/new_ops.cpp
   Tensor new_ops_forward_impl(Tensor input, Tensor output, ...) {
     // dispatch the implementation according to the device type of input.
     return DISPATCH_DEVICE_IMPL(new_ops_forward_impl, input, output, ...);
   }
   ...
   Tensor new_ops_forward(Tensor input, Tensor output, ...) {
     return new_ops_forward_impl(input, output, ...);
   }
   ```
4. Bind the implementation in `pytorch/pybind.cpp`.

   ```c++
   // src/pytorch/pybind.cpp
   ...
   Tensor new_ops_forward(Tensor input, Tensor output, ...);
   ...
   // bind with pybind11
   m.def("new_ops_forward", &new_ops_forward, "new_ops_forward",
         py::arg("input"), py::arg("output"), ...);
   ...
   ```
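   For context, the `m.def` calls live inside the pybind11 module definition; a minimal sketch of that surrounding scaffolding, assuming the standard PyTorch extension entry point, looks like:

   ```c++
   // Sketch of the module definition surrounding the m.def calls
   // in src/pytorch/pybind.cpp.
   PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
     m.def("new_ops_forward", &new_ops_forward, "new_ops_forward",
           py::arg("input"), py::arg("output"), ...);
   }
   ```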
5. Rebuild MMCV, then enjoy the new op in Python.

   ```python
   from ..utils import ext_loader

   ext_module = ext_loader.load_ext('_ext', ['new_ops_forward'])
   ...
   ext_module.new_ops_forward(input, output, ...)
   ```
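   Note that the device of `input` is what drives the dispatch from step 3: with the registrations above, CUDA tensors reach `new_ops_forward_cuda`, while a tensor on a device with no registered implementation cannot be dispatched.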