---
license: apache-2.0
language:
- en
---
# **Fixer: Improving 3D Reconstructions with Single-Step Diffusion Models**
[**Code**](https://github.com/nv-tlabs/Fixer) | [**Paper**](https://arxiv.org/abs/2503.01774)

## Use the Fixer Model
Please visit the [Fixer repository](https://github.com/nv-tlabs/Fixer) for all relevant files and code needed to use Fixer.

## Description:
Fixer is a single-step image diffusion model trained to enhance rendered novel views and remove artifacts caused by underconstrained regions of a three-dimensional (3D) representation. Fixer is based on the approach described in the paper [Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models](https://arxiv.org/abs/2503.01774).

Fixer has two operation modes:

* Offline mode: Used during the reconstruction phase to clean up pseudo-training views rendered from the reconstruction, which are then distilled back into the 3D representation. This greatly enhances underconstrained regions and improves overall 3D representation quality.
* Online mode: Acts as a neural enhancer during inference, removing residual artifacts that arise from imperfect 3D supervision and the limited capacity of current reconstruction models.

Fixer is an all-encompassing solution: a single model compatible with both Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) representations. This release was trained on 3DGUT data and adapts well to 3DGS scenes.

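The offline mode described above reduces to a render → fix → distill loop. The sketch below illustrates that control flow only; `ToyRecon`, `fixer`, and the method names are hypothetical stand-ins, not the actual nv-tlabs/Fixer API:

```python
class ToyRecon:
    """Hypothetical stand-in for a NeRF/3DGS reconstruction."""
    def __init__(self):
        self.views = []   # (pose, view) pairs used for supervision
        self.steps = 0    # optimization steps taken
    def render(self, pose):
        return ("render", pose)
    def add_training_views(self, poses, views):
        self.views.extend(zip(poses, views))
    def optimize_step(self):
        self.steps += 1

def offline_refinement(recon, fixer, novel_poses, rounds=3, steps_per_round=10):
    """Offline mode: render pseudo-views, fix them, distill back into 3D."""
    for _ in range(rounds):
        pseudo = [recon.render(p) for p in novel_poses]   # render novel views
        fixed = [fixer(v) for v in pseudo]                # remove artifacts
        recon.add_training_views(novel_poses, fixed)      # distill back into 3D
        for _ in range(steps_per_round):
            recon.optimize_step()
    return recon

recon = offline_refinement(ToyRecon(), fixer=lambda v: ("fixed", v), novel_poses=[0, 1])
```

In a real pipeline the fixed views augment the training set between optimization rounds, progressively constraining regions the original captures never observed.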
**Model Developer:** NVIDIA

**Model Versions:** Fixer

**Deployment Geography:** Global

**This model is ready for commercial/non-commercial use.**

### License/Terms of Use:
Your use of the model is governed by the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).

### Use Case:
Fixer is intended for Physical AI developers looking to enhance and improve their Neural Reconstruction pipelines. The model takes an image as input and outputs a fixed image.

**Release Date:**
- V1 (Stable Diffusion): June 2025, Hugging Face - https://huggingface.co/nvidia/difix
- V2 (Cosmos): October 2025, Hugging Face - https://huggingface.co/nvidia/fixer

## Model Architecture

**Architecture Type**: Linear Diffusion Transformer

**Network Architecture**: Linear-attention Diffusion Transformer with a Deep Compression Autoencoder (DC-AE) for efficient high-resolution image generation.

**Based on**: Cosmos-Predict-0.6B

**Number of model parameters**: 0.6B

## Input

**Input Type(s)**: Image

**Input Format(s)**: Red, Green, Blue (RGB)

**Input Parameters**: Two-Dimensional (2D)

**Other Properties Related to Input**:
* Specific Resolution: 576px x 1024px

## Output

**Output Type(s)**: Image

**Output Format(s)**: Red, Green, Blue (RGB)

**Output Parameters**: Two-Dimensional (2D)

**Other Properties Related to Output**:
* Specific Resolution: 576px x 1024px

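Since the model expects a fixed 576px x 1024px input, frames usually need resizing first. A minimal preprocessing sketch using Pillow (bilinear stretching is an assumption here; the official pipeline may crop or letterbox instead):

```python
from PIL import Image

# Fixer expects 576 x 1024 px inputs (height x width, per this model card).
TARGET_H, TARGET_W = 576, 1024

def to_model_resolution(img: Image.Image) -> Image.Image:
    """Resize an image to the resolution Fixer was trained on.

    Bilinear resampling is an assumption; check the Fixer repository
    for the exact preprocessing it uses.
    """
    # PIL's resize takes (width, height) order.
    return img.convert("RGB").resize((TARGET_W, TARGET_H), Image.BILINEAR)

frame = Image.new("RGB", (1920, 1080))   # e.g. a full-HD driving frame
resized = to_model_resolution(frame)
print(resized.size)  # (1024, 576) in PIL's (width, height) order
```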
## Software Integration

**Runtime Engine(s)**: PyTorch

**Supported Hardware Microarchitecture Compatibility**:
* NVIDIA Ampere

**Supported Operating System(s)**: Linux

**Note**: Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.

## Inference
**Engine**: PyTorch >= 2.0.0

**Test Hardware**:
We tested on H100, A100, A10, and L20:

| GPU Hardware | Inference Runtime |
|--------------|-------------------|
| NVIDIA H100  | 19 ms             |
| NVIDIA A100  | 25 ms             |
| NVIDIA L20   | 28 ms             |
| NVIDIA A10   | 43 ms             |

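Numbers like those in the table can be reproduced with a simple warmup-then-median timer. A stdlib sketch of the measurement pattern (for a real GPU model you would also need `torch.cuda.synchronize()` or CUDA events, since kernel launches return asynchronously):

```python
import time

def benchmark_ms(fn, warmup=3, iters=20):
    """Return the median wall-clock latency of fn() in milliseconds.

    Note: when timing a GPU model, synchronize the device inside fn;
    otherwise the measurement captures only the kernel launch.
    """
    for _ in range(warmup):          # let caches and allocators settle
        fn()
    times = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        times.append((time.perf_counter() - t0) * 1000.0)
    times.sort()
    return times[len(times) // 2]    # median is robust to outliers

latency_ms = benchmark_ms(lambda: sum(range(10_000)))
```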
## Training, Testing, and Evaluation Datasets

Fixer was trained, tested, and evaluated using an internal dataset, where 80% of the data was used for training, 10% for evaluation, and 10% for testing.

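The exact split procedure is not published; a seeded shuffle is one common way to realize the stated 80/10/10 ratios:

```python
import random

def split_dataset(items, seed=0, train_frac=0.8, val_frac=0.1):
    """Deterministically shuffle, then cut into train/val/test splits.

    Illustrates the 80/10/10 ratios only; the actual procedure used
    for the internal AV dataset is not documented here.
    """
    items = list(items)
    random.Random(seed).shuffle(items)   # seeded => reproducible split
    n_train = int(len(items) * train_frac)
    n_val = int(len(items) * val_frac)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

train, val, test = split_dataset(range(1000))
print(len(train), len(val), len(test))  # 800 100 100
```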
### NVIDIA Internal AV Dataset

- **Data Modality**: Image
- **Image Training Data Size**: 1 Million to 1 Billion Images
- **Data Collection Method**: Sensors
- **Labeling Method by Dataset**: Human
- **Properties**: The dataset contains autonomous driving images and videos collected by NVIDIA autonomous driving vehicles.

## Ethical Considerations:
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

Please make sure you have proper rights and permissions for all input image and video content. If an image or video includes people, personal health information, or intellectual property, the generated image or video will not blur or preserve the proportions of the subjects included.

Please report security vulnerabilities or NVIDIA AI concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).

---

## ModelCard++

### Bias

| Field | Response |
| :---- | :------- |
| Participation considerations from adversely impacted groups [protected classes](https://www.senate.ca.gov/content/protected-classes) in model design and testing: | None |
| Measures taken to mitigate against unwanted bias: | None |

### Explainability

| Field | Response |
| :---- | :------- |
| Intended Domain: | Advanced Driver Assistance Systems |
| Model Type: | Image-to-Image |
| Intended Users: | Autonomous Vehicle developers enhancing and improving Neural Reconstruction pipelines. |
| Output: | Image |
| Describe how the model works: | The model takes an image as input and outputs a fixed image. |
| Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of: | None |
| Technical Limitations: | The reconstruction relies on the quality and consistency of input images and camera calibrations; deficiencies in either can degrade the final output. |
| Verified to have met prescribed NVIDIA quality standards: | Yes |
| Performance Metrics: | FID (Fréchet Inception Distance), PSNR (Peak Signal-to-Noise Ratio), LPIPS (Learned Perceptual Image Patch Similarity) |
| Potential Known Risks: | The model is not guaranteed to fix 100% of image artifacts. Please verify that generated outputs are appropriate for the context and intended use. |
| Licensing: | Your use of the model is governed by the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/). |

### Privacy

| Field | Response |
| :---- | :------- |
| Generatable or reverse engineerable personal data? | No |
| Personal data used to create this model? | No |
| How often is the dataset reviewed? | Before release |
| Is there provenance for all datasets used in training? | Yes |
| Does data labeling (annotation, metadata) comply with privacy laws? | Yes |
| Is data compliant with data subject requests for data correction or removal, if such a request was made? | Yes |
| Applicable Privacy Policy: | https://www.nvidia.com/en-us/about-nvidia/privacy-policy/ |

### Safety & Security

| Field | Response |
| :---- | :------- |
| Model Application(s): | Image Enhancement - The model can be used to develop Autonomous Vehicle stacks that can be integrated inside vehicles. The Fixer model itself should not be deployed in a vehicle. |
| Describe the life critical impact (if present). | N/A - The model should not be deployed in a vehicle and will not perform life-critical tasks. |
| Use Case Restrictions: | Your use of the model is governed by the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/). |
| Model and dataset restrictions: | The Principle of Least Privilege (PoLP) is applied, limiting access for dataset generation and model development. Dataset access is restricted during training, and dataset license constraints are adhered to. |