speedinghzl committed
Commit 47b1f1c · verified · 1 parent: 7468b16

Upload folder using huggingface_hub

clip_vit_l16_s512m_bs16k_mix0_8/checkpoints/epoch_4.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fed8224469483093727f6f75d7c02063bdae3293f111080d7ab0dd2873757b1e
+ size 5133437730
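The three added lines form a Git LFS pointer rather than the checkpoint itself: the repository stores only the spec version, the SHA-256 of the real `epoch_4.pt`, and its size in bytes (5,133,437,730, about 4.78 GiB). The sketch below parses such a pointer with a hypothetical `parse_lfs_pointer` helper. The parameter estimate at the end is only a rough sanity check resting on an unverified assumption: that the file holds fp32 weights plus two fp32 AdamW moment buffers (12 bytes per parameter) and that pickling overhead is negligible.

```python
# Parse a Git LFS pointer file ("key value" per line) and derive a rough size estimate.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:fed8224469483093727f6f75d7c02063bdae3293f111080d7ab0dd2873757b1e
size 5133437730
"""


def parse_lfs_pointer(text: str) -> dict:
    """Hypothetical helper: split each non-empty line into a key and a value."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


info = parse_lfs_pointer(POINTER)
size_gib = info["size"] / 2**30
# Assumption: fp32 weights + AdamW exp_avg + exp_avg_sq = 3 * 4 bytes per parameter.
approx_params = info["size"] // 12
print(f"{info['algo']} {info['digest'][:12]}... {size_gib:.2f} GiB, ~{approx_params / 1e6:.0f}M params")
```

Under that assumption the size works out to roughly 428M parameters, about the scale one would expect for a ViT-L image tower paired with a 12-layer text tower; only loading the checkpoint can confirm what it actually contains.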
clip_vit_l16_s512m_bs16k_mix0_8/out.log ADDED
@@ -0,0 +1,498 @@
+ 2025-05-06,11:17:54 | INFO | No latest resume checkpoint found in ./logs-lr1e-3-datacomp/clip_vit_l16_s512m_bs16k_mix0_8/checkpoints.
+ 2025-05-06,11:17:57 | INFO | Running in distributed mode with multiple processes. Device: cuda:0.Process (global: 0, local 0), total 32.
+ 2025-05-06,11:17:57 | INFO | Loaded ViT-L-16 model config.
+ 2025-05-06,11:18:01 | INFO | Model:
+ 2025-05-06,11:18:01 | INFO | CLIP(
+   (visual): VisionTransformer(
+     (conv1): Conv2d(3, 1024, kernel_size=(16, 16), stride=(16, 16), bias=False)
+     (patch_dropout): Identity()
+     (ln_pre): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
+     (transformer): Transformer(
+       (resblocks): ModuleList(
+         (0-23): 24 x ResidualAttentionBlock(
+           (ln_1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
+           (attn): MultiheadAttention(
+             (out_proj): NonDynamicallyQuantizableLinear(in_features=1024, out_features=1024, bias=True)
+           )
+           (ls_1): Identity()
+           (ln_2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
+           (mlp): Sequential(
+             (c_fc): Linear(in_features=1024, out_features=4096, bias=True)
+             (gelu): GELU(approximate='none')
+             (c_proj): Linear(in_features=4096, out_features=1024, bias=True)
+           )
+           (ls_2): Identity()
+         )
+       )
+     )
+     (ln_post): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
+   )
+   (transformer): Transformer(
+     (resblocks): ModuleList(
+       (0-11): 12 x ResidualAttentionBlock(
+         (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+         (attn): MultiheadAttention(
+           (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
+         )
+         (ls_1): Identity()
+         (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+         (mlp): Sequential(
+           (c_fc): Linear(in_features=768, out_features=3072, bias=True)
+           (gelu): GELU(approximate='none')
+           (c_proj): Linear(in_features=3072, out_features=768, bias=True)
+         )
+         (ls_2): Identity()
+       )
+     )
+   )
+   (token_embedding): Embedding(49408, 768)
+   (ln_final): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+ )
+ 2025-05-06,11:18:01 | INFO | Params:
+ 2025-05-06,11:18:01 | INFO | NDR_patch_size: 16
+ 2025-05-06,11:18:01 | INFO | accum_freq: 1
+ 2025-05-06,11:18:01 | INFO | aug_cfg: {}
+ 2025-05-06,11:18:01 | INFO | batch_size: 512
+ 2025-05-06,11:18:01 | INFO | beta1: 0.9
+ 2025-05-06,11:18:01 | INFO | beta2: 0.98
+ 2025-05-06,11:18:01 | INFO | checkpoint_path: ./logs-lr1e-3-datacomp/clip_vit_l16_s512m_bs16k_mix0_8/checkpoints
+ 2025-05-06,11:18:01 | INFO | coca_caption_loss_weight: 2.0
+ 2025-05-06,11:18:01 | INFO | coca_contrastive_loss_weight: 1.0
+ 2025-05-06,11:18:01 | INFO | copy_codebase: False
+ 2025-05-06,11:18:01 | INFO | csv_caption_key: title
+ 2025-05-06,11:18:01 | INFO | csv_img_key: filepath
+ 2025-05-06,11:18:01 | INFO | csv_separator:
+ 2025-05-06,11:18:01 | INFO | dataset_resampled: False
+ 2025-05-06,11:18:01 | INFO | dataset_type: webdataset
+ 2025-05-06,11:18:01 | INFO | ddp_static_graph: True
+ 2025-05-06,11:18:01 | INFO | debug: False
+ 2025-05-06,11:18:01 | INFO | delete_prev_step_ckpt: True
+ 2025-05-06,11:18:01 | INFO | delete_previous_checkpoint: False
+ 2025-05-06,11:18:01 | INFO | device: cuda:0
+ 2025-05-06,11:18:01 | INFO | dist_backend: nccl
+ 2025-05-06,11:18:01 | INFO | dist_url: env://
+ 2025-05-06,11:18:01 | INFO | distill: False
+ 2025-05-06,11:18:01 | INFO | distill_model: None
+ 2025-05-06,11:18:01 | INFO | distill_pretrained: None
+ 2025-05-06,11:18:01 | INFO | distributed: True
+ 2025-05-06,11:18:01 | INFO | epochs: 4
+ 2025-05-06,11:18:01 | INFO | epochs_cooldown: None
+ 2025-05-06,11:18:01 | INFO | eps: 1e-06
+ 2025-05-06,11:18:01 | INFO | force_custom_text: False
+ 2025-05-06,11:18:01 | INFO | force_image_size: 224
+ 2025-05-06,11:18:01 | INFO | force_patch_dropout: None
+ 2025-05-06,11:18:01 | INFO | force_quick_gelu: False
+ 2025-05-06,11:18:01 | INFO | gather_with_grad: True
+ 2025-05-06,11:18:01 | INFO | global_batch_size: 16384
+ 2025-05-06,11:18:01 | INFO | grad_checkpointing: True
+ 2025-05-06,11:18:01 | INFO | grad_clip_norm: None
+ 2025-05-06,11:18:01 | INFO | horovod: False
+ 2025-05-06,11:18:01 | INFO | image_interpolation: None
+ 2025-05-06,11:18:01 | INFO | image_mean: None
+ 2025-05-06,11:18:01 | INFO | image_resize_mode: None
+ 2025-05-06,11:18:01 | INFO | image_std: None
+ 2025-05-06,11:18:01 | INFO | imagenet_v2: None
+ 2025-05-06,11:18:01 | INFO | imagenet_val: /mnt/bn/zilongdata-hl/dataset/imagenet/val
+ 2025-05-06,11:18:01 | INFO | is_cls_token: True
+ 2025-05-06,11:18:01 | INFO | local_loss: True
+ 2025-05-06,11:18:01 | INFO | local_rank: 0
+ 2025-05-06,11:18:01 | INFO | lock_image: False
+ 2025-05-06,11:18:01 | INFO | lock_image_freeze_bn_stats: False
+ 2025-05-06,11:18:01 | INFO | lock_image_unlocked_groups: 0
+ 2025-05-06,11:18:01 | INFO | lock_text: False
+ 2025-05-06,11:18:01 | INFO | lock_text_freeze_layer_norm: False
+ 2025-05-06,11:18:01 | INFO | lock_text_unlocked_layers: 0
+ 2025-05-06,11:18:01 | INFO | log_every_n_steps: 128
+ 2025-05-06,11:18:01 | INFO | log_level: 20
+ 2025-05-06,11:18:01 | INFO | log_local: False
+ 2025-05-06,11:18:01 | INFO | log_path: ./logs-lr1e-3-datacomp/clip_vit_l16_s512m_bs16k_mix0_8/out.log
+ 2025-05-06,11:18:01 | INFO | logs: ./logs-lr1e-3-datacomp
+ 2025-05-06,11:18:01 | INFO | lr: 0.001
+ 2025-05-06,11:18:01 | INFO | lr_cooldown_end: 0.0
+ 2025-05-06,11:18:01 | INFO | lr_cooldown_power: 1.0
+ 2025-05-06,11:18:01 | INFO | lr_scheduler: cosine
+ 2025-05-06,11:18:01 | INFO | max_seq_len: 15000
+ 2025-05-06,11:18:01 | INFO | model: ViT-L-16
+ 2025-05-06,11:18:01 | INFO | name: clip_vit_l16_s512m_bs16k_mix0_8
+ 2025-05-06,11:18:01 | INFO | native_dynamic_resolution: False
+ 2025-05-06,11:18:01 | INFO | no_set_device_rank: False
+ 2025-05-06,11:18:01 | INFO | only_packing: False
+ 2025-05-06,11:18:01 | INFO | precision: amp
+ 2025-05-06,11:18:01 | INFO | pretrained:
+ 2025-05-06,11:18:01 | INFO | pretrained_image:
+ 2025-05-06,11:18:01 | INFO | pretrained_text:
+ 2025-05-06,11:18:01 | INFO | rank: 0
+ 2025-05-06,11:18:01 | INFO | remote_sync: None
+ 2025-05-06,11:18:01 | INFO | remote_sync_frequency: 300
+ 2025-05-06,11:18:01 | INFO | remote_sync_protocol: s3
+ 2025-05-06,11:18:01 | INFO | report_to: wandb
+ 2025-05-06,11:18:01 | INFO | resume: None
+ 2025-05-06,11:18:01 | INFO | rope_attn_num_heads: 12
+ 2025-05-06,11:18:01 | INFO | rope_model_width: 768
+ 2025-05-06,11:18:01 | INFO | save_every_n_steps: 6104
+ 2025-05-06,11:18:01 | INFO | save_frequency: 1
+ 2025-05-06,11:18:01 | INFO | save_most_recent: False
+ 2025-05-06,11:18:01 | INFO | seed: 0
+ 2025-05-06,11:18:01 | INFO | siglip: False
+ 2025-05-06,11:18:01 | INFO | skip_scheduler: False
+ 2025-05-06,11:18:01 | INFO | tensorboard: False
+ 2025-05-06,11:18:01 | INFO | tensorboard_path:
+ 2025-05-06,11:18:01 | INFO | torchcompile: False
+ 2025-05-06,11:18:01 | INFO | torchscript: False
+ 2025-05-06,11:18:01 | INFO | trace: False
+ 2025-05-06,11:18:01 | INFO | train_data: /mnt/bn/zilongdata-hl/dataset/Recap-DataComp-1B-Dataset/{000000..140146}.tar
+ 2025-05-06,11:18:01 | INFO | train_data_upsampling_factors: None
+ 2025-05-06,11:18:01 | INFO | train_num_samples: 128000000
+ 2025-05-06,11:18:01 | INFO | use_bn_sync: False
+ 2025-05-06,11:18:01 | INFO | use_bnb_linear: None
+ 2025-05-06,11:18:01 | INFO | val_data: None
+ 2025-05-06,11:18:01 | INFO | val_frequency: 1
+ 2025-05-06,11:18:01 | INFO | val_num_samples: None
+ 2025-05-06,11:18:01 | INFO | val_steps: 0
+ 2025-05-06,11:18:01 | INFO | wandb: True
+ 2025-05-06,11:18:01 | INFO | wandb_notes:
+ 2025-05-06,11:18:01 | INFO | wandb_project_name: cls-clip-NDR
+ 2025-05-06,11:18:01 | INFO | warmup: 500
+ 2025-05-06,11:18:01 | INFO | wd: 0.2
+ 2025-05-06,11:18:01 | INFO | workers: 1
+ 2025-05-06,11:18:01 | INFO | world_size: 32
+ 2025-05-06,11:18:01 | INFO | zeroshot_frequency: 4
+ 2025-05-06,11:18:01 | INFO | zeroshot_steps: 0
+ 2025-05-06,11:18:18 | INFO | Start epoch 0
+ 2025-05-06,11:18:32 | INFO | Train Epoch: 0 [ 16384/128008192 (0%)] Data (t): 4.627 Batch (t): 13.271, 1234.61/s, 38.5816/s/gpu LR: 0.000002 Logit Scale: 14.286 Contrastive_loss: 9.7333 (9.7333) Loss: 9.7333 (9.7333)
+ 2025-05-06,11:32:12 | INFO | Train Epoch: 0 [ 2113536/128008192 (2%)] Data (t): 0.171 Batch (t): 6.407, 2544.74/s, 79.5230/s/gpu LR: 0.000258 Logit Scale: 14.243 Contrastive_loss: 9.0793 (9.4063) Loss: 9.0793 (9.4063)
+ 2025-05-06,11:45:51 | INFO | Train Epoch: 0 [ 4210688/128008192 (3%)] Data (t): 0.173 Batch (t): 6.397, 2560.74/s, 80.0232/s/gpu LR: 0.000514 Logit Scale: 14.201 Contrastive_loss: 9.1232 (9.3119) Loss: 9.1232 (9.3119)
+ 2025-05-06,11:50:14 | WARNING | Handling webdataset error (OSError('image file is truncated (25 bytes not processed)')). Ignoring.
+ 2025-05-06,11:59:29 | INFO | Train Epoch: 0 [ 6307840/128008192 (5%)] Data (t): 0.173 Batch (t): 6.394, 2555.63/s, 79.8634/s/gpu LR: 0.000770 Logit Scale: 14.170 Contrastive_loss: 8.7996 (9.1838) Loss: 8.7996 (9.1838)
+ 2025-05-06,12:13:07 | INFO | Train Epoch: 0 [ 8404992/128008192 (7%)] Data (t): 0.175 Batch (t): 6.393, 2560.84/s, 80.0264/s/gpu LR: 0.001000 Logit Scale: 14.364 Contrastive_loss: 8.5564 (9.0584) Loss: 8.5564 (9.0584)
+ 2025-05-06,12:26:46 | INFO | Train Epoch: 0 [ 10502144/128008192 (8%)] Data (t): 0.176 Batch (t): 6.397, 2561.09/s, 80.0340/s/gpu LR: 0.001000 Logit Scale: 15.097 Contrastive_loss: 8.0981 (8.8983) Loss: 8.0981 (8.8983)
+ 2025-05-06,12:30:58 | WARNING | Handling webdataset error (OSError('image file is truncated (21 bytes not processed)')). Ignoring.
+ 2025-05-06,12:34:41 | WARNING | Handling webdataset error (OSError('image file is truncated (104 bytes not processed)')). Ignoring.
+ 2025-05-06,12:40:26 | INFO | Train Epoch: 0 [ 12599296/128008192 (10%)] Data (t): 0.175 Batch (t): 6.407, 2547.65/s, 79.6141/s/gpu LR: 0.001000 Logit Scale: 16.282 Contrastive_loss: 8.0068 (8.7710) Loss: 8.0068 (8.7710)
+ 2025-05-06,12:54:07 | INFO | Train Epoch: 0 [ 14696448/128008192 (11%)] Data (t): 0.176 Batch (t): 6.412, 2555.22/s, 79.8507/s/gpu LR: 0.001000 Logit Scale: 17.469 Contrastive_loss: 7.3908 (8.5984) Loss: 7.3908 (8.5984)
+ 2025-05-06,13:06:25 | WARNING | Handling webdataset error (OSError('image file is truncated (23 bytes not processed)')). Ignoring.
+ 2025-05-06,13:07:47 | INFO | Train Epoch: 0 [ 16793600/128008192 (13%)] Data (t): 0.176 Batch (t): 6.409, 2542.17/s, 79.4428/s/gpu LR: 0.000999 Logit Scale: 19.156 Contrastive_loss: 7.0134 (8.4223) Loss: 7.0134 (8.4223)
+ 2025-05-06,13:21:28 | INFO | Train Epoch: 0 [ 18890752/128008192 (15%)] Data (t): 0.181 Batch (t): 6.411, 2560.27/s, 80.0085/s/gpu LR: 0.000999 Logit Scale: 21.348 Contrastive_loss: 5.0364 (8.0837) Loss: 5.0364 (8.0837)
+ 2025-05-06,13:35:07 | INFO | Train Epoch: 0 [ 20987904/128008192 (16%)] Data (t): 0.176 Batch (t): 6.404, 2563.84/s, 80.1200/s/gpu LR: 0.000998 Logit Scale: 23.239 Contrastive_loss: 6.3534 (7.9264) Loss: 6.3534 (7.9264)
+ 2025-05-06,13:48:47 | INFO | Train Epoch: 0 [ 23085056/128008192 (18%)] Data (t): 0.175 Batch (t): 6.402, 2550.28/s, 79.6964/s/gpu LR: 0.000998 Logit Scale: 25.494 Contrastive_loss: 6.1541 (7.7787) Loss: 6.1541 (7.7787)
+ 2025-05-06,14:02:28 | INFO | Train Epoch: 0 [ 25182208/128008192 (20%)] Data (t): 0.175 Batch (t): 6.416, 2553.43/s, 79.7947/s/gpu LR: 0.000997 Logit Scale: 27.896 Contrastive_loss: 5.7264 (7.6209) Loss: 5.7264 (7.6209)
+ 2025-05-06,14:16:09 | INFO | Train Epoch: 0 [ 27279360/128008192 (21%)] Data (t): 0.176 Batch (t): 6.416, 2558.93/s, 79.9665/s/gpu LR: 0.000996 Logit Scale: 29.863 Contrastive_loss: 5.8574 (7.4949) Loss: 5.8574 (7.4949)
+ 2025-05-06,14:29:50 | INFO | Train Epoch: 0 [ 29376512/128008192 (23%)] Data (t): 0.180 Batch (t): 6.410, 2564.51/s, 80.1410/s/gpu LR: 0.000996 Logit Scale: 32.126 Contrastive_loss: 5.4119 (7.3560) Loss: 5.4119 (7.3560)
+ 2025-05-06,14:34:08 | WARNING | Handling webdataset error (OSError('image file is truncated (1 bytes not processed)')). Ignoring.
+ 2025-05-06,14:43:32 | INFO | Train Epoch: 0 [ 31473664/128008192 (25%)] Data (t): 0.179 Batch (t): 6.423, 2564.95/s, 80.1547/s/gpu LR: 0.000995 Logit Scale: 34.452 Contrastive_loss: 4.8748 (7.2009) Loss: 4.8748 (7.2009)
+ 2025-05-06,14:57:14 | INFO | Train Epoch: 0 [ 33570816/128008192 (26%)] Data (t): 0.177 Batch (t): 6.420, 2550.28/s, 79.6961/s/gpu LR: 0.000994 Logit Scale: 35.396 Contrastive_loss: 5.2292 (7.0850) Loss: 5.2292 (7.0850)
+ 2025-05-06,15:04:17 | WARNING | Handling webdataset error (OSError('image file is truncated (0 bytes not processed)')). Ignoring.
+ 2025-05-06,15:10:54 | INFO | Train Epoch: 0 [ 35667968/128008192 (28%)] Data (t): 0.174 Batch (t): 6.409, 2550.87/s, 79.7147/s/gpu LR: 0.000993 Logit Scale: 37.256 Contrastive_loss: 4.7600 (6.9558) Loss: 4.7600 (6.9558)
+ 2025-05-06,15:24:34 | INFO | Train Epoch: 0 [ 37765120/128008192 (30%)] Data (t): 0.173 Batch (t): 6.409, 2560.01/s, 80.0004/s/gpu LR: 0.000992 Logit Scale: 39.138 Contrastive_loss: 4.4714 (6.8250) Loss: 4.4714 (6.8250)
+ 2025-05-06,15:26:11 | WARNING | Handling webdataset error (OSError('image file is truncated (18 bytes not processed)')). Ignoring.
+ 2025-05-06,15:38:15 | INFO | Train Epoch: 0 [ 39862272/128008192 (31%)] Data (t): 0.176 Batch (t): 6.415, 2542.76/s, 79.4613/s/gpu LR: 0.000990 Logit Scale: 41.012 Contrastive_loss: 2.2269 (6.5951) Loss: 2.2269 (6.5951)
+ 2025-05-06,15:51:56 | INFO | Train Epoch: 0 [ 41959424/128008192 (33%)] Data (t): 0.172 Batch (t): 6.410, 2559.54/s, 79.9858/s/gpu LR: 0.000989 Logit Scale: 42.573 Contrastive_loss: 2.2209 (6.3868) Loss: 2.2209 (6.3868)
+ 2025-05-06,16:05:36 | INFO | Train Epoch: 0 [ 44056576/128008192 (34%)] Data (t): 0.174 Batch (t): 6.408, 2556.29/s, 79.8842/s/gpu LR: 0.000988 Logit Scale: 44.199 Contrastive_loss: 2.0094 (6.1879) Loss: 2.0094 (6.1879)
+ 2025-05-06,16:19:17 | INFO | Train Epoch: 0 [ 46153728/128008192 (36%)] Data (t): 0.173 Batch (t): 6.413, 2560.07/s, 80.0022/s/gpu LR: 0.000986 Logit Scale: 45.615 Contrastive_loss: 3.9921 (6.0924) Loss: 3.9921 (6.0924)
+ 2025-05-06,16:32:59 | INFO | Train Epoch: 0 [ 48250880/128008192 (38%)] Data (t): 0.181 Batch (t): 6.421, 2548.30/s, 79.6344/s/gpu LR: 0.000984 Logit Scale: 47.028 Contrastive_loss: 1.5336 (5.9024) Loss: 1.5336 (5.9024)
+ 2025-05-06,16:46:41 | INFO | Train Epoch: 0 [ 50348032/128008192 (39%)] Data (t): 0.181 Batch (t): 6.419, 2563.26/s, 80.1019/s/gpu LR: 0.000983 Logit Scale: 48.203 Contrastive_loss: 3.6447 (5.8121) Loss: 3.6447 (5.8121)
+ 2025-05-06,17:00:23 | INFO | Train Epoch: 0 [ 52445184/128008192 (41%)] Data (t): 0.181 Batch (t): 6.422, 2550.52/s, 79.7038/s/gpu LR: 0.000981 Logit Scale: 49.368 Contrastive_loss: 3.6589 (5.7293) Loss: 3.6589 (5.7293)
+ 2025-05-06,17:14:04 | INFO | Train Epoch: 0 [ 54542336/128008192 (43%)] Data (t): 0.181 Batch (t): 6.415, 2565.55/s, 80.1734/s/gpu LR: 0.000979 Logit Scale: 50.507 Contrastive_loss: 3.6747 (5.6532) Loss: 3.6747 (5.6532)
+ 2025-05-06,17:27:45 | INFO | Train Epoch: 0 [ 56639488/128008192 (44%)] Data (t): 0.180 Batch (t): 6.414, 2544.03/s, 79.5010/s/gpu LR: 0.000977 Logit Scale: 51.522 Contrastive_loss: 3.4208 (5.5735) Loss: 3.4208 (5.5735)
+ 2025-05-06,17:41:26 | INFO | Train Epoch: 0 [ 58736640/128008192 (46%)] Data (t): 0.180 Batch (t): 6.417, 2548.60/s, 79.6437/s/gpu LR: 0.000975 Logit Scale: 52.419 Contrastive_loss: 3.5051 (5.5022) Loss: 3.5051 (5.5022)
+ 2025-05-06,17:46:41 | WARNING | Handling webdataset error (OSError('image file is truncated (82 bytes not processed)')). Ignoring.
+ 2025-05-06,17:55:07 | INFO | Train Epoch: 0 [ 60833792/128008192 (48%)] Data (t): 0.182 Batch (t): 6.417, 2549.94/s, 79.6856/s/gpu LR: 0.000973 Logit Scale: 53.463 Contrastive_loss: 3.3447 (5.4303) Loss: 3.3447 (5.4303)
+ 2025-05-06,18:08:49 | INFO | Train Epoch: 0 [ 62930944/128008192 (49%)] Data (t): 0.181 Batch (t): 6.421, 2556.87/s, 79.9021/s/gpu LR: 0.000971 Logit Scale: 54.365 Contrastive_loss: 3.3730 (5.3639) Loss: 3.3730 (5.3639)
+ 2025-05-06,18:22:31 | INFO | Train Epoch: 0 [ 65028096/128008192 (51%)] Data (t): 0.181 Batch (t): 6.419, 2551.03/s, 79.7197/s/gpu LR: 0.000969 Logit Scale: 55.365 Contrastive_loss: 2.9639 (5.2889) Loss: 2.9639 (5.2889)
+ 2025-05-06,18:36:12 | INFO | Train Epoch: 0 [ 67125248/128008192 (52%)] Data (t): 0.183 Batch (t): 6.412, 2546.09/s, 79.5653/s/gpu LR: 0.000967 Logit Scale: 56.055 Contrastive_loss: 2.8990 (5.2165) Loss: 2.8990 (5.2165)
+ 2025-05-06,18:48:17 | WARNING | Handling webdataset error (OSError('image file is truncated (46 bytes not processed)')). Ignoring.
+ 2025-05-06,18:49:53 | INFO | Train Epoch: 0 [ 69222400/128008192 (54%)] Data (t): 0.181 Batch (t): 6.414, 2554.19/s, 79.8185/s/gpu LR: 0.000964 Logit Scale: 56.922 Contrastive_loss: 1.5903 (5.1098) Loss: 1.5903 (5.1098)
+ 2025-05-06,18:59:45 | WARNING | Handling webdataset error (OSError('image file is truncated (31 bytes not processed)')). Ignoring.
+ 2025-05-06,19:03:36 | INFO | Train Epoch: 0 [ 71319552/128008192 (56%)] Data (t): 0.181 Batch (t): 6.434, 2540.58/s, 79.3931/s/gpu LR: 0.000962 Logit Scale: 57.702 Contrastive_loss: 1.4186 (5.0044) Loss: 1.4186 (5.0044)
+ 2025-05-06,19:17:17 | INFO | Train Epoch: 0 [ 73416704/128008192 (57%)] Data (t): 0.182 Batch (t): 6.413, 2552.05/s, 79.7514/s/gpu LR: 0.000959 Logit Scale: 58.171 Contrastive_loss: 2.8611 (4.9448) Loss: 2.8611 (4.9448)
+ 2025-05-06,19:30:59 | INFO | Train Epoch: 0 [ 75513856/128008192 (59%)] Data (t): 0.186 Batch (t): 6.421, 2559.92/s, 79.9975/s/gpu LR: 0.000957 Logit Scale: 59.044 Contrastive_loss: 3.1587 (4.8965) Loss: 3.1587 (4.8965)
+ 2025-05-06,19:42:39 | WARNING | Handling webdataset error (OSError('image file is truncated (64 bytes not processed)')). Ignoring.
+ 2025-05-06,19:44:40 | INFO | Train Epoch: 0 [ 77611008/128008192 (61%)] Data (t): 0.181 Batch (t): 6.418, 2555.23/s, 79.8511/s/gpu LR: 0.000954 Logit Scale: 59.740 Contrastive_loss: 2.6725 (4.8380) Loss: 2.6725 (4.8380)
+ 2025-05-06,19:58:22 | INFO | Train Epoch: 0 [ 79708160/128008192 (62%)] Data (t): 0.182 Batch (t): 6.415, 2551.39/s, 79.7308/s/gpu LR: 0.000951 Logit Scale: 60.393 Contrastive_loss: 2.9092 (4.7886) Loss: 2.9092 (4.7886)
+ 2025-05-06,20:12:03 | INFO | Train Epoch: 0 [ 81805312/128008192 (64%)] Data (t): 0.183 Batch (t): 6.416, 2562.50/s, 80.0780/s/gpu LR: 0.000948 Logit Scale: 61.061 Contrastive_loss: 2.7537 (4.7377) Loss: 2.7537 (4.7377)
+ 2025-05-06,20:25:44 | INFO | Train Epoch: 0 [ 83902464/128008192 (66%)] Data (t): 0.180 Batch (t): 6.415, 2553.80/s, 79.8061/s/gpu LR: 0.000945 Logit Scale: 61.694 Contrastive_loss: 2.7510 (4.6892) Loss: 2.7510 (4.6892)
+ 2025-05-06,20:39:25 | INFO | Train Epoch: 0 [ 85999616/128008192 (67%)] Data (t): 0.181 Batch (t): 6.417, 2562.41/s, 80.0752/s/gpu LR: 0.000942 Logit Scale: 62.253 Contrastive_loss: 2.5767 (4.6389) Loss: 2.5767 (4.6389)
+ 2025-05-06,20:53:06 | INFO | Train Epoch: 0 [ 88096768/128008192 (69%)] Data (t): 0.181 Batch (t): 6.414, 2558.31/s, 79.9472/s/gpu LR: 0.000939 Logit Scale: 62.749 Contrastive_loss: 2.7887 (4.5959) Loss: 2.7887 (4.5959)
+ 2025-05-06,21:06:47 | INFO | Train Epoch: 0 [ 90193920/128008192 (70%)] Data (t): 0.182 Batch (t): 6.412, 2558.63/s, 79.9572/s/gpu LR: 0.000936 Logit Scale: 63.314 Contrastive_loss: 2.6883 (4.5526) Loss: 2.6883 (4.5526)
+ 2025-05-06,21:20:27 | INFO | Train Epoch: 0 [ 92291072/128008192 (72%)] Data (t): 0.180 Batch (t): 6.408, 2555.82/s, 79.8694/s/gpu LR: 0.000933 Logit Scale: 63.684 Contrastive_loss: 2.6334 (4.5099) Loss: 2.6334 (4.5099)
+ 2025-05-06,21:34:09 | INFO | Train Epoch: 0 [ 94388224/128008192 (74%)] Data (t): 0.182 Batch (t): 6.421, 2532.53/s, 79.1415/s/gpu LR: 0.000930 Logit Scale: 64.198 Contrastive_loss: 2.8361 (4.4735) Loss: 2.8361 (4.4735)
+ 2025-05-06,21:47:59 | INFO | Train Epoch: 0 [ 96485376/128008192 (75%)] Data (t): 0.182 Batch (t): 6.478, 2529.06/s, 79.0331/s/gpu LR: 0.000926 Logit Scale: 64.779 Contrastive_loss: 2.3765 (4.4289) Loss: 2.3765 (4.4289)
+ 2025-05-06,22:01:48 | INFO | Train Epoch: 0 [ 98582528/128008192 (77%)] Data (t): 0.181 Batch (t): 6.481, 2524.29/s, 78.8842/s/gpu LR: 0.000923 Logit Scale: 65.296 Contrastive_loss: 2.7193 (4.3933) Loss: 2.7193 (4.3933)
+ 2025-05-06,22:15:49 | INFO | Train Epoch: 0 [100679680/128008192 (79%)] Data (t): 0.274 Batch (t): 6.567, 2533.51/s, 79.1721/s/gpu LR: 0.000919 Logit Scale: 65.682 Contrastive_loss: 2.4283 (4.3532) Loss: 2.4283 (4.3532)
+ 2025-05-06,22:29:36 | INFO | Train Epoch: 0 [102776832/128008192 (80%)] Data (t): 0.185 Batch (t): 6.465, 2536.74/s, 79.2730/s/gpu LR: 0.000916 Logit Scale: 66.119 Contrastive_loss: 2.4963 (4.3160) Loss: 2.4963 (4.3160)
+ 2025-05-06,22:43:20 | INFO | Train Epoch: 0 [104873984/128008192 (82%)] Data (t): 0.185 Batch (t): 6.436, 2554.15/s, 79.8173/s/gpu LR: 0.000912 Logit Scale: 66.586 Contrastive_loss: 2.5728 (4.2819) Loss: 2.5728 (4.2819)
+ 2025-05-06,22:57:02 | INFO | Train Epoch: 0 [106971136/128008192 (84%)] Data (t): 0.182 Batch (t): 6.419, 2544.59/s, 79.5184/s/gpu LR: 0.000908 Logit Scale: 67.054 Contrastive_loss: 2.2404 (4.2426) Loss: 2.2404 (4.2426)
+ 2025-05-06,23:10:43 | INFO | Train Epoch: 0 [109068288/128008192 (85%)] Data (t): 0.182 Batch (t): 6.417, 2553.94/s, 79.8108/s/gpu LR: 0.000904 Logit Scale: 67.553 Contrastive_loss: 1.2048 (4.1853) Loss: 1.2048 (4.1853)
+ 2025-05-06,23:16:31 | WARNING | Handling webdataset error (OSError('image file is truncated (59 bytes not processed)')). Ignoring.
+ 2025-05-06,23:24:26 | INFO | Train Epoch: 0 [111165440/128008192 (87%)] Data (t): 0.180 Batch (t): 6.434, 2558.04/s, 79.9388/s/gpu LR: 0.000900 Logit Scale: 68.004 Contrastive_loss: 2.4082 (4.1524) Loss: 2.4082 (4.1524)
+ 2025-05-06,23:38:09 | INFO | Train Epoch: 0 [113262592/128008192 (88%)] Data (t): 0.182 Batch (t): 6.428, 2540.59/s, 79.3934/s/gpu LR: 0.000897 Logit Scale: 68.416 Contrastive_loss: 2.1269 (4.1156) Loss: 2.1269 (4.1156)
+ 2025-05-06,23:51:53 | INFO | Train Epoch: 0 [115359744/128008192 (90%)] Data (t): 0.181 Batch (t): 6.438, 2543.29/s, 79.4779/s/gpu LR: 0.000892 Logit Scale: 68.786 Contrastive_loss: 2.2505 (4.0822) Loss: 2.2505 (4.0822)
+ 2025-05-07,00:05:38 | INFO | Train Epoch: 0 [117456896/128008192 (92%)] Data (t): 0.181 Batch (t): 6.442, 2545.60/s, 79.5499/s/gpu LR: 0.000888 Logit Scale: 69.259 Contrastive_loss: 2.2192 (4.0496) Loss: 2.2192 (4.0496)
+ 2025-05-07,00:19:22 | INFO | Train Epoch: 0 [119554048/128008192 (93%)] Data (t): 0.181 Batch (t): 6.436, 2556.31/s, 79.8847/s/gpu LR: 0.000884 Logit Scale: 69.660 Contrastive_loss: 2.2746 (4.0190) Loss: 2.2746 (4.0190)
+ 2025-05-07,00:28:35 | WARNING | Handling webdataset error (OSError('image file is truncated (82 bytes not processed)')). Ignoring.
+ 2025-05-07,00:33:03 | INFO | Train Epoch: 0 [121651200/128008192 (95%)] Data (t): 0.182 Batch (t): 6.418, 2562.74/s, 80.0855/s/gpu LR: 0.000880 Logit Scale: 70.146 Contrastive_loss: 2.1285 (3.9869) Loss: 2.1285 (3.9869)
+ 2025-05-07,00:35:25 | WARNING | Handling webdataset error (OSError('image file is truncated (37 bytes not processed)')). Ignoring.
+ 2025-05-07,00:39:09 | WARNING | Handling webdataset error (OSError('image file is truncated (4 bytes not processed)')). Ignoring.
+ 2025-05-07,00:45:16 | WARNING | Handling webdataset error (OSError('image file is truncated (88 bytes not processed)')). Ignoring.
+ 2025-05-07,00:46:44 | INFO | Train Epoch: 0 [123748352/128008192 (97%)] Data (t): 0.182 Batch (t): 6.412, 2556.18/s, 79.8805/s/gpu LR: 0.000876 Logit Scale: 70.530 Contrastive_loss: 2.0094 (3.9540) Loss: 2.0094 (3.9540)
+ 2025-05-07,01:00:24 | INFO | Train Epoch: 0 [125845504/128008192 (98%)] Data (t): 0.181 Batch (t): 6.411, 2554.21/s, 79.8190/s/gpu LR: 0.000871 Logit Scale: 70.927 Contrastive_loss: 1.0926 (3.9071) Loss: 1.0926 (3.9071)
+ 2025-05-07,01:14:06 | INFO | Train Epoch: 0 [127942656/128008192 (100%)] Data (t): 0.183 Batch (t): 6.416, 2557.58/s, 79.9243/s/gpu LR: 0.000867 Logit Scale: 71.267 Contrastive_loss: 2.0225 (3.8767) Loss: 2.0225 (3.8767)
+ 2025-05-07,01:14:31 | INFO | Train Epoch: 0 [128008192/128008192 (100%)] Data (t): 0.185 Batch (t): 6.410, 2563.91/s, 80.1222/s/gpu LR: 0.000867 Logit Scale: 71.282 Contrastive_loss: 0.96909 (3.8305) Loss: 0.96909 (3.8305)
+ 2025-05-07,01:14:49 | INFO | Start epoch 1
+ 2025-05-07,01:14:59 | INFO | Train Epoch: 1 [ 16384/128008192 (0%)] Data (t): 4.266 Batch (t): 10.411, 1573.73/s, 49.1790/s/gpu LR: 0.000867 Logit Scale: 71.274 Contrastive_loss: 1.9177 (1.9177) Loss: 1.9177 (1.9177)
+ 2025-05-07,01:28:40 | INFO | Train Epoch: 1 [ 2113536/128008192 (2%)] Data (t): 0.180 Batch (t): 6.410, 2546.09/s, 79.5653/s/gpu LR: 0.000862 Logit Scale: 71.705 Contrastive_loss: 2.0434 (1.9805) Loss: 2.0434 (1.9805)
+ 2025-05-07,01:32:32 | WARNING | Handling webdataset error (OSError('image file is truncated (53 bytes not processed)')). Ignoring.
+ 2025-05-07,01:42:27 | INFO | Train Epoch: 1 [ 4210688/128008192 (3%)] Data (t): 0.182 Batch (t): 6.460, 2532.43/s, 79.1384/s/gpu LR: 0.000858 Logit Scale: 72.091 Contrastive_loss: 1.8296 (1.9302) Loss: 1.8296 (1.9302)
+ 2025-05-07,01:56:16 | INFO | Train Epoch: 1 [ 6307840/128008192 (5%)] Data (t): 0.182 Batch (t): 6.482, 2530.72/s, 79.0850/s/gpu LR: 0.000853 Logit Scale: 72.565 Contrastive_loss: 1.0221 (1.7032) Loss: 1.0221 (1.7032)
+ 2025-05-07,02:10:06 | INFO | Train Epoch: 1 [ 8404992/128008192 (7%)] Data (t): 0.183 Batch (t): 6.481, 2521.64/s, 78.8014/s/gpu LR: 0.000849 Logit Scale: 72.948 Contrastive_loss: 1.9116 (1.7449) Loss: 1.9116 (1.7449)
+ 2025-05-07,02:23:56 | INFO | Train Epoch: 1 [ 10502144/128008192 (8%)] Data (t): 0.181 Batch (t): 6.487, 2528.35/s, 79.0110/s/gpu LR: 0.000844 Logit Scale: 73.274 Contrastive_loss: 1.8069 (1.7552) Loss: 1.8069 (1.7552)
+ 2025-05-07,02:37:46 | INFO | Train Epoch: 1 [ 12599296/128008192 (10%)] Data (t): 0.182 Batch (t): 6.482, 2531.77/s, 79.1179/s/gpu LR: 0.000839 Logit Scale: 73.544 Contrastive_loss: 1.9214 (1.7789) Loss: 1.9214 (1.7789)
+ 2025-05-07,02:51:36 | INFO | Train Epoch: 1 [ 14696448/128008192 (11%)] Data (t): 0.180 Batch (t): 6.482, 2530.09/s, 79.0652/s/gpu LR: 0.000834 Logit Scale: 73.936 Contrastive_loss: 1.9687 (1.8027) Loss: 1.9687 (1.8027)
+ 2025-05-07,03:05:25 | INFO | Train Epoch: 1 [ 16793600/128008192 (13%)] Data (t): 0.182 Batch (t): 6.479, 2533.14/s, 79.1605/s/gpu LR: 0.000829 Logit Scale: 74.281 Contrastive_loss: 1.9881 (1.8233) Loss: 1.9881 (1.8233)
+ 2025-05-07,03:19:14 | INFO | Train Epoch: 1 [ 18890752/128008192 (15%)] Data (t): 0.180 Batch (t): 6.477, 2532.61/s, 79.1439/s/gpu LR: 0.000824 Logit Scale: 74.579 Contrastive_loss: 1.8266 (1.8236) Loss: 1.8266 (1.8236)
+ 2025-05-07,03:33:04 | INFO | Train Epoch: 1 [ 20987904/128008192 (16%)] Data (t): 0.181 Batch (t): 6.487, 2530.66/s, 79.0830/s/gpu LR: 0.000819 Logit Scale: 75.032 Contrastive_loss: 1.9037 (1.8309) Loss: 1.9037 (1.8309)
+ 2025-05-07,03:36:14 | WARNING | Handling webdataset error (OSError('image file is truncated (32 bytes not processed)')). Ignoring.
+ 2025-05-07,03:46:54 | INFO | Train Epoch: 1 [ 23085056/128008192 (18%)] Data (t): 0.183 Batch (t): 6.485, 2533.17/s, 79.1615/s/gpu LR: 0.000814 Logit Scale: 75.278 Contrastive_loss: 1.8527 (1.8327) Loss: 1.8527 (1.8327)
+ 2025-05-07,04:00:44 | INFO | Train Epoch: 1 [ 25182208/128008192 (20%)] Data (t): 0.182 Batch (t): 6.478, 2535.73/s, 79.2417/s/gpu LR: 0.000809 Logit Scale: 75.541 Contrastive_loss: 1.7299 (1.8248) Loss: 1.7299 (1.8248)
+ 2025-05-07,04:12:17 | WARNING | Handling webdataset error (OSError('image file is truncated (5 bytes not processed)')). Ignoring.
+ 2025-05-07,04:14:28 | WARNING | Handling webdataset error (OSError('image file is truncated (5 bytes not processed)')). Ignoring.
+ 2025-05-07,04:14:32 | INFO | Train Epoch: 1 [ 27279360/128008192 (21%)] Data (t): 0.183 Batch (t): 6.475, 2530.63/s, 79.0821/s/gpu LR: 0.000804 Logit Scale: 75.826 Contrastive_loss: 1.8436 (1.8262) Loss: 1.8436 (1.8262)
+ 2025-05-07,04:20:29 | WARNING | Handling webdataset error (OSError('image file is truncated (131 bytes not processed)')). Ignoring.
+ 2025-05-07,04:28:21 | INFO | Train Epoch: 1 [ 29376512/128008192 (23%)] Data (t): 0.183 Batch (t): 6.477, 2533.17/s, 79.1617/s/gpu LR: 0.000799 Logit Scale: 76.108 Contrastive_loss: 1.6174 (1.8122) Loss: 1.6174 (1.8122)
+ 2025-05-07,04:30:33 | WARNING | Handling webdataset error (OSError('image file is truncated (15 bytes not processed)')). Ignoring.
+ 2025-05-07,04:42:11 | INFO | Train Epoch: 1 [ 31473664/128008192 (25%)] Data (t): 0.184 Batch (t): 6.481, 2528.36/s, 79.0111/s/gpu LR: 0.000794 Logit Scale: 76.363 Contrastive_loss: 1.8645 (1.8155) Loss: 1.8645 (1.8155)
+ 2025-05-07,04:56:00 | INFO | Train Epoch: 1 [ 33570816/128008192 (26%)] Data (t): 0.181 Batch (t): 6.477, 2528.40/s, 79.0125/s/gpu LR: 0.000788 Logit Scale: 76.721 Contrastive_loss: 1.6262 (1.8044) Loss: 1.6262 (1.8044)
+ 2025-05-07,05:09:48 | INFO | Train Epoch: 1 [ 35667968/128008192 (28%)] Data (t): 0.180 Batch (t): 6.471, 2535.33/s, 79.2289/s/gpu LR: 0.000783 Logit Scale: 77.041 Contrastive_loss: 1.0209 (1.7608) Loss: 1.0209 (1.7608)
+ 2025-05-07,05:22:01 | WARNING | Handling webdataset error (OSError('image file is truncated (186 bytes not processed)')). Ignoring.
+ 2025-05-07,05:23:37 | INFO | Train Epoch: 1 [ 37765120/128008192 (30%)] Data (t): 0.181 Batch (t): 6.473, 2533.53/s, 79.1729/s/gpu LR: 0.000777 Logit Scale: 77.321 Contrastive_loss: 1.6161 (1.7532) Loss: 1.6161 (1.7532)
+ 2025-05-07,05:37:25 | INFO | Train Epoch: 1 [ 39862272/128008192 (31%)] Data (t): 0.180 Batch (t): 6.474, 2531.87/s, 79.1210/s/gpu LR: 0.000772 Logit Scale: 77.698 Contrastive_loss: 1.7779 (1.7545) Loss: 1.7779 (1.7545)
+ 2025-05-07,05:51:14 | INFO | Train Epoch: 1 [ 41959424/128008192 (33%)] Data (t): 0.180 Batch (t): 6.474, 2532.54/s, 79.1419/s/gpu LR: 0.000767 Logit Scale: 77.990 Contrastive_loss: 1.7784 (1.7556) Loss: 1.7784 (1.7556)
+ 2025-05-07,06:05:02 | INFO | Train Epoch: 1 [ 44056576/128008192 (34%)] Data (t): 0.180 Batch (t): 6.470, 2536.71/s, 79.2723/s/gpu LR: 0.000761 Logit Scale: 78.243 Contrastive_loss: 0.81705 (1.7129) Loss: 0.81705 (1.7129)
+ 2025-05-07,06:18:51 | INFO | Train Epoch: 1 [ 46153728/128008192 (36%)] Data (t): 0.182 Batch (t): 6.474, 2531.03/s, 79.0946/s/gpu LR: 0.000755 Logit Scale: 78.429 Contrastive_loss: 1.6104 (1.7085) Loss: 1.6104 (1.7085)
+ 2025-05-07,06:32:40 | INFO | Train Epoch: 1 [ 48250880/128008192 (38%)] Data (t): 0.181 Batch (t): 6.475, 2529.89/s, 79.0591/s/gpu LR: 0.000750 Logit Scale: 78.643 Contrastive_loss: 1.5814 (1.7032) Loss: 1.5814 (1.7032)
+ 2025-05-07,06:46:28 | INFO | Train Epoch: 1 [ 50348032/128008192 (39%)] Data (t): 0.181 Batch (t): 6.471, 2530.08/s, 79.0651/s/gpu LR: 0.000744 Logit Scale: 78.934 Contrastive_loss: 1.5512 (1.6971) Loss: 1.5512 (1.6971)
+ 2025-05-07,06:49:58 | WARNING | Handling webdataset error (OSError('image file is truncated (5 bytes not processed)')). Ignoring.
+ 2025-05-07,06:51:21 | WARNING | Handling webdataset error (OSError('image file is truncated (14 bytes not processed)')). Ignoring.
+ 2025-05-07,07:00:17 | INFO | Train Epoch: 1 [ 52445184/128008192 (41%)] Data (t): 0.182 Batch (t): 6.475, 2527.38/s, 78.9808/s/gpu LR: 0.000738 Logit Scale: 79.213 Contrastive_loss: 0.91170 (1.6669) Loss: 0.91170 (1.6669)
+ 2025-05-07,07:14:06 | INFO | Train Epoch: 1 [ 54542336/128008192 (43%)] Data (t): 0.183 Batch (t): 6.479, 2521.94/s, 78.8106/s/gpu LR: 0.000733 Logit Scale: 79.492 Contrastive_loss: 1.5317 (1.6619) Loss: 1.5317 (1.6619)
+ 2025-05-07,07:27:55 | INFO | Train Epoch: 1 [ 56639488/128008192 (44%)] Data (t): 0.184 Batch (t): 6.476, 2538.97/s, 79.3428/s/gpu LR: 0.000727 Logit Scale: 79.757 Contrastive_loss: 1.5126 (1.6566) Loss: 1.5126 (1.6566)
+ 2025-05-07,07:32:42 | WARNING | Handling webdataset error (OSError('image file is truncated (1 bytes not processed)')). Ignoring.
+ 2025-05-07,07:41:44 | INFO | Train Epoch: 1 [ 58736640/128008192 (46%)] Data (t): 0.183 Batch (t): 6.475, 2532.44/s, 79.1388/s/gpu LR: 0.000721 Logit Scale: 79.881 Contrastive_loss: 1.4215 (1.6485) Loss: 1.4215 (1.6485)
+ 2025-05-07,07:44:34 | WARNING | Handling webdataset error (OSError('image file is truncated (7 bytes not processed)')). Ignoring.
+ 2025-05-07,07:55:34 | INFO | Train Epoch: 1 [ 60833792/128008192 (48%)] Data (t): 0.182 Batch (t): 6.483, 2535.53/s, 79.2355/s/gpu LR: 0.000715 Logit Scale: 80.159 Contrastive_loss: 1.5078 (1.6438) Loss: 1.5078 (1.6438)
+ 2025-05-07,08:09:24 | INFO | Train Epoch: 1 [ 62930944/128008192 (49%)] Data (t): 0.183 Batch (t): 6.483, 2534.71/s, 79.2097/s/gpu LR: 0.000709 Logit Scale: 80.353 Contrastive_loss: 1.3391 (1.6339) Loss: 1.3391 (1.6339)
+ 2025-05-07,08:22:08 | WARNING | Handling webdataset error (OSError('image file is truncated (28 bytes not processed)')). Ignoring.
+ 2025-05-07,08:23:13 | INFO | Train Epoch: 1 [ 65028096/128008192 (51%)] Data (t): 0.180 Batch (t): 6.478, 2524.50/s, 78.8905/s/gpu LR: 0.000703 Logit Scale: 80.585 Contrastive_loss: 1.3414 (1.6248) Loss: 1.3414 (1.6248)
+ 2025-05-07,08:29:57 | WARNING | Handling webdataset error (OSError('image file is truncated (12 bytes not processed)')). Ignoring.
+ 2025-05-07,08:37:02 | INFO | Train Epoch: 1 [ 67125248/128008192 (52%)] Data (t): 0.181 Batch (t): 6.482, 2522.27/s, 78.8211/s/gpu LR: 0.000697 Logit Scale: 80.792 Contrastive_loss: 1.4539 (1.6196) Loss: 1.4539 (1.6196)
+ 2025-05-07,08:50:51 | INFO | Train Epoch: 1 [ 69222400/128008192 (54%)] Data (t): 0.183 Batch (t): 6.475, 2534.06/s, 79.1893/s/gpu LR: 0.000691 Logit Scale: 81.002 Contrastive_loss: 1.0537 (1.6030) Loss: 1.0537 (1.6030)
+ 2025-05-07,09:04:40 | INFO | Train Epoch: 1 [ 71319552/128008192 (56%)] Data (t): 0.182 Batch (t): 6.474, 2530.42/s, 79.0756/s/gpu LR: 0.000685 Logit Scale: 81.295 Contrastive_loss: 1.5639 (1.6019) Loss: 1.5639 (1.6019)
+ 2025-05-07,09:18:40 | INFO | Train Epoch: 1 [ 73416704/128008192 (57%)] Data (t): 0.269 Batch (t): 6.566, 2534.00/s, 79.1876/s/gpu LR: 0.000679 Logit Scale: 81.498 Contrastive_loss: 1.0619 (1.5869) Loss: 1.0619 (1.5869)
+ 2025-05-07,09:32:30 | INFO | Train Epoch: 1 [ 75513856/128008192 (59%)] Data (t): 0.181 Batch (t): 6.480, 2534.11/s, 79.1910/s/gpu LR: 0.000673 Logit Scale: 81.690 Contrastive_loss: 0.85471 (1.5671) Loss: 0.85471 (1.5671)
+ 2025-05-07,09:46:20 | INFO | Train Epoch: 1 [ 77611008/128008192 (61%)] Data (t): 0.181 Batch (t): 6.484, 2531.07/s, 79.0959/s/gpu LR: 0.000667 Logit Scale: 81.856 Contrastive_loss: 1.3957 (1.5626) Loss: 1.3957 (1.5626)
+ 2025-05-07,10:00:09 | INFO | Train Epoch: 1 [ 79708160/128008192 (62%)] Data (t): 0.182 Batch (t): 6.481, 2538.76/s, 79.3361/s/gpu LR: 0.000661 Logit Scale: 82.135 Contrastive_loss: 1.5249 (1.5616) Loss: 1.5249 (1.5616)
+ 2025-05-07,10:14:00 | INFO | Train Epoch: 1 [ 81805312/128008192 (64%)] Data (t): 0.183 Batch (t): 6.488, 2532.19/s, 79.1311/s/gpu LR: 0.000654 Logit Scale: 82.245 Contrastive_loss: 1.3272 (1.5557) Loss: 1.3272 (1.5557)
+ 2025-05-07,10:27:50 | INFO | Train Epoch: 1 [ 83902464/128008192 (66%)] Data (t): 0.184 Batch (t): 6.487, 2518.84/s, 78.7138/s/gpu LR: 0.000648 Logit Scale: 82.452 Contrastive_loss: 1.4105 (1.5522) Loss: 1.4105 (1.5522)
+ 2025-05-07,10:41:41 | INFO | Train Epoch: 1 [ 85999616/128008192 (67%)] Data (t): 0.182 Batch (t): 6.488, 2533.04/s, 79.1574/s/gpu LR: 0.000642 Logit Scale: 82.538 Contrastive_loss: 0.64785 (1.5307) Loss: 0.64785 (1.5307)
+ 2025-05-07,10:55:30 | INFO | Train Epoch: 1 [ 88096768/128008192 (69%)] Data (t): 0.182 Batch (t): 6.483, 2530.81/s, 79.0877/s/gpu LR: 0.000636 Logit Scale: 82.735 Contrastive_loss: 1.3310 (1.5260) Loss: 1.3310 (1.5260)
+ 2025-05-07,11:09:19 | INFO | Train Epoch: 1 [ 90193920/128008192 (70%)] Data (t): 0.182 Batch (t): 6.473, 2536.34/s, 79.2606/s/gpu LR: 0.000629 Logit Scale: 82.960 Contrastive_loss: 1.2809 (1.5204) Loss: 1.2809 (1.5204)
+ 2025-05-07,11:23:08 | INFO | Train Epoch: 1 [ 92291072/128008192 (72%)] Data (t): 0.182 Batch (t): 6.477, 2523.08/s, 78.8461/s/gpu LR: 0.000623 Logit Scale: 83.194 Contrastive_loss: 0.92749 (1.5073) Loss: 0.92749 (1.5073)
+ 2025-05-07,11:27:34 | WARNING | Handling webdataset error (OSError('image file is truncated (99 bytes not processed)')). Ignoring.
+ 2025-05-07,11:31:30 | WARNING | Handling webdataset error (OSError('image file is truncated (45 bytes not processed)')). Ignoring.
+ 2025-05-07,11:36:50 | INFO | Train Epoch: 1 [ 94388224/128008192 (74%)] Data (t): 0.181 Batch (t): 6.420, 2543.32/s, 79.4788/s/gpu LR: 0.000617 Logit Scale: 83.278 Contrastive_loss: 1.4370 (1.5057) Loss: 1.4370 (1.5057)
+ 2025-05-07,11:50:30 | INFO | Train Epoch: 1 [ 96485376/128008192 (75%)] Data (t): 0.184 Batch (t): 6.412, 2569.22/s, 80.2882/s/gpu LR: 0.000610 Logit Scale: 83.489 Contrastive_loss: 0.91950 (1.4933) Loss: 0.91950 (1.4933)
+ 2025-05-07,11:52:09 | WARNING | Handling webdataset error (OSError('image file is truncated (33 bytes not processed)')). Ignoring.
+ 2025-05-07,12:04:11 | INFO | Train Epoch: 1 [ 98582528/128008192 (77%)] Data (t): 0.182 Batch (t): 6.407, 2555.23/s, 79.8509/s/gpu LR: 0.000604 Logit Scale: 83.693 Contrastive_loss: 1.3820 (1.4910) Loss: 1.3820 (1.4910)
+ 2025-05-07,12:17:54 | INFO | Train Epoch: 1 [100679680/128008192 (79%)] Data (t): 0.181 Batch (t): 6.435, 2540.63/s, 79.3948/s/gpu LR: 0.000597 Logit Scale: 83.824 Contrastive_loss: 1.3835 (1.4888) Loss: 1.3835 (1.4888)
+ 2025-05-07,12:27:34 | WARNING | Handling webdataset error (OSError('image file is truncated (108 bytes not processed)')). Ignoring.
+ 2025-05-07,12:31:35 | INFO | Train Epoch: 1 [102776832/128008192 (80%)] Data (t): 0.181 Batch (t): 6.413, 2552.80/s, 79.7750/s/gpu LR: 0.000591 Logit Scale: 83.995 Contrastive_loss: 1.2898 (1.4848) Loss: 1.2898 (1.4848)
+ 2025-05-07,12:35:15 | WARNING | Handling webdataset error (OSError('image file is truncated (85 bytes not processed)')). Ignoring.
+ 2025-05-07,12:42:23 | WARNING | Handling webdataset error (OSError('image file is truncated (25 bytes not processed)')). Ignoring.
+ 2025-05-07,12:45:16 | INFO | Train Epoch: 1 [104873984/128008192 (82%)] Data (t): 0.181 Batch (t): 6.413, 2548.30/s, 79.6343/s/gpu LR: 0.000585 Logit Scale: 84.129 Contrastive_loss: 1.1117 (1.4775) Loss: 1.1117 (1.4775)
+ 2025-05-07,12:58:58 | INFO | Train Epoch: 1 [106971136/128008192 (84%)] Data (t): 0.180 Batch (t): 6.419, 2564.46/s, 80.1394/s/gpu LR: 0.000578 Logit Scale: 84.374 Contrastive_loss: 1.2866 (1.4738) Loss: 1.2866 (1.4738)
+ 2025-05-07,13:09:21 | WARNING | Handling webdataset error (OSError('image file is truncated (67 bytes not processed)')). Ignoring.
+ 2025-05-07,13:12:38 | INFO | Train Epoch: 1 [109068288/128008192 (85%)] Data (t): 0.181 Batch (t): 6.412, 2553.81/s, 79.8065/s/gpu LR: 0.000572 Logit Scale: 84.498 Contrastive_loss: 0.76095 (1.4603) Loss: 0.76095 (1.4603)
+ 2025-05-07,13:26:20 | INFO | Train Epoch: 1 [111165440/128008192 (87%)] Data (t): 0.182 Batch (t): 6.419, 2556.04/s, 79.8762/s/gpu LR: 0.000565 Logit Scale: 84.719 Contrastive_loss: 1.2380 (1.4562) Loss: 1.2380 (1.4562)
+ 2025-05-07,13:40:03 | INFO | Train Epoch: 1 [113262592/128008192 (88%)] Data (t): 0.184 Batch (t): 6.431, 2546.50/s, 79.5782/s/gpu LR: 0.000559 Logit Scale: 84.863 Contrastive_loss: 1.2916 (1.4532) Loss: 1.2916 (1.4532)
+ 2025-05-07,13:53:46 | INFO | Train Epoch: 1 [115359744/128008192 (90%)] Data (t): 0.182 Batch (t): 6.430, 2563.44/s, 80.1074/s/gpu LR: 0.000552 Logit Scale: 85.037 Contrastive_loss: 1.2766 (1.4501) Loss: 1.2766 (1.4501)
+ 2025-05-07,14:07:29 | INFO | Train Epoch: 1 [117456896/128008192 (92%)] Data (t): 0.182 Batch (t): 6.426, 2557.08/s, 79.9087/s/gpu LR: 0.000546 Logit Scale: 85.231 Contrastive_loss: 1.2462 (1.4465) Loss: 1.2462 (1.4465)
+ 2025-05-07,14:20:00 | WARNING | Handling webdataset error (OSError('image file is truncated (38 bytes not processed)')). Ignoring.
+ 2025-05-07,14:21:10 | INFO | Train Epoch: 1 [119554048/128008192 (93%)] Data (t): 0.182 Batch (t): 6.417, 2570.26/s, 80.3205/s/gpu LR: 0.000539 Logit Scale: 85.440 Contrastive_loss: 1.2088 (1.4424) Loss: 1.2088 (1.4424)
+ 2025-05-07,14:34:50 | INFO | Train Epoch: 1 [121651200/128008192 (95%)] Data (t): 0.181 Batch (t): 6.409, 2563.05/s, 80.0954/s/gpu LR: 0.000533 Logit Scale: 85.531 Contrastive_loss: 1.1789 (1.4379) Loss: 1.1789 (1.4379)
+ 2025-05-07,14:39:41 | WARNING | Handling webdataset error (OSError('image file is truncated (89 bytes not processed)')). Ignoring.
+ 2025-05-07,14:48:31 | INFO | Train Epoch: 1 [123748352/128008192 (97%)] Data (t): 0.182 Batch (t): 6.414, 2548.80/s, 79.6501/s/gpu LR: 0.000526 Logit Scale: 85.685 Contrastive_loss: 1.2296 (1.4345) Loss: 1.2296 (1.4345)
+ 2025-05-07,15:02:13 | INFO | Train Epoch: 1 [125845504/128008192 (98%)] Data (t): 0.183 Batch (t): 6.417, 2542.40/s, 79.4501/s/gpu LR: 0.000520 Logit Scale: 85.837 Contrastive_loss: 1.3986 (1.4339) Loss: 1.3986 (1.4339)
+ 2025-05-07,15:14:31 | WARNING | Handling webdataset error (OSError('image file is truncated (46 bytes not processed)')). Ignoring.
+ 2025-05-07,15:15:53 | INFO | Train Epoch: 1 [127942656/128008192 (100%)] Data (t): 0.182 Batch (t): 6.409, 2547.72/s, 79.6164/s/gpu LR: 0.000513 Logit Scale: 86.119 Contrastive_loss: 1.1594 (1.4294) Loss: 1.1594 (1.4294)
+ 2025-05-07,15:16:19 | INFO | Train Epoch: 1 [128008192/128008192 (100%)] Data (t): 0.204 Batch (t): 6.416, 2556.17/s, 79.8804/s/gpu LR: 0.000513 Logit Scale: 86.120 Contrastive_loss: 0.89117 (1.4209) Loss: 0.89117 (1.4209)
+ 2025-05-07,15:16:40 | INFO | Start epoch 2
+ 2025-05-07,15:16:51 | INFO | Train Epoch: 2 [ 16384/128008192 (0%)] Data (t): 4.320 Batch (t): 10.460, 1566.34/s, 48.9483/s/gpu LR: 0.000513 Logit Scale: 86.123 Contrastive_loss: 1.1445 (1.1445) Loss: 1.1445 (1.1445)
+ 2025-05-07,15:30:30 | INFO | Train Epoch: 2 [ 2113536/128008192 (2%)] Data (t): 0.184 Batch (t): 6.399, 2569.55/s, 80.2985/s/gpu LR: 0.000506 Logit Scale: 86.364 Contrastive_loss: 1.1829 (1.1637) Loss: 1.1829 (1.1637)
+ 2025-05-07,15:44:10 | INFO | Train Epoch: 2 [ 4210688/128008192 (3%)] Data (t): 0.181 Batch (t): 6.412, 2564.20/s, 80.1312/s/gpu LR: 0.000500 Logit Scale: 86.462 Contrastive_loss: 0.85927 (1.0622) Loss: 0.85927 (1.0622)
+ 2025-05-07,15:57:53 | INFO | Train Epoch: 2 [ 6307840/128008192 (5%)] Data (t): 0.178 Batch (t): 6.424, 2548.11/s, 79.6285/s/gpu LR: 0.000493 Logit Scale: 86.749 Contrastive_loss: 1.0237 (1.0526) Loss: 1.0237 (1.0526)
+ 2025-05-07,16:04:58 | WARNING | Handling webdataset error (OSError('image file is truncated (3 bytes not processed)')). Ignoring.
+ 2025-05-07,16:11:35 | INFO | Train Epoch: 2 [ 8404992/128008192 (7%)] Data (t): 0.176 Batch (t): 6.421, 2559.01/s, 79.9690/s/gpu LR: 0.000487 Logit Scale: 86.847 Contrastive_loss: 1.1774 (1.0776) Loss: 1.1774 (1.0776)
+ 2025-05-07,16:25:16 | INFO | Train Epoch: 2 [ 10502144/128008192 (8%)] Data (t): 0.179 Batch (t): 6.417, 2394.63/s, 74.8321/s/gpu LR: 0.000480 Logit Scale: 86.885 Contrastive_loss: 1.0619 (1.0750) Loss: 1.0619 (1.0750)
+ 2025-05-07,16:39:00 | INFO | Train Epoch: 2 [ 12599296/128008192 (10%)] Data (t): 0.179 Batch (t): 6.439, 2551.29/s, 79.7277/s/gpu LR: 0.000474 Logit Scale: 87.037 Contrastive_loss: 1.1776 (1.0896) Loss: 1.1776 (1.0896)
+ 2025-05-07,16:52:43 | INFO | Train Epoch: 2 [ 14696448/128008192 (11%)] Data (t): 0.180 Batch (t): 6.431, 2554.01/s, 79.8128/s/gpu LR: 0.000467 Logit Scale: 87.248 Contrastive_loss: 1.0901 (1.0897) Loss: 1.0901 (1.0897)
+ 2025-05-07,17:06:27 | INFO | Train Epoch: 2 [ 16793600/128008192 (13%)] Data (t): 0.180 Batch (t): 6.433, 2540.42/s, 79.3882/s/gpu LR: 0.000461 Logit Scale: 87.434 Contrastive_loss: 1.1669 (1.0983) Loss: 1.1669 (1.0983)
+ 2025-05-07,17:15:47 | WARNING | Handling webdataset error (OSError('image file is truncated (7 bytes not processed)')). Ignoring.
+ 2025-05-07,17:20:09 | INFO | Train Epoch: 2 [ 18890752/128008192 (15%)] Data (t): 0.180 Batch (t): 6.424, 2554.21/s, 79.8190/s/gpu LR: 0.000454 Logit Scale: 87.644 Contrastive_loss: 0.99046 (1.0875) Loss: 0.99046 (1.0875)
+ 2025-05-07,17:33:51 | INFO | Train Epoch: 2 [ 20987904/128008192 (16%)] Data (t): 0.179 Batch (t): 6.426, 2547.25/s, 79.6015/s/gpu LR: 0.000447 Logit Scale: 87.716 Contrastive_loss: 1.1286 (1.0912) Loss: 1.1286 (1.0912)
+ 2025-05-07,17:47:33 | INFO | Train Epoch: 2 [ 23085056/128008192 (18%)] Data (t): 0.180 Batch (t): 6.417, 2563.48/s, 80.1087/s/gpu LR: 0.000441 Logit Scale: 87.920 Contrastive_loss: 0.77843 (1.0651) Loss: 0.77843 (1.0651)
+ 2025-05-07,17:51:19 | WARNING | Handling webdataset error (OSError('image file is truncated (82 bytes not processed)')). Ignoring.
+ 2025-05-07,18:01:15 | INFO | Train Epoch: 2 [ 25182208/128008192 (20%)] Data (t): 0.173 Batch (t): 6.425, 2535.20/s, 79.2249/s/gpu LR: 0.000435 Logit Scale: 88.026 Contrastive_loss: 1.0896 (1.0670) Loss: 1.0896 (1.0670)
+ 2025-05-07,18:14:58 | INFO | Train Epoch: 2 [ 27279360/128008192 (21%)] Data (t): 0.174 Batch (t): 6.431, 2546.14/s, 79.5669/s/gpu LR: 0.000428 Logit Scale: 88.247 Contrastive_loss: 0.87304 (1.0532) Loss: 0.87304 (1.0532)
+ 2025-05-07,18:28:41 | INFO | Train Epoch: 2 [ 29376512/128008192 (23%)] Data (t): 0.170 Batch (t): 6.431, 2554.25/s, 79.8203/s/gpu LR: 0.000422 Logit Scale: 88.481 Contrastive_loss: 0.95847 (1.0469) Loss: 0.95847 (1.0469)
+ 2025-05-07,18:42:22 | INFO | Train Epoch: 2 [ 31473664/128008192 (25%)] Data (t): 0.174 Batch (t): 6.415, 2564.62/s, 80.1445/s/gpu LR: 0.000415 Logit Scale: 88.569 Contrastive_loss: 1.0627 (1.0478) Loss: 1.0627 (1.0478)
+ 2025-05-07,18:56:02 | INFO | Train Epoch: 2 [ 33570816/128008192 (26%)] Data (t): 0.173 Batch (t): 6.406, 2560.11/s, 80.0036/s/gpu LR: 0.000409 Logit Scale: 88.669 Contrastive_loss: 0.88806 (1.0385) Loss: 0.88806 (1.0385)
+ 2025-05-07,19:09:42 | INFO | Train Epoch: 2 [ 35667968/128008192 (28%)] Data (t): 0.173 Batch (t): 6.404, 2558.59/s, 79.9559/s/gpu LR: 0.000402 Logit Scale: 88.850 Contrastive_loss: 0.88531 (1.0299) Loss: 0.88531 (1.0299)
+ 2025-05-07,19:23:22 | INFO | Train Epoch: 2 [ 37765120/128008192 (30%)] Data (t): 0.171 Batch (t): 6.403, 2572.92/s, 80.4037/s/gpu LR: 0.000396 Logit Scale: 88.989 Contrastive_loss: 1.0340 (1.0302) Loss: 1.0340 (1.0302)
+ 2025-05-07,19:37:02 | INFO | Train Epoch: 2 [ 39862272/128008192 (31%)] Data (t): 0.172 Batch (t): 6.405, 2561.79/s, 80.0559/s/gpu LR: 0.000389 Logit Scale: 89.232 Contrastive_loss: 1.0733 (1.0323) Loss: 1.0733 (1.0323)
+ 2025-05-07,19:42:35 | WARNING | Handling webdataset error (OSError('image file is truncated (76 bytes not processed)')). Ignoring.
+ 2025-05-07,19:43:26 | WARNING | Handling webdataset error (OSError('image file is truncated (230 bytes not processed)')). Ignoring.
+ 2025-05-07,19:50:40 | INFO | Train Epoch: 2 [ 41959424/128008192 (33%)] Data (t): 0.170 Batch (t): 6.398, 2562.19/s, 80.0683/s/gpu LR: 0.000383 Logit Scale: 89.472 Contrastive_loss: 1.1645 (1.0386) Loss: 1.1645 (1.0386)
+ 2025-05-07,19:53:09 | WARNING | Handling webdataset error (OSError('image file is truncated (12 bytes not processed)')). Ignoring.
+ 2025-05-07,20:04:31 | INFO | Train Epoch: 2 [ 44056576/128008192 (34%)] Data (t): 0.264 Batch (t): 6.491, 2569.86/s, 80.3080/s/gpu LR: 0.000377 Logit Scale: 89.628 Contrastive_loss: 1.0165 (1.0376) Loss: 1.0165 (1.0376)
+ 2025-05-07,20:18:10 | INFO | Train Epoch: 2 [ 46153728/128008192 (36%)] Data (t): 0.173 Batch (t): 6.394, 2555.61/s, 79.8629/s/gpu LR: 0.000370 Logit Scale: 89.820 Contrastive_loss: 0.84902 (1.0294) Loss: 0.84902 (1.0294)
+ 2025-05-07,20:31:49 | INFO | Train Epoch: 2 [ 48250880/128008192 (38%)] Data (t): 0.173 Batch (t): 6.399, 2559.00/s, 79.9687/s/gpu LR: 0.000364 Logit Scale: 89.938 Contrastive_loss: 1.0616 (1.0307) Loss: 1.0616 (1.0307)
+ 2025-05-07,20:45:32 | INFO | Train Epoch: 2 [ 50348032/128008192 (39%)] Data (t): 0.173 Batch (t): 6.429, 2565.61/s, 80.1755/s/gpu LR: 0.000358 Logit Scale: 90.064 Contrastive_loss: 0.88606 (1.0250) Loss: 0.88606 (1.0250)
+ 2025-05-07,20:59:12 | INFO | Train Epoch: 2 [ 52445184/128008192 (41%)] Data (t): 0.173 Batch (t): 6.412, 2555.34/s, 79.8542/s/gpu LR: 0.000352 Logit Scale: 90.199 Contrastive_loss: 1.0008 (1.0240) Loss: 1.0008 (1.0240)
+ 2025-05-07,21:12:52 | INFO | Train Epoch: 2 [ 54542336/128008192 (43%)] Data (t): 0.172 Batch (t): 6.403, 2561.16/s, 80.0364/s/gpu LR: 0.000345 Logit Scale: 90.430 Contrastive_loss: 1.0461 (1.0248) Loss: 1.0461 (1.0248)
+ 2025-05-07,21:26:32 | INFO | Train Epoch: 2 [ 56639488/128008192 (44%)] Data (t): 0.171 Batch (t): 6.404, 2541.70/s, 79.4280/s/gpu LR: 0.000339 Logit Scale: 90.644 Contrastive_loss: 0.76301 (1.0155) Loss: 0.76301 (1.0155)
+ 2025-05-07,21:40:11 | INFO | Train Epoch: 2 [ 58736640/128008192 (46%)] Data (t): 0.172 Batch (t): 6.403, 2548.49/s, 79.6403/s/gpu LR: 0.000333 Logit Scale: 90.753 Contrastive_loss: 0.99624 (1.0148) Loss: 0.99624 (1.0148)
+ 2025-05-07,21:53:51 | INFO | Train Epoch: 2 [ 60833792/128008192 (48%)] Data (t): 0.173 Batch (t): 6.403, 2550.96/s, 79.7174/s/gpu LR: 0.000327 Logit Scale: 90.927 Contrastive_loss: 0.86366 (1.0098) Loss: 0.86366 (1.0098)
+ 2025-05-07,22:07:35 | INFO | Train Epoch: 2 [ 62930944/128008192 (49%)] Data (t): 0.172 Batch (t): 6.441, 2558.30/s, 79.9469/s/gpu LR: 0.000321 Logit Scale: 91.105 Contrastive_loss: 0.97236 (1.0086) Loss: 0.97236 (1.0086)
+ 2025-05-07,22:14:46 | WARNING | Handling webdataset error (OSError('image file is truncated (8 bytes not processed)')). Ignoring.
+ 2025-05-07,22:21:16 | INFO | Train Epoch: 2 [ 65028096/128008192 (51%)] Data (t): 0.174 Batch (t): 6.409, 2569.18/s, 80.2870/s/gpu LR: 0.000315 Logit Scale: 91.266 Contrastive_loss: 0.90785 (1.0054) Loss: 0.90785 (1.0054)
+ 2025-05-07,22:26:17 | WARNING | Handling webdataset error (OSError('image file is truncated (54 bytes not processed)')). Ignoring.
+ 2025-05-07,22:34:55 | INFO | Train Epoch: 2 [ 67125248/128008192 (52%)] Data (t): 0.172 Batch (t): 6.405, 2556.77/s, 79.8991/s/gpu LR: 0.000309 Logit Scale: 91.447 Contrastive_loss: 0.94508 (1.0036) Loss: 0.94508 (1.0036)
+ 2025-05-07,22:48:35 | INFO | Train Epoch: 2 [ 69222400/128008192 (54%)] Data (t): 0.173 Batch (t): 6.405, 2567.35/s, 80.2296/s/gpu LR: 0.000303 Logit Scale: 91.660 Contrastive_loss: 0.78137 (0.99707) Loss: 0.78137 (0.99707)
+ 2025-05-07,23:02:15 | INFO | Train Epoch: 2 [ 71319552/128008192 (56%)] Data (t): 0.170 Batch (t): 6.401, 2558.63/s, 79.9573/s/gpu LR: 0.000297 Logit Scale: 91.806 Contrastive_loss: 0.71535 (0.98902) Loss: 0.71535 (0.98902)
+ 2025-05-07,23:15:54 | INFO | Train Epoch: 2 [ 73416704/128008192 (57%)] Data (t): 0.173 Batch (t): 6.404, 2556.37/s, 79.8866/s/gpu LR: 0.000291 Logit Scale: 91.954 Contrastive_loss: 1.0308 (0.99018) Loss: 1.0308 (0.99018)
+ 2025-05-07,23:23:24 | WARNING | Handling webdataset error (OSError('image file is truncated (34 bytes not processed)')). Ignoring.
+ 2025-05-07,23:29:34 | INFO | Train Epoch: 2 [ 75513856/128008192 (59%)] Data (t): 0.173 Batch (t): 6.402, 2549.49/s, 79.6715/s/gpu LR: 0.000285 Logit Scale: 92.161 Contrastive_loss: 0.85054 (0.98641) Loss: 0.85054 (0.98641)
+ 2025-05-07,23:40:51 | WARNING | Handling webdataset error (OSError('image file is truncated (37 bytes not processed)')). Ignoring.
+ 2025-05-07,23:43:17 | INFO | Train Epoch: 2 [ 77611008/128008192 (61%)] Data (t): 0.172 Batch (t): 6.429, 2547.88/s, 79.6214/s/gpu LR: 0.000279 Logit Scale: 92.305 Contrastive_loss: 0.96870 (0.98594) Loss: 0.96870 (0.98594)
+ 2025-05-07,23:47:48 | WARNING | Handling webdataset error (OSError('image file is truncated (80 bytes not processed)')). Ignoring.
+ 2025-05-07,23:57:00 | INFO | Train Epoch: 2 [ 79708160/128008192 (62%)] Data (t): 0.172 Batch (t): 6.431, 2537.14/s, 79.2856/s/gpu LR: 0.000273 Logit Scale: 92.491 Contrastive_loss: 0.85520 (0.98259) Loss: 0.85520 (0.98259)
+ 2025-05-08,00:10:43 | INFO | Train Epoch: 2 [ 81805312/128008192 (64%)] Data (t): 0.174 Batch (t): 6.427, 2542.75/s, 79.4610/s/gpu LR: 0.000267 Logit Scale: 92.644 Contrastive_loss: 0.95289 (0.98185) Loss: 0.95289 (0.98185)
+ 2025-05-08,00:16:23 | WARNING | Handling webdataset error (OSError('image file is truncated (60 bytes not processed)')). Ignoring.
+ 2025-05-08,00:24:23 | INFO | Train Epoch: 2 [ 83902464/128008192 (66%)] Data (t): 0.170 Batch (t): 6.409, 2548.86/s, 79.6518/s/gpu LR: 0.000261 Logit Scale: 92.852 Contrastive_loss: 0.83558 (0.97828) Loss: 0.83558 (0.97828)
+ 2025-05-08,00:38:04 | INFO | Train Epoch: 2 [ 85999616/128008192 (67%)] Data (t): 0.171 Batch (t): 6.411, 2571.28/s, 80.3525/s/gpu LR: 0.000256 Logit Scale: 93.131 Contrastive_loss: 0.82488 (0.97463) Loss: 0.82488 (0.97463)
+ 2025-05-08,00:51:43 | INFO | Train Epoch: 2 [ 88096768/128008192 (69%)] Data (t): 0.172 Batch (t): 6.405, 2547.23/s, 79.6010/s/gpu LR: 0.000250 Logit Scale: 93.269 Contrastive_loss: 0.89011 (0.97266) Loss: 0.89011 (0.97266)
+ 2025-05-08,01:05:25 | INFO | Train Epoch: 2 [ 90193920/128008192 (70%)] Data (t): 0.172 Batch (t): 6.422, 2554.11/s, 79.8159/s/gpu LR: 0.000244 Logit Scale: 93.457 Contrastive_loss: 0.76739 (0.96800) Loss: 0.76739 (0.96800)
+ 2025-05-08,01:09:56 | WARNING | Handling webdataset error (OSError('image file is truncated (16 bytes not processed)')). Ignoring.
+ 2025-05-08,01:19:07 | INFO | Train Epoch: 2 [ 92291072/128008192 (72%)] Data (t): 0.173 Batch (t): 6.416, 2561.75/s, 80.0548/s/gpu LR: 0.000239 Logit Scale: 93.565 Contrastive_loss: 0.84787 (0.96533) Loss: 0.84787 (0.96533)
+ 2025-05-08,01:32:47 | INFO | Train Epoch: 2 [ 94388224/128008192 (74%)] Data (t): 0.173 Batch (t): 6.407, 2551.01/s, 79.7191/s/gpu LR: 0.000233 Logit Scale: 93.780 Contrastive_loss: 0.82349 (0.96225) Loss: 0.82349 (0.96225)
+ 2025-05-08,01:46:27 | INFO | Train Epoch: 2 [ 96485376/128008192 (75%)] Data (t): 0.174 Batch (t): 6.407, 2551.21/s, 79.7254/s/gpu LR: 0.000228 Logit Scale: 94.005 Contrastive_loss: 0.81553 (0.95912) Loss: 0.81553 (0.95912)
+ 2025-05-08,02:00:07 | INFO | Train Epoch: 2 [ 98582528/128008192 (77%)] Data (t): 0.165 Batch (t): 6.410, 2534.29/s, 79.1966/s/gpu LR: 0.000222 Logit Scale: 94.136 Contrastive_loss: 0.70505 (0.95383) Loss: 0.70505 (0.95383)
+ 2025-05-08,02:13:48 | INFO | Train Epoch: 2 [100679680/128008192 (79%)] Data (t): 0.173 Batch (t): 6.414, 2561.89/s, 80.0590/s/gpu LR: 0.000217 Logit Scale: 94.359 Contrastive_loss: 0.80939 (0.95088) Loss: 0.80939 (0.95088)
+ 2025-05-08,02:27:28 | INFO | Train Epoch: 2 [102776832/128008192 (80%)] Data (t): 0.172 Batch (t): 6.406, 2564.66/s, 80.1457/s/gpu LR: 0.000211 Logit Scale: 94.558 Contrastive_loss: 0.61869 (0.94424) Loss: 0.61869 (0.94424)
+ 2025-05-08,02:41:08 | INFO | Train Epoch: 2 [104873984/128008192 (82%)] Data (t): 0.173 Batch (t): 6.404, 2561.26/s, 80.0394/s/gpu LR: 0.000206 Logit Scale: 94.704 Contrastive_loss: 0.74351 (0.94030) Loss: 0.74351 (0.94030)
+ 2025-05-08,02:54:47 | INFO | Train Epoch: 2 [106971136/128008192 (84%)] Data (t): 0.172 Batch (t): 6.400, 2554.32/s, 79.8223/s/gpu LR: 0.000201 Logit Scale: 94.890 Contrastive_loss: 0.74321 (0.93651) Loss: 0.74321 (0.93651)
+ 2025-05-08,03:08:27 | INFO | Train Epoch: 2 [109068288/128008192 (85%)] Data (t): 0.173 Batch (t): 6.406, 2559.51/s, 79.9847/s/gpu LR: 0.000196 Logit Scale: 95.058 Contrastive_loss: 0.82052 (0.93432) Loss: 0.82052 (0.93432)
+ 2025-05-08,03:17:40 | WARNING | Handling webdataset error (OSError('image file is truncated (80 bytes not processed)')). Ignoring.
+ 2025-05-08,03:22:07 | INFO | Train Epoch: 2 [111165440/128008192 (87%)] Data (t): 0.173 Batch (t): 6.407, 2565.56/s, 80.1736/s/gpu LR: 0.000190 Logit Scale: 95.242 Contrastive_loss: 0.76668 (0.93122) Loss: 0.76668 (0.93122)
+ 2025-05-08,03:29:44 | WARNING | Handling webdataset error (OSError('image file is truncated (19 bytes not processed)')). Ignoring.
+ 2025-05-08,03:35:48 | INFO | Train Epoch: 2 [113262592/128008192 (88%)] Data (t): 0.172 Batch (t): 6.411, 2563.46/s, 80.1081/s/gpu LR: 0.000185 Logit Scale: 95.439 Contrastive_loss: 0.71657 (0.92732) Loss: 0.71657 (0.92732)
+ 2025-05-08,03:49:29 | INFO | Train Epoch: 2 [115359744/128008192 (90%)] Data (t): 0.173 Batch (t): 6.413, 2541.24/s, 79.4137/s/gpu LR: 0.000180 Logit Scale: 95.639 Contrastive_loss: 0.81185 (0.92525) Loss: 0.81185 (0.92525)
+ 2025-05-08,03:56:02 | WARNING | Handling webdataset error (OSError('image file is truncated (8 bytes not processed)')). Ignoring.
+ 2025-05-08,04:03:09 | INFO | Train Epoch: 2 [117456896/128008192 (92%)] Data (t): 0.172 Batch (t): 6.408, 2557.61/s, 79.9254/s/gpu LR: 0.000175 Logit Scale: 95.830 Contrastive_loss: 0.66491 (0.92069) Loss: 0.66491 (0.92069)
+ 2025-05-08,04:16:49 | INFO | Train Epoch: 2 [119554048/128008192 (93%)] Data (t): 0.173 Batch (t): 6.403, 2550.30/s, 79.6968/s/gpu LR: 0.000170 Logit Scale: 95.990 Contrastive_loss: 0.78242 (0.91830) Loss: 0.78242 (0.91830)
+ 2025-05-08,04:30:28 | INFO | Train Epoch: 2 [121651200/128008192 (95%)] Data (t): 0.173 Batch (t): 6.404, 2563.63/s, 80.1134/s/gpu LR: 0.000165 Logit Scale: 96.171 Contrastive_loss: 0.70744 (0.91473) Loss: 0.70744 (0.91473)
+ 2025-05-08,04:39:22 | WARNING | Handling webdataset error (OSError('image file is truncated (96 bytes not processed)')). Ignoring.
+ 2025-05-08,04:44:08 | INFO | Train Epoch: 2 [123748352/128008192 (97%)] Data (t): 0.173 Batch (t): 6.404, 2561.06/s, 80.0333/s/gpu LR: 0.000161 Logit Scale: 96.307 Contrastive_loss: 0.76422 (0.91222) Loss: 0.76422 (0.91222)
+ 2025-05-08,04:57:48 | INFO | Train Epoch: 2 [125845504/128008192 (98%)] Data (t): 0.173 Batch (t): 6.408, 2562.86/s, 80.0894/s/gpu LR: 0.000156 Logit Scale: 96.539 Contrastive_loss: 0.74557 (0.90949) Loss: 0.74557 (0.90949)
+ 2025-05-08,05:01:26 | WARNING | Handling webdataset error (OSError('image file is truncated (29 bytes not processed)')). Ignoring.
+ 2025-05-08,05:11:16 | WARNING | Handling webdataset error (OSError('image file is truncated (5 bytes not processed)')). Ignoring.
+ 2025-05-08,05:11:28 | INFO | Train Epoch: 2 [127942656/128008192 (100%)] Data (t): 0.172 Batch (t): 6.405, 2559.06/s, 79.9706/s/gpu LR: 0.000151 Logit Scale: 96.695 Contrastive_loss: 0.86292 (0.90874) Loss: 0.86292 (0.90874)
+ 2025-05-08,05:11:54 | INFO | Train Epoch: 2 [128008192/128008192 (100%)] Data (t): 0.175 Batch (t): 6.397, 2564.84/s, 80.1513/s/gpu LR: 0.000151 Logit Scale: 96.693 Contrastive_loss: 0.72556 (0.90583) Loss: 0.72556 (0.90583)
+ 2025-05-08,05:12:17 | INFO | Start epoch 3
+ 2025-05-08,05:12:28 | INFO | Train Epoch: 3 [ 16384/128008192 (0%)] Data (t): 4.302 Batch (t): 10.441, 1569.21/s, 49.0377/s/gpu LR: 0.000151 Logit Scale: 96.692 Contrastive_loss: 0.70954 (0.70954) Loss: 0.70954 (0.70954)
+ 2025-05-08,05:26:09 | INFO | Train Epoch: 3 [ 2113536/128008192 (2%)] Data (t): 0.177 Batch (t): 6.418, 2557.82/s, 79.9318/s/gpu LR: 0.000146 Logit Scale: 97.026 Contrastive_loss: 0.67179 (0.69066) Loss: 0.67179 (0.69066)
+ 2025-05-08,05:39:49 | INFO | Train Epoch: 3 [ 4210688/128008192 (3%)] Data (t): 0.173 Batch (t): 6.404, 2567.53/s, 80.2354/s/gpu LR: 0.000142 Logit Scale: 97.272 Contrastive_loss: 0.72579 (0.70237) Loss: 0.72579 (0.70237)
+ 2025-05-08,05:53:34 | INFO | Train Epoch: 3 [ 6307840/128008192 (5%)] Data (t): 0.175 Batch (t): 6.441, 2536.32/s, 79.2601/s/gpu LR: 0.000137 Logit Scale: 97.423 Contrastive_loss: 0.84779 (0.73873) Loss: 0.84779 (0.73873)
+ 2025-05-08,06:05:26 | WARNING | Handling webdataset error (OSError('image file is truncated (26 bytes not processed)')). Ignoring.
+ 2025-05-08,06:07:21 | INFO | Train Epoch: 3 [ 8404992/128008192 (7%)] Data (t): 0.173 Batch (t): 6.461, 2533.29/s, 79.1652/s/gpu LR: 0.000133 Logit Scale: 97.624 Contrastive_loss: 0.68247 (0.72748) Loss: 0.68247 (0.72748)
+ 2025-05-08,06:21:08 | INFO | Train Epoch: 3 [ 10502144/128008192 (8%)] Data (t): 0.173 Batch (t): 6.462, 2554.15/s, 79.8173/s/gpu LR: 0.000128 Logit Scale: 97.719 Contrastive_loss: 0.60918 (0.70776) Loss: 0.60918 (0.70776)
+ 2025-05-08,06:34:50 | INFO | Train Epoch: 3 [ 12599296/128008192 (10%)] Data (t): 0.174 Batch (t): 6.419, 2528.00/s, 79.0001/s/gpu LR: 0.000124 Logit Scale: 97.851 Contrastive_loss: 0.72948 (0.71086) Loss: 0.72948 (0.71086)
+ 2025-05-08,06:44:12 | WARNING | Handling webdataset error (OSError('broken data stream when reading image file')). Ignoring.
+ 2025-05-08,06:48:34 | INFO | Train Epoch: 3 [ 14696448/128008192 (11%)] Data (t): 0.173 Batch (t): 6.439, 2565.06/s, 80.1580/s/gpu LR: 0.000120 Logit Scale: 98.132 Contrastive_loss: 0.79705 (0.72164) Loss: 0.79705 (0.72164)
+ 2025-05-08,07:02:27 | INFO | Train Epoch: 3 [ 16793600/128008192 (13%)] Data (t): 0.282 Batch (t): 6.510, 2552.46/s, 79.7644/s/gpu LR: 0.000116 Logit Scale: 98.269 Contrastive_loss: 0.61353 (0.70962) Loss: 0.61353 (0.70962)
+ 2025-05-08,07:16:07 | INFO | Train Epoch: 3 [ 18890752/128008192 (15%)] Data (t): 0.171 Batch (t): 6.402, 2560.36/s, 80.0114/s/gpu LR: 0.000111 Logit Scale: 98.489 Contrastive_loss: 0.63963 (0.70263) Loss: 0.63963 (0.70263)
+ 2025-05-08,07:29:46 | INFO | Train Epoch: 3 [ 20987904/128008192 (16%)] Data (t): 0.172 Batch (t): 6.401, 2561.65/s, 80.0515/s/gpu LR: 0.000107 Logit Scale: 98.650 Contrastive_loss: 0.66842 (0.69952) Loss: 0.66842 (0.69952)
+ 2025-05-08,07:43:25 | INFO | Train Epoch: 3 [ 23085056/128008192 (18%)] Data (t): 0.173 Batch (t): 6.399, 2564.17/s, 80.1304/s/gpu LR: 0.000103 Logit Scale: 98.832 Contrastive_loss: 0.61742 (0.69267) Loss: 0.61742 (0.69267)
+ 2025-05-08,07:57:05 | INFO | Train Epoch: 3 [ 25182208/128008192 (20%)] Data (t): 0.174 Batch (t): 6.403, 2561.49/s, 80.0466/s/gpu LR: 0.000099 Logit Scale: 98.973 Contrastive_loss: 0.74366 (0.69660) Loss: 0.74366 (0.69660)
+ 2025-05-08,08:10:45 | INFO | Train Epoch: 3 [ 27279360/128008192 (21%)] Data (t): 0.172 Batch (t): 6.407, 2551.22/s, 79.7257/s/gpu LR: 0.000095 Logit Scale: 99.101 Contrastive_loss: 0.60358 (0.68995) Loss: 0.60358 (0.68995)
+ 2025-05-08,08:22:04 | WARNING | Handling webdataset error (OSError('image file is truncated (12 bytes not processed)')). Ignoring.
+ 2025-05-08,08:24:24 | INFO | Train Epoch: 3 [ 29376512/128008192 (23%)] Data (t): 0.172 Batch (t): 6.404, 2562.45/s, 80.0766/s/gpu LR: 0.000092 Logit Scale: 99.277 Contrastive_loss: 0.63768 (0.68647) Loss: 0.63768 (0.68647)
+ 2025-05-08,08:30:06 | WARNING | Handling webdataset error (OSError('image file is truncated (9 bytes not processed)')). Ignoring.
+ 2025-05-08,08:34:53 | WARNING | Handling webdataset error (OSError('image file is truncated (17 bytes not processed)')). Ignoring.
+ 2025-05-08,08:38:04 | INFO | Train Epoch: 3 [ 31473664/128008192 (25%)] Data (t): 0.174 Batch (t): 6.405, 2556.10/s, 79.8782/s/gpu LR: 0.000088 Logit Scale: 99.476 Contrastive_loss: 0.66016 (0.68482) Loss: 0.66016 (0.68482)
+ 2025-05-08,08:38:49 | WARNING | Handling webdataset error (OSError('image file is truncated (4 bytes not processed)')). Ignoring.
+ 2025-05-08,08:51:44 | INFO | Train Epoch: 3 [ 33570816/128008192 (26%)] Data (t): 0.174 Batch (t): 6.404, 2563.29/s, 80.1027/s/gpu LR: 0.000084 Logit Scale: 99.606 Contrastive_loss: 0.58103 (0.67872) Loss: 0.58103 (0.67872)
+ 2025-05-08,08:57:31 | WARNING | Handling webdataset error (OSError('image file is truncated (66 bytes not processed)')). Ignoring.
+ 2025-05-08,09:05:24 | INFO | Train Epoch: 3 [ 35667968/128008192 (28%)] Data (t): 0.175 Batch (t): 6.406, 2568.87/s, 80.2772/s/gpu LR: 0.000081 Logit Scale: 99.792 Contrastive_loss: 0.63794 (0.67645) Loss: 0.63794 (0.67645)
+ 2025-05-08,09:19:04 | INFO | Train Epoch: 3 [ 37765120/128008192 (30%)] Data (t): 0.176 Batch (t): 6.405, 2570.44/s, 80.3262/s/gpu LR: 0.000077 Logit Scale: 99.981 Contrastive_loss: 0.85533 (0.68587) Loss: 0.85533 (0.68587)
+ 2025-05-08,09:32:44 | INFO | Train Epoch: 3 [ 39862272/128008192 (31%)] Data (t): 0.175 Batch (t): 6.405, 2563.71/s, 80.1158/s/gpu LR: 0.000074 Logit Scale: 99.989 Contrastive_loss: 0.87071 (0.69511) Loss: 0.87071 (0.69511)
+ 2025-05-08,09:46:21 | WARNING | Handling webdataset error (OSError('image file is truncated (54 bytes not processed)')). Ignoring.
+ 2025-05-08,09:46:26 | INFO | Train Epoch: 3 [ 41959424/128008192 (33%)] Data (t): 0.174 Batch (t): 6.423, 2560.02/s, 80.0006/s/gpu LR: 0.000070 Logit Scale: 100.000 Contrastive_loss: 0.58384 (0.68981) Loss: 0.58384 (0.68981)
+ 2025-05-08,10:00:07 | INFO | Train Epoch: 3 [ 44056576/128008192 (34%)] Data (t): 0.174 Batch (t): 6.416, 2556.75/s, 79.8985/s/gpu LR: 0.000067 Logit Scale: 100.000 Contrastive_loss: 0.45629 (0.67920) Loss: 0.45629 (0.67920)
+ 2025-05-08,10:10:11 | WARNING | Handling webdataset error (OSError('image file is truncated (48 bytes not processed)')). Ignoring.
+ 2025-05-08,10:13:48 | INFO | Train Epoch: 3 [ 46153728/128008192 (36%)] Data (t): 0.174 Batch (t): 6.413, 2559.38/s, 79.9807/s/gpu LR: 0.000064 Logit Scale: 100.000 Contrastive_loss: 0.68254 (0.67934) Loss: 0.68254 (0.67934)
+ 2025-05-08,10:27:28 | INFO | Train Epoch: 3 [ 48250880/128008192 (38%)] Data (t): 0.176 Batch (t): 6.411, 2541.52/s, 79.4226/s/gpu LR: 0.000061 Logit Scale: 100.000 Contrastive_loss: 0.89161 (0.68819) Loss: 0.89161 (0.68819)
+ 2025-05-08,10:39:16 | WARNING | Handling webdataset error (OSError('image file is truncated (152 bytes not processed)')). Ignoring.
+ 2025-05-08,10:41:09 | INFO | Train Epoch: 3 [ 50348032/128008192 (39%)] Data (t): 0.175 Batch (t): 6.412, 2562.48/s, 80.0775/s/gpu LR: 0.000058 Logit Scale: 100.000 Contrastive_loss: 0.58635 (0.68411) Loss: 0.58635 (0.68411)
+ 2025-05-08,10:54:48 | INFO | Train Epoch: 3 [ 52445184/128008192 (41%)] Data (t): 0.176 Batch (t): 6.400, 2560.58/s, 80.0182/s/gpu LR: 0.000055 Logit Scale: 100.000 Contrastive_loss: 0.57741 (0.68001) Loss: 0.57741 (0.68001)
+ 2025-05-08,11:08:28 | INFO | Train Epoch: 3 [ 54542336/128008192 (43%)] Data (t): 0.174 Batch (t): 6.404, 2555.35/s, 79.8548/s/gpu LR: 0.000052 Logit Scale: 99.996 Contrastive_loss: 0.56950 (0.67592) Loss: 0.56950 (0.67592)
+ 2025-05-08,11:22:08 | INFO | Train Epoch: 3 [ 56639488/128008192 (44%)] Data (t): 0.175 Batch (t): 6.407, 2561.16/s, 80.0361/s/gpu LR: 0.000049 Logit Scale: 100.000 Contrastive_loss: 0.57244 (0.67222) Loss: 0.57244 (0.67222)
+ 2025-05-08,11:33:30 | WARNING | Handling webdataset error (OSError('image file is truncated (2 bytes not processed)')). Ignoring.
+ 2025-05-08,11:35:49 | INFO | Train Epoch: 3 [ 58736640/128008192 (46%)] Data (t): 0.176 Batch (t): 6.414, 2553.17/s, 79.7865/s/gpu LR: 0.000046 Logit Scale: 100.000 Contrastive_loss: 0.51754 (0.66689) Loss: 0.51754 (0.66689)
+ 2025-05-08,11:39:36 | WARNING | Handling webdataset error (OSError('image file is truncated (31 bytes not processed)')). Ignoring.
+ 2025-05-08,11:49:30 | INFO | Train Epoch: 3 [ 60833792/128008192 (48%)] Data (t): 0.176 Batch (t): 6.410, 2553.14/s, 79.7857/s/gpu LR: 0.000043 Logit Scale: 100.000 Contrastive_loss: 0.57857 (0.66394) Loss: 0.57857 (0.66394)
+ 2025-05-08,12:03:09 | INFO | Train Epoch: 3 [ 62930944/128008192 (49%)] Data (t): 0.176 Batch (t): 6.405, 2560.93/s, 80.0290/s/gpu LR: 0.000041 Logit Scale: 100.000 Contrastive_loss: 0.64274 (0.66326) Loss: 0.64274 (0.66326)
+ 2025-05-08,12:16:50 | INFO | Train Epoch: 3 [ 65028096/128008192 (51%)] Data (t): 0.174 Batch (t): 6.409, 2556.90/s, 79.9032/s/gpu LR: 0.000038 Logit Scale: 100.000 Contrastive_loss: 0.53750 (0.65933) Loss: 0.53750 (0.65933)
+ 2025-05-08,12:18:28 | WARNING | Handling webdataset error (OSError('image file is truncated (92 bytes not processed)')). Ignoring.
+ 2025-05-08,12:30:30 | INFO | Train Epoch: 3 [ 67125248/128008192 (52%)] Data (t): 0.175 Batch (t): 6.405, 2568.28/s, 80.2586/s/gpu LR: 0.000036 Logit Scale: 100.000 Contrastive_loss: 0.50851 (0.65476) Loss: 0.50851 (0.65476)
+ 2025-05-08,12:44:06 | WARNING | Handling webdataset error (OSError('image file is truncated (123 bytes not processed)')). Ignoring.
+ 2025-05-08,12:44:11 | INFO | Train Epoch: 3 [ 69222400/128008192 (54%)] Data (t): 0.204 Batch (t): 6.416, 2547.21/s, 79.6002/s/gpu LR: 0.000033 Logit Scale: 99.998 Contrastive_loss: 0.51292 (0.65059) Loss: 0.51292 (0.65059)
+ 2025-05-08,12:57:54 | INFO | Train Epoch: 3 [ 71319552/128008192 (56%)] Data (t): 0.216 Batch (t): 6.431, 2563.71/s, 80.1161/s/gpu LR: 0.000031 Logit Scale: 100.000 Contrastive_loss: 0.52533 (0.64701) Loss: 0.52533 (0.64701)
+ 2025-05-08,13:11:35 | INFO | Train Epoch: 3 [ 73416704/128008192 (57%)] Data (t): 0.175 Batch (t): 6.410, 2554.31/s, 79.8223/s/gpu LR: 0.000029 Logit Scale: 100.000 Contrastive_loss: 0.49118 (0.64268) Loss: 0.49118 (0.64268)
+ 2025-05-08,13:25:14 | INFO | Train Epoch: 3 [ 75513856/128008192 (59%)] Data (t): 0.175 Batch (t): 6.404, 2554.03/s, 79.8135/s/gpu LR: 0.000027 Logit Scale: 100.000 Contrastive_loss: 0.53598 (0.63980) Loss: 0.53598 (0.63980)
+ 2025-05-08,13:29:00 | WARNING | Handling webdataset error (OSError('image file is truncated (27 bytes not processed)')). Ignoring.
+ 2025-05-08,13:38:56 | INFO | Train Epoch: 3 [ 77611008/128008192 (61%)] Data (t): 0.175 Batch (t): 6.421, 2536.65/s, 79.2705/s/gpu LR: 0.000025 Logit Scale: 100.000 Contrastive_loss: 0.51526 (0.63652) Loss: 0.51526 (0.63652)
+ 2025-05-08,13:52:39 | INFO | Train Epoch: 3 [ 79708160/128008192 (62%)] Data (t): 0.175 Batch (t): 6.429, 2555.95/s, 79.8736/s/gpu LR: 0.000023 Logit Scale: 100.000 Contrastive_loss: 0.45846 (0.63195) Loss: 0.45846 (0.63195)
+ 2025-05-08,14:06:22 | INFO | Train Epoch: 3 [ 81805312/128008192 (64%)] Data (t): 0.177 Batch (t): 6.427, 2553.74/s, 79.8044/s/gpu LR: 0.000021 Logit Scale: 100.000 Contrastive_loss: 0.47665 (0.62807) Loss: 0.47665 (0.62807)
+ 2025-05-08,14:07:35 | WARNING | Handling webdataset error (OSError('image file is truncated (2 bytes not processed)')). Ignoring.
+ 2025-05-08,14:15:31 | WARNING | Handling webdataset error (OSError('image file is truncated (7 bytes not processed)')). Ignoring.
+ 2025-05-08,14:20:07 | INFO | Train Epoch: 3 [ 83902464/128008192 (66%)] Data (t): 0.201 Batch (t): 6.446, 2544.34/s, 79.5107/s/gpu LR: 0.000019 Logit Scale: 100.000 Contrastive_loss: 0.48294 (0.62453) Loss: 0.48294 (0.62453)
+ 2025-05-08,14:26:52 | WARNING | Handling webdataset error (OSError('image file is truncated (92 bytes not processed)')). Ignoring.
+ 2025-05-08,14:33:48 | INFO | Train Epoch: 3 [ 85999616/128008192 (67%)] Data (t): 0.174 Batch (t): 6.416, 2556.94/s, 79.9043/s/gpu LR: 0.000017 Logit Scale: 100.000 Contrastive_loss: 0.78602 (0.62838) Loss: 0.78602 (0.62838)
+ 2025-05-08,14:47:29 | INFO | Train Epoch: 3 [ 88096768/128008192 (69%)] Data (t): 0.177 Batch (t): 6.410, 2562.73/s, 80.0854/s/gpu LR: 0.000015 Logit Scale: 100.000 Contrastive_loss: 0.50361 (0.62547) Loss: 0.50361 (0.62547)
+ 2025-05-08,15:01:09 | INFO | Train Epoch: 3 [ 90193920/128008192 (70%)] Data (t): 0.176 Batch (t): 6.409, 2560.79/s, 80.0247/s/gpu LR: 0.000014 Logit Scale: 100.000 Contrastive_loss: 0.57320 (0.62429) Loss: 0.57320 (0.62429)
+ 2025-05-08,15:14:48 | INFO | Train Epoch: 3 [ 92291072/128008192 (72%)] Data (t): 0.176 Batch (t): 6.401, 2534.46/s, 79.2018/s/gpu LR: 0.000012 Logit Scale: 100.000 Contrastive_loss: 0.48445 (0.62118) Loss: 0.48445 (0.62118)
+ 2025-05-08,15:28:30 | INFO | Train Epoch: 3 [ 94388224/128008192 (74%)] Data (t): 0.174 Batch (t): 6.422, 2557.16/s, 79.9111/s/gpu LR: 0.000011 Logit Scale: 100.000 Contrastive_loss: 0.76965 (0.62441) Loss: 0.76965 (0.62441)
+ 2025-05-08,15:42:10 | INFO | Train Epoch: 3 [ 96485376/128008192 (75%)] Data (t): 0.176 Batch (t): 6.406, 2564.11/s, 80.1285/s/gpu LR: 0.000010 Logit Scale: 100.000 Contrastive_loss: 0.41394 (0.61993) Loss: 0.41394 (0.61993)
+ 2025-05-08,15:55:49 | INFO | Train Epoch: 3 [ 98582528/128008192 (77%)] Data (t): 0.176 Batch (t): 6.397, 2564.90/s, 80.1532/s/gpu LR: 0.000008 Logit Scale: 100.000 Contrastive_loss: 0.52012 (0.61785) Loss: 0.52012 (0.61785)
+ 2025-05-08,16:09:27 | INFO | Train Epoch: 3 [100679680/128008192 (79%)] Data (t): 0.175 Batch (t): 6.393, 2558.55/s, 79.9548/s/gpu LR: 0.000007 Logit Scale: 99.999 Contrastive_loss: 0.54107 (0.61628) Loss: 0.54107 (0.61628)
+ 2025-05-08,16:23:09 | INFO | Train Epoch: 3 [102776832/128008192 (80%)] Data (t): 0.190 Batch (t): 6.423, 2522.64/s, 78.8325/s/gpu LR: 0.000006 Logit Scale: 100.000 Contrastive_loss: 0.51131 (0.61418) Loss: 0.51131 (0.61418)
+ 2025-05-08,16:36:52 | INFO | Train Epoch: 3 [104873984/128008192 (82%)] Data (t): 0.174 Batch (t): 6.423, 2550.72/s, 79.7099/s/gpu LR: 0.000005 Logit Scale: 100.000 Contrastive_loss: 0.61509 (0.61420) Loss: 0.61509 (0.61420)
+ 2025-05-08,16:50:31 | INFO | Train Epoch: 3 [106971136/128008192 (84%)] Data (t): 0.175 Batch (t): 6.400, 2546.04/s, 79.5638/s/gpu LR: 0.000004 Logit Scale: 100.000 Contrastive_loss: 0.41313 (0.61033) Loss: 0.41313 (0.61033)
+ 2025-05-08,17:04:13 | INFO | Train Epoch: 3 [109068288/128008192 (85%)] Data (t): 0.177 Batch (t): 6.427, 2555.86/s, 79.8708/s/gpu LR: 0.000003 Logit Scale: 100.000 Contrastive_loss: 0.46009 (0.60750) Loss: 0.46009 (0.60750)
+ 2025-05-08,17:17:19 | WARNING | Handling webdataset error (OSError('image file is truncated (58 bytes not processed)')). Ignoring.
+ 2025-05-08,17:17:56 | INFO | Train Epoch: 3 [111165440/128008192 (87%)] Data (t): 0.175 Batch (t): 6.424, 2553.93/s, 79.8104/s/gpu LR: 0.000003 Logit Scale: 99.999 Contrastive_loss: 0.51906 (0.60586) Loss: 0.51906 (0.60586)
+ 2025-05-08,17:31:40 | INFO | Train Epoch: 3 [113262592/128008192 (88%)] Data (t): 0.174 Batch (t): 6.439, 2551.32/s, 79.7288/s/gpu LR: 0.000002 Logit Scale: 100.000 Contrastive_loss: 0.41001 (0.60230) Loss: 0.41001 (0.60230)
+ 2025-05-08,17:45:25 | INFO | Train Epoch: 3 [115359744/128008192 (90%)] Data (t): 0.174 Batch (t): 6.448, 2534.77/s, 79.2114/s/gpu LR: 0.000002 Logit Scale: 100.000 Contrastive_loss: 0.72973 (0.60458) Loss: 0.72973 (0.60458)
+ 2025-05-08,17:48:08 | WARNING | Handling webdataset error (OSError('image file is truncated (6 bytes not processed)')). Ignoring.
+ 2025-05-08,17:59:30 | INFO | Train Epoch: 3 [117456896/128008192 (92%)] Data (t): 0.299 Batch (t): 6.598, 2533.39/s, 79.1683/s/gpu LR: 0.000001 Logit Scale: 100.000 Contrastive_loss: 0.67828 (0.60587) Loss: 0.67828 (0.60587)
+ 2025-05-08,18:13:17 | INFO | Train Epoch: 3 [119554048/128008192 (93%)] Data (t): 0.175 Batch (t): 6.463, 2539.80/s, 79.3687/s/gpu LR: 0.000001 Logit Scale: 99.999 Contrastive_loss: 0.48863 (0.60385) Loss: 0.48863 (0.60385)
+ 2025-05-08,18:27:04 | INFO | Train Epoch: 3 [121651200/128008192 (95%)] Data (t): 0.175 Batch (t): 6.463, 2547.83/s, 79.6197/s/gpu LR: 0.000000 Logit Scale: 99.999 Contrastive_loss: 0.58212 (0.60348) Loss: 0.58212 (0.60348)
+ 2025-05-08,18:38:09 | WARNING | Handling webdataset error (OSError('image file is truncated (31 bytes not processed)')). Ignoring.
+ 2025-05-08,18:40:47 | INFO | Train Epoch: 3 [123748352/128008192 (97%)] Data (t): 0.177 Batch (t): 6.431, 2546.77/s, 79.5867/s/gpu LR: 0.000000 Logit Scale: 99.999 Contrastive_loss: 0.36669 (0.59953) Loss: 0.36669 (0.59953)
+ 2025-05-08,18:54:29 | INFO | Train Epoch: 3 [125845504/128008192 (98%)] Data (t): 0.176 Batch (t): 6.415, 2564.60/s, 80.1438/s/gpu LR: 0.000000 Logit Scale: 99.999 Contrastive_loss: 0.65729 (0.60048) Loss: 0.65729 (0.60048)
+ 2025-05-08,19:08:09 | INFO | Train Epoch: 3 [127942656/128008192 (100%)] Data (t): 0.182 Batch (t): 6.409, 2559.26/s, 79.9769/s/gpu LR: 0.000000 Logit Scale: 99.999 Contrastive_loss: 0.66025 (0.60144) Loss: 0.66025 (0.60144)
+ 2025-05-08,19:08:34 | INFO | Train Epoch: 3 [128008192/128008192 (100%)] Data (t): 0.178 Batch (t): 6.398, 2556.97/s, 79.9053/s/gpu LR: 0.000000 Logit Scale: 99.999 Contrastive_loss: 0.54178 (0.60050) Loss: 0.54178 (0.60050)
+ 2025-05-08,19:08:50 | INFO | Starting zero-shot imagenet.
+ 2025-05-08,19:08:50 | INFO | Building zero-shot classifier
+ 2025-05-08,19:09:19 | INFO | Using classifier
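Every `Train Epoch` line above uses the same fixed layout, so the loss curve can be recovered from this log with a short regex. The following is a minimal Python sketch. The pattern is inferred only from the lines in this file, and `TRAIN_RE`, `TrainRecord`, and `parse_train_line` are hypothetical names, not part of any training library. The `WARNING` lines about truncated webdataset images, like any other non-training line, simply return `None`.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the "Train Epoch" lines in this out.log only; a log
# with extra loss terms or a different layout would need a different regex.
TRAIN_RE = re.compile(
    r"Train Epoch: (?P<epoch>\d+) \[\s*(?P<seen>\d+)/(?P<total>\d+) \((?P<pct>\d+)%\)\]"
    r".*?LR: (?P<lr>[\d.]+)"
    r" Logit Scale: (?P<scale>[\d.]+)"
    r" Contrastive_loss: (?P<loss>[\d.]+) \((?P<loss_avg>[\d.]+)\)"
)


class TrainRecord(NamedTuple):
    epoch: int
    samples_seen: int     # samples processed so far in this epoch
    samples_total: int    # samples per epoch
    lr: float
    logit_scale: float
    loss: float           # contrastive loss of the logged batch
    loss_avg: float       # running mean printed in parentheses


def parse_train_line(line: str) -> Optional[TrainRecord]:
    """Return a TrainRecord for a 'Train Epoch' line, or None for any other line."""
    m = TRAIN_RE.search(line)
    if m is None:
        return None
    return TrainRecord(
        epoch=int(m["epoch"]),
        samples_seen=int(m["seen"]),
        samples_total=int(m["total"]),
        lr=float(m["lr"]),
        logit_scale=float(m["scale"]),
        loss=float(m["loss"]),
        loss_avg=float(m["loss_avg"]),
    )


if __name__ == "__main__":
    sample = (
        "2025-05-08,19:08:34 | INFO | Train Epoch: 3 [128008192/128008192 (100%)] "
        "Data (t): 0.178 Batch (t): 6.398, 2556.97/s, 79.9053/s/gpu LR: 0.000000 "
        "Logit Scale: 99.999 Contrastive_loss: 0.54178 (0.60050) Loss: 0.54178 (0.60050)"
    )
    print(parse_train_line(sample))
```

Run over every line of `out.log` and keep only the non-`None` results. For the part of epoch 3 shown here, the running-average contrastive loss falls from 0.70776 to 0.60050.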
clip_vit_l16_s512m_bs16k_mix0_8/params.txt ADDED
@@ -0,0 +1,109 @@
+ NDR_patch_size: 16
+ accum_freq: 1
+ aug_cfg: {}
+ batch_size: 512
+ beta1: 0.9
+ beta2: 0.98
+ checkpoint_path: ./logs-lr1e-3-datacomp/clip_vit_l16_s512m_bs16k_mix0_8/checkpoints
+ coca_caption_loss_weight: 2.0
+ coca_contrastive_loss_weight: 1.0
+ copy_codebase: False
+ csv_caption_key: title
+ csv_img_key: filepath
+ csv_separator:
+ dataset_resampled: False
+ dataset_type: webdataset
+ ddp_static_graph: True
+ debug: False
+ delete_prev_step_ckpt: True
+ delete_previous_checkpoint: False
+ device: cuda:0
+ dist_backend: nccl
+ dist_url: env://
+ distill: False
+ distill_model: None
+ distill_pretrained: None
+ distributed: True
+ epochs: 4
+ epochs_cooldown: None
+ eps: 1e-06
+ force_custom_text: False
+ force_image_size: 224
+ force_patch_dropout: None
+ force_quick_gelu: False
+ gather_with_grad: True
+ global_batch_size: 16384
+ grad_checkpointing: True
+ grad_clip_norm: None
+ horovod: False
+ image_interpolation: None
+ image_mean: None
+ image_resize_mode: None
+ image_std: None
+ imagenet_v2: None
+ imagenet_val: /mnt/bn/zilongdata-hl/dataset/imagenet/val
+ is_cls_token: True
+ local_loss: True
+ local_rank: 0
+ lock_image: False
+ lock_image_freeze_bn_stats: False
+ lock_image_unlocked_groups: 0
+ lock_text: False
+ lock_text_freeze_layer_norm: False
+ lock_text_unlocked_layers: 0
+ log_every_n_steps: 128
+ log_level: 20
+ log_local: False
+ log_path: ./logs-lr1e-3-datacomp/clip_vit_l16_s512m_bs16k_mix0_8/out.log
+ logs: ./logs-lr1e-3-datacomp
+ lr: 0.001
+ lr_cooldown_end: 0.0
+ lr_cooldown_power: 1.0
+ lr_scheduler: cosine
+ max_seq_len: 15000
+ model: ViT-L-16
+ name: clip_vit_l16_s512m_bs16k_mix0_8
+ native_dynamic_resolution: False
+ no_set_device_rank: False
+ only_packing: False
+ precision: amp
+ pretrained:
+ pretrained_image:
+ pretrained_text:
+ rank: 0
+ remote_sync: None
+ remote_sync_frequency: 300
+ remote_sync_protocol: s3
+ report_to: wandb
+ resume: None
+ rope_attn_num_heads: 12
+ rope_model_width: 768
+ save_every_n_steps: 6104
+ save_frequency: 1
+ save_most_recent: False
+ seed: 0
+ siglip: False
+ skip_scheduler: False
+ tensorboard: False
+ tensorboard_path:
+ torchcompile: False
+ torchscript: False
+ trace: False
+ train_data: /mnt/bn/zilongdata-hl/dataset/Recap-DataComp-1B-Dataset/{000000..140146}.tar
+ train_data_upsampling_factors: None
+ train_num_samples: 128000000
+ use_bn_sync: False
+ use_bnb_linear: None
+ val_data: None
+ val_frequency: 1
+ val_num_samples: None
+ val_steps: 0
+ wandb: True
+ wandb_notes:
+ wandb_project_name: cls-clip-NDR
+ warmup: 500
+ wd: 0.2
+ workers: 1
+ world_size: 32
+ zeroshot_frequency: 4
+ zeroshot_steps: 0
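Several numbers in `out.log` follow directly from these parameters. The keys appear to come from an open_clip-derived training script, with extra keys such as `NDR_patch_size` and `rope_model_width`, but that attribution is an inference. The global batch is `batch_size * world_size * accum_freq = 512 * 32 * 1 = 16384`, which matches `global_batch_size`. If batches per epoch are rounded up, `128000000 / 16384 = 7812.5` becomes 7813 steps, which explains the logged epoch size of 128,008,192 rather than 128,000,000. `log_every_n_steps: 128` accounts for the 2,097,152-sample gap between consecutive `Train Epoch` lines. A linear warmup over 500 steps followed by cosine decay to 0 over `epochs * 7813` steps reproduces the logged `LR` values. The Python sketch below recomputes these quantities. The ceil rounding, the warmup/cosine formula, and the mapping from sample counter to scheduler step are all assumptions checked only against the lines above. `cosine_lr` and `lr_at_log_line` are hypothetical helpers.

```python
import math

# Values copied from params.txt above.
BATCH_SIZE = 512            # per-GPU batch
WORLD_SIZE = 32
ACCUM_FREQ = 1
TRAIN_NUM_SAMPLES = 128_000_000
EPOCHS = 4
BASE_LR = 1e-3
WARMUP = 500
LOG_EVERY = 128

GLOBAL_BATCH = BATCH_SIZE * WORLD_SIZE * ACCUM_FREQ
# Assumption: batches per epoch are rounded UP, as the logged epoch size suggests.
STEPS_PER_EPOCH = math.ceil(TRAIN_NUM_SAMPLES / GLOBAL_BATCH)
SAMPLES_PER_EPOCH = STEPS_PER_EPOCH * GLOBAL_BATCH
TOTAL_STEPS = EPOCHS * STEPS_PER_EPOCH


def cosine_lr(step: int) -> float:
    """Assumed schedule: linear warmup, then cosine decay from BASE_LR to 0."""
    if step < WARMUP:
        return BASE_LR * (step + 1) / WARMUP
    progress = (step - WARMUP) / (TOTAL_STEPS - WARMUP)
    return 0.5 * (1 + math.cos(math.pi * progress)) * BASE_LR


def lr_at_log_line(epoch: int, samples_seen: int) -> float:
    """LR for a 'Train Epoch' line, assuming the scheduler step is
    epoch * STEPS_PER_EPOCH + (0-based batch index within the epoch)."""
    batch_count = samples_seen // GLOBAL_BATCH      # batches done, 1-based
    step = epoch * STEPS_PER_EPOCH + batch_count - 1
    return cosine_lr(step)


if __name__ == "__main__":
    print(GLOBAL_BATCH, STEPS_PER_EPOCH, SAMPLES_PER_EPOCH)
    print(LOG_EVERY * GLOBAL_BATCH)                  # samples between log lines
    print(f"{lr_at_log_line(3, 10_502_144):.6f}")    # log shows LR: 0.000128
```

Run as a script, it prints `16384 7813 128008192`, then `2097152`, then `0.000128`. The same function gives `0.000074` at 39,862,272 samples and `0.000014` at 90,193,920 samples, which matches the corresponding log lines.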