Commit 2bc7fbf (verified) by speedinghzl · 1 Parent(s): b270c94

Upload folder using huggingface_hub

clip_vit_l16_s128m_bs16k/checkpoints/results.jsonl ADDED
@@ -0,0 +1,2 @@
+ {"imagenet-zeroshot-val-top1": 0.4657, "imagenet-zeroshot-val-top5": 0.74802}
+ {"imagenet-zeroshot-val-top1": 0.48818, "imagenet-zeroshot-val-top5": 0.76934}
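The two records in `results.jsonl` correspond to the two zero-shot evaluations in `out.log` (at step 6104 and at the end of the epoch). As a minimal sketch, the file can be read as JSON Lines and the change between the two evaluations computed; the helper names `parse_results` and `metric_gain` are illustrative and not part of the repository.

```python
import json

# The two lines added to results.jsonl in this commit, copied verbatim.
RESULTS_JSONL = """\
{"imagenet-zeroshot-val-top1": 0.4657, "imagenet-zeroshot-val-top5": 0.74802}
{"imagenet-zeroshot-val-top1": 0.48818, "imagenet-zeroshot-val-top5": 0.76934}
"""


def parse_results(text):
    """Parse JSON Lines text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def metric_gain(records, key):
    """Difference in `key` between the last and the first record."""
    return records[-1][key] - records[0][key]


records = parse_results(RESULTS_JSONL)
for index, record in enumerate(records):
    print(index, record)
print("top-1 gain:", round(metric_gain(records, "imagenet-zeroshot-val-top1"), 5))
print("top-5 gain:", round(metric_gain(records, "imagenet-zeroshot-val-top5"), 5))
```

To use it on the real file, replace `RESULTS_JSONL` with the contents of `clip_vit_l16_s128m_bs16k/checkpoints/results.jsonl`.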
clip_vit_l16_s128m_bs16k/checkpoints/step_6104.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dd3295304a7cd1658c2026ec172f1a4760a22719c3f0340a7cf12a166655a020
+ size 5133440026
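The checkpoint is stored as a Git LFS pointer, so the diff shows only three `key value` lines rather than the weights themselves. A small sketch of reading such a pointer and converting the size to GiB follows; `parse_lfs_pointer` is a hypothetical helper written for illustration, not a Git LFS or `huggingface_hub` API.

```python
# Pointer text copied verbatim from the diff for step_6104.pt.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:dd3295304a7cd1658c2026ec172f1a4760a22719c3f0340a7cf12a166655a020
size 5133440026
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form '<hash algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER)
print(pointer["hash_algo"], pointer["digest"][:12] + "...")
print(f"{pointer['size'] / 2**30:.2f} GiB")
```

The digest can be compared against a SHA-256 hash of the downloaded file to confirm the checkpoint was fetched intact.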
clip_vit_l16_s128m_bs16k/out.log ADDED
@@ -0,0 +1,279 @@
+ 2024-04-12,21:02:46 | INFO | No latest resume checkpoint found in ./logs/vit_l16_s128m_bs16k/checkpoints.
+ 2024-04-12,21:03:05 | INFO | Running in distributed mode with multiple processes. Device: cuda:0.Process (global: 0, local 0), total 16.
+ 2024-04-12,21:03:05 | INFO | Loaded ViT-L-16 model config.
+ 2024-04-12,21:03:09 | INFO | Model:
+ 2024-04-12,21:03:09 | INFO | CLIP(
+   (visual): VisionTransformer(
+     (patchnorm_pre_ln): Identity()
+     (conv1): Conv2d(3, 1024, kernel_size=(16, 16), stride=(16, 16), bias=False)
+     (patch_dropout): Identity()
+     (ln_pre): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
+     (transformer): Transformer(
+       (resblocks): ModuleList(
+         (0-23): 24 x ResidualAttentionBlock(
+           (ln_1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
+           (attn): MultiheadAttention(
+             (out_proj): NonDynamicallyQuantizableLinear(in_features=1024, out_features=1024, bias=True)
+           )
+           (ls_1): Identity()
+           (ln_2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
+           (mlp): Sequential(
+             (c_fc): Linear(in_features=1024, out_features=4096, bias=True)
+             (gelu): GELU(approximate='none')
+             (c_proj): Linear(in_features=4096, out_features=1024, bias=True)
+           )
+           (ls_2): Identity()
+         )
+       )
+     )
+     (ln_post): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
+   )
+   (transformer): Transformer(
+     (resblocks): ModuleList(
+       (0-11): 12 x ResidualAttentionBlock(
+         (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+         (attn): MultiheadAttention(
+           (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
+         )
+         (ls_1): Identity()
+         (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+         (mlp): Sequential(
+           (c_fc): Linear(in_features=768, out_features=3072, bias=True)
+           (gelu): GELU(approximate='none')
+           (c_proj): Linear(in_features=3072, out_features=768, bias=True)
+         )
+         (ls_2): Identity()
+       )
+     )
+   )
+   (token_embedding): Embedding(49408, 768)
+   (ln_final): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+ )
+ 2024-04-12,21:03:09 | INFO | Params:
+ 2024-04-12,21:03:09 | INFO | accum_freq: 1
+ 2024-04-12,21:03:09 | INFO | aug_cfg: {}
+ 2024-04-12,21:03:09 | INFO | batch_size: 1024
+ 2024-04-12,21:03:09 | INFO | beta1: 0.9
+ 2024-04-12,21:03:09 | INFO | beta2: 0.98
+ 2024-04-12,21:03:09 | INFO | bin_balanced_sampling_expand: None
+ 2024-04-12,21:03:09 | INFO | bin_balanced_sampling_nbins: None
+ 2024-04-12,21:03:09 | INFO | checkpoint_path: ./logs/vit_l16_s128m_bs16k/checkpoints
+ 2024-04-12,21:03:09 | INFO | coca_caption_loss_weight: 2.0
+ 2024-04-12,21:03:09 | INFO | coca_contrastive_loss_weight: 1.0
+ 2024-04-12,21:03:09 | INFO | copy_codebase: False
+ 2024-04-12,21:03:09 | INFO | csv_caption_key: title
+ 2024-04-12,21:03:09 | INFO | csv_img_key: filepath
+ 2024-04-12,21:03:09 | INFO | csv_separator:
+ 2024-04-12,21:03:09 | INFO | dataset_resampled: False
+ 2024-04-12,21:03:09 | INFO | dataset_type: webdataset
+ 2024-04-12,21:03:09 | INFO | ddp_static_graph: True
+ 2024-04-12,21:03:09 | INFO | debug: False
+ 2024-04-12,21:03:09 | INFO | delete_prev_step_ckpt: True
+ 2024-04-12,21:03:09 | INFO | delete_previous_checkpoint: False
+ 2024-04-12,21:03:09 | INFO | device: cuda:0
+ 2024-04-12,21:03:09 | INFO | dist_backend: nccl
+ 2024-04-12,21:03:09 | INFO | dist_url: env://
+ 2024-04-12,21:03:09 | INFO | distill: False
+ 2024-04-12,21:03:09 | INFO | distill_model: None
+ 2024-04-12,21:03:09 | INFO | distill_pretrained: None
+ 2024-04-12,21:03:09 | INFO | distributed: True
+ 2024-04-12,21:03:09 | INFO | epochs: 1
+ 2024-04-12,21:03:09 | INFO | epochs_cooldown: None
+ 2024-04-12,21:03:09 | INFO | eps: 1e-06
+ 2024-04-12,21:03:09 | INFO | flash_attn: False
+ 2024-04-12,21:03:09 | INFO | force_custom_text: False
+ 2024-04-12,21:03:09 | INFO | force_image_size: 224
+ 2024-04-12,21:03:09 | INFO | force_patch_dropout: None
+ 2024-04-12,21:03:09 | INFO | force_quick_gelu: False
+ 2024-04-12,21:03:09 | INFO | gather_with_grad: True
+ 2024-04-12,21:03:09 | INFO | global_batch_size: 16384
+ 2024-04-12,21:03:09 | INFO | grad_checkpointing: True
+ 2024-04-12,21:03:09 | INFO | grad_clip_norm: None
+ 2024-04-12,21:03:09 | INFO | horovod: False
+ 2024-04-12,21:03:09 | INFO | image_mean: None
+ 2024-04-12,21:03:09 | INFO | image_std: None
+ 2024-04-12,21:03:09 | INFO | imagenet_v2: None
+ 2024-04-12,21:03:09 | INFO | imagenet_val: /mnt/bn/zilongdata-us/dataset/ILSVRC/Data/CLS-LOC/val
+ 2024-04-12,21:03:09 | INFO | local_loss: True
+ 2024-04-12,21:03:09 | INFO | local_rank: 0
+ 2024-04-12,21:03:09 | INFO | lock_image: False
+ 2024-04-12,21:03:09 | INFO | lock_image_freeze_bn_stats: False
+ 2024-04-12,21:03:09 | INFO | lock_image_unlocked_groups: 0
+ 2024-04-12,21:03:09 | INFO | lock_text: False
+ 2024-04-12,21:03:09 | INFO | lock_text_freeze_layer_norm: False
+ 2024-04-12,21:03:09 | INFO | lock_text_unlocked_layers: 0
+ 2024-04-12,21:03:09 | INFO | log_every_n_steps: 128
+ 2024-04-12,21:03:09 | INFO | log_level: 20
+ 2024-04-12,21:03:09 | INFO | log_local: False
+ 2024-04-12,21:03:09 | INFO | log_path: ./logs/vit_l16_s128m_bs16k/out.log
+ 2024-04-12,21:03:09 | INFO | logs: ./logs
+ 2024-04-12,21:03:09 | INFO | lr: 0.0005
+ 2024-04-12,21:03:09 | INFO | lr_cooldown_end: 0.0
+ 2024-04-12,21:03:09 | INFO | lr_cooldown_power: 1.0
+ 2024-04-12,21:03:09 | INFO | lr_multiplier_text: None
+ 2024-04-12,21:03:09 | INFO | lr_scheduler: cosine
+ 2024-04-12,21:03:09 | INFO | model: ViT-L-16
+ 2024-04-12,21:03:09 | INFO | name: vit_l16_s128m_bs16k
+ 2024-04-12,21:03:09 | INFO | no_set_device_rank: False
+ 2024-04-12,21:03:09 | INFO | precision: amp_bfloat16
+ 2024-04-12,21:03:09 | INFO | pretrained:
+ 2024-04-12,21:03:09 | INFO | pretrained_image: False
+ 2024-04-12,21:03:09 | INFO | pretrained_optim_scaler: False
+ 2024-04-12,21:03:09 | INFO | pretrained_text:
+ 2024-04-12,21:03:09 | INFO | rank: 0
+ 2024-04-12,21:03:09 | INFO | remote_sync: None
+ 2024-04-12,21:03:09 | INFO | remote_sync_frequency: 300
+ 2024-04-12,21:03:09 | INFO | remote_sync_protocol: s3
+ 2024-04-12,21:03:09 | INFO | report_to: tensorboard
+ 2024-04-12,21:03:09 | INFO | resume: None
+ 2024-04-12,21:03:09 | INFO | save_every_n_steps: 6104
+ 2024-04-12,21:03:09 | INFO | save_frequency: 1
+ 2024-04-12,21:03:09 | INFO | save_most_recent: False
+ 2024-04-12,21:03:09 | INFO | seed: 0
+ 2024-04-12,21:03:09 | INFO | skip_scheduler: False
+ 2024-04-12,21:03:09 | INFO | tensorboard: True
+ 2024-04-12,21:03:09 | INFO | tensorboard_path: ./logs/vit_l16_s128m_bs16k/tensorboard
+ 2024-04-12,21:03:09 | INFO | torchcompile: False
+ 2024-04-12,21:03:09 | INFO | torchscript: False
+ 2024-04-12,21:03:09 | INFO | trace: False
+ 2024-04-12,21:03:09 | INFO | train_data: /mnt/bn/zilongdata-us/dataset/datacomp-1b-webdataset/{000000..140146}.tar
+ 2024-04-12,21:03:09 | INFO | train_data_upsampling_factors: None
+ 2024-04-12,21:03:09 | INFO | train_num_samples: 128000000
+ 2024-04-12,21:03:09 | INFO | unlock_text_proj: False
+ 2024-04-12,21:03:09 | INFO | unset_text_grad_checkpointing: False
+ 2024-04-12,21:03:09 | INFO | use_bn_sync: False
+ 2024-04-12,21:03:09 | INFO | use_bnb_linear: None
+ 2024-04-12,21:03:09 | INFO | val_data: None
+ 2024-04-12,21:03:09 | INFO | val_frequency: 1
+ 2024-04-12,21:03:09 | INFO | val_num_samples: None
+ 2024-04-12,21:03:09 | INFO | val_steps: 6104
+ 2024-04-12,21:03:09 | INFO | wandb: False
+ 2024-04-12,21:03:09 | INFO | wandb_notes:
+ 2024-04-12,21:03:09 | INFO | wandb_project_name: open-clip
+ 2024-04-12,21:03:09 | INFO | warmup: 500
+ 2024-04-12,21:03:09 | INFO | wd: 0.2
+ 2024-04-12,21:03:09 | INFO | workers: 6
+ 2024-04-12,21:03:09 | INFO | world_size: 16
+ 2024-04-12,21:03:09 | INFO | zeroshot_frequency: 2
+ 2024-04-12,21:03:09 | INFO | zeroshot_steps: 6104
+ 2024-04-12,21:03:29 | INFO | Start epoch 0
+ 2024-04-12,21:03:53 | INFO | Train Epoch: 0 [ 16384/128090112 (0%)] Data (t): 11.822 Batch (t): 24.653, 664.586/s, 41.5366/s/gpu LR: 0.000001 Logit Scale: 14.286 Contrastive_loss: 9.7578 (9.7578) Loss: 9.7578 (9.7578)
+ 2024-04-12,21:04:40 | WARNING | Handling webdataset error (OSError('image file is truncated (44 bytes not processed)')). Ignoring.
+ 2024-04-12,21:16:05 | INFO | Train Epoch: 0 [ 2113536/128090112 (2%)] Data (t): 0.411 Batch (t): 5.719, 2821.46/s, 176.341/s/gpu LR: 0.000129 Logit Scale: 14.255 Contrastive_loss: 9.0504 (9.4041) Loss: 9.0504 (9.4041)
+ 2024-04-12,21:23:35 | WARNING | Handling webdataset error (OSError('image file is truncated (68 bytes not processed)')). Ignoring.
+ 2024-04-12,21:28:17 | INFO | Train Epoch: 0 [ 4210688/128090112 (3%)] Data (t): 0.419 Batch (t): 5.718, 2869.95/s, 179.372/s/gpu LR: 0.000257 Logit Scale: 14.227 Contrastive_loss: 8.5726 (9.1269) Loss: 8.5726 (9.1269)
+ 2024-04-12,21:39:16 | WARNING | Handling webdataset error (OSError('image file is truncated (25 bytes not processed)')). Ignoring.
+ 2024-04-12,21:40:28 | INFO | Train Epoch: 0 [ 6307840/128090112 (5%)] Data (t): 0.422 Batch (t): 5.712, 2883.32/s, 180.207/s/gpu LR: 0.000385 Logit Scale: 14.235 Contrastive_loss: 8.2402 (8.9052) Loss: 8.2402 (8.9052)
+ 2024-04-12,21:52:39 | INFO | Train Epoch: 0 [ 8404992/128090112 (7%)] Data (t): 0.419 Batch (t): 5.707, 2866.15/s, 179.135/s/gpu LR: 0.000500 Logit Scale: 14.418 Contrastive_loss: 7.9331 (8.7108) Loss: 7.9331 (8.7108)
+ 2024-04-12,22:04:51 | INFO | Train Epoch: 0 [ 10502144/128090112 (8%)] Data (t): 0.422 Batch (t): 5.720, 2839.29/s, 177.455/s/gpu LR: 0.000500 Logit Scale: 14.977 Contrastive_loss: 7.2912 (8.4742) Loss: 7.2912 (8.4742)
+ 2024-04-12,22:17:02 | INFO | Train Epoch: 0 [ 12599296/128090112 (10%)] Data (t): 0.420 Batch (t): 5.709, 2897.45/s, 181.090/s/gpu LR: 0.000498 Logit Scale: 15.924 Contrastive_loss: 7.0521 (8.2710) Loss: 7.0521 (8.2710)
+ 2024-04-12,22:18:00 | WARNING | Handling webdataset error (OSError('image file is truncated (40 bytes not processed)')). Ignoring.
+ 2024-04-12,22:25:05 | WARNING | Handling webdataset error (OSError('image file is truncated (50 bytes not processed)')). Ignoring.
+ 2024-04-12,22:27:03 | WARNING | Handling webdataset error (OSError('image file is truncated (21 bytes not processed)')). Ignoring.
+ 2024-04-12,22:27:28 | WARNING | Handling webdataset error (OSError('image file is truncated (104 bytes not processed)')). Ignoring.
+ 2024-04-12,22:29:13 | INFO | Train Epoch: 0 [ 14696448/128090112 (11%)] Data (t): 0.423 Batch (t): 5.713, 2849.12/s, 178.070/s/gpu LR: 0.000496 Logit Scale: 17.014 Contrastive_loss: 6.3296 (8.0284) Loss: 6.3296 (8.0284)
+ 2024-04-12,22:40:17 | WARNING | Handling webdataset error (OSError('image file is truncated (147 bytes not processed)')). Ignoring.
+ 2024-04-12,22:41:24 | INFO | Train Epoch: 0 [ 16793600/128090112 (13%)] Data (t): 0.423 Batch (t): 5.713, 2818.41/s, 176.151/s/gpu LR: 0.000494 Logit Scale: 18.174 Contrastive_loss: 5.8330 (7.7844) Loss: 5.8330 (7.7844)
+ 2024-04-12,22:51:11 | WARNING | Handling webdataset error (OSError('image file is truncated (32 bytes not processed)')). Ignoring.
+ 2024-04-12,22:53:37 | INFO | Train Epoch: 0 [ 18890752/128090112 (15%)] Data (t): 0.424 Batch (t): 5.723, 2905.41/s, 181.588/s/gpu LR: 0.000490 Logit Scale: 19.370 Contrastive_loss: 5.5715 (7.5631) Loss: 5.5715 (7.5631)
+ 2024-04-12,23:05:21 | WARNING | Handling webdataset error (OSError('image file is truncated (23 bytes not processed)')). Ignoring.
+ 2024-04-12,23:05:48 | INFO | Train Epoch: 0 [ 20987904/128090112 (16%)] Data (t): 0.421 Batch (t): 5.714, 2851.18/s, 178.199/s/gpu LR: 0.000486 Logit Scale: 20.619 Contrastive_loss: 5.3531 (7.3622) Loss: 5.3531 (7.3622)
+ 2024-04-12,23:18:00 | INFO | Train Epoch: 0 [ 23085056/128090112 (18%)] Data (t): 0.422 Batch (t): 5.719, 2841.64/s, 177.602/s/gpu LR: 0.000481 Logit Scale: 21.907 Contrastive_loss: 4.8808 (7.1554) Loss: 4.8808 (7.1554)
+ 2024-04-12,23:23:00 | WARNING | Handling webdataset error (OSError('image file is truncated (49 bytes not processed)')). Ignoring.
+ 2024-04-12,23:30:12 | INFO | Train Epoch: 0 [ 25182208/128090112 (20%)] Data (t): 0.423 Batch (t): 5.720, 2886.18/s, 180.386/s/gpu LR: 0.000476 Logit Scale: 23.236 Contrastive_loss: 4.7109 (6.9674) Loss: 4.7109 (6.9674)
+ 2024-04-12,23:42:26 | INFO | Train Epoch: 0 [ 27279360/128090112 (21%)] Data (t): 0.423 Batch (t): 5.731, 2867.17/s, 179.198/s/gpu LR: 0.000469 Logit Scale: 24.569 Contrastive_loss: 4.5048 (6.7915) Loss: 4.5048 (6.7915)
+ 2024-04-12,23:54:39 | INFO | Train Epoch: 0 [ 29376512/128090112 (23%)] Data (t): 0.422 Batch (t): 5.729, 2816.47/s, 176.029/s/gpu LR: 0.000463 Logit Scale: 26.015 Contrastive_loss: 4.2151 (6.6197) Loss: 4.2151 (6.6197)
+ 2024-04-13,00:06:52 | INFO | Train Epoch: 0 [ 31473664/128090112 (25%)] Data (t): 0.426 Batch (t): 5.726, 2875.11/s, 179.695/s/gpu LR: 0.000455 Logit Scale: 27.464 Contrastive_loss: 3.9691 (6.4541) Loss: 3.9691 (6.4541)
+ 2024-04-13,00:14:21 | WARNING | Handling webdataset error (OSError('image file is truncated (43 bytes not processed)')). Ignoring.
+ 2024-04-13,00:16:03 | WARNING | Handling webdataset error (OSError('image file is truncated (9 bytes not processed)')). Ignoring.
+ 2024-04-13,00:19:07 | INFO | Train Epoch: 0 [ 33570816/128090112 (26%)] Data (t): 0.425 Batch (t): 5.739, 2868.86/s, 179.304/s/gpu LR: 0.000447 Logit Scale: 28.948 Contrastive_loss: 3.9410 (6.3062) Loss: 3.9410 (6.3062)
+ 2024-04-13,00:31:22 | INFO | Train Epoch: 0 [ 35667968/128090112 (28%)] Data (t): 0.421 Batch (t): 5.743, 2834.51/s, 177.157/s/gpu LR: 0.000438 Logit Scale: 30.432 Contrastive_loss: 3.8489 (6.1697) Loss: 3.8489 (6.1697)
+ 2024-04-13,00:31:35 | WARNING | Handling webdataset error (OSError('image file is truncated (59 bytes not processed)')). Ignoring.
+ 2024-04-13,00:35:55 | WARNING | Handling webdataset error (OSError('image file is truncated (153 bytes not processed)')). Ignoring.
+ 2024-04-13,00:43:37 | INFO | Train Epoch: 0 [ 37765120/128090112 (29%)] Data (t): 0.422 Batch (t): 5.744, 2878.43/s, 179.902/s/gpu LR: 0.000429 Logit Scale: 31.931 Contrastive_loss: 3.6258 (6.0358) Loss: 3.6258 (6.0358)
+ 2024-04-13,00:54:53 | WARNING | Handling webdataset error (OSError('image file is truncated (1 bytes not processed)')). Ignoring.
+ 2024-04-13,00:55:54 | INFO | Train Epoch: 0 [ 39862272/128090112 (31%)] Data (t): 0.404 Batch (t): 5.753, 2829.96/s, 176.873/s/gpu LR: 0.000419 Logit Scale: 33.416 Contrastive_loss: 3.5229 (5.9102) Loss: 3.5229 (5.9102)
+ 2024-04-13,00:59:13 | WARNING | Handling webdataset error (OSError('image file is truncated (199 bytes not processed)')). Ignoring.
+ 2024-04-13,01:08:09 | INFO | Train Epoch: 0 [ 41959424/128090112 (33%)] Data (t): 0.399 Batch (t): 5.742, 2796.80/s, 174.800/s/gpu LR: 0.000408 Logit Scale: 34.880 Contrastive_loss: 3.4352 (5.7923) Loss: 3.4352 (5.7923)
+ 2024-04-13,01:20:23 | INFO | Train Epoch: 0 [ 44056576/128090112 (34%)] Data (t): 0.391 Batch (t): 5.738, 2883.89/s, 180.243/s/gpu LR: 0.000398 Logit Scale: 36.271 Contrastive_loss: 3.2947 (5.6788) Loss: 3.2947 (5.6788)
+ 2024-04-13,01:23:57 | WARNING | Handling webdataset error (OSError('image file is truncated (0 bytes not processed)')). Ignoring.
+ 2024-04-13,01:24:23 | WARNING | Handling webdataset error (OSError('image file is truncated (11 bytes not processed)')). Ignoring.
+ 2024-04-13,01:25:56 | WARNING | Handling webdataset error (OSError('image file is truncated (87 bytes not processed)')). Ignoring.
+ 2024-04-13,01:32:38 | INFO | Train Epoch: 0 [ 46153728/128090112 (36%)] Data (t): 0.380 Batch (t): 5.739, 2863.42/s, 178.964/s/gpu LR: 0.000386 Logit Scale: 37.602 Contrastive_loss: 3.0228 (5.5633) Loss: 3.0228 (5.5633)
+ 2024-04-13,01:44:53 | INFO | Train Epoch: 0 [ 48250880/128090112 (38%)] Data (t): 0.378 Batch (t): 5.746, 2833.90/s, 177.119/s/gpu LR: 0.000375 Logit Scale: 38.753 Contrastive_loss: 3.1084 (5.4610) Loss: 3.1084 (5.4610)
+ 2024-04-13,01:48:03 | WARNING | Handling webdataset error (OSError('image file is truncated (18 bytes not processed)')). Ignoring.
+ 2024-04-13,01:48:15 | WARNING | Handling webdataset error (OSError('image file is truncated (59 bytes not processed)')). Ignoring.
+ 2024-04-13,01:57:09 | INFO | Train Epoch: 0 [ 50348032/128090112 (39%)] Data (t): 0.384 Batch (t): 5.750, 2858.92/s, 178.682/s/gpu LR: 0.000362 Logit Scale: 39.920 Contrastive_loss: 3.0141 (5.3632) Loss: 3.0141 (5.3632)
+ 2024-04-13,02:04:03 | WARNING | Handling webdataset error (OSError('image file is truncated (84 bytes not processed)')). Ignoring.
+ 2024-04-13,02:09:25 | INFO | Train Epoch: 0 [ 52445184/128090112 (41%)] Data (t): 0.388 Batch (t): 5.746, 2822.08/s, 176.380/s/gpu LR: 0.000350 Logit Scale: 40.943 Contrastive_loss: 2.8858 (5.2679) Loss: 2.8858 (5.2679)
+ 2024-04-13,02:21:40 | INFO | Train Epoch: 0 [ 54542336/128090112 (43%)] Data (t): 0.387 Batch (t): 5.747, 2867.27/s, 179.204/s/gpu LR: 0.000337 Logit Scale: 41.906 Contrastive_loss: 2.7359 (5.1741) Loss: 2.7359 (5.1741)
+ 2024-04-13,02:33:54 | INFO | Train Epoch: 0 [ 56639488/128090112 (44%)] Data (t): 0.389 Batch (t): 5.730, 2893.65/s, 180.853/s/gpu LR: 0.000324 Logit Scale: 42.867 Contrastive_loss: 2.6730 (5.0848) Loss: 2.6730 (5.0848)
+ 2024-04-13,02:38:31 | WARNING | Handling webdataset error (OSError('image file is truncated (131 bytes not processed)')). Ignoring.
+ 2024-04-13,02:39:16 | WARNING | Handling webdataset error (OSError('image file is truncated (49 bytes not processed)')). Ignoring.
+ 2024-04-13,02:46:09 | INFO | Train Epoch: 0 [ 58736640/128090112 (46%)] Data (t): 0.393 Batch (t): 5.742, 2828.86/s, 176.804/s/gpu LR: 0.000311 Logit Scale: 43.679 Contrastive_loss: 2.5947 (4.9989) Loss: 2.5947 (4.9989)
+ 2024-04-13,02:56:25 | WARNING | Handling webdataset error (OSError('image file is truncated (40 bytes not processed)')). Ignoring.
+ 2024-04-13,02:58:22 | INFO | Train Epoch: 0 [ 60833792/128090112 (47%)] Data (t): 0.417 Batch (t): 5.728, 2834.82/s, 177.176/s/gpu LR: 0.000298 Logit Scale: 44.469 Contrastive_loss: 2.5503 (4.9173) Loss: 2.5503 (4.9173)
+ 2024-04-13,03:05:38 | WARNING | Handling webdataset error (OSError('image file is truncated (25 bytes not processed)')). Ignoring.
+ 2024-04-13,03:10:36 | INFO | Train Epoch: 0 [ 62930944/128090112 (49%)] Data (t): 0.416 Batch (t): 5.738, 2903.23/s, 181.452/s/gpu LR: 0.000284 Logit Scale: 45.178 Contrastive_loss: 2.5830 (4.8420) Loss: 2.5830 (4.8420)
+ 2024-04-13,03:22:51 | INFO | Train Epoch: 0 [ 65028096/128090112 (51%)] Data (t): 0.416 Batch (t): 5.738, 2835.35/s, 177.209/s/gpu LR: 0.000270 Logit Scale: 45.873 Contrastive_loss: 2.4395 (4.7669) Loss: 2.4395 (4.7669)
+ 2024-04-13,03:23:32 | WARNING | Handling webdataset error (OSError('image file is truncated (41 bytes not processed)')). Ignoring.
+ 2024-04-13,03:35:05 | INFO | Train Epoch: 0 [ 67125248/128090112 (52%)] Data (t): 0.415 Batch (t): 5.736, 2870.09/s, 179.380/s/gpu LR: 0.000257 Logit Scale: 46.548 Contrastive_loss: 2.5179 (4.6988) Loss: 2.5179 (4.6988)
+ 2024-04-13,03:47:18 | INFO | Train Epoch: 0 [ 69222400/128090112 (54%)] Data (t): 0.415 Batch (t): 5.725, 2873.48/s, 179.592/s/gpu LR: 0.000243 Logit Scale: 47.196 Contrastive_loss: 2.4449 (4.6325) Loss: 2.4449 (4.6325)
+ 2024-04-13,03:59:33 | INFO | Train Epoch: 0 [ 71319552/128090112 (56%)] Data (t): 0.416 Batch (t): 5.741, 2789.61/s, 174.351/s/gpu LR: 0.000229 Logit Scale: 47.824 Contrastive_loss: 2.2695 (4.5650) Loss: 2.2695 (4.5650)
+ 2024-04-13,04:10:38 | WARNING | Handling webdataset error (OSError('image file is truncated (46 bytes not processed)')). Ignoring.
+ 2024-04-13,04:11:47 | INFO | Train Epoch: 0 [ 73416704/128090112 (57%)] Data (t): 0.416 Batch (t): 5.735, 2851.66/s, 178.229/s/gpu LR: 0.000216 Logit Scale: 48.458 Contrastive_loss: 2.1761 (4.4986) Loss: 2.1761 (4.4986)
+ 2024-04-13,04:24:01 | INFO | Train Epoch: 0 [ 75513856/128090112 (59%)] Data (t): 0.414 Batch (t): 5.734, 2883.65/s, 180.228/s/gpu LR: 0.000202 Logit Scale: 49.016 Contrastive_loss: 2.2363 (4.4375) Loss: 2.2363 (4.4375)
+ 2024-04-13,04:28:13 | WARNING | Handling webdataset error (OSError('image file is truncated (73 bytes not processed)')). Ignoring.
+ 2024-04-13,04:31:36 | WARNING | Handling webdataset error (OSError('image file is truncated (82 bytes not processed)')). Ignoring.
+ 2024-04-13,04:32:05 | WARNING | Handling webdataset error (OSError('image file is truncated (107 bytes not processed)')). Ignoring.
+ 2024-04-13,04:36:15 | INFO | Train Epoch: 0 [ 77611008/128090112 (61%)] Data (t): 0.413 Batch (t): 5.734, 2834.46/s, 177.154/s/gpu LR: 0.000189 Logit Scale: 49.594 Contrastive_loss: 2.2648 (4.3803) Loss: 2.2648 (4.3803)
+ 2024-04-13,04:47:39 | WARNING | Handling webdataset error (OSError('image file is truncated (120 bytes not processed)')). Ignoring.
+ 2024-04-13,04:48:28 | INFO | Train Epoch: 0 [ 79708160/128090112 (62%)] Data (t): 0.406 Batch (t): 5.732, 2813.06/s, 175.817/s/gpu LR: 0.000175 Logit Scale: 50.139 Contrastive_loss: 2.0931 (4.3216) Loss: 2.0931 (4.3216)
+ 2024-04-13,04:58:02 | WARNING | Handling webdataset error (OSError('image file is truncated (2 bytes not processed)')). Ignoring.
+ 2024-04-13,05:00:42 | INFO | Train Epoch: 0 [ 81805312/128090112 (64%)] Data (t): 0.406 Batch (t): 5.731, 2899.02/s, 181.189/s/gpu LR: 0.000162 Logit Scale: 50.647 Contrastive_loss: 2.1905 (4.2684) Loss: 2.1905 (4.2684)
+ 2024-04-13,05:12:57 | INFO | Train Epoch: 0 [ 83902464/128090112 (66%)] Data (t): 0.395 Batch (t): 5.744, 2830.14/s, 176.884/s/gpu LR: 0.000150 Logit Scale: 51.182 Contrastive_loss: 1.9640 (4.2121) Loss: 1.9640 (4.2121)
+ 2024-04-13,05:18:11 | WARNING | Handling webdataset error (OSError('image file is truncated (54 bytes not processed)')). Ignoring.
+ 2024-04-13,05:25:11 | INFO | Train Epoch: 0 [ 85999616/128090112 (67%)] Data (t): 0.396 Batch (t): 5.738, 2855.24/s, 178.453/s/gpu LR: 0.000137 Logit Scale: 51.646 Contrastive_loss: 2.0197 (4.1599) Loss: 2.0197 (4.1599)
+ 2024-04-13,05:37:28 | INFO | Train Epoch: 0 [ 88096768/128090112 (69%)] Data (t): 0.399 Batch (t): 5.751, 2837.14/s, 177.321/s/gpu LR: 0.000125 Logit Scale: 52.111 Contrastive_loss: 1.9012 (4.1074) Loss: 1.9012 (4.1074)
+ 2024-04-13,05:40:29 | WARNING | Handling webdataset error (OSError('image file is truncated (46 bytes not processed)')). Ignoring.
+ 2024-04-13,05:47:54 | WARNING | Handling webdataset error (OSError('image file is truncated (87 bytes not processed)')). Ignoring.
+ 2024-04-13,05:49:45 | INFO | Train Epoch: 0 [ 90193920/128090112 (70%)] Data (t): 0.386 Batch (t): 5.761, 2866.58/s, 179.161/s/gpu LR: 0.000114 Logit Scale: 52.591 Contrastive_loss: 1.9273 (4.0579) Loss: 1.9273 (4.0579)
+ 2024-04-13,05:58:30 | WARNING | Handling webdataset error (OSError('image file is truncated (31 bytes not processed)')). Ignoring.
+ 2024-04-13,06:02:00 | INFO | Train Epoch: 0 [ 92291072/128090112 (72%)] Data (t): 0.391 Batch (t): 5.741, 2845.74/s, 177.858/s/gpu LR: 0.000102 Logit Scale: 52.979 Contrastive_loss: 1.9484 (4.0110) Loss: 1.9484 (4.0110)
+ 2024-04-13,06:14:14 | INFO | Train Epoch: 0 [ 94388224/128090112 (74%)] Data (t): 0.390 Batch (t): 5.735, 2852.46/s, 178.279/s/gpu LR: 0.000091 Logit Scale: 53.354 Contrastive_loss: 1.7383 (3.9616) Loss: 1.7383 (3.9616)
+ 2024-04-13,06:26:29 | INFO | Train Epoch: 0 [ 96485376/128090112 (75%)] Data (t): 0.407 Batch (t): 5.744, 2818.63/s, 176.165/s/gpu LR: 0.000081 Logit Scale: 53.673 Contrastive_loss: 1.6647 (3.9127) Loss: 1.6647 (3.9127)
+ 2024-04-13,06:38:42 | INFO | Train Epoch: 0 [ 98582528/128090112 (77%)] Data (t): 0.418 Batch (t): 5.724, 2861.27/s, 178.830/s/gpu LR: 0.000071 Logit Scale: 53.997 Contrastive_loss: 1.8124 (3.8690) Loss: 1.8124 (3.8690)
+ 2024-04-13,06:47:01 | INFO | Starting zero-shot imagenet.
+ 2024-04-13,06:47:01 | INFO | Building zero-shot classifier
+ 2024-04-13,06:47:17 | INFO | Using classifier
+ 2024-04-13,06:50:43 | INFO | Finished zero-shot imagenet.
+ 2024-04-13,06:50:43 | INFO | Eval Epoch: 0.7806344333589154 imagenet-zeroshot-val-top1: 0.4657 imagenet-zeroshot-val-top5: 0.7480
+ 2024-04-13,06:51:48 | WARNING | Handling webdataset error (OSError('image file is truncated (64 bytes not processed)')). Ignoring.
+ 2024-04-13,06:54:44 | INFO | Train Epoch: 0 [100679680/128090112 (79%)] Data (t): 2.193 Batch (t): 7.517, 2914.18/s, 182.136/s/gpu LR: 0.000062 Logit Scale: 54.269 Contrastive_loss: 1.7838 (3.8264) Loss: 1.7838 (3.8264)
+ 2024-04-13,07:06:57 | INFO | Train Epoch: 0 [102776832/128090112 (80%)] Data (t): 0.400 Batch (t): 5.728, 2875.26/s, 179.704/s/gpu LR: 0.000053 Logit Scale: 54.535 Contrastive_loss: 1.6652 (3.7832) Loss: 1.6652 (3.7832)
+ 2024-04-13,07:15:03 | WARNING | Handling webdataset error (OSError('image file is truncated (23 bytes not processed)')). Ignoring.
+ 2024-04-13,07:19:10 | INFO | Train Epoch: 0 [104873984/128090112 (82%)] Data (t): 0.394 Batch (t): 5.722, 2870.83/s, 179.427/s/gpu LR: 0.000045 Logit Scale: 54.755 Contrastive_loss: 1.7902 (3.7441) Loss: 1.7902 (3.7441)
+ 2024-04-13,07:26:27 | WARNING | Handling webdataset error (OSError('image file is truncated (90 bytes not processed)')). Ignoring.
+ 2024-04-13,07:31:23 | INFO | Train Epoch: 0 [106971136/128090112 (84%)] Data (t): 0.400 Batch (t): 5.729, 2893.52/s, 180.845/s/gpu LR: 0.000037 Logit Scale: 54.946 Contrastive_loss: 1.6774 (3.7044) Loss: 1.6774 (3.7044)
+ 2024-04-13,07:43:37 | INFO | Train Epoch: 0 [109068288/128090112 (85%)] Data (t): 0.401 Batch (t): 5.730, 2809.98/s, 175.624/s/gpu LR: 0.000030 Logit Scale: 55.109 Contrastive_loss: 1.7930 (3.6683) Loss: 1.7930 (3.6683)
+ 2024-04-13,07:44:25 | WARNING | Handling webdataset error (OSError('image file is truncated (54 bytes not processed)')). Ignoring.
+ 2024-04-13,07:55:39 | WARNING | Handling webdataset error (OSError('image file is truncated (226 bytes not processed)')). Ignoring.
+ 2024-04-13,07:55:49 | INFO | Train Epoch: 0 [111165440/128090112 (87%)] Data (t): 0.401 Batch (t): 5.724, 2846.56/s, 177.910/s/gpu LR: 0.000024 Logit Scale: 55.252 Contrastive_loss: 1.5762 (3.6296) Loss: 1.5762 (3.6296)
+ 2024-04-13,08:08:02 | INFO | Train Epoch: 0 [113262592/128090112 (88%)] Data (t): 0.399 Batch (t): 5.724, 2887.41/s, 180.463/s/gpu LR: 0.000019 Logit Scale: 55.363 Contrastive_loss: 1.6003 (3.5927) Loss: 1.6003 (3.5927)
+ 2024-04-13,08:08:05 | WARNING | Handling webdataset error (OSError('image file is truncated (72 bytes not processed)')). Ignoring.
+ 2024-04-13,08:20:12 | INFO | Train Epoch: 0 [115359744/128090112 (90%)] Data (t): 0.410 Batch (t): 5.707, 2905.62/s, 181.601/s/gpu LR: 0.000014 Logit Scale: 55.453 Contrastive_loss: 1.5806 (3.5567) Loss: 1.5806 (3.5567)
+ 2024-04-13,08:23:57 | WARNING | Handling webdataset error (OSError('image file is truncated (35 bytes not processed)')). Ignoring.
+ 2024-04-13,08:32:23 | INFO | Train Epoch: 0 [117456896/128090112 (92%)] Data (t): 0.394 Batch (t): 5.713, 2857.89/s, 178.618/s/gpu LR: 0.000010 Logit Scale: 55.518 Contrastive_loss: 1.6991 (3.5241) Loss: 1.6991 (3.5241)
+ 2024-04-13,08:44:35 | INFO | Train Epoch: 0 [119554048/128090112 (93%)] Data (t): 0.396 Batch (t): 5.716, 2898.56/s, 181.160/s/gpu LR: 0.000006 Logit Scale: 55.560 Contrastive_loss: 1.6996 (3.4927) Loss: 1.6996 (3.4927)
+ 2024-04-13,08:51:18 | WARNING | Handling webdataset error (OSError('image file is truncated (78 bytes not processed)')). Ignoring.
+ 2024-04-13,08:56:47 | INFO | Train Epoch: 0 [121651200/128090112 (95%)] Data (t): 0.396 Batch (t): 5.716, 2856.62/s, 178.539/s/gpu LR: 0.000004 Logit Scale: 55.587 Contrastive_loss: 1.6545 (3.4615) Loss: 1.6545 (3.4615)
+ 2024-04-13,09:08:38 | WARNING | Handling webdataset error (OSError('image file is truncated (72 bytes not processed)')). Ignoring.
+ 2024-04-13,09:08:59 | INFO | Train Epoch: 0 [123748352/128090112 (97%)] Data (t): 0.394 Batch (t): 5.720, 2851.77/s, 178.235/s/gpu LR: 0.000002 Logit Scale: 55.600 Contrastive_loss: 1.6294 (3.4310) Loss: 1.6294 (3.4310)
+ 2024-04-13,09:21:10 | INFO | Train Epoch: 0 [125845504/128090112 (98%)] Data (t): 0.388 Batch (t): 5.712, 2879.64/s, 179.978/s/gpu LR: 0.000000 Logit Scale: 55.606 Contrastive_loss: 1.5663 (3.4004) Loss: 1.5663 (3.4004)
+ 2024-04-13,09:27:12 | WARNING | Handling webdataset error (OSError('image file is truncated (87 bytes not processed)')). Ignoring.
+ 2024-04-13,09:33:20 | INFO | Train Epoch: 0 [127942656/128090112 (100%)] Data (t): 0.387 Batch (t): 5.703, 2853.54/s, 178.346/s/gpu LR: 0.000000 Logit Scale: 55.607 Contrastive_loss: 1.6320 (3.3719) Loss: 1.6320 (3.3719)
+ 2024-04-13,09:34:11 | INFO | Starting zero-shot imagenet.
+ 2024-04-13,09:34:11 | INFO | Building zero-shot classifier
+ 2024-04-13,09:34:27 | INFO | Using classifier
+ 2024-04-13,09:36:49 | INFO | Finished zero-shot imagenet.
+ 2024-04-13,09:36:49 | INFO | Eval Epoch: 1 imagenet-zeroshot-val-top1: 0.4882 imagenet-zeroshot-val-top5: 0.7693
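The `LR` column in `out.log` is consistent with a linear warmup over `warmup: 500` steps followed by cosine decay to zero from `lr: 0.0005`. The sketch below is an assumed reconstruction of that schedule, not code taken from the training script. `TOTAL_STEPS` is derived from the logged sample total (128090112 / 16384 = 7818), and the step index for a log line is taken as (samples seen / 16384) - 1.

```python
import math

BASE_LR = 0.0005       # `lr` from the logged params
WARMUP_STEPS = 500     # `warmup` from the logged params
TOTAL_STEPS = 7818     # assumption: 128090112 logged samples / 16384 per step


def lr_at(step):
    """Assumed schedule: linear warmup, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return BASE_LR * (step + 1) / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return 0.5 * (1.0 + math.cos(math.pi * progress)) * BASE_LR


# Compare against a few LR values printed in out.log.
logged = {0: 0.000001, 128: 0.000129, 768: 0.000498, 3840: 0.000284}
for step, expected in logged.items():
    print(step, f"{lr_at(step):.6f}", "logged:", f"{expected:.6f}")
```

Under these assumptions the computed values agree with the logged ones to the six decimals the log prints, which suggests, but does not prove, that the run used this form of schedule.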
clip_vit_l16_s128m_bs16k/params.txt ADDED
@@ -0,0 +1,106 @@
+ accum_freq: 1
+ aug_cfg: {}
+ batch_size: 1024
+ beta1: 0.9
+ beta2: 0.98
+ bin_balanced_sampling_expand: None
+ bin_balanced_sampling_nbins: None
+ checkpoint_path: ./logs/vit_l16_s128m_bs16k/checkpoints
+ coca_caption_loss_weight: 2.0
+ coca_contrastive_loss_weight: 1.0
+ copy_codebase: False
+ csv_caption_key: title
+ csv_img_key: filepath
+ csv_separator:
+ dataset_resampled: False
+ dataset_type: webdataset
+ ddp_static_graph: True
+ debug: False
+ delete_prev_step_ckpt: True
+ delete_previous_checkpoint: False
+ device: cuda:0
+ dist_backend: nccl
+ dist_url: env://
+ distill: False
+ distill_model: None
+ distill_pretrained: None
+ distributed: True
+ epochs: 1
+ epochs_cooldown: None
+ eps: 1e-06
+ flash_attn: False
+ force_custom_text: False
+ force_image_size: 224
+ force_patch_dropout: None
+ force_quick_gelu: False
+ gather_with_grad: True
+ global_batch_size: 16384
+ grad_checkpointing: True
+ grad_clip_norm: None
+ horovod: False
+ image_mean: None
+ image_std: None
+ imagenet_v2: None
+ imagenet_val: /mnt/bn/zilongdata-us/dataset/ILSVRC/Data/CLS-LOC/val
+ local_loss: True
+ local_rank: 0
+ lock_image: False
+ lock_image_freeze_bn_stats: False
+ lock_image_unlocked_groups: 0
+ lock_text: False
+ lock_text_freeze_layer_norm: False
+ lock_text_unlocked_layers: 0
+ log_every_n_steps: 128
+ log_level: 20
+ log_local: False
+ log_path: ./logs/vit_l16_s128m_bs16k/out.log
+ logs: ./logs
+ lr: 0.0005
+ lr_cooldown_end: 0.0
+ lr_cooldown_power: 1.0
+ lr_multiplier_text: None
+ lr_scheduler: cosine
+ model: ViT-L-16
+ name: vit_l16_s128m_bs16k
+ no_set_device_rank: False
+ precision: amp_bfloat16
+ pretrained:
+ pretrained_image: False
+ pretrained_optim_scaler: False
+ pretrained_text:
+ rank: 0
+ remote_sync: None
+ remote_sync_frequency: 300
+ remote_sync_protocol: s3
+ report_to: tensorboard
+ resume: None
+ save_every_n_steps: 6104
+ save_frequency: 1
+ save_most_recent: False
+ seed: 0
+ skip_scheduler: False
+ tensorboard: True
+ tensorboard_path: ./logs/vit_l16_s128m_bs16k/tensorboard
+ torchcompile: False
+ torchscript: False
+ trace: False
+ train_data: /mnt/bn/zilongdata-us/dataset/datacomp-1b-webdataset/{000000..140146}.tar
+ train_data_upsampling_factors: None
+ train_num_samples: 128000000
+ unlock_text_proj: False
+ unset_text_grad_checkpointing: False
+ use_bn_sync: False
+ use_bnb_linear: None
+ val_data: None
+ val_frequency: 1
+ val_num_samples: None
+ val_steps: 6104
+ wandb: False
+ wandb_notes:
+ wandb_project_name: open-clip
+ warmup: 500
+ wd: 0.2
+ workers: 6
+ world_size: 16
+ zeroshot_frequency: 2
+ zeroshot_steps: 6104
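Two numbers in these parameters can be cross-checked against `out.log`. First, `global_batch_size: 16384` equals `batch_size: 1024` times `world_size: 16`. Second, the log reports 128090112 samples per epoch rather than the configured `train_num_samples: 128000000`. One possible explanation, stated here as an assumption rather than something the commit confirms, is that the data loader rounds the batch count up to a whole number of batches per worker. The helper `effective_samples` below is hypothetical and only illustrates that arithmetic.

```python
import math


def effective_samples(train_num_samples, batch_size, world_size, workers):
    """Assumed rounding: ceil to whole global batches, then to a multiple of workers."""
    global_batch = batch_size * world_size
    num_batches = math.ceil(train_num_samples / global_batch)
    workers = max(1, workers)
    batches_per_worker = math.ceil(num_batches / workers)
    num_batches = batches_per_worker * workers
    return global_batch, num_batches, num_batches * global_batch


global_batch, steps, samples = effective_samples(
    train_num_samples=128_000_000,  # train_num_samples
    batch_size=1024,                # batch_size (per GPU)
    world_size=16,                  # world_size
    workers=6,                      # workers
)
print("global batch:", global_batch)
print("steps per epoch:", steps)
print("samples per epoch:", samples)
```

With the values from `params.txt`, this rounding yields the same per-epoch sample total that appears in the `Train Epoch` lines of the log.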
clip_vit_l16_s128m_bs16k/tensorboard/events.out.tfevents.1712927009.n107-099-120.2689.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cd8a88e16decdec2699faa246898141a27eca1cc666c8d54cfacc830a9b8a38a
+ size 28232