lusxvr committed on
Commit
a024e38
·
1 Parent(s): 6342005
app/src/content/article.mdx CHANGED
@@ -263,7 +263,7 @@ Each of our ablations trains a 450M model with maximal image size of 1536x1536 p
263
  ### How does FineVision compare against the Baselines?
264
  Compared against existing VLM training datasets, FineVision produces significantly higher benchmark ranks than the other options.
265
 
266
- <HtmlEmbed src="d3-line.html" title="D3 Line" desc="TODO - Average Rank of Models trained on different open source datasets." />
267
 
268
  ### How contaminated are the datasets?
269
  To investigate data leakage from benchmarks into this dataset, we construct a deduplication pipeline based on the sample images. We embed the images of 66 image-test datasets from the lmms-eval framework using the SSCD descriptor, and compute the cosine similarity between our samples and the test-set embeddings. Whenever a sample has a similarity higher than a threshold of 0.95, it is assumed to be a duplicate. While our tests with various thresholds show that this flags some samples that are not actual duplicates (especially when images look similar but differ in detail, like graphs or tables), we preferred to err on the side of caution. We open-source the deduplication pipeline here as well as the precomputed test-set embeddings here.
@@ -279,7 +279,7 @@ TODO: Insert the Images here
279
 
280
  Additionally, we experimented with removing all found samples from all datasets to see if the outcome is different from the results above, but we observe the same distribution.
281
 
282
- <HtmlEmbed src="against-baselines-deduplicated.html" title="D3 Line" desc="TODO - Average Rank of Models trained on different deduplicated open source datasets." />
283
 
284
  TODO: After removing these duplicates, the performance of the models dropped by … % over all benchmarks.
285
 
@@ -297,12 +297,12 @@ Similarly to the comparison of the size, we also wanted to evaluate the datasets
297
  Since the training of a VLM already builds upon pretrained vision and language backbones, datasets are usually not completely unstructured, but follow an image + question-and-answer structure. Recent works have shown that consolidating multiple questions for the same image into a multi-turn conversation where the image is shown only once improves model performance, and also reduces the dataset's memory footprint. We therefore experiment with deduplicating every image in our dataset internally using the same SSCD descriptors, manually inspecting the resulting clusters and merging suitable samples into a multi-turn conversation.
298
  Even when training for longer than in the other ablations, we did not observe a significant difference; if anything, the results slightly favour not merging multiple samples together.
299
 
300
- <HtmlEmbed src="internal-deduplication.html" title="D3 Line" desc="TODO - Average Ranking of Models trained with internally deduplicated / merged samples." />
301
 
302
  ### Should you train on multilingual data if your language backbone was not?
303
  There are some multilingual datasets in our mixture, but since our Language Backbone is only trained on English data, we experimented with removing all the multilingual, mainly Chinese, subsets. This also does not seem to make a big difference, with a slight advantage to keeping the data, even though it was not part of the Language Backbone's initial training. In our training setup with this configuration, one epoch over the whole dataset equals ~12k steps, so the benefit of unseen languages only materializes after the first full epoch.
304
 
305
- <HtmlEmbed src="remove-ch.html" title="D3 Line" desc="TODO - Average Rank of Models trained with and without multilingual samples" />
306
 
307
  ### How can you assess the quality of the dataset?
308
 
@@ -324,7 +324,7 @@ This is the distribution of scores across the different filters for FineVision.
324
 
325
  To try to quantify the quality of the training data and the effect it has on the model’s performance, we run extensive ablations on our generated ratings.
326
 
327
- <HtmlEmbed src="all-ratings.html" title="D3 Line" desc="TODO - Average Rank of Models trained with samples that have all 4 ratings above a certain threshold." />
328
 
329
  Interestingly, both when training only on turns that have any of the 4 ratings under a certain threshold, and when training on turns where only a single rating at a time is used, we observe the same behaviour: simply training on all samples of the dataset performs best in benchmarks. This could mean multiple things.
330
  We see almost the same distribution of ranks across all filters: from best to worst as the rating threshold increases. For example, the visual dependency and the image correspondence ratings both result in exactly the same distribution of rankings, corresponding to the natural order of options, 1 through 5. This could indicate that with a sufficiently large dataset that you train on long enough, removing samples hurts more than training on them, even if they were judged to be of low quality.
@@ -332,21 +332,24 @@ The notion of quality for VLM datasets is nuanced in general. If we compare trai
332
  Alternatively, while we used state-of-the-art open source models to judge our datapoints, we still had to find a compromise between model quality and cost due to the sheer effort required to rate every single turn of FineVision. The chosen models might simply not be powerful enough to recognize and judge the quality of samples.
333
  Even though our first proposal to judge the quality of multimodal data on a per-turn basis did not yield any improvement in model performance, we believe that this is still an exciting and important direction of research and hope the release of FineVision encourages the community to develop techniques for this at large scale.
334
 
335
- <HtmlEmbed src="d3-line.html" title="D3 Line" desc="TODO - Average Rank of Models 4 plots for the rankings." />
 
 
 
336
 
337
  ### Should you train in multiple stages?
338
  The standard training procedure of a VLM usually follows at least two stages. First, you train only the connecting module, potentially together with the image encoder, and then you train the whole model in a second stage. Some work has even introduced an additional Stage 2.5, where you train the full model on a smaller subset of higher quality data. To investigate this on small models, we experiment with single-, dual- and triple-stage training.
339
 
340
  #### 1 Stage vs 2 Stages
341
 
342
- <HtmlEmbed src="ss_vs_s1.html" title="D3 Line" desc="TODO - Average Rank of a model trained for 20K steps in a single stage, and a model trained for the same 20k steps on top of pretraining the Modality Projection and Vision Encoder for 10k steps." />
343
 
344
  We observe that at this model size, with this amount of available data, training in only a single stage actually outperforms a multi-stage approach.
345
 
346
  #### 2 Stages vs 2.5 Stages
347
  We also test whether splitting the second stage results in any performance improvements. We take the baseline and continue training for another 20k steps, both with the unfiltered (>= 1) and with the filtered subsets of FineVision according to our ratings.
348
 
349
- <HtmlEmbed src="s25_ratings.html" title="D3 Line" desc="TODO - Average Rank if a model trained for an additional 20K steps on top of unfiltered training for 20K steps." />
350
 
351
  Like in the previous results, we observe that the best outcome is simply achieved by training on as much data as possible.
352
 
 
263
  ### How does FineVision compare against the Baselines?
264
  Compared against existing VLM training datasets, FineVision produces significantly higher benchmark ranks than the other options.
265
 
266
+ <HtmlEmbed src="against-baselines.html" desc="Average Rank of Models trained on different open source datasets." />
267
 
268
  ### How contaminated are the datasets?
269
  To investigate data leakage from benchmarks into this dataset, we construct a deduplication pipeline based on the sample images. We embed the images of 66 image-test datasets from the lmms-eval framework using the SSCD descriptor, and compute the cosine similarity between our samples and the test-set embeddings. Whenever a sample has a similarity higher than a threshold of 0.95, it is assumed to be a duplicate. While our tests with various thresholds show that this flags some samples that are not actual duplicates (especially when images look similar but differ in detail, like graphs or tables), we preferred to err on the side of caution. We open-source the deduplication pipeline here as well as the precomputed test-set embeddings here.
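
The thresholding step described in the paragraph above can be sketched as follows. This is a minimal illustration, not the released pipeline: it assumes the SSCD embeddings have already been computed and are available as plain lists of floats, and the function names `cosine_similarity` and `flag_duplicates` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as matching nothing
    return dot / (norm_a * norm_b)


def flag_duplicates(sample_embeddings, test_embeddings, threshold=0.95):
    """Return indices of samples whose best test-set match exceeds the threshold."""
    flagged = []
    for index, sample in enumerate(sample_embeddings):
        best = max(
            (cosine_similarity(sample, test) for test in test_embeddings),
            default=0.0,
        )
        if best > threshold:
            flagged.append(index)
    return flagged


# Toy 2-D vectors standing in for real image embeddings.
samples = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
test_set = [[0.99, 0.05], [0.0, -1.0]]
print(flag_duplicates(samples, test_set))
```

Only the first toy sample is close enough to a test-set vector to be flagged. At real scale, a vectorised or approximate nearest-neighbour search would likely replace the nested loop.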
 
279
 
280
  Additionally, we experimented with removing all found samples from all datasets to see if the outcome is different from the results above, but we observe the same distribution.
281
 
282
+ <HtmlEmbed src="against-baselines-deduplicated.html" desc="Average Rank of Models trained on different deduplicated open source datasets." />
283
 
284
  TODO: After removing these duplicates, the performance of the models dropped by … % over all benchmarks.
285
 
 
297
  Since the training of a VLM already builds upon pretrained vision and language backbones, datasets are usually not completely unstructured, but follow an image + question-and-answer structure. Recent works have shown that consolidating multiple questions for the same image into a multi-turn conversation where the image is shown only once improves model performance, and also reduces the dataset's memory footprint. We therefore experiment with deduplicating every image in our dataset internally using the same SSCD descriptors, manually inspecting the resulting clusters and merging suitable samples into a multi-turn conversation.
298
  Even when training for longer than in the other ablations, we did not observe a significant difference; if anything, the results slightly favour not merging multiple samples together.
299
 
300
+ <HtmlEmbed src="internal-deduplication.html" desc="Average Ranking of Models trained with internally deduplicated / merged samples." />
301
 
302
  ### Should you train on multilingual data if your language backbone was not?
303
  There are some multilingual datasets in our mixture, but since our Language Backbone is only trained on English data, we experimented with removing all the multilingual, mainly Chinese, subsets. This also does not seem to make a big difference, with a slight advantage to keeping the data, even though it was not part of the Language Backbone's initial training. In our training setup with this configuration, one epoch over the whole dataset equals ~12k steps, so the benefit of unseen languages only materializes after the first full epoch.
304
 
305
+ <HtmlEmbed src="remove-ch.html" desc="Average Rank of Models trained with and without multilingual samples" />
306
 
307
  ### How can you assess the quality of the dataset?
308
 
 
324
 
325
  To try to quantify the quality of the training data and the effect it has on the model’s performance, we run extensive ablations on our generated ratings.
326
 
327
+ <HtmlEmbed src="all-ratings.html" desc="Average Rank of Models trained with samples that have all 4 ratings above a certain threshold." />
328
 
329
  Interestingly, both when training only on turns that have any of the 4 ratings under a certain threshold, and when training on turns where only a single rating at a time is used, we observe the same behaviour: simply training on all samples of the dataset performs best in benchmarks. This could mean multiple things.
330
  We see almost the same distribution of ranks across all filters: from best to worst as the rating threshold increases. For example, the visual dependency and the image correspondence ratings both result in exactly the same distribution of rankings, corresponding to the natural order of options, 1 through 5. This could indicate that with a sufficiently large dataset that you train on long enough, removing samples hurts more than training on them, even if they were judged to be of low quality.
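
To make the filtering setup concrete, here is a small sketch of selecting turns by rating threshold. It assumes each turn carries four integer ratings from 1 to 5; the dictionary layout, the key names and the helpers `keep_turn` and `filter_turns` are hypothetical and not taken from the FineVision release.

```python
# Hypothetical key names for the four ratings discussed in the article.
RATING_KEYS = ("formatting", "relevance", "visual_dependency", "image_correspondence")


def keep_turn(ratings, threshold, keys=RATING_KEYS):
    """True if every selected rating is at or above the threshold."""
    return all(ratings[key] >= threshold for key in keys)


def filter_turns(turns, threshold, keys=RATING_KEYS):
    """Keep only the turns that pass the threshold on the selected ratings."""
    return [turn for turn in turns if keep_turn(turn["ratings"], threshold, keys)]


# Toy turns with made-up ratings.
turns = [
    {"id": 0, "ratings": {"formatting": 5, "relevance": 4, "visual_dependency": 3, "image_correspondence": 5}},
    {"id": 1, "ratings": {"formatting": 2, "relevance": 5, "visual_dependency": 5, "image_correspondence": 4}},
    {"id": 2, "ratings": {"formatting": 4, "relevance": 4, "visual_dependency": 4, "image_correspondence": 4}},
]

# Threshold 1 keeps everything on a 1-5 scale (the unfiltered setting).
print([t["id"] for t in filter_turns(turns, 1)])
# All four ratings must be at least 4.
print([t["id"] for t in filter_turns(turns, 4)])
# A single rating used on its own.
print([t["id"] for t in filter_turns(turns, 4, keys=("formatting",))])
```

Passing a single key mirrors the ablations where only one rating at a time is used, while the default applies all four together.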
 
332
  Alternatively, while we used state-of-the-art open source models to judge our datapoints, we still had to find a compromise between model quality and cost due to the sheer effort required to rate every single turn of FineVision. The chosen models might simply not be powerful enough to recognize and judge the quality of samples.
333
  Even though our first proposal to judge the quality of multimodal data on a per-turn basis did not yield any improvement in model performance, we believe that this is still an exciting and important direction of research and hope the release of FineVision encourages the community to develop techniques for this at large scale.
334
 
335
+ <HtmlEmbed src="formatting-filters.html" title="Formatting Filter" desc="Average Rank of Models that have the Formatting Filter above a threshold." />
336
+ <HtmlEmbed src="relevance-filters.html" title="Relevance Filter" desc="Average Rank of Models that have the Relevance Filter above a threshold." />
337
+ <HtmlEmbed src="visual-dependency-filters.html" title="Visual Dependency Filter" desc="Average Rank of Models that have the Visual Dependency Filter above a threshold." />
338
+ <HtmlEmbed src="image-correspondence-filters.html" title="Image Correspondence Filter" desc="Average Rank of Models that have the Image-Correspondence Filter above a threshold." />
339
 
340
  ### Should you train in multiple stages?
341
  The standard training procedure of a VLM usually follows at least two stages. First, you train only the connecting module, potentially together with the image encoder, and then you train the whole model in a second stage. Some work has even introduced an additional Stage 2.5, where you train the full model on a smaller subset of higher quality data. To investigate this on small models, we experiment with single-, dual- and triple-stage training.
342
 
343
  #### 1 Stage vs 2 Stages
344
 
345
+ <HtmlEmbed src="ss-vs-s1.html" desc="Average Rank of a model trained for 20K steps in a single stage, and a model trained for the same 20k steps on top of pretraining the Modality Projection and Vision Encoder for 10k steps." />
346
 
347
  We observe that at this model size, with this amount of available data, training in only a single stage actually outperforms a multi-stage approach.
348
 
349
  #### 2 Stages vs 2.5 Stages
350
  We also test whether splitting the second stage results in any performance improvements. We take the baseline and continue training for another 20k steps, both with the unfiltered (>= 1) and with the filtered subsets of FineVision according to our ratings.
351
 
352
+ <HtmlEmbed src="s25-ratings.html" desc="Average Rank if a model trained for an additional 20K steps on top of unfiltered training for 20K steps." />
353
 
354
  Like in the previous results, we observe that the best outcome is simply achieved by training on as much data as possible.
355
 
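
The figures in this article report an "Average Rank" per run, and the CSV data files store rows in the form `run,step,metric,value,stderr`. One plausible way to derive such a rank from rows of that shape is sketched below. The exact procedure is an assumption: here each run is ranked per metric at a given step (higher value is better, rank 1 is best) and the ranks are averaged over metrics. The function name `average_ranks` is hypothetical, and ties are broken by input order.

```python
from collections import defaultdict


def average_ranks(rows, step, skip=("average", "average_rank")):
    """Average per-metric rank of each run at one step.

    rows: iterable of (run, step, metric, value) tuples.
    """
    # Collect the score of every run for every benchmark metric at this step.
    by_metric = defaultdict(dict)
    for run, row_step, metric, value in rows:
        if row_step == step and metric not in skip:
            by_metric[metric][run] = value

    totals = defaultdict(float)
    counts = defaultdict(int)
    for scores in by_metric.values():
        # Highest value gets rank 1.
        ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
        for rank, (run, _value) in enumerate(ordered, start=1):
            totals[run] += rank
            counts[run] += 1
    return {run: totals[run] / counts[run] for run in totals}


# Toy rows with made-up runs, metrics and values.
rows = [
    ("A", 1000, "m1", 0.3), ("B", 1000, "m1", 0.5), ("C", 1000, "m1", 0.1),
    ("A", 1000, "m2", 0.9), ("B", 1000, "m2", 0.2), ("C", 1000, "m2", 0.4),
    ("A", 2000, "m1", 0.0),
]
print(average_ranks(rows, 1000))
```

In the toy data, run A is second on `m1` and first on `m2`, giving it the best average rank of the three runs.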
app/src/content/assets/audio/audio-example.wav DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:552f71aef82738f9b5c9f1d6be495e0f83cec0eabf485066628badb3283cb4b8
3
- size 48830444
 
 
 
 
app/src/content/assets/data/against_baselines.csv CHANGED
@@ -1,3 +1,961 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:db5080bacf123328439db1c2e611e0a073cc4c1d2ff250ff4f7e7dc2ec6c04e2
3
- size 37016
 
1
+ run,step,metric,value,stderr
2
+ FineVision,1000,ai2d_exact_match,0.2548575129533679,0.007843322436924496
3
+ FineVision,1000,average,0.27120689295763617,
4
+ FineVision,1000,average_rank,2.8,
5
+ FineVision,1000,chartqa_relaxed_overall,0.3308,0.009411906161401973
6
+ FineVision,1000,docvqa_val_anls,0.3528553494243383,0.005852289239342309
7
+ FineVision,1000,infovqa_val_anls,0.17320578642581314,0.006297063452679795
8
+ FineVision,1000,mme_total_score,977.4280712284914,
9
+ FineVision,1000,mmmu_val_mmmu_acc,0.25222,
10
+ FineVision,1000,mmstar_average,0.23215874078908072,
11
+ FineVision,1000,ocrbench_ocrbench_accuracy,0.286,
12
+ FineVision,1000,seedbench_seed_all,0.2563646470261256,
13
+ FineVision,1000,textvqa_val_exact_match,0.3024,0.00628900296642181
14
+ FineVision,2000,ai2d_exact_match,0.26295336787564766,0.007923526907377255
15
+ FineVision,2000,average,0.3202068275596269,
16
+ FineVision,2000,average_rank,2.6,
17
+ FineVision,2000,chartqa_relaxed_overall,0.4688,0.009982508912777261
18
+ FineVision,2000,docvqa_val_anls,0.4452261510942785,0.00614755494712251
19
+ FineVision,2000,infovqa_val_anls,0.1820547866557169,0.006217861455795791
20
+ FineVision,2000,mme_total_score,1049.3036214485794,
21
+ FineVision,2000,mmmu_val_mmmu_acc,0.24556,
22
+ FineVision,2000,mmstar_average,0.21305462434540698,
23
+ FineVision,2000,ocrbench_ocrbench_accuracy,0.395,
24
+ FineVision,2000,seedbench_seed_all,0.258532518065592,
25
+ FineVision,2000,textvqa_val_exact_match,0.41068000000000005,0.006697862330024289
26
+ FineVision,3000,ai2d_exact_match,0.25226683937823835,0.007816909588794397
27
+ FineVision,3000,average,0.3507423834414229,
28
+ FineVision,3000,average_rank,2.6,
29
+ FineVision,3000,chartqa_relaxed_overall,0.5028,0.010001843767601082
30
+ FineVision,3000,docvqa_val_anls,0.502653993831009,0.006267072346683124
31
+ FineVision,3000,infovqa_val_anls,0.21728617578189535,0.006796941784959762
32
+ FineVision,3000,mme_total_score,1170.2383953581434,
33
+ FineVision,3000,mmmu_val_mmmu_acc,0.27556,
34
+ FineVision,3000,mmstar_average,0.25432376938577683,
35
+ FineVision,3000,ocrbench_ocrbench_accuracy,0.436,
36
+ FineVision,3000,seedbench_seed_all,0.2792106725958866,
37
+ FineVision,3000,textvqa_val_exact_match,0.43658,0.006766885462882726
38
+ FineVision,4000,ai2d_exact_match,0.2645725388601036,0.007939149662089447
39
+ FineVision,4000,average,0.36961781722974835,
40
+ FineVision,4000,average_rank,2.7,
41
+ FineVision,4000,chartqa_relaxed_overall,0.5312,0.009982508912777261
42
+ FineVision,4000,docvqa_val_anls,0.5374434618615119,0.0062905728113059655
43
+ FineVision,4000,infovqa_val_anls,0.2287924838861707,0.006994568698639919
44
+ FineVision,4000,mme_total_score,1155.203781512605,
45
+ FineVision,4000,mmmu_val_mmmu_acc,0.25556,
46
+ FineVision,4000,mmstar_average,0.2575590188757354,
47
+ FineVision,4000,ocrbench_ocrbench_accuracy,0.453,
48
+ FineVision,4000,seedbench_seed_all,0.33913285158421347,
49
+ FineVision,4000,textvqa_val_exact_match,0.4593,0.006791695475025738
50
+ FineVision,5000,ai2d_exact_match,0.3125,0.008342439145556371
51
+ FineVision,5000,average,0.3974627910380972,
52
+ FineVision,5000,average_rank,2.6,
53
+ FineVision,5000,chartqa_relaxed_overall,0.5488,0.00995424828018316
54
+ FineVision,5000,docvqa_val_anls,0.552360266782429,0.006300308519952055
55
+ FineVision,5000,infovqa_val_anls,0.23425555286643698,0.007002254622066442
56
+ FineVision,5000,mme_total_score,1181.4653861544618,
57
+ FineVision,5000,mmmu_val_mmmu_acc,0.26667,
58
+ FineVision,5000,mmstar_average,0.29596648146165705,
59
+ FineVision,5000,ocrbench_ocrbench_accuracy,0.462,
60
+ FineVision,5000,seedbench_seed_all,0.43107281823235133,
61
+ FineVision,5000,textvqa_val_exact_match,0.47354000000000007,0.0068172185364497985
62
+ FineVision,6000,ai2d_exact_match,0.358160621761658,0.008629463221867162
63
+ FineVision,6000,average,0.4161227404571003,
64
+ FineVision,6000,average_rank,2.1,
65
+ FineVision,6000,chartqa_relaxed_overall,0.5628,0.00992279440175477
66
+ FineVision,6000,docvqa_val_anls,0.5747451497228876,0.00625495440870239
67
+ FineVision,6000,infovqa_val_anls,0.22152017368968838,0.006604546680525351
68
+ FineVision,6000,mme_total_score,1284.1648659463785,
69
+ FineVision,6000,mmmu_val_mmmu_acc,0.27111,
70
+ FineVision,6000,mmstar_average,0.2978489412854164,
71
+ FineVision,6000,ocrbench_ocrbench_accuracy,0.495,
72
+ FineVision,6000,seedbench_seed_all,0.4795997776542524,
73
+ FineVision,6000,textvqa_val_exact_match,0.48432,0.006800535050670284
74
+ FineVision,7000,ai2d_exact_match,0.3707901554404145,0.00869347755587734
75
+ FineVision,7000,average,0.4291083177345374,
76
+ FineVision,7000,average_rank,2.4,
77
+ FineVision,7000,chartqa_relaxed_overall,0.5656,0.009915542506251351
78
+ FineVision,7000,docvqa_val_anls,0.5940907049431567,0.006224236305767187
79
+ FineVision,7000,infovqa_val_anls,0.2515675215816963,0.007105097396092786
80
+ FineVision,7000,mme_total_score,1185.875650260104,
81
+ FineVision,7000,mmmu_val_mmmu_acc,0.26556,
82
+ FineVision,7000,mmstar_average,0.31372400960777047,
83
+ FineVision,7000,ocrbench_ocrbench_accuracy,0.504,
84
+ FineVision,7000,seedbench_seed_all,0.4964424680377988,
85
+ FineVision,7000,textvqa_val_exact_match,0.5002,0.006794794025220267
86
+ FineVision,8000,ai2d_exact_match,0.37759067357512954,0.008725299846043883
87
+ FineVision,8000,average,0.43846759477995995,
88
+ FineVision,8000,average_rank,2.2,
89
+ FineVision,8000,chartqa_relaxed_overall,0.5832,0.009862556058385773
90
+ FineVision,8000,docvqa_val_anls,0.6017336419437208,0.006231612198089698
91
+ FineVision,8000,infovqa_val_anls,0.2449256624147254,0.006992518502948913
92
+ FineVision,8000,mme_total_score,1199.2409963985594,
93
+ FineVision,8000,mmmu_val_mmmu_acc,0.28111,
94
+ FineVision,8000,mmstar_average,0.33512257186205047,
95
+ FineVision,8000,ocrbench_ocrbench_accuracy,0.51,
96
+ FineVision,8000,seedbench_seed_all,0.5024458032240133,
97
+ FineVision,8000,textvqa_val_exact_match,0.51008,0.006796301690135059
98
+ FineVision,9000,ai2d_exact_match,0.4067357512953368,0.008841214921078996
99
+ FineVision,9000,average,0.4422510732201056,
100
+ FineVision,9000,average_rank,2.0,
101
+ FineVision,9000,chartqa_relaxed_overall,0.5912,0.009834211136815875
102
+ FineVision,9000,docvqa_val_anls,0.6170968481662739,0.00617235763542544
103
+ FineVision,9000,infovqa_val_anls,0.23537031288570615,0.00670318154156447
104
+ FineVision,9000,mme_total_score,1231.5195078031213,
105
+ FineVision,9000,mmmu_val_mmmu_acc,0.25889,
106
+ FineVision,9000,mmstar_average,0.3216444898242951,
107
+ FineVision,9000,ocrbench_ocrbench_accuracy,0.515,
108
+ FineVision,9000,seedbench_seed_all,0.5120622568093385,
109
+ FineVision,9000,textvqa_val_exact_match,0.52226,0.006792711289708482
110
+ FineVision,10000,ai2d_exact_match,0.39993523316062174,0.008817096257082848
111
+ FineVision,10000,average,0.4523875703250908,
112
+ FineVision,10000,average_rank,1.7,
113
+ FineVision,10000,chartqa_relaxed_overall,0.5996,0.00980154906867574
114
+ FineVision,10000,docvqa_val_anls,0.6262613496433054,0.006147756371688175
115
+ FineVision,10000,infovqa_val_anls,0.263290074230132,0.007186788766942786
116
+ FineVision,10000,mme_total_score,1240.8218287314926,
117
+ FineVision,10000,mmmu_val_mmmu_acc,0.28778,
118
+ FineVision,10000,mmstar_average,0.32972717906018517,
119
+ FineVision,10000,ocrbench_ocrbench_accuracy,0.517,
120
+ FineVision,10000,seedbench_seed_all,0.5217342968315731,
121
+ FineVision,10000,textvqa_val_exact_match,0.5261600000000001,0.006785774843600811
122
+ FineVision,11000,ai2d_exact_match,0.422279792746114,0.008889771831066474
123
+ FineVision,11000,average,0.4561398159525099,
124
+ FineVision,11000,average_rank,1.7,
125
+ FineVision,11000,chartqa_relaxed_overall,0.6104,0.009755142291143075
126
+ FineVision,11000,docvqa_val_anls,0.6373130149166712,0.006128022584995044
127
+ FineVision,11000,infovqa_val_anls,0.24419378339723755,0.006897644885887063
128
+ FineVision,11000,mme_total_score,1322.9488795518205,
129
+ FineVision,11000,mmmu_val_mmmu_acc,0.27778,
130
+ FineVision,11000,mmstar_average,0.3298563439522548,
131
+ FineVision,11000,ocrbench_ocrbench_accuracy,0.521,
132
+ FineVision,11000,seedbench_seed_all,0.5237354085603113,
133
+ FineVision,11000,textvqa_val_exact_match,0.5387,0.006770851562852138
134
+ FineVision,12000,ai2d_exact_match,0.42001295336787564,0.008883255931688034
135
+ FineVision,12000,average,0.4582751140055433,
136
+ FineVision,12000,average_rank,1.6,
137
+ FineVision,12000,chartqa_relaxed_overall,0.618,0.009719474639861454
138
+ FineVision,12000,docvqa_val_anls,0.6393961983751871,0.0061228747388476674
139
+ FineVision,12000,infovqa_val_anls,0.24798874058574302,0.006855374548993139
140
+ FineVision,12000,mme_total_score,1225.6453581432572,
141
+ FineVision,12000,mmmu_val_mmmu_acc,0.27889,
142
+ FineVision,12000,mmstar_average,0.34010867846816534,
143
+ FineVision,12000,ocrbench_ocrbench_accuracy,0.512,
144
+ FineVision,12000,seedbench_seed_all,0.5350194552529183,
145
+ FineVision,12000,textvqa_val_exact_match,0.5330600000000001,0.006777713092109446
146
+ FineVision,13000,ai2d_exact_match,0.4375,0.008928571428571428
147
+ FineVision,13000,average,0.4692868662590049,
148
+ FineVision,13000,average_rank,1.5,
149
+ FineVision,13000,chartqa_relaxed_overall,0.6148,0.00973479791861169
150
+ FineVision,13000,docvqa_val_anls,0.6511374872549951,0.006086953065248391
151
+ FineVision,13000,infovqa_val_anls,0.24465055100441893,0.006808432538374664
152
+ FineVision,13000,mme_total_score,1281.7122849139657,
153
+ FineVision,13000,mmmu_val_mmmu_acc,0.28222,
154
+ FineVision,13000,mmstar_average,0.3453069542917521,
155
+ FineVision,13000,ocrbench_ocrbench_accuracy,0.549,
156
+ FineVision,13000,seedbench_seed_all,0.5442468037798777,
157
+ FineVision,13000,textvqa_val_exact_match,0.55472,0.0067416788982325
158
+ FineVision,14000,ai2d_exact_match,0.4572538860103627,0.00896620675297095
159
+ FineVision,14000,average,0.47352486841689195,
160
+ FineVision,14000,average_rank,1.4,
161
+ FineVision,14000,chartqa_relaxed_overall,0.6172,0.009723347231923635
162
+ FineVision,14000,docvqa_val_anls,0.6502269393708169,0.006057950730638126
163
+ FineVision,14000,infovqa_val_anls,0.25805460837190913,0.007037735231659539
164
+ FineVision,14000,mme_total_score,1309.1444577831132,
165
+ FineVision,14000,mmmu_val_mmmu_acc,0.28111,
166
+ FineVision,14000,mmstar_average,0.34575818188776586,
167
+ FineVision,14000,ocrbench_ocrbench_accuracy,0.551,
168
+ FineVision,14000,seedbench_seed_all,0.5483602001111729,
169
+ FineVision,14000,textvqa_val_exact_match,0.55276,0.006751206724612103
170
+ FineVision,15000,ai2d_exact_match,0.45045336787564766,0.008954861634252399
171
+ FineVision,15000,average,0.47878665012878824,
172
+ FineVision,15000,average_rank,1.3,
173
+ FineVision,15000,chartqa_relaxed_overall,0.612,0.009747841205275417
174
+ FineVision,15000,docvqa_val_anls,0.6621413031955148,0.006056838050222495
175
+ FineVision,15000,infovqa_val_anls,0.2706898598157733,0.007200315730154543
176
+ FineVision,15000,mme_total_score,1384.2171868747498,
177
+ FineVision,15000,mmmu_val_mmmu_acc,0.30222,
178
+ FineVision,15000,mmstar_average,0.35408135695920684,
179
+ FineVision,15000,ocrbench_ocrbench_accuracy,0.558,
180
+ FineVision,15000,seedbench_seed_all,0.5411339633129516,
181
+ FineVision,15000,textvqa_val_exact_match,0.5583600000000001,0.0067279027203879065
182
+ FineVision,16000,ai2d_exact_match,0.45077720207253885,0.008955440137395838
183
+ FineVision,16000,average,0.47665128022935843,
184
+ FineVision,16000,average_rank,1.5,
185
+ FineVision,16000,chartqa_relaxed_overall,0.632,0.00964715642305132
186
+ FineVision,16000,docvqa_val_anls,0.6709415729142987,0.005999818105621502
187
+ FineVision,16000,infovqa_val_anls,0.26050032542402035,0.006997451875879188
188
+ FineVision,16000,mme_total_score,1317.8491396558625,
189
+ FineVision,16000,mmmu_val_mmmu_acc,0.27556,
190
+ FineVision,16000,mmstar_average,0.33214333327093315,
191
+ FineVision,16000,ocrbench_ocrbench_accuracy,0.56,
192
+ FineVision,16000,seedbench_seed_all,0.5463590883824346,
193
+ FineVision,16000,textvqa_val_exact_match,0.56158,0.006723854754867398
194
+ FineVision,17000,ai2d_exact_match,0.45919689119170987,0.008969138793675545
195
+ FineVision,17000,average,0.4777141780162423,
196
+ FineVision,17000,average_rank,1.3,
197
+ FineVision,17000,chartqa_relaxed_overall,0.632,0.00964715642305132
198
+ FineVision,17000,docvqa_val_anls,0.6796338519136422,0.005948761388267941
199
+ FineVision,17000,infovqa_val_anls,0.28070956072505215,0.007298333094144192
200
+ FineVision,17000,mme_total_score,1381.9161664665867,
201
+ FineVision,17000,mmmu_val_mmmu_acc,0.27667,
202
+ FineVision,17000,mmstar_average,0.3370289492329521,
203
+ FineVision,17000,ocrbench_ocrbench_accuracy,0.519,
204
+ FineVision,17000,seedbench_seed_all,0.5510283490828238,
205
+ FineVision,17000,textvqa_val_exact_match,0.56416,0.006724830373229479
206
+ FineVision,18000,ai2d_exact_match,0.46567357512953367,0.008977921602780726
207
+ FineVision,18000,average,0.4819834595278701,
208
+ FineVision,18000,average_rank,1.2,
209
+ FineVision,18000,chartqa_relaxed_overall,0.6376,0.009615793331418735
210
+ FineVision,18000,docvqa_val_anls,0.6775884603912571,0.005972234236435759
211
+ FineVision,18000,infovqa_val_anls,0.27154318420389256,0.007164903131667027
212
+ FineVision,18000,mme_total_score,1336.922769107643,
213
+ FineVision,18000,mmmu_val_mmmu_acc,0.28667,
214
+ FineVision,18000,mmstar_average,0.34482796716566916,
215
+ FineVision,18000,ocrbench_ocrbench_accuracy,0.533,
216
+ FineVision,18000,seedbench_seed_all,0.5543079488604781,
217
+ FineVision,18000,textvqa_val_exact_match,0.5666399999999999,0.006713392287599574
218
+ FineVision,19000,ai2d_exact_match,0.4682642487046632,0.008981008686994101
219
+ FineVision,19000,average,0.4899006713916878,
220
+ FineVision,19000,average_rank,1.2,
221
+ FineVision,19000,chartqa_relaxed_overall,0.6444,0.009575809858898698
222
+ FineVision,19000,docvqa_val_anls,0.678226526479947,0.005970619221588814
223
+ FineVision,19000,infovqa_val_anls,0.26993847247278,0.0071348470764911525
224
+ FineVision,19000,mme_total_score,1406.6628651460583,
225
+ FineVision,19000,mmmu_val_mmmu_acc,0.28333,
226
+ FineVision,19000,mmstar_average,0.356220913822775,
227
+ FineVision,19000,ocrbench_ocrbench_accuracy,0.577,
228
+ FineVision,19000,seedbench_seed_all,0.554585881045025,
229
+ FineVision,19000,textvqa_val_exact_match,0.57714,0.0066918487914812905
230
+ FineVision,20000,ai2d_exact_match,0.47571243523316065,0.00898853090258662
231
+ FineVision,20000,average,0.4873169067639118,
232
+ FineVision,20000,average_rank,1.2,
233
+ FineVision,20000,chartqa_relaxed_overall,0.6336,0.009638338810708618
234
+ FineVision,20000,docvqa_val_anls,0.6895214454380043,0.005896462073053767
235
+ FineVision,20000,infovqa_val_anls,0.2655657550458317,0.007033265532032538
236
+ FineVision,20000,mme_total_score,1324.6738695478193,
237
+ FineVision,20000,mmmu_val_mmmu_acc,0.30111,
238
+ FineVision,20000,mmstar_average,0.33806766134497995,
239
+ FineVision,20000,ocrbench_ocrbench_accuracy,0.555,
240
+ FineVision,20000,seedbench_seed_all,0.5587548638132296,
241
+ FineVision,20000,textvqa_val_exact_match,0.56852,0.006720151338087659
242
+ Cauldron,1000,ai2d_exact_match,0.28886010362694303,0.008157423105367313
243
+ Cauldron,1000,average,0.29904301214549334,
244
+ Cauldron,1000,average_rank,1.9,
245
+ Cauldron,1000,chartqa_relaxed_overall,0.1936,0.007903961351247664
246
+ Cauldron,1000,docvqa_val_anls,0.32153744261519257,0.005317068996930092
247
+ Cauldron,1000,infovqa_val_anls,0.1431990055083018,0.005424936025458022
248
+ Cauldron,1000,mme_total_score,1172.0779311724689,
249
+ Cauldron,1000,mmmu_val_mmmu_acc,0.27667,
250
+ Cauldron,1000,mmstar_average,0.2911329978035828,
251
+ Cauldron,1000,ocrbench_ocrbench_accuracy,0.337,
252
+ Cauldron,1000,seedbench_seed_all,0.39360755975541967,
253
+ Cauldron,1000,textvqa_val_exact_match,0.44578,0.0067711747933144
254
+ Cauldron,2000,ai2d_exact_match,0.41871761658031087,0.008879446246519871
255
+ Cauldron,2000,average,0.34894207663644056,
256
+ Cauldron,2000,average_rank,1.9,
257
+ Cauldron,2000,chartqa_relaxed_overall,0.2056,0.00808440468059435
258
+ Cauldron,2000,docvqa_val_anls,0.37496112947656884,0.005489559822643159
259
+ Cauldron,2000,infovqa_val_anls,0.14667060624395192,0.005473110880489631
260
+ Cauldron,2000,mme_total_score,1248.6002400960383,
261
+ Cauldron,2000,mmmu_val_mmmu_acc,0.28667,
262
+ Cauldron,2000,mmstar_average,0.34478967650439835,
263
+ Cauldron,2000,ocrbench_ocrbench_accuracy,0.368,
264
+ Cauldron,2000,seedbench_seed_all,0.5013896609227348,
265
+ Cauldron,2000,textvqa_val_exact_match,0.49368,0.0068081481840761415
266
+ Cauldron,3000,ai2d_exact_match,0.4653497409326425,0.00897751861457722
267
+ Cauldron,3000,average,0.3647655686453986,
268
+ Cauldron,3000,average_rank,2.4,
269
+ Cauldron,3000,chartqa_relaxed_overall,0.2192,0.008275744025504309
270
+ Cauldron,3000,docvqa_val_anls,0.3999560247980121,0.005545460541574292
271
+ Cauldron,3000,infovqa_val_anls,0.15452276899525894,0.005625373377223539
272
+ Cauldron,3000,mme_total_score,1164.4316726690677,
273
+ Cauldron,3000,mmmu_val_mmmu_acc,0.27667,
274
+ Cauldron,3000,mmstar_average,0.34444117730168444,
275
+ Cauldron,3000,ocrbench_ocrbench_accuracy,0.403,
276
+ Cauldron,3000,seedbench_seed_all,0.5147304057809894,
277
+ Cauldron,3000,textvqa_val_exact_match,0.50502,0.006802809387533405
278
+ Cauldron,4000,ai2d_exact_match,0.48121761658031087,0.008992802471886854
279
+ Cauldron,4000,average,0.3694904966669109,
280
+ Cauldron,4000,average_rank,2.3,
281
+ Cauldron,4000,chartqa_relaxed_overall,0.2184,0.008264859294607735
282
+ Cauldron,4000,docvqa_val_anls,0.40927640030259055,0.005557758057811595
283
+ Cauldron,4000,infovqa_val_anls,0.15259984907145144,0.005629341537638722
284
+ Cauldron,4000,mme_total_score,1238.5236094437776,
285
+ Cauldron,4000,mmmu_val_mmmu_acc,0.26667,
286
+ Cauldron,4000,mmstar_average,0.36056167686607765,
287
+ Cauldron,4000,ocrbench_ocrbench_accuracy,0.414,
288
+ Cauldron,4000,seedbench_seed_all,0.5240689271817677,
289
+ Cauldron,4000,textvqa_val_exact_match,0.49862,0.006804563140709856
290
+ Cauldron,5000,ai2d_exact_match,0.48607512953367876,0.008995663534025174
291
+ Cauldron,5000,average,0.3715613183242104,
292
+ Cauldron,5000,average_rank,2.3,
293
+ Cauldron,5000,chartqa_relaxed_overall,0.2236,0.008334806752495259
294
+ Cauldron,5000,docvqa_val_anls,0.42332206291362884,0.005573327842684563
295
+ Cauldron,5000,infovqa_val_anls,0.15868297927477548,0.005670852175948406
296
+ Cauldron,5000,mme_total_score,1159.8522408963586,
297
+ Cauldron,5000,mmmu_val_mmmu_acc,0.26889,
298
+ Cauldron,5000,mmstar_average,0.360337335219157,
299
+ Cauldron,5000,ocrbench_ocrbench_accuracy,0.401,
300
+ Cauldron,5000,seedbench_seed_all,0.5198443579766537,
301
+ Cauldron,5000,textvqa_val_exact_match,0.5023,0.0068036313744923
302
+ Cauldron,6000,ai2d_exact_match,0.5025906735751295,0.008999033321198393
303
+ Cauldron,6000,average,0.3678206000506273,
304
+ Cauldron,6000,average_rank,2.2,
305
+ Cauldron,6000,chartqa_relaxed_overall,0.2228,0.008324168469720259
306
+ Cauldron,6000,docvqa_val_anls,0.4147154618557465,0.005557478918091434
307
+ Cauldron,6000,infovqa_val_anls,0.14825798330117057,0.005517775162348899
308
+ Cauldron,6000,mme_total_score,1182.059923969588,
309
+ Cauldron,6000,mmmu_val_mmmu_acc,0.27111,
310
+ Cauldron,6000,mmstar_average,0.3484854117958612,
311
+ Cauldron,6000,ocrbench_ocrbench_accuracy,0.391,
312
+ Cauldron,6000,seedbench_seed_all,0.5185658699277377,
313
+ Cauldron,6000,textvqa_val_exact_match,0.49285999999999996,0.0068052528515312825
314
+ Cauldron,7000,ai2d_exact_match,0.49838082901554404,0.008999106932714641
315
+ Cauldron,7000,average,0.3749288136256422,
316
+ Cauldron,7000,average_rank,2.0,
317
+ Cauldron,7000,chartqa_relaxed_overall,0.2276,0.00838733777631434
318
+ Cauldron,7000,docvqa_val_anls,0.42525461500166023,0.005595478547875609
319
+ Cauldron,7000,infovqa_val_anls,0.14305767989732765,0.005444282186253047
320
+ Cauldron,7000,mme_total_score,1262.065426170468,
321
+ Cauldron,7000,mmmu_val_mmmu_acc,0.29333,
322
+ Cauldron,7000,mmstar_average,0.35012603751558075,
323
+ Cauldron,7000,ocrbench_ocrbench_accuracy,0.403,
324
+ Cauldron,7000,seedbench_seed_all,0.5222901612006671,
325
+ Cauldron,7000,textvqa_val_exact_match,0.51132,0.00682164778449453
326
+ Cauldron,8000,ai2d_exact_match,0.49028497409326427,0.008997455247470544
327
+ Cauldron,8000,average,0.3674367285685282,
328
+ Cauldron,8000,average_rank,2.8,
329
+ Cauldron,8000,chartqa_relaxed_overall,0.2256,0.008361209238380008
330
+ Cauldron,8000,docvqa_val_anls,0.40937518311359955,0.005568234588180622
331
+ Cauldron,8000,infovqa_val_anls,0.14953110986986237,0.005518589617885333
332
+ Cauldron,8000,mme_total_score,1210.7711084433772,
333
+ Cauldron,8000,mmmu_val_mmmu_acc,0.28889,
334
+ Cauldron,8000,mmstar_average,0.32742675529850473,
335
+ Cauldron,8000,ocrbench_ocrbench_accuracy,0.406,
336
+ Cauldron,8000,seedbench_seed_all,0.512562534741523,
337
+ Cauldron,8000,textvqa_val_exact_match,0.49726000000000004,0.006823680165585169
338
+ Cauldron,9000,ai2d_exact_match,0.49287564766839376,0.008998240543632314
339
+ Cauldron,9000,average,0.3635862393983371,
340
+ Cauldron,9000,average_rank,3.0,
341
+ Cauldron,9000,chartqa_relaxed_overall,0.2264,0.008371693383064148
342
+ Cauldron,9000,docvqa_val_anls,0.4019142603693516,0.005557969721056488
343
+ Cauldron,9000,infovqa_val_anls,0.15576345355793061,0.005631711679425604
344
+ Cauldron,9000,mme_total_score,1161.06112444978,
345
+ Cauldron,9000,mmmu_val_mmmu_acc,0.27,
346
+ Cauldron,9000,mmstar_average,0.33510800699714055,
347
+ Cauldron,9000,ocrbench_ocrbench_accuracy,0.401,
348
+ Cauldron,9000,seedbench_seed_all,0.5066147859922179,
349
+ Cauldron,9000,textvqa_val_exact_match,0.4825999999999999,0.006824717089570126
350
+ Cauldron,10000,ai2d_exact_match,0.4951424870466321,0.008998729431386465
351
+ Cauldron,10000,average,0.3613896970671388,
352
+ Cauldron,10000,average_rank,3.2,
353
+ Cauldron,10000,chartqa_relaxed_overall,0.2276,0.00838733777631434
354
+ Cauldron,10000,docvqa_val_anls,0.400968382089468,0.005551850287661274
355
+ Cauldron,10000,infovqa_val_anls,0.15155496077062244,0.0055346119867504375
356
+ Cauldron,10000,mme_total_score,1230.2276910764306,
357
+ Cauldron,10000,mmmu_val_mmmu_acc,0.26,
358
+ Cauldron,10000,mmstar_average,0.32908517910608676,
359
+ Cauldron,10000,ocrbench_ocrbench_accuracy,0.395,
360
+ Cauldron,10000,seedbench_seed_all,0.4972762645914397,
361
+ Cauldron,10000,textvqa_val_exact_match,0.49588000000000004,0.006836984276038533
362
+ Cauldron,11000,ai2d_exact_match,0.49676165803108807,0.008998965371572357
363
+ Cauldron,11000,average,0.36198497174992383,
364
+ Cauldron,11000,average_rank,3.0,
365
+ Cauldron,11000,chartqa_relaxed_overall,0.2284,0.008397713059747491
366
+ Cauldron,11000,docvqa_val_anls,0.4051111426655002,0.0055740680205303966
367
+ Cauldron,11000,infovqa_val_anls,0.14954437197310022,0.005537262124650125
368
+ Cauldron,11000,mme_total_score,1210.5605242096838,
369
+ Cauldron,11000,mmmu_val_mmmu_acc,0.27111,
370
+ Cauldron,11000,mmstar_average,0.33316183100069335,
371
+ Cauldron,11000,ocrbench_ocrbench_accuracy,0.383,
372
+ Cauldron,11000,seedbench_seed_all,0.5043357420789327,
373
+ Cauldron,11000,textvqa_val_exact_match,0.48644,0.006834542228525236
374
+ Cauldron,12000,ai2d_exact_match,0.5009715025906736,0.008999137132137068
375
+ Cauldron,12000,average,0.3661893496614986,
376
+ Cauldron,12000,average_rank,3.2,
377
+ Cauldron,12000,chartqa_relaxed_overall,0.2332,0.008459061785476934
378
+ Cauldron,12000,docvqa_val_anls,0.40826612382074784,0.0055749766883040515
379
+ Cauldron,12000,infovqa_val_anls,0.1451043668322714,0.0054346014264420334
380
+ Cauldron,12000,mme_total_score,1204.859843937575,
381
+ Cauldron,12000,mmmu_val_mmmu_acc,0.29222,
382
+ Cauldron,12000,mmstar_average,0.3322773065724958,
383
+ Cauldron,12000,ocrbench_ocrbench_accuracy,0.386,
384
+ Cauldron,12000,seedbench_seed_all,0.5047248471372985,
385
+ Cauldron,12000,textvqa_val_exact_match,0.49294000000000004,0.006824466715369768
386
+ Cauldron,13000,ai2d_exact_match,0.4880181347150259,0.00899656981935399
387
+ Cauldron,13000,average,0.3609903418270159,
388
+ Cauldron,13000,average_rank,3.2,
389
+ Cauldron,13000,chartqa_relaxed_overall,0.23,0.008418334000200726
390
+ Cauldron,13000,docvqa_val_anls,0.39428463826041577,0.005550710740937849
391
+ Cauldron,13000,infovqa_val_anls,0.15077272156398794,0.005555043265840396
392
+ Cauldron,13000,mme_total_score,1199.0380152060825,
393
+ Cauldron,13000,mmmu_val_mmmu_acc,0.27667,
394
+ Cauldron,13000,mmstar_average,0.3323119954668039,
395
+ Cauldron,13000,ocrbench_ocrbench_accuracy,0.39,
396
+ Cauldron,13000,seedbench_seed_all,0.5000555864369094,
397
+ Cauldron,13000,textvqa_val_exact_match,0.4868,0.006822203492428118
398
+ Cauldron,14000,ai2d_exact_match,0.49060880829015546,0.00899756662777987
399
+ Cauldron,14000,average,0.36202481121184005,
400
+ Cauldron,14000,average_rank,2.9,
401
+ Cauldron,14000,chartqa_relaxed_overall,0.2264,0.008371693383064148
402
+ Cauldron,14000,docvqa_val_anls,0.40917044569115923,0.0055666808292464285
403
+ Cauldron,14000,infovqa_val_anls,0.1424839907142797,0.0054301311838352165
404
+ Cauldron,14000,mme_total_score,1183.6356542617045,
405
+ Cauldron,14000,mmmu_val_mmmu_acc,0.29,
406
+ Cauldron,14000,mmstar_average,0.31528335804531843,
407
+ Cauldron,14000,ocrbench_ocrbench_accuracy,0.393,
408
+ Cauldron,14000,seedbench_seed_all,0.5020566981656476,
409
+ Cauldron,14000,textvqa_val_exact_match,0.48922,0.006837726904596613
410
+ Cauldron,15000,ai2d_exact_match,0.4896373056994819,0.008997221155546275
411
+ Cauldron,15000,average,0.3560155869130515,
412
+ Cauldron,15000,average_rank,3.2,
413
+ Cauldron,15000,chartqa_relaxed_overall,0.2264,0.008371693383064148
414
+ Cauldron,15000,docvqa_val_anls,0.39997251595677663,0.0055655493795707745
415
+ Cauldron,15000,infovqa_val_anls,0.13834600428667498,0.005423970029609658
416
+ Cauldron,15000,mme_total_score,1171.8512404961984,
417
+ Cauldron,15000,mmmu_val_mmmu_acc,0.27667,
418
+ Cauldron,15000,mmstar_average,0.31369390041016126,
419
+ Cauldron,15000,ocrbench_ocrbench_accuracy,0.385,
420
+ Cauldron,15000,seedbench_seed_all,0.5010005558643691,
421
+ Cauldron,15000,textvqa_val_exact_match,0.47342,0.006818885551175648
422
+ Cauldron,16000,ai2d_exact_match,0.4838082901554404,0.008994434238637765
423
+ Cauldron,16000,average,0.3566345947908368,
424
+ Cauldron,16000,average_rank,3.4,
425
+ Cauldron,16000,chartqa_relaxed_overall,0.22,0.008286583553358689
426
+ Cauldron,16000,docvqa_val_anls,0.40446794741098796,0.005565712054024941
427
+ Cauldron,16000,infovqa_val_anls,0.1414810779340465,0.005414255001486301
428
+ Cauldron,16000,mme_total_score,1163.921468587435,
429
+ Cauldron,16000,mmmu_val_mmmu_acc,0.26444,
430
+ Cauldron,16000,mmstar_average,0.3211159497904861,
431
+ Cauldron,16000,ocrbench_ocrbench_accuracy,0.392,
432
+ Cauldron,16000,seedbench_seed_all,0.5045580878265703,
433
+ Cauldron,16000,textvqa_val_exact_match,0.47784,0.0068411071493878735
434
+ Cauldron,17000,ai2d_exact_match,0.4795984455958549,0.008991659681159872
435
+ Cauldron,17000,average,0.35664663136828295,
436
+ Cauldron,17000,average_rank,3.3,
437
+ Cauldron,17000,chartqa_relaxed_overall,0.2232,0.008329493152795851
438
+ Cauldron,17000,docvqa_val_anls,0.39683521379075226,0.0055483771434975925
439
+ Cauldron,17000,infovqa_val_anls,0.14519383287788715,0.005493162839439223
440
+ Cauldron,17000,mme_total_score,1216.2439975990396,
441
+ Cauldron,17000,mmmu_val_mmmu_acc,0.27667,
442
+ Cauldron,17000,mmstar_average,0.3294722845469949,
443
+ Cauldron,17000,ocrbench_ocrbench_accuracy,0.386,
444
+ Cauldron,17000,seedbench_seed_all,0.4938299055030573,
445
+ Cauldron,17000,textvqa_val_exact_match,0.47902,0.006822615153700749
446
+ Cauldron,18000,ai2d_exact_match,0.48575129533678757,0.008995499260034972
447
+ Cauldron,18000,average,0.3559572601168983,
448
+ Cauldron,18000,average_rank,3.3,
449
+ Cauldron,18000,chartqa_relaxed_overall,0.22,0.008286583553358689
450
+ Cauldron,18000,docvqa_val_anls,0.39553075414155453,0.005560094600545488
451
+ Cauldron,18000,infovqa_val_anls,0.1441200977793978,0.005482620397489444
452
+ Cauldron,18000,mme_total_score,1146.935774309724,
453
+ Cauldron,18000,mmmu_val_mmmu_acc,0.28333,
454
+ Cauldron,18000,mmstar_average,0.31718334943636844,
455
+ Cauldron,18000,ocrbench_ocrbench_accuracy,0.393,
456
+ Cauldron,18000,seedbench_seed_all,0.49571984435797667,
457
+ Cauldron,18000,textvqa_val_exact_match,0.46897999999999995,0.006834829544251984
458
+ Cauldron,19000,ai2d_exact_match,0.47506476683937826,0.00898795641911507
459
+ Cauldron,19000,average,0.35389113555756785,
460
+ Cauldron,19000,average_rank,3.4,
461
+ Cauldron,19000,chartqa_relaxed_overall,0.2196,0.008281169428700436
462
+ Cauldron,19000,docvqa_val_anls,0.3927677091095705,0.005557918115613283
463
+ Cauldron,19000,infovqa_val_anls,0.14242963523056748,0.005420426599891758
464
+ Cauldron,19000,mme_total_score,1156.7713085234095,
465
+ Cauldron,19000,mmmu_val_mmmu_acc,0.26667,
466
+ Cauldron,19000,mmstar_average,0.3300183589775604,
467
+ Cauldron,19000,ocrbench_ocrbench_accuracy,0.393,
468
+ Cauldron,19000,seedbench_seed_all,0.4895497498610339,
469
+ Cauldron,19000,textvqa_val_exact_match,0.47591999999999995,0.0068329619195279245
470
+ Cauldron,20000,ai2d_exact_match,0.48218911917098445,0.008993442748995703
471
+ Cauldron,20000,average,0.35315414152261965,
472
+ Cauldron,20000,average_rank,3.1,
473
+ Cauldron,20000,chartqa_relaxed_overall,0.2228,0.008324168469720259
474
+ Cauldron,20000,docvqa_val_anls,0.3995019956467228,0.005554102577571356
475
+ Cauldron,20000,infovqa_val_anls,0.13561089161386572,0.005312619238987202
476
+ Cauldron,20000,mme_total_score,1205.715886354542,
477
+ Cauldron,20000,mmmu_val_mmmu_acc,0.27667,
478
+ Cauldron,20000,mmstar_average,0.3019064734976851,
479
+ Cauldron,20000,ocrbench_ocrbench_accuracy,0.392,
480
+ Cauldron,20000,seedbench_seed_all,0.49182879377431904,
481
+ Cauldron,20000,textvqa_val_exact_match,0.4758799999999999,0.0068345144112400185
482
+ Cambrian,1000,ai2d_exact_match,0.2969559585492228,0.00822373246069825
483
+ Cambrian,1000,average,0.2927820669039429,
484
+ Cambrian,1000,average_rank,2.3,
485
+ Cambrian,1000,chartqa_relaxed_overall,0.3652,0.009631650506356148
486
+ Cambrian,1000,docvqa_val_anls,0.3321611875422322,0.005779917542014128
487
+ Cambrian,1000,infovqa_val_anls,0.14245417507906105,0.005737797137238206
488
+ Cambrian,1000,mme_total_score,1199.468087234894,
489
+ Cambrian,1000,mmmu_val_mmmu_acc,0.24556,
490
+ Cambrian,1000,mmstar_average,0.25503356223234036,
491
+ Cambrian,1000,ocrbench_ocrbench_accuracy,0.257,
492
+ Cambrian,1000,seedbench_seed_all,0.3486937187326292,
493
+ Cambrian,1000,textvqa_val_exact_match,0.39198,0.0066503820519040295
494
+ Cambrian,2000,ai2d_exact_match,0.36204663212435234,0.008649846657326264
495
+ Cambrian,2000,average,0.34977426052091565,
496
+ Cambrian,2000,average_rank,2.3,
497
+ Cambrian,2000,chartqa_relaxed_overall,0.4272,0.009895414680177737
498
+ Cambrian,2000,docvqa_val_anls,0.4044005302893221,0.006099745172446295
499
+ Cambrian,2000,infovqa_val_anls,0.16067123444748188,0.005906486800204124
500
+ Cambrian,2000,mme_total_score,1191.6502601040415,
501
+ Cambrian,2000,mmmu_val_mmmu_acc,0.27,
502
+ Cambrian,2000,mmstar_average,0.3140124492167455,
503
+ Cambrian,2000,ocrbench_ocrbench_accuracy,0.293,
504
+ Cambrian,2000,seedbench_seed_all,0.4954974986103391,
505
+ Cambrian,2000,textvqa_val_exact_match,0.42113999999999996,0.006720777771268006
506
+ Cambrian,3000,ai2d_exact_match,0.3954015544041451,0.008800034697838395
507
+ Cambrian,3000,average,0.36894910100121225,
508
+ Cambrian,3000,average_rank,1.9,
509
+ Cambrian,3000,chartqa_relaxed_overall,0.4512,0.00995424828018316
510
+ Cambrian,3000,docvqa_val_anls,0.4317442116227413,0.006203480507897517
511
+ Cambrian,3000,infovqa_val_anls,0.17555075927653038,0.006227695613801885
512
+ Cambrian,3000,mme_total_score,1311.187975190076,
513
+ Cambrian,3000,mmmu_val_mmmu_acc,0.28222,
514
+ Cambrian,3000,mmstar_average,0.3241666733128301,
515
+ Cambrian,3000,ocrbench_ocrbench_accuracy,0.289,
516
+ Cambrian,3000,seedbench_seed_all,0.5216787103946637,
517
+ Cambrian,3000,textvqa_val_exact_match,0.4495799999999999,0.006762330259763156
518
+ Cambrian,4000,ai2d_exact_match,0.3960492227979275,0.00880252039912977
519
+ Cambrian,4000,average,0.38270567946732525,
520
+ Cambrian,4000,average_rank,2.2,
521
+ Cambrian,4000,chartqa_relaxed_overall,0.4764,0.009990852959439592
522
+ Cambrian,4000,docvqa_val_anls,0.46350742276594625,0.006276498296530657
523
+ Cambrian,4000,infovqa_val_anls,0.17819320935276328,0.006230849386066924
524
+ Cambrian,4000,mme_total_score,1239.0667266906762,
525
+ Cambrian,4000,mmmu_val_mmmu_acc,0.26778,
526
+ Cambrian,4000,mmstar_average,0.3298927333298682,
527
+ Cambrian,4000,ocrbench_ocrbench_accuracy,0.334,
528
+ Cambrian,4000,seedbench_seed_all,0.5273485269594219,
529
+ Cambrian,4000,textvqa_val_exact_match,0.47118000000000004,0.0067854764061200295
530
+ Cambrian,5000,ai2d_exact_match,0.40382124352331605,0.00883109414387431
531
+ Cambrian,5000,average,0.3896927239658996,
532
+ Cambrian,5000,average_rank,2.2,
533
+ Cambrian,5000,chartqa_relaxed_overall,0.4912,0.01000045137036546
534
+ Cambrian,5000,docvqa_val_anls,0.47067674424138894,0.006257580396259991
535
+ Cambrian,5000,infovqa_val_anls,0.19432385292037085,0.00653326869729313
536
+ Cambrian,5000,mme_total_score,1214.843337334934,
537
+ Cambrian,5000,mmmu_val_mmmu_acc,0.26556,
538
+ Cambrian,5000,mmstar_average,0.3255942091936794,
539
+ Cambrian,5000,ocrbench_ocrbench_accuracy,0.348,
540
+ Cambrian,5000,seedbench_seed_all,0.5292384658143413,
541
+ Cambrian,5000,textvqa_val_exact_match,0.47881999999999997,0.0067962283116337965
542
+ Cambrian,6000,ai2d_exact_match,0.4183937823834197,0.00887848400426025
543
+ Cambrian,6000,average,0.39990121640985093,
544
+ Cambrian,6000,average_rank,2.4,
545
+ Cambrian,6000,chartqa_relaxed_overall,0.5048,0.010001539697392967
546
+ Cambrian,6000,docvqa_val_anls,0.5016482570925722,0.006248476976439708
547
+ Cambrian,6000,infovqa_val_anls,0.19206925076752404,0.006399951499514914
548
+ Cambrian,6000,mme_total_score,1176.5368147258905,
549
+ Cambrian,6000,mmmu_val_mmmu_acc,0.26667,
550
+ Cambrian,6000,mmstar_average,0.33910121942401966,
551
+ Cambrian,6000,ocrbench_ocrbench_accuracy,0.349,
552
+ Cambrian,6000,seedbench_seed_all,0.5391884380211228,
553
+ Cambrian,6000,textvqa_val_exact_match,0.48823999999999995,0.006792935247288521
554
+ Cambrian,7000,ai2d_exact_match,0.4326424870466321,0.008917121282993509
555
+ Cambrian,7000,average,0.40874111160527243,
556
+ Cambrian,7000,average_rank,2.2,
557
+ Cambrian,7000,chartqa_relaxed_overall,0.5088,0.01000045137036546
558
+ Cambrian,7000,docvqa_val_anls,0.5036441729071615,0.006331057466984081
559
+ Cambrian,7000,infovqa_val_anls,0.21047690542452482,0.0067248622097179815
560
+ Cambrian,7000,mme_total_score,1226.7814125650261,
561
+ Cambrian,7000,mmmu_val_mmmu_acc,0.29,
562
+ Cambrian,7000,mmstar_average,0.338458434622219,
563
+ Cambrian,7000,ocrbench_ocrbench_accuracy,0.366,
564
+ Cambrian,7000,seedbench_seed_all,0.5344080044469149,
565
+ Cambrian,7000,textvqa_val_exact_match,0.49423999999999996,0.006789004536492761
566
+ Cambrian,8000,ai2d_exact_match,0.4375,0.008928571428571428
567
+ Cambrian,8000,average,0.4145399236017655,
568
+ Cambrian,8000,average_rank,2.2,
569
+ Cambrian,8000,chartqa_relaxed_overall,0.5312,0.009982508912777261
570
+ Cambrian,8000,docvqa_val_anls,0.5139425879433994,0.006316907313170543
571
+ Cambrian,8000,infovqa_val_anls,0.20402472511542052,0.00665285157736885
572
+ Cambrian,8000,mme_total_score,1243.7800120048018,
573
+ Cambrian,8000,mmmu_val_mmmu_acc,0.28222,
574
+ Cambrian,8000,mmstar_average,0.3300028831814166,
575
+ Cambrian,8000,ocrbench_ocrbench_accuracy,0.397,
576
+ Cambrian,8000,seedbench_seed_all,0.5364091161756531,
577
+ Cambrian,8000,textvqa_val_exact_match,0.49855999999999995,0.006793174127235705
578
+ Cambrian,9000,ai2d_exact_match,0.4251943005181347,0.008897867521411106
579
+ Cambrian,9000,average,0.41587431550154147,
580
+ Cambrian,9000,average_rank,2.0,
581
+ Cambrian,9000,chartqa_relaxed_overall,0.5316,0.009982005418395102
582
+ Cambrian,9000,docvqa_val_anls,0.524278096798472,0.006327817979288962
583
+ Cambrian,9000,infovqa_val_anls,0.2075069347958689,0.006574086714467312
584
+ Cambrian,9000,mme_total_score,1196.0997398959585,
585
+ Cambrian,9000,mmmu_val_mmmu_acc,0.28556,
586
+ Cambrian,9000,mmstar_average,0.33833745626187595,
587
+ Cambrian,9000,ocrbench_ocrbench_accuracy,0.381,
588
+ Cambrian,9000,seedbench_seed_all,0.5456920511395219,
589
+ Cambrian,9000,textvqa_val_exact_match,0.5036999999999999,0.006790970877355565
590
+ Cambrian,10000,ai2d_exact_match,0.44559585492227977,0.008945723914357835
591
+ Cambrian,10000,average,0.41659534392300923,
592
+ Cambrian,10000,average_rank,2.0,
593
+ Cambrian,10000,chartqa_relaxed_overall,0.5416,0.00996732235888869
594
+ Cambrian,10000,docvqa_val_anls,0.5215772912722147,0.006314944464077694
595
+ Cambrian,10000,infovqa_val_anls,0.18925972424188112,0.006302599390246784
596
+ Cambrian,10000,mme_total_score,1241.6579631852742,
597
+ Cambrian,10000,mmmu_val_mmmu_acc,0.27889,
598
+ Cambrian,10000,mmstar_average,0.34495128935097424,
599
+ Cambrian,10000,ocrbench_ocrbench_accuracy,0.373,
600
+ Cambrian,10000,seedbench_seed_all,0.5510839355197332,
601
+ Cambrian,10000,textvqa_val_exact_match,0.5034000000000001,0.0067932111363852585
602
+ Cambrian,11000,ai2d_exact_match,0.4481865284974093,0.008950704796242765
603
+ Cambrian,11000,average,0.42096531591252645,
604
+ Cambrian,11000,average_rank,2.0,
605
+ Cambrian,11000,chartqa_relaxed_overall,0.5388,0.0099718403035556
606
+ Cambrian,11000,docvqa_val_anls,0.5266496382012209,0.006315639724937912
607
+ Cambrian,11000,infovqa_val_anls,0.210453542763111,0.006757501751011823
608
+ Cambrian,11000,mme_total_score,1288.1182472989194,
609
+ Cambrian,11000,mmmu_val_mmmu_acc,0.28556,
610
+ Cambrian,11000,mmstar_average,0.33813173019346515,
611
+ Cambrian,11000,ocrbench_ocrbench_accuracy,0.372,
612
+ Cambrian,11000,seedbench_seed_all,0.547526403557532,
613
+ Cambrian,11000,textvqa_val_exact_match,0.5213800000000001,0.00677771101429669
614
+ Cambrian,12000,ai2d_exact_match,0.4566062176165803,0.008965198879336198
615
+ Cambrian,12000,average,0.42647137409223257,
616
+ Cambrian,12000,average_rank,2.1,
617
+ Cambrian,12000,chartqa_relaxed_overall,0.5488,0.00995424828018316
618
+ Cambrian,12000,docvqa_val_anls,0.5432685128640529,0.006286968775744768
619
+ Cambrian,12000,infovqa_val_anls,0.214068867667478,0.006728697021311144
620
+ Cambrian,12000,mme_total_score,1272.0885354141656,
621
+ Cambrian,12000,mmmu_val_mmmu_acc,0.27556,
622
+ Cambrian,12000,mmstar_average,0.3364706975313428,
623
+ Cambrian,12000,ocrbench_ocrbench_accuracy,0.396,
624
+ Cambrian,12000,seedbench_seed_all,0.5505280711506393,
625
+ Cambrian,12000,textvqa_val_exact_match,0.51694,0.00676817323313926
626
+ Cambrian,13000,ai2d_exact_match,0.44591968911917096,0.008946359966425538
627
+ Cambrian,13000,average,0.42595033048849396,
628
+ Cambrian,13000,average_rank,2.1,
629
+ Cambrian,13000,chartqa_relaxed_overall,0.5484,0.009955029736109216
630
+ Cambrian,13000,docvqa_val_anls,0.5438384263330651,0.006322105329987294
631
+ Cambrian,13000,infovqa_val_anls,0.2206834922799479,0.006931006985711701
632
+ Cambrian,13000,mme_total_score,1294.3567426970787,
633
+ Cambrian,13000,mmmu_val_mmmu_acc,0.27889,
634
+ Cambrian,13000,mmstar_average,0.3258043460972802,
635
+ Cambrian,13000,ocrbench_ocrbench_accuracy,0.404,
636
+ Cambrian,13000,seedbench_seed_all,0.5466370205669816,
637
+ Cambrian,13000,textvqa_val_exact_match,0.5193800000000001,0.006779976160381913
638
+ Cambrian,14000,ai2d_exact_match,0.452720207253886,0.00895883074213608
639
+ Cambrian,14000,average,0.4290628718702856,
640
+ Cambrian,14000,average_rank,2.2,
641
+ Cambrian,14000,chartqa_relaxed_overall,0.5624,0.009923804147377265
642
+ Cambrian,14000,docvqa_val_anls,0.5501582985035621,0.006289139790552158
643
+ Cambrian,14000,infovqa_val_anls,0.2108586833777777,0.006694603397438603
644
+ Cambrian,14000,mme_total_score,1258.3851540616247,
645
+ Cambrian,14000,mmmu_val_mmmu_acc,0.28444,
646
+ Cambrian,14000,mmstar_average,0.3392338272359765,
647
+ Cambrian,14000,ocrbench_ocrbench_accuracy,0.391,
648
+ Cambrian,14000,seedbench_seed_all,0.5506948304613675,
649
+ Cambrian,14000,textvqa_val_exact_match,0.5200600000000001,0.006762031077483937
650
+ Cambrian,15000,ai2d_exact_match,0.4575777202072539,0.008966704964444827
651
+ Cambrian,15000,average,0.4277300448618869,
652
+ Cambrian,15000,average_rank,2.2,
653
+ Cambrian,15000,chartqa_relaxed_overall,0.5572,0.009936335154498413
654
+ Cambrian,15000,docvqa_val_anls,0.550106577844955,0.006305789516584643
655
+ Cambrian,15000,infovqa_val_anls,0.2065365477570411,0.006585265308234506
656
+ Cambrian,15000,mme_total_score,1191.499399759904,
657
+ Cambrian,15000,mmmu_val_mmmu_acc,0.27667,
658
+ Cambrian,15000,mmstar_average,0.3287834934674655,
659
+ Cambrian,15000,ocrbench_ocrbench_accuracy,0.403,
660
+ Cambrian,15000,seedbench_seed_all,0.5489160644802669,
661
+ Cambrian,15000,textvqa_val_exact_match,0.52078,0.006761241098810132
662
+ Cambrian,16000,ai2d_exact_match,0.45174870466321243,0.008957152666985158
663
+ Cambrian,16000,average,0.4283932783055524,
664
+ Cambrian,16000,average_rank,2.0,
665
+ Cambrian,16000,chartqa_relaxed_overall,0.566,0.00991448025705367
666
+ Cambrian,16000,docvqa_val_anls,0.5507111549470696,0.006298722691255348
667
+ Cambrian,16000,infovqa_val_anls,0.21185403234992514,0.0065982885956266755
668
+ Cambrian,16000,mme_total_score,1242.7407963185274,
669
+ Cambrian,16000,mmmu_val_mmmu_acc,0.28111,
670
+ Cambrian,16000,mmstar_average,0.32560559611383355,
671
+ Cambrian,16000,ocrbench_ocrbench_accuracy,0.394,
672
+ Cambrian,16000,seedbench_seed_all,0.5540300166759311,
673
+ Cambrian,16000,textvqa_val_exact_match,0.5204799999999999,0.006783488561456611
674
+ Cambrian,17000,ai2d_exact_match,0.4585492227979275,0.008968176705111413
675
+ Cambrian,17000,average,0.43044446070382536,
676
+ Cambrian,17000,average_rank,2.4,
677
+ Cambrian,17000,chartqa_relaxed_overall,0.5656,0.009915542506251351
678
+ Cambrian,17000,docvqa_val_anls,0.5528747665552118,0.006300095973166064
679
+ Cambrian,17000,infovqa_val_anls,0.20960594545383252,0.0066643358201217045
680
+ Cambrian,17000,mme_total_score,1292.4750900360143,
681
+ Cambrian,17000,mmmu_val_mmmu_acc,0.27111,
682
+ Cambrian,17000,mmstar_average,0.3297184661133375,
683
+ Cambrian,17000,ocrbench_ocrbench_accuracy,0.409,
684
+ Cambrian,17000,seedbench_seed_all,0.555141745414119,
685
+ Cambrian,17000,textvqa_val_exact_match,0.5224,0.006774129151791618
686
+ Cambrian,18000,ai2d_exact_match,0.4523963730569948,0.008958275210820045
687
+ Cambrian,18000,average,0.43086034100304976,
688
+ Cambrian,18000,average_rank,2.4,
689
+ Cambrian,18000,chartqa_relaxed_overall,0.566,0.00991448025705367
690
+ Cambrian,18000,docvqa_val_anls,0.5527950768923724,0.006311862091164367
691
+ Cambrian,18000,infovqa_val_anls,0.21943552260393814,0.006848865968629337
692
+ Cambrian,18000,mme_total_score,1271.4629851940776,
693
+ Cambrian,18000,mmmu_val_mmmu_acc,0.28333,
694
+ Cambrian,18000,mmstar_average,0.3399009269355101,
695
+ Cambrian,18000,ocrbench_ocrbench_accuracy,0.403,
696
+ Cambrian,18000,seedbench_seed_all,0.5493051695386326,
697
+ Cambrian,18000,textvqa_val_exact_match,0.5115799999999999,0.0067870754820260944
698
+ Cambrian,19000,ai2d_exact_match,0.45012953367875647,0.008954279299902583
699
+ Cambrian,19000,average,0.43057935657557483,
700
+ Cambrian,19000,average_rank,2.2,
701
+ Cambrian,19000,chartqa_relaxed_overall,0.5704,0.009902361269085337
702
+ Cambrian,19000,docvqa_val_anls,0.5526262050544066,0.006310038331338026
703
+ Cambrian,19000,infovqa_val_anls,0.21937034023427093,0.006858602078113178
704
+ Cambrian,19000,mme_total_score,1269.9476790716285,
705
+ Cambrian,19000,mmmu_val_mmmu_acc,0.28556,
706
+ Cambrian,19000,mmstar_average,0.3314266960826673,
707
+ Cambrian,19000,ocrbench_ocrbench_accuracy,0.404,
708
+ Cambrian,19000,seedbench_seed_all,0.5465814341300722,
709
+ Cambrian,19000,textvqa_val_exact_match,0.51512,0.006773909823053313
710
+ Cambrian,20000,ai2d_exact_match,0.45531088082901555,0.008963137311190377
711
+ Cambrian,20000,average,0.42817340693945505,
712
+ Cambrian,20000,average_rank,2.4,
713
+ Cambrian,20000,chartqa_relaxed_overall,0.5684,0.009907968668564455
714
+ Cambrian,20000,docvqa_val_anls,0.549188563518089,0.006325944032596611
715
+ Cambrian,20000,infovqa_val_anls,0.21755406764942647,0.0068363256354831885
716
+ Cambrian,20000,mme_total_score,1290.6296518607442,
717
+ Cambrian,20000,mmmu_val_mmmu_acc,0.28444,
718
+ Cambrian,20000,mmstar_average,0.32485343172593534,
719
+ Cambrian,20000,ocrbench_ocrbench_accuracy,0.392,
720
+ Cambrian,20000,seedbench_seed_all,0.5486937187326293,
721
+ Cambrian,20000,textvqa_val_exact_match,0.51312,0.006789609184524225
722
+ LLaVa,1000,ai2d_exact_match,0.25777202072538863,0.007872600874396432
723
+ LLaVa,1000,average,0.2581360512843851,
724
+ LLaVa,1000,average_rank,3.0,
725
+ LLaVa,1000,chartqa_relaxed_overall,0.1576,0.007288768514542319
726
+ LLaVa,1000,docvqa_val_anls,0.2850280465017524,0.005237571860745478
727
+ LLaVa,1000,infovqa_val_anls,0.15291302898150733,0.005597827181699182
728
+ LLaVa,1000,mme_total_score,844.0894357743098,
729
+ LLaVa,1000,mmmu_val_mmmu_acc,0.25333,
730
+ LLaVa,1000,mmstar_average,0.22969486173769915,
731
+ LLaVa,1000,ocrbench_ocrbench_accuracy,0.35,
732
+ LLaVa,1000,seedbench_seed_all,0.2717065036131184,
733
+ LLaVa,1000,textvqa_val_exact_match,0.36518,0.006561838543046682
734
+ LLaVa,2000,ai2d_exact_match,0.24676165803108807,0.007759553547248649
735
+ LLaVa,2000,average,0.28023175511348764,
736
+ LLaVa,2000,average_rank,3.2,
737
+ LLaVa,2000,chartqa_relaxed_overall,0.19,0.007847587772910948
738
+ LLaVa,2000,docvqa_val_anls,0.31839133336930814,0.005353711170722305
739
+ LLaVa,2000,infovqa_val_anls,0.1625232406439703,0.005680709103352321
740
+ LLaVa,2000,mme_total_score,677.0834333733493,
741
+ LLaVa,2000,mmmu_val_mmmu_acc,0.25111,
742
+ LLaVa,2000,mmstar_average,0.2602226545829147,
743
+ LLaVa,2000,ocrbench_ocrbench_accuracy,0.389,
744
+ LLaVa,2000,seedbench_seed_all,0.2864369093941078,
745
+ LLaVa,2000,textvqa_val_exact_match,0.41764000000000007,0.006695635323587844
746
+ LLaVa,3000,ai2d_exact_match,0.31541450777202074,0.00836346730591157
747
+ LLaVa,3000,average,0.3241247472461608,
748
+ LLaVa,3000,average_rank,3.1,
749
+ LLaVa,3000,chartqa_relaxed_overall,0.2048,0.008072722684486087
750
+ LLaVa,3000,docvqa_val_anls,0.33927313841893186,0.005424261898744584
751
+ LLaVa,3000,infovqa_val_anls,0.17400826017663457,0.005878416771815313
752
+ LLaVa,3000,mme_total_score,674.5895358143258,
753
+ LLaVa,3000,mmmu_val_mmmu_acc,0.27778,
754
+ LLaVa,3000,mmstar_average,0.28839612401739867,
755
+ LLaVa,3000,ocrbench_ocrbench_accuracy,0.428,
756
+ LLaVa,3000,seedbench_seed_all,0.4512506948304614,
757
+ LLaVa,3000,textvqa_val_exact_match,0.4382,0.006743326070219196
758
+ LLaVa,4000,ai2d_exact_match,0.30667098445595853,0.008299228398743067
759
+ LLaVa,4000,average,0.34151562451124173,
760
+ LLaVa,4000,average_rank,2.8,
761
+ LLaVa,4000,chartqa_relaxed_overall,0.2168,0.00824295350666284
762
+ LLaVa,4000,docvqa_val_anls,0.36894439928615425,0.005583877165382837
763
+ LLaVa,4000,infovqa_val_anls,0.1815741433661475,0.005975096001960774
764
+ LLaVa,4000,mme_total_score,660.3387354941976,
765
+ LLaVa,4000,mmmu_val_mmmu_acc,0.29444,
766
+ LLaVa,4000,mmstar_average,0.3089940618086463,
767
+ LLaVa,4000,ocrbench_ocrbench_accuracy,0.439,
768
+ LLaVa,4000,seedbench_seed_all,0.48265703168426904,
769
+ LLaVa,4000,textvqa_val_exact_match,0.4745599999999999,0.006778004835488831
770
+ LLaVa,5000,ai2d_exact_match,0.3176813471502591,0.00837955903737489
771
+ LLaVa,5000,average,0.3488971740226244,
772
+ LLaVa,5000,average_rank,2.9,
773
+ LLaVa,5000,chartqa_relaxed_overall,0.2076,0.008113397986710395
774
+ LLaVa,5000,docvqa_val_anls,0.37667351380566144,0.005504553709162657
775
+ LLaVa,5000,infovqa_val_anls,0.19157302816202296,0.006066754825254386
776
+ LLaVa,5000,mme_total_score,596.045218087235,
777
+ LLaVa,5000,mmmu_val_mmmu_acc,0.28889,
778
+ LLaVa,5000,mmstar_average,0.30911460927022283,
779
+ LLaVa,5000,ocrbench_ocrbench_accuracy,0.471,
780
+ LLaVa,5000,seedbench_seed_all,0.49972206781545303,
781
+ LLaVa,5000,textvqa_val_exact_match,0.47781999999999997,0.00678922884027701
782
+ LLaVa,6000,ai2d_exact_match,0.3626943005181347,0.00865318426683941
783
+ LLaVa,6000,average,0.35336013036474917,
784
+ LLaVa,6000,average_rank,3.3,
785
+ LLaVa,6000,chartqa_relaxed_overall,0.2164,0.00823744852629073
786
+ LLaVa,6000,docvqa_val_anls,0.3796381971300078,0.005512363416378596
787
+ LLaVa,6000,infovqa_val_anls,0.1911083172357537,0.00606756561226675
788
+ LLaVa,6000,mme_total_score,751.7179871948779,
789
+ LLaVa,6000,mmmu_val_mmmu_acc,0.27111,
790
+ LLaVa,6000,mmstar_average,0.3230226430014031,
791
+ LLaVa,6000,ocrbench_ocrbench_accuracy,0.471,
792
+ LLaVa,6000,seedbench_seed_all,0.49788771539744303,
793
+ LLaVa,6000,textvqa_val_exact_match,0.46738,0.006777431212101451
794
+ LLaVa,7000,ai2d_exact_match,0.3636658031088083,0.008658158841882565
795
+ LLaVa,7000,average,0.36232264653787655,
796
+ LLaVa,7000,average_rank,3.4,
797
+ LLaVa,7000,chartqa_relaxed_overall,0.2276,0.00838733777631434
798
+ LLaVa,7000,docvqa_val_anls,0.38862032747814834,0.005554025202613156
799
+ LLaVa,7000,infovqa_val_anls,0.1987523491607365,0.006169459873730798
800
+ LLaVa,7000,mme_total_score,700.0341136454582,
801
+ LLaVa,7000,mmmu_val_mmmu_acc,0.28,
802
+ LLaVa,7000,mmstar_average,0.32238002502982693,
803
+ LLaVa,7000,ocrbench_ocrbench_accuracy,0.469,
804
+ LLaVa,7000,seedbench_seed_all,0.5175653140633686,
805
+ LLaVa,7000,textvqa_val_exact_match,0.49332,0.006784414578741135
806
+ LLaVa,8000,ai2d_exact_match,0.38244818652849744,0.008746910624026853
807
+ LLaVa,8000,average,0.36916094621046264,
808
+ LLaVa,8000,average_rank,2.8,
809
+ LLaVa,8000,chartqa_relaxed_overall,0.2276,0.00838733777631434
810
+ LLaVa,8000,docvqa_val_anls,0.4000384036155175,0.005647492303754258
811
+ LLaVa,8000,infovqa_val_anls,0.20267340215584623,0.006186451136703468
812
+ LLaVa,8000,mme_total_score,787.0998399359744,
813
+ LLaVa,8000,mmmu_val_mmmu_acc,0.28333,
814
+ LLaVa,8000,mmstar_average,0.33877512170436386,
815
+ LLaVa,8000,ocrbench_ocrbench_accuracy,0.47,
816
+ LLaVa,8000,seedbench_seed_all,0.5221234018899389,
817
+ LLaVa,8000,textvqa_val_exact_match,0.49546,0.006796875545678079
818
+ LLaVa,9000,ai2d_exact_match,0.3856865284974093,0.008760803506529557
819
+ LLaVa,9000,average,0.3660729124456708,
820
+ LLaVa,9000,average_rank,3.0,
821
+ LLaVa,9000,chartqa_relaxed_overall,0.2212,0.00830275847651416
822
+ LLaVa,9000,docvqa_val_anls,0.3961556104365206,0.005555787005997977
823
+ LLaVa,9000,infovqa_val_anls,0.20795411138332273,0.006302696156883479
824
+ LLaVa,9000,mme_total_score,697.6510604241697,
825
+ LLaVa,9000,mmmu_val_mmmu_acc,0.27444,
826
+ LLaVa,9000,mmstar_average,0.33019217959261743,
827
+ LLaVa,9000,ocrbench_ocrbench_accuracy,0.47,
828
+ LLaVa,9000,seedbench_seed_all,0.5140077821011673,
829
+ LLaVa,9000,textvqa_val_exact_match,0.49501999999999996,0.006795224421237829
830
+ LLaVa,10000,ai2d_exact_match,0.3636658031088083,0.008658158841882561
831
+ LLaVa,10000,average,0.36465272894871764,
832
+ LLaVa,10000,average_rank,3.1,
833
+ LLaVa,10000,chartqa_relaxed_overall,0.2216,0.008308127706914342
834
+ LLaVa,10000,docvqa_val_anls,0.3905169927438113,0.005559588309122447
835
+ LLaVa,10000,infovqa_val_anls,0.210842797817216,0.0062742161273205005
836
+ LLaVa,10000,mme_total_score,710.1757703081232,
837
+ LLaVa,10000,mmmu_val_mmmu_acc,0.25667,
838
+ LLaVa,10000,mmstar_average,0.33485115141559363,
839
+ LLaVa,10000,ocrbench_ocrbench_accuracy,0.484,
840
+ LLaVa,10000,seedbench_seed_all,0.5220678154530295,
841
+ LLaVa,10000,textvqa_val_exact_match,0.49766000000000005,0.0067820722630208075
842
+ LLaVa,11000,ai2d_exact_match,0.3539507772020725,0.008606685322379343
843
+ LLaVa,11000,average,0.3619647158138698,
844
+ LLaVa,11000,average_rank,3.3,
845
+ LLaVa,11000,chartqa_relaxed_overall,0.226,0.008366456779283321
846
+ LLaVa,11000,docvqa_val_anls,0.39615321520069524,0.0055548098783566
847
+ LLaVa,11000,infovqa_val_anls,0.20231707967850712,0.006189706400735626
848
+ LLaVa,11000,mme_total_score,620.8629451780713,
849
+ LLaVa,11000,mmmu_val_mmmu_acc,0.26778,
850
+ LLaVa,11000,mmstar_average,0.3504522318333254,
851
+ LLaVa,11000,ocrbench_ocrbench_accuracy,0.48,
852
+ LLaVa,11000,seedbench_seed_all,0.5084491384102279,
853
+ LLaVa,11000,textvqa_val_exact_match,0.47257999999999994,0.0067942373414689025
854
+ LLaVa,12000,ai2d_exact_match,0.3963730569948187,0.008803757198545707
855
+ LLaVa,12000,average,0.36835635606525785,
856
+ LLaVa,12000,average_rank,3.1,
857
+ LLaVa,12000,chartqa_relaxed_overall,0.234,0.008469137530835504
858
+ LLaVa,12000,docvqa_val_anls,0.3998087503562603,0.005606788206948343
859
+ LLaVa,12000,infovqa_val_anls,0.19486992137918643,0.006137557366661157
860
+ LLaVa,12000,mme_total_score,707.7871148459384,
861
+ LLaVa,12000,mmmu_val_mmmu_acc,0.26444,
862
+ LLaVa,12000,mmstar_average,0.34510216846405867,
863
+ LLaVa,12000,ocrbench_ocrbench_accuracy,0.466,
864
+ LLaVa,12000,seedbench_seed_all,0.5159533073929962,
865
+ LLaVa,12000,textvqa_val_exact_match,0.49866000000000005,0.006787787245571138
866
+ LLaVa,13000,ai2d_exact_match,0.37661917098445596,0.008720866089740391
867
+ LLaVa,13000,average,0.3660925061677603,
868
+ LLaVa,13000,average_rank,3.2,
869
+ LLaVa,13000,chartqa_relaxed_overall,0.23,0.008418334000200726
870
+ LLaVa,13000,docvqa_val_anls,0.39678037656395876,0.005562201990102385
871
+ LLaVa,13000,infovqa_val_anls,0.20007389352596994,0.006181717086032354
872
+ LLaVa,13000,mme_total_score,762.4510804321728,
873
+ LLaVa,13000,mmmu_val_mmmu_acc,0.26111,
874
+ LLaVa,13000,mmstar_average,0.3487764851969923,
875
+ LLaVa,13000,ocrbench_ocrbench_accuracy,0.487,
876
+ LLaVa,13000,seedbench_seed_all,0.5187326292384659,
877
+ LLaVa,13000,textvqa_val_exact_match,0.47573999999999994,0.006786037174972445
878
+ LLaVa,14000,ai2d_exact_match,0.40382124352331605,0.008831094143874325
879
+ LLaVa,14000,average,0.3665520961603681,
880
+ LLaVa,14000,average_rank,3.5,
881
+ LLaVa,14000,chartqa_relaxed_overall,0.224,0.0083401092900026
882
+ LLaVa,14000,docvqa_val_anls,0.39653795108545226,0.0055480083540036754
883
+ LLaVa,14000,infovqa_val_anls,0.1966338205713239,0.006145830112184984
884
+ LLaVa,14000,mme_total_score,648.8810524209684,
885
+ LLaVa,14000,mmmu_val_mmmu_acc,0.27222,
886
+ LLaVa,14000,mmstar_average,0.3348780070169728,
887
+ LLaVa,14000,ocrbench_ocrbench_accuracy,0.482,
888
+ LLaVa,14000,seedbench_seed_all,0.5121178432462479,
889
+ LLaVa,14000,textvqa_val_exact_match,0.47676,0.006784540255411228
890
+ LLaVa,15000,ai2d_exact_match,0.38374352331606215,0.008752516998880439
891
+ LLaVa,15000,average,0.3656314014070533,
892
+ LLaVa,15000,average_rank,3.3,
893
+ LLaVa,15000,chartqa_relaxed_overall,0.222,0.008313485768211027
894
+ LLaVa,15000,docvqa_val_anls,0.3956148602850384,0.005571289516040145
895
+ LLaVa,15000,infovqa_val_anls,0.2003939669503818,0.006205919365204143
896
+ LLaVa,15000,mme_total_score,744.8995598239295,
897
+ LLaVa,15000,mmmu_val_mmmu_acc,0.25111,
898
+ LLaVa,15000,mmstar_average,0.34431451447442113,
899
+ LLaVa,15000,ocrbench_ocrbench_accuracy,0.491,
900
+ LLaVa,15000,seedbench_seed_all,0.5223457476375765,
901
+ LLaVa,15000,textvqa_val_exact_match,0.48016000000000003,0.006780152577471598
902
+ LLaVa,16000,ai2d_exact_match,0.38244818652849744,0.008746910624026851
903
+ LLaVa,16000,average,0.3664952284054124,
904
+ LLaVa,16000,average_rank,3.1,
905
+ LLaVa,16000,chartqa_relaxed_overall,0.2272,0.008382133861209024
906
+ LLaVa,16000,docvqa_val_anls,0.3971604594021061,0.005596507964441207
907
+ LLaVa,16000,infovqa_val_anls,0.20130541865614268,0.006177273754737603
908
+ LLaVa,16000,mme_total_score,741.5084033613446,
909
+ LLaVa,16000,mmmu_val_mmmu_acc,0.25444,
910
+ LLaVa,16000,mmstar_average,0.34322789378570057,
911
+ LLaVa,16000,ocrbench_ocrbench_accuracy,0.488,
912
+ LLaVa,16000,seedbench_seed_all,0.5151750972762645,
913
+ LLaVa,16000,textvqa_val_exact_match,0.4895,0.0067890182024819105
914
+ LLaVa,17000,ai2d_exact_match,0.36852331606217614,0.008682460781863906
915
+ LLaVa,17000,average,0.3659850040618015,
916
+ LLaVa,17000,average_rank,3.0,
917
+ LLaVa,17000,chartqa_relaxed_overall,0.2264,0.008371693383064148
918
+ LLaVa,17000,docvqa_val_anls,0.3895535425900796,0.005559420230793686
919
+ LLaVa,17000,infovqa_val_anls,0.19870913061640477,0.0061833458200064835
920
+ LLaVa,17000,mme_total_score,738.0654261704681,
921
+ LLaVa,17000,mmmu_val_mmmu_acc,0.27667,
922
+ LLaVa,17000,mmstar_average,0.3488362957589257,
923
+ LLaVa,17000,ocrbench_ocrbench_accuracy,0.486,
924
+ LLaVa,17000,seedbench_seed_all,0.514952751528627,
925
+ LLaVa,17000,textvqa_val_exact_match,0.48422,0.006797929147037179
926
+ LLaVa,18000,ai2d_exact_match,0.3785621761658031,0.008729696327646351
927
+ LLaVa,18000,average,0.3667559662544118,
928
+ LLaVa,18000,average_rank,3.1,
929
+ LLaVa,18000,chartqa_relaxed_overall,0.2268,0.008376919070233621
930
+ LLaVa,18000,docvqa_val_anls,0.39054490192374947,0.005557124380968682
931
+ LLaVa,18000,infovqa_val_anls,0.19983100041999644,0.006171606410532323
932
+ LLaVa,18000,mme_total_score,746.5269107643057,
933
+ LLaVa,18000,mmmu_val_mmmu_acc,0.27,
934
+ LLaVa,18000,mmstar_average,0.3522401814266279,
935
+ LLaVa,18000,ocrbench_ocrbench_accuracy,0.497,
936
+ LLaVa,18000,seedbench_seed_all,0.5137854363535297,
937
+ LLaVa,18000,textvqa_val_exact_match,0.47203999999999996,0.006793178720998519
938
+ LLaVa,19000,ai2d_exact_match,0.3707901554404145,0.008693477555877339
939
+ LLaVa,19000,average,0.3627892845719615,
940
+ LLaVa,19000,average_rank,3.2,
941
+ LLaVa,19000,chartqa_relaxed_overall,0.2284,0.008397713059747491
942
+ LLaVa,19000,docvqa_val_anls,0.3886627325813464,0.005572189741680524
943
+ LLaVa,19000,infovqa_val_anls,0.18766806187395813,0.006047287494792444
944
+ LLaVa,19000,mme_total_score,735.0644257703082,
945
+ LLaVa,19000,mmmu_val_mmmu_acc,0.27556,
946
+ LLaVa,19000,mmstar_average,0.34617955399790473,
947
+ LLaVa,19000,ocrbench_ocrbench_accuracy,0.487,
948
+ LLaVa,19000,seedbench_seed_all,0.50550305725403,
949
+ LLaVa,19000,textvqa_val_exact_match,0.47534,0.00678734045691651
950
+ LLaVa,20000,ai2d_exact_match,0.3746761658031088,0.008711886524907501
951
+ LLaVa,20000,average,0.3636232406961286,
952
+ LLaVa,20000,average_rank,3.3,
953
+ LLaVa,20000,chartqa_relaxed_overall,0.2224,0.00831883268198588
954
+ LLaVa,20000,docvqa_val_anls,0.3865323770909091,0.005551659686181904
955
+ LLaVa,20000,infovqa_val_anls,0.1967140503390298,0.006138459642690392
956
+ LLaVa,20000,mme_total_score,688.5517206882753,
957
+ LLaVa,20000,mmmu_val_mmmu_acc,0.27556,
958
+ LLaVa,20000,mmstar_average,0.3525069399025931,
959
+ LLaVa,20000,ocrbench_ocrbench_accuracy,0.494,
960
+ LLaVa,20000,seedbench_seed_all,0.5113396331295164,
961
+ LLaVa,20000,textvqa_val_exact_match,0.45888,0.006775175991953595
app/src/content/assets/data/against_baselines_deduplicated.csv ADDED
@@ -0,0 +1,828 @@
1
+ run,step,metric,value,stderr
2
+ FineVisionDD,1200,average,0.264341097123272,
3
+ FineVisionDD,1200,average_rank,2.5714285714285716,
4
+ FineVisionDD,1200,docvqa_val_anls,0.3715200680496628,0.005949832790823121
5
+ FineVisionDD,1200,infovqa_val_anls,0.19222676120723237,0.006565134600763451
6
+ FineVisionDD,1200,mme_total_score,743.1522609043617,
7
+ FineVisionDD,1200,mmmu_val_mmmu_acc,0.26222,
8
+ FineVisionDD,1200,mmstar_average,0.21525975348273643,
9
+ FineVisionDD,1200,ocrbench_ocrbench_accuracy,0.3,
10
+ FineVisionDD,1200,textvqa_val_exact_match,0.24482000000000004,0.005905726800471586
11
+ FineVisionDD,2400,average,0.3178775750923926,
12
+ FineVisionDD,2400,average_rank,2.4285714285714284,
13
+ FineVisionDD,2400,docvqa_val_anls,0.47030638473718095,0.006228583735740807
14
+ FineVisionDD,2400,infovqa_val_anls,0.20933736286426122,0.006709818578853176
15
+ FineVisionDD,2400,mme_total_score,1185.2899159663866,
16
+ FineVisionDD,2400,mmmu_val_mmmu_acc,0.25,
17
+ FineVisionDD,2400,mmstar_average,0.24490170295291339,
18
+ FineVisionDD,2400,ocrbench_ocrbench_accuracy,0.384,
19
+ FineVisionDD,2400,textvqa_val_exact_match,0.34872,0.00652553360559637
20
+ FineVisionDD,3600,average,0.34596783716441254,
21
+ FineVisionDD,3600,average_rank,2.4285714285714284,
22
+ FineVisionDD,3600,docvqa_val_anls,0.52073479618703,0.006284214687786431
23
+ FineVisionDD,3600,infovqa_val_anls,0.22809076679417026,0.006878849345111437
24
+ FineVisionDD,3600,mme_total_score,1168.4510804321728,
25
+ FineVisionDD,3600,mmmu_val_mmmu_acc,0.25667,
26
+ FineVisionDD,3600,mmstar_average,0.23323146000527503,
27
+ FineVisionDD,3600,ocrbench_ocrbench_accuracy,0.454,
28
+ FineVisionDD,3600,textvqa_val_exact_match,0.38308000000000003,0.0066477952252059665
29
+ FineVisionDD,4800,average,0.3549622071061929,
30
+ FineVisionDD,4800,average_rank,2.2857142857142856,
31
+ FineVisionDD,4800,docvqa_val_anls,0.5347116037470354,0.006161120918755636
32
+ FineVisionDD,4800,infovqa_val_anls,0.22616829864068178,0.006791811877573115
33
+ FineVisionDD,4800,mme_total_score,1067.0920368147258,
34
+ FineVisionDD,4800,mmmu_val_mmmu_acc,0.27444,
35
+ FineVisionDD,4800,mmstar_average,0.23307334024944037,
36
+ FineVisionDD,4800,ocrbench_ocrbench_accuracy,0.473,
37
+ FineVisionDD,4800,textvqa_val_exact_match,0.38837999999999995,0.006654731565618713
38
+ FineVisionDD,6000,average,0.3848921103122081,
39
+ FineVisionDD,6000,average_rank,2.142857142857143,
40
+ FineVisionDD,6000,docvqa_val_anls,0.5762794835718067,0.006247345256607651
41
+ FineVisionDD,6000,infovqa_val_anls,0.25437900510747613,0.007245969162163573
42
+ FineVisionDD,6000,mme_total_score,1182.3837535014004,
43
+ FineVisionDD,6000,mmmu_val_mmmu_acc,0.27222,
44
+ FineVisionDD,6000,mmstar_average,0.2747341731939661,
45
+ FineVisionDD,6000,ocrbench_ocrbench_accuracy,0.495,
46
+ FineVisionDD,6000,textvqa_val_exact_match,0.43673999999999996,0.006759376621735387
47
+ FineVisionDD,7200,average,0.3978156352765745,
48
+ FineVisionDD,7200,average_rank,1.8571428571428572,
49
+ FineVisionDD,7200,docvqa_val_anls,0.5914916761381446,0.006230792162717311
50
+ FineVisionDD,7200,infovqa_val_anls,0.2584115961449724,0.007214877478455323
51
+ FineVisionDD,7200,mme_total_score,1174.9931972789116,
52
+ FineVisionDD,7200,mmmu_val_mmmu_acc,0.28889,
53
+ FineVisionDD,7200,mmstar_average,0.30312053937633016,
54
+ FineVisionDD,7200,ocrbench_ocrbench_accuracy,0.501,
55
+ FineVisionDD,7200,textvqa_val_exact_match,0.44398000000000004,0.006765405092173878
56
+ FineVisionDD,8400,average,0.4059159035113804,
57
+ FineVisionDD,8400,average_rank,1.7142857142857142,
58
+ FineVisionDD,8400,docvqa_val_anls,0.6115548076222326,0.006189572923188405
59
+ FineVisionDD,8400,infovqa_val_anls,0.2617197889496108,0.007158591695868175
60
+ FineVisionDD,8400,mme_total_score,1252.2165866346538,
61
+ FineVisionDD,8400,mmmu_val_mmmu_acc,0.29444,
62
+ FineVisionDD,8400,mmstar_average,0.285260824496439,
63
+ FineVisionDD,8400,ocrbench_ocrbench_accuracy,0.52,
64
+ FineVisionDD,8400,textvqa_val_exact_match,0.4625200000000001,0.0067937236370175695
65
+ FineVisionDD,9600,average,0.41115899049749083,
66
+ FineVisionDD,9600,average_rank,1.5714285714285714,
67
+ FineVisionDD,9600,docvqa_val_anls,0.6213641622467091,0.006165172206432181
68
+ FineVisionDD,9600,infovqa_val_anls,0.2757908658532091,0.007363785243871019
69
+ FineVisionDD,9600,mme_total_score,1239.7746098439375,
70
+ FineVisionDD,9600,mmmu_val_mmmu_acc,0.29444,
71
+ FineVisionDD,9600,mmstar_average,0.2999389148850269,
72
+ FineVisionDD,9600,ocrbench_ocrbench_accuracy,0.519,
73
+ FineVisionDD,9600,textvqa_val_exact_match,0.45642000000000005,0.006788827170791062
74
+ FineVisionDD,10800,average,0.41894565175282533,
75
+ FineVisionDD,10800,average_rank,1.1428571428571428,
76
+ FineVisionDD,10800,docvqa_val_anls,0.6353621980573124,0.006124533744452508
77
+ FineVisionDD,10800,infovqa_val_anls,0.26751996667040645,0.0071172404352328284
78
+ FineVisionDD,10800,mme_total_score,1353.3499399759903,
79
+ FineVisionDD,10800,mmmu_val_mmmu_acc,0.29778,
80
+ FineVisionDD,10800,mmstar_average,0.325351745789233,
81
+ FineVisionDD,10800,ocrbench_ocrbench_accuracy,0.516,
82
+ FineVisionDD,10800,textvqa_val_exact_match,0.47165999999999997,0.0067931287489374085
83
+ FineVisionDD,12000,average,0.4208515127756214,
84
+ FineVisionDD,12000,average_rank,1.4285714285714286,
85
+ FineVisionDD,12000,docvqa_val_anls,0.6294351828158641,0.006169625021925361
86
+ FineVisionDD,12000,infovqa_val_anls,0.2797661440287805,0.007408513793528687
87
+ FineVisionDD,12000,mme_total_score,1091.6394557823128,
88
+ FineVisionDD,12000,mmmu_val_mmmu_acc,0.29556,
89
+ FineVisionDD,12000,mmstar_average,0.32114774980908367,
90
+ FineVisionDD,12000,ocrbench_ocrbench_accuracy,0.525,
91
+ FineVisionDD,12000,textvqa_val_exact_match,0.4742,0.006787465354400525
92
+ FineVisionDD,13200,average,0.42658753741516975,
93
+ FineVisionDD,13200,average_rank,1.5714285714285714,
94
+ FineVisionDD,13200,docvqa_val_anls,0.6427877927509281,0.006125147292514003
95
+ FineVisionDD,13200,infovqa_val_anls,0.2907270038093242,0.007372590798085613
96
+ FineVisionDD,13200,mme_total_score,1211.7135854341736,
97
+ FineVisionDD,13200,mmmu_val_mmmu_acc,0.28889,
98
+ FineVisionDD,13200,mmstar_average,0.30988042793076603,
99
+ FineVisionDD,13200,ocrbench_ocrbench_accuracy,0.546,
100
+ FineVisionDD,13200,textvqa_val_exact_match,0.48123999999999995,0.0068072667243212395
101
+ FineVisionDD,14400,average,0.4273536900736185,
102
+ FineVisionDD,14400,average_rank,1.5714285714285714,
103
+ FineVisionDD,14400,docvqa_val_anls,0.654480111743584,0.006079437400066777
104
+ FineVisionDD,14400,infovqa_val_anls,0.2776743812062677,0.007152404684338895
105
+ FineVisionDD,14400,mme_total_score,1211.577330932373,
106
+ FineVisionDD,14400,mmmu_val_mmmu_acc,0.28222,
107
+ FineVisionDD,14400,mmstar_average,0.32896764749185925,
108
+ FineVisionDD,14400,ocrbench_ocrbench_accuracy,0.527,
109
+ FineVisionDD,14400,textvqa_val_exact_match,0.49378,0.006791486374677893
110
+ FineVisionDD,15600,average,0.4373836230155283,
111
+ FineVisionDD,15600,average_rank,1.0,
112
+ FineVisionDD,15600,docvqa_val_anls,0.6587223702708729,0.0060724859630705355
113
+ FineVisionDD,15600,infovqa_val_anls,0.2954608342132971,0.007455706284703673
114
+ FineVisionDD,15600,mme_total_score,1196.3369347739094,
115
+ FineVisionDD,15600,mmmu_val_mmmu_acc,0.29333,
116
+ FineVisionDD,15600,mmstar_average,0.33750853360899963,
117
+ FineVisionDD,15600,ocrbench_ocrbench_accuracy,0.54,
118
+ FineVisionDD,15600,textvqa_val_exact_match,0.49927999999999995,0.0067965531666418525
119
+ FineVisionDD,16800,average,0.43378959957858315,
120
+ FineVisionDD,16800,average_rank,1.2857142857142858,
121
+ FineVisionDD,16800,docvqa_val_anls,0.6677987652181413,0.006012562319824571
122
+ FineVisionDD,16800,infovqa_val_anls,0.2813134865271826,0.007107230565585641
123
+ FineVisionDD,16800,mme_total_score,1303.9127651060423,
124
+ FineVisionDD,16800,mmmu_val_mmmu_acc,0.28111,
125
+ FineVisionDD,16800,mmstar_average,0.3315953457261746,
126
+ FineVisionDD,16800,ocrbench_ocrbench_accuracy,0.549,
127
+ FineVisionDD,16800,textvqa_val_exact_match,0.4919200000000001,0.006795246706011423
128
+ FineVisionDD,18000,average,0.4460242607466102,
129
+ FineVisionDD,18000,average_rank,1.1428571428571428,
130
+ FineVisionDD,18000,docvqa_val_anls,0.6719255126618523,0.006008621561058294
131
+ FineVisionDD,18000,infovqa_val_anls,0.29900934485493813,0.007466958171203317
132
+ FineVisionDD,18000,mme_total_score,1236.6654661864745,
133
+ FineVisionDD,18000,mmmu_val_mmmu_acc,0.3,
134
+ FineVisionDD,18000,mmstar_average,0.34327070696287054,
135
+ FineVisionDD,18000,ocrbench_ocrbench_accuracy,0.546,
136
+ FineVisionDD,18000,textvqa_val_exact_match,0.5159400000000001,0.006793085637800874
137
+ FineVisionDD,19200,average,0.44845865852995476,
138
+ FineVisionDD,19200,average_rank,1.0,
139
+ FineVisionDD,19200,docvqa_val_anls,0.6777684245254485,0.005985910291387732
140
+ FineVisionDD,19200,infovqa_val_anls,0.2877789783739627,0.007152893066126468
141
+ FineVisionDD,19200,mme_total_score,1240.2280912364945,
142
+ FineVisionDD,19200,mmmu_val_mmmu_acc,0.29778,
143
+ FineVisionDD,19200,mmstar_average,0.3473245482803175,
144
+ FineVisionDD,19200,ocrbench_ocrbench_accuracy,0.568,
145
+ FineVisionDD,19200,textvqa_val_exact_match,0.5121,0.006797143387603819
146
+ FineVisionDD,20400,average,0.4507597489696731,
147
+ FineVisionDD,20400,docvqa_val_anls,0.683992435806577,0.005972444631447485
148
+ FineVisionDD,20400,infovqa_val_anls,0.29487349692639875,0.00732361020606081
149
+ FineVisionDD,20400,mme_total_score,1273.203481392557,
150
+ FineVisionDD,20400,mmmu_val_mmmu_acc,0.28222,
151
+ FineVisionDD,20400,mmstar_average,0.349552561085063,
152
+ FineVisionDD,20400,ocrbench_ocrbench_accuracy,0.575,
153
+ FineVisionDD,20400,textvqa_val_exact_match,0.5189199999999999,0.006790760605846829
154
+ CauldronDD,300,average,0.19965858916400772,
155
+ CauldronDD,300,average_rank,1.5714285714285714,
156
+ CauldronDD,300,docvqa_val_anls,0.1630709902951134,0.004134430994096956
157
+ CauldronDD,300,infovqa_val_anls,0.11235975762377737,0.0049002431669000045
158
+ CauldronDD,300,mme_total_score,916.8871548619447,
159
+ CauldronDD,300,mmmu_val_mmmu_acc,0.25667,
160
+ CauldronDD,300,mmstar_average,0.22555078706515572,
161
+ CauldronDD,300,ocrbench_ocrbench_accuracy,0.181,
162
+ CauldronDD,300,textvqa_val_exact_match,0.2593,0.006011350036876339
163
+ CauldronDD,1200,average,0.29972102969630693,
164
+ CauldronDD,1200,average_rank,1.5714285714285714,
165
+ CauldronDD,1200,docvqa_val_anls,0.3393747623503541,0.005393199870631087
166
+ CauldronDD,1200,infovqa_val_anls,0.14788475521512282,0.005517625394198703
167
+ CauldronDD,1200,mme_total_score,1237.1527611044417,
168
+ CauldronDD,1200,mmmu_val_mmmu_acc,0.28444,
169
+ CauldronDD,1200,mmstar_average,0.2961666606123647,
170
+ CauldronDD,1200,ocrbench_ocrbench_accuracy,0.324,
171
+ CauldronDD,1200,textvqa_val_exact_match,0.40646000000000004,0.006706135111196755
172
+ CauldronDD,2400,average,0.3338688722253544,
173
+ CauldronDD,2400,average_rank,1.8571428571428572,
174
+ CauldronDD,2400,docvqa_val_anls,0.4106908679403099,0.00557717705073105
175
+ CauldronDD,2400,infovqa_val_anls,0.16022819076638478,0.005740317063734872
176
+ CauldronDD,2400,mme_total_score,1243.3691476590636,
177
+ CauldronDD,2400,mmmu_val_mmmu_acc,0.27889,
178
+ CauldronDD,2400,mmstar_average,0.33588417464543163,
179
+ CauldronDD,2400,ocrbench_ocrbench_accuracy,0.366,
180
+ CauldronDD,2400,textvqa_val_exact_match,0.45152,0.006779965450229171
181
+ CauldronDD,2700,average,0.34202507191206166,
182
+ CauldronDD,2700,average_rank,1.5714285714285714,
183
+ CauldronDD,2700,docvqa_val_anls,0.4194265744988737,0.005598230883238166
184
+ CauldronDD,2700,infovqa_val_anls,0.16192600102107405,0.005717482217545598
185
+ CauldronDD,2700,mme_total_score,1197.9157663065225,
186
+ CauldronDD,2700,mmmu_val_mmmu_acc,0.29,
187
+ CauldronDD,2700,mmstar_average,0.32295785595242227,
188
+ CauldronDD,2700,ocrbench_ocrbench_accuracy,0.388,
189
+ CauldronDD,2700,textvqa_val_exact_match,0.46984,0.006812118310127491
190
+ CauldronDD,3600,average,0.33947430615719726,
191
+ CauldronDD,3600,average_rank,2.2857142857142856,
192
+ CauldronDD,3600,docvqa_val_anls,0.43097255569855397,0.005587910026275849
193
+ CauldronDD,3600,infovqa_val_anls,0.1641426454649424,0.005800068910792727
194
+ CauldronDD,3600,mme_total_score,1310.0697278911566,
195
+ CauldronDD,3600,mmmu_val_mmmu_acc,0.28333,
196
+ CauldronDD,3600,mmstar_average,0.3259006357796873,
197
+ CauldronDD,3600,ocrbench_ocrbench_accuracy,0.36,
198
+ CauldronDD,3600,textvqa_val_exact_match,0.4725,0.006816571214960329
199
+ CauldronDD,4800,average,0.3474647210512976,
200
+ CauldronDD,4800,average_rank,2.142857142857143,
201
+ CauldronDD,4800,docvqa_val_anls,0.44347290757863167,0.005625752855686164
202
+ CauldronDD,4800,infovqa_val_anls,0.16073440834957092,0.00572812246049592
203
+ CauldronDD,4800,mme_total_score,1239.124949979992,
204
+ CauldronDD,4800,mmmu_val_mmmu_acc,0.31556,
205
+ CauldronDD,4800,mmstar_average,0.3157610103795831,
206
+ CauldronDD,4800,ocrbench_ocrbench_accuracy,0.378,
207
+ CauldronDD,4800,textvqa_val_exact_match,0.47125999999999996,0.00680373872603368
208
+ CauldronDD,5100,average,0.34849328691237624,
209
+ CauldronDD,5100,average_rank,1.7142857142857142,
210
+ CauldronDD,5100,docvqa_val_anls,0.4400533401720571,0.005603146586802499
211
+ CauldronDD,5100,infovqa_val_anls,0.1592834226378583,0.005693695979163053
212
+ CauldronDD,5100,mme_total_score,1319.4603841536614,
213
+ CauldronDD,5100,mmmu_val_mmmu_acc,0.30333,
214
+ CauldronDD,5100,mmstar_average,0.33557295866434195,
215
+ CauldronDD,5100,ocrbench_ocrbench_accuracy,0.373,
216
+ CauldronDD,5100,textvqa_val_exact_match,0.47972,0.00682083932443933
217
+ CauldronDD,6000,average,0.3400596935324955,
218
+ CauldronDD,6000,average_rank,2.0,
219
+ CauldronDD,6000,docvqa_val_anls,0.43150620522864996,0.005601817666455916
220
+ CauldronDD,6000,infovqa_val_anls,0.16804581718043338,0.005797914749544558
221
+ CauldronDD,6000,mme_total_score,1246.4825930372149,
222
+ CauldronDD,6000,mmmu_val_mmmu_acc,0.27667,
223
+ CauldronDD,6000,mmstar_average,0.34191613878588945,
224
+ CauldronDD,6000,ocrbench_ocrbench_accuracy,0.368,
225
+ CauldronDD,6000,textvqa_val_exact_match,0.45421999999999996,0.006799535650102248
226
+ CauldronDD,7200,average,0.3391609673818097,
227
+ CauldronDD,7200,average_rank,2.2857142857142856,
228
+ CauldronDD,7200,docvqa_val_anls,0.4285872356274967,0.005613450362222006
229
+ CauldronDD,7200,infovqa_val_anls,0.1673609356908039,0.0058332340615507815
230
+ CauldronDD,7200,mme_total_score,1225.8680472188876,
231
+ CauldronDD,7200,mmmu_val_mmmu_acc,0.28778,
232
+ CauldronDD,7200,mmstar_average,0.31851763297255725,
233
+ CauldronDD,7200,ocrbench_ocrbench_accuracy,0.378,
234
+ CauldronDD,7200,textvqa_val_exact_match,0.45472000000000007,0.006786512776907903
235
+ CauldronDD,7500,average,0.34519234835518026,
236
+ CauldronDD,7500,average_rank,1.8571428571428572,
237
+ CauldronDD,7500,docvqa_val_anls,0.4400007858471883,0.005617720028882394
238
+ CauldronDD,7500,infovqa_val_anls,0.1702707959590441,0.0058853960353902985
239
+ CauldronDD,7500,mme_total_score,1251.4401760704282,
240
+ CauldronDD,7500,mmmu_val_mmmu_acc,0.29889,
241
+ CauldronDD,7500,mmstar_average,0.3133725083248492,
242
+ CauldronDD,7500,ocrbench_ocrbench_accuracy,0.391,
243
+ CauldronDD,7500,textvqa_val_exact_match,0.4576200000000001,0.006805178117422201
244
+ CauldronDD,8400,average,0.3431478061334871,
245
+ CauldronDD,8400,average_rank,2.5714285714285716,
246
+ CauldronDD,8400,docvqa_val_anls,0.440186698815653,0.005613446205499607
247
+ CauldronDD,8400,infovqa_val_anls,0.17029748604016814,0.005836597208873185
248
+ CauldronDD,8400,mme_total_score,1271.5840336134456,
249
+ CauldronDD,8400,mmmu_val_mmmu_acc,0.27778,
250
+ CauldronDD,8400,mmstar_average,0.32566265194510147,
251
+ CauldronDD,8400,ocrbench_ocrbench_accuracy,0.386,
252
+ CauldronDD,8400,textvqa_val_exact_match,0.45896000000000003,0.00681272532289869
253
+ CauldronDD,9600,average,0.3413459009956081,
254
+ CauldronDD,9600,average_rank,2.857142857142857,
255
+ CauldronDD,9600,docvqa_val_anls,0.4403774280666133,0.005612804160672664
256
+ CauldronDD,9600,infovqa_val_anls,0.16559694737276026,0.0058146690100803694
257
+ CauldronDD,9600,mme_total_score,1235.5730292116846,
258
+ CauldronDD,9600,mmmu_val_mmmu_acc,0.28,
259
+ CauldronDD,9600,mmstar_average,0.33264103053427463,
260
+ CauldronDD,9600,ocrbench_ocrbench_accuracy,0.383,
261
+ CauldronDD,9600,textvqa_val_exact_match,0.44646,0.006795434442760313
262
+ CauldronDD,9900,average,0.3355067141945109,
263
+ CauldronDD,9900,average_rank,2.142857142857143,
264
+ CauldronDD,9900,docvqa_val_anls,0.43635606798831567,0.0056201106916182715
265
+ CauldronDD,9900,infovqa_val_anls,0.15989145755054796,0.005753347711050537
266
+ CauldronDD,9900,mme_total_score,1246.687775110044,
267
+ CauldronDD,9900,mmmu_val_mmmu_acc,0.27111,
268
+ CauldronDD,9900,mmstar_average,0.31970275962820155,
269
+ CauldronDD,9900,ocrbench_ocrbench_accuracy,0.381,
270
+ CauldronDD,9900,textvqa_val_exact_match,0.44497999999999993,0.006793245877922539
271
+ CauldronDD,10800,average,0.3380861972330776,
272
+ CauldronDD,10800,average_rank,3.142857142857143,
273
+ CauldronDD,10800,docvqa_val_anls,0.4402326817553441,0.005626934973411334
274
+ CauldronDD,10800,infovqa_val_anls,0.16122827030707865,0.005747720437259022
275
+ CauldronDD,10800,mme_total_score,1245.125650260104,
276
+ CauldronDD,10800,mmmu_val_mmmu_acc,0.29444,
277
+ CauldronDD,10800,mmstar_average,0.309516231336043,
278
+ CauldronDD,10800,ocrbench_ocrbench_accuracy,0.383,
279
+ CauldronDD,10800,textvqa_val_exact_match,0.4401,0.006786752537259658
280
+ CauldronDD,11400,average,0.33351487442945066,
281
+ CauldronDD,11400,average_rank,2.4285714285714284,
282
+ CauldronDD,11400,docvqa_val_anls,0.43406486854294124,0.005623873843784784
283
+ CauldronDD,11400,infovqa_val_anls,0.16714581293411426,0.005782736796323627
284
+ CauldronDD,11400,mme_total_score,1237.4036614645859,
285
+ CauldronDD,11400,mmmu_val_mmmu_acc,0.26222,
286
+ CauldronDD,11400,mmstar_average,0.32995856509964816,
287
+ CauldronDD,11400,ocrbench_ocrbench_accuracy,0.364,
288
+ CauldronDD,11400,textvqa_val_exact_match,0.4437,0.006807916828686236
289
+ CauldronDD,12000,average,0.33154594568198864,
290
+ CauldronDD,12000,average_rank,3.2857142857142856,
291
+ CauldronDD,12000,docvqa_val_anls,0.43508650222322015,0.00561327125316578
292
+ CauldronDD,12000,infovqa_val_anls,0.16563023539653135,0.0058079534236688945
293
+ CauldronDD,12000,mme_total_score,1240.7185874349739,
294
+ CauldronDD,12000,mmmu_val_mmmu_acc,0.27556,
295
+ CauldronDD,12000,mmstar_average,0.2978389364721804,
296
+ CauldronDD,12000,ocrbench_ocrbench_accuracy,0.375,
297
+ CauldronDD,12000,textvqa_val_exact_match,0.44016000000000005,0.006801256229349064
298
+ CauldronDD,13200,average,0.3323617201953493,
299
+ CauldronDD,13200,average_rank,3.2857142857142856,
300
+ CauldronDD,13200,docvqa_val_anls,0.4336687642519214,0.00561127691138422
301
+ CauldronDD,13200,infovqa_val_anls,0.16294964748823013,0.00577613475202133
302
+ CauldronDD,13200,mme_total_score,1232.6909763905562,
303
+ CauldronDD,13200,mmmu_val_mmmu_acc,0.27556,
304
+ CauldronDD,13200,mmstar_average,0.3120919094319445,
305
+ CauldronDD,13200,ocrbench_ocrbench_accuracy,0.37,
306
+ CauldronDD,13200,textvqa_val_exact_match,0.4398999999999999,0.006800709369586816
307
+ CauldronDD,14400,average,0.33686465162435447,
308
+ CauldronDD,14400,average_rank,3.0,
309
+ CauldronDD,14400,docvqa_val_anls,0.4346981780601323,0.005637000083152569
310
+ CauldronDD,14400,infovqa_val_anls,0.15117394150977184,0.005624727950317896
311
+ CauldronDD,14400,mme_total_score,1229.5749299719887,
312
+ CauldronDD,14400,mmmu_val_mmmu_acc,0.28444,
313
+ CauldronDD,14400,mmstar_average,0.3150357901762228,
314
+ CauldronDD,14400,ocrbench_ocrbench_accuracy,0.396,
315
+ CauldronDD,14400,textvqa_val_exact_match,0.43983999999999995,0.006801397406514065
316
+ CauldronDD,14700,average,0.33429875896686784,
317
+ CauldronDD,14700,average_rank,2.2857142857142856,
318
+ CauldronDD,14700,docvqa_val_anls,0.4327738487046949,0.005633644388554696
319
+ CauldronDD,14700,infovqa_val_anls,0.160120841205593,0.005735225827091493
320
+ CauldronDD,14700,mme_total_score,1207.2609043617447,
321
+ CauldronDD,14700,mmmu_val_mmmu_acc,0.26,
322
+ CauldronDD,14700,mmstar_average,0.31633786389091934,
323
+ CauldronDD,14700,ocrbench_ocrbench_accuracy,0.389,
324
+ CauldronDD,14700,textvqa_val_exact_match,0.44756,0.0068101163585480235
325
+ CauldronDD,15600,average,0.32646326413760035,
326
+ CauldronDD,15600,average_rank,3.5714285714285716,
327
+ CauldronDD,15600,docvqa_val_anls,0.433995514472087,0.005646461618482555
328
+ CauldronDD,15600,infovqa_val_anls,0.1562018233604324,0.005700992835439662
329
+ CauldronDD,15600,mme_total_score,1122.3809523809523,
330
+ CauldronDD,15600,mmmu_val_mmmu_acc,0.26333,
331
+ CauldronDD,15600,mmstar_average,0.30641224699308284,
332
+ CauldronDD,15600,ocrbench_ocrbench_accuracy,0.366,
333
+ CauldronDD,15600,textvqa_val_exact_match,0.43283999999999995,0.006800820326359335
334
+ CauldronDD,16800,average,0.32818017568992097,
335
+ CauldronDD,16800,average_rank,3.2857142857142856,
336
+ CauldronDD,16800,docvqa_val_anls,0.43345387633219307,0.005602799050931306
337
+ CauldronDD,16800,infovqa_val_anls,0.16417934269316956,0.005815179007624968
338
+ CauldronDD,16800,mme_total_score,1197.6628651460585,
339
+ CauldronDD,16800,mmmu_val_mmmu_acc,0.27111,
340
+ CauldronDD,16800,mmstar_average,0.3091778351141632,
341
+ CauldronDD,16800,ocrbench_ocrbench_accuracy,0.36,
342
+ CauldronDD,16800,textvqa_val_exact_match,0.43116000000000004,0.006790215923404594
343
+ CauldronDD,17100,average,0.3385701687163391,
344
+ CauldronDD,17100,average_rank,2.2857142857142856,
345
+ CauldronDD,17100,docvqa_val_anls,0.44035807792372417,0.005618024098992455
346
+ CauldronDD,17100,infovqa_val_anls,0.15927117998447532,0.0057253221102160036
347
+ CauldronDD,17100,mme_total_score,1125.826630652261,
348
+ CauldronDD,17100,mmmu_val_mmmu_acc,0.29111,
349
+ CauldronDD,17100,mmstar_average,0.32228175438983514,
350
+ CauldronDD,17100,ocrbench_ocrbench_accuracy,0.385,
351
+ CauldronDD,17100,textvqa_val_exact_match,0.4334,0.006792916659532094
352
+ CauldronDD,18000,average,0.3341436139545066,
353
+ CauldronDD,18000,average_rank,3.2857142857142856,
354
+ CauldronDD,18000,docvqa_val_anls,0.4405469745279471,0.0056286501797814135
355
+ CauldronDD,18000,infovqa_val_anls,0.1660848313620339,0.005819813220995324
356
+ CauldronDD,18000,mme_total_score,1242.9980992396959,
357
+ CauldronDD,18000,mmmu_val_mmmu_acc,0.27778,
358
+ CauldronDD,18000,mmstar_average,0.31554987783705823,
359
+ CauldronDD,18000,ocrbench_ocrbench_accuracy,0.373,
360
+ CauldronDD,18000,textvqa_val_exact_match,0.4319,0.006790913141858027
361
+ CauldronDD,19200,average,0.33290606090591973,
362
+ CauldronDD,19200,average_rank,3.2857142857142856,
363
+ CauldronDD,19200,docvqa_val_anls,0.43616573848632056,0.005619579845927559
364
+ CauldronDD,19200,infovqa_val_anls,0.16528162106770297,0.005801061681754425
365
+ CauldronDD,19200,mme_total_score,1230.0974389755902,
366
+ CauldronDD,19200,mmmu_val_mmmu_acc,0.27,
367
+ CauldronDD,19200,mmstar_average,0.3266290058814946,
368
+ CauldronDD,19200,ocrbench_ocrbench_accuracy,0.374,
369
+ CauldronDD,19200,textvqa_val_exact_match,0.42536,0.006794218598284299
370
+ CauldronDD,19500,average,0.32553352494764914,
371
+ CauldronDD,19500,average_rank,2.4285714285714284,
372
+ CauldronDD,19500,docvqa_val_anls,0.4288225859433628,0.005619113441752853
373
+ CauldronDD,19500,infovqa_val_anls,0.15150002729561038,0.00560320463678714
374
+ CauldronDD,19500,mme_total_score,1198.1334533813524,
375
+ CauldronDD,19500,mmmu_val_mmmu_acc,0.25111,
376
+ CauldronDD,19500,mmstar_average,0.3198285364469219,
377
+ CauldronDD,19500,ocrbench_ocrbench_accuracy,0.373,
378
+ CauldronDD,19500,textvqa_val_exact_match,0.42894,0.006790325248719436
379
+ CambrianDD,300,average,0.17970577045668043,
380
+ CambrianDD,300,average_rank,1.8571428571428572,
381
+ CambrianDD,300,docvqa_val_anls,0.14433321388458845,0.004176784210049873
382
+ CambrianDD,300,infovqa_val_anls,0.13148487541870452,0.0056192589681577886
383
+ CambrianDD,300,mme_total_score,990.4948979591837,
384
+ CambrianDD,300,mmmu_val_mmmu_acc,0.24222,
385
+ CambrianDD,300,mmstar_average,0.2454565334367895,
386
+ CambrianDD,300,ocrbench_ocrbench_accuracy,0.134,
387
+ CambrianDD,300,textvqa_val_exact_match,0.18074,0.005296623577739393
388
+ CambrianDD,1200,average,0.2568586004702917,
389
+ CambrianDD,1200,average_rank,2.857142857142857,
390
+ CambrianDD,1200,docvqa_val_anls,0.3316039842462008,0.0057785603046722
391
+ CambrianDD,1200,infovqa_val_anls,0.14630377786332374,0.005668585125239906
392
+ CambrianDD,1200,mme_total_score,1112.7626050420167,
393
+ CambrianDD,1200,mmmu_val_mmmu_acc,0.26111,
394
+ CambrianDD,1200,mmstar_average,0.21803384071222537,
395
+ CambrianDD,1200,ocrbench_ocrbench_accuracy,0.247,
396
+ CambrianDD,1200,textvqa_val_exact_match,0.3371,0.006460330113317322
397
+ CambrianDD,2400,average,0.30575373318860816,
398
+ CambrianDD,2400,average_rank,2.7142857142857144,
399
+ CambrianDD,2400,docvqa_val_anls,0.40422225671207945,0.006074261001968628
400
+ CambrianDD,2400,infovqa_val_anls,0.1523121409563817,0.005638329718892052
401
+ CambrianDD,2400,mme_total_score,1059.9440776310523,
402
+ CambrianDD,2400,mmmu_val_mmmu_acc,0.28444,
403
+ CambrianDD,2400,mmstar_average,0.3110480014631879,
404
+ CambrianDD,2400,ocrbench_ocrbench_accuracy,0.3,
405
+ CambrianDD,2400,textvqa_val_exact_match,0.38249999999999995,0.006625581458704827
406
+ CambrianDD,2700,average,0.3094104328037755,
407
+ CambrianDD,2700,average_rank,2.2857142857142856,
408
+ CambrianDD,2700,docvqa_val_anls,0.4213173056510248,0.006079072406765826
409
+ CambrianDD,2700,infovqa_val_anls,0.16214248952051602,0.005831948548024231
410
+ CambrianDD,2700,mme_total_score,1054.2070828331332,
411
+ CambrianDD,2700,mmmu_val_mmmu_acc,0.26222,
412
+ CambrianDD,2700,mmstar_average,0.3088828016511124,
413
+ CambrianDD,2700,ocrbench_ocrbench_accuracy,0.306,
414
+ CambrianDD,2700,textvqa_val_exact_match,0.3959,0.006664497063111428
415
+ CambrianDD,3600,average,0.3244376041867266,
416
+ CambrianDD,3600,average_rank,2.7142857142857144,
417
+ CambrianDD,3600,docvqa_val_anls,0.4477711985871837,0.006244212556452033
418
+ CambrianDD,3600,infovqa_val_anls,0.17166556922234352,0.006038401288152695
419
+ CambrianDD,3600,mme_total_score,1054.6183473389356,
420
+ CambrianDD,3600,mmmu_val_mmmu_acc,0.28778,
421
+ CambrianDD,3600,mmstar_average,0.3192288573108325,
422
+ CambrianDD,3600,ocrbench_ocrbench_accuracy,0.325,
423
+ CambrianDD,3600,textvqa_val_exact_match,0.39518000000000003,0.00666872160834278
424
+ CambrianDD,4800,average,0.33575298162563233,
425
+ CambrianDD,4800,average_rank,2.7142857142857144,
426
+ CambrianDD,4800,docvqa_val_anls,0.48021663592502906,0.006264475129046182
427
+ CambrianDD,4800,infovqa_val_anls,0.17732197564395005,0.005979359845801751
428
+ CambrianDD,4800,mme_total_score,984.9863945578231,
429
+ CambrianDD,4800,mmmu_val_mmmu_acc,0.29111,
430
+ CambrianDD,4800,mmstar_average,0.29772927818481454,
431
+ CambrianDD,4800,ocrbench_ocrbench_accuracy,0.346,
432
+ CambrianDD,4800,textvqa_val_exact_match,0.42214000000000007,0.0067477011177196344
433
+ CambrianDD,5100,average,0.3359877520445322,
434
+ CambrianDD,5100,average_rank,2.2857142857142856,
435
+ CambrianDD,5100,docvqa_val_anls,0.4754298197157412,0.006168130327198727
436
+ CambrianDD,5100,infovqa_val_anls,0.18076704246631303,0.0060732104869038175
437
+ CambrianDD,5100,mme_total_score,895.3776510604241,
438
+ CambrianDD,5100,mmmu_val_mmmu_acc,0.27889,
439
+ CambrianDD,5100,mmstar_average,0.31171965008513874,
440
+ CambrianDD,5100,ocrbench_ocrbench_accuracy,0.35,
441
+ CambrianDD,5100,textvqa_val_exact_match,0.41912,0.0067289182479918445
442
+ CambrianDD,6000,average,0.32347651657813326,
443
+ CambrianDD,6000,average_rank,3.0,
444
+ CambrianDD,6000,docvqa_val_anls,0.46634507029121364,0.0062238629881778374
445
+ CambrianDD,6000,infovqa_val_anls,0.17940221095579675,0.006141333951799168
446
+ CambrianDD,6000,mme_total_score,1072.1291516606643,
447
+ CambrianDD,6000,mmmu_val_mmmu_acc,0.27667,
448
+ CambrianDD,6000,mmstar_average,0.31024181822178915,
449
+ CambrianDD,6000,ocrbench_ocrbench_accuracy,0.305,
450
+ CambrianDD,6000,textvqa_val_exact_match,0.4032,0.006697142849340224
451
+ CambrianDD,7200,average,0.3486686601177924,
452
+ CambrianDD,7200,average_rank,3.0,
453
+ CambrianDD,7200,docvqa_val_anls,0.5033994017292339,0.006211902263203208
454
+ CambrianDD,7200,infovqa_val_anls,0.1898192044728013,0.006149174628390649
455
+ CambrianDD,7200,mme_total_score,879.126550620248,
456
+ CambrianDD,7200,mmmu_val_mmmu_acc,0.27556,
457
+ CambrianDD,7200,mmstar_average,0.32559335450471943,
458
+ CambrianDD,7200,ocrbench_ocrbench_accuracy,0.365,
459
+ CambrianDD,7200,textvqa_val_exact_match,0.43263999999999997,0.006774430209318876
460
+ CambrianDD,7500,average,0.3515269876619196,
461
+ CambrianDD,7500,average_rank,2.0,
462
+ CambrianDD,7500,docvqa_val_anls,0.4864674639384977,0.006096512234708711
463
+ CambrianDD,7500,infovqa_val_anls,0.19628222332012993,0.006277685053210526
464
+ CambrianDD,7500,mme_total_score,1053.9439775910364,
465
+ CambrianDD,7500,mmmu_val_mmmu_acc,0.28778,
466
+ CambrianDD,7500,mmstar_average,0.32443223871288984,
467
+ CambrianDD,7500,ocrbench_ocrbench_accuracy,0.38,
468
+ CambrianDD,7500,textvqa_val_exact_match,0.4342000000000001,0.006762249448483892
469
+ CambrianDD,8400,average,0.3566021934695403,
470
+ CambrianDD,8400,average_rank,2.7142857142857144,
471
+ CambrianDD,8400,docvqa_val_anls,0.49954330523768764,0.006213069198769485
472
+ CambrianDD,8400,infovqa_val_anls,0.199645571135255,0.006349349194468786
473
+ CambrianDD,8400,mme_total_score,1114.3343337334934,
474
+ CambrianDD,8400,mmmu_val_mmmu_acc,0.29556,
475
+ CambrianDD,8400,mmstar_average,0.3215042844442992,
476
+ CambrianDD,8400,ocrbench_ocrbench_accuracy,0.379,
477
+ CambrianDD,8400,textvqa_val_exact_match,0.4443599999999999,0.006777995745444597
478
+ CambrianDD,9600,average,0.3625887269392778,
479
+ CambrianDD,9600,average_rank,2.5714285714285716,
480
+ CambrianDD,9600,docvqa_val_anls,0.5209747359075046,0.006185627757446921
481
+ CambrianDD,9600,infovqa_val_anls,0.20779524694498724,0.006396756819481715
482
+ CambrianDD,9600,mme_total_score,881.3031212484995,
483
+ CambrianDD,9600,mmmu_val_mmmu_acc,0.29889,
484
+ CambrianDD,9600,mmstar_average,0.3216523787831747,
485
+ CambrianDD,9600,ocrbench_ocrbench_accuracy,0.378,
486
+ CambrianDD,9600,textvqa_val_exact_match,0.44822,0.006790212641555748
487
+ CambrianDD,9900,average,0.36146463385115846,
488
+ CambrianDD,9900,average_rank,1.8571428571428572,
489
+ CambrianDD,9900,docvqa_val_anls,0.5081099773959435,0.00614386469189631
490
+ CambrianDD,9900,infovqa_val_anls,0.19863544950298542,0.006290462477841115
491
+ CambrianDD,9900,mme_total_score,947.5219087635055,
492
+ CambrianDD,9900,mmmu_val_mmmu_acc,0.30222,
493
+ CambrianDD,9900,mmstar_average,0.3418223762080219,
494
+ CambrianDD,9900,ocrbench_ocrbench_accuracy,0.375,
495
+ CambrianDD,9900,textvqa_val_exact_match,0.443,0.006785293511824548
496
+ CambrianDD,10800,average,0.36225439567996837,
497
+ CambrianDD,10800,average_rank,2.857142857142857,
498
+ CambrianDD,10800,docvqa_val_anls,0.5360831553687384,0.0062503917855996835
499
+ CambrianDD,10800,infovqa_val_anls,0.20358054292257038,0.0063419401635538405
500
+ CambrianDD,10800,mme_total_score,1067.7270908363346,
501
+ CambrianDD,10800,mmmu_val_mmmu_acc,0.28333,
502
+ CambrianDD,10800,mmstar_average,0.3324526757885013,
503
+ CambrianDD,10800,ocrbench_ocrbench_accuracy,0.368,
504
+ CambrianDD,10800,textvqa_val_exact_match,0.45008000000000004,0.006781238512185797
505
+ CambrianDD,11400,average,0.36662182455529396,
506
+ CambrianDD,11400,average_rank,1.8571428571428572,
507
+ CambrianDD,11400,docvqa_val_anls,0.5403085464525686,0.006288098035238887
508
+ CambrianDD,11400,infovqa_val_anls,0.20724387987551376,0.006422369131375898
509
+ CambrianDD,11400,mme_total_score,1090.8822529011604,
510
+ CambrianDD,11400,mmmu_val_mmmu_acc,0.29889,
511
+ CambrianDD,11400,mmstar_average,0.3195285210036817,
512
+ CambrianDD,11400,ocrbench_ocrbench_accuracy,0.38,
513
+ CambrianDD,11400,textvqa_val_exact_match,0.45375999999999994,0.006790875913984575
514
+ CambrianDD,12000,average,0.37022690841525296,
515
+ CambrianDD,12000,average_rank,2.142857142857143,
516
+ CambrianDD,12000,docvqa_val_anls,0.5329231904501042,0.00621682474881696
517
+ CambrianDD,12000,infovqa_val_anls,0.2099071605782676,0.0064660431120906045
518
+ CambrianDD,12000,mme_total_score,1029.8929571828733,
519
+ CambrianDD,12000,mmmu_val_mmmu_acc,0.30444,
520
+ CambrianDD,12000,mmstar_average,0.322291099463146,
521
+ CambrianDD,12000,ocrbench_ocrbench_accuracy,0.402,
522
+ CambrianDD,12000,textvqa_val_exact_match,0.4498,0.006790199802853561
523
+ CambrianDD,13200,average,0.3705817479124603,
524
+ CambrianDD,13200,average_rank,2.5714285714285716,
525
+ CambrianDD,13200,docvqa_val_anls,0.5406674097008617,0.006220185507992941
526
+ CambrianDD,13200,infovqa_val_anls,0.21720675802877365,0.00650836938989414
527
+ CambrianDD,13200,mme_total_score,1134.0421168467387,
528
+ CambrianDD,13200,mmmu_val_mmmu_acc,0.27889,
529
+ CambrianDD,13200,mmstar_average,0.3148263197451263,
530
+ CambrianDD,13200,ocrbench_ocrbench_accuracy,0.409,
531
+ CambrianDD,13200,textvqa_val_exact_match,0.4629,0.006796348730841747
532
+ CambrianDD,14400,average,0.3623658612664291,
533
+ CambrianDD,14400,average_rank,2.5714285714285716,
534
+ CambrianDD,14400,docvqa_val_anls,0.5152099093312626,0.006100903397549162
535
+ CambrianDD,14400,infovqa_val_anls,0.21109380152234544,0.006429931358574082
536
+ CambrianDD,14400,mme_total_score,1050.657763105242,
537
+ CambrianDD,14400,mmmu_val_mmmu_acc,0.28778,
538
+ CambrianDD,14400,mmstar_average,0.32701145674496673,
539
+ CambrianDD,14400,ocrbench_ocrbench_accuracy,0.383,
540
+ CambrianDD,14400,textvqa_val_exact_match,0.4501,0.006783833877713699
541
+ CambrianDD,14700,average,0.3748386548010339,
542
+ CambrianDD,14700,average_rank,1.4285714285714286,
543
+ CambrianDD,14700,docvqa_val_anls,0.5443355714236217,0.006257858861006952
544
+ CambrianDD,14700,infovqa_val_anls,0.21459500091962927,0.006462122780374779
545
+ CambrianDD,14700,mme_total_score,1105.4395758303322,
546
+ CambrianDD,14700,mmmu_val_mmmu_acc,0.29111,
547
+ CambrianDD,14700,mmstar_average,0.3272513564629525,
548
+ CambrianDD,14700,ocrbench_ocrbench_accuracy,0.404,
549
+ CambrianDD,14700,textvqa_val_exact_match,0.46774,0.006791751177480765
550
+ CambrianDD,15600,average,0.37528695975168413,
551
+ CambrianDD,15600,average_rank,2.142857142857143,
552
+ CambrianDD,15600,docvqa_val_anls,0.5490540524723359,0.006271460845615347
553
+ CambrianDD,15600,infovqa_val_anls,0.2171513714875839,0.006549339354210817
554
+ CambrianDD,15600,mme_total_score,1127.4101640656263,
555
+ CambrianDD,15600,mmmu_val_mmmu_acc,0.28556,
556
+ CambrianDD,15600,mmstar_average,0.332896334550185,
557
+ CambrianDD,15600,ocrbench_ocrbench_accuracy,0.399,
558
+ CambrianDD,15600,textvqa_val_exact_match,0.46806000000000003,0.006792053715831151
559
+ CambrianDD,16800,average,0.378379686213323,
560
+ CambrianDD,16800,average_rank,2.142857142857143,
561
+ CambrianDD,16800,docvqa_val_anls,0.5508556858421052,0.006230983486378255
562
+ CambrianDD,16800,infovqa_val_anls,0.22644813810901007,0.0065684324248959204
563
+ CambrianDD,16800,mme_total_score,956.2077831132453,
564
+ CambrianDD,16800,mmmu_val_mmmu_acc,0.29444,
565
+ CambrianDD,16800,mmstar_average,0.34207429332882255,
566
+ CambrianDD,16800,ocrbench_ocrbench_accuracy,0.405,
567
+ CambrianDD,16800,textvqa_val_exact_match,0.45146000000000003,0.00677465518462557
568
+ CambrianDD,17100,average,0.3745613817083588,
569
+ CambrianDD,17100,average_rank,1.4285714285714286,
570
+ CambrianDD,17100,docvqa_val_anls,0.5239343594449052,0.006067698891173559
571
+ CambrianDD,17100,infovqa_val_anls,0.21597294475540602,0.006475700240072832
572
+ CambrianDD,17100,mme_total_score,1066.7460984393756,
573
+ CambrianDD,17100,mmmu_val_mmmu_acc,0.3,
574
+ CambrianDD,17100,mmstar_average,0.34430098604984144,
575
+ CambrianDD,17100,ocrbench_ocrbench_accuracy,0.406,
576
+ CambrianDD,17100,textvqa_val_exact_match,0.45715999999999996,0.00678614450416776
577
+ CambrianDD,18000,average,0.37736657627182946,
578
+ CambrianDD,18000,average_rank,2.4285714285714284,
579
+ CambrianDD,18000,docvqa_val_anls,0.550171109156601,0.006266033692968377
580
+ CambrianDD,18000,infovqa_val_anls,0.2180520852784964,0.0064910045262362975
581
+ CambrianDD,18000,mme_total_score,1068.6598639455783,
582
+ CambrianDD,18000,mmmu_val_mmmu_acc,0.29,
583
+ CambrianDD,18000,mmstar_average,0.33205626319587944,
584
+ CambrianDD,18000,ocrbench_ocrbench_accuracy,0.409,
585
+ CambrianDD,18000,textvqa_val_exact_match,0.46492,0.0068105767385077025
586
+ CambrianDD,19200,average,0.37238254789618885,
587
+ CambrianDD,19200,average_rank,2.4285714285714284,
588
+ CambrianDD,19200,docvqa_val_anls,0.5332665411568654,0.006195231490784442
589
+ CambrianDD,19200,infovqa_val_anls,0.21571031377445513,0.006431739740859299
590
+ CambrianDD,19200,mme_total_score,1008.0998399359744,
591
+ CambrianDD,19200,mmmu_val_mmmu_acc,0.28444,
592
+ CambrianDD,19200,mmstar_average,0.33939843244581247,
593
+ CambrianDD,19200,ocrbench_ocrbench_accuracy,0.412,
594
+ CambrianDD,19200,textvqa_val_exact_match,0.44948,0.00679714544181831
595
+ CambrianDD,19500,average,0.3702087762443897,
596
+ CambrianDD,19500,average_rank,1.4285714285714286,
597
+ CambrianDD,19500,docvqa_val_anls,0.5327441491291284,0.006171493726324771
598
+ CambrianDD,19500,infovqa_val_anls,0.2134713917399994,0.006380629468185958
599
+ CambrianDD,19500,mme_total_score,1048.2445978391356,
600
+ CambrianDD,19500,mmmu_val_mmmu_acc,0.29444,
601
+ CambrianDD,19500,mmstar_average,0.33125711659721024,
602
+ CambrianDD,19500,ocrbench_ocrbench_accuracy,0.396,
603
+ CambrianDD,19500,textvqa_val_exact_match,0.4533400000000001,0.00679251032529976
604
+ LLaVaDD,300,average,0.14192111229918192,
605
+ LLaVaDD,300,average_rank,2.5714285714285716,
606
+ LLaVaDD,300,docvqa_val_anls,0.06089443514559298,0.0026836803170977547
607
+ LLaVaDD,300,infovqa_val_anls,0.0916406235448352,0.0046298095289004654
608
+ LLaVaDD,300,mme_total_score,777.2206882753101,
609
+ LLaVaDD,300,mmmu_val_mmmu_acc,0.24778,
610
+ LLaVaDD,300,mmstar_average,0.2549716151046633,
611
+ LLaVaDD,300,ocrbench_ocrbench_accuracy,0.118,
612
+ LLaVaDD,300,textvqa_val_exact_match,0.07824,0.0036768470624795064
613
+ LLaVaDD,1200,average,0.2509776310427157,
614
+ LLaVaDD,1200,average_rank,3.0,
615
+ LLaVaDD,1200,docvqa_val_anls,0.2444383475360029,0.005026540300329091
616
+ LLaVaDD,1200,infovqa_val_anls,0.15487600151177214,0.005600679946634536
617
+ LLaVaDD,1200,mme_total_score,860.4959983993598,
618
+ LLaVaDD,1200,mmmu_val_mmmu_acc,0.24667,
619
+ LLaVaDD,1200,mmstar_average,0.21306143720851922,
620
+ LLaVaDD,1200,ocrbench_ocrbench_accuracy,0.325,
621
+ LLaVaDD,1200,textvqa_val_exact_match,0.32182000000000005,0.006396230129691582
622
+ LLaVaDD,2400,average,0.29579280325109375,
623
+ LLaVaDD,2400,average_rank,3.0,
624
+ LLaVaDD,2400,docvqa_val_anls,0.31538339385878306,0.005424291843634001
625
+ LLaVaDD,2400,infovqa_val_anls,0.18261071688457164,0.0059828856978779545
626
+ LLaVaDD,2400,mme_total_score,744.4002601040415,
627
+ LLaVaDD,2400,mmmu_val_mmmu_acc,0.24889,
628
+ LLaVaDD,2400,mmstar_average,0.24909270876320772,
629
+ LLaVaDD,2400,ocrbench_ocrbench_accuracy,0.398,
630
+ LLaVaDD,2400,textvqa_val_exact_match,0.38077999999999995,0.006625050685037501
631
+ LLaVaDD,2700,average,0.3161939125538216,
632
+ LLaVaDD,2700,average_rank,2.142857142857143,
633
+ LLaVaDD,2700,docvqa_val_anls,0.33989172267075535,0.005558177429759288
634
+ LLaVaDD,2700,infovqa_val_anls,0.18986570203917563,0.0060535360821678975
635
+ LLaVaDD,2700,mme_total_score,794.4580832332933,
636
+ LLaVaDD,2700,mmmu_val_mmmu_acc,0.26667,
637
+ LLaVaDD,2700,mmstar_average,0.26941605061299884,
638
+ LLaVaDD,2700,ocrbench_ocrbench_accuracy,0.427,
639
+ LLaVaDD,2700,textvqa_val_exact_match,0.40432,0.006696240396453028
640
+ LLaVaDD,3600,average,0.32734135036250206,
641
+ LLaVaDD,3600,average_rank,2.5714285714285716,
642
+ LLaVaDD,3600,docvqa_val_anls,0.35235179662486144,0.005549556404054767
643
+ LLaVaDD,3600,infovqa_val_anls,0.18556296710855402,0.006043411346585987
644
+ LLaVaDD,3600,mme_total_score,835.5973389355743,
645
+ LLaVaDD,3600,mmmu_val_mmmu_acc,0.29778,
646
+ LLaVaDD,3600,mmstar_average,0.2915733384415969,
647
+ LLaVaDD,3600,ocrbench_ocrbench_accuracy,0.426,
648
+ LLaVaDD,3600,textvqa_val_exact_match,0.41078000000000003,0.0067073508951900115
649
+ LLaVaDD,4800,average,0.33013109358835874,
650
+ LLaVaDD,4800,average_rank,2.857142857142857,
651
+ LLaVaDD,4800,docvqa_val_anls,0.3502881859839653,0.005478097656928352
652
+ LLaVaDD,4800,infovqa_val_anls,0.19107082217989702,0.006085171603850096
653
+ LLaVaDD,4800,mme_total_score,733.0080032012804,
654
+ LLaVaDD,4800,mmmu_val_mmmu_acc,0.27,
655
+ LLaVaDD,4800,mmstar_average,0.32564755336629003,
656
+ LLaVaDD,4800,ocrbench_ocrbench_accuracy,0.424,
657
+ LLaVaDD,4800,textvqa_val_exact_match,0.41978000000000004,0.006734153256647549
658
+ LLaVaDD,5100,average,0.33484217665675037,
659
+ LLaVaDD,5100,average_rank,2.0,
660
+ LLaVaDD,5100,docvqa_val_anls,0.36535966236487605,0.005521676818896047
661
+ LLaVaDD,5100,infovqa_val_anls,0.18507025741281324,0.005999863664896731
662
+ LLaVaDD,5100,mme_total_score,782.4783913565427,
663
+ LLaVaDD,5100,mmmu_val_mmmu_acc,0.27333,
664
+ LLaVaDD,5100,mmstar_average,0.336633140162813,
665
+ LLaVaDD,5100,ocrbench_ocrbench_accuracy,0.427,
666
+ LLaVaDD,5100,textvqa_val_exact_match,0.42166000000000003,0.006745344232414143
667
+ LLaVaDD,6000,average,0.35016629838344665,
668
+ LLaVaDD,6000,average_rank,2.857142857142857,
669
+ LLaVaDD,6000,docvqa_val_anls,0.3972329845041029,0.005775860539243304
670
+ LLaVaDD,6000,infovqa_val_anls,0.2075063299082507,0.006269613699866996
671
+ LLaVaDD,6000,mme_total_score,793.4260704281713,
672
+ LLaVaDD,6000,mmmu_val_mmmu_acc,0.26778,
673
+ LLaVaDD,6000,mmstar_average,0.31483847588832625,
674
+ LLaVaDD,6000,ocrbench_ocrbench_accuracy,0.466,
675
+ LLaVaDD,6000,textvqa_val_exact_match,0.44764000000000004,0.006783751907166682
676
+ LLaVaDD,7200,average,0.34725325204788143,
677
+ LLaVaDD,7200,average_rank,2.857142857142857,
678
+ LLaVaDD,7200,docvqa_val_anls,0.38590528101197885,0.0056434459440418885
679
+ LLaVaDD,7200,infovqa_val_anls,0.20202261217969525,0.006207536626913416
680
+ LLaVaDD,7200,mme_total_score,806.6480592236894,
681
+ LLaVaDD,7200,mmmu_val_mmmu_acc,0.27778,
682
+ LLaVaDD,7200,mmstar_average,0.31109161909561434,
683
+ LLaVaDD,7200,ocrbench_ocrbench_accuracy,0.461,
684
+ LLaVaDD,7200,textvqa_val_exact_match,0.44572,0.006774357143149495
685
+ LLaVaDD,7500,average,0.3567314821960954,
686
+ LLaVaDD,7500,average_rank,2.142857142857143,
687
+ LLaVaDD,7500,docvqa_val_anls,0.40402326367659314,0.005709021228633167
688
+ LLaVaDD,7500,infovqa_val_anls,0.20360823695540417,0.006242149206646578
689
+ LLaVaDD,7500,mme_total_score,881.968587434974,
690
+ LLaVaDD,7500,mmmu_val_mmmu_acc,0.26889,
691
+ LLaVaDD,7500,mmstar_average,0.3211273925445752,
692
+ LLaVaDD,7500,ocrbench_ocrbench_accuracy,0.486,
693
+ LLaVaDD,7500,textvqa_val_exact_match,0.45674,0.006794008109300271
694
+ LLaVaDD,8400,average,0.36048790647619494,
695
+ LLaVaDD,8400,average_rank,3.0,
696
+ LLaVaDD,8400,docvqa_val_anls,0.41445027737785084,0.005825413484689958
697
+ LLaVaDD,8400,infovqa_val_anls,0.2172068852347218,0.006375888876018907
698
+ LLaVaDD,8400,mme_total_score,838.7092837134853,
699
+ LLaVaDD,8400,mmmu_val_mmmu_acc,0.29444,
700
+ LLaVaDD,8400,mmstar_average,0.31933027624459676,
701
+ LLaVaDD,8400,ocrbench_ocrbench_accuracy,0.473,
702
+ LLaVaDD,8400,textvqa_val_exact_match,0.4445,0.006768213334577188
703
+ LLaVaDD,9600,average,0.35282227960826557,
704
+ LLaVaDD,9600,average_rank,3.0,
705
+ LLaVaDD,9600,docvqa_val_anls,0.39757298714048717,0.005640323691319893
706
+ LLaVaDD,9600,infovqa_val_anls,0.2056866550403572,0.006279512757986692
707
+ LLaVaDD,9600,mme_total_score,760.0508203281312,
708
+ LLaVaDD,9600,mmmu_val_mmmu_acc,0.26556,
709
+ LLaVaDD,9600,mmstar_average,0.32465403546874877,
710
+ LLaVaDD,9600,ocrbench_ocrbench_accuracy,0.469,
711
+ LLaVaDD,9600,textvqa_val_exact_match,0.45446000000000003,0.006778466729448514
712
+ LLaVaDD,9900,average,0.3568107860471943,
713
+ LLaVaDD,9900,average_rank,2.0,
714
+ LLaVaDD,9900,docvqa_val_anls,0.40618497552083077,0.005714626310350028
715
+ LLaVaDD,9900,infovqa_val_anls,0.2062911109085894,0.006269310831066159
716
+ LLaVaDD,9900,mme_total_score,828.2609043617447,
717
+ LLaVaDD,9900,mmmu_val_mmmu_acc,0.26667,
718
+ LLaVaDD,9900,mmstar_average,0.32645862985374585,
719
+ LLaVaDD,9900,ocrbench_ocrbench_accuracy,0.473,
720
+ LLaVaDD,9900,textvqa_val_exact_match,0.46225999999999995,0.006800763821638828
721
+ LLaVaDD,10800,average,0.36137323363878177,
722
+ LLaVaDD,10800,average_rank,2.857142857142857,
723
+ LLaVaDD,10800,docvqa_val_anls,0.408003869061574,0.005694760075750652
724
+ LLaVaDD,10800,infovqa_val_anls,0.21338055182077123,0.0063085701231859895
725
+ LLaVaDD,10800,mme_total_score,895.123949579832,
726
+ LLaVaDD,10800,mmmu_val_mmmu_acc,0.28444,
727
+ LLaVaDD,10800,mmstar_average,0.32415498095034523,
728
+ LLaVaDD,10800,ocrbench_ocrbench_accuracy,0.48,
729
+ LLaVaDD,10800,textvqa_val_exact_match,0.45826000000000006,0.0067923767383995465
730
+ LLaVaDD,11400,average,0.36018553894261496,
731
+ LLaVaDD,11400,average_rank,1.7142857142857142,
732
+ LLaVaDD,11400,docvqa_val_anls,0.4075278809955403,0.005745964403945676
733
+ LLaVaDD,11400,infovqa_val_anls,0.21426494246529132,0.0063051564080262214
734
+ LLaVaDD,11400,mme_total_score,924.1244497799119,
735
+ LLaVaDD,11400,mmmu_val_mmmu_acc,0.27222,
736
+ LLaVaDD,11400,mmstar_average,0.33054041019485814,
737
+ LLaVaDD,11400,ocrbench_ocrbench_accuracy,0.478,
738
+ LLaVaDD,11400,textvqa_val_exact_match,0.45856,0.0067880601670997605
739
+ LLaVaDD,12000,average,0.3604179175862374,
740
+ LLaVaDD,12000,average_rank,3.142857142857143,
741
+ LLaVaDD,12000,docvqa_val_anls,0.4168137297955176,0.005781419012200098
742
+ LLaVaDD,12000,infovqa_val_anls,0.20846969163165352,0.006257602143639074
743
+ LLaVaDD,12000,mme_total_score,947.7637054821929,
744
+ LLaVaDD,12000,mmmu_val_mmmu_acc,0.26778,
745
+ LLaVaDD,12000,mmstar_average,0.3158840840902534,
746
+ LLaVaDD,12000,ocrbench_ocrbench_accuracy,0.498,
747
+ LLaVaDD,12000,textvqa_val_exact_match,0.45556,0.006805283216437887
748
+ LLaVaDD,13200,average,0.3615047561957609,
749
+ LLaVaDD,13200,average_rank,2.5714285714285716,
750
+ LLaVaDD,13200,docvqa_val_anls,0.4019041370209914,0.005625248973963487
751
+ LLaVaDD,13200,infovqa_val_anls,0.20363163177627105,0.00622309778124376
752
+ LLaVaDD,13200,mme_total_score,874.9429771908764,
753
+ LLaVaDD,13200,mmmu_val_mmmu_acc,0.28111,
754
+ LLaVaDD,13200,mmstar_average,0.3177427683773028,
755
+ LLaVaDD,13200,ocrbench_ocrbench_accuracy,0.494,
756
+ LLaVaDD,13200,textvqa_val_exact_match,0.47063999999999995,0.006828303099939613
757
+ LLaVaDD,14400,average,0.35822126770736845,
758
+ LLaVaDD,14400,average_rank,2.857142857142857,
759
+ LLaVaDD,14400,docvqa_val_anls,0.40475932408589743,0.005711979175622161
760
+ LLaVaDD,14400,infovqa_val_anls,0.2054455223584203,0.006260304272567981
761
+ LLaVaDD,14400,mme_total_score,895.4330732292917,
762
+ LLaVaDD,14400,mmmu_val_mmmu_acc,0.24889,
763
+ LLaVaDD,14400,mmstar_average,0.3293727597998932,
764
+ LLaVaDD,14400,ocrbench_ocrbench_accuracy,0.486,
765
+ LLaVaDD,14400,textvqa_val_exact_match,0.47486000000000006,0.006809762892651316
766
+ LLaVaDD,14700,average,0.3558320320318881,
767
+ LLaVaDD,14700,average_rank,2.2857142857142856,
768
+ LLaVaDD,14700,docvqa_val_anls,0.40809052878460395,0.0057531214312033715
769
+ LLaVaDD,14700,infovqa_val_anls,0.20725347402609343,0.006333426981708809
770
+ LLaVaDD,14700,mme_total_score,934.3972589035614,
771
+ LLaVaDD,14700,mmmu_val_mmmu_acc,0.25889,
772
+ LLaVaDD,14700,mmstar_average,0.3138981893806314,
773
+ LLaVaDD,14700,ocrbench_ocrbench_accuracy,0.474,
774
+ LLaVaDD,14700,textvqa_val_exact_match,0.47286000000000006,0.0068163316054393255
775
+ LLaVaDD,15600,average,0.3531190433154545,
776
+ LLaVaDD,15600,average_rank,3.2857142857142856,
777
+ LLaVaDD,15600,docvqa_val_anls,0.39525955886140174,0.005587329122981871
778
+ LLaVaDD,15600,infovqa_val_anls,0.20744642424798548,0.006270716143292359
779
+ LLaVaDD,15600,mme_total_score,887.2177871148459,
780
+ LLaVaDD,15600,mmmu_val_mmmu_acc,0.25111,
781
+ LLaVaDD,15600,mmstar_average,0.3150982767833397,
782
+ LLaVaDD,15600,ocrbench_ocrbench_accuracy,0.483,
783
+ LLaVaDD,15600,textvqa_val_exact_match,0.4668000000000001,0.00682372806117965
784
+ LLaVaDD,16800,average,0.35105363138195517,
785
+ LLaVaDD,16800,average_rank,3.2857142857142856,
786
+ LLaVaDD,16800,docvqa_val_anls,0.41852303319453404,0.005850640721947784
787
+ LLaVaDD,16800,infovqa_val_anls,0.2060249552494562,0.006276240807887592
788
+ LLaVaDD,16800,mme_total_score,922.4671868747499,
789
+ LLaVaDD,16800,mmmu_val_mmmu_acc,0.26444,
790
+ LLaVaDD,16800,mmstar_average,0.2870137998477405,
791
+ LLaVaDD,16800,ocrbench_ocrbench_accuracy,0.476,
792
+ LLaVaDD,16800,textvqa_val_exact_match,0.45432,0.006822490512661711
793
+ LLaVaDD,17100,average,0.3539392341852292,
794
+ LLaVaDD,17100,average_rank,2.2857142857142856,
795
+ LLaVaDD,17100,docvqa_val_anls,0.3926126439334205,0.005597953615807782
796
+ LLaVaDD,17100,infovqa_val_anls,0.19981020781200884,0.006157098782486468
797
+ LLaVaDD,17100,mme_total_score,906.0204081632653,
798
+ LLaVaDD,17100,mmmu_val_mmmu_acc,0.26333,
799
+ LLaVaDD,17100,mmstar_average,0.32212255336594614,
800
+ LLaVaDD,17100,ocrbench_ocrbench_accuracy,0.486,
801
+ LLaVaDD,17100,textvqa_val_exact_match,0.45975999999999995,0.006811389541459004
802
+ LLaVaDD,18000,average,0.3577636224241274,
803
+ LLaVaDD,18000,average_rank,3.142857142857143,
804
+ LLaVaDD,18000,docvqa_val_anls,0.40703277772305824,0.005678461401864167
805
+ LLaVaDD,18000,infovqa_val_anls,0.20110975485759583,0.006181644423856455
806
+ LLaVaDD,18000,mme_total_score,810.5214085634254,
807
+ LLaVaDD,18000,mmmu_val_mmmu_acc,0.26444,
808
+ LLaVaDD,18000,mmstar_average,0.3233392019641104,
809
+ LLaVaDD,18000,ocrbench_ocrbench_accuracy,0.482,
810
+ LLaVaDD,18000,textvqa_val_exact_match,0.46865999999999997,0.006819241988099444
811
+ LLaVaDD,19200,average,0.35213697279154416,
812
+ LLaVaDD,19200,average_rank,3.2857142857142856,
813
+ LLaVaDD,19200,docvqa_val_anls,0.40393359954160324,0.0057202986837765315
814
+ LLaVaDD,19200,infovqa_val_anls,0.19769978171894423,0.006187032583796771
815
+ LLaVaDD,19200,mme_total_score,918.2750100040016,
816
+ LLaVaDD,19200,mmmu_val_mmmu_acc,0.26778,
817
+ LLaVaDD,19200,mmstar_average,0.31024845548871743,
818
+ LLaVaDD,19200,ocrbench_ocrbench_accuracy,0.478,
819
+ LLaVaDD,19200,textvqa_val_exact_match,0.45516,0.006808813910614232
820
+ LLaVaDD,19500,average,0.3515463502032892,
821
+ LLaVaDD,19500,average_rank,2.142857142857143,
822
+ LLaVaDD,19500,docvqa_val_anls,0.4008490038529374,0.00568561290008947
823
+ LLaVaDD,19500,infovqa_val_anls,0.2007011465723568,0.00621601219684737
824
+ LLaVaDD,19500,mme_total_score,817.8963585434174,
825
+ LLaVaDD,19500,mmmu_val_mmmu_acc,0.26556,
826
+ LLaVaDD,19500,mmstar_average,0.3017679507944412,
827
+ LLaVaDD,19500,ocrbench_ocrbench_accuracy,0.48,
828
+ LLaVaDD,19500,textvqa_val_exact_match,0.4604,0.006801913054739883
app/src/content/assets/data/all_ratings_luis.csv ADDED
@@ -0,0 +1,1201 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ run,step,metric,value,stderr
2
+ Baseline,1000,ai2d_exact_match,0.2548575129533679,0.007843322436924496
3
+ Baseline,1000,average,0.27120689295763617,
4
+ Baseline,1000,average_rank,3.0,
5
+ Baseline,1000,chartqa_relaxed_overall,0.3308,0.009411906161401973
6
+ Baseline,1000,docvqa_val_anls,0.3528553494243383,0.005852289239342309
7
+ Baseline,1000,infovqa_val_anls,0.17320578642581314,0.006297063452679795
8
+ Baseline,1000,mme_total_score,977.4280712284914,
9
+ Baseline,1000,mmmu_val_mmmu_acc,0.25222,
10
+ Baseline,1000,mmstar_average,0.23215874078908072,
11
+ Baseline,1000,ocrbench_ocrbench_accuracy,0.286,
12
+ Baseline,1000,seedbench_seed_all,0.2563646470261256,
13
+ Baseline,1000,textvqa_val_exact_match,0.3024,0.00628900296642181
14
+ Baseline,2000,ai2d_exact_match,0.26295336787564766,0.007923526907377255
15
+ Baseline,2000,average,0.3202068275596269,
16
+ Baseline,2000,average_rank,2.8,
17
+ Baseline,2000,chartqa_relaxed_overall,0.4688,0.009982508912777261
18
+ Baseline,2000,docvqa_val_anls,0.4452261510942785,0.00614755494712251
19
+ Baseline,2000,infovqa_val_anls,0.1820547866557169,0.006217861455795791
20
+ Baseline,2000,mme_total_score,1049.3036214485794,
21
+ Baseline,2000,mmmu_val_mmmu_acc,0.24556,
22
+ Baseline,2000,mmstar_average,0.21305462434540698,
23
+ Baseline,2000,ocrbench_ocrbench_accuracy,0.395,
24
+ Baseline,2000,seedbench_seed_all,0.258532518065592,
25
+ Baseline,2000,textvqa_val_exact_match,0.41068000000000005,0.006697862330024289
26
+ Baseline,3000,ai2d_exact_match,0.25226683937823835,0.007816909588794397
27
+ Baseline,3000,average,0.3507423834414229,
28
+ Baseline,3000,average_rank,2.6,
29
+ Baseline,3000,chartqa_relaxed_overall,0.5028,0.010001843767601082
30
+ Baseline,3000,docvqa_val_anls,0.502653993831009,0.006267072346683124
31
+ Baseline,3000,infovqa_val_anls,0.21728617578189535,0.006796941784959762
32
+ Baseline,3000,mme_total_score,1170.2383953581434,
33
+ Baseline,3000,mmmu_val_mmmu_acc,0.27556,
34
+ Baseline,3000,mmstar_average,0.25432376938577683,
35
+ Baseline,3000,ocrbench_ocrbench_accuracy,0.436,
36
+ Baseline,3000,seedbench_seed_all,0.2792106725958866,
37
+ Baseline,3000,textvqa_val_exact_match,0.43658,0.006766885462882726
38
+ Baseline,4000,ai2d_exact_match,0.2645725388601036,0.007939149662089447
39
+ Baseline,4000,average,0.36961781722974835,
40
+ Baseline,4000,average_rank,2.8,
41
+ Baseline,4000,chartqa_relaxed_overall,0.5312,0.009982508912777261
42
+ Baseline,4000,docvqa_val_anls,0.5374434618615119,0.0062905728113059655
43
+ Baseline,4000,infovqa_val_anls,0.2287924838861707,0.006994568698639919
44
+ Baseline,4000,mme_total_score,1155.203781512605,
45
+ Baseline,4000,mmmu_val_mmmu_acc,0.25556,
46
+ Baseline,4000,mmstar_average,0.2575590188757354,
47
+ Baseline,4000,ocrbench_ocrbench_accuracy,0.453,
48
+ Baseline,4000,seedbench_seed_all,0.33913285158421347,
49
+ Baseline,4000,textvqa_val_exact_match,0.4593,0.006791695475025738
50
+ Baseline,5000,ai2d_exact_match,0.3125,0.008342439145556371
51
+ Baseline,5000,average,0.3974627910380972,
52
+ Baseline,5000,average_rank,2.3,
53
+ Baseline,5000,chartqa_relaxed_overall,0.5488,0.00995424828018316
54
+ Baseline,5000,docvqa_val_anls,0.552360266782429,0.006300308519952055
55
+ Baseline,5000,infovqa_val_anls,0.23425555286643698,0.007002254622066442
56
+ Baseline,5000,mme_total_score,1181.4653861544618,
57
+ Baseline,5000,mmmu_val_mmmu_acc,0.26667,
58
+ Baseline,5000,mmstar_average,0.29596648146165705,
59
+ Baseline,5000,ocrbench_ocrbench_accuracy,0.462,
60
+ Baseline,5000,seedbench_seed_all,0.43107281823235133,
61
+ Baseline,5000,textvqa_val_exact_match,0.47354000000000007,0.0068172185364497985
62
+ Baseline,6000,ai2d_exact_match,0.358160621761658,0.008629463221867162
63
+ Baseline,6000,average,0.4161227404571003,
64
+ Baseline,6000,average_rank,2.3,
65
+ Baseline,6000,chartqa_relaxed_overall,0.5628,0.00992279440175477
66
+ Baseline,6000,docvqa_val_anls,0.5747451497228876,0.00625495440870239
67
+ Baseline,6000,infovqa_val_anls,0.22152017368968838,0.006604546680525351
68
+ Baseline,6000,mme_total_score,1284.1648659463785,
69
+ Baseline,6000,mmmu_val_mmmu_acc,0.27111,
70
+ Baseline,6000,mmstar_average,0.2978489412854164,
71
+ Baseline,6000,ocrbench_ocrbench_accuracy,0.495,
72
+ Baseline,6000,seedbench_seed_all,0.4795997776542524,
73
+ Baseline,6000,textvqa_val_exact_match,0.48432,0.006800535050670284
74
+ Baseline,7000,ai2d_exact_match,0.3707901554404145,0.00869347755587734
75
+ Baseline,7000,average,0.4291083177345374,
76
+ Baseline,7000,average_rank,2.1,
77
+ Baseline,7000,chartqa_relaxed_overall,0.5656,0.009915542506251351
78
+ Baseline,7000,docvqa_val_anls,0.5940907049431567,0.006224236305767187
79
+ Baseline,7000,infovqa_val_anls,0.2515675215816963,0.007105097396092786
80
+ Baseline,7000,mme_total_score,1185.875650260104,
81
+ Baseline,7000,mmmu_val_mmmu_acc,0.26556,
82
+ Baseline,7000,mmstar_average,0.31372400960777047,
83
+ Baseline,7000,ocrbench_ocrbench_accuracy,0.504,
84
+ Baseline,7000,seedbench_seed_all,0.4964424680377988,
85
+ Baseline,7000,textvqa_val_exact_match,0.5002,0.006794794025220267
86
+ Baseline,8000,ai2d_exact_match,0.37759067357512954,0.008725299846043883
87
+ Baseline,8000,average,0.43846759477995995,
88
+ Baseline,8000,average_rank,1.9,
89
+ Baseline,8000,chartqa_relaxed_overall,0.5832,0.009862556058385773
90
+ Baseline,8000,docvqa_val_anls,0.6017336419437208,0.006231612198089698
91
+ Baseline,8000,infovqa_val_anls,0.2449256624147254,0.006992518502948913
92
+ Baseline,8000,mme_total_score,1199.2409963985594,
93
+ Baseline,8000,mmmu_val_mmmu_acc,0.28111,
94
+ Baseline,8000,mmstar_average,0.33512257186205047,
95
+ Baseline,8000,ocrbench_ocrbench_accuracy,0.51,
96
+ Baseline,8000,seedbench_seed_all,0.5024458032240133,
97
+ Baseline,8000,textvqa_val_exact_match,0.51008,0.006796301690135059
98
+ Baseline,9000,ai2d_exact_match,0.4067357512953368,0.008841214921078996
99
+ Baseline,9000,average,0.4422510732201056,
100
+ Baseline,9000,average_rank,2.5,
101
+ Baseline,9000,chartqa_relaxed_overall,0.5912,0.009834211136815875
102
+ Baseline,9000,docvqa_val_anls,0.6170968481662739,0.00617235763542544
103
+ Baseline,9000,infovqa_val_anls,0.23537031288570615,0.00670318154156447
104
+ Baseline,9000,mme_total_score,1231.5195078031213,
105
+ Baseline,9000,mmmu_val_mmmu_acc,0.25889,
106
+ Baseline,9000,mmstar_average,0.3216444898242951,
107
+ Baseline,9000,ocrbench_ocrbench_accuracy,0.515,
108
+ Baseline,9000,seedbench_seed_all,0.5120622568093385,
109
+ Baseline,9000,textvqa_val_exact_match,0.52226,0.006792711289708482
110
+ Baseline,10000,ai2d_exact_match,0.39993523316062174,0.008817096257082848
111
+ Baseline,10000,average,0.4523875703250908,
112
+ Baseline,10000,average_rank,1.9,
113
+ Baseline,10000,chartqa_relaxed_overall,0.5996,0.00980154906867574
114
+ Baseline,10000,docvqa_val_anls,0.6262613496433054,0.006147756371688175
115
+ Baseline,10000,infovqa_val_anls,0.263290074230132,0.007186788766942786
116
+ Baseline,10000,mme_total_score,1240.8218287314926,
117
+ Baseline,10000,mmmu_val_mmmu_acc,0.28778,
118
+ Baseline,10000,mmstar_average,0.32972717906018517,
119
+ Baseline,10000,ocrbench_ocrbench_accuracy,0.517,
120
+ Baseline,10000,seedbench_seed_all,0.5217342968315731,
121
+ Baseline,10000,textvqa_val_exact_match,0.5261600000000001,0.006785774843600811
122
+ Baseline,11000,ai2d_exact_match,0.422279792746114,0.008889771831066474
123
+ Baseline,11000,average,0.4561398159525099,
124
+ Baseline,11000,average_rank,2.3,
125
+ Baseline,11000,chartqa_relaxed_overall,0.6104,0.009755142291143075
126
+ Baseline,11000,docvqa_val_anls,0.6373130149166712,0.006128022584995044
127
+ Baseline,11000,infovqa_val_anls,0.24419378339723755,0.006897644885887063
128
+ Baseline,11000,mme_total_score,1322.9488795518205,
129
+ Baseline,11000,mmmu_val_mmmu_acc,0.27778,
130
+ Baseline,11000,mmstar_average,0.3298563439522548,
131
+ Baseline,11000,ocrbench_ocrbench_accuracy,0.521,
132
+ Baseline,11000,seedbench_seed_all,0.5237354085603113,
133
+ Baseline,11000,textvqa_val_exact_match,0.5387,0.006770851562852138
134
+ Baseline,12000,ai2d_exact_match,0.42001295336787564,0.008883255931688034
135
+ Baseline,12000,average,0.4582751140055433,
136
+ Baseline,12000,average_rank,2.4,
137
+ Baseline,12000,chartqa_relaxed_overall,0.618,0.009719474639861454
138
+ Baseline,12000,docvqa_val_anls,0.6393961983751871,0.0061228747388476674
139
+ Baseline,12000,infovqa_val_anls,0.24798874058574302,0.006855374548993139
140
+ Baseline,12000,mme_total_score,1225.6453581432572,
141
+ Baseline,12000,mmmu_val_mmmu_acc,0.27889,
142
+ Baseline,12000,mmstar_average,0.34010867846816534,
143
+ Baseline,12000,ocrbench_ocrbench_accuracy,0.512,
144
+ Baseline,12000,seedbench_seed_all,0.5350194552529183,
145
+ Baseline,12000,textvqa_val_exact_match,0.5330600000000001,0.006777713092109446
146
+ Baseline,13000,ai2d_exact_match,0.4375,0.008928571428571428
147
+ Baseline,13000,average,0.4692868662590049,
148
+ Baseline,13000,average_rank,1.7,
149
+ Baseline,13000,chartqa_relaxed_overall,0.6148,0.00973479791861169
150
+ Baseline,13000,docvqa_val_anls,0.6511374872549951,0.006086953065248391
151
+ Baseline,13000,infovqa_val_anls,0.24465055100441893,0.006808432538374664
152
+ Baseline,13000,mme_total_score,1281.7122849139657,
153
+ Baseline,13000,mmmu_val_mmmu_acc,0.28222,
154
+ Baseline,13000,mmstar_average,0.3453069542917521,
155
+ Baseline,13000,ocrbench_ocrbench_accuracy,0.549,
156
+ Baseline,13000,seedbench_seed_all,0.5442468037798777,
157
+ Baseline,13000,textvqa_val_exact_match,0.55472,0.0067416788982325
158
+ Baseline,14000,ai2d_exact_match,0.4572538860103627,0.00896620675297095
159
+ Baseline,14000,average,0.47352486841689195,
160
+ Baseline,14000,average_rank,1.9,
161
+ Baseline,14000,chartqa_relaxed_overall,0.6172,0.009723347231923635
162
+ Baseline,14000,docvqa_val_anls,0.6502269393708169,0.006057950730638126
163
+ Baseline,14000,infovqa_val_anls,0.25805460837190913,0.007037735231659539
164
+ Baseline,14000,mme_total_score,1309.1444577831132,
165
+ Baseline,14000,mmmu_val_mmmu_acc,0.28111,
166
+ Baseline,14000,mmstar_average,0.34575818188776586,
167
+ Baseline,14000,ocrbench_ocrbench_accuracy,0.551,
168
+ Baseline,14000,seedbench_seed_all,0.5483602001111729,
169
+ Baseline,14000,textvqa_val_exact_match,0.55276,0.006751206724612103
170
+ Baseline,15000,ai2d_exact_match,0.45045336787564766,0.008954861634252399
171
+ Baseline,15000,average,0.47878665012878824,
172
+ Baseline,15000,average_rank,1.4,
173
+ Baseline,15000,chartqa_relaxed_overall,0.612,0.009747841205275417
174
+ Baseline,15000,docvqa_val_anls,0.6621413031955148,0.006056838050222495
175
+ Baseline,15000,infovqa_val_anls,0.2706898598157733,0.007200315730154543
176
+ Baseline,15000,mme_total_score,1384.2171868747498,
177
+ Baseline,15000,mmmu_val_mmmu_acc,0.30222,
178
+ Baseline,15000,mmstar_average,0.35408135695920684,
179
+ Baseline,15000,ocrbench_ocrbench_accuracy,0.558,
180
+ Baseline,15000,seedbench_seed_all,0.5411339633129516,
181
+ Baseline,15000,textvqa_val_exact_match,0.5583600000000001,0.0067279027203879065
182
+ Baseline,16000,ai2d_exact_match,0.45077720207253885,0.008955440137395838
183
+ Baseline,16000,average,0.47665128022935843,
184
+ Baseline,16000,average_rank,2.1,
185
+ Baseline,16000,chartqa_relaxed_overall,0.632,0.00964715642305132
186
+ Baseline,16000,docvqa_val_anls,0.6709415729142987,0.005999818105621502
187
+ Baseline,16000,infovqa_val_anls,0.26050032542402035,0.006997451875879188
188
+ Baseline,16000,mme_total_score,1317.8491396558625,
189
+ Baseline,16000,mmmu_val_mmmu_acc,0.27556,
190
+ Baseline,16000,mmstar_average,0.33214333327093315,
191
+ Baseline,16000,ocrbench_ocrbench_accuracy,0.56,
192
+ Baseline,16000,seedbench_seed_all,0.5463590883824346,
193
+ Baseline,16000,textvqa_val_exact_match,0.56158,0.006723854754867398
194
+ Baseline,17000,ai2d_exact_match,0.45919689119170987,0.008969138793675545
195
+ Baseline,17000,average,0.4777141780162423,
196
+ Baseline,17000,average_rank,1.8,
197
+ Baseline,17000,chartqa_relaxed_overall,0.632,0.00964715642305132
198
+ Baseline,17000,docvqa_val_anls,0.6796338519136422,0.005948761388267941
199
+ Baseline,17000,infovqa_val_anls,0.28070956072505215,0.007298333094144192
200
+ Baseline,17000,mme_total_score,1381.9161664665867,
201
+ Baseline,17000,mmmu_val_mmmu_acc,0.27667,
202
+ Baseline,17000,mmstar_average,0.3370289492329521,
203
+ Baseline,17000,ocrbench_ocrbench_accuracy,0.519,
204
+ Baseline,17000,seedbench_seed_all,0.5510283490828238,
205
+ Baseline,17000,textvqa_val_exact_match,0.56416,0.006724830373229479
206
+ Baseline,18000,ai2d_exact_match,0.46567357512953367,0.008977921602780726
207
+ Baseline,18000,average,0.4819834595278701,
208
+ Baseline,18000,average_rank,1.6,
209
+ Baseline,18000,chartqa_relaxed_overall,0.6376,0.009615793331418735
210
+ Baseline,18000,docvqa_val_anls,0.6775884603912571,0.005972234236435759
211
+ Baseline,18000,infovqa_val_anls,0.27154318420389256,0.007164903131667027
212
+ Baseline,18000,mme_total_score,1336.922769107643,
213
+ Baseline,18000,mmmu_val_mmmu_acc,0.28667,
214
+ Baseline,18000,mmstar_average,0.34482796716566916,
215
+ Baseline,18000,ocrbench_ocrbench_accuracy,0.533,
216
+ Baseline,18000,seedbench_seed_all,0.5543079488604781,
217
+ Baseline,18000,textvqa_val_exact_match,0.5666399999999999,0.006713392287599574
218
+ Baseline,19000,ai2d_exact_match,0.4682642487046632,0.008981008686994101
219
+ Baseline,19000,average,0.4899006713916878,
220
+ Baseline,19000,average_rank,1.4,
221
+ Baseline,19000,chartqa_relaxed_overall,0.6444,0.009575809858898698
222
+ Baseline,19000,docvqa_val_anls,0.678226526479947,0.005970619221588814
223
+ Baseline,19000,infovqa_val_anls,0.26993847247278,0.0071348470764911525
224
+ Baseline,19000,mme_total_score,1406.6628651460583,
225
+ Baseline,19000,mmmu_val_mmmu_acc,0.28333,
226
+ Baseline,19000,mmstar_average,0.356220913822775,
227
+ Baseline,19000,ocrbench_ocrbench_accuracy,0.577,
228
+ Baseline,19000,seedbench_seed_all,0.554585881045025,
229
+ Baseline,19000,textvqa_val_exact_match,0.57714,0.0066918487914812905
230
+ Baseline,20000,ai2d_exact_match,0.47571243523316065,0.00898853090258662
231
+ Baseline,20000,average,0.4873169067639118,
232
+ Baseline,20000,average_rank,1.4,
233
+ Baseline,20000,chartqa_relaxed_overall,0.6336,0.009638338810708618
234
+ Baseline,20000,docvqa_val_anls,0.6895214454380043,0.005896462073053767
235
+ Baseline,20000,infovqa_val_anls,0.2655657550458317,0.007033265532032538
236
+ Baseline,20000,mme_total_score,1324.6738695478193,
237
+ Baseline,20000,mmmu_val_mmmu_acc,0.30111,
238
+ Baseline,20000,mmstar_average,0.33806766134497995,
239
+ Baseline,20000,ocrbench_ocrbench_accuracy,0.555,
240
+ Baseline,20000,seedbench_seed_all,0.5587548638132296,
241
+ Baseline,20000,textvqa_val_exact_match,0.56852,0.006720151338087659
242
+ ≥2,1000,ai2d_exact_match,0.27331606217616583,0.008021157484423315
243
+ ≥2,1000,average,0.2964817591841572,
244
+ ≥2,1000,average_rank,2.0,
245
+ ≥2,1000,chartqa_relaxed_overall,0.4016,0.009806398022560107
246
+ ≥2,1000,docvqa_val_anls,0.38703197724603455,0.0059317827343935035
247
+ ≥2,1000,infovqa_val_anls,0.17280000404070578,0.006201144732918485
248
+ ≥2,1000,mme_total_score,961.9496798719488,
249
+ ≥2,1000,mmmu_val_mmmu_acc,0.27556,
250
+ ≥2,1000,mmstar_average,0.20051212493658782,
251
+ ≥2,1000,ocrbench_ocrbench_accuracy,0.331,
252
+ ≥2,1000,seedbench_seed_all,0.25219566425792106,
253
+ ≥2,1000,textvqa_val_exact_match,0.37432,0.006614110432353112
254
+ ≥2,2000,ai2d_exact_match,0.27428756476683935,0.008030027397236182
255
+ ≥2,2000,average,0.3376151239444176,
256
+ ≥2,2000,average_rank,1.8,
257
+ ≥2,2000,chartqa_relaxed_overall,0.4984,0.010001949389825897
258
+ ≥2,2000,docvqa_val_anls,0.47035044389194575,0.006171152822696564
259
+ ≥2,2000,infovqa_val_anls,0.21264444578610614,0.006798221032077756
260
+ ≥2,2000,mme_total_score,995.0442176870747,
261
+ ≥2,2000,mmmu_val_mmmu_acc,0.26111,
262
+ ≥2,2000,mmstar_average,0.2371410151404708,
263
+ ≥2,2000,ocrbench_ocrbench_accuracy,0.386,
264
+ ≥2,2000,seedbench_seed_all,0.27276264591439686,
265
+ ≥2,2000,textvqa_val_exact_match,0.42583999999999994,0.006752390527477444
266
+ ≥2,3000,ai2d_exact_match,0.28886010362694303,0.008157423105367313
267
+ ≥2,3000,average,0.3650476191493284,
268
+ ≥2,3000,average_rank,2.1,
269
+ ≥2,3000,chartqa_relaxed_overall,0.5296,0.009984458511341809
270
+ ≥2,3000,docvqa_val_anls,0.5084048093337913,0.006266409805144786
271
+ ≥2,3000,infovqa_val_anls,0.226696840609911,0.0070183318907300766
272
+ ≥2,3000,mme_total_score,966.6394557823129,
273
+ ≥2,3000,mmmu_val_mmmu_acc,0.27556,
274
+ ≥2,3000,mmstar_average,0.25798680765602255,
275
+ ≥2,3000,ocrbench_ocrbench_accuracy,0.423,
276
+ ≥2,3000,seedbench_seed_all,0.3360200111172874,
277
+ ≥2,3000,textvqa_val_exact_match,0.4393,0.0067683280101374045
278
+ ≥2,4000,ai2d_exact_match,0.3180051813471503,0.00838183912252989
279
+ ≥2,4000,average,0.3939919625964655,
280
+ ≥2,4000,average_rank,2.0,
281
+ ≥2,4000,chartqa_relaxed_overall,0.5392,0.009971214271372281
282
+ ≥2,4000,docvqa_val_anls,0.5318426170932731,0.006287567577266625
283
+ ≥2,4000,infovqa_val_anls,0.24176968468370258,0.007226680233814427
284
+ ≥2,4000,mme_total_score,1052.9128651460585,
285
+ ≥2,4000,mmmu_val_mmmu_acc,0.27778,
286
+ ≥2,4000,mmstar_average,0.30433696178936676,
287
+ ≥2,4000,ocrbench_ocrbench_accuracy,0.447,
288
+ ≥2,4000,seedbench_seed_all,0.42779321845469703,
289
+ ≥2,4000,textvqa_val_exact_match,0.4581999999999999,0.006800867765254084
290
+ ≥2,5000,ai2d_exact_match,0.3448834196891192,0.008555140353607655
291
+ ≥2,5000,average,0.40963271881608265,
292
+ ≥2,5000,average_rank,2.1,
293
+ ≥2,5000,chartqa_relaxed_overall,0.548,0.009955804699716018
294
+ ≥2,5000,docvqa_val_anls,0.575799913178854,0.006211088978189562
295
+ ≥2,5000,infovqa_val_anls,0.25711323262099633,0.0073775881337487925
296
+ ≥2,5000,mme_total_score,1010.4850940376151,
297
+ ≥2,5000,mmmu_val_mmmu_acc,0.27667,
298
+ ≥2,5000,mmstar_average,0.2871021117490485,
299
+ ≥2,5000,ocrbench_ocrbench_accuracy,0.455,
300
+ ≥2,5000,seedbench_seed_all,0.46642579210672597,
301
+ ≥2,5000,textvqa_val_exact_match,0.4757,0.006785477915527278
302
+ ≥2,6000,ai2d_exact_match,0.3795336787564767,0.008734055590837087
303
+ ≥2,6000,average,0.423161039572533,
304
+ ≥2,6000,average_rank,1.4,
305
+ ≥2,6000,chartqa_relaxed_overall,0.5668,0.009912336039617753
306
+ ≥2,6000,docvqa_val_anls,0.5827000147792567,0.006217654063020532
307
+ ≥2,6000,infovqa_val_anls,0.24558020684647988,0.0071473774205313935
308
+ ≥2,6000,mme_total_score,1096.4623849539817,
309
+ ≥2,6000,mmmu_val_mmmu_acc,0.27222,
310
+ ≥2,6000,mmstar_average,0.3026938215293386,
311
+ ≥2,6000,ocrbench_ocrbench_accuracy,0.475,
312
+ ≥2,6000,seedbench_seed_all,0.49494163424124515,
313
+ ≥2,6000,textvqa_val_exact_match,0.4889799999999999,0.006798040496416463
314
+ ≥2,7000,ai2d_exact_match,0.3863341968911917,0.00876353292332671
315
+ ≥2,7000,average,0.43260201849012403,
316
+ ≥2,7000,average_rank,2.1,
317
+ ≥2,7000,chartqa_relaxed_overall,0.572,0.009897756626351943
318
+ ≥2,7000,docvqa_val_anls,0.5958889673096114,0.006197986096231253
319
+ ≥2,7000,infovqa_val_anls,0.24831461076228495,0.0071830066608344805
320
+ ≥2,7000,mme_total_score,1098.0422168867549,
321
+ ≥2,7000,mmmu_val_mmmu_acc,0.28333,
322
+ ≥2,7000,mmstar_average,0.31254705626181345,
323
+ ≥2,7000,ocrbench_ocrbench_accuracy,0.493,
324
+ ≥2,7000,seedbench_seed_all,0.5060033351862145,
325
+ ≥2,7000,textvqa_val_exact_match,0.496,0.006798444216786202
326
+ ≥2,8000,ai2d_exact_match,0.4025259067357513,0.00882649222855129
327
+ ≥2,8000,average,0.4423608272909927,
328
+ ≥2,8000,average_rank,2.1,
329
+ ≥2,8000,chartqa_relaxed_overall,0.5832,0.009862556058385773
330
+ ≥2,8000,docvqa_val_anls,0.6081292058298197,0.006190473638311687
331
+ ≥2,8000,infovqa_val_anls,0.25707448915865344,0.007179410853014501
332
+ ≥2,8000,mme_total_score,1100.4132653061224,
333
+ ≥2,8000,mmmu_val_mmmu_acc,0.28,
334
+ ≥2,8000,mmstar_average,0.3170263263849818,
335
+ ≥2,8000,ocrbench_ocrbench_accuracy,0.504,
336
+ ≥2,8000,seedbench_seed_all,0.5167315175097277,
337
+ ≥2,8000,textvqa_val_exact_match,0.5125600000000001,0.006790351320381798
338
+ ≥2,9000,ai2d_exact_match,0.4106217616580311,0.008854207883828036
339
+ ≥2,9000,average,0.4477239927349069,
340
+ ≥2,9000,average_rank,1.8,
341
+ ≥2,9000,chartqa_relaxed_overall,0.5884,0.009844437067525526
342
+ ≥2,9000,docvqa_val_anls,0.6233981201771228,0.006152789393932141
343
+ ≥2,9000,infovqa_val_anls,0.25099979430746866,0.006997337550850154
344
+ ≥2,9000,mme_total_score,1100.9423769507803,
345
+ ≥2,9000,mmmu_val_mmmu_acc,0.27778,
346
+ ≥2,9000,mmstar_average,0.3172130122236236,
347
+ ≥2,9000,ocrbench_ocrbench_accuracy,0.518,
348
+ ≥2,9000,seedbench_seed_all,0.5178432462479156,
349
+ ≥2,9000,textvqa_val_exact_match,0.5252600000000001,0.006790435073078627
350
+ ≥2,10000,ai2d_exact_match,0.41904145077720206,0.008880404559123598
351
+ ≥2,10000,average,0.450650749528602,
352
+ ≥2,10000,average_rank,2.2,
353
+ ≥2,10000,chartqa_relaxed_overall,0.5956,0.009817474681589429
354
+ ≥2,10000,docvqa_val_anls,0.6254308760372823,0.006142114135609194
355
+ ≥2,10000,infovqa_val_anls,0.23792853517114784,0.006776022015067822
356
+ ≥2,10000,mme_total_score,1157.0735294117646,
357
+ ≥2,10000,mmmu_val_mmmu_acc,0.27667,
358
+ ≥2,10000,mmstar_average,0.31479930233765546,
359
+ ≥2,10000,ocrbench_ocrbench_accuracy,0.53,
360
+ ≥2,10000,seedbench_seed_all,0.5238465814341301,
361
+ ≥2,10000,textvqa_val_exact_match,0.53254,0.006777862315178193
362
+ ≥2,11000,ai2d_exact_match,0.43555699481865284,0.008924095913829727
363
+ ≥2,11000,average,0.4613124059808435,
364
+ ≥2,11000,average_rank,1.8,
365
+ ≥2,11000,chartqa_relaxed_overall,0.5984,0.009806398022560106
366
+ ≥2,11000,docvqa_val_anls,0.6453200065413649,0.0060722869307158955
367
+ ≥2,11000,infovqa_val_anls,0.24059820801450565,0.006814633527776416
368
+ ≥2,11000,mme_total_score,1262.6299519807922,
369
+ ≥2,11000,mmmu_val_mmmu_acc,0.3,
370
+ ≥2,11000,mmstar_average,0.33559717819403534,
371
+ ≥2,11000,ocrbench_ocrbench_accuracy,0.527,
372
+ ≥2,11000,seedbench_seed_all,0.5226792662590328,
373
+ ≥2,11000,textvqa_val_exact_match,0.54666,0.0067526356704400645
374
+ ≥2,12000,ai2d_exact_match,0.44073834196891193,0.008935721506916777
375
+ ≥2,12000,average,0.46516707040731664,
376
+ ≥2,12000,average_rank,1.9,
377
+ ≥2,12000,chartqa_relaxed_overall,0.598,0.009808000752013664
378
+ ≥2,12000,docvqa_val_anls,0.6402481933825662,0.006107198073878916
379
+ ≥2,12000,infovqa_val_anls,0.2601009880983462,0.0070991293032872695
380
+ ≥2,12000,mme_total_score,1112.7142857142858,
381
+ ≥2,12000,mmmu_val_mmmu_acc,0.31,
382
+ ≥2,12000,mmstar_average,0.32603422027717016,
383
+ ≥2,12000,ocrbench_ocrbench_accuracy,0.547,
384
+ ≥2,12000,seedbench_seed_all,0.523401889938855,
385
+ ≥2,12000,textvqa_val_exact_match,0.54098,0.006767635340177507
386
+ ≥2,13000,ai2d_exact_match,0.44041450777202074,0.008935023865613881
387
+ ≥2,13000,average,0.46553651974650545,
388
+ ≥2,13000,average_rank,2.2,
389
+ ≥2,13000,chartqa_relaxed_overall,0.6092,0.009760545645634788
390
+ ≥2,13000,docvqa_val_anls,0.6433035796450283,0.006095519860378371
391
+ ≥2,13000,infovqa_val_anls,0.2594356954563223,0.007105630672634776
392
+ ≥2,13000,mme_total_score,1207.9944977991197,
393
+ ≥2,13000,mmmu_val_mmmu_acc,0.28111,
394
+ ≥2,13000,mmstar_average,0.3383640832831994,
395
+ ≥2,13000,ocrbench_ocrbench_accuracy,0.539,
396
+ ≥2,13000,seedbench_seed_all,0.5294608115619789,
397
+ ≥2,13000,textvqa_val_exact_match,0.5495399999999999,0.006753508692222968
398
+ ≥2,14000,ai2d_exact_match,0.44462435233160624,0.008943792697097361
399
+ ≥2,14000,average,0.46921726913331274,
400
+ ≥2,14000,average_rank,2.1,
401
+ ≥2,14000,chartqa_relaxed_overall,0.612,0.009747841205275417
402
+ ≥2,14000,docvqa_val_anls,0.65515509916543,0.006051151525703575
403
+ ≥2,14000,infovqa_val_anls,0.2677755343748415,0.007100955702899581
404
+ ≥2,14000,mme_total_score,1163.8374349739895,
405
+ ≥2,14000,mmmu_val_mmmu_acc,0.28556,
406
+ ≥2,14000,mmstar_average,0.32353974705611904,
407
+ ≥2,14000,ocrbench_ocrbench_accuracy,0.543,
408
+ ≥2,14000,seedbench_seed_all,0.5332406892718177,
409
+ ≥2,14000,textvqa_val_exact_match,0.55806,0.006725656411892758
410
+ ≥2,15000,ai2d_exact_match,0.44624352331606215,0.008946992176353901
411
+ ≥2,15000,average,0.4737967933693773,
412
+ ≥2,15000,average_rank,2.2,
413
+ ≥2,15000,chartqa_relaxed_overall,0.618,0.009719474639861454
414
+ ≥2,15000,docvqa_val_anls,0.6614354910767699,0.006013191753461033
415
+ ≥2,15000,infovqa_val_anls,0.26176573129112124,0.007093151287118967
416
+ ≥2,15000,mme_total_score,1229.438475390156,
417
+ ≥2,15000,mmmu_val_mmmu_acc,0.29556,
418
+ ≥2,15000,mmstar_average,0.32387576651370553,
419
+ ≥2,15000,ocrbench_ocrbench_accuracy,0.561,
420
+ ≥2,15000,seedbench_seed_all,0.5351306281267371,
421
+ ≥2,15000,textvqa_val_exact_match,0.56116,0.006722390124486763
422
+ ≥2,16000,ai2d_exact_match,0.4478626943005181,0.0089500956222288
423
+ ≥2,16000,average,0.4748174802839308,
424
+ ≥2,16000,average_rank,2.2,
425
+ ≥2,16000,chartqa_relaxed_overall,0.6192,0.009713613422114641
426
+ ≥2,16000,docvqa_val_anls,0.6585392720477772,0.0060616936904167125
427
+ ≥2,16000,infovqa_val_anls,0.2653830027853819,0.007108417358601188
428
+ ≥2,16000,mme_total_score,1157.8782513005203,
429
+ ≥2,16000,mmmu_val_mmmu_acc,0.29889,
430
+ ≥2,16000,mmstar_average,0.3217940710425999,
431
+ ≥2,16000,ocrbench_ocrbench_accuracy,0.561,
432
+ ≥2,16000,seedbench_seed_all,0.5349082823790995,
433
+ ≥2,16000,textvqa_val_exact_match,0.5657800000000001,0.006716429140851619
434
+ ≥2,17000,ai2d_exact_match,0.4540155440414508,0.00896101461327443
435
+ ≥2,17000,average,0.4765363782507968,
436
+ ≥2,17000,average_rank,2.2,
437
+ ≥2,17000,chartqa_relaxed_overall,0.6184,0.009717527882093043
438
+ ≥2,17000,docvqa_val_anls,0.6605538305641464,0.006048170352990264
439
+ ≥2,17000,infovqa_val_anls,0.27438351817158263,0.007183740557624646
440
+ ≥2,17000,mme_total_score,1231.31512605042,
441
+ ≥2,17000,mmmu_val_mmmu_acc,0.30111,
442
+ ≥2,17000,mmstar_average,0.3273406426639828,
443
+ ≥2,17000,ocrbench_ocrbench_accuracy,0.555,
444
+ ≥2,17000,seedbench_seed_all,0.5349638688160089,
445
+ ≥2,17000,textvqa_val_exact_match,0.5630599999999999,0.006726822229512349
446
+ ≥2,18000,ai2d_exact_match,0.4540155440414508,0.008961014613274428
447
+ ≥2,18000,average,0.4749977548559891,
448
+ ≥2,18000,average_rank,2.3,
449
+ ≥2,18000,chartqa_relaxed_overall,0.614,0.009738559226822298
450
+ ≥2,18000,docvqa_val_anls,0.6647865229953943,0.00602531683337989
451
+ ≥2,18000,infovqa_val_anls,0.26486387970800995,0.006977819681460442
452
+ ≥2,18000,mme_total_score,1245.188775510204,
453
+ ≥2,18000,mmmu_val_mmmu_acc,0.29222,
454
+ ≥2,18000,mmstar_average,0.32473355790957514,
455
+ ≥2,18000,ocrbench_ocrbench_accuracy,0.555,
456
+ ≥2,18000,seedbench_seed_all,0.5365202890494719,
457
+ ≥2,18000,textvqa_val_exact_match,0.56884,0.006699820027260398
458
+ ≥2,19000,ai2d_exact_match,0.45466321243523317,0.00896208360613934
459
+ ≥2,19000,average,0.4768734192584572,
460
+ ≥2,19000,average_rank,2.6,
461
+ ≥2,19000,chartqa_relaxed_overall,0.62,0.009709671008043154
462
+ ≥2,19000,docvqa_val_anls,0.6628357233664792,0.006042075311037487
463
+ ≥2,19000,infovqa_val_anls,0.2657171063652747,0.007078002720459511
464
+ ≥2,19000,mme_total_score,1248.7323929571828,
465
+ ≥2,19000,mmmu_val_mmmu_acc,0.28889,
466
+ ≥2,19000,mmstar_average,0.32802808302127334,
467
+ ≥2,19000,ocrbench_ocrbench_accuracy,0.565,
468
+ ≥2,19000,seedbench_seed_all,0.5399666481378543,
469
+ ≥2,19000,textvqa_val_exact_match,0.5667599999999999,0.00671422643700147
470
+ ≥2,20000,ai2d_exact_match,0.46178756476683935,0.008972834678172942
471
+ ≥2,20000,average,0.47802392695549656,
472
+ ≥2,20000,average_rank,2.3,
473
+ ≥2,20000,chartqa_relaxed_overall,0.618,0.009719474639861454
474
+ ≥2,20000,docvqa_val_anls,0.666568303416173,0.0059980334517589756
475
+ ≥2,20000,infovqa_val_anls,0.2651324480102521,0.0070565217028431
476
+ ≥2,20000,mme_total_score,1233.6009403761504,
477
+ ≥2,20000,mmmu_val_mmmu_acc,0.28,
478
+ ≥2,20000,mmstar_average,0.33277914424945065,
479
+ ≥2,20000,ocrbench_ocrbench_accuracy,0.562,
480
+ ≥2,20000,seedbench_seed_all,0.5381878821567537,
481
+ ≥2,20000,textvqa_val_exact_match,0.5777599999999999,0.00668799090343766
482
+ ≥3,1000,ai2d_exact_match,0.2661917098445596,0.007954634970279362
483
+ ≥3,1000,average,0.2680725844272073,
484
+ ≥3,1000,average_rank,3.2,
485
+ ≥3,1000,chartqa_relaxed_overall,0.3476,0.009526069199715017
486
+ ≥3,1000,docvqa_val_anls,0.3752729856163278,0.005939283617489936
487
+ ≥3,1000,infovqa_val_anls,0.17325429231808173,0.0062340220795234725
488
+ ≥3,1000,mme_total_score,707.53231292517,
489
+ ≥3,1000,mmmu_val_mmmu_acc,0.23889,
490
+ ≥3,1000,mmstar_average,0.19784737378907616,
491
+ ≥3,1000,ocrbench_ocrbench_accuracy,0.288,
492
+ ≥3,1000,seedbench_seed_all,0.25041689827682045,
493
+ ≥3,1000,textvqa_val_exact_match,0.27518000000000004,0.006128613668775364
494
+ ≥3,2000,ai2d_exact_match,0.27266839378238344,0.008015217564479073
495
+ ≥3,2000,average,0.31253656058741547,
496
+ ≥3,2000,average_rank,3.4,
497
+ ≥3,2000,chartqa_relaxed_overall,0.4308,0.00990574548014469
498
+ ≥3,2000,docvqa_val_anls,0.4481749259885666,0.00619992092326252
499
+ ≥3,2000,infovqa_val_anls,0.19674507942801486,0.006580664003046453
500
+ ≥3,2000,mme_total_score,786.0510204081633,
501
+ ≥3,2000,mmmu_val_mmmu_acc,0.23556,
502
+ ≥3,2000,mmstar_average,0.19768658271923586,
503
+ ≥3,2000,ocrbench_ocrbench_accuracy,0.377,
504
+ ≥3,2000,seedbench_seed_all,0.2653140633685381,
505
+ ≥3,2000,textvqa_val_exact_match,0.38888,0.006660461055234364
506
+ ≥3,3000,ai2d_exact_match,0.28270725388601037,0.008104913435481193
507
+ ≥3,3000,average,0.34936609328629703,
508
+ ≥3,3000,average_rank,3.2,
509
+ ≥3,3000,chartqa_relaxed_overall,0.4844,0.009997131241172205
510
+ ≥3,3000,docvqa_val_anls,0.49044354643512195,0.0062294371457984315
511
+ ≥3,3000,infovqa_val_anls,0.21295743099446893,0.006855571779287104
512
+ ≥3,3000,mme_total_score,861.8877551020407,
513
+ ≥3,3000,mmmu_val_mmmu_acc,0.24889,
514
+ ≥3,3000,mmstar_average,0.258368014597926,
515
+ ≥3,3000,ocrbench_ocrbench_accuracy,0.394,
516
+ ≥3,3000,seedbench_seed_all,0.3434685936631462,
517
+ ≥3,3000,textvqa_val_exact_match,0.42906000000000005,0.0067494454796565755
518
+ ≥3,4000,ai2d_exact_match,0.3325777202072539,0.00847966336079129
519
+ ≥3,4000,average,0.3855383645559374,
520
+ ≥3,4000,average_rank,2.8,
521
+ ≥3,4000,chartqa_relaxed_overall,0.508,0.010000720262176365
522
+ ≥3,4000,docvqa_val_anls,0.5226854794419781,0.006293466169647169
523
+ ≥3,4000,infovqa_val_anls,0.2322658206586996,0.007103396837310004
524
+ ≥3,4000,mme_total_score,912.9521808723489,
525
+ ≥3,4000,mmmu_val_mmmu_acc,0.26667,
526
+ ≥3,4000,mmstar_average,0.3070035703119584,
527
+ ≥3,4000,ocrbench_ocrbench_accuracy,0.438,
528
+ ≥3,4000,seedbench_seed_all,0.41684269038354643,
529
+ ≥3,4000,textvqa_val_exact_match,0.4458,0.006781745381100857
530
+ ≥3,5000,ai2d_exact_match,0.34520725388601037,0.008557040186364025
531
+ ≥3,5000,average,0.39676974212184324,
532
+ ≥3,5000,average_rank,2.7,
533
+ ≥3,5000,chartqa_relaxed_overall,0.51,0.01
534
+ ≥3,5000,docvqa_val_anls,0.5420071464866951,0.006256421242173299
535
+ ≥3,5000,infovqa_val_anls,0.21485812900527704,0.0066319183580626885
536
+ ≥3,5000,mme_total_score,957.2279911964786,
537
+ ≥3,5000,mmmu_val_mmmu_acc,0.26111,
538
+ ≥3,5000,mmstar_average,0.30632830702822333,
539
+ ≥3,5000,ocrbench_ocrbench_accuracy,0.44,
540
+ ≥3,5000,seedbench_seed_all,0.47031684269038354,
541
+ ≥3,5000,textvqa_val_exact_match,0.4811,0.00681344572213808
542
+ ≥3,6000,ai2d_exact_match,0.37629533678756477,0.008719379877890884
543
+ ≥3,6000,average,0.40447034433869705,
544
+ ≥3,6000,average_rank,3.3,
545
+ ≥3,6000,chartqa_relaxed_overall,0.5084,0.010000589018267121
546
+ ≥3,6000,docvqa_val_anls,0.5540669563018141,0.006258072329892215
547
+ ≥3,6000,infovqa_val_anls,0.216535214445592,0.00668611609159469
548
+ ≥3,6000,mme_total_score,864.5272108843537,
549
+ ≥3,6000,mmmu_val_mmmu_acc,0.26889,
550
+ ≥3,6000,mmstar_average,0.2932406887895669,
551
+ ≥3,6000,ocrbench_ocrbench_accuracy,0.454,
552
+ ≥3,6000,seedbench_seed_all,0.4848249027237354,
553
+ ≥3,6000,textvqa_val_exact_match,0.48398,0.006803464510517356
554
+ ≥3,7000,ai2d_exact_match,0.3947538860103627,0.008797532848529207
555
+ ≥3,7000,average,0.42355543935120793,
556
+ ≥3,7000,average_rank,2.7,
557
+ ≥3,7000,chartqa_relaxed_overall,0.5488,0.00995424828018316
558
+ ≥3,7000,docvqa_val_anls,0.5797391833660968,0.006220930330963092
559
+ ≥3,7000,infovqa_val_anls,0.2221619185818123,0.00673372198453672
560
+ ≥3,7000,mme_total_score,866.3928571428571,
561
+ ≥3,7000,mmmu_val_mmmu_acc,0.28667,
562
+ ≥3,7000,mmstar_average,0.3209799250686363,
563
+ ≥3,7000,ocrbench_ocrbench_accuracy,0.476,
564
+ ≥3,7000,seedbench_seed_all,0.49327404113396334,
565
+ ≥3,7000,textvqa_val_exact_match,0.48962000000000006,0.006807769110659733
566
+ ≥3,8000,ai2d_exact_match,0.4102979274611399,0.008853146969712133
567
+ ≥3,8000,average,0.42791468613133354,
568
+ ≥3,8000,average_rank,3.4,
569
+ ≥3,8000,chartqa_relaxed_overall,0.5456,0.00996031822662661
570
+ ≥3,8000,docvqa_val_anls,0.5824594046059755,0.006268157085435711
571
+ ≥3,8000,infovqa_val_anls,0.22074277862778585,0.006618518997755148
572
+ ≥3,8000,mme_total_score,788.9880952380953,
573
+ ≥3,8000,mmmu_val_mmmu_acc,0.27556,
574
+ ≥3,8000,mmstar_average,0.32537357643818443,
575
+ ≥3,8000,ocrbench_ocrbench_accuracy,0.5,
576
+ ≥3,8000,seedbench_seed_all,0.5012784880489161,
577
+ ≥3,8000,textvqa_val_exact_match,0.48991999999999997,0.006810591424473371
578
+ ≥3,9000,ai2d_exact_match,0.4251943005181347,0.0088978675214111
579
+ ≥3,9000,average,0.4411468725875502,
580
+ ≥3,9000,average_rank,2.9,
581
+ ≥3,9000,chartqa_relaxed_overall,0.5648,0.009917647296166388
582
+ ≥3,9000,docvqa_val_anls,0.6050413765127355,0.006187758928771102
583
+ ≥3,9000,infovqa_val_anls,0.23301995192200392,0.00676964747288323
584
+ ≥3,9000,mme_total_score,825.0221088435375,
585
+ ≥3,9000,mmmu_val_mmmu_acc,0.27556,
586
+ ≥3,9000,mmstar_average,0.33219983189483276,
587
+ ≥3,9000,ocrbench_ocrbench_accuracy,0.504,
588
+ ≥3,9000,seedbench_seed_all,0.5115063924402445,
589
+ ≥3,9000,textvqa_val_exact_match,0.519,0.006787356896666665
590
+ ≥3,10000,ai2d_exact_match,0.4258419689119171,0.00889962357526378
591
+ ≥3,10000,average,0.44419201562479543,
592
+ ≥3,10000,average_rank,2.7,
593
+ ≥3,10000,chartqa_relaxed_overall,0.576,0.009885782289560632
594
+ ≥3,10000,docvqa_val_anls,0.6087522279355707,0.006173079977045839
595
+ ≥3,10000,infovqa_val_anls,0.24383042893389267,0.0069221731872859795
596
+ ≥3,10000,mme_total_score,915.8061224489795,
597
+ ≥3,10000,mmmu_val_mmmu_acc,0.27333,
598
+ ≥3,10000,mmstar_average,0.3351679228462254,
599
+ ≥3,10000,ocrbench_ocrbench_accuracy,0.489,
600
+ ≥3,10000,seedbench_seed_all,0.5180655919955531,
601
+ ≥3,10000,textvqa_val_exact_match,0.5277400000000001,0.006769908774345677
602
+ ≥3,11000,ai2d_exact_match,0.43426165803108807,0.008921034830887029
603
+ ≥3,11000,average,0.45138194167282136,
604
+ ≥3,11000,average_rank,2.9,
605
+ ≥3,11000,chartqa_relaxed_overall,0.5784,0.009878279615563902
606
+ ≥3,11000,docvqa_val_anls,0.6240570866567314,0.006144737191710238
607
+ ≥3,11000,infovqa_val_anls,0.2562175057951717,0.0071028888697453095
608
+ ≥3,11000,mme_total_score,852.3894557823129,
609
+ ≥3,11000,mmmu_val_mmmu_acc,0.28778,
610
+ ≥3,11000,mmstar_average,0.3331474836051967,
611
+ ≥3,11000,ocrbench_ocrbench_accuracy,0.5,
612
+ ≥3,11000,seedbench_seed_all,0.520733740967204,
613
+ ≥3,11000,textvqa_val_exact_match,0.5278400000000001,0.00678178334931745
614
+ ≥3,12000,ai2d_exact_match,0.4381476683937824,0.008930032335354965
615
+ ≥3,12000,average,0.45691171338244096,
616
+ ≥3,12000,average_rank,2.5,
617
+ ≥3,12000,chartqa_relaxed_overall,0.572,0.009897756626351943
618
+ ≥3,12000,docvqa_val_anls,0.6273497290110698,0.006129247411332687
619
+ ≥3,12000,infovqa_val_anls,0.268135358118058,0.007380056393275344
620
+ ≥3,12000,mme_total_score,893.8265306122448,
621
+ ≥3,12000,mmmu_val_mmmu_acc,0.29556,
622
+ ≥3,12000,mmstar_average,0.34474290394073753,
623
+ ≥3,12000,ocrbench_ocrbench_accuracy,0.508,
624
+ ≥3,12000,seedbench_seed_all,0.5255697609783213,
625
+ ≥3,12000,textvqa_val_exact_match,0.5327,0.006782133990735781
626
+ ≥3,13000,ai2d_exact_match,0.43458549222797926,0.008921805911548512
627
+ ≥3,13000,average,0.4607824778908788,
628
+ ≥3,13000,average_rank,2.7,
629
+ ≥3,13000,chartqa_relaxed_overall,0.5876,0.009847298295140926
630
+ ≥3,13000,docvqa_val_anls,0.6386402725745638,0.006069984676680257
631
+ ≥3,13000,infovqa_val_anls,0.2536816276782758,0.00704241123014852
632
+ ≥3,13000,mme_total_score,941.5953381352541,
633
+ ≥3,13000,mmmu_val_mmmu_acc,0.29667,
634
+ ≥3,13000,mmstar_average,0.34638755445148733,
635
+ ≥3,13000,ocrbench_ocrbench_accuracy,0.53,
636
+ ≥3,13000,seedbench_seed_all,0.5272373540856031,
637
+ ≥3,13000,textvqa_val_exact_match,0.53224,0.00678673179267349
638
+ ≥3,14000,ai2d_exact_match,0.43490932642487046,0.008922573118260885
639
+ ≥3,14000,average,0.4621098839598732,
640
+ ≥3,14000,average_rank,2.8,
641
+ ≥3,14000,chartqa_relaxed_overall,0.5936,0.009825183443166683
642
+ ≥3,14000,docvqa_val_anls,0.6373184890679852,0.006105256249191251
643
+ ≥3,14000,infovqa_val_anls,0.2624975120280117,0.007131056805776271
644
+ ≥3,14000,mme_total_score,901.2585034013605,
645
+ ≥3,14000,mmmu_val_mmmu_acc,0.28444,
646
+ ≥3,14000,mmstar_average,0.34930389493288816,
647
+ ≥3,14000,ocrbench_ocrbench_accuracy,0.517,
648
+ ≥3,14000,seedbench_seed_all,0.5355197331851028,
649
+ ≥3,14000,textvqa_val_exact_match,0.5444,0.006752217894092123
650
+ ≥3,15000,ai2d_exact_match,0.44527202072538863,0.008945084019331405
651
+ ≥3,15000,average,0.46643076140543904,
652
+ ≥3,15000,average_rank,3.0,
653
+ ≥3,15000,chartqa_relaxed_overall,0.5848,0.00985710144918839
654
+ ≥3,15000,docvqa_val_anls,0.642316016710227,0.006100312721783546
655
+ ≥3,15000,infovqa_val_anls,0.2596632231498878,0.007146587424008848
656
+ ≥3,15000,mme_total_score,891.8367346938775,
657
+ ≥3,15000,mmmu_val_mmmu_acc,0.29778,
658
+ ≥3,15000,mmstar_average,0.34413882163543197,
659
+ ≥3,15000,ocrbench_ocrbench_accuracy,0.538,
660
+ ≥3,15000,seedbench_seed_all,0.5361867704280155,
661
+ ≥3,15000,textvqa_val_exact_match,0.54972,0.006745330549116431
662
+ ≥3,16000,ai2d_exact_match,0.4494818652849741,0.008953103134587205
663
+ ≥3,16000,average,0.46786516199576034,
664
+ ≥3,16000,average_rank,2.7,
665
+ ≥3,16000,chartqa_relaxed_overall,0.5976,0.009809596692775395
666
+ ≥3,16000,docvqa_val_anls,0.6432815750341822,0.006081847680686157
667
+ ≥3,16000,infovqa_val_anls,0.2702450654855036,0.007372825383364985
668
+ ≥3,16000,mme_total_score,919.3826530612245,
669
+ ≥3,16000,mmmu_val_mmmu_acc,0.28333,
670
+ ≥3,16000,mmstar_average,0.3386692973489569,
671
+ ≥3,16000,ocrbench_ocrbench_accuracy,0.534,
672
+ ≥3,16000,seedbench_seed_all,0.5415786548082268,
673
+ ≥3,16000,textvqa_val_exact_match,0.5526,0.006745409410081935
674
+ ≥3,17000,ai2d_exact_match,0.4494818652849741,0.008953103134587206
675
+ ≥3,17000,average,0.4694732091424512,
676
+ ≥3,17000,average_rank,2.8,
677
+ ≥3,17000,chartqa_relaxed_overall,0.596,0.009815912634917984
678
+ ≥3,17000,docvqa_val_anls,0.6468732282054332,0.006069886071041202
679
+ ≥3,17000,infovqa_val_anls,0.2650584835459577,0.0072427928867972455
680
+ ≥3,17000,mme_total_score,889.5646258503401,
681
+ ≥3,17000,mmmu_val_mmmu_acc,0.29333,
682
+ ≥3,17000,mmstar_average,0.342978718252922,
683
+ ≥3,17000,ocrbench_ocrbench_accuracy,0.53,
684
+ ≥3,17000,seedbench_seed_all,0.5418565869927737,
685
+ ≥3,17000,textvqa_val_exact_match,0.5596800000000001,0.006734324743131207
686
+ ≥3,18000,ai2d_exact_match,0.45531088082901555,0.008963137311190377
687
+ ≥3,18000,average,0.46991408851845295,
688
+ ≥3,18000,average_rank,2.7,
689
+ ≥3,18000,chartqa_relaxed_overall,0.6036,0.009784943231599163
690
+ ≥3,18000,docvqa_val_anls,0.6501128555487647,0.006068985343727089
691
+ ≥3,18000,infovqa_val_anls,0.26796275265157754,0.007202201134473747
692
+ ≥3,18000,mme_total_score,894.1054421768707,
693
+ ≥3,18000,mmmu_val_mmmu_acc,0.28333,
694
+ ≥3,18000,mmstar_average,0.33590517144994875,
695
+ ≥3,18000,ocrbench_ocrbench_accuracy,0.534,
696
+ ≥3,18000,seedbench_seed_all,0.5412451361867704,
697
+ ≥3,18000,textvqa_val_exact_match,0.5577599999999999,0.0067408786051132655
698
+ ≥3,19000,ai2d_exact_match,0.4498056994818653,0.008953693133598168
699
+ ≥3,19000,average,0.47011136523574254,
700
+ ≥3,19000,average_rank,3.0,
701
+ ≥3,19000,chartqa_relaxed_overall,0.6096,0.009758751420735989
702
+ ≥3,19000,docvqa_val_anls,0.6538834113203496,0.006040538366936906
703
+ ≥3,19000,infovqa_val_anls,0.2705360277052952,0.007291872911349649
704
+ ≥3,19000,mme_total_score,906.3231292517007,
705
+ ≥3,19000,mmmu_val_mmmu_acc,0.27556,
706
+ ≥3,19000,mmstar_average,0.3356215177081144,
707
+ ≥3,19000,ocrbench_ocrbench_accuracy,0.539,
708
+ ≥3,19000,seedbench_seed_all,0.5441356309060589,
709
+ ≥3,19000,textvqa_val_exact_match,0.5528599999999999,0.006753272200724876
710
+ ≥3,20000,ai2d_exact_match,0.44656735751295334,0.008947620544957215
711
+ ≥3,20000,average,0.4679556547685855,
712
+ ≥3,20000,average_rank,2.9,
713
+ ≥3,20000,chartqa_relaxed_overall,0.5976,0.009809596692775395
714
+ ≥3,20000,docvqa_val_anls,0.6493769742508846,0.006072933213063366
715
+ ≥3,20000,infovqa_val_anls,0.26540905854876357,0.007209592372844281
716
+ ≥3,20000,mme_total_score,926.0901360544218,
717
+ ≥3,20000,mmmu_val_mmmu_acc,0.27333,
718
+ ≥3,20000,mmstar_average,0.34157097675697473,
719
+ ≥3,20000,ocrbench_ocrbench_accuracy,0.539,
720
+ ≥3,20000,seedbench_seed_all,0.5437465258476931,
721
+ ≥3,20000,textvqa_val_exact_match,0.555,0.0067346322137300735
722
+ ≥4,1000,ai2d_exact_match,0.25874352331606215,0.00788225861008497
723
+ ≥4,1000,average,0.27914578527127093,
724
+ ≥4,1000,average_rank,2.6,
725
+ ≥4,1000,chartqa_relaxed_overall,0.3512,0.009548816468986268
726
+ ≥4,1000,docvqa_val_anls,0.36858592315444033,0.005921151680127505
727
+ ≥4,1000,infovqa_val_anls,0.17699311795329079,0.006346227986201575
728
+ ≥4,1000,mme_total_score,671.343537414966,
729
+ ≥4,1000,mmmu_val_mmmu_acc,0.27111,
730
+ ≥4,1000,mmstar_average,0.2086858732233149,
731
+ ≥4,1000,ocrbench_ocrbench_accuracy,0.261,
732
+ ≥4,1000,seedbench_seed_all,0.2605336297943302,
733
+ ≥4,1000,textvqa_val_exact_match,0.35546000000000005,0.006549153835664011
734
+ ≥4,2000,ai2d_exact_match,0.280440414507772,0.008085099461783339
735
+ ≥4,2000,average,0.320025358717614,
736
+ ≥4,2000,average_rank,2.8,
737
+ ≥4,2000,chartqa_relaxed_overall,0.4488,0.009949423119365426
738
+ ≥4,2000,docvqa_val_anls,0.43140645952438456,0.006042366638541379
739
+ ≥4,2000,infovqa_val_anls,0.16528808420419083,0.005907032628809945
740
+ ≥4,2000,mme_total_score,705.5901360544218,
741
+ ≥4,2000,mmmu_val_mmmu_acc,0.27222,
742
+ ≥4,2000,mmstar_average,0.24877125799316246,
743
+ ≥4,2000,ocrbench_ocrbench_accuracy,0.329,
744
+ ≥4,2000,seedbench_seed_all,0.3196220122290161,
745
+ ≥4,2000,textvqa_val_exact_match,0.38468,0.006645983248449226
746
+ ≥4,3000,ai2d_exact_match,0.34617875647668395,0.008562713351618977
747
+ ≥4,3000,average,0.3596236953408542,
748
+ ≥4,3000,average_rank,2.8,
749
+ ≥4,3000,chartqa_relaxed_overall,0.468,0.009981495484186743
750
+ ≥4,3000,docvqa_val_anls,0.464923009199496,0.006156900593094097
751
+ ≥4,3000,infovqa_val_anls,0.18011502045718095,0.0061004080312330325
752
+ ≥4,3000,mme_total_score,709.3333333333333,
753
+ ≥4,3000,mmmu_val_mmmu_acc,0.27778,
754
+ ≥4,3000,mmstar_average,0.30716380934399923,
755
+ ≥4,3000,ocrbench_ocrbench_accuracy,0.351,
756
+ ≥4,3000,seedbench_seed_all,0.42679266259032794,
757
+ ≥4,3000,textvqa_val_exact_match,0.41466,0.006725300202411972
758
+ ≥4,4000,ai2d_exact_match,0.36593264248704666,0.008669617940526182
759
+ ≥4,4000,average,0.3829150140884673,
760
+ ≥4,4000,average_rank,2.7,
761
+ ≥4,4000,chartqa_relaxed_overall,0.5136,0.009998299975543861
762
+ ≥4,4000,docvqa_val_anls,0.5002844765367886,0.0062258433013991955
763
+ ≥4,4000,infovqa_val_anls,0.18808280764611432,0.006209185081756124
764
+ ≥4,4000,mme_total_score,700.1989795918367,
765
+ ≥4,4000,mmmu_val_mmmu_acc,0.28889,
766
+ ≥4,4000,mmstar_average,0.3128795636615537,
767
+ ≥4,4000,ocrbench_ocrbench_accuracy,0.379,
768
+ ≥4,4000,seedbench_seed_all,0.46214563646470264,
769
+ ≥4,4000,textvqa_val_exact_match,0.4354199999999999,0.006770365742739316
770
+ ≥4,5000,ai2d_exact_match,0.39702072538860106,0.008806218703419164
771
+ ≥4,5000,average,0.3990130243200321,
772
+ ≥4,5000,average_rank,3.0,
773
+ ≥4,5000,chartqa_relaxed_overall,0.5432,0.009964598400764347
774
+ ≥4,5000,docvqa_val_anls,0.5330701388059006,0.006244542429703876
775
+ ≥4,5000,infovqa_val_anls,0.20064814149562474,0.006400433745304747
776
+ ≥4,5000,mme_total_score,687.6802721088436,
777
+ ≥4,5000,mmmu_val_mmmu_acc,0.26333,
778
+ ≥4,5000,mmstar_average,0.30889646221740025,
779
+ ≥4,5000,ocrbench_ocrbench_accuracy,0.412,
780
+ ≥4,5000,seedbench_seed_all,0.47315175097276263,
781
+ ≥4,5000,textvqa_val_exact_match,0.4598,0.006799443983716428
782
+ ≥4,6000,ai2d_exact_match,0.41224093264248707,0.00885945303235887
783
+ ≥4,6000,average,0.4037939250305515,
784
+ ≥4,6000,average_rank,3.2,
785
+ ≥4,6000,chartqa_relaxed_overall,0.5312,0.009982508912777261
786
+ ≥4,6000,docvqa_val_anls,0.5259911309932884,0.006272635836910295
787
+ ≥4,6000,infovqa_val_anls,0.22056731437063212,0.00674209963892894
788
+ ≥4,6000,mme_total_score,717.0051020408164,
789
+ ≥4,6000,mmmu_val_mmmu_acc,0.26667,
790
+ ≥4,6000,mmstar_average,0.3316518783413741,
791
+ ≥4,6000,ocrbench_ocrbench_accuracy,0.408,
792
+ ≥4,6000,seedbench_seed_all,0.48332406892718177,
793
+ ≥4,6000,textvqa_val_exact_match,0.4544999999999999,0.006790726970992053
794
+ ≥4,7000,ai2d_exact_match,0.4102979274611399,0.008853146969712133
795
+ ≥4,7000,average,0.41740315045514464,
796
+ ≥4,7000,average_rank,3.1,
797
+ ≥4,7000,chartqa_relaxed_overall,0.5588,0.009932597172675325
798
+ ≥4,7000,docvqa_val_anls,0.5597972576652357,0.0062571833970283125
799
+ ≥4,7000,infovqa_val_anls,0.21665617889681224,0.006562362156515704
800
+ ≥4,7000,mme_total_score,716.7908163265306,
801
+ ≥4,7000,mmmu_val_mmmu_acc,0.28556,
802
+ ≥4,7000,mmstar_average,0.32150517239662685,
803
+ ≥4,7000,ocrbench_ocrbench_accuracy,0.431,
804
+ ≥4,7000,seedbench_seed_all,0.4892718176764869,
805
+ ≥4,7000,textvqa_val_exact_match,0.48374000000000006,0.006820617761268334
806
+ ≥4,8000,ai2d_exact_match,0.4213082901554404,0.00888700282309854
807
+ ≥4,8000,average,0.4251708917847074,
808
+ ≥4,8000,average_rank,2.9,
809
+ ≥4,8000,chartqa_relaxed_overall,0.564,0.009919725822025206
810
+ ≥4,8000,docvqa_val_anls,0.5702706873242411,0.006237250618852069
811
+ ≥4,8000,infovqa_val_anls,0.24000454829818865,0.006935520157929643
812
+ ≥4,8000,mme_total_score,705.8180272108843,
813
+ ≥4,8000,mmmu_val_mmmu_acc,0.28778,
814
+ ≥4,8000,mmstar_average,0.3384645614295773,
815
+ ≥4,8000,ocrbench_ocrbench_accuracy,0.42,
816
+ ≥4,8000,seedbench_seed_all,0.5018899388549194,
817
+ ≥4,8000,textvqa_val_exact_match,0.48281999999999997,0.006811185503977551
818
+ ≥4,9000,ai2d_exact_match,0.4319948186528497,0.008915528710615487
819
+ ≥4,9000,average,0.4318231930659084,
820
+ ≥4,9000,average_rank,2.9,
821
+ ≥4,9000,chartqa_relaxed_overall,0.5676,0.009910165515884228
822
+ ≥4,9000,docvqa_val_anls,0.5846178021051754,0.006187149390116838
823
+ ≥4,9000,infovqa_val_anls,0.2228617948699063,0.0066001763459020155
824
+ ≥4,9000,mme_total_score,733.3503401360545,
825
+ ≥4,9000,mmmu_val_mmmu_acc,0.28444,
826
+ ≥4,9000,mmstar_average,0.3307124653782511,
827
+ ≥4,9000,ocrbench_ocrbench_accuracy,0.463,
828
+ ≥4,9000,seedbench_seed_all,0.5153418565869927,
829
+ ≥4,9000,textvqa_val_exact_match,0.48583999999999994,0.0068269187957708125
830
+ ≥4,10000,ai2d_exact_match,0.4410621761658031,0.0089364152923413
831
+ ≥4,10000,average,0.436822457787989,
832
+ ≥4,10000,average_rank,3.4,
833
+ ≥4,10000,chartqa_relaxed_overall,0.5756,0.009887009516677585
834
+ ≥4,10000,docvqa_val_anls,0.591441723638793,0.0062031994384821754
835
+ ≥4,10000,infovqa_val_anls,0.22327754225685992,0.00649750251357461
836
+ ≥4,10000,mme_total_score,695.8112244897959,
837
+ ≥4,10000,mmmu_val_mmmu_acc,0.28778,
838
+ ≥4,10000,mmstar_average,0.33369690927002266,
839
+ ≥4,10000,ocrbench_ocrbench_accuracy,0.472,
840
+ ≥4,10000,seedbench_seed_all,0.5107837687604224,
841
+ ≥4,10000,textvqa_val_exact_match,0.49576000000000003,0.006808118284439173
842
+ ≥4,11000,ai2d_exact_match,0.44332901554404147,0.008941163900483134
843
+ ≥4,11000,average,0.44624717945755144,
844
+ ≥4,11000,average_rank,3.0,
845
+ ≥4,11000,chartqa_relaxed_overall,0.5868,0.009850132691777215
846
+ ≥4,11000,docvqa_val_anls,0.60625861922937,0.006159202385167996
847
+ ≥4,11000,infovqa_val_anls,0.2435454505191485,0.006860039872881237
848
+ ≥4,11000,mme_total_score,751.1462585034014,
849
+ ≥4,11000,mmmu_val_mmmu_acc,0.29222,
850
+ ≥4,11000,mmstar_average,0.3470954764624236,
851
+ ≥4,11000,ocrbench_ocrbench_accuracy,0.486,
852
+ ≥4,11000,seedbench_seed_all,0.5128960533629794,
853
+ ≥4,11000,textvqa_val_exact_match,0.49808,0.006799508024988012
854
+ ≥4,12000,ai2d_exact_match,0.45142487046632124,0.008956585653027465
855
+ ≥4,12000,average,0.44514971381341617,
856
+ ≥4,12000,average_rank,3.3,
857
+ ≥4,12000,chartqa_relaxed_overall,0.5868,0.009850132691777215
858
+ ≥4,12000,docvqa_val_anls,0.6047188055272135,0.0061847009209673315
859
+ ≥4,12000,infovqa_val_anls,0.2506217753279014,0.006972909032069362
860
+ ≥4,12000,mme_total_score,742.969387755102,
861
+ ≥4,12000,mmmu_val_mmmu_acc,0.28556,
862
+ ≥4,12000,mmstar_average,0.33917912697374,
863
+ ≥4,12000,ocrbench_ocrbench_accuracy,0.459,
864
+ ≥4,12000,seedbench_seed_all,0.5211228460255698,
865
+ ≥4,12000,textvqa_val_exact_match,0.5079199999999999,0.006798462954205747
866
+ ≥4,13000,ai2d_exact_match,0.44689119170984454,0.00894824507304496
867
+ ≥4,13000,average,0.4478540374461813,
868
+ ≥4,13000,average_rank,3.5,
869
+ ≥4,13000,chartqa_relaxed_overall,0.5936,0.009825183443166683
870
+ ≥4,13000,docvqa_val_anls,0.6123877664020703,0.0061423212651813735
871
+ ≥4,13000,infovqa_val_anls,0.23197941094655744,0.0066388766376455225
872
+ ≥4,13000,mme_total_score,705.0068027210884,
873
+ ≥4,13000,mmmu_val_mmmu_acc,0.29444,
874
+ ≥4,13000,mmstar_average,0.3172158612312008,
875
+ ≥4,13000,ocrbench_ocrbench_accuracy,0.5,
876
+ ≥4,13000,seedbench_seed_all,0.5257921067259589,
877
+ ≥4,13000,textvqa_val_exact_match,0.50838,0.006803735244897213
878
+ ≥4,14000,ai2d_exact_match,0.45628238341968913,0.008964689215887884
879
+ ≥4,14000,average,0.4541657280018954,
880
+ ≥4,14000,average_rank,3.3,
881
+ ≥4,14000,chartqa_relaxed_overall,0.5988,0.0098047885010856
882
+ ≥4,14000,docvqa_val_anls,0.6230752215069362,0.006110772532320183
883
+ ≥4,14000,infovqa_val_anls,0.23950752488444424,0.00673701613611272
884
+ ≥4,14000,mme_total_score,693.2602040816327,
885
+ ≥4,14000,mmmu_val_mmmu_acc,0.29,
886
+ ≥4,14000,mmstar_average,0.3462132371031542,
887
+ ≥4,14000,ocrbench_ocrbench_accuracy,0.492,
888
+ ≥4,14000,seedbench_seed_all,0.519733185102835,
889
+ ≥4,14000,textvqa_val_exact_match,0.52188,0.0067822601638824
890
+ ≥4,15000,ai2d_exact_match,0.4536917098445596,0.008960474382205331
891
+ ≥4,15000,average,0.4546421832173102,
892
+ ≥4,15000,average_rank,3.5,
893
+ ≥4,15000,chartqa_relaxed_overall,0.6012,0.0097949885513097
894
+ ≥4,15000,docvqa_val_anls,0.6265798815467575,0.006118388682866076
895
+ ≥4,15000,infovqa_val_anls,0.24253641235942872,0.006778846024017067
896
+ ≥4,15000,mme_total_score,745.8826530612245,
897
+ ≥4,15000,mmmu_val_mmmu_acc,0.28111,
898
+ ≥4,15000,mmstar_average,0.3514223789460134,
899
+ ≥4,15000,ocrbench_ocrbench_accuracy,0.493,
900
+ ≥4,15000,seedbench_seed_all,0.5226792662590328,
901
+ ≥4,15000,textvqa_val_exact_match,0.51956,0.006792518600768668
902
+ ≥4,16000,ai2d_exact_match,0.4582253886010363,0.008967689939886603
903
+ ≥4,16000,average,0.46307812033280477,
904
+ ≥4,16000,average_rank,3.1,
905
+ ≥4,16000,chartqa_relaxed_overall,0.6092,0.009760545645634788
906
+ ≥4,16000,docvqa_val_anls,0.6397697311161549,0.006077931892063438
907
+ ≥4,16000,infovqa_val_anls,0.2566929717899322,0.007049147355082826
908
+ ≥4,16000,mme_total_score,769.8112244897959,
909
+ ≥4,16000,mmmu_val_mmmu_acc,0.29,
910
+ ≥4,16000,mmstar_average,0.3500855084419833,
911
+ ≥4,16000,ocrbench_ocrbench_accuracy,0.512,
912
+ ≥4,16000,seedbench_seed_all,0.5250694830461368,
913
+ ≥4,16000,textvqa_val_exact_match,0.5266599999999999,0.006785297114451678
914
+ ≥4,17000,ai2d_exact_match,0.46243523316062174,0.008973720555405783
915
+ ≥4,17000,average,0.4637285100748874,
916
+ ≥4,17000,average_rank,3.2,
917
+ ≥4,17000,chartqa_relaxed_overall,0.6072,0.00976941352263433
918
+ ≥4,17000,docvqa_val_anls,0.6316407990464801,0.006115829668357635
919
+ ≥4,17000,infovqa_val_anls,0.26095289130380417,0.007179006033610968
920
+ ≥4,17000,mme_total_score,772.2568027210884,
921
+ ≥4,17000,mmmu_val_mmmu_acc,0.29222,
922
+ ≥4,17000,mmstar_average,0.3487846654954876,
923
+ ≥4,17000,ocrbench_ocrbench_accuracy,0.516,
924
+ ≥4,17000,seedbench_seed_all,0.5254030016675931,
925
+ ≥4,17000,textvqa_val_exact_match,0.52892,0.006777692390690844
926
+ ≥4,18000,ai2d_exact_match,0.46729274611398963,0.00897987952745343
927
+ ≥4,18000,average,0.46301237822364466,
928
+ ≥4,18000,average_rank,3.5,
929
+ ≥4,18000,chartqa_relaxed_overall,0.6024,0.009789996609470577
930
+ ≥4,18000,docvqa_val_anls,0.6353229754668962,0.006102794809473289
931
+ ≥4,18000,infovqa_val_anls,0.2566414572268362,0.006998597263140097
932
+ ≥4,18000,mme_total_score,770.295918367347,
933
+ ≥4,18000,mmmu_val_mmmu_acc,0.27778,
934
+ ≥4,18000,mmstar_average,0.3522173046936848,
935
+ ≥4,18000,ocrbench_ocrbench_accuracy,0.518,
936
+ ≥4,18000,seedbench_seed_all,0.5224569205113953,
937
+ ≥4,18000,textvqa_val_exact_match,0.535,0.006782934589123506
938
+ ≥4,19000,ai2d_exact_match,0.4647020725388601,0.008976701230834869
939
+ ≥4,19000,average,0.4657296959805982,
940
+ ≥4,19000,average_rank,3.3,
941
+ ≥4,19000,chartqa_relaxed_overall,0.6088,0.009762332982341016
942
+ ≥4,19000,docvqa_val_anls,0.6386155506856869,0.006091782897731878
943
+ ≥4,19000,infovqa_val_anls,0.2477875071753752,0.006879861435025137
944
+ ≥4,19000,mme_total_score,772.204081632653,
945
+ ≥4,19000,mmmu_val_mmmu_acc,0.30333,
946
+ ≥4,19000,mmstar_average,0.3470027726694857,
947
+ ≥4,19000,ocrbench_ocrbench_accuracy,0.512,
948
+ ≥4,19000,seedbench_seed_all,0.5288493607559756,
949
+ ≥4,19000,textvqa_val_exact_match,0.54048,0.006763536279536092
950
+ ≥4,20000,ai2d_exact_match,0.4634067357512953,0.008975020819363737
951
+ ≥4,20000,average,0.46162598712482705,
952
+ ≥4,20000,average_rank,3.4,
953
+ ≥4,20000,chartqa_relaxed_overall,0.61,0.009756950303844571
954
+ ≥4,20000,docvqa_val_anls,0.6435026807424298,0.006070985460919362
955
+ ≥4,20000,infovqa_val_anls,0.2543282868714285,0.006962743278022537
956
+ ≥4,20000,mme_total_score,765.8690476190477,
957
+ ≥4,20000,mmmu_val_mmmu_acc,0.27222,
958
+ ≥4,20000,mmstar_average,0.34236379610014667,
959
+ ≥4,20000,ocrbench_ocrbench_accuracy,0.509,
960
+ ≥4,20000,seedbench_seed_all,0.5262923846581434,
961
+ ≥4,20000,textvqa_val_exact_match,0.53352,0.006776464123213716
962
+ ≥5,1000,ai2d_exact_match,0.24902849740932642,0.007783374690341817
963
+ ≥5,1000,average,0.23561247048158757,
964
+ ≥5,1000,average_rank,4.2,
965
+ ≥5,1000,chartqa_relaxed_overall,0.2548,0.008716718216771047
966
+ ≥5,1000,docvqa_val_anls,0.24096701334945672,0.004990683419188375
967
+ ≥5,1000,infovqa_val_anls,0.12232054164836681,0.0051959928578510384
968
+ ≥5,1000,mme_total_score,620.9336734693877,
969
+ ≥5,1000,mmmu_val_mmmu_acc,0.23778,
970
+ ≥5,1000,mmstar_average,0.26414819971479786,
971
+ ≥5,1000,ocrbench_ocrbench_accuracy,0.216,
972
+ ≥5,1000,seedbench_seed_all,0.2623679822123402,
973
+ ≥5,1000,textvqa_val_exact_match,0.27310000000000006,0.0061250290771750005
974
+ ≥5,2000,ai2d_exact_match,0.2344559585492228,0.007625132817591135
975
+ ≥5,2000,average,0.2752283006434932,
976
+ ≥5,2000,average_rank,4.2,
977
+ ≥5,2000,chartqa_relaxed_overall,0.3732,0.009675026948726469
978
+ ≥5,2000,docvqa_val_anls,0.331054267713041,0.005645142408620243
979
+ ≥5,2000,infovqa_val_anls,0.1253737215538702,0.00524700917894423
980
+ ≥5,2000,mme_total_score,678.2414965986395,
981
+ ≥5,2000,mmmu_val_mmmu_acc,0.24,
982
+ ≥5,2000,mmstar_average,0.24144442112149672,
983
+ ≥5,2000,ocrbench_ocrbench_accuracy,0.264,
984
+ ≥5,2000,seedbench_seed_all,0.33140633685380766,
985
+ ≥5,2000,textvqa_val_exact_match,0.33612,0.006470505591414144
986
+ ≥5,3000,ai2d_exact_match,0.22409326424870465,0.007505002611196186
987
+ ≥5,3000,average,0.29997958942235364,
988
+ ≥5,3000,average_rank,4.3,
989
+ ≥5,3000,chartqa_relaxed_overall,0.392,0.00976588700628918
990
+ ≥5,3000,docvqa_val_anls,0.37299390513630937,0.005683849773109756
991
+ ≥5,3000,infovqa_val_anls,0.13605101039483827,0.005410567699808442
992
+ ≥5,3000,mme_total_score,659.7210884353742,
993
+ ≥5,3000,mmmu_val_mmmu_acc,0.27,
994
+ ≥5,3000,mmstar_average,0.2682811266889234,
995
+ ≥5,3000,ocrbench_ocrbench_accuracy,0.287,
996
+ ≥5,3000,seedbench_seed_all,0.3745969983324069,
997
+ ≥5,3000,textvqa_val_exact_match,0.3748,0.006628980364742018
998
+ ≥5,4000,ai2d_exact_match,0.22733160621761658,0.007543244231635894
999
+ ≥5,4000,average,0.3084813519082869,
1000
+ ≥5,4000,average_rank,4.7,
1001
+ ≥5,4000,chartqa_relaxed_overall,0.43,0.00990349593288537
1002
+ ≥5,4000,docvqa_val_anls,0.4066720118815712,0.006028824654560211
1003
+ ≥5,4000,infovqa_val_anls,0.14319025154556023,0.005617800071290847
1004
+ ≥5,4000,mme_total_score,656.1462585034013,
1005
+ ≥5,4000,mmmu_val_mmmu_acc,0.25667,
1006
+ ≥5,4000,mmstar_average,0.2585945343280555,
1007
+ ≥5,4000,ocrbench_ocrbench_accuracy,0.294,
1008
+ ≥5,4000,seedbench_seed_all,0.39277376320177876,
1009
+ ≥5,4000,textvqa_val_exact_match,0.3671,0.006592830278584186
1010
+ ≥5,5000,ai2d_exact_match,0.24028497409326424,0.007689893942245019
1011
+ ≥5,5000,average,0.3230129052623469,
1012
+ ≥5,5000,average_rank,4.9,
1013
+ ≥5,5000,chartqa_relaxed_overall,0.442,0.009934479228979264
1014
+ ≥5,5000,docvqa_val_anls,0.43465518326761016,0.006092084287625314
1015
+ ≥5,5000,infovqa_val_anls,0.16044569408280707,0.005985099003597859
1016
+ ≥5,5000,mme_total_score,700.9234693877552,
1017
+ ≥5,5000,mmmu_val_mmmu_acc,0.26,
1018
+ ≥5,5000,mmstar_average,0.27948727201527235,
1019
+ ≥5,5000,ocrbench_ocrbench_accuracy,0.309,
1020
+ ≥5,5000,seedbench_seed_all,0.39744302390216785,
1021
+ ≥5,5000,textvqa_val_exact_match,0.3838,0.006651041968883851
1022
+ ≥5,6000,ai2d_exact_match,0.21761658031088082,0.007426556596739526
1023
+ ≥5,6000,average,0.3285664644731758,
1024
+ ≥5,6000,average_rank,4.8,
1025
+ ≥5,6000,chartqa_relaxed_overall,0.4708,0.009984929820955767
1026
+ ≥5,6000,docvqa_val_anls,0.4274906773084525,0.005930539560380286
1027
+ ≥5,6000,infovqa_val_anls,0.15122815225662642,0.005687399721363878
1028
+ ≥5,6000,mme_total_score,692.2227891156463,
1029
+ ≥5,6000,mmmu_val_mmmu_acc,0.27,
1030
+ ≥5,6000,mmstar_average,0.27596736182231085,
1031
+ ≥5,6000,ocrbench_ocrbench_accuracy,0.341,
1032
+ ≥5,6000,seedbench_seed_all,0.4237354085603113,
1033
+ ≥5,6000,textvqa_val_exact_match,0.37926,0.006628782590470618
1034
+ ≥5,7000,ai2d_exact_match,0.22959844559585493,0.007569631399592313
1035
+ ≥5,7000,average,0.3397133831241853,
1036
+ ≥5,7000,average_rank,5.0,
1037
+ ≥5,7000,chartqa_relaxed_overall,0.4864,0.009998299975543861
1038
+ ≥5,7000,docvqa_val_anls,0.4538685197224749,0.00598758370400633
1039
+ ≥5,7000,infovqa_val_anls,0.15500462855057698,0.005842239614739797
1040
+ ≥5,7000,mme_total_score,662.3809523809523,
1041
+ ≥5,7000,mmmu_val_mmmu_acc,0.26444,
1042
+ ≥5,7000,mmstar_average,0.2946102327923966,
+ ≥5,7000,ocrbench_ocrbench_accuracy,0.339,
+ ≥5,7000,seedbench_seed_all,0.43351862145636466,
+ ≥5,7000,textvqa_val_exact_match,0.40098,0.00668858395709213
+ ≥5,8000,ai2d_exact_match,0.26878238341968913,0.007979127569354613
+ ≥5,8000,average,0.3468669425903158,
+ ≥5,8000,average_rank,4.7,
+ ≥5,8000,chartqa_relaxed_overall,0.4644,0.009976616117083942
+ ≥5,8000,docvqa_val_anls,0.43320064291973065,0.005825461000081097
+ ≥5,8000,infovqa_val_anls,0.1525871677997588,0.0057380999639673955
+ ≥5,8000,mme_total_score,714.7789115646258,
+ ≥5,8000,mmmu_val_mmmu_acc,0.27667,
+ ≥5,8000,mmstar_average,0.3189178311414238,
+ ≥5,8000,ocrbench_ocrbench_accuracy,0.358,
+ ≥5,8000,seedbench_seed_all,0.4440244580322401,
+ ≥5,8000,textvqa_val_exact_match,0.40522,0.006705157876473132
+ ≥5,9000,ai2d_exact_match,0.23834196891191708,0.007668527149232641
+ ≥5,9000,average,0.34742834361066494,
+ ≥5,9000,average_rank,4.9,
+ ≥5,9000,chartqa_relaxed_overall,0.4832,0.009996353076494045
+ ≥5,9000,docvqa_val_anls,0.44997891177952337,0.005999690608407377
+ ≥5,9000,infovqa_val_anls,0.15249014258349003,0.005725765633377559
+ ≥5,9000,mme_total_score,696.5544217687075,
+ ≥5,9000,mmmu_val_mmmu_acc,0.26444,
+ ≥5,9000,mmstar_average,0.3019547640515156,
+ ≥5,9000,ocrbench_ocrbench_accuracy,0.384,
+ ≥5,9000,seedbench_seed_all,0.44874930516953865,
+ ≥5,9000,textvqa_val_exact_match,0.4037,0.006699928343494548
+ ≥5,10000,ai2d_exact_match,0.2979274611398964,0.008231480357867917
+ ≥5,10000,average,0.3538147252476138,
+ ≥5,10000,average_rank,4.8,
+ ≥5,10000,chartqa_relaxed_overall,0.48,0.009993995796516643
+ ≥5,10000,docvqa_val_anls,0.45125781190343667,0.0059273100312449535
+ ≥5,10000,infovqa_val_anls,0.15739085013451903,0.005776029267754871
+ ≥5,10000,mme_total_score,718.7227891156463,
+ ≥5,10000,mmmu_val_mmmu_acc,0.27556,
+ ≥5,10000,mmstar_average,0.3004387942674594,
+ ≥5,10000,ocrbench_ocrbench_accuracy,0.357,
+ ≥5,10000,seedbench_seed_all,0.4556976097832129,
+ ≥5,10000,textvqa_val_exact_match,0.40906000000000003,0.006714715240436636
+ ≥5,11000,ai2d_exact_match,0.3167098445595855,0.008372690712254882
+ ≥5,11000,average,0.36396020347184427,
+ ≥5,11000,average_rank,5.0,
+ ≥5,11000,chartqa_relaxed_overall,0.4924,0.010000845102345324
+ ≥5,11000,docvqa_val_anls,0.4691277601070516,0.0060867637597330085
+ ≥5,11000,infovqa_val_anls,0.15562897334070494,0.005768608804593679
+ ≥5,11000,mme_total_score,680.7667066826731,
+ ≥5,11000,mmmu_val_mmmu_acc,0.27667,
+ ≥5,11000,mmstar_average,0.3111702671358657,
+ ≥5,11000,ocrbench_ocrbench_accuracy,0.388,
+ ≥5,11000,seedbench_seed_all,0.45497498610339077,
+ ≥5,11000,textvqa_val_exact_match,0.41096000000000005,0.006715250896200365
+ ≥5,12000,ai2d_exact_match,0.24838082901554404,0.007776597937116943
+ ≥5,12000,average,0.35400963042471534,
+ ≥5,12000,average_rank,4.9,
+ ≥5,12000,chartqa_relaxed_overall,0.4624,0.00997367964766694
+ ≥5,12000,docvqa_val_anls,0.46480289866811825,0.005910238300168798
+ ≥5,12000,infovqa_val_anls,0.15657154481637633,0.0057842205757870115
+ ≥5,12000,mme_total_score,742.4894957983194,
+ ≥5,12000,mmmu_val_mmmu_acc,0.28444,
+ ≥5,12000,mmstar_average,0.30237252416842486,
+ ≥5,12000,ocrbench_ocrbench_accuracy,0.391,
+ ≥5,12000,seedbench_seed_all,0.46197887715397445,
+ ≥5,12000,textvqa_val_exact_match,0.41414000000000006,0.0067237975855013775
+ ≥5,13000,ai2d_exact_match,0.27266839378238344,0.008015217564479081
+ ≥5,13000,average,0.3605408154099655,
+ ≥5,13000,average_rank,4.9,
+ ≥5,13000,chartqa_relaxed_overall,0.4796,0.00999367226769808
+ ≥5,13000,docvqa_val_anls,0.4888368998254502,0.006080092164054846
+ ≥5,13000,infovqa_val_anls,0.1685412928680358,0.006153102666352037
+ ≥5,13000,mme_total_score,715.9022609043617,
+ ≥5,13000,mmmu_val_mmmu_acc,0.27,
+ ≥5,13000,mmstar_average,0.30550310907874534,
+ ≥5,13000,ocrbench_ocrbench_accuracy,0.39,
+ ≥5,13000,seedbench_seed_all,0.46375764313507506,
+ ≥5,13000,textvqa_val_exact_match,0.40596000000000004,0.006708225975557757
+ ≥5,14000,ai2d_exact_match,0.27266839378238344,0.008015217564479094
+ ≥5,14000,average,0.35876061642606916,
+ ≥5,14000,average_rank,4.9,
+ ≥5,14000,chartqa_relaxed_overall,0.4832,0.009996353076494045
+ ≥5,14000,docvqa_val_anls,0.4686745608937551,0.005954780465596843
+ ≥5,14000,infovqa_val_anls,0.16026985404926572,0.00587737555538511
+ ≥5,14000,mme_total_score,694.7702080832332,
+ ≥5,14000,mmmu_val_mmmu_acc,0.27778,
+ ≥5,14000,mmstar_average,0.3065739842454048,
+ ≥5,14000,ocrbench_ocrbench_accuracy,0.388,
+ ≥5,14000,seedbench_seed_all,0.46575875486381324,
+ ≥5,14000,textvqa_val_exact_match,0.40592,0.006717590038338499
+ ≥5,15000,ai2d_exact_match,0.26295336787564766,0.007923526907377253
+ ≥5,15000,average,0.3594508046372947,
+ ≥5,15000,average_rank,4.9,
+ ≥5,15000,chartqa_relaxed_overall,0.4904,0.010000156861514821
+ ≥5,15000,docvqa_val_anls,0.47702085294845603,0.006014469495902542
+ ≥5,15000,infovqa_val_anls,0.1709556715444569,0.006117350998294382
+ ≥5,15000,mme_total_score,748.1163465386154,
+ ≥5,15000,mmmu_val_mmmu_acc,0.25667,
+ ≥5,15000,mmstar_average,0.2990729469212882,
+ ≥5,15000,ocrbench_ocrbench_accuracy,0.404,
+ ≥5,15000,seedbench_seed_all,0.46392440244580324,
+ ≥5,15000,textvqa_val_exact_match,0.4100599999999999,0.0067243737790625615
+ ≥5,16000,ai2d_exact_match,0.28950777202072536,0.00816284339533906
+ ≥5,16000,average,0.3652803192394071,
+ ≥5,16000,average_rank,4.9,
+ ≥5,16000,chartqa_relaxed_overall,0.5004,0.010001997399559365
+ ≥5,16000,docvqa_val_anls,0.4789319968433556,0.005936381904079473
+ ≥5,16000,infovqa_val_anls,0.16818261112655605,0.006062058685336811
+ ≥5,16000,mme_total_score,703.8838535414166,
+ ≥5,16000,mmmu_val_mmmu_acc,0.28111,
+ ≥5,16000,mmstar_average,0.30021933140749574,
+ ≥5,16000,ocrbench_ocrbench_accuracy,0.392,
+ ≥5,16000,seedbench_seed_all,0.4640911617565314,
+ ≥5,16000,textvqa_val_exact_match,0.41308,0.006723304491442948
+ ≥5,17000,ai2d_exact_match,0.28335492227979275,0.008110527983566212
+ ≥5,17000,average,0.36065417779712866,
+ ≥5,17000,average_rank,5.0,
+ ≥5,17000,chartqa_relaxed_overall,0.4688,0.009982508912777261
+ ≥5,17000,docvqa_val_anls,0.4676527518642357,0.00590362287731878
+ ≥5,17000,infovqa_val_anls,0.16818540516392913,0.00605571000794457
+ ≥5,17000,mme_total_score,754.0354141656662,
+ ≥5,17000,mmmu_val_mmmu_acc,0.26222,
+ ≥5,17000,mmstar_average,0.31626391497403816,
+ ≥5,17000,ocrbench_ocrbench_accuracy,0.404,
+ ≥5,17000,seedbench_seed_all,0.46309060589216233,
+ ≥5,17000,textvqa_val_exact_match,0.41231999999999996,0.006722044383678169
+ ≥5,18000,ai2d_exact_match,0.2911269430051813,0.00817630569100236
+ ≥5,18000,average,0.3642489832139911,
+ ≥5,18000,average_rank,4.9,
+ ≥5,18000,chartqa_relaxed_overall,0.488,0.009999119609104738
+ ≥5,18000,docvqa_val_anls,0.4852288069276555,0.006044640681137398
+ ≥5,18000,infovqa_val_anls,0.1659765406298008,0.006009331694189444
+ ≥5,18000,mme_total_score,748.4861944777911,
+ ≥5,18000,mmmu_val_mmmu_acc,0.28111,
+ ≥5,18000,mmstar_average,0.3014618713149217,
+ ≥5,18000,ocrbench_ocrbench_accuracy,0.389,
+ ≥5,18000,seedbench_seed_all,0.4660366870483602,
+ ≥5,18000,textvqa_val_exact_match,0.4103,0.0067180509406887
+ ≥5,19000,ai2d_exact_match,0.2817357512953368,0.008096452844781159
+ ≥5,19000,average,0.35871512802442374,
+ ≥5,19000,average_rank,4.7,
+ ≥5,19000,chartqa_relaxed_overall,0.452,0.009955804699716018
+ ≥5,19000,docvqa_val_anls,0.4693437417424619,0.005945802716190409
+ ≥5,19000,infovqa_val_anls,0.17352672765291935,0.006108049035774969
+ ≥5,19000,mme_total_score,757.4390756302521,
+ ≥5,19000,mmmu_val_mmmu_acc,0.29556,
+ ≥5,19000,mmstar_average,0.299605929305638,
+ ≥5,19000,ocrbench_ocrbench_accuracy,0.382,
+ ≥5,19000,seedbench_seed_all,0.4672040022234575,
+ ≥5,19000,textvqa_val_exact_match,0.40746,0.006711235192985202
+ ≥5,20000,ai2d_exact_match,0.28950777202072536,0.008162843395339051
+ ≥5,20000,average,0.3571101844602158,
+ ≥5,20000,average_rank,5.0,
+ ≥5,20000,chartqa_relaxed_overall,0.452,0.009955804699716018
+ ≥5,20000,docvqa_val_anls,0.4781541164812954,0.006040598891772297
+ ≥5,20000,infovqa_val_anls,0.16871824680773087,0.00599943702354704
+ ≥5,20000,mme_total_score,713.3514405762305,
+ ≥5,20000,mmmu_val_mmmu_acc,0.26667,
+ ≥5,20000,mmstar_average,0.30644375940695473,
+ ≥5,20000,ocrbench_ocrbench_accuracy,0.398,
+ ≥5,20000,seedbench_seed_all,0.4599777654252362,
+ ≥5,20000,textvqa_val_exact_match,0.39452000000000004,0.006680937127692554
app/src/content/assets/data/formatting_filters.csv ADDED
@@ -0,0 +1,1201 @@
+ run,step,metric,value,stderr
+ Baseline,1000,ai2d_exact_match,0.2548575129533679,0.007843322436924496
+ Baseline,1000,average,0.27120689295763617,
+ Baseline,1000,average_rank,3.8,
+ Baseline,1000,chartqa_relaxed_overall,0.3308,0.009411906161401973
+ Baseline,1000,docvqa_val_anls,0.3528553494243383,0.005852289239342309
+ Baseline,1000,infovqa_val_anls,0.17320578642581314,0.006297063452679795
+ Baseline,1000,mme_total_score,977.4280712284914,
+ Baseline,1000,mmmu_val_mmmu_acc,0.25222,
+ Baseline,1000,mmstar_average,0.23215874078908072,
+ Baseline,1000,ocrbench_ocrbench_accuracy,0.286,
+ Baseline,1000,seedbench_seed_all,0.2563646470261256,
+ Baseline,1000,textvqa_val_exact_match,0.3024,0.00628900296642181
+ Baseline,2000,ai2d_exact_match,0.26295336787564766,0.007923526907377255
+ Baseline,2000,average,0.3202068275596269,
+ Baseline,2000,average_rank,3.7,
+ Baseline,2000,chartqa_relaxed_overall,0.4688,0.009982508912777261
+ Baseline,2000,docvqa_val_anls,0.4452261510942785,0.00614755494712251
+ Baseline,2000,infovqa_val_anls,0.1820547866557169,0.006217861455795791
+ Baseline,2000,mme_total_score,1049.3036214485794,
+ Baseline,2000,mmmu_val_mmmu_acc,0.24556,
+ Baseline,2000,mmstar_average,0.21305462434540698,
+ Baseline,2000,ocrbench_ocrbench_accuracy,0.395,
+ Baseline,2000,seedbench_seed_all,0.258532518065592,
+ Baseline,2000,textvqa_val_exact_match,0.41068000000000005,0.006697862330024289
+ Baseline,3000,ai2d_exact_match,0.25226683937823835,0.007816909588794397
+ Baseline,3000,average,0.3507423834414229,
+ Baseline,3000,average_rank,2.6,
+ Baseline,3000,chartqa_relaxed_overall,0.5028,0.010001843767601082
+ Baseline,3000,docvqa_val_anls,0.502653993831009,0.006267072346683124
+ Baseline,3000,infovqa_val_anls,0.21728617578189535,0.006796941784959762
+ Baseline,3000,mme_total_score,1170.2383953581434,
+ Baseline,3000,mmmu_val_mmmu_acc,0.27556,
+ Baseline,3000,mmstar_average,0.25432376938577683,
+ Baseline,3000,ocrbench_ocrbench_accuracy,0.436,
+ Baseline,3000,seedbench_seed_all,0.2792106725958866,
+ Baseline,3000,textvqa_val_exact_match,0.43658,0.006766885462882726
+ Baseline,4000,ai2d_exact_match,0.2645725388601036,0.007939149662089447
+ Baseline,4000,average,0.36961781722974835,
+ Baseline,4000,average_rank,2.8,
+ Baseline,4000,chartqa_relaxed_overall,0.5312,0.009982508912777261
+ Baseline,4000,docvqa_val_anls,0.5374434618615119,0.0062905728113059655
+ Baseline,4000,infovqa_val_anls,0.2287924838861707,0.006994568698639919
+ Baseline,4000,mme_total_score,1155.203781512605,
+ Baseline,4000,mmmu_val_mmmu_acc,0.25556,
+ Baseline,4000,mmstar_average,0.2575590188757354,
+ Baseline,4000,ocrbench_ocrbench_accuracy,0.453,
+ Baseline,4000,seedbench_seed_all,0.33913285158421347,
+ Baseline,4000,textvqa_val_exact_match,0.4593,0.006791695475025738
+ Baseline,5000,ai2d_exact_match,0.3125,0.008342439145556371
+ Baseline,5000,average,0.3974627910380972,
+ Baseline,5000,average_rank,3.1,
+ Baseline,5000,chartqa_relaxed_overall,0.5488,0.00995424828018316
+ Baseline,5000,docvqa_val_anls,0.552360266782429,0.006300308519952055
+ Baseline,5000,infovqa_val_anls,0.23425555286643698,0.007002254622066442
+ Baseline,5000,mme_total_score,1181.4653861544618,
+ Baseline,5000,mmmu_val_mmmu_acc,0.26667,
+ Baseline,5000,mmstar_average,0.29596648146165705,
+ Baseline,5000,ocrbench_ocrbench_accuracy,0.462,
+ Baseline,5000,seedbench_seed_all,0.43107281823235133,
+ Baseline,5000,textvqa_val_exact_match,0.47354000000000007,0.0068172185364497985
+ Baseline,6000,ai2d_exact_match,0.358160621761658,0.008629463221867162
+ Baseline,6000,average,0.4161227404571003,
+ Baseline,6000,average_rank,2.3,
+ Baseline,6000,chartqa_relaxed_overall,0.5628,0.00992279440175477
+ Baseline,6000,docvqa_val_anls,0.5747451497228876,0.00625495440870239
+ Baseline,6000,infovqa_val_anls,0.22152017368968838,0.006604546680525351
+ Baseline,6000,mme_total_score,1284.1648659463785,
+ Baseline,6000,mmmu_val_mmmu_acc,0.27111,
+ Baseline,6000,mmstar_average,0.2978489412854164,
+ Baseline,6000,ocrbench_ocrbench_accuracy,0.495,
+ Baseline,6000,seedbench_seed_all,0.4795997776542524,
+ Baseline,6000,textvqa_val_exact_match,0.48432,0.006800535050670284
+ Baseline,7000,ai2d_exact_match,0.3707901554404145,0.00869347755587734
+ Baseline,7000,average,0.4291083177345374,
+ Baseline,7000,average_rank,2.6,
+ Baseline,7000,chartqa_relaxed_overall,0.5656,0.009915542506251351
+ Baseline,7000,docvqa_val_anls,0.5940907049431567,0.006224236305767187
+ Baseline,7000,infovqa_val_anls,0.2515675215816963,0.007105097396092786
+ Baseline,7000,mme_total_score,1185.875650260104,
+ Baseline,7000,mmmu_val_mmmu_acc,0.26556,
+ Baseline,7000,mmstar_average,0.31372400960777047,
+ Baseline,7000,ocrbench_ocrbench_accuracy,0.504,
+ Baseline,7000,seedbench_seed_all,0.4964424680377988,
+ Baseline,7000,textvqa_val_exact_match,0.5002,0.006794794025220267
+ Baseline,8000,ai2d_exact_match,0.37759067357512954,0.008725299846043883
+ Baseline,8000,average,0.43846759477995995,
+ Baseline,8000,average_rank,2.1,
+ Baseline,8000,chartqa_relaxed_overall,0.5832,0.009862556058385773
+ Baseline,8000,docvqa_val_anls,0.6017336419437208,0.006231612198089698
+ Baseline,8000,infovqa_val_anls,0.2449256624147254,0.006992518502948913
+ Baseline,8000,mme_total_score,1199.2409963985594,
+ Baseline,8000,mmmu_val_mmmu_acc,0.28111,
+ Baseline,8000,mmstar_average,0.33512257186205047,
+ Baseline,8000,ocrbench_ocrbench_accuracy,0.51,
+ Baseline,8000,seedbench_seed_all,0.5024458032240133,
+ Baseline,8000,textvqa_val_exact_match,0.51008,0.006796301690135059
+ Baseline,9000,ai2d_exact_match,0.4067357512953368,0.008841214921078996
+ Baseline,9000,average,0.4422510732201056,
+ Baseline,9000,average_rank,2.1,
+ Baseline,9000,chartqa_relaxed_overall,0.5912,0.009834211136815875
+ Baseline,9000,docvqa_val_anls,0.6170968481662739,0.00617235763542544
+ Baseline,9000,infovqa_val_anls,0.23537031288570615,0.00670318154156447
+ Baseline,9000,mme_total_score,1231.5195078031213,
+ Baseline,9000,mmmu_val_mmmu_acc,0.25889,
+ Baseline,9000,mmstar_average,0.3216444898242951,
+ Baseline,9000,ocrbench_ocrbench_accuracy,0.515,
+ Baseline,9000,seedbench_seed_all,0.5120622568093385,
+ Baseline,9000,textvqa_val_exact_match,0.52226,0.006792711289708482
+ Baseline,10000,ai2d_exact_match,0.39993523316062174,0.008817096257082848
+ Baseline,10000,average,0.4523875703250908,
+ Baseline,10000,average_rank,2.4,
+ Baseline,10000,chartqa_relaxed_overall,0.5996,0.00980154906867574
+ Baseline,10000,docvqa_val_anls,0.6262613496433054,0.006147756371688175
+ Baseline,10000,infovqa_val_anls,0.263290074230132,0.007186788766942786
+ Baseline,10000,mme_total_score,1240.8218287314926,
+ Baseline,10000,mmmu_val_mmmu_acc,0.28778,
+ Baseline,10000,mmstar_average,0.32972717906018517,
+ Baseline,10000,ocrbench_ocrbench_accuracy,0.517,
+ Baseline,10000,seedbench_seed_all,0.5217342968315731,
+ Baseline,10000,textvqa_val_exact_match,0.5261600000000001,0.006785774843600811
+ Baseline,11000,ai2d_exact_match,0.422279792746114,0.008889771831066474
+ Baseline,11000,average,0.4561398159525099,
+ Baseline,11000,average_rank,2.1,
+ Baseline,11000,chartqa_relaxed_overall,0.6104,0.009755142291143075
+ Baseline,11000,docvqa_val_anls,0.6373130149166712,0.006128022584995044
+ Baseline,11000,infovqa_val_anls,0.24419378339723755,0.006897644885887063
+ Baseline,11000,mme_total_score,1322.9488795518205,
+ Baseline,11000,mmmu_val_mmmu_acc,0.27778,
+ Baseline,11000,mmstar_average,0.3298563439522548,
+ Baseline,11000,ocrbench_ocrbench_accuracy,0.521,
+ Baseline,11000,seedbench_seed_all,0.5237354085603113,
+ Baseline,11000,textvqa_val_exact_match,0.5387,0.006770851562852138
+ Baseline,12000,ai2d_exact_match,0.42001295336787564,0.008883255931688034
+ Baseline,12000,average,0.4582751140055433,
+ Baseline,12000,average_rank,2.4,
+ Baseline,12000,chartqa_relaxed_overall,0.618,0.009719474639861454
+ Baseline,12000,docvqa_val_anls,0.6393961983751871,0.0061228747388476674
+ Baseline,12000,infovqa_val_anls,0.24798874058574302,0.006855374548993139
+ Baseline,12000,mme_total_score,1225.6453581432572,
+ Baseline,12000,mmmu_val_mmmu_acc,0.27889,
+ Baseline,12000,mmstar_average,0.34010867846816534,
+ Baseline,12000,ocrbench_ocrbench_accuracy,0.512,
+ Baseline,12000,seedbench_seed_all,0.5350194552529183,
+ Baseline,12000,textvqa_val_exact_match,0.5330600000000001,0.006777713092109446
+ Baseline,13000,ai2d_exact_match,0.4375,0.008928571428571428
+ Baseline,13000,average,0.4692868662590049,
+ Baseline,13000,average_rank,1.6,
+ Baseline,13000,chartqa_relaxed_overall,0.6148,0.00973479791861169
+ Baseline,13000,docvqa_val_anls,0.6511374872549951,0.006086953065248391
+ Baseline,13000,infovqa_val_anls,0.24465055100441893,0.006808432538374664
+ Baseline,13000,mme_total_score,1281.7122849139657,
+ Baseline,13000,mmmu_val_mmmu_acc,0.28222,
+ Baseline,13000,mmstar_average,0.3453069542917521,
+ Baseline,13000,ocrbench_ocrbench_accuracy,0.549,
+ Baseline,13000,seedbench_seed_all,0.5442468037798777,
+ Baseline,13000,textvqa_val_exact_match,0.55472,0.0067416788982325
+ Baseline,14000,ai2d_exact_match,0.4572538860103627,0.00896620675297095
+ Baseline,14000,average,0.47352486841689195,
+ Baseline,14000,average_rank,1.7,
+ Baseline,14000,chartqa_relaxed_overall,0.6172,0.009723347231923635
+ Baseline,14000,docvqa_val_anls,0.6502269393708169,0.006057950730638126
+ Baseline,14000,infovqa_val_anls,0.25805460837190913,0.007037735231659539
+ Baseline,14000,mme_total_score,1309.1444577831132,
+ Baseline,14000,mmmu_val_mmmu_acc,0.28111,
+ Baseline,14000,mmstar_average,0.34575818188776586,
+ Baseline,14000,ocrbench_ocrbench_accuracy,0.551,
+ Baseline,14000,seedbench_seed_all,0.5483602001111729,
+ Baseline,14000,textvqa_val_exact_match,0.55276,0.006751206724612103
+ Baseline,15000,ai2d_exact_match,0.45045336787564766,0.008954861634252399
+ Baseline,15000,average,0.47878665012878824,
+ Baseline,15000,average_rank,1.6,
+ Baseline,15000,chartqa_relaxed_overall,0.612,0.009747841205275417
+ Baseline,15000,docvqa_val_anls,0.6621413031955148,0.006056838050222495
+ Baseline,15000,infovqa_val_anls,0.2706898598157733,0.007200315730154543
+ Baseline,15000,mme_total_score,1384.2171868747498,
+ Baseline,15000,mmmu_val_mmmu_acc,0.30222,
+ Baseline,15000,mmstar_average,0.35408135695920684,
+ Baseline,15000,ocrbench_ocrbench_accuracy,0.558,
+ Baseline,15000,seedbench_seed_all,0.5411339633129516,
+ Baseline,15000,textvqa_val_exact_match,0.5583600000000001,0.0067279027203879065
+ Baseline,16000,ai2d_exact_match,0.45077720207253885,0.008955440137395838
+ Baseline,16000,average,0.47665128022935843,
+ Baseline,16000,average_rank,1.6,
+ Baseline,16000,chartqa_relaxed_overall,0.632,0.00964715642305132
+ Baseline,16000,docvqa_val_anls,0.6709415729142987,0.005999818105621502
+ Baseline,16000,infovqa_val_anls,0.26050032542402035,0.006997451875879188
+ Baseline,16000,mme_total_score,1317.8491396558625,
+ Baseline,16000,mmmu_val_mmmu_acc,0.27556,
+ Baseline,16000,mmstar_average,0.33214333327093315,
+ Baseline,16000,ocrbench_ocrbench_accuracy,0.56,
+ Baseline,16000,seedbench_seed_all,0.5463590883824346,
+ Baseline,16000,textvqa_val_exact_match,0.56158,0.006723854754867398
+ Baseline,17000,ai2d_exact_match,0.45919689119170987,0.008969138793675545
+ Baseline,17000,average,0.4777141780162423,
+ Baseline,17000,average_rank,1.9,
+ Baseline,17000,chartqa_relaxed_overall,0.632,0.00964715642305132
+ Baseline,17000,docvqa_val_anls,0.6796338519136422,0.005948761388267941
+ Baseline,17000,infovqa_val_anls,0.28070956072505215,0.007298333094144192
+ Baseline,17000,mme_total_score,1381.9161664665867,
+ Baseline,17000,mmmu_val_mmmu_acc,0.27667,
+ Baseline,17000,mmstar_average,0.3370289492329521,
+ Baseline,17000,ocrbench_ocrbench_accuracy,0.519,
+ Baseline,17000,seedbench_seed_all,0.5510283490828238,
+ Baseline,17000,textvqa_val_exact_match,0.56416,0.006724830373229479
+ Baseline,18000,ai2d_exact_match,0.46567357512953367,0.008977921602780726
+ Baseline,18000,average,0.4819834595278701,
+ Baseline,18000,average_rank,1.7,
+ Baseline,18000,chartqa_relaxed_overall,0.6376,0.009615793331418735
+ Baseline,18000,docvqa_val_anls,0.6775884603912571,0.005972234236435759
+ Baseline,18000,infovqa_val_anls,0.27154318420389256,0.007164903131667027
+ Baseline,18000,mme_total_score,1336.922769107643,
+ Baseline,18000,mmmu_val_mmmu_acc,0.28667,
+ Baseline,18000,mmstar_average,0.34482796716566916,
+ Baseline,18000,ocrbench_ocrbench_accuracy,0.533,
+ Baseline,18000,seedbench_seed_all,0.5543079488604781,
+ Baseline,18000,textvqa_val_exact_match,0.5666399999999999,0.006713392287599574
+ Baseline,19000,ai2d_exact_match,0.4682642487046632,0.008981008686994101
+ Baseline,19000,average,0.4899006713916878,
+ Baseline,19000,average_rank,1.5,
+ Baseline,19000,chartqa_relaxed_overall,0.6444,0.009575809858898698
+ Baseline,19000,docvqa_val_anls,0.678226526479947,0.005970619221588814
+ Baseline,19000,infovqa_val_anls,0.26993847247278,0.0071348470764911525
+ Baseline,19000,mme_total_score,1406.6628651460583,
+ Baseline,19000,mmmu_val_mmmu_acc,0.28333,
+ Baseline,19000,mmstar_average,0.356220913822775,
+ Baseline,19000,ocrbench_ocrbench_accuracy,0.577,
+ Baseline,19000,seedbench_seed_all,0.554585881045025,
+ Baseline,19000,textvqa_val_exact_match,0.57714,0.0066918487914812905
+ Baseline,20000,ai2d_exact_match,0.47571243523316065,0.00898853090258662
+ Baseline,20000,average,0.4873169067639118,
+ Baseline,20000,average_rank,1.7,
+ Baseline,20000,chartqa_relaxed_overall,0.6336,0.009638338810708618
+ Baseline,20000,docvqa_val_anls,0.6895214454380043,0.005896462073053767
+ Baseline,20000,infovqa_val_anls,0.2655657550458317,0.007033265532032538
+ Baseline,20000,mme_total_score,1324.6738695478193,
+ Baseline,20000,mmmu_val_mmmu_acc,0.30111,
+ Baseline,20000,mmstar_average,0.33806766134497995,
+ Baseline,20000,ocrbench_ocrbench_accuracy,0.555,
+ Baseline,20000,seedbench_seed_all,0.5587548638132296,
+ Baseline,20000,textvqa_val_exact_match,0.56852,0.006720151338087659
+ ≥2,1000,ai2d_exact_match,0.2619818652849741,0.007914086941902845
+ ≥2,1000,average,0.2852885776543714,
+ ≥2,1000,average_rank,2.8,
+ ≥2,1000,chartqa_relaxed_overall,0.36,0.009601920576192066
+ ≥2,1000,docvqa_val_anls,0.3691495236959511,0.0059102400877721764
+ ≥2,1000,infovqa_val_anls,0.18005913830944342,0.006300821228003093
+ ≥2,1000,mme_total_score,1034.4992997198879,
+ ≥2,1000,mmmu_val_mmmu_acc,0.25222,
+ ≥2,1000,mmstar_average,0.20333316409480473,
+ ≥2,1000,ocrbench_ocrbench_accuracy,0.331,
+ ≥2,1000,seedbench_seed_all,0.264313507504169,
+ ≥2,1000,textvqa_val_exact_match,0.34554,0.006483180392801138
+ ≥2,2000,ai2d_exact_match,0.25971502590673573,0.007891865786132407
+ ≥2,2000,average,0.3309474525195546,
+ ≥2,2000,average_rank,2.3,
+ ≥2,2000,chartqa_relaxed_overall,0.4664,0.009979391329160321
+ ≥2,2000,docvqa_val_anls,0.44591152951098784,0.006197056264354256
+ ≥2,2000,infovqa_val_anls,0.20775747304541303,0.006645281501282388
+ ≥2,2000,mme_total_score,1083.1982793117247,
+ ≥2,2000,mmmu_val_mmmu_acc,0.26222,
+ ≥2,2000,mmstar_average,0.23515873070535015,
+ ≥2,2000,ocrbench_ocrbench_accuracy,0.413,
+ ≥2,2000,seedbench_seed_all,0.2757643135075042,
+ ≥2,2000,textvqa_val_exact_match,0.4126,0.006707581257746032
+ ≥2,3000,ai2d_exact_match,0.27299222797927464,0.00801819019286542
+ ≥2,3000,average,0.35386512749374127,
+ ≥2,3000,average_rank,1.8,
+ ≥2,3000,chartqa_relaxed_overall,0.5124,0.009998924311892653
+ ≥2,3000,docvqa_val_anls,0.48910828732243933,0.006274136020264289
+ ≥2,3000,infovqa_val_anls,0.2070472808493129,0.0065577848697521875
+ ≥2,3000,mme_total_score,1128.4556822729091,
+ ≥2,3000,mmmu_val_mmmu_acc,0.25222,
+ ≥2,3000,mmstar_average,0.25601322622316125,
+ ≥2,3000,ocrbench_ocrbench_accuracy,0.447,
+ ≥2,3000,seedbench_seed_all,0.30522512506948307,
+ ≥2,3000,textvqa_val_exact_match,0.44278,0.006763750733490772
+ ≥2,4000,ai2d_exact_match,0.3173575129533679,0.008377274276497445
+ ≥2,4000,average,0.3859767914191833,
+ ≥2,4000,average_rank,2.4,
+ ≥2,4000,chartqa_relaxed_overall,0.5388,0.0099718403035556
+ ≥2,4000,docvqa_val_anls,0.5360750144064242,0.006293888693576319
+ ≥2,4000,infovqa_val_anls,0.20188347210673038,0.0064101781168258935
+ ≥2,4000,mme_total_score,1110.1481592637056,
+ ≥2,4000,mmmu_val_mmmu_acc,0.25222,
+ ≥2,4000,mmstar_average,0.28905163247788923,
+ ≥2,4000,ocrbench_ocrbench_accuracy,0.473,
+ ≥2,4000,seedbench_seed_all,0.4102834908282379,
+ ≥2,4000,textvqa_val_exact_match,0.4551200000000001,0.0067829756649846785
+ ≥2,5000,ai2d_exact_match,0.32577720207253885,0.008435168191407938
+ ≥2,5000,average,0.4029564015858402,
+ ≥2,5000,average_rank,2.5,
+ ≥2,5000,chartqa_relaxed_overall,0.5628,0.00992279440175477
+ ≥2,5000,docvqa_val_anls,0.548770508666009,0.006315482288099859
+ ≥2,5000,infovqa_val_anls,0.20783386531525747,0.006421967027729742
+ ≥2,5000,mme_total_score,1206.0127050820329,
+ ≥2,5000,mmmu_val_mmmu_acc,0.25667,
+ ≥2,5000,mmstar_average,0.3210115801865161,
+ ≥2,5000,ocrbench_ocrbench_accuracy,0.484,
+ ≥2,5000,seedbench_seed_all,0.4440244580322401,
+ ≥2,5000,textvqa_val_exact_match,0.47572,0.006783457774606987
+ ≥2,6000,ai2d_exact_match,0.3542746113989637,0.00860846328571982
+ ≥2,6000,average,0.4118759304334577,
+ ≥2,6000,average_rank,3.3,
+ ≥2,6000,chartqa_relaxed_overall,0.5644,0.00991868984106597
+ ≥2,6000,docvqa_val_anls,0.5618652265799138,0.006261889040657647
+ ≥2,6000,infovqa_val_anls,0.2101901707833487,0.006387610514125727
+ ≥2,6000,mme_total_score,1135.471288515406,
+ ≥2,6000,mmmu_val_mmmu_acc,0.26333,
+ ≥2,6000,mmstar_average,0.3255666447386709,
+ ≥2,6000,ocrbench_ocrbench_accuracy,0.482,
+ ≥2,6000,seedbench_seed_all,0.4740967204002223,
+ ≥2,6000,textvqa_val_exact_match,0.47116,0.00678908456375694
+ ≥2,7000,ai2d_exact_match,0.37338082901554404,0.008705816961084268
+ ≥2,7000,average,0.4291995483001856,
+ ≥2,7000,average_rank,2.4,
+ ≥2,7000,chartqa_relaxed_overall,0.5716,0.009898917689756362
+ ≥2,7000,docvqa_val_anls,0.5846126379804475,0.006218823793449337
+ ≥2,7000,infovqa_val_anls,0.2243908724169204,0.006651785538916188
+ ≥2,7000,mme_total_score,1249.9180672268908,
+ ≥2,7000,mmmu_val_mmmu_acc,0.27556,
+ ≥2,7000,mmstar_average,0.32644844909642934,
+ ≥2,7000,ocrbench_ocrbench_accuracy,0.506,
+ ≥2,7000,seedbench_seed_all,0.4936631461923291,
+ ≥2,7000,textvqa_val_exact_match,0.5071399999999999,0.00678246300696791
+ ≥2,8000,ai2d_exact_match,0.3963730569948187,0.008803757198545703
+ ≥2,8000,average,0.43448504857720643,
+ ≥2,8000,average_rank,3.0,
+ ≥2,8000,chartqa_relaxed_overall,0.5784,0.009878279615563902
+ ≥2,8000,docvqa_val_anls,0.5935884981677085,0.006228109848938283
+ ≥2,8000,infovqa_val_anls,0.22034669568379356,0.006538842004996925
+ ≥2,8000,mme_total_score,1251.6327531012405,
+ ≥2,8000,mmmu_val_mmmu_acc,0.27444,
+ ≥2,8000,mmstar_average,0.3368503047476477,
+ ≥2,8000,ocrbench_ocrbench_accuracy,0.516,
+ ≥2,8000,seedbench_seed_all,0.4963868816008894,
+ ≥2,8000,textvqa_val_exact_match,0.49798000000000003,0.006777844181917349
+ ≥2,9000,ai2d_exact_match,0.4015544041450777,0.008822998789014784
+ ≥2,9000,average,0.4409076865862069,
+ ≥2,9000,average_rank,2.8,
+ ≥2,9000,chartqa_relaxed_overall,0.5952,0.0098190299592035
+ ≥2,9000,docvqa_val_anls,0.6142957639909281,0.006149142953850004
+ ≥2,9000,infovqa_val_anls,0.225441641847203,0.006565814507342015
344
+ ≥2,9000,mme_total_score,1170.923569427771,
345
+ ≥2,9000,mmmu_val_mmmu_acc,0.28,
346
+ ≥2,9000,mmstar_average,0.32763888124373686,
347
+ ≥2,9000,ocrbench_ocrbench_accuracy,0.514,
348
+ ≥2,9000,seedbench_seed_all,0.5012784880489161,
349
+ ≥2,9000,textvqa_val_exact_match,0.50876,0.006788539558245703
350
+ ≥2,10000,ai2d_exact_match,0.4018782383419689,0.008824167272304229
351
+ ≥2,10000,average,0.44844067183729286,
352
+ ≥2,10000,average_rank,2.7,
353
+ ≥2,10000,chartqa_relaxed_overall,0.5956,0.009817474681589429
354
+ ≥2,10000,docvqa_val_anls,0.6161881255627961,0.006150295182189919
355
+ ≥2,10000,infovqa_val_anls,0.2273186020139702,0.006609762944776786
356
+ ≥2,10000,mme_total_score,1244.6918767507002,
357
+ ≥2,10000,mmmu_val_mmmu_acc,0.28667,
358
+ ≥2,10000,mmstar_average,0.3405769394273513,
359
+ ≥2,10000,ocrbench_ocrbench_accuracy,0.529,
360
+ ≥2,10000,seedbench_seed_all,0.5174541411895498,
361
+ ≥2,10000,textvqa_val_exact_match,0.52128,0.0067723707312184415
362
+ ≥2,11000,ai2d_exact_match,0.41386010362694303,0.008864599272573477
363
+ ≥2,11000,average,0.4508015001273205,
364
+ ≥2,11000,average_rank,3.1,
365
+ ≥2,11000,chartqa_relaxed_overall,0.5916,0.0098327233755248
366
+ ≥2,11000,docvqa_val_anls,0.6132516406541649,0.006147223601932411
367
+ ≥2,11000,infovqa_val_anls,0.23136501765139353,0.006670154065298524
368
+ ≥2,11000,mme_total_score,1193.1198479391755,
369
+ ≥2,11000,mmmu_val_mmmu_acc,0.28222,
370
+ ≥2,11000,mmstar_average,0.34055130285985363,
371
+ ≥2,11000,ocrbench_ocrbench_accuracy,0.544,
372
+ ≥2,11000,seedbench_seed_all,0.5137854363535297,
373
+ ≥2,11000,textvqa_val_exact_match,0.52658,0.006779520123033763
374
+ ≥2,12000,ai2d_exact_match,0.42033678756476683,0.008884198538329101
375
+ ≥2,12000,average,0.4593162089992856,
376
+ ≥2,12000,average_rank,2.4,
377
+ ≥2,12000,chartqa_relaxed_overall,0.612,0.009747841205275417
378
+ ≥2,12000,docvqa_val_anls,0.6322256818549263,0.006037251396803284
379
+ ≥2,12000,infovqa_val_anls,0.23499854511160906,0.006635085630122106
380
+ ≥2,12000,mme_total_score,1282.3226290516207,
381
+ ≥2,12000,mmmu_val_mmmu_acc,0.29444,
382
+ ≥2,12000,mmstar_average,0.3455632878074604,
383
+ ≥2,12000,ocrbench_ocrbench_accuracy,0.542,
384
+ ≥2,12000,seedbench_seed_all,0.5148415786548082,
385
+ ≥2,12000,textvqa_val_exact_match,0.5374399999999999,0.0067549667056943374
386
+ ≥2,13000,ai2d_exact_match,0.4329663212435233,0.008917911748577596
387
+ ≥2,13000,average,0.4594856750450977,
388
+ ≥2,13000,average_rank,3.1,
389
+ ≥2,13000,chartqa_relaxed_overall,0.6116,0.009749676839741497
390
+ ≥2,13000,docvqa_val_anls,0.6480115225202001,0.006082136258345928
391
+ ≥2,13000,infovqa_val_anls,0.2390399772273204,0.006801403608154099
392
+ ≥2,13000,mme_total_score,1255.4888955582232,
393
+ ≥2,13000,mmmu_val_mmmu_acc,0.26667,
394
+ ≥2,13000,mmstar_average,0.3276926929918222,
395
+ ≥2,13000,ocrbench_ocrbench_accuracy,0.551,
396
+ ≥2,13000,seedbench_seed_all,0.5190105614230128,
397
+ ≥2,13000,textvqa_val_exact_match,0.5393800000000001,0.006748937157104821
398
+ ≥2,14000,ai2d_exact_match,0.43523316062176165,0.00892333645202351
399
+ ≥2,14000,average,0.46380397688227554,
400
+ ≥2,14000,average_rank,3.4,
401
+ ≥2,14000,chartqa_relaxed_overall,0.6136,0.009740429476494075
402
+ ≥2,14000,docvqa_val_anls,0.6474419557198757,0.006056802443739013
403
+ ≥2,14000,infovqa_val_anls,0.24341248035748822,0.006789396426159645
404
+ ≥2,14000,mme_total_score,1209.5489195678272,
405
+ ≥2,14000,mmmu_val_mmmu_acc,0.27556,
406
+ ≥2,14000,mmstar_average,0.35309886783724065,
407
+ ≥2,14000,ocrbench_ocrbench_accuracy,0.545,
408
+ ≥2,14000,seedbench_seed_all,0.5207893274041134,
409
+ ≥2,14000,textvqa_val_exact_match,0.5400999999999999,0.006762835587905254
410
+ ≥2,15000,ai2d_exact_match,0.43588082901554404,0.008924851504668983
411
+ ≥2,15000,average,0.46327247775474995,
412
+ ≥2,15000,average_rank,3.4,
413
+ ≥2,15000,chartqa_relaxed_overall,0.614,0.009738559226822298
414
+ ≥2,15000,docvqa_val_anls,0.638973421646662,0.005999307255506728
415
+ ≥2,15000,infovqa_val_anls,0.23590457960067904,0.006699424952743598
416
+ ≥2,15000,mme_total_score,1230.12775110044,
417
+ ≥2,15000,mmmu_val_mmmu_acc,0.28667,
418
+ ≥2,15000,mmstar_average,0.3545309625815601,
419
+ ≥2,15000,ocrbench_ocrbench_accuracy,0.536,
420
+ ≥2,15000,seedbench_seed_all,0.5225125069483046,
421
+ ≥2,15000,textvqa_val_exact_match,0.54498,0.006749227387936104
422
+ ≥2,16000,ai2d_exact_match,0.4413860103626943,0.008937105222785166
423
+ ≥2,16000,average,0.4691799993543692,
424
+ ≥2,16000,average_rank,3.1,
425
+ ≥2,16000,chartqa_relaxed_overall,0.6168,0.009725273074549106
426
+ ≥2,16000,docvqa_val_anls,0.6539654303329543,0.00605387835402321
427
+ ≥2,16000,infovqa_val_anls,0.251011584102177,0.006888371171829252
428
+ ≥2,16000,mme_total_score,1235.6986794717886,
429
+ ≥2,16000,mmmu_val_mmmu_acc,0.28556,
430
+ ≥2,16000,mmstar_average,0.35467603553935745,
431
+ ≥2,16000,ocrbench_ocrbench_accuracy,0.545,
432
+ ≥2,16000,seedbench_seed_all,0.5256809338521401,
433
+ ≥2,16000,textvqa_val_exact_match,0.5485399999999999,0.0067546057338473825
434
+ ≥2,17000,ai2d_exact_match,0.43976683937823835,0.008933617011753861
435
+ ≥2,17000,average,0.47032074037035837,
436
+ ≥2,17000,average_rank,3.1,
437
+ ≥2,17000,chartqa_relaxed_overall,0.6184,0.009717527882093043
438
+ ≥2,17000,docvqa_val_anls,0.655318828070759,0.005978737407680595
439
+ ≥2,17000,infovqa_val_anls,0.24899610305034758,0.006825123869520012
440
+ ≥2,17000,mme_total_score,1246.5548219287716,
441
+ ≥2,17000,mmmu_val_mmmu_acc,0.29556,
442
+ ≥2,17000,mmstar_average,0.3399003792152031,
443
+ ≥2,17000,ocrbench_ocrbench_accuracy,0.553,
444
+ ≥2,17000,seedbench_seed_all,0.5241245136186771,
445
+ ≥2,17000,textvqa_val_exact_match,0.55782,0.0067278542139723035
446
+ ≥2,18000,ai2d_exact_match,0.4413860103626943,0.008937105222785166
447
+ ≥2,18000,average,0.4720458439616472,
448
+ ≥2,18000,average_rank,3.0,
449
+ ≥2,18000,chartqa_relaxed_overall,0.6256,0.009681288495793083
450
+ ≥2,18000,docvqa_val_anls,0.6595541124701471,0.00598982698063352
451
+ ≥2,18000,infovqa_val_anls,0.24628476636774824,0.006771852338911992
452
+ ≥2,18000,mme_total_score,1246.580632252901,
453
+ ≥2,18000,mmmu_val_mmmu_acc,0.29556,
454
+ ≥2,18000,mmstar_average,0.3370877286888103,
455
+ ≥2,18000,ocrbench_ocrbench_accuracy,0.557,
456
+ ≥2,18000,seedbench_seed_all,0.5279599777654252,
457
+ ≥2,18000,textvqa_val_exact_match,0.5579800000000001,0.006730967620262408
458
+ ≥2,19000,ai2d_exact_match,0.44009067357512954,0.008934322367529354
459
+ ≥2,19000,average,0.47484048232505544,
460
+ ≥2,19000,average_rank,3.1,
461
+ ≥2,19000,chartqa_relaxed_overall,0.6236,0.009691583292459796
462
+ ≥2,19000,docvqa_val_anls,0.656845810830509,0.005998201285366962
463
+ ≥2,19000,infovqa_val_anls,0.2483428639081206,0.006859079557818024
464
+ ≥2,19000,mme_total_score,1255.002801120448,
465
+ ≥2,19000,mmmu_val_mmmu_acc,0.30667,
466
+ ≥2,19000,mmstar_average,0.3484268658746636,
467
+ ≥2,19000,ocrbench_ocrbench_accuracy,0.557,
468
+ ≥2,19000,seedbench_seed_all,0.5306281267370762,
469
+ ≥2,19000,textvqa_val_exact_match,0.56196,0.006711810587335734
470
+ ≥2,20000,ai2d_exact_match,0.44397668393782386,0.008942485993062323
471
+ ≥2,20000,average,0.4719647885204447,
472
+ ≥2,20000,average_rank,3.3,
473
+ ≥2,20000,chartqa_relaxed_overall,0.6252,0.009683361554563506
474
+ ≥2,20000,docvqa_val_anls,0.6531065052301426,0.005958657790556006
475
+ ≥2,20000,infovqa_val_anls,0.2515640311557441,0.00684713602725156
476
+ ≥2,20000,mme_total_score,1269.56512605042,
477
+ ≥2,20000,mmmu_val_mmmu_acc,0.29222,
478
+ ≥2,20000,mmstar_average,0.3405247257210479,
479
+ ≥2,20000,ocrbench_ocrbench_accuracy,0.557,
480
+ ≥2,20000,seedbench_seed_all,0.528071150639244,
481
+ ≥2,20000,textvqa_val_exact_match,0.5560200000000001,0.006742124529303335
482
+ ≥3,1000,ai2d_exact_match,0.25712435233160624,0.007866134203324925
483
+ ≥3,1000,average,0.2908366935977347,
484
+ ≥3,1000,average_rank,2.6,
485
+ ≥3,1000,chartqa_relaxed_overall,0.3724,0.009670817229291067
486
+ ≥3,1000,docvqa_val_anls,0.36190361730095816,0.005874681377878617
487
+ ≥3,1000,infovqa_val_anls,0.1897409167650202,0.006570751118077319
488
+ ≥3,1000,mme_total_score,938.3572428971589,
489
+ ≥3,1000,mmmu_val_mmmu_acc,0.26333,
490
+ ≥3,1000,mmstar_average,0.2361438073438948,
491
+ ≥3,1000,ocrbench_ocrbench_accuracy,0.313,
492
+ ≥3,1000,seedbench_seed_all,0.25758754863813227,
493
+ ≥3,1000,textvqa_val_exact_match,0.36629999999999996,0.006582851113746775
494
+ ≥3,2000,ai2d_exact_match,0.25647668393782386,0.007859644922870102
495
+ ≥3,2000,average,0.327478691314146,
496
+ ≥3,2000,average_rank,2.9,
497
+ ≥3,2000,chartqa_relaxed_overall,0.4708,0.009984929820955767
498
+ ≥3,2000,docvqa_val_anls,0.455859181323049,0.0061669106819143196
499
+ ≥3,2000,infovqa_val_anls,0.20804914764579785,0.0066905821266465505
500
+ ≥3,2000,mme_total_score,990.4248699479792,
501
+ ≥3,2000,mmmu_val_mmmu_acc,0.27111,
502
+ ≥3,2000,mmstar_average,0.21380673865938699,
503
+ ≥3,2000,ocrbench_ocrbench_accuracy,0.405,
504
+ ≥3,2000,seedbench_seed_all,0.26364647026125626,
505
+ ≥3,2000,textvqa_val_exact_match,0.40256,0.0066960030295180025
506
+ ≥3,3000,ai2d_exact_match,0.2697538860103627,0.007988222765138163
507
+ ≥3,3000,average,0.34640625458292296,
508
+ ≥3,3000,average_rank,3.1,
509
+ ≥3,3000,chartqa_relaxed_overall,0.514,0.009998079047189691
510
+ ≥3,3000,docvqa_val_anls,0.4749731938810012,0.00604931863100692
511
+ ≥3,3000,infovqa_val_anls,0.19785201580687228,0.00636819106561235
512
+ ≥3,3000,mme_total_score,1022.0748299319728,
513
+ ≥3,3000,mmmu_val_mmmu_acc,0.26556,
514
+ ≥3,3000,mmstar_average,0.2234035546364529,
515
+ ≥3,3000,ocrbench_ocrbench_accuracy,0.435,
516
+ ≥3,3000,seedbench_seed_all,0.29655364091161757,
517
+ ≥3,3000,textvqa_val_exact_match,0.44056000000000006,0.006770653264576898
518
+ ≥3,4000,ai2d_exact_match,0.31476683937823835,0.008358827401711809
519
+ ≥3,4000,average,0.3840989485719881,
520
+ ≥3,4000,average_rank,2.5,
521
+ ≥3,4000,chartqa_relaxed_overall,0.5244,0.009990083919101193
522
+ ≥3,4000,docvqa_val_anls,0.5398623644141017,0.006209437344747972
523
+ ≥3,4000,infovqa_val_anls,0.21841657659961455,0.006654701266433889
524
+ ≥3,4000,mme_total_score,1008.1938775510204,
525
+ ≥3,4000,mmmu_val_mmmu_acc,0.27778,
526
+ ≥3,4000,mmstar_average,0.25611290572758977,
527
+ ≥3,4000,ocrbench_ocrbench_accuracy,0.462,
528
+ ≥3,4000,seedbench_seed_all,0.39733185102834906,
529
+ ≥3,4000,textvqa_val_exact_match,0.46621999999999997,0.006799457981763631
530
+ ≥3,5000,ai2d_exact_match,0.3442357512953368,0.008551327504046387
531
+ ≥3,5000,average,0.4034839586685592,
532
+ ≥3,5000,average_rank,2.4,
533
+ ≥3,5000,chartqa_relaxed_overall,0.5544,0.009942625323290008
534
+ ≥3,5000,docvqa_val_anls,0.5567727758893183,0.006173642024037381
535
+ ≥3,5000,infovqa_val_anls,0.21638639427926507,0.006559084868006158
536
+ ≥3,5000,mme_total_score,1074.1284513805522,
537
+ ≥3,5000,mmmu_val_mmmu_acc,0.26778,
538
+ ≥3,5000,mmstar_average,0.3009278216170371,
539
+ ≥3,5000,ocrbench_ocrbench_accuracy,0.482,
540
+ ≥3,5000,seedbench_seed_all,0.4471928849360756,
541
+ ≥3,5000,textvqa_val_exact_match,0.46165999999999996,0.006793381991893107
542
+ ≥3,6000,ai2d_exact_match,0.36819948186528495,0.008680870162409787
543
+ ≥3,6000,average,0.41987173897944946,
544
+ ≥3,6000,average_rank,2.2,
545
+ ≥3,6000,chartqa_relaxed_overall,0.5636,0.009920755241100424
546
+ ≥3,6000,docvqa_val_anls,0.5766507662420887,0.006104661016322198
547
+ ≥3,6000,infovqa_val_anls,0.2209691160904877,0.0066290878786102805
548
+ ≥3,6000,mme_total_score,1088.6353541416568,
549
+ ≥3,6000,mmmu_val_mmmu_acc,0.29778,
550
+ ≥3,6000,mmstar_average,0.2960442855054549,
551
+ ≥3,6000,ocrbench_ocrbench_accuracy,0.496,
552
+ ≥3,6000,seedbench_seed_all,0.48360200111172874,
553
+ ≥3,6000,textvqa_val_exact_match,0.476,0.006791614329821814
554
+ ≥3,7000,ai2d_exact_match,0.3905440414507772,0.008780876258359173
555
+ ≥3,7000,average,0.4305333557001585,
556
+ ≥3,7000,average_rank,2.4,
557
+ ≥3,7000,chartqa_relaxed_overall,0.5744,0.009890651444389179
558
+ ≥3,7000,docvqa_val_anls,0.5943945047786826,0.006168637154272831
559
+ ≥3,7000,infovqa_val_anls,0.23015651757384684,0.006652654324068369
560
+ ≥3,7000,mme_total_score,1024.3486394557824,
561
+ ≥3,7000,mmmu_val_mmmu_acc,0.29,
562
+ ≥3,7000,mmstar_average,0.3086297067032336,
563
+ ≥3,7000,ocrbench_ocrbench_accuracy,0.496,
564
+ ≥3,7000,seedbench_seed_all,0.49577543079488606,
565
+ ≥3,7000,textvqa_val_exact_match,0.4949,0.006791673090238732
566
+ ≥3,8000,ai2d_exact_match,0.39863989637305697,0.008812301996070583
567
+ ≥3,8000,average,0.43894563539556736,
568
+ ≥3,8000,average_rank,2.4,
569
+ ≥3,8000,chartqa_relaxed_overall,0.5812,0.009869224115088964
570
+ ≥3,8000,docvqa_val_anls,0.597896936397571,0.006178924858305047
571
+ ≥3,8000,infovqa_val_anls,0.23624667779429379,0.006701812126185011
572
+ ≥3,8000,mme_total_score,1087.3003201280512,
573
+ ≥3,8000,mmmu_val_mmmu_acc,0.30444,
574
+ ≥3,8000,mmstar_average,0.3179238728089695,
575
+ ≥3,8000,ocrbench_ocrbench_accuracy,0.52,
576
+ ≥3,8000,seedbench_seed_all,0.5060033351862145,
577
+ ≥3,8000,textvqa_val_exact_match,0.48816000000000004,0.006805617250862191
578
+ ≥3,9000,ai2d_exact_match,0.4073834196891192,0.008843420154535594
579
+ ≥3,9000,average,0.4380819691649286,
580
+ ≥3,9000,average_rank,2.8,
581
+ ≥3,9000,chartqa_relaxed_overall,0.5892,0.009841548985529353
582
+ ≥3,9000,docvqa_val_anls,0.5926801961513722,0.00607014347283834
583
+ ≥3,9000,infovqa_val_anls,0.22884227739619317,0.006587321958723987
584
+ ≥3,9000,mme_total_score,960.1394557823129,
585
+ ≥3,9000,mmmu_val_mmmu_acc,0.29222,
586
+ ≥3,9000,mmstar_average,0.3023124740503409,
587
+ ≥3,9000,ocrbench_ocrbench_accuracy,0.516,
588
+ ≥3,9000,seedbench_seed_all,0.5108393551973318,
589
+ ≥3,9000,textvqa_val_exact_match,0.50326,0.006787480273097782
590
+ ≥3,10000,ai2d_exact_match,0.42487046632124353,0.008896983637113786
591
+ ≥3,10000,average,0.45376975700130806,
592
+ ≥3,10000,average_rank,2.5,
593
+ ≥3,10000,chartqa_relaxed_overall,0.592,0.009831228876620145
594
+ ≥3,10000,docvqa_val_anls,0.6288940533515488,0.006078026262812974
595
+ ≥3,10000,infovqa_val_anls,0.2639557991160976,0.007015193539901653
596
+ ≥3,10000,mme_total_score,1135.5116046418568,
597
+ ≥3,10000,mmmu_val_mmmu_acc,0.29556,
598
+ ≥3,10000,mmstar_average,0.3171165325775241,
599
+ ≥3,10000,ocrbench_ocrbench_accuracy,0.53,
600
+ ≥3,10000,seedbench_seed_all,0.5157309616453586,
601
+ ≥3,10000,textvqa_val_exact_match,0.5158,0.0067831610812991135
602
+ ≥3,11000,ai2d_exact_match,0.4271373056994819,0.008903088856242218
603
+ ≥3,11000,average,0.4507656942256156,
604
+ ≥3,11000,average_rank,3.1,
605
+ ≥3,11000,chartqa_relaxed_overall,0.6008,0.00979663889573671
606
+ ≥3,11000,docvqa_val_anls,0.6266233612884972,0.006097228164879785
607
+ ≥3,11000,infovqa_val_anls,0.23605295775343718,0.006674753327687541
608
+ ≥3,11000,mme_total_score,1115.0593237294918,
609
+ ≥3,11000,mmmu_val_mmmu_acc,0.27889,
610
+ ≥3,11000,mmstar_average,0.3244509807099135,
611
+ ≥3,11000,ocrbench_ocrbench_accuracy,0.524,
612
+ ≥3,11000,seedbench_seed_all,0.5219566425792107,
613
+ ≥3,11000,textvqa_val_exact_match,0.5169799999999999,0.006776837095888084
614
+ ≥3,12000,ai2d_exact_match,0.42843264248704666,0.008906491762178372
615
+ ≥3,12000,average,0.4596080978908205,
616
+ ≥3,12000,average_rank,2.6,
617
+ ≥3,12000,chartqa_relaxed_overall,0.6048,0.009779828322460816
618
+ ≥3,12000,docvqa_val_anls,0.6391083009950597,0.006038971765674556
619
+ ≥3,12000,infovqa_val_anls,0.24141834493583503,0.006794485284013245
620
+ ≥3,12000,mme_total_score,1183.3176270508202,
621
+ ≥3,12000,mmmu_val_mmmu_acc,0.28444,
622
+ ≥3,12000,mmstar_average,0.3293224419601992,
623
+ ≥3,12000,ocrbench_ocrbench_accuracy,0.555,
624
+ ≥3,12000,seedbench_seed_all,0.528071150639244,
625
+ ≥3,12000,textvqa_val_exact_match,0.5258799999999999,0.006773951756875811
626
+ ≥3,13000,ai2d_exact_match,0.43458549222797926,0.008921805911548515
627
+ ≥3,13000,average,0.4623863039639755,
628
+ ≥3,13000,average_rank,2.7,
629
+ ≥3,13000,chartqa_relaxed_overall,0.6108,0.00975332737879659
630
+ ≥3,13000,docvqa_val_anls,0.6376374898768016,0.006015671277879292
631
+ ≥3,13000,infovqa_val_anls,0.24710671614089955,0.006756961641692092
632
+ ≥3,13000,mme_total_score,1261.84493797519,
633
+ ≥3,13000,mmmu_val_mmmu_acc,0.28889,
634
+ ≥3,13000,mmstar_average,0.3264133242561134,
635
+ ≥3,13000,ocrbench_ocrbench_accuracy,0.553,
636
+ ≥3,13000,seedbench_seed_all,0.5306837131739855,
637
+ ≥3,13000,textvqa_val_exact_match,0.5323599999999999,0.0067627001192260856
638
+ ≥3,14000,ai2d_exact_match,0.4381476683937824,0.008930032335354969
639
+ ≥3,14000,average,0.4678786302971554,
640
+ ≥3,14000,average_rank,2.8,
641
+ ≥3,14000,chartqa_relaxed_overall,0.6104,0.009755142291143075
642
+ ≥3,14000,docvqa_val_anls,0.6523739582747238,0.006065891171788989
643
+ ≥3,14000,infovqa_val_anls,0.2541881734588241,0.006851623469491799
644
+ ≥3,14000,mme_total_score,1188.5243097238895,
645
+ ≥3,14000,mmmu_val_mmmu_acc,0.30333,
646
+ ≥3,14000,mmstar_average,0.3360144984503474,
647
+ ≥3,14000,ocrbench_ocrbench_accuracy,0.544,
648
+ ≥3,14000,seedbench_seed_all,0.5320733740967204,
649
+ ≥3,14000,textvqa_val_exact_match,0.54038,0.006754155375986593
650
+ ≥3,15000,ai2d_exact_match,0.4420336787564767,0.008938473522297184
651
+ ≥3,15000,average,0.4717225541426424,
652
+ ≥3,15000,average_rank,2.6,
653
+ ≥3,15000,chartqa_relaxed_overall,0.6204,0.009707689307588963
654
+ ≥3,15000,docvqa_val_anls,0.6657062061222615,0.005987913582977679
655
+ ≥3,15000,infovqa_val_anls,0.2547770182395889,0.00697535381427897
656
+ ≥3,15000,mme_total_score,1150.081532613045,
657
+ ≥3,15000,mmmu_val_mmmu_acc,0.29889,
658
+ ≥3,15000,mmstar_average,0.32974187627218104,
659
+ ≥3,15000,ocrbench_ocrbench_accuracy,0.556,
660
+ ≥3,15000,seedbench_seed_all,0.533574207893274,
661
+ ≥3,15000,textvqa_val_exact_match,0.54438,0.006740769296908389
662
+ ≥3,16000,ai2d_exact_match,0.44397668393782386,0.008942485993062323
663
+ ≥3,16000,average,0.4725329505079693,
664
+ ≥3,16000,average_rank,2.7,
665
+ ≥3,16000,chartqa_relaxed_overall,0.6136,0.009740429476494075
666
+ ≥3,16000,docvqa_val_anls,0.6627356411508976,0.00599828184206493
667
+ ≥3,16000,infovqa_val_anls,0.25144929243788827,0.006859545868541458
668
+ ≥3,16000,mme_total_score,1189.4136654661866,
669
+ ≥3,16000,mmmu_val_mmmu_acc,0.30556,
670
+ ≥3,16000,mmstar_average,0.32828609880164517,
671
+ ≥3,16000,ocrbench_ocrbench_accuracy,0.565,
672
+ ≥3,16000,seedbench_seed_all,0.5359088382434686,
673
+ ≥3,16000,textvqa_val_exact_match,0.54628,0.006755557699551266
674
+ ≥3,17000,ai2d_exact_match,0.4423575129533679,0.008939151893135124
675
+ ≥3,17000,average,0.47219284094380815,
676
+ ≥3,17000,average_rank,2.9,
677
+ ≥3,17000,chartqa_relaxed_overall,0.6196,0.009711645711462604
678
+ ≥3,17000,docvqa_val_anls,0.6671354152323413,0.005979986643812461
679
+ ≥3,17000,infovqa_val_anls,0.26085018558007095,0.006930202483417548
680
+ ≥3,17000,mme_total_score,1181.9268707482993,
681
+ ≥3,17000,mmmu_val_mmmu_acc,0.29667,
682
+ ≥3,17000,mmstar_average,0.3246494808541184,
683
+ ≥3,17000,ocrbench_ocrbench_accuracy,0.556,
684
+ ≥3,17000,seedbench_seed_all,0.5353529738743746,
685
+ ≥3,17000,textvqa_val_exact_match,0.5471199999999999,0.006741055517194408
686
+ ≥3,18000,ai2d_exact_match,0.44624352331606215,0.0089469921763539
687
+ ≥3,18000,average,0.47727537354972976,
688
+ ≥3,18000,average_rank,2.8,
689
+ ≥3,18000,chartqa_relaxed_overall,0.6212,0.009703704898413913
690
+ ≥3,18000,docvqa_val_anls,0.6676971859833172,0.005968624246725931
691
+ ≥3,18000,infovqa_val_anls,0.2614701461784385,0.006943538426265278
692
+ ≥3,18000,mme_total_score,1133.047819127651,
693
+ ≥3,18000,mmmu_val_mmmu_acc,0.30444,
694
+ ≥3,18000,mmstar_average,0.3242292852357318,
695
+ ≥3,18000,ocrbench_ocrbench_accuracy,0.582,
696
+ ≥3,18000,seedbench_seed_all,0.5367982212340189,
697
+ ≥3,18000,textvqa_val_exact_match,0.5513999999999999,0.006735687188133017
698
+ ≥3,19000,ai2d_exact_match,0.4520725388601036,0.008957715852675527
699
+ ≥3,19000,average,0.4762675915069992,
700
+ ≥3,19000,average_rank,3.0,
701
+ ≥3,19000,chartqa_relaxed_overall,0.6216,0.009701702181065136
702
+ ≥3,19000,docvqa_val_anls,0.6679273632688325,0.00596194457686321
703
+ ≥3,19000,infovqa_val_anls,0.25211534311880446,0.006837669178934141
704
+ ≥3,19000,mme_total_score,1168.6077430972389,
705
+ ≥3,19000,mmmu_val_mmmu_acc,0.30111,
706
+ ≥3,19000,mmstar_average,0.334229548576509,
707
+ ≥3,19000,ocrbench_ocrbench_accuracy,0.566,
708
+ ≥3,19000,seedbench_seed_all,0.5363535297387437,
709
+ ≥3,19000,textvqa_val_exact_match,0.555,0.006737661257130932
710
+ ≥3,20000,ai2d_exact_match,0.4566062176165803,0.008965198879336198
711
+ ≥3,20000,average,0.4782761612786655,
712
+ ≥3,20000,average_rank,2.7,
713
+ ≥3,20000,chartqa_relaxed_overall,0.6268,0.009675026948726469
714
+ ≥3,20000,docvqa_val_anls,0.6699567897644018,0.005975453790424837
715
+ ≥3,20000,infovqa_val_anls,0.2594904076423186,0.006910668664574003
716
+ ≥3,20000,mme_total_score,1194.4682873149259,
717
+ ≥3,20000,mmmu_val_mmmu_acc,0.30667,
718
+ ≥3,20000,mmstar_average,0.3291890626103143,
719
+ ≥3,20000,ocrbench_ocrbench_accuracy,0.571,
720
+ ≥3,20000,seedbench_seed_all,0.5353529738743746,
721
+ ≥3,20000,textvqa_val_exact_match,0.54942,0.0067426571472292
722
+ ≥4,1000,ai2d_exact_match,0.266839378238342,0.007960790788435024
723
+ ≥4,1000,average,0.28718938224797474,
724
+ ≥4,1000,average_rank,2.8,
725
+ ≥4,1000,chartqa_relaxed_overall,0.3824,0.009721414421746647
726
+ ≥4,1000,docvqa_val_anls,0.3742280929549393,0.005897617626003216
727
+ ≥4,1000,infovqa_val_anls,0.18767733564942402,0.006495529242061099
728
+ ≥4,1000,mme_total_score,970.0657262905162,
729
+ ≥4,1000,mmmu_val_mmmu_acc,0.24667,
730
+ ≥4,1000,mmstar_average,0.20409674845299178,
731
+ ≥4,1000,ocrbench_ocrbench_accuracy,0.324,
732
+ ≥4,1000,seedbench_seed_all,0.2471928849360756,
733
+ ≥4,1000,textvqa_val_exact_match,0.3516,0.006519815150594346
734
+ ≥4,2000,ai2d_exact_match,0.2700777202072539,0.007991243694641088
735
+ ≥4,2000,average,0.32538295993176786,
736
+ ≥4,2000,average_rank,2.8,
737
+ ≥4,2000,chartqa_relaxed_overall,0.476,0.009990471651004463
738
+ ≥4,2000,docvqa_val_anls,0.45055679456484166,0.006087636141467791
739
+ ≥4,2000,infovqa_val_anls,0.21184608413063888,0.006740983882332282
740
+ ≥4,2000,mme_total_score,1065.059423769508,
741
+ ≥4,2000,mmmu_val_mmmu_acc,0.25444,
742
+ ≥4,2000,mmstar_average,0.20479630173942967,
743
+ ≥4,2000,ocrbench_ocrbench_accuracy,0.404,
744
+ ≥4,2000,seedbench_seed_all,0.2535297387437465,
745
+ ≥4,2000,textvqa_val_exact_match,0.4032,0.00669032914742019
746
+ ≥4,3000,ai2d_exact_match,0.26813471502590674,0.007973037037795191
747
+ ≥4,3000,average,0.3429973351943505,
748
+ ≥4,3000,average_rank,3.6,
749
+ ≥4,3000,chartqa_relaxed_overall,0.5052,0.010001459677380663
750
+ ≥4,3000,docvqa_val_anls,0.4883627712637139,0.006123671768321872
751
+ ≥4,3000,infovqa_val_anls,0.2020989926298624,0.006492359244043468
752
+ ≥4,3000,mme_total_score,1028.0742296918768,
753
+ ≥4,3000,mmmu_val_mmmu_acc,0.24444,
754
+ ≥4,3000,mmstar_average,0.23755500197641968,
755
+ ≥4,3000,ocrbench_ocrbench_accuracy,0.417,
756
+ ≥4,3000,seedbench_seed_all,0.2961645358532518,
757
+ ≥4,3000,textvqa_val_exact_match,0.42802000000000007,0.006729073636571477
758
+ ≥4,4000,ai2d_exact_match,0.297279792746114,0.008226320033454882
759
+ ≥4,4000,average,0.37640705986204226,
760
+ ≥4,4000,average_rank,3.0,
761
+ ≥4,4000,chartqa_relaxed_overall,0.5328,0.009980456292330589
762
+ ≥4,4000,docvqa_val_anls,0.5114700599486628,0.006120071866795458
763
+ ≥4,4000,infovqa_val_anls,0.20557836945629954,0.006329851460183733
764
+ ≥4,4000,mme_total_score,1074.640656262505,
765
+ ≥4,4000,mmmu_val_mmmu_acc,0.25889,
766
+ ≥4,4000,mmstar_average,0.24011275407256244,
767
+ ≥4,4000,ocrbench_ocrbench_accuracy,0.489,
768
+ ≥4,4000,seedbench_seed_all,0.40261256253474154,
769
+ ≥4,4000,textvqa_val_exact_match,0.44992,0.006773387223162055
770
+ ≥4,5000,ai2d_exact_match,0.32998704663212436,0.008462949140760363
771
+ ≥4,5000,average,0.3995227518308942,
772
+ ≥4,5000,average_rank,2.8,
773
+ ≥4,5000,chartqa_relaxed_overall,0.55,0.009951864943131942
774
+ ≥4,5000,docvqa_val_anls,0.5627332434349699,0.006167596088104117
775
+ ≥4,5000,infovqa_val_anls,0.20676909019266723,0.0063195922615256655
776
+ ≥4,5000,mme_total_score,1081.3841536614646,
777
+ ≥4,5000,mmmu_val_mmmu_acc,0.26667,
778
+ ≥4,5000,mmstar_average,0.26742033896981504,
779
+ ≥4,5000,ocrbench_ocrbench_accuracy,0.49,
780
+ ≥4,5000,seedbench_seed_all,0.4530850472484714,
781
+ ≥4,5000,textvqa_val_exact_match,0.46903999999999996,0.006785801728684695
782
+ ≥4,6000,ai2d_exact_match,0.35654145077720206,0.008620788425978479
783
+ ≥4,6000,average,0.41458714913417777,
784
+ ≥4,6000,average_rank,2.8,
785
+ ≥4,6000,chartqa_relaxed_overall,0.5632,0.009921778100334079
786
+ ≥4,6000,docvqa_val_anls,0.5818014224607982,0.006182490179642956
787
+ ≥4,6000,infovqa_val_anls,0.2145391217079547,0.006472934595237677
788
+ ≥4,6000,mme_total_score,1132.2886154461785,
789
+ ≥4,6000,mmmu_val_mmmu_acc,0.26667,
790
+ ≥4,6000,mmstar_average,0.28714914548287923,
791
+ ≥4,6000,ocrbench_ocrbench_accuracy,0.499,
792
+ ≥4,6000,seedbench_seed_all,0.47376320177876596,
793
+ ≥4,6000,textvqa_val_exact_match,0.48862,0.006787319991169747
794
+ ≥4,7000,ai2d_exact_match,0.38471502590673573,0.008756678690415541
795
+ ≥4,7000,average,0.42592935170009355,
796
+ ≥4,7000,average_rank,3.1,
797
+ ≥4,7000,chartqa_relaxed_overall,0.5804,0.009871844677005952
798
+ ≥4,7000,docvqa_val_anls,0.5710623718792285,0.006078423874650784
799
+ ≥4,7000,infovqa_val_anls,0.22007869704137703,0.006475129444868969
800
+ ≥4,7000,mme_total_score,1041.2597038815525,
801
+ ≥4,7000,mmmu_val_mmmu_acc,0.28444,
802
+ ≥4,7000,mmstar_average,0.3026487819798931,
803
+ ≥4,7000,ocrbench_ocrbench_accuracy,0.502,
804
+ ≥4,7000,seedbench_seed_all,0.4947192884936076,
805
+ ≥4,7000,textvqa_val_exact_match,0.4933,0.006785560460724908
806
+ ≥4,8000,ai2d_exact_match,0.3915155440414508,0.008784780895708938
807
+ ≥4,8000,average,0.43659006376695136,
808
+ ≥4,8000,average_rank,3.0,
809
+ ≥4,8000,chartqa_relaxed_overall,0.5736,0.009893046292521752
810
+ ≥4,8000,docvqa_val_anls,0.6079864136742988,0.006139878520335163
811
+ ≥4,8000,infovqa_val_anls,0.23243402779245617,0.006686893363455147
812
+ ≥4,8000,mme_total_score,1108.9173669467787,
813
+ ≥4,8000,mmmu_val_mmmu_acc,0.28,
814
+ ≥4,8000,mmstar_average,0.3276025817239844,
815
+ ≥4,8000,ocrbench_ocrbench_accuracy,0.508,
816
+ ≥4,8000,seedbench_seed_all,0.5016120066703724,
817
+ ≥4,8000,textvqa_val_exact_match,0.50656,0.006805281452749051
818
+ ≥4,9000,ai2d_exact_match,0.39248704663212436,0.008788649010397578
819
+ ≥4,9000,average,0.4379212821083599,
820
+ ≥4,9000,average_rank,2.9,
821
+ ≥4,9000,chartqa_relaxed_overall,0.5844,0.009858475126140203
822
+ ≥4,9000,docvqa_val_anls,0.6225000882770518,0.00610983265425905
823
+ ≥4,9000,infovqa_val_anls,0.2357319670089269,0.006735352134813103
824
+ ≥4,9000,mme_total_score,1054.3165266106444,
825
+ ≥4,9000,mmmu_val_mmmu_acc,0.28556,
826
+ ≥4,9000,mmstar_average,0.30919474945291153,
827
+ ≥4,9000,ocrbench_ocrbench_accuracy,0.496,
828
+ ≥4,9000,seedbench_seed_all,0.5078376876042245,
829
+ ≥4,9000,textvqa_val_exact_match,0.50758,0.0067866133191798106
830
+ ≥4,10000,ai2d_exact_match,0.4177461139896373,0.008876547725654098
831
+ ≥4,10000,average,0.4482945169324334,
832
+ ≥4,10000,average_rank,3.0,
833
+ ≥4,10000,chartqa_relaxed_overall,0.5872,0.009848718845878486
834
+ ≥4,10000,docvqa_val_anls,0.6178172719068701,0.006018237392964321
835
+ ≥4,10000,infovqa_val_anls,0.24180220451279583,0.006673139519957623
836
+ ≥4,10000,mme_total_score,1143.5380152060825,
837
+ ≥4,10000,mmmu_val_mmmu_acc,0.29667,
838
+ ≥4,10000,mmstar_average,0.31635030378359796,
839
+ ≥4,10000,ocrbench_ocrbench_accuracy,0.524,
840
+ ≥4,10000,seedbench_seed_all,0.5165647581989995,
841
+ ≥4,10000,textvqa_val_exact_match,0.5165,0.006796704277648658
842
+ ≥4,11000,ai2d_exact_match,0.41515544041450775,0.00886864516657515
843
+ ≥4,11000,average,0.45134109009976725,
844
+ ≥4,11000,average_rank,2.5,
845
+ ≥4,11000,chartqa_relaxed_overall,0.5956,0.009817474681589429
846
+ ≥4,11000,docvqa_val_anls,0.629269001239484,0.00608373788497042
847
+ ≥4,11000,infovqa_val_anls,0.24324006994727237,0.006777064540159464
848
+ ≥4,11000,mme_total_score,1228.8085234093637,
849
+ ≥4,11000,mmmu_val_mmmu_acc,0.28333,
850
+ ≥4,11000,mmstar_average,0.3288790569397762,
851
+ ≥4,11000,ocrbench_ocrbench_accuracy,0.522,
852
+ ≥4,11000,seedbench_seed_all,0.525236242356865,
853
+ ≥4,11000,textvqa_val_exact_match,0.5193599999999999,0.0067761804436039675
854
+ ≥4,12000,ai2d_exact_match,0.4183937823834197,0.008878484004260249
855
+ ≥4,12000,average,0.45687238598965646,
856
+ ≥4,12000,average_rank,3.3,
857
+ ≥4,12000,chartqa_relaxed_overall,0.5988,0.0098047885010856
858
+ ≥4,12000,docvqa_val_anls,0.6281800356608191,0.005956403319187123
859
+ ≥4,12000,infovqa_val_anls,0.242249391011484,0.006664412716854741
860
+ ≥4,12000,mme_total_score,1051.548619447779,
861
+ ≥4,12000,mmmu_val_mmmu_acc,0.28,
862
+ ≥4,12000,mmstar_average,0.32638661949265274,
863
+ ≥4,12000,ocrbench_ocrbench_accuracy,0.553,
864
+ ≥4,12000,seedbench_seed_all,0.5309616453585325,
865
+ ≥4,12000,textvqa_val_exact_match,0.53388,0.006762808309810877
866
+ ≥4,13000,ai2d_exact_match,0.4323186528497409,0.008916326937351901
867
+ ≥4,13000,average,0.46134498058034357,
868
+ ≥4,13000,average_rank,3.1,
869
+ ≥4,13000,chartqa_relaxed_overall,0.5948,0.009820578470976232
870
+ ≥4,13000,docvqa_val_anls,0.6459204882256453,0.006047391420582867
871
+ ≥4,13000,infovqa_val_anls,0.24395762124162781,0.006787945348887751
872
+ ≥4,13000,mme_total_score,1195.637755102041,
873
+ ≥4,13000,mmmu_val_mmmu_acc,0.29556,
874
+ ≥4,13000,mmstar_average,0.329867095702076,
875
+ ≥4,13000,ocrbench_ocrbench_accuracy,0.542,
876
+ ≥4,13000,seedbench_seed_all,0.5337409672040022,
877
+ ≥4,13000,textvqa_val_exact_match,0.53394,0.006767804364428913
878
+ ≥4,14000,ai2d_exact_match,0.4319948186528497,0.008915528710615492
879
+ ≥4,14000,average,0.4668148142530245,
880
+ ≥4,14000,average_rank,2.7,
881
+ ≥4,14000,chartqa_relaxed_overall,0.6076,0.009767653701044555
882
+ ≥4,14000,docvqa_val_anls,0.6561789267585798,0.005953346874132679
883
+ ≥4,14000,infovqa_val_anls,0.24945371957223306,0.006769327490532885
884
+ ≥4,14000,mme_total_score,1259.298019207683,
885
+ ≥4,14000,mmmu_val_mmmu_acc,0.30111,
886
+ ≥4,14000,mmstar_average,0.32172026573936097,
887
+ ≥4,14000,ocrbench_ocrbench_accuracy,0.55,
888
+ ≥4,14000,seedbench_seed_all,0.5360755975541968,
889
+ ≥4,14000,textvqa_val_exact_match,0.5472000000000001,0.006748951153204005
890
+ ≥4,15000,ai2d_exact_match,0.44624352331606215,0.008946992176353898
891
+ ≥4,15000,average,0.46662671868754135,
892
+ ≥4,15000,average_rank,2.9,
893
+ ≥4,15000,chartqa_relaxed_overall,0.6044,0.009781540134915584
894
+ ≥4,15000,docvqa_val_anls,0.6622581274446402,0.005962189435141322
895
+ ≥4,15000,infovqa_val_anls,0.2534140745372918,0.006885986461871116
896
+ ≥4,15000,mme_total_score,1200.2537014805923,
897
+ ≥4,15000,mmmu_val_mmmu_acc,0.29,
898
+ ≥4,15000,mmstar_average,0.3198168607331238,
899
+ ≥4,15000,ocrbench_ocrbench_accuracy,0.538,
900
+ ≥4,15000,seedbench_seed_all,0.5381878821567537,
901
+ ≥4,15000,textvqa_val_exact_match,0.54732,0.006746470669416614
902
+ ≥4,16000,ai2d_exact_match,0.43911917098445596,0.008932194723472647
903
+ ≥4,16000,average,0.46812785270927354,
904
+ ≥4,16000,average_rank,3.0,
905
+ ≥4,16000,chartqa_relaxed_overall,0.6108,0.00975332737879659
906
+ ≥4,16000,docvqa_val_anls,0.6699643513666329,0.005944732124585459
907
+ ≥4,16000,infovqa_val_anls,0.2589072217280723,0.006864729360775582
908
+ ≥4,16000,mme_total_score,1239.2577030812326,
909
+ ≥4,16000,mmmu_val_mmmu_acc,0.28444,
910
+ ≥4,16000,mmstar_average,0.3155913699930164,
911
+ ≥4,16000,ocrbench_ocrbench_accuracy,0.547,
912
+ ≥4,16000,seedbench_seed_all,0.535408560311284,
913
+ ≥4,16000,textvqa_val_exact_match,0.55192,0.006727935474062503
914
+ ≥4,17000,ai2d_exact_match,0.44591968911917096,0.00894635996642554
915
+ ≥4,17000,average,0.4711902865454063,
916
+ ≥4,17000,average_rank,2.9,
917
+ ≥4,17000,chartqa_relaxed_overall,0.6092,0.009760545645634788
918
+ ≥4,17000,docvqa_val_anls,0.6630256300679175,0.005926991608870499
919
+ ≥4,17000,infovqa_val_anls,0.2604941623528308,0.0069459226352746855
920
+ ≥4,17000,mme_total_score,1231.9475790316128,
921
+ ≥4,17000,mmmu_val_mmmu_acc,0.28667,
922
+ ≥4,17000,mmstar_average,0.3281863880858027,
923
+ ≥4,17000,ocrbench_ocrbench_accuracy,0.559,
924
+ ≥4,17000,seedbench_seed_all,0.5380767092829349,
925
+ ≥4,17000,textvqa_val_exact_match,0.55014,0.00673464677421427
926
+ ≥4,18000,ai2d_exact_match,0.44527202072538863,0.008945084019331404
927
+ ≥4,18000,average,0.4730863890198541,
928
+ ≥4,18000,average_rank,3.1,
929
+ ≥4,18000,chartqa_relaxed_overall,0.6148,0.00973479791861169
930
+ ≥4,18000,docvqa_val_anls,0.6724670614264582,0.0059283840951577715
931
+ ≥4,18000,infovqa_val_anls,0.2591524677671406,0.006860568910244235
932
+ ≥4,18000,mme_total_score,1230.187074829932,
933
+ ≥4,18000,mmmu_val_mmmu_acc,0.28222,
934
+ ≥4,18000,mmstar_average,0.3313130996754855,
935
+ ≥4,18000,ocrbench_ocrbench_accuracy,0.559,
936
+ ≥4,18000,seedbench_seed_all,0.5391328515842134,
937
+ ≥4,18000,textvqa_val_exact_match,0.55442,0.0067378017419973775
938
+ ≥4,19000,ai2d_exact_match,0.4475388601036269,0.008949482610884277
939
+ ≥4,19000,average,0.4748981492546839,
940
+ ≥4,19000,average_rank,3.0,
941
+ ≥4,19000,chartqa_relaxed_overall,0.614,0.009738559226822298
942
+ ≥4,19000,docvqa_val_anls,0.6780110114952748,0.005954038851856335
943
+ ≥4,19000,infovqa_val_anls,0.2592553130284412,0.0068947091925615645
944
+ ≥4,19000,mme_total_score,1280.6934773909566,
945
+ ≥4,19000,mmmu_val_mmmu_acc,0.29778,
946
+ ≥4,19000,mmstar_average,0.33015385627459537,
947
+ ≥4,19000,ocrbench_ocrbench_accuracy,0.55,
948
+ ≥4,19000,seedbench_seed_all,0.5397443023902168,
949
+ ≥4,19000,textvqa_val_exact_match,0.5576000000000001,0.00671993150976252
950
+ ≥4,20000,ai2d_exact_match,0.45077720207253885,0.008955440137395838
951
+ ≥4,20000,average,0.47750231781976243,
952
+ ≥4,20000,average_rank,2.7,
953
+ ≥4,20000,chartqa_relaxed_overall,0.6204,0.009707689307588963
954
+ ≥4,20000,docvqa_val_anls,0.673153386104693,0.005945073379634221
955
+ ≥4,20000,infovqa_val_anls,0.2604945747241511,0.006917780880359967
956
+ ≥4,20000,mme_total_score,1348.6498599439776,
957
+ ≥4,20000,mmmu_val_mmmu_acc,0.29222,
958
+ ≥4,20000,mmstar_average,0.32618642565880246,
959
+ ≥4,20000,ocrbench_ocrbench_accuracy,0.569,
960
+ ≥4,20000,seedbench_seed_all,0.5406892718176765,
961
+ ≥4,20000,textvqa_val_exact_match,0.5646,0.006722023885782034
962
+ ≥5,1000,ai2d_exact_match,0.27396373056994816,0.008027076080717028
963
+ ≥5,1000,average,0.26438200891802877,
964
+ ≥5,1000,average_rank,3.0,
965
+ ≥5,1000,chartqa_relaxed_overall,0.2832,0.00901285729603301
966
+ ≥5,1000,docvqa_val_anls,0.32055326545515606,0.0056245129740867565
967
+ ≥5,1000,infovqa_val_anls,0.15327397474830004,0.005916826112508726
968
+ ≥5,1000,mme_total_score,1087.6624649859943,
969
+ ≥5,1000,mmmu_val_mmmu_acc,0.29778,
970
+ ≥5,1000,mmstar_average,0.26060215117868224,
971
+ ≥5,1000,ocrbench_ocrbench_accuracy,0.259,
972
+ ≥5,1000,seedbench_seed_all,0.2649249583101723,
973
+ ≥5,1000,textvqa_val_exact_match,0.26614,0.006037548383085275
974
+ ≥5,2000,ai2d_exact_match,0.26392487046632124,0.007932917099101329
975
+ ≥5,2000,average,0.2929576826335877,
976
+ ≥5,2000,average_rank,3.3,
977
+ ≥5,2000,chartqa_relaxed_overall,0.3824,0.009721414421746647
978
+ ≥5,2000,docvqa_val_anls,0.3929824030686217,0.005977850940256623
979
+ ≥5,2000,infovqa_val_anls,0.15895135963621443,0.005878593482981634
980
+ ≥5,2000,mme_total_score,1073.4139655862346,
981
+ ≥5,2000,mmmu_val_mmmu_acc,0.27333,
982
+ ≥5,2000,mmstar_average,0.25335477956948643,
983
+ ≥5,2000,ocrbench_ocrbench_accuracy,0.301,
984
+ ≥5,2000,seedbench_seed_all,0.26831573096164535,
985
+ ≥5,2000,textvqa_val_exact_match,0.34236,0.006479253215027554
986
+ ≥5,3000,ai2d_exact_match,0.2600388601036269,0.007895056974601723
987
+ ≥5,3000,average,0.3126381365493242,
988
+ ≥5,3000,average_rank,3.9,
989
+ ≥5,3000,chartqa_relaxed_overall,0.4324,0.009910165515884228
990
+ ≥5,3000,docvqa_val_anls,0.4366357263318607,0.00610598785442012
991
+ ≥5,3000,infovqa_val_anls,0.17846201123654198,0.006273305639193489
992
+ ≥5,3000,mme_total_score,1164.5565226090434,
993
+ ≥5,3000,mmmu_val_mmmu_acc,0.28778,
994
+ ≥5,3000,mmstar_average,0.25130117268378377,
995
+ ≥5,3000,ocrbench_ocrbench_accuracy,0.344,
996
+ ≥5,3000,seedbench_seed_all,0.2858254585881045,
997
+ ≥5,3000,textvqa_val_exact_match,0.3373,0.006457064405451384
998
+ ≥5,4000,ai2d_exact_match,0.25647668393782386,0.007859644922870104
999
+ ≥5,4000,average,0.3300809923443584,
1000
+ ≥5,4000,average_rank,4.3,
1001
+ ≥5,4000,chartqa_relaxed_overall,0.4428,0.009936335154498413
1002
+ ≥5,4000,docvqa_val_anls,0.4736486989184438,0.006240863735639683
1003
+ ≥5,4000,infovqa_val_anls,0.19267658764675277,0.006512420811238904
1004
+ ≥5,4000,mme_total_score,1218.2668067226891,
1005
+ ≥5,4000,mmmu_val_mmmu_acc,0.26889,
1006
+ ≥5,4000,mmstar_average,0.22297093502644408,
1007
+ ≥5,4000,ocrbench_ocrbench_accuracy,0.379,
1008
+ ≥5,4000,seedbench_seed_all,0.322846025569761,
1009
+ ≥5,4000,textvqa_val_exact_match,0.41142,0.006712445761838313
1010
+ ≥5,5000,ai2d_exact_match,0.25161917098445596,0.007810248924722509
1011
+ ≥5,5000,average,0.3420574749713038,
1012
+ ≥5,5000,average_rank,4.2,
1013
+ ≥5,5000,chartqa_relaxed_overall,0.4488,0.009949423119365426
1014
+ ≥5,5000,docvqa_val_anls,0.4973120888104521,0.00627054301371889
1015
+ ≥5,5000,infovqa_val_anls,0.20687122924296383,0.006767419172429617
1016
+ ≥5,5000,mme_total_score,1285.299119647859,
1017
+ ≥5,5000,mmmu_val_mmmu_acc,0.26778,
1018
+ ≥5,5000,mmstar_average,0.24681232878335083,
1019
+ ≥5,5000,ocrbench_ocrbench_accuracy,0.392,
1020
+ ≥5,5000,seedbench_seed_all,0.3604224569205114,
1021
+ ≥5,5000,textvqa_val_exact_match,0.4069,0.00670861230775927
1022
+ ≥5,6000,ai2d_exact_match,0.2704015544041451,0.00799425923314582
1023
+ ≥5,6000,average,0.35916516291601697,
1024
+ ≥5,6000,average_rank,4.4,
1025
+ ≥5,6000,chartqa_relaxed_overall,0.4844,0.009997131241172205
1026
+ ≥5,6000,docvqa_val_anls,0.5108154498847224,0.0062636540505031655
1027
+ ≥5,6000,infovqa_val_anls,0.20262763630072025,0.0066138397079363274
1028
+ ≥5,6000,mme_total_score,1273.862545018007,
1029
+ ≥5,6000,mmmu_val_mmmu_acc,0.27444,
1030
+ ≥5,6000,mmstar_average,0.2588150329919745,
1031
+ ≥5,6000,ocrbench_ocrbench_accuracy,0.403,
1032
+ ≥5,6000,seedbench_seed_all,0.4082267926625903,
1033
+ ≥5,6000,textvqa_val_exact_match,0.41976,0.006731520716318925
1034
+ ≥5,7000,ai2d_exact_match,0.31994818652849744,0.008395421656067303
1035
+ ≥5,7000,average,0.3723337802797541,
1036
+ ≥5,7000,average_rank,4.5,
1037
+ ≥5,7000,chartqa_relaxed_overall,0.476,0.009990471651004463
1038
+ ≥5,7000,docvqa_val_anls,0.5291779466505276,0.006267960743408816
1039
+ ≥5,7000,infovqa_val_anls,0.20957812087727798,0.006721757004150909
1040
+ ≥5,7000,mme_total_score,1327.2439975990396,
1041
+ ≥5,7000,mmmu_val_mmmu_acc,0.27222,
1042
+ ≥5,7000,mmstar_average,0.29354698358099446,
1043
+ ≥5,7000,ocrbench_ocrbench_accuracy,0.403,
1044
+ ≥5,7000,seedbench_seed_all,0.42301278488048916,
1045
+ ≥5,7000,textvqa_val_exact_match,0.42452,0.006734688198055274
1046
+ ≥5,8000,ai2d_exact_match,0.30958549222797926,0.008321027166750249
1047
+ ≥5,8000,average,0.37926040717793597,
1048
+ ≥5,8000,average_rank,4.5,
1049
+ ≥5,8000,chartqa_relaxed_overall,0.5136,0.009998299975543861
1050
+ ≥5,8000,docvqa_val_anls,0.5386485557171258,0.006250093887872433
1051
+ ≥5,8000,infovqa_val_anls,0.21347817313272946,0.006767638253739939
1052
+ ≥5,8000,mme_total_score,1351.172769107643,
1053
+ ≥5,8000,mmmu_val_mmmu_acc,0.27667,
1054
+ ≥5,8000,mmstar_average,0.27077818615838667,
1055
+ ≥5,8000,ocrbench_ocrbench_accuracy,0.406,
1056
+ ≥5,8000,seedbench_seed_all,0.4538632573652029,
1057
+ ≥5,8000,textvqa_val_exact_match,0.43072,0.00674498000523754
1058
+ ≥5,9000,ai2d_exact_match,0.32642487046632124,0.00843949241376102
1059
+ ≥5,9000,average,0.3915431470529602,
1060
+ ≥5,9000,average_rank,4.4,
1061
+ ≥5,9000,chartqa_relaxed_overall,0.5196,0.009994312908659929
1062
+ ≥5,9000,docvqa_val_anls,0.5447526718541965,0.006277223186340111
1063
+ ≥5,9000,infovqa_val_anls,0.22534586447558344,0.006943394394173722
1064
+ ≥5,9000,mme_total_score,1380.0509203681472,
1065
+ ≥5,9000,mmmu_val_mmmu_acc,0.28222,
1066
+ ≥5,9000,mmstar_average,0.2981132324115024,
1067
+ ≥5,9000,ocrbench_ocrbench_accuracy,0.42,
1068
+ ≥5,9000,seedbench_seed_all,0.45703168426903834,
1069
+ ≥5,9000,textvqa_val_exact_match,0.4504,0.0067806462400486975
1070
+ ≥5,10000,ai2d_exact_match,0.3121761658031088,0.008340079044408505
1071
+ ≥5,10000,average,0.3945344056050298,
1072
+ ≥5,10000,average_rank,4.4,
1073
+ ≥5,10000,chartqa_relaxed_overall,0.524,0.009990471651004463
1074
+ ≥5,10000,docvqa_val_anls,0.5476477162524015,0.006282119242898783
1075
+ ≥5,10000,infovqa_val_anls,0.2268357982996008,0.007080273138697436
1076
+ ≥5,10000,mme_total_score,1385.6108443377352,
1077
+ ≥5,10000,mmmu_val_mmmu_acc,0.29222,
1078
+ ≥5,10000,mmstar_average,0.29882846925636025,
1079
+ ≥5,10000,ocrbench_ocrbench_accuracy,0.43,
1080
+ ≥5,10000,seedbench_seed_all,0.4627015008337966,
1081
+ ≥5,10000,textvqa_val_exact_match,0.4564,0.006792248149691337
1082
+ ≥5,11000,ai2d_exact_match,0.3403497409326425,0.008528080007639036
1083
+ ≥5,11000,average,0.40311924614292627,
1084
+ ≥5,11000,average_rank,4.2,
1085
+ ≥5,11000,chartqa_relaxed_overall,0.5404,0.009969297405349211
1086
+ ≥5,11000,docvqa_val_anls,0.5698821874786791,0.006251823346664307
1087
+ ≥5,11000,infovqa_val_anls,0.22660700332356035,0.006919487246988994
1088
+ ≥5,11000,mme_total_score,1358.4087635054022,
1089
+ ≥5,11000,mmmu_val_mmmu_acc,0.28778,
1090
+ ≥5,11000,mmstar_average,0.28965012568597354,
1091
+ ≥5,11000,ocrbench_ocrbench_accuracy,0.436,
1092
+ ≥5,11000,seedbench_seed_all,0.4714841578654808,
1093
+ ≥5,11000,textvqa_val_exact_match,0.46592,0.006784225516827446
1094
+ ≥5,12000,ai2d_exact_match,0.342940414507772,0.008543648986216495
1095
+ ≥5,12000,average,0.4128131018529697,
1096
+ ≥5,12000,average_rank,4.3,
1097
+ ≥5,12000,chartqa_relaxed_overall,0.5548,0.009941746291659784
1098
+ ≥5,12000,docvqa_val_anls,0.578981486161722,0.00625708617478689
1099
+ ≥5,12000,infovqa_val_anls,0.2380032381589791,0.007080943870134072
1100
+ ≥5,12000,mme_total_score,1390.3039215686274,
1101
+ ≥5,12000,mmmu_val_mmmu_acc,0.28222,
1102
+ ≥5,12000,mmstar_average,0.3060607822951693,
1103
+ ≥5,12000,ocrbench_ocrbench_accuracy,0.472,
1104
+ ≥5,12000,seedbench_seed_all,0.46559199555308506,
1105
+ ≥5,12000,textvqa_val_exact_match,0.47472,0.006773519058221244
1106
+ ≥5,13000,ai2d_exact_match,0.33678756476683935,0.008506208807020252
1107
+ ≥5,13000,average,0.41416266738683244,
1108
+ ≥5,13000,average_rank,4.5,
1109
+ ≥5,13000,chartqa_relaxed_overall,0.5564,0.009938164963872337
1110
+ ≥5,13000,docvqa_val_anls,0.5882749499950303,0.0062089530468064
1111
+ ≥5,13000,infovqa_val_anls,0.2250291831460855,0.007008754051627638
1112
+ ≥5,13000,mme_total_score,1463.7286914765905,
1113
+ ≥5,13000,mmmu_val_mmmu_acc,0.28222,
1114
+ ≥5,13000,mmstar_average,0.32070873992428756,
1115
+ ≥5,13000,ocrbench_ocrbench_accuracy,0.475,
1116
+ ≥5,13000,seedbench_seed_all,0.4624235686492496,
1117
+ ≥5,13000,textvqa_val_exact_match,0.48062,0.006792356759039414
1118
+ ≥5,14000,ai2d_exact_match,0.35103626943005184,0.008590489143063932
1119
+ ≥5,14000,average,0.4197541703337554,
1120
+ ≥5,14000,average_rank,4.4,
1121
+ ≥5,14000,chartqa_relaxed_overall,0.5644,0.00991868984106597
1122
+ ≥5,14000,docvqa_val_anls,0.5968397354218249,0.006216108072191749
1123
+ ≥5,14000,infovqa_val_anls,0.23493831065135024,0.00713715521281919
1124
+ ≥5,14000,mme_total_score,1381.4046618647458,
1125
+ ≥5,14000,mmmu_val_mmmu_acc,0.28778,
1126
+ ≥5,14000,mmstar_average,0.3178569306745569,
1127
+ ≥5,14000,ocrbench_ocrbench_accuracy,0.465,
1128
+ ≥5,14000,seedbench_seed_all,0.46931628682601445,
1129
+ ≥5,14000,textvqa_val_exact_match,0.49062,0.0067877549928948315
1130
+ ≥5,15000,ai2d_exact_match,0.3448834196891192,0.008555140353607656
1131
+ ≥5,15000,average,0.4156222682362929,
1132
+ ≥5,15000,average_rank,4.5,
1133
+ ≥5,15000,chartqa_relaxed_overall,0.5544,0.009942625323290008
1134
+ ≥5,15000,docvqa_val_anls,0.5981327465682499,0.0062119027314077434
1135
+ ≥5,15000,infovqa_val_anls,0.2430496253387209,0.007241165402150032
1136
+ ≥5,15000,mme_total_score,1405.2406962785115,
1137
+ ≥5,15000,mmmu_val_mmmu_acc,0.27667,
1138
+ ≥5,15000,mmstar_average,0.30769549523760537,
1139
+ ≥5,15000,ocrbench_ocrbench_accuracy,0.462,
1140
+ ≥5,15000,seedbench_seed_all,0.4724291272929405,
1141
+ ≥5,15000,textvqa_val_exact_match,0.48134,0.006785688616050607
1142
+ ≥5,16000,ai2d_exact_match,0.3555699481865285,0.008615532040064747
1143
+ ≥5,16000,average,0.41928937760980056,
1144
+ ≥5,16000,average_rank,4.6,
1145
+ ≥5,16000,chartqa_relaxed_overall,0.556,0.00993907007952043
1146
+ ≥5,16000,docvqa_val_anls,0.5950015990375694,0.006217949166028718
1147
+ ≥5,16000,infovqa_val_anls,0.2429016453664355,0.007192121794741783
1148
+ ≥5,16000,mme_total_score,1444.7096838735495,
1149
+ ≥5,16000,mmmu_val_mmmu_acc,0.27222,
1150
+ ≥5,16000,mmstar_average,0.29597997743741594,
1151
+ ≥5,16000,ocrbench_ocrbench_accuracy,0.484,
1152
+ ≥5,16000,seedbench_seed_all,0.4802112284602557,
1153
+ ≥5,16000,textvqa_val_exact_match,0.49172000000000005,0.006790781344017229
1154
+ ≥5,17000,ai2d_exact_match,0.35783678756476683,0.008627736835305362
1155
+ ≥5,17000,average,0.4243671877798907,
1156
+ ≥5,17000,average_rank,4.2,
1157
+ ≥5,17000,chartqa_relaxed_overall,0.566,0.00991448025705367
1158
+ ≥5,17000,docvqa_val_anls,0.6011648636453683,0.006202633675401635
1159
+ ≥5,17000,infovqa_val_anls,0.24233190899997978,0.007185211142139982
1160
+ ≥5,17000,mme_total_score,1383.0262104841936,
1161
+ ≥5,17000,mmmu_val_mmmu_acc,0.29778,
1162
+ ≥5,17000,mmstar_average,0.30487588800790116,
1163
+ ≥5,17000,ocrbench_ocrbench_accuracy,0.48,
1164
+ ≥5,17000,seedbench_seed_all,0.48343524180100056,
1165
+ ≥5,17000,textvqa_val_exact_match,0.48588000000000003,0.006793096079908642
1166
+ ≥5,18000,ai2d_exact_match,0.3484455958549223,0.008575797499263314
1167
+ ≥5,18000,average,0.4229679292209723,
1168
+ ≥5,18000,average_rank,4.4,
1169
+ ≥5,18000,chartqa_relaxed_overall,0.5564,0.009938164963872337
1170
+ ≥5,18000,docvqa_val_anls,0.6015112951191799,0.006202182626672507
1171
+ ≥5,18000,infovqa_val_anls,0.2406225801562843,0.007159684093951319
1172
+ ≥5,18000,mme_total_score,1388.7428971588636,
1173
+ ≥5,18000,mmmu_val_mmmu_acc,0.29444,
1174
+ ≥5,18000,mmstar_average,0.3048242431646457,
1175
+ ≥5,18000,ocrbench_ocrbench_accuracy,0.489,
1176
+ ≥5,18000,seedbench_seed_all,0.48176764869371874,
1177
+ ≥5,18000,textvqa_val_exact_match,0.4897,0.006784304485905058
1178
+ ≥5,19000,ai2d_exact_match,0.3552461139896373,0.00861377131101951
1179
+ ≥5,19000,average,0.4271521214095191,
1180
+ ≥5,19000,average_rank,4.4,
1181
+ ≥5,19000,chartqa_relaxed_overall,0.564,0.009919725822025206
1182
+ ≥5,19000,docvqa_val_anls,0.6030864459750552,0.006186369284106836
1183
+ ≥5,19000,infovqa_val_anls,0.24933668460761893,0.007278561320407618
1184
+ ≥5,19000,mme_total_score,1420.4112645058024,
1185
+ ≥5,19000,mmmu_val_mmmu_acc,0.28889,
1186
+ ≥5,19000,mmstar_average,0.3146239893029105,
1187
+ ≥5,19000,ocrbench_ocrbench_accuracy,0.495,
1188
+ ≥5,19000,seedbench_seed_all,0.48254585881045026,
1189
+ ≥5,19000,textvqa_val_exact_match,0.49163999999999997,0.006786164784802775
1190
+ ≥5,20000,ai2d_exact_match,0.3630181347150259,0.008654846701304475
1191
+ ≥5,20000,average,0.4295614849660391,
1192
+ ≥5,20000,average_rank,4.6,
1193
+ ≥5,20000,chartqa_relaxed_overall,0.566,0.00991448025705367
1194
+ ≥5,20000,docvqa_val_anls,0.6104819342838989,0.006177273769345363
1195
+ ≥5,20000,infovqa_val_anls,0.24655527159214874,0.007266841528312276
1196
+ ≥5,20000,mme_total_score,1413.0101040416166,
1197
+ ≥5,20000,mmmu_val_mmmu_acc,0.28444,
1198
+ ≥5,20000,mmstar_average,0.32507494461467334,
1199
+ ≥5,20000,ocrbench_ocrbench_accuracy,0.499,
1200
+ ≥5,20000,seedbench_seed_all,0.4775430794886048,
1201
+ ≥5,20000,textvqa_val_exact_match,0.4939399999999999,0.006784796384004054
app/src/content/assets/data/image_correspondence_filters.csv ADDED
@@ -0,0 +1,1177 @@
1
+ run,step,metric,value,stderr
2
+ Baseline,1000,ai2d_exact_match,0.2548575129533679,0.007843322436924496
3
+ Baseline,1000,average,0.27120689295763617,
4
+ Baseline,1000,average_rank,3.3,
5
+ Baseline,1000,chartqa_relaxed_overall,0.3308,0.009411906161401973
6
+ Baseline,1000,docvqa_val_anls,0.3528553494243383,0.005852289239342309
7
+ Baseline,1000,infovqa_val_anls,0.17320578642581314,0.006297063452679795
8
+ Baseline,1000,mme_total_score,977.4280712284914,
9
+ Baseline,1000,mmmu_val_mmmu_acc,0.25222,
10
+ Baseline,1000,mmstar_average,0.23215874078908072,
11
+ Baseline,1000,ocrbench_ocrbench_accuracy,0.286,
12
+ Baseline,1000,seedbench_seed_all,0.2563646470261256,
13
+ Baseline,1000,textvqa_val_exact_match,0.3024,0.00628900296642181
14
+ Baseline,2000,ai2d_exact_match,0.26295336787564766,0.007923526907377255
15
+ Baseline,2000,average,0.3202068275596269,
16
+ Baseline,2000,average_rank,3.1,
17
+ Baseline,2000,chartqa_relaxed_overall,0.4688,0.009982508912777261
18
+ Baseline,2000,docvqa_val_anls,0.4452261510942785,0.00614755494712251
19
+ Baseline,2000,infovqa_val_anls,0.1820547866557169,0.006217861455795791
20
+ Baseline,2000,mme_total_score,1049.3036214485794,
21
+ Baseline,2000,mmmu_val_mmmu_acc,0.24556,
22
+ Baseline,2000,mmstar_average,0.21305462434540698,
23
+ Baseline,2000,ocrbench_ocrbench_accuracy,0.395,
24
+ Baseline,2000,seedbench_seed_all,0.258532518065592,
25
+ Baseline,2000,textvqa_val_exact_match,0.41068000000000005,0.006697862330024289
26
+ Baseline,3000,ai2d_exact_match,0.25226683937823835,0.007816909588794397
27
+ Baseline,3000,average,0.3507423834414229,
28
+ Baseline,3000,average_rank,2.6,
29
+ Baseline,3000,chartqa_relaxed_overall,0.5028,0.010001843767601082
30
+ Baseline,3000,docvqa_val_anls,0.502653993831009,0.006267072346683124
31
+ Baseline,3000,infovqa_val_anls,0.21728617578189535,0.006796941784959762
32
+ Baseline,3000,mme_total_score,1170.2383953581434,
33
+ Baseline,3000,mmmu_val_mmmu_acc,0.27556,
34
+ Baseline,3000,mmstar_average,0.25432376938577683,
35
+ Baseline,3000,ocrbench_ocrbench_accuracy,0.436,
36
+ Baseline,3000,seedbench_seed_all,0.2792106725958866,
37
+ Baseline,3000,textvqa_val_exact_match,0.43658,0.006766885462882726
38
+ Baseline,4000,ai2d_exact_match,0.2645725388601036,0.007939149662089447
39
+ Baseline,4000,average,0.36961781722974835,
40
+ Baseline,4000,average_rank,3.2,
41
+ Baseline,4000,chartqa_relaxed_overall,0.5312,0.009982508912777261
42
+ Baseline,4000,docvqa_val_anls,0.5374434618615119,0.0062905728113059655
43
+ Baseline,4000,infovqa_val_anls,0.2287924838861707,0.006994568698639919
44
+ Baseline,4000,mme_total_score,1155.203781512605,
45
+ Baseline,4000,mmmu_val_mmmu_acc,0.25556,
46
+ Baseline,4000,mmstar_average,0.2575590188757354,
47
+ Baseline,4000,ocrbench_ocrbench_accuracy,0.453,
48
+ Baseline,4000,seedbench_seed_all,0.33913285158421347,
49
+ Baseline,4000,textvqa_val_exact_match,0.4593,0.006791695475025738
50
+ Baseline,5000,ai2d_exact_match,0.3125,0.008342439145556371
51
+ Baseline,5000,average,0.3974627910380972,
52
+ Baseline,5000,average_rank,3.1,
53
+ Baseline,5000,chartqa_relaxed_overall,0.5488,0.00995424828018316
54
+ Baseline,5000,docvqa_val_anls,0.552360266782429,0.006300308519952055
55
+ Baseline,5000,infovqa_val_anls,0.23425555286643698,0.007002254622066442
56
+ Baseline,5000,mme_total_score,1181.4653861544618,
57
+ Baseline,5000,mmmu_val_mmmu_acc,0.26667,
58
+ Baseline,5000,mmstar_average,0.29596648146165705,
59
+ Baseline,5000,ocrbench_ocrbench_accuracy,0.462,
60
+ Baseline,5000,seedbench_seed_all,0.43107281823235133,
61
+ Baseline,5000,textvqa_val_exact_match,0.47354000000000007,0.0068172185364497985
62
+ Baseline,6000,ai2d_exact_match,0.358160621761658,0.008629463221867162
63
+ Baseline,6000,average,0.4161227404571003,
64
+ Baseline,6000,average_rank,2.9,
65
+ Baseline,6000,chartqa_relaxed_overall,0.5628,0.00992279440175477
66
+ Baseline,6000,docvqa_val_anls,0.5747451497228876,0.00625495440870239
67
+ Baseline,6000,infovqa_val_anls,0.22152017368968838,0.006604546680525351
68
+ Baseline,6000,mme_total_score,1284.1648659463785,
69
+ Baseline,6000,mmmu_val_mmmu_acc,0.27111,
70
+ Baseline,6000,mmstar_average,0.2978489412854164,
71
+ Baseline,6000,ocrbench_ocrbench_accuracy,0.495,
72
+ Baseline,6000,seedbench_seed_all,0.4795997776542524,
73
+ Baseline,6000,textvqa_val_exact_match,0.48432,0.006800535050670284
74
+ Baseline,7000,ai2d_exact_match,0.3707901554404145,0.00869347755587734
75
+ Baseline,7000,average,0.4291083177345374,
76
+ Baseline,7000,average_rank,2.4,
77
+ Baseline,7000,chartqa_relaxed_overall,0.5656,0.009915542506251351
78
+ Baseline,7000,docvqa_val_anls,0.5940907049431567,0.006224236305767187
79
+ Baseline,7000,infovqa_val_anls,0.2515675215816963,0.007105097396092786
80
+ Baseline,7000,mme_total_score,1185.875650260104,
81
+ Baseline,7000,mmmu_val_mmmu_acc,0.26556,
82
+ Baseline,7000,mmstar_average,0.31372400960777047,
83
+ Baseline,7000,ocrbench_ocrbench_accuracy,0.504,
84
+ Baseline,7000,seedbench_seed_all,0.4964424680377988,
85
+ Baseline,7000,textvqa_val_exact_match,0.5002,0.006794794025220267
86
+ Baseline,8000,ai2d_exact_match,0.37759067357512954,0.008725299846043883
87
+ Baseline,8000,average,0.43846759477995995,
88
+ Baseline,8000,average_rank,2.4,
89
+ Baseline,8000,chartqa_relaxed_overall,0.5832,0.009862556058385773
90
+ Baseline,8000,docvqa_val_anls,0.6017336419437208,0.006231612198089698
91
+ Baseline,8000,infovqa_val_anls,0.2449256624147254,0.006992518502948913
92
+ Baseline,8000,mme_total_score,1199.2409963985594,
93
+ Baseline,8000,mmmu_val_mmmu_acc,0.28111,
94
+ Baseline,8000,mmstar_average,0.33512257186205047,
95
+ Baseline,8000,ocrbench_ocrbench_accuracy,0.51,
96
+ Baseline,8000,seedbench_seed_all,0.5024458032240133,
97
+ Baseline,8000,textvqa_val_exact_match,0.51008,0.006796301690135059
98
+ Baseline,9000,ai2d_exact_match,0.4067357512953368,0.008841214921078996
99
+ Baseline,9000,average,0.4422510732201056,
100
+ Baseline,9000,average_rank,2.5,
101
+ Baseline,9000,chartqa_relaxed_overall,0.5912,0.009834211136815875
102
+ Baseline,9000,docvqa_val_anls,0.6170968481662739,0.00617235763542544
103
+ Baseline,9000,infovqa_val_anls,0.23537031288570615,0.00670318154156447
104
+ Baseline,9000,mme_total_score,1231.5195078031213,
105
+ Baseline,9000,mmmu_val_mmmu_acc,0.25889,
106
+ Baseline,9000,mmstar_average,0.3216444898242951,
107
+ Baseline,9000,ocrbench_ocrbench_accuracy,0.515,
108
+ Baseline,9000,seedbench_seed_all,0.5120622568093385,
109
+ Baseline,9000,textvqa_val_exact_match,0.52226,0.006792711289708482
110
+ Baseline,10000,ai2d_exact_match,0.39993523316062174,0.008817096257082848
111
+ Baseline,10000,average,0.4523875703250908,
112
+ Baseline,10000,average_rank,2.3,
113
+ Baseline,10000,chartqa_relaxed_overall,0.5996,0.00980154906867574
114
+ Baseline,10000,docvqa_val_anls,0.6262613496433054,0.006147756371688175
115
+ Baseline,10000,infovqa_val_anls,0.263290074230132,0.007186788766942786
116
+ Baseline,10000,mme_total_score,1240.8218287314926,
117
+ Baseline,10000,mmmu_val_mmmu_acc,0.28778,
118
+ Baseline,10000,mmstar_average,0.32972717906018517,
119
+ Baseline,10000,ocrbench_ocrbench_accuracy,0.517,
120
+ Baseline,10000,seedbench_seed_all,0.5217342968315731,
121
+ Baseline,10000,textvqa_val_exact_match,0.5261600000000001,0.006785774843600811
122
+ Baseline,11000,ai2d_exact_match,0.422279792746114,0.008889771831066474
123
+ Baseline,11000,average,0.4561398159525099,
124
+ Baseline,11000,average_rank,2.6,
125
+ Baseline,11000,chartqa_relaxed_overall,0.6104,0.009755142291143075
126
+ Baseline,11000,docvqa_val_anls,0.6373130149166712,0.006128022584995044
127
+ Baseline,11000,infovqa_val_anls,0.24419378339723755,0.006897644885887063
128
+ Baseline,11000,mme_total_score,1322.9488795518205,
129
+ Baseline,11000,mmmu_val_mmmu_acc,0.27778,
130
+ Baseline,11000,mmstar_average,0.3298563439522548,
131
+ Baseline,11000,ocrbench_ocrbench_accuracy,0.521,
132
+ Baseline,11000,seedbench_seed_all,0.5237354085603113,
133
+ Baseline,11000,textvqa_val_exact_match,0.5387,0.006770851562852138
134
+ Baseline,12000,ai2d_exact_match,0.42001295336787564,0.008883255931688034
135
+ Baseline,12000,average,0.4582751140055433,
136
+ Baseline,12000,average_rank,2.7,
137
+ Baseline,12000,chartqa_relaxed_overall,0.618,0.009719474639861454
138
+ Baseline,12000,docvqa_val_anls,0.6393961983751871,0.0061228747388476674
139
+ Baseline,12000,infovqa_val_anls,0.24798874058574302,0.006855374548993139
140
+ Baseline,12000,mme_total_score,1225.6453581432572,
141
+ Baseline,12000,mmmu_val_mmmu_acc,0.27889,
142
+ Baseline,12000,mmstar_average,0.34010867846816534,
143
+ Baseline,12000,ocrbench_ocrbench_accuracy,0.512,
144
+ Baseline,12000,seedbench_seed_all,0.5350194552529183,
145
+ Baseline,12000,textvqa_val_exact_match,0.5330600000000001,0.006777713092109446
146
+ Baseline,13000,ai2d_exact_match,0.4375,0.008928571428571428
147
+ Baseline,13000,average,0.4692868662590049,
148
+ Baseline,13000,average_rank,2.6,
149
+ Baseline,13000,chartqa_relaxed_overall,0.6148,0.00973479791861169
150
+ Baseline,13000,docvqa_val_anls,0.6511374872549951,0.006086953065248391
151
+ Baseline,13000,infovqa_val_anls,0.24465055100441893,0.006808432538374664
152
+ Baseline,13000,mme_total_score,1281.7122849139657,
153
+ Baseline,13000,mmmu_val_mmmu_acc,0.28222,
154
+ Baseline,13000,mmstar_average,0.3453069542917521,
155
+ Baseline,13000,ocrbench_ocrbench_accuracy,0.549,
156
+ Baseline,13000,seedbench_seed_all,0.5442468037798777,
157
+ Baseline,13000,textvqa_val_exact_match,0.55472,0.0067416788982325
158
+ Baseline,14000,ai2d_exact_match,0.4572538860103627,0.00896620675297095
159
+ Baseline,14000,average,0.47352486841689195,
160
+ Baseline,14000,average_rank,2.5,
161
+ Baseline,14000,chartqa_relaxed_overall,0.6172,0.009723347231923635
162
+ Baseline,14000,docvqa_val_anls,0.6502269393708169,0.006057950730638126
163
+ Baseline,14000,infovqa_val_anls,0.25805460837190913,0.007037735231659539
164
+ Baseline,14000,mme_total_score,1309.1444577831132,
165
+ Baseline,14000,mmmu_val_mmmu_acc,0.28111,
166
+ Baseline,14000,mmstar_average,0.34575818188776586,
167
+ Baseline,14000,ocrbench_ocrbench_accuracy,0.551,
168
+ Baseline,14000,seedbench_seed_all,0.5483602001111729,
169
+ Baseline,14000,textvqa_val_exact_match,0.55276,0.006751206724612103
170
+ Baseline,15000,ai2d_exact_match,0.45045336787564766,0.008954861634252399
171
+ Baseline,15000,average,0.47878665012878824,
172
+ Baseline,15000,average_rank,2.1,
173
+ Baseline,15000,chartqa_relaxed_overall,0.612,0.009747841205275417
174
+ Baseline,15000,docvqa_val_anls,0.6621413031955148,0.006056838050222495
175
+ Baseline,15000,infovqa_val_anls,0.2706898598157733,0.007200315730154543
176
+ Baseline,15000,mme_total_score,1384.2171868747498,
177
+ Baseline,15000,mmmu_val_mmmu_acc,0.30222,
178
+ Baseline,15000,mmstar_average,0.35408135695920684,
179
+ Baseline,15000,ocrbench_ocrbench_accuracy,0.558,
180
+ Baseline,15000,seedbench_seed_all,0.5411339633129516,
181
+ Baseline,15000,textvqa_val_exact_match,0.5583600000000001,0.0067279027203879065
182
+ Baseline,16000,ai2d_exact_match,0.45077720207253885,0.008955440137395838
183
+ Baseline,16000,average,0.47665128022935843,
184
+ Baseline,16000,average_rank,2.3,
185
+ Baseline,16000,chartqa_relaxed_overall,0.632,0.00964715642305132
186
+ Baseline,16000,docvqa_val_anls,0.6709415729142987,0.005999818105621502
187
+ Baseline,16000,infovqa_val_anls,0.26050032542402035,0.006997451875879188
188
+ Baseline,16000,mme_total_score,1317.8491396558625,
189
+ Baseline,16000,mmmu_val_mmmu_acc,0.27556,
190
+ Baseline,16000,mmstar_average,0.33214333327093315,
191
+ Baseline,16000,ocrbench_ocrbench_accuracy,0.56,
192
+ Baseline,16000,seedbench_seed_all,0.5463590883824346,
193
+ Baseline,16000,textvqa_val_exact_match,0.56158,0.006723854754867398
194
+ Baseline,17000,ai2d_exact_match,0.45919689119170987,0.008969138793675545
195
+ Baseline,17000,average,0.4777141780162423,
196
+ Baseline,17000,average_rank,2.3,
197
+ Baseline,17000,chartqa_relaxed_overall,0.632,0.00964715642305132
198
+ Baseline,17000,docvqa_val_anls,0.6796338519136422,0.005948761388267941
199
+ Baseline,17000,infovqa_val_anls,0.28070956072505215,0.007298333094144192
200
+ Baseline,17000,mme_total_score,1381.9161664665867,
201
+ Baseline,17000,mmmu_val_mmmu_acc,0.27667,
202
+ Baseline,17000,mmstar_average,0.3370289492329521,
203
+ Baseline,17000,ocrbench_ocrbench_accuracy,0.519,
204
+ Baseline,17000,seedbench_seed_all,0.5510283490828238,
205
+ Baseline,17000,textvqa_val_exact_match,0.56416,0.006724830373229479
206
+ Baseline,18000,ai2d_exact_match,0.46567357512953367,0.008977921602780726
207
+ Baseline,18000,average,0.4819834595278701,
208
+ Baseline,18000,average_rank,2.5,
209
+ Baseline,18000,chartqa_relaxed_overall,0.6376,0.009615793331418735
210
+ Baseline,18000,docvqa_val_anls,0.6775884603912571,0.005972234236435759
211
+ Baseline,18000,infovqa_val_anls,0.27154318420389256,0.007164903131667027
212
+ Baseline,18000,mme_total_score,1336.922769107643,
213
+ Baseline,18000,mmmu_val_mmmu_acc,0.28667,
214
+ Baseline,18000,mmstar_average,0.34482796716566916,
215
+ Baseline,18000,ocrbench_ocrbench_accuracy,0.533,
216
+ Baseline,18000,seedbench_seed_all,0.5543079488604781,
217
+ Baseline,18000,textvqa_val_exact_match,0.5666399999999999,0.006713392287599574
218
+ Baseline,19000,ai2d_exact_match,0.4682642487046632,0.008981008686994101
219
+ Baseline,19000,average,0.4899006713916878,
+ Baseline,19000,average_rank,2.1,
+ Baseline,19000,chartqa_relaxed_overall,0.6444,0.009575809858898698
+ Baseline,19000,docvqa_val_anls,0.678226526479947,0.005970619221588814
+ Baseline,19000,infovqa_val_anls,0.26993847247278,0.0071348470764911525
+ Baseline,19000,mme_total_score,1406.6628651460583,
+ Baseline,19000,mmmu_val_mmmu_acc,0.28333,
+ Baseline,19000,mmstar_average,0.356220913822775,
+ Baseline,19000,ocrbench_ocrbench_accuracy,0.577,
+ Baseline,19000,seedbench_seed_all,0.554585881045025,
+ Baseline,19000,textvqa_val_exact_match,0.57714,0.0066918487914812905
+ Baseline,20000,ai2d_exact_match,0.47571243523316065,0.00898853090258662
+ Baseline,20000,average,0.4873169067639118,
+ Baseline,20000,average_rank,2.1,
+ Baseline,20000,chartqa_relaxed_overall,0.6336,0.009638338810708618
+ Baseline,20000,docvqa_val_anls,0.6895214454380043,0.005896462073053767
+ Baseline,20000,infovqa_val_anls,0.2655657550458317,0.007033265532032538
+ Baseline,20000,mme_total_score,1324.6738695478193,
+ Baseline,20000,mmmu_val_mmmu_acc,0.30111,
+ Baseline,20000,mmstar_average,0.33806766134497995,
+ Baseline,20000,ocrbench_ocrbench_accuracy,0.555,
+ Baseline,20000,seedbench_seed_all,0.5587548638132296,
+ Baseline,20000,textvqa_val_exact_match,0.56852,0.006720151338087659
+ ≥2,1000,ai2d_exact_match,0.25647668393782386,0.007859644922870102
+ ≥2,1000,average,0.27425088839708317,
+ ≥2,1000,average_rank,3.6,
+ ≥2,1000,chartqa_relaxed_overall,0.3528,0.009558734841217527
+ ≥2,1000,docvqa_val_anls,0.3487177879998493,0.005772448136868996
+ ≥2,1000,infovqa_val_anls,0.16953112470324783,0.006224999754409024
+ ≥2,1000,mme_total_score,824.5716286514606,
+ ≥2,1000,mmmu_val_mmmu_acc,0.26444,
+ ≥2,1000,mmstar_average,0.22150750732637972,
+ ≥2,1000,ocrbench_ocrbench_accuracy,0.277,
+ ≥2,1000,seedbench_seed_all,0.24880489160644803,
+ ≥2,1000,textvqa_val_exact_match,0.32898,0.006433157002732356
+ ≥2,2000,ai2d_exact_match,0.2713730569948187,0.008003273563555614
+ ≥2,2000,average,0.3170484430215563,
+ ≥2,2000,average_rank,3.2,
+ ≥2,2000,chartqa_relaxed_overall,0.4576,0.009965973321743335
+ ≥2,2000,docvqa_val_anls,0.4397320562320007,0.006117658797669254
+ ≥2,2000,infovqa_val_anls,0.19182235122159558,0.006445631136889586
+ ≥2,2000,mme_total_score,943.4792917166867,
+ ≥2,2000,mmmu_val_mmmu_acc,0.26222,
+ ≥2,2000,mmstar_average,0.21467384792624822,
+ ≥2,2000,ocrbench_ocrbench_accuracy,0.366,
+ ≥2,2000,seedbench_seed_all,0.2464146748193441,
+ ≥2,2000,textvqa_val_exact_match,0.40359999999999996,0.006696325571179395
+ ≥2,3000,ai2d_exact_match,0.2801165803108808,0.008082248116182685
+ ≥2,3000,average,0.34966771857538625,
+ ≥2,3000,average_rank,2.8,
+ ≥2,3000,chartqa_relaxed_overall,0.5228,0.009991596308834713
+ ≥2,3000,docvqa_val_anls,0.4874772803745103,0.006201774677139367
+ ≥2,3000,infovqa_val_anls,0.22283805188110828,0.00694062083381895
+ ≥2,3000,mme_total_score,966.8010204081634,
+ ≥2,3000,mmmu_val_mmmu_acc,0.27,
+ ≥2,3000,mmstar_average,0.2379978992478856,
+ ≥2,3000,ocrbench_ocrbench_accuracy,0.411,
+ ≥2,3000,seedbench_seed_all,0.28337965536409115,
+ ≥2,3000,textvqa_val_exact_match,0.4314,0.0067401404954778015
+ ≥2,4000,ai2d_exact_match,0.29533678756476683,0.008210720304314063
+ ≥2,4000,average,0.3812564539509352,
+ ≥2,4000,average_rank,2.5,
+ ≥2,4000,chartqa_relaxed_overall,0.5388,0.0099718403035556
+ ≥2,4000,docvqa_val_anls,0.5330469699330452,0.006286693650476338
+ ≥2,4000,infovqa_val_anls,0.24204946206609423,0.00717558288279668
+ ≥2,4000,mme_total_score,995.9115646258504,
+ ≥2,4000,mmmu_val_mmmu_acc,0.26667,
+ ≥2,4000,mmstar_average,0.3026544157443715,
+ ≥2,4000,ocrbench_ocrbench_accuracy,0.455,
+ ≥2,4000,seedbench_seed_all,0.35881045025013897,
+ ≥2,4000,textvqa_val_exact_match,0.43893999999999994,0.006772821384172211
+ ≥2,5000,ai2d_exact_match,0.33711139896373055,0.008508219384896985
+ ≥2,5000,average,0.4071218650285344,
+ ≥2,5000,average_rank,2.3,
+ ≥2,5000,chartqa_relaxed_overall,0.5668,0.009912336039617753
+ ≥2,5000,docvqa_val_anls,0.564524028708337,0.0062521888936635335
+ ≥2,5000,infovqa_val_anls,0.24496968712079598,0.007124175210142404
+ ≥2,5000,mme_total_score,1015.9452781112445,
+ ≥2,5000,mmmu_val_mmmu_acc,0.27889,
+ ≥2,5000,mmstar_average,0.28054619519991103,
+ ≥2,5000,ocrbench_ocrbench_accuracy,0.489,
+ ≥2,5000,seedbench_seed_all,0.43985547526403557,
+ ≥2,5000,textvqa_val_exact_match,0.4624,0.006784220893413342
+ ≥2,6000,ai2d_exact_match,0.35103626943005184,0.008590489143063932
+ ≥2,6000,average,0.4121891443057646,
+ ≥2,6000,average_rank,3.0,
+ ≥2,6000,chartqa_relaxed_overall,0.5768,0.009883307943718245
+ ≥2,6000,docvqa_val_anls,0.5776287354366231,0.0062230020803370695
+ ≥2,6000,infovqa_val_anls,0.2221908019883868,0.006590859192234515
+ ≥2,6000,mme_total_score,1020.3381352541016,
+ ≥2,6000,mmmu_val_mmmu_acc,0.28,
+ ≥2,6000,mmstar_average,0.27381767588792544,
+ ≥2,6000,ocrbench_ocrbench_accuracy,0.488,
+ ≥2,6000,seedbench_seed_all,0.46386881600889385,
+ ≥2,6000,textvqa_val_exact_match,0.47636000000000006,0.006799814525081922
+ ≥2,7000,ai2d_exact_match,0.37629533678756477,0.008719379877890883
+ ≥2,7000,average,0.41852126487504937,
+ ≥2,7000,average_rank,3.6,
+ ≥2,7000,chartqa_relaxed_overall,0.5784,0.009878279615563902
+ ≥2,7000,docvqa_val_anls,0.5890225700952161,0.00623482047941176
+ ≥2,7000,infovqa_val_anls,0.223522004380568,0.006616105445267792
+ ≥2,7000,mme_total_score,1017.6768707482994,
+ ≥2,7000,mmmu_val_mmmu_acc,0.26444,
+ ≥2,7000,mmstar_average,0.2842963864531179,
+ ≥2,7000,ocrbench_ocrbench_accuracy,0.485,
+ ≥2,7000,seedbench_seed_all,0.47915508615897723,
+ ≥2,7000,textvqa_val_exact_match,0.48656000000000005,0.006793372009587883
+ ≥2,8000,ai2d_exact_match,0.4015544041450777,0.008822998789014791
+ ≥2,8000,average,0.43741617461905385,
+ ≥2,8000,average_rank,2.7,
+ ≥2,8000,chartqa_relaxed_overall,0.5868,0.009850132691777215
+ ≥2,8000,docvqa_val_anls,0.6064868329976114,0.006195078404871516
+ ≥2,8000,infovqa_val_anls,0.237253715462471,0.006761266007987291
+ ≥2,8000,mme_total_score,1051.3844537815125,
+ ≥2,8000,mmmu_val_mmmu_acc,0.29556,
+ ≥2,8000,mmstar_average,0.3249125644916164,
+ ≥2,8000,ocrbench_ocrbench_accuracy,0.499,
+ ≥2,8000,seedbench_seed_all,0.4964980544747082,
+ ≥2,8000,textvqa_val_exact_match,0.48868,0.006786367399168372
+ ≥2,9000,ai2d_exact_match,0.40382124352331605,0.008831094143874315
+ ≥2,9000,average,0.4404946424331453,
+ ≥2,9000,average_rank,2.9,
+ ≥2,9000,chartqa_relaxed_overall,0.6032,0.00978663452296623
+ ≥2,9000,docvqa_val_anls,0.6121548768634689,0.0061762532067103386
+ ≥2,9000,infovqa_val_anls,0.22182207634556947,0.006503514281737561
+ ≥2,9000,mme_total_score,1016.1477591036414,
+ ≥2,9000,mmmu_val_mmmu_acc,0.28222,
+ ≥2,9000,mmstar_average,0.33800404653337934,
+ ≥2,9000,ocrbench_ocrbench_accuracy,0.5,
+ ≥2,9000,seedbench_seed_all,0.5051695386325736,
+ ≥2,9000,textvqa_val_exact_match,0.49805999999999995,0.006801536551389838
+ ≥2,10000,ai2d_exact_match,0.4258419689119171,0.00889962357526378
+ ≥2,10000,average,0.45210592763811075,
+ ≥2,10000,average_rank,2.3,
+ ≥2,10000,chartqa_relaxed_overall,0.5944,0.009822120220107639
+ ≥2,10000,docvqa_val_anls,0.6316361917189336,0.006144343697405114
+ ≥2,10000,infovqa_val_anls,0.23913212463600403,0.0067351911105917
+ ≥2,10000,mme_total_score,989.5969387755102,
+ ≥2,10000,mmmu_val_mmmu_acc,0.29111,
+ ≥2,10000,mmstar_average,0.3311992002187771,
+ ≥2,10000,ocrbench_ocrbench_accuracy,0.524,
+ ≥2,10000,seedbench_seed_all,0.5169538632573653,
+ ≥2,10000,textvqa_val_exact_match,0.51468,0.006777646111841742
+ ≥2,11000,ai2d_exact_match,0.42875647668393785,0.008907332750968604
+ ≥2,11000,average,0.45881179587417986,
+ ≥2,11000,average_rank,2.4,
+ ≥2,11000,chartqa_relaxed_overall,0.6112,0.009751505562952713
+ ≥2,11000,docvqa_val_anls,0.6351833269477972,0.006125490431617443
+ ≥2,11000,infovqa_val_anls,0.23606787081800862,0.006703826515822327
+ ≥2,11000,mme_total_score,1065.0292116846738,
+ ≥2,11000,mmmu_val_mmmu_acc,0.29444,
+ ≥2,11000,mmstar_average,0.3469301171004765,
+ ≥2,11000,ocrbench_ocrbench_accuracy,0.534,
+ ≥2,11000,seedbench_seed_all,0.5230683713173986,
+ ≥2,11000,textvqa_val_exact_match,0.51966,0.006767766057679764
+ ≥2,12000,ai2d_exact_match,0.4384715025906736,0.008930756993395149
+ ≥2,12000,average,0.4594169631513685,
+ ≥2,12000,average_rank,2.9,
+ ≥2,12000,chartqa_relaxed_overall,0.604,0.009783245103435851
+ ≥2,12000,docvqa_val_anls,0.6404649164504777,0.006108516485005316
+ ≥2,12000,infovqa_val_anls,0.24399960384251437,0.006798095411909792
+ ≥2,12000,mme_total_score,1071.3268307322928,
+ ≥2,12000,mmmu_val_mmmu_acc,0.28,
+ ≥2,12000,mmstar_average,0.3319074070128363,
+ ≥2,12000,ocrbench_ocrbench_accuracy,0.535,
+ ≥2,12000,seedbench_seed_all,0.5326292384658143,
+ ≥2,12000,textvqa_val_exact_match,0.52828,0.006774867385094495
+ ≥2,13000,ai2d_exact_match,0.4494818652849741,0.008953103134587205
+ ≥2,13000,average,0.4664204868584231,
+ ≥2,13000,average_rank,2.9,
+ ≥2,13000,chartqa_relaxed_overall,0.6072,0.00976941352263433
+ ≥2,13000,docvqa_val_anls,0.6520830792564345,0.006078195885582825
+ ≥2,13000,infovqa_val_anls,0.2540091405377872,0.006900079046632844
+ ≥2,13000,mme_total_score,1102.111644657863,
+ ≥2,13000,mmmu_val_mmmu_acc,0.27889,
+ ≥2,13000,mmstar_average,0.344523865295862,
+ ≥2,13000,ocrbench_ocrbench_accuracy,0.544,
+ ≥2,13000,seedbench_seed_all,0.5375764313507504,
+ ≥2,13000,textvqa_val_exact_match,0.5300199999999999,0.006760687930991938
+ ≥2,14000,ai2d_exact_match,0.46599740932642486,0.008978320789223167
+ ≥2,14000,average,0.47503952495924406,
+ ≥2,14000,average_rank,2.1,
+ ≥2,14000,chartqa_relaxed_overall,0.618,0.009719474639861454
+ ≥2,14000,docvqa_val_anls,0.6580902962118945,0.006056507155937736
+ ≥2,14000,infovqa_val_anls,0.2596815364895075,0.006931336614399575
+ ≥2,14000,mme_total_score,1081.4191676670669,
+ ≥2,14000,mmmu_val_mmmu_acc,0.29333,
+ ≥2,14000,mmstar_average,0.34673893952588086,
+ ≥2,14000,ocrbench_ocrbench_accuracy,0.547,
+ ≥2,14000,seedbench_seed_all,0.5395775430794886,
+ ≥2,14000,textvqa_val_exact_match,0.5469400000000001,0.006754557875273413
+ ≥2,15000,ai2d_exact_match,0.46211139896373055,0.008973279520621462
+ ≥2,15000,average,0.4760526986294352,
+ ≥2,15000,average_rank,2.4,
+ ≥2,15000,chartqa_relaxed_overall,0.628,0.009668701749325345
+ ≥2,15000,docvqa_val_anls,0.6648088448329239,0.006037271631807744
+ ≥2,15000,infovqa_val_anls,0.25795022333006473,0.006890072365188988
+ ≥2,15000,mme_total_score,1089.547719087635,
+ ≥2,15000,mmmu_val_mmmu_acc,0.28556,
+ ≥2,15000,mmstar_average,0.3469607521668807,
+ ≥2,15000,ocrbench_ocrbench_accuracy,0.552,
+ ≥2,15000,seedbench_seed_all,0.5415230683713174,
+ ≥2,15000,textvqa_val_exact_match,0.5455599999999999,0.006760798692446918
+ ≥2,16000,ai2d_exact_match,0.45919689119170987,0.008969138793675547
+ ≥2,16000,average,0.47749499404431844,
+ ≥2,16000,average_rank,2.6,
+ ≥2,16000,chartqa_relaxed_overall,0.6308,0.009653694708691147
+ ≥2,16000,docvqa_val_anls,0.6761390499297608,0.005978202784466009
+ ≥2,16000,infovqa_val_anls,0.258655084903391,0.006852561120622793
+ ≥2,16000,mme_total_score,1131.8199279711885,
+ ≥2,16000,mmmu_val_mmmu_acc,0.28111,
+ ≥2,16000,mmstar_average,0.3465829809632207,
+ ≥2,16000,ocrbench_ocrbench_accuracy,0.55,
+ ≥2,16000,seedbench_seed_all,0.5436909394107837,
+ ≥2,16000,textvqa_val_exact_match,0.55128,0.00673284030314915
+ ≥2,17000,ai2d_exact_match,0.47053108808290156,0.008983510489560252
+ ≥2,17000,average,0.4824112637913165,
+ ≥2,17000,average_rank,2.5,
+ ≥2,17000,chartqa_relaxed_overall,0.6264,0.009677121197436144
+ ≥2,17000,docvqa_val_anls,0.6761198524404004,0.005987381973810974
+ ≥2,17000,infovqa_val_anls,0.2750604377713151,0.007138932651224592
+ ≥2,17000,mme_total_score,1029.6003401360545,
+ ≥2,17000,mmmu_val_mmmu_acc,0.28667,
+ ≥2,17000,mmstar_average,0.35096186353151126,
+ ≥2,17000,ocrbench_ocrbench_accuracy,0.554,
+ ≥2,17000,seedbench_seed_all,0.5486381322957199,
+ ≥2,17000,textvqa_val_exact_match,0.55332,0.006735374419712295
+ ≥2,18000,ai2d_exact_match,0.4682642487046632,0.008981008686994101
+ ≥2,18000,average,0.48820815333600937,
+ ≥2,18000,average_rank,2.0,
+ ≥2,18000,chartqa_relaxed_overall,0.644,0.009578219924326623
+ ≥2,18000,docvqa_val_anls,0.6810993351781675,0.005958907235334871
+ ≥2,18000,infovqa_val_anls,0.26273964411171846,0.006970166334491144
+ ≥2,18000,mme_total_score,1233.8152260904362,
+ ≥2,18000,mmmu_val_mmmu_acc,0.30889,
+ ≥2,18000,mmstar_average,0.35081513813292553,
+ ≥2,18000,ocrbench_ocrbench_accuracy,0.578,
+ ≥2,18000,seedbench_seed_all,0.5450250138966092,
+ ≥2,18000,textvqa_val_exact_match,0.5550400000000001,0.006740445564002446
+ ≥2,19000,ai2d_exact_match,0.48154145077720206,0.00899301968014488
+ ≥2,19000,average,0.48747038725935266,
+ ≥2,19000,average_rank,2.5,
+ ≥2,19000,chartqa_relaxed_overall,0.6444,0.009575809858898698
+ ≥2,19000,docvqa_val_anls,0.6803936935190295,0.005967440848189623
+ ≥2,19000,infovqa_val_anls,0.2673329802724793,0.006976813332121409
+ ≥2,19000,mme_total_score,1179.0547218887555,
+ ≥2,19000,mmmu_val_mmmu_acc,0.29556,
+ ≥2,19000,mmstar_average,0.340879324078415,
+ ≥2,19000,ocrbench_ocrbench_accuracy,0.562,
+ ≥2,19000,seedbench_seed_all,0.5588660366870484,
+ ≥2,19000,textvqa_val_exact_match,0.5562600000000001,0.006734421501999508
+ ≥2,20000,ai2d_exact_match,0.4805699481865285,0.008992356706334513
+ ≥2,20000,average,0.49109872298543183,
+ ≥2,20000,average_rank,1.7,
+ ≥2,20000,chartqa_relaxed_overall,0.6464,0.009563650001989001
+ ≥2,20000,docvqa_val_anls,0.6823974164165829,0.005959610876737005
+ ≥2,20000,infovqa_val_anls,0.26825054401896686,0.007072214875698234
+ ≥2,20000,mme_total_score,1187.1244497799119,
+ ≥2,20000,mmmu_val_mmmu_acc,0.31,
+ ≥2,20000,mmstar_average,0.3539436054730449,
+ ≥2,20000,ocrbench_ocrbench_accuracy,0.568,
+ ≥2,20000,seedbench_seed_all,0.5565869927737632,
+ ≥2,20000,textvqa_val_exact_match,0.55374,0.006734617546282709
+ ≥3,1000,ai2d_exact_match,0.2619818652849741,0.007914086941902848
+ ≥3,1000,average,0.2794334029794183,
+ ≥3,1000,average_rank,2.8,
+ ≥3,1000,chartqa_relaxed_overall,0.3624,0.009615793331418735
+ ≥3,1000,docvqa_val_anls,0.358726414254659,0.00583517632645742
+ ≥3,1000,infovqa_val_anls,0.17567716068461908,0.0063503165333000855
+ ≥3,1000,mme_total_score,754.6462585034014,
+ ≥3,1000,mmmu_val_mmmu_acc,0.25889,
+ ≥3,1000,mmstar_average,0.20669310209912864,
+ ≥3,1000,ocrbench_ocrbench_accuracy,0.299,
+ ≥3,1000,seedbench_seed_all,0.2537520844913841,
+ ≥3,1000,textvqa_val_exact_match,0.33777999999999997,0.006462823526724795
+ ≥3,2000,ai2d_exact_match,0.2707253886010363,0.007997269386750955
+ ≥3,2000,average,0.324956811840241,
+ ≥3,2000,average_rank,2.9,
+ ≥3,2000,chartqa_relaxed_overall,0.468,0.009981495484186743
+ ≥3,2000,docvqa_val_anls,0.4401305975808376,0.006085479161829202
+ ≥3,2000,infovqa_val_anls,0.21738366907082515,0.00690560152820958
+ ≥3,2000,mme_total_score,780.5238095238094,
+ ≥3,2000,mmmu_val_mmmu_acc,0.25222,
+ ≥3,2000,mmstar_average,0.2313413567013541,
+ ≥3,2000,ocrbench_ocrbench_accuracy,0.386,
+ ≥3,2000,seedbench_seed_all,0.2545302946081156,
+ ≥3,2000,textvqa_val_exact_match,0.40428000000000003,0.006698634984990034
+ ≥3,3000,ai2d_exact_match,0.27363989637305697,0.008024119445073188
+ ≥3,3000,average,0.35281014111410386,
+ ≥3,3000,average_rank,2.6,
+ ≥3,3000,chartqa_relaxed_overall,0.5132,0.009998514495506157
+ ≥3,3000,docvqa_val_anls,0.49578090596419144,0.0062540129206588675
+ ≥3,3000,infovqa_val_anls,0.22472603379950587,0.006863330299819649
+ ≥3,3000,mme_total_score,868.3095238095237,
+ ≥3,3000,mmmu_val_mmmu_acc,0.27444,
+ ≥3,3000,mmstar_average,0.25839301643603935,
+ ≥3,3000,ocrbench_ocrbench_accuracy,0.409,
+ ≥3,3000,seedbench_seed_all,0.2925514174541412,
+ ≥3,3000,textvqa_val_exact_match,0.43356000000000006,0.006754959006110611
+ ≥3,4000,ai2d_exact_match,0.28335492227979275,0.008110527983566214
+ ≥3,4000,average,0.3674252373982893,
+ ≥3,4000,average_rank,3.5,
+ ≥3,4000,chartqa_relaxed_overall,0.5396,0.009970581778431997
+ ≥3,4000,docvqa_val_anls,0.5289127945577605,0.006289931251894248
+ ≥3,4000,infovqa_val_anls,0.21582133824627234,0.00674279410471775
+ ≥3,4000,mme_total_score,889.4285714285714,
+ ≥3,4000,mmmu_val_mmmu_acc,0.24556,
+ ≥3,4000,mmstar_average,0.2644251965647028,
+ ≥3,4000,ocrbench_ocrbench_accuracy,0.429,
+ ≥3,4000,seedbench_seed_all,0.3471928849360756,
+ ≥3,4000,textvqa_val_exact_match,0.45296000000000003,0.006791544205446865
+ ≥3,5000,ai2d_exact_match,0.33516839378238344,0.008496088804445223
+ ≥3,5000,average,0.39888444206353324,
+ ≥3,5000,average_rank,3.2,
+ ≥3,5000,chartqa_relaxed_overall,0.5716,0.009898917689756362
+ ≥3,5000,docvqa_val_anls,0.5575899695261644,0.006265975659556661
+ ≥3,5000,infovqa_val_anls,0.23013455835644483,0.0068368490116401705
+ ≥3,5000,mme_total_score,985.1445578231293,
+ ≥3,5000,mmmu_val_mmmu_acc,0.27111,
+ ≥3,5000,mmstar_average,0.2946740552392133,
+ ≥3,5000,ocrbench_ocrbench_accuracy,0.43,
+ ≥3,5000,seedbench_seed_all,0.4254030016675931,
+ ≥3,5000,textvqa_val_exact_match,0.4742799999999999,0.006788410183729657
+ ≥3,6000,ai2d_exact_match,0.3601036269430052,0.008639731726372677
+ ≥3,6000,average,0.4225217783490169,
+ ≥3,6000,average_rank,2.1,
+ ≥3,6000,chartqa_relaxed_overall,0.5752,0.009888230116554488
+ ≥3,6000,docvqa_val_anls,0.5829205672304983,0.006247927399021504
+ ≥3,6000,infovqa_val_anls,0.24796306866649032,0.007051352766215089
+ ≥3,6000,mme_total_score,894.8299319727892,
+ ≥3,6000,mmmu_val_mmmu_acc,0.28667,
+ ≥3,6000,mmstar_average,0.32224643546402654,
+ ≥3,6000,ocrbench_ocrbench_accuracy,0.466,
+ ≥3,6000,seedbench_seed_all,0.4741523068371317,
+ ≥3,6000,textvqa_val_exact_match,0.48744000000000004,0.006797771047795272
+ ≥3,7000,ai2d_exact_match,0.39378238341968913,0.008793749766856823
+ ≥3,7000,average,0.42775795560136004,
+ ≥3,7000,average_rank,2.5,
+ ≥3,7000,chartqa_relaxed_overall,0.5876,0.009847298295140926
+ ≥3,7000,docvqa_val_anls,0.6000941606468793,0.006194010994352466
+ ≥3,7000,infovqa_val_anls,0.24479859857192363,0.007060559159607034
+ ≥3,7000,mme_total_score,876.7176870748299,
+ ≥3,7000,mmmu_val_mmmu_acc,0.27222,
+ ≥3,7000,mmstar_average,0.2977466801194958,
+ ≥3,7000,ocrbench_ocrbench_accuracy,0.477,
+ ≥3,7000,seedbench_seed_all,0.4795997776542524,
+ ≥3,7000,textvqa_val_exact_match,0.49698000000000003,0.0067935120726511
+ ≥3,8000,ai2d_exact_match,0.4057642487046632,0.008837877210720615
+ ≥3,8000,average,0.43551031057375794,
+ ≥3,8000,average_rank,2.9,
+ ≥3,8000,chartqa_relaxed_overall,0.592,0.009831228876620145
+ ≥3,8000,docvqa_val_anls,0.6177973292861353,0.006138014034823096
+ ≥3,8000,infovqa_val_anls,0.23532628107457257,0.00676650636184475
+ ≥3,8000,mme_total_score,939.6819727891157,
+ ≥3,8000,mmmu_val_mmmu_acc,0.27444,
+ ≥3,8000,mmstar_average,0.29094521403063506,
+ ≥3,8000,ocrbench_ocrbench_accuracy,0.499,
+ ≥3,8000,seedbench_seed_all,0.49949972206781545,
+ ≥3,8000,textvqa_val_exact_match,0.5048199999999999,0.0067899465531651255
+ ≥3,9000,ai2d_exact_match,0.40770725388601037,0.008844516803704298
+ ≥3,9000,average,0.4390017474760467,
+ ≥3,9000,average_rank,2.9,
+ ≥3,9000,chartqa_relaxed_overall,0.5872,0.009848718845878486
+ ≥3,9000,docvqa_val_anls,0.61752739984947,0.00618332088681346
+ ≥3,9000,infovqa_val_anls,0.25912362264120503,0.007280015371693194
+ ≥3,9000,mme_total_score,879.0001000400159,
+ ≥3,9000,mmmu_val_mmmu_acc,0.27889,
+ ≥3,9000,mmstar_average,0.31710081388716777,
+ ≥3,9000,ocrbench_ocrbench_accuracy,0.475,
+ ≥3,9000,seedbench_seed_all,0.503946637020567,
+ ≥3,9000,textvqa_val_exact_match,0.5045200000000001,0.006796966505047244
+ ≥3,10000,ai2d_exact_match,0.41580310880829013,0.008870644443998564
+ ≥3,10000,average,0.4482767982697443,
+ ≥3,10000,average_rank,3.1,
+ ≥3,10000,chartqa_relaxed_overall,0.5948,0.009820578470976232
+ ≥3,10000,docvqa_val_anls,0.632014816225421,0.006118052909783931
+ ≥3,10000,infovqa_val_anls,0.26061122659986763,0.007146718031628882
+ ≥3,10000,mme_total_score,988.3401360544218,
+ ≥3,10000,mmmu_val_mmmu_acc,0.28556,
+ ≥3,10000,mmstar_average,0.30861783045948943,
+ ≥3,10000,ocrbench_ocrbench_accuracy,0.506,
+ ≥3,10000,seedbench_seed_all,0.5155642023346303,
+ ≥3,10000,textvqa_val_exact_match,0.5155200000000001,0.006789480490366388
+ ≥3,11000,ai2d_exact_match,0.43944300518134716,0.0089329077973751
+ ≥3,11000,average,0.4597510485723372,
+ ≥3,11000,average_rank,2.5,
+ ≥3,11000,chartqa_relaxed_overall,0.6092,0.009760545645634788
+ ≥3,11000,docvqa_val_anls,0.6464425255299558,0.006062020581004778
+ ≥3,11000,infovqa_val_anls,0.25020176764946855,0.006887224684938156
+ ≥3,11000,mme_total_score,960.4336734693878,
+ ≥3,11000,mmmu_val_mmmu_acc,0.28556,
+ ≥3,11000,mmstar_average,0.3246843233372334,
+ ≥3,11000,ocrbench_ocrbench_accuracy,0.523,
+ ≥3,11000,seedbench_seed_all,0.5220678154530295,
+ ≥3,11000,textvqa_val_exact_match,0.53716,0.006766105446753199
+ ≥3,12000,ai2d_exact_match,0.44430051813471505,0.008943141268224502
+ ≥3,12000,average,0.46138583256526733,
+ ≥3,12000,average_rank,2.4,
+ ≥3,12000,chartqa_relaxed_overall,0.6176,0.00972141442174665
+ ≥3,12000,docvqa_val_anls,0.6470164139517766,0.006072800791453103
+ ≥3,12000,infovqa_val_anls,0.25520554317365624,0.00698649679999368
+ ≥3,12000,mme_total_score,907.6496598639455,
+ ≥3,12000,mmmu_val_mmmu_acc,0.26889,
+ ≥3,12000,mmstar_average,0.3420805959262021,
+ ≥3,12000,ocrbench_ocrbench_accuracy,0.518,
+ ≥3,12000,seedbench_seed_all,0.5269594219010562,
+ ≥3,12000,textvqa_val_exact_match,0.53242,0.006773903296709706
+ ≥3,13000,ai2d_exact_match,0.4523963730569948,0.00895827521082005
+ ≥3,13000,average,0.46949309002333817,
+ ≥3,13000,average_rank,2.4,
+ ≥3,13000,chartqa_relaxed_overall,0.6212,0.009703704898413913
+ ≥3,13000,docvqa_val_anls,0.6619667030411912,0.006021347175756138
+ ≥3,13000,infovqa_val_anls,0.2616368936908815,0.007081151619852865
+ ≥3,13000,mme_total_score,949.1751700680272,
+ ≥3,13000,mmmu_val_mmmu_acc,0.29,
+ ≥3,13000,mmstar_average,0.3366565952847893,
+ ≥3,13000,ocrbench_ocrbench_accuracy,0.531,
+ ≥3,13000,seedbench_seed_all,0.5342412451361868,
+ ≥3,13000,textvqa_val_exact_match,0.5363399999999999,0.006781436312145912
+ ≥3,14000,ai2d_exact_match,0.45466321243523317,0.008962083606139334
+ ≥3,14000,average,0.4712975949864662,
+ ≥3,14000,average_rank,2.7,
+ ≥3,14000,chartqa_relaxed_overall,0.6264,0.009677121197436144
+ ≥3,14000,docvqa_val_anls,0.6740908198240112,0.005957001035082802
+ ≥3,14000,infovqa_val_anls,0.2577834440006994,0.006966195686343909
+ ≥3,14000,mme_total_score,1001.2684073629453,
+ ≥3,14000,mmmu_val_mmmu_acc,0.29333,
+ ≥3,14000,mmstar_average,0.32549694865716267,
+ ≥3,14000,ocrbench_ocrbench_accuracy,0.523,
+ ≥3,14000,seedbench_seed_all,0.5330739299610895,
+ ≥3,14000,textvqa_val_exact_match,0.55384,0.006735794315818514
+ ≥3,15000,ai2d_exact_match,0.4540155440414508,0.008961014613274426
+ ≥3,15000,average,0.47089632593243724,
+ ≥3,15000,average_rank,3.0,
+ ≥3,15000,chartqa_relaxed_overall,0.6308,0.009653694708691147
+ ≥3,15000,docvqa_val_anls,0.6653892896567976,0.006002863596715536
+ ≥3,15000,infovqa_val_anls,0.25006728644957676,0.006925310310812123
+ ≥3,15000,mme_total_score,952.5915366146459,
+ ≥3,15000,mmmu_val_mmmu_acc,0.28778,
+ ≥3,15000,mmstar_average,0.321321405795528,
+ ≥3,15000,ocrbench_ocrbench_accuracy,0.544,
+ ≥3,15000,seedbench_seed_all,0.5401334074485825,
+ ≥3,15000,textvqa_val_exact_match,0.5445599999999999,0.00676213180626591
+ ≥3,16000,ai2d_exact_match,0.46113989637305697,0.008971933568013594
+ ≥3,16000,average,0.4765857166360205,
+ ≥3,16000,average_rank,2.7,
+ ≥3,16000,chartqa_relaxed_overall,0.6292,0.00966231277258432
+ ≥3,16000,docvqa_val_anls,0.6750185163043225,0.005960110331744373
+ ≥3,16000,infovqa_val_anls,0.26628641470953346,0.007064079945590166
+ ≥3,16000,mme_total_score,1021.2616046418567,
+ ≥3,16000,mmmu_val_mmmu_acc,0.28889,
+ ≥3,16000,mmstar_average,0.31529947948012893,
+ ≥3,16000,ocrbench_ocrbench_accuracy,0.558,
+ ≥3,16000,seedbench_seed_all,0.5428571428571428,
+ ≥3,16000,textvqa_val_exact_match,0.5525800000000001,0.0067489094137982856
+ ≥3,17000,ai2d_exact_match,0.46016839378238344,0.00897055333097463
+ ≥3,17000,average,0.4767395382894575,
+ ≥3,17000,average_rank,2.8,
+ ≥3,17000,chartqa_relaxed_overall,0.636,0.009624897685803465
+ ≥3,17000,docvqa_val_anls,0.6722259229035369,0.005969094115618876
+ ≥3,17000,infovqa_val_anls,0.24935441742721265,0.006802032350689338
+ ≥3,17000,mme_total_score,936.1445578231293,
+ ≥3,17000,mmmu_val_mmmu_acc,0.28778,
+ ≥3,17000,mmstar_average,0.3236027747499053,
+ ≥3,17000,ocrbench_ocrbench_accuracy,0.557,
+ ≥3,17000,seedbench_seed_all,0.547804335742079,
+ ≥3,17000,textvqa_val_exact_match,0.55672,0.00673676815164555
+ ≥3,18000,ai2d_exact_match,0.4637305699481865,0.008975446629055962
+ ≥3,18000,average,0.48384455011990835,
+ ≥3,18000,average_rank,2.5,
+ ≥3,18000,chartqa_relaxed_overall,0.6448,0.009573392498878078
+ ≥3,18000,docvqa_val_anls,0.6796365109602944,0.005968558973913562
+ ≥3,18000,infovqa_val_anls,0.26322789951300113,0.0069935719047194405
+ ≥3,18000,mme_total_score,1086.5858343337334,
+ ≥3,18000,mmmu_val_mmmu_acc,0.29667,
+ ≥3,18000,mmstar_average,0.32901024525469136,
+ ≥3,18000,ocrbench_ocrbench_accuracy,0.568,
+ ≥3,18000,seedbench_seed_all,0.5503057254030017,
+ ≥3,18000,textvqa_val_exact_match,0.5592199999999999,0.006724703907870569
+ ≥3,19000,ai2d_exact_match,0.47571243523316065,0.00898853090258662
+ ≥3,19000,average,0.48669992125156075,
+ ≥3,19000,average_rank,2.6,
+ ≥3,19000,chartqa_relaxed_overall,0.6432,0.009583018193402223
+ ≥3,19000,docvqa_val_anls,0.6804836987040075,0.005959667749728633
+ ≥3,19000,infovqa_val_anls,0.2638534672767717,0.007026196725406296
+ ≥3,19000,mme_total_score,960.3163265306122,
+ ≥3,19000,mmmu_val_mmmu_acc,0.29778,
+ ≥3,19000,mmstar_average,0.3418245205114739,
+ ≥3,19000,ocrbench_ocrbench_accuracy,0.568,
+ ≥3,19000,seedbench_seed_all,0.5493051695386326,
+ ≥3,19000,textvqa_val_exact_match,0.56014,0.006731277597872481
+ ≥3,20000,ai2d_exact_match,0.47117875647668395,0.008984191131586656
+ ≥3,20000,average,0.4903196828425222,
+ ≥3,20000,average_rank,2.2,
+ ≥3,20000,chartqa_relaxed_overall,0.648,0.009553790345406665
+ ≥3,20000,docvqa_val_anls,0.6902930502166585,0.0059096225576472155
+ ≥3,20000,infovqa_val_anls,0.2637260616044305,0.007044756469416206
+ ≥3,20000,mme_total_score,968.2636054421769,
+ ≥3,20000,mmmu_val_mmmu_acc,0.29778,
+ ≥3,20000,mmstar_average,0.3516103723377342,
+ ≥3,20000,ocrbench_ocrbench_accuracy,0.568,
+ ≥3,20000,seedbench_seed_all,0.5520289049471929,
+ ≥3,20000,textvqa_val_exact_match,0.57026,0.0067066312154801
+ ≥4,1000,ai2d_exact_match,0.24514248704663213,0.00774236194438642
+ ≥4,1000,average,0.2886475913888803,
+ ≥4,1000,average_rank,2.3,
+ ≥4,1000,chartqa_relaxed_overall,0.3972,0.009788318981080978
+ ≥4,1000,docvqa_val_anls,0.37365436598717294,0.005925680297715887
+ ≥4,1000,infovqa_val_anls,0.17743073573571846,0.0061602017146906085
+ ≥4,1000,mme_total_score,632.5170068027211,
+ ≥4,1000,mmmu_val_mmmu_acc,0.26889,
+ ≥4,1000,mmstar_average,0.2207913007120554,
+ ≥4,1000,ocrbench_ocrbench_accuracy,0.295,
+ ≥4,1000,seedbench_seed_all,0.26297943301834353,
+ ≥4,1000,textvqa_val_exact_match,0.35674,0.006549831642027738
+ ≥4,2000,ai2d_exact_match,0.2658678756476684,0.007951548865715979
+ ≥4,2000,average,0.3301948089612685,
+ ≥4,2000,average_rank,2.2,
+ ≥4,2000,chartqa_relaxed_overall,0.5108,0.009999667061284322
+ ≥4,2000,docvqa_val_anls,0.47288379978857037,0.006197116763458197
+ ≥4,2000,infovqa_val_anls,0.19614396715396193,0.006363207550147918
+ ≥4,2000,mme_total_score,649.1020408163265,
+ ≥4,2000,mmmu_val_mmmu_acc,0.25333,
+ ≥4,2000,mmstar_average,0.23593804384220507,
+ ≥4,2000,ocrbench_ocrbench_accuracy,0.344,
+ ≥4,2000,seedbench_seed_all,0.28526959421901055,
+ ≥4,2000,textvqa_val_exact_match,0.40752,0.006707017723053031
+ ≥4,3000,ai2d_exact_match,0.27428756476683935,0.008030027397236163
+ ≥4,3000,average,0.35549277156858605,
+ ≥4,3000,average_rank,3.1,
+ ≥4,3000,chartqa_relaxed_overall,0.5332,0.009979927032670678
+ ≥4,3000,docvqa_val_anls,0.5073841710534057,0.006243585075672888
+ ≥4,3000,infovqa_val_anls,0.2112620781635733,0.006555166517270566
+ ≥4,3000,mme_total_score,631.2074829931973,
+ ≥4,3000,mmmu_val_mmmu_acc,0.27,
+ ≥4,3000,mmstar_average,0.23471667210121605,
+ ≥4,3000,ocrbench_ocrbench_accuracy,0.404,
+ ≥4,3000,seedbench_seed_all,0.34402445803224013,
+ ≥4,3000,textvqa_val_exact_match,0.42056000000000004,0.006749442071286688
+ ≥4,4000,ai2d_exact_match,0.31832901554404147,0.008384114535775948
+ ≥4,4000,average,0.385231957814173,
+ ≥4,4000,average_rank,2.4,
+ ≥4,4000,chartqa_relaxed_overall,0.5652,0.009916598185256227
+ ≥4,4000,docvqa_val_anls,0.5416928947604102,0.006213976135239445
+ ≥4,4000,infovqa_val_anls,0.20356144693573172,0.0062836907942324565
+ ≥4,4000,mme_total_score,653.2091836734694,
+ ≥4,4000,mmmu_val_mmmu_acc,0.28,
+ ≥4,4000,mmstar_average,0.29405509132528374,
+ ≥4,4000,ocrbench_ocrbench_accuracy,0.405,
+ ≥4,4000,seedbench_seed_all,0.41650917176209007,
+ ≥4,4000,textvqa_val_exact_match,0.44273999999999997,0.006779280950811967
+ ≥4,5000,ai2d_exact_match,0.36755181347150256,0.00867767630454297
+ ≥4,5000,average,0.4077588201424885,
+ ≥4,5000,average_rank,2.6,
+ ≥4,5000,chartqa_relaxed_overall,0.58,0.009873144969898833
+ ≥4,5000,docvqa_val_anls,0.5630037000906716,0.006209962710311604
+ ≥4,5000,infovqa_val_anls,0.21458613370689986,0.006397856835104317
+ ≥4,5000,mme_total_score,625.7602040816327,
+ ≥4,5000,mmmu_val_mmmu_acc,0.27333,
+ ≥4,5000,mmstar_average,0.3209608357365017,
+ ≥4,5000,ocrbench_ocrbench_accuracy,0.445,
+ ≥4,5000,seedbench_seed_all,0.45041689827682047,
+ ≥4,5000,textvqa_val_exact_match,0.45498,0.006789422594877592
+ ≥4,6000,ai2d_exact_match,0.3785621761658031,0.008729696327646355
+ ≥4,6000,average,0.41319441398053125,
+ ≥4,6000,average_rank,3.0,
+ ≥4,6000,chartqa_relaxed_overall,0.598,0.009808000752013664
+ ≥4,6000,docvqa_val_anls,0.5873915760067876,0.006194850343529871
+ ≥4,6000,infovqa_val_anls,0.2118313803571488,0.0064268256454762555
+ ≥4,6000,mme_total_score,647.9846938775511,
789
+ ≥4,6000,mmmu_val_mmmu_acc,0.27667,
790
+ ≥4,6000,mmstar_average,0.31152278673584216,
791
+ ≥4,6000,ocrbench_ocrbench_accuracy,0.432,
792
+ ≥4,6000,seedbench_seed_all,0.45325180655919955,
793
+ ≥4,6000,textvqa_val_exact_match,0.46952,0.006815356464393287
794
+ ≥4,7000,ai2d_exact_match,0.405440414507772,0.008836756671878079
795
+ ≥4,7000,average,0.4292314266450108,
796
+ ≥4,7000,average_rank,2.5,
797
+ ≥4,7000,chartqa_relaxed_overall,0.602,0.00979166741164548
798
+ ≥4,7000,docvqa_val_anls,0.5975001541722433,0.006202378232201727
799
+ ≥4,7000,infovqa_val_anls,0.22746329153304923,0.006598501883805769
800
+ ≥4,7000,mme_total_score,644.2482993197278,
801
+ ≥4,7000,mmmu_val_mmmu_acc,0.29556,
802
+ ≥4,7000,mmstar_average,0.3402250385136554,
803
+ ≥4,7000,ocrbench_ocrbench_accuracy,0.456,
804
+ ≥4,7000,seedbench_seed_all,0.4690939410783769,
805
+ ≥4,7000,textvqa_val_exact_match,0.4698,0.006774981333879443
806
+ ≥4,8000,ai2d_exact_match,0.420660621761658,0.008885137221616577
807
+ ≥4,8000,average,0.4378486452448895,
808
+ ≥4,8000,average_rank,2.7,
809
+ ≥4,8000,chartqa_relaxed_overall,0.6104,0.009755142291143075
810
+ ≥4,8000,docvqa_val_anls,0.6138040101801899,0.006179898911166834
811
+ ≥4,8000,infovqa_val_anls,0.22606978572806807,0.006579692710461168
812
+ ≥4,8000,mme_total_score,658.9676870748299,
813
+ ≥4,8000,mmmu_val_mmmu_acc,0.28778,
814
+ ≥4,8000,mmstar_average,0.3468841677442059,
815
+ ≥4,8000,ocrbench_ocrbench_accuracy,0.463,
816
+ ≥4,8000,seedbench_seed_all,0.4785992217898833,
817
+ ≥4,8000,textvqa_val_exact_match,0.49344000000000005,0.006803341118162017
818
+ ≥4,9000,ai2d_exact_match,0.4219559585492228,0.008888852746011196
819
+ ≥4,9000,average,0.4420874430953781,
820
+ ≥4,9000,average_rank,2.6,
821
+ ≥4,9000,chartqa_relaxed_overall,0.6152,0.009732906852031212
822
+ ≥4,9000,docvqa_val_anls,0.6305245733667586,0.006112674867758156
823
+ ≥4,9000,infovqa_val_anls,0.2397582783787718,0.006679019564643084
824
+ ≥4,9000,mme_total_score,637.5170068027211,
825
+ ≥4,9000,mmmu_val_mmmu_acc,0.29444,
826
+ ≥4,9000,mmstar_average,0.32453031208282723,
827
+ ≥4,9000,ocrbench_ocrbench_accuracy,0.47,
828
+ ≥4,9000,seedbench_seed_all,0.4841578654808227,
829
+ ≥4,9000,textvqa_val_exact_match,0.49822,0.0067895350265813805
830
+ ≥4,10000,ai2d_exact_match,0.4226036269430052,0.008890687000142644
831
+ ≥4,10000,average,0.44420978068105677,
832
+ ≥4,10000,average_rank,3.0,
833
+ ≥4,10000,chartqa_relaxed_overall,0.6216,0.009701702181065136
834
+ ≥4,10000,docvqa_val_anls,0.6324744510097755,0.006105489658656957
835
+ ≥4,10000,infovqa_val_anls,0.2379840149655046,0.006688060277831451
836
+ ≥4,10000,mme_total_score,655.0340136054422,
837
+ ≥4,10000,mmmu_val_mmmu_acc,0.27778,
838
+ ≥4,10000,mmstar_average,0.3590975730111145,
839
+ ≥4,10000,ocrbench_ocrbench_accuracy,0.46,
840
+ ≥4,10000,seedbench_seed_all,0.48704836020011116,
841
+ ≥4,10000,textvqa_val_exact_match,0.4993,0.006805289442255823
842
+ ≥4,11000,ai2d_exact_match,0.43102331606217614,0.008913110733383509
843
+ ≥4,11000,average,0.44988739002985145,
844
+ ≥4,11000,average_rank,3.2,
845
+ ≥4,11000,chartqa_relaxed_overall,0.634,0.00963611653607192
846
+ ≥4,11000,docvqa_val_anls,0.6322712133935365,0.006121517573716792
847
+ ≥4,11000,infovqa_val_anls,0.2413865745385472,0.006692342108960141
848
+ ≥4,11000,mme_total_score,658.1836734693877,
849
+ ≥4,11000,mmmu_val_mmmu_acc,0.28444,
850
+ ≥4,11000,mmstar_average,0.342313121671846,
851
+ ≥4,11000,ocrbench_ocrbench_accuracy,0.479,
852
+ ≥4,11000,seedbench_seed_all,0.502112284602557,
853
+ ≥4,11000,textvqa_val_exact_match,0.50244,0.00679965119188229
854
+ ≥4,12000,ai2d_exact_match,0.42875647668393785,0.008907332750968597
855
+ ≥4,12000,average,0.4548323782860016,
856
+ ≥4,12000,average_rank,2.8,
857
+ ≥4,12000,chartqa_relaxed_overall,0.6304,0.009655859891905061
858
+ ≥4,12000,docvqa_val_anls,0.6455348323026604,0.006090623668615334
859
+ ≥4,12000,infovqa_val_anls,0.24824487207836357,0.006776227086451809
860
+ ≥4,12000,mme_total_score,659.5289115646259,
861
+ ≥4,12000,mmmu_val_mmmu_acc,0.28667,
862
+ ≥4,12000,mmstar_average,0.3540534947708648,
863
+ ≥4,12000,ocrbench_ocrbench_accuracy,0.479,
864
+ ≥4,12000,seedbench_seed_all,0.5011117287381879,
865
+ ≥4,12000,textvqa_val_exact_match,0.51972,0.006789421445801825
866
+ ≥4,13000,ai2d_exact_match,0.43879533678756477,0.008931477789122115
867
+ ≥4,13000,average,0.4591999882773314,
868
+ ≥4,13000,average_rank,2.9,
869
+ ≥4,13000,chartqa_relaxed_overall,0.6404,0.009599583157550096
870
+ ≥4,13000,docvqa_val_anls,0.6527664026795218,0.006064581205597092
871
+ ≥4,13000,infovqa_val_anls,0.25301984861581456,0.006782117731741186
872
+ ≥4,13000,mme_total_score,682.5748299319728,
873
+ ≥4,13000,mmmu_val_mmmu_acc,0.29111,
874
+ ≥4,13000,mmstar_average,0.3517055270912357,
875
+ ≥4,13000,ocrbench_ocrbench_accuracy,0.478,
876
+ ≥4,13000,seedbench_seed_all,0.5050027793218455,
877
+ ≥4,13000,textvqa_val_exact_match,0.522,0.0067926156909974755
878
+ ≥4,14000,ai2d_exact_match,0.4323186528497409,0.008916326937351901
879
+ ≥4,14000,average,0.4646863548031565,
880
+ ≥4,14000,average_rank,3.0,
881
+ ≥4,14000,chartqa_relaxed_overall,0.644,0.009578219924326623
882
+ ≥4,14000,docvqa_val_anls,0.6548905776766276,0.006057263905849616
883
+ ≥4,14000,infovqa_val_anls,0.2562200123257713,0.006874581648592813
884
+ ≥4,14000,mme_total_score,671.9404761904761,
885
+ ≥4,14000,mmmu_val_mmmu_acc,0.29667,
886
+ ≥4,14000,mmstar_average,0.37705986254969837,
887
+ ≥4,14000,ocrbench_ocrbench_accuracy,0.493,
888
+ ≥4,14000,seedbench_seed_all,0.5045580878265703,
889
+ ≥4,14000,textvqa_val_exact_match,0.52346,0.006781469114039297
890
+ ≥4,15000,ai2d_exact_match,0.44430051813471505,0.008943141268224495
891
+ ≥4,15000,average,0.46755888531075723,
892
+ ≥4,15000,average_rank,3.2,
893
+ ≥4,15000,chartqa_relaxed_overall,0.6456,0.009568535872927508
894
+ ≥4,15000,docvqa_val_anls,0.6580148685293012,0.006071011273366836
895
+ ≥4,15000,infovqa_val_anls,0.25650876918794263,0.006807617862342499
896
+ ≥4,15000,mme_total_score,652.6581632653061,
897
+ ≥4,15000,mmmu_val_mmmu_acc,0.3,
898
+ ≥4,15000,mmstar_average,0.37405941950461163,
899
+ ≥4,15000,ocrbench_ocrbench_accuracy,0.494,
900
+ ≥4,15000,seedbench_seed_all,0.5115063924402445,
901
+ ≥4,15000,textvqa_val_exact_match,0.5240400000000001,0.006789641781942949
902
+ ≥4,16000,ai2d_exact_match,0.44591968911917096,0.00894635996642554
903
+ ≥4,16000,average,0.4675481362198435,
904
+ ≥4,16000,average_rank,3.4,
905
+ ≥4,16000,chartqa_relaxed_overall,0.644,0.009578219924326623
906
+ ≥4,16000,docvqa_val_anls,0.6701700723398091,0.00598528368808041
907
+ ≥4,16000,infovqa_val_anls,0.25594206541579917,0.006827795722845132
908
+ ≥4,16000,mme_total_score,621.2006802721088,
909
+ ≥4,16000,mmmu_val_mmmu_acc,0.29333,
910
+ ≥4,16000,mmstar_average,0.3555236170026451,
911
+ ≥4,16000,ocrbench_ocrbench_accuracy,0.5,
912
+ ≥4,16000,seedbench_seed_all,0.5140077821011673,
913
+ ≥4,16000,textvqa_val_exact_match,0.52904,0.006781307610378791
914
+ ≥4,17000,ai2d_exact_match,0.4423575129533679,0.008939151893135124
915
+ ≥4,17000,average,0.470260028617176,
916
+ ≥4,17000,average_rank,3.2,
917
+ ≥4,17000,chartqa_relaxed_overall,0.6528,0.009523504757028414
918
+ ≥4,17000,docvqa_val_anls,0.6715440208321617,0.005990650413425848
919
+ ≥4,17000,infovqa_val_anls,0.2505498351142534,0.0067846959958436336
920
+ ≥4,17000,mme_total_score,645.9387755102041,
921
+ ≥4,17000,mmmu_val_mmmu_acc,0.28889,
922
+ ≥4,17000,mmstar_average,0.36983647620343896,
923
+ ≥4,17000,ocrbench_ocrbench_accuracy,0.506,
924
+ ≥4,17000,seedbench_seed_all,0.5163424124513619,
925
+ ≥4,17000,textvqa_val_exact_match,0.5340199999999999,0.006775522818343422
926
+ ≥4,18000,ai2d_exact_match,0.44365284974093266,0.008941826870765836
927
+ ≥4,18000,average,0.4716323231463362,
928
+ ≥4,18000,average_rank,3.3,
929
+ ≥4,18000,chartqa_relaxed_overall,0.6548,0.009510571191350932
930
+ ≥4,18000,docvqa_val_anls,0.6713197941217036,0.006012007386995055
931
+ ≥4,18000,infovqa_val_anls,0.25642018150567725,0.006824635684186863
932
+ ≥4,18000,mme_total_score,678.5255102040816,
933
+ ≥4,18000,mmmu_val_mmmu_acc,0.30222,
934
+ ≥4,18000,mmstar_average,0.36512008406044094,
935
+ ≥4,18000,ocrbench_ocrbench_accuracy,0.505,
936
+ ≥4,18000,seedbench_seed_all,0.5163979988882713,
937
+ ≥4,18000,textvqa_val_exact_match,0.5297599999999999,0.006789351039496154
938
+ ≥4,19000,ai2d_exact_match,0.4420336787564767,0.008938473522297173
939
+ ≥4,19000,average,0.47578432127978293,
940
+ ≥4,19000,average_rank,3.1,
941
+ ≥4,19000,chartqa_relaxed_overall,0.6552,0.009507962165354631
942
+ ≥4,19000,docvqa_val_anls,0.6758822484033866,0.005978188514035828
943
+ ≥4,19000,infovqa_val_anls,0.25920579379187,0.006859000726576107
944
+ ≥4,19000,mme_total_score,648.0051020408163,
945
+ ≥4,19000,mmmu_val_mmmu_acc,0.31222,
946
+ ≥4,19000,mmstar_average,0.36867213443512875,
947
+ ≥4,19000,ocrbench_ocrbench_accuracy,0.514,
948
+ ≥4,19000,seedbench_seed_all,0.517065036131184,
949
+ ≥4,19000,textvqa_val_exact_match,0.53778,0.006774194584041153
950
+ ≥5,1000,ai2d_exact_match,0.2707253886010363,0.007997269386750962
951
+ ≥5,1000,average,0.28746693021594993,
952
+ ≥5,1000,average_rank,3.0,
953
+ ≥5,1000,chartqa_relaxed_overall,0.4188,0.009869224115088964
954
+ ≥5,1000,docvqa_val_anls,0.3974008919607103,0.006032200167231822
955
+ ≥5,1000,infovqa_val_anls,0.19100831944942226,0.006573293191722239
956
+ ≥5,1000,mme_total_score,611.360544217687,
957
+ ≥5,1000,mmmu_val_mmmu_acc,0.24,
958
+ ≥5,1000,mmstar_average,0.1976861821602846,
959
+ ≥5,1000,ocrbench_ocrbench_accuracy,0.275,
960
+ ≥5,1000,seedbench_seed_all,0.2508615897720956,
961
+ ≥5,1000,textvqa_val_exact_match,0.34571999999999997,0.0065009529145933324
962
+ ≥5,2000,ai2d_exact_match,0.26424870466321243,0.007936036132740997
963
+ ≥5,2000,average,0.3233315583742189,
964
+ ≥5,2000,average_rank,3.6,
965
+ ≥5,2000,chartqa_relaxed_overall,0.5084,0.010000589018267121
966
+ ≥5,2000,docvqa_val_anls,0.4796362298612967,0.006176090342725644
967
+ ≥5,2000,infovqa_val_anls,0.18440735237308775,0.006268696895430431
968
+ ≥5,2000,mme_total_score,573.4557823129252,
969
+ ≥5,2000,mmmu_val_mmmu_acc,0.25333,
970
+ ≥5,2000,mmstar_average,0.2035063743792113,
971
+ ≥5,2000,ocrbench_ocrbench_accuracy,0.338,
972
+ ≥5,2000,seedbench_seed_all,0.27965536409116176,
973
+ ≥5,2000,textvqa_val_exact_match,0.3988,0.006679269011419241
974
+ ≥5,3000,ai2d_exact_match,0.27428756476683935,0.00803002739723617
975
+ ≥5,3000,average,0.3443864268244443,
976
+ ≥5,3000,average_rank,3.9,
977
+ ≥5,3000,chartqa_relaxed_overall,0.5452,0.00996104778570988
978
+ ≥5,3000,docvqa_val_anls,0.4898334505042122,0.006220141079936321
979
+ ≥5,3000,infovqa_val_anls,0.1905138805204429,0.0062439153410265395
980
+ ≥5,3000,mme_total_score,556.8095238095239,
981
+ ≥5,3000,mmmu_val_mmmu_acc,0.26778,
982
+ ≥5,3000,mmstar_average,0.2298314503533517,
983
+ ≥5,3000,ocrbench_ocrbench_accuracy,0.357,
984
+ ≥5,3000,seedbench_seed_all,0.34469149527515286,
985
+ ≥5,3000,textvqa_val_exact_match,0.40034000000000003,0.006692244325099119
986
+ ≥5,4000,ai2d_exact_match,0.32027202072538863,0.008397669117307337
987
+ ≥5,4000,average,0.376205627947542,
988
+ ≥5,4000,average_rank,3.4,
989
+ ≥5,4000,chartqa_relaxed_overall,0.5652,0.009916598185256227
990
+ ≥5,4000,docvqa_val_anls,0.5177963857889966,0.006222777093039552
991
+ ≥5,4000,infovqa_val_anls,0.18916843913392253,0.006215579683049469
992
+ ≥5,4000,mme_total_score,604.8146258503401,
993
+ ≥5,4000,mmmu_val_mmmu_acc,0.28778,
994
+ ≥5,4000,mmstar_average,0.29371580699129896,
995
+ ≥5,4000,ocrbench_ocrbench_accuracy,0.386,
996
+ ≥5,4000,seedbench_seed_all,0.4163979988882713,
997
+ ≥5,4000,textvqa_val_exact_match,0.40952,0.006715332684134995
998
+ ≥5,5000,ai2d_exact_match,0.33743523316062174,0.008510225495976804
999
+ ≥5,5000,average,0.3848546106999232,
1000
+ ≥5,5000,average_rank,3.8,
1001
+ ≥5,5000,chartqa_relaxed_overall,0.5804,0.009871844677005952
1002
+ ≥5,5000,docvqa_val_anls,0.5377359601766157,0.0062198901419007885
1003
+ ≥5,5000,infovqa_val_anls,0.2011732768094287,0.0063147175491986935
1004
+ ≥5,5000,mme_total_score,635.4557823129252,
1005
+ ≥5,5000,mmmu_val_mmmu_acc,0.28333,
1006
+ ≥5,5000,mmstar_average,0.2707551695656502,
1007
+ ≥5,5000,ocrbench_ocrbench_accuracy,0.413,
1008
+ ≥5,5000,seedbench_seed_all,0.4153418565869928,
1009
+ ≥5,5000,textvqa_val_exact_match,0.4245199999999999,0.006749286220993934
1010
+ ≥5,6000,ai2d_exact_match,0.33775906735751293,0.008512227143417681
1011
+ ≥5,6000,average,0.40160010157704135,
1012
+ ≥5,6000,average_rank,4.0,
1013
+ ≥5,6000,chartqa_relaxed_overall,0.5972,0.009811185848158155
1014
+ ≥5,6000,docvqa_val_anls,0.5638436995149737,0.006239864600720921
1015
+ ≥5,6000,infovqa_val_anls,0.2134170597784526,0.0064637890910638615
1016
+ ≥5,6000,mme_total_score,629.6700680272108,
1017
+ ≥5,6000,mmmu_val_mmmu_acc,0.29111,
1018
+ ≥5,6000,mmstar_average,0.30043490077200463,
1019
+ ≥5,6000,ocrbench_ocrbench_accuracy,0.428,
1020
+ ≥5,6000,seedbench_seed_all,0.445136186770428,
1021
+ ≥5,6000,textvqa_val_exact_match,0.4375000000000001,0.006770193284051843
1022
+ ≥5,7000,ai2d_exact_match,0.38147668393782386,0.008742662684201102
1023
+ ≥5,7000,average,0.4128574907483326,
1024
+ ≥5,7000,average_rank,4.0,
1025
+ ≥5,7000,chartqa_relaxed_overall,0.5924,0.009829727637028773
1026
+ ≥5,7000,docvqa_val_anls,0.5787327229367657,0.006207050035602503
1027
+ ≥5,7000,infovqa_val_anls,0.21426196373121223,0.006473055692640963
1028
+ ≥5,7000,mme_total_score,634.8928571428571,
1029
+ ≥5,7000,mmmu_val_mmmu_acc,0.28778,
1030
+ ≥5,7000,mmstar_average,0.3047775080524821,
1031
+ ≥5,7000,ocrbench_ocrbench_accuracy,0.445,
1032
+ ≥5,7000,seedbench_seed_all,0.4633685380767093,
1033
+ ≥5,7000,textvqa_val_exact_match,0.4479200000000001,0.006776925561345115
1034
+ ≥5,8000,ai2d_exact_match,0.37694300518134716,0.008722348153640555
1035
+ ≥5,8000,average,0.4177104341751616,
1036
+ ≥5,8000,average_rank,4.3,
1037
+ ≥5,8000,chartqa_relaxed_overall,0.6052,0.009778109662477129
1038
+ ≥5,8000,docvqa_val_anls,0.5848226849030369,0.006118280955924086
1039
+ ≥5,8000,infovqa_val_anls,0.22716516383976268,0.006599069597925426
1040
+ ≥5,8000,mme_total_score,631.8299319727892,
1041
+ ≥5,8000,mmmu_val_mmmu_acc,0.28778,
1042
+ ≥5,8000,mmstar_average,0.310840196509451,
1043
+ ≥5,8000,ocrbench_ocrbench_accuracy,0.45,
1044
+ ≥5,8000,seedbench_seed_all,0.45714285714285713,
1045
+ ≥5,8000,textvqa_val_exact_match,0.4595,0.006799352633835655
1046
+ ≥5,9000,ai2d_exact_match,0.39345854922279794,0.008792480650628211
1047
+ ≥5,9000,average,0.4230005209124751,
1048
+ ≥5,9000,average_rank,4.1,
1049
+ ≥5,9000,chartqa_relaxed_overall,0.6164,0.009727191953761483
1050
+ ≥5,9000,docvqa_val_anls,0.5898502952284708,0.006210892818800316
1051
+ ≥5,9000,infovqa_val_anls,0.22700559034851783,0.006566637554657031
1052
+ ≥5,9000,mme_total_score,650.124149659864,
1053
+ ≥5,9000,mmmu_val_mmmu_acc,0.26333,
1054
+ ≥5,9000,mmstar_average,0.3054500143908104,
1055
+ ≥5,9000,ocrbench_ocrbench_accuracy,0.476,
1056
+ ≥5,9000,seedbench_seed_all,0.4744302390216787,
1057
+ ≥5,9000,textvqa_val_exact_match,0.46107999999999993,0.00679104339362331
1058
+ ≥5,10000,ai2d_exact_match,0.40770725388601037,0.008844516803704286
1059
+ ≥5,10000,average,0.42883369231582574,
1060
+ ≥5,10000,average_rank,4.3,
1061
+ ≥5,10000,chartqa_relaxed_overall,0.6212,0.009703704898413913
1062
+ ≥5,10000,docvqa_val_anls,0.6019271313791558,0.0061562654420724205
1063
+ ≥5,10000,infovqa_val_anls,0.22574970075829065,0.006593818910085297
1064
+ ≥5,10000,mme_total_score,596.4710884353742,
1065
+ ≥5,10000,mmmu_val_mmmu_acc,0.28778,
1066
+ ≥5,10000,mmstar_average,0.31747912258440025,
1067
+ ≥5,10000,ocrbench_ocrbench_accuracy,0.458,
1068
+ ≥5,10000,seedbench_seed_all,0.47204002223457475,
1069
+ ≥5,10000,textvqa_val_exact_match,0.46762,0.00680558375997701
1070
+ ≥5,11000,ai2d_exact_match,0.41224093264248707,0.008859453032358869
1071
+ ≥5,11000,average,0.43139135632851494,
1072
+ ≥5,11000,average_rank,4.3,
1073
+ ≥5,11000,chartqa_relaxed_overall,0.624,0.009689538423575438
1074
+ ≥5,11000,docvqa_val_anls,0.5969298636083556,0.006077525402912234
1075
+ ≥5,11000,infovqa_val_anls,0.23206384708038705,0.006649968873942653
1076
+ ≥5,11000,mme_total_score,618.6326530612246,
1077
+ ≥5,11000,mmmu_val_mmmu_acc,0.29778,
1078
+ ≥5,11000,mmstar_average,0.304166274020068,
1079
+ ≥5,11000,ocrbench_ocrbench_accuracy,0.467,
1080
+ ≥5,11000,seedbench_seed_all,0.4783212896053363,
1081
+ ≥5,11000,textvqa_val_exact_match,0.47002,0.006791013436350906
1082
+ ≥5,12000,ai2d_exact_match,0.41386010362694303,0.00886459927257348
1083
+ ≥5,12000,average,0.43780261093672457,
1084
+ ≥5,12000,average_rank,4.2,
1085
+ ≥5,12000,chartqa_relaxed_overall,0.63,0.00965801796044974
1086
+ ≥5,12000,docvqa_val_anls,0.61907502588855,0.006141216046133696
1087
+ ≥5,12000,infovqa_val_anls,0.23508371459325778,0.006715413220374958
1088
+ ≥5,12000,mme_total_score,636.8095238095239,
1089
+ ≥5,12000,mmmu_val_mmmu_acc,0.29444,
1090
+ ≥5,12000,mmstar_average,0.30800611068641726,
1091
+ ≥5,12000,ocrbench_ocrbench_accuracy,0.48,
1092
+ ≥5,12000,seedbench_seed_all,0.481378543635353,
1093
+ ≥5,12000,textvqa_val_exact_match,0.47838,0.0067952446747735285
1094
+ ≥5,13000,ai2d_exact_match,0.41968911917098445,0.008882309400443855
1095
+ ≥5,13000,average,0.44244351114513614,
1096
+ ≥5,13000,average_rank,4.2,
1097
+ ≥5,13000,chartqa_relaxed_overall,0.63,0.00965801796044974
1098
+ ≥5,13000,docvqa_val_anls,0.6332585799642098,0.006108497007314537
1099
+ ≥5,13000,infovqa_val_anls,0.23941240946565165,0.006705748427777414
1100
+ ≥5,13000,mme_total_score,626.8248299319728,
1101
+ ≥5,13000,mmmu_val_mmmu_acc,0.29444,
1102
+ ≥5,13000,mmstar_average,0.31812106368981574,
1103
+ ≥5,13000,ocrbench_ocrbench_accuracy,0.479,
1104
+ ≥5,13000,seedbench_seed_all,0.4867704280155642,
1105
+ ≥5,13000,textvqa_val_exact_match,0.4812999999999999,0.006807740525770151
1106
+ ≥5,14000,ai2d_exact_match,0.4180699481865285,0.008877517831066049
1107
+ ≥5,14000,average,0.43859864594625364,
1108
+ ≥5,14000,average_rank,4.7,
1109
+ ≥5,14000,chartqa_relaxed_overall,0.634,0.00963611653607192
1110
+ ≥5,14000,docvqa_val_anls,0.6228194930250945,0.00603170401976013
1111
+ ≥5,14000,infovqa_val_anls,0.2373742957554242,0.006749917839373105
1112
+ ≥5,14000,mme_total_score,640.9353741496599,
1113
+ ≥5,14000,mmmu_val_mmmu_acc,0.28111,
1114
+ ≥5,14000,mmstar_average,0.30870170300837935,
1115
+ ≥5,14000,ocrbench_ocrbench_accuracy,0.468,
1116
+ ≥5,14000,seedbench_seed_all,0.490272373540856,
1117
+ ≥5,14000,textvqa_val_exact_match,0.48704,0.006811555412490416
1118
+ ≥5,15000,ai2d_exact_match,0.42033678756476683,0.0088841985383291
1119
+ ≥5,15000,average,0.4493020549744483,
1120
+ ≥5,15000,average_rank,4.3,
1121
+ ≥5,15000,chartqa_relaxed_overall,0.634,0.00963611653607192
1122
+ ≥5,15000,docvqa_val_anls,0.6441936439366807,0.006083543225400507
1123
+ ≥5,15000,infovqa_val_anls,0.24225340680501783,0.006766995068281554
1124
+ ≥5,15000,mme_total_score,649.6683673469388,
1125
+ ≥5,15000,mmmu_val_mmmu_acc,0.31444,
1126
+ ≥5,15000,mmstar_average,0.3088639171639584,
1127
+ ≥5,15000,ocrbench_ocrbench_accuracy,0.49,
1128
+ ≥5,15000,seedbench_seed_all,0.4953307392996109,
1129
+ ≥5,15000,textvqa_val_exact_match,0.4943,0.006802702209524558
1130
+ ≥5,16000,ai2d_exact_match,0.43102331606217614,0.008913110733383512
1131
+ ≥5,16000,average,0.4547498341048439,
1132
+ ≥5,16000,average_rank,4.0,
1133
+ ≥5,16000,chartqa_relaxed_overall,0.6376,0.009615793331418735
1134
+ ≥5,16000,docvqa_val_anls,0.6400316893948637,0.006137343277865641
1135
+ ≥5,16000,infovqa_val_anls,0.2494232082698933,0.006878312285939229
1136
+ ≥5,16000,mme_total_score,651.4880952380952,
1137
+ ≥5,16000,mmmu_val_mmmu_acc,0.31111,
1138
+ ≥5,16000,mmstar_average,0.33139384518998005,
1139
+ ≥5,16000,ocrbench_ocrbench_accuracy,0.504,
1140
+ ≥5,16000,seedbench_seed_all,0.49160644802668146,
1141
+ ≥5,16000,textvqa_val_exact_match,0.49655999999999995,0.006809089955434884
1142
+ ≥5,17000,ai2d_exact_match,0.41547927461139894,0.008869646776634897
1143
+ ≥5,17000,average,0.45183488389514526,
1144
+ ≥5,17000,average_rank,4.2,
1145
+ ≥5,17000,chartqa_relaxed_overall,0.6444,0.009575809858898698
1146
+ ≥5,17000,docvqa_val_anls,0.6470904368022692,0.00610746328694398
1147
+ ≥5,17000,infovqa_val_anls,0.24587813104100764,0.006774660099659317
1148
+ ≥5,17000,mme_total_score,635.6700680272108,
1149
+ ≥5,17000,mmmu_val_mmmu_acc,0.29222,
1150
+ ≥5,17000,mmstar_average,0.3214527496221985,
1151
+ ≥5,17000,ocrbench_ocrbench_accuracy,0.51,
1152
+ ≥5,17000,seedbench_seed_all,0.49605336297943303,
1153
+ ≥5,17000,textvqa_val_exact_match,0.49394000000000005,0.006805286008944275
1154
+ ≥5,18000,ai2d_exact_match,0.4319948186528497,0.008915528710615484
1155
+ ≥5,18000,average,0.4524884361615038,
1156
+ ≥5,18000,average_rank,4.7,
1157
+ ≥5,18000,chartqa_relaxed_overall,0.6412,0.009594886593362934
1158
+ ≥5,18000,docvqa_val_anls,0.6417498056690655,0.006070355136614947
1159
+ ≥5,18000,infovqa_val_anls,0.2404225713513222,0.006730728710089941
1160
+ ≥5,18000,mme_total_score,655.829931972789,
1161
+ ≥5,18000,mmmu_val_mmmu_acc,0.29222,
1162
+ ≥5,18000,mmstar_average,0.32211620059741813,
1163
+ ≥5,18000,ocrbench_ocrbench_accuracy,0.513,
1164
+ ≥5,18000,seedbench_seed_all,0.4945525291828794,
1165
+ ≥5,18000,textvqa_val_exact_match,0.49513999999999997,0.0067971844482318305
1166
+ ≥5,19000,ai2d_exact_match,0.4213082901554404,0.008887002823098537
1167
+ ≥5,19000,average,0.4541437926528052,
1168
+ ≥5,19000,average_rank,4.7,
1169
+ ≥5,19000,chartqa_relaxed_overall,0.642,0.009590161024476605
1170
+ ≥5,19000,docvqa_val_anls,0.6518779003706476,0.006074501334886487
1171
+ ≥5,19000,infovqa_val_anls,0.24196990768518037,0.006752919839434985
1172
+ ≥5,19000,mme_total_score,646.6513605442177,
1173
+ ≥5,19000,mmmu_val_mmmu_acc,0.30111,
1174
+ ≥5,19000,mmstar_average,0.3147497032570857,
1175
+ ≥5,19000,ocrbench_ocrbench_accuracy,0.51,
1176
+ ≥5,19000,seedbench_seed_all,0.49699833240689273,
1177
+ ≥5,19000,textvqa_val_exact_match,0.5072800000000001,0.006794367741490103
app/src/content/assets/data/internal_deduplication.csv ADDED
@@ -0,0 +1,729 @@
1
+ run,step,metric,value,stderr
2
+ Baseline,300,ai2d_exact_match,0.2551813471502591,0.007846598309236504
3
+ Baseline,300,average,0.1836384379377178,
4
+ Baseline,300,average_rank,1.4444444444444444,
5
+ Baseline,300,chartqa_relaxed_overall,0.1328,0.006788526912302523
6
+ Baseline,300,docvqa_val_anls,0.1503143424142802,0.004151727384820528
7
+ Baseline,300,infovqa_val_anls,0.11374396685909084,0.005163280990095591
8
+ Baseline,300,mme_total_score,691.1952781112445,
9
+ Baseline,300,mmmu_val_mmmu_acc,0.26556,
10
+ Baseline,300,mmstar_average,0.2859278470781123,
11
+ Baseline,300,ocrbench_ocrbench_accuracy,0.149,
12
+ Baseline,300,textvqa_val_exact_match,0.11657999999999999,0.004405144921606561
13
+ Baseline,1500,ai2d_exact_match,0.27525906735751293,0.008038849490577982
14
+ Baseline,1500,average,0.318819844462715,
15
+ Baseline,1500,average_rank,1.2222222222222223,
16
+ Baseline,1500,chartqa_relaxed_overall,0.374,0.009679208378267924
17
+ Baseline,1500,docvqa_val_anls,0.437411196849637,0.0061765544267728045
18
+ Baseline,1500,infovqa_val_anls,0.21582289145457856,0.006873661480889723
19
+ Baseline,1500,mme_total_score,1066.704581832733,
20
+ Baseline,1500,mmmu_val_mmmu_acc,0.24,
21
+ Baseline,1500,mmstar_average,0.23474560003999134,
22
+ Baseline,1500,ocrbench_ocrbench_accuracy,0.411,
23
+ Baseline,1500,textvqa_val_exact_match,0.36232000000000003,0.006579840604488538
24
+ Baseline,2700,ai2d_exact_match,0.27849740932642486,0.008067913113285858
25
+ Baseline,2700,average,0.36471172748595665,
26
+ Baseline,2700,average_rank,1.4444444444444444,
27
+ Baseline,2700,chartqa_relaxed_overall,0.4624,0.00997367964766694
28
+ Baseline,2700,docvqa_val_anls,0.4953558755845657,0.006275075768152338
29
+ Baseline,2700,infovqa_val_anls,0.20975551937756792,0.006468441430093479
30
+ Baseline,2700,mme_total_score,1172.469887955182,
31
+ Baseline,2700,mmmu_val_mmmu_acc,0.27111,
32
+ Baseline,2700,mmstar_average,0.2503150155990948,
33
+ Baseline,2700,ocrbench_ocrbench_accuracy,0.486,
34
+ Baseline,2700,textvqa_val_exact_match,0.46426000000000006,0.006792330795207658
35
+ Baseline,3900,ai2d_exact_match,0.35038860103626945,0.008586842325753156
36
+ Baseline,3900,average,0.398537125609502,
37
+ Baseline,3900,average_rank,1.4444444444444444,
38
+ Baseline,3900,chartqa_relaxed_overall,0.4948,0.010001459677380663
39
+ Baseline,3900,docvqa_val_anls,0.5407649774017467,0.00626354456311192
40
+ Baseline,3900,infovqa_val_anls,0.22943878312324553,0.006664668392753554
41
+ Baseline,3900,mme_total_score,1168.9393757503,
42
+ Baseline,3900,mmmu_val_mmmu_acc,0.27,
43
+ Baseline,3900,mmstar_average,0.3015046433147543,
44
+ Baseline,3900,ocrbench_ocrbench_accuracy,0.517,
45
+ Baseline,3900,textvqa_val_exact_match,0.4844,0.006794038548018284
46
+ Baseline,5100,ai2d_exact_match,0.3898963730569948,0.008778252852376944
47
+ Baseline,5100,average,0.42767475240113806,
48
+ Baseline,5100,average_rank,1.2222222222222223,
49
+ Baseline,5100,chartqa_relaxed_overall,0.5264,0.009988048880946633
50
+ Baseline,5100,docvqa_val_anls,0.5781350651939515,0.006244324391533268
51
+ Baseline,5100,infovqa_val_anls,0.2546269175216946,0.007112814176935012
52
+ Baseline,5100,mme_total_score,1185.1023409363747,
53
+ Baseline,5100,mmmu_val_mmmu_acc,0.29222,
54
+ Baseline,5100,mmstar_average,0.33637966343646347,
55
+ Baseline,5100,ocrbench_ocrbench_accuracy,0.533,
56
+ Baseline,5100,textvqa_val_exact_match,0.51074,0.0068004249599511925
57
+ Baseline,6300,ai2d_exact_match,0.41515544041450775,0.00886864516657515
58
+ Baseline,6300,average,0.43890688312888254,
59
+ Baseline,6300,average_rank,1.4444444444444444,
60
+ Baseline,6300,chartqa_relaxed_overall,0.5388,0.0099718403035556
61
+ Baseline,6300,docvqa_val_anls,0.6024512173813115,0.006190216536053702
62
+ Baseline,6300,infovqa_val_anls,0.2548412895443468,0.007030638027408485
63
+ Baseline,6300,mme_total_score,1187.329431772709,
64
+ Baseline,6300,mmmu_val_mmmu_acc,0.30667,
65
+ Baseline,6300,mmstar_average,0.3500771176908943,
66
+ Baseline,6300,ocrbench_ocrbench_accuracy,0.516,
67
+ Baseline,6300,textvqa_val_exact_match,0.52726,0.006770298802059908
68
+ Baseline,7500,ai2d_exact_match,0.42972797927461137,0.008909832364541428
69
+ Baseline,7500,average,0.44878537461255386,
70
+ Baseline,7500,average_rank,1.3333333333333333,
71
+ Baseline,7500,chartqa_relaxed_overall,0.5728,0.009895414680177737
72
+ Baseline,7500,docvqa_val_anls,0.6164034078362094,0.006122657396260068
73
+ Baseline,7500,infovqa_val_anls,0.25244937386016403,0.006941949044716374
74
+ Baseline,7500,mme_total_score,1282.560024009604,
75
+ Baseline,7500,mmmu_val_mmmu_acc,0.29667,
76
+ Baseline,7500,mmstar_average,0.3339722359294459,
77
+ Baseline,7500,ocrbench_ocrbench_accuracy,0.558,
78
+ Baseline,7500,textvqa_val_exact_match,0.5302600000000001,0.0067524799649562395
79
+ Baseline,8700,ai2d_exact_match,0.44527202072538863,0.008945084019331404
80
+ Baseline,8700,average,0.4558942646480554,
81
+ Baseline,8700,average_rank,1.5555555555555556,
82
+ Baseline,8700,chartqa_relaxed_overall,0.5852,0.009855721084488851
83
+ Baseline,8700,docvqa_val_anls,0.6221835109907441,0.006147036255020746
84
+ Baseline,8700,infovqa_val_anls,0.25900127209441604,0.006885435292484948
85
+ Baseline,8700,mme_total_score,1182.047919167667,
86
+ Baseline,8700,mmmu_val_mmmu_acc,0.30333,
87
+ Baseline,8700,mmstar_average,0.3299073133738943,
88
+ Baseline,8700,ocrbench_ocrbench_accuracy,0.559,
89
+ Baseline,8700,textvqa_val_exact_match,0.54326,0.0067297527736521565
90
+ Baseline,9900,ai2d_exact_match,0.4520725388601036,0.008957715852675529
91
+ Baseline,9900,average,0.4655685311713072,
92
+ Baseline,9900,average_rank,1.5555555555555556,
93
+ Baseline,9900,chartqa_relaxed_overall,0.5888,0.009842996384797287
94
+ Baseline,9900,docvqa_val_anls,0.6443822232919176,0.006072644236356477
95
+ Baseline,9900,infovqa_val_anls,0.2707219279967856,0.007060292176646616
96
+ Baseline,9900,mme_total_score,1293.4631852741097,
97
+ Baseline,9900,mmmu_val_mmmu_acc,0.30444,
98
+ Baseline,9900,mmstar_average,0.34327155922165065,
99
+ Baseline,9900,ocrbench_ocrbench_accuracy,0.557,
100
+ Baseline,9900,textvqa_val_exact_match,0.56386,0.006703146016110842
101
+ Baseline,11100,ai2d_exact_match,0.4494818652849741,0.008953103134587198
102
+ Baseline,11100,average,0.471077301321738,
103
+ Baseline,11100,average_rank,1.6666666666666667,
104
+ Baseline,11100,chartqa_relaxed_overall,0.5948,0.009820578470976232
105
+ Baseline,11100,docvqa_val_anls,0.657973309294109,0.006015458191652746
106
+ Baseline,11100,infovqa_val_anls,0.29696232573726855,0.007574623301736419
107
+ Baseline,11100,mme_total_score,1338.3029211684673,
108
+ Baseline,11100,mmmu_val_mmmu_acc,0.29667,
109
+ Baseline,11100,mmstar_average,0.3394909102575524,
110
+ Baseline,11100,ocrbench_ocrbench_accuracy,0.565,
111
+ Baseline,11100,textvqa_val_exact_match,0.56824,0.006679879088496093
112
+ Baseline,12300,ai2d_exact_match,0.4676165803108808,0.008980259712600086
113
+ Baseline,12300,average,0.47342294699365395,
114
+ Baseline,12300,average_rank,1.5555555555555556,
115
+ Baseline,12300,chartqa_relaxed_overall,0.598,0.009808000752013664
116
+ Baseline,12300,docvqa_val_anls,0.6588847758219586,0.00602421968017162
117
+ Baseline,12300,infovqa_val_anls,0.2830975650419957,0.007216197962807829
118
+ Baseline,12300,mme_total_score,1269.7461984793918,
119
+ Baseline,12300,mmmu_val_mmmu_acc,0.28333,
120
+ Baseline,12300,mmstar_average,0.3693946547743964,
121
+ Baseline,12300,ocrbench_ocrbench_accuracy,0.559,
122
+ Baseline,12300,textvqa_val_exact_match,0.5680599999999999,0.006686980665598219
123
+ Baseline,13500,ai2d_exact_match,0.47085492227979275,0.008983852707691612
124
+ Baseline,13500,average,0.48226394524672617,
125
+ Baseline,13500,average_rank,1.5555555555555556,
126
+ Baseline,13500,chartqa_relaxed_overall,0.618,0.009719474639861454
127
+ Baseline,13500,docvqa_val_anls,0.6663692127257962,0.005978102603390597
128
+ Baseline,13500,infovqa_val_anls,0.32051341945189793,0.007779116582967409
129
+ Baseline,13500,mme_total_score,1202.768607442977,
130
+ Baseline,13500,mmmu_val_mmmu_acc,0.28,
131
+ Baseline,13500,mmstar_average,0.35477400751632243,
132
+ Baseline,13500,ocrbench_ocrbench_accuracy,0.569,
133
+ Baseline,13500,textvqa_val_exact_match,0.5785999999999999,0.006676145758177908
134
+ Baseline,14700,ai2d_exact_match,0.46567357512953367,0.008977921602780724
135
+ Baseline,14700,average,0.48621829332317545,
136
+ Baseline,14700,average_rank,1.5555555555555556,
137
+ Baseline,14700,chartqa_relaxed_overall,0.6296,0.0096601689190934
138
+ Baseline,14700,docvqa_val_anls,0.6810941724065047,0.005910647813959628
139
+ Baseline,14700,infovqa_val_anls,0.3016034504434661,0.007417514325399065
140
+ Baseline,14700,mme_total_score,1281.9612845138056,
141
+ Baseline,14700,mmmu_val_mmmu_acc,0.29778,
142
+ Baseline,14700,mmstar_average,0.365895148605899,
143
+ Baseline,14700,ocrbench_ocrbench_accuracy,0.562,
144
+ Baseline,14700,textvqa_val_exact_match,0.5861,0.006642001297519238
145
+ Baseline,15900,ai2d_exact_match,0.48186528497409326,0.008993233105757854
146
+ Baseline,15900,average,0.48999290982002447,
147
+ Baseline,15900,average_rank,1.5,
148
+ Baseline,15900,chartqa_relaxed_overall,0.64,0.009601920576192066
149
+ Baseline,15900,docvqa_val_anls,0.6858324657211811,0.00589619582327283
150
+ Baseline,15900,infovqa_val_anls,0.2913749730393032,0.007302812648430173
151
+ Baseline,15900,mme_total_score,1296.9955982392958,
152
+ Baseline,15900,mmmu_val_mmmu_acc,0.29111,
153
+ Baseline,15900,mmstar_average,0.35848055482561814,
154
+ Baseline,15900,ocrbench_ocrbench_accuracy,0.581,
155
+ Baseline,15900,textvqa_val_exact_match,0.59028,0.006635865524726405
156
+ Baseline,17100,ai2d_exact_match,0.4740932642487047,0.008987066275159845
157
+ Baseline,17100,average,0.4931189092163302,
158
+ Baseline,17100,average_rank,1.7777777777777777,
159
+ Baseline,17100,chartqa_relaxed_overall,0.644,0.009578219924326623
160
+ Baseline,17100,docvqa_val_anls,0.6847803896363295,0.005919128355709122
161
+ Baseline,17100,infovqa_val_anls,0.3018247984331409,0.007408081810180743
162
+ Baseline,17100,mme_total_score,1262.8012204881952,
163
+ Baseline,17100,mmmu_val_mmmu_acc,0.28444,
164
+ Baseline,17100,mmstar_average,0.36583282141246676,
165
+ Baseline,17100,ocrbench_ocrbench_accuracy,0.588,
166
+ Baseline,17100,textvqa_val_exact_match,0.6019800000000001,0.0065905009567234045
167
+ Baseline,18300,ai2d_exact_match,0.4876943005181347,0.008996428218289523
168
+ Baseline,18300,average,0.5004883767088391,
169
+ Baseline,18300,average_rank,1.5,
170
+ Baseline,18300,chartqa_relaxed_overall,0.652,0.00952862623294433
171
+ Baseline,18300,docvqa_val_anls,0.6975218894019752,0.005845051202995877
172
+ Baseline,18300,infovqa_val_anls,0.3185079040699619,0.007608667971660477
173
+ Baseline,18300,mme_total_score,1310.265706282513,
174
+ Baseline,18300,mmmu_val_mmmu_acc,0.29556,
175
+ Baseline,18300,mmstar_average,0.36108291968064027,
176
+ Baseline,18300,ocrbench_ocrbench_accuracy,0.588,
177
+ Baseline,18300,textvqa_val_exact_match,0.60354,0.006611280926348344
178
+ Baseline,19500,ai2d_exact_match,0.47765544041450775,0.00899016344465196
179
+ Baseline,19500,average,0.5040547762672563,
180
+ Baseline,19500,average_rank,1.4444444444444444,
181
+ Baseline,19500,chartqa_relaxed_overall,0.6552,0.009507962165354631
182
+ Baseline,19500,docvqa_val_anls,0.7041825239698998,0.005808767160221614
183
+ Baseline,19500,infovqa_val_anls,0.3209241432627218,0.007605560217474187
184
+ Baseline,19500,mme_total_score,1295.3964585834333,
185
+ Baseline,19500,mmmu_val_mmmu_acc,0.30333,
186
+ Baseline,19500,mmstar_average,0.35936610249092044,
187
+ Baseline,19500,ocrbench_ocrbench_accuracy,0.604,
188
+ Baseline,19500,textvqa_val_exact_match,0.60778,0.006595164407254131
189
+ Baseline,20700,ai2d_exact_match,0.49190414507772023,0.008997974381217105
190
+ Baseline,20700,average,0.5348651598748863,
191
+ Baseline,20700,average_rank,1.25,
192
+ Baseline,20700,chartqa_relaxed_overall,0.6472,0.009558734841217527
193
+ Baseline,20700,docvqa_val_anls,0.70377508713271,0.005815829966103309
194
+ Baseline,20700,infovqa_val_anls,0.31228879567103124,0.0074592773891107925
195
+ Baseline,20700,mme_total_score,1267.3561424569828,
196
+ Baseline,20700,mmstar_average,0.36086809124274183,
197
+ Baseline,20700,ocrbench_ocrbench_accuracy,0.605,
198
+ Baseline,20700,textvqa_val_exact_match,0.62302,0.006536647571369781
199
+ Baseline,21900,ai2d_exact_match,0.49125647668393785,0.008997778057794698
200
+ Baseline,21900,average,0.5035549318138456,
201
+ Baseline,21900,average_rank,1.4444444444444444,
202
+ Baseline,21900,chartqa_relaxed_overall,0.6556,0.009505345687488459
203
+ Baseline,21900,docvqa_val_anls,0.7044656227681543,0.005797355786446792
204
+ Baseline,21900,infovqa_val_anls,0.3214548388700204,0.007656455061893302
205
+ Baseline,21900,mme_total_score,1270.262104841937,
206
+ Baseline,21900,mmmu_val_mmmu_acc,0.28111,
207
+ Baseline,21900,mmstar_average,0.36167251618865237,
208
+ Baseline,21900,ocrbench_ocrbench_accuracy,0.597,
209
+ Baseline,21900,textvqa_val_exact_match,0.61588,0.006563701818052925
210
+ Baseline,23100,ai2d_exact_match,0.49319948186528495,0.008998321712163856
211
+ Baseline,23100,average,0.5385543058304301,
212
+ Baseline,23100,average_rank,1.5,
213
+ Baseline,23100,chartqa_relaxed_overall,0.6592,0.009481461028833927
214
+ Baseline,23100,docvqa_val_anls,0.7121972356483652,0.005769225218375019
215
+ Baseline,23100,infovqa_val_anls,0.31967136620122777,0.007611618366213475
216
+ Baseline,23100,mme_total_score,1318.2786114445778,
217
+ Baseline,23100,mmstar_average,0.3630320570981325,
218
+ Baseline,23100,ocrbench_ocrbench_accuracy,0.602,
219
+ Baseline,23100,textvqa_val_exact_match,0.62058,0.006524799408523169
220
+ Baseline,24300,ai2d_exact_match,0.49255181347150256,0.008998155599035915
221
+ Baseline,24300,average,0.5094308504545716,
222
+ Baseline,24300,average_rank,1.5555555555555556,
223
+ Baseline,24300,chartqa_relaxed_overall,0.6704,0.009403239035659185
224
+ Baseline,24300,docvqa_val_anls,0.7177853964151442,0.005720014481294498
225
+ Baseline,24300,infovqa_val_anls,0.31972012794378407,0.007606738233281323
226
+ Baseline,24300,mme_total_score,1306.592336934774,
227
+ Baseline,24300,mmmu_val_mmmu_acc,0.29778,
228
+ Baseline,24300,mmstar_average,0.37076946580614156,
229
+ Baseline,24300,ocrbench_ocrbench_accuracy,0.59,
230
+ Baseline,24300,textvqa_val_exact_match,0.6164400000000001,0.006543401905866729
231
+ Baseline,25500,ai2d_exact_match,0.501619170984456,0.008999106932714636
232
+ Baseline,25500,average,0.5486249165918439,
233
+ Baseline,25500,average_rank,1.625,
234
+ Baseline,25500,chartqa_relaxed_overall,0.6752,0.00936787525721462
235
+ Baseline,25500,docvqa_val_anls,0.7137288248520355,0.0057597420625403505
236
+ Baseline,25500,infovqa_val_anls,0.34135511904919924,0.0077802284678825705
237
+ Baseline,25500,mme_total_score,1323.6883753501402,
238
+ Baseline,25500,mmstar_average,0.369071301257217,
239
+ Baseline,25500,ocrbench_ocrbench_accuracy,0.619,
240
+ Baseline,25500,textvqa_val_exact_match,0.6204,0.00653548089294892
241
+ Baseline,26700,ai2d_exact_match,0.4990284974093264,0.008999137132137064
242
+ Baseline,26700,average,0.5171016246428288,
243
+ Baseline,26700,average_rank,1.4444444444444444,
244
+ Baseline,26700,chartqa_relaxed_overall,0.6712,0.009397422445513864
245
+ Baseline,26700,docvqa_val_anls,0.7233130041233962,0.005709000608468465
246
+ Baseline,26700,infovqa_val_anls,0.34093933218960265,0.007871398735359877
247
+ Baseline,26700,mme_total_score,1290.1798719487797,
248
+ Baseline,26700,mmmu_val_mmmu_acc,0.29889,
249
+ Baseline,26700,mmstar_average,0.3681821634203056,
250
+ Baseline,26700,ocrbench_ocrbench_accuracy,0.602,
251
+ Baseline,26700,textvqa_val_exact_match,0.63326,0.006491932186699375
252
+ Baseline,27900,ai2d_exact_match,0.49773316062176165,0.008999061633391479
253
+ Baseline,27900,average,0.5456332793229398,
254
+ Baseline,27900,average_rank,1.625,
255
+ Baseline,27900,chartqa_relaxed_overall,0.6756,0.009364877808842454
256
+ Baseline,27900,docvqa_val_anls,0.7132690678246167,0.00575358310740901
257
+ Baseline,27900,infovqa_val_anls,0.3362338249924974,0.007684149470716349
258
+ Baseline,27900,mme_total_score,1267.1172468987595,
259
+ Baseline,27900,mmstar_average,0.3725169018217032,
260
+ Baseline,27900,ocrbench_ocrbench_accuracy,0.599,
261
+ Baseline,27900,textvqa_val_exact_match,0.62508,0.006518059200340837
262
+ Baseline,29100,ai2d_exact_match,0.5019430051813472,0.008999086170553228
263
+ Baseline,29100,average,0.5238317316407767,
264
+ Baseline,29100,average_rank,1.0,
265
+ Baseline,29100,chartqa_relaxed_overall,0.6828,0.009309582768982347
266
+ Baseline,29100,docvqa_val_anls,0.7233823673869951,0.005705166797815572
267
+ Baseline,29100,infovqa_val_anls,0.34214735285161113,0.007759163899965965
268
+ Baseline,29100,mme_total_score,1321.8040216086433,
269
+ Baseline,29100,mmmu_val_mmmu_acc,0.31222,
270
+ Baseline,29100,mmstar_average,0.3709411277062599,
271
+ Baseline,29100,ocrbench_ocrbench_accuracy,0.622,
272
+ Baseline,29100,textvqa_val_exact_match,0.6352199999999999,0.00647159073314463
273
+ Baseline,30300,ai2d_exact_match,0.5055051813471503,0.008998608627616667
274
+ Baseline,30300,average,0.5497034826600226,
275
+ Baseline,30300,average_rank,1.375,
276
+ Baseline,30300,chartqa_relaxed_overall,0.6784,0.009343676884347384
277
+ Baseline,30300,docvqa_val_anls,0.7227075209990185,0.005720573311731873
278
+ Baseline,30300,infovqa_val_anls,0.33249900926543363,0.007751325884024483
279
+ Baseline,30300,mme_total_score,1290.3790516206482,
280
+ Baseline,30300,mmstar_average,0.36331266700855536,
281
+ Baseline,30300,ocrbench_ocrbench_accuracy,0.612,
282
+ Baseline,30300,textvqa_val_exact_match,0.6335,0.006488911402865572
283
+ Baseline,31500,ai2d_exact_match,0.4993523316062176,0.008999146569435543
284
+ Baseline,31500,average,0.5220721222554265,
285
+ Baseline,31500,average_rank,1.5555555555555556,
286
+ Baseline,31500,chartqa_relaxed_overall,0.6872,0.009274528060677767
287
+ Baseline,31500,docvqa_val_anls,0.732681296661989,0.005643494305560718
288
+ Baseline,31500,infovqa_val_anls,0.34453436089995576,0.007841367492503165
289
+ Baseline,31500,mme_total_score,1304.8996598639455,
290
+ Baseline,31500,mmmu_val_mmmu_acc,0.29444,
291
+ Baseline,31500,mmstar_average,0.37192898887525,
292
+ Baseline,31500,ocrbench_ocrbench_accuracy,0.61,
293
+ Baseline,31500,textvqa_val_exact_match,0.63644,0.006473052244580776
294
+ Baseline,32700,ai2d_exact_match,0.49870466321243523,0.00899912391990207
295
+ Baseline,32700,average,0.5546837276191249,
296
+ Baseline,32700,average_rank,1.5,
297
+ Baseline,32700,chartqa_relaxed_overall,0.68,0.009331389496316869
298
+ Baseline,32700,docvqa_val_anls,0.7278962076951819,0.005686137433507678
299
+ Baseline,32700,infovqa_val_anls,0.3359004823603636,0.007743137801806592
300
+ Baseline,32700,mme_total_score,1329.2223889555821,
301
+ Baseline,32700,mmstar_average,0.3761847400658931,
302
+ Baseline,32700,ocrbench_ocrbench_accuracy,0.626,
303
+ Baseline,32700,textvqa_val_exact_match,0.6381000000000001,0.006469625121275727
304
+ Baseline,33900,ai2d_exact_match,0.5019430051813472,0.00899908617055323
305
+ Baseline,33900,average,0.5185104134885045,
306
+ Baseline,33900,average_rank,1.5555555555555556,
307
+ Baseline,33900,chartqa_relaxed_overall,0.6784,0.009343676884347384
308
+ Baseline,33900,docvqa_val_anls,0.7328401883203162,0.005641229328683336
309
+ Baseline,33900,infovqa_val_anls,0.33727943427582574,0.0077500601420040695
310
+ Baseline,33900,mme_total_score,1330.3196278511405,
311
+ Baseline,33900,mmmu_val_mmmu_acc,0.28,
312
+ Baseline,33900,mmstar_average,0.3640006801305467,
313
+ Baseline,33900,ocrbench_ocrbench_accuracy,0.617,
314
+ Baseline,33900,textvqa_val_exact_match,0.63662,0.006467562214018388
315
+ Baseline,35100,ai2d_exact_match,0.5029145077720207,0.008999001233939133
316
+ Baseline,35100,average,0.5522905800868071,
317
+ Baseline,35100,average_rank,1.625,
318
+ Baseline,35100,chartqa_relaxed_overall,0.68,0.009331389496316869
319
+ Baseline,35100,docvqa_val_anls,0.7269648828481717,0.005683622810231662
320
+ Baseline,35100,infovqa_val_anls,0.33846207838337145,0.00774681529996113
321
+ Baseline,35100,mme_total_score,1299.1129451780712,
322
+ Baseline,35100,mmstar_average,0.36183259160408615,
323
+ Baseline,35100,ocrbench_ocrbench_accuracy,0.616,
324
+ Baseline,35100,textvqa_val_exact_match,0.63986,0.0064564830453322595
325
+ Baseline,36300,ai2d_exact_match,0.501619170984456,0.008999106932714636
326
+ Baseline,36300,average,0.5203510175588769,
327
+ Baseline,36300,average_rank,1.4444444444444444,
328
+ Baseline,36300,chartqa_relaxed_overall,0.6808,0.009325198535746702
329
+ Baseline,36300,docvqa_val_anls,0.7270212281583848,0.0056833541878296414
330
+ Baseline,36300,infovqa_val_anls,0.3340392024865933,0.007611756166885497
331
+ Baseline,36300,mme_total_score,1280.1442577030812,
332
+ Baseline,36300,mmmu_val_mmmu_acc,0.30111,
333
+ Baseline,36300,mmstar_average,0.36247853884158143,
334
+ Baseline,36300,ocrbench_ocrbench_accuracy,0.615,
335
+ Baseline,36300,textvqa_val_exact_match,0.64074,0.0064493076522863105
336
+ Baseline,37500,ai2d_exact_match,0.5074481865284974,0.008998155599035891
337
+ Baseline,37500,average,0.5599086924183005,
338
+ Baseline,37500,average_rank,1.25,
339
+ Baseline,37500,chartqa_relaxed_overall,0.69,0.009251715392027472
340
+ Baseline,37500,docvqa_val_anls,0.7338638293909314,0.005628628195159443
341
+ Baseline,37500,infovqa_val_anls,0.35075945776545553,0.007880392253956911
342
+ Baseline,37500,mme_total_score,1308.0833333333333,
343
+ Baseline,37500,mmstar_average,0.37624937324321944,
344
+ Baseline,37500,ocrbench_ocrbench_accuracy,0.622,
345
+ Baseline,37500,textvqa_val_exact_match,0.63904,0.006478670412520058
346
+ Baseline,38700,ai2d_exact_match,0.5,0.008999154119267315
347
+ Baseline,38700,average,0.5225140432328732,
348
+ Baseline,38700,average_rank,1.5555555555555556,
349
+ Baseline,38700,chartqa_relaxed_overall,0.6832,0.009306435832216308
350
+ Baseline,38700,docvqa_val_anls,0.73088808708227,0.00563114482117092
351
+ Baseline,38700,infovqa_val_anls,0.3478216232204623,0.00789714223139076
352
+ Baseline,38700,mme_total_score,1277.5526210484195,
353
+ Baseline,38700,mmmu_val_mmmu_acc,0.28667,
354
+ Baseline,38700,mmstar_average,0.3681926355602532,
355
+ Baseline,38700,ocrbench_ocrbench_accuracy,0.624,
356
+ Baseline,38700,textvqa_val_exact_match,0.6393399999999999,0.00647079957419683
357
+ Baseline,39900,ai2d_exact_match,0.5058290155440415,0.008998542562369288
358
+ Baseline,39900,average,0.5567573845010034,
359
+ Baseline,39900,average_rank,1.375,
360
+ Baseline,39900,chartqa_relaxed_overall,0.6788,0.00934061683451043
361
+ Baseline,39900,docvqa_val_anls,0.7307115103048833,0.005666517404544185
362
+ Baseline,39900,infovqa_val_anls,0.3519024541637205,0.007911172051974351
363
+ Baseline,39900,mme_total_score,1294.3033213285314,
364
+ Baseline,39900,mmstar_average,0.36969871149437833,
365
+ Baseline,39900,ocrbench_ocrbench_accuracy,0.619,
366
+ Baseline,39900,textvqa_val_exact_match,0.6413599999999999,0.006448549204074314
367
+ Internal Deduplication,300,ai2d_exact_match,0.2503238341968912,0.007796858242572104
368
+ Internal Deduplication,300,average,0.19412722789194248,
369
+ Internal Deduplication,300,average_rank,1.5555555555555556,
370
+ Internal Deduplication,300,chartqa_relaxed_overall,0.1412,0.0069659481604092775
371
+ Internal Deduplication,300,docvqa_val_anls,0.15637861297756628,0.004267695603476823
372
+ Internal Deduplication,300,infovqa_val_anls,0.1042887841127396,0.005046536381262501
373
+ Internal Deduplication,300,mme_total_score,598.6149459783913,
374
+ Internal Deduplication,300,mmmu_val_mmmu_acc,0.26556,
375
+ Internal Deduplication,300,mmstar_average,0.2694265918483427,
376
+ Internal Deduplication,300,ocrbench_ocrbench_accuracy,0.167,
377
+ Internal Deduplication,300,textvqa_val_exact_match,0.19884000000000002,0.005492264002465154
378
+ Internal Deduplication,1500,ai2d_exact_match,0.27299222797927464,0.008018190192865413
379
+ Internal Deduplication,1500,average,0.31955460499150806,
380
+ Internal Deduplication,1500,average_rank,1.7777777777777777,
381
+ Internal Deduplication,1500,chartqa_relaxed_overall,0.3708,0.00966231277258432
382
+ Internal Deduplication,1500,docvqa_val_anls,0.42768709568231533,0.006154040400291129
383
+ Internal Deduplication,1500,infovqa_val_anls,0.2099303690224102,0.00676857279363082
384
+ Internal Deduplication,1500,mme_total_score,992.9132653061225,
385
+ Internal Deduplication,1500,mmmu_val_mmmu_acc,0.26889,
386
+ Internal Deduplication,1500,mmstar_average,0.21057714724806412,
387
+ Internal Deduplication,1500,ocrbench_ocrbench_accuracy,0.404,
388
+ Internal Deduplication,1500,textvqa_val_exact_match,0.39155999999999996,0.006665511164780805
389
+ Internal Deduplication,2700,ai2d_exact_match,0.295660621761658,0.008213332656949247
390
+ Internal Deduplication,2700,average,0.36762151428382045,
391
+ Internal Deduplication,2700,average_rank,1.5555555555555556,
392
+ Internal Deduplication,2700,chartqa_relaxed_overall,0.4752,0.009989689762981844
393
+ Internal Deduplication,2700,docvqa_val_anls,0.5094800317043119,0.006254649346492251
394
+ Internal Deduplication,2700,infovqa_val_anls,0.20719401979989327,0.006520807933324386
395
+ Internal Deduplication,2700,mme_total_score,1071.3925570228091,
396
+ Internal Deduplication,2700,mmmu_val_mmmu_acc,0.27,
397
+ Internal Deduplication,2700,mmstar_average,0.2397774410047003,
398
+ Internal Deduplication,2700,ocrbench_ocrbench_accuracy,0.494,
399
+ Internal Deduplication,2700,textvqa_val_exact_match,0.44965999999999995,0.006770608917152268
400
+ Internal Deduplication,3900,ai2d_exact_match,0.35751295336787564,0.008626006165018857
401
+ Internal Deduplication,3900,average,0.40092708598125315,
402
+ Internal Deduplication,3900,average_rank,1.5555555555555556,
403
+ Internal Deduplication,3900,chartqa_relaxed_overall,0.5108,0.009999667061284322
404
+ Internal Deduplication,3900,docvqa_val_anls,0.5404721998847206,0.0062378368939630035
405
+ Internal Deduplication,3900,infovqa_val_anls,0.22349780573998537,0.006643570027298634
406
+ Internal Deduplication,3900,mme_total_score,1134.516706682673,
407
+ Internal Deduplication,3900,mmmu_val_mmmu_acc,0.29111,
408
+ Internal Deduplication,3900,mmstar_average,0.27976372885744333,
409
+ Internal Deduplication,3900,ocrbench_ocrbench_accuracy,0.51,
410
+ Internal Deduplication,3900,textvqa_val_exact_match,0.49426000000000003,0.006797576913163843
411
+ Internal Deduplication,5100,ai2d_exact_match,0.38827720207253885,0.008771623130477878
412
+ Internal Deduplication,5100,average,0.4219485735226934,
413
+ Internal Deduplication,5100,average_rank,1.7777777777777777,
414
+ Internal Deduplication,5100,chartqa_relaxed_overall,0.5236,0.009990852959439592
415
+ Internal Deduplication,5100,docvqa_val_anls,0.5747949496010799,0.006245322873999332
416
+ Internal Deduplication,5100,infovqa_val_anls,0.2283558074433608,0.006643505571541433
417
+ Internal Deduplication,5100,mme_total_score,1120.3775510204082,
418
+ Internal Deduplication,5100,mmmu_val_mmmu_acc,0.27444,
419
+ Internal Deduplication,5100,mmstar_average,0.32262062906456745,
420
+ Internal Deduplication,5100,ocrbench_ocrbench_accuracy,0.546,
421
+ Internal Deduplication,5100,textvqa_val_exact_match,0.5175,0.006791610648074506
422
+ Internal Deduplication,6300,ai2d_exact_match,0.3947538860103627,0.008797532848529212
423
+ Internal Deduplication,6300,average,0.4392913905300591,
424
+ Internal Deduplication,6300,average_rank,1.5555555555555556,
425
+ Internal Deduplication,6300,chartqa_relaxed_overall,0.554,0.009943497838271193
426
+ Internal Deduplication,6300,docvqa_val_anls,0.6054354573141266,0.006148692369883667
427
+ Internal Deduplication,6300,infovqa_val_anls,0.2479668172159887,0.006849066135124891
428
+ Internal Deduplication,6300,mme_total_score,1120.747699079632,
429
+ Internal Deduplication,6300,mmmu_val_mmmu_acc,0.28222,
430
+ Internal Deduplication,6300,mmstar_average,0.33081496369999497,
431
+ Internal Deduplication,6300,ocrbench_ocrbench_accuracy,0.562,
432
+ Internal Deduplication,6300,textvqa_val_exact_match,0.53714,0.00675218797787041
433
+ Internal Deduplication,7500,ai2d_exact_match,0.4368523316062176,0.008927095061184939
434
+ Internal Deduplication,7500,average,0.4484625925841701,
435
+ Internal Deduplication,7500,average_rank,1.6666666666666667,
436
+ Internal Deduplication,7500,chartqa_relaxed_overall,0.5716,0.009898917689756362
437
+ Internal Deduplication,7500,docvqa_val_anls,0.6158904129878224,0.006156668221029065
438
+ Internal Deduplication,7500,infovqa_val_anls,0.2491041330885082,0.006950914810318631
439
+ Internal Deduplication,7500,mme_total_score,1182.0997398959585,
440
+ Internal Deduplication,7500,mmmu_val_mmmu_acc,0.30222,
441
+ Internal Deduplication,7500,mmstar_average,0.3126938629908125,
442
+ Internal Deduplication,7500,ocrbench_ocrbench_accuracy,0.554,
443
+ Internal Deduplication,7500,textvqa_val_exact_match,0.5453399999999999,0.006743052026354684
444
+ Internal Deduplication,8700,ai2d_exact_match,0.43555699481865284,0.008924095913829722
445
+ Internal Deduplication,8700,average,0.4610890710492869,
446
+ Internal Deduplication,8700,average_rank,1.4444444444444444,
447
+ Internal Deduplication,8700,chartqa_relaxed_overall,0.5856,0.009854334029231191
448
+ Internal Deduplication,8700,docvqa_val_anls,0.6337792662388687,0.006121292484093459
449
+ Internal Deduplication,8700,infovqa_val_anls,0.3014589775424448,0.007723778532370607
450
+ Internal Deduplication,8700,mme_total_score,1146.702080832333,
451
+ Internal Deduplication,8700,mmmu_val_mmmu_acc,0.28111,
452
+ Internal Deduplication,8700,mmstar_average,0.34138732979432873,
453
+ Internal Deduplication,8700,ocrbench_ocrbench_accuracy,0.554,
454
+ Internal Deduplication,8700,textvqa_val_exact_match,0.5558200000000001,0.006722310868494742
455
+ Internal Deduplication,9900,ai2d_exact_match,0.4530440414507772,0.008959382447335284
456
+ Internal Deduplication,9900,average,0.4640919637505932,
457
+ Internal Deduplication,9900,average_rank,1.4444444444444444,
458
+ Internal Deduplication,9900,chartqa_relaxed_overall,0.596,0.009815912634917984
459
+ Internal Deduplication,9900,docvqa_val_anls,0.6449581300442709,0.006031449307242489
460
+ Internal Deduplication,9900,infovqa_val_anls,0.2651241729320676,0.007027677036596941
461
+ Internal Deduplication,9900,mme_total_score,1198.2277911164465,
462
+ Internal Deduplication,9900,mmmu_val_mmmu_acc,0.28,
463
+ Internal Deduplication,9900,mmstar_average,0.33564936557763,
464
+ Internal Deduplication,9900,ocrbench_ocrbench_accuracy,0.571,
465
+ Internal Deduplication,9900,textvqa_val_exact_match,0.5669599999999999,0.0067004067615447065
466
+ Internal Deduplication,11100,ai2d_exact_match,0.4566062176165803,0.008965198879336196
467
+ Internal Deduplication,11100,average,0.4745786301209996,
468
+ Internal Deduplication,11100,average_rank,1.3333333333333333,
469
+ Internal Deduplication,11100,chartqa_relaxed_overall,0.608,0.00976588700628918
470
+ Internal Deduplication,11100,docvqa_val_anls,0.6596743239996393,0.005996833864420919
471
+ Internal Deduplication,11100,infovqa_val_anls,0.30142039609988674,0.0075421730872732295
472
+ Internal Deduplication,11100,mme_total_score,1136.5589235694279,
473
+ Internal Deduplication,11100,mmmu_val_mmmu_acc,0.29,
474
+ Internal Deduplication,11100,mmstar_average,0.32532810325189065,
475
+ Internal Deduplication,11100,ocrbench_ocrbench_accuracy,0.586,
476
+ Internal Deduplication,11100,textvqa_val_exact_match,0.5696,0.00669753233570974
477
+ Internal Deduplication,12300,ai2d_exact_match,0.47085492227979275,0.0089838527076916
478
+ Internal Deduplication,12300,average,0.47675266119609205,
479
+ Internal Deduplication,12300,average_rank,1.4444444444444444,
480
+ Internal Deduplication,12300,chartqa_relaxed_overall,0.6024,0.009789996609470577
481
+ Internal Deduplication,12300,docvqa_val_anls,0.6541921314490913,0.0059901948837693935
482
+ Internal Deduplication,12300,infovqa_val_anls,0.26890492643687214,0.0068929334847927185
483
+ Internal Deduplication,12300,mme_total_score,1180.1697679071628,
484
+ Internal Deduplication,12300,mmmu_val_mmmu_acc,0.30111,
485
+ Internal Deduplication,12300,mmstar_average,0.3420593094029801,
486
+ Internal Deduplication,12300,ocrbench_ocrbench_accuracy,0.588,
487
+ Internal Deduplication,12300,textvqa_val_exact_match,0.5865000000000001,0.006650353031162167
488
+ Internal Deduplication,13500,ai2d_exact_match,0.4689119170984456,0.008981742470016596
489
+ Internal Deduplication,13500,average,0.477194042186954,
490
+ Internal Deduplication,13500,average_rank,1.4444444444444444,
491
+ Internal Deduplication,13500,chartqa_relaxed_overall,0.6076,0.009767653701044555
492
+ Internal Deduplication,13500,docvqa_val_anls,0.6669529256090054,0.005964340335624923
493
+ Internal Deduplication,13500,infovqa_val_anls,0.28048200541677026,0.00715533754622952
494
+ Internal Deduplication,13500,mme_total_score,1205.548119247699,
495
+ Internal Deduplication,13500,mmmu_val_mmmu_acc,0.28556,
496
+ Internal Deduplication,13500,mmstar_average,0.3358454893714108,
497
+ Internal Deduplication,13500,ocrbench_ocrbench_accuracy,0.589,
498
+ Internal Deduplication,13500,textvqa_val_exact_match,0.5832,0.006654352566675162
499
+ Internal Deduplication,14700,ai2d_exact_match,0.47733160621761656,0.008989900821900263
500
+ Internal Deduplication,14700,average,0.4884023663438535,
501
+ Internal Deduplication,14700,average_rank,1.4444444444444444,
502
+ Internal Deduplication,14700,chartqa_relaxed_overall,0.6304,0.009655859891905061
503
+ Internal Deduplication,14700,docvqa_val_anls,0.6801802838124448,0.005922660123416213
504
+ Internal Deduplication,14700,infovqa_val_anls,0.306442807638199,0.007585813874676366
505
+ Internal Deduplication,14700,mme_total_score,1141.5065026010404,
506
+ Internal Deduplication,14700,mmmu_val_mmmu_acc,0.28556,
507
+ Internal Deduplication,14700,mmstar_average,0.3313042330825678,
508
+ Internal Deduplication,14700,ocrbench_ocrbench_accuracy,0.601,
509
+ Internal Deduplication,14700,textvqa_val_exact_match,0.595,0.006618682753560443
510
+ Internal Deduplication,15900,ai2d_exact_match,0.48737046632124353,0.0089962828388782
511
+ Internal Deduplication,15900,average,0.5203517701538484,
512
+ Internal Deduplication,15900,average_rank,1.5,
513
+ Internal Deduplication,15900,chartqa_relaxed_overall,0.6268,0.009675026948726469
514
+ Internal Deduplication,15900,docvqa_val_anls,0.6832159326200654,0.005900840845629961
515
+ Internal Deduplication,15900,infovqa_val_anls,0.3152545751330662,0.007651477632904633
516
+ Internal Deduplication,15900,mme_total_score,1225.4948979591836,
517
+ Internal Deduplication,15900,mmstar_average,0.32764141700256333,
518
+ Internal Deduplication,15900,ocrbench_ocrbench_accuracy,0.603,
519
+ Internal Deduplication,15900,textvqa_val_exact_match,0.5991799999999999,0.006605224547149299
520
+ Internal Deduplication,17100,ai2d_exact_match,0.47636010362694303,0.008989090232793597
521
+ Internal Deduplication,17100,average,0.4961663419392575,
522
+ Internal Deduplication,17100,average_rank,1.2222222222222223,
523
+ Internal Deduplication,17100,chartqa_relaxed_overall,0.6464,0.009563650001989001
524
+ Internal Deduplication,17100,docvqa_val_anls,0.6927261914773173,0.005861047908265113
525
+ Internal Deduplication,17100,infovqa_val_anls,0.3154358494585615,0.00763456160506387
526
+ Internal Deduplication,17100,mme_total_score,1286.2750100040016,
527
+ Internal Deduplication,17100,mmmu_val_mmmu_acc,0.29889,
528
+ Internal Deduplication,17100,mmstar_average,0.34921859095123836,
529
+ Internal Deduplication,17100,ocrbench_ocrbench_accuracy,0.587,
530
+ Internal Deduplication,17100,textvqa_val_exact_match,0.6033,0.006602767700613255
531
+ Internal Deduplication,18300,ai2d_exact_match,0.4786269430051813,0.008990928596702264
532
+ Internal Deduplication,18300,average,0.5266473503807093,
533
+ Internal Deduplication,18300,average_rank,1.5,
534
+ Internal Deduplication,18300,chartqa_relaxed_overall,0.6552,0.009507962165354631
535
+ Internal Deduplication,18300,docvqa_val_anls,0.6989798369115747,0.00583327960847754
536
+ Internal Deduplication,18300,infovqa_val_anls,0.31662733272229215,0.00758318378302427
537
+ Internal Deduplication,18300,mme_total_score,1217.9891956782712,
538
+ Internal Deduplication,18300,mmstar_average,0.3360973400259174,
539
+ Internal Deduplication,18300,ocrbench_ocrbench_accuracy,0.595,
540
+ Internal Deduplication,18300,textvqa_val_exact_match,0.6060000000000001,0.006592108249887561
541
+ Internal Deduplication,19500,ai2d_exact_match,0.4896373056994819,0.008997221155546277
542
+ Internal Deduplication,19500,average,0.5003413312777834,
543
+ Internal Deduplication,19500,average_rank,1.5555555555555556,
544
+ Internal Deduplication,19500,chartqa_relaxed_overall,0.6508,0.009536252935404934
545
+ Internal Deduplication,19500,docvqa_val_anls,0.7013552478733074,0.005824977752328648
546
+ Internal Deduplication,19500,infovqa_val_anls,0.32620790060169225,0.007764453086996403
547
+ Internal Deduplication,19500,mme_total_score,1299.4400760304122,
548
+ Internal Deduplication,19500,mmmu_val_mmmu_acc,0.29556,
549
+ Internal Deduplication,19500,mmstar_average,0.3368301960477849,
550
+ Internal Deduplication,19500,ocrbench_ocrbench_accuracy,0.593,
551
+ Internal Deduplication,19500,textvqa_val_exact_match,0.60934,0.006559905437723197
552
+ Internal Deduplication,20700,ai2d_exact_match,0.4889896373056995,0.008996971954224612
553
+ Internal Deduplication,20700,average,0.5296276786578733,
554
+ Internal Deduplication,20700,average_rank,1.75,
555
+ Internal Deduplication,20700,chartqa_relaxed_overall,0.6444,0.009575809858898698
556
+ Internal Deduplication,20700,docvqa_val_anls,0.6989112987356239,0.00585808944665685
557
+ Internal Deduplication,20700,infovqa_val_anls,0.3158264619814475,0.007568423570507376
558
+ Internal Deduplication,20700,mme_total_score,1174.7768107242898,
559
+ Internal Deduplication,20700,mmstar_average,0.33400635258234235,
560
+ Internal Deduplication,20700,ocrbench_ocrbench_accuracy,0.614,
561
+ Internal Deduplication,20700,textvqa_val_exact_match,0.6112599999999999,0.0065589363778955695
562
+ Internal Deduplication,21900,ai2d_exact_match,0.4957901554404145,0.008998835133354702
563
+ Internal Deduplication,21900,average,0.5035083877228906,
564
+ Internal Deduplication,21900,average_rank,1.5555555555555556,
565
+ Internal Deduplication,21900,chartqa_relaxed_overall,0.64,0.009601920576192066
566
+ Internal Deduplication,21900,docvqa_val_anls,0.7037412472922321,0.005813532329025727
567
+ Internal Deduplication,21900,infovqa_val_anls,0.3194560697014221,0.007649647661031666
568
+ Internal Deduplication,21900,mme_total_score,1199.6734693877552,
569
+ Internal Deduplication,21900,mmmu_val_mmmu_acc,0.30889,
570
+ Internal Deduplication,21900,mmstar_average,0.33692962934905674,
571
+ Internal Deduplication,21900,ocrbench_ocrbench_accuracy,0.603,
572
+ Internal Deduplication,21900,textvqa_val_exact_match,0.6202599999999999,0.006539392877923941
573
+ Internal Deduplication,23100,ai2d_exact_match,0.4944948186528497,0.008998608627616672
574
+ Internal Deduplication,23100,average,0.5413853458503779,
575
+ Internal Deduplication,23100,average_rank,1.5,
576
+ Internal Deduplication,23100,chartqa_relaxed_overall,0.646,0.009566096595876119
577
+ Internal Deduplication,23100,docvqa_val_anls,0.7101587999220607,0.005806193919644477
578
+ Internal Deduplication,23100,infovqa_val_anls,0.336754873549068,0.007886540099947482
579
+ Internal Deduplication,23100,mme_total_score,1316.6187474989997,
580
+ Internal Deduplication,23100,mmstar_average,0.3476289288286667,
581
+ Internal Deduplication,23100,ocrbench_ocrbench_accuracy,0.627,
582
+ Internal Deduplication,23100,textvqa_val_exact_match,0.62766,0.006520482207447814
583
+ Internal Deduplication,24300,ai2d_exact_match,0.4899611398963731,0.008997340090107673
584
+ Internal Deduplication,24300,average,0.5100750686661266,
585
+ Internal Deduplication,24300,average_rank,1.4444444444444444,
586
+ Internal Deduplication,24300,chartqa_relaxed_overall,0.6516,0.009531175862679805
587
+ Internal Deduplication,24300,docvqa_val_anls,0.7179021844889384,0.005742973360829408
588
+ Internal Deduplication,24300,infovqa_val_anls,0.3358758923979091,0.007878017215252312
589
+ Internal Deduplication,24300,mme_total_score,1409.844237695078,
590
+ Internal Deduplication,24300,mmmu_val_mmmu_acc,0.28556,
591
+ Internal Deduplication,24300,mmstar_average,0.3347613325457924,
592
+ Internal Deduplication,24300,ocrbench_ocrbench_accuracy,0.634,
593
+ Internal Deduplication,24300,textvqa_val_exact_match,0.63094,0.006498229657201687
594
+ Internal Deduplication,25500,ai2d_exact_match,0.48607512953367876,0.008995663534025174
595
+ Internal Deduplication,25500,average,0.5472398215745332,
596
+ Internal Deduplication,25500,average_rank,1.375,
597
+ Internal Deduplication,25500,chartqa_relaxed_overall,0.6536,0.0095183536193109
598
+ Internal Deduplication,25500,docvqa_val_anls,0.7180940785000507,0.005735169057784404
599
+ Internal Deduplication,25500,infovqa_val_anls,0.35632636677863483,0.008180298439903802
600
+ Internal Deduplication,25500,mme_total_score,1376.716986794718,
601
+ Internal Deduplication,25500,mmstar_average,0.3529231762093682,
602
+ Internal Deduplication,25500,ocrbench_ocrbench_accuracy,0.633,
603
+ Internal Deduplication,25500,textvqa_val_exact_match,0.63066,0.006504156647155582
604
+ Internal Deduplication,26700,ai2d_exact_match,0.49255181347150256,0.008998155599035912
605
+ Internal Deduplication,26700,average,0.516487110189266,
606
+ Internal Deduplication,26700,average_rank,1.5555555555555556,
607
+ Internal Deduplication,26700,chartqa_relaxed_overall,0.6644,0.009445885130487209
608
+ Internal Deduplication,26700,docvqa_val_anls,0.7168133343849862,0.005756579734549226
609
+ Internal Deduplication,26700,infovqa_val_anls,0.34371436472133005,0.008017561696940439
610
+ Internal Deduplication,26700,mme_total_score,1409.4487795118048,
611
+ Internal Deduplication,26700,mmmu_val_mmmu_acc,0.30222,
612
+ Internal Deduplication,26700,mmstar_average,0.35023736893630925,
613
+ Internal Deduplication,26700,ocrbench_ocrbench_accuracy,0.63,
614
+ Internal Deduplication,26700,textvqa_val_exact_match,0.6319600000000001,0.006495302107669356
615
+ Internal Deduplication,27900,ai2d_exact_match,0.4954663212435233,0.008998784170060767
616
+ Internal Deduplication,27900,average,0.5488694312151498,
617
+ Internal Deduplication,27900,average_rank,1.375,
618
+ Internal Deduplication,27900,chartqa_relaxed_overall,0.6736,0.009379787213112317
619
+ Internal Deduplication,27900,docvqa_val_anls,0.7224633461958828,0.005716176978314635
620
+ Internal Deduplication,27900,infovqa_val_anls,0.35413809221269893,0.00811649922857756
621
+ Internal Deduplication,27900,mme_total_score,1365.8970588235293,
622
+ Internal Deduplication,27900,mmstar_average,0.33847825885394267,
623
+ Internal Deduplication,27900,ocrbench_ocrbench_accuracy,0.623,
624
+ Internal Deduplication,27900,textvqa_val_exact_match,0.6349400000000001,0.006474057612069333
625
+ Internal Deduplication,29100,ai2d_exact_match,0.4957901554404145,0.008998835133354704
626
+ Internal Deduplication,29100,average,0.5113797484193323,
627
+ Internal Deduplication,29100,average_rank,2.0,
628
+ Internal Deduplication,29100,chartqa_relaxed_overall,0.6604,0.009473364442136777
629
+ Internal Deduplication,29100,docvqa_val_anls,0.716657704725735,0.005756925555640175
630
+ Internal Deduplication,29100,infovqa_val_anls,0.3372271343716428,0.007828634509891694
631
+ Internal Deduplication,29100,mme_total_score,1300.1049419767908,
632
+ Internal Deduplication,29100,mmmu_val_mmmu_acc,0.29556,
633
+ Internal Deduplication,29100,mmstar_average,0.33882299281686595,
634
+ Internal Deduplication,29100,ocrbench_ocrbench_accuracy,0.613,
635
+ Internal Deduplication,29100,textvqa_val_exact_match,0.6335799999999999,0.006486361946288509
636
+ Internal Deduplication,30300,ai2d_exact_match,0.49676165803108807,0.008998965371572352
637
+ Internal Deduplication,30300,average,0.5468368131516261,
638
+ Internal Deduplication,30300,average_rank,1.625,
639
+ Internal Deduplication,30300,chartqa_relaxed_overall,0.6608,0.009470650520873179
640
+ Internal Deduplication,30300,docvqa_val_anls,0.7208981382284003,0.005745692168242118
641
+ Internal Deduplication,30300,infovqa_val_anls,0.33146012551516996,0.007795838114372819
642
+ Internal Deduplication,30300,mme_total_score,1330.1678671468587,
643
+ Internal Deduplication,30300,mmstar_average,0.35709777028672485,
644
+ Internal Deduplication,30300,ocrbench_ocrbench_accuracy,0.622,
645
+ Internal Deduplication,30300,textvqa_val_exact_match,0.6388400000000001,0.006462092742178937
646
+ Internal Deduplication,31500,ai2d_exact_match,0.4996761658031088,0.008999152231809677
647
+ Internal Deduplication,31500,average,0.5161255997108974,
648
+ Internal Deduplication,31500,average_rank,1.4444444444444444,
649
+ Internal Deduplication,31500,chartqa_relaxed_overall,0.6624,0.009459719367730022
650
+ Internal Deduplication,31500,docvqa_val_anls,0.7248827916963386,0.005715267948257416
651
+ Internal Deduplication,31500,infovqa_val_anls,0.3462785194206036,0.007940616340604684
652
+ Internal Deduplication,31500,mme_total_score,1388.7246898759504,
653
+ Internal Deduplication,31500,mmmu_val_mmmu_acc,0.28556,
654
+ Internal Deduplication,31500,mmstar_average,0.34634732076712815,
655
+ Internal Deduplication,31500,ocrbench_ocrbench_accuracy,0.622,
656
+ Internal Deduplication,31500,textvqa_val_exact_match,0.64186,0.006449237676913657
657
+ Internal Deduplication,32700,ai2d_exact_match,0.4957901554404145,0.008998835133354704
658
+ Internal Deduplication,32700,average,0.5500475012134611,
659
+ Internal Deduplication,32700,average_rank,1.5,
660
+ Internal Deduplication,32700,chartqa_relaxed_overall,0.6688,0.009414779829167153
661
+ Internal Deduplication,32700,docvqa_val_anls,0.7263156273407247,0.00570514646941267
662
+ Internal Deduplication,32700,infovqa_val_anls,0.3489756877198793,0.00798640336179305
663
+ Internal Deduplication,32700,mme_total_score,1362.764905962385,
664
+ Internal Deduplication,32700,mmstar_average,0.3385910379932094,
665
+ Internal Deduplication,32700,ocrbench_ocrbench_accuracy,0.63,
666
+ Internal Deduplication,32700,textvqa_val_exact_match,0.64186,0.006452586710386076
667
+ Internal Deduplication,33900,ai2d_exact_match,0.4957901554404145,0.008998835133354704
668
+ Internal Deduplication,33900,average,0.5160312203077811,
669
+ Internal Deduplication,33900,average_rank,1.4444444444444444,
670
+ Internal Deduplication,33900,chartqa_relaxed_overall,0.674,0.009376820884924869
671
+ Internal Deduplication,33900,docvqa_val_anls,0.7257174511919398,0.005702388110070895
672
+ Internal Deduplication,33900,infovqa_val_anls,0.3422539948680319,0.007936425119162906
673
+ Internal Deduplication,33900,mme_total_score,1389.4628851540615,
674
+ Internal Deduplication,33900,mmmu_val_mmmu_acc,0.28444,
675
+ Internal Deduplication,33900,mmstar_average,0.34272816096186326,
676
+ Internal Deduplication,33900,ocrbench_ocrbench_accuracy,0.619,
677
+ Internal Deduplication,33900,textvqa_val_exact_match,0.64432,0.0064359794815068575
678
+ Internal Deduplication,35100,ai2d_exact_match,0.49838082901554404,0.008999106932714645
679
+ Internal Deduplication,35100,average,0.5533101842015907,
680
+ Internal Deduplication,35100,average_rank,1.375,
681
+ Internal Deduplication,35100,chartqa_relaxed_overall,0.6736,0.009379787213112317
682
+ Internal Deduplication,35100,docvqa_val_anls,0.7278181728761878,0.005688301164010059
683
+ Internal Deduplication,35100,infovqa_val_anls,0.351201318391893,0.008119188634171728
684
+ Internal Deduplication,35100,mme_total_score,1411.3839535814327,
685
+ Internal Deduplication,35100,mmstar_average,0.34205096912751043,
686
+ Internal Deduplication,35100,ocrbench_ocrbench_accuracy,0.634,
687
+ Internal Deduplication,35100,textvqa_val_exact_match,0.64612,0.006431209933771596
688
+ Internal Deduplication,36300,ai2d_exact_match,0.49805699481865284,0.00899908617055324
689
+ Internal Deduplication,36300,average,0.5195231205481649,
690
+ Internal Deduplication,36300,average_rank,1.5555555555555556,
691
+ Internal Deduplication,36300,chartqa_relaxed_overall,0.672,0.009391574983583366
692
+ Internal Deduplication,36300,docvqa_val_anls,0.730916270863908,0.005660120362847363
693
+ Internal Deduplication,36300,infovqa_val_anls,0.3412406587672079,0.007911958522422949
694
+ Internal Deduplication,36300,mme_total_score,1367.637254901961,
695
+ Internal Deduplication,36300,mmmu_val_mmmu_acc,0.29444,
696
+ Internal Deduplication,36300,mmstar_average,0.34529103993555027,
697
+ Internal Deduplication,36300,ocrbench_ocrbench_accuracy,0.634,
698
+ Internal Deduplication,36300,textvqa_val_exact_match,0.6402399999999999,0.006461617365628822
699
+ Internal Deduplication,37500,ai2d_exact_match,0.5019430051813472,0.008999086170553233
700
+ Internal Deduplication,37500,average,0.5495836143474903,
701
+ Internal Deduplication,37500,average_rank,1.75,
702
+ Internal Deduplication,37500,chartqa_relaxed_overall,0.6756,0.009364877808842454
703
+ Internal Deduplication,37500,docvqa_val_anls,0.7255309514873474,0.005687086085909167
704
+ Internal Deduplication,37500,infovqa_val_anls,0.3366534174444908,0.007850461211973954
705
+ Internal Deduplication,37500,mme_total_score,1364.8713485394157,
706
+ Internal Deduplication,37500,mmstar_average,0.3467179263192468,
707
+ Internal Deduplication,37500,ocrbench_ocrbench_accuracy,0.618,
708
+ Internal Deduplication,37500,textvqa_val_exact_match,0.64264,0.0064540760066348676
709
+ Internal Deduplication,38700,ai2d_exact_match,0.49708549222797926,0.008999001233939138
710
+ Internal Deduplication,38700,average,0.5196671356527304,
711
+ Internal Deduplication,38700,average_rank,1.4444444444444444,
712
+ Internal Deduplication,38700,chartqa_relaxed_overall,0.6744,0.009373846787815587
713
+ Internal Deduplication,38700,docvqa_val_anls,0.732080533728902,0.0056514543481841085
714
+ Internal Deduplication,38700,infovqa_val_anls,0.34326469229313616,0.0079487702679686
715
+ Internal Deduplication,38700,mme_total_score,1366.760604241697,
716
+ Internal Deduplication,38700,mmmu_val_mmmu_acc,0.28778,
717
+ Internal Deduplication,38700,mmstar_average,0.34458636697182526,
718
+ Internal Deduplication,38700,ocrbench_ocrbench_accuracy,0.632,
719
+ Internal Deduplication,38700,textvqa_val_exact_match,0.6461399999999999,0.00642093963319658
720
+ Internal Deduplication,39900,ai2d_exact_match,0.4957901554404145,0.008998835133354702
721
+ Internal Deduplication,39900,average,0.5516529838475074,
722
+ Internal Deduplication,39900,average_rank,1.625,
723
+ Internal Deduplication,39900,chartqa_relaxed_overall,0.6696,0.009409024811273465
724
+ Internal Deduplication,39900,docvqa_val_anls,0.723701988394961,0.005721818793341698
725
+ Internal Deduplication,39900,infovqa_val_anls,0.3483904533235705,0.007951328084102772
726
+ Internal Deduplication,39900,mme_total_score,1403.717386954782,
727
+ Internal Deduplication,39900,mmstar_average,0.34950828977360593,
728
+ Internal Deduplication,39900,ocrbench_ocrbench_accuracy,0.629,
729
+ Internal Deduplication,39900,textvqa_val_exact_match,0.64558,0.006428340177019748
app/src/content/assets/data/mnist-variant-model.json DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:7dca86e85be46c1fca6a4e2503786e88e3f8d4609fb7284c8a1479620a5827da
3
- size 4315
app/src/content/assets/data/relevance_filters.csv ADDED
@@ -0,0 +1,1201 @@
1
+ run,step,metric,value,stderr
2
+ Baseline,1000,ai2d_exact_match,0.2548575129533679,0.007843322436924496
3
+ Baseline,1000,average,0.27120689295763617,
4
+ Baseline,1000,average_rank,3.1,
5
+ Baseline,1000,chartqa_relaxed_overall,0.3308,0.009411906161401973
6
+ Baseline,1000,docvqa_val_anls,0.3528553494243383,0.005852289239342309
7
+ Baseline,1000,infovqa_val_anls,0.17320578642581314,0.006297063452679795
8
+ Baseline,1000,mme_total_score,977.4280712284914,
9
+ Baseline,1000,mmmu_val_mmmu_acc,0.25222,
10
+ Baseline,1000,mmstar_average,0.23215874078908072,
11
+ Baseline,1000,ocrbench_ocrbench_accuracy,0.286,
12
+ Baseline,1000,seedbench_seed_all,0.2563646470261256,
13
+ Baseline,1000,textvqa_val_exact_match,0.3024,0.00628900296642181
14
+ Baseline,2000,ai2d_exact_match,0.26295336787564766,0.007923526907377255
15
+ Baseline,2000,average,0.3202068275596269,
16
+ Baseline,2000,average_rank,2.9,
17
+ Baseline,2000,chartqa_relaxed_overall,0.4688,0.009982508912777261
18
+ Baseline,2000,docvqa_val_anls,0.4452261510942785,0.00614755494712251
19
+ Baseline,2000,infovqa_val_anls,0.1820547866557169,0.006217861455795791
20
+ Baseline,2000,mme_total_score,1049.3036214485794,
21
+ Baseline,2000,mmmu_val_mmmu_acc,0.24556,
22
+ Baseline,2000,mmstar_average,0.21305462434540698,
23
+ Baseline,2000,ocrbench_ocrbench_accuracy,0.395,
24
+ Baseline,2000,seedbench_seed_all,0.258532518065592,
25
+ Baseline,2000,textvqa_val_exact_match,0.41068000000000005,0.006697862330024289
26
+ Baseline,3000,ai2d_exact_match,0.25226683937823835,0.007816909588794397
27
+ Baseline,3000,average,0.3507423834414229,
28
+ Baseline,3000,average_rank,2.7,
29
+ Baseline,3000,chartqa_relaxed_overall,0.5028,0.010001843767601082
30
+ Baseline,3000,docvqa_val_anls,0.502653993831009,0.006267072346683124
31
+ Baseline,3000,infovqa_val_anls,0.21728617578189535,0.006796941784959762
32
+ Baseline,3000,mme_total_score,1170.2383953581434,
33
+ Baseline,3000,mmmu_val_mmmu_acc,0.27556,
34
+ Baseline,3000,mmstar_average,0.25432376938577683,
35
+ Baseline,3000,ocrbench_ocrbench_accuracy,0.436,
36
+ Baseline,3000,seedbench_seed_all,0.2792106725958866,
37
+ Baseline,3000,textvqa_val_exact_match,0.43658,0.006766885462882726
38
+ Baseline,4000,ai2d_exact_match,0.2645725388601036,0.007939149662089447
39
+ Baseline,4000,average,0.36961781722974835,
40
+ Baseline,4000,average_rank,3.7,
41
+ Baseline,4000,chartqa_relaxed_overall,0.5312,0.009982508912777261
42
+ Baseline,4000,docvqa_val_anls,0.5374434618615119,0.0062905728113059655
43
+ Baseline,4000,infovqa_val_anls,0.2287924838861707,0.006994568698639919
44
+ Baseline,4000,mme_total_score,1155.203781512605,
45
+ Baseline,4000,mmmu_val_mmmu_acc,0.25556,
46
+ Baseline,4000,mmstar_average,0.2575590188757354,
47
+ Baseline,4000,ocrbench_ocrbench_accuracy,0.453,
48
+ Baseline,4000,seedbench_seed_all,0.33913285158421347,
49
+ Baseline,4000,textvqa_val_exact_match,0.4593,0.006791695475025738
50
+ Baseline,5000,ai2d_exact_match,0.3125,0.008342439145556371
51
+ Baseline,5000,average,0.3974627910380972,
52
+ Baseline,5000,average_rank,3.3,
53
+ Baseline,5000,chartqa_relaxed_overall,0.5488,0.00995424828018316
54
+ Baseline,5000,docvqa_val_anls,0.552360266782429,0.006300308519952055
55
+ Baseline,5000,infovqa_val_anls,0.23425555286643698,0.007002254622066442
56
+ Baseline,5000,mme_total_score,1181.4653861544618,
57
+ Baseline,5000,mmmu_val_mmmu_acc,0.26667,
58
+ Baseline,5000,mmstar_average,0.29596648146165705,
59
+ Baseline,5000,ocrbench_ocrbench_accuracy,0.462,
60
+ Baseline,5000,seedbench_seed_all,0.43107281823235133,
61
+ Baseline,5000,textvqa_val_exact_match,0.47354000000000007,0.0068172185364497985
62
+ Baseline,6000,ai2d_exact_match,0.358160621761658,0.008629463221867162
63
+ Baseline,6000,average,0.4161227404571003,
64
+ Baseline,6000,average_rank,2.6,
65
+ Baseline,6000,chartqa_relaxed_overall,0.5628,0.00992279440175477
66
+ Baseline,6000,docvqa_val_anls,0.5747451497228876,0.00625495440870239
67
+ Baseline,6000,infovqa_val_anls,0.22152017368968838,0.006604546680525351
68
+ Baseline,6000,mme_total_score,1284.1648659463785,
69
+ Baseline,6000,mmmu_val_mmmu_acc,0.27111,
70
+ Baseline,6000,mmstar_average,0.2978489412854164,
71
+ Baseline,6000,ocrbench_ocrbench_accuracy,0.495,
72
+ Baseline,6000,seedbench_seed_all,0.4795997776542524,
73
+ Baseline,6000,textvqa_val_exact_match,0.48432,0.006800535050670284
74
+ Baseline,7000,ai2d_exact_match,0.3707901554404145,0.00869347755587734
75
+ Baseline,7000,average,0.4291083177345374,
76
+ Baseline,7000,average_rank,2.9,
77
+ Baseline,7000,chartqa_relaxed_overall,0.5656,0.009915542506251351
78
+ Baseline,7000,docvqa_val_anls,0.5940907049431567,0.006224236305767187
79
+ Baseline,7000,infovqa_val_anls,0.2515675215816963,0.007105097396092786
80
+ Baseline,7000,mme_total_score,1185.875650260104,
81
+ Baseline,7000,mmmu_val_mmmu_acc,0.26556,
82
+ Baseline,7000,mmstar_average,0.31372400960777047,
83
+ Baseline,7000,ocrbench_ocrbench_accuracy,0.504,
84
+ Baseline,7000,seedbench_seed_all,0.4964424680377988,
85
+ Baseline,7000,textvqa_val_exact_match,0.5002,0.006794794025220267
86
+ Baseline,8000,ai2d_exact_match,0.37759067357512954,0.008725299846043883
87
+ Baseline,8000,average,0.43846759477995995,
88
+ Baseline,8000,average_rank,3.2,
89
+ Baseline,8000,chartqa_relaxed_overall,0.5832,0.009862556058385773
90
+ Baseline,8000,docvqa_val_anls,0.6017336419437208,0.006231612198089698
91
+ Baseline,8000,infovqa_val_anls,0.2449256624147254,0.006992518502948913
92
+ Baseline,8000,mme_total_score,1199.2409963985594,
93
+ Baseline,8000,mmmu_val_mmmu_acc,0.28111,
94
+ Baseline,8000,mmstar_average,0.33512257186205047,
95
+ Baseline,8000,ocrbench_ocrbench_accuracy,0.51,
96
+ Baseline,8000,seedbench_seed_all,0.5024458032240133,
97
+ Baseline,8000,textvqa_val_exact_match,0.51008,0.006796301690135059
98
+ Baseline,9000,ai2d_exact_match,0.4067357512953368,0.008841214921078996
99
+ Baseline,9000,average,0.4422510732201056,
100
+ Baseline,9000,average_rank,3.2,
101
+ Baseline,9000,chartqa_relaxed_overall,0.5912,0.009834211136815875
102
+ Baseline,9000,docvqa_val_anls,0.6170968481662739,0.00617235763542544
103
+ Baseline,9000,infovqa_val_anls,0.23537031288570615,0.00670318154156447
104
+ Baseline,9000,mme_total_score,1231.5195078031213,
105
+ Baseline,9000,mmmu_val_mmmu_acc,0.25889,
106
+ Baseline,9000,mmstar_average,0.3216444898242951,
107
+ Baseline,9000,ocrbench_ocrbench_accuracy,0.515,
108
+ Baseline,9000,seedbench_seed_all,0.5120622568093385,
109
+ Baseline,9000,textvqa_val_exact_match,0.52226,0.006792711289708482
110
+ Baseline,10000,ai2d_exact_match,0.39993523316062174,0.008817096257082848
111
+ Baseline,10000,average,0.4523875703250908,
112
+ Baseline,10000,average_rank,2.9,
113
+ Baseline,10000,chartqa_relaxed_overall,0.5996,0.00980154906867574
114
+ Baseline,10000,docvqa_val_anls,0.6262613496433054,0.006147756371688175
115
+ Baseline,10000,infovqa_val_anls,0.263290074230132,0.007186788766942786
116
+ Baseline,10000,mme_total_score,1240.8218287314926,
117
+ Baseline,10000,mmmu_val_mmmu_acc,0.28778,
118
+ Baseline,10000,mmstar_average,0.32972717906018517,
119
+ Baseline,10000,ocrbench_ocrbench_accuracy,0.517,
120
+ Baseline,10000,seedbench_seed_all,0.5217342968315731,
121
+ Baseline,10000,textvqa_val_exact_match,0.5261600000000001,0.006785774843600811
122
+ Baseline,11000,ai2d_exact_match,0.422279792746114,0.008889771831066474
123
+ Baseline,11000,average,0.4561398159525099,
124
+ Baseline,11000,average_rank,3.0,
125
+ Baseline,11000,chartqa_relaxed_overall,0.6104,0.009755142291143075
126
+ Baseline,11000,docvqa_val_anls,0.6373130149166712,0.006128022584995044
127
+ Baseline,11000,infovqa_val_anls,0.24419378339723755,0.006897644885887063
128
+ Baseline,11000,mme_total_score,1322.9488795518205,
129
+ Baseline,11000,mmmu_val_mmmu_acc,0.27778,
130
+ Baseline,11000,mmstar_average,0.3298563439522548,
131
+ Baseline,11000,ocrbench_ocrbench_accuracy,0.521,
132
+ Baseline,11000,seedbench_seed_all,0.5237354085603113,
133
+ Baseline,11000,textvqa_val_exact_match,0.5387,0.006770851562852138
134
+ Baseline,12000,ai2d_exact_match,0.42001295336787564,0.008883255931688034
135
+ Baseline,12000,average,0.4582751140055433,
136
+ Baseline,12000,average_rank,3.5,
137
+ Baseline,12000,chartqa_relaxed_overall,0.618,0.009719474639861454
138
+ Baseline,12000,docvqa_val_anls,0.6393961983751871,0.0061228747388476674
139
+ Baseline,12000,infovqa_val_anls,0.24798874058574302,0.006855374548993139
140
+ Baseline,12000,mme_total_score,1225.6453581432572,
141
+ Baseline,12000,mmmu_val_mmmu_acc,0.27889,
142
+ Baseline,12000,mmstar_average,0.34010867846816534,
143
+ Baseline,12000,ocrbench_ocrbench_accuracy,0.512,
144
+ Baseline,12000,seedbench_seed_all,0.5350194552529183,
145
+ Baseline,12000,textvqa_val_exact_match,0.5330600000000001,0.006777713092109446
146
+ Baseline,13000,ai2d_exact_match,0.4375,0.008928571428571428
147
+ Baseline,13000,average,0.4692868662590049,
148
+ Baseline,13000,average_rank,2.7,
149
+ Baseline,13000,chartqa_relaxed_overall,0.6148,0.00973479791861169
150
+ Baseline,13000,docvqa_val_anls,0.6511374872549951,0.006086953065248391
151
+ Baseline,13000,infovqa_val_anls,0.24465055100441893,0.006808432538374664
152
+ Baseline,13000,mme_total_score,1281.7122849139657,
153
+ Baseline,13000,mmmu_val_mmmu_acc,0.28222,
154
+ Baseline,13000,mmstar_average,0.3453069542917521,
155
+ Baseline,13000,ocrbench_ocrbench_accuracy,0.549,
156
+ Baseline,13000,seedbench_seed_all,0.5442468037798777,
157
+ Baseline,13000,textvqa_val_exact_match,0.55472,0.0067416788982325
158
+ Baseline,14000,ai2d_exact_match,0.4572538860103627,0.00896620675297095
159
+ Baseline,14000,average,0.47352486841689195,
160
+ Baseline,14000,average_rank,2.5,
161
+ Baseline,14000,chartqa_relaxed_overall,0.6172,0.009723347231923635
162
+ Baseline,14000,docvqa_val_anls,0.6502269393708169,0.006057950730638126
163
+ Baseline,14000,infovqa_val_anls,0.25805460837190913,0.007037735231659539
164
+ Baseline,14000,mme_total_score,1309.1444577831132,
165
+ Baseline,14000,mmmu_val_mmmu_acc,0.28111,
166
+ Baseline,14000,mmstar_average,0.34575818188776586,
167
+ Baseline,14000,ocrbench_ocrbench_accuracy,0.551,
168
+ Baseline,14000,seedbench_seed_all,0.5483602001111729,
169
+ Baseline,14000,textvqa_val_exact_match,0.55276,0.006751206724612103
170
+ Baseline,15000,ai2d_exact_match,0.45045336787564766,0.008954861634252399
171
+ Baseline,15000,average,0.47878665012878824,
172
+ Baseline,15000,average_rank,2.6,
173
+ Baseline,15000,chartqa_relaxed_overall,0.612,0.009747841205275417
174
+ Baseline,15000,docvqa_val_anls,0.6621413031955148,0.006056838050222495
175
+ Baseline,15000,infovqa_val_anls,0.2706898598157733,0.007200315730154543
176
+ Baseline,15000,mme_total_score,1384.2171868747498,
177
+ Baseline,15000,mmmu_val_mmmu_acc,0.30222,
178
+ Baseline,15000,mmstar_average,0.35408135695920684,
179
+ Baseline,15000,ocrbench_ocrbench_accuracy,0.558,
180
+ Baseline,15000,seedbench_seed_all,0.5411339633129516,
181
+ Baseline,15000,textvqa_val_exact_match,0.5583600000000001,0.0067279027203879065
182
+ Baseline,16000,ai2d_exact_match,0.45077720207253885,0.008955440137395838
183
+ Baseline,16000,average,0.47665128022935843,
184
+ Baseline,16000,average_rank,3.0,
185
+ Baseline,16000,chartqa_relaxed_overall,0.632,0.00964715642305132
186
+ Baseline,16000,docvqa_val_anls,0.6709415729142987,0.005999818105621502
187
+ Baseline,16000,infovqa_val_anls,0.26050032542402035,0.006997451875879188
188
+ Baseline,16000,mme_total_score,1317.8491396558625,
189
+ Baseline,16000,mmmu_val_mmmu_acc,0.27556,
190
+ Baseline,16000,mmstar_average,0.33214333327093315,
191
+ Baseline,16000,ocrbench_ocrbench_accuracy,0.56,
192
+ Baseline,16000,seedbench_seed_all,0.5463590883824346,
193
+ Baseline,16000,textvqa_val_exact_match,0.56158,0.006723854754867398
194
+ Baseline,17000,ai2d_exact_match,0.45919689119170987,0.008969138793675545
195
+ Baseline,17000,average,0.4777141780162423,
196
+ Baseline,17000,average_rank,2.5,
197
+ Baseline,17000,chartqa_relaxed_overall,0.632,0.00964715642305132
198
+ Baseline,17000,docvqa_val_anls,0.6796338519136422,0.005948761388267941
199
+ Baseline,17000,infovqa_val_anls,0.28070956072505215,0.007298333094144192
200
+ Baseline,17000,mme_total_score,1381.9161664665867,
201
+ Baseline,17000,mmmu_val_mmmu_acc,0.27667,
202
+ Baseline,17000,mmstar_average,0.3370289492329521,
203
+ Baseline,17000,ocrbench_ocrbench_accuracy,0.519,
204
+ Baseline,17000,seedbench_seed_all,0.5510283490828238,
205
+ Baseline,17000,textvqa_val_exact_match,0.56416,0.006724830373229479
206
+ Baseline,18000,ai2d_exact_match,0.46567357512953367,0.008977921602780726
207
+ Baseline,18000,average,0.4819834595278701,
208
+ Baseline,18000,average_rank,2.9,
209
+ Baseline,18000,chartqa_relaxed_overall,0.6376,0.009615793331418735
210
+ Baseline,18000,docvqa_val_anls,0.6775884603912571,0.005972234236435759
211
+ Baseline,18000,infovqa_val_anls,0.27154318420389256,0.007164903131667027
212
+ Baseline,18000,mme_total_score,1336.922769107643,
213
+ Baseline,18000,mmmu_val_mmmu_acc,0.28667,
214
+ Baseline,18000,mmstar_average,0.34482796716566916,
215
+ Baseline,18000,ocrbench_ocrbench_accuracy,0.533,
216
+ Baseline,18000,seedbench_seed_all,0.5543079488604781,
217
+ Baseline,18000,textvqa_val_exact_match,0.5666399999999999,0.006713392287599574
218
+ Baseline,19000,ai2d_exact_match,0.4682642487046632,0.008981008686994101
219
+ Baseline,19000,average,0.4899006713916878,
220
+ Baseline,19000,average_rank,2.7,
221
+ Baseline,19000,chartqa_relaxed_overall,0.6444,0.009575809858898698
222
+ Baseline,19000,docvqa_val_anls,0.678226526479947,0.005970619221588814
223
+ Baseline,19000,infovqa_val_anls,0.26993847247278,0.0071348470764911525
224
+ Baseline,19000,mme_total_score,1406.6628651460583,
225
+ Baseline,19000,mmmu_val_mmmu_acc,0.28333,
226
+ Baseline,19000,mmstar_average,0.356220913822775,
227
+ Baseline,19000,ocrbench_ocrbench_accuracy,0.577,
228
+ Baseline,19000,seedbench_seed_all,0.554585881045025,
229
+ Baseline,19000,textvqa_val_exact_match,0.57714,0.0066918487914812905
230
+ Baseline,20000,ai2d_exact_match,0.47571243523316065,0.00898853090258662
231
+ Baseline,20000,average,0.4873169067639118,
232
+ Baseline,20000,average_rank,3.1,
233
+ Baseline,20000,chartqa_relaxed_overall,0.6336,0.009638338810708618
234
+ Baseline,20000,docvqa_val_anls,0.6895214454380043,0.005896462073053767
235
+ Baseline,20000,infovqa_val_anls,0.2655657550458317,0.007033265532032538
236
+ Baseline,20000,mme_total_score,1324.6738695478193,
237
+ Baseline,20000,mmmu_val_mmmu_acc,0.30111,
238
+ Baseline,20000,mmstar_average,0.33806766134497995,
239
+ Baseline,20000,ocrbench_ocrbench_accuracy,0.555,
240
+ Baseline,20000,seedbench_seed_all,0.5587548638132296,
241
+ Baseline,20000,textvqa_val_exact_match,0.56852,0.006720151338087659
242
+ ≥2,1000,ai2d_exact_match,0.2645725388601036,0.007939149662089442
243
+ ≥2,1000,average,0.2722931646460497,
244
+ ≥2,1000,average_rank,3.2,
245
+ ≥2,1000,chartqa_relaxed_overall,0.3664,0.009638338810708616
246
+ ≥2,1000,docvqa_val_anls,0.35825461497275807,0.005864292098743202
247
+ ≥2,1000,infovqa_val_anls,0.16722293767954274,0.0061333612650745235
248
+ ≥2,1000,mme_total_score,994.9906962785115,
249
+ ≥2,1000,mmmu_val_mmmu_acc,0.25111,
250
+ ≥2,1000,mmstar_average,0.2099224814637991,
251
+ ≥2,1000,ocrbench_ocrbench_accuracy,0.304,
252
+ ≥2,1000,seedbench_seed_all,0.24463590883824346,
253
+ ≥2,1000,textvqa_val_exact_match,0.28452000000000005,0.006179555914647949
254
+ ≥2,2000,ai2d_exact_match,0.2648963730569948,0.007942257693619753
255
+ ≥2,2000,average,0.3161289250086133,
256
+ ≥2,2000,average_rank,3.2,
257
+ ≥2,2000,chartqa_relaxed_overall,0.4476,0.00994692276581072
258
+ ≥2,2000,docvqa_val_anls,0.44553207035528164,0.006176982458046509
259
+ ≥2,2000,infovqa_val_anls,0.19690312157526974,0.00648399793536667
260
+ ≥2,2000,mme_total_score,1054.2768107242898,
261
+ ≥2,2000,mmmu_val_mmmu_acc,0.24778,
262
+ ≥2,2000,mmstar_average,0.20779488571532076,
263
+ ≥2,2000,ocrbench_ocrbench_accuracy,0.383,
264
+ ≥2,2000,seedbench_seed_all,0.2529738743746526,
265
+ ≥2,2000,textvqa_val_exact_match,0.39868,0.006677826756815335
266
+ ≥2,3000,ai2d_exact_match,0.2697538860103627,0.007988222765138163
267
+ ≥2,3000,average,0.34461871112110076,
268
+ ≥2,3000,average_rank,3.7,
269
+ ≥2,3000,chartqa_relaxed_overall,0.502,0.010001920583875201
270
+ ≥2,3000,docvqa_val_anls,0.4943706505276063,0.006276617082627261
271
+ ≥2,3000,infovqa_val_anls,0.21287605644341218,0.006682253709215569
272
+ ≥2,3000,mme_total_score,1162.2701080432173,
273
+ ≥2,3000,mmmu_val_mmmu_acc,0.25556,
274
+ ≥2,3000,mmstar_average,0.21427247636922603,
275
+ ≥2,3000,ocrbench_ocrbench_accuracy,0.449,
276
+ ≥2,3000,seedbench_seed_all,0.2715953307392996,
277
+ ≥2,3000,textvqa_val_exact_match,0.43213999999999997,0.006742795943777913
278
+ ≥2,4000,ai2d_exact_match,0.27525906735751293,0.008038849490577975
279
+ ≥2,4000,average,0.37379440715652495,
280
+ ≥2,4000,average_rank,2.6,
281
+ ≥2,4000,chartqa_relaxed_overall,0.5356,0.009976616117083942
282
+ ≥2,4000,docvqa_val_anls,0.5415736777563739,0.006259488230977563
283
+ ≥2,4000,infovqa_val_anls,0.22392444384387208,0.00676041701311943
284
+ ≥2,4000,mme_total_score,1195.5438175270108,
285
+ ≥2,4000,mmmu_val_mmmu_acc,0.26667,
286
+ ≥2,4000,mmstar_average,0.2507897461569136,
287
+ ≥2,4000,ocrbench_ocrbench_accuracy,0.462,
288
+ ≥2,4000,seedbench_seed_all,0.34291272929405225,
289
+ ≥2,4000,textvqa_val_exact_match,0.46542,0.0067795602517745565
290
+ ≥2,5000,ai2d_exact_match,0.31055699481865284,0.008328207321163279
291
+ ≥2,5000,average,0.39445964778826137,
292
+ ≥2,5000,average_rank,3.3,
293
+ ≥2,5000,chartqa_relaxed_overall,0.552,0.00994776272300849
294
+ ≥2,5000,docvqa_val_anls,0.5556927230238289,0.006299299461817651
295
+ ≥2,5000,infovqa_val_anls,0.24261245038285142,0.007075738778751112
296
+ ≥2,5000,mme_total_score,1220.672168867547,
297
+ ≥2,5000,mmmu_val_mmmu_acc,0.27444,
298
+ ≥2,5000,mmstar_average,0.2522926162881413,
299
+ ≥2,5000,ocrbench_ocrbench_accuracy,0.467,
300
+ ≥2,5000,seedbench_seed_all,0.42768204558087825,
301
+ ≥2,5000,textvqa_val_exact_match,0.46785999999999994,0.006777889939974511
302
+ ≥2,6000,ai2d_exact_match,0.3325777202072539,0.008479663360791275
303
+ ≥2,6000,average,0.4101998600043759,
304
+ ≥2,6000,average_rank,3.7,
305
+ ≥2,6000,chartqa_relaxed_overall,0.5672,0.009911254067113462
306
+ ≥2,6000,docvqa_val_anls,0.5702012141050906,0.006263916894054504
307
+ ≥2,6000,infovqa_val_anls,0.21632587505016104,0.006473865748732477
308
+ ≥2,6000,mme_total_score,1313.7047819127652,
309
+ ≥2,6000,mmmu_val_mmmu_acc,0.28,
310
+ ≥2,6000,mmstar_average,0.28566177948176935,
311
+ ≥2,6000,ocrbench_ocrbench_accuracy,0.486,
312
+ ≥2,6000,seedbench_seed_all,0.4698721511951084,
313
+ ≥2,6000,textvqa_val_exact_match,0.48396000000000006,0.006801425994533192
314
+ ≥2,7000,ai2d_exact_match,0.35200777202072536,0.00859592682822483
315
+ ≥2,7000,average,0.4204955224344633,
316
+ ≥2,7000,average_rank,4.0,
317
+ ≥2,7000,chartqa_relaxed_overall,0.5712,0.00990007214980924
318
+ ≥2,7000,docvqa_val_anls,0.5850734578344774,0.006202520219850679
319
+ ≥2,7000,infovqa_val_anls,0.23449023638527144,0.0067906990453115955
320
+ ≥2,7000,mme_total_score,1247.423969587835,
321
+ ≥2,7000,mmmu_val_mmmu_acc,0.28444,
322
+ ≥2,7000,mmstar_average,0.29053864145068503,
323
+ ≥2,7000,ocrbench_ocrbench_accuracy,0.487,
324
+ ≥2,7000,seedbench_seed_all,0.48526959421901056,
325
+ ≥2,7000,textvqa_val_exact_match,0.49444000000000005,0.006796105847537853
326
+ ≥2,8000,ai2d_exact_match,0.3746761658031088,0.008711886524907496
327
+ ≥2,8000,average,0.43663916832315425,
328
+ ≥2,8000,average_rank,2.9,
329
+ ≥2,8000,chartqa_relaxed_overall,0.5816,0.00986790384075991
330
+ ≥2,8000,docvqa_val_anls,0.6028798426362394,0.006214872354058686
331
+ ≥2,8000,infovqa_val_anls,0.2535281850303886,0.0070045473889607445
332
+ ≥2,8000,mme_total_score,1300.5965386154462,
333
+ ≥2,8000,mmmu_val_mmmu_acc,0.27333,
334
+ ≥2,8000,mmstar_average,0.310944925107356,
335
+ ≥2,8000,ocrbench_ocrbench_accuracy,0.516,
336
+ ≥2,8000,seedbench_seed_all,0.5041133963312951,
337
+ ≥2,8000,textvqa_val_exact_match,0.51268,0.006798079603627737
338
+ ≥2,9000,ai2d_exact_match,0.3795336787564767,0.00873405559083709
339
+ ≥2,9000,average,0.43759974296352216,
340
+ ≥2,9000,average_rank,2.9,
341
+ ≥2,9000,chartqa_relaxed_overall,0.5884,0.009844437067525526
342
+ ≥2,9000,docvqa_val_anls,0.6175894644110065,0.0061700253612544395
343
+ ≥2,9000,infovqa_val_anls,0.24471327484068725,0.006934982517240646
344
+ ≥2,9000,mme_total_score,1258.1754701880752,
345
+ ≥2,9000,mmmu_val_mmmu_acc,0.27,
346
+ ≥2,9000,mmstar_average,0.2988526527658083,
347
+ ≥2,9000,ocrbench_ocrbench_accuracy,0.514,
348
+ ≥2,9000,seedbench_seed_all,0.5155086158977209,
349
+ ≥2,9000,textvqa_val_exact_match,0.5098,0.00680062068405066
350
+ ≥2,10000,ai2d_exact_match,0.407059585492228,0.008842319527489083
351
+ ≥2,10000,average,0.45127176699921406,
352
+ ≥2,10000,average_rank,3.1,
353
+ ≥2,10000,chartqa_relaxed_overall,0.5956,0.009817474681589429
354
+ ≥2,10000,docvqa_val_anls,0.6286443353240219,0.006128441640319587
355
+ ≥2,10000,infovqa_val_anls,0.25277210900180563,0.007055702724548255
356
+ ≥2,10000,mme_total_score,1320.1028411364546,
357
+ ≥2,10000,mmmu_val_mmmu_acc,0.27556,
358
+ ≥2,10000,mmstar_average,0.3429750538307907,
359
+ ≥2,10000,ocrbench_ocrbench_accuracy,0.523,
360
+ ≥2,10000,seedbench_seed_all,0.51467481934408,
361
+ ≥2,10000,textvqa_val_exact_match,0.5211600000000001,0.006783601870014644
362
+ ≥2,11000,ai2d_exact_match,0.41580310880829013,0.008870644443998564
363
+ ≥2,11000,average,0.4525862975952584,
364
+ ≥2,11000,average_rank,3.5,
365
+ ≥2,11000,chartqa_relaxed_overall,0.598,0.009808000752013664
366
+ ≥2,11000,docvqa_val_anls,0.6307438129106796,0.006133911991297053
367
+ ≥2,11000,infovqa_val_anls,0.25390014221903434,0.007050537280004977
368
+ ≥2,11000,mme_total_score,1302.5287114845937,
369
+ ≥2,11000,mmmu_val_mmmu_acc,0.29333,
370
+ ≥2,11000,mmstar_average,0.303972877343168,
371
+ ≥2,11000,ocrbench_ocrbench_accuracy,0.523,
372
+ ≥2,11000,seedbench_seed_all,0.5281267370761534,
373
+ ≥2,11000,textvqa_val_exact_match,0.5264,0.006786826961404041
374
+ ≥2,12000,ai2d_exact_match,0.43426165803108807,0.008921034830887027
375
+ ≥2,12000,average,0.46342874141175217,
376
+ ≥2,12000,average_rank,2.7,
377
+ ≥2,12000,chartqa_relaxed_overall,0.6188,0.009715574144248037
378
+ ≥2,12000,docvqa_val_anls,0.6419729722202083,0.006094582531110984
379
+ ≥2,12000,infovqa_val_anls,0.24776952598966778,0.006784112219881613
380
+ ≥2,12000,mme_total_score,1255.4957983193276,
381
+ ≥2,12000,mmmu_val_mmmu_acc,0.27111,
382
+ ≥2,12000,mmstar_average,0.3424608032908198,
383
+ ≥2,12000,ocrbench_ocrbench_accuracy,0.541,
384
+ ≥2,12000,seedbench_seed_all,0.5306837131739855,
385
+ ≥2,12000,textvqa_val_exact_match,0.5428,0.006758192556691964
386
+ ≥2,13000,ai2d_exact_match,0.42843264248704666,0.008906491762178375
387
+ ≥2,13000,average,0.4611120038339278,
388
+ ≥2,13000,average_rank,3.8,
389
+ ≥2,13000,chartqa_relaxed_overall,0.606,0.00977465178546074
390
+ ≥2,13000,docvqa_val_anls,0.6433656711922792,0.0061086851054902285
391
+ ≥2,13000,infovqa_val_anls,0.2535479547381062,0.006989226376396767
392
+ ≥2,13000,mme_total_score,1360.003101240496,
393
+ ≥2,13000,mmmu_val_mmmu_acc,0.28556,
394
+ ≥2,13000,mmstar_average,0.3320394092229932,
395
+ ≥2,13000,ocrbench_ocrbench_accuracy,0.526,
396
+ ≥2,13000,seedbench_seed_all,0.5362423568649249,
397
+ ≥2,13000,textvqa_val_exact_match,0.53882,0.006765393974568386
398
+ ≥2,14000,ai2d_exact_match,0.44689119170984454,0.008948245073044956
399
+ ≥2,14000,average,0.47130833654714216,
400
+ ≥2,14000,average_rank,2.8,
401
+ ≥2,14000,chartqa_relaxed_overall,0.6216,0.009701702181065136
402
+ ≥2,14000,docvqa_val_anls,0.6619108814388047,0.006015398975274413
403
+ ≥2,14000,infovqa_val_anls,0.2567040650730957,0.006986745571340195
404
+ ≥2,14000,mme_total_score,1310.3628451380553,
405
+ ≥2,14000,mmmu_val_mmmu_acc,0.28333,
406
+ ≥2,14000,mmstar_average,0.3315916867003111,
407
+ ≥2,14000,ocrbench_ocrbench_accuracy,0.547,
408
+ ≥2,14000,seedbench_seed_all,0.5409672040022234,
409
+ ≥2,14000,textvqa_val_exact_match,0.55178,0.006748546131944198
410
+ ≥2,15000,ai2d_exact_match,0.4523963730569948,0.00895827521082005
411
+ ≥2,15000,average,0.4720211465604895,
412
+ ≥2,15000,average_rank,3.5,
413
+ ≥2,15000,chartqa_relaxed_overall,0.62,0.009709671008043154
414
+ ≥2,15000,docvqa_val_anls,0.6679183447758706,0.005982903367170995
415
+ ≥2,15000,infovqa_val_anls,0.24815705436683513,0.006864270716284432
416
+ ≥2,15000,mme_total_score,1236.2534013605443,
417
+ ≥2,15000,mmmu_val_mmmu_acc,0.29889,
418
+ ≥2,15000,mmstar_average,0.3351456007635487,
419
+ ≥2,15000,ocrbench_ocrbench_accuracy,0.527,
420
+ ≥2,15000,seedbench_seed_all,0.5453029460811561,
421
+ ≥2,15000,textvqa_val_exact_match,0.55338,0.006735012041373013
422
+ ≥2,16000,ai2d_exact_match,0.44624352331606215,0.008946992176353898
423
+ ≥2,16000,average,0.4766960932538844,
424
+ ≥2,16000,average_rank,3.2,
425
+ ≥2,16000,chartqa_relaxed_overall,0.612,0.009747841205275417
426
+ ≥2,16000,docvqa_val_anls,0.6754589054855508,0.005966817690473989
427
+ ≥2,16000,infovqa_val_anls,0.27323519213464514,0.007206289716945655
428
+ ≥2,16000,mme_total_score,1305.906762705082,
429
+ ≥2,16000,mmmu_val_mmmu_acc,0.29,
430
+ ≥2,16000,mmstar_average,0.34328884147265926,
431
+ ≥2,16000,ocrbench_ocrbench_accuracy,0.555,
432
+ ≥2,16000,seedbench_seed_all,0.5410783768760422,
433
+ ≥2,16000,textvqa_val_exact_match,0.55396,0.00674076785464787
434
+ ≥2,17000,ai2d_exact_match,0.4485103626943005,0.008951310133709686
435
+ ≥2,17000,average,0.4803744475549501,
436
+ ≥2,17000,average_rank,3.3,
437
+ ≥2,17000,chartqa_relaxed_overall,0.6352,0.009629406741314642
438
+ ≥2,17000,docvqa_val_anls,0.6735387256928971,0.006001868055856522
439
+ ≥2,17000,infovqa_val_anls,0.2713449738427,0.007231154690666275
440
+ ≥2,17000,mme_total_score,1302.8314325730291,
441
+ ≥2,17000,mmmu_val_mmmu_acc,0.28667,
442
+ ≥2,17000,mmstar_average,0.33631999578132954,
443
+ ≥2,17000,ocrbench_ocrbench_accuracy,0.571,
444
+ ≥2,17000,seedbench_seed_all,0.542745969983324,
445
+ ≥2,17000,textvqa_val_exact_match,0.5580400000000001,0.006741465801458199
446
+ ≥2,18000,ai2d_exact_match,0.46113989637305697,0.008971933568013592
447
+ ≥2,18000,average,0.48745721111983964,
448
+ ≥2,18000,average_rank,2.6,
449
+ ≥2,18000,chartqa_relaxed_overall,0.6276,0.009670817229291067
450
+ ≥2,18000,docvqa_val_anls,0.6812777947859573,0.005935773909547658
451
+ ≥2,18000,infovqa_val_anls,0.27095882924867687,0.007164605404977649
452
+ ≥2,18000,mme_total_score,1289.7513005202081,
453
+ ≥2,18000,mmmu_val_mmmu_acc,0.31556,
454
+ ≥2,18000,mmstar_average,0.35401030852022664,
455
+ ≥2,18000,ocrbench_ocrbench_accuracy,0.564,
456
+ ≥2,18000,seedbench_seed_all,0.5505280711506393,
457
+ ≥2,18000,textvqa_val_exact_match,0.5620400000000001,0.00673487040527694
458
+ ≥2,19000,ai2d_exact_match,0.4698834196891192,0.008982814668850815
459
+ ≥2,19000,average,0.48664836716175586,
460
+ ≥2,19000,average_rank,3.2,
461
+ ≥2,19000,chartqa_relaxed_overall,0.6276,0.009670817229291067
462
+ ≥2,19000,docvqa_val_anls,0.6838077764263535,0.005944136929785695
463
+ ≥2,19000,infovqa_val_anls,0.26757170067350106,0.007096398035000058
464
+ ≥2,19000,mme_total_score,1310.4946978791518,
465
+ ≥2,19000,mmmu_val_mmmu_acc,0.29444,
466
+ ≥2,19000,mmstar_average,0.365800601107629,
467
+ ≥2,19000,ocrbench_ocrbench_accuracy,0.559,
468
+ ≥2,19000,seedbench_seed_all,0.5532518065591996,
469
+ ≥2,19000,textvqa_val_exact_match,0.55848,0.006735717623117797
470
+ ≥2,20000,ai2d_exact_match,0.4727979274611399,0.008985826352357515
471
+ ≥2,20000,average,0.4887875980209429,
472
+ ≥2,20000,average_rank,3.4,
473
+ ≥2,20000,chartqa_relaxed_overall,0.6392,0.00960657371300514
474
+ ≥2,20000,docvqa_val_anls,0.6828620051596259,0.005923332769971399
475
+ ≥2,20000,infovqa_val_anls,0.2701274975234547,0.007055868134029247
476
+ ≥2,20000,mme_total_score,1323.9108643457382,
477
+ ≥2,20000,mmmu_val_mmmu_acc,0.30222,
478
+ ≥2,20000,mmstar_average,0.33931189145504953,
479
+ ≥2,20000,ocrbench_ocrbench_accuracy,0.57,
480
+ ≥2,20000,seedbench_seed_all,0.5563090605892163,
481
+ ≥2,20000,textvqa_val_exact_match,0.56626,0.0067178082936069205
482
+ ≥3,1000,ai2d_exact_match,0.2691062176165803,0.007982164708643914
483
+ ≥3,1000,average,0.27573784261835144,
484
+ ≥3,1000,average_rank,2.8,
485
+ ≥3,1000,chartqa_relaxed_overall,0.352,0.009553790345406665
486
+ ≥3,1000,docvqa_val_anls,0.3425840937939014,0.005755186508181206
487
+ ≥3,1000,infovqa_val_anls,0.1714752271538445,0.006218691549786442
488
+ ≥3,1000,mme_total_score,1013.1872749099639,
489
+ ≥3,1000,mmmu_val_mmmu_acc,0.24778,
490
+ ≥3,1000,mmstar_average,0.2075589805205699,
491
+ ≥3,1000,ocrbench_ocrbench_accuracy,0.324,
492
+ ≥3,1000,seedbench_seed_all,0.24891606448026682,
493
+ ≥3,1000,textvqa_val_exact_match,0.31822,0.006368399926474836
494
+ ≥3,2000,ai2d_exact_match,0.25647668393782386,0.007859644922870102
495
+ ≥3,2000,average,0.32059377128504934,
496
+ ≥3,2000,average_rank,2.8,
497
+ ≥3,2000,chartqa_relaxed_overall,0.4628,0.009974279848861338
498
+ ≥3,2000,docvqa_val_anls,0.4518369496978485,0.00619300217721929
499
+ ≥3,2000,infovqa_val_anls,0.21204013425009277,0.006820894774458214
500
+ ≥3,2000,mme_total_score,1118.8858543417368,
501
+ ≥3,2000,mmmu_val_mmmu_acc,0.25222,
502
+ ≥3,2000,mmstar_average,0.20454842826555975,
503
+ ≥3,2000,ocrbench_ocrbench_accuracy,0.376,
504
+ ≥3,2000,seedbench_seed_all,0.25514174541411894,
505
+ ≥3,2000,textvqa_val_exact_match,0.41428000000000004,0.006714956027174666
506
+ ≥3,3000,ai2d_exact_match,0.25259067357512954,0.007820231277456426
507
+ ≥3,3000,average,0.35341646277484595,
508
+ ≥3,3000,average_rank,2.4,
509
+ ≥3,3000,chartqa_relaxed_overall,0.5208,0.00999334232158103
510
+ ≥3,3000,docvqa_val_anls,0.49758866181984457,0.00626460182861003
511
+ ≥3,3000,infovqa_val_anls,0.21333414080666746,0.0067509043256437935
512
+ ≥3,3000,mme_total_score,1165.3744497799119,
513
+ ≥3,3000,mmmu_val_mmmu_acc,0.26,
514
+ ≥3,3000,mmstar_average,0.2652435492500152,
515
+ ≥3,3000,ocrbench_ocrbench_accuracy,0.442,
516
+ ≥3,3000,seedbench_seed_all,0.29205113952195666,
517
+ ≥3,3000,textvqa_val_exact_match,0.43714,0.006763850531672249
518
+ ≥3,4000,ai2d_exact_match,0.28303108808290156,0.008107723290508887
519
+ ≥3,4000,average,0.37496255619498237,
520
+ ≥3,4000,average_rank,3.4,
521
+ ≥3,4000,chartqa_relaxed_overall,0.5412,0.009967987174315731
522
+ ≥3,4000,docvqa_val_anls,0.5296261512617491,0.006274192303767133
523
+ ≥3,4000,infovqa_val_anls,0.2050381576936679,0.006416570814061769
524
+ ≥3,4000,mme_total_score,1119.7681072428973,
525
+ ≥3,4000,mmmu_val_mmmu_acc,0.25556,
526
+ ≥3,4000,mmstar_average,0.24897141082880767,
527
+ ≥3,4000,ocrbench_ocrbench_accuracy,0.47,
528
+ ≥3,4000,seedbench_seed_all,0.3811561978877154,
529
+ ≥3,4000,textvqa_val_exact_match,0.46007999999999993,0.006793769924125808
530
+ ≥3,5000,ai2d_exact_match,0.3248056994818653,0.008428647470081763
531
+ ≥3,5000,average,0.3977887563101667,
532
+ ≥3,5000,average_rank,2.8,
533
+ ≥3,5000,chartqa_relaxed_overall,0.5544,0.009942625323290008
534
+ ≥3,5000,docvqa_val_anls,0.553669449701632,0.006282439058750721
535
+ ≥3,5000,infovqa_val_anls,0.20821650889148954,0.006430552192683275
536
+ ≥3,5000,mme_total_score,1326.9777911164465,
537
+ ≥3,5000,mmmu_val_mmmu_acc,0.26444,
538
+ ≥3,5000,mmstar_average,0.279759822424129,
539
+ ≥3,5000,ocrbench_ocrbench_accuracy,0.487,
540
+ ≥3,5000,seedbench_seed_all,0.43718732629238466,
541
+ ≥3,5000,textvqa_val_exact_match,0.47062,0.0067917147023207275
542
+ ≥3,6000,ai2d_exact_match,0.3536269430051813,0.008604903043803527
543
+ ≥3,6000,average,0.41524300122458385,
544
+ ≥3,6000,average_rank,3.1,
545
+ ≥3,6000,chartqa_relaxed_overall,0.568,0.009909070383761948
546
+ ≥3,6000,docvqa_val_anls,0.5722640243712676,0.00625854154899254
547
+ ≥3,6000,infovqa_val_anls,0.2204869348964998,0.00662088578415522
548
+ ≥3,6000,mme_total_score,1270.3575430172068,
549
+ ≥3,6000,mmmu_val_mmmu_acc,0.26556,
550
+ ≥3,6000,mmstar_average,0.2958896090262379,
551
+ ≥3,6000,ocrbench_ocrbench_accuracy,0.497,
552
+ ≥3,6000,seedbench_seed_all,0.47909949972206783,
553
+ ≥3,6000,textvqa_val_exact_match,0.48526,0.006795924028171543
554
+ ≥3,7000,ai2d_exact_match,0.3805051813471503,0.00873837769131663
555
+ ≥3,7000,average,0.42920372592352884,
556
+ ≥3,7000,average_rank,2.7,
557
+ ≥3,7000,chartqa_relaxed_overall,0.5728,0.009895414680177737
558
+ ≥3,7000,docvqa_val_anls,0.5922749765517075,0.006249497802747461
559
+ ≥3,7000,infovqa_val_anls,0.23025261139769496,0.006777932440928761
560
+ ≥3,7000,mme_total_score,1289.3664465786314,
561
+ ≥3,7000,mmmu_val_mmmu_acc,0.27111,
562
+ ≥3,7000,mmstar_average,0.3153601470057574,
563
+ ≥3,7000,ocrbench_ocrbench_accuracy,0.498,
564
+ ≥3,7000,seedbench_seed_all,0.4991106170094497,
565
+ ≥3,7000,textvqa_val_exact_match,0.50342,0.006801949281110862
566
+ ≥3,8000,ai2d_exact_match,0.39799222797927464,0.008809880751131852
567
+ ≥3,8000,average,0.438180751977588,
568
+ ≥3,8000,average_rank,2.4,
569
+ ≥3,8000,chartqa_relaxed_overall,0.5844,0.009858475126140203
570
+ ≥3,8000,docvqa_val_anls,0.6044755547364623,0.006202062618138765
571
+ ≥3,8000,infovqa_val_anls,0.21693088745597935,0.006529416377309533
572
+ ≥3,8000,mme_total_score,1187.3639455782313,
573
+ ≥3,8000,mmmu_val_mmmu_acc,0.28667,
574
+ ≥3,8000,mmstar_average,0.31735843114519735,
575
+ ≥3,8000,ocrbench_ocrbench_accuracy,0.506,
576
+ ≥3,8000,seedbench_seed_all,0.5193996664813786,
577
+ ≥3,8000,textvqa_val_exact_match,0.5104,0.0067972647853171315
578
+ ≥3,9000,ai2d_exact_match,0.407059585492228,0.008842319527489083
579
+ ≥3,9000,average,0.44395606448032265,
580
+ ≥3,9000,average_rank,3.0,
581
+ ≥3,9000,chartqa_relaxed_overall,0.598,0.009808000752013664
582
+ ≥3,9000,docvqa_val_anls,0.6107522318987826,0.006184930065074595
583
+ ≥3,9000,infovqa_val_anls,0.2347778400526839,0.0067525186273140235
584
+ ≥3,9000,mme_total_score,1195.0110044017606,
585
+ ≥3,9000,mmmu_val_mmmu_acc,0.28222,
586
+ ≥3,9000,mmstar_average,0.3264280968647572,
587
+ ≥3,9000,ocrbench_ocrbench_accuracy,0.521,
588
+ ≥3,9000,seedbench_seed_all,0.5162868260144525,
589
+ ≥3,9000,textvqa_val_exact_match,0.4990799999999999,0.00679372222366579
590
+ ≥3,10000,ai2d_exact_match,0.41580310880829013,0.008870644443998564
591
+ ≥3,10000,average,0.4524021135685592,
592
+ ≥3,10000,average_rank,2.5,
593
+ ≥3,10000,chartqa_relaxed_overall,0.5992,0.00980317218424473
594
+ ≥3,10000,docvqa_val_anls,0.6291907180725226,0.0061343676879221844
595
+ ≥3,10000,infovqa_val_anls,0.2282836442456148,0.006711844883510513
596
+ ≥3,10000,mme_total_score,1326.8972589035614,
597
+ ≥3,10000,mmmu_val_mmmu_acc,0.30111,
598
+ ≥3,10000,mmstar_average,0.3402582102457474,
599
+ ≥3,10000,ocrbench_ocrbench_accuracy,0.522,
600
+ ≥3,10000,seedbench_seed_all,0.5240133407448583,
601
+ ≥3,10000,textvqa_val_exact_match,0.51176,0.006789754092169055
602
+ ≥3,11000,ai2d_exact_match,0.42389896373056996,0.008894308540753343
603
+ ≥3,11000,average,0.45530296075039445,
604
+ ≥3,11000,average_rank,3.3,
605
+ ≥3,11000,chartqa_relaxed_overall,0.5992,0.00980317218424473
606
+ ≥3,11000,docvqa_val_anls,0.637004884118944,0.0060952660672868655
607
+ ≥3,11000,infovqa_val_anls,0.24182483065748125,0.006800414154487266
608
+ ≥3,11000,mme_total_score,1229.9441776710685,
609
+ ≥3,11000,mmmu_val_mmmu_acc,0.28556,
610
+ ≥3,11000,mmstar_average,0.3210406141609519,
611
+ ≥3,11000,ocrbench_ocrbench_accuracy,0.532,
612
+ ≥3,11000,seedbench_seed_all,0.5272373540856031,
613
+ ≥3,11000,textvqa_val_exact_match,0.52996,0.006774485841130848
614
+ ≥3,12000,ai2d_exact_match,0.4378238341968912,0.008929303814062614
615
+ ≥3,12000,average,0.4603808175211579,
616
+ ≥3,12000,average_rank,2.8,
617
+ ≥3,12000,chartqa_relaxed_overall,0.6036,0.009784943231599163
618
+ ≥3,12000,docvqa_val_anls,0.6425836471445318,0.006082856374953106
619
+ ≥3,12000,infovqa_val_anls,0.23921346499497054,0.00674373988949671
620
+ ≥3,12000,mme_total_score,1253.4613845538215,
621
+ ≥3,12000,mmmu_val_mmmu_acc,0.28,
622
+ ≥3,12000,mmstar_average,0.3402058443723711,
623
+ ≥3,12000,ocrbench_ocrbench_accuracy,0.533,
624
+ ≥3,12000,seedbench_seed_all,0.5370205669816565,
625
+ ≥3,12000,textvqa_val_exact_match,0.52998,0.006788538632972067
626
+ ≥3,13000,ai2d_exact_match,0.4410621761658031,0.0089364152923413
627
+ ≥3,13000,average,0.46617815773624777,
628
+ ≥3,13000,average_rank,2.8,
629
+ ≥3,13000,chartqa_relaxed_overall,0.6116,0.009749676839741497
630
+ ≥3,13000,docvqa_val_anls,0.6435913615068958,0.006093449845266186
631
+ ≥3,13000,infovqa_val_anls,0.24655403627027533,0.0068431739840280935
632
+ ≥3,13000,mme_total_score,1338.6154461784713,
633
+ ≥3,13000,mmmu_val_mmmu_acc,0.29556,
634
+ ≥3,13000,mmstar_average,0.33746561777886447,
635
+ ≥3,13000,ocrbench_ocrbench_accuracy,0.543,
636
+ ≥3,13000,seedbench_seed_all,0.5384102279043913,
637
+ ≥3,13000,textvqa_val_exact_match,0.5383600000000001,0.006773985492742893
638
+ ≥3,14000,ai2d_exact_match,0.4426813471502591,0.008939826412531762
639
+ ≥3,14000,average,0.46514162030247774,
640
+ ≥3,14000,average_rank,3.6,
641
+ ≥3,14000,chartqa_relaxed_overall,0.6104,0.009755142291143075
642
+ ≥3,14000,docvqa_val_anls,0.6522898002805984,0.006013616663077038
643
+ ≥3,14000,infovqa_val_anls,0.23824160343368236,0.006685403314320424
644
+ ≥3,14000,mme_total_score,1290.797318927571,
645
+ ≥3,14000,mmmu_val_mmmu_acc,0.29111,
646
+ ≥3,14000,mmstar_average,0.34665083130189556,
647
+ ≥3,14000,ocrbench_ocrbench_accuracy,0.533,
648
+ ≥3,14000,seedbench_seed_all,0.5418010005558643,
649
+ ≥3,14000,textvqa_val_exact_match,0.5300999999999999,0.006785072250248203
650
+ ≥3,15000,ai2d_exact_match,0.4536917098445596,0.00896047438220532
651
+ ≥3,15000,average,0.47760694744777243,
652
+ ≥3,15000,average_rank,2.9,
653
+ ≥3,15000,chartqa_relaxed_overall,0.612,0.009747841205275417
654
+ ≥3,15000,docvqa_val_anls,0.6656498012528964,0.006035466987037702
655
+ ≥3,15000,infovqa_val_anls,0.2625991131461808,0.007063916588796129
656
+ ≥3,15000,mme_total_score,1285.4465786314527,
657
+ ≥3,15000,mmmu_val_mmmu_acc,0.30222,
658
+ ≥3,15000,mmstar_average,0.3502185231309507,
659
+ ≥3,15000,ocrbench_ocrbench_accuracy,0.558,
660
+ ≥3,15000,seedbench_seed_all,0.5500833796553641,
661
+ ≥3,15000,textvqa_val_exact_match,0.544,0.0067575389652278954
662
+ ≥3,16000,ai2d_exact_match,0.4689119170984456,0.008981742470016596
663
+ ≥3,16000,average,0.4804309718902879,
664
+ ≥3,16000,average_rank,2.5,
665
+ ≥3,16000,chartqa_relaxed_overall,0.6204,0.009707689307588963
666
+ ≥3,16000,docvqa_val_anls,0.6742164965149466,0.0059800657435710326
667
+ ≥3,16000,infovqa_val_anls,0.2633355771988975,0.00704601997176055
668
+ ≥3,16000,mme_total_score,1288.4584833933575,
669
+ ≥3,16000,mmmu_val_mmmu_acc,0.29556,
670
+ ≥3,16000,mmstar_average,0.3443487528651147,
671
+ ≥3,16000,ocrbench_ocrbench_accuracy,0.55,
672
+ ≥3,16000,seedbench_seed_all,0.5508060033351863,
673
+ ≥3,16000,textvqa_val_exact_match,0.5563,0.006742548063668376
674
+ ≥3,17000,ai2d_exact_match,0.45595854922279794,0.008964175733819342
675
+ ≥3,17000,average,0.4809373657329622,
676
+ ≥3,17000,average_rank,3.3,
677
+ ≥3,17000,chartqa_relaxed_overall,0.6204,0.009707689307588963
678
+ ≥3,17000,docvqa_val_anls,0.6739488016448908,0.005975889304414765
679
+ ≥3,17000,infovqa_val_anls,0.2580649809644441,0.007031141926644411
680
+ ≥3,17000,mme_total_score,1230.4375750300119,
681
+ ≥3,17000,mmmu_val_mmmu_acc,0.29444,
682
+ ≥3,17000,mmstar_average,0.3444925534276732,
683
+ ≥3,17000,ocrbench_ocrbench_accuracy,0.578,
684
+ ≥3,17000,seedbench_seed_all,0.5565314063368538,
685
+ ≥3,17000,textvqa_val_exact_match,0.5466,0.006752985159298985
686
+ ≥3,18000,ai2d_exact_match,0.45919689119170987,0.008969138793675545
687
+ ≥3,18000,average,0.48088758936650067,
688
+ ≥3,18000,average_rank,3.5,
689
+ ≥3,18000,chartqa_relaxed_overall,0.6252,0.009683361554563506
690
+ ≥3,18000,docvqa_val_anls,0.675384499731014,0.005997750609265588
691
+ ≥3,18000,infovqa_val_anls,0.2579974692510198,0.0070128299378415275
692
+ ≥3,18000,mme_total_score,1234.843237294918,
693
+ ≥3,18000,mmmu_val_mmmu_acc,0.3,
694
+ ≥3,18000,mmstar_average,0.3363850750308216,
695
+ ≥3,18000,ocrbench_ocrbench_accuracy,0.566,
696
+ ≥3,18000,seedbench_seed_all,0.5558643690939411,
697
+ ≥3,18000,textvqa_val_exact_match,0.55196,0.006755291146330729
698
+ ≥3,19000,ai2d_exact_match,0.4634067357512953,0.008975020819363737
699
+ ≥3,19000,average,0.4861360634692545,
700
+ ≥3,19000,average_rank,3.3,
701
+ ≥3,19000,chartqa_relaxed_overall,0.6312,0.009651522406019766
702
+ ≥3,19000,docvqa_val_anls,0.6819220996842664,0.005927423649467908
703
+ ≥3,19000,infovqa_val_anls,0.26277439983326806,0.007102707331042042
704
+ ≥3,19000,mme_total_score,1337.9653861544616,
705
+ ≥3,19000,mmmu_val_mmmu_acc,0.29889,
706
+ ≥3,19000,mmstar_average,0.34778832316957964,
707
+ ≥3,19000,ocrbench_ocrbench_accuracy,0.574,
708
+ ≥3,19000,seedbench_seed_all,0.5614230127848805,
709
+ ≥3,19000,textvqa_val_exact_match,0.55382,0.006743039020727005
710
+ ≥3,20000,ai2d_exact_match,0.4841321243523316,0.008994621193008031
711
+ ≥3,20000,average,0.4916087790351852,
712
+ ≥3,20000,average_rank,2.3,
713
+ ≥3,20000,chartqa_relaxed_overall,0.638,0.009613499245701268
714
+ ≥3,20000,docvqa_val_anls,0.6839168937073106,0.005936410873687919
715
+ ≥3,20000,infovqa_val_anls,0.25441216838205727,0.006890877173562315
716
+ ≥3,20000,mme_total_score,1330.3037214885953,
717
+ ≥3,20000,mmmu_val_mmmu_acc,0.31,
718
+ ≥3,20000,mmstar_average,0.35052721898280503,
719
+ ≥3,20000,ocrbench_ocrbench_accuracy,0.572,
720
+ ≥3,20000,seedbench_seed_all,0.5630906058921623,
721
+ ≥3,20000,textvqa_val_exact_match,0.5684000000000001,0.00672360984783302
722
+ ≥4,1000,ai2d_exact_match,0.26360103626943004,0.00792979255467583
723
+ ≥4,1000,average,0.26922373369534647,
724
+ ≥4,1000,average_rank,3.3,
725
+ ≥4,1000,chartqa_relaxed_overall,0.3488,0.009533718094861256
726
+ ≥4,1000,docvqa_val_anls,0.3599045480096881,0.005885735735631119
727
+ ≥4,1000,infovqa_val_anls,0.17148252623256244,0.0061724612150041895
728
+ ≥4,1000,mme_total_score,1104.3533413365346,
729
+ ≥4,1000,mmmu_val_mmmu_acc,0.24,
730
+ ≥4,1000,mmstar_average,0.21109041770474804,
731
+ ≥4,1000,ocrbench_ocrbench_accuracy,0.29,
732
+ ≥4,1000,seedbench_seed_all,0.24313507504168982,
733
+ ≥4,1000,textvqa_val_exact_match,0.295,0.006241441429527609
734
+ ≥4,2000,ai2d_exact_match,0.26295336787564766,0.007923526907377255
735
+ ≥4,2000,average,0.3192621308996215,
736
+ ≥4,2000,average_rank,2.9,
737
+ ≥4,2000,chartqa_relaxed_overall,0.4644,0.009976616117083942
738
+ ≥4,2000,docvqa_val_anls,0.44610634422212336,0.006125661837378556
739
+ ≥4,2000,infovqa_val_anls,0.19012118870963063,0.006420072608935975
740
+ ≥4,2000,mme_total_score,1052.5613245298118,
741
+ ≥4,2000,mmmu_val_mmmu_acc,0.23889,
742
+ ≥4,2000,mmstar_average,0.22088912220303317,
743
+ ≥4,2000,ocrbench_ocrbench_accuracy,0.389,
744
+ ≥4,2000,seedbench_seed_all,0.262479155086159,
745
+ ≥4,2000,textvqa_val_exact_match,0.39852,0.006693677836929181
746
+ ≥4,3000,ai2d_exact_match,0.2545336787564767,0.007840040862810524
747
+ ≥4,3000,average,0.34899183633853254,
748
+ ≥4,3000,average_rank,3.1,
749
+ ≥4,3000,chartqa_relaxed_overall,0.5144,0.009997851710018818
750
+ ≥4,3000,docvqa_val_anls,0.5122633926443586,0.006250170224123374
751
+ ≥4,3000,infovqa_val_anls,0.21839296983497156,0.006786311152255019
752
+ ≥4,3000,mme_total_score,1148.9873949579833,
753
+ ≥4,3000,mmmu_val_mmmu_acc,0.24667,
754
+ ≥4,3000,mmstar_average,0.23884910949080793,
755
+ ≥4,3000,ocrbench_ocrbench_accuracy,0.426,
756
+ ≥4,3000,seedbench_seed_all,0.29927737632017787,
757
+ ≥4,3000,textvqa_val_exact_match,0.43054,0.006761938068430401
758
+ ≥4,4000,ai2d_exact_match,0.2814119170984456,0.00809362228799086
759
+ ≥4,4000,average,0.3808723899304912,
760
+ ≥4,4000,average_rank,2.3,
761
+ ≥4,4000,chartqa_relaxed_overall,0.536,0.009976041728231964
762
+ ≥4,4000,docvqa_val_anls,0.5444976153718191,0.006262351643342788
763
+ ≥4,4000,infovqa_val_anls,0.22943118895386538,0.006865542219383826
764
+ ≥4,4000,mme_total_score,1161.4330732292917,
765
+ ≥4,4000,mmmu_val_mmmu_acc,0.26889,
766
+ ≥4,4000,mmstar_average,0.2546319608241094,
767
+ ≥4,4000,ocrbench_ocrbench_accuracy,0.459,
768
+ ≥4,4000,seedbench_seed_all,0.39988882712618123,
769
+ ≥4,4000,textvqa_val_exact_match,0.4541,0.006780990662644609
770
+ ≥4,5000,ai2d_exact_match,0.31573834196891193,0.00836578020190971
771
+ ≥4,5000,average,0.4004212057382194,
772
+ ≥4,5000,average_rank,2.7,
773
+ ≥4,5000,chartqa_relaxed_overall,0.5544,0.009942625323290008
774
+ ≥4,5000,docvqa_val_anls,0.556855142418819,0.006267140081468451
775
+ ≥4,5000,infovqa_val_anls,0.23435340618373432,0.006883129487757931
776
+ ≥4,5000,mme_total_score,1145.157863145258,
777
+ ≥4,5000,mmmu_val_mmmu_acc,0.26556,
778
+ ≥4,5000,mmstar_average,0.2888277743020811,
779
+ ≥4,5000,ocrbench_ocrbench_accuracy,0.475,
780
+ ≥4,5000,seedbench_seed_all,0.445136186770428,
781
+ ≥4,5000,textvqa_val_exact_match,0.46792,0.0067973094238147356
782
+ ≥4,6000,ai2d_exact_match,0.38471502590673573,0.008756678690415541
783
+ ≥4,6000,average,0.42131977921781544,
784
+ ≥4,6000,average_rank,3.0,
785
+ ≥4,6000,chartqa_relaxed_overall,0.556,0.00993907007952043
786
+ ≥4,6000,docvqa_val_anls,0.5727106862384739,0.006269180765398416
787
+ ≥4,6000,infovqa_val_anls,0.2310709838980833,0.006744459748098398
788
+ ≥4,6000,mme_total_score,1139.8311324529811,
789
+ ≥4,6000,mmmu_val_mmmu_acc,0.27,
790
+ ≥4,6000,mmstar_average,0.30779610290926424,
791
+ ≥4,6000,ocrbench_ocrbench_accuracy,0.492,
792
+ ≥4,6000,seedbench_seed_all,0.4933852140077821,
793
+ ≥4,6000,textvqa_val_exact_match,0.4841999999999999,0.006796772117869219
794
+ ≥4,7000,ai2d_exact_match,0.39281088082901555,0.008789930274160654
795
+ ≥4,7000,average,0.42891500537341953,
796
+ ≥4,7000,average_rank,2.9,
797
+ ≥4,7000,chartqa_relaxed_overall,0.576,0.009885782289560632
798
+ ≥4,7000,docvqa_val_anls,0.5907488324071782,0.006231156163373406
799
+ ≥4,7000,infovqa_val_anls,0.24013816441297325,0.006930097636315065
800
+ ≥4,7000,mme_total_score,1162.137755102041,
801
+ ≥4,7000,mmmu_val_mmmu_acc,0.27556,
802
+ ≥4,7000,mmstar_average,0.29752599783778977,
803
+ ≥4,7000,ocrbench_ocrbench_accuracy,0.504,
804
+ ≥4,7000,seedbench_seed_all,0.5001111728738188,
805
+ ≥4,7000,textvqa_val_exact_match,0.48333999999999994,0.006805450147517214
806
+ ≥4,8000,ai2d_exact_match,0.4164507772020725,0.008872627955954676
807
+ ≥4,8000,average,0.43574351275219425,
808
+ ≥4,8000,average_rank,3.5,
809
+ ≥4,8000,chartqa_relaxed_overall,0.5808,0.009870537726284339
810
+ ≥4,8000,docvqa_val_anls,0.6057226019616091,0.0061946427553956785
811
+ ≥4,8000,infovqa_val_anls,0.2476713069705094,0.006953489019987495
812
+ ≥4,8000,mme_total_score,1170.280612244898,
813
+ ≥4,8000,mmmu_val_mmmu_acc,0.26778,
814
+ ≥4,8000,mmstar_average,0.30520454953605713,
815
+ ≥4,8000,ocrbench_ocrbench_accuracy,0.496,
816
+ ≥4,8000,seedbench_seed_all,0.5082823790994997,
817
+ ≥4,8000,textvqa_val_exact_match,0.49378,0.006806491606223952
818
+ ≥4,9000,ai2d_exact_match,0.42357512953367876,0.008893409023558714
819
+ ≥4,9000,average,0.441337144868937,
820
+ ≥4,9000,average_rank,3.4,
821
+ ≥4,9000,chartqa_relaxed_overall,0.578,0.00987954665846924
822
+ ≥4,9000,docvqa_val_anls,0.6243353881540346,0.006123815047404004
823
+ ≥4,9000,infovqa_val_anls,0.2437398253282973,0.00692277294272151
824
+ ≥4,9000,mme_total_score,1255.001700680272,
825
+ ≥4,9000,mmmu_val_mmmu_acc,0.26778,
826
+ ≥4,9000,mmstar_average,0.31167080905344985,
827
+ ≥4,9000,ocrbench_ocrbench_accuracy,0.512,
828
+ ≥4,9000,seedbench_seed_all,0.5116731517509727,
829
+ ≥4,9000,textvqa_val_exact_match,0.49926000000000004,0.006799642454386958
830
+ ≥4,10000,ai2d_exact_match,0.44462435233160624,0.00894379269709736
831
+ ≥4,10000,average,0.4536388119594498,
832
+ ≥4,10000,average_rank,3.6,
833
+ ≥4,10000,chartqa_relaxed_overall,0.5992,0.00980317218424473
834
+ ≥4,10000,docvqa_val_anls,0.6264595846035441,0.006147505656275056
835
+ ≥4,10000,infovqa_val_anls,0.2598110483089896,0.00706252458320144
836
+ ≥4,10000,mme_total_score,1192.952080832333,
837
+ ≥4,10000,mmmu_val_mmmu_acc,0.28556,
838
+ ≥4,10000,mmstar_average,0.3186673407344323,
839
+ ≥4,10000,ocrbench_ocrbench_accuracy,0.52,
840
+ ≥4,10000,seedbench_seed_all,0.5205669816564759,
841
+ ≥4,10000,textvqa_val_exact_match,0.5078600000000001,0.006802447996107573
842
+ ≥4,11000,ai2d_exact_match,0.4536917098445596,0.008960474382205324
843
+ ≥4,11000,average,0.46152725733636885,
844
+ ≥4,11000,average_rank,2.3,
845
+ ≥4,11000,chartqa_relaxed_overall,0.6004,0.009798282427824488
846
+ ≥4,11000,docvqa_val_anls,0.6401993666501584,0.0060898160255800525
847
+ ≥4,11000,infovqa_val_anls,0.2552761209118603,0.007046581941151624
848
+ ≥4,11000,mme_total_score,1246.6340536214486,
849
+ ≥4,11000,mmmu_val_mmmu_acc,0.28,
850
+ ≥4,11000,mmstar_average,0.3347344054467562,
851
+ ≥4,11000,ocrbench_ocrbench_accuracy,0.533,
852
+ ≥4,11000,seedbench_seed_all,0.5306837131739855,
853
+ ≥4,11000,textvqa_val_exact_match,0.5257599999999999,0.006772980077619183
854
+ ≥4,12000,ai2d_exact_match,0.45919689119170987,0.008969138793675545
855
+ ≥4,12000,average,0.46100484234717837,
856
+ ≥4,12000,average_rank,3.7,
857
+ ≥4,12000,chartqa_relaxed_overall,0.5956,0.009817474681589429
858
+ ≥4,12000,docvqa_val_anls,0.6409268162702743,0.006097072959583667
859
+ ≥4,12000,infovqa_val_anls,0.26230050824120466,0.007170670588017343
860
+ ≥4,12000,mme_total_score,1168.390556222489,
861
+ ≥4,12000,mmmu_val_mmmu_acc,0.26889,
862
+ ≥4,12000,mmstar_average,0.3293674421306994,
863
+ ≥4,12000,ocrbench_ocrbench_accuracy,0.538,
864
+ ≥4,12000,seedbench_seed_all,0.5314619232907171,
865
+ ≥4,12000,textvqa_val_exact_match,0.5233,0.006791483405661084
866
+ ≥4,13000,ai2d_exact_match,0.46081606217616583,0.008971477299154906
867
+ ≥4,13000,average,0.46897968661537504,
868
+ ≥4,13000,average_rank,3.0,
869
+ ≥4,13000,chartqa_relaxed_overall,0.6084,0.00976411343463736
870
+ ≥4,13000,docvqa_val_anls,0.6557097904355208,0.006045284321472833
871
+ ≥4,13000,infovqa_val_anls,0.25716935409374025,0.007037968981507592
872
+ ≥4,13000,mme_total_score,1214.4760904361744,
873
+ ≥4,13000,mmmu_val_mmmu_acc,0.27444,
874
+ ≥4,13000,mmstar_average,0.35062705343328215,
875
+ ≥4,13000,ocrbench_ocrbench_accuracy,0.542,
876
+ ≥4,13000,seedbench_seed_all,0.5388549193996665,
877
+ ≥4,13000,textvqa_val_exact_match,0.5328,0.006772208248489718
878
+ ≥4,14000,ai2d_exact_match,0.4637305699481865,0.008975446629055962
879
+ ≥4,14000,average,0.46882712562329804,
880
+ ≥4,14000,average_rank,3.4,
881
+ ≥4,14000,chartqa_relaxed_overall,0.6052,0.009778109662477129
882
+ ≥4,14000,docvqa_val_anls,0.6600293980607723,0.006003818486747537
883
+ ≥4,14000,infovqa_val_anls,0.2604896578960276,0.0070806001081496605
884
+ ≥4,14000,mme_total_score,1180.2360944377751,
885
+ ≥4,14000,mmmu_val_mmmu_acc,0.29889,
886
+ ≥4,14000,mmstar_average,0.3370405135985262,
887
+ ≥4,14000,ocrbench_ocrbench_accuracy,0.532,
888
+ ≥4,14000,seedbench_seed_all,0.5311839911061701,
889
+ ≥4,14000,textvqa_val_exact_match,0.53088,0.006765681045393848
890
+ ≥4,15000,ai2d_exact_match,0.469559585492228,0.008982461065390123
891
+ ≥4,15000,average,0.47678210727691706,
892
+ ≥4,15000,average_rank,3.0,
893
+ ≥4,15000,chartqa_relaxed_overall,0.6228,0.009695651925812239
894
+ ≥4,15000,docvqa_val_anls,0.668732849209273,0.006002172541493102
895
+ ≥4,15000,infovqa_val_anls,0.2541377129865746,0.006911037097155498
896
+ ≥4,15000,mme_total_score,1198.8395358143257,
897
+ ≥4,15000,mmmu_val_mmmu_acc,0.28111,
898
+ ≥4,15000,mmstar_average,0.3574887121899482,
899
+ ≥4,15000,ocrbench_ocrbench_accuracy,0.558,
900
+ ≥4,15000,seedbench_seed_all,0.5421901056142301,
901
+ ≥4,15000,textvqa_val_exact_match,0.53702,0.0067620891069120025
902
+ ≥4,16000,ai2d_exact_match,0.4689119170984456,0.00898174247001659
903
+ ≥4,16000,average,0.47623501147363423,
904
+ ≥4,16000,average_rank,3.7,
905
+ ≥4,16000,chartqa_relaxed_overall,0.6184,0.009717527882093043
906
+ ≥4,16000,docvqa_val_anls,0.664711612332228,0.006033753206179003
907
+ ≥4,16000,infovqa_val_anls,0.26137627968800997,0.0069587136315641595
908
+ ≥4,16000,mme_total_score,1223.327631052421,
909
+ ≥4,16000,mmmu_val_mmmu_acc,0.27778,
910
+ ≥4,16000,mmstar_average,0.3532221312757646,
911
+ ≥4,16000,ocrbench_ocrbench_accuracy,0.545,
912
+ ≥4,16000,seedbench_seed_all,0.5476931628682602,
913
+ ≥4,16000,textvqa_val_exact_match,0.54902,0.006730591957147508
914
+ ≥4,17000,ai2d_exact_match,0.47830310880829013,0.008990677331728418
915
+ ≥4,17000,average,0.4815150623543914,
916
+ ≥4,17000,average_rank,2.8,
917
+ ≥4,17000,chartqa_relaxed_overall,0.6208,0.009705700605814084
918
+ ≥4,17000,docvqa_val_anls,0.6784945768946954,0.005958779114256312
919
+ ≥4,17000,infovqa_val_anls,0.27415576971914574,0.007211057524316044
920
+ ≥4,17000,mme_total_score,1267.6510604241696,
921
+ ≥4,17000,mmmu_val_mmmu_acc,0.27889,
922
+ ≥4,17000,mmstar_average,0.35485659715149337,
923
+ ≥4,17000,ocrbench_ocrbench_accuracy,0.55,
924
+ ≥4,17000,seedbench_seed_all,0.5479155086158978,
925
+ ≥4,17000,textvqa_val_exact_match,0.5502199999999999,0.006738803500215962
926
+ ≥4,18000,ai2d_exact_match,0.4795984455958549,0.008991659681159872
927
+ ≥4,18000,average,0.4839656525796875,
928
+ ≥4,18000,average_rank,3.1,
929
+ ≥4,18000,chartqa_relaxed_overall,0.6228,0.009695651925812239
930
+ ≥4,18000,docvqa_val_anls,0.680615041882376,0.005957029786047422
931
+ ≥4,18000,infovqa_val_anls,0.27507992619170296,0.007267921800589956
932
+ ≥4,18000,mme_total_score,1226.5048019207684,
933
+ ≥4,18000,mmmu_val_mmmu_acc,0.28111,
934
+ ≥4,18000,mmstar_average,0.35607565298805366,
935
+ ≥4,18000,ocrbench_ocrbench_accuracy,0.555,
936
+ ≥4,18000,seedbench_seed_all,0.5532518065591996,
937
+ ≥4,18000,textvqa_val_exact_match,0.55216,0.006730239676654988
938
+ ≥4,19000,ai2d_exact_match,0.4734455958549223,0.008986453895645547
939
+ ≥4,19000,average,0.485443851233213,
940
+ ≥4,19000,average_rank,3.0,
941
+ ≥4,19000,chartqa_relaxed_overall,0.6276,0.009670817229291067
942
+ ≥4,19000,docvqa_val_anls,0.690884348495626,0.005908240141234498
943
+ ≥4,19000,infovqa_val_anls,0.2676836840845966,0.007165567282387595
944
+ ≥4,19000,mme_total_score,1323.2516006402561,
945
+ ≥4,19000,mmmu_val_mmmu_acc,0.28556,
946
+ ≥4,19000,mmstar_average,0.33406913716627346,
947
+ ≥4,19000,ocrbench_ocrbench_accuracy,0.584,
948
+ ≥4,19000,seedbench_seed_all,0.5414118954974986,
949
+ ≥4,19000,textvqa_val_exact_match,0.56434,0.006692191716171407
950
+ ≥4,20000,ai2d_exact_match,0.4876943005181347,0.008996428218289526
951
+ ≥4,20000,average,0.4906341423361293,
952
+ ≥4,20000,average_rank,3.3,
953
+ ≥4,20000,chartqa_relaxed_overall,0.6284,0.009666579183001631
954
+ ≥4,20000,docvqa_val_anls,0.6887236251150223,0.005918556723502163
955
+ ≥4,20000,infovqa_val_anls,0.2809124119459898,0.007354611102020885
956
+ ≥4,20000,mme_total_score,1254.5532212885155,
957
+ ≥4,20000,mmmu_val_mmmu_acc,0.29333,
958
+ ≥4,20000,mmstar_average,0.34736535367392096,
959
+ ≥4,20000,ocrbench_ocrbench_accuracy,0.572,
960
+ ≥4,20000,seedbench_seed_all,0.5508615897720957,
961
+ ≥4,20000,textvqa_val_exact_match,0.56642,0.00672606309106159
962
+ ≥5,1000,ai2d_exact_match,0.26327720207253885,0.007926662492947056
963
+ ≥5,1000,average,0.27709006947371073,
964
+ ≥5,1000,average_rank,2.6,
965
+ ≥5,1000,chartqa_relaxed_overall,0.3412,0.009484144853461517
966
+ ≥5,1000,docvqa_val_anls,0.36296241117667905,0.005852839558467308
967
+ ≥5,1000,infovqa_val_anls,0.17994878830754762,0.006336933369747534
968
+ ≥5,1000,mme_total_score,968.375450180072,
969
+ ≥5,1000,mmmu_val_mmmu_acc,0.26667,
970
+ ≥5,1000,mmstar_average,0.22684359669162246,
971
+ ≥5,1000,ocrbench_ocrbench_accuracy,0.301,
972
+ ≥5,1000,seedbench_seed_all,0.25152862701500833,
973
+ ≥5,1000,textvqa_val_exact_match,0.30038,0.006282823083071704
974
+ ≥5,2000,ai2d_exact_match,0.27331606217616583,0.008021157484423327
975
+ ≥5,2000,average,0.318491261297989,
976
+ ≥5,2000,average_rank,3.2,
977
+ ≥5,2000,chartqa_relaxed_overall,0.4524,0.009956573172519544
978
+ ≥5,2000,docvqa_val_anls,0.4578740641673747,0.006180081722767688
979
+ ≥5,2000,infovqa_val_anls,0.1919057230410833,0.006401757863597739
980
+ ≥5,2000,mme_total_score,1031.2603041216487,
981
+ ≥5,2000,mmmu_val_mmmu_acc,0.24667,
982
+ ≥5,2000,mmstar_average,0.21129996032951712,
983
+ ≥5,2000,ocrbench_ocrbench_accuracy,0.383,
984
+ ≥5,2000,seedbench_seed_all,0.25597554196775985,
985
+ ≥5,2000,textvqa_val_exact_match,0.39398,0.0066750028503822015
986
+ ≥5,3000,ai2d_exact_match,0.2661917098445596,0.007954634970279373
987
+ ≥5,3000,average,0.3470898411915701,
988
+ ≥5,3000,average_rank,3.1,
989
+ ≥5,3000,chartqa_relaxed_overall,0.4888,0.009999490983443667
990
+ ≥5,3000,docvqa_val_anls,0.5063663265388635,0.006269377896147078
991
+ ≥5,3000,infovqa_val_anls,0.2002412084672373,0.006449644926640854
992
+ ≥5,3000,mme_total_score,1176.8578431372548,
993
+ ≥5,3000,mmmu_val_mmmu_acc,0.25889,
994
+ ≥5,3000,mmstar_average,0.2226891646728035,
995
+ ≥5,3000,ocrbench_ocrbench_accuracy,0.422,
996
+ ≥5,3000,seedbench_seed_all,0.32229016120066706,
997
+ ≥5,3000,textvqa_val_exact_match,0.43633999999999995,0.006743513614961789
998
+ ≥5,4000,ai2d_exact_match,0.32091968911917096,0.008402150106895235
999
+ ≥5,4000,average,0.38454946481840957,
1000
+ ≥5,4000,average_rank,3.0,
1001
+ ≥5,4000,chartqa_relaxed_overall,0.5244,0.009990083919101193
1002
+ ≥5,4000,docvqa_val_anls,0.5408182220870532,0.0062304604635426315
1003
+ ≥5,4000,infovqa_val_anls,0.21034975209325477,0.006529781109938355
1004
+ ≥5,4000,mme_total_score,1186.4263705482194,
1005
+ ≥5,4000,mmmu_val_mmmu_acc,0.26556,
1006
+ ≥5,4000,mmstar_average,0.26918979355147643,
1007
+ ≥5,4000,ocrbench_ocrbench_accuracy,0.452,
1008
+ ≥5,4000,seedbench_seed_all,0.4339077265147304,
1009
+ ≥5,4000,textvqa_val_exact_match,0.4438,0.006776008770579609
1010
+ ≥5,5000,ai2d_exact_match,0.3494170984455959,0.008581339503665948
1011
+ ≥5,5000,average,0.4053929772745627,
1012
+ ≥5,5000,average_rank,2.9,
1013
+ ≥5,5000,chartqa_relaxed_overall,0.546,0.009959582185560013
1014
+ ≥5,5000,docvqa_val_anls,0.5611769594797935,0.006252030837783964
1015
+ ≥5,5000,infovqa_val_anls,0.2283202771889911,0.006874345513158979
1016
+ ≥5,5000,mme_total_score,1179.6603641456581,
1017
+ ≥5,5000,mmmu_val_mmmu_acc,0.27556,
1018
+ ≥5,5000,mmstar_average,0.28276518409209217,
1019
+ ≥5,5000,ocrbench_ocrbench_accuracy,0.464,
1020
+ ≥5,5000,seedbench_seed_all,0.4750972762645914,
1021
+ ≥5,5000,textvqa_val_exact_match,0.4662,0.0067984671677640855
1022
+ ≥5,6000,ai2d_exact_match,0.3636658031088083,0.008658158841882573
1023
+ ≥5,6000,average,0.41623598541202544,
1024
+ ≥5,6000,average_rank,2.6,
1025
+ ≥5,6000,chartqa_relaxed_overall,0.5584,0.009933541468098847
1026
+ ≥5,6000,docvqa_val_anls,0.5839255211800125,0.006223251970774856
1027
+ ≥5,6000,infovqa_val_anls,0.23899504944949723,0.007013133491201096
1028
+ ≥5,6000,mme_total_score,1252.7314925970388,
1029
+ ≥5,6000,mmmu_val_mmmu_acc,0.27222,
1030
+ ≥5,6000,mmstar_average,0.3101670336024846,
1031
+ ≥5,6000,ocrbench_ocrbench_accuracy,0.473,
1032
+ ≥5,6000,seedbench_seed_all,0.49483046136742637,
1033
+ ≥5,6000,textvqa_val_exact_match,0.45092,0.006772193384764505
1034
+ ≥5,7000,ai2d_exact_match,0.42033678756476683,0.008884198538329093
1035
+ ≥5,7000,average,0.4338560588435303,
1036
+ ≥5,7000,average_rank,2.5,
1037
+ ≥5,7000,chartqa_relaxed_overall,0.5692,0.00990574548014469
1038
+ ≥5,7000,docvqa_val_anls,0.5924368390904757,0.006231022369252223
1039
+ ≥5,7000,infovqa_val_anls,0.23945153983485024,0.007006534034576772
1040
+ ≥5,7000,mme_total_score,1315.113445378151,
1041
+ ≥5,7000,mmmu_val_mmmu_acc,0.3,
1042
+ ≥5,7000,mmstar_average,0.31063340423564356,
1043
+ ≥5,7000,ocrbench_ocrbench_accuracy,0.488,
1044
+ ≥5,7000,seedbench_seed_all,0.5067259588660367,
1045
+ ≥5,7000,textvqa_val_exact_match,0.47791999999999996,0.006793800546466833
1046
+ ≥5,8000,ai2d_exact_match,0.42908031088082904,0.008908169846895226
1047
+ ≥5,8000,average,0.43778255861533233,
1048
+ ≥5,8000,average_rank,3.0,
1049
+ ≥5,8000,chartqa_relaxed_overall,0.5752,0.009888230116554488
1050
+ ≥5,8000,docvqa_val_anls,0.6032859006895523,0.006193925022795706
1051
+ ≥5,8000,infovqa_val_anls,0.24493490021598546,0.007008771158507111
1052
+ ≥5,8000,mme_total_score,1304.6824729891955,
1053
+ ≥5,8000,mmmu_val_mmmu_acc,0.28667,
1054
+ ≥5,8000,mmstar_average,0.31703546216629863,
1055
+ ≥5,8000,ocrbench_ocrbench_accuracy,0.487,
1056
+ ≥5,8000,seedbench_seed_all,0.5096164535853251,
1057
+ ≥5,8000,textvqa_val_exact_match,0.48722,0.006804659800386776
1058
+ ≥5,9000,ai2d_exact_match,0.42940414507772023,0.008909003051055709
1059
+ ≥5,9000,average,0.44649777930382,
1060
+ ≥5,9000,average_rank,2.5,
1061
+ ≥5,9000,chartqa_relaxed_overall,0.5792,0.009875725592704212
1062
+ ≥5,9000,docvqa_val_anls,0.6158422097964253,0.00617698110304048
1063
+ ≥5,9000,infovqa_val_anls,0.24039717009699607,0.0068877247346275485
1064
+ ≥5,9000,mme_total_score,1379.7254901960782,
1065
+ ≥5,9000,mmmu_val_mmmu_acc,0.29889,
1066
+ ≥5,9000,mmstar_average,0.3280690123874737,
1067
+ ≥5,9000,ocrbench_ocrbench_accuracy,0.513,
1068
+ ≥5,9000,seedbench_seed_all,0.5234574763757643,
1069
+ ≥5,9000,textvqa_val_exact_match,0.4902200000000001,0.006801597067199211
1070
+ ≥5,10000,ai2d_exact_match,0.45012953367875647,0.00895427929990258
1071
+ ≥5,10000,average,0.4555663389491801,
1072
+ ≥5,10000,average_rank,2.9,
1073
+ ≥5,10000,chartqa_relaxed_overall,0.5844,0.009858475126140203
1074
+ ≥5,10000,docvqa_val_anls,0.6189420793161403,0.006040465868816934
1075
+ ≥5,10000,infovqa_val_anls,0.24850918819779613,0.007091394184737253
1076
+ ≥5,10000,mme_total_score,1235.7704081632655,
1077
+ ≥5,10000,mmmu_val_mmmu_acc,0.27667,
1078
+ ≥5,10000,mmstar_average,0.34675895640940496,
1079
+ ≥5,10000,ocrbench_ocrbench_accuracy,0.528,
1080
+ ≥5,10000,seedbench_seed_all,0.5291272929405225,
1081
+ ≥5,10000,textvqa_val_exact_match,0.51756,0.006786717998284417
1082
+ ≥5,11000,ai2d_exact_match,0.4566062176165803,0.008965198879336196
1083
+ ≥5,11000,average,0.458748059148796,
1084
+ ≥5,11000,average_rank,2.9,
1085
+ ≥5,11000,chartqa_relaxed_overall,0.5916,0.0098327233755248
1086
+ ≥5,11000,docvqa_val_anls,0.633602507666147,0.006134122729213928
1087
+ ≥5,11000,infovqa_val_anls,0.2621320066294427,0.007275786683175354
1088
+ ≥5,11000,mme_total_score,1326.4276710684273,
1089
+ ≥5,11000,mmmu_val_mmmu_acc,0.27667,
1090
+ ≥5,11000,mmstar_average,0.34479339575773355,
1091
+ ≥5,11000,ocrbench_ocrbench_accuracy,0.517,
1092
+ ≥5,11000,seedbench_seed_all,0.5311284046692607,
1093
+ ≥5,11000,textvqa_val_exact_match,0.5152000000000001,0.006786619456012555
1094
+ ≥5,12000,ai2d_exact_match,0.45919689119170987,0.008969138793675545
1095
+ ≥5,12000,average,0.4644995480385772,
1096
+ ≥5,12000,average_rank,2.3,
1097
+ ≥5,12000,chartqa_relaxed_overall,0.596,0.009815912634917984
1098
+ ≥5,12000,docvqa_val_anls,0.6453485539631237,0.006065269954977215
1099
+ ≥5,12000,infovqa_val_anls,0.2685572578806166,0.007278550841020009
1100
+ ≥5,12000,mme_total_score,1374.9406762705082,
1101
+ ≥5,12000,mmmu_val_mmmu_acc,0.28444,
1102
+ ≥5,12000,mmstar_average,0.35205377405882654,
1103
+ ≥5,12000,ocrbench_ocrbench_accuracy,0.519,
1104
+ ≥5,12000,seedbench_seed_all,0.5350194552529183,
1105
+ ≥5,12000,textvqa_val_exact_match,0.52088,0.006777757204160069
1106
+ ≥5,13000,ai2d_exact_match,0.4640544041450777,0.008975868633841907
1107
+ ≥5,13000,average,0.4696984757423332,
1108
+ ≥5,13000,average_rank,2.7,
1109
+ ≥5,13000,chartqa_relaxed_overall,0.608,0.00976588700628918
1110
+ ≥5,13000,docvqa_val_anls,0.6599237778239753,0.006035894149838363
1111
+ ≥5,13000,infovqa_val_anls,0.25759117282312316,0.007107246020667877
1112
+ ≥5,13000,mme_total_score,1326.0453181272508,
1113
+ ≥5,13000,mmmu_val_mmmu_acc,0.28667,
1114
+ ≥5,13000,mmstar_average,0.35252858336464304,
1115
+ ≥5,13000,ocrbench_ocrbench_accuracy,0.533,
1116
+ ≥5,13000,seedbench_seed_all,0.5330183435241801,
1117
+ ≥5,13000,textvqa_val_exact_match,0.5325,0.006770636476998357
1118
+ ≥5,14000,ai2d_exact_match,0.4689119170984456,0.00898174247001659
1119
+ ≥5,14000,average,0.47293227498131896,
1120
+ ≥5,14000,average_rank,2.7,
1121
+ ≥5,14000,chartqa_relaxed_overall,0.614,0.009738559226822298
1122
+ ≥5,14000,docvqa_val_anls,0.6583491716485876,0.0060256160547597325
1123
+ ≥5,14000,infovqa_val_anls,0.26613522559599984,0.0071532088405842145
1124
+ ≥5,14000,mme_total_score,1278.5425170068027,
1125
+ ≥5,14000,mmmu_val_mmmu_acc,0.28,
1126
+ ≥5,14000,mmstar_average,0.35624004153386235,
1127
+ ≥5,14000,ocrbench_ocrbench_accuracy,0.55,
1128
+ ≥5,14000,seedbench_seed_all,0.5454141189549749,
1129
+ ≥5,14000,textvqa_val_exact_match,0.5173399999999999,0.006787096420087393
1130
+ ≥5,15000,ai2d_exact_match,0.4740932642487047,0.008987066275159846
1131
+ ≥5,15000,average,0.47568039073709784,
1132
+ ≥5,15000,average_rank,3.0,
1133
+ ≥5,15000,chartqa_relaxed_overall,0.602,0.00979166741164548
1134
+ ≥5,15000,docvqa_val_anls,0.6649825816931088,0.006012202194059076
1135
+ ≥5,15000,infovqa_val_anls,0.2659187859072639,0.007233849219121225
1136
+ ≥5,15000,mme_total_score,1301.498799519808,
1137
+ ≥5,15000,mmmu_val_mmmu_acc,0.30333,
1138
+ ≥5,15000,mmstar_average,0.363574304462402,
1139
+ ≥5,15000,ocrbench_ocrbench_accuracy,0.536,
1140
+ ≥5,15000,seedbench_seed_all,0.5402445803224013,
1141
+ ≥5,15000,textvqa_val_exact_match,0.53098,0.006774896882281907
1142
+ ≥5,16000,ai2d_exact_match,0.47538860103626945,0.008988245555188545
1143
+ ≥5,16000,average,0.48103362013771567,
1144
+ ≥5,16000,average_rank,2.6,
1145
+ ≥5,16000,chartqa_relaxed_overall,0.6172,0.009723347231923635
1146
+ ≥5,16000,docvqa_val_anls,0.6661394800733964,0.006000339067695713
1147
+ ≥5,16000,infovqa_val_anls,0.27200681388207976,0.007361243845813883
1148
+ ≥5,16000,mme_total_score,1312.4185674269709,
1149
+ ≥5,16000,mmmu_val_mmmu_acc,0.30667,
1150
+ ≥5,16000,mmstar_average,0.352673072573432,
1151
+ ≥5,16000,ocrbench_ocrbench_accuracy,0.553,
1152
+ ≥5,16000,seedbench_seed_all,0.5483046136742635,
1153
+ ≥5,16000,textvqa_val_exact_match,0.53792,0.0067618902203356104
1154
+ ≥5,17000,ai2d_exact_match,0.4740932642487047,0.008987066275159846
1155
+ ≥5,17000,average,0.4842246444979549,
1156
+ ≥5,17000,average_rank,3.1,
1157
+ ≥5,17000,chartqa_relaxed_overall,0.6252,0.009683361554563506
1158
+ ≥5,17000,docvqa_val_anls,0.6727784028551866,0.005982986502554192
1159
+ ≥5,17000,infovqa_val_anls,0.273461783643309,0.00736211121641681
1160
+ ≥5,17000,mme_total_score,1256.561224489796,
1161
+ ≥5,17000,mmmu_val_mmmu_acc,0.31889,
1162
+ ≥5,17000,mmstar_average,0.35664172938975786,
1163
+ ≥5,17000,ocrbench_ocrbench_accuracy,0.545,
1164
+ ≥5,17000,seedbench_seed_all,0.549916620344636,
1165
+ ≥5,17000,textvqa_val_exact_match,0.5420400000000001,0.006760567190239792
1166
+ ≥5,18000,ai2d_exact_match,0.4802461139896373,0.008992128148477658
1167
+ ≥5,18000,average,0.48392876207158253,
1168
+ ≥5,18000,average_rank,2.9,
1169
+ ≥5,18000,chartqa_relaxed_overall,0.6252,0.009683361554563506
1170
+ ≥5,18000,docvqa_val_anls,0.68034033548242,0.005889534044935538
1171
+ ≥5,18000,infovqa_val_anls,0.28015930560506613,0.007380855727131182
1172
+ ≥5,18000,mme_total_score,1380.5266106442577,
1173
+ ≥5,18000,mmmu_val_mmmu_acc,0.3,
1174
+ ≥5,18000,mmstar_average,0.34877922919246646,
1175
+ ≥5,18000,ocrbench_ocrbench_accuracy,0.549,
1176
+ ≥5,18000,seedbench_seed_all,0.5529738743746526,
1177
+ ≥5,18000,textvqa_val_exact_match,0.5386599999999999,0.0067648675941745775
1178
+ ≥5,19000,ai2d_exact_match,0.4805699481865285,0.008992356706334513
1179
+ ≥5,19000,average,0.49271643602329757,
1180
+ ≥5,19000,average_rank,2.8,
1181
+ ≥5,19000,chartqa_relaxed_overall,0.6248,0.009685427559111736
1182
+ ≥5,19000,docvqa_val_anls,0.6825000217737053,0.005934471601355602
1183
+ ≥5,19000,infovqa_val_anls,0.2841253071402532,0.007403930662950274
1184
+ ≥5,19000,mme_total_score,1261.751700680272,
1185
+ ≥5,19000,mmmu_val_mmmu_acc,0.32,
1186
+ ≥5,19000,mmstar_average,0.3611420745688909,
1187
+ ≥5,19000,ocrbench_ocrbench_accuracy,0.572,
1188
+ ≥5,19000,seedbench_seed_all,0.5550305725403002,
1189
+ ≥5,19000,textvqa_val_exact_match,0.5542799999999999,0.006739897741383979
1190
+ ≥5,20000,ai2d_exact_match,0.4844559585492228,0.008994804366753555
1191
+ ≥5,20000,average,0.49543136618963995,
1192
+ ≥5,20000,average_rank,2.9,
1193
+ ≥5,20000,chartqa_relaxed_overall,0.638,0.009613499245701268
1194
+ ≥5,20000,docvqa_val_anls,0.688451623661496,0.005905200575549553
1195
+ ≥5,20000,infovqa_val_anls,0.2789607789162199,0.007289443671361681
1196
+ ≥5,20000,mme_total_score,1296.043617446979,
1197
+ ≥5,20000,mmmu_val_mmmu_acc,0.3,
1198
+ ≥5,20000,mmstar_average,0.3804272197382418,
1199
+ ≥5,20000,ocrbench_ocrbench_accuracy,0.577,
1200
+ ≥5,20000,seedbench_seed_all,0.5560867148415787,
1201
+ ≥5,20000,textvqa_val_exact_match,0.5555,0.006734970078953051
app/src/content/assets/data/remove_ch.csv ADDED
@@ -0,0 +1,455 @@
1
+ run,step,metric,value,stderr
2
+ Baseline,1000,ai2d_exact_match,0.2548575129533679,0.007843322436924496
3
+ Baseline,1000,average,0.27120689295763617,
4
+ Baseline,1000,average_rank,1.7,
5
+ Baseline,1000,chartqa_relaxed_overall,0.3308,0.009411906161401973
6
+ Baseline,1000,docvqa_val_anls,0.3528553494243383,0.005852289239342309
7
+ Baseline,1000,infovqa_val_anls,0.17320578642581314,0.006297063452679795
8
+ Baseline,1000,mme_total_score,977.4280712284914,
9
+ Baseline,1000,mmmu_val_mmmu_acc,0.25222,
10
+ Baseline,1000,mmstar_average,0.23215874078908072,
11
+ Baseline,1000,ocrbench_ocrbench_accuracy,0.286,
12
+ Baseline,1000,seedbench_seed_all,0.2563646470261256,
13
+ Baseline,1000,textvqa_val_exact_match,0.3024,0.00628900296642181
14
+ Baseline,2000,ai2d_exact_match,0.26295336787564766,0.007923526907377255
15
+ Baseline,2000,average,0.3202068275596269,
16
+ Baseline,2000,average_rank,1.5,
17
+ Baseline,2000,chartqa_relaxed_overall,0.4688,0.009982508912777261
18
+ Baseline,2000,docvqa_val_anls,0.4452261510942785,0.00614755494712251
19
+ Baseline,2000,infovqa_val_anls,0.1820547866557169,0.006217861455795791
20
+ Baseline,2000,mme_total_score,1049.3036214485794,
21
+ Baseline,2000,mmmu_val_mmmu_acc,0.24556,
22
+ Baseline,2000,mmstar_average,0.21305462434540698,
23
+ Baseline,2000,ocrbench_ocrbench_accuracy,0.395,
24
+ Baseline,2000,seedbench_seed_all,0.258532518065592,
25
+ Baseline,2000,textvqa_val_exact_match,0.41068000000000005,0.006697862330024289
26
+ Baseline,3000,ai2d_exact_match,0.25226683937823835,0.007816909588794397
27
+ Baseline,3000,average,0.3507423834414229,
28
+ Baseline,3000,average_rank,1.6,
29
+ Baseline,3000,chartqa_relaxed_overall,0.5028,0.010001843767601082
30
+ Baseline,3000,docvqa_val_anls,0.502653993831009,0.006267072346683124
31
+ Baseline,3000,infovqa_val_anls,0.21728617578189535,0.006796941784959762
32
+ Baseline,3000,mme_total_score,1170.2383953581434,
33
+ Baseline,3000,mmmu_val_mmmu_acc,0.27556,
34
+ Baseline,3000,mmstar_average,0.25432376938577683,
35
+ Baseline,3000,ocrbench_ocrbench_accuracy,0.436,
36
+ Baseline,3000,seedbench_seed_all,0.2792106725958866,
37
+ Baseline,3000,textvqa_val_exact_match,0.43658,0.006766885462882726
38
+ Baseline,4000,ai2d_exact_match,0.2645725388601036,0.007939149662089447
39
+ Baseline,4000,average,0.36961781722974835,
40
+ Baseline,4000,average_rank,1.6,
41
+ Baseline,4000,chartqa_relaxed_overall,0.5312,0.009982508912777261
42
+ Baseline,4000,docvqa_val_anls,0.5374434618615119,0.0062905728113059655
43
+ Baseline,4000,infovqa_val_anls,0.2287924838861707,0.006994568698639919
44
+ Baseline,4000,mme_total_score,1155.203781512605,
45
+ Baseline,4000,mmmu_val_mmmu_acc,0.25556,
46
+ Baseline,4000,mmstar_average,0.2575590188757354,
47
+ Baseline,4000,ocrbench_ocrbench_accuracy,0.453,
48
+ Baseline,4000,seedbench_seed_all,0.33913285158421347,
49
+ Baseline,4000,textvqa_val_exact_match,0.4593,0.006791695475025738
50
+ Baseline,5000,ai2d_exact_match,0.3125,0.008342439145556371
51
+ Baseline,5000,average,0.3974627910380972,
52
+ Baseline,5000,average_rank,1.6,
53
+ Baseline,5000,chartqa_relaxed_overall,0.5488,0.00995424828018316
54
+ Baseline,5000,docvqa_val_anls,0.552360266782429,0.006300308519952055
55
+ Baseline,5000,infovqa_val_anls,0.23425555286643698,0.007002254622066442
56
+ Baseline,5000,mme_total_score,1181.4653861544618,
57
+ Baseline,5000,mmmu_val_mmmu_acc,0.26667,
58
+ Baseline,5000,mmstar_average,0.29596648146165705,
59
+ Baseline,5000,ocrbench_ocrbench_accuracy,0.462,
60
+ Baseline,5000,seedbench_seed_all,0.43107281823235133,
61
+ Baseline,5000,textvqa_val_exact_match,0.47354000000000007,0.0068172185364497985
62
+ Baseline,6000,ai2d_exact_match,0.358160621761658,0.008629463221867162
63
+ Baseline,6000,average,0.4161227404571003,
64
+ Baseline,6000,average_rank,1.7,
65
+ Baseline,6000,chartqa_relaxed_overall,0.5628,0.00992279440175477
66
+ Baseline,6000,docvqa_val_anls,0.5747451497228876,0.00625495440870239
67
+ Baseline,6000,infovqa_val_anls,0.22152017368968838,0.006604546680525351
68
+ Baseline,6000,mme_total_score,1284.1648659463785,
69
+ Baseline,6000,mmmu_val_mmmu_acc,0.27111,
70
+ Baseline,6000,mmstar_average,0.2978489412854164,
71
+ Baseline,6000,ocrbench_ocrbench_accuracy,0.495,
72
+ Baseline,6000,seedbench_seed_all,0.4795997776542524,
73
+ Baseline,6000,textvqa_val_exact_match,0.48432,0.006800535050670284
74
+ Baseline,7000,ai2d_exact_match,0.3707901554404145,0.00869347755587734
75
+ Baseline,7000,average,0.4291083177345374,
76
+ Baseline,7000,average_rank,1.6,
77
+ Baseline,7000,chartqa_relaxed_overall,0.5656,0.009915542506251351
78
+ Baseline,7000,docvqa_val_anls,0.5940907049431567,0.006224236305767187
79
+ Baseline,7000,infovqa_val_anls,0.2515675215816963,0.007105097396092786
80
+ Baseline,7000,mme_total_score,1185.875650260104,
81
+ Baseline,7000,mmmu_val_mmmu_acc,0.26556,
82
+ Baseline,7000,mmstar_average,0.31372400960777047,
83
+ Baseline,7000,ocrbench_ocrbench_accuracy,0.504,
84
+ Baseline,7000,seedbench_seed_all,0.4964424680377988,
85
+ Baseline,7000,textvqa_val_exact_match,0.5002,0.006794794025220267
86
+ Baseline,8000,ai2d_exact_match,0.37759067357512954,0.008725299846043883
87
+ Baseline,8000,average,0.43846759477995995,
88
+ Baseline,8000,average_rank,1.8,
89
+ Baseline,8000,chartqa_relaxed_overall,0.5832,0.009862556058385773
90
+ Baseline,8000,docvqa_val_anls,0.6017336419437208,0.006231612198089698
91
+ Baseline,8000,infovqa_val_anls,0.2449256624147254,0.006992518502948913
92
+ Baseline,8000,mme_total_score,1199.2409963985594,
93
+ Baseline,8000,mmmu_val_mmmu_acc,0.28111,
94
+ Baseline,8000,mmstar_average,0.33512257186205047,
95
+ Baseline,8000,ocrbench_ocrbench_accuracy,0.51,
96
+ Baseline,8000,seedbench_seed_all,0.5024458032240133,
97
+ Baseline,8000,textvqa_val_exact_match,0.51008,0.006796301690135059
98
+ Baseline,9000,ai2d_exact_match,0.4067357512953368,0.008841214921078996
99
+ Baseline,9000,average,0.4422510732201056,
100
+ Baseline,9000,average_rank,1.8,
101
+ Baseline,9000,chartqa_relaxed_overall,0.5912,0.009834211136815875
102
+ Baseline,9000,docvqa_val_anls,0.6170968481662739,0.00617235763542544
103
+ Baseline,9000,infovqa_val_anls,0.23537031288570615,0.00670318154156447
104
+ Baseline,9000,mme_total_score,1231.5195078031213,
105
+ Baseline,9000,mmmu_val_mmmu_acc,0.25889,
106
+ Baseline,9000,mmstar_average,0.3216444898242951,
107
+ Baseline,9000,ocrbench_ocrbench_accuracy,0.515,
108
+ Baseline,9000,seedbench_seed_all,0.5120622568093385,
109
+ Baseline,9000,textvqa_val_exact_match,0.52226,0.006792711289708482
110
+ Baseline,10000,ai2d_exact_match,0.39993523316062174,0.008817096257082848
111
+ Baseline,10000,average,0.4523875703250908,
112
+ Baseline,10000,average_rank,1.7,
113
+ Baseline,10000,chartqa_relaxed_overall,0.5996,0.00980154906867574
114
+ Baseline,10000,docvqa_val_anls,0.6262613496433054,0.006147756371688175
115
+ Baseline,10000,infovqa_val_anls,0.263290074230132,0.007186788766942786
116
+ Baseline,10000,mme_total_score,1240.8218287314926,
117
+ Baseline,10000,mmmu_val_mmmu_acc,0.28778,
118
+ Baseline,10000,mmstar_average,0.32972717906018517,
119
+ Baseline,10000,ocrbench_ocrbench_accuracy,0.517,
120
+ Baseline,10000,seedbench_seed_all,0.5217342968315731,
121
+ Baseline,10000,textvqa_val_exact_match,0.5261600000000001,0.006785774843600811
122
+ Baseline,11000,ai2d_exact_match,0.422279792746114,0.008889771831066474
123
+ Baseline,11000,average,0.4561398159525099,
124
+ Baseline,11000,average_rank,1.7,
125
+ Baseline,11000,chartqa_relaxed_overall,0.6104,0.009755142291143075
126
+ Baseline,11000,docvqa_val_anls,0.6373130149166712,0.006128022584995044
127
+ Baseline,11000,infovqa_val_anls,0.24419378339723755,0.006897644885887063
128
+ Baseline,11000,mme_total_score,1322.9488795518205,
129
+ Baseline,11000,mmmu_val_mmmu_acc,0.27778,
130
+ Baseline,11000,mmstar_average,0.3298563439522548,
131
+ Baseline,11000,ocrbench_ocrbench_accuracy,0.521,
132
+ Baseline,11000,seedbench_seed_all,0.5237354085603113,
133
+ Baseline,11000,textvqa_val_exact_match,0.5387,0.006770851562852138
134
+ Baseline,12000,ai2d_exact_match,0.42001295336787564,0.008883255931688034
135
+ Baseline,12000,average,0.4582751140055433,
136
+ Baseline,12000,average_rank,1.7,
137
+ Baseline,12000,chartqa_relaxed_overall,0.618,0.009719474639861454
138
+ Baseline,12000,docvqa_val_anls,0.6393961983751871,0.0061228747388476674
139
+ Baseline,12000,infovqa_val_anls,0.24798874058574302,0.006855374548993139
140
+ Baseline,12000,mme_total_score,1225.6453581432572,
141
+ Baseline,12000,mmmu_val_mmmu_acc,0.27889,
142
+ Baseline,12000,mmstar_average,0.34010867846816534,
143
+ Baseline,12000,ocrbench_ocrbench_accuracy,0.512,
144
+ Baseline,12000,seedbench_seed_all,0.5350194552529183,
145
+ Baseline,12000,textvqa_val_exact_match,0.5330600000000001,0.006777713092109446
146
+ Baseline,13000,ai2d_exact_match,0.4375,0.008928571428571428
147
+ Baseline,13000,average,0.4692868662590049,
148
+ Baseline,13000,average_rank,1.4,
149
+ Baseline,13000,chartqa_relaxed_overall,0.6148,0.00973479791861169
150
+ Baseline,13000,docvqa_val_anls,0.6511374872549951,0.006086953065248391
151
+ Baseline,13000,infovqa_val_anls,0.24465055100441893,0.006808432538374664
152
+ Baseline,13000,mme_total_score,1281.7122849139657,
153
+ Baseline,13000,mmmu_val_mmmu_acc,0.28222,
154
+ Baseline,13000,mmstar_average,0.3453069542917521,
155
+ Baseline,13000,ocrbench_ocrbench_accuracy,0.549,
156
+ Baseline,13000,seedbench_seed_all,0.5442468037798777,
157
+ Baseline,13000,textvqa_val_exact_match,0.55472,0.0067416788982325
158
+ Baseline,14000,ai2d_exact_match,0.4572538860103627,0.00896620675297095
159
+ Baseline,14000,average,0.47352486841689195,
160
+ Baseline,14000,average_rank,1.3,
161
+ Baseline,14000,chartqa_relaxed_overall,0.6172,0.009723347231923635
162
+ Baseline,14000,docvqa_val_anls,0.6502269393708169,0.006057950730638126
163
+ Baseline,14000,infovqa_val_anls,0.25805460837190913,0.007037735231659539
164
+ Baseline,14000,mme_total_score,1309.1444577831132,
165
+ Baseline,14000,mmmu_val_mmmu_acc,0.28111,
166
+ Baseline,14000,mmstar_average,0.34575818188776586,
167
+ Baseline,14000,ocrbench_ocrbench_accuracy,0.551,
168
+ Baseline,14000,seedbench_seed_all,0.5483602001111729,
169
+ Baseline,14000,textvqa_val_exact_match,0.55276,0.006751206724612103
170
+ Baseline,15000,ai2d_exact_match,0.45045336787564766,0.008954861634252399
171
+ Baseline,15000,average,0.47878665012878824,
172
+ Baseline,15000,average_rank,1.2,
173
+ Baseline,15000,chartqa_relaxed_overall,0.612,0.009747841205275417
174
+ Baseline,15000,docvqa_val_anls,0.6621413031955148,0.006056838050222495
175
+ Baseline,15000,infovqa_val_anls,0.2706898598157733,0.007200315730154543
176
+ Baseline,15000,mme_total_score,1384.2171868747498,
177
+ Baseline,15000,mmmu_val_mmmu_acc,0.30222,
178
+ Baseline,15000,mmstar_average,0.35408135695920684,
179
+ Baseline,15000,ocrbench_ocrbench_accuracy,0.558,
180
+ Baseline,15000,seedbench_seed_all,0.5411339633129516,
181
+ Baseline,15000,textvqa_val_exact_match,0.5583600000000001,0.0067279027203879065
182
+ Baseline,16000,ai2d_exact_match,0.45077720207253885,0.008955440137395838
183
+ Baseline,16000,average,0.47665128022935843,
184
+ Baseline,16000,average_rank,1.5,
185
+ Baseline,16000,chartqa_relaxed_overall,0.632,0.00964715642305132
186
+ Baseline,16000,docvqa_val_anls,0.6709415729142987,0.005999818105621502
187
+ Baseline,16000,infovqa_val_anls,0.26050032542402035,0.006997451875879188
188
+ Baseline,16000,mme_total_score,1317.8491396558625,
189
+ Baseline,16000,mmmu_val_mmmu_acc,0.27556,
190
+ Baseline,16000,mmstar_average,0.33214333327093315,
191
+ Baseline,16000,ocrbench_ocrbench_accuracy,0.56,
192
+ Baseline,16000,seedbench_seed_all,0.5463590883824346,
193
+ Baseline,16000,textvqa_val_exact_match,0.56158,0.006723854754867398
194
+ Baseline,17000,ai2d_exact_match,0.45919689119170987,0.008969138793675545
195
+ Baseline,17000,average,0.4777141780162423,
196
+ Baseline,17000,average_rank,1.2,
197
+ Baseline,17000,chartqa_relaxed_overall,0.632,0.00964715642305132
198
+ Baseline,17000,docvqa_val_anls,0.6796338519136422,0.005948761388267941
199
+ Baseline,17000,infovqa_val_anls,0.28070956072505215,0.007298333094144192
200
+ Baseline,17000,mme_total_score,1381.9161664665867,
201
+ Baseline,17000,mmmu_val_mmmu_acc,0.27667,
202
+ Baseline,17000,mmstar_average,0.3370289492329521,
203
+ Baseline,17000,ocrbench_ocrbench_accuracy,0.519,
204
+ Baseline,17000,seedbench_seed_all,0.5510283490828238,
205
+ Baseline,17000,textvqa_val_exact_match,0.56416,0.006724830373229479
206
+ Baseline,18000,ai2d_exact_match,0.46567357512953367,0.008977921602780726
207
+ Baseline,18000,average,0.4819834595278701,
208
+ Baseline,18000,average_rank,1.1,
209
+ Baseline,18000,chartqa_relaxed_overall,0.6376,0.009615793331418735
210
+ Baseline,18000,docvqa_val_anls,0.6775884603912571,0.005972234236435759
211
+ Baseline,18000,infovqa_val_anls,0.27154318420389256,0.007164903131667027
212
+ Baseline,18000,mme_total_score,1336.922769107643,
213
+ Baseline,18000,mmmu_val_mmmu_acc,0.28667,
214
+ Baseline,18000,mmstar_average,0.34482796716566916,
215
+ Baseline,18000,ocrbench_ocrbench_accuracy,0.533,
216
+ Baseline,18000,seedbench_seed_all,0.5543079488604781,
217
+ Baseline,18000,textvqa_val_exact_match,0.5666399999999999,0.006713392287599574
218
+ Baseline,19000,ai2d_exact_match,0.4682642487046632,0.008981008686994101
219
+ Baseline,19000,average,0.4899006713916878,
220
+ Baseline,19000,chartqa_relaxed_overall,0.6444,0.009575809858898698
221
+ Baseline,19000,docvqa_val_anls,0.678226526479947,0.005970619221588814
222
+ Baseline,19000,infovqa_val_anls,0.26993847247278,0.0071348470764911525
223
+ Baseline,19000,mme_total_score,1406.6628651460583,
224
+ Baseline,19000,mmmu_val_mmmu_acc,0.28333,
225
+ Baseline,19000,mmstar_average,0.356220913822775,
226
+ Baseline,19000,ocrbench_ocrbench_accuracy,0.577,
227
+ Baseline,19000,seedbench_seed_all,0.554585881045025,
228
+ Baseline,19000,textvqa_val_exact_match,0.57714,0.0066918487914812905
229
+ Baseline,20000,ai2d_exact_match,0.47571243523316065,0.00898853090258662
230
+ Baseline,20000,average,0.4873169067639118,
231
+ Baseline,20000,chartqa_relaxed_overall,0.6336,0.009638338810708618
232
+ Baseline,20000,docvqa_val_anls,0.6895214454380043,0.005896462073053767
233
+ Baseline,20000,infovqa_val_anls,0.2655657550458317,0.007033265532032538
234
+ Baseline,20000,mme_total_score,1324.6738695478193,
235
+ Baseline,20000,mmmu_val_mmmu_acc,0.30111,
236
+ Baseline,20000,mmstar_average,0.33806766134497995,
237
+ Baseline,20000,ocrbench_ocrbench_accuracy,0.555,
238
+ Baseline,20000,seedbench_seed_all,0.5587548638132296,
239
+ Baseline,20000,textvqa_val_exact_match,0.56852,0.006720151338087659
240
+ Remove Multilingual Data,1000,ai2d_exact_match,0.2619818652849741,0.007914086941902855
241
+ Remove Multilingual Data,1000,average,0.29340443385847137,
242
+ Remove Multilingual Data,1000,average_rank,1.3,
243
+ Remove Multilingual Data,1000,chartqa_relaxed_overall,0.3736,0.009677121197436144
244
+ Remove Multilingual Data,1000,docvqa_val_anls,0.403140100303888,0.006111323163666132
245
+ Remove Multilingual Data,1000,infovqa_val_anls,0.1764617576183696,0.006251319736392345
246
+ Remove Multilingual Data,1000,mme_total_score,979.3045218087235,
247
+ Remove Multilingual Data,1000,mmmu_val_mmmu_acc,0.25222,
248
+ Remove Multilingual Data,1000,mmstar_average,0.2073057646207335,
249
+ Remove Multilingual Data,1000,ocrbench_ocrbench_accuracy,0.333,
250
+ Remove Multilingual Data,1000,seedbench_seed_all,0.2507504168982768,
251
+ Remove Multilingual Data,1000,textvqa_val_exact_match,0.38218,0.006631325992355026
252
+ Remove Multilingual Data,2000,ai2d_exact_match,0.25291450777202074,0.007823547213659585
253
+ Remove Multilingual Data,2000,average,0.32254499165624334,
254
+ Remove Multilingual Data,2000,average_rank,1.5,
255
+ Remove Multilingual Data,2000,chartqa_relaxed_overall,0.4692,0.009983005968307607
256
+ Remove Multilingual Data,2000,docvqa_val_anls,0.472590835723597,0.006255090657185791
257
+ Remove Multilingual Data,2000,infovqa_val_anls,0.19402428600531574,0.006415305613638088
258
+ Remove Multilingual Data,2000,mme_total_score,1067.5286114445778,
259
+ Remove Multilingual Data,2000,mmmu_val_mmmu_acc,0.24444,
260
+ Remove Multilingual Data,2000,mmstar_average,0.20544885849586278,
261
+ Remove Multilingual Data,2000,ocrbench_ocrbench_accuracy,0.409,
262
+ Remove Multilingual Data,2000,seedbench_seed_all,0.2555864369093941,
263
+ Remove Multilingual Data,2000,textvqa_val_exact_match,0.3997,0.006677042652231296
264
+ Remove Multilingual Data,3000,ai2d_exact_match,0.2658678756476684,0.00795154886571598
265
+ Remove Multilingual Data,3000,average,0.35383248024337044,
266
+ Remove Multilingual Data,3000,average_rank,1.4,
267
+ Remove Multilingual Data,3000,chartqa_relaxed_overall,0.536,0.009976041728231964
268
+ Remove Multilingual Data,3000,docvqa_val_anls,0.5115050780592246,0.006297134520533815
269
+ Remove Multilingual Data,3000,infovqa_val_anls,0.1959317380528948,0.006353999153527862
270
+ Remove Multilingual Data,3000,mme_total_score,1055.7074829931971,
271
+ Remove Multilingual Data,3000,mmmu_val_mmmu_acc,0.26,
272
+ Remove Multilingual Data,3000,mmstar_average,0.2325690534433309,
273
+ Remove Multilingual Data,3000,ocrbench_ocrbench_accuracy,0.449,
274
+ Remove Multilingual Data,3000,seedbench_seed_all,0.28943857698721515,
275
+ Remove Multilingual Data,3000,textvqa_val_exact_match,0.44418,0.0067730052591185854
276
+ Remove Multilingual Data,4000,ai2d_exact_match,0.2856217616580311,0.008130016747303466
277
+ Remove Multilingual Data,4000,average,0.3775873253769421,
278
+ Remove Multilingual Data,4000,average_rank,1.4,
279
+ Remove Multilingual Data,4000,chartqa_relaxed_overall,0.55,0.009951864943131942
280
+ Remove Multilingual Data,4000,docvqa_val_anls,0.5339851175847934,0.0062957385772197255
281
+ Remove Multilingual Data,4000,infovqa_val_anls,0.20750676546327357,0.006369425500899887
282
+ Remove Multilingual Data,4000,mme_total_score,1228.202280912365,
283
+ Remove Multilingual Data,4000,mmmu_val_mmmu_acc,0.27111,
284
+ Remove Multilingual Data,4000,mmstar_average,0.24655460164079995,
285
+ Remove Multilingual Data,4000,ocrbench_ocrbench_accuracy,0.456,
286
+ Remove Multilingual Data,4000,seedbench_seed_all,0.3898276820455809,
287
+ Remove Multilingual Data,4000,textvqa_val_exact_match,0.45768000000000003,0.006781666588703993
288
+ Remove Multilingual Data,5000,ai2d_exact_match,0.3121761658031088,0.008340079044408505
289
+ Remove Multilingual Data,5000,average,0.3976192139479395,
290
+ Remove Multilingual Data,5000,average_rank,1.4,
291
+ Remove Multilingual Data,5000,chartqa_relaxed_overall,0.5684,0.009907968668564455
292
+ Remove Multilingual Data,5000,docvqa_val_anls,0.5611339219828478,0.006260862186673622
293
+ Remove Multilingual Data,5000,infovqa_val_anls,0.21913407408993218,0.006638320670102091
294
+ Remove Multilingual Data,5000,mme_total_score,1219.2377951180472,
295
+ Remove Multilingual Data,5000,mmmu_val_mmmu_acc,0.29444,
296
+ Remove Multilingual Data,5000,mmstar_average,0.23556637343877926,
297
+ Remove Multilingual Data,5000,ocrbench_ocrbench_accuracy,0.472,
298
+ Remove Multilingual Data,5000,seedbench_seed_all,0.4443023902167871,
299
+ Remove Multilingual Data,5000,textvqa_val_exact_match,0.47142,0.006807048104779351
300
+ Remove Multilingual Data,6000,ai2d_exact_match,0.35200777202072536,0.008595926828224822
301
+ Remove Multilingual Data,6000,average,0.42451996443270734,
302
+ Remove Multilingual Data,6000,average_rank,1.3,
303
+ Remove Multilingual Data,6000,chartqa_relaxed_overall,0.5744,0.009890651444389179
304
+ Remove Multilingual Data,6000,docvqa_val_anls,0.5825552977560686,0.006257174245982806
305
+ Remove Multilingual Data,6000,infovqa_val_anls,0.252828230577843,0.007149939162213116
306
+ Remove Multilingual Data,6000,mme_total_score,1216.607643057223,
307
+ Remove Multilingual Data,6000,mmmu_val_mmmu_acc,0.30222,
308
+ Remove Multilingual Data,6000,mmstar_average,0.2807390632529032,
309
+ Remove Multilingual Data,6000,ocrbench_ocrbench_accuracy,0.497,
310
+ Remove Multilingual Data,6000,seedbench_seed_all,0.484769316286826,
311
+ Remove Multilingual Data,6000,textvqa_val_exact_match,0.49416000000000004,0.006798707477504303
312
+ Remove Multilingual Data,7000,ai2d_exact_match,0.3801813471502591,0.008736941116932581
313
+ Remove Multilingual Data,7000,average,0.428085510128325,
314
+ Remove Multilingual Data,7000,average_rank,1.4,
315
+ Remove Multilingual Data,7000,chartqa_relaxed_overall,0.5796,0.009874438607593145
316
+ Remove Multilingual Data,7000,docvqa_val_anls,0.5966369586509165,0.006224801729990067
317
+ Remove Multilingual Data,7000,infovqa_val_anls,0.23354910759447625,0.006817906701297544
318
+ Remove Multilingual Data,7000,mme_total_score,1188.1020408163265,
319
+ Remove Multilingual Data,7000,mmmu_val_mmmu_acc,0.27556,
320
+ Remove Multilingual Data,7000,mmstar_average,0.292518909276783,
321
+ Remove Multilingual Data,7000,ocrbench_ocrbench_accuracy,0.503,
322
+ Remove Multilingual Data,7000,seedbench_seed_all,0.48988326848249025,
323
+ Remove Multilingual Data,7000,textvqa_val_exact_match,0.5018400000000001,0.006795274684043781
324
+ Remove Multilingual Data,8000,ai2d_exact_match,0.3863341968911917,0.008763532923326706
325
+ Remove Multilingual Data,8000,average,0.4413787447198958,
326
+ Remove Multilingual Data,8000,average_rank,1.2,
327
+ Remove Multilingual Data,8000,chartqa_relaxed_overall,0.5964,0.009814343815957088
328
+ Remove Multilingual Data,8000,docvqa_val_anls,0.603351366738696,0.006235087701254087
329
+ Remove Multilingual Data,8000,infovqa_val_anls,0.25307646024963104,0.007198626238671866
330
+ Remove Multilingual Data,8000,mme_total_score,1261.5517206882753,
331
+ Remove Multilingual Data,8000,mmmu_val_mmmu_acc,0.29556,
332
+ Remove Multilingual Data,8000,mmstar_average,0.30595531673183934,
333
+ Remove Multilingual Data,8000,ocrbench_ocrbench_accuracy,0.505,
334
+ Remove Multilingual Data,8000,seedbench_seed_all,0.5124513618677042,
335
+ Remove Multilingual Data,8000,textvqa_val_exact_match,0.51428,0.006792322389925977
336
+ Remove Multilingual Data,9000,ai2d_exact_match,0.3908678756476684,0.008782181865213609
337
+ Remove Multilingual Data,9000,average,0.4483393474436153,
338
+ Remove Multilingual Data,9000,average_rank,1.2,
339
+ Remove Multilingual Data,9000,chartqa_relaxed_overall,0.6008,0.00979663889573671
340
+ Remove Multilingual Data,9000,docvqa_val_anls,0.6206417157518567,0.006160046717594884
341
+ Remove Multilingual Data,9000,infovqa_val_anls,0.2517144366407357,0.007092352700671051
342
+ Remove Multilingual Data,9000,mme_total_score,1270.4974989995999,
343
+ Remove Multilingual Data,9000,mmmu_val_mmmu_acc,0.29333,
344
+ Remove Multilingual Data,9000,mmstar_average,0.32657768650091523,
345
+ Remove Multilingual Data,9000,ocrbench_ocrbench_accuracy,0.52,
346
+ Remove Multilingual Data,9000,seedbench_seed_all,0.5163424124513619,
347
+ Remove Multilingual Data,9000,textvqa_val_exact_match,0.51478,0.006772730933446224
348
+ Remove Multilingual Data,10000,ai2d_exact_match,0.41450777202072536,0.008866630113019596
349
+ Remove Multilingual Data,10000,average,0.45448389614950035,
350
+ Remove Multilingual Data,10000,average_rank,1.3,
351
+ Remove Multilingual Data,10000,chartqa_relaxed_overall,0.6068,0.009771166474772143
352
+ Remove Multilingual Data,10000,docvqa_val_anls,0.6232449599819007,0.006177718712473361
353
+ Remove Multilingual Data,10000,infovqa_val_anls,0.23737546748097776,0.006778926597473845
354
+ Remove Multilingual Data,10000,mme_total_score,1276.3549419767905,
355
+ Remove Multilingual Data,10000,mmmu_val_mmmu_acc,0.29889,
356
+ Remove Multilingual Data,10000,mmstar_average,0.3130758097195978,
357
+ Remove Multilingual Data,10000,ocrbench_ocrbench_accuracy,0.539,
358
+ Remove Multilingual Data,10000,seedbench_seed_all,0.5219010561423013,
359
+ Remove Multilingual Data,10000,textvqa_val_exact_match,0.53556,0.00676001751827386
360
+ Remove Multilingual Data,11000,ai2d_exact_match,0.41904145077720206,0.008880404559123601
361
+ Remove Multilingual Data,11000,average,0.4609227111862355,
362
+ Remove Multilingual Data,11000,average_rank,1.3,
363
+ Remove Multilingual Data,11000,chartqa_relaxed_overall,0.6108,0.00975332737879659
364
+ Remove Multilingual Data,11000,docvqa_val_anls,0.6387481065492241,0.006094036395159673
365
+ Remove Multilingual Data,11000,infovqa_val_anls,0.25052436731474453,0.006993658213921465
366
+ Remove Multilingual Data,11000,mme_total_score,1258.2553021208482,
367
+ Remove Multilingual Data,11000,mmmu_val_mmmu_acc,0.28,
368
+ Remove Multilingual Data,11000,mmstar_average,0.3213557456291676,
369
+ Remove Multilingual Data,11000,ocrbench_ocrbench_accuracy,0.561,
370
+ Remove Multilingual Data,11000,seedbench_seed_all,0.526514730405781,
371
+ Remove Multilingual Data,11000,textvqa_val_exact_match,0.54032,0.0067608876222200335
372
+ Remove Multilingual Data,12000,ai2d_exact_match,0.41353626943005184,0.00886357792887845
373
+ Remove Multilingual Data,12000,average,0.46149948562642984,
374
+ Remove Multilingual Data,12000,average_rank,1.3,
375
+ Remove Multilingual Data,12000,chartqa_relaxed_overall,0.622,0.009699692449425671
376
+ Remove Multilingual Data,12000,docvqa_val_anls,0.6481870346272672,0.0060803752132680255
377
+ Remove Multilingual Data,12000,infovqa_val_anls,0.25116762340113796,0.006993814336062128
378
+ Remove Multilingual Data,12000,mme_total_score,1256.7357943177271,
379
+ Remove Multilingual Data,12000,mmmu_val_mmmu_acc,0.28222,
380
+ Remove Multilingual Data,12000,mmstar_average,0.311104865636332,
381
+ Remove Multilingual Data,12000,ocrbench_ocrbench_accuracy,0.547,
382
+ Remove Multilingual Data,12000,seedbench_seed_all,0.5312395775430795,
383
+ Remove Multilingual Data,12000,textvqa_val_exact_match,0.54704,0.006750774938661079
384
+ Remove Multilingual Data,13000,ai2d_exact_match,0.42810880829015546,0.008905646879422012
385
+ Remove Multilingual Data,13000,average,0.4658949593838579,
386
+ Remove Multilingual Data,13000,average_rank,1.6,
387
+ Remove Multilingual Data,13000,chartqa_relaxed_overall,0.622,0.009699692449425671
388
+ Remove Multilingual Data,13000,docvqa_val_anls,0.6461697403304425,0.006072036108570188
389
+ Remove Multilingual Data,13000,infovqa_val_anls,0.2635164421127001,0.007102540516236264
390
+ Remove Multilingual Data,13000,mme_total_score,1295.0039015606244,
391
+ Remove Multilingual Data,13000,mmmu_val_mmmu_acc,0.29,
392
+ Remove Multilingual Data,13000,mmstar_average,0.3296444797414335,
393
+ Remove Multilingual Data,13000,ocrbench_ocrbench_accuracy,0.54,
394
+ Remove Multilingual Data,13000,seedbench_seed_all,0.5312951639799889,
395
+ Remove Multilingual Data,13000,textvqa_val_exact_match,0.54232,0.006771571040376891
396
+ Remove Multilingual Data,14000,ai2d_exact_match,0.42487046632124353,0.008896983637113786
397
+ Remove Multilingual Data,14000,average,0.46755416993970794,
398
+ Remove Multilingual Data,14000,average_rank,1.7,
399
+ Remove Multilingual Data,14000,chartqa_relaxed_overall,0.6256,0.009681288495793083
400
+ Remove Multilingual Data,14000,docvqa_val_anls,0.6470833619171145,0.006119244473927763
401
+ Remove Multilingual Data,14000,infovqa_val_anls,0.2541720455309047,0.007006172199083197
402
+ Remove Multilingual Data,14000,mme_total_score,1262.1793717486994,
403
+ Remove Multilingual Data,14000,mmmu_val_mmmu_acc,0.28556,
404
+ Remove Multilingual Data,14000,mmstar_average,0.327544946405174,
405
+ Remove Multilingual Data,14000,ocrbench_ocrbench_accuracy,0.559,
406
+ Remove Multilingual Data,14000,seedbench_seed_all,0.5380767092829349,
407
+ Remove Multilingual Data,14000,textvqa_val_exact_match,0.5460799999999999,0.006754587449305995
408
+ Remove Multilingual Data,15000,ai2d_exact_match,0.42908031088082904,0.00890816984689523
409
+ Remove Multilingual Data,15000,average,0.4720258172705174,
410
+ Remove Multilingual Data,15000,average_rank,1.8,
411
+ Remove Multilingual Data,15000,chartqa_relaxed_overall,0.626,0.009679208378267924
412
+ Remove Multilingual Data,15000,docvqa_val_anls,0.655881547989144,0.006058079036611966
413
+ Remove Multilingual Data,15000,infovqa_val_anls,0.2538472956751567,0.006929926842577286
414
+ Remove Multilingual Data,15000,mme_total_score,1283.2800120048018,
415
+ Remove Multilingual Data,15000,mmmu_val_mmmu_acc,0.29,
416
+ Remove Multilingual Data,15000,mmstar_average,0.3309383426349411,
417
+ Remove Multilingual Data,15000,ocrbench_ocrbench_accuracy,0.572,
418
+ Remove Multilingual Data,15000,seedbench_seed_all,0.5407448582545858,
419
+ Remove Multilingual Data,15000,textvqa_val_exact_match,0.54974,0.006738090742441116
420
+ Remove Multilingual Data,16000,ai2d_exact_match,0.42940414507772023,0.008909003051055714
421
+ Remove Multilingual Data,16000,average,0.476926180401357,
422
+ Remove Multilingual Data,16000,average_rank,1.5,
423
+ Remove Multilingual Data,16000,chartqa_relaxed_overall,0.626,0.009679208378267924
424
+ Remove Multilingual Data,16000,docvqa_val_anls,0.6622394005833824,0.006046858134280091
425
+ Remove Multilingual Data,16000,infovqa_val_anls,0.2633356312454137,0.007137388413784386
426
+ Remove Multilingual Data,16000,mme_total_score,1328.4599839935972,
427
+ Remove Multilingual Data,16000,mmmu_val_mmmu_acc,0.29556,
428
+ Remove Multilingual Data,16000,mmstar_average,0.33932578522709744,
429
+ Remove Multilingual Data,16000,ocrbench_ocrbench_accuracy,0.578,
430
+ Remove Multilingual Data,16000,seedbench_seed_all,0.5431906614785992,
431
+ Remove Multilingual Data,16000,textvqa_val_exact_match,0.55528,0.006733817132847886
432
+ Remove Multilingual Data,17000,ai2d_exact_match,0.42940414507772023,0.008909003051055712
433
+ Remove Multilingual Data,17000,average,0.4732087844936434,
434
+ Remove Multilingual Data,17000,average_rank,1.8,
435
+ Remove Multilingual Data,17000,chartqa_relaxed_overall,0.6264,0.009677121197436144
436
+ Remove Multilingual Data,17000,docvqa_val_anls,0.661817176575324,0.0060368801840957114
437
+ Remove Multilingual Data,17000,infovqa_val_anls,0.25584519300448166,0.007033162778192734
438
+ Remove Multilingual Data,17000,mme_total_score,1270.766606642657,
439
+ Remove Multilingual Data,17000,mmmu_val_mmmu_acc,0.28,
440
+ Remove Multilingual Data,17000,mmstar_average,0.3233592606268431,
441
+ Remove Multilingual Data,17000,ocrbench_ocrbench_accuracy,0.58,
442
+ Remove Multilingual Data,17000,seedbench_seed_all,0.5439132851584213,
443
+ Remove Multilingual Data,17000,textvqa_val_exact_match,0.5581400000000001,0.006731048171116916
444
+ Remove Multilingual Data,18000,ai2d_exact_match,0.4368523316062176,0.008927095061184944
445
+ Remove Multilingual Data,18000,average,0.4769341122300441,
446
+ Remove Multilingual Data,18000,average_rank,1.9,
447
+ Remove Multilingual Data,18000,chartqa_relaxed_overall,0.636,0.009624897685803465
448
+ Remove Multilingual Data,18000,docvqa_val_anls,0.671397164123935,0.006004837667492473
449
+ Remove Multilingual Data,18000,infovqa_val_anls,0.2570865428675732,0.007022334730795061
450
+ Remove Multilingual Data,18000,mme_total_score,1330.2323929571828,
451
+ Remove Multilingual Data,18000,mmmu_val_mmmu_acc,0.28444,
452
+ Remove Multilingual Data,18000,mmstar_average,0.3272633338962395,
453
+ Remove Multilingual Data,18000,ocrbench_ocrbench_accuracy,0.579,
454
+ Remove Multilingual Data,18000,seedbench_seed_all,0.5457476375764313,
455
+ Remove Multilingual Data,18000,textvqa_val_exact_match,0.55462,0.0067429981999808505
app/src/content/assets/data/s25_ratings.csv ADDED
@@ -0,0 +1,1189 @@
1
+ run,step,metric,value,stderr
2
+ ≥1,1000,ai2d_exact_match,0.48283678756476683,0.00899385068939683
3
+ ≥1,1000,average,0.4841740613238066,
4
+ ≥1,1000,average_rank,2.4,
5
+ ≥1,1000,chartqa_relaxed_overall,0.6328,0.00964276190429159
6
+ ≥1,1000,docvqa_val_anls,0.6709958484393396,0.006009113294340719
7
+ ≥1,1000,infovqa_val_anls,0.2911610792718508,0.007480963558334323
8
+ ≥1,1000,mme_total_score,1300.6441576630652,
9
+ ≥1,1000,mmmu_val_mmmu_acc,0.28111,
10
+ ≥1,1000,mmstar_average,0.34899099672724077,
11
+ ≥1,1000,ocrbench_ocrbench_accuracy,0.53,
12
+ ≥1,1000,seedbench_seed_all,0.5613118399110617,
13
+ ≥1,1000,textvqa_val_exact_match,0.5583600000000001,0.006733787259646062
14
+ ≥1,2000,ai2d_exact_match,0.4834844559585492,0.008994243503406855
15
+ ≥1,2000,average,0.4870755750428875,
16
+ ≥1,2000,average_rank,2.0,
17
+ ≥1,2000,chartqa_relaxed_overall,0.6296,0.0096601689190934
18
+ ≥1,2000,docvqa_val_anls,0.6827112292156415,0.005909694544631059
19
+ ≥1,2000,infovqa_val_anls,0.26248215166111283,0.006999241957900095
20
+ ≥1,2000,mme_total_score,1316.5322128851542,
21
+ ≥1,2000,mmmu_val_mmmu_acc,0.29556,
22
+ ≥1,2000,mmstar_average,0.351185684854186,
23
+ ≥1,2000,ocrbench_ocrbench_accuracy,0.557,
24
+ ≥1,2000,seedbench_seed_all,0.5579766536964981,
25
+ ≥1,2000,textvqa_val_exact_match,0.5636800000000001,0.006720565803631728
26
+ ≥1,3000,ai2d_exact_match,0.47085492227979275,0.008983852707691605
27
+ ≥1,3000,average,0.48291385198510484,
28
+ ≥1,3000,average_rank,2.7,
29
+ ≥1,3000,chartqa_relaxed_overall,0.6416,0.00959252743718011
30
+ ≥1,3000,docvqa_val_anls,0.680081009037435,0.005963713977526521
31
+ ≥1,3000,infovqa_val_anls,0.2758757523314467,0.007145074435929658
32
+ ≥1,3000,mme_total_score,1338.268607442977,
33
+ ≥1,3000,mmmu_val_mmmu_acc,0.26889,
34
+ ≥1,3000,mmstar_average,0.34908867626840856,
35
+ ≥1,3000,ocrbench_ocrbench_accuracy,0.542,
36
+ ≥1,3000,seedbench_seed_all,0.5577543079488605,
37
+ ≥1,3000,textvqa_val_exact_match,0.56008,0.00674696843305253
38
+ ≥1,4000,ai2d_exact_match,0.48218911917098445,0.008993442748995703
39
+ ≥1,4000,average,0.49172515123492716,
40
+ ≥1,4000,average_rank,2.3,
41
+ ≥1,4000,chartqa_relaxed_overall,0.6488,0.009548816468986266
42
+ ≥1,4000,docvqa_val_anls,0.6902890941626307,0.005912204920631156
43
+ ≥1,4000,infovqa_val_anls,0.26986279043614175,0.007091114226807192
44
+ ≥1,4000,mme_total_score,1322.6090436174468,
45
+ ≥1,4000,mmmu_val_mmmu_acc,0.31,
46
+ ≥1,4000,mmstar_average,0.35470222226954573,
47
+ ≥1,4000,ocrbench_ocrbench_accuracy,0.542,
48
+ ≥1,4000,seedbench_seed_all,0.5576431350750417,
49
+ ≥1,4000,textvqa_val_exact_match,0.57004,0.006721660198430491
50
+ ≥1,5000,ai2d_exact_match,0.48704663212435234,0.008996133680935945
51
+ ≥1,5000,average,0.4922453953675835,
52
+ ≥1,5000,average_rank,2.3,
53
+ ≥1,5000,chartqa_relaxed_overall,0.6524,0.009526069199715017
54
+ ≥1,5000,docvqa_val_anls,0.7021575420936199,0.005829944728253253
55
+ ≥1,5000,infovqa_val_anls,0.2714850202382579,0.0071017460136769345
56
+ ≥1,5000,mme_total_score,1372.0063025210084,
57
+ ≥1,5000,mmmu_val_mmmu_acc,0.28444,
58
+ ≥1,5000,mmstar_average,0.34918092027225467,
59
+ ≥1,5000,ocrbench_ocrbench_accuracy,0.553,
60
+ ≥1,5000,seedbench_seed_all,0.5571984435797666,
61
+ ≥1,5000,textvqa_val_exact_match,0.5733,0.0066972526186883305
62
+ ≥1,6000,ai2d_exact_match,0.4838082901554404,0.008994434238637761
63
+ ≥1,6000,average,0.4949352825546263,
64
+ ≥1,6000,average_rank,2.3,
65
+ ≥1,6000,chartqa_relaxed_overall,0.6484,0.009551307082635064
66
+ ≥1,6000,docvqa_val_anls,0.7034964362890477,0.00583650860725618
67
+ ≥1,6000,infovqa_val_anls,0.2724245614355471,0.0071074877022118095
68
+ ≥1,6000,mme_total_score,1406.1297519007603,
69
+ ≥1,6000,mmmu_val_mmmu_acc,0.30333,
70
+ ≥1,6000,mmstar_average,0.3537726186468994,
71
+ ≥1,6000,ocrbench_ocrbench_accuracy,0.551,
72
+ ≥1,6000,seedbench_seed_all,0.5621456364647026,
73
+ ≥1,6000,textvqa_val_exact_match,0.57604,0.006696965995935035
74
+ ≥1,7000,ai2d_exact_match,0.49158031088082904,0.008997878107766406
75
+ ≥1,7000,average,0.5010900439307898,
76
+ ≥1,7000,average_rank,1.9,
77
+ ≥1,7000,chartqa_relaxed_overall,0.6564,0.009500090351500593
78
+ ≥1,7000,docvqa_val_anls,0.7105997601562098,0.005781434620670767
79
+ ≥1,7000,infovqa_val_anls,0.29338120425035286,0.007415977951206446
80
+ ≥1,7000,mme_total_score,1362.5676270508204,
81
+ ≥1,7000,mmmu_val_mmmu_acc,0.30778,
82
+ ≥1,7000,mmstar_average,0.34667048751606516,
83
+ ≥1,7000,ocrbench_ocrbench_accuracy,0.555,
84
+ ≥1,7000,seedbench_seed_all,0.569538632573652,
85
+ ≥1,7000,textvqa_val_exact_match,0.57886,0.006701104464206482
86
+ ≥1,8000,ai2d_exact_match,0.48704663212435234,0.008996133680935945
87
+ ≥1,8000,average,0.5012874693126343,
88
+ ≥1,8000,average_rank,2.4,
89
+ ≥1,8000,chartqa_relaxed_overall,0.66,0.009476070829586857
90
+ ≥1,8000,docvqa_val_anls,0.7013710839567656,0.00584567154399218
91
+ ≥1,8000,infovqa_val_anls,0.2843596286067672,0.00726326016667778
92
+ ≥1,8000,mme_total_score,1366.6049419767905,
93
+ ≥1,8000,mmmu_val_mmmu_acc,0.29778,
94
+ ≥1,8000,mmstar_average,0.3726316801263804,
95
+ ≥1,8000,ocrbench_ocrbench_accuracy,0.568,
96
+ ≥1,8000,seedbench_seed_all,0.5647581989994441,
97
+ ≥1,8000,textvqa_val_exact_match,0.5756399999999999,0.006701275960583923
98
+ ≥1,9000,ai2d_exact_match,0.5080958549222798,0.008997974381217102
99
+ ≥1,9000,average,0.5049424624827252,
100
+ ≥1,9000,average_rank,1.9,
101
+ ≥1,9000,chartqa_relaxed_overall,0.6644,0.009445885130487209
102
+ ≥1,9000,docvqa_val_anls,0.7114743939854425,0.005784207378273765
103
+ ≥1,9000,infovqa_val_anls,0.27927629692536604,0.007234508289873752
104
+ ≥1,9000,mme_total_score,1385.721988795518,
105
+ ≥1,9000,mmmu_val_mmmu_acc,0.30333,
106
+ ≥1,9000,mmstar_average,0.35371044141416225,
107
+ ≥1,9000,ocrbench_ocrbench_accuracy,0.572,
108
+ ≥1,9000,seedbench_seed_all,0.5673151750972762,
109
+ ≥1,9000,textvqa_val_exact_match,0.58488,0.006674247990391685
110
+ ≥1,10000,ai2d_exact_match,0.5006476683937824,0.008999146569435552
111
+ ≥1,10000,average,0.5082439013030791,
112
+ ≥1,10000,average_rank,2.1,
113
+ ≥1,10000,chartqa_relaxed_overall,0.66,0.009476070829586857
114
+ ≥1,10000,docvqa_val_anls,0.7160888537676756,0.005756158349745215
115
+ ≥1,10000,infovqa_val_anls,0.29920594326668903,0.0074179476099996864
116
+ ≥1,10000,mme_total_score,1331.7510004001601,
117
+ ≥1,10000,mmmu_val_mmmu_acc,0.31222,
118
+ ≥1,10000,mmstar_average,0.34770435280317685,
119
+ ≥1,10000,ocrbench_ocrbench_accuracy,0.572,
120
+ ≥1,10000,seedbench_seed_all,0.5709282934963869,
121
+ ≥1,10000,textvqa_val_exact_match,0.5954,0.006639803114330983
122
+ ≥1,11000,ai2d_exact_match,0.506800518134715,0.008998321712163856
123
+ ≥1,11000,average,0.5113045470128461,
124
+ ≥1,11000,average_rank,2.4,
125
+ ≥1,11000,chartqa_relaxed_overall,0.6648,0.009443095510537233
126
+ ≥1,11000,docvqa_val_anls,0.7219007936057111,0.005738025679608452
127
+ ≥1,11000,infovqa_val_anls,0.2919206859707748,0.007295238934448537
128
+ ≥1,11000,mme_total_score,1423.2838135254103,
129
+ ≥1,11000,mmmu_val_mmmu_acc,0.32,
130
+ ≥1,11000,mmstar_average,0.34837257743331856,
131
+ ≥1,11000,ocrbench_ocrbench_accuracy,0.584,
132
+ ≥1,11000,seedbench_seed_all,0.567426347971095,
133
+ ≥1,11000,textvqa_val_exact_match,0.5965199999999999,0.006637830223651069
134
+ ≥1,12000,ai2d_exact_match,0.4957901554404145,0.0089988351333547
135
+ ≥1,12000,average,0.5133063005116858,
136
+ ≥1,12000,average_rank,2.0,
137
+ ≥1,12000,chartqa_relaxed_overall,0.6752,0.00936787525721462
138
+ ≥1,12000,docvqa_val_anls,0.7317458509080867,0.005677899397993261
139
+ ≥1,12000,infovqa_val_anls,0.30244398410320705,0.0074372299260171675
140
+ ≥1,12000,mme_total_score,1358.8711484593837,
141
+ ≥1,12000,mmmu_val_mmmu_acc,0.30222,
142
+ ≥1,12000,mmstar_average,0.36151764800560426,
143
+ ≥1,12000,ocrbench_ocrbench_accuracy,0.571,
144
+ ≥1,12000,seedbench_seed_all,0.5743190661478599,
145
+ ≥1,12000,textvqa_val_exact_match,0.6055199999999998,0.006601107546780982
146
+ ≥1,13000,ai2d_exact_match,0.5029145077720207,0.008999001233939133
147
+ ≥1,13000,average,0.5113232076887448,
148
+ ≥1,13000,average_rank,2.2,
149
+ ≥1,13000,chartqa_relaxed_overall,0.6764,0.009358859508536295
150
+ ≥1,13000,docvqa_val_anls,0.7299154645021083,0.005686391180628681
151
+ ≥1,13000,infovqa_val_anls,0.28296895663700367,0.007106598521793854
152
+ ≥1,13000,mme_total_score,1461.5425170068027,
153
+ ≥1,13000,mmmu_val_mmmu_acc,0.28444,
154
+ ≥1,13000,mmstar_average,0.3679555656349867,
155
+ ≥1,13000,ocrbench_ocrbench_accuracy,0.575,
156
+ ≥1,13000,seedbench_seed_all,0.5738743746525847,
157
+ ≥1,13000,textvqa_val_exact_match,0.60844,0.006603822784953804
158
+ ≥1,14000,ai2d_exact_match,0.508419689119171,0.00899787810776641
159
+ ≥1,14000,average,0.5204248423941521,
160
+ ≥1,14000,average_rank,1.5,
161
+ ≥1,14000,chartqa_relaxed_overall,0.6748,0.009370864914387439
162
+ ≥1,14000,docvqa_val_anls,0.7348023413497262,0.005658144612389036
163
+ ≥1,14000,infovqa_val_anls,0.30339204212390886,0.007452040139655917
164
+ ≥1,14000,mme_total_score,1421.6612645058024,
165
+ ≥1,14000,mmmu_val_mmmu_acc,0.32333,
166
+ ≥1,14000,mmstar_average,0.3578816768256025,
167
+ ≥1,14000,ocrbench_ocrbench_accuracy,0.59,
168
+ ≥1,14000,seedbench_seed_all,0.5760978321289605,
169
+ ≥1,14000,textvqa_val_exact_match,0.6151,0.006568548330143662
170
+ ≥1,15000,ai2d_exact_match,0.5123056994818653,0.008996428218289524
171
+ ≥1,15000,average,0.518135626255078,
172
+ ≥1,15000,average_rank,1.8,
173
+ ≥1,15000,chartqa_relaxed_overall,0.6768,0.009355838641547569
174
+ ≥1,15000,docvqa_val_anls,0.7406818231641893,0.00561534943093856
175
+ ≥1,15000,infovqa_val_anls,0.2993680664172523,0.007344080406067735
176
+ ≥1,15000,mme_total_score,1410.685474189676,
177
+ ≥1,15000,mmmu_val_mmmu_acc,0.31778,
178
+ ≥1,15000,mmstar_average,0.34818335740471335,
179
+ ≥1,15000,ocrbench_ocrbench_accuracy,0.581,
180
+ ≥1,15000,seedbench_seed_all,0.575041689827682,
181
+ ≥1,15000,textvqa_val_exact_match,0.61206,0.006579602534644686
182
+ ≥1,16000,ai2d_exact_match,0.5148963730569949,0.008995159373289019
183
+ ≥1,16000,average,0.5188529848530237,
184
+ ≥1,16000,average_rank,2.2,
185
+ ≥1,16000,chartqa_relaxed_overall,0.6768,0.009355838641547569
186
+ ≥1,16000,docvqa_val_anls,0.7381040832460759,0.005632273383411858
187
+ ≥1,16000,infovqa_val_anls,0.30209162600532213,0.007372809699325085
188
+ ≥1,16000,mme_total_score,1390.1362545018007,
189
+ ≥1,16000,mmmu_val_mmmu_acc,0.31111,
190
+ ≥1,16000,mmstar_average,0.35327018992913145,
191
+ ≥1,16000,ocrbench_ocrbench_accuracy,0.581,
192
+ ≥1,16000,seedbench_seed_all,0.5762645914396887,
193
+ ≥1,16000,textvqa_val_exact_match,0.6161399999999999,0.006566896139347796
194
+ ≥1,17000,ai2d_exact_match,0.5148963730569949,0.008995159373289019
195
+ ≥1,17000,average,0.5197229023161958,
196
+ ≥1,17000,average_rank,2.4,
197
+ ≥1,17000,chartqa_relaxed_overall,0.6808,0.009325198535746702
198
+ ≥1,17000,docvqa_val_anls,0.7415371461870564,0.005606416638789011
199
+ ≥1,17000,infovqa_val_anls,0.31757741607819345,0.0075605614362149656
200
+ ≥1,17000,mme_total_score,1349.7522008803521,
201
+ ≥1,17000,mmmu_val_mmmu_acc,0.29556,
202
+ ≥1,17000,mmstar_average,0.3467129398314668,
+ ≥1,17000,ocrbench_ocrbench_accuracy,0.589,
+ ≥1,17000,seedbench_seed_all,0.5760422456920511,
+ ≥1,17000,textvqa_val_exact_match,0.6153799999999999,0.0065759668329423305
+ ≥1,18000,ai2d_exact_match,0.5113341968911918,0.008996841687150462
+ ≥1,18000,average,0.5217542622446647,
+ ≥1,18000,average_rank,2.1,
+ ≥1,18000,chartqa_relaxed_overall,0.686,0.00928418431696466
+ ≥1,18000,docvqa_val_anls,0.7485976064804745,0.005545760483304357
+ ≥1,18000,infovqa_val_anls,0.3079394168596966,0.007506515528281936
+ ≥1,18000,mme_total_score,1386.236494597839,
+ ≥1,18000,mmmu_val_mmmu_acc,0.30889,
+ ≥1,18000,mmstar_average,0.36329690094894107,
+ ≥1,18000,ocrbench_ocrbench_accuracy,0.58,
+ ≥1,18000,seedbench_seed_all,0.5744302390216787,
+ ≥1,18000,textvqa_val_exact_match,0.6153,0.006569673821646289
+ ≥1,19000,ai2d_exact_match,0.5116580310880829,0.008996707642249475
+ ≥1,19000,average,0.5243525940235553,
+ ≥1,19000,average_rank,1.6,
+ ≥1,19000,chartqa_relaxed_overall,0.6896,0.009254998541285659
+ ≥1,19000,docvqa_val_anls,0.7410075109051968,0.005624845495160182
+ ≥1,19000,infovqa_val_anls,0.31451986671246684,0.00754441993362511
+ ≥1,19000,mme_total_score,1379.0539215686274,
+ ≥1,19000,mmmu_val_mmmu_acc,0.30889,
+ ≥1,19000,mmstar_average,0.36379458008546134,
+ ≥1,19000,ocrbench_ocrbench_accuracy,0.594,
+ ≥1,19000,seedbench_seed_all,0.5780433574207893,
+ ≥1,19000,textvqa_val_exact_match,0.61766,0.006552511881896322
+ ≥2,1000,ai2d_exact_match,0.47765544041450775,0.00899016344465196
+ ≥2,1000,average,0.48208320918746633,
+ ≥2,1000,average_rank,2.7,
+ ≥2,1000,chartqa_relaxed_overall,0.626,0.009679208378267924
+ ≥2,1000,docvqa_val_anls,0.6830886615719474,0.005941664313882304
+ ≥2,1000,infovqa_val_anls,0.2636626226113445,0.007012099858086531
+ ≥2,1000,mme_total_score,1394.7869147659064,
+ ≥2,1000,mmmu_val_mmmu_acc,0.28111,
+ ≥2,1000,mmstar_average,0.3621500124529322,
+ ≥2,1000,ocrbench_ocrbench_accuracy,0.53,
+ ≥2,1000,seedbench_seed_all,0.5518621456364647,
+ ≥2,1000,textvqa_val_exact_match,0.5632199999999999,0.006735793977260649
+ ≥2,2000,ai2d_exact_match,0.47506476683937826,0.00898795641911507
+ ≥2,2000,average,0.48647523098478523,
+ ≥2,2000,average_rank,2.4,
+ ≥2,2000,chartqa_relaxed_overall,0.6392,0.00960657371300514
+ ≥2,2000,docvqa_val_anls,0.6776161818000301,0.005964335785163625
+ ≥2,2000,infovqa_val_anls,0.28064001553745443,0.007228333231022024
+ ≥2,2000,mme_total_score,1262.5283113245298,
+ ≥2,2000,mmmu_val_mmmu_acc,0.29556,
+ ≥2,2000,mmstar_average,0.3433600502059375,
+ ≥2,2000,ocrbench_ocrbench_accuracy,0.562,
+ ≥2,2000,seedbench_seed_all,0.5489160644802669,
+ ≥2,2000,textvqa_val_exact_match,0.55592,0.006741845534884587
+ ≥2,3000,ai2d_exact_match,0.4854274611398964,0.00899533120652686
+ ≥2,3000,average,0.4892979098475977,
+ ≥2,3000,average_rank,2.0,
+ ≥2,3000,chartqa_relaxed_overall,0.642,0.009590161024476605
+ ≥2,3000,docvqa_val_anls,0.682810147307377,0.005940269120275799
+ ≥2,3000,infovqa_val_anls,0.27552490540828095,0.007240182675336717
+ ≥2,3000,mme_total_score,1310.3195278111243,
+ ≥2,3000,mmmu_val_mmmu_acc,0.29667,
+ ≥2,3000,mmstar_average,0.33383353302741087,
+ ≥2,3000,ocrbench_ocrbench_accuracy,0.56,
+ ≥2,3000,seedbench_seed_all,0.5592551417454141,
+ ≥2,3000,textvqa_val_exact_match,0.56816,0.00671355771938026
+ ≥2,4000,ai2d_exact_match,0.4838082901554404,0.008994434238637763
+ ≥2,4000,average,0.49195026536834224,
+ ≥2,4000,average_rank,2.2,
+ ≥2,4000,chartqa_relaxed_overall,0.6428,0.009585406407993486
+ ≥2,4000,docvqa_val_anls,0.6936982319965624,0.005883844142208432
+ ≥2,4000,infovqa_val_anls,0.26951374340713585,0.007112166845409044
+ ≥2,4000,mme_total_score,1301.329931972789,
+ ≥2,4000,mmmu_val_mmmu_acc,0.30667,
+ ≥2,4000,mmstar_average,0.34946626950413445,
+ ≥2,4000,ocrbench_ocrbench_accuracy,0.547,
+ ≥2,4000,seedbench_seed_all,0.5645358532518066,
+ ≥2,4000,textvqa_val_exact_match,0.5700599999999999,0.006712416151142391
+ ≥2,5000,ai2d_exact_match,0.4802461139896373,0.008992128148477658
+ ≥2,5000,average,0.4911460216363542,
+ ≥2,5000,average_rank,2.6,
+ ≥2,5000,chartqa_relaxed_overall,0.6592,0.009481461028833927
+ ≥2,5000,docvqa_val_anls,0.6952750329046061,0.005874374530558489
+ ≥2,5000,infovqa_val_anls,0.2792676155726946,0.007321946399777712
+ ≥2,5000,mme_total_score,1246.5271108443376,
+ ≥2,5000,mmmu_val_mmmu_acc,0.30667,
+ ≥2,5000,mmstar_average,0.3273375111929903,
+ ≥2,5000,ocrbench_ocrbench_accuracy,0.544,
+ ≥2,5000,seedbench_seed_all,0.5642579210672596,
+ ≥2,5000,textvqa_val_exact_match,0.56406,0.006733849732986717
+ ≥2,6000,ai2d_exact_match,0.47636010362694303,0.0089890902327936
+ ≥2,6000,average,0.49370635223913606,
+ ≥2,6000,average_rank,2.4,
+ ≥2,6000,chartqa_relaxed_overall,0.6576,0.00949215130381674
+ ≥2,6000,docvqa_val_anls,0.6979936603307108,0.005857650960456797
+ ≥2,6000,infovqa_val_anls,0.2848576580974239,0.007220288614025636
+ ≥2,6000,mme_total_score,1257.9977991196479,
+ ≥2,6000,mmmu_val_mmmu_acc,0.28889,
+ ≥2,6000,mmstar_average,0.3386087219715212,
+ ≥2,6000,ocrbench_ocrbench_accuracy,0.555,
+ ≥2,6000,seedbench_seed_all,0.5646470261256253,
+ ≥2,6000,textvqa_val_exact_match,0.5794000000000001,0.0066913139768320015
+ ≥2,7000,ai2d_exact_match,0.49125647668393785,0.008997778057794696
+ ≥2,7000,average,0.49923190517534066,
+ ≥2,7000,average_rank,2.4,
+ ≥2,7000,chartqa_relaxed_overall,0.6564,0.009500090351500593
+ ≥2,7000,docvqa_val_anls,0.7050049130392773,0.005832016517791021
+ ≥2,7000,infovqa_val_anls,0.27630514531293887,0.007147131752819133
+ ≥2,7000,mme_total_score,1298.6506602641057,
+ ≥2,7000,mmmu_val_mmmu_acc,0.30667,
+ ≥2,7000,mmstar_average,0.35103185667809866,
+ ≥2,7000,ocrbench_ocrbench_accuracy,0.561,
+ ≥2,7000,seedbench_seed_all,0.5657587548638132,
+ ≥2,7000,textvqa_val_exact_match,0.5796600000000001,0.006695268643186835
+ ≥2,8000,ai2d_exact_match,0.4948186528497409,0.008998670917263325
+ ≥2,8000,average,0.5019054681854818,
+ ≥2,8000,average_rank,1.7,
+ ≥2,8000,chartqa_relaxed_overall,0.6528,0.009523504757028414
+ ≥2,8000,docvqa_val_anls,0.7073923991601945,0.005811715016078567
+ ≥2,8000,infovqa_val_anls,0.2893855968120429,0.007315932200378898
+ ≥2,8000,mme_total_score,1294.7393957583033,
+ ≥2,8000,mmmu_val_mmmu_acc,0.31444,
+ ≥2,8000,mmstar_average,0.35566192560333365,
+ ≥2,8000,ocrbench_ocrbench_accuracy,0.543,
+ ≥2,8000,seedbench_seed_all,0.5711506392440244,
+ ≥2,8000,textvqa_val_exact_match,0.5885,0.006652668757748281
+ ≥2,9000,ai2d_exact_match,0.4961139896373057,0.008998882321332237
+ ≥2,9000,average,0.5033958878673905,
+ ≥2,9000,average_rank,1.8,
+ ≥2,9000,chartqa_relaxed_overall,0.6652,0.009440298284094473
+ ≥2,9000,docvqa_val_anls,0.706747911546142,0.005822953083156574
+ ≥2,9000,infovqa_val_anls,0.2960318229790583,0.007315313753711981
+ ≥2,9000,mme_total_score,1284.486194477791,
+ ≥2,9000,mmmu_val_mmmu_acc,0.31111,
+ ≥2,9000,mmstar_average,0.3461876713132692,
+ ≥2,9000,ocrbench_ocrbench_accuracy,0.551,
+ ≥2,9000,seedbench_seed_all,0.5688715953307393,
+ ≥2,9000,textvqa_val_exact_match,0.5893,0.006649446971666576
+ ≥2,10000,ai2d_exact_match,0.4954663212435233,0.008998784170060763
+ ≥2,10000,average,0.5062630509259689,
+ ≥2,10000,average_rank,2.5,
+ ≥2,10000,chartqa_relaxed_overall,0.668,0.009420504145710235
+ ≥2,10000,docvqa_val_anls,0.722875937910498,0.005715570269767272
+ ≥2,10000,infovqa_val_anls,0.28155653174519985,0.007182472403759747
+ ≥2,10000,mme_total_score,1304.360544217687,
+ ≥2,10000,mmmu_val_mmmu_acc,0.31556,
+ ≥2,10000,mmstar_average,0.34845583808486047,
+ ≥2,10000,ocrbench_ocrbench_accuracy,0.564,
+ ≥2,10000,seedbench_seed_all,0.5670928293496387,
+ ≥2,10000,textvqa_val_exact_match,0.59336,0.006650836699676301
+ ≥2,11000,ai2d_exact_match,0.5093911917098446,0.008997566627779879
+ ≥2,11000,average,0.5121996275740728,
+ ≥2,11000,average_rank,2.1,
+ ≥2,11000,chartqa_relaxed_overall,0.6692,0.009411906161401973
+ ≥2,11000,docvqa_val_anls,0.7205703696519083,0.005737270521428796
+ ≥2,11000,infovqa_val_anls,0.30697732217578644,0.007486340094072884
+ ≥2,11000,mme_total_score,1312.018607442977,
+ ≥2,11000,mmmu_val_mmmu_acc,0.30889,
+ ≥2,11000,mmstar_average,0.34270221710271187,
+ ≥2,11000,ocrbench_ocrbench_accuracy,0.574,
+ ≥2,11000,seedbench_seed_all,0.5739855475264035,
+ ≥2,11000,textvqa_val_exact_match,0.6040800000000001,0.00661203558088616
+ ≥2,12000,ai2d_exact_match,0.5123056994818653,0.008996428218289528
+ ≥2,12000,average,0.5150951619345675,
+ ≥2,12000,average_rank,2.3,
+ ≥2,12000,chartqa_relaxed_overall,0.6672,0.00942619781683542
+ ≥2,12000,docvqa_val_anls,0.726550362704052,0.005691891264118933
+ ≥2,12000,infovqa_val_anls,0.3008889889078986,0.007362325835960529
+ ≥2,12000,mme_total_score,1224.6687675070027,
+ ≥2,12000,mmmu_val_mmmu_acc,0.31444,
+ ≥2,12000,mmstar_average,0.35781468591706916,
+ ≥2,12000,ocrbench_ocrbench_accuracy,0.58,
+ ≥2,12000,seedbench_seed_all,0.5740967204002223,
+ ≥2,12000,textvqa_val_exact_match,0.60256,0.006618961505423797
+ ≥2,13000,ai2d_exact_match,0.5080958549222798,0.0089979743812171
+ ≥2,13000,average,0.5180586542380377,
+ ≥2,13000,average_rank,1.4,
+ ≥2,13000,chartqa_relaxed_overall,0.6752,0.00936787525721462
+ ≥2,13000,docvqa_val_anls,0.726059208019786,0.0056904102427854444
+ ≥2,13000,infovqa_val_anls,0.3067653345076983,0.007414171368476549
+ ≥2,13000,mme_total_score,1241.2817126850741,
+ ≥2,13000,mmmu_val_mmmu_acc,0.31778,
+ ≥2,13000,mmstar_average,0.35994731281597653,
+ ≥2,13000,ocrbench_ocrbench_accuracy,0.582,
+ ≥2,13000,seedbench_seed_all,0.5763201778765981,
+ ≥2,13000,textvqa_val_exact_match,0.61036,0.006605638574142127
+ ≥2,14000,ai2d_exact_match,0.5055051813471503,0.008998608627616667
+ ≥2,14000,average,0.5187199474947337,
+ ≥2,14000,average_rank,2.1,
+ ≥2,14000,chartqa_relaxed_overall,0.6788,0.00934061683451043
+ ≥2,14000,docvqa_val_anls,0.7306315173289623,0.005670445587318404
+ ≥2,14000,infovqa_val_anls,0.30084936045159133,0.007340699586893536
+ ≥2,14000,mme_total_score,1266.9314725890356,
+ ≥2,14000,mmmu_val_mmmu_acc,0.32,
+ ≥2,14000,mmstar_average,0.360779371604499,
+ ≥2,14000,ocrbench_ocrbench_accuracy,0.587,
+ ≥2,14000,seedbench_seed_all,0.5733740967204002,
+ ≥2,14000,textvqa_val_exact_match,0.61154,0.006582281592745273
+ ≥2,15000,ai2d_exact_match,0.5077720207253886,0.008998066878268323
+ ≥2,15000,average,0.5182417002827931,
+ ≥2,15000,average_rank,2.2,
+ ≥2,15000,chartqa_relaxed_overall,0.6732,0.009382745779746297
+ ≥2,15000,docvqa_val_anls,0.7366238053330653,0.005647248266865468
+ ≥2,15000,infovqa_val_anls,0.30893362163842225,0.007385953320794889
+ ≥2,15000,mme_total_score,1280.3160264105643,
+ ≥2,15000,mmmu_val_mmmu_acc,0.31778,
+ ≥2,15000,mmstar_average,0.3597603073218589,
+ ≥2,15000,ocrbench_ocrbench_accuracy,0.571,
+ ≥2,15000,seedbench_seed_all,0.5739855475264035,
+ ≥2,15000,textvqa_val_exact_match,0.61512,0.006574049037248568
+ ≥2,16000,ai2d_exact_match,0.5055051813471503,0.008998608627616667
+ ≥2,16000,average,0.5226694963682967,
+ ≥2,16000,average_rank,1.9,
+ ≥2,16000,chartqa_relaxed_overall,0.6844,0.009296947310365735
+ ≥2,16000,docvqa_val_anls,0.7369050997741022,0.00564381681657765
+ ≥2,16000,infovqa_val_anls,0.2990672453595873,0.007260058695111045
+ ≥2,16000,mme_total_score,1231.8950580232092,
+ ≥2,16000,mmmu_val_mmmu_acc,0.32333,
+ ≥2,16000,mmstar_average,0.3660943054808571,
+ ≥2,16000,ocrbench_ocrbench_accuracy,0.601,
+ ≥2,16000,seedbench_seed_all,0.5785436353529739,
+ ≥2,16000,textvqa_val_exact_match,0.6091799999999999,0.006589463538554954
+ ≥2,17000,ai2d_exact_match,0.5097150259067358,0.008997455247470535
+ ≥2,17000,average,0.5231030271400094,
+ ≥2,17000,average_rank,1.5,
+ ≥2,17000,chartqa_relaxed_overall,0.6844,0.009296947310365735
+ ≥2,17000,docvqa_val_anls,0.7407178352725541,0.005609117579860497
+ ≥2,17000,infovqa_val_anls,0.30677928223689904,0.007423923972542159
+ ≥2,17000,mme_total_score,1251.8118247298921,
+ ≥2,17000,mmmu_val_mmmu_acc,0.31333,
+ ≥2,17000,mmstar_average,0.35755470618019386,
+ ≥2,17000,ocrbench_ocrbench_accuracy,0.595,
+ ≥2,17000,seedbench_seed_all,0.5787103946637021,
+ ≥2,17000,textvqa_val_exact_match,0.6217199999999999,0.006547657801423109
+ ≥2,18000,ai2d_exact_match,0.5129533678756477,0.008996133680935945
+ ≥2,18000,average,0.520551210243477,
+ ≥2,18000,average_rank,2.1,
+ ≥2,18000,chartqa_relaxed_overall,0.6796,0.009334473148259746
+ ≥2,18000,docvqa_val_anls,0.7420992559479452,0.005605162069925204
+ ≥2,18000,infovqa_val_anls,0.30026388587258485,0.007302705356586967
+ ≥2,18000,mme_total_score,1243.8207282913165,
+ ≥2,18000,mmmu_val_mmmu_acc,0.31111,
+ ≥2,18000,mmstar_average,0.3520362724339696,
+ ≥2,18000,ocrbench_ocrbench_accuracy,0.591,
+ ≥2,18000,seedbench_seed_all,0.576598110061145,
+ ≥2,18000,textvqa_val_exact_match,0.6193,0.006553540400299342
+ ≥2,19000,ai2d_exact_match,0.508419689119171,0.008997878107766411
+ ≥2,19000,average,0.523370364263479,
+ ≥2,19000,average_rank,2.1,
+ ≥2,19000,chartqa_relaxed_overall,0.6852,0.009290581788240476
+ ≥2,19000,docvqa_val_anls,0.7378793056289451,0.005630284657853331
+ ≥2,19000,infovqa_val_anls,0.29852452208029057,0.007300069652856512
+ ≥2,19000,mme_total_score,1273.484593837535,
+ ≥2,19000,mmmu_val_mmmu_acc,0.31778,
+ ≥2,19000,mmstar_average,0.3583181328603031,
+ ≥2,19000,ocrbench_ocrbench_accuracy,0.609,
+ ≥2,19000,seedbench_seed_all,0.5769316286826014,
+ ≥2,19000,textvqa_val_exact_match,0.6182799999999999,0.0065560479462046795
+ ≥2,20000,ai2d_exact_match,0.5132772020725389,0.008995980744276042
+ ≥2,20000,average,0.5252790448062622,
+ ≥2,20000,average_rank,1.1,
+ ≥2,20000,chartqa_relaxed_overall,0.6808,0.009325198535746702
+ ≥2,20000,docvqa_val_anls,0.7417425674578729,0.0056064333517934105
+ ≥2,20000,infovqa_val_anls,0.3091382953917658,0.007408396253875713
+ ≥2,20000,mme_total_score,1276.3417366946778,
+ ≥2,20000,mmmu_val_mmmu_acc,0.31778,
+ ≥2,20000,mmstar_average,0.35941177079666153,
+ ≥2,20000,ocrbench_ocrbench_accuracy,0.607,
+ ≥2,20000,seedbench_seed_all,0.5788215675375209,
+ ≥2,20000,textvqa_val_exact_match,0.6195400000000001,0.0065546800414733606
+ ≥3,1000,ai2d_exact_match,0.46696891191709844,0.008979495543032526
+ ≥3,1000,average,0.4819077497875202,
+ ≥3,1000,average_rank,2.6,
+ ≥3,1000,chartqa_relaxed_overall,0.6376,0.009615793331418735
+ ≥3,1000,docvqa_val_anls,0.6765375416572318,0.00595906808496784
+ ≥3,1000,infovqa_val_anls,0.28562874655210324,0.007377482443151623
+ ≥3,1000,mme_total_score,1210.921368547419,
+ ≥3,1000,mmmu_val_mmmu_acc,0.28556,
+ ≥3,1000,mmstar_average,0.342259005993489,
+ ≥3,1000,ocrbench_ocrbench_accuracy,0.534,
+ ≥3,1000,seedbench_seed_all,0.5559755419677599,
+ ≥3,1000,textvqa_val_exact_match,0.5526399999999999,0.006746058696867995
+ ≥3,2000,ai2d_exact_match,0.4685880829015544,0.008981377477192708
+ ≥3,2000,average,0.48620065133730717,
+ ≥3,2000,average_rank,2.0,
+ ≥3,2000,chartqa_relaxed_overall,0.6356,0.009627155802808046
+ ≥3,2000,docvqa_val_anls,0.6718354763369633,0.005996528203070324
+ ≥3,2000,infovqa_val_anls,0.26419815798743335,0.006939135175774486
+ ≥3,2000,mme_total_score,1264.654161664666,
+ ≥3,2000,mmmu_val_mmmu_acc,0.30333,
+ ≥3,2000,mmstar_average,0.3562263182394967,
+ ≥3,2000,ocrbench_ocrbench_accuracy,0.551,
+ ≥3,2000,seedbench_seed_all,0.5580878265703169,
+ ≥3,2000,textvqa_val_exact_match,0.56694,0.0067100232609457085
+ ≥3,3000,ai2d_exact_match,0.4838082901554404,0.008994434238637763
+ ≥3,3000,average,0.48597261570869915,
+ ≥3,3000,average_rank,3.0,
+ ≥3,3000,chartqa_relaxed_overall,0.6316,0.009649342979082627
+ ≥3,3000,docvqa_val_anls,0.6746316657514325,0.005965125654000594
+ ≥3,3000,infovqa_val_anls,0.26946459224600977,0.007089931445596614
+ ≥3,3000,mme_total_score,1247.7741096438576,
+ ≥3,3000,mmmu_val_mmmu_acc,0.28556,
+ ≥3,3000,mmstar_average,0.3458825563160156,
+ ≥3,3000,ocrbench_ocrbench_accuracy,0.562,
+ ≥3,3000,seedbench_seed_all,0.5555864369093941,
+ ≥3,3000,textvqa_val_exact_match,0.56522,0.006727876573231477
+ ≥3,4000,ai2d_exact_match,0.48575129533678757,0.008995499260034972
+ ≥3,4000,average,0.49355357269641903,
+ ≥3,4000,average_rank,2.2,
+ ≥3,4000,chartqa_relaxed_overall,0.6548,0.009510571191350932
+ ≥3,4000,docvqa_val_anls,0.6853328262496681,0.0059320222320751875
+ ≥3,4000,infovqa_val_anls,0.2850385340966683,0.007361799302921674
+ ≥3,4000,mme_total_score,1288.7305922368948,
+ ≥3,4000,mmmu_val_mmmu_acc,0.30111,
+ ≥3,4000,mmstar_average,0.357958492470139,
+ ≥3,4000,ocrbench_ocrbench_accuracy,0.541,
+ ≥3,4000,seedbench_seed_all,0.5598110061145081,
+ ≥3,4000,textvqa_val_exact_match,0.57118,0.006705227329084893
+ ≥3,5000,ai2d_exact_match,0.4727979274611399,0.008985826352357517
+ ≥3,5000,average,0.4915808423458039,
+ ≥3,5000,average_rank,2.6,
+ ≥3,5000,chartqa_relaxed_overall,0.6516,0.009531175862679805
+ ≥3,5000,docvqa_val_anls,0.6805544343770252,0.005954062592926349
+ ≥3,5000,infovqa_val_anls,0.2790745100628044,0.007226853744230138
+ ≥3,5000,mme_total_score,1234.2862144857945,
+ ≥3,5000,mmmu_val_mmmu_acc,0.30889,
+ ≥3,5000,mmstar_average,0.3515137442307206,
+ ≥3,5000,ocrbench_ocrbench_accuracy,0.548,
+ ≥3,5000,seedbench_seed_all,0.5665369649805447,
+ ≥3,5000,textvqa_val_exact_match,0.56526,0.006737603842695726
+ ≥3,6000,ai2d_exact_match,0.4834844559585492,0.008994243503406857
+ ≥3,6000,average,0.4916502418219912,
+ ≥3,6000,average_rank,2.8,
+ ≥3,6000,chartqa_relaxed_overall,0.6504,0.009538780390203614
+ ≥3,6000,docvqa_val_anls,0.6884045774843457,0.005915343845415068
+ ≥3,6000,infovqa_val_anls,0.27942328823789453,0.007164390448746867
+ ≥3,6000,mme_total_score,1169.1683673469388,
+ ≥3,6000,mmmu_val_mmmu_acc,0.28333,
+ ≥3,6000,mmstar_average,0.3409248964069589,
+ ≥3,6000,ocrbench_ocrbench_accuracy,0.564,
+ ≥3,6000,seedbench_seed_all,0.5649249583101723,
+ ≥3,6000,textvqa_val_exact_match,0.5699599999999999,0.006704305275108255
+ ≥3,7000,ai2d_exact_match,0.49158031088082904,0.008997878107766406
+ ≥3,7000,average,0.498429728288751,
+ ≥3,7000,average_rank,2.5,
+ ≥3,7000,chartqa_relaxed_overall,0.6468,0.009561196085649289
+ ≥3,7000,docvqa_val_anls,0.6934235036732116,0.005908575911274035
+ ≥3,7000,infovqa_val_anls,0.29038240983122426,0.0073217194111741745
+ ≥3,7000,mme_total_score,1180.5045018007202,
+ ≥3,7000,mmmu_val_mmmu_acc,0.31111,
+ ≥3,7000,mmstar_average,0.3501487176509592,
+ ≥3,7000,ocrbench_ocrbench_accuracy,0.562,
+ ≥3,7000,seedbench_seed_all,0.5647026125625347,
+ ≥3,7000,textvqa_val_exact_match,0.57572,0.00668387845238326
+ ≥3,8000,ai2d_exact_match,0.5048575129533679,0.008998729431386472
+ ≥3,8000,average,0.4984949624992469,
+ ≥3,8000,average_rank,2.8,
+ ≥3,8000,chartqa_relaxed_overall,0.6472,0.009558734841217527
+ ≥3,8000,docvqa_val_anls,0.7000334929155309,0.005878854074791644
+ ≥3,8000,infovqa_val_anls,0.286719889854365,0.0073233352192073635
+ ≥3,8000,mme_total_score,1136.111644657863,
+ ≥3,8000,mmmu_val_mmmu_acc,0.28778,
+ ≥3,8000,mmstar_average,0.33112430039975294,
+ ≥3,8000,ocrbench_ocrbench_accuracy,0.578,
+ ≥3,8000,seedbench_seed_all,0.5710394663702056,
+ ≥3,8000,textvqa_val_exact_match,0.5797000000000001,0.006692483833971778
+ ≥3,9000,ai2d_exact_match,0.4944948186528497,0.00899860862761667
+ ≥3,9000,average,0.5022809828687513,
+ ≥3,9000,average_rank,2.8,
+ ≥3,9000,chartqa_relaxed_overall,0.6648,0.009443095510537233
+ ≥3,9000,docvqa_val_anls,0.7066412322864666,0.005811056629671494
+ ≥3,9000,infovqa_val_anls,0.2915189250095514,0.007376511883779376
+ ≥3,9000,mme_total_score,1097.9659863945578,
+ ≥3,9000,mmmu_val_mmmu_acc,0.3,
+ ≥3,9000,mmstar_average,0.34971925063698756,
+ ≥3,9000,ocrbench_ocrbench_accuracy,0.565,
+ ≥3,9000,seedbench_seed_all,0.5663146192329072,
+ ≥3,9000,textvqa_val_exact_match,0.58204,0.006677395090979731
+ ≥3,10000,ai2d_exact_match,0.49287564766839376,0.008998240543632312
+ ≥3,10000,average,0.5094325810245673,
+ ≥3,10000,average_rank,2.2,
+ ≥3,10000,chartqa_relaxed_overall,0.6704,0.009403239035659185
+ ≥3,10000,docvqa_val_anls,0.7142047579734908,0.005771728801461397
+ ≥3,10000,infovqa_val_anls,0.2964737261567996,0.007512514632225057
+ ≥3,10000,mme_total_score,1149.6209483793518,
+ ≥3,10000,mmmu_val_mmmu_acc,0.30778,
+ ≥3,10000,mmstar_average,0.3527466682951291,
+ ≥3,10000,ocrbench_ocrbench_accuracy,0.577,
+ ≥3,10000,seedbench_seed_all,0.5703724291272929,
+ ≥3,10000,textvqa_val_exact_match,0.6030399999999999,0.006618920886575133
+ ≥3,11000,ai2d_exact_match,0.506800518134715,0.008998321712163861
+ ≥3,11000,average,0.5127105840130626,
+ ≥3,11000,average_rank,2.3,
+ ≥3,11000,chartqa_relaxed_overall,0.6676,0.009423354808471266
+ ≥3,11000,docvqa_val_anls,0.7155651550295605,0.005774638250173171
+ ≥3,11000,infovqa_val_anls,0.2960078107648859,0.007491292300444957
+ ≥3,11000,mme_total_score,1091.2908163265306,
+ ≥3,11000,mmmu_val_mmmu_acc,0.31,
+ ≥3,11000,mmstar_average,0.3487291874190849,
+ ≥3,11000,ocrbench_ocrbench_accuracy,0.597,
+ ≥3,11000,seedbench_seed_all,0.5746525847693162,
+ ≥3,11000,textvqa_val_exact_match,0.59804,0.006635181746369987
+ ≥3,12000,ai2d_exact_match,0.5045336787564767,0.008998784170060779
+ ≥3,12000,average,0.5136574673989756,
+ ≥3,12000,average_rank,2.5,
+ ≥3,12000,chartqa_relaxed_overall,0.6648,0.009443095510537233
+ ≥3,12000,docvqa_val_anls,0.7173866974813923,0.005773853880330729
+ ≥3,12000,infovqa_val_anls,0.31948469442993455,0.007793312195447671
+ ≥3,12000,mme_total_score,1082.547619047619,
+ ≥3,12000,mmmu_val_mmmu_acc,0.29889,
+ ≥3,12000,mmstar_average,0.3472872054060228,
+ ≥3,12000,ocrbench_ocrbench_accuracy,0.593,
+ ≥3,12000,seedbench_seed_all,0.5748749305169538,
+ ≥3,12000,textvqa_val_exact_match,0.6026600000000001,0.006626535072978538
+ ≥3,13000,ai2d_exact_match,0.4996761658031088,0.008999152231809674
+ ≥3,13000,average,0.5115591424915379,
+ ≥3,13000,average_rank,3.0,
+ ≥3,13000,chartqa_relaxed_overall,0.668,0.009420504145710235
+ ≥3,13000,docvqa_val_anls,0.7201586562486062,0.005765862770757432
+ ≥3,13000,infovqa_val_anls,0.30087605763050673,0.007444543350447085
+ ≥3,13000,mme_total_score,1142.0209083633454,
+ ≥3,13000,mmmu_val_mmmu_acc,0.31333,
+ ≥3,13000,mmstar_average,0.35413412647702824,
+ ≥3,13000,ocrbench_ocrbench_accuracy,0.568,
+ ≥3,13000,seedbench_seed_all,0.5750972762645914,
+ ≥3,13000,textvqa_val_exact_match,0.60476,0.0066167835724745445
+ ≥3,14000,ai2d_exact_match,0.5051813471502591,0.008998670917263325
+ ≥3,14000,average,0.512283584996583,
+ ≥3,14000,average_rank,2.9,
+ ≥3,14000,chartqa_relaxed_overall,0.6748,0.009370864914387439
+ ≥3,14000,docvqa_val_anls,0.7235575236423071,0.0057268410738261786
+ ≥3,14000,infovqa_val_anls,0.30893243437607226,0.007712373578271492
+ ≥3,14000,mme_total_score,1159.9943977591035,
+ ≥3,14000,mmmu_val_mmmu_acc,0.29667,
+ ≥3,14000,mmstar_average,0.3421543616905478,
+ ≥3,14000,ocrbench_ocrbench_accuracy,0.576,
+ ≥3,14000,seedbench_seed_all,0.5778765981100611,
+ ≥3,14000,textvqa_val_exact_match,0.6053799999999999,0.006612545370071516
+ ≥3,15000,ai2d_exact_match,0.501619170984456,0.00899910693271464
+ ≥3,15000,average,0.5157692661333466,
+ ≥3,15000,average_rank,2.6,
+ ≥3,15000,chartqa_relaxed_overall,0.6836,0.009303280948921504
+ ≥3,15000,docvqa_val_anls,0.7289675184474169,0.005688711489562826
+ ≥3,15000,infovqa_val_anls,0.31447779168584217,0.0076280570930290885
+ ≥3,15000,mme_total_score,1129.2125850340135,
+ ≥3,15000,mmmu_val_mmmu_acc,0.31222,
+ ≥3,15000,mmstar_average,0.35035142103070877,
+ ≥3,15000,ocrbench_ocrbench_accuracy,0.563,
+ ≥3,15000,seedbench_seed_all,0.5774874930516953,
+ ≥3,15000,textvqa_val_exact_match,0.6102,0.006593260666562748
+ ≥3,16000,ai2d_exact_match,0.506800518134715,0.00899832171216386
+ ≥3,16000,average,0.5182958246289815,
+ ≥3,16000,average_rank,2.5,
+ ≥3,16000,chartqa_relaxed_overall,0.674,0.009376820884924869
+ ≥3,16000,docvqa_val_anls,0.7332718536740643,0.005664165532854214
+ ≥3,16000,infovqa_val_anls,0.3097055695251213,0.007564531791761635
+ ≥3,16000,mme_total_score,1158.8010204081631,
+ ≥3,16000,mmmu_val_mmmu_acc,0.30889,
+ ≥3,16000,mmstar_average,0.3535555364692335,
+ ≥3,16000,ocrbench_ocrbench_accuracy,0.588,
+ ≥3,16000,seedbench_seed_all,0.5780989438576987,
+ ≥3,16000,textvqa_val_exact_match,0.61234,0.006584482968555135
+ ≥3,17000,ai2d_exact_match,0.4990284974093264,0.008999137132137064
+ ≥3,17000,average,0.517538300624539,
+ ≥3,17000,average_rank,2.8,
+ ≥3,17000,chartqa_relaxed_overall,0.6736,0.009379787213112317
+ ≥3,17000,docvqa_val_anls,0.7343487873475517,0.005650745093023672
+ ≥3,17000,infovqa_val_anls,0.30023060019445785,0.007383738588396597
+ ≥3,17000,mme_total_score,1158.095238095238,
+ ≥3,17000,mmmu_val_mmmu_acc,0.31222,
+ ≥3,17000,mmstar_average,0.3493043581903593,
+ ≥3,17000,ocrbench_ocrbench_accuracy,0.594,
+ ≥3,17000,seedbench_seed_all,0.5784324624791551,
+ ≥3,17000,textvqa_val_exact_match,0.61668,0.0065583044906102304
+ ≥3,18000,ai2d_exact_match,0.5058290155440415,0.008998542562369287
+ ≥3,18000,average,0.5182210734972332,
+ ≥3,18000,average_rank,2.5,
+ ≥3,18000,chartqa_relaxed_overall,0.674,0.009376820884924869
+ ≥3,18000,docvqa_val_anls,0.7287326909630594,0.005700735629180951
+ ≥3,18000,infovqa_val_anls,0.30100700787702633,0.007386740457934267
+ ≥3,18000,mme_total_score,1175.5579231692677,
+ ≥3,18000,mmmu_val_mmmu_acc,0.32,
+ ≥3,18000,mmstar_average,0.34714462691309605,
+ ≥3,18000,ocrbench_ocrbench_accuracy,0.6,
+ ≥3,18000,seedbench_seed_all,0.5773763201778765,
+ ≥3,18000,textvqa_val_exact_match,0.6099,0.006589801445917723
+ ≥3,19000,ai2d_exact_match,0.5045336787564767,0.008998784170060777
+ ≥3,19000,average,0.5187824665863345,
+ ≥3,19000,average_rank,2.7,
+ ≥3,19000,chartqa_relaxed_overall,0.6768,0.009355838641547569
+ ≥3,19000,docvqa_val_anls,0.7340665543774125,0.005662673189593881
+ ≥3,19000,infovqa_val_anls,0.3094998838176309,0.007498739242965892
+ ≥3,19000,mme_total_score,1173.1207482993198,
+ ≥3,19000,mmmu_val_mmmu_acc,0.30444,
+ ≥3,19000,mmstar_average,0.34505224352615843,
+ ≥3,19000,ocrbench_ocrbench_accuracy,0.597,
+ ≥3,19000,seedbench_seed_all,0.5777098387993329,
+ ≥3,19000,textvqa_val_exact_match,0.6199399999999999,0.0065535844523310115
+ ≥3,20000,ai2d_exact_match,0.4944948186528497,0.008998608627616674
+ ≥3,20000,average,0.5158935311436484,
+ ≥3,20000,average_rank,2.2,
+ ≥3,20000,chartqa_relaxed_overall,0.6788,0.00934061683451043
+ ≥3,20000,docvqa_val_anls,0.7330651042103438,0.0056772111451400455
+ ≥3,20000,infovqa_val_anls,0.2964558374276726,0.007412691037826716
+ ≥3,20000,mme_total_score,1203.7891156462586,
+ ≥3,20000,mmmu_val_mmmu_acc,0.31556,
+ ≥3,20000,mmstar_average,0.3448737131648378,
+ ≥3,20000,ocrbench_ocrbench_accuracy,0.592,
+ ≥3,20000,seedbench_seed_all,0.5741523068371317,
+ ≥3,20000,textvqa_val_exact_match,0.6136400000000001,0.006578650759020563
+ ≥4,1000,ai2d_exact_match,0.46599740932642486,0.008978320789223164
+ ≥4,1000,average,0.4810433130994131,
+ ≥4,1000,average_rank,3.0,
+ ≥4,1000,chartqa_relaxed_overall,0.6364,0.009622632385247222
+ ≥4,1000,docvqa_val_anls,0.6731681544556957,0.005980246808815758
+ ≥4,1000,infovqa_val_anls,0.273064875980351,0.007121239402495689
+ ≥4,1000,mme_total_score,1069.875850340136,
+ ≥4,1000,mmmu_val_mmmu_acc,0.28889,
+ ≥4,1000,mmstar_average,0.35232408630345313,
+ ≥4,1000,ocrbench_ocrbench_accuracy,0.54,
+ ≥4,1000,seedbench_seed_all,0.5455252918287937,
+ ≥4,1000,textvqa_val_exact_match,0.5540200000000001,0.006743431077169729
+ ≥4,2000,ai2d_exact_match,0.4579015544041451,0.00896719935987288
+ ≥4,2000,average,0.4733427752805117,
+ ≥4,2000,average_rank,4.1,
+ ≥4,2000,chartqa_relaxed_overall,0.6368,0.009620359896064799
+ ≥4,2000,docvqa_val_anls,0.6685623057181342,0.006022846398992095
+ ≥4,2000,infovqa_val_anls,0.2586347697028306,0.006939507684848232
+ ≥4,2000,mme_total_score,1037.0391156462586,
+ ≥4,2000,mmmu_val_mmmu_acc,0.27778,
+ ≥4,2000,mmstar_average,0.3426833682664769,
+ ≥4,2000,ocrbench_ocrbench_accuracy,0.517,
+ ≥4,2000,seedbench_seed_all,0.5533629794330184,
+ ≥4,2000,textvqa_val_exact_match,0.5473600000000001,0.006769325729654826
+ ≥4,3000,ai2d_exact_match,0.4724740932642487,0.008985506893308395
+ ≥4,3000,average,0.48486620292260835,
+ ≥4,3000,average_rank,3.1,
+ ≥4,3000,chartqa_relaxed_overall,0.648,0.009553790345406665
+ ≥4,3000,docvqa_val_anls,0.6797920414745026,0.0059259219910189455
+ ≥4,3000,infovqa_val_anls,0.25291991664683544,0.0068990348571168
+ ≥4,3000,mme_total_score,989.6139455782312,
+ ≥4,3000,mmmu_val_mmmu_acc,0.31889,
+ ≥4,3000,mmstar_average,0.359381486980145,
+ ≥4,3000,ocrbench_ocrbench_accuracy,0.528,
+ ≥4,3000,seedbench_seed_all,0.5529182879377432,
+ ≥4,3000,textvqa_val_exact_match,0.55142,0.006751052663282407
+ ≥4,4000,ai2d_exact_match,0.48704663212435234,0.008996133680935945
+ ≥4,4000,average,0.4833844903828087,
+ ≥4,4000,average_rank,3.6,
+ ≥4,4000,chartqa_relaxed_overall,0.634,0.00963611653607192
+ ≥4,4000,docvqa_val_anls,0.6872369707367743,0.005902275856072045
+ ≥4,4000,infovqa_val_anls,0.26951247528968925,0.007084476663871501
+ ≥4,4000,mme_total_score,943.8639455782313,
+ ≥4,4000,mmmu_val_mmmu_acc,0.28556,
+ ≥4,4000,mmstar_average,0.3561252135601649,
+ ≥4,4000,ocrbench_ocrbench_accuracy,0.525,
+ ≥4,4000,seedbench_seed_all,0.5544191217342969,
+ ≥4,4000,textvqa_val_exact_match,0.55156,0.006755726552211068
+ ≥4,5000,ai2d_exact_match,0.4944948186528497,0.008998608627616672
+ ≥4,5000,average,0.4902778679880179,
+ ≥4,5000,average_rank,3.2,
+ ≥4,5000,chartqa_relaxed_overall,0.6524,0.009526069199715017
+ ≥4,5000,docvqa_val_anls,0.6838821393578449,0.005934519981948664
+ ≥4,5000,infovqa_val_anls,0.2885173111410286,0.007387917761485684
+ ≥4,5000,mme_total_score,877.7568027210884,
+ ≥4,5000,mmmu_val_mmmu_acc,0.28222,
+ ≥4,5000,mmstar_average,0.3491170485770135,
+ ≥4,5000,ocrbench_ocrbench_accuracy,0.543,
+ ≥4,5000,seedbench_seed_all,0.5610894941634241,
+ ≥4,5000,textvqa_val_exact_match,0.55778,0.006740043023304169
+ ≥4,6000,ai2d_exact_match,0.47830310880829013,0.008990677331728418
+ ≥4,6000,average,0.49160704402561856,
+ ≥4,6000,average_rank,3.2,
+ ≥4,6000,chartqa_relaxed_overall,0.6524,0.009526069199715017
+ ≥4,6000,docvqa_val_anls,0.6895610098990497,0.005895883993977457
+ ≥4,6000,infovqa_val_anls,0.29445466931250164,0.007468796422091737
776
+ ≥4,6000,mme_total_score,959.8639455782312,
777
+ ≥4,6000,mmmu_val_mmmu_acc,0.29889,
778
+ ≥4,6000,mmstar_average,0.33644817130133137,
779
+ ≥4,6000,ocrbench_ocrbench_accuracy,0.56,
780
+ ≥4,6000,seedbench_seed_all,0.5555864369093941,
781
+ ≥4,6000,textvqa_val_exact_match,0.5588199999999999,0.006728260950578821
782
+ ≥4,7000,ai2d_exact_match,0.5006476683937824,0.008999146569435549
783
+ ≥4,7000,average,0.4896016595164798,
784
+ ≥4,7000,average_rank,3.4,
785
+ ≥4,7000,chartqa_relaxed_overall,0.6572,0.009494805133851454
786
+ ≥4,7000,docvqa_val_anls,0.6893686651094205,0.0058969034940001145
787
+ ≥4,7000,infovqa_val_anls,0.2859893612299588,0.00729316403263038
788
+ ≥4,7000,mme_total_score,927.2312925170069,
789
+ ≥4,7000,mmmu_val_mmmu_acc,0.29222,
790
+ ≥4,7000,mmstar_average,0.3472936989473972,
791
+ ≥4,7000,ocrbench_ocrbench_accuracy,0.519,
792
+ ≥4,7000,seedbench_seed_all,0.5559755419677599,
793
+ ≥4,7000,textvqa_val_exact_match,0.55872,0.0067301301064875444
794
+ ≥4,8000,ai2d_exact_match,0.4731217616580311,0.008986142019669732
795
+ ≥4,8000,average,0.492078815307747,
796
+ ≥4,8000,average_rank,3.5,
797
+ ≥4,8000,chartqa_relaxed_overall,0.6676,0.009423354808471266
798
+ ≥4,8000,docvqa_val_anls,0.6925495561792242,0.005900020554879468
799
+ ≥4,8000,infovqa_val_anls,0.2810222429209379,0.007207972787105912
800
+ ≥4,8000,mme_total_score,848.2908163265306,
801
+ ≥4,8000,mmmu_val_mmmu_acc,0.31333,
802
+ ≥4,8000,mmstar_average,0.350192703081569,
803
+ ≥4,8000,ocrbench_ocrbench_accuracy,0.529,
804
+ ≥4,8000,seedbench_seed_all,0.5595330739299611,
805
+ ≥4,8000,textvqa_val_exact_match,0.56236,0.006739267736625781
806
+ ≥4,9000,ai2d_exact_match,0.48737046632124353,0.0089962828388782
807
+ ≥4,9000,average,0.49234565866208857,
808
+ ≥4,9000,average_rank,4.1,
809
+ ≥4,9000,chartqa_relaxed_overall,0.6608,0.009470650520873179
810
+ ≥4,9000,docvqa_val_anls,0.6999407172900073,0.0058399608509493465
811
+ ≥4,9000,infovqa_val_anls,0.28057597856713984,0.0072085582760555555
812
+ ≥4,9000,mme_total_score,971.6003401360543,
813
+ ≥4,9000,mmmu_val_mmmu_acc,0.28111,
814
+ ≥4,9000,mmstar_average,0.3444424817337138,
815
+ ≥4,9000,ocrbench_ocrbench_accuracy,0.545,
816
+ ≥4,9000,seedbench_seed_all,0.5603112840466926,
817
+ ≥4,9000,textvqa_val_exact_match,0.5715600000000001,0.006710949310502175
818
+ ≥4,10000,ai2d_exact_match,0.5009715025906736,0.008999137132137068
819
+ ≥4,10000,average,0.5030450627246211,
820
+ ≥4,10000,average_rank,3.3,
821
+ ≥4,10000,chartqa_relaxed_overall,0.666,0.009434680984649817
822
+ ≥4,10000,docvqa_val_anls,0.7128440324276674,0.005793211438464534
823
+ ≥4,10000,infovqa_val_anls,0.28379375750066616,0.007201019014370097
824
+ ≥4,10000,mme_total_score,823.7772108843537,
825
+ ≥4,10000,mmmu_val_mmmu_acc,0.30444,
826
+ ≥4,10000,mmstar_average,0.35617504910097153,
827
+ ≥4,10000,ocrbench_ocrbench_accuracy,0.563,
828
+ ≥4,10000,seedbench_seed_all,0.562201222901612,
829
+ ≥4,10000,textvqa_val_exact_match,0.57798,0.00669581889824864
830
+ ≥4,11000,ai2d_exact_match,0.4899611398963731,0.008997340090107678
831
+ ≥4,11000,average,0.5043945508574572,
832
+ ≥4,11000,average_rank,3.7,
833
+ ≥4,11000,chartqa_relaxed_overall,0.6684,0.009417645821601513
834
+ ≥4,11000,docvqa_val_anls,0.718360308980877,0.00573640855634517
835
+ ≥4,11000,infovqa_val_anls,0.3061911172660032,0.007586892248142986
836
+ ≥4,11000,mme_total_score,913.9846938775511,
837
+ ≥4,11000,mmmu_val_mmmu_acc,0.30444,
838
+ ≥4,11000,mmstar_average,0.3441847617795319,
839
+ ≥4,11000,ocrbench_ocrbench_accuracy,0.572,
840
+ ≥4,11000,seedbench_seed_all,0.5605336297943302,
841
+ ≥4,11000,textvqa_val_exact_match,0.5754799999999999,0.006700024775058468
842
+ ≥4,12000,ai2d_exact_match,0.48737046632124353,0.0089962828388782
843
+ ≥4,12000,average,0.5040020270755444,
844
+ ≥4,12000,average_rank,3.7,
845
+ ≥4,12000,chartqa_relaxed_overall,0.6708,0.009400334595970852
846
+ ≥4,12000,docvqa_val_anls,0.7119962267424205,0.0057890771916119035
847
+ ≥4,12000,infovqa_val_anls,0.29271378410211696,0.007308133874246768
848
+ ≥4,12000,mme_total_score,857.7363945578231,
849
+ ≥4,12000,mmmu_val_mmmu_acc,0.31333,
850
+ ≥4,12000,mmstar_average,0.3366375608443022,
851
+ ≥4,12000,ocrbench_ocrbench_accuracy,0.578,
852
+ ≥4,12000,seedbench_seed_all,0.5663702056698166,
853
+ ≥4,12000,textvqa_val_exact_match,0.5788,0.006686093984573812
854
+ ≥4,13000,ai2d_exact_match,0.5029145077720207,0.008999001233939135
855
+ ≥4,13000,average,0.5025324527027837,
856
+ ≥4,13000,average_rank,3.8,
857
+ ≥4,13000,chartqa_relaxed_overall,0.6736,0.009379787213112317
858
+ ≥4,13000,docvqa_val_anls,0.7115068932890629,0.0057865061972425
859
+ ≥4,13000,infovqa_val_anls,0.28657766964072817,0.007222563686487699
860
+ ≥4,13000,mme_total_score,912.2363945578231,
861
+ ≥4,13000,mmmu_val_mmmu_acc,0.30222,
862
+ ≥4,13000,mmstar_average,0.35323329267271303,
863
+ ≥4,13000,ocrbench_ocrbench_accuracy,0.548,
864
+ ≥4,13000,seedbench_seed_all,0.5634797109505281,
865
+ ≥4,13000,textvqa_val_exact_match,0.58126,0.006685319826323647
866
+ ≥4,14000,ai2d_exact_match,0.5029145077720207,0.008999001233939133
867
+ ≥4,14000,average,0.5048464815785578,
868
+ ≥4,14000,average_rank,3.7,
869
+ ≥4,14000,chartqa_relaxed_overall,0.6836,0.009303280948921504
870
+ ≥4,14000,docvqa_val_anls,0.7158797575412708,0.005776895411277372
871
+ ≥4,14000,infovqa_val_anls,0.2977244895971059,0.007409958547797003
872
+ ≥4,14000,mme_total_score,863.687074829932,
873
+ ≥4,14000,mmmu_val_mmmu_acc,0.30111,
874
+ ≥4,14000,mmstar_average,0.33994823410485003,
875
+ ≥4,14000,ocrbench_ocrbench_accuracy,0.562,
876
+ ≥4,14000,seedbench_seed_all,0.5584213451917732,
877
+ ≥4,14000,textvqa_val_exact_match,0.58202,0.0066807687023343965
878
+ ≥4,15000,ai2d_exact_match,0.5123056994818653,0.008996428218289531
879
+ ≥4,15000,average,0.5109092566320045,
880
+ ≥4,15000,average_rank,3.5,
881
+ ≥4,15000,chartqa_relaxed_overall,0.6712,0.009397422445513864
882
+ ≥4,15000,docvqa_val_anls,0.7188356324043049,0.005760252125758746
883
+ ≥4,15000,infovqa_val_anls,0.31301984081498224,0.007566633771439808
884
+ ≥4,15000,mme_total_score,910.8588435374149,
885
+ ≥4,15000,mmmu_val_mmmu_acc,0.31333,
886
+ ≥4,15000,mmstar_average,0.3405171175316349,
887
+ ≥4,15000,ocrbench_ocrbench_accuracy,0.57,
888
+ ≥4,15000,seedbench_seed_all,0.5630350194552529,
889
+ ≥4,15000,textvqa_val_exact_match,0.59594,0.006638698497713893
890
+ ≥4,16000,ai2d_exact_match,0.5035621761658031,0.008998925734053562
891
+ ≥4,16000,average,0.5129618260818141,
892
+ ≥4,16000,average_rank,3.5,
893
+ ≥4,16000,chartqa_relaxed_overall,0.6816,0.00931897598051042
894
+ ≥4,16000,docvqa_val_anls,0.721094250093947,0.005753477697941139
895
+ ≥4,16000,infovqa_val_anls,0.3169075222245947,0.007639057821423446
896
+ ≥4,16000,mme_total_score,900.3299319727892,
897
+ ≥4,16000,mmmu_val_mmmu_acc,0.30222,
898
+ ≥4,16000,mmstar_average,0.34176931782507797,
899
+ ≥4,16000,ocrbench_ocrbench_accuracy,0.588,
900
+ ≥4,16000,seedbench_seed_all,0.5657031684269038,
901
+ ≥4,16000,textvqa_val_exact_match,0.5958,0.006635034041762488
902
+ ≥4,17000,ai2d_exact_match,0.49190414507772023,0.008997974381217109
903
+ ≥4,17000,average,0.5093728538878098,
904
+ ≥4,17000,average_rank,3.7,
905
+ ≥4,17000,chartqa_relaxed_overall,0.68,0.009331389496316869
906
+ ≥4,17000,docvqa_val_anls,0.7210491309814,0.005753367994292813
907
+ ≥4,17000,infovqa_val_anls,0.3201561983029552,0.007662267005009952
908
+ ≥4,17000,mme_total_score,877.8401360544218,
909
+ ≥4,17000,mmmu_val_mmmu_acc,0.30778,
910
+ ≥4,17000,mmstar_average,0.3348906353085911,
911
+ ≥4,17000,ocrbench_ocrbench_accuracy,0.573,
912
+ ≥4,17000,seedbench_seed_all,0.564035575319622,
913
+ ≥4,17000,textvqa_val_exact_match,0.59154,0.006655985735352941
914
+ ≥4,18000,ai2d_exact_match,0.49028497409326427,0.008997455247470554
915
+ ≥4,18000,average,0.5099416307525485,
916
+ ≥4,18000,average_rank,3.7,
917
+ ≥4,18000,chartqa_relaxed_overall,0.6788,0.00934061683451043
918
+ ≥4,18000,docvqa_val_anls,0.7282528158071215,0.00570007403218014
919
+ ≥4,18000,infovqa_val_anls,0.3087200968720397,0.007513490264469946
920
+ ≥4,18000,mme_total_score,929.9506802721088,
921
+ ≥4,18000,mmmu_val_mmmu_acc,0.30111,
922
+ ≥4,18000,mmstar_average,0.342890936748704,
923
+ ≥4,18000,ocrbench_ocrbench_accuracy,0.585,
924
+ ≥4,18000,seedbench_seed_all,0.5645358532518066,
925
+ ≥4,18000,textvqa_val_exact_match,0.5898800000000001,0.006662761859703513
926
+ ≥4,19000,ai2d_exact_match,0.49417098445595853,0.008998542562369278
927
+ ≥4,19000,average,0.5124245876024062,
928
+ ≥4,19000,average_rank,3.7,
929
+ ≥4,19000,chartqa_relaxed_overall,0.6752,0.00936787525721462
930
+ ≥4,19000,docvqa_val_anls,0.7321651331177514,0.005674418458926489
931
+ ≥4,19000,infovqa_val_anls,0.3128382816486564,0.007545062449451713
932
+ ≥4,19000,mme_total_score,920.5612244897959,
933
+ ≥4,19000,mmmu_val_mmmu_acc,0.31444,
934
+ ≥4,19000,mmstar_average,0.3381960053749425,
935
+ ≥4,19000,ocrbench_ocrbench_accuracy,0.581,
936
+ ≥4,19000,seedbench_seed_all,0.5635908838243469,
937
+ ≥4,19000,textvqa_val_exact_match,0.60022,0.0066236821295251325
938
+ ≥4,20000,ai2d_exact_match,0.4993523316062176,0.008999146569435543
939
+ ≥4,20000,average,0.5097536365259775,
940
+ ≥4,20000,average_rank,2.9,
941
+ ≥4,20000,chartqa_relaxed_overall,0.6788,0.00934061683451043
942
+ ≥4,20000,docvqa_val_anls,0.7257805691640822,0.005714530309266441
943
+ ≥4,20000,infovqa_val_anls,0.3115295213783156,0.007581035362425172
944
+ ≥4,20000,mme_total_score,936.8911564625851,
945
+ ≥4,20000,mmmu_val_mmmu_acc,0.29333,
946
+ ≥4,20000,mmstar_average,0.342179144828651,
947
+ ≥4,20000,ocrbench_ocrbench_accuracy,0.572,
948
+ ≥4,20000,seedbench_seed_all,0.5640911617565314,
949
+ ≥4,20000,textvqa_val_exact_match,0.6007199999999999,0.00662859592800733
950
+ ≥5,1000,ai2d_exact_match,0.46275906735751293,0.008974157783087492
951
+ ≥5,1000,average,0.46601067306382465,
952
+ ≥5,1000,average_rank,4.3,
953
+ ≥5,1000,chartqa_relaxed_overall,0.586,0.009852940280589808
954
+ ≥5,1000,docvqa_val_anls,0.6587979311295683,0.006033428065938081
955
+ ≥5,1000,infovqa_val_anls,0.26573226652787757,0.007027770857338852
956
+ ≥5,1000,mme_total_score,1141.8704481792718,
957
+ ≥5,1000,mmmu_val_mmmu_acc,0.29111,
958
+ ≥5,1000,mmstar_average,0.3326633517590184,
959
+ ≥5,1000,ocrbench_ocrbench_accuracy,0.512,
960
+ ≥5,1000,seedbench_seed_all,0.5481934408004447,
961
+ ≥5,1000,textvqa_val_exact_match,0.53684,0.0067933638823904985
962
+ ≥5,2000,ai2d_exact_match,0.46729274611398963,0.008979879527453428
963
+ ≥5,2000,average,0.46843615619784085,
964
+ ≥5,2000,average_rank,4.5,
965
+ ≥5,2000,chartqa_relaxed_overall,0.6232,0.009693621125059844
966
+ ≥5,2000,docvqa_val_anls,0.6576303662245503,0.0060380542666198
967
+ ≥5,2000,infovqa_val_anls,0.2544768002153279,0.006980921578600097
968
+ ≥5,2000,mme_total_score,1121.454081632653,
969
+ ≥5,2000,mmmu_val_mmmu_acc,0.28333,
970
+ ≥5,2000,mmstar_average,0.3471972386408189,
971
+ ≥5,2000,ocrbench_ocrbench_accuracy,0.517,
972
+ ≥5,2000,seedbench_seed_all,0.544858254585881,
973
+ ≥5,2000,textvqa_val_exact_match,0.52094,0.006790900275023118
974
+ ≥5,3000,ai2d_exact_match,0.405440414507772,0.00883675667187808
975
+ ≥5,3000,average,0.46118026257153605,
976
+ ≥5,3000,average_rank,4.2,
977
+ ≥5,3000,chartqa_relaxed_overall,0.6156,0.009731008838409575
978
+ ≥5,3000,docvqa_val_anls,0.6431483265654762,0.0060571462869005105
979
+ ≥5,3000,infovqa_val_anls,0.25688718356638174,0.007171420821325129
980
+ ≥5,3000,mme_total_score,1082.7074829931973,
981
+ ≥5,3000,mmmu_val_mmmu_acc,0.29778,
982
+ ≥5,3000,mmstar_average,0.3516979671312099,
983
+ ≥5,3000,ocrbench_ocrbench_accuracy,0.521,
984
+ ≥5,3000,seedbench_seed_all,0.547248471372985,
985
+ ≥5,3000,textvqa_val_exact_match,0.51182,0.006815757362882421
986
+ ≥5,4000,ai2d_exact_match,0.4491580310880829,0.008952509302111547
987
+ ≥5,4000,average,0.4646276743863569,
988
+ ≥5,4000,average_rank,4.7,
989
+ ≥5,4000,chartqa_relaxed_overall,0.626,0.009679208378267924
990
+ ≥5,4000,docvqa_val_anls,0.6457409563970327,0.006096190550822001
991
+ ≥5,4000,infovqa_val_anls,0.2657314142312884,0.007188343779485259
992
+ ≥5,4000,mme_total_score,1068.4183673469388,
993
+ ≥5,4000,mmmu_val_mmmu_acc,0.30333,
994
+ ≥5,4000,mmstar_average,0.33033606075691685,
995
+ ≥5,4000,ocrbench_ocrbench_accuracy,0.501,
996
+ ≥5,4000,seedbench_seed_all,0.546692607003891,
997
+ ≥5,4000,textvqa_val_exact_match,0.5136599999999999,0.006800789000270868
998
+ ≥5,5000,ai2d_exact_match,0.4630829015544041,0.008974591204222938
999
+ ≥5,5000,average,0.47175640838182836,
1000
+ ≥5,5000,average_rank,4.3,
1001
+ ≥5,5000,chartqa_relaxed_overall,0.6016,0.009793331391099473
1002
+ ≥5,5000,docvqa_val_anls,0.6583943642704193,0.006028435004183156
1003
+ ≥5,5000,infovqa_val_anls,0.28445442343834715,0.007395809557151278
1004
+ ≥5,5000,mme_total_score,1063.6232492997199,
1005
+ ≥5,5000,mmmu_val_mmmu_acc,0.30111,
1006
+ ≥5,5000,mmstar_average,0.34898828745177285,
1007
+ ≥5,5000,ocrbench_ocrbench_accuracy,0.524,
1008
+ ≥5,5000,seedbench_seed_all,0.5438576987215119,
1009
+ ≥5,5000,textvqa_val_exact_match,0.52032,0.006801255099919928
1010
+ ≥5,6000,ai2d_exact_match,0.41936528497409326,0.008881358943343104
1011
+ ≥5,6000,average,0.4607787848869989,
1012
+ ≥5,6000,average_rank,4.3,
1013
+ ≥5,6000,chartqa_relaxed_overall,0.5688,0.009906860368095493
1014
+ ≥5,6000,docvqa_val_anls,0.6530464526768891,0.006064278476677726
1015
+ ≥5,6000,infovqa_val_anls,0.2838075612518576,0.007411753553258339
1016
+ ≥5,6000,mme_total_score,1102.3075230092036,
1017
+ ≥5,6000,mmmu_val_mmmu_acc,0.30222,
1018
+ ≥5,6000,mmstar_average,0.3384709546298998,
1019
+ ≥5,6000,ocrbench_ocrbench_accuracy,0.514,
1020
+ ≥5,6000,seedbench_seed_all,0.5458588104502501,
1021
+ ≥5,6000,textvqa_val_exact_match,0.52144,0.006795447071398616
1022
+ ≥5,7000,ai2d_exact_match,0.44689119170984454,0.008948245073044946
1023
+ ≥5,7000,average,0.46361553646961884,
1024
+ ≥5,7000,average_rank,4.8,
1025
+ ≥5,7000,chartqa_relaxed_overall,0.596,0.009815912634917984
1026
+ ≥5,7000,docvqa_val_anls,0.6473018376832792,0.00607167873881633
1027
+ ≥5,7000,infovqa_val_anls,0.2701993608610082,0.007205660851186524
1028
+ ≥5,7000,mme_total_score,1018.4163665466186,
1029
+ ≥5,7000,mmmu_val_mmmu_acc,0.30556,
1030
+ ≥5,7000,mmstar_average,0.33545236293074815,
1031
+ ≥5,7000,ocrbench_ocrbench_accuracy,0.512,
1032
+ ≥5,7000,seedbench_seed_all,0.5431350750416898,
1033
+ ≥5,7000,textvqa_val_exact_match,0.516,0.006822412261202951
1034
+ ≥5,8000,ai2d_exact_match,0.4488341968911917,0.008951911635408226
1035
+ ≥5,8000,average,0.4668616879585074,
1036
+ ≥5,8000,average_rank,4.6,
1037
+ ≥5,8000,chartqa_relaxed_overall,0.6072,0.00976941352263433
1038
+ ≥5,8000,docvqa_val_anls,0.6519934800658986,0.006080604378776126
1039
+ ≥5,8000,infovqa_val_anls,0.2785842294592336,0.0074156607313128845
1040
+ ≥5,8000,mme_total_score,1057.0833333333333,
1041
+ ≥5,8000,mmmu_val_mmmu_acc,0.30333,
1042
+ ≥5,8000,mmstar_average,0.3378731184509322,
1043
+ ≥5,8000,ocrbench_ocrbench_accuracy,0.518,
1044
+ ≥5,8000,seedbench_seed_all,0.5403001667593107,
1045
+ ≥5,8000,textvqa_val_exact_match,0.51564,0.006799847473666819
1046
+ ≥5,9000,ai2d_exact_match,0.39345854922279794,0.008792480650628204
1047
+ ≥5,9000,average,0.459912532971243,
1048
+ ≥5,9000,average_rank,4.4,
1049
+ ≥5,9000,chartqa_relaxed_overall,0.6084,0.00976411343463736
1050
+ ≥5,9000,docvqa_val_anls,0.6541887939771373,0.006063084097609983
1051
+ ≥5,9000,infovqa_val_anls,0.27276949319611876,0.007319733478874126
1052
+ ≥5,9000,mme_total_score,1123.0184073629453,
1053
+ ≥5,9000,mmmu_val_mmmu_acc,0.32333,
1054
+ ≥5,9000,mmstar_average,0.3247860715180074,
1055
+ ≥5,9000,ocrbench_ocrbench_accuracy,0.509,
1056
+ ≥5,9000,seedbench_seed_all,0.5397998888271262,
1057
+ ≥5,9000,textvqa_val_exact_match,0.51348,0.006813467735926963
1058
+ ≥5,10000,ai2d_exact_match,0.4326424870466321,0.008917121282993509
1059
+ ≥5,10000,average,0.46134428795967075,
1060
+ ≥5,10000,average_rank,4.9,
1061
+ ≥5,10000,chartqa_relaxed_overall,0.6072,0.00976941352263433
1062
+ ≥5,10000,docvqa_val_anls,0.651687815510166,0.006071913532526164
1063
+ ≥5,10000,infovqa_val_anls,0.27997237091892013,0.007395864137910542
1064
+ ≥5,10000,mme_total_score,1022.3228291316527,
1065
+ ≥5,10000,mmmu_val_mmmu_acc,0.28889,
1066
+ ≥5,10000,mmstar_average,0.329809486810568,
1067
+ ≥5,10000,ocrbench_ocrbench_accuracy,0.511,
1068
+ ≥5,10000,seedbench_seed_all,0.5375764313507504,
1069
+ ≥5,10000,textvqa_val_exact_match,0.51332,0.006823388252580171
1070
+ ≥5,11000,ai2d_exact_match,0.4268134715025907,0.008902228386480452
1071
+ ≥5,11000,average,0.46104097512732234,
1072
+ ≥5,11000,average_rank,4.5,
1073
+ ≥5,11000,chartqa_relaxed_overall,0.6012,0.0097949885513097
1074
+ ≥5,11000,docvqa_val_anls,0.644837445168982,0.006085623472495874
1075
+ ≥5,11000,infovqa_val_anls,0.2640855729780956,0.007198720155523597
1076
+ ≥5,11000,mme_total_score,1039.019507803121,
1077
+ ≥5,11000,mmmu_val_mmmu_acc,0.30889,
1078
+ ≥5,11000,mmstar_average,0.3483796516991238,
1079
+ ≥5,11000,ocrbench_ocrbench_accuracy,0.509,
1080
+ ≥5,11000,seedbench_seed_all,0.5367426347971095,
1081
+ ≥5,11000,textvqa_val_exact_match,0.50942,0.00682319308775463
1082
+ ≥5,12000,ai2d_exact_match,0.3944300518134715,0.008796275864065532
1083
+ ≥5,12000,average,0.4536626915651331,
1084
+ ≥5,12000,average_rank,4.5,
1085
+ ≥5,12000,chartqa_relaxed_overall,0.5772,0.009882060820012199
1086
+ ≥5,12000,docvqa_val_anls,0.6559090016447592,0.00604562177320508
1087
+ ≥5,12000,infovqa_val_anls,0.2673016323091914,0.0072722156919221015
1088
+ ≥5,12000,mme_total_score,1023.3735494197679,
1089
+ ≥5,12000,mmmu_val_mmmu_acc,0.31778,
1090
+ ≥5,12000,mmstar_average,0.3313831269791424,
1091
+ ≥5,12000,ocrbench_ocrbench_accuracy,0.505,
1092
+ ≥5,12000,seedbench_seed_all,0.5327404113396331,
1093
+ ≥5,12000,textvqa_val_exact_match,0.50122,0.006832030272732221
1094
+ ≥5,13000,ai2d_exact_match,0.40867875647668395,0.00884778289870742
1095
+ ≥5,13000,average,0.45776989353761316,
1096
+ ≥5,13000,average_rank,4.6,
1097
+ ≥5,13000,chartqa_relaxed_overall,0.5908,0.009835692163550793
1098
+ ≥5,13000,docvqa_val_anls,0.6503688245286325,0.0060676446684505315
1099
+ ≥5,13000,infovqa_val_anls,0.2636657235502622,0.007162177374827191
1100
+ ≥5,13000,mme_total_score,1002.4256702681073,
1101
+ ≥5,13000,mmmu_val_mmmu_acc,0.31556,
1102
+ ≥5,13000,mmstar_average,0.3443129801956691,
1103
+ ≥5,13000,ocrbench_ocrbench_accuracy,0.512,
1104
+ ≥5,13000,seedbench_seed_all,0.5329627570872707,
1105
+ ≥5,13000,textvqa_val_exact_match,0.50158,0.006823401826251807
1106
+ ≥5,14000,ai2d_exact_match,0.41483160621761656,0.008867639612484149
1107
+ ≥5,14000,average,0.45268439675941724,
1108
+ ≥5,14000,average_rank,4.8,
1109
+ ≥5,14000,chartqa_relaxed_overall,0.588,0.009845871036662436
1110
+ ≥5,14000,docvqa_val_anls,0.6498549577427524,0.006073551423686328
1111
+ ≥5,14000,infovqa_val_anls,0.2661623211050356,0.00731739457874179
1112
+ ≥5,14000,mme_total_score,1034.9393757503,
1113
+ ≥5,14000,mmmu_val_mmmu_acc,0.3,
1114
+ ≥5,14000,mmstar_average,0.3342054606442816,
1115
+ ≥5,14000,ocrbench_ocrbench_accuracy,0.494,
1116
+ ≥5,14000,seedbench_seed_all,0.5294052251250695,
1117
+ ≥5,14000,textvqa_val_exact_match,0.4977,0.006830920457365827
1118
+ ≥5,15000,ai2d_exact_match,0.42001295336787564,0.008883255931688048
1119
+ ≥5,15000,average,0.4570450291018434,
1120
+ ≥5,15000,average_rank,4.9,
1121
+ ≥5,15000,chartqa_relaxed_overall,0.59,0.009838634025503496
1122
+ ≥5,15000,docvqa_val_anls,0.6475057079650752,0.006081599544786637
1123
+ ≥5,15000,infovqa_val_anls,0.26732840510253686,0.007267222145742162
1124
+ ≥5,15000,mme_total_score,1033.4811924769908,
1125
+ ≥5,15000,mmmu_val_mmmu_acc,0.30667,
1126
+ ≥5,15000,mmstar_average,0.33017343172346,
1127
+ ≥5,15000,ocrbench_ocrbench_accuracy,0.516,
1128
+ ≥5,15000,seedbench_seed_all,0.5345747637576431,
1129
+ ≥5,15000,textvqa_val_exact_match,0.5011399999999999,0.006833438318727342
1130
+ ≥5,16000,ai2d_exact_match,0.41353626943005184,0.008863577928878446
1131
+ ≥5,16000,average,0.45298319741394405,
1132
+ ≥5,16000,average_rank,4.9,
1133
+ ≥5,16000,chartqa_relaxed_overall,0.5928,0.00982821965366181
1134
+ ≥5,16000,docvqa_val_anls,0.6444827949953511,0.006083163064354419
1135
+ ≥5,16000,infovqa_val_anls,0.27176255535031313,0.0073933278275316846
1136
+ ≥5,16000,mme_total_score,1004.6780712284915,
1137
+ ≥5,16000,mmmu_val_mmmu_acc,0.30111,
1138
+ ≥5,16000,mmstar_average,0.32864449435945237,
1139
+ ≥5,16000,ocrbench_ocrbench_accuracy,0.502,
1140
+ ≥5,16000,seedbench_seed_all,0.526792662590328,
1141
+ ≥5,16000,textvqa_val_exact_match,0.49572000000000005,0.006835033273947625
1142
+ ≥5,17000,ai2d_exact_match,0.4112694300518135,0.008856317823411107
1143
+ ≥5,17000,average,0.4529560838437233,
1144
+ ≥5,17000,average_rank,4.6,
1145
+ ≥5,17000,chartqa_relaxed_overall,0.5876,0.009847298295140926
1146
+ ≥5,17000,docvqa_val_anls,0.6389774022522821,0.00612389508858012
1147
+ ≥5,17000,infovqa_val_anls,0.2806511079053113,0.007508796510654168
1148
+ ≥5,17000,mme_total_score,995.327831132453,
1149
+ ≥5,17000,mmmu_val_mmmu_acc,0.31333,
1150
+ ≥5,17000,mmstar_average,0.33458844306670454,
1151
+ ≥5,17000,ocrbench_ocrbench_accuracy,0.496,
1152
+ ≥5,17000,seedbench_seed_all,0.5230683713173986,
1153
+ ≥5,17000,textvqa_val_exact_match,0.49112,0.006832742230852753
1154
+ ≥5,18000,ai2d_exact_match,0.41936528497409326,0.008881358943343104
1155
+ ≥5,18000,average,0.4506952703715631,
1156
+ ≥5,18000,average_rank,4.6,
1157
+ ≥5,18000,chartqa_relaxed_overall,0.5752,0.009888230116554488
1158
+ ≥5,18000,docvqa_val_anls,0.6393690836463973,0.006115883377355433
1159
+ ≥5,18000,infovqa_val_anls,0.26588334023973736,0.007337010225614936
1160
+ ≥5,18000,mme_total_score,989.2097839135654,
1161
+ ≥5,18000,mmmu_val_mmmu_acc,0.31333,
1162
+ ≥5,18000,mmstar_average,0.32322974671841465,
1163
+ ≥5,18000,ocrbench_ocrbench_accuracy,0.498,
1164
+ ≥5,18000,seedbench_seed_all,0.5279599777654252,
1165
+ ≥5,18000,textvqa_val_exact_match,0.49391999999999997,0.006830895911063903
1166
+ ≥5,19000,ai2d_exact_match,0.3954015544041451,0.008800034697838395
1167
+ ≥5,19000,average,0.44423744725800945,
1168
+ ≥5,19000,average_rank,4.9,
1169
+ ≥5,19000,chartqa_relaxed_overall,0.5744,0.009890651444389179
1170
+ ≥5,19000,docvqa_val_anls,0.6275859067200067,0.006146304949422434
1171
+ ≥5,19000,infovqa_val_anls,0.27001013621966435,0.00741081345045112
1172
+ ≥5,19000,mme_total_score,1012.671468587435,
1173
+ ≥5,19000,mmmu_val_mmmu_acc,0.30222,
1174
+ ≥5,19000,mmstar_average,0.33172139573813564,
1175
+ ≥5,19000,ocrbench_ocrbench_accuracy,0.489,
1176
+ ≥5,19000,seedbench_seed_all,0.5244580322401334,
1177
+ ≥5,19000,textvqa_val_exact_match,0.48334000000000005,0.006839754771120511
1178
+ ≥5,20000,ai2d_exact_match,0.3950777202072539,0.00879878579254534
1179
+ ≥5,20000,average,0.44700580037620813,
1180
+ ≥5,20000,average_rank,3.8,
1181
+ ≥5,20000,chartqa_relaxed_overall,0.5824,0.009865243291986469
1182
+ ≥5,20000,docvqa_val_anls,0.635044358086249,0.006123826440768213
1183
+ ≥5,20000,infovqa_val_anls,0.2648967410637257,0.0073547743128100345
1184
+ ≥5,20000,mme_total_score,1015.2638055222089,
1185
+ ≥5,20000,mmmu_val_mmmu_acc,0.31111,
1186
+ ≥5,20000,mmstar_average,0.33540590765288064,
1187
+ ≥5,20000,ocrbench_ocrbench_accuracy,0.485,
1188
+ ≥5,20000,seedbench_seed_all,0.5234574763757643,
1189
+ ≥5,20000,textvqa_val_exact_match,0.49065999999999993,0.0068247980522276805
app/src/content/assets/data/ss_vs_s1.csv ADDED
@@ -0,0 +1,481 @@
1
+ run,step,metric,value,stderr
2
+ Single Stage,1000,ai2d_exact_match,0.2548575129533679,0.007843322436924496
3
+ Single Stage,1000,average,0.27120689295763617,
4
+ Single Stage,1000,average_rank,2.0,
5
+ Single Stage,1000,chartqa_relaxed_overall,0.3308,0.009411906161401973
6
+ Single Stage,1000,docvqa_val_anls,0.3528553494243383,0.005852289239342309
7
+ Single Stage,1000,infovqa_val_anls,0.17320578642581314,0.006297063452679795
8
+ Single Stage,1000,mme_total_score,977.4280712284914,
9
+ Single Stage,1000,mmmu_val_mmmu_acc,0.25222,
10
+ Single Stage,1000,mmstar_average,0.23215874078908072,
11
+ Single Stage,1000,ocrbench_ocrbench_accuracy,0.286,
12
+ Single Stage,1000,seedbench_seed_all,0.2563646470261256,
13
+ Single Stage,1000,textvqa_val_exact_match,0.3024,0.00628900296642181
14
+ Single Stage,2000,ai2d_exact_match,0.26295336787564766,0.007923526907377255
15
+ Single Stage,2000,average,0.3202068275596269,
16
+ Single Stage,2000,average_rank,1.8,
17
+ Single Stage,2000,chartqa_relaxed_overall,0.4688,0.009982508912777261
18
+ Single Stage,2000,docvqa_val_anls,0.4452261510942785,0.00614755494712251
19
+ Single Stage,2000,infovqa_val_anls,0.1820547866557169,0.006217861455795791
20
+ Single Stage,2000,mme_total_score,1049.3036214485794,
21
+ Single Stage,2000,mmmu_val_mmmu_acc,0.24556,
22
+ Single Stage,2000,mmstar_average,0.21305462434540698,
23
+ Single Stage,2000,ocrbench_ocrbench_accuracy,0.395,
24
+ Single Stage,2000,seedbench_seed_all,0.258532518065592,
25
+ Single Stage,2000,textvqa_val_exact_match,0.41068000000000005,0.006697862330024289
26
+ Single Stage,3000,ai2d_exact_match,0.25226683937823835,0.007816909588794397
27
+ Single Stage,3000,average,0.3507423834414229,
28
+ Single Stage,3000,average_rank,1.7,
29
+ Single Stage,3000,chartqa_relaxed_overall,0.5028,0.010001843767601082
30
+ Single Stage,3000,docvqa_val_anls,0.502653993831009,0.006267072346683124
31
+ Single Stage,3000,infovqa_val_anls,0.21728617578189535,0.006796941784959762
32
+ Single Stage,3000,mme_total_score,1170.2383953581434,
33
+ Single Stage,3000,mmmu_val_mmmu_acc,0.27556,
34
+ Single Stage,3000,mmstar_average,0.25432376938577683,
35
+ Single Stage,3000,ocrbench_ocrbench_accuracy,0.436,
36
+ Single Stage,3000,seedbench_seed_all,0.2792106725958866,
37
+ Single Stage,3000,textvqa_val_exact_match,0.43658,0.006766885462882726
38
+ Single Stage,4000,ai2d_exact_match,0.2645725388601036,0.007939149662089447
39
+ Single Stage,4000,average,0.36961781722974835,
40
+ Single Stage,4000,average_rank,1.8,
41
+ Single Stage,4000,chartqa_relaxed_overall,0.5312,0.009982508912777261
42
+ Single Stage,4000,docvqa_val_anls,0.5374434618615119,0.0062905728113059655
43
+ Single Stage,4000,infovqa_val_anls,0.2287924838861707,0.006994568698639919
44
+ Single Stage,4000,mme_total_score,1155.203781512605,
45
+ Single Stage,4000,mmmu_val_mmmu_acc,0.25556,
46
+ Single Stage,4000,mmstar_average,0.2575590188757354,
47
+ Single Stage,4000,ocrbench_ocrbench_accuracy,0.453,
48
+ Single Stage,4000,seedbench_seed_all,0.33913285158421347,
49
+ Single Stage,4000,textvqa_val_exact_match,0.4593,0.006791695475025738
50
+ Single Stage,5000,ai2d_exact_match,0.3125,0.008342439145556371
51
+ Single Stage,5000,average,0.3974627910380972,
52
+ Single Stage,5000,average_rank,1.8,
53
+ Single Stage,5000,chartqa_relaxed_overall,0.5488,0.00995424828018316
54
+ Single Stage,5000,docvqa_val_anls,0.552360266782429,0.006300308519952055
55
+ Single Stage,5000,infovqa_val_anls,0.23425555286643698,0.007002254622066442
56
+ Single Stage,5000,mme_total_score,1181.4653861544618,
57
+ Single Stage,5000,mmmu_val_mmmu_acc,0.26667,
58
+ Single Stage,5000,mmstar_average,0.29596648146165705,
59
+ Single Stage,5000,ocrbench_ocrbench_accuracy,0.462,
60
+ Single Stage,5000,seedbench_seed_all,0.43107281823235133,
61
+ Single Stage,5000,textvqa_val_exact_match,0.47354000000000007,0.0068172185364497985
62
+ Single Stage,6000,ai2d_exact_match,0.358160621761658,0.008629463221867162
63
+ Single Stage,6000,average,0.4161227404571003,
64
+ Single Stage,6000,average_rank,1.6,
65
+ Single Stage,6000,chartqa_relaxed_overall,0.5628,0.00992279440175477
66
+ Single Stage,6000,docvqa_val_anls,0.5747451497228876,0.00625495440870239
67
+ Single Stage,6000,infovqa_val_anls,0.22152017368968838,0.006604546680525351
68
+ Single Stage,6000,mme_total_score,1284.1648659463785,
69
+ Single Stage,6000,mmmu_val_mmmu_acc,0.27111,
70
+ Single Stage,6000,mmstar_average,0.2978489412854164,
71
+ Single Stage,6000,ocrbench_ocrbench_accuracy,0.495,
72
+ Single Stage,6000,seedbench_seed_all,0.4795997776542524,
73
+ Single Stage,6000,textvqa_val_exact_match,0.48432,0.006800535050670284
74
+ Single Stage,7000,ai2d_exact_match,0.3707901554404145,0.00869347755587734
75
+ Single Stage,7000,average,0.4291083177345374,
76
+ Single Stage,7000,average_rank,1.6,
77
+ Single Stage,7000,chartqa_relaxed_overall,0.5656,0.009915542506251351
78
+ Single Stage,7000,docvqa_val_anls,0.5940907049431567,0.006224236305767187
79
+ Single Stage,7000,infovqa_val_anls,0.2515675215816963,0.007105097396092786
80
+ Single Stage,7000,mme_total_score,1185.875650260104,
81
+ Single Stage,7000,mmmu_val_mmmu_acc,0.26556,
82
+ Single Stage,7000,mmstar_average,0.31372400960777047,
83
+ Single Stage,7000,ocrbench_ocrbench_accuracy,0.504,
84
+ Single Stage,7000,seedbench_seed_all,0.4964424680377988,
85
+ Single Stage,7000,textvqa_val_exact_match,0.5002,0.006794794025220267
86
+ Single Stage,8000,ai2d_exact_match,0.37759067357512954,0.008725299846043883
87
+ Single Stage,8000,average,0.43846759477995995,
88
+ Single Stage,8000,average_rank,1.5,
89
+ Single Stage,8000,chartqa_relaxed_overall,0.5832,0.009862556058385773
90
+ Single Stage,8000,docvqa_val_anls,0.6017336419437208,0.006231612198089698
91
+ Single Stage,8000,infovqa_val_anls,0.2449256624147254,0.006992518502948913
92
+ Single Stage,8000,mme_total_score,1199.2409963985594,
93
+ Single Stage,8000,mmmu_val_mmmu_acc,0.28111,
94
+ Single Stage,8000,mmstar_average,0.33512257186205047,
95
+ Single Stage,8000,ocrbench_ocrbench_accuracy,0.51,
96
+ Single Stage,8000,seedbench_seed_all,0.5024458032240133,
97
+ Single Stage,8000,textvqa_val_exact_match,0.51008,0.006796301690135059
98
+ Single Stage,9000,ai2d_exact_match,0.4067357512953368,0.008841214921078996
99
+ Single Stage,9000,average,0.4422510732201056,
100
+ Single Stage,9000,average_rank,1.6,
101
+ Single Stage,9000,chartqa_relaxed_overall,0.5912,0.009834211136815875
102
+ Single Stage,9000,docvqa_val_anls,0.6170968481662739,0.00617235763542544
103
+ Single Stage,9000,infovqa_val_anls,0.23537031288570615,0.00670318154156447
104
+ Single Stage,9000,mme_total_score,1231.5195078031213,
105
+ Single Stage,9000,mmmu_val_mmmu_acc,0.25889,
106
+ Single Stage,9000,mmstar_average,0.3216444898242951,
107
+ Single Stage,9000,ocrbench_ocrbench_accuracy,0.515,
108
+ Single Stage,9000,seedbench_seed_all,0.5120622568093385,
109
+ Single Stage,9000,textvqa_val_exact_match,0.52226,0.006792711289708482
110
+ Single Stage,10000,ai2d_exact_match,0.39993523316062174,0.008817096257082848
111
+ Single Stage,10000,average,0.4523875703250908,
112
+ Single Stage,10000,average_rank,1.3,
113
+ Single Stage,10000,chartqa_relaxed_overall,0.5996,0.00980154906867574
114
+ Single Stage,10000,docvqa_val_anls,0.6262613496433054,0.006147756371688175
115
+ Single Stage,10000,infovqa_val_anls,0.263290074230132,0.007186788766942786
116
+ Single Stage,10000,mme_total_score,1240.8218287314926,
117
+ Single Stage,10000,mmmu_val_mmmu_acc,0.28778,
118
+ Single Stage,10000,mmstar_average,0.32972717906018517,
119
+ Single Stage,10000,ocrbench_ocrbench_accuracy,0.517,
120
+ Single Stage,10000,seedbench_seed_all,0.5217342968315731,
121
+ Single Stage,10000,textvqa_val_exact_match,0.5261600000000001,0.006785774843600811
122
+ Single Stage,11000,ai2d_exact_match,0.422279792746114,0.008889771831066474
123
+ Single Stage,11000,average,0.4561398159525099,
124
+ Single Stage,11000,average_rank,1.2,
125
+ Single Stage,11000,chartqa_relaxed_overall,0.6104,0.009755142291143075
126
+ Single Stage,11000,docvqa_val_anls,0.6373130149166712,0.006128022584995044
127
+ Single Stage,11000,infovqa_val_anls,0.24419378339723755,0.006897644885887063
128
+ Single Stage,11000,mme_total_score,1322.9488795518205,
129
+ Single Stage,11000,mmmu_val_mmmu_acc,0.27778,
130
+ Single Stage,11000,mmstar_average,0.3298563439522548,
131
+ Single Stage,11000,ocrbench_ocrbench_accuracy,0.521,
132
+ Single Stage,11000,seedbench_seed_all,0.5237354085603113,
133
+ Single Stage,11000,textvqa_val_exact_match,0.5387,0.006770851562852138
134
+ Single Stage,12000,ai2d_exact_match,0.42001295336787564,0.008883255931688034
135
+ Single Stage,12000,average,0.4582751140055433,
136
+ Single Stage,12000,average_rank,1.4,
137
+ Single Stage,12000,chartqa_relaxed_overall,0.618,0.009719474639861454
138
+ Single Stage,12000,docvqa_val_anls,0.6393961983751871,0.0061228747388476674
139
+ Single Stage,12000,infovqa_val_anls,0.24798874058574302,0.006855374548993139
140
+ Single Stage,12000,mme_total_score,1225.6453581432572,
141
+ Single Stage,12000,mmmu_val_mmmu_acc,0.27889,
142
+ Single Stage,12000,mmstar_average,0.34010867846816534,
143
+ Single Stage,12000,ocrbench_ocrbench_accuracy,0.512,
144
+ Single Stage,12000,seedbench_seed_all,0.5350194552529183,
145
+ Single Stage,12000,textvqa_val_exact_match,0.5330600000000001,0.006777713092109446
146
+ Single Stage,13000,ai2d_exact_match,0.4375,0.008928571428571428
147
+ Single Stage,13000,average,0.4692868662590049,
148
+ Single Stage,13000,average_rank,1.2,
149
+ Single Stage,13000,chartqa_relaxed_overall,0.6148,0.00973479791861169
150
+ Single Stage,13000,docvqa_val_anls,0.6511374872549951,0.006086953065248391
151
+ Single Stage,13000,infovqa_val_anls,0.24465055100441893,0.006808432538374664
152
+ Single Stage,13000,mme_total_score,1281.7122849139657,
153
+ Single Stage,13000,mmmu_val_mmmu_acc,0.28222,
154
+ Single Stage,13000,mmstar_average,0.3453069542917521,
155
+ Single Stage,13000,ocrbench_ocrbench_accuracy,0.549,
156
+ Single Stage,13000,seedbench_seed_all,0.5442468037798777,
157
+ Single Stage,13000,textvqa_val_exact_match,0.55472,0.0067416788982325
158
+ Single Stage,14000,ai2d_exact_match,0.4572538860103627,0.00896620675297095
159
+ Single Stage,14000,average,0.47352486841689195,
160
+ Single Stage,14000,average_rank,1.4,
161
+ Single Stage,14000,chartqa_relaxed_overall,0.6172,0.009723347231923635
162
+ Single Stage,14000,docvqa_val_anls,0.6502269393708169,0.006057950730638126
163
+ Single Stage,14000,infovqa_val_anls,0.25805460837190913,0.007037735231659539
164
+ Single Stage,14000,mme_total_score,1309.1444577831132,
165
+ Single Stage,14000,mmmu_val_mmmu_acc,0.28111,
166
+ Single Stage,14000,mmstar_average,0.34575818188776586,
167
+ Single Stage,14000,ocrbench_ocrbench_accuracy,0.551,
168
+ Single Stage,14000,seedbench_seed_all,0.5483602001111729,
169
+ Single Stage,14000,textvqa_val_exact_match,0.55276,0.006751206724612103
170
+ Single Stage,15000,ai2d_exact_match,0.45045336787564766,0.008954861634252399
171
+ Single Stage,15000,average,0.47878665012878824,
172
+ Single Stage,15000,average_rank,1.2,
173
+ Single Stage,15000,chartqa_relaxed_overall,0.612,0.009747841205275417
174
+ Single Stage,15000,docvqa_val_anls,0.6621413031955148,0.006056838050222495
175
+ Single Stage,15000,infovqa_val_anls,0.2706898598157733,0.007200315730154543
176
+ Single Stage,15000,mme_total_score,1384.2171868747498,
177
+ Single Stage,15000,mmmu_val_mmmu_acc,0.30222,
178
+ Single Stage,15000,mmstar_average,0.35408135695920684,
179
+ Single Stage,15000,ocrbench_ocrbench_accuracy,0.558,
180
+ Single Stage,15000,seedbench_seed_all,0.5411339633129516,
181
+ Single Stage,15000,textvqa_val_exact_match,0.5583600000000001,0.0067279027203879065
182
+ Single Stage,16000,ai2d_exact_match,0.45077720207253885,0.008955440137395838
183
+ Single Stage,16000,average,0.47665128022935843,
184
+ Single Stage,16000,average_rank,1.3,
185
+ Single Stage,16000,chartqa_relaxed_overall,0.632,0.00964715642305132
186
+ Single Stage,16000,docvqa_val_anls,0.6709415729142987,0.005999818105621502
187
+ Single Stage,16000,infovqa_val_anls,0.26050032542402035,0.006997451875879188
188
+ Single Stage,16000,mme_total_score,1317.8491396558625,
189
+ Single Stage,16000,mmmu_val_mmmu_acc,0.27556,
190
+ Single Stage,16000,mmstar_average,0.33214333327093315,
191
+ Single Stage,16000,ocrbench_ocrbench_accuracy,0.56,
192
+ Single Stage,16000,seedbench_seed_all,0.5463590883824346,
193
+ Single Stage,16000,textvqa_val_exact_match,0.56158,0.006723854754867398
194
+ Single Stage,17000,ai2d_exact_match,0.45919689119170987,0.008969138793675545
195
+ Single Stage,17000,average,0.4777141780162423,
196
+ Single Stage,17000,average_rank,1.3,
197
+ Single Stage,17000,chartqa_relaxed_overall,0.632,0.00964715642305132
198
+ Single Stage,17000,docvqa_val_anls,0.6796338519136422,0.005948761388267941
199
+ Single Stage,17000,infovqa_val_anls,0.28070956072505215,0.007298333094144192
200
+ Single Stage,17000,mme_total_score,1381.9161664665867,
201
+ Single Stage,17000,mmmu_val_mmmu_acc,0.27667,
202
+ Single Stage,17000,mmstar_average,0.3370289492329521,
203
+ Single Stage,17000,ocrbench_ocrbench_accuracy,0.519,
204
+ Single Stage,17000,seedbench_seed_all,0.5510283490828238,
205
+ Single Stage,17000,textvqa_val_exact_match,0.56416,0.006724830373229479
206
+ Single Stage,18000,ai2d_exact_match,0.46567357512953367,0.008977921602780726
207
+ Single Stage,18000,average,0.4819834595278701,
208
+ Single Stage,18000,average_rank,1.3,
209
+ Single Stage,18000,chartqa_relaxed_overall,0.6376,0.009615793331418735
210
+ Single Stage,18000,docvqa_val_anls,0.6775884603912571,0.005972234236435759
211
+ Single Stage,18000,infovqa_val_anls,0.27154318420389256,0.007164903131667027
212
+ Single Stage,18000,mme_total_score,1336.922769107643,
213
+ Single Stage,18000,mmmu_val_mmmu_acc,0.28667,
214
+ Single Stage,18000,mmstar_average,0.34482796716566916,
215
+ Single Stage,18000,ocrbench_ocrbench_accuracy,0.533,
216
+ Single Stage,18000,seedbench_seed_all,0.5543079488604781,
217
+ Single Stage,18000,textvqa_val_exact_match,0.5666399999999999,0.006713392287599574
218
+ Single Stage,19000,ai2d_exact_match,0.4682642487046632,0.008981008686994101
219
+ Single Stage,19000,average,0.4899006713916878,
220
+ Single Stage,19000,average_rank,1.1,
221
+ Single Stage,19000,chartqa_relaxed_overall,0.6444,0.009575809858898698
222
+ Single Stage,19000,docvqa_val_anls,0.678226526479947,0.005970619221588814
223
+ Single Stage,19000,infovqa_val_anls,0.26993847247278,0.0071348470764911525
224
+ Single Stage,19000,mme_total_score,1406.6628651460583,
225
+ Single Stage,19000,mmmu_val_mmmu_acc,0.28333,
226
+ Single Stage,19000,mmstar_average,0.356220913822775,
227
+ Single Stage,19000,ocrbench_ocrbench_accuracy,0.577,
228
+ Single Stage,19000,seedbench_seed_all,0.554585881045025,
229
+ Single Stage,19000,textvqa_val_exact_match,0.57714,0.0066918487914812905
230
+ Single Stage,20000,ai2d_exact_match,0.47571243523316065,0.00898853090258662
231
+ Single Stage,20000,average,0.4873169067639118,
232
+ Single Stage,20000,average_rank,1.2,
233
+ Single Stage,20000,chartqa_relaxed_overall,0.6336,0.009638338810708618
234
+ Single Stage,20000,docvqa_val_anls,0.6895214454380043,0.005896462073053767
235
+ Single Stage,20000,infovqa_val_anls,0.2655657550458317,0.007033265532032538
236
+ Single Stage,20000,mme_total_score,1324.6738695478193,
237
+ Single Stage,20000,mmmu_val_mmmu_acc,0.30111,
238
+ Single Stage,20000,mmstar_average,0.33806766134497995,
239
+ Single Stage,20000,ocrbench_ocrbench_accuracy,0.555,
240
+ Single Stage,20000,seedbench_seed_all,0.5587548638132296,
241
+ Single Stage,20000,textvqa_val_exact_match,0.56852,0.006720151338087659
242
+ Two Stage,1000,ai2d_exact_match,0.25906735751295334,0.007885466610693084
243
+ Two Stage,1000,average,0.31368848609084204,
244
+ Two Stage,1000,average_rank,1.0,
245
+ Two Stage,1000,chartqa_relaxed_overall,0.4436,0.009938164963872337
246
+ Two Stage,1000,docvqa_val_anls,0.42857906272393714,0.00617017051120098
247
+ Two Stage,1000,infovqa_val_anls,0.19144447578161194,0.006593728313201272
248
+ Two Stage,1000,mme_total_score,998.7869147659063,
249
+ Two Stage,1000,mmmu_val_mmmu_acc,0.25889,
250
+ Two Stage,1000,mmstar_average,0.2467637945300377,
251
+ Two Stage,1000,ocrbench_ocrbench_accuracy,0.368,
252
+ Two Stage,1000,seedbench_seed_all,0.25703168426903833,
253
+ Two Stage,1000,textvqa_val_exact_match,0.36982,0.006597131039140386
254
+ Two Stage,2000,ai2d_exact_match,0.26327720207253885,0.007926662492947052
255
+ Two Stage,2000,average,0.3358130433652279,
256
+ Two Stage,2000,average_rank,1.2,
257
+ Two Stage,2000,chartqa_relaxed_overall,0.4992,0.010001987797631107
258
+ Two Stage,2000,docvqa_val_anls,0.4932752040405314,0.006286364089099095
259
+ Two Stage,2000,infovqa_val_anls,0.19095428252193772,0.006391194919224349
260
+ Two Stage,2000,mme_total_score,1062.8957583033214,
261
+ Two Stage,2000,mmmu_val_mmmu_acc,0.23333,
262
+ Two Stage,2000,mmstar_average,0.22051867830573926,
263
+ Two Stage,2000,ocrbench_ocrbench_accuracy,0.435,
264
+ Two Stage,2000,seedbench_seed_all,0.2556420233463035,
265
+ Two Stage,2000,textvqa_val_exact_match,0.43112,0.006756288819146318
266
+ Two Stage,3000,ai2d_exact_match,0.2655440414507772,0.007948457289013515
267
+ Two Stage,3000,average,0.3636919255920759,
268
+ Two Stage,3000,average_rank,1.3,
269
+ Two Stage,3000,chartqa_relaxed_overall,0.5348,0.009977745545085072
270
+ Two Stage,3000,docvqa_val_anls,0.5283823835512687,0.006261305725762883
271
+ Two Stage,3000,infovqa_val_anls,0.2064005153919739,0.00660395026420985
272
+ Two Stage,3000,mme_total_score,1152.5195078031213,
273
+ Two Stage,3000,mmmu_val_mmmu_acc,0.26667,
274
+ Two Stage,3000,mmstar_average,0.26072557614922737,
275
+ Two Stage,3000,ocrbench_ocrbench_accuracy,0.455,
276
+ Two Stage,3000,seedbench_seed_all,0.29666481378543635,
277
+ Two Stage,3000,textvqa_val_exact_match,0.45903999999999995,0.006792178031860127
278
+ Two Stage,4000,ai2d_exact_match,0.30343264248704666,0.008274550183857863
279
+ Two Stage,4000,average,0.386738207804619,
280
+ Two Stage,4000,average_rank,1.2,
281
+ Two Stage,4000,chartqa_relaxed_overall,0.5464,0.00995883966107287
282
+ Two Stage,4000,docvqa_val_anls,0.5513347609587042,0.006295149714671814
283
+ Two Stage,4000,infovqa_val_anls,0.209061566918142,0.006630816594060217
284
+ Two Stage,4000,mme_total_score,1092.9095638255303,
285
+ Two Stage,4000,mmmu_val_mmmu_acc,0.26889,
286
+ Two Stage,4000,mmstar_average,0.26686799048357046,
287
+ Two Stage,4000,ocrbench_ocrbench_accuracy,0.477,
288
+ Two Stage,4000,seedbench_seed_all,0.38643690939410785,
289
+ Two Stage,4000,textvqa_val_exact_match,0.47121999999999997,0.006809171409434235
290
+ Two Stage,5000,ai2d_exact_match,0.34617875647668395,0.008562713351618975
291
+ Two Stage,5000,average,0.41048271276999254,
292
+ Two Stage,5000,average_rank,1.2,
293
+ Two Stage,5000,chartqa_relaxed_overall,0.5568,0.009937253322797029
294
+ Two Stage,5000,docvqa_val_anls,0.5616928036954175,0.006281333847375657
295
+ Two Stage,5000,infovqa_val_anls,0.21417615930558564,0.006470237976804916
296
+ Two Stage,5000,mme_total_score,1113.2024809923969,
297
+ Two Stage,5000,mmmu_val_mmmu_acc,0.28889,
298
+ Two Stage,5000,mmstar_average,0.3048769900603613,
299
+ Two Stage,5000,ocrbench_ocrbench_accuracy,0.501,
300
+ Two Stage,5000,seedbench_seed_all,0.4454697053918844,
301
+ Two Stage,5000,textvqa_val_exact_match,0.47525999999999996,0.006811465752181289
302
+ Two Stage,6000,ai2d_exact_match,0.3853626943005181,0.008759432661868542
303
+ Two Stage,6000,average,0.4256324408073156,
304
+ Two Stage,6000,average_rank,1.4,
305
+ Two Stage,6000,chartqa_relaxed_overall,0.574,0.009891852177211218
306
+ Two Stage,6000,docvqa_val_anls,0.5959624206334873,0.006223948314975518
307
+ Two Stage,6000,infovqa_val_anls,0.21910870056052556,0.00650522330852698
308
+ Two Stage,6000,mme_total_score,1166.5228091236495,
309
+ Two Stage,6000,mmmu_val_mmmu_acc,0.28333,
310
+ Two Stage,6000,mmstar_average,0.28797389940888596,
311
+ Two Stage,6000,ocrbench_ocrbench_accuracy,0.512,
312
+ Two Stage,6000,seedbench_seed_all,0.4776542523624236,
313
+ Two Stage,6000,textvqa_val_exact_match,0.4953,0.006792791061270795
314
+ Two Stage,7000,ai2d_exact_match,0.3915155440414508,0.008784780895708935
315
+ Two Stage,7000,average,0.4301306852910006,
316
+ Two Stage,7000,average_rank,1.4,
317
+ Two Stage,7000,chartqa_relaxed_overall,0.5776,0.009880807059104824
318
+ Two Stage,7000,docvqa_val_anls,0.5986163103423551,0.0062031909815058375
319
+ Two Stage,7000,infovqa_val_anls,0.22133856274121264,0.006604073748499083
320
+ Two Stage,7000,mme_total_score,1191.3954581832734,
321
+ Two Stage,7000,mmmu_val_mmmu_acc,0.28667,
322
+ Two Stage,7000,mmstar_average,0.2999043663917079,
323
+ Two Stage,7000,ocrbench_ocrbench_accuracy,0.501,
324
+ Two Stage,7000,seedbench_seed_all,0.48449138410227904,
325
+ Two Stage,7000,textvqa_val_exact_match,0.51004,0.006807782962299279
326
+ Two Stage,8000,ai2d_exact_match,0.4106217616580311,0.008854207883828033
327
+ Two Stage,8000,average,0.4460743520389214,
328
+ Two Stage,8000,average_rank,1.5,
329
+ Two Stage,8000,chartqa_relaxed_overall,0.6044,0.009781540134915584
330
+ Two Stage,8000,docvqa_val_anls,0.6026263625222106,0.006221681650022778
331
+ Two Stage,8000,infovqa_val_anls,0.25653488200256863,0.007114496312902602
332
+ Two Stage,8000,mme_total_score,1122.452581032413,
333
+ Two Stage,8000,mmmu_val_mmmu_acc,0.30556,
334
+ Two Stage,8000,mmstar_average,0.3287554228678711,
335
+ Two Stage,8000,ocrbench_ocrbench_accuracy,0.502,
336
+ Two Stage,8000,seedbench_seed_all,0.4953307392996109,
337
+ Two Stage,8000,textvqa_val_exact_match,0.5088400000000001,0.006790286627123755
338
+ Two Stage,9000,ai2d_exact_match,0.40900259067357514,0.00884886365109852
339
+ Two Stage,9000,average,0.4448373661618862,
340
+ Two Stage,9000,average_rank,1.4,
341
+ Two Stage,9000,chartqa_relaxed_overall,0.602,0.00979166741164548
342
+ Two Stage,9000,docvqa_val_anls,0.6230206474600885,0.006150742264825986
343
+ Two Stage,9000,infovqa_val_anls,0.22695214706156083,0.0066522293148095326
344
+ Two Stage,9000,mme_total_score,1123.2771108443376,
345
+ Two Stage,9000,mmmu_val_mmmu_acc,0.28444,
346
+ Two Stage,9000,mmstar_average,0.31337399530900006,
347
+ Two Stage,9000,ocrbench_ocrbench_accuracy,0.516,
348
+ Two Stage,9000,seedbench_seed_all,0.5044469149527515,
349
+ Two Stage,9000,textvqa_val_exact_match,0.5243,0.006775919466531711
350
+ Two Stage,10000,ai2d_exact_match,0.4167746113989637,0.008873613803189363
351
+ Two Stage,10000,average,0.45019708387432694,
352
+ Two Stage,10000,average_rank,1.7,
353
+ Two Stage,10000,chartqa_relaxed_overall,0.6008,0.00979663889573671
354
+ Two Stage,10000,docvqa_val_anls,0.625559493523932,0.006163808988970625
355
+ Two Stage,10000,infovqa_val_anls,0.2484394159425024,0.006960467307383163
356
+ Two Stage,10000,mme_total_score,1175.7940176070429,
357
+ Two Stage,10000,mmmu_val_mmmu_acc,0.28444,
358
+ Two Stage,10000,mmstar_average,0.3201372990396749,
359
+ Two Stage,10000,ocrbench_ocrbench_accuracy,0.523,
360
+ Two Stage,10000,seedbench_seed_all,0.5092829349638688,
361
+ Two Stage,10000,textvqa_val_exact_match,0.52334,0.006775531746371587
362
+ Two Stage,11000,ai2d_exact_match,0.4219559585492228,0.008888852746011196
363
+ Two Stage,11000,average,0.4544831873326875,
364
+ Two Stage,11000,average_rank,1.8,
365
+ Two Stage,11000,chartqa_relaxed_overall,0.6128,0.009744149186940382
366
+ Two Stage,11000,docvqa_val_anls,0.6332812103643084,0.006140691371662128
367
+ Two Stage,11000,infovqa_val_anls,0.23863681037743975,0.006726839163261667
368
+ Two Stage,11000,mme_total_score,1205.7752100840335,
369
+ Two Stage,11000,mmmu_val_mmmu_acc,0.27667,
370
+ Two Stage,11000,mmstar_average,0.3207287756303977,
371
+ Two Stage,11000,ocrbench_ocrbench_accuracy,0.542,
372
+ Two Stage,11000,seedbench_seed_all,0.5166759310728183,
373
+ Two Stage,11000,textvqa_val_exact_match,0.5276,0.006779501480792346
374
+ Two Stage,12000,ai2d_exact_match,0.43005181347150256,0.00891065778843896
375
+ Two Stage,12000,average,0.4603231834457321,
376
+ Two Stage,12000,average_rank,1.6,
377
+ Two Stage,12000,chartqa_relaxed_overall,0.612,0.009747841205275417
378
+ Two Stage,12000,docvqa_val_anls,0.6395985301346107,0.006113052714689484
379
+ Two Stage,12000,infovqa_val_anls,0.2439170659215255,0.006865310277271596
380
+ Two Stage,12000,mme_total_score,1157.484293717487,
381
+ Two Stage,12000,mmmu_val_mmmu_acc,0.29556,
382
+ Two Stage,12000,mmstar_average,0.33444157500257155,
383
+ Two Stage,12000,ocrbench_ocrbench_accuracy,0.539,
384
+ Two Stage,12000,seedbench_seed_all,0.5193996664813786,
385
+ Two Stage,12000,textvqa_val_exact_match,0.52894,0.006785904875622425
386
+ Two Stage,13000,ai2d_exact_match,0.4339378238341969,0.00892025987527176
387
+ Two Stage,13000,average,0.46490664749620997,
388
+ Two Stage,13000,average_rank,1.8,
389
+ Two Stage,13000,chartqa_relaxed_overall,0.6224,0.009697675699134625
390
+ Two Stage,13000,docvqa_val_anls,0.6462803017356844,0.0061027748005307945
391
+ Two Stage,13000,infovqa_val_anls,0.24426636134362278,0.006797247018813037
392
+ Two Stage,13000,mme_total_score,1191.0042016806724,
393
+ Two Stage,13000,mmmu_val_mmmu_acc,0.3,
394
+ Two Stage,13000,mmstar_average,0.33993002648901727,
395
+ Two Stage,13000,ocrbench_ocrbench_accuracy,0.545,
396
+ Two Stage,13000,seedbench_seed_all,0.5175653140633686,
397
+ Two Stage,13000,textvqa_val_exact_match,0.5347799999999999,0.0067635803775740536
398
+ Two Stage,14000,ai2d_exact_match,0.44332901554404147,0.008941163900483138
399
+ Two Stage,14000,average,0.47155104399726233,
400
+ Two Stage,14000,average_rank,1.6,
401
+ Two Stage,14000,chartqa_relaxed_overall,0.6268,0.009675026948726469
402
+ Two Stage,14000,docvqa_val_anls,0.6586021078894133,0.006060927182389954
403
+ Two Stage,14000,infovqa_val_anls,0.2553127836308732,0.0069494972189920795
404
+ Two Stage,14000,mme_total_score,1219.156662665066,
405
+ Two Stage,14000,mmmu_val_mmmu_acc,0.30444,
406
+ Two Stage,14000,mmstar_average,0.32252187023399065,
407
+ Two Stage,14000,ocrbench_ocrbench_accuracy,0.564,
408
+ Two Stage,14000,seedbench_seed_all,0.5245136186770428,
409
+ Two Stage,14000,textvqa_val_exact_match,0.54444,0.006760159556655915
410
+ Two Stage,15000,ai2d_exact_match,0.44527202072538863,0.008945084019331405
411
+ Two Stage,15000,average,0.47506404899487137,
412
+ Two Stage,15000,average_rank,1.8,
413
+ Two Stage,15000,chartqa_relaxed_overall,0.628,0.009668701749325345
414
+ Two Stage,15000,docvqa_val_anls,0.6614266719753668,0.006055793707421594
415
+ Two Stage,15000,infovqa_val_anls,0.25669760055121127,0.006992050333066725
416
+ Two Stage,15000,mme_total_score,1198.7210884353742,
417
+ Two Stage,15000,mmmu_val_mmmu_acc,0.31222,
418
+ Two Stage,15000,mmstar_average,0.34599838005318234,
419
+ Two Stage,15000,ocrbench_ocrbench_accuracy,0.553,
420
+ Two Stage,15000,seedbench_seed_all,0.5271817676486937,
421
+ Two Stage,15000,textvqa_val_exact_match,0.5457799999999999,0.006751174267547695
422
+ Two Stage,16000,ai2d_exact_match,0.452720207253886,0.008958830742136086
423
+ Two Stage,16000,average,0.4756900312291722,
424
+ Two Stage,16000,average_rank,1.7,
425
+ Two Stage,16000,chartqa_relaxed_overall,0.6228,0.009695651925812239
426
+ Two Stage,16000,docvqa_val_anls,0.6636227651335681,0.006049765989250173
427
+ Two Stage,16000,infovqa_val_anls,0.2545981800588258,0.0069034382302033005
428
+ Two Stage,16000,mme_total_score,1211.0271108443376,
429
+ Two Stage,16000,mmmu_val_mmmu_acc,0.30778,
430
+ Two Stage,16000,mmstar_average,0.3441840591332238,
431
+ Two Stage,16000,ocrbench_ocrbench_accuracy,0.558,
432
+ Two Stage,16000,seedbench_seed_all,0.5251250694830462,
433
+ Two Stage,16000,textvqa_val_exact_match,0.55238,0.006735691577574321
434
+ Two Stage,17000,ai2d_exact_match,0.45142487046632124,0.008956585653027465
435
+ Two Stage,17000,average,0.478877157951835,
436
+ Two Stage,17000,average_rank,1.7,
437
+ Two Stage,17000,chartqa_relaxed_overall,0.632,0.00964715642305132
438
+ Two Stage,17000,docvqa_val_anls,0.6682822523143818,0.006027291004964481
439
+ Two Stage,17000,infovqa_val_anls,0.2566899031113292,0.006984361605936137
440
+ Two Stage,17000,mme_total_score,1157.7550020008005,
441
+ Two Stage,17000,mmmu_val_mmmu_acc,0.31556,
442
+ Two Stage,17000,mmstar_average,0.3413821094043331,
443
+ Two Stage,17000,ocrbench_ocrbench_accuracy,0.563,
444
+ Two Stage,17000,seedbench_seed_all,0.5275152862701501,
445
+ Two Stage,17000,textvqa_val_exact_match,0.55404,0.006743665997528143
446
+ Two Stage,18000,ai2d_exact_match,0.45077720207253885,0.008955440137395842
447
+ Two Stage,18000,average,0.48011960096968553,
448
+ Two Stage,18000,average_rank,1.7,
449
+ Two Stage,18000,chartqa_relaxed_overall,0.6324,0.00964496273307725
450
+ Two Stage,18000,docvqa_val_anls,0.6669938909662756,0.006030949772272312
451
+ Two Stage,18000,infovqa_val_anls,0.26114082779542375,0.006997258882360672
452
+ Two Stage,18000,mme_total_score,1199.3700480192078,
453
+ Two Stage,18000,mmmu_val_mmmu_acc,0.30222,
454
+ Two Stage,18000,mmstar_average,0.34746272024423847,
455
+ Two Stage,18000,ocrbench_ocrbench_accuracy,0.579,
456
+ Two Stage,18000,seedbench_seed_all,0.5271817676486937,
457
+ Two Stage,18000,textvqa_val_exact_match,0.5539,0.0067478933611137175
458
+ Two Stage,19000,ai2d_exact_match,0.44559585492227977,0.00894572391435784
459
+ Two Stage,19000,average,0.48026929849849115,
460
+ Two Stage,19000,average_rank,1.9,
461
+ Two Stage,19000,chartqa_relaxed_overall,0.6372,0.00961808021316077
462
+ Two Stage,19000,docvqa_val_anls,0.6688318561206944,0.006022351017420005
463
+ Two Stage,19000,infovqa_val_anls,0.2646354907091152,0.007027671735260141
464
+ Two Stage,19000,mme_total_score,1170.1806722689075,
465
+ Two Stage,19000,mmmu_val_mmmu_acc,0.29778,
466
+ Two Stage,19000,mmstar_average,0.35086201891999,
467
+ Two Stage,19000,ocrbench_ocrbench_accuracy,0.574,
468
+ Two Stage,19000,seedbench_seed_all,0.5292384658143413,
469
+ Two Stage,19000,textvqa_val_exact_match,0.55428,0.006746127657232224
470
+ Two Stage,20000,ai2d_exact_match,0.44721502590673573,0.008948865761421001
471
+ Two Stage,20000,average,0.4807284005437735,
472
+ Two Stage,20000,average_rank,1.8,
473
+ Two Stage,20000,chartqa_relaxed_overall,0.632,0.00964715642305132
474
+ Two Stage,20000,docvqa_val_anls,0.6696120046502304,0.0060246464192922275
475
+ Two Stage,20000,infovqa_val_anls,0.2643335615077466,0.007024758501317731
476
+ Two Stage,20000,mme_total_score,1187.4589835934376,
477
+ Two Stage,20000,mmmu_val_mmmu_acc,0.29778,
478
+ Two Stage,20000,mmstar_average,0.34891710287927624,
479
+ Two Stage,20000,ocrbench_ocrbench_accuracy,0.582,
480
+ Two Stage,20000,seedbench_seed_all,0.5282379099499722,
481
+ Two Stage,20000,textvqa_val_exact_match,0.5564600000000001,0.006728915911338792
app/src/content/assets/data/visual_dependency_filters.csv ADDED
@@ -0,0 +1,1165 @@
1
+ run,step,metric,value,stderr
2
+ Baseline,1000,ai2d_exact_match,0.2548575129533679,0.007843322436924496
3
+ Baseline,1000,average,0.27120689295763617,
4
+ Baseline,1000,average_rank,3.5,
5
+ Baseline,1000,chartqa_relaxed_overall,0.3308,0.009411906161401973
6
+ Baseline,1000,docvqa_val_anls,0.3528553494243383,0.005852289239342309
7
+ Baseline,1000,infovqa_val_anls,0.17320578642581314,0.006297063452679795
8
+ Baseline,1000,mme_total_score,977.4280712284914,
9
+ Baseline,1000,mmmu_val_mmmu_acc,0.25222,
10
+ Baseline,1000,mmstar_average,0.23215874078908072,
11
+ Baseline,1000,ocrbench_ocrbench_accuracy,0.286,
12
+ Baseline,1000,seedbench_seed_all,0.2563646470261256,
13
+ Baseline,1000,textvqa_val_exact_match,0.3024,0.00628900296642181
14
+ Baseline,2000,ai2d_exact_match,0.26295336787564766,0.007923526907377255
15
+ Baseline,2000,average,0.3202068275596269,
16
+ Baseline,2000,average_rank,3.7,
17
+ Baseline,2000,chartqa_relaxed_overall,0.4688,0.009982508912777261
18
+ Baseline,2000,docvqa_val_anls,0.4452261510942785,0.00614755494712251
19
+ Baseline,2000,infovqa_val_anls,0.1820547866557169,0.006217861455795791
20
+ Baseline,2000,mme_total_score,1049.3036214485794,
21
+ Baseline,2000,mmmu_val_mmmu_acc,0.24556,
22
+ Baseline,2000,mmstar_average,0.21305462434540698,
23
+ Baseline,2000,ocrbench_ocrbench_accuracy,0.395,
24
+ Baseline,2000,seedbench_seed_all,0.258532518065592,
25
+ Baseline,2000,textvqa_val_exact_match,0.41068000000000005,0.006697862330024289
26
+ Baseline,3000,ai2d_exact_match,0.25226683937823835,0.007816909588794397
27
+ Baseline,3000,average,0.3507423834414229,
28
+ Baseline,3000,average_rank,2.6,
29
+ Baseline,3000,chartqa_relaxed_overall,0.5028,0.010001843767601082
30
+ Baseline,3000,docvqa_val_anls,0.502653993831009,0.006267072346683124
31
+ Baseline,3000,infovqa_val_anls,0.21728617578189535,0.006796941784959762
32
+ Baseline,3000,mme_total_score,1170.2383953581434,
33
+ Baseline,3000,mmmu_val_mmmu_acc,0.27556,
34
+ Baseline,3000,mmstar_average,0.25432376938577683,
35
+ Baseline,3000,ocrbench_ocrbench_accuracy,0.436,
36
+ Baseline,3000,seedbench_seed_all,0.2792106725958866,
37
+ Baseline,3000,textvqa_val_exact_match,0.43658,0.006766885462882726
38
+ Baseline,4000,ai2d_exact_match,0.2645725388601036,0.007939149662089447
39
+ Baseline,4000,average,0.36961781722974835,
40
+ Baseline,4000,average_rank,3.2,
41
+ Baseline,4000,chartqa_relaxed_overall,0.5312,0.009982508912777261
42
+ Baseline,4000,docvqa_val_anls,0.5374434618615119,0.0062905728113059655
43
+ Baseline,4000,infovqa_val_anls,0.2287924838861707,0.006994568698639919
44
+ Baseline,4000,mme_total_score,1155.203781512605,
45
+ Baseline,4000,mmmu_val_mmmu_acc,0.25556,
46
+ Baseline,4000,mmstar_average,0.2575590188757354,
47
+ Baseline,4000,ocrbench_ocrbench_accuracy,0.453,
48
+ Baseline,4000,seedbench_seed_all,0.33913285158421347,
49
+ Baseline,4000,textvqa_val_exact_match,0.4593,0.006791695475025738
50
+ Baseline,5000,ai2d_exact_match,0.3125,0.008342439145556371
51
+ Baseline,5000,average,0.3974627910380972,
52
+ Baseline,5000,average_rank,3.2,
53
+ Baseline,5000,chartqa_relaxed_overall,0.5488,0.00995424828018316
54
+ Baseline,5000,docvqa_val_anls,0.552360266782429,0.006300308519952055
55
+ Baseline,5000,infovqa_val_anls,0.23425555286643698,0.007002254622066442
56
+ Baseline,5000,mme_total_score,1181.4653861544618,
57
+ Baseline,5000,mmmu_val_mmmu_acc,0.26667,
58
+ Baseline,5000,mmstar_average,0.29596648146165705,
59
+ Baseline,5000,ocrbench_ocrbench_accuracy,0.462,
60
+ Baseline,5000,seedbench_seed_all,0.43107281823235133,
61
+ Baseline,5000,textvqa_val_exact_match,0.47354000000000007,0.0068172185364497985
62
+ Baseline,6000,ai2d_exact_match,0.358160621761658,0.008629463221867162
63
+ Baseline,6000,average,0.4161227404571003,
64
+ Baseline,6000,average_rank,2.7,
65
+ Baseline,6000,chartqa_relaxed_overall,0.5628,0.00992279440175477
66
+ Baseline,6000,docvqa_val_anls,0.5747451497228876,0.00625495440870239
67
+ Baseline,6000,infovqa_val_anls,0.22152017368968838,0.006604546680525351
68
+ Baseline,6000,mme_total_score,1284.1648659463785,
69
+ Baseline,6000,mmmu_val_mmmu_acc,0.27111,
70
+ Baseline,6000,mmstar_average,0.2978489412854164,
71
+ Baseline,6000,ocrbench_ocrbench_accuracy,0.495,
72
+ Baseline,6000,seedbench_seed_all,0.4795997776542524,
73
+ Baseline,6000,textvqa_val_exact_match,0.48432,0.006800535050670284
74
+ Baseline,7000,ai2d_exact_match,0.3707901554404145,0.00869347755587734
75
+ Baseline,7000,average,0.4291083177345374,
76
+ Baseline,7000,average_rank,2.5,
77
+ Baseline,7000,chartqa_relaxed_overall,0.5656,0.009915542506251351
78
+ Baseline,7000,docvqa_val_anls,0.5940907049431567,0.006224236305767187
79
+ Baseline,7000,infovqa_val_anls,0.2515675215816963,0.007105097396092786
80
+ Baseline,7000,mme_total_score,1185.875650260104,
81
+ Baseline,7000,mmmu_val_mmmu_acc,0.26556,
82
+ Baseline,7000,mmstar_average,0.31372400960777047,
83
+ Baseline,7000,ocrbench_ocrbench_accuracy,0.504,
84
+ Baseline,7000,seedbench_seed_all,0.4964424680377988,
85
+ Baseline,7000,textvqa_val_exact_match,0.5002,0.006794794025220267
86
+ Baseline,8000,ai2d_exact_match,0.37759067357512954,0.008725299846043883
87
+ Baseline,8000,average,0.43846759477995995,
88
+ Baseline,8000,average_rank,2.3,
89
+ Baseline,8000,chartqa_relaxed_overall,0.5832,0.009862556058385773
90
+ Baseline,8000,docvqa_val_anls,0.6017336419437208,0.006231612198089698
91
+ Baseline,8000,infovqa_val_anls,0.2449256624147254,0.006992518502948913
92
+ Baseline,8000,mme_total_score,1199.2409963985594,
93
+ Baseline,8000,mmmu_val_mmmu_acc,0.28111,
94
+ Baseline,8000,mmstar_average,0.33512257186205047,
95
+ Baseline,8000,ocrbench_ocrbench_accuracy,0.51,
96
+ Baseline,8000,seedbench_seed_all,0.5024458032240133,
97
+ Baseline,8000,textvqa_val_exact_match,0.51008,0.006796301690135059
98
+ Baseline,9000,ai2d_exact_match,0.4067357512953368,0.008841214921078996
99
+ Baseline,9000,average,0.4422510732201056,
100
+ Baseline,9000,average_rank,2.6,
101
+ Baseline,9000,chartqa_relaxed_overall,0.5912,0.009834211136815875
102
+ Baseline,9000,docvqa_val_anls,0.6170968481662739,0.00617235763542544
103
+ Baseline,9000,infovqa_val_anls,0.23537031288570615,0.00670318154156447
104
+ Baseline,9000,mme_total_score,1231.5195078031213,
105
+ Baseline,9000,mmmu_val_mmmu_acc,0.25889,
106
+ Baseline,9000,mmstar_average,0.3216444898242951,
107
+ Baseline,9000,ocrbench_ocrbench_accuracy,0.515,
108
+ Baseline,9000,seedbench_seed_all,0.5120622568093385,
109
+ Baseline,9000,textvqa_val_exact_match,0.52226,0.006792711289708482
110
+ Baseline,10000,ai2d_exact_match,0.39993523316062174,0.008817096257082848
111
+ Baseline,10000,average,0.4523875703250908,
112
+ Baseline,10000,average_rank,2.1,
113
+ Baseline,10000,chartqa_relaxed_overall,0.5996,0.00980154906867574
114
+ Baseline,10000,docvqa_val_anls,0.6262613496433054,0.006147756371688175
115
+ Baseline,10000,infovqa_val_anls,0.263290074230132,0.007186788766942786
116
+ Baseline,10000,mme_total_score,1240.8218287314926,
117
+ Baseline,10000,mmmu_val_mmmu_acc,0.28778,
118
+ Baseline,10000,mmstar_average,0.32972717906018517,
119
+ Baseline,10000,ocrbench_ocrbench_accuracy,0.517,
120
+ Baseline,10000,seedbench_seed_all,0.5217342968315731,
121
+ Baseline,10000,textvqa_val_exact_match,0.5261600000000001,0.006785774843600811
122
+ Baseline,11000,ai2d_exact_match,0.422279792746114,0.008889771831066474
123
+ Baseline,11000,average,0.4561398159525099,
124
+ Baseline,11000,average_rank,2.4,
125
+ Baseline,11000,chartqa_relaxed_overall,0.6104,0.009755142291143075
126
+ Baseline,11000,docvqa_val_anls,0.6373130149166712,0.006128022584995044
127
+ Baseline,11000,infovqa_val_anls,0.24419378339723755,0.006897644885887063
128
+ Baseline,11000,mme_total_score,1322.9488795518205,
129
+ Baseline,11000,mmmu_val_mmmu_acc,0.27778,
130
+ Baseline,11000,mmstar_average,0.3298563439522548,
131
+ Baseline,11000,ocrbench_ocrbench_accuracy,0.521,
132
+ Baseline,11000,seedbench_seed_all,0.5237354085603113,
133
+ Baseline,11000,textvqa_val_exact_match,0.5387,0.006770851562852138
134
+ Baseline,12000,ai2d_exact_match,0.42001295336787564,0.008883255931688034
135
+ Baseline,12000,average,0.4582751140055433,
136
+ Baseline,12000,average_rank,2.7,
137
+ Baseline,12000,chartqa_relaxed_overall,0.618,0.009719474639861454
138
+ Baseline,12000,docvqa_val_anls,0.6393961983751871,0.0061228747388476674
139
+ Baseline,12000,infovqa_val_anls,0.24798874058574302,0.006855374548993139
140
+ Baseline,12000,mme_total_score,1225.6453581432572,
141
+ Baseline,12000,mmmu_val_mmmu_acc,0.27889,
142
+ Baseline,12000,mmstar_average,0.34010867846816534,
143
+ Baseline,12000,ocrbench_ocrbench_accuracy,0.512,
144
+ Baseline,12000,seedbench_seed_all,0.5350194552529183,
145
+ Baseline,12000,textvqa_val_exact_match,0.5330600000000001,0.006777713092109446
146
+ Baseline,13000,ai2d_exact_match,0.4375,0.008928571428571428
147
+ Baseline,13000,average,0.4692868662590049,
148
+ Baseline,13000,average_rank,2.2,
149
+ Baseline,13000,chartqa_relaxed_overall,0.6148,0.00973479791861169
150
+ Baseline,13000,docvqa_val_anls,0.6511374872549951,0.006086953065248391
151
+ Baseline,13000,infovqa_val_anls,0.24465055100441893,0.006808432538374664
152
+ Baseline,13000,mme_total_score,1281.7122849139657,
153
+ Baseline,13000,mmmu_val_mmmu_acc,0.28222,
154
+ Baseline,13000,mmstar_average,0.3453069542917521,
155
+ Baseline,13000,ocrbench_ocrbench_accuracy,0.549,
156
+ Baseline,13000,seedbench_seed_all,0.5442468037798777,
157
+ Baseline,13000,textvqa_val_exact_match,0.55472,0.0067416788982325
158
+ Baseline,14000,ai2d_exact_match,0.4572538860103627,0.00896620675297095
159
+ Baseline,14000,average,0.47352486841689195,
160
+ Baseline,14000,average_rank,2.2,
161
+ Baseline,14000,chartqa_relaxed_overall,0.6172,0.009723347231923635
162
+ Baseline,14000,docvqa_val_anls,0.6502269393708169,0.006057950730638126
163
+ Baseline,14000,infovqa_val_anls,0.25805460837190913,0.007037735231659539
164
+ Baseline,14000,mme_total_score,1309.1444577831132,
165
+ Baseline,14000,mmmu_val_mmmu_acc,0.28111,
166
+ Baseline,14000,mmstar_average,0.34575818188776586,
167
+ Baseline,14000,ocrbench_ocrbench_accuracy,0.551,
168
+ Baseline,14000,seedbench_seed_all,0.5483602001111729,
169
+ Baseline,14000,textvqa_val_exact_match,0.55276,0.006751206724612103
170
+ Baseline,15000,ai2d_exact_match,0.45045336787564766,0.008954861634252399
171
+ Baseline,15000,average,0.47878665012878824,
172
+ Baseline,15000,average_rank,1.6,
173
+ Baseline,15000,chartqa_relaxed_overall,0.612,0.009747841205275417
174
+ Baseline,15000,docvqa_val_anls,0.6621413031955148,0.006056838050222495
175
+ Baseline,15000,infovqa_val_anls,0.2706898598157733,0.007200315730154543
176
+ Baseline,15000,mme_total_score,1384.2171868747498,
177
+ Baseline,15000,mmmu_val_mmmu_acc,0.30222,
178
+ Baseline,15000,mmstar_average,0.35408135695920684,
179
+ Baseline,15000,ocrbench_ocrbench_accuracy,0.558,
180
+ Baseline,15000,seedbench_seed_all,0.5411339633129516,
181
+ Baseline,15000,textvqa_val_exact_match,0.5583600000000001,0.0067279027203879065
182
+ Baseline,16000,ai2d_exact_match,0.45077720207253885,0.008955440137395838
183
+ Baseline,16000,average,0.47665128022935843,
184
+ Baseline,16000,average_rank,2.2,
185
+ Baseline,16000,chartqa_relaxed_overall,0.632,0.00964715642305132
186
+ Baseline,16000,docvqa_val_anls,0.6709415729142987,0.005999818105621502
187
+ Baseline,16000,infovqa_val_anls,0.26050032542402035,0.006997451875879188
188
+ Baseline,16000,mme_total_score,1317.8491396558625,
189
+ Baseline,16000,mmmu_val_mmmu_acc,0.27556,
190
+ Baseline,16000,mmstar_average,0.33214333327093315,
191
+ Baseline,16000,ocrbench_ocrbench_accuracy,0.56,
192
+ Baseline,16000,seedbench_seed_all,0.5463590883824346,
193
+ Baseline,16000,textvqa_val_exact_match,0.56158,0.006723854754867398
194
+ Baseline,17000,ai2d_exact_match,0.45919689119170987,0.008969138793675545
195
+ Baseline,17000,average,0.4777141780162423,
196
+ Baseline,17000,average_rank,2.2,
197
+ Baseline,17000,chartqa_relaxed_overall,0.632,0.00964715642305132
198
+ Baseline,17000,docvqa_val_anls,0.6796338519136422,0.005948761388267941
199
+ Baseline,17000,infovqa_val_anls,0.28070956072505215,0.007298333094144192
200
+ Baseline,17000,mme_total_score,1381.9161664665867,
201
+ Baseline,17000,mmmu_val_mmmu_acc,0.27667,
202
+ Baseline,17000,mmstar_average,0.3370289492329521,
203
+ Baseline,17000,ocrbench_ocrbench_accuracy,0.519,
204
+ Baseline,17000,seedbench_seed_all,0.5510283490828238,
205
+ Baseline,17000,textvqa_val_exact_match,0.56416,0.006724830373229479
206
+ Baseline,18000,ai2d_exact_match,0.46567357512953367,0.008977921602780726
207
+ Baseline,18000,average,0.4819834595278701,
208
+ Baseline,18000,average_rank,2.1,
209
+ Baseline,18000,chartqa_relaxed_overall,0.6376,0.009615793331418735
210
+ Baseline,18000,docvqa_val_anls,0.6775884603912571,0.005972234236435759
211
+ Baseline,18000,infovqa_val_anls,0.27154318420389256,0.007164903131667027
212
+ Baseline,18000,mme_total_score,1336.922769107643,
213
+ Baseline,18000,mmmu_val_mmmu_acc,0.28667,
214
+ Baseline,18000,mmstar_average,0.34482796716566916,
215
+ Baseline,18000,ocrbench_ocrbench_accuracy,0.533,
216
+ Baseline,18000,seedbench_seed_all,0.5543079488604781,
217
+ Baseline,18000,textvqa_val_exact_match,0.5666399999999999,0.006713392287599574
218
+ Baseline,19000,ai2d_exact_match,0.4682642487046632,0.008981008686994101
219
+ Baseline,19000,average,0.4899006713916878,
220
+ Baseline,19000,average_rank,1.8,
221
+ Baseline,19000,chartqa_relaxed_overall,0.6444,0.009575809858898698
222
+ Baseline,19000,docvqa_val_anls,0.678226526479947,0.005970619221588814
223
+ Baseline,19000,infovqa_val_anls,0.26993847247278,0.0071348470764911525
224
+ Baseline,19000,mme_total_score,1406.6628651460583,
225
+ Baseline,19000,mmmu_val_mmmu_acc,0.28333,
226
+ Baseline,19000,mmstar_average,0.356220913822775,
227
+ Baseline,19000,ocrbench_ocrbench_accuracy,0.577,
228
+ Baseline,19000,seedbench_seed_all,0.554585881045025,
229
+ Baseline,19000,textvqa_val_exact_match,0.57714,0.0066918487914812905
230
+ Baseline,20000,ai2d_exact_match,0.47571243523316065,0.00898853090258662
231
+ Baseline,20000,average,0.4873169067639118,
232
+ Baseline,20000,average_rank,1.1,
233
+ Baseline,20000,chartqa_relaxed_overall,0.6336,0.009638338810708618
234
+ Baseline,20000,docvqa_val_anls,0.6895214454380043,0.005896462073053767
235
+ Baseline,20000,infovqa_val_anls,0.2655657550458317,0.007033265532032538
236
+ Baseline,20000,mme_total_score,1324.6738695478193,
237
+ Baseline,20000,mmmu_val_mmmu_acc,0.30111,
238
+ Baseline,20000,mmstar_average,0.33806766134497995,
239
+ Baseline,20000,ocrbench_ocrbench_accuracy,0.555,
240
+ Baseline,20000,seedbench_seed_all,0.5587548638132296,
241
+ Baseline,20000,textvqa_val_exact_match,0.56852,0.006720151338087659
242
+ ≥2,1000,ai2d_exact_match,0.25777202072538863,0.00787260087439643
243
+ ≥2,1000,average,0.29870004148945406,
244
+ ≥2,1000,average_rank,1.6,
245
+ ≥2,1000,chartqa_relaxed_overall,0.392,0.00976588700628918
246
+ ≥2,1000,docvqa_val_anls,0.38022613055363247,0.005894591191928051
247
+ ≥2,1000,infovqa_val_anls,0.18869492615378894,0.0064732209321745355
248
+ ≥2,1000,mme_total_score,1009.0293117246899,
249
+ ≥2,1000,mmmu_val_mmmu_acc,0.26889,
250
+ ≥2,1000,mmstar_average,0.2331497473341441,
251
+ ≥2,1000,ocrbench_ocrbench_accuracy,0.342,
252
+ ≥2,1000,seedbench_seed_all,0.25758754863813227,
253
+ ≥2,1000,textvqa_val_exact_match,0.36798000000000003,0.0065929080994233105
254
+ ≥2,2000,ai2d_exact_match,0.2655440414507772,0.007948457289013512
255
+ ≥2,2000,average,0.33300868864578836,
256
+ ≥2,2000,average_rank,1.9,
257
+ ≥2,2000,chartqa_relaxed_overall,0.4772,0.009991596308834713
258
+ ≥2,2000,docvqa_val_anls,0.46008700414278725,0.006164804465853957
259
+ ≥2,2000,infovqa_val_anls,0.21854176213098941,0.0068306837394726885
260
+ ≥2,2000,mme_total_score,1096.7091836734694,
261
+ ≥2,2000,mmmu_val_mmmu_acc,0.25333,
262
+ ≥2,2000,mmstar_average,0.203705184417725,
263
+ ≥2,2000,ocrbench_ocrbench_accuracy,0.409,
264
+ ≥2,2000,seedbench_seed_all,0.26637020566981656,
265
+ ≥2,2000,textvqa_val_exact_match,0.4433,0.006768706307487082
266
+ ≥2,3000,ai2d_exact_match,0.2610103626943005,0.007904597024354016
267
+ ≥2,3000,average,0.3534218124222384,
268
+ ≥2,3000,average_rank,2.8,
269
+ ≥2,3000,chartqa_relaxed_overall,0.5264,0.009988048880946633
270
+ ≥2,3000,docvqa_val_anls,0.5023083447985476,0.006259506318102633
271
+ ≥2,3000,infovqa_val_anls,0.2121067617258121,0.006644891191437915
272
+ ≥2,3000,mme_total_score,1089.7261904761906,
273
+ ≥2,3000,mmmu_val_mmmu_acc,0.24333,
274
+ ≥2,3000,mmstar_average,0.23083152629465944,
275
+ ≥2,3000,ocrbench_ocrbench_accuracy,0.462,
276
+ ≥2,3000,seedbench_seed_all,0.284769316286826,
277
+ ≥2,3000,textvqa_val_exact_match,0.45803999999999995,0.006781406135443796
278
+ ≥2,4000,ai2d_exact_match,0.30440414507772023,0.00828200443840283
279
+ ≥2,4000,average,0.3907370325722774,
280
+ ≥2,4000,average_rank,2.0,
281
+ ≥2,4000,chartqa_relaxed_overall,0.5336,0.009979391329160321
282
+ ≥2,4000,docvqa_val_anls,0.5333935372644802,0.006254756291115302
283
+ ≥2,4000,infovqa_val_anls,0.21491089642827804,0.006550896522978477
284
+ ≥2,4000,mme_total_score,1174.6278511404562,
285
+ ≥2,4000,mmmu_val_mmmu_acc,0.27778,
286
+ ≥2,4000,mmstar_average,0.28828662655344744,
287
+ ≥2,4000,ocrbench_ocrbench_accuracy,0.483,
288
+ ≥2,4000,seedbench_seed_all,0.4045580878265703,
289
+ ≥2,4000,textvqa_val_exact_match,0.4767,0.00679925495724309
290
+ ≥2,5000,ai2d_exact_match,0.34067357512953367,0.008530041622806898
291
+ ≥2,5000,average,0.4052701461218865,
292
+ ≥2,5000,average_rank,2.4,
293
+ ≥2,5000,chartqa_relaxed_overall,0.556,0.00993907007952043
294
+ ≥2,5000,docvqa_val_anls,0.551636615576218,0.006262113198152568
295
+ ≥2,5000,infovqa_val_anls,0.2187520201553633,0.0065999197492640155
296
+ ≥2,5000,mme_total_score,1239.7700080032012,
297
+ ≥2,5000,mmmu_val_mmmu_acc,0.25889,
298
+ ≥2,5000,mmstar_average,0.3088917112397545,
299
+ ≥2,5000,ocrbench_ocrbench_accuracy,0.479,
300
+ ≥2,5000,seedbench_seed_all,0.45330739299610895,
301
+ ≥2,5000,textvqa_val_exact_match,0.48028000000000004,0.0068060185643105285
302
+ ≥2,6000,ai2d_exact_match,0.3636658031088083,0.008658158841882571
303
+ ≥2,6000,average,0.4239178365378573,
304
+ ≥2,6000,average_rank,1.8,
305
+ ≥2,6000,chartqa_relaxed_overall,0.5588,0.009932597172675325
306
+ ≥2,6000,docvqa_val_anls,0.5759208000996875,0.00627846983332349
307
+ ≥2,6000,infovqa_val_anls,0.22102654462080973,0.006590286832050693
308
+ ≥2,6000,mme_total_score,1250.6378551420567,
309
+ ≥2,6000,mmmu_val_mmmu_acc,0.28889,
310
+ ≥2,6000,mmstar_average,0.3344899768980139,
311
+ ≥2,6000,ocrbench_ocrbench_accuracy,0.487,
312
+ ≥2,6000,seedbench_seed_all,0.4893274041133963,
313
+ ≥2,6000,textvqa_val_exact_match,0.4961400000000001,0.006795016889670414
314
+ ≥2,7000,ai2d_exact_match,0.38147668393782386,0.008742662684201102
315
+ ≥2,7000,average,0.43263406493008894,
316
+ ≥2,7000,average_rank,1.6,
317
+ ≥2,7000,chartqa_relaxed_overall,0.5748,0.009889444091645227
318
+ ≥2,7000,docvqa_val_anls,0.5867668845214878,0.006247104281789733
319
+ ≥2,7000,infovqa_val_anls,0.23824260143164633,0.0068608126823946685
320
+ ≥2,7000,mme_total_score,1296.735694277711,
321
+ ≥2,7000,mmmu_val_mmmu_acc,0.28556,
322
+ ≥2,7000,mmstar_average,0.3072635940240335,
323
+ ≥2,7000,ocrbench_ocrbench_accuracy,0.513,
324
+ ≥2,7000,seedbench_seed_all,0.4982768204558088,
325
+ ≥2,7000,textvqa_val_exact_match,0.5083200000000001,0.006792185382957041
326
+ ≥2,8000,ai2d_exact_match,0.40382124352331605,0.008831094143874323
327
+ ≥2,8000,average,0.4456329819076237,
328
+ ≥2,8000,average_rank,1.2,
329
+ ≥2,8000,chartqa_relaxed_overall,0.5856,0.009854334029231191
330
+ ≥2,8000,docvqa_val_anls,0.6057730552623142,0.006226842243771017
331
+ ≥2,8000,infovqa_val_anls,0.2288666100209681,0.006682620404600125
332
+ ≥2,8000,mme_total_score,1221.7308923569428,
333
+ ≥2,8000,mmmu_val_mmmu_acc,0.29778,
334
+ ≥2,8000,mmstar_average,0.3382327154659617,
335
+ ≥2,8000,ocrbench_ocrbench_accuracy,0.523,
336
+ ≥2,8000,seedbench_seed_all,0.5097832128960533,
337
+ ≥2,8000,textvqa_val_exact_match,0.51784,0.006779139932738188
338
+ ≥2,9000,ai2d_exact_match,0.40025906735751293,0.008818284784223732
339
+ ≥2,9000,average,0.4496383408952438,
340
+ ≥2,9000,average_rank,2.1,
341
+ ≥2,9000,chartqa_relaxed_overall,0.5952,0.0098190299592035
342
+ ≥2,9000,docvqa_val_anls,0.6174168384969986,0.006180917562443268
343
+ ≥2,9000,infovqa_val_anls,0.25032529875297344,0.007005541408150581
344
+ ≥2,9000,mme_total_score,1151.3506402561025,
345
+ ≥2,9000,mmmu_val_mmmu_acc,0.28111,
346
+ ≥2,9000,mmstar_average,0.3278445360455953,
347
+ ≥2,9000,ocrbench_ocrbench_accuracy,0.533,
348
+ ≥2,9000,seedbench_seed_all,0.5207893274041134,
349
+ ≥2,9000,textvqa_val_exact_match,0.5208,0.006794961832215119
350
+ ≥2,10000,ai2d_exact_match,0.41353626943005184,0.00886357792887845
351
+ ≥2,10000,average,0.45099931997713383,
352
+ ≥2,10000,average_rank,2.3,
353
+ ≥2,10000,chartqa_relaxed_overall,0.5808,0.009870537726284339
354
+ ≥2,10000,docvqa_val_anls,0.6302398141335456,0.006157573676383471
355
+ ≥2,10000,infovqa_val_anls,0.2381115764088292,0.0068559608378919125
356
+ ≥2,10000,mme_total_score,1079.641156462585,
357
+ ≥2,10000,mmmu_val_mmmu_acc,0.27222,
358
+ ≥2,10000,mmstar_average,0.32534155056107783,
359
+ ≥2,10000,ocrbench_ocrbench_accuracy,0.54,
360
+ ≥2,10000,seedbench_seed_all,0.5284046692607004,
361
+ ≥2,10000,textvqa_val_exact_match,0.5303399999999999,0.006770691643625976
362
+ ≥2,11000,ai2d_exact_match,0.42972797927461137,0.008909832364541423
363
+ ≥2,11000,average,0.4614569435855967,
364
+ ≥2,11000,average_rank,1.9,
365
+ ≥2,11000,chartqa_relaxed_overall,0.5924,0.009829727637028773
366
+ ≥2,11000,docvqa_val_anls,0.6325497973300792,0.006140954920476996
367
+ ≥2,11000,infovqa_val_anls,0.2522830831713292,0.007064159450996417
368
+ ≥2,11000,mme_total_score,1068.0994397759105,
369
+ ≥2,11000,mmmu_val_mmmu_acc,0.3,
370
+ ≥2,11000,mmstar_average,0.3439924885254794,
371
+ ≥2,11000,ocrbench_ocrbench_accuracy,0.541,
372
+ ≥2,11000,seedbench_seed_all,0.5264591439688716,
373
+ ≥2,11000,textvqa_val_exact_match,0.5347,0.006770579749920238
374
+ ≥2,12000,ai2d_exact_match,0.4332901554404145,0.008918698335135207
375
+ ≥2,12000,average,0.46757010591353954,
376
+ ≥2,12000,average_rank,2.3,
377
+ ≥2,12000,chartqa_relaxed_overall,0.6032,0.00978663452296623
378
+ ≥2,12000,docvqa_val_anls,0.6411886944030833,0.006133992458699484
379
+ ≥2,12000,infovqa_val_anls,0.247583905971666,0.0068292606857312445
380
+ ≥2,12000,mme_total_score,1116.4749899959984,
381
+ ≥2,12000,mmmu_val_mmmu_acc,0.31333,
382
+ ≥2,12000,mmstar_average,0.36351686333220656,
383
+ ≥2,12000,ocrbench_ocrbench_accuracy,0.545,
384
+ ≥2,12000,seedbench_seed_all,0.5224013340744859,
385
+ ≥2,12000,textvqa_val_exact_match,0.53862,0.006767396741120647
386
+ ≥2,13000,ai2d_exact_match,0.43102331606217614,0.008913110733383512
387
+ ≥2,13000,average,0.4708623711695242,
388
+ ≥2,13000,average_rank,1.8,
389
+ ≥2,13000,chartqa_relaxed_overall,0.598,0.009808000752013664
390
+ ≥2,13000,docvqa_val_anls,0.647518045726831,0.006104766828382393
391
+ ≥2,13000,infovqa_val_anls,0.2658523754039203,0.0070971911654514885
392
+ ≥2,13000,mme_total_score,1216.8277310924368,
393
+ ≥2,13000,mmmu_val_mmmu_acc,0.30778,
394
+ ≥2,13000,mmstar_average,0.35401024924718744,
395
+ ≥2,13000,ocrbench_ocrbench_accuracy,0.559,
396
+ ≥2,13000,seedbench_seed_all,0.5272373540856031,
397
+ ≥2,13000,textvqa_val_exact_match,0.5473399999999999,0.0067627015094813454
398
+ ≥2,14000,ai2d_exact_match,0.4413860103626943,0.008937105222785164
399
+ ≥2,14000,average,0.47200827220139996,
400
+ ≥2,14000,average_rank,2.2,
401
+ ≥2,14000,chartqa_relaxed_overall,0.608,0.00976588700628918
402
+ ≥2,14000,docvqa_val_anls,0.6481895430203203,0.006114739128147622
403
+ ≥2,14000,infovqa_val_anls,0.24797040095296322,0.006854729144835086
404
+ ≥2,14000,mme_total_score,1153.2053821528611,
405
+ ≥2,14000,mmmu_val_mmmu_acc,0.29778,
406
+ ≥2,14000,mmstar_average,0.3623816027584452,
407
+ ≥2,14000,ocrbench_ocrbench_accuracy,0.557,
408
+ ≥2,14000,seedbench_seed_all,0.5324068927181768,
409
+ ≥2,14000,textvqa_val_exact_match,0.55296,0.006743422160037565
410
+ ≥2,15000,ai2d_exact_match,0.44430051813471505,0.008943141268224493
411
+ ≥2,15000,average,0.4768540644353607,
412
+ ≥2,15000,average_rank,2.1,
413
+ ≥2,15000,chartqa_relaxed_overall,0.6124,0.00974599865564932
414
+ ≥2,15000,docvqa_val_anls,0.657105104030175,0.006083402178894847
415
+ ≥2,15000,infovqa_val_anls,0.2631976158051741,0.0070935590715263345
416
+ ≥2,15000,mme_total_score,1174.1028411364546,
417
+ ≥2,15000,mmmu_val_mmmu_acc,0.29222,
418
+ ≥2,15000,mmstar_average,0.36067644923000525,
419
+ ≥2,15000,ocrbench_ocrbench_accuracy,0.573,
420
+ ≥2,15000,seedbench_seed_all,0.5324068927181768,
421
+ ≥2,15000,textvqa_val_exact_match,0.55638,0.006732794755897807
422
+ ≥2,16000,ai2d_exact_match,0.4423575129533679,0.008939151893135126
423
+ ≥2,16000,average,0.4826201441651047,
424
+ ≥2,16000,average_rank,1.7,
425
+ ≥2,16000,chartqa_relaxed_overall,0.6136,0.009740429476494075
426
+ ≥2,16000,docvqa_val_anls,0.6608706272244327,0.006045334017501067
427
+ ≥2,16000,infovqa_val_anls,0.2658456998658782,0.007096717267723537
428
+ ≥2,16000,mme_total_score,1196.9498799519806,
429
+ ≥2,16000,mmmu_val_mmmu_acc,0.31667,
430
+ ≥2,16000,mmstar_average,0.3731506870142482,
431
+ ≥2,16000,ocrbench_ocrbench_accuracy,0.57,
432
+ ≥2,16000,seedbench_seed_all,0.5361867704280155,
433
+ ≥2,16000,textvqa_val_exact_match,0.5649,0.006716890748275191
434
+ ≥2,17000,ai2d_exact_match,0.4413860103626943,0.008937105222785164
435
+ ≥2,17000,average,0.4831149323865578,
436
+ ≥2,17000,average_rank,1.7,
437
+ ≥2,17000,chartqa_relaxed_overall,0.6176,0.00972141442174665
438
+ ≥2,17000,docvqa_val_anls,0.6642577380136374,0.0060335568273967325
439
+ ≥2,17000,infovqa_val_anls,0.27541255121036223,0.007220728859165202
440
+ ≥2,17000,mme_total_score,1226.7763105242097,
441
+ ≥2,17000,mmmu_val_mmmu_acc,0.31222,
442
+ ≥2,17000,mmstar_average,0.3539718773286798,
443
+ ≥2,17000,ocrbench_ocrbench_accuracy,0.579,
444
+ ≥2,17000,seedbench_seed_all,0.5351862145636465,
445
+ ≥2,17000,textvqa_val_exact_match,0.569,0.006714701590055116
446
+ ≥2,18000,ai2d_exact_match,0.4485103626943005,0.008951310133709684
447
+ ≥2,18000,average,0.4835600793462623,
448
+ ≥2,18000,average_rank,2.0,
449
+ ≥2,18000,chartqa_relaxed_overall,0.6144,0.009736682042198788
450
+ ≥2,18000,docvqa_val_anls,0.6637272679006306,0.006043227308338888
451
+ ≥2,18000,infovqa_val_anls,0.27265678126051135,0.007221049354398339
452
+ ≥2,18000,mme_total_score,1195.9091636654662,
453
+ ≥2,18000,mmmu_val_mmmu_acc,0.30778,
454
+ ≥2,18000,mmstar_average,0.36311980976508723,
455
+ ≥2,18000,ocrbench_ocrbench_accuracy,0.574,
456
+ ≥2,18000,seedbench_seed_all,0.535686492495831,
457
+ ≥2,18000,textvqa_val_exact_match,0.5721599999999999,0.006697668148820628
458
+ ≥2,19000,ai2d_exact_match,0.452720207253886,0.008958830742136076
459
+ ≥2,19000,average,0.48356600425554297,
460
+ ≥2,19000,average_rank,2.1,
461
+ ≥2,19000,chartqa_relaxed_overall,0.6128,0.009744149186940382
462
+ ≥2,19000,docvqa_val_anls,0.6625382006068653,0.006055552711820008
463
+ ≥2,19000,infovqa_val_anls,0.27294748299543115,0.007192809991207902
464
+ ≥2,19000,mme_total_score,1187.5743297318927,
465
+ ≥2,19000,mmmu_val_mmmu_acc,0.31222,
466
+ ≥2,19000,mmstar_average,0.3570382308233601,
467
+ ≥2,19000,ocrbench_ocrbench_accuracy,0.583,
468
+ ≥2,19000,seedbench_seed_all,0.5298499166203446,
469
+ ≥2,19000,textvqa_val_exact_match,0.56898,0.006699263239012863
470
+ ≥3,1000,ai2d_exact_match,0.265220207253886,0.00794536023378452
471
+ ≥3,1000,average,0.28531874553483266,
472
+ ≥3,1000,average_rank,2.7,
473
+ ≥3,1000,chartqa_relaxed_overall,0.3328,0.00942619781683542
474
+ ≥3,1000,docvqa_val_anls,0.3817065818486834,0.0060201820313849075
475
+ ≥3,1000,infovqa_val_anls,0.1724277234914656,0.0063364421395684075
476
+ ≥3,1000,mme_total_score,1014.2047819127652,
477
+ ≥3,1000,mmmu_val_mmmu_acc,0.25222,
478
+ ≥3,1000,mmstar_average,0.21462695986537322,
479
+ ≥3,1000,ocrbench_ocrbench_accuracy,0.34,
480
+ ≥3,1000,seedbench_seed_all,0.2490272373540856,
481
+ ≥3,1000,textvqa_val_exact_match,0.35984,0.0065578377478545035
482
+ ≥3,2000,ai2d_exact_match,0.265220207253886,0.007945360233784508
483
+ ≥3,2000,average,0.31448842347349887,
484
+ ≥3,2000,average_rank,3.2,
485
+ ≥3,2000,chartqa_relaxed_overall,0.4032,0.009812768221458571
486
+ ≥3,2000,docvqa_val_anls,0.4310807448012024,0.006128304803001007
487
+ ≥3,2000,infovqa_val_anls,0.18407319533143493,0.006354264550345518
488
+ ≥3,2000,mme_total_score,950.6575630252102,
489
+ ≥3,2000,mmmu_val_mmmu_acc,0.25667,
490
+ ≥3,2000,mmstar_average,0.21376105798280387,
491
+ ≥3,2000,ocrbench_ocrbench_accuracy,0.404,
492
+ ≥3,2000,seedbench_seed_all,0.2630906058921623,
493
+ ≥3,2000,textvqa_val_exact_match,0.4093,0.006714113334132268
494
+ ≥3,3000,ai2d_exact_match,0.27266839378238344,0.008015217564479087
495
+ ≥3,3000,average,0.34673777173147413,
496
+ ≥3,3000,average_rank,2.9,
497
+ ≥3,3000,chartqa_relaxed_overall,0.4572,0.00996528909739792
498
+ ≥3,3000,docvqa_val_anls,0.4798353261388413,0.006271292862880522
499
+ ≥3,3000,infovqa_val_anls,0.19911035728945153,0.006542447972079325
500
+ ≥3,3000,mme_total_score,1038.0850340136053,
501
+ ≥3,3000,mmmu_val_mmmu_acc,0.28,
502
+ ≥3,3000,mmstar_average,0.22994571273056794,
503
+ ≥3,3000,ocrbench_ocrbench_accuracy,0.452,
504
+ ≥3,3000,seedbench_seed_all,0.3042801556420233,
505
+ ≥3,3000,textvqa_val_exact_match,0.4456,0.006780363018639408
506
+ ≥3,4000,ai2d_exact_match,0.3008419689119171,0.008254458183344766
507
+ ≥3,4000,average,0.372256616377869,
508
+ ≥3,4000,average_rank,3.4,
509
+ ≥3,4000,chartqa_relaxed_overall,0.498,0.010001920583875201
510
+ ≥3,4000,docvqa_val_anls,0.5005393380802441,0.006322332852652876
511
+ ≥3,4000,infovqa_val_anls,0.2022979884693795,0.006565632329170807
512
+ ≥3,4000,mme_total_score,1107.3175270108043,
513
+ ≥3,4000,mmmu_val_mmmu_acc,0.27889,
514
+ ≥3,4000,mmstar_average,0.26412491008269295,
515
+ ≥3,4000,ocrbench_ocrbench_accuracy,0.449,
516
+ ≥3,4000,seedbench_seed_all,0.407615341856587,
517
+ ≥3,4000,textvqa_val_exact_match,0.449,0.006768113582993008
518
+ ≥3,5000,ai2d_exact_match,0.342940414507772,0.008543648986216484
519
+ ≥3,5000,average,0.39554841320961825,
520
+ ≥3,5000,average_rank,2.8,
521
+ ≥3,5000,chartqa_relaxed_overall,0.5016,0.010001949389825897
522
+ ≥3,5000,docvqa_val_anls,0.5217724980647621,0.006332629841214822
523
+ ≥3,5000,infovqa_val_anls,0.20921213176388598,0.006571741980614216
524
+ ≥3,5000,mme_total_score,1143.5384153661464,
525
+ ≥3,5000,mmmu_val_mmmu_acc,0.27889,
526
+ ≥3,5000,mmstar_average,0.3085553160176263,
527
+ ≥3,5000,ocrbench_ocrbench_accuracy,0.471,
528
+ ≥3,5000,seedbench_seed_all,0.4616453585325181,
529
+ ≥3,5000,textvqa_val_exact_match,0.46431999999999995,0.006784172205840865
530
+ ≥3,6000,ai2d_exact_match,0.3591321243523316,0.008634616704865624
531
+ ≥3,6000,average,0.408597756536911,
532
+ ≥3,6000,average_rank,3.4,
533
+ ≥3,6000,chartqa_relaxed_overall,0.524,0.009990471651004463
534
+ ≥3,6000,docvqa_val_anls,0.5385470022871673,0.006299095577053015
535
+ ≥3,6000,infovqa_val_anls,0.19835998026203344,0.006301952324954991
536
+ ≥3,6000,mme_total_score,1166.4043617446978,
537
+ ≥3,6000,mmmu_val_mmmu_acc,0.26889,
538
+ ≥3,6000,mmstar_average,0.3153629642986489,
539
+ ≥3,6000,ocrbench_ocrbench_accuracy,0.524,
540
+ ≥3,6000,seedbench_seed_all,0.4699277376320178,
541
+ ≥3,6000,textvqa_val_exact_match,0.47916000000000003,0.006792463941257152
542
+ ≥3,7000,ai2d_exact_match,0.3743523316062176,0.00871037538055804
543
+ ≥3,7000,average,0.4146622463341539,
544
+ ≥3,7000,average_rank,3.2,
545
+ ≥3,7000,chartqa_relaxed_overall,0.5324,0.009980979109165145
546
+ ≥3,7000,docvqa_val_anls,0.5423666654155802,0.0063068441060751875
547
+ ≥3,7000,infovqa_val_anls,0.20429604055237802,0.006425262069821515
548
+ ≥3,7000,mme_total_score,1148.8516406562626,
549
+ ≥3,7000,mmmu_val_mmmu_acc,0.28111,
550
+ ≥3,7000,mmstar_average,0.314909965425427,
551
+ ≥3,7000,ocrbench_ocrbench_accuracy,0.501,
552
+ ≥3,7000,seedbench_seed_all,0.4933852140077821,
553
+ ≥3,7000,textvqa_val_exact_match,0.4881400000000001,0.006795461996122717
554
+ ≥3,8000,ai2d_exact_match,0.3869818652849741,0.008766245989484155
555
+ ≥3,8000,average,0.42472235991601914,
556
+ ≥3,8000,average_rank,3.2,
557
+ ≥3,8000,chartqa_relaxed_overall,0.548,0.009955804699716018
558
+ ≥3,8000,docvqa_val_anls,0.564403800428159,0.006324143942272913
559
+ ≥3,8000,infovqa_val_anls,0.2120646153586776,0.006489418673794432
560
+ ≥3,8000,mme_total_score,1176.846738695478,
561
+ ≥3,8000,mmmu_val_mmmu_acc,0.27333,
562
+ ≥3,8000,mmstar_average,0.3144330871328953,
563
+ ≥3,8000,ocrbench_ocrbench_accuracy,0.528,
564
+ ≥3,8000,seedbench_seed_all,0.5021678710394664,
565
+ ≥3,8000,textvqa_val_exact_match,0.4931199999999999,0.006792154282108025
566
+ ≥3,9000,ai2d_exact_match,0.3898963730569948,0.008778252852376935
567
+ ≥3,9000,average,0.4346681180521681,
568
+ ≥3,9000,average_rank,2.6,
569
+ ≥3,9000,chartqa_relaxed_overall,0.558,0.009934479228979262
570
+ ≥3,9000,docvqa_val_anls,0.5621929569166433,0.006295010516387767
571
+ ≥3,9000,infovqa_val_anls,0.22522973646589128,0.0066872494457316176
572
+ ≥3,9000,mme_total_score,1198.484293717487,
573
+ ≥3,9000,mmmu_val_mmmu_acc,0.28333,
574
+ ≥3,9000,mmstar_average,0.3344926341622794,
575
+ ≥3,9000,ocrbench_ocrbench_accuracy,0.537,
576
+ ≥3,9000,seedbench_seed_all,0.5124513618677042,
577
+ ≥3,9000,textvqa_val_exact_match,0.50942,0.006797109154528248
578
+ ≥3,10000,ai2d_exact_match,0.39831606217616583,0.008811093384512251
579
+ ≥3,10000,average,0.4363450582991636,
580
+ ≥3,10000,average_rank,3.1,
581
+ ≥3,10000,chartqa_relaxed_overall,0.5644,0.00991868984106597
582
+ ≥3,10000,docvqa_val_anls,0.5876359888597289,0.006275892871477498
583
+ ≥3,10000,infovqa_val_anls,0.21414071448409078,0.00648536598995207
584
+ ≥3,10000,mme_total_score,1125.9251700680272,
585
+ ≥3,10000,mmmu_val_mmmu_acc,0.29111,
586
+ ≥3,10000,mmstar_average,0.3371786791280175,
587
+ ≥3,10000,ocrbench_ocrbench_accuracy,0.509,
588
+ ≥3,10000,seedbench_seed_all,0.5193440800444692,
589
+ ≥3,10000,textvqa_val_exact_match,0.50598,0.006794533174266738
590
+ ≥3,11000,ai2d_exact_match,0.41321243523316065,0.00886255263438398
591
+ ≥3,11000,average,0.44519353317344135,
592
+ ≥3,11000,average_rank,3.2,
593
+ ≥3,11000,chartqa_relaxed_overall,0.5648,0.009917647296166388
594
+ ≥3,11000,docvqa_val_anls,0.5901130752220717,0.006255436475783363
595
+ ≥3,11000,infovqa_val_anls,0.2237154173345093,0.006578524720143259
596
+ ≥3,11000,mme_total_score,1143.140256102441,
597
+ ≥3,11000,mmmu_val_mmmu_acc,0.28333,
598
+ ≥3,11000,mmstar_average,0.3585590419774556,
599
+ ≥3,11000,ocrbench_ocrbench_accuracy,0.53,
600
+ ≥3,11000,seedbench_seed_all,0.5252918287937743,
601
+ ≥3,11000,textvqa_val_exact_match,0.51772,0.0067926451641216416
602
+ ≥3,12000,ai2d_exact_match,0.42487046632124353,0.008896983637113786
603
+ ≥3,12000,average,0.45334975698356383,
604
+ ≥3,12000,average_rank,3.1,
605
+ ≥3,12000,chartqa_relaxed_overall,0.5804,0.009871844677005952
606
+ ≥3,12000,docvqa_val_anls,0.5973296953454501,0.006247696301034546
607
+ ≥3,12000,infovqa_val_anls,0.23268166067961038,0.006728755322175445
608
+ ≥3,12000,mme_total_score,1161.591236494598,
609
+ ≥3,12000,mmmu_val_mmmu_acc,0.29667,
610
+ ≥3,12000,mmstar_average,0.35603672424673755,
611
+ ≥3,12000,ocrbench_ocrbench_accuracy,0.547,
612
+ ≥3,12000,seedbench_seed_all,0.5226792662590328,
613
+ ≥3,12000,textvqa_val_exact_match,0.5224799999999999,0.006785052491135311
614
+ ≥3,13000,ai2d_exact_match,0.4167746113989637,0.008873613803189363
615
+ ≥3,13000,average,0.4552955297331601,
616
+ ≥3,13000,average_rank,3.1,
617
+ ≥3,13000,chartqa_relaxed_overall,0.5792,0.009875725592704212
618
+ ≥3,13000,docvqa_val_anls,0.6043196538705875,0.006220916351545474
619
+ ≥3,13000,infovqa_val_anls,0.2321475605523485,0.006628428362101348
620
+ ≥3,13000,mme_total_score,1192.6958783513405,
621
+ ≥3,13000,mmmu_val_mmmu_acc,0.3,
622
+ ≥3,13000,mmstar_average,0.3557924720711495,
623
+ ≥3,13000,ocrbench_ocrbench_accuracy,0.556,
624
+ ≥3,13000,seedbench_seed_all,0.5218454697053919,
625
+ ≥3,13000,textvqa_val_exact_match,0.53158,0.006760864040676702
626
+ ≥3,14000,ai2d_exact_match,0.4174222797927461,0.008875573686735059
627
+ ≥3,14000,average,0.4584104359960548,
628
+ ≥3,14000,average_rank,3.3,
629
+ ≥3,14000,chartqa_relaxed_overall,0.5804,0.009871844677005952
630
+ ≥3,14000,docvqa_val_anls,0.6088163368119239,0.006242610950178105
631
+ ≥3,14000,infovqa_val_anls,0.2468904756158024,0.006928482619868768
632
+ ≥3,14000,mme_total_score,1161.6323529411766,
633
+ ≥3,14000,mmmu_val_mmmu_acc,0.29111,
634
+ ≥3,14000,mmstar_average,0.3576571997262333,
635
+ ≥3,14000,ocrbench_ocrbench_accuracy,0.556,
636
+ ≥3,14000,seedbench_seed_all,0.5277376320177877,
637
+ ≥3,14000,textvqa_val_exact_match,0.5396599999999999,0.006765661217822162
638
+ ≥3,15000,ai2d_exact_match,0.4268134715025907,0.008902228386480453
639
+ ≥3,15000,average,0.46234537077539845,
640
+ ≥3,15000,average_rank,3.4,
641
+ ≥3,15000,chartqa_relaxed_overall,0.5904,0.009837166458771298
642
+ ≥3,15000,docvqa_val_anls,0.6107165887431807,0.006220748699516272
643
+ ≥3,15000,infovqa_val_anls,0.24344524225018507,0.006901823884143617
644
+ ≥3,15000,mme_total_score,1094.0884353741496,
645
+ ≥3,15000,mmmu_val_mmmu_acc,0.31667,
646
+ ≥3,15000,mmstar_average,0.35479574154210686,
647
+ ≥3,15000,ocrbench_ocrbench_accuracy,0.542,
648
+ ≥3,15000,seedbench_seed_all,0.5291272929405225,
649
+ ≥3,15000,textvqa_val_exact_match,0.54714,0.006751257658836256
650
+ ≥3,16000,ai2d_exact_match,0.4219559585492228,0.008888852746011193
651
+ ≥3,16000,average,0.46445166086391904,
652
+ ≥3,16000,average_rank,3.2,
653
+ ≥3,16000,chartqa_relaxed_overall,0.596,0.009815912634917984
654
+ ≥3,16000,docvqa_val_anls,0.6187371260510179,0.0061854867035024364
655
+ ≥3,16000,infovqa_val_anls,0.24123922023999142,0.006805599174719394
656
+ ≥3,16000,mme_total_score,1190.470988395358,
657
+ ≥3,16000,mmmu_val_mmmu_acc,0.30889,
658
+ ≥3,16000,mmstar_average,0.34969960791558363,
659
+ ≥3,16000,ocrbench_ocrbench_accuracy,0.561,
660
+ ≥3,16000,seedbench_seed_all,0.5334630350194552,
661
+ ≥3,16000,textvqa_val_exact_match,0.54908,0.006745556405979068
662
+ ≥3,17000,ai2d_exact_match,0.42422279792746115,0.008895204147957244
663
+ ≥3,17000,average,0.46637762414930467,
664
+ ≥3,17000,average_rank,2.8,
665
+ ≥3,17000,chartqa_relaxed_overall,0.5904,0.009837166458771298
666
+ ≥3,17000,docvqa_val_anls,0.6216027833511425,0.0061927675114663225
667
+ ≥3,17000,infovqa_val_anls,0.24846818556149347,0.006941645665075416
668
+ ≥3,17000,mme_total_score,1174.2743097238895,
669
+ ≥3,17000,mmmu_val_mmmu_acc,0.30556,
670
+ ≥3,17000,mmstar_average,0.34999517846362294,
671
+ ≥3,17000,ocrbench_ocrbench_accuracy,0.574,
672
+ ≥3,17000,seedbench_seed_all,0.5374096720400222,
673
+ ≥3,17000,textvqa_val_exact_match,0.54574,0.006745181241004187
674
+ ≥3,18000,ai2d_exact_match,0.4264896373056995,0.008901364017155312
675
+ ≥3,18000,average,0.46675721155035865,
676
+ ≥3,18000,average_rank,3.0,
677
+ ≥3,18000,chartqa_relaxed_overall,0.5924,0.009829727637028773
678
+ ≥3,18000,docvqa_val_anls,0.6233196080823749,0.006192343138186881
679
+ ≥3,18000,infovqa_val_anls,0.2450489373084423,0.0069116354204653415
680
+ ≥3,18000,mme_total_score,1183.4800920368148,
681
+ ≥3,18000,mmmu_val_mmmu_acc,0.31,
682
+ ≥3,18000,mmstar_average,0.35095973404159225,
683
+ ≥3,18000,ocrbench_ocrbench_accuracy,0.565,
684
+ ≥3,18000,seedbench_seed_all,0.5385769872151195,
685
+ ≥3,18000,textvqa_val_exact_match,0.54902,0.006747511667477171
686
+ ≥3,19000,ai2d_exact_match,0.4271373056994819,0.008903088856242221
687
+ ≥3,19000,average,0.4695625710163449,
688
+ ≥3,19000,average_rank,3.2,
689
+ ≥3,19000,chartqa_relaxed_overall,0.6,0.009799919151000504
690
+ ≥3,19000,docvqa_val_anls,0.6249883761261851,0.006191311654184014
691
+ ≥3,19000,infovqa_val_anls,0.25697301832944924,0.007011498374179748
692
+ ≥3,19000,mme_total_score,1127.6929771908763,
693
+ ≥3,19000,mmmu_val_mmmu_acc,0.30556,
694
+ ≥3,19000,mmstar_average,0.354252298914167,
695
+ ≥3,19000,ocrbench_ocrbench_accuracy,0.574,
696
+ ≥3,19000,seedbench_seed_all,0.533852140077821,
697
+ ≥3,19000,textvqa_val_exact_match,0.5493,0.006748704394341216
698
+ ≥4,1000,ai2d_exact_match,0.265220207253886,0.00794536023378451
699
+ ≥4,1000,average,0.2793207606664983,
700
+ ≥4,1000,average_rank,2.8,
701
+ ≥4,1000,chartqa_relaxed_overall,0.338,0.009462463489288317
702
+ ≥4,1000,docvqa_val_anls,0.35701229672500506,0.005886294752091894
703
+ ≥4,1000,infovqa_val_anls,0.1702128015434752,0.006226832451986845
704
+ ≥4,1000,mme_total_score,1085.1492597038816,
705
+ ≥4,1000,mmmu_val_mmmu_acc,0.25556,
706
+ ≥4,1000,mmstar_average,0.20140090679073797,
707
+ ≥4,1000,ocrbench_ocrbench_accuracy,0.321,
708
+ ≥4,1000,seedbench_seed_all,0.25314063368538076,
709
+ ≥4,1000,textvqa_val_exact_match,0.35234,0.006541090280099372
710
+ ≥4,2000,ai2d_exact_match,0.265220207253886,0.007945360233784506
711
+ ≥4,2000,average,0.31328817598941605,
712
+ ≥4,2000,average_rank,2.5,
713
+ ≥4,2000,chartqa_relaxed_overall,0.3824,0.009721414421746647
714
+ ≥4,2000,docvqa_val_anls,0.4201305033634262,0.006091444830113075
715
+ ≥4,2000,infovqa_val_anls,0.19137718920242625,0.006520595312619915
716
+ ≥4,2000,mme_total_score,1086.8080232092836,
717
+ ≥4,2000,mmmu_val_mmmu_acc,0.26222,
718
+ ≥4,2000,mmstar_average,0.21611905818172608,
719
+ ≥4,2000,ocrbench_ocrbench_accuracy,0.399,
720
+ ≥4,2000,seedbench_seed_all,0.2679266259032796,
721
+ ≥4,2000,textvqa_val_exact_match,0.4152,0.00671836433098539
722
+ ≥4,3000,ai2d_exact_match,0.26845854922279794,0.007976085014471616
723
+ ≥4,3000,average,0.33960193660592997,
724
+ ≥4,3000,average_rank,3.1,
725
+ ≥4,3000,chartqa_relaxed_overall,0.4464,0.009944363838318645
726
+ ≥4,3000,docvqa_val_anls,0.4407858509014982,0.0060983906556816405
727
+ ≥4,3000,infovqa_val_anls,0.20212796288555795,0.00665038909519927
728
+ ≥4,3000,mme_total_score,1114.6237494998,
729
+ ≥4,3000,mmmu_val_mmmu_acc,0.29111,
730
+ ≥4,3000,mmstar_average,0.24413840162973033,
731
+ ≥4,3000,ocrbench_ocrbench_accuracy,0.437,
732
+ ≥4,3000,seedbench_seed_all,0.29399666481378545,
733
+ ≥4,3000,textvqa_val_exact_match,0.4324,0.006747677691683722
734
+ ≥4,4000,ai2d_exact_match,0.30569948186528495,0.008291875663892657
735
+ ≥4,4000,average,0.3669898926039935,
736
+ ≥4,4000,average_rank,3.4,
737
+ ≥4,4000,chartqa_relaxed_overall,0.4784,0.009992663174896409
738
+ ≥4,4000,docvqa_val_anls,0.48059829461468456,0.006293877686600319
739
+ ≥4,4000,infovqa_val_anls,0.21045581097075505,0.006806991633534936
740
+ ≥4,4000,mme_total_score,1055.9580832332933,
741
+ ≥4,4000,mmmu_val_mmmu_acc,0.26778,
742
+ ≥4,4000,mmstar_average,0.2427667077973349,
743
+ ≥4,4000,ocrbench_ocrbench_accuracy,0.465,
744
+ ≥4,4000,seedbench_seed_all,0.41172873818788214,
745
+ ≥4,4000,textvqa_val_exact_match,0.4404799999999999,0.0067545391167522255
746
+ ≥4,5000,ai2d_exact_match,0.33678756476683935,0.008506208807020249
747
+ ≥4,5000,average,0.39120653632259295,
748
+ ≥4,5000,average_rank,3.6,
749
+ ≥4,5000,chartqa_relaxed_overall,0.49,0.01
750
+ ≥4,5000,docvqa_val_anls,0.5128995224283958,0.0062990418586802415
751
+ ≥4,5000,infovqa_val_anls,0.2258500728355162,0.006992798311741374
752
+ ≥4,5000,mme_total_score,1159.296218487395,
753
+ ≥4,5000,mmmu_val_mmmu_acc,0.27889,
754
+ ≥4,5000,mmstar_average,0.27755131111938897,
755
+ ≥4,5000,ocrbench_ocrbench_accuracy,0.485,
756
+ ≥4,5000,seedbench_seed_all,0.4526403557531962,
757
+ ≥4,5000,textvqa_val_exact_match,0.46124000000000004,0.006800410399109693
758
+ ≥4,6000,ai2d_exact_match,0.3555699481865285,0.008615532040064745
759
+ ≥4,6000,average,0.4066720136664597,
760
+ ≥4,6000,average_rank,3.8,
761
+ ≥4,6000,chartqa_relaxed_overall,0.5164,0.009996618876179197
762
+ ≥4,6000,docvqa_val_anls,0.5222903438245158,0.006294204987152361
763
+ ≥4,6000,infovqa_val_anls,0.2154967392860663,0.006755296318500426
764
+ ≥4,6000,mme_total_score,1106.7833133253303,
765
+ ≥4,6000,mmmu_val_mmmu_acc,0.28444,
766
+ ≥4,6000,mmstar_average,0.3133409749695097,
767
+ ≥4,6000,ocrbench_ocrbench_accuracy,0.497,
768
+ ≥4,6000,seedbench_seed_all,0.47821011673151753,
769
+ ≥4,6000,textvqa_val_exact_match,0.4773,0.006792419995397721
770
+ ≥4,7000,ai2d_exact_match,0.36139896373056996,0.00864649204396549
771
+ ≥4,7000,average,0.4103350463820299,
772
+ ≥4,7000,average_rank,4.0,
773
+ ≥4,7000,chartqa_relaxed_overall,0.5172,0.009996080864671974
774
+ ≥4,7000,docvqa_val_anls,0.5257230333803197,0.006272237880714654
775
+ ≥4,7000,infovqa_val_anls,0.23099276568068242,0.006959690609260986
776
+ ≥4,7000,mme_total_score,1121.3700480192078,
777
+ ≥4,7000,mmmu_val_mmmu_acc,0.28444,
778
+ ≥4,7000,mmstar_average,0.2974223555916661,
779
+ ≥4,7000,ocrbench_ocrbench_accuracy,0.508,
780
+ ≥4,7000,seedbench_seed_all,0.48893829905503056,
781
+ ≥4,7000,textvqa_val_exact_match,0.4789,0.006778594830020003
782
+ ≥4,8000,ai2d_exact_match,0.3753238341968912,0.008714896333400902
783
+ ≥4,8000,average,0.4184757440883718,
784
+ ≥4,8000,average_rank,4.4,
785
+ ≥4,8000,chartqa_relaxed_overall,0.546,0.009959582185560013
786
+ ≥4,8000,docvqa_val_anls,0.5520377547382946,0.006292207241711801
787
+ ≥4,8000,infovqa_val_anls,0.21912642797830967,0.006806034933740873
788
+ ≥4,8000,mme_total_score,1115.3978591436573,
789
+ ≥4,8000,mmmu_val_mmmu_acc,0.27222,
790
+ ≥4,8000,mmstar_average,0.30256719850330677,
791
+ ≥4,8000,ocrbench_ocrbench_accuracy,0.501,
792
+ ≥4,8000,seedbench_seed_all,0.49966648137854364,
793
+ ≥4,8000,textvqa_val_exact_match,0.49834000000000006,0.006795725513663086
794
+ ≥4,9000,ai2d_exact_match,0.37240932642487046,0.008701221016094279
795
+ ≥4,9000,average,0.42372729918985835,
796
+ ≥4,9000,average_rank,4.1,
797
+ ≥4,9000,chartqa_relaxed_overall,0.5412,0.009967987174315731
798
+ ≥4,9000,docvqa_val_anls,0.5552693199281612,0.0063045997581429696
799
+ ≥4,9000,infovqa_val_anls,0.24796577925731056,0.007300649633229033
800
+ ≥4,9000,mme_total_score,1167.27931172469,
801
+ ≥4,9000,mmmu_val_mmmu_acc,0.27444,
802
+ ≥4,9000,mmstar_average,0.3137012670983827,
803
+ ≥4,9000,ocrbench_ocrbench_accuracy,0.51,
804
+ ≥4,9000,seedbench_seed_all,0.5,
805
+ ≥4,9000,textvqa_val_exact_match,0.49855999999999995,0.0067982960104521145
806
+ ≥4,10000,ai2d_exact_match,0.40025906735751293,0.008818284784223729
807
+ ≥4,10000,average,0.43549956089714104,
808
+ ≥4,10000,average_rank,3.2,
809
+ ≥4,10000,chartqa_relaxed_overall,0.554,0.009943497838271193
810
+ ≥4,10000,docvqa_val_anls,0.5744988238115305,0.006278177576359751
811
+ ≥4,10000,infovqa_val_anls,0.2383657067942184,0.0070367443431813784
812
+ ≥4,10000,mme_total_score,1164.3313325330132,
813
+ ≥4,10000,mmmu_val_mmmu_acc,0.28889,
814
+ ≥4,10000,mmstar_average,0.32102500708710546,
815
+ ≥4,10000,ocrbench_ocrbench_accuracy,0.523,
816
+ ≥4,10000,seedbench_seed_all,0.5153974430239021,
817
+ ≥4,10000,textvqa_val_exact_match,0.5040600000000001,0.006783679454498736
818
+ ≥4,11000,ai2d_exact_match,0.38795336787564766,0.00877028496444078
819
+ ≥4,11000,average,0.44176056966808264,
820
+ ≥4,11000,average_rank,3.4,
821
+ ≥4,11000,chartqa_relaxed_overall,0.5736,0.009893046292521752
822
+ ≥4,11000,docvqa_val_anls,0.5809715962307931,0.006265890767561776
823
+ ≥4,11000,infovqa_val_anls,0.23536745207392892,0.006978481065319724
824
+ ≥4,11000,mme_total_score,1179.1470588235293,
825
+ ≥4,11000,mmmu_val_mmmu_acc,0.29,
826
+ ≥4,11000,mmstar_average,0.3247706096650586,
827
+ ≥4,11000,ocrbench_ocrbench_accuracy,0.551,
828
+ ≥4,11000,seedbench_seed_all,0.5077821011673151,
829
+ ≥4,11000,textvqa_val_exact_match,0.5244000000000001,0.006777175962213506
830
+ ≥4,12000,ai2d_exact_match,0.4051165803108808,0.008835632146152574
831
+ ≥4,12000,average,0.45346905961254286,
832
+ ≥4,12000,average_rank,2.7,
833
+ ≥4,12000,chartqa_relaxed_overall,0.566,0.00991448025705367
834
+ ≥4,12000,docvqa_val_anls,0.5921021144001818,0.006234512965080029
835
+ ≥4,12000,infovqa_val_anls,0.2538480181050717,0.007182280029496359
836
+ ≥4,12000,mme_total_score,1209.4413765506204,
837
+ ≥4,12000,mmmu_val_mmmu_acc,0.3,
838
+ ≥4,12000,mmstar_average,0.35097126616478974,
839
+ ≥4,12000,ocrbench_ocrbench_accuracy,0.564,
840
+ ≥4,12000,seedbench_seed_all,0.5264035575319622,
841
+ ≥4,12000,textvqa_val_exact_match,0.52278,0.006771483213963052
842
+ ≥4,13000,ai2d_exact_match,0.41321243523316065,0.00886255263438398
843
+ ≥4,13000,average,0.4517297858459223,
844
+ ≥4,13000,average_rank,3.7,
845
+ ≥4,13000,chartqa_relaxed_overall,0.5612,0.009926794069396146
846
+ ≥4,13000,docvqa_val_anls,0.5990624567313981,0.006220638069941083
847
+ ≥4,13000,infovqa_val_anls,0.25690374869936067,0.007271041982190199
848
+ ≥4,13000,mme_total_score,1155.3058223289315,
849
+ ≥4,13000,mmmu_val_mmmu_acc,0.28556,
850
+ ≥4,13000,mmstar_average,0.34272201671869795,
851
+ ≥4,13000,ocrbench_ocrbench_accuracy,0.558,
852
+ ≥4,13000,seedbench_seed_all,0.5253474152306837,
853
+ ≥4,13000,textvqa_val_exact_match,0.52356,0.006774582221277897
854
+ ≥4,14000,ai2d_exact_match,0.42033678756476683,0.008884198538329086
855
+ ≥4,14000,average,0.4566555050830236,
856
+ ≥4,14000,average_rank,2.9,
857
+ ≥4,14000,chartqa_relaxed_overall,0.5804,0.009871844677005952
858
+ ≥4,14000,docvqa_val_anls,0.5986851751311028,0.006220005268131575
859
+ ≥4,14000,infovqa_val_anls,0.26286587749144424,0.007367704534540481
860
+ ≥4,14000,mme_total_score,1171.1798719487795,
861
+ ≥4,14000,mmmu_val_mmmu_acc,0.29889,
862
+ ≥4,14000,mmstar_average,0.34666697515411715,
863
+ ≥4,14000,ocrbench_ocrbench_accuracy,0.561,
864
+ ≥4,14000,seedbench_seed_all,0.526514730405781,
865
+ ≥4,14000,textvqa_val_exact_match,0.51454,0.006775138240629185
866
+ ≥4,15000,ai2d_exact_match,0.4167746113989637,0.008873613803189377
867
+ ≥4,15000,average,0.4535696515209742,
868
+ ≥4,15000,average_rank,4.0,
869
+ ≥4,15000,chartqa_relaxed_overall,0.5788,0.009877005927832552
870
+ ≥4,15000,docvqa_val_anls,0.6057110103452333,0.006218006894177394
871
+ ≥4,15000,infovqa_val_anls,0.24773239789580184,0.007161205474173786
872
+ ≥4,15000,mme_total_score,1166.4441776710685,
873
+ ≥4,15000,mmmu_val_mmmu_acc,0.28333,
874
+ ≥4,15000,mmstar_average,0.3337998223700587,
875
+ ≥4,15000,ocrbench_ocrbench_accuracy,0.552,
876
+ ≥4,15000,seedbench_seed_all,0.5302390216787104,
877
+ ≥4,15000,textvqa_val_exact_match,0.53374,0.006755401510636802
878
+ ≥4,16000,ai2d_exact_match,0.4180699481865285,0.008877517831066049
879
+ ≥4,16000,average,0.45867291897271806,
880
+ ≥4,16000,average_rank,3.3,
881
+ ≥4,16000,chartqa_relaxed_overall,0.5772,0.009882060820012199
882
+ ≥4,16000,docvqa_val_anls,0.6103972706562478,0.0062217369721592605
883
+ ≥4,16000,infovqa_val_anls,0.2588879789412925,0.007244086191292801
884
+ ≥4,16000,mme_total_score,1221.7044817927172,
885
+ ≥4,16000,mmmu_val_mmmu_acc,0.28444,
886
+ ≥4,16000,mmstar_average,0.3519979156607769,
887
+ ≥4,16000,ocrbench_ocrbench_accuracy,0.566,
888
+ ≥4,16000,seedbench_seed_all,0.5296831573096165,
889
+ ≥4,16000,textvqa_val_exact_match,0.5313800000000001,0.006770452490445899
890
+ ≥4,17000,ai2d_exact_match,0.42033678756476683,0.008884198538329094
891
+ ≥4,17000,average,0.45851865563425853,
892
+ ≥4,17000,average_rank,4.0,
893
+ ≥4,17000,chartqa_relaxed_overall,0.5852,0.009855721084488851
894
+ ≥4,17000,docvqa_val_anls,0.6114594059922879,0.00618088682319735
895
+ ≥4,17000,infovqa_val_anls,0.2562207626214487,0.007200492820461838
896
+ ≥4,17000,mme_total_score,1153.1200480192078,
897
+ ≥4,17000,mmmu_val_mmmu_acc,0.28667,
898
+ ≥4,17000,mmstar_average,0.3472831568700119,
899
+ ≥4,17000,ocrbench_ocrbench_accuracy,0.559,
900
+ ≥4,17000,seedbench_seed_all,0.532017787659811,
901
+ ≥4,17000,textvqa_val_exact_match,0.5284800000000001,0.006763351721247465
902
+ ≥4,18000,ai2d_exact_match,0.41483160621761656,0.008867639612484157
903
+ ≥4,18000,average,0.4608796157707856,
904
+ ≥4,18000,average_rank,3.6,
905
+ ≥4,18000,chartqa_relaxed_overall,0.5908,0.009835692163550793
906
+ ≥4,18000,docvqa_val_anls,0.6149471910441747,0.006205844131422159
907
+ ≥4,18000,infovqa_val_anls,0.25757413998054934,0.007209471673907614
908
+ ≥4,18000,mme_total_score,1208.4107643057223,
909
+ ≥4,18000,mmmu_val_mmmu_acc,0.28333,
910
+ ≥4,18000,mmstar_average,0.34716850185982184,
911
+ ≥4,18000,ocrbench_ocrbench_accuracy,0.571,
912
+ ≥4,18000,seedbench_seed_all,0.5331851028349083,
913
+ ≥4,18000,textvqa_val_exact_match,0.53508,0.006764601295430295
914
+ ≥4,19000,ai2d_exact_match,0.4261658031088083,0.008900495747130163
915
+ ≥4,19000,average,0.4632638192773651,
916
+ ≥4,19000,average_rank,3.8,
917
+ ≥4,19000,chartqa_relaxed_overall,0.59,0.009838634025503496
918
+ ≥4,19000,docvqa_val_anls,0.6191514566027063,0.006184609999615919
919
+ ≥4,19000,infovqa_val_anls,0.254771484204865,0.007169865220056487
920
+ ≥4,19000,mme_total_score,1155.6170468187274,
921
+ ≥4,19000,mmmu_val_mmmu_acc,0.28667,
922
+ ≥4,19000,mmstar_average,0.3639228113475546,
923
+ ≥4,19000,ocrbench_ocrbench_accuracy,0.559,
924
+ ≥4,19000,seedbench_seed_all,0.5310728182323513,
925
+ ≥4,19000,textvqa_val_exact_match,0.5386200000000001,0.0067549703955686585
926
+ ≥5,1000,ai2d_exact_match,0.25971502590673573,0.007891865786132416
927
+ ≥5,1000,average,0.26157782335380847,
928
+ ≥5,1000,average_rank,4.4,
929
+ ≥5,1000,chartqa_relaxed_overall,0.2932,0.009106408439657643
930
+ ≥5,1000,docvqa_val_anls,0.33552834229551537,0.005637187546463478
931
+ ≥5,1000,infovqa_val_anls,0.14237853126797878,0.005593365396144926
932
+ ≥5,1000,mme_total_score,968.6369547819128,
933
+ ≥5,1000,mmmu_val_mmmu_acc,0.24556,
934
+ ≥5,1000,mmstar_average,0.22142932783466918,
935
+ ≥5,1000,ocrbench_ocrbench_accuracy,0.312,
936
+ ≥5,1000,seedbench_seed_all,0.2525291828793774,
937
+ ≥5,1000,textvqa_val_exact_match,0.29186,0.006221486113292513
938
+ ≥5,2000,ai2d_exact_match,0.25809585492227977,0.007875825748825005
939
+ ≥5,2000,average,0.3027645397000401,
940
+ ≥5,2000,average_rank,3.7,
941
+ ≥5,2000,chartqa_relaxed_overall,0.3408,0.009481461028833927
942
+ ≥5,2000,docvqa_val_anls,0.4068080236104559,0.00608411485086675
943
+ ≥5,2000,infovqa_val_anls,0.16040581942520143,0.005930443116029954
944
+ ≥5,2000,mme_total_score,1068.6808723489396,
945
+ ≥5,2000,mmmu_val_mmmu_acc,0.24556,
946
+ ≥5,2000,mmstar_average,0.2341315706820568,
947
+ ≥5,2000,ocrbench_ocrbench_accuracy,0.416,
948
+ ≥5,2000,seedbench_seed_all,0.26725958866036686,
949
+ ≥5,2000,textvqa_val_exact_match,0.39582,0.006679538065297116
950
+ ≥5,3000,ai2d_exact_match,0.2616580310880829,0.007910929195141643
951
+ ≥5,3000,average,0.33060384994476777,
952
+ ≥5,3000,average_rank,3.6,
953
+ ≥5,3000,chartqa_relaxed_overall,0.3832,0.009725273074549106
954
+ ≥5,3000,docvqa_val_anls,0.44828350858716837,0.006213256822478988
955
+ ≥5,3000,infovqa_val_anls,0.18913153904026223,0.0065079547987124675
956
+ ≥5,3000,mme_total_score,1154.7704081632653,
957
+ ≥5,3000,mmmu_val_mmmu_acc,0.26667,
958
+ ≥5,3000,mmstar_average,0.25122915833603465,
959
+ ≥5,3000,ocrbench_ocrbench_accuracy,0.433,
960
+ ≥5,3000,seedbench_seed_all,0.31634241245136185,
961
+ ≥5,3000,textvqa_val_exact_match,0.42591999999999997,0.006732985767062625
962
+ ≥5,4000,ai2d_exact_match,0.30926165803108807,0.008318624237265801
963
+ ≥5,4000,average,0.3627994699316965,
964
+ ≥5,4000,average_rank,3.0,
965
+ ≥5,4000,chartqa_relaxed_overall,0.4268,0.009894233792716745
966
+ ≥5,4000,docvqa_val_anls,0.46900891865422867,0.006307491163968653
967
+ ≥5,4000,infovqa_val_anls,0.1867790227760032,0.006445564559285368
968
+ ≥5,4000,mme_total_score,1174.8784513805522,
969
+ ≥5,4000,mmmu_val_mmmu_acc,0.28333,
970
+ ≥5,4000,mmstar_average,0.2869697933480728,
971
+ ≥5,4000,ocrbench_ocrbench_accuracy,0.454,
972
+ ≥5,4000,seedbench_seed_all,0.41050583657587547,
973
+ ≥5,4000,textvqa_val_exact_match,0.43854,0.0067614552527859775
974
+ ≥5,5000,ai2d_exact_match,0.34229274611398963,0.008539783270456082
975
+ ≥5,5000,average,0.382476547981286,
976
+ ≥5,5000,average_rank,3.0,
977
+ ≥5,5000,chartqa_relaxed_overall,0.4572,0.00996528909739792
978
+ ≥5,5000,docvqa_val_anls,0.4793408867564976,0.006287275417010131
979
+ ≥5,5000,infovqa_val_anls,0.1793224766992334,0.006318821081615601
980
+ ≥5,5000,mme_total_score,1266.0171068427371,
981
+ ≥5,5000,mmmu_val_mmmu_acc,0.28333,
982
+ ≥5,5000,mmstar_average,0.2965278361584629,
983
+ ≥5,5000,ocrbench_ocrbench_accuracy,0.496,
984
+ ≥5,5000,seedbench_seed_all,0.45497498610339077,
985
+ ≥5,5000,textvqa_val_exact_match,0.4532999999999999,0.006785206688521816
986
+ ≥5,6000,ai2d_exact_match,0.3633419689119171,0.008656504892172956
987
+ ≥5,6000,average,0.3927428387872692,
988
+ ≥5,6000,average_rank,3.3,
989
+ ≥5,6000,chartqa_relaxed_overall,0.4496,0.009951057502505313
990
+ ≥5,6000,docvqa_val_anls,0.4903665904735212,0.0062875635436497905
991
+ ≥5,6000,infovqa_val_anls,0.19061425722983663,0.006415658242163764
992
+ ≥5,6000,mme_total_score,1291.9338735494198,
993
+ ≥5,6000,mmmu_val_mmmu_acc,0.29333,
994
+ ≥5,6000,mmstar_average,0.31955832446570054,
995
+ ≥5,6000,ocrbench_ocrbench_accuracy,0.486,
996
+ ≥5,6000,seedbench_seed_all,0.4819344080044469,
997
+ ≥5,6000,textvqa_val_exact_match,0.45993999999999996,0.006786622940033285
998
+ ≥5,7000,ai2d_exact_match,0.34650259067357514,0.00856459563872305
999
+ ≥5,7000,average,0.4018210435787178,
1000
+ ≥5,7000,average_rank,3.7,
1001
+ ≥5,7000,chartqa_relaxed_overall,0.4672,0.009980456292330589
1002
+ ≥5,7000,docvqa_val_anls,0.5111773927622861,0.006315277614665012
1003
+ ≥5,7000,infovqa_val_anls,0.1940751656275074,0.006487614451910199
1004
+ ≥5,7000,mme_total_score,1190.4136654661866,
1005
+ ≥5,7000,mmmu_val_mmmu_acc,0.30333,
1006
+ ≥5,7000,mmstar_average,0.3269684121278599,
1007
+ ≥5,7000,ocrbench_ocrbench_accuracy,0.495,
1008
+ ≥5,7000,seedbench_seed_all,0.4924958310172318,
1009
+ ≥5,7000,textvqa_val_exact_match,0.47963999999999996,0.006798760086055511
1010
+ ≥5,8000,ai2d_exact_match,0.37694300518134716,0.008722348153640555
1011
+ ≥5,8000,average,0.4095380208793758,
1012
+ ≥5,8000,average_rank,3.9,
1013
+ ≥5,8000,chartqa_relaxed_overall,0.482,0.009995517202509246
1014
+ ≥5,8000,docvqa_val_anls,0.525391623371338,0.0063183015218023705
1015
+ ≥5,8000,infovqa_val_anls,0.19303661546973522,0.0064531126694776005
1016
+ ≥5,8000,mme_total_score,1202.8684473789515,
1017
+ ≥5,8000,mmmu_val_mmmu_acc,0.29556,
1018
+ ≥5,8000,mmstar_average,0.31571686940613636,
1019
+ ≥5,8000,ocrbench_ocrbench_accuracy,0.51,
1020
+ ≥5,8000,seedbench_seed_all,0.5013340744858255,
1021
+ ≥5,8000,textvqa_val_exact_match,0.48586,0.006796708845479998
1022
+ ≥5,9000,ai2d_exact_match,0.36755181347150256,0.008677676304542971
1023
+ ≥5,9000,average,0.4152301884271321,
1024
+ ≥5,9000,average_rank,3.6,
1025
+ ≥5,9000,chartqa_relaxed_overall,0.48,0.009993995796516643
1026
+ ≥5,9000,docvqa_val_anls,0.5348533191278194,0.006304701269106288
1027
+ ≥5,9000,infovqa_val_anls,0.19959306843788593,0.006473992753624113
1028
+ ≥5,9000,mme_total_score,1204.311424569828,
1029
+ ≥5,9000,mmmu_val_mmmu_acc,0.30111,
1030
+ ≥5,9000,mmstar_average,0.33670210514605864,
1031
+ ≥5,9000,ocrbench_ocrbench_accuracy,0.526,
1032
+ ≥5,9000,seedbench_seed_all,0.5025013896609227,
1033
+ ≥5,9000,textvqa_val_exact_match,0.48876,0.006790814053639094
1034
+ ≥5,10000,ai2d_exact_match,0.37338082901554404,0.008705816961084262
1035
+ ≥5,10000,average,0.4147710702725824,
1036
+ ≥5,10000,average_rank,4.3,
1037
+ ≥5,10000,chartqa_relaxed_overall,0.5004,0.010001997399559365
1038
+ ≥5,10000,docvqa_val_anls,0.5453333593332199,0.006307378137253011
1039
+ ≥5,10000,infovqa_val_anls,0.19201908600396586,0.006324501041207469
1040
+ ≥5,10000,mme_total_score,1201.624049619848,
1041
+ ≥5,10000,mmmu_val_mmmu_acc,0.28778,
1042
+ ≥5,10000,mmstar_average,0.3212133619915626,
1043
+ ≥5,10000,ocrbench_ocrbench_accuracy,0.518,
1044
+ ≥5,10000,seedbench_seed_all,0.5073929961089494,
1045
+ ≥5,10000,textvqa_val_exact_match,0.48741999999999996,0.006796262690428575
1046
+ ≥5,11000,ai2d_exact_match,0.39216321243523317,0.008787363693921278
1047
+ ≥5,11000,average,0.4233003259407862,
1048
+ ≥5,11000,average_rank,4.1,
1049
+ ≥5,11000,chartqa_relaxed_overall,0.4996,0.010001997399559365
1050
+ ≥5,11000,docvqa_val_anls,0.551420413629631,0.006320257790796602
1051
+ ≥5,11000,infovqa_val_anls,0.210676410854829,0.006763210440361733
1052
+ ≥5,11000,mme_total_score,1205.9969987995198,
1053
+ ≥5,11000,mmmu_val_mmmu_acc,0.28444,
1054
+ ≥5,11000,mmstar_average,0.3293872434067487,
1055
+ ≥5,11000,ocrbench_ocrbench_accuracy,0.529,
1056
+ ≥5,11000,seedbench_seed_all,0.5161756531406337,
1057
+ ≥5,11000,textvqa_val_exact_match,0.49684000000000006,0.0068038269593118286
1058
+ ≥5,12000,ai2d_exact_match,0.3866580310880829,0.008764891499284331
1059
+ ≥5,12000,average,0.42915630067456684,
1060
+ ≥5,12000,average_rank,4.2,
1061
+ ≥5,12000,chartqa_relaxed_overall,0.5208,0.00999334232158103
1062
+ ≥5,12000,docvqa_val_anls,0.5651676550208474,0.006302610383880636
1063
+ ≥5,12000,infovqa_val_anls,0.2027930391809884,0.006544451575065131
1064
+ ≥5,12000,mme_total_score,1229.9349739895958,
1065
+ ≥5,12000,mmmu_val_mmmu_acc,0.28778,
1066
+ ≥5,12000,mmstar_average,0.32392609084232776,
1067
+ ≥5,12000,ocrbench_ocrbench_accuracy,0.543,
1068
+ ≥5,12000,seedbench_seed_all,0.523401889938855,
1069
+ ≥5,12000,textvqa_val_exact_match,0.50888,0.006783556032531116
1070
+ ≥5,13000,ai2d_exact_match,0.3960492227979275,0.008802520399129762
1071
+ ≥5,13000,average,0.42835710337207544,
1072
+ ≥5,13000,average_rank,4.2,
1073
+ ≥5,13000,chartqa_relaxed_overall,0.5016,0.010001949389825897
1074
+ ≥5,13000,docvqa_val_anls,0.5709600314067668,0.006314249102846677
1075
+ ≥5,13000,infovqa_val_anls,0.20954434018332707,0.006654090452675221
1076
+ ≥5,13000,mme_total_score,1299.5349139655862,
1077
+ ≥5,13000,mmmu_val_mmmu_acc,0.28889,
1078
+ ≥5,13000,mmstar_average,0.3229485683119639,
1079
+ ≥5,13000,ocrbench_ocrbench_accuracy,0.531,
1080
+ ≥5,13000,seedbench_seed_all,0.5271817676486937,
1081
+ ≥5,13000,textvqa_val_exact_match,0.50704,0.0067891167394013964
1082
+ ≥5,14000,ai2d_exact_match,0.39993523316062174,0.00881709625708285
1083
+ ≥5,14000,average,0.4331521956786839,
1084
+ ≥5,14000,average_rank,4.4,
1085
+ ≥5,14000,chartqa_relaxed_overall,0.5184,0.009995225751083666
1086
+ ≥5,14000,docvqa_val_anls,0.5724182420273719,0.006294497356115864
1087
+ ≥5,14000,infovqa_val_anls,0.20486077238494155,0.0066055382910337555
1088
+ ≥5,14000,mme_total_score,1249.124649859944,
1089
+ ≥5,14000,mmmu_val_mmmu_acc,0.29778,
1090
+ ≥5,14000,mmstar_average,0.3186280983045363,
1091
+ ≥5,14000,ocrbench_ocrbench_accuracy,0.545,
1092
+ ≥5,14000,seedbench_seed_all,0.5253474152306837,
1093
+ ≥5,14000,textvqa_val_exact_match,0.516,0.006773328250950121
1094
+ ≥5,15000,ai2d_exact_match,0.4015544041450777,0.008822998789014788
1095
+ ≥5,15000,average,0.4411473449525349,
1096
+ ≥5,15000,average_rank,3.9,
1097
+ ≥5,15000,chartqa_relaxed_overall,0.5284,0.009985853138573692
1098
+ ≥5,15000,docvqa_val_anls,0.579157353735036,0.0062656254347961005
1099
+ ≥5,15000,infovqa_val_anls,0.21477765510878127,0.0066658509229973765
1100
+ ≥5,15000,mme_total_score,1272.857042817127,
1101
+ ≥5,15000,mmmu_val_mmmu_acc,0.29778,
1102
+ ≥5,15000,mmstar_average,0.32345739197302437,
1103
+ ≥5,15000,ocrbench_ocrbench_accuracy,0.574,
1104
+ ≥5,15000,seedbench_seed_all,0.5307392996108949,
1105
+ ≥5,15000,textvqa_val_exact_match,0.5204599999999999,0.006785535084079623
1106
+ ≥5,16000,ai2d_exact_match,0.405440414507772,0.008836756671878079
1107
+ ≥5,16000,average,0.43998232270106674,
1108
+ ≥5,16000,average_rank,4.6,
1109
+ ≥5,16000,chartqa_relaxed_overall,0.5352,0.009977184055667825
1110
+ ≥5,16000,docvqa_val_anls,0.5773546915859126,0.006282997503331479
1111
+ ≥5,16000,infovqa_val_anls,0.21908996623791824,0.006795378209171745
1112
+ ≥5,16000,mme_total_score,1216.8606442577031,
1113
+ ≥5,16000,mmmu_val_mmmu_acc,0.29444,
1114
+ ≥5,16000,mmstar_average,0.32497171858166657,
1115
+ ≥5,16000,ocrbench_ocrbench_accuracy,0.554,
1116
+ ≥5,16000,seedbench_seed_all,0.5274041133963313,
1117
+ ≥5,16000,textvqa_val_exact_match,0.52194,0.006778238427735974
1118
+ ≥5,17000,ai2d_exact_match,0.40479274611398963,0.008834503632021165
1119
+ ≥5,17000,average,0.44388314606679896,
1120
+ ≥5,17000,average_rank,4.3,
1121
+ ≥5,17000,chartqa_relaxed_overall,0.5364,0.009975460887997665
1122
+ ≥5,17000,docvqa_val_anls,0.5822797934751336,0.00626122992985784
1123
+ ≥5,17000,infovqa_val_anls,0.21556119368142682,0.0067540438882146454
1124
+ ≥5,17000,mme_total_score,1221.2239895958382,
1125
+ ≥5,17000,mmmu_val_mmmu_acc,0.29778,
1126
+ ≥5,17000,mmstar_average,0.33263595987427624,
1127
+ ≥5,17000,ocrbench_ocrbench_accuracy,0.564,
1128
+ ≥5,17000,seedbench_seed_all,0.5335186214563646,
1129
+ ≥5,17000,textvqa_val_exact_match,0.52798,0.006783526160149534
1130
+ ≥5,18000,ai2d_exact_match,0.4073834196891192,0.008843420154535592
1131
+ ≥5,18000,average,0.44456504203931085,
1132
+ ≥5,18000,average_rank,4.3,
1133
+ ≥5,18000,chartqa_relaxed_overall,0.542,0.009966651075133582
1134
+ ≥5,18000,docvqa_val_anls,0.5939080403347998,0.006256065698832867
1135
+ ≥5,18000,infovqa_val_anls,0.217668975074557,0.006723025382951482
1136
+ ≥5,18000,mme_total_score,1263.6669667867147,
1137
+ ≥5,18000,mmmu_val_mmmu_acc,0.28333,
1138
+ ≥5,18000,mmstar_average,0.3216128643225813,
1139
+ ≥5,18000,ocrbench_ocrbench_accuracy,0.568,
1140
+ ≥5,18000,seedbench_seed_all,0.5357420789327404,
1141
+ ≥5,18000,textvqa_val_exact_match,0.5314399999999999,0.006770308168358284
1142
+ ≥5,19000,ai2d_exact_match,0.4060880829015544,0.008838993764195596
1143
+ ≥5,19000,average,0.44569235541726965,
1144
+ ≥5,19000,average_rank,4.1,
1145
+ ≥5,19000,chartqa_relaxed_overall,0.5384,0.009972459876198698
1146
+ ≥5,19000,docvqa_val_anls,0.5872765253291726,0.006247572686109655
1147
+ ≥5,19000,infovqa_val_anls,0.22290098871841885,0.006768484859310975
1148
+ ≥5,19000,mme_total_score,1243.9738895558223,
1149
+ ≥5,19000,mmmu_val_mmmu_acc,0.29222,
1150
+ ≥5,19000,mmstar_average,0.3312170414949971,
1151
+ ≥5,19000,ocrbench_ocrbench_accuracy,0.569,
1152
+ ≥5,19000,seedbench_seed_all,0.535408560311284,
1153
+ ≥5,19000,textvqa_val_exact_match,0.52872,0.006772725173905718
1154
+ ≥5,20000,ai2d_exact_match,0.40867875647668395,0.00884778289870743
1155
+ ≥5,20000,average,0.4447757248308666,
1156
+ ≥5,20000,average_rank,1.9,
1157
+ ≥5,20000,chartqa_relaxed_overall,0.5368,0.009974873595254053
1158
+ ≥5,20000,docvqa_val_anls,0.5881395593641573,0.00625433143624698
1159
+ ≥5,20000,infovqa_val_anls,0.21756373662547837,0.006798638807266341
1160
+ ≥5,20000,mme_total_score,1235.672769107643,
1161
+ ≥5,20000,mmmu_val_mmmu_acc,0.28667,
1162
+ ≥5,20000,mmstar_average,0.32944615805983984,
1163
+ ≥5,20000,ocrbench_ocrbench_accuracy,0.57,
1164
+ ≥5,20000,seedbench_seed_all,0.5339633129516398,
1165
+ ≥5,20000,textvqa_val_exact_match,0.53172,0.006760466633437396
app/src/content/embeds/against-baselines-deduplicated.html ADDED
@@ -0,0 +1,576 @@
1
+ <div class="d3-line" style="width:100%;margin:10px 0;"></div>
2
+ <style>
3
+ .d3-line .d3-line__controls select {
4
+ font-size: 12px;
5
+ padding: 8px 28px 8px 10px;
6
+ border: 1px solid var(--border-color);
7
+ border-radius: 8px;
8
+ background-color: var(--surface-bg);
9
+ color: var(--text-color);
10
+ background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='12' height='12' viewBox='0 0 24 24' fill='none' stroke='%230f1115' stroke-width='2' stroke-linecap='round' stroke-linejoin='round'%3E%3Cpolyline points='6 9 12 15 18 9'/%3E%3C/svg%3E");
11
+ background-repeat: no-repeat;
12
+ background-position: right 8px center;
13
+ background-size: 12px;
14
+ -webkit-appearance: none;
15
+ -moz-appearance: none;
16
+ appearance: none;
17
+ cursor: pointer;
18
+ transition: border-color .15s ease, box-shadow .15s ease;
19
+ }
20
+ [data-theme="dark"] .d3-line .d3-line__controls select {
21
+ background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='12' height='12' viewBox='0 0 24 24' fill='none' stroke='%23ffffff' stroke-width='2' stroke-linecap='round' stroke-linejoin='round'%3E%3Cpolyline points='6 9 12 15 18 9'/%3E%3C/svg%3E");
22
+ }
23
+ .d3-line .d3-line__controls select:hover {
24
+ border-color: var(--primary-color);
25
+ }
26
+ .d3-line .d3-line__controls select:focus {
27
+ border-color: var(--primary-color);
28
+ box-shadow: 0 0 0 3px rgba(232,137,171,.25);
29
+ outline: none;
30
+ }
31
+ .d3-line .d3-line__controls label { gap: 8px; }
32
+
33
+ /* Range slider themed with --primary-color */
34
+ .d3-line .d3-line__controls input[type="range"] {
35
+ -webkit-appearance: none;
36
+ appearance: none;
37
+ width: 100%;
38
+ height: 6px;
39
+ border-radius: 999px;
40
+ background: var(--border-color);
41
+ outline: none;
42
+ }
43
+ .d3-line .d3-line__controls input[type="range"]::-webkit-slider-runnable-track {
44
+ height: 6px;
45
+ background: transparent;
46
+ border-radius: 999px;
47
+ }
48
+ .d3-line .d3-line__controls input[type="range"]::-webkit-slider-thumb {
49
+ -webkit-appearance: none;
50
+ appearance: none;
51
+ width: 16px;
52
+ height: 16px;
53
+ border-radius: 50%;
54
+ background: var(--primary-color);
55
+ border: 2px solid var(--on-primary);
56
+ margin-top: -5px;
57
+ cursor: pointer;
58
+ }
59
+ .d3-line .d3-line__controls input[type="range"]::-moz-range-track {
60
+ height: 6px;
61
+ background: transparent;
62
+ border-radius: 999px;
63
+ }
64
+ .d3-line .d3-line__controls input[type="range"]::-moz-range-thumb {
65
+ width: 16px;
66
+ height: 16px;
67
+ border-radius: 50%;
68
+ background: var(--primary-color);
69
+ border: 2px solid var(--on-primary);
70
+ cursor: pointer;
71
+ }
72
+ /* Improved line color via CSS */
73
+ .d3-line .lines path.improved { stroke: var(--primary-color); }
74
+ </style>
75
+ <script>
76
+ (() => {
77
+ const ensureD3 = (cb) => {
78
+ if (window.d3 && typeof window.d3.select === 'function') return cb();
79
+ let s = document.getElementById('d3-cdn-script');
80
+ if (!s) {
81
+ s = document.createElement('script');
82
+ s.id = 'd3-cdn-script';
83
+ s.src = 'https://cdn.jsdelivr.net/npm/d3@7/dist/d3.min.js';
84
+ document.head.appendChild(s);
85
+ }
86
+ const onReady = () => { if (window.d3 && typeof window.d3.select === 'function') cb(); };
87
+ s.addEventListener('load', onReady, { once: true });
88
+ if (window.d3) onReady();
89
+ };
90
+
91
+ const bootstrap = () => {
92
+ const mount = document.currentScript ? document.currentScript.previousElementSibling : null;
93
+ const container = (mount && mount.querySelector && mount.querySelector('.d3-line')) || document.querySelector('.d3-line');
94
+ if (!container) return;
95
+ if (container.dataset) {
96
+ if (container.dataset.mounted === 'true') return;
97
+ container.dataset.mounted = 'true';
98
+ }
99
+
100
+ // CSV: prefer public path, fallback to relative
101
+ const CSV_PATHS = [
102
+ '/data/against_baselines_deduplicated.csv',
103
+ './assets/data/against_baselines_deduplicated.csv',
104
+ '../assets/data/against_baselines_deduplicated.csv',
105
+ '../../assets/data/against_baselines_deduplicated.csv'
106
+ ];
107
+ const fetchFirstAvailable = async (paths) => {
108
+ for (const p of paths) {
109
+ try { const r = await fetch(p, { cache: 'no-cache' }); if (r.ok) return await r.text(); } catch(e) {}
110
+ }
111
+ throw new Error('CSV not found: against_baselines_deduplicated.csv');
112
+ };
113
+
114
+ // Controls UI
115
+ const controls = document.createElement('div');
116
+ controls.className = 'd3-line__controls';
117
+ Object.assign(controls.style, {
118
+ marginTop: '12px',
119
+ display: 'flex',
120
+ gap: '16px',
121
+ alignItems: 'center',
122
+ justifyContent: 'space-between',
123
+ width: '100%'
124
+ });
125
+
126
+ const labelMetric = document.createElement('label');
127
+ Object.assign(labelMetric.style, {
128
+ fontSize: '12px', color: 'var(--muted-color)', display: 'flex', alignItems: 'center', gap: '6px', whiteSpace: 'nowrap', padding: '6px 10px', marginLeft: 'auto'
129
+ });
130
+ labelMetric.textContent = 'Metric';
131
+ const selectMetric = document.createElement('select');
132
+ Object.assign(selectMetric.style, { fontSize: '12px' });
133
+ labelMetric.appendChild(selectMetric);
134
+
135
+ // Inline legend on the left of the select (appended first; the label is pushed right via marginLeft: 'auto')
136
+ const legendInline = document.createElement('div');
137
+ legendInline.className = 'controls__legend';
138
+ Object.assign(legendInline.style, {
139
+ display: 'flex',
140
+ gap: '8px',
141
+ alignItems: 'center',
142
+ flexWrap: 'nowrap',
143
+ fontSize: '11px',
144
+ marginLeft: '8px'
145
+ });
146
+ controls.appendChild(legendInline);
147
+ controls.appendChild(labelMetric);
148
+
149
+ // Create SVG with marker definitions
150
+ const svg = d3.select(container).append('svg')
151
+ .attr('width', '100%')
152
+ .style('display', 'block');
153
+
154
+ // Add marker definitions for different shapes
155
+ const defs = svg.append('defs');
156
+
157
+ // Academic marker shapes
158
+ const markerShapes = ['circle', 'square', 'triangle', 'diamond', 'inverted-triangle'];
159
+ const markerSize = 8;
160
+
161
+ // Groups
162
+ const gRoot = svg.append('g');
163
+ const gGrid = gRoot.append('g').attr('class', 'grid');
164
+ const gAxes = gRoot.append('g').attr('class', 'axes');
165
+ const gLines = gRoot.append('g').attr('class', 'lines');
166
+ const gPoints = gRoot.append('g').attr('class', 'points');
167
+ const gHover = gRoot.append('g').attr('class', 'hover');
168
+ const gLegend = gRoot.append('foreignObject').attr('class', 'legend');
169
+
170
+ // Tooltip
171
+ container.style.position = container.style.position || 'relative';
172
+ let tip = container.querySelector('.d3-tooltip');
173
+ let tipInner;
174
+ if (!tip) {
175
+ tip = document.createElement('div');
176
+ tip.className = 'd3-tooltip';
177
+ Object.assign(tip.style, {
178
+ position: 'absolute', top: '0px', left: '0px', transform: 'translate(-9999px, -9999px)', pointerEvents: 'none',
179
+ padding: '8px 10px', borderRadius: '8px', fontSize: '12px', lineHeight: '1.35', border: '1px solid var(--border-color)',
180
+ background: 'var(--surface-bg)', color: 'var(--text-color)', boxShadow: '0 4px 24px rgba(0,0,0,.18)', opacity: '0',
181
+ transition: 'opacity .12s ease'
182
+ });
183
+ tipInner = document.createElement('div');
184
+ tipInner.className = 'd3-tooltip__inner';
185
+ tipInner.style.textAlign = 'left';
186
+ tip.appendChild(tipInner);
187
+ container.appendChild(tip);
188
+ } else {
189
+ tipInner = tip.querySelector('.d3-tooltip__inner') || tip;
190
+ }
191
+
192
+ // Colors per run
193
+ const primary = getComputedStyle(document.documentElement).getPropertyValue('--primary-color').trim() || '#E889AB';
194
+ const pool = [primary, '#4EA5B7', '#E38A42', '#CEC0FA', ...(d3.schemeTableau10||[])];
195
+
196
+ // Mapping from metric names to display titles
197
+ const metricTitleMapping = {
198
+ 'docvqa_val_anls': 'DocVQA',
199
+ 'infovqa_val_anls': 'InfoVQA',
200
+ 'mme_total_score': 'MME Total',
201
+ 'mmmu_val_mmmu_acc': 'MMMU',
202
+ 'mmstar_average': 'MMStar',
203
+ 'ocrbench_ocrbench_accuracy': 'OCRBench',
204
+ 'scienceqa_exact_match': 'ScienceQA',
205
+ 'textvqa_val_exact_match': 'TextVQA',
206
+ 'average': 'Average (excl. MME)',
207
+ 'average_rank': 'Average Rank',
208
+ 'ai2d_exact_match': 'AI2D',
209
+ 'chartqa_relaxed_overall': 'ChartQA',
210
+ 'seedbench_seed_all': 'SeedBench'
211
+ };
212
+
213
+ // Function to get display name for metric
214
+ function getMetricDisplayName(metricKey) {
215
+ return metricTitleMapping[metricKey] || metricKey;
216
+ }
217
+
218
+ // State and data
219
+ let metricList = [];
220
+ let runList = [];
221
+ let runOrder = [];
222
+ const dataByMetric = new Map(); // metric => { run => [{step,value}] }
223
+ let isRankStrictFlag = false;
224
+ let rankTickMax = 1;
225
+
226
+ // Scales and layout
227
+ let width = 800, height = 360;
228
+ let margin = { top: 16, right: 28, bottom: 56, left: 64 };
229
+ let xScale = d3.scaleLinear();
230
+ let yScale = d3.scaleLinear();
231
+
232
+ // Line generators - simple linear connections
233
+ const lineGen = d3.line()
234
+ .x((d) => xScale(d.step))
235
+ .y((d) => yScale(d.value));
236
+
237
+ // Function to draw different marker shapes
238
+ function drawMarker(selection, shape, size) {
239
+ const s = size / 2;
240
+ switch (shape) {
241
+ case 'circle':
242
+ return selection.append('circle').attr('r', s);
243
+ case 'square':
244
+ return selection.append('rect').attr('x', -s).attr('y', -s).attr('width', size).attr('height', size);
245
+ case 'triangle':
246
+ return selection.append('path').attr('d', `M0,${-s * 1.2} L${s * 1.1},${s * 0.6} L${-s * 1.1},${s * 0.6} Z`);
247
+ case 'diamond':
248
+ return selection.append('path').attr('d', `M0,${-s * 1.2} L${s * 1.1},0 L0,${s * 1.2} L${-s * 1.1},0 Z`);
249
+ case 'inverted-triangle':
250
+ return selection.append('path').attr('d', `M0,${s * 1.2} L${s * 1.1},${-s * 0.6} L${-s * 1.1},${-s * 0.6} Z`);
251
+ default:
252
+ return selection.append('circle').attr('r', s);
253
+ }
254
+ }
255
+
256
+ // Hover elements
257
+ const hoverLine = gHover.append('line').attr('stroke-width', 1);
258
+
259
+ const overlay = gHover.append('rect').attr('fill', 'transparent').style('cursor', 'crosshair');
260
+
261
+ function updateScales() {
262
+ const isDark = document.documentElement.getAttribute('data-theme') === 'dark';
263
+ const axisColor = isDark ? 'rgba(255,255,255,0.25)' : 'rgba(0,0,0,0.25)';
264
+ const tickColor = isDark ? 'rgba(255,255,255,0.70)' : 'rgba(0,0,0,0.55)';
265
+ const gridColor = isDark ? 'rgba(255,255,255,0.08)' : 'rgba(0,0,0,0.05)';
266
+
267
+ width = container.clientWidth || 800;
268
+ height = Math.max(360, Math.round(width / 2.2));
269
+ svg.attr('width', width).attr('height', height);
270
+
271
+ const innerWidth = width - margin.left - margin.right;
272
+ const innerHeight = height - margin.top - margin.bottom;
273
+ gRoot.attr('transform', `translate(${margin.left},${margin.top})`);
274
+
275
+ xScale.range([0, innerWidth]);
276
+ yScale.range([innerHeight, 0]);
277
+
278
+ // Compute Y ticks
279
+ let yTicks = [];
280
+ if (isRankStrictFlag) {
281
+ const maxR = Math.max(1, Math.round(rankTickMax));
282
+ for (let v = 1; v <= maxR; v += 1) yTicks.push(v);
283
+ } else {
284
+ // Use D3's tick generator to produce nice floating-point ticks
285
+ yTicks = yScale.ticks(6);
286
+ }
287
+
288
+ // Grid (horizontal)
289
+ gGrid.selectAll('*').remove();
290
+ gGrid.selectAll('line')
291
+ .data(yTicks)
292
+ .join('line')
293
+ .attr('x1', 0)
294
+ .attr('x2', innerWidth)
295
+ .attr('y1', (d) => yScale(d))
296
+ .attr('y2', (d) => yScale(d))
297
+ .attr('stroke', gridColor)
298
+ .attr('stroke-width', 1)
299
+ .attr('shape-rendering', 'crispEdges');
300
+
301
+ // Axes
302
+ gAxes.selectAll('*').remove();
303
+ let xAxis = d3.axisBottom(xScale).tickSizeOuter(0);
304
+ if (isRankStrictFlag) {
305
+ const [dx0, dx1] = xScale.domain();
306
+ const start = Math.ceil(dx0 / 1000) * 1000;
307
+ const end = Math.floor(dx1 / 1000) * 1000;
308
+ const xTicks = [];
309
+ for (let v = start; v <= end; v += 1000) xTicks.push(v);
310
+ if (xTicks.length === 0) xTicks.push(Math.round(dx0));
311
+ xAxis = xAxis.tickValues(xTicks).tickFormat(d3.format('d'));
312
+ } else {
313
+ xAxis = xAxis.ticks(8);
314
+ }
315
+ const yAxis = d3.axisLeft(yScale)
316
+ .tickValues(yTicks)
317
+ .tickSizeOuter(0)
318
+ .tickFormat(isRankStrictFlag ? d3.format('d') : d3.format('.2f'));
319
+ gAxes.append('g')
320
+ .attr('transform', `translate(0,${innerHeight})`)
321
+ .call(xAxis)
322
+ .call((g) => {
323
+ g.selectAll('path, line').attr('stroke', axisColor);
324
+ g.selectAll('text').attr('fill', tickColor).style('font-size', '12px');
325
+ });
326
+ gAxes.append('g')
327
+ .call(yAxis)
328
+ .call((g) => {
329
+ g.selectAll('path, line').attr('stroke', axisColor);
330
+ g.selectAll('text').attr('fill', tickColor).style('font-size', '12px');
331
+ });
332
+
333
+ // Axis labels (X and Y)
334
+ gAxes.append('text')
335
+ .attr('class', 'axis-label axis-label--x')
336
+ .attr('x', innerWidth / 2)
337
+ .attr('y', innerHeight + 44)
338
+ .attr('text-anchor', 'middle')
339
+ .style('font-size', '12px')
340
+ .style('fill', tickColor)
341
+ .text('Step');
342
+ gAxes.append('text')
343
+ .attr('class', 'axis-label axis-label--y')
344
+ .attr('text-anchor', 'middle')
345
+ .attr('transform', `translate(${-44},${innerHeight/2}) rotate(-90)`)
346
+ .style('font-size', '12px')
347
+ .style('fill', tickColor)
348
+ .text('Value');
349
+
350
+ overlay.attr('x', 0).attr('y', 0).attr('width', innerWidth).attr('height', innerHeight);
351
+ hoverLine.attr('y1', 0).attr('y2', innerHeight).attr('stroke', axisColor);
352
+
353
+ // Legend placeholder; actual content set in renderMetric
354
+ const legendWidth = Math.min(180, Math.max(120, Math.round(innerWidth * 0.22)));
355
+ const legendHeight = 64;
356
+ gLegend
357
+ .attr('x', innerWidth - legendWidth + 42)
358
+ .attr('y', innerHeight - legendHeight - 12)
359
+ .attr('width', legendWidth)
360
+ .attr('height', legendHeight);
361
+ const legendRoot = gLegend.selectAll('div').data([0]).join('xhtml:div');
362
+ Object.assign(legendRoot.node().style, {
363
+ background: 'transparent',
364
+ border: 'none',
365
+ borderRadius: '0',
366
+ padding: '0',
367
+ fontSize: '12px',
368
+ lineHeight: '1.35',
369
+ color: 'var(--text-color)'
370
+ });
371
+
372
+ return { innerWidth, innerHeight };
373
+ }
374
+
375
+ function renderMetric(metricKey){
376
+ const map = dataByMetric.get(metricKey) || {};
377
+ const runs = runOrder;
378
+ // Domain
379
+ let minStep = Infinity, maxStep = -Infinity, maxVal = 0, minVal = Infinity;
380
+ const isRank = /rank/i.test(metricKey);
381
+ const isAverage = /average/i.test(metricKey);
382
+ const isRankStrict = isRank && !isAverage;
383
+ runs.forEach(r => {
384
+ const arr = map[r] || [];
385
+ arr.forEach(pt => {
386
+ const val = isRankStrict ? Math.round(pt.value) : pt.value;
387
+ minStep = Math.min(minStep, pt.step);
388
+ maxStep = Math.max(maxStep, pt.step);
389
+ maxVal = Math.max(maxVal, val);
390
+ minVal = Math.min(minVal, val);
391
+ });
392
+ });
393
+ if (!isFinite(minStep) || !isFinite(maxStep)) { return; }
394
+ xScale.domain([minStep, maxStep]);
395
+ if (isRank) {
396
+ rankTickMax = Math.max(1, Math.round(maxVal));
397
+ yScale.domain([rankTickMax, 1]);
398
+ } else {
399
+ yScale.domain([0, Math.max(1, maxVal)]).nice();
400
+ }
401
+ isRankStrictFlag = isRankStrict;
402
+
403
+ const { innerWidth, innerHeight } = updateScales();
404
+
405
+ // Bind lines and markers
406
+ const series = runs.map((r, i) => ({
407
+ run: r,
408
+ color: pool[i % pool.length],
409
+ marker: markerShapes[i % markerShapes.length],
410
+ values: (map[r]||[])
411
+ .slice()
412
+ .sort((a,b)=>a.step-b.step)
413
+ .map(pt => isRankStrict ? { step: pt.step, value: Math.round(pt.value) } : pt)
414
+ }));
415
+
416
+ // Draw lines
417
+ const paths = gLines.selectAll('path.run-line').data(series, d=>d.run);
418
+ paths.enter().append('path').attr('class','run-line').attr('fill','none').attr('stroke-width',2)
419
+ .attr('stroke', d=>d.color).attr('opacity',0.9)
420
+ .attr('d', d=>lineGen(d.values))
421
+ .merge(paths)
422
+ .transition().duration(200)
423
+ .attr('stroke', d=>d.color)
424
+ .attr('d', d=>lineGen(d.values));
425
+ paths.exit().remove();
426
+
427
+ // Draw markers for each data point
428
+ gPoints.selectAll('*').remove();
429
+ series.forEach((s, seriesIndex) => {
430
+ const pointGroup = gPoints.selectAll(`.points-${seriesIndex}`)
431
+ .data(s.values)
432
+ .join('g')
433
+ .attr('class', `points-${seriesIndex}`)
434
+ .attr('transform', d => `translate(${xScale(d.step)},${yScale(d.value)})`);
435
+
436
+ drawMarker(pointGroup, s.marker, markerSize)
437
+ .attr('fill', s.color)
438
+ .attr('stroke', s.color)
439
+ .attr('stroke-width', 1.5)
440
+ .style('cursor', 'crosshair');
441
+ });
442
+
443
+ // Inline legend content with marker shapes
444
+ legendInline.innerHTML = '';
445
+ series.forEach(s => {
446
+ const legendItem = document.createElement('span');
447
+ legendItem.style.cssText = 'display:inline-flex;align-items:center;gap:6px;white-space:nowrap;';
448
+
449
+ // Create small SVG for marker shape
450
+ const markerSvg = document.createElementNS('http://www.w3.org/2000/svg', 'svg');
451
+ markerSvg.setAttribute('width', '16');
452
+ markerSvg.setAttribute('height', '12');
453
+ markerSvg.style.display = 'inline-block';
454
+
455
+ const g = document.createElementNS('http://www.w3.org/2000/svg', 'g');
456
+ g.setAttribute('transform', 'translate(8,6)');
457
+
458
+ let shape;
459
+ const size = 6;
460
+ const halfSize = size / 2;
461
+ switch(s.marker) {
462
+ case 'circle':
463
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'circle');
464
+ shape.setAttribute('r', halfSize);
465
+ break;
466
+ case 'square':
467
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'rect');
468
+ shape.setAttribute('x', -halfSize);
469
+ shape.setAttribute('y', -halfSize);
470
+ shape.setAttribute('width', size);
471
+ shape.setAttribute('height', size);
472
+ break;
473
+ case 'triangle':
474
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
475
+ shape.setAttribute('d', `M0,${-halfSize * 1.2} L${halfSize * 1.1},${halfSize * 0.6} L${-halfSize * 1.1},${halfSize * 0.6} Z`);
476
+ break;
477
+ case 'diamond':
478
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
479
+ shape.setAttribute('d', `M0,${-halfSize * 1.2} L${halfSize * 1.1},0 L0,${halfSize * 1.2} L${-halfSize * 1.1},0 Z`);
480
+ break;
481
+ case 'inverted-triangle':
482
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
483
+ shape.setAttribute('d', `M0,${halfSize * 1.2} L${halfSize * 1.1},${-halfSize * 0.6} L${-halfSize * 1.1},${-halfSize * 0.6} Z`);
484
+ break;
485
+ default:
486
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'circle');
487
+ shape.setAttribute('r', halfSize);
488
+ }
489
+ shape.setAttribute('fill', s.color);
490
+ shape.setAttribute('stroke', s.color);
491
+ shape.setAttribute('stroke-width', '1');
492
+
493
+ g.appendChild(shape);
494
+ markerSvg.appendChild(g);
495
+
496
+ const label = document.createElement('span');
497
+ label.textContent = s.run;
498
+
499
+ legendItem.appendChild(markerSvg);
500
+ legendItem.appendChild(label);
501
+ legendInline.appendChild(legendItem);
502
+ });
503
+
504
+ // Hover
505
+ const stepSet = new Set(); series.forEach(s=>s.values.forEach(v=>stepSet.add(v.step)));
506
+ const steps = Array.from(stepSet).sort((a,b)=>a-b);
507
+ function onMove(event){
508
+ const [mx, my] = d3.pointer(event, overlay.node());
509
+ const sx = Math.max(steps[0], Math.min(steps[steps.length-1], Math.round(xScale.invert(mx)/1)*1));
510
+ const nearest = steps.reduce((best, s)=> Math.abs(s - xScale.invert(mx)) < Math.abs(best - xScale.invert(mx)) ? s : best, steps[0]);
511
+ const xpx = xScale(nearest);
512
+ hoverLine.attr('x1', xpx).attr('x2', xpx).style('display', null); // keep the theme-aware stroke set in updateScales()
513
+ // Tooltip content
514
+ let html = `<div><strong>${getMetricDisplayName(metricKey)}</strong></div><div><strong>step</strong> ${nearest}</div>`;
515
+ series.forEach(s=>{
516
+ const m = new Map(s.values.map(v=>[v.step, v.value]));
517
+ const val = m.has(nearest) ? m.get(nearest) : null;
518
+ if (val != null) {
519
+ const formatVal = (vv) => (isRankStrict ? d3.format('d')(vv) : (+vv).toFixed(4));
520
+ html += `<div><span style=\"display:inline-block;width:10px;height:10px;background:${s.color};border-radius:50%;margin-right:6px;\"></span><strong>${s.run}</strong> ${formatVal(val)}</div>`;
521
+ }
522
+ });
523
+ tipInner.innerHTML = html;
524
+ const offsetX = 12, offsetY = 12;
525
+ tip.style.opacity = '1'; tip.style.transform = `translate(${Math.round(mx + offsetX + margin.left)}px, ${Math.round(my + offsetY + margin.top)}px)`;
526
+ }
527
+ function onLeave(){ tip.style.opacity='0'; tip.style.transform='translate(-9999px, -9999px)'; hoverLine.style('display','none'); }
528
+ overlay.on('mousemove', onMove).on('mouseleave', onLeave);
529
+ }
530
+
531
+ // (old hover removed; hover is attached in renderMetric)
532
+
533
+ // Load CSV and wire controls
534
+ (async () => {
535
+ try {
536
+ const text = await fetchFirstAvailable(CSV_PATHS);
537
+ const rows = d3.csvParse(text, d => ({ run: (d.run||'').trim(), step: +d.step, metric: (d.metric||'').trim(), value: +d.value }));
538
+ metricList = Array.from(new Set(rows.map(r=>r.metric))).sort();
539
+ runList = Array.from(new Set(rows.map(r=>r.run))).sort();
540
+ runOrder = runList;
541
+ // Build dataByMetric
542
+ metricList.forEach(m => {
543
+ const map = {};
544
+ runList.forEach(r => { map[r] = []; });
545
+ rows.filter(r=>r.metric===m).forEach(r => { if (!isNaN(r.step) && !isNaN(r.value)) map[r.run].push({ step:r.step, value:r.value }); });
546
+ dataByMetric.set(m, map);
547
+ });
548
+
549
+ // Populate metric select (default to average_rank if present)
550
+ metricList.forEach((m)=>{ const o=document.createElement('option'); o.value=m; o.textContent=getMetricDisplayName(m); selectMetric.appendChild(o); });
551
+ const def = metricList.find(m => /average_rank/i.test(m)) || metricList[0];
552
+ if (def) selectMetric.value = def;
553
+
554
+ container.appendChild(controls);
555
+ updateScales();
556
+ renderMetric(selectMetric.value);
557
+
558
+ selectMetric.addEventListener('change', ()=>{ renderMetric(selectMetric.value); });
559
+
560
+ const rerender = () => { renderMetric(selectMetric.value); };
561
+ if (window.ResizeObserver) { const ro = new ResizeObserver(()=>rerender()); ro.observe(container); } else { window.addEventListener('resize', rerender); }
562
+ } catch (e) {
563
+ const pre = document.createElement('pre'); pre.textContent = 'CSV load error: ' + (e && e.message ? e.message : e);
564
+ pre.style.color = 'var(--danger, #b00020)'; pre.style.fontSize = '12px'; pre.style.whiteSpace = 'pre-wrap';
565
+ container.appendChild(pre);
566
+ }
567
+ })();
568
+ };
569
+
570
+ if (document.readyState === 'loading') {
571
+ document.addEventListener('DOMContentLoaded', () => ensureD3(bootstrap), { once: true });
572
+ } else { ensureD3(bootstrap); }
573
+ })();
574
+ </script>
575
+
576
+
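Review note: the embed above reads a long-format CSV (`run`, `step`, `metric`, `value`), groups it by metric and then by run, and on hover snaps the cursor to the nearest recorded step. The sketch below restates those two steps as plain functions without D3 or the DOM so they can be checked in isolation. The names `groupByMetric` and `nearestStep` are hypothetical and do not appear in the commit. Unlike the embed, this sketch does not pre-create an empty array for every run under every metric.

```javascript
// Hypothetical helpers for illustration; not part of the commit.

// Group parsed rows as: metric -> { run -> [{ step, value }] }, sorted by step.
function groupByMetric(rows) {
  const byMetric = new Map();
  for (const { run, step, metric, value } of rows) {
    // Skip rows whose numeric fields failed to parse, as the embed does.
    if (Number.isNaN(step) || Number.isNaN(value)) continue;
    if (!byMetric.has(metric)) byMetric.set(metric, {});
    const runs = byMetric.get(metric);
    if (!runs[run]) runs[run] = [];
    runs[run].push({ step, value });
  }
  for (const runs of byMetric.values()) {
    for (const points of Object.values(runs)) {
      points.sort((a, b) => a.step - b.step);
    }
  }
  return byMetric;
}

// Return the recorded step closest to x (the first one wins on ties).
function nearestStep(steps, x) {
  return steps.reduce(
    (best, s) => (Math.abs(s - x) < Math.abs(best - x) ? s : best),
    steps[0]
  );
}

// Made-up sample rows, only to show the shapes involved.
const sampleRows = [
  { run: 'A', step: 2000, metric: 'average_rank', value: 1.5 },
  { run: 'A', step: 1000, metric: 'average_rank', value: 2.0 },
  { run: 'B', step: 1000, metric: 'average_rank', value: 1.0 },
  { run: 'B', step: NaN, metric: 'average_rank', value: 3.0 },
];
const grouped = groupByMetric(sampleRows);
const steps = grouped.get('average_rank').A.map((p) => p.step);
console.log(steps, nearestStep(steps, 1400));
```

Keeping the grouping and the nearest-step lookup as pure functions would also let the several near-identical embeds in this commit share one implementation, if that refactor is wanted.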
app/src/content/embeds/{d3-line.html → against-baselines.html} RENAMED
File without changes
app/src/content/embeds/all-ratings.html ADDED
@@ -0,0 +1,576 @@
1
+ <div class="d3-line" style="width:100%;margin:10px 0;"></div>
2
+ <style>
3
+ .d3-line .d3-line__controls select {
4
+ font-size: 12px;
5
+ padding: 8px 28px 8px 10px;
6
+ border: 1px solid var(--border-color);
7
+ border-radius: 8px;
8
+ background-color: var(--surface-bg);
9
+ color: var(--text-color);
10
+ background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='12' height='12' viewBox='0 0 24 24' fill='none' stroke='%230f1115' stroke-width='2' stroke-linecap='round' stroke-linejoin='round'%3E%3Cpolyline points='6 9 12 15 18 9'/%3E%3C/svg%3E");
11
+ background-repeat: no-repeat;
12
+ background-position: right 8px center;
13
+ background-size: 12px;
14
+ -webkit-appearance: none;
15
+ -moz-appearance: none;
16
+ appearance: none;
17
+ cursor: pointer;
18
+ transition: border-color .15s ease, box-shadow .15s ease;
19
+ }
20
+ [data-theme="dark"] .d3-line .d3-line__controls select {
21
+ background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='12' height='12' viewBox='0 0 24 24' fill='none' stroke='%23ffffff' stroke-width='2' stroke-linecap='round' stroke-linejoin='round'%3E%3Cpolyline points='6 9 12 15 18 9'/%3E%3C/svg%3E");
22
+ }
23
+ .d3-line .d3-line__controls select:hover {
24
+ border-color: var(--primary-color);
25
+ }
26
+ .d3-line .d3-line__controls select:focus {
27
+ border-color: var(--primary-color);
28
+ box-shadow: 0 0 0 3px rgba(232,137,171,.25);
29
+ outline: none;
30
+ }
31
+ .d3-line .d3-line__controls label { gap: 8px; }
32
+
33
+ /* Range slider themed with --primary-color */
34
+ .d3-line .d3-line__controls input[type="range"] {
35
+ -webkit-appearance: none;
36
+ appearance: none;
37
+ width: 100%;
38
+ height: 6px;
39
+ border-radius: 999px;
40
+ background: var(--border-color);
41
+ outline: none;
42
+ }
43
+ .d3-line .d3-line__controls input[type="range"]::-webkit-slider-runnable-track {
44
+ height: 6px;
45
+ background: transparent;
46
+ border-radius: 999px;
47
+ }
48
+ .d3-line .d3-line__controls input[type="range"]::-webkit-slider-thumb {
49
+ -webkit-appearance: none;
50
+ appearance: none;
51
+ width: 16px;
52
+ height: 16px;
53
+ border-radius: 50%;
54
+ background: var(--primary-color);
55
+ border: 2px solid var(--on-primary);
56
+ margin-top: -5px;
57
+ cursor: pointer;
58
+ }
59
+ .d3-line .d3-line__controls input[type="range"]::-moz-range-track {
60
+ height: 6px;
61
+ background: transparent;
62
+ border-radius: 999px;
63
+ }
64
+ .d3-line .d3-line__controls input[type="range"]::-moz-range-thumb {
65
+ width: 16px;
66
+ height: 16px;
67
+ border-radius: 50%;
68
+ background: var(--primary-color);
69
+ border: 2px solid var(--on-primary);
70
+ cursor: pointer;
71
+ }
72
+ /* Improved line color via CSS */
73
+ .d3-line .lines path.improved { stroke: var(--primary-color); }
74
+ </style>
75
+ <script>
76
+ (() => {
77
+ const ensureD3 = (cb) => {
78
+ if (window.d3 && typeof window.d3.select === 'function') return cb();
79
+ let s = document.getElementById('d3-cdn-script');
80
+ if (!s) {
81
+ s = document.createElement('script');
82
+ s.id = 'd3-cdn-script';
83
+ s.src = 'https://cdn.jsdelivr.net/npm/d3@7/dist/d3.min.js';
84
+ document.head.appendChild(s);
85
+ }
86
+ const onReady = () => { if (window.d3 && typeof window.d3.select === 'function') cb(); };
87
+ s.addEventListener('load', onReady, { once: true });
88
+ if (window.d3) onReady();
89
+ };
90
+
91
+ const bootstrap = () => {
92
+ const mount = document.currentScript ? document.currentScript.previousElementSibling : null;
93
+ const container = (mount && mount.querySelector && mount.querySelector('.d3-line')) || document.querySelector('.d3-line');
94
+ if (!container) return;
95
+ if (container.dataset) {
96
+ if (container.dataset.mounted === 'true') return;
97
+ container.dataset.mounted = 'true';
98
+ }
99
+
100
+ // CSV: prefer public path, fallback to relative
101
+ const CSV_PATHS = [
102
+ '/data/all_ratings_luis.csv',
103
+ './assets/data/all_ratings_luis.csv',
104
+ '../assets/data/all_ratings_luis.csv',
105
+ '../../assets/data/all_ratings_luis.csv'
106
+ ];
107
+ const fetchFirstAvailable = async (paths) => {
108
+ for (const p of paths) {
109
+ try { const r = await fetch(p, { cache: 'no-cache' }); if (r.ok) return await r.text(); } catch(e) {}
110
+ }
111
+ throw new Error('CSV not found: all_ratings_luis.csv');
112
+ };
113
+
114
+ // Controls UI
115
+ const controls = document.createElement('div');
116
+ controls.className = 'd3-line__controls';
117
+ Object.assign(controls.style, {
118
+ marginTop: '12px',
119
+ display: 'flex',
120
+ gap: '16px',
121
+ alignItems: 'center',
122
+ justifyContent: 'space-between',
123
+ width: '100%'
124
+ });
125
+
126
+ const labelMetric = document.createElement('label');
127
+ Object.assign(labelMetric.style, {
128
+ fontSize: '12px', color: 'var(--muted-color)', display: 'flex', alignItems: 'center', gap: '6px', whiteSpace: 'nowrap', padding: '6px 10px', marginLeft: 'auto'
129
+ });
130
+ labelMetric.textContent = 'Metric';
131
+ const selectMetric = document.createElement('select');
132
+ Object.assign(selectMetric.style, { fontSize: '12px' });
133
+ labelMetric.appendChild(selectMetric);
134
+
135
+ // Inline legend on the left of the select (appended first; the label is pushed right via marginLeft: 'auto')
136
+ const legendInline = document.createElement('div');
137
+ legendInline.className = 'controls__legend';
138
+ Object.assign(legendInline.style, {
139
+ display: 'flex',
140
+ gap: '8px',
141
+ alignItems: 'center',
142
+ flexWrap: 'nowrap',
143
+ fontSize: '11px',
144
+ marginLeft: '8px'
145
+ });
146
+ controls.appendChild(legendInline);
147
+ controls.appendChild(labelMetric);
148
+
149
+ // Create SVG with marker definitions
150
+ const svg = d3.select(container).append('svg')
151
+ .attr('width', '100%')
152
+ .style('display', 'block');
153
+
154
+ // Add marker definitions for different shapes
155
+ const defs = svg.append('defs');
156
+
157
+ // Academic marker shapes
158
+ const markerShapes = ['circle', 'square', 'triangle', 'diamond', 'inverted-triangle'];
159
+ const markerSize = 8;
160
+
161
+ // Groups
162
+ const gRoot = svg.append('g');
163
+ const gGrid = gRoot.append('g').attr('class', 'grid');
164
+ const gAxes = gRoot.append('g').attr('class', 'axes');
165
+ const gLines = gRoot.append('g').attr('class', 'lines');
166
+ const gPoints = gRoot.append('g').attr('class', 'points');
167
+ const gHover = gRoot.append('g').attr('class', 'hover');
168
+ const gLegend = gRoot.append('foreignObject').attr('class', 'legend');
169
+
170
+ // Tooltip
171
+ container.style.position = container.style.position || 'relative';
172
+ let tip = container.querySelector('.d3-tooltip');
173
+ let tipInner;
174
+ if (!tip) {
175
+ tip = document.createElement('div');
176
+ tip.className = 'd3-tooltip';
177
+ Object.assign(tip.style, {
178
+ position: 'absolute', top: '0px', left: '0px', transform: 'translate(-9999px, -9999px)', pointerEvents: 'none',
179
+ padding: '8px 10px', borderRadius: '8px', fontSize: '12px', lineHeight: '1.35', border: '1px solid var(--border-color)',
180
+ background: 'var(--surface-bg)', color: 'var(--text-color)', boxShadow: '0 4px 24px rgba(0,0,0,.18)', opacity: '0',
181
+ transition: 'opacity .12s ease'
182
+ });
183
+ tipInner = document.createElement('div');
184
+ tipInner.className = 'd3-tooltip__inner';
185
+ tipInner.style.textAlign = 'left';
186
+ tip.appendChild(tipInner);
187
+ container.appendChild(tip);
188
+ } else {
189
+ tipInner = tip.querySelector('.d3-tooltip__inner') || tip;
190
+ }
191
+
192
+ // Colors per run
193
+ const primary = getComputedStyle(document.documentElement).getPropertyValue('--primary-color').trim() || '#E889AB';
194
+ const pool = [primary, '#4EA5B7', '#E38A42', '#CEC0FA', ...(d3.schemeTableau10||[])];
195
+
196
+ // Mapping from metric names to display titles
197
+ const metricTitleMapping = {
198
+ 'docvqa_val_anls': 'DocVQA',
199
+ 'infovqa_val_anls': 'InfoVQA',
200
+ 'mme_total_score': 'MME Total',
201
+ 'mmmu_val_mmmu_acc': 'MMMU',
202
+ 'mmstar_average': 'MMStar',
203
+ 'ocrbench_ocrbench_accuracy': 'OCRBench',
204
+ 'scienceqa_exact_match': 'ScienceQA',
205
+ 'textvqa_val_exact_match': 'TextVQA',
206
+ 'average': 'Average (excl. MME)',
207
+ 'average_rank': 'Average Rank',
208
+ 'ai2d_exact_match': 'AI2D',
209
+ 'chartqa_relaxed_overall': 'ChartQA',
210
+ 'seedbench_seed_all': 'SeedBench'
211
+ };
212
+
213
+ // Function to get display name for metric
214
+ function getMetricDisplayName(metricKey) {
215
+ return metricTitleMapping[metricKey] || metricKey;
216
+ }
217
+
218
+ // State and data
219
+ let metricList = [];
220
+ let runList = [];
221
+ let runOrder = [];
222
+ const dataByMetric = new Map(); // metric => { run => [{step,value}] }
223
+ let isRankStrictFlag = false;
224
+ let rankTickMax = 1;
225
+
226
+ // Scales and layout
227
+ let width = 800, height = 360;
228
+ let margin = { top: 16, right: 28, bottom: 56, left: 64 };
229
+ let xScale = d3.scaleLinear();
230
+ let yScale = d3.scaleLinear();
231
+
232
+ // Line generators - simple linear connections
233
+ const lineGen = d3.line()
234
+ .x((d) => xScale(d.step))
235
+ .y((d) => yScale(d.value));
236
+
237
+ // Function to draw different marker shapes
238
+ function drawMarker(selection, shape, size) {
239
+ const s = size / 2;
240
+ switch (shape) {
241
+ case 'circle':
242
+ return selection.append('circle').attr('r', s);
243
+ case 'square':
244
+ return selection.append('rect').attr('x', -s).attr('y', -s).attr('width', size).attr('height', size);
245
+ case 'triangle':
246
+ return selection.append('path').attr('d', `M0,${-s * 1.2} L${s * 1.1},${s * 0.6} L${-s * 1.1},${s * 0.6} Z`);
247
+ case 'diamond':
248
+ return selection.append('path').attr('d', `M0,${-s * 1.2} L${s * 1.1},0 L0,${s * 1.2} L${-s * 1.1},0 Z`);
249
+ case 'inverted-triangle':
250
+ return selection.append('path').attr('d', `M0,${s * 1.2} L${s * 1.1},${-s * 0.6} L${-s * 1.1},${-s * 0.6} Z`);
251
+ default:
252
+ return selection.append('circle').attr('r', s);
253
+ }
254
+ }
255
+
256
+ // Hover elements
257
+ const hoverLine = gHover.append('line').attr('stroke-width', 1);
258
+
259
+ const overlay = gHover.append('rect').attr('fill', 'transparent').style('cursor', 'crosshair');
260
+
261
+ function updateScales() {
262
+ const isDark = document.documentElement.getAttribute('data-theme') === 'dark';
263
+ const axisColor = isDark ? 'rgba(255,255,255,0.25)' : 'rgba(0,0,0,0.25)';
264
+ const tickColor = isDark ? 'rgba(255,255,255,0.70)' : 'rgba(0,0,0,0.55)';
265
+ const gridColor = isDark ? 'rgba(255,255,255,0.08)' : 'rgba(0,0,0,0.05)';
266
+
267
+ width = container.clientWidth || 800;
268
+ height = Math.max(360, Math.round(width / 2.2));
269
+ svg.attr('width', width).attr('height', height);
270
+
271
+ const innerWidth = width - margin.left - margin.right;
272
+ const innerHeight = height - margin.top - margin.bottom;
273
+ gRoot.attr('transform', `translate(${margin.left},${margin.top})`);
274
+
275
+ xScale.range([0, innerWidth]);
276
+ yScale.range([innerHeight, 0]);
277
+
278
+ // Compute Y ticks
279
+ let yTicks = [];
280
+ if (isRankStrictFlag) {
281
+ const maxR = Math.max(1, Math.round(rankTickMax));
282
+ for (let v = 1; v <= maxR; v += 1) yTicks.push(v);
283
+ } else {
284
+ // Use D3's tick generator to produce nice floating-point ticks
285
+ yTicks = yScale.ticks(6);
286
+ }
287
+
288
+ // Grid (horizontal)
289
+ gGrid.selectAll('*').remove();
290
+ gGrid.selectAll('line')
291
+ .data(yTicks)
292
+ .join('line')
293
+ .attr('x1', 0)
294
+ .attr('x2', innerWidth)
295
+ .attr('y1', (d) => yScale(d))
296
+ .attr('y2', (d) => yScale(d))
297
+ .attr('stroke', gridColor)
298
+ .attr('stroke-width', 1)
299
+ .attr('shape-rendering', 'crispEdges');
300
+
301
+ // Axes
302
+ gAxes.selectAll('*').remove();
303
+ let xAxis = d3.axisBottom(xScale).tickSizeOuter(0);
304
+ if (isRankStrictFlag) {
305
+ const [dx0, dx1] = xScale.domain();
306
+ const start = Math.ceil(dx0 / 1000) * 1000;
307
+ const end = Math.floor(dx1 / 1000) * 1000;
308
+ const xTicks = [];
309
+ for (let v = start; v <= end; v += 1000) xTicks.push(v);
310
+ if (xTicks.length === 0) xTicks.push(Math.round(dx0));
311
+ xAxis = xAxis.tickValues(xTicks).tickFormat(d3.format('d'));
312
+ } else {
313
+ xAxis = xAxis.ticks(8);
314
+ }
315
+ const yAxis = d3.axisLeft(yScale)
316
+ .tickValues(yTicks)
317
+ .tickSizeOuter(0)
318
+ .tickFormat(isRankStrictFlag ? d3.format('d') : d3.format('.2f'));
319
+ gAxes.append('g')
320
+ .attr('transform', `translate(0,${innerHeight})`)
321
+ .call(xAxis)
322
+ .call((g) => {
323
+ g.selectAll('path, line').attr('stroke', axisColor);
324
+ g.selectAll('text').attr('fill', tickColor).style('font-size', '12px');
325
+ });
326
+ gAxes.append('g')
327
+ .call(yAxis)
328
+ .call((g) => {
329
+ g.selectAll('path, line').attr('stroke', axisColor);
330
+ g.selectAll('text').attr('fill', tickColor).style('font-size', '12px');
331
+ });
332
+
333
+ // Axis labels (X and Y)
334
+ gAxes.append('text')
335
+ .attr('class', 'axis-label axis-label--x')
336
+ .attr('x', innerWidth / 2)
337
+ .attr('y', innerHeight + 44)
338
+ .attr('text-anchor', 'middle')
339
+ .style('font-size', '12px')
340
+ .style('fill', tickColor)
341
+ .text('Step');
342
+ gAxes.append('text')
343
+ .attr('class', 'axis-label axis-label--y')
344
+ .attr('text-anchor', 'middle')
345
+ .attr('transform', `translate(${-44},${innerHeight/2}) rotate(-90)`)
346
+ .style('font-size', '12px')
347
+ .style('fill', tickColor)
348
+ .text('Value');
349
+
350
+ overlay.attr('x', 0).attr('y', 0).attr('width', innerWidth).attr('height', innerHeight);
351
+ hoverLine.attr('y1', 0).attr('y2', innerHeight).attr('stroke', axisColor);
352
+
353
+ // Legend placeholder; actual content set in renderMetric
354
+ const legendWidth = Math.min(180, Math.max(120, Math.round(innerWidth * 0.22)));
355
+ const legendHeight = 64;
356
+ gLegend
357
+ .attr('x', innerWidth - legendWidth + 42)
358
+ .attr('y', innerHeight - legendHeight - 12)
359
+ .attr('width', legendWidth)
360
+ .attr('height', legendHeight);
361
+ const legendRoot = gLegend.selectAll('div').data([0]).join('xhtml:div');
362
+ Object.assign(legendRoot.node().style, {
363
+ background: 'transparent',
364
+ border: 'none',
365
+ borderRadius: '0',
366
+ padding: '0',
367
+ fontSize: '12px',
368
+ lineHeight: '1.35',
369
+ color: 'var(--text-color)'
370
+ });
371
+
372
+ return { innerWidth, innerHeight };
373
+ }
374
+
375
+ function renderMetric(metricKey){
376
+ const map = dataByMetric.get(metricKey) || {};
377
+ const runs = runOrder;
378
+ // Domain
379
+ let minStep = Infinity, maxStep = -Infinity, maxVal = 0, minVal = Infinity;
380
+ const isRank = /rank/i.test(metricKey);
381
+ const isAverage = /average/i.test(metricKey);
382
+ const isRankStrict = isRank && !isAverage;
383
+ runs.forEach(r => {
384
+ const arr = map[r] || [];
385
+ arr.forEach(pt => {
386
+ const val = isRankStrict ? Math.round(pt.value) : pt.value;
387
+ minStep = Math.min(minStep, pt.step);
388
+ maxStep = Math.max(maxStep, pt.step);
389
+ maxVal = Math.max(maxVal, val);
390
+ minVal = Math.min(minVal, val);
391
+ });
392
+ });
393
+ if (!isFinite(minStep) || !isFinite(maxStep)) { return; }
394
+ xScale.domain([minStep, maxStep]);
395
+ if (isRank) {
396
+ rankTickMax = Math.max(1, Math.ceil(maxVal)); // ceil so fractional average ranks stay inside the axis
397
+ yScale.domain([rankTickMax, 1]);
398
+ } else {
399
+ yScale.domain([0, Math.max(1, maxVal)]).nice();
400
+ }
401
+ isRankStrictFlag = isRankStrict;
402
+
403
+ const { innerWidth, innerHeight } = updateScales();
404
+
405
+ // Bind lines and markers
406
+ const series = runs.map((r, i) => ({
407
+ run: r,
408
+ color: pool[i % pool.length],
409
+ marker: markerShapes[i % markerShapes.length],
410
+ values: (map[r]||[])
411
+ .slice()
412
+ .sort((a,b)=>a.step-b.step)
413
+ .map(pt => isRankStrict ? { step: pt.step, value: Math.round(pt.value) } : pt)
414
+ }));
415
+
416
+ // Draw lines
417
+ const paths = gLines.selectAll('path.run-line').data(series, d=>d.run);
418
+ paths.enter().append('path').attr('class','run-line').attr('fill','none').attr('stroke-width',2)
419
+ .attr('stroke', d=>d.color).attr('opacity',0.9)
420
+ .attr('d', d=>lineGen(d.values))
421
+ .merge(paths)
422
+ .transition().duration(200)
423
+ .attr('stroke', d=>d.color)
424
+ .attr('d', d=>lineGen(d.values));
425
+ paths.exit().remove();
426
+
427
+ // Draw markers for each data point
428
+ gPoints.selectAll('*').remove();
429
+ series.forEach((s, seriesIndex) => {
430
+ const pointGroup = gPoints.selectAll(`.points-${seriesIndex}`)
431
+ .data(s.values)
432
+ .join('g')
433
+ .attr('class', `points-${seriesIndex}`)
434
+ .attr('transform', d => `translate(${xScale(d.step)},${yScale(d.value)})`);
435
+
436
+ drawMarker(pointGroup, s.marker, markerSize)
437
+ .attr('fill', s.color)
438
+ .attr('stroke', s.color)
439
+ .attr('stroke-width', 1.5)
440
+ .style('cursor', 'crosshair');
441
+ });
442
+
443
+ // Inline legend content with marker shapes
444
+ legendInline.innerHTML = '';
445
+ series.forEach(s => {
446
+ const legendItem = document.createElement('span');
447
+ legendItem.style.cssText = 'display:inline-flex;align-items:center;gap:6px;white-space:nowrap;';
448
+
449
+ // Create small SVG for marker shape
450
+ const markerSvg = document.createElementNS('http://www.w3.org/2000/svg', 'svg');
451
+ markerSvg.setAttribute('width', '16');
452
+ markerSvg.setAttribute('height', '12');
453
+ markerSvg.style.display = 'inline-block';
454
+
455
+ const g = document.createElementNS('http://www.w3.org/2000/svg', 'g');
456
+ g.setAttribute('transform', 'translate(8,6)');
457
+
458
+ let shape;
459
+ const size = 6;
460
+ const halfSize = size / 2;
461
+ switch(s.marker) {
462
+ case 'circle':
463
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'circle');
464
+ shape.setAttribute('r', halfSize);
465
+ break;
466
+ case 'square':
467
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'rect');
468
+ shape.setAttribute('x', -halfSize);
469
+ shape.setAttribute('y', -halfSize);
470
+ shape.setAttribute('width', size);
471
+ shape.setAttribute('height', size);
472
+ break;
473
+ case 'triangle':
474
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
475
+ shape.setAttribute('d', `M0,${-halfSize * 1.2} L${halfSize * 1.1},${halfSize * 0.6} L${-halfSize * 1.1},${halfSize * 0.6} Z`);
476
+ break;
477
+ case 'diamond':
478
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
479
+ shape.setAttribute('d', `M0,${-halfSize * 1.2} L${halfSize * 1.1},0 L0,${halfSize * 1.2} L${-halfSize * 1.1},0 Z`);
480
+ break;
481
+ case 'inverted-triangle':
482
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
483
+ shape.setAttribute('d', `M0,${halfSize * 1.2} L${halfSize * 1.1},${-halfSize * 0.6} L${-halfSize * 1.1},${-halfSize * 0.6} Z`);
484
+ break;
485
+ default:
486
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'circle');
487
+ shape.setAttribute('r', halfSize);
488
+ }
489
+ shape.setAttribute('fill', s.color);
490
+ shape.setAttribute('stroke', s.color);
491
+ shape.setAttribute('stroke-width', '1');
492
+
493
+ g.appendChild(shape);
494
+ markerSvg.appendChild(g);
495
+
496
+ const label = document.createElement('span');
497
+ label.textContent = s.run;
498
+
499
+ legendItem.appendChild(markerSvg);
500
+ legendItem.appendChild(label);
501
+ legendInline.appendChild(legendItem);
502
+ });
503
+
504
+ // Hover
505
+ const stepSet = new Set(); series.forEach(s=>s.values.forEach(v=>stepSet.add(v.step)));
506
+ const steps = Array.from(stepSet).sort((a,b)=>a-b);
507
+ function onMove(event){
508
+ const [mx, my] = d3.pointer(event, overlay.node());
509
+ const xVal = xScale.invert(mx); // pointer position in data (step) space
510
+ const nearest = steps.reduce((best, s) => Math.abs(s - xVal) < Math.abs(best - xVal) ? s : best, steps[0]);
511
+ const xpx = xScale(nearest);
512
+ hoverLine.attr('x1', xpx).attr('x2', xpx).style('display', null); // stroke color is set per theme in updateScales
513
+ // Tooltip content
514
+ let html = `<div><strong>${getMetricDisplayName(metricKey)}</strong></div><div><strong>step</strong> ${nearest}</div>`;
515
+ series.forEach(s=>{
516
+ const m = new Map(s.values.map(v=>[v.step, v.value]));
517
+ const val = m.has(nearest) ? m.get(nearest) : null;
518
+ if (val != null) {
519
+ const formatVal = (vv) => (isRankStrict ? d3.format('d')(vv) : (+vv).toFixed(4));
520
+ html += `<div><span style=\"display:inline-block;width:10px;height:10px;background:${s.color};border-radius:50%;margin-right:6px;\"></span><strong>${s.run}</strong> ${formatVal(val)}</div>`;
521
+ }
522
+ });
523
+ tipInner.innerHTML = html;
524
+ const offsetX = 12, offsetY = 12;
525
+ tip.style.opacity = '1'; tip.style.transform = `translate(${Math.round(mx + offsetX + margin.left)}px, ${Math.round(my + offsetY + margin.top)}px)`;
526
+ }
527
+ function onLeave(){ tip.style.opacity='0'; tip.style.transform='translate(-9999px, -9999px)'; hoverLine.style('display','none'); }
528
+ overlay.on('mousemove', onMove).on('mouseleave', onLeave);
529
+ }
530
+
531
+ // (old hover removed; hover is attached in renderMetric)
532
+
533
+ // Load CSV and wire controls
534
+ (async () => {
535
+ try {
536
+ const text = await fetchFirstAvailable(CSV_PATHS);
537
+ const rows = d3.csvParse(text, d => ({ run: (d.run||'').trim(), step: +d.step, metric: (d.metric||'').trim(), value: +d.value }));
538
+ metricList = Array.from(new Set(rows.map(r=>r.metric))).sort();
539
+ runList = Array.from(new Set(rows.map(r=>r.run))).sort();
540
+ runOrder = runList;
541
+ // Build dataByMetric
542
+ metricList.forEach(m => {
543
+ const map = {};
544
+ runList.forEach(r => { map[r] = []; });
545
+ rows.filter(r=>r.metric===m).forEach(r => { if (!isNaN(r.step) && !isNaN(r.value)) map[r.run].push({ step:r.step, value:r.value }); });
546
+ dataByMetric.set(m, map);
547
+ });
548
+
549
+ // Populate metric select (default to average_rank if present)
550
+ metricList.forEach((m)=>{ const o=document.createElement('option'); o.value=m; o.textContent=getMetricDisplayName(m); selectMetric.appendChild(o); });
551
+ const def = metricList.find(m => /average_rank/i.test(m)) || metricList[0];
552
+ if (def) selectMetric.value = def;
553
+
554
+ container.appendChild(controls);
555
+ updateScales();
556
+ renderMetric(selectMetric.value);
557
+
558
+ selectMetric.addEventListener('change', ()=>{ renderMetric(selectMetric.value); });
559
+
560
+ const rerender = () => { renderMetric(selectMetric.value); };
561
+ if (window.ResizeObserver) { const ro = new ResizeObserver(()=>rerender()); ro.observe(container); } else { window.addEventListener('resize', rerender); }
562
+ } catch (e) {
563
+ const pre = document.createElement('pre'); pre.textContent = 'CSV load error: ' + (e && e.message ? e.message : e);
564
+ pre.style.color = 'var(--danger, #b00020)'; pre.style.fontSize = '12px'; pre.style.whiteSpace = 'pre-wrap';
565
+ container.appendChild(pre);
566
+ }
567
+ })();
568
+ };
569
+
570
+ if (document.readyState === 'loading') {
571
+ document.addEventListener('DOMContentLoaded', () => ensureD3(bootstrap), { once: true });
572
+ } else { ensureD3(bootstrap); }
573
+ })();
574
+ </script>
575
+
576
+
app/src/content/embeds/formatting-filters.html ADDED
@@ -0,0 +1,576 @@
1
+ <div class="d3-line" style="width:100%;margin:10px 0;"></div>
2
+ <style>
3
+ .d3-line .d3-line__controls select {
4
+ font-size: 12px;
5
+ padding: 8px 28px 8px 10px;
6
+ border: 1px solid var(--border-color);
7
+ border-radius: 8px;
8
+ background-color: var(--surface-bg);
9
+ color: var(--text-color);
10
+ background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='12' height='12' viewBox='0 0 24 24' fill='none' stroke='%230f1115' stroke-width='2' stroke-linecap='round' stroke-linejoin='round'%3E%3Cpolyline points='6 9 12 15 18 9'/%3E%3C/svg%3E");
11
+ background-repeat: no-repeat;
12
+ background-position: right 8px center;
13
+ background-size: 12px;
14
+ -webkit-appearance: none;
15
+ -moz-appearance: none;
16
+ appearance: none;
17
+ cursor: pointer;
18
+ transition: border-color .15s ease, box-shadow .15s ease;
19
+ }
20
+ [data-theme="dark"] .d3-line .d3-line__controls select {
21
+ background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='12' height='12' viewBox='0 0 24 24' fill='none' stroke='%23ffffff' stroke-width='2' stroke-linecap='round' stroke-linejoin='round'%3E%3Cpolyline points='6 9 12 15 18 9'/%3E%3C/svg%3E");
22
+ }
23
+ .d3-line .d3-line__controls select:hover {
24
+ border-color: var(--primary-color);
25
+ }
26
+ .d3-line .d3-line__controls select:focus {
27
+ border-color: var(--primary-color);
28
+ box-shadow: 0 0 0 3px rgba(232,137,171,.25);
29
+ outline: none;
30
+ }
31
+ .d3-line .d3-line__controls label { gap: 8px; }
32
+
33
+ /* Range slider themed with --primary-color */
34
+ .d3-line .d3-line__controls input[type="range"] {
35
+ -webkit-appearance: none;
36
+ appearance: none;
37
+ width: 100%;
38
+ height: 6px;
39
+ border-radius: 999px;
40
+ background: var(--border-color);
41
+ outline: none;
42
+ }
43
+ .d3-line .d3-line__controls input[type="range"]::-webkit-slider-runnable-track {
44
+ height: 6px;
45
+ background: transparent;
46
+ border-radius: 999px;
47
+ }
48
+ .d3-line .d3-line__controls input[type="range"]::-webkit-slider-thumb {
49
+ -webkit-appearance: none;
50
+ appearance: none;
51
+ width: 16px;
52
+ height: 16px;
53
+ border-radius: 50%;
54
+ background: var(--primary-color);
55
+ border: 2px solid var(--on-primary);
56
+ margin-top: -5px;
57
+ cursor: pointer;
58
+ }
59
+ .d3-line .d3-line__controls input[type="range"]::-moz-range-track {
60
+ height: 6px;
61
+ background: transparent;
62
+ border-radius: 999px;
63
+ }
64
+ .d3-line .d3-line__controls input[type="range"]::-moz-range-thumb {
65
+ width: 16px;
66
+ height: 16px;
67
+ border-radius: 50%;
68
+ background: var(--primary-color);
69
+ border: 2px solid var(--on-primary);
70
+ cursor: pointer;
71
+ }
72
+ /* Improved line color via CSS */
73
+ .d3-line .lines path.improved { stroke: var(--primary-color); }
74
+ </style>
75
+ <script>
76
+ (() => {
77
+ const ensureD3 = (cb) => {
78
+ if (window.d3 && typeof window.d3.select === 'function') return cb();
79
+ let s = document.getElementById('d3-cdn-script');
80
+ if (!s) {
81
+ s = document.createElement('script');
82
+ s.id = 'd3-cdn-script';
83
+ s.src = 'https://cdn.jsdelivr.net/npm/d3@7/dist/d3.min.js';
84
+ document.head.appendChild(s);
85
+ }
86
+ const onReady = () => { if (window.d3 && typeof window.d3.select === 'function') cb(); };
87
+ s.addEventListener('load', onReady, { once: true });
88
+ if (window.d3) onReady();
89
+ };
90
+
91
+ const bootstrap = () => {
92
+ const unmounted = Array.from(document.querySelectorAll('.d3-line')).filter((el) => el.dataset.mounted !== 'true');
93
+ const container = unmounted[0]; // first chart container not yet claimed by another embed script
94
+ if (!container) return;
95
+ if (container.dataset) {
96
+ if (container.dataset.mounted === 'true') return;
97
+ container.dataset.mounted = 'true';
98
+ }
99
+
100
+ // CSV: prefer public path, fallback to relative
101
+ const CSV_PATHS = [
102
+ '/data/formatting_filters.csv',
103
+ './assets/data/formatting_filters.csv',
104
+ '../assets/data/formatting_filters.csv',
105
+ '../../assets/data/formatting_filters.csv'
106
+ ];
107
+ const fetchFirstAvailable = async (paths) => {
108
+ for (const p of paths) {
109
+ try { const r = await fetch(p, { cache: 'no-cache' }); if (r.ok) return await r.text(); } catch(e) {}
110
+ }
111
+ throw new Error('CSV not found: formatting_filters.csv');
112
+ };
113
+
114
+ // Controls UI
115
+ const controls = document.createElement('div');
116
+ controls.className = 'd3-line__controls';
117
+ Object.assign(controls.style, {
118
+ marginTop: '12px',
119
+ display: 'flex',
120
+ gap: '16px',
121
+ alignItems: 'center',
122
+ justifyContent: 'space-between',
123
+ width: '100%'
124
+ });
125
+
126
+ const labelMetric = document.createElement('label');
127
+ Object.assign(labelMetric.style, {
128
+ fontSize: '12px', color: 'var(--muted-color)', display: 'flex', alignItems: 'center', gap: '6px', whiteSpace: 'nowrap', padding: '6px 10px', marginLeft: 'auto'
129
+ });
130
+ labelMetric.textContent = 'Metric';
131
+ const selectMetric = document.createElement('select');
132
+ Object.assign(selectMetric.style, { fontSize: '12px' });
133
+ labelMetric.appendChild(selectMetric);
134
+
135
+ // Inline legend on the right of the select
136
+ const legendInline = document.createElement('div');
137
+ legendInline.className = 'controls__legend';
138
+ Object.assign(legendInline.style, {
139
+ display: 'flex',
140
+ gap: '8px',
141
+ alignItems: 'center',
142
+ flexWrap: 'nowrap',
143
+ fontSize: '11px',
144
+ marginLeft: '8px'
145
+ });
146
+ controls.appendChild(legendInline);
147
+ controls.appendChild(labelMetric);
148
+
149
+ // Create SVG with marker definitions
150
+ const svg = d3.select(container).append('svg')
151
+ .attr('width', '100%')
152
+ .style('display', 'block');
153
+
154
+ // Add marker definitions for different shapes
155
+ const defs = svg.append('defs');
156
+
157
+ // Academic marker shapes
158
+ const markerShapes = ['circle', 'square', 'triangle', 'diamond', 'inverted-triangle'];
159
+ const markerSize = 8;
160
+
161
+ // Groups
162
+ const gRoot = svg.append('g');
163
+ const gGrid = gRoot.append('g').attr('class', 'grid');
164
+ const gAxes = gRoot.append('g').attr('class', 'axes');
165
+ const gLines = gRoot.append('g').attr('class', 'lines');
166
+ const gPoints = gRoot.append('g').attr('class', 'points');
167
+ const gHover = gRoot.append('g').attr('class', 'hover');
168
+ const gLegend = gRoot.append('foreignObject').attr('class', 'legend');
169
+
170
+ // Tooltip
171
+ container.style.position = container.style.position || 'relative';
172
+ let tip = container.querySelector('.d3-tooltip');
173
+ let tipInner;
174
+ if (!tip) {
175
+ tip = document.createElement('div');
176
+ tip.className = 'd3-tooltip';
177
+ Object.assign(tip.style, {
178
+ position: 'absolute', top: '0px', left: '0px', transform: 'translate(-9999px, -9999px)', pointerEvents: 'none',
179
+ padding: '8px 10px', borderRadius: '8px', fontSize: '12px', lineHeight: '1.35', border: '1px solid var(--border-color)',
180
+ background: 'var(--surface-bg)', color: 'var(--text-color)', boxShadow: '0 4px 24px rgba(0,0,0,.18)', opacity: '0',
181
+ transition: 'opacity .12s ease'
182
+ });
183
+ tipInner = document.createElement('div');
184
+ tipInner.className = 'd3-tooltip__inner';
185
+ tipInner.style.textAlign = 'left';
186
+ tip.appendChild(tipInner);
187
+ container.appendChild(tip);
188
+ } else {
189
+ tipInner = tip.querySelector('.d3-tooltip__inner') || tip;
190
+ }
191
+
192
+ // Colors per run
193
+ const primary = getComputedStyle(document.documentElement).getPropertyValue('--primary-color').trim() || '#E889AB';
194
+ const pool = [primary, '#4EA5B7', '#E38A42', '#CEC0FA', ...(d3.schemeTableau10||[])];
195
+
196
+ // Mapping from metric names to display titles
197
+ const metricTitleMapping = {
198
+ 'docvqa_val_anls': 'DocVQA',
199
+ 'infovqa_val_anls': 'InfoVQA',
200
+ 'mme_total_score': 'MME Total',
201
+ 'mmmu_val_mmmu_acc': 'MMMU',
202
+ 'mmstar_average': 'MMStar',
203
+ 'ocrbench_ocrbench_accuracy': 'OCRBench',
204
+ 'scienceqa_exact_match': 'ScienceQA',
205
+ 'textvqa_val_exact_match': 'TextVQA',
206
+ 'average': 'Average (excl. MME)',
207
+ 'average_rank': 'Average Rank',
208
+ 'ai2d_exact_match': 'AI2D',
209
+ 'chartqa_relaxed_overall': 'ChartQA',
210
+ 'seedbench_seed_all': 'SeedBench'
211
+ };
212
+
213
+ // Function to get display name for metric
214
+ function getMetricDisplayName(metricKey) {
215
+ return metricTitleMapping[metricKey] || metricKey;
216
+ }
217
+
218
+ // State and data
219
+ let metricList = [];
220
+ let runList = [];
221
+ let runOrder = [];
222
+ const dataByMetric = new Map(); // metric => { run => [{step,value}] }
223
+ let isRankStrictFlag = false;
224
+ let rankTickMax = 1;
225
+
226
+ // Scales and layout
227
+ let width = 800, height = 360;
228
+ let margin = { top: 16, right: 28, bottom: 56, left: 64 };
229
+ let xScale = d3.scaleLinear();
230
+ let yScale = d3.scaleLinear();
231
+
232
+ // Line generators - simple linear connections
233
+ const lineGen = d3.line()
234
+ .x((d) => xScale(d.step))
235
+ .y((d) => yScale(d.value));
236
+
237
+ // Function to draw different marker shapes
238
+ function drawMarker(selection, shape, size) {
239
+ const s = size / 2;
240
+ switch (shape) {
241
+ case 'circle':
242
+ return selection.append('circle').attr('r', s);
243
+ case 'square':
244
+ return selection.append('rect').attr('x', -s).attr('y', -s).attr('width', size).attr('height', size);
245
+ case 'triangle':
246
+ return selection.append('path').attr('d', `M0,${-s * 1.2} L${s * 1.1},${s * 0.6} L${-s * 1.1},${s * 0.6} Z`);
247
+ case 'diamond':
248
+ return selection.append('path').attr('d', `M0,${-s * 1.2} L${s * 1.1},0 L0,${s * 1.2} L${-s * 1.1},0 Z`);
249
+ case 'inverted-triangle':
250
+ return selection.append('path').attr('d', `M0,${s * 1.2} L${s * 1.1},${-s * 0.6} L${-s * 1.1},${-s * 0.6} Z`);
251
+ default:
252
+ return selection.append('circle').attr('r', s);
253
+ }
254
+ }
255
+
256
+ // Hover elements
257
+ const hoverLine = gHover.append('line').attr('stroke-width', 1);
258
+
259
+ const overlay = gHover.append('rect').attr('fill', 'transparent').style('cursor', 'crosshair');
260
+
261
+ function updateScales() {
262
+ const isDark = document.documentElement.getAttribute('data-theme') === 'dark';
263
+ const axisColor = isDark ? 'rgba(255,255,255,0.25)' : 'rgba(0,0,0,0.25)';
264
+ const tickColor = isDark ? 'rgba(255,255,255,0.70)' : 'rgba(0,0,0,0.55)';
265
+ const gridColor = isDark ? 'rgba(255,255,255,0.08)' : 'rgba(0,0,0,0.05)';
266
+
267
+ width = container.clientWidth || 800;
268
+ height = Math.max(360, Math.round(width / 2.2));
269
+ svg.attr('width', width).attr('height', height);
270
+
271
+ const innerWidth = width - margin.left - margin.right;
272
+ const innerHeight = height - margin.top - margin.bottom;
273
+ gRoot.attr('transform', `translate(${margin.left},${margin.top})`);
274
+
275
+ xScale.range([0, innerWidth]);
276
+ yScale.range([innerHeight, 0]);
277
+
278
+ // Compute Y ticks
279
+ let yTicks = [];
280
+ if (isRankStrictFlag) {
281
+ const maxR = Math.max(1, Math.round(rankTickMax));
282
+ for (let v = 1; v <= maxR; v += 1) yTicks.push(v);
283
+ } else {
284
+ // Use D3's tick generator to produce nice floating-point ticks
285
+ yTicks = yScale.ticks(6);
286
+ }
287
+
288
+ // Grid (horizontal)
289
+ gGrid.selectAll('*').remove();
290
+ gGrid.selectAll('line')
291
+ .data(yTicks)
292
+ .join('line')
293
+ .attr('x1', 0)
294
+ .attr('x2', innerWidth)
295
+ .attr('y1', (d) => yScale(d))
296
+ .attr('y2', (d) => yScale(d))
297
+ .attr('stroke', gridColor)
298
+ .attr('stroke-width', 1)
299
+ .attr('shape-rendering', 'crispEdges');
300
+
301
+ // Axes
302
+ gAxes.selectAll('*').remove();
303
+ let xAxis = d3.axisBottom(xScale).tickSizeOuter(0);
304
+ if (isRankStrictFlag) {
305
+ const [dx0, dx1] = xScale.domain();
306
+ const start = Math.ceil(dx0 / 1000) * 1000;
307
+ const end = Math.floor(dx1 / 1000) * 1000;
308
+ const xTicks = [];
309
+ for (let v = start; v <= end; v += 1000) xTicks.push(v);
310
+ if (xTicks.length === 0) xTicks.push(Math.round(dx0));
311
+ xAxis = xAxis.tickValues(xTicks).tickFormat(d3.format('d'));
312
+ } else {
313
+ xAxis = xAxis.ticks(8);
314
+ }
315
+ const yAxis = d3.axisLeft(yScale)
316
+ .tickValues(yTicks)
317
+ .tickSizeOuter(0)
318
+ .tickFormat(isRankStrictFlag ? d3.format('d') : d3.format('.2f'));
319
+ gAxes.append('g')
320
+ .attr('transform', `translate(0,${innerHeight})`)
321
+ .call(xAxis)
322
+ .call((g) => {
323
+ g.selectAll('path, line').attr('stroke', axisColor);
324
+ g.selectAll('text').attr('fill', tickColor).style('font-size', '12px');
325
+ });
326
+ gAxes.append('g')
327
+ .call(yAxis)
328
+ .call((g) => {
329
+ g.selectAll('path, line').attr('stroke', axisColor);
330
+ g.selectAll('text').attr('fill', tickColor).style('font-size', '12px');
331
+ });
332
+
333
+ // Axis labels (X and Y)
334
+ gAxes.append('text')
335
+ .attr('class', 'axis-label axis-label--x')
336
+ .attr('x', innerWidth / 2)
337
+ .attr('y', innerHeight + 44)
338
+ .attr('text-anchor', 'middle')
339
+ .style('font-size', '12px')
340
+ .style('fill', tickColor)
341
+ .text('Step');
342
+ gAxes.append('text')
343
+ .attr('class', 'axis-label axis-label--y')
344
+ .attr('text-anchor', 'middle')
345
+ .attr('transform', `translate(${-44},${innerHeight/2}) rotate(-90)`)
346
+ .style('font-size', '12px')
347
+ .style('fill', tickColor)
348
+ .text('Value');
349
+
350
+ overlay.attr('x', 0).attr('y', 0).attr('width', innerWidth).attr('height', innerHeight);
351
+ hoverLine.attr('y1', 0).attr('y2', innerHeight).attr('stroke', axisColor);
352
+
353
+ // Legend placeholder; actual content set in renderMetric
354
+ const legendWidth = Math.min(180, Math.max(120, Math.round(innerWidth * 0.22)));
355
+ const legendHeight = 64;
356
+ gLegend
357
+ .attr('x', innerWidth - legendWidth + 42)
358
+ .attr('y', innerHeight - legendHeight - 12)
359
+ .attr('width', legendWidth)
360
+ .attr('height', legendHeight);
361
+ const legendRoot = gLegend.selectAll('div').data([0]).join('xhtml:div');
362
+ Object.assign(legendRoot.node().style, {
363
+ background: 'transparent',
364
+ border: 'none',
365
+ borderRadius: '0',
366
+ padding: '0',
367
+ fontSize: '12px',
368
+ lineHeight: '1.35',
369
+ color: 'var(--text-color)'
370
+ });
371
+
372
+ return { innerWidth, innerHeight };
373
+ }
374
+
375
+ function renderMetric(metricKey){
376
+ const map = dataByMetric.get(metricKey) || {};
377
+ const runs = runOrder;
378
+ // Domain
379
+ let minStep = Infinity, maxStep = -Infinity, maxVal = 0, minVal = Infinity;
380
+ const isRank = /rank/i.test(metricKey);
381
+ const isAverage = /average/i.test(metricKey);
382
+ const isRankStrict = isRank && !isAverage;
383
+ runs.forEach(r => {
384
+ const arr = map[r] || [];
385
+ arr.forEach(pt => {
386
+ const val = isRankStrict ? Math.round(pt.value) : pt.value;
387
+ minStep = Math.min(minStep, pt.step);
388
+ maxStep = Math.max(maxStep, pt.step);
389
+ maxVal = Math.max(maxVal, val);
390
+ minVal = Math.min(minVal, val);
391
+ });
392
+ });
393
+ if (!isFinite(minStep) || !isFinite(maxStep)) { return; }
394
+ xScale.domain([minStep, maxStep]);
395
+ if (isRank) {
396
+ rankTickMax = Math.max(1, Math.ceil(maxVal)); // ceil so fractional average ranks stay inside the axis
397
+ yScale.domain([rankTickMax, 1]);
398
+ } else {
399
+ yScale.domain([0, Math.max(1, maxVal)]).nice();
400
+ }
401
+ isRankStrictFlag = isRankStrict;
402
+
403
+ const { innerWidth, innerHeight } = updateScales();
404
+
405
+ // Bind lines and markers
406
+ const series = runs.map((r, i) => ({
407
+ run: r,
408
+ color: pool[i % pool.length],
409
+ marker: markerShapes[i % markerShapes.length],
410
+ values: (map[r]||[])
411
+ .slice()
412
+ .sort((a,b)=>a.step-b.step)
413
+ .map(pt => isRankStrict ? { step: pt.step, value: Math.round(pt.value) } : pt)
414
+ }));
415
+
416
+ // Draw lines
417
+ const paths = gLines.selectAll('path.run-line').data(series, d=>d.run);
418
+ paths.enter().append('path').attr('class','run-line').attr('fill','none').attr('stroke-width',2)
419
+ .attr('stroke', d=>d.color).attr('opacity',0.9)
420
+ .attr('d', d=>lineGen(d.values))
421
+ .merge(paths)
422
+ .transition().duration(200)
423
+ .attr('stroke', d=>d.color)
424
+ .attr('d', d=>lineGen(d.values));
425
+ paths.exit().remove();
426
+
427
+ // Draw markers for each data point
428
+ gPoints.selectAll('*').remove();
429
+ series.forEach((s, seriesIndex) => {
430
+ const pointGroup = gPoints.selectAll(`.points-${seriesIndex}`)
431
+ .data(s.values)
432
+ .join('g')
433
+ .attr('class', `points-${seriesIndex}`)
434
+ .attr('transform', d => `translate(${xScale(d.step)},${yScale(d.value)})`);
435
+
436
+ drawMarker(pointGroup, s.marker, markerSize)
437
+ .attr('fill', s.color)
438
+ .attr('stroke', s.color)
439
+ .attr('stroke-width', 1.5)
440
+ .style('cursor', 'crosshair');
441
+ });
442
+
443
+ // Inline legend content with marker shapes
444
+ legendInline.innerHTML = '';
445
+ series.forEach(s => {
446
+ const legendItem = document.createElement('span');
447
+ legendItem.style.cssText = 'display:inline-flex;align-items:center;gap:6px;white-space:nowrap;';
448
+
449
+ // Create small SVG for marker shape
450
+ const markerSvg = document.createElementNS('http://www.w3.org/2000/svg', 'svg');
451
+ markerSvg.setAttribute('width', '16');
452
+ markerSvg.setAttribute('height', '12');
453
+ markerSvg.style.display = 'inline-block';
454
+
455
+ const g = document.createElementNS('http://www.w3.org/2000/svg', 'g');
456
+ g.setAttribute('transform', 'translate(8,6)');
457
+
458
+ let shape;
459
+ const size = 6;
460
+ const halfSize = size / 2;
461
+ switch(s.marker) {
462
+ case 'circle':
463
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'circle');
464
+ shape.setAttribute('r', halfSize);
465
+ break;
466
+ case 'square':
467
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'rect');
468
+ shape.setAttribute('x', -halfSize);
469
+ shape.setAttribute('y', -halfSize);
470
+ shape.setAttribute('width', size);
471
+ shape.setAttribute('height', size);
472
+ break;
473
+ case 'triangle':
474
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
475
+ shape.setAttribute('d', `M0,${-halfSize * 1.2} L${halfSize * 1.1},${halfSize * 0.6} L${-halfSize * 1.1},${halfSize * 0.6} Z`);
476
+ break;
477
+ case 'diamond':
478
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
479
+ shape.setAttribute('d', `M0,${-halfSize * 1.2} L${halfSize * 1.1},0 L0,${halfSize * 1.2} L${-halfSize * 1.1},0 Z`);
480
+ break;
481
+ case 'inverted-triangle':
482
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
483
+ shape.setAttribute('d', `M0,${halfSize * 1.2} L${halfSize * 1.1},${-halfSize * 0.6} L${-halfSize * 1.1},${-halfSize * 0.6} Z`);
484
+ break;
485
+ default:
486
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'circle');
487
+ shape.setAttribute('r', halfSize);
488
+ }
489
+ shape.setAttribute('fill', s.color);
490
+ shape.setAttribute('stroke', s.color);
491
+ shape.setAttribute('stroke-width', '1');
492
+
493
+ g.appendChild(shape);
494
+ markerSvg.appendChild(g);
495
+
496
+ const label = document.createElement('span');
497
+ label.textContent = s.run;
498
+
499
+ legendItem.appendChild(markerSvg);
500
+ legendItem.appendChild(label);
501
+ legendInline.appendChild(legendItem);
502
+ });
503
+
504
+ // Hover
505
+ const stepSet = new Set(); series.forEach(s=>s.values.forEach(v=>stepSet.add(v.step)));
506
+ const steps = Array.from(stepSet).sort((a,b)=>a-b);
507
+ function onMove(event){
508
+ const [mx, my] = d3.pointer(event, overlay.node());
509
+ const stepAtPointer = xScale.invert(mx);
510
+ const nearest = steps.reduce((best, s)=> Math.abs(s - stepAtPointer) < Math.abs(best - stepAtPointer) ? s : best, steps[0]);
511
+ const xpx = xScale(nearest);
512
+ hoverLine.attr('x1', xpx).attr('x2', xpx).style('display', null).attr('stroke', 'rgba(0,0,0,0.25)');
513
+ // Tooltip content
514
+ let html = `<div><strong>${getMetricDisplayName(metricKey)}</strong></div><div><strong>step</strong> ${nearest}</div>`;
515
+ series.forEach(s=>{
516
+ const m = new Map(s.values.map(v=>[v.step, v.value]));
517
+ const val = m.has(nearest) ? m.get(nearest) : null;
518
+ if (val != null) {
519
+ const formatVal = (vv) => (isRankStrict ? d3.format('d')(vv) : (+vv).toFixed(4));
520
+ html += `<div><span style=\"display:inline-block;width:10px;height:10px;background:${s.color};border-radius:50%;margin-right:6px;\"></span><strong>${s.run}</strong> ${formatVal(val)}</div>`;
521
+ }
522
+ });
523
+ tipInner.innerHTML = html;
524
+ const offsetX = 12, offsetY = 12;
525
+ tip.style.opacity = '1'; tip.style.transform = `translate(${Math.round(mx + offsetX + margin.left)}px, ${Math.round(my + offsetY + margin.top)}px)`;
526
+ }
527
+ function onLeave(){ tip.style.opacity='0'; tip.style.transform='translate(-9999px, -9999px)'; hoverLine.style('display','none'); }
528
+ overlay.on('mousemove', onMove).on('mouseleave', onLeave);
529
+ }
530
+
531
+ // Hover handlers are attached inside renderMetric
532
+
533
+ // Load CSV and wire controls
534
+ (async () => {
535
+ try {
536
+ const text = await fetchFirstAvailable(CSV_PATHS);
537
+ const rows = d3.csvParse(text, d => ({ run: (d.run||'').trim(), step: +d.step, metric: (d.metric||'').trim(), value: +d.value }));
538
+ metricList = Array.from(new Set(rows.map(r=>r.metric))).sort();
539
+ runList = Array.from(new Set(rows.map(r=>r.run))).sort();
540
+ runOrder = runList;
541
+ // Build dataByMetric
542
+ metricList.forEach(m => {
543
+ const map = {};
544
+ runList.forEach(r => { map[r] = []; });
545
+ rows.filter(r=>r.metric===m).forEach(r => { if (!isNaN(r.step) && !isNaN(r.value)) map[r.run].push({ step:r.step, value:r.value }); });
546
+ dataByMetric.set(m, map);
547
+ });
548
+
549
+ // Populate metric select (default to average_rank if present)
550
+ metricList.forEach((m)=>{ const o=document.createElement('option'); o.value=m; o.textContent=getMetricDisplayName(m); selectMetric.appendChild(o); });
551
+ const def = metricList.find(m => /average_rank/i.test(m)) || metricList[0];
552
+ if (def) selectMetric.value = def;
553
+
554
+ container.appendChild(controls);
555
+ updateScales();
556
+ renderMetric(selectMetric.value);
557
+
558
+ selectMetric.addEventListener('change', ()=>{ renderMetric(selectMetric.value); });
559
+
560
+ const rerender = () => { renderMetric(selectMetric.value); };
561
+ if (window.ResizeObserver) { const ro = new ResizeObserver(()=>rerender()); ro.observe(container); } else { window.addEventListener('resize', rerender); }
562
+ } catch (e) {
563
+ const pre = document.createElement('pre'); pre.textContent = 'CSV load error: ' + (e && e.message ? e.message : e);
564
+ pre.style.color = 'var(--danger, #b00020)'; pre.style.fontSize = '12px'; pre.style.whiteSpace = 'pre-wrap';
565
+ container.appendChild(pre);
566
+ }
567
+ })();
568
+ };
569
+
570
+ if (document.readyState === 'loading') {
571
+ document.addEventListener('DOMContentLoaded', () => ensureD3(bootstrap), { once: true });
572
+ } else { ensureD3(bootstrap); }
573
+ })();
574
+ </script>
575
+
576
+
app/src/content/embeds/image-correspondence-filters.html ADDED
@@ -0,0 +1,576 @@
1
+ <div class="d3-line" style="width:100%;margin:10px 0;"></div>
2
+ <style>
3
+ .d3-line .d3-line__controls select {
4
+ font-size: 12px;
5
+ padding: 8px 28px 8px 10px;
6
+ border: 1px solid var(--border-color);
7
+ border-radius: 8px;
8
+ background-color: var(--surface-bg);
9
+ color: var(--text-color);
10
+ background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='12' height='12' viewBox='0 0 24 24' fill='none' stroke='%230f1115' stroke-width='2' stroke-linecap='round' stroke-linejoin='round'%3E%3Cpolyline points='6 9 12 15 18 9'/%3E%3C/svg%3E");
11
+ background-repeat: no-repeat;
12
+ background-position: right 8px center;
13
+ background-size: 12px;
14
+ -webkit-appearance: none;
15
+ -moz-appearance: none;
16
+ appearance: none;
17
+ cursor: pointer;
18
+ transition: border-color .15s ease, box-shadow .15s ease;
19
+ }
20
+ [data-theme="dark"] .d3-line .d3-line__controls select {
21
+ background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='12' height='12' viewBox='0 0 24 24' fill='none' stroke='%23ffffff' stroke-width='2' stroke-linecap='round' stroke-linejoin='round'%3E%3Cpolyline points='6 9 12 15 18 9'/%3E%3C/svg%3E");
22
+ }
23
+ .d3-line .d3-line__controls select:hover {
24
+ border-color: var(--primary-color);
25
+ }
26
+ .d3-line .d3-line__controls select:focus {
27
+ border-color: var(--primary-color);
28
+ box-shadow: 0 0 0 3px rgba(232,137,171,.25);
29
+ outline: none;
30
+ }
31
+ .d3-line .d3-line__controls label { gap: 8px; }
32
+
33
+ /* Range slider themed with --primary-color */
34
+ .d3-line .d3-line__controls input[type="range"] {
35
+ -webkit-appearance: none;
36
+ appearance: none;
37
+ width: 100%;
38
+ height: 6px;
39
+ border-radius: 999px;
40
+ background: var(--border-color);
41
+ outline: none;
42
+ }
43
+ .d3-line .d3-line__controls input[type="range"]::-webkit-slider-runnable-track {
44
+ height: 6px;
45
+ background: transparent;
46
+ border-radius: 999px;
47
+ }
48
+ .d3-line .d3-line__controls input[type="range"]::-webkit-slider-thumb {
49
+ -webkit-appearance: none;
50
+ appearance: none;
51
+ width: 16px;
52
+ height: 16px;
53
+ border-radius: 50%;
54
+ background: var(--primary-color);
55
+ border: 2px solid var(--on-primary);
56
+ margin-top: -5px;
57
+ cursor: pointer;
58
+ }
59
+ .d3-line .d3-line__controls input[type="range"]::-moz-range-track {
60
+ height: 6px;
61
+ background: transparent;
62
+ border-radius: 999px;
63
+ }
64
+ .d3-line .d3-line__controls input[type="range"]::-moz-range-thumb {
65
+ width: 16px;
66
+ height: 16px;
67
+ border-radius: 50%;
68
+ background: var(--primary-color);
69
+ border: 2px solid var(--on-primary);
70
+ cursor: pointer;
71
+ }
72
+ /* Improved line color via CSS */
73
+ .d3-line .lines path.improved { stroke: var(--primary-color); }
74
+ </style>
75
+ <script>
76
+ (() => {
77
+ const ensureD3 = (cb) => {
78
+ if (window.d3 && typeof window.d3.select === 'function') return cb();
79
+ let s = document.getElementById('d3-cdn-script');
80
+ if (!s) {
81
+ s = document.createElement('script');
82
+ s.id = 'd3-cdn-script';
83
+ s.src = 'https://cdn.jsdelivr.net/npm/d3@7/dist/d3.min.js';
84
+ document.head.appendChild(s);
85
+ }
86
+ const onReady = () => { if (window.d3 && typeof window.d3.select === 'function') cb(); };
87
+ s.addEventListener('load', onReady, { once: true });
88
+ if (window.d3) onReady();
89
+ };
90
+
91
+ const bootstrap = () => {
92
+ const mount = document.currentScript ? document.currentScript.previousElementSibling : null;
93
+ const container = (mount && mount.querySelector && mount.querySelector('.d3-line')) || document.querySelector('.d3-line');
94
+ if (!container) return;
95
+ if (container.dataset) {
96
+ if (container.dataset.mounted === 'true') return;
97
+ container.dataset.mounted = 'true';
98
+ }
99
+
100
+ // CSV: prefer public path, fallback to relative
101
+ const CSV_PATHS = [
102
+ '/data/image_correspondence_filters.csv',
103
+ './assets/data/image_correspondence_filters.csv',
104
+ '../assets/data/image_correspondence_filters.csv',
105
+ '../../assets/data/image_correspondence_filters.csv'
106
+ ];
107
+ const fetchFirstAvailable = async (paths) => {
108
+ for (const p of paths) {
109
+ try { const r = await fetch(p, { cache: 'no-cache' }); if (r.ok) return await r.text(); } catch(e) {}
110
+ }
111
+ throw new Error('CSV not found: image_correspondence_filters.csv');
112
+ };
113
+
114
+ // Controls UI
115
+ const controls = document.createElement('div');
116
+ controls.className = 'd3-line__controls';
117
+ Object.assign(controls.style, {
118
+ marginTop: '12px',
119
+ display: 'flex',
120
+ gap: '16px',
121
+ alignItems: 'center',
122
+ justifyContent: 'space-between',
123
+ width: '100%'
124
+ });
125
+
126
+ const labelMetric = document.createElement('label');
127
+ Object.assign(labelMetric.style, {
128
+ fontSize: '12px', color: 'var(--muted-color)', display: 'flex', alignItems: 'center', gap: '6px', whiteSpace: 'nowrap', padding: '6px 10px', marginLeft: 'auto'
129
+ });
130
+ labelMetric.textContent = 'Metric';
131
+ const selectMetric = document.createElement('select');
132
+ Object.assign(selectMetric.style, { fontSize: '12px' });
133
+ labelMetric.appendChild(selectMetric);
134
+
135
+ // Inline legend on the right of the select
136
+ const legendInline = document.createElement('div');
137
+ legendInline.className = 'controls__legend';
138
+ Object.assign(legendInline.style, {
139
+ display: 'flex',
140
+ gap: '8px',
141
+ alignItems: 'center',
142
+ flexWrap: 'nowrap',
143
+ fontSize: '11px',
144
+ marginLeft: '8px'
145
+ });
146
+ controls.appendChild(legendInline);
147
+ controls.appendChild(labelMetric);
148
+
149
+ // Create SVG with marker definitions
150
+ const svg = d3.select(container).append('svg')
151
+ .attr('width', '100%')
152
+ .style('display', 'block');
153
+
154
+ // Add marker definitions for different shapes
155
+ const defs = svg.append('defs');
156
+
157
+ // Academic marker shapes
158
+ const markerShapes = ['circle', 'square', 'triangle', 'diamond', 'inverted-triangle'];
159
+ const markerSize = 8;
160
+
161
+ // Groups
162
+ const gRoot = svg.append('g');
163
+ const gGrid = gRoot.append('g').attr('class', 'grid');
164
+ const gAxes = gRoot.append('g').attr('class', 'axes');
165
+ const gLines = gRoot.append('g').attr('class', 'lines');
166
+ const gPoints = gRoot.append('g').attr('class', 'points');
167
+ const gHover = gRoot.append('g').attr('class', 'hover');
168
+ const gLegend = gRoot.append('foreignObject').attr('class', 'legend');
169
+
170
+ // Tooltip
171
+ container.style.position = container.style.position || 'relative';
172
+ let tip = container.querySelector('.d3-tooltip');
173
+ let tipInner;
174
+ if (!tip) {
175
+ tip = document.createElement('div');
176
+ tip.className = 'd3-tooltip';
177
+ Object.assign(tip.style, {
178
+ position: 'absolute', top: '0px', left: '0px', transform: 'translate(-9999px, -9999px)', pointerEvents: 'none',
179
+ padding: '8px 10px', borderRadius: '8px', fontSize: '12px', lineHeight: '1.35', border: '1px solid var(--border-color)',
180
+ background: 'var(--surface-bg)', color: 'var(--text-color)', boxShadow: '0 4px 24px rgba(0,0,0,.18)', opacity: '0',
181
+ transition: 'opacity .12s ease'
182
+ });
183
+ tipInner = document.createElement('div');
184
+ tipInner.className = 'd3-tooltip__inner';
185
+ tipInner.style.textAlign = 'left';
186
+ tip.appendChild(tipInner);
187
+ container.appendChild(tip);
188
+ } else {
189
+ tipInner = tip.querySelector('.d3-tooltip__inner') || tip;
190
+ }
191
+
192
+ // Colors per run
193
+ const primary = getComputedStyle(document.documentElement).getPropertyValue('--primary-color').trim() || '#E889AB';
194
+ const pool = [primary, '#4EA5B7', '#E38A42', '#CEC0FA', ...(d3.schemeTableau10||[])];
195
+
196
+ // Mapping from metric names to display titles
197
+ const metricTitleMapping = {
198
+ 'docvqa_val_anls': 'DocVQA',
199
+ 'infovqa_val_anls': 'InfoVQA',
200
+ 'mme_total_score': 'MME Total',
201
+ 'mmmu_val_mmmu_acc': 'MMMU',
202
+ 'mmstar_average': 'MMStar',
203
+ 'ocrbench_ocrbench_accuracy': 'OCRBench',
204
+ 'scienceqa_exact_match': 'ScienceQA',
205
+ 'textvqa_val_exact_match': 'TextVQA',
206
+ 'average': 'Average (excl. MME)',
207
+ 'average_rank': 'Average Rank',
208
+ 'ai2d_exact_match': 'AI2D',
209
+ 'chartqa_relaxed_overall': 'ChartQA',
210
+ 'seedbench_seed_all': 'SeedBench'
211
+ };
212
+
213
+ // Function to get display name for metric
214
+ function getMetricDisplayName(metricKey) {
215
+ return metricTitleMapping[metricKey] || metricKey;
216
+ }
217
+
218
+ // State and data
219
+ let metricList = [];
220
+ let runList = [];
221
+ let runOrder = [];
222
+ const dataByMetric = new Map(); // metric => { run => [{step,value}] }
223
+ let isRankStrictFlag = false;
224
+ let rankTickMax = 1;
225
+
226
+ // Scales and layout
227
+ let width = 800, height = 360;
228
+ let margin = { top: 16, right: 28, bottom: 56, left: 64 };
229
+ let xScale = d3.scaleLinear();
230
+ let yScale = d3.scaleLinear();
231
+
232
+ // Line generators - simple linear connections
233
+ const lineGen = d3.line()
234
+ .x((d) => xScale(d.step))
235
+ .y((d) => yScale(d.value));
236
+
237
+ // Function to draw different marker shapes
238
+ function drawMarker(selection, shape, size) {
239
+ const s = size / 2;
240
+ switch (shape) {
241
+ case 'circle':
242
+ return selection.append('circle').attr('r', s);
243
+ case 'square':
244
+ return selection.append('rect').attr('x', -s).attr('y', -s).attr('width', size).attr('height', size);
245
+ case 'triangle':
246
+ return selection.append('path').attr('d', `M0,${-s * 1.2} L${s * 1.1},${s * 0.6} L${-s * 1.1},${s * 0.6} Z`);
247
+ case 'diamond':
248
+ return selection.append('path').attr('d', `M0,${-s * 1.2} L${s * 1.1},0 L0,${s * 1.2} L${-s * 1.1},0 Z`);
249
+ case 'inverted-triangle':
250
+ return selection.append('path').attr('d', `M0,${s * 1.2} L${s * 1.1},${-s * 0.6} L${-s * 1.1},${-s * 0.6} Z`);
251
+ default:
252
+ return selection.append('circle').attr('r', s);
253
+ }
254
+ }
255
+
256
+ // Hover elements
257
+ const hoverLine = gHover.append('line').attr('stroke-width', 1);
258
+
259
+ const overlay = gHover.append('rect').attr('fill', 'transparent').style('cursor', 'crosshair');
260
+
261
+ function updateScales() {
262
+ const isDark = document.documentElement.getAttribute('data-theme') === 'dark';
263
+ const axisColor = isDark ? 'rgba(255,255,255,0.25)' : 'rgba(0,0,0,0.25)';
264
+ const tickColor = isDark ? 'rgba(255,255,255,0.70)' : 'rgba(0,0,0,0.55)';
265
+ const gridColor = isDark ? 'rgba(255,255,255,0.08)' : 'rgba(0,0,0,0.05)';
266
+
267
+ width = container.clientWidth || 800;
268
+ height = Math.max(360, Math.round(width / 2.2));
269
+ svg.attr('width', width).attr('height', height);
270
+
271
+ const innerWidth = width - margin.left - margin.right;
272
+ const innerHeight = height - margin.top - margin.bottom;
273
+ gRoot.attr('transform', `translate(${margin.left},${margin.top})`);
274
+
275
+ xScale.range([0, innerWidth]);
276
+ yScale.range([innerHeight, 0]);
277
+
278
+ // Compute Y ticks
279
+ let yTicks = [];
280
+ if (isRankStrictFlag) {
281
+ const maxR = Math.max(1, Math.round(rankTickMax));
282
+ for (let v = 1; v <= maxR; v += 1) yTicks.push(v);
283
+ } else {
284
+ // Use D3's tick generator to produce nice floating-point ticks
285
+ yTicks = yScale.ticks(6);
286
+ }
287
+
288
+ // Grid (horizontal)
289
+ gGrid.selectAll('*').remove();
290
+ gGrid.selectAll('line')
291
+ .data(yTicks)
292
+ .join('line')
293
+ .attr('x1', 0)
294
+ .attr('x2', innerWidth)
295
+ .attr('y1', (d) => yScale(d))
296
+ .attr('y2', (d) => yScale(d))
297
+ .attr('stroke', gridColor)
298
+ .attr('stroke-width', 1)
299
+ .attr('shape-rendering', 'crispEdges');
300
+
301
+ // Axes
302
+ gAxes.selectAll('*').remove();
303
+ let xAxis = d3.axisBottom(xScale).tickSizeOuter(0);
304
+ if (isRankStrictFlag) {
305
+ const [dx0, dx1] = xScale.domain();
306
+ const start = Math.ceil(dx0 / 1000) * 1000;
307
+ const end = Math.floor(dx1 / 1000) * 1000;
308
+ const xTicks = [];
309
+ for (let v = start; v <= end; v += 1000) xTicks.push(v);
310
+ if (xTicks.length === 0) xTicks.push(Math.round(dx0));
311
+ xAxis = xAxis.tickValues(xTicks).tickFormat(d3.format('d'));
312
+ } else {
313
+ xAxis = xAxis.ticks(8);
314
+ }
315
+ const yAxis = d3.axisLeft(yScale)
316
+ .tickValues(yTicks)
317
+ .tickSizeOuter(0)
318
+ .tickFormat(isRankStrictFlag ? d3.format('d') : d3.format('.2f'));
319
+ gAxes.append('g')
320
+ .attr('transform', `translate(0,${innerHeight})`)
321
+ .call(xAxis)
322
+ .call((g) => {
323
+ g.selectAll('path, line').attr('stroke', axisColor);
324
+ g.selectAll('text').attr('fill', tickColor).style('font-size', '12px');
325
+ });
326
+ gAxes.append('g')
327
+ .call(yAxis)
328
+ .call((g) => {
329
+ g.selectAll('path, line').attr('stroke', axisColor);
330
+ g.selectAll('text').attr('fill', tickColor).style('font-size', '12px');
331
+ });
332
+
333
+ // Axis labels (X and Y)
334
+ gAxes.append('text')
335
+ .attr('class', 'axis-label axis-label--x')
336
+ .attr('x', innerWidth / 2)
337
+ .attr('y', innerHeight + 44)
338
+ .attr('text-anchor', 'middle')
339
+ .style('font-size', '12px')
340
+ .style('fill', tickColor)
341
+ .text('Step');
342
+ gAxes.append('text')
343
+ .attr('class', 'axis-label axis-label--y')
344
+ .attr('text-anchor', 'middle')
345
+ .attr('transform', `translate(${-44},${innerHeight/2}) rotate(-90)`)
346
+ .style('font-size', '12px')
347
+ .style('fill', tickColor)
348
+ .text('Value');
349
+
350
+ overlay.attr('x', 0).attr('y', 0).attr('width', innerWidth).attr('height', innerHeight);
351
+ hoverLine.attr('y1', 0).attr('y2', innerHeight).attr('stroke', axisColor);
352
+
353
+ // Legend placeholder; actual content set in renderMetric
354
+ const legendWidth = Math.min(180, Math.max(120, Math.round(innerWidth * 0.22)));
355
+ const legendHeight = 64;
356
+ gLegend
357
+ .attr('x', innerWidth - legendWidth + 42)
358
+ .attr('y', innerHeight - legendHeight - 12)
359
+ .attr('width', legendWidth)
360
+ .attr('height', legendHeight);
361
+ const legendRoot = gLegend.selectAll('div').data([0]).join('xhtml:div');
362
+ Object.assign(legendRoot.node().style, {
363
+ background: 'transparent',
364
+ border: 'none',
365
+ borderRadius: '0',
366
+ padding: '0',
367
+ fontSize: '12px',
368
+ lineHeight: '1.35',
369
+ color: 'var(--text-color)'
370
+ });
371
+
372
+ return { innerWidth, innerHeight };
373
+ }
374
+
375
+ function renderMetric(metricKey){
376
+ const map = dataByMetric.get(metricKey) || {};
377
+ const runs = runOrder;
378
+ // Domain
379
+ let minStep = Infinity, maxStep = -Infinity, maxVal = 0, minVal = Infinity;
380
+ const isRank = /rank/i.test(metricKey);
381
+ const isAverage = /average/i.test(metricKey);
382
+ const isRankStrict = isRank && !isAverage;
383
+ runs.forEach(r => {
384
+ const arr = map[r] || [];
385
+ arr.forEach(pt => {
386
+ const val = isRankStrict ? Math.round(pt.value) : pt.value;
387
+ minStep = Math.min(minStep, pt.step);
388
+ maxStep = Math.max(maxStep, pt.step);
389
+ maxVal = Math.max(maxVal, val);
390
+ minVal = Math.min(minVal, val);
391
+ });
392
+ });
393
+ if (!isFinite(minStep) || !isFinite(maxStep)) { return; }
394
+ xScale.domain([minStep, maxStep]);
395
+ if (isRank) {
396
+ rankTickMax = Math.max(1, Math.round(maxVal));
397
+ yScale.domain([rankTickMax, 1]);
398
+ } else {
399
+ yScale.domain([0, Math.max(1, maxVal)]).nice();
400
+ }
401
+ isRankStrictFlag = isRankStrict;
402
+
403
+ const { innerWidth, innerHeight } = updateScales();
404
+
405
+ // Bind lines and markers
406
+ const series = runs.map((r, i) => ({
407
+ run: r,
408
+ color: pool[i % pool.length],
409
+ marker: markerShapes[i % markerShapes.length],
410
+ values: (map[r]||[])
411
+ .slice()
412
+ .sort((a,b)=>a.step-b.step)
413
+ .map(pt => isRankStrict ? { step: pt.step, value: Math.round(pt.value) } : pt)
414
+ }));
415
+
416
+ // Draw lines
417
+ const paths = gLines.selectAll('path.run-line').data(series, d=>d.run);
418
+ paths.enter().append('path').attr('class','run-line').attr('fill','none').attr('stroke-width',2)
419
+ .attr('stroke', d=>d.color).attr('opacity',0.9)
420
+ .attr('d', d=>lineGen(d.values))
421
+ .merge(paths)
422
+ .transition().duration(200)
423
+ .attr('stroke', d=>d.color)
424
+ .attr('d', d=>lineGen(d.values));
425
+ paths.exit().remove();
426
+
427
+ // Draw markers for each data point
428
+ gPoints.selectAll('*').remove();
429
+ series.forEach((s, seriesIndex) => {
430
+ const pointGroup = gPoints.selectAll(`.points-${seriesIndex}`)
431
+ .data(s.values)
432
+ .join('g')
433
+ .attr('class', `points-${seriesIndex}`)
434
+ .attr('transform', d => `translate(${xScale(d.step)},${yScale(d.value)})`);
435
+
436
+ drawMarker(pointGroup, s.marker, markerSize)
437
+ .attr('fill', s.color)
438
+ .attr('stroke', s.color)
439
+ .attr('stroke-width', 1.5)
440
+ .style('cursor', 'crosshair');
441
+ });
442
+
443
+ // Inline legend content with marker shapes
444
+ legendInline.innerHTML = '';
445
+ series.forEach(s => {
446
+ const legendItem = document.createElement('span');
447
+ legendItem.style.cssText = 'display:inline-flex;align-items:center;gap:6px;white-space:nowrap;';
448
+
449
+ // Create small SVG for marker shape
450
+ const markerSvg = document.createElementNS('http://www.w3.org/2000/svg', 'svg');
451
+ markerSvg.setAttribute('width', '16');
452
+ markerSvg.setAttribute('height', '12');
453
+ markerSvg.style.display = 'inline-block';
454
+
455
+ const g = document.createElementNS('http://www.w3.org/2000/svg', 'g');
456
+ g.setAttribute('transform', 'translate(8,6)');
457
+
458
+ let shape;
459
+ const size = 6;
460
+ const halfSize = size / 2;
461
+ switch(s.marker) {
462
+ case 'circle':
463
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'circle');
464
+ shape.setAttribute('r', halfSize);
465
+ break;
466
+ case 'square':
467
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'rect');
468
+ shape.setAttribute('x', -halfSize);
469
+ shape.setAttribute('y', -halfSize);
470
+ shape.setAttribute('width', size);
471
+ shape.setAttribute('height', size);
472
+ break;
473
+ case 'triangle':
474
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
475
+ shape.setAttribute('d', `M0,${-halfSize * 1.2} L${halfSize * 1.1},${halfSize * 0.6} L${-halfSize * 1.1},${halfSize * 0.6} Z`);
476
+ break;
477
+ case 'diamond':
478
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
479
+ shape.setAttribute('d', `M0,${-halfSize * 1.2} L${halfSize * 1.1},0 L0,${halfSize * 1.2} L${-halfSize * 1.1},0 Z`);
480
+ break;
481
+ case 'inverted-triangle':
482
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
483
+ shape.setAttribute('d', `M0,${halfSize * 1.2} L${halfSize * 1.1},${-halfSize * 0.6} L${-halfSize * 1.1},${-halfSize * 0.6} Z`);
484
+ break;
485
+ default:
486
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'circle');
487
+ shape.setAttribute('r', halfSize);
488
+ }
489
+ shape.setAttribute('fill', s.color);
490
+ shape.setAttribute('stroke', s.color);
491
+ shape.setAttribute('stroke-width', '1');
492
+
493
+ g.appendChild(shape);
494
+ markerSvg.appendChild(g);
495
+
496
+ const label = document.createElement('span');
497
+ label.textContent = s.run;
498
+
499
+ legendItem.appendChild(markerSvg);
500
+ legendItem.appendChild(label);
501
+ legendInline.appendChild(legendItem);
502
+ });
503
+
504
+ // Hover
505
+ const stepSet = new Set(); series.forEach(s=>s.values.forEach(v=>stepSet.add(v.step)));
506
+ const steps = Array.from(stepSet).sort((a,b)=>a-b);
507
+ function onMove(event){
508
+ const [mx, my] = d3.pointer(event, overlay.node());
509
+ const stepAtPointer = xScale.invert(mx);
510
+ const nearest = steps.reduce((best, s)=> Math.abs(s - stepAtPointer) < Math.abs(best - stepAtPointer) ? s : best, steps[0]);
511
+ const xpx = xScale(nearest);
512
+ hoverLine.attr('x1', xpx).attr('x2', xpx).style('display', null); // stroke stays theme-aware (set in updateScales)
513
+ // Tooltip content
514
+ let html = `<div><strong>${getMetricDisplayName(metricKey)}</strong></div><div><strong>step</strong> ${nearest}</div>`;
515
+ series.forEach(s=>{
516
+ const m = new Map(s.values.map(v=>[v.step, v.value]));
517
+ const val = m.has(nearest) ? m.get(nearest) : null;
518
+ if (val != null) {
519
+ const formatVal = (vv) => (isRankStrict ? d3.format('d')(vv) : (+vv).toFixed(4));
520
+ html += `<div><span style=\"display:inline-block;width:10px;height:10px;background:${s.color};border-radius:50%;margin-right:6px;\"></span><strong>${s.run}</strong> ${formatVal(val)}</div>`;
521
+ }
522
+ });
523
+ tipInner.innerHTML = html;
524
+ const offsetX = 12, offsetY = 12;
525
+ tip.style.opacity = '1'; tip.style.transform = `translate(${Math.round(mx + offsetX + margin.left)}px, ${Math.round(my + offsetY + margin.top)}px)`;
526
+ }
527
+ function onLeave(){ tip.style.opacity='0'; tip.style.transform='translate(-9999px, -9999px)'; hoverLine.style('display','none'); }
528
+ overlay.on('mousemove', onMove).on('mouseleave', onLeave);
529
+ }
530
+
531
+ // Hover handlers are attached inside renderMetric
532
+
533
+ // Load CSV and wire controls
534
+ (async () => {
535
+ try {
536
+ const text = await fetchFirstAvailable(CSV_PATHS);
537
+ const rows = d3.csvParse(text, d => ({ run: (d.run||'').trim(), step: +d.step, metric: (d.metric||'').trim(), value: +d.value }));
538
+ metricList = Array.from(new Set(rows.map(r=>r.metric))).sort();
539
+ runList = Array.from(new Set(rows.map(r=>r.run))).sort();
540
+ runOrder = runList;
541
+ // Build dataByMetric
542
+ metricList.forEach(m => {
543
+ const map = {};
544
+ runList.forEach(r => { map[r] = []; });
545
+ rows.filter(r=>r.metric===m).forEach(r => { if (!isNaN(r.step) && !isNaN(r.value)) map[r.run].push({ step:r.step, value:r.value }); });
546
+ dataByMetric.set(m, map);
547
+ });
548
+
549
+ // Populate metric select (default to average_rank if present)
550
+ metricList.forEach((m)=>{ const o=document.createElement('option'); o.value=m; o.textContent=getMetricDisplayName(m); selectMetric.appendChild(o); });
551
+ const def = metricList.find(m => /average_rank/i.test(m)) || metricList[0];
552
+ if (def) selectMetric.value = def;
553
+
554
+ container.appendChild(controls);
555
+ updateScales();
556
+ renderMetric(selectMetric.value);
557
+
558
+ selectMetric.addEventListener('change', ()=>{ renderMetric(selectMetric.value); });
559
+
560
+ const rerender = () => { renderMetric(selectMetric.value); };
561
+ if (window.ResizeObserver) { const ro = new ResizeObserver(()=>rerender()); ro.observe(container); } else { window.addEventListener('resize', rerender); }
562
+ } catch (e) {
563
+ const pre = document.createElement('pre'); pre.textContent = 'CSV load error: ' + (e && e.message ? e.message : e);
564
+ pre.style.color = 'var(--danger, #b00020)'; pre.style.fontSize = '12px'; pre.style.whiteSpace = 'pre-wrap';
565
+ container.appendChild(pre);
566
+ }
567
+ })();
568
+ };
569
+
570
+ if (document.readyState === 'loading') {
571
+ document.addEventListener('DOMContentLoaded', () => ensureD3(bootstrap), { once: true });
572
+ } else { ensureD3(bootstrap); }
573
+ })();
574
+ </script>
575
+
576
+
app/src/content/embeds/internal-deduplication.html ADDED
@@ -0,0 +1,576 @@
1
+ <div class="d3-line" style="width:100%;margin:10px 0;"></div>
2
+ <style>
3
+ .d3-line .d3-line__controls select {
4
+ font-size: 12px;
5
+ padding: 8px 28px 8px 10px;
6
+ border: 1px solid var(--border-color);
7
+ border-radius: 8px;
8
+ background-color: var(--surface-bg);
9
+ color: var(--text-color);
10
+ background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='12' height='12' viewBox='0 0 24 24' fill='none' stroke='%230f1115' stroke-width='2' stroke-linecap='round' stroke-linejoin='round'%3E%3Cpolyline points='6 9 12 15 18 9'/%3E%3C/svg%3E");
11
+ background-repeat: no-repeat;
12
+ background-position: right 8px center;
13
+ background-size: 12px;
14
+ -webkit-appearance: none;
15
+ -moz-appearance: none;
16
+ appearance: none;
17
+ cursor: pointer;
18
+ transition: border-color .15s ease, box-shadow .15s ease;
19
+ }
20
+ [data-theme="dark"] .d3-line .d3-line__controls select {
21
+ background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='12' height='12' viewBox='0 0 24 24' fill='none' stroke='%23ffffff' stroke-width='2' stroke-linecap='round' stroke-linejoin='round'%3E%3Cpolyline points='6 9 12 15 18 9'/%3E%3C/svg%3E");
22
+ }
23
+ .d3-line .d3-line__controls select:hover {
24
+ border-color: var(--primary-color);
25
+ }
26
+ .d3-line .d3-line__controls select:focus {
27
+ border-color: var(--primary-color);
28
+ box-shadow: 0 0 0 3px rgba(232,137,171,.25);
29
+ outline: none;
30
+ }
31
+ .d3-line .d3-line__controls label { gap: 8px; }
32
+
33
+ /* Range slider themed with --primary-color */
34
+ .d3-line .d3-line__controls input[type="range"] {
35
+ -webkit-appearance: none;
36
+ appearance: none;
37
+ width: 100%;
38
+ height: 6px;
39
+ border-radius: 999px;
40
+ background: var(--border-color);
41
+ outline: none;
42
+ }
43
+ .d3-line .d3-line__controls input[type="range"]::-webkit-slider-runnable-track {
44
+ height: 6px;
45
+ background: transparent;
46
+ border-radius: 999px;
47
+ }
48
+ .d3-line .d3-line__controls input[type="range"]::-webkit-slider-thumb {
49
+ -webkit-appearance: none;
50
+ appearance: none;
51
+ width: 16px;
52
+ height: 16px;
53
+ border-radius: 50%;
54
+ background: var(--primary-color);
55
+ border: 2px solid var(--on-primary);
56
+ margin-top: -5px;
57
+ cursor: pointer;
58
+ }
59
+ .d3-line .d3-line__controls input[type="range"]::-moz-range-track {
60
+ height: 6px;
61
+ background: transparent;
62
+ border-radius: 999px;
63
+ }
64
+ .d3-line .d3-line__controls input[type="range"]::-moz-range-thumb {
65
+ width: 16px;
66
+ height: 16px;
67
+ border-radius: 50%;
68
+ background: var(--primary-color);
69
+ border: 2px solid var(--on-primary);
70
+ cursor: pointer;
71
+ }
72
+ /* Improved line color via CSS */
73
+ .d3-line .lines path.improved { stroke: var(--primary-color); }
74
+ </style>
75
+ <script>
76
+ (() => {
77
+ const ensureD3 = (cb) => {
78
+ if (window.d3 && typeof window.d3.select === 'function') return cb();
79
+ let s = document.getElementById('d3-cdn-script');
80
+ if (!s) {
81
+ s = document.createElement('script');
82
+ s.id = 'd3-cdn-script';
83
+ s.src = 'https://cdn.jsdelivr.net/npm/d3@7/dist/d3.min.js';
84
+ document.head.appendChild(s);
85
+ }
86
+ const onReady = () => { if (window.d3 && typeof window.d3.select === 'function') cb(); };
87
+ s.addEventListener('load', onReady, { once: true });
88
+ if (window.d3) onReady();
89
+ };
90
+
91
+ const bootstrap = () => {
92
+ const mount = document.currentScript ? document.currentScript.previousElementSibling : null;
93
+ const container = (mount && mount.querySelector && mount.querySelector('.d3-line')) || Array.from(document.querySelectorAll('.d3-line')).find((el) => !(el.dataset && el.dataset.mounted === 'true'));
94
+ if (!container) return;
95
+ if (container.dataset) {
96
+ if (container.dataset.mounted === 'true') return;
97
+ container.dataset.mounted = 'true';
98
+ }
99
+
100
+ // CSV: prefer public path, fallback to relative
101
+ const CSV_PATHS = [
102
+ '/data/internal_deduplication.csv',
103
+ './assets/data/internal_deduplication.csv',
104
+ '../assets/data/internal_deduplication.csv',
105
+ '../../assets/data/internal_deduplication.csv'
106
+ ];
107
+ const fetchFirstAvailable = async (paths) => {
108
+ for (const p of paths) {
109
+ try { const r = await fetch(p, { cache: 'no-cache' }); if (r.ok) return await r.text(); } catch(e) {}
110
+ }
111
+ throw new Error('CSV not found: internal_deduplication.csv');
112
+ };
113
+
114
+ // Controls UI
115
+ const controls = document.createElement('div');
116
+ controls.className = 'd3-line__controls';
117
+ Object.assign(controls.style, {
118
+ marginTop: '12px',
119
+ display: 'flex',
120
+ gap: '16px',
121
+ alignItems: 'center',
122
+ justifyContent: 'space-between',
123
+ width: '100%'
124
+ });
125
+
126
+ const labelMetric = document.createElement('label');
127
+ Object.assign(labelMetric.style, {
128
+ fontSize: '12px', color: 'var(--muted-color)', display: 'flex', alignItems: 'center', gap: '6px', whiteSpace: 'nowrap', padding: '6px 10px', marginLeft: 'auto'
129
+ });
130
+ labelMetric.textContent = 'Metric';
131
+ const selectMetric = document.createElement('select');
132
+ Object.assign(selectMetric.style, { fontSize: '12px' });
133
+ labelMetric.appendChild(selectMetric);
134
+
135
+ // Inline legend on the right of the select
136
+ const legendInline = document.createElement('div');
137
+ legendInline.className = 'controls__legend';
138
+ Object.assign(legendInline.style, {
139
+ display: 'flex',
140
+ gap: '8px',
141
+ alignItems: 'center',
142
+ flexWrap: 'nowrap',
143
+ fontSize: '11px',
144
+ marginLeft: '8px'
145
+ });
146
+ controls.appendChild(legendInline);
147
+ controls.appendChild(labelMetric);
148
+
149
+ // Create SVG with marker definitions
150
+ const svg = d3.select(container).append('svg')
151
+ .attr('width', '100%')
152
+ .style('display', 'block');
153
+
154
+ // Add marker definitions for different shapes
155
+ const defs = svg.append('defs');
156
+
157
+ // Academic marker shapes
158
+ const markerShapes = ['circle', 'square', 'triangle', 'diamond', 'inverted-triangle'];
159
+ const markerSize = 8;
160
+
161
+ // Groups
162
+ const gRoot = svg.append('g');
163
+ const gGrid = gRoot.append('g').attr('class', 'grid');
164
+ const gAxes = gRoot.append('g').attr('class', 'axes');
165
+ const gLines = gRoot.append('g').attr('class', 'lines');
166
+ const gPoints = gRoot.append('g').attr('class', 'points');
167
+ const gHover = gRoot.append('g').attr('class', 'hover');
168
+ const gLegend = gRoot.append('foreignObject').attr('class', 'legend');
169
+
170
+ // Tooltip
171
+ container.style.position = container.style.position || 'relative';
172
+ let tip = container.querySelector('.d3-tooltip');
173
+ let tipInner;
174
+ if (!tip) {
175
+ tip = document.createElement('div');
176
+ tip.className = 'd3-tooltip';
177
+ Object.assign(tip.style, {
178
+ position: 'absolute', top: '0px', left: '0px', transform: 'translate(-9999px, -9999px)', pointerEvents: 'none',
179
+ padding: '8px 10px', borderRadius: '8px', fontSize: '12px', lineHeight: '1.35', border: '1px solid var(--border-color)',
180
+ background: 'var(--surface-bg)', color: 'var(--text-color)', boxShadow: '0 4px 24px rgba(0,0,0,.18)', opacity: '0',
181
+ transition: 'opacity .12s ease'
182
+ });
183
+ tipInner = document.createElement('div');
184
+ tipInner.className = 'd3-tooltip__inner';
185
+ tipInner.style.textAlign = 'left';
186
+ tip.appendChild(tipInner);
187
+ container.appendChild(tip);
188
+ } else {
189
+ tipInner = tip.querySelector('.d3-tooltip__inner') || tip;
190
+ }
191
+
192
+ // Colors per run
193
+ const primary = getComputedStyle(document.documentElement).getPropertyValue('--primary-color').trim() || '#E889AB';
194
+ const pool = [primary, '#4EA5B7', '#E38A42', '#CEC0FA', ...(d3.schemeTableau10||[])];
195
+
196
+ // Mapping from metric names to display titles
197
+ const metricTitleMapping = {
198
+ 'docvqa_val_anls': 'DocVQA',
199
+ 'infovqa_val_anls': 'InfoVQA',
200
+ 'mme_total_score': 'MME Total',
201
+ 'mmmu_val_mmmu_acc': 'MMMU',
202
+ 'mmstar_average': 'MMStar',
203
+ 'ocrbench_ocrbench_accuracy': 'OCRBench',
204
+ 'scienceqa_exact_match': 'ScienceQA',
205
+ 'textvqa_val_exact_match': 'TextVQA',
206
+ 'average': 'Average (excl. MME)',
207
+ 'average_rank': 'Average Rank',
208
+ 'ai2d_exact_match': 'AI2D',
209
+ 'chartqa_relaxed_overall': 'ChartQA',
210
+ 'seedbench_seed_all': 'SeedBench'
211
+ };
212
+
213
+ // Function to get display name for metric
214
+ function getMetricDisplayName(metricKey) {
215
+ return metricTitleMapping[metricKey] || metricKey;
216
+ }
217
+
218
+ // State and data
219
+ let metricList = [];
220
+ let runList = [];
221
+ let runOrder = [];
222
+ const dataByMetric = new Map(); // metric => { run => [{step,value}] }
223
+ let isRankStrictFlag = false;
224
+ let rankTickMax = 1;
225
+
226
+ // Scales and layout
227
+ let width = 800, height = 360;
228
+ let margin = { top: 16, right: 28, bottom: 56, left: 64 };
229
+ let xScale = d3.scaleLinear();
230
+ let yScale = d3.scaleLinear();
231
+
232
+ // Line generators - simple linear connections
233
+ const lineGen = d3.line()
234
+ .x((d) => xScale(d.step))
235
+ .y((d) => yScale(d.value));
236
+
237
+ // Function to draw different marker shapes
238
+ function drawMarker(selection, shape, size) {
239
+ const s = size / 2;
240
+ switch (shape) {
241
+ case 'circle':
242
+ return selection.append('circle').attr('r', s);
243
+ case 'square':
244
+ return selection.append('rect').attr('x', -s).attr('y', -s).attr('width', size).attr('height', size);
245
+ case 'triangle':
246
+ return selection.append('path').attr('d', `M0,${-s * 1.2} L${s * 1.1},${s * 0.6} L${-s * 1.1},${s * 0.6} Z`);
247
+ case 'diamond':
248
+ return selection.append('path').attr('d', `M0,${-s * 1.2} L${s * 1.1},0 L0,${s * 1.2} L${-s * 1.1},0 Z`);
249
+ case 'inverted-triangle':
250
+ return selection.append('path').attr('d', `M0,${s * 1.2} L${s * 1.1},${-s * 0.6} L${-s * 1.1},${-s * 0.6} Z`);
251
+ default:
252
+ return selection.append('circle').attr('r', s);
253
+ }
254
+ }
255
+
256
+ // Hover elements
257
+ const hoverLine = gHover.append('line').attr('stroke-width', 1);
258
+
259
+ const overlay = gHover.append('rect').attr('fill', 'transparent').style('cursor', 'crosshair');
260
+
261
+ function updateScales() {
262
+ const isDark = document.documentElement.getAttribute('data-theme') === 'dark';
263
+ const axisColor = isDark ? 'rgba(255,255,255,0.25)' : 'rgba(0,0,0,0.25)';
264
+ const tickColor = isDark ? 'rgba(255,255,255,0.70)' : 'rgba(0,0,0,0.55)';
265
+ const gridColor = isDark ? 'rgba(255,255,255,0.08)' : 'rgba(0,0,0,0.05)';
266
+
267
+ width = container.clientWidth || 800;
268
+ height = Math.max(360, Math.round(width / 2.2));
269
+ svg.attr('width', width).attr('height', height);
270
+
271
+ const innerWidth = width - margin.left - margin.right;
272
+ const innerHeight = height - margin.top - margin.bottom;
273
+ gRoot.attr('transform', `translate(${margin.left},${margin.top})`);
274
+
275
+ xScale.range([0, innerWidth]);
276
+ yScale.range([innerHeight, 0]);
277
+
278
+ // Compute Y ticks
279
+ let yTicks = [];
280
+ if (isRankStrictFlag) {
281
+ const maxR = Math.max(1, Math.round(rankTickMax));
282
+ for (let v = 1; v <= maxR; v += 1) yTicks.push(v);
283
+ } else {
284
+ // Use D3's tick generator to produce nice floating-point ticks
285
+ yTicks = yScale.ticks(6);
286
+ }
287
+
288
+ // Grid (horizontal)
289
+ gGrid.selectAll('*').remove();
290
+ gGrid.selectAll('line')
291
+ .data(yTicks)
292
+ .join('line')
293
+ .attr('x1', 0)
294
+ .attr('x2', innerWidth)
295
+ .attr('y1', (d) => yScale(d))
296
+ .attr('y2', (d) => yScale(d))
297
+ .attr('stroke', gridColor)
298
+ .attr('stroke-width', 1)
299
+ .attr('shape-rendering', 'crispEdges');
300
+
301
+ // Axes
302
+ gAxes.selectAll('*').remove();
303
+ let xAxis = d3.axisBottom(xScale).tickSizeOuter(0);
304
+ if (isRankStrictFlag) {
305
+ const [dx0, dx1] = xScale.domain();
306
+ const start = Math.ceil(dx0 / 1000) * 1000;
307
+ const end = Math.floor(dx1 / 1000) * 1000;
308
+ const xTicks = [];
309
+ for (let v = start; v <= end; v += 1000) xTicks.push(v);
310
+ if (xTicks.length === 0) xTicks.push(Math.round(dx0));
311
+ xAxis = xAxis.tickValues(xTicks).tickFormat(d3.format('d'));
312
+ } else {
313
+ xAxis = xAxis.ticks(8);
314
+ }
315
+ const yAxis = d3.axisLeft(yScale)
316
+ .tickValues(yTicks)
317
+ .tickSizeOuter(0)
318
+ .tickFormat(isRankStrictFlag ? d3.format('d') : d3.format('.2f'));
319
+ gAxes.append('g')
320
+ .attr('transform', `translate(0,${innerHeight})`)
321
+ .call(xAxis)
322
+ .call((g) => {
323
+ g.selectAll('path, line').attr('stroke', axisColor);
324
+ g.selectAll('text').attr('fill', tickColor).style('font-size', '12px');
325
+ });
326
+ gAxes.append('g')
327
+ .call(yAxis)
328
+ .call((g) => {
329
+ g.selectAll('path, line').attr('stroke', axisColor);
330
+ g.selectAll('text').attr('fill', tickColor).style('font-size', '12px');
331
+ });
332
+
333
+ // Axis labels (X and Y)
334
+ gAxes.append('text')
335
+ .attr('class', 'axis-label axis-label--x')
336
+ .attr('x', innerWidth / 2)
337
+ .attr('y', innerHeight + 44)
338
+ .attr('text-anchor', 'middle')
339
+ .style('font-size', '12px')
340
+ .style('fill', tickColor)
341
+ .text('Step');
342
+ gAxes.append('text')
343
+ .attr('class', 'axis-label axis-label--y')
344
+ .attr('text-anchor', 'middle')
345
+ .attr('transform', `translate(${-44},${innerHeight/2}) rotate(-90)`)
346
+ .style('font-size', '12px')
347
+ .style('fill', tickColor)
348
+ .text('Value');
349
+
350
+ overlay.attr('x', 0).attr('y', 0).attr('width', innerWidth).attr('height', innerHeight);
351
+ hoverLine.attr('y1', 0).attr('y2', innerHeight).attr('stroke', axisColor);
352
+
353
+ // Legend placeholder; actual content set in renderMetric
354
+ const legendWidth = Math.min(180, Math.max(120, Math.round(innerWidth * 0.22)));
355
+ const legendHeight = 64;
356
+ gLegend
357
+ .attr('x', innerWidth - legendWidth + 42)
358
+ .attr('y', innerHeight - legendHeight - 12)
359
+ .attr('width', legendWidth)
360
+ .attr('height', legendHeight);
361
+ const legendRoot = gLegend.selectAll('div').data([0]).join('xhtml:div');
362
+ Object.assign(legendRoot.node().style, {
363
+ background: 'transparent',
364
+ border: 'none',
365
+ borderRadius: '0',
366
+ padding: '0',
367
+ fontSize: '12px',
368
+ lineHeight: '1.35',
369
+ color: 'var(--text-color)'
370
+ });
371
+
372
+ return { innerWidth, innerHeight };
373
+ }
374
+
375
+ function renderMetric(metricKey){
376
+ const map = dataByMetric.get(metricKey) || {};
377
+ const runs = runOrder;
378
+ // Domain
379
+ let minStep = Infinity, maxStep = -Infinity, maxVal = 0, minVal = Infinity;
380
+ const isRank = /rank/i.test(metricKey);
381
+ const isAverage = /average/i.test(metricKey);
382
+ const isRankStrict = isRank && !isAverage;
383
+ runs.forEach(r => {
384
+ const arr = map[r] || [];
385
+ arr.forEach(pt => {
386
+ const val = isRankStrict ? Math.round(pt.value) : pt.value;
387
+ minStep = Math.min(minStep, pt.step);
388
+ maxStep = Math.max(maxStep, pt.step);
389
+ maxVal = Math.max(maxVal, val);
390
+ minVal = Math.min(minVal, val);
391
+ });
392
+ });
393
+ if (!isFinite(minStep) || !isFinite(maxStep)) { return; }
394
+ xScale.domain([minStep, maxStep]);
395
+ if (isRank) {
396
+ rankTickMax = Math.max(1, Math.ceil(maxVal)); // ceil so fractional (average) ranks stay inside the domain
397
+ yScale.domain([rankTickMax, 1]);
398
+ } else {
399
+ yScale.domain([0, Math.max(1, maxVal)]).nice();
400
+ }
401
+ isRankStrictFlag = isRankStrict;
402
+
403
+ const { innerWidth, innerHeight } = updateScales();
404
+
405
+ // Bind lines and markers
406
+ const series = runs.map((r, i) => ({
407
+ run: r,
408
+ color: pool[i % pool.length],
409
+ marker: markerShapes[i % markerShapes.length],
410
+ values: (map[r]||[])
411
+ .slice()
412
+ .sort((a,b)=>a.step-b.step)
413
+ .map(pt => isRankStrict ? { step: pt.step, value: Math.round(pt.value) } : pt)
414
+ }));
415
+
416
+ // Draw lines
417
+ const paths = gLines.selectAll('path.run-line').data(series, d=>d.run);
418
+ paths.enter().append('path').attr('class','run-line').attr('fill','none').attr('stroke-width',2)
419
+ .attr('stroke', d=>d.color).attr('opacity',0.9)
420
+ .attr('d', d=>lineGen(d.values))
421
+ .merge(paths)
422
+ .transition().duration(200)
423
+ .attr('stroke', d=>d.color)
424
+ .attr('d', d=>lineGen(d.values));
425
+ paths.exit().remove();
426
+
427
+ // Draw markers for each data point
428
+ gPoints.selectAll('*').remove();
429
+ series.forEach((s, seriesIndex) => {
430
+ const pointGroup = gPoints.selectAll(`.points-${seriesIndex}`)
431
+ .data(s.values)
432
+ .join('g')
433
+ .attr('class', `points-${seriesIndex}`)
434
+ .attr('transform', d => `translate(${xScale(d.step)},${yScale(d.value)})`);
435
+
436
+ drawMarker(pointGroup, s.marker, markerSize)
437
+ .attr('fill', s.color)
438
+ .attr('stroke', s.color)
439
+ .attr('stroke-width', 1.5)
440
+ .style('cursor', 'crosshair');
441
+ });
442
+
443
+ // Inline legend content with marker shapes
444
+ legendInline.innerHTML = '';
445
+ series.forEach(s => {
446
+ const legendItem = document.createElement('span');
447
+ legendItem.style.cssText = 'display:inline-flex;align-items:center;gap:6px;white-space:nowrap;';
448
+
449
+ // Create small SVG for marker shape
450
+ const markerSvg = document.createElementNS('http://www.w3.org/2000/svg', 'svg');
451
+ markerSvg.setAttribute('width', '16');
452
+ markerSvg.setAttribute('height', '12');
453
+ markerSvg.style.display = 'inline-block';
454
+
455
+ const g = document.createElementNS('http://www.w3.org/2000/svg', 'g');
456
+ g.setAttribute('transform', 'translate(8,6)');
457
+
458
+ let shape;
459
+ const size = 6;
460
+ const halfSize = size / 2;
461
+ switch(s.marker) {
462
+ case 'circle':
463
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'circle');
464
+ shape.setAttribute('r', halfSize);
465
+ break;
466
+ case 'square':
467
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'rect');
468
+ shape.setAttribute('x', -halfSize);
469
+ shape.setAttribute('y', -halfSize);
470
+ shape.setAttribute('width', size);
471
+ shape.setAttribute('height', size);
472
+ break;
473
+ case 'triangle':
474
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
475
+ shape.setAttribute('d', `M0,${-halfSize * 1.2} L${halfSize * 1.1},${halfSize * 0.6} L${-halfSize * 1.1},${halfSize * 0.6} Z`);
476
+ break;
477
+ case 'diamond':
478
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
479
+ shape.setAttribute('d', `M0,${-halfSize * 1.2} L${halfSize * 1.1},0 L0,${halfSize * 1.2} L${-halfSize * 1.1},0 Z`);
480
+ break;
481
+ case 'inverted-triangle':
482
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
483
+ shape.setAttribute('d', `M0,${halfSize * 1.2} L${halfSize * 1.1},${-halfSize * 0.6} L${-halfSize * 1.1},${-halfSize * 0.6} Z`);
484
+ break;
485
+ default:
486
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'circle');
487
+ shape.setAttribute('r', halfSize);
488
+ }
489
+ shape.setAttribute('fill', s.color);
490
+ shape.setAttribute('stroke', s.color);
491
+ shape.setAttribute('stroke-width', '1');
492
+
493
+ g.appendChild(shape);
494
+ markerSvg.appendChild(g);
495
+
496
+ const label = document.createElement('span');
497
+ label.textContent = s.run;
498
+
499
+ legendItem.appendChild(markerSvg);
500
+ legendItem.appendChild(label);
501
+ legendInline.appendChild(legendItem);
502
+ });
503
+
504
+ // Hover
505
+ const stepSet = new Set(); series.forEach(s=>s.values.forEach(v=>stepSet.add(v.step)));
506
+ const steps = Array.from(stepSet).sort((a,b)=>a-b);
507
+ function onMove(event){
508
+ const [mx, my] = d3.pointer(event, overlay.node());
509
+ const xVal = xScale.invert(mx); // data-space x under the pointer
510
+ const nearest = steps.reduce((best, s) => Math.abs(s - xVal) < Math.abs(best - xVal) ? s : best, steps[0]);
511
+ const xpx = xScale(nearest);
512
+ hoverLine.attr('x1', xpx).attr('x2', xpx).style('display', null); // keep the theme-aware stroke set in updateScales
513
+ // Tooltip content
514
+ let html = `<div><strong>${getMetricDisplayName(metricKey)}</strong></div><div><strong>step</strong> ${nearest}</div>`;
515
+ series.forEach(s=>{
516
+ const m = new Map(s.values.map(v=>[v.step, v.value]));
517
+ const val = m.has(nearest) ? m.get(nearest) : null;
518
+ if (val != null) {
519
+ const formatVal = (vv) => (isRankStrict ? d3.format('d')(vv) : (+vv).toFixed(4));
520
+ html += `<div><span style=\"display:inline-block;width:10px;height:10px;background:${s.color};border-radius:50%;margin-right:6px;\"></span><strong>${s.run}</strong> ${formatVal(val)}</div>`;
521
+ }
522
+ });
523
+ tipInner.innerHTML = html;
524
+ const offsetX = 12, offsetY = 12;
525
+ tip.style.opacity = '1'; tip.style.transform = `translate(${Math.round(mx + offsetX + margin.left)}px, ${Math.round(my + offsetY + margin.top)}px)`;
526
+ }
527
+ function onLeave(){ tip.style.opacity='0'; tip.style.transform='translate(-9999px, -9999px)'; hoverLine.style('display','none'); }
528
+ overlay.on('mousemove', onMove).on('mouseleave', onLeave);
529
+ }
530
+
531
+ // (old hover removed; hover is attached in renderMetric)
532
+
533
+ // Load CSV and wire controls
534
+ (async () => {
535
+ try {
536
+ const text = await fetchFirstAvailable(CSV_PATHS);
537
+ const rows = d3.csvParse(text, d => ({ run: (d.run||'').trim(), step: +d.step, metric: (d.metric||'').trim(), value: +d.value }));
538
+ metricList = Array.from(new Set(rows.map(r=>r.metric))).sort();
539
+ runList = Array.from(new Set(rows.map(r=>r.run))).sort();
540
+ runOrder = runList;
541
+ // Build dataByMetric
542
+ metricList.forEach(m => {
543
+ const map = {};
544
+ runList.forEach(r => { map[r] = []; });
545
+ rows.filter(r=>r.metric===m).forEach(r => { if (!isNaN(r.step) && !isNaN(r.value)) map[r.run].push({ step:r.step, value:r.value }); });
546
+ dataByMetric.set(m, map);
547
+ });
548
+
549
+ // Populate metric select (default to average_rank if present)
550
+ metricList.forEach((m)=>{ const o=document.createElement('option'); o.value=m; o.textContent=getMetricDisplayName(m); selectMetric.appendChild(o); });
551
+ const def = metricList.find(m => /average_rank/i.test(m)) || metricList[0];
552
+ if (def) selectMetric.value = def;
553
+
554
+ container.appendChild(controls);
555
+ updateScales();
556
+ renderMetric(selectMetric.value);
557
+
558
+ selectMetric.addEventListener('change', ()=>{ renderMetric(selectMetric.value); });
559
+
560
+ const rerender = () => { renderMetric(selectMetric.value); };
561
+ if (window.ResizeObserver) { const ro = new ResizeObserver(()=>rerender()); ro.observe(container); } else { window.addEventListener('resize', rerender); }
562
+ } catch (e) {
563
+ const pre = document.createElement('pre'); pre.textContent = 'CSV load error: ' + (e && e.message ? e.message : e);
564
+ pre.style.color = 'var(--danger, #b00020)'; pre.style.fontSize = '12px'; pre.style.whiteSpace = 'pre-wrap';
565
+ container.appendChild(pre);
566
+ }
567
+ })();
568
+ };
569
+
570
+ if (document.readyState === 'loading') {
571
+ document.addEventListener('DOMContentLoaded', () => ensureD3(bootstrap), { once: true });
572
+ } else { ensureD3(bootstrap); }
573
+ })();
574
+ </script>
575
+
576
+
app/src/content/embeds/relevance-filters.html ADDED
@@ -0,0 +1,576 @@
1
+ <div class="d3-line" style="width:100%;margin:10px 0;"></div>
2
+ <style>
3
+ .d3-line .d3-line__controls select {
4
+ font-size: 12px;
5
+ padding: 8px 28px 8px 10px;
6
+ border: 1px solid var(--border-color);
7
+ border-radius: 8px;
8
+ background-color: var(--surface-bg);
9
+ color: var(--text-color);
10
+ background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='12' height='12' viewBox='0 0 24 24' fill='none' stroke='%230f1115' stroke-width='2' stroke-linecap='round' stroke-linejoin='round'%3E%3Cpolyline points='6 9 12 15 18 9'/%3E%3C/svg%3E");
11
+ background-repeat: no-repeat;
12
+ background-position: right 8px center;
13
+ background-size: 12px;
14
+ -webkit-appearance: none;
15
+ -moz-appearance: none;
16
+ appearance: none;
17
+ cursor: pointer;
18
+ transition: border-color .15s ease, box-shadow .15s ease;
19
+ }
20
+ [data-theme="dark"] .d3-line .d3-line__controls select {
21
+ background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='12' height='12' viewBox='0 0 24 24' fill='none' stroke='%23ffffff' stroke-width='2' stroke-linecap='round' stroke-linejoin='round'%3E%3Cpolyline points='6 9 12 15 18 9'/%3E%3C/svg%3E");
22
+ }
23
+ .d3-line .d3-line__controls select:hover {
24
+ border-color: var(--primary-color);
25
+ }
26
+ .d3-line .d3-line__controls select:focus {
27
+ border-color: var(--primary-color);
28
+ box-shadow: 0 0 0 3px rgba(232,137,171,.25);
29
+ outline: none;
30
+ }
31
+ .d3-line .d3-line__controls label { gap: 8px; }
32
+
33
+ /* Range slider themed with --primary-color */
34
+ .d3-line .d3-line__controls input[type="range"] {
35
+ -webkit-appearance: none;
36
+ appearance: none;
37
+ width: 100%;
38
+ height: 6px;
39
+ border-radius: 999px;
40
+ background: var(--border-color);
41
+ outline: none;
42
+ }
43
+ .d3-line .d3-line__controls input[type="range"]::-webkit-slider-runnable-track {
44
+ height: 6px;
45
+ background: transparent;
46
+ border-radius: 999px;
47
+ }
48
+ .d3-line .d3-line__controls input[type="range"]::-webkit-slider-thumb {
49
+ -webkit-appearance: none;
50
+ appearance: none;
51
+ width: 16px;
52
+ height: 16px;
53
+ border-radius: 50%;
54
+ background: var(--primary-color);
55
+ border: 2px solid var(--on-primary);
56
+ margin-top: -5px;
57
+ cursor: pointer;
58
+ }
59
+ .d3-line .d3-line__controls input[type="range"]::-moz-range-track {
60
+ height: 6px;
61
+ background: transparent;
62
+ border-radius: 999px;
63
+ }
64
+ .d3-line .d3-line__controls input[type="range"]::-moz-range-thumb {
65
+ width: 16px;
66
+ height: 16px;
67
+ border-radius: 50%;
68
+ background: var(--primary-color);
69
+ border: 2px solid var(--on-primary);
70
+ cursor: pointer;
71
+ }
72
+ /* Improved line color via CSS */
73
+ .d3-line .lines path.improved { stroke: var(--primary-color); }
74
+ </style>
75
+ <script>
76
+ (() => {
77
+ const ensureD3 = (cb) => {
78
+ if (window.d3 && typeof window.d3.select === 'function') return cb();
79
+ let s = document.getElementById('d3-cdn-script');
80
+ if (!s) {
81
+ s = document.createElement('script');
82
+ s.id = 'd3-cdn-script';
83
+ s.src = 'https://cdn.jsdelivr.net/npm/d3@7/dist/d3.min.js';
84
+ document.head.appendChild(s);
85
+ }
86
+ const onReady = () => { if (window.d3 && typeof window.d3.select === 'function') cb(); };
87
+ s.addEventListener('load', onReady, { once: true });
88
+ if (window.d3) onReady();
89
+ };
90
+
91
+ const bootstrap = () => {
92
+ const mount = document.currentScript ? document.currentScript.previousElementSibling : null;
93
+ const container = (mount && mount.querySelector && mount.querySelector('.d3-line')) || Array.from(document.querySelectorAll('.d3-line')).find((el) => !(el.dataset && el.dataset.mounted === 'true'));
94
+ if (!container) return;
95
+ if (container.dataset) {
96
+ if (container.dataset.mounted === 'true') return;
97
+ container.dataset.mounted = 'true';
98
+ }
99
+
100
+ // CSV: prefer public path, fallback to relative
101
+ const CSV_PATHS = [
102
+ '/data/relevance_filters.csv',
103
+ './assets/data/relevance_filters.csv',
104
+ '../assets/data/relevance_filters.csv',
105
+ '../../assets/data/relevance_filters.csv'
106
+ ];
107
+ const fetchFirstAvailable = async (paths) => {
108
+ for (const p of paths) {
109
+ try { const r = await fetch(p, { cache: 'no-cache' }); if (r.ok) return await r.text(); } catch(e) {}
110
+ }
111
+ throw new Error('CSV not found: relevance_filters.csv');
112
+ };
113
+
114
+ // Controls UI
115
+ const controls = document.createElement('div');
116
+ controls.className = 'd3-line__controls';
117
+ Object.assign(controls.style, {
118
+ marginTop: '12px',
119
+ display: 'flex',
120
+ gap: '16px',
121
+ alignItems: 'center',
122
+ justifyContent: 'space-between',
123
+ width: '100%'
124
+ });
125
+
126
+ const labelMetric = document.createElement('label');
127
+ Object.assign(labelMetric.style, {
128
+ fontSize: '12px', color: 'var(--muted-color)', display: 'flex', alignItems: 'center', gap: '6px', whiteSpace: 'nowrap', padding: '6px 10px', marginLeft: 'auto'
129
+ });
130
+ labelMetric.textContent = 'Metric';
131
+ const selectMetric = document.createElement('select');
132
+ Object.assign(selectMetric.style, { fontSize: '12px' });
133
+ labelMetric.appendChild(selectMetric);
134
+
135
+ // Inline legend on the right of the select
136
+ const legendInline = document.createElement('div');
137
+ legendInline.className = 'controls__legend';
138
+ Object.assign(legendInline.style, {
139
+ display: 'flex',
140
+ gap: '8px',
141
+ alignItems: 'center',
142
+ flexWrap: 'nowrap',
143
+ fontSize: '11px',
144
+ marginLeft: '8px'
145
+ });
146
+ controls.appendChild(legendInline);
147
+ controls.appendChild(labelMetric);
148
+
149
+ // Create SVG with marker definitions
150
+ const svg = d3.select(container).append('svg')
151
+ .attr('width', '100%')
152
+ .style('display', 'block');
153
+
154
+ // Add marker definitions for different shapes
155
+ const defs = svg.append('defs');
156
+
157
+ // Academic marker shapes
158
+ const markerShapes = ['circle', 'square', 'triangle', 'diamond', 'inverted-triangle'];
159
+ const markerSize = 8;
160
+
161
+ // Groups
162
+ const gRoot = svg.append('g');
163
+ const gGrid = gRoot.append('g').attr('class', 'grid');
164
+ const gAxes = gRoot.append('g').attr('class', 'axes');
165
+ const gLines = gRoot.append('g').attr('class', 'lines');
166
+ const gPoints = gRoot.append('g').attr('class', 'points');
167
+ const gHover = gRoot.append('g').attr('class', 'hover');
168
+ const gLegend = gRoot.append('foreignObject').attr('class', 'legend');
169
+
170
+ // Tooltip
171
+ container.style.position = container.style.position || 'relative';
172
+ let tip = container.querySelector('.d3-tooltip');
173
+ let tipInner;
174
+ if (!tip) {
175
+ tip = document.createElement('div');
176
+ tip.className = 'd3-tooltip';
177
+ Object.assign(tip.style, {
178
+ position: 'absolute', top: '0px', left: '0px', transform: 'translate(-9999px, -9999px)', pointerEvents: 'none',
179
+ padding: '8px 10px', borderRadius: '8px', fontSize: '12px', lineHeight: '1.35', border: '1px solid var(--border-color)',
180
+ background: 'var(--surface-bg)', color: 'var(--text-color)', boxShadow: '0 4px 24px rgba(0,0,0,.18)', opacity: '0',
181
+ transition: 'opacity .12s ease'
182
+ });
183
+ tipInner = document.createElement('div');
184
+ tipInner.className = 'd3-tooltip__inner';
185
+ tipInner.style.textAlign = 'left';
186
+ tip.appendChild(tipInner);
187
+ container.appendChild(tip);
188
+ } else {
189
+ tipInner = tip.querySelector('.d3-tooltip__inner') || tip;
190
+ }
191
+
192
+ // Colors per run
193
+ const primary = getComputedStyle(document.documentElement).getPropertyValue('--primary-color').trim() || '#E889AB';
194
+ const pool = [primary, '#4EA5B7', '#E38A42', '#CEC0FA', ...(d3.schemeTableau10||[])];
195
+
196
+ // Mapping from metric names to display titles
197
+ const metricTitleMapping = {
198
+ 'docvqa_val_anls': 'DocVQA',
199
+ 'infovqa_val_anls': 'InfoVQA',
200
+ 'mme_total_score': 'MME Total',
201
+ 'mmmu_val_mmmu_acc': 'MMMU',
202
+ 'mmstar_average': 'MMStar',
203
+ 'ocrbench_ocrbench_accuracy': 'OCRBench',
204
+ 'scienceqa_exact_match': 'ScienceQA',
205
+ 'textvqa_val_exact_match': 'TextVQA',
206
+ 'average': 'Average (excl. MME)',
207
+ 'average_rank': 'Average Rank',
208
+ 'ai2d_exact_match': 'AI2D',
209
+ 'chartqa_relaxed_overall': 'ChartQA',
210
+ 'seedbench_seed_all': 'SeedBench'
211
+ };
212
+
213
+ // Function to get display name for metric
214
+ function getMetricDisplayName(metricKey) {
215
+ return metricTitleMapping[metricKey] || metricKey;
216
+ }
217
+
218
+ // State and data
219
+ let metricList = [];
220
+ let runList = [];
221
+ let runOrder = [];
222
+ const dataByMetric = new Map(); // metric => { run => [{step,value}] }
223
+ let isRankStrictFlag = false;
224
+ let rankTickMax = 1;
225
+
226
+ // Scales and layout
227
+ let width = 800, height = 360;
228
+ let margin = { top: 16, right: 28, bottom: 56, left: 64 };
229
+ let xScale = d3.scaleLinear();
230
+ let yScale = d3.scaleLinear();
231
+
232
+ // Line generators - simple linear connections
233
+ const lineGen = d3.line()
234
+ .x((d) => xScale(d.step))
235
+ .y((d) => yScale(d.value));
236
+
237
+ // Function to draw different marker shapes
238
+ function drawMarker(selection, shape, size) {
239
+ const s = size / 2;
240
+ switch (shape) {
241
+ case 'circle':
242
+ return selection.append('circle').attr('r', s);
243
+ case 'square':
244
+ return selection.append('rect').attr('x', -s).attr('y', -s).attr('width', size).attr('height', size);
245
+ case 'triangle':
246
+ return selection.append('path').attr('d', `M0,${-s * 1.2} L${s * 1.1},${s * 0.6} L${-s * 1.1},${s * 0.6} Z`);
247
+ case 'diamond':
248
+ return selection.append('path').attr('d', `M0,${-s * 1.2} L${s * 1.1},0 L0,${s * 1.2} L${-s * 1.1},0 Z`);
249
+ case 'inverted-triangle':
250
+ return selection.append('path').attr('d', `M0,${s * 1.2} L${s * 1.1},${-s * 0.6} L${-s * 1.1},${-s * 0.6} Z`);
251
+ default:
252
+ return selection.append('circle').attr('r', s);
253
+ }
254
+ }
255
+
256
+ // Hover elements
257
+ const hoverLine = gHover.append('line').attr('stroke-width', 1);
258
+
259
+ const overlay = gHover.append('rect').attr('fill', 'transparent').style('cursor', 'crosshair');
260
+
261
+ function updateScales() {
262
+ const isDark = document.documentElement.getAttribute('data-theme') === 'dark';
263
+ const axisColor = isDark ? 'rgba(255,255,255,0.25)' : 'rgba(0,0,0,0.25)';
264
+ const tickColor = isDark ? 'rgba(255,255,255,0.70)' : 'rgba(0,0,0,0.55)';
265
+ const gridColor = isDark ? 'rgba(255,255,255,0.08)' : 'rgba(0,0,0,0.05)';
266
+
267
+ width = container.clientWidth || 800;
268
+ height = Math.max(360, Math.round(width / 2.2));
269
+ svg.attr('width', width).attr('height', height);
270
+
271
+ const innerWidth = width - margin.left - margin.right;
272
+ const innerHeight = height - margin.top - margin.bottom;
273
+ gRoot.attr('transform', `translate(${margin.left},${margin.top})`);
274
+
275
+ xScale.range([0, innerWidth]);
276
+ yScale.range([innerHeight, 0]);
277
+
278
+ // Compute Y ticks
279
+ let yTicks = [];
280
+ if (isRankStrictFlag) {
281
+ const maxR = Math.max(1, Math.round(rankTickMax));
282
+ for (let v = 1; v <= maxR; v += 1) yTicks.push(v);
283
+ } else {
284
+ // Use D3's tick generator to produce nice floating-point ticks
285
+ yTicks = yScale.ticks(6);
286
+ }
287
+
288
+ // Grid (horizontal)
289
+ gGrid.selectAll('*').remove();
290
+ gGrid.selectAll('line')
291
+ .data(yTicks)
292
+ .join('line')
293
+ .attr('x1', 0)
294
+ .attr('x2', innerWidth)
295
+ .attr('y1', (d) => yScale(d))
296
+ .attr('y2', (d) => yScale(d))
297
+ .attr('stroke', gridColor)
298
+ .attr('stroke-width', 1)
299
+ .attr('shape-rendering', 'crispEdges');
300
+
301
+ // Axes
302
+ gAxes.selectAll('*').remove();
303
+ let xAxis = d3.axisBottom(xScale).tickSizeOuter(0);
304
+ if (isRankStrictFlag) {
305
+ const [dx0, dx1] = xScale.domain();
306
+ const start = Math.ceil(dx0 / 1000) * 1000;
307
+ const end = Math.floor(dx1 / 1000) * 1000;
308
+ const xTicks = [];
309
+ for (let v = start; v <= end; v += 1000) xTicks.push(v);
310
+ if (xTicks.length === 0) xTicks.push(Math.round(dx0));
311
+ xAxis = xAxis.tickValues(xTicks).tickFormat(d3.format('d'));
312
+ } else {
313
+ xAxis = xAxis.ticks(8);
314
+ }
315
+ const yAxis = d3.axisLeft(yScale)
316
+ .tickValues(yTicks)
317
+ .tickSizeOuter(0)
318
+ .tickFormat(isRankStrictFlag ? d3.format('d') : d3.format('.2f'));
319
+ gAxes.append('g')
320
+ .attr('transform', `translate(0,${innerHeight})`)
321
+ .call(xAxis)
322
+ .call((g) => {
323
+ g.selectAll('path, line').attr('stroke', axisColor);
324
+ g.selectAll('text').attr('fill', tickColor).style('font-size', '12px');
325
+ });
326
+ gAxes.append('g')
327
+ .call(yAxis)
328
+ .call((g) => {
329
+ g.selectAll('path, line').attr('stroke', axisColor);
330
+ g.selectAll('text').attr('fill', tickColor).style('font-size', '12px');
331
+ });
332
+
333
+ // Axis labels (X and Y)
334
+ gAxes.append('text')
335
+ .attr('class', 'axis-label axis-label--x')
336
+ .attr('x', innerWidth / 2)
337
+ .attr('y', innerHeight + 44)
338
+ .attr('text-anchor', 'middle')
339
+ .style('font-size', '12px')
340
+ .style('fill', tickColor)
341
+ .text('Step');
342
+ gAxes.append('text')
343
+ .attr('class', 'axis-label axis-label--y')
344
+ .attr('text-anchor', 'middle')
345
+ .attr('transform', `translate(${-44},${innerHeight/2}) rotate(-90)`)
346
+ .style('font-size', '12px')
347
+ .style('fill', tickColor)
348
+ .text('Value');
349
+
350
+ overlay.attr('x', 0).attr('y', 0).attr('width', innerWidth).attr('height', innerHeight);
351
+ hoverLine.attr('y1', 0).attr('y2', innerHeight).attr('stroke', axisColor);
352
+
353
+ // Legend placeholder; actual content set in renderMetric
354
+ const legendWidth = Math.min(180, Math.max(120, Math.round(innerWidth * 0.22)));
355
+ const legendHeight = 64;
356
+ gLegend
357
+ .attr('x', innerWidth - legendWidth + 42)
358
+ .attr('y', innerHeight - legendHeight - 12)
359
+ .attr('width', legendWidth)
360
+ .attr('height', legendHeight);
361
+ const legendRoot = gLegend.selectAll('div').data([0]).join('xhtml:div');
362
+ Object.assign(legendRoot.node().style, {
363
+ background: 'transparent',
364
+ border: 'none',
365
+ borderRadius: '0',
366
+ padding: '0',
367
+ fontSize: '12px',
368
+ lineHeight: '1.35',
369
+ color: 'var(--text-color)'
370
+ });
371
+
372
+ return { innerWidth, innerHeight };
373
+ }
374
+
375
+ function renderMetric(metricKey){
376
+ const map = dataByMetric.get(metricKey) || {};
377
+ const runs = runOrder;
378
+ // Domain
379
+ let minStep = Infinity, maxStep = -Infinity, maxVal = 0, minVal = Infinity;
380
+ const isRank = /rank/i.test(metricKey);
381
+ const isAverage = /average/i.test(metricKey);
382
+ const isRankStrict = isRank && !isAverage;
383
+ runs.forEach(r => {
384
+ const arr = map[r] || [];
385
+ arr.forEach(pt => {
386
+ const val = isRankStrict ? Math.round(pt.value) : pt.value;
387
+ minStep = Math.min(minStep, pt.step);
388
+ maxStep = Math.max(maxStep, pt.step);
389
+ maxVal = Math.max(maxVal, val);
390
+ minVal = Math.min(minVal, val);
391
+ });
392
+ });
393
+ if (!isFinite(minStep) || !isFinite(maxStep)) { return; }
394
+ xScale.domain([minStep, maxStep]);
395
+ if (isRank) {
396
+ rankTickMax = Math.max(1, Math.ceil(maxVal)); // ceil (not round) so fractional average ranks stay inside the y domain
397
+ yScale.domain([rankTickMax, 1]);
398
+ } else {
399
+ yScale.domain([0, Math.max(1, maxVal)]).nice();
400
+ }
401
+ isRankStrictFlag = isRankStrict;
402
+
403
+ const { innerWidth, innerHeight } = updateScales();
404
+
405
+ // Bind lines and markers
406
+ const series = runs.map((r, i) => ({
407
+ run: r,
408
+ color: pool[i % pool.length],
409
+ marker: markerShapes[i % markerShapes.length],
410
+ values: (map[r]||[])
411
+ .slice()
412
+ .sort((a,b)=>a.step-b.step)
413
+ .map(pt => isRankStrict ? { step: pt.step, value: Math.round(pt.value) } : pt)
414
+ }));
415
+
416
+ // Draw lines
417
+ const paths = gLines.selectAll('path.run-line').data(series, d=>d.run);
418
+ paths.enter().append('path').attr('class','run-line').attr('fill','none').attr('stroke-width',2)
419
+ .attr('stroke', d=>d.color).attr('opacity',0.9)
420
+ .attr('d', d=>lineGen(d.values))
421
+ .merge(paths)
422
+ .transition().duration(200)
423
+ .attr('stroke', d=>d.color)
424
+ .attr('d', d=>lineGen(d.values));
425
+ paths.exit().remove();
426
+
427
+ // Draw markers for each data point
428
+ gPoints.selectAll('*').remove();
429
+ series.forEach((s, seriesIndex) => {
430
+ const pointGroup = gPoints.selectAll(`.points-${seriesIndex}`)
431
+ .data(s.values)
432
+ .join('g')
433
+ .attr('class', `points-${seriesIndex}`)
434
+ .attr('transform', d => `translate(${xScale(d.step)},${yScale(d.value)})`);
435
+
436
+ drawMarker(pointGroup, s.marker, markerSize)
437
+ .attr('fill', s.color)
438
+ .attr('stroke', s.color)
439
+ .attr('stroke-width', 1.5)
440
+ .style('cursor', 'crosshair');
441
+ });
442
+
443
+ // Inline legend content with marker shapes
444
+ legendInline.innerHTML = '';
445
+ series.forEach(s => {
446
+ const legendItem = document.createElement('span');
447
+ legendItem.style.cssText = 'display:inline-flex;align-items:center;gap:6px;white-space:nowrap;';
448
+
449
+ // Create small SVG for marker shape
450
+ const markerSvg = document.createElementNS('http://www.w3.org/2000/svg', 'svg');
451
+ markerSvg.setAttribute('width', '16');
452
+ markerSvg.setAttribute('height', '12');
453
+ markerSvg.style.display = 'inline-block';
454
+
455
+ const g = document.createElementNS('http://www.w3.org/2000/svg', 'g');
456
+ g.setAttribute('transform', 'translate(8,6)');
457
+
458
+ let shape;
459
+ const size = 6;
460
+ const halfSize = size / 2;
461
+ switch(s.marker) {
462
+ case 'circle':
463
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'circle');
464
+ shape.setAttribute('r', halfSize);
465
+ break;
466
+ case 'square':
467
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'rect');
468
+ shape.setAttribute('x', -halfSize);
469
+ shape.setAttribute('y', -halfSize);
470
+ shape.setAttribute('width', size);
471
+ shape.setAttribute('height', size);
472
+ break;
473
+ case 'triangle':
474
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
475
+ shape.setAttribute('d', `M0,${-halfSize * 1.2} L${halfSize * 1.1},${halfSize * 0.6} L${-halfSize * 1.1},${halfSize * 0.6} Z`);
476
+ break;
477
+ case 'diamond':
478
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
479
+ shape.setAttribute('d', `M0,${-halfSize * 1.2} L${halfSize * 1.1},0 L0,${halfSize * 1.2} L${-halfSize * 1.1},0 Z`);
480
+ break;
481
+ case 'inverted-triangle':
482
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
483
+ shape.setAttribute('d', `M0,${halfSize * 1.2} L${halfSize * 1.1},${-halfSize * 0.6} L${-halfSize * 1.1},${-halfSize * 0.6} Z`);
484
+ break;
485
+ default:
486
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'circle');
487
+ shape.setAttribute('r', halfSize);
488
+ }
489
+ shape.setAttribute('fill', s.color);
490
+ shape.setAttribute('stroke', s.color);
491
+ shape.setAttribute('stroke-width', '1');
492
+
493
+ g.appendChild(shape);
494
+ markerSvg.appendChild(g);
495
+
496
+ const label = document.createElement('span');
497
+ label.textContent = s.run;
498
+
499
+ legendItem.appendChild(markerSvg);
500
+ legendItem.appendChild(label);
501
+ legendInline.appendChild(legendItem);
502
+ });
503
+
504
+ // Hover
505
+ const stepSet = new Set(); series.forEach(s=>s.values.forEach(v=>stepSet.add(v.step)));
506
+ const steps = Array.from(stepSet).sort((a,b)=>a-b);
507
+ function onMove(event){
508
+ const [mx, my] = d3.pointer(event, overlay.node());
509
+ const xVal = xScale.invert(mx); // cursor position in data (step) space
510
+ const nearest = steps.reduce((best, s) => Math.abs(s - xVal) < Math.abs(best - xVal) ? s : best, steps[0]);
511
+ const xpx = xScale(nearest);
512
+ hoverLine.attr('x1', xpx).attr('x2', xpx).style('display', null); // stroke stays theme-aware, as set in updateScales
513
+ // Tooltip content
514
+ let html = `<div><strong>${getMetricDisplayName(metricKey)}</strong></div><div><strong>step</strong> ${nearest}</div>`;
515
+ series.forEach(s=>{
516
+ const m = new Map(s.values.map(v=>[v.step, v.value]));
517
+ const val = m.has(nearest) ? m.get(nearest) : null;
518
+ if (val != null) {
519
+ const formatVal = (vv) => (isRankStrict ? d3.format('d')(vv) : (+vv).toFixed(4));
520
+ html += `<div><span style=\"display:inline-block;width:10px;height:10px;background:${s.color};border-radius:50%;margin-right:6px;\"></span><strong>${s.run}</strong> ${formatVal(val)}</div>`;
521
+ }
522
+ });
523
+ tipInner.innerHTML = html;
524
+ const offsetX = 12, offsetY = 12;
525
+ tip.style.opacity = '1'; tip.style.transform = `translate(${Math.round(mx + offsetX + margin.left)}px, ${Math.round(my + offsetY + margin.top)}px)`;
526
+ }
527
+ function onLeave(){ tip.style.opacity='0'; tip.style.transform='translate(-9999px, -9999px)'; hoverLine.style('display','none'); }
528
+ overlay.on('mousemove', onMove).on('mouseleave', onLeave);
529
+ }
530
+
531
+ // (old hover removed; hover is attached in renderMetric)
532
+
533
+ // Load CSV and wire controls
534
+ (async () => {
535
+ try {
536
+ const text = await fetchFirstAvailable(CSV_PATHS);
537
+ const rows = d3.csvParse(text, d => ({ run: (d.run||'').trim(), step: +d.step, metric: (d.metric||'').trim(), value: +d.value }));
538
+ metricList = Array.from(new Set(rows.map(r=>r.metric))).sort();
539
+ runList = Array.from(new Set(rows.map(r=>r.run))).sort();
540
+ runOrder = runList;
541
+ // Build dataByMetric
542
+ metricList.forEach(m => {
543
+ const map = {};
544
+ runList.forEach(r => { map[r] = []; });
545
+ rows.filter(r=>r.metric===m).forEach(r => { if (!isNaN(r.step) && !isNaN(r.value)) map[r.run].push({ step:r.step, value:r.value }); });
546
+ dataByMetric.set(m, map);
547
+ });
548
+
549
+ // Populate metric select (default to average_rank if present)
550
+ metricList.forEach((m)=>{ const o=document.createElement('option'); o.value=m; o.textContent=getMetricDisplayName(m); selectMetric.appendChild(o); });
551
+ const def = metricList.find(m => /average_rank/i.test(m)) || metricList[0];
552
+ if (def) selectMetric.value = def;
553
+
554
+ container.appendChild(controls);
555
+ updateScales();
556
+ renderMetric(selectMetric.value);
557
+
558
+ selectMetric.addEventListener('change', ()=>{ renderMetric(selectMetric.value); });
559
+
560
+ const rerender = () => { renderMetric(selectMetric.value); };
561
+ if (window.ResizeObserver) { const ro = new ResizeObserver(()=>rerender()); ro.observe(container); } else { window.addEventListener('resize', rerender); }
562
+ } catch (e) {
563
+ const pre = document.createElement('pre'); pre.textContent = 'CSV load error: ' + (e && e.message ? e.message : e);
564
+ pre.style.color = 'var(--danger, #b00020)'; pre.style.fontSize = '12px'; pre.style.whiteSpace = 'pre-wrap';
565
+ container.appendChild(pre);
566
+ }
567
+ })();
568
+ };
569
+
570
+ if (document.readyState === 'loading') {
571
+ document.addEventListener('DOMContentLoaded', () => ensureD3(bootstrap), { once: true });
572
+ } else { ensureD3(bootstrap); }
573
+ })();
574
+ </script>
575
+
576
+
app/src/content/embeds/remove-ch.html ADDED
@@ -0,0 +1,576 @@
1
+ <div class="d3-line" style="width:100%;margin:10px 0;"></div>
2
+ <style>
3
+ .d3-line .d3-line__controls select {
4
+ font-size: 12px;
5
+ padding: 8px 28px 8px 10px;
6
+ border: 1px solid var(--border-color);
7
+ border-radius: 8px;
8
+ background-color: var(--surface-bg);
9
+ color: var(--text-color);
10
+ background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='12' height='12' viewBox='0 0 24 24' fill='none' stroke='%230f1115' stroke-width='2' stroke-linecap='round' stroke-linejoin='round'%3E%3Cpolyline points='6 9 12 15 18 9'/%3E%3C/svg%3E");
11
+ background-repeat: no-repeat;
12
+ background-position: right 8px center;
13
+ background-size: 12px;
14
+ -webkit-appearance: none;
15
+ -moz-appearance: none;
16
+ appearance: none;
17
+ cursor: pointer;
18
+ transition: border-color .15s ease, box-shadow .15s ease;
19
+ }
20
+ [data-theme="dark"] .d3-line .d3-line__controls select {
21
+ background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='12' height='12' viewBox='0 0 24 24' fill='none' stroke='%23ffffff' stroke-width='2' stroke-linecap='round' stroke-linejoin='round'%3E%3Cpolyline points='6 9 12 15 18 9'/%3E%3C/svg%3E");
22
+ }
23
+ .d3-line .d3-line__controls select:hover {
24
+ border-color: var(--primary-color);
25
+ }
26
+ .d3-line .d3-line__controls select:focus {
27
+ border-color: var(--primary-color);
28
+ box-shadow: 0 0 0 3px rgba(232,137,171,.25);
29
+ outline: none;
30
+ }
31
+ .d3-line .d3-line__controls label { gap: 8px; }
32
+
33
+ /* Range slider themed with --primary-color */
34
+ .d3-line .d3-line__controls input[type="range"] {
35
+ -webkit-appearance: none;
36
+ appearance: none;
37
+ width: 100%;
38
+ height: 6px;
39
+ border-radius: 999px;
40
+ background: var(--border-color);
41
+ outline: none;
42
+ }
43
+ .d3-line .d3-line__controls input[type="range"]::-webkit-slider-runnable-track {
44
+ height: 6px;
45
+ background: transparent;
46
+ border-radius: 999px;
47
+ }
48
+ .d3-line .d3-line__controls input[type="range"]::-webkit-slider-thumb {
49
+ -webkit-appearance: none;
50
+ appearance: none;
51
+ width: 16px;
52
+ height: 16px;
53
+ border-radius: 50%;
54
+ background: var(--primary-color);
55
+ border: 2px solid var(--on-primary);
56
+ margin-top: -5px;
57
+ cursor: pointer;
58
+ }
59
+ .d3-line .d3-line__controls input[type="range"]::-moz-range-track {
60
+ height: 6px;
61
+ background: transparent;
62
+ border-radius: 999px;
63
+ }
64
+ .d3-line .d3-line__controls input[type="range"]::-moz-range-thumb {
65
+ width: 16px;
66
+ height: 16px;
67
+ border-radius: 50%;
68
+ background: var(--primary-color);
69
+ border: 2px solid var(--on-primary);
70
+ cursor: pointer;
71
+ }
72
+ /* Improved line color via CSS */
73
+ .d3-line .lines path.improved { stroke: var(--primary-color); }
74
+ </style>
75
+ <script>
76
+ (() => {
77
+ const ensureD3 = (cb) => {
78
+ if (window.d3 && typeof window.d3.select === 'function') return cb();
79
+ let s = document.getElementById('d3-cdn-script');
80
+ if (!s) {
81
+ s = document.createElement('script');
82
+ s.id = 'd3-cdn-script';
83
+ s.src = 'https://cdn.jsdelivr.net/npm/d3@7/dist/d3.min.js';
84
+ document.head.appendChild(s);
85
+ }
86
+ const onReady = () => { if (window.d3 && typeof window.d3.select === 'function') cb(); };
87
+ s.addEventListener('load', onReady, { once: true });
88
+ if (window.d3) onReady();
89
+ };
90
+
91
+ const bootstrap = () => {
92
+ const mount = document.currentScript ? document.currentScript.previousElementSibling : null;
93
+ const container = (mount && mount.querySelector && mount.querySelector('.d3-line')) || document.querySelector('.d3-line:not([data-mounted="true"])'); // fall back to the first container not yet mounted
94
+ if (!container) return;
95
+ if (container.dataset) {
96
+ if (container.dataset.mounted === 'true') return;
97
+ container.dataset.mounted = 'true';
98
+ }
99
+
100
+ // CSV: prefer public path, fallback to relative
101
+ const CSV_PATHS = [
102
+ '/data/remove_ch.csv',
103
+ './assets/data/remove_ch.csv',
104
+ '../assets/data/remove_ch.csv',
105
+ '../../assets/data/remove_ch.csv'
106
+ ];
107
+ const fetchFirstAvailable = async (paths) => {
108
+ for (const p of paths) {
109
+ try { const r = await fetch(p, { cache: 'no-cache' }); if (r.ok) return await r.text(); } catch(e) {}
110
+ }
111
+ throw new Error('CSV not found: remove_ch.csv');
112
+ };
113
+
114
+ // Controls UI
115
+ const controls = document.createElement('div');
116
+ controls.className = 'd3-line__controls';
117
+ Object.assign(controls.style, {
118
+ marginTop: '12px',
119
+ display: 'flex',
120
+ gap: '16px',
121
+ alignItems: 'center',
122
+ justifyContent: 'space-between',
123
+ width: '100%'
124
+ });
125
+
126
+ const labelMetric = document.createElement('label');
127
+ Object.assign(labelMetric.style, {
128
+ fontSize: '12px', color: 'var(--muted-color)', display: 'flex', alignItems: 'center', gap: '6px', whiteSpace: 'nowrap', padding: '6px 10px', marginLeft: 'auto'
129
+ });
130
+ labelMetric.textContent = 'Metric';
131
+ const selectMetric = document.createElement('select');
132
+ Object.assign(selectMetric.style, { fontSize: '12px' });
133
+ labelMetric.appendChild(selectMetric);
134
+
135
+ // Inline legend on the right of the select
136
+ const legendInline = document.createElement('div');
137
+ legendInline.className = 'controls__legend';
138
+ Object.assign(legendInline.style, {
139
+ display: 'flex',
140
+ gap: '8px',
141
+ alignItems: 'center',
142
+ flexWrap: 'nowrap',
143
+ fontSize: '11px',
144
+ marginLeft: '8px'
145
+ });
146
+ controls.appendChild(legendInline);
147
+ controls.appendChild(labelMetric);
148
+
149
+ // Create SVG with marker definitions
150
+ const svg = d3.select(container).append('svg')
151
+ .attr('width', '100%')
152
+ .style('display', 'block');
153
+
154
+ // Add marker definitions for different shapes
155
+ const defs = svg.append('defs');
156
+
157
+ // Academic marker shapes
158
+ const markerShapes = ['circle', 'square', 'triangle', 'diamond', 'inverted-triangle'];
159
+ const markerSize = 8;
160
+
161
+ // Groups
162
+ const gRoot = svg.append('g');
163
+ const gGrid = gRoot.append('g').attr('class', 'grid');
164
+ const gAxes = gRoot.append('g').attr('class', 'axes');
165
+ const gLines = gRoot.append('g').attr('class', 'lines');
166
+ const gPoints = gRoot.append('g').attr('class', 'points');
167
+ const gHover = gRoot.append('g').attr('class', 'hover');
168
+ const gLegend = gRoot.append('foreignObject').attr('class', 'legend');
169
+
170
+ // Tooltip
171
+ container.style.position = container.style.position || 'relative';
172
+ let tip = container.querySelector('.d3-tooltip');
173
+ let tipInner;
174
+ if (!tip) {
175
+ tip = document.createElement('div');
176
+ tip.className = 'd3-tooltip';
177
+ Object.assign(tip.style, {
178
+ position: 'absolute', top: '0px', left: '0px', transform: 'translate(-9999px, -9999px)', pointerEvents: 'none',
179
+ padding: '8px 10px', borderRadius: '8px', fontSize: '12px', lineHeight: '1.35', border: '1px solid var(--border-color)',
180
+ background: 'var(--surface-bg)', color: 'var(--text-color)', boxShadow: '0 4px 24px rgba(0,0,0,.18)', opacity: '0',
181
+ transition: 'opacity .12s ease'
182
+ });
183
+ tipInner = document.createElement('div');
184
+ tipInner.className = 'd3-tooltip__inner';
185
+ tipInner.style.textAlign = 'left';
186
+ tip.appendChild(tipInner);
187
+ container.appendChild(tip);
188
+ } else {
189
+ tipInner = tip.querySelector('.d3-tooltip__inner') || tip;
190
+ }
191
+
192
+ // Colors per run
193
+ const primary = getComputedStyle(document.documentElement).getPropertyValue('--primary-color').trim() || '#E889AB';
194
+ const pool = [primary, '#4EA5B7', '#E38A42', '#CEC0FA', ...(d3.schemeTableau10||[])];
195
+
196
+ // Mapping from metric names to display titles
197
+ const metricTitleMapping = {
198
+ 'docvqa_val_anls': 'DocVQA',
199
+ 'infovqa_val_anls': 'InfoVQA',
200
+ 'mme_total_score': 'MME Total',
201
+ 'mmmu_val_mmmu_acc': 'MMMU',
202
+ 'mmstar_average': 'MMStar',
203
+ 'ocrbench_ocrbench_accuracy': 'OCRBench',
204
+ 'scienceqa_exact_match': 'ScienceQA',
205
+ 'textvqa_val_exact_match': 'TextVQA',
206
+ 'average': 'Average (excl. MME)',
207
+ 'average_rank': 'Average Rank',
208
+ 'ai2d_exact_match': 'AI2D',
209
+ 'chartqa_relaxed_overall': 'ChartQA',
210
+ 'seedbench_seed_all': 'SeedBench'
211
+ };
212
+
213
+ // Function to get display name for metric
214
+ function getMetricDisplayName(metricKey) {
215
+ return metricTitleMapping[metricKey] || metricKey;
216
+ }
217
+
218
+ // State and data
219
+ let metricList = [];
220
+ let runList = [];
221
+ let runOrder = [];
222
+ const dataByMetric = new Map(); // metric => { run => [{step,value}] }
223
+ let isRankStrictFlag = false;
224
+ let rankTickMax = 1;
225
+
226
+ // Scales and layout
227
+ let width = 800, height = 360;
228
+ let margin = { top: 16, right: 28, bottom: 56, left: 64 };
229
+ let xScale = d3.scaleLinear();
230
+ let yScale = d3.scaleLinear();
231
+
232
+ // Line generators - simple linear connections
233
+ const lineGen = d3.line()
234
+ .x((d) => xScale(d.step))
235
+ .y((d) => yScale(d.value));
236
+
237
+ // Function to draw different marker shapes
238
+ function drawMarker(selection, shape, size) {
239
+ const s = size / 2;
240
+ switch (shape) {
241
+ case 'circle':
242
+ return selection.append('circle').attr('r', s);
243
+ case 'square':
244
+ return selection.append('rect').attr('x', -s).attr('y', -s).attr('width', size).attr('height', size);
245
+ case 'triangle':
246
+ return selection.append('path').attr('d', `M0,${-s * 1.2} L${s * 1.1},${s * 0.6} L${-s * 1.1},${s * 0.6} Z`);
247
+ case 'diamond':
248
+ return selection.append('path').attr('d', `M0,${-s * 1.2} L${s * 1.1},0 L0,${s * 1.2} L${-s * 1.1},0 Z`);
249
+ case 'inverted-triangle':
250
+ return selection.append('path').attr('d', `M0,${s * 1.2} L${s * 1.1},${-s * 0.6} L${-s * 1.1},${-s * 0.6} Z`);
251
+ default:
252
+ return selection.append('circle').attr('r', s);
253
+ }
254
+ }
255
+
256
+ // Hover elements
257
+ const hoverLine = gHover.append('line').attr('stroke-width', 1);
258
+
259
+ const overlay = gHover.append('rect').attr('fill', 'transparent').style('cursor', 'crosshair');
260
+
261
+ function updateScales() {
262
+ const isDark = document.documentElement.getAttribute('data-theme') === 'dark';
263
+ const axisColor = isDark ? 'rgba(255,255,255,0.25)' : 'rgba(0,0,0,0.25)';
264
+ const tickColor = isDark ? 'rgba(255,255,255,0.70)' : 'rgba(0,0,0,0.55)';
265
+ const gridColor = isDark ? 'rgba(255,255,255,0.08)' : 'rgba(0,0,0,0.05)';
266
+
267
+ width = container.clientWidth || 800;
268
+ height = Math.max(360, Math.round(width / 2.2));
269
+ svg.attr('width', width).attr('height', height);
270
+
271
+ const innerWidth = width - margin.left - margin.right;
272
+ const innerHeight = height - margin.top - margin.bottom;
273
+ gRoot.attr('transform', `translate(${margin.left},${margin.top})`);
274
+
275
+ xScale.range([0, innerWidth]);
276
+ yScale.range([innerHeight, 0]);
277
+
278
+ // Compute Y ticks
279
+ let yTicks = [];
280
+ if (isRankStrictFlag) {
281
+ const maxR = Math.max(1, Math.round(rankTickMax));
282
+ for (let v = 1; v <= maxR; v += 1) yTicks.push(v);
283
+ } else {
284
+ // Use D3's tick generator to produce nice floating-point ticks
285
+ yTicks = yScale.ticks(6);
286
+ }
287
+
288
+ // Grid (horizontal)
289
+ gGrid.selectAll('*').remove();
290
+ gGrid.selectAll('line')
291
+ .data(yTicks)
292
+ .join('line')
293
+ .attr('x1', 0)
294
+ .attr('x2', innerWidth)
295
+ .attr('y1', (d) => yScale(d))
296
+ .attr('y2', (d) => yScale(d))
297
+ .attr('stroke', gridColor)
298
+ .attr('stroke-width', 1)
299
+ .attr('shape-rendering', 'crispEdges');
300
+
301
+ // Axes
302
+ gAxes.selectAll('*').remove();
303
+ let xAxis = d3.axisBottom(xScale).tickSizeOuter(0);
304
+ if (isRankStrictFlag) {
305
+ const [dx0, dx1] = xScale.domain();
306
+ const start = Math.ceil(dx0 / 1000) * 1000;
307
+ const end = Math.floor(dx1 / 1000) * 1000;
308
+ const xTicks = [];
309
+ for (let v = start; v <= end; v += 1000) xTicks.push(v);
310
+ if (xTicks.length === 0) xTicks.push(Math.round(dx0));
311
+ xAxis = xAxis.tickValues(xTicks).tickFormat(d3.format('d'));
312
+ } else {
313
+ xAxis = xAxis.ticks(8);
314
+ }
315
+ const yAxis = d3.axisLeft(yScale)
316
+ .tickValues(yTicks)
317
+ .tickSizeOuter(0)
318
+ .tickFormat(isRankStrictFlag ? d3.format('d') : d3.format('.2f'));
319
+ gAxes.append('g')
320
+ .attr('transform', `translate(0,${innerHeight})`)
321
+ .call(xAxis)
322
+ .call((g) => {
323
+ g.selectAll('path, line').attr('stroke', axisColor);
324
+ g.selectAll('text').attr('fill', tickColor).style('font-size', '12px');
325
+ });
326
+ gAxes.append('g')
327
+ .call(yAxis)
328
+ .call((g) => {
329
+ g.selectAll('path, line').attr('stroke', axisColor);
330
+ g.selectAll('text').attr('fill', tickColor).style('font-size', '12px');
331
+ });
332
+
333
+ // Axis labels (X and Y)
334
+ gAxes.append('text')
335
+ .attr('class', 'axis-label axis-label--x')
336
+ .attr('x', innerWidth / 2)
337
+ .attr('y', innerHeight + 44)
338
+ .attr('text-anchor', 'middle')
339
+ .style('font-size', '12px')
340
+ .style('fill', tickColor)
341
+ .text('Step');
342
+ gAxes.append('text')
343
+ .attr('class', 'axis-label axis-label--y')
344
+ .attr('text-anchor', 'middle')
345
+ .attr('transform', `translate(${-44},${innerHeight/2}) rotate(-90)`)
346
+ .style('font-size', '12px')
347
+ .style('fill', tickColor)
348
+ .text('Value');
349
+
350
+ overlay.attr('x', 0).attr('y', 0).attr('width', innerWidth).attr('height', innerHeight);
351
+ hoverLine.attr('y1', 0).attr('y2', innerHeight).attr('stroke', axisColor);
352
+
353
+ // Legend placeholder; actual content set in renderMetric
354
+ const legendWidth = Math.min(180, Math.max(120, Math.round(innerWidth * 0.22)));
355
+ const legendHeight = 64;
356
+ gLegend
357
+ .attr('x', innerWidth - legendWidth + 42)
358
+ .attr('y', innerHeight - legendHeight - 12)
359
+ .attr('width', legendWidth)
360
+ .attr('height', legendHeight);
361
+ const legendRoot = gLegend.selectAll('div').data([0]).join('xhtml:div');
362
+ Object.assign(legendRoot.node().style, {
363
+ background: 'transparent',
364
+ border: 'none',
365
+ borderRadius: '0',
366
+ padding: '0',
367
+ fontSize: '12px',
368
+ lineHeight: '1.35',
369
+ color: 'var(--text-color)'
370
+ });
371
+
372
+ return { innerWidth, innerHeight };
373
+ }
374
+
375
+ function renderMetric(metricKey){
376
+ const map = dataByMetric.get(metricKey) || {};
377
+ const runs = runOrder;
378
+ // Domain
379
+ let minStep = Infinity, maxStep = -Infinity, maxVal = 0, minVal = Infinity;
380
+ const isRank = /rank/i.test(metricKey);
381
+ const isAverage = /average/i.test(metricKey);
382
+ const isRankStrict = isRank && !isAverage;
383
+ runs.forEach(r => {
384
+ const arr = map[r] || [];
385
+ arr.forEach(pt => {
386
+ const val = isRankStrict ? Math.round(pt.value) : pt.value;
387
+ minStep = Math.min(minStep, pt.step);
388
+ maxStep = Math.max(maxStep, pt.step);
389
+ maxVal = Math.max(maxVal, val);
390
+ minVal = Math.min(minVal, val);
391
+ });
392
+ });
393
+ if (!isFinite(minStep) || !isFinite(maxStep)) { return; }
394
+ xScale.domain([minStep, maxStep]);
395
+ if (isRank) {
396
+ rankTickMax = Math.max(1, Math.round(maxVal));
397
+ yScale.domain([rankTickMax, 1]);
398
+ } else {
399
+ yScale.domain([0, Math.max(1, maxVal)]).nice();
400
+ }
401
+ isRankStrictFlag = isRankStrict;
402
+
403
+ const { innerWidth, innerHeight } = updateScales();
404
+
405
+ // Bind lines and markers
406
+ const series = runs.map((r, i) => ({
407
+ run: r,
408
+ color: pool[i % pool.length],
409
+ marker: markerShapes[i % markerShapes.length],
410
+ values: (map[r]||[])
411
+ .slice()
412
+ .sort((a,b)=>a.step-b.step)
413
+ .map(pt => isRankStrict ? { step: pt.step, value: Math.round(pt.value) } : pt)
414
+ }));
415
+
416
+ // Draw lines
417
+ const paths = gLines.selectAll('path.run-line').data(series, d=>d.run);
418
+ paths.enter().append('path').attr('class','run-line').attr('fill','none').attr('stroke-width',2)
419
+ .attr('stroke', d=>d.color).attr('opacity',0.9)
420
+ .attr('d', d=>lineGen(d.values))
421
+ .merge(paths)
422
+ .transition().duration(200)
423
+ .attr('stroke', d=>d.color)
424
+ .attr('d', d=>lineGen(d.values));
425
+ paths.exit().remove();
426
+
427
+ // Draw markers for each data point
428
+ gPoints.selectAll('*').remove();
429
+ series.forEach((s, seriesIndex) => {
430
+ const pointGroup = gPoints.selectAll(`.points-${seriesIndex}`)
431
+ .data(s.values)
432
+ .join('g')
433
+ .attr('class', `points-${seriesIndex}`)
434
+ .attr('transform', d => `translate(${xScale(d.step)},${yScale(d.value)})`);
435
+
436
+ drawMarker(pointGroup, s.marker, markerSize)
437
+ .attr('fill', s.color)
438
+ .attr('stroke', s.color)
439
+ .attr('stroke-width', 1.5)
440
+ .style('cursor', 'crosshair');
441
+ });
442
+
443
+ // Inline legend content with marker shapes
444
+ legendInline.innerHTML = '';
445
+ series.forEach(s => {
446
+ const legendItem = document.createElement('span');
447
+ legendItem.style.cssText = 'display:inline-flex;align-items:center;gap:6px;white-space:nowrap;';
448
+
449
+ // Create small SVG for marker shape
450
+ const markerSvg = document.createElementNS('http://www.w3.org/2000/svg', 'svg');
451
+ markerSvg.setAttribute('width', '16');
452
+ markerSvg.setAttribute('height', '12');
453
+ markerSvg.style.display = 'inline-block';
454
+
455
+ const g = document.createElementNS('http://www.w3.org/2000/svg', 'g');
456
+ g.setAttribute('transform', 'translate(8,6)');
457
+
458
+ let shape;
459
+ const size = 6;
460
+ const halfSize = size / 2;
461
+ switch(s.marker) {
462
+ case 'circle':
463
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'circle');
464
+ shape.setAttribute('r', halfSize);
465
+ break;
466
+ case 'square':
467
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'rect');
468
+ shape.setAttribute('x', -halfSize);
469
+ shape.setAttribute('y', -halfSize);
470
+ shape.setAttribute('width', size);
471
+ shape.setAttribute('height', size);
472
+ break;
473
+ case 'triangle':
474
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
475
+ shape.setAttribute('d', `M0,${-halfSize * 1.2} L${halfSize * 1.1},${halfSize * 0.6} L${-halfSize * 1.1},${halfSize * 0.6} Z`);
476
+ break;
477
+ case 'diamond':
478
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
479
+ shape.setAttribute('d', `M0,${-halfSize * 1.2} L${halfSize * 1.1},0 L0,${halfSize * 1.2} L${-halfSize * 1.1},0 Z`);
480
+ break;
481
+ case 'inverted-triangle':
482
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
483
+ shape.setAttribute('d', `M0,${halfSize * 1.2} L${halfSize * 1.1},${-halfSize * 0.6} L${-halfSize * 1.1},${-halfSize * 0.6} Z`);
484
+ break;
485
+ default:
486
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'circle');
487
+ shape.setAttribute('r', halfSize);
488
+ }
489
+ shape.setAttribute('fill', s.color);
490
+ shape.setAttribute('stroke', s.color);
491
+ shape.setAttribute('stroke-width', '1');
492
+
493
+ g.appendChild(shape);
494
+ markerSvg.appendChild(g);
495
+
496
+ const label = document.createElement('span');
497
+ label.textContent = s.run;
498
+
499
+ legendItem.appendChild(markerSvg);
500
+ legendItem.appendChild(label);
501
+ legendInline.appendChild(legendItem);
502
+ });
503
+
504
+ // Hover
505
+ const stepSet = new Set(); series.forEach(s=>s.values.forEach(v=>stepSet.add(v.step)));
506
+ const steps = Array.from(stepSet).sort((a,b)=>a-b);
507
+ function onMove(event){
508
+ const [mx, my] = d3.pointer(event, overlay.node());
509
+ const pointerStep = xScale.invert(mx); // step value under the cursor
510
+ const nearest = steps.reduce((best, s)=> Math.abs(s - pointerStep) < Math.abs(best - pointerStep) ? s : best, steps[0]);
511
+ const xpx = xScale(nearest);
512
+ hoverLine.attr('x1', xpx).attr('x2', xpx).style('display', null); // stroke stays the theme-aware axisColor set in updateScales
513
+ // Tooltip content
514
+ let html = `<div><strong>${getMetricDisplayName(metricKey)}</strong></div><div><strong>step</strong> ${nearest}</div>`;
515
+ series.forEach(s=>{
516
+ const m = new Map(s.values.map(v=>[v.step, v.value]));
517
+ const val = m.has(nearest) ? m.get(nearest) : null;
518
+ if (val != null) {
519
+ const formatVal = (vv) => (isRankStrict ? d3.format('d')(vv) : (+vv).toFixed(4));
520
+ html += `<div><span style="display:inline-block;width:10px;height:10px;background:${s.color};border-radius:50%;margin-right:6px;"></span><strong>${s.run}</strong> ${formatVal(val)}</div>`;
521
+ }
522
+ });
523
+ tipInner.innerHTML = html;
524
+ const offsetX = 12, offsetY = 12;
525
+ tip.style.opacity = '1'; tip.style.transform = `translate(${Math.round(mx + offsetX + margin.left)}px, ${Math.round(my + offsetY + margin.top)}px)`;
526
+ }
527
+ function onLeave(){ tip.style.opacity='0'; tip.style.transform='translate(-9999px, -9999px)'; hoverLine.style('display','none'); }
528
+ overlay.on('mousemove', onMove).on('mouseleave', onLeave);
529
+ }
530
+
531
+ // Hover handlers are attached in renderMetric
532
+
533
+ // Load CSV and wire controls
534
+ (async () => {
535
+ try {
536
+ const text = await fetchFirstAvailable(CSV_PATHS);
537
+ const rows = d3.csvParse(text, d => ({ run: (d.run||'').trim(), step: +d.step, metric: (d.metric||'').trim(), value: +d.value }));
538
+ metricList = Array.from(new Set(rows.map(r=>r.metric))).sort();
539
+ runList = Array.from(new Set(rows.map(r=>r.run))).sort();
540
+ runOrder = runList;
541
+ // Build dataByMetric
542
+ metricList.forEach(m => {
543
+ const map = {};
544
+ runList.forEach(r => { map[r] = []; });
545
+ rows.filter(r=>r.metric===m).forEach(r => { if (!isNaN(r.step) && !isNaN(r.value)) map[r.run].push({ step:r.step, value:r.value }); });
546
+ dataByMetric.set(m, map);
547
+ });
548
+
549
+ // Populate metric select (default to average_rank if present)
550
+ metricList.forEach((m)=>{ const o=document.createElement('option'); o.value=m; o.textContent=getMetricDisplayName(m); selectMetric.appendChild(o); });
551
+ const def = metricList.find(m => /average_rank/i.test(m)) || metricList[0];
552
+ if (def) selectMetric.value = def;
553
+
554
+ container.appendChild(controls);
555
+ updateScales();
556
+ renderMetric(selectMetric.value);
557
+
558
+ selectMetric.addEventListener('change', ()=>{ renderMetric(selectMetric.value); });
559
+
560
+ const rerender = () => { renderMetric(selectMetric.value); };
561
+ if (window.ResizeObserver) { const ro = new ResizeObserver(()=>rerender()); ro.observe(container); } else { window.addEventListener('resize', rerender); }
562
+ } catch (e) {
563
+ const pre = document.createElement('pre'); pre.textContent = 'CSV load error: ' + (e && e.message ? e.message : e);
564
+ pre.style.color = 'var(--danger, #b00020)'; pre.style.fontSize = '12px'; pre.style.whiteSpace = 'pre-wrap';
565
+ container.appendChild(pre);
566
+ }
567
+ })();
568
+ };
569
+
570
+ if (document.readyState === 'loading') {
571
+ document.addEventListener('DOMContentLoaded', () => ensureD3(bootstrap), { once: true });
572
+ } else { ensureD3(bootstrap); }
573
+ })();
574
+ </script>
575
+
576
+
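Review note on the hover handler above: the nearest recorded step is found with a linear `reduce` over `steps` on every `mousemove`. A minimal sketch of the same lookup as a binary search follows, assuming `steps` is sorted ascending (it is, per the `.sort((a,b)=>a-b)` call). The name `nearestStep` is hypothetical and not part of this commit; on ties it keeps the smaller step, which matches the strict `<` comparison in the original `reduce`.

```javascript
// Hypothetical helper (not part of the commit): return the recorded step
// closest to `target`, given steps sorted in ascending order.
function nearestStep(sortedSteps, target) {
  if (sortedSteps.length === 0) return undefined;
  let lo = 0;
  let hi = sortedSteps.length - 1;
  // Binary search for the first step that is >= target.
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (sortedSteps[mid] < target) lo = mid + 1;
    else hi = mid;
  }
  // The closest step is either that one or its left neighbour;
  // on a tie, prefer the left (smaller) step.
  if (lo > 0 && Math.abs(sortedSteps[lo - 1] - target) <= Math.abs(sortedSteps[lo] - target)) {
    return sortedSteps[lo - 1];
  }
  return sortedSteps[lo];
}
```

Inside `onMove` this would be called as `nearestStep(steps, xScale.invert(mx))`. With only a handful of evaluation steps per run the linear scan is perfectly adequate, so this is an optional refinement rather than a required change.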
app/src/content/embeds/s25-ratings.html ADDED
@@ -0,0 +1,576 @@
1
+ <div class="d3-line" style="width:100%;margin:10px 0;"></div>
2
+ <style>
3
+ .d3-line .d3-line__controls select {
4
+ font-size: 12px;
5
+ padding: 8px 28px 8px 10px;
6
+ border: 1px solid var(--border-color);
7
+ border-radius: 8px;
8
+ background-color: var(--surface-bg);
9
+ color: var(--text-color);
10
+ background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='12' height='12' viewBox='0 0 24 24' fill='none' stroke='%230f1115' stroke-width='2' stroke-linecap='round' stroke-linejoin='round'%3E%3Cpolyline points='6 9 12 15 18 9'/%3E%3C/svg%3E");
11
+ background-repeat: no-repeat;
12
+ background-position: right 8px center;
13
+ background-size: 12px;
14
+ -webkit-appearance: none;
15
+ -moz-appearance: none;
16
+ appearance: none;
17
+ cursor: pointer;
18
+ transition: border-color .15s ease, box-shadow .15s ease;
19
+ }
20
+ [data-theme="dark"] .d3-line .d3-line__controls select {
21
+ background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='12' height='12' viewBox='0 0 24 24' fill='none' stroke='%23ffffff' stroke-width='2' stroke-linecap='round' stroke-linejoin='round'%3E%3Cpolyline points='6 9 12 15 18 9'/%3E%3C/svg%3E");
22
+ }
23
+ .d3-line .d3-line__controls select:hover {
24
+ border-color: var(--primary-color);
25
+ }
26
+ .d3-line .d3-line__controls select:focus {
27
+ border-color: var(--primary-color);
28
+ box-shadow: 0 0 0 3px rgba(232,137,171,.25);
29
+ outline: none;
30
+ }
31
+ .d3-line .d3-line__controls label { gap: 8px; }
32
+
33
+ /* Range slider themed with --primary-color */
34
+ .d3-line .d3-line__controls input[type="range"] {
35
+ -webkit-appearance: none;
36
+ appearance: none;
37
+ width: 100%;
38
+ height: 6px;
39
+ border-radius: 999px;
40
+ background: var(--border-color);
41
+ outline: none;
42
+ }
43
+ .d3-line .d3-line__controls input[type="range"]::-webkit-slider-runnable-track {
44
+ height: 6px;
45
+ background: transparent;
46
+ border-radius: 999px;
47
+ }
48
+ .d3-line .d3-line__controls input[type="range"]::-webkit-slider-thumb {
49
+ -webkit-appearance: none;
50
+ appearance: none;
51
+ width: 16px;
52
+ height: 16px;
53
+ border-radius: 50%;
54
+ background: var(--primary-color);
55
+ border: 2px solid var(--on-primary);
56
+ margin-top: -5px;
57
+ cursor: pointer;
58
+ }
59
+ .d3-line .d3-line__controls input[type="range"]::-moz-range-track {
60
+ height: 6px;
61
+ background: transparent;
62
+ border-radius: 999px;
63
+ }
64
+ .d3-line .d3-line__controls input[type="range"]::-moz-range-thumb {
65
+ width: 16px;
66
+ height: 16px;
67
+ border-radius: 50%;
68
+ background: var(--primary-color);
69
+ border: 2px solid var(--on-primary);
70
+ cursor: pointer;
71
+ }
72
+ /* Improved line color via CSS */
73
+ .d3-line .lines path.improved { stroke: var(--primary-color); }
74
+ </style>
75
+ <script>
76
+ (() => {
77
+ const ensureD3 = (cb) => {
78
+ if (window.d3 && typeof window.d3.select === 'function') return cb();
79
+ let s = document.getElementById('d3-cdn-script');
80
+ if (!s) {
81
+ s = document.createElement('script');
82
+ s.id = 'd3-cdn-script';
83
+ s.src = 'https://cdn.jsdelivr.net/npm/d3@7/dist/d3.min.js';
84
+ document.head.appendChild(s);
85
+ }
86
+ const onReady = () => { if (window.d3 && typeof window.d3.select === 'function') cb(); };
87
+ s.addEventListener('load', onReady, { once: true });
88
+ if (window.d3) onReady();
89
+ };
90
+
91
+ const bootstrap = () => {
92
+ const mount = document.currentScript ? document.currentScript.previousElementSibling : null;
93
+ const container = (mount && mount.querySelector && mount.querySelector('.d3-line')) || document.querySelector('.d3-line');
94
+ if (!container) return;
95
+ if (container.dataset) {
96
+ if (container.dataset.mounted === 'true') return;
97
+ container.dataset.mounted = 'true';
98
+ }
99
+
100
+ // CSV: prefer public path, fallback to relative
101
+ const CSV_PATHS = [
102
+ '/data/s25_ratings.csv',
103
+ './assets/data/s25_ratings.csv',
104
+ '../assets/data/s25_ratings.csv',
105
+ '../../assets/data/s25_ratings.csv'
106
+ ];
107
+ const fetchFirstAvailable = async (paths) => {
108
+ for (const p of paths) {
109
+ try { const r = await fetch(p, { cache: 'no-cache' }); if (r.ok) return await r.text(); } catch(e) {}
110
+ }
111
+ throw new Error('CSV not found: s25_ratings.csv');
112
+ };
113
+
114
+ // Controls UI
115
+ const controls = document.createElement('div');
116
+ controls.className = 'd3-line__controls';
117
+ Object.assign(controls.style, {
118
+ marginTop: '12px',
119
+ display: 'flex',
120
+ gap: '16px',
121
+ alignItems: 'center',
122
+ justifyContent: 'space-between',
123
+ width: '100%'
124
+ });
125
+
126
+ const labelMetric = document.createElement('label');
127
+ Object.assign(labelMetric.style, {
128
+ fontSize: '12px', color: 'var(--muted-color)', display: 'flex', alignItems: 'center', gap: '6px', whiteSpace: 'nowrap', padding: '6px 10px', marginLeft: 'auto'
129
+ });
130
+ labelMetric.textContent = 'Metric';
131
+ const selectMetric = document.createElement('select');
132
+ Object.assign(selectMetric.style, { fontSize: '12px' });
133
+ labelMetric.appendChild(selectMetric);
134
+
135
+ // Inline legend on the right of the select
136
+ const legendInline = document.createElement('div');
137
+ legendInline.className = 'controls__legend';
138
+ Object.assign(legendInline.style, {
139
+ display: 'flex',
140
+ gap: '8px',
141
+ alignItems: 'center',
142
+ flexWrap: 'nowrap',
143
+ fontSize: '11px',
144
+ marginLeft: '8px'
145
+ });
146
+ controls.appendChild(legendInline);
147
+ controls.appendChild(labelMetric);
148
+
149
+ // Create SVG with marker definitions
150
+ const svg = d3.select(container).append('svg')
151
+ .attr('width', '100%')
152
+ .style('display', 'block');
153
+
154
+ // Add marker definitions for different shapes
155
+ const defs = svg.append('defs');
156
+
157
+ // Academic marker shapes
158
+ const markerShapes = ['circle', 'square', 'triangle', 'diamond', 'inverted-triangle'];
159
+ const markerSize = 8;
160
+
161
+ // Groups
162
+ const gRoot = svg.append('g');
163
+ const gGrid = gRoot.append('g').attr('class', 'grid');
164
+ const gAxes = gRoot.append('g').attr('class', 'axes');
165
+ const gLines = gRoot.append('g').attr('class', 'lines');
166
+ const gPoints = gRoot.append('g').attr('class', 'points');
167
+ const gHover = gRoot.append('g').attr('class', 'hover');
168
+ const gLegend = gRoot.append('foreignObject').attr('class', 'legend');
169
+
170
+ // Tooltip
171
+ container.style.position = container.style.position || 'relative';
172
+ let tip = container.querySelector('.d3-tooltip');
173
+ let tipInner;
174
+ if (!tip) {
175
+ tip = document.createElement('div');
176
+ tip.className = 'd3-tooltip';
177
+ Object.assign(tip.style, {
178
+ position: 'absolute', top: '0px', left: '0px', transform: 'translate(-9999px, -9999px)', pointerEvents: 'none',
179
+ padding: '8px 10px', borderRadius: '8px', fontSize: '12px', lineHeight: '1.35', border: '1px solid var(--border-color)',
180
+ background: 'var(--surface-bg)', color: 'var(--text-color)', boxShadow: '0 4px 24px rgba(0,0,0,.18)', opacity: '0',
181
+ transition: 'opacity .12s ease'
182
+ });
183
+ tipInner = document.createElement('div');
184
+ tipInner.className = 'd3-tooltip__inner';
185
+ tipInner.style.textAlign = 'left';
186
+ tip.appendChild(tipInner);
187
+ container.appendChild(tip);
188
+ } else {
189
+ tipInner = tip.querySelector('.d3-tooltip__inner') || tip;
190
+ }
191
+
192
+ // Colors per run
193
+ const primary = getComputedStyle(document.documentElement).getPropertyValue('--primary-color').trim() || '#E889AB';
194
+ const pool = [primary, '#4EA5B7', '#E38A42', '#CEC0FA', ...(d3.schemeTableau10||[])];
195
+
196
+ // Mapping from metric names to display titles
197
+ const metricTitleMapping = {
198
+ 'docvqa_val_anls': 'DocVQA',
199
+ 'infovqa_val_anls': 'InfoVQA',
200
+ 'mme_total_score': 'MME Total',
201
+ 'mmmu_val_mmmu_acc': 'MMMU',
202
+ 'mmstar_average': 'MMStar',
203
+ 'ocrbench_ocrbench_accuracy': 'OCRBench',
204
+ 'scienceqa_exact_match': 'ScienceQA',
205
+ 'textvqa_val_exact_match': 'TextVQA',
206
+ 'average': 'Average (excl. MME)',
207
+ 'average_rank': 'Average Rank',
208
+ 'ai2d_exact_match': 'AI2D',
209
+ 'chartqa_relaxed_overall': 'ChartQA',
210
+ 'seedbench_seed_all': 'SeedBench'
211
+ };
212
+
213
+ // Function to get display name for metric
214
+ function getMetricDisplayName(metricKey) {
215
+ return metricTitleMapping[metricKey] || metricKey;
216
+ }
217
+
218
+ // State and data
219
+ let metricList = [];
220
+ let runList = [];
221
+ let runOrder = [];
222
+ const dataByMetric = new Map(); // metric => { run => [{step,value}] }
223
+ let isRankStrictFlag = false;
224
+ let rankTickMax = 1;
225
+
226
+ // Scales and layout
227
+ let width = 800, height = 360;
228
+ let margin = { top: 16, right: 28, bottom: 56, left: 64 };
229
+ let xScale = d3.scaleLinear();
230
+ let yScale = d3.scaleLinear();
231
+
232
+ // Line generators - simple linear connections
233
+ const lineGen = d3.line()
234
+ .x((d) => xScale(d.step))
235
+ .y((d) => yScale(d.value));
236
+
237
+ // Function to draw different marker shapes
238
+ function drawMarker(selection, shape, size) {
239
+ const s = size / 2;
240
+ switch (shape) {
241
+ case 'circle':
242
+ return selection.append('circle').attr('r', s);
243
+ case 'square':
244
+ return selection.append('rect').attr('x', -s).attr('y', -s).attr('width', size).attr('height', size);
245
+ case 'triangle':
246
+ return selection.append('path').attr('d', `M0,${-s * 1.2} L${s * 1.1},${s * 0.6} L${-s * 1.1},${s * 0.6} Z`);
247
+ case 'diamond':
248
+ return selection.append('path').attr('d', `M0,${-s * 1.2} L${s * 1.1},0 L0,${s * 1.2} L${-s * 1.1},0 Z`);
249
+ case 'inverted-triangle':
250
+ return selection.append('path').attr('d', `M0,${s * 1.2} L${s * 1.1},${-s * 0.6} L${-s * 1.1},${-s * 0.6} Z`);
251
+ default:
252
+ return selection.append('circle').attr('r', s);
253
+ }
254
+ }
255
+
256
+ // Hover elements
257
+ const hoverLine = gHover.append('line').attr('stroke-width', 1);
258
+
259
+ const overlay = gHover.append('rect').attr('fill', 'transparent').style('cursor', 'crosshair');
260
+
261
+ function updateScales() {
262
+ const isDark = document.documentElement.getAttribute('data-theme') === 'dark';
263
+ const axisColor = isDark ? 'rgba(255,255,255,0.25)' : 'rgba(0,0,0,0.25)';
264
+ const tickColor = isDark ? 'rgba(255,255,255,0.70)' : 'rgba(0,0,0,0.55)';
265
+ const gridColor = isDark ? 'rgba(255,255,255,0.08)' : 'rgba(0,0,0,0.05)';
266
+
267
+ width = container.clientWidth || 800;
268
+ height = Math.max(360, Math.round(width / 2.2));
269
+ svg.attr('width', width).attr('height', height);
270
+
271
+ const innerWidth = width - margin.left - margin.right;
272
+ const innerHeight = height - margin.top - margin.bottom;
273
+ gRoot.attr('transform', `translate(${margin.left},${margin.top})`);
274
+
275
+ xScale.range([0, innerWidth]);
276
+ yScale.range([innerHeight, 0]);
277
+
278
+ // Compute Y ticks
279
+ let yTicks = [];
280
+ if (isRankStrictFlag) {
281
+ const maxR = Math.max(1, Math.round(rankTickMax));
282
+ for (let v = 1; v <= maxR; v += 1) yTicks.push(v);
283
+ } else {
284
+ // Use D3's tick generator to produce nice floating-point ticks
285
+ yTicks = yScale.ticks(6);
286
+ }
287
+
288
+ // Grid (horizontal)
289
+ gGrid.selectAll('*').remove();
290
+ gGrid.selectAll('line')
291
+ .data(yTicks)
292
+ .join('line')
293
+ .attr('x1', 0)
294
+ .attr('x2', innerWidth)
295
+ .attr('y1', (d) => yScale(d))
296
+ .attr('y2', (d) => yScale(d))
297
+ .attr('stroke', gridColor)
298
+ .attr('stroke-width', 1)
299
+ .attr('shape-rendering', 'crispEdges');
300
+
301
+ // Axes
302
+ gAxes.selectAll('*').remove();
303
+ let xAxis = d3.axisBottom(xScale).tickSizeOuter(0);
304
+ if (isRankStrictFlag) {
305
+ const [dx0, dx1] = xScale.domain();
306
+ const start = Math.ceil(dx0 / 1000) * 1000;
307
+ const end = Math.floor(dx1 / 1000) * 1000;
308
+ const xTicks = [];
309
+ for (let v = start; v <= end; v += 1000) xTicks.push(v);
310
+ if (xTicks.length === 0) xTicks.push(Math.round(dx0));
311
+ xAxis = xAxis.tickValues(xTicks).tickFormat(d3.format('d'));
312
+ } else {
313
+ xAxis = xAxis.ticks(8);
314
+ }
315
+ const yAxis = d3.axisLeft(yScale)
316
+ .tickValues(yTicks)
317
+ .tickSizeOuter(0)
318
+ .tickFormat(isRankStrictFlag ? d3.format('d') : d3.format('.2f'));
319
+ gAxes.append('g')
320
+ .attr('transform', `translate(0,${innerHeight})`)
321
+ .call(xAxis)
322
+ .call((g) => {
323
+ g.selectAll('path, line').attr('stroke', axisColor);
324
+ g.selectAll('text').attr('fill', tickColor).style('font-size', '12px');
325
+ });
326
+ gAxes.append('g')
327
+ .call(yAxis)
328
+ .call((g) => {
329
+ g.selectAll('path, line').attr('stroke', axisColor);
330
+ g.selectAll('text').attr('fill', tickColor).style('font-size', '12px');
331
+ });
332
+
333
+ // Axis labels (X and Y)
334
+ gAxes.append('text')
335
+ .attr('class', 'axis-label axis-label--x')
336
+ .attr('x', innerWidth / 2)
337
+ .attr('y', innerHeight + 44)
338
+ .attr('text-anchor', 'middle')
339
+ .style('font-size', '12px')
340
+ .style('fill', tickColor)
341
+ .text('Step');
342
+ gAxes.append('text')
343
+ .attr('class', 'axis-label axis-label--y')
344
+ .attr('text-anchor', 'middle')
345
+ .attr('transform', `translate(${-44},${innerHeight/2}) rotate(-90)`)
346
+ .style('font-size', '12px')
347
+ .style('fill', tickColor)
348
+ .text('Value');
349
+
350
+ overlay.attr('x', 0).attr('y', 0).attr('width', innerWidth).attr('height', innerHeight);
351
+ hoverLine.attr('y1', 0).attr('y2', innerHeight).attr('stroke', axisColor);
352
+
353
+ // Legend placeholder; actual content set in renderMetric
354
+ const legendWidth = Math.min(180, Math.max(120, Math.round(innerWidth * 0.22)));
355
+ const legendHeight = 64;
356
+ gLegend
357
+ .attr('x', innerWidth - legendWidth + 42)
358
+ .attr('y', innerHeight - legendHeight - 12)
359
+ .attr('width', legendWidth)
360
+ .attr('height', legendHeight);
361
+ const legendRoot = gLegend.selectAll('div').data([0]).join('xhtml:div');
362
+ Object.assign(legendRoot.node().style, {
363
+ background: 'transparent',
364
+ border: 'none',
365
+ borderRadius: '0',
366
+ padding: '0',
367
+ fontSize: '12px',
368
+ lineHeight: '1.35',
369
+ color: 'var(--text-color)'
370
+ });
371
+
372
+ return { innerWidth, innerHeight };
373
+ }
374
+
375
+ function renderMetric(metricKey){
376
+ const map = dataByMetric.get(metricKey) || {};
377
+ const runs = runOrder;
378
+ // Domain
379
+ let minStep = Infinity, maxStep = -Infinity, maxVal = 0, minVal = Infinity;
380
+ const isRank = /rank/i.test(metricKey);
381
+ const isAverage = /average/i.test(metricKey);
382
+ const isRankStrict = isRank && !isAverage;
383
+ runs.forEach(r => {
384
+ const arr = map[r] || [];
385
+ arr.forEach(pt => {
386
+ const val = isRankStrict ? Math.round(pt.value) : pt.value;
387
+ minStep = Math.min(minStep, pt.step);
388
+ maxStep = Math.max(maxStep, pt.step);
389
+ maxVal = Math.max(maxVal, val);
390
+ minVal = Math.min(minVal, val);
391
+ });
392
+ });
393
+ if (!isFinite(minStep) || !isFinite(maxStep)) { return; }
394
+ xScale.domain([minStep, maxStep]);
395
+ if (isRank) {
396
+ rankTickMax = Math.max(1, Math.round(maxVal));
397
+ yScale.domain([rankTickMax, 1]);
398
+ } else {
399
+ yScale.domain([0, Math.max(1, maxVal)]).nice();
400
+ }
401
+ isRankStrictFlag = isRankStrict;
402
+
403
+ const { innerWidth, innerHeight } = updateScales();
404
+
405
+ // Bind lines and markers
406
+ const series = runs.map((r, i) => ({
407
+ run: r,
408
+ color: pool[i % pool.length],
409
+ marker: markerShapes[i % markerShapes.length],
410
+ values: (map[r]||[])
411
+ .slice()
412
+ .sort((a,b)=>a.step-b.step)
413
+ .map(pt => isRankStrict ? { step: pt.step, value: Math.round(pt.value) } : pt)
414
+ }));
415
+
416
+ // Draw lines
417
+ const paths = gLines.selectAll('path.run-line').data(series, d=>d.run);
418
+ paths.enter().append('path').attr('class','run-line').attr('fill','none').attr('stroke-width',2)
419
+ .attr('stroke', d=>d.color).attr('opacity',0.9)
420
+ .attr('d', d=>lineGen(d.values))
421
+ .merge(paths)
422
+ .transition().duration(200)
423
+ .attr('stroke', d=>d.color)
424
+ .attr('d', d=>lineGen(d.values));
425
+ paths.exit().remove();
426
+
427
+ // Draw markers for each data point
428
+ gPoints.selectAll('*').remove();
429
+ series.forEach((s, seriesIndex) => {
430
+ const pointGroup = gPoints.selectAll(`.points-${seriesIndex}`)
431
+ .data(s.values)
432
+ .join('g')
433
+ .attr('class', `points-${seriesIndex}`)
434
+ .attr('transform', d => `translate(${xScale(d.step)},${yScale(d.value)})`);
435
+
436
+ drawMarker(pointGroup, s.marker, markerSize)
437
+ .attr('fill', s.color)
438
+ .attr('stroke', s.color)
439
+ .attr('stroke-width', 1.5)
440
+ .style('cursor', 'crosshair');
441
+ });
442
+
443
+ // Inline legend content with marker shapes
444
+ legendInline.innerHTML = '';
445
+ series.forEach(s => {
446
+ const legendItem = document.createElement('span');
447
+ legendItem.style.cssText = 'display:inline-flex;align-items:center;gap:6px;white-space:nowrap;';
448
+
449
+ // Create small SVG for marker shape
450
+ const markerSvg = document.createElementNS('http://www.w3.org/2000/svg', 'svg');
451
+ markerSvg.setAttribute('width', '16');
452
+ markerSvg.setAttribute('height', '12');
453
+ markerSvg.style.display = 'inline-block';
454
+
455
+ const g = document.createElementNS('http://www.w3.org/2000/svg', 'g');
456
+ g.setAttribute('transform', 'translate(8,6)');
457
+
458
+ let shape;
459
+ const size = 6;
460
+ const halfSize = size / 2;
461
+ switch(s.marker) {
462
+ case 'circle':
463
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'circle');
464
+ shape.setAttribute('r', halfSize);
465
+ break;
466
+ case 'square':
467
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'rect');
468
+ shape.setAttribute('x', -halfSize);
469
+ shape.setAttribute('y', -halfSize);
470
+ shape.setAttribute('width', size);
471
+ shape.setAttribute('height', size);
472
+ break;
473
+ case 'triangle':
474
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
475
+ shape.setAttribute('d', `M0,${-halfSize * 1.2} L${halfSize * 1.1},${halfSize * 0.6} L${-halfSize * 1.1},${halfSize * 0.6} Z`);
476
+ break;
477
+ case 'diamond':
478
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
479
+ shape.setAttribute('d', `M0,${-halfSize * 1.2} L${halfSize * 1.1},0 L0,${halfSize * 1.2} L${-halfSize * 1.1},0 Z`);
480
+ break;
481
+ case 'inverted-triangle':
482
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
483
+ shape.setAttribute('d', `M0,${halfSize * 1.2} L${halfSize * 1.1},${-halfSize * 0.6} L${-halfSize * 1.1},${-halfSize * 0.6} Z`);
484
+ break;
485
+ default:
486
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'circle');
487
+ shape.setAttribute('r', halfSize);
488
+ }
489
+ shape.setAttribute('fill', s.color);
490
+ shape.setAttribute('stroke', s.color);
491
+ shape.setAttribute('stroke-width', '1');
492
+
493
+ g.appendChild(shape);
494
+ markerSvg.appendChild(g);
495
+
496
+ const label = document.createElement('span');
497
+ label.textContent = s.run;
498
+
499
+ legendItem.appendChild(markerSvg);
500
+ legendItem.appendChild(label);
501
+ legendInline.appendChild(legendItem);
502
+ });
503
+
504
+ // Hover
505
+ const stepSet = new Set(); series.forEach(s=>s.values.forEach(v=>stepSet.add(v.step)));
506
+ const steps = Array.from(stepSet).sort((a,b)=>a-b);
507
+ function onMove(event){
508
+ const [mx, my] = d3.pointer(event, overlay.node());
509
+ const pointerStep = xScale.invert(mx); // step value under the cursor
510
+ const nearest = steps.reduce((best, s)=> Math.abs(s - pointerStep) < Math.abs(best - pointerStep) ? s : best, steps[0]);
511
+ const xpx = xScale(nearest);
512
+ hoverLine.attr('x1', xpx).attr('x2', xpx).style('display', null); // stroke stays the theme-aware axisColor set in updateScales
513
+ // Tooltip content
514
+ let html = `<div><strong>${getMetricDisplayName(metricKey)}</strong></div><div><strong>step</strong> ${nearest}</div>`;
515
+ series.forEach(s=>{
516
+ const m = new Map(s.values.map(v=>[v.step, v.value]));
517
+ const val = m.has(nearest) ? m.get(nearest) : null;
518
+ if (val != null) {
519
+ const formatVal = (vv) => (isRankStrict ? d3.format('d')(vv) : (+vv).toFixed(4));
520
+ html += `<div><span style="display:inline-block;width:10px;height:10px;background:${s.color};border-radius:50%;margin-right:6px;"></span><strong>${s.run}</strong> ${formatVal(val)}</div>`;
521
+ }
522
+ });
523
+ tipInner.innerHTML = html;
524
+ const offsetX = 12, offsetY = 12;
525
+ tip.style.opacity = '1'; tip.style.transform = `translate(${Math.round(mx + offsetX + margin.left)}px, ${Math.round(my + offsetY + margin.top)}px)`;
526
+ }
527
+ function onLeave(){ tip.style.opacity='0'; tip.style.transform='translate(-9999px, -9999px)'; hoverLine.style('display','none'); }
528
+ overlay.on('mousemove', onMove).on('mouseleave', onLeave);
529
+ }
530
+
531
+ // Hover handlers are attached in renderMetric
532
+
533
+ // Load CSV and wire controls
534
+ (async () => {
535
+ try {
536
+ const text = await fetchFirstAvailable(CSV_PATHS);
537
+ const rows = d3.csvParse(text, d => ({ run: (d.run||'').trim(), step: +d.step, metric: (d.metric||'').trim(), value: +d.value }));
538
+ metricList = Array.from(new Set(rows.map(r=>r.metric))).sort();
539
+ runList = Array.from(new Set(rows.map(r=>r.run))).sort();
540
+ runOrder = runList;
541
+ // Build dataByMetric
542
+ metricList.forEach(m => {
543
+ const map = {};
544
+ runList.forEach(r => { map[r] = []; });
545
+ rows.filter(r=>r.metric===m).forEach(r => { if (!isNaN(r.step) && !isNaN(r.value)) map[r.run].push({ step:r.step, value:r.value }); });
546
+ dataByMetric.set(m, map);
547
+ });
548
+
549
+ // Populate metric select (default to average_rank if present)
550
+ metricList.forEach((m)=>{ const o=document.createElement('option'); o.value=m; o.textContent=getMetricDisplayName(m); selectMetric.appendChild(o); });
551
+ const def = metricList.find(m => /average_rank/i.test(m)) || metricList[0];
552
+ if (def) selectMetric.value = def;
553
+
554
+ container.appendChild(controls);
555
+ updateScales();
556
+ renderMetric(selectMetric.value);
557
+
558
+ selectMetric.addEventListener('change', ()=>{ renderMetric(selectMetric.value); });
559
+
560
+ const rerender = () => { renderMetric(selectMetric.value); };
561
+ if (window.ResizeObserver) { const ro = new ResizeObserver(()=>rerender()); ro.observe(container); } else { window.addEventListener('resize', rerender); }
562
+ } catch (e) {
563
+ const pre = document.createElement('pre'); pre.textContent = 'CSV load error: ' + (e && e.message ? e.message : e);
564
+ pre.style.color = 'var(--danger, #b00020)'; pre.style.fontSize = '12px'; pre.style.whiteSpace = 'pre-wrap';
565
+ container.appendChild(pre);
566
+ }
567
+ })();
568
+ };
569
+
570
+ if (document.readyState === 'loading') {
571
+ document.addEventListener('DOMContentLoaded', () => ensureD3(bootstrap), { once: true });
572
+ } else { ensureD3(bootstrap); }
573
+ })();
574
+ </script>
575
+
576
+
app/src/content/embeds/ss-vs-s1.html ADDED
@@ -0,0 +1,576 @@
1
+ <div class="d3-line" style="width:100%;margin:10px 0;"></div>
2
+ <style>
3
+ .d3-line .d3-line__controls select {
4
+ font-size: 12px;
5
+ padding: 8px 28px 8px 10px;
6
+ border: 1px solid var(--border-color);
7
+ border-radius: 8px;
8
+ background-color: var(--surface-bg);
9
+ color: var(--text-color);
10
+ background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='12' height='12' viewBox='0 0 24 24' fill='none' stroke='%230f1115' stroke-width='2' stroke-linecap='round' stroke-linejoin='round'%3E%3Cpolyline points='6 9 12 15 18 9'/%3E%3C/svg%3E");
11
+ background-repeat: no-repeat;
12
+ background-position: right 8px center;
13
+ background-size: 12px;
14
+ -webkit-appearance: none;
15
+ -moz-appearance: none;
16
+ appearance: none;
17
+ cursor: pointer;
18
+ transition: border-color .15s ease, box-shadow .15s ease;
19
+ }
20
+ [data-theme="dark"] .d3-line .d3-line__controls select {
21
+ background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='12' height='12' viewBox='0 0 24 24' fill='none' stroke='%23ffffff' stroke-width='2' stroke-linecap='round' stroke-linejoin='round'%3E%3Cpolyline points='6 9 12 15 18 9'/%3E%3C/svg%3E");
22
+ }
23
+ .d3-line .d3-line__controls select:hover {
24
+ border-color: var(--primary-color);
25
+ }
26
+ .d3-line .d3-line__controls select:focus {
27
+ border-color: var(--primary-color);
28
+ box-shadow: 0 0 0 3px rgba(232,137,171,.25);
29
+ outline: none;
30
+ }
31
+ .d3-line .d3-line__controls label { gap: 8px; }
32
+
33
+ /* Range slider themed with --primary-color */
34
+ .d3-line .d3-line__controls input[type="range"] {
35
+ -webkit-appearance: none;
36
+ appearance: none;
37
+ width: 100%;
38
+ height: 6px;
39
+ border-radius: 999px;
40
+ background: var(--border-color);
41
+ outline: none;
42
+ }
43
+ .d3-line .d3-line__controls input[type="range"]::-webkit-slider-runnable-track {
44
+ height: 6px;
45
+ background: transparent;
46
+ border-radius: 999px;
47
+ }
48
+ .d3-line .d3-line__controls input[type="range"]::-webkit-slider-thumb {
49
+ -webkit-appearance: none;
50
+ appearance: none;
51
+ width: 16px;
52
+ height: 16px;
53
+ border-radius: 50%;
54
+ background: var(--primary-color);
55
+ border: 2px solid var(--on-primary);
56
+ margin-top: -5px;
57
+ cursor: pointer;
58
+ }
59
+ .d3-line .d3-line__controls input[type="range"]::-moz-range-track {
60
+ height: 6px;
61
+ background: transparent;
62
+ border-radius: 999px;
63
+ }
64
+ .d3-line .d3-line__controls input[type="range"]::-moz-range-thumb {
65
+ width: 16px;
66
+ height: 16px;
67
+ border-radius: 50%;
68
+ background: var(--primary-color);
69
+ border: 2px solid var(--on-primary);
70
+ cursor: pointer;
71
+ }
72
+ /* Improved line color via CSS */
73
+ .d3-line .lines path.improved { stroke: var(--primary-color); }
74
+ </style>
75
+ <script>
76
+ (() => {
77
+ const ensureD3 = (cb) => {
78
+ if (window.d3 && typeof window.d3.select === 'function') return cb();
79
+ let s = document.getElementById('d3-cdn-script');
80
+ if (!s) {
81
+ s = document.createElement('script');
82
+ s.id = 'd3-cdn-script';
83
+ s.src = 'https://cdn.jsdelivr.net/npm/d3@7/dist/d3.min.js';
84
+ document.head.appendChild(s);
85
+ }
86
+ const onReady = () => { if (window.d3 && typeof window.d3.select === 'function') cb(); };
87
+ s.addEventListener('load', onReady, { once: true });
88
+ if (window.d3) onReady();
89
+ };
90
+
91
+ const bootstrap = () => {
92
+ const mount = document.currentScript ? document.currentScript.previousElementSibling : null;
93
+ const container = (mount && mount.querySelector && mount.querySelector('.d3-line')) || document.querySelector('.d3-line');
94
+ if (!container) return;
95
+ if (container.dataset) {
96
+ if (container.dataset.mounted === 'true') return;
97
+ container.dataset.mounted = 'true';
98
+ }
99
+
100
+ // CSV: prefer public path, fallback to relative
101
+ const CSV_PATHS = [
102
+ '/data/ss_vs_s1.csv',
103
+ './assets/data/ss_vs_s1.csv',
104
+ '../assets/data/ss_vs_s1.csv',
105
+ '../../assets/data/ss_vs_s1.csv'
106
+ ];
107
+ const fetchFirstAvailable = async (paths) => {
108
+ for (const p of paths) {
109
+ try { const r = await fetch(p, { cache: 'no-cache' }); if (r.ok) return await r.text(); } catch(e) {}
110
+ }
111
+ throw new Error('CSV not found: ss_vs_s1.csv');
112
+ };
113
+
114
+ // Controls UI
115
+ const controls = document.createElement('div');
116
+ controls.className = 'd3-line__controls';
117
+ Object.assign(controls.style, {
118
+ marginTop: '12px',
119
+ display: 'flex',
120
+ gap: '16px',
121
+ alignItems: 'center',
122
+ justifyContent: 'space-between',
123
+ width: '100%'
124
+ });
125
+
126
+ const labelMetric = document.createElement('label');
127
+ Object.assign(labelMetric.style, {
128
+ fontSize: '12px', color: 'var(--muted-color)', display: 'flex', alignItems: 'center', gap: '6px', whiteSpace: 'nowrap', padding: '6px 10px', marginLeft: 'auto'
129
+ });
130
+ labelMetric.textContent = 'Metric';
131
+ const selectMetric = document.createElement('select');
132
+ Object.assign(selectMetric.style, { fontSize: '12px' });
133
+ labelMetric.appendChild(selectMetric);
134
+
135
+ // Inline legend on the left of the select (the select is pushed right via marginLeft: 'auto')
136
+ const legendInline = document.createElement('div');
137
+ legendInline.className = 'controls__legend';
138
+ Object.assign(legendInline.style, {
139
+ display: 'flex',
140
+ gap: '8px',
141
+ alignItems: 'center',
142
+ flexWrap: 'nowrap',
143
+ fontSize: '11px',
144
+ marginLeft: '8px'
145
+ });
146
+ controls.appendChild(legendInline);
147
+ controls.appendChild(labelMetric);
148
+
149
+ // Create SVG with marker definitions
150
+ const svg = d3.select(container).append('svg')
151
+ .attr('width', '100%')
152
+ .style('display', 'block');
153
+
154
+ // Add marker definitions for different shapes
155
+ const defs = svg.append('defs');
156
+
157
+ // Academic marker shapes
158
+ const markerShapes = ['circle', 'square', 'triangle', 'diamond', 'inverted-triangle'];
159
+ const markerSize = 8;
160
+
161
+ // Groups
162
+ const gRoot = svg.append('g');
163
+ const gGrid = gRoot.append('g').attr('class', 'grid');
164
+ const gAxes = gRoot.append('g').attr('class', 'axes');
165
+ const gLines = gRoot.append('g').attr('class', 'lines');
166
+ const gPoints = gRoot.append('g').attr('class', 'points');
167
+ const gHover = gRoot.append('g').attr('class', 'hover');
168
+ const gLegend = gRoot.append('foreignObject').attr('class', 'legend');
169
+
170
+ // Tooltip
171
+ container.style.position = container.style.position || 'relative';
172
+ let tip = container.querySelector('.d3-tooltip');
173
+ let tipInner;
174
+ if (!tip) {
175
+ tip = document.createElement('div');
176
+ tip.className = 'd3-tooltip';
177
+ Object.assign(tip.style, {
178
+ position: 'absolute', top: '0px', left: '0px', transform: 'translate(-9999px, -9999px)', pointerEvents: 'none',
179
+ padding: '8px 10px', borderRadius: '8px', fontSize: '12px', lineHeight: '1.35', border: '1px solid var(--border-color)',
180
+ background: 'var(--surface-bg)', color: 'var(--text-color)', boxShadow: '0 4px 24px rgba(0,0,0,.18)', opacity: '0',
181
+ transition: 'opacity .12s ease'
182
+ });
183
+ tipInner = document.createElement('div');
184
+ tipInner.className = 'd3-tooltip__inner';
185
+ tipInner.style.textAlign = 'left';
186
+ tip.appendChild(tipInner);
187
+ container.appendChild(tip);
188
+ } else {
189
+ tipInner = tip.querySelector('.d3-tooltip__inner') || tip;
190
+ }
191
+
192
+ // Colors per run
193
+ const primary = getComputedStyle(document.documentElement).getPropertyValue('--primary-color').trim() || '#E889AB';
194
+ const pool = [primary, '#4EA5B7', '#E38A42', '#CEC0FA', ...(d3.schemeTableau10||[])];
195
+
196
+ // Mapping from metric names to display titles
197
+ const metricTitleMapping = {
198
+ 'docvqa_val_anls': 'DocVQA',
199
+ 'infovqa_val_anls': 'InfoVQA',
200
+ 'mme_total_score': 'MME Total',
201
+ 'mmmu_val_mmmu_acc': 'MMMU',
202
+ 'mmstar_average': 'MMStar',
203
+ 'ocrbench_ocrbench_accuracy': 'OCRBench',
204
+ 'scienceqa_exact_match': 'ScienceQA',
205
+ 'textvqa_val_exact_match': 'TextVQA',
206
+ 'average': 'Average (excl. MME)',
207
+ 'average_rank': 'Average Rank',
208
+ 'ai2d_exact_match': 'AI2D',
209
+ 'chartqa_relaxed_overall': 'ChartQA',
210
+ 'seedbench_seed_all': 'SeedBench'
211
+ };
212
+
213
+ // Function to get display name for metric
214
+ function getMetricDisplayName(metricKey) {
215
+ return metricTitleMapping[metricKey] || metricKey;
216
+ }
217
+
218
+ // State and data
219
+ let metricList = [];
220
+ let runList = [];
221
+ let runOrder = [];
222
+ const dataByMetric = new Map(); // metric => { run => [{step,value}] }
223
+ let isRankStrictFlag = false;
224
+ let rankTickMax = 1;
225
+
226
+ // Scales and layout
227
+ let width = 800, height = 360;
228
+ let margin = { top: 16, right: 28, bottom: 56, left: 64 };
229
+ let xScale = d3.scaleLinear();
230
+ let yScale = d3.scaleLinear();
231
+
232
+ // Line generators - simple linear connections
233
+ const lineGen = d3.line()
234
+ .x((d) => xScale(d.step))
235
+ .y((d) => yScale(d.value));
236
+
237
+ // Function to draw different marker shapes
238
+ function drawMarker(selection, shape, size) {
239
+ const s = size / 2;
240
+ switch (shape) {
241
+ case 'circle':
242
+ return selection.append('circle').attr('r', s);
243
+ case 'square':
244
+ return selection.append('rect').attr('x', -s).attr('y', -s).attr('width', size).attr('height', size);
245
+ case 'triangle':
246
+ return selection.append('path').attr('d', `M0,${-s * 1.2} L${s * 1.1},${s * 0.6} L${-s * 1.1},${s * 0.6} Z`);
247
+ case 'diamond':
248
+ return selection.append('path').attr('d', `M0,${-s * 1.2} L${s * 1.1},0 L0,${s * 1.2} L${-s * 1.1},0 Z`);
249
+ case 'inverted-triangle':
250
+ return selection.append('path').attr('d', `M0,${s * 1.2} L${s * 1.1},${-s * 0.6} L${-s * 1.1},${-s * 0.6} Z`);
251
+ default:
252
+ return selection.append('circle').attr('r', s);
253
+ }
254
+ }
255
+
256
+ // Hover elements
257
+ const hoverLine = gHover.append('line').attr('stroke-width', 1);
258
+
259
+ const overlay = gHover.append('rect').attr('fill', 'transparent').style('cursor', 'crosshair');
260
+
261
+ function updateScales() {
262
+ const isDark = document.documentElement.getAttribute('data-theme') === 'dark';
263
+ const axisColor = isDark ? 'rgba(255,255,255,0.25)' : 'rgba(0,0,0,0.25)';
264
+ const tickColor = isDark ? 'rgba(255,255,255,0.70)' : 'rgba(0,0,0,0.55)';
265
+ const gridColor = isDark ? 'rgba(255,255,255,0.08)' : 'rgba(0,0,0,0.05)';
266
+
267
+ width = container.clientWidth || 800;
268
+ height = Math.max(360, Math.round(width / 2.2));
269
+ svg.attr('width', width).attr('height', height);
270
+
271
+ const innerWidth = width - margin.left - margin.right;
272
+ const innerHeight = height - margin.top - margin.bottom;
273
+ gRoot.attr('transform', `translate(${margin.left},${margin.top})`);
274
+
275
+ xScale.range([0, innerWidth]);
276
+ yScale.range([innerHeight, 0]);
277
+
278
+ // Compute Y ticks
279
+ let yTicks = [];
280
+ if (isRankStrictFlag) {
281
+ const maxR = Math.max(1, Math.round(rankTickMax));
282
+ for (let v = 1; v <= maxR; v += 1) yTicks.push(v);
283
+ } else {
284
+ // Use D3's tick generator to produce nice floating-point ticks
285
+ yTicks = yScale.ticks(6);
286
+ }
287
+
288
+ // Grid (horizontal)
289
+ gGrid.selectAll('*').remove();
290
+ gGrid.selectAll('line')
291
+ .data(yTicks)
292
+ .join('line')
293
+ .attr('x1', 0)
294
+ .attr('x2', innerWidth)
295
+ .attr('y1', (d) => yScale(d))
296
+ .attr('y2', (d) => yScale(d))
297
+ .attr('stroke', gridColor)
298
+ .attr('stroke-width', 1)
299
+ .attr('shape-rendering', 'crispEdges');
300
+
301
+ // Axes
302
+ gAxes.selectAll('*').remove();
303
+ let xAxis = d3.axisBottom(xScale).tickSizeOuter(0);
304
+ if (isRankStrictFlag) {
305
+ const [dx0, dx1] = xScale.domain();
306
+ const start = Math.ceil(dx0 / 1000) * 1000;
307
+ const end = Math.floor(dx1 / 1000) * 1000;
308
+ const xTicks = [];
309
+ for (let v = start; v <= end; v += 1000) xTicks.push(v);
310
+ if (xTicks.length === 0) xTicks.push(Math.round(dx0));
311
+ xAxis = xAxis.tickValues(xTicks).tickFormat(d3.format('d'));
312
+ } else {
313
+ xAxis = xAxis.ticks(8);
314
+ }
315
+ const yAxis = d3.axisLeft(yScale)
316
+ .tickValues(yTicks)
317
+ .tickSizeOuter(0)
318
+ .tickFormat(isRankStrictFlag ? d3.format('d') : d3.format('.2f'));
319
+ gAxes.append('g')
320
+ .attr('transform', `translate(0,${innerHeight})`)
321
+ .call(xAxis)
322
+ .call((g) => {
323
+ g.selectAll('path, line').attr('stroke', axisColor);
324
+ g.selectAll('text').attr('fill', tickColor).style('font-size', '12px');
325
+ });
326
+ gAxes.append('g')
327
+ .call(yAxis)
328
+ .call((g) => {
329
+ g.selectAll('path, line').attr('stroke', axisColor);
330
+ g.selectAll('text').attr('fill', tickColor).style('font-size', '12px');
331
+ });
332
+
333
+ // Axis labels (X and Y)
334
+ gAxes.append('text')
335
+ .attr('class', 'axis-label axis-label--x')
336
+ .attr('x', innerWidth / 2)
337
+ .attr('y', innerHeight + 44)
338
+ .attr('text-anchor', 'middle')
339
+ .style('font-size', '12px')
340
+ .style('fill', tickColor)
341
+ .text('Step');
342
+ gAxes.append('text')
343
+ .attr('class', 'axis-label axis-label--y')
344
+ .attr('text-anchor', 'middle')
345
+ .attr('transform', `translate(${-44},${innerHeight/2}) rotate(-90)`)
346
+ .style('font-size', '12px')
347
+ .style('fill', tickColor)
348
+ .text('Value');
349
+
350
+ overlay.attr('x', 0).attr('y', 0).attr('width', innerWidth).attr('height', innerHeight);
351
+ hoverLine.attr('y1', 0).attr('y2', innerHeight).attr('stroke', axisColor);
352
+
353
+ // Legend placeholder; actual content set in renderMetric
354
+ const legendWidth = Math.min(180, Math.max(120, Math.round(innerWidth * 0.22)));
355
+ const legendHeight = 64;
356
+ gLegend
357
+ .attr('x', innerWidth - legendWidth + 42)
358
+ .attr('y', innerHeight - legendHeight - 12)
359
+ .attr('width', legendWidth)
360
+ .attr('height', legendHeight);
361
+ const legendRoot = gLegend.selectAll('div').data([0]).join('xhtml:div');
362
+ Object.assign(legendRoot.node().style, {
363
+ background: 'transparent',
364
+ border: 'none',
365
+ borderRadius: '0',
366
+ padding: '0',
367
+ fontSize: '12px',
368
+ lineHeight: '1.35',
369
+ color: 'var(--text-color)'
370
+ });
371
+
372
+ return { innerWidth, innerHeight };
373
+ }
374
+
375
+ function renderMetric(metricKey){
376
+ const map = dataByMetric.get(metricKey) || {};
377
+ const runs = runOrder;
378
+ // Domain
379
+ let minStep = Infinity, maxStep = -Infinity, maxVal = 0, minVal = Infinity;
380
+ const isRank = /rank/i.test(metricKey);
381
+ const isAverage = /average/i.test(metricKey);
382
+ const isRankStrict = isRank && !isAverage;
383
+ runs.forEach(r => {
384
+ const arr = map[r] || [];
385
+ arr.forEach(pt => {
386
+ const val = isRankStrict ? Math.round(pt.value) : pt.value;
387
+ minStep = Math.min(minStep, pt.step);
388
+ maxStep = Math.max(maxStep, pt.step);
389
+ maxVal = Math.max(maxVal, val);
390
+ minVal = Math.min(minVal, val);
391
+ });
392
+ });
393
+ if (!isFinite(minStep) || !isFinite(maxStep)) { return; }
394
+ xScale.domain([minStep, maxStep]);
395
+ if (isRank) {
396
+ rankTickMax = Math.max(1, Math.ceil(maxVal)); // ceil so fractional average ranks stay inside the y-domain
397
+ yScale.domain([rankTickMax, 1]);
398
+ } else {
399
+ yScale.domain([0, Math.max(1, maxVal)]).nice();
400
+ }
401
+ isRankStrictFlag = isRankStrict;
402
+
403
+ const { innerWidth, innerHeight } = updateScales();
404
+
405
+ // Bind lines and markers
406
+ const series = runs.map((r, i) => ({
407
+ run: r,
408
+ color: pool[i % pool.length],
409
+ marker: markerShapes[i % markerShapes.length],
410
+ values: (map[r]||[])
411
+ .slice()
412
+ .sort((a,b)=>a.step-b.step)
413
+ .map(pt => isRankStrict ? { step: pt.step, value: Math.round(pt.value) } : pt)
414
+ }));
415
+
416
+ // Draw lines
417
+ const paths = gLines.selectAll('path.run-line').data(series, d=>d.run);
418
+ paths.enter().append('path').attr('class','run-line').attr('fill','none').attr('stroke-width',2)
419
+ .attr('stroke', d=>d.color).attr('opacity',0.9)
420
+ .attr('d', d=>lineGen(d.values))
421
+ .merge(paths)
422
+ .transition().duration(200)
423
+ .attr('stroke', d=>d.color)
424
+ .attr('d', d=>lineGen(d.values));
425
+ paths.exit().remove();
426
+
427
+ // Draw markers for each data point
428
+ gPoints.selectAll('*').remove();
429
+ series.forEach((s, seriesIndex) => {
430
+ const pointGroup = gPoints.selectAll(`.points-${seriesIndex}`)
431
+ .data(s.values)
432
+ .join('g')
433
+ .attr('class', `points-${seriesIndex}`)
434
+ .attr('transform', d => `translate(${xScale(d.step)},${yScale(d.value)})`);
435
+
436
+ drawMarker(pointGroup, s.marker, markerSize)
437
+ .attr('fill', s.color)
438
+ .attr('stroke', s.color)
439
+ .attr('stroke-width', 1.5)
440
+ .style('cursor', 'crosshair');
441
+ });
442
+
443
+ // Inline legend content with marker shapes
444
+ legendInline.innerHTML = '';
445
+ series.forEach(s => {
446
+ const legendItem = document.createElement('span');
447
+ legendItem.style.cssText = 'display:inline-flex;align-items:center;gap:6px;white-space:nowrap;';
448
+
449
+ // Create small SVG for marker shape
450
+ const markerSvg = document.createElementNS('http://www.w3.org/2000/svg', 'svg');
451
+ markerSvg.setAttribute('width', '16');
452
+ markerSvg.setAttribute('height', '12');
453
+ markerSvg.style.display = 'inline-block';
454
+
455
+ const g = document.createElementNS('http://www.w3.org/2000/svg', 'g');
456
+ g.setAttribute('transform', 'translate(8,6)');
457
+
458
+ let shape;
459
+ const size = 6;
460
+ const halfSize = size / 2;
461
+ switch(s.marker) {
462
+ case 'circle':
463
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'circle');
464
+ shape.setAttribute('r', halfSize);
465
+ break;
466
+ case 'square':
467
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'rect');
468
+ shape.setAttribute('x', -halfSize);
469
+ shape.setAttribute('y', -halfSize);
470
+ shape.setAttribute('width', size);
471
+ shape.setAttribute('height', size);
472
+ break;
473
+ case 'triangle':
474
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
475
+ shape.setAttribute('d', `M0,${-halfSize * 1.2} L${halfSize * 1.1},${halfSize * 0.6} L${-halfSize * 1.1},${halfSize * 0.6} Z`);
476
+ break;
477
+ case 'diamond':
478
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
479
+ shape.setAttribute('d', `M0,${-halfSize * 1.2} L${halfSize * 1.1},0 L0,${halfSize * 1.2} L${-halfSize * 1.1},0 Z`);
480
+ break;
481
+ case 'inverted-triangle':
482
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
483
+ shape.setAttribute('d', `M0,${halfSize * 1.2} L${halfSize * 1.1},${-halfSize * 0.6} L${-halfSize * 1.1},${-halfSize * 0.6} Z`);
484
+ break;
485
+ default:
486
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'circle');
487
+ shape.setAttribute('r', halfSize);
488
+ }
489
+ shape.setAttribute('fill', s.color);
490
+ shape.setAttribute('stroke', s.color);
491
+ shape.setAttribute('stroke-width', '1');
492
+
493
+ g.appendChild(shape);
494
+ markerSvg.appendChild(g);
495
+
496
+ const label = document.createElement('span');
497
+ label.textContent = s.run;
498
+
499
+ legendItem.appendChild(markerSvg);
500
+ legendItem.appendChild(label);
501
+ legendInline.appendChild(legendItem);
502
+ });
503
+
504
+ // Hover
505
+ const stepSet = new Set(); series.forEach(s=>s.values.forEach(v=>stepSet.add(v.step)));
506
+ const steps = Array.from(stepSet).sort((a,b)=>a-b);
507
+ function onMove(event){
508
+ const [mx, my] = d3.pointer(event, overlay.node());
509
+ const sx = Math.max(steps[0], Math.min(steps[steps.length-1], Math.round(xScale.invert(mx)/1)*1));
510
+ const nearest = steps.reduce((best, s)=> Math.abs(s - xScale.invert(mx)) < Math.abs(best - xScale.invert(mx)) ? s : best, steps[0]);
511
+ const xpx = xScale(nearest);
512
+ hoverLine.attr('x1', xpx).attr('x2', xpx).style('display', null); // stroke is set theme-aware in updateScales
513
+ // Tooltip content
514
+ let html = `<div><strong>${getMetricDisplayName(metricKey)}</strong></div><div><strong>step</strong> ${nearest}</div>`;
515
+ series.forEach(s=>{
516
+ const m = new Map(s.values.map(v=>[v.step, v.value]));
517
+ const val = m.has(nearest) ? m.get(nearest) : null;
518
+ if (val != null) {
519
+ const formatVal = (vv) => (isRankStrict ? d3.format('d')(vv) : (+vv).toFixed(4));
520
+ html += `<div><span style=\"display:inline-block;width:10px;height:10px;background:${s.color};border-radius:50%;margin-right:6px;\"></span><strong>${s.run}</strong> ${formatVal(val)}</div>`;
521
+ }
522
+ });
523
+ tipInner.innerHTML = html;
524
+ const offsetX = 12, offsetY = 12;
525
+ tip.style.opacity = '1'; tip.style.transform = `translate(${Math.round(mx + offsetX + margin.left)}px, ${Math.round(my + offsetY + margin.top)}px)`;
526
+ }
527
+ function onLeave(){ tip.style.opacity='0'; tip.style.transform='translate(-9999px, -9999px)'; hoverLine.style('display','none'); }
528
+ overlay.on('mousemove', onMove).on('mouseleave', onLeave);
529
+ }
530
+
531
+ // (old hover removed; hover is attached in renderMetric)
532
+
533
+ // Load CSV and wire controls
534
+ (async () => {
535
+ try {
536
+ const text = await fetchFirstAvailable(CSV_PATHS);
537
+ const rows = d3.csvParse(text, d => ({ run: (d.run||'').trim(), step: +d.step, metric: (d.metric||'').trim(), value: +d.value }));
538
+ metricList = Array.from(new Set(rows.map(r=>r.metric))).sort();
539
+ runList = Array.from(new Set(rows.map(r=>r.run))).sort();
540
+ runOrder = runList;
541
+ // Build dataByMetric
542
+ metricList.forEach(m => {
543
+ const map = {};
544
+ runList.forEach(r => { map[r] = []; });
545
+ rows.filter(r=>r.metric===m).forEach(r => { if (!isNaN(r.step) && !isNaN(r.value)) map[r.run].push({ step:r.step, value:r.value }); });
546
+ dataByMetric.set(m, map);
547
+ });
548
+
549
+ // Populate metric select (default to average_rank if present)
550
+ metricList.forEach((m)=>{ const o=document.createElement('option'); o.value=m; o.textContent=getMetricDisplayName(m); selectMetric.appendChild(o); });
551
+ const def = metricList.find(m => /average_rank/i.test(m)) || metricList[0];
552
+ if (def) selectMetric.value = def;
553
+
554
+ container.appendChild(controls);
555
+ updateScales();
556
+ renderMetric(selectMetric.value);
557
+
558
+ selectMetric.addEventListener('change', ()=>{ renderMetric(selectMetric.value); });
559
+
560
+ const rerender = () => { renderMetric(selectMetric.value); };
561
+ if (window.ResizeObserver) { const ro = new ResizeObserver(()=>rerender()); ro.observe(container); } else { window.addEventListener('resize', rerender); }
562
+ } catch (e) {
563
+ const pre = document.createElement('pre'); pre.textContent = 'CSV load error: ' + (e && e.message ? e.message : e);
564
+ pre.style.color = 'var(--danger, #b00020)'; pre.style.fontSize = '12px'; pre.style.whiteSpace = 'pre-wrap';
565
+ container.appendChild(pre);
566
+ }
567
+ })();
568
+ };
569
+
570
+ if (document.readyState === 'loading') {
571
+ document.addEventListener('DOMContentLoaded', () => ensureD3(bootstrap), { once: true });
572
+ } else { ensureD3(bootstrap); }
573
+ })();
574
+ </script>
575
+
576
+
app/src/content/embeds/visual-dependency-filters.html ADDED
@@ -0,0 +1,576 @@
1
+ <div class="d3-line" style="width:100%;margin:10px 0;"></div>
2
+ <style>
3
+ .d3-line .d3-line__controls select {
4
+ font-size: 12px;
5
+ padding: 8px 28px 8px 10px;
6
+ border: 1px solid var(--border-color);
7
+ border-radius: 8px;
8
+ background-color: var(--surface-bg);
9
+ color: var(--text-color);
10
+ background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='12' height='12' viewBox='0 0 24 24' fill='none' stroke='%230f1115' stroke-width='2' stroke-linecap='round' stroke-linejoin='round'%3E%3Cpolyline points='6 9 12 15 18 9'/%3E%3C/svg%3E");
11
+ background-repeat: no-repeat;
12
+ background-position: right 8px center;
13
+ background-size: 12px;
14
+ -webkit-appearance: none;
15
+ -moz-appearance: none;
16
+ appearance: none;
17
+ cursor: pointer;
18
+ transition: border-color .15s ease, box-shadow .15s ease;
19
+ }
20
+ [data-theme="dark"] .d3-line .d3-line__controls select {
21
+ background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='12' height='12' viewBox='0 0 24 24' fill='none' stroke='%23ffffff' stroke-width='2' stroke-linecap='round' stroke-linejoin='round'%3E%3Cpolyline points='6 9 12 15 18 9'/%3E%3C/svg%3E");
22
+ }
23
+ .d3-line .d3-line__controls select:hover {
24
+ border-color: var(--primary-color);
25
+ }
26
+ .d3-line .d3-line__controls select:focus {
27
+ border-color: var(--primary-color);
28
+ box-shadow: 0 0 0 3px rgba(232,137,171,.25);
29
+ outline: none;
30
+ }
31
+ .d3-line .d3-line__controls label { gap: 8px; }
32
+
33
+ /* Range slider themed with --primary-color */
34
+ .d3-line .d3-line__controls input[type="range"] {
35
+ -webkit-appearance: none;
36
+ appearance: none;
37
+ width: 100%;
38
+ height: 6px;
39
+ border-radius: 999px;
40
+ background: var(--border-color);
41
+ outline: none;
42
+ }
43
+ .d3-line .d3-line__controls input[type="range"]::-webkit-slider-runnable-track {
44
+ height: 6px;
45
+ background: transparent;
46
+ border-radius: 999px;
47
+ }
48
+ .d3-line .d3-line__controls input[type="range"]::-webkit-slider-thumb {
49
+ -webkit-appearance: none;
50
+ appearance: none;
51
+ width: 16px;
52
+ height: 16px;
53
+ border-radius: 50%;
54
+ background: var(--primary-color);
55
+ border: 2px solid var(--on-primary);
56
+ margin-top: -5px;
57
+ cursor: pointer;
58
+ }
59
+ .d3-line .d3-line__controls input[type="range"]::-moz-range-track {
60
+ height: 6px;
61
+ background: transparent;
62
+ border-radius: 999px;
63
+ }
64
+ .d3-line .d3-line__controls input[type="range"]::-moz-range-thumb {
65
+ width: 16px;
66
+ height: 16px;
67
+ border-radius: 50%;
68
+ background: var(--primary-color);
69
+ border: 2px solid var(--on-primary);
70
+ cursor: pointer;
71
+ }
72
+ /* Improved line color via CSS */
73
+ .d3-line .lines path.improved { stroke: var(--primary-color); }
74
+ </style>
75
+ <script>
76
+ (() => {
77
+ const ensureD3 = (cb) => {
78
+ if (window.d3 && typeof window.d3.select === 'function') return cb();
79
+ let s = document.getElementById('d3-cdn-script');
80
+ if (!s) {
81
+ s = document.createElement('script');
82
+ s.id = 'd3-cdn-script';
83
+ s.src = 'https://cdn.jsdelivr.net/npm/d3@7/dist/d3.min.js';
84
+ document.head.appendChild(s);
85
+ }
86
+ const onReady = () => { if (window.d3 && typeof window.d3.select === 'function') cb(); };
87
+ s.addEventListener('load', onReady, { once: true });
88
+ if (window.d3) onReady();
89
+ };
90
+
91
+ const bootstrap = () => {
92
+ const mount = document.currentScript ? document.currentScript.previousElementSibling : null;
93
+ const container = (mount && mount.querySelector && mount.querySelector('.d3-line')) || document.querySelector('.d3-line');
94
+ if (!container) return;
95
+ if (container.dataset) {
96
+ if (container.dataset.mounted === 'true') return;
97
+ container.dataset.mounted = 'true';
98
+ }
99
+
100
+ // CSV: prefer public path, fallback to relative
101
+ const CSV_PATHS = [
102
+ '/data/visual_dependency_filters.csv',
103
+ './assets/data/visual_dependency_filters.csv',
104
+ '../assets/data/visual_dependency_filters.csv',
105
+ '../../assets/data/visual_dependency_filters.csv'
106
+ ];
107
+ const fetchFirstAvailable = async (paths) => {
108
+ for (const p of paths) {
109
+ try { const r = await fetch(p, { cache: 'no-cache' }); if (r.ok) return await r.text(); } catch(e) {}
110
+ }
111
+ throw new Error('CSV not found: visual_dependency_filters.csv');
112
+ };
113
+
114
+ // Controls UI
115
+ const controls = document.createElement('div');
116
+ controls.className = 'd3-line__controls';
117
+ Object.assign(controls.style, {
118
+ marginTop: '12px',
119
+ display: 'flex',
120
+ gap: '16px',
121
+ alignItems: 'center',
122
+ justifyContent: 'space-between',
123
+ width: '100%'
124
+ });
125
+
126
+ const labelMetric = document.createElement('label');
127
+ Object.assign(labelMetric.style, {
128
+ fontSize: '12px', color: 'var(--muted-color)', display: 'flex', alignItems: 'center', gap: '6px', whiteSpace: 'nowrap', padding: '6px 10px', marginLeft: 'auto'
129
+ });
130
+ labelMetric.textContent = 'Metric';
131
+ const selectMetric = document.createElement('select');
132
+ Object.assign(selectMetric.style, { fontSize: '12px' });
133
+ labelMetric.appendChild(selectMetric);
134
+
135
+ // Inline legend on the left of the select (the select is pushed right via marginLeft: 'auto')
136
+ const legendInline = document.createElement('div');
137
+ legendInline.className = 'controls__legend';
138
+ Object.assign(legendInline.style, {
139
+ display: 'flex',
140
+ gap: '8px',
141
+ alignItems: 'center',
142
+ flexWrap: 'nowrap',
143
+ fontSize: '11px',
144
+ marginLeft: '8px'
145
+ });
146
+ controls.appendChild(legendInline);
147
+ controls.appendChild(labelMetric);
148
+
149
+ // Create SVG with marker definitions
150
+ const svg = d3.select(container).append('svg')
151
+ .attr('width', '100%')
152
+ .style('display', 'block');
153
+
154
+ // Add marker definitions for different shapes
155
+ const defs = svg.append('defs');
156
+
157
+ // Academic marker shapes
158
+ const markerShapes = ['circle', 'square', 'triangle', 'diamond', 'inverted-triangle'];
159
+ const markerSize = 8;
160
+
161
+ // Groups
162
+ const gRoot = svg.append('g');
163
+ const gGrid = gRoot.append('g').attr('class', 'grid');
164
+ const gAxes = gRoot.append('g').attr('class', 'axes');
165
+ const gLines = gRoot.append('g').attr('class', 'lines');
166
+ const gPoints = gRoot.append('g').attr('class', 'points');
167
+ const gHover = gRoot.append('g').attr('class', 'hover');
168
+ const gLegend = gRoot.append('foreignObject').attr('class', 'legend');
169
+
170
+ // Tooltip
171
+ container.style.position = container.style.position || 'relative';
172
+ let tip = container.querySelector('.d3-tooltip');
173
+ let tipInner;
174
+ if (!tip) {
175
+ tip = document.createElement('div');
176
+ tip.className = 'd3-tooltip';
177
+ Object.assign(tip.style, {
178
+ position: 'absolute', top: '0px', left: '0px', transform: 'translate(-9999px, -9999px)', pointerEvents: 'none',
179
+ padding: '8px 10px', borderRadius: '8px', fontSize: '12px', lineHeight: '1.35', border: '1px solid var(--border-color)',
180
+ background: 'var(--surface-bg)', color: 'var(--text-color)', boxShadow: '0 4px 24px rgba(0,0,0,.18)', opacity: '0',
181
+ transition: 'opacity .12s ease'
182
+ });
183
+ tipInner = document.createElement('div');
184
+ tipInner.className = 'd3-tooltip__inner';
185
+ tipInner.style.textAlign = 'left';
186
+ tip.appendChild(tipInner);
187
+ container.appendChild(tip);
188
+ } else {
189
+ tipInner = tip.querySelector('.d3-tooltip__inner') || tip;
190
+ }
191
+
192
+ // Colors per run
193
+ const primary = getComputedStyle(document.documentElement).getPropertyValue('--primary-color').trim() || '#E889AB';
194
+ const pool = [primary, '#4EA5B7', '#E38A42', '#CEC0FA', ...(d3.schemeTableau10||[])];
195
+
196
+ // Mapping from metric names to display titles
197
+ const metricTitleMapping = {
198
+ 'docvqa_val_anls': 'DocVQA',
199
+ 'infovqa_val_anls': 'InfoVQA',
200
+ 'mme_total_score': 'MME Total',
201
+ 'mmmu_val_mmmu_acc': 'MMMU',
202
+ 'mmstar_average': 'MMStar',
203
+ 'ocrbench_ocrbench_accuracy': 'OCRBench',
204
+ 'scienceqa_exact_match': 'ScienceQA',
205
+ 'textvqa_val_exact_match': 'TextVQA',
206
+ 'average': 'Average (excl. MME)',
207
+ 'average_rank': 'Average Rank',
208
+ 'ai2d_exact_match': 'AI2D',
209
+ 'chartqa_relaxed_overall': 'ChartQA',
210
+ 'seedbench_seed_all': 'SeedBench'
211
+ };
212
+
213
+ // Function to get display name for metric
214
+ function getMetricDisplayName(metricKey) {
215
+ return metricTitleMapping[metricKey] || metricKey;
216
+ }
217
+
218
+ // State and data
219
+ let metricList = [];
220
+ let runList = [];
221
+ let runOrder = [];
222
+ const dataByMetric = new Map(); // metric => { run => [{step,value}] }
223
+ let isRankStrictFlag = false;
224
+ let rankTickMax = 1;
225
+
226
+ // Scales and layout
227
+ let width = 800, height = 360;
228
+ let margin = { top: 16, right: 28, bottom: 56, left: 64 };
229
+ let xScale = d3.scaleLinear();
230
+ let yScale = d3.scaleLinear();
231
+
232
+ // Line generators - simple linear connections
233
+ const lineGen = d3.line()
234
+ .x((d) => xScale(d.step))
235
+ .y((d) => yScale(d.value));
236
+
237
+ // Function to draw different marker shapes
238
+ function drawMarker(selection, shape, size) {
239
+ const s = size / 2;
240
+ switch (shape) {
241
+ case 'circle':
242
+ return selection.append('circle').attr('r', s);
243
+ case 'square':
244
+ return selection.append('rect').attr('x', -s).attr('y', -s).attr('width', size).attr('height', size);
245
+ case 'triangle':
246
+ return selection.append('path').attr('d', `M0,${-s * 1.2} L${s * 1.1},${s * 0.6} L${-s * 1.1},${s * 0.6} Z`);
247
+ case 'diamond':
248
+ return selection.append('path').attr('d', `M0,${-s * 1.2} L${s * 1.1},0 L0,${s * 1.2} L${-s * 1.1},0 Z`);
249
+ case 'inverted-triangle':
250
+ return selection.append('path').attr('d', `M0,${s * 1.2} L${s * 1.1},${-s * 0.6} L${-s * 1.1},${-s * 0.6} Z`);
251
+ default:
252
+ return selection.append('circle').attr('r', s);
253
+ }
254
+ }
255
+
256
+ // Hover elements
257
+ const hoverLine = gHover.append('line').attr('stroke-width', 1);
258
+
259
+ const overlay = gHover.append('rect').attr('fill', 'transparent').style('cursor', 'crosshair');
260
+
261
+ function updateScales() {
262
+ const isDark = document.documentElement.getAttribute('data-theme') === 'dark';
263
+ const axisColor = isDark ? 'rgba(255,255,255,0.25)' : 'rgba(0,0,0,0.25)';
264
+ const tickColor = isDark ? 'rgba(255,255,255,0.70)' : 'rgba(0,0,0,0.55)';
265
+ const gridColor = isDark ? 'rgba(255,255,255,0.08)' : 'rgba(0,0,0,0.05)';
266
+
267
+ width = container.clientWidth || 800;
268
+ height = Math.max(360, Math.round(width / 2.2));
269
+ svg.attr('width', width).attr('height', height);
270
+
271
+ const innerWidth = width - margin.left - margin.right;
272
+ const innerHeight = height - margin.top - margin.bottom;
273
+ gRoot.attr('transform', `translate(${margin.left},${margin.top})`);
274
+
275
+ xScale.range([0, innerWidth]);
276
+ yScale.range([innerHeight, 0]);
277
+
278
+ // Compute Y ticks
279
+ let yTicks = [];
280
+ if (isRankStrictFlag) {
281
+ const maxR = Math.max(1, Math.round(rankTickMax));
282
+ for (let v = 1; v <= maxR; v += 1) yTicks.push(v);
283
+ } else {
284
+ // Use D3's tick generator to produce nice floating-point ticks
285
+ yTicks = yScale.ticks(6);
286
+ }
287
+
288
+ // Grid (horizontal)
289
+ gGrid.selectAll('*').remove();
290
+ gGrid.selectAll('line')
291
+ .data(yTicks)
292
+ .join('line')
293
+ .attr('x1', 0)
294
+ .attr('x2', innerWidth)
295
+ .attr('y1', (d) => yScale(d))
296
+ .attr('y2', (d) => yScale(d))
297
+ .attr('stroke', gridColor)
298
+ .attr('stroke-width', 1)
299
+ .attr('shape-rendering', 'crispEdges');
300
+
301
+ // Axes
302
+ gAxes.selectAll('*').remove();
303
+ let xAxis = d3.axisBottom(xScale).tickSizeOuter(0);
304
+ if (isRankStrictFlag) {
305
+ const [dx0, dx1] = xScale.domain();
306
+ const start = Math.ceil(dx0 / 1000) * 1000;
307
+ const end = Math.floor(dx1 / 1000) * 1000;
308
+ const xTicks = [];
309
+ for (let v = start; v <= end; v += 1000) xTicks.push(v);
310
+ if (xTicks.length === 0) xTicks.push(Math.round(dx0));
311
+ xAxis = xAxis.tickValues(xTicks).tickFormat(d3.format('d'));
312
+ } else {
313
+ xAxis = xAxis.ticks(8);
314
+ }
315
+ const yAxis = d3.axisLeft(yScale)
316
+ .tickValues(yTicks)
317
+ .tickSizeOuter(0)
318
+ .tickFormat(isRankStrictFlag ? d3.format('d') : d3.format('.2f'));
319
+ gAxes.append('g')
320
+ .attr('transform', `translate(0,${innerHeight})`)
321
+ .call(xAxis)
322
+ .call((g) => {
323
+ g.selectAll('path, line').attr('stroke', axisColor);
324
+ g.selectAll('text').attr('fill', tickColor).style('font-size', '12px');
325
+ });
326
+ gAxes.append('g')
327
+ .call(yAxis)
328
+ .call((g) => {
329
+ g.selectAll('path, line').attr('stroke', axisColor);
330
+ g.selectAll('text').attr('fill', tickColor).style('font-size', '12px');
331
+ });
332
+
333
+ // Axis labels (X and Y)
334
+ gAxes.append('text')
335
+ .attr('class', 'axis-label axis-label--x')
336
+ .attr('x', innerWidth / 2)
337
+ .attr('y', innerHeight + 44)
338
+ .attr('text-anchor', 'middle')
339
+ .style('font-size', '12px')
340
+ .style('fill', tickColor)
341
+ .text('Step');
342
+ gAxes.append('text')
343
+ .attr('class', 'axis-label axis-label--y')
344
+ .attr('text-anchor', 'middle')
345
+ .attr('transform', `translate(${-44},${innerHeight/2}) rotate(-90)`)
346
+ .style('font-size', '12px')
347
+ .style('fill', tickColor)
348
+ .text('Value');
349
+
350
+ overlay.attr('x', 0).attr('y', 0).attr('width', innerWidth).attr('height', innerHeight);
351
+ hoverLine.attr('y1', 0).attr('y2', innerHeight).attr('stroke', axisColor);
352
+
353
+ // Legend placeholder; actual content set in renderMetric
354
+ const legendWidth = Math.min(180, Math.max(120, Math.round(innerWidth * 0.22)));
355
+ const legendHeight = 64;
356
+ gLegend
357
+ .attr('x', innerWidth - legendWidth + 42)
358
+ .attr('y', innerHeight - legendHeight - 12)
359
+ .attr('width', legendWidth)
360
+ .attr('height', legendHeight);
361
+ const legendRoot = gLegend.selectAll('div').data([0]).join('xhtml:div');
362
+ Object.assign(legendRoot.node().style, {
363
+ background: 'transparent',
364
+ border: 'none',
365
+ borderRadius: '0',
366
+ padding: '0',
367
+ fontSize: '12px',
368
+ lineHeight: '1.35',
369
+ color: 'var(--text-color)'
370
+ });
371
+
372
+ return { innerWidth, innerHeight };
373
+ }
374
+
375
+ function renderMetric(metricKey){
376
+ const map = dataByMetric.get(metricKey) || {};
377
+ const runs = runOrder;
378
+ // Domain
379
+ let minStep = Infinity, maxStep = -Infinity, maxVal = 0, minVal = Infinity;
380
+ const isRank = /rank/i.test(metricKey);
381
+ const isAverage = /average/i.test(metricKey);
382
+ const isRankStrict = isRank && !isAverage;
383
+ runs.forEach(r => {
384
+ const arr = map[r] || [];
385
+ arr.forEach(pt => {
386
+ const val = isRankStrict ? Math.round(pt.value) : pt.value;
387
+ minStep = Math.min(minStep, pt.step);
388
+ maxStep = Math.max(maxStep, pt.step);
389
+ maxVal = Math.max(maxVal, val);
390
+ minVal = Math.min(minVal, val);
391
+ });
392
+ });
393
+ if (!isFinite(minStep) || !isFinite(maxStep)) { return; }
394
+ xScale.domain([minStep, maxStep]);
395
+ if (isRank) {
396
+ rankTickMax = Math.max(1, Math.ceil(maxVal)); // ceil so fractional average ranks stay inside the domain
397
+ yScale.domain([rankTickMax, 1]);
398
+ } else {
399
+ yScale.domain([0, Math.max(1, maxVal)]).nice();
400
+ }
401
+ isRankStrictFlag = isRankStrict;
402
+
403
+ const { innerWidth, innerHeight } = updateScales();
404
+
405
+ // Bind lines and markers
406
+ const series = runs.map((r, i) => ({
407
+ run: r,
408
+ color: pool[i % pool.length],
409
+ marker: markerShapes[i % markerShapes.length],
410
+ values: (map[r]||[])
411
+ .slice()
412
+ .sort((a,b)=>a.step-b.step)
413
+ .map(pt => isRankStrict ? { step: pt.step, value: Math.round(pt.value) } : pt)
414
+ }));
415
+
416
+ // Draw lines
417
+ const paths = gLines.selectAll('path.run-line').data(series, d=>d.run);
418
+ paths.enter().append('path').attr('class','run-line').attr('fill','none').attr('stroke-width',2)
419
+ .attr('stroke', d=>d.color).attr('opacity',0.9)
420
+ .attr('d', d=>lineGen(d.values))
421
+ .merge(paths)
422
+ .transition().duration(200)
423
+ .attr('stroke', d=>d.color)
424
+ .attr('d', d=>lineGen(d.values));
425
+ paths.exit().remove();
426
+
427
+ // Draw markers for each data point
428
+ gPoints.selectAll('*').remove();
429
+ series.forEach((s, seriesIndex) => {
430
+ const pointGroup = gPoints.selectAll(`.points-${seriesIndex}`)
431
+ .data(s.values)
432
+ .join('g')
433
+ .attr('class', `points-${seriesIndex}`)
434
+ .attr('transform', d => `translate(${xScale(d.step)},${yScale(d.value)})`);
435
+
436
+ drawMarker(pointGroup, s.marker, markerSize)
437
+ .attr('fill', s.color)
438
+ .attr('stroke', s.color)
439
+ .attr('stroke-width', 1.5)
440
+ .style('cursor', 'crosshair');
441
+ });
442
+
443
+ // Inline legend content with marker shapes
444
+ legendInline.innerHTML = '';
445
+ series.forEach(s => {
446
+ const legendItem = document.createElement('span');
447
+ legendItem.style.cssText = 'display:inline-flex;align-items:center;gap:6px;white-space:nowrap;';
448
+
449
+ // Create small SVG for marker shape
450
+ const markerSvg = document.createElementNS('http://www.w3.org/2000/svg', 'svg');
451
+ markerSvg.setAttribute('width', '16');
452
+ markerSvg.setAttribute('height', '12');
453
+ markerSvg.style.display = 'inline-block';
454
+
455
+ const g = document.createElementNS('http://www.w3.org/2000/svg', 'g');
456
+ g.setAttribute('transform', 'translate(8,6)');
457
+
458
+ let shape;
459
+ const size = 6;
460
+ const halfSize = size / 2;
461
+ switch(s.marker) {
462
+ case 'circle':
463
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'circle');
464
+ shape.setAttribute('r', halfSize);
465
+ break;
466
+ case 'square':
467
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'rect');
468
+ shape.setAttribute('x', -halfSize);
469
+ shape.setAttribute('y', -halfSize);
470
+ shape.setAttribute('width', size);
471
+ shape.setAttribute('height', size);
472
+ break;
473
+ case 'triangle':
474
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
475
+ shape.setAttribute('d', `M0,${-halfSize * 1.2} L${halfSize * 1.1},${halfSize * 0.6} L${-halfSize * 1.1},${halfSize * 0.6} Z`);
476
+ break;
477
+ case 'diamond':
478
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
479
+ shape.setAttribute('d', `M0,${-halfSize * 1.2} L${halfSize * 1.1},0 L0,${halfSize * 1.2} L${-halfSize * 1.1},0 Z`);
480
+ break;
481
+ case 'inverted-triangle':
482
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
483
+ shape.setAttribute('d', `M0,${halfSize * 1.2} L${halfSize * 1.1},${-halfSize * 0.6} L${-halfSize * 1.1},${-halfSize * 0.6} Z`);
484
+ break;
485
+ default:
486
+ shape = document.createElementNS('http://www.w3.org/2000/svg', 'circle');
487
+ shape.setAttribute('r', halfSize);
488
+ }
489
+ shape.setAttribute('fill', s.color);
490
+ shape.setAttribute('stroke', s.color);
491
+ shape.setAttribute('stroke-width', '1');
492
+
493
+ g.appendChild(shape);
494
+ markerSvg.appendChild(g);
495
+
496
+ const label = document.createElement('span');
497
+ label.textContent = s.run;
498
+
499
+ legendItem.appendChild(markerSvg);
500
+ legendItem.appendChild(label);
501
+ legendInline.appendChild(legendItem);
502
+ });
503
+
504
+ // Hover
505
+ const stepSet = new Set(); series.forEach(s=>s.values.forEach(v=>stepSet.add(v.step)));
506
+ const steps = Array.from(stepSet).sort((a,b)=>a-b);
507
+ function onMove(event){
508
+ const [mx, my] = d3.pointer(event, overlay.node());
509
+ const xVal = xScale.invert(mx); // data-space x under the pointer
510
+ const nearest = steps.reduce((best, s)=> Math.abs(s - xVal) < Math.abs(best - xVal) ? s : best, steps[0]);
511
+ const xpx = xScale(nearest);
512
+ hoverLine.attr('x1', xpx).attr('x2', xpx).style('display', null); // stroke is set theme-aware in updateScales
513
+ // Tooltip content
514
+ let html = `<div><strong>${getMetricDisplayName(metricKey)}</strong></div><div><strong>step</strong> ${nearest}</div>`;
515
+ series.forEach(s=>{
516
+ const m = new Map(s.values.map(v=>[v.step, v.value]));
517
+ const val = m.has(nearest) ? m.get(nearest) : null;
518
+ if (val != null) {
519
+ const formatVal = (vv) => (isRankStrict ? d3.format('d')(vv) : (+vv).toFixed(4));
520
+ html += `<div><span style="display:inline-block;width:10px;height:10px;background:${s.color};border-radius:50%;margin-right:6px;"></span><strong>${s.run}</strong> ${formatVal(val)}</div>`;
521
+ }
522
+ });
523
+ tipInner.innerHTML = html;
524
+ const offsetX = 12, offsetY = 12;
525
+ tip.style.opacity = '1'; tip.style.transform = `translate(${Math.round(mx + offsetX + margin.left)}px, ${Math.round(my + offsetY + margin.top)}px)`;
526
+ }
527
+ function onLeave(){ tip.style.opacity='0'; tip.style.transform='translate(-9999px, -9999px)'; hoverLine.style('display','none'); }
528
+ overlay.on('mousemove', onMove).on('mouseleave', onLeave);
529
+ }
530
+
531
+ // (old hover removed; hover is attached in renderMetric)
532
+
533
+ // Load CSV and wire controls
534
+ (async () => {
535
+ try {
536
+ const text = await fetchFirstAvailable(CSV_PATHS);
537
+ const rows = d3.csvParse(text, d => ({ run: (d.run||'').trim(), step: +d.step, metric: (d.metric||'').trim(), value: +d.value }));
538
+ metricList = Array.from(new Set(rows.map(r=>r.metric))).sort();
539
+ runList = Array.from(new Set(rows.map(r=>r.run))).sort();
540
+ runOrder = runList;
541
+ // Build dataByMetric
542
+ metricList.forEach(m => {
543
+ const map = {};
544
+ runList.forEach(r => { map[r] = []; });
545
+ rows.filter(r=>r.metric===m).forEach(r => { if (!isNaN(r.step) && !isNaN(r.value)) map[r.run].push({ step:r.step, value:r.value }); });
546
+ dataByMetric.set(m, map);
547
+ });
548
+
549
+ // Populate metric select (default to average_rank if present)
550
+ metricList.forEach((m)=>{ const o=document.createElement('option'); o.value=m; o.textContent=getMetricDisplayName(m); selectMetric.appendChild(o); });
551
+ const def = metricList.find(m => /average_rank/i.test(m)) || metricList[0];
552
+ if (def) selectMetric.value = def;
553
+
554
+ container.appendChild(controls);
555
+ updateScales();
556
+ renderMetric(selectMetric.value);
557
+
558
+ selectMetric.addEventListener('change', ()=>{ renderMetric(selectMetric.value); });
559
+
560
+ const rerender = () => { renderMetric(selectMetric.value); };
561
+ if (window.ResizeObserver) { const ro = new ResizeObserver(()=>rerender()); ro.observe(container); } else { window.addEventListener('resize', rerender); }
562
+ } catch (e) {
563
+ const pre = document.createElement('pre'); pre.textContent = 'CSV load error: ' + (e && e.message ? e.message : e);
564
+ pre.style.color = 'var(--danger, #b00020)'; pre.style.fontSize = '12px'; pre.style.whiteSpace = 'pre-wrap';
565
+ container.appendChild(pre);
566
+ }
567
+ })();
568
+ };
569
+
570
+ if (document.readyState === 'loading') {
571
+ document.addEventListener('DOMContentLoaded', () => ensureD3(bootstrap), { once: true });
572
+ } else { ensureD3(bootstrap); }
573
+ })();
574
+ </script>
575
+
576
+