new plots
- app/src/content/article.mdx +11 -8
- app/src/content/assets/audio/audio-example.wav +0 -3
- app/src/content/assets/data/against_baselines.csv +961 -3
- app/src/content/assets/data/against_baselines_deduplicated.csv +828 -0
- app/src/content/assets/data/all_ratings_luis.csv +1201 -0
- app/src/content/assets/data/formatting_filters.csv +1201 -0
- app/src/content/assets/data/image_correspondence_filters.csv +1177 -0
- app/src/content/assets/data/internal_deduplication.csv +729 -0
- app/src/content/assets/data/mnist-variant-model.json +0 -3
- app/src/content/assets/data/relevance_filters.csv +1201 -0
- app/src/content/assets/data/remove_ch.csv +455 -0
- app/src/content/assets/data/s25_ratings.csv +1189 -0
- app/src/content/assets/data/ss_vs_s1.csv +481 -0
- app/src/content/assets/data/visual_dependency_filters.csv +1165 -0
- app/src/content/embeds/against-baselines-deduplicated.html +576 -0
- app/src/content/embeds/{d3-line.html → against-baselines.html} +0 -0
- app/src/content/embeds/all-ratings.html +576 -0
- app/src/content/embeds/formatting-filters.html +576 -0
- app/src/content/embeds/image-correspondence-filters.html +576 -0
- app/src/content/embeds/internal-deduplication.html +576 -0
- app/src/content/embeds/relevance-filters.html +576 -0
- app/src/content/embeds/remove-ch.html +576 -0
- app/src/content/embeds/s25-ratings.html +576 -0
- app/src/content/embeds/ss-vs-s1.html +576 -0
- app/src/content/embeds/visual-dependency-filters.html +576 -0
app/src/content/article.mdx
CHANGED
@@ -263,7 +263,7 @@ Each of our ablations trains a 450M model with maximal image size of 1536x1536 p
 ### How does FineVision compare against the Baselines?
 Compared against existing VLM training datasets, models trained on FineVision achieve a significantly better (lower) average rank across benchmarks than models trained on the other options.
 
-<HtmlEmbed src="
+<HtmlEmbed src="against-baselines.html" desc="Average Rank of Models trained on different open source datasets." />
 
 ### How contaminated are the datasets?
 To investigate data leakage from benchmarks into this dataset, we construct a deduplication pipeline based on the sample images. We embed the images of 66 image-based test datasets from the lmms-eval framework using the SSCD descriptor and compute the cosine similarity between our samples and the test-set embeddings. Whenever a sample has a similarity higher than a threshold of 0.95, it is assumed to be a duplicate. While our tests with various thresholds show that this flags some samples that are not actual duplicates (especially images that look similar but differ in detail, such as graphs or tables), we preferred to err on the side of caution. We open-source the deduplication pipeline here as well as the precomputed test-set embeddings here.
@@ -279,7 +279,7 @@ TODO: Insert the Images here
 
 Additionally, we experimented with removing all flagged samples from all datasets to see whether the outcome differs from the results above, but we observe the same distribution.
 
-<HtmlEmbed src="against-baselines-deduplicated.html"
+<HtmlEmbed src="against-baselines-deduplicated.html" desc="Average Rank of Models trained on different deduplicated open source datasets." />
 
 TODO: After removing these duplicates, the performance of the models dropped by … % over all benchmarks.
 
@@ -297,12 +297,12 @@ Similarly to the comparison of the size, we also wanted to evaluate the datasets
 Since the training of a VLM already builds upon pretrained vision and language backbones, datasets are usually not completely unstructured but follow an image + question-and-answer structure. Recent works have shown that consolidating multiple questions for the same image into a multi-turn conversation, where the image is shown only once, improves model performance and also reduces the dataset's memory footprint. We therefore experiment with deduplicating every image in our dataset internally using the same SSCD descriptors, manually inspecting the resulting clusters, and merging fitting samples into multi-turn conversations.
 Even when training for longer than the other ablations, we did not observe a significant difference; if anything, there is a slight one against merging multiple samples together.
 
-<HtmlEmbed src="internal-deduplication.html"
+<HtmlEmbed src="internal-deduplication.html" desc="Average Ranking of Models trained with internally deduplicated / merged samples." />
 
 ### Should you train on multilingual data if your language backbone was not?
 There are some multilingual datasets in our mixture, but since our language backbone was trained only on English data, we experimented with removing all multilingual subsets, which are mainly Chinese. This also does not seem to make a big difference, with a slight advantage for keeping the data, even though those languages were not part of the language backbone's initial training. In our training setup with this configuration, one epoch over the whole dataset equals ~12k steps, so the benefit of unseen languages only materializes after the first full epoch.
 
-<HtmlEmbed src="remove-ch.html"
+<HtmlEmbed src="remove-ch.html" desc="Average Rank of Models trained with and without multilingual samples" />
 
 ### How can you assess the quality of the dataset?
 
@@ -324,7 +324,7 @@ This is the distribution of scores across the different filters for FineVision.
 
 To quantify the quality of the training data and its effect on the model's performance, we run extensive ablations on our generated ratings.
 
-<HtmlEmbed src="all-ratings.html"
+<HtmlEmbed src="all-ratings.html" desc="Average Rank of Models trained with samples that have all 4 ratings above a certain threshold." />
 
 Interestingly, we observe the same behaviour both when removing turns that have any of the 4 ratings below a certain threshold and when filtering on a single rating at a time: simply training on all samples of the dataset performs best on the benchmarks. This could mean multiple things.
 We see almost the same distribution of ranks across all filters, going from best to worst as the rating threshold increases. For example, the visual dependency and the image correspondence ratings both result in exactly the same distribution of rankings, corresponding to the natural order of options, 1 through 5. This could indicate that with a sufficiently large dataset that you train on long enough, removing samples hurts more than training on them, even if they were judged to be of low quality.
@@ -332,21 +332,24 @@ The notion of quality for VLM datasets is nuanced in general. If we compare trai
 Alternatively, while we used state-of-the-art open source models to judge our data points, we still had to compromise between model quality and cost because of the sheer effort required to rate every single turn of FineVision. The chosen models may simply not be powerful enough to recognize and judge the quality of samples.
 Even though our first attempt to judge the quality of multimodal data on a per-turn basis did not improve model performance, we believe this is still an exciting and important research direction and hope the release of FineVision encourages the community to develop techniques for this at large scale.
 
-<HtmlEmbed src="
+<HtmlEmbed src="formatting-filters.html" title="Formatting Filter" desc="Average Rank of Models that have the Formatting Filter above a threshold." />
+<HtmlEmbed src="relevance-filters.html" title="Relevance Filter" desc="Average Rank of Models that have the Relevance Filter above a threshold." />
+<HtmlEmbed src="visual-dependency-filters.html" title="Visual Dependency Filter" desc="Average Rank of Models that have the Visual Dependency Filter above a threshold." />
+<HtmlEmbed src="image-correspondence-filters.html" title="Image Correspondence Filter" desc="Average Rank of Models that have the Image-Correspondence Filter above a threshold." />
 
 ### Should you train in multiple stages?
 The standard training procedure of a VLM usually follows at least two stages. First, you train only the connecting module, potentially together with the image encoder, and then you train the whole model in a second stage. Some works have even introduced an additional Stage 2.5, where you train the full model on a smaller subset of higher-quality data. To investigate this for small models, we experiment with single-, dual-, and triple-stage training.
 
 #### 1 Stage vs 2 Stages
 
-<HtmlEmbed src="
+<HtmlEmbed src="ss-vs-s1.html" desc="Average Rank of a model trained for 20k steps in a single stage, and a model trained for the same 20k steps on top of pretraining the Modality Projection and Vision Encoder for 10k steps." />
 
 We observe that at this model size, with this amount of available data, training in a single stage actually outperforms a multi-stage approach.
 
 #### 2 Stages vs 2.5 Stages
 We also test whether splitting the second stage results in any performance improvement. We take the baseline and continue training for another 20k steps, with both the unfiltered (>= 1) and the filtered subsets of FineVision according to our ratings.
 
-<HtmlEmbed src="
+<HtmlEmbed src="s25-ratings.html" desc="Average Rank of a model trained for an additional 20k steps on top of unfiltered training for 20k steps." />
 
 As in the previous results, we observe that the best outcome is simply achieved by training on as much data as possible.
 
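The contamination check described in the article diff above embeds images with SSCD, compares them to the test-set embeddings by cosine similarity, and flags anything above 0.95. A minimal pure-Python sketch of the comparison step follows. It assumes the embeddings are already computed (SSCD itself is not reproduced here), and `cosine_similarity` and `flag_contaminated` are hypothetical names, not the API of the released pipeline.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def flag_contaminated(
    train_embs: List[Sequence[float]],
    test_embs: List[Sequence[float]],
    threshold: float = 0.95,
) -> List[int]:
    """Return indices of training samples whose closest test-set embedding
    has a cosine similarity strictly above `threshold`."""
    flagged = []
    for i, emb in enumerate(train_embs):
        # Best match of this training image against every test image.
        best = max((cosine_similarity(emb, t) for t in test_embs), default=0.0)
        if best > threshold:
            flagged.append(i)
    return flagged
```

At FineVision scale the same comparison would normally be computed as a single matrix product of L2-normalized embedding matrices rather than a Python double loop; the loop is kept here only for readability.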
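The internal deduplication step in the article diff above (clustering near-identical images with the same SSCD descriptors, then merging their question-answer pairs into one multi-turn conversation that shows the image once) can be sketched roughly as below. Every detail here is an assumption: samples are dicts with hypothetical `image_id`, `embedding` and `turns` keys, clustering is a greedy pass that compares each sample against the first member of every existing cluster, the 0.95 threshold is borrowed from the contamination check because the text gives no value for this step, and the manual inspection of clusters is not modelled.

```python
import math
from typing import Dict, List, Sequence


def _cos(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; zero vectors count as dissimilar."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def merge_duplicate_images(samples: List[Dict], threshold: float = 0.95) -> List[Dict]:
    """Greedily cluster samples whose image embeddings are near-duplicates,
    then merge each cluster into one multi-turn sample."""
    clusters: List[List[Dict]] = []
    for sample in samples:
        for cluster in clusters:
            # Compare only against the cluster's first member (its representative).
            if _cos(sample["embedding"], cluster[0]["embedding"]) > threshold:
                cluster.append(sample)
                break
        else:
            # No existing cluster matched: start a new one.
            clusters.append([sample])

    merged = []
    for cluster in clusters:
        # Keep the representative's image and concatenate all turns in order.
        turns = [turn for s in cluster for turn in s["turns"]]
        merged.append({"image_id": cluster[0]["image_id"], "turns": turns})
    return merged
```

A greedy representative-based pass is the simplest choice; a real pipeline at this scale would more likely use an approximate nearest-neighbour index to find candidate duplicates.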
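The rating-based ablations in the article diff above keep only turns whose rating clears a threshold, either on all four ratings at once or on one rating at a time. A minimal sketch, assuming each turn carries integer ratings from 1 to 5 under hypothetical keys named after the four filters in the diff (formatting, relevance, visual dependency, image correspondence). A threshold of 1 keeps every turn, which corresponds to the unfiltered (>= 1) setting mentioned in the text.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical rating keys, named after the four filters in the article.
RATING_KEYS = ("formatting", "relevance", "visual_dependency", "image_correspondence")


def filter_turns(
    turns: Iterable[Dict[str, int]],
    threshold: int,
    key: Optional[str] = None,
) -> List[Dict[str, int]]:
    """Keep turns whose rating(s) are >= threshold.

    key=None   -> all four ratings must pass (the "all ratings" ablation).
    key="name" -> only that single rating is checked.
    """
    keys = RATING_KEYS if key is None else (key,)
    return [t for t in turns if all(t[k] >= threshold for k in keys)]
```

Sweeping `threshold` from 1 to 5 with `key=None` and with each single key in turn reproduces the shape of the ablation grid described in the text.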
app/src/content/assets/audio/audio-example.wav
DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:552f71aef82738f9b5c9f1d6be495e0f83cec0eabf485066628badb3283cb4b8
-size 48830444
app/src/content/assets/data/against_baselines.csv
CHANGED
@@ -1,3 +1,961 @@
-
-
-
| 1 |
+
run,step,metric,value,stderr
|
| 2 |
+
FineVision,1000,ai2d_exact_match,0.2548575129533679,0.007843322436924496
|
| 3 |
+
FineVision,1000,average,0.27120689295763617,
|
| 4 |
+
FineVision,1000,average_rank,2.8,
|
| 5 |
+
FineVision,1000,chartqa_relaxed_overall,0.3308,0.009411906161401973
|
| 6 |
+
FineVision,1000,docvqa_val_anls,0.3528553494243383,0.005852289239342309
|
| 7 |
+
FineVision,1000,infovqa_val_anls,0.17320578642581314,0.006297063452679795
|
| 8 |
+
FineVision,1000,mme_total_score,977.4280712284914,
|
| 9 |
+
FineVision,1000,mmmu_val_mmmu_acc,0.25222,
|
| 10 |
+
FineVision,1000,mmstar_average,0.23215874078908072,
|
| 11 |
+
FineVision,1000,ocrbench_ocrbench_accuracy,0.286,
|
| 12 |
+
FineVision,1000,seedbench_seed_all,0.2563646470261256,
|
| 13 |
+
FineVision,1000,textvqa_val_exact_match,0.3024,0.00628900296642181
|
| 14 |
+
FineVision,2000,ai2d_exact_match,0.26295336787564766,0.007923526907377255
|
| 15 |
+
FineVision,2000,average,0.3202068275596269,
|
| 16 |
+
FineVision,2000,average_rank,2.6,
|
| 17 |
+
FineVision,2000,chartqa_relaxed_overall,0.4688,0.009982508912777261
|
| 18 |
+
FineVision,2000,docvqa_val_anls,0.4452261510942785,0.00614755494712251
|
| 19 |
+
FineVision,2000,infovqa_val_anls,0.1820547866557169,0.006217861455795791
|
| 20 |
+
FineVision,2000,mme_total_score,1049.3036214485794,
|
| 21 |
+
FineVision,2000,mmmu_val_mmmu_acc,0.24556,
|
| 22 |
+
FineVision,2000,mmstar_average,0.21305462434540698,
|
| 23 |
+
FineVision,2000,ocrbench_ocrbench_accuracy,0.395,
|
| 24 |
+
FineVision,2000,seedbench_seed_all,0.258532518065592,
|
| 25 |
+
FineVision,2000,textvqa_val_exact_match,0.41068000000000005,0.006697862330024289
|
| 26 |
+
FineVision,3000,ai2d_exact_match,0.25226683937823835,0.007816909588794397
|
| 27 |
+
FineVision,3000,average,0.3507423834414229,
|
| 28 |
+
FineVision,3000,average_rank,2.6,
|
| 29 |
+
FineVision,3000,chartqa_relaxed_overall,0.5028,0.010001843767601082
|
| 30 |
+
FineVision,3000,docvqa_val_anls,0.502653993831009,0.006267072346683124
|
| 31 |
+
FineVision,3000,infovqa_val_anls,0.21728617578189535,0.006796941784959762
|
| 32 |
+
FineVision,3000,mme_total_score,1170.2383953581434,
|
| 33 |
+
FineVision,3000,mmmu_val_mmmu_acc,0.27556,
|
| 34 |
+
FineVision,3000,mmstar_average,0.25432376938577683,
|
| 35 |
+
FineVision,3000,ocrbench_ocrbench_accuracy,0.436,
|
| 36 |
+
FineVision,3000,seedbench_seed_all,0.2792106725958866,
|
| 37 |
+
FineVision,3000,textvqa_val_exact_match,0.43658,0.006766885462882726
|
| 38 |
+
FineVision,4000,ai2d_exact_match,0.2645725388601036,0.007939149662089447
|
| 39 |
+
FineVision,4000,average,0.36961781722974835,
|
| 40 |
+
FineVision,4000,average_rank,2.7,
|
| 41 |
+
FineVision,4000,chartqa_relaxed_overall,0.5312,0.009982508912777261
|
| 42 |
+
FineVision,4000,docvqa_val_anls,0.5374434618615119,0.0062905728113059655
|
| 43 |
+
FineVision,4000,infovqa_val_anls,0.2287924838861707,0.006994568698639919
|
| 44 |
+
FineVision,4000,mme_total_score,1155.203781512605,
|
| 45 |
+
FineVision,4000,mmmu_val_mmmu_acc,0.25556,
|
| 46 |
+
FineVision,4000,mmstar_average,0.2575590188757354,
|
| 47 |
+
FineVision,4000,ocrbench_ocrbench_accuracy,0.453,
|
| 48 |
+
FineVision,4000,seedbench_seed_all,0.33913285158421347,
|
| 49 |
+
FineVision,4000,textvqa_val_exact_match,0.4593,0.006791695475025738
|
| 50 |
+
FineVision,5000,ai2d_exact_match,0.3125,0.008342439145556371
|
| 51 |
+
FineVision,5000,average,0.3974627910380972,
|
| 52 |
+
FineVision,5000,average_rank,2.6,
|
| 53 |
+
FineVision,5000,chartqa_relaxed_overall,0.5488,0.00995424828018316
|
| 54 |
+
FineVision,5000,docvqa_val_anls,0.552360266782429,0.006300308519952055
|
| 55 |
+
FineVision,5000,infovqa_val_anls,0.23425555286643698,0.007002254622066442
|
| 56 |
+
FineVision,5000,mme_total_score,1181.4653861544618,
|
| 57 |
+
FineVision,5000,mmmu_val_mmmu_acc,0.26667,
|
| 58 |
+
FineVision,5000,mmstar_average,0.29596648146165705,
|
| 59 |
+
FineVision,5000,ocrbench_ocrbench_accuracy,0.462,
|
| 60 |
+
FineVision,5000,seedbench_seed_all,0.43107281823235133,
|
| 61 |
+
FineVision,5000,textvqa_val_exact_match,0.47354000000000007,0.0068172185364497985
|
| 62 |
+
FineVision,6000,ai2d_exact_match,0.358160621761658,0.008629463221867162
|
| 63 |
+
FineVision,6000,average,0.4161227404571003,
|
| 64 |
+
FineVision,6000,average_rank,2.1,
|
| 65 |
+
FineVision,6000,chartqa_relaxed_overall,0.5628,0.00992279440175477
|
| 66 |
+
FineVision,6000,docvqa_val_anls,0.5747451497228876,0.00625495440870239
|
| 67 |
+
FineVision,6000,infovqa_val_anls,0.22152017368968838,0.006604546680525351
|
| 68 |
+
FineVision,6000,mme_total_score,1284.1648659463785,
|
| 69 |
+
FineVision,6000,mmmu_val_mmmu_acc,0.27111,
|
| 70 |
+
FineVision,6000,mmstar_average,0.2978489412854164,
|
| 71 |
+
FineVision,6000,ocrbench_ocrbench_accuracy,0.495,
|
| 72 |
+
FineVision,6000,seedbench_seed_all,0.4795997776542524,
|
| 73 |
+
FineVision,6000,textvqa_val_exact_match,0.48432,0.006800535050670284
|
| 74 |
+
FineVision,7000,ai2d_exact_match,0.3707901554404145,0.00869347755587734
|
| 75 |
+
FineVision,7000,average,0.4291083177345374,
|
| 76 |
+
FineVision,7000,average_rank,2.4,
|
| 77 |
+
FineVision,7000,chartqa_relaxed_overall,0.5656,0.009915542506251351
|
| 78 |
+
FineVision,7000,docvqa_val_anls,0.5940907049431567,0.006224236305767187
|
| 79 |
+
FineVision,7000,infovqa_val_anls,0.2515675215816963,0.007105097396092786
|
| 80 |
+
FineVision,7000,mme_total_score,1185.875650260104,
|
| 81 |
+
FineVision,7000,mmmu_val_mmmu_acc,0.26556,
|
| 82 |
+
FineVision,7000,mmstar_average,0.31372400960777047,
|
| 83 |
+
FineVision,7000,ocrbench_ocrbench_accuracy,0.504,
|
| 84 |
+
FineVision,7000,seedbench_seed_all,0.4964424680377988,
|
| 85 |
+
FineVision,7000,textvqa_val_exact_match,0.5002,0.006794794025220267
|
| 86 |
+
FineVision,8000,ai2d_exact_match,0.37759067357512954,0.008725299846043883
|
| 87 |
+
FineVision,8000,average,0.43846759477995995,
|
| 88 |
+
FineVision,8000,average_rank,2.2,
|
| 89 |
+
FineVision,8000,chartqa_relaxed_overall,0.5832,0.009862556058385773
|
| 90 |
+
FineVision,8000,docvqa_val_anls,0.6017336419437208,0.006231612198089698
|
| 91 |
+
FineVision,8000,infovqa_val_anls,0.2449256624147254,0.006992518502948913
|
| 92 |
+
FineVision,8000,mme_total_score,1199.2409963985594,
|
| 93 |
+
FineVision,8000,mmmu_val_mmmu_acc,0.28111,
|
| 94 |
+
FineVision,8000,mmstar_average,0.33512257186205047,
|
| 95 |
+
FineVision,8000,ocrbench_ocrbench_accuracy,0.51,
|
| 96 |
+
FineVision,8000,seedbench_seed_all,0.5024458032240133,
|
| 97 |
+
FineVision,8000,textvqa_val_exact_match,0.51008,0.006796301690135059
|
| 98 |
+
FineVision,9000,ai2d_exact_match,0.4067357512953368,0.008841214921078996
|
| 99 |
+
FineVision,9000,average,0.4422510732201056,
|
| 100 |
+
FineVision,9000,average_rank,2.0,
|
| 101 |
+
FineVision,9000,chartqa_relaxed_overall,0.5912,0.009834211136815875
|
| 102 |
+
FineVision,9000,docvqa_val_anls,0.6170968481662739,0.00617235763542544
|
| 103 |
+
FineVision,9000,infovqa_val_anls,0.23537031288570615,0.00670318154156447
|
| 104 |
+
FineVision,9000,mme_total_score,1231.5195078031213,
|
| 105 |
+
FineVision,9000,mmmu_val_mmmu_acc,0.25889,
|
| 106 |
+
FineVision,9000,mmstar_average,0.3216444898242951,
|
| 107 |
+
FineVision,9000,ocrbench_ocrbench_accuracy,0.515,
|
| 108 |
+
FineVision,9000,seedbench_seed_all,0.5120622568093385,
|
| 109 |
+
FineVision,9000,textvqa_val_exact_match,0.52226,0.006792711289708482
|
| 110 |
+
FineVision,10000,ai2d_exact_match,0.39993523316062174,0.008817096257082848
|
| 111 |
+
FineVision,10000,average,0.4523875703250908,
|
| 112 |
+
FineVision,10000,average_rank,1.7,
|
| 113 |
+
FineVision,10000,chartqa_relaxed_overall,0.5996,0.00980154906867574
|
| 114 |
+
FineVision,10000,docvqa_val_anls,0.6262613496433054,0.006147756371688175
|
| 115 |
+
FineVision,10000,infovqa_val_anls,0.263290074230132,0.007186788766942786
|
| 116 |
+
FineVision,10000,mme_total_score,1240.8218287314926,
|
| 117 |
+
FineVision,10000,mmmu_val_mmmu_acc,0.28778,
|
| 118 |
+
FineVision,10000,mmstar_average,0.32972717906018517,
|
| 119 |
+
FineVision,10000,ocrbench_ocrbench_accuracy,0.517,
|
| 120 |
+
FineVision,10000,seedbench_seed_all,0.5217342968315731,
|
| 121 |
+
FineVision,10000,textvqa_val_exact_match,0.5261600000000001,0.006785774843600811
|
| 122 |
+
FineVision,11000,ai2d_exact_match,0.422279792746114,0.008889771831066474
|
| 123 |
+
FineVision,11000,average,0.4561398159525099,
|
| 124 |
+
FineVision,11000,average_rank,1.7,
|
| 125 |
+
FineVision,11000,chartqa_relaxed_overall,0.6104,0.009755142291143075
|
| 126 |
+
FineVision,11000,docvqa_val_anls,0.6373130149166712,0.006128022584995044
|
| 127 |
+
FineVision,11000,infovqa_val_anls,0.24419378339723755,0.006897644885887063
|
| 128 |
+
FineVision,11000,mme_total_score,1322.9488795518205,
|
| 129 |
+
FineVision,11000,mmmu_val_mmmu_acc,0.27778,
|
| 130 |
+
FineVision,11000,mmstar_average,0.3298563439522548,
|
| 131 |
+
FineVision,11000,ocrbench_ocrbench_accuracy,0.521,
|
| 132 |
+
FineVision,11000,seedbench_seed_all,0.5237354085603113,
|
| 133 |
+
FineVision,11000,textvqa_val_exact_match,0.5387,0.006770851562852138
|
| 134 |
+
FineVision,12000,ai2d_exact_match,0.42001295336787564,0.008883255931688034
|
| 135 |
+
FineVision,12000,average,0.4582751140055433,
|
| 136 |
+
FineVision,12000,average_rank,1.6,
|
| 137 |
+
FineVision,12000,chartqa_relaxed_overall,0.618,0.009719474639861454
|
| 138 |
+
FineVision,12000,docvqa_val_anls,0.6393961983751871,0.0061228747388476674
|
| 139 |
+
FineVision,12000,infovqa_val_anls,0.24798874058574302,0.006855374548993139
|
| 140 |
+
FineVision,12000,mme_total_score,1225.6453581432572,
|
| 141 |
+
FineVision,12000,mmmu_val_mmmu_acc,0.27889,
|
| 142 |
+
FineVision,12000,mmstar_average,0.34010867846816534,
|
| 143 |
+
FineVision,12000,ocrbench_ocrbench_accuracy,0.512,
FineVision,12000,seedbench_seed_all,0.5350194552529183,
FineVision,12000,textvqa_val_exact_match,0.5330600000000001,0.006777713092109446
FineVision,13000,ai2d_exact_match,0.4375,0.008928571428571428
FineVision,13000,average,0.4692868662590049,
FineVision,13000,average_rank,1.5,
FineVision,13000,chartqa_relaxed_overall,0.6148,0.00973479791861169
FineVision,13000,docvqa_val_anls,0.6511374872549951,0.006086953065248391
FineVision,13000,infovqa_val_anls,0.24465055100441893,0.006808432538374664
FineVision,13000,mme_total_score,1281.7122849139657,
FineVision,13000,mmmu_val_mmmu_acc,0.28222,
FineVision,13000,mmstar_average,0.3453069542917521,
FineVision,13000,ocrbench_ocrbench_accuracy,0.549,
FineVision,13000,seedbench_seed_all,0.5442468037798777,
FineVision,13000,textvqa_val_exact_match,0.55472,0.0067416788982325
FineVision,14000,ai2d_exact_match,0.4572538860103627,0.00896620675297095
FineVision,14000,average,0.47352486841689195,
FineVision,14000,average_rank,1.4,
FineVision,14000,chartqa_relaxed_overall,0.6172,0.009723347231923635
FineVision,14000,docvqa_val_anls,0.6502269393708169,0.006057950730638126
FineVision,14000,infovqa_val_anls,0.25805460837190913,0.007037735231659539
FineVision,14000,mme_total_score,1309.1444577831132,
FineVision,14000,mmmu_val_mmmu_acc,0.28111,
FineVision,14000,mmstar_average,0.34575818188776586,
FineVision,14000,ocrbench_ocrbench_accuracy,0.551,
FineVision,14000,seedbench_seed_all,0.5483602001111729,
FineVision,14000,textvqa_val_exact_match,0.55276,0.006751206724612103
FineVision,15000,ai2d_exact_match,0.45045336787564766,0.008954861634252399
FineVision,15000,average,0.47878665012878824,
FineVision,15000,average_rank,1.3,
FineVision,15000,chartqa_relaxed_overall,0.612,0.009747841205275417
FineVision,15000,docvqa_val_anls,0.6621413031955148,0.006056838050222495
FineVision,15000,infovqa_val_anls,0.2706898598157733,0.007200315730154543
FineVision,15000,mme_total_score,1384.2171868747498,
FineVision,15000,mmmu_val_mmmu_acc,0.30222,
FineVision,15000,mmstar_average,0.35408135695920684,
FineVision,15000,ocrbench_ocrbench_accuracy,0.558,
FineVision,15000,seedbench_seed_all,0.5411339633129516,
FineVision,15000,textvqa_val_exact_match,0.5583600000000001,0.0067279027203879065
FineVision,16000,ai2d_exact_match,0.45077720207253885,0.008955440137395838
FineVision,16000,average,0.47665128022935843,
FineVision,16000,average_rank,1.5,
FineVision,16000,chartqa_relaxed_overall,0.632,0.00964715642305132
FineVision,16000,docvqa_val_anls,0.6709415729142987,0.005999818105621502
FineVision,16000,infovqa_val_anls,0.26050032542402035,0.006997451875879188
FineVision,16000,mme_total_score,1317.8491396558625,
FineVision,16000,mmmu_val_mmmu_acc,0.27556,
FineVision,16000,mmstar_average,0.33214333327093315,
FineVision,16000,ocrbench_ocrbench_accuracy,0.56,
FineVision,16000,seedbench_seed_all,0.5463590883824346,
FineVision,16000,textvqa_val_exact_match,0.56158,0.006723854754867398
FineVision,17000,ai2d_exact_match,0.45919689119170987,0.008969138793675545
FineVision,17000,average,0.4777141780162423,
FineVision,17000,average_rank,1.3,
FineVision,17000,chartqa_relaxed_overall,0.632,0.00964715642305132
FineVision,17000,docvqa_val_anls,0.6796338519136422,0.005948761388267941
FineVision,17000,infovqa_val_anls,0.28070956072505215,0.007298333094144192
FineVision,17000,mme_total_score,1381.9161664665867,
FineVision,17000,mmmu_val_mmmu_acc,0.27667,
FineVision,17000,mmstar_average,0.3370289492329521,
FineVision,17000,ocrbench_ocrbench_accuracy,0.519,
FineVision,17000,seedbench_seed_all,0.5510283490828238,
FineVision,17000,textvqa_val_exact_match,0.56416,0.006724830373229479
FineVision,18000,ai2d_exact_match,0.46567357512953367,0.008977921602780726
FineVision,18000,average,0.4819834595278701,
FineVision,18000,average_rank,1.2,
FineVision,18000,chartqa_relaxed_overall,0.6376,0.009615793331418735
FineVision,18000,docvqa_val_anls,0.6775884603912571,0.005972234236435759
FineVision,18000,infovqa_val_anls,0.27154318420389256,0.007164903131667027
FineVision,18000,mme_total_score,1336.922769107643,
FineVision,18000,mmmu_val_mmmu_acc,0.28667,
FineVision,18000,mmstar_average,0.34482796716566916,
FineVision,18000,ocrbench_ocrbench_accuracy,0.533,
FineVision,18000,seedbench_seed_all,0.5543079488604781,
FineVision,18000,textvqa_val_exact_match,0.5666399999999999,0.006713392287599574
FineVision,19000,ai2d_exact_match,0.4682642487046632,0.008981008686994101
FineVision,19000,average,0.4899006713916878,
FineVision,19000,average_rank,1.2,
FineVision,19000,chartqa_relaxed_overall,0.6444,0.009575809858898698
FineVision,19000,docvqa_val_anls,0.678226526479947,0.005970619221588814
FineVision,19000,infovqa_val_anls,0.26993847247278,0.0071348470764911525
FineVision,19000,mme_total_score,1406.6628651460583,
FineVision,19000,mmmu_val_mmmu_acc,0.28333,
FineVision,19000,mmstar_average,0.356220913822775,
FineVision,19000,ocrbench_ocrbench_accuracy,0.577,
FineVision,19000,seedbench_seed_all,0.554585881045025,
FineVision,19000,textvqa_val_exact_match,0.57714,0.0066918487914812905
FineVision,20000,ai2d_exact_match,0.47571243523316065,0.00898853090258662
FineVision,20000,average,0.4873169067639118,
FineVision,20000,average_rank,1.2,
FineVision,20000,chartqa_relaxed_overall,0.6336,0.009638338810708618
FineVision,20000,docvqa_val_anls,0.6895214454380043,0.005896462073053767
FineVision,20000,infovqa_val_anls,0.2655657550458317,0.007033265532032538
FineVision,20000,mme_total_score,1324.6738695478193,
FineVision,20000,mmmu_val_mmmu_acc,0.30111,
FineVision,20000,mmstar_average,0.33806766134497995,
FineVision,20000,ocrbench_ocrbench_accuracy,0.555,
FineVision,20000,seedbench_seed_all,0.5587548638132296,
FineVision,20000,textvqa_val_exact_match,0.56852,0.006720151338087659
Cauldron,1000,ai2d_exact_match,0.28886010362694303,0.008157423105367313
Cauldron,1000,average,0.29904301214549334,
Cauldron,1000,average_rank,1.9,
Cauldron,1000,chartqa_relaxed_overall,0.1936,0.007903961351247664
Cauldron,1000,docvqa_val_anls,0.32153744261519257,0.005317068996930092
Cauldron,1000,infovqa_val_anls,0.1431990055083018,0.005424936025458022
Cauldron,1000,mme_total_score,1172.0779311724689,
Cauldron,1000,mmmu_val_mmmu_acc,0.27667,
Cauldron,1000,mmstar_average,0.2911329978035828,
Cauldron,1000,ocrbench_ocrbench_accuracy,0.337,
Cauldron,1000,seedbench_seed_all,0.39360755975541967,
Cauldron,1000,textvqa_val_exact_match,0.44578,0.0067711747933144
Cauldron,2000,ai2d_exact_match,0.41871761658031087,0.008879446246519871
Cauldron,2000,average,0.34894207663644056,
Cauldron,2000,average_rank,1.9,
Cauldron,2000,chartqa_relaxed_overall,0.2056,0.00808440468059435
Cauldron,2000,docvqa_val_anls,0.37496112947656884,0.005489559822643159
Cauldron,2000,infovqa_val_anls,0.14667060624395192,0.005473110880489631
Cauldron,2000,mme_total_score,1248.6002400960383,
Cauldron,2000,mmmu_val_mmmu_acc,0.28667,
Cauldron,2000,mmstar_average,0.34478967650439835,
Cauldron,2000,ocrbench_ocrbench_accuracy,0.368,
Cauldron,2000,seedbench_seed_all,0.5013896609227348,
Cauldron,2000,textvqa_val_exact_match,0.49368,0.0068081481840761415
Cauldron,3000,ai2d_exact_match,0.4653497409326425,0.00897751861457722
Cauldron,3000,average,0.3647655686453986,
Cauldron,3000,average_rank,2.4,
Cauldron,3000,chartqa_relaxed_overall,0.2192,0.008275744025504309
Cauldron,3000,docvqa_val_anls,0.3999560247980121,0.005545460541574292
Cauldron,3000,infovqa_val_anls,0.15452276899525894,0.005625373377223539
Cauldron,3000,mme_total_score,1164.4316726690677,
Cauldron,3000,mmmu_val_mmmu_acc,0.27667,
Cauldron,3000,mmstar_average,0.34444117730168444,
Cauldron,3000,ocrbench_ocrbench_accuracy,0.403,
Cauldron,3000,seedbench_seed_all,0.5147304057809894,
Cauldron,3000,textvqa_val_exact_match,0.50502,0.006802809387533405
Cauldron,4000,ai2d_exact_match,0.48121761658031087,0.008992802471886854
Cauldron,4000,average,0.3694904966669109,
Cauldron,4000,average_rank,2.3,
Cauldron,4000,chartqa_relaxed_overall,0.2184,0.008264859294607735
Cauldron,4000,docvqa_val_anls,0.40927640030259055,0.005557758057811595
Cauldron,4000,infovqa_val_anls,0.15259984907145144,0.005629341537638722
Cauldron,4000,mme_total_score,1238.5236094437776,
Cauldron,4000,mmmu_val_mmmu_acc,0.26667,
Cauldron,4000,mmstar_average,0.36056167686607765,
Cauldron,4000,ocrbench_ocrbench_accuracy,0.414,
Cauldron,4000,seedbench_seed_all,0.5240689271817677,
Cauldron,4000,textvqa_val_exact_match,0.49862,0.006804563140709856
Cauldron,5000,ai2d_exact_match,0.48607512953367876,0.008995663534025174
Cauldron,5000,average,0.3715613183242104,
Cauldron,5000,average_rank,2.3,
Cauldron,5000,chartqa_relaxed_overall,0.2236,0.008334806752495259
Cauldron,5000,docvqa_val_anls,0.42332206291362884,0.005573327842684563
Cauldron,5000,infovqa_val_anls,0.15868297927477548,0.005670852175948406
Cauldron,5000,mme_total_score,1159.8522408963586,
Cauldron,5000,mmmu_val_mmmu_acc,0.26889,
Cauldron,5000,mmstar_average,0.360337335219157,
Cauldron,5000,ocrbench_ocrbench_accuracy,0.401,
Cauldron,5000,seedbench_seed_all,0.5198443579766537,
Cauldron,5000,textvqa_val_exact_match,0.5023,0.0068036313744923
Cauldron,6000,ai2d_exact_match,0.5025906735751295,0.008999033321198393
Cauldron,6000,average,0.3678206000506273,
Cauldron,6000,average_rank,2.2,
Cauldron,6000,chartqa_relaxed_overall,0.2228,0.008324168469720259
Cauldron,6000,docvqa_val_anls,0.4147154618557465,0.005557478918091434
Cauldron,6000,infovqa_val_anls,0.14825798330117057,0.005517775162348899
Cauldron,6000,mme_total_score,1182.059923969588,
Cauldron,6000,mmmu_val_mmmu_acc,0.27111,
Cauldron,6000,mmstar_average,0.3484854117958612,
Cauldron,6000,ocrbench_ocrbench_accuracy,0.391,
Cauldron,6000,seedbench_seed_all,0.5185658699277377,
Cauldron,6000,textvqa_val_exact_match,0.49285999999999996,0.0068052528515312825
Cauldron,7000,ai2d_exact_match,0.49838082901554404,0.008999106932714641
Cauldron,7000,average,0.3749288136256422,
Cauldron,7000,average_rank,2.0,
Cauldron,7000,chartqa_relaxed_overall,0.2276,0.00838733777631434
Cauldron,7000,docvqa_val_anls,0.42525461500166023,0.005595478547875609
Cauldron,7000,infovqa_val_anls,0.14305767989732765,0.005444282186253047
Cauldron,7000,mme_total_score,1262.065426170468,
Cauldron,7000,mmmu_val_mmmu_acc,0.29333,
Cauldron,7000,mmstar_average,0.35012603751558075,
Cauldron,7000,ocrbench_ocrbench_accuracy,0.403,
Cauldron,7000,seedbench_seed_all,0.5222901612006671,
Cauldron,7000,textvqa_val_exact_match,0.51132,0.00682164778449453
Cauldron,8000,ai2d_exact_match,0.49028497409326427,0.008997455247470544
Cauldron,8000,average,0.3674367285685282,
Cauldron,8000,average_rank,2.8,
Cauldron,8000,chartqa_relaxed_overall,0.2256,0.008361209238380008
Cauldron,8000,docvqa_val_anls,0.40937518311359955,0.005568234588180622
Cauldron,8000,infovqa_val_anls,0.14953110986986237,0.005518589617885333
Cauldron,8000,mme_total_score,1210.7711084433772,
Cauldron,8000,mmmu_val_mmmu_acc,0.28889,
Cauldron,8000,mmstar_average,0.32742675529850473,
Cauldron,8000,ocrbench_ocrbench_accuracy,0.406,
Cauldron,8000,seedbench_seed_all,0.512562534741523,
Cauldron,8000,textvqa_val_exact_match,0.49726000000000004,0.006823680165585169
Cauldron,9000,ai2d_exact_match,0.49287564766839376,0.008998240543632314
Cauldron,9000,average,0.3635862393983371,
Cauldron,9000,average_rank,3.0,
Cauldron,9000,chartqa_relaxed_overall,0.2264,0.008371693383064148
Cauldron,9000,docvqa_val_anls,0.4019142603693516,0.005557969721056488
Cauldron,9000,infovqa_val_anls,0.15576345355793061,0.005631711679425604
Cauldron,9000,mme_total_score,1161.06112444978,
Cauldron,9000,mmmu_val_mmmu_acc,0.27,
Cauldron,9000,mmstar_average,0.33510800699714055,
Cauldron,9000,ocrbench_ocrbench_accuracy,0.401,
Cauldron,9000,seedbench_seed_all,0.5066147859922179,
Cauldron,9000,textvqa_val_exact_match,0.4825999999999999,0.006824717089570126
Cauldron,10000,ai2d_exact_match,0.4951424870466321,0.008998729431386465
Cauldron,10000,average,0.3613896970671388,
Cauldron,10000,average_rank,3.2,
Cauldron,10000,chartqa_relaxed_overall,0.2276,0.00838733777631434
Cauldron,10000,docvqa_val_anls,0.400968382089468,0.005551850287661274
Cauldron,10000,infovqa_val_anls,0.15155496077062244,0.0055346119867504375
Cauldron,10000,mme_total_score,1230.2276910764306,
Cauldron,10000,mmmu_val_mmmu_acc,0.26,
Cauldron,10000,mmstar_average,0.32908517910608676,
Cauldron,10000,ocrbench_ocrbench_accuracy,0.395,
Cauldron,10000,seedbench_seed_all,0.4972762645914397,
Cauldron,10000,textvqa_val_exact_match,0.49588000000000004,0.006836984276038533
Cauldron,11000,ai2d_exact_match,0.49676165803108807,0.008998965371572357
Cauldron,11000,average,0.36198497174992383,
Cauldron,11000,average_rank,3.0,
Cauldron,11000,chartqa_relaxed_overall,0.2284,0.008397713059747491
Cauldron,11000,docvqa_val_anls,0.4051111426655002,0.0055740680205303966
Cauldron,11000,infovqa_val_anls,0.14954437197310022,0.005537262124650125
Cauldron,11000,mme_total_score,1210.5605242096838,
Cauldron,11000,mmmu_val_mmmu_acc,0.27111,
Cauldron,11000,mmstar_average,0.33316183100069335,
Cauldron,11000,ocrbench_ocrbench_accuracy,0.383,
Cauldron,11000,seedbench_seed_all,0.5043357420789327,
Cauldron,11000,textvqa_val_exact_match,0.48644,0.006834542228525236
Cauldron,12000,ai2d_exact_match,0.5009715025906736,0.008999137132137068
Cauldron,12000,average,0.3661893496614986,
Cauldron,12000,average_rank,3.2,
Cauldron,12000,chartqa_relaxed_overall,0.2332,0.008459061785476934
Cauldron,12000,docvqa_val_anls,0.40826612382074784,0.0055749766883040515
Cauldron,12000,infovqa_val_anls,0.1451043668322714,0.0054346014264420334
Cauldron,12000,mme_total_score,1204.859843937575,
Cauldron,12000,mmmu_val_mmmu_acc,0.29222,
Cauldron,12000,mmstar_average,0.3322773065724958,
Cauldron,12000,ocrbench_ocrbench_accuracy,0.386,
Cauldron,12000,seedbench_seed_all,0.5047248471372985,
Cauldron,12000,textvqa_val_exact_match,0.49294000000000004,0.006824466715369768
Cauldron,13000,ai2d_exact_match,0.4880181347150259,0.00899656981935399
Cauldron,13000,average,0.3609903418270159,
Cauldron,13000,average_rank,3.2,
Cauldron,13000,chartqa_relaxed_overall,0.23,0.008418334000200726
Cauldron,13000,docvqa_val_anls,0.39428463826041577,0.005550710740937849
Cauldron,13000,infovqa_val_anls,0.15077272156398794,0.005555043265840396
Cauldron,13000,mme_total_score,1199.0380152060825,
Cauldron,13000,mmmu_val_mmmu_acc,0.27667,
Cauldron,13000,mmstar_average,0.3323119954668039,
Cauldron,13000,ocrbench_ocrbench_accuracy,0.39,
Cauldron,13000,seedbench_seed_all,0.5000555864369094,
Cauldron,13000,textvqa_val_exact_match,0.4868,0.006822203492428118
Cauldron,14000,ai2d_exact_match,0.49060880829015546,0.00899756662777987
Cauldron,14000,average,0.36202481121184005,
Cauldron,14000,average_rank,2.9,
Cauldron,14000,chartqa_relaxed_overall,0.2264,0.008371693383064148
Cauldron,14000,docvqa_val_anls,0.40917044569115923,0.0055666808292464285
Cauldron,14000,infovqa_val_anls,0.1424839907142797,0.0054301311838352165
Cauldron,14000,mme_total_score,1183.6356542617045,
Cauldron,14000,mmmu_val_mmmu_acc,0.29,
Cauldron,14000,mmstar_average,0.31528335804531843,
Cauldron,14000,ocrbench_ocrbench_accuracy,0.393,
Cauldron,14000,seedbench_seed_all,0.5020566981656476,
Cauldron,14000,textvqa_val_exact_match,0.48922,0.006837726904596613
Cauldron,15000,ai2d_exact_match,0.4896373056994819,0.008997221155546275
Cauldron,15000,average,0.3560155869130515,
Cauldron,15000,average_rank,3.2,
Cauldron,15000,chartqa_relaxed_overall,0.2264,0.008371693383064148
Cauldron,15000,docvqa_val_anls,0.39997251595677663,0.0055655493795707745
Cauldron,15000,infovqa_val_anls,0.13834600428667498,0.005423970029609658
Cauldron,15000,mme_total_score,1171.8512404961984,
Cauldron,15000,mmmu_val_mmmu_acc,0.27667,
Cauldron,15000,mmstar_average,0.31369390041016126,
Cauldron,15000,ocrbench_ocrbench_accuracy,0.385,
Cauldron,15000,seedbench_seed_all,0.5010005558643691,
Cauldron,15000,textvqa_val_exact_match,0.47342,0.006818885551175648
Cauldron,16000,ai2d_exact_match,0.4838082901554404,0.008994434238637765
Cauldron,16000,average,0.3566345947908368,
Cauldron,16000,average_rank,3.4,
Cauldron,16000,chartqa_relaxed_overall,0.22,0.008286583553358689
Cauldron,16000,docvqa_val_anls,0.40446794741098796,0.005565712054024941
Cauldron,16000,infovqa_val_anls,0.1414810779340465,0.005414255001486301
Cauldron,16000,mme_total_score,1163.921468587435,
Cauldron,16000,mmmu_val_mmmu_acc,0.26444,
Cauldron,16000,mmstar_average,0.3211159497904861,
Cauldron,16000,ocrbench_ocrbench_accuracy,0.392,
Cauldron,16000,seedbench_seed_all,0.5045580878265703,
Cauldron,16000,textvqa_val_exact_match,0.47784,0.0068411071493878735
Cauldron,17000,ai2d_exact_match,0.4795984455958549,0.008991659681159872
Cauldron,17000,average,0.35664663136828295,
Cauldron,17000,average_rank,3.3,
Cauldron,17000,chartqa_relaxed_overall,0.2232,0.008329493152795851
Cauldron,17000,docvqa_val_anls,0.39683521379075226,0.0055483771434975925
Cauldron,17000,infovqa_val_anls,0.14519383287788715,0.005493162839439223
Cauldron,17000,mme_total_score,1216.2439975990396,
Cauldron,17000,mmmu_val_mmmu_acc,0.27667,
Cauldron,17000,mmstar_average,0.3294722845469949,
Cauldron,17000,ocrbench_ocrbench_accuracy,0.386,
Cauldron,17000,seedbench_seed_all,0.4938299055030573,
Cauldron,17000,textvqa_val_exact_match,0.47902,0.006822615153700749
Cauldron,18000,ai2d_exact_match,0.48575129533678757,0.008995499260034972
Cauldron,18000,average,0.3559572601168983,
Cauldron,18000,average_rank,3.3,
Cauldron,18000,chartqa_relaxed_overall,0.22,0.008286583553358689
Cauldron,18000,docvqa_val_anls,0.39553075414155453,0.005560094600545488
Cauldron,18000,infovqa_val_anls,0.1441200977793978,0.005482620397489444
Cauldron,18000,mme_total_score,1146.935774309724,
Cauldron,18000,mmmu_val_mmmu_acc,0.28333,
Cauldron,18000,mmstar_average,0.31718334943636844,
Cauldron,18000,ocrbench_ocrbench_accuracy,0.393,
Cauldron,18000,seedbench_seed_all,0.49571984435797667,
Cauldron,18000,textvqa_val_exact_match,0.46897999999999995,0.006834829544251984
Cauldron,19000,ai2d_exact_match,0.47506476683937826,0.00898795641911507
Cauldron,19000,average,0.35389113555756785,
Cauldron,19000,average_rank,3.4,
Cauldron,19000,chartqa_relaxed_overall,0.2196,0.008281169428700436
Cauldron,19000,docvqa_val_anls,0.3927677091095705,0.005557918115613283
Cauldron,19000,infovqa_val_anls,0.14242963523056748,0.005420426599891758
Cauldron,19000,mme_total_score,1156.7713085234095,
Cauldron,19000,mmmu_val_mmmu_acc,0.26667,
Cauldron,19000,mmstar_average,0.3300183589775604,
Cauldron,19000,ocrbench_ocrbench_accuracy,0.393,
Cauldron,19000,seedbench_seed_all,0.4895497498610339,
Cauldron,19000,textvqa_val_exact_match,0.47591999999999995,0.0068329619195279245
Cauldron,20000,ai2d_exact_match,0.48218911917098445,0.008993442748995703
Cauldron,20000,average,0.35315414152261965,
Cauldron,20000,average_rank,3.1,
Cauldron,20000,chartqa_relaxed_overall,0.2228,0.008324168469720259
Cauldron,20000,docvqa_val_anls,0.3995019956467228,0.005554102577571356
Cauldron,20000,infovqa_val_anls,0.13561089161386572,0.005312619238987202
Cauldron,20000,mme_total_score,1205.715886354542,
Cauldron,20000,mmmu_val_mmmu_acc,0.27667,
Cauldron,20000,mmstar_average,0.3019064734976851,
Cauldron,20000,ocrbench_ocrbench_accuracy,0.392,
Cauldron,20000,seedbench_seed_all,0.49182879377431904,
Cauldron,20000,textvqa_val_exact_match,0.4758799999999999,0.0068345144112400185
Cambrian,1000,ai2d_exact_match,0.2969559585492228,0.00822373246069825
Cambrian,1000,average,0.2927820669039429,
Cambrian,1000,average_rank,2.3,
Cambrian,1000,chartqa_relaxed_overall,0.3652,0.009631650506356148
Cambrian,1000,docvqa_val_anls,0.3321611875422322,0.005779917542014128
Cambrian,1000,infovqa_val_anls,0.14245417507906105,0.005737797137238206
Cambrian,1000,mme_total_score,1199.468087234894,
Cambrian,1000,mmmu_val_mmmu_acc,0.24556,
Cambrian,1000,mmstar_average,0.25503356223234036,
Cambrian,1000,ocrbench_ocrbench_accuracy,0.257,
Cambrian,1000,seedbench_seed_all,0.3486937187326292,
Cambrian,1000,textvqa_val_exact_match,0.39198,0.0066503820519040295
Cambrian,2000,ai2d_exact_match,0.36204663212435234,0.008649846657326264
Cambrian,2000,average,0.34977426052091565,
Cambrian,2000,average_rank,2.3,
Cambrian,2000,chartqa_relaxed_overall,0.4272,0.009895414680177737
Cambrian,2000,docvqa_val_anls,0.4044005302893221,0.006099745172446295
Cambrian,2000,infovqa_val_anls,0.16067123444748188,0.005906486800204124
Cambrian,2000,mme_total_score,1191.6502601040415,
Cambrian,2000,mmmu_val_mmmu_acc,0.27,
Cambrian,2000,mmstar_average,0.3140124492167455,
Cambrian,2000,ocrbench_ocrbench_accuracy,0.293,
Cambrian,2000,seedbench_seed_all,0.4954974986103391,
Cambrian,2000,textvqa_val_exact_match,0.42113999999999996,0.006720777771268006
Cambrian,3000,ai2d_exact_match,0.3954015544041451,0.008800034697838395
Cambrian,3000,average,0.36894910100121225,
Cambrian,3000,average_rank,1.9,
Cambrian,3000,chartqa_relaxed_overall,0.4512,0.00995424828018316
Cambrian,3000,docvqa_val_anls,0.4317442116227413,0.006203480507897517
Cambrian,3000,infovqa_val_anls,0.17555075927653038,0.006227695613801885
Cambrian,3000,mme_total_score,1311.187975190076,
Cambrian,3000,mmmu_val_mmmu_acc,0.28222,
Cambrian,3000,mmstar_average,0.3241666733128301,
Cambrian,3000,ocrbench_ocrbench_accuracy,0.289,
Cambrian,3000,seedbench_seed_all,0.5216787103946637,
Cambrian,3000,textvqa_val_exact_match,0.4495799999999999,0.006762330259763156
Cambrian,4000,ai2d_exact_match,0.3960492227979275,0.00880252039912977
Cambrian,4000,average,0.38270567946732525,
Cambrian,4000,average_rank,2.2,
Cambrian,4000,chartqa_relaxed_overall,0.4764,0.009990852959439592
Cambrian,4000,docvqa_val_anls,0.46350742276594625,0.006276498296530657
Cambrian,4000,infovqa_val_anls,0.17819320935276328,0.006230849386066924
Cambrian,4000,mme_total_score,1239.0667266906762,
Cambrian,4000,mmmu_val_mmmu_acc,0.26778,
Cambrian,4000,mmstar_average,0.3298927333298682,
Cambrian,4000,ocrbench_ocrbench_accuracy,0.334,
Cambrian,4000,seedbench_seed_all,0.5273485269594219,
Cambrian,4000,textvqa_val_exact_match,0.47118000000000004,0.0067854764061200295
Cambrian,5000,ai2d_exact_match,0.40382124352331605,0.00883109414387431
Cambrian,5000,average,0.3896927239658996,
Cambrian,5000,average_rank,2.2,
Cambrian,5000,chartqa_relaxed_overall,0.4912,0.01000045137036546
Cambrian,5000,docvqa_val_anls,0.47067674424138894,0.006257580396259991
Cambrian,5000,infovqa_val_anls,0.19432385292037085,0.00653326869729313
Cambrian,5000,mme_total_score,1214.843337334934,
Cambrian,5000,mmmu_val_mmmu_acc,0.26556,
Cambrian,5000,mmstar_average,0.3255942091936794,
Cambrian,5000,ocrbench_ocrbench_accuracy,0.348,
Cambrian,5000,seedbench_seed_all,0.5292384658143413,
Cambrian,5000,textvqa_val_exact_match,0.47881999999999997,0.0067962283116337965
Cambrian,6000,ai2d_exact_match,0.4183937823834197,0.00887848400426025
Cambrian,6000,average,0.39990121640985093,
Cambrian,6000,average_rank,2.4,
Cambrian,6000,chartqa_relaxed_overall,0.5048,0.010001539697392967
Cambrian,6000,docvqa_val_anls,0.5016482570925722,0.006248476976439708
Cambrian,6000,infovqa_val_anls,0.19206925076752404,0.006399951499514914
Cambrian,6000,mme_total_score,1176.5368147258905,
Cambrian,6000,mmmu_val_mmmu_acc,0.26667,
Cambrian,6000,mmstar_average,0.33910121942401966,
Cambrian,6000,ocrbench_ocrbench_accuracy,0.349,
Cambrian,6000,seedbench_seed_all,0.5391884380211228,
Cambrian,6000,textvqa_val_exact_match,0.48823999999999995,0.006792935247288521
Cambrian,7000,ai2d_exact_match,0.4326424870466321,0.008917121282993509
Cambrian,7000,average,0.40874111160527243,
Cambrian,7000,average_rank,2.2,
Cambrian,7000,chartqa_relaxed_overall,0.5088,0.01000045137036546
Cambrian,7000,docvqa_val_anls,0.5036441729071615,0.006331057466984081
Cambrian,7000,infovqa_val_anls,0.21047690542452482,0.0067248622097179815
Cambrian,7000,mme_total_score,1226.7814125650261,
Cambrian,7000,mmmu_val_mmmu_acc,0.29,
Cambrian,7000,mmstar_average,0.338458434622219,
Cambrian,7000,ocrbench_ocrbench_accuracy,0.366,
Cambrian,7000,seedbench_seed_all,0.5344080044469149,
Cambrian,7000,textvqa_val_exact_match,0.49423999999999996,0.006789004536492761
Cambrian,8000,ai2d_exact_match,0.4375,0.008928571428571428
Cambrian,8000,average,0.4145399236017655,
Cambrian,8000,average_rank,2.2,
Cambrian,8000,chartqa_relaxed_overall,0.5312,0.009982508912777261
Cambrian,8000,docvqa_val_anls,0.5139425879433994,0.006316907313170543
Cambrian,8000,infovqa_val_anls,0.20402472511542052,0.00665285157736885
Cambrian,8000,mme_total_score,1243.7800120048018,
Cambrian,8000,mmmu_val_mmmu_acc,0.28222,
Cambrian,8000,mmstar_average,0.3300028831814166,
Cambrian,8000,ocrbench_ocrbench_accuracy,0.397,
Cambrian,8000,seedbench_seed_all,0.5364091161756531,
Cambrian,8000,textvqa_val_exact_match,0.49855999999999995,0.006793174127235705
Cambrian,9000,ai2d_exact_match,0.4251943005181347,0.008897867521411106
Cambrian,9000,average,0.41587431550154147,
Cambrian,9000,average_rank,2.0,
Cambrian,9000,chartqa_relaxed_overall,0.5316,0.009982005418395102
Cambrian,9000,docvqa_val_anls,0.524278096798472,0.006327817979288962
Cambrian,9000,infovqa_val_anls,0.2075069347958689,0.006574086714467312
Cambrian,9000,mme_total_score,1196.0997398959585,
Cambrian,9000,mmmu_val_mmmu_acc,0.28556,
Cambrian,9000,mmstar_average,0.33833745626187595,
Cambrian,9000,ocrbench_ocrbench_accuracy,0.381,
Cambrian,9000,seedbench_seed_all,0.5456920511395219,
Cambrian,9000,textvqa_val_exact_match,0.5036999999999999,0.006790970877355565
Cambrian,10000,ai2d_exact_match,0.44559585492227977,0.008945723914357835
Cambrian,10000,average,0.41659534392300923,
Cambrian,10000,average_rank,2.0,
Cambrian,10000,chartqa_relaxed_overall,0.5416,0.00996732235888869
Cambrian,10000,docvqa_val_anls,0.5215772912722147,0.006314944464077694
Cambrian,10000,infovqa_val_anls,0.18925972424188112,0.006302599390246784
Cambrian,10000,mme_total_score,1241.6579631852742,
Cambrian,10000,mmmu_val_mmmu_acc,0.27889,
Cambrian,10000,mmstar_average,0.34495128935097424,
Cambrian,10000,ocrbench_ocrbench_accuracy,0.373,
Cambrian,10000,seedbench_seed_all,0.5510839355197332,
Cambrian,10000,textvqa_val_exact_match,0.5034000000000001,0.0067932111363852585
Cambrian,11000,ai2d_exact_match,0.4481865284974093,0.008950704796242765
Cambrian,11000,average,0.42096531591252645,
Cambrian,11000,average_rank,2.0,
Cambrian,11000,chartqa_relaxed_overall,0.5388,0.0099718403035556
Cambrian,11000,docvqa_val_anls,0.5266496382012209,0.006315639724937912
Cambrian,11000,infovqa_val_anls,0.210453542763111,0.006757501751011823
Cambrian,11000,mme_total_score,1288.1182472989194,
Cambrian,11000,mmmu_val_mmmu_acc,0.28556,
Cambrian,11000,mmstar_average,0.33813173019346515,
Cambrian,11000,ocrbench_ocrbench_accuracy,0.372,
|
| 612 |
+
Cambrian,11000,seedbench_seed_all,0.547526403557532,
|
| 613 |
+
Cambrian,11000,textvqa_val_exact_match,0.5213800000000001,0.00677771101429669
|
| 614 |
+
Cambrian,12000,ai2d_exact_match,0.4566062176165803,0.008965198879336198
|
| 615 |
+
Cambrian,12000,average,0.42647137409223257,
|
| 616 |
+
Cambrian,12000,average_rank,2.1,
|
| 617 |
+
Cambrian,12000,chartqa_relaxed_overall,0.5488,0.00995424828018316
|
| 618 |
+
Cambrian,12000,docvqa_val_anls,0.5432685128640529,0.006286968775744768
|
| 619 |
+
Cambrian,12000,infovqa_val_anls,0.214068867667478,0.006728697021311144
|
| 620 |
+
Cambrian,12000,mme_total_score,1272.0885354141656,
|
| 621 |
+
Cambrian,12000,mmmu_val_mmmu_acc,0.27556,
|
| 622 |
+
Cambrian,12000,mmstar_average,0.3364706975313428,
|
| 623 |
+
Cambrian,12000,ocrbench_ocrbench_accuracy,0.396,
|
| 624 |
+
Cambrian,12000,seedbench_seed_all,0.5505280711506393,
|
| 625 |
+
Cambrian,12000,textvqa_val_exact_match,0.51694,0.00676817323313926
|
| 626 |
+
Cambrian,13000,ai2d_exact_match,0.44591968911917096,0.008946359966425538
|
| 627 |
+
Cambrian,13000,average,0.42595033048849396,
|
| 628 |
+
Cambrian,13000,average_rank,2.1,
|
| 629 |
+
Cambrian,13000,chartqa_relaxed_overall,0.5484,0.009955029736109216
|
| 630 |
+
Cambrian,13000,docvqa_val_anls,0.5438384263330651,0.006322105329987294
|
| 631 |
+
Cambrian,13000,infovqa_val_anls,0.2206834922799479,0.006931006985711701
|
| 632 |
+
Cambrian,13000,mme_total_score,1294.3567426970787,
|
| 633 |
+
Cambrian,13000,mmmu_val_mmmu_acc,0.27889,
|
| 634 |
+
Cambrian,13000,mmstar_average,0.3258043460972802,
|
| 635 |
+
Cambrian,13000,ocrbench_ocrbench_accuracy,0.404,
|
| 636 |
+
Cambrian,13000,seedbench_seed_all,0.5466370205669816,
|
| 637 |
+
Cambrian,13000,textvqa_val_exact_match,0.5193800000000001,0.006779976160381913
|
| 638 |
+
Cambrian,14000,ai2d_exact_match,0.452720207253886,0.00895883074213608
|
| 639 |
+
Cambrian,14000,average,0.4290628718702856,
|
| 640 |
+
Cambrian,14000,average_rank,2.2,
|
| 641 |
+
Cambrian,14000,chartqa_relaxed_overall,0.5624,0.009923804147377265
|
| 642 |
+
Cambrian,14000,docvqa_val_anls,0.5501582985035621,0.006289139790552158
|
| 643 |
+
Cambrian,14000,infovqa_val_anls,0.2108586833777777,0.006694603397438603
|
| 644 |
+
Cambrian,14000,mme_total_score,1258.3851540616247,
|
| 645 |
+
Cambrian,14000,mmmu_val_mmmu_acc,0.28444,
|
| 646 |
+
Cambrian,14000,mmstar_average,0.3392338272359765,
|
| 647 |
+
Cambrian,14000,ocrbench_ocrbench_accuracy,0.391,
|
| 648 |
+
Cambrian,14000,seedbench_seed_all,0.5506948304613675,
|
| 649 |
+
Cambrian,14000,textvqa_val_exact_match,0.5200600000000001,0.006762031077483937
|
| 650 |
+
Cambrian,15000,ai2d_exact_match,0.4575777202072539,0.008966704964444827
|
| 651 |
+
Cambrian,15000,average,0.4277300448618869,
|
| 652 |
+
Cambrian,15000,average_rank,2.2,
|
| 653 |
+
Cambrian,15000,chartqa_relaxed_overall,0.5572,0.009936335154498413
|
| 654 |
+
Cambrian,15000,docvqa_val_anls,0.550106577844955,0.006305789516584643
|
| 655 |
+
Cambrian,15000,infovqa_val_anls,0.2065365477570411,0.006585265308234506
|
| 656 |
+
Cambrian,15000,mme_total_score,1191.499399759904,
|
| 657 |
+
Cambrian,15000,mmmu_val_mmmu_acc,0.27667,
|
| 658 |
+
Cambrian,15000,mmstar_average,0.3287834934674655,
|
| 659 |
+
Cambrian,15000,ocrbench_ocrbench_accuracy,0.403,
|
| 660 |
+
Cambrian,15000,seedbench_seed_all,0.5489160644802669,
|
| 661 |
+
Cambrian,15000,textvqa_val_exact_match,0.52078,0.006761241098810132
|
| 662 |
+
Cambrian,16000,ai2d_exact_match,0.45174870466321243,0.008957152666985158
|
| 663 |
+
Cambrian,16000,average,0.4283932783055524,
|
| 664 |
+
Cambrian,16000,average_rank,2.0,
|
| 665 |
+
Cambrian,16000,chartqa_relaxed_overall,0.566,0.00991448025705367
|
| 666 |
+
Cambrian,16000,docvqa_val_anls,0.5507111549470696,0.006298722691255348
|
| 667 |
+
Cambrian,16000,infovqa_val_anls,0.21185403234992514,0.0065982885956266755
|
| 668 |
+
Cambrian,16000,mme_total_score,1242.7407963185274,
|
| 669 |
+
Cambrian,16000,mmmu_val_mmmu_acc,0.28111,
|
| 670 |
+
Cambrian,16000,mmstar_average,0.32560559611383355,
|
| 671 |
+
Cambrian,16000,ocrbench_ocrbench_accuracy,0.394,
|
| 672 |
+
Cambrian,16000,seedbench_seed_all,0.5540300166759311,
|
| 673 |
+
Cambrian,16000,textvqa_val_exact_match,0.5204799999999999,0.006783488561456611
|
| 674 |
+
Cambrian,17000,ai2d_exact_match,0.4585492227979275,0.008968176705111413
|
| 675 |
+
Cambrian,17000,average,0.43044446070382536,
|
| 676 |
+
Cambrian,17000,average_rank,2.4,
|
| 677 |
+
Cambrian,17000,chartqa_relaxed_overall,0.5656,0.009915542506251351
|
| 678 |
+
Cambrian,17000,docvqa_val_anls,0.5528747665552118,0.006300095973166064
|
| 679 |
+
Cambrian,17000,infovqa_val_anls,0.20960594545383252,0.0066643358201217045
|
| 680 |
+
Cambrian,17000,mme_total_score,1292.4750900360143,
|
| 681 |
+
Cambrian,17000,mmmu_val_mmmu_acc,0.27111,
|
| 682 |
+
Cambrian,17000,mmstar_average,0.3297184661133375,
|
| 683 |
+
Cambrian,17000,ocrbench_ocrbench_accuracy,0.409,
|
| 684 |
+
Cambrian,17000,seedbench_seed_all,0.555141745414119,
|
| 685 |
+
Cambrian,17000,textvqa_val_exact_match,0.5224,0.006774129151791618
|
| 686 |
+
Cambrian,18000,ai2d_exact_match,0.4523963730569948,0.008958275210820045
|
| 687 |
+
Cambrian,18000,average,0.43086034100304976,
|
| 688 |
+
Cambrian,18000,average_rank,2.4,
|
| 689 |
+
Cambrian,18000,chartqa_relaxed_overall,0.566,0.00991448025705367
|
| 690 |
+
Cambrian,18000,docvqa_val_anls,0.5527950768923724,0.006311862091164367
|
| 691 |
+
Cambrian,18000,infovqa_val_anls,0.21943552260393814,0.006848865968629337
|
| 692 |
+
Cambrian,18000,mme_total_score,1271.4629851940776,
|
| 693 |
+
Cambrian,18000,mmmu_val_mmmu_acc,0.28333,
|
| 694 |
+
Cambrian,18000,mmstar_average,0.3399009269355101,
|
| 695 |
+
Cambrian,18000,ocrbench_ocrbench_accuracy,0.403,
|
| 696 |
+
Cambrian,18000,seedbench_seed_all,0.5493051695386326,
|
| 697 |
+
Cambrian,18000,textvqa_val_exact_match,0.5115799999999999,0.0067870754820260944
|
| 698 |
+
Cambrian,19000,ai2d_exact_match,0.45012953367875647,0.008954279299902583
|
| 699 |
+
Cambrian,19000,average,0.43057935657557483,
|
| 700 |
+
Cambrian,19000,average_rank,2.2,
|
| 701 |
+
Cambrian,19000,chartqa_relaxed_overall,0.5704,0.009902361269085337
|
| 702 |
+
Cambrian,19000,docvqa_val_anls,0.5526262050544066,0.006310038331338026
|
| 703 |
+
Cambrian,19000,infovqa_val_anls,0.21937034023427093,0.006858602078113178
|
| 704 |
+
Cambrian,19000,mme_total_score,1269.9476790716285,
|
| 705 |
+
Cambrian,19000,mmmu_val_mmmu_acc,0.28556,
|
| 706 |
+
Cambrian,19000,mmstar_average,0.3314266960826673,
|
| 707 |
+
Cambrian,19000,ocrbench_ocrbench_accuracy,0.404,
|
| 708 |
+
Cambrian,19000,seedbench_seed_all,0.5465814341300722,
|
| 709 |
+
Cambrian,19000,textvqa_val_exact_match,0.51512,0.006773909823053313
|
| 710 |
+
Cambrian,20000,ai2d_exact_match,0.45531088082901555,0.008963137311190377
|
| 711 |
+
Cambrian,20000,average,0.42817340693945505,
|
| 712 |
+
Cambrian,20000,average_rank,2.4,
|
| 713 |
+
Cambrian,20000,chartqa_relaxed_overall,0.5684,0.009907968668564455
|
| 714 |
+
Cambrian,20000,docvqa_val_anls,0.549188563518089,0.006325944032596611
|
| 715 |
+
Cambrian,20000,infovqa_val_anls,0.21755406764942647,0.0068363256354831885
|
| 716 |
+
Cambrian,20000,mme_total_score,1290.6296518607442,
|
| 717 |
+
Cambrian,20000,mmmu_val_mmmu_acc,0.28444,
|
| 718 |
+
Cambrian,20000,mmstar_average,0.32485343172593534,
|
| 719 |
+
Cambrian,20000,ocrbench_ocrbench_accuracy,0.392,
|
| 720 |
+
Cambrian,20000,seedbench_seed_all,0.5486937187326293,
|
| 721 |
+
Cambrian,20000,textvqa_val_exact_match,0.51312,0.006789609184524225
|
| 722 |
+
LLaVa,1000,ai2d_exact_match,0.25777202072538863,0.007872600874396432
|
| 723 |
+
LLaVa,1000,average,0.2581360512843851,
|
| 724 |
+
LLaVa,1000,average_rank,3.0,
|
| 725 |
+
LLaVa,1000,chartqa_relaxed_overall,0.1576,0.007288768514542319
|
| 726 |
+
LLaVa,1000,docvqa_val_anls,0.2850280465017524,0.005237571860745478
|
| 727 |
+
LLaVa,1000,infovqa_val_anls,0.15291302898150733,0.005597827181699182
|
| 728 |
+
LLaVa,1000,mme_total_score,844.0894357743098,
|
| 729 |
+
LLaVa,1000,mmmu_val_mmmu_acc,0.25333,
|
| 730 |
+
LLaVa,1000,mmstar_average,0.22969486173769915,
|
| 731 |
+
LLaVa,1000,ocrbench_ocrbench_accuracy,0.35,
|
| 732 |
+
LLaVa,1000,seedbench_seed_all,0.2717065036131184,
|
| 733 |
+
LLaVa,1000,textvqa_val_exact_match,0.36518,0.006561838543046682
|
| 734 |
+
LLaVa,2000,ai2d_exact_match,0.24676165803108807,0.007759553547248649
|
| 735 |
+
LLaVa,2000,average,0.28023175511348764,
|
| 736 |
+
LLaVa,2000,average_rank,3.2,
|
| 737 |
+
LLaVa,2000,chartqa_relaxed_overall,0.19,0.007847587772910948
|
| 738 |
+
LLaVa,2000,docvqa_val_anls,0.31839133336930814,0.005353711170722305
|
| 739 |
+
LLaVa,2000,infovqa_val_anls,0.1625232406439703,0.005680709103352321
|
| 740 |
+
LLaVa,2000,mme_total_score,677.0834333733493,
|
| 741 |
+
LLaVa,2000,mmmu_val_mmmu_acc,0.25111,
|
| 742 |
+
LLaVa,2000,mmstar_average,0.2602226545829147,
|
| 743 |
+
LLaVa,2000,ocrbench_ocrbench_accuracy,0.389,
|
| 744 |
+
LLaVa,2000,seedbench_seed_all,0.2864369093941078,
|
| 745 |
+
LLaVa,2000,textvqa_val_exact_match,0.41764000000000007,0.006695635323587844
|
| 746 |
+
LLaVa,3000,ai2d_exact_match,0.31541450777202074,0.00836346730591157
|
| 747 |
+
LLaVa,3000,average,0.3241247472461608,
|
| 748 |
+
LLaVa,3000,average_rank,3.1,
|
| 749 |
+
LLaVa,3000,chartqa_relaxed_overall,0.2048,0.008072722684486087
|
| 750 |
+
LLaVa,3000,docvqa_val_anls,0.33927313841893186,0.005424261898744584
|
| 751 |
+
LLaVa,3000,infovqa_val_anls,0.17400826017663457,0.005878416771815313
|
| 752 |
+
LLaVa,3000,mme_total_score,674.5895358143258,
|
| 753 |
+
LLaVa,3000,mmmu_val_mmmu_acc,0.27778,
|
| 754 |
+
LLaVa,3000,mmstar_average,0.28839612401739867,
|
| 755 |
+
LLaVa,3000,ocrbench_ocrbench_accuracy,0.428,
|
| 756 |
+
LLaVa,3000,seedbench_seed_all,0.4512506948304614,
|
| 757 |
+
LLaVa,3000,textvqa_val_exact_match,0.4382,0.006743326070219196
|
| 758 |
+
LLaVa,4000,ai2d_exact_match,0.30667098445595853,0.008299228398743067
|
| 759 |
+
LLaVa,4000,average,0.34151562451124173,
|
| 760 |
+
LLaVa,4000,average_rank,2.8,
|
| 761 |
+
LLaVa,4000,chartqa_relaxed_overall,0.2168,0.00824295350666284
|
| 762 |
+
LLaVa,4000,docvqa_val_anls,0.36894439928615425,0.005583877165382837
|
| 763 |
+
LLaVa,4000,infovqa_val_anls,0.1815741433661475,0.005975096001960774
|
| 764 |
+
LLaVa,4000,mme_total_score,660.3387354941976,
|
| 765 |
+
LLaVa,4000,mmmu_val_mmmu_acc,0.29444,
|
| 766 |
+
LLaVa,4000,mmstar_average,0.3089940618086463,
|
| 767 |
+
LLaVa,4000,ocrbench_ocrbench_accuracy,0.439,
|
| 768 |
+
LLaVa,4000,seedbench_seed_all,0.48265703168426904,
|
| 769 |
+
LLaVa,4000,textvqa_val_exact_match,0.4745599999999999,0.006778004835488831
|
| 770 |
+
LLaVa,5000,ai2d_exact_match,0.3176813471502591,0.00837955903737489
|
| 771 |
+
LLaVa,5000,average,0.3488971740226244,
|
| 772 |
+
LLaVa,5000,average_rank,2.9,
|
| 773 |
+
LLaVa,5000,chartqa_relaxed_overall,0.2076,0.008113397986710395
|
| 774 |
+
LLaVa,5000,docvqa_val_anls,0.37667351380566144,0.005504553709162657
|
| 775 |
+
LLaVa,5000,infovqa_val_anls,0.19157302816202296,0.006066754825254386
|
| 776 |
+
LLaVa,5000,mme_total_score,596.045218087235,
|
| 777 |
+
LLaVa,5000,mmmu_val_mmmu_acc,0.28889,
|
| 778 |
+
LLaVa,5000,mmstar_average,0.30911460927022283,
|
| 779 |
+
LLaVa,5000,ocrbench_ocrbench_accuracy,0.471,
|
| 780 |
+
LLaVa,5000,seedbench_seed_all,0.49972206781545303,
|
| 781 |
+
LLaVa,5000,textvqa_val_exact_match,0.47781999999999997,0.00678922884027701
|
| 782 |
+
LLaVa,6000,ai2d_exact_match,0.3626943005181347,0.00865318426683941
|
| 783 |
+
LLaVa,6000,average,0.35336013036474917,
|
| 784 |
+
LLaVa,6000,average_rank,3.3,
|
| 785 |
+
LLaVa,6000,chartqa_relaxed_overall,0.2164,0.00823744852629073
|
| 786 |
+
LLaVa,6000,docvqa_val_anls,0.3796381971300078,0.005512363416378596
|
| 787 |
+
LLaVa,6000,infovqa_val_anls,0.1911083172357537,0.00606756561226675
|
| 788 |
+
LLaVa,6000,mme_total_score,751.7179871948779,
|
| 789 |
+
LLaVa,6000,mmmu_val_mmmu_acc,0.27111,
|
| 790 |
+
LLaVa,6000,mmstar_average,0.3230226430014031,
|
| 791 |
+
LLaVa,6000,ocrbench_ocrbench_accuracy,0.471,
|
| 792 |
+
LLaVa,6000,seedbench_seed_all,0.49788771539744303,
|
| 793 |
+
LLaVa,6000,textvqa_val_exact_match,0.46738,0.006777431212101451
|
| 794 |
+
LLaVa,7000,ai2d_exact_match,0.3636658031088083,0.008658158841882565
|
| 795 |
+
LLaVa,7000,average,0.36232264653787655,
|
| 796 |
+
LLaVa,7000,average_rank,3.4,
|
| 797 |
+
LLaVa,7000,chartqa_relaxed_overall,0.2276,0.00838733777631434
|
| 798 |
+
LLaVa,7000,docvqa_val_anls,0.38862032747814834,0.005554025202613156
|
| 799 |
+
LLaVa,7000,infovqa_val_anls,0.1987523491607365,0.006169459873730798
|
| 800 |
+
LLaVa,7000,mme_total_score,700.0341136454582,
|
| 801 |
+
LLaVa,7000,mmmu_val_mmmu_acc,0.28,
|
| 802 |
+
LLaVa,7000,mmstar_average,0.32238002502982693,
|
| 803 |
+
LLaVa,7000,ocrbench_ocrbench_accuracy,0.469,
|
| 804 |
+
LLaVa,7000,seedbench_seed_all,0.5175653140633686,
|
| 805 |
+
LLaVa,7000,textvqa_val_exact_match,0.49332,0.006784414578741135
|
| 806 |
+
LLaVa,8000,ai2d_exact_match,0.38244818652849744,0.008746910624026853
|
| 807 |
+
LLaVa,8000,average,0.36916094621046264,
|
| 808 |
+
LLaVa,8000,average_rank,2.8,
|
| 809 |
+
LLaVa,8000,chartqa_relaxed_overall,0.2276,0.00838733777631434
|
| 810 |
+
LLaVa,8000,docvqa_val_anls,0.4000384036155175,0.005647492303754258
|
| 811 |
+
LLaVa,8000,infovqa_val_anls,0.20267340215584623,0.006186451136703468
|
| 812 |
+
LLaVa,8000,mme_total_score,787.0998399359744,
|
| 813 |
+
LLaVa,8000,mmmu_val_mmmu_acc,0.28333,
|
| 814 |
+
LLaVa,8000,mmstar_average,0.33877512170436386,
|
| 815 |
+
LLaVa,8000,ocrbench_ocrbench_accuracy,0.47,
|
| 816 |
+
LLaVa,8000,seedbench_seed_all,0.5221234018899389,
|
| 817 |
+
LLaVa,8000,textvqa_val_exact_match,0.49546,0.006796875545678079
|
| 818 |
+
LLaVa,9000,ai2d_exact_match,0.3856865284974093,0.008760803506529557
|
| 819 |
+
LLaVa,9000,average,0.3660729124456708,
|
| 820 |
+
LLaVa,9000,average_rank,3.0,
|
| 821 |
+
LLaVa,9000,chartqa_relaxed_overall,0.2212,0.00830275847651416
|
| 822 |
+
LLaVa,9000,docvqa_val_anls,0.3961556104365206,0.005555787005997977
|
| 823 |
+
LLaVa,9000,infovqa_val_anls,0.20795411138332273,0.006302696156883479
|
| 824 |
+
LLaVa,9000,mme_total_score,697.6510604241697,
|
| 825 |
+
LLaVa,9000,mmmu_val_mmmu_acc,0.27444,
|
| 826 |
+
LLaVa,9000,mmstar_average,0.33019217959261743,
|
| 827 |
+
LLaVa,9000,ocrbench_ocrbench_accuracy,0.47,
|
| 828 |
+
LLaVa,9000,seedbench_seed_all,0.5140077821011673,
|
| 829 |
+
LLaVa,9000,textvqa_val_exact_match,0.49501999999999996,0.006795224421237829
|
| 830 |
+
LLaVa,10000,ai2d_exact_match,0.3636658031088083,0.008658158841882561
|
| 831 |
+
LLaVa,10000,average,0.36465272894871764,
|
| 832 |
+
LLaVa,10000,average_rank,3.1,
|
| 833 |
+
LLaVa,10000,chartqa_relaxed_overall,0.2216,0.008308127706914342
|
| 834 |
+
LLaVa,10000,docvqa_val_anls,0.3905169927438113,0.005559588309122447
|
| 835 |
+
LLaVa,10000,infovqa_val_anls,0.210842797817216,0.0062742161273205005
|
| 836 |
+
LLaVa,10000,mme_total_score,710.1757703081232,
|
| 837 |
+
LLaVa,10000,mmmu_val_mmmu_acc,0.25667,
|
| 838 |
+
LLaVa,10000,mmstar_average,0.33485115141559363,
|
| 839 |
+
LLaVa,10000,ocrbench_ocrbench_accuracy,0.484,
|
| 840 |
+
LLaVa,10000,seedbench_seed_all,0.5220678154530295,
|
| 841 |
+
LLaVa,10000,textvqa_val_exact_match,0.49766000000000005,0.0067820722630208075
|
| 842 |
+
LLaVa,11000,ai2d_exact_match,0.3539507772020725,0.008606685322379343
|
| 843 |
+
LLaVa,11000,average,0.3619647158138698,
|
| 844 |
+
LLaVa,11000,average_rank,3.3,
|
| 845 |
+
LLaVa,11000,chartqa_relaxed_overall,0.226,0.008366456779283321
|
| 846 |
+
LLaVa,11000,docvqa_val_anls,0.39615321520069524,0.0055548098783566
|
| 847 |
+
LLaVa,11000,infovqa_val_anls,0.20231707967850712,0.006189706400735626
|
| 848 |
+
LLaVa,11000,mme_total_score,620.8629451780713,
|
| 849 |
+
LLaVa,11000,mmmu_val_mmmu_acc,0.26778,
|
| 850 |
+
LLaVa,11000,mmstar_average,0.3504522318333254,
|
| 851 |
+
LLaVa,11000,ocrbench_ocrbench_accuracy,0.48,
|
| 852 |
+
LLaVa,11000,seedbench_seed_all,0.5084491384102279,
|
| 853 |
+
LLaVa,11000,textvqa_val_exact_match,0.47257999999999994,0.0067942373414689025
|
| 854 |
+
LLaVa,12000,ai2d_exact_match,0.3963730569948187,0.008803757198545707
|
| 855 |
+
LLaVa,12000,average,0.36835635606525785,
|
| 856 |
+
LLaVa,12000,average_rank,3.1,
|
| 857 |
+
LLaVa,12000,chartqa_relaxed_overall,0.234,0.008469137530835504
|
| 858 |
+
LLaVa,12000,docvqa_val_anls,0.3998087503562603,0.005606788206948343
|
| 859 |
+
LLaVa,12000,infovqa_val_anls,0.19486992137918643,0.006137557366661157
|
| 860 |
+
LLaVa,12000,mme_total_score,707.7871148459384,
|
| 861 |
+
LLaVa,12000,mmmu_val_mmmu_acc,0.26444,
|
| 862 |
+
LLaVa,12000,mmstar_average,0.34510216846405867,
|
| 863 |
+
LLaVa,12000,ocrbench_ocrbench_accuracy,0.466,
|
| 864 |
+
LLaVa,12000,seedbench_seed_all,0.5159533073929962,
|
| 865 |
+
LLaVa,12000,textvqa_val_exact_match,0.49866000000000005,0.006787787245571138
|
| 866 |
+
LLaVa,13000,ai2d_exact_match,0.37661917098445596,0.008720866089740391
|
| 867 |
+
LLaVa,13000,average,0.3660925061677603,
|
| 868 |
+
LLaVa,13000,average_rank,3.2,
|
| 869 |
+
LLaVa,13000,chartqa_relaxed_overall,0.23,0.008418334000200726
|
| 870 |
+
LLaVa,13000,docvqa_val_anls,0.39678037656395876,0.005562201990102385
|
| 871 |
+
LLaVa,13000,infovqa_val_anls,0.20007389352596994,0.006181717086032354
|
| 872 |
+
LLaVa,13000,mme_total_score,762.4510804321728,
|
| 873 |
+
LLaVa,13000,mmmu_val_mmmu_acc,0.26111,
|
| 874 |
+
LLaVa,13000,mmstar_average,0.3487764851969923,
|
| 875 |
+
LLaVa,13000,ocrbench_ocrbench_accuracy,0.487,
|
| 876 |
+
LLaVa,13000,seedbench_seed_all,0.5187326292384659,
|
| 877 |
+
LLaVa,13000,textvqa_val_exact_match,0.47573999999999994,0.006786037174972445
|
| 878 |
+
LLaVa,14000,ai2d_exact_match,0.40382124352331605,0.008831094143874325
|
| 879 |
+
LLaVa,14000,average,0.3665520961603681,
|
| 880 |
+
LLaVa,14000,average_rank,3.5,
|
| 881 |
+
LLaVa,14000,chartqa_relaxed_overall,0.224,0.0083401092900026
|
| 882 |
+
LLaVa,14000,docvqa_val_anls,0.39653795108545226,0.0055480083540036754
|
| 883 |
+
LLaVa,14000,infovqa_val_anls,0.1966338205713239,0.006145830112184984
|
| 884 |
+
LLaVa,14000,mme_total_score,648.8810524209684,
|
| 885 |
+
LLaVa,14000,mmmu_val_mmmu_acc,0.27222,
|
| 886 |
+
LLaVa,14000,mmstar_average,0.3348780070169728,
|
| 887 |
+
LLaVa,14000,ocrbench_ocrbench_accuracy,0.482,
|
| 888 |
+
LLaVa,14000,seedbench_seed_all,0.5121178432462479,
|
| 889 |
+
LLaVa,14000,textvqa_val_exact_match,0.47676,0.006784540255411228
|
| 890 |
+
LLaVa,15000,ai2d_exact_match,0.38374352331606215,0.008752516998880439
|
| 891 |
+
LLaVa,15000,average,0.3656314014070533,
|
| 892 |
+
LLaVa,15000,average_rank,3.3,
|
| 893 |
+
LLaVa,15000,chartqa_relaxed_overall,0.222,0.008313485768211027
|
| 894 |
+
LLaVa,15000,docvqa_val_anls,0.3956148602850384,0.005571289516040145
|
| 895 |
+
LLaVa,15000,infovqa_val_anls,0.2003939669503818,0.006205919365204143
|
| 896 |
+
LLaVa,15000,mme_total_score,744.8995598239295,
|
| 897 |
+
LLaVa,15000,mmmu_val_mmmu_acc,0.25111,
|
| 898 |
+
LLaVa,15000,mmstar_average,0.34431451447442113,
|
| 899 |
+
LLaVa,15000,ocrbench_ocrbench_accuracy,0.491,
|
| 900 |
+
LLaVa,15000,seedbench_seed_all,0.5223457476375765,
|
| 901 |
+
LLaVa,15000,textvqa_val_exact_match,0.48016000000000003,0.006780152577471598
|
| 902 |
+
LLaVa,16000,ai2d_exact_match,0.38244818652849744,0.008746910624026851
|
| 903 |
+
LLaVa,16000,average,0.3664952284054124,
|
| 904 |
+
LLaVa,16000,average_rank,3.1,
|
| 905 |
+
LLaVa,16000,chartqa_relaxed_overall,0.2272,0.008382133861209024
|
| 906 |
+
LLaVa,16000,docvqa_val_anls,0.3971604594021061,0.005596507964441207
|
| 907 |
+
LLaVa,16000,infovqa_val_anls,0.20130541865614268,0.006177273754737603
|
| 908 |
+
LLaVa,16000,mme_total_score,741.5084033613446,
|
| 909 |
+
LLaVa,16000,mmmu_val_mmmu_acc,0.25444,
|
| 910 |
+
LLaVa,16000,mmstar_average,0.34322789378570057,
|
| 911 |
+
LLaVa,16000,ocrbench_ocrbench_accuracy,0.488,
|
| 912 |
+
LLaVa,16000,seedbench_seed_all,0.5151750972762645,
|
| 913 |
+
LLaVa,16000,textvqa_val_exact_match,0.4895,0.0067890182024819105
|
| 914 |
+
LLaVa,17000,ai2d_exact_match,0.36852331606217614,0.008682460781863906
|
| 915 |
+
LLaVa,17000,average,0.3659850040618015,
|
| 916 |
+
LLaVa,17000,average_rank,3.0,
|
| 917 |
+
LLaVa,17000,chartqa_relaxed_overall,0.2264,0.008371693383064148
|
| 918 |
+
LLaVa,17000,docvqa_val_anls,0.3895535425900796,0.005559420230793686
|
| 919 |
+
LLaVa,17000,infovqa_val_anls,0.19870913061640477,0.0061833458200064835
|
| 920 |
+
LLaVa,17000,mme_total_score,738.0654261704681,
|
| 921 |
+
LLaVa,17000,mmmu_val_mmmu_acc,0.27667,
|
| 922 |
+
LLaVa,17000,mmstar_average,0.3488362957589257,
|
| 923 |
+
LLaVa,17000,ocrbench_ocrbench_accuracy,0.486,
|
| 924 |
+
LLaVa,17000,seedbench_seed_all,0.514952751528627,
|
| 925 |
+
LLaVa,17000,textvqa_val_exact_match,0.48422,0.006797929147037179
|
| 926 |
+
LLaVa,18000,ai2d_exact_match,0.3785621761658031,0.008729696327646351
|
| 927 |
+
LLaVa,18000,average,0.3667559662544118,
|
| 928 |
+
LLaVa,18000,average_rank,3.1,
|
| 929 |
+
LLaVa,18000,chartqa_relaxed_overall,0.2268,0.008376919070233621
|
| 930 |
+
LLaVa,18000,docvqa_val_anls,0.39054490192374947,0.005557124380968682
|
| 931 |
+
LLaVa,18000,infovqa_val_anls,0.19983100041999644,0.006171606410532323
|
| 932 |
+
LLaVa,18000,mme_total_score,746.5269107643057,
|
| 933 |
+
LLaVa,18000,mmmu_val_mmmu_acc,0.27,
|
| 934 |
+
LLaVa,18000,mmstar_average,0.3522401814266279,
|
| 935 |
+
LLaVa,18000,ocrbench_ocrbench_accuracy,0.497,
|
| 936 |
+
LLaVa,18000,seedbench_seed_all,0.5137854363535297,
|
| 937 |
+
LLaVa,18000,textvqa_val_exact_match,0.47203999999999996,0.006793178720998519
|
| 938 |
+
LLaVa,19000,ai2d_exact_match,0.3707901554404145,0.008693477555877339
|
| 939 |
+
LLaVa,19000,average,0.3627892845719615,
|
| 940 |
+
LLaVa,19000,average_rank,3.2,
|
| 941 |
+
LLaVa,19000,chartqa_relaxed_overall,0.2284,0.008397713059747491
|
| 942 |
+
LLaVa,19000,docvqa_val_anls,0.3886627325813464,0.005572189741680524
|
| 943 |
+
LLaVa,19000,infovqa_val_anls,0.18766806187395813,0.006047287494792444
|
| 944 |
+
LLaVa,19000,mme_total_score,735.0644257703082,
|
| 945 |
+
LLaVa,19000,mmmu_val_mmmu_acc,0.27556,
|
| 946 |
+
LLaVa,19000,mmstar_average,0.34617955399790473,
|
| 947 |
+
LLaVa,19000,ocrbench_ocrbench_accuracy,0.487,
|
| 948 |
+
LLaVa,19000,seedbench_seed_all,0.50550305725403,
|
| 949 |
+
LLaVa,19000,textvqa_val_exact_match,0.47534,0.00678734045691651
|
| 950 |
+
LLaVa,20000,ai2d_exact_match,0.3746761658031088,0.008711886524907501
|
| 951 |
+
LLaVa,20000,average,0.3636232406961286,
|
| 952 |
+
LLaVa,20000,average_rank,3.3,
|
| 953 |
+
LLaVa,20000,chartqa_relaxed_overall,0.2224,0.00831883268198588
|
| 954 |
+
LLaVa,20000,docvqa_val_anls,0.3865323770909091,0.005551659686181904
|
| 955 |
+
LLaVa,20000,infovqa_val_anls,0.1967140503390298,0.006138459642690392
|
| 956 |
+
LLaVa,20000,mme_total_score,688.5517206882753,
|
| 957 |
+
LLaVa,20000,mmmu_val_mmmu_acc,0.27556,
|
| 958 |
+
LLaVa,20000,mmstar_average,0.3525069399025931,
|
| 959 |
+
LLaVa,20000,ocrbench_ocrbench_accuracy,0.494,
|
| 960 |
+
LLaVa,20000,seedbench_seed_all,0.5113396331295164,
|
| 961 |
+
LLaVa,20000,textvqa_val_exact_match,0.45888,0.006775175991953595
|
app/src/content/assets/data/against_baselines_deduplicated.csv
ADDED
|
@@ -0,0 +1,828 @@
| 1 |
+
run,step,metric,value,stderr
|
| 2 |
+
FineVisionDD,1200,average,0.264341097123272,
|
| 3 |
+
FineVisionDD,1200,average_rank,2.5714285714285716,
|
| 4 |
+
FineVisionDD,1200,docvqa_val_anls,0.3715200680496628,0.005949832790823121
|
| 5 |
+
FineVisionDD,1200,infovqa_val_anls,0.19222676120723237,0.006565134600763451
|
| 6 |
+
FineVisionDD,1200,mme_total_score,743.1522609043617,
|
| 7 |
+
FineVisionDD,1200,mmmu_val_mmmu_acc,0.26222,
|
| 8 |
+
FineVisionDD,1200,mmstar_average,0.21525975348273643,
|
| 9 |
+
FineVisionDD,1200,ocrbench_ocrbench_accuracy,0.3,
|
| 10 |
+
FineVisionDD,1200,textvqa_val_exact_match,0.24482000000000004,0.005905726800471586
|
| 11 |
+
FineVisionDD,2400,average,0.3178775750923926,
|
| 12 |
+
FineVisionDD,2400,average_rank,2.4285714285714284,
|
| 13 |
+
FineVisionDD,2400,docvqa_val_anls,0.47030638473718095,0.006228583735740807
|
| 14 |
+
FineVisionDD,2400,infovqa_val_anls,0.20933736286426122,0.006709818578853176
|
| 15 |
+
FineVisionDD,2400,mme_total_score,1185.2899159663866,
|
| 16 |
+
FineVisionDD,2400,mmmu_val_mmmu_acc,0.25,
|
| 17 |
+
FineVisionDD,2400,mmstar_average,0.24490170295291339,
|
| 18 |
+
FineVisionDD,2400,ocrbench_ocrbench_accuracy,0.384,
|
| 19 |
+
FineVisionDD,2400,textvqa_val_exact_match,0.34872,0.00652553360559637
|
| 20 |
+
FineVisionDD,3600,average,0.34596783716441254,
|
| 21 |
+
FineVisionDD,3600,average_rank,2.4285714285714284,
|
| 22 |
+
FineVisionDD,3600,docvqa_val_anls,0.52073479618703,0.006284214687786431
|
| 23 |
+
FineVisionDD,3600,infovqa_val_anls,0.22809076679417026,0.006878849345111437
|
| 24 |
+
FineVisionDD,3600,mme_total_score,1168.4510804321728,
|
| 25 |
+
FineVisionDD,3600,mmmu_val_mmmu_acc,0.25667,
|
| 26 |
+
FineVisionDD,3600,mmstar_average,0.23323146000527503,
|
| 27 |
+
FineVisionDD,3600,ocrbench_ocrbench_accuracy,0.454,
|
| 28 |
+
FineVisionDD,3600,textvqa_val_exact_match,0.38308000000000003,0.0066477952252059665
|
| 29 |
+
FineVisionDD,4800,average,0.3549622071061929,
|
| 30 |
+
FineVisionDD,4800,average_rank,2.2857142857142856,
|
| 31 |
+
FineVisionDD,4800,docvqa_val_anls,0.5347116037470354,0.006161120918755636
|
| 32 |
+
FineVisionDD,4800,infovqa_val_anls,0.22616829864068178,0.006791811877573115
|
| 33 |
+
FineVisionDD,4800,mme_total_score,1067.0920368147258,
|
| 34 |
+
FineVisionDD,4800,mmmu_val_mmmu_acc,0.27444,
|
| 35 |
+
FineVisionDD,4800,mmstar_average,0.23307334024944037,
|
| 36 |
+
FineVisionDD,4800,ocrbench_ocrbench_accuracy,0.473,
|
| 37 |
+
FineVisionDD,4800,textvqa_val_exact_match,0.38837999999999995,0.006654731565618713
|
| 38 |
+
FineVisionDD,6000,average,0.3848921103122081,
|
| 39 |
+
FineVisionDD,6000,average_rank,2.142857142857143,
|
| 40 |
+
FineVisionDD,6000,docvqa_val_anls,0.5762794835718067,0.006247345256607651
|
| 41 |
+
FineVisionDD,6000,infovqa_val_anls,0.25437900510747613,0.007245969162163573
|
| 42 |
+
FineVisionDD,6000,mme_total_score,1182.3837535014004,
|
| 43 |
+
FineVisionDD,6000,mmmu_val_mmmu_acc,0.27222,
|
| 44 |
+
FineVisionDD,6000,mmstar_average,0.2747341731939661,
|
| 45 |
+
FineVisionDD,6000,ocrbench_ocrbench_accuracy,0.495,
|
| 46 |
+
FineVisionDD,6000,textvqa_val_exact_match,0.43673999999999996,0.006759376621735387
|
| 47 |
+
FineVisionDD,7200,average,0.3978156352765745,
|
| 48 |
+
FineVisionDD,7200,average_rank,1.8571428571428572,
|
| 49 |
+
FineVisionDD,7200,docvqa_val_anls,0.5914916761381446,0.006230792162717311
|
| 50 |
+
FineVisionDD,7200,infovqa_val_anls,0.2584115961449724,0.007214877478455323
|
| 51 |
+
FineVisionDD,7200,mme_total_score,1174.9931972789116,
|
| 52 |
+
FineVisionDD,7200,mmmu_val_mmmu_acc,0.28889,
|
| 53 |
+
FineVisionDD,7200,mmstar_average,0.30312053937633016,
|
| 54 |
+
FineVisionDD,7200,ocrbench_ocrbench_accuracy,0.501,
|
| 55 |
+
FineVisionDD,7200,textvqa_val_exact_match,0.44398000000000004,0.006765405092173878
FineVisionDD,8400,average,0.4059159035113804,
FineVisionDD,8400,average_rank,1.7142857142857142,
FineVisionDD,8400,docvqa_val_anls,0.6115548076222326,0.006189572923188405
FineVisionDD,8400,infovqa_val_anls,0.2617197889496108,0.007158591695868175
FineVisionDD,8400,mme_total_score,1252.2165866346538,
FineVisionDD,8400,mmmu_val_mmmu_acc,0.29444,
FineVisionDD,8400,mmstar_average,0.285260824496439,
FineVisionDD,8400,ocrbench_ocrbench_accuracy,0.52,
FineVisionDD,8400,textvqa_val_exact_match,0.4625200000000001,0.0067937236370175695
FineVisionDD,9600,average,0.41115899049749083,
FineVisionDD,9600,average_rank,1.5714285714285714,
FineVisionDD,9600,docvqa_val_anls,0.6213641622467091,0.006165172206432181
FineVisionDD,9600,infovqa_val_anls,0.2757908658532091,0.007363785243871019
FineVisionDD,9600,mme_total_score,1239.7746098439375,
FineVisionDD,9600,mmmu_val_mmmu_acc,0.29444,
FineVisionDD,9600,mmstar_average,0.2999389148850269,
FineVisionDD,9600,ocrbench_ocrbench_accuracy,0.519,
FineVisionDD,9600,textvqa_val_exact_match,0.45642000000000005,0.006788827170791062
FineVisionDD,10800,average,0.41894565175282533,
FineVisionDD,10800,average_rank,1.1428571428571428,
FineVisionDD,10800,docvqa_val_anls,0.6353621980573124,0.006124533744452508
FineVisionDD,10800,infovqa_val_anls,0.26751996667040645,0.0071172404352328284
FineVisionDD,10800,mme_total_score,1353.3499399759903,
FineVisionDD,10800,mmmu_val_mmmu_acc,0.29778,
FineVisionDD,10800,mmstar_average,0.325351745789233,
FineVisionDD,10800,ocrbench_ocrbench_accuracy,0.516,
FineVisionDD,10800,textvqa_val_exact_match,0.47165999999999997,0.0067931287489374085
FineVisionDD,12000,average,0.4208515127756214,
FineVisionDD,12000,average_rank,1.4285714285714286,
FineVisionDD,12000,docvqa_val_anls,0.6294351828158641,0.006169625021925361
FineVisionDD,12000,infovqa_val_anls,0.2797661440287805,0.007408513793528687
FineVisionDD,12000,mme_total_score,1091.6394557823128,
FineVisionDD,12000,mmmu_val_mmmu_acc,0.29556,
FineVisionDD,12000,mmstar_average,0.32114774980908367,
FineVisionDD,12000,ocrbench_ocrbench_accuracy,0.525,
FineVisionDD,12000,textvqa_val_exact_match,0.4742,0.006787465354400525
FineVisionDD,13200,average,0.42658753741516975,
FineVisionDD,13200,average_rank,1.5714285714285714,
FineVisionDD,13200,docvqa_val_anls,0.6427877927509281,0.006125147292514003
FineVisionDD,13200,infovqa_val_anls,0.2907270038093242,0.007372590798085613
FineVisionDD,13200,mme_total_score,1211.7135854341736,
FineVisionDD,13200,mmmu_val_mmmu_acc,0.28889,
FineVisionDD,13200,mmstar_average,0.30988042793076603,
FineVisionDD,13200,ocrbench_ocrbench_accuracy,0.546,
FineVisionDD,13200,textvqa_val_exact_match,0.48123999999999995,0.0068072667243212395
FineVisionDD,14400,average,0.4273536900736185,
FineVisionDD,14400,average_rank,1.5714285714285714,
FineVisionDD,14400,docvqa_val_anls,0.654480111743584,0.006079437400066777
FineVisionDD,14400,infovqa_val_anls,0.2776743812062677,0.007152404684338895
FineVisionDD,14400,mme_total_score,1211.577330932373,
FineVisionDD,14400,mmmu_val_mmmu_acc,0.28222,
FineVisionDD,14400,mmstar_average,0.32896764749185925,
FineVisionDD,14400,ocrbench_ocrbench_accuracy,0.527,
FineVisionDD,14400,textvqa_val_exact_match,0.49378,0.006791486374677893
FineVisionDD,15600,average,0.4373836230155283,
FineVisionDD,15600,average_rank,1.0,
FineVisionDD,15600,docvqa_val_anls,0.6587223702708729,0.0060724859630705355
FineVisionDD,15600,infovqa_val_anls,0.2954608342132971,0.007455706284703673
FineVisionDD,15600,mme_total_score,1196.3369347739094,
FineVisionDD,15600,mmmu_val_mmmu_acc,0.29333,
FineVisionDD,15600,mmstar_average,0.33750853360899963,
FineVisionDD,15600,ocrbench_ocrbench_accuracy,0.54,
FineVisionDD,15600,textvqa_val_exact_match,0.49927999999999995,0.0067965531666418525
FineVisionDD,16800,average,0.43378959957858315,
FineVisionDD,16800,average_rank,1.2857142857142858,
FineVisionDD,16800,docvqa_val_anls,0.6677987652181413,0.006012562319824571
FineVisionDD,16800,infovqa_val_anls,0.2813134865271826,0.007107230565585641
FineVisionDD,16800,mme_total_score,1303.9127651060423,
FineVisionDD,16800,mmmu_val_mmmu_acc,0.28111,
FineVisionDD,16800,mmstar_average,0.3315953457261746,
FineVisionDD,16800,ocrbench_ocrbench_accuracy,0.549,
FineVisionDD,16800,textvqa_val_exact_match,0.4919200000000001,0.006795246706011423
FineVisionDD,18000,average,0.4460242607466102,
FineVisionDD,18000,average_rank,1.1428571428571428,
FineVisionDD,18000,docvqa_val_anls,0.6719255126618523,0.006008621561058294
FineVisionDD,18000,infovqa_val_anls,0.29900934485493813,0.007466958171203317
FineVisionDD,18000,mme_total_score,1236.6654661864745,
FineVisionDD,18000,mmmu_val_mmmu_acc,0.3,
FineVisionDD,18000,mmstar_average,0.34327070696287054,
FineVisionDD,18000,ocrbench_ocrbench_accuracy,0.546,
FineVisionDD,18000,textvqa_val_exact_match,0.5159400000000001,0.006793085637800874
FineVisionDD,19200,average,0.44845865852995476,
FineVisionDD,19200,average_rank,1.0,
FineVisionDD,19200,docvqa_val_anls,0.6777684245254485,0.005985910291387732
FineVisionDD,19200,infovqa_val_anls,0.2877789783739627,0.007152893066126468
FineVisionDD,19200,mme_total_score,1240.2280912364945,
FineVisionDD,19200,mmmu_val_mmmu_acc,0.29778,
FineVisionDD,19200,mmstar_average,0.3473245482803175,
FineVisionDD,19200,ocrbench_ocrbench_accuracy,0.568,
FineVisionDD,19200,textvqa_val_exact_match,0.5121,0.006797143387603819
FineVisionDD,20400,average,0.4507597489696731,
FineVisionDD,20400,docvqa_val_anls,0.683992435806577,0.005972444631447485
FineVisionDD,20400,infovqa_val_anls,0.29487349692639875,0.00732361020606081
FineVisionDD,20400,mme_total_score,1273.203481392557,
FineVisionDD,20400,mmmu_val_mmmu_acc,0.28222,
FineVisionDD,20400,mmstar_average,0.349552561085063,
FineVisionDD,20400,ocrbench_ocrbench_accuracy,0.575,
FineVisionDD,20400,textvqa_val_exact_match,0.5189199999999999,0.006790760605846829
CauldronDD,300,average,0.19965858916400772,
CauldronDD,300,average_rank,1.5714285714285714,
CauldronDD,300,docvqa_val_anls,0.1630709902951134,0.004134430994096956
CauldronDD,300,infovqa_val_anls,0.11235975762377737,0.0049002431669000045
CauldronDD,300,mme_total_score,916.8871548619447,
CauldronDD,300,mmmu_val_mmmu_acc,0.25667,
CauldronDD,300,mmstar_average,0.22555078706515572,
CauldronDD,300,ocrbench_ocrbench_accuracy,0.181,
CauldronDD,300,textvqa_val_exact_match,0.2593,0.006011350036876339
CauldronDD,1200,average,0.29972102969630693,
CauldronDD,1200,average_rank,1.5714285714285714,
CauldronDD,1200,docvqa_val_anls,0.3393747623503541,0.005393199870631087
CauldronDD,1200,infovqa_val_anls,0.14788475521512282,0.005517625394198703
CauldronDD,1200,mme_total_score,1237.1527611044417,
CauldronDD,1200,mmmu_val_mmmu_acc,0.28444,
CauldronDD,1200,mmstar_average,0.2961666606123647,
CauldronDD,1200,ocrbench_ocrbench_accuracy,0.324,
CauldronDD,1200,textvqa_val_exact_match,0.40646000000000004,0.006706135111196755
CauldronDD,2400,average,0.3338688722253544,
CauldronDD,2400,average_rank,1.8571428571428572,
CauldronDD,2400,docvqa_val_anls,0.4106908679403099,0.00557717705073105
CauldronDD,2400,infovqa_val_anls,0.16022819076638478,0.005740317063734872
CauldronDD,2400,mme_total_score,1243.3691476590636,
CauldronDD,2400,mmmu_val_mmmu_acc,0.27889,
CauldronDD,2400,mmstar_average,0.33588417464543163,
CauldronDD,2400,ocrbench_ocrbench_accuracy,0.366,
CauldronDD,2400,textvqa_val_exact_match,0.45152,0.006779965450229171
CauldronDD,2700,average,0.34202507191206166,
CauldronDD,2700,average_rank,1.5714285714285714,
CauldronDD,2700,docvqa_val_anls,0.4194265744988737,0.005598230883238166
CauldronDD,2700,infovqa_val_anls,0.16192600102107405,0.005717482217545598
CauldronDD,2700,mme_total_score,1197.9157663065225,
CauldronDD,2700,mmmu_val_mmmu_acc,0.29,
CauldronDD,2700,mmstar_average,0.32295785595242227,
CauldronDD,2700,ocrbench_ocrbench_accuracy,0.388,
CauldronDD,2700,textvqa_val_exact_match,0.46984,0.006812118310127491
CauldronDD,3600,average,0.33947430615719726,
CauldronDD,3600,average_rank,2.2857142857142856,
CauldronDD,3600,docvqa_val_anls,0.43097255569855397,0.005587910026275849
CauldronDD,3600,infovqa_val_anls,0.1641426454649424,0.005800068910792727
CauldronDD,3600,mme_total_score,1310.0697278911566,
CauldronDD,3600,mmmu_val_mmmu_acc,0.28333,
CauldronDD,3600,mmstar_average,0.3259006357796873,
CauldronDD,3600,ocrbench_ocrbench_accuracy,0.36,
CauldronDD,3600,textvqa_val_exact_match,0.4725,0.006816571214960329
CauldronDD,4800,average,0.3474647210512976,
CauldronDD,4800,average_rank,2.142857142857143,
CauldronDD,4800,docvqa_val_anls,0.44347290757863167,0.005625752855686164
CauldronDD,4800,infovqa_val_anls,0.16073440834957092,0.00572812246049592
CauldronDD,4800,mme_total_score,1239.124949979992,
CauldronDD,4800,mmmu_val_mmmu_acc,0.31556,
CauldronDD,4800,mmstar_average,0.3157610103795831,
CauldronDD,4800,ocrbench_ocrbench_accuracy,0.378,
CauldronDD,4800,textvqa_val_exact_match,0.47125999999999996,0.00680373872603368
CauldronDD,5100,average,0.34849328691237624,
CauldronDD,5100,average_rank,1.7142857142857142,
CauldronDD,5100,docvqa_val_anls,0.4400533401720571,0.005603146586802499
CauldronDD,5100,infovqa_val_anls,0.1592834226378583,0.005693695979163053
CauldronDD,5100,mme_total_score,1319.4603841536614,
CauldronDD,5100,mmmu_val_mmmu_acc,0.30333,
CauldronDD,5100,mmstar_average,0.33557295866434195,
CauldronDD,5100,ocrbench_ocrbench_accuracy,0.373,
CauldronDD,5100,textvqa_val_exact_match,0.47972,0.00682083932443933
CauldronDD,6000,average,0.3400596935324955,
CauldronDD,6000,average_rank,2.0,
CauldronDD,6000,docvqa_val_anls,0.43150620522864996,0.005601817666455916
CauldronDD,6000,infovqa_val_anls,0.16804581718043338,0.005797914749544558
CauldronDD,6000,mme_total_score,1246.4825930372149,
CauldronDD,6000,mmmu_val_mmmu_acc,0.27667,
CauldronDD,6000,mmstar_average,0.34191613878588945,
CauldronDD,6000,ocrbench_ocrbench_accuracy,0.368,
CauldronDD,6000,textvqa_val_exact_match,0.45421999999999996,0.006799535650102248
CauldronDD,7200,average,0.3391609673818097,
CauldronDD,7200,average_rank,2.2857142857142856,
CauldronDD,7200,docvqa_val_anls,0.4285872356274967,0.005613450362222006
CauldronDD,7200,infovqa_val_anls,0.1673609356908039,0.0058332340615507815
CauldronDD,7200,mme_total_score,1225.8680472188876,
CauldronDD,7200,mmmu_val_mmmu_acc,0.28778,
CauldronDD,7200,mmstar_average,0.31851763297255725,
CauldronDD,7200,ocrbench_ocrbench_accuracy,0.378,
CauldronDD,7200,textvqa_val_exact_match,0.45472000000000007,0.006786512776907903
CauldronDD,7500,average,0.34519234835518026,
CauldronDD,7500,average_rank,1.8571428571428572,
CauldronDD,7500,docvqa_val_anls,0.4400007858471883,0.005617720028882394
CauldronDD,7500,infovqa_val_anls,0.1702707959590441,0.0058853960353902985
CauldronDD,7500,mme_total_score,1251.4401760704282,
CauldronDD,7500,mmmu_val_mmmu_acc,0.29889,
CauldronDD,7500,mmstar_average,0.3133725083248492,
CauldronDD,7500,ocrbench_ocrbench_accuracy,0.391,
CauldronDD,7500,textvqa_val_exact_match,0.4576200000000001,0.006805178117422201
CauldronDD,8400,average,0.3431478061334871,
CauldronDD,8400,average_rank,2.5714285714285716,
CauldronDD,8400,docvqa_val_anls,0.440186698815653,0.005613446205499607
CauldronDD,8400,infovqa_val_anls,0.17029748604016814,0.005836597208873185
CauldronDD,8400,mme_total_score,1271.5840336134456,
CauldronDD,8400,mmmu_val_mmmu_acc,0.27778,
CauldronDD,8400,mmstar_average,0.32566265194510147,
CauldronDD,8400,ocrbench_ocrbench_accuracy,0.386,
CauldronDD,8400,textvqa_val_exact_match,0.45896000000000003,0.00681272532289869
CauldronDD,9600,average,0.3413459009956081,
CauldronDD,9600,average_rank,2.857142857142857,
CauldronDD,9600,docvqa_val_anls,0.4403774280666133,0.005612804160672664
CauldronDD,9600,infovqa_val_anls,0.16559694737276026,0.0058146690100803694
CauldronDD,9600,mme_total_score,1235.5730292116846,
CauldronDD,9600,mmmu_val_mmmu_acc,0.28,
CauldronDD,9600,mmstar_average,0.33264103053427463,
CauldronDD,9600,ocrbench_ocrbench_accuracy,0.383,
CauldronDD,9600,textvqa_val_exact_match,0.44646,0.006795434442760313
CauldronDD,9900,average,0.3355067141945109,
CauldronDD,9900,average_rank,2.142857142857143,
CauldronDD,9900,docvqa_val_anls,0.43635606798831567,0.0056201106916182715
CauldronDD,9900,infovqa_val_anls,0.15989145755054796,0.005753347711050537
CauldronDD,9900,mme_total_score,1246.687775110044,
CauldronDD,9900,mmmu_val_mmmu_acc,0.27111,
CauldronDD,9900,mmstar_average,0.31970275962820155,
CauldronDD,9900,ocrbench_ocrbench_accuracy,0.381,
CauldronDD,9900,textvqa_val_exact_match,0.44497999999999993,0.006793245877922539
CauldronDD,10800,average,0.3380861972330776,
CauldronDD,10800,average_rank,3.142857142857143,
CauldronDD,10800,docvqa_val_anls,0.4402326817553441,0.005626934973411334
CauldronDD,10800,infovqa_val_anls,0.16122827030707865,0.005747720437259022
CauldronDD,10800,mme_total_score,1245.125650260104,
CauldronDD,10800,mmmu_val_mmmu_acc,0.29444,
CauldronDD,10800,mmstar_average,0.309516231336043,
CauldronDD,10800,ocrbench_ocrbench_accuracy,0.383,
CauldronDD,10800,textvqa_val_exact_match,0.4401,0.006786752537259658
CauldronDD,11400,average,0.33351487442945066,
CauldronDD,11400,average_rank,2.4285714285714284,
CauldronDD,11400,docvqa_val_anls,0.43406486854294124,0.005623873843784784
CauldronDD,11400,infovqa_val_anls,0.16714581293411426,0.005782736796323627
CauldronDD,11400,mme_total_score,1237.4036614645859,
CauldronDD,11400,mmmu_val_mmmu_acc,0.26222,
CauldronDD,11400,mmstar_average,0.32995856509964816,
CauldronDD,11400,ocrbench_ocrbench_accuracy,0.364,
CauldronDD,11400,textvqa_val_exact_match,0.4437,0.006807916828686236
CauldronDD,12000,average,0.33154594568198864,
CauldronDD,12000,average_rank,3.2857142857142856,
CauldronDD,12000,docvqa_val_anls,0.43508650222322015,0.00561327125316578
CauldronDD,12000,infovqa_val_anls,0.16563023539653135,0.0058079534236688945
CauldronDD,12000,mme_total_score,1240.7185874349739,
CauldronDD,12000,mmmu_val_mmmu_acc,0.27556,
CauldronDD,12000,mmstar_average,0.2978389364721804,
CauldronDD,12000,ocrbench_ocrbench_accuracy,0.375,
CauldronDD,12000,textvqa_val_exact_match,0.44016000000000005,0.006801256229349064
CauldronDD,13200,average,0.3323617201953493,
CauldronDD,13200,average_rank,3.2857142857142856,
CauldronDD,13200,docvqa_val_anls,0.4336687642519214,0.00561127691138422
CauldronDD,13200,infovqa_val_anls,0.16294964748823013,0.00577613475202133
CauldronDD,13200,mme_total_score,1232.6909763905562,
CauldronDD,13200,mmmu_val_mmmu_acc,0.27556,
CauldronDD,13200,mmstar_average,0.3120919094319445,
CauldronDD,13200,ocrbench_ocrbench_accuracy,0.37,
CauldronDD,13200,textvqa_val_exact_match,0.4398999999999999,0.006800709369586816
CauldronDD,14400,average,0.33686465162435447,
CauldronDD,14400,average_rank,3.0,
CauldronDD,14400,docvqa_val_anls,0.4346981780601323,0.005637000083152569
CauldronDD,14400,infovqa_val_anls,0.15117394150977184,0.005624727950317896
CauldronDD,14400,mme_total_score,1229.5749299719887,
CauldronDD,14400,mmmu_val_mmmu_acc,0.28444,
CauldronDD,14400,mmstar_average,0.3150357901762228,
CauldronDD,14400,ocrbench_ocrbench_accuracy,0.396,
CauldronDD,14400,textvqa_val_exact_match,0.43983999999999995,0.006801397406514065
CauldronDD,14700,average,0.33429875896686784,
CauldronDD,14700,average_rank,2.2857142857142856,
CauldronDD,14700,docvqa_val_anls,0.4327738487046949,0.005633644388554696
CauldronDD,14700,infovqa_val_anls,0.160120841205593,0.005735225827091493
CauldronDD,14700,mme_total_score,1207.2609043617447,
CauldronDD,14700,mmmu_val_mmmu_acc,0.26,
CauldronDD,14700,mmstar_average,0.31633786389091934,
CauldronDD,14700,ocrbench_ocrbench_accuracy,0.389,
CauldronDD,14700,textvqa_val_exact_match,0.44756,0.0068101163585480235
CauldronDD,15600,average,0.32646326413760035,
CauldronDD,15600,average_rank,3.5714285714285716,
CauldronDD,15600,docvqa_val_anls,0.433995514472087,0.005646461618482555
CauldronDD,15600,infovqa_val_anls,0.1562018233604324,0.005700992835439662
CauldronDD,15600,mme_total_score,1122.3809523809523,
CauldronDD,15600,mmmu_val_mmmu_acc,0.26333,
CauldronDD,15600,mmstar_average,0.30641224699308284,
CauldronDD,15600,ocrbench_ocrbench_accuracy,0.366,
CauldronDD,15600,textvqa_val_exact_match,0.43283999999999995,0.006800820326359335
CauldronDD,16800,average,0.32818017568992097,
CauldronDD,16800,average_rank,3.2857142857142856,
CauldronDD,16800,docvqa_val_anls,0.43345387633219307,0.005602799050931306
CauldronDD,16800,infovqa_val_anls,0.16417934269316956,0.005815179007624968
CauldronDD,16800,mme_total_score,1197.6628651460585,
CauldronDD,16800,mmmu_val_mmmu_acc,0.27111,
CauldronDD,16800,mmstar_average,0.3091778351141632,
CauldronDD,16800,ocrbench_ocrbench_accuracy,0.36,
CauldronDD,16800,textvqa_val_exact_match,0.43116000000000004,0.006790215923404594
CauldronDD,17100,average,0.3385701687163391,
CauldronDD,17100,average_rank,2.2857142857142856,
CauldronDD,17100,docvqa_val_anls,0.44035807792372417,0.005618024098992455
CauldronDD,17100,infovqa_val_anls,0.15927117998447532,0.0057253221102160036
CauldronDD,17100,mme_total_score,1125.826630652261,
CauldronDD,17100,mmmu_val_mmmu_acc,0.29111,
CauldronDD,17100,mmstar_average,0.32228175438983514,
CauldronDD,17100,ocrbench_ocrbench_accuracy,0.385,
CauldronDD,17100,textvqa_val_exact_match,0.4334,0.006792916659532094
CauldronDD,18000,average,0.3341436139545066,
CauldronDD,18000,average_rank,3.2857142857142856,
CauldronDD,18000,docvqa_val_anls,0.4405469745279471,0.0056286501797814135
CauldronDD,18000,infovqa_val_anls,0.1660848313620339,0.005819813220995324
CauldronDD,18000,mme_total_score,1242.9980992396959,
CauldronDD,18000,mmmu_val_mmmu_acc,0.27778,
CauldronDD,18000,mmstar_average,0.31554987783705823,
CauldronDD,18000,ocrbench_ocrbench_accuracy,0.373,
CauldronDD,18000,textvqa_val_exact_match,0.4319,0.006790913141858027
CauldronDD,19200,average,0.33290606090591973,
CauldronDD,19200,average_rank,3.2857142857142856,
CauldronDD,19200,docvqa_val_anls,0.43616573848632056,0.005619579845927559
CauldronDD,19200,infovqa_val_anls,0.16528162106770297,0.005801061681754425
CauldronDD,19200,mme_total_score,1230.0974389755902,
CauldronDD,19200,mmmu_val_mmmu_acc,0.27,
CauldronDD,19200,mmstar_average,0.3266290058814946,
CauldronDD,19200,ocrbench_ocrbench_accuracy,0.374,
CauldronDD,19200,textvqa_val_exact_match,0.42536,0.006794218598284299
CauldronDD,19500,average,0.32553352494764914,
CauldronDD,19500,average_rank,2.4285714285714284,
CauldronDD,19500,docvqa_val_anls,0.4288225859433628,0.005619113441752853
CauldronDD,19500,infovqa_val_anls,0.15150002729561038,0.00560320463678714
CauldronDD,19500,mme_total_score,1198.1334533813524,
CauldronDD,19500,mmmu_val_mmmu_acc,0.25111,
CauldronDD,19500,mmstar_average,0.3198285364469219,
CauldronDD,19500,ocrbench_ocrbench_accuracy,0.373,
CauldronDD,19500,textvqa_val_exact_match,0.42894,0.006790325248719436
CambrianDD,300,average,0.17970577045668043,
CambrianDD,300,average_rank,1.8571428571428572,
CambrianDD,300,docvqa_val_anls,0.14433321388458845,0.004176784210049873
CambrianDD,300,infovqa_val_anls,0.13148487541870452,0.0056192589681577886
CambrianDD,300,mme_total_score,990.4948979591837,
CambrianDD,300,mmmu_val_mmmu_acc,0.24222,
CambrianDD,300,mmstar_average,0.2454565334367895,
CambrianDD,300,ocrbench_ocrbench_accuracy,0.134,
CambrianDD,300,textvqa_val_exact_match,0.18074,0.005296623577739393
CambrianDD,1200,average,0.2568586004702917,
CambrianDD,1200,average_rank,2.857142857142857,
CambrianDD,1200,docvqa_val_anls,0.3316039842462008,0.0057785603046722
CambrianDD,1200,infovqa_val_anls,0.14630377786332374,0.005668585125239906
CambrianDD,1200,mme_total_score,1112.7626050420167,
CambrianDD,1200,mmmu_val_mmmu_acc,0.26111,
CambrianDD,1200,mmstar_average,0.21803384071222537,
CambrianDD,1200,ocrbench_ocrbench_accuracy,0.247,
CambrianDD,1200,textvqa_val_exact_match,0.3371,0.006460330113317322
CambrianDD,2400,average,0.30575373318860816,
CambrianDD,2400,average_rank,2.7142857142857144,
CambrianDD,2400,docvqa_val_anls,0.40422225671207945,0.006074261001968628
CambrianDD,2400,infovqa_val_anls,0.1523121409563817,0.005638329718892052
CambrianDD,2400,mme_total_score,1059.9440776310523,
CambrianDD,2400,mmmu_val_mmmu_acc,0.28444,
CambrianDD,2400,mmstar_average,0.3110480014631879,
CambrianDD,2400,ocrbench_ocrbench_accuracy,0.3,
CambrianDD,2400,textvqa_val_exact_match,0.38249999999999995,0.006625581458704827
CambrianDD,2700,average,0.3094104328037755,
CambrianDD,2700,average_rank,2.2857142857142856,
CambrianDD,2700,docvqa_val_anls,0.4213173056510248,0.006079072406765826
CambrianDD,2700,infovqa_val_anls,0.16214248952051602,0.005831948548024231
CambrianDD,2700,mme_total_score,1054.2070828331332,
CambrianDD,2700,mmmu_val_mmmu_acc,0.26222,
CambrianDD,2700,mmstar_average,0.3088828016511124,
CambrianDD,2700,ocrbench_ocrbench_accuracy,0.306,
CambrianDD,2700,textvqa_val_exact_match,0.3959,0.006664497063111428
CambrianDD,3600,average,0.3244376041867266,
CambrianDD,3600,average_rank,2.7142857142857144,
CambrianDD,3600,docvqa_val_anls,0.4477711985871837,0.006244212556452033
CambrianDD,3600,infovqa_val_anls,0.17166556922234352,0.006038401288152695
CambrianDD,3600,mme_total_score,1054.6183473389356,
CambrianDD,3600,mmmu_val_mmmu_acc,0.28778,
CambrianDD,3600,mmstar_average,0.3192288573108325,
CambrianDD,3600,ocrbench_ocrbench_accuracy,0.325,
CambrianDD,3600,textvqa_val_exact_match,0.39518000000000003,0.00666872160834278
CambrianDD,4800,average,0.33575298162563233,
CambrianDD,4800,average_rank,2.7142857142857144,
CambrianDD,4800,docvqa_val_anls,0.48021663592502906,0.006264475129046182
CambrianDD,4800,infovqa_val_anls,0.17732197564395005,0.005979359845801751
CambrianDD,4800,mme_total_score,984.9863945578231,
CambrianDD,4800,mmmu_val_mmmu_acc,0.29111,
CambrianDD,4800,mmstar_average,0.29772927818481454,
CambrianDD,4800,ocrbench_ocrbench_accuracy,0.346,
CambrianDD,4800,textvqa_val_exact_match,0.42214000000000007,0.0067477011177196344
CambrianDD,5100,average,0.3359877520445322,
CambrianDD,5100,average_rank,2.2857142857142856,
CambrianDD,5100,docvqa_val_anls,0.4754298197157412,0.006168130327198727
CambrianDD,5100,infovqa_val_anls,0.18076704246631303,0.0060732104869038175
CambrianDD,5100,mme_total_score,895.3776510604241,
CambrianDD,5100,mmmu_val_mmmu_acc,0.27889,
CambrianDD,5100,mmstar_average,0.31171965008513874,
CambrianDD,5100,ocrbench_ocrbench_accuracy,0.35,
CambrianDD,5100,textvqa_val_exact_match,0.41912,0.0067289182479918445
CambrianDD,6000,average,0.32347651657813326,
CambrianDD,6000,average_rank,3.0,
CambrianDD,6000,docvqa_val_anls,0.46634507029121364,0.0062238629881778374
CambrianDD,6000,infovqa_val_anls,0.17940221095579675,0.006141333951799168
CambrianDD,6000,mme_total_score,1072.1291516606643,
CambrianDD,6000,mmmu_val_mmmu_acc,0.27667,
CambrianDD,6000,mmstar_average,0.31024181822178915,
CambrianDD,6000,ocrbench_ocrbench_accuracy,0.305,
CambrianDD,6000,textvqa_val_exact_match,0.4032,0.006697142849340224
CambrianDD,7200,average,0.3486686601177924,
CambrianDD,7200,average_rank,3.0,
CambrianDD,7200,docvqa_val_anls,0.5033994017292339,0.006211902263203208
CambrianDD,7200,infovqa_val_anls,0.1898192044728013,0.006149174628390649
CambrianDD,7200,mme_total_score,879.126550620248,
CambrianDD,7200,mmmu_val_mmmu_acc,0.27556,
CambrianDD,7200,mmstar_average,0.32559335450471943,
CambrianDD,7200,ocrbench_ocrbench_accuracy,0.365,
CambrianDD,7200,textvqa_val_exact_match,0.43263999999999997,0.006774430209318876
CambrianDD,7500,average,0.3515269876619196,
CambrianDD,7500,average_rank,2.0,
CambrianDD,7500,docvqa_val_anls,0.4864674639384977,0.006096512234708711
CambrianDD,7500,infovqa_val_anls,0.19628222332012993,0.006277685053210526
CambrianDD,7500,mme_total_score,1053.9439775910364,
CambrianDD,7500,mmmu_val_mmmu_acc,0.28778,
CambrianDD,7500,mmstar_average,0.32443223871288984,
CambrianDD,7500,ocrbench_ocrbench_accuracy,0.38,
CambrianDD,7500,textvqa_val_exact_match,0.4342000000000001,0.006762249448483892
CambrianDD,8400,average,0.3566021934695403,
CambrianDD,8400,average_rank,2.7142857142857144,
CambrianDD,8400,docvqa_val_anls,0.49954330523768764,0.006213069198769485
CambrianDD,8400,infovqa_val_anls,0.199645571135255,0.006349349194468786
CambrianDD,8400,mme_total_score,1114.3343337334934,
CambrianDD,8400,mmmu_val_mmmu_acc,0.29556,
CambrianDD,8400,mmstar_average,0.3215042844442992,
CambrianDD,8400,ocrbench_ocrbench_accuracy,0.379,
CambrianDD,8400,textvqa_val_exact_match,0.4443599999999999,0.006777995745444597
CambrianDD,9600,average,0.3625887269392778,
CambrianDD,9600,average_rank,2.5714285714285716,
CambrianDD,9600,docvqa_val_anls,0.5209747359075046,0.006185627757446921
CambrianDD,9600,infovqa_val_anls,0.20779524694498724,0.006396756819481715
CambrianDD,9600,mme_total_score,881.3031212484995,
CambrianDD,9600,mmmu_val_mmmu_acc,0.29889,
CambrianDD,9600,mmstar_average,0.3216523787831747,
CambrianDD,9600,ocrbench_ocrbench_accuracy,0.378,
CambrianDD,9600,textvqa_val_exact_match,0.44822,0.006790212641555748
CambrianDD,9900,average,0.36146463385115846,
CambrianDD,9900,average_rank,1.8571428571428572,
CambrianDD,9900,docvqa_val_anls,0.5081099773959435,0.00614386469189631
CambrianDD,9900,infovqa_val_anls,0.19863544950298542,0.006290462477841115
CambrianDD,9900,mme_total_score,947.5219087635055,
CambrianDD,9900,mmmu_val_mmmu_acc,0.30222,
CambrianDD,9900,mmstar_average,0.3418223762080219,
CambrianDD,9900,ocrbench_ocrbench_accuracy,0.375,
CambrianDD,9900,textvqa_val_exact_match,0.443,0.006785293511824548
CambrianDD,10800,average,0.36225439567996837,
CambrianDD,10800,average_rank,2.857142857142857,
CambrianDD,10800,docvqa_val_anls,0.5360831553687384,0.0062503917855996835
CambrianDD,10800,infovqa_val_anls,0.20358054292257038,0.0063419401635538405
CambrianDD,10800,mme_total_score,1067.7270908363346,
CambrianDD,10800,mmmu_val_mmmu_acc,0.28333,
CambrianDD,10800,mmstar_average,0.3324526757885013,
CambrianDD,10800,ocrbench_ocrbench_accuracy,0.368,
CambrianDD,10800,textvqa_val_exact_match,0.45008000000000004,0.006781238512185797
CambrianDD,11400,average,0.36662182455529396,
CambrianDD,11400,average_rank,1.8571428571428572,
CambrianDD,11400,docvqa_val_anls,0.5403085464525686,0.006288098035238887
CambrianDD,11400,infovqa_val_anls,0.20724387987551376,0.006422369131375898
CambrianDD,11400,mme_total_score,1090.8822529011604,
CambrianDD,11400,mmmu_val_mmmu_acc,0.29889,
CambrianDD,11400,mmstar_average,0.3195285210036817,
CambrianDD,11400,ocrbench_ocrbench_accuracy,0.38,
CambrianDD,11400,textvqa_val_exact_match,0.45375999999999994,0.006790875913984575
CambrianDD,12000,average,0.37022690841525296,
CambrianDD,12000,average_rank,2.142857142857143,
CambrianDD,12000,docvqa_val_anls,0.5329231904501042,0.00621682474881696
CambrianDD,12000,infovqa_val_anls,0.2099071605782676,0.0064660431120906045
CambrianDD,12000,mme_total_score,1029.8929571828733,
|
| 519 |
+
CambrianDD,12000,mmmu_val_mmmu_acc,0.30444,
|
| 520 |
+
CambrianDD,12000,mmstar_average,0.322291099463146,
|
| 521 |
+
CambrianDD,12000,ocrbench_ocrbench_accuracy,0.402,
|
| 522 |
+
CambrianDD,12000,textvqa_val_exact_match,0.4498,0.006790199802853561
|
| 523 |
+
CambrianDD,13200,average,0.3705817479124603,
|
| 524 |
+
CambrianDD,13200,average_rank,2.5714285714285716,
|
| 525 |
+
CambrianDD,13200,docvqa_val_anls,0.5406674097008617,0.006220185507992941
|
| 526 |
+
CambrianDD,13200,infovqa_val_anls,0.21720675802877365,0.00650836938989414
|
| 527 |
+
CambrianDD,13200,mme_total_score,1134.0421168467387,
|
| 528 |
+
CambrianDD,13200,mmmu_val_mmmu_acc,0.27889,
|
| 529 |
+
CambrianDD,13200,mmstar_average,0.3148263197451263,
|
| 530 |
+
CambrianDD,13200,ocrbench_ocrbench_accuracy,0.409,
|
| 531 |
+
CambrianDD,13200,textvqa_val_exact_match,0.4629,0.006796348730841747
|
| 532 |
+
CambrianDD,14400,average,0.3623658612664291,
|
| 533 |
+
CambrianDD,14400,average_rank,2.5714285714285716,
|
| 534 |
+
CambrianDD,14400,docvqa_val_anls,0.5152099093312626,0.006100903397549162
|
| 535 |
+
CambrianDD,14400,infovqa_val_anls,0.21109380152234544,0.006429931358574082
|
| 536 |
+
CambrianDD,14400,mme_total_score,1050.657763105242,
|
| 537 |
+
CambrianDD,14400,mmmu_val_mmmu_acc,0.28778,
|
| 538 |
+
CambrianDD,14400,mmstar_average,0.32701145674496673,
|
| 539 |
+
CambrianDD,14400,ocrbench_ocrbench_accuracy,0.383,
|
| 540 |
+
CambrianDD,14400,textvqa_val_exact_match,0.4501,0.006783833877713699
|
| 541 |
+
CambrianDD,14700,average,0.3748386548010339,
|
| 542 |
+
CambrianDD,14700,average_rank,1.4285714285714286,
|
| 543 |
+
CambrianDD,14700,docvqa_val_anls,0.5443355714236217,0.006257858861006952
|
| 544 |
+
CambrianDD,14700,infovqa_val_anls,0.21459500091962927,0.006462122780374779
|
| 545 |
+
CambrianDD,14700,mme_total_score,1105.4395758303322,
|
| 546 |
+
CambrianDD,14700,mmmu_val_mmmu_acc,0.29111,
|
| 547 |
+
CambrianDD,14700,mmstar_average,0.3272513564629525,
|
| 548 |
+
CambrianDD,14700,ocrbench_ocrbench_accuracy,0.404,
|
| 549 |
+
CambrianDD,14700,textvqa_val_exact_match,0.46774,0.006791751177480765
|
| 550 |
+
CambrianDD,15600,average,0.37528695975168413,
|
| 551 |
+
CambrianDD,15600,average_rank,2.142857142857143,
|
| 552 |
+
CambrianDD,15600,docvqa_val_anls,0.5490540524723359,0.006271460845615347
|
| 553 |
+
CambrianDD,15600,infovqa_val_anls,0.2171513714875839,0.006549339354210817
|
| 554 |
+
CambrianDD,15600,mme_total_score,1127.4101640656263,
|
| 555 |
+
CambrianDD,15600,mmmu_val_mmmu_acc,0.28556,
|
| 556 |
+
CambrianDD,15600,mmstar_average,0.332896334550185,
|
| 557 |
+
CambrianDD,15600,ocrbench_ocrbench_accuracy,0.399,
|
| 558 |
+
CambrianDD,15600,textvqa_val_exact_match,0.46806000000000003,0.006792053715831151
|
| 559 |
+
CambrianDD,16800,average,0.378379686213323,
|
| 560 |
+
CambrianDD,16800,average_rank,2.142857142857143,
|
| 561 |
+
CambrianDD,16800,docvqa_val_anls,0.5508556858421052,0.006230983486378255
|
| 562 |
+
CambrianDD,16800,infovqa_val_anls,0.22644813810901007,0.0065684324248959204
|
| 563 |
+
CambrianDD,16800,mme_total_score,956.2077831132453,
|
| 564 |
+
CambrianDD,16800,mmmu_val_mmmu_acc,0.29444,
|
| 565 |
+
CambrianDD,16800,mmstar_average,0.34207429332882255,
|
| 566 |
+
CambrianDD,16800,ocrbench_ocrbench_accuracy,0.405,
|
| 567 |
+
CambrianDD,16800,textvqa_val_exact_match,0.45146000000000003,0.00677465518462557
|
| 568 |
+
CambrianDD,17100,average,0.3745613817083588,
|
| 569 |
+
CambrianDD,17100,average_rank,1.4285714285714286,
|
| 570 |
+
CambrianDD,17100,docvqa_val_anls,0.5239343594449052,0.006067698891173559
|
| 571 |
+
CambrianDD,17100,infovqa_val_anls,0.21597294475540602,0.006475700240072832
|
| 572 |
+
CambrianDD,17100,mme_total_score,1066.7460984393756,
|
| 573 |
+
CambrianDD,17100,mmmu_val_mmmu_acc,0.3,
|
| 574 |
+
CambrianDD,17100,mmstar_average,0.34430098604984144,
|
| 575 |
+
CambrianDD,17100,ocrbench_ocrbench_accuracy,0.406,
|
| 576 |
+
CambrianDD,17100,textvqa_val_exact_match,0.45715999999999996,0.00678614450416776
|
| 577 |
+
CambrianDD,18000,average,0.37736657627182946,
|
| 578 |
+
CambrianDD,18000,average_rank,2.4285714285714284,
|
| 579 |
+
CambrianDD,18000,docvqa_val_anls,0.550171109156601,0.006266033692968377
|
| 580 |
+
CambrianDD,18000,infovqa_val_anls,0.2180520852784964,0.0064910045262362975
|
| 581 |
+
CambrianDD,18000,mme_total_score,1068.6598639455783,
|
| 582 |
+
CambrianDD,18000,mmmu_val_mmmu_acc,0.29,
|
| 583 |
+
CambrianDD,18000,mmstar_average,0.33205626319587944,
|
| 584 |
+
CambrianDD,18000,ocrbench_ocrbench_accuracy,0.409,
|
| 585 |
+
CambrianDD,18000,textvqa_val_exact_match,0.46492,0.0068105767385077025
|
| 586 |
+
CambrianDD,19200,average,0.37238254789618885,
|
| 587 |
+
CambrianDD,19200,average_rank,2.4285714285714284,
|
| 588 |
+
CambrianDD,19200,docvqa_val_anls,0.5332665411568654,0.006195231490784442
|
| 589 |
+
CambrianDD,19200,infovqa_val_anls,0.21571031377445513,0.006431739740859299
|
| 590 |
+
CambrianDD,19200,mme_total_score,1008.0998399359744,
|
| 591 |
+
CambrianDD,19200,mmmu_val_mmmu_acc,0.28444,
|
| 592 |
+
CambrianDD,19200,mmstar_average,0.33939843244581247,
|
| 593 |
+
CambrianDD,19200,ocrbench_ocrbench_accuracy,0.412,
|
| 594 |
+
CambrianDD,19200,textvqa_val_exact_match,0.44948,0.00679714544181831
|
| 595 |
+
CambrianDD,19500,average,0.3702087762443897,
|
| 596 |
+
CambrianDD,19500,average_rank,1.4285714285714286,
|
| 597 |
+
CambrianDD,19500,docvqa_val_anls,0.5327441491291284,0.006171493726324771
|
| 598 |
+
CambrianDD,19500,infovqa_val_anls,0.2134713917399994,0.006380629468185958
|
| 599 |
+
CambrianDD,19500,mme_total_score,1048.2445978391356,
|
| 600 |
+
CambrianDD,19500,mmmu_val_mmmu_acc,0.29444,
|
| 601 |
+
CambrianDD,19500,mmstar_average,0.33125711659721024,
|
| 602 |
+
CambrianDD,19500,ocrbench_ocrbench_accuracy,0.396,
|
| 603 |
+
CambrianDD,19500,textvqa_val_exact_match,0.4533400000000001,0.00679251032529976
|
| 604 |
+
LLaVaDD,300,average,0.14192111229918192,
|
| 605 |
+
LLaVaDD,300,average_rank,2.5714285714285716,
|
| 606 |
+
LLaVaDD,300,docvqa_val_anls,0.06089443514559298,0.0026836803170977547
|
| 607 |
+
LLaVaDD,300,infovqa_val_anls,0.0916406235448352,0.0046298095289004654
|
| 608 |
+
LLaVaDD,300,mme_total_score,777.2206882753101,
|
| 609 |
+
LLaVaDD,300,mmmu_val_mmmu_acc,0.24778,
|
| 610 |
+
LLaVaDD,300,mmstar_average,0.2549716151046633,
|
| 611 |
+
LLaVaDD,300,ocrbench_ocrbench_accuracy,0.118,
|
| 612 |
+
LLaVaDD,300,textvqa_val_exact_match,0.07824,0.0036768470624795064
|
| 613 |
+
LLaVaDD,1200,average,0.2509776310427157,
|
| 614 |
+
LLaVaDD,1200,average_rank,3.0,
|
| 615 |
+
LLaVaDD,1200,docvqa_val_anls,0.2444383475360029,0.005026540300329091
|
| 616 |
+
LLaVaDD,1200,infovqa_val_anls,0.15487600151177214,0.005600679946634536
|
| 617 |
+
LLaVaDD,1200,mme_total_score,860.4959983993598,
|
| 618 |
+
LLaVaDD,1200,mmmu_val_mmmu_acc,0.24667,
|
| 619 |
+
LLaVaDD,1200,mmstar_average,0.21306143720851922,
|
| 620 |
+
LLaVaDD,1200,ocrbench_ocrbench_accuracy,0.325,
|
| 621 |
+
LLaVaDD,1200,textvqa_val_exact_match,0.32182000000000005,0.006396230129691582
|
| 622 |
+
LLaVaDD,2400,average,0.29579280325109375,
|
| 623 |
+
LLaVaDD,2400,average_rank,3.0,
|
| 624 |
+
LLaVaDD,2400,docvqa_val_anls,0.31538339385878306,0.005424291843634001
|
| 625 |
+
LLaVaDD,2400,infovqa_val_anls,0.18261071688457164,0.0059828856978779545
|
| 626 |
+
LLaVaDD,2400,mme_total_score,744.4002601040415,
|
| 627 |
+
LLaVaDD,2400,mmmu_val_mmmu_acc,0.24889,
|
| 628 |
+
LLaVaDD,2400,mmstar_average,0.24909270876320772,
|
| 629 |
+
LLaVaDD,2400,ocrbench_ocrbench_accuracy,0.398,
|
| 630 |
+
LLaVaDD,2400,textvqa_val_exact_match,0.38077999999999995,0.006625050685037501
|
| 631 |
+
LLaVaDD,2700,average,0.3161939125538216,
|
| 632 |
+
LLaVaDD,2700,average_rank,2.142857142857143,
|
| 633 |
+
LLaVaDD,2700,docvqa_val_anls,0.33989172267075535,0.005558177429759288
|
| 634 |
+
LLaVaDD,2700,infovqa_val_anls,0.18986570203917563,0.0060535360821678975
|
| 635 |
+
LLaVaDD,2700,mme_total_score,794.4580832332933,
|
| 636 |
+
LLaVaDD,2700,mmmu_val_mmmu_acc,0.26667,
|
| 637 |
+
LLaVaDD,2700,mmstar_average,0.26941605061299884,
|
| 638 |
+
LLaVaDD,2700,ocrbench_ocrbench_accuracy,0.427,
|
| 639 |
+
LLaVaDD,2700,textvqa_val_exact_match,0.40432,0.006696240396453028
|
| 640 |
+
LLaVaDD,3600,average,0.32734135036250206,
|
| 641 |
+
LLaVaDD,3600,average_rank,2.5714285714285716,
|
| 642 |
+
LLaVaDD,3600,docvqa_val_anls,0.35235179662486144,0.005549556404054767
|
| 643 |
+
LLaVaDD,3600,infovqa_val_anls,0.18556296710855402,0.006043411346585987
|
| 644 |
+
LLaVaDD,3600,mme_total_score,835.5973389355743,
|
| 645 |
+
LLaVaDD,3600,mmmu_val_mmmu_acc,0.29778,
|
| 646 |
+
LLaVaDD,3600,mmstar_average,0.2915733384415969,
|
| 647 |
+
LLaVaDD,3600,ocrbench_ocrbench_accuracy,0.426,
|
| 648 |
+
LLaVaDD,3600,textvqa_val_exact_match,0.41078000000000003,0.0067073508951900115
|
| 649 |
+
LLaVaDD,4800,average,0.33013109358835874,
|
| 650 |
+
LLaVaDD,4800,average_rank,2.857142857142857,
|
| 651 |
+
LLaVaDD,4800,docvqa_val_anls,0.3502881859839653,0.005478097656928352
|
| 652 |
+
LLaVaDD,4800,infovqa_val_anls,0.19107082217989702,0.006085171603850096
|
| 653 |
+
LLaVaDD,4800,mme_total_score,733.0080032012804,
|
| 654 |
+
LLaVaDD,4800,mmmu_val_mmmu_acc,0.27,
|
| 655 |
+
LLaVaDD,4800,mmstar_average,0.32564755336629003,
|
| 656 |
+
LLaVaDD,4800,ocrbench_ocrbench_accuracy,0.424,
|
| 657 |
+
LLaVaDD,4800,textvqa_val_exact_match,0.41978000000000004,0.006734153256647549
|
| 658 |
+
LLaVaDD,5100,average,0.33484217665675037,
|
| 659 |
+
LLaVaDD,5100,average_rank,2.0,
|
| 660 |
+
LLaVaDD,5100,docvqa_val_anls,0.36535966236487605,0.005521676818896047
|
| 661 |
+
LLaVaDD,5100,infovqa_val_anls,0.18507025741281324,0.005999863664896731
|
| 662 |
+
LLaVaDD,5100,mme_total_score,782.4783913565427,
|
| 663 |
+
LLaVaDD,5100,mmmu_val_mmmu_acc,0.27333,
|
| 664 |
+
LLaVaDD,5100,mmstar_average,0.336633140162813,
|
| 665 |
+
LLaVaDD,5100,ocrbench_ocrbench_accuracy,0.427,
|
| 666 |
+
LLaVaDD,5100,textvqa_val_exact_match,0.42166000000000003,0.006745344232414143
|
| 667 |
+
LLaVaDD,6000,average,0.35016629838344665,
|
| 668 |
+
LLaVaDD,6000,average_rank,2.857142857142857,
|
| 669 |
+
LLaVaDD,6000,docvqa_val_anls,0.3972329845041029,0.005775860539243304
|
| 670 |
+
LLaVaDD,6000,infovqa_val_anls,0.2075063299082507,0.006269613699866996
|
| 671 |
+
LLaVaDD,6000,mme_total_score,793.4260704281713,
|
| 672 |
+
LLaVaDD,6000,mmmu_val_mmmu_acc,0.26778,
|
| 673 |
+
LLaVaDD,6000,mmstar_average,0.31483847588832625,
|
| 674 |
+
LLaVaDD,6000,ocrbench_ocrbench_accuracy,0.466,
|
| 675 |
+
LLaVaDD,6000,textvqa_val_exact_match,0.44764000000000004,0.006783751907166682
|
| 676 |
+
LLaVaDD,7200,average,0.34725325204788143,
|
| 677 |
+
LLaVaDD,7200,average_rank,2.857142857142857,
|
| 678 |
+
LLaVaDD,7200,docvqa_val_anls,0.38590528101197885,0.0056434459440418885
|
| 679 |
+
LLaVaDD,7200,infovqa_val_anls,0.20202261217969525,0.006207536626913416
|
| 680 |
+
LLaVaDD,7200,mme_total_score,806.6480592236894,
|
| 681 |
+
LLaVaDD,7200,mmmu_val_mmmu_acc,0.27778,
|
| 682 |
+
LLaVaDD,7200,mmstar_average,0.31109161909561434,
|
| 683 |
+
LLaVaDD,7200,ocrbench_ocrbench_accuracy,0.461,
|
| 684 |
+
LLaVaDD,7200,textvqa_val_exact_match,0.44572,0.006774357143149495
|
| 685 |
+
LLaVaDD,7500,average,0.3567314821960954,
|
| 686 |
+
LLaVaDD,7500,average_rank,2.142857142857143,
|
| 687 |
+
LLaVaDD,7500,docvqa_val_anls,0.40402326367659314,0.005709021228633167
|
| 688 |
+
LLaVaDD,7500,infovqa_val_anls,0.20360823695540417,0.006242149206646578
|
| 689 |
+
LLaVaDD,7500,mme_total_score,881.968587434974,
|
| 690 |
+
LLaVaDD,7500,mmmu_val_mmmu_acc,0.26889,
|
| 691 |
+
LLaVaDD,7500,mmstar_average,0.3211273925445752,
|
| 692 |
+
LLaVaDD,7500,ocrbench_ocrbench_accuracy,0.486,
|
| 693 |
+
LLaVaDD,7500,textvqa_val_exact_match,0.45674,0.006794008109300271
|
| 694 |
+
LLaVaDD,8400,average,0.36048790647619494,
|
| 695 |
+
LLaVaDD,8400,average_rank,3.0,
|
| 696 |
+
LLaVaDD,8400,docvqa_val_anls,0.41445027737785084,0.005825413484689958
|
| 697 |
+
LLaVaDD,8400,infovqa_val_anls,0.2172068852347218,0.006375888876018907
|
| 698 |
+
LLaVaDD,8400,mme_total_score,838.7092837134853,
|
| 699 |
+
LLaVaDD,8400,mmmu_val_mmmu_acc,0.29444,
|
| 700 |
+
LLaVaDD,8400,mmstar_average,0.31933027624459676,
|
| 701 |
+
LLaVaDD,8400,ocrbench_ocrbench_accuracy,0.473,
|
| 702 |
+
LLaVaDD,8400,textvqa_val_exact_match,0.4445,0.006768213334577188
|
| 703 |
+
LLaVaDD,9600,average,0.35282227960826557,
|
| 704 |
+
LLaVaDD,9600,average_rank,3.0,
|
| 705 |
+
LLaVaDD,9600,docvqa_val_anls,0.39757298714048717,0.005640323691319893
|
| 706 |
+
LLaVaDD,9600,infovqa_val_anls,0.2056866550403572,0.006279512757986692
|
| 707 |
+
LLaVaDD,9600,mme_total_score,760.0508203281312,
|
| 708 |
+
LLaVaDD,9600,mmmu_val_mmmu_acc,0.26556,
|
| 709 |
+
LLaVaDD,9600,mmstar_average,0.32465403546874877,
|
| 710 |
+
LLaVaDD,9600,ocrbench_ocrbench_accuracy,0.469,
|
| 711 |
+
LLaVaDD,9600,textvqa_val_exact_match,0.45446000000000003,0.006778466729448514
|
| 712 |
+
LLaVaDD,9900,average,0.3568107860471943,
|
| 713 |
+
LLaVaDD,9900,average_rank,2.0,
|
| 714 |
+
LLaVaDD,9900,docvqa_val_anls,0.40618497552083077,0.005714626310350028
|
| 715 |
+
LLaVaDD,9900,infovqa_val_anls,0.2062911109085894,0.006269310831066159
|
| 716 |
+
LLaVaDD,9900,mme_total_score,828.2609043617447,
|
| 717 |
+
LLaVaDD,9900,mmmu_val_mmmu_acc,0.26667,
|
| 718 |
+
LLaVaDD,9900,mmstar_average,0.32645862985374585,
|
| 719 |
+
LLaVaDD,9900,ocrbench_ocrbench_accuracy,0.473,
|
| 720 |
+
LLaVaDD,9900,textvqa_val_exact_match,0.46225999999999995,0.006800763821638828
|
| 721 |
+
LLaVaDD,10800,average,0.36137323363878177,
|
| 722 |
+
LLaVaDD,10800,average_rank,2.857142857142857,
|
| 723 |
+
LLaVaDD,10800,docvqa_val_anls,0.408003869061574,0.005694760075750652
|
| 724 |
+
LLaVaDD,10800,infovqa_val_anls,0.21338055182077123,0.0063085701231859895
|
| 725 |
+
LLaVaDD,10800,mme_total_score,895.123949579832,
|
| 726 |
+
LLaVaDD,10800,mmmu_val_mmmu_acc,0.28444,
|
| 727 |
+
LLaVaDD,10800,mmstar_average,0.32415498095034523,
|
| 728 |
+
LLaVaDD,10800,ocrbench_ocrbench_accuracy,0.48,
|
| 729 |
+
LLaVaDD,10800,textvqa_val_exact_match,0.45826000000000006,0.0067923767383995465
|
| 730 |
+
LLaVaDD,11400,average,0.36018553894261496,
|
| 731 |
+
LLaVaDD,11400,average_rank,1.7142857142857142,
|
| 732 |
+
LLaVaDD,11400,docvqa_val_anls,0.4075278809955403,0.005745964403945676
|
| 733 |
+
LLaVaDD,11400,infovqa_val_anls,0.21426494246529132,0.0063051564080262214
|
| 734 |
+
LLaVaDD,11400,mme_total_score,924.1244497799119,
|
| 735 |
+
LLaVaDD,11400,mmmu_val_mmmu_acc,0.27222,
|
| 736 |
+
LLaVaDD,11400,mmstar_average,0.33054041019485814,
|
| 737 |
+
LLaVaDD,11400,ocrbench_ocrbench_accuracy,0.478,
|
| 738 |
+
LLaVaDD,11400,textvqa_val_exact_match,0.45856,0.0067880601670997605
|
| 739 |
+
LLaVaDD,12000,average,0.3604179175862374,
|
| 740 |
+
LLaVaDD,12000,average_rank,3.142857142857143,
|
| 741 |
+
LLaVaDD,12000,docvqa_val_anls,0.4168137297955176,0.005781419012200098
|
| 742 |
+
LLaVaDD,12000,infovqa_val_anls,0.20846969163165352,0.006257602143639074
|
| 743 |
+
LLaVaDD,12000,mme_total_score,947.7637054821929,
|
| 744 |
+
LLaVaDD,12000,mmmu_val_mmmu_acc,0.26778,
|
| 745 |
+
LLaVaDD,12000,mmstar_average,0.3158840840902534,
|
| 746 |
+
LLaVaDD,12000,ocrbench_ocrbench_accuracy,0.498,
|
| 747 |
+
LLaVaDD,12000,textvqa_val_exact_match,0.45556,0.006805283216437887
|
| 748 |
+
LLaVaDD,13200,average,0.3615047561957609,
|
| 749 |
+
LLaVaDD,13200,average_rank,2.5714285714285716,
|
| 750 |
+
LLaVaDD,13200,docvqa_val_anls,0.4019041370209914,0.005625248973963487
|
| 751 |
+
LLaVaDD,13200,infovqa_val_anls,0.20363163177627105,0.00622309778124376
|
| 752 |
+
LLaVaDD,13200,mme_total_score,874.9429771908764,
|
| 753 |
+
LLaVaDD,13200,mmmu_val_mmmu_acc,0.28111,
|
| 754 |
+
LLaVaDD,13200,mmstar_average,0.3177427683773028,
|
| 755 |
+
LLaVaDD,13200,ocrbench_ocrbench_accuracy,0.494,
|
| 756 |
+
LLaVaDD,13200,textvqa_val_exact_match,0.47063999999999995,0.006828303099939613
|
| 757 |
+
LLaVaDD,14400,average,0.35822126770736845,
|
| 758 |
+
LLaVaDD,14400,average_rank,2.857142857142857,
|
| 759 |
+
LLaVaDD,14400,docvqa_val_anls,0.40475932408589743,0.005711979175622161
|
| 760 |
+
LLaVaDD,14400,infovqa_val_anls,0.2054455223584203,0.006260304272567981
|
| 761 |
+
LLaVaDD,14400,mme_total_score,895.4330732292917,
|
| 762 |
+
LLaVaDD,14400,mmmu_val_mmmu_acc,0.24889,
|
| 763 |
+
LLaVaDD,14400,mmstar_average,0.3293727597998932,
|
| 764 |
+
LLaVaDD,14400,ocrbench_ocrbench_accuracy,0.486,
|
| 765 |
+
LLaVaDD,14400,textvqa_val_exact_match,0.47486000000000006,0.006809762892651316
|
| 766 |
+
LLaVaDD,14700,average,0.3558320320318881,
|
| 767 |
+
LLaVaDD,14700,average_rank,2.2857142857142856,
|
| 768 |
+
LLaVaDD,14700,docvqa_val_anls,0.40809052878460395,0.0057531214312033715
|
| 769 |
+
LLaVaDD,14700,infovqa_val_anls,0.20725347402609343,0.006333426981708809
|
| 770 |
+
LLaVaDD,14700,mme_total_score,934.3972589035614,
|
| 771 |
+
LLaVaDD,14700,mmmu_val_mmmu_acc,0.25889,
|
| 772 |
+
LLaVaDD,14700,mmstar_average,0.3138981893806314,
|
| 773 |
+
LLaVaDD,14700,ocrbench_ocrbench_accuracy,0.474,
|
| 774 |
+
LLaVaDD,14700,textvqa_val_exact_match,0.47286000000000006,0.0068163316054393255
|
| 775 |
+
LLaVaDD,15600,average,0.3531190433154545,
|
| 776 |
+
LLaVaDD,15600,average_rank,3.2857142857142856,
|
| 777 |
+
LLaVaDD,15600,docvqa_val_anls,0.39525955886140174,0.005587329122981871
|
| 778 |
+
LLaVaDD,15600,infovqa_val_anls,0.20744642424798548,0.006270716143292359
|
| 779 |
+
LLaVaDD,15600,mme_total_score,887.2177871148459,
|
| 780 |
+
LLaVaDD,15600,mmmu_val_mmmu_acc,0.25111,
|
| 781 |
+
LLaVaDD,15600,mmstar_average,0.3150982767833397,
|
| 782 |
+
LLaVaDD,15600,ocrbench_ocrbench_accuracy,0.483,
|
| 783 |
+
LLaVaDD,15600,textvqa_val_exact_match,0.4668000000000001,0.00682372806117965
|
| 784 |
+
LLaVaDD,16800,average,0.35105363138195517,
|
| 785 |
+
LLaVaDD,16800,average_rank,3.2857142857142856,
|
| 786 |
+
LLaVaDD,16800,docvqa_val_anls,0.41852303319453404,0.005850640721947784
|
| 787 |
+
LLaVaDD,16800,infovqa_val_anls,0.2060249552494562,0.006276240807887592
|
| 788 |
+
LLaVaDD,16800,mme_total_score,922.4671868747499,
|
| 789 |
+
LLaVaDD,16800,mmmu_val_mmmu_acc,0.26444,
|
| 790 |
+
LLaVaDD,16800,mmstar_average,0.2870137998477405,
|
| 791 |
+
LLaVaDD,16800,ocrbench_ocrbench_accuracy,0.476,
|
| 792 |
+
LLaVaDD,16800,textvqa_val_exact_match,0.45432,0.006822490512661711
|
| 793 |
+
LLaVaDD,17100,average,0.3539392341852292,
|
| 794 |
+
LLaVaDD,17100,average_rank,2.2857142857142856,
|
| 795 |
+
LLaVaDD,17100,docvqa_val_anls,0.3926126439334205,0.005597953615807782
|
| 796 |
+
LLaVaDD,17100,infovqa_val_anls,0.19981020781200884,0.006157098782486468
|
| 797 |
+
LLaVaDD,17100,mme_total_score,906.0204081632653,
|
| 798 |
+
LLaVaDD,17100,mmmu_val_mmmu_acc,0.26333,
|
| 799 |
+
LLaVaDD,17100,mmstar_average,0.32212255336594614,
|
| 800 |
+
LLaVaDD,17100,ocrbench_ocrbench_accuracy,0.486,
|
| 801 |
+
LLaVaDD,17100,textvqa_val_exact_match,0.45975999999999995,0.006811389541459004
|
| 802 |
+
LLaVaDD,18000,average,0.3577636224241274,
|
| 803 |
+
LLaVaDD,18000,average_rank,3.142857142857143,
|
| 804 |
+
LLaVaDD,18000,docvqa_val_anls,0.40703277772305824,0.005678461401864167
|
| 805 |
+
LLaVaDD,18000,infovqa_val_anls,0.20110975485759583,0.006181644423856455
|
| 806 |
+
LLaVaDD,18000,mme_total_score,810.5214085634254,
|
| 807 |
+
LLaVaDD,18000,mmmu_val_mmmu_acc,0.26444,
|
| 808 |
+
LLaVaDD,18000,mmstar_average,0.3233392019641104,
|
| 809 |
+
LLaVaDD,18000,ocrbench_ocrbench_accuracy,0.482,
|
| 810 |
+
LLaVaDD,18000,textvqa_val_exact_match,0.46865999999999997,0.006819241988099444
|
| 811 |
+
LLaVaDD,19200,average,0.35213697279154416,
|
| 812 |
+
LLaVaDD,19200,average_rank,3.2857142857142856,
|
| 813 |
+
LLaVaDD,19200,docvqa_val_anls,0.40393359954160324,0.0057202986837765315
|
| 814 |
+
LLaVaDD,19200,infovqa_val_anls,0.19769978171894423,0.006187032583796771
|
| 815 |
+
LLaVaDD,19200,mme_total_score,918.2750100040016,
|
| 816 |
+
LLaVaDD,19200,mmmu_val_mmmu_acc,0.26778,
|
| 817 |
+
LLaVaDD,19200,mmstar_average,0.31024845548871743,
|
| 818 |
+
LLaVaDD,19200,ocrbench_ocrbench_accuracy,0.478,
|
| 819 |
+
LLaVaDD,19200,textvqa_val_exact_match,0.45516,0.006808813910614232
|
| 820 |
+
LLaVaDD,19500,average,0.3515463502032892,
|
| 821 |
+
LLaVaDD,19500,average_rank,2.142857142857143,
|
| 822 |
+
LLaVaDD,19500,docvqa_val_anls,0.4008490038529374,0.00568561290008947
|
| 823 |
+
LLaVaDD,19500,infovqa_val_anls,0.2007011465723568,0.00621601219684737
|
| 824 |
+
LLaVaDD,19500,mme_total_score,817.8963585434174,
|
| 825 |
+
LLaVaDD,19500,mmmu_val_mmmu_acc,0.26556,
|
| 826 |
+
LLaVaDD,19500,mmstar_average,0.3017679507944412,
|
| 827 |
+
LLaVaDD,19500,ocrbench_ocrbench_accuracy,0.48,
|
| 828 |
+
LLaVaDD,19500,textvqa_val_exact_match,0.4604,0.006801913054739883
|
app/src/content/assets/data/all_ratings_luis.csv
ADDED
|
@@ -0,0 +1,1201 @@
|
| 1 |
+
run,step,metric,value,stderr
|
| 2 |
+
Baseline,1000,ai2d_exact_match,0.2548575129533679,0.007843322436924496
|
| 3 |
+
Baseline,1000,average,0.27120689295763617,
|
| 4 |
+
Baseline,1000,average_rank,3.0,
|
| 5 |
+
Baseline,1000,chartqa_relaxed_overall,0.3308,0.009411906161401973
|
| 6 |
+
Baseline,1000,docvqa_val_anls,0.3528553494243383,0.005852289239342309
|
| 7 |
+
Baseline,1000,infovqa_val_anls,0.17320578642581314,0.006297063452679795
|
| 8 |
+
Baseline,1000,mme_total_score,977.4280712284914,
|
| 9 |
+
Baseline,1000,mmmu_val_mmmu_acc,0.25222,
|
| 10 |
+
Baseline,1000,mmstar_average,0.23215874078908072,
|
| 11 |
+
Baseline,1000,ocrbench_ocrbench_accuracy,0.286,
|
| 12 |
+
Baseline,1000,seedbench_seed_all,0.2563646470261256,
|
| 13 |
+
Baseline,1000,textvqa_val_exact_match,0.3024,0.00628900296642181
|
| 14 |
+
Baseline,2000,ai2d_exact_match,0.26295336787564766,0.007923526907377255
|
| 15 |
+
Baseline,2000,average,0.3202068275596269,
|
| 16 |
+
Baseline,2000,average_rank,2.8,
|
| 17 |
+
Baseline,2000,chartqa_relaxed_overall,0.4688,0.009982508912777261
|
| 18 |
+
Baseline,2000,docvqa_val_anls,0.4452261510942785,0.00614755494712251
|
| 19 |
+
Baseline,2000,infovqa_val_anls,0.1820547866557169,0.006217861455795791
|
| 20 |
+
Baseline,2000,mme_total_score,1049.3036214485794,
|
| 21 |
+
Baseline,2000,mmmu_val_mmmu_acc,0.24556,
|
| 22 |
+
Baseline,2000,mmstar_average,0.21305462434540698,
|
| 23 |
+
Baseline,2000,ocrbench_ocrbench_accuracy,0.395,
|
| 24 |
+
Baseline,2000,seedbench_seed_all,0.258532518065592,
|
| 25 |
+
Baseline,2000,textvqa_val_exact_match,0.41068000000000005,0.006697862330024289
|
| 26 |
+
Baseline,3000,ai2d_exact_match,0.25226683937823835,0.007816909588794397
|
| 27 |
+
Baseline,3000,average,0.3507423834414229,
|
| 28 |
+
Baseline,3000,average_rank,2.6,
|
| 29 |
+
Baseline,3000,chartqa_relaxed_overall,0.5028,0.010001843767601082
|
| 30 |
+
Baseline,3000,docvqa_val_anls,0.502653993831009,0.006267072346683124
|
| 31 |
+
Baseline,3000,infovqa_val_anls,0.21728617578189535,0.006796941784959762
|
| 32 |
+
Baseline,3000,mme_total_score,1170.2383953581434,
|
| 33 |
+
Baseline,3000,mmmu_val_mmmu_acc,0.27556,
|
| 34 |
+
Baseline,3000,mmstar_average,0.25432376938577683,
|
| 35 |
+
Baseline,3000,ocrbench_ocrbench_accuracy,0.436,
|
| 36 |
+
Baseline,3000,seedbench_seed_all,0.2792106725958866,
|
| 37 |
+
Baseline,3000,textvqa_val_exact_match,0.43658,0.006766885462882726
|
| 38 |
+
Baseline,4000,ai2d_exact_match,0.2645725388601036,0.007939149662089447
|
| 39 |
+
Baseline,4000,average,0.36961781722974835,
|
| 40 |
+
Baseline,4000,average_rank,2.8,
|
| 41 |
+
Baseline,4000,chartqa_relaxed_overall,0.5312,0.009982508912777261
|
| 42 |
+
Baseline,4000,docvqa_val_anls,0.5374434618615119,0.0062905728113059655
|
| 43 |
+
Baseline,4000,infovqa_val_anls,0.2287924838861707,0.006994568698639919
|
| 44 |
+
Baseline,4000,mme_total_score,1155.203781512605,
|
| 45 |
+
Baseline,4000,mmmu_val_mmmu_acc,0.25556,
|
| 46 |
+
Baseline,4000,mmstar_average,0.2575590188757354,
|
| 47 |
+
Baseline,4000,ocrbench_ocrbench_accuracy,0.453,
|
| 48 |
+
Baseline,4000,seedbench_seed_all,0.33913285158421347,
|
| 49 |
+
Baseline,4000,textvqa_val_exact_match,0.4593,0.006791695475025738
|
| 50 |
+
Baseline,5000,ai2d_exact_match,0.3125,0.008342439145556371
|
| 51 |
+
Baseline,5000,average,0.3974627910380972,
|
| 52 |
+
Baseline,5000,average_rank,2.3,
|
| 53 |
+
Baseline,5000,chartqa_relaxed_overall,0.5488,0.00995424828018316
|
| 54 |
+
Baseline,5000,docvqa_val_anls,0.552360266782429,0.006300308519952055
|
| 55 |
+
Baseline,5000,infovqa_val_anls,0.23425555286643698,0.007002254622066442
|
| 56 |
+
Baseline,5000,mme_total_score,1181.4653861544618,
|
| 57 |
+
Baseline,5000,mmmu_val_mmmu_acc,0.26667,
|
| 58 |
+
Baseline,5000,mmstar_average,0.29596648146165705,
|
| 59 |
+
Baseline,5000,ocrbench_ocrbench_accuracy,0.462,
|
| 60 |
+
Baseline,5000,seedbench_seed_all,0.43107281823235133,
|
| 61 |
+
Baseline,5000,textvqa_val_exact_match,0.47354000000000007,0.0068172185364497985
Baseline,6000,ai2d_exact_match,0.358160621761658,0.008629463221867162
Baseline,6000,average,0.4161227404571003,
Baseline,6000,average_rank,2.3,
Baseline,6000,chartqa_relaxed_overall,0.5628,0.00992279440175477
Baseline,6000,docvqa_val_anls,0.5747451497228876,0.00625495440870239
Baseline,6000,infovqa_val_anls,0.22152017368968838,0.006604546680525351
Baseline,6000,mme_total_score,1284.1648659463785,
Baseline,6000,mmmu_val_mmmu_acc,0.27111,
Baseline,6000,mmstar_average,0.2978489412854164,
Baseline,6000,ocrbench_ocrbench_accuracy,0.495,
Baseline,6000,seedbench_seed_all,0.4795997776542524,
Baseline,6000,textvqa_val_exact_match,0.48432,0.006800535050670284
Baseline,7000,ai2d_exact_match,0.3707901554404145,0.00869347755587734
Baseline,7000,average,0.4291083177345374,
Baseline,7000,average_rank,2.1,
Baseline,7000,chartqa_relaxed_overall,0.5656,0.009915542506251351
Baseline,7000,docvqa_val_anls,0.5940907049431567,0.006224236305767187
Baseline,7000,infovqa_val_anls,0.2515675215816963,0.007105097396092786
Baseline,7000,mme_total_score,1185.875650260104,
Baseline,7000,mmmu_val_mmmu_acc,0.26556,
Baseline,7000,mmstar_average,0.31372400960777047,
Baseline,7000,ocrbench_ocrbench_accuracy,0.504,
Baseline,7000,seedbench_seed_all,0.4964424680377988,
Baseline,7000,textvqa_val_exact_match,0.5002,0.006794794025220267
Baseline,8000,ai2d_exact_match,0.37759067357512954,0.008725299846043883
Baseline,8000,average,0.43846759477995995,
Baseline,8000,average_rank,1.9,
Baseline,8000,chartqa_relaxed_overall,0.5832,0.009862556058385773
Baseline,8000,docvqa_val_anls,0.6017336419437208,0.006231612198089698
Baseline,8000,infovqa_val_anls,0.2449256624147254,0.006992518502948913
Baseline,8000,mme_total_score,1199.2409963985594,
Baseline,8000,mmmu_val_mmmu_acc,0.28111,
Baseline,8000,mmstar_average,0.33512257186205047,
Baseline,8000,ocrbench_ocrbench_accuracy,0.51,
Baseline,8000,seedbench_seed_all,0.5024458032240133,
Baseline,8000,textvqa_val_exact_match,0.51008,0.006796301690135059
Baseline,9000,ai2d_exact_match,0.4067357512953368,0.008841214921078996
Baseline,9000,average,0.4422510732201056,
Baseline,9000,average_rank,2.5,
Baseline,9000,chartqa_relaxed_overall,0.5912,0.009834211136815875
Baseline,9000,docvqa_val_anls,0.6170968481662739,0.00617235763542544
Baseline,9000,infovqa_val_anls,0.23537031288570615,0.00670318154156447
Baseline,9000,mme_total_score,1231.5195078031213,
Baseline,9000,mmmu_val_mmmu_acc,0.25889,
Baseline,9000,mmstar_average,0.3216444898242951,
Baseline,9000,ocrbench_ocrbench_accuracy,0.515,
Baseline,9000,seedbench_seed_all,0.5120622568093385,
Baseline,9000,textvqa_val_exact_match,0.52226,0.006792711289708482
Baseline,10000,ai2d_exact_match,0.39993523316062174,0.008817096257082848
Baseline,10000,average,0.4523875703250908,
Baseline,10000,average_rank,1.9,
Baseline,10000,chartqa_relaxed_overall,0.5996,0.00980154906867574
Baseline,10000,docvqa_val_anls,0.6262613496433054,0.006147756371688175
Baseline,10000,infovqa_val_anls,0.263290074230132,0.007186788766942786
Baseline,10000,mme_total_score,1240.8218287314926,
Baseline,10000,mmmu_val_mmmu_acc,0.28778,
Baseline,10000,mmstar_average,0.32972717906018517,
Baseline,10000,ocrbench_ocrbench_accuracy,0.517,
Baseline,10000,seedbench_seed_all,0.5217342968315731,
Baseline,10000,textvqa_val_exact_match,0.5261600000000001,0.006785774843600811
Baseline,11000,ai2d_exact_match,0.422279792746114,0.008889771831066474
Baseline,11000,average,0.4561398159525099,
Baseline,11000,average_rank,2.3,
Baseline,11000,chartqa_relaxed_overall,0.6104,0.009755142291143075
Baseline,11000,docvqa_val_anls,0.6373130149166712,0.006128022584995044
Baseline,11000,infovqa_val_anls,0.24419378339723755,0.006897644885887063
Baseline,11000,mme_total_score,1322.9488795518205,
Baseline,11000,mmmu_val_mmmu_acc,0.27778,
Baseline,11000,mmstar_average,0.3298563439522548,
Baseline,11000,ocrbench_ocrbench_accuracy,0.521,
Baseline,11000,seedbench_seed_all,0.5237354085603113,
Baseline,11000,textvqa_val_exact_match,0.5387,0.006770851562852138
Baseline,12000,ai2d_exact_match,0.42001295336787564,0.008883255931688034
Baseline,12000,average,0.4582751140055433,
Baseline,12000,average_rank,2.4,
Baseline,12000,chartqa_relaxed_overall,0.618,0.009719474639861454
Baseline,12000,docvqa_val_anls,0.6393961983751871,0.0061228747388476674
Baseline,12000,infovqa_val_anls,0.24798874058574302,0.006855374548993139
Baseline,12000,mme_total_score,1225.6453581432572,
Baseline,12000,mmmu_val_mmmu_acc,0.27889,
Baseline,12000,mmstar_average,0.34010867846816534,
Baseline,12000,ocrbench_ocrbench_accuracy,0.512,
Baseline,12000,seedbench_seed_all,0.5350194552529183,
Baseline,12000,textvqa_val_exact_match,0.5330600000000001,0.006777713092109446
Baseline,13000,ai2d_exact_match,0.4375,0.008928571428571428
Baseline,13000,average,0.4692868662590049,
Baseline,13000,average_rank,1.7,
Baseline,13000,chartqa_relaxed_overall,0.6148,0.00973479791861169
Baseline,13000,docvqa_val_anls,0.6511374872549951,0.006086953065248391
Baseline,13000,infovqa_val_anls,0.24465055100441893,0.006808432538374664
Baseline,13000,mme_total_score,1281.7122849139657,
Baseline,13000,mmmu_val_mmmu_acc,0.28222,
Baseline,13000,mmstar_average,0.3453069542917521,
Baseline,13000,ocrbench_ocrbench_accuracy,0.549,
Baseline,13000,seedbench_seed_all,0.5442468037798777,
Baseline,13000,textvqa_val_exact_match,0.55472,0.0067416788982325
Baseline,14000,ai2d_exact_match,0.4572538860103627,0.00896620675297095
Baseline,14000,average,0.47352486841689195,
Baseline,14000,average_rank,1.9,
Baseline,14000,chartqa_relaxed_overall,0.6172,0.009723347231923635
Baseline,14000,docvqa_val_anls,0.6502269393708169,0.006057950730638126
Baseline,14000,infovqa_val_anls,0.25805460837190913,0.007037735231659539
Baseline,14000,mme_total_score,1309.1444577831132,
Baseline,14000,mmmu_val_mmmu_acc,0.28111,
Baseline,14000,mmstar_average,0.34575818188776586,
Baseline,14000,ocrbench_ocrbench_accuracy,0.551,
Baseline,14000,seedbench_seed_all,0.5483602001111729,
Baseline,14000,textvqa_val_exact_match,0.55276,0.006751206724612103
Baseline,15000,ai2d_exact_match,0.45045336787564766,0.008954861634252399
Baseline,15000,average,0.47878665012878824,
Baseline,15000,average_rank,1.4,
Baseline,15000,chartqa_relaxed_overall,0.612,0.009747841205275417
Baseline,15000,docvqa_val_anls,0.6621413031955148,0.006056838050222495
Baseline,15000,infovqa_val_anls,0.2706898598157733,0.007200315730154543
Baseline,15000,mme_total_score,1384.2171868747498,
Baseline,15000,mmmu_val_mmmu_acc,0.30222,
Baseline,15000,mmstar_average,0.35408135695920684,
Baseline,15000,ocrbench_ocrbench_accuracy,0.558,
Baseline,15000,seedbench_seed_all,0.5411339633129516,
Baseline,15000,textvqa_val_exact_match,0.5583600000000001,0.0067279027203879065
Baseline,16000,ai2d_exact_match,0.45077720207253885,0.008955440137395838
Baseline,16000,average,0.47665128022935843,
Baseline,16000,average_rank,2.1,
Baseline,16000,chartqa_relaxed_overall,0.632,0.00964715642305132
Baseline,16000,docvqa_val_anls,0.6709415729142987,0.005999818105621502
Baseline,16000,infovqa_val_anls,0.26050032542402035,0.006997451875879188
Baseline,16000,mme_total_score,1317.8491396558625,
Baseline,16000,mmmu_val_mmmu_acc,0.27556,
Baseline,16000,mmstar_average,0.33214333327093315,
Baseline,16000,ocrbench_ocrbench_accuracy,0.56,
Baseline,16000,seedbench_seed_all,0.5463590883824346,
Baseline,16000,textvqa_val_exact_match,0.56158,0.006723854754867398
Baseline,17000,ai2d_exact_match,0.45919689119170987,0.008969138793675545
Baseline,17000,average,0.4777141780162423,
Baseline,17000,average_rank,1.8,
Baseline,17000,chartqa_relaxed_overall,0.632,0.00964715642305132
Baseline,17000,docvqa_val_anls,0.6796338519136422,0.005948761388267941
Baseline,17000,infovqa_val_anls,0.28070956072505215,0.007298333094144192
Baseline,17000,mme_total_score,1381.9161664665867,
Baseline,17000,mmmu_val_mmmu_acc,0.27667,
Baseline,17000,mmstar_average,0.3370289492329521,
Baseline,17000,ocrbench_ocrbench_accuracy,0.519,
Baseline,17000,seedbench_seed_all,0.5510283490828238,
Baseline,17000,textvqa_val_exact_match,0.56416,0.006724830373229479
Baseline,18000,ai2d_exact_match,0.46567357512953367,0.008977921602780726
Baseline,18000,average,0.4819834595278701,
Baseline,18000,average_rank,1.6,
Baseline,18000,chartqa_relaxed_overall,0.6376,0.009615793331418735
Baseline,18000,docvqa_val_anls,0.6775884603912571,0.005972234236435759
Baseline,18000,infovqa_val_anls,0.27154318420389256,0.007164903131667027
Baseline,18000,mme_total_score,1336.922769107643,
Baseline,18000,mmmu_val_mmmu_acc,0.28667,
Baseline,18000,mmstar_average,0.34482796716566916,
Baseline,18000,ocrbench_ocrbench_accuracy,0.533,
Baseline,18000,seedbench_seed_all,0.5543079488604781,
Baseline,18000,textvqa_val_exact_match,0.5666399999999999,0.006713392287599574
Baseline,19000,ai2d_exact_match,0.4682642487046632,0.008981008686994101
Baseline,19000,average,0.4899006713916878,
Baseline,19000,average_rank,1.4,
Baseline,19000,chartqa_relaxed_overall,0.6444,0.009575809858898698
Baseline,19000,docvqa_val_anls,0.678226526479947,0.005970619221588814
Baseline,19000,infovqa_val_anls,0.26993847247278,0.0071348470764911525
Baseline,19000,mme_total_score,1406.6628651460583,
Baseline,19000,mmmu_val_mmmu_acc,0.28333,
Baseline,19000,mmstar_average,0.356220913822775,
Baseline,19000,ocrbench_ocrbench_accuracy,0.577,
Baseline,19000,seedbench_seed_all,0.554585881045025,
Baseline,19000,textvqa_val_exact_match,0.57714,0.0066918487914812905
Baseline,20000,ai2d_exact_match,0.47571243523316065,0.00898853090258662
Baseline,20000,average,0.4873169067639118,
Baseline,20000,average_rank,1.4,
Baseline,20000,chartqa_relaxed_overall,0.6336,0.009638338810708618
Baseline,20000,docvqa_val_anls,0.6895214454380043,0.005896462073053767
Baseline,20000,infovqa_val_anls,0.2655657550458317,0.007033265532032538
Baseline,20000,mme_total_score,1324.6738695478193,
Baseline,20000,mmmu_val_mmmu_acc,0.30111,
Baseline,20000,mmstar_average,0.33806766134497995,
Baseline,20000,ocrbench_ocrbench_accuracy,0.555,
Baseline,20000,seedbench_seed_all,0.5587548638132296,
Baseline,20000,textvqa_val_exact_match,0.56852,0.006720151338087659
≥2,1000,ai2d_exact_match,0.27331606217616583,0.008021157484423315
≥2,1000,average,0.2964817591841572,
≥2,1000,average_rank,2.0,
≥2,1000,chartqa_relaxed_overall,0.4016,0.009806398022560107
≥2,1000,docvqa_val_anls,0.38703197724603455,0.0059317827343935035
≥2,1000,infovqa_val_anls,0.17280000404070578,0.006201144732918485
≥2,1000,mme_total_score,961.9496798719488,
≥2,1000,mmmu_val_mmmu_acc,0.27556,
≥2,1000,mmstar_average,0.20051212493658782,
≥2,1000,ocrbench_ocrbench_accuracy,0.331,
≥2,1000,seedbench_seed_all,0.25219566425792106,
≥2,1000,textvqa_val_exact_match,0.37432,0.006614110432353112
≥2,2000,ai2d_exact_match,0.27428756476683935,0.008030027397236182
≥2,2000,average,0.3376151239444176,
≥2,2000,average_rank,1.8,
≥2,2000,chartqa_relaxed_overall,0.4984,0.010001949389825897
≥2,2000,docvqa_val_anls,0.47035044389194575,0.006171152822696564
≥2,2000,infovqa_val_anls,0.21264444578610614,0.006798221032077756
≥2,2000,mme_total_score,995.0442176870747,
≥2,2000,mmmu_val_mmmu_acc,0.26111,
≥2,2000,mmstar_average,0.2371410151404708,
≥2,2000,ocrbench_ocrbench_accuracy,0.386,
≥2,2000,seedbench_seed_all,0.27276264591439686,
≥2,2000,textvqa_val_exact_match,0.42583999999999994,0.006752390527477444
≥2,3000,ai2d_exact_match,0.28886010362694303,0.008157423105367313
≥2,3000,average,0.3650476191493284,
≥2,3000,average_rank,2.1,
≥2,3000,chartqa_relaxed_overall,0.5296,0.009984458511341809
≥2,3000,docvqa_val_anls,0.5084048093337913,0.006266409805144786
≥2,3000,infovqa_val_anls,0.226696840609911,0.0070183318907300766
≥2,3000,mme_total_score,966.6394557823129,
≥2,3000,mmmu_val_mmmu_acc,0.27556,
≥2,3000,mmstar_average,0.25798680765602255,
≥2,3000,ocrbench_ocrbench_accuracy,0.423,
≥2,3000,seedbench_seed_all,0.3360200111172874,
≥2,3000,textvqa_val_exact_match,0.4393,0.0067683280101374045
≥2,4000,ai2d_exact_match,0.3180051813471503,0.00838183912252989
≥2,4000,average,0.3939919625964655,
≥2,4000,average_rank,2.0,
≥2,4000,chartqa_relaxed_overall,0.5392,0.009971214271372281
≥2,4000,docvqa_val_anls,0.5318426170932731,0.006287567577266625
≥2,4000,infovqa_val_anls,0.24176968468370258,0.007226680233814427
≥2,4000,mme_total_score,1052.9128651460585,
≥2,4000,mmmu_val_mmmu_acc,0.27778,
≥2,4000,mmstar_average,0.30433696178936676,
≥2,4000,ocrbench_ocrbench_accuracy,0.447,
≥2,4000,seedbench_seed_all,0.42779321845469703,
≥2,4000,textvqa_val_exact_match,0.4581999999999999,0.006800867765254084
≥2,5000,ai2d_exact_match,0.3448834196891192,0.008555140353607655
≥2,5000,average,0.40963271881608265,
≥2,5000,average_rank,2.1,
≥2,5000,chartqa_relaxed_overall,0.548,0.009955804699716018
≥2,5000,docvqa_val_anls,0.575799913178854,0.006211088978189562
≥2,5000,infovqa_val_anls,0.25711323262099633,0.0073775881337487925
≥2,5000,mme_total_score,1010.4850940376151,
≥2,5000,mmmu_val_mmmu_acc,0.27667,
≥2,5000,mmstar_average,0.2871021117490485,
≥2,5000,ocrbench_ocrbench_accuracy,0.455,
≥2,5000,seedbench_seed_all,0.46642579210672597,
≥2,5000,textvqa_val_exact_match,0.4757,0.006785477915527278
≥2,6000,ai2d_exact_match,0.3795336787564767,0.008734055590837087
≥2,6000,average,0.423161039572533,
≥2,6000,average_rank,1.4,
≥2,6000,chartqa_relaxed_overall,0.5668,0.009912336039617753
≥2,6000,docvqa_val_anls,0.5827000147792567,0.006217654063020532
≥2,6000,infovqa_val_anls,0.24558020684647988,0.0071473774205313935
≥2,6000,mme_total_score,1096.4623849539817,
≥2,6000,mmmu_val_mmmu_acc,0.27222,
≥2,6000,mmstar_average,0.3026938215293386,
≥2,6000,ocrbench_ocrbench_accuracy,0.475,
≥2,6000,seedbench_seed_all,0.49494163424124515,
≥2,6000,textvqa_val_exact_match,0.4889799999999999,0.006798040496416463
≥2,7000,ai2d_exact_match,0.3863341968911917,0.00876353292332671
≥2,7000,average,0.43260201849012403,
≥2,7000,average_rank,2.1,
≥2,7000,chartqa_relaxed_overall,0.572,0.009897756626351943
≥2,7000,docvqa_val_anls,0.5958889673096114,0.006197986096231253
≥2,7000,infovqa_val_anls,0.24831461076228495,0.0071830066608344805
≥2,7000,mme_total_score,1098.0422168867549,
≥2,7000,mmmu_val_mmmu_acc,0.28333,
≥2,7000,mmstar_average,0.31254705626181345,
≥2,7000,ocrbench_ocrbench_accuracy,0.493,
≥2,7000,seedbench_seed_all,0.5060033351862145,
≥2,7000,textvqa_val_exact_match,0.496,0.006798444216786202
≥2,8000,ai2d_exact_match,0.4025259067357513,0.00882649222855129
≥2,8000,average,0.4423608272909927,
≥2,8000,average_rank,2.1,
≥2,8000,chartqa_relaxed_overall,0.5832,0.009862556058385773
≥2,8000,docvqa_val_anls,0.6081292058298197,0.006190473638311687
≥2,8000,infovqa_val_anls,0.25707448915865344,0.007179410853014501
≥2,8000,mme_total_score,1100.4132653061224,
≥2,8000,mmmu_val_mmmu_acc,0.28,
≥2,8000,mmstar_average,0.3170263263849818,
≥2,8000,ocrbench_ocrbench_accuracy,0.504,
≥2,8000,seedbench_seed_all,0.5167315175097277,
≥2,8000,textvqa_val_exact_match,0.5125600000000001,0.006790351320381798
≥2,9000,ai2d_exact_match,0.4106217616580311,0.008854207883828036
≥2,9000,average,0.4477239927349069,
≥2,9000,average_rank,1.8,
≥2,9000,chartqa_relaxed_overall,0.5884,0.009844437067525526
≥2,9000,docvqa_val_anls,0.6233981201771228,0.006152789393932141
≥2,9000,infovqa_val_anls,0.25099979430746866,0.006997337550850154
≥2,9000,mme_total_score,1100.9423769507803,
≥2,9000,mmmu_val_mmmu_acc,0.27778,
≥2,9000,mmstar_average,0.3172130122236236,
≥2,9000,ocrbench_ocrbench_accuracy,0.518,
≥2,9000,seedbench_seed_all,0.5178432462479156,
≥2,9000,textvqa_val_exact_match,0.5252600000000001,0.006790435073078627
≥2,10000,ai2d_exact_match,0.41904145077720206,0.008880404559123598
≥2,10000,average,0.450650749528602,
≥2,10000,average_rank,2.2,
≥2,10000,chartqa_relaxed_overall,0.5956,0.009817474681589429
≥2,10000,docvqa_val_anls,0.6254308760372823,0.006142114135609194
≥2,10000,infovqa_val_anls,0.23792853517114784,0.006776022015067822
≥2,10000,mme_total_score,1157.0735294117646,
≥2,10000,mmmu_val_mmmu_acc,0.27667,
≥2,10000,mmstar_average,0.31479930233765546,
≥2,10000,ocrbench_ocrbench_accuracy,0.53,
≥2,10000,seedbench_seed_all,0.5238465814341301,
≥2,10000,textvqa_val_exact_match,0.53254,0.006777862315178193
≥2,11000,ai2d_exact_match,0.43555699481865284,0.008924095913829727
≥2,11000,average,0.4613124059808435,
≥2,11000,average_rank,1.8,
≥2,11000,chartqa_relaxed_overall,0.5984,0.009806398022560106
≥2,11000,docvqa_val_anls,0.6453200065413649,0.0060722869307158955
≥2,11000,infovqa_val_anls,0.24059820801450565,0.006814633527776416
≥2,11000,mme_total_score,1262.6299519807922,
≥2,11000,mmmu_val_mmmu_acc,0.3,
≥2,11000,mmstar_average,0.33559717819403534,
≥2,11000,ocrbench_ocrbench_accuracy,0.527,
≥2,11000,seedbench_seed_all,0.5226792662590328,
≥2,11000,textvqa_val_exact_match,0.54666,0.0067526356704400645
≥2,12000,ai2d_exact_match,0.44073834196891193,0.008935721506916777
≥2,12000,average,0.46516707040731664,
≥2,12000,average_rank,1.9,
≥2,12000,chartqa_relaxed_overall,0.598,0.009808000752013664
≥2,12000,docvqa_val_anls,0.6402481933825662,0.006107198073878916
≥2,12000,infovqa_val_anls,0.2601009880983462,0.0070991293032872695
≥2,12000,mme_total_score,1112.7142857142858,
≥2,12000,mmmu_val_mmmu_acc,0.31,
≥2,12000,mmstar_average,0.32603422027717016,
≥2,12000,ocrbench_ocrbench_accuracy,0.547,
≥2,12000,seedbench_seed_all,0.523401889938855,
≥2,12000,textvqa_val_exact_match,0.54098,0.006767635340177507
≥2,13000,ai2d_exact_match,0.44041450777202074,0.008935023865613881
≥2,13000,average,0.46553651974650545,
≥2,13000,average_rank,2.2,
≥2,13000,chartqa_relaxed_overall,0.6092,0.009760545645634788
≥2,13000,docvqa_val_anls,0.6433035796450283,0.006095519860378371
≥2,13000,infovqa_val_anls,0.2594356954563223,0.007105630672634776
≥2,13000,mme_total_score,1207.9944977991197,
≥2,13000,mmmu_val_mmmu_acc,0.28111,
≥2,13000,mmstar_average,0.3383640832831994,
≥2,13000,ocrbench_ocrbench_accuracy,0.539,
≥2,13000,seedbench_seed_all,0.5294608115619789,
≥2,13000,textvqa_val_exact_match,0.5495399999999999,0.006753508692222968
≥2,14000,ai2d_exact_match,0.44462435233160624,0.008943792697097361
≥2,14000,average,0.46921726913331274,
≥2,14000,average_rank,2.1,
≥2,14000,chartqa_relaxed_overall,0.612,0.009747841205275417
≥2,14000,docvqa_val_anls,0.65515509916543,0.006051151525703575
≥2,14000,infovqa_val_anls,0.2677755343748415,0.007100955702899581
≥2,14000,mme_total_score,1163.8374349739895,
≥2,14000,mmmu_val_mmmu_acc,0.28556,
≥2,14000,mmstar_average,0.32353974705611904,
≥2,14000,ocrbench_ocrbench_accuracy,0.543,
≥2,14000,seedbench_seed_all,0.5332406892718177,
≥2,14000,textvqa_val_exact_match,0.55806,0.006725656411892758
≥2,15000,ai2d_exact_match,0.44624352331606215,0.008946992176353901
≥2,15000,average,0.4737967933693773,
≥2,15000,average_rank,2.2,
≥2,15000,chartqa_relaxed_overall,0.618,0.009719474639861454
≥2,15000,docvqa_val_anls,0.6614354910767699,0.006013191753461033
≥2,15000,infovqa_val_anls,0.26176573129112124,0.007093151287118967
≥2,15000,mme_total_score,1229.438475390156,
≥2,15000,mmmu_val_mmmu_acc,0.29556,
≥2,15000,mmstar_average,0.32387576651370553,
≥2,15000,ocrbench_ocrbench_accuracy,0.561,
≥2,15000,seedbench_seed_all,0.5351306281267371,
≥2,15000,textvqa_val_exact_match,0.56116,0.006722390124486763
≥2,16000,ai2d_exact_match,0.4478626943005181,0.0089500956222288
≥2,16000,average,0.4748174802839308,
≥2,16000,average_rank,2.2,
≥2,16000,chartqa_relaxed_overall,0.6192,0.009713613422114641
≥2,16000,docvqa_val_anls,0.6585392720477772,0.0060616936904167125
≥2,16000,infovqa_val_anls,0.2653830027853819,0.007108417358601188
≥2,16000,mme_total_score,1157.8782513005203,
≥2,16000,mmmu_val_mmmu_acc,0.29889,
≥2,16000,mmstar_average,0.3217940710425999,
≥2,16000,ocrbench_ocrbench_accuracy,0.561,
≥2,16000,seedbench_seed_all,0.5349082823790995,
≥2,16000,textvqa_val_exact_match,0.5657800000000001,0.006716429140851619
≥2,17000,ai2d_exact_match,0.4540155440414508,0.00896101461327443
≥2,17000,average,0.4765363782507968,
≥2,17000,average_rank,2.2,
≥2,17000,chartqa_relaxed_overall,0.6184,0.009717527882093043
≥2,17000,docvqa_val_anls,0.6605538305641464,0.006048170352990264
≥2,17000,infovqa_val_anls,0.27438351817158263,0.007183740557624646
≥2,17000,mme_total_score,1231.31512605042,
≥2,17000,mmmu_val_mmmu_acc,0.30111,
≥2,17000,mmstar_average,0.3273406426639828,
≥2,17000,ocrbench_ocrbench_accuracy,0.555,
≥2,17000,seedbench_seed_all,0.5349638688160089,
≥2,17000,textvqa_val_exact_match,0.5630599999999999,0.006726822229512349
≥2,18000,ai2d_exact_match,0.4540155440414508,0.008961014613274428
≥2,18000,average,0.4749977548559891,
≥2,18000,average_rank,2.3,
≥2,18000,chartqa_relaxed_overall,0.614,0.009738559226822298
≥2,18000,docvqa_val_anls,0.6647865229953943,0.00602531683337989
≥2,18000,infovqa_val_anls,0.26486387970800995,0.006977819681460442
≥2,18000,mme_total_score,1245.188775510204,
≥2,18000,mmmu_val_mmmu_acc,0.29222,
≥2,18000,mmstar_average,0.32473355790957514,
≥2,18000,ocrbench_ocrbench_accuracy,0.555,
≥2,18000,seedbench_seed_all,0.5365202890494719,
≥2,18000,textvqa_val_exact_match,0.56884,0.006699820027260398
≥2,19000,ai2d_exact_match,0.45466321243523317,0.00896208360613934
≥2,19000,average,0.4768734192584572,
≥2,19000,average_rank,2.6,
≥2,19000,chartqa_relaxed_overall,0.62,0.009709671008043154
≥2,19000,docvqa_val_anls,0.6628357233664792,0.006042075311037487
≥2,19000,infovqa_val_anls,0.2657171063652747,0.007078002720459511
≥2,19000,mme_total_score,1248.7323929571828,
≥2,19000,mmmu_val_mmmu_acc,0.28889,
≥2,19000,mmstar_average,0.32802808302127334,
≥2,19000,ocrbench_ocrbench_accuracy,0.565,
≥2,19000,seedbench_seed_all,0.5399666481378543,
≥2,19000,textvqa_val_exact_match,0.5667599999999999,0.00671422643700147
≥2,20000,ai2d_exact_match,0.46178756476683935,0.008972834678172942
≥2,20000,average,0.47802392695549656,
≥2,20000,average_rank,2.3,
≥2,20000,chartqa_relaxed_overall,0.618,0.009719474639861454
≥2,20000,docvqa_val_anls,0.666568303416173,0.0059980334517589756
≥2,20000,infovqa_val_anls,0.2651324480102521,0.0070565217028431
≥2,20000,mme_total_score,1233.6009403761504,
≥2,20000,mmmu_val_mmmu_acc,0.28,
≥2,20000,mmstar_average,0.33277914424945065,
≥2,20000,ocrbench_ocrbench_accuracy,0.562,
≥2,20000,seedbench_seed_all,0.5381878821567537,
≥2,20000,textvqa_val_exact_match,0.5777599999999999,0.00668799090343766
≥3,1000,ai2d_exact_match,0.2661917098445596,0.007954634970279362
≥3,1000,average,0.2680725844272073,
≥3,1000,average_rank,3.2,
≥3,1000,chartqa_relaxed_overall,0.3476,0.009526069199715017
≥3,1000,docvqa_val_anls,0.3752729856163278,0.005939283617489936
≥3,1000,infovqa_val_anls,0.17325429231808173,0.0062340220795234725
≥3,1000,mme_total_score,707.53231292517,
≥3,1000,mmmu_val_mmmu_acc,0.23889,
≥3,1000,mmstar_average,0.19784737378907616,
≥3,1000,ocrbench_ocrbench_accuracy,0.288,
≥3,1000,seedbench_seed_all,0.25041689827682045,
≥3,1000,textvqa_val_exact_match,0.27518000000000004,0.006128613668775364
≥3,2000,ai2d_exact_match,0.27266839378238344,0.008015217564479073
≥3,2000,average,0.31253656058741547,
≥3,2000,average_rank,3.4,
≥3,2000,chartqa_relaxed_overall,0.4308,0.00990574548014469
≥3,2000,docvqa_val_anls,0.4481749259885666,0.00619992092326252
≥3,2000,infovqa_val_anls,0.19674507942801486,0.006580664003046453
≥3,2000,mme_total_score,786.0510204081633,
≥3,2000,mmmu_val_mmmu_acc,0.23556,
≥3,2000,mmstar_average,0.19768658271923586,
≥3,2000,ocrbench_ocrbench_accuracy,0.377,
≥3,2000,seedbench_seed_all,0.2653140633685381,
≥3,2000,textvqa_val_exact_match,0.38888,0.006660461055234364
≥3,3000,ai2d_exact_match,0.28270725388601037,0.008104913435481193
≥3,3000,average,0.34936609328629703,
≥3,3000,average_rank,3.2,
≥3,3000,chartqa_relaxed_overall,0.4844,0.009997131241172205
≥3,3000,docvqa_val_anls,0.49044354643512195,0.0062294371457984315
≥3,3000,infovqa_val_anls,0.21295743099446893,0.006855571779287104
≥3,3000,mme_total_score,861.8877551020407,
≥3,3000,mmmu_val_mmmu_acc,0.24889,
≥3,3000,mmstar_average,0.258368014597926,
≥3,3000,ocrbench_ocrbench_accuracy,0.394,
≥3,3000,seedbench_seed_all,0.3434685936631462,
≥3,3000,textvqa_val_exact_match,0.42906000000000005,0.0067494454796565755
≥3,4000,ai2d_exact_match,0.3325777202072539,0.00847966336079129
≥3,4000,average,0.3855383645559374,
≥3,4000,average_rank,2.8,
≥3,4000,chartqa_relaxed_overall,0.508,0.010000720262176365
≥3,4000,docvqa_val_anls,0.5226854794419781,0.006293466169647169
≥3,4000,infovqa_val_anls,0.2322658206586996,0.007103396837310004
≥3,4000,mme_total_score,912.9521808723489,
≥3,4000,mmmu_val_mmmu_acc,0.26667,
≥3,4000,mmstar_average,0.3070035703119584,
≥3,4000,ocrbench_ocrbench_accuracy,0.438,
≥3,4000,seedbench_seed_all,0.41684269038354643,
≥3,4000,textvqa_val_exact_match,0.4458,0.006781745381100857
≥3,5000,ai2d_exact_match,0.34520725388601037,0.008557040186364025
≥3,5000,average,0.39676974212184324,
≥3,5000,average_rank,2.7,
≥3,5000,chartqa_relaxed_overall,0.51,0.01
≥3,5000,docvqa_val_anls,0.5420071464866951,0.006256421242173299
≥3,5000,infovqa_val_anls,0.21485812900527704,0.0066319183580626885
≥3,5000,mme_total_score,957.2279911964786,
≥3,5000,mmmu_val_mmmu_acc,0.26111,
≥3,5000,mmstar_average,0.30632830702822333,
≥3,5000,ocrbench_ocrbench_accuracy,0.44,
≥3,5000,seedbench_seed_all,0.47031684269038354,
≥3,5000,textvqa_val_exact_match,0.4811,0.00681344572213808
≥3,6000,ai2d_exact_match,0.37629533678756477,0.008719379877890884
≥3,6000,average,0.40447034433869705,
≥3,6000,average_rank,3.3,
≥3,6000,chartqa_relaxed_overall,0.5084,0.010000589018267121
≥3,6000,docvqa_val_anls,0.5540669563018141,0.006258072329892215
≥3,6000,infovqa_val_anls,0.216535214445592,0.00668611609159469
≥3,6000,mme_total_score,864.5272108843537,
≥3,6000,mmmu_val_mmmu_acc,0.26889,
≥3,6000,mmstar_average,0.2932406887895669,
≥3,6000,ocrbench_ocrbench_accuracy,0.454,
≥3,6000,seedbench_seed_all,0.4848249027237354,
≥3,6000,textvqa_val_exact_match,0.48398,0.006803464510517356
≥3,7000,ai2d_exact_match,0.3947538860103627,0.008797532848529207
≥3,7000,average,0.42355543935120793,
≥3,7000,average_rank,2.7,
≥3,7000,chartqa_relaxed_overall,0.5488,0.00995424828018316
≥3,7000,docvqa_val_anls,0.5797391833660968,0.006220930330963092
≥3,7000,infovqa_val_anls,0.2221619185818123,0.00673372198453672
≥3,7000,mme_total_score,866.3928571428571,
≥3,7000,mmmu_val_mmmu_acc,0.28667,
≥3,7000,mmstar_average,0.3209799250686363,
≥3,7000,ocrbench_ocrbench_accuracy,0.476,
≥3,7000,seedbench_seed_all,0.49327404113396334,
≥3,7000,textvqa_val_exact_match,0.48962000000000006,0.006807769110659733
≥3,8000,ai2d_exact_match,0.4102979274611399,0.008853146969712133
≥3,8000,average,0.42791468613133354,
≥3,8000,average_rank,3.4,
≥3,8000,chartqa_relaxed_overall,0.5456,0.00996031822662661
≥3,8000,docvqa_val_anls,0.5824594046059755,0.006268157085435711
≥3,8000,infovqa_val_anls,0.22074277862778585,0.006618518997755148
≥3,8000,mme_total_score,788.9880952380953,
≥3,8000,mmmu_val_mmmu_acc,0.27556,
≥3,8000,mmstar_average,0.32537357643818443,
≥3,8000,ocrbench_ocrbench_accuracy,0.5,
≥3,8000,seedbench_seed_all,0.5012784880489161,
≥3,8000,textvqa_val_exact_match,0.48991999999999997,0.006810591424473371
≥3,9000,ai2d_exact_match,0.4251943005181347,0.0088978675214111
≥3,9000,average,0.4411468725875502,
≥3,9000,average_rank,2.9,
≥3,9000,chartqa_relaxed_overall,0.5648,0.009917647296166388
≥3,9000,docvqa_val_anls,0.6050413765127355,0.006187758928771102
≥3,9000,infovqa_val_anls,0.23301995192200392,0.00676964747288323
≥3,9000,mme_total_score,825.0221088435375,
≥3,9000,mmmu_val_mmmu_acc,0.27556,
≥3,9000,mmstar_average,0.33219983189483276,
≥3,9000,ocrbench_ocrbench_accuracy,0.504,
≥3,9000,seedbench_seed_all,0.5115063924402445,
≥3,9000,textvqa_val_exact_match,0.519,0.006787356896666665
≥3,10000,ai2d_exact_match,0.4258419689119171,0.00889962357526378
≥3,10000,average,0.44419201562479543,
≥3,10000,average_rank,2.7,
≥3,10000,chartqa_relaxed_overall,0.576,0.009885782289560632
≥3,10000,docvqa_val_anls,0.6087522279355707,0.006173079977045839
≥3,10000,infovqa_val_anls,0.24383042893389267,0.0069221731872859795
≥3,10000,mme_total_score,915.8061224489795,
≥3,10000,mmmu_val_mmmu_acc,0.27333,
≥3,10000,mmstar_average,0.3351679228462254,
≥3,10000,ocrbench_ocrbench_accuracy,0.489,
≥3,10000,seedbench_seed_all,0.5180655919955531,
≥3,10000,textvqa_val_exact_match,0.5277400000000001,0.006769908774345677
≥3,11000,ai2d_exact_match,0.43426165803108807,0.008921034830887029
≥3,11000,average,0.45138194167282136,
≥3,11000,average_rank,2.9,
≥3,11000,chartqa_relaxed_overall,0.5784,0.009878279615563902
≥3,11000,docvqa_val_anls,0.6240570866567314,0.006144737191710238
≥3,11000,infovqa_val_anls,0.2562175057951717,0.0071028888697453095
≥3,11000,mme_total_score,852.3894557823129,
≥3,11000,mmmu_val_mmmu_acc,0.28778,
≥3,11000,mmstar_average,0.3331474836051967,
≥3,11000,ocrbench_ocrbench_accuracy,0.5,
≥3,11000,seedbench_seed_all,0.520733740967204,
≥3,11000,textvqa_val_exact_match,0.5278400000000001,0.00678178334931745
≥3,12000,ai2d_exact_match,0.4381476683937824,0.008930032335354965
≥3,12000,average,0.45691171338244096,
≥3,12000,average_rank,2.5,
≥3,12000,chartqa_relaxed_overall,0.572,0.009897756626351943
≥3,12000,docvqa_val_anls,0.6273497290110698,0.006129247411332687
≥3,12000,infovqa_val_anls,0.268135358118058,0.007380056393275344
≥3,12000,mme_total_score,893.8265306122448,
≥3,12000,mmmu_val_mmmu_acc,0.29556,
≥3,12000,mmstar_average,0.34474290394073753,
≥3,12000,ocrbench_ocrbench_accuracy,0.508,
≥3,12000,seedbench_seed_all,0.5255697609783213,
≥3,12000,textvqa_val_exact_match,0.5327,0.006782133990735781
≥3,13000,ai2d_exact_match,0.43458549222797926,0.008921805911548512
≥3,13000,average,0.4607824778908788,
≥3,13000,average_rank,2.7,
≥3,13000,chartqa_relaxed_overall,0.5876,0.009847298295140926
≥3,13000,docvqa_val_anls,0.6386402725745638,0.006069984676680257
≥3,13000,infovqa_val_anls,0.2536816276782758,0.00704241123014852
≥3,13000,mme_total_score,941.5953381352541,
≥3,13000,mmmu_val_mmmu_acc,0.29667,
≥3,13000,mmstar_average,0.34638755445148733,
≥3,13000,ocrbench_ocrbench_accuracy,0.53,
≥3,13000,seedbench_seed_all,0.5272373540856031,
≥3,13000,textvqa_val_exact_match,0.53224,0.00678673179267349
≥3,14000,ai2d_exact_match,0.43490932642487046,0.008922573118260885
≥3,14000,average,0.4621098839598732,
≥3,14000,average_rank,2.8,
≥3,14000,chartqa_relaxed_overall,0.5936,0.009825183443166683
≥3,14000,docvqa_val_anls,0.6373184890679852,0.006105256249191251
≥3,14000,infovqa_val_anls,0.2624975120280117,0.007131056805776271
≥3,14000,mme_total_score,901.2585034013605,
≥3,14000,mmmu_val_mmmu_acc,0.28444,
≥3,14000,mmstar_average,0.34930389493288816,
≥3,14000,ocrbench_ocrbench_accuracy,0.517,
≥3,14000,seedbench_seed_all,0.5355197331851028,
≥3,14000,textvqa_val_exact_match,0.5444,0.006752217894092123
≥3,15000,ai2d_exact_match,0.44527202072538863,0.008945084019331405
≥3,15000,average,0.46643076140543904,
≥3,15000,average_rank,3.0,
≥3,15000,chartqa_relaxed_overall,0.5848,0.00985710144918839
≥3,15000,docvqa_val_anls,0.642316016710227,0.006100312721783546
≥3,15000,infovqa_val_anls,0.2596632231498878,0.007146587424008848
≥3,15000,mme_total_score,891.8367346938775,
≥3,15000,mmmu_val_mmmu_acc,0.29778,
≥3,15000,mmstar_average,0.34413882163543197,
≥3,15000,ocrbench_ocrbench_accuracy,0.538,
≥3,15000,seedbench_seed_all,0.5361867704280155,
≥3,15000,textvqa_val_exact_match,0.54972,0.006745330549116431
≥3,16000,ai2d_exact_match,0.4494818652849741,0.008953103134587205
≥3,16000,average,0.46786516199576034,
≥3,16000,average_rank,2.7,
≥3,16000,chartqa_relaxed_overall,0.5976,0.009809596692775395
≥3,16000,docvqa_val_anls,0.6432815750341822,0.006081847680686157
≥3,16000,infovqa_val_anls,0.2702450654855036,0.007372825383364985
≥3,16000,mme_total_score,919.3826530612245,
≥3,16000,mmmu_val_mmmu_acc,0.28333,
≥3,16000,mmstar_average,0.3386692973489569,
≥3,16000,ocrbench_ocrbench_accuracy,0.534,
≥3,16000,seedbench_seed_all,0.5415786548082268,
≥3,16000,textvqa_val_exact_match,0.5526,0.006745409410081935
≥3,17000,ai2d_exact_match,0.4494818652849741,0.008953103134587206
≥3,17000,average,0.4694732091424512,
≥3,17000,average_rank,2.8,
≥3,17000,chartqa_relaxed_overall,0.596,0.009815912634917984
≥3,17000,docvqa_val_anls,0.6468732282054332,0.006069886071041202
≥3,17000,infovqa_val_anls,0.2650584835459577,0.0072427928867972455
≥3,17000,mme_total_score,889.5646258503401,
≥3,17000,mmmu_val_mmmu_acc,0.29333,
≥3,17000,mmstar_average,0.342978718252922,
≥3,17000,ocrbench_ocrbench_accuracy,0.53,
≥3,17000,seedbench_seed_all,0.5418565869927737,
≥3,17000,textvqa_val_exact_match,0.5596800000000001,0.006734324743131207
≥3,18000,ai2d_exact_match,0.45531088082901555,0.008963137311190377
≥3,18000,average,0.46991408851845295,
≥3,18000,average_rank,2.7,
≥3,18000,chartqa_relaxed_overall,0.6036,0.009784943231599163
≥3,18000,docvqa_val_anls,0.6501128555487647,0.006068985343727089
≥3,18000,infovqa_val_anls,0.26796275265157754,0.007202201134473747
≥3,18000,mme_total_score,894.1054421768707,
≥3,18000,mmmu_val_mmmu_acc,0.28333,
≥3,18000,mmstar_average,0.33590517144994875,
≥3,18000,ocrbench_ocrbench_accuracy,0.534,
≥3,18000,seedbench_seed_all,0.5412451361867704,
≥3,18000,textvqa_val_exact_match,0.5577599999999999,0.0067408786051132655
≥3,19000,ai2d_exact_match,0.4498056994818653,0.008953693133598168
≥3,19000,average,0.47011136523574254,
≥3,19000,average_rank,3.0,
≥3,19000,chartqa_relaxed_overall,0.6096,0.009758751420735989
≥3,19000,docvqa_val_anls,0.6538834113203496,0.006040538366936906
≥3,19000,infovqa_val_anls,0.2705360277052952,0.007291872911349649
≥3,19000,mme_total_score,906.3231292517007,
≥3,19000,mmmu_val_mmmu_acc,0.27556,
≥3,19000,mmstar_average,0.3356215177081144,
≥3,19000,ocrbench_ocrbench_accuracy,0.539,
≥3,19000,seedbench_seed_all,0.5441356309060589,
≥3,19000,textvqa_val_exact_match,0.5528599999999999,0.006753272200724876
≥3,20000,ai2d_exact_match,0.44656735751295334,0.008947620544957215
≥3,20000,average,0.4679556547685855,
≥3,20000,average_rank,2.9,
≥3,20000,chartqa_relaxed_overall,0.5976,0.009809596692775395
≥3,20000,docvqa_val_anls,0.6493769742508846,0.006072933213063366
≥3,20000,infovqa_val_anls,0.26540905854876357,0.007209592372844281
≥3,20000,mme_total_score,926.0901360544218,
≥3,20000,mmmu_val_mmmu_acc,0.27333,
≥3,20000,mmstar_average,0.34157097675697473,
≥3,20000,ocrbench_ocrbench_accuracy,0.539,
≥3,20000,seedbench_seed_all,0.5437465258476931,
≥3,20000,textvqa_val_exact_match,0.555,0.0067346322137300735
≥4,1000,ai2d_exact_match,0.25874352331606215,0.00788225861008497
≥4,1000,average,0.27914578527127093,
≥4,1000,average_rank,2.6,
≥4,1000,chartqa_relaxed_overall,0.3512,0.009548816468986268
≥4,1000,docvqa_val_anls,0.36858592315444033,0.005921151680127505
≥4,1000,infovqa_val_anls,0.17699311795329079,0.006346227986201575
≥4,1000,mme_total_score,671.343537414966,
≥4,1000,mmmu_val_mmmu_acc,0.27111,
≥4,1000,mmstar_average,0.2086858732233149,
≥4,1000,ocrbench_ocrbench_accuracy,0.261,
≥4,1000,seedbench_seed_all,0.2605336297943302,
≥4,1000,textvqa_val_exact_match,0.35546000000000005,0.006549153835664011
≥4,2000,ai2d_exact_match,0.280440414507772,0.008085099461783339
≥4,2000,average,0.320025358717614,
≥4,2000,average_rank,2.8,
≥4,2000,chartqa_relaxed_overall,0.4488,0.009949423119365426
≥4,2000,docvqa_val_anls,0.43140645952438456,0.006042366638541379
≥4,2000,infovqa_val_anls,0.16528808420419083,0.005907032628809945
≥4,2000,mme_total_score,705.5901360544218,
≥4,2000,mmmu_val_mmmu_acc,0.27222,
≥4,2000,mmstar_average,0.24877125799316246,
≥4,2000,ocrbench_ocrbench_accuracy,0.329,
≥4,2000,seedbench_seed_all,0.3196220122290161,
≥4,2000,textvqa_val_exact_match,0.38468,0.006645983248449226
≥4,3000,ai2d_exact_match,0.34617875647668395,0.008562713351618977
≥4,3000,average,0.3596236953408542,
≥4,3000,average_rank,2.8,
≥4,3000,chartqa_relaxed_overall,0.468,0.009981495484186743
≥4,3000,docvqa_val_anls,0.464923009199496,0.006156900593094097
≥4,3000,infovqa_val_anls,0.18011502045718095,0.0061004080312330325
≥4,3000,mme_total_score,709.3333333333333,
≥4,3000,mmmu_val_mmmu_acc,0.27778,
≥4,3000,mmstar_average,0.30716380934399923,
≥4,3000,ocrbench_ocrbench_accuracy,0.351,
≥4,3000,seedbench_seed_all,0.42679266259032794,
≥4,3000,textvqa_val_exact_match,0.41466,0.006725300202411972
≥4,4000,ai2d_exact_match,0.36593264248704666,0.008669617940526182
≥4,4000,average,0.3829150140884673,
≥4,4000,average_rank,2.7,
≥4,4000,chartqa_relaxed_overall,0.5136,0.009998299975543861
≥4,4000,docvqa_val_anls,0.5002844765367886,0.0062258433013991955
≥4,4000,infovqa_val_anls,0.18808280764611432,0.006209185081756124
≥4,4000,mme_total_score,700.1989795918367,
≥4,4000,mmmu_val_mmmu_acc,0.28889,
≥4,4000,mmstar_average,0.3128795636615537,
≥4,4000,ocrbench_ocrbench_accuracy,0.379,
≥4,4000,seedbench_seed_all,0.46214563646470264,
≥4,4000,textvqa_val_exact_match,0.4354199999999999,0.006770365742739316
≥4,5000,ai2d_exact_match,0.39702072538860106,0.008806218703419164
≥4,5000,average,0.3990130243200321,
≥4,5000,average_rank,3.0,
≥4,5000,chartqa_relaxed_overall,0.5432,0.009964598400764347
≥4,5000,docvqa_val_anls,0.5330701388059006,0.006244542429703876
≥4,5000,infovqa_val_anls,0.20064814149562474,0.006400433745304747
≥4,5000,mme_total_score,687.6802721088436,
≥4,5000,mmmu_val_mmmu_acc,0.26333,
≥4,5000,mmstar_average,0.30889646221740025,
≥4,5000,ocrbench_ocrbench_accuracy,0.412,
≥4,5000,seedbench_seed_all,0.47315175097276263,
≥4,5000,textvqa_val_exact_match,0.4598,0.006799443983716428
≥4,6000,ai2d_exact_match,0.41224093264248707,0.00885945303235887
≥4,6000,average,0.4037939250305515,
≥4,6000,average_rank,3.2,
≥4,6000,chartqa_relaxed_overall,0.5312,0.009982508912777261
≥4,6000,docvqa_val_anls,0.5259911309932884,0.006272635836910295
≥4,6000,infovqa_val_anls,0.22056731437063212,0.00674209963892894
≥4,6000,mme_total_score,717.0051020408164,
≥4,6000,mmmu_val_mmmu_acc,0.26667,
≥4,6000,mmstar_average,0.3316518783413741,
≥4,6000,ocrbench_ocrbench_accuracy,0.408,
≥4,6000,seedbench_seed_all,0.48332406892718177,
≥4,6000,textvqa_val_exact_match,0.4544999999999999,0.006790726970992053
≥4,7000,ai2d_exact_match,0.4102979274611399,0.008853146969712133
≥4,7000,average,0.41740315045514464,
≥4,7000,average_rank,3.1,
≥4,7000,chartqa_relaxed_overall,0.5588,0.009932597172675325
≥4,7000,docvqa_val_anls,0.5597972576652357,0.0062571833970283125
≥4,7000,infovqa_val_anls,0.21665617889681224,0.006562362156515704
≥4,7000,mme_total_score,716.7908163265306,
≥4,7000,mmmu_val_mmmu_acc,0.28556,
≥4,7000,mmstar_average,0.32150517239662685,
≥4,7000,ocrbench_ocrbench_accuracy,0.431,
≥4,7000,seedbench_seed_all,0.4892718176764869,
≥4,7000,textvqa_val_exact_match,0.48374000000000006,0.006820617761268334
≥4,8000,ai2d_exact_match,0.4213082901554404,0.00888700282309854
≥4,8000,average,0.4251708917847074,
≥4,8000,average_rank,2.9,
≥4,8000,chartqa_relaxed_overall,0.564,0.009919725822025206
≥4,8000,docvqa_val_anls,0.5702706873242411,0.006237250618852069
≥4,8000,infovqa_val_anls,0.24000454829818865,0.006935520157929643
≥4,8000,mme_total_score,705.8180272108843,
≥4,8000,mmmu_val_mmmu_acc,0.28778,
≥4,8000,mmstar_average,0.3384645614295773,
≥4,8000,ocrbench_ocrbench_accuracy,0.42,
≥4,8000,seedbench_seed_all,0.5018899388549194,
≥4,8000,textvqa_val_exact_match,0.48281999999999997,0.006811185503977551
≥4,9000,ai2d_exact_match,0.4319948186528497,0.008915528710615487
≥4,9000,average,0.4318231930659084,
≥4,9000,average_rank,2.9,
≥4,9000,chartqa_relaxed_overall,0.5676,0.009910165515884228
≥4,9000,docvqa_val_anls,0.5846178021051754,0.006187149390116838
≥4,9000,infovqa_val_anls,0.2228617948699063,0.0066001763459020155
≥4,9000,mme_total_score,733.3503401360545,
≥4,9000,mmmu_val_mmmu_acc,0.28444,
≥4,9000,mmstar_average,0.3307124653782511,
≥4,9000,ocrbench_ocrbench_accuracy,0.463,
≥4,9000,seedbench_seed_all,0.5153418565869927,
≥4,9000,textvqa_val_exact_match,0.48583999999999994,0.0068269187957708125
≥4,10000,ai2d_exact_match,0.4410621761658031,0.0089364152923413
≥4,10000,average,0.436822457787989,
≥4,10000,average_rank,3.4,
≥4,10000,chartqa_relaxed_overall,0.5756,0.009887009516677585
≥4,10000,docvqa_val_anls,0.591441723638793,0.0062031994384821754
≥4,10000,infovqa_val_anls,0.22327754225685992,0.00649750251357461
≥4,10000,mme_total_score,695.8112244897959,
≥4,10000,mmmu_val_mmmu_acc,0.28778,
≥4,10000,mmstar_average,0.33369690927002266,
≥4,10000,ocrbench_ocrbench_accuracy,0.472,
≥4,10000,seedbench_seed_all,0.5107837687604224,
≥4,10000,textvqa_val_exact_match,0.49576000000000003,0.006808118284439173
≥4,11000,ai2d_exact_match,0.44332901554404147,0.008941163900483134
≥4,11000,average,0.44624717945755144,
≥4,11000,average_rank,3.0,
≥4,11000,chartqa_relaxed_overall,0.5868,0.009850132691777215
≥4,11000,docvqa_val_anls,0.60625861922937,0.006159202385167996
≥4,11000,infovqa_val_anls,0.2435454505191485,0.006860039872881237
≥4,11000,mme_total_score,751.1462585034014,
≥4,11000,mmmu_val_mmmu_acc,0.29222,
≥4,11000,mmstar_average,0.3470954764624236,
≥4,11000,ocrbench_ocrbench_accuracy,0.486,
≥4,11000,seedbench_seed_all,0.5128960533629794,
≥4,11000,textvqa_val_exact_match,0.49808,0.006799508024988012
≥4,12000,ai2d_exact_match,0.45142487046632124,0.008956585653027465
≥4,12000,average,0.44514971381341617,
≥4,12000,average_rank,3.3,
≥4,12000,chartqa_relaxed_overall,0.5868,0.009850132691777215
≥4,12000,docvqa_val_anls,0.6047188055272135,0.0061847009209673315
≥4,12000,infovqa_val_anls,0.2506217753279014,0.006972909032069362
≥4,12000,mme_total_score,742.969387755102,
≥4,12000,mmmu_val_mmmu_acc,0.28556,
≥4,12000,mmstar_average,0.33917912697374,
≥4,12000,ocrbench_ocrbench_accuracy,0.459,
≥4,12000,seedbench_seed_all,0.5211228460255698,
≥4,12000,textvqa_val_exact_match,0.5079199999999999,0.006798462954205747
≥4,13000,ai2d_exact_match,0.44689119170984454,0.00894824507304496
≥4,13000,average,0.4478540374461813,
≥4,13000,average_rank,3.5,
≥4,13000,chartqa_relaxed_overall,0.5936,0.009825183443166683
≥4,13000,docvqa_val_anls,0.6123877664020703,0.0061423212651813735
≥4,13000,infovqa_val_anls,0.23197941094655744,0.0066388766376455225
≥4,13000,mme_total_score,705.0068027210884,
≥4,13000,mmmu_val_mmmu_acc,0.29444,
≥4,13000,mmstar_average,0.3172158612312008,
≥4,13000,ocrbench_ocrbench_accuracy,0.5,
≥4,13000,seedbench_seed_all,0.5257921067259589,
≥4,13000,textvqa_val_exact_match,0.50838,0.006803735244897213
≥4,14000,ai2d_exact_match,0.45628238341968913,0.008964689215887884
≥4,14000,average,0.4541657280018954,
≥4,14000,average_rank,3.3,
≥4,14000,chartqa_relaxed_overall,0.5988,0.0098047885010856
≥4,14000,docvqa_val_anls,0.6230752215069362,0.006110772532320183
≥4,14000,infovqa_val_anls,0.23950752488444424,0.00673701613611272
≥4,14000,mme_total_score,693.2602040816327,
≥4,14000,mmmu_val_mmmu_acc,0.29,
≥4,14000,mmstar_average,0.3462132371031542,
≥4,14000,ocrbench_ocrbench_accuracy,0.492,
≥4,14000,seedbench_seed_all,0.519733185102835,
≥4,14000,textvqa_val_exact_match,0.52188,0.0067822601638824
≥4,15000,ai2d_exact_match,0.4536917098445596,0.008960474382205331
≥4,15000,average,0.4546421832173102,
≥4,15000,average_rank,3.5,
≥4,15000,chartqa_relaxed_overall,0.6012,0.0097949885513097
≥4,15000,docvqa_val_anls,0.6265798815467575,0.006118388682866076
≥4,15000,infovqa_val_anls,0.24253641235942872,0.006778846024017067
≥4,15000,mme_total_score,745.8826530612245,
≥4,15000,mmmu_val_mmmu_acc,0.28111,
≥4,15000,mmstar_average,0.3514223789460134,
≥4,15000,ocrbench_ocrbench_accuracy,0.493,
≥4,15000,seedbench_seed_all,0.5226792662590328,
≥4,15000,textvqa_val_exact_match,0.51956,0.006792518600768668
≥4,16000,ai2d_exact_match,0.4582253886010363,0.008967689939886603
≥4,16000,average,0.46307812033280477,
≥4,16000,average_rank,3.1,
≥4,16000,chartqa_relaxed_overall,0.6092,0.009760545645634788
≥4,16000,docvqa_val_anls,0.6397697311161549,0.006077931892063438
≥4,16000,infovqa_val_anls,0.2566929717899322,0.007049147355082826
≥4,16000,mme_total_score,769.8112244897959,
≥4,16000,mmmu_val_mmmu_acc,0.29,
≥4,16000,mmstar_average,0.3500855084419833,
≥4,16000,ocrbench_ocrbench_accuracy,0.512,
≥4,16000,seedbench_seed_all,0.5250694830461368,
≥4,16000,textvqa_val_exact_match,0.5266599999999999,0.006785297114451678
≥4,17000,ai2d_exact_match,0.46243523316062174,0.008973720555405783
≥4,17000,average,0.4637285100748874,
≥4,17000,average_rank,3.2,
≥4,17000,chartqa_relaxed_overall,0.6072,0.00976941352263433
≥4,17000,docvqa_val_anls,0.6316407990464801,0.006115829668357635
≥4,17000,infovqa_val_anls,0.26095289130380417,0.007179006033610968
≥4,17000,mme_total_score,772.2568027210884,
≥4,17000,mmmu_val_mmmu_acc,0.29222,
≥4,17000,mmstar_average,0.3487846654954876,
≥4,17000,ocrbench_ocrbench_accuracy,0.516,
≥4,17000,seedbench_seed_all,0.5254030016675931,
≥4,17000,textvqa_val_exact_match,0.52892,0.006777692390690844
≥4,18000,ai2d_exact_match,0.46729274611398963,0.00897987952745343
≥4,18000,average,0.46301237822364466,
≥4,18000,average_rank,3.5,
≥4,18000,chartqa_relaxed_overall,0.6024,0.009789996609470577
≥4,18000,docvqa_val_anls,0.6353229754668962,0.006102794809473289
≥4,18000,infovqa_val_anls,0.2566414572268362,0.006998597263140097
≥4,18000,mme_total_score,770.295918367347,
≥4,18000,mmmu_val_mmmu_acc,0.27778,
≥4,18000,mmstar_average,0.3522173046936848,
≥4,18000,ocrbench_ocrbench_accuracy,0.518,
≥4,18000,seedbench_seed_all,0.5224569205113953,
≥4,18000,textvqa_val_exact_match,0.535,0.006782934589123506
≥4,19000,ai2d_exact_match,0.4647020725388601,0.008976701230834869
≥4,19000,average,0.4657296959805982,
≥4,19000,average_rank,3.3,
≥4,19000,chartqa_relaxed_overall,0.6088,0.009762332982341016
≥4,19000,docvqa_val_anls,0.6386155506856869,0.006091782897731878
≥4,19000,infovqa_val_anls,0.2477875071753752,0.006879861435025137
≥4,19000,mme_total_score,772.204081632653,
≥4,19000,mmmu_val_mmmu_acc,0.30333,
≥4,19000,mmstar_average,0.3470027726694857,
≥4,19000,ocrbench_ocrbench_accuracy,0.512,
≥4,19000,seedbench_seed_all,0.5288493607559756,
≥4,19000,textvqa_val_exact_match,0.54048,0.006763536279536092
≥4,20000,ai2d_exact_match,0.4634067357512953,0.008975020819363737
≥4,20000,average,0.46162598712482705,
≥4,20000,average_rank,3.4,
≥4,20000,chartqa_relaxed_overall,0.61,0.009756950303844571
≥4,20000,docvqa_val_anls,0.6435026807424298,0.006070985460919362
≥4,20000,infovqa_val_anls,0.2543282868714285,0.006962743278022537
≥4,20000,mme_total_score,765.8690476190477,
≥4,20000,mmmu_val_mmmu_acc,0.27222,
≥4,20000,mmstar_average,0.34236379610014667,
≥4,20000,ocrbench_ocrbench_accuracy,0.509,
≥4,20000,seedbench_seed_all,0.5262923846581434,
≥4,20000,textvqa_val_exact_match,0.53352,0.006776464123213716
≥5,1000,ai2d_exact_match,0.24902849740932642,0.007783374690341817
≥5,1000,average,0.23561247048158757,
≥5,1000,average_rank,4.2,
≥5,1000,chartqa_relaxed_overall,0.2548,0.008716718216771047
≥5,1000,docvqa_val_anls,0.24096701334945672,0.004990683419188375
≥5,1000,infovqa_val_anls,0.12232054164836681,0.0051959928578510384
≥5,1000,mme_total_score,620.9336734693877,
≥5,1000,mmmu_val_mmmu_acc,0.23778,
≥5,1000,mmstar_average,0.26414819971479786,
≥5,1000,ocrbench_ocrbench_accuracy,0.216,
≥5,1000,seedbench_seed_all,0.2623679822123402,
≥5,1000,textvqa_val_exact_match,0.27310000000000006,0.0061250290771750005
≥5,2000,ai2d_exact_match,0.2344559585492228,0.007625132817591135
≥5,2000,average,0.2752283006434932,
≥5,2000,average_rank,4.2,
≥5,2000,chartqa_relaxed_overall,0.3732,0.009675026948726469
≥5,2000,docvqa_val_anls,0.331054267713041,0.005645142408620243
≥5,2000,infovqa_val_anls,0.1253737215538702,0.00524700917894423
≥5,2000,mme_total_score,678.2414965986395,
≥5,2000,mmmu_val_mmmu_acc,0.24,
≥5,2000,mmstar_average,0.24144442112149672,
≥5,2000,ocrbench_ocrbench_accuracy,0.264,
≥5,2000,seedbench_seed_all,0.33140633685380766,
≥5,2000,textvqa_val_exact_match,0.33612,0.006470505591414144
≥5,3000,ai2d_exact_match,0.22409326424870465,0.007505002611196186
≥5,3000,average,0.29997958942235364,
≥5,3000,average_rank,4.3,
≥5,3000,chartqa_relaxed_overall,0.392,0.00976588700628918
≥5,3000,docvqa_val_anls,0.37299390513630937,0.005683849773109756
≥5,3000,infovqa_val_anls,0.13605101039483827,0.005410567699808442
≥5,3000,mme_total_score,659.7210884353742,
≥5,3000,mmmu_val_mmmu_acc,0.27,
≥5,3000,mmstar_average,0.2682811266889234,
≥5,3000,ocrbench_ocrbench_accuracy,0.287,
≥5,3000,seedbench_seed_all,0.3745969983324069,
≥5,3000,textvqa_val_exact_match,0.3748,0.006628980364742018
≥5,4000,ai2d_exact_match,0.22733160621761658,0.007543244231635894
≥5,4000,average,0.3084813519082869,
≥5,4000,average_rank,4.7,
≥5,4000,chartqa_relaxed_overall,0.43,0.00990349593288537
≥5,4000,docvqa_val_anls,0.4066720118815712,0.006028824654560211
≥5,4000,infovqa_val_anls,0.14319025154556023,0.005617800071290847
≥5,4000,mme_total_score,656.1462585034013,
≥5,4000,mmmu_val_mmmu_acc,0.25667,
≥5,4000,mmstar_average,0.2585945343280555,
≥5,4000,ocrbench_ocrbench_accuracy,0.294,
≥5,4000,seedbench_seed_all,0.39277376320177876,
≥5,4000,textvqa_val_exact_match,0.3671,0.006592830278584186
≥5,5000,ai2d_exact_match,0.24028497409326424,0.007689893942245019
≥5,5000,average,0.3230129052623469,
≥5,5000,average_rank,4.9,
≥5,5000,chartqa_relaxed_overall,0.442,0.009934479228979264
≥5,5000,docvqa_val_anls,0.43465518326761016,0.006092084287625314
≥5,5000,infovqa_val_anls,0.16044569408280707,0.005985099003597859
≥5,5000,mme_total_score,700.9234693877552,
≥5,5000,mmmu_val_mmmu_acc,0.26,
≥5,5000,mmstar_average,0.27948727201527235,
≥5,5000,ocrbench_ocrbench_accuracy,0.309,
≥5,5000,seedbench_seed_all,0.39744302390216785,
≥5,5000,textvqa_val_exact_match,0.3838,0.006651041968883851
≥5,6000,ai2d_exact_match,0.21761658031088082,0.007426556596739526
≥5,6000,average,0.3285664644731758,
≥5,6000,average_rank,4.8,
≥5,6000,chartqa_relaxed_overall,0.4708,0.009984929820955767
≥5,6000,docvqa_val_anls,0.4274906773084525,0.005930539560380286
≥5,6000,infovqa_val_anls,0.15122815225662642,0.005687399721363878
≥5,6000,mme_total_score,692.2227891156463,
≥5,6000,mmmu_val_mmmu_acc,0.27,
≥5,6000,mmstar_average,0.27596736182231085,
≥5,6000,ocrbench_ocrbench_accuracy,0.341,
≥5,6000,seedbench_seed_all,0.4237354085603113,
≥5,6000,textvqa_val_exact_match,0.37926,0.006628782590470618
≥5,7000,ai2d_exact_match,0.22959844559585493,0.007569631399592313
≥5,7000,average,0.3397133831241853,
≥5,7000,average_rank,5.0,
≥5,7000,chartqa_relaxed_overall,0.4864,0.009998299975543861
≥5,7000,docvqa_val_anls,0.4538685197224749,0.00598758370400633
≥5,7000,infovqa_val_anls,0.15500462855057698,0.005842239614739797
≥5,7000,mme_total_score,662.3809523809523,
≥5,7000,mmmu_val_mmmu_acc,0.26444,
≥5,7000,mmstar_average,0.2946102327923966,
≥5,7000,ocrbench_ocrbench_accuracy,0.339,
≥5,7000,seedbench_seed_all,0.43351862145636466,
≥5,7000,textvqa_val_exact_match,0.40098,0.00668858395709213
≥5,8000,ai2d_exact_match,0.26878238341968913,0.007979127569354613
≥5,8000,average,0.3468669425903158,
≥5,8000,average_rank,4.7,
≥5,8000,chartqa_relaxed_overall,0.4644,0.009976616117083942
≥5,8000,docvqa_val_anls,0.43320064291973065,0.005825461000081097
≥5,8000,infovqa_val_anls,0.1525871677997588,0.0057380999639673955
≥5,8000,mme_total_score,714.7789115646258,
≥5,8000,mmmu_val_mmmu_acc,0.27667,
≥5,8000,mmstar_average,0.3189178311414238,
≥5,8000,ocrbench_ocrbench_accuracy,0.358,
≥5,8000,seedbench_seed_all,0.4440244580322401,
≥5,8000,textvqa_val_exact_match,0.40522,0.006705157876473132
≥5,9000,ai2d_exact_match,0.23834196891191708,0.007668527149232641
≥5,9000,average,0.34742834361066494,
≥5,9000,average_rank,4.9,
≥5,9000,chartqa_relaxed_overall,0.4832,0.009996353076494045
≥5,9000,docvqa_val_anls,0.44997891177952337,0.005999690608407377
≥5,9000,infovqa_val_anls,0.15249014258349003,0.005725765633377559
≥5,9000,mme_total_score,696.5544217687075,
≥5,9000,mmmu_val_mmmu_acc,0.26444,
≥5,9000,mmstar_average,0.3019547640515156,
≥5,9000,ocrbench_ocrbench_accuracy,0.384,
≥5,9000,seedbench_seed_all,0.44874930516953865,
≥5,9000,textvqa_val_exact_match,0.4037,0.006699928343494548
≥5,10000,ai2d_exact_match,0.2979274611398964,0.008231480357867917
≥5,10000,average,0.3538147252476138,
≥5,10000,average_rank,4.8,
≥5,10000,chartqa_relaxed_overall,0.48,0.009993995796516643
≥5,10000,docvqa_val_anls,0.45125781190343667,0.0059273100312449535
≥5,10000,infovqa_val_anls,0.15739085013451903,0.005776029267754871
≥5,10000,mme_total_score,718.7227891156463,
≥5,10000,mmmu_val_mmmu_acc,0.27556,
≥5,10000,mmstar_average,0.3004387942674594,
≥5,10000,ocrbench_ocrbench_accuracy,0.357,
|
| 1080 |
+
≥5,10000,seedbench_seed_all,0.4556976097832129,
|
| 1081 |
+
≥5,10000,textvqa_val_exact_match,0.40906000000000003,0.006714715240436636
|
| 1082 |
+
≥5,11000,ai2d_exact_match,0.3167098445595855,0.008372690712254882
|
| 1083 |
+
≥5,11000,average,0.36396020347184427,
|
| 1084 |
+
≥5,11000,average_rank,5.0,
|
| 1085 |
+
≥5,11000,chartqa_relaxed_overall,0.4924,0.010000845102345324
|
| 1086 |
+
≥5,11000,docvqa_val_anls,0.4691277601070516,0.0060867637597330085
|
| 1087 |
+
≥5,11000,infovqa_val_anls,0.15562897334070494,0.005768608804593679
|
| 1088 |
+
≥5,11000,mme_total_score,680.7667066826731,
|
| 1089 |
+
≥5,11000,mmmu_val_mmmu_acc,0.27667,
|
| 1090 |
+
≥5,11000,mmstar_average,0.3111702671358657,
|
| 1091 |
+
≥5,11000,ocrbench_ocrbench_accuracy,0.388,
|
| 1092 |
+
≥5,11000,seedbench_seed_all,0.45497498610339077,
|
| 1093 |
+
≥5,11000,textvqa_val_exact_match,0.41096000000000005,0.006715250896200365
|
| 1094 |
+
≥5,12000,ai2d_exact_match,0.24838082901554404,0.007776597937116943
|
| 1095 |
+
≥5,12000,average,0.35400963042471534,
|
| 1096 |
+
≥5,12000,average_rank,4.9,
|
| 1097 |
+
≥5,12000,chartqa_relaxed_overall,0.4624,0.00997367964766694
|
| 1098 |
+
≥5,12000,docvqa_val_anls,0.46480289866811825,0.005910238300168798
|
| 1099 |
+
≥5,12000,infovqa_val_anls,0.15657154481637633,0.0057842205757870115
|
| 1100 |
+
≥5,12000,mme_total_score,742.4894957983194,
|
| 1101 |
+
≥5,12000,mmmu_val_mmmu_acc,0.28444,
|
| 1102 |
+
≥5,12000,mmstar_average,0.30237252416842486,
|
| 1103 |
+
≥5,12000,ocrbench_ocrbench_accuracy,0.391,
|
| 1104 |
+
≥5,12000,seedbench_seed_all,0.46197887715397445,
|
| 1105 |
+
≥5,12000,textvqa_val_exact_match,0.41414000000000006,0.0067237975855013775
|
| 1106 |
+
≥5,13000,ai2d_exact_match,0.27266839378238344,0.008015217564479081
|
| 1107 |
+
≥5,13000,average,0.3605408154099655,
|
| 1108 |
+
≥5,13000,average_rank,4.9,
|
| 1109 |
+
≥5,13000,chartqa_relaxed_overall,0.4796,0.00999367226769808
|
| 1110 |
+
≥5,13000,docvqa_val_anls,0.4888368998254502,0.006080092164054846
|
| 1111 |
+
≥5,13000,infovqa_val_anls,0.1685412928680358,0.006153102666352037
|
| 1112 |
+
≥5,13000,mme_total_score,715.9022609043617,
|
| 1113 |
+
≥5,13000,mmmu_val_mmmu_acc,0.27,
|
| 1114 |
+
≥5,13000,mmstar_average,0.30550310907874534,
|
| 1115 |
+
≥5,13000,ocrbench_ocrbench_accuracy,0.39,
|
| 1116 |
+
≥5,13000,seedbench_seed_all,0.46375764313507506,
|
| 1117 |
+
≥5,13000,textvqa_val_exact_match,0.40596000000000004,0.006708225975557757
|
| 1118 |
+
≥5,14000,ai2d_exact_match,0.27266839378238344,0.008015217564479094
|
| 1119 |
+
≥5,14000,average,0.35876061642606916,
|
| 1120 |
+
≥5,14000,average_rank,4.9,
|
| 1121 |
+
≥5,14000,chartqa_relaxed_overall,0.4832,0.009996353076494045
|
| 1122 |
+
≥5,14000,docvqa_val_anls,0.4686745608937551,0.005954780465596843
|
| 1123 |
+
≥5,14000,infovqa_val_anls,0.16026985404926572,0.00587737555538511
|
| 1124 |
+
≥5,14000,mme_total_score,694.7702080832332,
|
| 1125 |
+
≥5,14000,mmmu_val_mmmu_acc,0.27778,
|
| 1126 |
+
≥5,14000,mmstar_average,0.3065739842454048,
|
| 1127 |
+
≥5,14000,ocrbench_ocrbench_accuracy,0.388,
|
| 1128 |
+
≥5,14000,seedbench_seed_all,0.46575875486381324,
|
| 1129 |
+
≥5,14000,textvqa_val_exact_match,0.40592,0.006717590038338499
|
| 1130 |
+
≥5,15000,ai2d_exact_match,0.26295336787564766,0.007923526907377253
|
| 1131 |
+
≥5,15000,average,0.3594508046372947,
|
| 1132 |
+
≥5,15000,average_rank,4.9,
|
| 1133 |
+
≥5,15000,chartqa_relaxed_overall,0.4904,0.010000156861514821
|
| 1134 |
+
≥5,15000,docvqa_val_anls,0.47702085294845603,0.006014469495902542
|
| 1135 |
+
≥5,15000,infovqa_val_anls,0.1709556715444569,0.006117350998294382
|
| 1136 |
+
≥5,15000,mme_total_score,748.1163465386154,
|
| 1137 |
+
≥5,15000,mmmu_val_mmmu_acc,0.25667,
|
| 1138 |
+
≥5,15000,mmstar_average,0.2990729469212882,
|
| 1139 |
+
≥5,15000,ocrbench_ocrbench_accuracy,0.404,
|
| 1140 |
+
≥5,15000,seedbench_seed_all,0.46392440244580324,
|
| 1141 |
+
≥5,15000,textvqa_val_exact_match,0.4100599999999999,0.0067243737790625615
|
| 1142 |
+
≥5,16000,ai2d_exact_match,0.28950777202072536,0.00816284339533906
|
| 1143 |
+
≥5,16000,average,0.3652803192394071,
|
| 1144 |
+
≥5,16000,average_rank,4.9,
|
| 1145 |
+
≥5,16000,chartqa_relaxed_overall,0.5004,0.010001997399559365
|
| 1146 |
+
≥5,16000,docvqa_val_anls,0.4789319968433556,0.005936381904079473
|
| 1147 |
+
≥5,16000,infovqa_val_anls,0.16818261112655605,0.006062058685336811
|
| 1148 |
+
≥5,16000,mme_total_score,703.8838535414166,
|
| 1149 |
+
≥5,16000,mmmu_val_mmmu_acc,0.28111,
|
| 1150 |
+
≥5,16000,mmstar_average,0.30021933140749574,
|
| 1151 |
+
≥5,16000,ocrbench_ocrbench_accuracy,0.392,
|
| 1152 |
+
≥5,16000,seedbench_seed_all,0.4640911617565314,
|
| 1153 |
+
≥5,16000,textvqa_val_exact_match,0.41308,0.006723304491442948
|
| 1154 |
+
≥5,17000,ai2d_exact_match,0.28335492227979275,0.008110527983566212
|
| 1155 |
+
≥5,17000,average,0.36065417779712866,
|
| 1156 |
+
≥5,17000,average_rank,5.0,
|
| 1157 |
+
≥5,17000,chartqa_relaxed_overall,0.4688,0.009982508912777261
|
| 1158 |
+
≥5,17000,docvqa_val_anls,0.4676527518642357,0.00590362287731878
|
| 1159 |
+
≥5,17000,infovqa_val_anls,0.16818540516392913,0.00605571000794457
|
| 1160 |
+
≥5,17000,mme_total_score,754.0354141656662,
|
| 1161 |
+
≥5,17000,mmmu_val_mmmu_acc,0.26222,
|
| 1162 |
+
≥5,17000,mmstar_average,0.31626391497403816,
|
| 1163 |
+
≥5,17000,ocrbench_ocrbench_accuracy,0.404,
|
| 1164 |
+
≥5,17000,seedbench_seed_all,0.46309060589216233,
|
| 1165 |
+
≥5,17000,textvqa_val_exact_match,0.41231999999999996,0.006722044383678169
|
| 1166 |
+
≥5,18000,ai2d_exact_match,0.2911269430051813,0.00817630569100236
|
| 1167 |
+
≥5,18000,average,0.3642489832139911,
|
| 1168 |
+
≥5,18000,average_rank,4.9,
|
| 1169 |
+
≥5,18000,chartqa_relaxed_overall,0.488,0.009999119609104738
|
| 1170 |
+
≥5,18000,docvqa_val_anls,0.4852288069276555,0.006044640681137398
|
| 1171 |
+
≥5,18000,infovqa_val_anls,0.1659765406298008,0.006009331694189444
|
| 1172 |
+
≥5,18000,mme_total_score,748.4861944777911,
|
| 1173 |
+
≥5,18000,mmmu_val_mmmu_acc,0.28111,
|
| 1174 |
+
≥5,18000,mmstar_average,0.3014618713149217,
|
| 1175 |
+
≥5,18000,ocrbench_ocrbench_accuracy,0.389,
|
| 1176 |
+
≥5,18000,seedbench_seed_all,0.4660366870483602,
|
| 1177 |
+
≥5,18000,textvqa_val_exact_match,0.4103,0.0067180509406887
|
| 1178 |
+
≥5,19000,ai2d_exact_match,0.2817357512953368,0.008096452844781159
|
| 1179 |
+
≥5,19000,average,0.35871512802442374,
|
| 1180 |
+
≥5,19000,average_rank,4.7,
|
| 1181 |
+
≥5,19000,chartqa_relaxed_overall,0.452,0.009955804699716018
|
| 1182 |
+
≥5,19000,docvqa_val_anls,0.4693437417424619,0.005945802716190409
|
| 1183 |
+
≥5,19000,infovqa_val_anls,0.17352672765291935,0.006108049035774969
|
| 1184 |
+
≥5,19000,mme_total_score,757.4390756302521,
|
| 1185 |
+
≥5,19000,mmmu_val_mmmu_acc,0.29556,
|
| 1186 |
+
≥5,19000,mmstar_average,0.299605929305638,
|
| 1187 |
+
≥5,19000,ocrbench_ocrbench_accuracy,0.382,
|
| 1188 |
+
≥5,19000,seedbench_seed_all,0.4672040022234575,
|
| 1189 |
+
≥5,19000,textvqa_val_exact_match,0.40746,0.006711235192985202
|
| 1190 |
+
≥5,20000,ai2d_exact_match,0.28950777202072536,0.008162843395339051
|
| 1191 |
+
≥5,20000,average,0.3571101844602158,
|
| 1192 |
+
≥5,20000,average_rank,5.0,
|
| 1193 |
+
≥5,20000,chartqa_relaxed_overall,0.452,0.009955804699716018
|
| 1194 |
+
≥5,20000,docvqa_val_anls,0.4781541164812954,0.006040598891772297
|
| 1195 |
+
≥5,20000,infovqa_val_anls,0.16871824680773087,0.00599943702354704
|
| 1196 |
+
≥5,20000,mme_total_score,713.3514405762305,
|
| 1197 |
+
≥5,20000,mmmu_val_mmmu_acc,0.26667,
|
| 1198 |
+
≥5,20000,mmstar_average,0.30644375940695473,
|
| 1199 |
+
≥5,20000,ocrbench_ocrbench_accuracy,0.398,
|
| 1200 |
+
≥5,20000,seedbench_seed_all,0.4599777654252362,
|
| 1201 |
+
≥5,20000,textvqa_val_exact_match,0.39452000000000004,0.006680937127692554
|
app/src/content/assets/data/formatting_filters.csv
ADDED
@@ -0,0 +1,1201 @@
run,step,metric,value,stderr
|
| 2 |
+
Baseline,1000,ai2d_exact_match,0.2548575129533679,0.007843322436924496
|
| 3 |
+
Baseline,1000,average,0.27120689295763617,
|
| 4 |
+
Baseline,1000,average_rank,3.8,
|
| 5 |
+
Baseline,1000,chartqa_relaxed_overall,0.3308,0.009411906161401973
|
| 6 |
+
Baseline,1000,docvqa_val_anls,0.3528553494243383,0.005852289239342309
|
| 7 |
+
Baseline,1000,infovqa_val_anls,0.17320578642581314,0.006297063452679795
|
| 8 |
+
Baseline,1000,mme_total_score,977.4280712284914,
|
| 9 |
+
Baseline,1000,mmmu_val_mmmu_acc,0.25222,
|
| 10 |
+
Baseline,1000,mmstar_average,0.23215874078908072,
|
| 11 |
+
Baseline,1000,ocrbench_ocrbench_accuracy,0.286,
|
| 12 |
+
Baseline,1000,seedbench_seed_all,0.2563646470261256,
|
| 13 |
+
Baseline,1000,textvqa_val_exact_match,0.3024,0.00628900296642181
|
| 14 |
+
Baseline,2000,ai2d_exact_match,0.26295336787564766,0.007923526907377255
|
| 15 |
+
Baseline,2000,average,0.3202068275596269,
|
| 16 |
+
Baseline,2000,average_rank,3.7,
|
| 17 |
+
Baseline,2000,chartqa_relaxed_overall,0.4688,0.009982508912777261
|
| 18 |
+
Baseline,2000,docvqa_val_anls,0.4452261510942785,0.00614755494712251
|
| 19 |
+
Baseline,2000,infovqa_val_anls,0.1820547866557169,0.006217861455795791
|
| 20 |
+
Baseline,2000,mme_total_score,1049.3036214485794,
|
| 21 |
+
Baseline,2000,mmmu_val_mmmu_acc,0.24556,
|
| 22 |
+
Baseline,2000,mmstar_average,0.21305462434540698,
|
| 23 |
+
Baseline,2000,ocrbench_ocrbench_accuracy,0.395,
|
| 24 |
+
Baseline,2000,seedbench_seed_all,0.258532518065592,
|
| 25 |
+
Baseline,2000,textvqa_val_exact_match,0.41068000000000005,0.006697862330024289
|
| 26 |
+
Baseline,3000,ai2d_exact_match,0.25226683937823835,0.007816909588794397
|
| 27 |
+
Baseline,3000,average,0.3507423834414229,
|
| 28 |
+
Baseline,3000,average_rank,2.6,
|
| 29 |
+
Baseline,3000,chartqa_relaxed_overall,0.5028,0.010001843767601082
|
| 30 |
+
Baseline,3000,docvqa_val_anls,0.502653993831009,0.006267072346683124
|
| 31 |
+
Baseline,3000,infovqa_val_anls,0.21728617578189535,0.006796941784959762
|
| 32 |
+
Baseline,3000,mme_total_score,1170.2383953581434,
|
| 33 |
+
Baseline,3000,mmmu_val_mmmu_acc,0.27556,
|
| 34 |
+
Baseline,3000,mmstar_average,0.25432376938577683,
|
| 35 |
+
Baseline,3000,ocrbench_ocrbench_accuracy,0.436,
|
| 36 |
+
Baseline,3000,seedbench_seed_all,0.2792106725958866,
|
| 37 |
+
Baseline,3000,textvqa_val_exact_match,0.43658,0.006766885462882726
|
| 38 |
+
Baseline,4000,ai2d_exact_match,0.2645725388601036,0.007939149662089447
|
| 39 |
+
Baseline,4000,average,0.36961781722974835,
|
| 40 |
+
Baseline,4000,average_rank,2.8,
|
| 41 |
+
Baseline,4000,chartqa_relaxed_overall,0.5312,0.009982508912777261
|
| 42 |
+
Baseline,4000,docvqa_val_anls,0.5374434618615119,0.0062905728113059655
|
| 43 |
+
Baseline,4000,infovqa_val_anls,0.2287924838861707,0.006994568698639919
|
| 44 |
+
Baseline,4000,mme_total_score,1155.203781512605,
|
| 45 |
+
Baseline,4000,mmmu_val_mmmu_acc,0.25556,
|
| 46 |
+
Baseline,4000,mmstar_average,0.2575590188757354,
|
| 47 |
+
Baseline,4000,ocrbench_ocrbench_accuracy,0.453,
|
| 48 |
+
Baseline,4000,seedbench_seed_all,0.33913285158421347,
|
| 49 |
+
Baseline,4000,textvqa_val_exact_match,0.4593,0.006791695475025738
|
| 50 |
+
Baseline,5000,ai2d_exact_match,0.3125,0.008342439145556371
|
| 51 |
+
Baseline,5000,average,0.3974627910380972,
|
| 52 |
+
Baseline,5000,average_rank,3.1,
|
| 53 |
+
Baseline,5000,chartqa_relaxed_overall,0.5488,0.00995424828018316
|
| 54 |
+
Baseline,5000,docvqa_val_anls,0.552360266782429,0.006300308519952055
|
| 55 |
+
Baseline,5000,infovqa_val_anls,0.23425555286643698,0.007002254622066442
|
| 56 |
+
Baseline,5000,mme_total_score,1181.4653861544618,
|
| 57 |
+
Baseline,5000,mmmu_val_mmmu_acc,0.26667,
|
| 58 |
+
Baseline,5000,mmstar_average,0.29596648146165705,
|
| 59 |
+
Baseline,5000,ocrbench_ocrbench_accuracy,0.462,
|
| 60 |
+
Baseline,5000,seedbench_seed_all,0.43107281823235133,
|
| 61 |
+
Baseline,5000,textvqa_val_exact_match,0.47354000000000007,0.0068172185364497985
|
| 62 |
+
Baseline,6000,ai2d_exact_match,0.358160621761658,0.008629463221867162
|
| 63 |
+
Baseline,6000,average,0.4161227404571003,
|
| 64 |
+
Baseline,6000,average_rank,2.3,
|
| 65 |
+
Baseline,6000,chartqa_relaxed_overall,0.5628,0.00992279440175477
|
| 66 |
+
Baseline,6000,docvqa_val_anls,0.5747451497228876,0.00625495440870239
|
| 67 |
+
Baseline,6000,infovqa_val_anls,0.22152017368968838,0.006604546680525351
|
| 68 |
+
Baseline,6000,mme_total_score,1284.1648659463785,
|
| 69 |
+
Baseline,6000,mmmu_val_mmmu_acc,0.27111,
|
| 70 |
+
Baseline,6000,mmstar_average,0.2978489412854164,
|
| 71 |
+
Baseline,6000,ocrbench_ocrbench_accuracy,0.495,
|
| 72 |
+
Baseline,6000,seedbench_seed_all,0.4795997776542524,
|
| 73 |
+
Baseline,6000,textvqa_val_exact_match,0.48432,0.006800535050670284
|
| 74 |
+
Baseline,7000,ai2d_exact_match,0.3707901554404145,0.00869347755587734
|
| 75 |
+
Baseline,7000,average,0.4291083177345374,
|
| 76 |
+
Baseline,7000,average_rank,2.6,
|
| 77 |
+
Baseline,7000,chartqa_relaxed_overall,0.5656,0.009915542506251351
|
| 78 |
+
Baseline,7000,docvqa_val_anls,0.5940907049431567,0.006224236305767187
|
| 79 |
+
Baseline,7000,infovqa_val_anls,0.2515675215816963,0.007105097396092786
|
| 80 |
+
Baseline,7000,mme_total_score,1185.875650260104,
|
| 81 |
+
Baseline,7000,mmmu_val_mmmu_acc,0.26556,
|
| 82 |
+
Baseline,7000,mmstar_average,0.31372400960777047,
|
| 83 |
+
Baseline,7000,ocrbench_ocrbench_accuracy,0.504,
|
| 84 |
+
Baseline,7000,seedbench_seed_all,0.4964424680377988,
|
| 85 |
+
Baseline,7000,textvqa_val_exact_match,0.5002,0.006794794025220267
|
| 86 |
+
Baseline,8000,ai2d_exact_match,0.37759067357512954,0.008725299846043883
|
| 87 |
+
Baseline,8000,average,0.43846759477995995,
|
| 88 |
+
Baseline,8000,average_rank,2.1,
|
| 89 |
+
Baseline,8000,chartqa_relaxed_overall,0.5832,0.009862556058385773
|
| 90 |
+
Baseline,8000,docvqa_val_anls,0.6017336419437208,0.006231612198089698
|
| 91 |
+
Baseline,8000,infovqa_val_anls,0.2449256624147254,0.006992518502948913
|
| 92 |
+
Baseline,8000,mme_total_score,1199.2409963985594,
|
| 93 |
+
Baseline,8000,mmmu_val_mmmu_acc,0.28111,
|
| 94 |
+
Baseline,8000,mmstar_average,0.33512257186205047,
|
| 95 |
+
Baseline,8000,ocrbench_ocrbench_accuracy,0.51,
|
| 96 |
+
Baseline,8000,seedbench_seed_all,0.5024458032240133,
|
| 97 |
+
Baseline,8000,textvqa_val_exact_match,0.51008,0.006796301690135059
|
| 98 |
+
Baseline,9000,ai2d_exact_match,0.4067357512953368,0.008841214921078996
|
| 99 |
+
Baseline,9000,average,0.4422510732201056,
|
| 100 |
+
Baseline,9000,average_rank,2.1,
|
| 101 |
+
Baseline,9000,chartqa_relaxed_overall,0.5912,0.009834211136815875
|
| 102 |
+
Baseline,9000,docvqa_val_anls,0.6170968481662739,0.00617235763542544
|
| 103 |
+
Baseline,9000,infovqa_val_anls,0.23537031288570615,0.00670318154156447
|
| 104 |
+
Baseline,9000,mme_total_score,1231.5195078031213,
|
| 105 |
+
Baseline,9000,mmmu_val_mmmu_acc,0.25889,
|
| 106 |
+
Baseline,9000,mmstar_average,0.3216444898242951,
|
| 107 |
+
Baseline,9000,ocrbench_ocrbench_accuracy,0.515,
|
| 108 |
+
Baseline,9000,seedbench_seed_all,0.5120622568093385,
|
| 109 |
+
Baseline,9000,textvqa_val_exact_match,0.52226,0.006792711289708482
|
| 110 |
+
Baseline,10000,ai2d_exact_match,0.39993523316062174,0.008817096257082848
|
| 111 |
+
Baseline,10000,average,0.4523875703250908,
|
| 112 |
+
Baseline,10000,average_rank,2.4,
|
| 113 |
+
Baseline,10000,chartqa_relaxed_overall,0.5996,0.00980154906867574
|
| 114 |
+
Baseline,10000,docvqa_val_anls,0.6262613496433054,0.006147756371688175
|
| 115 |
+
Baseline,10000,infovqa_val_anls,0.263290074230132,0.007186788766942786
|
| 116 |
+
Baseline,10000,mme_total_score,1240.8218287314926,
|
| 117 |
+
Baseline,10000,mmmu_val_mmmu_acc,0.28778,
|
| 118 |
+
Baseline,10000,mmstar_average,0.32972717906018517,
|
| 119 |
+
Baseline,10000,ocrbench_ocrbench_accuracy,0.517,
|
| 120 |
+
Baseline,10000,seedbench_seed_all,0.5217342968315731,
|
| 121 |
+
Baseline,10000,textvqa_val_exact_match,0.5261600000000001,0.006785774843600811
|
| 122 |
+
Baseline,11000,ai2d_exact_match,0.422279792746114,0.008889771831066474
|
| 123 |
+
Baseline,11000,average,0.4561398159525099,
|
| 124 |
+
Baseline,11000,average_rank,2.1,
|
| 125 |
+
Baseline,11000,chartqa_relaxed_overall,0.6104,0.009755142291143075
|
| 126 |
+
Baseline,11000,docvqa_val_anls,0.6373130149166712,0.006128022584995044
|
| 127 |
+
Baseline,11000,infovqa_val_anls,0.24419378339723755,0.006897644885887063
|
| 128 |
+
Baseline,11000,mme_total_score,1322.9488795518205,
|
| 129 |
+
Baseline,11000,mmmu_val_mmmu_acc,0.27778,
|
| 130 |
+
Baseline,11000,mmstar_average,0.3298563439522548,
|
| 131 |
+
Baseline,11000,ocrbench_ocrbench_accuracy,0.521,
|
| 132 |
+
Baseline,11000,seedbench_seed_all,0.5237354085603113,
|
| 133 |
+
Baseline,11000,textvqa_val_exact_match,0.5387,0.006770851562852138
|
| 134 |
+
Baseline,12000,ai2d_exact_match,0.42001295336787564,0.008883255931688034
|
| 135 |
+
Baseline,12000,average,0.4582751140055433,
|
| 136 |
+
Baseline,12000,average_rank,2.4,
|
| 137 |
+
Baseline,12000,chartqa_relaxed_overall,0.618,0.009719474639861454
|
| 138 |
+
Baseline,12000,docvqa_val_anls,0.6393961983751871,0.0061228747388476674
|
| 139 |
+
Baseline,12000,infovqa_val_anls,0.24798874058574302,0.006855374548993139
|
| 140 |
+
Baseline,12000,mme_total_score,1225.6453581432572,
|
| 141 |
+
Baseline,12000,mmmu_val_mmmu_acc,0.27889,
|
| 142 |
+
Baseline,12000,mmstar_average,0.34010867846816534,
|
| 143 |
+
Baseline,12000,ocrbench_ocrbench_accuracy,0.512,
|
| 144 |
+
Baseline,12000,seedbench_seed_all,0.5350194552529183,
|
| 145 |
+
Baseline,12000,textvqa_val_exact_match,0.5330600000000001,0.006777713092109446
|
| 146 |
+
Baseline,13000,ai2d_exact_match,0.4375,0.008928571428571428
|
| 147 |
+
Baseline,13000,average,0.4692868662590049,
|
| 148 |
+
Baseline,13000,average_rank,1.6,
|
| 149 |
+
Baseline,13000,chartqa_relaxed_overall,0.6148,0.00973479791861169
|
| 150 |
+
Baseline,13000,docvqa_val_anls,0.6511374872549951,0.006086953065248391
|
| 151 |
+
Baseline,13000,infovqa_val_anls,0.24465055100441893,0.006808432538374664
|
| 152 |
+
Baseline,13000,mme_total_score,1281.7122849139657,
|
| 153 |
+
Baseline,13000,mmmu_val_mmmu_acc,0.28222,
|
| 154 |
+
Baseline,13000,mmstar_average,0.3453069542917521,
|
| 155 |
+
Baseline,13000,ocrbench_ocrbench_accuracy,0.549,
|
| 156 |
+
Baseline,13000,seedbench_seed_all,0.5442468037798777,
|
| 157 |
+
Baseline,13000,textvqa_val_exact_match,0.55472,0.0067416788982325
|
| 158 |
+
Baseline,14000,ai2d_exact_match,0.4572538860103627,0.00896620675297095
|
| 159 |
+
Baseline,14000,average,0.47352486841689195,
|
| 160 |
+
Baseline,14000,average_rank,1.7,
|
| 161 |
+
Baseline,14000,chartqa_relaxed_overall,0.6172,0.009723347231923635
|
| 162 |
+
Baseline,14000,docvqa_val_anls,0.6502269393708169,0.006057950730638126
|
| 163 |
+
Baseline,14000,infovqa_val_anls,0.25805460837190913,0.007037735231659539
|
| 164 |
+
Baseline,14000,mme_total_score,1309.1444577831132,
|
| 165 |
+
Baseline,14000,mmmu_val_mmmu_acc,0.28111,
|
| 166 |
+
Baseline,14000,mmstar_average,0.34575818188776586,
|
| 167 |
+
Baseline,14000,ocrbench_ocrbench_accuracy,0.551,
|
| 168 |
+
Baseline,14000,seedbench_seed_all,0.5483602001111729,
|
| 169 |
+
Baseline,14000,textvqa_val_exact_match,0.55276,0.006751206724612103
|
| 170 |
+
Baseline,15000,ai2d_exact_match,0.45045336787564766,0.008954861634252399
|
| 171 |
+
Baseline,15000,average,0.47878665012878824,
|
| 172 |
+
Baseline,15000,average_rank,1.6,
|
| 173 |
+
Baseline,15000,chartqa_relaxed_overall,0.612,0.009747841205275417
|
| 174 |
+
Baseline,15000,docvqa_val_anls,0.6621413031955148,0.006056838050222495
|
| 175 |
+
Baseline,15000,infovqa_val_anls,0.2706898598157733,0.007200315730154543
|
| 176 |
+
Baseline,15000,mme_total_score,1384.2171868747498,
|
| 177 |
+
Baseline,15000,mmmu_val_mmmu_acc,0.30222,
|
| 178 |
+
Baseline,15000,mmstar_average,0.35408135695920684,
|
| 179 |
+
Baseline,15000,ocrbench_ocrbench_accuracy,0.558,
|
| 180 |
+
Baseline,15000,seedbench_seed_all,0.5411339633129516,
|
| 181 |
+
Baseline,15000,textvqa_val_exact_match,0.5583600000000001,0.0067279027203879065
|
| 182 |
+
Baseline,16000,ai2d_exact_match,0.45077720207253885,0.008955440137395838
|
| 183 |
+
Baseline,16000,average,0.47665128022935843,
|
| 184 |
+
Baseline,16000,average_rank,1.6,
|
| 185 |
+
Baseline,16000,chartqa_relaxed_overall,0.632,0.00964715642305132
|
| 186 |
+
Baseline,16000,docvqa_val_anls,0.6709415729142987,0.005999818105621502
|
| 187 |
+
Baseline,16000,infovqa_val_anls,0.26050032542402035,0.006997451875879188
|
| 188 |
+
Baseline,16000,mme_total_score,1317.8491396558625,
|
| 189 |
+
Baseline,16000,mmmu_val_mmmu_acc,0.27556,
|
| 190 |
+
Baseline,16000,mmstar_average,0.33214333327093315,
|
| 191 |
+
Baseline,16000,ocrbench_ocrbench_accuracy,0.56,
|
| 192 |
+
Baseline,16000,seedbench_seed_all,0.5463590883824346,
|
| 193 |
+
Baseline,16000,textvqa_val_exact_match,0.56158,0.006723854754867398
|
| 194 |
+
Baseline,17000,ai2d_exact_match,0.45919689119170987,0.008969138793675545
|
| 195 |
+
Baseline,17000,average,0.4777141780162423,
|
| 196 |
+
Baseline,17000,average_rank,1.9,
|
| 197 |
+
Baseline,17000,chartqa_relaxed_overall,0.632,0.00964715642305132
|
| 198 |
+
Baseline,17000,docvqa_val_anls,0.6796338519136422,0.005948761388267941
|
| 199 |
+
Baseline,17000,infovqa_val_anls,0.28070956072505215,0.007298333094144192
|
| 200 |
+
Baseline,17000,mme_total_score,1381.9161664665867,
|
| 201 |
+
Baseline,17000,mmmu_val_mmmu_acc,0.27667,
|
| 202 |
+
Baseline,17000,mmstar_average,0.3370289492329521,
|
| 203 |
+
Baseline,17000,ocrbench_ocrbench_accuracy,0.519,
|
| 204 |
+
Baseline,17000,seedbench_seed_all,0.5510283490828238,
|
| 205 |
+
Baseline,17000,textvqa_val_exact_match,0.56416,0.006724830373229479
|
| 206 |
+
Baseline,18000,ai2d_exact_match,0.46567357512953367,0.008977921602780726
|
| 207 |
+
Baseline,18000,average,0.4819834595278701,
|
| 208 |
+
Baseline,18000,average_rank,1.7,
|
| 209 |
+
Baseline,18000,chartqa_relaxed_overall,0.6376,0.009615793331418735
|
| 210 |
+
Baseline,18000,docvqa_val_anls,0.6775884603912571,0.005972234236435759
|
| 211 |
+
Baseline,18000,infovqa_val_anls,0.27154318420389256,0.007164903131667027
|
| 212 |
+
Baseline,18000,mme_total_score,1336.922769107643,
|
| 213 |
+
Baseline,18000,mmmu_val_mmmu_acc,0.28667,
|
| 214 |
+
Baseline,18000,mmstar_average,0.34482796716566916,
|
| 215 |
+
Baseline,18000,ocrbench_ocrbench_accuracy,0.533,
|
| 216 |
+
Baseline,18000,seedbench_seed_all,0.5543079488604781,
|
| 217 |
+
Baseline,18000,textvqa_val_exact_match,0.5666399999999999,0.006713392287599574
|
| 218 |
+
Baseline,19000,ai2d_exact_match,0.4682642487046632,0.008981008686994101
|
| 219 |
+
Baseline,19000,average,0.4899006713916878,
|
| 220 |
+
Baseline,19000,average_rank,1.5,
|
| 221 |
+
Baseline,19000,chartqa_relaxed_overall,0.6444,0.009575809858898698
|
| 222 |
+
Baseline,19000,docvqa_val_anls,0.678226526479947,0.005970619221588814
|
| 223 |
+
Baseline,19000,infovqa_val_anls,0.26993847247278,0.0071348470764911525
|
| 224 |
+
Baseline,19000,mme_total_score,1406.6628651460583,
|
| 225 |
+
Baseline,19000,mmmu_val_mmmu_acc,0.28333,
|
| 226 |
+
Baseline,19000,mmstar_average,0.356220913822775,
|
| 227 |
+
Baseline,19000,ocrbench_ocrbench_accuracy,0.577,
|
| 228 |
+
Baseline,19000,seedbench_seed_all,0.554585881045025,
|
| 229 |
+
Baseline,19000,textvqa_val_exact_match,0.57714,0.0066918487914812905
|
| 230 |
+
Baseline,20000,ai2d_exact_match,0.47571243523316065,0.00898853090258662
|
| 231 |
+
Baseline,20000,average,0.4873169067639118,
|
| 232 |
+
Baseline,20000,average_rank,1.7,
|
| 233 |
+
Baseline,20000,chartqa_relaxed_overall,0.6336,0.009638338810708618
|
| 234 |
+
Baseline,20000,docvqa_val_anls,0.6895214454380043,0.005896462073053767
|
| 235 |
+
Baseline,20000,infovqa_val_anls,0.2655657550458317,0.007033265532032538
|
| 236 |
+
Baseline,20000,mme_total_score,1324.6738695478193,
|
| 237 |
+
Baseline,20000,mmmu_val_mmmu_acc,0.30111,
|
| 238 |
+
Baseline,20000,mmstar_average,0.33806766134497995,
|
| 239 |
+
Baseline,20000,ocrbench_ocrbench_accuracy,0.555,
|
| 240 |
+
Baseline,20000,seedbench_seed_all,0.5587548638132296,
|
| 241 |
+
Baseline,20000,textvqa_val_exact_match,0.56852,0.006720151338087659
|
| 242 |
+
≥2,1000,ai2d_exact_match,0.2619818652849741,0.007914086941902845
|
| 243 |
+
≥2,1000,average,0.2852885776543714,
|
| 244 |
+
≥2,1000,average_rank,2.8,
|
| 245 |
+
≥2,1000,chartqa_relaxed_overall,0.36,0.009601920576192066
|
| 246 |
+
≥2,1000,docvqa_val_anls,0.3691495236959511,0.0059102400877721764
|
| 247 |
+
≥2,1000,infovqa_val_anls,0.18005913830944342,0.006300821228003093
|
| 248 |
+
≥2,1000,mme_total_score,1034.4992997198879,
|
| 249 |
+
≥2,1000,mmmu_val_mmmu_acc,0.25222,
|
| 250 |
+
≥2,1000,mmstar_average,0.20333316409480473,
|
| 251 |
+
≥2,1000,ocrbench_ocrbench_accuracy,0.331,
|
| 252 |
+
≥2,1000,seedbench_seed_all,0.264313507504169,
|
| 253 |
+
≥2,1000,textvqa_val_exact_match,0.34554,0.006483180392801138
≥2,2000,ai2d_exact_match,0.25971502590673573,0.007891865786132407
≥2,2000,average,0.3309474525195546,
≥2,2000,average_rank,2.3,
≥2,2000,chartqa_relaxed_overall,0.4664,0.009979391329160321
≥2,2000,docvqa_val_anls,0.44591152951098784,0.006197056264354256
≥2,2000,infovqa_val_anls,0.20775747304541303,0.006645281501282388
≥2,2000,mme_total_score,1083.1982793117247,
≥2,2000,mmmu_val_mmmu_acc,0.26222,
≥2,2000,mmstar_average,0.23515873070535015,
≥2,2000,ocrbench_ocrbench_accuracy,0.413,
≥2,2000,seedbench_seed_all,0.2757643135075042,
≥2,2000,textvqa_val_exact_match,0.4126,0.006707581257746032
≥2,3000,ai2d_exact_match,0.27299222797927464,0.00801819019286542
≥2,3000,average,0.35386512749374127,
≥2,3000,average_rank,1.8,
≥2,3000,chartqa_relaxed_overall,0.5124,0.009998924311892653
≥2,3000,docvqa_val_anls,0.48910828732243933,0.006274136020264289
≥2,3000,infovqa_val_anls,0.2070472808493129,0.0065577848697521875
≥2,3000,mme_total_score,1128.4556822729091,
≥2,3000,mmmu_val_mmmu_acc,0.25222,
≥2,3000,mmstar_average,0.25601322622316125,
≥2,3000,ocrbench_ocrbench_accuracy,0.447,
≥2,3000,seedbench_seed_all,0.30522512506948307,
≥2,3000,textvqa_val_exact_match,0.44278,0.006763750733490772
≥2,4000,ai2d_exact_match,0.3173575129533679,0.008377274276497445
≥2,4000,average,0.3859767914191833,
≥2,4000,average_rank,2.4,
≥2,4000,chartqa_relaxed_overall,0.5388,0.0099718403035556
≥2,4000,docvqa_val_anls,0.5360750144064242,0.006293888693576319
≥2,4000,infovqa_val_anls,0.20188347210673038,0.0064101781168258935
≥2,4000,mme_total_score,1110.1481592637056,
≥2,4000,mmmu_val_mmmu_acc,0.25222,
≥2,4000,mmstar_average,0.28905163247788923,
≥2,4000,ocrbench_ocrbench_accuracy,0.473,
≥2,4000,seedbench_seed_all,0.4102834908282379,
≥2,4000,textvqa_val_exact_match,0.4551200000000001,0.0067829756649846785
≥2,5000,ai2d_exact_match,0.32577720207253885,0.008435168191407938
≥2,5000,average,0.4029564015858402,
≥2,5000,average_rank,2.5,
≥2,5000,chartqa_relaxed_overall,0.5628,0.00992279440175477
≥2,5000,docvqa_val_anls,0.548770508666009,0.006315482288099859
≥2,5000,infovqa_val_anls,0.20783386531525747,0.006421967027729742
≥2,5000,mme_total_score,1206.0127050820329,
≥2,5000,mmmu_val_mmmu_acc,0.25667,
≥2,5000,mmstar_average,0.3210115801865161,
≥2,5000,ocrbench_ocrbench_accuracy,0.484,
≥2,5000,seedbench_seed_all,0.4440244580322401,
≥2,5000,textvqa_val_exact_match,0.47572,0.006783457774606987
≥2,6000,ai2d_exact_match,0.3542746113989637,0.00860846328571982
≥2,6000,average,0.4118759304334577,
≥2,6000,average_rank,3.3,
≥2,6000,chartqa_relaxed_overall,0.5644,0.00991868984106597
≥2,6000,docvqa_val_anls,0.5618652265799138,0.006261889040657647
≥2,6000,infovqa_val_anls,0.2101901707833487,0.006387610514125727
≥2,6000,mme_total_score,1135.471288515406,
≥2,6000,mmmu_val_mmmu_acc,0.26333,
≥2,6000,mmstar_average,0.3255666447386709,
≥2,6000,ocrbench_ocrbench_accuracy,0.482,
≥2,6000,seedbench_seed_all,0.4740967204002223,
≥2,6000,textvqa_val_exact_match,0.47116,0.00678908456375694
≥2,7000,ai2d_exact_match,0.37338082901554404,0.008705816961084268
≥2,7000,average,0.4291995483001856,
≥2,7000,average_rank,2.4,
≥2,7000,chartqa_relaxed_overall,0.5716,0.009898917689756362
≥2,7000,docvqa_val_anls,0.5846126379804475,0.006218823793449337
≥2,7000,infovqa_val_anls,0.2243908724169204,0.006651785538916188
≥2,7000,mme_total_score,1249.9180672268908,
≥2,7000,mmmu_val_mmmu_acc,0.27556,
≥2,7000,mmstar_average,0.32644844909642934,
≥2,7000,ocrbench_ocrbench_accuracy,0.506,
≥2,7000,seedbench_seed_all,0.4936631461923291,
≥2,7000,textvqa_val_exact_match,0.5071399999999999,0.00678246300696791
≥2,8000,ai2d_exact_match,0.3963730569948187,0.008803757198545703
≥2,8000,average,0.43448504857720643,
≥2,8000,average_rank,3.0,
≥2,8000,chartqa_relaxed_overall,0.5784,0.009878279615563902
≥2,8000,docvqa_val_anls,0.5935884981677085,0.006228109848938283
≥2,8000,infovqa_val_anls,0.22034669568379356,0.006538842004996925
≥2,8000,mme_total_score,1251.6327531012405,
≥2,8000,mmmu_val_mmmu_acc,0.27444,
≥2,8000,mmstar_average,0.3368503047476477,
≥2,8000,ocrbench_ocrbench_accuracy,0.516,
≥2,8000,seedbench_seed_all,0.4963868816008894,
≥2,8000,textvqa_val_exact_match,0.49798000000000003,0.006777844181917349
≥2,9000,ai2d_exact_match,0.4015544041450777,0.008822998789014784
≥2,9000,average,0.4409076865862069,
≥2,9000,average_rank,2.8,
≥2,9000,chartqa_relaxed_overall,0.5952,0.0098190299592035
≥2,9000,docvqa_val_anls,0.6142957639909281,0.006149142953850004
≥2,9000,infovqa_val_anls,0.225441641847203,0.006565814507342015
≥2,9000,mme_total_score,1170.923569427771,
≥2,9000,mmmu_val_mmmu_acc,0.28,
≥2,9000,mmstar_average,0.32763888124373686,
≥2,9000,ocrbench_ocrbench_accuracy,0.514,
≥2,9000,seedbench_seed_all,0.5012784880489161,
≥2,9000,textvqa_val_exact_match,0.50876,0.006788539558245703
≥2,10000,ai2d_exact_match,0.4018782383419689,0.008824167272304229
≥2,10000,average,0.44844067183729286,
≥2,10000,average_rank,2.7,
≥2,10000,chartqa_relaxed_overall,0.5956,0.009817474681589429
≥2,10000,docvqa_val_anls,0.6161881255627961,0.006150295182189919
≥2,10000,infovqa_val_anls,0.2273186020139702,0.006609762944776786
≥2,10000,mme_total_score,1244.6918767507002,
≥2,10000,mmmu_val_mmmu_acc,0.28667,
≥2,10000,mmstar_average,0.3405769394273513,
≥2,10000,ocrbench_ocrbench_accuracy,0.529,
≥2,10000,seedbench_seed_all,0.5174541411895498,
≥2,10000,textvqa_val_exact_match,0.52128,0.0067723707312184415
≥2,11000,ai2d_exact_match,0.41386010362694303,0.008864599272573477
≥2,11000,average,0.4508015001273205,
≥2,11000,average_rank,3.1,
≥2,11000,chartqa_relaxed_overall,0.5916,0.0098327233755248
≥2,11000,docvqa_val_anls,0.6132516406541649,0.006147223601932411
≥2,11000,infovqa_val_anls,0.23136501765139353,0.006670154065298524
≥2,11000,mme_total_score,1193.1198479391755,
≥2,11000,mmmu_val_mmmu_acc,0.28222,
≥2,11000,mmstar_average,0.34055130285985363,
≥2,11000,ocrbench_ocrbench_accuracy,0.544,
≥2,11000,seedbench_seed_all,0.5137854363535297,
≥2,11000,textvqa_val_exact_match,0.52658,0.006779520123033763
≥2,12000,ai2d_exact_match,0.42033678756476683,0.008884198538329101
≥2,12000,average,0.4593162089992856,
≥2,12000,average_rank,2.4,
≥2,12000,chartqa_relaxed_overall,0.612,0.009747841205275417
≥2,12000,docvqa_val_anls,0.6322256818549263,0.006037251396803284
≥2,12000,infovqa_val_anls,0.23499854511160906,0.006635085630122106
≥2,12000,mme_total_score,1282.3226290516207,
≥2,12000,mmmu_val_mmmu_acc,0.29444,
≥2,12000,mmstar_average,0.3455632878074604,
≥2,12000,ocrbench_ocrbench_accuracy,0.542,
≥2,12000,seedbench_seed_all,0.5148415786548082,
≥2,12000,textvqa_val_exact_match,0.5374399999999999,0.0067549667056943374
≥2,13000,ai2d_exact_match,0.4329663212435233,0.008917911748577596
≥2,13000,average,0.4594856750450977,
≥2,13000,average_rank,3.1,
≥2,13000,chartqa_relaxed_overall,0.6116,0.009749676839741497
≥2,13000,docvqa_val_anls,0.6480115225202001,0.006082136258345928
≥2,13000,infovqa_val_anls,0.2390399772273204,0.006801403608154099
≥2,13000,mme_total_score,1255.4888955582232,
≥2,13000,mmmu_val_mmmu_acc,0.26667,
≥2,13000,mmstar_average,0.3276926929918222,
≥2,13000,ocrbench_ocrbench_accuracy,0.551,
≥2,13000,seedbench_seed_all,0.5190105614230128,
≥2,13000,textvqa_val_exact_match,0.5393800000000001,0.006748937157104821
≥2,14000,ai2d_exact_match,0.43523316062176165,0.00892333645202351
≥2,14000,average,0.46380397688227554,
≥2,14000,average_rank,3.4,
≥2,14000,chartqa_relaxed_overall,0.6136,0.009740429476494075
≥2,14000,docvqa_val_anls,0.6474419557198757,0.006056802443739013
≥2,14000,infovqa_val_anls,0.24341248035748822,0.006789396426159645
≥2,14000,mme_total_score,1209.5489195678272,
≥2,14000,mmmu_val_mmmu_acc,0.27556,
≥2,14000,mmstar_average,0.35309886783724065,
≥2,14000,ocrbench_ocrbench_accuracy,0.545,
≥2,14000,seedbench_seed_all,0.5207893274041134,
≥2,14000,textvqa_val_exact_match,0.5400999999999999,0.006762835587905254
≥2,15000,ai2d_exact_match,0.43588082901554404,0.008924851504668983
≥2,15000,average,0.46327247775474995,
≥2,15000,average_rank,3.4,
≥2,15000,chartqa_relaxed_overall,0.614,0.009738559226822298
≥2,15000,docvqa_val_anls,0.638973421646662,0.005999307255506728
≥2,15000,infovqa_val_anls,0.23590457960067904,0.006699424952743598
≥2,15000,mme_total_score,1230.12775110044,
≥2,15000,mmmu_val_mmmu_acc,0.28667,
≥2,15000,mmstar_average,0.3545309625815601,
≥2,15000,ocrbench_ocrbench_accuracy,0.536,
≥2,15000,seedbench_seed_all,0.5225125069483046,
≥2,15000,textvqa_val_exact_match,0.54498,0.006749227387936104
≥2,16000,ai2d_exact_match,0.4413860103626943,0.008937105222785166
≥2,16000,average,0.4691799993543692,
≥2,16000,average_rank,3.1,
≥2,16000,chartqa_relaxed_overall,0.6168,0.009725273074549106
≥2,16000,docvqa_val_anls,0.6539654303329543,0.00605387835402321
≥2,16000,infovqa_val_anls,0.251011584102177,0.006888371171829252
≥2,16000,mme_total_score,1235.6986794717886,
≥2,16000,mmmu_val_mmmu_acc,0.28556,
≥2,16000,mmstar_average,0.35467603553935745,
≥2,16000,ocrbench_ocrbench_accuracy,0.545,
≥2,16000,seedbench_seed_all,0.5256809338521401,
≥2,16000,textvqa_val_exact_match,0.5485399999999999,0.0067546057338473825
≥2,17000,ai2d_exact_match,0.43976683937823835,0.008933617011753861
≥2,17000,average,0.47032074037035837,
≥2,17000,average_rank,3.1,
≥2,17000,chartqa_relaxed_overall,0.6184,0.009717527882093043
≥2,17000,docvqa_val_anls,0.655318828070759,0.005978737407680595
≥2,17000,infovqa_val_anls,0.24899610305034758,0.006825123869520012
≥2,17000,mme_total_score,1246.5548219287716,
≥2,17000,mmmu_val_mmmu_acc,0.29556,
≥2,17000,mmstar_average,0.3399003792152031,
≥2,17000,ocrbench_ocrbench_accuracy,0.553,
≥2,17000,seedbench_seed_all,0.5241245136186771,
≥2,17000,textvqa_val_exact_match,0.55782,0.0067278542139723035
≥2,18000,ai2d_exact_match,0.4413860103626943,0.008937105222785166
≥2,18000,average,0.4720458439616472,
≥2,18000,average_rank,3.0,
≥2,18000,chartqa_relaxed_overall,0.6256,0.009681288495793083
≥2,18000,docvqa_val_anls,0.6595541124701471,0.00598982698063352
≥2,18000,infovqa_val_anls,0.24628476636774824,0.006771852338911992
≥2,18000,mme_total_score,1246.580632252901,
≥2,18000,mmmu_val_mmmu_acc,0.29556,
≥2,18000,mmstar_average,0.3370877286888103,
≥2,18000,ocrbench_ocrbench_accuracy,0.557,
≥2,18000,seedbench_seed_all,0.5279599777654252,
≥2,18000,textvqa_val_exact_match,0.5579800000000001,0.006730967620262408
≥2,19000,ai2d_exact_match,0.44009067357512954,0.008934322367529354
≥2,19000,average,0.47484048232505544,
≥2,19000,average_rank,3.1,
≥2,19000,chartqa_relaxed_overall,0.6236,0.009691583292459796
≥2,19000,docvqa_val_anls,0.656845810830509,0.005998201285366962
≥2,19000,infovqa_val_anls,0.2483428639081206,0.006859079557818024
≥2,19000,mme_total_score,1255.002801120448,
≥2,19000,mmmu_val_mmmu_acc,0.30667,
≥2,19000,mmstar_average,0.3484268658746636,
≥2,19000,ocrbench_ocrbench_accuracy,0.557,
≥2,19000,seedbench_seed_all,0.5306281267370762,
≥2,19000,textvqa_val_exact_match,0.56196,0.006711810587335734
≥2,20000,ai2d_exact_match,0.44397668393782386,0.008942485993062323
≥2,20000,average,0.4719647885204447,
≥2,20000,average_rank,3.3,
≥2,20000,chartqa_relaxed_overall,0.6252,0.009683361554563506
≥2,20000,docvqa_val_anls,0.6531065052301426,0.005958657790556006
≥2,20000,infovqa_val_anls,0.2515640311557441,0.00684713602725156
≥2,20000,mme_total_score,1269.56512605042,
≥2,20000,mmmu_val_mmmu_acc,0.29222,
≥2,20000,mmstar_average,0.3405247257210479,
≥2,20000,ocrbench_ocrbench_accuracy,0.557,
≥2,20000,seedbench_seed_all,0.528071150639244,
≥2,20000,textvqa_val_exact_match,0.5560200000000001,0.006742124529303335
≥3,1000,ai2d_exact_match,0.25712435233160624,0.007866134203324925
≥3,1000,average,0.2908366935977347,
≥3,1000,average_rank,2.6,
≥3,1000,chartqa_relaxed_overall,0.3724,0.009670817229291067
≥3,1000,docvqa_val_anls,0.36190361730095816,0.005874681377878617
≥3,1000,infovqa_val_anls,0.1897409167650202,0.006570751118077319
≥3,1000,mme_total_score,938.3572428971589,
≥3,1000,mmmu_val_mmmu_acc,0.26333,
≥3,1000,mmstar_average,0.2361438073438948,
≥3,1000,ocrbench_ocrbench_accuracy,0.313,
≥3,1000,seedbench_seed_all,0.25758754863813227,
≥3,1000,textvqa_val_exact_match,0.36629999999999996,0.006582851113746775
≥3,2000,ai2d_exact_match,0.25647668393782386,0.007859644922870102
≥3,2000,average,0.327478691314146,
≥3,2000,average_rank,2.9,
≥3,2000,chartqa_relaxed_overall,0.4708,0.009984929820955767
≥3,2000,docvqa_val_anls,0.455859181323049,0.0061669106819143196
≥3,2000,infovqa_val_anls,0.20804914764579785,0.0066905821266465505
≥3,2000,mme_total_score,990.4248699479792,
≥3,2000,mmmu_val_mmmu_acc,0.27111,
≥3,2000,mmstar_average,0.21380673865938699,
≥3,2000,ocrbench_ocrbench_accuracy,0.405,
≥3,2000,seedbench_seed_all,0.26364647026125626,
≥3,2000,textvqa_val_exact_match,0.40256,0.0066960030295180025
≥3,3000,ai2d_exact_match,0.2697538860103627,0.007988222765138163
≥3,3000,average,0.34640625458292296,
≥3,3000,average_rank,3.1,
≥3,3000,chartqa_relaxed_overall,0.514,0.009998079047189691
≥3,3000,docvqa_val_anls,0.4749731938810012,0.00604931863100692
≥3,3000,infovqa_val_anls,0.19785201580687228,0.00636819106561235
≥3,3000,mme_total_score,1022.0748299319728,
≥3,3000,mmmu_val_mmmu_acc,0.26556,
≥3,3000,mmstar_average,0.2234035546364529,
≥3,3000,ocrbench_ocrbench_accuracy,0.435,
≥3,3000,seedbench_seed_all,0.29655364091161757,
≥3,3000,textvqa_val_exact_match,0.44056000000000006,0.006770653264576898
≥3,4000,ai2d_exact_match,0.31476683937823835,0.008358827401711809
≥3,4000,average,0.3840989485719881,
≥3,4000,average_rank,2.5,
≥3,4000,chartqa_relaxed_overall,0.5244,0.009990083919101193
≥3,4000,docvqa_val_anls,0.5398623644141017,0.006209437344747972
≥3,4000,infovqa_val_anls,0.21841657659961455,0.006654701266433889
≥3,4000,mme_total_score,1008.1938775510204,
≥3,4000,mmmu_val_mmmu_acc,0.27778,
≥3,4000,mmstar_average,0.25611290572758977,
≥3,4000,ocrbench_ocrbench_accuracy,0.462,
≥3,4000,seedbench_seed_all,0.39733185102834906,
≥3,4000,textvqa_val_exact_match,0.46621999999999997,0.006799457981763631
≥3,5000,ai2d_exact_match,0.3442357512953368,0.008551327504046387
≥3,5000,average,0.4034839586685592,
≥3,5000,average_rank,2.4,
≥3,5000,chartqa_relaxed_overall,0.5544,0.009942625323290008
≥3,5000,docvqa_val_anls,0.5567727758893183,0.006173642024037381
≥3,5000,infovqa_val_anls,0.21638639427926507,0.006559084868006158
≥3,5000,mme_total_score,1074.1284513805522,
≥3,5000,mmmu_val_mmmu_acc,0.26778,
≥3,5000,mmstar_average,0.3009278216170371,
≥3,5000,ocrbench_ocrbench_accuracy,0.482,
≥3,5000,seedbench_seed_all,0.4471928849360756,
≥3,5000,textvqa_val_exact_match,0.46165999999999996,0.006793381991893107
≥3,6000,ai2d_exact_match,0.36819948186528495,0.008680870162409787
≥3,6000,average,0.41987173897944946,
≥3,6000,average_rank,2.2,
≥3,6000,chartqa_relaxed_overall,0.5636,0.009920755241100424
≥3,6000,docvqa_val_anls,0.5766507662420887,0.006104661016322198
≥3,6000,infovqa_val_anls,0.2209691160904877,0.0066290878786102805
≥3,6000,mme_total_score,1088.6353541416568,
≥3,6000,mmmu_val_mmmu_acc,0.29778,
≥3,6000,mmstar_average,0.2960442855054549,
≥3,6000,ocrbench_ocrbench_accuracy,0.496,
≥3,6000,seedbench_seed_all,0.48360200111172874,
≥3,6000,textvqa_val_exact_match,0.476,0.006791614329821814
≥3,7000,ai2d_exact_match,0.3905440414507772,0.008780876258359173
≥3,7000,average,0.4305333557001585,
≥3,7000,average_rank,2.4,
≥3,7000,chartqa_relaxed_overall,0.5744,0.009890651444389179
≥3,7000,docvqa_val_anls,0.5943945047786826,0.006168637154272831
≥3,7000,infovqa_val_anls,0.23015651757384684,0.006652654324068369
≥3,7000,mme_total_score,1024.3486394557824,
≥3,7000,mmmu_val_mmmu_acc,0.29,
≥3,7000,mmstar_average,0.3086297067032336,
≥3,7000,ocrbench_ocrbench_accuracy,0.496,
≥3,7000,seedbench_seed_all,0.49577543079488606,
≥3,7000,textvqa_val_exact_match,0.4949,0.006791673090238732
≥3,8000,ai2d_exact_match,0.39863989637305697,0.008812301996070583
≥3,8000,average,0.43894563539556736,
≥3,8000,average_rank,2.4,
≥3,8000,chartqa_relaxed_overall,0.5812,0.009869224115088964
≥3,8000,docvqa_val_anls,0.597896936397571,0.006178924858305047
≥3,8000,infovqa_val_anls,0.23624667779429379,0.006701812126185011
≥3,8000,mme_total_score,1087.3003201280512,
≥3,8000,mmmu_val_mmmu_acc,0.30444,
≥3,8000,mmstar_average,0.3179238728089695,
≥3,8000,ocrbench_ocrbench_accuracy,0.52,
≥3,8000,seedbench_seed_all,0.5060033351862145,
≥3,8000,textvqa_val_exact_match,0.48816000000000004,0.006805617250862191
≥3,9000,ai2d_exact_match,0.4073834196891192,0.008843420154535594
≥3,9000,average,0.4380819691649286,
≥3,9000,average_rank,2.8,
≥3,9000,chartqa_relaxed_overall,0.5892,0.009841548985529353
≥3,9000,docvqa_val_anls,0.5926801961513722,0.00607014347283834
≥3,9000,infovqa_val_anls,0.22884227739619317,0.006587321958723987
≥3,9000,mme_total_score,960.1394557823129,
≥3,9000,mmmu_val_mmmu_acc,0.29222,
≥3,9000,mmstar_average,0.3023124740503409,
≥3,9000,ocrbench_ocrbench_accuracy,0.516,
≥3,9000,seedbench_seed_all,0.5108393551973318,
≥3,9000,textvqa_val_exact_match,0.50326,0.006787480273097782
≥3,10000,ai2d_exact_match,0.42487046632124353,0.008896983637113786
≥3,10000,average,0.45376975700130806,
≥3,10000,average_rank,2.5,
≥3,10000,chartqa_relaxed_overall,0.592,0.009831228876620145
≥3,10000,docvqa_val_anls,0.6288940533515488,0.006078026262812974
≥3,10000,infovqa_val_anls,0.2639557991160976,0.007015193539901653
≥3,10000,mme_total_score,1135.5116046418568,
≥3,10000,mmmu_val_mmmu_acc,0.29556,
≥3,10000,mmstar_average,0.3171165325775241,
≥3,10000,ocrbench_ocrbench_accuracy,0.53,
≥3,10000,seedbench_seed_all,0.5157309616453586,
≥3,10000,textvqa_val_exact_match,0.5158,0.0067831610812991135
≥3,11000,ai2d_exact_match,0.4271373056994819,0.008903088856242218
≥3,11000,average,0.4507656942256156,
≥3,11000,average_rank,3.1,
≥3,11000,chartqa_relaxed_overall,0.6008,0.00979663889573671
≥3,11000,docvqa_val_anls,0.6266233612884972,0.006097228164879785
≥3,11000,infovqa_val_anls,0.23605295775343718,0.006674753327687541
≥3,11000,mme_total_score,1115.0593237294918,
≥3,11000,mmmu_val_mmmu_acc,0.27889,
≥3,11000,mmstar_average,0.3244509807099135,
≥3,11000,ocrbench_ocrbench_accuracy,0.524,
≥3,11000,seedbench_seed_all,0.5219566425792107,
≥3,11000,textvqa_val_exact_match,0.5169799999999999,0.006776837095888084
≥3,12000,ai2d_exact_match,0.42843264248704666,0.008906491762178372
≥3,12000,average,0.4596080978908205,
≥3,12000,average_rank,2.6,
≥3,12000,chartqa_relaxed_overall,0.6048,0.009779828322460816
≥3,12000,docvqa_val_anls,0.6391083009950597,0.006038971765674556
≥3,12000,infovqa_val_anls,0.24141834493583503,0.006794485284013245
≥3,12000,mme_total_score,1183.3176270508202,
≥3,12000,mmmu_val_mmmu_acc,0.28444,
≥3,12000,mmstar_average,0.3293224419601992,
≥3,12000,ocrbench_ocrbench_accuracy,0.555,
≥3,12000,seedbench_seed_all,0.528071150639244,
≥3,12000,textvqa_val_exact_match,0.5258799999999999,0.006773951756875811
≥3,13000,ai2d_exact_match,0.43458549222797926,0.008921805911548515
≥3,13000,average,0.4623863039639755,
≥3,13000,average_rank,2.7,
≥3,13000,chartqa_relaxed_overall,0.6108,0.00975332737879659
≥3,13000,docvqa_val_anls,0.6376374898768016,0.006015671277879292
≥3,13000,infovqa_val_anls,0.24710671614089955,0.006756961641692092
≥3,13000,mme_total_score,1261.84493797519,
≥3,13000,mmmu_val_mmmu_acc,0.28889,
≥3,13000,mmstar_average,0.3264133242561134,
≥3,13000,ocrbench_ocrbench_accuracy,0.553,
≥3,13000,seedbench_seed_all,0.5306837131739855,
≥3,13000,textvqa_val_exact_match,0.5323599999999999,0.0067627001192260856
≥3,14000,ai2d_exact_match,0.4381476683937824,0.008930032335354969
≥3,14000,average,0.4678786302971554,
≥3,14000,average_rank,2.8,
≥3,14000,chartqa_relaxed_overall,0.6104,0.009755142291143075
≥3,14000,docvqa_val_anls,0.6523739582747238,0.006065891171788989
≥3,14000,infovqa_val_anls,0.2541881734588241,0.006851623469491799
≥3,14000,mme_total_score,1188.5243097238895,
≥3,14000,mmmu_val_mmmu_acc,0.30333,
≥3,14000,mmstar_average,0.3360144984503474,
≥3,14000,ocrbench_ocrbench_accuracy,0.544,
≥3,14000,seedbench_seed_all,0.5320733740967204,
≥3,14000,textvqa_val_exact_match,0.54038,0.006754155375986593
≥3,15000,ai2d_exact_match,0.4420336787564767,0.008938473522297184
≥3,15000,average,0.4717225541426424,
≥3,15000,average_rank,2.6,
≥3,15000,chartqa_relaxed_overall,0.6204,0.009707689307588963
≥3,15000,docvqa_val_anls,0.6657062061222615,0.005987913582977679
≥3,15000,infovqa_val_anls,0.2547770182395889,0.00697535381427897
≥3,15000,mme_total_score,1150.081532613045,
≥3,15000,mmmu_val_mmmu_acc,0.29889,
≥3,15000,mmstar_average,0.32974187627218104,
≥3,15000,ocrbench_ocrbench_accuracy,0.556,
≥3,15000,seedbench_seed_all,0.533574207893274,
≥3,15000,textvqa_val_exact_match,0.54438,0.006740769296908389
≥3,16000,ai2d_exact_match,0.44397668393782386,0.008942485993062323
≥3,16000,average,0.4725329505079693,
≥3,16000,average_rank,2.7,
≥3,16000,chartqa_relaxed_overall,0.6136,0.009740429476494075
≥3,16000,docvqa_val_anls,0.6627356411508976,0.00599828184206493
≥3,16000,infovqa_val_anls,0.25144929243788827,0.006859545868541458
≥3,16000,mme_total_score,1189.4136654661866,
≥3,16000,mmmu_val_mmmu_acc,0.30556,
≥3,16000,mmstar_average,0.32828609880164517,
≥3,16000,ocrbench_ocrbench_accuracy,0.565,
≥3,16000,seedbench_seed_all,0.5359088382434686,
≥3,16000,textvqa_val_exact_match,0.54628,0.006755557699551266
≥3,17000,ai2d_exact_match,0.4423575129533679,0.008939151893135124
≥3,17000,average,0.47219284094380815,
≥3,17000,average_rank,2.9,
≥3,17000,chartqa_relaxed_overall,0.6196,0.009711645711462604
≥3,17000,docvqa_val_anls,0.6671354152323413,0.005979986643812461
≥3,17000,infovqa_val_anls,0.26085018558007095,0.006930202483417548
≥3,17000,mme_total_score,1181.9268707482993,
≥3,17000,mmmu_val_mmmu_acc,0.29667,
≥3,17000,mmstar_average,0.3246494808541184,
≥3,17000,ocrbench_ocrbench_accuracy,0.556,
≥3,17000,seedbench_seed_all,0.5353529738743746,
≥3,17000,textvqa_val_exact_match,0.5471199999999999,0.006741055517194408
≥3,18000,ai2d_exact_match,0.44624352331606215,0.0089469921763539
≥3,18000,average,0.47727537354972976,
≥3,18000,average_rank,2.8,
≥3,18000,chartqa_relaxed_overall,0.6212,0.009703704898413913
≥3,18000,docvqa_val_anls,0.6676971859833172,0.005968624246725931
≥3,18000,infovqa_val_anls,0.2614701461784385,0.006943538426265278
≥3,18000,mme_total_score,1133.047819127651,
≥3,18000,mmmu_val_mmmu_acc,0.30444,
≥3,18000,mmstar_average,0.3242292852357318,
≥3,18000,ocrbench_ocrbench_accuracy,0.582,
≥3,18000,seedbench_seed_all,0.5367982212340189,
≥3,18000,textvqa_val_exact_match,0.5513999999999999,0.006735687188133017
≥3,19000,ai2d_exact_match,0.4520725388601036,0.008957715852675527
≥3,19000,average,0.4762675915069992,
≥3,19000,average_rank,3.0,
≥3,19000,chartqa_relaxed_overall,0.6216,0.009701702181065136
≥3,19000,docvqa_val_anls,0.6679273632688325,0.00596194457686321
≥3,19000,infovqa_val_anls,0.25211534311880446,0.006837669178934141
≥3,19000,mme_total_score,1168.6077430972389,
≥3,19000,mmmu_val_mmmu_acc,0.30111,
≥3,19000,mmstar_average,0.334229548576509,
≥3,19000,ocrbench_ocrbench_accuracy,0.566,
≥3,19000,seedbench_seed_all,0.5363535297387437,
≥3,19000,textvqa_val_exact_match,0.555,0.006737661257130932
≥3,20000,ai2d_exact_match,0.4566062176165803,0.008965198879336198
≥3,20000,average,0.4782761612786655,
≥3,20000,average_rank,2.7,
≥3,20000,chartqa_relaxed_overall,0.6268,0.009675026948726469
≥3,20000,docvqa_val_anls,0.6699567897644018,0.005975453790424837
≥3,20000,infovqa_val_anls,0.2594904076423186,0.006910668664574003
≥3,20000,mme_total_score,1194.4682873149259,
≥3,20000,mmmu_val_mmmu_acc,0.30667,
≥3,20000,mmstar_average,0.3291890626103143,
≥3,20000,ocrbench_ocrbench_accuracy,0.571,
≥3,20000,seedbench_seed_all,0.5353529738743746,
≥3,20000,textvqa_val_exact_match,0.54942,0.0067426571472292
≥4,1000,ai2d_exact_match,0.266839378238342,0.007960790788435024
≥4,1000,average,0.28718938224797474,
≥4,1000,average_rank,2.8,
≥4,1000,chartqa_relaxed_overall,0.3824,0.009721414421746647
≥4,1000,docvqa_val_anls,0.3742280929549393,0.005897617626003216
≥4,1000,infovqa_val_anls,0.18767733564942402,0.006495529242061099
≥4,1000,mme_total_score,970.0657262905162,
≥4,1000,mmmu_val_mmmu_acc,0.24667,
≥4,1000,mmstar_average,0.20409674845299178,
≥4,1000,ocrbench_ocrbench_accuracy,0.324,
≥4,1000,seedbench_seed_all,0.2471928849360756,
≥4,1000,textvqa_val_exact_match,0.3516,0.006519815150594346
≥4,2000,ai2d_exact_match,0.2700777202072539,0.007991243694641088
≥4,2000,average,0.32538295993176786,
≥4,2000,average_rank,2.8,
≥4,2000,chartqa_relaxed_overall,0.476,0.009990471651004463
≥4,2000,docvqa_val_anls,0.45055679456484166,0.006087636141467791
≥4,2000,infovqa_val_anls,0.21184608413063888,0.006740983882332282
≥4,2000,mme_total_score,1065.059423769508,
≥4,2000,mmmu_val_mmmu_acc,0.25444,
≥4,2000,mmstar_average,0.20479630173942967,
≥4,2000,ocrbench_ocrbench_accuracy,0.404,
≥4,2000,seedbench_seed_all,0.2535297387437465,
≥4,2000,textvqa_val_exact_match,0.4032,0.00669032914742019
≥4,3000,ai2d_exact_match,0.26813471502590674,0.007973037037795191
≥4,3000,average,0.3429973351943505,
≥4,3000,average_rank,3.6,
≥4,3000,chartqa_relaxed_overall,0.5052,0.010001459677380663
≥4,3000,docvqa_val_anls,0.4883627712637139,0.006123671768321872
≥4,3000,infovqa_val_anls,0.2020989926298624,0.006492359244043468
≥4,3000,mme_total_score,1028.0742296918768,
≥4,3000,mmmu_val_mmmu_acc,0.24444,
≥4,3000,mmstar_average,0.23755500197641968,
≥4,3000,ocrbench_ocrbench_accuracy,0.417,
≥4,3000,seedbench_seed_all,0.2961645358532518,
≥4,3000,textvqa_val_exact_match,0.42802000000000007,0.006729073636571477
≥4,4000,ai2d_exact_match,0.297279792746114,0.008226320033454882
≥4,4000,average,0.37640705986204226,
≥4,4000,average_rank,3.0,
≥4,4000,chartqa_relaxed_overall,0.5328,0.009980456292330589
≥4,4000,docvqa_val_anls,0.5114700599486628,0.006120071866795458
≥4,4000,infovqa_val_anls,0.20557836945629954,0.006329851460183733
≥4,4000,mme_total_score,1074.640656262505,
≥4,4000,mmmu_val_mmmu_acc,0.25889,
≥4,4000,mmstar_average,0.24011275407256244,
≥4,4000,ocrbench_ocrbench_accuracy,0.489,
≥4,4000,seedbench_seed_all,0.40261256253474154,
≥4,4000,textvqa_val_exact_match,0.44992,0.006773387223162055
≥4,5000,ai2d_exact_match,0.32998704663212436,0.008462949140760363
≥4,5000,average,0.3995227518308942,
≥4,5000,average_rank,2.8,
≥4,5000,chartqa_relaxed_overall,0.55,0.009951864943131942
≥4,5000,docvqa_val_anls,0.5627332434349699,0.006167596088104117
≥4,5000,infovqa_val_anls,0.20676909019266723,0.0063195922615256655
≥4,5000,mme_total_score,1081.3841536614646,
≥4,5000,mmmu_val_mmmu_acc,0.26667,
≥4,5000,mmstar_average,0.26742033896981504,
≥4,5000,ocrbench_ocrbench_accuracy,0.49,
≥4,5000,seedbench_seed_all,0.4530850472484714,
≥4,5000,textvqa_val_exact_match,0.46903999999999996,0.006785801728684695
≥4,6000,ai2d_exact_match,0.35654145077720206,0.008620788425978479
≥4,6000,average,0.41458714913417777,
≥4,6000,average_rank,2.8,
≥4,6000,chartqa_relaxed_overall,0.5632,0.009921778100334079
≥4,6000,docvqa_val_anls,0.5818014224607982,0.006182490179642956
≥4,6000,infovqa_val_anls,0.2145391217079547,0.006472934595237677
≥4,6000,mme_total_score,1132.2886154461785,
≥4,6000,mmmu_val_mmmu_acc,0.26667,
≥4,6000,mmstar_average,0.28714914548287923,
≥4,6000,ocrbench_ocrbench_accuracy,0.499,
≥4,6000,seedbench_seed_all,0.47376320177876596,
≥4,6000,textvqa_val_exact_match,0.48862,0.006787319991169747
≥4,7000,ai2d_exact_match,0.38471502590673573,0.008756678690415541
≥4,7000,average,0.42592935170009355,
≥4,7000,average_rank,3.1,
≥4,7000,chartqa_relaxed_overall,0.5804,0.009871844677005952
≥4,7000,docvqa_val_anls,0.5710623718792285,0.006078423874650784
≥4,7000,infovqa_val_anls,0.22007869704137703,0.006475129444868969
≥4,7000,mme_total_score,1041.2597038815525,
≥4,7000,mmmu_val_mmmu_acc,0.28444,
≥4,7000,mmstar_average,0.3026487819798931,
≥4,7000,ocrbench_ocrbench_accuracy,0.502,
≥4,7000,seedbench_seed_all,0.4947192884936076,
≥4,7000,textvqa_val_exact_match,0.4933,0.006785560460724908
≥4,8000,ai2d_exact_match,0.3915155440414508,0.008784780895708938
≥4,8000,average,0.43659006376695136,
≥4,8000,average_rank,3.0,
≥4,8000,chartqa_relaxed_overall,0.5736,0.009893046292521752
≥4,8000,docvqa_val_anls,0.6079864136742988,0.006139878520335163
≥4,8000,infovqa_val_anls,0.23243402779245617,0.006686893363455147
≥4,8000,mme_total_score,1108.9173669467787,
≥4,8000,mmmu_val_mmmu_acc,0.28,
≥4,8000,mmstar_average,0.3276025817239844,
≥4,8000,ocrbench_ocrbench_accuracy,0.508,
≥4,8000,seedbench_seed_all,0.5016120066703724,
≥4,8000,textvqa_val_exact_match,0.50656,0.006805281452749051
≥4,9000,ai2d_exact_match,0.39248704663212436,0.008788649010397578
≥4,9000,average,0.4379212821083599,
≥4,9000,average_rank,2.9,
≥4,9000,chartqa_relaxed_overall,0.5844,0.009858475126140203
≥4,9000,docvqa_val_anls,0.6225000882770518,0.00610983265425905
≥4,9000,infovqa_val_anls,0.2357319670089269,0.006735352134813103
≥4,9000,mme_total_score,1054.3165266106444,
≥4,9000,mmmu_val_mmmu_acc,0.28556,
≥4,9000,mmstar_average,0.30919474945291153,
≥4,9000,ocrbench_ocrbench_accuracy,0.496,
≥4,9000,seedbench_seed_all,0.5078376876042245,
≥4,9000,textvqa_val_exact_match,0.50758,0.0067866133191798106
≥4,10000,ai2d_exact_match,0.4177461139896373,0.008876547725654098
≥4,10000,average,0.4482945169324334,
≥4,10000,average_rank,3.0,
≥4,10000,chartqa_relaxed_overall,0.5872,0.009848718845878486
≥4,10000,docvqa_val_anls,0.6178172719068701,0.006018237392964321
≥4,10000,infovqa_val_anls,0.24180220451279583,0.006673139519957623
≥4,10000,mme_total_score,1143.5380152060825,
≥4,10000,mmmu_val_mmmu_acc,0.29667,
≥4,10000,mmstar_average,0.31635030378359796,
≥4,10000,ocrbench_ocrbench_accuracy,0.524,
≥4,10000,seedbench_seed_all,0.5165647581989995,
≥4,10000,textvqa_val_exact_match,0.5165,0.006796704277648658
≥4,11000,ai2d_exact_match,0.41515544041450775,0.00886864516657515
≥4,11000,average,0.45134109009976725,
≥4,11000,average_rank,2.5,
≥4,11000,chartqa_relaxed_overall,0.5956,0.009817474681589429
≥4,11000,docvqa_val_anls,0.629269001239484,0.00608373788497042
≥4,11000,infovqa_val_anls,0.24324006994727237,0.006777064540159464
≥4,11000,mme_total_score,1228.8085234093637,
≥4,11000,mmmu_val_mmmu_acc,0.28333,
≥4,11000,mmstar_average,0.3288790569397762,
≥4,11000,ocrbench_ocrbench_accuracy,0.522,
≥4,11000,seedbench_seed_all,0.525236242356865,
≥4,11000,textvqa_val_exact_match,0.5193599999999999,0.0067761804436039675
≥4,12000,ai2d_exact_match,0.4183937823834197,0.008878484004260249
≥4,12000,average,0.45687238598965646,
≥4,12000,average_rank,3.3,
≥4,12000,chartqa_relaxed_overall,0.5988,0.0098047885010856
≥4,12000,docvqa_val_anls,0.6281800356608191,0.005956403319187123
≥4,12000,infovqa_val_anls,0.242249391011484,0.006664412716854741
≥4,12000,mme_total_score,1051.548619447779,
≥4,12000,mmmu_val_mmmu_acc,0.28,
≥4,12000,mmstar_average,0.32638661949265274,
≥4,12000,ocrbench_ocrbench_accuracy,0.553,
≥4,12000,seedbench_seed_all,0.5309616453585325,
≥4,12000,textvqa_val_exact_match,0.53388,0.006762808309810877
≥4,13000,ai2d_exact_match,0.4323186528497409,0.008916326937351901
≥4,13000,average,0.46134498058034357,
≥4,13000,average_rank,3.1,
≥4,13000,chartqa_relaxed_overall,0.5948,0.009820578470976232
≥4,13000,docvqa_val_anls,0.6459204882256453,0.006047391420582867
≥4,13000,infovqa_val_anls,0.24395762124162781,0.006787945348887751
≥4,13000,mme_total_score,1195.637755102041,
≥4,13000,mmmu_val_mmmu_acc,0.29556,
≥4,13000,mmstar_average,0.329867095702076,
≥4,13000,ocrbench_ocrbench_accuracy,0.542,
≥4,13000,seedbench_seed_all,0.5337409672040022,
≥4,13000,textvqa_val_exact_match,0.53394,0.006767804364428913
≥4,14000,ai2d_exact_match,0.4319948186528497,0.008915528710615492
≥4,14000,average,0.4668148142530245,
≥4,14000,average_rank,2.7,
≥4,14000,chartqa_relaxed_overall,0.6076,0.009767653701044555
≥4,14000,docvqa_val_anls,0.6561789267585798,0.005953346874132679
≥4,14000,infovqa_val_anls,0.24945371957223306,0.006769327490532885
≥4,14000,mme_total_score,1259.298019207683,
≥4,14000,mmmu_val_mmmu_acc,0.30111,
≥4,14000,mmstar_average,0.32172026573936097,
≥4,14000,ocrbench_ocrbench_accuracy,0.55,
≥4,14000,seedbench_seed_all,0.5360755975541968,
≥4,14000,textvqa_val_exact_match,0.5472000000000001,0.006748951153204005
≥4,15000,ai2d_exact_match,0.44624352331606215,0.008946992176353898
≥4,15000,average,0.46662671868754135,
≥4,15000,average_rank,2.9,
≥4,15000,chartqa_relaxed_overall,0.6044,0.009781540134915584
≥4,15000,docvqa_val_anls,0.6622581274446402,0.005962189435141322
≥4,15000,infovqa_val_anls,0.2534140745372918,0.006885986461871116
≥4,15000,mme_total_score,1200.2537014805923,
≥4,15000,mmmu_val_mmmu_acc,0.29,
≥4,15000,mmstar_average,0.3198168607331238,
≥4,15000,ocrbench_ocrbench_accuracy,0.538,
≥4,15000,seedbench_seed_all,0.5381878821567537,
≥4,15000,textvqa_val_exact_match,0.54732,0.006746470669416614
≥4,16000,ai2d_exact_match,0.43911917098445596,0.008932194723472647
≥4,16000,average,0.46812785270927354,
≥4,16000,average_rank,3.0,
≥4,16000,chartqa_relaxed_overall,0.6108,0.00975332737879659
≥4,16000,docvqa_val_anls,0.6699643513666329,0.005944732124585459
≥4,16000,infovqa_val_anls,0.2589072217280723,0.006864729360775582
≥4,16000,mme_total_score,1239.2577030812326,
≥4,16000,mmmu_val_mmmu_acc,0.28444,
≥4,16000,mmstar_average,0.3155913699930164,
≥4,16000,ocrbench_ocrbench_accuracy,0.547,
≥4,16000,seedbench_seed_all,0.535408560311284,
≥4,16000,textvqa_val_exact_match,0.55192,0.006727935474062503
≥4,17000,ai2d_exact_match,0.44591968911917096,0.00894635996642554
≥4,17000,average,0.4711902865454063,
≥4,17000,average_rank,2.9,
≥4,17000,chartqa_relaxed_overall,0.6092,0.009760545645634788
≥4,17000,docvqa_val_anls,0.6630256300679175,0.005926991608870499
≥4,17000,infovqa_val_anls,0.2604941623528308,0.0069459226352746855
≥4,17000,mme_total_score,1231.9475790316128,
≥4,17000,mmmu_val_mmmu_acc,0.28667,
≥4,17000,mmstar_average,0.3281863880858027,
≥4,17000,ocrbench_ocrbench_accuracy,0.559,
≥4,17000,seedbench_seed_all,0.5380767092829349,
≥4,17000,textvqa_val_exact_match,0.55014,0.00673464677421427
≥4,18000,ai2d_exact_match,0.44527202072538863,0.008945084019331404
≥4,18000,average,0.4730863890198541,
≥4,18000,average_rank,3.1,
≥4,18000,chartqa_relaxed_overall,0.6148,0.00973479791861169
≥4,18000,docvqa_val_anls,0.6724670614264582,0.0059283840951577715
≥4,18000,infovqa_val_anls,0.2591524677671406,0.006860568910244235
≥4,18000,mme_total_score,1230.187074829932,
≥4,18000,mmmu_val_mmmu_acc,0.28222,
≥4,18000,mmstar_average,0.3313130996754855,
≥4,18000,ocrbench_ocrbench_accuracy,0.559,
≥4,18000,seedbench_seed_all,0.5391328515842134,
≥4,18000,textvqa_val_exact_match,0.55442,0.0067378017419973775
≥4,19000,ai2d_exact_match,0.4475388601036269,0.008949482610884277
≥4,19000,average,0.4748981492546839,
≥4,19000,average_rank,3.0,
≥4,19000,chartqa_relaxed_overall,0.614,0.009738559226822298
≥4,19000,docvqa_val_anls,0.6780110114952748,0.005954038851856335
≥4,19000,infovqa_val_anls,0.2592553130284412,0.0068947091925615645
≥4,19000,mme_total_score,1280.6934773909566,
≥4,19000,mmmu_val_mmmu_acc,0.29778,
≥4,19000,mmstar_average,0.33015385627459537,
≥4,19000,ocrbench_ocrbench_accuracy,0.55,
≥4,19000,seedbench_seed_all,0.5397443023902168,
≥4,19000,textvqa_val_exact_match,0.5576000000000001,0.00671993150976252
≥4,20000,ai2d_exact_match,0.45077720207253885,0.008955440137395838
≥4,20000,average,0.47750231781976243,
≥4,20000,average_rank,2.7,
≥4,20000,chartqa_relaxed_overall,0.6204,0.009707689307588963
≥4,20000,docvqa_val_anls,0.673153386104693,0.005945073379634221
≥4,20000,infovqa_val_anls,0.2604945747241511,0.006917780880359967
≥4,20000,mme_total_score,1348.6498599439776,
≥4,20000,mmmu_val_mmmu_acc,0.29222,
≥4,20000,mmstar_average,0.32618642565880246,
≥4,20000,ocrbench_ocrbench_accuracy,0.569,
≥4,20000,seedbench_seed_all,0.5406892718176765,
≥4,20000,textvqa_val_exact_match,0.5646,0.006722023885782034
≥5,1000,ai2d_exact_match,0.27396373056994816,0.008027076080717028
≥5,1000,average,0.26438200891802877,
≥5,1000,average_rank,3.0,
≥5,1000,chartqa_relaxed_overall,0.2832,0.00901285729603301
≥5,1000,docvqa_val_anls,0.32055326545515606,0.0056245129740867565
≥5,1000,infovqa_val_anls,0.15327397474830004,0.005916826112508726
≥5,1000,mme_total_score,1087.6624649859943,
≥5,1000,mmmu_val_mmmu_acc,0.29778,
≥5,1000,mmstar_average,0.26060215117868224,
≥5,1000,ocrbench_ocrbench_accuracy,0.259,
≥5,1000,seedbench_seed_all,0.2649249583101723,
≥5,1000,textvqa_val_exact_match,0.26614,0.006037548383085275
≥5,2000,ai2d_exact_match,0.26392487046632124,0.007932917099101329
≥5,2000,average,0.2929576826335877,
≥5,2000,average_rank,3.3,
≥5,2000,chartqa_relaxed_overall,0.3824,0.009721414421746647
≥5,2000,docvqa_val_anls,0.3929824030686217,0.005977850940256623
≥5,2000,infovqa_val_anls,0.15895135963621443,0.005878593482981634
≥5,2000,mme_total_score,1073.4139655862346,
≥5,2000,mmmu_val_mmmu_acc,0.27333,
≥5,2000,mmstar_average,0.25335477956948643,
≥5,2000,ocrbench_ocrbench_accuracy,0.301,
≥5,2000,seedbench_seed_all,0.26831573096164535,
≥5,2000,textvqa_val_exact_match,0.34236,0.006479253215027554
≥5,3000,ai2d_exact_match,0.2600388601036269,0.007895056974601723
≥5,3000,average,0.3126381365493242,
≥5,3000,average_rank,3.9,
≥5,3000,chartqa_relaxed_overall,0.4324,0.009910165515884228
≥5,3000,docvqa_val_anls,0.4366357263318607,0.00610598785442012
≥5,3000,infovqa_val_anls,0.17846201123654198,0.006273305639193489
≥5,3000,mme_total_score,1164.5565226090434,
≥5,3000,mmmu_val_mmmu_acc,0.28778,
≥5,3000,mmstar_average,0.25130117268378377,
≥5,3000,ocrbench_ocrbench_accuracy,0.344,
≥5,3000,seedbench_seed_all,0.2858254585881045,
≥5,3000,textvqa_val_exact_match,0.3373,0.006457064405451384
≥5,4000,ai2d_exact_match,0.25647668393782386,0.007859644922870104
≥5,4000,average,0.3300809923443584,
≥5,4000,average_rank,4.3,
≥5,4000,chartqa_relaxed_overall,0.4428,0.009936335154498413
≥5,4000,docvqa_val_anls,0.4736486989184438,0.006240863735639683
≥5,4000,infovqa_val_anls,0.19267658764675277,0.006512420811238904
≥5,4000,mme_total_score,1218.2668067226891,
≥5,4000,mmmu_val_mmmu_acc,0.26889,
≥5,4000,mmstar_average,0.22297093502644408,
≥5,4000,ocrbench_ocrbench_accuracy,0.379,
≥5,4000,seedbench_seed_all,0.322846025569761,
≥5,4000,textvqa_val_exact_match,0.41142,0.006712445761838313
≥5,5000,ai2d_exact_match,0.25161917098445596,0.007810248924722509
≥5,5000,average,0.3420574749713038,
≥5,5000,average_rank,4.2,
≥5,5000,chartqa_relaxed_overall,0.4488,0.009949423119365426
≥5,5000,docvqa_val_anls,0.4973120888104521,0.00627054301371889
≥5,5000,infovqa_val_anls,0.20687122924296383,0.006767419172429617
≥5,5000,mme_total_score,1285.299119647859,
≥5,5000,mmmu_val_mmmu_acc,0.26778,
≥5,5000,mmstar_average,0.24681232878335083,
≥5,5000,ocrbench_ocrbench_accuracy,0.392,
≥5,5000,seedbench_seed_all,0.3604224569205114,
≥5,5000,textvqa_val_exact_match,0.4069,0.00670861230775927
≥5,6000,ai2d_exact_match,0.2704015544041451,0.00799425923314582
≥5,6000,average,0.35916516291601697,
≥5,6000,average_rank,4.4,
≥5,6000,chartqa_relaxed_overall,0.4844,0.009997131241172205
≥5,6000,docvqa_val_anls,0.5108154498847224,0.0062636540505031655
≥5,6000,infovqa_val_anls,0.20262763630072025,0.0066138397079363274
≥5,6000,mme_total_score,1273.862545018007,
≥5,6000,mmmu_val_mmmu_acc,0.27444,
≥5,6000,mmstar_average,0.2588150329919745,
≥5,6000,ocrbench_ocrbench_accuracy,0.403,
≥5,6000,seedbench_seed_all,0.4082267926625903,
≥5,6000,textvqa_val_exact_match,0.41976,0.006731520716318925
≥5,7000,ai2d_exact_match,0.31994818652849744,0.008395421656067303
≥5,7000,average,0.3723337802797541,
≥5,7000,average_rank,4.5,
≥5,7000,chartqa_relaxed_overall,0.476,0.009990471651004463
≥5,7000,docvqa_val_anls,0.5291779466505276,0.006267960743408816
≥5,7000,infovqa_val_anls,0.20957812087727798,0.006721757004150909
≥5,7000,mme_total_score,1327.2439975990396,
≥5,7000,mmmu_val_mmmu_acc,0.27222,
≥5,7000,mmstar_average,0.29354698358099446,
≥5,7000,ocrbench_ocrbench_accuracy,0.403,
≥5,7000,seedbench_seed_all,0.42301278488048916,
≥5,7000,textvqa_val_exact_match,0.42452,0.006734688198055274
≥5,8000,ai2d_exact_match,0.30958549222797926,0.008321027166750249
≥5,8000,average,0.37926040717793597,
≥5,8000,average_rank,4.5,
≥5,8000,chartqa_relaxed_overall,0.5136,0.009998299975543861
≥5,8000,docvqa_val_anls,0.5386485557171258,0.006250093887872433
≥5,8000,infovqa_val_anls,0.21347817313272946,0.006767638253739939
≥5,8000,mme_total_score,1351.172769107643,
≥5,8000,mmmu_val_mmmu_acc,0.27667,
≥5,8000,mmstar_average,0.27077818615838667,
≥5,8000,ocrbench_ocrbench_accuracy,0.406,
≥5,8000,seedbench_seed_all,0.4538632573652029,
≥5,8000,textvqa_val_exact_match,0.43072,0.00674498000523754
≥5,9000,ai2d_exact_match,0.32642487046632124,0.00843949241376102
≥5,9000,average,0.3915431470529602,
≥5,9000,average_rank,4.4,
≥5,9000,chartqa_relaxed_overall,0.5196,0.009994312908659929
≥5,9000,docvqa_val_anls,0.5447526718541965,0.006277223186340111
≥5,9000,infovqa_val_anls,0.22534586447558344,0.006943394394173722
≥5,9000,mme_total_score,1380.0509203681472,
≥5,9000,mmmu_val_mmmu_acc,0.28222,
≥5,9000,mmstar_average,0.2981132324115024,
≥5,9000,ocrbench_ocrbench_accuracy,0.42,
≥5,9000,seedbench_seed_all,0.45703168426903834,
≥5,9000,textvqa_val_exact_match,0.4504,0.0067806462400486975
≥5,10000,ai2d_exact_match,0.3121761658031088,0.008340079044408505
≥5,10000,average,0.3945344056050298,
≥5,10000,average_rank,4.4,
≥5,10000,chartqa_relaxed_overall,0.524,0.009990471651004463
≥5,10000,docvqa_val_anls,0.5476477162524015,0.006282119242898783
≥5,10000,infovqa_val_anls,0.2268357982996008,0.007080273138697436
≥5,10000,mme_total_score,1385.6108443377352,
≥5,10000,mmmu_val_mmmu_acc,0.29222,
≥5,10000,mmstar_average,0.29882846925636025,
≥5,10000,ocrbench_ocrbench_accuracy,0.43,
≥5,10000,seedbench_seed_all,0.4627015008337966,
≥5,10000,textvqa_val_exact_match,0.4564,0.006792248149691337
≥5,11000,ai2d_exact_match,0.3403497409326425,0.008528080007639036
≥5,11000,average,0.40311924614292627,
≥5,11000,average_rank,4.2,
≥5,11000,chartqa_relaxed_overall,0.5404,0.009969297405349211
≥5,11000,docvqa_val_anls,0.5698821874786791,0.006251823346664307
≥5,11000,infovqa_val_anls,0.22660700332356035,0.006919487246988994
≥5,11000,mme_total_score,1358.4087635054022,
≥5,11000,mmmu_val_mmmu_acc,0.28778,
≥5,11000,mmstar_average,0.28965012568597354,
≥5,11000,ocrbench_ocrbench_accuracy,0.436,
≥5,11000,seedbench_seed_all,0.4714841578654808,
≥5,11000,textvqa_val_exact_match,0.46592,0.006784225516827446
≥5,12000,ai2d_exact_match,0.342940414507772,0.008543648986216495
≥5,12000,average,0.4128131018529697,
≥5,12000,average_rank,4.3,
≥5,12000,chartqa_relaxed_overall,0.5548,0.009941746291659784
≥5,12000,docvqa_val_anls,0.578981486161722,0.00625708617478689
≥5,12000,infovqa_val_anls,0.2380032381589791,0.007080943870134072
≥5,12000,mme_total_score,1390.3039215686274,
≥5,12000,mmmu_val_mmmu_acc,0.28222,
≥5,12000,mmstar_average,0.3060607822951693,
≥5,12000,ocrbench_ocrbench_accuracy,0.472,
≥5,12000,seedbench_seed_all,0.46559199555308506,
≥5,12000,textvqa_val_exact_match,0.47472,0.006773519058221244
≥5,13000,ai2d_exact_match,0.33678756476683935,0.008506208807020252
≥5,13000,average,0.41416266738683244,
≥5,13000,average_rank,4.5,
≥5,13000,chartqa_relaxed_overall,0.5564,0.009938164963872337
≥5,13000,docvqa_val_anls,0.5882749499950303,0.0062089530468064
≥5,13000,infovqa_val_anls,0.2250291831460855,0.007008754051627638
≥5,13000,mme_total_score,1463.7286914765905,
≥5,13000,mmmu_val_mmmu_acc,0.28222,
≥5,13000,mmstar_average,0.32070873992428756,
≥5,13000,ocrbench_ocrbench_accuracy,0.475,
≥5,13000,seedbench_seed_all,0.4624235686492496,
≥5,13000,textvqa_val_exact_match,0.48062,0.006792356759039414
≥5,14000,ai2d_exact_match,0.35103626943005184,0.008590489143063932
≥5,14000,average,0.4197541703337554,
≥5,14000,average_rank,4.4,
≥5,14000,chartqa_relaxed_overall,0.5644,0.00991868984106597
≥5,14000,docvqa_val_anls,0.5968397354218249,0.006216108072191749
≥5,14000,infovqa_val_anls,0.23493831065135024,0.00713715521281919
≥5,14000,mme_total_score,1381.4046618647458,
≥5,14000,mmmu_val_mmmu_acc,0.28778,
≥5,14000,mmstar_average,0.3178569306745569,
≥5,14000,ocrbench_ocrbench_accuracy,0.465,
≥5,14000,seedbench_seed_all,0.46931628682601445,
≥5,14000,textvqa_val_exact_match,0.49062,0.0067877549928948315
≥5,15000,ai2d_exact_match,0.3448834196891192,0.008555140353607656
≥5,15000,average,0.4156222682362929,
≥5,15000,average_rank,4.5,
≥5,15000,chartqa_relaxed_overall,0.5544,0.009942625323290008
≥5,15000,docvqa_val_anls,0.5981327465682499,0.0062119027314077434
≥5,15000,infovqa_val_anls,0.2430496253387209,0.007241165402150032
≥5,15000,mme_total_score,1405.2406962785115,
≥5,15000,mmmu_val_mmmu_acc,0.27667,
≥5,15000,mmstar_average,0.30769549523760537,
≥5,15000,ocrbench_ocrbench_accuracy,0.462,
≥5,15000,seedbench_seed_all,0.4724291272929405,
≥5,15000,textvqa_val_exact_match,0.48134,0.006785688616050607
≥5,16000,ai2d_exact_match,0.3555699481865285,0.008615532040064747
≥5,16000,average,0.41928937760980056,
≥5,16000,average_rank,4.6,
≥5,16000,chartqa_relaxed_overall,0.556,0.00993907007952043
≥5,16000,docvqa_val_anls,0.5950015990375694,0.006217949166028718
≥5,16000,infovqa_val_anls,0.2429016453664355,0.007192121794741783
≥5,16000,mme_total_score,1444.7096838735495,
≥5,16000,mmmu_val_mmmu_acc,0.27222,
≥5,16000,mmstar_average,0.29597997743741594,
≥5,16000,ocrbench_ocrbench_accuracy,0.484,
≥5,16000,seedbench_seed_all,0.4802112284602557,
≥5,16000,textvqa_val_exact_match,0.49172000000000005,0.006790781344017229
≥5,17000,ai2d_exact_match,0.35783678756476683,0.008627736835305362
≥5,17000,average,0.4243671877798907,
≥5,17000,average_rank,4.2,
≥5,17000,chartqa_relaxed_overall,0.566,0.00991448025705367
≥5,17000,docvqa_val_anls,0.6011648636453683,0.006202633675401635
≥5,17000,infovqa_val_anls,0.24233190899997978,0.007185211142139982
≥5,17000,mme_total_score,1383.0262104841936,
≥5,17000,mmmu_val_mmmu_acc,0.29778,
≥5,17000,mmstar_average,0.30487588800790116,
≥5,17000,ocrbench_ocrbench_accuracy,0.48,
≥5,17000,seedbench_seed_all,0.48343524180100056,
≥5,17000,textvqa_val_exact_match,0.48588000000000003,0.006793096079908642
≥5,18000,ai2d_exact_match,0.3484455958549223,0.008575797499263314
≥5,18000,average,0.4229679292209723,
≥5,18000,average_rank,4.4,
≥5,18000,chartqa_relaxed_overall,0.5564,0.009938164963872337
≥5,18000,docvqa_val_anls,0.6015112951191799,0.006202182626672507
≥5,18000,infovqa_val_anls,0.2406225801562843,0.007159684093951319
≥5,18000,mme_total_score,1388.7428971588636,
≥5,18000,mmmu_val_mmmu_acc,0.29444,
≥5,18000,mmstar_average,0.3048242431646457,
≥5,18000,ocrbench_ocrbench_accuracy,0.489,
≥5,18000,seedbench_seed_all,0.48176764869371874,
≥5,18000,textvqa_val_exact_match,0.4897,0.006784304485905058
≥5,19000,ai2d_exact_match,0.3552461139896373,0.00861377131101951
≥5,19000,average,0.4271521214095191,
≥5,19000,average_rank,4.4,
≥5,19000,chartqa_relaxed_overall,0.564,0.009919725822025206
≥5,19000,docvqa_val_anls,0.6030864459750552,0.006186369284106836
≥5,19000,infovqa_val_anls,0.24933668460761893,0.007278561320407618
≥5,19000,mme_total_score,1420.4112645058024,
≥5,19000,mmmu_val_mmmu_acc,0.28889,
≥5,19000,mmstar_average,0.3146239893029105,
≥5,19000,ocrbench_ocrbench_accuracy,0.495,
≥5,19000,seedbench_seed_all,0.48254585881045026,
≥5,19000,textvqa_val_exact_match,0.49163999999999997,0.006786164784802775
≥5,20000,ai2d_exact_match,0.3630181347150259,0.008654846701304475
≥5,20000,average,0.4295614849660391,
≥5,20000,average_rank,4.6,
≥5,20000,chartqa_relaxed_overall,0.566,0.00991448025705367
≥5,20000,docvqa_val_anls,0.6104819342838989,0.006177273769345363
≥5,20000,infovqa_val_anls,0.24655527159214874,0.007266841528312276
≥5,20000,mme_total_score,1413.0101040416166,
≥5,20000,mmmu_val_mmmu_acc,0.28444,
≥5,20000,mmstar_average,0.32507494461467334,
≥5,20000,ocrbench_ocrbench_accuracy,0.499,
≥5,20000,seedbench_seed_all,0.4775430794886048,
≥5,20000,textvqa_val_exact_match,0.4939399999999999,0.006784796384004054
app/src/content/assets/data/image_correspondence_filters.csv
ADDED
@@ -0,0 +1,1177 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
run,step,metric,value,stderr
Baseline,1000,ai2d_exact_match,0.2548575129533679,0.007843322436924496
Baseline,1000,average,0.27120689295763617,
Baseline,1000,average_rank,3.3,
Baseline,1000,chartqa_relaxed_overall,0.3308,0.009411906161401973
Baseline,1000,docvqa_val_anls,0.3528553494243383,0.005852289239342309
Baseline,1000,infovqa_val_anls,0.17320578642581314,0.006297063452679795
Baseline,1000,mme_total_score,977.4280712284914,
Baseline,1000,mmmu_val_mmmu_acc,0.25222,
Baseline,1000,mmstar_average,0.23215874078908072,
Baseline,1000,ocrbench_ocrbench_accuracy,0.286,
Baseline,1000,seedbench_seed_all,0.2563646470261256,
Baseline,1000,textvqa_val_exact_match,0.3024,0.00628900296642181
Baseline,2000,ai2d_exact_match,0.26295336787564766,0.007923526907377255
Baseline,2000,average,0.3202068275596269,
Baseline,2000,average_rank,3.1,
Baseline,2000,chartqa_relaxed_overall,0.4688,0.009982508912777261
Baseline,2000,docvqa_val_anls,0.4452261510942785,0.00614755494712251
Baseline,2000,infovqa_val_anls,0.1820547866557169,0.006217861455795791
Baseline,2000,mme_total_score,1049.3036214485794,
Baseline,2000,mmmu_val_mmmu_acc,0.24556,
Baseline,2000,mmstar_average,0.21305462434540698,
Baseline,2000,ocrbench_ocrbench_accuracy,0.395,
Baseline,2000,seedbench_seed_all,0.258532518065592,
Baseline,2000,textvqa_val_exact_match,0.41068000000000005,0.006697862330024289
Baseline,3000,ai2d_exact_match,0.25226683937823835,0.007816909588794397
Baseline,3000,average,0.3507423834414229,
Baseline,3000,average_rank,2.6,
Baseline,3000,chartqa_relaxed_overall,0.5028,0.010001843767601082
Baseline,3000,docvqa_val_anls,0.502653993831009,0.006267072346683124
Baseline,3000,infovqa_val_anls,0.21728617578189535,0.006796941784959762
Baseline,3000,mme_total_score,1170.2383953581434,
Baseline,3000,mmmu_val_mmmu_acc,0.27556,
Baseline,3000,mmstar_average,0.25432376938577683,
Baseline,3000,ocrbench_ocrbench_accuracy,0.436,
Baseline,3000,seedbench_seed_all,0.2792106725958866,
Baseline,3000,textvqa_val_exact_match,0.43658,0.006766885462882726
Baseline,4000,ai2d_exact_match,0.2645725388601036,0.007939149662089447
Baseline,4000,average,0.36961781722974835,
Baseline,4000,average_rank,3.2,
Baseline,4000,chartqa_relaxed_overall,0.5312,0.009982508912777261
Baseline,4000,docvqa_val_anls,0.5374434618615119,0.0062905728113059655
Baseline,4000,infovqa_val_anls,0.2287924838861707,0.006994568698639919
Baseline,4000,mme_total_score,1155.203781512605,
Baseline,4000,mmmu_val_mmmu_acc,0.25556,
Baseline,4000,mmstar_average,0.2575590188757354,
Baseline,4000,ocrbench_ocrbench_accuracy,0.453,
Baseline,4000,seedbench_seed_all,0.33913285158421347,
Baseline,4000,textvqa_val_exact_match,0.4593,0.006791695475025738
Baseline,5000,ai2d_exact_match,0.3125,0.008342439145556371
Baseline,5000,average,0.3974627910380972,
Baseline,5000,average_rank,3.1,
Baseline,5000,chartqa_relaxed_overall,0.5488,0.00995424828018316
Baseline,5000,docvqa_val_anls,0.552360266782429,0.006300308519952055
Baseline,5000,infovqa_val_anls,0.23425555286643698,0.007002254622066442
Baseline,5000,mme_total_score,1181.4653861544618,
Baseline,5000,mmmu_val_mmmu_acc,0.26667,
Baseline,5000,mmstar_average,0.29596648146165705,
Baseline,5000,ocrbench_ocrbench_accuracy,0.462,
Baseline,5000,seedbench_seed_all,0.43107281823235133,
Baseline,5000,textvqa_val_exact_match,0.47354000000000007,0.0068172185364497985
Baseline,6000,ai2d_exact_match,0.358160621761658,0.008629463221867162
Baseline,6000,average,0.4161227404571003,
Baseline,6000,average_rank,2.9,
Baseline,6000,chartqa_relaxed_overall,0.5628,0.00992279440175477
Baseline,6000,docvqa_val_anls,0.5747451497228876,0.00625495440870239
Baseline,6000,infovqa_val_anls,0.22152017368968838,0.006604546680525351
Baseline,6000,mme_total_score,1284.1648659463785,
Baseline,6000,mmmu_val_mmmu_acc,0.27111,
Baseline,6000,mmstar_average,0.2978489412854164,
Baseline,6000,ocrbench_ocrbench_accuracy,0.495,
Baseline,6000,seedbench_seed_all,0.4795997776542524,
Baseline,6000,textvqa_val_exact_match,0.48432,0.006800535050670284
Baseline,7000,ai2d_exact_match,0.3707901554404145,0.00869347755587734
Baseline,7000,average,0.4291083177345374,
Baseline,7000,average_rank,2.4,
Baseline,7000,chartqa_relaxed_overall,0.5656,0.009915542506251351
Baseline,7000,docvqa_val_anls,0.5940907049431567,0.006224236305767187
Baseline,7000,infovqa_val_anls,0.2515675215816963,0.007105097396092786
Baseline,7000,mme_total_score,1185.875650260104,
Baseline,7000,mmmu_val_mmmu_acc,0.26556,
Baseline,7000,mmstar_average,0.31372400960777047,
Baseline,7000,ocrbench_ocrbench_accuracy,0.504,
Baseline,7000,seedbench_seed_all,0.4964424680377988,
Baseline,7000,textvqa_val_exact_match,0.5002,0.006794794025220267
Baseline,8000,ai2d_exact_match,0.37759067357512954,0.008725299846043883
Baseline,8000,average,0.43846759477995995,
Baseline,8000,average_rank,2.4,
Baseline,8000,chartqa_relaxed_overall,0.5832,0.009862556058385773
Baseline,8000,docvqa_val_anls,0.6017336419437208,0.006231612198089698
Baseline,8000,infovqa_val_anls,0.2449256624147254,0.006992518502948913
Baseline,8000,mme_total_score,1199.2409963985594,
Baseline,8000,mmmu_val_mmmu_acc,0.28111,
Baseline,8000,mmstar_average,0.33512257186205047,
Baseline,8000,ocrbench_ocrbench_accuracy,0.51,
Baseline,8000,seedbench_seed_all,0.5024458032240133,
Baseline,8000,textvqa_val_exact_match,0.51008,0.006796301690135059
Baseline,9000,ai2d_exact_match,0.4067357512953368,0.008841214921078996
Baseline,9000,average,0.4422510732201056,
Baseline,9000,average_rank,2.5,
Baseline,9000,chartqa_relaxed_overall,0.5912,0.009834211136815875
Baseline,9000,docvqa_val_anls,0.6170968481662739,0.00617235763542544
Baseline,9000,infovqa_val_anls,0.23537031288570615,0.00670318154156447
Baseline,9000,mme_total_score,1231.5195078031213,
Baseline,9000,mmmu_val_mmmu_acc,0.25889,
Baseline,9000,mmstar_average,0.3216444898242951,
Baseline,9000,ocrbench_ocrbench_accuracy,0.515,
Baseline,9000,seedbench_seed_all,0.5120622568093385,
Baseline,9000,textvqa_val_exact_match,0.52226,0.006792711289708482
Baseline,10000,ai2d_exact_match,0.39993523316062174,0.008817096257082848
Baseline,10000,average,0.4523875703250908,
Baseline,10000,average_rank,2.3,
Baseline,10000,chartqa_relaxed_overall,0.5996,0.00980154906867574
Baseline,10000,docvqa_val_anls,0.6262613496433054,0.006147756371688175
Baseline,10000,infovqa_val_anls,0.263290074230132,0.007186788766942786
Baseline,10000,mme_total_score,1240.8218287314926,
Baseline,10000,mmmu_val_mmmu_acc,0.28778,
Baseline,10000,mmstar_average,0.32972717906018517,
Baseline,10000,ocrbench_ocrbench_accuracy,0.517,
Baseline,10000,seedbench_seed_all,0.5217342968315731,
Baseline,10000,textvqa_val_exact_match,0.5261600000000001,0.006785774843600811
Baseline,11000,ai2d_exact_match,0.422279792746114,0.008889771831066474
Baseline,11000,average,0.4561398159525099,
Baseline,11000,average_rank,2.6,
Baseline,11000,chartqa_relaxed_overall,0.6104,0.009755142291143075
Baseline,11000,docvqa_val_anls,0.6373130149166712,0.006128022584995044
Baseline,11000,infovqa_val_anls,0.24419378339723755,0.006897644885887063
Baseline,11000,mme_total_score,1322.9488795518205,
Baseline,11000,mmmu_val_mmmu_acc,0.27778,
Baseline,11000,mmstar_average,0.3298563439522548,
Baseline,11000,ocrbench_ocrbench_accuracy,0.521,
Baseline,11000,seedbench_seed_all,0.5237354085603113,
Baseline,11000,textvqa_val_exact_match,0.5387,0.006770851562852138
Baseline,12000,ai2d_exact_match,0.42001295336787564,0.008883255931688034
Baseline,12000,average,0.4582751140055433,
Baseline,12000,average_rank,2.7,
Baseline,12000,chartqa_relaxed_overall,0.618,0.009719474639861454
Baseline,12000,docvqa_val_anls,0.6393961983751871,0.0061228747388476674
Baseline,12000,infovqa_val_anls,0.24798874058574302,0.006855374548993139
Baseline,12000,mme_total_score,1225.6453581432572,
Baseline,12000,mmmu_val_mmmu_acc,0.27889,
Baseline,12000,mmstar_average,0.34010867846816534,
Baseline,12000,ocrbench_ocrbench_accuracy,0.512,
Baseline,12000,seedbench_seed_all,0.5350194552529183,
Baseline,12000,textvqa_val_exact_match,0.5330600000000001,0.006777713092109446
Baseline,13000,ai2d_exact_match,0.4375,0.008928571428571428
Baseline,13000,average,0.4692868662590049,
Baseline,13000,average_rank,2.6,
Baseline,13000,chartqa_relaxed_overall,0.6148,0.00973479791861169
Baseline,13000,docvqa_val_anls,0.6511374872549951,0.006086953065248391
Baseline,13000,infovqa_val_anls,0.24465055100441893,0.006808432538374664
Baseline,13000,mme_total_score,1281.7122849139657,
Baseline,13000,mmmu_val_mmmu_acc,0.28222,
Baseline,13000,mmstar_average,0.3453069542917521,
Baseline,13000,ocrbench_ocrbench_accuracy,0.549,
Baseline,13000,seedbench_seed_all,0.5442468037798777,
Baseline,13000,textvqa_val_exact_match,0.55472,0.0067416788982325
Baseline,14000,ai2d_exact_match,0.4572538860103627,0.00896620675297095
Baseline,14000,average,0.47352486841689195,
Baseline,14000,average_rank,2.5,
Baseline,14000,chartqa_relaxed_overall,0.6172,0.009723347231923635
Baseline,14000,docvqa_val_anls,0.6502269393708169,0.006057950730638126
Baseline,14000,infovqa_val_anls,0.25805460837190913,0.007037735231659539
Baseline,14000,mme_total_score,1309.1444577831132,
Baseline,14000,mmmu_val_mmmu_acc,0.28111,
Baseline,14000,mmstar_average,0.34575818188776586,
Baseline,14000,ocrbench_ocrbench_accuracy,0.551,
Baseline,14000,seedbench_seed_all,0.5483602001111729,
Baseline,14000,textvqa_val_exact_match,0.55276,0.006751206724612103
Baseline,15000,ai2d_exact_match,0.45045336787564766,0.008954861634252399
Baseline,15000,average,0.47878665012878824,
Baseline,15000,average_rank,2.1,
Baseline,15000,chartqa_relaxed_overall,0.612,0.009747841205275417
Baseline,15000,docvqa_val_anls,0.6621413031955148,0.006056838050222495
Baseline,15000,infovqa_val_anls,0.2706898598157733,0.007200315730154543
Baseline,15000,mme_total_score,1384.2171868747498,
Baseline,15000,mmmu_val_mmmu_acc,0.30222,
Baseline,15000,mmstar_average,0.35408135695920684,
Baseline,15000,ocrbench_ocrbench_accuracy,0.558,
Baseline,15000,seedbench_seed_all,0.5411339633129516,
Baseline,15000,textvqa_val_exact_match,0.5583600000000001,0.0067279027203879065
Baseline,16000,ai2d_exact_match,0.45077720207253885,0.008955440137395838
Baseline,16000,average,0.47665128022935843,
Baseline,16000,average_rank,2.3,
Baseline,16000,chartqa_relaxed_overall,0.632,0.00964715642305132
Baseline,16000,docvqa_val_anls,0.6709415729142987,0.005999818105621502
Baseline,16000,infovqa_val_anls,0.26050032542402035,0.006997451875879188
Baseline,16000,mme_total_score,1317.8491396558625,
Baseline,16000,mmmu_val_mmmu_acc,0.27556,
Baseline,16000,mmstar_average,0.33214333327093315,
Baseline,16000,ocrbench_ocrbench_accuracy,0.56,
Baseline,16000,seedbench_seed_all,0.5463590883824346,
Baseline,16000,textvqa_val_exact_match,0.56158,0.006723854754867398
Baseline,17000,ai2d_exact_match,0.45919689119170987,0.008969138793675545
Baseline,17000,average,0.4777141780162423,
Baseline,17000,average_rank,2.3,
Baseline,17000,chartqa_relaxed_overall,0.632,0.00964715642305132
Baseline,17000,docvqa_val_anls,0.6796338519136422,0.005948761388267941
Baseline,17000,infovqa_val_anls,0.28070956072505215,0.007298333094144192
Baseline,17000,mme_total_score,1381.9161664665867,
Baseline,17000,mmmu_val_mmmu_acc,0.27667,
Baseline,17000,mmstar_average,0.3370289492329521,
Baseline,17000,ocrbench_ocrbench_accuracy,0.519,
Baseline,17000,seedbench_seed_all,0.5510283490828238,
Baseline,17000,textvqa_val_exact_match,0.56416,0.006724830373229479
Baseline,18000,ai2d_exact_match,0.46567357512953367,0.008977921602780726
Baseline,18000,average,0.4819834595278701,
Baseline,18000,average_rank,2.5,
Baseline,18000,chartqa_relaxed_overall,0.6376,0.009615793331418735
Baseline,18000,docvqa_val_anls,0.6775884603912571,0.005972234236435759
Baseline,18000,infovqa_val_anls,0.27154318420389256,0.007164903131667027
Baseline,18000,mme_total_score,1336.922769107643,
Baseline,18000,mmmu_val_mmmu_acc,0.28667,
Baseline,18000,mmstar_average,0.34482796716566916,
Baseline,18000,ocrbench_ocrbench_accuracy,0.533,
Baseline,18000,seedbench_seed_all,0.5543079488604781,
Baseline,18000,textvqa_val_exact_match,0.5666399999999999,0.006713392287599574
Baseline,19000,ai2d_exact_match,0.4682642487046632,0.008981008686994101
Baseline,19000,average,0.4899006713916878,
Baseline,19000,average_rank,2.1,
Baseline,19000,chartqa_relaxed_overall,0.6444,0.009575809858898698
Baseline,19000,docvqa_val_anls,0.678226526479947,0.005970619221588814
Baseline,19000,infovqa_val_anls,0.26993847247278,0.0071348470764911525
Baseline,19000,mme_total_score,1406.6628651460583,
Baseline,19000,mmmu_val_mmmu_acc,0.28333,
Baseline,19000,mmstar_average,0.356220913822775,
Baseline,19000,ocrbench_ocrbench_accuracy,0.577,
Baseline,19000,seedbench_seed_all,0.554585881045025,
Baseline,19000,textvqa_val_exact_match,0.57714,0.0066918487914812905
Baseline,20000,ai2d_exact_match,0.47571243523316065,0.00898853090258662
Baseline,20000,average,0.4873169067639118,
Baseline,20000,average_rank,2.1,
Baseline,20000,chartqa_relaxed_overall,0.6336,0.009638338810708618
Baseline,20000,docvqa_val_anls,0.6895214454380043,0.005896462073053767
Baseline,20000,infovqa_val_anls,0.2655657550458317,0.007033265532032538
Baseline,20000,mme_total_score,1324.6738695478193,
Baseline,20000,mmmu_val_mmmu_acc,0.30111,
Baseline,20000,mmstar_average,0.33806766134497995,
Baseline,20000,ocrbench_ocrbench_accuracy,0.555,
Baseline,20000,seedbench_seed_all,0.5587548638132296,
Baseline,20000,textvqa_val_exact_match,0.56852,0.006720151338087659
≥2,1000,ai2d_exact_match,0.25647668393782386,0.007859644922870102
≥2,1000,average,0.27425088839708317,
≥2,1000,average_rank,3.6,
≥2,1000,chartqa_relaxed_overall,0.3528,0.009558734841217527
≥2,1000,docvqa_val_anls,0.3487177879998493,0.005772448136868996
≥2,1000,infovqa_val_anls,0.16953112470324783,0.006224999754409024
≥2,1000,mme_total_score,824.5716286514606,
≥2,1000,mmmu_val_mmmu_acc,0.26444,
≥2,1000,mmstar_average,0.22150750732637972,
≥2,1000,ocrbench_ocrbench_accuracy,0.277,
≥2,1000,seedbench_seed_all,0.24880489160644803,
≥2,1000,textvqa_val_exact_match,0.32898,0.006433157002732356
≥2,2000,ai2d_exact_match,0.2713730569948187,0.008003273563555614
≥2,2000,average,0.3170484430215563,
≥2,2000,average_rank,3.2,
≥2,2000,chartqa_relaxed_overall,0.4576,0.009965973321743335
≥2,2000,docvqa_val_anls,0.4397320562320007,0.006117658797669254
≥2,2000,infovqa_val_anls,0.19182235122159558,0.006445631136889586
≥2,2000,mme_total_score,943.4792917166867,
≥2,2000,mmmu_val_mmmu_acc,0.26222,
≥2,2000,mmstar_average,0.21467384792624822,
≥2,2000,ocrbench_ocrbench_accuracy,0.366,
≥2,2000,seedbench_seed_all,0.2464146748193441,
≥2,2000,textvqa_val_exact_match,0.40359999999999996,0.006696325571179395
≥2,3000,ai2d_exact_match,0.2801165803108808,0.008082248116182685
≥2,3000,average,0.34966771857538625,
≥2,3000,average_rank,2.8,
≥2,3000,chartqa_relaxed_overall,0.5228,0.009991596308834713
≥2,3000,docvqa_val_anls,0.4874772803745103,0.006201774677139367
≥2,3000,infovqa_val_anls,0.22283805188110828,0.00694062083381895
≥2,3000,mme_total_score,966.8010204081634,
≥2,3000,mmmu_val_mmmu_acc,0.27,
≥2,3000,mmstar_average,0.2379978992478856,
≥2,3000,ocrbench_ocrbench_accuracy,0.411,
≥2,3000,seedbench_seed_all,0.28337965536409115,
≥2,3000,textvqa_val_exact_match,0.4314,0.0067401404954778015
≥2,4000,ai2d_exact_match,0.29533678756476683,0.008210720304314063
≥2,4000,average,0.3812564539509352,
≥2,4000,average_rank,2.5,
≥2,4000,chartqa_relaxed_overall,0.5388,0.0099718403035556
≥2,4000,docvqa_val_anls,0.5330469699330452,0.006286693650476338
≥2,4000,infovqa_val_anls,0.24204946206609423,0.00717558288279668
≥2,4000,mme_total_score,995.9115646258504,
≥2,4000,mmmu_val_mmmu_acc,0.26667,
≥2,4000,mmstar_average,0.3026544157443715,
≥2,4000,ocrbench_ocrbench_accuracy,0.455,
≥2,4000,seedbench_seed_all,0.35881045025013897,
≥2,4000,textvqa_val_exact_match,0.43893999999999994,0.006772821384172211
≥2,5000,ai2d_exact_match,0.33711139896373055,0.008508219384896985
≥2,5000,average,0.4071218650285344,
≥2,5000,average_rank,2.3,
≥2,5000,chartqa_relaxed_overall,0.5668,0.009912336039617753
≥2,5000,docvqa_val_anls,0.564524028708337,0.0062521888936635335
≥2,5000,infovqa_val_anls,0.24496968712079598,0.007124175210142404
≥2,5000,mme_total_score,1015.9452781112445,
≥2,5000,mmmu_val_mmmu_acc,0.27889,
≥2,5000,mmstar_average,0.28054619519991103,
≥2,5000,ocrbench_ocrbench_accuracy,0.489,
≥2,5000,seedbench_seed_all,0.43985547526403557,
≥2,5000,textvqa_val_exact_match,0.4624,0.006784220893413342
≥2,6000,ai2d_exact_match,0.35103626943005184,0.008590489143063932
≥2,6000,average,0.4121891443057646,
≥2,6000,average_rank,3.0,
≥2,6000,chartqa_relaxed_overall,0.5768,0.009883307943718245
≥2,6000,docvqa_val_anls,0.5776287354366231,0.0062230020803370695
≥2,6000,infovqa_val_anls,0.2221908019883868,0.006590859192234515
≥2,6000,mme_total_score,1020.3381352541016,
≥2,6000,mmmu_val_mmmu_acc,0.28,
≥2,6000,mmstar_average,0.27381767588792544,
≥2,6000,ocrbench_ocrbench_accuracy,0.488,
≥2,6000,seedbench_seed_all,0.46386881600889385,
≥2,6000,textvqa_val_exact_match,0.47636000000000006,0.006799814525081922
≥2,7000,ai2d_exact_match,0.37629533678756477,0.008719379877890883
≥2,7000,average,0.41852126487504937,
≥2,7000,average_rank,3.6,
≥2,7000,chartqa_relaxed_overall,0.5784,0.009878279615563902
≥2,7000,docvqa_val_anls,0.5890225700952161,0.00623482047941176
≥2,7000,infovqa_val_anls,0.223522004380568,0.006616105445267792
≥2,7000,mme_total_score,1017.6768707482994,
≥2,7000,mmmu_val_mmmu_acc,0.26444,
≥2,7000,mmstar_average,0.2842963864531179,
≥2,7000,ocrbench_ocrbench_accuracy,0.485,
≥2,7000,seedbench_seed_all,0.47915508615897723,
≥2,7000,textvqa_val_exact_match,0.48656000000000005,0.006793372009587883
≥2,8000,ai2d_exact_match,0.4015544041450777,0.008822998789014791
≥2,8000,average,0.43741617461905385,
≥2,8000,average_rank,2.7,
≥2,8000,chartqa_relaxed_overall,0.5868,0.009850132691777215
≥2,8000,docvqa_val_anls,0.6064868329976114,0.006195078404871516
≥2,8000,infovqa_val_anls,0.237253715462471,0.006761266007987291
≥2,8000,mme_total_score,1051.3844537815125,
≥2,8000,mmmu_val_mmmu_acc,0.29556,
≥2,8000,mmstar_average,0.3249125644916164,
≥2,8000,ocrbench_ocrbench_accuracy,0.499,
≥2,8000,seedbench_seed_all,0.4964980544747082,
≥2,8000,textvqa_val_exact_match,0.48868,0.006786367399168372
≥2,9000,ai2d_exact_match,0.40382124352331605,0.008831094143874315
≥2,9000,average,0.4404946424331453,
≥2,9000,average_rank,2.9,
≥2,9000,chartqa_relaxed_overall,0.6032,0.00978663452296623
≥2,9000,docvqa_val_anls,0.6121548768634689,0.0061762532067103386
≥2,9000,infovqa_val_anls,0.22182207634556947,0.006503514281737561
≥2,9000,mme_total_score,1016.1477591036414,
≥2,9000,mmmu_val_mmmu_acc,0.28222,
≥2,9000,mmstar_average,0.33800404653337934,
≥2,9000,ocrbench_ocrbench_accuracy,0.5,
≥2,9000,seedbench_seed_all,0.5051695386325736,
≥2,9000,textvqa_val_exact_match,0.49805999999999995,0.006801536551389838
≥2,10000,ai2d_exact_match,0.4258419689119171,0.00889962357526378
≥2,10000,average,0.45210592763811075,
≥2,10000,average_rank,2.3,
≥2,10000,chartqa_relaxed_overall,0.5944,0.009822120220107639
≥2,10000,docvqa_val_anls,0.6316361917189336,0.006144343697405114
≥2,10000,infovqa_val_anls,0.23913212463600403,0.0067351911105917
≥2,10000,mme_total_score,989.5969387755102,
≥2,10000,mmmu_val_mmmu_acc,0.29111,
≥2,10000,mmstar_average,0.3311992002187771,
≥2,10000,ocrbench_ocrbench_accuracy,0.524,
≥2,10000,seedbench_seed_all,0.5169538632573653,
≥2,10000,textvqa_val_exact_match,0.51468,0.006777646111841742
≥2,11000,ai2d_exact_match,0.42875647668393785,0.008907332750968604
≥2,11000,average,0.45881179587417986,
≥2,11000,average_rank,2.4,
≥2,11000,chartqa_relaxed_overall,0.6112,0.009751505562952713
≥2,11000,docvqa_val_anls,0.6351833269477972,0.006125490431617443
≥2,11000,infovqa_val_anls,0.23606787081800862,0.006703826515822327
≥2,11000,mme_total_score,1065.0292116846738,
≥2,11000,mmmu_val_mmmu_acc,0.29444,
≥2,11000,mmstar_average,0.3469301171004765,
≥2,11000,ocrbench_ocrbench_accuracy,0.534,
≥2,11000,seedbench_seed_all,0.5230683713173986,
≥2,11000,textvqa_val_exact_match,0.51966,0.006767766057679764
≥2,12000,ai2d_exact_match,0.4384715025906736,0.008930756993395149
≥2,12000,average,0.4594169631513685,
≥2,12000,average_rank,2.9,
≥2,12000,chartqa_relaxed_overall,0.604,0.009783245103435851
≥2,12000,docvqa_val_anls,0.6404649164504777,0.006108516485005316
≥2,12000,infovqa_val_anls,0.24399960384251437,0.006798095411909792
≥2,12000,mme_total_score,1071.3268307322928,
≥2,12000,mmmu_val_mmmu_acc,0.28,
≥2,12000,mmstar_average,0.3319074070128363,
≥2,12000,ocrbench_ocrbench_accuracy,0.535,
≥2,12000,seedbench_seed_all,0.5326292384658143,
≥2,12000,textvqa_val_exact_match,0.52828,0.006774867385094495
≥2,13000,ai2d_exact_match,0.4494818652849741,0.008953103134587205
≥2,13000,average,0.4664204868584231,
≥2,13000,average_rank,2.9,
≥2,13000,chartqa_relaxed_overall,0.6072,0.00976941352263433
≥2,13000,docvqa_val_anls,0.6520830792564345,0.006078195885582825
≥2,13000,infovqa_val_anls,0.2540091405377872,0.006900079046632844
≥2,13000,mme_total_score,1102.111644657863,
≥2,13000,mmmu_val_mmmu_acc,0.27889,
≥2,13000,mmstar_average,0.344523865295862,
≥2,13000,ocrbench_ocrbench_accuracy,0.544,
≥2,13000,seedbench_seed_all,0.5375764313507504,
≥2,13000,textvqa_val_exact_match,0.5300199999999999,0.006760687930991938
≥2,14000,ai2d_exact_match,0.46599740932642486,0.008978320789223167
≥2,14000,average,0.47503952495924406,
≥2,14000,average_rank,2.1,
≥2,14000,chartqa_relaxed_overall,0.618,0.009719474639861454
≥2,14000,docvqa_val_anls,0.6580902962118945,0.006056507155937736
≥2,14000,infovqa_val_anls,0.2596815364895075,0.006931336614399575
≥2,14000,mme_total_score,1081.4191676670669,
≥2,14000,mmmu_val_mmmu_acc,0.29333,
≥2,14000,mmstar_average,0.34673893952588086,
≥2,14000,ocrbench_ocrbench_accuracy,0.547,
≥2,14000,seedbench_seed_all,0.5395775430794886,
≥2,14000,textvqa_val_exact_match,0.5469400000000001,0.006754557875273413
≥2,15000,ai2d_exact_match,0.46211139896373055,0.008973279520621462
≥2,15000,average,0.4760526986294352,
≥2,15000,average_rank,2.4,
≥2,15000,chartqa_relaxed_overall,0.628,0.009668701749325345
≥2,15000,docvqa_val_anls,0.6648088448329239,0.006037271631807744
≥2,15000,infovqa_val_anls,0.25795022333006473,0.006890072365188988
≥2,15000,mme_total_score,1089.547719087635,
≥2,15000,mmmu_val_mmmu_acc,0.28556,
≥2,15000,mmstar_average,0.3469607521668807,
≥2,15000,ocrbench_ocrbench_accuracy,0.552,
≥2,15000,seedbench_seed_all,0.5415230683713174,
≥2,15000,textvqa_val_exact_match,0.5455599999999999,0.006760798692446918
≥2,16000,ai2d_exact_match,0.45919689119170987,0.008969138793675547
≥2,16000,average,0.47749499404431844,
≥2,16000,average_rank,2.6,
≥2,16000,chartqa_relaxed_overall,0.6308,0.009653694708691147
≥2,16000,docvqa_val_anls,0.6761390499297608,0.005978202784466009
≥2,16000,infovqa_val_anls,0.258655084903391,0.006852561120622793
≥2,16000,mme_total_score,1131.8199279711885,
≥2,16000,mmmu_val_mmmu_acc,0.28111,
≥2,16000,mmstar_average,0.3465829809632207,
≥2,16000,ocrbench_ocrbench_accuracy,0.55,
≥2,16000,seedbench_seed_all,0.5436909394107837,
≥2,16000,textvqa_val_exact_match,0.55128,0.00673284030314915
≥2,17000,ai2d_exact_match,0.47053108808290156,0.008983510489560252
≥2,17000,average,0.4824112637913165,
≥2,17000,average_rank,2.5,
≥2,17000,chartqa_relaxed_overall,0.6264,0.009677121197436144
≥2,17000,docvqa_val_anls,0.6761198524404004,0.005987381973810974
≥2,17000,infovqa_val_anls,0.2750604377713151,0.007138932651224592
≥2,17000,mme_total_score,1029.6003401360545,
≥2,17000,mmmu_val_mmmu_acc,0.28667,
≥2,17000,mmstar_average,0.35096186353151126,
≥2,17000,ocrbench_ocrbench_accuracy,0.554,
≥2,17000,seedbench_seed_all,0.5486381322957199,
≥2,17000,textvqa_val_exact_match,0.55332,0.006735374419712295
≥2,18000,ai2d_exact_match,0.4682642487046632,0.008981008686994101
≥2,18000,average,0.48820815333600937,
≥2,18000,average_rank,2.0,
≥2,18000,chartqa_relaxed_overall,0.644,0.009578219924326623
≥2,18000,docvqa_val_anls,0.6810993351781675,0.005958907235334871
≥2,18000,infovqa_val_anls,0.26273964411171846,0.006970166334491144
≥2,18000,mme_total_score,1233.8152260904362,
≥2,18000,mmmu_val_mmmu_acc,0.30889,
≥2,18000,mmstar_average,0.35081513813292553,
≥2,18000,ocrbench_ocrbench_accuracy,0.578,
≥2,18000,seedbench_seed_all,0.5450250138966092,
≥2,18000,textvqa_val_exact_match,0.5550400000000001,0.006740445564002446
≥2,19000,ai2d_exact_match,0.48154145077720206,0.00899301968014488
≥2,19000,average,0.48747038725935266,
≥2,19000,average_rank,2.5,
≥2,19000,chartqa_relaxed_overall,0.6444,0.009575809858898698
≥2,19000,docvqa_val_anls,0.6803936935190295,0.005967440848189623
≥2,19000,infovqa_val_anls,0.2673329802724793,0.006976813332121409
≥2,19000,mme_total_score,1179.0547218887555,
≥2,19000,mmmu_val_mmmu_acc,0.29556,
≥2,19000,mmstar_average,0.340879324078415,
≥2,19000,ocrbench_ocrbench_accuracy,0.562,
≥2,19000,seedbench_seed_all,0.5588660366870484,
≥2,19000,textvqa_val_exact_match,0.5562600000000001,0.006734421501999508
≥2,20000,ai2d_exact_match,0.4805699481865285,0.008992356706334513
≥2,20000,average,0.49109872298543183,
≥2,20000,average_rank,1.7,
≥2,20000,chartqa_relaxed_overall,0.6464,0.009563650001989001
≥2,20000,docvqa_val_anls,0.6823974164165829,0.005959610876737005
≥2,20000,infovqa_val_anls,0.26825054401896686,0.007072214875698234
≥2,20000,mme_total_score,1187.1244497799119,
≥2,20000,mmmu_val_mmmu_acc,0.31,
≥2,20000,mmstar_average,0.3539436054730449,
≥2,20000,ocrbench_ocrbench_accuracy,0.568,
≥2,20000,seedbench_seed_all,0.5565869927737632,
≥2,20000,textvqa_val_exact_match,0.55374,0.006734617546282709
| 482 |
+
≥3,1000,ai2d_exact_match,0.2619818652849741,0.007914086941902848
|
| 483 |
+
≥3,1000,average,0.2794334029794183,
|
| 484 |
+
≥3,1000,average_rank,2.8,
|
| 485 |
+
≥3,1000,chartqa_relaxed_overall,0.3624,0.009615793331418735
|
| 486 |
+
≥3,1000,docvqa_val_anls,0.358726414254659,0.00583517632645742
|
| 487 |
+
≥3,1000,infovqa_val_anls,0.17567716068461908,0.0063503165333000855
|
| 488 |
+
≥3,1000,mme_total_score,754.6462585034014,
|
| 489 |
+
≥3,1000,mmmu_val_mmmu_acc,0.25889,
|
| 490 |
+
≥3,1000,mmstar_average,0.20669310209912864,
|
| 491 |
+
≥3,1000,ocrbench_ocrbench_accuracy,0.299,
|
| 492 |
+
≥3,1000,seedbench_seed_all,0.2537520844913841,
|
| 493 |
+
≥3,1000,textvqa_val_exact_match,0.33777999999999997,0.006462823526724795
|
| 494 |
+
≥3,2000,ai2d_exact_match,0.2707253886010363,0.007997269386750955
|
| 495 |
+
≥3,2000,average,0.324956811840241,
|
| 496 |
+
≥3,2000,average_rank,2.9,
|
| 497 |
+
≥3,2000,chartqa_relaxed_overall,0.468,0.009981495484186743
|
| 498 |
+
≥3,2000,docvqa_val_anls,0.4401305975808376,0.006085479161829202
|
| 499 |
+
≥3,2000,infovqa_val_anls,0.21738366907082515,0.00690560152820958
|
| 500 |
+
≥3,2000,mme_total_score,780.5238095238094,
|
| 501 |
+
≥3,2000,mmmu_val_mmmu_acc,0.25222,
|
| 502 |
+
≥3,2000,mmstar_average,0.2313413567013541,
|
| 503 |
+
≥3,2000,ocrbench_ocrbench_accuracy,0.386,
|
| 504 |
+
≥3,2000,seedbench_seed_all,0.2545302946081156,
|
| 505 |
+
≥3,2000,textvqa_val_exact_match,0.40428000000000003,0.006698634984990034
|
| 506 |
+
≥3,3000,ai2d_exact_match,0.27363989637305697,0.008024119445073188
|
| 507 |
+
≥3,3000,average,0.35281014111410386,
|
| 508 |
+
≥3,3000,average_rank,2.6,
|
| 509 |
+
≥3,3000,chartqa_relaxed_overall,0.5132,0.009998514495506157
|
| 510 |
+
≥3,3000,docvqa_val_anls,0.49578090596419144,0.0062540129206588675
|
| 511 |
+
≥3,3000,infovqa_val_anls,0.22472603379950587,0.006863330299819649
|
| 512 |
+
≥3,3000,mme_total_score,868.3095238095237,
|
| 513 |
+
≥3,3000,mmmu_val_mmmu_acc,0.27444,
|
| 514 |
+
≥3,3000,mmstar_average,0.25839301643603935,
|
| 515 |
+
≥3,3000,ocrbench_ocrbench_accuracy,0.409,
|
| 516 |
+
≥3,3000,seedbench_seed_all,0.2925514174541412,
|
| 517 |
+
≥3,3000,textvqa_val_exact_match,0.43356000000000006,0.006754959006110611
|
| 518 |
+
≥3,4000,ai2d_exact_match,0.28335492227979275,0.008110527983566214
|
| 519 |
+
≥3,4000,average,0.3674252373982893,
|
| 520 |
+
≥3,4000,average_rank,3.5,
|
| 521 |
+
≥3,4000,chartqa_relaxed_overall,0.5396,0.009970581778431997
|
| 522 |
+
≥3,4000,docvqa_val_anls,0.5289127945577605,0.006289931251894248
|
| 523 |
+
≥3,4000,infovqa_val_anls,0.21582133824627234,0.00674279410471775
|
| 524 |
+
≥3,4000,mme_total_score,889.4285714285714,
|
| 525 |
+
≥3,4000,mmmu_val_mmmu_acc,0.24556,
|
| 526 |
+
≥3,4000,mmstar_average,0.2644251965647028,
|
| 527 |
+
≥3,4000,ocrbench_ocrbench_accuracy,0.429,
|
| 528 |
+
≥3,4000,seedbench_seed_all,0.3471928849360756,
|
| 529 |
+
≥3,4000,textvqa_val_exact_match,0.45296000000000003,0.006791544205446865
|
| 530 |
+
≥3,5000,ai2d_exact_match,0.33516839378238344,0.008496088804445223
|
| 531 |
+
≥3,5000,average,0.39888444206353324,
|
| 532 |
+
≥3,5000,average_rank,3.2,
|
| 533 |
+
≥3,5000,chartqa_relaxed_overall,0.5716,0.009898917689756362
|
| 534 |
+
≥3,5000,docvqa_val_anls,0.5575899695261644,0.006265975659556661
|
| 535 |
+
≥3,5000,infovqa_val_anls,0.23013455835644483,0.0068368490116401705
|
| 536 |
+
≥3,5000,mme_total_score,985.1445578231293,
|
| 537 |
+
≥3,5000,mmmu_val_mmmu_acc,0.27111,
|
| 538 |
+
≥3,5000,mmstar_average,0.2946740552392133,
|
| 539 |
+
≥3,5000,ocrbench_ocrbench_accuracy,0.43,
|
| 540 |
+
≥3,5000,seedbench_seed_all,0.4254030016675931,
|
| 541 |
+
≥3,5000,textvqa_val_exact_match,0.4742799999999999,0.006788410183729657
|
| 542 |
+
≥3,6000,ai2d_exact_match,0.3601036269430052,0.008639731726372677
|
| 543 |
+
≥3,6000,average,0.4225217783490169,
|
| 544 |
+
≥3,6000,average_rank,2.1,
|
| 545 |
+
≥3,6000,chartqa_relaxed_overall,0.5752,0.009888230116554488
|
| 546 |
+
≥3,6000,docvqa_val_anls,0.5829205672304983,0.006247927399021504
|
| 547 |
+
≥3,6000,infovqa_val_anls,0.24796306866649032,0.007051352766215089
|
| 548 |
+
≥3,6000,mme_total_score,894.8299319727892,
|
| 549 |
+
≥3,6000,mmmu_val_mmmu_acc,0.28667,
|
| 550 |
+
≥3,6000,mmstar_average,0.32224643546402654,
|
| 551 |
+
≥3,6000,ocrbench_ocrbench_accuracy,0.466,
|
| 552 |
+
≥3,6000,seedbench_seed_all,0.4741523068371317,
|
| 553 |
+
≥3,6000,textvqa_val_exact_match,0.48744000000000004,0.006797771047795272
|
| 554 |
+
≥3,7000,ai2d_exact_match,0.39378238341968913,0.008793749766856823
|
| 555 |
+
≥3,7000,average,0.42775795560136004,
|
| 556 |
+
≥3,7000,average_rank,2.5,
|
| 557 |
+
≥3,7000,chartqa_relaxed_overall,0.5876,0.009847298295140926
|
| 558 |
+
≥3,7000,docvqa_val_anls,0.6000941606468793,0.006194010994352466
|
| 559 |
+
≥3,7000,infovqa_val_anls,0.24479859857192363,0.007060559159607034
|
| 560 |
+
≥3,7000,mme_total_score,876.7176870748299,
|
| 561 |
+
≥3,7000,mmmu_val_mmmu_acc,0.27222,
|
| 562 |
+
≥3,7000,mmstar_average,0.2977466801194958,
|
| 563 |
+
≥3,7000,ocrbench_ocrbench_accuracy,0.477,
|
| 564 |
+
≥3,7000,seedbench_seed_all,0.4795997776542524,
|
| 565 |
+
≥3,7000,textvqa_val_exact_match,0.49698000000000003,0.0067935120726511
|
| 566 |
+
≥3,8000,ai2d_exact_match,0.4057642487046632,0.008837877210720615
|
| 567 |
+
≥3,8000,average,0.43551031057375794,
|
| 568 |
+
≥3,8000,average_rank,2.9,
|
| 569 |
+
≥3,8000,chartqa_relaxed_overall,0.592,0.009831228876620145
|
| 570 |
+
≥3,8000,docvqa_val_anls,0.6177973292861353,0.006138014034823096
|
| 571 |
+
≥3,8000,infovqa_val_anls,0.23532628107457257,0.00676650636184475
|
| 572 |
+
≥3,8000,mme_total_score,939.6819727891157,
|
| 573 |
+
≥3,8000,mmmu_val_mmmu_acc,0.27444,
|
| 574 |
+
≥3,8000,mmstar_average,0.29094521403063506,
|
| 575 |
+
≥3,8000,ocrbench_ocrbench_accuracy,0.499,
|
| 576 |
+
≥3,8000,seedbench_seed_all,0.49949972206781545,
|
| 577 |
+
≥3,8000,textvqa_val_exact_match,0.5048199999999999,0.0067899465531651255
|
| 578 |
+
≥3,9000,ai2d_exact_match,0.40770725388601037,0.008844516803704298
|
| 579 |
+
≥3,9000,average,0.4390017474760467,
|
| 580 |
+
≥3,9000,average_rank,2.9,
|
| 581 |
+
≥3,9000,chartqa_relaxed_overall,0.5872,0.009848718845878486
|
| 582 |
+
≥3,9000,docvqa_val_anls,0.61752739984947,0.00618332088681346
|
| 583 |
+
≥3,9000,infovqa_val_anls,0.25912362264120503,0.007280015371693194
|
| 584 |
+
≥3,9000,mme_total_score,879.0001000400159,
|
| 585 |
+
≥3,9000,mmmu_val_mmmu_acc,0.27889,
|
| 586 |
+
≥3,9000,mmstar_average,0.31710081388716777,
|
| 587 |
+
≥3,9000,ocrbench_ocrbench_accuracy,0.475,
|
| 588 |
+
≥3,9000,seedbench_seed_all,0.503946637020567,
|
| 589 |
+
≥3,9000,textvqa_val_exact_match,0.5045200000000001,0.006796966505047244
|
| 590 |
+
≥3,10000,ai2d_exact_match,0.41580310880829013,0.008870644443998564
|
| 591 |
+
≥3,10000,average,0.4482767982697443,
|
| 592 |
+
≥3,10000,average_rank,3.1,
|
| 593 |
+
≥3,10000,chartqa_relaxed_overall,0.5948,0.009820578470976232
|
| 594 |
+
≥3,10000,docvqa_val_anls,0.632014816225421,0.006118052909783931
|
| 595 |
+
≥3,10000,infovqa_val_anls,0.26061122659986763,0.007146718031628882
|
| 596 |
+
≥3,10000,mme_total_score,988.3401360544218,
|
| 597 |
+
≥3,10000,mmmu_val_mmmu_acc,0.28556,
|
| 598 |
+
≥3,10000,mmstar_average,0.30861783045948943,
|
| 599 |
+
≥3,10000,ocrbench_ocrbench_accuracy,0.506,
|
| 600 |
+
≥3,10000,seedbench_seed_all,0.5155642023346303,
|
| 601 |
+
≥3,10000,textvqa_val_exact_match,0.5155200000000001,0.006789480490366388
|
| 602 |
+
≥3,11000,ai2d_exact_match,0.43944300518134716,0.0089329077973751
|
| 603 |
+
≥3,11000,average,0.4597510485723372,
|
| 604 |
+
≥3,11000,average_rank,2.5,
|
| 605 |
+
≥3,11000,chartqa_relaxed_overall,0.6092,0.009760545645634788
|
| 606 |
+
≥3,11000,docvqa_val_anls,0.6464425255299558,0.006062020581004778
|
| 607 |
+
≥3,11000,infovqa_val_anls,0.25020176764946855,0.006887224684938156
|
| 608 |
+
≥3,11000,mme_total_score,960.4336734693878,
|
| 609 |
+
≥3,11000,mmmu_val_mmmu_acc,0.28556,
|
| 610 |
+
≥3,11000,mmstar_average,0.3246843233372334,
|
| 611 |
+
≥3,11000,ocrbench_ocrbench_accuracy,0.523,
|
| 612 |
+
≥3,11000,seedbench_seed_all,0.5220678154530295,
|
| 613 |
+
≥3,11000,textvqa_val_exact_match,0.53716,0.006766105446753199
|
| 614 |
+
≥3,12000,ai2d_exact_match,0.44430051813471505,0.008943141268224502
|
| 615 |
+
≥3,12000,average,0.46138583256526733,
|
| 616 |
+
≥3,12000,average_rank,2.4,
|
| 617 |
+
≥3,12000,chartqa_relaxed_overall,0.6176,0.00972141442174665
|
| 618 |
+
≥3,12000,docvqa_val_anls,0.6470164139517766,0.006072800791453103
|
| 619 |
+
≥3,12000,infovqa_val_anls,0.25520554317365624,0.00698649679999368
|
| 620 |
+
≥3,12000,mme_total_score,907.6496598639455,
|
| 621 |
+
≥3,12000,mmmu_val_mmmu_acc,0.26889,
|
| 622 |
+
≥3,12000,mmstar_average,0.3420805959262021,
|
| 623 |
+
≥3,12000,ocrbench_ocrbench_accuracy,0.518,
|
| 624 |
+
≥3,12000,seedbench_seed_all,0.5269594219010562,
|
| 625 |
+
≥3,12000,textvqa_val_exact_match,0.53242,0.006773903296709706
|
| 626 |
+
≥3,13000,ai2d_exact_match,0.4523963730569948,0.00895827521082005
|
| 627 |
+
≥3,13000,average,0.46949309002333817,
|
| 628 |
+
≥3,13000,average_rank,2.4,
|
| 629 |
+
≥3,13000,chartqa_relaxed_overall,0.6212,0.009703704898413913
|
| 630 |
+
≥3,13000,docvqa_val_anls,0.6619667030411912,0.006021347175756138
|
| 631 |
+
≥3,13000,infovqa_val_anls,0.2616368936908815,0.007081151619852865
|
| 632 |
+
≥3,13000,mme_total_score,949.1751700680272,
|
| 633 |
+
≥3,13000,mmmu_val_mmmu_acc,0.29,
|
| 634 |
+
≥3,13000,mmstar_average,0.3366565952847893,
|
| 635 |
+
≥3,13000,ocrbench_ocrbench_accuracy,0.531,
|
| 636 |
+
≥3,13000,seedbench_seed_all,0.5342412451361868,
|
| 637 |
+
≥3,13000,textvqa_val_exact_match,0.5363399999999999,0.006781436312145912
|
| 638 |
+
≥3,14000,ai2d_exact_match,0.45466321243523317,0.008962083606139334
|
| 639 |
+
≥3,14000,average,0.4712975949864662,
|
| 640 |
+
≥3,14000,average_rank,2.7,
|
| 641 |
+
≥3,14000,chartqa_relaxed_overall,0.6264,0.009677121197436144
|
| 642 |
+
≥3,14000,docvqa_val_anls,0.6740908198240112,0.005957001035082802
|
| 643 |
+
≥3,14000,infovqa_val_anls,0.2577834440006994,0.006966195686343909
|
| 644 |
+
≥3,14000,mme_total_score,1001.2684073629453,
|
| 645 |
+
≥3,14000,mmmu_val_mmmu_acc,0.29333,
|
| 646 |
+
≥3,14000,mmstar_average,0.32549694865716267,
|
| 647 |
+
≥3,14000,ocrbench_ocrbench_accuracy,0.523,
|
| 648 |
+
≥3,14000,seedbench_seed_all,0.5330739299610895,
|
| 649 |
+
≥3,14000,textvqa_val_exact_match,0.55384,0.006735794315818514
|
| 650 |
+
≥3,15000,ai2d_exact_match,0.4540155440414508,0.008961014613274426
|
| 651 |
+
≥3,15000,average,0.47089632593243724,
|
| 652 |
+
≥3,15000,average_rank,3.0,
|
| 653 |
+
≥3,15000,chartqa_relaxed_overall,0.6308,0.009653694708691147
|
| 654 |
+
≥3,15000,docvqa_val_anls,0.6653892896567976,0.006002863596715536
|
| 655 |
+
≥3,15000,infovqa_val_anls,0.25006728644957676,0.006925310310812123
|
| 656 |
+
≥3,15000,mme_total_score,952.5915366146459,
|
| 657 |
+
≥3,15000,mmmu_val_mmmu_acc,0.28778,
|
| 658 |
+
≥3,15000,mmstar_average,0.321321405795528,
|
| 659 |
+
≥3,15000,ocrbench_ocrbench_accuracy,0.544,
|
| 660 |
+
≥3,15000,seedbench_seed_all,0.5401334074485825,
|
| 661 |
+
≥3,15000,textvqa_val_exact_match,0.5445599999999999,0.00676213180626591
|
| 662 |
+
≥3,16000,ai2d_exact_match,0.46113989637305697,0.008971933568013594
|
| 663 |
+
≥3,16000,average,0.4765857166360205,
|
| 664 |
+
≥3,16000,average_rank,2.7,
|
| 665 |
+
≥3,16000,chartqa_relaxed_overall,0.6292,0.00966231277258432
|
| 666 |
+
≥3,16000,docvqa_val_anls,0.6750185163043225,0.005960110331744373
|
| 667 |
+
≥3,16000,infovqa_val_anls,0.26628641470953346,0.007064079945590166
|
| 668 |
+
≥3,16000,mme_total_score,1021.2616046418567,
|
| 669 |
+
≥3,16000,mmmu_val_mmmu_acc,0.28889,
|
| 670 |
+
≥3,16000,mmstar_average,0.31529947948012893,
|
| 671 |
+
≥3,16000,ocrbench_ocrbench_accuracy,0.558,
|
| 672 |
+
≥3,16000,seedbench_seed_all,0.5428571428571428,
|
| 673 |
+
≥3,16000,textvqa_val_exact_match,0.5525800000000001,0.0067489094137982856
|
| 674 |
+
≥3,17000,ai2d_exact_match,0.46016839378238344,0.00897055333097463
|
| 675 |
+
≥3,17000,average,0.4767395382894575,
|
| 676 |
+
≥3,17000,average_rank,2.8,
|
| 677 |
+
≥3,17000,chartqa_relaxed_overall,0.636,0.009624897685803465
|
| 678 |
+
≥3,17000,docvqa_val_anls,0.6722259229035369,0.005969094115618876
|
| 679 |
+
≥3,17000,infovqa_val_anls,0.24935441742721265,0.006802032350689338
|
| 680 |
+
≥3,17000,mme_total_score,936.1445578231293,
|
| 681 |
+
≥3,17000,mmmu_val_mmmu_acc,0.28778,
|
| 682 |
+
≥3,17000,mmstar_average,0.3236027747499053,
|
| 683 |
+
≥3,17000,ocrbench_ocrbench_accuracy,0.557,
|
| 684 |
+
≥3,17000,seedbench_seed_all,0.547804335742079,
|
| 685 |
+
≥3,17000,textvqa_val_exact_match,0.55672,0.00673676815164555
|
| 686 |
+
≥3,18000,ai2d_exact_match,0.4637305699481865,0.008975446629055962
|
| 687 |
+
≥3,18000,average,0.48384455011990835,
|
| 688 |
+
≥3,18000,average_rank,2.5,
|
| 689 |
+
≥3,18000,chartqa_relaxed_overall,0.6448,0.009573392498878078
|
| 690 |
+
≥3,18000,docvqa_val_anls,0.6796365109602944,0.005968558973913562
|
| 691 |
+
≥3,18000,infovqa_val_anls,0.26322789951300113,0.0069935719047194405
|
| 692 |
+
≥3,18000,mme_total_score,1086.5858343337334,
|
| 693 |
+
≥3,18000,mmmu_val_mmmu_acc,0.29667,
|
| 694 |
+
≥3,18000,mmstar_average,0.32901024525469136,
|
| 695 |
+
≥3,18000,ocrbench_ocrbench_accuracy,0.568,
|
| 696 |
+
≥3,18000,seedbench_seed_all,0.5503057254030017,
|
| 697 |
+
≥3,18000,textvqa_val_exact_match,0.5592199999999999,0.006724703907870569
|
| 698 |
+
≥3,19000,ai2d_exact_match,0.47571243523316065,0.00898853090258662
|
| 699 |
+
≥3,19000,average,0.48669992125156075,
|
| 700 |
+
≥3,19000,average_rank,2.6,
|
| 701 |
+
≥3,19000,chartqa_relaxed_overall,0.6432,0.009583018193402223
|
| 702 |
+
≥3,19000,docvqa_val_anls,0.6804836987040075,0.005959667749728633
|
| 703 |
+
≥3,19000,infovqa_val_anls,0.2638534672767717,0.007026196725406296
|
| 704 |
+
≥3,19000,mme_total_score,960.3163265306122,
|
| 705 |
+
≥3,19000,mmmu_val_mmmu_acc,0.29778,
|
| 706 |
+
≥3,19000,mmstar_average,0.3418245205114739,
|
| 707 |
+
≥3,19000,ocrbench_ocrbench_accuracy,0.568,
|
| 708 |
+
≥3,19000,seedbench_seed_all,0.5493051695386326,
|
| 709 |
+
≥3,19000,textvqa_val_exact_match,0.56014,0.006731277597872481
|
| 710 |
+
≥3,20000,ai2d_exact_match,0.47117875647668395,0.008984191131586656
|
| 711 |
+
≥3,20000,average,0.4903196828425222,
|
| 712 |
+
≥3,20000,average_rank,2.2,
|
| 713 |
+
≥3,20000,chartqa_relaxed_overall,0.648,0.009553790345406665
|
| 714 |
+
≥3,20000,docvqa_val_anls,0.6902930502166585,0.0059096225576472155
|
| 715 |
+
≥3,20000,infovqa_val_anls,0.2637260616044305,0.007044756469416206
|
| 716 |
+
≥3,20000,mme_total_score,968.2636054421769,
|
| 717 |
+
≥3,20000,mmmu_val_mmmu_acc,0.29778,
|
| 718 |
+
≥3,20000,mmstar_average,0.3516103723377342,
|
| 719 |
+
≥3,20000,ocrbench_ocrbench_accuracy,0.568,
|
| 720 |
+
≥3,20000,seedbench_seed_all,0.5520289049471929,
|
| 721 |
+
≥3,20000,textvqa_val_exact_match,0.57026,0.0067066312154801
|
| 722 |
+
≥4,1000,ai2d_exact_match,0.24514248704663213,0.00774236194438642
|
| 723 |
+
≥4,1000,average,0.2886475913888803,
|
| 724 |
+
≥4,1000,average_rank,2.3,
|
| 725 |
+
≥4,1000,chartqa_relaxed_overall,0.3972,0.009788318981080978
|
| 726 |
+
≥4,1000,docvqa_val_anls,0.37365436598717294,0.005925680297715887
|
| 727 |
+
≥4,1000,infovqa_val_anls,0.17743073573571846,0.0061602017146906085
|
| 728 |
+
≥4,1000,mme_total_score,632.5170068027211,
|
| 729 |
+
≥4,1000,mmmu_val_mmmu_acc,0.26889,
|
| 730 |
+
≥4,1000,mmstar_average,0.2207913007120554,
|
| 731 |
+
≥4,1000,ocrbench_ocrbench_accuracy,0.295,
|
| 732 |
+
≥4,1000,seedbench_seed_all,0.26297943301834353,
|
| 733 |
+
≥4,1000,textvqa_val_exact_match,0.35674,0.006549831642027738
|
| 734 |
+
≥4,2000,ai2d_exact_match,0.2658678756476684,0.007951548865715979
|
| 735 |
+
≥4,2000,average,0.3301948089612685,
|
| 736 |
+
≥4,2000,average_rank,2.2,
|
| 737 |
+
≥4,2000,chartqa_relaxed_overall,0.5108,0.009999667061284322
|
| 738 |
+
≥4,2000,docvqa_val_anls,0.47288379978857037,0.006197116763458197
|
| 739 |
+
≥4,2000,infovqa_val_anls,0.19614396715396193,0.006363207550147918
|
| 740 |
+
≥4,2000,mme_total_score,649.1020408163265,
|
| 741 |
+
≥4,2000,mmmu_val_mmmu_acc,0.25333,
|
| 742 |
+
≥4,2000,mmstar_average,0.23593804384220507,
|
| 743 |
+
≥4,2000,ocrbench_ocrbench_accuracy,0.344,
|
| 744 |
+
≥4,2000,seedbench_seed_all,0.28526959421901055,
|
| 745 |
+
≥4,2000,textvqa_val_exact_match,0.40752,0.006707017723053031
|
| 746 |
+
≥4,3000,ai2d_exact_match,0.27428756476683935,0.008030027397236163
|
| 747 |
+
≥4,3000,average,0.35549277156858605,
|
| 748 |
+
≥4,3000,average_rank,3.1,
|
| 749 |
+
≥4,3000,chartqa_relaxed_overall,0.5332,0.009979927032670678
|
| 750 |
+
≥4,3000,docvqa_val_anls,0.5073841710534057,0.006243585075672888
|
| 751 |
+
≥4,3000,infovqa_val_anls,0.2112620781635733,0.006555166517270566
|
| 752 |
+
≥4,3000,mme_total_score,631.2074829931973,
|
| 753 |
+
≥4,3000,mmmu_val_mmmu_acc,0.27,
|
| 754 |
+
≥4,3000,mmstar_average,0.23471667210121605,
|
| 755 |
+
≥4,3000,ocrbench_ocrbench_accuracy,0.404,
|
| 756 |
+
≥4,3000,seedbench_seed_all,0.34402445803224013,
|
| 757 |
+
≥4,3000,textvqa_val_exact_match,0.42056000000000004,0.006749442071286688
|
| 758 |
+
≥4,4000,ai2d_exact_match,0.31832901554404147,0.008384114535775948
|
| 759 |
+
≥4,4000,average,0.385231957814173,
|
| 760 |
+
≥4,4000,average_rank,2.4,
|
| 761 |
+
≥4,4000,chartqa_relaxed_overall,0.5652,0.009916598185256227
|
| 762 |
+
≥4,4000,docvqa_val_anls,0.5416928947604102,0.006213976135239445
|
| 763 |
+
≥4,4000,infovqa_val_anls,0.20356144693573172,0.0062836907942324565
|
| 764 |
+
≥4,4000,mme_total_score,653.2091836734694,
|
| 765 |
+
≥4,4000,mmmu_val_mmmu_acc,0.28,
|
| 766 |
+
≥4,4000,mmstar_average,0.29405509132528374,
|
| 767 |
+
≥4,4000,ocrbench_ocrbench_accuracy,0.405,
|
| 768 |
+
≥4,4000,seedbench_seed_all,0.41650917176209007,
|
| 769 |
+
≥4,4000,textvqa_val_exact_match,0.44273999999999997,0.006779280950811967
|
| 770 |
+
≥4,5000,ai2d_exact_match,0.36755181347150256,0.00867767630454297
|
| 771 |
+
≥4,5000,average,0.4077588201424885,
|
| 772 |
+
≥4,5000,average_rank,2.6,
|
| 773 |
+
≥4,5000,chartqa_relaxed_overall,0.58,0.009873144969898833
|
| 774 |
+
≥4,5000,docvqa_val_anls,0.5630037000906716,0.006209962710311604
|
| 775 |
+
≥4,5000,infovqa_val_anls,0.21458613370689986,0.006397856835104317
|
| 776 |
+
≥4,5000,mme_total_score,625.7602040816327,
|
| 777 |
+
≥4,5000,mmmu_val_mmmu_acc,0.27333,
|
| 778 |
+
≥4,5000,mmstar_average,0.3209608357365017,
|
| 779 |
+
≥4,5000,ocrbench_ocrbench_accuracy,0.445,
|
| 780 |
+
≥4,5000,seedbench_seed_all,0.45041689827682047,
|
| 781 |
+
≥4,5000,textvqa_val_exact_match,0.45498,0.006789422594877592
|
| 782 |
+
≥4,6000,ai2d_exact_match,0.3785621761658031,0.008729696327646355
|
| 783 |
+
≥4,6000,average,0.41319441398053125,
|
| 784 |
+
≥4,6000,average_rank,3.0,
|
| 785 |
+
≥4,6000,chartqa_relaxed_overall,0.598,0.009808000752013664
|
| 786 |
+
≥4,6000,docvqa_val_anls,0.5873915760067876,0.006194850343529871
|
| 787 |
+
≥4,6000,infovqa_val_anls,0.2118313803571488,0.0064268256454762555
|
| 788 |
+
≥4,6000,mme_total_score,647.9846938775511,
|
| 789 |
+
≥4,6000,mmmu_val_mmmu_acc,0.27667,
|
| 790 |
+
≥4,6000,mmstar_average,0.31152278673584216,
|
| 791 |
+
≥4,6000,ocrbench_ocrbench_accuracy,0.432,
|
| 792 |
+
≥4,6000,seedbench_seed_all,0.45325180655919955,
|
| 793 |
+
≥4,6000,textvqa_val_exact_match,0.46952,0.006815356464393287
|
| 794 |
+
≥4,7000,ai2d_exact_match,0.405440414507772,0.008836756671878079
|
| 795 |
+
≥4,7000,average,0.4292314266450108,
|
| 796 |
+
≥4,7000,average_rank,2.5,
|
| 797 |
+
≥4,7000,chartqa_relaxed_overall,0.602,0.00979166741164548
|
| 798 |
+
≥4,7000,docvqa_val_anls,0.5975001541722433,0.006202378232201727
|
| 799 |
+
≥4,7000,infovqa_val_anls,0.22746329153304923,0.006598501883805769
|
| 800 |
+
≥4,7000,mme_total_score,644.2482993197278,
|
| 801 |
+
≥4,7000,mmmu_val_mmmu_acc,0.29556,
|
| 802 |
+
≥4,7000,mmstar_average,0.3402250385136554,
|
| 803 |
+
≥4,7000,ocrbench_ocrbench_accuracy,0.456,
|
| 804 |
+
≥4,7000,seedbench_seed_all,0.4690939410783769,
|
| 805 |
+
≥4,7000,textvqa_val_exact_match,0.4698,0.006774981333879443
|
| 806 |
+
≥4,8000,ai2d_exact_match,0.420660621761658,0.008885137221616577
|
| 807 |
+
≥4,8000,average,0.4378486452448895,
|
| 808 |
+
≥4,8000,average_rank,2.7,
|
| 809 |
+
≥4,8000,chartqa_relaxed_overall,0.6104,0.009755142291143075
|
| 810 |
+
≥4,8000,docvqa_val_anls,0.6138040101801899,0.006179898911166834
|
| 811 |
+
≥4,8000,infovqa_val_anls,0.22606978572806807,0.006579692710461168
|
| 812 |
+
≥4,8000,mme_total_score,658.9676870748299,
|
| 813 |
+
≥4,8000,mmmu_val_mmmu_acc,0.28778,
|
| 814 |
+
≥4,8000,mmstar_average,0.3468841677442059,
|
| 815 |
+
≥4,8000,ocrbench_ocrbench_accuracy,0.463,
|
| 816 |
+
≥4,8000,seedbench_seed_all,0.4785992217898833,
|
| 817 |
+
≥4,8000,textvqa_val_exact_match,0.49344000000000005,0.006803341118162017
|
| 818 |
+
≥4,9000,ai2d_exact_match,0.4219559585492228,0.008888852746011196
|
| 819 |
+
≥4,9000,average,0.4420874430953781,
|
| 820 |
+
≥4,9000,average_rank,2.6,
|
| 821 |
+
≥4,9000,chartqa_relaxed_overall,0.6152,0.009732906852031212
|
| 822 |
+
≥4,9000,docvqa_val_anls,0.6305245733667586,0.006112674867758156
|
| 823 |
+
≥4,9000,infovqa_val_anls,0.2397582783787718,0.006679019564643084
|
| 824 |
+
≥4,9000,mme_total_score,637.5170068027211,
|
| 825 |
+
≥4,9000,mmmu_val_mmmu_acc,0.29444,
|
| 826 |
+
≥4,9000,mmstar_average,0.32453031208282723,
|
| 827 |
+
≥4,9000,ocrbench_ocrbench_accuracy,0.47,
|
| 828 |
+
≥4,9000,seedbench_seed_all,0.4841578654808227,
|
| 829 |
+
≥4,9000,textvqa_val_exact_match,0.49822,0.0067895350265813805
|
| 830 |
+
≥4,10000,ai2d_exact_match,0.4226036269430052,0.008890687000142644
|
| 831 |
+
≥4,10000,average,0.44420978068105677,
|
| 832 |
+
≥4,10000,average_rank,3.0,
|
| 833 |
+
≥4,10000,chartqa_relaxed_overall,0.6216,0.009701702181065136
|
| 834 |
+
≥4,10000,docvqa_val_anls,0.6324744510097755,0.006105489658656957
|
| 835 |
+
≥4,10000,infovqa_val_anls,0.2379840149655046,0.006688060277831451
|
| 836 |
+
≥4,10000,mme_total_score,655.0340136054422,
|
| 837 |
+
≥4,10000,mmmu_val_mmmu_acc,0.27778,
|
| 838 |
+
≥4,10000,mmstar_average,0.3590975730111145,
|
| 839 |
+
≥4,10000,ocrbench_ocrbench_accuracy,0.46,
|
| 840 |
+
≥4,10000,seedbench_seed_all,0.48704836020011116,
|
| 841 |
+
≥4,10000,textvqa_val_exact_match,0.4993,0.006805289442255823
|
| 842 |
+
≥4,11000,ai2d_exact_match,0.43102331606217614,0.008913110733383509
|
| 843 |
+
≥4,11000,average,0.44988739002985145,
|
| 844 |
+
≥4,11000,average_rank,3.2,
|
| 845 |
+
≥4,11000,chartqa_relaxed_overall,0.634,0.00963611653607192
|
| 846 |
+
≥4,11000,docvqa_val_anls,0.6322712133935365,0.006121517573716792
|
| 847 |
+
≥4,11000,infovqa_val_anls,0.2413865745385472,0.006692342108960141
|
| 848 |
+
≥4,11000,mme_total_score,658.1836734693877,
|
| 849 |
+
≥4,11000,mmmu_val_mmmu_acc,0.28444,
|
| 850 |
+
≥4,11000,mmstar_average,0.342313121671846,
|
| 851 |
+
≥4,11000,ocrbench_ocrbench_accuracy,0.479,
|
| 852 |
+
≥4,11000,seedbench_seed_all,0.502112284602557,
|
| 853 |
+
≥4,11000,textvqa_val_exact_match,0.50244,0.00679965119188229
|
| 854 |
+
≥4,12000,ai2d_exact_match,0.42875647668393785,0.008907332750968597
|
| 855 |
+
≥4,12000,average,0.4548323782860016,
|
| 856 |
+
≥4,12000,average_rank,2.8,
|
| 857 |
+
≥4,12000,chartqa_relaxed_overall,0.6304,0.009655859891905061
|
| 858 |
+
≥4,12000,docvqa_val_anls,0.6455348323026604,0.006090623668615334
|
| 859 |
+
≥4,12000,infovqa_val_anls,0.24824487207836357,0.006776227086451809
|
| 860 |
+
≥4,12000,mme_total_score,659.5289115646259,
|
| 861 |
+
≥4,12000,mmmu_val_mmmu_acc,0.28667,
|
| 862 |
+
≥4,12000,mmstar_average,0.3540534947708648,
|
| 863 |
+
≥4,12000,ocrbench_ocrbench_accuracy,0.479,
|
| 864 |
+
≥4,12000,seedbench_seed_all,0.5011117287381879,
|
| 865 |
+
≥4,12000,textvqa_val_exact_match,0.51972,0.006789421445801825
|
| 866 |
+
≥4,13000,ai2d_exact_match,0.43879533678756477,0.008931477789122115
|
| 867 |
+
≥4,13000,average,0.4591999882773314,
|
| 868 |
+
≥4,13000,average_rank,2.9,
|
| 869 |
+
≥4,13000,chartqa_relaxed_overall,0.6404,0.009599583157550096
|
| 870 |
+
≥4,13000,docvqa_val_anls,0.6527664026795218,0.006064581205597092
|
| 871 |
+
≥4,13000,infovqa_val_anls,0.25301984861581456,0.006782117731741186
|
| 872 |
+
≥4,13000,mme_total_score,682.5748299319728,
|
| 873 |
+
≥4,13000,mmmu_val_mmmu_acc,0.29111,
|
| 874 |
+
≥4,13000,mmstar_average,0.3517055270912357,
|
| 875 |
+
≥4,13000,ocrbench_ocrbench_accuracy,0.478,
|
| 876 |
+
≥4,13000,seedbench_seed_all,0.5050027793218455,
|
| 877 |
+
≥4,13000,textvqa_val_exact_match,0.522,0.0067926156909974755
|
| 878 |
+
≥4,14000,ai2d_exact_match,0.4323186528497409,0.008916326937351901
|
| 879 |
+
≥4,14000,average,0.4646863548031565,
|
| 880 |
+
≥4,14000,average_rank,3.0,
|
| 881 |
+
≥4,14000,chartqa_relaxed_overall,0.644,0.009578219924326623
|
| 882 |
+
≥4,14000,docvqa_val_anls,0.6548905776766276,0.006057263905849616
|
| 883 |
+
≥4,14000,infovqa_val_anls,0.2562200123257713,0.006874581648592813
|
| 884 |
+
≥4,14000,mme_total_score,671.9404761904761,
|
| 885 |
+
≥4,14000,mmmu_val_mmmu_acc,0.29667,
|
| 886 |
+
≥4,14000,mmstar_average,0.37705986254969837,
|
| 887 |
+
≥4,14000,ocrbench_ocrbench_accuracy,0.493,
|
| 888 |
+
≥4,14000,seedbench_seed_all,0.5045580878265703,
|
| 889 |
+
≥4,14000,textvqa_val_exact_match,0.52346,0.006781469114039297
|
| 890 |
+
≥4,15000,ai2d_exact_match,0.44430051813471505,0.008943141268224495
|
| 891 |
+
≥4,15000,average,0.46755888531075723,
|
| 892 |
+
≥4,15000,average_rank,3.2,
|
| 893 |
+
≥4,15000,chartqa_relaxed_overall,0.6456,0.009568535872927508
|
| 894 |
+
≥4,15000,docvqa_val_anls,0.6580148685293012,0.006071011273366836
|
| 895 |
+
≥4,15000,infovqa_val_anls,0.25650876918794263,0.006807617862342499
|
| 896 |
+
≥4,15000,mme_total_score,652.6581632653061,
|
| 897 |
+
≥4,15000,mmmu_val_mmmu_acc,0.3,
|
| 898 |
+
≥4,15000,mmstar_average,0.37405941950461163,
|
| 899 |
+
≥4,15000,ocrbench_ocrbench_accuracy,0.494,
|
| 900 |
+
≥4,15000,seedbench_seed_all,0.5115063924402445,
|
| 901 |
+
≥4,15000,textvqa_val_exact_match,0.5240400000000001,0.006789641781942949
|
| 902 |
+
≥4,16000,ai2d_exact_match,0.44591968911917096,0.00894635996642554
|
| 903 |
+
≥4,16000,average,0.4675481362198435,
|
| 904 |
+
≥4,16000,average_rank,3.4,
|
| 905 |
+
≥4,16000,chartqa_relaxed_overall,0.644,0.009578219924326623
|
| 906 |
+
≥4,16000,docvqa_val_anls,0.6701700723398091,0.00598528368808041
|
| 907 |
+
≥4,16000,infovqa_val_anls,0.25594206541579917,0.006827795722845132
|
| 908 |
+
≥4,16000,mme_total_score,621.2006802721088,
|
| 909 |
+
≥4,16000,mmmu_val_mmmu_acc,0.29333,
|
| 910 |
+
≥4,16000,mmstar_average,0.3555236170026451,
|
| 911 |
+
≥4,16000,ocrbench_ocrbench_accuracy,0.5,
|
| 912 |
+
≥4,16000,seedbench_seed_all,0.5140077821011673,
|
| 913 |
+
≥4,16000,textvqa_val_exact_match,0.52904,0.006781307610378791
|
| 914 |
+
≥4,17000,ai2d_exact_match,0.4423575129533679,0.008939151893135124
|
| 915 |
+
≥4,17000,average,0.470260028617176,
|
| 916 |
+
≥4,17000,average_rank,3.2,
|
| 917 |
+
≥4,17000,chartqa_relaxed_overall,0.6528,0.009523504757028414
|
| 918 |
+
≥4,17000,docvqa_val_anls,0.6715440208321617,0.005990650413425848
|
| 919 |
+
≥4,17000,infovqa_val_anls,0.2505498351142534,0.0067846959958436336
|
| 920 |
+
≥4,17000,mme_total_score,645.9387755102041,
|
| 921 |
+
≥4,17000,mmmu_val_mmmu_acc,0.28889,
|
| 922 |
+
≥4,17000,mmstar_average,0.36983647620343896,
|
| 923 |
+
≥4,17000,ocrbench_ocrbench_accuracy,0.506,
|
| 924 |
+
≥4,17000,seedbench_seed_all,0.5163424124513619,
|
| 925 |
+
≥4,17000,textvqa_val_exact_match,0.5340199999999999,0.006775522818343422
|
| 926 |
+
≥4,18000,ai2d_exact_match,0.44365284974093266,0.008941826870765836
|
| 927 |
+
≥4,18000,average,0.4716323231463362,
|
| 928 |
+
≥4,18000,average_rank,3.3,
|
| 929 |
+
≥4,18000,chartqa_relaxed_overall,0.6548,0.009510571191350932
|
| 930 |
+
≥4,18000,docvqa_val_anls,0.6713197941217036,0.006012007386995055
|
| 931 |
+
≥4,18000,infovqa_val_anls,0.25642018150567725,0.006824635684186863
|
| 932 |
+
≥4,18000,mme_total_score,678.5255102040816,
|
| 933 |
+
≥4,18000,mmmu_val_mmmu_acc,0.30222,
|
| 934 |
+
≥4,18000,mmstar_average,0.36512008406044094,
|
| 935 |
+
≥4,18000,ocrbench_ocrbench_accuracy,0.505,
|
| 936 |
+
≥4,18000,seedbench_seed_all,0.5163979988882713,
|
| 937 |
+
≥4,18000,textvqa_val_exact_match,0.5297599999999999,0.006789351039496154
|
| 938 |
+
≥4,19000,ai2d_exact_match,0.4420336787564767,0.008938473522297173
|
| 939 |
+
≥4,19000,average,0.47578432127978293,
|
| 940 |
+
≥4,19000,average_rank,3.1,
|
| 941 |
+
≥4,19000,chartqa_relaxed_overall,0.6552,0.009507962165354631
|
| 942 |
+
≥4,19000,docvqa_val_anls,0.6758822484033866,0.005978188514035828
|
| 943 |
+
≥4,19000,infovqa_val_anls,0.25920579379187,0.006859000726576107
|
| 944 |
+
≥4,19000,mme_total_score,648.0051020408163,
|
| 945 |
+
≥4,19000,mmmu_val_mmmu_acc,0.31222,
|
| 946 |
+
≥4,19000,mmstar_average,0.36867213443512875,
|
| 947 |
+
≥4,19000,ocrbench_ocrbench_accuracy,0.514,
|
| 948 |
+
≥4,19000,seedbench_seed_all,0.517065036131184,
|
| 949 |
+
≥4,19000,textvqa_val_exact_match,0.53778,0.006774194584041153
|
| 950 |
+
≥5,1000,ai2d_exact_match,0.2707253886010363,0.007997269386750962
|
| 951 |
+
≥5,1000,average,0.28746693021594993,
|
| 952 |
+
≥5,1000,average_rank,3.0,
|
| 953 |
+
≥5,1000,chartqa_relaxed_overall,0.4188,0.009869224115088964
|
| 954 |
+
≥5,1000,docvqa_val_anls,0.3974008919607103,0.006032200167231822
|
| 955 |
+
≥5,1000,infovqa_val_anls,0.19100831944942226,0.006573293191722239
|
| 956 |
+
≥5,1000,mme_total_score,611.360544217687,
|
| 957 |
+
≥5,1000,mmmu_val_mmmu_acc,0.24,
|
| 958 |
+
≥5,1000,mmstar_average,0.1976861821602846,
|
| 959 |
+
≥5,1000,ocrbench_ocrbench_accuracy,0.275,
|
| 960 |
+
≥5,1000,seedbench_seed_all,0.2508615897720956,
|
| 961 |
+
≥5,1000,textvqa_val_exact_match,0.34571999999999997,0.0065009529145933324
|
| 962 |
+
≥5,2000,ai2d_exact_match,0.26424870466321243,0.007936036132740997
|
| 963 |
+
≥5,2000,average,0.3233315583742189,
|
| 964 |
+
≥5,2000,average_rank,3.6,
|
| 965 |
+
≥5,2000,chartqa_relaxed_overall,0.5084,0.010000589018267121
|
| 966 |
+
≥5,2000,docvqa_val_anls,0.4796362298612967,0.006176090342725644
|
| 967 |
+
≥5,2000,infovqa_val_anls,0.18440735237308775,0.006268696895430431
|
| 968 |
+
≥5,2000,mme_total_score,573.4557823129252,
|
| 969 |
+
≥5,2000,mmmu_val_mmmu_acc,0.25333,
|
| 970 |
+
≥5,2000,mmstar_average,0.2035063743792113,
|
| 971 |
+
≥5,2000,ocrbench_ocrbench_accuracy,0.338,
|
| 972 |
+
≥5,2000,seedbench_seed_all,0.27965536409116176,
|
| 973 |
+
≥5,2000,textvqa_val_exact_match,0.3988,0.006679269011419241
|
| 974 |
+
≥5,3000,ai2d_exact_match,0.27428756476683935,0.00803002739723617
|
| 975 |
+
≥5,3000,average,0.3443864268244443,
|
| 976 |
+
≥5,3000,average_rank,3.9,
|
| 977 |
+
≥5,3000,chartqa_relaxed_overall,0.5452,0.00996104778570988
|
| 978 |
+
≥5,3000,docvqa_val_anls,0.4898334505042122,0.006220141079936321
|
| 979 |
+
≥5,3000,infovqa_val_anls,0.1905138805204429,0.0062439153410265395
|
| 980 |
+
≥5,3000,mme_total_score,556.8095238095239,
|
| 981 |
+
≥5,3000,mmmu_val_mmmu_acc,0.26778,
|
| 982 |
+
≥5,3000,mmstar_average,0.2298314503533517,
|
| 983 |
+
≥5,3000,ocrbench_ocrbench_accuracy,0.357,
|
| 984 |
+
≥5,3000,seedbench_seed_all,0.34469149527515286,
|
| 985 |
+
≥5,3000,textvqa_val_exact_match,0.40034000000000003,0.006692244325099119
|
| 986 |
+
≥5,4000,ai2d_exact_match,0.32027202072538863,0.008397669117307337
|
| 987 |
+
≥5,4000,average,0.376205627947542,
|
| 988 |
+
≥5,4000,average_rank,3.4,
|
| 989 |
+
≥5,4000,chartqa_relaxed_overall,0.5652,0.009916598185256227
|
| 990 |
+
≥5,4000,docvqa_val_anls,0.5177963857889966,0.006222777093039552
|
| 991 |
+
≥5,4000,infovqa_val_anls,0.18916843913392253,0.006215579683049469
|
| 992 |
+
≥5,4000,mme_total_score,604.8146258503401,
|
| 993 |
+
≥5,4000,mmmu_val_mmmu_acc,0.28778,
|
| 994 |
+
≥5,4000,mmstar_average,0.29371580699129896,
|
| 995 |
+
≥5,4000,ocrbench_ocrbench_accuracy,0.386,
|
| 996 |
+
≥5,4000,seedbench_seed_all,0.4163979988882713,
|
| 997 |
+
≥5,4000,textvqa_val_exact_match,0.40952,0.006715332684134995
|
| 998 |
+
≥5,5000,ai2d_exact_match,0.33743523316062174,0.008510225495976804
|
| 999 |
+
≥5,5000,average,0.3848546106999232,
|
| 1000 |
+
≥5,5000,average_rank,3.8,
|
| 1001 |
+
≥5,5000,chartqa_relaxed_overall,0.5804,0.009871844677005952
|
| 1002 |
+
≥5,5000,docvqa_val_anls,0.5377359601766157,0.0062198901419007885
|
| 1003 |
+
≥5,5000,infovqa_val_anls,0.2011732768094287,0.0063147175491986935
|
| 1004 |
+
≥5,5000,mme_total_score,635.4557823129252,
|
| 1005 |
+
≥5,5000,mmmu_val_mmmu_acc,0.28333,
|
| 1006 |
+
≥5,5000,mmstar_average,0.2707551695656502,
|
| 1007 |
+
≥5,5000,ocrbench_ocrbench_accuracy,0.413,
|
| 1008 |
+
≥5,5000,seedbench_seed_all,0.4153418565869928,
|
| 1009 |
+
≥5,5000,textvqa_val_exact_match,0.4245199999999999,0.006749286220993934
|
| 1010 |
+
≥5,6000,ai2d_exact_match,0.33775906735751293,0.008512227143417681
|
| 1011 |
+
≥5,6000,average,0.40160010157704135,
|
| 1012 |
+
≥5,6000,average_rank,4.0,
|
| 1013 |
+
≥5,6000,chartqa_relaxed_overall,0.5972,0.009811185848158155
|
| 1014 |
+
≥5,6000,docvqa_val_anls,0.5638436995149737,0.006239864600720921
|
| 1015 |
+
≥5,6000,infovqa_val_anls,0.2134170597784526,0.0064637890910638615
|
| 1016 |
+
≥5,6000,mme_total_score,629.6700680272108,
|
| 1017 |
+
≥5,6000,mmmu_val_mmmu_acc,0.29111,
|
| 1018 |
+
≥5,6000,mmstar_average,0.30043490077200463,
|
| 1019 |
+
≥5,6000,ocrbench_ocrbench_accuracy,0.428,
|
| 1020 |
+
≥5,6000,seedbench_seed_all,0.445136186770428,
|
| 1021 |
+
≥5,6000,textvqa_val_exact_match,0.4375000000000001,0.006770193284051843
|
| 1022 |
+
≥5,7000,ai2d_exact_match,0.38147668393782386,0.008742662684201102
|
| 1023 |
+
≥5,7000,average,0.4128574907483326,
|
| 1024 |
+
≥5,7000,average_rank,4.0,
|
| 1025 |
+
≥5,7000,chartqa_relaxed_overall,0.5924,0.009829727637028773
|
| 1026 |
+
≥5,7000,docvqa_val_anls,0.5787327229367657,0.006207050035602503
|
| 1027 |
+
≥5,7000,infovqa_val_anls,0.21426196373121223,0.006473055692640963
|
| 1028 |
+
≥5,7000,mme_total_score,634.8928571428571,
|
| 1029 |
+
≥5,7000,mmmu_val_mmmu_acc,0.28778,
|
| 1030 |
+
≥5,7000,mmstar_average,0.3047775080524821,
|
| 1031 |
+
≥5,7000,ocrbench_ocrbench_accuracy,0.445,
|
| 1032 |
+
≥5,7000,seedbench_seed_all,0.4633685380767093,
|
| 1033 |
+
≥5,7000,textvqa_val_exact_match,0.4479200000000001,0.006776925561345115
|
| 1034 |
+
≥5,8000,ai2d_exact_match,0.37694300518134716,0.008722348153640555
|
| 1035 |
+
≥5,8000,average,0.4177104341751616,
|
| 1036 |
+
≥5,8000,average_rank,4.3,
|
| 1037 |
+
≥5,8000,chartqa_relaxed_overall,0.6052,0.009778109662477129
|
| 1038 |
+
≥5,8000,docvqa_val_anls,0.5848226849030369,0.006118280955924086
|
| 1039 |
+
≥5,8000,infovqa_val_anls,0.22716516383976268,0.006599069597925426
|
| 1040 |
+
≥5,8000,mme_total_score,631.8299319727892,
|
| 1041 |
+
≥5,8000,mmmu_val_mmmu_acc,0.28778,
|
| 1042 |
+
≥5,8000,mmstar_average,0.310840196509451,
|
| 1043 |
+
≥5,8000,ocrbench_ocrbench_accuracy,0.45,
|
| 1044 |
+
≥5,8000,seedbench_seed_all,0.45714285714285713,
|
| 1045 |
+
≥5,8000,textvqa_val_exact_match,0.4595,0.006799352633835655
|
| 1046 |
+
≥5,9000,ai2d_exact_match,0.39345854922279794,0.008792480650628211
|
| 1047 |
+
≥5,9000,average,0.4230005209124751,
|
| 1048 |
+
≥5,9000,average_rank,4.1,
|
| 1049 |
+
≥5,9000,chartqa_relaxed_overall,0.6164,0.009727191953761483
|
| 1050 |
+
≥5,9000,docvqa_val_anls,0.5898502952284708,0.006210892818800316
|
| 1051 |
+
≥5,9000,infovqa_val_anls,0.22700559034851783,0.006566637554657031
|
| 1052 |
+
≥5,9000,mme_total_score,650.124149659864,
|
| 1053 |
+
≥5,9000,mmmu_val_mmmu_acc,0.26333,
|
| 1054 |
+
≥5,9000,mmstar_average,0.3054500143908104,
|
| 1055 |
+
≥5,9000,ocrbench_ocrbench_accuracy,0.476,
|
| 1056 |
+
≥5,9000,seedbench_seed_all,0.4744302390216787,
|
| 1057 |
+
≥5,9000,textvqa_val_exact_match,0.46107999999999993,0.00679104339362331
|
| 1058 |
+
≥5,10000,ai2d_exact_match,0.40770725388601037,0.008844516803704286
|
| 1059 |
+
≥5,10000,average,0.42883369231582574,
|
| 1060 |
+
≥5,10000,average_rank,4.3,
|
| 1061 |
+
≥5,10000,chartqa_relaxed_overall,0.6212,0.009703704898413913
|
| 1062 |
+
≥5,10000,docvqa_val_anls,0.6019271313791558,0.0061562654420724205
|
| 1063 |
+
≥5,10000,infovqa_val_anls,0.22574970075829065,0.006593818910085297
|
| 1064 |
+
≥5,10000,mme_total_score,596.4710884353742,
|
| 1065 |
+
≥5,10000,mmmu_val_mmmu_acc,0.28778,
|
| 1066 |
+
≥5,10000,mmstar_average,0.31747912258440025,
|
| 1067 |
+
≥5,10000,ocrbench_ocrbench_accuracy,0.458,
|
| 1068 |
+
≥5,10000,seedbench_seed_all,0.47204002223457475,
|
| 1069 |
+
≥5,10000,textvqa_val_exact_match,0.46762,0.00680558375997701
|
| 1070 |
+
≥5,11000,ai2d_exact_match,0.41224093264248707,0.008859453032358869
|
| 1071 |
+
≥5,11000,average,0.43139135632851494,
|
| 1072 |
+
≥5,11000,average_rank,4.3,
|
| 1073 |
+
≥5,11000,chartqa_relaxed_overall,0.624,0.009689538423575438
|
| 1074 |
+
≥5,11000,docvqa_val_anls,0.5969298636083556,0.006077525402912234
|
| 1075 |
+
≥5,11000,infovqa_val_anls,0.23206384708038705,0.006649968873942653
|
| 1076 |
+
≥5,11000,mme_total_score,618.6326530612246,
|
| 1077 |
+
≥5,11000,mmmu_val_mmmu_acc,0.29778,
|
| 1078 |
+
≥5,11000,mmstar_average,0.304166274020068,
|
| 1079 |
+
≥5,11000,ocrbench_ocrbench_accuracy,0.467,
|
| 1080 |
+
≥5,11000,seedbench_seed_all,0.4783212896053363,
|
| 1081 |
+
≥5,11000,textvqa_val_exact_match,0.47002,0.006791013436350906
|
| 1082 |
+
≥5,12000,ai2d_exact_match,0.41386010362694303,0.00886459927257348
|
| 1083 |
+
≥5,12000,average,0.43780261093672457,
|
| 1084 |
+
≥5,12000,average_rank,4.2,
|
| 1085 |
+
≥5,12000,chartqa_relaxed_overall,0.63,0.00965801796044974
|
| 1086 |
+
≥5,12000,docvqa_val_anls,0.61907502588855,0.006141216046133696
|
| 1087 |
+
≥5,12000,infovqa_val_anls,0.23508371459325778,0.006715413220374958
|
| 1088 |
+
≥5,12000,mme_total_score,636.8095238095239,
|
| 1089 |
+
≥5,12000,mmmu_val_mmmu_acc,0.29444,
|
| 1090 |
+
≥5,12000,mmstar_average,0.30800611068641726,
|
| 1091 |
+
≥5,12000,ocrbench_ocrbench_accuracy,0.48,
|
| 1092 |
+
≥5,12000,seedbench_seed_all,0.481378543635353,
|
| 1093 |
+
≥5,12000,textvqa_val_exact_match,0.47838,0.0067952446747735285
|
| 1094 |
+
≥5,13000,ai2d_exact_match,0.41968911917098445,0.008882309400443855
|
| 1095 |
+
≥5,13000,average,0.44244351114513614,
|
| 1096 |
+
≥5,13000,average_rank,4.2,
|
| 1097 |
+
≥5,13000,chartqa_relaxed_overall,0.63,0.00965801796044974
|
| 1098 |
+
≥5,13000,docvqa_val_anls,0.6332585799642098,0.006108497007314537
|
| 1099 |
+
≥5,13000,infovqa_val_anls,0.23941240946565165,0.006705748427777414
|
| 1100 |
+
≥5,13000,mme_total_score,626.8248299319728,
|
| 1101 |
+
≥5,13000,mmmu_val_mmmu_acc,0.29444,
|
| 1102 |
+
≥5,13000,mmstar_average,0.31812106368981574,
|
| 1103 |
+
≥5,13000,ocrbench_ocrbench_accuracy,0.479,
|
| 1104 |
+
≥5,13000,seedbench_seed_all,0.4867704280155642,
|
| 1105 |
+
≥5,13000,textvqa_val_exact_match,0.4812999999999999,0.006807740525770151
|
| 1106 |
+
≥5,14000,ai2d_exact_match,0.4180699481865285,0.008877517831066049
|
| 1107 |
+
≥5,14000,average,0.43859864594625364,
|
| 1108 |
+
≥5,14000,average_rank,4.7,
|
| 1109 |
+
≥5,14000,chartqa_relaxed_overall,0.634,0.00963611653607192
|
| 1110 |
+
≥5,14000,docvqa_val_anls,0.6228194930250945,0.00603170401976013
|
| 1111 |
+
≥5,14000,infovqa_val_anls,0.2373742957554242,0.006749917839373105
|
| 1112 |
+
≥5,14000,mme_total_score,640.9353741496599,
|
| 1113 |
+
≥5,14000,mmmu_val_mmmu_acc,0.28111,
|
| 1114 |
+
≥5,14000,mmstar_average,0.30870170300837935,
|
| 1115 |
+
≥5,14000,ocrbench_ocrbench_accuracy,0.468,
|
| 1116 |
+
≥5,14000,seedbench_seed_all,0.490272373540856,
|
| 1117 |
+
≥5,14000,textvqa_val_exact_match,0.48704,0.006811555412490416
|
| 1118 |
+
≥5,15000,ai2d_exact_match,0.42033678756476683,0.0088841985383291
|
| 1119 |
+
≥5,15000,average,0.4493020549744483,
|
| 1120 |
+
≥5,15000,average_rank,4.3,
|
| 1121 |
+
≥5,15000,chartqa_relaxed_overall,0.634,0.00963611653607192
|
| 1122 |
+
≥5,15000,docvqa_val_anls,0.6441936439366807,0.006083543225400507
|
| 1123 |
+
≥5,15000,infovqa_val_anls,0.24225340680501783,0.006766995068281554
|
| 1124 |
+
≥5,15000,mme_total_score,649.6683673469388,
|
| 1125 |
+
≥5,15000,mmmu_val_mmmu_acc,0.31444,
|
| 1126 |
+
≥5,15000,mmstar_average,0.3088639171639584,
|
| 1127 |
+
≥5,15000,ocrbench_ocrbench_accuracy,0.49,
|
| 1128 |
+
≥5,15000,seedbench_seed_all,0.4953307392996109,
|
| 1129 |
+
≥5,15000,textvqa_val_exact_match,0.4943,0.006802702209524558
|
| 1130 |
+
≥5,16000,ai2d_exact_match,0.43102331606217614,0.008913110733383512
|
| 1131 |
+
≥5,16000,average,0.4547498341048439,
|
| 1132 |
+
≥5,16000,average_rank,4.0,
|
| 1133 |
+
≥5,16000,chartqa_relaxed_overall,0.6376,0.009615793331418735
|
| 1134 |
+
≥5,16000,docvqa_val_anls,0.6400316893948637,0.006137343277865641
|
| 1135 |
+
≥5,16000,infovqa_val_anls,0.2494232082698933,0.006878312285939229
|
| 1136 |
+
≥5,16000,mme_total_score,651.4880952380952,
|
| 1137 |
+
≥5,16000,mmmu_val_mmmu_acc,0.31111,
|
| 1138 |
+
≥5,16000,mmstar_average,0.33139384518998005,
|
| 1139 |
+
≥5,16000,ocrbench_ocrbench_accuracy,0.504,
|
| 1140 |
+
≥5,16000,seedbench_seed_all,0.49160644802668146,
|
| 1141 |
+
≥5,16000,textvqa_val_exact_match,0.49655999999999995,0.006809089955434884
|
| 1142 |
+
≥5,17000,ai2d_exact_match,0.41547927461139894,0.008869646776634897
|
| 1143 |
+
≥5,17000,average,0.45183488389514526,
|
| 1144 |
+
≥5,17000,average_rank,4.2,
|
| 1145 |
+
≥5,17000,chartqa_relaxed_overall,0.6444,0.009575809858898698
|
| 1146 |
+
≥5,17000,docvqa_val_anls,0.6470904368022692,0.00610746328694398
|
| 1147 |
+
≥5,17000,infovqa_val_anls,0.24587813104100764,0.006774660099659317
|
| 1148 |
+
≥5,17000,mme_total_score,635.6700680272108,
|
| 1149 |
+
≥5,17000,mmmu_val_mmmu_acc,0.29222,
|
| 1150 |
+
≥5,17000,mmstar_average,0.3214527496221985,
|
| 1151 |
+
≥5,17000,ocrbench_ocrbench_accuracy,0.51,
|
| 1152 |
+
≥5,17000,seedbench_seed_all,0.49605336297943303,
|
| 1153 |
+
≥5,17000,textvqa_val_exact_match,0.49394000000000005,0.006805286008944275
|
| 1154 |
+
≥5,18000,ai2d_exact_match,0.4319948186528497,0.008915528710615484
|
| 1155 |
+
≥5,18000,average,0.4524884361615038,
|
| 1156 |
+
≥5,18000,average_rank,4.7,
|
| 1157 |
+
≥5,18000,chartqa_relaxed_overall,0.6412,0.009594886593362934
|
| 1158 |
+
≥5,18000,docvqa_val_anls,0.6417498056690655,0.006070355136614947
|
| 1159 |
+
≥5,18000,infovqa_val_anls,0.2404225713513222,0.006730728710089941
|
| 1160 |
+
≥5,18000,mme_total_score,655.829931972789,
|
| 1161 |
+
≥5,18000,mmmu_val_mmmu_acc,0.29222,
|
| 1162 |
+
≥5,18000,mmstar_average,0.32211620059741813,
|
| 1163 |
+
≥5,18000,ocrbench_ocrbench_accuracy,0.513,
|
| 1164 |
+
≥5,18000,seedbench_seed_all,0.4945525291828794,
|
| 1165 |
+
≥5,18000,textvqa_val_exact_match,0.49513999999999997,0.0067971844482318305
|
| 1166 |
+
≥5,19000,ai2d_exact_match,0.4213082901554404,0.008887002823098537
|
| 1167 |
+
≥5,19000,average,0.4541437926528052,
|
| 1168 |
+
≥5,19000,average_rank,4.7,
|
| 1169 |
+
≥5,19000,chartqa_relaxed_overall,0.642,0.009590161024476605
|
| 1170 |
+
≥5,19000,docvqa_val_anls,0.6518779003706476,0.006074501334886487
|
| 1171 |
+
≥5,19000,infovqa_val_anls,0.24196990768518037,0.006752919839434985
|
| 1172 |
+
≥5,19000,mme_total_score,646.6513605442177,
|
| 1173 |
+
≥5,19000,mmmu_val_mmmu_acc,0.30111,
|
| 1174 |
+
≥5,19000,mmstar_average,0.3147497032570857,
|
| 1175 |
+
≥5,19000,ocrbench_ocrbench_accuracy,0.51,
|
| 1176 |
+
≥5,19000,seedbench_seed_all,0.49699833240689273,
|
| 1177 |
+
≥5,19000,textvqa_val_exact_match,0.5072800000000001,0.006794367741490103
|
app/src/content/assets/data/internal_deduplication.csv
ADDED
|
@@ -0,0 +1,729 @@
|
| 1 |
+
run,step,metric,value,stderr
|
| 2 |
+
Baseline,300,ai2d_exact_match,0.2551813471502591,0.007846598309236504
|
| 3 |
+
Baseline,300,average,0.1836384379377178,
|
| 4 |
+
Baseline,300,average_rank,1.4444444444444444,
|
| 5 |
+
Baseline,300,chartqa_relaxed_overall,0.1328,0.006788526912302523
|
| 6 |
+
Baseline,300,docvqa_val_anls,0.1503143424142802,0.004151727384820528
|
| 7 |
+
Baseline,300,infovqa_val_anls,0.11374396685909084,0.005163280990095591
|
| 8 |
+
Baseline,300,mme_total_score,691.1952781112445,
|
| 9 |
+
Baseline,300,mmmu_val_mmmu_acc,0.26556,
|
| 10 |
+
Baseline,300,mmstar_average,0.2859278470781123,
|
| 11 |
+
Baseline,300,ocrbench_ocrbench_accuracy,0.149,
|
| 12 |
+
Baseline,300,textvqa_val_exact_match,0.11657999999999999,0.004405144921606561
|
| 13 |
+
Baseline,1500,ai2d_exact_match,0.27525906735751293,0.008038849490577982
|
| 14 |
+
Baseline,1500,average,0.318819844462715,
|
| 15 |
+
Baseline,1500,average_rank,1.2222222222222223,
|
| 16 |
+
Baseline,1500,chartqa_relaxed_overall,0.374,0.009679208378267924
|
| 17 |
+
Baseline,1500,docvqa_val_anls,0.437411196849637,0.0061765544267728045
|
| 18 |
+
Baseline,1500,infovqa_val_anls,0.21582289145457856,0.006873661480889723
|
| 19 |
+
Baseline,1500,mme_total_score,1066.704581832733,
|
| 20 |
+
Baseline,1500,mmmu_val_mmmu_acc,0.24,
|
| 21 |
+
Baseline,1500,mmstar_average,0.23474560003999134,
|
| 22 |
+
Baseline,1500,ocrbench_ocrbench_accuracy,0.411,
|
| 23 |
+
Baseline,1500,textvqa_val_exact_match,0.36232000000000003,0.006579840604488538
|
| 24 |
+
Baseline,2700,ai2d_exact_match,0.27849740932642486,0.008067913113285858
|
| 25 |
+
Baseline,2700,average,0.36471172748595665,
|
| 26 |
+
Baseline,2700,average_rank,1.4444444444444444,
|
| 27 |
+
Baseline,2700,chartqa_relaxed_overall,0.4624,0.00997367964766694
|
| 28 |
+
Baseline,2700,docvqa_val_anls,0.4953558755845657,0.006275075768152338
|
| 29 |
+
Baseline,2700,infovqa_val_anls,0.20975551937756792,0.006468441430093479
|
| 30 |
+
Baseline,2700,mme_total_score,1172.469887955182,
|
| 31 |
+
Baseline,2700,mmmu_val_mmmu_acc,0.27111,
|
| 32 |
+
Baseline,2700,mmstar_average,0.2503150155990948,
|
| 33 |
+
Baseline,2700,ocrbench_ocrbench_accuracy,0.486,
|
| 34 |
+
Baseline,2700,textvqa_val_exact_match,0.46426000000000006,0.006792330795207658
|
| 35 |
+
Baseline,3900,ai2d_exact_match,0.35038860103626945,0.008586842325753156
|
| 36 |
+
Baseline,3900,average,0.398537125609502,
|
| 37 |
+
Baseline,3900,average_rank,1.4444444444444444,
|
| 38 |
+
Baseline,3900,chartqa_relaxed_overall,0.4948,0.010001459677380663
|
| 39 |
+
Baseline,3900,docvqa_val_anls,0.5407649774017467,0.00626354456311192
|
| 40 |
+
Baseline,3900,infovqa_val_anls,0.22943878312324553,0.006664668392753554
|
| 41 |
+
Baseline,3900,mme_total_score,1168.9393757503,
|
| 42 |
+
Baseline,3900,mmmu_val_mmmu_acc,0.27,
|
| 43 |
+
Baseline,3900,mmstar_average,0.3015046433147543,
|
| 44 |
+
Baseline,3900,ocrbench_ocrbench_accuracy,0.517,
|
| 45 |
+
Baseline,3900,textvqa_val_exact_match,0.4844,0.006794038548018284
|
| 46 |
+
Baseline,5100,ai2d_exact_match,0.3898963730569948,0.008778252852376944
|
| 47 |
+
Baseline,5100,average,0.42767475240113806,
|
| 48 |
+
Baseline,5100,average_rank,1.2222222222222223,
|
| 49 |
+
Baseline,5100,chartqa_relaxed_overall,0.5264,0.009988048880946633
|
| 50 |
+
Baseline,5100,docvqa_val_anls,0.5781350651939515,0.006244324391533268
|
| 51 |
+
Baseline,5100,infovqa_val_anls,0.2546269175216946,0.007112814176935012
|
| 52 |
+
Baseline,5100,mme_total_score,1185.1023409363747,
|
| 53 |
+
Baseline,5100,mmmu_val_mmmu_acc,0.29222,
|
| 54 |
+
Baseline,5100,mmstar_average,0.33637966343646347,
|
| 55 |
+
Baseline,5100,ocrbench_ocrbench_accuracy,0.533,
|
| 56 |
+
Baseline,5100,textvqa_val_exact_match,0.51074,0.0068004249599511925
|
| 57 |
+
Baseline,6300,ai2d_exact_match,0.41515544041450775,0.00886864516657515
|
| 58 |
+
Baseline,6300,average,0.43890688312888254,
|
| 59 |
+
Baseline,6300,average_rank,1.4444444444444444,
|
| 60 |
+
Baseline,6300,chartqa_relaxed_overall,0.5388,0.0099718403035556
|
| 61 |
+
Baseline,6300,docvqa_val_anls,0.6024512173813115,0.006190216536053702
|
| 62 |
+
Baseline,6300,infovqa_val_anls,0.2548412895443468,0.007030638027408485
|
| 63 |
+
Baseline,6300,mme_total_score,1187.329431772709,
|
| 64 |
+
Baseline,6300,mmmu_val_mmmu_acc,0.30667,
|
| 65 |
+
Baseline,6300,mmstar_average,0.3500771176908943,
|
| 66 |
+
Baseline,6300,ocrbench_ocrbench_accuracy,0.516,
|
| 67 |
+
Baseline,6300,textvqa_val_exact_match,0.52726,0.006770298802059908
|
| 68 |
+
Baseline,7500,ai2d_exact_match,0.42972797927461137,0.008909832364541428
|
| 69 |
+
Baseline,7500,average,0.44878537461255386,
|
| 70 |
+
Baseline,7500,average_rank,1.3333333333333333,
|
| 71 |
+
Baseline,7500,chartqa_relaxed_overall,0.5728,0.009895414680177737
|
| 72 |
+
Baseline,7500,docvqa_val_anls,0.6164034078362094,0.006122657396260068
|
| 73 |
+
Baseline,7500,infovqa_val_anls,0.25244937386016403,0.006941949044716374
|
| 74 |
+
Baseline,7500,mme_total_score,1282.560024009604,
|
| 75 |
+
Baseline,7500,mmmu_val_mmmu_acc,0.29667,
|
| 76 |
+
Baseline,7500,mmstar_average,0.3339722359294459,
|
| 77 |
+
Baseline,7500,ocrbench_ocrbench_accuracy,0.558,
|
| 78 |
+
Baseline,7500,textvqa_val_exact_match,0.5302600000000001,0.0067524799649562395
|
| 79 |
+
Baseline,8700,ai2d_exact_match,0.44527202072538863,0.008945084019331404
|
| 80 |
+
Baseline,8700,average,0.4558942646480554,
|
| 81 |
+
Baseline,8700,average_rank,1.5555555555555556,
|
| 82 |
+
Baseline,8700,chartqa_relaxed_overall,0.5852,0.009855721084488851
|
| 83 |
+
Baseline,8700,docvqa_val_anls,0.6221835109907441,0.006147036255020746
|
| 84 |
+
Baseline,8700,infovqa_val_anls,0.25900127209441604,0.006885435292484948
|
| 85 |
+
Baseline,8700,mme_total_score,1182.047919167667,
|
| 86 |
+
Baseline,8700,mmmu_val_mmmu_acc,0.30333,
|
| 87 |
+
Baseline,8700,mmstar_average,0.3299073133738943,
|
| 88 |
+
Baseline,8700,ocrbench_ocrbench_accuracy,0.559,
|
| 89 |
+
Baseline,8700,textvqa_val_exact_match,0.54326,0.0067297527736521565
|
| 90 |
+
Baseline,9900,ai2d_exact_match,0.4520725388601036,0.008957715852675529
|
| 91 |
+
Baseline,9900,average,0.4655685311713072,
|
| 92 |
+
Baseline,9900,average_rank,1.5555555555555556,
|
| 93 |
+
Baseline,9900,chartqa_relaxed_overall,0.5888,0.009842996384797287
|
| 94 |
+
Baseline,9900,docvqa_val_anls,0.6443822232919176,0.006072644236356477
|
| 95 |
+
Baseline,9900,infovqa_val_anls,0.2707219279967856,0.007060292176646616
|
| 96 |
+
Baseline,9900,mme_total_score,1293.4631852741097,
|
| 97 |
+
Baseline,9900,mmmu_val_mmmu_acc,0.30444,
|
| 98 |
+
Baseline,9900,mmstar_average,0.34327155922165065,
|
| 99 |
+
Baseline,9900,ocrbench_ocrbench_accuracy,0.557,
|
| 100 |
+
Baseline,9900,textvqa_val_exact_match,0.56386,0.006703146016110842
|
| 101 |
+
Baseline,11100,ai2d_exact_match,0.4494818652849741,0.008953103134587198
|
| 102 |
+
Baseline,11100,average,0.471077301321738,
|
| 103 |
+
Baseline,11100,average_rank,1.6666666666666667,
|
| 104 |
+
Baseline,11100,chartqa_relaxed_overall,0.5948,0.009820578470976232
|
| 105 |
+
Baseline,11100,docvqa_val_anls,0.657973309294109,0.006015458191652746
|
| 106 |
+
Baseline,11100,infovqa_val_anls,0.29696232573726855,0.007574623301736419
|
| 107 |
+
Baseline,11100,mme_total_score,1338.3029211684673,
|
| 108 |
+
Baseline,11100,mmmu_val_mmmu_acc,0.29667,
|
| 109 |
+
Baseline,11100,mmstar_average,0.3394909102575524,
|
| 110 |
+
Baseline,11100,ocrbench_ocrbench_accuracy,0.565,
|
| 111 |
+
Baseline,11100,textvqa_val_exact_match,0.56824,0.006679879088496093
|
| 112 |
+
Baseline,12300,ai2d_exact_match,0.4676165803108808,0.008980259712600086
|
| 113 |
+
Baseline,12300,average,0.47342294699365395,
|
| 114 |
+
Baseline,12300,average_rank,1.5555555555555556,
|
| 115 |
+
Baseline,12300,chartqa_relaxed_overall,0.598,0.009808000752013664
|
| 116 |
+
Baseline,12300,docvqa_val_anls,0.6588847758219586,0.00602421968017162
|
| 117 |
+
Baseline,12300,infovqa_val_anls,0.2830975650419957,0.007216197962807829
|
| 118 |
+
Baseline,12300,mme_total_score,1269.7461984793918,
|
| 119 |
+
Baseline,12300,mmmu_val_mmmu_acc,0.28333,
|
| 120 |
+
Baseline,12300,mmstar_average,0.3693946547743964,
|
| 121 |
+
Baseline,12300,ocrbench_ocrbench_accuracy,0.559,
|
| 122 |
+
Baseline,12300,textvqa_val_exact_match,0.5680599999999999,0.006686980665598219
|
| 123 |
+
Baseline,13500,ai2d_exact_match,0.47085492227979275,0.008983852707691612
|
| 124 |
+
Baseline,13500,average,0.48226394524672617,
|
| 125 |
+
Baseline,13500,average_rank,1.5555555555555556,
|
| 126 |
+
Baseline,13500,chartqa_relaxed_overall,0.618,0.009719474639861454
|
| 127 |
+
Baseline,13500,docvqa_val_anls,0.6663692127257962,0.005978102603390597
|
| 128 |
+
Baseline,13500,infovqa_val_anls,0.32051341945189793,0.007779116582967409
|
| 129 |
+
Baseline,13500,mme_total_score,1202.768607442977,
|
| 130 |
+
Baseline,13500,mmmu_val_mmmu_acc,0.28,
|
| 131 |
+
Baseline,13500,mmstar_average,0.35477400751632243,
|
| 132 |
+
Baseline,13500,ocrbench_ocrbench_accuracy,0.569,
|
| 133 |
+
Baseline,13500,textvqa_val_exact_match,0.5785999999999999,0.006676145758177908
|
| 134 |
+
Baseline,14700,ai2d_exact_match,0.46567357512953367,0.008977921602780724
|
| 135 |
+
Baseline,14700,average,0.48621829332317545,
|
| 136 |
+
Baseline,14700,average_rank,1.5555555555555556,
|
| 137 |
+
Baseline,14700,chartqa_relaxed_overall,0.6296,0.0096601689190934
|
| 138 |
+
Baseline,14700,docvqa_val_anls,0.6810941724065047,0.005910647813959628
|
| 139 |
+
Baseline,14700,infovqa_val_anls,0.3016034504434661,0.007417514325399065
|
| 140 |
+
Baseline,14700,mme_total_score,1281.9612845138056,
|
| 141 |
+
Baseline,14700,mmmu_val_mmmu_acc,0.29778,
|
| 142 |
+
Baseline,14700,mmstar_average,0.365895148605899,
|
| 143 |
+
Baseline,14700,ocrbench_ocrbench_accuracy,0.562,
|
| 144 |
+
Baseline,14700,textvqa_val_exact_match,0.5861,0.006642001297519238
|
| 145 |
+
Baseline,15900,ai2d_exact_match,0.48186528497409326,0.008993233105757854
|
| 146 |
+
Baseline,15900,average,0.48999290982002447,
|
| 147 |
+
Baseline,15900,average_rank,1.5,
|
| 148 |
+
Baseline,15900,chartqa_relaxed_overall,0.64,0.009601920576192066
|
| 149 |
+
Baseline,15900,docvqa_val_anls,0.6858324657211811,0.00589619582327283
|
| 150 |
+
Baseline,15900,infovqa_val_anls,0.2913749730393032,0.007302812648430173
|
| 151 |
+
Baseline,15900,mme_total_score,1296.9955982392958,
|
| 152 |
+
Baseline,15900,mmmu_val_mmmu_acc,0.29111,
|
| 153 |
+
Baseline,15900,mmstar_average,0.35848055482561814,
|
| 154 |
+
Baseline,15900,ocrbench_ocrbench_accuracy,0.581,
|
| 155 |
+
Baseline,15900,textvqa_val_exact_match,0.59028,0.006635865524726405
|
| 156 |
+
Baseline,17100,ai2d_exact_match,0.4740932642487047,0.008987066275159845
|
| 157 |
+
Baseline,17100,average,0.4931189092163302,
|
| 158 |
+
Baseline,17100,average_rank,1.7777777777777777,
|
| 159 |
+
Baseline,17100,chartqa_relaxed_overall,0.644,0.009578219924326623
|
| 160 |
+
Baseline,17100,docvqa_val_anls,0.6847803896363295,0.005919128355709122
|
| 161 |
+
Baseline,17100,infovqa_val_anls,0.3018247984331409,0.007408081810180743
|
| 162 |
+
Baseline,17100,mme_total_score,1262.8012204881952,
|
| 163 |
+
Baseline,17100,mmmu_val_mmmu_acc,0.28444,
|
| 164 |
+
Baseline,17100,mmstar_average,0.36583282141246676,
|
| 165 |
+
Baseline,17100,ocrbench_ocrbench_accuracy,0.588,
|
| 166 |
+
Baseline,17100,textvqa_val_exact_match,0.6019800000000001,0.0065905009567234045
|
| 167 |
+
Baseline,18300,ai2d_exact_match,0.4876943005181347,0.008996428218289523
|
| 168 |
+
Baseline,18300,average,0.5004883767088391,
|
| 169 |
+
Baseline,18300,average_rank,1.5,
|
| 170 |
+
Baseline,18300,chartqa_relaxed_overall,0.652,0.00952862623294433
|
| 171 |
+
Baseline,18300,docvqa_val_anls,0.6975218894019752,0.005845051202995877
|
| 172 |
+
Baseline,18300,infovqa_val_anls,0.3185079040699619,0.007608667971660477
|
| 173 |
+
Baseline,18300,mme_total_score,1310.265706282513,
|
| 174 |
+
Baseline,18300,mmmu_val_mmmu_acc,0.29556,
|
| 175 |
+
Baseline,18300,mmstar_average,0.36108291968064027,
|
| 176 |
+
Baseline,18300,ocrbench_ocrbench_accuracy,0.588,
|
| 177 |
+
Baseline,18300,textvqa_val_exact_match,0.60354,0.006611280926348344
|
| 178 |
+
Baseline,19500,ai2d_exact_match,0.47765544041450775,0.00899016344465196
|
| 179 |
+
Baseline,19500,average,0.5040547762672563,
|
| 180 |
+
Baseline,19500,average_rank,1.4444444444444444,
|
| 181 |
+
Baseline,19500,chartqa_relaxed_overall,0.6552,0.009507962165354631
|
| 182 |
+
Baseline,19500,docvqa_val_anls,0.7041825239698998,0.005808767160221614
|
| 183 |
+
Baseline,19500,infovqa_val_anls,0.3209241432627218,0.007605560217474187
|
| 184 |
+
Baseline,19500,mme_total_score,1295.3964585834333,
|
| 185 |
+
Baseline,19500,mmmu_val_mmmu_acc,0.30333,
|
| 186 |
+
Baseline,19500,mmstar_average,0.35936610249092044,
|
| 187 |
+
Baseline,19500,ocrbench_ocrbench_accuracy,0.604,
|
| 188 |
+
Baseline,19500,textvqa_val_exact_match,0.60778,0.006595164407254131
|
| 189 |
+
Baseline,20700,ai2d_exact_match,0.49190414507772023,0.008997974381217105
|
| 190 |
+
Baseline,20700,average,0.5348651598748863,
|
| 191 |
+
Baseline,20700,average_rank,1.25,
|
| 192 |
+
Baseline,20700,chartqa_relaxed_overall,0.6472,0.009558734841217527
|
| 193 |
+
Baseline,20700,docvqa_val_anls,0.70377508713271,0.005815829966103309
|
| 194 |
+
Baseline,20700,infovqa_val_anls,0.31228879567103124,0.0074592773891107925
|
| 195 |
+
Baseline,20700,mme_total_score,1267.3561424569828,
|
| 196 |
+
Baseline,20700,mmstar_average,0.36086809124274183,
|
| 197 |
+
Baseline,20700,ocrbench_ocrbench_accuracy,0.605,
|
| 198 |
+
Baseline,20700,textvqa_val_exact_match,0.62302,0.006536647571369781
|
| 199 |
+
Baseline,21900,ai2d_exact_match,0.49125647668393785,0.008997778057794698
|
| 200 |
+
Baseline,21900,average,0.5035549318138456,
|
| 201 |
+
Baseline,21900,average_rank,1.4444444444444444,
|
| 202 |
+
Baseline,21900,chartqa_relaxed_overall,0.6556,0.009505345687488459
|
| 203 |
+
Baseline,21900,docvqa_val_anls,0.7044656227681543,0.005797355786446792
|
| 204 |
+
Baseline,21900,infovqa_val_anls,0.3214548388700204,0.007656455061893302
|
| 205 |
+
Baseline,21900,mme_total_score,1270.262104841937,
|
| 206 |
+
Baseline,21900,mmmu_val_mmmu_acc,0.28111,
|
| 207 |
+
Baseline,21900,mmstar_average,0.36167251618865237,
|
| 208 |
+
Baseline,21900,ocrbench_ocrbench_accuracy,0.597,
|
| 209 |
+
Baseline,21900,textvqa_val_exact_match,0.61588,0.006563701818052925
|
| 210 |
+
Baseline,23100,ai2d_exact_match,0.49319948186528495,0.008998321712163856
|
| 211 |
+
Baseline,23100,average,0.5385543058304301,
|
| 212 |
+
Baseline,23100,average_rank,1.5,
|
| 213 |
+
Baseline,23100,chartqa_relaxed_overall,0.6592,0.009481461028833927
|
| 214 |
+
Baseline,23100,docvqa_val_anls,0.7121972356483652,0.005769225218375019
|
| 215 |
+
Baseline,23100,infovqa_val_anls,0.31967136620122777,0.007611618366213475
|
| 216 |
+
Baseline,23100,mme_total_score,1318.2786114445778,
|
| 217 |
+
Baseline,23100,mmstar_average,0.3630320570981325,
|
| 218 |
+
Baseline,23100,ocrbench_ocrbench_accuracy,0.602,
|
| 219 |
+
Baseline,23100,textvqa_val_exact_match,0.62058,0.006524799408523169
Baseline,24300,ai2d_exact_match,0.49255181347150256,0.008998155599035915
Baseline,24300,average,0.5094308504545716,
Baseline,24300,average_rank,1.5555555555555556,
Baseline,24300,chartqa_relaxed_overall,0.6704,0.009403239035659185
Baseline,24300,docvqa_val_anls,0.7177853964151442,0.005720014481294498
Baseline,24300,infovqa_val_anls,0.31972012794378407,0.007606738233281323
Baseline,24300,mme_total_score,1306.592336934774,
Baseline,24300,mmmu_val_mmmu_acc,0.29778,
Baseline,24300,mmstar_average,0.37076946580614156,
Baseline,24300,ocrbench_ocrbench_accuracy,0.59,
Baseline,24300,textvqa_val_exact_match,0.6164400000000001,0.006543401905866729
Baseline,25500,ai2d_exact_match,0.501619170984456,0.008999106932714636
Baseline,25500,average,0.5486249165918439,
Baseline,25500,average_rank,1.625,
Baseline,25500,chartqa_relaxed_overall,0.6752,0.00936787525721462
Baseline,25500,docvqa_val_anls,0.7137288248520355,0.0057597420625403505
Baseline,25500,infovqa_val_anls,0.34135511904919924,0.0077802284678825705
Baseline,25500,mme_total_score,1323.6883753501402,
Baseline,25500,mmstar_average,0.369071301257217,
Baseline,25500,ocrbench_ocrbench_accuracy,0.619,
Baseline,25500,textvqa_val_exact_match,0.6204,0.00653548089294892
Baseline,26700,ai2d_exact_match,0.4990284974093264,0.008999137132137064
Baseline,26700,average,0.5171016246428288,
Baseline,26700,average_rank,1.4444444444444444,
Baseline,26700,chartqa_relaxed_overall,0.6712,0.009397422445513864
Baseline,26700,docvqa_val_anls,0.7233130041233962,0.005709000608468465
Baseline,26700,infovqa_val_anls,0.34093933218960265,0.007871398735359877
Baseline,26700,mme_total_score,1290.1798719487797,
Baseline,26700,mmmu_val_mmmu_acc,0.29889,
Baseline,26700,mmstar_average,0.3681821634203056,
Baseline,26700,ocrbench_ocrbench_accuracy,0.602,
Baseline,26700,textvqa_val_exact_match,0.63326,0.006491932186699375
Baseline,27900,ai2d_exact_match,0.49773316062176165,0.008999061633391479
Baseline,27900,average,0.5456332793229398,
Baseline,27900,average_rank,1.625,
Baseline,27900,chartqa_relaxed_overall,0.6756,0.009364877808842454
Baseline,27900,docvqa_val_anls,0.7132690678246167,0.00575358310740901
Baseline,27900,infovqa_val_anls,0.3362338249924974,0.007684149470716349
Baseline,27900,mme_total_score,1267.1172468987595,
Baseline,27900,mmstar_average,0.3725169018217032,
Baseline,27900,ocrbench_ocrbench_accuracy,0.599,
Baseline,27900,textvqa_val_exact_match,0.62508,0.006518059200340837
Baseline,29100,ai2d_exact_match,0.5019430051813472,0.008999086170553228
Baseline,29100,average,0.5238317316407767,
Baseline,29100,average_rank,1.0,
Baseline,29100,chartqa_relaxed_overall,0.6828,0.009309582768982347
Baseline,29100,docvqa_val_anls,0.7233823673869951,0.005705166797815572
Baseline,29100,infovqa_val_anls,0.34214735285161113,0.007759163899965965
Baseline,29100,mme_total_score,1321.8040216086433,
Baseline,29100,mmmu_val_mmmu_acc,0.31222,
Baseline,29100,mmstar_average,0.3709411277062599,
Baseline,29100,ocrbench_ocrbench_accuracy,0.622,
Baseline,29100,textvqa_val_exact_match,0.6352199999999999,0.00647159073314463
Baseline,30300,ai2d_exact_match,0.5055051813471503,0.008998608627616667
Baseline,30300,average,0.5497034826600226,
Baseline,30300,average_rank,1.375,
Baseline,30300,chartqa_relaxed_overall,0.6784,0.009343676884347384
Baseline,30300,docvqa_val_anls,0.7227075209990185,0.005720573311731873
Baseline,30300,infovqa_val_anls,0.33249900926543363,0.007751325884024483
Baseline,30300,mme_total_score,1290.3790516206482,
Baseline,30300,mmstar_average,0.36331266700855536,
Baseline,30300,ocrbench_ocrbench_accuracy,0.612,
Baseline,30300,textvqa_val_exact_match,0.6335,0.006488911402865572
Baseline,31500,ai2d_exact_match,0.4993523316062176,0.008999146569435543
Baseline,31500,average,0.5220721222554265,
Baseline,31500,average_rank,1.5555555555555556,
Baseline,31500,chartqa_relaxed_overall,0.6872,0.009274528060677767
Baseline,31500,docvqa_val_anls,0.732681296661989,0.005643494305560718
Baseline,31500,infovqa_val_anls,0.34453436089995576,0.007841367492503165
Baseline,31500,mme_total_score,1304.8996598639455,
Baseline,31500,mmmu_val_mmmu_acc,0.29444,
Baseline,31500,mmstar_average,0.37192898887525,
Baseline,31500,ocrbench_ocrbench_accuracy,0.61,
Baseline,31500,textvqa_val_exact_match,0.63644,0.006473052244580776
Baseline,32700,ai2d_exact_match,0.49870466321243523,0.00899912391990207
Baseline,32700,average,0.5546837276191249,
Baseline,32700,average_rank,1.5,
Baseline,32700,chartqa_relaxed_overall,0.68,0.009331389496316869
Baseline,32700,docvqa_val_anls,0.7278962076951819,0.005686137433507678
Baseline,32700,infovqa_val_anls,0.3359004823603636,0.007743137801806592
Baseline,32700,mme_total_score,1329.2223889555821,
Baseline,32700,mmstar_average,0.3761847400658931,
Baseline,32700,ocrbench_ocrbench_accuracy,0.626,
Baseline,32700,textvqa_val_exact_match,0.6381000000000001,0.006469625121275727
Baseline,33900,ai2d_exact_match,0.5019430051813472,0.00899908617055323
Baseline,33900,average,0.5185104134885045,
Baseline,33900,average_rank,1.5555555555555556,
Baseline,33900,chartqa_relaxed_overall,0.6784,0.009343676884347384
Baseline,33900,docvqa_val_anls,0.7328401883203162,0.005641229328683336
Baseline,33900,infovqa_val_anls,0.33727943427582574,0.0077500601420040695
Baseline,33900,mme_total_score,1330.3196278511405,
Baseline,33900,mmmu_val_mmmu_acc,0.28,
Baseline,33900,mmstar_average,0.3640006801305467,
Baseline,33900,ocrbench_ocrbench_accuracy,0.617,
Baseline,33900,textvqa_val_exact_match,0.63662,0.006467562214018388
Baseline,35100,ai2d_exact_match,0.5029145077720207,0.008999001233939133
Baseline,35100,average,0.5522905800868071,
Baseline,35100,average_rank,1.625,
Baseline,35100,chartqa_relaxed_overall,0.68,0.009331389496316869
Baseline,35100,docvqa_val_anls,0.7269648828481717,0.005683622810231662
Baseline,35100,infovqa_val_anls,0.33846207838337145,0.00774681529996113
Baseline,35100,mme_total_score,1299.1129451780712,
Baseline,35100,mmstar_average,0.36183259160408615,
Baseline,35100,ocrbench_ocrbench_accuracy,0.616,
Baseline,35100,textvqa_val_exact_match,0.63986,0.0064564830453322595
Baseline,36300,ai2d_exact_match,0.501619170984456,0.008999106932714636
Baseline,36300,average,0.5203510175588769,
Baseline,36300,average_rank,1.4444444444444444,
Baseline,36300,chartqa_relaxed_overall,0.6808,0.009325198535746702
Baseline,36300,docvqa_val_anls,0.7270212281583848,0.0056833541878296414
Baseline,36300,infovqa_val_anls,0.3340392024865933,0.007611756166885497
Baseline,36300,mme_total_score,1280.1442577030812,
Baseline,36300,mmmu_val_mmmu_acc,0.30111,
Baseline,36300,mmstar_average,0.36247853884158143,
Baseline,36300,ocrbench_ocrbench_accuracy,0.615,
Baseline,36300,textvqa_val_exact_match,0.64074,0.0064493076522863105
Baseline,37500,ai2d_exact_match,0.5074481865284974,0.008998155599035891
Baseline,37500,average,0.5599086924183005,
Baseline,37500,average_rank,1.25,
Baseline,37500,chartqa_relaxed_overall,0.69,0.009251715392027472
Baseline,37500,docvqa_val_anls,0.7338638293909314,0.005628628195159443
Baseline,37500,infovqa_val_anls,0.35075945776545553,0.007880392253956911
Baseline,37500,mme_total_score,1308.0833333333333,
Baseline,37500,mmstar_average,0.37624937324321944,
Baseline,37500,ocrbench_ocrbench_accuracy,0.622,
Baseline,37500,textvqa_val_exact_match,0.63904,0.006478670412520058
Baseline,38700,ai2d_exact_match,0.5,0.008999154119267315
Baseline,38700,average,0.5225140432328732,
Baseline,38700,average_rank,1.5555555555555556,
Baseline,38700,chartqa_relaxed_overall,0.6832,0.009306435832216308
Baseline,38700,docvqa_val_anls,0.73088808708227,0.00563114482117092
Baseline,38700,infovqa_val_anls,0.3478216232204623,0.00789714223139076
Baseline,38700,mme_total_score,1277.5526210484195,
Baseline,38700,mmmu_val_mmmu_acc,0.28667,
Baseline,38700,mmstar_average,0.3681926355602532,
Baseline,38700,ocrbench_ocrbench_accuracy,0.624,
Baseline,38700,textvqa_val_exact_match,0.6393399999999999,0.00647079957419683
Baseline,39900,ai2d_exact_match,0.5058290155440415,0.008998542562369288
Baseline,39900,average,0.5567573845010034,
Baseline,39900,average_rank,1.375,
Baseline,39900,chartqa_relaxed_overall,0.6788,0.00934061683451043
Baseline,39900,docvqa_val_anls,0.7307115103048833,0.005666517404544185
Baseline,39900,infovqa_val_anls,0.3519024541637205,0.007911172051974351
Baseline,39900,mme_total_score,1294.3033213285314,
Baseline,39900,mmstar_average,0.36969871149437833,
Baseline,39900,ocrbench_ocrbench_accuracy,0.619,
Baseline,39900,textvqa_val_exact_match,0.6413599999999999,0.006448549204074314
Internal Deduplication,300,ai2d_exact_match,0.2503238341968912,0.007796858242572104
Internal Deduplication,300,average,0.19412722789194248,
Internal Deduplication,300,average_rank,1.5555555555555556,
Internal Deduplication,300,chartqa_relaxed_overall,0.1412,0.0069659481604092775
Internal Deduplication,300,docvqa_val_anls,0.15637861297756628,0.004267695603476823
Internal Deduplication,300,infovqa_val_anls,0.1042887841127396,0.005046536381262501
Internal Deduplication,300,mme_total_score,598.6149459783913,
Internal Deduplication,300,mmmu_val_mmmu_acc,0.26556,
Internal Deduplication,300,mmstar_average,0.2694265918483427,
Internal Deduplication,300,ocrbench_ocrbench_accuracy,0.167,
Internal Deduplication,300,textvqa_val_exact_match,0.19884000000000002,0.005492264002465154
Internal Deduplication,1500,ai2d_exact_match,0.27299222797927464,0.008018190192865413
Internal Deduplication,1500,average,0.31955460499150806,
Internal Deduplication,1500,average_rank,1.7777777777777777,
Internal Deduplication,1500,chartqa_relaxed_overall,0.3708,0.00966231277258432
Internal Deduplication,1500,docvqa_val_anls,0.42768709568231533,0.006154040400291129
Internal Deduplication,1500,infovqa_val_anls,0.2099303690224102,0.00676857279363082
Internal Deduplication,1500,mme_total_score,992.9132653061225,
Internal Deduplication,1500,mmmu_val_mmmu_acc,0.26889,
Internal Deduplication,1500,mmstar_average,0.21057714724806412,
Internal Deduplication,1500,ocrbench_ocrbench_accuracy,0.404,
Internal Deduplication,1500,textvqa_val_exact_match,0.39155999999999996,0.006665511164780805
Internal Deduplication,2700,ai2d_exact_match,0.295660621761658,0.008213332656949247
Internal Deduplication,2700,average,0.36762151428382045,
Internal Deduplication,2700,average_rank,1.5555555555555556,
Internal Deduplication,2700,chartqa_relaxed_overall,0.4752,0.009989689762981844
Internal Deduplication,2700,docvqa_val_anls,0.5094800317043119,0.006254649346492251
Internal Deduplication,2700,infovqa_val_anls,0.20719401979989327,0.006520807933324386
Internal Deduplication,2700,mme_total_score,1071.3925570228091,
Internal Deduplication,2700,mmmu_val_mmmu_acc,0.27,
Internal Deduplication,2700,mmstar_average,0.2397774410047003,
Internal Deduplication,2700,ocrbench_ocrbench_accuracy,0.494,
Internal Deduplication,2700,textvqa_val_exact_match,0.44965999999999995,0.006770608917152268
Internal Deduplication,3900,ai2d_exact_match,0.35751295336787564,0.008626006165018857
Internal Deduplication,3900,average,0.40092708598125315,
Internal Deduplication,3900,average_rank,1.5555555555555556,
Internal Deduplication,3900,chartqa_relaxed_overall,0.5108,0.009999667061284322
Internal Deduplication,3900,docvqa_val_anls,0.5404721998847206,0.0062378368939630035
Internal Deduplication,3900,infovqa_val_anls,0.22349780573998537,0.006643570027298634
Internal Deduplication,3900,mme_total_score,1134.516706682673,
Internal Deduplication,3900,mmmu_val_mmmu_acc,0.29111,
Internal Deduplication,3900,mmstar_average,0.27976372885744333,
Internal Deduplication,3900,ocrbench_ocrbench_accuracy,0.51,
Internal Deduplication,3900,textvqa_val_exact_match,0.49426000000000003,0.006797576913163843
Internal Deduplication,5100,ai2d_exact_match,0.38827720207253885,0.008771623130477878
Internal Deduplication,5100,average,0.4219485735226934,
Internal Deduplication,5100,average_rank,1.7777777777777777,
Internal Deduplication,5100,chartqa_relaxed_overall,0.5236,0.009990852959439592
Internal Deduplication,5100,docvqa_val_anls,0.5747949496010799,0.006245322873999332
Internal Deduplication,5100,infovqa_val_anls,0.2283558074433608,0.006643505571541433
Internal Deduplication,5100,mme_total_score,1120.3775510204082,
Internal Deduplication,5100,mmmu_val_mmmu_acc,0.27444,
Internal Deduplication,5100,mmstar_average,0.32262062906456745,
Internal Deduplication,5100,ocrbench_ocrbench_accuracy,0.546,
Internal Deduplication,5100,textvqa_val_exact_match,0.5175,0.006791610648074506
Internal Deduplication,6300,ai2d_exact_match,0.3947538860103627,0.008797532848529212
Internal Deduplication,6300,average,0.4392913905300591,
Internal Deduplication,6300,average_rank,1.5555555555555556,
Internal Deduplication,6300,chartqa_relaxed_overall,0.554,0.009943497838271193
Internal Deduplication,6300,docvqa_val_anls,0.6054354573141266,0.006148692369883667
Internal Deduplication,6300,infovqa_val_anls,0.2479668172159887,0.006849066135124891
Internal Deduplication,6300,mme_total_score,1120.747699079632,
Internal Deduplication,6300,mmmu_val_mmmu_acc,0.28222,
Internal Deduplication,6300,mmstar_average,0.33081496369999497,
Internal Deduplication,6300,ocrbench_ocrbench_accuracy,0.562,
Internal Deduplication,6300,textvqa_val_exact_match,0.53714,0.00675218797787041
Internal Deduplication,7500,ai2d_exact_match,0.4368523316062176,0.008927095061184939
Internal Deduplication,7500,average,0.4484625925841701,
Internal Deduplication,7500,average_rank,1.6666666666666667,
Internal Deduplication,7500,chartqa_relaxed_overall,0.5716,0.009898917689756362
Internal Deduplication,7500,docvqa_val_anls,0.6158904129878224,0.006156668221029065
Internal Deduplication,7500,infovqa_val_anls,0.2491041330885082,0.006950914810318631
Internal Deduplication,7500,mme_total_score,1182.0997398959585,
Internal Deduplication,7500,mmmu_val_mmmu_acc,0.30222,
Internal Deduplication,7500,mmstar_average,0.3126938629908125,
Internal Deduplication,7500,ocrbench_ocrbench_accuracy,0.554,
Internal Deduplication,7500,textvqa_val_exact_match,0.5453399999999999,0.006743052026354684
Internal Deduplication,8700,ai2d_exact_match,0.43555699481865284,0.008924095913829722
Internal Deduplication,8700,average,0.4610890710492869,
Internal Deduplication,8700,average_rank,1.4444444444444444,
Internal Deduplication,8700,chartqa_relaxed_overall,0.5856,0.009854334029231191
Internal Deduplication,8700,docvqa_val_anls,0.6337792662388687,0.006121292484093459
Internal Deduplication,8700,infovqa_val_anls,0.3014589775424448,0.007723778532370607
Internal Deduplication,8700,mme_total_score,1146.702080832333,
Internal Deduplication,8700,mmmu_val_mmmu_acc,0.28111,
Internal Deduplication,8700,mmstar_average,0.34138732979432873,
Internal Deduplication,8700,ocrbench_ocrbench_accuracy,0.554,
Internal Deduplication,8700,textvqa_val_exact_match,0.5558200000000001,0.006722310868494742
Internal Deduplication,9900,ai2d_exact_match,0.4530440414507772,0.008959382447335284
Internal Deduplication,9900,average,0.4640919637505932,
Internal Deduplication,9900,average_rank,1.4444444444444444,
Internal Deduplication,9900,chartqa_relaxed_overall,0.596,0.009815912634917984
Internal Deduplication,9900,docvqa_val_anls,0.6449581300442709,0.006031449307242489
Internal Deduplication,9900,infovqa_val_anls,0.2651241729320676,0.007027677036596941
Internal Deduplication,9900,mme_total_score,1198.2277911164465,
Internal Deduplication,9900,mmmu_val_mmmu_acc,0.28,
Internal Deduplication,9900,mmstar_average,0.33564936557763,
Internal Deduplication,9900,ocrbench_ocrbench_accuracy,0.571,
Internal Deduplication,9900,textvqa_val_exact_match,0.5669599999999999,0.0067004067615447065
Internal Deduplication,11100,ai2d_exact_match,0.4566062176165803,0.008965198879336196
Internal Deduplication,11100,average,0.4745786301209996,
Internal Deduplication,11100,average_rank,1.3333333333333333,
Internal Deduplication,11100,chartqa_relaxed_overall,0.608,0.00976588700628918
Internal Deduplication,11100,docvqa_val_anls,0.6596743239996393,0.005996833864420919
Internal Deduplication,11100,infovqa_val_anls,0.30142039609988674,0.0075421730872732295
Internal Deduplication,11100,mme_total_score,1136.5589235694279,
Internal Deduplication,11100,mmmu_val_mmmu_acc,0.29,
Internal Deduplication,11100,mmstar_average,0.32532810325189065,
Internal Deduplication,11100,ocrbench_ocrbench_accuracy,0.586,
Internal Deduplication,11100,textvqa_val_exact_match,0.5696,0.00669753233570974
Internal Deduplication,12300,ai2d_exact_match,0.47085492227979275,0.0089838527076916
Internal Deduplication,12300,average,0.47675266119609205,
Internal Deduplication,12300,average_rank,1.4444444444444444,
Internal Deduplication,12300,chartqa_relaxed_overall,0.6024,0.009789996609470577
Internal Deduplication,12300,docvqa_val_anls,0.6541921314490913,0.0059901948837693935
Internal Deduplication,12300,infovqa_val_anls,0.26890492643687214,0.0068929334847927185
Internal Deduplication,12300,mme_total_score,1180.1697679071628,
Internal Deduplication,12300,mmmu_val_mmmu_acc,0.30111,
Internal Deduplication,12300,mmstar_average,0.3420593094029801,
Internal Deduplication,12300,ocrbench_ocrbench_accuracy,0.588,
Internal Deduplication,12300,textvqa_val_exact_match,0.5865000000000001,0.006650353031162167
Internal Deduplication,13500,ai2d_exact_match,0.4689119170984456,0.008981742470016596
Internal Deduplication,13500,average,0.477194042186954,
Internal Deduplication,13500,average_rank,1.4444444444444444,
Internal Deduplication,13500,chartqa_relaxed_overall,0.6076,0.009767653701044555
Internal Deduplication,13500,docvqa_val_anls,0.6669529256090054,0.005964340335624923
Internal Deduplication,13500,infovqa_val_anls,0.28048200541677026,0.00715533754622952
Internal Deduplication,13500,mme_total_score,1205.548119247699,
Internal Deduplication,13500,mmmu_val_mmmu_acc,0.28556,
Internal Deduplication,13500,mmstar_average,0.3358454893714108,
Internal Deduplication,13500,ocrbench_ocrbench_accuracy,0.589,
Internal Deduplication,13500,textvqa_val_exact_match,0.5832,0.006654352566675162
Internal Deduplication,14700,ai2d_exact_match,0.47733160621761656,0.008989900821900263
Internal Deduplication,14700,average,0.4884023663438535,
Internal Deduplication,14700,average_rank,1.4444444444444444,
Internal Deduplication,14700,chartqa_relaxed_overall,0.6304,0.009655859891905061
Internal Deduplication,14700,docvqa_val_anls,0.6801802838124448,0.005922660123416213
Internal Deduplication,14700,infovqa_val_anls,0.306442807638199,0.007585813874676366
Internal Deduplication,14700,mme_total_score,1141.5065026010404,
Internal Deduplication,14700,mmmu_val_mmmu_acc,0.28556,
Internal Deduplication,14700,mmstar_average,0.3313042330825678,
Internal Deduplication,14700,ocrbench_ocrbench_accuracy,0.601,
Internal Deduplication,14700,textvqa_val_exact_match,0.595,0.006618682753560443
Internal Deduplication,15900,ai2d_exact_match,0.48737046632124353,0.0089962828388782
Internal Deduplication,15900,average,0.5203517701538484,
Internal Deduplication,15900,average_rank,1.5,
Internal Deduplication,15900,chartqa_relaxed_overall,0.6268,0.009675026948726469
Internal Deduplication,15900,docvqa_val_anls,0.6832159326200654,0.005900840845629961
Internal Deduplication,15900,infovqa_val_anls,0.3152545751330662,0.007651477632904633
Internal Deduplication,15900,mme_total_score,1225.4948979591836,
Internal Deduplication,15900,mmstar_average,0.32764141700256333,
Internal Deduplication,15900,ocrbench_ocrbench_accuracy,0.603,
Internal Deduplication,15900,textvqa_val_exact_match,0.5991799999999999,0.006605224547149299
Internal Deduplication,17100,ai2d_exact_match,0.47636010362694303,0.008989090232793597
Internal Deduplication,17100,average,0.4961663419392575,
Internal Deduplication,17100,average_rank,1.2222222222222223,
Internal Deduplication,17100,chartqa_relaxed_overall,0.6464,0.009563650001989001
Internal Deduplication,17100,docvqa_val_anls,0.6927261914773173,0.005861047908265113
Internal Deduplication,17100,infovqa_val_anls,0.3154358494585615,0.00763456160506387
Internal Deduplication,17100,mme_total_score,1286.2750100040016,
Internal Deduplication,17100,mmmu_val_mmmu_acc,0.29889,
Internal Deduplication,17100,mmstar_average,0.34921859095123836,
Internal Deduplication,17100,ocrbench_ocrbench_accuracy,0.587,
Internal Deduplication,17100,textvqa_val_exact_match,0.6033,0.006602767700613255
Internal Deduplication,18300,ai2d_exact_match,0.4786269430051813,0.008990928596702264
Internal Deduplication,18300,average,0.5266473503807093,
Internal Deduplication,18300,average_rank,1.5,
Internal Deduplication,18300,chartqa_relaxed_overall,0.6552,0.009507962165354631
Internal Deduplication,18300,docvqa_val_anls,0.6989798369115747,0.00583327960847754
Internal Deduplication,18300,infovqa_val_anls,0.31662733272229215,0.00758318378302427
Internal Deduplication,18300,mme_total_score,1217.9891956782712,
Internal Deduplication,18300,mmstar_average,0.3360973400259174,
Internal Deduplication,18300,ocrbench_ocrbench_accuracy,0.595,
Internal Deduplication,18300,textvqa_val_exact_match,0.6060000000000001,0.006592108249887561
Internal Deduplication,19500,ai2d_exact_match,0.4896373056994819,0.008997221155546277
Internal Deduplication,19500,average,0.5003413312777834,
Internal Deduplication,19500,average_rank,1.5555555555555556,
Internal Deduplication,19500,chartqa_relaxed_overall,0.6508,0.009536252935404934
Internal Deduplication,19500,docvqa_val_anls,0.7013552478733074,0.005824977752328648
Internal Deduplication,19500,infovqa_val_anls,0.32620790060169225,0.007764453086996403
Internal Deduplication,19500,mme_total_score,1299.4400760304122,
Internal Deduplication,19500,mmmu_val_mmmu_acc,0.29556,
Internal Deduplication,19500,mmstar_average,0.3368301960477849,
Internal Deduplication,19500,ocrbench_ocrbench_accuracy,0.593,
Internal Deduplication,19500,textvqa_val_exact_match,0.60934,0.006559905437723197
Internal Deduplication,20700,ai2d_exact_match,0.4889896373056995,0.008996971954224612
Internal Deduplication,20700,average,0.5296276786578733,
Internal Deduplication,20700,average_rank,1.75,
Internal Deduplication,20700,chartqa_relaxed_overall,0.6444,0.009575809858898698
Internal Deduplication,20700,docvqa_val_anls,0.6989112987356239,0.00585808944665685
Internal Deduplication,20700,infovqa_val_anls,0.3158264619814475,0.007568423570507376
Internal Deduplication,20700,mme_total_score,1174.7768107242898,
Internal Deduplication,20700,mmstar_average,0.33400635258234235,
Internal Deduplication,20700,ocrbench_ocrbench_accuracy,0.614,
Internal Deduplication,20700,textvqa_val_exact_match,0.6112599999999999,0.0065589363778955695
Internal Deduplication,21900,ai2d_exact_match,0.4957901554404145,0.008998835133354702
Internal Deduplication,21900,average,0.5035083877228906,
Internal Deduplication,21900,average_rank,1.5555555555555556,
Internal Deduplication,21900,chartqa_relaxed_overall,0.64,0.009601920576192066
Internal Deduplication,21900,docvqa_val_anls,0.7037412472922321,0.005813532329025727
Internal Deduplication,21900,infovqa_val_anls,0.3194560697014221,0.007649647661031666
Internal Deduplication,21900,mme_total_score,1199.6734693877552,
Internal Deduplication,21900,mmmu_val_mmmu_acc,0.30889,
Internal Deduplication,21900,mmstar_average,0.33692962934905674,
Internal Deduplication,21900,ocrbench_ocrbench_accuracy,0.603,
Internal Deduplication,21900,textvqa_val_exact_match,0.6202599999999999,0.006539392877923941
Internal Deduplication,23100,ai2d_exact_match,0.4944948186528497,0.008998608627616672
Internal Deduplication,23100,average,0.5413853458503779,
Internal Deduplication,23100,average_rank,1.5,
Internal Deduplication,23100,chartqa_relaxed_overall,0.646,0.009566096595876119
Internal Deduplication,23100,docvqa_val_anls,0.7101587999220607,0.005806193919644477
Internal Deduplication,23100,infovqa_val_anls,0.336754873549068,0.007886540099947482
Internal Deduplication,23100,mme_total_score,1316.6187474989997,
Internal Deduplication,23100,mmstar_average,0.3476289288286667,
Internal Deduplication,23100,ocrbench_ocrbench_accuracy,0.627,
Internal Deduplication,23100,textvqa_val_exact_match,0.62766,0.006520482207447814
Internal Deduplication,24300,ai2d_exact_match,0.4899611398963731,0.008997340090107673
Internal Deduplication,24300,average,0.5100750686661266,
Internal Deduplication,24300,average_rank,1.4444444444444444,
Internal Deduplication,24300,chartqa_relaxed_overall,0.6516,0.009531175862679805
Internal Deduplication,24300,docvqa_val_anls,0.7179021844889384,0.005742973360829408
Internal Deduplication,24300,infovqa_val_anls,0.3358758923979091,0.007878017215252312
Internal Deduplication,24300,mme_total_score,1409.844237695078,
Internal Deduplication,24300,mmmu_val_mmmu_acc,0.28556,
Internal Deduplication,24300,mmstar_average,0.3347613325457924,
Internal Deduplication,24300,ocrbench_ocrbench_accuracy,0.634,
Internal Deduplication,24300,textvqa_val_exact_match,0.63094,0.006498229657201687
Internal Deduplication,25500,ai2d_exact_match,0.48607512953367876,0.008995663534025174
Internal Deduplication,25500,average,0.5472398215745332,
Internal Deduplication,25500,average_rank,1.375,
Internal Deduplication,25500,chartqa_relaxed_overall,0.6536,0.0095183536193109
Internal Deduplication,25500,docvqa_val_anls,0.7180940785000507,0.005735169057784404
Internal Deduplication,25500,infovqa_val_anls,0.35632636677863483,0.008180298439903802
Internal Deduplication,25500,mme_total_score,1376.716986794718,
Internal Deduplication,25500,mmstar_average,0.3529231762093682,
Internal Deduplication,25500,ocrbench_ocrbench_accuracy,0.633,
Internal Deduplication,25500,textvqa_val_exact_match,0.63066,0.006504156647155582
Internal Deduplication,26700,ai2d_exact_match,0.49255181347150256,0.008998155599035912
Internal Deduplication,26700,average,0.516487110189266,
Internal Deduplication,26700,average_rank,1.5555555555555556,
Internal Deduplication,26700,chartqa_relaxed_overall,0.6644,0.009445885130487209
Internal Deduplication,26700,docvqa_val_anls,0.7168133343849862,0.005756579734549226
Internal Deduplication,26700,infovqa_val_anls,0.34371436472133005,0.008017561696940439
Internal Deduplication,26700,mme_total_score,1409.4487795118048,
Internal Deduplication,26700,mmmu_val_mmmu_acc,0.30222,
Internal Deduplication,26700,mmstar_average,0.35023736893630925,
Internal Deduplication,26700,ocrbench_ocrbench_accuracy,0.63,
Internal Deduplication,26700,textvqa_val_exact_match,0.6319600000000001,0.006495302107669356
Internal Deduplication,27900,ai2d_exact_match,0.4954663212435233,0.008998784170060767
Internal Deduplication,27900,average,0.5488694312151498,
Internal Deduplication,27900,average_rank,1.375,
Internal Deduplication,27900,chartqa_relaxed_overall,0.6736,0.009379787213112317
Internal Deduplication,27900,docvqa_val_anls,0.7224633461958828,0.005716176978314635
Internal Deduplication,27900,infovqa_val_anls,0.35413809221269893,0.00811649922857756
Internal Deduplication,27900,mme_total_score,1365.8970588235293,
Internal Deduplication,27900,mmstar_average,0.33847825885394267,
Internal Deduplication,27900,ocrbench_ocrbench_accuracy,0.623,
Internal Deduplication,27900,textvqa_val_exact_match,0.6349400000000001,0.006474057612069333
Internal Deduplication,29100,ai2d_exact_match,0.4957901554404145,0.008998835133354704
Internal Deduplication,29100,average,0.5113797484193323,
|
| 627 |
+
Internal Deduplication,29100,average_rank,2.0,
|
| 628 |
+
Internal Deduplication,29100,chartqa_relaxed_overall,0.6604,0.009473364442136777
|
| 629 |
+
Internal Deduplication,29100,docvqa_val_anls,0.716657704725735,0.005756925555640175
|
| 630 |
+
Internal Deduplication,29100,infovqa_val_anls,0.3372271343716428,0.007828634509891694
|
| 631 |
+
Internal Deduplication,29100,mme_total_score,1300.1049419767908,
|
| 632 |
+
Internal Deduplication,29100,mmmu_val_mmmu_acc,0.29556,
|
| 633 |
+
Internal Deduplication,29100,mmstar_average,0.33882299281686595,
|
| 634 |
+
Internal Deduplication,29100,ocrbench_ocrbench_accuracy,0.613,
|
| 635 |
+
Internal Deduplication,29100,textvqa_val_exact_match,0.6335799999999999,0.006486361946288509
|
| 636 |
+
Internal Deduplication,30300,ai2d_exact_match,0.49676165803108807,0.008998965371572352
|
| 637 |
+
Internal Deduplication,30300,average,0.5468368131516261,
|
| 638 |
+
Internal Deduplication,30300,average_rank,1.625,
|
| 639 |
+
Internal Deduplication,30300,chartqa_relaxed_overall,0.6608,0.009470650520873179
|
| 640 |
+
Internal Deduplication,30300,docvqa_val_anls,0.7208981382284003,0.005745692168242118
|
| 641 |
+
Internal Deduplication,30300,infovqa_val_anls,0.33146012551516996,0.007795838114372819
|
| 642 |
+
Internal Deduplication,30300,mme_total_score,1330.1678671468587,
|
| 643 |
+
Internal Deduplication,30300,mmstar_average,0.35709777028672485,
|
| 644 |
+
Internal Deduplication,30300,ocrbench_ocrbench_accuracy,0.622,
|
| 645 |
+
Internal Deduplication,30300,textvqa_val_exact_match,0.6388400000000001,0.006462092742178937
|
| 646 |
+
Internal Deduplication,31500,ai2d_exact_match,0.4996761658031088,0.008999152231809677
Internal Deduplication,31500,average,0.5161255997108974,
Internal Deduplication,31500,average_rank,1.4444444444444444,
Internal Deduplication,31500,chartqa_relaxed_overall,0.6624,0.009459719367730022
Internal Deduplication,31500,docvqa_val_anls,0.7248827916963386,0.005715267948257416
Internal Deduplication,31500,infovqa_val_anls,0.3462785194206036,0.007940616340604684
Internal Deduplication,31500,mme_total_score,1388.7246898759504,
Internal Deduplication,31500,mmmu_val_mmmu_acc,0.28556,
Internal Deduplication,31500,mmstar_average,0.34634732076712815,
Internal Deduplication,31500,ocrbench_ocrbench_accuracy,0.622,
Internal Deduplication,31500,textvqa_val_exact_match,0.64186,0.006449237676913657
Internal Deduplication,32700,ai2d_exact_match,0.4957901554404145,0.008998835133354704
Internal Deduplication,32700,average,0.5500475012134611,
Internal Deduplication,32700,average_rank,1.5,
Internal Deduplication,32700,chartqa_relaxed_overall,0.6688,0.009414779829167153
Internal Deduplication,32700,docvqa_val_anls,0.7263156273407247,0.00570514646941267
Internal Deduplication,32700,infovqa_val_anls,0.3489756877198793,0.00798640336179305
Internal Deduplication,32700,mme_total_score,1362.764905962385,
Internal Deduplication,32700,mmstar_average,0.3385910379932094,
Internal Deduplication,32700,ocrbench_ocrbench_accuracy,0.63,
Internal Deduplication,32700,textvqa_val_exact_match,0.64186,0.006452586710386076
Internal Deduplication,33900,ai2d_exact_match,0.4957901554404145,0.008998835133354704
Internal Deduplication,33900,average,0.5160312203077811,
Internal Deduplication,33900,average_rank,1.4444444444444444,
Internal Deduplication,33900,chartqa_relaxed_overall,0.674,0.009376820884924869
Internal Deduplication,33900,docvqa_val_anls,0.7257174511919398,0.005702388110070895
Internal Deduplication,33900,infovqa_val_anls,0.3422539948680319,0.007936425119162906
Internal Deduplication,33900,mme_total_score,1389.4628851540615,
Internal Deduplication,33900,mmmu_val_mmmu_acc,0.28444,
Internal Deduplication,33900,mmstar_average,0.34272816096186326,
Internal Deduplication,33900,ocrbench_ocrbench_accuracy,0.619,
Internal Deduplication,33900,textvqa_val_exact_match,0.64432,0.0064359794815068575
Internal Deduplication,35100,ai2d_exact_match,0.49838082901554404,0.008999106932714645
Internal Deduplication,35100,average,0.5533101842015907,
Internal Deduplication,35100,average_rank,1.375,
Internal Deduplication,35100,chartqa_relaxed_overall,0.6736,0.009379787213112317
Internal Deduplication,35100,docvqa_val_anls,0.7278181728761878,0.005688301164010059
Internal Deduplication,35100,infovqa_val_anls,0.351201318391893,0.008119188634171728
Internal Deduplication,35100,mme_total_score,1411.3839535814327,
Internal Deduplication,35100,mmstar_average,0.34205096912751043,
Internal Deduplication,35100,ocrbench_ocrbench_accuracy,0.634,
Internal Deduplication,35100,textvqa_val_exact_match,0.64612,0.006431209933771596
Internal Deduplication,36300,ai2d_exact_match,0.49805699481865284,0.00899908617055324
Internal Deduplication,36300,average,0.5195231205481649,
Internal Deduplication,36300,average_rank,1.5555555555555556,
Internal Deduplication,36300,chartqa_relaxed_overall,0.672,0.009391574983583366
Internal Deduplication,36300,docvqa_val_anls,0.730916270863908,0.005660120362847363
Internal Deduplication,36300,infovqa_val_anls,0.3412406587672079,0.007911958522422949
Internal Deduplication,36300,mme_total_score,1367.637254901961,
Internal Deduplication,36300,mmmu_val_mmmu_acc,0.29444,
Internal Deduplication,36300,mmstar_average,0.34529103993555027,
Internal Deduplication,36300,ocrbench_ocrbench_accuracy,0.634,
Internal Deduplication,36300,textvqa_val_exact_match,0.6402399999999999,0.006461617365628822
Internal Deduplication,37500,ai2d_exact_match,0.5019430051813472,0.008999086170553233
Internal Deduplication,37500,average,0.5495836143474903,
Internal Deduplication,37500,average_rank,1.75,
Internal Deduplication,37500,chartqa_relaxed_overall,0.6756,0.009364877808842454
Internal Deduplication,37500,docvqa_val_anls,0.7255309514873474,0.005687086085909167
Internal Deduplication,37500,infovqa_val_anls,0.3366534174444908,0.007850461211973954
Internal Deduplication,37500,mme_total_score,1364.8713485394157,
Internal Deduplication,37500,mmstar_average,0.3467179263192468,
Internal Deduplication,37500,ocrbench_ocrbench_accuracy,0.618,
Internal Deduplication,37500,textvqa_val_exact_match,0.64264,0.0064540760066348676
Internal Deduplication,38700,ai2d_exact_match,0.49708549222797926,0.008999001233939138
Internal Deduplication,38700,average,0.5196671356527304,
Internal Deduplication,38700,average_rank,1.4444444444444444,
Internal Deduplication,38700,chartqa_relaxed_overall,0.6744,0.009373846787815587
Internal Deduplication,38700,docvqa_val_anls,0.732080533728902,0.0056514543481841085
Internal Deduplication,38700,infovqa_val_anls,0.34326469229313616,0.0079487702679686
Internal Deduplication,38700,mme_total_score,1366.760604241697,
Internal Deduplication,38700,mmmu_val_mmmu_acc,0.28778,
Internal Deduplication,38700,mmstar_average,0.34458636697182526,
Internal Deduplication,38700,ocrbench_ocrbench_accuracy,0.632,
Internal Deduplication,38700,textvqa_val_exact_match,0.6461399999999999,0.00642093963319658
Internal Deduplication,39900,ai2d_exact_match,0.4957901554404145,0.008998835133354702
Internal Deduplication,39900,average,0.5516529838475074,
Internal Deduplication,39900,average_rank,1.625,
Internal Deduplication,39900,chartqa_relaxed_overall,0.6696,0.009409024811273465
Internal Deduplication,39900,docvqa_val_anls,0.723701988394961,0.005721818793341698
Internal Deduplication,39900,infovqa_val_anls,0.3483904533235705,0.007951328084102772
Internal Deduplication,39900,mme_total_score,1403.717386954782,
Internal Deduplication,39900,mmstar_average,0.34950828977360593,
Internal Deduplication,39900,ocrbench_ocrbench_accuracy,0.629,
Internal Deduplication,39900,textvqa_val_exact_match,0.64558,0.006428340177019748
app/src/content/assets/data/mnist-variant-model.json
DELETED
@@ -1,3 +0,0 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7dca86e85be46c1fca6a4e2503786e88e3f8d4609fb7284c8a1479620a5827da
size 4315
app/src/content/assets/data/relevance_filters.csv
ADDED
@@ -0,0 +1,1201 @@
run,step,metric,value,stderr
Baseline,1000,ai2d_exact_match,0.2548575129533679,0.007843322436924496
Baseline,1000,average,0.27120689295763617,
Baseline,1000,average_rank,3.1,
Baseline,1000,chartqa_relaxed_overall,0.3308,0.009411906161401973
Baseline,1000,docvqa_val_anls,0.3528553494243383,0.005852289239342309
Baseline,1000,infovqa_val_anls,0.17320578642581314,0.006297063452679795
Baseline,1000,mme_total_score,977.4280712284914,
Baseline,1000,mmmu_val_mmmu_acc,0.25222,
Baseline,1000,mmstar_average,0.23215874078908072,
Baseline,1000,ocrbench_ocrbench_accuracy,0.286,
Baseline,1000,seedbench_seed_all,0.2563646470261256,
Baseline,1000,textvqa_val_exact_match,0.3024,0.00628900296642181
Baseline,2000,ai2d_exact_match,0.26295336787564766,0.007923526907377255
Baseline,2000,average,0.3202068275596269,
Baseline,2000,average_rank,2.9,
Baseline,2000,chartqa_relaxed_overall,0.4688,0.009982508912777261
Baseline,2000,docvqa_val_anls,0.4452261510942785,0.00614755494712251
Baseline,2000,infovqa_val_anls,0.1820547866557169,0.006217861455795791
Baseline,2000,mme_total_score,1049.3036214485794,
Baseline,2000,mmmu_val_mmmu_acc,0.24556,
Baseline,2000,mmstar_average,0.21305462434540698,
Baseline,2000,ocrbench_ocrbench_accuracy,0.395,
Baseline,2000,seedbench_seed_all,0.258532518065592,
Baseline,2000,textvqa_val_exact_match,0.41068000000000005,0.006697862330024289
Baseline,3000,ai2d_exact_match,0.25226683937823835,0.007816909588794397
Baseline,3000,average,0.3507423834414229,
Baseline,3000,average_rank,2.7,
Baseline,3000,chartqa_relaxed_overall,0.5028,0.010001843767601082
Baseline,3000,docvqa_val_anls,0.502653993831009,0.006267072346683124
Baseline,3000,infovqa_val_anls,0.21728617578189535,0.006796941784959762
Baseline,3000,mme_total_score,1170.2383953581434,
Baseline,3000,mmmu_val_mmmu_acc,0.27556,
Baseline,3000,mmstar_average,0.25432376938577683,
Baseline,3000,ocrbench_ocrbench_accuracy,0.436,
Baseline,3000,seedbench_seed_all,0.2792106725958866,
Baseline,3000,textvqa_val_exact_match,0.43658,0.006766885462882726
Baseline,4000,ai2d_exact_match,0.2645725388601036,0.007939149662089447
Baseline,4000,average,0.36961781722974835,
Baseline,4000,average_rank,3.7,
Baseline,4000,chartqa_relaxed_overall,0.5312,0.009982508912777261
Baseline,4000,docvqa_val_anls,0.5374434618615119,0.0062905728113059655
Baseline,4000,infovqa_val_anls,0.2287924838861707,0.006994568698639919
Baseline,4000,mme_total_score,1155.203781512605,
Baseline,4000,mmmu_val_mmmu_acc,0.25556,
Baseline,4000,mmstar_average,0.2575590188757354,
Baseline,4000,ocrbench_ocrbench_accuracy,0.453,
Baseline,4000,seedbench_seed_all,0.33913285158421347,
Baseline,4000,textvqa_val_exact_match,0.4593,0.006791695475025738
Baseline,5000,ai2d_exact_match,0.3125,0.008342439145556371
Baseline,5000,average,0.3974627910380972,
Baseline,5000,average_rank,3.3,
Baseline,5000,chartqa_relaxed_overall,0.5488,0.00995424828018316
Baseline,5000,docvqa_val_anls,0.552360266782429,0.006300308519952055
Baseline,5000,infovqa_val_anls,0.23425555286643698,0.007002254622066442
Baseline,5000,mme_total_score,1181.4653861544618,
Baseline,5000,mmmu_val_mmmu_acc,0.26667,
Baseline,5000,mmstar_average,0.29596648146165705,
Baseline,5000,ocrbench_ocrbench_accuracy,0.462,
Baseline,5000,seedbench_seed_all,0.43107281823235133,
Baseline,5000,textvqa_val_exact_match,0.47354000000000007,0.0068172185364497985
Baseline,6000,ai2d_exact_match,0.358160621761658,0.008629463221867162
Baseline,6000,average,0.4161227404571003,
Baseline,6000,average_rank,2.6,
Baseline,6000,chartqa_relaxed_overall,0.5628,0.00992279440175477
Baseline,6000,docvqa_val_anls,0.5747451497228876,0.00625495440870239
Baseline,6000,infovqa_val_anls,0.22152017368968838,0.006604546680525351
Baseline,6000,mme_total_score,1284.1648659463785,
Baseline,6000,mmmu_val_mmmu_acc,0.27111,
Baseline,6000,mmstar_average,0.2978489412854164,
Baseline,6000,ocrbench_ocrbench_accuracy,0.495,
Baseline,6000,seedbench_seed_all,0.4795997776542524,
Baseline,6000,textvqa_val_exact_match,0.48432,0.006800535050670284
Baseline,7000,ai2d_exact_match,0.3707901554404145,0.00869347755587734
Baseline,7000,average,0.4291083177345374,
Baseline,7000,average_rank,2.9,
Baseline,7000,chartqa_relaxed_overall,0.5656,0.009915542506251351
Baseline,7000,docvqa_val_anls,0.5940907049431567,0.006224236305767187
Baseline,7000,infovqa_val_anls,0.2515675215816963,0.007105097396092786
Baseline,7000,mme_total_score,1185.875650260104,
Baseline,7000,mmmu_val_mmmu_acc,0.26556,
Baseline,7000,mmstar_average,0.31372400960777047,
Baseline,7000,ocrbench_ocrbench_accuracy,0.504,
Baseline,7000,seedbench_seed_all,0.4964424680377988,
Baseline,7000,textvqa_val_exact_match,0.5002,0.006794794025220267
Baseline,8000,ai2d_exact_match,0.37759067357512954,0.008725299846043883
Baseline,8000,average,0.43846759477995995,
Baseline,8000,average_rank,3.2,
Baseline,8000,chartqa_relaxed_overall,0.5832,0.009862556058385773
Baseline,8000,docvqa_val_anls,0.6017336419437208,0.006231612198089698
Baseline,8000,infovqa_val_anls,0.2449256624147254,0.006992518502948913
Baseline,8000,mme_total_score,1199.2409963985594,
Baseline,8000,mmmu_val_mmmu_acc,0.28111,
Baseline,8000,mmstar_average,0.33512257186205047,
Baseline,8000,ocrbench_ocrbench_accuracy,0.51,
Baseline,8000,seedbench_seed_all,0.5024458032240133,
Baseline,8000,textvqa_val_exact_match,0.51008,0.006796301690135059
Baseline,9000,ai2d_exact_match,0.4067357512953368,0.008841214921078996
Baseline,9000,average,0.4422510732201056,
Baseline,9000,average_rank,3.2,
Baseline,9000,chartqa_relaxed_overall,0.5912,0.009834211136815875
Baseline,9000,docvqa_val_anls,0.6170968481662739,0.00617235763542544
Baseline,9000,infovqa_val_anls,0.23537031288570615,0.00670318154156447
Baseline,9000,mme_total_score,1231.5195078031213,
Baseline,9000,mmmu_val_mmmu_acc,0.25889,
Baseline,9000,mmstar_average,0.3216444898242951,
Baseline,9000,ocrbench_ocrbench_accuracy,0.515,
Baseline,9000,seedbench_seed_all,0.5120622568093385,
Baseline,9000,textvqa_val_exact_match,0.52226,0.006792711289708482
Baseline,10000,ai2d_exact_match,0.39993523316062174,0.008817096257082848
Baseline,10000,average,0.4523875703250908,
Baseline,10000,average_rank,2.9,
Baseline,10000,chartqa_relaxed_overall,0.5996,0.00980154906867574
Baseline,10000,docvqa_val_anls,0.6262613496433054,0.006147756371688175
Baseline,10000,infovqa_val_anls,0.263290074230132,0.007186788766942786
Baseline,10000,mme_total_score,1240.8218287314926,
Baseline,10000,mmmu_val_mmmu_acc,0.28778,
Baseline,10000,mmstar_average,0.32972717906018517,
Baseline,10000,ocrbench_ocrbench_accuracy,0.517,
Baseline,10000,seedbench_seed_all,0.5217342968315731,
Baseline,10000,textvqa_val_exact_match,0.5261600000000001,0.006785774843600811
Baseline,11000,ai2d_exact_match,0.422279792746114,0.008889771831066474
Baseline,11000,average,0.4561398159525099,
Baseline,11000,average_rank,3.0,
Baseline,11000,chartqa_relaxed_overall,0.6104,0.009755142291143075
Baseline,11000,docvqa_val_anls,0.6373130149166712,0.006128022584995044
Baseline,11000,infovqa_val_anls,0.24419378339723755,0.006897644885887063
Baseline,11000,mme_total_score,1322.9488795518205,
Baseline,11000,mmmu_val_mmmu_acc,0.27778,
Baseline,11000,mmstar_average,0.3298563439522548,
Baseline,11000,ocrbench_ocrbench_accuracy,0.521,
Baseline,11000,seedbench_seed_all,0.5237354085603113,
Baseline,11000,textvqa_val_exact_match,0.5387,0.006770851562852138
Baseline,12000,ai2d_exact_match,0.42001295336787564,0.008883255931688034
Baseline,12000,average,0.4582751140055433,
Baseline,12000,average_rank,3.5,
Baseline,12000,chartqa_relaxed_overall,0.618,0.009719474639861454
Baseline,12000,docvqa_val_anls,0.6393961983751871,0.0061228747388476674
Baseline,12000,infovqa_val_anls,0.24798874058574302,0.006855374548993139
Baseline,12000,mme_total_score,1225.6453581432572,
Baseline,12000,mmmu_val_mmmu_acc,0.27889,
Baseline,12000,mmstar_average,0.34010867846816534,
Baseline,12000,ocrbench_ocrbench_accuracy,0.512,
Baseline,12000,seedbench_seed_all,0.5350194552529183,
Baseline,12000,textvqa_val_exact_match,0.5330600000000001,0.006777713092109446
Baseline,13000,ai2d_exact_match,0.4375,0.008928571428571428
Baseline,13000,average,0.4692868662590049,
Baseline,13000,average_rank,2.7,
Baseline,13000,chartqa_relaxed_overall,0.6148,0.00973479791861169
Baseline,13000,docvqa_val_anls,0.6511374872549951,0.006086953065248391
Baseline,13000,infovqa_val_anls,0.24465055100441893,0.006808432538374664
Baseline,13000,mme_total_score,1281.7122849139657,
Baseline,13000,mmmu_val_mmmu_acc,0.28222,
Baseline,13000,mmstar_average,0.3453069542917521,
Baseline,13000,ocrbench_ocrbench_accuracy,0.549,
Baseline,13000,seedbench_seed_all,0.5442468037798777,
Baseline,13000,textvqa_val_exact_match,0.55472,0.0067416788982325
Baseline,14000,ai2d_exact_match,0.4572538860103627,0.00896620675297095
Baseline,14000,average,0.47352486841689195,
Baseline,14000,average_rank,2.5,
Baseline,14000,chartqa_relaxed_overall,0.6172,0.009723347231923635
Baseline,14000,docvqa_val_anls,0.6502269393708169,0.006057950730638126
Baseline,14000,infovqa_val_anls,0.25805460837190913,0.007037735231659539
Baseline,14000,mme_total_score,1309.1444577831132,
Baseline,14000,mmmu_val_mmmu_acc,0.28111,
Baseline,14000,mmstar_average,0.34575818188776586,
Baseline,14000,ocrbench_ocrbench_accuracy,0.551,
Baseline,14000,seedbench_seed_all,0.5483602001111729,
Baseline,14000,textvqa_val_exact_match,0.55276,0.006751206724612103
Baseline,15000,ai2d_exact_match,0.45045336787564766,0.008954861634252399
Baseline,15000,average,0.47878665012878824,
Baseline,15000,average_rank,2.6,
Baseline,15000,chartqa_relaxed_overall,0.612,0.009747841205275417
Baseline,15000,docvqa_val_anls,0.6621413031955148,0.006056838050222495
Baseline,15000,infovqa_val_anls,0.2706898598157733,0.007200315730154543
Baseline,15000,mme_total_score,1384.2171868747498,
Baseline,15000,mmmu_val_mmmu_acc,0.30222,
Baseline,15000,mmstar_average,0.35408135695920684,
Baseline,15000,ocrbench_ocrbench_accuracy,0.558,
Baseline,15000,seedbench_seed_all,0.5411339633129516,
Baseline,15000,textvqa_val_exact_match,0.5583600000000001,0.0067279027203879065
Baseline,16000,ai2d_exact_match,0.45077720207253885,0.008955440137395838
Baseline,16000,average,0.47665128022935843,
Baseline,16000,average_rank,3.0,
Baseline,16000,chartqa_relaxed_overall,0.632,0.00964715642305132
Baseline,16000,docvqa_val_anls,0.6709415729142987,0.005999818105621502
Baseline,16000,infovqa_val_anls,0.26050032542402035,0.006997451875879188
Baseline,16000,mme_total_score,1317.8491396558625,
Baseline,16000,mmmu_val_mmmu_acc,0.27556,
Baseline,16000,mmstar_average,0.33214333327093315,
Baseline,16000,ocrbench_ocrbench_accuracy,0.56,
Baseline,16000,seedbench_seed_all,0.5463590883824346,
Baseline,16000,textvqa_val_exact_match,0.56158,0.006723854754867398
Baseline,17000,ai2d_exact_match,0.45919689119170987,0.008969138793675545
Baseline,17000,average,0.4777141780162423,
Baseline,17000,average_rank,2.5,
Baseline,17000,chartqa_relaxed_overall,0.632,0.00964715642305132
Baseline,17000,docvqa_val_anls,0.6796338519136422,0.005948761388267941
Baseline,17000,infovqa_val_anls,0.28070956072505215,0.007298333094144192
Baseline,17000,mme_total_score,1381.9161664665867,
Baseline,17000,mmmu_val_mmmu_acc,0.27667,
Baseline,17000,mmstar_average,0.3370289492329521,
Baseline,17000,ocrbench_ocrbench_accuracy,0.519,
Baseline,17000,seedbench_seed_all,0.5510283490828238,
Baseline,17000,textvqa_val_exact_match,0.56416,0.006724830373229479
Baseline,18000,ai2d_exact_match,0.46567357512953367,0.008977921602780726
Baseline,18000,average,0.4819834595278701,
Baseline,18000,average_rank,2.9,
Baseline,18000,chartqa_relaxed_overall,0.6376,0.009615793331418735
Baseline,18000,docvqa_val_anls,0.6775884603912571,0.005972234236435759
Baseline,18000,infovqa_val_anls,0.27154318420389256,0.007164903131667027
Baseline,18000,mme_total_score,1336.922769107643,
Baseline,18000,mmmu_val_mmmu_acc,0.28667,
Baseline,18000,mmstar_average,0.34482796716566916,
Baseline,18000,ocrbench_ocrbench_accuracy,0.533,
Baseline,18000,seedbench_seed_all,0.5543079488604781,
Baseline,18000,textvqa_val_exact_match,0.5666399999999999,0.006713392287599574
Baseline,19000,ai2d_exact_match,0.4682642487046632,0.008981008686994101
Baseline,19000,average,0.4899006713916878,
Baseline,19000,average_rank,2.7,
Baseline,19000,chartqa_relaxed_overall,0.6444,0.009575809858898698
Baseline,19000,docvqa_val_anls,0.678226526479947,0.005970619221588814
Baseline,19000,infovqa_val_anls,0.26993847247278,0.0071348470764911525
Baseline,19000,mme_total_score,1406.6628651460583,
Baseline,19000,mmmu_val_mmmu_acc,0.28333,
Baseline,19000,mmstar_average,0.356220913822775,
Baseline,19000,ocrbench_ocrbench_accuracy,0.577,
Baseline,19000,seedbench_seed_all,0.554585881045025,
Baseline,19000,textvqa_val_exact_match,0.57714,0.0066918487914812905
Baseline,20000,ai2d_exact_match,0.47571243523316065,0.00898853090258662
Baseline,20000,average,0.4873169067639118,
Baseline,20000,average_rank,3.1,
Baseline,20000,chartqa_relaxed_overall,0.6336,0.009638338810708618
Baseline,20000,docvqa_val_anls,0.6895214454380043,0.005896462073053767
Baseline,20000,infovqa_val_anls,0.2655657550458317,0.007033265532032538
Baseline,20000,mme_total_score,1324.6738695478193,
Baseline,20000,mmmu_val_mmmu_acc,0.30111,
Baseline,20000,mmstar_average,0.33806766134497995,
Baseline,20000,ocrbench_ocrbench_accuracy,0.555,
Baseline,20000,seedbench_seed_all,0.5587548638132296,
Baseline,20000,textvqa_val_exact_match,0.56852,0.006720151338087659
≥2,1000,ai2d_exact_match,0.2645725388601036,0.007939149662089442
|
| 243 |
+
≥2,1000,average,0.2722931646460497,
|
| 244 |
+
≥2,1000,average_rank,3.2,
|
| 245 |
+
≥2,1000,chartqa_relaxed_overall,0.3664,0.009638338810708616
|
| 246 |
+
≥2,1000,docvqa_val_anls,0.35825461497275807,0.005864292098743202
|
| 247 |
+
≥2,1000,infovqa_val_anls,0.16722293767954274,0.0061333612650745235
|
| 248 |
+
≥2,1000,mme_total_score,994.9906962785115,
|
| 249 |
+
≥2,1000,mmmu_val_mmmu_acc,0.25111,
|
| 250 |
+
≥2,1000,mmstar_average,0.2099224814637991,
|
| 251 |
+
≥2,1000,ocrbench_ocrbench_accuracy,0.304,
|
| 252 |
+
≥2,1000,seedbench_seed_all,0.24463590883824346,
|
| 253 |
+
≥2,1000,textvqa_val_exact_match,0.28452000000000005,0.006179555914647949
|
| 254 |
+
≥2,2000,ai2d_exact_match,0.2648963730569948,0.007942257693619753
|
| 255 |
+
≥2,2000,average,0.3161289250086133,
|
| 256 |
+
≥2,2000,average_rank,3.2,
|
| 257 |
+
≥2,2000,chartqa_relaxed_overall,0.4476,0.00994692276581072
|
| 258 |
+
≥2,2000,docvqa_val_anls,0.44553207035528164,0.006176982458046509
|
| 259 |
+
≥2,2000,infovqa_val_anls,0.19690312157526974,0.00648399793536667
|
| 260 |
+
≥2,2000,mme_total_score,1054.2768107242898,
|
| 261 |
+
≥2,2000,mmmu_val_mmmu_acc,0.24778,
|
| 262 |
+
≥2,2000,mmstar_average,0.20779488571532076,
|
| 263 |
+
≥2,2000,ocrbench_ocrbench_accuracy,0.383,
|
| 264 |
+
≥2,2000,seedbench_seed_all,0.2529738743746526,
|
| 265 |
+
≥2,2000,textvqa_val_exact_match,0.39868,0.006677826756815335
|
| 266 |
+
≥2,3000,ai2d_exact_match,0.2697538860103627,0.007988222765138163
|
| 267 |
+
≥2,3000,average,0.34461871112110076,
|
| 268 |
+
≥2,3000,average_rank,3.7,
|
| 269 |
+
≥2,3000,chartqa_relaxed_overall,0.502,0.010001920583875201
|
| 270 |
+
≥2,3000,docvqa_val_anls,0.4943706505276063,0.006276617082627261
|
| 271 |
+
≥2,3000,infovqa_val_anls,0.21287605644341218,0.006682253709215569
|
| 272 |
+
≥2,3000,mme_total_score,1162.2701080432173,
|
| 273 |
+
≥2,3000,mmmu_val_mmmu_acc,0.25556,
|
| 274 |
+
≥2,3000,mmstar_average,0.21427247636922603,
|
| 275 |
+
≥2,3000,ocrbench_ocrbench_accuracy,0.449,
|
| 276 |
+
≥2,3000,seedbench_seed_all,0.2715953307392996,
|
| 277 |
+
≥2,3000,textvqa_val_exact_match,0.43213999999999997,0.006742795943777913
|
| 278 |
+
≥2,4000,ai2d_exact_match,0.27525906735751293,0.008038849490577975
|
| 279 |
+
≥2,4000,average,0.37379440715652495,
|
| 280 |
+
≥2,4000,average_rank,2.6,
|
| 281 |
+
≥2,4000,chartqa_relaxed_overall,0.5356,0.009976616117083942
|
| 282 |
+
≥2,4000,docvqa_val_anls,0.5415736777563739,0.006259488230977563
|
| 283 |
+
≥2,4000,infovqa_val_anls,0.22392444384387208,0.00676041701311943
|
| 284 |
+
≥2,4000,mme_total_score,1195.5438175270108,
|
| 285 |
+
≥2,4000,mmmu_val_mmmu_acc,0.26667,
|
| 286 |
+
≥2,4000,mmstar_average,0.2507897461569136,
|
| 287 |
+
≥2,4000,ocrbench_ocrbench_accuracy,0.462,
|
| 288 |
+
≥2,4000,seedbench_seed_all,0.34291272929405225,
|
| 289 |
+
≥2,4000,textvqa_val_exact_match,0.46542,0.0067795602517745565
|
| 290 |
+
≥2,5000,ai2d_exact_match,0.31055699481865284,0.008328207321163279
|
| 291 |
+
≥2,5000,average,0.39445964778826137,
|
| 292 |
+
≥2,5000,average_rank,3.3,
|
| 293 |
+
≥2,5000,chartqa_relaxed_overall,0.552,0.00994776272300849
|
| 294 |
+
≥2,5000,docvqa_val_anls,0.5556927230238289,0.006299299461817651
|
| 295 |
+
≥2,5000,infovqa_val_anls,0.24261245038285142,0.007075738778751112
|
| 296 |
+
≥2,5000,mme_total_score,1220.672168867547,
|
| 297 |
+
≥2,5000,mmmu_val_mmmu_acc,0.27444,
|
| 298 |
+
≥2,5000,mmstar_average,0.2522926162881413,
|
| 299 |
+
≥2,5000,ocrbench_ocrbench_accuracy,0.467,
|
| 300 |
+
≥2,5000,seedbench_seed_all,0.42768204558087825,
|
| 301 |
+
≥2,5000,textvqa_val_exact_match,0.46785999999999994,0.006777889939974511
|
| 302 |
+
≥2,6000,ai2d_exact_match,0.3325777202072539,0.008479663360791275
|
| 303 |
+
≥2,6000,average,0.4101998600043759,
|
| 304 |
+
≥2,6000,average_rank,3.7,
|
| 305 |
+
≥2,6000,chartqa_relaxed_overall,0.5672,0.009911254067113462
|
| 306 |
+
≥2,6000,docvqa_val_anls,0.5702012141050906,0.006263916894054504
|
| 307 |
+
≥2,6000,infovqa_val_anls,0.21632587505016104,0.006473865748732477
|
| 308 |
+
≥2,6000,mme_total_score,1313.7047819127652,
|
| 309 |
+
≥2,6000,mmmu_val_mmmu_acc,0.28,
|
| 310 |
+
≥2,6000,mmstar_average,0.28566177948176935,
|
| 311 |
+
≥2,6000,ocrbench_ocrbench_accuracy,0.486,
|
| 312 |
+
≥2,6000,seedbench_seed_all,0.4698721511951084,
|
| 313 |
+
≥2,6000,textvqa_val_exact_match,0.48396000000000006,0.006801425994533192
|
| 314 |
+
≥2,7000,ai2d_exact_match,0.35200777202072536,0.00859592682822483
|
| 315 |
+
≥2,7000,average,0.4204955224344633,
|
| 316 |
+
≥2,7000,average_rank,4.0,
|
| 317 |
+
≥2,7000,chartqa_relaxed_overall,0.5712,0.00990007214980924
|
| 318 |
+
≥2,7000,docvqa_val_anls,0.5850734578344774,0.006202520219850679
|
| 319 |
+
≥2,7000,infovqa_val_anls,0.23449023638527144,0.0067906990453115955
|
| 320 |
+
≥2,7000,mme_total_score,1247.423969587835,
|
| 321 |
+
≥2,7000,mmmu_val_mmmu_acc,0.28444,
|
| 322 |
+
≥2,7000,mmstar_average,0.29053864145068503,
|
| 323 |
+
≥2,7000,ocrbench_ocrbench_accuracy,0.487,
|
| 324 |
+
≥2,7000,seedbench_seed_all,0.48526959421901056,
|
| 325 |
+
≥2,7000,textvqa_val_exact_match,0.49444000000000005,0.006796105847537853
|
| 326 |
+
≥2,8000,ai2d_exact_match,0.3746761658031088,0.008711886524907496
|
| 327 |
+
≥2,8000,average,0.43663916832315425,
|
| 328 |
+
≥2,8000,average_rank,2.9,
|
| 329 |
+
≥2,8000,chartqa_relaxed_overall,0.5816,0.00986790384075991
|
| 330 |
+
≥2,8000,docvqa_val_anls,0.6028798426362394,0.006214872354058686
|
| 331 |
+
≥2,8000,infovqa_val_anls,0.2535281850303886,0.0070045473889607445
|
| 332 |
+
≥2,8000,mme_total_score,1300.5965386154462,
|
| 333 |
+
≥2,8000,mmmu_val_mmmu_acc,0.27333,
|
| 334 |
+
≥2,8000,mmstar_average,0.310944925107356,
|
| 335 |
+
≥2,8000,ocrbench_ocrbench_accuracy,0.516,
|
| 336 |
+
≥2,8000,seedbench_seed_all,0.5041133963312951,
|
| 337 |
+
≥2,8000,textvqa_val_exact_match,0.51268,0.006798079603627737
|
| 338 |
+
≥2,9000,ai2d_exact_match,0.3795336787564767,0.00873405559083709
|
| 339 |
+
≥2,9000,average,0.43759974296352216,
|
| 340 |
+
≥2,9000,average_rank,2.9,
|
| 341 |
+
≥2,9000,chartqa_relaxed_overall,0.5884,0.009844437067525526
|
| 342 |
+
≥2,9000,docvqa_val_anls,0.6175894644110065,0.0061700253612544395
|
| 343 |
+
≥2,9000,infovqa_val_anls,0.24471327484068725,0.006934982517240646
|
| 344 |
+
≥2,9000,mme_total_score,1258.1754701880752,
|
| 345 |
+
≥2,9000,mmmu_val_mmmu_acc,0.27,
|
| 346 |
+
≥2,9000,mmstar_average,0.2988526527658083,
|
| 347 |
+
≥2,9000,ocrbench_ocrbench_accuracy,0.514,
|
| 348 |
+
≥2,9000,seedbench_seed_all,0.5155086158977209,
|
| 349 |
+
≥2,9000,textvqa_val_exact_match,0.5098,0.00680062068405066
|
| 350 |
+
≥2,10000,ai2d_exact_match,0.407059585492228,0.008842319527489083
|
| 351 |
+
≥2,10000,average,0.45127176699921406,
|
| 352 |
+
≥2,10000,average_rank,3.1,
|
| 353 |
+
≥2,10000,chartqa_relaxed_overall,0.5956,0.009817474681589429
|
| 354 |
+
≥2,10000,docvqa_val_anls,0.6286443353240219,0.006128441640319587
|
| 355 |
+
≥2,10000,infovqa_val_anls,0.25277210900180563,0.007055702724548255
|
| 356 |
+
≥2,10000,mme_total_score,1320.1028411364546,
|
| 357 |
+
≥2,10000,mmmu_val_mmmu_acc,0.27556,
|
| 358 |
+
≥2,10000,mmstar_average,0.3429750538307907,
|
| 359 |
+
≥2,10000,ocrbench_ocrbench_accuracy,0.523,
|
| 360 |
+
≥2,10000,seedbench_seed_all,0.51467481934408,
|
| 361 |
+
≥2,10000,textvqa_val_exact_match,0.5211600000000001,0.006783601870014644
|
| 362 |
+
≥2,11000,ai2d_exact_match,0.41580310880829013,0.008870644443998564
|
| 363 |
+
≥2,11000,average,0.4525862975952584,
|
| 364 |
+
≥2,11000,average_rank,3.5,
|
| 365 |
+
≥2,11000,chartqa_relaxed_overall,0.598,0.009808000752013664
|
| 366 |
+
≥2,11000,docvqa_val_anls,0.6307438129106796,0.006133911991297053
|
| 367 |
+
≥2,11000,infovqa_val_anls,0.25390014221903434,0.007050537280004977
|
| 368 |
+
≥2,11000,mme_total_score,1302.5287114845937,
|
| 369 |
+
≥2,11000,mmmu_val_mmmu_acc,0.29333,
|
| 370 |
+
≥2,11000,mmstar_average,0.303972877343168,
|
| 371 |
+
≥2,11000,ocrbench_ocrbench_accuracy,0.523,
|
| 372 |
+
≥2,11000,seedbench_seed_all,0.5281267370761534,
|
| 373 |
+
≥2,11000,textvqa_val_exact_match,0.5264,0.006786826961404041
|
| 374 |
+
≥2,12000,ai2d_exact_match,0.43426165803108807,0.008921034830887027
|
| 375 |
+
≥2,12000,average,0.46342874141175217,
|
| 376 |
+
≥2,12000,average_rank,2.7,
|
| 377 |
+
≥2,12000,chartqa_relaxed_overall,0.6188,0.009715574144248037
|
| 378 |
+
≥2,12000,docvqa_val_anls,0.6419729722202083,0.006094582531110984
|
| 379 |
+
≥2,12000,infovqa_val_anls,0.24776952598966778,0.006784112219881613
|
| 380 |
+
≥2,12000,mme_total_score,1255.4957983193276,
|
| 381 |
+
≥2,12000,mmmu_val_mmmu_acc,0.27111,
|
| 382 |
+
≥2,12000,mmstar_average,0.3424608032908198,
|
| 383 |
+
≥2,12000,ocrbench_ocrbench_accuracy,0.541,
|
| 384 |
+
≥2,12000,seedbench_seed_all,0.5306837131739855,
|
| 385 |
+
≥2,12000,textvqa_val_exact_match,0.5428,0.006758192556691964
|
| 386 |
+
≥2,13000,ai2d_exact_match,0.42843264248704666,0.008906491762178375
|
| 387 |
+
≥2,13000,average,0.4611120038339278,
|
| 388 |
+
≥2,13000,average_rank,3.8,
|
| 389 |
+
≥2,13000,chartqa_relaxed_overall,0.606,0.00977465178546074
|
| 390 |
+
≥2,13000,docvqa_val_anls,0.6433656711922792,0.0061086851054902285
|
| 391 |
+
≥2,13000,infovqa_val_anls,0.2535479547381062,0.006989226376396767
|
| 392 |
+
≥2,13000,mme_total_score,1360.003101240496,
|
| 393 |
+
≥2,13000,mmmu_val_mmmu_acc,0.28556,
|
| 394 |
+
≥2,13000,mmstar_average,0.3320394092229932,
|
| 395 |
+
≥2,13000,ocrbench_ocrbench_accuracy,0.526,
|
| 396 |
+
≥2,13000,seedbench_seed_all,0.5362423568649249,
|
| 397 |
+
≥2,13000,textvqa_val_exact_match,0.53882,0.006765393974568386
|
| 398 |
+
≥2,14000,ai2d_exact_match,0.44689119170984454,0.008948245073044956
|
| 399 |
+
≥2,14000,average,0.47130833654714216,
|
| 400 |
+
≥2,14000,average_rank,2.8,
|
| 401 |
+
≥2,14000,chartqa_relaxed_overall,0.6216,0.009701702181065136
|
| 402 |
+
≥2,14000,docvqa_val_anls,0.6619108814388047,0.006015398975274413
|
| 403 |
+
≥2,14000,infovqa_val_anls,0.2567040650730957,0.006986745571340195
|
| 404 |
+
≥2,14000,mme_total_score,1310.3628451380553,
|
| 405 |
+
≥2,14000,mmmu_val_mmmu_acc,0.28333,
|
| 406 |
+
≥2,14000,mmstar_average,0.3315916867003111,
|
| 407 |
+
≥2,14000,ocrbench_ocrbench_accuracy,0.547,
|
| 408 |
+
≥2,14000,seedbench_seed_all,0.5409672040022234,
|
| 409 |
+
≥2,14000,textvqa_val_exact_match,0.55178,0.006748546131944198
|
| 410 |
+
≥2,15000,ai2d_exact_match,0.4523963730569948,0.00895827521082005
|
| 411 |
+
≥2,15000,average,0.4720211465604895,
|
| 412 |
+
≥2,15000,average_rank,3.5,
|
| 413 |
+
≥2,15000,chartqa_relaxed_overall,0.62,0.009709671008043154
|
| 414 |
+
≥2,15000,docvqa_val_anls,0.6679183447758706,0.005982903367170995
|
| 415 |
+
≥2,15000,infovqa_val_anls,0.24815705436683513,0.006864270716284432
|
| 416 |
+
≥2,15000,mme_total_score,1236.2534013605443,
|
| 417 |
+
≥2,15000,mmmu_val_mmmu_acc,0.29889,
|
| 418 |
+
≥2,15000,mmstar_average,0.3351456007635487,
|
| 419 |
+
≥2,15000,ocrbench_ocrbench_accuracy,0.527,
|
| 420 |
+
≥2,15000,seedbench_seed_all,0.5453029460811561,
|
| 421 |
+
≥2,15000,textvqa_val_exact_match,0.55338,0.006735012041373013
|
| 422 |
+
≥2,16000,ai2d_exact_match,0.44624352331606215,0.008946992176353898
|
| 423 |
+
≥2,16000,average,0.4766960932538844,
|
| 424 |
+
≥2,16000,average_rank,3.2,
|
| 425 |
+
≥2,16000,chartqa_relaxed_overall,0.612,0.009747841205275417
|
| 426 |
+
≥2,16000,docvqa_val_anls,0.6754589054855508,0.005966817690473989
|
| 427 |
+
≥2,16000,infovqa_val_anls,0.27323519213464514,0.007206289716945655
|
| 428 |
+
≥2,16000,mme_total_score,1305.906762705082,
|
| 429 |
+
≥2,16000,mmmu_val_mmmu_acc,0.29,
|
| 430 |
+
≥2,16000,mmstar_average,0.34328884147265926,
|
| 431 |
+
≥2,16000,ocrbench_ocrbench_accuracy,0.555,
|
| 432 |
+
≥2,16000,seedbench_seed_all,0.5410783768760422,
|
| 433 |
+
≥2,16000,textvqa_val_exact_match,0.55396,0.00674076785464787
|
| 434 |
+
≥2,17000,ai2d_exact_match,0.4485103626943005,0.008951310133709686
|
| 435 |
+
≥2,17000,average,0.4803744475549501,
|
| 436 |
+
≥2,17000,average_rank,3.3,
|
| 437 |
+
≥2,17000,chartqa_relaxed_overall,0.6352,0.009629406741314642
|
| 438 |
+
≥2,17000,docvqa_val_anls,0.6735387256928971,0.006001868055856522
|
| 439 |
+
≥2,17000,infovqa_val_anls,0.2713449738427,0.007231154690666275
|
| 440 |
+
≥2,17000,mme_total_score,1302.8314325730291,
|
| 441 |
+
≥2,17000,mmmu_val_mmmu_acc,0.28667,
|
| 442 |
+
≥2,17000,mmstar_average,0.33631999578132954,
|
| 443 |
+
≥2,17000,ocrbench_ocrbench_accuracy,0.571,
|
| 444 |
+
≥2,17000,seedbench_seed_all,0.542745969983324,
|
| 445 |
+
≥2,17000,textvqa_val_exact_match,0.5580400000000001,0.006741465801458199
|
| 446 |
+
≥2,18000,ai2d_exact_match,0.46113989637305697,0.008971933568013592
|
| 447 |
+
≥2,18000,average,0.48745721111983964,
|
| 448 |
+
≥2,18000,average_rank,2.6,
|
| 449 |
+
≥2,18000,chartqa_relaxed_overall,0.6276,0.009670817229291067
|
| 450 |
+
≥2,18000,docvqa_val_anls,0.6812777947859573,0.005935773909547658
|
| 451 |
+
≥2,18000,infovqa_val_anls,0.27095882924867687,0.007164605404977649
|
| 452 |
+
≥2,18000,mme_total_score,1289.7513005202081,
|
| 453 |
+
≥2,18000,mmmu_val_mmmu_acc,0.31556,
|
| 454 |
+
≥2,18000,mmstar_average,0.35401030852022664,
|
| 455 |
+
≥2,18000,ocrbench_ocrbench_accuracy,0.564,
|
| 456 |
+
≥2,18000,seedbench_seed_all,0.5505280711506393,
|
| 457 |
+
≥2,18000,textvqa_val_exact_match,0.5620400000000001,0.00673487040527694
|
| 458 |
+
≥2,19000,ai2d_exact_match,0.4698834196891192,0.008982814668850815
|
| 459 |
+
≥2,19000,average,0.48664836716175586,
|
| 460 |
+
≥2,19000,average_rank,3.2,
|
| 461 |
+
≥2,19000,chartqa_relaxed_overall,0.6276,0.009670817229291067
|
| 462 |
+
≥2,19000,docvqa_val_anls,0.6838077764263535,0.005944136929785695
|
| 463 |
+
≥2,19000,infovqa_val_anls,0.26757170067350106,0.007096398035000058
|
| 464 |
+
≥2,19000,mme_total_score,1310.4946978791518,
|
| 465 |
+
≥2,19000,mmmu_val_mmmu_acc,0.29444,
|
| 466 |
+
≥2,19000,mmstar_average,0.365800601107629,
|
| 467 |
+
≥2,19000,ocrbench_ocrbench_accuracy,0.559,
|
| 468 |
+
≥2,19000,seedbench_seed_all,0.5532518065591996,
|
| 469 |
+
≥2,19000,textvqa_val_exact_match,0.55848,0.006735717623117797
|
| 470 |
+
≥2,20000,ai2d_exact_match,0.4727979274611399,0.008985826352357515
|
| 471 |
+
≥2,20000,average,0.4887875980209429,
|
| 472 |
+
≥2,20000,average_rank,3.4,
|
| 473 |
+
≥2,20000,chartqa_relaxed_overall,0.6392,0.00960657371300514
|
| 474 |
+
≥2,20000,docvqa_val_anls,0.6828620051596259,0.005923332769971399
|
| 475 |
+
≥2,20000,infovqa_val_anls,0.2701274975234547,0.007055868134029247
|
| 476 |
+
≥2,20000,mme_total_score,1323.9108643457382,
|
| 477 |
+
≥2,20000,mmmu_val_mmmu_acc,0.30222,
|
| 478 |
+
≥2,20000,mmstar_average,0.33931189145504953,
|
| 479 |
+
≥2,20000,ocrbench_ocrbench_accuracy,0.57,
|
| 480 |
+
≥2,20000,seedbench_seed_all,0.5563090605892163,
|
| 481 |
+
≥2,20000,textvqa_val_exact_match,0.56626,0.0067178082936069205
|
| 482 |
+
≥3,1000,ai2d_exact_match,0.2691062176165803,0.007982164708643914
|
| 483 |
+
≥3,1000,average,0.27573784261835144,
|
| 484 |
+
≥3,1000,average_rank,2.8,
|
| 485 |
+
≥3,1000,chartqa_relaxed_overall,0.352,0.009553790345406665
|
| 486 |
+
≥3,1000,docvqa_val_anls,0.3425840937939014,0.005755186508181206
|
| 487 |
+
≥3,1000,infovqa_val_anls,0.1714752271538445,0.006218691549786442
|
| 488 |
+
≥3,1000,mme_total_score,1013.1872749099639,
|
| 489 |
+
≥3,1000,mmmu_val_mmmu_acc,0.24778,
|
| 490 |
+
≥3,1000,mmstar_average,0.2075589805205699,
|
| 491 |
+
≥3,1000,ocrbench_ocrbench_accuracy,0.324,
|
| 492 |
+
≥3,1000,seedbench_seed_all,0.24891606448026682,
|
| 493 |
+
≥3,1000,textvqa_val_exact_match,0.31822,0.006368399926474836
|
| 494 |
+
≥3,2000,ai2d_exact_match,0.25647668393782386,0.007859644922870102
|
| 495 |
+
≥3,2000,average,0.32059377128504934,
|
| 496 |
+
≥3,2000,average_rank,2.8,
|
| 497 |
+
≥3,2000,chartqa_relaxed_overall,0.4628,0.009974279848861338
|
| 498 |
+
≥3,2000,docvqa_val_anls,0.4518369496978485,0.00619300217721929
|
| 499 |
+
≥3,2000,infovqa_val_anls,0.21204013425009277,0.006820894774458214
|
| 500 |
+
≥3,2000,mme_total_score,1118.8858543417368,
|
| 501 |
+
≥3,2000,mmmu_val_mmmu_acc,0.25222,
|
| 502 |
+
≥3,2000,mmstar_average,0.20454842826555975,
|
| 503 |
+
≥3,2000,ocrbench_ocrbench_accuracy,0.376,
|
| 504 |
+
≥3,2000,seedbench_seed_all,0.25514174541411894,
|
| 505 |
+
≥3,2000,textvqa_val_exact_match,0.41428000000000004,0.006714956027174666
|
| 506 |
+
≥3,3000,ai2d_exact_match,0.25259067357512954,0.007820231277456426
|
| 507 |
+
≥3,3000,average,0.35341646277484595,
|
| 508 |
+
≥3,3000,average_rank,2.4,
|
| 509 |
+
≥3,3000,chartqa_relaxed_overall,0.5208,0.00999334232158103
|
| 510 |
+
≥3,3000,docvqa_val_anls,0.49758866181984457,0.00626460182861003
|
| 511 |
+
≥3,3000,infovqa_val_anls,0.21333414080666746,0.0067509043256437935
|
| 512 |
+
≥3,3000,mme_total_score,1165.3744497799119,
|
| 513 |
+
≥3,3000,mmmu_val_mmmu_acc,0.26,
|
| 514 |
+
≥3,3000,mmstar_average,0.2652435492500152,
|
| 515 |
+
≥3,3000,ocrbench_ocrbench_accuracy,0.442,
|
| 516 |
+
≥3,3000,seedbench_seed_all,0.29205113952195666,
|
| 517 |
+
≥3,3000,textvqa_val_exact_match,0.43714,0.006763850531672249
|
| 518 |
+
≥3,4000,ai2d_exact_match,0.28303108808290156,0.008107723290508887
|
| 519 |
+
≥3,4000,average,0.37496255619498237,
|
| 520 |
+
≥3,4000,average_rank,3.4,
|
| 521 |
+
≥3,4000,chartqa_relaxed_overall,0.5412,0.009967987174315731
|
| 522 |
+
≥3,4000,docvqa_val_anls,0.5296261512617491,0.006274192303767133
|
| 523 |
+
≥3,4000,infovqa_val_anls,0.2050381576936679,0.006416570814061769
|
| 524 |
+
≥3,4000,mme_total_score,1119.7681072428973,
|
| 525 |
+
≥3,4000,mmmu_val_mmmu_acc,0.25556,
|
| 526 |
+
≥3,4000,mmstar_average,0.24897141082880767,
|
| 527 |
+
≥3,4000,ocrbench_ocrbench_accuracy,0.47,
|
| 528 |
+
≥3,4000,seedbench_seed_all,0.3811561978877154,
|
| 529 |
+
≥3,4000,textvqa_val_exact_match,0.46007999999999993,0.006793769924125808
|
| 530 |
+
≥3,5000,ai2d_exact_match,0.3248056994818653,0.008428647470081763
|
| 531 |
+
≥3,5000,average,0.3977887563101667,
|
| 532 |
+
≥3,5000,average_rank,2.8,
|
| 533 |
+
≥3,5000,chartqa_relaxed_overall,0.5544,0.009942625323290008
|
| 534 |
+
≥3,5000,docvqa_val_anls,0.553669449701632,0.006282439058750721
|
| 535 |
+
≥3,5000,infovqa_val_anls,0.20821650889148954,0.006430552192683275
|
| 536 |
+
≥3,5000,mme_total_score,1326.9777911164465,
|
| 537 |
+
≥3,5000,mmmu_val_mmmu_acc,0.26444,
|
| 538 |
+
≥3,5000,mmstar_average,0.279759822424129,
|
| 539 |
+
≥3,5000,ocrbench_ocrbench_accuracy,0.487,
|
| 540 |
+
≥3,5000,seedbench_seed_all,0.43718732629238466,
|
| 541 |
+
≥3,5000,textvqa_val_exact_match,0.47062,0.0067917147023207275
|
| 542 |
+
≥3,6000,ai2d_exact_match,0.3536269430051813,0.008604903043803527
|
| 543 |
+
≥3,6000,average,0.41524300122458385,
|
| 544 |
+
≥3,6000,average_rank,3.1,
|
| 545 |
+
≥3,6000,chartqa_relaxed_overall,0.568,0.009909070383761948
|
| 546 |
+
≥3,6000,docvqa_val_anls,0.5722640243712676,0.00625854154899254
|
| 547 |
+
≥3,6000,infovqa_val_anls,0.2204869348964998,0.00662088578415522
|
| 548 |
+
≥3,6000,mme_total_score,1270.3575430172068,
|
| 549 |
+
≥3,6000,mmmu_val_mmmu_acc,0.26556,
|
| 550 |
+
≥3,6000,mmstar_average,0.2958896090262379,
|
| 551 |
+
≥3,6000,ocrbench_ocrbench_accuracy,0.497,
|
| 552 |
+
≥3,6000,seedbench_seed_all,0.47909949972206783,
|
| 553 |
+
≥3,6000,textvqa_val_exact_match,0.48526,0.006795924028171543
|
| 554 |
+
≥3,7000,ai2d_exact_match,0.3805051813471503,0.00873837769131663
|
| 555 |
+
≥3,7000,average,0.42920372592352884,
|
| 556 |
+
≥3,7000,average_rank,2.7,
|
| 557 |
+
≥3,7000,chartqa_relaxed_overall,0.5728,0.009895414680177737
|
| 558 |
+
≥3,7000,docvqa_val_anls,0.5922749765517075,0.006249497802747461
|
| 559 |
+
≥3,7000,infovqa_val_anls,0.23025261139769496,0.006777932440928761
|
| 560 |
+
≥3,7000,mme_total_score,1289.3664465786314,
|
| 561 |
+
≥3,7000,mmmu_val_mmmu_acc,0.27111,
|
| 562 |
+
≥3,7000,mmstar_average,0.3153601470057574,
|
| 563 |
+
≥3,7000,ocrbench_ocrbench_accuracy,0.498,
|
| 564 |
+
≥3,7000,seedbench_seed_all,0.4991106170094497,
|
| 565 |
+
≥3,7000,textvqa_val_exact_match,0.50342,0.006801949281110862
|
| 566 |
+
≥3,8000,ai2d_exact_match,0.39799222797927464,0.008809880751131852
|
| 567 |
+
≥3,8000,average,0.438180751977588,
|
| 568 |
+
≥3,8000,average_rank,2.4,
|
| 569 |
+
≥3,8000,chartqa_relaxed_overall,0.5844,0.009858475126140203
|
| 570 |
+
≥3,8000,docvqa_val_anls,0.6044755547364623,0.006202062618138765
|
| 571 |
+
≥3,8000,infovqa_val_anls,0.21693088745597935,0.006529416377309533
|
| 572 |
+
≥3,8000,mme_total_score,1187.3639455782313,
|
| 573 |
+
≥3,8000,mmmu_val_mmmu_acc,0.28667,
|
| 574 |
+
≥3,8000,mmstar_average,0.31735843114519735,
|
| 575 |
+
≥3,8000,ocrbench_ocrbench_accuracy,0.506,
|
| 576 |
+
≥3,8000,seedbench_seed_all,0.5193996664813786,
|
| 577 |
+
≥3,8000,textvqa_val_exact_match,0.5104,0.0067972647853171315
|
| 578 |
+
≥3,9000,ai2d_exact_match,0.407059585492228,0.008842319527489083
|
| 579 |
+
≥3,9000,average,0.44395606448032265,
|
| 580 |
+
≥3,9000,average_rank,3.0,
|
| 581 |
+
≥3,9000,chartqa_relaxed_overall,0.598,0.009808000752013664
|
| 582 |
+
≥3,9000,docvqa_val_anls,0.6107522318987826,0.006184930065074595
|
| 583 |
+
≥3,9000,infovqa_val_anls,0.2347778400526839,0.0067525186273140235
|
| 584 |
+
≥3,9000,mme_total_score,1195.0110044017606,
|
| 585 |
+
≥3,9000,mmmu_val_mmmu_acc,0.28222,
|
| 586 |
+
≥3,9000,mmstar_average,0.3264280968647572,
|
| 587 |
+
≥3,9000,ocrbench_ocrbench_accuracy,0.521,
|
| 588 |
+
≥3,9000,seedbench_seed_all,0.5162868260144525,
|
| 589 |
+
≥3,9000,textvqa_val_exact_match,0.4990799999999999,0.00679372222366579
|
| 590 |
+
≥3,10000,ai2d_exact_match,0.41580310880829013,0.008870644443998564
|
| 591 |
+
≥3,10000,average,0.4524021135685592,
|
| 592 |
+
≥3,10000,average_rank,2.5,
|
| 593 |
+
≥3,10000,chartqa_relaxed_overall,0.5992,0.00980317218424473
|
| 594 |
+
≥3,10000,docvqa_val_anls,0.6291907180725226,0.0061343676879221844
|
| 595 |
+
≥3,10000,infovqa_val_anls,0.2282836442456148,0.006711844883510513
|
| 596 |
+
≥3,10000,mme_total_score,1326.8972589035614,
|
| 597 |
+
≥3,10000,mmmu_val_mmmu_acc,0.30111,
|
| 598 |
+
≥3,10000,mmstar_average,0.3402582102457474,
|
| 599 |
+
≥3,10000,ocrbench_ocrbench_accuracy,0.522,
|
| 600 |
+
≥3,10000,seedbench_seed_all,0.5240133407448583,
|
| 601 |
+
≥3,10000,textvqa_val_exact_match,0.51176,0.006789754092169055
|
| 602 |
+
≥3,11000,ai2d_exact_match,0.42389896373056996,0.008894308540753343
|
| 603 |
+
≥3,11000,average,0.45530296075039445,
|
| 604 |
+
≥3,11000,average_rank,3.3,
|
| 605 |
+
≥3,11000,chartqa_relaxed_overall,0.5992,0.00980317218424473
|
| 606 |
+
≥3,11000,docvqa_val_anls,0.637004884118944,0.0060952660672868655
|
| 607 |
+
≥3,11000,infovqa_val_anls,0.24182483065748125,0.006800414154487266
|
| 608 |
+
≥3,11000,mme_total_score,1229.9441776710685,
|
| 609 |
+
≥3,11000,mmmu_val_mmmu_acc,0.28556,
|
| 610 |
+
≥3,11000,mmstar_average,0.3210406141609519,
|
| 611 |
+
≥3,11000,ocrbench_ocrbench_accuracy,0.532,
|
| 612 |
+
≥3,11000,seedbench_seed_all,0.5272373540856031,
|
| 613 |
+
≥3,11000,textvqa_val_exact_match,0.52996,0.006774485841130848
|
| 614 |
+
≥3,12000,ai2d_exact_match,0.4378238341968912,0.008929303814062614
|
| 615 |
+
≥3,12000,average,0.4603808175211579,
|
| 616 |
+
≥3,12000,average_rank,2.8,
|
| 617 |
+
≥3,12000,chartqa_relaxed_overall,0.6036,0.009784943231599163
|
| 618 |
+
≥3,12000,docvqa_val_anls,0.6425836471445318,0.006082856374953106
|
| 619 |
+
≥3,12000,infovqa_val_anls,0.23921346499497054,0.00674373988949671
|
| 620 |
+
≥3,12000,mme_total_score,1253.4613845538215,
|
| 621 |
+
≥3,12000,mmmu_val_mmmu_acc,0.28,
|
| 622 |
+
≥3,12000,mmstar_average,0.3402058443723711,
|
| 623 |
+
≥3,12000,ocrbench_ocrbench_accuracy,0.533,
|
| 624 |
+
≥3,12000,seedbench_seed_all,0.5370205669816565,
|
| 625 |
+
≥3,12000,textvqa_val_exact_match,0.52998,0.006788538632972067
|
| 626 |
+
≥3,13000,ai2d_exact_match,0.4410621761658031,0.0089364152923413
|
| 627 |
+
≥3,13000,average,0.46617815773624777,
|
| 628 |
+
≥3,13000,average_rank,2.8,
|
| 629 |
+
≥3,13000,chartqa_relaxed_overall,0.6116,0.009749676839741497
|
| 630 |
+
≥3,13000,docvqa_val_anls,0.6435913615068958,0.006093449845266186
|
| 631 |
+
≥3,13000,infovqa_val_anls,0.24655403627027533,0.0068431739840280935
|
| 632 |
+
≥3,13000,mme_total_score,1338.6154461784713,
|
| 633 |
+
≥3,13000,mmmu_val_mmmu_acc,0.29556,
|
| 634 |
+
≥3,13000,mmstar_average,0.33746561777886447,
|
| 635 |
+
≥3,13000,ocrbench_ocrbench_accuracy,0.543,
|
| 636 |
+
≥3,13000,seedbench_seed_all,0.5384102279043913,
|
| 637 |
+
≥3,13000,textvqa_val_exact_match,0.5383600000000001,0.006773985492742893
|
| 638 |
+
≥3,14000,ai2d_exact_match,0.4426813471502591,0.008939826412531762
|
| 639 |
+
≥3,14000,average,0.46514162030247774,
|
| 640 |
+
≥3,14000,average_rank,3.6,
|
| 641 |
+
≥3,14000,chartqa_relaxed_overall,0.6104,0.009755142291143075
|
| 642 |
+
≥3,14000,docvqa_val_anls,0.6522898002805984,0.006013616663077038
|
| 643 |
+
≥3,14000,infovqa_val_anls,0.23824160343368236,0.006685403314320424
|
| 644 |
+
≥3,14000,mme_total_score,1290.797318927571,
|
| 645 |
+
≥3,14000,mmmu_val_mmmu_acc,0.29111,
|
| 646 |
+
≥3,14000,mmstar_average,0.34665083130189556,
|
| 647 |
+
≥3,14000,ocrbench_ocrbench_accuracy,0.533,
|
| 648 |
+
≥3,14000,seedbench_seed_all,0.5418010005558643,
|
| 649 |
+
≥3,14000,textvqa_val_exact_match,0.5300999999999999,0.006785072250248203
|
| 650 |
+
≥3,15000,ai2d_exact_match,0.4536917098445596,0.00896047438220532
|
| 651 |
+
≥3,15000,average,0.47760694744777243,
|
| 652 |
+
≥3,15000,average_rank,2.9,
|
| 653 |
+
≥3,15000,chartqa_relaxed_overall,0.612,0.009747841205275417
|
| 654 |
+
≥3,15000,docvqa_val_anls,0.6656498012528964,0.006035466987037702
|
| 655 |
+
≥3,15000,infovqa_val_anls,0.2625991131461808,0.007063916588796129
|
| 656 |
+
≥3,15000,mme_total_score,1285.4465786314527,
|
| 657 |
+
≥3,15000,mmmu_val_mmmu_acc,0.30222,
|
| 658 |
+
≥3,15000,mmstar_average,0.3502185231309507,
|
| 659 |
+
≥3,15000,ocrbench_ocrbench_accuracy,0.558,
|
| 660 |
+
≥3,15000,seedbench_seed_all,0.5500833796553641,
|
| 661 |
+
≥3,15000,textvqa_val_exact_match,0.544,0.0067575389652278954
|
| 662 |
+
≥3,16000,ai2d_exact_match,0.4689119170984456,0.008981742470016596
|
| 663 |
+
≥3,16000,average,0.4804309718902879,
|
| 664 |
+
≥3,16000,average_rank,2.5,
|
| 665 |
+
≥3,16000,chartqa_relaxed_overall,0.6204,0.009707689307588963
|
| 666 |
+
≥3,16000,docvqa_val_anls,0.6742164965149466,0.0059800657435710326
|
| 667 |
+
≥3,16000,infovqa_val_anls,0.2633355771988975,0.00704601997176055
|
| 668 |
+
≥3,16000,mme_total_score,1288.4584833933575,
|
| 669 |
+
≥3,16000,mmmu_val_mmmu_acc,0.29556,
|
| 670 |
+
≥3,16000,mmstar_average,0.3443487528651147,
|
| 671 |
+
≥3,16000,ocrbench_ocrbench_accuracy,0.55,
|
| 672 |
+
≥3,16000,seedbench_seed_all,0.5508060033351863,
|
| 673 |
+
≥3,16000,textvqa_val_exact_match,0.5563,0.006742548063668376
|
| 674 |
+
≥3,17000,ai2d_exact_match,0.45595854922279794,0.008964175733819342
|
| 675 |
+
≥3,17000,average,0.4809373657329622,
|
| 676 |
+
≥3,17000,average_rank,3.3,
|
| 677 |
+
≥3,17000,chartqa_relaxed_overall,0.6204,0.009707689307588963
|
| 678 |
+
≥3,17000,docvqa_val_anls,0.6739488016448908,0.005975889304414765
|
| 679 |
+
≥3,17000,infovqa_val_anls,0.2580649809644441,0.007031141926644411
|
| 680 |
+
≥3,17000,mme_total_score,1230.4375750300119,
|
| 681 |
+
≥3,17000,mmmu_val_mmmu_acc,0.29444,
|
| 682 |
+
≥3,17000,mmstar_average,0.3444925534276732,
|
| 683 |
+
≥3,17000,ocrbench_ocrbench_accuracy,0.578,
|
| 684 |
+
≥3,17000,seedbench_seed_all,0.5565314063368538,
|
| 685 |
+
≥3,17000,textvqa_val_exact_match,0.5466,0.006752985159298985
|
| 686 |
+
≥3,18000,ai2d_exact_match,0.45919689119170987,0.008969138793675545
|
| 687 |
+
≥3,18000,average,0.48088758936650067,
|
| 688 |
+
≥3,18000,average_rank,3.5,
|
| 689 |
+
≥3,18000,chartqa_relaxed_overall,0.6252,0.009683361554563506
|
| 690 |
+
≥3,18000,docvqa_val_anls,0.675384499731014,0.005997750609265588
|
| 691 |
+
≥3,18000,infovqa_val_anls,0.2579974692510198,0.0070128299378415275
|
| 692 |
+
≥3,18000,mme_total_score,1234.843237294918,
|
| 693 |
+
≥3,18000,mmmu_val_mmmu_acc,0.3,
|
| 694 |
+
≥3,18000,mmstar_average,0.3363850750308216,
|
| 695 |
+
≥3,18000,ocrbench_ocrbench_accuracy,0.566,
|
| 696 |
+
≥3,18000,seedbench_seed_all,0.5558643690939411,
|
| 697 |
+
≥3,18000,textvqa_val_exact_match,0.55196,0.006755291146330729
|
| 698 |
+
≥3,19000,ai2d_exact_match,0.4634067357512953,0.008975020819363737
|
| 699 |
+
≥3,19000,average,0.4861360634692545,
|
| 700 |
+
≥3,19000,average_rank,3.3,
|
| 701 |
+
≥3,19000,chartqa_relaxed_overall,0.6312,0.009651522406019766
|
| 702 |
+
≥3,19000,docvqa_val_anls,0.6819220996842664,0.005927423649467908
|
| 703 |
+
≥3,19000,infovqa_val_anls,0.26277439983326806,0.007102707331042042
|
| 704 |
+
≥3,19000,mme_total_score,1337.9653861544616,
|
| 705 |
+
≥3,19000,mmmu_val_mmmu_acc,0.29889,
|
| 706 |
+
≥3,19000,mmstar_average,0.34778832316957964,
|
| 707 |
+
≥3,19000,ocrbench_ocrbench_accuracy,0.574,
|
| 708 |
+
≥3,19000,seedbench_seed_all,0.5614230127848805,
|
| 709 |
+
≥3,19000,textvqa_val_exact_match,0.55382,0.006743039020727005
|
| 710 |
+
≥3,20000,ai2d_exact_match,0.4841321243523316,0.008994621193008031
|
| 711 |
+
≥3,20000,average,0.4916087790351852,
|
| 712 |
+
≥3,20000,average_rank,2.3,
|
| 713 |
+
≥3,20000,chartqa_relaxed_overall,0.638,0.009613499245701268
|
| 714 |
+
≥3,20000,docvqa_val_anls,0.6839168937073106,0.005936410873687919
|
| 715 |
+
≥3,20000,infovqa_val_anls,0.25441216838205727,0.006890877173562315
|
| 716 |
+
≥3,20000,mme_total_score,1330.3037214885953,
|
| 717 |
+
≥3,20000,mmmu_val_mmmu_acc,0.31,
|
| 718 |
+
≥3,20000,mmstar_average,0.35052721898280503,
|
| 719 |
+
≥3,20000,ocrbench_ocrbench_accuracy,0.572,
|
| 720 |
+
≥3,20000,seedbench_seed_all,0.5630906058921623,
|
| 721 |
+
≥3,20000,textvqa_val_exact_match,0.5684000000000001,0.00672360984783302
|
| 722 |
+
≥4,1000,ai2d_exact_match,0.26360103626943004,0.00792979255467583
|
| 723 |
+
≥4,1000,average,0.26922373369534647,
|
| 724 |
+
≥4,1000,average_rank,3.3,
|
| 725 |
+
≥4,1000,chartqa_relaxed_overall,0.3488,0.009533718094861256
|
| 726 |
+
≥4,1000,docvqa_val_anls,0.3599045480096881,0.005885735735631119
|
| 727 |
+
≥4,1000,infovqa_val_anls,0.17148252623256244,0.0061724612150041895
|
| 728 |
+
≥4,1000,mme_total_score,1104.3533413365346,
|
| 729 |
+
≥4,1000,mmmu_val_mmmu_acc,0.24,
|
| 730 |
+
≥4,1000,mmstar_average,0.21109041770474804,
|
| 731 |
+
≥4,1000,ocrbench_ocrbench_accuracy,0.29,
|
| 732 |
+
≥4,1000,seedbench_seed_all,0.24313507504168982,
|
| 733 |
+
≥4,1000,textvqa_val_exact_match,0.295,0.006241441429527609
|
| 734 |
+
≥4,2000,ai2d_exact_match,0.26295336787564766,0.007923526907377255
|
| 735 |
+
≥4,2000,average,0.3192621308996215,
|
| 736 |
+
≥4,2000,average_rank,2.9,
|
| 737 |
+
≥4,2000,chartqa_relaxed_overall,0.4644,0.009976616117083942
|
| 738 |
+
≥4,2000,docvqa_val_anls,0.44610634422212336,0.006125661837378556
|
| 739 |
+
≥4,2000,infovqa_val_anls,0.19012118870963063,0.006420072608935975
|
| 740 |
+
≥4,2000,mme_total_score,1052.5613245298118,
|
| 741 |
+
≥4,2000,mmmu_val_mmmu_acc,0.23889,
|
| 742 |
+
≥4,2000,mmstar_average,0.22088912220303317,
|
| 743 |
+
≥4,2000,ocrbench_ocrbench_accuracy,0.389,
|
| 744 |
+
≥4,2000,seedbench_seed_all,0.262479155086159,
|
| 745 |
+
≥4,2000,textvqa_val_exact_match,0.39852,0.006693677836929181
|
| 746 |
+
≥4,3000,ai2d_exact_match,0.2545336787564767,0.007840040862810524
|
| 747 |
+
≥4,3000,average,0.34899183633853254,
|
| 748 |
+
≥4,3000,average_rank,3.1,
|
| 749 |
+
≥4,3000,chartqa_relaxed_overall,0.5144,0.009997851710018818
|
| 750 |
+
≥4,3000,docvqa_val_anls,0.5122633926443586,0.006250170224123374
|
| 751 |
+
≥4,3000,infovqa_val_anls,0.21839296983497156,0.006786311152255019
|
| 752 |
+
≥4,3000,mme_total_score,1148.9873949579833,
|
| 753 |
+
≥4,3000,mmmu_val_mmmu_acc,0.24667,
|
| 754 |
+
≥4,3000,mmstar_average,0.23884910949080793,
|
| 755 |
+
≥4,3000,ocrbench_ocrbench_accuracy,0.426,
|
| 756 |
+
≥4,3000,seedbench_seed_all,0.29927737632017787,
|
| 757 |
+
≥4,3000,textvqa_val_exact_match,0.43054,0.006761938068430401
|
| 758 |
+
≥4,4000,ai2d_exact_match,0.2814119170984456,0.00809362228799086
|
| 759 |
+
≥4,4000,average,0.3808723899304912,
|
| 760 |
+
≥4,4000,average_rank,2.3,
|
| 761 |
+
≥4,4000,chartqa_relaxed_overall,0.536,0.009976041728231964
|
| 762 |
+
≥4,4000,docvqa_val_anls,0.5444976153718191,0.006262351643342788
|
| 763 |
+
≥4,4000,infovqa_val_anls,0.22943118895386538,0.006865542219383826
|
| 764 |
+
≥4,4000,mme_total_score,1161.4330732292917,
|
| 765 |
+
≥4,4000,mmmu_val_mmmu_acc,0.26889,
|
| 766 |
+
≥4,4000,mmstar_average,0.2546319608241094,
|
| 767 |
+
≥4,4000,ocrbench_ocrbench_accuracy,0.459,
|
| 768 |
+
≥4,4000,seedbench_seed_all,0.39988882712618123,
|
| 769 |
+
≥4,4000,textvqa_val_exact_match,0.4541,0.006780990662644609
|
| 770 |
+
≥4,5000,ai2d_exact_match,0.31573834196891193,0.00836578020190971
|
| 771 |
+
≥4,5000,average,0.4004212057382194,
|
| 772 |
+
≥4,5000,average_rank,2.7,
|
| 773 |
+
≥4,5000,chartqa_relaxed_overall,0.5544,0.009942625323290008
|
| 774 |
+
≥4,5000,docvqa_val_anls,0.556855142418819,0.006267140081468451
|
| 775 |
+
≥4,5000,infovqa_val_anls,0.23435340618373432,0.006883129487757931
|
| 776 |
+
≥4,5000,mme_total_score,1145.157863145258,
|
| 777 |
+
≥4,5000,mmmu_val_mmmu_acc,0.26556,
|
| 778 |
+
≥4,5000,mmstar_average,0.2888277743020811,
|
| 779 |
+
≥4,5000,ocrbench_ocrbench_accuracy,0.475,
|
| 780 |
+
≥4,5000,seedbench_seed_all,0.445136186770428,
|
| 781 |
+
≥4,5000,textvqa_val_exact_match,0.46792,0.0067973094238147356
|
| 782 |
+
≥4,6000,ai2d_exact_match,0.38471502590673573,0.008756678690415541
|
| 783 |
+
≥4,6000,average,0.42131977921781544,
|
| 784 |
+
≥4,6000,average_rank,3.0,
|
| 785 |
+
≥4,6000,chartqa_relaxed_overall,0.556,0.00993907007952043
|
| 786 |
+
≥4,6000,docvqa_val_anls,0.5727106862384739,0.006269180765398416
|
| 787 |
+
≥4,6000,infovqa_val_anls,0.2310709838980833,0.006744459748098398
|
| 788 |
+
≥4,6000,mme_total_score,1139.8311324529811,
|
| 789 |
+
≥4,6000,mmmu_val_mmmu_acc,0.27,
|
| 790 |
+
≥4,6000,mmstar_average,0.30779610290926424,
|
| 791 |
+
≥4,6000,ocrbench_ocrbench_accuracy,0.492,
|
| 792 |
+
≥4,6000,seedbench_seed_all,0.4933852140077821,
|
| 793 |
+
≥4,6000,textvqa_val_exact_match,0.4841999999999999,0.006796772117869219
|
| 794 |
+
≥4,7000,ai2d_exact_match,0.39281088082901555,0.008789930274160654
|
| 795 |
+
≥4,7000,average,0.42891500537341953,
|
| 796 |
+
≥4,7000,average_rank,2.9,
|
| 797 |
+
≥4,7000,chartqa_relaxed_overall,0.576,0.009885782289560632
|
| 798 |
+
≥4,7000,docvqa_val_anls,0.5907488324071782,0.006231156163373406
|
| 799 |
+
≥4,7000,infovqa_val_anls,0.24013816441297325,0.006930097636315065
|
| 800 |
+
≥4,7000,mme_total_score,1162.137755102041,
|
| 801 |
+
≥4,7000,mmmu_val_mmmu_acc,0.27556,
|
| 802 |
+
≥4,7000,mmstar_average,0.29752599783778977,
|
| 803 |
+
≥4,7000,ocrbench_ocrbench_accuracy,0.504,
|
| 804 |
+
≥4,7000,seedbench_seed_all,0.5001111728738188,
|
| 805 |
+
≥4,7000,textvqa_val_exact_match,0.48333999999999994,0.006805450147517214
|
| 806 |
+
≥4,8000,ai2d_exact_match,0.4164507772020725,0.008872627955954676
|
| 807 |
+
≥4,8000,average,0.43574351275219425,
|
| 808 |
+
≥4,8000,average_rank,3.5,
|
| 809 |
+
≥4,8000,chartqa_relaxed_overall,0.5808,0.009870537726284339
|
| 810 |
+
≥4,8000,docvqa_val_anls,0.6057226019616091,0.0061946427553956785
|
| 811 |
+
≥4,8000,infovqa_val_anls,0.2476713069705094,0.006953489019987495
|
| 812 |
+
≥4,8000,mme_total_score,1170.280612244898,
|
| 813 |
+
≥4,8000,mmmu_val_mmmu_acc,0.26778,
|
| 814 |
+
≥4,8000,mmstar_average,0.30520454953605713,
|
| 815 |
+
≥4,8000,ocrbench_ocrbench_accuracy,0.496,
|
| 816 |
+
≥4,8000,seedbench_seed_all,0.5082823790994997,
|
| 817 |
+
≥4,8000,textvqa_val_exact_match,0.49378,0.006806491606223952
|
| 818 |
+
≥4,9000,ai2d_exact_match,0.42357512953367876,0.008893409023558714
|
| 819 |
+
≥4,9000,average,0.441337144868937,
|
| 820 |
+
≥4,9000,average_rank,3.4,
|
| 821 |
+
≥4,9000,chartqa_relaxed_overall,0.578,0.00987954665846924
|
| 822 |
+
≥4,9000,docvqa_val_anls,0.6243353881540346,0.006123815047404004
|
| 823 |
+
≥4,9000,infovqa_val_anls,0.2437398253282973,0.00692277294272151
|
| 824 |
+
≥4,9000,mme_total_score,1255.001700680272,
|
| 825 |
+
≥4,9000,mmmu_val_mmmu_acc,0.26778,
|
| 826 |
+
≥4,9000,mmstar_average,0.31167080905344985,
|
| 827 |
+
≥4,9000,ocrbench_ocrbench_accuracy,0.512,
|
| 828 |
+
≥4,9000,seedbench_seed_all,0.5116731517509727,
|
| 829 |
+
≥4,9000,textvqa_val_exact_match,0.49926000000000004,0.006799642454386958
|
| 830 |
+
≥4,10000,ai2d_exact_match,0.44462435233160624,0.00894379269709736
|
| 831 |
+
≥4,10000,average,0.4536388119594498,
|
| 832 |
+
≥4,10000,average_rank,3.6,
|
| 833 |
+
≥4,10000,chartqa_relaxed_overall,0.5992,0.00980317218424473
|
| 834 |
+
≥4,10000,docvqa_val_anls,0.6264595846035441,0.006147505656275056
|
| 835 |
+
≥4,10000,infovqa_val_anls,0.2598110483089896,0.00706252458320144
|
| 836 |
+
≥4,10000,mme_total_score,1192.952080832333,
|
| 837 |
+
≥4,10000,mmmu_val_mmmu_acc,0.28556,
|
| 838 |
+
≥4,10000,mmstar_average,0.3186673407344323,
|
| 839 |
+
≥4,10000,ocrbench_ocrbench_accuracy,0.52,
|
| 840 |
+
≥4,10000,seedbench_seed_all,0.5205669816564759,
|
| 841 |
+
≥4,10000,textvqa_val_exact_match,0.5078600000000001,0.006802447996107573
|
| 842 |
+
≥4,11000,ai2d_exact_match,0.4536917098445596,0.008960474382205324
|
| 843 |
+
≥4,11000,average,0.46152725733636885,
|
| 844 |
+
≥4,11000,average_rank,2.3,
|
| 845 |
+
≥4,11000,chartqa_relaxed_overall,0.6004,0.009798282427824488
|
| 846 |
+
≥4,11000,docvqa_val_anls,0.6401993666501584,0.0060898160255800525
|
| 847 |
+
≥4,11000,infovqa_val_anls,0.2552761209118603,0.007046581941151624
|
| 848 |
+
≥4,11000,mme_total_score,1246.6340536214486,
|
| 849 |
+
≥4,11000,mmmu_val_mmmu_acc,0.28,
|
| 850 |
+
≥4,11000,mmstar_average,0.3347344054467562,
|
| 851 |
+
≥4,11000,ocrbench_ocrbench_accuracy,0.533,
|
| 852 |
+
≥4,11000,seedbench_seed_all,0.5306837131739855,
|
| 853 |
+
≥4,11000,textvqa_val_exact_match,0.5257599999999999,0.006772980077619183
|
| 854 |
+
≥4,12000,ai2d_exact_match,0.45919689119170987,0.008969138793675545
|
| 855 |
+
≥4,12000,average,0.46100484234717837,
|
| 856 |
+
≥4,12000,average_rank,3.7,
|
| 857 |
+
≥4,12000,chartqa_relaxed_overall,0.5956,0.009817474681589429
|
| 858 |
+
≥4,12000,docvqa_val_anls,0.6409268162702743,0.006097072959583667
|
| 859 |
+
≥4,12000,infovqa_val_anls,0.26230050824120466,0.007170670588017343
|
| 860 |
+
≥4,12000,mme_total_score,1168.390556222489,
|
| 861 |
+
≥4,12000,mmmu_val_mmmu_acc,0.26889,
|
| 862 |
+
≥4,12000,mmstar_average,0.3293674421306994,
|
| 863 |
+
≥4,12000,ocrbench_ocrbench_accuracy,0.538,
|
| 864 |
+
≥4,12000,seedbench_seed_all,0.5314619232907171,
|
| 865 |
+
≥4,12000,textvqa_val_exact_match,0.5233,0.006791483405661084
|
| 866 |
+
≥4,13000,ai2d_exact_match,0.46081606217616583,0.008971477299154906
|
| 867 |
+
≥4,13000,average,0.46897968661537504,
|
| 868 |
+
≥4,13000,average_rank,3.0,
|
| 869 |
+
≥4,13000,chartqa_relaxed_overall,0.6084,0.00976411343463736
|
| 870 |
+
≥4,13000,docvqa_val_anls,0.6557097904355208,0.006045284321472833
|
| 871 |
+
≥4,13000,infovqa_val_anls,0.25716935409374025,0.007037968981507592
|
| 872 |
+
≥4,13000,mme_total_score,1214.4760904361744,
|
| 873 |
+
≥4,13000,mmmu_val_mmmu_acc,0.27444,
|
| 874 |
+
≥4,13000,mmstar_average,0.35062705343328215,
|
| 875 |
+
≥4,13000,ocrbench_ocrbench_accuracy,0.542,
|
| 876 |
+
≥4,13000,seedbench_seed_all,0.5388549193996665,
|
| 877 |
+
≥4,13000,textvqa_val_exact_match,0.5328,0.006772208248489718
|
| 878 |
+
≥4,14000,ai2d_exact_match,0.4637305699481865,0.008975446629055962
|
| 879 |
+
≥4,14000,average,0.46882712562329804,
|
| 880 |
+
≥4,14000,average_rank,3.4,
|
| 881 |
+
≥4,14000,chartqa_relaxed_overall,0.6052,0.009778109662477129
|
| 882 |
+
≥4,14000,docvqa_val_anls,0.6600293980607723,0.006003818486747537
|
| 883 |
+
≥4,14000,infovqa_val_anls,0.2604896578960276,0.0070806001081496605
|
| 884 |
+
≥4,14000,mme_total_score,1180.2360944377751,
|
| 885 |
+
≥4,14000,mmmu_val_mmmu_acc,0.29889,
|
| 886 |
+
≥4,14000,mmstar_average,0.3370405135985262,
|
| 887 |
+
≥4,14000,ocrbench_ocrbench_accuracy,0.532,
|
| 888 |
+
≥4,14000,seedbench_seed_all,0.5311839911061701,
|
| 889 |
+
≥4,14000,textvqa_val_exact_match,0.53088,0.006765681045393848
|
| 890 |
+
≥4,15000,ai2d_exact_match,0.469559585492228,0.008982461065390123
|
| 891 |
+
≥4,15000,average,0.47678210727691706,
|
| 892 |
+
≥4,15000,average_rank,3.0,
|
| 893 |
+
≥4,15000,chartqa_relaxed_overall,0.6228,0.009695651925812239
|
| 894 |
+
≥4,15000,docvqa_val_anls,0.668732849209273,0.006002172541493102
|
| 895 |
+
≥4,15000,infovqa_val_anls,0.2541377129865746,0.006911037097155498
|
| 896 |
+
≥4,15000,mme_total_score,1198.8395358143257,
|
| 897 |
+
≥4,15000,mmmu_val_mmmu_acc,0.28111,
|
| 898 |
+
≥4,15000,mmstar_average,0.3574887121899482,
|
| 899 |
+
≥4,15000,ocrbench_ocrbench_accuracy,0.558,
|
| 900 |
+
≥4,15000,seedbench_seed_all,0.5421901056142301,
|
| 901 |
+
≥4,15000,textvqa_val_exact_match,0.53702,0.0067620891069120025
|
| 902 |
+
≥4,16000,ai2d_exact_match,0.4689119170984456,0.00898174247001659
|
| 903 |
+
≥4,16000,average,0.47623501147363423,
|
| 904 |
+
≥4,16000,average_rank,3.7,
|
| 905 |
+
≥4,16000,chartqa_relaxed_overall,0.6184,0.009717527882093043
|
| 906 |
+
≥4,16000,docvqa_val_anls,0.664711612332228,0.006033753206179003
|
| 907 |
+
≥4,16000,infovqa_val_anls,0.26137627968800997,0.0069587136315641595
|
| 908 |
+
≥4,16000,mme_total_score,1223.327631052421,
|
| 909 |
+
≥4,16000,mmmu_val_mmmu_acc,0.27778,
|
| 910 |
+
≥4,16000,mmstar_average,0.3532221312757646,
|
| 911 |
+
≥4,16000,ocrbench_ocrbench_accuracy,0.545,
|
| 912 |
+
≥4,16000,seedbench_seed_all,0.5476931628682602,
|
| 913 |
+
≥4,16000,textvqa_val_exact_match,0.54902,0.006730591957147508
|
| 914 |
+
≥4,17000,ai2d_exact_match,0.47830310880829013,0.008990677331728418
|
| 915 |
+
≥4,17000,average,0.4815150623543914,
|
| 916 |
+
≥4,17000,average_rank,2.8,
|
| 917 |
+
≥4,17000,chartqa_relaxed_overall,0.6208,0.009705700605814084
|
| 918 |
+
≥4,17000,docvqa_val_anls,0.6784945768946954,0.005958779114256312
|
| 919 |
+
≥4,17000,infovqa_val_anls,0.27415576971914574,0.007211057524316044
|
| 920 |
+
≥4,17000,mme_total_score,1267.6510604241696,
|
| 921 |
+
≥4,17000,mmmu_val_mmmu_acc,0.27889,
|
| 922 |
+
≥4,17000,mmstar_average,0.35485659715149337,
|
| 923 |
+
≥4,17000,ocrbench_ocrbench_accuracy,0.55,
|
| 924 |
+
≥4,17000,seedbench_seed_all,0.5479155086158978,
|
| 925 |
+
≥4,17000,textvqa_val_exact_match,0.5502199999999999,0.006738803500215962
|
| 926 |
+
≥4,18000,ai2d_exact_match,0.4795984455958549,0.008991659681159872
|
| 927 |
+
≥4,18000,average,0.4839656525796875,
|
| 928 |
+
≥4,18000,average_rank,3.1,
|
| 929 |
+
≥4,18000,chartqa_relaxed_overall,0.6228,0.009695651925812239
|
| 930 |
+
≥4,18000,docvqa_val_anls,0.680615041882376,0.005957029786047422
|
| 931 |
+
≥4,18000,infovqa_val_anls,0.27507992619170296,0.007267921800589956
|
| 932 |
+
≥4,18000,mme_total_score,1226.5048019207684,
|
| 933 |
+
≥4,18000,mmmu_val_mmmu_acc,0.28111,
|
| 934 |
+
≥4,18000,mmstar_average,0.35607565298805366,
|
| 935 |
+
≥4,18000,ocrbench_ocrbench_accuracy,0.555,
|
| 936 |
+
≥4,18000,seedbench_seed_all,0.5532518065591996,
|
| 937 |
+
≥4,18000,textvqa_val_exact_match,0.55216,0.006730239676654988
|
| 938 |
+
≥4,19000,ai2d_exact_match,0.4734455958549223,0.008986453895645547
|
| 939 |
+
≥4,19000,average,0.485443851233213,
|
| 940 |
+
≥4,19000,average_rank,3.0,
|
| 941 |
+
≥4,19000,chartqa_relaxed_overall,0.6276,0.009670817229291067
|
| 942 |
+
≥4,19000,docvqa_val_anls,0.690884348495626,0.005908240141234498
|
| 943 |
+
≥4,19000,infovqa_val_anls,0.2676836840845966,0.007165567282387595
|
| 944 |
+
≥4,19000,mme_total_score,1323.2516006402561,
|
| 945 |
+
≥4,19000,mmmu_val_mmmu_acc,0.28556,
|
| 946 |
+
≥4,19000,mmstar_average,0.33406913716627346,
|
| 947 |
+
≥4,19000,ocrbench_ocrbench_accuracy,0.584,
|
| 948 |
+
≥4,19000,seedbench_seed_all,0.5414118954974986,
|
| 949 |
+
≥4,19000,textvqa_val_exact_match,0.56434,0.006692191716171407
|
| 950 |
+
≥4,20000,ai2d_exact_match,0.4876943005181347,0.008996428218289526
|
| 951 |
+
≥4,20000,average,0.4906341423361293,
|
| 952 |
+
≥4,20000,average_rank,3.3,
|
| 953 |
+
≥4,20000,chartqa_relaxed_overall,0.6284,0.009666579183001631
|
| 954 |
+
≥4,20000,docvqa_val_anls,0.6887236251150223,0.005918556723502163
|
| 955 |
+
≥4,20000,infovqa_val_anls,0.2809124119459898,0.007354611102020885
|
| 956 |
+
≥4,20000,mme_total_score,1254.5532212885155,
|
| 957 |
+
≥4,20000,mmmu_val_mmmu_acc,0.29333,
|
| 958 |
+
≥4,20000,mmstar_average,0.34736535367392096,
|
| 959 |
+
≥4,20000,ocrbench_ocrbench_accuracy,0.572,
|
| 960 |
+
≥4,20000,seedbench_seed_all,0.5508615897720957,
|
| 961 |
+
≥4,20000,textvqa_val_exact_match,0.56642,0.00672606309106159
|
| 962 |
+
≥5,1000,ai2d_exact_match,0.26327720207253885,0.007926662492947056
|
| 963 |
+
≥5,1000,average,0.27709006947371073,
|
| 964 |
+
≥5,1000,average_rank,2.6,
|
| 965 |
+
≥5,1000,chartqa_relaxed_overall,0.3412,0.009484144853461517
|
| 966 |
+
≥5,1000,docvqa_val_anls,0.36296241117667905,0.005852839558467308
|
| 967 |
+
≥5,1000,infovqa_val_anls,0.17994878830754762,0.006336933369747534
|
| 968 |
+
≥5,1000,mme_total_score,968.375450180072,
|
| 969 |
+
≥5,1000,mmmu_val_mmmu_acc,0.26667,
|
| 970 |
+
≥5,1000,mmstar_average,0.22684359669162246,
|
| 971 |
+
≥5,1000,ocrbench_ocrbench_accuracy,0.301,
|
| 972 |
+
≥5,1000,seedbench_seed_all,0.25152862701500833,
|
| 973 |
+
≥5,1000,textvqa_val_exact_match,0.30038,0.006282823083071704
|
| 974 |
+
≥5,2000,ai2d_exact_match,0.27331606217616583,0.008021157484423327
|
| 975 |
+
≥5,2000,average,0.318491261297989,
|
| 976 |
+
≥5,2000,average_rank,3.2,
|
| 977 |
+
≥5,2000,chartqa_relaxed_overall,0.4524,0.009956573172519544
|
| 978 |
+
≥5,2000,docvqa_val_anls,0.4578740641673747,0.006180081722767688
|
| 979 |
+
≥5,2000,infovqa_val_anls,0.1919057230410833,0.006401757863597739
|
| 980 |
+
≥5,2000,mme_total_score,1031.2603041216487,
|
| 981 |
+
≥5,2000,mmmu_val_mmmu_acc,0.24667,
|
| 982 |
+
≥5,2000,mmstar_average,0.21129996032951712,
|
| 983 |
+
≥5,2000,ocrbench_ocrbench_accuracy,0.383,
|
| 984 |
+
≥5,2000,seedbench_seed_all,0.25597554196775985,
|
| 985 |
+
≥5,2000,textvqa_val_exact_match,0.39398,0.0066750028503822015
|
| 986 |
+
≥5,3000,ai2d_exact_match,0.2661917098445596,0.007954634970279373
|
| 987 |
+
≥5,3000,average,0.3470898411915701,
|
| 988 |
+
≥5,3000,average_rank,3.1,
|
| 989 |
+
≥5,3000,chartqa_relaxed_overall,0.4888,0.009999490983443667
|
| 990 |
+
≥5,3000,docvqa_val_anls,0.5063663265388635,0.006269377896147078
|
| 991 |
+
≥5,3000,infovqa_val_anls,0.2002412084672373,0.006449644926640854
|
| 992 |
+
≥5,3000,mme_total_score,1176.8578431372548,
|
| 993 |
+
≥5,3000,mmmu_val_mmmu_acc,0.25889,
|
| 994 |
+
≥5,3000,mmstar_average,0.2226891646728035,
|
| 995 |
+
≥5,3000,ocrbench_ocrbench_accuracy,0.422,
|
| 996 |
+
≥5,3000,seedbench_seed_all,0.32229016120066706,
|
| 997 |
+
≥5,3000,textvqa_val_exact_match,0.43633999999999995,0.006743513614961789
|
| 998 |
+
≥5,4000,ai2d_exact_match,0.32091968911917096,0.008402150106895235
|
| 999 |
+
≥5,4000,average,0.38454946481840957,
|
| 1000 |
+
≥5,4000,average_rank,3.0,
|
| 1001 |
+
≥5,4000,chartqa_relaxed_overall,0.5244,0.009990083919101193
|
| 1002 |
+
≥5,4000,docvqa_val_anls,0.5408182220870532,0.0062304604635426315
|
| 1003 |
+
≥5,4000,infovqa_val_anls,0.21034975209325477,0.006529781109938355
|
| 1004 |
+
≥5,4000,mme_total_score,1186.4263705482194,
|
| 1005 |
+
≥5,4000,mmmu_val_mmmu_acc,0.26556,
|
| 1006 |
+
≥5,4000,mmstar_average,0.26918979355147643,
|
| 1007 |
+
≥5,4000,ocrbench_ocrbench_accuracy,0.452,
|
| 1008 |
+
≥5,4000,seedbench_seed_all,0.4339077265147304,
|
| 1009 |
+
≥5,4000,textvqa_val_exact_match,0.4438,0.006776008770579609
|
| 1010 |
+
≥5,5000,ai2d_exact_match,0.3494170984455959,0.008581339503665948
|
| 1011 |
+
≥5,5000,average,0.4053929772745627,
|
| 1012 |
+
≥5,5000,average_rank,2.9,
|
| 1013 |
+
≥5,5000,chartqa_relaxed_overall,0.546,0.009959582185560013
|
| 1014 |
+
≥5,5000,docvqa_val_anls,0.5611769594797935,0.006252030837783964
|
| 1015 |
+
≥5,5000,infovqa_val_anls,0.2283202771889911,0.006874345513158979
|
| 1016 |
+
≥5,5000,mme_total_score,1179.6603641456581,
|
| 1017 |
+
≥5,5000,mmmu_val_mmmu_acc,0.27556,
|
| 1018 |
+
≥5,5000,mmstar_average,0.28276518409209217,
|
| 1019 |
+
≥5,5000,ocrbench_ocrbench_accuracy,0.464,
|
| 1020 |
+
≥5,5000,seedbench_seed_all,0.4750972762645914,
|
| 1021 |
+
≥5,5000,textvqa_val_exact_match,0.4662,0.0067984671677640855
|
| 1022 |
+
≥5,6000,ai2d_exact_match,0.3636658031088083,0.008658158841882573
|
| 1023 |
+
≥5,6000,average,0.41623598541202544,
|
| 1024 |
+
≥5,6000,average_rank,2.6,
|
| 1025 |
+
≥5,6000,chartqa_relaxed_overall,0.5584,0.009933541468098847
|
| 1026 |
+
≥5,6000,docvqa_val_anls,0.5839255211800125,0.006223251970774856
|
| 1027 |
+
≥5,6000,infovqa_val_anls,0.23899504944949723,0.007013133491201096
|
| 1028 |
+
≥5,6000,mme_total_score,1252.7314925970388,
|
| 1029 |
+
≥5,6000,mmmu_val_mmmu_acc,0.27222,
|
| 1030 |
+
≥5,6000,mmstar_average,0.3101670336024846,
|
| 1031 |
+
≥5,6000,ocrbench_ocrbench_accuracy,0.473,
|
| 1032 |
+
≥5,6000,seedbench_seed_all,0.49483046136742637,
|
| 1033 |
+
≥5,6000,textvqa_val_exact_match,0.45092,0.006772193384764505
|
| 1034 |
+
≥5,7000,ai2d_exact_match,0.42033678756476683,0.008884198538329093
|
| 1035 |
+
≥5,7000,average,0.4338560588435303,
|
| 1036 |
+
≥5,7000,average_rank,2.5,
|
| 1037 |
+
≥5,7000,chartqa_relaxed_overall,0.5692,0.00990574548014469
|
| 1038 |
+
≥5,7000,docvqa_val_anls,0.5924368390904757,0.006231022369252223
|
| 1039 |
+
≥5,7000,infovqa_val_anls,0.23945153983485024,0.007006534034576772
|
| 1040 |
+
≥5,7000,mme_total_score,1315.113445378151,
|
| 1041 |
+
≥5,7000,mmmu_val_mmmu_acc,0.3,
|
| 1042 |
+
≥5,7000,mmstar_average,0.31063340423564356,
|
| 1043 |
+
≥5,7000,ocrbench_ocrbench_accuracy,0.488,
|
| 1044 |
+
≥5,7000,seedbench_seed_all,0.5067259588660367,
|
| 1045 |
+
≥5,7000,textvqa_val_exact_match,0.47791999999999996,0.006793800546466833
|
| 1046 |
+
≥5,8000,ai2d_exact_match,0.42908031088082904,0.008908169846895226
|
| 1047 |
+
≥5,8000,average,0.43778255861533233,
|
| 1048 |
+
≥5,8000,average_rank,3.0,
|
| 1049 |
+
≥5,8000,chartqa_relaxed_overall,0.5752,0.009888230116554488
|
| 1050 |
+
≥5,8000,docvqa_val_anls,0.6032859006895523,0.006193925022795706
|
| 1051 |
+
≥5,8000,infovqa_val_anls,0.24493490021598546,0.007008771158507111
|
| 1052 |
+
≥5,8000,mme_total_score,1304.6824729891955,
|
| 1053 |
+
≥5,8000,mmmu_val_mmmu_acc,0.28667,
|
| 1054 |
+
≥5,8000,mmstar_average,0.31703546216629863,
|
| 1055 |
+
≥5,8000,ocrbench_ocrbench_accuracy,0.487,
|
| 1056 |
+
≥5,8000,seedbench_seed_all,0.5096164535853251,
|
| 1057 |
+
≥5,8000,textvqa_val_exact_match,0.48722,0.006804659800386776
|
| 1058 |
+
≥5,9000,ai2d_exact_match,0.42940414507772023,0.008909003051055709
|
| 1059 |
+
≥5,9000,average,0.44649777930382,
|
| 1060 |
+
≥5,9000,average_rank,2.5,
|
| 1061 |
+
≥5,9000,chartqa_relaxed_overall,0.5792,0.009875725592704212
|
| 1062 |
+
≥5,9000,docvqa_val_anls,0.6158422097964253,0.00617698110304048
|
| 1063 |
+
≥5,9000,infovqa_val_anls,0.24039717009699607,0.0068877247346275485
|
| 1064 |
+
≥5,9000,mme_total_score,1379.7254901960782,
|
| 1065 |
+
≥5,9000,mmmu_val_mmmu_acc,0.29889,
|
| 1066 |
+
≥5,9000,mmstar_average,0.3280690123874737,
|
| 1067 |
+
≥5,9000,ocrbench_ocrbench_accuracy,0.513,
|
| 1068 |
+
≥5,9000,seedbench_seed_all,0.5234574763757643,
|
| 1069 |
+
≥5,9000,textvqa_val_exact_match,0.4902200000000001,0.006801597067199211
|
| 1070 |
+
≥5,10000,ai2d_exact_match,0.45012953367875647,0.00895427929990258
|
| 1071 |
+
≥5,10000,average,0.4555663389491801,
|
| 1072 |
+
≥5,10000,average_rank,2.9,
|
| 1073 |
+
≥5,10000,chartqa_relaxed_overall,0.5844,0.009858475126140203
|
| 1074 |
+
≥5,10000,docvqa_val_anls,0.6189420793161403,0.006040465868816934
|
| 1075 |
+
≥5,10000,infovqa_val_anls,0.24850918819779613,0.007091394184737253
|
| 1076 |
+
≥5,10000,mme_total_score,1235.7704081632655,
|
| 1077 |
+
≥5,10000,mmmu_val_mmmu_acc,0.27667,
|
| 1078 |
+
≥5,10000,mmstar_average,0.34675895640940496,
|
| 1079 |
+
≥5,10000,ocrbench_ocrbench_accuracy,0.528,
|
| 1080 |
+
≥5,10000,seedbench_seed_all,0.5291272929405225,
|
| 1081 |
+
≥5,10000,textvqa_val_exact_match,0.51756,0.006786717998284417
|
| 1082 |
+
≥5,11000,ai2d_exact_match,0.4566062176165803,0.008965198879336196
|
| 1083 |
+
≥5,11000,average,0.458748059148796,
|
| 1084 |
+
≥5,11000,average_rank,2.9,
|
| 1085 |
+
≥5,11000,chartqa_relaxed_overall,0.5916,0.0098327233755248
|
| 1086 |
+
≥5,11000,docvqa_val_anls,0.633602507666147,0.006134122729213928
|
| 1087 |
+
≥5,11000,infovqa_val_anls,0.2621320066294427,0.007275786683175354
|
| 1088 |
+
≥5,11000,mme_total_score,1326.4276710684273,
|
| 1089 |
+
≥5,11000,mmmu_val_mmmu_acc,0.27667,
|
| 1090 |
+
≥5,11000,mmstar_average,0.34479339575773355,
|
| 1091 |
+
≥5,11000,ocrbench_ocrbench_accuracy,0.517,
|
| 1092 |
+
≥5,11000,seedbench_seed_all,0.5311284046692607,
|
| 1093 |
+
≥5,11000,textvqa_val_exact_match,0.5152000000000001,0.006786619456012555
|
| 1094 |
+
≥5,12000,ai2d_exact_match,0.45919689119170987,0.008969138793675545
|
| 1095 |
+
≥5,12000,average,0.4644995480385772,
|
| 1096 |
+
≥5,12000,average_rank,2.3,
|
| 1097 |
+
≥5,12000,chartqa_relaxed_overall,0.596,0.009815912634917984
|
| 1098 |
+
≥5,12000,docvqa_val_anls,0.6453485539631237,0.006065269954977215
|
| 1099 |
+
≥5,12000,infovqa_val_anls,0.2685572578806166,0.007278550841020009
|
| 1100 |
+
≥5,12000,mme_total_score,1374.9406762705082,
|
| 1101 |
+
≥5,12000,mmmu_val_mmmu_acc,0.28444,
|
| 1102 |
+
≥5,12000,mmstar_average,0.35205377405882654,
|
| 1103 |
+
≥5,12000,ocrbench_ocrbench_accuracy,0.519,
|
| 1104 |
+
≥5,12000,seedbench_seed_all,0.5350194552529183,
|
| 1105 |
+
≥5,12000,textvqa_val_exact_match,0.52088,0.006777757204160069
|
| 1106 |
+
≥5,13000,ai2d_exact_match,0.4640544041450777,0.008975868633841907
|
| 1107 |
+
≥5,13000,average,0.4696984757423332,
|
| 1108 |
+
≥5,13000,average_rank,2.7,
|
| 1109 |
+
≥5,13000,chartqa_relaxed_overall,0.608,0.00976588700628918
|
| 1110 |
+
≥5,13000,docvqa_val_anls,0.6599237778239753,0.006035894149838363
|
| 1111 |
+
≥5,13000,infovqa_val_anls,0.25759117282312316,0.007107246020667877
|
| 1112 |
+
≥5,13000,mme_total_score,1326.0453181272508,
|
| 1113 |
+
≥5,13000,mmmu_val_mmmu_acc,0.28667,
|
| 1114 |
+
≥5,13000,mmstar_average,0.35252858336464304,
|
| 1115 |
+
≥5,13000,ocrbench_ocrbench_accuracy,0.533,
|
| 1116 |
+
≥5,13000,seedbench_seed_all,0.5330183435241801,
|
| 1117 |
+
≥5,13000,textvqa_val_exact_match,0.5325,0.006770636476998357
|
| 1118 |
+
≥5,14000,ai2d_exact_match,0.4689119170984456,0.00898174247001659
|
| 1119 |
+
≥5,14000,average,0.47293227498131896,
|
| 1120 |
+
≥5,14000,average_rank,2.7,
|
| 1121 |
+
≥5,14000,chartqa_relaxed_overall,0.614,0.009738559226822298
|
| 1122 |
+
≥5,14000,docvqa_val_anls,0.6583491716485876,0.0060256160547597325
|
| 1123 |
+
≥5,14000,infovqa_val_anls,0.26613522559599984,0.0071532088405842145
|
| 1124 |
+
≥5,14000,mme_total_score,1278.5425170068027,
|
| 1125 |
+
≥5,14000,mmmu_val_mmmu_acc,0.28,
|
| 1126 |
+
≥5,14000,mmstar_average,0.35624004153386235,
|
| 1127 |
+
≥5,14000,ocrbench_ocrbench_accuracy,0.55,
|
| 1128 |
+
≥5,14000,seedbench_seed_all,0.5454141189549749,
|
| 1129 |
+
≥5,14000,textvqa_val_exact_match,0.5173399999999999,0.006787096420087393
|
| 1130 |
+
≥5,15000,ai2d_exact_match,0.4740932642487047,0.008987066275159846
|
| 1131 |
+
≥5,15000,average,0.47568039073709784,
|
| 1132 |
+
≥5,15000,average_rank,3.0,
|
| 1133 |
+
≥5,15000,chartqa_relaxed_overall,0.602,0.00979166741164548
|
| 1134 |
+
≥5,15000,docvqa_val_anls,0.6649825816931088,0.006012202194059076
|
| 1135 |
+
≥5,15000,infovqa_val_anls,0.2659187859072639,0.007233849219121225
|
| 1136 |
+
≥5,15000,mme_total_score,1301.498799519808,
|
| 1137 |
+
≥5,15000,mmmu_val_mmmu_acc,0.30333,
|
| 1138 |
+
≥5,15000,mmstar_average,0.363574304462402,
|
| 1139 |
+
≥5,15000,ocrbench_ocrbench_accuracy,0.536,
|
| 1140 |
+
≥5,15000,seedbench_seed_all,0.5402445803224013,
|
| 1141 |
+
≥5,15000,textvqa_val_exact_match,0.53098,0.006774896882281907
|
| 1142 |
+
≥5,16000,ai2d_exact_match,0.47538860103626945,0.008988245555188545
|
| 1143 |
+
≥5,16000,average,0.48103362013771567,
|
| 1144 |
+
≥5,16000,average_rank,2.6,
|
| 1145 |
+
≥5,16000,chartqa_relaxed_overall,0.6172,0.009723347231923635
|
| 1146 |
+
≥5,16000,docvqa_val_anls,0.6661394800733964,0.006000339067695713
|
| 1147 |
+
≥5,16000,infovqa_val_anls,0.27200681388207976,0.007361243845813883
|
| 1148 |
+
≥5,16000,mme_total_score,1312.4185674269709,
|
| 1149 |
+
≥5,16000,mmmu_val_mmmu_acc,0.30667,
|
| 1150 |
+
≥5,16000,mmstar_average,0.352673072573432,
|
| 1151 |
+
≥5,16000,ocrbench_ocrbench_accuracy,0.553,
|
| 1152 |
+
≥5,16000,seedbench_seed_all,0.5483046136742635,
|
| 1153 |
+
≥5,16000,textvqa_val_exact_match,0.53792,0.0067618902203356104
|
| 1154 |
+
≥5,17000,ai2d_exact_match,0.4740932642487047,0.008987066275159846
|
| 1155 |
+
≥5,17000,average,0.4842246444979549,
|
| 1156 |
+
≥5,17000,average_rank,3.1,
|
| 1157 |
+
≥5,17000,chartqa_relaxed_overall,0.6252,0.009683361554563506
|
| 1158 |
+
≥5,17000,docvqa_val_anls,0.6727784028551866,0.005982986502554192
|
| 1159 |
+
≥5,17000,infovqa_val_anls,0.273461783643309,0.00736211121641681
|
| 1160 |
+
≥5,17000,mme_total_score,1256.561224489796,
|
| 1161 |
+
≥5,17000,mmmu_val_mmmu_acc,0.31889,
|
| 1162 |
+
≥5,17000,mmstar_average,0.35664172938975786,
|
| 1163 |
+
≥5,17000,ocrbench_ocrbench_accuracy,0.545,
|
| 1164 |
+
≥5,17000,seedbench_seed_all,0.549916620344636,
|
| 1165 |
+
≥5,17000,textvqa_val_exact_match,0.5420400000000001,0.006760567190239792
|
| 1166 |
+
≥5,18000,ai2d_exact_match,0.4802461139896373,0.008992128148477658
|
| 1167 |
+
≥5,18000,average,0.48392876207158253,
|
| 1168 |
+
≥5,18000,average_rank,2.9,
|
| 1169 |
+
≥5,18000,chartqa_relaxed_overall,0.6252,0.009683361554563506
|
| 1170 |
+
≥5,18000,docvqa_val_anls,0.68034033548242,0.005889534044935538
|
| 1171 |
+
≥5,18000,infovqa_val_anls,0.28015930560506613,0.007380855727131182
|
| 1172 |
+
≥5,18000,mme_total_score,1380.5266106442577,
|
| 1173 |
+
≥5,18000,mmmu_val_mmmu_acc,0.3,
|
| 1174 |
+
≥5,18000,mmstar_average,0.34877922919246646,
|
| 1175 |
+
≥5,18000,ocrbench_ocrbench_accuracy,0.549,
|
| 1176 |
+
≥5,18000,seedbench_seed_all,0.5529738743746526,
|
| 1177 |
+
≥5,18000,textvqa_val_exact_match,0.5386599999999999,0.0067648675941745775
|
| 1178 |
+
≥5,19000,ai2d_exact_match,0.4805699481865285,0.008992356706334513
|
| 1179 |
+
≥5,19000,average,0.49271643602329757,
|
| 1180 |
+
≥5,19000,average_rank,2.8,
|
| 1181 |
+
≥5,19000,chartqa_relaxed_overall,0.6248,0.009685427559111736
|
| 1182 |
+
≥5,19000,docvqa_val_anls,0.6825000217737053,0.005934471601355602
|
| 1183 |
+
≥5,19000,infovqa_val_anls,0.2841253071402532,0.007403930662950274
|
| 1184 |
+
≥5,19000,mme_total_score,1261.751700680272,
|
| 1185 |
+
≥5,19000,mmmu_val_mmmu_acc,0.32,
|
| 1186 |
+
≥5,19000,mmstar_average,0.3611420745688909,
|
| 1187 |
+
≥5,19000,ocrbench_ocrbench_accuracy,0.572,
|
| 1188 |
+
≥5,19000,seedbench_seed_all,0.5550305725403002,
|
| 1189 |
+
≥5,19000,textvqa_val_exact_match,0.5542799999999999,0.006739897741383979
|
| 1190 |
+
≥5,20000,ai2d_exact_match,0.4844559585492228,0.008994804366753555
|
| 1191 |
+
≥5,20000,average,0.49543136618963995,
|
| 1192 |
+
≥5,20000,average_rank,2.9,
|
| 1193 |
+
≥5,20000,chartqa_relaxed_overall,0.638,0.009613499245701268
|
| 1194 |
+
≥5,20000,docvqa_val_anls,0.688451623661496,0.005905200575549553
|
| 1195 |
+
≥5,20000,infovqa_val_anls,0.2789607789162199,0.007289443671361681
|
| 1196 |
+
≥5,20000,mme_total_score,1296.043617446979,
|
| 1197 |
+
≥5,20000,mmmu_val_mmmu_acc,0.3,
|
| 1198 |
+
≥5,20000,mmstar_average,0.3804272197382418,
|
| 1199 |
+
≥5,20000,ocrbench_ocrbench_accuracy,0.577,
|
| 1200 |
+
≥5,20000,seedbench_seed_all,0.5560867148415787,
|
| 1201 |
+
≥5,20000,textvqa_val_exact_match,0.5555,0.006734970078953051
|
app/src/content/assets/data/remove_ch.csv
ADDED
|
@@ -0,0 +1,455 @@
|
+run,step,metric,value,stderr
+Baseline,1000,ai2d_exact_match,0.2548575129533679,0.007843322436924496
+Baseline,1000,average,0.27120689295763617,
+Baseline,1000,average_rank,1.7,
+Baseline,1000,chartqa_relaxed_overall,0.3308,0.009411906161401973
+Baseline,1000,docvqa_val_anls,0.3528553494243383,0.005852289239342309
+Baseline,1000,infovqa_val_anls,0.17320578642581314,0.006297063452679795
+Baseline,1000,mme_total_score,977.4280712284914,
+Baseline,1000,mmmu_val_mmmu_acc,0.25222,
+Baseline,1000,mmstar_average,0.23215874078908072,
+Baseline,1000,ocrbench_ocrbench_accuracy,0.286,
+Baseline,1000,seedbench_seed_all,0.2563646470261256,
+Baseline,1000,textvqa_val_exact_match,0.3024,0.00628900296642181
+Baseline,2000,ai2d_exact_match,0.26295336787564766,0.007923526907377255
+Baseline,2000,average,0.3202068275596269,
+Baseline,2000,average_rank,1.5,
+Baseline,2000,chartqa_relaxed_overall,0.4688,0.009982508912777261
+Baseline,2000,docvqa_val_anls,0.4452261510942785,0.00614755494712251
+Baseline,2000,infovqa_val_anls,0.1820547866557169,0.006217861455795791
+Baseline,2000,mme_total_score,1049.3036214485794,
+Baseline,2000,mmmu_val_mmmu_acc,0.24556,
+Baseline,2000,mmstar_average,0.21305462434540698,
+Baseline,2000,ocrbench_ocrbench_accuracy,0.395,
+Baseline,2000,seedbench_seed_all,0.258532518065592,
+Baseline,2000,textvqa_val_exact_match,0.41068000000000005,0.006697862330024289
+Baseline,3000,ai2d_exact_match,0.25226683937823835,0.007816909588794397
+Baseline,3000,average,0.3507423834414229,
+Baseline,3000,average_rank,1.6,
+Baseline,3000,chartqa_relaxed_overall,0.5028,0.010001843767601082
+Baseline,3000,docvqa_val_anls,0.502653993831009,0.006267072346683124
+Baseline,3000,infovqa_val_anls,0.21728617578189535,0.006796941784959762
+Baseline,3000,mme_total_score,1170.2383953581434,
+Baseline,3000,mmmu_val_mmmu_acc,0.27556,
+Baseline,3000,mmstar_average,0.25432376938577683,
+Baseline,3000,ocrbench_ocrbench_accuracy,0.436,
+Baseline,3000,seedbench_seed_all,0.2792106725958866,
+Baseline,3000,textvqa_val_exact_match,0.43658,0.006766885462882726
+Baseline,4000,ai2d_exact_match,0.2645725388601036,0.007939149662089447
+Baseline,4000,average,0.36961781722974835,
+Baseline,4000,average_rank,1.6,
+Baseline,4000,chartqa_relaxed_overall,0.5312,0.009982508912777261
+Baseline,4000,docvqa_val_anls,0.5374434618615119,0.0062905728113059655
+Baseline,4000,infovqa_val_anls,0.2287924838861707,0.006994568698639919
+Baseline,4000,mme_total_score,1155.203781512605,
+Baseline,4000,mmmu_val_mmmu_acc,0.25556,
+Baseline,4000,mmstar_average,0.2575590188757354,
+Baseline,4000,ocrbench_ocrbench_accuracy,0.453,
+Baseline,4000,seedbench_seed_all,0.33913285158421347,
+Baseline,4000,textvqa_val_exact_match,0.4593,0.006791695475025738
+Baseline,5000,ai2d_exact_match,0.3125,0.008342439145556371
+Baseline,5000,average,0.3974627910380972,
+Baseline,5000,average_rank,1.6,
+Baseline,5000,chartqa_relaxed_overall,0.5488,0.00995424828018316
+Baseline,5000,docvqa_val_anls,0.552360266782429,0.006300308519952055
+Baseline,5000,infovqa_val_anls,0.23425555286643698,0.007002254622066442
+Baseline,5000,mme_total_score,1181.4653861544618,
+Baseline,5000,mmmu_val_mmmu_acc,0.26667,
+Baseline,5000,mmstar_average,0.29596648146165705,
+Baseline,5000,ocrbench_ocrbench_accuracy,0.462,
+Baseline,5000,seedbench_seed_all,0.43107281823235133,
+Baseline,5000,textvqa_val_exact_match,0.47354000000000007,0.0068172185364497985
+Baseline,6000,ai2d_exact_match,0.358160621761658,0.008629463221867162
+Baseline,6000,average,0.4161227404571003,
+Baseline,6000,average_rank,1.7,
+Baseline,6000,chartqa_relaxed_overall,0.5628,0.00992279440175477
+Baseline,6000,docvqa_val_anls,0.5747451497228876,0.00625495440870239
+Baseline,6000,infovqa_val_anls,0.22152017368968838,0.006604546680525351
+Baseline,6000,mme_total_score,1284.1648659463785,
+Baseline,6000,mmmu_val_mmmu_acc,0.27111,
+Baseline,6000,mmstar_average,0.2978489412854164,
+Baseline,6000,ocrbench_ocrbench_accuracy,0.495,
+Baseline,6000,seedbench_seed_all,0.4795997776542524,
+Baseline,6000,textvqa_val_exact_match,0.48432,0.006800535050670284
+Baseline,7000,ai2d_exact_match,0.3707901554404145,0.00869347755587734
+Baseline,7000,average,0.4291083177345374,
+Baseline,7000,average_rank,1.6,
+Baseline,7000,chartqa_relaxed_overall,0.5656,0.009915542506251351
+Baseline,7000,docvqa_val_anls,0.5940907049431567,0.006224236305767187
+Baseline,7000,infovqa_val_anls,0.2515675215816963,0.007105097396092786
+Baseline,7000,mme_total_score,1185.875650260104,
+Baseline,7000,mmmu_val_mmmu_acc,0.26556,
+Baseline,7000,mmstar_average,0.31372400960777047,
+Baseline,7000,ocrbench_ocrbench_accuracy,0.504,
+Baseline,7000,seedbench_seed_all,0.4964424680377988,
+Baseline,7000,textvqa_val_exact_match,0.5002,0.006794794025220267
+Baseline,8000,ai2d_exact_match,0.37759067357512954,0.008725299846043883
+Baseline,8000,average,0.43846759477995995,
+Baseline,8000,average_rank,1.8,
+Baseline,8000,chartqa_relaxed_overall,0.5832,0.009862556058385773
+Baseline,8000,docvqa_val_anls,0.6017336419437208,0.006231612198089698
+Baseline,8000,infovqa_val_anls,0.2449256624147254,0.006992518502948913
+Baseline,8000,mme_total_score,1199.2409963985594,
+Baseline,8000,mmmu_val_mmmu_acc,0.28111,
+Baseline,8000,mmstar_average,0.33512257186205047,
+Baseline,8000,ocrbench_ocrbench_accuracy,0.51,
+Baseline,8000,seedbench_seed_all,0.5024458032240133,
+Baseline,8000,textvqa_val_exact_match,0.51008,0.006796301690135059
+Baseline,9000,ai2d_exact_match,0.4067357512953368,0.008841214921078996
+Baseline,9000,average,0.4422510732201056,
+Baseline,9000,average_rank,1.8,
+Baseline,9000,chartqa_relaxed_overall,0.5912,0.009834211136815875
+Baseline,9000,docvqa_val_anls,0.6170968481662739,0.00617235763542544
+Baseline,9000,infovqa_val_anls,0.23537031288570615,0.00670318154156447
+Baseline,9000,mme_total_score,1231.5195078031213,
+Baseline,9000,mmmu_val_mmmu_acc,0.25889,
+Baseline,9000,mmstar_average,0.3216444898242951,
+Baseline,9000,ocrbench_ocrbench_accuracy,0.515,
+Baseline,9000,seedbench_seed_all,0.5120622568093385,
+Baseline,9000,textvqa_val_exact_match,0.52226,0.006792711289708482
+Baseline,10000,ai2d_exact_match,0.39993523316062174,0.008817096257082848
+Baseline,10000,average,0.4523875703250908,
+Baseline,10000,average_rank,1.7,
+Baseline,10000,chartqa_relaxed_overall,0.5996,0.00980154906867574
+Baseline,10000,docvqa_val_anls,0.6262613496433054,0.006147756371688175
+Baseline,10000,infovqa_val_anls,0.263290074230132,0.007186788766942786
+Baseline,10000,mme_total_score,1240.8218287314926,
+Baseline,10000,mmmu_val_mmmu_acc,0.28778,
+Baseline,10000,mmstar_average,0.32972717906018517,
+Baseline,10000,ocrbench_ocrbench_accuracy,0.517,
+Baseline,10000,seedbench_seed_all,0.5217342968315731,
+Baseline,10000,textvqa_val_exact_match,0.5261600000000001,0.006785774843600811
+Baseline,11000,ai2d_exact_match,0.422279792746114,0.008889771831066474
+Baseline,11000,average,0.4561398159525099,
+Baseline,11000,average_rank,1.7,
+Baseline,11000,chartqa_relaxed_overall,0.6104,0.009755142291143075
+Baseline,11000,docvqa_val_anls,0.6373130149166712,0.006128022584995044
+Baseline,11000,infovqa_val_anls,0.24419378339723755,0.006897644885887063
+Baseline,11000,mme_total_score,1322.9488795518205,
+Baseline,11000,mmmu_val_mmmu_acc,0.27778,
+Baseline,11000,mmstar_average,0.3298563439522548,
+Baseline,11000,ocrbench_ocrbench_accuracy,0.521,
+Baseline,11000,seedbench_seed_all,0.5237354085603113,
+Baseline,11000,textvqa_val_exact_match,0.5387,0.006770851562852138
+Baseline,12000,ai2d_exact_match,0.42001295336787564,0.008883255931688034
+Baseline,12000,average,0.4582751140055433,
+Baseline,12000,average_rank,1.7,
+Baseline,12000,chartqa_relaxed_overall,0.618,0.009719474639861454
+Baseline,12000,docvqa_val_anls,0.6393961983751871,0.0061228747388476674
+Baseline,12000,infovqa_val_anls,0.24798874058574302,0.006855374548993139
+Baseline,12000,mme_total_score,1225.6453581432572,
+Baseline,12000,mmmu_val_mmmu_acc,0.27889,
+Baseline,12000,mmstar_average,0.34010867846816534,
+Baseline,12000,ocrbench_ocrbench_accuracy,0.512,
+Baseline,12000,seedbench_seed_all,0.5350194552529183,
+Baseline,12000,textvqa_val_exact_match,0.5330600000000001,0.006777713092109446
+Baseline,13000,ai2d_exact_match,0.4375,0.008928571428571428
+Baseline,13000,average,0.4692868662590049,
+Baseline,13000,average_rank,1.4,
+Baseline,13000,chartqa_relaxed_overall,0.6148,0.00973479791861169
+Baseline,13000,docvqa_val_anls,0.6511374872549951,0.006086953065248391
+Baseline,13000,infovqa_val_anls,0.24465055100441893,0.006808432538374664
+Baseline,13000,mme_total_score,1281.7122849139657,
+Baseline,13000,mmmu_val_mmmu_acc,0.28222,
+Baseline,13000,mmstar_average,0.3453069542917521,
+Baseline,13000,ocrbench_ocrbench_accuracy,0.549,
+Baseline,13000,seedbench_seed_all,0.5442468037798777,
+Baseline,13000,textvqa_val_exact_match,0.55472,0.0067416788982325
+Baseline,14000,ai2d_exact_match,0.4572538860103627,0.00896620675297095
+Baseline,14000,average,0.47352486841689195,
+Baseline,14000,average_rank,1.3,
+Baseline,14000,chartqa_relaxed_overall,0.6172,0.009723347231923635
+Baseline,14000,docvqa_val_anls,0.6502269393708169,0.006057950730638126
+Baseline,14000,infovqa_val_anls,0.25805460837190913,0.007037735231659539
+Baseline,14000,mme_total_score,1309.1444577831132,
+Baseline,14000,mmmu_val_mmmu_acc,0.28111,
+Baseline,14000,mmstar_average,0.34575818188776586,
+Baseline,14000,ocrbench_ocrbench_accuracy,0.551,
+Baseline,14000,seedbench_seed_all,0.5483602001111729,
+Baseline,14000,textvqa_val_exact_match,0.55276,0.006751206724612103
+Baseline,15000,ai2d_exact_match,0.45045336787564766,0.008954861634252399
+Baseline,15000,average,0.47878665012878824,
+Baseline,15000,average_rank,1.2,
+Baseline,15000,chartqa_relaxed_overall,0.612,0.009747841205275417
+Baseline,15000,docvqa_val_anls,0.6621413031955148,0.006056838050222495
+Baseline,15000,infovqa_val_anls,0.2706898598157733,0.007200315730154543
+Baseline,15000,mme_total_score,1384.2171868747498,
+Baseline,15000,mmmu_val_mmmu_acc,0.30222,
+Baseline,15000,mmstar_average,0.35408135695920684,
+Baseline,15000,ocrbench_ocrbench_accuracy,0.558,
+Baseline,15000,seedbench_seed_all,0.5411339633129516,
+Baseline,15000,textvqa_val_exact_match,0.5583600000000001,0.0067279027203879065
+Baseline,16000,ai2d_exact_match,0.45077720207253885,0.008955440137395838
+Baseline,16000,average,0.47665128022935843,
+Baseline,16000,average_rank,1.5,
+Baseline,16000,chartqa_relaxed_overall,0.632,0.00964715642305132
+Baseline,16000,docvqa_val_anls,0.6709415729142987,0.005999818105621502
+Baseline,16000,infovqa_val_anls,0.26050032542402035,0.006997451875879188
+Baseline,16000,mme_total_score,1317.8491396558625,
+Baseline,16000,mmmu_val_mmmu_acc,0.27556,
+Baseline,16000,mmstar_average,0.33214333327093315,
+Baseline,16000,ocrbench_ocrbench_accuracy,0.56,
+Baseline,16000,seedbench_seed_all,0.5463590883824346,
+Baseline,16000,textvqa_val_exact_match,0.56158,0.006723854754867398
+Baseline,17000,ai2d_exact_match,0.45919689119170987,0.008969138793675545
+Baseline,17000,average,0.4777141780162423,
+Baseline,17000,average_rank,1.2,
+Baseline,17000,chartqa_relaxed_overall,0.632,0.00964715642305132
+Baseline,17000,docvqa_val_anls,0.6796338519136422,0.005948761388267941
+Baseline,17000,infovqa_val_anls,0.28070956072505215,0.007298333094144192
+Baseline,17000,mme_total_score,1381.9161664665867,
+Baseline,17000,mmmu_val_mmmu_acc,0.27667,
+Baseline,17000,mmstar_average,0.3370289492329521,
+Baseline,17000,ocrbench_ocrbench_accuracy,0.519,
+Baseline,17000,seedbench_seed_all,0.5510283490828238,
+Baseline,17000,textvqa_val_exact_match,0.56416,0.006724830373229479
+Baseline,18000,ai2d_exact_match,0.46567357512953367,0.008977921602780726
+Baseline,18000,average,0.4819834595278701,
+Baseline,18000,average_rank,1.1,
+Baseline,18000,chartqa_relaxed_overall,0.6376,0.009615793331418735
+Baseline,18000,docvqa_val_anls,0.6775884603912571,0.005972234236435759
+Baseline,18000,infovqa_val_anls,0.27154318420389256,0.007164903131667027
+Baseline,18000,mme_total_score,1336.922769107643,
+Baseline,18000,mmmu_val_mmmu_acc,0.28667,
+Baseline,18000,mmstar_average,0.34482796716566916,
+Baseline,18000,ocrbench_ocrbench_accuracy,0.533,
+Baseline,18000,seedbench_seed_all,0.5543079488604781,
+Baseline,18000,textvqa_val_exact_match,0.5666399999999999,0.006713392287599574
+Baseline,19000,ai2d_exact_match,0.4682642487046632,0.008981008686994101
+Baseline,19000,average,0.4899006713916878,
+Baseline,19000,chartqa_relaxed_overall,0.6444,0.009575809858898698
+Baseline,19000,docvqa_val_anls,0.678226526479947,0.005970619221588814
+Baseline,19000,infovqa_val_anls,0.26993847247278,0.0071348470764911525
+Baseline,19000,mme_total_score,1406.6628651460583,
+Baseline,19000,mmmu_val_mmmu_acc,0.28333,
+Baseline,19000,mmstar_average,0.356220913822775,
+Baseline,19000,ocrbench_ocrbench_accuracy,0.577,
+Baseline,19000,seedbench_seed_all,0.554585881045025,
+Baseline,19000,textvqa_val_exact_match,0.57714,0.0066918487914812905
+Baseline,20000,ai2d_exact_match,0.47571243523316065,0.00898853090258662
+Baseline,20000,average,0.4873169067639118,
+Baseline,20000,chartqa_relaxed_overall,0.6336,0.009638338810708618
+Baseline,20000,docvqa_val_anls,0.6895214454380043,0.005896462073053767
+Baseline,20000,infovqa_val_anls,0.2655657550458317,0.007033265532032538
+Baseline,20000,mme_total_score,1324.6738695478193,
+Baseline,20000,mmmu_val_mmmu_acc,0.30111,
+Baseline,20000,mmstar_average,0.33806766134497995,
+Baseline,20000,ocrbench_ocrbench_accuracy,0.555,
+Baseline,20000,seedbench_seed_all,0.5587548638132296,
+Baseline,20000,textvqa_val_exact_match,0.56852,0.006720151338087659
+Remove Multilingual Data,1000,ai2d_exact_match,0.2619818652849741,0.007914086941902855
+Remove Multilingual Data,1000,average,0.29340443385847137,
+Remove Multilingual Data,1000,average_rank,1.3,
+Remove Multilingual Data,1000,chartqa_relaxed_overall,0.3736,0.009677121197436144
+Remove Multilingual Data,1000,docvqa_val_anls,0.403140100303888,0.006111323163666132
+Remove Multilingual Data,1000,infovqa_val_anls,0.1764617576183696,0.006251319736392345
+Remove Multilingual Data,1000,mme_total_score,979.3045218087235,
+Remove Multilingual Data,1000,mmmu_val_mmmu_acc,0.25222,
+Remove Multilingual Data,1000,mmstar_average,0.2073057646207335,
+Remove Multilingual Data,1000,ocrbench_ocrbench_accuracy,0.333,
+Remove Multilingual Data,1000,seedbench_seed_all,0.2507504168982768,
+Remove Multilingual Data,1000,textvqa_val_exact_match,0.38218,0.006631325992355026
+Remove Multilingual Data,2000,ai2d_exact_match,0.25291450777202074,0.007823547213659585
+Remove Multilingual Data,2000,average,0.32254499165624334,
+Remove Multilingual Data,2000,average_rank,1.5,
+Remove Multilingual Data,2000,chartqa_relaxed_overall,0.4692,0.009983005968307607
+Remove Multilingual Data,2000,docvqa_val_anls,0.472590835723597,0.006255090657185791
+Remove Multilingual Data,2000,infovqa_val_anls,0.19402428600531574,0.006415305613638088
+Remove Multilingual Data,2000,mme_total_score,1067.5286114445778,
+Remove Multilingual Data,2000,mmmu_val_mmmu_acc,0.24444,
+Remove Multilingual Data,2000,mmstar_average,0.20544885849586278,
+Remove Multilingual Data,2000,ocrbench_ocrbench_accuracy,0.409,
+Remove Multilingual Data,2000,seedbench_seed_all,0.2555864369093941,
+Remove Multilingual Data,2000,textvqa_val_exact_match,0.3997,0.006677042652231296
+Remove Multilingual Data,3000,ai2d_exact_match,0.2658678756476684,0.00795154886571598
+Remove Multilingual Data,3000,average,0.35383248024337044,
+Remove Multilingual Data,3000,average_rank,1.4,
+Remove Multilingual Data,3000,chartqa_relaxed_overall,0.536,0.009976041728231964
+Remove Multilingual Data,3000,docvqa_val_anls,0.5115050780592246,0.006297134520533815
+Remove Multilingual Data,3000,infovqa_val_anls,0.1959317380528948,0.006353999153527862
+Remove Multilingual Data,3000,mme_total_score,1055.7074829931971,
+Remove Multilingual Data,3000,mmmu_val_mmmu_acc,0.26,
+Remove Multilingual Data,3000,mmstar_average,0.2325690534433309,
+Remove Multilingual Data,3000,ocrbench_ocrbench_accuracy,0.449,
+Remove Multilingual Data,3000,seedbench_seed_all,0.28943857698721515,
+Remove Multilingual Data,3000,textvqa_val_exact_match,0.44418,0.0067730052591185854
+Remove Multilingual Data,4000,ai2d_exact_match,0.2856217616580311,0.008130016747303466
+Remove Multilingual Data,4000,average,0.3775873253769421,
+Remove Multilingual Data,4000,average_rank,1.4,
+Remove Multilingual Data,4000,chartqa_relaxed_overall,0.55,0.009951864943131942
+Remove Multilingual Data,4000,docvqa_val_anls,0.5339851175847934,0.0062957385772197255
+Remove Multilingual Data,4000,infovqa_val_anls,0.20750676546327357,0.006369425500899887
+Remove Multilingual Data,4000,mme_total_score,1228.202280912365,
+Remove Multilingual Data,4000,mmmu_val_mmmu_acc,0.27111,
+Remove Multilingual Data,4000,mmstar_average,0.24655460164079995,
+Remove Multilingual Data,4000,ocrbench_ocrbench_accuracy,0.456,
+Remove Multilingual Data,4000,seedbench_seed_all,0.3898276820455809,
+Remove Multilingual Data,4000,textvqa_val_exact_match,0.45768000000000003,0.006781666588703993
+Remove Multilingual Data,5000,ai2d_exact_match,0.3121761658031088,0.008340079044408505
+Remove Multilingual Data,5000,average,0.3976192139479395,
+Remove Multilingual Data,5000,average_rank,1.4,
+Remove Multilingual Data,5000,chartqa_relaxed_overall,0.5684,0.009907968668564455
+Remove Multilingual Data,5000,docvqa_val_anls,0.5611339219828478,0.006260862186673622
+Remove Multilingual Data,5000,infovqa_val_anls,0.21913407408993218,0.006638320670102091
+Remove Multilingual Data,5000,mme_total_score,1219.2377951180472,
+Remove Multilingual Data,5000,mmmu_val_mmmu_acc,0.29444,
+Remove Multilingual Data,5000,mmstar_average,0.23556637343877926,
+Remove Multilingual Data,5000,ocrbench_ocrbench_accuracy,0.472,
+Remove Multilingual Data,5000,seedbench_seed_all,0.4443023902167871,
+Remove Multilingual Data,5000,textvqa_val_exact_match,0.47142,0.006807048104779351
+Remove Multilingual Data,6000,ai2d_exact_match,0.35200777202072536,0.008595926828224822
+Remove Multilingual Data,6000,average,0.42451996443270734,
+Remove Multilingual Data,6000,average_rank,1.3,
+Remove Multilingual Data,6000,chartqa_relaxed_overall,0.5744,0.009890651444389179
+Remove Multilingual Data,6000,docvqa_val_anls,0.5825552977560686,0.006257174245982806
+Remove Multilingual Data,6000,infovqa_val_anls,0.252828230577843,0.007149939162213116
+Remove Multilingual Data,6000,mme_total_score,1216.607643057223,
+Remove Multilingual Data,6000,mmmu_val_mmmu_acc,0.30222,
+Remove Multilingual Data,6000,mmstar_average,0.2807390632529032,
+Remove Multilingual Data,6000,ocrbench_ocrbench_accuracy,0.497,
+Remove Multilingual Data,6000,seedbench_seed_all,0.484769316286826,
+Remove Multilingual Data,6000,textvqa_val_exact_match,0.49416000000000004,0.006798707477504303
+Remove Multilingual Data,7000,ai2d_exact_match,0.3801813471502591,0.008736941116932581
+Remove Multilingual Data,7000,average,0.428085510128325,
+Remove Multilingual Data,7000,average_rank,1.4,
+Remove Multilingual Data,7000,chartqa_relaxed_overall,0.5796,0.009874438607593145
+Remove Multilingual Data,7000,docvqa_val_anls,0.5966369586509165,0.006224801729990067
+Remove Multilingual Data,7000,infovqa_val_anls,0.23354910759447625,0.006817906701297544
+Remove Multilingual Data,7000,mme_total_score,1188.1020408163265,
+Remove Multilingual Data,7000,mmmu_val_mmmu_acc,0.27556,
+Remove Multilingual Data,7000,mmstar_average,0.292518909276783,
+Remove Multilingual Data,7000,ocrbench_ocrbench_accuracy,0.503,
+Remove Multilingual Data,7000,seedbench_seed_all,0.48988326848249025,
+Remove Multilingual Data,7000,textvqa_val_exact_match,0.5018400000000001,0.006795274684043781
+Remove Multilingual Data,8000,ai2d_exact_match,0.3863341968911917,0.008763532923326706
+Remove Multilingual Data,8000,average,0.4413787447198958,
+Remove Multilingual Data,8000,average_rank,1.2,
+Remove Multilingual Data,8000,chartqa_relaxed_overall,0.5964,0.009814343815957088
+Remove Multilingual Data,8000,docvqa_val_anls,0.603351366738696,0.006235087701254087
+Remove Multilingual Data,8000,infovqa_val_anls,0.25307646024963104,0.007198626238671866
+Remove Multilingual Data,8000,mme_total_score,1261.5517206882753,
+Remove Multilingual Data,8000,mmmu_val_mmmu_acc,0.29556,
+Remove Multilingual Data,8000,mmstar_average,0.30595531673183934,
+Remove Multilingual Data,8000,ocrbench_ocrbench_accuracy,0.505,
+Remove Multilingual Data,8000,seedbench_seed_all,0.5124513618677042,
+Remove Multilingual Data,8000,textvqa_val_exact_match,0.51428,0.006792322389925977
+Remove Multilingual Data,9000,ai2d_exact_match,0.3908678756476684,0.008782181865213609
+Remove Multilingual Data,9000,average,0.4483393474436153,
+Remove Multilingual Data,9000,average_rank,1.2,
+Remove Multilingual Data,9000,chartqa_relaxed_overall,0.6008,0.00979663889573671
+Remove Multilingual Data,9000,docvqa_val_anls,0.6206417157518567,0.006160046717594884
+Remove Multilingual Data,9000,infovqa_val_anls,0.2517144366407357,0.007092352700671051
+Remove Multilingual Data,9000,mme_total_score,1270.4974989995999,
+Remove Multilingual Data,9000,mmmu_val_mmmu_acc,0.29333,
+Remove Multilingual Data,9000,mmstar_average,0.32657768650091523,
+Remove Multilingual Data,9000,ocrbench_ocrbench_accuracy,0.52,
+Remove Multilingual Data,9000,seedbench_seed_all,0.5163424124513619,
+Remove Multilingual Data,9000,textvqa_val_exact_match,0.51478,0.006772730933446224
+Remove Multilingual Data,10000,ai2d_exact_match,0.41450777202072536,0.008866630113019596
+Remove Multilingual Data,10000,average,0.45448389614950035,
+Remove Multilingual Data,10000,average_rank,1.3,
+Remove Multilingual Data,10000,chartqa_relaxed_overall,0.6068,0.009771166474772143
+Remove Multilingual Data,10000,docvqa_val_anls,0.6232449599819007,0.006177718712473361
+Remove Multilingual Data,10000,infovqa_val_anls,0.23737546748097776,0.006778926597473845
+Remove Multilingual Data,10000,mme_total_score,1276.3549419767905,
+Remove Multilingual Data,10000,mmmu_val_mmmu_acc,0.29889,
+Remove Multilingual Data,10000,mmstar_average,0.3130758097195978,
+Remove Multilingual Data,10000,ocrbench_ocrbench_accuracy,0.539,
+Remove Multilingual Data,10000,seedbench_seed_all,0.5219010561423013,
+Remove Multilingual Data,10000,textvqa_val_exact_match,0.53556,0.00676001751827386
+Remove Multilingual Data,11000,ai2d_exact_match,0.41904145077720206,0.008880404559123601
+Remove Multilingual Data,11000,average,0.4609227111862355,
+Remove Multilingual Data,11000,average_rank,1.3,
+Remove Multilingual Data,11000,chartqa_relaxed_overall,0.6108,0.00975332737879659
+Remove Multilingual Data,11000,docvqa_val_anls,0.6387481065492241,0.006094036395159673
+Remove Multilingual Data,11000,infovqa_val_anls,0.25052436731474453,0.006993658213921465
+Remove Multilingual Data,11000,mme_total_score,1258.2553021208482,
+Remove Multilingual Data,11000,mmmu_val_mmmu_acc,0.28,
+Remove Multilingual Data,11000,mmstar_average,0.3213557456291676,
+Remove Multilingual Data,11000,ocrbench_ocrbench_accuracy,0.561,
+Remove Multilingual Data,11000,seedbench_seed_all,0.526514730405781,
+Remove Multilingual Data,11000,textvqa_val_exact_match,0.54032,0.0067608876222200335
+Remove Multilingual Data,12000,ai2d_exact_match,0.41353626943005184,0.00886357792887845
+Remove Multilingual Data,12000,average,0.46149948562642984,
+Remove Multilingual Data,12000,average_rank,1.3,
+Remove Multilingual Data,12000,chartqa_relaxed_overall,0.622,0.009699692449425671
+Remove Multilingual Data,12000,docvqa_val_anls,0.6481870346272672,0.0060803752132680255
+Remove Multilingual Data,12000,infovqa_val_anls,0.25116762340113796,0.006993814336062128
+Remove Multilingual Data,12000,mme_total_score,1256.7357943177271,
+Remove Multilingual Data,12000,mmmu_val_mmmu_acc,0.28222,
+Remove Multilingual Data,12000,mmstar_average,0.311104865636332,
+Remove Multilingual Data,12000,ocrbench_ocrbench_accuracy,0.547,
+Remove Multilingual Data,12000,seedbench_seed_all,0.5312395775430795,
+Remove Multilingual Data,12000,textvqa_val_exact_match,0.54704,0.006750774938661079
+Remove Multilingual Data,13000,ai2d_exact_match,0.42810880829015546,0.008905646879422012
+Remove Multilingual Data,13000,average,0.4658949593838579,
+Remove Multilingual Data,13000,average_rank,1.6,
+Remove Multilingual Data,13000,chartqa_relaxed_overall,0.622,0.009699692449425671
+Remove Multilingual Data,13000,docvqa_val_anls,0.6461697403304425,0.006072036108570188
+Remove Multilingual Data,13000,infovqa_val_anls,0.2635164421127001,0.007102540516236264
+Remove Multilingual Data,13000,mme_total_score,1295.0039015606244,
+Remove Multilingual Data,13000,mmmu_val_mmmu_acc,0.29,
+Remove Multilingual Data,13000,mmstar_average,0.3296444797414335,
+Remove Multilingual Data,13000,ocrbench_ocrbench_accuracy,0.54,
+Remove Multilingual Data,13000,seedbench_seed_all,0.5312951639799889,
+Remove Multilingual Data,13000,textvqa_val_exact_match,0.54232,0.006771571040376891
+Remove Multilingual Data,14000,ai2d_exact_match,0.42487046632124353,0.008896983637113786
+Remove Multilingual Data,14000,average,0.46755416993970794,
+Remove Multilingual Data,14000,average_rank,1.7,
+Remove Multilingual Data,14000,chartqa_relaxed_overall,0.6256,0.009681288495793083
+Remove Multilingual Data,14000,docvqa_val_anls,0.6470833619171145,0.006119244473927763
+Remove Multilingual Data,14000,infovqa_val_anls,0.2541720455309047,0.007006172199083197
+Remove Multilingual Data,14000,mme_total_score,1262.1793717486994,
+Remove Multilingual Data,14000,mmmu_val_mmmu_acc,0.28556,
+Remove Multilingual Data,14000,mmstar_average,0.327544946405174,
+Remove Multilingual Data,14000,ocrbench_ocrbench_accuracy,0.559,
+Remove Multilingual Data,14000,seedbench_seed_all,0.5380767092829349,
+Remove Multilingual Data,14000,textvqa_val_exact_match,0.5460799999999999,0.006754587449305995
+Remove Multilingual Data,15000,ai2d_exact_match,0.42908031088082904,0.00890816984689523
+Remove Multilingual Data,15000,average,0.4720258172705174,
+Remove Multilingual Data,15000,average_rank,1.8,
+Remove Multilingual Data,15000,chartqa_relaxed_overall,0.626,0.009679208378267924
+Remove Multilingual Data,15000,docvqa_val_anls,0.655881547989144,0.006058079036611966
+Remove Multilingual Data,15000,infovqa_val_anls,0.2538472956751567,0.006929926842577286
+Remove Multilingual Data,15000,mme_total_score,1283.2800120048018,
+Remove Multilingual Data,15000,mmmu_val_mmmu_acc,0.29,
+Remove Multilingual Data,15000,mmstar_average,0.3309383426349411,
+Remove Multilingual Data,15000,ocrbench_ocrbench_accuracy,0.572,
+Remove Multilingual Data,15000,seedbench_seed_all,0.5407448582545858,
+Remove Multilingual Data,15000,textvqa_val_exact_match,0.54974,0.006738090742441116
+Remove Multilingual Data,16000,ai2d_exact_match,0.42940414507772023,0.008909003051055714
+Remove Multilingual Data,16000,average,0.476926180401357,
+Remove Multilingual Data,16000,average_rank,1.5,
+Remove Multilingual Data,16000,chartqa_relaxed_overall,0.626,0.009679208378267924
+Remove Multilingual Data,16000,docvqa_val_anls,0.6622394005833824,0.006046858134280091
+Remove Multilingual Data,16000,infovqa_val_anls,0.2633356312454137,0.007137388413784386
+Remove Multilingual Data,16000,mme_total_score,1328.4599839935972,
+Remove Multilingual Data,16000,mmmu_val_mmmu_acc,0.29556,
+Remove Multilingual Data,16000,mmstar_average,0.33932578522709744,
+Remove Multilingual Data,16000,ocrbench_ocrbench_accuracy,0.578,
+Remove Multilingual Data,16000,seedbench_seed_all,0.5431906614785992,
+Remove Multilingual Data,16000,textvqa_val_exact_match,0.55528,0.006733817132847886
+Remove Multilingual Data,17000,ai2d_exact_match,0.42940414507772023,0.008909003051055712
+Remove Multilingual Data,17000,average,0.4732087844936434,
+Remove Multilingual Data,17000,average_rank,1.8,
+Remove Multilingual Data,17000,chartqa_relaxed_overall,0.6264,0.009677121197436144
+Remove Multilingual Data,17000,docvqa_val_anls,0.661817176575324,0.0060368801840957114
+Remove Multilingual Data,17000,infovqa_val_anls,0.25584519300448166,0.007033162778192734
+Remove Multilingual Data,17000,mme_total_score,1270.766606642657,
+Remove Multilingual Data,17000,mmmu_val_mmmu_acc,0.28,
+Remove Multilingual Data,17000,mmstar_average,0.3233592606268431,
+Remove Multilingual Data,17000,ocrbench_ocrbench_accuracy,0.58,
+Remove Multilingual Data,17000,seedbench_seed_all,0.5439132851584213,
+Remove Multilingual Data,17000,textvqa_val_exact_match,0.5581400000000001,0.006731048171116916
+Remove Multilingual Data,18000,ai2d_exact_match,0.4368523316062176,0.008927095061184944
+Remove Multilingual Data,18000,average,0.4769341122300441,
+Remove Multilingual Data,18000,average_rank,1.9,
+Remove Multilingual Data,18000,chartqa_relaxed_overall,0.636,0.009624897685803465
+Remove Multilingual Data,18000,docvqa_val_anls,0.671397164123935,0.006004837667492473
+Remove Multilingual Data,18000,infovqa_val_anls,0.2570865428675732,0.007022334730795061
+Remove Multilingual Data,18000,mme_total_score,1330.2323929571828,
+Remove Multilingual Data,18000,mmmu_val_mmmu_acc,0.28444,
+Remove Multilingual Data,18000,mmstar_average,0.3272633338962395,
+Remove Multilingual Data,18000,ocrbench_ocrbench_accuracy,0.579,
+Remove Multilingual Data,18000,seedbench_seed_all,0.5457476375764313,
+Remove Multilingual Data,18000,textvqa_val_exact_match,0.55462,0.0067429981999808505
app/src/content/assets/data/s25_ratings.csv
ADDED
@@ -0,0 +1,1189 @@
run,step,metric,value,stderr
≥1,1000,ai2d_exact_match,0.48283678756476683,0.00899385068939683
≥1,1000,average,0.4841740613238066,
≥1,1000,average_rank,2.4,
≥1,1000,chartqa_relaxed_overall,0.6328,0.00964276190429159
≥1,1000,docvqa_val_anls,0.6709958484393396,0.006009113294340719
≥1,1000,infovqa_val_anls,0.2911610792718508,0.007480963558334323
≥1,1000,mme_total_score,1300.6441576630652,
≥1,1000,mmmu_val_mmmu_acc,0.28111,
≥1,1000,mmstar_average,0.34899099672724077,
≥1,1000,ocrbench_ocrbench_accuracy,0.53,
≥1,1000,seedbench_seed_all,0.5613118399110617,
≥1,1000,textvqa_val_exact_match,0.5583600000000001,0.006733787259646062
≥1,2000,ai2d_exact_match,0.4834844559585492,0.008994243503406855
≥1,2000,average,0.4870755750428875,
≥1,2000,average_rank,2.0,
≥1,2000,chartqa_relaxed_overall,0.6296,0.0096601689190934
≥1,2000,docvqa_val_anls,0.6827112292156415,0.005909694544631059
≥1,2000,infovqa_val_anls,0.26248215166111283,0.006999241957900095
≥1,2000,mme_total_score,1316.5322128851542,
≥1,2000,mmmu_val_mmmu_acc,0.29556,
≥1,2000,mmstar_average,0.351185684854186,
≥1,2000,ocrbench_ocrbench_accuracy,0.557,
≥1,2000,seedbench_seed_all,0.5579766536964981,
≥1,2000,textvqa_val_exact_match,0.5636800000000001,0.006720565803631728
≥1,3000,ai2d_exact_match,0.47085492227979275,0.008983852707691605
≥1,3000,average,0.48291385198510484,
≥1,3000,average_rank,2.7,
≥1,3000,chartqa_relaxed_overall,0.6416,0.00959252743718011
≥1,3000,docvqa_val_anls,0.680081009037435,0.005963713977526521
≥1,3000,infovqa_val_anls,0.2758757523314467,0.007145074435929658
≥1,3000,mme_total_score,1338.268607442977,
≥1,3000,mmmu_val_mmmu_acc,0.26889,
≥1,3000,mmstar_average,0.34908867626840856,
≥1,3000,ocrbench_ocrbench_accuracy,0.542,
≥1,3000,seedbench_seed_all,0.5577543079488605,
≥1,3000,textvqa_val_exact_match,0.56008,0.00674696843305253
≥1,4000,ai2d_exact_match,0.48218911917098445,0.008993442748995703
≥1,4000,average,0.49172515123492716,
≥1,4000,average_rank,2.3,
≥1,4000,chartqa_relaxed_overall,0.6488,0.009548816468986266
≥1,4000,docvqa_val_anls,0.6902890941626307,0.005912204920631156
≥1,4000,infovqa_val_anls,0.26986279043614175,0.007091114226807192
≥1,4000,mme_total_score,1322.6090436174468,
≥1,4000,mmmu_val_mmmu_acc,0.31,
≥1,4000,mmstar_average,0.35470222226954573,
≥1,4000,ocrbench_ocrbench_accuracy,0.542,
≥1,4000,seedbench_seed_all,0.5576431350750417,
≥1,4000,textvqa_val_exact_match,0.57004,0.006721660198430491
≥1,5000,ai2d_exact_match,0.48704663212435234,0.008996133680935945
≥1,5000,average,0.4922453953675835,
≥1,5000,average_rank,2.3,
≥1,5000,chartqa_relaxed_overall,0.6524,0.009526069199715017
≥1,5000,docvqa_val_anls,0.7021575420936199,0.005829944728253253
≥1,5000,infovqa_val_anls,0.2714850202382579,0.0071017460136769345
≥1,5000,mme_total_score,1372.0063025210084,
≥1,5000,mmmu_val_mmmu_acc,0.28444,
≥1,5000,mmstar_average,0.34918092027225467,
≥1,5000,ocrbench_ocrbench_accuracy,0.553,
≥1,5000,seedbench_seed_all,0.5571984435797666,
≥1,5000,textvqa_val_exact_match,0.5733,0.0066972526186883305
≥1,6000,ai2d_exact_match,0.4838082901554404,0.008994434238637761
≥1,6000,average,0.4949352825546263,
≥1,6000,average_rank,2.3,
≥1,6000,chartqa_relaxed_overall,0.6484,0.009551307082635064
≥1,6000,docvqa_val_anls,0.7034964362890477,0.00583650860725618
≥1,6000,infovqa_val_anls,0.2724245614355471,0.0071074877022118095
≥1,6000,mme_total_score,1406.1297519007603,
≥1,6000,mmmu_val_mmmu_acc,0.30333,
≥1,6000,mmstar_average,0.3537726186468994,
≥1,6000,ocrbench_ocrbench_accuracy,0.551,
≥1,6000,seedbench_seed_all,0.5621456364647026,
≥1,6000,textvqa_val_exact_match,0.57604,0.006696965995935035
≥1,7000,ai2d_exact_match,0.49158031088082904,0.008997878107766406
≥1,7000,average,0.5010900439307898,
≥1,7000,average_rank,1.9,
≥1,7000,chartqa_relaxed_overall,0.6564,0.009500090351500593
≥1,7000,docvqa_val_anls,0.7105997601562098,0.005781434620670767
≥1,7000,infovqa_val_anls,0.29338120425035286,0.007415977951206446
≥1,7000,mme_total_score,1362.5676270508204,
≥1,7000,mmmu_val_mmmu_acc,0.30778,
≥1,7000,mmstar_average,0.34667048751606516,
≥1,7000,ocrbench_ocrbench_accuracy,0.555,
≥1,7000,seedbench_seed_all,0.569538632573652,
≥1,7000,textvqa_val_exact_match,0.57886,0.006701104464206482
≥1,8000,ai2d_exact_match,0.48704663212435234,0.008996133680935945
≥1,8000,average,0.5012874693126343,
≥1,8000,average_rank,2.4,
≥1,8000,chartqa_relaxed_overall,0.66,0.009476070829586857
≥1,8000,docvqa_val_anls,0.7013710839567656,0.00584567154399218
≥1,8000,infovqa_val_anls,0.2843596286067672,0.00726326016667778
≥1,8000,mme_total_score,1366.6049419767905,
≥1,8000,mmmu_val_mmmu_acc,0.29778,
≥1,8000,mmstar_average,0.3726316801263804,
≥1,8000,ocrbench_ocrbench_accuracy,0.568,
≥1,8000,seedbench_seed_all,0.5647581989994441,
≥1,8000,textvqa_val_exact_match,0.5756399999999999,0.006701275960583923
≥1,9000,ai2d_exact_match,0.5080958549222798,0.008997974381217102
≥1,9000,average,0.5049424624827252,
≥1,9000,average_rank,1.9,
≥1,9000,chartqa_relaxed_overall,0.6644,0.009445885130487209
≥1,9000,docvqa_val_anls,0.7114743939854425,0.005784207378273765
≥1,9000,infovqa_val_anls,0.27927629692536604,0.007234508289873752
≥1,9000,mme_total_score,1385.721988795518,
≥1,9000,mmmu_val_mmmu_acc,0.30333,
≥1,9000,mmstar_average,0.35371044141416225,
≥1,9000,ocrbench_ocrbench_accuracy,0.572,
≥1,9000,seedbench_seed_all,0.5673151750972762,
≥1,9000,textvqa_val_exact_match,0.58488,0.006674247990391685
≥1,10000,ai2d_exact_match,0.5006476683937824,0.008999146569435552
≥1,10000,average,0.5082439013030791,
≥1,10000,average_rank,2.1,
≥1,10000,chartqa_relaxed_overall,0.66,0.009476070829586857
≥1,10000,docvqa_val_anls,0.7160888537676756,0.005756158349745215
≥1,10000,infovqa_val_anls,0.29920594326668903,0.0074179476099996864
≥1,10000,mme_total_score,1331.7510004001601,
≥1,10000,mmmu_val_mmmu_acc,0.31222,
≥1,10000,mmstar_average,0.34770435280317685,
≥1,10000,ocrbench_ocrbench_accuracy,0.572,
≥1,10000,seedbench_seed_all,0.5709282934963869,
≥1,10000,textvqa_val_exact_match,0.5954,0.006639803114330983
≥1,11000,ai2d_exact_match,0.506800518134715,0.008998321712163856
≥1,11000,average,0.5113045470128461,
≥1,11000,average_rank,2.4,
≥1,11000,chartqa_relaxed_overall,0.6648,0.009443095510537233
≥1,11000,docvqa_val_anls,0.7219007936057111,0.005738025679608452
≥1,11000,infovqa_val_anls,0.2919206859707748,0.007295238934448537
≥1,11000,mme_total_score,1423.2838135254103,
≥1,11000,mmmu_val_mmmu_acc,0.32,
≥1,11000,mmstar_average,0.34837257743331856,
≥1,11000,ocrbench_ocrbench_accuracy,0.584,
≥1,11000,seedbench_seed_all,0.567426347971095,
≥1,11000,textvqa_val_exact_match,0.5965199999999999,0.006637830223651069
≥1,12000,ai2d_exact_match,0.4957901554404145,0.0089988351333547
≥1,12000,average,0.5133063005116858,
≥1,12000,average_rank,2.0,
≥1,12000,chartqa_relaxed_overall,0.6752,0.00936787525721462
≥1,12000,docvqa_val_anls,0.7317458509080867,0.005677899397993261
≥1,12000,infovqa_val_anls,0.30244398410320705,0.0074372299260171675
≥1,12000,mme_total_score,1358.8711484593837,
≥1,12000,mmmu_val_mmmu_acc,0.30222,
≥1,12000,mmstar_average,0.36151764800560426,
≥1,12000,ocrbench_ocrbench_accuracy,0.571,
≥1,12000,seedbench_seed_all,0.5743190661478599,
≥1,12000,textvqa_val_exact_match,0.6055199999999998,0.006601107546780982
≥1,13000,ai2d_exact_match,0.5029145077720207,0.008999001233939133
≥1,13000,average,0.5113232076887448,
≥1,13000,average_rank,2.2,
≥1,13000,chartqa_relaxed_overall,0.6764,0.009358859508536295
≥1,13000,docvqa_val_anls,0.7299154645021083,0.005686391180628681
≥1,13000,infovqa_val_anls,0.28296895663700367,0.007106598521793854
≥1,13000,mme_total_score,1461.5425170068027,
≥1,13000,mmmu_val_mmmu_acc,0.28444,
≥1,13000,mmstar_average,0.3679555656349867,
≥1,13000,ocrbench_ocrbench_accuracy,0.575,
≥1,13000,seedbench_seed_all,0.5738743746525847,
≥1,13000,textvqa_val_exact_match,0.60844,0.006603822784953804
≥1,14000,ai2d_exact_match,0.508419689119171,0.00899787810776641
≥1,14000,average,0.5204248423941521,
≥1,14000,average_rank,1.5,
≥1,14000,chartqa_relaxed_overall,0.6748,0.009370864914387439
≥1,14000,docvqa_val_anls,0.7348023413497262,0.005658144612389036
≥1,14000,infovqa_val_anls,0.30339204212390886,0.007452040139655917
≥1,14000,mme_total_score,1421.6612645058024,
≥1,14000,mmmu_val_mmmu_acc,0.32333,
≥1,14000,mmstar_average,0.3578816768256025,
≥1,14000,ocrbench_ocrbench_accuracy,0.59,
≥1,14000,seedbench_seed_all,0.5760978321289605,
≥1,14000,textvqa_val_exact_match,0.6151,0.006568548330143662
≥1,15000,ai2d_exact_match,0.5123056994818653,0.008996428218289524
≥1,15000,average,0.518135626255078,
≥1,15000,average_rank,1.8,
≥1,15000,chartqa_relaxed_overall,0.6768,0.009355838641547569
≥1,15000,docvqa_val_anls,0.7406818231641893,0.00561534943093856
≥1,15000,infovqa_val_anls,0.2993680664172523,0.007344080406067735
≥1,15000,mme_total_score,1410.685474189676,
≥1,15000,mmmu_val_mmmu_acc,0.31778,
≥1,15000,mmstar_average,0.34818335740471335,
≥1,15000,ocrbench_ocrbench_accuracy,0.581,
≥1,15000,seedbench_seed_all,0.575041689827682,
≥1,15000,textvqa_val_exact_match,0.61206,0.006579602534644686
≥1,16000,ai2d_exact_match,0.5148963730569949,0.008995159373289019
≥1,16000,average,0.5188529848530237,
≥1,16000,average_rank,2.2,
≥1,16000,chartqa_relaxed_overall,0.6768,0.009355838641547569
≥1,16000,docvqa_val_anls,0.7381040832460759,0.005632273383411858
≥1,16000,infovqa_val_anls,0.30209162600532213,0.007372809699325085
≥1,16000,mme_total_score,1390.1362545018007,
≥1,16000,mmmu_val_mmmu_acc,0.31111,
≥1,16000,mmstar_average,0.35327018992913145,
≥1,16000,ocrbench_ocrbench_accuracy,0.581,
≥1,16000,seedbench_seed_all,0.5762645914396887,
≥1,16000,textvqa_val_exact_match,0.6161399999999999,0.006566896139347796
≥1,17000,ai2d_exact_match,0.5148963730569949,0.008995159373289019
≥1,17000,average,0.5197229023161958,
≥1,17000,average_rank,2.4,
≥1,17000,chartqa_relaxed_overall,0.6808,0.009325198535746702
≥1,17000,docvqa_val_anls,0.7415371461870564,0.005606416638789011
≥1,17000,infovqa_val_anls,0.31757741607819345,0.0075605614362149656
≥1,17000,mme_total_score,1349.7522008803521,
≥1,17000,mmmu_val_mmmu_acc,0.29556,
≥1,17000,mmstar_average,0.3467129398314668,
≥1,17000,ocrbench_ocrbench_accuracy,0.589,
≥1,17000,seedbench_seed_all,0.5760422456920511,
≥1,17000,textvqa_val_exact_match,0.6153799999999999,0.0065759668329423305
≥1,18000,ai2d_exact_match,0.5113341968911918,0.008996841687150462
≥1,18000,average,0.5217542622446647,
≥1,18000,average_rank,2.1,
≥1,18000,chartqa_relaxed_overall,0.686,0.00928418431696466
≥1,18000,docvqa_val_anls,0.7485976064804745,0.005545760483304357
≥1,18000,infovqa_val_anls,0.3079394168596966,0.007506515528281936
≥1,18000,mme_total_score,1386.236494597839,
≥1,18000,mmmu_val_mmmu_acc,0.30889,
≥1,18000,mmstar_average,0.36329690094894107,
≥1,18000,ocrbench_ocrbench_accuracy,0.58,
≥1,18000,seedbench_seed_all,0.5744302390216787,
≥1,18000,textvqa_val_exact_match,0.6153,0.006569673821646289
≥1,19000,ai2d_exact_match,0.5116580310880829,0.008996707642249475
≥1,19000,average,0.5243525940235553,
≥1,19000,average_rank,1.6,
≥1,19000,chartqa_relaxed_overall,0.6896,0.009254998541285659
≥1,19000,docvqa_val_anls,0.7410075109051968,0.005624845495160182
≥1,19000,infovqa_val_anls,0.31451986671246684,0.00754441993362511
≥1,19000,mme_total_score,1379.0539215686274,
≥1,19000,mmmu_val_mmmu_acc,0.30889,
≥1,19000,mmstar_average,0.36379458008546134,
≥1,19000,ocrbench_ocrbench_accuracy,0.594,
≥1,19000,seedbench_seed_all,0.5780433574207893,
≥1,19000,textvqa_val_exact_match,0.61766,0.006552511881896322
|
| 230 |
+
≥2,1000,ai2d_exact_match,0.47765544041450775,0.00899016344465196
|
| 231 |
+
≥2,1000,average,0.48208320918746633,
|
| 232 |
+
≥2,1000,average_rank,2.7,
|
| 233 |
+
≥2,1000,chartqa_relaxed_overall,0.626,0.009679208378267924
|
| 234 |
+
≥2,1000,docvqa_val_anls,0.6830886615719474,0.005941664313882304
|
| 235 |
+
≥2,1000,infovqa_val_anls,0.2636626226113445,0.007012099858086531
|
| 236 |
+
≥2,1000,mme_total_score,1394.7869147659064,
|
| 237 |
+
≥2,1000,mmmu_val_mmmu_acc,0.28111,
|
| 238 |
+
≥2,1000,mmstar_average,0.3621500124529322,
|
| 239 |
+
≥2,1000,ocrbench_ocrbench_accuracy,0.53,
|
| 240 |
+
≥2,1000,seedbench_seed_all,0.5518621456364647,
|
| 241 |
+
≥2,1000,textvqa_val_exact_match,0.5632199999999999,0.006735793977260649
|
| 242 |
+
≥2,2000,ai2d_exact_match,0.47506476683937826,0.00898795641911507
|
| 243 |
+
≥2,2000,average,0.48647523098478523,
|
| 244 |
+
≥2,2000,average_rank,2.4,
|
| 245 |
+
≥2,2000,chartqa_relaxed_overall,0.6392,0.00960657371300514
|
| 246 |
+
≥2,2000,docvqa_val_anls,0.6776161818000301,0.005964335785163625
|
| 247 |
+
≥2,2000,infovqa_val_anls,0.28064001553745443,0.007228333231022024
|
| 248 |
+
≥2,2000,mme_total_score,1262.5283113245298,
|
| 249 |
+
≥2,2000,mmmu_val_mmmu_acc,0.29556,
|
| 250 |
+
≥2,2000,mmstar_average,0.3433600502059375,
|
| 251 |
+
≥2,2000,ocrbench_ocrbench_accuracy,0.562,
|
| 252 |
+
≥2,2000,seedbench_seed_all,0.5489160644802669,
|
| 253 |
+
≥2,2000,textvqa_val_exact_match,0.55592,0.006741845534884587
|
| 254 |
+
≥2,3000,ai2d_exact_match,0.4854274611398964,0.00899533120652686
|
| 255 |
+
≥2,3000,average,0.4892979098475977,
|
| 256 |
+
≥2,3000,average_rank,2.0,
|
| 257 |
+
≥2,3000,chartqa_relaxed_overall,0.642,0.009590161024476605
|
| 258 |
+
≥2,3000,docvqa_val_anls,0.682810147307377,0.005940269120275799
|
| 259 |
+
≥2,3000,infovqa_val_anls,0.27552490540828095,0.007240182675336717
|
| 260 |
+
≥2,3000,mme_total_score,1310.3195278111243,
|
| 261 |
+
≥2,3000,mmmu_val_mmmu_acc,0.29667,
|
| 262 |
+
≥2,3000,mmstar_average,0.33383353302741087,
|
| 263 |
+
≥2,3000,ocrbench_ocrbench_accuracy,0.56,
|
| 264 |
+
≥2,3000,seedbench_seed_all,0.5592551417454141,
|
| 265 |
+
≥2,3000,textvqa_val_exact_match,0.56816,0.00671355771938026
|
| 266 |
+
≥2,4000,ai2d_exact_match,0.4838082901554404,0.008994434238637763
|
| 267 |
+
≥2,4000,average,0.49195026536834224,
|
| 268 |
+
≥2,4000,average_rank,2.2,
|
| 269 |
+
≥2,4000,chartqa_relaxed_overall,0.6428,0.009585406407993486
|
| 270 |
+
≥2,4000,docvqa_val_anls,0.6936982319965624,0.005883844142208432
|
| 271 |
+
≥2,4000,infovqa_val_anls,0.26951374340713585,0.007112166845409044
|
| 272 |
+
≥2,4000,mme_total_score,1301.329931972789,
|
| 273 |
+
≥2,4000,mmmu_val_mmmu_acc,0.30667,
|
| 274 |
+
≥2,4000,mmstar_average,0.34946626950413445,
|
| 275 |
+
≥2,4000,ocrbench_ocrbench_accuracy,0.547,
|
| 276 |
+
≥2,4000,seedbench_seed_all,0.5645358532518066,
|
| 277 |
+
≥2,4000,textvqa_val_exact_match,0.5700599999999999,0.006712416151142391
|
| 278 |
+
≥2,5000,ai2d_exact_match,0.4802461139896373,0.008992128148477658
|
| 279 |
+
≥2,5000,average,0.4911460216363542,
|
| 280 |
+
≥2,5000,average_rank,2.6,
|
| 281 |
+
≥2,5000,chartqa_relaxed_overall,0.6592,0.009481461028833927
|
| 282 |
+
≥2,5000,docvqa_val_anls,0.6952750329046061,0.005874374530558489
|
| 283 |
+
≥2,5000,infovqa_val_anls,0.2792676155726946,0.007321946399777712
|
| 284 |
+
≥2,5000,mme_total_score,1246.5271108443376,
|
| 285 |
+
≥2,5000,mmmu_val_mmmu_acc,0.30667,
|
| 286 |
+
≥2,5000,mmstar_average,0.3273375111929903,
|
| 287 |
+
≥2,5000,ocrbench_ocrbench_accuracy,0.544,
|
| 288 |
+
≥2,5000,seedbench_seed_all,0.5642579210672596,
|
| 289 |
+
≥2,5000,textvqa_val_exact_match,0.56406,0.006733849732986717
|
| 290 |
+
≥2,6000,ai2d_exact_match,0.47636010362694303,0.0089890902327936
|
| 291 |
+
≥2,6000,average,0.49370635223913606,
|
| 292 |
+
≥2,6000,average_rank,2.4,
|
| 293 |
+
≥2,6000,chartqa_relaxed_overall,0.6576,0.00949215130381674
|
| 294 |
+
≥2,6000,docvqa_val_anls,0.6979936603307108,0.005857650960456797
|
| 295 |
+
≥2,6000,infovqa_val_anls,0.2848576580974239,0.007220288614025636
|
| 296 |
+
≥2,6000,mme_total_score,1257.9977991196479,
|
| 297 |
+
≥2,6000,mmmu_val_mmmu_acc,0.28889,
|
| 298 |
+
≥2,6000,mmstar_average,0.3386087219715212,
|
| 299 |
+
≥2,6000,ocrbench_ocrbench_accuracy,0.555,
|
| 300 |
+
≥2,6000,seedbench_seed_all,0.5646470261256253,
|
| 301 |
+
≥2,6000,textvqa_val_exact_match,0.5794000000000001,0.0066913139768320015
|
| 302 |
+
≥2,7000,ai2d_exact_match,0.49125647668393785,0.008997778057794696
|
| 303 |
+
≥2,7000,average,0.49923190517534066,
|
| 304 |
+
≥2,7000,average_rank,2.4,
|
| 305 |
+
≥2,7000,chartqa_relaxed_overall,0.6564,0.009500090351500593
|
| 306 |
+
≥2,7000,docvqa_val_anls,0.7050049130392773,0.005832016517791021
|
| 307 |
+
≥2,7000,infovqa_val_anls,0.27630514531293887,0.007147131752819133
|
| 308 |
+
≥2,7000,mme_total_score,1298.6506602641057,
|
| 309 |
+
≥2,7000,mmmu_val_mmmu_acc,0.30667,
|
| 310 |
+
≥2,7000,mmstar_average,0.35103185667809866,
|
| 311 |
+
≥2,7000,ocrbench_ocrbench_accuracy,0.561,
|
| 312 |
+
≥2,7000,seedbench_seed_all,0.5657587548638132,
|
| 313 |
+
≥2,7000,textvqa_val_exact_match,0.5796600000000001,0.006695268643186835
|
| 314 |
+
≥2,8000,ai2d_exact_match,0.4948186528497409,0.008998670917263325
|
| 315 |
+
≥2,8000,average,0.5019054681854818,
|
| 316 |
+
≥2,8000,average_rank,1.7,
|
| 317 |
+
≥2,8000,chartqa_relaxed_overall,0.6528,0.009523504757028414
|
| 318 |
+
≥2,8000,docvqa_val_anls,0.7073923991601945,0.005811715016078567
|
| 319 |
+
≥2,8000,infovqa_val_anls,0.2893855968120429,0.007315932200378898
|
| 320 |
+
≥2,8000,mme_total_score,1294.7393957583033,
|
| 321 |
+
≥2,8000,mmmu_val_mmmu_acc,0.31444,
|
| 322 |
+
≥2,8000,mmstar_average,0.35566192560333365,
|
| 323 |
+
≥2,8000,ocrbench_ocrbench_accuracy,0.543,
|
| 324 |
+
≥2,8000,seedbench_seed_all,0.5711506392440244,
|
| 325 |
+
≥2,8000,textvqa_val_exact_match,0.5885,0.006652668757748281
|
| 326 |
+
≥2,9000,ai2d_exact_match,0.4961139896373057,0.008998882321332237
|
| 327 |
+
≥2,9000,average,0.5033958878673905,
|
| 328 |
+
≥2,9000,average_rank,1.8,
|
| 329 |
+
≥2,9000,chartqa_relaxed_overall,0.6652,0.009440298284094473
|
| 330 |
+
≥2,9000,docvqa_val_anls,0.706747911546142,0.005822953083156574
|
| 331 |
+
≥2,9000,infovqa_val_anls,0.2960318229790583,0.007315313753711981
|
| 332 |
+
≥2,9000,mme_total_score,1284.486194477791,
|
| 333 |
+
≥2,9000,mmmu_val_mmmu_acc,0.31111,
|
| 334 |
+
≥2,9000,mmstar_average,0.3461876713132692,
|
| 335 |
+
≥2,9000,ocrbench_ocrbench_accuracy,0.551,
|
| 336 |
+
≥2,9000,seedbench_seed_all,0.5688715953307393,
|
| 337 |
+
≥2,9000,textvqa_val_exact_match,0.5893,0.006649446971666576
|
| 338 |
+
≥2,10000,ai2d_exact_match,0.4954663212435233,0.008998784170060763
|
| 339 |
+
≥2,10000,average,0.5062630509259689,
|
| 340 |
+
≥2,10000,average_rank,2.5,
|
| 341 |
+
≥2,10000,chartqa_relaxed_overall,0.668,0.009420504145710235
|
| 342 |
+
≥2,10000,docvqa_val_anls,0.722875937910498,0.005715570269767272
|
| 343 |
+
≥2,10000,infovqa_val_anls,0.28155653174519985,0.007182472403759747
|
| 344 |
+
≥2,10000,mme_total_score,1304.360544217687,
|
| 345 |
+
≥2,10000,mmmu_val_mmmu_acc,0.31556,
|
| 346 |
+
≥2,10000,mmstar_average,0.34845583808486047,
|
| 347 |
+
≥2,10000,ocrbench_ocrbench_accuracy,0.564,
|
| 348 |
+
≥2,10000,seedbench_seed_all,0.5670928293496387,
|
| 349 |
+
≥2,10000,textvqa_val_exact_match,0.59336,0.006650836699676301
|
| 350 |
+
≥2,11000,ai2d_exact_match,0.5093911917098446,0.008997566627779879
|
| 351 |
+
≥2,11000,average,0.5121996275740728,
|
| 352 |
+
≥2,11000,average_rank,2.1,
|
| 353 |
+
≥2,11000,chartqa_relaxed_overall,0.6692,0.009411906161401973
|
| 354 |
+
≥2,11000,docvqa_val_anls,0.7205703696519083,0.005737270521428796
|
| 355 |
+
≥2,11000,infovqa_val_anls,0.30697732217578644,0.007486340094072884
|
| 356 |
+
≥2,11000,mme_total_score,1312.018607442977,
|
| 357 |
+
≥2,11000,mmmu_val_mmmu_acc,0.30889,
|
| 358 |
+
≥2,11000,mmstar_average,0.34270221710271187,
|
| 359 |
+
≥2,11000,ocrbench_ocrbench_accuracy,0.574,
|
| 360 |
+
≥2,11000,seedbench_seed_all,0.5739855475264035,
|
| 361 |
+
≥2,11000,textvqa_val_exact_match,0.6040800000000001,0.00661203558088616
|
| 362 |
+
≥2,12000,ai2d_exact_match,0.5123056994818653,0.008996428218289528
|
| 363 |
+
≥2,12000,average,0.5150951619345675,
|
| 364 |
+
≥2,12000,average_rank,2.3,
|
| 365 |
+
≥2,12000,chartqa_relaxed_overall,0.6672,0.00942619781683542
|
| 366 |
+
≥2,12000,docvqa_val_anls,0.726550362704052,0.005691891264118933
|
| 367 |
+
≥2,12000,infovqa_val_anls,0.3008889889078986,0.007362325835960529
|
| 368 |
+
≥2,12000,mme_total_score,1224.6687675070027,
|
| 369 |
+
≥2,12000,mmmu_val_mmmu_acc,0.31444,
|
| 370 |
+
≥2,12000,mmstar_average,0.35781468591706916,
|
| 371 |
+
≥2,12000,ocrbench_ocrbench_accuracy,0.58,
|
| 372 |
+
≥2,12000,seedbench_seed_all,0.5740967204002223,
|
| 373 |
+
≥2,12000,textvqa_val_exact_match,0.60256,0.006618961505423797
|
| 374 |
+
≥2,13000,ai2d_exact_match,0.5080958549222798,0.0089979743812171
|
| 375 |
+
≥2,13000,average,0.5180586542380377,
|
| 376 |
+
≥2,13000,average_rank,1.4,
|
| 377 |
+
≥2,13000,chartqa_relaxed_overall,0.6752,0.00936787525721462
|
| 378 |
+
≥2,13000,docvqa_val_anls,0.726059208019786,0.0056904102427854444
|
| 379 |
+
≥2,13000,infovqa_val_anls,0.3067653345076983,0.007414171368476549
|
| 380 |
+
≥2,13000,mme_total_score,1241.2817126850741,
|
| 381 |
+
≥2,13000,mmmu_val_mmmu_acc,0.31778,
|
| 382 |
+
≥2,13000,mmstar_average,0.35994731281597653,
|
| 383 |
+
≥2,13000,ocrbench_ocrbench_accuracy,0.582,
|
| 384 |
+
≥2,13000,seedbench_seed_all,0.5763201778765981,
|
| 385 |
+
≥2,13000,textvqa_val_exact_match,0.61036,0.006605638574142127
|
| 386 |
+
≥2,14000,ai2d_exact_match,0.5055051813471503,0.008998608627616667
|
| 387 |
+
≥2,14000,average,0.5187199474947337,
|
| 388 |
+
≥2,14000,average_rank,2.1,
|
| 389 |
+
≥2,14000,chartqa_relaxed_overall,0.6788,0.00934061683451043
≥2,14000,docvqa_val_anls,0.7306315173289623,0.005670445587318404
≥2,14000,infovqa_val_anls,0.30084936045159133,0.007340699586893536
≥2,14000,mme_total_score,1266.9314725890356,
≥2,14000,mmmu_val_mmmu_acc,0.32,
≥2,14000,mmstar_average,0.360779371604499,
≥2,14000,ocrbench_ocrbench_accuracy,0.587,
≥2,14000,seedbench_seed_all,0.5733740967204002,
≥2,14000,textvqa_val_exact_match,0.61154,0.006582281592745273
≥2,15000,ai2d_exact_match,0.5077720207253886,0.008998066878268323
≥2,15000,average,0.5182417002827931,
≥2,15000,average_rank,2.2,
≥2,15000,chartqa_relaxed_overall,0.6732,0.009382745779746297
≥2,15000,docvqa_val_anls,0.7366238053330653,0.005647248266865468
≥2,15000,infovqa_val_anls,0.30893362163842225,0.007385953320794889
≥2,15000,mme_total_score,1280.3160264105643,
≥2,15000,mmmu_val_mmmu_acc,0.31778,
≥2,15000,mmstar_average,0.3597603073218589,
≥2,15000,ocrbench_ocrbench_accuracy,0.571,
≥2,15000,seedbench_seed_all,0.5739855475264035,
≥2,15000,textvqa_val_exact_match,0.61512,0.006574049037248568
≥2,16000,ai2d_exact_match,0.5055051813471503,0.008998608627616667
≥2,16000,average,0.5226694963682967,
≥2,16000,average_rank,1.9,
≥2,16000,chartqa_relaxed_overall,0.6844,0.009296947310365735
≥2,16000,docvqa_val_anls,0.7369050997741022,0.00564381681657765
≥2,16000,infovqa_val_anls,0.2990672453595873,0.007260058695111045
≥2,16000,mme_total_score,1231.8950580232092,
≥2,16000,mmmu_val_mmmu_acc,0.32333,
≥2,16000,mmstar_average,0.3660943054808571,
≥2,16000,ocrbench_ocrbench_accuracy,0.601,
≥2,16000,seedbench_seed_all,0.5785436353529739,
≥2,16000,textvqa_val_exact_match,0.6091799999999999,0.006589463538554954
≥2,17000,ai2d_exact_match,0.5097150259067358,0.008997455247470535
≥2,17000,average,0.5231030271400094,
≥2,17000,average_rank,1.5,
≥2,17000,chartqa_relaxed_overall,0.6844,0.009296947310365735
≥2,17000,docvqa_val_anls,0.7407178352725541,0.005609117579860497
≥2,17000,infovqa_val_anls,0.30677928223689904,0.007423923972542159
≥2,17000,mme_total_score,1251.8118247298921,
≥2,17000,mmmu_val_mmmu_acc,0.31333,
≥2,17000,mmstar_average,0.35755470618019386,
≥2,17000,ocrbench_ocrbench_accuracy,0.595,
≥2,17000,seedbench_seed_all,0.5787103946637021,
≥2,17000,textvqa_val_exact_match,0.6217199999999999,0.006547657801423109
≥2,18000,ai2d_exact_match,0.5129533678756477,0.008996133680935945
≥2,18000,average,0.520551210243477,
≥2,18000,average_rank,2.1,
≥2,18000,chartqa_relaxed_overall,0.6796,0.009334473148259746
≥2,18000,docvqa_val_anls,0.7420992559479452,0.005605162069925204
≥2,18000,infovqa_val_anls,0.30026388587258485,0.007302705356586967
≥2,18000,mme_total_score,1243.8207282913165,
≥2,18000,mmmu_val_mmmu_acc,0.31111,
≥2,18000,mmstar_average,0.3520362724339696,
≥2,18000,ocrbench_ocrbench_accuracy,0.591,
≥2,18000,seedbench_seed_all,0.576598110061145,
≥2,18000,textvqa_val_exact_match,0.6193,0.006553540400299342
≥2,19000,ai2d_exact_match,0.508419689119171,0.008997878107766411
≥2,19000,average,0.523370364263479,
≥2,19000,average_rank,2.1,
≥2,19000,chartqa_relaxed_overall,0.6852,0.009290581788240476
≥2,19000,docvqa_val_anls,0.7378793056289451,0.005630284657853331
≥2,19000,infovqa_val_anls,0.29852452208029057,0.007300069652856512
≥2,19000,mme_total_score,1273.484593837535,
≥2,19000,mmmu_val_mmmu_acc,0.31778,
≥2,19000,mmstar_average,0.3583181328603031,
≥2,19000,ocrbench_ocrbench_accuracy,0.609,
≥2,19000,seedbench_seed_all,0.5769316286826014,
≥2,19000,textvqa_val_exact_match,0.6182799999999999,0.0065560479462046795
≥2,20000,ai2d_exact_match,0.5132772020725389,0.008995980744276042
≥2,20000,average,0.5252790448062622,
≥2,20000,average_rank,1.1,
≥2,20000,chartqa_relaxed_overall,0.6808,0.009325198535746702
≥2,20000,docvqa_val_anls,0.7417425674578729,0.0056064333517934105
≥2,20000,infovqa_val_anls,0.3091382953917658,0.007408396253875713
≥2,20000,mme_total_score,1276.3417366946778,
≥2,20000,mmmu_val_mmmu_acc,0.31778,
≥2,20000,mmstar_average,0.35941177079666153,
≥2,20000,ocrbench_ocrbench_accuracy,0.607,
≥2,20000,seedbench_seed_all,0.5788215675375209,
≥2,20000,textvqa_val_exact_match,0.6195400000000001,0.0065546800414733606
≥3,1000,ai2d_exact_match,0.46696891191709844,0.008979495543032526
≥3,1000,average,0.4819077497875202,
≥3,1000,average_rank,2.6,
≥3,1000,chartqa_relaxed_overall,0.6376,0.009615793331418735
≥3,1000,docvqa_val_anls,0.6765375416572318,0.00595906808496784
≥3,1000,infovqa_val_anls,0.28562874655210324,0.007377482443151623
≥3,1000,mme_total_score,1210.921368547419,
≥3,1000,mmmu_val_mmmu_acc,0.28556,
≥3,1000,mmstar_average,0.342259005993489,
≥3,1000,ocrbench_ocrbench_accuracy,0.534,
≥3,1000,seedbench_seed_all,0.5559755419677599,
≥3,1000,textvqa_val_exact_match,0.5526399999999999,0.006746058696867995
≥3,2000,ai2d_exact_match,0.4685880829015544,0.008981377477192708
≥3,2000,average,0.48620065133730717,
≥3,2000,average_rank,2.0,
≥3,2000,chartqa_relaxed_overall,0.6356,0.009627155802808046
≥3,2000,docvqa_val_anls,0.6718354763369633,0.005996528203070324
≥3,2000,infovqa_val_anls,0.26419815798743335,0.006939135175774486
≥3,2000,mme_total_score,1264.654161664666,
≥3,2000,mmmu_val_mmmu_acc,0.30333,
≥3,2000,mmstar_average,0.3562263182394967,
≥3,2000,ocrbench_ocrbench_accuracy,0.551,
≥3,2000,seedbench_seed_all,0.5580878265703169,
≥3,2000,textvqa_val_exact_match,0.56694,0.0067100232609457085
≥3,3000,ai2d_exact_match,0.4838082901554404,0.008994434238637763
≥3,3000,average,0.48597261570869915,
≥3,3000,average_rank,3.0,
≥3,3000,chartqa_relaxed_overall,0.6316,0.009649342979082627
≥3,3000,docvqa_val_anls,0.6746316657514325,0.005965125654000594
≥3,3000,infovqa_val_anls,0.26946459224600977,0.007089931445596614
≥3,3000,mme_total_score,1247.7741096438576,
≥3,3000,mmmu_val_mmmu_acc,0.28556,
≥3,3000,mmstar_average,0.3458825563160156,
≥3,3000,ocrbench_ocrbench_accuracy,0.562,
≥3,3000,seedbench_seed_all,0.5555864369093941,
≥3,3000,textvqa_val_exact_match,0.56522,0.006727876573231477
≥3,4000,ai2d_exact_match,0.48575129533678757,0.008995499260034972
≥3,4000,average,0.49355357269641903,
≥3,4000,average_rank,2.2,
≥3,4000,chartqa_relaxed_overall,0.6548,0.009510571191350932
≥3,4000,docvqa_val_anls,0.6853328262496681,0.0059320222320751875
≥3,4000,infovqa_val_anls,0.2850385340966683,0.007361799302921674
≥3,4000,mme_total_score,1288.7305922368948,
≥3,4000,mmmu_val_mmmu_acc,0.30111,
≥3,4000,mmstar_average,0.357958492470139,
≥3,4000,ocrbench_ocrbench_accuracy,0.541,
≥3,4000,seedbench_seed_all,0.5598110061145081,
≥3,4000,textvqa_val_exact_match,0.57118,0.006705227329084893
≥3,5000,ai2d_exact_match,0.4727979274611399,0.008985826352357517
≥3,5000,average,0.4915808423458039,
≥3,5000,average_rank,2.6,
≥3,5000,chartqa_relaxed_overall,0.6516,0.009531175862679805
≥3,5000,docvqa_val_anls,0.6805544343770252,0.005954062592926349
≥3,5000,infovqa_val_anls,0.2790745100628044,0.007226853744230138
≥3,5000,mme_total_score,1234.2862144857945,
≥3,5000,mmmu_val_mmmu_acc,0.30889,
≥3,5000,mmstar_average,0.3515137442307206,
≥3,5000,ocrbench_ocrbench_accuracy,0.548,
≥3,5000,seedbench_seed_all,0.5665369649805447,
≥3,5000,textvqa_val_exact_match,0.56526,0.006737603842695726
≥3,6000,ai2d_exact_match,0.4834844559585492,0.008994243503406857
≥3,6000,average,0.4916502418219912,
≥3,6000,average_rank,2.8,
≥3,6000,chartqa_relaxed_overall,0.6504,0.009538780390203614
≥3,6000,docvqa_val_anls,0.6884045774843457,0.005915343845415068
≥3,6000,infovqa_val_anls,0.27942328823789453,0.007164390448746867
≥3,6000,mme_total_score,1169.1683673469388,
≥3,6000,mmmu_val_mmmu_acc,0.28333,
≥3,6000,mmstar_average,0.3409248964069589,
≥3,6000,ocrbench_ocrbench_accuracy,0.564,
≥3,6000,seedbench_seed_all,0.5649249583101723,
≥3,6000,textvqa_val_exact_match,0.5699599999999999,0.006704305275108255
≥3,7000,ai2d_exact_match,0.49158031088082904,0.008997878107766406
≥3,7000,average,0.498429728288751,
≥3,7000,average_rank,2.5,
≥3,7000,chartqa_relaxed_overall,0.6468,0.009561196085649289
≥3,7000,docvqa_val_anls,0.6934235036732116,0.005908575911274035
≥3,7000,infovqa_val_anls,0.29038240983122426,0.0073217194111741745
≥3,7000,mme_total_score,1180.5045018007202,
≥3,7000,mmmu_val_mmmu_acc,0.31111,
≥3,7000,mmstar_average,0.3501487176509592,
≥3,7000,ocrbench_ocrbench_accuracy,0.562,
≥3,7000,seedbench_seed_all,0.5647026125625347,
≥3,7000,textvqa_val_exact_match,0.57572,0.00668387845238326
≥3,8000,ai2d_exact_match,0.5048575129533679,0.008998729431386472
≥3,8000,average,0.4984949624992469,
≥3,8000,average_rank,2.8,
≥3,8000,chartqa_relaxed_overall,0.6472,0.009558734841217527
≥3,8000,docvqa_val_anls,0.7000334929155309,0.005878854074791644
≥3,8000,infovqa_val_anls,0.286719889854365,0.0073233352192073635
≥3,8000,mme_total_score,1136.111644657863,
≥3,8000,mmmu_val_mmmu_acc,0.28778,
≥3,8000,mmstar_average,0.33112430039975294,
≥3,8000,ocrbench_ocrbench_accuracy,0.578,
≥3,8000,seedbench_seed_all,0.5710394663702056,
≥3,8000,textvqa_val_exact_match,0.5797000000000001,0.006692483833971778
≥3,9000,ai2d_exact_match,0.4944948186528497,0.00899860862761667
≥3,9000,average,0.5022809828687513,
≥3,9000,average_rank,2.8,
≥3,9000,chartqa_relaxed_overall,0.6648,0.009443095510537233
≥3,9000,docvqa_val_anls,0.7066412322864666,0.005811056629671494
≥3,9000,infovqa_val_anls,0.2915189250095514,0.007376511883779376
≥3,9000,mme_total_score,1097.9659863945578,
≥3,9000,mmmu_val_mmmu_acc,0.3,
≥3,9000,mmstar_average,0.34971925063698756,
≥3,9000,ocrbench_ocrbench_accuracy,0.565,
≥3,9000,seedbench_seed_all,0.5663146192329072,
≥3,9000,textvqa_val_exact_match,0.58204,0.006677395090979731
≥3,10000,ai2d_exact_match,0.49287564766839376,0.008998240543632312
≥3,10000,average,0.5094325810245673,
≥3,10000,average_rank,2.2,
≥3,10000,chartqa_relaxed_overall,0.6704,0.009403239035659185
≥3,10000,docvqa_val_anls,0.7142047579734908,0.005771728801461397
≥3,10000,infovqa_val_anls,0.2964737261567996,0.007512514632225057
≥3,10000,mme_total_score,1149.6209483793518,
≥3,10000,mmmu_val_mmmu_acc,0.30778,
≥3,10000,mmstar_average,0.3527466682951291,
≥3,10000,ocrbench_ocrbench_accuracy,0.577,
≥3,10000,seedbench_seed_all,0.5703724291272929,
≥3,10000,textvqa_val_exact_match,0.6030399999999999,0.006618920886575133
≥3,11000,ai2d_exact_match,0.506800518134715,0.008998321712163861
≥3,11000,average,0.5127105840130626,
≥3,11000,average_rank,2.3,
≥3,11000,chartqa_relaxed_overall,0.6676,0.009423354808471266
≥3,11000,docvqa_val_anls,0.7155651550295605,0.005774638250173171
≥3,11000,infovqa_val_anls,0.2960078107648859,0.007491292300444957
≥3,11000,mme_total_score,1091.2908163265306,
≥3,11000,mmmu_val_mmmu_acc,0.31,
≥3,11000,mmstar_average,0.3487291874190849,
≥3,11000,ocrbench_ocrbench_accuracy,0.597,
≥3,11000,seedbench_seed_all,0.5746525847693162,
≥3,11000,textvqa_val_exact_match,0.59804,0.006635181746369987
≥3,12000,ai2d_exact_match,0.5045336787564767,0.008998784170060779
≥3,12000,average,0.5136574673989756,
≥3,12000,average_rank,2.5,
≥3,12000,chartqa_relaxed_overall,0.6648,0.009443095510537233
≥3,12000,docvqa_val_anls,0.7173866974813923,0.005773853880330729
≥3,12000,infovqa_val_anls,0.31948469442993455,0.007793312195447671
≥3,12000,mme_total_score,1082.547619047619,
≥3,12000,mmmu_val_mmmu_acc,0.29889,
≥3,12000,mmstar_average,0.3472872054060228,
≥3,12000,ocrbench_ocrbench_accuracy,0.593,
≥3,12000,seedbench_seed_all,0.5748749305169538,
≥3,12000,textvqa_val_exact_match,0.6026600000000001,0.006626535072978538
≥3,13000,ai2d_exact_match,0.4996761658031088,0.008999152231809674
≥3,13000,average,0.5115591424915379,
≥3,13000,average_rank,3.0,
≥3,13000,chartqa_relaxed_overall,0.668,0.009420504145710235
≥3,13000,docvqa_val_anls,0.7201586562486062,0.005765862770757432
≥3,13000,infovqa_val_anls,0.30087605763050673,0.007444543350447085
≥3,13000,mme_total_score,1142.0209083633454,
≥3,13000,mmmu_val_mmmu_acc,0.31333,
≥3,13000,mmstar_average,0.35413412647702824,
≥3,13000,ocrbench_ocrbench_accuracy,0.568,
≥3,13000,seedbench_seed_all,0.5750972762645914,
≥3,13000,textvqa_val_exact_match,0.60476,0.0066167835724745445
≥3,14000,ai2d_exact_match,0.5051813471502591,0.008998670917263325
≥3,14000,average,0.512283584996583,
≥3,14000,average_rank,2.9,
≥3,14000,chartqa_relaxed_overall,0.6748,0.009370864914387439
≥3,14000,docvqa_val_anls,0.7235575236423071,0.0057268410738261786
≥3,14000,infovqa_val_anls,0.30893243437607226,0.007712373578271492
≥3,14000,mme_total_score,1159.9943977591035,
≥3,14000,mmmu_val_mmmu_acc,0.29667,
≥3,14000,mmstar_average,0.3421543616905478,
≥3,14000,ocrbench_ocrbench_accuracy,0.576,
≥3,14000,seedbench_seed_all,0.5778765981100611,
≥3,14000,textvqa_val_exact_match,0.6053799999999999,0.006612545370071516
≥3,15000,ai2d_exact_match,0.501619170984456,0.00899910693271464
≥3,15000,average,0.5157692661333466,
≥3,15000,average_rank,2.6,
≥3,15000,chartqa_relaxed_overall,0.6836,0.009303280948921504
≥3,15000,docvqa_val_anls,0.7289675184474169,0.005688711489562826
≥3,15000,infovqa_val_anls,0.31447779168584217,0.0076280570930290885
≥3,15000,mme_total_score,1129.2125850340135,
≥3,15000,mmmu_val_mmmu_acc,0.31222,
≥3,15000,mmstar_average,0.35035142103070877,
≥3,15000,ocrbench_ocrbench_accuracy,0.563,
≥3,15000,seedbench_seed_all,0.5774874930516953,
≥3,15000,textvqa_val_exact_match,0.6102,0.006593260666562748
≥3,16000,ai2d_exact_match,0.506800518134715,0.00899832171216386
≥3,16000,average,0.5182958246289815,
≥3,16000,average_rank,2.5,
≥3,16000,chartqa_relaxed_overall,0.674,0.009376820884924869
≥3,16000,docvqa_val_anls,0.7332718536740643,0.005664165532854214
≥3,16000,infovqa_val_anls,0.3097055695251213,0.007564531791761635
≥3,16000,mme_total_score,1158.8010204081631,
≥3,16000,mmmu_val_mmmu_acc,0.30889,
≥3,16000,mmstar_average,0.3535555364692335,
≥3,16000,ocrbench_ocrbench_accuracy,0.588,
≥3,16000,seedbench_seed_all,0.5780989438576987,
≥3,16000,textvqa_val_exact_match,0.61234,0.006584482968555135
≥3,17000,ai2d_exact_match,0.4990284974093264,0.008999137132137064
≥3,17000,average,0.517538300624539,
≥3,17000,average_rank,2.8,
≥3,17000,chartqa_relaxed_overall,0.6736,0.009379787213112317
≥3,17000,docvqa_val_anls,0.7343487873475517,0.005650745093023672
≥3,17000,infovqa_val_anls,0.30023060019445785,0.007383738588396597
≥3,17000,mme_total_score,1158.095238095238,
≥3,17000,mmmu_val_mmmu_acc,0.31222,
≥3,17000,mmstar_average,0.3493043581903593,
≥3,17000,ocrbench_ocrbench_accuracy,0.594,
≥3,17000,seedbench_seed_all,0.5784324624791551,
≥3,17000,textvqa_val_exact_match,0.61668,0.0065583044906102304
≥3,18000,ai2d_exact_match,0.5058290155440415,0.008998542562369287
≥3,18000,average,0.5182210734972332,
≥3,18000,average_rank,2.5,
≥3,18000,chartqa_relaxed_overall,0.674,0.009376820884924869
≥3,18000,docvqa_val_anls,0.7287326909630594,0.005700735629180951
≥3,18000,infovqa_val_anls,0.30100700787702633,0.007386740457934267
≥3,18000,mme_total_score,1175.5579231692677,
≥3,18000,mmmu_val_mmmu_acc,0.32,
≥3,18000,mmstar_average,0.34714462691309605,
≥3,18000,ocrbench_ocrbench_accuracy,0.6,
≥3,18000,seedbench_seed_all,0.5773763201778765,
≥3,18000,textvqa_val_exact_match,0.6099,0.006589801445917723
≥3,19000,ai2d_exact_match,0.5045336787564767,0.008998784170060777
≥3,19000,average,0.5187824665863345,
≥3,19000,average_rank,2.7,
≥3,19000,chartqa_relaxed_overall,0.6768,0.009355838641547569
≥3,19000,docvqa_val_anls,0.7340665543774125,0.005662673189593881
≥3,19000,infovqa_val_anls,0.3094998838176309,0.007498739242965892
≥3,19000,mme_total_score,1173.1207482993198,
≥3,19000,mmmu_val_mmmu_acc,0.30444,
≥3,19000,mmstar_average,0.34505224352615843,
≥3,19000,ocrbench_ocrbench_accuracy,0.597,
≥3,19000,seedbench_seed_all,0.5777098387993329,
≥3,19000,textvqa_val_exact_match,0.6199399999999999,0.0065535844523310115
≥3,20000,ai2d_exact_match,0.4944948186528497,0.008998608627616674
≥3,20000,average,0.5158935311436484,
≥3,20000,average_rank,2.2,
≥3,20000,chartqa_relaxed_overall,0.6788,0.00934061683451043
≥3,20000,docvqa_val_anls,0.7330651042103438,0.0056772111451400455
≥3,20000,infovqa_val_anls,0.2964558374276726,0.007412691037826716
≥3,20000,mme_total_score,1203.7891156462586,
≥3,20000,mmmu_val_mmmu_acc,0.31556,
≥3,20000,mmstar_average,0.3448737131648378,
≥3,20000,ocrbench_ocrbench_accuracy,0.592,
≥3,20000,seedbench_seed_all,0.5741523068371317,
≥3,20000,textvqa_val_exact_match,0.6136400000000001,0.006578650759020563
≥4,1000,ai2d_exact_match,0.46599740932642486,0.008978320789223164
≥4,1000,average,0.4810433130994131,
≥4,1000,average_rank,3.0,
≥4,1000,chartqa_relaxed_overall,0.6364,0.009622632385247222
≥4,1000,docvqa_val_anls,0.6731681544556957,0.005980246808815758
≥4,1000,infovqa_val_anls,0.273064875980351,0.007121239402495689
≥4,1000,mme_total_score,1069.875850340136,
≥4,1000,mmmu_val_mmmu_acc,0.28889,
≥4,1000,mmstar_average,0.35232408630345313,
≥4,1000,ocrbench_ocrbench_accuracy,0.54,
≥4,1000,seedbench_seed_all,0.5455252918287937,
≥4,1000,textvqa_val_exact_match,0.5540200000000001,0.006743431077169729
≥4,2000,ai2d_exact_match,0.4579015544041451,0.00896719935987288
≥4,2000,average,0.4733427752805117,
≥4,2000,average_rank,4.1,
≥4,2000,chartqa_relaxed_overall,0.6368,0.009620359896064799
≥4,2000,docvqa_val_anls,0.6685623057181342,0.006022846398992095
≥4,2000,infovqa_val_anls,0.2586347697028306,0.006939507684848232
≥4,2000,mme_total_score,1037.0391156462586,
≥4,2000,mmmu_val_mmmu_acc,0.27778,
≥4,2000,mmstar_average,0.3426833682664769,
≥4,2000,ocrbench_ocrbench_accuracy,0.517,
≥4,2000,seedbench_seed_all,0.5533629794330184,
≥4,2000,textvqa_val_exact_match,0.5473600000000001,0.006769325729654826
≥4,3000,ai2d_exact_match,0.4724740932642487,0.008985506893308395
≥4,3000,average,0.48486620292260835,
≥4,3000,average_rank,3.1,
≥4,3000,chartqa_relaxed_overall,0.648,0.009553790345406665
≥4,3000,docvqa_val_anls,0.6797920414745026,0.0059259219910189455
≥4,3000,infovqa_val_anls,0.25291991664683544,0.0068990348571168
≥4,3000,mme_total_score,989.6139455782312,
≥4,3000,mmmu_val_mmmu_acc,0.31889,
≥4,3000,mmstar_average,0.359381486980145,
≥4,3000,ocrbench_ocrbench_accuracy,0.528,
≥4,3000,seedbench_seed_all,0.5529182879377432,
≥4,3000,textvqa_val_exact_match,0.55142,0.006751052663282407
≥4,4000,ai2d_exact_match,0.48704663212435234,0.008996133680935945
≥4,4000,average,0.4833844903828087,
≥4,4000,average_rank,3.6,
≥4,4000,chartqa_relaxed_overall,0.634,0.00963611653607192
≥4,4000,docvqa_val_anls,0.6872369707367743,0.005902275856072045
≥4,4000,infovqa_val_anls,0.26951247528968925,0.007084476663871501
≥4,4000,mme_total_score,943.8639455782313,
≥4,4000,mmmu_val_mmmu_acc,0.28556,
≥4,4000,mmstar_average,0.3561252135601649,
≥4,4000,ocrbench_ocrbench_accuracy,0.525,
≥4,4000,seedbench_seed_all,0.5544191217342969,
≥4,4000,textvqa_val_exact_match,0.55156,0.006755726552211068
≥4,5000,ai2d_exact_match,0.4944948186528497,0.008998608627616672
≥4,5000,average,0.4902778679880179,
≥4,5000,average_rank,3.2,
≥4,5000,chartqa_relaxed_overall,0.6524,0.009526069199715017
≥4,5000,docvqa_val_anls,0.6838821393578449,0.005934519981948664
≥4,5000,infovqa_val_anls,0.2885173111410286,0.007387917761485684
≥4,5000,mme_total_score,877.7568027210884,
≥4,5000,mmmu_val_mmmu_acc,0.28222,
≥4,5000,mmstar_average,0.3491170485770135,
≥4,5000,ocrbench_ocrbench_accuracy,0.543,
≥4,5000,seedbench_seed_all,0.5610894941634241,
≥4,5000,textvqa_val_exact_match,0.55778,0.006740043023304169
≥4,6000,ai2d_exact_match,0.47830310880829013,0.008990677331728418
≥4,6000,average,0.49160704402561856,
≥4,6000,average_rank,3.2,
≥4,6000,chartqa_relaxed_overall,0.6524,0.009526069199715017
≥4,6000,docvqa_val_anls,0.6895610098990497,0.005895883993977457
≥4,6000,infovqa_val_anls,0.29445466931250164,0.007468796422091737
≥4,6000,mme_total_score,959.8639455782312,
≥4,6000,mmmu_val_mmmu_acc,0.29889,
≥4,6000,mmstar_average,0.33644817130133137,
≥4,6000,ocrbench_ocrbench_accuracy,0.56,
≥4,6000,seedbench_seed_all,0.5555864369093941,
≥4,6000,textvqa_val_exact_match,0.5588199999999999,0.006728260950578821
≥4,7000,ai2d_exact_match,0.5006476683937824,0.008999146569435549
≥4,7000,average,0.4896016595164798,
≥4,7000,average_rank,3.4,
≥4,7000,chartqa_relaxed_overall,0.6572,0.009494805133851454
≥4,7000,docvqa_val_anls,0.6893686651094205,0.0058969034940001145
≥4,7000,infovqa_val_anls,0.2859893612299588,0.00729316403263038
≥4,7000,mme_total_score,927.2312925170069,
≥4,7000,mmmu_val_mmmu_acc,0.29222,
≥4,7000,mmstar_average,0.3472936989473972,
≥4,7000,ocrbench_ocrbench_accuracy,0.519,
≥4,7000,seedbench_seed_all,0.5559755419677599,
≥4,7000,textvqa_val_exact_match,0.55872,0.0067301301064875444
≥4,8000,ai2d_exact_match,0.4731217616580311,0.008986142019669732
≥4,8000,average,0.492078815307747,
≥4,8000,average_rank,3.5,
≥4,8000,chartqa_relaxed_overall,0.6676,0.009423354808471266
≥4,8000,docvqa_val_anls,0.6925495561792242,0.005900020554879468
≥4,8000,infovqa_val_anls,0.2810222429209379,0.007207972787105912
≥4,8000,mme_total_score,848.2908163265306,
≥4,8000,mmmu_val_mmmu_acc,0.31333,
≥4,8000,mmstar_average,0.350192703081569,
≥4,8000,ocrbench_ocrbench_accuracy,0.529,
≥4,8000,seedbench_seed_all,0.5595330739299611,
≥4,8000,textvqa_val_exact_match,0.56236,0.006739267736625781
≥4,9000,ai2d_exact_match,0.48737046632124353,0.0089962828388782
≥4,9000,average,0.49234565866208857,
≥4,9000,average_rank,4.1,
≥4,9000,chartqa_relaxed_overall,0.6608,0.009470650520873179
≥4,9000,docvqa_val_anls,0.6999407172900073,0.0058399608509493465
≥4,9000,infovqa_val_anls,0.28057597856713984,0.0072085582760555555
≥4,9000,mme_total_score,971.6003401360543,
≥4,9000,mmmu_val_mmmu_acc,0.28111,
≥4,9000,mmstar_average,0.3444424817337138,
≥4,9000,ocrbench_ocrbench_accuracy,0.545,
≥4,9000,seedbench_seed_all,0.5603112840466926,
≥4,9000,textvqa_val_exact_match,0.5715600000000001,0.006710949310502175
≥4,10000,ai2d_exact_match,0.5009715025906736,0.008999137132137068
≥4,10000,average,0.5030450627246211,
≥4,10000,average_rank,3.3,
≥4,10000,chartqa_relaxed_overall,0.666,0.009434680984649817
≥4,10000,docvqa_val_anls,0.7128440324276674,0.005793211438464534
≥4,10000,infovqa_val_anls,0.28379375750066616,0.007201019014370097
≥4,10000,mme_total_score,823.7772108843537,
≥4,10000,mmmu_val_mmmu_acc,0.30444,
≥4,10000,mmstar_average,0.35617504910097153,
≥4,10000,ocrbench_ocrbench_accuracy,0.563,
≥4,10000,seedbench_seed_all,0.562201222901612,
≥4,10000,textvqa_val_exact_match,0.57798,0.00669581889824864
≥4,11000,ai2d_exact_match,0.4899611398963731,0.008997340090107678
≥4,11000,average,0.5043945508574572,
≥4,11000,average_rank,3.7,
≥4,11000,chartqa_relaxed_overall,0.6684,0.009417645821601513
≥4,11000,docvqa_val_anls,0.718360308980877,0.00573640855634517
≥4,11000,infovqa_val_anls,0.3061911172660032,0.007586892248142986
≥4,11000,mme_total_score,913.9846938775511,
≥4,11000,mmmu_val_mmmu_acc,0.30444,
≥4,11000,mmstar_average,0.3441847617795319,
≥4,11000,ocrbench_ocrbench_accuracy,0.572,
≥4,11000,seedbench_seed_all,0.5605336297943302,
≥4,11000,textvqa_val_exact_match,0.5754799999999999,0.006700024775058468
≥4,12000,ai2d_exact_match,0.48737046632124353,0.0089962828388782
≥4,12000,average,0.5040020270755444,
≥4,12000,average_rank,3.7,
≥4,12000,chartqa_relaxed_overall,0.6708,0.009400334595970852
≥4,12000,docvqa_val_anls,0.7119962267424205,0.0057890771916119035
≥4,12000,infovqa_val_anls,0.29271378410211696,0.007308133874246768
≥4,12000,mme_total_score,857.7363945578231,
≥4,12000,mmmu_val_mmmu_acc,0.31333,
≥4,12000,mmstar_average,0.3366375608443022,
≥4,12000,ocrbench_ocrbench_accuracy,0.578,
≥4,12000,seedbench_seed_all,0.5663702056698166,
≥4,12000,textvqa_val_exact_match,0.5788,0.006686093984573812
≥4,13000,ai2d_exact_match,0.5029145077720207,0.008999001233939135
≥4,13000,average,0.5025324527027837,
≥4,13000,average_rank,3.8,
≥4,13000,chartqa_relaxed_overall,0.6736,0.009379787213112317
≥4,13000,docvqa_val_anls,0.7115068932890629,0.0057865061972425
≥4,13000,infovqa_val_anls,0.28657766964072817,0.007222563686487699
≥4,13000,mme_total_score,912.2363945578231,
≥4,13000,mmmu_val_mmmu_acc,0.30222,
≥4,13000,mmstar_average,0.35323329267271303,
≥4,13000,ocrbench_ocrbench_accuracy,0.548,
≥4,13000,seedbench_seed_all,0.5634797109505281,
≥4,13000,textvqa_val_exact_match,0.58126,0.006685319826323647
≥4,14000,ai2d_exact_match,0.5029145077720207,0.008999001233939133
≥4,14000,average,0.5048464815785578,
≥4,14000,average_rank,3.7,
≥4,14000,chartqa_relaxed_overall,0.6836,0.009303280948921504
≥4,14000,docvqa_val_anls,0.7158797575412708,0.005776895411277372
≥4,14000,infovqa_val_anls,0.2977244895971059,0.007409958547797003
≥4,14000,mme_total_score,863.687074829932,
≥4,14000,mmmu_val_mmmu_acc,0.30111,
≥4,14000,mmstar_average,0.33994823410485003,
≥4,14000,ocrbench_ocrbench_accuracy,0.562,
≥4,14000,seedbench_seed_all,0.5584213451917732,
≥4,14000,textvqa_val_exact_match,0.58202,0.0066807687023343965
≥4,15000,ai2d_exact_match,0.5123056994818653,0.008996428218289531
≥4,15000,average,0.5109092566320045,
≥4,15000,average_rank,3.5,
≥4,15000,chartqa_relaxed_overall,0.6712,0.009397422445513864
≥4,15000,docvqa_val_anls,0.7188356324043049,0.005760252125758746
≥4,15000,infovqa_val_anls,0.31301984081498224,0.007566633771439808
≥4,15000,mme_total_score,910.8588435374149,
≥4,15000,mmmu_val_mmmu_acc,0.31333,
≥4,15000,mmstar_average,0.3405171175316349,
≥4,15000,ocrbench_ocrbench_accuracy,0.57,
≥4,15000,seedbench_seed_all,0.5630350194552529,
≥4,15000,textvqa_val_exact_match,0.59594,0.006638698497713893
≥4,16000,ai2d_exact_match,0.5035621761658031,0.008998925734053562
≥4,16000,average,0.5129618260818141,
≥4,16000,average_rank,3.5,
≥4,16000,chartqa_relaxed_overall,0.6816,0.00931897598051042
≥4,16000,docvqa_val_anls,0.721094250093947,0.005753477697941139
≥4,16000,infovqa_val_anls,0.3169075222245947,0.007639057821423446
≥4,16000,mme_total_score,900.3299319727892,
≥4,16000,mmmu_val_mmmu_acc,0.30222,
≥4,16000,mmstar_average,0.34176931782507797,
≥4,16000,ocrbench_ocrbench_accuracy,0.588,
≥4,16000,seedbench_seed_all,0.5657031684269038,
≥4,16000,textvqa_val_exact_match,0.5958,0.006635034041762488
≥4,17000,ai2d_exact_match,0.49190414507772023,0.008997974381217109
≥4,17000,average,0.5093728538878098,
≥4,17000,average_rank,3.7,
≥4,17000,chartqa_relaxed_overall,0.68,0.009331389496316869
≥4,17000,docvqa_val_anls,0.7210491309814,0.005753367994292813
|
| 907 |
+
≥4,17000,infovqa_val_anls,0.3201561983029552,0.007662267005009952
|
| 908 |
+
≥4,17000,mme_total_score,877.8401360544218,
|
| 909 |
+
≥4,17000,mmmu_val_mmmu_acc,0.30778,
|
| 910 |
+
≥4,17000,mmstar_average,0.3348906353085911,
|
| 911 |
+
≥4,17000,ocrbench_ocrbench_accuracy,0.573,
|
| 912 |
+
≥4,17000,seedbench_seed_all,0.564035575319622,
|
| 913 |
+
≥4,17000,textvqa_val_exact_match,0.59154,0.006655985735352941
|
| 914 |
+
≥4,18000,ai2d_exact_match,0.49028497409326427,0.008997455247470554
|
| 915 |
+
≥4,18000,average,0.5099416307525485,
|
| 916 |
+
≥4,18000,average_rank,3.7,
|
| 917 |
+
≥4,18000,chartqa_relaxed_overall,0.6788,0.00934061683451043
|
| 918 |
+
≥4,18000,docvqa_val_anls,0.7282528158071215,0.00570007403218014
|
| 919 |
+
≥4,18000,infovqa_val_anls,0.3087200968720397,0.007513490264469946
|
| 920 |
+
≥4,18000,mme_total_score,929.9506802721088,
|
| 921 |
+
≥4,18000,mmmu_val_mmmu_acc,0.30111,
|
| 922 |
+
≥4,18000,mmstar_average,0.342890936748704,
|
| 923 |
+
≥4,18000,ocrbench_ocrbench_accuracy,0.585,
|
| 924 |
+
≥4,18000,seedbench_seed_all,0.5645358532518066,
|
| 925 |
+
≥4,18000,textvqa_val_exact_match,0.5898800000000001,0.006662761859703513
|
| 926 |
+
≥4,19000,ai2d_exact_match,0.49417098445595853,0.008998542562369278
|
| 927 |
+
≥4,19000,average,0.5124245876024062,
|
| 928 |
+
≥4,19000,average_rank,3.7,
|
| 929 |
+
≥4,19000,chartqa_relaxed_overall,0.6752,0.00936787525721462
|
| 930 |
+
≥4,19000,docvqa_val_anls,0.7321651331177514,0.005674418458926489
|
| 931 |
+
≥4,19000,infovqa_val_anls,0.3128382816486564,0.007545062449451713
|
| 932 |
+
≥4,19000,mme_total_score,920.5612244897959,
|
| 933 |
+
≥4,19000,mmmu_val_mmmu_acc,0.31444,
|
| 934 |
+
≥4,19000,mmstar_average,0.3381960053749425,
|
| 935 |
+
≥4,19000,ocrbench_ocrbench_accuracy,0.581,
|
| 936 |
+
≥4,19000,seedbench_seed_all,0.5635908838243469,
|
| 937 |
+
≥4,19000,textvqa_val_exact_match,0.60022,0.0066236821295251325
|
| 938 |
+
≥4,20000,ai2d_exact_match,0.4993523316062176,0.008999146569435543
|
| 939 |
+
≥4,20000,average,0.5097536365259775,
|
| 940 |
+
≥4,20000,average_rank,2.9,
|
| 941 |
+
≥4,20000,chartqa_relaxed_overall,0.6788,0.00934061683451043
|
| 942 |
+
≥4,20000,docvqa_val_anls,0.7257805691640822,0.005714530309266441
|
| 943 |
+
≥4,20000,infovqa_val_anls,0.3115295213783156,0.007581035362425172
|
| 944 |
+
≥4,20000,mme_total_score,936.8911564625851,
|
| 945 |
+
≥4,20000,mmmu_val_mmmu_acc,0.29333,
|
| 946 |
+
≥4,20000,mmstar_average,0.342179144828651,
|
| 947 |
+
≥4,20000,ocrbench_ocrbench_accuracy,0.572,
|
| 948 |
+
≥4,20000,seedbench_seed_all,0.5640911617565314,
|
| 949 |
+
≥4,20000,textvqa_val_exact_match,0.6007199999999999,0.00662859592800733
|
| 950 |
+
≥5,1000,ai2d_exact_match,0.46275906735751293,0.008974157783087492
|
| 951 |
+
≥5,1000,average,0.46601067306382465,
|
| 952 |
+
≥5,1000,average_rank,4.3,
|
| 953 |
+
≥5,1000,chartqa_relaxed_overall,0.586,0.009852940280589808
|
| 954 |
+
≥5,1000,docvqa_val_anls,0.6587979311295683,0.006033428065938081
|
| 955 |
+
≥5,1000,infovqa_val_anls,0.26573226652787757,0.007027770857338852
|
| 956 |
+
≥5,1000,mme_total_score,1141.8704481792718,
|
| 957 |
+
≥5,1000,mmmu_val_mmmu_acc,0.29111,
|
| 958 |
+
≥5,1000,mmstar_average,0.3326633517590184,
|
| 959 |
+
≥5,1000,ocrbench_ocrbench_accuracy,0.512,
|
| 960 |
+
≥5,1000,seedbench_seed_all,0.5481934408004447,
|
| 961 |
+
≥5,1000,textvqa_val_exact_match,0.53684,0.0067933638823904985
|
| 962 |
+
≥5,2000,ai2d_exact_match,0.46729274611398963,0.008979879527453428
|
| 963 |
+
≥5,2000,average,0.46843615619784085,
|
| 964 |
+
≥5,2000,average_rank,4.5,
|
| 965 |
+
≥5,2000,chartqa_relaxed_overall,0.6232,0.009693621125059844
|
| 966 |
+
≥5,2000,docvqa_val_anls,0.6576303662245503,0.0060380542666198
|
| 967 |
+
≥5,2000,infovqa_val_anls,0.2544768002153279,0.006980921578600097
|
| 968 |
+
≥5,2000,mme_total_score,1121.454081632653,
|
| 969 |
+
≥5,2000,mmmu_val_mmmu_acc,0.28333,
|
| 970 |
+
≥5,2000,mmstar_average,0.3471972386408189,
|
| 971 |
+
≥5,2000,ocrbench_ocrbench_accuracy,0.517,
|
| 972 |
+
≥5,2000,seedbench_seed_all,0.544858254585881,
|
| 973 |
+
≥5,2000,textvqa_val_exact_match,0.52094,0.006790900275023118
|
| 974 |
+
≥5,3000,ai2d_exact_match,0.405440414507772,0.00883675667187808
|
| 975 |
+
≥5,3000,average,0.46118026257153605,
|
| 976 |
+
≥5,3000,average_rank,4.2,
|
| 977 |
+
≥5,3000,chartqa_relaxed_overall,0.6156,0.009731008838409575
|
| 978 |
+
≥5,3000,docvqa_val_anls,0.6431483265654762,0.0060571462869005105
|
| 979 |
+
≥5,3000,infovqa_val_anls,0.25688718356638174,0.007171420821325129
|
| 980 |
+
≥5,3000,mme_total_score,1082.7074829931973,
|
| 981 |
+
≥5,3000,mmmu_val_mmmu_acc,0.29778,
|
| 982 |
+
≥5,3000,mmstar_average,0.3516979671312099,
|
| 983 |
+
≥5,3000,ocrbench_ocrbench_accuracy,0.521,
|
| 984 |
+
≥5,3000,seedbench_seed_all,0.547248471372985,
|
| 985 |
+
≥5,3000,textvqa_val_exact_match,0.51182,0.006815757362882421
|
| 986 |
+
≥5,4000,ai2d_exact_match,0.4491580310880829,0.008952509302111547
|
| 987 |
+
≥5,4000,average,0.4646276743863569,
|
| 988 |
+
≥5,4000,average_rank,4.7,
|
| 989 |
+
≥5,4000,chartqa_relaxed_overall,0.626,0.009679208378267924
|
| 990 |
+
≥5,4000,docvqa_val_anls,0.6457409563970327,0.006096190550822001
|
| 991 |
+
≥5,4000,infovqa_val_anls,0.2657314142312884,0.007188343779485259
|
| 992 |
+
≥5,4000,mme_total_score,1068.4183673469388,
|
| 993 |
+
≥5,4000,mmmu_val_mmmu_acc,0.30333,
|
| 994 |
+
≥5,4000,mmstar_average,0.33033606075691685,
|
| 995 |
+
≥5,4000,ocrbench_ocrbench_accuracy,0.501,
|
| 996 |
+
≥5,4000,seedbench_seed_all,0.546692607003891,
|
| 997 |
+
≥5,4000,textvqa_val_exact_match,0.5136599999999999,0.006800789000270868
|
| 998 |
+
≥5,5000,ai2d_exact_match,0.4630829015544041,0.008974591204222938
|
| 999 |
+
≥5,5000,average,0.47175640838182836,
|
| 1000 |
+
≥5,5000,average_rank,4.3,
|
| 1001 |
+
≥5,5000,chartqa_relaxed_overall,0.6016,0.009793331391099473
|
| 1002 |
+
≥5,5000,docvqa_val_anls,0.6583943642704193,0.006028435004183156
|
| 1003 |
+
≥5,5000,infovqa_val_anls,0.28445442343834715,0.007395809557151278
|
| 1004 |
+
≥5,5000,mme_total_score,1063.6232492997199,
|
| 1005 |
+
≥5,5000,mmmu_val_mmmu_acc,0.30111,
|
| 1006 |
+
≥5,5000,mmstar_average,0.34898828745177285,
|
| 1007 |
+
≥5,5000,ocrbench_ocrbench_accuracy,0.524,
|
| 1008 |
+
≥5,5000,seedbench_seed_all,0.5438576987215119,
|
| 1009 |
+
≥5,5000,textvqa_val_exact_match,0.52032,0.006801255099919928
|
| 1010 |
+
≥5,6000,ai2d_exact_match,0.41936528497409326,0.008881358943343104
|
| 1011 |
+
≥5,6000,average,0.4607787848869989,
|
| 1012 |
+
≥5,6000,average_rank,4.3,
|
| 1013 |
+
≥5,6000,chartqa_relaxed_overall,0.5688,0.009906860368095493
|
| 1014 |
+
≥5,6000,docvqa_val_anls,0.6530464526768891,0.006064278476677726
|
| 1015 |
+
≥5,6000,infovqa_val_anls,0.2838075612518576,0.007411753553258339
|
| 1016 |
+
≥5,6000,mme_total_score,1102.3075230092036,
|
| 1017 |
+
≥5,6000,mmmu_val_mmmu_acc,0.30222,
|
| 1018 |
+
≥5,6000,mmstar_average,0.3384709546298998,
|
| 1019 |
+
≥5,6000,ocrbench_ocrbench_accuracy,0.514,
|
| 1020 |
+
≥5,6000,seedbench_seed_all,0.5458588104502501,
|
| 1021 |
+
≥5,6000,textvqa_val_exact_match,0.52144,0.006795447071398616
|
| 1022 |
+
≥5,7000,ai2d_exact_match,0.44689119170984454,0.008948245073044946
|
| 1023 |
+
≥5,7000,average,0.46361553646961884,
|
| 1024 |
+
≥5,7000,average_rank,4.8,
|
| 1025 |
+
≥5,7000,chartqa_relaxed_overall,0.596,0.009815912634917984
|
| 1026 |
+
≥5,7000,docvqa_val_anls,0.6473018376832792,0.00607167873881633
|
| 1027 |
+
≥5,7000,infovqa_val_anls,0.2701993608610082,0.007205660851186524
|
| 1028 |
+
≥5,7000,mme_total_score,1018.4163665466186,
|
| 1029 |
+
≥5,7000,mmmu_val_mmmu_acc,0.30556,
|
| 1030 |
+
≥5,7000,mmstar_average,0.33545236293074815,
|
| 1031 |
+
≥5,7000,ocrbench_ocrbench_accuracy,0.512,
|
| 1032 |
+
≥5,7000,seedbench_seed_all,0.5431350750416898,
|
| 1033 |
+
≥5,7000,textvqa_val_exact_match,0.516,0.006822412261202951
|
| 1034 |
+
≥5,8000,ai2d_exact_match,0.4488341968911917,0.008951911635408226
|
| 1035 |
+
≥5,8000,average,0.4668616879585074,
|
| 1036 |
+
≥5,8000,average_rank,4.6,
|
| 1037 |
+
≥5,8000,chartqa_relaxed_overall,0.6072,0.00976941352263433
|
| 1038 |
+
≥5,8000,docvqa_val_anls,0.6519934800658986,0.006080604378776126
|
| 1039 |
+
≥5,8000,infovqa_val_anls,0.2785842294592336,0.0074156607313128845
|
| 1040 |
+
≥5,8000,mme_total_score,1057.0833333333333,
|
| 1041 |
+
≥5,8000,mmmu_val_mmmu_acc,0.30333,
|
| 1042 |
+
≥5,8000,mmstar_average,0.3378731184509322,
|
| 1043 |
+
≥5,8000,ocrbench_ocrbench_accuracy,0.518,
|
| 1044 |
+
≥5,8000,seedbench_seed_all,0.5403001667593107,
|
| 1045 |
+
≥5,8000,textvqa_val_exact_match,0.51564,0.006799847473666819
|
| 1046 |
+
≥5,9000,ai2d_exact_match,0.39345854922279794,0.008792480650628204
|
| 1047 |
+
≥5,9000,average,0.459912532971243,
|
| 1048 |
+
≥5,9000,average_rank,4.4,
|
| 1049 |
+
≥5,9000,chartqa_relaxed_overall,0.6084,0.00976411343463736
|
| 1050 |
+
≥5,9000,docvqa_val_anls,0.6541887939771373,0.006063084097609983
|
| 1051 |
+
≥5,9000,infovqa_val_anls,0.27276949319611876,0.007319733478874126
|
| 1052 |
+
≥5,9000,mme_total_score,1123.0184073629453,
|
| 1053 |
+
≥5,9000,mmmu_val_mmmu_acc,0.32333,
|
| 1054 |
+
≥5,9000,mmstar_average,0.3247860715180074,
|
| 1055 |
+
≥5,9000,ocrbench_ocrbench_accuracy,0.509,
|
| 1056 |
+
≥5,9000,seedbench_seed_all,0.5397998888271262,
|
| 1057 |
+
≥5,9000,textvqa_val_exact_match,0.51348,0.006813467735926963
|
| 1058 |
+
≥5,10000,ai2d_exact_match,0.4326424870466321,0.008917121282993509
|
| 1059 |
+
≥5,10000,average,0.46134428795967075,
|
| 1060 |
+
≥5,10000,average_rank,4.9,
|
| 1061 |
+
≥5,10000,chartqa_relaxed_overall,0.6072,0.00976941352263433
|
| 1062 |
+
≥5,10000,docvqa_val_anls,0.651687815510166,0.006071913532526164
|
| 1063 |
+
≥5,10000,infovqa_val_anls,0.27997237091892013,0.007395864137910542
|
| 1064 |
+
≥5,10000,mme_total_score,1022.3228291316527,
|
| 1065 |
+
≥5,10000,mmmu_val_mmmu_acc,0.28889,
|
| 1066 |
+
≥5,10000,mmstar_average,0.329809486810568,
|
| 1067 |
+
≥5,10000,ocrbench_ocrbench_accuracy,0.511,
|
| 1068 |
+
≥5,10000,seedbench_seed_all,0.5375764313507504,
|
| 1069 |
+
≥5,10000,textvqa_val_exact_match,0.51332,0.006823388252580171
|
| 1070 |
+
≥5,11000,ai2d_exact_match,0.4268134715025907,0.008902228386480452
|
| 1071 |
+
≥5,11000,average,0.46104097512732234,
|
| 1072 |
+
≥5,11000,average_rank,4.5,
|
| 1073 |
+
≥5,11000,chartqa_relaxed_overall,0.6012,0.0097949885513097
|
| 1074 |
+
≥5,11000,docvqa_val_anls,0.644837445168982,0.006085623472495874
|
| 1075 |
+
≥5,11000,infovqa_val_anls,0.2640855729780956,0.007198720155523597
|
| 1076 |
+
≥5,11000,mme_total_score,1039.019507803121,
|
| 1077 |
+
≥5,11000,mmmu_val_mmmu_acc,0.30889,
|
| 1078 |
+
≥5,11000,mmstar_average,0.3483796516991238,
|
| 1079 |
+
≥5,11000,ocrbench_ocrbench_accuracy,0.509,
|
| 1080 |
+
≥5,11000,seedbench_seed_all,0.5367426347971095,
|
| 1081 |
+
≥5,11000,textvqa_val_exact_match,0.50942,0.00682319308775463
|
| 1082 |
+
≥5,12000,ai2d_exact_match,0.3944300518134715,0.008796275864065532
|
| 1083 |
+
≥5,12000,average,0.4536626915651331,
|
| 1084 |
+
≥5,12000,average_rank,4.5,
|
| 1085 |
+
≥5,12000,chartqa_relaxed_overall,0.5772,0.009882060820012199
|
| 1086 |
+
≥5,12000,docvqa_val_anls,0.6559090016447592,0.00604562177320508
|
| 1087 |
+
≥5,12000,infovqa_val_anls,0.2673016323091914,0.0072722156919221015
|
| 1088 |
+
≥5,12000,mme_total_score,1023.3735494197679,
|
| 1089 |
+
≥5,12000,mmmu_val_mmmu_acc,0.31778,
|
| 1090 |
+
≥5,12000,mmstar_average,0.3313831269791424,
|
| 1091 |
+
≥5,12000,ocrbench_ocrbench_accuracy,0.505,
|
| 1092 |
+
≥5,12000,seedbench_seed_all,0.5327404113396331,
|
| 1093 |
+
≥5,12000,textvqa_val_exact_match,0.50122,0.006832030272732221
|
| 1094 |
+
≥5,13000,ai2d_exact_match,0.40867875647668395,0.00884778289870742
|
| 1095 |
+
≥5,13000,average,0.45776989353761316,
|
| 1096 |
+
≥5,13000,average_rank,4.6,
|
| 1097 |
+
≥5,13000,chartqa_relaxed_overall,0.5908,0.009835692163550793
|
| 1098 |
+
≥5,13000,docvqa_val_anls,0.6503688245286325,0.0060676446684505315
|
| 1099 |
+
≥5,13000,infovqa_val_anls,0.2636657235502622,0.007162177374827191
|
| 1100 |
+
≥5,13000,mme_total_score,1002.4256702681073,
|
| 1101 |
+
≥5,13000,mmmu_val_mmmu_acc,0.31556,
|
| 1102 |
+
≥5,13000,mmstar_average,0.3443129801956691,
|
| 1103 |
+
≥5,13000,ocrbench_ocrbench_accuracy,0.512,
|
| 1104 |
+
≥5,13000,seedbench_seed_all,0.5329627570872707,
|
| 1105 |
+
≥5,13000,textvqa_val_exact_match,0.50158,0.006823401826251807
|
| 1106 |
+
≥5,14000,ai2d_exact_match,0.41483160621761656,0.008867639612484149
|
| 1107 |
+
≥5,14000,average,0.45268439675941724,
|
| 1108 |
+
≥5,14000,average_rank,4.8,
|
| 1109 |
+
≥5,14000,chartqa_relaxed_overall,0.588,0.009845871036662436
|
| 1110 |
+
≥5,14000,docvqa_val_anls,0.6498549577427524,0.006073551423686328
|
| 1111 |
+
≥5,14000,infovqa_val_anls,0.2661623211050356,0.00731739457874179
|
| 1112 |
+
≥5,14000,mme_total_score,1034.9393757503,
|
| 1113 |
+
≥5,14000,mmmu_val_mmmu_acc,0.3,
|
| 1114 |
+
≥5,14000,mmstar_average,0.3342054606442816,
|
| 1115 |
+
≥5,14000,ocrbench_ocrbench_accuracy,0.494,
|
| 1116 |
+
≥5,14000,seedbench_seed_all,0.5294052251250695,
|
| 1117 |
+
≥5,14000,textvqa_val_exact_match,0.4977,0.006830920457365827
|
| 1118 |
+
≥5,15000,ai2d_exact_match,0.42001295336787564,0.008883255931688048
|
| 1119 |
+
≥5,15000,average,0.4570450291018434,
|
| 1120 |
+
≥5,15000,average_rank,4.9,
|
| 1121 |
+
≥5,15000,chartqa_relaxed_overall,0.59,0.009838634025503496
|
| 1122 |
+
≥5,15000,docvqa_val_anls,0.6475057079650752,0.006081599544786637
|
| 1123 |
+
≥5,15000,infovqa_val_anls,0.26732840510253686,0.007267222145742162
|
| 1124 |
+
≥5,15000,mme_total_score,1033.4811924769908,
|
| 1125 |
+
≥5,15000,mmmu_val_mmmu_acc,0.30667,
|
| 1126 |
+
≥5,15000,mmstar_average,0.33017343172346,
|
| 1127 |
+
≥5,15000,ocrbench_ocrbench_accuracy,0.516,
|
| 1128 |
+
≥5,15000,seedbench_seed_all,0.5345747637576431,
|
| 1129 |
+
≥5,15000,textvqa_val_exact_match,0.5011399999999999,0.006833438318727342
|
| 1130 |
+
≥5,16000,ai2d_exact_match,0.41353626943005184,0.008863577928878446
|
| 1131 |
+
≥5,16000,average,0.45298319741394405,
|
| 1132 |
+
≥5,16000,average_rank,4.9,
|
| 1133 |
+
≥5,16000,chartqa_relaxed_overall,0.5928,0.00982821965366181
|
| 1134 |
+
≥5,16000,docvqa_val_anls,0.6444827949953511,0.006083163064354419
|
| 1135 |
+
≥5,16000,infovqa_val_anls,0.27176255535031313,0.0073933278275316846
|
| 1136 |
+
≥5,16000,mme_total_score,1004.6780712284915,
|
| 1137 |
+
≥5,16000,mmmu_val_mmmu_acc,0.30111,
|
| 1138 |
+
≥5,16000,mmstar_average,0.32864449435945237,
|
| 1139 |
+
≥5,16000,ocrbench_ocrbench_accuracy,0.502,
|
| 1140 |
+
≥5,16000,seedbench_seed_all,0.526792662590328,
|
| 1141 |
+
≥5,16000,textvqa_val_exact_match,0.49572000000000005,0.006835033273947625
|
| 1142 |
+
≥5,17000,ai2d_exact_match,0.4112694300518135,0.008856317823411107
|
| 1143 |
+
≥5,17000,average,0.4529560838437233,
|
| 1144 |
+
≥5,17000,average_rank,4.6,
|
| 1145 |
+
≥5,17000,chartqa_relaxed_overall,0.5876,0.009847298295140926
|
| 1146 |
+
≥5,17000,docvqa_val_anls,0.6389774022522821,0.00612389508858012
|
| 1147 |
+
≥5,17000,infovqa_val_anls,0.2806511079053113,0.007508796510654168
|
| 1148 |
+
≥5,17000,mme_total_score,995.327831132453,
|
| 1149 |
+
≥5,17000,mmmu_val_mmmu_acc,0.31333,
|
| 1150 |
+
≥5,17000,mmstar_average,0.33458844306670454,
|
| 1151 |
+
≥5,17000,ocrbench_ocrbench_accuracy,0.496,
|
| 1152 |
+
≥5,17000,seedbench_seed_all,0.5230683713173986,
|
| 1153 |
+
≥5,17000,textvqa_val_exact_match,0.49112,0.006832742230852753
|
| 1154 |
+
≥5,18000,ai2d_exact_match,0.41936528497409326,0.008881358943343104
|
| 1155 |
+
≥5,18000,average,0.4506952703715631,
|
| 1156 |
+
≥5,18000,average_rank,4.6,
|
| 1157 |
+
≥5,18000,chartqa_relaxed_overall,0.5752,0.009888230116554488
|
| 1158 |
+
≥5,18000,docvqa_val_anls,0.6393690836463973,0.006115883377355433
|
| 1159 |
+
≥5,18000,infovqa_val_anls,0.26588334023973736,0.007337010225614936
|
| 1160 |
+
≥5,18000,mme_total_score,989.2097839135654,
|
| 1161 |
+
≥5,18000,mmmu_val_mmmu_acc,0.31333,
|
| 1162 |
+
≥5,18000,mmstar_average,0.32322974671841465,
|
| 1163 |
+
≥5,18000,ocrbench_ocrbench_accuracy,0.498,
|
| 1164 |
+
≥5,18000,seedbench_seed_all,0.5279599777654252,
|
| 1165 |
+
≥5,18000,textvqa_val_exact_match,0.49391999999999997,0.006830895911063903
|
| 1166 |
+
≥5,19000,ai2d_exact_match,0.3954015544041451,0.008800034697838395
|
| 1167 |
+
≥5,19000,average,0.44423744725800945,
|
| 1168 |
+
≥5,19000,average_rank,4.9,
|
| 1169 |
+
≥5,19000,chartqa_relaxed_overall,0.5744,0.009890651444389179
|
| 1170 |
+
≥5,19000,docvqa_val_anls,0.6275859067200067,0.006146304949422434
|
| 1171 |
+
≥5,19000,infovqa_val_anls,0.27001013621966435,0.00741081345045112
|
| 1172 |
+
≥5,19000,mme_total_score,1012.671468587435,
|
| 1173 |
+
≥5,19000,mmmu_val_mmmu_acc,0.30222,
|
| 1174 |
+
≥5,19000,mmstar_average,0.33172139573813564,
|
| 1175 |
+
≥5,19000,ocrbench_ocrbench_accuracy,0.489,
|
| 1176 |
+
≥5,19000,seedbench_seed_all,0.5244580322401334,
|
| 1177 |
+
≥5,19000,textvqa_val_exact_match,0.48334000000000005,0.006839754771120511
|
| 1178 |
+
≥5,20000,ai2d_exact_match,0.3950777202072539,0.00879878579254534
|
| 1179 |
+
≥5,20000,average,0.44700580037620813,
|
| 1180 |
+
≥5,20000,average_rank,3.8,
|
| 1181 |
+
≥5,20000,chartqa_relaxed_overall,0.5824,0.009865243291986469
|
| 1182 |
+
≥5,20000,docvqa_val_anls,0.635044358086249,0.006123826440768213
|
| 1183 |
+
≥5,20000,infovqa_val_anls,0.2648967410637257,0.0073547743128100345
|
| 1184 |
+
≥5,20000,mme_total_score,1015.2638055222089,
|
| 1185 |
+
≥5,20000,mmmu_val_mmmu_acc,0.31111,
|
| 1186 |
+
≥5,20000,mmstar_average,0.33540590765288064,
|
| 1187 |
+
≥5,20000,ocrbench_ocrbench_accuracy,0.485,
|
| 1188 |
+
≥5,20000,seedbench_seed_all,0.5234574763757643,
|
| 1189 |
+
≥5,20000,textvqa_val_exact_match,0.49065999999999993,0.0068247980522276805
|
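The result files added in this commit use a long format: one row per (run, step, metric), plus an optional stderr column. The header `run,step,metric,value,stderr` is shown at the top of `ss_vs_s1.csv`. The commit does not document how the derived `average` row is computed, so treat the following as an inference from the data. On the rows spot-checked here (Single Stage at step 1000 and ≥4 at step 15000), `average` equals the unweighted mean of the nine benchmark scores that lie on a 0-1 scale. That mean excludes `mme_total_score`, which uses a different scale, and `average_rank`.

Below is a minimal sketch of that consistency check. `check_averages`, `EXCLUDED` and `SAMPLE` are hypothetical names introduced only for this illustration. The sample rows are copied verbatim from `ss_vs_s1.csv`.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Assumption (inferred from the data, not stated in the commit): "average" is the
# plain mean of the 0-1 scaled metrics, so the MME score and the derived rows are skipped.
EXCLUDED = {"average", "average_rank", "mme_total_score"}

# Rows copied verbatim from ss_vs_s1.csv (Single Stage, step 1000).
SAMPLE = """run,step,metric,value,stderr
Single Stage,1000,ai2d_exact_match,0.2548575129533679,0.007843322436924496
Single Stage,1000,average,0.27120689295763617,
Single Stage,1000,average_rank,2.0,
Single Stage,1000,chartqa_relaxed_overall,0.3308,0.009411906161401973
Single Stage,1000,docvqa_val_anls,0.3528553494243383,0.005852289239342309
Single Stage,1000,infovqa_val_anls,0.17320578642581314,0.006297063452679795
Single Stage,1000,mme_total_score,977.4280712284914,
Single Stage,1000,mmmu_val_mmmu_acc,0.25222,
Single Stage,1000,mmstar_average,0.23215874078908072,
Single Stage,1000,ocrbench_ocrbench_accuracy,0.286,
Single Stage,1000,seedbench_seed_all,0.2563646470261256,
Single Stage,1000,textvqa_val_exact_match,0.3024,0.00628900296642181
"""


def check_averages(csv_text):
    """Map (run, step) -> (reported average, recomputed mean of 0-1 metrics)."""
    metrics = defaultdict(list)
    reported = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["run"], int(row["step"]))
        if row["metric"] == "average":
            reported[key] = float(row["value"])
        elif row["metric"] not in EXCLUDED:
            metrics[key].append(float(row["value"]))
    return {key: (reported.get(key), mean(vals)) for key, vals in metrics.items()}


if __name__ == "__main__":
    for (run, step), (rep, calc) in check_averages(SAMPLE).items():
        print(f"{run} @ step {step}: reported={rep:.6f} recomputed={calc:.6f}")
```

The check keeps the long format intact and groups values in memory, so the same function can be pointed at any of the added CSV files. If a file contains a run whose `average` disagrees with the recomputed mean, that run probably uses a different metric set.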
app/src/content/assets/data/ss_vs_s1.csv
ADDED
|
@@ -0,0 +1,481 @@
|
| 1 |
+
run,step,metric,value,stderr
|
| 2 |
+
Single Stage,1000,ai2d_exact_match,0.2548575129533679,0.007843322436924496
|
| 3 |
+
Single Stage,1000,average,0.27120689295763617,
|
| 4 |
+
Single Stage,1000,average_rank,2.0,
|
| 5 |
+
Single Stage,1000,chartqa_relaxed_overall,0.3308,0.009411906161401973
|
| 6 |
+
Single Stage,1000,docvqa_val_anls,0.3528553494243383,0.005852289239342309
|
| 7 |
+
Single Stage,1000,infovqa_val_anls,0.17320578642581314,0.006297063452679795
|
| 8 |
+
Single Stage,1000,mme_total_score,977.4280712284914,
|
| 9 |
+
Single Stage,1000,mmmu_val_mmmu_acc,0.25222,
|
| 10 |
+
Single Stage,1000,mmstar_average,0.23215874078908072,
|
| 11 |
+
Single Stage,1000,ocrbench_ocrbench_accuracy,0.286,
|
| 12 |
+
Single Stage,1000,seedbench_seed_all,0.2563646470261256,
|
| 13 |
+
Single Stage,1000,textvqa_val_exact_match,0.3024,0.00628900296642181
|
| 14 |
+
Single Stage,2000,ai2d_exact_match,0.26295336787564766,0.007923526907377255
|
| 15 |
+
Single Stage,2000,average,0.3202068275596269,
|
| 16 |
+
Single Stage,2000,average_rank,1.8,
|
| 17 |
+
Single Stage,2000,chartqa_relaxed_overall,0.4688,0.009982508912777261
|
| 18 |
+
Single Stage,2000,docvqa_val_anls,0.4452261510942785,0.00614755494712251
|
| 19 |
+
Single Stage,2000,infovqa_val_anls,0.1820547866557169,0.006217861455795791
|
| 20 |
+
Single Stage,2000,mme_total_score,1049.3036214485794,
|
| 21 |
+
Single Stage,2000,mmmu_val_mmmu_acc,0.24556,
|
| 22 |
+
Single Stage,2000,mmstar_average,0.21305462434540698,
|
| 23 |
+
Single Stage,2000,ocrbench_ocrbench_accuracy,0.395,
|
| 24 |
+
Single Stage,2000,seedbench_seed_all,0.258532518065592,
|
| 25 |
+
Single Stage,2000,textvqa_val_exact_match,0.41068000000000005,0.006697862330024289
|
| 26 |
+
Single Stage,3000,ai2d_exact_match,0.25226683937823835,0.007816909588794397
|
| 27 |
+
Single Stage,3000,average,0.3507423834414229,
|
| 28 |
+
Single Stage,3000,average_rank,1.7,
|
| 29 |
+
Single Stage,3000,chartqa_relaxed_overall,0.5028,0.010001843767601082
|
| 30 |
+
Single Stage,3000,docvqa_val_anls,0.502653993831009,0.006267072346683124
|
| 31 |
+
Single Stage,3000,infovqa_val_anls,0.21728617578189535,0.006796941784959762
|
| 32 |
+
Single Stage,3000,mme_total_score,1170.2383953581434,
|
| 33 |
+
Single Stage,3000,mmmu_val_mmmu_acc,0.27556,
|
| 34 |
+
Single Stage,3000,mmstar_average,0.25432376938577683,
|
| 35 |
+
Single Stage,3000,ocrbench_ocrbench_accuracy,0.436,
|
| 36 |
+
Single Stage,3000,seedbench_seed_all,0.2792106725958866,
|
| 37 |
+
Single Stage,3000,textvqa_val_exact_match,0.43658,0.006766885462882726
|
| 38 |
+
Single Stage,4000,ai2d_exact_match,0.2645725388601036,0.007939149662089447
|
| 39 |
+
Single Stage,4000,average,0.36961781722974835,
|
| 40 |
+
Single Stage,4000,average_rank,1.8,
|
| 41 |
+
Single Stage,4000,chartqa_relaxed_overall,0.5312,0.009982508912777261
|
| 42 |
+
Single Stage,4000,docvqa_val_anls,0.5374434618615119,0.0062905728113059655
|
| 43 |
+
Single Stage,4000,infovqa_val_anls,0.2287924838861707,0.006994568698639919
|
| 44 |
+
Single Stage,4000,mme_total_score,1155.203781512605,
|
| 45 |
+
Single Stage,4000,mmmu_val_mmmu_acc,0.25556,
|
| 46 |
+
Single Stage,4000,mmstar_average,0.2575590188757354,
|
| 47 |
+
Single Stage,4000,ocrbench_ocrbench_accuracy,0.453,
|
| 48 |
+
Single Stage,4000,seedbench_seed_all,0.33913285158421347,
|
| 49 |
+
Single Stage,4000,textvqa_val_exact_match,0.4593,0.006791695475025738
|
| 50 |
+
Single Stage,5000,ai2d_exact_match,0.3125,0.008342439145556371
|
| 51 |
+
Single Stage,5000,average,0.3974627910380972,
|
| 52 |
+
Single Stage,5000,average_rank,1.8,
|
| 53 |
+
Single Stage,5000,chartqa_relaxed_overall,0.5488,0.00995424828018316
|
| 54 |
+
Single Stage,5000,docvqa_val_anls,0.552360266782429,0.006300308519952055
|
| 55 |
+
Single Stage,5000,infovqa_val_anls,0.23425555286643698,0.007002254622066442
|
| 56 |
+
Single Stage,5000,mme_total_score,1181.4653861544618,
|
| 57 |
+
Single Stage,5000,mmmu_val_mmmu_acc,0.26667,
|
| 58 |
+
Single Stage,5000,mmstar_average,0.29596648146165705,
|
| 59 |
+
Single Stage,5000,ocrbench_ocrbench_accuracy,0.462,
|
| 60 |
+
Single Stage,5000,seedbench_seed_all,0.43107281823235133,
|
| 61 |
+
Single Stage,5000,textvqa_val_exact_match,0.47354000000000007,0.0068172185364497985
|
| 62 |
+
Single Stage,6000,ai2d_exact_match,0.358160621761658,0.008629463221867162
|
| 63 |
+
Single Stage,6000,average,0.4161227404571003,
|
| 64 |
+
Single Stage,6000,average_rank,1.6,
|
| 65 |
+
Single Stage,6000,chartqa_relaxed_overall,0.5628,0.00992279440175477
|
| 66 |
+
Single Stage,6000,docvqa_val_anls,0.5747451497228876,0.00625495440870239
|
| 67 |
+
Single Stage,6000,infovqa_val_anls,0.22152017368968838,0.006604546680525351
|
| 68 |
+
Single Stage,6000,mme_total_score,1284.1648659463785,
|
| 69 |
+
Single Stage,6000,mmmu_val_mmmu_acc,0.27111,
|
| 70 |
+
Single Stage,6000,mmstar_average,0.2978489412854164,
|
| 71 |
+
Single Stage,6000,ocrbench_ocrbench_accuracy,0.495,
|
| 72 |
+
Single Stage,6000,seedbench_seed_all,0.4795997776542524,
|
| 73 |
+
Single Stage,6000,textvqa_val_exact_match,0.48432,0.006800535050670284
|
| 74 |
+
Single Stage,7000,ai2d_exact_match,0.3707901554404145,0.00869347755587734
|
| 75 |
+
Single Stage,7000,average,0.4291083177345374,
|
| 76 |
+
Single Stage,7000,average_rank,1.6,
|
| 77 |
+
Single Stage,7000,chartqa_relaxed_overall,0.5656,0.009915542506251351
|
| 78 |
+
Single Stage,7000,docvqa_val_anls,0.5940907049431567,0.006224236305767187
|
| 79 |
+
Single Stage,7000,infovqa_val_anls,0.2515675215816963,0.007105097396092786
|
| 80 |
+
Single Stage,7000,mme_total_score,1185.875650260104,
|
| 81 |
+
Single Stage,7000,mmmu_val_mmmu_acc,0.26556,
|
| 82 |
+
Single Stage,7000,mmstar_average,0.31372400960777047,
|
| 83 |
+
Single Stage,7000,ocrbench_ocrbench_accuracy,0.504,
|
| 84 |
+
Single Stage,7000,seedbench_seed_all,0.4964424680377988,
|
| 85 |
+
Single Stage,7000,textvqa_val_exact_match,0.5002,0.006794794025220267
|
| 86 |
+
Single Stage,8000,ai2d_exact_match,0.37759067357512954,0.008725299846043883
|
| 87 |
+
Single Stage,8000,average,0.43846759477995995,
|
| 88 |
+
Single Stage,8000,average_rank,1.5,
|
| 89 |
+
Single Stage,8000,chartqa_relaxed_overall,0.5832,0.009862556058385773
|
| 90 |
+
Single Stage,8000,docvqa_val_anls,0.6017336419437208,0.006231612198089698
|
| 91 |
+
Single Stage,8000,infovqa_val_anls,0.2449256624147254,0.006992518502948913
|
| 92 |
+
Single Stage,8000,mme_total_score,1199.2409963985594,
|
| 93 |
+
Single Stage,8000,mmmu_val_mmmu_acc,0.28111,
|
| 94 |
+
Single Stage,8000,mmstar_average,0.33512257186205047,
|
| 95 |
+
Single Stage,8000,ocrbench_ocrbench_accuracy,0.51,
|
| 96 |
+
Single Stage,8000,seedbench_seed_all,0.5024458032240133,
|
| 97 |
+
Single Stage,8000,textvqa_val_exact_match,0.51008,0.006796301690135059
|
| 98 |
+
Single Stage,9000,ai2d_exact_match,0.4067357512953368,0.008841214921078996
|
| 99 |
+
Single Stage,9000,average,0.4422510732201056,
|
| 100 |
+
Single Stage,9000,average_rank,1.6,
|
| 101 |
+
Single Stage,9000,chartqa_relaxed_overall,0.5912,0.009834211136815875
|
| 102 |
+
Single Stage,9000,docvqa_val_anls,0.6170968481662739,0.00617235763542544
|
| 103 |
+
Single Stage,9000,infovqa_val_anls,0.23537031288570615,0.00670318154156447
|
| 104 |
+
Single Stage,9000,mme_total_score,1231.5195078031213,
|
| 105 |
+
Single Stage,9000,mmmu_val_mmmu_acc,0.25889,
|
| 106 |
+
Single Stage,9000,mmstar_average,0.3216444898242951,
|
| 107 |
+
Single Stage,9000,ocrbench_ocrbench_accuracy,0.515,
|
| 108 |
+
Single Stage,9000,seedbench_seed_all,0.5120622568093385,
|
| 109 |
+
Single Stage,9000,textvqa_val_exact_match,0.52226,0.006792711289708482
|
| 110 |
+
Single Stage,10000,ai2d_exact_match,0.39993523316062174,0.008817096257082848
|
| 111 |
+
Single Stage,10000,average,0.4523875703250908,
|
| 112 |
+
Single Stage,10000,average_rank,1.3,
|
| 113 |
+
Single Stage,10000,chartqa_relaxed_overall,0.5996,0.00980154906867574
|
| 114 |
+
Single Stage,10000,docvqa_val_anls,0.6262613496433054,0.006147756371688175
|
| 115 |
+
Single Stage,10000,infovqa_val_anls,0.263290074230132,0.007186788766942786
|
| 116 |
+
Single Stage,10000,mme_total_score,1240.8218287314926,
|
| 117 |
+
Single Stage,10000,mmmu_val_mmmu_acc,0.28778,
|
| 118 |
+
Single Stage,10000,mmstar_average,0.32972717906018517,
|
| 119 |
+
Single Stage,10000,ocrbench_ocrbench_accuracy,0.517,
|
| 120 |
+
Single Stage,10000,seedbench_seed_all,0.5217342968315731,
|
| 121 |
+
Single Stage,10000,textvqa_val_exact_match,0.5261600000000001,0.006785774843600811
|
| 122 |
+
Single Stage,11000,ai2d_exact_match,0.422279792746114,0.008889771831066474
|
| 123 |
+
Single Stage,11000,average,0.4561398159525099,
|
| 124 |
+
Single Stage,11000,average_rank,1.2,
|
| 125 |
+
Single Stage,11000,chartqa_relaxed_overall,0.6104,0.009755142291143075
|
| 126 |
+
Single Stage,11000,docvqa_val_anls,0.6373130149166712,0.006128022584995044
|
| 127 |
+
Single Stage,11000,infovqa_val_anls,0.24419378339723755,0.006897644885887063
|
| 128 |
+
Single Stage,11000,mme_total_score,1322.9488795518205,
|
| 129 |
+
Single Stage,11000,mmmu_val_mmmu_acc,0.27778,
|
| 130 |
+
Single Stage,11000,mmstar_average,0.3298563439522548,
|
| 131 |
+
Single Stage,11000,ocrbench_ocrbench_accuracy,0.521,
|
| 132 |
+
Single Stage,11000,seedbench_seed_all,0.5237354085603113,
|
| 133 |
+
Single Stage,11000,textvqa_val_exact_match,0.5387,0.006770851562852138
|
Single Stage,12000,ai2d_exact_match,0.42001295336787564,0.008883255931688034
Single Stage,12000,average,0.4582751140055433,
Single Stage,12000,average_rank,1.4,
Single Stage,12000,chartqa_relaxed_overall,0.618,0.009719474639861454
Single Stage,12000,docvqa_val_anls,0.6393961983751871,0.0061228747388476674
Single Stage,12000,infovqa_val_anls,0.24798874058574302,0.006855374548993139
Single Stage,12000,mme_total_score,1225.6453581432572,
Single Stage,12000,mmmu_val_mmmu_acc,0.27889,
Single Stage,12000,mmstar_average,0.34010867846816534,
Single Stage,12000,ocrbench_ocrbench_accuracy,0.512,
Single Stage,12000,seedbench_seed_all,0.5350194552529183,
Single Stage,12000,textvqa_val_exact_match,0.5330600000000001,0.006777713092109446
Single Stage,13000,ai2d_exact_match,0.4375,0.008928571428571428
Single Stage,13000,average,0.4692868662590049,
Single Stage,13000,average_rank,1.2,
Single Stage,13000,chartqa_relaxed_overall,0.6148,0.00973479791861169
Single Stage,13000,docvqa_val_anls,0.6511374872549951,0.006086953065248391
Single Stage,13000,infovqa_val_anls,0.24465055100441893,0.006808432538374664
Single Stage,13000,mme_total_score,1281.7122849139657,
Single Stage,13000,mmmu_val_mmmu_acc,0.28222,
Single Stage,13000,mmstar_average,0.3453069542917521,
Single Stage,13000,ocrbench_ocrbench_accuracy,0.549,
Single Stage,13000,seedbench_seed_all,0.5442468037798777,
Single Stage,13000,textvqa_val_exact_match,0.55472,0.0067416788982325
Single Stage,14000,ai2d_exact_match,0.4572538860103627,0.00896620675297095
Single Stage,14000,average,0.47352486841689195,
Single Stage,14000,average_rank,1.4,
Single Stage,14000,chartqa_relaxed_overall,0.6172,0.009723347231923635
Single Stage,14000,docvqa_val_anls,0.6502269393708169,0.006057950730638126
Single Stage,14000,infovqa_val_anls,0.25805460837190913,0.007037735231659539
Single Stage,14000,mme_total_score,1309.1444577831132,
Single Stage,14000,mmmu_val_mmmu_acc,0.28111,
Single Stage,14000,mmstar_average,0.34575818188776586,
Single Stage,14000,ocrbench_ocrbench_accuracy,0.551,
Single Stage,14000,seedbench_seed_all,0.5483602001111729,
Single Stage,14000,textvqa_val_exact_match,0.55276,0.006751206724612103
Single Stage,15000,ai2d_exact_match,0.45045336787564766,0.008954861634252399
Single Stage,15000,average,0.47878665012878824,
Single Stage,15000,average_rank,1.2,
Single Stage,15000,chartqa_relaxed_overall,0.612,0.009747841205275417
Single Stage,15000,docvqa_val_anls,0.6621413031955148,0.006056838050222495
Single Stage,15000,infovqa_val_anls,0.2706898598157733,0.007200315730154543
Single Stage,15000,mme_total_score,1384.2171868747498,
Single Stage,15000,mmmu_val_mmmu_acc,0.30222,
Single Stage,15000,mmstar_average,0.35408135695920684,
Single Stage,15000,ocrbench_ocrbench_accuracy,0.558,
Single Stage,15000,seedbench_seed_all,0.5411339633129516,
Single Stage,15000,textvqa_val_exact_match,0.5583600000000001,0.0067279027203879065
Single Stage,16000,ai2d_exact_match,0.45077720207253885,0.008955440137395838
Single Stage,16000,average,0.47665128022935843,
Single Stage,16000,average_rank,1.3,
Single Stage,16000,chartqa_relaxed_overall,0.632,0.00964715642305132
Single Stage,16000,docvqa_val_anls,0.6709415729142987,0.005999818105621502
Single Stage,16000,infovqa_val_anls,0.26050032542402035,0.006997451875879188
Single Stage,16000,mme_total_score,1317.8491396558625,
Single Stage,16000,mmmu_val_mmmu_acc,0.27556,
Single Stage,16000,mmstar_average,0.33214333327093315,
Single Stage,16000,ocrbench_ocrbench_accuracy,0.56,
Single Stage,16000,seedbench_seed_all,0.5463590883824346,
Single Stage,16000,textvqa_val_exact_match,0.56158,0.006723854754867398
Single Stage,17000,ai2d_exact_match,0.45919689119170987,0.008969138793675545
Single Stage,17000,average,0.4777141780162423,
Single Stage,17000,average_rank,1.3,
Single Stage,17000,chartqa_relaxed_overall,0.632,0.00964715642305132
Single Stage,17000,docvqa_val_anls,0.6796338519136422,0.005948761388267941
Single Stage,17000,infovqa_val_anls,0.28070956072505215,0.007298333094144192
Single Stage,17000,mme_total_score,1381.9161664665867,
Single Stage,17000,mmmu_val_mmmu_acc,0.27667,
Single Stage,17000,mmstar_average,0.3370289492329521,
Single Stage,17000,ocrbench_ocrbench_accuracy,0.519,
Single Stage,17000,seedbench_seed_all,0.5510283490828238,
Single Stage,17000,textvqa_val_exact_match,0.56416,0.006724830373229479
Single Stage,18000,ai2d_exact_match,0.46567357512953367,0.008977921602780726
Single Stage,18000,average,0.4819834595278701,
Single Stage,18000,average_rank,1.3,
Single Stage,18000,chartqa_relaxed_overall,0.6376,0.009615793331418735
Single Stage,18000,docvqa_val_anls,0.6775884603912571,0.005972234236435759
Single Stage,18000,infovqa_val_anls,0.27154318420389256,0.007164903131667027
Single Stage,18000,mme_total_score,1336.922769107643,
Single Stage,18000,mmmu_val_mmmu_acc,0.28667,
Single Stage,18000,mmstar_average,0.34482796716566916,
Single Stage,18000,ocrbench_ocrbench_accuracy,0.533,
Single Stage,18000,seedbench_seed_all,0.5543079488604781,
Single Stage,18000,textvqa_val_exact_match,0.5666399999999999,0.006713392287599574
Single Stage,19000,ai2d_exact_match,0.4682642487046632,0.008981008686994101
Single Stage,19000,average,0.4899006713916878,
Single Stage,19000,average_rank,1.1,
Single Stage,19000,chartqa_relaxed_overall,0.6444,0.009575809858898698
Single Stage,19000,docvqa_val_anls,0.678226526479947,0.005970619221588814
Single Stage,19000,infovqa_val_anls,0.26993847247278,0.0071348470764911525
Single Stage,19000,mme_total_score,1406.6628651460583,
Single Stage,19000,mmmu_val_mmmu_acc,0.28333,
Single Stage,19000,mmstar_average,0.356220913822775,
Single Stage,19000,ocrbench_ocrbench_accuracy,0.577,
Single Stage,19000,seedbench_seed_all,0.554585881045025,
Single Stage,19000,textvqa_val_exact_match,0.57714,0.0066918487914812905
Single Stage,20000,ai2d_exact_match,0.47571243523316065,0.00898853090258662
Single Stage,20000,average,0.4873169067639118,
Single Stage,20000,average_rank,1.2,
Single Stage,20000,chartqa_relaxed_overall,0.6336,0.009638338810708618
Single Stage,20000,docvqa_val_anls,0.6895214454380043,0.005896462073053767
Single Stage,20000,infovqa_val_anls,0.2655657550458317,0.007033265532032538
Single Stage,20000,mme_total_score,1324.6738695478193,
Single Stage,20000,mmmu_val_mmmu_acc,0.30111,
Single Stage,20000,mmstar_average,0.33806766134497995,
Single Stage,20000,ocrbench_ocrbench_accuracy,0.555,
Single Stage,20000,seedbench_seed_all,0.5587548638132296,
Single Stage,20000,textvqa_val_exact_match,0.56852,0.006720151338087659
Two Stage,1000,ai2d_exact_match,0.25906735751295334,0.007885466610693084
Two Stage,1000,average,0.31368848609084204,
Two Stage,1000,average_rank,1.0,
Two Stage,1000,chartqa_relaxed_overall,0.4436,0.009938164963872337
Two Stage,1000,docvqa_val_anls,0.42857906272393714,0.00617017051120098
Two Stage,1000,infovqa_val_anls,0.19144447578161194,0.006593728313201272
Two Stage,1000,mme_total_score,998.7869147659063,
Two Stage,1000,mmmu_val_mmmu_acc,0.25889,
Two Stage,1000,mmstar_average,0.2467637945300377,
Two Stage,1000,ocrbench_ocrbench_accuracy,0.368,
Two Stage,1000,seedbench_seed_all,0.25703168426903833,
Two Stage,1000,textvqa_val_exact_match,0.36982,0.006597131039140386
Two Stage,2000,ai2d_exact_match,0.26327720207253885,0.007926662492947052
Two Stage,2000,average,0.3358130433652279,
Two Stage,2000,average_rank,1.2,
Two Stage,2000,chartqa_relaxed_overall,0.4992,0.010001987797631107
Two Stage,2000,docvqa_val_anls,0.4932752040405314,0.006286364089099095
Two Stage,2000,infovqa_val_anls,0.19095428252193772,0.006391194919224349
Two Stage,2000,mme_total_score,1062.8957583033214,
Two Stage,2000,mmmu_val_mmmu_acc,0.23333,
Two Stage,2000,mmstar_average,0.22051867830573926,
Two Stage,2000,ocrbench_ocrbench_accuracy,0.435,
Two Stage,2000,seedbench_seed_all,0.2556420233463035,
Two Stage,2000,textvqa_val_exact_match,0.43112,0.006756288819146318
Two Stage,3000,ai2d_exact_match,0.2655440414507772,0.007948457289013515
Two Stage,3000,average,0.3636919255920759,
Two Stage,3000,average_rank,1.3,
Two Stage,3000,chartqa_relaxed_overall,0.5348,0.009977745545085072
Two Stage,3000,docvqa_val_anls,0.5283823835512687,0.006261305725762883
Two Stage,3000,infovqa_val_anls,0.2064005153919739,0.00660395026420985
Two Stage,3000,mme_total_score,1152.5195078031213,
Two Stage,3000,mmmu_val_mmmu_acc,0.26667,
Two Stage,3000,mmstar_average,0.26072557614922737,
Two Stage,3000,ocrbench_ocrbench_accuracy,0.455,
Two Stage,3000,seedbench_seed_all,0.29666481378543635,
Two Stage,3000,textvqa_val_exact_match,0.45903999999999995,0.006792178031860127
Two Stage,4000,ai2d_exact_match,0.30343264248704666,0.008274550183857863
Two Stage,4000,average,0.386738207804619,
Two Stage,4000,average_rank,1.2,
Two Stage,4000,chartqa_relaxed_overall,0.5464,0.00995883966107287
Two Stage,4000,docvqa_val_anls,0.5513347609587042,0.006295149714671814
Two Stage,4000,infovqa_val_anls,0.209061566918142,0.006630816594060217
Two Stage,4000,mme_total_score,1092.9095638255303,
Two Stage,4000,mmmu_val_mmmu_acc,0.26889,
Two Stage,4000,mmstar_average,0.26686799048357046,
Two Stage,4000,ocrbench_ocrbench_accuracy,0.477,
Two Stage,4000,seedbench_seed_all,0.38643690939410785,
Two Stage,4000,textvqa_val_exact_match,0.47121999999999997,0.006809171409434235
Two Stage,5000,ai2d_exact_match,0.34617875647668395,0.008562713351618975
Two Stage,5000,average,0.41048271276999254,
Two Stage,5000,average_rank,1.2,
Two Stage,5000,chartqa_relaxed_overall,0.5568,0.009937253322797029
Two Stage,5000,docvqa_val_anls,0.5616928036954175,0.006281333847375657
Two Stage,5000,infovqa_val_anls,0.21417615930558564,0.006470237976804916
Two Stage,5000,mme_total_score,1113.2024809923969,
Two Stage,5000,mmmu_val_mmmu_acc,0.28889,
Two Stage,5000,mmstar_average,0.3048769900603613,
Two Stage,5000,ocrbench_ocrbench_accuracy,0.501,
Two Stage,5000,seedbench_seed_all,0.4454697053918844,
Two Stage,5000,textvqa_val_exact_match,0.47525999999999996,0.006811465752181289
Two Stage,6000,ai2d_exact_match,0.3853626943005181,0.008759432661868542
Two Stage,6000,average,0.4256324408073156,
Two Stage,6000,average_rank,1.4,
Two Stage,6000,chartqa_relaxed_overall,0.574,0.009891852177211218
Two Stage,6000,docvqa_val_anls,0.5959624206334873,0.006223948314975518
Two Stage,6000,infovqa_val_anls,0.21910870056052556,0.00650522330852698
Two Stage,6000,mme_total_score,1166.5228091236495,
Two Stage,6000,mmmu_val_mmmu_acc,0.28333,
Two Stage,6000,mmstar_average,0.28797389940888596,
Two Stage,6000,ocrbench_ocrbench_accuracy,0.512,
Two Stage,6000,seedbench_seed_all,0.4776542523624236,
Two Stage,6000,textvqa_val_exact_match,0.4953,0.006792791061270795
Two Stage,7000,ai2d_exact_match,0.3915155440414508,0.008784780895708935
Two Stage,7000,average,0.4301306852910006,
Two Stage,7000,average_rank,1.4,
Two Stage,7000,chartqa_relaxed_overall,0.5776,0.009880807059104824
Two Stage,7000,docvqa_val_anls,0.5986163103423551,0.0062031909815058375
Two Stage,7000,infovqa_val_anls,0.22133856274121264,0.006604073748499083
Two Stage,7000,mme_total_score,1191.3954581832734,
Two Stage,7000,mmmu_val_mmmu_acc,0.28667,
Two Stage,7000,mmstar_average,0.2999043663917079,
Two Stage,7000,ocrbench_ocrbench_accuracy,0.501,
Two Stage,7000,seedbench_seed_all,0.48449138410227904,
Two Stage,7000,textvqa_val_exact_match,0.51004,0.006807782962299279
Two Stage,8000,ai2d_exact_match,0.4106217616580311,0.008854207883828033
Two Stage,8000,average,0.4460743520389214,
Two Stage,8000,average_rank,1.5,
Two Stage,8000,chartqa_relaxed_overall,0.6044,0.009781540134915584
Two Stage,8000,docvqa_val_anls,0.6026263625222106,0.006221681650022778
Two Stage,8000,infovqa_val_anls,0.25653488200256863,0.007114496312902602
Two Stage,8000,mme_total_score,1122.452581032413,
Two Stage,8000,mmmu_val_mmmu_acc,0.30556,
Two Stage,8000,mmstar_average,0.3287554228678711,
Two Stage,8000,ocrbench_ocrbench_accuracy,0.502,
Two Stage,8000,seedbench_seed_all,0.4953307392996109,
Two Stage,8000,textvqa_val_exact_match,0.5088400000000001,0.006790286627123755
Two Stage,9000,ai2d_exact_match,0.40900259067357514,0.00884886365109852
Two Stage,9000,average,0.4448373661618862,
Two Stage,9000,average_rank,1.4,
Two Stage,9000,chartqa_relaxed_overall,0.602,0.00979166741164548
Two Stage,9000,docvqa_val_anls,0.6230206474600885,0.006150742264825986
Two Stage,9000,infovqa_val_anls,0.22695214706156083,0.0066522293148095326
Two Stage,9000,mme_total_score,1123.2771108443376,
Two Stage,9000,mmmu_val_mmmu_acc,0.28444,
Two Stage,9000,mmstar_average,0.31337399530900006,
Two Stage,9000,ocrbench_ocrbench_accuracy,0.516,
Two Stage,9000,seedbench_seed_all,0.5044469149527515,
Two Stage,9000,textvqa_val_exact_match,0.5243,0.006775919466531711
Two Stage,10000,ai2d_exact_match,0.4167746113989637,0.008873613803189363
Two Stage,10000,average,0.45019708387432694,
Two Stage,10000,average_rank,1.7,
Two Stage,10000,chartqa_relaxed_overall,0.6008,0.00979663889573671
Two Stage,10000,docvqa_val_anls,0.625559493523932,0.006163808988970625
Two Stage,10000,infovqa_val_anls,0.2484394159425024,0.006960467307383163
Two Stage,10000,mme_total_score,1175.7940176070429,
Two Stage,10000,mmmu_val_mmmu_acc,0.28444,
Two Stage,10000,mmstar_average,0.3201372990396749,
Two Stage,10000,ocrbench_ocrbench_accuracy,0.523,
Two Stage,10000,seedbench_seed_all,0.5092829349638688,
Two Stage,10000,textvqa_val_exact_match,0.52334,0.006775531746371587
Two Stage,11000,ai2d_exact_match,0.4219559585492228,0.008888852746011196
Two Stage,11000,average,0.4544831873326875,
Two Stage,11000,average_rank,1.8,
Two Stage,11000,chartqa_relaxed_overall,0.6128,0.009744149186940382
Two Stage,11000,docvqa_val_anls,0.6332812103643084,0.006140691371662128
Two Stage,11000,infovqa_val_anls,0.23863681037743975,0.006726839163261667
Two Stage,11000,mme_total_score,1205.7752100840335,
Two Stage,11000,mmmu_val_mmmu_acc,0.27667,
Two Stage,11000,mmstar_average,0.3207287756303977,
Two Stage,11000,ocrbench_ocrbench_accuracy,0.542,
Two Stage,11000,seedbench_seed_all,0.5166759310728183,
Two Stage,11000,textvqa_val_exact_match,0.5276,0.006779501480792346
Two Stage,12000,ai2d_exact_match,0.43005181347150256,0.00891065778843896
Two Stage,12000,average,0.4603231834457321,
Two Stage,12000,average_rank,1.6,
Two Stage,12000,chartqa_relaxed_overall,0.612,0.009747841205275417
Two Stage,12000,docvqa_val_anls,0.6395985301346107,0.006113052714689484
Two Stage,12000,infovqa_val_anls,0.2439170659215255,0.006865310277271596
Two Stage,12000,mme_total_score,1157.484293717487,
Two Stage,12000,mmmu_val_mmmu_acc,0.29556,
Two Stage,12000,mmstar_average,0.33444157500257155,
Two Stage,12000,ocrbench_ocrbench_accuracy,0.539,
Two Stage,12000,seedbench_seed_all,0.5193996664813786,
Two Stage,12000,textvqa_val_exact_match,0.52894,0.006785904875622425
Two Stage,13000,ai2d_exact_match,0.4339378238341969,0.00892025987527176
Two Stage,13000,average,0.46490664749620997,
Two Stage,13000,average_rank,1.8,
Two Stage,13000,chartqa_relaxed_overall,0.6224,0.009697675699134625
Two Stage,13000,docvqa_val_anls,0.6462803017356844,0.0061027748005307945
Two Stage,13000,infovqa_val_anls,0.24426636134362278,0.006797247018813037
Two Stage,13000,mme_total_score,1191.0042016806724,
Two Stage,13000,mmmu_val_mmmu_acc,0.3,
Two Stage,13000,mmstar_average,0.33993002648901727,
Two Stage,13000,ocrbench_ocrbench_accuracy,0.545,
Two Stage,13000,seedbench_seed_all,0.5175653140633686,
Two Stage,13000,textvqa_val_exact_match,0.5347799999999999,0.0067635803775740536
Two Stage,14000,ai2d_exact_match,0.44332901554404147,0.008941163900483138
Two Stage,14000,average,0.47155104399726233,
Two Stage,14000,average_rank,1.6,
Two Stage,14000,chartqa_relaxed_overall,0.6268,0.009675026948726469
Two Stage,14000,docvqa_val_anls,0.6586021078894133,0.006060927182389954
Two Stage,14000,infovqa_val_anls,0.2553127836308732,0.0069494972189920795
Two Stage,14000,mme_total_score,1219.156662665066,
Two Stage,14000,mmmu_val_mmmu_acc,0.30444,
Two Stage,14000,mmstar_average,0.32252187023399065,
Two Stage,14000,ocrbench_ocrbench_accuracy,0.564,
Two Stage,14000,seedbench_seed_all,0.5245136186770428,
Two Stage,14000,textvqa_val_exact_match,0.54444,0.006760159556655915
Two Stage,15000,ai2d_exact_match,0.44527202072538863,0.008945084019331405
Two Stage,15000,average,0.47506404899487137,
Two Stage,15000,average_rank,1.8,
Two Stage,15000,chartqa_relaxed_overall,0.628,0.009668701749325345
Two Stage,15000,docvqa_val_anls,0.6614266719753668,0.006055793707421594
Two Stage,15000,infovqa_val_anls,0.25669760055121127,0.006992050333066725
Two Stage,15000,mme_total_score,1198.7210884353742,
Two Stage,15000,mmmu_val_mmmu_acc,0.31222,
Two Stage,15000,mmstar_average,0.34599838005318234,
Two Stage,15000,ocrbench_ocrbench_accuracy,0.553,
Two Stage,15000,seedbench_seed_all,0.5271817676486937,
Two Stage,15000,textvqa_val_exact_match,0.5457799999999999,0.006751174267547695
Two Stage,16000,ai2d_exact_match,0.452720207253886,0.008958830742136086
Two Stage,16000,average,0.4756900312291722,
Two Stage,16000,average_rank,1.7,
Two Stage,16000,chartqa_relaxed_overall,0.6228,0.009695651925812239
Two Stage,16000,docvqa_val_anls,0.6636227651335681,0.006049765989250173
Two Stage,16000,infovqa_val_anls,0.2545981800588258,0.0069034382302033005
Two Stage,16000,mme_total_score,1211.0271108443376,
Two Stage,16000,mmmu_val_mmmu_acc,0.30778,
Two Stage,16000,mmstar_average,0.3441840591332238,
Two Stage,16000,ocrbench_ocrbench_accuracy,0.558,
Two Stage,16000,seedbench_seed_all,0.5251250694830462,
Two Stage,16000,textvqa_val_exact_match,0.55238,0.006735691577574321
Two Stage,17000,ai2d_exact_match,0.45142487046632124,0.008956585653027465
Two Stage,17000,average,0.478877157951835,
Two Stage,17000,average_rank,1.7,
Two Stage,17000,chartqa_relaxed_overall,0.632,0.00964715642305132
Two Stage,17000,docvqa_val_anls,0.6682822523143818,0.006027291004964481
Two Stage,17000,infovqa_val_anls,0.2566899031113292,0.006984361605936137
Two Stage,17000,mme_total_score,1157.7550020008005,
Two Stage,17000,mmmu_val_mmmu_acc,0.31556,
Two Stage,17000,mmstar_average,0.3413821094043331,
Two Stage,17000,ocrbench_ocrbench_accuracy,0.563,
Two Stage,17000,seedbench_seed_all,0.5275152862701501,
Two Stage,17000,textvqa_val_exact_match,0.55404,0.006743665997528143
Two Stage,18000,ai2d_exact_match,0.45077720207253885,0.008955440137395842
Two Stage,18000,average,0.48011960096968553,
Two Stage,18000,average_rank,1.7,
Two Stage,18000,chartqa_relaxed_overall,0.6324,0.00964496273307725
Two Stage,18000,docvqa_val_anls,0.6669938909662756,0.006030949772272312
Two Stage,18000,infovqa_val_anls,0.26114082779542375,0.006997258882360672
Two Stage,18000,mme_total_score,1199.3700480192078,
Two Stage,18000,mmmu_val_mmmu_acc,0.30222,
Two Stage,18000,mmstar_average,0.34746272024423847,
Two Stage,18000,ocrbench_ocrbench_accuracy,0.579,
Two Stage,18000,seedbench_seed_all,0.5271817676486937,
Two Stage,18000,textvqa_val_exact_match,0.5539,0.0067478933611137175
Two Stage,19000,ai2d_exact_match,0.44559585492227977,0.00894572391435784
Two Stage,19000,average,0.48026929849849115,
Two Stage,19000,average_rank,1.9,
Two Stage,19000,chartqa_relaxed_overall,0.6372,0.00961808021316077
Two Stage,19000,docvqa_val_anls,0.6688318561206944,0.006022351017420005
Two Stage,19000,infovqa_val_anls,0.2646354907091152,0.007027671735260141
Two Stage,19000,mme_total_score,1170.1806722689075,
Two Stage,19000,mmmu_val_mmmu_acc,0.29778,
Two Stage,19000,mmstar_average,0.35086201891999,
Two Stage,19000,ocrbench_ocrbench_accuracy,0.574,
Two Stage,19000,seedbench_seed_all,0.5292384658143413,
Two Stage,19000,textvqa_val_exact_match,0.55428,0.006746127657232224
Two Stage,20000,ai2d_exact_match,0.44721502590673573,0.008948865761421001
Two Stage,20000,average,0.4807284005437735,
Two Stage,20000,average_rank,1.8,
Two Stage,20000,chartqa_relaxed_overall,0.632,0.00964715642305132
Two Stage,20000,docvqa_val_anls,0.6696120046502304,0.0060246464192922275
Two Stage,20000,infovqa_val_anls,0.2643335615077466,0.007024758501317731
Two Stage,20000,mme_total_score,1187.4589835934376,
Two Stage,20000,mmmu_val_mmmu_acc,0.29778,
Two Stage,20000,mmstar_average,0.34891710287927624,
Two Stage,20000,ocrbench_ocrbench_accuracy,0.582,
Two Stage,20000,seedbench_seed_all,0.5282379099499722,
Two Stage,20000,textvqa_val_exact_match,0.5564600000000001,0.006728915911338792
app/src/content/assets/data/visual_dependency_filters.csv
ADDED
@@ -0,0 +1,1165 @@
run,step,metric,value,stderr
Baseline,1000,ai2d_exact_match,0.2548575129533679,0.007843322436924496
Baseline,1000,average,0.27120689295763617,
Baseline,1000,average_rank,3.5,
Baseline,1000,chartqa_relaxed_overall,0.3308,0.009411906161401973
Baseline,1000,docvqa_val_anls,0.3528553494243383,0.005852289239342309
Baseline,1000,infovqa_val_anls,0.17320578642581314,0.006297063452679795
Baseline,1000,mme_total_score,977.4280712284914,
Baseline,1000,mmmu_val_mmmu_acc,0.25222,
Baseline,1000,mmstar_average,0.23215874078908072,
Baseline,1000,ocrbench_ocrbench_accuracy,0.286,
Baseline,1000,seedbench_seed_all,0.2563646470261256,
Baseline,1000,textvqa_val_exact_match,0.3024,0.00628900296642181
Baseline,2000,ai2d_exact_match,0.26295336787564766,0.007923526907377255
Baseline,2000,average,0.3202068275596269,
Baseline,2000,average_rank,3.7,
Baseline,2000,chartqa_relaxed_overall,0.4688,0.009982508912777261
Baseline,2000,docvqa_val_anls,0.4452261510942785,0.00614755494712251
Baseline,2000,infovqa_val_anls,0.1820547866557169,0.006217861455795791
Baseline,2000,mme_total_score,1049.3036214485794,
Baseline,2000,mmmu_val_mmmu_acc,0.24556,
Baseline,2000,mmstar_average,0.21305462434540698,
Baseline,2000,ocrbench_ocrbench_accuracy,0.395,
Baseline,2000,seedbench_seed_all,0.258532518065592,
Baseline,2000,textvqa_val_exact_match,0.41068000000000005,0.006697862330024289
Baseline,3000,ai2d_exact_match,0.25226683937823835,0.007816909588794397
Baseline,3000,average,0.3507423834414229,
Baseline,3000,average_rank,2.6,
Baseline,3000,chartqa_relaxed_overall,0.5028,0.010001843767601082
Baseline,3000,docvqa_val_anls,0.502653993831009,0.006267072346683124
Baseline,3000,infovqa_val_anls,0.21728617578189535,0.006796941784959762
Baseline,3000,mme_total_score,1170.2383953581434,
Baseline,3000,mmmu_val_mmmu_acc,0.27556,
Baseline,3000,mmstar_average,0.25432376938577683,
Baseline,3000,ocrbench_ocrbench_accuracy,0.436,
Baseline,3000,seedbench_seed_all,0.2792106725958866,
Baseline,3000,textvqa_val_exact_match,0.43658,0.006766885462882726
Baseline,4000,ai2d_exact_match,0.2645725388601036,0.007939149662089447
Baseline,4000,average,0.36961781722974835,
Baseline,4000,average_rank,3.2,
Baseline,4000,chartqa_relaxed_overall,0.5312,0.009982508912777261
Baseline,4000,docvqa_val_anls,0.5374434618615119,0.0062905728113059655
Baseline,4000,infovqa_val_anls,0.2287924838861707,0.006994568698639919
Baseline,4000,mme_total_score,1155.203781512605,
Baseline,4000,mmmu_val_mmmu_acc,0.25556,
Baseline,4000,mmstar_average,0.2575590188757354,
Baseline,4000,ocrbench_ocrbench_accuracy,0.453,
Baseline,4000,seedbench_seed_all,0.33913285158421347,
Baseline,4000,textvqa_val_exact_match,0.4593,0.006791695475025738
Baseline,5000,ai2d_exact_match,0.3125,0.008342439145556371
Baseline,5000,average,0.3974627910380972,
Baseline,5000,average_rank,3.2,
Baseline,5000,chartqa_relaxed_overall,0.5488,0.00995424828018316
Baseline,5000,docvqa_val_anls,0.552360266782429,0.006300308519952055
Baseline,5000,infovqa_val_anls,0.23425555286643698,0.007002254622066442
Baseline,5000,mme_total_score,1181.4653861544618,
Baseline,5000,mmmu_val_mmmu_acc,0.26667,
Baseline,5000,mmstar_average,0.29596648146165705,
Baseline,5000,ocrbench_ocrbench_accuracy,0.462,
Baseline,5000,seedbench_seed_all,0.43107281823235133,
Baseline,5000,textvqa_val_exact_match,0.47354000000000007,0.0068172185364497985
Baseline,6000,ai2d_exact_match,0.358160621761658,0.008629463221867162
Baseline,6000,average,0.4161227404571003,
Baseline,6000,average_rank,2.7,
Baseline,6000,chartqa_relaxed_overall,0.5628,0.00992279440175477
Baseline,6000,docvqa_val_anls,0.5747451497228876,0.00625495440870239
Baseline,6000,infovqa_val_anls,0.22152017368968838,0.006604546680525351
Baseline,6000,mme_total_score,1284.1648659463785,
Baseline,6000,mmmu_val_mmmu_acc,0.27111,
Baseline,6000,mmstar_average,0.2978489412854164,
| 71 |
+
Baseline,6000,ocrbench_ocrbench_accuracy,0.495,
|
| 72 |
+
Baseline,6000,seedbench_seed_all,0.4795997776542524,
|
| 73 |
+
Baseline,6000,textvqa_val_exact_match,0.48432,0.006800535050670284
|
| 74 |
+
Baseline,7000,ai2d_exact_match,0.3707901554404145,0.00869347755587734
|
| 75 |
+
Baseline,7000,average,0.4291083177345374,
|
| 76 |
+
Baseline,7000,average_rank,2.5,
|
| 77 |
+
Baseline,7000,chartqa_relaxed_overall,0.5656,0.009915542506251351
|
| 78 |
+
Baseline,7000,docvqa_val_anls,0.5940907049431567,0.006224236305767187
|
| 79 |
+
Baseline,7000,infovqa_val_anls,0.2515675215816963,0.007105097396092786
|
| 80 |
+
Baseline,7000,mme_total_score,1185.875650260104,
|
| 81 |
+
Baseline,7000,mmmu_val_mmmu_acc,0.26556,
|
| 82 |
+
Baseline,7000,mmstar_average,0.31372400960777047,
|
| 83 |
+
Baseline,7000,ocrbench_ocrbench_accuracy,0.504,
|
| 84 |
+
Baseline,7000,seedbench_seed_all,0.4964424680377988,
|
| 85 |
+
Baseline,7000,textvqa_val_exact_match,0.5002,0.006794794025220267
|
| 86 |
+
Baseline,8000,ai2d_exact_match,0.37759067357512954,0.008725299846043883
|
| 87 |
+
Baseline,8000,average,0.43846759477995995,
|
| 88 |
+
Baseline,8000,average_rank,2.3,
|
| 89 |
+
Baseline,8000,chartqa_relaxed_overall,0.5832,0.009862556058385773
|
| 90 |
+
Baseline,8000,docvqa_val_anls,0.6017336419437208,0.006231612198089698
|
| 91 |
+
Baseline,8000,infovqa_val_anls,0.2449256624147254,0.006992518502948913
|
| 92 |
+
Baseline,8000,mme_total_score,1199.2409963985594,
|
| 93 |
+
Baseline,8000,mmmu_val_mmmu_acc,0.28111,
|
| 94 |
+
Baseline,8000,mmstar_average,0.33512257186205047,
|
| 95 |
+
Baseline,8000,ocrbench_ocrbench_accuracy,0.51,
|
| 96 |
+
Baseline,8000,seedbench_seed_all,0.5024458032240133,
|
| 97 |
+
Baseline,8000,textvqa_val_exact_match,0.51008,0.006796301690135059
|
| 98 |
+
Baseline,9000,ai2d_exact_match,0.4067357512953368,0.008841214921078996
|
| 99 |
+
Baseline,9000,average,0.4422510732201056,
|
| 100 |
+
Baseline,9000,average_rank,2.6,
|
| 101 |
+
Baseline,9000,chartqa_relaxed_overall,0.5912,0.009834211136815875
|
| 102 |
+
Baseline,9000,docvqa_val_anls,0.6170968481662739,0.00617235763542544
|
| 103 |
+
Baseline,9000,infovqa_val_anls,0.23537031288570615,0.00670318154156447
|
| 104 |
+
Baseline,9000,mme_total_score,1231.5195078031213,
|
| 105 |
+
Baseline,9000,mmmu_val_mmmu_acc,0.25889,
|
| 106 |
+
Baseline,9000,mmstar_average,0.3216444898242951,
|
| 107 |
+
Baseline,9000,ocrbench_ocrbench_accuracy,0.515,
|
| 108 |
+
Baseline,9000,seedbench_seed_all,0.5120622568093385,
|
| 109 |
+
Baseline,9000,textvqa_val_exact_match,0.52226,0.006792711289708482
|
| 110 |
+
Baseline,10000,ai2d_exact_match,0.39993523316062174,0.008817096257082848
|
| 111 |
+
Baseline,10000,average,0.4523875703250908,
|
| 112 |
+
Baseline,10000,average_rank,2.1,
|
| 113 |
+
Baseline,10000,chartqa_relaxed_overall,0.5996,0.00980154906867574
|
| 114 |
+
Baseline,10000,docvqa_val_anls,0.6262613496433054,0.006147756371688175
|
| 115 |
+
Baseline,10000,infovqa_val_anls,0.263290074230132,0.007186788766942786
|
| 116 |
+
Baseline,10000,mme_total_score,1240.8218287314926,
|
| 117 |
+
Baseline,10000,mmmu_val_mmmu_acc,0.28778,
|
| 118 |
+
Baseline,10000,mmstar_average,0.32972717906018517,
|
| 119 |
+
Baseline,10000,ocrbench_ocrbench_accuracy,0.517,
|
| 120 |
+
Baseline,10000,seedbench_seed_all,0.5217342968315731,
|
| 121 |
+
Baseline,10000,textvqa_val_exact_match,0.5261600000000001,0.006785774843600811
|
| 122 |
+
Baseline,11000,ai2d_exact_match,0.422279792746114,0.008889771831066474
|
| 123 |
+
Baseline,11000,average,0.4561398159525099,
|
| 124 |
+
Baseline,11000,average_rank,2.4,
|
| 125 |
+
Baseline,11000,chartqa_relaxed_overall,0.6104,0.009755142291143075
|
| 126 |
+
Baseline,11000,docvqa_val_anls,0.6373130149166712,0.006128022584995044
|
| 127 |
+
Baseline,11000,infovqa_val_anls,0.24419378339723755,0.006897644885887063
|
| 128 |
+
Baseline,11000,mme_total_score,1322.9488795518205,
|
| 129 |
+
Baseline,11000,mmmu_val_mmmu_acc,0.27778,
|
| 130 |
+
Baseline,11000,mmstar_average,0.3298563439522548,
|
| 131 |
+
Baseline,11000,ocrbench_ocrbench_accuracy,0.521,
|
| 132 |
+
Baseline,11000,seedbench_seed_all,0.5237354085603113,
|
| 133 |
+
Baseline,11000,textvqa_val_exact_match,0.5387,0.006770851562852138
|
| 134 |
+
Baseline,12000,ai2d_exact_match,0.42001295336787564,0.008883255931688034
|
| 135 |
+
Baseline,12000,average,0.4582751140055433,
|
| 136 |
+
Baseline,12000,average_rank,2.7,
|
| 137 |
+
Baseline,12000,chartqa_relaxed_overall,0.618,0.009719474639861454
|
| 138 |
+
Baseline,12000,docvqa_val_anls,0.6393961983751871,0.0061228747388476674
|
| 139 |
+
Baseline,12000,infovqa_val_anls,0.24798874058574302,0.006855374548993139
|
| 140 |
+
Baseline,12000,mme_total_score,1225.6453581432572,
|
| 141 |
+
Baseline,12000,mmmu_val_mmmu_acc,0.27889,
|
| 142 |
+
Baseline,12000,mmstar_average,0.34010867846816534,
|
| 143 |
+
Baseline,12000,ocrbench_ocrbench_accuracy,0.512,
|
| 144 |
+
Baseline,12000,seedbench_seed_all,0.5350194552529183,
|
| 145 |
+
Baseline,12000,textvqa_val_exact_match,0.5330600000000001,0.006777713092109446
|
| 146 |
+
Baseline,13000,ai2d_exact_match,0.4375,0.008928571428571428
|
| 147 |
+
Baseline,13000,average,0.4692868662590049,
|
| 148 |
+
Baseline,13000,average_rank,2.2,
|
| 149 |
+
Baseline,13000,chartqa_relaxed_overall,0.6148,0.00973479791861169
|
| 150 |
+
Baseline,13000,docvqa_val_anls,0.6511374872549951,0.006086953065248391
|
| 151 |
+
Baseline,13000,infovqa_val_anls,0.24465055100441893,0.006808432538374664
|
| 152 |
+
Baseline,13000,mme_total_score,1281.7122849139657,
|
| 153 |
+
Baseline,13000,mmmu_val_mmmu_acc,0.28222,
|
| 154 |
+
Baseline,13000,mmstar_average,0.3453069542917521,
|
| 155 |
+
Baseline,13000,ocrbench_ocrbench_accuracy,0.549,
|
| 156 |
+
Baseline,13000,seedbench_seed_all,0.5442468037798777,
|
| 157 |
+
Baseline,13000,textvqa_val_exact_match,0.55472,0.0067416788982325
|
| 158 |
+
Baseline,14000,ai2d_exact_match,0.4572538860103627,0.00896620675297095
|
| 159 |
+
Baseline,14000,average,0.47352486841689195,
|
| 160 |
+
Baseline,14000,average_rank,2.2,
|
| 161 |
+
Baseline,14000,chartqa_relaxed_overall,0.6172,0.009723347231923635
|
| 162 |
+
Baseline,14000,docvqa_val_anls,0.6502269393708169,0.006057950730638126
|
| 163 |
+
Baseline,14000,infovqa_val_anls,0.25805460837190913,0.007037735231659539
|
| 164 |
+
Baseline,14000,mme_total_score,1309.1444577831132,
|
| 165 |
+
Baseline,14000,mmmu_val_mmmu_acc,0.28111,
|
| 166 |
+
Baseline,14000,mmstar_average,0.34575818188776586,
|
| 167 |
+
Baseline,14000,ocrbench_ocrbench_accuracy,0.551,
|
| 168 |
+
Baseline,14000,seedbench_seed_all,0.5483602001111729,
|
| 169 |
+
Baseline,14000,textvqa_val_exact_match,0.55276,0.006751206724612103
|
| 170 |
+
Baseline,15000,ai2d_exact_match,0.45045336787564766,0.008954861634252399
|
| 171 |
+
Baseline,15000,average,0.47878665012878824,
|
| 172 |
+
Baseline,15000,average_rank,1.6,
|
| 173 |
+
Baseline,15000,chartqa_relaxed_overall,0.612,0.009747841205275417
|
| 174 |
+
Baseline,15000,docvqa_val_anls,0.6621413031955148,0.006056838050222495
|
| 175 |
+
Baseline,15000,infovqa_val_anls,0.2706898598157733,0.007200315730154543
|
| 176 |
+
Baseline,15000,mme_total_score,1384.2171868747498,
|
| 177 |
+
Baseline,15000,mmmu_val_mmmu_acc,0.30222,
|
| 178 |
+
Baseline,15000,mmstar_average,0.35408135695920684,
|
| 179 |
+
Baseline,15000,ocrbench_ocrbench_accuracy,0.558,
|
| 180 |
+
Baseline,15000,seedbench_seed_all,0.5411339633129516,
|
| 181 |
+
Baseline,15000,textvqa_val_exact_match,0.5583600000000001,0.0067279027203879065
|
| 182 |
+
Baseline,16000,ai2d_exact_match,0.45077720207253885,0.008955440137395838
|
| 183 |
+
Baseline,16000,average,0.47665128022935843,
|
| 184 |
+
Baseline,16000,average_rank,2.2,
|
| 185 |
+
Baseline,16000,chartqa_relaxed_overall,0.632,0.00964715642305132
|
| 186 |
+
Baseline,16000,docvqa_val_anls,0.6709415729142987,0.005999818105621502
|
| 187 |
+
Baseline,16000,infovqa_val_anls,0.26050032542402035,0.006997451875879188
|
| 188 |
+
Baseline,16000,mme_total_score,1317.8491396558625,
|
| 189 |
+
Baseline,16000,mmmu_val_mmmu_acc,0.27556,
|
| 190 |
+
Baseline,16000,mmstar_average,0.33214333327093315,
|
| 191 |
+
Baseline,16000,ocrbench_ocrbench_accuracy,0.56,
|
| 192 |
+
Baseline,16000,seedbench_seed_all,0.5463590883824346,
|
| 193 |
+
Baseline,16000,textvqa_val_exact_match,0.56158,0.006723854754867398
|
| 194 |
+
Baseline,17000,ai2d_exact_match,0.45919689119170987,0.008969138793675545
|
| 195 |
+
Baseline,17000,average,0.4777141780162423,
|
| 196 |
+
Baseline,17000,average_rank,2.2,
|
| 197 |
+
Baseline,17000,chartqa_relaxed_overall,0.632,0.00964715642305132
|
| 198 |
+
Baseline,17000,docvqa_val_anls,0.6796338519136422,0.005948761388267941
|
| 199 |
+
Baseline,17000,infovqa_val_anls,0.28070956072505215,0.007298333094144192
|
| 200 |
+
Baseline,17000,mme_total_score,1381.9161664665867,
|
| 201 |
+
Baseline,17000,mmmu_val_mmmu_acc,0.27667,
|
| 202 |
+
Baseline,17000,mmstar_average,0.3370289492329521,
|
| 203 |
+
Baseline,17000,ocrbench_ocrbench_accuracy,0.519,
|
| 204 |
+
Baseline,17000,seedbench_seed_all,0.5510283490828238,
|
| 205 |
+
Baseline,17000,textvqa_val_exact_match,0.56416,0.006724830373229479
|
| 206 |
+
Baseline,18000,ai2d_exact_match,0.46567357512953367,0.008977921602780726
|
| 207 |
+
Baseline,18000,average,0.4819834595278701,
|
| 208 |
+
Baseline,18000,average_rank,2.1,
|
| 209 |
+
Baseline,18000,chartqa_relaxed_overall,0.6376,0.009615793331418735
|
| 210 |
+
Baseline,18000,docvqa_val_anls,0.6775884603912571,0.005972234236435759
|
| 211 |
+
Baseline,18000,infovqa_val_anls,0.27154318420389256,0.007164903131667027
|
| 212 |
+
Baseline,18000,mme_total_score,1336.922769107643,
|
| 213 |
+
Baseline,18000,mmmu_val_mmmu_acc,0.28667,
|
| 214 |
+
Baseline,18000,mmstar_average,0.34482796716566916,
|
| 215 |
+
Baseline,18000,ocrbench_ocrbench_accuracy,0.533,
|
| 216 |
+
Baseline,18000,seedbench_seed_all,0.5543079488604781,
|
| 217 |
+
Baseline,18000,textvqa_val_exact_match,0.5666399999999999,0.006713392287599574
|
| 218 |
+
Baseline,19000,ai2d_exact_match,0.4682642487046632,0.008981008686994101
|
| 219 |
+
Baseline,19000,average,0.4899006713916878,
|
| 220 |
+
Baseline,19000,average_rank,1.8,
|
| 221 |
+
Baseline,19000,chartqa_relaxed_overall,0.6444,0.009575809858898698
|
| 222 |
+
Baseline,19000,docvqa_val_anls,0.678226526479947,0.005970619221588814
|
| 223 |
+
Baseline,19000,infovqa_val_anls,0.26993847247278,0.0071348470764911525
|
| 224 |
+
Baseline,19000,mme_total_score,1406.6628651460583,
|
| 225 |
+
Baseline,19000,mmmu_val_mmmu_acc,0.28333,
|
| 226 |
+
Baseline,19000,mmstar_average,0.356220913822775,
|
| 227 |
+
Baseline,19000,ocrbench_ocrbench_accuracy,0.577,
|
| 228 |
+
Baseline,19000,seedbench_seed_all,0.554585881045025,
|
| 229 |
+
Baseline,19000,textvqa_val_exact_match,0.57714,0.0066918487914812905
|
| 230 |
+
Baseline,20000,ai2d_exact_match,0.47571243523316065,0.00898853090258662
|
| 231 |
+
Baseline,20000,average,0.4873169067639118,
|
| 232 |
+
Baseline,20000,average_rank,1.1,
|
| 233 |
+
Baseline,20000,chartqa_relaxed_overall,0.6336,0.009638338810708618
|
| 234 |
+
Baseline,20000,docvqa_val_anls,0.6895214454380043,0.005896462073053767
|
| 235 |
+
Baseline,20000,infovqa_val_anls,0.2655657550458317,0.007033265532032538
|
| 236 |
+
Baseline,20000,mme_total_score,1324.6738695478193,
|
| 237 |
+
Baseline,20000,mmmu_val_mmmu_acc,0.30111,
|
| 238 |
+
Baseline,20000,mmstar_average,0.33806766134497995,
|
| 239 |
+
Baseline,20000,ocrbench_ocrbench_accuracy,0.555,
|
| 240 |
+
Baseline,20000,seedbench_seed_all,0.5587548638132296,
|
| 241 |
+
Baseline,20000,textvqa_val_exact_match,0.56852,0.006720151338087659
|
| 242 |
+
≥2,1000,ai2d_exact_match,0.25777202072538863,0.00787260087439643
|
| 243 |
+
≥2,1000,average,0.29870004148945406,
|
| 244 |
+
≥2,1000,average_rank,1.6,
|
| 245 |
+
≥2,1000,chartqa_relaxed_overall,0.392,0.00976588700628918
|
| 246 |
+
≥2,1000,docvqa_val_anls,0.38022613055363247,0.005894591191928051
|
| 247 |
+
≥2,1000,infovqa_val_anls,0.18869492615378894,0.0064732209321745355
|
| 248 |
+
≥2,1000,mme_total_score,1009.0293117246899,
|
| 249 |
+
≥2,1000,mmmu_val_mmmu_acc,0.26889,
|
| 250 |
+
≥2,1000,mmstar_average,0.2331497473341441,
|
| 251 |
+
≥2,1000,ocrbench_ocrbench_accuracy,0.342,
|
| 252 |
+
≥2,1000,seedbench_seed_all,0.25758754863813227,
|
| 253 |
+
≥2,1000,textvqa_val_exact_match,0.36798000000000003,0.0065929080994233105
|
| 254 |
+
≥2,2000,ai2d_exact_match,0.2655440414507772,0.007948457289013512
|
| 255 |
+
≥2,2000,average,0.33300868864578836,
|
| 256 |
+
≥2,2000,average_rank,1.9,
|
| 257 |
+
≥2,2000,chartqa_relaxed_overall,0.4772,0.009991596308834713
|
| 258 |
+
≥2,2000,docvqa_val_anls,0.46008700414278725,0.006164804465853957
|
| 259 |
+
≥2,2000,infovqa_val_anls,0.21854176213098941,0.0068306837394726885
|
| 260 |
+
≥2,2000,mme_total_score,1096.7091836734694,
|
| 261 |
+
≥2,2000,mmmu_val_mmmu_acc,0.25333,
|
| 262 |
+
≥2,2000,mmstar_average,0.203705184417725,
|
| 263 |
+
≥2,2000,ocrbench_ocrbench_accuracy,0.409,
|
| 264 |
+
≥2,2000,seedbench_seed_all,0.26637020566981656,
|
| 265 |
+
≥2,2000,textvqa_val_exact_match,0.4433,0.006768706307487082
|
| 266 |
+
≥2,3000,ai2d_exact_match,0.2610103626943005,0.007904597024354016
|
| 267 |
+
≥2,3000,average,0.3534218124222384,
|
| 268 |
+
≥2,3000,average_rank,2.8,
|
| 269 |
+
≥2,3000,chartqa_relaxed_overall,0.5264,0.009988048880946633
|
| 270 |
+
≥2,3000,docvqa_val_anls,0.5023083447985476,0.006259506318102633
|
| 271 |
+
≥2,3000,infovqa_val_anls,0.2121067617258121,0.006644891191437915
|
| 272 |
+
≥2,3000,mme_total_score,1089.7261904761906,
|
| 273 |
+
≥2,3000,mmmu_val_mmmu_acc,0.24333,
|
| 274 |
+
≥2,3000,mmstar_average,0.23083152629465944,
|
| 275 |
+
≥2,3000,ocrbench_ocrbench_accuracy,0.462,
|
| 276 |
+
≥2,3000,seedbench_seed_all,0.284769316286826,
|
| 277 |
+
≥2,3000,textvqa_val_exact_match,0.45803999999999995,0.006781406135443796
|
| 278 |
+
≥2,4000,ai2d_exact_match,0.30440414507772023,0.00828200443840283
|
| 279 |
+
≥2,4000,average,0.3907370325722774,
|
| 280 |
+
≥2,4000,average_rank,2.0,
|
| 281 |
+
≥2,4000,chartqa_relaxed_overall,0.5336,0.009979391329160321
|
| 282 |
+
≥2,4000,docvqa_val_anls,0.5333935372644802,0.006254756291115302
|
| 283 |
+
≥2,4000,infovqa_val_anls,0.21491089642827804,0.006550896522978477
|
| 284 |
+
≥2,4000,mme_total_score,1174.6278511404562,
|
| 285 |
+
≥2,4000,mmmu_val_mmmu_acc,0.27778,
|
| 286 |
+
≥2,4000,mmstar_average,0.28828662655344744,
|
| 287 |
+
≥2,4000,ocrbench_ocrbench_accuracy,0.483,
|
| 288 |
+
≥2,4000,seedbench_seed_all,0.4045580878265703,
|
| 289 |
+
≥2,4000,textvqa_val_exact_match,0.4767,0.00679925495724309
|
| 290 |
+
≥2,5000,ai2d_exact_match,0.34067357512953367,0.008530041622806898
|
| 291 |
+
≥2,5000,average,0.4052701461218865,
|
| 292 |
+
≥2,5000,average_rank,2.4,
|
| 293 |
+
≥2,5000,chartqa_relaxed_overall,0.556,0.00993907007952043
|
| 294 |
+
≥2,5000,docvqa_val_anls,0.551636615576218,0.006262113198152568
|
| 295 |
+
≥2,5000,infovqa_val_anls,0.2187520201553633,0.0065999197492640155
|
| 296 |
+
≥2,5000,mme_total_score,1239.7700080032012,
|
| 297 |
+
≥2,5000,mmmu_val_mmmu_acc,0.25889,
|
| 298 |
+
≥2,5000,mmstar_average,0.3088917112397545,
|
| 299 |
+
≥2,5000,ocrbench_ocrbench_accuracy,0.479,
|
| 300 |
+
≥2,5000,seedbench_seed_all,0.45330739299610895,
|
| 301 |
+
≥2,5000,textvqa_val_exact_match,0.48028000000000004,0.0068060185643105285
|
| 302 |
+
≥2,6000,ai2d_exact_match,0.3636658031088083,0.008658158841882571
|
| 303 |
+
≥2,6000,average,0.4239178365378573,
|
| 304 |
+
≥2,6000,average_rank,1.8,
|
| 305 |
+
≥2,6000,chartqa_relaxed_overall,0.5588,0.009932597172675325
|
| 306 |
+
≥2,6000,docvqa_val_anls,0.5759208000996875,0.00627846983332349
|
| 307 |
+
≥2,6000,infovqa_val_anls,0.22102654462080973,0.006590286832050693
|
| 308 |
+
≥2,6000,mme_total_score,1250.6378551420567,
|
| 309 |
+
≥2,6000,mmmu_val_mmmu_acc,0.28889,
|
| 310 |
+
≥2,6000,mmstar_average,0.3344899768980139,
|
| 311 |
+
≥2,6000,ocrbench_ocrbench_accuracy,0.487,
|
| 312 |
+
≥2,6000,seedbench_seed_all,0.4893274041133963,
|
| 313 |
+
≥2,6000,textvqa_val_exact_match,0.4961400000000001,0.006795016889670414
|
| 314 |
+
≥2,7000,ai2d_exact_match,0.38147668393782386,0.008742662684201102
|
| 315 |
+
≥2,7000,average,0.43263406493008894,
|
| 316 |
+
≥2,7000,average_rank,1.6,
|
| 317 |
+
≥2,7000,chartqa_relaxed_overall,0.5748,0.009889444091645227
|
| 318 |
+
≥2,7000,docvqa_val_anls,0.5867668845214878,0.006247104281789733
|
| 319 |
+
≥2,7000,infovqa_val_anls,0.23824260143164633,0.0068608126823946685
|
| 320 |
+
≥2,7000,mme_total_score,1296.735694277711,
|
| 321 |
+
≥2,7000,mmmu_val_mmmu_acc,0.28556,
|
| 322 |
+
≥2,7000,mmstar_average,0.3072635940240335,
|
| 323 |
+
≥2,7000,ocrbench_ocrbench_accuracy,0.513,
|
| 324 |
+
≥2,7000,seedbench_seed_all,0.4982768204558088,
|
| 325 |
+
≥2,7000,textvqa_val_exact_match,0.5083200000000001,0.006792185382957041
|
| 326 |
+
≥2,8000,ai2d_exact_match,0.40382124352331605,0.008831094143874323
|
| 327 |
+
≥2,8000,average,0.4456329819076237,
|
| 328 |
+
≥2,8000,average_rank,1.2,
|
| 329 |
+
≥2,8000,chartqa_relaxed_overall,0.5856,0.009854334029231191
|
| 330 |
+
≥2,8000,docvqa_val_anls,0.6057730552623142,0.006226842243771017
|
| 331 |
+
≥2,8000,infovqa_val_anls,0.2288666100209681,0.006682620404600125
|
| 332 |
+
≥2,8000,mme_total_score,1221.7308923569428,
|
| 333 |
+
≥2,8000,mmmu_val_mmmu_acc,0.29778,
|
| 334 |
+
≥2,8000,mmstar_average,0.3382327154659617,
|
| 335 |
+
≥2,8000,ocrbench_ocrbench_accuracy,0.523,
|
| 336 |
+
≥2,8000,seedbench_seed_all,0.5097832128960533,
|
| 337 |
+
≥2,8000,textvqa_val_exact_match,0.51784,0.006779139932738188
|
| 338 |
+
≥2,9000,ai2d_exact_match,0.40025906735751293,0.008818284784223732
|
| 339 |
+
≥2,9000,average,0.4496383408952438,
|
| 340 |
+
≥2,9000,average_rank,2.1,
|
| 341 |
+
≥2,9000,chartqa_relaxed_overall,0.5952,0.0098190299592035
|
| 342 |
+
≥2,9000,docvqa_val_anls,0.6174168384969986,0.006180917562443268
|
| 343 |
+
≥2,9000,infovqa_val_anls,0.25032529875297344,0.007005541408150581
|
| 344 |
+
≥2,9000,mme_total_score,1151.3506402561025,
|
| 345 |
+
≥2,9000,mmmu_val_mmmu_acc,0.28111,
|
| 346 |
+
≥2,9000,mmstar_average,0.3278445360455953,
|
| 347 |
+
≥2,9000,ocrbench_ocrbench_accuracy,0.533,
|
| 348 |
+
≥2,9000,seedbench_seed_all,0.5207893274041134,
|
| 349 |
+
≥2,9000,textvqa_val_exact_match,0.5208,0.006794961832215119
|
| 350 |
+
≥2,10000,ai2d_exact_match,0.41353626943005184,0.00886357792887845
|
| 351 |
+
≥2,10000,average,0.45099931997713383,
|
| 352 |
+
≥2,10000,average_rank,2.3,
|
| 353 |
+
≥2,10000,chartqa_relaxed_overall,0.5808,0.009870537726284339
|
| 354 |
+
≥2,10000,docvqa_val_anls,0.6302398141335456,0.006157573676383471
|
| 355 |
+
≥2,10000,infovqa_val_anls,0.2381115764088292,0.0068559608378919125
|
| 356 |
+
≥2,10000,mme_total_score,1079.641156462585,
|
| 357 |
+
≥2,10000,mmmu_val_mmmu_acc,0.27222,
|
| 358 |
+
≥2,10000,mmstar_average,0.32534155056107783,
|
| 359 |
+
≥2,10000,ocrbench_ocrbench_accuracy,0.54,
|
| 360 |
+
≥2,10000,seedbench_seed_all,0.5284046692607004,
|
| 361 |
+
≥2,10000,textvqa_val_exact_match,0.5303399999999999,0.006770691643625976
|
| 362 |
+
≥2,11000,ai2d_exact_match,0.42972797927461137,0.008909832364541423
|
| 363 |
+
≥2,11000,average,0.4614569435855967,
|
| 364 |
+
≥2,11000,average_rank,1.9,
|
| 365 |
+
≥2,11000,chartqa_relaxed_overall,0.5924,0.009829727637028773
|
| 366 |
+
≥2,11000,docvqa_val_anls,0.6325497973300792,0.006140954920476996
|
| 367 |
+
≥2,11000,infovqa_val_anls,0.2522830831713292,0.007064159450996417
|
| 368 |
+
≥2,11000,mme_total_score,1068.0994397759105,
|
| 369 |
+
≥2,11000,mmmu_val_mmmu_acc,0.3,
|
| 370 |
+
≥2,11000,mmstar_average,0.3439924885254794,
|
| 371 |
+
≥2,11000,ocrbench_ocrbench_accuracy,0.541,
|
| 372 |
+
≥2,11000,seedbench_seed_all,0.5264591439688716,
|
| 373 |
+
≥2,11000,textvqa_val_exact_match,0.5347,0.006770579749920238
|
| 374 |
+
≥2,12000,ai2d_exact_match,0.4332901554404145,0.008918698335135207
|
| 375 |
+
≥2,12000,average,0.46757010591353954,
|
| 376 |
+
≥2,12000,average_rank,2.3,
|
| 377 |
+
≥2,12000,chartqa_relaxed_overall,0.6032,0.00978663452296623
|
| 378 |
+
≥2,12000,docvqa_val_anls,0.6411886944030833,0.006133992458699484
|
| 379 |
+
≥2,12000,infovqa_val_anls,0.247583905971666,0.0068292606857312445
|
| 380 |
+
≥2,12000,mme_total_score,1116.4749899959984,
|
| 381 |
+
≥2,12000,mmmu_val_mmmu_acc,0.31333,
|
| 382 |
+
≥2,12000,mmstar_average,0.36351686333220656,
|
| 383 |
+
≥2,12000,ocrbench_ocrbench_accuracy,0.545,
|
| 384 |
+
≥2,12000,seedbench_seed_all,0.5224013340744859,
|
| 385 |
+
≥2,12000,textvqa_val_exact_match,0.53862,0.006767396741120647
|
| 386 |
+
≥2,13000,ai2d_exact_match,0.43102331606217614,0.008913110733383512
|
| 387 |
+
≥2,13000,average,0.4708623711695242,
|
| 388 |
+
≥2,13000,average_rank,1.8,
|
| 389 |
+
≥2,13000,chartqa_relaxed_overall,0.598,0.009808000752013664
|
| 390 |
+
≥2,13000,docvqa_val_anls,0.647518045726831,0.006104766828382393
|
| 391 |
+
≥2,13000,infovqa_val_anls,0.2658523754039203,0.0070971911654514885
|
| 392 |
+
≥2,13000,mme_total_score,1216.8277310924368,
|
| 393 |
+
≥2,13000,mmmu_val_mmmu_acc,0.30778,
|
| 394 |
+
≥2,13000,mmstar_average,0.35401024924718744,
|
| 395 |
+
≥2,13000,ocrbench_ocrbench_accuracy,0.559,
|
| 396 |
+
≥2,13000,seedbench_seed_all,0.5272373540856031,
|
| 397 |
+
≥2,13000,textvqa_val_exact_match,0.5473399999999999,0.0067627015094813454
|
| 398 |
+
≥2,14000,ai2d_exact_match,0.4413860103626943,0.008937105222785164
|
| 399 |
+
≥2,14000,average,0.47200827220139996,
|
| 400 |
+
≥2,14000,average_rank,2.2,
|
| 401 |
+
≥2,14000,chartqa_relaxed_overall,0.608,0.00976588700628918
|
| 402 |
+
≥2,14000,docvqa_val_anls,0.6481895430203203,0.006114739128147622
|
| 403 |
+
≥2,14000,infovqa_val_anls,0.24797040095296322,0.006854729144835086
|
| 404 |
+
≥2,14000,mme_total_score,1153.2053821528611,
|
| 405 |
+
≥2,14000,mmmu_val_mmmu_acc,0.29778,
|
| 406 |
+
≥2,14000,mmstar_average,0.3623816027584452,
|
| 407 |
+
≥2,14000,ocrbench_ocrbench_accuracy,0.557,
|
| 408 |
+
≥2,14000,seedbench_seed_all,0.5324068927181768,
|
| 409 |
+
≥2,14000,textvqa_val_exact_match,0.55296,0.006743422160037565
|
| 410 |
+
≥2,15000,ai2d_exact_match,0.44430051813471505,0.008943141268224493
|
| 411 |
+
≥2,15000,average,0.4768540644353607,
|
| 412 |
+
≥2,15000,average_rank,2.1,
|
| 413 |
+
≥2,15000,chartqa_relaxed_overall,0.6124,0.00974599865564932
|
| 414 |
+
≥2,15000,docvqa_val_anls,0.657105104030175,0.006083402178894847
|
| 415 |
+
≥2,15000,infovqa_val_anls,0.2631976158051741,0.0070935590715263345
|
| 416 |
+
≥2,15000,mme_total_score,1174.1028411364546,
|
| 417 |
+
≥2,15000,mmmu_val_mmmu_acc,0.29222,
|
| 418 |
+
≥2,15000,mmstar_average,0.36067644923000525,
|
| 419 |
+
≥2,15000,ocrbench_ocrbench_accuracy,0.573,
|
| 420 |
+
≥2,15000,seedbench_seed_all,0.5324068927181768,
|
| 421 |
+
≥2,15000,textvqa_val_exact_match,0.55638,0.006732794755897807
|
| 422 |
+
≥2,16000,ai2d_exact_match,0.4423575129533679,0.008939151893135126
|
| 423 |
+
≥2,16000,average,0.4826201441651047,
|
| 424 |
+
≥2,16000,average_rank,1.7,
|
| 425 |
+
≥2,16000,chartqa_relaxed_overall,0.6136,0.009740429476494075
|
| 426 |
+
≥2,16000,docvqa_val_anls,0.6608706272244327,0.006045334017501067
|
| 427 |
+
≥2,16000,infovqa_val_anls,0.2658456998658782,0.007096717267723537
|
| 428 |
+
≥2,16000,mme_total_score,1196.9498799519806,
|
| 429 |
+
≥2,16000,mmmu_val_mmmu_acc,0.31667,
|
| 430 |
+
≥2,16000,mmstar_average,0.3731506870142482,
|
| 431 |
+
≥2,16000,ocrbench_ocrbench_accuracy,0.57,
|
| 432 |
+
≥2,16000,seedbench_seed_all,0.5361867704280155,
|
| 433 |
+
≥2,16000,textvqa_val_exact_match,0.5649,0.006716890748275191
|
| 434 |
+
≥2,17000,ai2d_exact_match,0.4413860103626943,0.008937105222785164
|
| 435 |
+
≥2,17000,average,0.4831149323865578,
|
| 436 |
+
≥2,17000,average_rank,1.7,
|
| 437 |
+
≥2,17000,chartqa_relaxed_overall,0.6176,0.00972141442174665
|
| 438 |
+
≥2,17000,docvqa_val_anls,0.6642577380136374,0.0060335568273967325
|
| 439 |
+
≥2,17000,infovqa_val_anls,0.27541255121036223,0.007220728859165202
|
| 440 |
+
≥2,17000,mme_total_score,1226.7763105242097,
|
| 441 |
+
≥2,17000,mmmu_val_mmmu_acc,0.31222,
|
| 442 |
+
≥2,17000,mmstar_average,0.3539718773286798,
|
| 443 |
+
≥2,17000,ocrbench_ocrbench_accuracy,0.579,
|
| 444 |
+
≥2,17000,seedbench_seed_all,0.5351862145636465,
|
| 445 |
+
≥2,17000,textvqa_val_exact_match,0.569,0.006714701590055116
|
| 446 |
+
≥2,18000,ai2d_exact_match,0.4485103626943005,0.008951310133709684
|
| 447 |
+
≥2,18000,average,0.4835600793462623,
|
| 448 |
+
≥2,18000,average_rank,2.0,
|
| 449 |
+
≥2,18000,chartqa_relaxed_overall,0.6144,0.009736682042198788
|
| 450 |
+
≥2,18000,docvqa_val_anls,0.6637272679006306,0.006043227308338888
|
| 451 |
+
≥2,18000,infovqa_val_anls,0.27265678126051135,0.007221049354398339
|
| 452 |
+
≥2,18000,mme_total_score,1195.9091636654662,
|
| 453 |
+
≥2,18000,mmmu_val_mmmu_acc,0.30778,
|
| 454 |
+
≥2,18000,mmstar_average,0.36311980976508723,
|
| 455 |
+
≥2,18000,ocrbench_ocrbench_accuracy,0.574,
|
| 456 |
+
≥2,18000,seedbench_seed_all,0.535686492495831,
|
| 457 |
+
≥2,18000,textvqa_val_exact_match,0.5721599999999999,0.006697668148820628
|
| 458 |
+
≥2,19000,ai2d_exact_match,0.452720207253886,0.008958830742136076
|
| 459 |
+
≥2,19000,average,0.48356600425554297,
|
| 460 |
+
≥2,19000,average_rank,2.1,
|
| 461 |
+
≥2,19000,chartqa_relaxed_overall,0.6128,0.009744149186940382
|
| 462 |
+
≥2,19000,docvqa_val_anls,0.6625382006068653,0.006055552711820008
|
| 463 |
+
≥2,19000,infovqa_val_anls,0.27294748299543115,0.007192809991207902
|
| 464 |
+
≥2,19000,mme_total_score,1187.5743297318927,
|
| 465 |
+
≥2,19000,mmmu_val_mmmu_acc,0.31222,
|
| 466 |
+
≥2,19000,mmstar_average,0.3570382308233601,
|
| 467 |
+
≥2,19000,ocrbench_ocrbench_accuracy,0.583,
|
| 468 |
+
≥2,19000,seedbench_seed_all,0.5298499166203446,
|
| 469 |
+
≥2,19000,textvqa_val_exact_match,0.56898,0.006699263239012863
|
| 470 |
+
≥3,1000,ai2d_exact_match,0.265220207253886,0.00794536023378452
|
| 471 |
+
≥3,1000,average,0.28531874553483266,
|
| 472 |
+
≥3,1000,average_rank,2.7,
|
| 473 |
+
≥3,1000,chartqa_relaxed_overall,0.3328,0.00942619781683542
|
| 474 |
+
≥3,1000,docvqa_val_anls,0.3817065818486834,0.0060201820313849075
|
| 475 |
+
≥3,1000,infovqa_val_anls,0.1724277234914656,0.0063364421395684075
|
| 476 |
+
≥3,1000,mme_total_score,1014.2047819127652,
|
| 477 |
+
≥3,1000,mmmu_val_mmmu_acc,0.25222,
|
| 478 |
+
≥3,1000,mmstar_average,0.21462695986537322,
|
| 479 |
+
≥3,1000,ocrbench_ocrbench_accuracy,0.34,
|
| 480 |
+
≥3,1000,seedbench_seed_all,0.2490272373540856,
|
| 481 |
+
≥3,1000,textvqa_val_exact_match,0.35984,0.0065578377478545035
|
| 482 |
+
≥3,2000,ai2d_exact_match,0.265220207253886,0.007945360233784508
|
| 483 |
+
≥3,2000,average,0.31448842347349887,
|
| 484 |
+
≥3,2000,average_rank,3.2,
|
| 485 |
+
≥3,2000,chartqa_relaxed_overall,0.4032,0.009812768221458571
|
| 486 |
+
≥3,2000,docvqa_val_anls,0.4310807448012024,0.006128304803001007
|
| 487 |
+
≥3,2000,infovqa_val_anls,0.18407319533143493,0.006354264550345518
|
| 488 |
+
≥3,2000,mme_total_score,950.6575630252102,
|
| 489 |
+
≥3,2000,mmmu_val_mmmu_acc,0.25667,
|
| 490 |
+
≥3,2000,mmstar_average,0.21376105798280387,
|
| 491 |
+
≥3,2000,ocrbench_ocrbench_accuracy,0.404,
|
| 492 |
+
≥3,2000,seedbench_seed_all,0.2630906058921623,
|
| 493 |
+
≥3,2000,textvqa_val_exact_match,0.4093,0.006714113334132268
|
| 494 |
+
≥3,3000,ai2d_exact_match,0.27266839378238344,0.008015217564479087
|
| 495 |
+
≥3,3000,average,0.34673777173147413,
|
| 496 |
+
≥3,3000,average_rank,2.9,
|
| 497 |
+
≥3,3000,chartqa_relaxed_overall,0.4572,0.00996528909739792
|
| 498 |
+
≥3,3000,docvqa_val_anls,0.4798353261388413,0.006271292862880522
|
| 499 |
+
≥3,3000,infovqa_val_anls,0.19911035728945153,0.006542447972079325
|
| 500 |
+
≥3,3000,mme_total_score,1038.0850340136053,
|
| 501 |
+
≥3,3000,mmmu_val_mmmu_acc,0.28,
|
| 502 |
+
≥3,3000,mmstar_average,0.22994571273056794,
|
| 503 |
+
≥3,3000,ocrbench_ocrbench_accuracy,0.452,
|
| 504 |
+
≥3,3000,seedbench_seed_all,0.3042801556420233,
|
| 505 |
+
≥3,3000,textvqa_val_exact_match,0.4456,0.006780363018639408
|
| 506 |
+
≥3,4000,ai2d_exact_match,0.3008419689119171,0.008254458183344766
|
| 507 |
+
≥3,4000,average,0.372256616377869,
|
| 508 |
+
≥3,4000,average_rank,3.4,
|
| 509 |
+
≥3,4000,chartqa_relaxed_overall,0.498,0.010001920583875201
|
| 510 |
+
≥3,4000,docvqa_val_anls,0.5005393380802441,0.006322332852652876
|
| 511 |
+
≥3,4000,infovqa_val_anls,0.2022979884693795,0.006565632329170807
|
| 512 |
+
≥3,4000,mme_total_score,1107.3175270108043,
|
| 513 |
+
≥3,4000,mmmu_val_mmmu_acc,0.27889,
|
| 514 |
+
≥3,4000,mmstar_average,0.26412491008269295,
|
| 515 |
+
≥3,4000,ocrbench_ocrbench_accuracy,0.449,
|
| 516 |
+
≥3,4000,seedbench_seed_all,0.407615341856587,
|
| 517 |
+
≥3,4000,textvqa_val_exact_match,0.449,0.006768113582993008
|
| 518 |
+
≥3,5000,ai2d_exact_match,0.342940414507772,0.008543648986216484
|
| 519 |
+
≥3,5000,average,0.39554841320961825,
|
| 520 |
+
≥3,5000,average_rank,2.8,
|
| 521 |
+
≥3,5000,chartqa_relaxed_overall,0.5016,0.010001949389825897
|
| 522 |
+
≥3,5000,docvqa_val_anls,0.5217724980647621,0.006332629841214822
|
| 523 |
+
≥3,5000,infovqa_val_anls,0.20921213176388598,0.006571741980614216
|
| 524 |
+
≥3,5000,mme_total_score,1143.5384153661464,
|
| 525 |
+
≥3,5000,mmmu_val_mmmu_acc,0.27889,
|
| 526 |
+
≥3,5000,mmstar_average,0.3085553160176263,
|
| 527 |
+
≥3,5000,ocrbench_ocrbench_accuracy,0.471,
|
| 528 |
+
≥3,5000,seedbench_seed_all,0.4616453585325181,
|
| 529 |
+
≥3,5000,textvqa_val_exact_match,0.46431999999999995,0.006784172205840865
|
+≥3,6000,ai2d_exact_match,0.3591321243523316,0.008634616704865624
+≥3,6000,average,0.408597756536911,
+≥3,6000,average_rank,3.4,
+≥3,6000,chartqa_relaxed_overall,0.524,0.009990471651004463
+≥3,6000,docvqa_val_anls,0.5385470022871673,0.006299095577053015
+≥3,6000,infovqa_val_anls,0.19835998026203344,0.006301952324954991
+≥3,6000,mme_total_score,1166.4043617446978,
+≥3,6000,mmmu_val_mmmu_acc,0.26889,
+≥3,6000,mmstar_average,0.3153629642986489,
+≥3,6000,ocrbench_ocrbench_accuracy,0.524,
+≥3,6000,seedbench_seed_all,0.4699277376320178,
+≥3,6000,textvqa_val_exact_match,0.47916000000000003,0.006792463941257152
+≥3,7000,ai2d_exact_match,0.3743523316062176,0.00871037538055804
+≥3,7000,average,0.4146622463341539,
+≥3,7000,average_rank,3.2,
+≥3,7000,chartqa_relaxed_overall,0.5324,0.009980979109165145
+≥3,7000,docvqa_val_anls,0.5423666654155802,0.0063068441060751875
+≥3,7000,infovqa_val_anls,0.20429604055237802,0.006425262069821515
+≥3,7000,mme_total_score,1148.8516406562626,
+≥3,7000,mmmu_val_mmmu_acc,0.28111,
+≥3,7000,mmstar_average,0.314909965425427,
+≥3,7000,ocrbench_ocrbench_accuracy,0.501,
+≥3,7000,seedbench_seed_all,0.4933852140077821,
+≥3,7000,textvqa_val_exact_match,0.4881400000000001,0.006795461996122717
+≥3,8000,ai2d_exact_match,0.3869818652849741,0.008766245989484155
+≥3,8000,average,0.42472235991601914,
+≥3,8000,average_rank,3.2,
+≥3,8000,chartqa_relaxed_overall,0.548,0.009955804699716018
+≥3,8000,docvqa_val_anls,0.564403800428159,0.006324143942272913
+≥3,8000,infovqa_val_anls,0.2120646153586776,0.006489418673794432
+≥3,8000,mme_total_score,1176.846738695478,
+≥3,8000,mmmu_val_mmmu_acc,0.27333,
+≥3,8000,mmstar_average,0.3144330871328953,
+≥3,8000,ocrbench_ocrbench_accuracy,0.528,
+≥3,8000,seedbench_seed_all,0.5021678710394664,
+≥3,8000,textvqa_val_exact_match,0.4931199999999999,0.006792154282108025
+≥3,9000,ai2d_exact_match,0.3898963730569948,0.008778252852376935
+≥3,9000,average,0.4346681180521681,
+≥3,9000,average_rank,2.6,
+≥3,9000,chartqa_relaxed_overall,0.558,0.009934479228979262
+≥3,9000,docvqa_val_anls,0.5621929569166433,0.006295010516387767
+≥3,9000,infovqa_val_anls,0.22522973646589128,0.0066872494457316176
+≥3,9000,mme_total_score,1198.484293717487,
+≥3,9000,mmmu_val_mmmu_acc,0.28333,
+≥3,9000,mmstar_average,0.3344926341622794,
+≥3,9000,ocrbench_ocrbench_accuracy,0.537,
+≥3,9000,seedbench_seed_all,0.5124513618677042,
+≥3,9000,textvqa_val_exact_match,0.50942,0.006797109154528248
+≥3,10000,ai2d_exact_match,0.39831606217616583,0.008811093384512251
+≥3,10000,average,0.4363450582991636,
+≥3,10000,average_rank,3.1,
+≥3,10000,chartqa_relaxed_overall,0.5644,0.00991868984106597
+≥3,10000,docvqa_val_anls,0.5876359888597289,0.006275892871477498
+≥3,10000,infovqa_val_anls,0.21414071448409078,0.00648536598995207
+≥3,10000,mme_total_score,1125.9251700680272,
+≥3,10000,mmmu_val_mmmu_acc,0.29111,
+≥3,10000,mmstar_average,0.3371786791280175,
+≥3,10000,ocrbench_ocrbench_accuracy,0.509,
+≥3,10000,seedbench_seed_all,0.5193440800444692,
+≥3,10000,textvqa_val_exact_match,0.50598,0.006794533174266738
+≥3,11000,ai2d_exact_match,0.41321243523316065,0.00886255263438398
+≥3,11000,average,0.44519353317344135,
+≥3,11000,average_rank,3.2,
+≥3,11000,chartqa_relaxed_overall,0.5648,0.009917647296166388
+≥3,11000,docvqa_val_anls,0.5901130752220717,0.006255436475783363
+≥3,11000,infovqa_val_anls,0.2237154173345093,0.006578524720143259
+≥3,11000,mme_total_score,1143.140256102441,
+≥3,11000,mmmu_val_mmmu_acc,0.28333,
+≥3,11000,mmstar_average,0.3585590419774556,
+≥3,11000,ocrbench_ocrbench_accuracy,0.53,
+≥3,11000,seedbench_seed_all,0.5252918287937743,
+≥3,11000,textvqa_val_exact_match,0.51772,0.0067926451641216416
+≥3,12000,ai2d_exact_match,0.42487046632124353,0.008896983637113786
+≥3,12000,average,0.45334975698356383,
+≥3,12000,average_rank,3.1,
+≥3,12000,chartqa_relaxed_overall,0.5804,0.009871844677005952
+≥3,12000,docvqa_val_anls,0.5973296953454501,0.006247696301034546
+≥3,12000,infovqa_val_anls,0.23268166067961038,0.006728755322175445
+≥3,12000,mme_total_score,1161.591236494598,
+≥3,12000,mmmu_val_mmmu_acc,0.29667,
+≥3,12000,mmstar_average,0.35603672424673755,
+≥3,12000,ocrbench_ocrbench_accuracy,0.547,
+≥3,12000,seedbench_seed_all,0.5226792662590328,
+≥3,12000,textvqa_val_exact_match,0.5224799999999999,0.006785052491135311
+≥3,13000,ai2d_exact_match,0.4167746113989637,0.008873613803189363
+≥3,13000,average,0.4552955297331601,
+≥3,13000,average_rank,3.1,
+≥3,13000,chartqa_relaxed_overall,0.5792,0.009875725592704212
+≥3,13000,docvqa_val_anls,0.6043196538705875,0.006220916351545474
+≥3,13000,infovqa_val_anls,0.2321475605523485,0.006628428362101348
+≥3,13000,mme_total_score,1192.6958783513405,
+≥3,13000,mmmu_val_mmmu_acc,0.3,
+≥3,13000,mmstar_average,0.3557924720711495,
+≥3,13000,ocrbench_ocrbench_accuracy,0.556,
+≥3,13000,seedbench_seed_all,0.5218454697053919,
+≥3,13000,textvqa_val_exact_match,0.53158,0.006760864040676702
+≥3,14000,ai2d_exact_match,0.4174222797927461,0.008875573686735059
+≥3,14000,average,0.4584104359960548,
+≥3,14000,average_rank,3.3,
+≥3,14000,chartqa_relaxed_overall,0.5804,0.009871844677005952
+≥3,14000,docvqa_val_anls,0.6088163368119239,0.006242610950178105
+≥3,14000,infovqa_val_anls,0.2468904756158024,0.006928482619868768
+≥3,14000,mme_total_score,1161.6323529411766,
+≥3,14000,mmmu_val_mmmu_acc,0.29111,
+≥3,14000,mmstar_average,0.3576571997262333,
+≥3,14000,ocrbench_ocrbench_accuracy,0.556,
+≥3,14000,seedbench_seed_all,0.5277376320177877,
+≥3,14000,textvqa_val_exact_match,0.5396599999999999,0.006765661217822162
+≥3,15000,ai2d_exact_match,0.4268134715025907,0.008902228386480453
+≥3,15000,average,0.46234537077539845,
+≥3,15000,average_rank,3.4,
+≥3,15000,chartqa_relaxed_overall,0.5904,0.009837166458771298
+≥3,15000,docvqa_val_anls,0.6107165887431807,0.006220748699516272
+≥3,15000,infovqa_val_anls,0.24344524225018507,0.006901823884143617
+≥3,15000,mme_total_score,1094.0884353741496,
+≥3,15000,mmmu_val_mmmu_acc,0.31667,
+≥3,15000,mmstar_average,0.35479574154210686,
+≥3,15000,ocrbench_ocrbench_accuracy,0.542,
+≥3,15000,seedbench_seed_all,0.5291272929405225,
+≥3,15000,textvqa_val_exact_match,0.54714,0.006751257658836256
+≥3,16000,ai2d_exact_match,0.4219559585492228,0.008888852746011193
+≥3,16000,average,0.46445166086391904,
+≥3,16000,average_rank,3.2,
+≥3,16000,chartqa_relaxed_overall,0.596,0.009815912634917984
+≥3,16000,docvqa_val_anls,0.6187371260510179,0.0061854867035024364
+≥3,16000,infovqa_val_anls,0.24123922023999142,0.006805599174719394
+≥3,16000,mme_total_score,1190.470988395358,
+≥3,16000,mmmu_val_mmmu_acc,0.30889,
+≥3,16000,mmstar_average,0.34969960791558363,
+≥3,16000,ocrbench_ocrbench_accuracy,0.561,
+≥3,16000,seedbench_seed_all,0.5334630350194552,
+≥3,16000,textvqa_val_exact_match,0.54908,0.006745556405979068
+≥3,17000,ai2d_exact_match,0.42422279792746115,0.008895204147957244
+≥3,17000,average,0.46637762414930467,
+≥3,17000,average_rank,2.8,
+≥3,17000,chartqa_relaxed_overall,0.5904,0.009837166458771298
+≥3,17000,docvqa_val_anls,0.6216027833511425,0.0061927675114663225
+≥3,17000,infovqa_val_anls,0.24846818556149347,0.006941645665075416
+≥3,17000,mme_total_score,1174.2743097238895,
+≥3,17000,mmmu_val_mmmu_acc,0.30556,
+≥3,17000,mmstar_average,0.34999517846362294,
+≥3,17000,ocrbench_ocrbench_accuracy,0.574,
+≥3,17000,seedbench_seed_all,0.5374096720400222,
+≥3,17000,textvqa_val_exact_match,0.54574,0.006745181241004187
+≥3,18000,ai2d_exact_match,0.4264896373056995,0.008901364017155312
+≥3,18000,average,0.46675721155035865,
+≥3,18000,average_rank,3.0,
+≥3,18000,chartqa_relaxed_overall,0.5924,0.009829727637028773
+≥3,18000,docvqa_val_anls,0.6233196080823749,0.006192343138186881
+≥3,18000,infovqa_val_anls,0.2450489373084423,0.0069116354204653415
+≥3,18000,mme_total_score,1183.4800920368148,
+≥3,18000,mmmu_val_mmmu_acc,0.31,
+≥3,18000,mmstar_average,0.35095973404159225,
+≥3,18000,ocrbench_ocrbench_accuracy,0.565,
+≥3,18000,seedbench_seed_all,0.5385769872151195,
+≥3,18000,textvqa_val_exact_match,0.54902,0.006747511667477171
+≥3,19000,ai2d_exact_match,0.4271373056994819,0.008903088856242221
+≥3,19000,average,0.4695625710163449,
+≥3,19000,average_rank,3.2,
+≥3,19000,chartqa_relaxed_overall,0.6,0.009799919151000504
+≥3,19000,docvqa_val_anls,0.6249883761261851,0.006191311654184014
+≥3,19000,infovqa_val_anls,0.25697301832944924,0.007011498374179748
+≥3,19000,mme_total_score,1127.6929771908763,
+≥3,19000,mmmu_val_mmmu_acc,0.30556,
+≥3,19000,mmstar_average,0.354252298914167,
+≥3,19000,ocrbench_ocrbench_accuracy,0.574,
+≥3,19000,seedbench_seed_all,0.533852140077821,
+≥3,19000,textvqa_val_exact_match,0.5493,0.006748704394341216
+≥4,1000,ai2d_exact_match,0.265220207253886,0.00794536023378451
+≥4,1000,average,0.2793207606664983,
+≥4,1000,average_rank,2.8,
+≥4,1000,chartqa_relaxed_overall,0.338,0.009462463489288317
+≥4,1000,docvqa_val_anls,0.35701229672500506,0.005886294752091894
+≥4,1000,infovqa_val_anls,0.1702128015434752,0.006226832451986845
+≥4,1000,mme_total_score,1085.1492597038816,
+≥4,1000,mmmu_val_mmmu_acc,0.25556,
+≥4,1000,mmstar_average,0.20140090679073797,
+≥4,1000,ocrbench_ocrbench_accuracy,0.321,
+≥4,1000,seedbench_seed_all,0.25314063368538076,
+≥4,1000,textvqa_val_exact_match,0.35234,0.006541090280099372
+≥4,2000,ai2d_exact_match,0.265220207253886,0.007945360233784506
+≥4,2000,average,0.31328817598941605,
+≥4,2000,average_rank,2.5,
+≥4,2000,chartqa_relaxed_overall,0.3824,0.009721414421746647
+≥4,2000,docvqa_val_anls,0.4201305033634262,0.006091444830113075
+≥4,2000,infovqa_val_anls,0.19137718920242625,0.006520595312619915
+≥4,2000,mme_total_score,1086.8080232092836,
+≥4,2000,mmmu_val_mmmu_acc,0.26222,
+≥4,2000,mmstar_average,0.21611905818172608,
+≥4,2000,ocrbench_ocrbench_accuracy,0.399,
+≥4,2000,seedbench_seed_all,0.2679266259032796,
+≥4,2000,textvqa_val_exact_match,0.4152,0.00671836433098539
+≥4,3000,ai2d_exact_match,0.26845854922279794,0.007976085014471616
+≥4,3000,average,0.33960193660592997,
+≥4,3000,average_rank,3.1,
+≥4,3000,chartqa_relaxed_overall,0.4464,0.009944363838318645
+≥4,3000,docvqa_val_anls,0.4407858509014982,0.0060983906556816405
+≥4,3000,infovqa_val_anls,0.20212796288555795,0.00665038909519927
+≥4,3000,mme_total_score,1114.6237494998,
+≥4,3000,mmmu_val_mmmu_acc,0.29111,
+≥4,3000,mmstar_average,0.24413840162973033,
+≥4,3000,ocrbench_ocrbench_accuracy,0.437,
+≥4,3000,seedbench_seed_all,0.29399666481378545,
+≥4,3000,textvqa_val_exact_match,0.4324,0.006747677691683722
+≥4,4000,ai2d_exact_match,0.30569948186528495,0.008291875663892657
+≥4,4000,average,0.3669898926039935,
+≥4,4000,average_rank,3.4,
+≥4,4000,chartqa_relaxed_overall,0.4784,0.009992663174896409
+≥4,4000,docvqa_val_anls,0.48059829461468456,0.006293877686600319
+≥4,4000,infovqa_val_anls,0.21045581097075505,0.006806991633534936
+≥4,4000,mme_total_score,1055.9580832332933,
+≥4,4000,mmmu_val_mmmu_acc,0.26778,
+≥4,4000,mmstar_average,0.2427667077973349,
+≥4,4000,ocrbench_ocrbench_accuracy,0.465,
+≥4,4000,seedbench_seed_all,0.41172873818788214,
+≥4,4000,textvqa_val_exact_match,0.4404799999999999,0.0067545391167522255
+≥4,5000,ai2d_exact_match,0.33678756476683935,0.008506208807020249
+≥4,5000,average,0.39120653632259295,
+≥4,5000,average_rank,3.6,
+≥4,5000,chartqa_relaxed_overall,0.49,0.01
+≥4,5000,docvqa_val_anls,0.5128995224283958,0.0062990418586802415
+≥4,5000,infovqa_val_anls,0.2258500728355162,0.006992798311741374
+≥4,5000,mme_total_score,1159.296218487395,
+≥4,5000,mmmu_val_mmmu_acc,0.27889,
+≥4,5000,mmstar_average,0.27755131111938897,
+≥4,5000,ocrbench_ocrbench_accuracy,0.485,
+≥4,5000,seedbench_seed_all,0.4526403557531962,
+≥4,5000,textvqa_val_exact_match,0.46124000000000004,0.006800410399109693
+≥4,6000,ai2d_exact_match,0.3555699481865285,0.008615532040064745
+≥4,6000,average,0.4066720136664597,
+≥4,6000,average_rank,3.8,
+≥4,6000,chartqa_relaxed_overall,0.5164,0.009996618876179197
+≥4,6000,docvqa_val_anls,0.5222903438245158,0.006294204987152361
+≥4,6000,infovqa_val_anls,0.2154967392860663,0.006755296318500426
+≥4,6000,mme_total_score,1106.7833133253303,
+≥4,6000,mmmu_val_mmmu_acc,0.28444,
+≥4,6000,mmstar_average,0.3133409749695097,
+≥4,6000,ocrbench_ocrbench_accuracy,0.497,
+≥4,6000,seedbench_seed_all,0.47821011673151753,
+≥4,6000,textvqa_val_exact_match,0.4773,0.006792419995397721
+≥4,7000,ai2d_exact_match,0.36139896373056996,0.00864649204396549
+≥4,7000,average,0.4103350463820299,
+≥4,7000,average_rank,4.0,
+≥4,7000,chartqa_relaxed_overall,0.5172,0.009996080864671974
+≥4,7000,docvqa_val_anls,0.5257230333803197,0.006272237880714654
+≥4,7000,infovqa_val_anls,0.23099276568068242,0.006959690609260986
+≥4,7000,mme_total_score,1121.3700480192078,
+≥4,7000,mmmu_val_mmmu_acc,0.28444,
+≥4,7000,mmstar_average,0.2974223555916661,
+≥4,7000,ocrbench_ocrbench_accuracy,0.508,
+≥4,7000,seedbench_seed_all,0.48893829905503056,
+≥4,7000,textvqa_val_exact_match,0.4789,0.006778594830020003
+≥4,8000,ai2d_exact_match,0.3753238341968912,0.008714896333400902
+≥4,8000,average,0.4184757440883718,
+≥4,8000,average_rank,4.4,
+≥4,8000,chartqa_relaxed_overall,0.546,0.009959582185560013
+≥4,8000,docvqa_val_anls,0.5520377547382946,0.006292207241711801
+≥4,8000,infovqa_val_anls,0.21912642797830967,0.006806034933740873
+≥4,8000,mme_total_score,1115.3978591436573,
+≥4,8000,mmmu_val_mmmu_acc,0.27222,
+≥4,8000,mmstar_average,0.30256719850330677,
+≥4,8000,ocrbench_ocrbench_accuracy,0.501,
+≥4,8000,seedbench_seed_all,0.49966648137854364,
+≥4,8000,textvqa_val_exact_match,0.49834000000000006,0.006795725513663086
+≥4,9000,ai2d_exact_match,0.37240932642487046,0.008701221016094279
+≥4,9000,average,0.42372729918985835,
+≥4,9000,average_rank,4.1,
+≥4,9000,chartqa_relaxed_overall,0.5412,0.009967987174315731
+≥4,9000,docvqa_val_anls,0.5552693199281612,0.0063045997581429696
+≥4,9000,infovqa_val_anls,0.24796577925731056,0.007300649633229033
+≥4,9000,mme_total_score,1167.27931172469,
+≥4,9000,mmmu_val_mmmu_acc,0.27444,
+≥4,9000,mmstar_average,0.3137012670983827,
+≥4,9000,ocrbench_ocrbench_accuracy,0.51,
+≥4,9000,seedbench_seed_all,0.5,
+≥4,9000,textvqa_val_exact_match,0.49855999999999995,0.0067982960104521145
+≥4,10000,ai2d_exact_match,0.40025906735751293,0.008818284784223729
+≥4,10000,average,0.43549956089714104,
+≥4,10000,average_rank,3.2,
+≥4,10000,chartqa_relaxed_overall,0.554,0.009943497838271193
+≥4,10000,docvqa_val_anls,0.5744988238115305,0.006278177576359751
+≥4,10000,infovqa_val_anls,0.2383657067942184,0.0070367443431813784
+≥4,10000,mme_total_score,1164.3313325330132,
+≥4,10000,mmmu_val_mmmu_acc,0.28889,
+≥4,10000,mmstar_average,0.32102500708710546,
+≥4,10000,ocrbench_ocrbench_accuracy,0.523,
+≥4,10000,seedbench_seed_all,0.5153974430239021,
+≥4,10000,textvqa_val_exact_match,0.5040600000000001,0.006783679454498736
+≥4,11000,ai2d_exact_match,0.38795336787564766,0.00877028496444078
+≥4,11000,average,0.44176056966808264,
+≥4,11000,average_rank,3.4,
+≥4,11000,chartqa_relaxed_overall,0.5736,0.009893046292521752
+≥4,11000,docvqa_val_anls,0.5809715962307931,0.006265890767561776
+≥4,11000,infovqa_val_anls,0.23536745207392892,0.006978481065319724
+≥4,11000,mme_total_score,1179.1470588235293,
+≥4,11000,mmmu_val_mmmu_acc,0.29,
+≥4,11000,mmstar_average,0.3247706096650586,
+≥4,11000,ocrbench_ocrbench_accuracy,0.551,
+≥4,11000,seedbench_seed_all,0.5077821011673151,
+≥4,11000,textvqa_val_exact_match,0.5244000000000001,0.006777175962213506
+≥4,12000,ai2d_exact_match,0.4051165803108808,0.008835632146152574
+≥4,12000,average,0.45346905961254286,
+≥4,12000,average_rank,2.7,
+≥4,12000,chartqa_relaxed_overall,0.566,0.00991448025705367
+≥4,12000,docvqa_val_anls,0.5921021144001818,0.006234512965080029
+≥4,12000,infovqa_val_anls,0.2538480181050717,0.007182280029496359
+≥4,12000,mme_total_score,1209.4413765506204,
+≥4,12000,mmmu_val_mmmu_acc,0.3,
+≥4,12000,mmstar_average,0.35097126616478974,
+≥4,12000,ocrbench_ocrbench_accuracy,0.564,
+≥4,12000,seedbench_seed_all,0.5264035575319622,
+≥4,12000,textvqa_val_exact_match,0.52278,0.006771483213963052
+≥4,13000,ai2d_exact_match,0.41321243523316065,0.00886255263438398
+≥4,13000,average,0.4517297858459223,
+≥4,13000,average_rank,3.7,
+≥4,13000,chartqa_relaxed_overall,0.5612,0.009926794069396146
+≥4,13000,docvqa_val_anls,0.5990624567313981,0.006220638069941083
+≥4,13000,infovqa_val_anls,0.25690374869936067,0.007271041982190199
+≥4,13000,mme_total_score,1155.3058223289315,
+≥4,13000,mmmu_val_mmmu_acc,0.28556,
+≥4,13000,mmstar_average,0.34272201671869795,
+≥4,13000,ocrbench_ocrbench_accuracy,0.558,
+≥4,13000,seedbench_seed_all,0.5253474152306837,
+≥4,13000,textvqa_val_exact_match,0.52356,0.006774582221277897
+≥4,14000,ai2d_exact_match,0.42033678756476683,0.008884198538329086
+≥4,14000,average,0.4566555050830236,
+≥4,14000,average_rank,2.9,
+≥4,14000,chartqa_relaxed_overall,0.5804,0.009871844677005952
+≥4,14000,docvqa_val_anls,0.5986851751311028,0.006220005268131575
+≥4,14000,infovqa_val_anls,0.26286587749144424,0.007367704534540481
+≥4,14000,mme_total_score,1171.1798719487795,
+≥4,14000,mmmu_val_mmmu_acc,0.29889,
+≥4,14000,mmstar_average,0.34666697515411715,
+≥4,14000,ocrbench_ocrbench_accuracy,0.561,
+≥4,14000,seedbench_seed_all,0.526514730405781,
+≥4,14000,textvqa_val_exact_match,0.51454,0.006775138240629185
+≥4,15000,ai2d_exact_match,0.4167746113989637,0.008873613803189377
+≥4,15000,average,0.4535696515209742,
+≥4,15000,average_rank,4.0,
+≥4,15000,chartqa_relaxed_overall,0.5788,0.009877005927832552
+≥4,15000,docvqa_val_anls,0.6057110103452333,0.006218006894177394
+≥4,15000,infovqa_val_anls,0.24773239789580184,0.007161205474173786
+≥4,15000,mme_total_score,1166.4441776710685,
+≥4,15000,mmmu_val_mmmu_acc,0.28333,
+≥4,15000,mmstar_average,0.3337998223700587,
+≥4,15000,ocrbench_ocrbench_accuracy,0.552,
+≥4,15000,seedbench_seed_all,0.5302390216787104,
+≥4,15000,textvqa_val_exact_match,0.53374,0.006755401510636802
+≥4,16000,ai2d_exact_match,0.4180699481865285,0.008877517831066049
+≥4,16000,average,0.45867291897271806,
+≥4,16000,average_rank,3.3,
+≥4,16000,chartqa_relaxed_overall,0.5772,0.009882060820012199
+≥4,16000,docvqa_val_anls,0.6103972706562478,0.0062217369721592605
+≥4,16000,infovqa_val_anls,0.2588879789412925,0.007244086191292801
+≥4,16000,mme_total_score,1221.7044817927172,
+≥4,16000,mmmu_val_mmmu_acc,0.28444,
+≥4,16000,mmstar_average,0.3519979156607769,
+≥4,16000,ocrbench_ocrbench_accuracy,0.566,
+≥4,16000,seedbench_seed_all,0.5296831573096165,
+≥4,16000,textvqa_val_exact_match,0.5313800000000001,0.006770452490445899
+≥4,17000,ai2d_exact_match,0.42033678756476683,0.008884198538329094
+≥4,17000,average,0.45851865563425853,
+≥4,17000,average_rank,4.0,
+≥4,17000,chartqa_relaxed_overall,0.5852,0.009855721084488851
+≥4,17000,docvqa_val_anls,0.6114594059922879,0.00618088682319735
+≥4,17000,infovqa_val_anls,0.2562207626214487,0.007200492820461838
+≥4,17000,mme_total_score,1153.1200480192078,
+≥4,17000,mmmu_val_mmmu_acc,0.28667,
+≥4,17000,mmstar_average,0.3472831568700119,
+≥4,17000,ocrbench_ocrbench_accuracy,0.559,
+≥4,17000,seedbench_seed_all,0.532017787659811,
+≥4,17000,textvqa_val_exact_match,0.5284800000000001,0.006763351721247465
+≥4,18000,ai2d_exact_match,0.41483160621761656,0.008867639612484157
+≥4,18000,average,0.4608796157707856,
+≥4,18000,average_rank,3.6,
+≥4,18000,chartqa_relaxed_overall,0.5908,0.009835692163550793
+≥4,18000,docvqa_val_anls,0.6149471910441747,0.006205844131422159
+≥4,18000,infovqa_val_anls,0.25757413998054934,0.007209471673907614
+≥4,18000,mme_total_score,1208.4107643057223,
+≥4,18000,mmmu_val_mmmu_acc,0.28333,
+≥4,18000,mmstar_average,0.34716850185982184,
+≥4,18000,ocrbench_ocrbench_accuracy,0.571,
+≥4,18000,seedbench_seed_all,0.5331851028349083,
+≥4,18000,textvqa_val_exact_match,0.53508,0.006764601295430295
+≥4,19000,ai2d_exact_match,0.4261658031088083,0.008900495747130163
+≥4,19000,average,0.4632638192773651,
+≥4,19000,average_rank,3.8,
+≥4,19000,chartqa_relaxed_overall,0.59,0.009838634025503496
+≥4,19000,docvqa_val_anls,0.6191514566027063,0.006184609999615919
+≥4,19000,infovqa_val_anls,0.254771484204865,0.007169865220056487
+≥4,19000,mme_total_score,1155.6170468187274,
+≥4,19000,mmmu_val_mmmu_acc,0.28667,
+≥4,19000,mmstar_average,0.3639228113475546,
+≥4,19000,ocrbench_ocrbench_accuracy,0.559,
+≥4,19000,seedbench_seed_all,0.5310728182323513,
+≥4,19000,textvqa_val_exact_match,0.5386200000000001,0.0067549703955686585
+≥5,1000,ai2d_exact_match,0.25971502590673573,0.007891865786132416
+≥5,1000,average,0.26157782335380847,
+≥5,1000,average_rank,4.4,
+≥5,1000,chartqa_relaxed_overall,0.2932,0.009106408439657643
+≥5,1000,docvqa_val_anls,0.33552834229551537,0.005637187546463478
+≥5,1000,infovqa_val_anls,0.14237853126797878,0.005593365396144926
+≥5,1000,mme_total_score,968.6369547819128,
+≥5,1000,mmmu_val_mmmu_acc,0.24556,
+≥5,1000,mmstar_average,0.22142932783466918,
+≥5,1000,ocrbench_ocrbench_accuracy,0.312,
+≥5,1000,seedbench_seed_all,0.2525291828793774,
+≥5,1000,textvqa_val_exact_match,0.29186,0.006221486113292513
+≥5,2000,ai2d_exact_match,0.25809585492227977,0.007875825748825005
+≥5,2000,average,0.3027645397000401,
+≥5,2000,average_rank,3.7,
+≥5,2000,chartqa_relaxed_overall,0.3408,0.009481461028833927
+≥5,2000,docvqa_val_anls,0.4068080236104559,0.00608411485086675
+≥5,2000,infovqa_val_anls,0.16040581942520143,0.005930443116029954
+≥5,2000,mme_total_score,1068.6808723489396,
+≥5,2000,mmmu_val_mmmu_acc,0.24556,
+≥5,2000,mmstar_average,0.2341315706820568,
+≥5,2000,ocrbench_ocrbench_accuracy,0.416,
+≥5,2000,seedbench_seed_all,0.26725958866036686,
+≥5,2000,textvqa_val_exact_match,0.39582,0.006679538065297116
+≥5,3000,ai2d_exact_match,0.2616580310880829,0.007910929195141643
+≥5,3000,average,0.33060384994476777,
+≥5,3000,average_rank,3.6,
+≥5,3000,chartqa_relaxed_overall,0.3832,0.009725273074549106
+≥5,3000,docvqa_val_anls,0.44828350858716837,0.006213256822478988
+≥5,3000,infovqa_val_anls,0.18913153904026223,0.0065079547987124675
+≥5,3000,mme_total_score,1154.7704081632653,
+≥5,3000,mmmu_val_mmmu_acc,0.26667,
+≥5,3000,mmstar_average,0.25122915833603465,
+≥5,3000,ocrbench_ocrbench_accuracy,0.433,
+≥5,3000,seedbench_seed_all,0.31634241245136185,
+≥5,3000,textvqa_val_exact_match,0.42591999999999997,0.006732985767062625
+≥5,4000,ai2d_exact_match,0.30926165803108807,0.008318624237265801
+≥5,4000,average,0.3627994699316965,
+≥5,4000,average_rank,3.0,
+≥5,4000,chartqa_relaxed_overall,0.4268,0.009894233792716745
+≥5,4000,docvqa_val_anls,0.46900891865422867,0.006307491163968653
+≥5,4000,infovqa_val_anls,0.1867790227760032,0.006445564559285368
+≥5,4000,mme_total_score,1174.8784513805522,
+≥5,4000,mmmu_val_mmmu_acc,0.28333,
+≥5,4000,mmstar_average,0.2869697933480728,
+≥5,4000,ocrbench_ocrbench_accuracy,0.454,
+≥5,4000,seedbench_seed_all,0.41050583657587547,
+≥5,4000,textvqa_val_exact_match,0.43854,0.0067614552527859775
+≥5,5000,ai2d_exact_match,0.34229274611398963,0.008539783270456082
+≥5,5000,average,0.382476547981286,
+≥5,5000,average_rank,3.0,
+≥5,5000,chartqa_relaxed_overall,0.4572,0.00996528909739792
+≥5,5000,docvqa_val_anls,0.4793408867564976,0.006287275417010131
+≥5,5000,infovqa_val_anls,0.1793224766992334,0.006318821081615601
+≥5,5000,mme_total_score,1266.0171068427371,
+≥5,5000,mmmu_val_mmmu_acc,0.28333,
+≥5,5000,mmstar_average,0.2965278361584629,
+≥5,5000,ocrbench_ocrbench_accuracy,0.496,
+≥5,5000,seedbench_seed_all,0.45497498610339077,
+≥5,5000,textvqa_val_exact_match,0.4532999999999999,0.006785206688521816
+≥5,6000,ai2d_exact_match,0.3633419689119171,0.008656504892172956
+≥5,6000,average,0.3927428387872692,
+≥5,6000,average_rank,3.3,
+≥5,6000,chartqa_relaxed_overall,0.4496,0.009951057502505313
+≥5,6000,docvqa_val_anls,0.4903665904735212,0.0062875635436497905
+≥5,6000,infovqa_val_anls,0.19061425722983663,0.006415658242163764
+≥5,6000,mme_total_score,1291.9338735494198,
+≥5,6000,mmmu_val_mmmu_acc,0.29333,
+≥5,6000,mmstar_average,0.31955832446570054,
+≥5,6000,ocrbench_ocrbench_accuracy,0.486,
+≥5,6000,seedbench_seed_all,0.4819344080044469,
+≥5,6000,textvqa_val_exact_match,0.45993999999999996,0.006786622940033285
+≥5,7000,ai2d_exact_match,0.34650259067357514,0.00856459563872305
+≥5,7000,average,0.4018210435787178,
+≥5,7000,average_rank,3.7,
+≥5,7000,chartqa_relaxed_overall,0.4672,0.009980456292330589
+≥5,7000,docvqa_val_anls,0.5111773927622861,0.006315277614665012
+≥5,7000,infovqa_val_anls,0.1940751656275074,0.006487614451910199
+≥5,7000,mme_total_score,1190.4136654661866,
+≥5,7000,mmmu_val_mmmu_acc,0.30333,
+≥5,7000,mmstar_average,0.3269684121278599,
+≥5,7000,ocrbench_ocrbench_accuracy,0.495,
+≥5,7000,seedbench_seed_all,0.4924958310172318,
+≥5,7000,textvqa_val_exact_match,0.47963999999999996,0.006798760086055511
+≥5,8000,ai2d_exact_match,0.37694300518134716,0.008722348153640555
+≥5,8000,average,0.4095380208793758,
+≥5,8000,average_rank,3.9,
+≥5,8000,chartqa_relaxed_overall,0.482,0.009995517202509246
+≥5,8000,docvqa_val_anls,0.525391623371338,0.0063183015218023705
+≥5,8000,infovqa_val_anls,0.19303661546973522,0.0064531126694776005
+≥5,8000,mme_total_score,1202.8684473789515,
+≥5,8000,mmmu_val_mmmu_acc,0.29556,
+≥5,8000,mmstar_average,0.31571686940613636,
+≥5,8000,ocrbench_ocrbench_accuracy,0.51,
+≥5,8000,seedbench_seed_all,0.5013340744858255,
+≥5,8000,textvqa_val_exact_match,0.48586,0.006796708845479998
+≥5,9000,ai2d_exact_match,0.36755181347150256,0.008677676304542971
+≥5,9000,average,0.4152301884271321,
+≥5,9000,average_rank,3.6,
+≥5,9000,chartqa_relaxed_overall,0.48,0.009993995796516643
+≥5,9000,docvqa_val_anls,0.5348533191278194,0.006304701269106288
+≥5,9000,infovqa_val_anls,0.19959306843788593,0.006473992753624113
+≥5,9000,mme_total_score,1204.311424569828,
+≥5,9000,mmmu_val_mmmu_acc,0.30111,
+≥5,9000,mmstar_average,0.33670210514605864,
+≥5,9000,ocrbench_ocrbench_accuracy,0.526,
+≥5,9000,seedbench_seed_all,0.5025013896609227,
+≥5,9000,textvqa_val_exact_match,0.48876,0.006790814053639094
+≥5,10000,ai2d_exact_match,0.37338082901554404,0.008705816961084262
+≥5,10000,average,0.4147710702725824,
+≥5,10000,average_rank,4.3,
+≥5,10000,chartqa_relaxed_overall,0.5004,0.010001997399559365
+≥5,10000,docvqa_val_anls,0.5453333593332199,0.006307378137253011
+≥5,10000,infovqa_val_anls,0.19201908600396586,0.006324501041207469
+≥5,10000,mme_total_score,1201.624049619848,
+≥5,10000,mmmu_val_mmmu_acc,0.28778,
+≥5,10000,mmstar_average,0.3212133619915626,
+≥5,10000,ocrbench_ocrbench_accuracy,0.518,
+≥5,10000,seedbench_seed_all,0.5073929961089494,
+≥5,10000,textvqa_val_exact_match,0.48741999999999996,0.006796262690428575
+≥5,11000,ai2d_exact_match,0.39216321243523317,0.008787363693921278
+≥5,11000,average,0.4233003259407862,
+≥5,11000,average_rank,4.1,
+≥5,11000,chartqa_relaxed_overall,0.4996,0.010001997399559365
+≥5,11000,docvqa_val_anls,0.551420413629631,0.006320257790796602
+≥5,11000,infovqa_val_anls,0.210676410854829,0.006763210440361733
+≥5,11000,mme_total_score,1205.9969987995198,
+≥5,11000,mmmu_val_mmmu_acc,0.28444,
+≥5,11000,mmstar_average,0.3293872434067487,
+≥5,11000,ocrbench_ocrbench_accuracy,0.529,
+≥5,11000,seedbench_seed_all,0.5161756531406337,
+≥5,11000,textvqa_val_exact_match,0.49684000000000006,0.0068038269593118286
| 1058 |
+
≥5,12000,ai2d_exact_match,0.3866580310880829,0.008764891499284331
|
| 1059 |
+
≥5,12000,average,0.42915630067456684,
|
| 1060 |
+
≥5,12000,average_rank,4.2,
|
| 1061 |
+
≥5,12000,chartqa_relaxed_overall,0.5208,0.00999334232158103
|
| 1062 |
+
≥5,12000,docvqa_val_anls,0.5651676550208474,0.006302610383880636
|
| 1063 |
+
≥5,12000,infovqa_val_anls,0.2027930391809884,0.006544451575065131
|
| 1064 |
+
≥5,12000,mme_total_score,1229.9349739895958,
|
| 1065 |
+
≥5,12000,mmmu_val_mmmu_acc,0.28778,
|
| 1066 |
+
≥5,12000,mmstar_average,0.32392609084232776,
|
| 1067 |
+
≥5,12000,ocrbench_ocrbench_accuracy,0.543,
|
| 1068 |
+
≥5,12000,seedbench_seed_all,0.523401889938855,
|
| 1069 |
+
≥5,12000,textvqa_val_exact_match,0.50888,0.006783556032531116
|
| 1070 |
+
≥5,13000,ai2d_exact_match,0.3960492227979275,0.008802520399129762
|
| 1071 |
+
≥5,13000,average,0.42835710337207544,
|
| 1072 |
+
≥5,13000,average_rank,4.2,
|
| 1073 |
+
≥5,13000,chartqa_relaxed_overall,0.5016,0.010001949389825897
|
| 1074 |
+
≥5,13000,docvqa_val_anls,0.5709600314067668,0.006314249102846677
|
| 1075 |
+
≥5,13000,infovqa_val_anls,0.20954434018332707,0.006654090452675221
|
| 1076 |
+
≥5,13000,mme_total_score,1299.5349139655862,
|
| 1077 |
+
≥5,13000,mmmu_val_mmmu_acc,0.28889,
|
| 1078 |
+
≥5,13000,mmstar_average,0.3229485683119639,
|
| 1079 |
+
≥5,13000,ocrbench_ocrbench_accuracy,0.531,
|
| 1080 |
+
≥5,13000,seedbench_seed_all,0.5271817676486937,
|
| 1081 |
+
≥5,13000,textvqa_val_exact_match,0.50704,0.0067891167394013964
|
| 1082 |
+
≥5,14000,ai2d_exact_match,0.39993523316062174,0.00881709625708285
|
| 1083 |
+
≥5,14000,average,0.4331521956786839,
|
| 1084 |
+
≥5,14000,average_rank,4.4,
|
| 1085 |
+
≥5,14000,chartqa_relaxed_overall,0.5184,0.009995225751083666
|
| 1086 |
+
≥5,14000,docvqa_val_anls,0.5724182420273719,0.006294497356115864
|
| 1087 |
+
≥5,14000,infovqa_val_anls,0.20486077238494155,0.0066055382910337555
|
| 1088 |
+
≥5,14000,mme_total_score,1249.124649859944,
|
| 1089 |
+
≥5,14000,mmmu_val_mmmu_acc,0.29778,
|
| 1090 |
+
≥5,14000,mmstar_average,0.3186280983045363,
|
| 1091 |
+
≥5,14000,ocrbench_ocrbench_accuracy,0.545,
|
| 1092 |
+
≥5,14000,seedbench_seed_all,0.5253474152306837,
|
| 1093 |
+
≥5,14000,textvqa_val_exact_match,0.516,0.006773328250950121
|
| 1094 |
+
≥5,15000,ai2d_exact_match,0.4015544041450777,0.008822998789014788
|
| 1095 |
+
≥5,15000,average,0.4411473449525349,
|
| 1096 |
+
≥5,15000,average_rank,3.9,
|
| 1097 |
+
≥5,15000,chartqa_relaxed_overall,0.5284,0.009985853138573692
|
| 1098 |
+
≥5,15000,docvqa_val_anls,0.579157353735036,0.0062656254347961005
|
| 1099 |
+
≥5,15000,infovqa_val_anls,0.21477765510878127,0.0066658509229973765
|
| 1100 |
+
≥5,15000,mme_total_score,1272.857042817127,
|
| 1101 |
+
≥5,15000,mmmu_val_mmmu_acc,0.29778,
|
| 1102 |
+
≥5,15000,mmstar_average,0.32345739197302437,
|
| 1103 |
+
≥5,15000,ocrbench_ocrbench_accuracy,0.574,
|
| 1104 |
+
≥5,15000,seedbench_seed_all,0.5307392996108949,
|
| 1105 |
+
≥5,15000,textvqa_val_exact_match,0.5204599999999999,0.006785535084079623
|
| 1106 |
+
≥5,16000,ai2d_exact_match,0.405440414507772,0.008836756671878079
|
| 1107 |
+
≥5,16000,average,0.43998232270106674,
|
| 1108 |
+
≥5,16000,average_rank,4.6,
|
| 1109 |
+
≥5,16000,chartqa_relaxed_overall,0.5352,0.009977184055667825
|
| 1110 |
+
≥5,16000,docvqa_val_anls,0.5773546915859126,0.006282997503331479
|
| 1111 |
+
≥5,16000,infovqa_val_anls,0.21908996623791824,0.006795378209171745
|
| 1112 |
+
≥5,16000,mme_total_score,1216.8606442577031,
|
| 1113 |
+
≥5,16000,mmmu_val_mmmu_acc,0.29444,
|
| 1114 |
+
≥5,16000,mmstar_average,0.32497171858166657,
|
| 1115 |
+
≥5,16000,ocrbench_ocrbench_accuracy,0.554,
|
| 1116 |
+
≥5,16000,seedbench_seed_all,0.5274041133963313,
|
| 1117 |
+
≥5,16000,textvqa_val_exact_match,0.52194,0.006778238427735974
|
| 1118 |
+
≥5,17000,ai2d_exact_match,0.40479274611398963,0.008834503632021165
|
| 1119 |
+
≥5,17000,average,0.44388314606679896,
|
| 1120 |
+
≥5,17000,average_rank,4.3,
|
| 1121 |
+
≥5,17000,chartqa_relaxed_overall,0.5364,0.009975460887997665
|
| 1122 |
+
≥5,17000,docvqa_val_anls,0.5822797934751336,0.00626122992985784
|
| 1123 |
+
≥5,17000,infovqa_val_anls,0.21556119368142682,0.0067540438882146454
|
| 1124 |
+
≥5,17000,mme_total_score,1221.2239895958382,
|
| 1125 |
+
≥5,17000,mmmu_val_mmmu_acc,0.29778,
|
| 1126 |
+
≥5,17000,mmstar_average,0.33263595987427624,
|
| 1127 |
+
≥5,17000,ocrbench_ocrbench_accuracy,0.564,
|
| 1128 |
+
≥5,17000,seedbench_seed_all,0.5335186214563646,
|
| 1129 |
+
≥5,17000,textvqa_val_exact_match,0.52798,0.006783526160149534
|
| 1130 |
+
≥5,18000,ai2d_exact_match,0.4073834196891192,0.008843420154535592
|
| 1131 |
+
≥5,18000,average,0.44456504203931085,
|
| 1132 |
+
≥5,18000,average_rank,4.3,
|
| 1133 |
+
≥5,18000,chartqa_relaxed_overall,0.542,0.009966651075133582
|
| 1134 |
+
≥5,18000,docvqa_val_anls,0.5939080403347998,0.006256065698832867
|
| 1135 |
+
≥5,18000,infovqa_val_anls,0.217668975074557,0.006723025382951482
|
| 1136 |
+
≥5,18000,mme_total_score,1263.6669667867147,
|
| 1137 |
+
≥5,18000,mmmu_val_mmmu_acc,0.28333,
|
| 1138 |
+
≥5,18000,mmstar_average,0.3216128643225813,
|
| 1139 |
+
≥5,18000,ocrbench_ocrbench_accuracy,0.568,
|
| 1140 |
+
≥5,18000,seedbench_seed_all,0.5357420789327404,
|
| 1141 |
+
≥5,18000,textvqa_val_exact_match,0.5314399999999999,0.006770308168358284
|
| 1142 |
+
≥5,19000,ai2d_exact_match,0.4060880829015544,0.008838993764195596
|
| 1143 |
+
≥5,19000,average,0.44569235541726965,
|
| 1144 |
+
≥5,19000,average_rank,4.1,
|
| 1145 |
+
≥5,19000,chartqa_relaxed_overall,0.5384,0.009972459876198698
|
| 1146 |
+
≥5,19000,docvqa_val_anls,0.5872765253291726,0.006247572686109655
|
| 1147 |
+
≥5,19000,infovqa_val_anls,0.22290098871841885,0.006768484859310975
|
| 1148 |
+
≥5,19000,mme_total_score,1243.9738895558223,
|
| 1149 |
+
≥5,19000,mmmu_val_mmmu_acc,0.29222,
|
| 1150 |
+
≥5,19000,mmstar_average,0.3312170414949971,
|
| 1151 |
+
≥5,19000,ocrbench_ocrbench_accuracy,0.569,
|
| 1152 |
+
≥5,19000,seedbench_seed_all,0.535408560311284,
|
| 1153 |
+
≥5,19000,textvqa_val_exact_match,0.52872,0.006772725173905718
|
| 1154 |
+
≥5,20000,ai2d_exact_match,0.40867875647668395,0.00884778289870743
|
| 1155 |
+
≥5,20000,average,0.4447757248308666,
|
| 1156 |
+
≥5,20000,average_rank,1.9,
|
| 1157 |
+
≥5,20000,chartqa_relaxed_overall,0.5368,0.009974873595254053
|
| 1158 |
+
≥5,20000,docvqa_val_anls,0.5881395593641573,0.00625433143624698
|
| 1159 |
+
≥5,20000,infovqa_val_anls,0.21756373662547837,0.006798638807266341
|
| 1160 |
+
≥5,20000,mme_total_score,1235.672769107643,
|
| 1161 |
+
≥5,20000,mmmu_val_mmmu_acc,0.28667,
|
| 1162 |
+
≥5,20000,mmstar_average,0.32944615805983984,
|
| 1163 |
+
≥5,20000,ocrbench_ocrbench_accuracy,0.57,
|
| 1164 |
+
≥5,20000,seedbench_seed_all,0.5339633129516398,
|
| 1165 |
+
≥5,20000,textvqa_val_exact_match,0.53172,0.006760466633437396
|
app/src/content/embeds/against-baselines-deduplicated.html
ADDED
|
@@ -0,0 +1,576 @@
|
| 1 |
+
<div class="d3-line" style="width:100%;margin:10px 0;"></div>
|
| 2 |
+
<style>
|
| 3 |
+
.d3-line .d3-line__controls select {
|
| 4 |
+
font-size: 12px;
|
| 5 |
+
padding: 8px 28px 8px 10px;
|
| 6 |
+
border: 1px solid var(--border-color);
|
| 7 |
+
border-radius: 8px;
|
| 8 |
+
background-color: var(--surface-bg);
|
| 9 |
+
color: var(--text-color);
|
| 10 |
+
background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='12' height='12' viewBox='0 0 24 24' fill='none' stroke='%230f1115' stroke-width='2' stroke-linecap='round' stroke-linejoin='round'%3E%3Cpolyline points='6 9 12 15 18 9'/%3E%3C/svg%3E");
|
| 11 |
+
background-repeat: no-repeat;
|
| 12 |
+
background-position: right 8px center;
|
| 13 |
+
background-size: 12px;
|
| 14 |
+
-webkit-appearance: none;
|
| 15 |
+
-moz-appearance: none;
|
| 16 |
+
appearance: none;
|
| 17 |
+
cursor: pointer;
|
| 18 |
+
transition: border-color .15s ease, box-shadow .15s ease;
|
| 19 |
+
}
|
| 20 |
+
[data-theme="dark"] .d3-line .d3-line__controls select {
|
| 21 |
+
background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='12' height='12' viewBox='0 0 24 24' fill='none' stroke='%23ffffff' stroke-width='2' stroke-linecap='round' stroke-linejoin='round'%3E%3Cpolyline points='6 9 12 15 18 9'/%3E%3C/svg%3E");
|
| 22 |
+
}
|
| 23 |
+
.d3-line .d3-line__controls select:hover {
|
| 24 |
+
border-color: var(--primary-color);
|
| 25 |
+
}
|
| 26 |
+
.d3-line .d3-line__controls select:focus {
|
| 27 |
+
border-color: var(--primary-color);
|
| 28 |
+
box-shadow: 0 0 0 3px rgba(232,137,171,.25);
|
| 29 |
+
outline: none;
|
| 30 |
+
}
|
| 31 |
+
.d3-line .d3-line__controls label { gap: 8px; }
|
| 32 |
+
|
| 33 |
+
/* Range slider themed with --primary-color */
|
| 34 |
+
.d3-line .d3-line__controls input[type="range"] {
|
| 35 |
+
-webkit-appearance: none;
|
| 36 |
+
appearance: none;
|
| 37 |
+
width: 100%;
|
| 38 |
+
height: 6px;
|
| 39 |
+
border-radius: 999px;
|
| 40 |
+
background: var(--border-color);
|
| 41 |
+
outline: none;
|
| 42 |
+
}
|
| 43 |
+
.d3-line .d3-line__controls input[type="range"]::-webkit-slider-runnable-track {
|
| 44 |
+
height: 6px;
|
| 45 |
+
background: transparent;
|
| 46 |
+
border-radius: 999px;
|
| 47 |
+
}
|
| 48 |
+
.d3-line .d3-line__controls input[type="range"]::-webkit-slider-thumb {
|
| 49 |
+
-webkit-appearance: none;
|
| 50 |
+
appearance: none;
|
| 51 |
+
width: 16px;
|
| 52 |
+
height: 16px;
|
| 53 |
+
border-radius: 50%;
|
| 54 |
+
background: var(--primary-color);
|
| 55 |
+
border: 2px solid var(--on-primary);
|
| 56 |
+
margin-top: -5px;
|
| 57 |
+
cursor: pointer;
|
| 58 |
+
}
|
| 59 |
+
.d3-line .d3-line__controls input[type="range"]::-moz-range-track {
|
| 60 |
+
height: 6px;
|
| 61 |
+
background: transparent;
|
| 62 |
+
border-radius: 999px;
|
| 63 |
+
}
|
| 64 |
+
.d3-line .d3-line__controls input[type="range"]::-moz-range-thumb {
|
| 65 |
+
width: 16px;
|
| 66 |
+
height: 16px;
|
| 67 |
+
border-radius: 50%;
|
| 68 |
+
background: var(--primary-color);
|
| 69 |
+
border: 2px solid var(--on-primary);
|
| 70 |
+
cursor: pointer;
|
| 71 |
+
}
|
| 72 |
+
/* Improved line color via CSS */
|
| 73 |
+
.d3-line .lines path.improved { stroke: var(--primary-color); }
|
| 74 |
+
</style>
|
| 75 |
+
<script>
|
| 76 |
+
(() => {
|
| 77 |
+
const ensureD3 = (cb) => {
|
| 78 |
+
if (window.d3 && typeof window.d3.select === 'function') return cb();
|
| 79 |
+
let s = document.getElementById('d3-cdn-script');
|
| 80 |
+
if (!s) {
|
| 81 |
+
s = document.createElement('script');
|
| 82 |
+
s.id = 'd3-cdn-script';
|
| 83 |
+
s.src = 'https://cdn.jsdelivr.net/npm/d3@7/dist/d3.min.js';
|
| 84 |
+
document.head.appendChild(s);
|
| 85 |
+
}
|
| 86 |
+
const onReady = () => { if (window.d3 && typeof window.d3.select === 'function') cb(); };
|
| 87 |
+
s.addEventListener('load', onReady, { once: true });
|
| 88 |
+
if (window.d3) onReady();
|
| 89 |
+
};
|
| 90 |
+
|
| 91 |
+
const bootstrap = () => {
|
| 92 |
+
const mount = document.currentScript ? document.currentScript.previousElementSibling : null;
|
| 93 |
+
const container = (mount && mount.querySelector && mount.querySelector('.d3-line')) || document.querySelector('.d3-line');
|
| 94 |
+
if (!container) return;
|
| 95 |
+
if (container.dataset) {
|
| 96 |
+
if (container.dataset.mounted === 'true') return;
|
| 97 |
+
container.dataset.mounted = 'true';
|
| 98 |
+
}
|
| 99 |
+
|
| 100 |
+
// CSV: prefer public path, fallback to relative
|
| 101 |
+
const CSV_PATHS = [
|
| 102 |
+
'/data/against_baselines_deduplicated.csv',
|
| 103 |
+
'./assets/data/against_baselines_deduplicated.csv',
|
| 104 |
+
'../assets/data/against_baselines_deduplicated.csv',
|
| 105 |
+
'../../assets/data/against_baselines_deduplicated.csv'
|
| 106 |
+
];
|
| 107 |
+
const fetchFirstAvailable = async (paths) => {
|
| 108 |
+
for (const p of paths) {
|
| 109 |
+
try { const r = await fetch(p, { cache: 'no-cache' }); if (r.ok) return await r.text(); } catch(e) {}
|
| 110 |
+
}
|
| 111 |
+
throw new Error('CSV not found: against_baselines_deduplicated.csv');
|
| 112 |
+
};
|
| 113 |
+
|
| 114 |
+
// Controls UI
|
| 115 |
+
const controls = document.createElement('div');
|
| 116 |
+
controls.className = 'd3-line__controls';
|
| 117 |
+
Object.assign(controls.style, {
|
| 118 |
+
marginTop: '12px',
|
| 119 |
+
display: 'flex',
|
| 120 |
+
gap: '16px',
|
| 121 |
+
alignItems: 'center',
|
| 122 |
+
justifyContent: 'space-between',
|
| 123 |
+
width: '100%'
|
| 124 |
+
});
|
| 125 |
+
|
| 126 |
+
const labelMetric = document.createElement('label');
|
| 127 |
+
Object.assign(labelMetric.style, {
|
| 128 |
+
fontSize: '12px', color: 'var(--muted-color)', display: 'flex', alignItems: 'center', gap: '6px', whiteSpace: 'nowrap', padding: '6px 10px', marginLeft: 'auto'
|
| 129 |
+
});
|
| 130 |
+
labelMetric.textContent = 'Metric';
|
| 131 |
+
const selectMetric = document.createElement('select');
|
| 132 |
+
Object.assign(selectMetric.style, { fontSize: '12px' });
|
| 133 |
+
labelMetric.appendChild(selectMetric);
|
| 134 |
+
|
| 135 |
+
// Inline legend, placed to the left of the metric select (the select is pushed right by marginLeft: 'auto')
|
| 136 |
+
const legendInline = document.createElement('div');
|
| 137 |
+
legendInline.className = 'controls__legend';
|
| 138 |
+
Object.assign(legendInline.style, {
|
| 139 |
+
display: 'flex',
|
| 140 |
+
gap: '8px',
|
| 141 |
+
alignItems: 'center',
|
| 142 |
+
flexWrap: 'nowrap',
|
| 143 |
+
fontSize: '11px',
|
| 144 |
+
marginLeft: '8px'
|
| 145 |
+
});
|
| 146 |
+
controls.appendChild(legendInline);
|
| 147 |
+
controls.appendChild(labelMetric);
|
| 148 |
+
|
| 149 |
+
// Create the SVG root
|
| 150 |
+
const svg = d3.select(container).append('svg')
|
| 151 |
+
.attr('width', '100%')
|
| 152 |
+
.style('display', 'block');
|
| 153 |
+
|
| 154 |
+
// <defs> container (currently unused: point markers are drawn inline by drawMarker)
|
| 155 |
+
const defs = svg.append('defs');
|
| 156 |
+
|
| 157 |
+
// Academic marker shapes
|
| 158 |
+
const markerShapes = ['circle', 'square', 'triangle', 'diamond', 'inverted-triangle'];
|
| 159 |
+
const markerSize = 8;
|
| 160 |
+
|
| 161 |
+
// Groups
|
| 162 |
+
const gRoot = svg.append('g');
|
| 163 |
+
const gGrid = gRoot.append('g').attr('class', 'grid');
|
| 164 |
+
const gAxes = gRoot.append('g').attr('class', 'axes');
|
| 165 |
+
const gLines = gRoot.append('g').attr('class', 'lines');
|
| 166 |
+
const gPoints = gRoot.append('g').attr('class', 'points');
|
| 167 |
+
const gHover = gRoot.append('g').attr('class', 'hover');
|
| 168 |
+
const gLegend = gRoot.append('foreignObject').attr('class', 'legend');
|
| 169 |
+
|
| 170 |
+
// Tooltip
|
| 171 |
+
container.style.position = container.style.position || 'relative';
|
| 172 |
+
let tip = container.querySelector('.d3-tooltip');
|
| 173 |
+
let tipInner;
|
| 174 |
+
if (!tip) {
|
| 175 |
+
tip = document.createElement('div');
|
| 176 |
+
tip.className = 'd3-tooltip';
|
| 177 |
+
Object.assign(tip.style, {
|
| 178 |
+
position: 'absolute', top: '0px', left: '0px', transform: 'translate(-9999px, -9999px)', pointerEvents: 'none',
|
| 179 |
+
padding: '8px 10px', borderRadius: '8px', fontSize: '12px', lineHeight: '1.35', border: '1px solid var(--border-color)',
|
| 180 |
+
background: 'var(--surface-bg)', color: 'var(--text-color)', boxShadow: '0 4px 24px rgba(0,0,0,.18)', opacity: '0',
|
| 181 |
+
transition: 'opacity .12s ease'
|
| 182 |
+
});
|
| 183 |
+
tipInner = document.createElement('div');
|
| 184 |
+
tipInner.className = 'd3-tooltip__inner';
|
| 185 |
+
tipInner.style.textAlign = 'left';
|
| 186 |
+
tip.appendChild(tipInner);
|
| 187 |
+
container.appendChild(tip);
|
| 188 |
+
} else {
|
| 189 |
+
tipInner = tip.querySelector('.d3-tooltip__inner') || tip;
|
| 190 |
+
}
|
| 191 |
+
|
| 192 |
+
// Colors per run
|
| 193 |
+
const primary = getComputedStyle(document.documentElement).getPropertyValue('--primary-color').trim() || '#E889AB';
|
| 194 |
+
const pool = [primary, '#4EA5B7', '#E38A42', '#CEC0FA', ...(d3.schemeTableau10||[])];
|
| 195 |
+
|
| 196 |
+
// Mapping from metric names to display titles
|
| 197 |
+
const metricTitleMapping = {
|
| 198 |
+
'docvqa_val_anls': 'DocVQA',
|
| 199 |
+
'infovqa_val_anls': 'InfoVQA',
|
| 200 |
+
'mme_total_score': 'MME Total',
|
| 201 |
+
'mmmu_val_mmmu_acc': 'MMMU',
|
| 202 |
+
'mmstar_average': 'MMStar',
|
| 203 |
+
'ocrbench_ocrbench_accuracy': 'OCRBench',
|
| 204 |
+
'scienceqa_exact_match': 'ScienceQA',
|
| 205 |
+
'textvqa_val_exact_match': 'TextVQA',
|
| 206 |
+
'average': 'Average (excl. MME)',
|
| 207 |
+
'average_rank': 'Average Rank',
|
| 208 |
+
'ai2d_exact_match': 'AI2D',
|
| 209 |
+
'chartqa_relaxed_overall': 'ChartQA',
|
| 210 |
+
'seedbench_seed_all': 'SeedBench'
|
| 211 |
+
};
|
| 212 |
+
|
| 213 |
+
// Function to get display name for metric
|
| 214 |
+
function getMetricDisplayName(metricKey) {
|
| 215 |
+
return metricTitleMapping[metricKey] || metricKey;
|
| 216 |
+
}
|
| 217 |
+
|
| 218 |
+
// State and data
|
| 219 |
+
let metricList = [];
|
| 220 |
+
let runList = [];
|
| 221 |
+
let runOrder = [];
|
| 222 |
+
const dataByMetric = new Map(); // metric => { run => [{step,value}] }
|
| 223 |
+
let isRankStrictFlag = false;
|
| 224 |
+
let rankTickMax = 1;
|
| 225 |
+
|
| 226 |
+
// Scales and layout
|
| 227 |
+
let width = 800, height = 360;
|
| 228 |
+
let margin = { top: 16, right: 28, bottom: 56, left: 64 };
|
| 229 |
+
let xScale = d3.scaleLinear();
|
| 230 |
+
let yScale = d3.scaleLinear();
|
| 231 |
+
|
| 232 |
+
// Line generators - simple linear connections
|
| 233 |
+
const lineGen = d3.line()
|
| 234 |
+
.x((d) => xScale(d.step))
|
| 235 |
+
.y((d) => yScale(d.value));
|
| 236 |
+
|
| 237 |
+
// Function to draw different marker shapes
|
| 238 |
+
function drawMarker(selection, shape, size) {
|
| 239 |
+
const s = size / 2;
|
| 240 |
+
switch (shape) {
|
| 241 |
+
case 'circle':
|
| 242 |
+
return selection.append('circle').attr('r', s);
|
| 243 |
+
case 'square':
|
| 244 |
+
return selection.append('rect').attr('x', -s).attr('y', -s).attr('width', size).attr('height', size);
|
| 245 |
+
case 'triangle':
|
| 246 |
+
return selection.append('path').attr('d', `M0,${-s * 1.2} L${s * 1.1},${s * 0.6} L${-s * 1.1},${s * 0.6} Z`);
|
| 247 |
+
case 'diamond':
|
| 248 |
+
return selection.append('path').attr('d', `M0,${-s * 1.2} L${s * 1.1},0 L0,${s * 1.2} L${-s * 1.1},0 Z`);
|
| 249 |
+
case 'inverted-triangle':
|
| 250 |
+
return selection.append('path').attr('d', `M0,${s * 1.2} L${s * 1.1},${-s * 0.6} L${-s * 1.1},${-s * 0.6} Z`);
|
| 251 |
+
default:
|
| 252 |
+
return selection.append('circle').attr('r', s);
|
| 253 |
+
}
|
| 254 |
+
}
|
| 255 |
+
|
| 256 |
+
// Hover elements
|
| 257 |
+
const hoverLine = gHover.append('line').attr('stroke-width', 1);
|
| 258 |
+
|
| 259 |
+
const overlay = gHover.append('rect').attr('fill', 'transparent').style('cursor', 'crosshair');
|
| 260 |
+
|
| 261 |
+
function updateScales() {
|
| 262 |
+
const isDark = document.documentElement.getAttribute('data-theme') === 'dark';
|
| 263 |
+
const axisColor = isDark ? 'rgba(255,255,255,0.25)' : 'rgba(0,0,0,0.25)';
|
| 264 |
+
const tickColor = isDark ? 'rgba(255,255,255,0.70)' : 'rgba(0,0,0,0.55)';
|
| 265 |
+
const gridColor = isDark ? 'rgba(255,255,255,0.08)' : 'rgba(0,0,0,0.05)';
|
| 266 |
+
|
| 267 |
+
width = container.clientWidth || 800;
|
| 268 |
+
height = Math.max(360, Math.round(width / 2.2));
|
| 269 |
+
svg.attr('width', width).attr('height', height);
|
| 270 |
+
|
| 271 |
+
const innerWidth = width - margin.left - margin.right;
|
| 272 |
+
const innerHeight = height - margin.top - margin.bottom;
|
| 273 |
+
gRoot.attr('transform', `translate(${margin.left},${margin.top})`);
|
| 274 |
+
|
| 275 |
+
xScale.range([0, innerWidth]);
|
| 276 |
+
yScale.range([innerHeight, 0]);
|
| 277 |
+
|
| 278 |
+
// Compute Y ticks
|
| 279 |
+
let yTicks = [];
|
| 280 |
+
if (isRankStrictFlag) {
|
| 281 |
+
const maxR = Math.max(1, Math.round(rankTickMax));
|
| 282 |
+
for (let v = 1; v <= maxR; v += 1) yTicks.push(v);
|
| 283 |
+
} else {
|
| 284 |
+
// Use D3's tick generator to produce nice floating-point ticks
|
| 285 |
+
yTicks = yScale.ticks(6);
|
| 286 |
+
}
|
| 287 |
+
|
| 288 |
+
// Grid (horizontal)
|
| 289 |
+
gGrid.selectAll('*').remove();
|
| 290 |
+
gGrid.selectAll('line')
|
| 291 |
+
.data(yTicks)
|
| 292 |
+
.join('line')
|
| 293 |
+
.attr('x1', 0)
|
| 294 |
+
.attr('x2', innerWidth)
|
| 295 |
+
.attr('y1', (d) => yScale(d))
|
| 296 |
+
.attr('y2', (d) => yScale(d))
|
| 297 |
+
.attr('stroke', gridColor)
|
| 298 |
+
.attr('stroke-width', 1)
|
| 299 |
+
.attr('shape-rendering', 'crispEdges');
|
| 300 |
+
|
| 301 |
+
// Axes
|
| 302 |
+
gAxes.selectAll('*').remove();
|
| 303 |
+
let xAxis = d3.axisBottom(xScale).tickSizeOuter(0);
|
| 304 |
+
if (isRankStrictFlag) {
|
| 305 |
+
const [dx0, dx1] = xScale.domain();
|
| 306 |
+
const start = Math.ceil(dx0 / 1000) * 1000;
|
| 307 |
+
const end = Math.floor(dx1 / 1000) * 1000;
|
| 308 |
+
const xTicks = [];
|
| 309 |
+
for (let v = start; v <= end; v += 1000) xTicks.push(v);
|
| 310 |
+
if (xTicks.length === 0) xTicks.push(Math.round(dx0));
|
| 311 |
+
xAxis = xAxis.tickValues(xTicks).tickFormat(d3.format('d'));
|
| 312 |
+
} else {
|
| 313 |
+
xAxis = xAxis.ticks(8);
|
| 314 |
+
}
|
| 315 |
+
const yAxis = d3.axisLeft(yScale)
|
| 316 |
+
.tickValues(yTicks)
|
| 317 |
+
.tickSizeOuter(0)
|
| 318 |
+
.tickFormat(isRankStrictFlag ? d3.format('d') : d3.format('.2f'));
|
| 319 |
+
gAxes.append('g')
|
| 320 |
+
.attr('transform', `translate(0,${innerHeight})`)
|
| 321 |
+
.call(xAxis)
|
| 322 |
+
.call((g) => {
|
| 323 |
+
g.selectAll('path, line').attr('stroke', axisColor);
|
| 324 |
+
g.selectAll('text').attr('fill', tickColor).style('font-size', '12px');
|
| 325 |
+
});
|
| 326 |
+
gAxes.append('g')
|
| 327 |
+
.call(yAxis)
|
| 328 |
+
.call((g) => {
|
| 329 |
+
g.selectAll('path, line').attr('stroke', axisColor);
|
| 330 |
+
g.selectAll('text').attr('fill', tickColor).style('font-size', '12px');
|
| 331 |
+
});
|
| 332 |
+
|
| 333 |
+
// Axis labels (X and Y)
|
| 334 |
+
gAxes.append('text')
|
| 335 |
+
.attr('class', 'axis-label axis-label--x')
|
| 336 |
+
.attr('x', innerWidth / 2)
|
| 337 |
+
.attr('y', innerHeight + 44)
|
| 338 |
+
.attr('text-anchor', 'middle')
|
| 339 |
+
.style('font-size', '12px')
|
| 340 |
+
.style('fill', tickColor)
|
| 341 |
+
.text('Step');
|
| 342 |
+
gAxes.append('text')
|
| 343 |
+
.attr('class', 'axis-label axis-label--y')
|
| 344 |
+
.attr('text-anchor', 'middle')
|
| 345 |
+
.attr('transform', `translate(${-44},${innerHeight/2}) rotate(-90)`)
|
| 346 |
+
.style('font-size', '12px')
|
| 347 |
+
.style('fill', tickColor)
|
| 348 |
+
.text('Value');
|
| 349 |
+
|
| 350 |
+
overlay.attr('x', 0).attr('y', 0).attr('width', innerWidth).attr('height', innerHeight);
|
| 351 |
+
hoverLine.attr('y1', 0).attr('y2', innerHeight).attr('stroke', axisColor);
|
| 352 |
+
|
| 353 |
+
// Legend placeholder; actual content set in renderMetric
|
| 354 |
+
const legendWidth = Math.min(180, Math.max(120, Math.round(innerWidth * 0.22)));
|
| 355 |
+
const legendHeight = 64;
|
| 356 |
+
gLegend
|
| 357 |
+
.attr('x', innerWidth - legendWidth + 42)
|
| 358 |
+
.attr('y', innerHeight - legendHeight - 12)
|
| 359 |
+
.attr('width', legendWidth)
|
| 360 |
+
.attr('height', legendHeight);
|
| 361 |
+
const legendRoot = gLegend.selectAll('div').data([0]).join('xhtml:div');
|
| 362 |
+
Object.assign(legendRoot.node().style, {
|
| 363 |
+
background: 'transparent',
|
| 364 |
+
border: 'none',
|
| 365 |
+
borderRadius: '0',
|
| 366 |
+
padding: '0',
|
| 367 |
+
fontSize: '12px',
|
| 368 |
+
lineHeight: '1.35',
|
| 369 |
+
color: 'var(--text-color)'
|
| 370 |
+
});
|
| 371 |
+
|
| 372 |
+
return { innerWidth, innerHeight };
|
| 373 |
+
}
|
| 374 |
+
|
| 375 |
+
function renderMetric(metricKey){
|
| 376 |
+
const map = dataByMetric.get(metricKey) || {};
|
| 377 |
+
const runs = runOrder;
|
| 378 |
+
// Domain
|
| 379 |
+
let minStep = Infinity, maxStep = -Infinity, maxVal = 0, minVal = Infinity;
|
| 380 |
+
const isRank = /rank/i.test(metricKey);
|
| 381 |
+
const isAverage = /average/i.test(metricKey);
|
| 382 |
+
const isRankStrict = isRank && !isAverage;
|
| 383 |
+
runs.forEach(r => {
|
| 384 |
+
const arr = map[r] || [];
|
| 385 |
+
arr.forEach(pt => {
|
| 386 |
+
const val = isRankStrict ? Math.round(pt.value) : pt.value;
|
| 387 |
+
minStep = Math.min(minStep, pt.step);
|
| 388 |
+
maxStep = Math.max(maxStep, pt.step);
|
| 389 |
+
maxVal = Math.max(maxVal, val);
|
| 390 |
+
minVal = Math.min(minVal, val);
|
| 391 |
+
});
|
| 392 |
+
});
|
| 393 |
+
if (!isFinite(minStep) || !isFinite(maxStep)) { return; }
|
| 394 |
+
xScale.domain([minStep, maxStep]);
|
| 395 |
+
if (isRank) {
|
| 396 |
+
rankTickMax = Math.max(1, Math.ceil(maxVal)); // ceil so fractional average ranks (e.g. 4.4) stay inside the domain
|
| 397 |
+
yScale.domain([rankTickMax, 1]);
|
| 398 |
+
} else {
|
| 399 |
+
yScale.domain([0, Math.max(1, maxVal)]).nice();
|
| 400 |
+
}
|
| 401 |
+
isRankStrictFlag = isRankStrict;
|
| 402 |
+
|
| 403 |
+
const { innerWidth, innerHeight } = updateScales();
|
| 404 |
+
|
| 405 |
+
// Bind lines and markers
|
| 406 |
+
const series = runs.map((r, i) => ({
|
| 407 |
+
run: r,
|
| 408 |
+
color: pool[i % pool.length],
|
| 409 |
+
marker: markerShapes[i % markerShapes.length],
|
| 410 |
+
values: (map[r]||[])
|
| 411 |
+
.slice()
|
| 412 |
+
.sort((a,b)=>a.step-b.step)
|
| 413 |
+
.map(pt => isRankStrict ? { step: pt.step, value: Math.round(pt.value) } : pt)
|
| 414 |
+
}));
|
| 415 |
+
|
| 416 |
+
// Draw lines
|
| 417 |
+
const paths = gLines.selectAll('path.run-line').data(series, d=>d.run);
|
| 418 |
+
paths.enter().append('path').attr('class','run-line').attr('fill','none').attr('stroke-width',2)
|
| 419 |
+
.attr('stroke', d=>d.color).attr('opacity',0.9)
|
| 420 |
+
.attr('d', d=>lineGen(d.values))
|
| 421 |
+
.merge(paths)
|
| 422 |
+
.transition().duration(200)
|
| 423 |
+
.attr('stroke', d=>d.color)
|
| 424 |
+
.attr('d', d=>lineGen(d.values));
|
| 425 |
+
paths.exit().remove();
|
| 426 |
+
|
| 427 |
+
// Draw markers for each data point
|
| 428 |
+
gPoints.selectAll('*').remove();
|
| 429 |
+
series.forEach((s, seriesIndex) => {
|
| 430 |
+
const pointGroup = gPoints.selectAll(`.points-${seriesIndex}`)
|
| 431 |
+
.data(s.values)
|
| 432 |
+
.join('g')
|
| 433 |
+
.attr('class', `points-${seriesIndex}`)
|
| 434 |
+
.attr('transform', d => `translate(${xScale(d.step)},${yScale(d.value)})`);
|
| 435 |
+
|
| 436 |
+
drawMarker(pointGroup, s.marker, markerSize)
|
| 437 |
+
.attr('fill', s.color)
|
| 438 |
+
.attr('stroke', s.color)
|
| 439 |
+
          .attr('stroke-width', 1.5)
          .style('cursor', 'crosshair');
      });

      // Inline legend content with marker shapes
      legendInline.innerHTML = '';
      series.forEach(s => {
        const legendItem = document.createElement('span');
        legendItem.style.cssText = 'display:inline-flex;align-items:center;gap:6px;white-space:nowrap;';

        // Create small SVG for marker shape
        const markerSvg = document.createElementNS('http://www.w3.org/2000/svg', 'svg');
        markerSvg.setAttribute('width', '16');
        markerSvg.setAttribute('height', '12');
        markerSvg.style.display = 'inline-block';

        const g = document.createElementNS('http://www.w3.org/2000/svg', 'g');
        g.setAttribute('transform', 'translate(8,6)');

        let shape;
        const size = 6;
        const halfSize = size / 2;
        switch(s.marker) {
          case 'circle':
            shape = document.createElementNS('http://www.w3.org/2000/svg', 'circle');
            shape.setAttribute('r', halfSize);
            break;
          case 'square':
            shape = document.createElementNS('http://www.w3.org/2000/svg', 'rect');
            shape.setAttribute('x', -halfSize);
            shape.setAttribute('y', -halfSize);
            shape.setAttribute('width', size);
            shape.setAttribute('height', size);
            break;
          case 'triangle':
            shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
            shape.setAttribute('d', `M0,${-halfSize * 1.2} L${halfSize * 1.1},${halfSize * 0.6} L${-halfSize * 1.1},${halfSize * 0.6} Z`);
            break;
          case 'diamond':
            shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
            shape.setAttribute('d', `M0,${-halfSize * 1.2} L${halfSize * 1.1},0 L0,${halfSize * 1.2} L${-halfSize * 1.1},0 Z`);
            break;
          case 'inverted-triangle':
            shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
            shape.setAttribute('d', `M0,${halfSize * 1.2} L${halfSize * 1.1},${-halfSize * 0.6} L${-halfSize * 1.1},${-halfSize * 0.6} Z`);
            break;
          default:
            shape = document.createElementNS('http://www.w3.org/2000/svg', 'circle');
            shape.setAttribute('r', halfSize);
        }
        shape.setAttribute('fill', s.color);
        shape.setAttribute('stroke', s.color);
        shape.setAttribute('stroke-width', '1');

        g.appendChild(shape);
        markerSvg.appendChild(g);

        const label = document.createElement('span');
        label.textContent = s.run;

        legendItem.appendChild(markerSvg);
        legendItem.appendChild(label);
        legendInline.appendChild(legendItem);
      });

      // Hover: snap the crosshair to the nearest step that has data in any run
      const stepSet = new Set();
      series.forEach(s => s.values.forEach(v => stepSet.add(v.step)));
      const steps = Array.from(stepSet).sort((a, b) => a - b);
      // Per-run step -> value lookups, built once per render rather than on every mousemove
      const lookups = series.map(s => new Map(s.values.map(v => [v.step, v.value])));
      const formatVal = (vv) => (isRankStrict ? d3.format('d')(vv) : (+vv).toFixed(4));
      function onMove(event){
        const [mx, my] = d3.pointer(event, overlay.node());
        const xVal = xScale.invert(mx);
        const nearest = steps.reduce((best, s) => Math.abs(s - xVal) < Math.abs(best - xVal) ? s : best, steps[0]);
        const xpx = xScale(nearest);
        // Only move and show the line; its theme-aware stroke color is set in updateScales()
        hoverLine.attr('x1', xpx).attr('x2', xpx).style('display', null);
        // Tooltip content
        let html = `<div><strong>${getMetricDisplayName(metricKey)}</strong></div><div><strong>step</strong> ${nearest}</div>`;
        series.forEach((s, i) => {
          const val = lookups[i].has(nearest) ? lookups[i].get(nearest) : null;
          if (val != null) {
            html += `<div><span style="display:inline-block;width:10px;height:10px;background:${s.color};border-radius:50%;margin-right:6px;"></span><strong>${s.run}</strong> ${formatVal(val)}</div>`;
          }
        });
        tipInner.innerHTML = html;
        const offsetX = 12, offsetY = 12;
        tip.style.opacity = '1';
        tip.style.transform = `translate(${Math.round(mx + offsetX + margin.left)}px, ${Math.round(my + offsetY + margin.top)}px)`;
      }
      function onLeave(){
        tip.style.opacity = '0';
        tip.style.transform = 'translate(-9999px, -9999px)';
        hoverLine.style('display', 'none');
      }
      overlay.on('mousemove', onMove).on('mouseleave', onLeave);
    }

    // (old hover removed; hover is attached in renderMetric)

    // Load CSV and wire controls
    (async () => {
      try {
        const text = await fetchFirstAvailable(CSV_PATHS);
        const rows = d3.csvParse(text, d => ({ run: (d.run||'').trim(), step: +d.step, metric: (d.metric||'').trim(), value: +d.value }));
        metricList = Array.from(new Set(rows.map(r=>r.metric))).sort();
        runList = Array.from(new Set(rows.map(r=>r.run))).sort();
        runOrder = runList;
        // Build dataByMetric
        metricList.forEach(m => {
          const map = {};
          runList.forEach(r => { map[r] = []; });
          rows.filter(r=>r.metric===m).forEach(r => { if (!isNaN(r.step) && !isNaN(r.value)) map[r.run].push({ step:r.step, value:r.value }); });
          dataByMetric.set(m, map);
        });

        // Populate metric select (default to average_rank if present)
        metricList.forEach((m) => {
          const o = document.createElement('option');
          o.value = m;
          o.textContent = getMetricDisplayName(m);
          selectMetric.appendChild(o);
        });
        const def = metricList.find(m => /average_rank/i.test(m)) || metricList[0];
        if (def) selectMetric.value = def;

        container.appendChild(controls);
        updateScales();
        renderMetric(selectMetric.value);

        selectMetric.addEventListener('change', ()=>{ renderMetric(selectMetric.value); });

        const rerender = () => { renderMetric(selectMetric.value); };
        if (window.ResizeObserver) {
          const ro = new ResizeObserver(() => rerender());
          ro.observe(container);
        } else {
          window.addEventListener('resize', rerender);
        }
      } catch (e) {
        const pre = document.createElement('pre');
        pre.textContent = 'CSV load error: ' + (e && e.message ? e.message : e);
        pre.style.color = 'var(--danger, #b00020)';
        pre.style.fontSize = '12px';
        pre.style.whiteSpace = 'pre-wrap';
        container.appendChild(pre);
      }
    })();
  };

  if (document.readyState === 'loading') {
    document.addEventListener('DOMContentLoaded', () => ensureD3(bootstrap), { once: true });
  } else { ensureD3(bootstrap); }
})();
</script>

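The embeds in this commit read their data in long format (one CSV row per `run`, `step`, `metric`, `value`), group it per metric and per run, and snap the hover crosshair to the nearest recorded step. A minimal, d3-free sketch of those two steps follows, assuming rows are already parsed into numbers; `groupByMetricAndRun` and `nearestStep` are hypothetical helper names, not functions from the repository, and the sketch uses nested `Map`s where the embed uses a `Map` of plain objects.

```javascript
// Hypothetical helpers that mirror the embed's data handling; not part of the repository.

// Group long-format rows ({ run, step, metric, value }) into
// metric -> run -> [{ step, value }], sorted by step, skipping non-numeric points.
function groupByMetricAndRun(rows) {
  const byMetric = new Map();
  for (const { run, step, metric, value } of rows) {
    if (Number.isNaN(step) || Number.isNaN(value)) continue;
    if (!byMetric.has(metric)) byMetric.set(metric, new Map());
    const byRun = byMetric.get(metric);
    if (!byRun.has(run)) byRun.set(run, []);
    byRun.get(run).push({ step, value });
  }
  for (const byRun of byMetric.values()) {
    for (const points of byRun.values()) points.sort((a, b) => a.step - b.step);
  }
  return byMetric;
}

// Return the value in sortedSteps closest to x; on a tie the earlier step wins,
// because the comparison below is strict.
function nearestStep(sortedSteps, x) {
  return sortedSteps.reduce(
    (best, s) => (Math.abs(s - x) < Math.abs(best - x) ? s : best),
    sortedSteps[0]
  );
}

// Usage with made-up rows (illustrative values only, not results from the article).
const rows = [
  { run: 'A', step: 2000, metric: 'mmstar_average', value: 0.41 },
  { run: 'A', step: 1000, metric: 'mmstar_average', value: 0.35 },
  { run: 'B', step: 1000, metric: 'mmstar_average', value: 0.33 },
  { run: 'B', step: 1500, metric: 'mmstar_average', value: NaN },
];
const grouped = groupByMetricAndRun(rows);
const steps = [1000, 2000];
console.log(grouped.get('mmstar_average').get('A'));
console.log(nearestStep(steps, 1400)); // 1000
```

Unlike the embed, this grouping does not pre-create empty arrays for runs that have no rows for a given metric, so a caller should handle a missing run key.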
app/src/content/embeds/{d3-line.html → against-baselines.html}
RENAMED
File without changes

app/src/content/embeds/all-ratings.html
ADDED
@@ -0,0 +1,576 @@
<div class="d3-line" style="width:100%;margin:10px 0;"></div>
<style>
  .d3-line .d3-line__controls select {
    font-size: 12px;
    padding: 8px 28px 8px 10px;
    border: 1px solid var(--border-color);
    border-radius: 8px;
    background-color: var(--surface-bg);
    color: var(--text-color);
    background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='12' height='12' viewBox='0 0 24 24' fill='none' stroke='%230f1115' stroke-width='2' stroke-linecap='round' stroke-linejoin='round'%3E%3Cpolyline points='6 9 12 15 18 9'/%3E%3C/svg%3E");
    background-repeat: no-repeat;
    background-position: right 8px center;
    background-size: 12px;
    -webkit-appearance: none;
    -moz-appearance: none;
    appearance: none;
    cursor: pointer;
    transition: border-color .15s ease, box-shadow .15s ease;
  }
  [data-theme="dark"] .d3-line .d3-line__controls select {
    background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='12' height='12' viewBox='0 0 24 24' fill='none' stroke='%23ffffff' stroke-width='2' stroke-linecap='round' stroke-linejoin='round'%3E%3Cpolyline points='6 9 12 15 18 9'/%3E%3C/svg%3E");
  }
  .d3-line .d3-line__controls select:hover {
    border-color: var(--primary-color);
  }
  .d3-line .d3-line__controls select:focus {
    border-color: var(--primary-color);
    box-shadow: 0 0 0 3px rgba(232,137,171,.25);
    outline: none;
  }
  .d3-line .d3-line__controls label { gap: 8px; }

  /* Range slider themed with --primary-color */
  .d3-line .d3-line__controls input[type="range"] {
    -webkit-appearance: none;
    appearance: none;
    width: 100%;
    height: 6px;
    border-radius: 999px;
    background: var(--border-color);
    outline: none;
  }
  .d3-line .d3-line__controls input[type="range"]::-webkit-slider-runnable-track {
    height: 6px;
    background: transparent;
    border-radius: 999px;
  }
  .d3-line .d3-line__controls input[type="range"]::-webkit-slider-thumb {
    -webkit-appearance: none;
    appearance: none;
    width: 16px;
    height: 16px;
    border-radius: 50%;
    background: var(--primary-color);
    border: 2px solid var(--on-primary);
    margin-top: -5px;
    cursor: pointer;
  }
  .d3-line .d3-line__controls input[type="range"]::-moz-range-track {
    height: 6px;
    background: transparent;
    border-radius: 999px;
  }
  .d3-line .d3-line__controls input[type="range"]::-moz-range-thumb {
    width: 16px;
    height: 16px;
    border-radius: 50%;
    background: var(--primary-color);
    border: 2px solid var(--on-primary);
    cursor: pointer;
  }
  /* Improved line color via CSS */
  .d3-line .lines path.improved { stroke: var(--primary-color); }
</style>
<script>
(() => {
  // document.currentScript is only set while this script is first executing; bootstrap()
  // usually runs later (on DOMContentLoaded or after d3 loads), so capture it now.
  const thisScript = document.currentScript;

  const ensureD3 = (cb) => {
    if (window.d3 && typeof window.d3.select === 'function') return cb();
    let s = document.getElementById('d3-cdn-script');
    if (!s) {
      s = document.createElement('script');
      s.id = 'd3-cdn-script';
      s.src = 'https://cdn.jsdelivr.net/npm/d3@7/dist/d3.min.js';
      document.head.appendChild(s);
    }
    const onReady = () => { if (window.d3 && typeof window.d3.select === 'function') cb(); };
    s.addEventListener('load', onReady, { once: true });
    if (window.d3) onReady();
  };

  const bootstrap = () => {
    // Resolve this embed's own container: the nearest preceding .d3-line sibling of the script.
    // Otherwise fall back to the first .d3-line on the page that is not mounted yet, so that
    // several embeds sharing this class on one page each get their own chart.
    let container = null;
    for (let el = thisScript ? thisScript.previousElementSibling : null; el; el = el.previousElementSibling) {
      if (el.classList && el.classList.contains('d3-line')) { container = el; break; }
    }
    if (!container) {
      container = Array.from(document.querySelectorAll('.d3-line')).find((el) => el.dataset.mounted !== 'true') || null;
    }
    if (!container) return;
    if (container.dataset) {
      if (container.dataset.mounted === 'true') return;
      container.dataset.mounted = 'true';
    }

    // CSV: prefer the public path, fall back to relative paths
    const CSV_PATHS = [
      '/data/all_ratings_luis.csv',
      './assets/data/all_ratings_luis.csv',
      '../assets/data/all_ratings_luis.csv',
      '../../assets/data/all_ratings_luis.csv'
    ];
    const fetchFirstAvailable = async (paths) => {
      for (const p of paths) {
        try {
          const r = await fetch(p, { cache: 'no-cache' });
          if (r.ok) return await r.text();
        } catch (e) {
          // Network error for this candidate path; try the next one
        }
      }
      throw new Error('CSV not found: all_ratings_luis.csv');
    };

    // Controls UI
    const controls = document.createElement('div');
    controls.className = 'd3-line__controls';
    Object.assign(controls.style, {
      marginTop: '12px',
      display: 'flex',
      gap: '16px',
      alignItems: 'center',
      justifyContent: 'space-between',
      width: '100%'
    });

    const labelMetric = document.createElement('label');
    Object.assign(labelMetric.style, {
      fontSize: '12px', color: 'var(--muted-color)', display: 'flex', alignItems: 'center', gap: '6px', whiteSpace: 'nowrap', padding: '6px 10px', marginLeft: 'auto'
    });
    labelMetric.textContent = 'Metric';
    const selectMetric = document.createElement('select');
    Object.assign(selectMetric.style, { fontSize: '12px' });
    labelMetric.appendChild(selectMetric);

    // Inline legend, placed left of the metric select (the label's marginLeft: 'auto' pushes the select right)
    const legendInline = document.createElement('div');
    legendInline.className = 'controls__legend';
    Object.assign(legendInline.style, {
      display: 'flex',
      gap: '8px',
      alignItems: 'center',
      flexWrap: 'nowrap',
      fontSize: '11px',
      marginLeft: '8px'
    });
    controls.appendChild(legendInline);
    controls.appendChild(labelMetric);

    // SVG root; its pixel width and height are set from the container in updateScales()
    const svg = d3.select(container).append('svg')
      .attr('width', '100%')
      .style('display', 'block');

    // Empty <defs> (unused): markers are drawn as plain shapes by drawMarker(), not as SVG <marker> definitions
    const defs = svg.append('defs');

    // Marker shapes, cycled across runs by series index
    const markerShapes = ['circle', 'square', 'triangle', 'diamond', 'inverted-triangle'];
    const markerSize = 8;

    // Groups
    const gRoot = svg.append('g');
    const gGrid = gRoot.append('g').attr('class', 'grid');
    const gAxes = gRoot.append('g').attr('class', 'axes');
    const gLines = gRoot.append('g').attr('class', 'lines');
    const gPoints = gRoot.append('g').attr('class', 'points');
    const gHover = gRoot.append('g').attr('class', 'hover');
    const gLegend = gRoot.append('foreignObject').attr('class', 'legend');

    // Tooltip
    container.style.position = container.style.position || 'relative';
    let tip = container.querySelector('.d3-tooltip');
    let tipInner;
    if (!tip) {
      tip = document.createElement('div');
      tip.className = 'd3-tooltip';
      Object.assign(tip.style, {
        position: 'absolute', top: '0px', left: '0px', transform: 'translate(-9999px, -9999px)', pointerEvents: 'none',
        padding: '8px 10px', borderRadius: '8px', fontSize: '12px', lineHeight: '1.35', border: '1px solid var(--border-color)',
        background: 'var(--surface-bg)', color: 'var(--text-color)', boxShadow: '0 4px 24px rgba(0,0,0,.18)', opacity: '0',
        transition: 'opacity .12s ease'
      });
      tipInner = document.createElement('div');
      tipInner.className = 'd3-tooltip__inner';
      tipInner.style.textAlign = 'left';
      tip.appendChild(tipInner);
      container.appendChild(tip);
    } else {
      tipInner = tip.querySelector('.d3-tooltip__inner') || tip;
    }

    // Colors per run
    const primary = getComputedStyle(document.documentElement).getPropertyValue('--primary-color').trim() || '#E889AB';
    const pool = [primary, '#4EA5B7', '#E38A42', '#CEC0FA', ...(d3.schemeTableau10||[])];

    // Mapping from metric names to display titles
    const metricTitleMapping = {
      'docvqa_val_anls': 'DocVQA',
      'infovqa_val_anls': 'InfoVQA',
      'mme_total_score': 'MME Total',
      'mmmu_val_mmmu_acc': 'MMMU',
      'mmstar_average': 'MMStar',
      'ocrbench_ocrbench_accuracy': 'OCRBench',
      'scienceqa_exact_match': 'ScienceQA',
      'textvqa_val_exact_match': 'TextVQA',
      'average': 'Average (excl. MME)',
      'average_rank': 'Average Rank',
      'ai2d_exact_match': 'AI2D',
      'chartqa_relaxed_overall': 'ChartQA',
      'seedbench_seed_all': 'SeedBench'
    };

    // Function to get display name for metric
    function getMetricDisplayName(metricKey) {
      return metricTitleMapping[metricKey] || metricKey;
    }

    // State and data
    let metricList = [];
    let runList = [];
    let runOrder = [];
    const dataByMetric = new Map(); // metric => { run => [{step,value}] }
    let isRankStrictFlag = false;
    let rankTickMax = 1;

    // Scales and layout
    let width = 800, height = 360;
    let margin = { top: 16, right: 28, bottom: 56, left: 64 };
    let xScale = d3.scaleLinear();
    let yScale = d3.scaleLinear();

    // Line generators - simple linear connections
    const lineGen = d3.line()
      .x((d) => xScale(d.step))
      .y((d) => yScale(d.value));

    // Function to draw different marker shapes
    function drawMarker(selection, shape, size) {
      const s = size / 2;
      switch (shape) {
        case 'circle':
          return selection.append('circle').attr('r', s);
        case 'square':
          return selection.append('rect').attr('x', -s).attr('y', -s).attr('width', size).attr('height', size);
        case 'triangle':
          return selection.append('path').attr('d', `M0,${-s * 1.2} L${s * 1.1},${s * 0.6} L${-s * 1.1},${s * 0.6} Z`);
        case 'diamond':
          return selection.append('path').attr('d', `M0,${-s * 1.2} L${s * 1.1},0 L0,${s * 1.2} L${-s * 1.1},0 Z`);
        case 'inverted-triangle':
          return selection.append('path').attr('d', `M0,${s * 1.2} L${s * 1.1},${-s * 0.6} L${-s * 1.1},${-s * 0.6} Z`);
        default:
          return selection.append('circle').attr('r', s);
      }
    }

    // Hover elements
    const hoverLine = gHover.append('line').attr('stroke-width', 1);

    const overlay = gHover.append('rect').attr('fill', 'transparent').style('cursor', 'crosshair');

    function updateScales() {
      const isDark = document.documentElement.getAttribute('data-theme') === 'dark';
      const axisColor = isDark ? 'rgba(255,255,255,0.25)' : 'rgba(0,0,0,0.25)';
      const tickColor = isDark ? 'rgba(255,255,255,0.70)' : 'rgba(0,0,0,0.55)';
      const gridColor = isDark ? 'rgba(255,255,255,0.08)' : 'rgba(0,0,0,0.05)';

      width = container.clientWidth || 800;
      height = Math.max(360, Math.round(width / 2.2));
      svg.attr('width', width).attr('height', height);

      const innerWidth = width - margin.left - margin.right;
      const innerHeight = height - margin.top - margin.bottom;
      gRoot.attr('transform', `translate(${margin.left},${margin.top})`);

      xScale.range([0, innerWidth]);
      yScale.range([innerHeight, 0]);

      // Compute Y ticks
      let yTicks = [];
      if (isRankStrictFlag) {
        const maxR = Math.max(1, Math.round(rankTickMax));
        for (let v = 1; v <= maxR; v += 1) yTicks.push(v);
      } else {
        // Use D3's tick generator to produce nice floating-point ticks
        yTicks = yScale.ticks(6);
      }

      // Grid (horizontal)
      gGrid.selectAll('*').remove();
      gGrid.selectAll('line')
        .data(yTicks)
        .join('line')
        .attr('x1', 0)
        .attr('x2', innerWidth)
        .attr('y1', (d) => yScale(d))
        .attr('y2', (d) => yScale(d))
        .attr('stroke', gridColor)
        .attr('stroke-width', 1)
        .attr('shape-rendering', 'crispEdges');

      // Axes
      gAxes.selectAll('*').remove();
      let xAxis = d3.axisBottom(xScale).tickSizeOuter(0);
      if (isRankStrictFlag) {
        const [dx0, dx1] = xScale.domain();
        const start = Math.ceil(dx0 / 1000) * 1000;
        const end = Math.floor(dx1 / 1000) * 1000;
        const xTicks = [];
        for (let v = start; v <= end; v += 1000) xTicks.push(v);
        if (xTicks.length === 0) xTicks.push(Math.round(dx0));
        xAxis = xAxis.tickValues(xTicks).tickFormat(d3.format('d'));
      } else {
        xAxis = xAxis.ticks(8);
      }
      const yAxis = d3.axisLeft(yScale)
        .tickValues(yTicks)
        .tickSizeOuter(0)
        .tickFormat(isRankStrictFlag ? d3.format('d') : d3.format('.2f'));
      gAxes.append('g')
        .attr('transform', `translate(0,${innerHeight})`)
        .call(xAxis)
        .call((g) => {
          g.selectAll('path, line').attr('stroke', axisColor);
          g.selectAll('text').attr('fill', tickColor).style('font-size', '12px');
        });
      gAxes.append('g')
        .call(yAxis)
        .call((g) => {
          g.selectAll('path, line').attr('stroke', axisColor);
          g.selectAll('text').attr('fill', tickColor).style('font-size', '12px');
        });

      // Axis labels (X and Y)
      gAxes.append('text')
        .attr('class', 'axis-label axis-label--x')
        .attr('x', innerWidth / 2)
        .attr('y', innerHeight + 44)
        .attr('text-anchor', 'middle')
        .style('font-size', '12px')
        .style('fill', tickColor)
        .text('Step');
      gAxes.append('text')
        .attr('class', 'axis-label axis-label--y')
        .attr('text-anchor', 'middle')
        .attr('transform', `translate(${-44},${innerHeight/2}) rotate(-90)`)
        .style('font-size', '12px')
        .style('fill', tickColor)
        .text('Value');

      overlay.attr('x', 0).attr('y', 0).attr('width', innerWidth).attr('height', innerHeight);
      hoverLine.attr('y1', 0).attr('y2', innerHeight).attr('stroke', axisColor);

      // In-plot legend placeholder; it stays empty because renderMetric fills the inline legend in the controls bar
      const legendWidth = Math.min(180, Math.max(120, Math.round(innerWidth * 0.22)));
      const legendHeight = 64;
      gLegend
        .attr('x', innerWidth - legendWidth + 42)
        .attr('y', innerHeight - legendHeight - 12)
        .attr('width', legendWidth)
        .attr('height', legendHeight);
      const legendRoot = gLegend.selectAll('div').data([0]).join('xhtml:div');
      Object.assign(legendRoot.node().style, {
        background: 'transparent',
        border: 'none',
        borderRadius: '0',
        padding: '0',
        fontSize: '12px',
        lineHeight: '1.35',
        color: 'var(--text-color)'
      });

      return { innerWidth, innerHeight };
    }

    function renderMetric(metricKey){
      const map = dataByMetric.get(metricKey) || {};
      const runs = runOrder;
      // Domain
      let minStep = Infinity, maxStep = -Infinity, maxVal = 0, minVal = Infinity;
      const isRank = /rank/i.test(metricKey);
      const isAverage = /average/i.test(metricKey);
      const isRankStrict = isRank && !isAverage;
      runs.forEach(r => {
        const arr = map[r] || [];
        arr.forEach(pt => {
          const val = isRankStrict ? Math.round(pt.value) : pt.value;
          minStep = Math.min(minStep, pt.step);
          maxStep = Math.max(maxStep, pt.step);
          maxVal = Math.max(maxVal, val);
          minVal = Math.min(minVal, val);
        });
      });
      if (!isFinite(minStep) || !isFinite(maxStep)) { return; }
      xScale.domain([minStep, maxStep]);
      if (isRank) {
        // Round up rather than to nearest: average ranks can be fractional (e.g. 3.4), and
        // rounding down would place the largest values below the bottom of the axis
        rankTickMax = Math.max(1, Math.ceil(maxVal));
        yScale.domain([rankTickMax, 1]);
      } else {
        yScale.domain([0, Math.max(1, maxVal)]).nice();
      }
      isRankStrictFlag = isRankStrict;

      const { innerWidth, innerHeight } = updateScales();

      // Bind lines and markers
      const series = runs.map((r, i) => ({
        run: r,
        color: pool[i % pool.length],
        marker: markerShapes[i % markerShapes.length],
        values: (map[r]||[])
          .slice()
          .sort((a,b)=>a.step-b.step)
          .map(pt => isRankStrict ? { step: pt.step, value: Math.round(pt.value) } : pt)
      }));

      // Draw lines
      const paths = gLines.selectAll('path.run-line').data(series, d=>d.run);
      paths.enter().append('path').attr('class','run-line').attr('fill','none').attr('stroke-width',2)
        .attr('stroke', d=>d.color).attr('opacity',0.9)
        .attr('d', d=>lineGen(d.values))
        .merge(paths)
        .transition().duration(200)
        .attr('stroke', d=>d.color)
        .attr('d', d=>lineGen(d.values));
      paths.exit().remove();

      // Draw markers for each data point
      gPoints.selectAll('*').remove();
      series.forEach((s, seriesIndex) => {
        const pointGroup = gPoints.selectAll(`.points-${seriesIndex}`)
          .data(s.values)
          .join('g')
          .attr('class', `points-${seriesIndex}`)
          .attr('transform', d => `translate(${xScale(d.step)},${yScale(d.value)})`);

        drawMarker(pointGroup, s.marker, markerSize)
          .attr('fill', s.color)
          .attr('stroke', s.color)
          .attr('stroke-width', 1.5)
          .style('cursor', 'crosshair');
      });

      // Inline legend content with marker shapes
      legendInline.innerHTML = '';
      series.forEach(s => {
        const legendItem = document.createElement('span');
        legendItem.style.cssText = 'display:inline-flex;align-items:center;gap:6px;white-space:nowrap;';

        // Create small SVG for marker shape
        const markerSvg = document.createElementNS('http://www.w3.org/2000/svg', 'svg');
        markerSvg.setAttribute('width', '16');
        markerSvg.setAttribute('height', '12');
        markerSvg.style.display = 'inline-block';

        const g = document.createElementNS('http://www.w3.org/2000/svg', 'g');
        g.setAttribute('transform', 'translate(8,6)');

        let shape;
        const size = 6;
        const halfSize = size / 2;
        switch(s.marker) {
          case 'circle':
            shape = document.createElementNS('http://www.w3.org/2000/svg', 'circle');
            shape.setAttribute('r', halfSize);
            break;
          case 'square':
            shape = document.createElementNS('http://www.w3.org/2000/svg', 'rect');
            shape.setAttribute('x', -halfSize);
            shape.setAttribute('y', -halfSize);
            shape.setAttribute('width', size);
            shape.setAttribute('height', size);
            break;
          case 'triangle':
            shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
            shape.setAttribute('d', `M0,${-halfSize * 1.2} L${halfSize * 1.1},${halfSize * 0.6} L${-halfSize * 1.1},${halfSize * 0.6} Z`);
            break;
          case 'diamond':
            shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
            shape.setAttribute('d', `M0,${-halfSize * 1.2} L${halfSize * 1.1},0 L0,${halfSize * 1.2} L${-halfSize * 1.1},0 Z`);
            break;
          case 'inverted-triangle':
            shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
            shape.setAttribute('d', `M0,${halfSize * 1.2} L${halfSize * 1.1},${-halfSize * 0.6} L${-halfSize * 1.1},${-halfSize * 0.6} Z`);
            break;
          default:
            shape = document.createElementNS('http://www.w3.org/2000/svg', 'circle');
            shape.setAttribute('r', halfSize);
        }
        shape.setAttribute('fill', s.color);
        shape.setAttribute('stroke', s.color);
        shape.setAttribute('stroke-width', '1');

        g.appendChild(shape);
        markerSvg.appendChild(g);

        const label = document.createElement('span');
        label.textContent = s.run;

        legendItem.appendChild(markerSvg);
        legendItem.appendChild(label);
        legendInline.appendChild(legendItem);
      });

| 504 |
+
// Hover
|
| 505 |
+
const stepSet = new Set(); series.forEach(s=>s.values.forEach(v=>stepSet.add(v.step)));
|
| 506 |
+
const steps = Array.from(stepSet).sort((a,b)=>a-b);
|
| 507 |
+
function onMove(event){
|
| 508 |
+
const [mx, my] = d3.pointer(event, overlay.node());
|
| 509 |
+
const sx = Math.max(steps[0], Math.min(steps[steps.length-1], Math.round(xScale.invert(mx)/1)*1));
|
| 510 |
+
const nearest = steps.reduce((best, s)=> Math.abs(s - xScale.invert(mx)) < Math.abs(best - xScale.invert(mx)) ? s : best, steps[0]);
|
| 511 |
+
const xpx = xScale(nearest);
|
| 512 |
+
hoverLine.attr('x1', xpx).attr('x2', xpx).style('display', null).attr('stroke', 'rgba(0,0,0,0.25)');
|
| 513 |
+
// Tooltip content
|
| 514 |
+
let html = `<div><strong>${getMetricDisplayName(metricKey)}</strong></div><div><strong>step</strong> ${nearest}</div>`;
|
| 515 |
+
series.forEach(s=>{
|
| 516 |
+
const m = new Map(s.values.map(v=>[v.step, v.value]));
|
| 517 |
+
const val = m.has(nearest) ? m.get(nearest) : null;
|
| 518 |
+
if (val != null) {
|
| 519 |
+
const formatVal = (vv) => (isRankStrict ? d3.format('d')(vv) : (+vv).toFixed(4));
|
| 520 |
+
html += `<div><span style=\"display:inline-block;width:10px;height:10px;background:${s.color};border-radius:50%;margin-right:6px;\"></span><strong>${s.run}</strong> ${formatVal(val)}</div>`;
|
| 521 |
+
}
|
| 522 |
+
});
|
| 523 |
+
tipInner.innerHTML = html;
|
| 524 |
+
const offsetX = 12, offsetY = 12;
|
| 525 |
+
tip.style.opacity = '1'; tip.style.transform = `translate(${Math.round(mx + offsetX + margin.left)}px, ${Math.round(my + offsetY + margin.top)}px)`;
|
| 526 |
+
}
|
| 527 |
+
function onLeave(){ tip.style.opacity='0'; tip.style.transform='translate(-9999px, -9999px)'; hoverLine.style('display','none'); }
|
| 528 |
+
overlay.on('mousemove', onMove).on('mouseleave', onLeave);
|
| 529 |
+
}
+
+    // (old hover removed; hover is attached in renderMetric)
+
+    // Load CSV and wire controls
+    (async () => {
+      try {
+        const text = await fetchFirstAvailable(CSV_PATHS);
+        const rows = d3.csvParse(text, d => ({ run: (d.run||'').trim(), step: +d.step, metric: (d.metric||'').trim(), value: +d.value }));
+        metricList = Array.from(new Set(rows.map(r=>r.metric))).sort();
+        runList = Array.from(new Set(rows.map(r=>r.run))).sort();
+        runOrder = runList;
+        // Build dataByMetric
+        metricList.forEach(m => {
+          const map = {};
+          runList.forEach(r => { map[r] = []; });
+          rows.filter(r=>r.metric===m).forEach(r => { if (!isNaN(r.step) && !isNaN(r.value)) map[r.run].push({ step:r.step, value:r.value }); });
+          dataByMetric.set(m, map);
+        });
+
+        // Populate metric select (default to average_rank if present)
+        metricList.forEach((m)=>{ const o=document.createElement('option'); o.value=m; o.textContent=getMetricDisplayName(m); selectMetric.appendChild(o); });
+        const def = metricList.find(m => /average_rank/i.test(m)) || metricList[0];
+        if (def) selectMetric.value = def;
+
+        container.appendChild(controls);
+        updateScales();
+        renderMetric(selectMetric.value);
+
+        selectMetric.addEventListener('change', ()=>{ renderMetric(selectMetric.value); });
+
+        const rerender = () => { renderMetric(selectMetric.value); };
+        if (window.ResizeObserver) { const ro = new ResizeObserver(()=>rerender()); ro.observe(container); } else { window.addEventListener('resize', rerender); }
+      } catch (e) {
+        const pre = document.createElement('pre'); pre.textContent = 'CSV load error: ' + (e && e.message ? e.message : e);
+        pre.style.color = 'var(--danger, #b00020)'; pre.style.fontSize = '12px'; pre.style.whiteSpace = 'pre-wrap';
+        container.appendChild(pre);
+      }
+    })();
+  };
+
+  if (document.readyState === 'loading') {
+    document.addEventListener('DOMContentLoaded', () => ensureD3(bootstrap), { once: true });
+  } else { ensureD3(bootstrap); }
+})();
+</script>
+
+
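The `onMove` handler above snaps the pointer to the nearest logged step with a linear `reduce` over every step on each `mousemove`. With a few dozen checkpoints that is cheap; as a sketch of an alternative, assuming `steps` stays sorted ascending (as `Array.from(stepSet).sort((a, b) => a - b)` guarantees), a binary search returns the same step in O(log n). `nearestStep` is a hypothetical helper, not part of the committed embed, and it breaks ties toward the earlier step, as the `reduce` version does.

```javascript
// Hypothetical helper (not in the embed): snap a position x, already converted
// to step units (e.g. via xScale.invert), to the closest logged step.
// Assumes `steps` is sorted ascending.
function nearestStep(steps, x) {
  if (steps.length === 0) return undefined;
  let lo = 0;
  let hi = steps.length - 1;
  // Binary search for the first index whose step is >= x
  // (ends on the last index when every step is < x).
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (steps[mid] < x) lo = mid + 1;
    else hi = mid;
  }
  // The closest step is either steps[lo] or its left neighbour;
  // on a tie keep the earlier one, matching the reduce() in onMove.
  if (lo > 0 && Math.abs(steps[lo - 1] - x) <= Math.abs(steps[lo] - x)) {
    return steps[lo - 1];
  }
  return steps[lo];
}
```

Wired into `onMove`, it would stand in for the `reduce` call as `const nearest = nearestStep(steps, xScale.invert(mx));`.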
app/src/content/embeds/formatting-filters.html
ADDED
@@ -0,0 +1,576 @@
+<div class="d3-line" style="width:100%;margin:10px 0;"></div>
+<style>
+  .d3-line .d3-line__controls select {
+    font-size: 12px;
+    padding: 8px 28px 8px 10px;
+    border: 1px solid var(--border-color);
+    border-radius: 8px;
+    background-color: var(--surface-bg);
+    color: var(--text-color);
+    background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='12' height='12' viewBox='0 0 24 24' fill='none' stroke='%230f1115' stroke-width='2' stroke-linecap='round' stroke-linejoin='round'%3E%3Cpolyline points='6 9 12 15 18 9'/%3E%3C/svg%3E");
+    background-repeat: no-repeat;
+    background-position: right 8px center;
+    background-size: 12px;
+    -webkit-appearance: none;
+    -moz-appearance: none;
+    appearance: none;
+    cursor: pointer;
+    transition: border-color .15s ease, box-shadow .15s ease;
+  }
+  [data-theme="dark"] .d3-line .d3-line__controls select {
+    background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='12' height='12' viewBox='0 0 24 24' fill='none' stroke='%23ffffff' stroke-width='2' stroke-linecap='round' stroke-linejoin='round'%3E%3Cpolyline points='6 9 12 15 18 9'/%3E%3C/svg%3E");
+  }
+  .d3-line .d3-line__controls select:hover {
+    border-color: var(--primary-color);
+  }
+  .d3-line .d3-line__controls select:focus {
+    border-color: var(--primary-color);
+    box-shadow: 0 0 0 3px rgba(232,137,171,.25);
+    outline: none;
+  }
+  .d3-line .d3-line__controls label { gap: 8px; }
+
+  /* Range slider themed with --primary-color */
+  .d3-line .d3-line__controls input[type="range"] {
+    -webkit-appearance: none;
+    appearance: none;
+    width: 100%;
+    height: 6px;
+    border-radius: 999px;
+    background: var(--border-color);
+    outline: none;
+  }
+  .d3-line .d3-line__controls input[type="range"]::-webkit-slider-runnable-track {
+    height: 6px;
+    background: transparent;
+    border-radius: 999px;
+  }
+  .d3-line .d3-line__controls input[type="range"]::-webkit-slider-thumb {
+    -webkit-appearance: none;
+    appearance: none;
+    width: 16px;
+    height: 16px;
+    border-radius: 50%;
+    background: var(--primary-color);
+    border: 2px solid var(--on-primary);
+    margin-top: -5px;
+    cursor: pointer;
+  }
+  .d3-line .d3-line__controls input[type="range"]::-moz-range-track {
+    height: 6px;
+    background: transparent;
+    border-radius: 999px;
+  }
+  .d3-line .d3-line__controls input[type="range"]::-moz-range-thumb {
+    width: 16px;
+    height: 16px;
+    border-radius: 50%;
+    background: var(--primary-color);
+    border: 2px solid var(--on-primary);
+    cursor: pointer;
+  }
+  /* Improved line color via CSS */
+  .d3-line .lines path.improved { stroke: var(--primary-color); }
+</style>
+<script>
+(() => {
+  const ensureD3 = (cb) => {
+    if (window.d3 && typeof window.d3.select === 'function') return cb();
+    let s = document.getElementById('d3-cdn-script');
+    if (!s) {
+      s = document.createElement('script');
+      s.id = 'd3-cdn-script';
+      s.src = 'https://cdn.jsdelivr.net/npm/d3@7/dist/d3.min.js';
+      document.head.appendChild(s);
+    }
+    const onReady = () => { if (window.d3 && typeof window.d3.select === 'function') cb(); };
+    s.addEventListener('load', onReady, { once: true });
+    if (window.d3) onReady();
+  };
+  const thisScript = document.currentScript; // read now: currentScript is null once bootstrap runs from a DOMContentLoaded/load callback
+  const bootstrap = () => {
+    let mount = thisScript && thisScript.previousElementSibling; while (mount && !mount.classList.contains('d3-line')) mount = mount.previousElementSibling;
+    const container = mount || document.querySelector('.d3-line:not([data-mounted="true"])');
+    if (!container) return;
+    if (container.dataset) {
+      if (container.dataset.mounted === 'true') return;
+      container.dataset.mounted = 'true';
+    }
+
+    // CSV: prefer public path, fallback to relative
+    const CSV_PATHS = [
+      '/data/formatting_filters.csv',
+      './assets/data/formatting_filters.csv',
+      '../assets/data/formatting_filters.csv',
+      '../../assets/data/formatting_filters.csv'
+    ];
+    const fetchFirstAvailable = async (paths) => {
+      for (const p of paths) {
+        try { const r = await fetch(p, { cache: 'no-cache' }); if (r.ok) return await r.text(); } catch(e) {}
+      }
+      throw new Error('CSV not found: formatting_filters.csv');
+    };
+
+    // Controls UI
+    const controls = document.createElement('div');
+    controls.className = 'd3-line__controls';
+    Object.assign(controls.style, {
+      marginTop: '12px',
+      display: 'flex',
+      gap: '16px',
+      alignItems: 'center',
+      justifyContent: 'space-between',
+      width: '100%'
+    });
+
+    const labelMetric = document.createElement('label');
+    Object.assign(labelMetric.style, {
+      fontSize: '12px', color: 'var(--muted-color)', display: 'flex', alignItems: 'center', gap: '6px', whiteSpace: 'nowrap', padding: '6px 10px', marginLeft: 'auto'
+    });
+    labelMetric.textContent = 'Metric';
+    const selectMetric = document.createElement('select');
+    Object.assign(selectMetric.style, { fontSize: '12px' });
+    labelMetric.appendChild(selectMetric);
+
+    // Inline legend, placed to the left of the metric select (the label is pushed right by marginLeft: 'auto')
+    const legendInline = document.createElement('div');
+    legendInline.className = 'controls__legend';
+    Object.assign(legendInline.style, {
+      display: 'flex',
+      gap: '8px',
+      alignItems: 'center',
+      flexWrap: 'nowrap',
+      fontSize: '11px',
+      marginLeft: '8px'
+    });
+    controls.appendChild(legendInline);
+    controls.appendChild(labelMetric);
+
+    // Create the SVG (its width is reset to a pixel value in updateScales)
+    const svg = d3.select(container).append('svg')
+      .attr('width', '100%')
+      .style('display', 'block');
+
+    // <defs> container; currently unused, since markers are drawn per point in renderMetric
+    const defs = svg.append('defs');
+
+    // Marker shapes, cycled per run
+    const markerShapes = ['circle', 'square', 'triangle', 'diamond', 'inverted-triangle'];
+    const markerSize = 8;
+
+    // Groups
+    const gRoot = svg.append('g');
+    const gGrid = gRoot.append('g').attr('class', 'grid');
+    const gAxes = gRoot.append('g').attr('class', 'axes');
+    const gLines = gRoot.append('g').attr('class', 'lines');
+    const gPoints = gRoot.append('g').attr('class', 'points');
+    const gHover = gRoot.append('g').attr('class', 'hover');
+    const gLegend = gRoot.append('foreignObject').attr('class', 'legend');
+
+    // Tooltip
+    container.style.position = container.style.position || 'relative';
+    let tip = container.querySelector('.d3-tooltip');
+    let tipInner;
+    if (!tip) {
+      tip = document.createElement('div');
+      tip.className = 'd3-tooltip';
+      Object.assign(tip.style, {
+        position: 'absolute', top: '0px', left: '0px', transform: 'translate(-9999px, -9999px)', pointerEvents: 'none',
+        padding: '8px 10px', borderRadius: '8px', fontSize: '12px', lineHeight: '1.35', border: '1px solid var(--border-color)',
+        background: 'var(--surface-bg)', color: 'var(--text-color)', boxShadow: '0 4px 24px rgba(0,0,0,.18)', opacity: '0',
+        transition: 'opacity .12s ease'
+      });
+      tipInner = document.createElement('div');
+      tipInner.className = 'd3-tooltip__inner';
+      tipInner.style.textAlign = 'left';
+      tip.appendChild(tipInner);
+      container.appendChild(tip);
+    } else {
+      tipInner = tip.querySelector('.d3-tooltip__inner') || tip;
+    }
+
+    // Colors per run
+    const primary = getComputedStyle(document.documentElement).getPropertyValue('--primary-color').trim() || '#E889AB';
+    const pool = [primary, '#4EA5B7', '#E38A42', '#CEC0FA', ...(d3.schemeTableau10||[])];
+
+    // Mapping from metric names to display titles
+    const metricTitleMapping = {
+      'docvqa_val_anls': 'DocVQA',
+      'infovqa_val_anls': 'InfoVQA',
+      'mme_total_score': 'MME Total',
+      'mmmu_val_mmmu_acc': 'MMMU',
+      'mmstar_average': 'MMStar',
+      'ocrbench_ocrbench_accuracy': 'OCRBench',
+      'scienceqa_exact_match': 'ScienceQA',
+      'textvqa_val_exact_match': 'TextVQA',
+      'average': 'Average (excl. MME)',
+      'average_rank': 'Average Rank',
+      'ai2d_exact_match': 'AI2D',
+      'chartqa_relaxed_overall': 'ChartQA',
+      'seedbench_seed_all': 'SeedBench'
+    };
+
+    // Function to get display name for metric
+    function getMetricDisplayName(metricKey) {
+      return metricTitleMapping[metricKey] || metricKey;
+    }
+
+    // State and data
+    let metricList = [];
+    let runList = [];
+    let runOrder = [];
+    const dataByMetric = new Map(); // metric => { run => [{step,value}] }
+    let isRankStrictFlag = false;
+    let rankTickMax = 1;
+
+    // Scales and layout
+    let width = 800, height = 360;
+    let margin = { top: 16, right: 28, bottom: 56, left: 64 };
+    let xScale = d3.scaleLinear();
+    let yScale = d3.scaleLinear();
+
+    // Line generators - simple linear connections
+    const lineGen = d3.line()
+      .x((d) => xScale(d.step))
+      .y((d) => yScale(d.value));
+
+    // Function to draw different marker shapes
+    function drawMarker(selection, shape, size) {
+      const s = size / 2;
+      switch (shape) {
+        case 'circle':
+          return selection.append('circle').attr('r', s);
+        case 'square':
+          return selection.append('rect').attr('x', -s).attr('y', -s).attr('width', size).attr('height', size);
+        case 'triangle':
+          return selection.append('path').attr('d', `M0,${-s * 1.2} L${s * 1.1},${s * 0.6} L${-s * 1.1},${s * 0.6} Z`);
+        case 'diamond':
+          return selection.append('path').attr('d', `M0,${-s * 1.2} L${s * 1.1},0 L0,${s * 1.2} L${-s * 1.1},0 Z`);
+        case 'inverted-triangle':
+          return selection.append('path').attr('d', `M0,${s * 1.2} L${s * 1.1},${-s * 0.6} L${-s * 1.1},${-s * 0.6} Z`);
+        default:
+          return selection.append('circle').attr('r', s);
+      }
+    }
+
+    // Hover elements
+    const hoverLine = gHover.append('line').attr('stroke-width', 1);
+
+    const overlay = gHover.append('rect').attr('fill', 'transparent').style('cursor', 'crosshair');
+
+    function updateScales() {
+      const isDark = document.documentElement.getAttribute('data-theme') === 'dark';
+      const axisColor = isDark ? 'rgba(255,255,255,0.25)' : 'rgba(0,0,0,0.25)';
+      const tickColor = isDark ? 'rgba(255,255,255,0.70)' : 'rgba(0,0,0,0.55)';
+      const gridColor = isDark ? 'rgba(255,255,255,0.08)' : 'rgba(0,0,0,0.05)';
+
+      width = container.clientWidth || 800;
+      height = Math.max(360, Math.round(width / 2.2));
+      svg.attr('width', width).attr('height', height);
+
+      const innerWidth = width - margin.left - margin.right;
+      const innerHeight = height - margin.top - margin.bottom;
+      gRoot.attr('transform', `translate(${margin.left},${margin.top})`);
+
+      xScale.range([0, innerWidth]);
+      yScale.range([innerHeight, 0]);
+
+      // Compute Y ticks
+      let yTicks = [];
+      if (isRankStrictFlag) {
+        const maxR = Math.max(1, Math.round(rankTickMax));
+        for (let v = 1; v <= maxR; v += 1) yTicks.push(v);
+      } else {
+        // Use D3's tick generator to produce nice floating-point ticks
+        yTicks = yScale.ticks(6);
+      }
+
+      // Grid (horizontal)
+      gGrid.selectAll('*').remove();
+      gGrid.selectAll('line')
+        .data(yTicks)
+        .join('line')
+        .attr('x1', 0)
+        .attr('x2', innerWidth)
+        .attr('y1', (d) => yScale(d))
+        .attr('y2', (d) => yScale(d))
+        .attr('stroke', gridColor)
+        .attr('stroke-width', 1)
+        .attr('shape-rendering', 'crispEdges');
+
+      // Axes
+      gAxes.selectAll('*').remove();
+      let xAxis = d3.axisBottom(xScale).tickSizeOuter(0);
+      if (isRankStrictFlag) {
+        const [dx0, dx1] = xScale.domain();
+        const start = Math.ceil(dx0 / 1000) * 1000;
+        const end = Math.floor(dx1 / 1000) * 1000;
+        const xTicks = [];
+        for (let v = start; v <= end; v += 1000) xTicks.push(v);
+        if (xTicks.length === 0) xTicks.push(Math.round(dx0));
+        xAxis = xAxis.tickValues(xTicks).tickFormat(d3.format('d'));
+      } else {
+        xAxis = xAxis.ticks(8);
+      }
+      const yAxis = d3.axisLeft(yScale)
+        .tickValues(yTicks)
+        .tickSizeOuter(0)
+        .tickFormat(isRankStrictFlag ? d3.format('d') : d3.format('.2f'));
+      gAxes.append('g')
+        .attr('transform', `translate(0,${innerHeight})`)
+        .call(xAxis)
+        .call((g) => {
+          g.selectAll('path, line').attr('stroke', axisColor);
+          g.selectAll('text').attr('fill', tickColor).style('font-size', '12px');
+        });
+      gAxes.append('g')
+        .call(yAxis)
+        .call((g) => {
+          g.selectAll('path, line').attr('stroke', axisColor);
+          g.selectAll('text').attr('fill', tickColor).style('font-size', '12px');
+        });
+
+      // Axis labels (X and Y)
+      gAxes.append('text')
+        .attr('class', 'axis-label axis-label--x')
+        .attr('x', innerWidth / 2)
+        .attr('y', innerHeight + 44)
+        .attr('text-anchor', 'middle')
+        .style('font-size', '12px')
+        .style('fill', tickColor)
+        .text('Step');
+      gAxes.append('text')
+        .attr('class', 'axis-label axis-label--y')
+        .attr('text-anchor', 'middle')
+        .attr('transform', `translate(${-44},${innerHeight/2}) rotate(-90)`)
+        .style('font-size', '12px')
+        .style('fill', tickColor)
+        .text('Value');
+
+      overlay.attr('x', 0).attr('y', 0).attr('width', innerWidth).attr('height', innerHeight);
+      hoverLine.attr('y1', 0).attr('y2', innerHeight).attr('stroke', axisColor);
+
+      // In-plot legend container: positioned here but left empty; runs are listed in the inline legend in the controls row
+      const legendWidth = Math.min(180, Math.max(120, Math.round(innerWidth * 0.22)));
+      const legendHeight = 64;
+      gLegend
+        .attr('x', innerWidth - legendWidth + 42)
+        .attr('y', innerHeight - legendHeight - 12)
+        .attr('width', legendWidth)
+        .attr('height', legendHeight);
+      const legendRoot = gLegend.selectAll('div').data([0]).join('xhtml:div');
+      Object.assign(legendRoot.node().style, {
+        background: 'transparent',
+        border: 'none',
+        borderRadius: '0',
+        padding: '0',
+        fontSize: '12px',
+        lineHeight: '1.35',
+        color: 'var(--text-color)'
+      });
+
+      return { innerWidth, innerHeight };
+    }
+
+    function renderMetric(metricKey){
+      const map = dataByMetric.get(metricKey) || {};
+      const runs = runOrder;
+      // Domain
+      let minStep = Infinity, maxStep = -Infinity, maxVal = 0, minVal = Infinity;
+      const isRank = /rank/i.test(metricKey);
+      const isAverage = /average/i.test(metricKey);
+      const isRankStrict = isRank && !isAverage;
+      runs.forEach(r => {
+        const arr = map[r] || [];
+        arr.forEach(pt => {
+          const val = isRankStrict ? Math.round(pt.value) : pt.value;
+          minStep = Math.min(minStep, pt.step);
+          maxStep = Math.max(maxStep, pt.step);
+          maxVal = Math.max(maxVal, val);
+          minVal = Math.min(minVal, val);
+        });
+      });
+      if (!isFinite(minStep) || !isFinite(maxStep)) { return; }
+      xScale.domain([minStep, maxStep]);
+      if (isRank) {
+        rankTickMax = Math.max(1, Math.round(maxVal));
+        yScale.domain([rankTickMax, 1]);
+      } else {
+        yScale.domain([0, Math.max(1, maxVal)]).nice();
+      }
+      isRankStrictFlag = isRankStrict;
+
+      const { innerWidth, innerHeight } = updateScales();
+
+      // Bind lines and markers
+      const series = runs.map((r, i) => ({
+        run: r,
+        color: pool[i % pool.length],
+        marker: markerShapes[i % markerShapes.length],
+        values: (map[r]||[])
+          .slice()
+          .sort((a,b)=>a.step-b.step)
+          .map(pt => isRankStrict ? { step: pt.step, value: Math.round(pt.value) } : pt)
+      }));
+
+      // Draw lines
+      const paths = gLines.selectAll('path.run-line').data(series, d=>d.run);
+      paths.enter().append('path').attr('class','run-line').attr('fill','none').attr('stroke-width',2)
+        .attr('stroke', d=>d.color).attr('opacity',0.9)
+        .attr('d', d=>lineGen(d.values))
+        .merge(paths)
+        .transition().duration(200)
+        .attr('stroke', d=>d.color)
+        .attr('d', d=>lineGen(d.values));
+      paths.exit().remove();
+
+      // Draw markers for each data point
+      gPoints.selectAll('*').remove();
+      series.forEach((s, seriesIndex) => {
+        const pointGroup = gPoints.selectAll(`.points-${seriesIndex}`)
+          .data(s.values)
+          .join('g')
+          .attr('class', `points-${seriesIndex}`)
+          .attr('transform', d => `translate(${xScale(d.step)},${yScale(d.value)})`);
+
+        drawMarker(pointGroup, s.marker, markerSize)
+          .attr('fill', s.color)
+          .attr('stroke', s.color)
+          .attr('stroke-width', 1.5)
+          .style('cursor', 'crosshair');
+      });
+
+      // Inline legend content with marker shapes
+      legendInline.innerHTML = '';
+      series.forEach(s => {
+        const legendItem = document.createElement('span');
+        legendItem.style.cssText = 'display:inline-flex;align-items:center;gap:6px;white-space:nowrap;';
+
+        // Create small SVG for marker shape
+        const markerSvg = document.createElementNS('http://www.w3.org/2000/svg', 'svg');
+        markerSvg.setAttribute('width', '16');
+        markerSvg.setAttribute('height', '12');
+        markerSvg.style.display = 'inline-block';
+
+        const g = document.createElementNS('http://www.w3.org/2000/svg', 'g');
+        g.setAttribute('transform', 'translate(8,6)');
+
+        let shape;
+        const size = 6;
+        const halfSize = size / 2;
+        switch(s.marker) {
+          case 'circle':
+            shape = document.createElementNS('http://www.w3.org/2000/svg', 'circle');
+            shape.setAttribute('r', halfSize);
+            break;
+          case 'square':
+            shape = document.createElementNS('http://www.w3.org/2000/svg', 'rect');
+            shape.setAttribute('x', -halfSize);
+            shape.setAttribute('y', -halfSize);
+            shape.setAttribute('width', size);
+            shape.setAttribute('height', size);
+            break;
+          case 'triangle':
+            shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
+            shape.setAttribute('d', `M0,${-halfSize * 1.2} L${halfSize * 1.1},${halfSize * 0.6} L${-halfSize * 1.1},${halfSize * 0.6} Z`);
+            break;
+          case 'diamond':
+            shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
+            shape.setAttribute('d', `M0,${-halfSize * 1.2} L${halfSize * 1.1},0 L0,${halfSize * 1.2} L${-halfSize * 1.1},0 Z`);
+            break;
+          case 'inverted-triangle':
+            shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
+            shape.setAttribute('d', `M0,${halfSize * 1.2} L${halfSize * 1.1},${-halfSize * 0.6} L${-halfSize * 1.1},${-halfSize * 0.6} Z`);
+            break;
+          default:
+            shape = document.createElementNS('http://www.w3.org/2000/svg', 'circle');
+            shape.setAttribute('r', halfSize);
+        }
+        shape.setAttribute('fill', s.color);
+        shape.setAttribute('stroke', s.color);
+        shape.setAttribute('stroke-width', '1');
+
+        g.appendChild(shape);
+        markerSvg.appendChild(g);
+
+        const label = document.createElement('span');
+        label.textContent = s.run;
+
+        legendItem.appendChild(markerSvg);
+        legendItem.appendChild(label);
+        legendInline.appendChild(legendItem);
+      });
+
+      // Hover
+      const stepSet = new Set(); series.forEach(s=>s.values.forEach(v=>stepSet.add(v.step)));
+      const steps = Array.from(stepSet).sort((a,b)=>a-b);
+      function onMove(event){
+        const [mx, my] = d3.pointer(event, overlay.node());
+        const x0 = xScale.invert(mx); // pointer position in step units
+        const nearest = steps.reduce((best, s)=> Math.abs(s - x0) < Math.abs(best - x0) ? s : best, steps[0]);
+        const xpx = xScale(nearest);
+        hoverLine.attr('x1', xpx).attr('x2', xpx).style('display', null); // stroke is set (theme-aware) in updateScales()
+        // Tooltip content
+        let html = `<div><strong>${getMetricDisplayName(metricKey)}</strong></div><div><strong>step</strong> ${nearest}</div>`;
+        series.forEach(s=>{
+          const m = new Map(s.values.map(v=>[v.step, v.value]));
+          const val = m.has(nearest) ? m.get(nearest) : null;
+          if (val != null) {
+            const formatVal = (vv) => (isRankStrict ? d3.format('d')(vv) : (+vv).toFixed(4));
+            html += `<div><span style=\"display:inline-block;width:10px;height:10px;background:${s.color};border-radius:50%;margin-right:6px;\"></span><strong>${s.run}</strong> ${formatVal(val)}</div>`;
+          }
+        });
+        tipInner.innerHTML = html;
+        const offsetX = 12, offsetY = 12;
+        tip.style.opacity = '1'; tip.style.transform = `translate(${Math.round(mx + offsetX + margin.left)}px, ${Math.round(my + offsetY + margin.top)}px)`;
+      }
+      function onLeave(){ tip.style.opacity='0'; tip.style.transform='translate(-9999px, -9999px)'; hoverLine.style('display','none'); }
+      overlay.on('mousemove', onMove).on('mouseleave', onLeave);
+    }
+
+    // (old hover removed; hover is attached in renderMetric)
+
+    // Load CSV and wire controls
+    (async () => {
+      try {
+        const text = await fetchFirstAvailable(CSV_PATHS);
+        const rows = d3.csvParse(text, d => ({ run: (d.run||'').trim(), step: +d.step, metric: (d.metric||'').trim(), value: +d.value }));
+        metricList = Array.from(new Set(rows.map(r=>r.metric))).sort();
+        runList = Array.from(new Set(rows.map(r=>r.run))).sort();
+        runOrder = runList;
+        // Build dataByMetric
+        metricList.forEach(m => {
+          const map = {};
+          runList.forEach(r => { map[r] = []; });
+          rows.filter(r=>r.metric===m).forEach(r => { if (!isNaN(r.step) && !isNaN(r.value)) map[r.run].push({ step:r.step, value:r.value }); });
+          dataByMetric.set(m, map);
+        });
+
+        // Populate metric select (default to average_rank if present)
+        metricList.forEach((m)=>{ const o=document.createElement('option'); o.value=m; o.textContent=getMetricDisplayName(m); selectMetric.appendChild(o); });
+        const def = metricList.find(m => /average_rank/i.test(m)) || metricList[0];
+        if (def) selectMetric.value = def;
+
+        container.appendChild(controls);
+        updateScales();
+        renderMetric(selectMetric.value);
+
+        selectMetric.addEventListener('change', ()=>{ renderMetric(selectMetric.value); });
+
+        const rerender = () => { renderMetric(selectMetric.value); };
+        if (window.ResizeObserver) { const ro = new ResizeObserver(()=>rerender()); ro.observe(container); } else { window.addEventListener('resize', rerender); }
+      } catch (e) {
+        const pre = document.createElement('pre'); pre.textContent = 'CSV load error: ' + (e && e.message ? e.message : e);
+        pre.style.color = 'var(--danger, #b00020)'; pre.style.fontSize = '12px'; pre.style.whiteSpace = 'pre-wrap';
+        container.appendChild(pre);
+      }
+    })();
+  };
|
| 569 |
+
|
| 570 |
+
if (document.readyState === 'loading') {
|
| 571 |
+
document.addEventListener('DOMContentLoaded', () => ensureD3(bootstrap), { once: true });
|
| 572 |
+
} else { ensureD3(bootstrap); }
|
| 573 |
+
})();
|
| 574 |
+
</script>
|
| 575 |
+
|
| 576 |
+
|
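The hover handler above snaps the cursor to the nearest logged step with a linear `reduce` over `steps`, which is fine for a few dozen checkpoints. As a hedged sketch (not the embed's own code), the same lookup can be written as a binary search over the sorted step array. `nearestStep` is a hypothetical helper name; ties resolve to the earlier step, matching the strict `<` comparison in the `reduce`.

```javascript
// Return the element of an ascending, de-duplicated numeric array closest to x.
// Hypothetical helper: on an exact tie the earlier (smaller) step wins.
function nearestStep(steps, x) {
  if (steps.length === 0) return undefined;
  // Binary search for the first index whose step is >= x.
  let lo = 0;
  let hi = steps.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (steps[mid] < x) lo = mid + 1;
    else hi = mid;
  }
  // Cursor left of the first step or right of the last one: clamp.
  if (lo === 0) return steps[0];
  if (lo === steps.length) return steps[steps.length - 1];
  const before = steps[lo - 1];
  const after = steps[lo];
  return (x - before) <= (after - x) ? before : after;
}
```

In the chart, `x` would be `xScale.invert(mx)`. The linear scan is simpler and has no sortedness precondition, so it is a reasonable choice at this data size; the binary search only pays off with many more steps.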
app/src/content/embeds/image-correspondence-filters.html
ADDED
@@ -0,0 +1,576 @@
<div class="d3-line" style="width:100%;margin:10px 0;"></div>
<style>
  .d3-line .d3-line__controls select {
    font-size: 12px;
    padding: 8px 28px 8px 10px;
    border: 1px solid var(--border-color);
    border-radius: 8px;
    background-color: var(--surface-bg);
    color: var(--text-color);
    background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='12' height='12' viewBox='0 0 24 24' fill='none' stroke='%230f1115' stroke-width='2' stroke-linecap='round' stroke-linejoin='round'%3E%3Cpolyline points='6 9 12 15 18 9'/%3E%3C/svg%3E");
    background-repeat: no-repeat;
    background-position: right 8px center;
    background-size: 12px;
    -webkit-appearance: none;
    -moz-appearance: none;
    appearance: none;
    cursor: pointer;
    transition: border-color .15s ease, box-shadow .15s ease;
  }
  [data-theme="dark"] .d3-line .d3-line__controls select {
    background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='12' height='12' viewBox='0 0 24 24' fill='none' stroke='%23ffffff' stroke-width='2' stroke-linecap='round' stroke-linejoin='round'%3E%3Cpolyline points='6 9 12 15 18 9'/%3E%3C/svg%3E");
  }
  .d3-line .d3-line__controls select:hover {
    border-color: var(--primary-color);
  }
  .d3-line .d3-line__controls select:focus {
    border-color: var(--primary-color);
    box-shadow: 0 0 0 3px rgba(232,137,171,.25);
    outline: none;
  }
  .d3-line .d3-line__controls label { gap: 8px; }

  /* Range slider themed with --primary-color */
  .d3-line .d3-line__controls input[type="range"] {
    -webkit-appearance: none;
    appearance: none;
    width: 100%;
    height: 6px;
    border-radius: 999px;
    background: var(--border-color);
    outline: none;
  }
  .d3-line .d3-line__controls input[type="range"]::-webkit-slider-runnable-track {
    height: 6px;
    background: transparent;
    border-radius: 999px;
  }
  .d3-line .d3-line__controls input[type="range"]::-webkit-slider-thumb {
    -webkit-appearance: none;
    appearance: none;
    width: 16px;
    height: 16px;
    border-radius: 50%;
    background: var(--primary-color);
    border: 2px solid var(--on-primary);
    margin-top: -5px;
    cursor: pointer;
  }
  .d3-line .d3-line__controls input[type="range"]::-moz-range-track {
    height: 6px;
    background: transparent;
    border-radius: 999px;
  }
  .d3-line .d3-line__controls input[type="range"]::-moz-range-thumb {
    width: 16px;
    height: 16px;
    border-radius: 50%;
    background: var(--primary-color);
    border: 2px solid var(--on-primary);
    cursor: pointer;
  }
  /* Improved line color via CSS */
  .d3-line .lines path.improved { stroke: var(--primary-color); }
</style>
<script>
(() => {
  // Capture this <script> element now: document.currentScript is only set while the
  // script executes synchronously, and is null inside the deferred bootstrap below.
  const scriptEl = document.currentScript;

  const ensureD3 = (cb) => {
    if (window.d3 && typeof window.d3.select === 'function') return cb();
    let s = document.getElementById('d3-cdn-script');
    if (!s) {
      s = document.createElement('script');
      s.id = 'd3-cdn-script';
      s.src = 'https://cdn.jsdelivr.net/npm/d3@7/dist/d3.min.js';
      document.head.appendChild(s);
    }
    const onReady = () => { if (window.d3 && typeof window.d3.select === 'function') cb(); };
    s.addEventListener('load', onReady, { once: true });
    if (window.d3) onReady();
  };

  const bootstrap = () => {
    // Find this embed's own container: the nearest preceding .d3-line sibling of the script.
    // Otherwise fall back to the first .d3-line on the page that is not mounted yet, so that
    // several embeds on one page do not all target the first chart.
    let container = null;
    for (let el = scriptEl && scriptEl.previousElementSibling; el; el = el.previousElementSibling) {
      if (el.classList && el.classList.contains('d3-line')) { container = el; break; }
    }
    if (!container) {
      container = Array.from(document.querySelectorAll('.d3-line')).find((el) => el.dataset.mounted !== 'true') || null;
    }
    if (!container) return;
    if (container.dataset.mounted === 'true') return;
    container.dataset.mounted = 'true';

    // CSV: prefer the public path, fall back to relative paths
    const CSV_PATHS = [
      '/data/image_correspondence_filters.csv',
      './assets/data/image_correspondence_filters.csv',
      '../assets/data/image_correspondence_filters.csv',
      '../../assets/data/image_correspondence_filters.csv'
    ];
    const fetchFirstAvailable = async (paths) => {
      for (const p of paths) {
        try {
          const r = await fetch(p, { cache: 'no-cache' });
          if (r.ok) return await r.text();
        } catch (e) {
          // Network error: try the next candidate path.
        }
      }
      throw new Error('CSV not found: image_correspondence_filters.csv');
    };

    // Controls UI
    const controls = document.createElement('div');
    controls.className = 'd3-line__controls';
    Object.assign(controls.style, {
      marginTop: '12px',
      display: 'flex',
      gap: '16px',
      alignItems: 'center',
      justifyContent: 'space-between',
      width: '100%'
    });

    const labelMetric = document.createElement('label');
    Object.assign(labelMetric.style, {
      fontSize: '12px', color: 'var(--muted-color)', display: 'flex', alignItems: 'center', gap: '6px', whiteSpace: 'nowrap', padding: '6px 10px', marginLeft: 'auto'
    });
    labelMetric.textContent = 'Metric';
    const selectMetric = document.createElement('select');
    Object.assign(selectMetric.style, { fontSize: '12px' });
    labelMetric.appendChild(selectMetric);

    // Inline legend, shown to the left of the metric select
    const legendInline = document.createElement('div');
    legendInline.className = 'controls__legend';
    Object.assign(legendInline.style, {
      display: 'flex',
      gap: '8px',
      alignItems: 'center',
      flexWrap: 'nowrap',
      fontSize: '11px',
      marginLeft: '8px'
    });
    controls.appendChild(legendInline);
    controls.appendChild(labelMetric);

    // SVG canvas
    const svg = d3.select(container).append('svg')
      .attr('width', '100%')
      .style('display', 'block');

    // Marker shapes cycled across runs (drawn at each point by drawMarker)
    const markerShapes = ['circle', 'square', 'triangle', 'diamond', 'inverted-triangle'];
    const markerSize = 8;

    // Layer groups, in paint order
    const gRoot = svg.append('g');
    const gGrid = gRoot.append('g').attr('class', 'grid');
    const gAxes = gRoot.append('g').attr('class', 'axes');
    const gLines = gRoot.append('g').attr('class', 'lines');
    const gPoints = gRoot.append('g').attr('class', 'points');
    const gHover = gRoot.append('g').attr('class', 'hover');
    const gLegend = gRoot.append('foreignObject').attr('class', 'legend');

    // Tooltip
    container.style.position = container.style.position || 'relative';
    let tip = container.querySelector('.d3-tooltip');
    let tipInner;
    if (!tip) {
      tip = document.createElement('div');
      tip.className = 'd3-tooltip';
      Object.assign(tip.style, {
        position: 'absolute', top: '0px', left: '0px', transform: 'translate(-9999px, -9999px)', pointerEvents: 'none',
        padding: '8px 10px', borderRadius: '8px', fontSize: '12px', lineHeight: '1.35', border: '1px solid var(--border-color)',
        background: 'var(--surface-bg)', color: 'var(--text-color)', boxShadow: '0 4px 24px rgba(0,0,0,.18)', opacity: '0',
        transition: 'opacity .12s ease'
      });
      tipInner = document.createElement('div');
      tipInner.className = 'd3-tooltip__inner';
      tipInner.style.textAlign = 'left';
      tip.appendChild(tipInner);
      container.appendChild(tip);
    } else {
      tipInner = tip.querySelector('.d3-tooltip__inner') || tip;
    }

    // Colors per run
    const primary = getComputedStyle(document.documentElement).getPropertyValue('--primary-color').trim() || '#E889AB';
    const pool = [primary, '#4EA5B7', '#E38A42', '#CEC0FA', ...(d3.schemeTableau10 || [])];

    // Mapping from metric keys to display titles
    const metricTitleMapping = {
      'docvqa_val_anls': 'DocVQA',
      'infovqa_val_anls': 'InfoVQA',
      'mme_total_score': 'MME Total',
      'mmmu_val_mmmu_acc': 'MMMU',
      'mmstar_average': 'MMStar',
      'ocrbench_ocrbench_accuracy': 'OCRBench',
      'scienceqa_exact_match': 'ScienceQA',
      'textvqa_val_exact_match': 'TextVQA',
      'average': 'Average (excl. MME)',
      'average_rank': 'Average Rank',
      'ai2d_exact_match': 'AI2D',
      'chartqa_relaxed_overall': 'ChartQA',
      'seedbench_seed_all': 'SeedBench'
    };

    // Display name for a metric key (falls back to the raw key)
    function getMetricDisplayName(metricKey) {
      return metricTitleMapping[metricKey] || metricKey;
    }

    // State and data
    let metricList = [];
    let runList = [];
    let runOrder = [];
    const dataByMetric = new Map(); // metric => { run => [{step,value}] }
    let isRankStrictFlag = false;
    let rankTickMax = 1;

    // Scales and layout
    let width = 800, height = 360;
    let margin = { top: 16, right: 28, bottom: 56, left: 64 };
    let xScale = d3.scaleLinear();
    let yScale = d3.scaleLinear();

    // Line generator: straight segments between consecutive points
    const lineGen = d3.line()
      .x((d) => xScale(d.step))
      .y((d) => yScale(d.value));

    // Append one marker of the given shape to each element of the selection
    function drawMarker(selection, shape, size) {
      const s = size / 2;
      switch (shape) {
        case 'circle':
          return selection.append('circle').attr('r', s);
        case 'square':
          return selection.append('rect').attr('x', -s).attr('y', -s).attr('width', size).attr('height', size);
        case 'triangle':
          return selection.append('path').attr('d', `M0,${-s * 1.2} L${s * 1.1},${s * 0.6} L${-s * 1.1},${s * 0.6} Z`);
        case 'diamond':
          return selection.append('path').attr('d', `M0,${-s * 1.2} L${s * 1.1},0 L0,${s * 1.2} L${-s * 1.1},0 Z`);
        case 'inverted-triangle':
          return selection.append('path').attr('d', `M0,${s * 1.2} L${s * 1.1},${-s * 0.6} L${-s * 1.1},${-s * 0.6} Z`);
        default:
          return selection.append('circle').attr('r', s);
      }
    }

    // Hover elements
    const hoverLine = gHover.append('line').attr('stroke-width', 1);
    const overlay = gHover.append('rect').attr('fill', 'transparent').style('cursor', 'crosshair');

    function updateScales() {
      const isDark = document.documentElement.getAttribute('data-theme') === 'dark';
      const axisColor = isDark ? 'rgba(255,255,255,0.25)' : 'rgba(0,0,0,0.25)';
      const tickColor = isDark ? 'rgba(255,255,255,0.70)' : 'rgba(0,0,0,0.55)';
      const gridColor = isDark ? 'rgba(255,255,255,0.08)' : 'rgba(0,0,0,0.05)';

      width = container.clientWidth || 800;
      height = Math.max(360, Math.round(width / 2.2));
      svg.attr('width', width).attr('height', height);

      const innerWidth = width - margin.left - margin.right;
      const innerHeight = height - margin.top - margin.bottom;
      gRoot.attr('transform', `translate(${margin.left},${margin.top})`);

      xScale.range([0, innerWidth]);
      yScale.range([innerHeight, 0]);

      // Y ticks: one per integer rank for strict rank metrics, otherwise D3's nice ticks
      let yTicks = [];
      if (isRankStrictFlag) {
        const maxR = Math.max(1, Math.round(rankTickMax));
        for (let v = 1; v <= maxR; v += 1) yTicks.push(v);
      } else {
        yTicks = yScale.ticks(6);
      }

      // Grid (horizontal)
      gGrid.selectAll('*').remove();
      gGrid.selectAll('line')
        .data(yTicks)
        .join('line')
        .attr('x1', 0)
        .attr('x2', innerWidth)
        .attr('y1', (d) => yScale(d))
        .attr('y2', (d) => yScale(d))
        .attr('stroke', gridColor)
        .attr('stroke-width', 1)
        .attr('shape-rendering', 'crispEdges');

      // Axes
      gAxes.selectAll('*').remove();
      let xAxis = d3.axisBottom(xScale).tickSizeOuter(0);
      if (isRankStrictFlag) {
        // X ticks every 1000 steps within the current domain
        const [dx0, dx1] = xScale.domain();
        const start = Math.ceil(dx0 / 1000) * 1000;
        const end = Math.floor(dx1 / 1000) * 1000;
        const xTicks = [];
        for (let v = start; v <= end; v += 1000) xTicks.push(v);
        if (xTicks.length === 0) xTicks.push(Math.round(dx0));
        xAxis = xAxis.tickValues(xTicks).tickFormat(d3.format('d'));
      } else {
        xAxis = xAxis.ticks(8);
      }
      const yAxis = d3.axisLeft(yScale)
        .tickValues(yTicks)
        .tickSizeOuter(0)
        .tickFormat(isRankStrictFlag ? d3.format('d') : d3.format('.2f'));
      gAxes.append('g')
        .attr('transform', `translate(0,${innerHeight})`)
        .call(xAxis)
        .call((g) => {
          g.selectAll('path, line').attr('stroke', axisColor);
          g.selectAll('text').attr('fill', tickColor).style('font-size', '12px');
        });
      gAxes.append('g')
        .call(yAxis)
        .call((g) => {
          g.selectAll('path, line').attr('stroke', axisColor);
          g.selectAll('text').attr('fill', tickColor).style('font-size', '12px');
        });

      // Axis labels (X and Y)
      gAxes.append('text')
        .attr('class', 'axis-label axis-label--x')
        .attr('x', innerWidth / 2)
        .attr('y', innerHeight + 44)
        .attr('text-anchor', 'middle')
        .style('font-size', '12px')
        .style('fill', tickColor)
        .text('Step');
      gAxes.append('text')
        .attr('class', 'axis-label axis-label--y')
        .attr('text-anchor', 'middle')
        .attr('transform', `translate(${-44},${innerHeight / 2}) rotate(-90)`)
        .style('font-size', '12px')
        .style('fill', tickColor)
        .text('Value');

      overlay.attr('x', 0).attr('y', 0).attr('width', innerWidth).attr('height', innerHeight);
      hoverLine.attr('y1', 0).attr('y2', innerHeight).attr('stroke', axisColor);

      // In-plot legend placeholder; it stays empty because renderMetric fills the inline legend in the controls instead
      const legendWidth = Math.min(180, Math.max(120, Math.round(innerWidth * 0.22)));
      const legendHeight = 64;
      gLegend
        .attr('x', innerWidth - legendWidth + 42)
        .attr('y', innerHeight - legendHeight - 12)
        .attr('width', legendWidth)
        .attr('height', legendHeight);
      const legendRoot = gLegend.selectAll('div').data([0]).join('xhtml:div');
      Object.assign(legendRoot.node().style, {
        background: 'transparent',
        border: 'none',
        borderRadius: '0',
        padding: '0',
        fontSize: '12px',
        lineHeight: '1.35',
        color: 'var(--text-color)'
      });

      return { innerWidth, innerHeight };
    }

    function renderMetric(metricKey) {
      const map = dataByMetric.get(metricKey) || {};
      const runs = runOrder;
      // Domain
      let minStep = Infinity, maxStep = -Infinity, maxVal = 0, minVal = Infinity;
      const isRank = /rank/i.test(metricKey);
      const isAverage = /average/i.test(metricKey);
      // Rank metrics other than averages are treated as integer ranks.
      const isRankStrict = isRank && !isAverage;
      runs.forEach((r) => {
        const arr = map[r] || [];
        arr.forEach((pt) => {
          const val = isRankStrict ? Math.round(pt.value) : pt.value;
          minStep = Math.min(minStep, pt.step);
          maxStep = Math.max(maxStep, pt.step);
          maxVal = Math.max(maxVal, val);
          minVal = Math.min(minVal, val);
        });
      });
      if (!isFinite(minStep) || !isFinite(maxStep)) { return; }
      xScale.domain([minStep, maxStep]);
      if (isRank) {
        // Inverted axis with rank 1 at the top; round up so fractional average ranks stay in range.
        rankTickMax = Math.max(1, Math.ceil(maxVal));
        yScale.domain([rankTickMax, 1]);
      } else {
        yScale.domain([0, Math.max(1, maxVal)]).nice();
      }
      isRankStrictFlag = isRankStrict;

      updateScales();

      // One series per run, sorted by step (values rounded for strict rank metrics)
      const series = runs.map((r, i) => ({
        run: r,
        color: pool[i % pool.length],
        marker: markerShapes[i % markerShapes.length],
        values: (map[r] || [])
          .slice()
          .sort((a, b) => a.step - b.step)
          .map((pt) => (isRankStrict ? { step: pt.step, value: Math.round(pt.value) } : pt))
      }));

      // Draw lines
      const paths = gLines.selectAll('path.run-line').data(series, (d) => d.run);
      paths.enter().append('path').attr('class', 'run-line').attr('fill', 'none').attr('stroke-width', 2)
        .attr('stroke', (d) => d.color).attr('opacity', 0.9)
        .attr('d', (d) => lineGen(d.values))
        .merge(paths)
        .transition().duration(200)
        .attr('stroke', (d) => d.color)
        .attr('d', (d) => lineGen(d.values));
      paths.exit().remove();

      // Draw a marker at each data point
      gPoints.selectAll('*').remove();
      series.forEach((s, seriesIndex) => {
        const pointGroup = gPoints.selectAll(`.points-${seriesIndex}`)
          .data(s.values)
          .join('g')
          .attr('class', `points-${seriesIndex}`)
          .attr('transform', (d) => `translate(${xScale(d.step)},${yScale(d.value)})`);

        drawMarker(pointGroup, s.marker, markerSize)
          .attr('fill', s.color)
          .attr('stroke', s.color)
          .attr('stroke-width', 1.5)
          .style('cursor', 'crosshair');
      });

      // Inline legend: a marker swatch and the run name for each series
      legendInline.innerHTML = '';
      series.forEach((s) => {
        const legendItem = document.createElement('span');
        legendItem.style.cssText = 'display:inline-flex;align-items:center;gap:6px;white-space:nowrap;';

        // Small SVG holding this run's marker shape
        const markerSvg = document.createElementNS('http://www.w3.org/2000/svg', 'svg');
        markerSvg.setAttribute('width', '16');
        markerSvg.setAttribute('height', '12');
        markerSvg.style.display = 'inline-block';

        const g = document.createElementNS('http://www.w3.org/2000/svg', 'g');
        g.setAttribute('transform', 'translate(8,6)');

        let shape;
        const size = 6;
        const halfSize = size / 2;
        switch (s.marker) {
          case 'circle':
            shape = document.createElementNS('http://www.w3.org/2000/svg', 'circle');
            shape.setAttribute('r', halfSize);
            break;
          case 'square':
            shape = document.createElementNS('http://www.w3.org/2000/svg', 'rect');
            shape.setAttribute('x', -halfSize);
            shape.setAttribute('y', -halfSize);
            shape.setAttribute('width', size);
            shape.setAttribute('height', size);
            break;
          case 'triangle':
            shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
            shape.setAttribute('d', `M0,${-halfSize * 1.2} L${halfSize * 1.1},${halfSize * 0.6} L${-halfSize * 1.1},${halfSize * 0.6} Z`);
            break;
          case 'diamond':
            shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
            shape.setAttribute('d', `M0,${-halfSize * 1.2} L${halfSize * 1.1},0 L0,${halfSize * 1.2} L${-halfSize * 1.1},0 Z`);
            break;
          case 'inverted-triangle':
            shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
            shape.setAttribute('d', `M0,${halfSize * 1.2} L${halfSize * 1.1},${-halfSize * 0.6} L${-halfSize * 1.1},${-halfSize * 0.6} Z`);
            break;
          default:
            shape = document.createElementNS('http://www.w3.org/2000/svg', 'circle');
            shape.setAttribute('r', halfSize);
        }
        shape.setAttribute('fill', s.color);
        shape.setAttribute('stroke', s.color);
        shape.setAttribute('stroke-width', '1');

        g.appendChild(shape);
        markerSvg.appendChild(g);

        const label = document.createElement('span');
        label.textContent = s.run;

        legendItem.appendChild(markerSvg);
        legendItem.appendChild(label);
        legendInline.appendChild(legendItem);
      });

      // Hover: collect every logged step across runs
      const stepSet = new Set();
      series.forEach((s) => s.values.forEach((v) => stepSet.add(v.step)));
      const steps = Array.from(stepSet).sort((a, b) => a - b);
      function onMove(event) {
        const [mx, my] = d3.pointer(event, overlay.node());
        // Snap to the logged step closest to the cursor position.
        const xv = xScale.invert(mx);
        const nearest = steps.reduce((best, s) => (Math.abs(s - xv) < Math.abs(best - xv) ? s : best), steps[0]);
        const xpx = xScale(nearest);
        // Keep the theme-aware stroke color that updateScales() assigned to hoverLine.
        hoverLine.attr('x1', xpx).attr('x2', xpx).style('display', null);
        // Tooltip content
        let html = `<div><strong>${getMetricDisplayName(metricKey)}</strong></div><div><strong>step</strong> ${nearest}</div>`;
        series.forEach((s) => {
          const m = new Map(s.values.map((v) => [v.step, v.value]));
          const val = m.has(nearest) ? m.get(nearest) : null;
          if (val != null) {
            const formatVal = (vv) => (isRankStrict ? d3.format('d')(vv) : (+vv).toFixed(4));
            html += `<div><span style="display:inline-block;width:10px;height:10px;background:${s.color};border-radius:50%;margin-right:6px;"></span><strong>${s.run}</strong> ${formatVal(val)}</div>`;
          }
        });
        tipInner.innerHTML = html;
        const offsetX = 12, offsetY = 12;
        tip.style.opacity = '1';
        tip.style.transform = `translate(${Math.round(mx + offsetX + margin.left)}px, ${Math.round(my + offsetY + margin.top)}px)`;
      }
      function onLeave() {
        tip.style.opacity = '0';
        tip.style.transform = 'translate(-9999px, -9999px)';
        hoverLine.style('display', 'none');
      }
      overlay.on('mousemove', onMove).on('mouseleave', onLeave);
    }

| 531 |
+
    // Load CSV and wire controls
    (async () => {
      try {
        const text = await fetchFirstAvailable(CSV_PATHS);
        const rows = d3.csvParse(text, d => ({ run: (d.run || '').trim(), step: +d.step, metric: (d.metric || '').trim(), value: +d.value }));
        metricList = Array.from(new Set(rows.map(r => r.metric))).sort();
        runList = Array.from(new Set(rows.map(r => r.run))).sort();
        runOrder = runList;
        // Build dataByMetric
        metricList.forEach(m => {
          const map = {};
          runList.forEach(r => { map[r] = []; });
          rows.filter(r => r.metric === m).forEach(r => {
            if (!isNaN(r.step) && !isNaN(r.value)) map[r.run].push({ step: r.step, value: r.value });
          });
          dataByMetric.set(m, map);
        });

        // Populate metric select (default to average_rank if present)
        metricList.forEach((m) => {
          const o = document.createElement('option');
          o.value = m;
          o.textContent = getMetricDisplayName(m);
          selectMetric.appendChild(o);
        });
        const def = metricList.find(m => /average_rank/i.test(m)) || metricList[0];
        if (def) selectMetric.value = def;

        container.appendChild(controls);
        updateScales();
        renderMetric(selectMetric.value);

        selectMetric.addEventListener('change', () => { renderMetric(selectMetric.value); });

        const rerender = () => { renderMetric(selectMetric.value); };
        if (window.ResizeObserver) {
          const ro = new ResizeObserver(() => rerender());
          ro.observe(container);
        } else {
          window.addEventListener('resize', rerender);
        }
      } catch (e) {
        const pre = document.createElement('pre');
        pre.textContent = 'CSV load error: ' + (e && e.message ? e.message : e);
        pre.style.color = 'var(--danger, #b00020)';
        pre.style.fontSize = '12px';
        pre.style.whiteSpace = 'pre-wrap';
        container.appendChild(pre);
      }
    })();
  };

  if (document.readyState === 'loading') {
    document.addEventListener('DOMContentLoaded', () => ensureD3(bootstrap), { once: true });
  } else {
    ensureD3(bootstrap);
  }
})();
</script>
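The embed that ends above parses a long-format CSV (one row per `run`, `step`, `metric`, `value`) and regroups it into one series per run for every metric. A minimal sketch of that grouping as a standalone, testable function, assuming records shaped like `d3.csvParse` output without a row accessor (all fields are strings). `groupRowsByMetric` and `toNumber` are hypothetical names, not part of the embed. Two deliberate differences: it walks the rows once instead of calling `filter` once per metric, and it treats blank cells as missing, whereas unary `+` (as in `step: +d.step`) turns an empty string into `0`, which the embed's `isNaN` check does not catch.

```javascript
// Minimal sketch (hypothetical helpers, not the embed's code): regroup
// long-format CSV records into Map<metric, { [run]: Array<{ step, value }> }>.
// Records are plain objects with string fields run, step, metric, value,
// e.g. what d3.csvParse returns when no row accessor is given.

// Parse a CSV cell as a number; blank or missing cells become NaN
// (unary + would turn '' into 0).
function toNumber(cell) {
  return cell == null || String(cell).trim() === '' ? NaN : Number(cell);
}

function groupRowsByMetric(records) {
  const rows = records.map((d) => ({
    run: (d.run || '').trim(),
    metric: (d.metric || '').trim(),
    step: toNumber(d.step),
    value: toNumber(d.value),
  }));
  // Sorted run names, so every metric lists its runs in the same order.
  const runs = Array.from(new Set(rows.map((r) => r.run))).sort();
  const byMetric = new Map();

  // One pass over the rows instead of one filter() per metric.
  for (const r of rows) {
    if (!byMetric.has(r.metric)) {
      // Every run gets a (possibly empty) series for every metric.
      const perRun = {};
      for (const run of runs) perRun[run] = [];
      byMetric.set(r.metric, perRun);
    }
    if (Number.isNaN(r.step) || Number.isNaN(r.value)) continue; // skip incomplete rows
    byMetric.get(r.metric)[r.run].push({ step: r.step, value: r.value });
  }

  // Sort each series by step so lines are drawn left to right.
  for (const perRun of byMetric.values()) {
    for (const run of runs) perRun[run].sort((a, b) => a.step - b.step);
  }
  return byMetric;
}
```

Iterating `[...groupRowsByMetric(records).keys()].sort()` gives the same sorted metric list the embed uses to fill its select.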
app/src/content/embeds/internal-deduplication.html
ADDED
@@ -0,0 +1,576 @@
<div class="d3-line" style="width:100%;margin:10px 0;"></div>
<style>
  .d3-line .d3-line__controls select {
    font-size: 12px;
    padding: 8px 28px 8px 10px;
    border: 1px solid var(--border-color);
    border-radius: 8px;
    background-color: var(--surface-bg);
    color: var(--text-color);
    background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='12' height='12' viewBox='0 0 24 24' fill='none' stroke='%230f1115' stroke-width='2' stroke-linecap='round' stroke-linejoin='round'%3E%3Cpolyline points='6 9 12 15 18 9'/%3E%3C/svg%3E");
    background-repeat: no-repeat;
    background-position: right 8px center;
    background-size: 12px;
    -webkit-appearance: none;
    -moz-appearance: none;
    appearance: none;
    cursor: pointer;
    transition: border-color .15s ease, box-shadow .15s ease;
  }
  [data-theme="dark"] .d3-line .d3-line__controls select {
    background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='12' height='12' viewBox='0 0 24 24' fill='none' stroke='%23ffffff' stroke-width='2' stroke-linecap='round' stroke-linejoin='round'%3E%3Cpolyline points='6 9 12 15 18 9'/%3E%3C/svg%3E");
  }
  .d3-line .d3-line__controls select:hover {
    border-color: var(--primary-color);
  }
  .d3-line .d3-line__controls select:focus {
    border-color: var(--primary-color);
    box-shadow: 0 0 0 3px rgba(232,137,171,.25);
    outline: none;
  }
  .d3-line .d3-line__controls label { gap: 8px; }

  /* Range slider themed with --primary-color */
  .d3-line .d3-line__controls input[type="range"] {
    -webkit-appearance: none;
    appearance: none;
    width: 100%;
    height: 6px;
    border-radius: 999px;
    background: var(--border-color);
    outline: none;
  }
  .d3-line .d3-line__controls input[type="range"]::-webkit-slider-runnable-track {
    height: 6px;
    background: transparent;
    border-radius: 999px;
  }
  .d3-line .d3-line__controls input[type="range"]::-webkit-slider-thumb {
    -webkit-appearance: none;
    appearance: none;
    width: 16px;
    height: 16px;
    border-radius: 50%;
    background: var(--primary-color);
    border: 2px solid var(--on-primary);
    margin-top: -5px;
    cursor: pointer;
  }
  .d3-line .d3-line__controls input[type="range"]::-moz-range-track {
    height: 6px;
    background: transparent;
    border-radius: 999px;
  }
  .d3-line .d3-line__controls input[type="range"]::-moz-range-thumb {
    width: 16px;
    height: 16px;
    border-radius: 50%;
    background: var(--primary-color);
    border: 2px solid var(--on-primary);
    cursor: pointer;
  }
  /* Improved line color via CSS */
  .d3-line .lines path.improved { stroke: var(--primary-color); }
</style>
<script>
(() => {
  const ensureD3 = (cb) => {
    if (window.d3 && typeof window.d3.select === 'function') return cb();
    let s = document.getElementById('d3-cdn-script');
    if (!s) {
      s = document.createElement('script');
      s.id = 'd3-cdn-script';
      s.src = 'https://cdn.jsdelivr.net/npm/d3@7/dist/d3.min.js';
      document.head.appendChild(s);
    }
    const onReady = () => { if (window.d3 && typeof window.d3.select === 'function') cb(); };
    s.addEventListener('load', onReady, { once: true });
    if (window.d3) onReady();
  };

  // Capture this <script> element now: document.currentScript is only set while
  // the script runs synchronously and is null once bootstrap() is invoked later
  // from DOMContentLoaded or the d3 load callback.
  const scriptEl = document.currentScript;

  const bootstrap = () => {
    // Mount into the nearest preceding .d3-line sibling of this script, so that
    // several embeds on one page each get their own container; otherwise fall
    // back to the first .d3-line in the document.
    let mount = scriptEl ? scriptEl.previousElementSibling : null;
    while (mount && !(mount.classList && mount.classList.contains('d3-line'))) {
      mount = mount.previousElementSibling;
    }
    const container = mount || document.querySelector('.d3-line');
    if (!container) return;
    if (container.dataset) {
      if (container.dataset.mounted === 'true') return;
      container.dataset.mounted = 'true';
    }

    // CSV: prefer public path, fallback to relative
    const CSV_PATHS = [
      '/data/internal_deduplication.csv',
      './assets/data/internal_deduplication.csv',
      '../assets/data/internal_deduplication.csv',
      '../../assets/data/internal_deduplication.csv'
    ];
    const fetchFirstAvailable = async (paths) => {
      for (const p of paths) {
        try {
          const r = await fetch(p, { cache: 'no-cache' });
          if (r.ok) return await r.text();
        } catch (e) { /* network error: try the next path */ }
      }
      throw new Error('CSV not found: internal_deduplication.csv');
    };

    // Controls UI
    const controls = document.createElement('div');
    controls.className = 'd3-line__controls';
    Object.assign(controls.style, {
      marginTop: '12px',
      display: 'flex',
      gap: '16px',
      alignItems: 'center',
      justifyContent: 'space-between',
      width: '100%'
    });

    const labelMetric = document.createElement('label');
    Object.assign(labelMetric.style, {
      fontSize: '12px', color: 'var(--muted-color)', display: 'flex', alignItems: 'center', gap: '6px', whiteSpace: 'nowrap', padding: '6px 10px', marginLeft: 'auto'
    });
    labelMetric.textContent = 'Metric';
    const selectMetric = document.createElement('select');
    Object.assign(selectMetric.style, { fontSize: '12px' });
    labelMetric.appendChild(selectMetric);

    // Inline legend; it is appended before the label, so it sits to the left of
    // the metric select (the label is pushed right by marginLeft: 'auto')
    const legendInline = document.createElement('div');
    legendInline.className = 'controls__legend';
    Object.assign(legendInline.style, {
      display: 'flex',
      gap: '8px',
      alignItems: 'center',
      flexWrap: 'nowrap',
      fontSize: '11px',
      marginLeft: '8px'
    });
    controls.appendChild(legendInline);
    controls.appendChild(labelMetric);

    // Create the SVG
    const svg = d3.select(container).append('svg')
      .attr('width', '100%')
      .style('display', 'block');

    // <defs> container (currently unused: drawMarker() draws the shapes inline)
    const defs = svg.append('defs');

    // Academic marker shapes
    const markerShapes = ['circle', 'square', 'triangle', 'diamond', 'inverted-triangle'];
    const markerSize = 8;

    // Groups
    const gRoot = svg.append('g');
    const gGrid = gRoot.append('g').attr('class', 'grid');
    const gAxes = gRoot.append('g').attr('class', 'axes');
    const gLines = gRoot.append('g').attr('class', 'lines');
    const gPoints = gRoot.append('g').attr('class', 'points');
    const gHover = gRoot.append('g').attr('class', 'hover');
    const gLegend = gRoot.append('foreignObject').attr('class', 'legend');

    // Tooltip
    container.style.position = container.style.position || 'relative';
    let tip = container.querySelector('.d3-tooltip');
    let tipInner;
    if (!tip) {
      tip = document.createElement('div');
      tip.className = 'd3-tooltip';
      Object.assign(tip.style, {
        position: 'absolute', top: '0px', left: '0px', transform: 'translate(-9999px, -9999px)', pointerEvents: 'none',
        padding: '8px 10px', borderRadius: '8px', fontSize: '12px', lineHeight: '1.35', border: '1px solid var(--border-color)',
        background: 'var(--surface-bg)', color: 'var(--text-color)', boxShadow: '0 4px 24px rgba(0,0,0,.18)', opacity: '0',
        transition: 'opacity .12s ease'
      });
      tipInner = document.createElement('div');
      tipInner.className = 'd3-tooltip__inner';
      tipInner.style.textAlign = 'left';
      tip.appendChild(tipInner);
      container.appendChild(tip);
    } else {
      tipInner = tip.querySelector('.d3-tooltip__inner') || tip;
    }

    // Colors per run
    const primary = getComputedStyle(document.documentElement).getPropertyValue('--primary-color').trim() || '#E889AB';
    const pool = [primary, '#4EA5B7', '#E38A42', '#CEC0FA', ...(d3.schemeTableau10 || [])];

    // Mapping from metric names to display titles
    const metricTitleMapping = {
      'docvqa_val_anls': 'DocVQA',
      'infovqa_val_anls': 'InfoVQA',
      'mme_total_score': 'MME Total',
      'mmmu_val_mmmu_acc': 'MMMU',
      'mmstar_average': 'MMStar',
      'ocrbench_ocrbench_accuracy': 'OCRBench',
      'scienceqa_exact_match': 'ScienceQA',
      'textvqa_val_exact_match': 'TextVQA',
      'average': 'Average (excl. MME)',
      'average_rank': 'Average Rank',
      'ai2d_exact_match': 'AI2D',
      'chartqa_relaxed_overall': 'ChartQA',
      'seedbench_seed_all': 'SeedBench'
    };

    // Display name for a metric key (falls back to the raw key)
    function getMetricDisplayName(metricKey) {
      return metricTitleMapping[metricKey] || metricKey;
    }

    // State and data
    let metricList = [];
    let runList = [];
    let runOrder = [];
    const dataByMetric = new Map(); // metric => { run => [{step,value}] }
    let isRankStrictFlag = false;
    let rankTickMax = 1;

    // Scales and layout
    let width = 800, height = 360;
    let margin = { top: 16, right: 28, bottom: 56, left: 64 };
    let xScale = d3.scaleLinear();
    let yScale = d3.scaleLinear();

    // Line generators - simple linear connections
    const lineGen = d3.line()
      .x((d) => xScale(d.step))
      .y((d) => yScale(d.value));

    // Function to draw different marker shapes
    function drawMarker(selection, shape, size) {
      const s = size / 2;
      switch (shape) {
        case 'circle':
          return selection.append('circle').attr('r', s);
        case 'square':
          return selection.append('rect').attr('x', -s).attr('y', -s).attr('width', size).attr('height', size);
        case 'triangle':
          return selection.append('path').attr('d', `M0,${-s * 1.2} L${s * 1.1},${s * 0.6} L${-s * 1.1},${s * 0.6} Z`);
        case 'diamond':
          return selection.append('path').attr('d', `M0,${-s * 1.2} L${s * 1.1},0 L0,${s * 1.2} L${-s * 1.1},0 Z`);
        case 'inverted-triangle':
          return selection.append('path').attr('d', `M0,${s * 1.2} L${s * 1.1},${-s * 0.6} L${-s * 1.1},${-s * 0.6} Z`);
        default:
          return selection.append('circle').attr('r', s);
      }
    }

    // Hover elements
    const hoverLine = gHover.append('line').attr('stroke-width', 1);

    const overlay = gHover.append('rect').attr('fill', 'transparent').style('cursor', 'crosshair');

    function updateScales() {
      const isDark = document.documentElement.getAttribute('data-theme') === 'dark';
      const axisColor = isDark ? 'rgba(255,255,255,0.25)' : 'rgba(0,0,0,0.25)';
      const tickColor = isDark ? 'rgba(255,255,255,0.70)' : 'rgba(0,0,0,0.55)';
      const gridColor = isDark ? 'rgba(255,255,255,0.08)' : 'rgba(0,0,0,0.05)';

      width = container.clientWidth || 800;
      height = Math.max(360, Math.round(width / 2.2));
      svg.attr('width', width).attr('height', height);

      const innerWidth = width - margin.left - margin.right;
      const innerHeight = height - margin.top - margin.bottom;
      gRoot.attr('transform', `translate(${margin.left},${margin.top})`);

      xScale.range([0, innerWidth]);
      yScale.range([innerHeight, 0]);

      // Compute Y ticks
      let yTicks = [];
      if (isRankStrictFlag) {
        const maxR = Math.max(1, Math.round(rankTickMax));
        for (let v = 1; v <= maxR; v += 1) yTicks.push(v);
      } else {
        // Use D3's tick generator to produce nice floating-point ticks
        yTicks = yScale.ticks(6);
      }

      // Grid (horizontal)
      gGrid.selectAll('*').remove();
      gGrid.selectAll('line')
        .data(yTicks)
        .join('line')
        .attr('x1', 0)
        .attr('x2', innerWidth)
        .attr('y1', (d) => yScale(d))
        .attr('y2', (d) => yScale(d))
        .attr('stroke', gridColor)
        .attr('stroke-width', 1)
        .attr('shape-rendering', 'crispEdges');

      // Axes
      gAxes.selectAll('*').remove();
      let xAxis = d3.axisBottom(xScale).tickSizeOuter(0);
      if (isRankStrictFlag) {
        const [dx0, dx1] = xScale.domain();
        const start = Math.ceil(dx0 / 1000) * 1000;
        const end = Math.floor(dx1 / 1000) * 1000;
        const xTicks = [];
        for (let v = start; v <= end; v += 1000) xTicks.push(v);
        if (xTicks.length === 0) xTicks.push(Math.round(dx0));
        xAxis = xAxis.tickValues(xTicks).tickFormat(d3.format('d'));
      } else {
        xAxis = xAxis.ticks(8);
      }
      const yAxis = d3.axisLeft(yScale)
        .tickValues(yTicks)
        .tickSizeOuter(0)
        .tickFormat(isRankStrictFlag ? d3.format('d') : d3.format('.2f'));
      gAxes.append('g')
        .attr('transform', `translate(0,${innerHeight})`)
        .call(xAxis)
        .call((g) => {
          g.selectAll('path, line').attr('stroke', axisColor);
          g.selectAll('text').attr('fill', tickColor).style('font-size', '12px');
        });
      gAxes.append('g')
        .call(yAxis)
        .call((g) => {
          g.selectAll('path, line').attr('stroke', axisColor);
          g.selectAll('text').attr('fill', tickColor).style('font-size', '12px');
        });

      // Axis labels (X and Y)
      gAxes.append('text')
        .attr('class', 'axis-label axis-label--x')
        .attr('x', innerWidth / 2)
        .attr('y', innerHeight + 44)
        .attr('text-anchor', 'middle')
        .style('font-size', '12px')
        .style('fill', tickColor)
        .text('Step');
      gAxes.append('text')
        .attr('class', 'axis-label axis-label--y')
        .attr('text-anchor', 'middle')
        .attr('transform', `translate(${-44},${innerHeight / 2}) rotate(-90)`)
        .style('font-size', '12px')
        .style('fill', tickColor)
        .text('Value');

      overlay.attr('x', 0).attr('y', 0).attr('width', innerWidth).attr('height', innerHeight);
      hoverLine.attr('y1', 0).attr('y2', innerHeight).attr('stroke', axisColor);

      // Legend placeholder (foreignObject). renderMetric() writes the visible
      // legend into the inline controls bar, so this box currently stays empty.
      const legendWidth = Math.min(180, Math.max(120, Math.round(innerWidth * 0.22)));
      const legendHeight = 64;
      gLegend
        .attr('x', innerWidth - legendWidth + 42)
        .attr('y', innerHeight - legendHeight - 12)
        .attr('width', legendWidth)
        .attr('height', legendHeight);
      const legendRoot = gLegend.selectAll('div').data([0]).join('xhtml:div');
      Object.assign(legendRoot.node().style, {
        background: 'transparent',
        border: 'none',
        borderRadius: '0',
        padding: '0',
        fontSize: '12px',
        lineHeight: '1.35',
        color: 'var(--text-color)'
      });

      return { innerWidth, innerHeight };
    }

    function renderMetric(metricKey) {
      const map = dataByMetric.get(metricKey) || {};
      const runs = runOrder;
      // Domain
      let minStep = Infinity, maxStep = -Infinity, maxVal = 0, minVal = Infinity;
      const isRank = /rank/i.test(metricKey);
      const isAverage = /average/i.test(metricKey);
      const isRankStrict = isRank && !isAverage;
      runs.forEach(r => {
        const arr = map[r] || [];
        arr.forEach(pt => {
          const val = isRankStrict ? Math.round(pt.value) : pt.value;
          minStep = Math.min(minStep, pt.step);
          maxStep = Math.max(maxStep, pt.step);
          maxVal = Math.max(maxVal, val);
          minVal = Math.min(minVal, val);
        });
      });
      if (!isFinite(minStep) || !isFinite(maxStep)) { return; }
      xScale.domain([minStep, maxStep]);
      if (isRank) {
        // Ranks are plotted with rank 1 at the top. Round up so that fractional
        // average ranks (e.g. 3.4) are not drawn below the plot area.
        rankTickMax = Math.max(1, Math.ceil(maxVal));
        yScale.domain([rankTickMax, 1]);
      } else {
        yScale.domain([0, Math.max(1, maxVal)]).nice();
      }
      isRankStrictFlag = isRankStrict;

      const { innerWidth, innerHeight } = updateScales();

      // Bind lines and markers
      const series = runs.map((r, i) => ({
        run: r,
        color: pool[i % pool.length],
        marker: markerShapes[i % markerShapes.length],
        values: (map[r] || [])
          .slice()
          .sort((a, b) => a.step - b.step)
          .map(pt => isRankStrict ? { step: pt.step, value: Math.round(pt.value) } : pt)
      }));

      // Draw lines
      const paths = gLines.selectAll('path.run-line').data(series, d => d.run);
      paths.enter().append('path').attr('class', 'run-line').attr('fill', 'none').attr('stroke-width', 2)
        .attr('stroke', d => d.color).attr('opacity', 0.9)
        .attr('d', d => lineGen(d.values))
        .merge(paths)
        .transition().duration(200)
        .attr('stroke', d => d.color)
        .attr('d', d => lineGen(d.values));
      paths.exit().remove();

      // Draw markers for each data point
      gPoints.selectAll('*').remove();
      series.forEach((s, seriesIndex) => {
        const pointGroup = gPoints.selectAll(`.points-${seriesIndex}`)
          .data(s.values)
          .join('g')
          .attr('class', `points-${seriesIndex}`)
          .attr('transform', d => `translate(${xScale(d.step)},${yScale(d.value)})`);

        drawMarker(pointGroup, s.marker, markerSize)
          .attr('fill', s.color)
          .attr('stroke', s.color)
          .attr('stroke-width', 1.5)
          .style('cursor', 'crosshair');
      });

      // Inline legend content with marker shapes
      legendInline.innerHTML = '';
      series.forEach(s => {
        const legendItem = document.createElement('span');
        legendItem.style.cssText = 'display:inline-flex;align-items:center;gap:6px;white-space:nowrap;';

        // Create small SVG for marker shape
        const markerSvg = document.createElementNS('http://www.w3.org/2000/svg', 'svg');
        markerSvg.setAttribute('width', '16');
        markerSvg.setAttribute('height', '12');
        markerSvg.style.display = 'inline-block';

        const g = document.createElementNS('http://www.w3.org/2000/svg', 'g');
        g.setAttribute('transform', 'translate(8,6)');

        let shape;
        const size = 6;
        const halfSize = size / 2;
        switch (s.marker) {
          case 'circle':
            shape = document.createElementNS('http://www.w3.org/2000/svg', 'circle');
            shape.setAttribute('r', halfSize);
            break;
          case 'square':
            shape = document.createElementNS('http://www.w3.org/2000/svg', 'rect');
            shape.setAttribute('x', -halfSize);
            shape.setAttribute('y', -halfSize);
            shape.setAttribute('width', size);
            shape.setAttribute('height', size);
            break;
          case 'triangle':
            shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
            shape.setAttribute('d', `M0,${-halfSize * 1.2} L${halfSize * 1.1},${halfSize * 0.6} L${-halfSize * 1.1},${halfSize * 0.6} Z`);
            break;
          case 'diamond':
            shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
            shape.setAttribute('d', `M0,${-halfSize * 1.2} L${halfSize * 1.1},0 L0,${halfSize * 1.2} L${-halfSize * 1.1},0 Z`);
            break;
          case 'inverted-triangle':
            shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
            shape.setAttribute('d', `M0,${halfSize * 1.2} L${halfSize * 1.1},${-halfSize * 0.6} L${-halfSize * 1.1},${-halfSize * 0.6} Z`);
            break;
          default:
            shape = document.createElementNS('http://www.w3.org/2000/svg', 'circle');
            shape.setAttribute('r', halfSize);
        }
        shape.setAttribute('fill', s.color);
        shape.setAttribute('stroke', s.color);
        shape.setAttribute('stroke-width', '1');

        g.appendChild(shape);
        markerSvg.appendChild(g);

        const label = document.createElement('span');
        label.textContent = s.run;

        legendItem.appendChild(markerSvg);
        legendItem.appendChild(label);
        legendInline.appendChild(legendItem);
      });

      // Hover
      const stepSet = new Set();
      series.forEach(s => s.values.forEach(v => stepSet.add(v.step)));
      const steps = Array.from(stepSet).sort((a, b) => a - b);
      // Per-run step -> value lookups, built once per render instead of on every mousemove
      const valueMaps = series.map(s => new Map(s.values.map(v => [v.step, v.value])));
      const formatVal = (vv) => (isRankStrict ? d3.format('d')(vv) : (+vv).toFixed(4));
      function onMove(event) {
        const [mx, my] = d3.pointer(event, overlay.node());
        // Snap to the logged step closest to the cursor (ties keep the smaller step)
        const x = xScale.invert(mx);
        const nearest = steps.reduce((best, s) => Math.abs(s - x) < Math.abs(best - x) ? s : best, steps[0]);
        const xpx = xScale(nearest);
        // Stroke color is left to updateScales(), which picks it per theme
        hoverLine.attr('x1', xpx).attr('x2', xpx).style('display', null);
        // Tooltip content
        let html = `<div><strong>${getMetricDisplayName(metricKey)}</strong></div><div><strong>step</strong> ${nearest}</div>`;
        series.forEach((s, i) => {
          const val = valueMaps[i].has(nearest) ? valueMaps[i].get(nearest) : null;
          if (val != null) {
            html += `<div><span style="display:inline-block;width:10px;height:10px;background:${s.color};border-radius:50%;margin-right:6px;"></span><strong>${s.run}</strong> ${formatVal(val)}</div>`;
          }
        });
        tipInner.innerHTML = html;
        const offsetX = 12, offsetY = 12;
        tip.style.opacity = '1';
        tip.style.transform = `translate(${Math.round(mx + offsetX + margin.left)}px, ${Math.round(my + offsetY + margin.top)}px)`;
      }
      function onLeave() {
        tip.style.opacity = '0';
        tip.style.transform = 'translate(-9999px, -9999px)';
        hoverLine.style('display', 'none');
      }
      overlay.on('mousemove', onMove).on('mouseleave', onLeave);
    }

    // Load CSV and wire controls
    (async () => {
      try {
        const text = await fetchFirstAvailable(CSV_PATHS);
        const rows = d3.csvParse(text, d => ({ run: (d.run || '').trim(), step: +d.step, metric: (d.metric || '').trim(), value: +d.value }));
        metricList = Array.from(new Set(rows.map(r => r.metric))).sort();
        runList = Array.from(new Set(rows.map(r => r.run))).sort();
        runOrder = runList;
        // Build dataByMetric
        metricList.forEach(m => {
          const map = {};
          runList.forEach(r => { map[r] = []; });
          rows.filter(r => r.metric === m).forEach(r => {
            if (!isNaN(r.step) && !isNaN(r.value)) map[r.run].push({ step: r.step, value: r.value });
          });
          dataByMetric.set(m, map);
        });

        // Populate metric select (default to average_rank if present)
        metricList.forEach((m) => {
          const o = document.createElement('option');
          o.value = m;
          o.textContent = getMetricDisplayName(m);
          selectMetric.appendChild(o);
        });
        const def = metricList.find(m => /average_rank/i.test(m)) || metricList[0];
        if (def) selectMetric.value = def;

        container.appendChild(controls);
        updateScales();
        renderMetric(selectMetric.value);

        selectMetric.addEventListener('change', () => { renderMetric(selectMetric.value); });

        const rerender = () => { renderMetric(selectMetric.value); };
        if (window.ResizeObserver) {
          const ro = new ResizeObserver(() => rerender());
          ro.observe(container);
        } else {
          window.addEventListener('resize', rerender);
        }
      } catch (e) {
        const pre = document.createElement('pre');
        pre.textContent = 'CSV load error: ' + (e && e.message ? e.message : e);
        pre.style.color = 'var(--danger, #b00020)';
        pre.style.fontSize = '12px';
        pre.style.whiteSpace = 'pre-wrap';
        container.appendChild(pre);
      }
    })();
  };

  if (document.readyState === 'loading') {
    document.addEventListener('DOMContentLoaded', () => ensureD3(bootstrap), { once: true });
  } else {
    ensureD3(bootstrap);
  }
})();
</script>
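In `renderMetric`, the hover handler snaps the cursor to the nearest logged step with a linear `reduce` over every step. The sketch below does the same snapping with a binary search over the sorted `steps` array. `nearestStep` is a hypothetical helper, not part of the embed; ties resolve to the smaller step, matching the strict `<` comparison in the original `reduce`.

```javascript
// Hypothetical helper, not part of the embed: given an ascending array of
// steps, return the step closest to x (undefined for an empty array).
// Binary search: O(log n) per mousemove instead of a linear reduce().
function nearestStep(steps, x) {
  if (steps.length === 0) return undefined;
  let lo = 0;
  let hi = steps.length - 1;
  // Find the first index whose step is >= x (or the last index if none is).
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (steps[mid] < x) lo = mid + 1;
    else hi = mid;
  }
  // Compare with the left neighbor; `<=` sends ties to the smaller step,
  // like the strict `<` in the original reduce().
  if (lo > 0 && x - steps[lo - 1] <= steps[lo] - x) return steps[lo - 1];
  return steps[lo];
}
```

Inside `onMove` it would be called as `nearestStep(steps, xScale.invert(mx))`. For short step lists the linear scan is already cheap, so the benefit is mainly that the lookup stays logarithmic as the number of logged steps grows; d3-array's bisector helpers are an alternative if relying on d3 there is acceptable.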
app/src/content/embeds/relevance-filters.html
ADDED
|
@@ -0,0 +1,576 @@
<div class="d3-line" style="width:100%;margin:10px 0;"></div>
<style>
  .d3-line .d3-line__controls select {
    font-size: 12px;
    padding: 8px 28px 8px 10px;
    border: 1px solid var(--border-color);
    border-radius: 8px;
    background-color: var(--surface-bg);
    color: var(--text-color);
    background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='12' height='12' viewBox='0 0 24 24' fill='none' stroke='%230f1115' stroke-width='2' stroke-linecap='round' stroke-linejoin='round'%3E%3Cpolyline points='6 9 12 15 18 9'/%3E%3C/svg%3E");
    background-repeat: no-repeat;
    background-position: right 8px center;
    background-size: 12px;
    -webkit-appearance: none;
    -moz-appearance: none;
    appearance: none;
    cursor: pointer;
    transition: border-color .15s ease, box-shadow .15s ease;
  }
  [data-theme="dark"] .d3-line .d3-line__controls select {
    background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='12' height='12' viewBox='0 0 24 24' fill='none' stroke='%23ffffff' stroke-width='2' stroke-linecap='round' stroke-linejoin='round'%3E%3Cpolyline points='6 9 12 15 18 9'/%3E%3C/svg%3E");
  }
  .d3-line .d3-line__controls select:hover {
    border-color: var(--primary-color);
  }
  .d3-line .d3-line__controls select:focus {
    border-color: var(--primary-color);
    box-shadow: 0 0 0 3px rgba(232,137,171,.25);
    outline: none;
  }
  .d3-line .d3-line__controls label { gap: 8px; }

  /* Range slider themed with --primary-color */
  .d3-line .d3-line__controls input[type="range"] {
    -webkit-appearance: none;
    appearance: none;
    width: 100%;
    height: 6px;
    border-radius: 999px;
    background: var(--border-color);
    outline: none;
  }
  .d3-line .d3-line__controls input[type="range"]::-webkit-slider-runnable-track {
    height: 6px;
    background: transparent;
    border-radius: 999px;
  }
  .d3-line .d3-line__controls input[type="range"]::-webkit-slider-thumb {
    -webkit-appearance: none;
    appearance: none;
    width: 16px;
    height: 16px;
    border-radius: 50%;
    background: var(--primary-color);
    border: 2px solid var(--on-primary);
    margin-top: -5px;
    cursor: pointer;
  }
  .d3-line .d3-line__controls input[type="range"]::-moz-range-track {
    height: 6px;
    background: transparent;
    border-radius: 999px;
  }
  .d3-line .d3-line__controls input[type="range"]::-moz-range-thumb {
    width: 16px;
    height: 16px;
    border-radius: 50%;
    background: var(--primary-color);
    border: 2px solid var(--on-primary);
    cursor: pointer;
  }
  /* Improved line color via CSS */
  .d3-line .lines path.improved { stroke: var(--primary-color); }
</style>
<script>
  (() => {
    // Capture this <script> synchronously: document.currentScript is null once
    // bootstrap runs asynchronously (e.g. after D3 finishes loading from the CDN).
    const thisScript = document.currentScript;

    const ensureD3 = (cb) => {
      if (window.d3 && typeof window.d3.select === 'function') return cb();
      let s = document.getElementById('d3-cdn-script');
      if (!s) {
        s = document.createElement('script');
        s.id = 'd3-cdn-script';
        s.src = 'https://cdn.jsdelivr.net/npm/d3@7/dist/d3.min.js';
        document.head.appendChild(s);
      }
      const onReady = () => { if (window.d3 && typeof window.d3.select === 'function') cb(); };
      s.addEventListener('load', onReady, { once: true });
      if (window.d3) onReady();
    };

    const bootstrap = () => {
      const mount = thisScript ? thisScript.previousElementSibling : null;
      // Fall back to the first .d3-line that is not mounted yet, so several embeds
      // on the same page each claim their own container instead of the first one.
      const container = (mount && mount.querySelector && mount.querySelector('.d3-line')) || document.querySelector('.d3-line:not([data-mounted="true"])');
      if (!container) return;
      if (container.dataset) {
        if (container.dataset.mounted === 'true') return;
        container.dataset.mounted = 'true';
      }

      // CSV: prefer public path, fallback to relative
      const CSV_PATHS = [
        '/data/relevance_filters.csv',
        './assets/data/relevance_filters.csv',
        '../assets/data/relevance_filters.csv',
        '../../assets/data/relevance_filters.csv'
      ];
      const fetchFirstAvailable = async (paths) => {
        for (const p of paths) {
          try { const r = await fetch(p, { cache: 'no-cache' }); if (r.ok) return await r.text(); } catch(e) {}
        }
        throw new Error('CSV not found: relevance_filters.csv');
      };

      // Controls UI
      const controls = document.createElement('div');
      controls.className = 'd3-line__controls';
      Object.assign(controls.style, {
        marginTop: '12px',
        display: 'flex',
        gap: '16px',
        alignItems: 'center',
        justifyContent: 'space-between',
        width: '100%'
      });

      const labelMetric = document.createElement('label');
      Object.assign(labelMetric.style, {
        fontSize: '12px', color: 'var(--muted-color)', display: 'flex', alignItems: 'center', gap: '6px', whiteSpace: 'nowrap', padding: '6px 10px', marginLeft: 'auto'
      });
      labelMetric.textContent = 'Metric';
      const selectMetric = document.createElement('select');
      Object.assign(selectMetric.style, { fontSize: '12px' });
      labelMetric.appendChild(selectMetric);

      // Inline legend on the right of the select
      const legendInline = document.createElement('div');
      legendInline.className = 'controls__legend';
      Object.assign(legendInline.style, {
        display: 'flex',
        gap: '8px',
        alignItems: 'center',
        flexWrap: 'nowrap',
        fontSize: '11px',
        marginLeft: '8px'
      });
      controls.appendChild(legendInline);
      controls.appendChild(labelMetric);

      // Create SVG with marker definitions
      const svg = d3.select(container).append('svg')
        .attr('width', '100%')
        .style('display', 'block');

      // Add marker definitions for different shapes
      const defs = svg.append('defs');

      // Academic marker shapes
      const markerShapes = ['circle', 'square', 'triangle', 'diamond', 'inverted-triangle'];
      const markerSize = 8;

      // Groups
      const gRoot = svg.append('g');
      const gGrid = gRoot.append('g').attr('class', 'grid');
      const gAxes = gRoot.append('g').attr('class', 'axes');
      const gLines = gRoot.append('g').attr('class', 'lines');
      const gPoints = gRoot.append('g').attr('class', 'points');
      const gHover = gRoot.append('g').attr('class', 'hover');
      const gLegend = gRoot.append('foreignObject').attr('class', 'legend');

      // Tooltip
      container.style.position = container.style.position || 'relative';
      let tip = container.querySelector('.d3-tooltip');
      let tipInner;
      if (!tip) {
        tip = document.createElement('div');
        tip.className = 'd3-tooltip';
        Object.assign(tip.style, {
          position: 'absolute', top: '0px', left: '0px', transform: 'translate(-9999px, -9999px)', pointerEvents: 'none',
          padding: '8px 10px', borderRadius: '8px', fontSize: '12px', lineHeight: '1.35', border: '1px solid var(--border-color)',
          background: 'var(--surface-bg)', color: 'var(--text-color)', boxShadow: '0 4px 24px rgba(0,0,0,.18)', opacity: '0',
          transition: 'opacity .12s ease'
        });
        tipInner = document.createElement('div');
        tipInner.className = 'd3-tooltip__inner';
        tipInner.style.textAlign = 'left';
        tip.appendChild(tipInner);
        container.appendChild(tip);
      } else {
        tipInner = tip.querySelector('.d3-tooltip__inner') || tip;
      }

      // Colors per run
      const primary = getComputedStyle(document.documentElement).getPropertyValue('--primary-color').trim() || '#E889AB';
      const pool = [primary, '#4EA5B7', '#E38A42', '#CEC0FA', ...(d3.schemeTableau10||[])];

      // Mapping from metric names to display titles
      const metricTitleMapping = {
        'docvqa_val_anls': 'DocVQA',
        'infovqa_val_anls': 'InfoVQA',
        'mme_total_score': 'MME Total',
        'mmmu_val_mmmu_acc': 'MMMU',
        'mmstar_average': 'MMStar',
        'ocrbench_ocrbench_accuracy': 'OCRBench',
        'scienceqa_exact_match': 'ScienceQA',
        'textvqa_val_exact_match': 'TextVQA',
        'average': 'Average (excl. MME)',
        'average_rank': 'Average Rank',
        'ai2d_exact_match': 'AI2D',
        'chartqa_relaxed_overall': 'ChartQA',
        'seedbench_seed_all': 'SeedBench'
      };

      // Function to get display name for metric
      function getMetricDisplayName(metricKey) {
        return metricTitleMapping[metricKey] || metricKey;
      }

      // State and data
      let metricList = [];
      let runList = [];
      let runOrder = [];
      const dataByMetric = new Map(); // metric => { run => [{step,value}] }
      let isRankStrictFlag = false;
      let rankTickMax = 1;

      // Scales and layout
      let width = 800, height = 360;
      let margin = { top: 16, right: 28, bottom: 56, left: 64 };
      let xScale = d3.scaleLinear();
      let yScale = d3.scaleLinear();

      // Line generators - simple linear connections
      const lineGen = d3.line()
        .x((d) => xScale(d.step))
        .y((d) => yScale(d.value));

      // Function to draw different marker shapes
      function drawMarker(selection, shape, size) {
        const s = size / 2;
        switch (shape) {
          case 'circle':
            return selection.append('circle').attr('r', s);
          case 'square':
            return selection.append('rect').attr('x', -s).attr('y', -s).attr('width', size).attr('height', size);
          case 'triangle':
            return selection.append('path').attr('d', `M0,${-s * 1.2} L${s * 1.1},${s * 0.6} L${-s * 1.1},${s * 0.6} Z`);
          case 'diamond':
            return selection.append('path').attr('d', `M0,${-s * 1.2} L${s * 1.1},0 L0,${s * 1.2} L${-s * 1.1},0 Z`);
          case 'inverted-triangle':
            return selection.append('path').attr('d', `M0,${s * 1.2} L${s * 1.1},${-s * 0.6} L${-s * 1.1},${-s * 0.6} Z`);
          default:
            return selection.append('circle').attr('r', s);
        }
      }

      // Hover elements
      const hoverLine = gHover.append('line').attr('stroke-width', 1);

      const overlay = gHover.append('rect').attr('fill', 'transparent').style('cursor', 'crosshair');

      function updateScales() {
        const isDark = document.documentElement.getAttribute('data-theme') === 'dark';
        const axisColor = isDark ? 'rgba(255,255,255,0.25)' : 'rgba(0,0,0,0.25)';
        const tickColor = isDark ? 'rgba(255,255,255,0.70)' : 'rgba(0,0,0,0.55)';
        const gridColor = isDark ? 'rgba(255,255,255,0.08)' : 'rgba(0,0,0,0.05)';

        width = container.clientWidth || 800;
        height = Math.max(360, Math.round(width / 2.2));
        svg.attr('width', width).attr('height', height);

        const innerWidth = width - margin.left - margin.right;
        const innerHeight = height - margin.top - margin.bottom;
        gRoot.attr('transform', `translate(${margin.left},${margin.top})`);

        xScale.range([0, innerWidth]);
        yScale.range([innerHeight, 0]);

        // Compute Y ticks
        let yTicks = [];
        if (isRankStrictFlag) {
          const maxR = Math.max(1, Math.round(rankTickMax));
          for (let v = 1; v <= maxR; v += 1) yTicks.push(v);
        } else {
          // Use D3's tick generator to produce nice floating-point ticks
          yTicks = yScale.ticks(6);
        }

        // Grid (horizontal)
        gGrid.selectAll('*').remove();
        gGrid.selectAll('line')
          .data(yTicks)
          .join('line')
          .attr('x1', 0)
          .attr('x2', innerWidth)
          .attr('y1', (d) => yScale(d))
          .attr('y2', (d) => yScale(d))
          .attr('stroke', gridColor)
          .attr('stroke-width', 1)
          .attr('shape-rendering', 'crispEdges');

        // Axes
        gAxes.selectAll('*').remove();
        let xAxis = d3.axisBottom(xScale).tickSizeOuter(0);
        if (isRankStrictFlag) {
          const [dx0, dx1] = xScale.domain();
          const start = Math.ceil(dx0 / 1000) * 1000;
          const end = Math.floor(dx1 / 1000) * 1000;
          const xTicks = [];
          for (let v = start; v <= end; v += 1000) xTicks.push(v);
          if (xTicks.length === 0) xTicks.push(Math.round(dx0));
          xAxis = xAxis.tickValues(xTicks).tickFormat(d3.format('d'));
        } else {
          xAxis = xAxis.ticks(8);
        }
        const yAxis = d3.axisLeft(yScale)
          .tickValues(yTicks)
          .tickSizeOuter(0)
          .tickFormat(isRankStrictFlag ? d3.format('d') : d3.format('.2f'));
        gAxes.append('g')
          .attr('transform', `translate(0,${innerHeight})`)
          .call(xAxis)
          .call((g) => {
            g.selectAll('path, line').attr('stroke', axisColor);
            g.selectAll('text').attr('fill', tickColor).style('font-size', '12px');
          });
        gAxes.append('g')
          .call(yAxis)
          .call((g) => {
            g.selectAll('path, line').attr('stroke', axisColor);
            g.selectAll('text').attr('fill', tickColor).style('font-size', '12px');
          });

        // Axis labels (X and Y)
        gAxes.append('text')
          .attr('class', 'axis-label axis-label--x')
          .attr('x', innerWidth / 2)
          .attr('y', innerHeight + 44)
          .attr('text-anchor', 'middle')
          .style('font-size', '12px')
          .style('fill', tickColor)
          .text('Step');
        gAxes.append('text')
          .attr('class', 'axis-label axis-label--y')
          .attr('text-anchor', 'middle')
          .attr('transform', `translate(${-44},${innerHeight/2}) rotate(-90)`)
          .style('font-size', '12px')
          .style('fill', tickColor)
          .text('Value');

        overlay.attr('x', 0).attr('y', 0).attr('width', innerWidth).attr('height', innerHeight);
        hoverLine.attr('y1', 0).attr('y2', innerHeight).attr('stroke', axisColor);

        // Legend placeholder; actual content set in renderMetric
        const legendWidth = Math.min(180, Math.max(120, Math.round(innerWidth * 0.22)));
        const legendHeight = 64;
        gLegend
          .attr('x', innerWidth - legendWidth + 42)
          .attr('y', innerHeight - legendHeight - 12)
          .attr('width', legendWidth)
          .attr('height', legendHeight);
        const legendRoot = gLegend.selectAll('div').data([0]).join('xhtml:div');
        Object.assign(legendRoot.node().style, {
          background: 'transparent',
          border: 'none',
          borderRadius: '0',
          padding: '0',
          fontSize: '12px',
          lineHeight: '1.35',
          color: 'var(--text-color)'
        });

        return { innerWidth, innerHeight };
      }

      function renderMetric(metricKey){
        const map = dataByMetric.get(metricKey) || {};
        const runs = runOrder;
        // Domain
        let minStep = Infinity, maxStep = -Infinity, maxVal = 0, minVal = Infinity;
        const isRank = /rank/i.test(metricKey);
        const isAverage = /average/i.test(metricKey);
        const isRankStrict = isRank && !isAverage;
        runs.forEach(r => {
          const arr = map[r] || [];
          arr.forEach(pt => {
            const val = isRankStrict ? Math.round(pt.value) : pt.value;
            minStep = Math.min(minStep, pt.step);
            maxStep = Math.max(maxStep, pt.step);
            maxVal = Math.max(maxVal, val);
            minVal = Math.min(minVal, val);
          });
        });
        if (!isFinite(minStep) || !isFinite(maxStep)) { return; }
        xScale.domain([minStep, maxStep]);
        if (isRank) {
          rankTickMax = Math.max(1, Math.round(maxVal));
          yScale.domain([rankTickMax, 1]);
        } else {
          yScale.domain([0, Math.max(1, maxVal)]).nice();
        }
        isRankStrictFlag = isRankStrict;

        const { innerWidth, innerHeight } = updateScales();

        // Bind lines and markers
        const series = runs.map((r, i) => ({
          run: r,
          color: pool[i % pool.length],
          marker: markerShapes[i % markerShapes.length],
          values: (map[r]||[])
            .slice()
            .sort((a,b)=>a.step-b.step)
            .map(pt => isRankStrict ? { step: pt.step, value: Math.round(pt.value) } : pt)
        }));

        // Draw lines
        const paths = gLines.selectAll('path.run-line').data(series, d=>d.run);
        paths.enter().append('path').attr('class','run-line').attr('fill','none').attr('stroke-width',2)
          .attr('stroke', d=>d.color).attr('opacity',0.9)
          .attr('d', d=>lineGen(d.values))
          .merge(paths)
          .transition().duration(200)
          .attr('stroke', d=>d.color)
          .attr('d', d=>lineGen(d.values));
        paths.exit().remove();

        // Draw markers for each data point
        gPoints.selectAll('*').remove();
        series.forEach((s, seriesIndex) => {
          const pointGroup = gPoints.selectAll(`.points-${seriesIndex}`)
            .data(s.values)
            .join('g')
            .attr('class', `points-${seriesIndex}`)
            .attr('transform', d => `translate(${xScale(d.step)},${yScale(d.value)})`);

          drawMarker(pointGroup, s.marker, markerSize)
            .attr('fill', s.color)
            .attr('stroke', s.color)
            .attr('stroke-width', 1.5)
            .style('cursor', 'crosshair');
        });

        // Inline legend content with marker shapes
        legendInline.innerHTML = '';
        series.forEach(s => {
          const legendItem = document.createElement('span');
          legendItem.style.cssText = 'display:inline-flex;align-items:center;gap:6px;white-space:nowrap;';

          // Create small SVG for marker shape
          const markerSvg = document.createElementNS('http://www.w3.org/2000/svg', 'svg');
          markerSvg.setAttribute('width', '16');
          markerSvg.setAttribute('height', '12');
          markerSvg.style.display = 'inline-block';

          const g = document.createElementNS('http://www.w3.org/2000/svg', 'g');
          g.setAttribute('transform', 'translate(8,6)');

          let shape;
          const size = 6;
          const halfSize = size / 2;
          switch(s.marker) {
            case 'circle':
              shape = document.createElementNS('http://www.w3.org/2000/svg', 'circle');
              shape.setAttribute('r', halfSize);
              break;
            case 'square':
              shape = document.createElementNS('http://www.w3.org/2000/svg', 'rect');
              shape.setAttribute('x', -halfSize);
              shape.setAttribute('y', -halfSize);
              shape.setAttribute('width', size);
              shape.setAttribute('height', size);
              break;
            case 'triangle':
              shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
              shape.setAttribute('d', `M0,${-halfSize * 1.2} L${halfSize * 1.1},${halfSize * 0.6} L${-halfSize * 1.1},${halfSize * 0.6} Z`);
              break;
            case 'diamond':
              shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
              shape.setAttribute('d', `M0,${-halfSize * 1.2} L${halfSize * 1.1},0 L0,${halfSize * 1.2} L${-halfSize * 1.1},0 Z`);
              break;
            case 'inverted-triangle':
              shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
              shape.setAttribute('d', `M0,${halfSize * 1.2} L${halfSize * 1.1},${-halfSize * 0.6} L${-halfSize * 1.1},${-halfSize * 0.6} Z`);
              break;
            default:
              shape = document.createElementNS('http://www.w3.org/2000/svg', 'circle');
              shape.setAttribute('r', halfSize);
          }
          shape.setAttribute('fill', s.color);
          shape.setAttribute('stroke', s.color);
          shape.setAttribute('stroke-width', '1');

          g.appendChild(shape);
          markerSvg.appendChild(g);

          const label = document.createElement('span');
          label.textContent = s.run;

          legendItem.appendChild(markerSvg);
          legendItem.appendChild(label);
          legendInline.appendChild(legendItem);
        });

        // Hover
        const stepSet = new Set(); series.forEach(s=>s.values.forEach(v=>stepSet.add(v.step)));
        const steps = Array.from(stepSet).sort((a,b)=>a-b);
        function onMove(event){
          const [mx, my] = d3.pointer(event, overlay.node());
          // Snap to the logged step closest to the cursor (ties keep the earlier step)
          const xv = xScale.invert(mx);
          const nearest = steps.reduce((best, s)=> Math.abs(s - xv) < Math.abs(best - xv) ? s : best, steps[0]);
          const xpx = xScale(nearest);
          // Keep the theme-aware stroke set in updateScales() rather than a hard-coded light-mode color
          hoverLine.attr('x1', xpx).attr('x2', xpx).style('display', null);
          // Tooltip content
          let html = `<div><strong>${getMetricDisplayName(metricKey)}</strong></div><div><strong>step</strong> ${nearest}</div>`;
          series.forEach(s=>{
            const m = new Map(s.values.map(v=>[v.step, v.value]));
            const val = m.has(nearest) ? m.get(nearest) : null;
            if (val != null) {
              const formatVal = (vv) => (isRankStrict ? d3.format('d')(vv) : (+vv).toFixed(4));
              html += `<div><span style="display:inline-block;width:10px;height:10px;background:${s.color};border-radius:50%;margin-right:6px;"></span><strong>${s.run}</strong> ${formatVal(val)}</div>`;
            }
          });
          tipInner.innerHTML = html;
          const offsetX = 12, offsetY = 12;
          tip.style.opacity = '1'; tip.style.transform = `translate(${Math.round(mx + offsetX + margin.left)}px, ${Math.round(my + offsetY + margin.top)}px)`;
        }
        function onLeave(){ tip.style.opacity='0'; tip.style.transform='translate(-9999px, -9999px)'; hoverLine.style('display','none'); }
        overlay.on('mousemove', onMove).on('mouseleave', onLeave);
      }

      // (old hover removed; hover is attached in renderMetric)

      // Load CSV and wire controls
      (async () => {
        try {
          const text = await fetchFirstAvailable(CSV_PATHS);
          const rows = d3.csvParse(text, d => ({ run: (d.run||'').trim(), step: +d.step, metric: (d.metric||'').trim(), value: +d.value }));
          metricList = Array.from(new Set(rows.map(r=>r.metric))).sort();
          runList = Array.from(new Set(rows.map(r=>r.run))).sort();
          runOrder = runList;
          // Build dataByMetric
          metricList.forEach(m => {
            const map = {};
            runList.forEach(r => { map[r] = []; });
            rows.filter(r=>r.metric===m).forEach(r => { if (!isNaN(r.step) && !isNaN(r.value)) map[r.run].push({ step:r.step, value:r.value }); });
            dataByMetric.set(m, map);
          });

          // Populate metric select (default to average_rank if present)
          metricList.forEach((m)=>{ const o=document.createElement('option'); o.value=m; o.textContent=getMetricDisplayName(m); selectMetric.appendChild(o); });
          const def = metricList.find(m => /average_rank/i.test(m)) || metricList[0];
          if (def) selectMetric.value = def;

          container.appendChild(controls);
          updateScales();
          renderMetric(selectMetric.value);

          selectMetric.addEventListener('change', ()=>{ renderMetric(selectMetric.value); });

          const rerender = () => { renderMetric(selectMetric.value); };
          if (window.ResizeObserver) { const ro = new ResizeObserver(()=>rerender()); ro.observe(container); } else { window.addEventListener('resize', rerender); }
        } catch (e) {
          const pre = document.createElement('pre'); pre.textContent = 'CSV load error: ' + (e && e.message ? e.message : e);
          pre.style.color = 'var(--danger, #b00020)'; pre.style.fontSize = '12px'; pre.style.whiteSpace = 'pre-wrap';
          container.appendChild(pre);
        }
      })();
    };

    if (document.readyState === 'loading') {
      document.addEventListener('DOMContentLoaded', () => ensureD3(bootstrap), { once: true });
    } else { ensureD3(bootstrap); }
  })();
</script>
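The embed above reads a long-format CSV (`run`, `step`, `metric`, `value` columns), groups it by metric and run, snaps the hover cursor to the nearest logged step, and draws integer ticks for strict rank metrics. As a minimal sketch of that logic outside the browser, the pure functions below mirror those steps. The names `groupByMetric`, `nearestStep`, `rankTicks` and `thousandTicks` are hypothetical and not part of this commit; rows are assumed to be already parsed to numbers, and `nearestStep` swaps the embed's linear `reduce` scan for a binary search over the sorted steps while keeping the same tie-breaking toward the earlier step.

```javascript
// Hypothetical helpers mirroring the embed's data handling (not part of the commit).

// Group long-format rows ({ run, step, metric, value }) into
// Map(metric -> { run -> [{ step, value }] }), skipping NaN points and sorting
// each series by step. Unlike the embed, runs with no valid points are omitted
// instead of being given an empty array.
function groupByMetric(rows) {
  const byMetric = new Map();
  for (const { run, step, metric, value } of rows) {
    if (Number.isNaN(step) || Number.isNaN(value)) continue;
    if (!byMetric.has(metric)) byMetric.set(metric, {});
    const series = byMetric.get(metric);
    if (!series[run]) series[run] = [];
    series[run].push({ step, value });
  }
  for (const series of byMetric.values()) {
    for (const points of Object.values(series)) points.sort((a, b) => a.step - b.step);
  }
  return byMetric;
}

// Closest value to x in the ascending array `steps`, found by binary search;
// on a tie the earlier step wins, like the embed's strict `<` comparison.
function nearestStep(steps, x) {
  if (steps.length === 0) return undefined;
  let lo = 0;
  let hi = steps.length - 1;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (steps[mid] < x) lo = mid + 1;
    else hi = mid;
  }
  // lo is the first index with steps[lo] >= x, or the last index if there is none.
  if (lo > 0 && Math.abs(steps[lo - 1] - x) <= Math.abs(steps[lo] - x)) return steps[lo - 1];
  return steps[lo];
}

// Integer y ticks 1..maxRank, as drawn for strict rank metrics.
function rankTicks(maxRank) {
  const top = Math.max(1, Math.round(maxRank));
  return Array.from({ length: top }, (_, i) => i + 1);
}

// X ticks at whole multiples of 1000 inside [x0, x1]; falls back to the rounded start.
function thousandTicks(x0, x1) {
  const ticks = [];
  const end = Math.floor(x1 / 1000) * 1000;
  for (let v = Math.ceil(x0 / 1000) * 1000; v <= end; v += 1000) ticks.push(v);
  if (ticks.length === 0) ticks.push(Math.round(x0));
  return ticks;
}

// Example usage
const grouped = groupByMetric([
  { run: 'a', step: 2000, metric: 'average_rank', value: 1.6 },
  { run: 'a', step: 1000, metric: 'average_rank', value: 2.2 },
  { run: 'b', step: 1000, metric: 'average_rank', value: NaN },
]);
console.log(grouped.get('average_rank').a.map((p) => p.step)); // [ 1000, 2000 ]
console.log(nearestStep([0, 1000, 2000], 1400)); // 1000
console.log(rankTicks(3.4)); // [ 1, 2, 3 ]
console.log(thousandTicks(250, 3100)); // [ 1000, 2000, 3000 ]
```

Keeping these steps as DOM-free functions makes them easy to check in Node, while the D3 code stays responsible only for scales, axes and drawing.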
app/src/content/embeds/remove-ch.html
ADDED
@@ -0,0 +1,576 @@
<div class="d3-line" style="width:100%;margin:10px 0;"></div>
<style>
  .d3-line .d3-line__controls select {
    font-size: 12px;
    padding: 8px 28px 8px 10px;
    border: 1px solid var(--border-color);
    border-radius: 8px;
    background-color: var(--surface-bg);
    color: var(--text-color);
    background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='12' height='12' viewBox='0 0 24 24' fill='none' stroke='%230f1115' stroke-width='2' stroke-linecap='round' stroke-linejoin='round'%3E%3Cpolyline points='6 9 12 15 18 9'/%3E%3C/svg%3E");
    background-repeat: no-repeat;
    background-position: right 8px center;
    background-size: 12px;
    -webkit-appearance: none;
    -moz-appearance: none;
    appearance: none;
    cursor: pointer;
    transition: border-color .15s ease, box-shadow .15s ease;
  }
  [data-theme="dark"] .d3-line .d3-line__controls select {
    background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='12' height='12' viewBox='0 0 24 24' fill='none' stroke='%23ffffff' stroke-width='2' stroke-linecap='round' stroke-linejoin='round'%3E%3Cpolyline points='6 9 12 15 18 9'/%3E%3C/svg%3E");
  }
  .d3-line .d3-line__controls select:hover {
    border-color: var(--primary-color);
  }
  .d3-line .d3-line__controls select:focus {
    border-color: var(--primary-color);
    box-shadow: 0 0 0 3px rgba(232,137,171,.25);
    outline: none;
  }
  .d3-line .d3-line__controls label { gap: 8px; }

  /* Range slider themed with --primary-color */
  .d3-line .d3-line__controls input[type="range"] {
    -webkit-appearance: none;
    appearance: none;
    width: 100%;
    height: 6px;
    border-radius: 999px;
    background: var(--border-color);
    outline: none;
  }
  .d3-line .d3-line__controls input[type="range"]::-webkit-slider-runnable-track {
    height: 6px;
    background: transparent;
    border-radius: 999px;
  }
  .d3-line .d3-line__controls input[type="range"]::-webkit-slider-thumb {
    -webkit-appearance: none;
    appearance: none;
    width: 16px;
    height: 16px;
    border-radius: 50%;
    background: var(--primary-color);
    border: 2px solid var(--on-primary);
    margin-top: -5px;
    cursor: pointer;
  }
  .d3-line .d3-line__controls input[type="range"]::-moz-range-track {
    height: 6px;
    background: transparent;
    border-radius: 999px;
  }
  .d3-line .d3-line__controls input[type="range"]::-moz-range-thumb {
    width: 16px;
    height: 16px;
    border-radius: 50%;
    background: var(--primary-color);
    border: 2px solid var(--on-primary);
    cursor: pointer;
  }
  /* Improved line color via CSS */
  .d3-line .lines path.improved { stroke: var(--primary-color); }
</style>
<script>
(() => {
  // document.currentScript is only set while this script runs synchronously,
  // so capture it now: bootstrap() may run later (after DOMContentLoaded or the d3 load event).
  const scriptEl = document.currentScript;

  const ensureD3 = (cb) => {
    if (window.d3 && typeof window.d3.select === 'function') return cb();
    let s = document.getElementById('d3-cdn-script');
    if (!s) {
      s = document.createElement('script');
      s.id = 'd3-cdn-script';
      s.src = 'https://cdn.jsdelivr.net/npm/d3@7/dist/d3.min.js';
      document.head.appendChild(s);
    }
    const onReady = () => { if (window.d3 && typeof window.d3.select === 'function') cb(); };
    s.addEventListener('load', onReady, { once: true });
    if (window.d3) onReady();
  };

  const bootstrap = () => {
    // Find this embed's own container: walk back from the <script>, past the <style>, to the .d3-line div.
    let mount = scriptEl ? scriptEl.previousElementSibling : null;
    while (mount && !(mount.classList && mount.classList.contains('d3-line'))) {
      mount = mount.previousElementSibling;
    }
    const container = mount || document.querySelector('.d3-line');
    if (!container) return;
    if (container.dataset) {
      if (container.dataset.mounted === 'true') return;
      container.dataset.mounted = 'true';
    }

    // CSV: prefer public path, fallback to relative
    const CSV_PATHS = [
      '/data/remove_ch.csv',
      './assets/data/remove_ch.csv',
      '../assets/data/remove_ch.csv',
      '../../assets/data/remove_ch.csv'
    ];
    const fetchFirstAvailable = async (paths) => {
      for (const p of paths) {
        try { const r = await fetch(p, { cache: 'no-cache' }); if (r.ok) return await r.text(); } catch(e) {}
      }
      throw new Error('CSV not found: remove_ch.csv');
    };

    // Controls UI
    const controls = document.createElement('div');
    controls.className = 'd3-line__controls';
    Object.assign(controls.style, {
      marginTop: '12px',
      display: 'flex',
      gap: '16px',
      alignItems: 'center',
      justifyContent: 'space-between',
      width: '100%'
    });

    const labelMetric = document.createElement('label');
    Object.assign(labelMetric.style, {
      fontSize: '12px', color: 'var(--muted-color)', display: 'flex', alignItems: 'center', gap: '6px', whiteSpace: 'nowrap', padding: '6px 10px', marginLeft: 'auto'
    });
    labelMetric.textContent = 'Metric';
    const selectMetric = document.createElement('select');
    Object.assign(selectMetric.style, { fontSize: '12px' });
    labelMetric.appendChild(selectMetric);

    // Inline legend, placed to the left of the metric select
    const legendInline = document.createElement('div');
    legendInline.className = 'controls__legend';
    Object.assign(legendInline.style, {
      display: 'flex',
      gap: '8px',
      alignItems: 'center',
      flexWrap: 'nowrap',
      fontSize: '11px',
      marginLeft: '8px'
    });
    controls.appendChild(legendInline);
    controls.appendChild(labelMetric);

    // Create SVG
    const svg = d3.select(container).append('svg')
      .attr('width', '100%')
      .style('display', 'block');

    // <defs> container (point markers themselves are drawn per point by drawMarker)
    const defs = svg.append('defs');

    // Academic marker shapes
    const markerShapes = ['circle', 'square', 'triangle', 'diamond', 'inverted-triangle'];
    const markerSize = 8;

    // Groups
    const gRoot = svg.append('g');
    const gGrid = gRoot.append('g').attr('class', 'grid');
    const gAxes = gRoot.append('g').attr('class', 'axes');
    const gLines = gRoot.append('g').attr('class', 'lines');
    const gPoints = gRoot.append('g').attr('class', 'points');
    const gHover = gRoot.append('g').attr('class', 'hover');
    const gLegend = gRoot.append('foreignObject').attr('class', 'legend');

    // Tooltip
    container.style.position = container.style.position || 'relative';
    let tip = container.querySelector('.d3-tooltip');
    let tipInner;
    if (!tip) {
      tip = document.createElement('div');
      tip.className = 'd3-tooltip';
      Object.assign(tip.style, {
        position: 'absolute', top: '0px', left: '0px', transform: 'translate(-9999px, -9999px)', pointerEvents: 'none',
        padding: '8px 10px', borderRadius: '8px', fontSize: '12px', lineHeight: '1.35', border: '1px solid var(--border-color)',
        background: 'var(--surface-bg)', color: 'var(--text-color)', boxShadow: '0 4px 24px rgba(0,0,0,.18)', opacity: '0',
        transition: 'opacity .12s ease'
      });
      tipInner = document.createElement('div');
      tipInner.className = 'd3-tooltip__inner';
      tipInner.style.textAlign = 'left';
      tip.appendChild(tipInner);
      container.appendChild(tip);
    } else {
      tipInner = tip.querySelector('.d3-tooltip__inner') || tip;
    }

    // Colors per run
    const primary = getComputedStyle(document.documentElement).getPropertyValue('--primary-color').trim() || '#E889AB';
    const pool = [primary, '#4EA5B7', '#E38A42', '#CEC0FA', ...(d3.schemeTableau10||[])];

    // Mapping from metric names to display titles
    const metricTitleMapping = {
      'docvqa_val_anls': 'DocVQA',
      'infovqa_val_anls': 'InfoVQA',
      'mme_total_score': 'MME Total',
      'mmmu_val_mmmu_acc': 'MMMU',
      'mmstar_average': 'MMStar',
      'ocrbench_ocrbench_accuracy': 'OCRBench',
      'scienceqa_exact_match': 'ScienceQA',
      'textvqa_val_exact_match': 'TextVQA',
      'average': 'Average (excl. MME)',
      'average_rank': 'Average Rank',
      'ai2d_exact_match': 'AI2D',
      'chartqa_relaxed_overall': 'ChartQA',
      'seedbench_seed_all': 'SeedBench'
    };

    // Display name for a metric key (falls back to the raw key)
    function getMetricDisplayName(metricKey) {
      return metricTitleMapping[metricKey] || metricKey;
    }

    // State and data
    let metricList = [];
    let runList = [];
    let runOrder = [];
    const dataByMetric = new Map(); // metric => { run => [{step,value}] }
    let isRankStrictFlag = false;
    let rankTickMax = 1;

    // Scales and layout
    let width = 800, height = 360;
    let margin = { top: 16, right: 28, bottom: 56, left: 64 };
    let xScale = d3.scaleLinear();
    let yScale = d3.scaleLinear();

    // Line generator: simple linear connections
    const lineGen = d3.line()
      .x((d) => xScale(d.step))
      .y((d) => yScale(d.value));

    // Draw one of the marker shapes, centered on the origin of each selected group
    function drawMarker(selection, shape, size) {
      const s = size / 2;
      switch (shape) {
        case 'circle':
          return selection.append('circle').attr('r', s);
        case 'square':
          return selection.append('rect').attr('x', -s).attr('y', -s).attr('width', size).attr('height', size);
        case 'triangle':
          return selection.append('path').attr('d', `M0,${-s * 1.2} L${s * 1.1},${s * 0.6} L${-s * 1.1},${s * 0.6} Z`);
        case 'diamond':
          return selection.append('path').attr('d', `M0,${-s * 1.2} L${s * 1.1},0 L0,${s * 1.2} L${-s * 1.1},0 Z`);
        case 'inverted-triangle':
          return selection.append('path').attr('d', `M0,${s * 1.2} L${s * 1.1},${-s * 0.6} L${-s * 1.1},${-s * 0.6} Z`);
        default:
          return selection.append('circle').attr('r', s);
      }
    }

    // Hover elements
    const hoverLine = gHover.append('line').attr('stroke-width', 1);

    const overlay = gHover.append('rect').attr('fill', 'transparent').style('cursor', 'crosshair');

    function updateScales() {
      const isDark = document.documentElement.getAttribute('data-theme') === 'dark';
      const axisColor = isDark ? 'rgba(255,255,255,0.25)' : 'rgba(0,0,0,0.25)';
      const tickColor = isDark ? 'rgba(255,255,255,0.70)' : 'rgba(0,0,0,0.55)';
      const gridColor = isDark ? 'rgba(255,255,255,0.08)' : 'rgba(0,0,0,0.05)';

      width = container.clientWidth || 800;
      height = Math.max(360, Math.round(width / 2.2));
      svg.attr('width', width).attr('height', height);

      const innerWidth = width - margin.left - margin.right;
      const innerHeight = height - margin.top - margin.bottom;
      gRoot.attr('transform', `translate(${margin.left},${margin.top})`);

      xScale.range([0, innerWidth]);
      yScale.range([innerHeight, 0]);

      // Compute Y ticks
      let yTicks = [];
      if (isRankStrictFlag) {
        const maxR = Math.max(1, Math.round(rankTickMax));
        for (let v = 1; v <= maxR; v += 1) yTicks.push(v);
      } else {
        // Use D3's tick generator to produce nice floating-point ticks
        yTicks = yScale.ticks(6);
      }

      // Grid (horizontal)
      gGrid.selectAll('*').remove();
      gGrid.selectAll('line')
        .data(yTicks)
        .join('line')
        .attr('x1', 0)
        .attr('x2', innerWidth)
        .attr('y1', (d) => yScale(d))
        .attr('y2', (d) => yScale(d))
        .attr('stroke', gridColor)
        .attr('stroke-width', 1)
        .attr('shape-rendering', 'crispEdges');

      // Axes
      gAxes.selectAll('*').remove();
      let xAxis = d3.axisBottom(xScale).tickSizeOuter(0);
      if (isRankStrictFlag) {
        const [dx0, dx1] = xScale.domain();
        const start = Math.ceil(dx0 / 1000) * 1000;
        const end = Math.floor(dx1 / 1000) * 1000;
        const xTicks = [];
        for (let v = start; v <= end; v += 1000) xTicks.push(v);
        if (xTicks.length === 0) xTicks.push(Math.round(dx0));
        xAxis = xAxis.tickValues(xTicks).tickFormat(d3.format('d'));
      } else {
        xAxis = xAxis.ticks(8);
      }
      const yAxis = d3.axisLeft(yScale)
        .tickValues(yTicks)
        .tickSizeOuter(0)
        .tickFormat(isRankStrictFlag ? d3.format('d') : d3.format('.2f'));
      gAxes.append('g')
        .attr('transform', `translate(0,${innerHeight})`)
        .call(xAxis)
        .call((g) => {
          g.selectAll('path, line').attr('stroke', axisColor);
          g.selectAll('text').attr('fill', tickColor).style('font-size', '12px');
        });
      gAxes.append('g')
        .call(yAxis)
        .call((g) => {
          g.selectAll('path, line').attr('stroke', axisColor);
          g.selectAll('text').attr('fill', tickColor).style('font-size', '12px');
        });

      // Axis labels (X and Y)
      gAxes.append('text')
        .attr('class', 'axis-label axis-label--x')
        .attr('x', innerWidth / 2)
        .attr('y', innerHeight + 44)
        .attr('text-anchor', 'middle')
        .style('font-size', '12px')
        .style('fill', tickColor)
        .text('Step');
      gAxes.append('text')
        .attr('class', 'axis-label axis-label--y')
        .attr('text-anchor', 'middle')
        .attr('transform', `translate(${-44},${innerHeight/2}) rotate(-90)`)
        .style('font-size', '12px')
        .style('fill', tickColor)
        .text('Value');

      overlay.attr('x', 0).attr('y', 0).attr('width', innerWidth).attr('height', innerHeight);
      hoverLine.attr('y1', 0).attr('y2', innerHeight).attr('stroke', axisColor);

      // SVG legend placeholder (left empty; renderMetric fills the inline legend in the controls bar)
      const legendWidth = Math.min(180, Math.max(120, Math.round(innerWidth * 0.22)));
      const legendHeight = 64;
      gLegend
        .attr('x', innerWidth - legendWidth + 42)
        .attr('y', innerHeight - legendHeight - 12)
        .attr('width', legendWidth)
        .attr('height', legendHeight);
      const legendRoot = gLegend.selectAll('div').data([0]).join('xhtml:div');
      Object.assign(legendRoot.node().style, {
        background: 'transparent',
        border: 'none',
        borderRadius: '0',
        padding: '0',
        fontSize: '12px',
        lineHeight: '1.35',
        color: 'var(--text-color)'
      });

      return { innerWidth, innerHeight };
    }

    function renderMetric(metricKey){
      const map = dataByMetric.get(metricKey) || {};
      const runs = runOrder;
      // Domain
      let minStep = Infinity, maxStep = -Infinity, maxVal = 0, minVal = Infinity;
      const isRank = /rank/i.test(metricKey);
      const isAverage = /average/i.test(metricKey);
      const isRankStrict = isRank && !isAverage;
      runs.forEach(r => {
        const arr = map[r] || [];
        arr.forEach(pt => {
          const val = isRankStrict ? Math.round(pt.value) : pt.value;
          minStep = Math.min(minStep, pt.step);
          maxStep = Math.max(maxStep, pt.step);
          maxVal = Math.max(maxVal, val);
          minVal = Math.min(minVal, val);
        });
      });
      if (!isFinite(minStep) || !isFinite(maxStep)) { return; }
      xScale.domain([minStep, maxStep]);
      if (isRank) {
        rankTickMax = Math.max(1, Math.round(maxVal));
        yScale.domain([rankTickMax, 1]);
      } else {
        yScale.domain([0, Math.max(1, maxVal)]).nice();
      }
      isRankStrictFlag = isRankStrict;

      const { innerWidth, innerHeight } = updateScales();

      // Bind lines and markers
      const series = runs.map((r, i) => ({
        run: r,
        color: pool[i % pool.length],
        marker: markerShapes[i % markerShapes.length],
        values: (map[r]||[])
          .slice()
          .sort((a,b)=>a.step-b.step)
          .map(pt => isRankStrict ? { step: pt.step, value: Math.round(pt.value) } : pt)
      }));

      // Draw lines
      const paths = gLines.selectAll('path.run-line').data(series, d=>d.run);
      paths.enter().append('path').attr('class','run-line').attr('fill','none').attr('stroke-width',2)
        .attr('stroke', d=>d.color).attr('opacity',0.9)
        .attr('d', d=>lineGen(d.values))
        .merge(paths)
        .transition().duration(200)
        .attr('stroke', d=>d.color)
        .attr('d', d=>lineGen(d.values));
      paths.exit().remove();

      // Draw markers for each data point
      gPoints.selectAll('*').remove();
      series.forEach((s, seriesIndex) => {
        const pointGroup = gPoints.selectAll(`.points-${seriesIndex}`)
          .data(s.values)
          .join('g')
          .attr('class', `points-${seriesIndex}`)
          .attr('transform', d => `translate(${xScale(d.step)},${yScale(d.value)})`);

        drawMarker(pointGroup, s.marker, markerSize)
          .attr('fill', s.color)
          .attr('stroke', s.color)
          .attr('stroke-width', 1.5)
          .style('cursor', 'crosshair');
      });

      // Inline legend content with marker shapes
      legendInline.innerHTML = '';
      series.forEach(s => {
        const legendItem = document.createElement('span');
        legendItem.style.cssText = 'display:inline-flex;align-items:center;gap:6px;white-space:nowrap;';

        // Create small SVG for marker shape
        const markerSvg = document.createElementNS('http://www.w3.org/2000/svg', 'svg');
        markerSvg.setAttribute('width', '16');
        markerSvg.setAttribute('height', '12');
        markerSvg.style.display = 'inline-block';

        const g = document.createElementNS('http://www.w3.org/2000/svg', 'g');
        g.setAttribute('transform', 'translate(8,6)');

        let shape;
        const size = 6;
        const halfSize = size / 2;
        switch(s.marker) {
          case 'circle':
            shape = document.createElementNS('http://www.w3.org/2000/svg', 'circle');
            shape.setAttribute('r', halfSize);
            break;
          case 'square':
            shape = document.createElementNS('http://www.w3.org/2000/svg', 'rect');
            shape.setAttribute('x', -halfSize);
            shape.setAttribute('y', -halfSize);
            shape.setAttribute('width', size);
            shape.setAttribute('height', size);
            break;
          case 'triangle':
            shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
            shape.setAttribute('d', `M0,${-halfSize * 1.2} L${halfSize * 1.1},${halfSize * 0.6} L${-halfSize * 1.1},${halfSize * 0.6} Z`);
            break;
          case 'diamond':
            shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
            shape.setAttribute('d', `M0,${-halfSize * 1.2} L${halfSize * 1.1},0 L0,${halfSize * 1.2} L${-halfSize * 1.1},0 Z`);
            break;
          case 'inverted-triangle':
            shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
            shape.setAttribute('d', `M0,${halfSize * 1.2} L${halfSize * 1.1},${-halfSize * 0.6} L${-halfSize * 1.1},${-halfSize * 0.6} Z`);
            break;
          default:
            shape = document.createElementNS('http://www.w3.org/2000/svg', 'circle');
            shape.setAttribute('r', halfSize);
        }
        shape.setAttribute('fill', s.color);
        shape.setAttribute('stroke', s.color);
        shape.setAttribute('stroke-width', '1');

        g.appendChild(shape);
        markerSvg.appendChild(g);

        const label = document.createElement('span');
        label.textContent = s.run;

        legendItem.appendChild(markerSvg);
        legendItem.appendChild(label);
        legendInline.appendChild(legendItem);
      });

      // Hover
      const stepSet = new Set(); series.forEach(s=>s.values.forEach(v=>stepSet.add(v.step)));
      const steps = Array.from(stepSet).sort((a,b)=>a-b);
      function onMove(event){
        const [mx, my] = d3.pointer(event, overlay.node());
        // Snap the guide line to the recorded step closest to the cursor
        const xVal = xScale.invert(mx);
        const nearest = steps.reduce((best, s)=> Math.abs(s - xVal) < Math.abs(best - xVal) ? s : best, steps[0]);
        const xpx = xScale(nearest);
        // The stroke color is set per theme in updateScales(), so it is not overridden here
        hoverLine.attr('x1', xpx).attr('x2', xpx).style('display', null);
        // Tooltip content
        let html = `<div><strong>${getMetricDisplayName(metricKey)}</strong></div><div><strong>step</strong> ${nearest}</div>`;
        series.forEach(s=>{
          const m = new Map(s.values.map(v=>[v.step, v.value]));
          const val = m.has(nearest) ? m.get(nearest) : null;
          if (val != null) {
            const formatVal = (vv) => (isRankStrict ? d3.format('d')(vv) : (+vv).toFixed(4));
            html += `<div><span style="display:inline-block;width:10px;height:10px;background:${s.color};border-radius:50%;margin-right:6px;"></span><strong>${s.run}</strong> ${formatVal(val)}</div>`;
          }
        });
        tipInner.innerHTML = html;
        const offsetX = 12, offsetY = 12;
        tip.style.opacity = '1'; tip.style.transform = `translate(${Math.round(mx + offsetX + margin.left)}px, ${Math.round(my + offsetY + margin.top)}px)`;
      }
      function onLeave(){ tip.style.opacity='0'; tip.style.transform='translate(-9999px, -9999px)'; hoverLine.style('display','none'); }
      overlay.on('mousemove', onMove).on('mouseleave', onLeave);
    }

    // Load CSV and wire controls
    (async () => {
      try {
        const text = await fetchFirstAvailable(CSV_PATHS);
        const rows = d3.csvParse(text, d => ({ run: (d.run||'').trim(), step: +d.step, metric: (d.metric||'').trim(), value: +d.value }));
        metricList = Array.from(new Set(rows.map(r=>r.metric))).sort();
        runList = Array.from(new Set(rows.map(r=>r.run))).sort();
        runOrder = runList;
        // Build dataByMetric
        metricList.forEach(m => {
          const map = {};
          runList.forEach(r => { map[r] = []; });
          rows.filter(r=>r.metric===m).forEach(r => { if (!isNaN(r.step) && !isNaN(r.value)) map[r.run].push({ step:r.step, value:r.value }); });
          dataByMetric.set(m, map);
        });

        // Populate metric select (default to average_rank if present)
        metricList.forEach((m)=>{ const o=document.createElement('option'); o.value=m; o.textContent=getMetricDisplayName(m); selectMetric.appendChild(o); });
        const def = metricList.find(m => /average_rank/i.test(m)) || metricList[0];
        if (def) selectMetric.value = def;

        container.appendChild(controls);
        updateScales();
        renderMetric(selectMetric.value);

        selectMetric.addEventListener('change', ()=>{ renderMetric(selectMetric.value); });

        const rerender = () => { renderMetric(selectMetric.value); };
        if (window.ResizeObserver) { const ro = new ResizeObserver(()=>rerender()); ro.observe(container); } else { window.addEventListener('resize', rerender); }
      } catch (e) {
        const pre = document.createElement('pre'); pre.textContent = 'CSV load error: ' + (e && e.message ? e.message : e);
        pre.style.color = 'var(--danger, #b00020)'; pre.style.fontSize = '12px'; pre.style.whiteSpace = 'pre-wrap';
        container.appendChild(pre);
      }
    })();
  };

  if (document.readyState === 'loading') {
    document.addEventListener('DOMContentLoaded', () => ensureD3(bootstrap), { once: true });
  } else { ensureD3(bootstrap); }
})();
</script>
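The embed's data handling reduces to two pure steps: grouping the long-format CSV rows (`run,step,metric,value`) into one series per metric and run, and snapping the cursor to the nearest recorded step for the tooltip. The sketch below reproduces both without D3 so they can be checked in isolation. It is an illustration under stated assumptions, not code from this commit: `groupByMetricAndRun` and `nearestStep` are hypothetical names, rows are assumed to be already parsed into `{ run, step, metric, value }` objects with numeric `step` and `value` (as the embed's `d3.csvParse` row accessor produces), and the nearest-step lookup swaps the embed's linear `reduce` for a binary search over the sorted steps.

```javascript
// Hypothetical helpers for illustration; they are not part of the embed.

// Group long-format rows ({ run, step, metric, value }) into
// Map<metric, { [run]: Array<{ step, value }> }>. Every run gets a (possibly empty)
// series for every metric; points with a NaN step or value are dropped,
// and each series is sorted by step.
function groupByMetricAndRun(rows) {
  const metrics = Array.from(new Set(rows.map((r) => r.metric))).sort();
  const runs = Array.from(new Set(rows.map((r) => r.run))).sort();
  const byMetric = new Map();
  for (const metric of metrics) {
    const perRun = {};
    for (const run of runs) perRun[run] = [];
    byMetric.set(metric, perRun);
  }
  for (const r of rows) {
    if (Number.isNaN(r.step) || Number.isNaN(r.value)) continue;
    byMetric.get(r.metric)[r.run].push({ step: r.step, value: r.value });
  }
  for (const perRun of byMetric.values()) {
    for (const run of runs) perRun[run].sort((a, b) => a.step - b.step);
  }
  return byMetric;
}

// Return the value in the ascending array `steps` closest to `x`, or undefined
// if `steps` is empty. Ties go to the smaller step. Binary search: O(log n).
function nearestStep(steps, x) {
  if (steps.length === 0) return undefined;
  let lo = 0;
  let hi = steps.length - 1;
  // Narrow to the first index whose step is >= x (or the last index if none is).
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (steps[mid] < x) lo = mid + 1;
    else hi = mid;
  }
  // The answer is either steps[lo] or its left neighbour.
  if (lo > 0 && x - steps[lo - 1] <= steps[lo] - x) return steps[lo - 1];
  return steps[lo];
}

// Usage with made-up rows (not taken from remove_ch.csv).
const rows = [
  { run: 'A', step: 2000, metric: 'average_rank', value: 1.4 },
  { run: 'A', step: 1000, metric: 'average_rank', value: 2.1 },
  { run: 'B', step: 1000, metric: 'average_rank', value: 1.9 },
  { run: 'B', step: 2000, metric: 'average_rank', value: NaN },
];
const grouped = groupByMetricAndRun(rows);
console.log(grouped.get('average_rank').A.map((p) => p.step).join(',')); // 1000,2000
console.log(grouped.get('average_rank').B.length); // 1
console.log(nearestStep([0, 1000, 2000], 1400)); // 1000
```

Ties resolve to the smaller step, which matches the strict `<` comparison in the embed's `reduce`. For the handful of checkpoints these plots show, the linear scan is perfectly adequate; the binary search only pays off if a run logs many steps.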
app/src/content/embeds/s25-ratings.html
ADDED
@@ -0,0 +1,576 @@
<div class="d3-line" style="width:100%;margin:10px 0;"></div>
<style>
  .d3-line .d3-line__controls select {
    font-size: 12px;
    padding: 8px 28px 8px 10px;
    border: 1px solid var(--border-color);
    border-radius: 8px;
    background-color: var(--surface-bg);
    color: var(--text-color);
    background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='12' height='12' viewBox='0 0 24 24' fill='none' stroke='%230f1115' stroke-width='2' stroke-linecap='round' stroke-linejoin='round'%3E%3Cpolyline points='6 9 12 15 18 9'/%3E%3C/svg%3E");
    background-repeat: no-repeat;
    background-position: right 8px center;
    background-size: 12px;
    -webkit-appearance: none;
    -moz-appearance: none;
    appearance: none;
    cursor: pointer;
    transition: border-color .15s ease, box-shadow .15s ease;
  }
  [data-theme="dark"] .d3-line .d3-line__controls select {
    background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='12' height='12' viewBox='0 0 24 24' fill='none' stroke='%23ffffff' stroke-width='2' stroke-linecap='round' stroke-linejoin='round'%3E%3Cpolyline points='6 9 12 15 18 9'/%3E%3C/svg%3E");
  }
  .d3-line .d3-line__controls select:hover {
    border-color: var(--primary-color);
  }
  .d3-line .d3-line__controls select:focus {
    border-color: var(--primary-color);
    box-shadow: 0 0 0 3px rgba(232,137,171,.25);
    outline: none;
  }
  .d3-line .d3-line__controls label { gap: 8px; }

  /* Range slider themed with --primary-color */
  .d3-line .d3-line__controls input[type="range"] {
    -webkit-appearance: none;
    appearance: none;
    width: 100%;
    height: 6px;
    border-radius: 999px;
    background: var(--border-color);
    outline: none;
  }
  .d3-line .d3-line__controls input[type="range"]::-webkit-slider-runnable-track {
    height: 6px;
    background: transparent;
    border-radius: 999px;
  }
  .d3-line .d3-line__controls input[type="range"]::-webkit-slider-thumb {
    -webkit-appearance: none;
    appearance: none;
    width: 16px;
    height: 16px;
    border-radius: 50%;
    background: var(--primary-color);
    border: 2px solid var(--on-primary);
    margin-top: -5px;
    cursor: pointer;
  }
  .d3-line .d3-line__controls input[type="range"]::-moz-range-track {
    height: 6px;
    background: transparent;
    border-radius: 999px;
  }
  .d3-line .d3-line__controls input[type="range"]::-moz-range-thumb {
    width: 16px;
    height: 16px;
    border-radius: 50%;
    background: var(--primary-color);
    border: 2px solid var(--on-primary);
    cursor: pointer;
  }
  /* Improved line color via CSS */
  .d3-line .lines path.improved { stroke: var(--primary-color); }
</style>
<script>
  (() => {
    // Capture this <script> element now: document.currentScript is null once
    // bootstrap runs later from the DOMContentLoaded or d3 "load" callbacks.
    const scriptEl = document.currentScript;

    const ensureD3 = (cb) => {
      if (window.d3 && typeof window.d3.select === 'function') return cb();
      let s = document.getElementById('d3-cdn-script');
      if (!s) {
        s = document.createElement('script');
        s.id = 'd3-cdn-script';
        s.src = 'https://cdn.jsdelivr.net/npm/d3@7/dist/d3.min.js';
        document.head.appendChild(s);
      }
      const onReady = () => { if (window.d3 && typeof window.d3.select === 'function') cb(); };
      s.addEventListener('load', onReady, { once: true });
      if (window.d3) onReady();
    };

    const bootstrap = () => {
      // The script's previous sibling is the <style> block, not the chart div,
      // so walk back to the nearest .d3-line. Fall back to the first unmounted
      // .d3-line so that several embeds on one page do not all resolve to the
      // first chart (which is already mounted and would make them bail out).
      let mount = scriptEl ? scriptEl.previousElementSibling : null;
      while (mount && !mount.classList.contains('d3-line')) mount = mount.previousElementSibling;
      const container = mount || document.querySelector('.d3-line:not([data-mounted="true"])');
      if (!container) return;
      if (container.dataset) {
        if (container.dataset.mounted === 'true') return;
        container.dataset.mounted = 'true';
      }

      // CSV: prefer public path, fallback to relative
      const CSV_PATHS = [
        '/data/s25_ratings.csv',
        './assets/data/s25_ratings.csv',
        '../assets/data/s25_ratings.csv',
        '../../assets/data/s25_ratings.csv'
      ];
      const fetchFirstAvailable = async (paths) => {
        for (const p of paths) {
          try { const r = await fetch(p, { cache: 'no-cache' }); if (r.ok) return await r.text(); } catch(e) {}
        }
        throw new Error('CSV not found: s25_ratings.csv');
      };

      // Controls UI
      const controls = document.createElement('div');
      controls.className = 'd3-line__controls';
      Object.assign(controls.style, {
        marginTop: '12px',
        display: 'flex',
        gap: '16px',
        alignItems: 'center',
        justifyContent: 'space-between',
        width: '100%'
      });

      const labelMetric = document.createElement('label');
      Object.assign(labelMetric.style, {
        fontSize: '12px', color: 'var(--muted-color)', display: 'flex', alignItems: 'center', gap: '6px', whiteSpace: 'nowrap', padding: '6px 10px', marginLeft: 'auto'
      });
      labelMetric.textContent = 'Metric';
      const selectMetric = document.createElement('select');
      Object.assign(selectMetric.style, { fontSize: '12px' });
      labelMetric.appendChild(selectMetric);

      // Inline legend, placed to the left of the metric select
      const legendInline = document.createElement('div');
      legendInline.className = 'controls__legend';
      Object.assign(legendInline.style, {
        display: 'flex',
        gap: '8px',
        alignItems: 'center',
        flexWrap: 'nowrap',
        fontSize: '11px',
        marginLeft: '8px'
      });
      controls.appendChild(legendInline);
      controls.appendChild(labelMetric);

      // Create SVG with marker definitions
      const svg = d3.select(container).append('svg')
        .attr('width', '100%')
        .style('display', 'block');

      // Add marker definitions for different shapes
      const defs = svg.append('defs');

      // Academic marker shapes
      const markerShapes = ['circle', 'square', 'triangle', 'diamond', 'inverted-triangle'];
      const markerSize = 8;

      // Groups
      const gRoot = svg.append('g');
      const gGrid = gRoot.append('g').attr('class', 'grid');
      const gAxes = gRoot.append('g').attr('class', 'axes');
      const gLines = gRoot.append('g').attr('class', 'lines');
      const gPoints = gRoot.append('g').attr('class', 'points');
      const gHover = gRoot.append('g').attr('class', 'hover');
      const gLegend = gRoot.append('foreignObject').attr('class', 'legend');

      // Tooltip
      container.style.position = container.style.position || 'relative';
      let tip = container.querySelector('.d3-tooltip');
      let tipInner;
      if (!tip) {
        tip = document.createElement('div');
        tip.className = 'd3-tooltip';
        Object.assign(tip.style, {
          position: 'absolute', top: '0px', left: '0px', transform: 'translate(-9999px, -9999px)', pointerEvents: 'none',
          padding: '8px 10px', borderRadius: '8px', fontSize: '12px', lineHeight: '1.35', border: '1px solid var(--border-color)',
          background: 'var(--surface-bg)', color: 'var(--text-color)', boxShadow: '0 4px 24px rgba(0,0,0,.18)', opacity: '0',
          transition: 'opacity .12s ease'
        });
        tipInner = document.createElement('div');
        tipInner.className = 'd3-tooltip__inner';
        tipInner.style.textAlign = 'left';
        tip.appendChild(tipInner);
        container.appendChild(tip);
      } else {
        tipInner = tip.querySelector('.d3-tooltip__inner') || tip;
      }

      // Colors per run
      const primary = getComputedStyle(document.documentElement).getPropertyValue('--primary-color').trim() || '#E889AB';
      const pool = [primary, '#4EA5B7', '#E38A42', '#CEC0FA', ...(d3.schemeTableau10||[])];

      // Mapping from metric names to display titles
      const metricTitleMapping = {
        'docvqa_val_anls': 'DocVQA',
        'infovqa_val_anls': 'InfoVQA',
        'mme_total_score': 'MME Total',
        'mmmu_val_mmmu_acc': 'MMMU',
        'mmstar_average': 'MMStar',
        'ocrbench_ocrbench_accuracy': 'OCRBench',
        'scienceqa_exact_match': 'ScienceQA',
        'textvqa_val_exact_match': 'TextVQA',
        'average': 'Average (excl. MME)',
        'average_rank': 'Average Rank',
        'ai2d_exact_match': 'AI2D',
        'chartqa_relaxed_overall': 'ChartQA',
        'seedbench_seed_all': 'SeedBench'
      };

      // Function to get display name for metric
      function getMetricDisplayName(metricKey) {
        return metricTitleMapping[metricKey] || metricKey;
      }

      // State and data
      let metricList = [];
      let runList = [];
      let runOrder = [];
      const dataByMetric = new Map(); // metric => { run => [{step,value}] }
      let isRankStrictFlag = false;
      let rankTickMax = 1;

      // Scales and layout
      let width = 800, height = 360;
      let margin = { top: 16, right: 28, bottom: 56, left: 64 };
      let xScale = d3.scaleLinear();
      let yScale = d3.scaleLinear();

      // Line generators - simple linear connections
      const lineGen = d3.line()
        .x((d) => xScale(d.step))
        .y((d) => yScale(d.value));

      // Function to draw different marker shapes
      function drawMarker(selection, shape, size) {
        const s = size / 2;
        switch (shape) {
          case 'circle':
            return selection.append('circle').attr('r', s);
          case 'square':
            return selection.append('rect').attr('x', -s).attr('y', -s).attr('width', size).attr('height', size);
          case 'triangle':
            return selection.append('path').attr('d', `M0,${-s * 1.2} L${s * 1.1},${s * 0.6} L${-s * 1.1},${s * 0.6} Z`);
          case 'diamond':
            return selection.append('path').attr('d', `M0,${-s * 1.2} L${s * 1.1},0 L0,${s * 1.2} L${-s * 1.1},0 Z`);
          case 'inverted-triangle':
            return selection.append('path').attr('d', `M0,${s * 1.2} L${s * 1.1},${-s * 0.6} L${-s * 1.1},${-s * 0.6} Z`);
          default:
            return selection.append('circle').attr('r', s);
        }
      }

      // Hover elements
      const hoverLine = gHover.append('line').attr('stroke-width', 1);

      const overlay = gHover.append('rect').attr('fill', 'transparent').style('cursor', 'crosshair');

      function updateScales() {
        const isDark = document.documentElement.getAttribute('data-theme') === 'dark';
        const axisColor = isDark ? 'rgba(255,255,255,0.25)' : 'rgba(0,0,0,0.25)';
        const tickColor = isDark ? 'rgba(255,255,255,0.70)' : 'rgba(0,0,0,0.55)';
        const gridColor = isDark ? 'rgba(255,255,255,0.08)' : 'rgba(0,0,0,0.05)';

        width = container.clientWidth || 800;
        height = Math.max(360, Math.round(width / 2.2));
        svg.attr('width', width).attr('height', height);

        const innerWidth = width - margin.left - margin.right;
        const innerHeight = height - margin.top - margin.bottom;
        gRoot.attr('transform', `translate(${margin.left},${margin.top})`);

        xScale.range([0, innerWidth]);
        yScale.range([innerHeight, 0]);

        // Compute Y ticks
        let yTicks = [];
        if (isRankStrictFlag) {
          const maxR = Math.max(1, Math.round(rankTickMax));
          for (let v = 1; v <= maxR; v += 1) yTicks.push(v);
        } else {
          // Use D3's tick generator to produce nice floating-point ticks
          yTicks = yScale.ticks(6);
        }

        // Grid (horizontal)
        gGrid.selectAll('*').remove();
        gGrid.selectAll('line')
          .data(yTicks)
          .join('line')
          .attr('x1', 0)
          .attr('x2', innerWidth)
          .attr('y1', (d) => yScale(d))
          .attr('y2', (d) => yScale(d))
          .attr('stroke', gridColor)
          .attr('stroke-width', 1)
          .attr('shape-rendering', 'crispEdges');

        // Axes
        gAxes.selectAll('*').remove();
        let xAxis = d3.axisBottom(xScale).tickSizeOuter(0);
        if (isRankStrictFlag) {
          const [dx0, dx1] = xScale.domain();
          const start = Math.ceil(dx0 / 1000) * 1000;
          const end = Math.floor(dx1 / 1000) * 1000;
          const xTicks = [];
          for (let v = start; v <= end; v += 1000) xTicks.push(v);
          if (xTicks.length === 0) xTicks.push(Math.round(dx0));
          xAxis = xAxis.tickValues(xTicks).tickFormat(d3.format('d'));
        } else {
          xAxis = xAxis.ticks(8);
        }
        const yAxis = d3.axisLeft(yScale)
          .tickValues(yTicks)
          .tickSizeOuter(0)
          .tickFormat(isRankStrictFlag ? d3.format('d') : d3.format('.2f'));
        gAxes.append('g')
          .attr('transform', `translate(0,${innerHeight})`)
          .call(xAxis)
          .call((g) => {
            g.selectAll('path, line').attr('stroke', axisColor);
            g.selectAll('text').attr('fill', tickColor).style('font-size', '12px');
          });
        gAxes.append('g')
          .call(yAxis)
          .call((g) => {
            g.selectAll('path, line').attr('stroke', axisColor);
            g.selectAll('text').attr('fill', tickColor).style('font-size', '12px');
          });

        // Axis labels (X and Y)
        gAxes.append('text')
          .attr('class', 'axis-label axis-label--x')
          .attr('x', innerWidth / 2)
          .attr('y', innerHeight + 44)
          .attr('text-anchor', 'middle')
          .style('font-size', '12px')
          .style('fill', tickColor)
          .text('Step');
        gAxes.append('text')
          .attr('class', 'axis-label axis-label--y')
          .attr('text-anchor', 'middle')
          .attr('transform', `translate(${-44},${innerHeight/2}) rotate(-90)`)
          .style('font-size', '12px')
          .style('fill', tickColor)
          .text('Value');

        overlay.attr('x', 0).attr('y', 0).attr('width', innerWidth).attr('height', innerHeight);
        hoverLine.attr('y1', 0).attr('y2', innerHeight).attr('stroke', axisColor);

        // Legend placeholder (foreignObject); renderMetric fills the inline legend in the controls instead
        const legendWidth = Math.min(180, Math.max(120, Math.round(innerWidth * 0.22)));
        const legendHeight = 64;
        gLegend
          .attr('x', innerWidth - legendWidth + 42)
          .attr('y', innerHeight - legendHeight - 12)
          .attr('width', legendWidth)
          .attr('height', legendHeight);
        const legendRoot = gLegend.selectAll('div').data([0]).join('xhtml:div');
        Object.assign(legendRoot.node().style, {
          background: 'transparent',
          border: 'none',
          borderRadius: '0',
          padding: '0',
          fontSize: '12px',
          lineHeight: '1.35',
          color: 'var(--text-color)'
        });

        return { innerWidth, innerHeight };
      }

      function renderMetric(metricKey){
        const map = dataByMetric.get(metricKey) || {};
        const runs = runOrder;
        // Domain
        let minStep = Infinity, maxStep = -Infinity, maxVal = 0, minVal = Infinity;
        const isRank = /rank/i.test(metricKey);
        const isAverage = /average/i.test(metricKey);
        const isRankStrict = isRank && !isAverage;
        runs.forEach(r => {
          const arr = map[r] || [];
          arr.forEach(pt => {
            const val = isRankStrict ? Math.round(pt.value) : pt.value;
            minStep = Math.min(minStep, pt.step);
            maxStep = Math.max(maxStep, pt.step);
            maxVal = Math.max(maxVal, val);
            minVal = Math.min(minVal, val);
          });
        });
        if (!isFinite(minStep) || !isFinite(maxStep)) { return; }
        xScale.domain([minStep, maxStep]);
        if (isRank) {
          // Round up (not to nearest) so fractional average ranks such as 4.4
          // stay inside the [max, 1] domain instead of falling below the x-axis.
          rankTickMax = Math.max(1, Math.ceil(maxVal));
          yScale.domain([rankTickMax, 1]);
        } else {
          yScale.domain([0, Math.max(1, maxVal)]).nice();
        }
        isRankStrictFlag = isRankStrict;

        const { innerWidth, innerHeight } = updateScales();

        // Bind lines and markers
        const series = runs.map((r, i) => ({
          run: r,
          color: pool[i % pool.length],
          marker: markerShapes[i % markerShapes.length],
          values: (map[r]||[])
            .slice()
            .sort((a,b)=>a.step-b.step)
            .map(pt => isRankStrict ? { step: pt.step, value: Math.round(pt.value) } : pt)
        }));

        // Draw lines
        const paths = gLines.selectAll('path.run-line').data(series, d=>d.run);
        paths.enter().append('path').attr('class','run-line').attr('fill','none').attr('stroke-width',2)
          .attr('stroke', d=>d.color).attr('opacity',0.9)
          .attr('d', d=>lineGen(d.values))
          .merge(paths)
          .transition().duration(200)
          .attr('stroke', d=>d.color)
          .attr('d', d=>lineGen(d.values));
        paths.exit().remove();

        // Draw markers for each data point
        gPoints.selectAll('*').remove();
        series.forEach((s, seriesIndex) => {
          const pointGroup = gPoints.selectAll(`.points-${seriesIndex}`)
            .data(s.values)
            .join('g')
            .attr('class', `points-${seriesIndex}`)
            .attr('transform', d => `translate(${xScale(d.step)},${yScale(d.value)})`);

          drawMarker(pointGroup, s.marker, markerSize)
            .attr('fill', s.color)
            .attr('stroke', s.color)
            .attr('stroke-width', 1.5)
            .style('cursor', 'crosshair');
        });

        // Inline legend content with marker shapes
        legendInline.innerHTML = '';
        series.forEach(s => {
          const legendItem = document.createElement('span');
          legendItem.style.cssText = 'display:inline-flex;align-items:center;gap:6px;white-space:nowrap;';

          // Create small SVG for marker shape
          const markerSvg = document.createElementNS('http://www.w3.org/2000/svg', 'svg');
          markerSvg.setAttribute('width', '16');
          markerSvg.setAttribute('height', '12');
          markerSvg.style.display = 'inline-block';

          const g = document.createElementNS('http://www.w3.org/2000/svg', 'g');
          g.setAttribute('transform', 'translate(8,6)');

          let shape;
          const size = 6;
          const halfSize = size / 2;
          switch(s.marker) {
            case 'circle':
              shape = document.createElementNS('http://www.w3.org/2000/svg', 'circle');
              shape.setAttribute('r', halfSize);
              break;
            case 'square':
              shape = document.createElementNS('http://www.w3.org/2000/svg', 'rect');
              shape.setAttribute('x', -halfSize);
              shape.setAttribute('y', -halfSize);
              shape.setAttribute('width', size);
              shape.setAttribute('height', size);
              break;
            case 'triangle':
              shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
              shape.setAttribute('d', `M0,${-halfSize * 1.2} L${halfSize * 1.1},${halfSize * 0.6} L${-halfSize * 1.1},${halfSize * 0.6} Z`);
              break;
            case 'diamond':
              shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
              shape.setAttribute('d', `M0,${-halfSize * 1.2} L${halfSize * 1.1},0 L0,${halfSize * 1.2} L${-halfSize * 1.1},0 Z`);
              break;
            case 'inverted-triangle':
              shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
              shape.setAttribute('d', `M0,${halfSize * 1.2} L${halfSize * 1.1},${-halfSize * 0.6} L${-halfSize * 1.1},${-halfSize * 0.6} Z`);
              break;
            default:
              shape = document.createElementNS('http://www.w3.org/2000/svg', 'circle');
              shape.setAttribute('r', halfSize);
          }
          shape.setAttribute('fill', s.color);
          shape.setAttribute('stroke', s.color);
          shape.setAttribute('stroke-width', '1');

          g.appendChild(shape);
          markerSvg.appendChild(g);

          const label = document.createElement('span');
          label.textContent = s.run;

          legendItem.appendChild(markerSvg);
          legendItem.appendChild(label);
          legendInline.appendChild(legendItem);
        });

        // Hover
        const stepSet = new Set(); series.forEach(s=>s.values.forEach(v=>stepSet.add(v.step)));
        const steps = Array.from(stepSet).sort((a,b)=>a-b);
        function onMove(event){
          const [mx, my] = d3.pointer(event, overlay.node());
          // Snap to the closest step that has data (ties keep the earlier step).
          const x = xScale.invert(mx);
          const nearest = steps.reduce((best, s)=> Math.abs(s - x) < Math.abs(best - x) ? s : best, steps[0]);
          const xpx = xScale(nearest);
          // Keep the theme-aware stroke assigned in updateScales() (readable in dark mode).
          hoverLine.attr('x1', xpx).attr('x2', xpx).style('display', null);
          // Tooltip content
          let html = `<div><strong>${getMetricDisplayName(metricKey)}</strong></div><div><strong>step</strong> ${nearest}</div>`;
          series.forEach(s=>{
            const m = new Map(s.values.map(v=>[v.step, v.value]));
            const val = m.has(nearest) ? m.get(nearest) : null;
            if (val != null) {
              const formatVal = (vv) => (isRankStrict ? d3.format('d')(vv) : (+vv).toFixed(4));
              html += `<div><span style="display:inline-block;width:10px;height:10px;background:${s.color};border-radius:50%;margin-right:6px;"></span><strong>${s.run}</strong> ${formatVal(val)}</div>`;
            }
          });
          tipInner.innerHTML = html;
          const offsetX = 12, offsetY = 12;
          tip.style.opacity = '1'; tip.style.transform = `translate(${Math.round(mx + offsetX + margin.left)}px, ${Math.round(my + offsetY + margin.top)}px)`;
        }
        function onLeave(){ tip.style.opacity='0'; tip.style.transform='translate(-9999px, -9999px)'; hoverLine.style('display','none'); }
        overlay.on('mousemove', onMove).on('mouseleave', onLeave);
      }

      // Load CSV and wire controls
      (async () => {
        try {
          const text = await fetchFirstAvailable(CSV_PATHS);
          const rows = d3.csvParse(text, d => ({ run: (d.run||'').trim(), step: +d.step, metric: (d.metric||'').trim(), value: +d.value }));
          metricList = Array.from(new Set(rows.map(r=>r.metric))).sort();
          runList = Array.from(new Set(rows.map(r=>r.run))).sort();
          runOrder = runList;
          // Build dataByMetric
          metricList.forEach(m => {
            const map = {};
            runList.forEach(r => { map[r] = []; });
            rows.filter(r=>r.metric===m).forEach(r => { if (!isNaN(r.step) && !isNaN(r.value)) map[r.run].push({ step:r.step, value:r.value }); });
            dataByMetric.set(m, map);
          });

          // Populate metric select (default to average_rank if present)
          metricList.forEach((m)=>{ const o=document.createElement('option'); o.value=m; o.textContent=getMetricDisplayName(m); selectMetric.appendChild(o); });
          const def = metricList.find(m => /average_rank/i.test(m)) || metricList[0];
          if (def) selectMetric.value = def;

          container.appendChild(controls);
          updateScales();
          renderMetric(selectMetric.value);

          selectMetric.addEventListener('change', ()=>{ renderMetric(selectMetric.value); });

          const rerender = () => { renderMetric(selectMetric.value); };
          if (window.ResizeObserver) { const ro = new ResizeObserver(()=>rerender()); ro.observe(container); } else { window.addEventListener('resize', rerender); }
        } catch (e) {
          const pre = document.createElement('pre'); pre.textContent = 'CSV load error: ' + (e && e.message ? e.message : e);
          pre.style.color = 'var(--danger, #b00020)'; pre.style.fontSize = '12px'; pre.style.whiteSpace = 'pre-wrap';
          container.appendChild(pre);
        }
      })();
    };

    if (document.readyState === 'loading') {
      document.addEventListener('DOMContentLoaded', () => ensureD3(bootstrap), { once: true });
    } else { ensureD3(bootstrap); }
  })();
</script>
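The embed above handles three small pieces of data logic inline: it groups the parsed CSV rows into `dataByMetric`, snaps the hover position to the nearest step with a linear `reduce`, and sizes the rank axis from the largest rank value. The sketch below restates that logic as plain, d3-free JavaScript so it can be run and tested on its own (for example in Node). It is an illustrative sketch under stated assumptions, not code from this commit: the helper names `groupByMetric`, `nearestStep` and `rankTicks` are hypothetical, the binary search is a swapped-in alternative to the embed's linear scan, and `rankTicks` assumes the maximum rank is rounded up rather than to the nearest integer.

```javascript
// Illustrative, d3-free helpers; the names are hypothetical and not part of the embed.

// Group rows shaped like the embed's parsed CSV ({ run, step, metric, value })
// into metric -> { run -> [{ step, value }] }, sorted by step. Rows with a
// non-numeric step or value are skipped, as in the embed; unlike the embed,
// runs with no valid points are simply absent rather than mapped to [].
function groupByMetric(rows) {
  const byMetric = new Map();
  for (const { run, step, metric, value } of rows) {
    if (Number.isNaN(step) || Number.isNaN(value)) continue;
    if (!byMetric.has(metric)) byMetric.set(metric, {});
    const runs = byMetric.get(metric);
    if (!runs[run]) runs[run] = [];
    runs[run].push({ step, value });
  }
  for (const runs of byMetric.values()) {
    for (const points of Object.values(runs)) points.sort((a, b) => a.step - b.step);
  }
  return byMetric;
}

// Nearest entry of an ascending, non-empty array to x, found by binary search
// (O(log n)) instead of a linear reduce. Ties resolve to the smaller step,
// matching the strict "<" comparison used when hovering in the embed.
function nearestStep(steps, x) {
  let lo = 0;
  let hi = steps.length - 1;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (steps[mid] < x) lo = mid + 1;
    else hi = mid;
  }
  // lo is now the first index with steps[lo] >= x, or the last index.
  if (lo > 0 && Math.abs(steps[lo - 1] - x) <= Math.abs(steps[lo] - x)) return steps[lo - 1];
  return steps[lo];
}

// Integer ticks 1..ceil(max) for a rank axis drawn with rank 1 at the top.
// Rounding the maximum up keeps fractional average ranks inside the domain.
function rankTicks(values) {
  const max = Math.max(1, Math.ceil(Math.max(...values)));
  return Array.from({ length: max }, (_, i) => i + 1);
}

// Usage with made-up rows (not data from the article):
const demo = groupByMetric([
  { run: 'run-a', step: 2000, metric: 'average_rank', value: 1.4 },
  { run: 'run-a', step: 1000, metric: 'average_rank', value: 2.6 },
  { run: 'run-b', step: 1000, metric: 'average_rank', value: 3.2 },
  { run: 'run-b', step: 2000, metric: 'average_rank', value: NaN },
]);
console.log(demo.get('average_rank')['run-a'].map((p) => p.step)); // [1000, 2000]
console.log(nearestStep([1000, 2000, 3000], 1600)); // 2000
console.log(rankTicks([2.6, 1.4, 3.2])); // [1, 2, 3, 4]
```

A binary search mainly pays off for long step lists; for a handful of evaluation checkpoints per run, the embed's linear `reduce` is just as adequate, and the sorted `steps` array it already builds is exactly the input `nearestStep` expects.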
app/src/content/embeds/ss-vs-s1.html
ADDED
@@ -0,0 +1,576 @@
<div class="d3-line" style="width:100%;margin:10px 0;"></div>
<style>
  .d3-line .d3-line__controls select {
    font-size: 12px;
    padding: 8px 28px 8px 10px;
    border: 1px solid var(--border-color);
    border-radius: 8px;
    background-color: var(--surface-bg);
    color: var(--text-color);
    background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='12' height='12' viewBox='0 0 24 24' fill='none' stroke='%230f1115' stroke-width='2' stroke-linecap='round' stroke-linejoin='round'%3E%3Cpolyline points='6 9 12 15 18 9'/%3E%3C/svg%3E");
    background-repeat: no-repeat;
    background-position: right 8px center;
    background-size: 12px;
    -webkit-appearance: none;
    -moz-appearance: none;
    appearance: none;
    cursor: pointer;
    transition: border-color .15s ease, box-shadow .15s ease;
  }
  [data-theme="dark"] .d3-line .d3-line__controls select {
    background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='12' height='12' viewBox='0 0 24 24' fill='none' stroke='%23ffffff' stroke-width='2' stroke-linecap='round' stroke-linejoin='round'%3E%3Cpolyline points='6 9 12 15 18 9'/%3E%3C/svg%3E");
  }
  .d3-line .d3-line__controls select:hover {
    border-color: var(--primary-color);
  }
  .d3-line .d3-line__controls select:focus {
    border-color: var(--primary-color);
    box-shadow: 0 0 0 3px rgba(232,137,171,.25);
    outline: none;
  }
  .d3-line .d3-line__controls label { gap: 8px; }

  /* Range slider themed with --primary-color */
  .d3-line .d3-line__controls input[type="range"] {
    -webkit-appearance: none;
    appearance: none;
    width: 100%;
    height: 6px;
    border-radius: 999px;
    background: var(--border-color);
    outline: none;
  }
  .d3-line .d3-line__controls input[type="range"]::-webkit-slider-runnable-track {
    height: 6px;
    background: transparent;
    border-radius: 999px;
  }
  .d3-line .d3-line__controls input[type="range"]::-webkit-slider-thumb {
    -webkit-appearance: none;
    appearance: none;
    width: 16px;
    height: 16px;
    border-radius: 50%;
    background: var(--primary-color);
    border: 2px solid var(--on-primary);
    margin-top: -5px;
    cursor: pointer;
  }
  .d3-line .d3-line__controls input[type="range"]::-moz-range-track {
    height: 6px;
    background: transparent;
    border-radius: 999px;
  }
  .d3-line .d3-line__controls input[type="range"]::-moz-range-thumb {
    width: 16px;
    height: 16px;
    border-radius: 50%;
    background: var(--primary-color);
    border: 2px solid var(--on-primary);
    cursor: pointer;
  }
  /* Improved line color via CSS */
  .d3-line .lines path.improved { stroke: var(--primary-color); }
</style>
+<script>
+(() => {
+  // Capture this <script> element now: document.currentScript is null once
+  // bootstrap runs later from a load or DOMContentLoaded callback.
+  const scriptEl = document.currentScript;
+
+  const ensureD3 = (cb) => {
+    if (window.d3 && typeof window.d3.select === 'function') return cb();
+    let s = document.getElementById('d3-cdn-script');
+    if (!s) {
+      s = document.createElement('script');
+      s.id = 'd3-cdn-script';
+      s.src = 'https://cdn.jsdelivr.net/npm/d3@7/dist/d3.min.js';
+      document.head.appendChild(s);
+    }
+    const onReady = () => { if (window.d3 && typeof window.d3.select === 'function') cb(); };
+    s.addEventListener('load', onReady, { once: true });
+    if (window.d3) onReady();
+  };
+
+  const bootstrap = () => {
+    // The element right before this <script> is the <style> block, so walk back
+    // through earlier siblings to this embed's own .d3-line mount; if none is
+    // found, fall back to the first chart on the page that is not mounted yet.
+    let container = null;
+    for (let el = scriptEl ? scriptEl.previousElementSibling : null; el && !container; el = el.previousElementSibling) {
+      container = el.matches('.d3-line') ? el : el.querySelector('.d3-line');
+    }
+    if (!container) container = Array.from(document.querySelectorAll('.d3-line')).find((el) => el.dataset.mounted !== 'true') || null;
+    if (!container) return;
+    if (container.dataset) {
+      if (container.dataset.mounted === 'true') return;
+      container.dataset.mounted = 'true';
+    }
+
+    // CSV: prefer public path, fallback to relative
+    const CSV_PATHS = [
+      '/data/ss_vs_s1.csv',
+      './assets/data/ss_vs_s1.csv',
+      '../assets/data/ss_vs_s1.csv',
+      '../../assets/data/ss_vs_s1.csv'
+    ];
+    const fetchFirstAvailable = async (paths) => {
+      for (const p of paths) {
+        try { const r = await fetch(p, { cache: 'no-cache' }); if (r.ok) return await r.text(); } catch(e) {}
+      }
+      throw new Error('CSV not found: ss_vs_s1.csv');
+    };
+
+    // Controls UI
+    const controls = document.createElement('div');
+    controls.className = 'd3-line__controls';
+    Object.assign(controls.style, {
+      marginTop: '12px',
+      display: 'flex',
+      gap: '16px',
+      alignItems: 'center',
+      justifyContent: 'space-between',
+      width: '100%'
+    });
+
+    const labelMetric = document.createElement('label');
+    Object.assign(labelMetric.style, {
+      fontSize: '12px', color: 'var(--muted-color)', display: 'flex', alignItems: 'center', gap: '6px', whiteSpace: 'nowrap', padding: '6px 10px', marginLeft: 'auto'
+    });
+    labelMetric.textContent = 'Metric';
+    const selectMetric = document.createElement('select');
+    Object.assign(selectMetric.style, { fontSize: '12px' });
+    labelMetric.appendChild(selectMetric);
+
+    // Inline legend, shown to the left of the metric select (the label is pushed right by marginLeft: auto)
+    const legendInline = document.createElement('div');
+    legendInline.className = 'controls__legend';
+    Object.assign(legendInline.style, {
+      display: 'flex',
+      gap: '8px',
+      alignItems: 'center',
+      flexWrap: 'nowrap',
+      fontSize: '11px',
+      marginLeft: '8px'
+    });
+    controls.appendChild(legendInline);
+    controls.appendChild(labelMetric);
+
+    // Create SVG with marker definitions
+    const svg = d3.select(container).append('svg')
+      .attr('width', '100%')
+      .style('display', 'block');
+
+    // Add marker definitions for different shapes
+    const defs = svg.append('defs');
+
+    // Academic marker shapes
+    const markerShapes = ['circle', 'square', 'triangle', 'diamond', 'inverted-triangle'];
+    const markerSize = 8;
+
+    // Groups
+    const gRoot = svg.append('g');
+    const gGrid = gRoot.append('g').attr('class', 'grid');
+    const gAxes = gRoot.append('g').attr('class', 'axes');
+    const gLines = gRoot.append('g').attr('class', 'lines');
+    const gPoints = gRoot.append('g').attr('class', 'points');
+    const gHover = gRoot.append('g').attr('class', 'hover');
+    const gLegend = gRoot.append('foreignObject').attr('class', 'legend');
+
+    // Tooltip
+    container.style.position = container.style.position || 'relative';
+    let tip = container.querySelector('.d3-tooltip');
+    let tipInner;
+    if (!tip) {
+      tip = document.createElement('div');
+      tip.className = 'd3-tooltip';
+      Object.assign(tip.style, {
+        position: 'absolute', top: '0px', left: '0px', transform: 'translate(-9999px, -9999px)', pointerEvents: 'none',
+        padding: '8px 10px', borderRadius: '8px', fontSize: '12px', lineHeight: '1.35', border: '1px solid var(--border-color)',
+        background: 'var(--surface-bg)', color: 'var(--text-color)', boxShadow: '0 4px 24px rgba(0,0,0,.18)', opacity: '0',
+        transition: 'opacity .12s ease'
+      });
+      tipInner = document.createElement('div');
+      tipInner.className = 'd3-tooltip__inner';
+      tipInner.style.textAlign = 'left';
+      tip.appendChild(tipInner);
+      container.appendChild(tip);
+    } else {
+      tipInner = tip.querySelector('.d3-tooltip__inner') || tip;
+    }
+
+    // Colors per run
+    const primary = getComputedStyle(document.documentElement).getPropertyValue('--primary-color').trim() || '#E889AB';
+    const pool = [primary, '#4EA5B7', '#E38A42', '#CEC0FA', ...(d3.schemeTableau10||[])];
+
+    // Mapping from metric names to display titles
+    const metricTitleMapping = {
+      'docvqa_val_anls': 'DocVQA',
+      'infovqa_val_anls': 'InfoVQA',
+      'mme_total_score': 'MME Total',
+      'mmmu_val_mmmu_acc': 'MMMU',
+      'mmstar_average': 'MMStar',
+      'ocrbench_ocrbench_accuracy': 'OCRBench',
+      'scienceqa_exact_match': 'ScienceQA',
+      'textvqa_val_exact_match': 'TextVQA',
+      'average': 'Average (excl. MME)',
+      'average_rank': 'Average Rank',
+      'ai2d_exact_match': 'AI2D',
+      'chartqa_relaxed_overall': 'ChartQA',
+      'seedbench_seed_all': 'SeedBench'
+    };
+
+    // Function to get display name for metric
+    function getMetricDisplayName(metricKey) {
+      return metricTitleMapping[metricKey] || metricKey;
+    }
+
+    // State and data
+    let metricList = [];
+    let runList = [];
+    let runOrder = [];
+    const dataByMetric = new Map(); // metric => { run => [{step,value}] }
+    let isRankStrictFlag = false;
+    let rankTickMax = 1;
+
+    // Scales and layout
+    let width = 800, height = 360;
+    let margin = { top: 16, right: 28, bottom: 56, left: 64 };
+    let xScale = d3.scaleLinear();
+    let yScale = d3.scaleLinear();
+
+    // Line generators - simple linear connections
+    const lineGen = d3.line()
+      .x((d) => xScale(d.step))
+      .y((d) => yScale(d.value));
+
+    // Function to draw different marker shapes
+    function drawMarker(selection, shape, size) {
+      const s = size / 2;
+      switch (shape) {
+        case 'circle':
+          return selection.append('circle').attr('r', s);
+        case 'square':
+          return selection.append('rect').attr('x', -s).attr('y', -s).attr('width', size).attr('height', size);
+        case 'triangle':
+          return selection.append('path').attr('d', `M0,${-s * 1.2} L${s * 1.1},${s * 0.6} L${-s * 1.1},${s * 0.6} Z`);
+        case 'diamond':
+          return selection.append('path').attr('d', `M0,${-s * 1.2} L${s * 1.1},0 L0,${s * 1.2} L${-s * 1.1},0 Z`);
+        case 'inverted-triangle':
+          return selection.append('path').attr('d', `M0,${s * 1.2} L${s * 1.1},${-s * 0.6} L${-s * 1.1},${-s * 0.6} Z`);
+        default:
+          return selection.append('circle').attr('r', s);
+      }
+    }
+
+    // Hover elements
+    const hoverLine = gHover.append('line').attr('stroke-width', 1);
+
+    const overlay = gHover.append('rect').attr('fill', 'transparent').style('cursor', 'crosshair');
+
+    function updateScales() {
+      const isDark = document.documentElement.getAttribute('data-theme') === 'dark';
+      const axisColor = isDark ? 'rgba(255,255,255,0.25)' : 'rgba(0,0,0,0.25)';
+      const tickColor = isDark ? 'rgba(255,255,255,0.70)' : 'rgba(0,0,0,0.55)';
+      const gridColor = isDark ? 'rgba(255,255,255,0.08)' : 'rgba(0,0,0,0.05)';
+
+      width = container.clientWidth || 800;
+      height = Math.max(360, Math.round(width / 2.2));
+      svg.attr('width', width).attr('height', height);
+
+      const innerWidth = width - margin.left - margin.right;
+      const innerHeight = height - margin.top - margin.bottom;
+      gRoot.attr('transform', `translate(${margin.left},${margin.top})`);
+
+      xScale.range([0, innerWidth]);
+      yScale.range([innerHeight, 0]);
+
+      // Compute Y ticks
+      let yTicks = [];
+      if (isRankStrictFlag) {
+        const maxR = Math.max(1, Math.round(rankTickMax));
+        for (let v = 1; v <= maxR; v += 1) yTicks.push(v);
+      } else {
+        // Use D3's tick generator to produce nice floating-point ticks
+        yTicks = yScale.ticks(6);
+      }
+
+      // Grid (horizontal)
+      gGrid.selectAll('*').remove();
+      gGrid.selectAll('line')
+        .data(yTicks)
+        .join('line')
+        .attr('x1', 0)
+        .attr('x2', innerWidth)
+        .attr('y1', (d) => yScale(d))
+        .attr('y2', (d) => yScale(d))
+        .attr('stroke', gridColor)
+        .attr('stroke-width', 1)
+        .attr('shape-rendering', 'crispEdges');
+
+      // Axes
+      gAxes.selectAll('*').remove();
+      let xAxis = d3.axisBottom(xScale).tickSizeOuter(0);
+      if (isRankStrictFlag) {
+        const [dx0, dx1] = xScale.domain();
+        const start = Math.ceil(dx0 / 1000) * 1000;
+        const end = Math.floor(dx1 / 1000) * 1000;
+        const xTicks = [];
+        for (let v = start; v <= end; v += 1000) xTicks.push(v);
+        if (xTicks.length === 0) xTicks.push(Math.round(dx0));
+        xAxis = xAxis.tickValues(xTicks).tickFormat(d3.format('d'));
+      } else {
+        xAxis = xAxis.ticks(8);
+      }
+      const yAxis = d3.axisLeft(yScale)
+        .tickValues(yTicks)
+        .tickSizeOuter(0)
+        .tickFormat(isRankStrictFlag ? d3.format('d') : d3.format('.2f'));
+      gAxes.append('g')
+        .attr('transform', `translate(0,${innerHeight})`)
+        .call(xAxis)
+        .call((g) => {
+          g.selectAll('path, line').attr('stroke', axisColor);
+          g.selectAll('text').attr('fill', tickColor).style('font-size', '12px');
+        });
+      gAxes.append('g')
+        .call(yAxis)
+        .call((g) => {
+          g.selectAll('path, line').attr('stroke', axisColor);
+          g.selectAll('text').attr('fill', tickColor).style('font-size', '12px');
+        });
+
+      // Axis labels (X and Y)
+      gAxes.append('text')
+        .attr('class', 'axis-label axis-label--x')
+        .attr('x', innerWidth / 2)
+        .attr('y', innerHeight + 44)
+        .attr('text-anchor', 'middle')
+        .style('font-size', '12px')
+        .style('fill', tickColor)
+        .text('Step');
+      gAxes.append('text')
+        .attr('class', 'axis-label axis-label--y')
+        .attr('text-anchor', 'middle')
+        .attr('transform', `translate(${-44},${innerHeight/2}) rotate(-90)`)
+        .style('font-size', '12px')
+        .style('fill', tickColor)
+        .text('Value');
+
+      overlay.attr('x', 0).attr('y', 0).attr('width', innerWidth).attr('height', innerHeight);
+      hoverLine.attr('y1', 0).attr('y2', innerHeight).attr('stroke', axisColor);
+
+      // Legend placeholder; actual content set in renderMetric
+      const legendWidth = Math.min(180, Math.max(120, Math.round(innerWidth * 0.22)));
+      const legendHeight = 64;
+      gLegend
+        .attr('x', innerWidth - legendWidth + 42)
+        .attr('y', innerHeight - legendHeight - 12)
+        .attr('width', legendWidth)
+        .attr('height', legendHeight);
+      const legendRoot = gLegend.selectAll('div').data([0]).join('xhtml:div');
+      Object.assign(legendRoot.node().style, {
+        background: 'transparent',
+        border: 'none',
+        borderRadius: '0',
+        padding: '0',
+        fontSize: '12px',
+        lineHeight: '1.35',
+        color: 'var(--text-color)'
+      });
+
+      return { innerWidth, innerHeight };
+    }
+
+    function renderMetric(metricKey){
+      const map = dataByMetric.get(metricKey) || {};
+      const runs = runOrder;
+      // Domain
+      let minStep = Infinity, maxStep = -Infinity, maxVal = 0, minVal = Infinity;
+      const isRank = /rank/i.test(metricKey);
+      const isAverage = /average/i.test(metricKey);
+      const isRankStrict = isRank && !isAverage;
+      runs.forEach(r => {
+        const arr = map[r] || [];
+        arr.forEach(pt => {
+          const val = isRankStrict ? Math.round(pt.value) : pt.value;
+          minStep = Math.min(minStep, pt.step);
+          maxStep = Math.max(maxStep, pt.step);
+          maxVal = Math.max(maxVal, val);
+          minVal = Math.min(minVal, val);
+        });
+      });
+      if (!isFinite(minStep) || !isFinite(maxStep)) { return; }
+      xScale.domain([minStep, maxStep]);
+      if (isRank) {
+        // Rank axes are inverted (1 at the top); ceil keeps fractional average ranks inside the plot
+        rankTickMax = Math.max(1, Math.ceil(maxVal));
+        yScale.domain([rankTickMax, 1]);
+      } else {
+        yScale.domain([0, Math.max(1, maxVal)]).nice();
+      }
+      isRankStrictFlag = isRankStrict;
+
+      const { innerWidth, innerHeight } = updateScales();
+
+      // Bind lines and markers
+      const series = runs.map((r, i) => ({
+        run: r,
+        color: pool[i % pool.length],
+        marker: markerShapes[i % markerShapes.length],
+        values: (map[r]||[])
+          .slice()
+          .sort((a,b)=>a.step-b.step)
+          .map(pt => isRankStrict ? { step: pt.step, value: Math.round(pt.value) } : pt)
+      }));
+
+      // Draw lines
+      const paths = gLines.selectAll('path.run-line').data(series, d=>d.run);
+      paths.enter().append('path').attr('class','run-line').attr('fill','none').attr('stroke-width',2)
+        .attr('stroke', d=>d.color).attr('opacity',0.9)
+        .attr('d', d=>lineGen(d.values))
+        .merge(paths)
+        .transition().duration(200)
+        .attr('stroke', d=>d.color)
+        .attr('d', d=>lineGen(d.values));
+      paths.exit().remove();
+
+      // Draw markers for each data point
+      gPoints.selectAll('*').remove();
+      series.forEach((s, seriesIndex) => {
+        const pointGroup = gPoints.selectAll(`.points-${seriesIndex}`)
+          .data(s.values)
+          .join('g')
+          .attr('class', `points-${seriesIndex}`)
+          .attr('transform', d => `translate(${xScale(d.step)},${yScale(d.value)})`);
+
+        drawMarker(pointGroup, s.marker, markerSize)
+          .attr('fill', s.color)
+          .attr('stroke', s.color)
+          .attr('stroke-width', 1.5)
+          .style('cursor', 'crosshair');
+      });
+
+      // Inline legend content with marker shapes
+      legendInline.innerHTML = '';
+      series.forEach(s => {
+        const legendItem = document.createElement('span');
+        legendItem.style.cssText = 'display:inline-flex;align-items:center;gap:6px;white-space:nowrap;';
+
+        // Create small SVG for marker shape
+        const markerSvg = document.createElementNS('http://www.w3.org/2000/svg', 'svg');
+        markerSvg.setAttribute('width', '16');
+        markerSvg.setAttribute('height', '12');
+        markerSvg.style.display = 'inline-block';
+
+        const g = document.createElementNS('http://www.w3.org/2000/svg', 'g');
+        g.setAttribute('transform', 'translate(8,6)');
+
+        let shape;
+        const size = 6;
+        const halfSize = size / 2;
+        switch(s.marker) {
+          case 'circle':
+            shape = document.createElementNS('http://www.w3.org/2000/svg', 'circle');
+            shape.setAttribute('r', halfSize);
+            break;
+          case 'square':
+            shape = document.createElementNS('http://www.w3.org/2000/svg', 'rect');
+            shape.setAttribute('x', -halfSize);
+            shape.setAttribute('y', -halfSize);
+            shape.setAttribute('width', size);
+            shape.setAttribute('height', size);
+            break;
+          case 'triangle':
+            shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
+            shape.setAttribute('d', `M0,${-halfSize * 1.2} L${halfSize * 1.1},${halfSize * 0.6} L${-halfSize * 1.1},${halfSize * 0.6} Z`);
+            break;
+          case 'diamond':
+            shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
+            shape.setAttribute('d', `M0,${-halfSize * 1.2} L${halfSize * 1.1},0 L0,${halfSize * 1.2} L${-halfSize * 1.1},0 Z`);
+            break;
+          case 'inverted-triangle':
+            shape = document.createElementNS('http://www.w3.org/2000/svg', 'path');
+            shape.setAttribute('d', `M0,${halfSize * 1.2} L${halfSize * 1.1},${-halfSize * 0.6} L${-halfSize * 1.1},${-halfSize * 0.6} Z`);
+            break;
+          default:
+            shape = document.createElementNS('http://www.w3.org/2000/svg', 'circle');
+            shape.setAttribute('r', halfSize);
+        }
+        shape.setAttribute('fill', s.color);
+        shape.setAttribute('stroke', s.color);
+        shape.setAttribute('stroke-width', '1');
+
+        g.appendChild(shape);
+        markerSvg.appendChild(g);
+
+        const label = document.createElement('span');
+        label.textContent = s.run;
+
+        legendItem.appendChild(markerSvg);
+        legendItem.appendChild(label);
+        legendInline.appendChild(legendItem);
+      });
+
+      // Hover
+      const stepSet = new Set(); series.forEach(s=>s.values.forEach(v=>stepSet.add(v.step)));
+      const steps = Array.from(stepSet).sort((a,b)=>a-b);
+      function onMove(event){
+        const [mx, my] = d3.pointer(event, overlay.node());
+        // Snap to the logged step closest to the cursor
+        const x0 = xScale.invert(mx);
+        const nearest = steps.reduce((best, s)=> Math.abs(s - x0) < Math.abs(best - x0) ? s : best, steps[0]);
+        const xpx = xScale(nearest);
+        // The stroke color is set per theme in updateScales(); do not override it here
+        hoverLine.attr('x1', xpx).attr('x2', xpx).style('display', null);
+        // Tooltip content
+        let html = `<div><strong>${getMetricDisplayName(metricKey)}</strong></div><div><strong>step</strong> ${nearest}</div>`;
+        series.forEach(s=>{
+          const m = new Map(s.values.map(v=>[v.step, v.value]));
+          const val = m.has(nearest) ? m.get(nearest) : null;
+          if (val != null) {
+            const formatVal = (vv) => (isRankStrict ? d3.format('d')(vv) : (+vv).toFixed(4));
+            html += `<div><span style=\"display:inline-block;width:10px;height:10px;background:${s.color};border-radius:50%;margin-right:6px;\"></span><strong>${s.run}</strong> ${formatVal(val)}</div>`;
+          }
+        });
+        tipInner.innerHTML = html;
+        const offsetX = 12, offsetY = 12;
+        tip.style.opacity = '1'; tip.style.transform = `translate(${Math.round(mx + offsetX + margin.left)}px, ${Math.round(my + offsetY + margin.top)}px)`;
+      }
+      function onLeave(){ tip.style.opacity='0'; tip.style.transform='translate(-9999px, -9999px)'; hoverLine.style('display','none'); }
+      overlay.on('mousemove', onMove).on('mouseleave', onLeave);
+    }
+
+    // (old hover removed; hover is attached in renderMetric)
+
+    // Load CSV and wire controls
+    (async () => {
+      try {
+        const text = await fetchFirstAvailable(CSV_PATHS);
+        const rows = d3.csvParse(text, d => ({ run: (d.run||'').trim(), step: +d.step, metric: (d.metric||'').trim(), value: +d.value }));
+        metricList = Array.from(new Set(rows.map(r=>r.metric))).sort();
+        runList = Array.from(new Set(rows.map(r=>r.run))).sort();
+        runOrder = runList;
+        // Build dataByMetric
+        metricList.forEach(m => {
+          const map = {};
+          runList.forEach(r => { map[r] = []; });
+          rows.filter(r=>r.metric===m).forEach(r => { if (!isNaN(r.step) && !isNaN(r.value)) map[r.run].push({ step:r.step, value:r.value }); });
+          dataByMetric.set(m, map);
+        });
+
+        // Populate metric select (default to average_rank if present)
+        metricList.forEach((m)=>{ const o=document.createElement('option'); o.value=m; o.textContent=getMetricDisplayName(m); selectMetric.appendChild(o); });
+        const def = metricList.find(m => /average_rank/i.test(m)) || metricList[0];
+        if (def) selectMetric.value = def;
+
+        container.appendChild(controls);
+        updateScales();
+        renderMetric(selectMetric.value);
+
+        selectMetric.addEventListener('change', ()=>{ renderMetric(selectMetric.value); });
+
+        const rerender = () => { renderMetric(selectMetric.value); };
+        if (window.ResizeObserver) { const ro = new ResizeObserver(()=>rerender()); ro.observe(container); } else { window.addEventListener('resize', rerender); }
+      } catch (e) {
+        const pre = document.createElement('pre'); pre.textContent = 'CSV load error: ' + (e && e.message ? e.message : e);
+        pre.style.color = 'var(--danger, #b00020)'; pre.style.fontSize = '12px'; pre.style.whiteSpace = 'pre-wrap';
+        container.appendChild(pre);
+      }
+    })();
+  };
+
+  if (document.readyState === 'loading') {
+    document.addEventListener('DOMContentLoaded', () => ensureD3(bootstrap), { once: true });
+  } else { ensureD3(bootstrap); }
+})();
+</script>
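The `onMove` hover handler in `renderMetric` snaps the cursor to the nearest logged step with a linear `reduce` over `steps`. That is fine for a few hundred points. Because `steps` is already sorted, the same lookup can also be done in O(log n) with a binary search. The sketch below assumes a sorted, non-empty array of numeric steps. `nearestStep` is a hypothetical helper and is not part of these embeds; D3's `d3.bisector` covers the same pattern.

```javascript
// Hypothetical helper (not in the embed): return the value in a sorted,
// non-empty `steps` array that is closest to `x`, via binary search.
function nearestStep(steps, x) {
  let lo = 0;
  let hi = steps.length - 1;
  // Narrow to the first index whose step is >= x (or the last index if none is).
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (steps[mid] < x) lo = mid + 1;
    else hi = mid;
  }
  // Compare with the left neighbour; ties go to the smaller step,
  // matching the `reduce` version that keeps the earlier candidate.
  if (lo > 0 && x - steps[lo - 1] <= steps[lo] - x) return steps[lo - 1];
  return steps[lo];
}

console.log(nearestStep([0, 1000, 2000, 3000], 1400)); // 1000
```

Inside `onMove`, this would stand in for the `reduce` call as `nearestStep(steps, xScale.invert(mx))`.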
+
+

app/src/content/embeds/visual-dependency-filters.html
ADDED
@@ -0,0 +1,576 @@
+<div class="d3-line" style="width:100%;margin:10px 0;"></div>
+<style>
+.d3-line .d3-line__controls select {
+  font-size: 12px;
+  padding: 8px 28px 8px 10px;
+  border: 1px solid var(--border-color);
+  border-radius: 8px;
+  background-color: var(--surface-bg);
+  color: var(--text-color);
+  background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='12' height='12' viewBox='0 0 24 24' fill='none' stroke='%230f1115' stroke-width='2' stroke-linecap='round' stroke-linejoin='round'%3E%3Cpolyline points='6 9 12 15 18 9'/%3E%3C/svg%3E");
+  background-repeat: no-repeat;
+  background-position: right 8px center;
+  background-size: 12px;
+  -webkit-appearance: none;
+  -moz-appearance: none;
+  appearance: none;
+  cursor: pointer;
+  transition: border-color .15s ease, box-shadow .15s ease;
+}
+[data-theme="dark"] .d3-line .d3-line__controls select {
+  background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='12' height='12' viewBox='0 0 24 24' fill='none' stroke='%23ffffff' stroke-width='2' stroke-linecap='round' stroke-linejoin='round'%3E%3Cpolyline points='6 9 12 15 18 9'/%3E%3C/svg%3E");
+}
+.d3-line .d3-line__controls select:hover {
+  border-color: var(--primary-color);
+}
+.d3-line .d3-line__controls select:focus {
+  border-color: var(--primary-color);
+  box-shadow: 0 0 0 3px rgba(232,137,171,.25);
+  outline: none;
+}
+.d3-line .d3-line__controls label { gap: 8px; }
+
+/* Range slider themed with --primary-color */
+.d3-line .d3-line__controls input[type="range"] {
+  -webkit-appearance: none;
+  appearance: none;
+  width: 100%;
+  height: 6px;
+  border-radius: 999px;
+  background: var(--border-color);
+  outline: none;
+}
+.d3-line .d3-line__controls input[type="range"]::-webkit-slider-runnable-track {
+  height: 6px;
+  background: transparent;
+  border-radius: 999px;
+}
+.d3-line .d3-line__controls input[type="range"]::-webkit-slider-thumb {
+  -webkit-appearance: none;
+  appearance: none;
+  width: 16px;
+  height: 16px;
+  border-radius: 50%;
+  background: var(--primary-color);
+  border: 2px solid var(--on-primary);
+  margin-top: -5px;
+  cursor: pointer;
+}
+.d3-line .d3-line__controls input[type="range"]::-moz-range-track {
+  height: 6px;
+  background: transparent;
+  border-radius: 999px;
+}
+.d3-line .d3-line__controls input[type="range"]::-moz-range-thumb {
+  width: 16px;
+  height: 16px;
+  border-radius: 50%;
+  background: var(--primary-color);
+  border: 2px solid var(--on-primary);
+  cursor: pointer;
+}
+/* Improved line color via CSS */
+.d3-line .lines path.improved { stroke: var(--primary-color); }
+</style>
+<script>
+(() => {
+  // Capture this <script> element now: document.currentScript is null once
+  // bootstrap runs later from a load or DOMContentLoaded callback.
+  const scriptEl = document.currentScript;
+
+  const ensureD3 = (cb) => {
+    if (window.d3 && typeof window.d3.select === 'function') return cb();
+    let s = document.getElementById('d3-cdn-script');
+    if (!s) {
+      s = document.createElement('script');
+      s.id = 'd3-cdn-script';
+      s.src = 'https://cdn.jsdelivr.net/npm/d3@7/dist/d3.min.js';
+      document.head.appendChild(s);
+    }
+    const onReady = () => { if (window.d3 && typeof window.d3.select === 'function') cb(); };
+    s.addEventListener('load', onReady, { once: true });
+    if (window.d3) onReady();
+  };
+
+  const bootstrap = () => {
+    // The element right before this <script> is the <style> block, so walk back
+    // through earlier siblings to this embed's own .d3-line mount; if none is
+    // found, fall back to the first chart on the page that is not mounted yet.
+    let container = null;
+    for (let el = scriptEl ? scriptEl.previousElementSibling : null; el && !container; el = el.previousElementSibling) {
+      container = el.matches('.d3-line') ? el : el.querySelector('.d3-line');
+    }
+    if (!container) container = Array.from(document.querySelectorAll('.d3-line')).find((el) => el.dataset.mounted !== 'true') || null;
+    if (!container) return;
+    if (container.dataset) {
+      if (container.dataset.mounted === 'true') return;
+      container.dataset.mounted = 'true';
+    }
|
| 99 |
+
|
| 100 |
+
// CSV: prefer public path, fallback to relative
|
| 101 |
+
const CSV_PATHS = [
|
| 102 |
+
'/data/visual_dependency_filters.csv',
|
| 103 |
+
'./assets/data/visual_dependency_filters.csv',
|
| 104 |
+
'../assets/data/visual_dependency_filters.csv',
|
| 105 |
+
'../../assets/data/visual_dependency_filters.csv'
|
| 106 |
+
];
|
| 107 |
+
const fetchFirstAvailable = async (paths) => {
|
| 108 |
+
for (const p of paths) {
|
| 109 |
+
try { const r = await fetch(p, { cache: 'no-cache' }); if (r.ok) return await r.text(); } catch(e) {}
|
| 110 |
+
}
|
| 111 |
+
throw new Error('CSV not found: visual_dependency_filters.csv');
|
| 112 |
+
};
|
| 113 |
+
|
| 114 |
+
// Controls UI
|
| 115 |
+
const controls = document.createElement('div');
|
| 116 |
+
controls.className = 'd3-line__controls';
|
| 117 |
+
Object.assign(controls.style, {
|
| 118 |
+
marginTop: '12px',
|
| 119 |
+
display: 'flex',
|
| 120 |
+
gap: '16px',
|
| 121 |
+
alignItems: 'center',
|
| 122 |
+
justifyContent: 'space-between',
|
| 123 |
+
width: '100%'
|
| 124 |
+
});
|
| 125 |
+
|
| 126 |
+
const labelMetric = document.createElement('label');
|
| 127 |
+
Object.assign(labelMetric.style, {
|
| 128 |
+
fontSize: '12px', color: 'var(--muted-color)', display: 'flex', alignItems: 'center', gap: '6px', whiteSpace: 'nowrap', padding: '6px 10px', marginLeft: 'auto'
|
| 129 |
+
});
|
| 130 |
+
labelMetric.textContent = 'Metric';
|
| 131 |
+
const selectMetric = document.createElement('select');
|
| 132 |
+
Object.assign(selectMetric.style, { fontSize: '12px' });
|
| 133 |
+
labelMetric.appendChild(selectMetric);
|
| 134 |
+
|
| 135 |
+
// Inline legend on the right of the select
|
| 136 |
+
const legendInline = document.createElement('div');
|
| 137 |
+
legendInline.className = 'controls__legend';
|
| 138 |
+
Object.assign(legendInline.style, {
|
| 139 |
+
display: 'flex',
|
| 140 |
+
gap: '8px',
|
| 141 |
+
alignItems: 'center',
|
| 142 |
+
flexWrap: 'nowrap',
|
| 143 |
+
fontSize: '11px',
|
| 144 |
+
marginLeft: '8px'
|
| 145 |
+
});
|
| 146 |
+
controls.appendChild(legendInline);
|
| 147 |
+
controls.appendChild(labelMetric);
|
| 148 |
+
|
      // Create the SVG (markers are drawn as plain shapes by drawMarker, no <defs> needed)
      const svg = d3.select(container).append('svg')
        .attr('width', '100%')
        .style('display', 'block');

      // Marker shapes, cycled per run
      const markerShapes = ['circle', 'square', 'triangle', 'diamond', 'inverted-triangle'];
      const markerSize = 8;

      // Groups
      const gRoot = svg.append('g');
      const gGrid = gRoot.append('g').attr('class', 'grid');
      const gAxes = gRoot.append('g').attr('class', 'axes');
      const gLines = gRoot.append('g').attr('class', 'lines');
      const gPoints = gRoot.append('g').attr('class', 'points');
      const gHover = gRoot.append('g').attr('class', 'hover');
      const gLegend = gRoot.append('foreignObject').attr('class', 'legend');

      // Tooltip
      container.style.position = container.style.position || 'relative';
      let tip = container.querySelector('.d3-tooltip');
      let tipInner;
      if (!tip) {
        tip = document.createElement('div');
        tip.className = 'd3-tooltip';
        Object.assign(tip.style, {
          position: 'absolute', top: '0px', left: '0px', transform: 'translate(-9999px, -9999px)', pointerEvents: 'none',
          padding: '8px 10px', borderRadius: '8px', fontSize: '12px', lineHeight: '1.35', border: '1px solid var(--border-color)',
          background: 'var(--surface-bg)', color: 'var(--text-color)', boxShadow: '0 4px 24px rgba(0,0,0,.18)', opacity: '0',
          transition: 'opacity .12s ease'
        });
        tipInner = document.createElement('div');
        tipInner.className = 'd3-tooltip__inner';
        tipInner.style.textAlign = 'left';
        tip.appendChild(tipInner);
        container.appendChild(tip);
      } else {
        tipInner = tip.querySelector('.d3-tooltip__inner') || tip;
      }

      // Colors per run
      const primary = getComputedStyle(document.documentElement).getPropertyValue('--primary-color').trim() || '#E889AB';
      const pool = [primary, '#4EA5B7', '#E38A42', '#CEC0FA', ...(d3.schemeTableau10 || [])];

      // Mapping from metric names to display titles
      const metricTitleMapping = {
        'docvqa_val_anls': 'DocVQA',
        'infovqa_val_anls': 'InfoVQA',
        'mme_total_score': 'MME Total',
        'mmmu_val_mmmu_acc': 'MMMU',
        'mmstar_average': 'MMStar',
        'ocrbench_ocrbench_accuracy': 'OCRBench',
        'scienceqa_exact_match': 'ScienceQA',
        'textvqa_val_exact_match': 'TextVQA',
        'average': 'Average (excl. MME)',
        'average_rank': 'Average Rank',
        'ai2d_exact_match': 'AI2D',
        'chartqa_relaxed_overall': 'ChartQA',
        'seedbench_seed_all': 'SeedBench'
      };

      // Display name for a metric key (falls back to the raw key)
      function getMetricDisplayName(metricKey) {
        return metricTitleMapping[metricKey] || metricKey;
      }

      // State and data
      let metricList = [];
      let runList = [];
      let runOrder = [];
      const dataByMetric = new Map(); // metric => { run => [{step,value}] }
      let isRankStrictFlag = false;
      let rankTickMax = 1;

      // Scales and layout
      let width = 800, height = 360;
      let margin = { top: 16, right: 28, bottom: 56, left: 64 };
      let xScale = d3.scaleLinear();
      let yScale = d3.scaleLinear();

      // Line generator: straight segments between consecutive points
      const lineGen = d3.line()
        .x((d) => xScale(d.step))
        .y((d) => yScale(d.value));

      // Append one marker shape to a selection and return the appended elements
      function drawMarker(selection, shape, size) {
        const s = size / 2;
        switch (shape) {
          case 'circle':
            return selection.append('circle').attr('r', s);
          case 'square':
            return selection.append('rect').attr('x', -s).attr('y', -s).attr('width', size).attr('height', size);
          case 'triangle':
            return selection.append('path').attr('d', `M0,${-s * 1.2} L${s * 1.1},${s * 0.6} L${-s * 1.1},${s * 0.6} Z`);
          case 'diamond':
            return selection.append('path').attr('d', `M0,${-s * 1.2} L${s * 1.1},0 L0,${s * 1.2} L${-s * 1.1},0 Z`);
          case 'inverted-triangle':
            return selection.append('path').attr('d', `M0,${s * 1.2} L${s * 1.1},${-s * 0.6} L${-s * 1.1},${-s * 0.6} Z`);
          default:
            return selection.append('circle').attr('r', s);
        }
      }

      // Hover elements (the guide line stays hidden until the pointer enters the plot)
      const hoverLine = gHover.append('line').attr('stroke-width', 1).style('display', 'none');

      const overlay = gHover.append('rect').attr('fill', 'transparent').style('cursor', 'crosshair');
      function updateScales() {
        const isDark = document.documentElement.getAttribute('data-theme') === 'dark';
        const axisColor = isDark ? 'rgba(255,255,255,0.25)' : 'rgba(0,0,0,0.25)';
        const tickColor = isDark ? 'rgba(255,255,255,0.70)' : 'rgba(0,0,0,0.55)';
        const gridColor = isDark ? 'rgba(255,255,255,0.08)' : 'rgba(0,0,0,0.05)';

        width = container.clientWidth || 800;
        height = Math.max(360, Math.round(width / 2.2));
        svg.attr('width', width).attr('height', height);

        const innerWidth = width - margin.left - margin.right;
        const innerHeight = height - margin.top - margin.bottom;
        gRoot.attr('transform', `translate(${margin.left},${margin.top})`);

        xScale.range([0, innerWidth]);
        yScale.range([innerHeight, 0]);

        // Compute Y ticks
        let yTicks = [];
        if (isRankStrictFlag) {
          const maxR = Math.max(1, Math.round(rankTickMax));
          for (let v = 1; v <= maxR; v += 1) yTicks.push(v);
        } else {
          // Use D3's tick generator to produce nice floating-point ticks
          yTicks = yScale.ticks(6);
        }

        // Grid (horizontal)
        gGrid.selectAll('*').remove();
        gGrid.selectAll('line')
          .data(yTicks)
          .join('line')
          .attr('x1', 0)
          .attr('x2', innerWidth)
          .attr('y1', (d) => yScale(d))
          .attr('y2', (d) => yScale(d))
          .attr('stroke', gridColor)
          .attr('stroke-width', 1)
          .attr('shape-rendering', 'crispEdges');

        // Axes
        gAxes.selectAll('*').remove();
        let xAxis = d3.axisBottom(xScale).tickSizeOuter(0);
        if (isRankStrictFlag) {
          // Strict-rank charts: one x tick every 1000 steps
          const [dx0, dx1] = xScale.domain();
          const start = Math.ceil(dx0 / 1000) * 1000;
          const end = Math.floor(dx1 / 1000) * 1000;
          const xTicks = [];
          for (let v = start; v <= end; v += 1000) xTicks.push(v);
          if (xTicks.length === 0) xTicks.push(Math.round(dx0));
          xAxis = xAxis.tickValues(xTicks).tickFormat(d3.format('d'));
        } else {
          xAxis = xAxis.ticks(8);
        }
        const yAxis = d3.axisLeft(yScale)
          .tickValues(yTicks)
          .tickSizeOuter(0)
          .tickFormat(isRankStrictFlag ? d3.format('d') : d3.format('.2f'));
        gAxes.append('g')
          .attr('transform', `translate(0,${innerHeight})`)
          .call(xAxis)
          .call((g) => {
            g.selectAll('path, line').attr('stroke', axisColor);
            g.selectAll('text').attr('fill', tickColor).style('font-size', '12px');
          });
        gAxes.append('g')
          .call(yAxis)
          .call((g) => {
            g.selectAll('path, line').attr('stroke', axisColor);
            g.selectAll('text').attr('fill', tickColor).style('font-size', '12px');
          });

        // Axis labels (X and Y)
        gAxes.append('text')
          .attr('class', 'axis-label axis-label--x')
          .attr('x', innerWidth / 2)
          .attr('y', innerHeight + 44)
          .attr('text-anchor', 'middle')
          .style('font-size', '12px')
          .style('fill', tickColor)
          .text('Step');
        gAxes.append('text')
          .attr('class', 'axis-label axis-label--y')
          .attr('text-anchor', 'middle')
          .attr('transform', `translate(${-44},${innerHeight/2}) rotate(-90)`)
          .style('font-size', '12px')
          .style('fill', tickColor)
          .text('Value');

        overlay.attr('x', 0).attr('y', 0).attr('width', innerWidth).attr('height', innerHeight);
        hoverLine.attr('y1', 0).attr('y2', innerHeight).attr('stroke', axisColor);

        // In-plot legend placeholder; it stays empty because renderMetric fills the inline legend in the controls bar
        const legendWidth = Math.min(180, Math.max(120, Math.round(innerWidth * 0.22)));
        const legendHeight = 64;
        gLegend
          .attr('x', innerWidth - legendWidth + 42)
          .attr('y', innerHeight - legendHeight - 12)
          .attr('width', legendWidth)
          .attr('height', legendHeight);
        const legendRoot = gLegend.selectAll('div').data([0]).join('xhtml:div');
        Object.assign(legendRoot.node().style, {
          background: 'transparent',
          border: 'none',
          borderRadius: '0',
          padding: '0',
          fontSize: '12px',
          lineHeight: '1.35',
          color: 'var(--text-color)'
        });

        return { innerWidth, innerHeight };
      }
      function renderMetric(metricKey) {
        const map = dataByMetric.get(metricKey) || {};
        const runs = runOrder;
        // Domain
        let minStep = Infinity, maxStep = -Infinity, maxVal = 0, minVal = Infinity;
        const isRank = /rank/i.test(metricKey);
        const isAverage = /average/i.test(metricKey);
        const isRankStrict = isRank && !isAverage;
        runs.forEach(r => {
          const arr = map[r] || [];
          arr.forEach(pt => {
            const val = isRankStrict ? Math.round(pt.value) : pt.value;
            minStep = Math.min(minStep, pt.step);
            maxStep = Math.max(maxStep, pt.step);
            maxVal = Math.max(maxVal, val);
            minVal = Math.min(minVal, val);
          });
        });
        if (!isFinite(minStep) || !isFinite(maxStep)) { return; }
        xScale.domain([minStep, maxStep]);
        if (isRank) {
          // Rank 1 at the top; ceil (not round) keeps fractional average ranks inside the domain
          rankTickMax = Math.max(1, Math.ceil(maxVal));
          yScale.domain([rankTickMax, 1]);
        } else {
          yScale.domain([0, Math.max(1, maxVal)]).nice();
        }
        isRankStrictFlag = isRankStrict;

        const { innerWidth, innerHeight } = updateScales();

        // Bind lines and markers
        const series = runs.map((r, i) => ({
          run: r,
          color: pool[i % pool.length],
          marker: markerShapes[i % markerShapes.length],
          values: (map[r] || [])
            .slice()
            .sort((a, b) => a.step - b.step)
            .map(pt => isRankStrict ? { step: pt.step, value: Math.round(pt.value) } : pt)
        }));

        // Draw lines
        const paths = gLines.selectAll('path.run-line').data(series, d => d.run);
        paths.enter().append('path').attr('class', 'run-line').attr('fill', 'none').attr('stroke-width', 2)
          .attr('stroke', d => d.color).attr('opacity', 0.9)
          .attr('d', d => lineGen(d.values))
          .merge(paths)
          .transition().duration(200)
          .attr('stroke', d => d.color)
          .attr('d', d => lineGen(d.values));
        paths.exit().remove();

        // Draw markers for each data point
        gPoints.selectAll('*').remove();
        series.forEach((s, seriesIndex) => {
          const pointGroup = gPoints.selectAll(`.points-${seriesIndex}`)
            .data(s.values)
            .join('g')
            .attr('class', `points-${seriesIndex}`)
            .attr('transform', d => `translate(${xScale(d.step)},${yScale(d.value)})`);

          drawMarker(pointGroup, s.marker, markerSize)
            .attr('fill', s.color)
            .attr('stroke', s.color)
            .attr('stroke-width', 1.5)
            .style('cursor', 'crosshair');
        });

        // Inline legend content with marker shapes
        legendInline.innerHTML = '';
        series.forEach(s => {
          const legendItem = document.createElement('span');
          legendItem.style.cssText = 'display:inline-flex;align-items:center;gap:6px;white-space:nowrap;';

          // Small 16x12 SVG swatch; reuse drawMarker at size 6, centered at (8,6)
          const markerSvg = document.createElementNS('http://www.w3.org/2000/svg', 'svg');
          markerSvg.setAttribute('width', '16');
          markerSvg.setAttribute('height', '12');
          markerSvg.style.display = 'inline-block';
          const g = d3.select(markerSvg).append('g').attr('transform', 'translate(8,6)');
          drawMarker(g, s.marker, 6)
            .attr('fill', s.color)
            .attr('stroke', s.color)
            .attr('stroke-width', 1);

          const label = document.createElement('span');
          label.textContent = s.run;

          legendItem.appendChild(markerSvg);
          legendItem.appendChild(label);
          legendInline.appendChild(legendItem);
        });
        // Hover: snap to the nearest logged step across all runs
        const stepSet = new Set(); series.forEach(s => s.values.forEach(v => stepSet.add(v.step)));
        const steps = Array.from(stepSet).sort((a, b) => a - b);
        function onMove(event) {
          const [mx, my] = d3.pointer(event, overlay.node());
          const x0 = xScale.invert(mx);
          const nearest = steps.reduce((best, s) => Math.abs(s - x0) < Math.abs(best - x0) ? s : best, steps[0]);
          const xpx = xScale(nearest);
          // Stroke color is set (theme-aware) in updateScales
          hoverLine.attr('x1', xpx).attr('x2', xpx).style('display', null);
          // Tooltip content
          let html = `<div><strong>${getMetricDisplayName(metricKey)}</strong></div><div><strong>step</strong> ${nearest}</div>`;
          const formatVal = (vv) => (isRankStrict ? d3.format('d')(vv) : (+vv).toFixed(4));
          series.forEach(s => {
            const m = new Map(s.values.map(v => [v.step, v.value]));
            const val = m.has(nearest) ? m.get(nearest) : null;
            if (val != null) {
              html += `<div><span style="display:inline-block;width:10px;height:10px;background:${s.color};border-radius:50%;margin-right:6px;"></span><strong>${s.run}</strong> ${formatVal(val)}</div>`;
            }
          });
          tipInner.innerHTML = html;
          const offsetX = 12, offsetY = 12;
          tip.style.opacity = '1';
          tip.style.transform = `translate(${Math.round(mx + offsetX + margin.left)}px, ${Math.round(my + offsetY + margin.top)}px)`;
        }
        function onLeave() { tip.style.opacity = '0'; tip.style.transform = 'translate(-9999px, -9999px)'; hoverLine.style('display', 'none'); }
        overlay.on('mousemove', onMove).on('mouseleave', onLeave);
      }

      // Load CSV and wire controls
      (async () => {
        try {
          const text = await fetchFirstAvailable(CSV_PATHS);
          const rows = d3.csvParse(text, d => ({ run: (d.run || '').trim(), step: +d.step, metric: (d.metric || '').trim(), value: +d.value }));
          metricList = Array.from(new Set(rows.map(r => r.metric))).sort();
          runList = Array.from(new Set(rows.map(r => r.run))).sort();
          runOrder = runList;
          // Build dataByMetric
          metricList.forEach(m => {
            const map = {};
            runList.forEach(r => { map[r] = []; });
            rows.filter(r => r.metric === m).forEach(r => { if (!isNaN(r.step) && !isNaN(r.value)) map[r.run].push({ step: r.step, value: r.value }); });
            dataByMetric.set(m, map);
          });

          // Populate metric select (default to average_rank if present)
          metricList.forEach((m) => { const o = document.createElement('option'); o.value = m; o.textContent = getMetricDisplayName(m); selectMetric.appendChild(o); });
          const def = metricList.find(m => /average_rank/i.test(m)) || metricList[0];
          if (def) selectMetric.value = def;

          container.appendChild(controls);
          updateScales();
          renderMetric(selectMetric.value);

          selectMetric.addEventListener('change', () => { renderMetric(selectMetric.value); });

          const rerender = () => { renderMetric(selectMetric.value); };
          if (window.ResizeObserver) { const ro = new ResizeObserver(() => rerender()); ro.observe(container); } else { window.addEventListener('resize', rerender); }
        } catch (e) {
          const pre = document.createElement('pre'); pre.textContent = 'CSV load error: ' + (e && e.message ? e.message : e);
          pre.style.color = 'var(--danger, #b00020)'; pre.style.fontSize = '12px'; pre.style.whiteSpace = 'pre-wrap';
          container.appendChild(pre);
        }
      })();
    };

    if (document.readyState === 'loading') {
      document.addEventListener('DOMContentLoaded', () => ensureD3(bootstrap), { once: true });
    } else { ensureD3(bootstrap); }
  })();
</script>
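The hover handler in this embed snaps the guide line to the nearest logged step by running `steps.reduce(...)` over every step on each `mousemove`. Because `steps` is already sorted, the same lookup can be done with a binary search. The following is a minimal sketch of that idea, not the embed's code; `nearestStep` is a hypothetical helper, and it assumes `steps` holds unique numbers in ascending order, as built from the `Set` in `renderMetric`.

```javascript
// Hypothetical helper (not part of the embed or of D3): return the value in an
// ascending, de-duplicated numeric array that is closest to x.
// Ties resolve to the smaller value, like the reduce() in the embed's onMove.
function nearestStep(steps, x) {
  if (steps.length === 0) return undefined;
  let lo = 0;
  let hi = steps.length - 1;
  // Lower bound: first index whose value is >= x (or the last index if none is).
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (steps[mid] < x) lo = mid + 1;
    else hi = mid;
  }
  // Compare the candidate with its left neighbour; prefer the left one on ties.
  if (lo > 0 && Math.abs(steps[lo - 1] - x) <= Math.abs(steps[lo] - x)) {
    return steps[lo - 1];
  }
  return steps[lo];
}
```

Inside `onMove` it would be called as `nearestStep(steps, xScale.invert(mx))`. Pointer positions outside the data range clamp to the first or last step, as with the `reduce` version. For short step lists the linear scan is just as good; the binary search only pays off when a chart has many steps.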