---
datasets:
- AbstractPhil/geometric-vocab
pipeline_tag: zero-shot-classification
---
# Breakdown and assessment

Using the standard ViT position tokenization doesn't work. At patch size 4 it caps our unique feature-map representation at 65 tokens, and because the high-dimensional geometry inflates the dimensionality so severely, the representative patches can only be learned to a limited accuracy before they simply run out of space and the model defaults to memorizing the training data.
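The 65-token cap follows directly from the patch grid. A minimal sketch of the arithmetic (the 32×32 input size is an assumption inferred back from the 65-token figure, and `vit_token_count` is an illustrative helper, not part of this repo's code):

```python
def vit_token_count(image_size: int, patch_size: int, cls_token: bool = True) -> int:
    """Number of position tokens a standard ViT produces for a square input.

    Hypothetical helper for illustration: grid of non-overlapping patches,
    plus one class token.
    """
    grid = image_size // patch_size
    return grid * grid + (1 if cls_token else 0)


# A 32x32 input at patch size 4 gives an 8x8 grid of patches plus the
# class token -- the 65-token cap described above.
print(vit_token_count(32, 4))  # 65
```

Every unique spatial feature the model can address must share those 65 position slots, which is where the capacity ceiling comes from.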
|
| 10 |
+
|
| 11 |
+
Probe diagnostics show;
|
| 12 |
+

|
They are in fact forming unique, well-separated clusters of similar assessments, and the 2D graph shows they are highly diverse:

![2D cluster graph](https://cdn-uploads.huggingface.co/production/uploads/64be2487873bca0bfdab01e6/hzh-jgZ2crrX1TRiV1S_Z.png)
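Cluster separation of this kind can be quantified rather than only eyeballed from the 2D plot. A hedged, numpy-only sketch (`separation_ratio` is an illustrative diagnostic I'm naming here, not the probe code used for the figures above):

```python
import numpy as np


def separation_ratio(embeddings: np.ndarray, labels: np.ndarray) -> float:
    """Crude cluster-separation diagnostic: mean inter-centroid distance
    divided by mean within-cluster spread. Higher = better separated."""
    classes = np.unique(labels)
    # One centroid per class.
    centroids = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    # Average distance of points to their own centroid.
    within = np.mean([
        np.linalg.norm(embeddings[labels == c] - centroids[i], axis=1).mean()
        for i, c in enumerate(classes)
    ])
    # Average pairwise distance between distinct centroids.
    dists = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=-1)
    between = dists[np.triu_indices(len(classes), k=1)].mean()
    return between / within
```

A ratio well above 1 corresponds to the tight, separated clusters visible in the probe plot; a ratio near 1 would indicate the clusters overlap.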
So the process works, but it simply does not have enough compartmentalization to fully represent a classified patch within a classification under the current set of parameters.
|
| 18 |
+
|
| 19 |
+
I am currently devising a better and more directly intertwined structure that represents the baseline of geometry directly entangled with the vit patches rather than trying to shape it using another system indirectly.
|
| 20 |
+
|
| 21 |
+
This direct representation will be a bit more volatile at first but it should help solidify the missing 40% accuracy in a utilizable and repeatable way to extract the necessary patches.
|
| 22 |
+
|
| 23 |
+
|
# After a big notebook refactor

I have pushed the updated model code and included the loader. I will not include the losses or the training methodology until the full process is prepared and the paper is published; after that, you will see exactly what I've developed and why each piece exists. Until then there are only breadcrumbs and inference code.