amaye15/aimv2-large-patch14-native-image-classification

Image Classification


amaye15 committed on Nov 25, 2024

Commit f3003dd (verified) · 1 parent: a47cb18

Update README.md

Files changed (1)

README.md +77 -3


@@ -1,3 +1,77 @@
----
-license: mit
----

+---
+license: mit
+base_model:
+- apple/aimv2-large-patch14-native
+pipeline_tag: image-classification
+tags:
+- image-classification
+- vision
+---
+# AIMv2-Large-Patch14-Native Image Classification
+[Original AIMv2 Paper](https://arxiv.org/abs/2411.14402) | [BibTeX](#citation)
+
+This repository contains an adapted version of the original AIMv2 model that can be loaded with the `AutoModelForImageClassification` class from Hugging Face Transformers, so it can be used directly for image classification.
+## Introduction
+The base model, `apple/aimv2-large-patch14-native`, belongs to the AIMv2 family: vision encoders pre-trained with a multimodal autoregressive objective.
+Some highlights of the AIMv2 models include:
+1. Outperforming OpenAI CLIP and SigLIP on the majority of multimodal understanding benchmarks.
+2. Surpassing DINOv2 in open-vocabulary object detection and referring expression comprehension.
+3. Demonstrating strong recognition performance, with AIMv2-3B achieving **89.5% on ImageNet using a frozen trunk**.
+## Usage
+### PyTorch
+```python
+import requests
+import torch
+from PIL import Image
+from transformers import AutoImageProcessor, AutoModelForImageClassification
+
+# Load a sample image from the COCO 2017 validation set
+url = "http://images.cocodataset.org/val2017/000000039769.jpg"
+image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
+
+processor = AutoImageProcessor.from_pretrained(
+    "amaye15/aimv2-large-patch14-native-image-classification",
+)
+model = AutoModelForImageClassification.from_pretrained(
+    "amaye15/aimv2-large-patch14-native-image-classification",
+    trust_remote_code=True,  # the repository ships custom model code
+)
+model.eval()
+
+inputs = processor(images=image, return_tensors="pt")
+with torch.no_grad():  # inference only, no gradients needed
+    outputs = model(**inputs)
+
+# Get the predicted class index and map it to its label name
+predictions = outputs.logits.softmax(dim=-1)
+predicted_class = predictions.argmax(-1).item()
+print(f"Predicted class: {model.config.id2label[predicted_class]}")
+```
+## Model Details
+- **Model Name**: `amaye15/aimv2-large-patch14-native-image-classification`
+- **Original Model**: `apple/aimv2-large-patch14-native`
+- **Adaptation**: Modified to be compatible with `AutoModelForImageClassification` for direct use in image classification tasks.
+- **Framework**: PyTorch
+- **License**: MIT (as declared in the `license` field of this model card's metadata)
+## Citation
+If you use this model or find it helpful, please consider citing the original AIMv2 paper:
+```bibtex
+@article{fini2024multimodal,
+  title={Multimodal Autoregressive Pre-training of Large Vision Encoders},
+  author={Fini, Enrico and Shukor, Mustafa and others},
+  journal={arXiv preprint arXiv:2411.14402},
+  year={2024}
+}
+```
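The PyTorch usage example in the README reports only the single most likely label. A small helper like the one below can list several candidates instead. It is a sketch that assumes `logits` has the shape of `outputs.logits` for a single image and that `id2label` maps integer class indices to names, as `model.config.id2label` does in Transformers. The function name `top_k_predictions` is hypothetical and not part of this repository, and the demo uses made-up logits and labels rather than real model output.

```python
import torch


def top_k_predictions(logits: torch.Tensor, id2label: dict, k: int = 5):
    """Hypothetical helper: return (label, probability) pairs for the k most likely classes.

    `logits` is expected to have shape (num_classes,) or (1, num_classes),
    e.g. `outputs.logits` for a single image.
    """
    if logits.dim() == 2:
        if logits.shape[0] != 1:
            raise ValueError("expected logits for a single image")
        logits = logits[0]  # drop the batch dimension
    probs = logits.softmax(dim=-1)
    k = min(k, probs.shape[-1])  # never request more classes than exist
    values, indices = torch.topk(probs, k)  # sorted, highest probability first
    return [(id2label[int(i)], float(p)) for p, i in zip(values, indices)]


if __name__ == "__main__":
    # Made-up logits and labels, only to show the output format.
    dummy_logits = torch.tensor([[0.1, 2.0, -1.0, 0.5]])
    dummy_labels = {0: "cat", 1: "dog", 2: "car", 3: "tree"}
    for label, prob in top_k_predictions(dummy_logits, dummy_labels, k=2):
        print(f"{label}: {prob:.3f}")
```

With the README example loaded, `top_k_predictions(outputs.logits, model.config.id2label, k=5)` would return the five highest-probability labels for the sample image.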
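The README cites ImageNet accuracy measured with a *frozen trunk*, meaning the pre-trained encoder's weights stay fixed and only a small classification head is trained on its features. The sketch below illustrates that idea with a plain linear probe in PyTorch. It is an illustration under stated assumptions, not the AIMv2 evaluation protocol: the tiny `trunk` is a hypothetical stand-in for the AIMv2 encoder, the data is random, and the probing head behind the reported results is described in the paper and may differ from a single `nn.Linear` layer. `LinearProbe` and `fit_probe` are hypothetical names.

```python
import torch
from torch import nn


class LinearProbe(nn.Module):
    """Hypothetical frozen trunk + trainable linear head (a "linear probe")."""

    def __init__(self, backbone: nn.Module, feature_dim: int, num_classes: int):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad_(False)  # freeze the trunk
        self.head = nn.Linear(feature_dim, num_classes)

    def train(self, mode: bool = True):
        super().train(mode)
        self.backbone.eval()  # the frozen trunk always runs in inference mode
        return self

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():  # no activations are stored for the frozen trunk
            feats = self.backbone(x)
        return self.head(feats)


def fit_probe(probe, images, labels, steps=20, lr=0.1):
    """Full-batch SGD on the head only; returns the loss at every step."""
    optimizer = torch.optim.SGD(probe.head.parameters(), lr=lr)
    probe.train()
    losses = []
    for _ in range(steps):
        loss = nn.functional.cross_entropy(probe(images), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    return losses


# Hypothetical stand-in trunk and random data, just to exercise the mechanics.
torch.manual_seed(0)
trunk = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 16), nn.ReLU())
probe = LinearProbe(trunk, feature_dim=16, num_classes=4)
images, labels = torch.randn(32, 3, 8, 8), torch.randint(0, 4, (32,))
trunk_before = [p.clone() for p in trunk.parameters()]
losses = fit_probe(probe, images, labels)
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Overriding `train()` keeps the frozen trunk in inference mode, so layers such as dropout or batch normalization behave the same during probing as at evaluation time, and passing only `probe.head.parameters()` to the optimizer ensures nothing else is updated.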