Visual Document Retrieval
PEFT
Safetensors
qwen2_vl
vidore
reranker
MonoQwen2-VL-v0.1 / README.md
uminaty's picture
Update README.md
de61fd2 verified
|
raw
history blame
3.68 kB
metadata
license: apache-2.0

MonoQwen2-VL-2B-LoRA-Reranker

Model Overview

The MonoQwen2-VL-2B-LoRA-Reranker is a fine-tuned version of the Qwen2-VL-2B model, optimized for reranking image-query relevance. It is built to process visual and text data and generate binary relevance scores. This model can be used in scenarios where reranking image relevance is crucial, such as document analysis and image-based search tasks.

How to Use the Model

Below is a quick example to rerank a single image against a user query using this model:

import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

# Load processor and model
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")
model = Qwen2VLForConditionalGeneration.from_pretrained("lightonai/MonoQwen2-VL-2B-LoRA-Reranker")

# Define the query and the image
query = "What is the value of the thing in the document"
image = Image.open("path_to_image.jpg")

# Prepare the inputs
prompt = f"Assert the relevance of the previous image document to the following query, answer True or False. The query is: {query}"
inputs = processor(text=prompt, images=image, return_tensors="pt")

# Run the model and obtain results
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
    logits_for_last_token = logits[:, -1, :]
    true_token_id = processor.tokenizer.convert_tokens_to_ids("True")
    false_token_id = processor.tokenizer.convert_tokens_to_ids("False")
    relevance_score = torch.softmax(logits_for_last_token[:, [true_token_id, false_token_id]], dim=-1)

# Print the True/False probabilities
true_prob = relevance_score[:, 0].item()
false_prob = relevance_score[:, 1].item()

print(f"True probability: {true_prob}, False probability: {false_prob}")

This example demonstrates how to use the model to assess the relevance of an image with respect to a query. It outputs the probability that the image is relevant ("True") or not relevant ("False").

Performance Metrics

The model has been evaluated on ViDoRe Benchmark, by retrieving 10 elements with MrLight_dse-qwen2-2b-mrl-v1 and reranking them. The table below summarizes its ndcg@5 scores:

Dataset NDCG@5 Before Reranking NDCG@5 After Reranking
Mean 87.6 91.8
vidore/arxivqa_test_subsampled 85.6 89.01
vidore/docvqa_test_subsampled 57.1 59.71
vidore/infovqa_test_subsampled 88.1 93.49
vidore/tabfquad_test_subsampled 93.1 95.96
vidore/shiftproject_test 82.0 92.98
vidore/syntheticDocQA_artificial_intelligence_test 97.5 100.00
vidore/syntheticDocQA_energy_test 92.9 97.65
vidore/syntheticDocQA_government_reports_test 96.0 98.04
vidore/syntheticDocQA_healthcare_industry_test 96.4 99.27

License

This LoRA model is licensed under the Apache 2.0 license.