---
license: llama3.2
base_model: meta-llama/Llama-3.2-11B-Vision-Instruct
datasets:
- QCRI/MemeXplain
language:
- en
- ar
pipeline_tag: image-text-to-text
tags:
- meme-detection
- propaganda
- hate-speech
- multimodal
- vision-language
- explainability
library_name: transformers
---

# MemeIntel: Explainable Detection of Propagandistic and Hateful Memes

MemeIntel is a vision-language model fine-tuned from [meta-llama/Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct) for detecting propaganda in Arabic memes and hateful content in English memes, with explainable reasoning.

## Model Description

MemeIntel addresses the challenge of understanding and moderating complex, context-dependent multimodal content on social media. The model performs:

- **Label Detection**: Classifies memes into categories (propaganda/not-propaganda/not-meme/other for Arabic; hateful/not-hateful for English)
- **Explanation Generation**: Provides human-readable explanations for its predictions

The model was trained using a novel multi-stage optimization approach on the [MemeXplain](https://huggingface.co/datasets/QCRI/MemeXplain) dataset.

## Usage

```python
import torch
from transformers import MllamaForConditionalGeneration, AutoProcessor
from PIL import Image

# Load model and processor
model = MllamaForConditionalGeneration.from_pretrained(
    "QCRI/MemeIntel",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("QCRI/MemeIntel")

# Load your meme image
image = Image.open("path/to/meme.jpg")
```
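
If GPU memory is limited, the same checkpoint can optionally be loaded in 4-bit precision. This is not part of the released setup, just a convenience sketch that assumes `bitsandbytes` is installed and uses the standard `BitsAndBytesConfig` API from `transformers`; the bfloat16 configuration above remains the reference, since quantization can slightly change outputs.

```python
import torch
from transformers import BitsAndBytesConfig, MllamaForConditionalGeneration

# Optional 4-bit (NF4) quantization to reduce GPU memory usage;
# assumes the bitsandbytes package is installed.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = MllamaForConditionalGeneration.from_pretrained(
    "QCRI/MemeIntel",
    quantization_config=bnb_config,
    device_map="auto",
)
```
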

### Arabic Propaganda Meme Detection (Arabic Explanation)

```python
messages = [
    {"role": "system", "content": "You are an expert social media image analyzer specializing in identifying propaganda in Arabic contexts."},
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "You are an expert social media image analyzer specializing in identifying propaganda in Arabic contexts. I will provide you with Arabic memes and the text extracted from these images. Your task is to classify the image as one of the following: 'propaganda', 'not-propaganda', 'not-meme', or 'other', and provide a brief explanation in Arabic. Start your response with 'Label:' followed by the classification label, then on a new line begin with 'Explanation:' and briefly state your reasoning. Text extracted: لما يقولي انتي مالكيش عزيز\nاعز ما ليا البطاطس المقلية"}
    ]}
]

input_text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, input_text, add_special_tokens=False, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(output[0], skip_special_tokens=True))
```

### Arabic Propaganda Meme Detection (English Explanation)

```python
messages = [
    {"role": "system", "content": "You are an expert social media image analyzer specializing in identifying propaganda in Arabic contexts."},
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "You are an expert social media image analyzer specializing in identifying propaganda in Arabic contexts. I will provide you with Arabic memes and the text extracted from these images. Your task is to classify the image as one of the following: 'propaganda', 'not-propaganda', 'not-meme', or 'other', and provide a brief explanation in English. Start your response with 'Label:' followed by the classification label, then on a new line begin with 'Explanation:' and briefly state your reasoning. Text extracted: وأنا أبكي\n٣\nانت تتمنى وانا البي\n{7"}
    ]}
]

input_text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, input_text, add_special_tokens=False, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(output[0], skip_special_tokens=True))
```

### English Hateful Meme Detection

```python
messages = [
    {"role": "system", "content": "You are an expert social media image analyzer specializing in identifying hateful content in memes"},
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "I will provide you with memes and the text extracted from these images. Your task is to classify the image as one of the following: 'hateful' or 'not-hateful' and provide a brief explanation. Start your response with 'Label:' followed by the classification label, then on a new line begin with 'Explanation:' and briefly state your reasoning. Text extracted: bows here, bows there, bows everywhere"}
    ]}
]

input_text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, input_text, add_special_tokens=False, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(output[0], skip_special_tokens=True))
```

## Prompt Templates

### Arabic Meme (Arabic Explanation)

```
System: You are an expert social media image analyzer specializing in identifying propaganda in Arabic contexts.

User: You are an expert social media image analyzer specializing in identifying propaganda in Arabic contexts. I will provide you with Arabic memes and the text extracted from these images. Your task is to classify the image as one of the following: 'propaganda', 'not-propaganda', 'not-meme', or 'other', and provide a brief explanation in Arabic. Start your response with 'Label:' followed by the classification label, then on a new line begin with 'Explanation:' and briefly state your reasoning. Text extracted: {OCR_TEXT}
```

### Arabic Meme (English Explanation)

```
System: You are an expert social media image analyzer specializing in identifying propaganda in Arabic contexts.

User: You are an expert social media image analyzer specializing in identifying propaganda in Arabic contexts. I will provide you with Arabic memes and the text extracted from these images. Your task is to classify the image as one of the following: 'propaganda', 'not-propaganda', 'not-meme', or 'other', and provide a brief explanation in English. Start your response with 'Label:' followed by the classification label, then on a new line begin with 'Explanation:' and briefly state your reasoning. Text extracted: {OCR_TEXT}
```

### English Hateful Meme

```
System: You are an expert social media image analyzer specializing in identifying hateful content in memes

User: I will provide you with memes and the text extracted from these images. Your task is to classify the image as one of the following: 'hateful' or 'not-hateful' and provide a brief explanation. Start your response with 'Label:' followed by the classification label, then on a new line begin with 'Explanation:' and briefly state your reasoning. Text extracted: {OCR_TEXT}
```
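
The templates above can also be filled programmatically. The helper below is a minimal sketch rather than part of the released code (the `build_messages` name and its arguments are illustrative); it substitutes the OCR text, and for Arabic memes the explanation language, into the corresponding template.

```python
# Illustrative constants copied from the prompt templates above.
ARABIC_SYSTEM = "You are an expert social media image analyzer specializing in identifying propaganda in Arabic contexts."
HATEFUL_SYSTEM = "You are an expert social media image analyzer specializing in identifying hateful content in memes"

ARABIC_USER = (
    ARABIC_SYSTEM + " I will provide you with Arabic memes and the text extracted from these images. "
    "Your task is to classify the image as one of the following: 'propaganda', 'not-propaganda', "
    "'not-meme', or 'other', and provide a brief explanation in {lang}. Start your response with 'Label:' "
    "followed by the classification label, then on a new line begin with 'Explanation:' and briefly state "
    "your reasoning. Text extracted: {ocr}"
)
HATEFUL_USER = (
    "I will provide you with memes and the text extracted from these images. Your task is to classify the "
    "image as one of the following: 'hateful' or 'not-hateful' and provide a brief explanation. Start your "
    "response with 'Label:' followed by the classification label, then on a new line begin with "
    "'Explanation:' and briefly state your reasoning. Text extracted: {ocr}"
)

def build_messages(ocr_text, task="arabic_propaganda", explanation_language="Arabic"):
    """Build the chat messages for one meme, following the prompt templates above."""
    if task == "arabic_propaganda":
        system = ARABIC_SYSTEM
        user = ARABIC_USER.format(lang=explanation_language, ocr=ocr_text)
    else:  # "hateful_memes"
        system = HATEFUL_SYSTEM
        user = HATEFUL_USER.format(ocr=ocr_text)
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": [{"type": "image"}, {"type": "text", "text": user}]},
    ]
```

For example, `build_messages(ocr_text, task="hateful_memes")` reproduces the prompt used in the English hateful-meme example, and `explanation_language="English"` switches the Arabic propaganda prompt to English explanations.
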

## Expected Output Format

The model's output follows this format:

```
Label: [classification_label]
Explanation: [reasoning for the classification]
```
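
Since the model returns plain text, the two fields need to be parsed out of the decoded string. The helper below is an illustrative sketch (`parse_response` is not part of the released code) and assumes the model followed the format above.

```python
def parse_response(decoded_text: str):
    """Extract (label, explanation) from a 'Label: ...' / 'Explanation: ...' response.

    Illustrative helper; a field stays None if its marker is missing.
    """
    label, explanation = None, None
    for line in decoded_text.splitlines():
        stripped = line.strip()
        if stripped.lower().startswith("label:"):
            label = stripped.split(":", 1)[1].strip()
        elif stripped.lower().startswith("explanation:"):
            explanation = stripped.split(":", 1)[1].strip()
    return label, explanation

# Example (using the decoded generation from the Usage section):
# label, explanation = parse_response(processor.decode(output[0], skip_special_tokens=True))
```
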

## Training

- **Base Model**: [meta-llama/Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct)
- **Training Dataset**: [QCRI/MemeXplain](https://huggingface.co/datasets/QCRI/MemeXplain)
- **Training Method**: Multi-stage optimization approach

## Performance

MemeIntel achieves state-of-the-art results:

- **ArMeme (Arabic Propaganda)**: ~3% absolute improvement over previous SOTA
- **Hateful Memes (English)**: ~7% absolute improvement over previous SOTA

## Citation

If you use this model, please cite:

```bibtex
@inproceedings{kmainasi-etal-2025-memeintel,
    title = "{M}eme{I}ntel: Explainable Detection of Propagandistic and Hateful Memes",
    author = "Kmainasi, Mohamed Bayan and
      Hasnat, Abul and
      Hasan, Md Arid and
      Shahroor, Ali Ezzat and
      Alam, Firoj",
    editor = "Christodoulopoulos, Christos and
      Chakraborty, Tanmoy and
      Rose, Carolyn and
      Peng, Violet",
    booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.emnlp-main.1539/",
    doi = "10.18653/v1/2025.emnlp-main.1539",
    pages = "30263--30279",
    ISBN = "979-8-89176-332-6",
}
```

## License

This model is released under the [Llama 3.2 Community License](https://www.llama.com/llama3_2/license/).

## Authors

- Mohamed Bayan Kmainasi
- Abul Hasnat
- Md Arid Hasan
- Ali Ezzat Shahroor
- Firoj Alam

Qatar Computing Research Institute (QCRI), Hamad Bin Khalifa University