Propagandistic Content in Arabic Memes

Introduction
The research presented in “ArMeme: Propagandistic Content in Arabic Memes” investigates the identification of propagandistic content in Arabic memes. With the proliferation of digital communication, memes have become a significant medium for cultural and political expression. However, memes are often used to mislead audiences. Identifying such misleading and persuasive multimodal content is crucial for social media platforms, policymakers, and the broader society to mitigate potential harm.

The study focuses on developing an Arabic meme dataset with manual annotations of propagandistic content. Approximately 6,000 Arabic memes were collected from various social media platforms. The authors present this as the first Arabic multimodal resource intended to support computational tools for detecting such content.

Background
Social media platforms facilitate the posting and sharing of online content, some of which provides valuable resources for public awareness and political campaigns. However, a significant portion aims to mislead users for social, economic, or political purposes, contributing to the spread of disinformation, hate speech, propaganda, and cyberbullying. The lack of media literacy exacerbates this issue, leading to the uncritical acceptance and rapid dissemination of false information.

Memes, defined as collections of digital items with common characteristics that circulate widely online, are often humorous but can also convey misleading narratives. Research on automatically identifying such content has focused on offensive material, hate speech, and propaganda techniques, primarily in English. This study addresses the gap in resources for Arabic multimodal content.

Contributions

The key contributions of this work include:

The creation of the first manually annotated Arabic meme dataset.
A detailed description of the data collection procedure to assist future research.
An annotation guideline to serve as a foundation for subsequent studies.
Detailed experimental results for detecting propagandistic content in text, image, and multimodal formats.
Public release of the dataset and annotation guideline to facilitate further research and enhance media literacy.

Related Work

Previous research on propaganda detection has spanned various content types, including news articles, tweets, and political speech, primarily focusing on English. These studies have developed corpora and models for identifying propaganda techniques, yet there has been little attention to Arabic content. Multimodal studies have examined the effectiveness of combining textual and visual information to detect propaganda and other harmful content.

Dataset Collection and Annotation

The dataset was collected from Facebook, Instagram, Pinterest, and Twitter. The collection process involved manual selection of public groups, crawling images, filtering duplicates, and using OCR to extract text from memes. A classifier was employed to filter out non-meme images, resulting in a dataset of approximately 6,000 memes.
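The duplicate-filtering step can be illustrated with a minimal Python sketch. It assumes duplicates are byte-identical files and detects them with a SHA-256 content hash; the summary does not say which method the authors used, and the function name and sample data are hypothetical.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple


def filter_exact_duplicates(images: Iterable[Tuple[str, bytes]]) -> List[str]:
    """Return the names of images whose byte content has not been seen before."""
    seen: Dict[str, str] = {}  # content hash -> first image name with that content
    kept: List[str] = []
    for name, data in images:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in seen:
            seen[digest] = name
            kept.append(name)
    return kept


# Made-up stand-ins for crawled image files (name, raw bytes).
crawled = [
    ("meme_a.jpg", b"\x89fake-image-bytes-1"),
    ("meme_b.jpg", b"\x89fake-image-bytes-2"),
    ("meme_a_copy.jpg", b"\x89fake-image-bytes-1"),  # byte-identical repost
]
unique_names = filter_exact_duplicates(crawled)
print(unique_names)
```

A content hash only catches exact copies. Reposts that were resized or re-encoded would need a near-duplicate method such as perceptual hashing, which this sketch does not attempt.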

Annotators assigned each image to one of four categories: not-meme, other, not propaganda, or propaganda. They also edited the extracted text to correct OCR errors. Inter-annotator agreement was measured with several metrics, which indicated moderate agreement.
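To make the idea of an agreement metric concrete, here is a small sketch of Cohen's kappa for two annotators. This summary does not name the metrics the authors used, so Cohen's kappa is an illustrative choice only, and the label sequences below are invented.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Chance-corrected agreement between two annotators over the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Fraction of items on which both annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Invented annotations for ten memes using the four categories from the text.
annotator_1 = ["propaganda", "propaganda", "propaganda", "not propaganda",
               "not propaganda", "not propaganda", "not propaganda",
               "other", "not-meme", "not propaganda"]
annotator_2 = ["propaganda", "propaganda", "not propaganda", "not propaganda",
               "not propaganda", "not propaganda", "propaganda",
               "other", "not-meme", "not propaganda"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(round(kappa, 4))
```

Under the commonly cited Landis and Koch convention, kappa values between 0.41 and 0.60 are described as moderate agreement. Whether the authors applied that convention is not stated here.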

Data Analysis

The dataset was analyzed across different modalities:

Text Modality: Classical models and transformer models were trained and fine-tuned.
Image Modality: CNN models with different architectures were fine-tuned.
Multimodality: An early fusion-based model was trained, and different large language models (LLMs) were evaluated in a zero-shot setup.
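The early fusion idea mentioned above can be sketched as follows. This is a minimal illustration, assuming early fusion means concatenating the text and image feature vectors before a single classifier; the summary does not describe the authors' actual architecture. All feature values and weights are hypothetical.

```python
import math
from typing import List, Sequence


def early_fuse(text_features: Sequence[float],
               image_features: Sequence[float]) -> List[float]:
    """Early fusion: join both modality vectors into one input vector."""
    return list(text_features) + list(image_features)


def predict_proba(fused: Sequence[float],
                  weights: Sequence[float], bias: float) -> float:
    """Logistic-regression style score over the fused vector."""
    if len(fused) != len(weights):
        raise ValueError("weights must match the fused feature dimension")
    z = sum(x * w for x, w in zip(fused, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Toy vectors standing in for real encoder outputs (hypothetical values).
text_vec = [0.2, -0.1, 0.7]   # would come from a text encoder
image_vec = [0.5, 0.3]        # would come from an image encoder
fused_vec = early_fuse(text_vec, image_vec)

# Hypothetical weights; in practice they would be learned from annotated memes.
weights = [1.0, 0.5, 2.0, -1.0, 0.5]
probability = predict_proba(fused_vec, weights, bias=-0.5)
print(len(fused_vec), round(probability, 3))
```

Because the classifier sees both modalities in one vector, it can in principle weigh text and image cues jointly. A late fusion design would instead combine the outputs of separate per-modality classifiers.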

Experimental Results

The experimental results demonstrated the effectiveness of various models in detecting propagandistic content. The study provides a benchmark for future research in this area and highlights the importance of developing resources for less commonly studied languages like Arabic.

Conclusion

The “ArMeme” study presents a significant step toward understanding and identifying propagandistic content in Arabic memes. By providing a comprehensive dataset and detailed analysis, the research lays the groundwork for future studies aiming to develop automatic detection systems. The release of the dataset and annotation guideline will benefit the research community and contribute to enhancing media literacy among social media users.

Source:

DOI: https://doi.org/10.48550/arXiv.2406.03916