MemeInterpret: Towards an All-in-one Dataset for Meme Understanding

Jeongsik Park, Khoi P. N. Nguyen, Jihyung Park, Minseok Kim, Jaeheon Lee, Jae Won Choi, Kalyani Ganta, Phalgun Ashrit Kasu, Rohan Sarakinti, Sanjana Vipperla, Sai Sathanapalli, Nishan Vaghani, and Vincent Ng
Findings of the Association for Computational Linguistics: EMNLP 2025, 2025.

Click here for the PDF version.

Abstract

Meme captioning, the task of generating a sentence that describes the meaning of a meme, is both challenging and important in advancing Computational Meme Understanding (CMU). However, existing research has not explored its decomposition into subtasks or its connections to other CMU tasks. To address this gap, we introduce MemeInterpret, a meme corpus containing meme captions together with corresponding surface messages and relevant background knowledge. Strategically built upon the Facebook Hateful Memes dataset, MemeInterpret is the first corpus to unify three major categories of CMU tasks. Extensive experiments on MemeInterpret and connected datasets suggest strong relationships between meme captioning, its two proposed subtasks, and the other two core CMU tasks: classification and explanation. To stimulate further research on CMU, we make our dataset publicly available at https://github.com/npnkhoi/MemeInterpret.

Dataset

The dataset used in this paper is available from this page.

BibTeX entry

@InProceedings{Nguyen+etal:25b,
  author = {Jeongsik Park and Nguyen, Khoi P. N. and Jihyung Park and Minseok Kim and Jaeheon Lee and Choi, Jae Won and Kalyani Ganta and Phalgun Kasu and Rohan Sarakinti and Sanjana Vipperla and Sai Sathanapalli and Nishan Vaghani and Vincent Ng},
  title = {MemeInterpret: Towards an All-in-one Dataset for Meme Understanding},
  booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2025},
  pages = {}, 
  year = 2025}