Title
Multimodal intent classification with incomplete modalities using text embedding propagation
Date Issued
05 November 2021
Access level
Metadata-only access
Resource Type
conference paper
Author(s)
Snap Research
Publisher(s)
Association for Computing Machinery
Abstract
Determining the author's intent in a social media post is a challenging multimodal task that requires identifying complex relationships between the image and text in the post. For example, the post image can represent an object, person, product, or company, while the text can be an ironic message about the image content. Similarly, the text can be a news headline, while the image represents a provocation, meme, or satire about the news. Existing approaches propose intent classification techniques combining both modalities. However, some posts may have missing textual annotations. Hence, we investigate a graph-based approach that propagates available text embedding data from complete multimodal posts to incomplete ones. This paper presents a text embedding propagation method, which transfers embeddings from BERT neural language models to image-only posts (i.e., posts with an incomplete modality), considering the topology of a graph constructed from both the visual and textual modalities available during the training step. Using this inference approach, our method provides competitive results when the textual modality is available at different completeness levels, even compared to reference methods that require complete modalities.
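The abstract describes propagating text embeddings over a graph from posts with both modalities to image-only posts. The paper's exact formulation is not given here, so the sketch below illustrates the general idea with a standard clamped propagation scheme (iterative neighbor averaging with known embeddings held fixed); the function name, graph construction, and iteration count are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def propagate_embeddings(adj, emb, has_text, n_iters=50):
    """Spread known text embeddings to image-only nodes over a graph.

    adj      -- (n, n) symmetric adjacency matrix, e.g. built from
                visual similarity between posts (assumed construction).
    emb      -- (n, d) embedding matrix; rows for image-only posts
                start as zeros.
    has_text -- (n,) boolean mask of posts with a known text embedding.
    """
    deg = adj.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0          # avoid division by zero on isolated nodes
    P = adj / deg                # row-normalized transition matrix
    out = emb.copy()
    for _ in range(n_iters):
        out = P @ out            # each node averages its neighbors
        out[has_text] = emb[has_text]  # clamp posts with real text
    return out

# Toy example: chain 0-1-2-3; posts 0 and 3 have text embeddings.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
emb = np.array([[1.0, 0.0],
                [0.0, 0.0],
                [0.0, 0.0],
                [0.0, 1.0]])
has_text = np.array([True, False, False, True])
out = propagate_embeddings(adj, emb, has_text)
```

On this chain the propagated embeddings converge to a weighted blend of the two anchored posts, with the nearer anchor dominating (post 1 lands close to [2/3, 1/3]), so image-only posts inherit text information proportional to their graph proximity.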
Start page
217
End page
220
Language
English
OCDE Knowledge area
Computer science
Subjects
Scopus EID
2-s2.0-85116548277
Resource of which it is part
ACM International Conference Proceeding Series
ISBN of the container
978-1-4503-8609-8
Conference
27th Brazilian Symposium on Multimedia and the Web, WebMedia 2021
Sponsor(s)
This work was supported by CNPq [process number 426663/2018-7] and FAPESP [process numbers 2019/25010-5 and 2019/07665-4].
Sources of information
Directorio de Producción Científica
Scopus