Multimodal Data Evaluation for Classification Problems


Daniela Moctezuma¹, Víctor Muníz² and Jorge García², ¹Centro de Investigación en Ciencias de Información Geoespacial, Mexico, ²Centro de Investigación en Matemáticas, Mexico


Social media data is currently the main input to a wide variety of research across many fields of knowledge. This kind of data is generally multimodal, i.e., it combines different modalities of information, mainly text, images, video, and audio. Handling multimodal data for a specific task can be very difficult. One of the main challenges is to find useful representations of the data, capable of capturing the subtle information that users convey in the content they generate, or even in the way they use it. In this paper, we analysed the use of two modalities, images and text, both separately and in combination, to address two classification problems: meme classification and user profiling. For images, we obtained a textual semantic representation using a pre-trained image captioning model. A text classifier based on optimal lexical representations was then used to build the classification model. We report the findings obtained with these two modalities and discuss the pros and cons of using them to solve the two classification problems.


Multimodal Data, Deep Learning, Natural Language Processing, Image Captioning.

Volume 11, Number 21