Title
Cross-modality personalization for retrieval
Date Issued
01 June 2019
Access level
Metadata-only access
Resource Type
conference paper
Author(s)
University of Pittsburgh
Publisher(s)
IEEE Computer Society
Abstract
Existing captioning and gaze prediction approaches do not consider the multiple facets of personality that affect how a viewer extracts meaning from an image. While there are methods that consider personalized captioning, they do not consider personalized perception across modalities, i.e., how a person's way of looking at an image (gaze) affects the way they describe it (captioning). In this work, we propose a model for cross-modality personalized retrieval. In addition to modeling gaze and captions, we also explicitly model the personality of the users providing these samples. We incorporate constraints that encourage gaze and caption samples on the same image to be close in a learned space; we refer to this as content modeling. We also model style: we encourage samples provided by the same user to be close in a separate embedding space, regardless of the image on which they were provided. To leverage the complementary information that content and style constraints provide, we combine the embeddings from both networks. We show that our combined embeddings achieve better performance than existing approaches for cross-modal retrieval.
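The abstract describes two embedding constraints (content: gaze and caption samples of the same image should be close; style: samples from the same user should be close regardless of image) and a combination of the two embeddings for retrieval. Below is a minimal PyTorch-style sketch of that idea; the encoders, feature dimensions, triplet-loss formulation, and concatenation-based fusion are illustrative assumptions, not the authors' exact architecture or training setup.

```python
# Sketch of the content/style embedding constraints from the abstract.
# Everything here (encoder shapes, triplet margin, concatenation fusion)
# is an assumption for illustration, not the paper's actual model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Hypothetical encoder mapping a modality-specific feature to a unit-norm embedding."""
    def __init__(self, in_dim, emb_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, emb_dim))

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

def triplet(anchor, positive, negative, margin=0.2):
    """Standard triplet loss: pull the positive toward the anchor, push the negative away."""
    return F.relu(
        (anchor - positive).pow(2).sum(-1)
        - (anchor - negative).pow(2).sum(-1)
        + margin
    ).mean()

# Content space: gaze and caption samples on the *same image* should be close.
gaze_content_enc = Encoder(in_dim=300)
cap_content_enc = Encoder(in_dim=300)

# Style space: samples provided by the *same user* should be close, regardless of image.
gaze_style_enc = Encoder(in_dim=300)
cap_style_enc = Encoder(in_dim=300)

def losses(gaze, cap, cap_other_img, cap_other_user):
    # gaze, cap: gaze and caption features for the same image and same user
    # cap_other_img: caption feature from a different image (content negative)
    # cap_other_user: caption feature from a different user (style negative)
    content = triplet(gaze_content_enc(gaze),
                      cap_content_enc(cap),
                      cap_content_enc(cap_other_img))
    style = triplet(gaze_style_enc(gaze),
                    cap_style_enc(cap),
                    cap_style_enc(cap_other_user))
    return content + style

def combined_gaze_embedding(gaze):
    # Fuse the complementary content and style embeddings for retrieval;
    # concatenation is just one plausible way to combine the two networks.
    return torch.cat([gaze_content_enc(gaze), gaze_style_enc(gaze)], dim=-1)

# Toy usage with random features in place of real gaze/caption representations.
g, c, c_img, c_usr = (torch.randn(4, 300) for _ in range(4))
print(losses(g, c, c_img, c_usr).item())
print(combined_gaze_embedding(g).shape)  # -> torch.Size([4, 256])
```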
Start page
6422
End page
6431
Volume
2019-June
Number
8953895
Language
English
OECD Knowledge area
Systems and communications engineering; Information sciences
Scopus EID
2-s2.0-85078786270
Resource of which it is part
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
ISSN of the container
1063-6919
ISBN of the container
978-1-7281-3293-8
Conference
32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019
Sponsor(s)
Acknowledgements. This material is based upon work supported by the National Science Foundation under Grant No. 1566270. This research was also supported by a Google Faculty Research Award, a University of Pittsburgh Central Research Development Fund (CRDF) grant, and an NVIDIA hardware grant. We thank the reviewers and AC for their valuable suggestions. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Sources of information: Directorio de Producción Científica Scopus