Abstract
General purpose search engines are used to search not only plain text but also multimedia information. In multimodal search, it is common to use multiple queries to find the demanded information in the different media modalities. In most cases, however, it is hard to prepare such multimodal search queries. In addition, the semantic connection between the individual modalities is often weak or totally lacking in such multimodal search. Hence, single-modality searching makes it hard to find the sought information in the multimodal domain. In this paper we improve the Deep Boltzmann Machine applied to multimodal search by using GoogLeNet deep convolutional neural network and semantic concept features. We also propose a supervised method to produce a similarity map between hidden topics in text documents and the visual concepts in corresponding images, and an unsupervised method which uses the hidden topics in the documents as pseudo labels. The model can be used to infer and generate pseudo tags for untagged input query images in order to complement an image-only query into a multimodal one. In our experiments, the classification results with pseudo tag inputs show improvement over the original tag inputs.
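The core idea of mapping visual concepts to textual topics can be illustrated with a minimal sketch. Note this is not the authors' code: the paper learns the mapping inside a multimodal Deep Boltzmann Machine, whereas the sketch below approximates the similarity map with a simple correlation statistic over paired documents and images; all array names, shapes, and the top-k tag selection are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n_pairs = 100      # number of (document, image) training pairs (assumed)
n_topics = 5       # hidden text topics, e.g. from a topic model (assumed)
n_concepts = 8     # visual concept scores, e.g. CNN class posteriors (assumed)

# Hypothetical per-pair activations: rows are aligned document/image pairs.
doc_topics = rng.random((n_pairs, n_topics))       # topic weights per document
img_concepts = rng.random((n_pairs, n_concepts))   # concept scores per image

# Similarity map: correlation between each hidden topic and each visual
# concept across the paired training data (shape: n_topics x n_concepts).
z_t = (doc_topics - doc_topics.mean(0)) / doc_topics.std(0)
z_c = (img_concepts - img_concepts.mean(0)) / img_concepts.std(0)
sim_map = z_t.T @ z_c / n_pairs

# Pseudo tags for a new, untagged query image: project its concept scores
# through the map and keep the strongest topics as textual pseudo tags,
# turning an image-only query into a multimodal one.
query_concepts = rng.random(n_concepts)
topic_scores = sim_map @ query_concepts
pseudo_tags = np.argsort(topic_scores)[::-1][:2]   # indices of top-2 topics
```

In the paper the DBM's joint representation handles this inference; the correlation map above only conveys the shape of the topic-to-concept association being learned.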
Original language | English |
---|---|
Title | 2017 International Joint Conference on Neural Networks, IJCNN 2017 - Proceedings |
Publisher | IEEE |
Pages | 1305-1312 |
Number of pages | 8 |
ISBN (electronic) | 9781509061815 |
DOIs | |
Status | Published - 30 Jun 2017 |
OKM publication type | A4 Article in conference proceedings |
Event | International Joint Conference on Neural Networks - Anchorage, United States. Duration: 14 May 2017 → 19 May 2017 |
Conference
Conference | International Joint Conference on Neural Networks |
---|---|
Abbreviation | IJCNN |
Country/Territory | United States |
City | Anchorage |
Period | 14/05/2017 → 19/05/2017 |