Context aware query image representation for particular object retrieval

Zakaria Laskar*, Juho Kannala

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

24 Citations (Scopus)


Current models of image representation based on Convolutional Neural Networks (CNNs) have shown tremendous performance in image retrieval. Such models are inspired by the information flow along the visual pathway in the human visual cortex. We propose that in particular object retrieval, the process of extracting CNN representations from query images with a given region of interest (ROI) can also be modelled by taking inspiration from human vision. In particular, we show that making the CNN attend to the ROI while extracting the query image representation leads to significant improvement over baseline methods on the challenging Oxford5k and Paris6k datasets. Furthermore, we propose an extension to a recently introduced encoding method for CNN representations, regional maximum activations of convolutions (R-MAC). The proposed extension weights the regional representations using a novel saliency measure prior to aggregation, which further improves retrieval accuracy.
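The saliency-weighted R-MAC aggregation described in the abstract can be sketched in NumPy. This is an illustrative approximation, not the paper's code: the multi-scale region grid only approximates the standard R-MAC sampling scheme, the saliency weight used here (a region's mean activation energy) is a stand-in for the paper's novel saliency measure, and the function names `rmac_regions` and `saliency_weighted_rmac` are hypothetical.

```python
import numpy as np

def rmac_regions(H, W, levels=3):
    """Sample square (x, y, size) regions over an H x W feature map,
    roughly following the multi-scale R-MAC grid of overlapping regions."""
    regions = []
    for l in range(1, levels + 1):
        size = int(2 * min(H, W) / (l + 1))
        if size < 1:
            continue
        nx = max(1, int(np.ceil(l * W / min(H, W))))
        ny = max(1, int(np.ceil(l * H / min(H, W))))
        xs = np.linspace(0, W - size, nx).astype(int) if W > size else [0]
        ys = np.linspace(0, H - size, ny).astype(int) if H > size else [0]
        for y in ys:
            for x in xs:
                regions.append((x, y, size))
    return regions

def saliency_weighted_rmac(fmap, levels=3):
    """Aggregate a conv feature map of shape (C, H, W) into one descriptor.
    Each region is max-pooled (MAC), L2-normalized, multiplied by a
    saliency weight (here: mean activation energy, a placeholder for the
    paper's measure), summed across regions, then L2-normalized."""
    C, H, W = fmap.shape
    agg = np.zeros(C)
    for (x, y, s) in rmac_regions(H, W, levels):
        patch = fmap[:, y:y + s, x:x + s]
        v = patch.max(axis=(1, 2))            # regional max-pooling (MAC)
        v = v / (np.linalg.norm(v) + 1e-12)   # L2-normalize region vector
        w = patch.mean()                      # saliency weight (placeholder)
        agg += w * v
    return agg / (np.linalg.norm(agg) + 1e-12)
```

Plain R-MAC corresponds to setting the weight of every region to 1; the extension simply rescales each regional vector by its saliency score before the sum.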

Original language: English
Title of host publication: Image Analysis - 20th Scandinavian Conference, SCIA 2017, Proceedings
Number of pages: 12
Volume: 10270 LNCS
Publication status: Published - 2017
MoE publication type: A4 Article in a conference publication
Event: Scandinavian Conference on Image Analysis - Tromsø, Norway
Duration: 12 Jun 2017 – 14 Jun 2017
Conference number: 20

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10270 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: Scandinavian Conference on Image Analysis
Abbreviated title: SCIA


Keywords

  • Image retrieval


