TY - JOUR
T1 - Do you see what I see? Measuring the semantic differences in image-recognition services' outputs
AU - Berg, Anton
AU - Nelimarkka, Matti
N1 - Funding Information:
We generously thank C. V. Akerlund Media Foundation and Research Council of Finland for supporting our work. We thank Pentzold et al. (2018), Hokka and Nelimarkka (2020), and Thelwall et al. (2016) for providing us with image data from their original research.
Publisher Copyright:
© 2023 The Authors. Journal of the Association for Information Science and Technology published by Wiley Periodicals LLC on behalf of Association for Information Science and Technology.
PY - 2023/11
Y1 - 2023/11
AB - As scholars increasingly undertake large-scale analysis of visual materials, advanced computational tools show promise for informing that process. One technique in the toolbox is image recognition, made readily accessible via Google Vision AI, Microsoft Azure Computer Vision, and Amazon's Rekognition service. However, concerns about such issues as bias factors and low reliability have led to warnings against research employing it. A systematic study of cross-service label agreement concretized such issues: using eight datasets, spanning professionally produced and user-generated images, the work showed that image-recognition services disagree on the most suitable labels for images. Beyond supporting caveats expressed in prior literature, the report articulates two mitigation strategies, both involving the use of multiple image-recognition services: Highly explorative research could include all the labels, accepting noisier but less restrictive analysis output. Alternatively, scholars may employ word-embedding-based approaches to identify concepts that are similar enough for their purposes, then focus on those labels filtered in.
UR - http://www.scopus.com/inward/record.url?scp=85169816480&partnerID=8YFLogxK
U2 - 10.1002/asi.24827
DO - 10.1002/asi.24827
M3 - Article
AN - SCOPUS:85169816480
SN - 2330-1635
VL - 74
SP - 1307
EP - 1324
JO - Journal of the Association for Information Science and Technology
JF - Journal of the Association for Information Science and Technology
IS - 11
ER -