Abstract
This chapter focuses on the recent surge of interest in automating methods for describing audiovisual content, whether for image search and retrieval, visual storytelling or in response to the rising demand for audio description following changes to regulatory frameworks. While computer vision communities have intensified research into the automatic generation of video descriptions (Bernardi et al., 2016), the automation of still image captioning remains a challenge in terms of accuracy (Husain and Bober, 2016). Moving images pose additional challenges linked to temporality, including co-referencing (Rohrbach et al., 2017) and other features of narrative continuity (Huang et al., 2016). Machine-generated descriptions are currently less sophisticated than their human equivalents, and frequently incoherent or incorrect. By contrast, human descriptions are more elaborate and reliable but are expensive to produce. Nevertheless, they offer information about visual and auditory elements in audiovisual content that can be exploited for research into machine training. Based on our research conducted in the EU-funded MeMAD project, this chapter outlines a methodological approach for a systematic comparison of human- and machine-generated video descriptions, drawing on corpus-based and discourse-based approaches, with a view to identifying key characteristics and patterns in both types of description, and exploiting human knowledge about video description for machine training.
This chapter focuses on the recent surge of interest in automating methods for describing audiovisual content, whether for image search and retrieval, visual storytelling or in response to the rising demand for audio description following changes to regulatory frameworks. A model for machine-generated content description is therefore likely to be a more achievable goal in the shorter term than a model for generating elaborate audio descriptions. Relevance Theory (RT) focuses on the human ability to derive meaning through inferential processes. RT asserts that these processes are highly inferential, drawing on common knowledge and cultural experience, and that they are guided by the human tendency to maximise relevance and the assumption that speakers/storytellers normally choose the optimally relevant way of communicating their intentions. Moving on from basic comprehension of events to interpretation and conjecture requires the viewer to employ ‘extradiegetic’ references such as social convention, cultural norms and life experience.
Original language | English
---|---
Title of host publication | Innovation in Audio Description Research
Publisher | Routledge
Chapter | 8
Number of pages | 38
ISBN (Electronic) | 9781003052968
ISBN (Print) | 9781138356672
DOIs |
Publication status | Published - 2020
MoE publication type | A3 Book section, Chapters in research books
Publication series

Name | IATIS Yearbook
---|---
Projects
- MeMAD (Finished)
  Laaksonen, J., Sjöberg, M., Pehlivan Tort, S. & Laria Mantecon, H.
  01/01/2018 → 31/03/2021
  Project: EU: Framework programmes funding