Functional Gaze Prediction in Egocentric Video

Si-Ahmed Naas, Xiaolan Jiang, Stephan Sigg, Yusheng Ji

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

1 Citation (Scopus)
187 Downloads (Pure)


Streaming 360° videos to a head-mounted display (HMD) client is challenging due to the high network resource consumption and computational load involved. Existing approaches rely on gaze point prediction or image saliency features extracted from the field of view (FoV), but FoV extraction is computationally demanding in real-time scenarios. We propose a functional gaze prediction system that addresses these issues by relying on a tiling scheme for gaze prediction. We condition gaze point prediction on virtual reality (VR) content and on long short-term memory (LSTM)-encoded eye movement history. Further, we encode optical flow and saliency maps of RGB frames with a VGG16 convolutional neural network (CNN). Future gaze points are then predicted using a novel sinusoidal encoding technique. In experiments, our tile-based approach outperforms state-of-the-art FoV-based schemes in terms of both computational load and predicted gaze position.
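The abstract does not describe the sinusoidal encoding in detail. A minimal sketch of what a multi-frequency sin/cos encoding of a normalized gaze coordinate might look like, in the style of transformer positional encodings, is shown below (the function name, dimensionality, and frequency schedule are all assumptions, not the authors' published method):

```python
import math

def sinusoidal_encode(x, dims=8, max_freq=16.0):
    """Map a normalized coordinate x in [0, 1] to a multi-frequency
    sin/cos feature vector (hypothetical sketch; the paper's exact
    encoding is not published in this abstract)."""
    feats = []
    n_freqs = dims // 2
    for i in range(n_freqs):
        # Geometric frequency schedule from 1 up to max_freq.
        freq = max_freq ** (i / max(n_freqs - 1, 1))
        feats.append(math.sin(2 * math.pi * freq * x))
        feats.append(math.cos(2 * math.pi * freq * x))
    return feats

# Encode a gaze point (u, v) in normalized screen coordinates;
# the concatenated vector could feed an LSTM or MLP predictor.
gaze = (0.25, 0.75)
encoding = sinusoidal_encode(gaze[0]) + sinusoidal_encode(gaze[1])
```

Each coordinate yields `dims` features, so a 2-D gaze point becomes a 16-dimensional vector here; all components lie in [-1, 1].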
Original language: English
Title of host publication: 18th International Conference on Advances in Mobile Computing and Multimedia, MoMM2020 - Proceedings
Editors: Pari Delir Haghighi, Ivan Luiz Salvadori, Matthias Steinbauer, Ismail Khalil, Gabriele Kotsis
Number of pages: 8
ISBN (Electronic): 9781450389242
Publication status: Published - 30 Nov 2020
MoE publication type: A4 Article in a conference publication
Event: International Conference on Advances in Mobile Computing & Multimedia - Chiang Mai, Thailand
Duration: 30 Nov 2020 – 2 Dec 2020
Conference number: 18


Conference: International Conference on Advances in Mobile Computing & Multimedia
Abbreviated title: MoMM
City: Chiang Mai


Keywords:

  • 360° video
  • convolutional neural network
  • gaze prediction
  • machine learning
  • pervasive HMD interaction
  • virtual and augmented reality


