Learning to Predict Head Pose in Remotely-Rendered Virtual Reality

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceedings › Scientific › peer-review

Accurate characterization of Head Mounted Display (HMD) pose in a virtual scene is essential for rendering immersive graphics in Extended Reality (XR). Remote rendering employs servers in the cloud or at the edge of the network to overcome the computational limitations of either standalone or tethered HMDs. Unfortunately, it increases the latency experienced by the user; for this reason, predicting HMD pose in advance is highly beneficial, as long as it achieves high accuracy. This work provides a thorough characterization of solutions that forecast HMD pose in remotely-rendered virtual reality (VR) by considering six degrees of freedom. Specifically, it provides an extensive evaluation of pose representations, forecasting methods, machine learning models, and the use of multiple modalities along with joint and separate training. In particular, a novel three-point representation of pose is introduced together with a data fusion scheme for long short-term memory (LSTM) neural networks. Our findings show that machine learning models benefit from using multiple modalities, even though simple statistical models perform surprisingly well. Moreover, joint training is comparable to separate training with carefully chosen pose representation and data fusion strategies.
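To make the forecasting task concrete, the following is a minimal sketch of a six-degree-of-freedom pose predictor based on a constant-velocity assumption. The abstract does not state which statistical models were evaluated, so this baseline, the (w, x, y, z) quaternion convention, and all function names (`quat_mul`, `quat_conj`, `quat_pow`, `predict_pose`) are assumptions made for illustration only and are not the authors' method.

```python
import math

def quat_mul(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )

def quat_conj(q):
    """Conjugate, which is the inverse for a unit quaternion."""
    w, x, y, z = q
    return (w, -x, -y, -z)

def quat_pow(q, t):
    """Raise a unit quaternion to a real power t (scales its rotation angle)."""
    w, x, y, z = q
    if w < 0:  # pick the representation of the shorter rotation
        w, x, y, z = -w, -x, -y, -z
    s = math.sqrt(x * x + y * y + z * z)
    if s < 1e-12:  # no measurable rotation
        return (1.0, 0.0, 0.0, 0.0)
    half = math.atan2(s, w)  # half of the rotation angle
    k = math.sin(half * t) / s
    return (math.cos(half * t), x * k, y * k, z * k)

def predict_pose(p0, q0, p1, q1, dt, horizon):
    """Hypothetical baseline: extrapolate the pose `horizon` seconds ahead.

    (p0, q0) is the pose sampled at time t - dt, (p1, q1) the pose at time t.
    Assumes constant linear and angular velocity over the horizon.
    """
    r = horizon / dt
    # Position: linear extrapolation along the last observed displacement.
    pos = tuple(b + (b - a) * r for a, b in zip(p0, p1))
    # Orientation: repeat the last observed rotation, scaled by r.
    delta = quat_mul(q1, quat_conj(q0))
    rot = quat_mul(quat_pow(delta, r), q1)
    return pos, rot
```

For example, `predict_pose(p0, q0, p1, q1, dt=0.01, horizon=0.02)` extrapolates two sampling intervals ahead from the last two samples. A learned model such as an LSTM would replace this extrapolation with a forecast conditioned on a longer history and, possibly, additional modalities.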

Original language: English
Title of host publication: MMSys 2023 - Proceedings of the 14th ACM Multimedia Systems Conference
Number of pages: 12
ISBN (Electronic): 979-8-4007-0148-1
Publication status: Published - 7 Jun 2023
MoE publication type: A4 Conference publication
Event: ACM Multimedia Systems Conference - Vancouver, Canada
Duration: 7 Jun 2023 to 10 Jun 2023
Conference number: 14


Conference: ACM Multimedia Systems Conference
Abbreviated title: MMSys


  • machine learning
  • pose prediction
  • virtual reality
  • VR

