Systems and Methods for Multiple-View and Depth-Based People Tracking and Human-Computer Interaction

Otto Korkalo

Research output: Thesis › Doctoral Thesis › Collection of Articles


This thesis presents systems and methods for real-time multiple-view and depth-based optical tracking for specific human-computer interaction and smart environment applications. Multiple-view systems are used for mitigating occlusions, enhancing tracking precision and accuracy, and extending the tracking volume to larger scales. Depth cameras, on the other hand, directly provide three-dimensional information about the scene, which makes them particularly appealing for spatial analysis.
For multi-touch interaction, we developed a tracking approach that utilizes multiple side-view cameras to transform any flat surface into a multi-touch screen. Instead of explicitly triangulating the touch points, we employed an extended Kalman filter-based method in which the states of the touch points are updated whenever an observation is received from any of the cameras, ensuring low latency and rapid update rates. To position the cameras as close to the screen as possible, we employed fisheye lenses with a modified distortion model, and explored the optimal camera configuration for achieving robust tracking with varying numbers of cameras and touch points.
Accurate intrinsic and extrinsic calibration of cameras and camera systems is essential for optimal data fusion and state estimation. Typically, calibration procedures are carried out manually, which is not only time-consuming but can also be impractical. To address this issue in multiple-view depth camera-based people tracking systems, we developed an auto-calibration method that derives the camera network topology and sensor calibration parameters directly from observations. Additionally, to account for the uncertainties in the observations during state estimation and data fusion, we developed a measurement noise model as part of the auto-calibration procedure.
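The per-observation update described for the multi-touch tracker can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes an idealized side-view camera that reports only a bearing angle to the touch point on the plane, a static two-dimensional state with the prediction step omitted, and hypothetical names (`ekf_bearing_update`, `track`, `r_var`). The fisheye distortion model mentioned in the abstract is not modelled here.

```python
import math

def ekf_bearing_update(x, P, cam, z, r_var):
    """One EKF update of a 2-D touch point from a single camera bearing (radians).

    Assumption: the camera at position `cam` measures atan2(y - cy, x - cx).
    """
    dx, dy = x[0] - cam[0], x[1] - cam[1]
    r2 = dx * dx + dy * dy
    # Predicted bearing and its Jacobian with respect to (x, y).
    z_pred = math.atan2(dy, dx)
    H = (-dy / r2, dx / r2)
    # Innovation, wrapped to [-pi, pi).
    v = (z - z_pred + math.pi) % (2 * math.pi) - math.pi
    # P H^T, scalar innovation variance S, and Kalman gain K.
    PHt = (P[0][0] * H[0] + P[0][1] * H[1],
           P[1][0] * H[0] + P[1][1] * H[1])
    S = H[0] * PHt[0] + H[1] * PHt[1] + r_var
    K = (PHt[0] / S, PHt[1] / S)
    # State and covariance update: x + K v and P - K H P (P is symmetric).
    x_new = (x[0] + K[0] * v, x[1] + K[1] * v)
    P_new = [[P[i][j] - K[i] * PHt[j] for j in range(2)] for i in range(2)]
    return x_new, P_new

def track(observations, x0, P0, r_var=1e-4):
    """Fold in (camera_position, bearing) pairs one at a time, in arrival order."""
    x, P = x0, P0
    for cam, z in observations:
        x, P = ekf_bearing_update(x, P, cam, z, r_var)
    return x, P
```

Because each bearing is folded into the state as soon as it arrives, no observation has to wait for a matching view from another camera, which is the property the abstract credits for low latency. A single bearing constrains the point only across the line of sight; observations from cameras at other positions resolve the remaining direction.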
In mixed reality, the aim of camera pose estimation and tracking is to align the real and virtual environments in real time and in all three dimensions. To achieve this goal, we developed a computer-aided design model-based depth camera tracking approach that utilizes a fast graphics processing unit-based iterative closest point method for pose estimation. This method can be applied to various objects, as long as a depth map of the object can be generated from the desired viewpoint. We investigated the applicability and performance of the method with different targets and concluded that the proposed approach exhibits reduced drift compared to a simultaneous localization and mapping-based method and outperforms a monocular edge-based method in terms of accuracy.
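The iterative closest point idea named above can be shown with a toy example. This is a sketch under strong simplifications, not the method of the thesis: it runs on the CPU, works on two-dimensional point sets rather than depth maps rendered from a CAD model, and uses brute-force nearest-neighbour matching with a closed-form point-to-point fit. All function names are hypothetical.

```python
import math

def transform(points, theta, tx, ty):
    """Rotate points by theta (radians) about the origin, then translate."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]

def fit_rigid_2d(src, dst):
    """Least-squares rotation and translation mapping paired src points onto dst."""
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(p[0] for p in dst) / n
    dy = sum(p[1] for p in dst) / n
    dot = cross = 0.0
    for (ax, ay), (bx, by) in zip(src, dst):
        ax, ay, bx, by = ax - sx, ay - sy, bx - dx, by - dy
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation that maps the rotated src centroid onto the dst centroid.
    return theta, dx - (c * sx - s * sy), dy - (s * sx + c * sy)

def icp_2d(model, observed, iterations=10):
    """Estimate the pose (theta, tx, ty) that aligns model points with observed points."""
    theta, tx, ty = 0.0, 0.0, 0.0
    for _ in range(iterations):
        moved = transform(model, theta, tx, ty)
        # Match each moved model point to its closest observed point.
        matched = [min(observed, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)
                   for p in moved]
        # Re-fit the total pose from the original model points to their matches.
        theta, tx, ty = fit_rigid_2d(model, matched)
    return theta, tx, ty
```

As with any nearest-neighbour ICP, the sketch only converges when the initial pose is close enough for most matches to be correct, which is the usual situation in frame-to-frame tracking.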
Translated title of the contribution: Syvyyskameroihin ja usean kameran hyödyntämiseen perustuvia järjestelmiä ja menetelmiä ihmisten seurantaan sekä ihmisen ja tietokoneen väliseen vuorovaikutukseen
Original language: English
Qualification: Doctor's degree
Awarding Institution
  • Aalto University
Supervisors/Advisors
  • Kannala, Juho, Supervising Professor
  • Takala, Tapio, Supervising Professor
  • Takala, Tapio, Thesis Advisor
Print ISBNs: 978-952-64-1787-5
Electronic ISBNs: 978-952-64-1788-2
Publication status: Published - 2024
MoE publication type: G5 Doctoral dissertation (article)


Keywords
  • depth cameras
  • multiple-view systems
  • people tracking
  • camera pose estimation
  • multi-touch systems
  • mixed reality

