Head Pose Estimation for Sign Language Video

Marcos Luzardo, Matti Karppa, Jorma Laaksonen, Tommi Jantunen

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

    5 Citations (Scopus)


    We address the problem of estimating three head pose angles in sign language video using the Pointing04 data set as training data. The proposed model employs facial landmark points and Support Vector Regression learned from the training set to estimate the yaw and pitch angles independently. A simple geometric approach is used for the roll angle. As a novel development, we propose to use the detected skin tone areas within the face bounding box as additional features for head pose estimation. The accuracy of the estimators we obtain compares favorably with published results on the same data, although the smaller number of pose angles in our setup may explain some of the observed advantage. We also evaluated the pose angle estimators against ground truth values from a motion capture recording of a sign language video. The correlations for the yaw and roll angles exceeded 0.9, whereas the pitch correlation was slightly worse. As a whole, the results are very promising from both the computer vision and linguistic points of view. © 2013 Springer-Verlag.
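    The approach described in the abstract can be sketched as follows: two independent Support Vector Regression estimators map landmark-derived feature vectors to yaw and pitch, while roll is obtained geometrically from the orientation of the line through the eye centres. This is a minimal illustration, not the paper's implementation; the feature vectors here are random stand-ins for the landmark and skin-tone features the authors actually use.

    ```python
    import numpy as np
    from sklearn.svm import SVR

    rng = np.random.default_rng(0)

    # Synthetic stand-in for landmark-derived feature vectors (the paper uses
    # facial landmark points plus skin-tone-area features; random data here).
    X_train = rng.normal(size=(100, 10))
    yaw_train = rng.uniform(-90.0, 90.0, size=100)
    pitch_train = rng.uniform(-60.0, 60.0, size=100)

    # Independent SVR estimators for yaw and pitch, as in the paper's setup.
    yaw_svr = SVR(kernel="rbf").fit(X_train, yaw_train)
    pitch_svr = SVR(kernel="rbf").fit(X_train, pitch_train)

    x_new = rng.normal(size=(1, 10))
    yaw_pred = yaw_svr.predict(x_new)[0]
    pitch_pred = pitch_svr.predict(x_new)[0]

    def roll_from_eyes(left_eye, right_eye):
        """Roll angle (degrees) from the line joining the two eye centres.

        A simple geometric cue: a level head gives roll 0, a tilted head a
        nonzero angle. Eye positions are (x, y) image coordinates.
        """
        dx = right_eye[0] - left_eye[0]
        dy = right_eye[1] - left_eye[1]
        return float(np.degrees(np.arctan2(dy, dx)))

    # Hypothetical eye-centre coordinates for a slightly tilted head.
    roll_pred = roll_from_eyes((30.0, 52.0), (70.0, 48.0))
    ```

    The geometric roll estimate requires only two reliably detected landmarks, which is why a learned regressor is unnecessary for that angle.
    
    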
    Original language: English
    Title of host publication: 18th Scandinavian Conference on Image Analysis (SCIA 2013), Espoo, Finland, 17-20 June 2013
    Place of Publication: Espoo
    Publication status: Published - 2013
    MoE publication type: A4 Article in a conference publication
    Event: Scandinavian Conference on Image Analysis - Espoo, Finland
    Duration: 17 Jun 2013 - 20 Jun 2013
    Conference number: 18

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    ISSN (Print): 0302-9743


    Conference: Scandinavian Conference on Image Analysis
    Abbreviated title: SCIA

