Abstract
We address the problem of estimating three head pose angles in sign language video using the Pointing04 data set as training data. The proposed model employs facial landmark points and Support Vector Regression learned from the training set to identify yaw and pitch angles independently. A simple geometric approach is used for the roll angle. As a novel development, we propose to use the detected skin tone areas within the face bounding box as additional features for head pose estimation. The accuracy of the estimators we obtain compares favorably with published results on the same data, but the smaller number of pose angles in our setup may explain some of the observed advantage. We also evaluated the pose angle estimators against ground truth values from a motion capture recording of a sign language video. The correlations for the yaw and roll angles exceeded 0.9, whereas the pitch correlation was slightly lower. As a whole, the results are very promising from both the computer vision and linguistic points of view. © 2013 Springer-Verlag.
Original language | English |
---|---|
Title | 18th Scandinavian Conference on Image Analysis (SCIA 2013), Espoo, Finland, 17-20 June 2013 |
Place of publication | Espoo |
Publisher | Springer Gabler |
Pages | 349-360 |
ISBN (print) | 978-3-642-38885-9 |
DOI - permanent links | |
Status | Published - 2013 |
MoE publication type | A4 Article in a conference publication |
Event | Scandinavian Conference on Image Analysis - Espoo, Finland. Duration: 17 Jun 2013 → 20 Jun 2013. Conference number: 18 |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Publisher | Springer |
Volume | 7944 |
ISSN (print) | 0302-9743 |
Conference
Conference | Scandinavian Conference on Image Analysis |
---|---|
Abbreviated title | SCIA |
Country/Territory | Finland |
City | Espoo |
Period | 17/06/2013 → 20/06/2013 |