Face Pose Estimation Task

From GM-RKB

A Face Pose Estimation Task is a Preprocessing Task that is based on a Face Recognition Task.



References

2017a

2017b

  • (Xu & Kakadiaris, 2017) ⇒ Xu, X., & Kakadiaris, I. A. (2017, May). "Joint head pose estimation and face alignment framework using global and local CNN features". In Automatic Face & Gesture Recognition (FG 2017), 2017 12th IEEE International Conference on (pp. 642-649). DOI: 10.1109/FG.2017.81.
    • ABSTRACT: In this paper, we explore global and local features obtained from Convolutional Neural Networks (CNN) for learning to estimate head pose and localize landmarks jointly. Because there is a high correlation between head pose and landmark locations, the head pose distributions from a reference database and learned local deep patch features are used to reduce the error in the head pose estimation and face alignment tasks. First, we train GNet on the detected face region to obtain a rough estimate of the pose and to localize the seven primary landmarks. The most similar shape is selected for initialization from a reference shape pool constructed from the training samples according to the estimated head pose. Starting from the initial pose and shape, LNet is used to learn local CNN features and predict the shape and pose residuals. We demonstrate that our algorithm, named JFA, improves both the head pose estimation and face alignment. To the best of our knowledge, this is the first system that explores the use of the global and local CNN features to solve head pose estimation and landmark detection tasks jointly.
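
The initialization step described in the abstract above (choosing the most similar shape from a reference pool according to the estimated head pose) can be illustrated with a minimal sketch. The abstract does not say how similarity is measured, so the Euclidean distance over (pitch, yaw, roll) used here is an assumption, and the names `angular_difference` and `select_initial_shape` are hypothetical, not taken from the paper.

```python
import math


def angular_difference(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def select_initial_shape(estimated_pose, reference_pool):
    """Return the reference shape whose pose is closest to the estimate.

    estimated_pose: (pitch, yaw, roll) in degrees, e.g. from a global network.
    reference_pool: list of (pose, shape) pairs, where shape is a list of
        (x, y) landmark coordinates taken from training samples.
    The distance is an assumed Euclidean distance over the three angles.
    """
    def pose_distance(entry):
        pose, _shape = entry
        return math.sqrt(
            sum(angular_difference(p, q) ** 2
                for p, q in zip(pose, estimated_pose))
        )

    _best_pose, best_shape = min(reference_pool, key=pose_distance)
    return best_shape


# Hypothetical pool with three reference shapes (two landmarks each).
pool = [
    ((0.0, 0.0, 0.0), [(10.0, 10.0), (20.0, 10.0)]),    # frontal
    ((0.0, 30.0, 0.0), [(12.0, 10.0), (19.0, 10.0)]),   # turned right
    ((0.0, -30.0, 0.0), [(11.0, 10.0), (18.0, 10.0)]),  # turned left
]
initial_shape = select_initial_shape((2.0, 28.0, -1.0), pool)
```

In the paper's pipeline a second network then refines this starting shape and pose by predicting residuals; that stage is not sketched here.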

2016

2015

  • (Saeed et al., 2015) ⇒ Saeed, A., Al-Hamadi, A., & Ghoneim, A. (2015). "Head pose estimation on top of haar-like face detection: A study using the kinect sensor". Sensors, 15(9), 20945-20966. DOI: 10.3390/s150920945.
    • ABSTRACT: Head pose estimation is a crucial initial task for human face analysis, which is employed in several computer vision systems, such as: facial expression recognition, head gesture recognition, yawn detection, etc. In this work, we propose a frame-based approach to estimate the head pose on top of the Viola and Jones (VJ) Haar-like face detector. Several appearance and depth-based feature types are employed for the pose estimation, where comparisons between them in terms of accuracy and speed are presented. It is clearly shown through this work that using the depth data, we improve the accuracy of the head pose estimation. Additionally, we can spot positive detections, faces in profile views detected by the frontal model, that are wrongly cropped due to background disturbances. We introduce a new depth-based feature descriptor that provides competitive estimation results with a lower computation time. Evaluation on a benchmark Kinect database shows that the histogram of oriented gradients and the developed depth-based features are more distinctive for the head pose estimation, where they compare favorably to the current state-of-the-art approaches. Using a concatenation of the aforementioned feature types, we achieved a head pose estimation with average errors not exceeding 5.1°, 4.6°, 4.2° for pitch, yaw and roll angles, respectively.
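
Per-angle average errors such as those reported in the abstract above are commonly computed as a mean absolute error over test frames. The abstract does not define its metric, so the sketch below assumes that interpretation; the function name `mean_absolute_angle_errors` and the sample values are hypothetical.

```python
def mean_absolute_angle_errors(predicted, ground_truth):
    """Mean absolute error per head pose angle, in degrees.

    predicted, ground_truth: equal-length lists of (pitch, yaw, roll)
        tuples in degrees, one tuple per frame.
    Returns a dict mapping each angle name to its average error.
    """
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")

    names = ("pitch", "yaw", "roll")
    totals = [0.0, 0.0, 0.0]
    for pred, truth in zip(predicted, ground_truth):
        for i in range(3):
            # Wrap the difference so that 359 vs. 1 counts as 2 degrees.
            d = (pred[i] - truth[i]) % 360.0
            totals[i] += min(d, 360.0 - d)
    return {name: total / len(predicted) for name, total in zip(names, totals)}


# Two hypothetical frames: estimates followed by annotated ground truth.
estimates = [(10.0, 20.0, 0.0), (0.0, -10.0, 5.0)]
annotations = [(12.0, 15.0, 0.0), (-2.0, -10.0, 1.0)]
errors = mean_absolute_angle_errors(estimates, annotations)
```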

2012

  • (Zhu & Ramanan, 2012) ⇒ Zhu, X., & Ramanan, D. (2012, June). "Face detection, pose estimation, and landmark localization in the wild". In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on (pp. 2879-2886). DOI: 10.1109/CVPR.2012.6248014.
    • ABSTRACT: We present a unified model for face detection, pose estimation, and landmark estimation in real-world, cluttered images. Our model is based on a mixtures of trees with a shared pool of parts; we model every facial landmark as a part and use global mixtures to capture topological changes due to viewpoint. We show that tree-structured models are surprisingly effective at capturing global elastic deformation, while being easy to optimize unlike dense graph structures. We present extensive results on standard face benchmarks, as well as a new “in the wild” annotated dataset, that suggests our system advances the state-of-the-art, sometimes considerably, for all three tasks. Though our model is modestly trained with hundreds of faces, it compares favorably to commercial systems trained with billions of examples (such as Google Picasa and face.com).

2008