2017 MofaModelbasedDeepConvolutional

From GM-RKB

Subject Headings: Model-based Autoencoder, Deep Convolutional Autoencoder.

Notes

Cited By

Quotes

Abstract

In this work we propose a novel model-based deep convolutional autoencoder that addresses the highly challenging problem of reconstructing a 3D human face from a single in-the-wild color image. To this end, we combine a convolutional encoder network with an expert-designed generative model that serves as decoder. The core innovation is the differentiable parametric decoder that encapsulates image formation analytically based on a generative model. Our decoder takes as input a code vector with exactly defined semantic meaning that encodes detailed face pose, shape, expression, skin reflectance and scene illumination. For the first time, a CNN encoder and an expert-designed generative model can be trained end-to-end in an unsupervised manner, which renders training on very large (unlabeled) real world data feasible. The obtained reconstructions compare favorably to current state-of-the-art approaches in terms of quality and richness of representation.
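As an illustration of what a code vector with "exactly defined semantic meaning" could look like, the following minimal sketch splits one flat vector into the five named parameter groups listed in the abstract. The block sizes, the `CODE_LAYOUT` table and the `split_code` helper are hypothetical placeholders chosen for this example; they are not the dimensions or the API used by the authors.

```python
import numpy as np

# Hypothetical block sizes, chosen only for illustration (not the paper's values).
CODE_LAYOUT = {
    "pose": 6,
    "shape": 8,
    "expression": 4,
    "reflectance": 8,
    "illumination": 9,
}
CODE_SIZE = sum(CODE_LAYOUT.values())


def split_code(code):
    """Split a flat code vector into named, semantically meaningful blocks."""
    code = np.asarray(code)
    if code.shape != (CODE_SIZE,):
        raise ValueError(f"expected shape ({CODE_SIZE},), got {code.shape}")
    parts, start = {}, 0
    for name, size in CODE_LAYOUT.items():
        parts[name] = code[start:start + size]
        start += size
    return parts


# Usage: every entry of the code belongs to exactly one named group.
code = np.arange(CODE_SIZE, dtype=float)
parts = split_code(code)
```

Because each block has a fixed meaning, a decoder can consume `parts["pose"]`, `parts["shape"]` and so on directly, instead of treating the code as an opaque latent vector.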

This paper contributes a new type of model-based deep convolutional autoencoder that joins forces of state-of-the-art generative and CNN-based regression approaches for dense 3D face reconstruction via a deep integration of the two on an architectural level. Our network architecture is inspired by recent progress on deep convolutional autoencoders, which, in their original form, couple a CNN encoder and a CNN decoder through a code-layer of reduced dimensionality [18, 33, 61]. Unlike previously used CNN-based decoders, our convolutional autoencoder deeply integrates an expert-designed decoder.
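The idea of training an encoder through a fixed, hand-written decoder can be sketched on a toy problem. In the example below, which is an assumption-laden stand-in and not the authors' network, the "expert-designed decoder" is a fixed linear model (`mean` plus an orthonormal `basis`), the encoder is a single linear layer (`weights`), and the only training signal is the reconstruction error on unlabeled samples. All names, sizes and the learning rate are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 12, 3, 64  # toy "image" size, code size, number of unlabeled samples

# Fixed, hand-written decoder: a linear stand-in for an expert-designed model.
# Its parameters are never updated during training.
mean = rng.normal(size=d)
basis, _ = np.linalg.qr(rng.normal(size=(d, k)))  # d x k, orthonormal columns


def decode(codes):
    """Analytic decoder: maps codes (n x k) to reconstructions (n x d)."""
    return mean + codes @ basis.T


def encode(weights, images):
    """Trainable linear encoder: maps images (n x d) to codes (n x k)."""
    return (images - mean) @ weights.T


def loss_and_grad(weights, images):
    """Mean squared reconstruction error and its gradient w.r.t. the encoder."""
    codes = encode(weights, images)
    resid = decode(codes) - images
    loss = np.mean(np.sum(resid ** 2, axis=1))
    grad_codes = 2.0 * resid @ basis / len(images)   # backprop through decoder
    grad_weights = grad_codes.T @ (images - mean)    # backprop through encoder
    return loss, grad_weights


# Unlabeled training data; the codes that generated it are never shown to the encoder.
images = decode(rng.normal(size=(n, k)))

weights = np.zeros((k, d))
initial_loss, _ = loss_and_grad(weights, images)
for _ in range(500):
    _, grad = loss_and_grad(weights, images)
    weights -= 0.1 * grad  # plain gradient descent on the encoder only
final_loss, _ = loss_and_grad(weights, images)
```

The point of the sketch is the gradient path: the loss is measured in image space, flows back through the differentiable decoder, and updates only the encoder, so no ground-truth codes are needed.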

References

  • [1] O. Alexander, M. Rogers, W. Lambeth, M. Chiang, and P. Debevec. The Digital Emily Project: photoreal facial modeling and animation. In ACM SIGGRAPH Courses, pages 12:1–12:15. ACM, 2009. 4
  • [2] V. Blanz, C. Basso, T. Poggio, and T. Vetter. Reanimating faces in images and video. In Computer graphics forum, pages 641–650. Wiley Online Library, 2003. 1, 2
  • [3] V. Blanz and T. Vetter. A morphable model for the synthesis of 3d faces. In Proc. SIGGRAPH, pages 187–194. ACM Press/Addison-Wesley Publishing Co., 1999. 1, 2, 4, 5
  • [4] G. Bradski. The OpenCV Library. Dr. Dobb’s Journal of Software Tools, 2000. 6
  • [5] A. Bulat and G. Tzimiropoulos. Two-stage convolutional part heatmap regression for the 1st 3D face alignment in the wild (3DFAW) challenge. In ECCVW, 2016. 2
  • [6] C. Cao, Y. Weng, S. Zhou, Y. Tong, and K. Zhou. Facewarehouse: A 3D facial expression database for visual computing. IEEE TVCG, 20(3):413–425, 2014. 4
  • [7] C. Cao, Y. Weng, S. Zhou, Y. Tong, and K. Zhou. Facewarehouse: A 3d facial expression database for visual computing. IEEE Transactions on Visualization and Computer Graphics, 20(3):413–425, Mar. 2014. 5, 8
  • [8] G. G. Chrysos, E. Antonakos, S. Zafeiriou, and P. Snape. Offline deformable face tracking in arbitrary videos. In The IEEE International Conference on Computer Vision (ICCV) Workshops, December 2015. 5, 6
  • [9] T. F. Cootes, G. J. Edwards, and C. J. Taylor. Active appearance models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(6):681–685, June 2001. 2
  • [10] C. Ding and D. Tao. Robust face recognition via multimodal deep face representation. IEEE Transactions on Multimedia, 17(11):2049–2058, 2015. 3
  • [11] C. N. Duong, K. Luu, K. G. Quach, and T. D. Bui. Deep appearance models: A deep boltzmann machine approach for face modeling. arXiv:1607.06871, July 2016. 3
  • [12] O. Fried, E. Shechtman, D. B. Goldman, and A. Finkelstein. Perspective-aware manipulation of portrait photos. ACM Trans. Graph., 35(4), July 2016. 1, 2
  • [13] S. Gao, Y. Zhang, K. Jia, J. Lu, and Y. Zhang. Single Sample Face Recognition via Learning Deep Supervised Autoencoders. IEEE Transactions on Information Forensics and Security, 10(10):2108–2118, 2015. 3
  • [14] P. Garrido, M. Zollhöfer, D. Casas, L. Valgaerts, K. Varanasi, P. Pérez, and C. Theobalt. Reconstruction of personalized 3D face rigs from monocular video. ACM Transactions on Graphics, 35(3):28:1–15, June 2016. 1, 2, 5, 7
  • [15] E. Grant, P. Kohli, and M. van Gerven. Deep disentangled representations for volumetric reconstruction. In ECCVW, 2016. 2, 3, 8
  • [16] R. A. Güler, G. Trigeorgis, E. Antonakos, P. Snape, S. Zafeiriou, and I. Kokkinos. Densereg: Fully convolutional dense shape regression in-the-wild. arXiv:1612.01202, Dec. 2016. 3
  • [17] A. Handa, M. Bloesch, V. Patraucean, S. Stent, J. McCormac, and A. J. Davison. gvnn: Neural network library for geometric computer vision. In ECCV, 2016. 3
  • [18] G. E. Hinton and R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507, July 2006. 2, 3
  • [19] G. B. Huang, M. Ramesh, T. Berg, and E. Learned-Miller. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Technical Report 07-49, University of Massachusetts, Amherst, October 2007. 5, 6
  • [20] P. Huber, P. Kopp, M. Rätsch, W. Christmas, and J. Kittler. 3D face tracking and texture fusion in the wild. arXiv:1605.06764, May 2016. 2
  • [21] M. Jaderberg, K. Simonyan, A. Zisserman, and K. Kavukcuoglu. Spatial transformer networks. In NIPS, 2015. 3
  • [22] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093, 2014. 6
  • [23] X. Jin and X. Tan. Face alignment in-the-wild: A survey. arXiv:1608.04188, Aug. 2016. 2
  • [24] M. Kan, S. Shan, H. Chang, and X. Chen. Stacked Progressive Auto-Encoders (SPAE) for Face Recognition Across Poses. 2014. 3
  • [25] I. Kemelmacher-Shlizerman, A. Sankar, E. Shechtman, and S. M. Seitz. Being john malkovich. In ECCV, 2010. 1, 2
  • [26] I. Kemelmacher-Shlizerman and S. M. Seitz. Face reconstruction in the wild. In ICCV, 2011. 1, 2
  • [27] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012. 5, 7
  • [28] T. D. Kulkarni, W. Whitney, P. Kohli, and J. B. Tenenbaum. Deep convolutional inverse graphics network. In NIPS, 2015. 2, 3, 8
  • [29] S. Laine, T. Karras, T. Aila, A. Herva, and J. Lehtinen. Facial performance capture with deep neural networks. arXiv:1609.06536, 2016. 3
  • [30] M. Li, W. Zuo, and D. Zhang. Convolutional network for attribute-driven and identity-preserving human face generation. arXiv:1608.06434, Aug. 2016. 3
  • [31] F. Liu, D. Zeng, J. Li, and Q. Zhao. 3D face reconstruction via cascaded regression in shape space. arXiv:1509.06161, 2016. 2
  • [32] Z. Liu, P. Luo, X. Wang, and X. Tang. Deep learning face attributes in the wild. In: Proceedings of International Conference on Computer Vision (ICCV), 2015. 5, 6
  • [33] J. Masci, U. Meier, D. Cireşan, and J. Schmidhuber. Stacked convolutional auto-encoders for hierarchical feature extraction. In: Proceedings of The International Conference on Artificial Neural Networks, 2011. 2, 3
  • [34] C. Müller. Spherical harmonics. Springer, 1966. 4
  • [35] V. Nair, J. Susskind, and G. E. Hinton. Analysis-by-synthesis by learning to invert generative black boxes. In: Proceedings of The International Conference on Artificial Neural Networks (ICANN), 2008. 3
  • [36] NVIDIA. NVIDIA CUDA Programming Guide 2.0. 2008. 6
  • [37] O. M. Parkhi, A. Vedaldi, and A. Zisserman. Deep face recognition. In British Machine Vision Conference, 2015. 5, 7
  • [38] X. Peng, R. S. Feris, X. Wang, and D. N. Metaxas. A recurrent encoder-decoder network for sequential face alignment. In ECCV, 2016. 2
  • [39] K. G. Quach, C. N. Duong, K. Luu, and T. D. Bui. Robust deep appearance models. arXiv:1607.00659, July 2016. 3
  • [40] R. Ranjan, S. Sankaranarayanan, C. D. Castillo, and R. Chellappa. An all-in-one convolutional neural network for face analysis. arXiv:1611.00851, 2016. 2
  • [41] E. Richardson, M. Sela, and R. Kimmel. 3D face reconstruction by learning from synthetic data. In 3DV, 2016. 1, 2, 3, 6
  • [42] E. Richardson, M. Sela, R. Or-El, and R. Kimmel. Learning detailed face reconstruction from a single image. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017. 2, 3, 6
  • [43] S. Romdhani and T. Vetter. Estimating 3D Shape and Texture Using Pixel Intensity, Edges, Specular Highlights, Texture Constraints and a Prior. CVPR, 2005. 2
  • [44] S. Romdhani and T. Vetter. Estimating 3d shape and texture using pixel intensity, edges, specular highlights, texture constraints and a prior. In CVPR, pages 986–993, Washington, DC, USA, 2005. IEEE Computer Society. 6
  • [45] J. Roth, Y. Tong, and X. Liu. Adaptive 3d face reconstruction from unconstrained photo collections. December 2016. 1, 2
  • [46] J. M. Saragih, S. Lucey, and J. F. Cohn. Deformable model fitting by regularized landmark mean-shift. 91(2):200–215, 2011. 5, 7
  • [47] S. Schönborn, B. Egger, A. Forster, and T. Vetter. Background modeling for generative image models. Comput. Vis. Image Underst., 136(C):117–127, July 2015. 8
  • [48] J. Shen, S. Zafeiriou, G. G. Chrysos, J. Kossaifi, G. Tzimiropoulos, and M. Pantic. The first facial landmark tracking in-the-wild challenge: Benchmark and results. In ICCVW, December 2015. 5, 6
  • [49] R. W. Sumner and J. Popović. Deformation transfer for triangle meshes. ACM TOG, 23(3):399–405, 2004. 4
  • [50] Y. Sun, X. Wang, and X. Tang. Deep Convolutional Network Cascade for Facial Point Detection. 2013. 2
  • [51] S. Suwajanakorn, I. Kemelmacher-Shlizerman, and S. M. Seitz. Total moving face reconstruction. In ECCV, 2014. 1, 2
  • [52] S. Suwajanakorn, S. M. Seitz, and I. Kemelmacher-Shlizerman. What makes tom hanks look like tom hanks. In ICCV, 2015. 1, 2
  • [53] Y. Tang, R. Salakhutdinov, and G. E. Hinton. Deep lambertian networks. 2012. 3
  • [54] J. Thies, M. Zollhöfer, M. Stamminger, C. Theobalt, and M. Nießner. Face2Face: Real-time face capture and reenactment of RGB videos. In CVPR, 2016. 1, 2, 5, 6, 7
  • [55] A. Tuan Tran, T. Hassner, I. Masi, and G. Medioni. Regressing robust and discriminative 3d morphable models with a very deep neural network. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017. 2, 3, 6
  • [56] G. Tzimiropoulos. Project-out cascaded regression with an application to face alignment. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015. 5, 6
  • [57] N. Wang, X. Gao, D. Tao, and X. Li. Facial Feature Point Detection: A Comprehensive Survey. arXiv:1410.1037, Oct. 2014. 2
  • [58] Y. Wu and Q. Ji. Discriminative deep face shape model for facial point detection. International Journal of Computer Vision, 113(1):37–53, 2015. 2
  • [59] X. Yan, J. Yang, E. Yumer, Y. Guo, and H. Lee. Perspective transformer nets: Learning single-view 3d object reconstruction without 3d supervision. arXiv:1612.00814, Dec. 2016. 3
  • [60] J. Zhang, S. Shan, M. Kan, and X. Chen. Coarse-to-Fine Auto-Encoder Networks (CFAN) for Real-Time Face Alignment. 2014. 3
  • [61] F. Zhao, J. Feng, J. Zhao, W. Yang, and S. Yan. Robust lstm-autoencoders for face de-occlusion in the wild. arXiv:1612.08534, 2016. 2, 3
  • [62] A. Zhmoginov and M. Sandler. Inverting face embeddings with convolutional neural networks. arXiv:1606.04189, June 2016. 3
  • [63] E. Zhou, H. Fan, Z. Cao, Y. Jiang, and Q. Yin. Extensive Facial Landmark Localization with Coarse-to-Fine Convolutional Network Cascade. In CVPRW, 2013. 2
  • [64] S. Zhu, S. Liu, C. C. Loy, and X. Tang. Deep cascaded binetwork for face hallucination. arXiv:1607.05046, July 2016. 3
  • [65] Z. Zhu, P. Luo, X. Wang, and X. Tang. Deep Learning Identity-Preserving Face Space. 2013. 3
  • [66] Z. Zhu, P. Luo, X. Wang, and X. Tang. Multi-view perceptron: a deep model for learning face identity and view representations. 2014. 3


2017 MofaModelbasedDeepConvolutional
Author: Ayush Tewari, Michael Zollhofer, Hyeongwoo Kim, Pablo Garrido, Florian Bernard, Patrick Perez, Christian Theobalt
Title: Mofa: Model-based Deep Convolutional Face Autoencoder for Unsupervised Monocular Reconstruction
Year: 2017