Onfocus detection aims to identify whether the individual captured by a camera is focusing on the camera. According to behavioral research, the focus of an individual during face-to-camera communication produces a special type of eye contact, i.e., individual-camera eye contact, which is a powerful signal in social communication and plays a crucial role in recognizing irregular individual states (e.g., lying or suffering from mental disease) and special purposes (e.g., seeking help or attracting fans). Developing effective onfocus detection algorithms is therefore of significance for assisting criminal investigation, disease discovery, and social behavior analysis. However, a review of the literature shows that very few efforts have been made toward developing onfocus detectors, owing to the lack of large-scale publicly available datasets as well as the challenging nature of the task. This paper addresses both issues. First, we build a large-scale onfocus detection dataset, named onfocus detection in the wild (OFDIW). It consists of 20,623 images captured in unconstrained conditions (hence “in the wild”) and contains individuals with diverse emotions, ages, and facial characteristics, as well as rich interactions with surrounding objects and background scenes. Second, we propose a novel end-to-end deep model, the eye-context interaction inferring network (ECIIN), which explores eye-context interaction via dynamic capsule routing. Finally, comprehensive experiments are conducted on the proposed OFDIW dataset to benchmark existing learning models and demonstrate the effectiveness of the proposed ECIIN.
This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 61876140, 61773301), the Fundamental Research Funds for the Central Universities (Grant No. JBZ170401), and the China Postdoctoral Support Scheme for Innovative Talents (Grant No. BX20180236).
Rights and permissions
Open access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Cite this article
Zhang, D., Wang, B., Wang, G. et al. Onfocus detection: identifying individual-camera eye contact from unconstrained images. Sci. China Inf. Sci. 65, 160101 (2022). https://doi.org/10.1007/s11432-020-3181-9