Abstract
Action recognition is a popular research topic in the computer vision community. A recent trend in this field, called early action recognition, seeks to recognise an action from as few frames as possible. Visual bag-of-words methods, which rely on local descriptors and visual words, are among the tools that have been used in both offline and early action recognition. In this paper, we propose an improvement to bag-of-words approaches based on what we call patterns, i.e. co-occurrences of visual words. We compare our method with basic bag-of-words. Experiments on benchmark datasets suggest that our method achieves better accuracy than simple bag-of-words and also outperforms some state-of-the-art methods at some observation ratios. Furthermore, some methods proposed in the literature require segments or video partitions as their working unit. Our method, in contrast, is more granular and can update its prediction as soon as a new descriptor arrives.
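To make the idea of descriptor-level updating concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each descriptor has already been quantised to a visual-word id, treats a pattern as an unordered pair of words that occur within a short window of each other, and scores classes with a Laplace-smoothed naive Bayes rule. The class name `OnlinePatternClassifier`, the `window` parameter, the scoring rule and the toy data are all illustrative assumptions.

```python
import math
from collections import Counter, deque


def features_for(word, recent):
    """Features triggered by one new visual word: the word itself plus
    unordered pairs with recently seen words (the assumed 'patterns')."""
    feats = [("word", word)]
    feats += [("pair", tuple(sorted((r, word)))) for r in recent if r != word]
    return feats


class OnlinePatternClassifier:
    """Hypothetical toy model; not the method described in the paper."""

    def __init__(self, window=2):
        self.window = window  # assumed co-occurrence window, in descriptors
        self.counts = {}      # label -> Counter of feature occurrences
        self.vocab = set()    # all features seen during training

    def fit(self, videos):
        """videos: iterable of (label, list of visual-word ids)."""
        for label, words in videos:
            recent = deque(maxlen=self.window)
            counter = self.counts.setdefault(label, Counter())
            for w in words:
                feats = features_for(w, recent)
                counter.update(feats)
                self.vocab.update(feats)
                recent.append(w)

    def start(self):
        """Reset the running state before streaming a new video."""
        self.recent = deque(maxlen=self.window)
        self.scores = {label: 0.0 for label in self.counts}

    def update(self, word):
        """Consume one visual word and return the current best label."""
        for f in features_for(word, self.recent):
            for label, counter in self.counts.items():
                total = sum(counter.values())
                # Laplace-smoothed log-likelihood of the feature under this class
                self.scores[label] += math.log(
                    (counter[f] + 1) / (total + len(self.vocab) + 1)
                )
        self.recent.append(word)
        return max(self.scores, key=self.scores.get)


# Toy usage: two classes with disjoint visual words (made-up data).
clf = OnlinePatternClassifier(window=2)
clf.fit([("wave", [1, 2, 1, 2, 1, 2]), ("walk", [3, 4, 3, 4, 3, 4])])
clf.start()
predictions = [clf.update(w) for w in [1, 2, 1]]
```

Because the running scores change with every call to `update`, a prediction is available after each descriptor rather than only after a full segment, which is the granularity the abstract contrasts with segment-based methods.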







Ethics declarations
Conflict of Interests
The authors declare that they have no conflict of interest.
About this article
Cite this article
Saremi, M., Yaghmaee, F. Improved use of descriptors for early recognition of actions in video. Multimed Tools Appl 82, 2617–2633 (2023). https://doi.org/10.1007/s11042-022-13316-x
DOI: https://doi.org/10.1007/s11042-022-13316-x