Abstract
This paper addresses the automatic segmentation of videos taken with Google Glass and the extraction of their important segments. Using information from both the video images and the sensor data recorded concurrently, we devise methods that automatically divide a video into coherent segments and estimate the importance of each segment. This information then enables the automatic generation of a video summary that contains only the important segments. The features used include color, image detail, motion, and speech. We train multi-layer perceptrons for the two tasks (segmentation and importance estimation) using human annotations. We also present a systematic evaluation procedure that compares the automatic segmentation and importance-estimation results with those given by multiple users, and we demonstrate the effectiveness of our approach.
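As a rough illustration of the importance-estimation step described above, the sketch below trains a small multi-layer perceptron (implemented from scratch in NumPy) to score segments. The four feature columns and the toy labeling rule are hypothetical stand-ins for the color, detail, motion, and speech features mentioned in the abstract; they are not the paper's actual features or pipeline.

```python
import numpy as np

# Hypothetical per-segment feature vector:
# [color variance, edge detail, motion magnitude, speech ratio].
rng = np.random.default_rng(0)
X = rng.random((200, 4))
# Toy ground truth: segments with more motion and speech count as "important".
y = ((0.6 * X[:, 2] + 0.4 * X[:, 3]) > 0.5).astype(float).reshape(-1, 1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer of 8 units, trained with full-batch gradient descent
# on a cross-entropy loss.
W1 = rng.normal(0.0, 0.5, (4, 8)); b1 = np.zeros(8)
W2 = rng.normal(0.0, 0.5, (8, 1)); b2 = np.zeros(1)

lr = 0.5
for _ in range(2000):
    h = sigmoid(X @ W1 + b1)      # hidden activations
    p = sigmoid(h @ W2 + b2)      # predicted importance score in [0, 1]
    g2 = (p - y) / len(X)         # cross-entropy gradient w.r.t. output logits
    g1 = (g2 @ W2.T) * h * (1.0 - h)
    W2 -= lr * (h.T @ g2); b2 -= lr * g2.sum(0)
    W1 -= lr * (X.T @ g1); b1 -= lr * g1.sum(0)

accuracy = float(((p > 0.5) == y).mean())
```

In the paper's setting, the predicted scores would then be thresholded or ranked so that only the highest-importance segments enter the summary.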
Acknowledgements
This work is supported by the Ministry of Science and Technology of Taiwan under grant number MOST-104-3115-E-009-001.
Chiu, YC., Liu, LY. & Wang, T. Automatic segmentation and summarization for videos taken with smart glasses. Multimed Tools Appl 77, 12679–12699 (2018). https://doi.org/10.1007/s11042-017-4910-8