Project Report Format DBUU


A Project Report on

<TITLE>
Submitted in Partial Fulfilment of the Requirements for the Degree of

BACHELOR OF TECHNOLOGY

in
COMPUTER SCIENCE & ENGINEERING

by

<StudentName> (Admission No)


<StudentName> (Admission No)
<StudentName> (Admission No)

Under the supervision of

<Guide Name>
<Designation>

Submitted to the
Department of Computer Science and Engineering
School of Engineering & Computing (SoEC)
DEV BHOOMI UTTARAKHAND UNIVERSITY, UTTARAKHAND - 248001

DECEMBER 2023
CANDIDATE’S DECLARATION

I hereby declare that the work presented in this project titled “Title Name”, submitted by me
in partial fulfilment of the requirements for the award of the degree of Bachelor of
Technology (B.Tech.) in the Department of Computer Science & Engineering,
Uttarakhand Technical University, Dehradun, is an authentic record of my own work carried out
under the guidance of <Guide Name>, <Designation>, Department of Computer Science
and Engineering, SoEC, Dev Bhoomi Uttarakhand University, Dehradun.

Date: <Student’s Name>


B. Tech (CSE)
Roll No.: 07060101180
Dev Bhoomi Institute of Technology, Dehradun

Approved By Mr. Dhajvir Singh Rai


Head of the Department
(Computer Science & Engineering)
Dev Bhoomi Institute of Technology, Dehradun

CERTIFICATE
This is to certify that the thesis entitled “Title Name”, which is being submitted by <Student Name>
to the Uttarakhand University, Dehradun, in fulfilment of the requirements for the award of
the degree of Bachelor of Technology (B.Tech.), is a record of bonafide research work carried
out by him under my guidance and supervision. The matter presented in this thesis has not been
submitted either in part or in full to any University or Institute for the award of any degree.

<Guide Name>
<Designation>
Department of Computer Science and Engg.
Dev Bhoomi Institute of Technology, Dehradun
(Uttarakhand) INDIA

ABSTRACT

If you have ever gone back to rewatch the key moments of a football or cricket match long
after it was over, you know how impactful and valuable highlight videos can be. But that is not
the only thing highlight videos are confined to: from showcasing the high points of a wedding
ceremony to meeting recordings, they come in handy in several ways. Highlight videos
essentially sum up the main takeaways and noteworthy moments of an event into short, easily
digestible pieces, so that viewers do not have to watch the whole thing in order to get an
understanding of what happened. And just because these are short, they do not have to be dull.

Highlight videos deliver a quick run-through of the most interesting parts of an event, and
hence they are short and often fast-paced so that they can keep the viewer’s attention from
faltering.

Given the explosive growth of online videos, it is becoming increasingly important to single
out those highlights for audiences rather than requiring them to browse through every dull
portion of the video. Ideally, the content of the extracted highlights should be consistent with
the subject of the video as well as the preferences of the individual audience.

In this context, the problem at hand is to make available a concept of “Video Highlights
Detection and Extraction” which could help extract useful sections of a video recording for quick
access and consumption. These highlights can be defined for the audience based on their domain
of work and need for information.

ACKNOWLEDGEMENT

At this ecstatic moment of presenting this dissertation, the author first bows to almighty God for
blessing the author with enough patience and strength to go through this challenging phase of life.
I would like to express a deep sense of gratitude and thanks to the people who have helped
me in the accomplishment of this B.Tech. project.
First and foremost, I would like to thank my supervisor, Mr. Dhajvir Singh Rai, for his
expertise, guidance, enthusiasm, and patience. His insightful guidance contributed to the
successful completion of this dissertation, and he spent many hours patiently answering
questions and troubleshooting problems.
Beyond all this, I would like to give special thanks to my parents, husband and daughter for
their unbounded affection, sweet love, constant inspiration, and encouragement. Without their
support this research would not have been possible.
Finally, I would like to thank all the faculty, college management, and administrative and technical
staff of the School of Engineering & Computing, Uttarakhand Technical University,
Dehradun for their encouragement, assistance, and friendship throughout my candidature.

Date: <Student’s Name>

TABLE OF CONTENTS

Candidate’s Declaration
Certificate
Abstract
Acknowledgements
Contents
List of Figures
List of Tables

CHAPTER 1: INTRODUCTION
1.1 Overview
1.2 Highlight Generation System Architecture
1.3 Applications of Highlight Generation Systems

CHAPTER 2: LITERATURE REVIEW
2.1 Highlight Generation

CHAPTER 3: PROPOSED METHODOLOGY
3.1 Proposed Methodology
3.2 Extraction Component

CHAPTER 4: EXPERIMENTAL RESULTS
4.1 Data Analysis
4.2 Result Analysis

CHAPTER 5: CONCLUSION & FUTURE SCOPE
5.1 Conclusion
5.2 Future Scope

REFERENCES

LIST OF FIGURES

Figure No. Figure Name Page No.

Figure 1.1 Architecture of highlight generation system 2

Figure 1.2 Screenshot from a recording with argument 6

Figure 1.3 Image from a meeting that was conducted virtually 6

Figure 1.4 Meeting where the sentiment was of delight and celebration 7

Figure 1.5 Overall architecture of proposed system 9

Figure 3.1 Extraction Component 30

Figure 3.2 Naïve Bayes Algorithm Steps 31

Figure 3.3 Analysis Component 34

Figure 3.4 Highlighter Engine 36

Figure 4.1 Screenshot from a recording on which the experiment was conducted 39

LIST OF TABLES

Table No. Table Name Page No.

Table 4.1 Experimental results for different test/training datasets 39

Table 4.2 Experimental results for different training/validation/test datasets 43

CHAPTER 1
INTRODUCTION

1.1 Overview

The invention targets the video segment, where content has been growing in size/length and in
number with the recent surge in online meetings due to the pandemic. We are now in a world
where hybrid work is the norm, and meetings therefore mostly happen online on platforms like
Microsoft Teams, Zoom, Google Meet, etc. This has led to the generation of a lot of video
content in the form of recordings, which organizations deal with in different ways; some choose
to record all meetings automatically. When meetings are recorded, they help the audience a
great deal in revisiting the content, and help people who missed a meeting catch up on it offline.

In this context the problem at hand is to make available a concept of “video highlights” which
could help extract useful sections of a video recording for quick access and consumption. These
highlights can be defined for the audience based on their domain of work and need for information.
More on this has been covered in the user scenario section of this document.

1.2 Highlight Generation System Architecture

The highlight generation system can be broken down into components such as pre-processing,
feature extraction, classification, highlight recognition and highlight extraction, as shown in
Figure 1.1.
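
To make the flow between these components concrete, the following Python sketch chains placeholder functions for each stage named in Figure 1.1. The function names and the Segment structure are illustrative assumptions for this report, not the actual interfaces of the implemented system.

# Minimal sketch of the pipeline stages named above; the function bodies are
# placeholders, not the project's actual implementation.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Segment:
    start: float                      # segment start time in seconds
    end: float                        # segment end time in seconds
    features: dict = field(default_factory=dict)
    is_highlight: bool = False

def preprocess(video_path: str) -> str:
    """Remove noise/silence and return a cleaned audio file path (placeholder)."""
    return video_path

def extract_features(audio_path: str) -> List[Segment]:
    """Split the recording into segments and attach features (placeholder)."""
    return []

def classify(segments: List[Segment]) -> List[Segment]:
    """Score each segment against the requested sentiment/context (placeholder)."""
    return segments

def recognize_and_extract_highlights(segments: List[Segment]) -> List[Segment]:
    """Keep only the segments flagged as highlights (placeholder)."""
    return [s for s in segments if s.is_highlight]

def generate_highlights(video_path: str) -> List[Segment]:
    # Pre-processing -> feature extraction -> classification -> highlight
    # recognition and extraction, mirroring Figure 1.1.
    audio = preprocess(video_path)
    segments = extract_features(audio)
    segments = classify(segments)
    return recognize_and_extract_highlights(segments)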

1.3 Applications of Highlight Generation Systems

The following user scenarios are the real use cases we believe this would apply to most strongly,
and they illustrate how it will be used; each domain is discussed in some detail below:

Education:
In the education (EDU) sector the amount of content generated in the form of recordings is huge,
and it has gathered even more steam with the pandemic and the online mode of education. There
is an increasing shift towards digital education, with teachers from different parts of the world
bringing classes to the web to make sure we use technology to its fullest. With this, the problem
that came up is the amount of content that got generated versus consumed. This solution will help
bridge that gap and introduce a way for consumers to evaluate what they need to spend time
on and what can be skipped based on their need. It would also help educators understand and
analyze the type of content consumers are interested in.

Corporates:
In the business domain, meetings are one of the major use cases, not just due to the pandemic,
which has surged the numbers for sure, but even before that. In corporations there are many
meetings and the majority of those are recorded, to the extent that organizations develop their
CHAPTER 2
LITERATURE REVIEW

2.1 Highlight Generation

A method for extracting highlights from a badminton video was suggested by (Tao, Luo,
Shang, & Wang, 2020). In this approach, the different views of badminton videos are first
classified for video segmentation, using a classification model built upon transfer learning,
achieving high accuracy along with real-time segmentation. Next, relying on the object
detection model YOLOv3, the players in a video segment are located and their average speed
is calculated in order to extract the highlights from a badminton match video. Video segments
with a higher average player speed point towards the intense scenes/portions of a badminton
game, so they can be regarded as the sought highlights. The highlights are then extracted by
ranking the badminton video segments by the average player speed identified earlier, which
saves users considerable time in enjoying the highlights of a whole video. Alongside this, the
proposed strategy is assessed by checking whether a segment contains the desired points of
interest, for example an excited reaction from the audience and a positive or optimistic
assessment from the commentators.

Along similar lines, (Ringer, Nicolaou, & Walker, 2022) suggested that the raw data is
generally passed through feature extraction frameworks, e.g. a CNN (Convolutional Neural
Network) or audio frequency analysis methods. While it is theoretically possible to act on the
raw data itself, there are generally numerous inputs in both cases, i.e. audio as well as visual
data, which ultimately makes this much more challenging. Hence, employing a feature
extractor tasked with detecting a smaller number of salient features is favoured; these are then
used by the downstream frameworks. Near the end of the pipeline, there has to be a
decision-making mechanism that decides whether or not the incoming features form part of a
highlight. The output of this decision-making mechanism is usually the final output of the
system and consists of a highlighting signal, a time series over the entire length of the video,
wherein each point of that signal relates to the probability that a single video segment may be
a highlight. Also, at some point between the raw input and the final output, the data from
every input modality, for example visual data and audio data, ought to be coalesced or merged
to bind the framework together. Precisely how the fusion component is implemented may
differ among models. For example, it might be conceived, for specific data formats, to fuse
the raw inputs, e.g. concatenating an RGB image and an optical flow (visual stream) image
with each other at the raw data level. Alternatively, concatenation can also happen after
feature extraction, such that the decision-making process has a single set of features with
which to make a choice, which we may refer to as a ‘fusion feature’. Finally, possibly every
modality is processed entirely independently, including decision making, and then the
decisions made for each modality are aggregated in some form. This aggregation is what we
call ‘model fusion’.
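
To make the distinction between a ‘fusion feature’ and ‘model fusion’ concrete, the short NumPy sketch below uses hypothetical 512- and 128-dimensional embeddings and placeholder per-modality scores; it is an illustration of the two fusion styles described above, not code from any of the cited systems.

import numpy as np

# Hypothetical per-segment features; in practice these would come from a CNN
# (visual) and an audio analysis front end.
visual_feat = np.random.rand(512)   # e.g. CNN embedding of the video frames
audio_feat = np.random.rand(128)    # e.g. spectral features of the audio track

# Feature fusion: concatenate both modalities into one "fusion feature" that a
# single downstream classifier consumes.
fusion_feature = np.concatenate([visual_feat, audio_feat])

# Model fusion: each modality is scored independently and only the decisions
# are aggregated (here, a simple average of the highlight probabilities).
p_highlight_visual = 0.72           # placeholder per-modality scores
p_highlight_audio = 0.41
p_highlight = (p_highlight_visual + p_highlight_audio) / 2.0
print(fusion_feature.shape, p_highlight)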

(Hanjalic, 2003) proposed a strategy for extracting highlights from a sport TV broadcast.
(Choros, 2017) noted that content-based indexing of sports videos often relies on the automatic
detection of video highlights. Those highlights may be identified on the basis of referees’ or
players’ gestures as well as postures. A few gestures and postures of players are very
characteristic of special sports events, and these special gestures and postures can be detected
primarily in close-up and medium-view visuals; for the view classification strategy to be
effective it ought to be applied first. In their paper, sports video visuals suitable for identifying
gestures and poses of players are distinguished, and then the results of experiments with video
shot/visual classification based upon gesture recognition are presented. Important and
interesting moments in soccer games are then recognized, namely when officials/referees hold
the penalty cards over their heads and look in the direction of the players who have committed
a serious violation of the rules. That recognition process rests on the visual data of sports
videos and does not take any sensors into account. In contrast, there was work done to utilize
a text classification model for event detection in time-stamped football matches and then tag
each short text with a corresponding event label. An Optical Character Recognition (OCR)
model is then used to align the live text match time with the video time and extract the
corresponding video clips, and, if necessary, a specific football event recognition model is
used for further fine-grained video clips. The text classification model is first used to label
each live short text (labelled as events, such as red cards, goals, penalty kicks, etc.), and the
corresponding start time and end time for each identified event is then found (the time here
refers to the match time rather than the video time).
CHAPTER 3
PROPOSED METHODOLOGY

3.1 Proposed Methodology

As we concluded in the previous chapter and also learnt from the literature review of the work
done so far in this space, there is limited work when it comes to the subject and domain of
meeting recordings; the focus has mostly been on generating highlights for sports recordings,
and that too based on the expressions depicted in the audio of the recording. With that thought
in mind, the proposed methodology here is multi-fold and is detailed in the following sections
of this chapter. Here we will try to create a complete view of what we are trying to do.

We have tried to use video recordings of meetings, where we start by doing a lot of refinement
and pre-processing on the subject video, intended to remove any noise from the recording,
including the sections of video where no conversation was happening. We also extract the audio
from the video recording to be used as a specific attribute for our algorithm later in the process.
With all the pre-processing out of the way, the next goal of the proposed method is to obtain the
transcript of the audio file and run speaker diarization over it. Speaker diarization as a concept
is explained in detail in the following sections of this chapter, but in summary it is the process
of figuring out the number of speakers in a conversation; we then use this information to divide
the audio/video file into segments on the basis of the speaker who spoke in each section. This
helps us create a group or collection of smaller audio/video sections of the original recording,
based on when each of the identified speakers spoke during the meeting.

Our next step is then to use input from the user to understand what highlight context he/she is
looking for in the video. This is needed to make the system more configurable and personalized,
in that the user can define the context in which they intend to generate the highlights from the
recordings. Once the context is provided by the user, we use the Naïve Bayes algorithm to
compute the probability that each section of the video recording depicts that sentiment, and we
also rate the sections by the probability with which they demonstrate the sentiment we are trying
to narrow down to. In this step we then use this information, together with the data from the
previous step/component, to narrow down to even smaller sections within the audio/video
segments created earlier and find the required sentiment in those smaller groups.
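
As an illustration of this step, the sketch below scores transcript segments with a Naïve Bayes classifier using scikit-learn. The tiny training set, the “conflict” sentiment label and the segment texts are made-up examples; the project’s actual training data and feature pipeline are not reproduced here.

# Illustrative sketch of scoring transcript segments with a Naive Bayes
# classifier; the training texts and labels are hypothetical.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_texts = [
    "I completely disagree, this plan will not work",
    "this is unacceptable, we discussed this already",
    "thanks everyone, great progress this sprint",
    "let's move to the next agenda item",
]
train_labels = [1, 1, 0, 0]  # 1 = matches the requested sentiment (e.g. conflict)

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)
clf = MultinomialNB().fit(X_train, train_labels)

# Transcript text of the diarized segments produced in the previous step.
segment_texts = ["I strongly object to this decision", "meeting notes will be shared"]
X_seg = vectorizer.transform(segment_texts)

# Probability that each segment depicts the requested sentiment.
sentiment_probs = clf.predict_proba(X_seg)[:, 1]
print(list(zip(segment_texts, sentiment_probs.round(3))))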

This is then followed by using our highlighter engine to give a highlight score to all the
audio/video segments generated so far by the previous two components. This is done using two
different ideas. Firstly, we use the probability assigned to each of the video segments based on
the sentiment they have depicted; this is the first parameter of the highlight formula presented
in this proposed methodology. Secondly, in our experiments we have observed that in such
recorded meetings the speakers who have spoken more are given lesser weightage, as they are
not generating a lot of value from a highlight perspective. Taking a use case to understand this:
in an executive-level meeting, if there is a disagreement between parties over a topic, the most
important section may be when one of the executives uses strong words to express that emotion
and then chooses to stay silent; in this case that executive will be given a higher weightage in
our score.

We then use this to generate a score for all the video segments created so far; the details of
generating this highlight score are given in the following sections of this chapter. We use this
highlight score to order the video segments in decreasing order of their score, to be treated as
highlights and returned as results to our users.
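
A minimal sketch of this scoring and ordering step is given below. Since the exact highlight formula is only described qualitatively in this excerpt, the sketch assumes a simple weighting in which a segment’s sentiment probability is scaled down by its speaker’s share of total talk time; the segment IDs and numbers are hypothetical.

# Hedged sketch of the highlight scoring and ordering step; the weighting is an
# assumed simplification, not the project's exact formula.
def highlight_score(sentiment_prob: float, speaker_talk_fraction: float) -> float:
    # speaker_talk_fraction: share of total meeting time this speaker spoke (0..1).
    # Speakers who spoke more get a lower weight, per the observation above.
    return sentiment_prob * (1.0 - speaker_talk_fraction)

segments = [
    # (segment id, sentiment probability, speaker's share of total talk time)
    ("seg-03", 0.91, 0.10),
    ("seg-07", 0.85, 0.55),
    ("seg-12", 0.40, 0.05),
]

ranked = sorted(
    segments,
    key=lambda s: highlight_score(s[1], s[2]),
    reverse=True,  # decreasing order of highlight score
)
print([seg_id for seg_id, _, _ in ranked])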

The invention can be broken down into the following broader areas of work/implementation. We
shall discuss the role of each individual component and finally talk about how the system comes
together to deliver the objective.

3.2 Extraction Component

This component deals with taking the video file in the supported formats and subjecting it to
the extraction algorithm to obtain the audio track in known formats that we can use for further
processing. Once the audio file is obtained, it is pre-processed to remove noise, which can be in
the form of silent sections or background disturbances. After the pre-processing step we then
run feature extraction, which helps us train the model and obtain the number of speakers and
the sections of audio where each of those speakers was talking. We call these sections of audio
speaker conversations, and they are mapped to each speaker in the form of start and end times
within the whole audio track where that speaker was talking in the meeting.

Background Noise Reduction:

The ability to enhance a noisy audio segment by removing everything other than the main
audio content is the process of background noise reduction. Background noise elimination is
utilised in almost all areas, including video conferencing systems, software used for editing
videos and audio files, and headphones with noise cancellation features. Background noise
reduction is still a fast-growing and evolving area of technology, and artificial intelligence has
opened up a whole new range of methods for doing it better.

Recurrent neural networks (RNNs) are models capable of recognizing and comprehending
sequential data. Examples of sequential data include the location of an object over time, music,
and text.

RNNs are especially good at eliminating background noise because they can recognise patterns
over long periods of time, which is necessary for interpreting audio.

A feed-forward neural network has an input layer, a hidden layer, and an output layer as its
three primary layers. Recurrent neural networks additionally have a feedback loop: as the model
goes through each item in a sequence, a hidden state abstracted from the hidden layer keeps
updating itself.

An audio sample may be divided into a series of equally spaced time segments. As each
individual sample in the sequence is submitted to the recurrent neural network, the hidden state
is updated at every iteration, keeping track of the prior steps each time. After each cycle, the
output is routed through a feed-forward neural network to create a new audio stream that is
free of background noise.
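
The toy PyTorch sketch below illustrates the idea described above: a recurrent network that maps a sequence of noisy, equally spaced audio frames to cleaned frames, with the hidden state carrying context across frames. The layer sizes and framing are illustrative assumptions, not a production denoiser.

# Toy sketch of a recurrent denoiser; architecture and sizes are assumptions.
import torch
import torch.nn as nn

class RNNDenoiser(nn.Module):
    def __init__(self, frame_size: int = 256, hidden_size: int = 128):
        super().__init__()
        self.gru = nn.GRU(frame_size, hidden_size, batch_first=True)  # hidden state carries past context
        self.out = nn.Linear(hidden_size, frame_size)                 # feed-forward layer producing clean frames

    def forward(self, noisy_frames: torch.Tensor) -> torch.Tensor:
        # noisy_frames: (batch, num_frames, frame_size), equally spaced time segments
        hidden_seq, _ = self.gru(noisy_frames)   # hidden state updated at every frame
        return self.out(hidden_seq)              # predicted clean frames, same shape as input

# Example: one 3-second clip at 16 kHz split into 256-sample frames.
frames = torch.randn(1, 16000 * 3 // 256, 256)
clean = RNNDenoiser()(frames)
print(clean.shape)  # torch.Size([1, 187, 256])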

Speaker Diarization:

Speaker diarization is a method of deconstructing a recorded conversation to identify the
different speakers, so that companies can innovate on applications that perform speech analysis.
When it comes to recording and then evaluating a conversation between two or more human
beings, the task gets complex, and the most relevant and useful solution available is speaker
diarization.

When an audio conversation file or recording can be broken into segments or sections in which
we can uniquely identify which speaker the conversation is happening between, it becomes much
simpler for human understanding as well as for the field of artificial intelligence; both are then
able to comprehend the context and flow of the dialogue in question.

Speaker diarization can be achieved through the following two-step process:

Finding Speakers:

This step, also referred to as speaker segmentation, mostly analyses the characteristics and
zero-crossing rates of each voice to determine who is speaking and when. The gender of each
speaker can be identified from features such as pitch.

Clustering Speakers:

Once the speakers are recognized, the conversation is divided into separate segments so that the
whole conversation can be correctly marked or tagged and easily understood; all the non-speech
sections are skipped. To do this, probabilistic analysis is used to identify the number of people
contributing to the dialogue at a particular point in time.
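
The sketch below illustrates this clustering step, assuming each speech segment has already been converted into a fixed-size speaker embedding; the random embeddings and the distance threshold are stand-ins for real voice features and tuned parameters.

# Illustrative clustering of (fake) speaker embeddings into speakers.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
segment_embeddings = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(5, 64)),   # segments from speaker A
    rng.normal(loc=1.0, scale=0.1, size=(4, 64)),   # segments from speaker B
])

# Group segments by voice similarity; with a distance threshold the number of
# speakers does not have to be known in advance.
clustering = AgglomerativeClustering(n_clusters=None, distance_threshold=3.0)
labels = clustering.fit_predict(segment_embeddings)
print(labels)  # e.g. [0 0 0 0 0 1 1 1 1] -> two speakers found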

Algorithm:

Step 1: Take the video file as input and extract the audio using the MoviePy library.

Step 2: Take the audio file and subject it to pre-processing to remove noise.

Step 3: The file is then passed to a voice activity detector that separates the speech sections
from the non-speech sections, thus trimming silences from the audio recording/file.

Step 4: We then break the audio into segments of varied length, created on the basis of the
statements made in the conversation; let's call them audio segments. A rough code sketch of
these steps is given below.
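
The following sketch roughly covers Steps 2-4, using librosa's energy-based splitter as a stand-in for the voice activity detector; the file name, threshold and segment structure are illustrative assumptions.

# Rough sketch of Steps 2-4; librosa's split() is used as a simple stand-in VAD.
import librosa

def split_into_speech_segments(audio_path: str, top_db: float = 30.0):
    y, sr = librosa.load(audio_path, sr=16000)          # Step 2: load the pre-processed audio
    intervals = librosa.effects.split(y, top_db=top_db) # Step 3: keep only non-silent regions
    # Step 4: variable-length audio segments with start/end times in seconds
    return [
        {"start": start / sr, "end": end / sr, "samples": y[start:end]}
        for start, end in intervals
    ]

segments = split_into_speech_segments("meeting_audio.wav")
print(len(segments), "speech segments found")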

CHAPTER 4
EXPERIMENTAL RESULTS

4.1 Data Analysis

There is an impressive wealth of recordings made individually by fans, along with content that
is livestreamed and recorded. The livestream content or recording can be considered an
amalgamation of highlighted sections, which are not really highlights but intentionally captured
sections of the recording; the concluding highlighted segments are then shown or treated as the
highlights of the recording. In the ideal case, and assuming the right context, the highlight
content from any recording can be produced easily using manual intervention and inference.
For example, we could select a set of people to watch the recorded content, pick the sections of
the content that were their favourite, and report those to us as highlights. As desirable as this is,
the manual step here is very costly and time-consuming, which encourages us to take advantage
of the available cutting-edge technology and datasets that can help us achieve the same thing in
far less time and with minimal or no manual intervention.

An alternative, sensible way of handling this could be to align the interesting video fragments
with the ones in the original livestream recording, e.g. by recognizing those frames in the
livestream content that are also within the highlight video in context here. However, while less
expensive than human annotation, it again carries a significant data-collection penalty, since the
conventions used for the highlights and the livestream recordings together are unpredictable and
need human pre-processing. Moreover, the frame matching process is costly from a computation
standpoint. This kind of approach needs a matching highlight video for every livestream
recorded video and a livestream recorded video for every highlight video within the dataset.
Such constraints and difficulties make automated matching of recorded content and highlights
impractical for datasets that are large-scale.

Instead, the unlabelled, positive-content approach suggested by (Xiong, Kalantidis, Ghadiyaram,
& Grauman, 2019) prescribes the collection of two datasets: one containing mixed labels, from
livestream recordings, and the other containing positive labels, from curated highlight videos,
even though there is no relation among the various datasets used.

Following the same approach, we collected many meeting video recordings, some self-curated
and some collected from various meeting recording sources. The dataset was then divided into
training and test sets.
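
As a small illustration of how the different training/test proportions reported in Table 4.1 could be produced, the sketch below uses scikit-learn's train_test_split; the recordings list and labels are hypothetical placeholders for the annotated meeting segments.

# Sketch of producing the train/test proportions used in Table 4.1.
from sklearn.model_selection import train_test_split

recordings = [f"segment_{i}" for i in range(100)]   # placeholder segment IDs
labels = [i % 2 for i in range(100)]                # placeholder highlight annotations

for train_fraction in (0.2, 0.3, 0.4, 0.5):
    train_x, test_x, train_y, test_y = train_test_split(
        recordings, labels, train_size=train_fraction, random_state=42
    )
    print(train_fraction, len(train_x), len(test_x))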

Figure 4.1 Screenshot from a recording on which the experiment was conducted

4.2 Result Analysis

Table 4.1 Experimental results for different test/training dataset

Training Data    Test Data    F1 Score

20%              80%          84.3%
30%              70%          89.1%
40%              60%          90.02%
50%              50%          91.3%

The above results demonstrate the F1 scores for our algorithm that generates the highlights. The
score describes how effectively the generated highlight denoted the required sentiment in the
video and whether the extracted segment also demonstrated the required expression.
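
For reference, the sketch below shows how an F1 score of this kind can be computed from per-segment predictions with scikit-learn; the label arrays are made up for illustration.

# Sketch of computing a per-segment F1 score; labels are hypothetical.
from sklearn.metrics import f1_score

true_highlight = [1, 0, 1, 1, 0, 0, 1, 0]   # human-annotated highlight flags
pred_highlight = [1, 0, 1, 0, 0, 0, 1, 1]   # flags produced by the algorithm

print(f"F1 score: {f1_score(true_highlight, pred_highlight):.3f}")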

We ran the algorithm on various sets of video recordings, ranging from recordings of meetings
that had a lot of conflicts and arguments to video recordings that did not have any specific
sentiment highlighted. With this we found that our algorithm does not perform well for videos
that do not have any specific highlighted sentiment.

Also, when the algorithm was run on technical training recordings, it was observed that it
generated too many video segments that were considered to be highlights, and it was thus not
very effective at creating a summary. We choose to take this up as a future improvement to our
work.

We then also ran the experiments with different sets of training/validation and test data and
observed our scores for each of these combinations. The figures below show the results observed
for the algorithm with various combinations of the dataset subsets.

Figure 4.2 Results executed with the 20/30/50 dataset combination

Figure 4.3 Graph demonstrating results with accuracy of 0.85

Figure 4.4 Results executed with the 30/20/50 dataset combination

Figure 4.5 Results executed with the 40/20/40 dataset combination

Figure 4.6 Results executed with the 40/30/30 dataset combination

Figure 4.7 Results executed with the 50/20/

CHAPTER 5
CONCLUSION & FUTURE SCOPE

5.1 Conclusion

5.2 Future Scope

REFERENCES

1. Agyeman, R., Muhammad, R., & Choi, G. S. (2019). Soccer Video Summarization
using Deep Learning. IEEE, 270-273.

2. Bertini, M., Bimbo, A. D., & Nunziati, W. (2004). Common Visual Cues for Sports
Highlights Detection. IEEE, 1399-1402.

3. Chakraborty, P. R., Tjondronegoro, D., Zhang, L., & Chandran, V. (n.d.). Using
Viewer’s Facial Expression and Heart Rate for Sports Video Highlights Detection. 371-378.

4. Chakraborty, R. P., Tjondronegoro, D., Zhang, L., & Chandran, V. (2016). Automatic
Identification of Sports Video Highlights using Viewer Interest Features. 55-62.

5. Ching, W.-S., Toh, P.-S., & Er, M.-H. (n.d.). A New Specular Highlights Detection
Algorithm Using Multiple Views. 474-478.

6. Choros, K. (2017). Highlights Extraction in Sports Videos Based on Automatic Posture
and Gesture Recognition. Springer International Publishing, 619-628.

7. Gao, X., Liu, X., Yang, T., Deng, G., Peng, H., Zhang, Q., . . . Liu, J. (2020). Automatic
Key Moment Extraction And Highlights Generation Based On Comprehensive Soccer
Video Understanding. IEEE, 1-6.

8. Gygli, M., Grabner, H., & Gool, L. V. (2015). Video Summarization by Learning
Submodular Mixtures of Objectives. IEEE, 3090-3098.

9. Hanjalic, A. (2003). Generic Approach To Highlights Extraction From A Sport Video. IEEE.

10. Hanjalic, A. (2005). Adaptive Extraction of Highlights From a Sport Video. IEEE, 1114-1122.

11. Hanjalic, A. (2005). Adaptive Extraction of Highlights From a Sport Video Based on
Excitement Modeling. IEEE, 1114-1122.

12. Hsieh, J.-T. T., Li, C. E., Liu, W., & Zeng, K.-H. (n.d.). Spotlight: A Smart Video
Highlight Generator. stanford.edu, 1-7.

13. Hu, L., He, W., Zhang, L., Xiong, H., & Chen, E. (2021). Detecting Highlighted Video
Clips Through Emotion-Enhanced Audio-Visual Cues. IEEE.

14. Jiang, K., Chen, X., & Zhao, Q. (2011). Automatic composing soccer video highlights
with core-around event model. IEEE, 183-190.

15. Jiang, R., Qu, C., Wang, J., Wang, C., & Zheng, Y. (2020). Towards Extracting
Highlights From Recorded Live Videos: An Implicit Crowdsourcing Approach. IEEE,
1810-1813.

16. Kostoulas, T., Chanel, G., Muszynski, M., Lombardo, P., & Pun, T. (2015). Identifying
aesthetic highlights in movies from clustering of physiological and behavioral signals. IEEE.

17. Kudi, S., & Namboodiri, A. M. (2017). Words speak for Actions: Using Text to find
Video Highlights. Asian Conference on Pattern Recognition.

18. Li, Q., Chen, J., Xie, Q., & Han, X. (2020). Detecting boundaries of absolute highlights
for sports videos.

19. Liu, C., Huang, Q., Jiang, S., & Zhang, W. (2006). Extracting Story Units In Sports
Video Based On Unsupervised Video Scene Clustering. IEEE, 1605-1608.

20. Longfei, Z., Yuanda, C., Gangyi, D., & Yong, W. (2008). A Computable Visual
Attention Model for Video Skimming. IEEE, 667-672.

21. Ma, Y.-F., & Zhang, H. J. (2005). Video Snapshot: A Bird View of Video Sequence.
IEEE.

22. Marlow, S., Sadlier, D. A., O’Connor, N., & Murphy, N. (2002). Audio Processing for
Automatic TV Sports Program Highlights Detection. ISSC.

23. Merler, M., Joshi, D., Nguyen, Q.-B., Hammer, S., Kent, J., Smith, J. R., & Feris, R.
S. (2017). Automatic Curation of Golf Highlights using Multimodal Excitement
Features. IEEE, 57-65.

24. Merler, M., Mac, K.-H. C., Joshi, D., Nguyen, Q.-B., Hammer, S., Kent, J., . . . Feris,
R. S. (2018). Automatic Curation of Sports Highlights using Multimodal Excitement
Features. IEEE Transactions on Multimedia, 1-16.

25. Ngo, C.-W., Ma, Y.-F., & Zhang, H.-J. (2005). Video Summarization and Scene
Detection by Graph Modeling. IEEE, 296-305.

26. Pan, H., Beek, P. v., & Sezan, M. I. (2001). Detection Of Slow-Motion Replay Segments
In Sports Video For Highlights Generation. IEEE, 1649-1652.

27. Ringer, C., Nicolaou, M. A., & Walker, J. A. (2022). Autohighlight: Highlight
detection in League of Legends esports broadcasts via crowd-sourced data. Machine
Learning with Applications, 1-15.

28. Shih, H.-C., & Huang, C.-L. (2004). Detection Of The Highlights In Baseball Video
Program. IEEE, 595-598.

29. Tang, H., Kwatra, V., Sargin, M. E., & Gargi, U. (n.d.). Detecting Highlights In Sports
Videos: Cricket As A Test Case.

30. Tang, H., Kwatra, V., Sargin, M., & Gargi, U. (2011). Detecting Highlights In Sports
Videos: Cricket As A Test Case. IEEE.

31. Tang, K., Bao, Y., Zhao, Z., Zhu, L., Lin, Y., & Peng, Y. (2018). AutoHighlight:
Automatic Highlights Detection and Segmentation in Soccer Matches. IEEE, 4619-4624.

32. Tao, S., Luo, J., Shang, J., & Wang, M. (2020). Extracting Highlights from a
Badminton Video Combine Transfer Learning with Players’ Velocity. International
Conference on Computer Animation and Social Agents, 82-91.

33. Tjondronegoro, D. W., Chen, Y.-P. P., & Pham, B. (2004). Classification of Self-
Consumable Highlights for Soccer Video Summaries. IEEE, 579-582.

34. Wan, K., Yan, X., & Xu, C. (2005). Automatic Mobile Sports Highlights. IEEE.

35. Wang, H., Yu, H., Chen, P., Hua, R., Yan, C., & Zuo, L. (2018). Unsupervised Video
Highlight Extraction via Query-related Deep Transfer. 24th International Conference
on Pattern Recognition, 2971-2976.

36. Wu, P. (2004). A Semi-automatic Approach to Detect Highlights for Home Video
Annotation. IEEE, 957-960.

37. Wung, P., Cui, R., & Yang, S.-Q. (2004). Contextual Browsing For Highlights In Sports
Video. IEEE, 1951-1954.

38. Xiao, B., Yin, X., & Kang, S.-C. (2021). Vision-based method of automatically
detecting construction video highlights by integrating machine tracking and CNN
feature extraction. Automation in Construction, 1-13.

39. Xiong, B., Kalantidis, Y., Ghadiyaram, D., & Grauman, K. (2019). Less Is More:
Learning Highlight Detection From Video Duration. 1258-1267.

40. Xiong, Z., Radhakrishnan, R., Divakaran, A., & Huang, T. S. (2005). Highlights
Extraction From Sports Video Based On An Audio-Visual Marker Detection
Framework. IEEE.

41. Xiong, Z., Radhakrishnan, R., Divakaran, A., & Huang, T. S. (2004). Effective And
Efficient Sports Highlights Extraction Using The Minimum Description Length
Criterion In Selecting GMM Structures. IEEE, 1947-1950.

42. Yang, H., Wang, B., Lin, S., Wipf, D., Guo, M., & Guo, B. (2015). Unsupervised
Extraction of Video Highlights Via Robust Recurrent Auto-encoders. IEEE
International Conference on Computer Vision, 4633-4641.

43. Yao, T., Mei, T., & Rui, Y. (n.d.). Highlight Detection with Pairwise Deep Ranking for
First-Person Video Summarization. IEEE, 982-990.

