Papers by Dulce Ponceleon
"What is in that video anyway?": In Search of Better Browsing. Savitha Srinivasan, Dulce Ponceleon, Arnon Amir, Dragutin Petkovic. IBM Almaden Research Center, 650 Harry Road, San Jose, CA 95120, USA. savitha, dulce, arnon, petkovic@almaden.ibm.com ...
Proceedings of the ninth ACM international conference on Multimedia - MULTIMEDIA '01, 2001
INTRODUCTION. The amount of text available on the web is orders of magnitude greater than the amount of streaming media. A recent study indicates that less than 1% of documents on the web include streaming media links [Ashour01]. Text continues to be the primary ...
This paper addresses the problem of automatic detection of salient video segments for real-world applications such as corporate training based on associated speech transcriptions. We present a novel segmentation algorithm based on automatic speech recognition (ASR) applied to the audio track of the video. Our feature set consists of word n-grams extracted from the imperfect speech transcriptions. We use a two-pass algorithm that combines a boundary-based method with a content-based method. In the first pass, we analyze the temporal distribution and the rate of arrival of features to compute an initial segmentation. In the second pass, we detect changes in content-bearing words by using the content-bearing features as queries in an information retrieval system. The content-based second pass validates the initial segments and merges them as needed. Variations in the structure of the audio/video content and in the accuracy of ASR affect the feasibility of the segmentation task. For realistic data we observe that we can identify content-rich segments of the audio. In the best scenario a high-level table of contents is generated, and in the worst scenario a single salient segment is identified. We illustrate the algorithm in detail with some examples and validate the data with manual segmentation boundaries.
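The boundary-based first pass described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each n-gram feature carries an arrival timestamp and that a new segment starts whenever the gap between consecutive arrivals exceeds a threshold. The function name `initial_segments` and the parameter `max_gap` are hypothetical and do not come from the paper; the content-based second pass is not shown.

```python
def initial_segments(arrival_times, max_gap):
    """Group feature arrival times (in seconds) into candidate segments.

    Assumption for illustration: a gap longer than max_gap between two
    consecutive feature arrivals marks a segment boundary.
    Returns a list of (first_arrival, last_arrival) pairs.
    """
    groups = []
    for t in sorted(arrival_times):
        if groups and t - groups[-1][-1] <= max_gap:
            # Feature arrives soon after the previous one: same segment.
            groups[-1].append(t)
        else:
            # Long silence in feature arrivals: open a new segment.
            groups.append([t])
    return [(g[0], g[-1]) for g in groups]


if __name__ == "__main__":
    times = [1.0, 2.5, 3.0, 20.0, 21.0, 60.0]
    print(initial_segments(times, max_gap=5.0))
```

A real system would likely also weigh the density of features inside each group; this sketch only uses the gaps between arrivals.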
ABSTRACT As digital media replaces analog, consumers are beginning to discover that their devices no longer interoperate in the way they are accustomed to. For example, they may find it impossible to transfer a DVR-recorded show onto a portable media player from a different manufacturer. Content creators are weary of supporting myriad ecosystems, each with different rules, restrictions, DRM systems, and file formats. To address these issues, IBM has developed the Advanced Secure Content Cluster Technology (ASCCT), which provides the highest levels of DRM protections for content owners, while at the same time being completely invisible to the user, allowing them to freely move, play, and back up content from any and all ASCCT devices within their home network. We describe the plug-and-play automatic construction of an interoperable secure cluster of heterogeneous devices using a minimal message protocol.
Proceedings of SPIE, Feb 5, 2009
Broadcast encryption is a well-established alternative to public key encryption for use in content protection systems. It offers significant performance benefits, as well as useful features such as one-to-many delivery, dynamic membership in the authorized receivers group, and anonymous access to content, permitting content protection systems to preserve privacy for consumers. Broadcast encryption has been successfully deployed to users for protection of commercial content on digital media such as flash memory devices and optical media for both standard-definition and high-definition content. In this paper, we present the Advanced Secure Content Cluster Technology, which applies broadcast encryption to content protection for home media networks.
Proceedings of SPIE, May 2, 1994
ABSTRACT
A small collection of successive frames of a video sequence of a talking person defines a subspace whose dimensionality is much less than the size of one frame. Any face image drawn from the video sequence can be associated with a subspace defined by itself and the frames close to it in time. Representing faces in their associated subspaces can ...
Proceedings of SPIE, Dec 19, 2001
Compact representations of video data can enable efficient video browsing. Such representations provide the user with information about the content of the particular sequence being examined while preserving the essential message. We propose a method to automatically generate video summaries for long videos. Our video summarization approach involves two main tasks: first, segmenting the video into small, coherent segments and second, ranking the resulting segments. Our proposed algorithm scores segments based on word frequency analysis of speech transcripts. Then a summary is generated by selecting the segments with the highest score-to-duration ratios and concatenating them. We have designed and performed a user study to evaluate the quality of the summaries generated. Comparisons are made between our proposed algorithm and a random segment selection scheme based on statistical analysis of the user study results. Finally, we discuss various issues that arise in summary evaluation with user studies.
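The selection step in this abstract (rank segments by score-to-duration ratio, then concatenate the winners) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the `Segment` type, the time `budget`, the greedy fill rule, and the particular word-frequency score (rarer words count more) are all hypothetical choices made for the example.

```python
from collections import Counter
from dataclasses import dataclass
import math


@dataclass
class Segment:
    start: float      # seconds from the beginning of the video
    duration: float   # seconds, assumed to be positive
    words: list       # words from the speech transcript of this segment


def score_segments(segments):
    """Assumed scoring: words that are rarer across the video contribute more."""
    counts = Counter(w for seg in segments for w in seg.words)
    total = sum(counts.values())
    return [
        sum(-math.log(counts[w] / total) for w in set(seg.words))
        for seg in segments
    ]


def summarize(segments, budget):
    """Greedily pick segments by score/duration ratio within a time budget."""
    scores = score_segments(segments)
    ranked = sorted(
        zip(segments, scores),
        key=lambda pair: pair[1] / pair[0].duration,
        reverse=True,
    )
    chosen, used = [], 0.0
    for seg, _ in ranked:
        if used + seg.duration <= budget:
            chosen.append(seg)
            used += seg.duration
    # Concatenate the selected segments in their original temporal order.
    return sorted(chosen, key=lambda seg: seg.start)


if __name__ == "__main__":
    video = [
        Segment(0, 10, ["hello", "world"]),
        Segment(10, 5, ["hello", "encryption", "key"]),
        Segment(15, 20, ["hello"]),
    ]
    for seg in summarize(video, budget=15):
        print(seg.start, seg.duration)
```

Dividing the score by the duration favors short, information-dense segments, so a fixed summary length can cover more distinct content than ranking by raw score alone.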
In barrier methods for constrained optimization, the main work lies in solving large linear systems $Kp = r$, where $K$ is symmetric and indefinite. For linear programs, these KKT systems are usually reduced to smaller positive-definite systems $AH^{-1}A^T q = s$, where $H$ is a large principal submatrix of $K$. These systems can be solved more efficiently, but $AH^{-1}A^T$ is typically more ill-conditioned than $K$. In order to improve the numerical properties of barrier implementations, we discuss the use of "reduced KKT systems", whose dimension and condition lie somewhere in between those of $K$ and $AH^{-1}A^T$. The approach applies to linear programs and to positive semidefinite quadratic programs whose Hessian $H$ is at least partially diagonal. We have implemented reduced KKT systems in a primal-dual algorithm for linear programming, based on the sparse indefinite solver MA27 from the Harwell Subroutine Library. Some features of the algorithm are presented, along with results on the netlib LP test set.
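The reduction mentioned in this abstract can be made concrete with a short worked derivation. The block layout and sign convention below are assumptions chosen for illustration (the abstract does not give them), as are the names $\Delta x$, $\Delta y$, $r_1$, $r_2$ and the partition into blocks 1 and 2. Suppose the KKT system $Kp = r$ has the form

\[
\begin{pmatrix} -H & A^T \\ A & 0 \end{pmatrix}
\begin{pmatrix} \Delta x \\ \Delta y \end{pmatrix}
=
\begin{pmatrix} r_1 \\ r_2 \end{pmatrix},
\]

with $H$ diagonal and positive definite. The first block row gives $\Delta x = H^{-1}(A^T \Delta y - r_1)$, and substituting into the second block row yields the fully reduced system

\[
A H^{-1} A^T \, \Delta y = r_2 + A H^{-1} r_1 ,
\]

which matches $AH^{-1}A^T q = s$ with $q = \Delta y$ and $s = r_2 + AH^{-1}r_1$. One plausible intermediate form, again an assumption, partitions $A = (A_1 \; A_2)$ and $H = \mathrm{diag}(H_1, H_2)$, splits $r_1$ into $r_{11}$ and $r_{12}$ accordingly, and eliminates only $\Delta x_1$:

\[
\begin{pmatrix} -H_2 & A_2^T \\ A_2 & A_1 H_1^{-1} A_1^T \end{pmatrix}
\begin{pmatrix} \Delta x_2 \\ \Delta y \end{pmatrix}
=
\begin{pmatrix} r_{12} \\ r_2 + A_1 H_1^{-1} r_{11} \end{pmatrix}.
\]

This system is smaller than the full one but larger than $AH^{-1}A^T$, which is consistent with the abstract's description of a dimension lying between the two.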
NASA Tech Briefs, 1992
ABSTRACT
International ACM SIGIR Conference on Research and Development in Information Retrieval, 1999
We will demonstrate CueVideo, a modular system that automatically analyzes video and audio in order to facilitate rapid retrieval, assimilation, and browsing. CueVideo employs a combination of automatic indexing and analysis techniques in the domains of video, audio, speech recognition, and IR. Its applications are in areas where the user needs to quickly assimilate information from video, such as training, distance learning, and corporate communication.
People can browse text documents very quickly and efficiently. A user can find, within seconds, a relevant document from a dozen retrieved items listed on a screen. On the other hand, browsing of multiple audio and video documents could be very time-consuming. Even the task of browsing a single one-hour video to find a relevant segment might take considerable time. Different visualization methods have been developed over the years to assist video browsing. This chapter covers several such methods, including storyboards, animation, slide shows, audio speedup, and adaptive accelerating fast playback. These views are integrated into a video search and retrieval system. A synchronized browser allows the user to switch views while keeping the context. The results of a usability study about audio speedup in different views are presented.
In recent years, a few transitions in multimedia applications may be observed. We can identify at least two trends. Firstly, multimedia is being introduced in mainstream applications, leaving behind its traditional focus on highly professional markets, and on the gaming enhancement arena. Secondly, standardization bodies continue to work on media standards in order to provide a common approach to enable interoperability, better quality and efficiency under specified constraints. These new media standards are then added to existing archives of media, spanning a broad spectrum of legacy media standards. The result of these two trends is that a typical multimedia application, in order to be effective, needs to support many input types and provide the user with a seamless and transparent behavior. This paper discusses a pragmatic approach to this problem based on the object-oriented paradigm for real-world multimedia applications.
An algorithm for the segmentation of an audio/video source into topically cohesive segments based on automatic speech recognition (ASR) transcriptions is presented. A novel two-pass algorithm is described that combines a boundary-based method with a content-based method. In the first pass, the temporal proximity and the rate of arrival of n-gram features are analyzed in order to compute an initial segmentation. In the content-based second pass, changes in content-bearing words are detected by using the n-gram features as queries in an information-retrieval system. The second pass validates the initial segments and merges them as needed. Feasibility of the segmentation task can vary enormously depending on the structure of the audio content and the accuracy of ASR. For real-world corporate training data our method identifies, at worst, a single salient segment of the audio and, at best, a high-level table of contents. We illustrate the algorithm in detail with some examples and validate the results with segmentation boundaries generated manually.
International Conference on Multimedia Computing and Systems, Jun 7, 1999