Papers by Giovanni Giuffrida
Proceedings of SPIE - The International Society for Optical Engineering, 2007
The rapid increase of technological innovations in the mobile phone industry induces the research community to develop new and advanced systems to optimize the services offered by mobile phone operators (telcos), maximize their effectiveness, and improve their business. Data mining algorithms can run over data produced by mobile phone usage (e.g., image, video, text, and log files) to discover users' preferences and predict the offer most likely to be purchased by each individual customer. One of the main challenges is reducing the learning time and cost of these automatic tasks. In this paper we discuss an experiment in which a commercial offer is composed of a small picture augmented with a short text describing the offer itself. Each customer's purchase is logged with all relevant information. Upon arrival of new items we need to learn who the best customers (prospects) for each item are, that is, the ones most likely to be interested in purchasing that specific item. Such learning activity is time consuming and, in our specific case, is not applicable given the large number of new items arriving every day: given the current customer base, we are not able to learn on all new items. Thus, we need to select among the new items to identify the best candidates. We do so by using a joint analysis of visual features and text to estimate how good each new item could be, that is, whether it is worth learning on it. Preliminary results show the effectiveness of the proposed approach in improving classical data mining techniques.
Pattern Analysis and Applications, 2009
Traditionally, direct marketing companies have relied on pre-testing to select the best offers to send to their audiences. Companies systematically dispatch the offers under consideration to a limited sample of potential buyers, rank them with respect to their performance and, based on this ranking, decide which offers to send to the wider population. Though this pre-testing process is simple and widely used, recently the direct marketing industry has been under increased pressure to further optimize learning, in particular when facing severe time and space constraints. Taking into account the multimedia nature of offers, which typically comprise both a visual and a text component, we propose a two-phase learning strategy based on a cascade of regression methods. The proposed approach takes advantage of visual and text features to improve and accelerate the learning process. Experiments in the domain of a commercial multimedia messaging service show the effectiveness of the proposed methods, which improve on classical learning techniques. The main contribution of the present work is to demonstrate that direct marketing firms can exploit the information on visual content to optimize the learning phase. The proposed methods can be used in any multimedia direct marketing domain in which offers are composed of image and text. Keywords: Visual and text features · Learning in time and space constrained domains · Multimedia messaging services · Direct marketing
Multimedia Tools and Applications, 2008
Traditionally, direct marketing companies have relied on pre-testing to select the best offers to send to their audience. Companies systematically dispatch the offers under consideration to a limited sample of potential buyers, rank them with respect to their performance and, based on this ranking, decide which offers to send to the wider population. Though this pre-testing process is simple and widely used, recently the industry has been under increased pressure to further optimize learning, in particular when facing severe time and learning space constraints. The main contribution of the present work is to demonstrate that direct marketing firms can ... Service (MMS) show the effectiveness of the proposed methods and a significant improvement over traditional learning techniques. The proposed approach can be used in any multimedia direct marketing domain in which offers comprise both a visual and a text component. Keywords: Visual and text features · Learning in time and space constrained domains · Multimedia messaging services · Direct marketing
Metadata Mining for Image Understanding, 2008
Direct marketing companies systematically dispatch the offers under consideration to a limited sample of potential buyers, rank them with respect to their performance and, based on this ranking, decide which offers to send to the wider population. Though this pre-testing process is simple and widely used, recently the direct marketing industry has been under increased pressure to further optimize learning, in particular when facing severe time and space constraints. Taking into account the multimedia nature of offers, which typically comprise both a visual and a text component, we propose a two-phase learning strategy based on a cascade of regression methods. The proposed approach takes advantage of visual and text features to improve and accelerate the learning process. Experiments in the domain of a commercial Multimedia Messaging Service (MMS) show the effectiveness of the proposed methods, which improve on classical learning techniques.
Frontiers in Physics, 2021
In early January 2020, after China reported the first cases of the new coronavirus (SARS-CoV-2) in the city of Wuhan, unreliable and not fully accurate information started spreading faster than the virus itself. Alongside this pandemic, people have experienced a parallel infodemic, i.e., an overabundance of information, some of which is misleading or even harmful, which has spread widely around the globe. Although social media are increasingly being used as an information source, web search engines, such as Google or Yahoo!, still represent a powerful and trustworthy resource for finding information on the Web. This is due to their capability to capture the largest amount of information, helping users quickly identify the most relevant and useful, although not always the most reliable, results for their search queries. This study aims to detect potentially misleading and fake content by capturing and analysing the textual information that flows through search engines. By using a real-...
Management Science Journal of the Institute For Operations Research and the Management Sciences, Feb 1, 2000
... Graduate School of Management, 110 Westwood Plaza, Suite B518, University of California at Los Angeles, Los Angeles, California 90095-1481 Computer Science Department, University of California at Los Angeles, Los Angeles, California 90095 lee.cooper@anderson.ucla ...
Proceedings of the Second International Conference on Data Warehousing and Knowledge Discovery, Sep 4, 2000
... Application of feature subset selection techniques improves the performance. Our algorithm is computationally efficient, being bounded by O(n log n) for n samples. 1 Introduction ... We now investigate whether a subset of these features can perform as well in text classification. ...
18th International Conference of the North American Fuzzy Information Processing Society - NAFIPS (Cat. No.99TH8397), 2000
In a dynamic situation, data modalities range from static (e.g., spatial data representation, object descriptions) to dynamic (e.g., moving entities, evolving objects). Static data typically reside on local database systems, while dynamic data are continuously generated from sensors or special events; we need to integrate both types of data for a realistic modeling of a dynamic situation. In this paper ...
Proceedings. Tenth International Conference on Scientific and Statistical Database Management (Cat. No.98TB100243), 1998
Machine learning induction algorithms are difficult to scale to very large databases because of their memory-bound nature. Using virtual memory results in significant performance degradation. To overcome such shortcomings, we developed a classification rule induction algorithm for relational databases. Our algorithm uses a bottom-up rule generation strategy that is more effective for mining databases with nominal variables of large cardinality. We have successfully used our algorithm to mine a retail grocery database containing more than 1.6 million records in about 5 hours on a dual Pentium processor PC.
Lecture Notes in Computer Science, 1999
... Information Sciences Laboratory HRL Laboratories 3011 Malibu Canyon Road Malibu, CA 90265, USA shek@hrl.com, giovanni@wins.hrl.com, suhas@wins.hrl.com, skdao@hrl.com Abstract. ... CS Wallace and DL Dowe: Intrinsic Classification by MML - the SNOB Program. ...
Proceedings of the fifth ACM conference on Digital libraries - DL '00, 2000
The automatic document metadata extraction process is an important task in a world where thousands of documents are just one "click" away. Thus, powerful indices are necessary to support effective retrieval. The upcoming XML standard represents an ...
In situ measurements have been the basis for monitoring volcanic gas emissions for many years and, complemented by remote sensing techniques, still play an important role to date. Concerning in situ techniques for sampling a dilute plume, an increase in accuracy and a reduction of detection limits are still necessary for most gases (e.g., CO2, SO2, HCl, HF, HBr, HI). In this work, the Raschig-Tube technique (RT) is modified and utilized for application on volcanic plumes. The theoretical and experimental absorption properties of the RT and the Drechsel bottle (DB) setups are characterized, and both are applied simultaneously with the well-established Filter packs technique (FP) in the field (on Stromboli Island and Mount Etna). The comparison points out that FPs are the most practical to apply but their results are error-prone compared to RT and DB, whereas the RT results in up to 13 times higher analyte concentrations than the DB in the same sampling time. An optimization of the analytical procedure, including sample pretreatment and analysis by titration, Ion Chromatography, and Inductively Coupled Plasma Mass Spectrometry, led to a comprehensive data set covering a wide range of compounds. In particular, less abundant species were quantified more accurately and iodine was detected for the first time in Stromboli's plume. By simultaneously applying Multiaxis Differential Optical Absorption Spectroscopy (MAX-DOAS), the chemical transformation of emitted bromide into bromine monoxide (BrO) from Stromboli and Etna was determined to be 3-6% and 7%, respectively, within less than 5 min after the gas release from the active vents.
Lecture Notes in Computer Science, 2000
Decision tree induction algorithms scale well to large datasets because of their univariate, divide-and-conquer approach. However, they may fail to discover effective knowledge when the input dataset consists of a large number of uncorrelated many-valued attributes. In this paper we present an algorithm, Noah, that tackles this problem by applying a multivariate search. Performing a multivariate search leads to a much larger consumption of computation time and memory, which may be prohibitive for large datasets. We remedy this problem by exploiting effective pruning strategies and efficient data structures. We applied our algorithm to a real marketing application of cross-selling. Experimental results revealed that the application database was too complex for C4.5, as it failed to discover any useful knowledge. The application database was also too large for various well-known rule discovery algorithms, which were not able to complete their task. The pruning techniques used in Noah are general in nature and can be used in other mining systems.
Lecture Notes in Computer Science, 2000
... Application of feature subset selection techniques improves the performance. Our algorithm is computationally efficient, being bounded by O(n log n) for n samples. 1 Introduction ... We now investigate whether a subset of these features can perform as well in text classification. ...