
Giovanni Giuffrida


Computer scientist with a twist for the social sciences and how they are impacted by computers, data, and algorithms.


Sebastiano Battiato

Università di Catania

Vladimir Trajkovik

Ss. Cyril and Methodius University (UKIM) (Univerzitet "Sv. Kiril i Metodij" - Skopje)

La Trobe University

Università di Catania

Anatole Gershman

Carnegie Mellon University

Chaabane Djeraba

Imperial College London

Mohammed Maniruzzaman

Bradley University


Papers by Giovanni Giuffrida

Data mining learning bootstrap through semantic thumbnail analysis

Proceedings of SPIE - The International Society for Optical Engineering, 2007

The rapid increase of technological innovations in the mobile phone industry induces the research community to develop new and advanced systems to optimize the services offered by mobile phone operators (telcos), maximize their effectiveness, and improve their business. Data mining algorithms can run over data produced by mobile phone usage (e.g., image, video, text, and log files) to discover users' preferences and predict the offer most likely to be purchased by each individual customer. One of the main challenges is the reduction of the learning time and cost of these automatic tasks. In this paper we discuss an experiment where a commercial offer is composed of a small picture augmented with a short text describing the offer itself. Each customer's purchase is properly logged with all relevant information. Upon arrival of new items we need to learn who the best customers (prospects) for each item are, that is, the ones most likely to be interested in purchasing that specific item. Such a learning activity is time consuming and, in our specific case, is not applicable given the large number of new items arriving every day. Basically, given the current customer base, we are not able to learn on all new items. Thus, we need to select among the new items to identify the best candidates. We do so by using a joint analysis of visual features and text to estimate how good each new item could be, that is, whether or not it is worth learning on. Preliminary results show the effectiveness of the proposed approach in improving classical data mining techniques.

Exploiting visual and text features for direct marketing learning in time and space constrained domains

Pattern Analysis and Applications, 2009

Traditionally, direct marketing companies have relied on pre-testing to select the best offers to send to their audiences. Companies systematically dispatch the offers under consideration to a limited sample of potential buyers, rank them with respect to their performance and, based on this ranking, decide which offers to send to the wider population. Though this pre-testing process is simple and widely used, recently the direct marketing industry has been under increased pressure to further optimize learning, in particular when facing severe time and space constraints. Taking into account the multimedia nature of offers, which typically comprise both a visual and a text component, we propose a two-phase learning strategy based on a cascade of regression methods. The proposed approach takes advantage of visual and text features to improve and accelerate the learning process. Experiments in the domain of a commercial multimedia messaging service show the effectiveness of the proposed methods, which improve on classical learning techniques. The main contribution of the present work is to demonstrate that direct marketing firms can exploit the information on visual content to optimize the learning phase. The proposed methods can be used in any multimedia direct marketing domain in which offers are composed of image and text. Keywords: Visual and text features · Learning in time and space constrained domains · Multimedia messaging services · Direct marketing

Using visual and text features for direct marketing on multimedia messaging services domain

Multimedia Tools and Applications, 2008

Traditionally, direct marketing companies have relied on pre-testing to select the best offers to send to their audience. Companies systematically dispatch the offers under consideration to a limited sample of potential buyers, rank them with respect to their performance and, based on this ranking, decide which offers to send to the wider population. Though this pre-testing process is simple and widely used, recently the industry has been under increased pressure to further optimize learning, in particular when facing severe time and learning space constraints. The main contribution of the present work is to demonstrate that direct marketing firms can ... Service (MMS) show the effectiveness of the proposed methods and a significant improvement over traditional learning techniques. The proposed approach can be used in any multimedia direct marketing domain in which offers comprise both a visual and a text component. Keywords: Visual and text features · Learning in time and space constrained domains · Multimedia messaging services · Direct marketing

Combining Visual and Text Features for Learning in Multimedia Direct Marketing Domain

Metadata Mining for Image Understanding, 2008

Direct marketing companies systematically dispatch the offers under consideration to a limited sample of potential buyers, rank them with respect to their performance and, based on this ranking, decide which offers to send to the wider population. Though this pre-testing process is simple and widely used, recently the direct marketing industry has been under increased pressure to further optimize learning, in particular when facing severe time and space constraints. Taking into account the multimedia nature of offers, which typically comprise both a visual and a text component, we propose a two-phase learning strategy based on a cascade of regression methods. The proposed approach takes advantage of visual and text features to improve and accelerate the learning process. Experiments in the domain of a commercial Multimedia Messaging Service (MMS) show the effectiveness of the proposed methods, which improve on classical learning techniques.

Detection of Fake News on COVID-19 on Web Search Engines

Frontiers in Physics, 2021

In early January 2020, after China reported the first cases of the new coronavirus (SARS-CoV-2) in the city of Wuhan, unreliable and not fully accurate information started spreading faster than the virus itself. Alongside this pandemic, people have experienced a parallel infodemic, i.e., an overabundance of information, some of which is misleading or even harmful, which has spread widely around the globe. Although social media are increasingly being used as an information source, web search engines, such as Google or Yahoo!, still represent a powerful and trustworthy resource for finding information on the Web. This is due to their capability to capture the largest amount of information, helping users quickly identify the most relevant and useful, although not always the most reliable, results for their search queries. This study aims to detect potentially misleading and fake content by capturing and analysing textual information that flows through search engines. By using a real-...

System to forecast performance of online news articles to suggest the optimal homepage layout to maximize article readership and readers stickiness

Method and System for the Sale of Products Through the Internet by Displaying Advertising Banners

Turning Datamining into a Management Science Tool: New Algorithms and Empirical Results

Management Science Journal of the Institute For Operations Research and the Management Sciences, Feb 1, 2000

... Graduate School of Management, 110 Westwood Plaza, Suite B518, University of California at Lo...

A Fast Algorithm for Hierarchical Text Classification

Proceedings of the Second International Conference on Data Warehousing and Knowledge Discovery, Sep 4, 2000

... Application of feature subset selection techniques improves the performance. Our algorithm is...

Knowledge based dynamic modeling system for sensor fusion applications

18th International Conference of the North American Fuzzy Information Processing Society - NAFIPS (Cat. No.99TH8397), 2000

In a dynamic situation, data modalities range from static (e.g., spatial data representation, object descriptions) to dynamic (e.g., moving entities, evolving objects). Static data typically reside on local database systems while dynamic data are continuously generated from sensors or special events; we need to integrate both types of data for a realistic modeling of a dynamic situation. In this paper...

Method and apparatus for automatically extracting metadata from electronic documents using spatial rules

A Scalable Bottom-Up Data Mining Algorithm for Relational Databases

by Giovanni Giuffrida and Wesley Chu

Machine learning induction algorithms are difficult to scale to very large databases because of their memory-bound nature. Using virtual memory results in significant performance degradation. To overcome such shortcomings, we developed a classification rule induction algorithm for relational databases. Our algorithm uses a bottom-up rule generation strategy that is more effective for mining databases with nominal variables of large cardinality. We have successfully used our algorithm to mine a retail grocery database containing more than 1.6 million records in about 5 hours on a dual Pentium processor PC.

A scalable bottom-up data mining algorithm for relational databases

Proceedings. Tenth International Conference on Scientific and Statistical Database Management (Cat. No.98TB100243), 1998

Machine learning induction algorithms are difficult to scale to very large databases because of their memory-bound nature. Using virtual memory results in significant performance degradation. To overcome such shortcomings, we developed a classification rule induction algorithm for relational databases. Our algorithm uses a bottom-up rule generation strategy that is more effective for mining databases with nominal variables of large cardinality. We have successfully used our algorithm to mine a retail grocery database containing more than 1.6 million records in about 5 hours on a dual Pentium processor PC.

A Recommendation Algorithm for Personalized Online News based on Collective Intelligence and Content

Dynamic Spatial Clustering for Intelligent Mobile Information Sharing and Dissemination

Lecture Notes in Computer Science, 1999

... Information Sciences Laboratory HRL Laboratories 3011 Malibu Canyon Road Malibu, CA 90265, US...

Knowledge-based metadata extraction from PostScript files

Proceedings of the fifth ACM conference on Digital libraries - DL '00, 2000

Abstract: The automatic document metadata extraction process is an important task in a world where ...

Epl: event pattern language

Active alkaline traps to determine acidic‐gas ratios in volcanic plumes: Sampling techniques and analytical methods

In situ measurements have been the basis for monitoring volcanic gas emissions for many years and, being complemented by remote sensing techniques, still play an important role to date. Concerning in situ techniques for sampling a dilute plume, an increase in accuracy and a reduction of detection limits are still necessary for most gases (e.g., CO2, SO2, HCl, HF, HBr, HI). In this work, the Raschig-Tube technique (RT) is modified and utilized for application on volcanic plumes. The theoretical and experimental absorption properties of the RT and the Drechsel bottle (DB) setups are characterized, and both are applied simultaneously to the well-established Filter packs technique (FP) in the field (on Stromboli Island and Mount Etna). The comparison points out that FPs are the most practical to apply but the results are error-prone compared to RT and DB, whereas the RT results in up to 13 times higher analyte concentrations than the DB in the same sampling time. An optimization of the analytical procedure, including sample pretreatment and analysis by titration, Ion Chromatography, and Inductively Coupled Plasma Mass Spectrometry, led to a comprehensive data set covering a wide range of compounds. In particular, less abundant species were quantified more accurately and iodine was detected for the first time in Stromboli's plume. By simultaneously applying Multiaxis Differential Optical Absorption Spectroscopy (MAX-DOAS), the chemical transformation of emitted bromide into bromine monoxide (BrO) from Stromboli and Etna was determined to be 3-6% and 7%, respectively, within less than 5 min after the gas release from the active vents.

Mining Classification Rules from Datasets with Large Number of Many-Valued Attributes

Lecture Notes in Computer Science, 2000

Decision tree induction algorithms scale well to large datasets because of their univariate, divide-and-conquer approach. However, they may fail to discover effective knowledge when the input dataset consists of a large number of uncorrelated many-valued attributes. In this paper we present an algorithm, Noah, that tackles this problem by applying a multivariate search. Performing a multivariate search leads to a much larger consumption of computation time and memory, which may be prohibitive for large datasets. We remedy this problem by exploiting effective pruning strategies and efficient data structures. We applied our algorithm to a real marketing application of cross-selling. Experimental results revealed that the application database was too complex for C4.5, as it failed to discover any useful knowledge. The application database was also too large for various well-known rule discovery algorithms, which were not able to complete their task. The pruning techniques used in Noah are general in nature and can be used in other mining systems.

A Fast Algorithm for Hierarchical Text Classification

Lecture Notes in Computer Science, 2000

... Application of feature subset selection techniques improves the performance. Our algorithm is...
