Papers by Sebastiano Battiato
The focus of the chapter is to review techniques for the automatic generation of good-quality digital mosaics from raster images. Mosaics, in the digital realm, are illustrations composed of a set of small images called "tiles". The tiles tessellate an input image with the aim of reproducing the original visual information rendered in a mosaic-like style. This chapter will review the major approaches to digital mosaic generation, reporting a short description and a discussion of the most relevant and recent issues. Particular emphasis will be devoted to techniques able to generate artificial mosaics that emulate ancient mosaics both in terms of tile positioning and tile cutting procedures. Visual comparisons among the different approaches, together with suggestions for future work, will also be provided.
As a consequence of the social revolution we have witnessed on the Web, the news and information we enjoy daily may come from diverse sources which are not necessarily the traditional ones, such as newspapers (in their paper or online versions), television, radio, etc. Everyone on the Web is allowed to produce and share news, which can quickly go viral through the new media channels represented by social networks. This freedom in producing and sharing news comes with a counter-effect: the proliferation of fake news. Unfortunately, fake news can be very effective and may influence people and, more generally, public opinion. We propose a combined approach of natural language and image processing that takes into account the semantics encoded within both the text and the images accompanying a news item, together with contextual information that may help in classifying the news as fake or not.
Journal of Imaging, 2021
A stereopair consists of two pictures of the same subject taken from two different points of view. Since the two images contain a high amount of redundant information, new compression approaches and data formats are continuously proposed, which aim to reduce the space needed to store a stereoscopic image while preserving its quality. A standard for multi-picture image encoding is represented by the MPO (Multi-Picture Object) format. Classic stereoscopic image compression approaches compute a disparity map between the two views, which is stored with one of the two views together with a residual image. An alternative approach, named adaptive stereoscopic image compression, encodes just the two views independently with different quality factors. Then, the redundancy between the two views is exploited to enhance the low-quality image. In this paper, the problem of stereoscopic image compression is presented, with a focus on the adaptive stereoscopic compression approach, which...
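The disparity map the abstract mentions can be sketched with naive block matching. This is a minimal illustrative toy, not the paper's encoder: for each block of the left view it searches the horizontal shift in the right view that minimises the sum of absolute differences.

```python
import numpy as np

def disparity_map(left, right, block=8, max_disp=16):
    """Naive block-matching disparity between two rectified views: for each
    block of the left view, find the horizontal shift minimising the sum of
    absolute differences (SAD) against the right view."""
    h, w = left.shape
    disp = np.zeros((h // block, w // block), dtype=int)
    for by in range(h // block):
        for bx in range(w // block):
            y, x = by * block, bx * block
            patch = left[y:y + block, x:x + block].astype(int)
            best_sad, best_d = np.inf, 0
            for d in range(min(max_disp, x) + 1):
                cand = right[y:y + block, x - d:x - d + block].astype(int)
                sad = np.abs(patch - cand).sum()
                if sad < best_sad:
                    best_sad, best_d = sad, d
            disp[by, bx] = best_d
    return disp

# Synthetic pair: the right view is the left view shifted 4 px to the left,
# so interior blocks should report a disparity of 4.
rng = np.random.default_rng(0)
left = rng.integers(0, 256, (32, 32), dtype=np.uint8)
right = np.roll(left, -4, axis=1)
disp = disparity_map(left, right)
```

In a classic codec the residual image would then be the difference between one view and its disparity-compensated prediction from the other.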
Can you tell which images in the array are real, and which are computer-generated (CG)?
Scoring the aesthetic and emotional appeal of digital pictures through ad-hoc computational frameworks is now affordable. It is possible to combine low-level features and composition rules to extract semantic cues devoted to isolating the degree of emotional appeal of the involved subject. We propose to perform aesthetic quality assessment on a general set of photos, focusing on consumer photos with faces. By taking into account the local spatial relations between the involved faces, and coupling such information with simple composition rules, an effective aesthetic scoring is obtained. A further contribution of the proposed solution is the novel usage of the involved facial expressions and relative poses to derive additional insights for the overall procedure. Preliminary experiments and comparisons with recent solutions in the field confirm the effectiveness of the proposed tool.
The JPEG compression algorithm has proven to be efficient in saving storage while preserving image quality, thus becoming extremely popular. On the other hand, the overall process leaves traces in the encoded signals which are typically exploited for forensic purposes: for instance, the compression parameters of the acquisition device (or editing software) could be inferred. To this aim, this paper presents a novel technique to estimate the "previous" JPEG quantization factors of images compressed multiple times, in the aligned case, by analyzing statistical traces hidden in the Discrete Cosine Transform (DCT) histograms. Experimental results on double, triple and quadruple compressed images demonstrate the effectiveness of the proposed technique while unveiling further interesting insights.
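The statistical trace the abstract refers to can be illustrated with a toy estimator (a sketch under simplifying assumptions, not the paper's method): coefficients once quantized with step q concentrate on multiples of q, so the comb pattern of the coefficient histogram reveals the previous step.

```python
import numpy as np

def estimate_prev_q(dct_coeffs, q_max=16):
    """Pick the largest candidate step q such that (almost) all non-zero
    DCT coefficients sit on multiples of q -- the comb-like histogram
    trace that a previous quantization pass leaves behind."""
    c = np.asarray(dct_coeffs)
    nonzero = c[c != 0]
    for q in range(q_max, 1, -1):
        if (nonzero % q == 0).mean() > 0.95:
            return q
    return 1

# Synthetic coefficients that were quantized with step 6 in a previous
# compression pass: every value is a multiple of 6.
rng = np.random.default_rng(1)
coeffs = rng.integers(-20, 21, size=5000) * 6
q_prev = estimate_prev_q(coeffs)
```

Real double-compression analysis must additionally model the second quantization and rounding noise; this sketch only shows why the histogram carries the earlier step.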
The exploitation of traces in JPEG double compressed images is of utmost importance for investigations. By properly exploiting such insights, First Quantization Estimation (FQE) can be performed in order to obtain source camera model identification (CMI) and therefore reconstruct the history of a digital image. In this paper, a method able to estimate the first quantization factors of JPEG double compressed images is presented, employing a mixed statistical and Machine Learning approach. The presented solution is demonstrated to work without any a-priori assumptions about the quantization matrices. Experimental results and comparisons with the state of the art show the goodness of the proposed technique.
This study explores the feasibility of estimating the Body Condition Score (BCS) of cows from digital images by employing statistical shape analysis and regression machines. The shapes of cows' bodies are described through a number of variations from a unique average shape. Specifically, Kernel Principal Component Analysis is used to determine the components describing the many ways in which the body shapes of different cows tend to deform from the average shape. This description is used for automatic estimation of the BCS through a regression approach. The proposed method has been tested on a new benchmark dataset available through the Internet. Experimental results confirm the effectiveness of the proposed technique, which outperforms the state-of-the-art approaches proposed in the context of dairy cattle research.
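The "Kernel PCA description fed to a regression machine" pipeline can be sketched with scikit-learn. The data here is synthetic stand-in shape vectors and an invented BCS-like target, purely to show the structure of the approach:

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

# Toy stand-in for shape data: each row is a flattened shape descriptor and
# the target mimics a Body Condition Score on the usual 1-5 scale.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 20))
y = 3.0 + 0.5 * X[:, 0] + 0.2 * X[:, 1] ** 2

# Kernel PCA extracts nonlinear deformation components; a regularised
# linear regressor then maps them to the continuous score.
model = make_pipeline(KernelPCA(n_components=10, kernel="rbf"), Ridge())
model.fit(X[:150], y[:150])
pred = model.predict(X[150:])
mae = float(np.abs(pred - y[150:]).mean())
```

The paper's actual regressor and kernel choice may differ; the point is the two-stage design of nonlinear shape components followed by regression.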
2020 AEIT International Conference of Electrical and Electronic Technologies for Automotive (AEIT AUTOMOTIVE)
The continuing evolution of technologies in the automotive industry has led to the development of the so-called Advanced Driver Assistance Systems (ADAS). ADAS is the term used to describe vehicle-based intelligent safety systems designed to support the driver, with the aim of significantly improving the driver's safety and driving safety in general. In terms of development, current ADAS technologies are based on control functions concerning the vehicle's movements with respect to the objects and entities detected in the same environment (e.g., other vehicles, pedestrians, roads, etc.). However, there is an ever-growing interest in the use of internal cameras to infer additional information regarding the driver's status (e.g., weakness, level of attention). The purpose of such technologies is to provide accurate details about the environment in order to increase safety and smart driving. In the last few years, Computer Vision technology has achieved impressive results on several tasks related to the recognition and detection of customized objects/entities in images and videos. However, the hardware resources of automotive-grade devices are limited with respect to those usually required for the implementation of modern Computer Vision algorithms. In this work, we present a benchmarking evaluation of a standard Computer Vision algorithm for driver behaviour monitoring through face detection and analysis, comparing the performance obtained on a common laptop with the same experiments on an existing commercial automotive-grade device based on the Accordo5 processor by STMicroelectronics.
2020 IEEE International Conference on Image Processing (ICIP)
Pollen grain classification has a remarkable role in many fields, from medicine to biology and agronomy. Indeed, automatic pollen grain classification is an important task for all related applications and areas. This work presents the first large-scale pollen grain image dataset, including more than 13,000 objects. After an introduction to the problem of pollen grain classification and its motivations, the paper focuses on the employed data acquisition steps, which include aerobiological sampling, microscope image acquisition, object detection, segmentation and labelling. Furthermore, a baseline experimental assessment for the task of pollen classification on the built dataset, together with a discussion of the achieved results, is presented.
2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
With the spread of technology in several fields, there is an increasing demand to automate specialized tasks that usually require human involvement, in order to maximize efficiency and reduce processing time. Pollen identification and classification is a proper example in the field of Palynology, where it has been an expensive qualitative process involving the observation and discrimination of features by highly qualified experts. Although it is the most accurate and useful method, it is a time-consuming process that slows down research progress. In this paper, we present a dataset composed of more than 13,000 objects, identified by an appropriate segmentation pipeline applied to aerobiological samples. Besides, we present the results obtained from the classification of these objects using several Machine Learning techniques, discussing which approaches produced the most satisfactory results and outlining the challenges we had to face to accomplish the task.
Journal of Imaging
To properly counter the Deepfake phenomenon, the need to design new Deepfake detection algorithms arises; the misuse of this formidable A.I. technology brings serious consequences to the private life of every involved person. The state of the art proliferates with solutions using deep neural networks to detect fake multimedia content, but unfortunately these algorithms appear to be neither generalizable nor explainable. However, traces left by Generative Adversarial Network (GAN) engines during the creation of Deepfakes can be detected by analyzing ad-hoc frequencies. For this reason, in this paper we propose a new pipeline able to detect the so-called GAN Specific Frequencies (GSF), representing a unique fingerprint of the different generative architectures. By employing the Discrete Cosine Transform (DCT), anomalous frequencies were detected. The β statistics inferred from the distribution of the AC coefficients have been the key to recognizing GAN-generated data. Robustness tests were al...
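The β statistic mentioned above comes from modelling each DCT band's AC coefficients with a zero-mean Laplacian. A minimal sketch of the idea (the synthetic "natural" and "anomalous" bands below are invented stand-ins, not real image data):

```python
import numpy as np

def laplacian_beta(ac_coeffs):
    """Maximum-likelihood scale of a zero-mean Laplacian fitted to AC
    coefficients: f(x) = exp(-|x|/beta) / (2*beta), so beta_hat = mean|x|."""
    return float(np.abs(np.asarray(ac_coeffs, dtype=float)).mean())

# Stand-in distributions: a "natural" DCT band versus a band with the
# inflated spread that an anomalous (GAN-specific) frequency would show.
rng = np.random.default_rng(3)
natural = rng.laplace(0.0, 2.0, size=10_000)
anomalous = rng.laplace(0.0, 6.0, size=10_000)
beta_nat = laplacian_beta(natural)
beta_anom = laplacian_beta(anomalous)
```

A detector in this spirit would compare the per-band β values of a suspect image against those expected from camera-originated content.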
Image Analysis and Processing - ICIAP 2017
Traditional methods for early detection of melanoma rely upon a dermatologist who visually analyzes skin lesions using the so-called ABCDE (Asymmetry, Border irregularity, Color variegation, Diameter, Evolution) criteria, even though conclusive confirmation is obtained through a biopsy performed by a pathologist. The proposed method is a bio-inspired feed-forward automatic pipeline based on morphological analysis and evaluation of skin lesion dermoscopy images. Preliminary segmentation and pre-processing of the dermoscopy image by SC-Cellular Neural Networks is performed in order to obtain an ad-hoc gray-level skin lesion image, from which we compute analytic, innovative hand-crafted image features for oncological risk assessment. Finally, a pre-trained Levenberg-Marquardt Neural Network is used to perform ad-hoc clustering of such hand-crafted image features, in order to obtain an efficient nevus discrimination (benign versus melanoma) as well as a numerical array to be used for follow-up rate definition and assessment.
IEEE Access
Advances in Artificial Intelligence and Image Processing are changing the way people interact with digital images and video. Widespread mobile apps like FACEAPP make use of the most advanced Generative Adversarial Networks (GAN) to produce extreme transformations on human face photos, such as gender swap, aging, etc. The results are utterly realistic and extremely easy to exploit, even for inexperienced users. This kind of media object took the name of Deepfake and raised a new challenge in the multimedia forensics field: the Deepfake detection challenge. Indeed, discriminating a Deepfake from a real image can be a difficult task even for human eyes, and recent works have tried to apply the same technology used for generating images to discriminating them, with good preliminary results but many limitations: the employed Convolutional Neural Networks are not very robust, prove to be specific to the context, and tend to extract semantics from images. In this paper, a new approach aimed at extracting a Deepfake fingerprint from images is proposed. The method is based on the Expectation-Maximization algorithm, trained to detect and extract a fingerprint that represents the Convolutional Traces (CT) left by GANs during image generation. The CT demonstrates high discriminative power, achieving better results than the state of the art in the Deepfake detection task and also proving to be robust to different attacks. Achieving an overall classification accuracy of over 98% on Deepfakes from 10 different GAN architectures, not only involving images of faces, the CT proves to be reliable and independent of image semantics. Finally, tests carried out on Deepfakes generated by FACEAPP, achieving 93% accuracy in the fake detection task, demonstrated the effectiveness of the proposed technique in a real-case scenario.
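The EM idea behind Convolutional Trace extraction can be sketched in miniature. This toy is not the authors' algorithm: it estimates the linear weights predicting each pixel from its four neighbours, treating pixels as a mixture of "correlated" samples and outliers, which is the kind of local linear dependence that upsampling and transposed convolutions leave behind.

```python
import numpy as np

def em_neighbour_weights(img, iters=10, sigma=1.0):
    """Toy EM in the spirit of Convolutional Trace extraction: estimate the
    linear weights that predict each pixel from its 4 neighbours, treating
    pixels as a mixture of 'correlated' samples and outliers."""
    img = img.astype(float)
    centre = img[1:-1, 1:-1].ravel()
    neigh = np.stack([img[:-2, 1:-1].ravel(),   # up
                      img[2:, 1:-1].ravel(),    # down
                      img[1:-1, :-2].ravel(),   # left
                      img[1:-1, 2:].ravel()],   # right
                     axis=1)
    weights = np.full(4, 0.25)
    for _ in range(iters):
        resid = centre - neigh @ weights
        like = np.exp(-resid ** 2 / (2 * sigma ** 2))    # E-step: posterior
        post = like / (like + 1.0 / 256.0)               # of "correlated"
        wn = neigh * post[:, None]                       # M-step: weighted
        weights = np.linalg.solve(neigh.T @ wn + 1e-6 * np.eye(4),
                                  wn.T @ centre)         # least squares
    return weights

# On a perfectly 'interpolated' image (every pixel lies on a plane), the
# weights should sum to ~1, exposing the linear dependence on neighbours.
yy, xx = np.mgrid[0:32, 0:32]
w_hat = em_neighbour_weights((yy + xx).astype(float))
```

The paper's method generalises this to per-kernel local features that fingerprint the generative architecture; the sketch only shows the E-step/M-step alternation on a single linear model.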
INDEX TERMS Deepfake detection, generative adversarial networks, multimedia forensics, image forensics.
Journal of Imaging
This paper proposes a novel approach to semi-supervised domain adaptation for holistic regression tasks, where a DNN predicts a continuous value y∈R given an input image x. The current literature generally lacks specific domain adaptation approaches for this task, as most of them focus on classification. In the context of holistic regression, most real-world datasets not only exhibit a covariate (or domain) shift, but also a label gap—the target dataset may contain labels not included in the source dataset (and vice versa). We propose an approach tackling both covariate shift and label gap in a unified training framework. Specifically, a Generative Adversarial Network (GAN) is used to reduce covariate shift, and the label gap is mitigated via label normalisation. To avoid overfitting, we propose a stopping criterion that simultaneously takes advantage of the Maximum Mean Discrepancy and the GAN Global Optimality condition. To restore the original label range—that was previously...
Image Analysis and Processing - ICIAP 2017
Image Forensics has already achieved great results in the source camera identification task on images. Standard approaches cannot be applied to data coming from Social Network Platforms due to the different processes involved (e.g., scaling, compression, etc.). In this paper, a classification engine for the reconstruction of the history of an image is presented. Specifically, by combining machine learning techniques with a-priori knowledge acquired through image analysis, we propose an automatic approach that can understand which Social Network Platform has processed an image and which software application was used to perform the image upload. The engine makes use of the specific alterations introduced by each platform as features. Results, in terms of global accuracy on a dataset of 2720 images, confirm the effectiveness of the proposed strategy.
2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
The Deepfake phenomenon has become very popular nowadays thanks to the possibility of creating incredibly realistic images using deep learning tools, based mainly on ad-hoc Generative Adversarial Networks (GAN). In this work we focus on the analysis of Deepfakes of human faces, with the objective of creating a new detection method able to detect a forensic trace hidden in images: a sort of fingerprint left by the image generation process. The proposed technique, by means of an Expectation-Maximization (EM) algorithm, extracts a set of local features specifically addressed to modelling the underlying convolutional generative process. Ad-hoc validation has been carried out through experimental tests with naive classifiers on five different architectures (GDWCT, STARGAN, ATTGAN, STYLEGAN, STYLEGAN2), using the CELEBA dataset as ground truth for non-fakes. Results demonstrate the effectiveness of the technique in distinguishing the different architectures and the corresponding generation process.
2018 International Conference on Content-Based Multimedia Indexing (CBMI)
Visual Sentiment Analysis aims to estimate the polarity of the sentiment evoked by images in terms of positive or negative sentiment. To this aim, most state-of-the-art works exploit the text associated with a social post provided by the user. However, such textual data is typically noisy due to the subjectivity of the user, who usually includes text aimed at maximizing the diffusion of the social post. In this paper we extract and employ an Objective Text description of images, automatically extracted from the visual content, rather than the classic Subjective Text provided by the users. The proposed method defines a multimodal embedding space based on the contribution of both visual and textual features. The sentiment polarity is then inferred by a supervised Support Vector Machine trained on the representations of the obtained embedding space. Experiments performed on a representative dataset of 47235 labelled samples demonstrate that the exploitation of the proposed Objective Text helps to outperform the state of the art for sentiment polarity estimation.
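The "multimodal embedding plus SVM" design can be sketched as follows. The features and labels here are synthetic stand-ins (the paper builds its embedding from real visual and textual descriptors); the sketch only shows the fusion-then-classify structure:

```python
import numpy as np
from sklearn.svm import SVC

# Toy multimodal setup: "visual" and "textual" feature vectors are joined
# into one embedding and a linear SVM infers the sentiment polarity.
rng = np.random.default_rng(4)
n = 400
visual = rng.normal(size=(n, 16))
textual = rng.normal(size=(n, 8))
polarity = (visual[:, 0] + textual[:, 0] > 0).astype(int)  # synthetic labels

joint = np.hstack([visual, textual])     # the multimodal embedding space
clf = SVC(kernel="linear").fit(joint[:300], polarity[:300])
acc = float((clf.predict(joint[300:]) == polarity[300:]).mean())
```

Concatenation is the simplest fusion choice; the benefit claimed in the paper comes from replacing noisy user-provided text with objective image descriptions before this step.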
Journal of Imaging
The identification of printed materials is a critical and challenging issue for security purposes, especially when it comes to documents such as banknotes, tickets, or rare collectable cards: eligible targets for ad hoc forgery. State-of-the-art methods require expensive and specific industrial equipment, while a low-cost, fast, and reliable solution for document identification is increasingly needed in many contexts. This paper presents a method to generate a robust fingerprint by extracting translucent patterns from paper sheets and exploiting the peculiarities of binary pattern descriptors. A final descriptor is generated by employing a block-based solution followed by principal component analysis (PCA), to reduce the overall data to be processed. To validate the robustness of the proposed method, a novel dataset was created and recognition tests were performed under both ideal and noisy conditions.
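The "block-based binary pattern descriptor followed by PCA" pipeline can be sketched with a basic local binary pattern (LBP). This is an illustrative simplification on random stand-in scans, not the paper's exact descriptor:

```python
import numpy as np
from sklearn.decomposition import PCA

def block_lbp_histograms(img, block=16):
    """Local-binary-pattern histogram per block: each pixel is encoded by
    which of its 8 neighbours are at least as bright, then per-block
    code histograms are concatenated into one long descriptor."""
    img = img.astype(int)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    codes = np.zeros((h - 2, w - 2), dtype=int)
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(shifts):
        neigh = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neigh >= center).astype(int) << bit
    feats = []
    for by in range(0, h - 2 - block + 1, block):
        for bx in range(0, w - 2 - block + 1, block):
            hist, _ = np.histogram(codes[by:by + block, bx:bx + block],
                                   bins=256, range=(0, 256))
            feats.append(hist)
    return np.concatenate(feats).astype(float)

# Stand-in "scans" of 12 paper sheets; PCA compacts the long block
# descriptors into a short fingerprint per sheet.
rng = np.random.default_rng(5)
sheets = np.stack([block_lbp_histograms(rng.integers(0, 256, (66, 66)))
                   for _ in range(12)])
compact = PCA(n_components=8).fit_transform(sheets)
```

Matching would then compare these compact fingerprints, e.g. by nearest neighbour distance, against an enrolled database of genuine sheets.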