Papers by Duc Thanh Nguyen
Research Square (Research Square), May 15, 2024
arXiv (Cornell University), Aug 27, 2020
2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023
arXiv (Cornell University), Apr 1, 2018
Lecture Notes in Computer Science, 2022
arXiv (Cornell University), Aug 13, 2019
arXiv (Cornell University), Oct 19, 2016
Social Science Research Network, 2022
Multimedia Tools and Applications
Deep learning has been applied to achieve significant progress in emotion recognition from multimedia data. Despite such substantial progress, existing approaches are hindered by insufficient training data, leading to weak generalisation under mismatched conditions. To address these challenges, we propose a learning strategy which jointly transfers emotional knowledge learnt from rich datasets to source-poor datasets. Our method is also able to learn cross-domain features, leading to improved recognition performance. To demonstrate the robustness of the proposed learning strategy, we conducted extensive experiments on several benchmark datasets including eNTERFACE, SAVEE, EMODB, and RAVDESS. Experimental results show that the proposed method surpassed existing transfer learning schemes by a significant margin.
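The abstract does not spell out implementation details; the sketch below shows one plausible joint-training setup with a shared feature encoder and per-dataset classification heads, written against PyTorch. The module names, feature dimensions, and loss weighting are hypothetical illustrations, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    """Shared cross-domain feature extractor (hypothetical architecture)."""
    def __init__(self, in_dim=128, feat_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, feat_dim), nn.ReLU(),
        )
    def forward(self, x):
        return self.net(x)

class EmotionHead(nn.Module):
    """Dataset-specific classification head."""
    def __init__(self, feat_dim=64, num_classes=7):
        super().__init__()
        self.fc = nn.Linear(feat_dim, num_classes)
    def forward(self, f):
        return self.fc(f)

encoder = SharedEncoder()
head_rich = EmotionHead()   # head for the data-rich dataset
head_poor = EmotionHead()   # head for the source-poor dataset
criterion = nn.CrossEntropyLoss()
params = list(encoder.parameters()) + list(head_rich.parameters()) + list(head_poor.parameters())
optimizer = torch.optim.Adam(params, lr=1e-4)

def joint_step(x_rich, y_rich, x_poor, y_poor, alpha=0.5):
    """One joint update: both domains share the encoder, so knowledge from the
    rich dataset regularises the features used for the source-poor one."""
    loss = criterion(head_rich(encoder(x_rich)), y_rich) \
         + alpha * criterion(head_poor(encoder(x_poor)), y_poor)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```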
Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
Out-of-distribution (OOD) generalisation aims to build a model that can generalise well on an unseen target domain using knowledge from multiple source domains. To this end, the model should seek the causal dependence between inputs and labels, which may be determined by the semantics of inputs and remain invariant across domains. However, statistical or non-causal methods often cannot capture this dependence and perform poorly because they do not account for spurious correlations learnt during model training via unobserved confounders. Well-known causal inference methods such as back-door adjustment cannot be applied to remove spurious correlations, as they require the confounders to be observed. In this paper, we propose a novel method that effectively deals with hidden confounders by implementing front-door adjustment (FA). FA requires the choice of a mediator, which we regard as the semantic information of images that helps access the causal mechanism without the need for observing confounders. Further, we propose to estimate the combination of the mediator with other observed images in the front-door formula via style transfer algorithms. Our use of style transfer to estimate FA is novel and sensible for OOD generalisation, which we justify by extensive experimental results on widely used benchmark datasets.
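For context, the standard front-door adjustment identity from causal inference (general background, not specific to this paper) expresses the interventional distribution through a mediator M:

$$
P(Y \mid do(X = x)) \;=\; \sum_{m} P(M = m \mid X = x) \sum_{x'} P(Y \mid X = x', M = m)\, P(X = x')
$$

In the paper's setting the mediator carries the semantic content of the image, and the inner sum over other inputs x' is the quantity that the style transfer step approximates by combining the mediator with other observed images.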
SSRN Electronic Journal, 2022
Deep learning has been successfully applied to solve various complex problems ranging from big data analytics to computer vision and human-level control. Deep learning advances, however, have also been employed to create software that can cause threats to privacy, democracy and national security. One such deep learning-powered application to have recently emerged is deepfake. Deepfake algorithms can create fake images and videos that humans cannot distinguish from authentic ones. The proposal of technologies that can automatically detect and assess the integrity of digital visual media is therefore indispensable. This paper presents a survey of algorithms used to create deepfakes and, more importantly, methods proposed to detect deepfakes in the literature to date. We present extensive discussions on challenges, research trends and directions related to deepfake technologies. By reviewing the background of deepfakes and state-of-the-art deepfake detection methods, this study provides a comprehensive overview of deepfake techniques and facilitates the development of new and more robust methods to deal with increasingly challenging deepfakes. Impact Statement: This survey provides a timely overview of deepfake creation and detection methods and presents a broad discussion of challenges, potential trends, and future directions. We conduct the survey with a different perspective and taxonomy compared to existing survey papers on the same topic. Informative graphics are provided to guide readers through the latest developments in deepfake research. The methods surveyed are comprehensive and will be valuable to the artificial intelligence community in tackling the current challenges of deepfakes.
Proceedings of the 26th International Conference on World Wide Web Companion - WWW '17 Companion, 2017
Since 1984, the US has annually conducted the Behavioral Risk Factor Surveillance System (BRFSS) surveys to capture either health behaviors, such as drinking or smoking, or health outcomes, including mental, physical, and generic health, of the population. Although this kind of information at a population level, such as US counties, is important for local governments to identify local needs, traditional datasets may take years to collate and to become publicly available. Geocoded social media data can provide an alternative reflection of local health trends. In this work, to predict the percentage of adults in a county reporting "insufficient sleep", a health behavior, and, at the same time, their health outcomes, novel textual and temporal features are proposed. The proposed textual features are defined at mid-level and can be applied on top of various low-level textual features. They are computed via kernel functions on underlying features and encode the relationships between individual underlying features over a population. To further enrich the predictive ability of the health indices, the textual features are augmented with temporal information. We evaluated the proposed features and compared them with existing features using a dataset collected from the BRFSS. Experimental results show that the combination of kernel-based textual features and temporal information predicts well both the health behavior (with best performance at rho=0.82) and the health outcomes (with best performance at rho=0.78), demonstrating the capability of social media data in the prediction of population health indices. The results also show that our proposed features gained higher correlation coefficients.
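The kernel-based mid-level features are only described at a high level above; the snippet below is a minimal illustration of the general idea, summarising pairwise kernel values computed over a county's low-level post features into a small county-level feature vector. The RBF kernel choice, the summary statistics, and all names are assumptions for illustration, not the paper's exact construction.

```python
import numpy as np

def rbf_kernel(a, b, gamma=0.5):
    """Gaussian (RBF) kernel between two low-level feature vectors."""
    d = a - b
    return np.exp(-gamma * np.dot(d, d))

def midlevel_features(X, gamma=0.5):
    """Summarise pairwise kernel values over a county's posts.

    X: (n, d) matrix of low-level textual features (e.g. topic proportions).
    Returns simple statistics of the off-diagonal Gram-matrix entries as a
    county-level feature vector (illustrative only).
    """
    n = X.shape[0]
    K = np.array([[rbf_kernel(X[i], X[j], gamma) for j in range(n)] for i in range(n)])
    off_diag = K[~np.eye(n, dtype=bool)]
    return np.array([off_diag.mean(), off_diag.std(), off_diag.min(), off_diag.max()])

# Hypothetical usage: one feature vector per county, later regressed against BRFSS indices.
county_posts = np.random.rand(100, 20)   # 100 posts, 20-dim low-level features
features = midlevel_features(county_posts)
```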
Communications in Computer and Information Science, 2019
Measurement of population health outcomes is critical to understanding the health status of communities and thus enabling the development of appropriate health-care programmes for the communities. This task requires the prediction of population health status to be fast and accurate yet scalable to different population sizes. To satisfy these requirements, this paper proposes a method for automatic prediction of population health outcomes from social media using Set Probabilistic Distance Features (SPDF). The proposed SPDF are mid-level features built upon the similarity in posting patterns between populations. Our proposed SPDF hold several advantages. Firstly, they can be applied to various low-level features. Secondly, our SPDF suit problems with weakly labelled data, i.e., where only the labels of sets are available while the labels of the sets' elements are not explicitly provided. We thoroughly evaluate our approach in the task of predicting the health indices of US counties via a large-scale dataset collected from Twitter. We also apply our proposed SPDF to two different textual features including latent topics and linguistic styles. We conduct two case studies: across-year vs. across-county prediction. The performance of the approach is validated against the Behavioral Risk Factor Surveillance System surveys. Experimental results show that the proposed approach achieves state-of-the-art performance on linguistic style features in the prediction of all health indices and in both case studies.
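The SPDF construction itself is not spelled out in the abstract; as a loose illustration of a probabilistic distance between two sets of posts, the sketch below uses Maximum Mean Discrepancy (MMD) between post-level feature sets and turns a county's distances to a few reference populations into a feature vector. The use of MMD and the reference-county scheme are assumptions for illustration, not the paper's SPDF definition.

```python
import numpy as np

def mmd(X, Y, gamma=0.5):
    """Maximum Mean Discrepancy between two sets of post-level features.
    Used only to illustrate a probabilistic set distance; the paper's SPDF
    construction may differ."""
    def k(A, B):
        d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d)
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

# Hypothetical: describe county c by its distances to a few reference counties.
county_c = np.random.rand(200, 50)                       # 200 posts, 50-dim features
references = [np.random.rand(150, 50) for _ in range(3)]
spdf_like = np.array([mmd(county_c, r) for r in references])
```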
arXiv (Cornell University), 2020
Deep learning has been applied to achieve significant progress in emotion recognition. Despite such substantial progress, existing approaches are still hindered by insufficient training data, and the resulting models do not generalize well under mismatched conditions. To address this challenge, we propose a learning strategy which jointly transfers the knowledge learned from rich datasets to source-poor datasets. Our method is also able to learn cross-domain features which lead to improved recognition performance. To demonstrate the robustness of our proposed framework, we conducted experiments on three benchmark emotion datasets including eNTERFACE, SAVEE, and EMODB. Experimental results show that the proposed method surpassed state-of-the-art transfer learning schemes by a significant margin.
2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019
Deep learning techniques for point cloud data have demonstrated great potential in solving classical problems in 3D computer vision such as 3D object classification and segmentation. Several recent 3D object classification methods have reported state-of-the-art performance on CAD model datasets such as ModelNet40 with high accuracy (∼92%). Despite such impressive results, in this paper, we argue that object classification is still a challenging task when objects are framed in real-world settings. To prove this, we introduce ScanObjectNN, a new real-world point cloud object dataset based on scanned indoor scene data. From our comprehensive benchmark, we show that our dataset poses great challenges to existing point cloud classification techniques, as objects from real-world scans are often cluttered with background and/or are partial due to occlusions. We identify three key open problems for point cloud object classification, and propose new point cloud classification neural networks that achieve state-of-the-art performance on classifying objects with cluttered background. Our dataset and code are publicly available on our project page.
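For readers unfamiliar with point cloud classification, the sketch below shows a minimal PointNet-style classifier: a per-point MLP followed by an order-invariant max-pool over points. It is a generic baseline written against PyTorch and assumed here purely for illustration; it is not one of the networks proposed in the paper.

```python
import torch
import torch.nn as nn

class TinyPointNet(nn.Module):
    """Minimal PointNet-style classifier: per-point MLP followed by a symmetric
    max-pool over points, so the prediction is invariant to point ordering."""
    def __init__(self, num_classes=15):   # ScanObjectNN has 15 object categories
        super().__init__()
        self.point_mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 256), nn.ReLU(),
        )
        self.cls = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, num_classes))

    def forward(self, pts):               # pts: (batch, num_points, 3)
        f = self.point_mlp(pts)           # per-point features: (batch, num_points, 256)
        g = f.max(dim=1).values           # global, order-invariant feature
        return self.cls(g)

model = TinyPointNet()
logits = model(torch.randn(4, 1024, 3))  # 4 scans of 1024 points each
```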
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019
Deep learning techniques have become the go-to models for most vision-related tasks on 2D images. However, their power has not been fully realised on several tasks in 3D space, e.g., 3D scene understanding. In this work, we jointly address the problems of semantic and instance segmentation of 3D point clouds. Specifically, we develop a multi-task pointwise network that simultaneously performs two tasks: predicting the semantic classes of 3D points and embedding the points into high-dimensional vectors so that points of the same object instance are represented by similar embeddings. We then propose a multi-value conditional random field model to incorporate the semantic and instance labels and formulate the problem of semantic and instance segmentation as jointly optimising labels in the field model. The proposed method is thoroughly evaluated and compared with existing methods on different indoor scene datasets including S3DIS and SceneNN. Experimental results showed the robustness of the proposed joint semantic-instance segmentation scheme over its single components. Our method also achieved state-of-the-art performance on semantic segmentation.
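The two prediction tasks can be pictured as two heads over a shared per-point backbone; the PyTorch-style sketch below shows that structure together with a simplified "pull" term that draws embeddings of the same instance together. The backbone, dimensions, and loss are illustrative assumptions; the multi-value CRF used for joint optimisation in the paper is not shown.

```python
import torch
import torch.nn as nn

class MultiTaskPointNet(nn.Module):
    """Shared per-point feature extractor with a semantic head (class logits)
    and an instance head (per-point embeddings)."""
    def __init__(self, num_classes=13, embed_dim=5):   # e.g. 13 semantic classes in S3DIS
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(9, 64), nn.ReLU(), nn.Linear(64, 128), nn.ReLU())
        self.sem_head = nn.Linear(128, num_classes)
        self.ins_head = nn.Linear(128, embed_dim)

    def forward(self, pts):                 # pts: (batch, num_points, 9), e.g. xyz + rgb + normalised xyz
        f = self.backbone(pts)
        return self.sem_head(f), self.ins_head(f)

def pull_loss(embeddings, instance_ids):
    """Encourage points of the same instance to share similar embeddings
    (a simplified 'pull' term of a discriminative embedding loss)."""
    loss = 0.0
    ids = instance_ids.unique()
    for i in ids:
        e = embeddings[instance_ids == i]                       # (n_i, embed_dim)
        loss = loss + ((e - e.mean(dim=0)) ** 2).sum(dim=1).mean()
    return loss / len(ids)
```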
Neural Computing and Applications
Deep learning has been widely adopted in automatic emotion recognition and has led to significant progress in the field. However, due to insufficient training data, pre-trained models are limited in their generalisation ability, leading to poor performance on novel test sets. To mitigate this challenge, transfer learning performed by fine-tuning pre-trained models on novel domains has been applied. However, the fine-tuned knowledge may overwrite and/or discard important knowledge learnt in pre-trained models. In this paper, we address this issue by proposing a PathNet-based meta-transfer learning method that is able to (i) transfer emotional knowledge learnt from one visual/audio emotion domain to another domain and (ii) transfer emotional knowledge learnt from multiple audio emotion domains to one another to improve overall emotion recognition accuracy. To show the robustness of our proposed method, extensive experiments on facial expression-based emotion recognition and speech emotion recognition...
MMSP 2011 - IEEE International Workshop on Multimedia Signal Processing, 2011
This paper presents a novel and low-complexity method for real-time video-based smoke detection. As a local texture operator, the Non-Redundant Local Binary Pattern (NRLBP) is more discriminative and robust to illumination changes than the original Local Binary Pattern (LBP), and is thus employed to encode the appearance information of smoke. The Non-Redundant Local Motion Binary Pattern (NRLMBP), which is computed on the difference image of consecutive frames, is introduced to capture the motion information of smoke. Experimental results show that NRLBP outperforms the original LBP in the smoke detection task. Furthermore, the combination of NRLBP and NRLMBP, which can be considered a spatial-temporal descriptor of smoke, leads to a remarkable improvement in detection performance.
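A common way to define the non-redundant code is to keep the smaller of an LBP code and its bitwise complement, so that a local pattern and its contrast-inverted counterpart fall into the same bin; the sketch below implements that idea for 8 fixed neighbours. It is a simplified illustration (no interpolation, borders skipped), assumed to reflect the NRLBP definition rather than the paper's full detection pipeline.

```python
import numpy as np

def nrlbp_image(gray):
    """Non-Redundant LBP (NRLBP) code map: compute the ordinary 8-neighbour LBP
    code at each pixel, then keep the smaller of the code and its bitwise
    complement. Simplified sketch: fixed 3x3 neighbourhood, borders skipped."""
    h, w = gray.shape
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros((h, w), dtype=np.uint8)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            centre = gray[y, x]
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                if gray[y + dy, x + dx] >= centre:
                    code |= 1 << bit
            codes[y, x] = min(code, 255 ^ code)   # non-redundant code (<= 127)
    return codes

# A histogram of NRLBP codes over a candidate region serves as its appearance
# descriptor; applying the same operator to a frame-difference image gives the
# motion counterpart (NRLMBP) described in the abstract.
```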
2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014
This paper proposes a novel mean field-based Chamfer template matching method. In our method, each template is represented as a field model, and matching a template with an input image is formulated as the estimation of a maximum a posteriori in the field model. A variational approach is then adopted to approximate the estimation. The proposed method was applied to two different variants of Chamfer template matching and evaluated through the task of object detection. Experimental results on benchmark datasets including ETHZShapeClass and INRIAHorse have shown that the proposed method could significantly improve the accuracy of template matching without sacrificing much of its efficiency. Comparisons with other recent template matching algorithms have also shown the robustness of the proposed method.
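As background for the matching cost being improved, the snippet below sketches classical Chamfer matching: the image's edge map is converted to a distance transform, and the template's edge points are scored by their average distance to the nearest image edge. This is the standard baseline formulation, assumed here for illustration; it is not the mean field model proposed in the paper.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def chamfer_cost(dist_map, template_points, offset):
    """Classical Chamfer matching cost: average distance from each template edge
    point, placed at `offset`, to the nearest image edge (read from a precomputed
    distance transform). Lower is better."""
    oy, ox = offset
    ys = np.clip(template_points[:, 0] + oy, 0, dist_map.shape[0] - 1)
    xs = np.clip(template_points[:, 1] + ox, 0, dist_map.shape[1] - 1)
    return dist_map[ys, xs].mean()

# Hypothetical usage: a horizontal edge in the image and a horizontal line template.
edges = np.zeros((100, 100), dtype=np.uint8)
edges[40, 20:80] = 1
dist_map = distance_transform_edt(edges == 0)      # distance to nearest edge pixel
template = np.array([[0, i] for i in range(60)])   # (row, col) template edge points
best_cost, best_offset = min(
    (chamfer_cost(dist_map, template, (y, x)), (y, x))
    for y in range(0, 100, 5) for x in range(0, 40, 5)
)
```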
arXiv (Cornell University), 2022
Object reconstruction from 3D point clouds has achieved impressive progress in the computer vision and computer graphics research fields. However, reconstruction from time-varying point clouds (a.k.a. 4D point clouds) is generally overlooked. In this paper, we propose a new network architecture, namely RFNet-4D, that jointly reconstructs objects and their motion flows from 4D point clouds. The key insight is that simultaneously performing both tasks via learning spatial and temporal features from a sequence of point clouds can leverage individual tasks and lead to improved overall performance. The proposed network can be trained using both supervised and unsupervised learning. To prove this ability, we design a temporal vector field learning module using an unsupervised learning approach for flow estimation, combined with supervised learning of spatial structures for object reconstruction. Extensive experiments and analyses on a benchmark dataset validated the effectiveness and efficiency of our method. As shown in the experimental results, our method achieves state-of-the-art performance on both flow estimation and object reconstruction while performing much faster than existing methods in both training and inference.
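One way to picture the joint training described above is a two-term objective: a supervised reconstruction loss plus an unsupervised flow term that warps frame t by the predicted flow and compares it with frame t+1. The PyTorch sketch below is an illustrative assumption of such an objective; the exact losses and weighting used by RFNet-4D may differ.

```python
import torch
import torch.nn.functional as F

def chamfer(a, b):
    """Symmetric Chamfer distance between point sets a: (N, 3) and b: (M, 3)."""
    d = torch.cdist(a, b)                                    # (N, M) pairwise distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

def joint_loss(occ_logits, occ_targets, points_t, flow_t, points_t1, beta=1.0):
    """Illustrative joint objective: supervised occupancy reconstruction plus an
    unsupervised flow term that warps frame t by the predicted flow and asks the
    result to match frame t+1. Not RFNet-4D's exact losses or weighting."""
    rec = F.binary_cross_entropy_with_logits(occ_logits, occ_targets)
    flow = chamfer(points_t + flow_t, points_t1)
    return rec + beta * flow
```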