research-article

Causal Feature Selection with Missing Data

Authors:

Wei Ding

ACM Transactions on Knowledge Discovery from Data (TKDD), Volume 16, Issue 4

Article No.: 66, Pages 1 - 24

https://doi.org/10.1145/3488055

Published: 08 January 2022

Abstract

Causal feature selection aims to learn the Markov blanket (MB) of a class variable and use it as the selected feature set. The MB captures the local causal structure around the class variable, and all other features are probabilistically independent of the class variable conditioned on its MB. This property lets causal feature selection identify potentially causal features for building robust and physically meaningful prediction models. Missing data, ubiquitous in real-world applications, remain an open research problem in causal feature selection because of their technical complexity. In this article, we discuss a novel multiple imputation MB (MimMB) framework for causal feature selection with missing data. MimMB integrates Data Imputation with MB Learning in a unified framework so that the two key components can inform each other: MB Learning lets Data Imputation operate in a potentially causal feature space, which improves imputation accuracy, while accurate Data Imputation in turn helps MB Learning identify a reliable MB of the class variable. We further design an enhanced kNN estimator for imputing missing values and use it to instantiate MimMB. In a comprehensive experimental evaluation on synthetic and real-world datasets, our approach effectively learns the MB of a given variable in a Bayesian network and outperforms rival algorithms.
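The interplay between imputation and feature selection described in the abstract can be illustrated with a minimal sketch. This is not the MimMB algorithm or the enhanced kNN estimator from the article: the function names (`knn_impute`, `select_features`, `alternate`) are hypothetical, plain mean-of-neighbours kNN is used, and a simple correlation filter stands in for a real MB learner. The sketch only shows the alternating loop, in which neighbours for imputation are found using the currently selected features and the selection is then recomputed on the imputed data.

```python
import numpy as np

def knn_impute(X, cols, k=3):
    """Fill NaNs in X with the mean of the k nearest donor rows.

    Distances are computed only over the columns in `cols`.
    Assumes every column has at least one observed value.
    """
    X_filled = X.copy()
    # Mean-fill a working copy so distances are defined for every row.
    base = np.where(np.isnan(X), np.nanmean(X, axis=0), X)
    for i in np.where(np.isnan(X).any(axis=1))[0]:
        d = np.sqrt(((base[:, cols] - base[i, cols]) ** 2).sum(axis=1))
        for j in np.where(np.isnan(X[i]))[0]:
            donors = np.where(~np.isnan(X[:, j]))[0]  # rows observing column j
            nearest = donors[np.argsort(d[donors])[:k]]
            X_filled[i, j] = X[nearest, j].mean()
    return X_filled

def select_features(X, y, threshold=0.3):
    """Stand-in for MB learning (an assumption, NOT an MB algorithm):
    keep columns whose absolute Pearson correlation with y exceeds threshold."""
    r = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
    return np.where(r > threshold)[0]

def alternate(X, y, rounds=3, k=3):
    """Alternate between imputing and re-selecting features."""
    cols = np.arange(X.shape[1])      # start from the full feature space
    X_hat = knn_impute(X, cols, k)
    for _ in range(rounds):
        new_cols = select_features(X_hat, y)
        if len(new_cols) == 0 or np.array_equal(new_cols, cols):
            break                     # selection is empty or has stabilised
        cols = new_cols
        X_hat = knn_impute(X, cols, k)  # re-impute in the selected space
    return X_hat, cols
```

Calling `alternate(X, y)` on a numeric array `X` containing `np.nan` entries returns the imputed array together with the indices of the finally selected columns. Observed entries are never modified; only missing cells are filled.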



Index Terms

Causal Feature Selection with Missing Data
1. Computing methodologies
  1. Machine learning
    1. Machine learning algorithms
      1. Feature selection


Published In


ACM Transactions on Knowledge Discovery from Data Volume 16, Issue 4

August 2022

529 pages

ISSN:1556-4681

EISSN:1556-472X

DOI:10.1145/3505210

Editor:
Charu Aggarwal
IBM T. J. Watson Research, USA


Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 January 2022

Accepted: 01 September 2021

Revised: 01 July 2021

Received: 01 March 2021


Qualifiers

Research-article
Refereed

Funding Sources

National Key Research and Development Program of China
National Science Foundation of China
Open Project Foundation of Intelligent Information Processing Key Laboratory of Shanxi Province
