Causal Feature Selection with Missing Data

Published: 08 January 2022

Abstract

Causal feature selection aims to learn the Markov blanket (MB) of a class variable and use it as the selected feature set. The MB captures the local causal structure around the class variable, and all other features are probabilistically independent of the class variable conditioned on its MB. This property enables causal feature selection to identify potentially causal features for building robust and physically meaningful prediction models. Missing data, ubiquitous in real-world applications, remain an open research problem in causal feature selection because of their technical complexity. In this article, we discuss a novel multiple imputation MB (MimMB) framework for causal feature selection with missing data. MimMB integrates Data Imputation with MB Learning in a unified framework so that the two key components can inform each other: MB Learning lets Data Imputation operate in a potentially causal feature space, which improves imputation accuracy, while accurate Data Imputation in turn helps MB Learning identify a reliable MB of the class variable. We further design an enhanced kNN estimator for imputing missing values and use it to instantiate MimMB. In a comprehensive experimental evaluation on synthetic and real-world datasets, our approach effectively learns the MB of a given variable in a Bayesian network and outperforms rival algorithms.
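
To make the idea of imputing in a restricted feature space concrete, here is a minimal sketch in Python. It is not the MimMB algorithm or the enhanced kNN estimator from the article, neither of which is specified in the abstract. The names `knn_impute` and `hamming`, the use of `None` as the missing marker, the Hamming distance, and the majority vote are all assumptions made for illustration. The `features` argument stands in for a candidate MB of the column being imputed.

```python
from collections import Counter

MISSING = None  # hypothetical marker for a missing entry


def hamming(row_a, row_b, features):
    """Mismatch rate over the features observed in both rows; None if no overlap."""
    shared = [f for f in features
              if row_a[f] is not MISSING and row_b[f] is not MISSING]
    if not shared:
        return None
    return sum(row_a[f] != row_b[f] for f in shared) / len(shared)


def knn_impute(data, target, features, k=3):
    """Fill missing values in column `target` using only the columns in `features`.

    `features` plays the role of a candidate Markov blanket of `target`
    (an assumption of this sketch). Returns new rows; the input is unchanged.
    """
    donors = [row for row in data if row[target] is not MISSING]
    imputed = [list(row) for row in data]
    for i, row in enumerate(data):
        if row[target] is not MISSING:
            continue
        scored = []
        for donor in donors:
            d = hamming(row, donor, features)
            if d is not None:
                scored.append((d, donor[target]))
        if not scored:
            continue  # no comparable donor: leave the entry missing
        scored.sort(key=lambda pair: pair[0])
        # Majority vote among the k closest donors.
        votes = Counter(value for _, value in scored[:k])
        imputed[i][target] = votes.most_common(1)[0][0]
    return imputed


# Columns 0 and 1 are treated as the candidate MB of column 3; column 2 is ignored.
rows = [
    [0, 0, 1, "a"],
    [0, 0, 0, "a"],
    [1, 1, 1, "b"],
    [1, 1, 0, "b"],
    [0, 0, 1, None],
]
filled = knn_impute(rows, target=3, features=[0, 1], k=2)
print(filled[4])
```

Restricting the distance to `features` means that irrelevant columns cannot distort which neighbours are chosen. In the framework described above, that feature set would itself be re-learned from the imputed data, and the two steps would alternate.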

    Published In

    ACM Transactions on Knowledge Discovery from Data, Volume 16, Issue 4
    August 2022
    529 pages
    ISSN: 1556-4681
    EISSN: 1556-472X
    DOI: 10.1145/3505210

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 08 January 2022
    Accepted: 01 September 2021
    Revised: 01 July 2021
    Received: 01 March 2021
    Published in TKDD Volume 16, Issue 4

    Author Tags

    1. Causal feature selection
    2. Markov blanket
    3. Bayesian network
    4. missing data

    Qualifiers

    • Research-article
    • Refereed

    Funding Sources

    • National Key Research and Development Program of China
    • National Science Foundation of China
    • Open Project Foundation of Intelligent Information Processing Key Laboratory of Shanxi Province

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 291
    • Downloads (Last 6 weeks): 24
    Reflects downloads up to 16 Feb 2025

    Citations

    Cited By

    • (2025) Sparse multi-label feature selection via pseudo-label learning and dynamic graph constraints. Information Fusion 118 (102975). DOI: 10.1016/j.inffus.2025.102975. Online publication date: Jun-2025
    • (2025) Missing value replacement in strings and applications. Data Mining and Knowledge Discovery 39:2. DOI: 10.1007/s10618-024-01074-3. Online publication date: 22-Jan-2025
    • (2024) Variable Selection in Data Analysis: A Synthetic Data Toolkit. Mathematics 12:4 (570). DOI: 10.3390/math12040570. Online publication date: 14-Feb-2024
    • (2024) CDRM: Causal disentangled representation learning for missing data. Knowledge-Based Systems 299 (112079). DOI: 10.1016/j.knosys.2024.112079. Online publication date: Sep-2024
    • (2024) A fusion of centrality and correlation for feature selection. Expert Systems with Applications: An International Journal 241:C. DOI: 10.1016/j.eswa.2023.122548. Online publication date: 25-Jun-2024
    • (2024) CRViT: Vision transformer advanced by causality and inductive bias for image recognition. Applied Intelligence 55:1. DOI: 10.1007/s10489-024-05910-3. Online publication date: 2-Dec-2024
    • (2023) Identifying the Physical Origin of Gamma-Ray Bursts with Supervised Machine Learning. The Astrophysical Journal 959:1 (44). DOI: 10.3847/1538-4357/ad03ec. Online publication date: 5-Dec-2023
    • (2023) Enhanced Fuzzy Clustering for Incomplete Instance with Evidence Combination. ACM Transactions on Knowledge Discovery from Data 18:3 (1-20). DOI: 10.1145/3638061. Online publication date: 19-Dec-2023
    • (2023) Multi-Label Feature Selection Via Adaptive Label Correlation Estimation. ACM Transactions on Knowledge Discovery from Data 17:9 (1-28). DOI: 10.1145/3604560. Online publication date: 10-Aug-2023
    • (2023) Adaptive Collaborative Soft Label Learning for Unsupervised Multi-View Feature Selection. ACM Transactions on Knowledge Discovery from Data 17:8 (1-25). DOI: 10.1145/3591467. Online publication date: 28-Jun-2023
