Abstract
Although k-anonymity is a good way of publishing microdata for research purposes, it cannot resist several common attacks, such as attribute disclosure and the similarity attack. To resist these attacks, many refinements of k-anonymity have been proposed, with t-closeness being one of the strictest privacy models. While most existing t-closeness models address the case in which the original data have only a single sensitive attribute, data with multiple sensitive attributes are more common in practice. In this paper, we fill this gap by proposing two algorithms that handle multiple sensitive attributes and make the published data satisfy t-closeness. Both algorithms build on the observation that, for the published data to satisfy t-closeness, the values of the sensitive attributes in any equivalence class must be spread as widely as possible over the entire data. The two algorithms use different methods to partition records into groups in terms of sensitive attributes: one uses a clustering method, while the other leverages principal component analysis. Then, according to the similarity of quasi-identifier attributes, records are selected from different groups to construct an equivalence class, which reduces the loss of information as much as possible during anonymization. Our proposed algorithms are evaluated using a real dataset. The results show that the first proposed algorithm is slower on average than the second but preserves more of the original information. In addition, compared with related approaches, both proposed algorithms achieve stronger protection of privacy and incur less information loss.
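The t-closeness requirement described above can be illustrated with a minimal sketch. It is not the paper's algorithm: it only checks whether a given partition into equivalence classes keeps every sensitive attribute's distribution within distance t of its distribution over the whole table. The sketch assumes categorical attributes, assumes each sensitive attribute is checked separately, and uses the variational distance (which equals the earth mover's distance when all categories are equally far apart). The function names and the toy table are hypothetical.

```python
from collections import Counter


def distribution(values):
    """Return the relative frequency of each value in a list."""
    counts = Counter(values)
    total = len(values)
    return {v: c / total for v, c in counts.items()}


def variational_distance(p, q):
    """Half the L1 distance between two categorical distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def satisfies_t_closeness(table, classes, sensitive_attrs, t):
    """Check every equivalence class against every sensitive attribute.

    Assumption: each attribute is checked on its own; the paper may
    combine multiple sensitive attributes differently.
    """
    for attr in sensitive_attrs:
        overall = distribution([row[attr] for row in table])
        for eq_class in classes:
            local = distribution([table[i][attr] for i in eq_class])
            if variational_distance(local, overall) > t:
                return False
    return True


# Hypothetical toy table with two sensitive attributes.
table = [
    {"disease": "flu", "salary": "low"},
    {"disease": "flu", "salary": "high"},
    {"disease": "cancer", "salary": "low"},
    {"disease": "cancer", "salary": "high"},
]

# Classes given as lists of row indices.
spread = [[0, 3], [1, 2]]   # sensitive values spread across classes
clumped = [[0, 1], [2, 3]]  # each class holds a single disease value

print(satisfies_t_closeness(table, spread, ["disease", "salary"], 0.1))
print(satisfies_t_closeness(table, clumped, ["disease", "salary"], 0.1))
```

In the toy table, the partition that mixes sensitive values across classes passes the check, while the partition that concentrates one disease per class fails it, which is the intuition behind grouping records by sensitive attributes before building equivalence classes.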
Cite this article
Wang, R., Zhu, Y., Chen, TS. et al. Privacy-Preserving Algorithms for Multiple Sensitive Attributes Satisfying t-Closeness. J. Comput. Sci. Technol. 33, 1231–1242 (2018). https://doi.org/10.1007/s11390-018-1884-6
DOI: https://doi.org/10.1007/s11390-018-1884-6