Abstract
Microaggregation is a well-known perturbative approach to publish personal or financial records while preserving the privacy of data subjects. Microaggregation is also a mechanism to realize the k-anonymity model for Privacy Preserving Data Publishing (PPDP). Microaggregation consists of two successive phases: partitioning the underlying records into small clusters with at least k records and aggregating the clustered records by a special kind of cluster statistic as a replacement. Optimal multivariate microaggregation has been shown to be NP-hard. Several heuristic approaches have been proposed in the literature. This paper presents an iterative optimization method based on the optimal solution of the microaggregation problem (IMHM). The method builds the groups based on constrained clustering and linear programming relaxation and fine-tunes the results within an integrated iterative approach. Experimental results on both synthetic and real-world data sets show that IMHM introduces less information loss for a given privacy parameter, and can be adopted for different real world applications.









Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Notes
The terms “merge” and “noMerge” are used in the DBA paper.
Referred to as “Lloyd quantization design algorithm” in the original paper.
Referred to as “cell” in the original paper.
Number of Pre-partitioned Blocks (NPB).
We have also used a semi-random initialization, which is discussed in Sect. 5.4.
Here, the index of each centroid is its implicit number.
A centroid is called unassigned, if none of its members are assigned yet. Similarly, a remained record is a record not in the Path.
We assume, NPB=1, i.e., |B|=|T|=n.
Please note that another procedure reorders records in a path for ImprovedMHM algorithm (line 19 in Fig. 2), so the term nlogn does not make sense here.
Please see the Appendix.
For simplicity, we removed the time complexity related to the dimension, d.
In the referred paper, number of clusters (c) and records in a cluster (n i ) was chosen randomly.
References
Sweeney L (2002) K-anonymity: a model for protecting privacy. Int J Uncertain Fuzziness Knowl-Based Syst 10(5):557–570
Machanavajjhala A et al (2007) L-diversity: privacy beyond k-anonymity. ACM Trans Knowl Discov Data 1(1):3
Ninghui L, Tiancheng L, Venkatasubramanian S (2007) T-closeness: privacy beyond k-anonymity and l-diversity. In: ICDE
Hong TP et al (2012) Using TF-IDF to hide sensitive itemsets. Appl Intell 1–9
Yin Y et al (2011) Privacy-preserving data mining. In: Data mining. Springer, Berlin
Domingo-Ferrer J, Solanas A, Martinez-Balleste A (2006) Privacy in statistical databases: k-anonymity through microaggregation. In: Proceedings of IEEE granular computing, pp 774–777
Oganian A, Karr AF (2011) Masking methods that preserve positivity constraints in microdata. J Stat Plan Inference 141(1):31–41
Domingo-Ferrer J, Torra V (2001) Disclosure control methods and information loss for microdata. In: Doyle P, Lane JI, Theeuwes JJM, Zayatz L (eds) Confidentiality, disclosure, and data access: theory and practical applications for statistical agencies, pp 93–112
Xu C et al (2012) Efficient fuzzy ranking queries in uncertain databases. Appl Artif Intell 37(1):47–59
Wong RC-W et al (2006) (α,k)-anonymity: an enhanced k-anonymity model for privacy preserving data publishing. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, Philadelphia, pp 754–759
Sun X, Li M, Wang H (2011) A family of enhanced-diversity models for privacy preserving data publishing. Future Gener Comput Syst 27(3):348–356
Sun X, Sun L, Wang H (2011) Extended k-anonymity models against sensitive attribute disclosure. Comput Commun 34(4):526–535
Defays D, Nanopoulos P (1993) Panels of enterprises and confidentiality: the small aggregates method
Domingo-Ferrer J, Mateo-Sanz JM (2002) Practical data-oriented microaggregation for statistical disclosure control. IEEE Trans Knowl Data Eng 14(1):189–201
Domingo-Ferrer J, Sramka M Disclosure control by computer scientists: an overview and an application of microaggregation to mobility data anonymization
Rebollo-Monedero D, Forné J, Soriano M (2011) An algorithm for k-anonymous microaggregation and clustering inspired by the design of distortion-optimized quantizers. Data Knowl Eng
Domingo-Ferrer J, Torra V (2005) Ordinal continuous and heterogeneous k-anonymity through microaggregation. Data Min Knowl Discov 11(2):195–212
Torra V (2004) In: Domingo-Ferrer J, Torra V (eds) Microaggregation for categorical variables: a median based approach, privacy in statistical databases. Springer, Berlin, pp 162–174
Hansen SL, Mukherjee S (2003) A polynomial algorithm for optimal univariate microaggregation. IEEE Trans Knowl Data Eng 15(4):1043–1044
Solanas A, Sebé F, Domingo-Ferrer J (2008) Micro-aggregation-based heuristics for p-sensitive k-anonymity: one step beyond. In: Proceedings of the 2008 international workshop on privacy and anonymity in information society. ACM, New York
Domingo-Ferrer J, Sebé F, Solanas A (2008) An anonymity model achievable via microaggregation. In: Secure data management, pp 209–218
Jian-min H, Ting-ting C, Hui-qun Y (2008) An improved V-MDAV algorithm for l-diversity. In: 2008 international symposiums on information processing (ISIP). IEEE, New York
Skinner CJ et al (1994) Disclosure control for census microdata. J Off Stat 10(1):31–51
Nin J, Herranz J, Torra V (2008) On the disclosure risk of multivariate microaggregation. Data Knowl Eng 67(3):399–412
Yancey W, Winkler W, Creecy R (2002) Disclosure risk assessment in perturbative microdata protection. In: Inference control in statistical databases, pp 49–60
Chang C-C, Li Y-C, Huang W-H (2007) TFRP: an efficient microaggregation algorithm for statistical disclosure control. J Syst Softw 80(11):1866–1878
Domingo-Ferrer J, González-Nicolás Ú (2010) Hybrid microdata using microaggregation. Inf Sci 180(15):2834–2844
Sun X et al (2012) An approximate microaggregation approach for microdata protection. Expert Syst Appl 39(2):2211–2219
Crises G (2004) Microaggregation for privacy protection in statistical databases. Rovira i Virgili, Univ. Tarragona, Spain, Tech. Rep. CRIREP-04-005
Hundepool A et al (2006) CENEX SDC handbook on statistical disclosure control, version 1.01
Oganian A, Domingo-Ferrer J (2001) On the complexity of optimal microaggregation for statistical disclosure control. Stat J United Nations Econ Comission Eur 18(4):345–354
Defays D, Nanopoulos P (1993) Panels of enterprises and confidentiality: the small aggregates method. In: Proceedings of the 1992 symposium on design and analysis of longitudinal surveys, Ottawa, Canada
Mateo Sanz JM, Domingo i Ferrer J (1998) A comparative study of microaggregation methods. Questiió 22(3):511–526
Domingo-Ferrer J et al (2006) Efficient multivariate data-oriented microaggregation. VLDB J 15(4):355–369
Heaton B (2012) New record ordering heuristics for multivariate microaggregation. Nova Southeastern University
Laszlo M, Mukherjee S (2005) Minimum spanning tree partitioning algorithm for microaggregation. IEEE Trans Knowl Data Eng 17(7):902–911
Solanas A, Martinez-Balleste A, Domingo-Ferrer J (2006) V-MDAV: a multivariate microaggregation with variable group size. In: COMPSTAT’2006
Chang CC, Li YC, Huang WH (2007) TFRP: an efficient microaggregation algorithm for statistical disclosure control. J Syst Softw 80(11):1866–1878
Solanas A et al (2006) Multivariate microaggregation based genetic algorithms. In: 2006 3rd international IEEE conference on intelligent systems
Lin JL et al (2010) Density-based microaggregation for statistical disclosure control. Expert Syst Appl 37(4):3256–3263
Rebollo-Monedero D, Forné J, Soriano M (2011) An algorithm for k-anonymous microaggregation and clustering inspired by the design of distortion-optimized quantizers. Data Knowl Eng 70(10):892–921
Panagiotakis C, Tziritas G (2011) Successive group selection for microaggregation. IEEE Trans Knowl Data Eng. http://www.computer.org/csdl/trans/tk/preprint/ttk2011990169-abs.html
Li Y et al (2002) A privacy-enhanced microaggregation method. Foundations of Information and Knowledge Systems, 237–250
Solanas A (2008) In: Yang A, Shan Y, Bui L (eds) Success in evolutionary computation. Springer, Berlin, pp 215–237
Fayyoumi E, Oommen BJ (2010) A survey on statistical disclosure control and micro-aggregation techniques for secure statistical databases. Softw Pract Exp 40(12):1161–1188
MacQueen JB (1967) Some methods for classification and analysis of multivariate observations, California, USA
Cormen TH et al (2009) Introduction to algorithms, 3rd edn. MIT Press, Cambridge, p 850
Ghouila-Houri A (1962) Caractérisation des matrices totalement unimodulaires. C R Math Acad Sci 254:1192–1194
Korte BH, Vygen J (2006) Combinatorial optimization: theory and algorithms. Algorithms and combinatorics, vol 21. Springer, Berlin
Wong RCW et al. (2007) Minimality attack in privacy preserving data publishing. In: Proceedings of the 33rd international conference on Very large data bases. VLDB Endowment
Mosek AS (2010) The MOSEK optimization software. Online at http://www.mosek.com
Brand R, Domingo-Ferrer J, Mateo-Sanz J (2002) Reference data sets to test and compare SDC methods for protection of numerical microdata. European Project IST-2000-25069 CASC. http://neon.vb.cbs.nl/casc
UCI KDD archive. Available from: http://kdd.ics.uci.edu/
S, H. and B. SD (1999) The UCI KDD archive
Cormode G et al (2010) Minimizing minimality and maximizing utility: analyzing method-based attacks on anonymized data. Proc VLDB Endow 3(1–2):1045–1056
Acknowledgement
This research is partially supported by ITRC (Iran Telecommunication Research Center) under contract No. 12200/500. We would like to thank the editor and anonymous reviewers for their detailed comments that help improve the paper.
Author information
Authors and Affiliations
Corresponding author
Appendix: Proofs related to the ImprovedMHM function (Fig. 4)
Appendix: Proofs related to the ImprovedMHM function (Fig. 4)
Assume a group of records G with n members {X 1,X 2,…,X n } in a d dimensional space. Let \(C_{1}^{n}[l]= \sum_{j\in \{1,2,\ldots ,n\}}X_{j}[l]/n\) represents the lth dimension (1≤l≤d) of the centroid of G, where X j [l] denotes the lth dimension of X j . Based on Eq. (1), it is easy to verify that the SSE of the group can be computed by SSE(G)=SSE(G[1])+SSE(G[2])+⋯+SSE(G[d]), where G[l]={X 1[l],X 2[l],…,X n [l]}.
Lemma 1
If a new record X n+1 is added to a group G 1, the SSE of the new group G 2=G 1∪{X n+1} denoted by SSE 2 can be calculated based on the calculated SSE before the addition (SSE 1) in O(1).
Proof
Let ΔSSE=SSE 2−SSE 1=∑1≤l≤d SSE 2[l]−SSE 1[l]. Based on Eq. (1), we can expand the ΔSSE[l]:

The new centroid can be computed simply:
So Eq. (3) reduces to:

These two updates require O(1) computations. □
Equations (4) and (5) are used in the ImprovedMHM function (lines 25–26 of Fig. 4) to update the SSE and centroid of a new group.
Lemma 2
If a record X n+1 replaces a member in the group G, say X 1, the SSE of the new group can be calculated in O(1).
Proof
We have \(\mathit{SSE}_{2}[l]=\sum_{2\leq j\leq n+1}(X_{j}[l]-C_{2}^{n+1}[l])^{2}\). Let Δ[l]=X n+1[l]−X 1[l], so \(C_{2}^{n+1}[l]=C_{1}^{n}[l]+\Delta [l]/n\) and the SSE can be updated incrementally:

Equation (6) is used in line 19 of Fig. 4, and requires O(1) computations. □
Rights and permissions
About this article
Cite this article
Mortazavi, R., Jalili, S. & Gohargazi, H. Multivariate microaggregation by iterative optimization. Appl Intell 39, 529–544 (2013). https://doi.org/10.1007/s10489-013-0431-y
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10489-013-0431-y