Data Privacy and Utility Trade-Off: An Efficient K-Anonymization Algorithm with Low Information Loss
Privacy-Preserving Data Publishing (PPDP) remains a critical challenge in the era of large-scale data sharing, where data utility and individual privacy are inherently conflicting goals that must be balanced. Among existing models, k-anonymity continues to be widely adopted due to its simplicity and interpretability. However, traditional k-anonymization algorithms suffer from two key limitations, distribution-agnostic partitioning and inadequate handling of outliers, both of which lead to excessive information loss and reduced data utility.
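As a point of reference, the k-anonymity property itself can be checked in a few lines: a table is k-anonymous when every combination of quasi-identifier values is shared by at least k records. The sketch below is illustrative only; the function name, the record layout, and the sample attributes (`age`, `zip`, `disease`) are assumptions and do not come from the paper.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs in >= k records.

    `records` is a list of dicts; `quasi_identifiers` lists the keys that an
    attacker could link to external data (hypothetical names in this sketch).
    """
    # Count the size of each equivalence class (records sharing the same
    # quasi-identifier values).
    class_sizes = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return all(size >= k for size in class_sizes.values())


# Hypothetical, already-generalized sample data.
records = [
    {"age": "30-39", "zip": "130**", "disease": "flu"},
    {"age": "30-39", "zip": "130**", "disease": "asthma"},
    {"age": "40-49", "zip": "148**", "disease": "flu"},
]

print(is_k_anonymous(records, ["age", "zip"], 2))
```

In this sample the third record forms an equivalence class of size one, so the table fails the check for k = 2; generalizing or suppressing that record is what introduces the information loss discussed above.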
This paper proposes RAYDEN, a novel hybrid k-anonymization algorithm that integrates distribution-aware vantage-point tree (VP-tree) partitioning with Connectivity-based Outlier Factor (COF) detection to address these limitations. The algorithm employs Gower distance to support mixed-type datasets and introduces a statistically adaptive threshold for robust outlier identification. Unlike existing approaches, RAYDEN incorporates a recursive outlier recovery mechanism that re-partitions detected outliers, maximizing data retention before applying suppression as a last resort.
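To make two of these building blocks concrete, the following is a minimal sketch of Gower distance on mixed-type records and of a vantage-point split that keeps every partition at k records or more. It is not the RAYDEN implementation: the abstract does not specify the vantage-point selection, the split rule, the COF step, the adaptive threshold, or the recovery mechanism, so the random vantage point, the median split, and all function and parameter names here are assumptions, and outlier detection and recovery are omitted entirely.

```python
import random
import statistics


def gower_distance(a, b, numeric_ranges):
    """Equal-weight Gower distance between two records (dicts).

    `numeric_ranges` maps each numeric attribute to its range (max - min);
    every other attribute is treated as categorical (0 if equal, else 1).
    """
    total = 0.0
    for attr in a:
        if attr in numeric_ranges:
            span = numeric_ranges[attr]
            # Range-normalized absolute difference; a zero range contributes 0.
            total += abs(a[attr] - b[attr]) / span if span else 0.0
        else:
            total += 0.0 if a[attr] == b[attr] else 1.0
    return total / len(a)


def vp_partition(records, k, numeric_ranges, rng=None):
    """Recursively split records around a vantage point (assumed split rule).

    Records at or below the median distance from the vantage point go to the
    inner side, the rest to the outer side. A split is rejected when either
    side would hold fewer than k records.
    """
    rng = rng or random.Random(0)
    if len(records) < 2 * k:
        return [records]
    vantage = rng.choice(records)
    dists = [gower_distance(vantage, r, numeric_ranges) for r in records]
    median = statistics.median(dists)
    inner = [r for r, d in zip(records, dists) if d <= median]
    outer = [r for r, d in zip(records, dists) if d > median]
    if len(inner) < k or len(outer) < k:
        return [records]
    return (vp_partition(inner, k, numeric_ranges, rng)
            + vp_partition(outer, k, numeric_ranges, rng))


# Hypothetical mixed-type data: one numeric and one categorical attribute.
data = [{"age": 20 + i, "sex": "M" if i % 2 else "F"} for i in range(20)]
ranges = {"age": 19}
partitions = vp_partition(data, k=3, numeric_ranges=ranges)
print([len(p) for p in partitions])
```

Because a split is only accepted when both sides keep at least k records, every returned partition can be generalized into an equivalence class that satisfies k-anonymity, provided the input itself has at least k records.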
Experimental evaluation on the UCI Adult dataset shows that RAYDEN consistently outperforms the compared algorithms on the key utility metrics used in the study. The outlier recovery mechanism achieves a mean recovery rate exceeding 90% across all k values, substantially reducing suppression-related information loss compared to Mondrian with COF. Although RAYDEN incurs a higher computational cost, its execution times remain practical, and it significantly improves the privacy–utility trade-off, particularly at commonly used k values. These results establish RAYDEN as a robust and effective framework for privacy-preserving data publishing in mixed-type datasets.

This work is licensed under a Creative Commons Attribution 4.0 International License.
All articles published in our journal are licensed under CC-BY 4.0, which permits authors to retain copyright of their work. This license allows for unrestricted use, sharing, and reproduction of the articles, provided that proper credit is given to the original authors and the source.