Data Privacy and Utility Trade-Off: An Efficient K-Anonymization Algorithm with Low Information Loss

Charles R. Haruna
Maame G. Asante-Mensah
Festus S. Doe
Sandro K. Amofa

Privacy-Preserving Data Publishing (PPDP) remains a critical challenge in the era of large-scale data sharing, where data utility and individual privacy are inherently conflicting goals that must be balanced. Among existing models, k-anonymity remains widely adopted due to its simplicity and interpretability; however, traditional k-anonymization algorithms suffer from key limitations, including distribution-agnostic partitioning and inadequate handling of outliers, which lead to excessive information loss and reduced data utility.
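
As a brief illustration of the k-anonymity property discussed here, the following minimal sketch checks whether every combination of quasi-identifier values in a table occurs at least k times. The function name, the attribute names, and the toy records are hypothetical and are not taken from the paper.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if each quasi-identifier combination appears at least k times."""
    # Count how many records share each combination of quasi-identifier values.
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(size >= k for size in counts.values())


# Hypothetical, already generalized toy table (illustrative values only).
records = [
    {"age": "30-39", "zip": "021**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "021**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "021**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "021**", "diagnosis": "diabetes"},
    {"age": "40-49", "zip": "021**", "diagnosis": "flu"},
]

two_anonymous = is_k_anonymous(records, ["age", "zip"], k=2)
three_anonymous = is_k_anonymous(records, ["age", "zip"], k=3)
```

In this toy table the smallest equivalence class has two records, so the check passes for k = 2 and fails for k = 3. A single outlying record is enough to break the property, which is why outlier handling matters for information loss.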


This paper proposes RAYDEN, a novel hybrid k-anonymization algorithm that integrates distribution-aware VP-tree partitioning with Connectivity-based Outlier Factor (COF) detection to address these limitations. The algorithm employs Gower distance to support mixed-type datasets and introduces a statistically adaptive threshold for robust outlier identification. Unlike existing approaches, RAYDEN incorporates a recursive outlier recovery mechanism that re-partitions detected outliers to maximize data retention, applying suppression only as a last resort.
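
To make the two building blocks named above more concrete, the sketch below shows a Gower distance for mixed-type records and a vantage-point style split that keeps every partition at size k or larger. It is an illustrative approximation under stated assumptions, not RAYDEN itself: the vantage point is simply the first record, the split is a balanced split by distance rank, and COF detection, the adaptive threshold, and outlier recovery are omitted. All function names, attribute names, and records are hypothetical.

```python
def gower_distance(a, b, numeric_ranges):
    """Gower distance between two records stored as dicts.

    numeric_ranges maps each numeric attribute to its (max - min) range;
    every other attribute is treated as categorical.
    """
    total = 0.0
    for key in a:
        if key in numeric_ranges:
            span = numeric_ranges[key]
            # Range-normalized absolute difference for numeric attributes.
            total += abs(a[key] - b[key]) / span if span > 0 else 0.0
        else:
            # Simple mismatch indicator for categorical attributes.
            total += 0.0 if a[key] == b[key] else 1.0
    return total / len(a)


def vp_split(records, vantage, distance):
    """Split records into near and far halves by distance rank to the vantage point."""
    ranked = sorted(records, key=lambda r: distance(vantage, r))
    mid = len(ranked) // 2
    return ranked[:mid], ranked[mid:]


def partition(records, k, distance):
    """Recursively split until a further split would leave a group smaller than k."""
    if len(records) < 2 * k:
        return [records]
    vantage = records[0]  # assumption: first record serves as the vantage point
    inner, outer = vp_split(records, vantage, distance)
    return partition(inner, k, distance) + partition(outer, k, distance)


# Hypothetical toy records with two numeric attributes and one categorical attribute.
people = [
    {"age": 25, "hours": 40, "sex": "F"},
    {"age": 27, "hours": 38, "sex": "F"},
    {"age": 45, "hours": 60, "sex": "M"},
    {"age": 47, "hours": 55, "sex": "M"},
    {"age": 30, "hours": 40, "sex": "M"},
    {"age": 62, "hours": 20, "sex": "F"},
]
ranges = {"age": 37, "hours": 40}  # max - min of each numeric attribute above


def dist(a, b):
    return gower_distance(a, b, ranges)


groups = partition(people, k=2, distance=dist)
```

Because a group is split only when it holds at least 2k records, both halves always contain at least k records, so each resulting group can be generalized into one equivalence class that satisfies k-anonymity.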


Experimental evaluation on the UCI Adult dataset demonstrates that RAYDEN consistently outperforms the compared algorithms across the utility metrics used in the study. The outlier recovery mechanism achieves a mean recovery rate exceeding 90% across all k values, substantially reducing suppression-related information loss compared to Mondrian with COF. Although it incurs a higher computational cost, the algorithm achieves practical execution times and significantly improves the privacy–utility trade-off, particularly at commonly used k values. These results establish RAYDEN as a robust and effective framework for privacy-preserving data publishing in mixed-type datasets.

Data Privacy and Utility Trade-Off: An Efficient K-Anonymization Algorithm with Low Information Loss. (2026). International Journal of Latest Technology in Engineering Management & Applied Science, 15(4), 672-689. https://doi.org/10.51583/IJLTEMAS.2026.150400063
