Vision Transformer (VIT) Architecture for Robust Masked Face Recognition


Lekha Prajapati
Girish Katkar
Ajay Ramteke

The widespread adoption of facial masks during the COVID-19 pandemic significantly challenged existing facial recognition systems by occluding critical biometric features. This paper proposes a Vision Transformer (ViT) based approach to robust Masked Face Recognition (MFR). Unlike traditional Convolutional Neural Networks (CNNs), which rely on local receptive fields, the ViT architecture uses global self-attention to capture long-range dependencies, making it more resilient to the information loss caused by masks. We evaluate the approach on the MFR2 dataset under a standardized training methodology, where our model achieves a peak accuracy of 98.22%. This study demonstrates that transformer-based architectures, combined with specialized attention mechanisms and contrastive learning, offer a state-of-the-art solution for secure authentication in masked environments.
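The paper's implementation is not reproduced on this page; as an illustration of the core idea the abstract describes, the following is a minimal NumPy sketch of a ViT-style face embedder: the image is split into patch tokens, a single global self-attention layer lets every patch attend to every other (so unoccluded regions can compensate for masked ones), and the [CLS] embedding is compared by cosine similarity for verification. All weights are random and the patch size, embedding dimension, and image size are illustrative assumptions, not the authors' configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def patchify(img, patch=16):
    # (H, W, C) image -> (num_patches, patch*patch*C) flattened patch tokens
    H, W, C = img.shape
    rows, cols = H // patch, W // patch
    patches = img[:rows * patch, :cols * patch].reshape(rows, patch, cols, patch, C)
    return patches.transpose(0, 2, 1, 3, 4).reshape(rows * cols, -1)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, Wq, Wk, Wv):
    # Global self-attention: every token attends to all others, so visible
    # regions (eyes, forehead) can inform the representation of masked ones.
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = softmax(Q @ K.T / np.sqrt(K.shape[-1]))
    return scores @ V

def embed_face(img, Wp, Wq, Wk, Wv, cls):
    tokens = patchify(img) @ Wp            # linear patch embedding
    tokens = np.vstack([cls, tokens])      # prepend learnable [CLS] token
    out = self_attention(tokens, Wq, Wk, Wv)
    return out[0] / np.linalg.norm(out[0]) # L2-normalised [CLS] embedding

# Illustrative dimensions (random, untrained weights)
d_model, patch_dim = 64, 16 * 16 * 3
Wp = rng.normal(0, 0.02, (patch_dim, d_model))
Wq, Wk, Wv = (rng.normal(0, 0.02, (d_model, d_model)) for _ in range(3))
cls = rng.normal(0, 0.02, (1, d_model))

face_a = rng.random((112, 112, 3))
face_b = rng.random((112, 112, 3))
emb_a = embed_face(face_a, Wp, Wq, Wk, Wv, cls)
emb_b = embed_face(face_b, Wp, Wq, Wk, Wv, cls)
print(float(emb_a @ emb_b))  # cosine similarity used as the verification score
```

In a trained system the same cosine score would be thresholded to accept or reject an identity claim, and a contrastive objective would pull matching masked/unmasked pairs together while pushing non-matching pairs apart.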

Vision Transformer (VIT) Architecture for Robust Masked Face Recognition. (2026). International Journal of Latest Technology in Engineering Management & Applied Science, 15(3), 140-146. https://doi.org/10.51583/IJLTEMAS.2026.150300014

Downloads

References

Xu, "Based on the contrastive learning classifier for occluded face recognition," Procedia Computer Science, 2025. DOI: 10.1016/j.procs.2025.08.148

Zhu et al., "Joint holistic and masked face recognition," IEEE Transactions on Information Forensics and Security, 2023. DOI: 10.1109/TIFS.2023.3280717

Zhao et al., "Masked Face Transformer," IEEE Transactions on Information Forensics and Security, 2023. DOI: 10.1109/TIFS.2023.3322600

Hosen et al., "HiMFR: A Hybrid Masked Face Recognition Through Face Inpainting," arXiv.org, 2022. DOI: 10.48550/arXiv.2209.08930

"Robust Masked Face Recognition via Balanced Feature Matching," in Proc. 2022 IEEE International Conference on Consumer Electronics (ICCE), 2022. DOI: 10.1109/ICCE53296.2022.9730338

Anwar et al., "Masked Face Recognition for Secure Authentication," arXiv: Computer Vision and Pattern Recognition, 2020.

"A Benchmark on Masked Face Recognition," in Proc. SIBGRAPI, 2022. DOI: 10.1109/SIBGRAPI55357.2022.9991785

Iftikhar et al., "Masked Face Detection and Recognition Using a Unified Feature Extractor," in Proc. ICACS, 2024. DOI: 10.1109/ICACS60934.2024.10473243

"Ensemble Learning using Transformers and Convolutional Networks for Masked Face Recognition," arXiv.org, 2022. DOI: 10.48550/arXiv.2210.04816

Mahmoud et al., "A Comprehensive Survey of Masked Faces: Recognition, Detection, and Unmasking," Applied Sciences, vol. 14, no. 19, 2024. DOI: 10.3390/app14198781

"Towards Accurate and Lightweight Masked Face Recognition: An Experimental Evaluation," IEEE Access, 2022. DOI: 10.1109/ACCESS.2021.3135255

"A Survey on Computer Vision based Human Analysis in the COVID-19 Era," arXiv.org, 2022. DOI: 10.48550/arXiv.2211.03705
