AI-Based Deepfake Voice Detection Using MFCC Features and Random Forest Classification
The rapid proliferation of AI-generated audio poses a serious threat to digital forensics, voice-based authentication, and information integrity. This paper presents a deepfake voice detection system that combines Mel-Frequency Cepstral Coefficient (MFCC) feature extraction with a Random Forest ensemble classifier to distinguish real human speech from synthetically generated audio. The system accepts input audio in WAV or MP3 format, extracts 40 MFCCs as the feature representation, and classifies each sample as real or fake using a trained Random Forest model. The complete pipeline is deployed as a Flask-based web application, so it can be used from a browser without any specialist software.
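The MFCC feature-extraction step can be sketched from scratch in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the 25 ms frame length, 10 ms hop, 0.97 pre-emphasis coefficient, Hamming window, 40-band HTK-style mel filterbank, and the mean pooling of frame-level coefficients into one 40-dimensional clip vector are all assumptions, since the abstract states only that 40 MFCCs are extracted. The function names `mfcc`, `mel_filterbank`, and `clip_features` are hypothetical, and decoding WAV or MP3 files into a sample array is left out.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale (assumed; other mel formulas exist)
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale from 0 Hz to sr/2
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):  # rising edge of the triangle
            fb[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge of the triangle
            fb[i - 1, k] = (right - k) / (right - center)
    return fb


def mfcc(signal, sr, n_mfcc=40, n_mels=40, frame_sec=0.025, hop_sec=0.010):
    """Frame-level MFCCs, shape (n_frames, n_mfcc). All defaults are assumptions."""
    signal = np.asarray(signal, dtype=float)
    # Pre-emphasis (assumed coefficient 0.97) boosts high frequencies
    x = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    flen = int(round(frame_sec * sr))
    hop = int(round(hop_sec * sr))
    n_fft = 1 << (flen - 1).bit_length()  # next power of two >= frame length
    # Zero-pad the tail so the last frame is complete
    n_frames = 1 + max(0, int(np.ceil((len(x) - flen) / hop)))
    x = np.append(x, np.zeros(max(0, (n_frames - 1) * hop + flen - len(x))))
    idx = np.arange(flen)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[idx] * np.hamming(flen)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    mel_energy = power @ mel_filterbank(n_mels, n_fft, sr).T
    log_mel = np.log(np.maximum(mel_energy, 1e-10))  # floor avoids log(0)
    # Orthonormal DCT-II across mel bands; keep the first n_mfcc coefficients
    k = np.arange(n_mfcc)[:, None]
    n = np.arange(n_mels)[None, :]
    dct = np.sqrt(2.0 / n_mels) * np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    dct[0] /= np.sqrt(2.0)
    return log_mel @ dct.T


def clip_features(signal, sr, n_mfcc=40):
    # Hypothetical pooling: average frame-level MFCCs into one fixed-length vector
    return mfcc(signal, sr, n_mfcc=n_mfcc).mean(axis=0)


sr = 16000
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 440.0 * t)  # 1 s synthetic test tone
print(mfcc(tone, sr).shape)  # (99, 40)
print(clip_features(tone, sr).shape)  # (40,)
```

In a full pipeline, the pooled vectors for all training clips would be stacked into a feature matrix and passed to a Random Forest classifier, for example scikit-learn's `RandomForestClassifier`. Established libraries such as librosa (`librosa.feature.mfcc`) use their own default framing and filterbank settings, so their coefficients will differ numerically from this sketch.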
Experimental evaluation was conducted on a balanced binary dataset of 600 samples (300 real voice recordings and 300 AI-generated voice samples), split 80/20 into 480 training and 120 test samples. The proposed model was compared against two baseline classifiers, a Support Vector Machine (SVM) and a Decision Tree, using the same MFCC features. The Random Forest model achieves an accuracy of 92.7%, precision of 91.9%, recall of 93.5%, and an F1-score of 92.7%, indicating strong detection performance on this dataset. Its accuracy exceeds that of the SVM baseline (76.4%) by 16.3 percentage points and that of the Decision Tree baseline (81.2%) by 11.5 percentage points.
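The evaluation arithmetic can be illustrated with a small pure-Python sketch that derives accuracy, precision, recall, and F1-score from a confusion matrix and computes the size of an 80/20 hold-out split. It assumes that "fake" is the positive class (the abstract does not say which class precision and recall refer to), and the names `Confusion`, `confusion`, `metrics`, `harmonic_f1`, and `split_sizes` are hypothetical helpers, not part of the described system. The last line checks that the reported precision (91.9%) and recall (93.5%) are consistent with the reported F1-score.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Confusion:
    tp: int  # fake clips predicted fake (assumes "fake" is the positive class)
    fp: int  # real clips predicted fake
    fn: int  # fake clips predicted real
    tn: int  # real clips predicted real


def confusion(y_true: Sequence[str], y_pred: Sequence[str], positive: str = "fake") -> Confusion:
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return Confusion(tp, fp, fn, len(pairs) - tp - fp - fn)


def harmonic_f1(precision: float, recall: float) -> float:
    # F1 is the harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def metrics(c: Confusion) -> Tuple[float, float, float, float]:
    # Returns (accuracy, precision, recall, F1)
    accuracy = (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)
    precision = c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0
    recall = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0
    return accuracy, precision, recall, harmonic_f1(precision, recall)


def split_sizes(n_total: int, test_fraction: float = 0.2) -> Tuple[int, int]:
    # (training, test) sample counts for a single hold-out split
    n_test = round(n_total * test_fraction)
    return n_total - n_test, n_test


print(split_sizes(600))  # (480, 120)
print(round(100 * harmonic_f1(0.919, 0.935), 1))  # 92.7
```

A stratified split (for example, scikit-learn's `train_test_split` with `stratify=y`) would additionally preserve the 50/50 class balance in both partitions; the abstract does not state whether the split was stratified.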

This work is licensed under a Creative Commons Attribution 4.0 International License.
All articles published in our journal are licensed under CC BY 4.0, and authors retain copyright of their work. This license allows unrestricted use, sharing, and reproduction of the articles, provided that proper credit is given to the original authors and the source.