Page 1395
www.rsisinternational.org
INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,
MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)
ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XV, Issue III, March 2026
An Efficient Machine Learning Based Model To Predict Heart Disease
Ravindra Chauhan, Anshika Yadav, Sneha Aggarwal, Gungun Tyagi, Tania
R.D. Engineering College Ghaziabad
DOI:
https://doi.org/10.51583/IJLTEMAS.2026.150300120
Received: 03 April 2026; 04 April 2026; Published: 23 April 2026
ABSTRACT
Across the globe, cardiovascular diseases remain a leading cause of death, and accurate prediction is essential to modern health systems. Unhealthy lifestyles, stress, and aging contribute to the increasing occurrence of heart disease, so it has become essential to create a system capable of delivering precise and reliable diagnostic results. With the growing accessibility of vast healthcare data, machine learning is emerging as an important tool for supporting clinical decision-making by identifying hidden patterns and relationships in complex datasets. In this study, we developed a machine learning-based system for predicting heart disease. The proposed system uses a structured dataset obtained from the publicly available UCI repository, containing important medical parameters. To ensure high data quality and improve model performance, multiple preprocessing techniques were applied, including data cleaning, handling of missing values, feature normalization, categorical variable encoding, and outlier detection. Different approaches were tested to identify the most effective model, and the models were evaluated using performance metrics such as accuracy, precision, recall, and ROC-AUC score. The study focuses on ensemble learning with Random Forest, while the comparative analysis shows that KNN achieved higher accuracy on the given dataset (91.8% versus 88.5%). K-Nearest Neighbors performed best, achieving an accuracy of around 91.8% and superior classification capability as indicated by ROC curves and the overall evaluation metrics. The proposed approach can serve as a decision-support tool that helps medical professionals identify high-risk patients in time, and it may help reduce mortality by assisting doctors in early detection and better decision-making.
Keywords: Heart Disease Prediction, Machine Learning (ML) Techniques, Random Forest Classifier, Python,
Supervised Learning, Logistic Regression Algorithm (LR), Support Vector Machine Algorithm (SVM), K-
Nearest Neighbors (KNN), Decision Tree, Data Preprocessing, Feature Selection, Jupyter Notebook.
INTRODUCTION
Cardiovascular diseases rank as the foremost cause of mortality globally and represent a significant health concern in both developed and developing nations. The heart plays a crucial role in keeping the human body functioning properly by transporting oxygen and nutrients through the circulatory system. Any disruption to this process may result in serious health problems, including heart-related disease and sudden heart failure. The rise in heart disease is largely attributed to modern lifestyle factors such as poor diet, insufficient exercise, smoking, and stress, together with genetic predisposition, which makes it a crucial field of study within medicine.
Early diagnosis of heart disease is crucial for effective treatment and prevention. However, accurate prediction is often difficult because of multiple interconnected clinical factors and physiological aspects. Existing diagnostic methods rely largely on clinical experience and cannot always capture hidden patterns in large and complex medical datasets. This limitation highlights the need for intelligent, data-driven systems that help doctors and other professionals make more accurate and timely decisions. Machine learning has received considerable attention in medical diagnosis lately, owing to its capacity to analyze enormous amounts of data and identify complex patterns and relationships between features.
Machine learning techniques such as logistic regression, support vector machines (SVMs), K-nearest neighbors (KNN), decision trees, and ensemble methods are widely used to predict heart disease. These techniques enable automated prediction by learning from existing patient data and identifying patterns associated with the onset of illness. Despite this, many previous studies face challenges such as limited prediction accuracy, inadequate treatment of feature interactions, and insufficient generalization across datasets. Furthermore, some models are oversimplified or do not accurately capture non-linear relationships between medical attributes.
To overcome these constraints, this study proposes a model for forecasting heart disease using several machine learning algorithms, focusing mainly on Random Forest classification. The models are trained and tested on a UCI dataset containing relevant clinical features, and a variety of preprocessing methods are employed to enhance data quality and model performance. Our study shows that the KNN model achieves the best performance, with 91.8% accuracy, indicating its effectiveness in heart disease prediction. Our aim is to provide a reliable and efficient decision-support system that helps medical professionals identify and assess risk early. By applying machine learning, our approach contributes to improved prediction accuracy and supports better health outcomes.
LITERATURE REVIEW
In recent years, various studies have sought to improve early forecasting of cardiovascular disease using machine learning techniques. Researchers have studied data-driven approaches for analyzing clinical datasets and identifying patterns associated with cardiovascular diseases. These methods are intended to help medical professionals make accurate and timely decisions and ultimately reduce mortality rates. Machine learning algorithms have demonstrated considerable promise in deriving valuable insights from intricate medical data that traditional methods cannot capture. Earlier research focused on classical classification algorithms such as logistic regression and decision trees for predicting heart disease. The simplicity and clarity of these techniques serve as a foundation for predictive modeling.
Logistic regression is widely used in binary classification problems, and decision trees are effective at handling nonlinear relationships between features. However, when applied to large and complex datasets, these models frequently suffer from limitations such as lower accuracy and sensitivity to changes in the data. To overcome these challenges, more advanced machine learning algorithms, such as support vector machines (SVMs) and K-nearest neighbors (KNN), have been introduced. SVM is well known for establishing optimal decision boundaries and handling high-dimensional data in medical classification tasks. Although these techniques improve prediction performance, they may require careful parameter tuning and incur high computational costs on large datasets.
In recent times, ensemble learning techniques such as Random Forest have received widespread attention because they improve prediction accuracy and reduce overfitting. Random Forest combines numerous decision trees to produce more robust and reliable predictions. Various studies have shown that ensemble models can outperform individual classifiers by capturing complex feature interactions and reducing variance. In addition, feature selection techniques are used in many studies to identify the most important clinical attributes, further improving the models. Despite progress in this area, some limitations remain. Many studies report problems such as imbalanced datasets, insufficient generalization across different populations, and limited interpretability of complex models.
Furthermore, some models do not effectively use all available clinical features, which can affect predictive accuracy. These difficulties underscore the need for more efficient and scalable approaches that provide precise and trustworthy predictions. Considering this situation, the current study centers on creating a heart disease prediction system employing a range of machine learning algorithms, with a focus on KNN classification. This study aims to achieve higher prediction accuracy and better performance by applying appropriate preprocessing techniques and evaluating different models. The results contribute to ongoing research into intelligent health systems and show the efficacy of ensemble learning in medical diagnosis.
METHODOLOGY
This section explains the complete implementation of heart disease prediction using machine learning techniques. The process includes dataset collection, preprocessing, model training, and evaluation.
Dataset Description
This research uses the publicly available UCI heart disease dataset. It contains 303 records and 14 attributes, comprising 13 input features and 1 target variable.
Target: 0 = no heart disease; 1 = heart disease present
Important features include age, gender, chest pain type (cp), cholesterol (chol), resting blood pressure (trestbps), and maximum heart rate (thalach). The dataset was divided into training and testing sets using an 80:20 ratio to evaluate model performance.
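The 80:20 split described above can be illustrated with a minimal plain-Python sketch. It assumes a simple random (non-stratified) split with a fixed seed; the function name `split_80_20`, the seed value, and the placeholder rows are hypothetical, and the paper does not state which tool or seed was used. In practice, scikit-learn's `train_test_split(X, y, test_size=0.2, random_state=...)` performs the same kind of split.

```python
import random

def split_80_20(records, labels, test_ratio=0.2, seed=42):
    """Shuffle row indices with a fixed seed and hold out `test_ratio` of them for testing."""
    if len(records) != len(labels):
        raise ValueError("records and labels must have the same length")
    indices = list(range(len(records)))
    random.Random(seed).shuffle(indices)          # fixed seed -> reproducible split
    n_test = round(len(indices) * test_ratio)     # 20% of the rows, rounded
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [records[i] for i in train_idx]
    y_train = [labels[i] for i in train_idx]
    X_test = [records[i] for i in test_idx]
    y_test = [labels[i] for i in test_idx]
    return X_train, X_test, y_train, y_test

# Placeholder data: 303 dummy rows standing in for the UCI records
rows = [[float(i)] for i in range(303)]
targets = [i % 2 for i in range(303)]
X_train, X_test, y_train, y_test = split_80_20(rows, targets)
print(len(X_train), len(X_test))  # 242 61
```

With 303 rows, 20% rounds to 61 test records, leaving 242 for training.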
Data Preprocessing
To improve data quality and model performance, the following steps were applied:
Data Cleaning: Missing and inconsistent values were handled to ensure dataset reliability.
Encoding: Categorical variables, including cp, thal, and slope, were converted into numerical form through label encoding.
Normalization/Scaling: Feature values were rescaled using Min-Max normalization, defined as
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
to maintain uniformity across all features.
Outlier Handling: Extreme values were identified and treated to minimize their impact on model performance. These preprocessing steps improved the precision and stability of the models.
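The preprocessing steps listed above can be sketched in plain Python as below. The paper does not specify how missing values or outliers were handled, so median imputation and interquartile-range (IQR) capping with the conventional factor k = 1.5 are assumptions here, as are all function names and the sample cholesterol-like values. Outliers are capped before scaling because Min-Max normalization is driven by the column's extreme values. scikit-learn's `LabelEncoder` and `MinMaxScaler` provide library versions of the encoding and scaling steps; like `LabelEncoder`, this sketch assigns codes in sorted category order.

```python
from statistics import median, quantiles

def impute_median(column):
    """Replace missing entries (None) with the median of the observed values (assumed strategy)."""
    fill = median(v for v in column if v is not None)
    return [fill if v is None else v for v in column]

def label_encode(column):
    """Map each distinct category to an integer code, in sorted category order."""
    mapping = {cat: code for code, cat in enumerate(sorted(set(column)))}
    return [mapping[v] for v in column], mapping

def clip_outliers_iqr(column, k=1.5):
    """Cap values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed outlier treatment)."""
    q1, _, q3 = quantiles(column, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, lower), upper) for v in column]

def min_max_scale(column):
    """x' = (x - x_min) / (x_max - x_min); a constant column is mapped to 0.0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)
    return [(v - lo) / (hi - lo) for v in column]

# Hypothetical cholesterol column with one missing value and one extreme value
chol = impute_median([233, 250, None, 204, 236, 564])  # None -> 236
chol = clip_outliers_iqr(chol)                         # 564 and 204 lie outside the IQR fences and are capped
chol_scaled = min_max_scale(chol)                      # values now lie in [0, 1]

# Hypothetical categorical 'thal' column
thal_codes, thal_map = label_encode(["normal", "reversible", "fixed", "normal"])
print(thal_codes, thal_map)  # [1, 2, 0, 1] {'fixed': 0, 'normal': 1, 'reversible': 2}
```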
Model Implementation
Several machine learning techniques were used to predict heart disease, including Logistic Regression, Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Decision Tree, and Random Forest. Each model was trained on the preprocessed dataset and evaluated on the test data.
Logistic Regression: Predicts probability using the sigmoid function
$$P(y = 1 \mid \mathbf{x}) = \frac{1}{1 + e^{-(\mathbf{w} \cdot \mathbf{x} + b)}}$$
• Support Vector Machine (SVM): Separates data using a hyperplane defined as $\mathbf{w} \cdot \mathbf{x} + b = 0$.
K-Nearest Neighbors (KNN): Classifies data based on similarity using the Euclidean distance
$$d(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
Decision Tree: Splits data based on feature conditions using measures such as Gini Index.
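Random Forest: An ensemble approach that combines multiple decision trees, where the prediction is given by majority vote:
$$\hat{y} = \operatorname{mode}\{h_1(\mathbf{x}), h_2(\mathbf{x}), \ldots, h_T(\mathbf{x})\}$$
where $h_t(\mathbf{x})$ is the prediction of the $t$-th tree.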
Among these, Random Forest is particularly robust because of its capacity to handle complex data patterns and reduce overfitting.
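To make the formulas above concrete, the sketch below writes each decision rule in plain Python. The weights, toy points, and function names are hypothetical, and nothing here is fitted to the UCI data; it only shows what each model computes at prediction time. A full study would typically train the corresponding scikit-learn estimators (`LogisticRegression`, `SVC`, `KNeighborsClassifier`, `DecisionTreeClassifier`, `RandomForestClassifier`); the paper does not state which library it used.

```python
import math
from collections import Counter

def sigmoid_probability(w, x, b):
    """Logistic regression: P(y = 1 | x) = 1 / (1 + exp(-(w . x + b)))."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def svm_decision(w, x, b):
    """Linear SVM: class 1 on the non-negative side of the hyperplane w . x + b = 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else 0

def euclidean(p, q):
    """d(p, q) = sqrt(sum_i (p_i - q_i)^2)."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def knn_predict(X_train, y_train, x, k=5):
    """Majority label among the k nearest training points (odd k avoids binary vote ties)."""
    nearest = sorted(range(len(X_train)), key=lambda i: euclidean(X_train[i], x))[:k]
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]

def gini(labels):
    """Gini index of a node, as used for decision-tree splits: 1 - sum_c p_c^2."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def forest_vote(tree_predictions):
    """Random Forest output: the most common label among the trees' predictions."""
    return Counter(tree_predictions).most_common(1)[0][0]

# Toy usage with hypothetical values
X_train = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]
y_train = [0, 0, 0, 1, 1, 1]
print(sigmoid_probability([0.0], [1.0], 0.0))            # 0.5
print(knn_predict(X_train, y_train, [0.5, 0.5], k=3))    # 0
print(gini([0, 0, 1, 1]), forest_vote([1, 0, 1, 1, 0]))  # 0.5 1
```

One design note: the hard majority vote matches the Random Forest formula above, whereas scikit-learn's `RandomForestClassifier` averages the trees' predicted class probabilities, which usually, but not always, yields the same label.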
Performance Evaluation
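Model performance was evaluated using standard classification metrics:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Recall} = \frac{TP}{TP + FN}$$
where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively.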
ROC-AUC Score: Evaluates how well the model can differentiate between classes.
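The three formulas above, plus ROC-AUC, can be computed directly from model outputs. The sketch below is a minimal version that assumes class 1 (heart disease present) is the positive class. ROC-AUC is computed as the probability that a randomly chosen positive case is scored higher than a randomly chosen negative one (ties count one half), which equals the area under the ROC curve. The toy labels and scores are hypothetical, not the paper's results; scikit-learn's `accuracy_score`, `precision_score`, `recall_score`, and `roc_auc_score` are the usual library equivalents.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN with `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, tn, fp, fn

def classification_metrics(y_true, y_pred):
    """Accuracy, precision, and recall from the confusion counts."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0  # guard: no positive predictions
    recall = tp / (tp + fn) if tp + fn else 0.0     # guard: no positive cases
    return accuracy, precision, recall

def roc_auc(y_true, scores):
    """P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels, hard predictions (score >= 0.5), and predicted probabilities
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
scores = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1]
acc, prec, rec = classification_metrics(y_true, y_pred)
print(round(acc, 3), round(prec, 3), round(rec, 3), round(roc_auc(y_true, scores), 3))
# 0.667 0.667 0.667 0.889
```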
According to the evaluation, Logistic Regression achieved approximately 88.5% accuracy, SVM 90.2%, and Random Forest 88.5%, while KNN attained the highest accuracy of approximately 91.8%, making it the best-performing model for heart disease prediction.
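Because the test set is small, it helps to translate the reported percentages into patient counts. Assuming the 80:20 split of the 303 records leaves 61 test cases (an assumption: the exact test-set size is not reported), each reported accuracy corresponds to a whole number of correct predictions:

```python
N_TEST = 61  # assumption: 20% of the 303 records, rounded
reported = {"Logistic Regression": 88.5, "SVM": 90.2, "Random Forest": 88.5, "KNN": 91.8}

# Nearest whole number of correct test predictions implied by each reported accuracy
implied_correct = {m: round(pct / 100 * N_TEST) for m, pct in reported.items()}

for model, correct in implied_correct.items():
    print(f"{model}: {correct}/{N_TEST} correct = {100 * correct / N_TEST:.1f}%")
```

Under this assumption, the gap between KNN (91.8%) and Random Forest (88.5%) amounts to two test patients, so small differences between models on a single split should be interpreted with caution.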
RESULTS AND DISCUSSION
The performance of various machine learning models was assessed to identify the most efficient method for
predicting heart disease. The study incorporates several models, including Logistic Regression, Support Vector
Machine (SVM), K-Nearest Neighbors (KNN), and Random Forest. The evaluation was done using key
performance metrics such as accuracy, precision, recall, and ROC-AUC score.
Accuracy Comparison
The accuracy of all implemented models is compared in Fig. 1. The findings suggest that the K-nearest neighbors classifier performs better than the other algorithms, achieving the highest accuracy of approximately 91.8%. In comparison, SVM, LR, and Random Forest achieved moderate accuracy levels, while Decision Tree showed relatively lower performance.
Fig.1: Comparison of accuracy among different machine learning systems, where KNN attained the highest
accuracy.
Dataset Distribution Analysis
The class distribution of the dataset used in this research is illustrated in Fig. 2. The chart shows the number of patients in two categories: heart disease and no heart disease. There are 526 instances in the heart disease class and 499 instances in the no heart disease class. This indicates that the dataset is relatively balanced, with approximately the same number of samples in each class. A balanced dataset is important for machine learning models, as it prevents bias toward a particular class and improves the reliability of predictions. The slight difference in class distribution is therefore unlikely to significantly affect model performance.
Fig. 2: Distribution of patients with and without heart disease in the dataset.
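The statement that the classes are relatively balanced can be quantified from the counts reported above with a few lines of arithmetic (variable names are illustrative):

```python
counts = {"heart disease": 526, "no heart disease": 499}  # class counts reported above
total = sum(counts.values())
proportions = {label: n / total for label, n in counts.items()}
imbalance_ratio = max(counts.values()) / min(counts.values())

for label, share in proportions.items():
    print(f"{label}: {share:.1%}")                        # 51.3% and 48.7%
print(f"majority/minority ratio: {imbalance_ratio:.2f}")  # 1.05
```

The heart disease class makes up about 51.3% of the samples, and the majority-to-minority ratio is about 1.05.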
Feature Importance Analysis
Feature importance analysis was performed with the KNN model to determine which attributes most influence heart disease prediction. As shown in Fig. 3, features such as chest pain type (cp), maximum heart rate (thalach), number of major vessels (ca), and ST depression (oldpeak) play a crucial role in making predictions. This analysis helps in understanding the relationship between clinical attributes and heart disease, providing valuable insights for medical decision-making.
Fig.3: Feature importance analysis using KNN indicating the contribution of each attribute in predicting heart
disease.
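KNN has no built-in feature importance, and the paper does not state how the importance scores in Fig. 3 were obtained. One common model-agnostic option, assumed here, is permutation importance: shuffle one feature column in the test set and measure how much accuracy drops. The sketch below implements it from scratch for a KNN classifier; the toy data (feature 0 informative, feature 1 noise), `k = 3`, the repeat count, and all names are hypothetical. scikit-learn offers the same technique as `sklearn.inspection.permutation_importance`.

```python
import math
import random
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Majority label among the k training points closest to x (Euclidean distance)."""
    nearest = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], x))[:k]
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]

def accuracy(X_train, y_train, X_eval, y_eval, k=3):
    """Fraction of evaluation rows the KNN classifier labels correctly."""
    preds = [knn_predict(X_train, y_train, x, k) for x in X_eval]
    return sum(p == t for p, t in zip(preds, y_eval)) / len(y_eval)

def permutation_importance(X_train, y_train, X_test, y_test, k=3, n_repeats=10, seed=0):
    """Importance of feature j = mean drop in test accuracy after shuffling column j."""
    rng = random.Random(seed)
    baseline = accuracy(X_train, y_train, X_test, y_test, k)
    importances = []
    for j in range(len(X_test[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X_test]
            rng.shuffle(column)  # break the link between feature j and the label
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X_test, column)]
            drops.append(baseline - accuracy(X_train, y_train, X_perm, y_test, k))
        importances.append(sum(drops) / n_repeats)
    return importances

# Toy data: feature 0 separates the classes, feature 1 is noise
X_train = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.5], [0.9, 0.4], [1.0, 0.8], [0.8, 0.1]]
y_train = [0, 0, 0, 1, 1, 1]
X_test = [[0.05, 0.7], [0.15, 0.2], [0.95, 0.6], [0.85, 0.3]]
y_test = [0, 0, 1, 1]
importances = permutation_importance(X_train, y_train, X_test, y_test)
print(importances)
```

In the toy data, shuffling feature 0 lowers accuracy while shuffling feature 1 leaves it unchanged, so feature 0 receives the larger importance.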
The experimental results show that KNN achieved the highest accuracy among all models. However, Random
Forest also demonstrated strong and consistent performance with an accuracy of around 88.5%, making it a
reliable model for heart disease prediction.
Comparison Between the Proposed Model and Previous Research
| Authors | Methods | Accuracy (%) |
|---|---|---|
| Our study | KNN | 91.8 |
| Mohan et al. [3] | HRFLM | 88.47 |
| Amin et al. [4] | Naïve Bayes and Logistic Regression | 87.41 |
| Latha & Jeeva [5] | NB, BN, RF, and MP | 85.48 |
| Patel et al. [9] | J48 with Reduced Error Pruning Algorithm | 56.76 |
| Tomar & Agarwal [10] | Feature selection-based LSTSVM | 85.59 |
| Buscema et al. [11] | TWIST algorithm | 84.14 |
CONCLUSION
In this research, a machine learning-based heart disease prediction system has been developed and evaluated using several classification algorithms. The objective was to identify efficient and reliable models for accurately predicting heart disease from clinical data. Machine learning techniques, including logistic regression, support vector machine (SVM), K-Nearest Neighbors (KNN), decision tree, and random forest, were tested and evaluated. To obtain the most effective model, preprocessing methods including data cleaning, encoding, normalization, and outlier handling were applied. The models were evaluated using standard performance indicators, including accuracy, precision, recall, and ROC-AUC score. The experimental results show that KNN achieved the highest accuracy among all models. Although KNN achieved the highest accuracy, Random Forest remains a robust and reliable model due to its ability to handle complex feature interactions and provide stable performance across different data conditions. In addition, the feature importance analysis showed that certain clinical features, such as the number of major vessels, the type of chest pain, and the maximum heart rate, play a crucial role in forecasting heart disease. Our research illustrates the effectiveness of ensemble learning methods in handling complex medical datasets and enhancing prediction precision. The proposed model can assist medical experts in early detection and risk assessment, thus contributing to better clinical decision-making and potentially lowering the death rate from cardiovascular diseases.
In future research, the model could be further enhanced by incorporating large-scale and varied datasets, advanced techniques such as deep learning, and real-time healthcare data to build more resilient and scalable prediction systems.
The study is based on a publicly available dataset, which may not fully represent real-world clinical scenarios.
The dataset size is limited, which may affect model generalization. Additionally, the model has not been
validated on real-time clinical data.
Future work can include testing the model on larger, real-world datasets. Advanced techniques such as deep learning and explainable AI can be applied to improve performance and interpretability, and the system can be integrated into real-time healthcare applications. Cross-validation helps detect overfitting and gives a more reliable estimate of how consistently the model performs.
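The cross-validation mentioned above can be sketched as k-fold splitting: the data are divided into k folds, each fold serves once as the validation set, and the scores are averaged. The plain-Python sketch below assumes k = 5 and shuffled folds; `k_fold_indices`, `cross_validate`, and the seed are hypothetical names and values. scikit-learn's `KFold` and `cross_val_score` (or `StratifiedKFold` for class-preserving folds) are the usual library tools.

```python
import random

def k_fold_indices(n_samples, k=5, seed=42):
    """Yield (train_idx, val_idx) pairs; each sample lands in exactly one validation fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so fold sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size

def cross_validate(score_fn, n_samples, k=5):
    """Average score_fn(train_idx, val_idx) over the k folds.

    In practice score_fn would fit a model on train_idx and return its accuracy on val_idx.
    """
    scores = [score_fn(tr, va) for tr, va in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)

folds = list(k_fold_indices(303, k=5))
print([len(val) for _, val in folds])  # [61, 61, 61, 60, 60]
```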
REFERENCES
1. N. Biswas, M. M. Ali, M. A. Rahaman, M. Islam, M. R. Mia, S. Azam et al., "Machine learning-based model to predict heart disease in early stage employing different feature selection techniques," BioMed Research International, vol. 2023, Article ID 6864343, pp. 1-15, 2023.
2. I. D. Mienye and Y. Sun, "Effective feature selection for improved prediction of heart disease," in Pan-African Artificial Intelligence and Smart Systems Conference, pp. 94-107, Springer, Cham, 2021.
3. S. Mohan, C. Thirumalai, and G. Srivastava, "Effective heart disease prediction using hybrid machine learning techniques," IEEE Access, vol. 7, pp. 81542-81554, 2019.
4. M. S. Amin, Y. K. Chiam, and K. D. Varathan, "Identification of significant features and data mining techniques in predicting heart disease," Telematics and Informatics, vol. 36, pp. 82-93, 2019.
5. C. B. C. Latha and S. C. Jeeva, "Improving the accuracy of prediction of heart disease risk based on ensemble classification techniques," Informatics in Medicine Unlocked, vol. 16, article 100203, 2019.
6. V. V. Ramalingam, A. Dandapath, and M. K. Raja, "Heart disease prediction using machine learning techniques: a survey," International Journal of Engineering & Technology, vol. 7, no. 2.8, pp. 684-687, 2018.
7. M. F. Rabbi, M. P. Uddin, M. A. Ali et al., "Performance evaluation of data mining classification techniques for heart disease prediction," American Journal of Engineering Research, vol. 7, no. 2, pp. 278-283, 2018.
8. S. Pouriyeh, S. Vahid, G. Sannino, G. De Pietro, H. Arabnia, and J. Gutierrez, "A comprehensive investigation and comparison of machine learning techniques in the domain of heart disease," in 2017 IEEE Symposium on Computers and Communications (ISCC), pp. 204-207, Heraklion, Greece, 2017.
9. J. Patel, D. Tejal Upadhyay, and S. Patel, "Heart disease prediction using machine learning and data mining technique," Heart Disease, vol. 7, no. 1, pp. 129-137, 2015.
10. D. Tomar and Agarwal, "Feature selection based least square twin support vector machine for diagnosis of heart disease," International Journal of Bio-Science and Bio-Technology, vol. 6, no. 2, pp. 69-82, 2014.
11. M. Buscema, M. Breda, and W. Lodwick, "Training with Input Selection and Testing (TWIST) Algorithm: A Significant Advance in Pattern Recognition Performance of Machine Learning," Journal of Intelligent Learning Systems and Applications, vol. 5, no. 1, article 27937, 2013.