INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,
MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)
ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XIV, Issue XII, December 2025
Random Forest attained R² values above 99% in urban AQI prediction, while Rezk et al. [5] and Wang et al.
[6] emphasised the reliability of Random Forest in sustainable monitoring systems and IoT-based applications.
Feature selection techniques, including Sequential Forward Selection (SFS), have been shown to improve model
performance by identifying the most relevant predictor variables [5].
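SFS is a greedy wrapper method. It starts from an empty feature set and, at each step, adds the single remaining feature whose inclusion most improves a chosen score. The sketch below is a minimal illustration under stated assumptions, not the procedure used in [5]. The function name `sequential_forward_selection` is hypothetical. The `score` callable is an assumed placeholder for whatever evaluation a study uses, such as cross-validated R².

```python
from typing import Callable, Sequence


def sequential_forward_selection(
    features: Sequence[str],
    score: Callable[[list[str]], float],
    k: int,
) -> list[str]:
    """Greedily add the feature that most improves `score` until k are chosen.

    Hypothetical helper: `score` is assumed to return a higher-is-better value
    (e.g. cross-validated R^2 of a model trained on the given feature subset).
    """
    selected: list[str] = []
    remaining = list(features)
    while remaining and len(selected) < k:
        # Try every one-feature extension of the current subset and keep the best.
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy score: each feature contributes a fixed (made-up) amount.
weights = {"pm10": 0.6, "hour": 0.3, "humidity": 0.1}
chosen = sequential_forward_selection(list(weights), lambda s: sum(weights[f] for f in s), 2)
```

With this additive toy score, SFS simply picks the two highest-weighted features. With a real model-based score, interactions between features can change the order in which features are added.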
Deep learning methods, especially recurrent neural networks such as LSTM and BiLSTM, have been investigated
extensively for air quality forecasting because they capture temporal dependencies in time-series data. Ma et al.
[7] demonstrated the efficacy of an IDW-BLSTM framework for multi-granularity PM2.5 forecasting, while Liu
et al. [8] introduced a hybrid ITTAO–sLSTM–Attention model that markedly reduced RMSE compared with
conventional architectures. Comparable progress has been reported in studies that integrate meteorological and
pollutant variables through hybrid or feature-enhanced techniques, yielding R² values exceeding 0.96 in practical
case studies [9].
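The studies above report results mainly as RMSE and R². The sketch below uses the standard textbook definitions of both metrics: RMSE = sqrt(mean squared error) and R² = 1 − SS_res / SS_tot. These are the conventional forms, and each cited study may compute them slightly differently. For example, an R² "above 99%" corresponds to R² > 0.99 on this scale. The helper names are illustrative only.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error, in the same units as the target (e.g. µg/m³)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # total sum of squares
    return 1.0 - ss_res / ss_tot


# Made-up example values, not data from any cited study.
y_obs = [10.0, 20.0, 30.0, 40.0]
y_hat = [12.0, 18.0, 33.0, 39.0]
```

For these values the squared errors sum to 18, so RMSE = sqrt(4.5) ≈ 2.12. With SS_tot = 500, R² = 1 − 18/500 = 0.964.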
Notably, several studies indicate that simpler models can rival or even surpass more complex models under certain
circumstances. Zareba et al. [10] reported that Ridge Regression outperformed LSTM during winter smog events,
while Liu et al. [2] demonstrated the efficacy of Extreme Learning Machines optimised with genetic algorithms.
These findings suggest that a model's suitability depends on factors such as data characteristics, feature quality,
and computational constraints.
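For reference, Ridge Regression is ordinary least squares with an L2 penalty on the coefficients. The textbook closed-form estimator is shown below. The design matrix X, target vector y, and regularisation strength λ are generic symbols, not settings reported by Zareba et al. [10]. In practice the intercept is usually left unpenalised. Because X^T X + λI is positive definite for any λ > 0, the inverse always exists. This stability helps explain why such a simple linear model can remain competitive on noisy or collinear pollution data.

```latex
\hat{\boldsymbol{\beta}}_{\text{ridge}}
  = \arg\min_{\boldsymbol{\beta}}
    \lVert \mathbf{y} - X\boldsymbol{\beta} \rVert_2^2
    + \lambda \lVert \boldsymbol{\beta} \rVert_2^2
  = \left( X^{\top} X + \lambda I \right)^{-1} X^{\top} \mathbf{y},
  \qquad \lambda > 0
```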
This study makes three contributions. First, it offers a consistent and equitable comparison of deep learning
models, ensemble-based approaches, and classical machine learning models for PM₂.5 prediction under identical
experimental settings. Second, it assesses the robustness and reliability of the models using both quantitative
metrics and visual diagnostics. Third, it provides practical insight into the trade-off between interpretability and
predictive accuracy when only temporal features are used for air quality forecasting.
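This section does not say which temporal features are used. One common way to build a "temporal features only" dataset is to predict each PM₂.5 value from its own lagged values, then split chronologically so that every model is tested on a later period than it was trained on. The sketch below illustrates that setup under those assumptions. The helper names `make_lag_features` and `chronological_split`, the number of lags, and the 0.8 training fraction are all hypothetical, not this study's confirmed configuration.

```python
from typing import Sequence


def make_lag_features(
    series: Sequence[float], lags: int
) -> tuple[list[list[float]], list[float]]:
    """Hypothetical helper: turn a univariate series into (X, y) pairs.

    Row t of X holds the previous `lags` values [x_{t-lags}, ..., x_{t-1}],
    and y holds x_t, so each model sees only past (temporal) information.
    """
    X: list[list[float]] = []
    y: list[float] = []
    for t in range(lags, len(series)):
        X.append(list(series[t - lags:t]))
        y.append(series[t])
    return X, y


def chronological_split(X, y, train_fraction: float = 0.8):
    """Hypothetical helper: split without shuffling so test data follow training data."""
    cut = int(len(y) * train_fraction)
    return X[:cut], X[cut:], y[:cut], y[cut:]


X, y = make_lag_features([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], lags=2)
X_train, X_test, y_train, y_test = chronological_split(X, y, train_fraction=0.75)
```

The split is deliberately not shuffled. Random shuffling would let models see future observations during training, which inflates accuracy on time-series data.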
RELATED WORK
A substantial body of research has concentrated on developing accurate methods for forecasting air quality,
specifically PM2.5 concentrations, using both conventional statistical models and more sophisticated machine
learning (ML) or deep learning (DL) techniques. Early research frequently relied on regression and autoregressive
models; however, these methods struggled with the nonlinear and spatiotemporal relationships inherent in air
pollution data [10]. These limitations of statistical methods have prompted the widespread adoption of data-driven
machine learning frameworks that better capture complex pollutant–meteorology relationships [11].
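To make the limitation of early autoregressive approaches concrete, the sketch below fits the simplest such model, AR(1): x_t = c + φ·x_{t−1}, estimated by ordinary least squares. The model can only represent a linear dependence on the previous value. That is why nonlinear pollutant–meteorology interactions fall outside its reach. This is a generic textbook illustration with hypothetical helper names, not the model of any cited study.

```python
from typing import Sequence


def fit_ar1(series: Sequence[float]) -> tuple[float, float]:
    """Hypothetical helper: fit x_t = c + phi * x_{t-1} by ordinary least squares."""
    prev = series[:-1]   # x_{t-1}
    curr = series[1:]    # x_t
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    cov = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    var = sum((p - mean_prev) ** 2 for p in prev)
    phi = cov / var                    # OLS slope
    c = mean_curr - phi * mean_prev    # OLS intercept
    return c, phi


def forecast_ar1(last_value: float, c: float, phi: float, steps: int) -> list[float]:
    """Iterate the fitted linear recursion to produce a multi-step forecast."""
    out: list[float] = []
    x = last_value
    for _ in range(steps):
        x = c + phi * x
        out.append(x)
    return out


# Synthetic series generated exactly by x_t = 2 + 0.5 * x_{t-1}.
c_hat, phi_hat = fit_ar1([10.0, 7.0, 5.5, 4.75, 4.375])
```

On this noise-free synthetic series, OLS recovers c = 2 and φ = 0.5 exactly. On real PM2.5 data the linear recursion typically under-predicts sudden pollution peaks.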
Ensemble-based machine learning approaches have gained prominence for their balance between accuracy and
interpretability. Random Forest has repeatedly exhibited superior performance across diverse scenarios: Rosca et
al. [8] reported an R² above 99% for urban AQI prediction, and Rezk et al. [5] validated its efficacy when combined
with Sequential Forward Selection. Wang et al. [9] incorporated Random Forest into an IoT-enabled system,
demonstrating its suitability for real-time monitoring. Other ensemble models, including XGBoost and CatBoost,
have also been evaluated and shown to be useful for nonlinear, multivariate datasets [7].
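The core idea behind Random Forest is bagging. Many simple learners are trained on bootstrap resamples of the data, and their predictions are averaged. The sketch below shows only that idea, using bagged regression stumps as a deliberately simplified stand-in. A real Random Forest grows deeper trees and also samples a random subset of features at each split, which this sketch omits. All helper names and parameter values (`n_estimators=25`, `seed=0`) are hypothetical.

```python
import random


def fit_stump(X: list[list[float]], y: list[float]):
    """Hypothetical helper: find the (feature, threshold) split with the lowest squared error.

    Returns (feature_index, threshold, left_mean, right_mean).
    """
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= threshold]
            right = [yi for row, yi in zip(X, y) if row[j] > threshold]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, threshold, lm, rm)
    if best is None:
        # No valid split (all feature values identical): predict the mean everywhere.
        mean = sum(y) / len(y)
        return 0, float("inf"), mean, mean
    return best[1:]


def predict_stump(stump, row: list[float]) -> float:
    j, threshold, lm, rm = stump
    return lm if row[j] <= threshold else rm


def fit_bagged_stumps(X, y, n_estimators: int = 25, seed: int = 0):
    rng = random.Random(seed)
    n = len(y)
    stumps = []
    for _ in range(n_estimators):
        # Bootstrap sample: draw n row indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return stumps


def predict_bagged(stumps, row: list[float]) -> float:
    # The ensemble prediction is the average of the individual stumps.
    return sum(predict_stump(s, row) for s in stumps) / len(stumps)


X_toy = [[float(i)] for i in range(10)]
y_toy = [0.0] * 5 + [10.0] * 5
ensemble = fit_bagged_stumps(X_toy, y_toy)
```

Averaging over bootstrap resamples reduces the variance of any single learner. This is one reason tree ensembles tend to be robust on noisy, nonlinear air-quality data.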
Deep learning algorithms have substantially advanced the field by exploiting temporal and spatial patterns in large
datasets. Ma et al. [6] introduced an IDW-BLSTM architecture for multi-granularity forecasting, while Liu et al.
[2] developed a hybrid ITTAO–sLSTM–Attention framework that reduced prediction errors relative to traditional
recurrent networks. CNN–LSTM hybrids have also been developed, with Stergiou et al. [9] demonstrating improved
detection of pollution peaks. Zhou et al. [12] showed that integrating meteorological data with NARX neural
networks improved air quality index prediction in urban settings.
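IDW conventionally denotes inverse distance weighting. It is a spatial interpolation scheme that estimates a value at an unmonitored location as a weighted average of nearby station readings, with weights proportional to 1/d^p. The sketch below assumes that meaning and uses the common default p = 2. It does not reproduce how Ma et al. [6] couple IDW with the BiLSTM. The function name is hypothetical.

```python
import math
from typing import Sequence


def idw_interpolate(
    stations: Sequence[tuple[float, float, float]],
    x: float,
    y: float,
    power: float = 2.0,
) -> float:
    """Hypothetical helper: estimate a value at (x, y) from (sx, sy, value) readings.

    Each station is weighted by 1 / d**power, so nearer stations dominate.
    """
    num = 0.0
    den = 0.0
    for sx, sy, value in stations:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return value  # Query point coincides with a station: use its reading.
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


# Two made-up PM2.5 stations on a line.
stations = [(0.0, 0.0, 10.0), (2.0, 0.0, 30.0)]
```

Midway between the two stations the weights are equal, giving 20.0. At (0.5, 0) the nearer station's weight is 4 versus about 0.44, which pulls the estimate down to 12.0.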
Hybrid methodologies that combine feature engineering, optimization, or sensor integration have also shown
promise. El Mghouchi and Udristioiu [1] employed hybrid AI-driven models that fused pollutant and meteorological
data, achieving high accuracy in PM forecasting. El Mghouchi et al. [4] extended this research with multivariable
hybrid machine learning models, consistently achieving R² values above 0.96. Likewise, Popescu et al. [5] showed
that hybrid models utilising feature selection techniques