Page 1498
www.rsisinternational.org
INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,
MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)
ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XV, Issue IV, April 2026
Real-Time Air Pollution Monitoring and AQI Prediction System:
Environmental Intelligence with IOT-Based Approach and Machine
Learning
Prof. Nitin Goyal, Shorya Chandokia, Abdul Rahman,
Ateek Saifi,
Tushar Sharma
Department of Computer Science, R.D. Engineering College, Ghaziabad, India
DOI:
https://doi.org/10.51583/IJLTEMAS.2026.150400124
Received: 22 April 2026; Accepted: 27 April 2026; Published: 21 May 2026
ABSTRACT
Air pollution is one of the most significant health concerns on earth, and the World Health Organization believes
that 7 million premature deaths happen annually due to air quality. In this paper, the author is going to provide
an elaborate, deploy-able system architecture that incorporates IoT sensor networks, real-time data processing,
and machine learning advanced algorithms to monitor and predict air quality. It is made of distributed low-cost
sensor nodes, 5G/4G cellular communication infrastructure, cloud-based data processing pipelines, and LSTM-
GRU hybrid neural networks to predict AQI.
24 months of performance analysis of 47 urban monitoring stations indicates the probability of making 24-hour
AQI predictions with accuracy of 91.3
percent with RMSE of 12.8µg/m
3
for PM
2.5
concentration.
Compared
to classical ARIMA approaches, it is demonstrated that it has a 18% improvement and 12% improved compared
to single LSTM models. Some of the features of the system include real-time alerts, health advisory services,
and regulatory compliance reporting. Scalability analysis aids the confirmation of linear increase of costs (O(n))
with density of sensor network which allows cost-effective deployment over geographical areas. The work is
useful in modernizing environmental monitoring infrastructure, and in evidence-based policy formulation of air
quality management.
Index TermsAir Quality Index, IoT Sensors, time series prediction, real-time monitoring, machine learning,
environmental monitoring, sensor, networks, time-series forecasting.
INTRODUCTION
Background and Global Context
One of the most pressing environmental issues of the 21 st century, air pollution is the reason of about 7
million untimely deaths each year due to ambient air or domestic air pollution or both [1]. The economic
cost is also quite impressive: air pollution costs the world economy about $5 trillion a year in healthcare
spending and loss of productivity. This issue is also complicated by the lack of a proper monitoring network in
developing areas. Particulate matter (PM
2.5
and PM
10
) and gaseous pollutants (NO , SO , O ) are often at
hazardous levels in urban centers during periods of pollution. Conventional surveillance takes a form of sparse
reference stations with the developed world having an average of 1 reference station per 5,00010,000 km
2
and
the developing countries having none at all.
Limitations of Existing Systems
Conventional air quality monitoring systems face several critical limitations:
Spatial Coverage Gaps: Reference-grade instruments cost $40,000$150,000 per unit, limiting
deployment. Most countries have fewer than 100 monitoring stations.
Page 1499
www.rsisinternational.org
INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,
MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)
ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XV, Issue IV, April 2026
Temporal Resolution: Data availability ranges from 124 hours post-measurement, reducing actionability
for real-time public health interventions.
Data Accessibility: 73% of monitoring data globally is not publicly accessible in standardized formats.
Cost Barriers: Annual maintenance and calibration costs exceed $20,000 per station.
Recent advances in low-cost IoT sensors, combined with 5G infrastructure deployment and ma-chine learning
algorithms, present unprecedented opportunities to address these limitations.
Research Objectives
Key objectives: (1) design distributed IoT net-work with 91% cost reduction; (2) develop ensemble LSTM-GRU
achieving 91.3% categorical accuracy; (3) create production-ready platform; (4) demonstrate O(n) scalability.
Contributions: integrated system architecture, ensemble methodology, and validation across 47 stations.
LITERATURE REVIEW
Air Quality Standards and Health Impacts
The Air Quality Index (AQI) provides a standardized communication mechanism between monitoring agencies
and the public [1]. The EPA AQI ranges from 0 to 500, with six health impact categories based on National
Ambient Air Quality Standards (NAAQS). Recent epidemiological studies demonstrate significant mortality
risks associated with prolonged exposure to fine particulate matter:
Mortality 󰨘per 10 g/m
PM

increase
The relevance of real-time air quality monitoring to the decision-making of people is of critical importance due
to this relationship. Current air quality standards recommend PM
2.5
concentrations remain below 15 µg/m
3
for
24-hour averages and 5 µg/m
3
for annual averages [13]. The World Health Organization guidelines are even
stricter, recommending 15 µg/m
3
annually and 37.5 µg/m
3
for 24-hour exposure. Non-compliance with
theseg/m
for 24-hour exposure. Non-compliance with these standards correlates with increased hospital
admis-sions, emergency room visits, and long-term chronic diseases. A comprehensive deployment of real-time
monitoring systems has been demonstrated to re-duce pollution-related health incidents by 18-22% through
timely public warnings and adaptive traffic management interventions [7].
IoT Sensor Technologies
The emergence of low-cost IoT sensors has revolutionized environmental monitoring capabilities [2]. A recent
meta-analysis of 127 studies reports correlation coefficients (R
2
) with reference instruments across multiple
pollutant types:
Sensor Accuracy Comparison with Reference Instruments
Pollutant
R
Range
Mean
Std Dev
PM

0.88
0.820.95
0.88
0.05
NO
0.84
0.750.92
0.84
0.06
SO
0.81
0.700.90
0.81
0.07
These high correlation coefficients demonstrate that modern low-cost sensors are sufficiently ac-curate for
regulatory compliance applications [5]. However, sensor calibration and maintenance re-main critical. Studies
indicate temporal drift of 815% monthly for electrochemical sensors and 1220% for optical sensors,
necessitating regular calibration protocols. The integration of multiple sensors with complementary operating
principles (optical, electrochemical, and thermal) improves overall measurement reliability and enables cross-
Page 1500
www.rsisinternational.org
INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,
MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)
ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XV, Issue IV, April 2026
validation for anomaly detection. Additionally, co-location with reference-grade instruments during calibration
campaigns establishes baseline accuracy metrics essential for post-deployment quality assurance [6] .
ML Methods
Time-series forecasting methods have evolved significantly over the past decade. Classical ARIMA methods
serve as important baselines: achieving RMSE of 1825 g/m
with 7278% categorical accuracy.
Nevertheless, ARIMA makes the assumptions of linear relationships and stationarity, and as such, it cannot be
applicable to the more complex dynamics of air pollution due to meteorology, traffic patterns, and atmospheric
changes in the boundary layer. These limitations are overcome by the Long Short-Term Memory (LSTM)
networks which model temporal dependencies and non-linear pat-terns. Recent applications have 1218 g/m
RMSE with 8589% categorical accuracy. Gated Recurrent Units (GRUs) have similar performance and
inference times 3040% times faster with lower memory footprints, and they are well-suited to edge deployment
applications [8]. Ensemble methods are a new paradigm in fore-casting, which involves using predictions
of het-erogenous models by adaptively weighting them. Recent publications show 38% performance
improvement using ensemble technique [15]. The most important benefit is the model diversity:hybrid LSTM-
GRU ensembles represent complementary factors of air quality dynamics. Also, Transformer-based
architectures are promising in terms of long-range dependencies in multi-day pollution episodes, but they demand
more data and more computational resources.
System Architecture and Design
Overall System Architecture
The proposed system employs a layered cloud-edge architecture designed for scalability, reliability, and real-
time responsiveness as illustrated in Figure 2. The system comprises seven primary layers with clear separation
of concerns. This architecture was designed following microservices principles to enable independent scaling of
components based on load requirements [10].
Fig. 1. Layered cloud-edge system architecture for real-time air quality monitoring.
Page 1501
www.rsisinternational.org
INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,
MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)
ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XV, Issue IV, April 2026
The seven layers enable separation of concerns: sensors perform edge processing, communication gateways
provide protocol translation and buffering, data processing pipelines enable real-time feature computation,
storage layers provide tiered access patterns, business logic implements forecasting and alerting, and application
layer provides user interfaces. This architecture supports deployment across heterogeneous IoT, 5G, and cloud
infrastructure.
Hardware Components
Sensor Node Specifications
Each monitoring station comprises specialized sensors with the specifications provided in Table 2. The
component selection balances cost, accuracy, and reliability requirements:
Sensor Node Component Specifications
Component
Cost ($)
Key Specs
PM Sensor
45
25% accuracy
NO
Sensor
120
0500 ppb range
SO
Sensor
120
1824 month lifespan
CO Sensor
110
01000 ppm range
Meteo Sensor
25
1°C temperature
GPS Module
20
5m accuracy
Microcontroller
40
256KB RAM
Communication
60
2G4G fallback
Power System
100
95% solar independence
Enclosure
80
Passive cooling
Total per Station
$720
Production pricing
Reference-Grade Station: 4 units at $260,000 each for ground truth calibration and post-deployment validation.
These reference stations employ advanced optical particle counters and electrochemical gas analyzers with
factory calibration traceability to national standards.
Network Cost Analysis: (43 $720) + (4 $260,000) = $1,071,000 vs. reference-only network of $12.22M.
This is a saving of **91% cost reduction** and preservation of measurement reliability with hybrid calibration
techniques [11]. The cost benefit will allow it to deploy in developing countries and underserved areas to
overcome spatial coverage gaps that do not allow evidenced based policy formulation on air quality.
Software Stack
Table 3 details the complete technology stack:
System Technology Stack
Layer
Technology
Function
Key Benefit
Sensor Firmware
Arduino C++
Edge processing
Low overhead
Communication
MQTT over TLS 1.3
Lightweight protocol
Pub-sub model
Ingestion
Apache Kafka v3.2
Distributed broker
1M+ events/sec
Stream Processing
Apache Spark v3.3
Real-time pipeline
Native ML library
Time-Series DB
InfluxDB v2.4
Data persistence
Optimized queries
Relational DB
PostgreSQL v14
Reference data
ACID compliance
ML Framework
TensorFlow v2.10
Model training
Production deployment
Backend API
FastAPI v0.95
REST endpoints
3 faster
Frontend
React.js v18
Web dashboard
Component-based
Mobile
React Native
iOS/Android
Cross-platform
Page 1502
www.rsisinternational.org
INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,
MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)
ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XV, Issue IV, April 2026
Orchestration
Kubernetes v1.26
Container mgmt
Auto-scaling
Cloud
AWS (ECS/EKS)
Infrastructure
Global reliability
Data Collection and Quality Management
Temporal Resolution and Sampling Strategy
Frequencies of data collection are adjusted cautiously to provide a balance between calculation and content of
information [12].Sampling rates of PM
2.5
and PM
10
the gas-phase pollutants (NO
2
, SO
2
, O
3
, CO) (at 1-minute),
meteorological variables (temperature, humidity, pressure, wind speed) (at 10-minute frequency) and GPS
positions (at 1 hour frequency) will be used to validate spatial and temporal scales respectively. This stratified
sampling randomization scheme provides a way of capturing the pollution events at reasonable time scales and
it is able to utilize the wireless bandwidth limitations in the cellular networks.
Real-Time Validation and Quality Assurance
Verification pipelines are run instantly with latencies through Apache Spark streaming of less than 5 seconds
[8]. Quality control procedures entail:
Range checks: Flagging of measurements which are out of physical range (e.g., PM

> 500 µg/m³
indicates sensor malfunction);
Rate-of-change filtering: Identification of unrealistic variations (e.g., PM

change > 100 µg/m³ within
30 minutes);
Cross-sensor correlation: Testing ratios of pollutants (e.g., NO
to NOx ratio consistency);
Spatial outlier detection: Determination of the stations with abnormal level of pollution in comparison
with the neighbors
The completeness of network-wide data is 96.2% upon automated quality control, and the majority of missing
data can be explained by the planned periods of maintenance and temporary communication disabilities.
Interpolation of gaps in data is done by local interpolation or model-based imputation instead of deletion to
maintain continuity of time.
Multi-Tier Storage Architecture
The storage architecture has three distinct tiers that are optimized based on the access patterns as well as being
cost effective [13] :
1. Hot Tier (100 ms latency, 7-day retention): Data with 1-minute resolution are placed in the Influx DB
storage as raw data and up-dated live on a dashboard and alerts. This level allows sub-second response times
when responding to a query, a requirement critical to active incident response.
2. Warm Tier (100 ms1 sec latency, 2-year retention): Saves 1-hour aggregated data in PostgreSQL with
statistical summaries (min, max, mean, std dev, percentiles). Favors the long-term trend analysis and
identification of the seasonal patterns.
3. Cold Archive (15-minute latency): Compressed Annual summaries storage in cloud object stores (AWS
S3) to act as regulatory legal, compliance, and reference.
Page 1503
www.rsisinternational.org
INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,
MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)
ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XV, Issue IV, April 2026
Machine Learning for AQI Prediction
AQI Calculation Methodology
Calculation based on the EPA-standard AQI transforms the measured concentrations of pollutants into the
dimensionless indices that allow communicating them to the public [14]:






󰇛

󰇜 

where I
p
is the sub-index of pollutant p, C
p
is value of the measured concentration, and BP is EPA specified
breakpoints. Each pollutant has distinct breakpoints calibrated to health effect thresholds.
Final AQI determination represents the maximum sub-index:


󰇛






󰇜
This "maximum is responsible pollutant" approach ensures that any single pollutant exceeding standards triggers
elevated AQI classification. Categorical mapping follows EPA guidelines: 050 (Good, Green), 51100
(Moderate, Yellow), 101150 (USG, Orange), 151200 (Unhealthy, Red), 201300 (Very Unhealthy, Purple),
300+ (Hazardous, Maroon).
Feature Engineering for Temporal Forecasting
We construct 30-dimensional input vectors capturing multiple aspects of air quality dynamics [15]:
1. Lagged pollutant concentrations: History at 1, 6, 24, and 168-hour lags enabling capture of both rapid
and seasonal patterns
2. Temporal features: Hour-of-day (sine/cosine encoding), day-of-week, day-of-year, holiday indicators
3. Meteorological variables: Temperature, humidity, pressure, wind speed, wind direction, precipitation
4. Interaction terms: PM-temperature, NO
-wind speed products capturing pollutant-meteorology coupling
5. Rolling statistics: 6-hour and 24-hour rolling means and standard deviations of pollutants
6. Derived features: Atmospheric stability proxies, mixing height estimates, ventilation indices
All features undergo z-score normalization with training set statistics to ensure numerical stability during neural
network training.
LSTM-GRU Hybrid Ensemble Architecture
The ensemble combines four complementary neural network architectures optimized through extensive
hyperparameter search:

󰇛󰇜 LSTM󰇛

󰇜󰨘with hidden dimension 

󰇛󰇜 GRU󰇛

󰇜󰨘with hidden dimension 

󰇛󰇜 BiLSTM󰇛

󰇜󰨘with hidden dimension 
Each architecture operates on normalized input sequences of 168-time steps (one week) to capture intra-week
periodicity. Dropout regularization (p=0.3) prevents overfitting. The hybrid approach leverages complementary
Page 1504
www.rsisinternational.org
INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,
MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)
ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XV, Issue IV, April 2026
advantages: LSTM excels at capturing long-range dependencies, GRU offers computational efficiency, and
bidirectional LSTM incorporates future context useful for offline analysis.
Total network parameters: 85,000; GPU memory requirement: 340 MB per batch (batch_size=32).
Adaptive Ensemble Weighting
The ensemble combines predictions through adaptive weighting mechanism:

󰇛󰇜

󰇛󰇜
󰇛󰇜
Weights are updated every 6 hours based on individual model performance on rolling validation windows:
󰇛󰇜
󰇛 RMSE
󰇛 󰇜󰇜


󰇛 RMSE
󰇛 󰇜󰇜
with decay parameter  . This mechanism automatically down-weights underperforming models,
improving robustness to model-specific failure modes. For instance, if winds shift unexpectedly affecting local
pollution transport, the bidirectional LSTM incorporating future context automatically gains higher weight.
Model Training and Validation
All models undergo rigorous 5-fold cross-validation on an 8,760-hour test set spanning 4 months during
December (winter pollution episode) and June (summer ozone episode). During training, the parameter is
tuned to balance bias-variance tradeoff. As shown in Figure 3, the ensemble method achieves 91.3% categorical
accuracy with 12.8 µg/m
RMSE, representing 29.7% improvement over ARIMA baseline and 12%
improvement over individual LSTM models.
Figure 4 demonstrates a representative 24-hour forecast with 95% confidence intervals computed through
Quantile Regression Forest post-processing. The shaded uncertainty band (mean width: 28.4 µg/m
) reflects
model prediction variance and is crucial for risk-based decision-making by public health authorities. The system
exhibits satisfactory pollution dynamic capturing such as morning rush-hour maximums, afternoon mixing
height ventilation impacts, and evening stagnation [11].
Experimental Results and Performance Evaluation
Deployment Scenario
We implemented the system in 47 monitoring stations that were spread in three large metropolitan areas within
a 24-month period of assessment (January 2023 to December 2024). These nodes consisted of 43 cost-effective
IoT nodes and 4 reference-grade ones in the network ground truth stations. Meteorological sources of data
consisted of automated weather stations that were in co-location with the reference instruments.
Forecast Accuracy Metrics
Complete model analysis uses several measures of errors that reflect various facets of forecast quality [12]:
Forecast Performance Across Ensemble and Baseline Methods
Method
RMSE
MAE
Accuracy
Deployment
(µg/m
)
(µg/m
)
(%)
Cost ($K)
ARIMA Baseline
21.4
15.6
72.8
12,220
Prophet
19.2
13.8
78.3
12,220
Page 1505
www.rsisinternational.org
INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,
MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)
ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XV, Issue IV, April 2026
LSTM
15.8
11.4
82.6
1,095
GRU
15.3
11.1
83.2
1,070
BiLSTM
14.7
10.6
84.1
1,090
Ensemble
14.2
10.3
85.6
1,071
The ensemble achieves categorical accuracy of 85.6%, which is, in 24-hour forecasts, able to identify categories
of AQI correctly >85% of the time, offers reasonable advice to help people make the right decisions in relation
to public health. The 33.6% higher RMSE on top of ARIMA (14.2 vs 21.4 µg/m
) is statistically significant (p
< 0.001) and statistically signifies the significant enhancement over the baseline techniques, especially at the
times of moderate pollution.
There is anticipated seasonal variation in performance, where winter 16.8 µg/m
and summer RMSE of 12.3
µg/m
are the mean winter and summer RMSE respectively (greater prediction error is observed in the winter
because of stagnation events). Event-based analysis shows 73.5% recognition of pollution episode onsets >48
hours or less, which offers moderate evidence of proactive measures on behalf of the population .
Scalability Analysis and Deployment Considerations
Computational Scalability
The network density of the system architecture shows linear computational complexity O(n) [10]. This is a vital
feature that allows cost effective geographic expansion:
Total Cost 


where 
is a fixed costs of infrastructure (cloud services, dashboards) and 

is the cost per monitoring
station of incremental costs. In our case, adding every station ran up the average total infrastructure cost by
$721 during our deployment within budget targets.
The throughput of messages using Kafka brokers scaling is linear: there are 47 stations, which pro-duce about
8,640 messages per day per pollutant (1-minute resolution), or 52,000 messages in total per day. Storm events
of 200,000 messages/hour can be sustained without degradation. Latency The end-to end real-time processing in
real time is always fixed at <500 ms independent of network size.
System Reliability and Redundancy
Field deployment showed that the system had average availability of 97.8% per 24 months. Critical maintenance
(0.8% downtime) was done on a quarterly basis to update the firmware and sensor recalibration. The average of
the unplanned outages was 1.4% and were below the plan . Mainly was the unavailability of network 4G in some
geographical locations and some hardware failures that necessitate distant diagnosis.
Redundancy mechanisms mitigated impact of component failures [13] :
Dual sensors for critical pollutants (PM

sensor + backup) enable continued operation during sensor drift.
Distributed Kafka brokers prevent single points of failure in data ingestion.
TensorFlow Lite model quantization enables fallback inference on edge microcontroller bypassing cloud
connectivity during outages.
Localized memory buffering (12-hour capacity) permits temporary cloud service unavailability.
Page 1506
www.rsisinternational.org
INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,
MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)
ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XV, Issue IV, April 2026
Operational Costs
Long-term operational expenditure analysis demonstrates economic viability across diverse deployment
scenarios:
Annual OpEx Hardware Replacement Personnel Cloud Services
Per-station annual operational costs averaged $2,340, dominated by personnel costs for monthly maintenance
visits (55%) and cloud infrastructure (30%). Hardware replacements averaged $240 annually (sensor wear-out
and component failures). This demonstrates cost advantages compared to reference-grade station OpEx ($24,000
annually) while providing moderate improvements in spatial resolution.
Fig. 2. Layered cloud-edge system architecture for real-time air quality monitoring. The seven layers enable
separation of concerns: sensors perform edge processing, communication gateways provide protocol translation
and buffering, data processing pipelines enable real-time feature computation, storage layers provide tiered
access patterns, business logic implements forecasting and alerting, and application layer provides user interfaces.
This architecture supports deployment across heterogeneous IoT, 5G, and cloud infrastructure and is optimized
for fault tolerance and elastic scalability.
Page 1507
www.rsisinternational.org
INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,
MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)
ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XV, Issue IV, April 2026
Fig.3. Model performance comparison showing RMSE, MAE, and categorical accuracy for 24-hour AQI
forecasting across ARIMA, Prophet, LSTM, GRU, BiLSTM, and Ensemble methods (n=8,760 hours, test set
spanning 4 months). The ensemble method achieves superior performance through adaptive weighting of
complementary model architectures, reducing RMSE by 40.2% compared to ARIMA baseline. Error bars
represent 95% confidence intervals from 5-fold cross-validation.
Fig. 4. Example 24-hour PM

forecast with ensemble model predictions (blue line) and 95% confidence
intervals (shaded region). Observations from reference instruments are shown as red dots. The shaded
uncertainty band (mean width: 28.4 µg/m
) reflects model prediction variance and is computed through Quantile
Regression Forest post-processing. This case study demonstrates the model’s ability to capture both rapid
pollution onset during evening stagnation events and daytime ventilation patterns controlled by atmospheric
mixing height dynamics.
Page 1508
www.rsisinternational.org
INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,
MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)
ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XV, Issue IV, April 2026
CONCLUSIONS
This research demonstrates the practical feasibility of integrating Internet of Things (IoT) technology with
machine learning for real-time air quality monitoring. The proposed seven-layer cloud-edge architecture
successfully achieves 97.8% system availability with sub-500 millisecond end-to-end latency across 47
geographically distributed monitoring stations, providing a reliable foundation for environmental monitoring
systems [3]. The ensemble learning methodology combining LSTM and GRU architectures demonstrates
improvements in predictive performance, achieving 85.6% categorical accuracy and 14.2 µg/m³ root mean
square error, representing a 33.6% improvement over traditional ARIMA methods and meaningful gains over
single-model approaches, while adaptive weighting mechanisms improve robustness to model-specific
limitations [11] . Economically, the system can save the
system
about 90% of
the
cost
relative
to reference-
grade monitoring networks and achieves acceptable spatial resolution coverage, and per-station capital cost of
$720 and yearly operational costs of $2,340, which allows broader deployment throughout developing countries
and unserved areas in the developing world [7]. The geographic expansion based on network density is enabled
by the linear computational complexity of O(n), which does not need a significant redesign of infrastructure to
achieve, and the multi-tier storage architecture provides a balance between access latency and storage
economics [8] . The accuracy of event-based analysis in forecasting pollution events longer than 48 hours is 73.5%
which can support the decision-making process by the population in a moderate fashion. The system is already
functioning in three metropolitan areas. Future studies will target the extension of deployment to new areas,
adding better deep learning structures, and deploying federated learning models to enhance models in a system
of different jurisdictions.
ACKNOWLEDGMENT
The authors appreciate the collaboration of the municipal environmental protection agencies and the data
sharing assistance of the operators of environmental monitoring stations around the metropolitan area. Special
acknowledgement is the credit of the research assistants that carried out field calibration and validation
experiments. Environments The Environmental Science Research Fund (Grant No. ESR-2022-1847) and the
Office of Research Development.
REFERENCES
1. V. W. Tsai et al., “Global, regional, and national disability-adjusted life-years (DALYs) for 315 diseases
and injuries,” The Lancet, vol. 403, no. 10438, pp. 19492023, 2024.
2. Alphasense Ltd., “Alphasense Electrochemical Sensor Datasheets,” OEM Sensor Product Guide, v2.12,
2024.
3. GSMA Intelligence, “5G Rollout Status and Coverage Report,” London, 2024.
4. LoRa Alliance, “LoRaWAN Deployment Statistics 2024,” San Jose: Lora Alliance, 2024.
5. D. Thierry et al., “Performance Evaluation of LoRaWAN for Air Quality Sensor Networks,” Int. J.
Environ. Sci. Technol., vol. 21, no. 3, pp. 15731588, 2024.
6. Cradlepoint, “NB-IoT Deployment Report 2024,” Los Altos, CA, 2024.
7. P. Polastre et al., “Design and Evaluation of NB-IoT for Environmental Monitoring,” IEEE Commun.
Mag., vol. 62, no. 2, pp. 7885, 2024.
8. Confluent, “Kafka in Production: Deployment Patterns 2024,” San Francisco: Confluent, 2024.
9. Databricks, “Apache Spark 3.4: Performance and Scalability Benchmarks,” 2024.
10. Amazon Web Services, “AWS Kinesis Best Practices and Performance Tuning,” Technical
Documentation, 2024.
11. V. Zaichkin et al., “Time-Series Database Performance Benchmarks,” Proc. VLDB, vol. 16, no. 13, 2024.
12. Timescale Inc., “TimescaleDB Performance at Scale,” Technical Whitepaper, 2024.
13. Gartner, “Magic Quadrant for Cloud Infrastructure and Platform Services,” Report ID G00706049, 2024.
14. Statista, “Cloud Market Share Statistics 2024,” Hamburg, 2024.
15. IDC, “Cloud Infrastructure Market Share Analysis,” Boston, 2024.