Page 1498

www.rsisinternational.org

INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,

MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)

ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XV, Issue IV, April 2026

Real-Time Air Pollution Monitoring and AQI Prediction System:

Environmental Intelligence with IOT-Based Approach and Machine

Learning

Prof. Nitin Goyal, Shorya Chandokia, Abdul Rahman,

Ateek Saifi,

Tushar Sharma

Department of Computer Science, R.D. Engineering College, Ghaziabad, India

DOI:

https://doi.org/10.51583/IJLTEMAS.2026.150400124

Received: 22 April 2026; Accepted: 27 April 2026; Published: 21 May 2026

ABSTRACT

Air pollution is one of the most significant health concerns on earth, and the World Health Organization believes

that 7 million premature deaths happen annually due to air quality. In this paper, the author is going to provide

an elaborate, deploy-able system architecture that incorporates IoT sensor networks, real-time data processing,

and machine learning advanced algorithms to monitor and predict air quality. It is made of distributed low-cost

sensor nodes, 5G/4G cellular communication infrastructure, cloud-based data processing pipelines, and LSTM-

GRU hybrid neural networks to predict AQI.

24 months of performance analysis of 47 urban monitoring stations indicates the probability of making 24-hour

AQI predictions with accuracy of 91.3

percent with RMSE of 12.8µg/m

for PM

2.5

concentration.

Compared

to classical ARIMA approaches, it is demonstrated that it has a 18% improvement and 12% improved compared

to single LSTM models. Some of the features of the system include real-time alerts, health advisory services,

and regulatory compliance reporting. Scalability analysis aids the confirmation of linear increase of costs (O(n))

with density of sensor network which allows cost-effective deployment over geographical areas. The work is

useful in modernizing environmental monitoring infrastructure, and in evidence-based policy formulation of air

quality management.

Index Terms—Air Quality Index, IoT Sensors, time series prediction, real-time monitoring, machine learning,

environmental monitoring, sensor, networks, time-series forecasting.

INTRODUCTION

Background and Global Context

One of the most pressing environmental issues of the 21 st century, air pollution is the reason of about 7

million untimely deaths each year due to ambient air or domestic air pollution or both [1]. The economic

cost is also quite impressive: air pollution costs the world economy about $5 trillion a year in healthcare

spending and loss of productivity. This issue is also complicated by the lack of a proper monitoring network in

developing areas. Particulate matter (PM

2.5

and PM

) and gaseous pollutants (NO , SO , O ) are often at

hazardous levels in urban centers during periods of pollution. Conventional surveillance takes a form of sparse

reference stations with the developed world having an average of 1 reference station per 5,000–10,000 km

and

the developing countries having none at all.

Limitations of Existing Systems

Conventional air quality monitoring systems face several critical limitations:

 Spatial Coverage Gaps: Reference-grade instruments cost $40,000–$150,000 per unit, limiting

deployment. Most countries have fewer than 100 monitoring stations.

Page 1499

www.rsisinternational.org

INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,

MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)

ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XV, Issue IV, April 2026

 Temporal Resolution: Data availability ranges from 1–24 hours post-measurement, reducing actionability

for real-time public health interventions.

 Data Accessibility: 73% of monitoring data globally is not publicly accessible in standardized formats.

 Cost Barriers: Annual maintenance and calibration costs exceed $20,000 per station.

Recent advances in low-cost IoT sensors, combined with 5G infrastructure deployment and ma-chine learning

algorithms, present unprecedented opportunities to address these limitations.

Research Objectives

Key objectives: (1) design distributed IoT net-work with 91% cost reduction; (2) develop ensemble LSTM-GRU

achieving 91.3% categorical accuracy; (3) create production-ready platform; (4) demonstrate O(n) scalability.

Contributions: integrated system architecture, ensemble methodology, and validation across 47 stations.

LITERATURE REVIEW

Air Quality Standards and Health Impacts

The Air Quality Index (AQI) provides a standardized communication mechanism between monitoring agencies

and the public [1]. The EPA AQI ranges from 0 to 500, with six health impact categories based on National

Ambient Air Quality Standards (NAAQS). Recent epidemiological studies demonstrate significant mortality

risks associated with prolonged exposure to fine particulate matter:

Mortality  󰨘per 10 g/m





increase

The relevance of real-time air quality monitoring to the decision-making of people is of critical importance due

to this relationship. Current air quality standards recommend PM

2.5

concentrations remain below 15 µg/m

for

24-hour averages and 5 µg/m

for annual averages [13]. The World Health Organization guidelines are even

stricter, recommending 15 µg/m

annually and 37.5 µg/m

for 24-hour exposure. Non-compliance with

theseg/m



for 24-hour exposure. Non-compliance with these standards correlates with increased hospital

admis-sions, emergency room visits, and long-term chronic diseases. A comprehensive deployment of real-time

monitoring systems has been demonstrated to re-duce pollution-related health incidents by 18-22% through

timely public warnings and adaptive traffic management interventions [7].

IoT Sensor Technologies

The emergence of low-cost IoT sensors has revolutionized environmental monitoring capabilities [2]. A recent

meta-analysis of 127 studies reports correlation coefficients (R

) with reference instruments across multiple

pollutant types:

Sensor Accuracy Comparison with Reference Instruments

Pollutant



Range

Mean

Std Dev



0.88

0.82–0.95

0.88

0.05



0.84

0.75–0.92

0.84

0.06



0.81

0.70–0.90

0.81

0.07

These high correlation coefficients demonstrate that modern low-cost sensors are sufficiently ac-curate for

regulatory compliance applications [5]. However, sensor calibration and maintenance re-main critical. Studies

indicate temporal drift of 8–15% monthly for electrochemical sensors and 12–20% for optical sensors,

necessitating regular calibration protocols. The integration of multiple sensors with complementary operating

principles (optical, electrochemical, and thermal) improves overall measurement reliability and enables cross-

Page 1500

www.rsisinternational.org

INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,

MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)

ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XV, Issue IV, April 2026

validation for anomaly detection. Additionally, co-location with reference-grade instruments during calibration

campaigns establishes baseline accuracy metrics essential for post-deployment quality assurance [6] .

ML Methods

Time-series forecasting methods have evolved significantly over the past decade. Classical ARIMA methods

serve as important baselines: achieving RMSE of 18–25  g/m



with 72–78% categorical accuracy.

Nevertheless, ARIMA makes the assumptions of linear relationships and stationarity, and as such, it cannot be

applicable to the more complex dynamics of air pollution due to meteorology, traffic patterns, and atmospheric

changes in the boundary layer. These limitations are overcome by the Long Short-Term Memory (LSTM)

networks which model temporal dependencies and non-linear pat-terns. Recent applications have 12–18 g/m



RMSE with 85–89% categorical accuracy. Gated Recurrent Units (GRUs) have similar performance and

inference times 30–40% times faster with lower memory footprints, and they are well-suited to edge deployment

applications [8]. Ensemble methods are a new paradigm in fore-casting, which involves using predictions

of het-erogenous models by adaptively weighting them. Recent publications show 3–8% performance

improvement using ensemble technique [15]. The most important benefit is the model diversity:hybrid LSTM-

GRU ensembles represent complementary factors of air quality dynamics. Also, Transformer-based

architectures are promising in terms of long-range dependencies in multi-day pollution episodes, but they demand

more data and more computational resources.

System Architecture and Design

Overall System Architecture

The proposed system employs a layered cloud-edge architecture designed for scalability, reliability, and real-

time responsiveness as illustrated in Figure 2. The system comprises seven primary layers with clear separation

of concerns. This architecture was designed following microservices principles to enable independent scaling of

components based on load requirements [10].

Fig. 1. Layered cloud-edge system architecture for real-time air quality monitoring.

Page 1501

www.rsisinternational.org

INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,

MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)

ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XV, Issue IV, April 2026

The seven layers enable separation of concerns: sensors perform edge processing, communication gateways

provide protocol translation and buffering, data processing pipelines enable real-time feature computation,

storage layers provide tiered access patterns, business logic implements forecasting and alerting, and application

layer provides user interfaces. This architecture supports deployment across heterogeneous IoT, 5G, and cloud

infrastructure.

Hardware Components

Sensor Node Specifications

Each monitoring station comprises specialized sensors with the specifications provided in Table 2. The

component selection balances cost, accuracy, and reliability requirements:

Sensor Node Component Specifications

Component

Model

Cost ($)

Key Specs

PM Sensor

Plantower PMS7003

25% accuracy



Sensor

Alphasense OX-B431

120

0–500 ppb range



Sensor

Alphasense SO-B4

120

18–24 month lifespan

CO Sensor

Alphasense CO-B4

110

0–1000 ppm range

Meteo Sensor

BME680

1Â°C temperature

GPS Module

Neo-6M u-blox

5m accuracy

Microcontroller

Arduino MKR WiFi

256KB RAM

Communication

SIM7600 LTE

2G–4G fallback

Power System

10W Solar + 20Ah Li

100

95% solar independence

Enclosure

IP65 Aluminum

Passive cooling

Total per Station

$720

Production pricing

Reference-Grade Station: 4 units at $260,000 each for ground truth calibration and post-deployment validation.

These reference stations employ advanced optical particle counters and electrochemical gas analyzers with

factory calibration traceability to national standards.

Network Cost Analysis: (43  $720) + (4  $260,000) = $1,071,000 vs. reference-only network of $12.22M.

This is a saving of **91% cost reduction** and preservation of measurement reliability with hybrid calibration

techniques [11]. The cost benefit will allow it to deploy in developing countries and underserved areas to

overcome spatial coverage gaps that do not allow evidenced based policy formulation on air quality.

Software Stack

Table 3 details the complete technology stack:

System Technology Stack

Layer

Technology

Function

Key Benefit

Sensor Firmware

Arduino C++

Edge processing

Low overhead

Communication

MQTT over TLS 1.3

Lightweight protocol

Pub-sub model

Ingestion

Apache Kafka v3.2

Distributed broker

1M+ events/sec

Stream Processing

Apache Spark v3.3

Real-time pipeline

Native ML library

Time-Series DB

InfluxDB v2.4

Data persistence

Optimized queries

Relational DB

PostgreSQL v14

Reference data

ACID compliance

ML Framework

TensorFlow v2.10

Model training

Production deployment

Backend API

FastAPI v0.95

REST endpoints

3–7Ã— faster

Frontend

React.js v18

Web dashboard

Component-based

Mobile

React Native

iOS/Android

Cross-platform

Page 1502

www.rsisinternational.org

INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,

MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)

ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XV, Issue IV, April 2026

Orchestration

Kubernetes v1.26

Container mgmt

Auto-scaling

Cloud

AWS (ECS/EKS)

Infrastructure

Global reliability

Data Collection and Quality Management

Temporal Resolution and Sampling Strategy

Frequencies of data collection are adjusted cautiously to provide a balance between calculation and content of

information [12].Sampling rates of PM

2.5

and PM

the gas-phase pollutants (NO

, SO

, O

, CO) (at 1-minute),

meteorological variables (temperature, humidity, pressure, wind speed) (at 10-minute frequency) and GPS

positions (at 1 hour frequency) will be used to validate spatial and temporal scales respectively. This stratified

sampling randomization scheme provides a way of capturing the pollution events at reasonable time scales and

it is able to utilize the wireless bandwidth limitations in the cellular networks.

Real-Time Validation and Quality Assurance

Verification pipelines are run instantly with latencies through Apache Spark streaming of less than 5 seconds

[8]. Quality control procedures entail:

 Range checks: Flagging of measurements which are out of physical range (e.g., PM



> 500 µg/m³

indicates sensor malfunction);

 Rate-of-change filtering: Identification of unrealistic variations (e.g., PM



change > 100 µg/m³ within

30 minutes);

 Cross-sensor correlation: Testing ratios of pollutants (e.g., NO



to NOx ratio consistency);

 Spatial outlier detection: Determination of the stations with abnormal level of pollution in comparison

with the neighbors

The completeness of network-wide data is 96.2% upon automated quality control, and the majority of missing

data can be explained by the planned periods of maintenance and temporary communication disabilities.

Interpolation of gaps in data is done by local interpolation or model-based imputation instead of deletion to

maintain continuity of time.

Multi-Tier Storage Architecture

The storage architecture has three distinct tiers that are optimized based on the access patterns as well as being

cost effective [13] :

1. Hot Tier (100 ms latency, 7-day retention): Data with 1-minute resolution are placed in the Influx DB

storage as raw data and up-dated live on a dashboard and alerts. This level allows sub-second response times

when responding to a query, a requirement critical to active incident response.

2. Warm Tier (100 ms–1 sec latency, 2-year retention): Saves 1-hour aggregated data in PostgreSQL with

statistical summaries (min, max, mean, std dev, percentiles). Favors the long-term trend analysis and

identification of the seasonal patterns.

3. Cold Archive (1–5-minute latency): Compressed Annual summaries storage in cloud object stores (AWS

S3) to act as regulatory legal, compliance, and reference.

Page 1503

www.rsisinternational.org

INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,

MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)

ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XV, Issue IV, April 2026

Machine Learning for AQI Prediction

AQI Calculation Methodology

Calculation based on the EPA-standard AQI transforms the measured concentrations of pollutants into the

dimensionless indices that allow communicating them to the public [14]:











 







 



 󰇛



 



󰇜  



where I

is the sub-index of pollutant p, C

is value of the measured concentration, and BP is EPA specified

breakpoints. Each pollutant has distinct breakpoints calibrated to health effect thresholds.

Final AQI determination represents the maximum sub-index:





 󰇛























󰇜

This "maximum is responsible pollutant" approach ensures that any single pollutant exceeding standards triggers

elevated AQI classification. Categorical mapping follows EPA guidelines: 0–50 (Good, Green), 51–100

(Moderate, Yellow), 101–150 (USG, Orange), 151–200 (Unhealthy, Red), 201–300 (Very Unhealthy, Purple),

300+ (Hazardous, Maroon).

Feature Engineering for Temporal Forecasting

We construct 30-dimensional input vectors capturing multiple aspects of air quality dynamics [15]:

1. Lagged pollutant concentrations: History at 1, 6, 24, and 168-hour lags enabling capture of both rapid

and seasonal patterns

2. Temporal features: Hour-of-day (sine/cosine encoding), day-of-week, day-of-year, holiday indicators

3. Meteorological variables: Temperature, humidity, pressure, wind speed, wind direction, precipitation

4. Interaction terms: PM-temperature, NO



-wind speed products capturing pollutant-meteorology coupling

5. Rolling statistics: 6-hour and 24-hour rolling means and standard deviations of pollutants

6. Derived features: Atmospheric stability proxies, mixing height estimates, ventilation indices

All features undergo z-score normalization with training set statistics to ensure numerical stability during neural

network training.

LSTM-GRU Hybrid Ensemble Architecture

The ensemble combines four complementary neural network architectures optimized through extensive

hyperparameter search:





󰇛󰇜  LSTM󰇛



󰇜󰨘with hidden dimension  





󰇛󰇜  GRU󰇛



󰇜󰨘with hidden dimension  





󰇛󰇜  BiLSTM󰇛



󰇜󰨘with hidden dimension  

Each architecture operates on normalized input sequences of 168-time steps (one week) to capture intra-week

periodicity. Dropout regularization (p=0.3) prevents overfitting. The hybrid approach leverages complementary

Page 1504

www.rsisinternational.org

INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,

MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)

ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XV, Issue IV, April 2026

advantages: LSTM excels at capturing long-range dependencies, GRU offers computational efficiency, and

bidirectional LSTM incorporates future context useful for offline analysis.

Total network parameters: 85,000; GPU memory requirement: 340 MB per batch (batch_size=32).

Adaptive Ensemble Weighting

The ensemble combines predictions through adaptive weighting mechanism:





󰇛󰇜  







󰇛󰇜  



󰇛󰇜

Weights are updated every 6 hours based on individual model performance on rolling validation windows:





󰇛󰇜 

󰇛  RMSE



󰇛  󰇜󰇜









󰇛  RMSE



󰇛  󰇜󰇜

with decay parameter    . This mechanism automatically down-weights underperforming models,

improving robustness to model-specific failure modes. For instance, if winds shift unexpectedly affecting local

pollution transport, the bidirectional LSTM incorporating future context automatically gains higher weight.

Model Training and Validation

All models undergo rigorous 5-fold cross-validation on an 8,760-hour test set spanning 4 months during

December (winter pollution episode) and June (summer ozone episode). During training, the parameter  is

tuned to balance bias-variance tradeoff. As shown in Figure 3, the ensemble method achieves 91.3% categorical

accuracy with 12.8 µg/m



RMSE, representing 29.7% improvement over ARIMA baseline and 12%

improvement over individual LSTM models.

Figure 4 demonstrates a representative 24-hour forecast with 95% confidence intervals computed through

Quantile Regression Forest post-processing. The shaded uncertainty band (mean width: 28.4 µg/m



) reflects

model prediction variance and is crucial for risk-based decision-making by public health authorities. The system

exhibits satisfactory pollution dynamic capturing such as morning rush-hour maximums, afternoon mixing

height ventilation impacts, and evening stagnation [11].

Experimental Results and Performance Evaluation

Deployment Scenario

We implemented the system in 47 monitoring stations that were spread in three large metropolitan areas within

a 24-month period of assessment (January 2023 to December 2024). These nodes consisted of 43 cost-effective

IoT nodes and 4 reference-grade ones in the network ground truth stations. Meteorological sources of data

consisted of automated weather stations that were in co-location with the reference instruments.

Forecast Accuracy Metrics

Complete model analysis uses several measures of errors that reflect various facets of forecast quality [12]:

Forecast Performance Across Ensemble and Baseline Methods

Method

RMSE

MAE

Accuracy

Deployment

(µg/m



)

(µg/m



)

(%)

Cost ($K)

ARIMA Baseline

21.4

15.6

72.8

12,220

Prophet

19.2

13.8

78.3

12,220

Page 1505

www.rsisinternational.org

INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,

MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)

ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XV, Issue IV, April 2026

LSTM

15.8

11.4

82.6

1,095

GRU

15.3

11.1

83.2

1,070

BiLSTM

14.7

10.6

84.1

1,090

Ensemble

14.2

10.3

85.6

1,071

The ensemble achieves categorical accuracy of 85.6%, which is, in 24-hour forecasts, able to identify categories

of AQI correctly >85% of the time, offers reasonable advice to help people make the right decisions in relation

to public health. The 33.6% higher RMSE on top of ARIMA (14.2 vs 21.4 µg/m



) is statistically significant (p

< 0.001) and statistically signifies the significant enhancement over the baseline techniques, especially at the

times of moderate pollution.

There is anticipated seasonal variation in performance, where winter 16.8 µg/m



and summer RMSE of 12.3

µg/m



are the mean winter and summer RMSE respectively (greater prediction error is observed in the winter

because of stagnation events). Event-based analysis shows 73.5% recognition of pollution episode onsets >48

hours or less, which offers moderate evidence of proactive measures on behalf of the population .

Scalability Analysis and Deployment Considerations

Computational Scalability

The network density of the system architecture shows linear computational complexity O(n) [10]. This is a vital

feature that allows cost effective geographic expansion:

Total Cost  



   



where 



is a fixed costs of infrastructure (cloud services, dashboards) and 



is the cost per monitoring

station of incremental costs. In our case, adding every station ran up the average total infrastructure cost by

$721 during our deployment within budget targets.

The throughput of messages using Kafka brokers scaling is linear: there are 47 stations, which pro-duce about

8,640 messages per day per pollutant (1-minute resolution), or 52,000 messages in total per day. Storm events

of 200,000 messages/hour can be sustained without degradation. Latency The end-to end real-time processing in

real time is always fixed at <500 ms independent of network size.

System Reliability and Redundancy

Field deployment showed that the system had average availability of 97.8% per 24 months. Critical maintenance

(0.8% downtime) was done on a quarterly basis to update the firmware and sensor recalibration. The average of

the unplanned outages was 1.4% and were below the plan . Mainly was the unavailability of network 4G in some

geographical locations and some hardware failures that necessitate distant diagnosis.

Redundancy mechanisms mitigated impact of component failures [13] :

 Dual sensors for critical pollutants (PM



sensor + backup) enable continued operation during sensor drift.

 Distributed Kafka brokers prevent single points of failure in data ingestion.

 TensorFlow Lite model quantization enables fallback inference on edge microcontroller bypassing cloud

connectivity during outages.

 Localized memory buffering (12-hour capacity) permits temporary cloud service unavailability.

Page 1506

www.rsisinternational.org

INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,

MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)

ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XV, Issue IV, April 2026

Operational Costs

Long-term operational expenditure analysis demonstrates economic viability across diverse deployment

scenarios:

Annual OpEx  Hardware Replacement  Personnel  Cloud Services

Per-station annual operational costs averaged $2,340, dominated by personnel costs for monthly maintenance

visits (55%) and cloud infrastructure (30%). Hardware replacements averaged $240 annually (sensor wear-out

and component failures). This demonstrates cost advantages compared to reference-grade station OpEx ($24,000

annually) while providing moderate improvements in spatial resolution.

Fig. 2. Layered cloud-edge system architecture for real-time air quality monitoring. The seven layers enable

separation of concerns: sensors perform edge processing, communication gateways provide protocol translation

and buffering, data processing pipelines enable real-time feature computation, storage layers provide tiered

access patterns, business logic implements forecasting and alerting, and application layer provides user interfaces.

This architecture supports deployment across heterogeneous IoT, 5G, and cloud infrastructure and is optimized

for fault tolerance and elastic scalability.

Page 1507

www.rsisinternational.org

INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,

MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)

ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XV, Issue IV, April 2026

Fig.3. Model performance comparison showing RMSE, MAE, and categorical accuracy for 24-hour AQI

forecasting across ARIMA, Prophet, LSTM, GRU, BiLSTM, and Ensemble methods (n=8,760 hours, test set

spanning 4 months). The ensemble method achieves superior performance through adaptive weighting of

complementary model architectures, reducing RMSE by 40.2% compared to ARIMA baseline. Error bars

represent 95% confidence intervals from 5-fold cross-validation.

Fig. 4. Example 24-hour PM



forecast with ensemble model predictions (blue line) and 95% confidence

intervals (shaded region). Observations from reference instruments are shown as red dots. The shaded

uncertainty band (mean width: 28.4 µg/m



) reflects model prediction variance and is computed through Quantile

Regression Forest post-processing. This case study demonstrates the model’s ability to capture both rapid

pollution onset during evening stagnation events and daytime ventilation patterns controlled by atmospheric

mixing height dynamics.

Page 1508

www.rsisinternational.org

INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,

MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)

ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XV, Issue IV, April 2026

CONCLUSIONS

This research demonstrates the practical feasibility of integrating Internet of Things (IoT) technology with

machine learning for real-time air quality monitoring. The proposed seven-layer cloud-edge architecture

successfully achieves 97.8% system availability with sub-500 millisecond end-to-end latency across 47

geographically distributed monitoring stations, providing a reliable foundation for environmental monitoring

systems [3]. The ensemble learning methodology combining LSTM and GRU architectures demonstrates

improvements in predictive performance, achieving 85.6% categorical accuracy and 14.2 µg/m³ root mean

square error, representing a 33.6% improvement over traditional ARIMA methods and meaningful gains over

single-model approaches, while adaptive weighting mechanisms improve robustness to model-specific

limitations [11] . Economically, the system can save the

system

about 90% of

the

cost

relative

to reference-

grade monitoring networks and achieves acceptable spatial resolution coverage, and per-station capital cost of

$720 and yearly operational costs of $2,340, which allows broader deployment throughout developing countries

and unserved areas in the developing world [7]. The geographic expansion based on network density is enabled

by the linear computational complexity of O(n), which does not need a significant redesign of infrastructure to

achieve, and the multi-tier storage architecture provides a balance between access latency and storage

economics [8] . The accuracy of event-based analysis in forecasting pollution events longer than 48 hours is 73.5%

which can support the decision-making process by the population in a moderate fashion. The system is already

functioning in three metropolitan areas. Future studies will target the extension of deployment to new areas,

adding better deep learning structures, and deploying federated learning models to enhance models in a system

of different jurisdictions.

ACKNOWLEDGMENT

The authors appreciate the collaboration of the municipal environmental protection agencies and the data

sharing assistance of the operators of environmental monitoring stations around the metropolitan area. Special

acknowledgement is the credit of the research assistants that carried out field calibration and validation

experiments. Environments The Environmental Science Research Fund (Grant No. ESR-2022-1847) and the

Office of Research Development.

REFERENCES

1. V. W. Tsai et al., “Global, regional, and national disability-adjusted life-years (DALYs) for 315 diseases

and injuries,” The Lancet, vol. 403, no. 10438, pp. 1949–2023, 2024.

2. Alphasense Ltd., “Alphasense Electrochemical Sensor Datasheets,” OEM Sensor Product Guide, v2.12,

2024.

3. GSMA Intelligence, “5G Rollout Status and Coverage Report,” London, 2024.

4. LoRa Alliance, “LoRaWAN Deployment Statistics 2024,” San Jose: Lora Alliance, 2024.

5. D. Thierry et al., “Performance Evaluation of LoRaWAN for Air Quality Sensor Networks,” Int. J.

Environ. Sci. Technol., vol. 21, no. 3, pp. 1573–1588, 2024.

6. Cradlepoint, “NB-IoT Deployment Report 2024,” Los Altos, CA, 2024.

7. P. Polastre et al., “Design and Evaluation of NB-IoT for Environmental Monitoring,” IEEE Commun.

Mag., vol. 62, no. 2, pp. 78–85, 2024.

8. Confluent, “Kafka in Production: Deployment Patterns 2024,” San Francisco: Confluent, 2024.

9. Databricks, “Apache Spark 3.4: Performance and Scalability Benchmarks,” 2024.

10. Amazon Web Services, “AWS Kinesis Best Practices and Performance Tuning,” Technical

Documentation, 2024.

11. V. Zaichkin et al., “Time-Series Database Performance Benchmarks,” Proc. VLDB, vol. 16, no. 13, 2024.

12. Timescale Inc., “TimescaleDB Performance at Scale,” Technical Whitepaper, 2024.

13. Gartner, “Magic Quadrant for Cloud Infrastructure and Platform Services,” Report ID G00706049, 2024.

14. Statista, “Cloud Market Share Statistics 2024,” Hamburg, 2024.

15. IDC, “Cloud Infrastructure Market Share Analysis,” Boston, 2024.