INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,

MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)

ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XIV, Issue XII, December 2025

Centrality Measures in QSPR Modelling of Antiviral Compounds for

COVID-19

Pavithra M and Veena Mathad

Department of Studies in Mathematics, University of Mysore, Manasagangotri, Mysuru, India

DOI : https://doi.org/10.51583/IJLTEMAS.2025.1412000101

Received: 28 December 2025; Accepted: 02 January 2026; Published: 09 January 2026

ABSTRACT:

The Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2), causing COVID-19, lacks specific

antiviral treatments, escalating the global health crisis. This study employs Quantitative Structure-Property

Relationship (QSPR) modelling to explore eight physicochemical properties of antiviral compounds, including

Arbidol, Chloroquine, Hydroxychloroquine, Lopinavir, Remdesivir, Ritonavir, Thalidomide, and Theaflavin.

Centrality measures, used as molecular descriptors, quantify the relationship between molecular structure and

physicochemical attributes in QSPR studies.

To address missing data for Remdesivir’s Boiling Point (BP), Enthalpy of Vaporisation (E), and Flash Point

(FP), correlation-based linear regression imputation was applied using descriptors like Normalised Harmonic

Centrality Weight (r = 0.976 for BP, 0.973 for E) and Eccentricity Weight (r = 0.957 for FP), ensuring dataset

integrity. Nine graph-based centrality measures were evaluated for their correlation with the physicochemical

properties of these drugs. Pearson correlation analysis revealed strong positive correlations, notably Normalised

Harmonic Centrality Weight with BP (0.978), E (0.974), and Polar Surface Area (PSA) (0.894), and Eccentricity

Weight with Flash Point (0.959), Molar Refractivity (MR) (0.975), and Molar Volume (MV) (0.923).

Conversely, Total Closeness Centrality Weight and Leverage Centrality Weight showed significant negative

correlations (below −0.5). Single-predictor linear regression models were developed, with robustness assessed

via predictive R² using leave-one-out cross-validation and the PRESS statistic. These models offer interpretable

predictions of structural influences on physicochemical behaviour, aiding pharmaceutical researchers in

predicting antiviral drug properties for COVID-19 before experimental validation.

Keywords: Centrality Measures, QSPR Modelling, Antiviral Drugs, Physicochemical Properties, Molecular

Descriptors, Predictive Modelling

Mathematics Subject Classification AMS (2000): 05C12, 05C38.

INTRODUCTION

The global outbreak of COVID-19, caused by Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-

2), originated in Wuhan, China, on 31st December 2019 [1], and was classified as a pandemic by the World

Health Organisation on 11th March 2020 [2]. The lack of specific antiviral agents poses a critical barrier to

effective management of the disease, necessitating rapid identification of viable therapeutic options [3]. Antiviral

compounds previously developed for infections such as SARS, MERS, and Influenza are currently under

evaluation for their potential against SARS-CoV-2 [4, 5]. These include Arbidol [6], Chloroquine [7],

Hydroxychloroquine [5], Lopinavir [8], Ritonavir [9], Thalidomide [10], Theaflavin [11], and Remdesivir [12],

selected based on their pharmacological properties and preliminary indication of efficacy.

The foundation for selecting these compounds is rooted in their established mechanisms. Lopinavir and Ritonavir

inhibit coronavirus proteases (papain-like and 3C-like), with clinical data suggesting reduced mortality in severe

cases [13]. Arbidol, employed in Russia and China, targets Influenza A, Influenza B, and Hepatitis C, prompting

its investigation despite limited global approval for COVID-19 [9]. Thalidomide’s immunomodulatory effects

make it suitable for addressing inflammatory complications [10]. Chloroquine and Hydroxychloroquine,


traditionally antimalarial drugs, exhibit antiviral and immune-modulating activities [7]. Theaflavin, derived from

tea, shows early antiviral potential [11]. Remdesivir, originally for Ebola, disrupts SARS-CoV-2 replication,

with trials indicating therapeutic promise [12, 14]. A preliminary study combining Lopinavir/Ritonavir, Arbidol,

and Shufeng Jiedu Capsule (a Chinese herbal remedy) reported clinical benefits in patients, supporting

exploration of combination therapies [6, 15].

This investigation applies Quantitative Structure-Property Relationship (QSPR) modelling, employing distance-

based and degree-based centrality measures derived from molecular graph theory to predict eight

physicochemical properties of the selected compounds: Boiling Point (BP, mmHg), Enthalpy of Vaporisation

(E, kJ/mol), Flash Point (FP, °C), Molar Refractivity (MR, cm³), Polar Surface Area (PSA, Å²), Packing Volume

(P, cm³), Molar Volume (MV, cm³), and Molecular Weight (MW, g/mol) [16, 17]. These centrality measures

quantify structural influences on molecular behaviour, enabling prediction of properties critical for drug design

and reducing dependence on resource-intensive laboratory experiments [18, 19].

A primary obstacle was the unavailability of experimental data for Remdesivir's boiling point, enthalpy of vaporisation, and flash point [20]. QSPR models built on an incomplete dataset are unreliable, because missing values compromise model accuracy [21, 22]. This was resolved through correlation-based linear regression imputation, using strongly correlated molecular descriptors (|r| > 0.95) to maintain dataset integrity [23, 24]. This approach is supported by studies demonstrating its efficacy in predictive modelling [25] and its suitability for datasets with strong variable correlations [26]. While also applicable to longitudinal data [27], its principles extend to QSPR's interdependent molecular descriptors. Compared to modern techniques [28], regression imputation remains a robust choice for our high-correlation context. The methodology integrates statistical analysis via Minitab and advanced data processing through MATLAB, structured in three phases: imputation of missing Remdesivir data, construction of QSPR models using single-predictor linear regression based on centrality measures, and validation of model accuracy via the Predicted Residual Error Sum of Squares (PRESS) statistic [29, 30], obtained from leave-one-out cross-validation. This approach supports robust and generalizable models [31-33].

Complex networks, consisting of nodes and links, are prevalent in fields such as physics, biology, and the social sciences; examples include the internet, social media, and biological networks such as protein-protein interaction networks. Studying these networks is an important area of multidisciplinary research, because identifying the most influential nodes is theoretically and practically significant for understanding how information spreads. Centrality is a key concept in network analysis for determining a node's importance, and centrality measures are among the most common analytical techniques for identifying influential nodes.

Several centrality measures have been proposed to evaluate the influence of nodes in networks, including degree, betweenness, closeness, leverage, harmonic, and eigenvector centrality. Degree Centrality ranks nodes by the number of connections they have, with those having more connections being considered more influential (1) [34]; two related measures, the Total Degree Centrality Weight (DW(G)) and the Degree Centrality Weight (DCW(G)), measure the overall heterogeneity of the network's degree distribution, reflecting how evenly connections are distributed across nodes (11, 12). Closeness Centrality measures how quickly a node can spread information to other nodes in the network (2) [35], while the Total Closeness Centrality Weight (TCW(G)) and Closeness Centrality Weight (CW(G)) are global measures that characterise the network's overall compactness and communication efficiency (4, 5). Leverage Centrality considers a node's degree relative to its neighbours, operating on the principle that a node is central if its neighbours depend on it for information (6) [36, 37]; the Leverage Centrality Weight (LW(G)) serves as a collective measure of the network's structural dependency (13). Harmonic Centrality calculates a node's importance by summing the inverses of its geodesic distances to all other nodes (7) [38]; the corresponding Harmonic Centrality Weight (HW(G)) and Normalised Harmonic Centrality Weight (HCW(G)) quantify the network's global influence potential and overall efficiency (14, 15). Finally, Eccentricity Centrality identifies the "weakest links" in a network, which can be targeted for either disruption or strengthening to improve flexibility (10) [39]; the Eccentricity Weight (EW(G)) provides insight


into the longest paths within the network (16), while the Eccentricity Centrality Weight (ECW(G)) quantifies the network's overall flexibility by summing the reciprocals of these path lengths (17).

Mathematical Preliminaries and Centrality Measures

Let G = (V(G), E(G)) be a simple, connected, finite, and undirected graph, where V(G) is the set of vertices and E(G) is the set of edges. The numbers of vertices and edges are denoted by n = |V(G)| and m = |E(G)|, respectively. The degree of a vertex v, denoted by d(v), is the number of edges incident to v. The distance between two vertices u and v, denoted by d(u, v), is the length of a shortest path connecting them in G.

Centrality Measures

1.1.1 Degree Centrality (D_C(v)) [34]: Degree Centrality is a measure based on the number of connections a node has. The degree centrality of a vertex v ∈ V(G) is defined as the number of vertices adjacent to v, which is simply the degree of v. This value is divided by the maximum possible degree of a vertex, n − 1, to normalise it. So, the Normalised Degree Centrality (DC(v)) of the vertex v is given by:

\[ DC(v) = \frac{d(v)}{n-1} \tag{1} \]

1.1.2 Closeness Centrality (C_C(u)) [35]: Closeness Centrality is the reciprocal of the sum of distances from a vertex to all other vertices. For a vertex u ∈ V(G) it is defined as:

\[ C_C(u) = \frac{1}{\sum_{v \in V(G),\, v \neq u} d(u,v)} \tag{2} \]

This value is multiplied by the maximum possible degree of a vertex, n − 1, to normalise it. So, the normalised Closeness Centrality (C_G(u)) of a vertex u is given by:

\[ C_G(u) = \frac{n-1}{\sum_{v \in V(G),\, v \neq u} d(u,v)} \tag{3} \]

The Total Closeness Centrality Weight (TCW(G)) of G is:

\[ TCW(G) = \sum_{k=1}^{n} C_C(v_k) \tag{4} \]

The Closeness Centrality Weight (CW(G)) of G is:

\[ CW(G) = \sum_{k=1}^{n} C_G(v_k) \tag{5} \]

1.1.3 Leverage Centrality (L(v)) [36, 37]: Leverage Centrality measures the relationship between a vertex's degree and the degrees of its neighbours. For a vertex v ∈ V(G), it is defined as:

\[ L(v) = \frac{1}{d(v)} \sum_{u \in N_v} \frac{d(v) - d(u)}{d(v) + d(u)} \tag{6} \]

where N_v is the set of neighbours of v.

1.1.4 Harmonic Centrality (HC(u)) [38]: Harmonic Centrality is the sum of the inverses of the distances from a vertex to all other vertices. For a vertex u ∈ V(G), it is given by:

\[ HC(u) = \sum_{v \in V(G),\, v \neq u} \frac{1}{d(u,v)} \tag{7} \]

The Normalised Harmonic Centrality (NHC(u)) of a vertex u is:

\[ NHC(u) = \frac{1}{n-1} \sum_{v \in V(G),\, v \neq u} \frac{1}{d(u,v)} \tag{8} \]


1.1.5 Eccentricity (e(v)) [39]: The eccentricity of a vertex v, denoted by e(v), is the maximum distance from v to any other vertex in the graph:

\[ e(v) = \max_{u \in V(G)} d(v,u) \tag{9} \]

Eccentricity Centrality (EC(v)) is the reciprocal of the eccentricity of a vertex. For a vertex v ∈ V(G), it is defined as:

\[ EC(v) = \frac{1}{e(v)} = \frac{1}{\max_{u \in V(G)} d(v,u)} \tag{10} \]

The Total Degree Centrality Weight (DW(G)) of G is a measure of the network's heterogeneity, defined as the sum of the degrees of all vertices, which is equal to 2m:

\[ DW(G) = \sum_{v \in V(G)} d(v) = 2m \tag{11} \]

The Degree Centrality Weight (DCW(G)) of G is:

\[ DCW(G) = \sum_{k=1}^{n} DC(v_k) \tag{12} \]

The Leverage Centrality Weight (LW(G)) of G is defined as the sum of the leverage centralities of all vertices:

\[ LW(G) = \sum_{k=1}^{n} L(v_k) \tag{13} \]

The Harmonic Centrality Weight (HW(G)) of G is:

\[ HW(G) = \sum_{u \in V(G)} HC(u) \tag{14} \]

The Normalised Harmonic Centrality Weight (HCW(G)) of G is:

\[ HCW(G) = \sum_{u \in V(G)} NHC(u) \tag{15} \]

The Eccentricity Weight (EW(G)) of G is the sum of the eccentricities of all vertices:

\[ EW(G) = \sum_{v \in V(G)} e(v) \tag{16} \]

The Eccentricity Centrality Weight (ECW(G)) of G is the sum of the eccentricity centralities of all vertices:

\[ ECW(G) = \sum_{v \in V(G)} EC(v) = \sum_{v \in V(G)} \frac{1}{e(v)} \tag{17} \]
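The nine graph-level weights defined in Eqs. (4), (5) and (11)-(17) can be computed directly from an adjacency list. The following is a minimal sketch, not the authors' implementation: the function names `bfs_distances` and `centrality_weights` and the three-vertex path used as input are illustrative assumptions, and the graph is assumed to be simple and connected with at least two vertices.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths from source to every vertex (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def centrality_weights(adj):
    """Graph-level weights of Eqs. (4), (5), (11)-(17); adj is a hypothetical
    adjacency-list dict of a simple connected graph with n >= 2 vertices."""
    n = len(adj)
    deg = {v: len(adj[v]) for v in adj}
    tcw = hw = ew = ecw = lw = 0.0
    for v in adj:
        dist = bfs_distances(adj, v)
        others = [d for u, d in dist.items() if u != v]
        tcw += 1.0 / sum(others)             # closeness centrality, Eq. (2)
        hw += sum(1.0 / d for d in others)   # harmonic centrality, Eq. (7)
        ecc = max(others)                    # eccentricity, Eq. (9)
        ew += ecc
        ecw += 1.0 / ecc                     # eccentricity centrality, Eq. (10)
        # leverage centrality, Eq. (6)
        lw += sum((deg[v] - deg[u]) / (deg[v] + deg[u]) for u in adj[v]) / deg[v]
    dw = float(sum(deg.values()))            # equals 2m, Eq. (11)
    return {
        "DW": dw, "DCW": dw / (n - 1),
        "TCW": tcw, "CW": (n - 1) * tcw,
        "LW": lw,
        "HW": hw, "HCW": hw / (n - 1),
        "EW": ew, "ECW": ecw,
    }


# Illustrative input: the path graph 0 - 1 - 2 (not one of the drug graphs).
path3 = {0: [1], 1: [0, 2], 2: [1]}
weights = centrality_weights(path3)
```

For a molecular graph, the vertices would be the non-hydrogen atoms and the edges the bonds between them, which is consistent with the vertex counts implied by Table 2 (for example, HW(G)/HCW(G) = 28 = n − 1 for Arbidol).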

This research leverages fundamental principles of complex network analysis to characterise chemical structures. By employing a suite of well-established centrality measures, including degree, closeness, leverage, harmonic, and eccentricity centrality, the structural importance of individual atoms and their relationships within a molecular graph can be quantified. This network-based characterisation using molecular graphs provides a powerful set of descriptors for QSPR analysis (Table 1). The present work outlines a methodology that integrates these centrality-based descriptors with statistical models. This integration supports a deeper understanding of molecular properties and their application in designing novel drugs, with a particular focus on the urgent need for targeted therapeutics for diseases like COVID-19.

MATERIALS AND METHODS

Data Collection and Preprocessing

The dataset for this study is taken from Kara et al. [20] and comprises eight diverse drug compounds: Arbidol, Chloroquine, Hydroxychloroquine, Lopinavir, Remdesivir, Ritonavir, Thalidomide, and Theaflavin. For each compound, eight key physicochemical properties were considered: Boiling Point (BP, mmHg), Enthalpy of Vaporisation (E, kJ/mol), Flash Point (FP, °C), Molar Refractivity (MR, cm³), Polar Surface Area (PSA, Å²), Packing Volume (P, cm³), Molar Volume (MV, cm³), and Molecular Weight (MW, g/mol). Nine molecular graph-based centrality measures were used as descriptors: Total Degree Centrality Weight (DW(G)), Degree Centrality Weight (DCW(G)), Total Closeness Centrality Weight (TCW(G)), Closeness Centrality Weight (CW(G)), Leverage Centrality Weight (LW(G)), Harmonic Centrality Weight (HW(G)), Normalised Harmonic Centrality Weight (HCW(G)), Eccentricity Weight (EW(G)), and Eccentricity Centrality Weight (ECW(G)). Tables 2 and 3 contain the descriptor values and the experimental and estimated physicochemical properties for the drugs, respectively.

Table 1: Antiviral Compounds with Corresponding Molecular and Graph Representations for QSPR Analysis

Columns: Molecular Structure; Molecular Graph Representation (one structure image and one graph image per compound).

Arbidol (C₂₂H₂₅O₃N₂SBr)
Chloroquine (C₁₈H₂₆N₃Cl)
Hydroxychloroquine (C₁₈H₂₆N₃ClO)
Lopinavir (C₃₇H₄₈O₅N₄)
Remdesivir (C₂₇H₃₅O₈N₆P)
Ritonavir (C₃₇H₄₈N₆O₅S₂)
Thalidomide (C₁₃H₁₀O₄N₂)
Theaflavin (C₂₉H₂₄O₁₂)

Heatmaps were generated to visualise the standardised molecular descriptors and physicochemical properties of the eight antiviral drugs (Figure 1 & Figure 2). The heatmaps use a blue-to-yellow colour gradient to highlight value differences. High descriptor values, such as the elevated DW(G) and EW(G) for Ritonavir and Lopinavir, indicate greater molecular complexity, while high property values, like MW and PSA for Ritonavir, suggest increased size or polarity. These visualisations reveal structural and physicochemical patterns relevant to QSPR modelling, assisting in the prediction of drug activity and chemo-kinetic behaviour.

To mitigate the impact of disparate scales across the raw dataset, continuous predictor variables were standardised for the linear regression modelling process by subtracting the mean and then dividing by the standard deviation. This ensured that centrality measures contributed equitably to the determination of model coefficients, preventing undue bias from variables with larger absolute magnitudes. The physicochemical properties, serving as response variables, were retained in their original units for direct interpretability of predictions.
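The standardisation step can be illustrated with a short sketch. It applies the z-score transform to the DW(G) column as read from Table 2; the helper name `standardise` is hypothetical, and the use of the sample standard deviation (divisor n − 1) is an assumption, chosen because it reproduces the σ = 26.01 reported for DW(G) in Table 6.

```python
from statistics import mean, stdev

# DW(G) values as read from Table 2, in the order Arbidol, Chloroquine,
# Hydroxychloroquine, Lopinavir, Remdesivir, Ritonavir, Thalidomide, Theaflavin.
dw = [62.0, 46.0, 48.0, 98.0, 88.0, 106.0, 42.0, 92.0]


def standardise(values):
    """Return z-scores: subtract the mean, divide by the sample standard deviation."""
    mu = mean(values)
    sigma = stdev(values)  # sample SD (divisor n - 1); an assumption
    return [(x - mu) / sigma for x in values]


z_dw = standardise(dw)
```

After the transform, every descriptor has mean 0 and unit standard deviation, so regression coefficients of different descriptors become directly comparable.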

Table 2: Centrality Measures for Molecular Structure Representation in QSPR Modelling of Drug Compounds

| Drugs | DW(G) | DCW(G) | TCW(G) | CW(G) | HW(G) | HCW(G) | LW(G) | EW(G) | ECW(G) |
|---|---|---|---|---|---|---|---|---|---|
| Arbidol | 62.000 | 2.213 | 0.209 | 5.883 | 233.574 | 8.340 | -2.865 | 274.000 | 3.165 |
| Chloroquine | 46.000 | 2.190 | 0.207 | 4.367 | 143.501 | 6.833 | -1.399 | 224.000 | 2.234 |
| Hydroxychloroquine | 48.000 | 2.166 | 0.198 | 4.356 | 153.523 | 6.979 | -2.198 | 250.000 | 2.188 |
| Lopinavir | 98.000 | 2.177 | 0.135 | 6.073 | 427.476 | 9.509 | -3.635 | 653.000 | 3.334 |
| Remdesivir | 88.000 | 2.200 | 0.144 | 5.790 | 369.316 | 9.238 | -3.623 | 574.000 | 3.013 |
| Ritonavir | 106.000 | 2.163 | 0.121 | 5.954 | 468.286 | 9.556 | -4.467 | 844.000 | 3.058 |
| Thalidomide | 42.000 | 2.333 | 0.289 | 5.210 | 127.795 | 7.099 | -1.499 | 135.000 | 2.763 |
| Theaflavin | 92.000 | 2.300 | 0.207 | 6.707 | 403.023 | 10.077 | -4.134 | 481.000 | 3.599 |

Figure 1: Molecular Descriptors of Antiviral Drugs for QSPR Analysis.

Table 3: Experimental and Estimated Physicochemical Properties of Drugs Used in QSPR Analysis

| Drugs | BP (mmHg) | E (kJ/mol) | FP (°C) | MR (cm³) | PSA (Å²) | P (cm³) | MV (cm³) | MW (g/mol) |
|---|---|---|---|---|---|---|---|---|
| Arbidol | 591.80 | 91.50 | 311.7 | 121.90 | 80.00 | 48.30 | 347.30 | 477.40 |
| Chloroquine | 460.60 | 72.10 | 232.3 | 97.40 | 28.20 | 38.60 | 287.90 | 319.90 |
| Hydroxychloroquine | 516.70 | 83.00 | 266.3 | 99.00 | 48.40 | 39.20 | 285.40 | 335.90 |
| Lopinavir | 924.20 | 140.80 | 512.7 | 179.20 | 120.00 | 71.00 | 540.50 | 628.80 |
| Remdesivir | - | - | - | 149.50 | 204.00 | 59.30 | 409.00 | 620.60 |
| Ritonavir | 947.00 | 144.40 | 526.6 | 198.90 | 202.00 | 78.90 | 581.70 | 720.90 |
| Thalidomide | 487.80 | 79.40 | 248.8 | 65.20 | 83.60 | 25.90 | 161.00 | 258.23 |
| Theaflavin | 1003.9 | 156.50 | 336.5 | 137.30 | 218.00 | 54.40 | 301.00 | 564.50 |

Note: BP: Boiling Point; E: Enthalpy of Vaporisation; FP: Flash Point; MR: Molar Refractivity; PSA: Polar

Surface Area; P: Packing Volume; MV: Molar Volume; MW: Molecular Weight. Missing values were estimated

via correlation-based regression.

Figure 2: Physicochemical Properties of Antiviral Drugs for QSPR Modelling

Imputation of Missing Physicochemical Properties

Missing values for Boiling Point (BP), Enthalpy of Vaporisation (E), and Flash Point (FP) for Remdesivir were imputed to complete the dataset for subsequent QSPR modelling. The imputation used simple linear regression models chosen by Pearson correlation, exploiting the strongest linear relationships observed between these properties and the molecular descriptors. Descriptors were selected for imputation on the basis of the highest absolute Pearson correlation coefficients (|r| > 0.95) obtained from preliminary correlation analyses (Table 4). Specifically, the HCW(G) centrality measure was selected for BP (r = 0.976) and E (r = 0.973), while EW(G) was chosen for FP (r = 0.957). Linear regression models were then fitted using these highly correlated predictors, utilising data from the other seven drugs in the dataset.
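The imputation step for BP can be sketched as an ordinary least-squares fit on the seven drugs with a reported boiling point, followed by a prediction at Remdesivir's HCW(G) value. This is an illustration rather than the authors' Minitab/MATLAB workflow: the helper `fit_line` is hypothetical, and the input pairs are the HCW(G) and BP values as read from Tables 2 and 3.

```python
from math import sqrt

# (HCW(G), BP) for Arbidol, Chloroquine, Hydroxychloroquine, Lopinavir,
# Ritonavir, Thalidomide, Theaflavin (Remdesivir excluded: BP is missing).
hcw = [8.340, 6.833, 6.979, 9.509, 9.556, 7.099, 10.077]
bp = [591.80, 460.60, 516.70, 924.20, 947.00, 487.80, 1003.90]


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, Pearson r)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return a, b, sxy / sqrt(sxx * syy)


intercept, slope, r = fit_line(hcw, bp)

# Predict the missing value at Remdesivir's HCW(G) (value taken from Table 5).
bp_remdesivir = intercept + slope * 9.238375
```

Under these inputs the fit should land close to the reported relationship BP = -717 + 170.4 x HCW(G) with r near 0.976; small differences from the tabulated estimate are expected because the published coefficients are rounded.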

Table 4a: Pearson Correlation Coefficients for Properties Used in Remdesivir Data Imputation.

퐷푊(퐺) 퐷퐶푊(퐺) 푇퐶푊(퐺) 퐶푊(퐺) 퐻푊(퐺) 퐻퐶푊(퐺) 퐿푊(퐺) 퐸푊(퐺)

-0.258

-0.804

퐷퐶푊(퐺)

푇퐶푊(퐺)

0.769


0.793

1

0.273

-0.234

-0.03

-0.33

-0.788

-0.632

0.738

-0.887

-0.28

-0.66

-0.63

-0.85

-0.92

-0.499

-0.92

-0.939

-0.86

퐶푊(퐺)

퐻푊(퐺)

퐻퐶푊(퐺)

퐿푊(퐺)

퐸푊(퐺)

퐸퐶푊(퐺)

BP

0.808

0.927

-0.828

0.61

0.953

-0.963

0.959

0.749

0.97

0.959

-0.965

0.951

0.765

0.975

0.964

0.909

0.941

0.856

0.941

0.831

0.974

0.209

-0.436

0.303

-0.056

-0.02

-0.959

0.83

-0.897

-0.772

-0.954

-0.948

-0.83

0.992

0.852

0.855

0.608

0.621

0.817

0.621

0.445

0.749

0.895

0.976^*

0.973^*

0.782

0.831

0.894

0.831

0.674

0.919

0.551

0.874

0.856

0.957^*

0.975

0.763

0.975

0.923

0.958

E

0.958

0.917

0.949

0.85

FP

-0.421

-0.525

0.129

-0.524

-0.673

-0.375

MR

-0.895

-0.883

-0.895

-0.76

PSA

P

0.949

0.844

0.978

MV

MW

-0.957

Note: Values represent Pearson correlation coefficients between pairs of physicochemical properties and

molecular descriptors deduced from Table 1 & Table 2. Bolded values indicate the highest correlation used for

regression-based estimation of missing data.

Table 4b: Pearson Correlation Coefficients for Properties Used in Remdesivir Data Imputation (columns

continued).

BP

E

FP

MR

PSA

P

MV

퐸퐶푊(퐺)

퐷퐶푊(퐺)

푇퐶푊(퐺)

퐶푊(퐺)

퐻푊(퐺)

퐻퐶푊(퐺)

퐿푊(퐺)

퐸푊(퐺)

퐸퐶푊(퐺)

BP

0.817


0.999

E

0.819

0.582

0.577

0.746

0.577

0.411

0.698

FP

0.815 0.793

MR

PSA

P

0.849 0.825 0.955

0.915 0.925 0.649 0.675

0.849 0.825 0.955 1

0.676

MV

MW

0.696 0.664 0.945 0.967 0.487 0.967

0.912 0.894 0.926 0.969 0.812 0.969 0.889

Note: Values represent Pearson correlation coefficients between pairs of physicochemical properties and

molecular descriptors deduced from Table 1 & Table 2. Bolded values indicate the highest correlation used for

regression-based estimation of missing data.

Figure 3: Regression Plots Illustrating Descriptor-Property Relationships Utilized for Remdesivir Data

Imputation.

The Pearson correlation coefficients for the properties and descriptors used in this imputation are presented in Table 4. The resulting regression equations were applied to Remdesivir's known HCW(G) and EW(G) values to estimate its missing BP, E, and FP values for subsequent quantitative structure-property relationship modelling. Table 5 summarises these imputed values. Scatter plots of the descriptor-property relationships used for imputation, together with the fitted regression lines, are presented in Figure 3.


Table 5: Imputation of Missing Physicochemical Properties for Remdesivir Using Correlation-Based

Regression.

| Property | Strongest Correlated Descriptor | Regression Equation | Input Descriptor Value | Estimated Property Value |
|---|---|---|---|---|
| Boiling Point (BP), in mmHg | HCW(G) | BP = -717 + 170.4 x HCW(G) | HCW(G) = 9.238375 | 857.22 mmHg |
| Enthalpy of Vaporisation (E), in kJ/mol | HCW(G) | E = -100.5 + 25.20 x HCW(G) | HCW(G) = 9.238375 | 132.31 kJ/mol |
| Flash Point (FP), in °C | EW(G) | FP = 163.4 + 0.4512 x EW(G) | EW(G) = 574 | 422.39 °C |
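As a quick arithmetic check of Table 5, substituting Remdesivir's descriptor values into the three regression equations (coefficients and inputs exactly as reported in the table) gives:

\[
\begin{aligned}
BP &= -717 + 170.4 \times 9.238375 \approx 857.22,\\
E  &= -100.5 + 25.20 \times 9.238375 \approx 132.31,\\
FP &= 163.4 + 0.4512 \times 574 \approx 422.39.
\end{aligned}
\]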

Computational and Statistical Parameters

All statistical analyses and regression model development were performed using statistical software, leveraging its robust capabilities for linear regression. For each physicochemical property, candidate QSPR models were developed by identifying the most statistically significant single centrality measure descriptor.

For all developed linear regression models, the predictive performance was assessed using the coefficient of determination (R²), the adjusted R² (R²_adj), and, critically, the predictive R² (R²_pred). R²_pred was calculated using the PRESS (Predicted Residual Error Sum of Squares) statistic, which is derived from a leave-one-out cross-validation approach. In this method, each data point is successively removed from the dataset, and a model is built using the remaining n − 1 data points to predict the removed observation. This process is repeated for all data points (eight observations for each of the nine measures after data imputation), providing an estimate of the model's predictive ability on data not used for fitting. A higher R²_pred indicates a more robust and generalizable model. The standard error of regression (S) and the F-statistic with its associated p-value were also reported to evaluate model fit and overall statistical significance. The significance of individual predictor coefficients was assessed via p-values (p < 0.05). The resulting regression equations were used to calculate predicted values for the physicochemical properties, and these predictions were compared with the actual experimental properties to assess model accuracy and identify deviations.
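The leave-one-out procedure can be sketched as follows. The code refits a single-predictor line with each observation held out, accumulates PRESS, and forms the predictive R² as 1 − PRESS/SST, which is the usual definition and is assumed here to match the one used by the statistical software. The function names are hypothetical, and the EW(G) and MR inputs are the values as read from Tables 2 and 3.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b


def r2_and_r2_pred(x, y):
    """Return (R^2, predictive R^2, PRESS) for a single-predictor linear model."""
    a, b = fit_line(x, y)
    my = sum(y) / len(y)
    sst = sum((yi - my) ** 2 for yi in y)
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    press = 0.0
    for i in range(len(x)):
        # Refit without observation i, then predict the held-out point.
        a_i, b_i = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (a_i + b_i * x[i])) ** 2
    return 1.0 - sse / sst, 1.0 - press / sst, press


# EW(G) and MR for the eight drugs (Arbidol ... Theaflavin, Table order).
ew = [274.0, 224.0, 250.0, 653.0, 574.0, 844.0, 135.0, 481.0]
mr = [121.90, 97.40, 99.00, 179.20, 149.50, 198.90, 65.20, 137.30]
r2, r2_pred, press = r2_and_r2_pred(ew, mr)
```

Because each held-out residual is at least as large in magnitude as the corresponding ordinary residual, PRESS is never smaller than the residual sum of squares, so the predictive R² cannot exceed R²; a large gap between the two signals a model that depends heavily on individual observations.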

Exploratory Data Analysis (EDA)

Exploratory Data Analysis was conducted to systematically explore the relationships between physicochemical

properties and molecular descriptors, providing both statistical and visual insights crucial for informing

subsequent model selection.

Table 6 presents the fundamental descriptive statistics, including mean (μ), standard deviation (σ), minimum, and maximum values, for all 17 variables (nine centrality measures and eight physicochemical properties) across the eight drug molecules, including Remdesivir. This statistical summary illustrates the variability and range of each descriptor and property within the dataset. Notably, significant scale disparities are evident, such as between DW(G) (μ = 72.75, σ = 26.01) and Boiling Point (BP: μ = 723.7, σ = 230.4).


Table 6: Descriptive Statistics of Centrality Measures and Physicochemical Properties

| Variable | Mean (μ) | Standard Deviation (σ) | Minimum (Min) | Maximum (Max) |
|---|---|---|---|---|
| DW(G) | 72.750 | 26.0100 | 42.0000 | 106.0000 |
| DCW(G) | 2.2178 | 0.0638 | 2.1633 | 2.3330 |
| TCW(G) | 0.1892 | 0.0544 | 0.1215 | 0.2895 |
| CW(G) | 5.5430 | 0.8350 | 4.3560 | 6.7070 |
| HW(G) | 290.80 | 141.1000 | 127.8000 | 468.3000 |
| HCW(G) | 8.4540 | 1.3220 | 6.8330 | 10.0770 |
| LW(G) | -2.978 | 1.1770 | -4.4680 | -1.3990 |
| EW(G) | 429.40 | 248.0000 | 135.0000 | 844.0000 |
| ECW(G) | 2.9190 | 0.5000 | 2.1880 | 3.5990 |
| BP (mmHg) | 723.70 | 230.4000 | 460.6000 | 1003.9000 |
| E (kJ/mol) | 112.50 | 34.2000 | 72.1000 | 156.5000 |
| FP (°C) | 357.20 | 116.6000 | 232.3000 | 526.6000 |
| MR (cm³) | 131.10 | 44.5000 | 65.2000 | 198.9000 |
| PSA (Å²) | 123.00 | 75.4000 | 28.2000 | 218.0000 |
| P (cm³) | 51.950 | 17.6600 | 25.9000 | 78.9000 |
| MV (cm³) | 364.20 | 140.4000 | 161.0000 | 581.7000 |
| MW (g/mol) | 490.80 | 169.8000 | 258.2000 | 720.9000 |

Understanding the linear relationships between all variables was a foundational step. A comprehensive Pearson

correlation analysis was performed on the complete dataset (post-missing data substitution) to elucidate these

relationships. The Pearson correlation coefficient, ranging from -1 to 1, provides a quantitative measure of both

the strength and direction of linear associations (Figure 5).
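For concreteness, the Pearson coefficient of one descriptor-property pair can be computed as below. This is a minimal sketch with a hypothetical helper `pearson`; the inputs are the HCW(G) and PSA values for all eight drugs as read from Tables 2 and 3, for which the paper reports r = 0.894.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


# Order: Arbidol, Chloroquine, Hydroxychloroquine, Lopinavir, Remdesivir,
# Ritonavir, Thalidomide, Theaflavin.
hcw = [8.340, 6.833, 6.979, 9.509, 9.238, 9.556, 7.099, 10.077]
psa = [80.00, 28.20, 48.40, 120.00, 204.00, 202.00, 83.60, 218.00]

r_hcw_psa = pearson(hcw, psa)
```

Applying the same function to every descriptor-property pair yields the full correlation matrix; PSA is a convenient check because none of its values were imputed.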

Initially, the intercorrelations among the eight physicochemical properties themselves were examined. Figure 4 shows the corresponding Pearson correlation matrix, which highlights strong linear relationships (e.g., Boiling Point with Enthalpy of Vaporisation, Molar Refractivity with Packing Volume). While indicative of shared molecular influences, these inter-property relationships also contribute to collinearity, a significant consideration for direct multi-linear regression (MLR) modelling, especially with the limited available dataset.


Figure 4: Pearson Correlation Matrix of Physicochemical Properties

Further, the linear relationships between the nine centrality measures (DW(G), DCW(G), TCW(G), CW(G), LW(G), HW(G), HCW(G), EW(G), ECW(G)) and the eight physicochemical properties (BP, E, FP, MR, PSA, P, MV, MW) were investigated. The full correlation matrix detailing these relationships is presented in Table 7. This table served as the primary tool for identifying promising descriptors for each property. The individual contributions of each centrality measure to the prediction of specific physicochemical properties are further visualised through line charts of their respective Pearson correlation coefficients (Figure 5). Each subplot in Figure 5 illustrates the correlation strength of the nine centrality measures against a single physicochemical property, making it easy to identify the best-correlating descriptors for that property graphically.

Figure 5: Pearson Correlation Coefficients of Molecular Descriptors with Individual Physicochemical

Properties.


Table 7: Pearson Correlation Matrix of Centrality Measures and Physicochemical Properties

| Measures | BP (mmHg) | E (kJ/mol) | FP (°C) | MR (cm³) | PSA (Å²) | P (cm³) | MV (cm³) | MW (g/mol) |
|---|---|---|---|---|---|---|---|---|
| DW(G) | 0.972 | 0.960 | 0.921** | 0.949** | 0.850 | 0.949** | 0.844 | 0.978* |
| DCW(G) | -0.080 | -0.046 | -0.433 | -0.525 | 0.129 | -0.524 | -0.673 | -0.375 |
| TCW(G) | -0.683 | -0.655 | -0.856 | -0.920 | -0.499 | -0.920 | -0.939* | -0.860 |
| CW(G) | 0.851 | 0.853 | 0.615 | 0.621 | 0.817 | 0.621 | 0.445 | 0.749 |
| HW(G) | 0.976** | 0.966** | 0.914 | 0.941 | 0.856 | 0.941 | 0.831 | 0.974** |
| HCW(G) | 0.978* | 0.974* | 0.794 | 0.831 | 0.894* | 0.831 | 0.674 | 0.919 |
| LW(G) | -0.956 | -0.950 | -0.839 | -0.895 | -0.883** | -0.895 | -0.760 | -0.957 |
| EW(G) | 0.881 | 0.864 | 0.959* | 0.975* | 0.763 | 0.975* | 0.923** | 0.958 |
| ECW(G) | 0.810 | 0.812 | 0.582 | 0.577 | 0.746 | 0.577 | 0.411 | 0.698 |

Note: *Highest and **Second highest Pearson correlation values.

Figure 6: Overall Correlation Trends of Molecular Descriptors Across All Physicochemical Properties.

Beyond descriptor-property relationships, the inter-correlation among the centrality descriptors themselves was also assessed, which revealed instances of high collinearity. For instance, DW(G) and HW(G) exhibit a perfect positive correlation ($r = 1.000$), indicating their redundancy within this dataset. The presence of such strong correlations among descriptors was a critical consideration in the subsequent model development phase, guiding the selection of single-predictor models to ensure robustness and avoid multicollinearity issues in the final QSPR models. The overall trends and distribution of correlation coefficients across all properties for each centrality measure are summarised in Figure 6. This figure plots the range of correlation coefficients (from -1 to 1) for each of the nine centrality measures, with reference lines at ±0.5 and 0 to delineate strong positive, strong negative, and negligible correlations. As depicted, six measures (DW(G), CW(G), HW(G), HCW(G), EW(G), ECW(G)) consistently show strong positive correlations across the properties, while TCW(G) and LW(G) primarily exhibit strong negative correlations. DCW(G), however, generally displays weak correlations, lying closest to the zero midline, which indicates its limited predictive utility, as also observed numerically in Table 7. This correlation analysis identified the descriptors most relevant to developing robust linear QSPR models.
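The collinearity screen described above can be sketched in a few lines. This is a minimal illustration rather than the authors' procedure: the function name `collinear_pairs`, the 0.95 threshold, and the three toy descriptors are all assumptions introduced here, and the numbers are not the paper's data.

```python
import numpy as np

def collinear_pairs(X, names, threshold=0.95):
    """Return descriptor pairs whose absolute Pearson r meets the threshold."""
    # np.corrcoef treats rows as variables, so pass the transpose of an
    # (n_samples, n_descriptors) matrix.
    r = np.corrcoef(X.T)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(r[i, j]) >= threshold:
                pairs.append((names[i], names[j], float(r[i, j])))
    return pairs

# Hypothetical descriptor values for five molecules (illustration only).
a = np.array([10.0, 14.0, 19.0, 23.0, 30.0])
X = np.column_stack([
    a,                                      # descriptor "A"
    2.0 * a + 1.0,                          # "B": exact linear function of A
    np.array([3.0, -1.0, 4.0, 1.0, -2.0]),  # "C": only moderately related to A
])
flagged = collinear_pairs(X, ["A", "B", "C"])
print(flagged)
```

Only the pair (A, B) is flagged here, which mirrors the situation reported for DW(G) and HW(G): when two descriptors are perfectly correlated, including both in one model adds no information.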

Regression Model Selection

Prior to model development, a Pearson correlation analysis was conducted to assess the linear relationships between all physicochemical properties and molecular descriptors. The resulting matrix served as an initial guide for identifying strong potential predictors. It also highlighted high inter-correlation among certain descriptors, which informed the decision to favour single-predictor models for robustness and to avoid multicollinearity.

Following this correlation analysis, which identified promising molecular descriptors (Table 7, Figure 5, and Figure 6), single-predictor linear regression models were developed for each of the eight physicochemical properties. The primary objective was to identify the most robust and statistically significant QSPR model for each property. To achieve this, all nine centrality measures were individually evaluated as predictors of each physicochemical property, resulting in 72 simple linear regression models (LRMs). Model performance was evaluated using standard statistical parameters, including the coefficient of determination ($R^2$), adjusted $R^2$ ($R^2_{\mathrm{adj}}$), the standard error of regression (S), the F-statistic, and the model’s p-value. Crucially, the predictive $R^2$ ($R^2_{\mathrm{pred}}$) was used as the primary criterion for assessing each model’s ability to generalise to new, unseen data.
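The statistics listed above can be computed for any single-predictor model as sketched below. This is a hedged illustration using standard textbook formulas, with the predictive $R^2$ obtained from the PRESS statistic via the hat-matrix diagonal; the function name `simple_ols_metrics` and the sample data are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def simple_ols_metrics(x, y):
    """Fit y = b0 + b1*x and return fit and leave-one-out statistics."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    X = np.column_stack([np.ones(n), x])          # design matrix with intercept
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # [b0, b1]
    resid = y - X @ beta
    sse = float(resid @ resid)
    sst = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - 2)
    s = float(np.sqrt(sse / (n - 2)))             # standard error of regression
    f_stat = (sst - sse) / (sse / (n - 2))        # one predictor: df = (1, n - 2)
    # PRESS via the hat-matrix diagonal: e_i / (1 - h_ii) equals the residual
    # obtained when observation i is left out of the fit.
    h = np.einsum("ij,jk,ik->i", X, np.linalg.inv(X.T @ X), X)
    press = float(((resid / (1.0 - h)) ** 2).sum())
    pred_r2 = 1.0 - press / sst
    return {"b0": float(beta[0]), "b1": float(beta[1]), "r2": r2,
            "adj_r2": adj_r2, "s": s, "f": float(f_stat), "pred_r2": pred_r2}

# Hypothetical descriptor/property values for eight compounds (illustration only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([3.1, 4.8, 7.2, 9.1, 10.7, 13.2, 15.1, 16.8])
metrics = simple_ols_metrics(x, y)
print({k: round(v, 4) for k, v in metrics.items()})
```

Because PRESS can never be smaller than the ordinary residual sum of squares, the predictive $R^2$ is always at most $R^2$; a large gap between the two signals a fit that depends heavily on individual observations.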

Table 8: Summary of Optimised Single-Predictor QSPR Models for Physicochemical Properties.

| Property | Optimal Descriptor | Regression Equation (Uncoded) | R-sq. | R-sq. (Adj.) | R-sq. (Pred) | S (Standard Error of Regression) | F-value | p-value (Model) |
|---|---|---|---|---|---|---|---|---|
| BP (mmHg) | HCW(G) | BP = -717 + 170.4 × HCW(G) | 0.9526 | 0.9482 | 0.9358 | 52.4528 | 129.03 | 0.000 |
| E (kJ/mol) | HCW(G) | E = -100.5 + 25.20 × HCW(G) | 0.9489 | 0.9404 | 0.9256 | 8.3483 | 111.46 | 0.000 |
| FP (°C) | EW(G) | FP = 163.4 + 0.4512 × EW(G) | 0.9205 | 0.9073 | 0.8613 | 35.5185 | 69.49 | 0.000 |
| MR (cm³) | EW(G) | MR = 55.92 + 0.1750 × EW(G) | 0.9499 | 0.9416 | 0.9088 | 10.7617 | 113.86 | 0.000 |
| PSA (Å²) | HCW(G) | PSA = -308.0 + 51.0 × HCW(G) | 0.7992 | 0.7657 | 0.6793 | 36.4949 | 23.87 | 0.003 |
| P (cm³) | EW(G) | P = 22.15 + 0.0694 × EW(G) | 0.9507 | 0.9424 | 0.9104 | 4.2362 | 115.59 | 0.000 |
| MV (cm³) | EW(G) | MV = 139.8 + 0.5226 × EW(G) | 0.8524 | 0.8278 | 0.7698 | 58.2660 | 34.65 | 0.001 |
| MW (g/mol) | DW(G) | MW = 26.3 + 6.385 × DW(G) | 0.9569 | 0.9497 | 0.9295 | 38.0859 | 133.05 | 0.000 |

Table 8 summarises the optimal single-predictor QSPR model identified for each physicochemical property. For each property, this table presents the descriptor that yielded the highest predictive performance ($R^2_{\mathrm{pred}}$), along with its corresponding regression equation in uncoded units and key statistical parameters. This consolidated view allows for a direct comparison of the best-performing models across all properties.
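As a quick consistency check on Table 8, the adjusted $R^2$ and the F-value of a single-predictor model follow directly from $R^2$ and the sample size. The worked example below assumes $n = 8$ compounds and one predictor and uses the E row; the formulas are the standard ones for simple linear regression, not expressions quoted from the paper.

```latex
\begin{align*}
R^2_{\mathrm{adj}} &= 1 - (1 - R^2)\,\frac{n-1}{n-2}
  = 1 - (1 - 0.9489)\,\frac{7}{6} \approx 0.9404, \\
F &= \frac{R^2}{1 - R^2}\,(n-2)
  = \frac{0.9489}{0.0511} \times 6 \approx 111.4 .
\end{align*}
```

Both results agree with the tabulated values for E (0.9404 and 111.46) to within the rounding of $R^2$.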

Validation of QSPR Models for Physicochemical Property Prediction

The coefficient of determination ($R^2$), Pearson’s correlation coefficient ($r$), and root mean square error (RMSE) are the key metrics in the predicted versus actual scatter plots (Figure 7), as they quantify the accuracy and reliability of the QSPR models for predicting the physicochemical properties (BP, E, FP, MR, PSA, P, MV, MW) of the eight drugs. $R^2$, obtained as the square of the Pearson correlation ($r$), indicates the proportion of variance in the actual values explained by the model (e.g., $R^2 = 0.837$ for BP shows a strong fit). $r$ measures the linear correlation between actual and predicted values, with values near 1 indicating a strong positive correlation. RMSE, computed as the square root of the mean squared difference between actual and predicted values, quantifies prediction error in the property’s units, where lower values denote higher accuracy. These metrics, displayed on each plot, validate model performance: tight clustering around the 45-degree line and high $R^2$ and $r$ values for BP and E suggest robust predictions, while the larger RMSE for PSA highlights an area for model refinement.
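The three validation metrics can be computed from paired actual and predicted values as sketched below. The function name `validation_metrics` and the four sample values are hypothetical and serve only to illustrate the definitions given above.

```python
import numpy as np

def validation_metrics(actual, predicted):
    """Return Pearson r, its square, and RMSE for paired values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    r = float(np.corrcoef(actual, predicted)[0, 1])
    rmse = float(np.sqrt(np.mean((actual - predicted) ** 2)))
    return {"r": r, "r2": r * r, "rmse": rmse}

# Hypothetical values: every prediction is off by exactly 10 units.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
result = validation_metrics(actual, predicted)
print(result)
```

RMSE is expressed in the units of the property, so values for different properties (for example BP and PSA) are not directly comparable, whereas $r$ and $R^2$ are unitless.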

Figure 7. Scatter Plots of Predicted vs. Actual Physicochemical Properties for QSPR Model Validation, Using Marker Shapes with $R^2$, $r$ and RMSE.


RESULTS

Imputation Outcomes

Missing experimental values for Boiling Point (BP), Enthalpy of Vaporisation (E), and Flash Point (FP) for remdesivir were imputed using correlation-based simple linear regression models. The selection of descriptors for this imputation was guided by strong Pearson correlation coefficients: HCW(G) exhibited a correlation of 0.976 with BP and 0.973 with E, while EW(G) showed a correlation of 0.957 with FP, as detailed in Table 4.

Using remdesivir’s known descriptor values, the following linear regression equations were applied:

• For Boiling Point (BP): BP = -717 + 170.4 × HCW(G)

• For Enthalpy of Vaporisation (E): E = -100.5 + 25.20 × HCW(G)

• For Flash Point (FP): FP = 163.4 + 0.4512 × EW(G)

Applying remdesivir’s HCW(G) value of 9.238375, its estimated Boiling Point was 857.22 mmHg and its estimated Enthalpy of Vaporisation was 132.31 kJ/mol. For Flash Point, using an EW(G) value of 574, the estimated value was 422.39 °C. These imputed values, essential for completing the dataset for subsequent QSPR modelling, are summarised in Table 5. The regression models used for these imputations are shown in Figure 3.
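The imputation arithmetic above can be reproduced directly from the reported coefficients and descriptor values. The sketch below uses only numbers stated in the text; the dictionary layout and the helper name `impute` are illustrative choices, not part of the original workflow.

```python
# (intercept, slope, descriptor) for each imputed property, as reported in the text.
IMPUTATION_MODELS = {
    "BP": (-717.0, 170.4, "HCW"),
    "E": (-100.5, 25.20, "HCW"),
    "FP": (163.4, 0.4512, "EW"),
}

# Remdesivir's descriptor values as reported in the text.
REMDESIVIR = {"HCW": 9.238375, "EW": 574.0}

def impute(models, descriptors):
    """Evaluate each regression equation at the given descriptor values."""
    return {
        prop: intercept + slope * descriptors[name]
        for prop, (intercept, slope, name) in models.items()
    }

imputed = impute(IMPUTATION_MODELS, REMDESIVIR)
for prop, value in imputed.items():
    print(f"{prop}: {value:.2f}")
# BP: 857.22
# E: 132.31
# FP: 422.39
```

The printed values match the estimates quoted above to two decimal places.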

Data Characteristics and Correlations

The descriptive statistics for all 17 variables (nine centrality measures and eight physicochemical properties) across the eight drug molecules, including remdesivir, are presented in Table 6. This summary illustrates the variability and ranges within the dataset, highlighting significant scale disparities.

A comprehensive Pearson correlation analysis was conducted on the complete dataset (post missing data

substitution) to elucidate the linear relationships between all variables. Figure 4 provides a Pearson correlation

matrix of the physicochemical properties themselves, showing strong intercorrelations such as that between

Boiling Point (BP) and Enthalpy of Vaporisation (E).

The full correlation matrix detailing the relationships between the nine centrality measures and the eight physicochemical properties is presented in Table 7. This matrix was instrumental in identifying promising descriptors for QSPR modelling. Key findings include:

• Strong Positive Correlations: HCW(G) exhibited strong positive correlations with BP (0.978), E (0.974), and PSA (0.894). EW(G) demonstrated high positive correlations with FP (0.959), MR (0.975), P (0.975), and MV (0.923). DW(G) showed a substantial positive correlation with MW (0.978).

• Strong Negative Correlations: Conversely, TCW(G) and LW(G) consistently displayed pronounced negative correlations across multiple properties. Notably, TCW(G) was strongly associated with MV (-0.939), MR (-0.920), and P (-0.920), while LW(G) correlated negatively with BP (-0.956), E (-0.950), and MW (-0.957).

• Weak Correlations: DCW(G) generally displayed the weakest correlations, close to zero for BP (-0.080), E (-0.046), and PSA (0.129), indicating limited predictive utility for the studied properties.
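The screening step summarised in these findings amounts to ranking descriptors by the magnitude of their correlation with each property. The sketch below does this with the coefficients transcribed from Table 7; the helper name `strongest_descriptor` and the choice to rank by absolute value are assumptions made for illustration.

```python
DESCRIPTORS = ["DW", "DCW", "TCW", "CW", "HW", "HCW", "LW", "EW", "ECW"]

# Pearson r per property, in DESCRIPTORS order (transcribed from Table 7).
TABLE7_R = {
    "BP":  [0.972, -0.080, -0.683, 0.851, 0.976, 0.978, -0.956, 0.881, 0.810],
    "E":   [0.960, -0.046, -0.655, 0.853, 0.966, 0.974, -0.950, 0.864, 0.812],
    "FP":  [0.921, -0.433, -0.856, 0.615, 0.914, 0.794, -0.839, 0.959, 0.582],
    "MR":  [0.949, -0.525, -0.920, 0.621, 0.941, 0.831, -0.895, 0.975, 0.577],
    "PSA": [0.850, 0.129, -0.499, 0.817, 0.856, 0.894, -0.883, 0.763, 0.746],
    "P":   [0.949, -0.524, -0.920, 0.621, 0.941, 0.831, -0.895, 0.975, 0.577],
    "MV":  [0.844, -0.673, -0.939, 0.445, 0.831, 0.674, -0.760, 0.923, 0.411],
    "MW":  [0.978, -0.375, -0.860, 0.749, 0.974, 0.919, -0.957, 0.958, 0.698],
}

def strongest_descriptor(r_values, names=DESCRIPTORS):
    """Return the (name, r) pair with the largest absolute correlation."""
    i = max(range(len(names)), key=lambda k: abs(r_values[k]))
    return names[i], r_values[i]

best = {prop: strongest_descriptor(r) for prop, r in TABLE7_R.items()}
for prop, (name, r) in best.items():
    print(f"{prop}: {name} (r = {r:+.3f})")
```

Ranking by absolute value selects TCW(G) for MV (r = -0.939), whereas the final model for MV in Table 8 uses EW(G); this is expected, because model selection in the paper is based on the predictive $R^2$ rather than on the correlation coefficient alone.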

The individual contributions and correlation strengths of each centrality measure against specific

physicochemical properties are visually represented through line charts of their respective Pearson correlation

coefficients in Figure 5. Figure 6 further summarises the overall trends and distribution of correlation coefficients


for each centrality measure across all properties, highlighting those consistently showing strong positive or

negative associations.

Furthermore, the intercorrelation among the centrality measure descriptors was assessed. This analysis identified instances of pronounced collinearity, exemplified by a perfect positive correlation ($r = 1.000$) between DW(G) and HW(G), patterns of which are visually represented in the comprehensive correlation matrix of Figure 4. The presence of such strong inter-descriptor correlations guided the selection of single-predictor models for QSPR development, a strategy adopted to ensure model stability and avoid issues associated with multicollinearity.

Optimised QSPR Model Performance

The overall predictive capabilities of the single-predictor QSPR models, assessed by the predictive $R^2$ ($R^2_{\mathrm{pred}}$), are presented in Table 9. This table gives the $R^2_{\mathrm{pred}}$ value for each of the 72 individual models (9 centrality measures × 8 physicochemical properties), measuring their ability to generalise to unseen data through leave-one-out (LOO) cross-validation. Values closer to 1 indicate higher predictive accuracy. Several descriptors yield high $R^2_{\mathrm{pred}}$ values across various properties. For instance, HCW(G) shows exceptional predictive power for BP ($R^2_{\mathrm{pred}} = 0.9358$) and E ($R^2_{\mathrm{pred}} = 0.9256$), while EW(G) demonstrates strong predictive performance for FP ($R^2_{\mathrm{pred}} = 0.8613$), MR ($R^2_{\mathrm{pred}} = 0.9088$), and P ($R^2_{\mathrm{pred}} = 0.9104$). DW(G) also exhibits high predictive accuracy for MW ($R^2_{\mathrm{pred}} = 0.9295$).

Table 9: Predictive $R^2$ ($R^2_{\mathrm{pred}}$) Values for Single-Predictor QSPR Models of Physicochemical Properties.

Drugs

퐷푊(퐺)

퐷퐶푊(퐺) 푇퐶푊(퐺) 퐶푊(퐺)

퐻푊(퐺)

퐻퐶푊(퐺)

퐿푊(퐺)

퐸푊(퐺) 퐸퐶푊(퐺)

BP

(mmHg

)

0.9008

0.0000

0.2310

0.5943

0.8594

0.5803

0.5037

0.9164*

*

0.9358*

E

0.8633

0.0000

0.1817

0.5942

0.8398

0.5353

0.4974

0.881**

0.9256*

(kJ/mol

)

FP (⁰C)

0.0000

0.0833

0.0000

0.2832

0.7305

0.0000

0.7287

0.0000

0.2210

0.5249

0.2190

0.0000

0.3081

0.7145

0.7879

0.5322

0.7883

0.4438

0.5689

0.3945

0.6793*

0.3945

0.0000

0.6897

0.4921

0.6103

0.0000

0.0017

0.3540

0.0016

0.0000

0.2540

0.7428*

*

0.8613

*

MR

(cm³)

0.8158*

*

0.9088

*

PSA

(Ǻ²)

0.5146

0.3403

0.6292*

*

P (cm³)

0.6103

0.1926

0.8427

0.8163*

*

0.9104

*

MV

(cm³)

0.4899

0.7433*

*

0.7698

*

MW

(g/mol)

0.6031

0.8455

0.9295*

0.9143*

*

Note: *Highest and **Second highest predictive efficiency


Figure 8. Comparative Predictive Performance ($R^2_{\mathrm{pred}}$) of Individual Centrality Measures for Each Physicochemical Property.

A visual representation of these predictive performances is provided in Figure 8, which plots the $R^2_{\mathrm{pred}}$ values of all centrality measures for each physicochemical property. The figure shows the best-performing descriptors for each property and the relative predictive strengths and weaknesses of each centrality measure. Notably, the consistently low $R^2_{\mathrm{pred}}$ values observed for DCW(G) across most properties, as well as for TCW(G) and LW(G) in certain cases, further underscore their limited predictive utility in single-predictor models for this dataset. Conversely, measures like HCW(G) and EW(G) frequently occupy the higher end of the predictive spectrum, reinforcing their value as QSPR descriptors. The comparison between a model’s fit ($R^2$) and its predictive ability ($R^2_{\mathrm{pred}}$) is critical: a high $R^2_{\mathrm{pred}}$ indicates that the observed fit is not merely due to overfitting but reflects model robustness and generalisability.

DISCUSSION

Effectiveness of Missing Data Imputation

The imputation of missing experimental data for remdesivir’s Boiling Point (BP), Enthalpy of Vaporisation (E), and Flash Point (FP) was critical for comprehensive QSPR model development. Estimated values were generated by correlation-based linear regression on highly correlated centrality measures. This approach ensured dataset completeness without data exclusion, a crucial advantage for a limited sample size. The substitution of missing data in this way supports the robustness and applicability of the subsequent QSPR analysis and offers a practical strategy for handling incomplete datasets in cheminformatics studies, as reflected in the consistently high $R^2_{\mathrm{pred}}$ values of the resulting models.

Interpretation of Developed QSPR Models

The correlation analysis revealed strong linear relationships between molecular graphs, represented by the nine centrality measures, and the eight physicochemical properties. As presented in Table 7 and Figure 5, HCW(G) and EW(G) consistently showed strong positive correlations ($r \geq 0.9$) with properties like Boiling Point, Enthalpy of Vaporisation, Flash Point, Molar Refractivity, Packing Volume, and Molar Volume. This suggests these measures effectively capture structural features influencing intermolecular forces and bulk properties. Conversely, TCW(G) and LW(G) frequently exhibited strong inverse relationships ($r \leq -0.8$) with properties such as Molar Volume and Molecular Weight, indicating their sensitivity to structural attributes inversely related to these properties.

The selection of optimal single-predictor linear regression models, based on their predictive $R^2$ ($R^2_{\mathrm{pred}}$) derived from LOO cross-validation (Table 8, Table 9), confirmed the predictive power and generalisability of these relationships. The high $R^2_{\mathrm{pred}}$ values demonstrate that even a single, well-chosen centrality measure can establish accurate and interpretable QSPR models for this dataset, elucidating the structural determinants of physicochemical behaviour.

Limitations and Future Directions

A primary limitation of this study is the small sample size of eight drug compounds. While cross-validation using PRESS and the focus on the predictive $R^2$ mitigate overfitting, the few degrees of freedom available (N - 1 = 7) restrict the complexity of models that can be reliably developed and limit the capture of more intricate, multi-descriptor relationships (MLR models). The high collinearity observed among certain descriptors, together with the small sample size, further dictated the focus on single-predictor models for robustness.

Future research should aim to expand the dataset with a larger and more structurally diverse set of compounds. This would facilitate the exploration of advanced QSPR methodologies, including:

• Multiple Linear Regression (MLR): Incorporating multiple, non-redundant centrality measures to potentially improve accuracy and mechanistic interpretability.

• Non-linear and Machine Learning Models: Investigating algorithms capable of capturing complex, non-linear structure-property relationships.

• Expanded Descriptor Sets: Exploring additional classes of molecular descriptors to provide complementary structural information.

These advancements would refine predictive capabilities and deepen the understanding of molecular mechanisms, thereby facilitating more efficient rational drug design.

CONCLUSIONS

This study developed and evaluated Quantitative Structure-Property Relationship (QSPR) models for eight physicochemical properties across a diverse set of eight drug compounds related to COVID-19 treatment. A critical aspect of this research involved addressing missing data; the imputation of Boiling Point, Enthalpy of Vaporisation, and Flash Point for remdesivir using correlation-based linear regression proved effective, ensuring a complete dataset for subsequent statistical analyses.

The correlation analysis underscores the significant influence of molecular graphs on diverse physicochemical properties, which is the basis for QSPR modelling. Statistical analysis demonstrates that specific molecular graph-based centrality measures, particularly HCW(G) and EW(G), serve as powerful descriptors, exhibiting strong positive correlations with key properties such as Boiling Point, Enthalpy of Vaporisation, Molar Refractivity, and Packing Volume. Conversely, measures like TCW(G) and LW(G) consistently show strong inverse relationships with several properties, including Molar Volume and Molecular Weight. These identified relationships are pivotal for selecting optimal descriptors in QSPR model development.


The developed single-predictor linear regression models, summarised in Table 8, provide interpretable relationships between molecular structure and physicochemical behaviour. Validation with the PRESS (Predicted Residual Error Sum of Squares) statistic derived from leave-one-out cross-validation (LOOCV) supports the predictive accuracy and generalisability of these models for external data. These models not only advance the fundamental understanding of structure-property relationships but also offer a detailed methodology and practical tools for predicting properties of relevant new compounds. By elucidating the structural determinants of physicochemical behaviour, this research contributes valuable insights for rational drug design and molecular engineering, facilitating more efficient and targeted development processes in cheminformatics.

FUNDING

The authors declare that no funds, grants, or other support were received during the preparation of this

manuscript.

ACKNOWLEDGMENTS

The author, Pavithra M., sincerely acknowledges the financial support provided by the Department of Science and Technology (DST) – Karnataka Science and Technology Promotion Society (KSTePS), Government of Karnataka (Ref. No. MP02/2023-24/430, dated 23rd January 2024), through the DST-KSTePS Fellowship.

Disclosure statement

The authors report there are no competing interests to declare.

Author Contributions

Pavithra M.: Conceptualization, Methodology, Data Curation, Formal Analysis, Literature Review, Original

Draft preparation and Editing; Veena Mathad: Supervision, writing-review and editing.
