Page 164

www.rsisinternational.org

INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,

MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)

ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XV, Issue III, March 202

Enhancing Fabric Defect Detection Using Efficient Pyramid Split

Attention in a Lightweight YOLOv5 Framework

Hou Zongxiang and Ashardi bin Abas

Faculty of Computing and Meta-Technology, University Pendidikan Sultan Idris

DOI: https://doi.org/10.51583/IJLTEMAS.2026.150300016

Received: 16 March 2026; Accepted: 21 March 2026; Published: 02 April 2026

ABSTRACT

Fabric defect detection is a fundamental quality control process in textile manufacturing, yet achieving accurate

and reliable automated inspection remains difficult because of complex background textures, subtle defect

patterns, and substantial variation in defect scale and shape. Although deep learning–based detectors have

improved inspection performance, many lightweight models still suffer from limited feature discrimination,

particularly in real-time industrial environments where computational efficiency is critical. To address this

limitation, this study proposes an enhanced fabric defect detection framework by integrating an Efficient

Pyramid Split Attention (EPSA) mechanism into a YOLOv5-based convolutional network. The EPSA module

is designed to adaptively recalibrate multi-scale feature responses, enabling the network to emphasize defect-

relevant information more effectively while preserving inference efficiency. A quantitative experimental design

was employed using a labeled fabric defect image dataset, and the proposed model was evaluated through

comparative and ablation analyses against baseline and alternative attention-based configurations. Experimental

results indicate that the EPSA-enhanced model achieves superior detection performance in terms of mean

Average Precision while maintaining real-time processing capability. The improvement is especially evident for

small, low-contrast, and irregular defects embedded in repetitive fabric textures. These findings confirm that

pyramid-based attention can substantially improve feature representation without imposing significant

computational overhead. The proposed approach offers a practical and efficient solution for automated textile

inspection and provides a useful foundation for future research on lightweight attention modeling for industrial

vision systems.

Keywords: fabric defect detection; attention mechanism; EPSA; convolutional neural networks

INTRODUCTION

Background and Context

Fabric defect detection is a fundamental function in textile quality assurance because surface defects directly

affect product grade, commercial value, downstream processing, and customer acceptance. Defects such as holes,

stains, broken yarns, knots, abrasion marks, and texture irregularities reduce fabric usability and can significantly

lower market value, underscoring the economic importance of accurate and early inspection in textile

manufacturing [1]. The same source further notes that severe defects may lead to revenue declines, product

rejections, and reputational damage for merchants, especially in high-volume apparel production that requires

strict quality control [1].

The industrial significance of this problem is increasing as textile production moves toward smart manufacturing,

automated quality assurance, and continuous high-throughput processing. In such environments, inspection

systems must operate not only with acceptable detection accuracy, but also with consistency, speed, and practical

deployability under real production constraints [2], [3]. Fabric defect inspection is thus part of the

broader intelligent manufacturing transition, in which automated visual inspection systems are becoming

mainstream because they enable contactless operation, modular deployment, greater integration, and improved

responsiveness in industrial quality-control workflows [2]. In this sense, AI-driven fabric inspection is not merely

a technical improvement in defect recognition, but a strategic enabler of productivity, quality standardization,

and smarter textile manufacturing systems [2], [3].

Limitations of manual inspection and conventional methods

Despite its industrial importance, fabric inspection in many production settings still depends heavily on manual

visual examination. This traditional workflow is inherently subjective because inspectors must continuously

observe moving fabric, identify defects of different shapes and contrasts, and manually mark problematic regions.

Inspectors may even need to stop the cloth inspection machine to verify hard-to-distinguish

defects, which makes the process slow, inconsistent, and unsuitable for real-time quantitative monitoring [1].

Manual inspection also suffers from four persistent limitations: low efficiency, low accuracy and reliability, poor

real-time responsiveness, and high labor intensity. Reported human inspection accuracy remains only 60%–75%,

and performance is strongly affected by fatigue, loss of concentration, individual experience, and subjective

judgment [1]. These limitations make manual inspection increasingly incompatible with modern textile

manufacturing, where wide fabric surfaces, high production speeds, and subtle defect patterns demand objective,

scalable automated solutions [1], [2].

Earlier automated approaches based on texture descriptors, statistical methods, spectral analysis, and handcrafted

features partially improved inspection under controlled conditions, but their robustness remained limited when

confronted with repetitive fabric textures, illumination changes, irregular defect morphology, and low-contrast

backgrounds [3]–[6]. The literature reflects a progression from texture- and statistics-based methods

to machine learning and then deep learning, indicating that conventional feature-engineering approaches struggle

to generalize across complex industrial scenarios [2], [7].

Deep learning progress in fabric defect detection

Recent progress in deep learning has substantially improved automated fabric inspection by enabling models to

learn hierarchical and discriminative representations directly from raw image data. Convolutional neural

networks have outperformed many handcrafted-feature pipelines because they can capture local texture cues,

structural variations, and defect-context relationships more effectively [1], [2]. Deep learning has shifted the

field toward end-to-end detection frameworks that are better suited to complex and variable inspection conditions

[3], [4].

Among these developments, one-stage object detectors from the YOLO family have become especially attractive

for industrial deployment because they offer a practical balance between detection accuracy and inference speed

[5]–[7]. In inspection environments, the ability to localize and classify defects in a single forward pass is highly

valuable for continuous monitoring. A YOLOv5-based framework is therefore adopted as the core detector, and

performance is evaluated not only by accuracy, but also by model size and real-time suitability, reflecting the

industrial need to balance predictive quality with deployability [3], [8]. Prior work further argues that YOLO-based

detectors are well-suited to textile inspection because industrial fabric defects exhibit large size variation, diverse

distribution density, and strong demands for both low miss-detection rates and fast inference [8].

At the same time, real-world fabric defects exhibit several difficult characteristics, including small

target size, unbalanced aspect ratios, and weak contrast between the defect and the surrounding fabric

background [3]. The Alibaba Tianchi dataset used in the study contains 5,913 annotated images, consolidated

into 20 defect categories, with defects ranging from line-like structures to clustered dots and large-area

contamination [3], [9]. This diversity confirms that fabric inspection is not a trivial detection problem, but a

multi-scale visual recognition task requiring robust feature modeling across highly varied defect appearances [3],

[9].

Problem statement: limited feature discrimination in lightweight detectors

Although deep learning has improved fabric defect detection, a major unresolved problem is that lightweight

detectors often struggle to emphasize defect-relevant features while suppressing repetitive background textures.

This is especially problematic in textile inspection because many defects are subtle, irregular, and visually

entangled with the woven pattern itself. Small and low-contrast defects can be overlooked when feature

extraction is insufficiently discriminative, even if the overall architecture is computationally efficient [1], [2].

This issue is a core limitation of existing lightweight CNN-based inspection systems and serves as a key

motivation for introducing pyramid-aware attention enhancement [2].

A second problem is the accuracy–efficiency trade-off. Many studies achieve stronger laboratory performance

by adopting deeper networks, larger backbones, or more computationally expensive enhancement modules.

However, these choices increase latency, memory consumption, and deployment cost, thereby reducing

suitability for continuous industrial use [3], [4]. Conversely, when architectures are aggressively simplified to

preserve speed, the resulting detector may lose robustness precisely on the difficult defects that matter most in

real inspection settings. This gap between algorithmic performance and deployable performance is emphasized

in the YOLO literature, which argues that industrial inspection systems should be evaluated not only by

predictive accuracy but also by model size, inference efficiency, and deployment feasibility [1], [5].
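
To make the notion of inference efficiency concrete, the following is a minimal sketch of how per-image latency and frames per second might be measured for any detector exposed as a Python callable. The names `measure_latency` and `dummy_detector` are illustrative assumptions, and the dummy function only stands in for a real forward pass; it is not the detector evaluated in this paper.

```python
import time


def measure_latency(infer, sample, warmup=5, runs=50):
    """Return (mean seconds per call, frames per second) for infer(sample)."""
    for _ in range(warmup):
        infer(sample)  # warm-up calls are excluded from the timing
    start = time.perf_counter()
    for _ in range(runs):
        infer(sample)
    elapsed = time.perf_counter() - start
    mean_seconds = elapsed / runs
    # Guard against a zero reading on very coarse timers.
    fps = runs / elapsed if elapsed > 0 else float("inf")
    return mean_seconds, fps


def dummy_detector(image):
    """Hypothetical stand-in for a detector's forward pass."""
    return sum(sum(row) for row in image)


if __name__ == "__main__":
    image = [[1] * 64 for _ in range(64)]
    mean_seconds, fps = measure_latency(dummy_detector, image)
    print(f"mean latency: {mean_seconds:.6f} s, throughput: {fps:.1f} FPS")
```

Reporting latency together with accuracy and model size, rather than accuracy alone, is one simple way to expose the gap between algorithmic and deployable performance discussed above.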

Research gap: pyramid-aware attention remains underexplored

Attention mechanisms have been widely introduced in convolutional networks to improve feature weighting,

with channel attention emphasizing inter-channel dependencies and spatial attention highlighting informative

spatial regions. Combined channel–spatial mechanisms have also shown useful gains in visual recognition and

detection tasks [1]–[3]. However, many attention modules are designed

for general vision tasks and do not explicitly address the multi-scale nature of fabric defects, where useful cues

may appear differently across feature pyramid levels [4]. In textile inspection, this limitation is important because

defect cues are often weak, irregular, and scale-dependent, making generic attention insufficient for robust

lightweight detection [4], [5].

This creates a specific research gap. While pyramid-based feature representations are already recognized as

important for handling scale variation in detection networks [6], [7], pyramid-aware attention mechanisms

remain underexplored in fabric defect detection, particularly in lightweight detectors intended for real-time

industrial deployment [4], [5]. Existing studies often improve accuracy by increasing architectural complexity

rather than by efficient, scale-sensitive feature recalibration. Efficient Pyramid Split Attention (EPSA) is

therefore a relevant candidate because it can adaptively recalibrate multi-scale

responses while preserving computational efficiency [4], [8].
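
As a rough illustration of what recalibrating multi-scale responses can look like, the sketch below applies a softmax across scale branches for each channel, which is the general idea behind split attention. It is a simplified sketch under stated assumptions and not the implementation used in this study: the multi-kernel convolutions and learned excitation layers are omitted, the branch feature maps are assumed to be precomputed, and the names `split_attention`, `global_avg`, and `softmax` are illustrative.

```python
import math


def global_avg(channel):
    """Mean of one H x W channel (global average pooling)."""
    return sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def split_attention(branches):
    """Recalibrate S multi-scale branches, each shaped [C][H][W].

    Assumption: the pooled channel mean is used directly as the channel
    descriptor, standing in for a learned excitation step.
    Returns (weights, recalibrated) with weights[s][c] summing to 1 over s.
    """
    num_branches = len(branches)
    num_channels = len(branches[0])
    descriptors = [[global_avg(ch) for ch in branch] for branch in branches]

    # Softmax across branches, separately for every channel index.
    weights = [[0.0] * num_channels for _ in range(num_branches)]
    for c in range(num_channels):
        across_scales = softmax([descriptors[s][c] for s in range(num_branches)])
        for s in range(num_branches):
            weights[s][c] = across_scales[s]

    # Scale each channel of each branch by its attention weight.
    recalibrated = [
        [
            [[v * weights[s][c] for v in row] for row in branches[s][c]]
            for c in range(num_channels)
        ]
        for s in range(num_branches)
    ]
    return weights, recalibrated
```

Because the weights compete across scales, a branch whose response is stronger for a given channel receives more emphasis, while the remaining branches are attenuated rather than discarded.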

This direction is further supported by experiments organized around feature fusion, attention mechanisms,

convolutional operations, state-of-the-art comparison, and ablation analysis. This structure shows that attention

design is not an isolated addition, but part of a broader effort to improve lightweight detection under the

combined constraints of complex textures, scale variation, and online inspection requirements [5], [9]. It

reinforces this same design logic by showing that multi-scale fusion, adaptive convolution, and lightweight

attention should be evaluated as interacting components of an efficiency-oriented industrial detection framework

rather than as independent accuracy boosters [5], [9].

Research objective and research questions

In response to these challenges, the present study investigates whether Efficient Pyramid Split Attention (EPSA)

can enhance feature discrimination in a lightweight fabric defect detector without compromising real-time

performance. The central objective is to improve fabric defect detection by integrating EPSA into a YOLOv5-

based convolutional framework, thereby making the detector more sensitive to subtle multi-scale defects while

remaining practical for industrial use [1], [2]. This objective is consistent with the broader goal of improving

accuracy, speed, and robustness in lightweight textile inspection models, particularly under conditions involving

complex textures, variable defect scales, and industrial deployment constraints [1], [3].

Accordingly, this study addresses three research questions. First, how does EPSA-based attention affect feature

discrimination in fabric defect detection? Second, does EPSA outperform conventional attention mechanisms

when detecting small, low-contrast, and structurally complex defects? Third, can EPSA be integrated into a

lightweight detector while preserving inference performance suitable for real-time industrial deployment? These

questions align with the experimental design of this study, in which comparative attention

analysis, robustness evaluation, and efficiency analysis are central to the results section [2], [4].

Main contribution of the paper

This paper contributes to the fabric defect detection literature by presenting an EPSA-enhanced lightweight

detection framework that specifically addresses the challenge of limited feature discrimination in complex textile

backgrounds. Rather than improving performance through model expansion alone, the study investigates

whether pyramid-aware attention can selectively strengthen multi-scale defect cues in a computationally efficient

manner [1], [2]. In this sense, the paper provides empirical evidence of the value of scale-sensitive attention

modeling for industrial computer vision tasks, especially in inspection settings where subtle defects are

embedded in repetitive, visually noisy textures [2], [3].

The contribution is both methodological and practical. Methodologically, the study provides a focused

comparison between EPSA and conventional attention mechanisms within the same lightweight detection

context, thereby clarifying the role of pyramid-aware feature recalibration in defect-sensitive representation

learning [1], [2]. Practically, it aims to support textile manufacturers in deploying inspection models that improve

robustness without incurring excessive computational overhead. This emphasis on efficiency-aware

enhancement is consistent with the conclusion that industrial AI systems should be evaluated not only by raw

accuracy, but also by deployability, responsiveness, and computational sustainability [1], [3].

Figure 1: Traditional fabric defect detection method, illustrating the limitations of manual inspection.

LITERATURE REVIEW

Traditional and deep learning–based fabric defect detection

Fabric defect detection has evolved from manual, handcrafted-feature inspection to data-driven visual

intelligence systems. Early automated approaches relied on classical image processing and texture analysis

methods, such as Fourier-based inspection, statistical texture transformation, morphological analysis, adaptive

wavelets, and low-rank decomposition, to identify deviations from the normal fabric structure [1]–[5]. These

methods were useful under controlled conditions because they could detect relatively obvious texture disruptions,

but their performance was often unstable when fabric patterns, illumination, noise, and defect appearance varied

across production conditions [2], [3], [5]. As a result, conventional handcrafted approaches were generally

sensitive to environmental interference and lacked the robustness required for real-world industrial deployment

[2], [4].

The problem is also more complex than simple binary texture anomaly detection. Fabric surfaces often feature

regular yet visually dense backgrounds, while defects may appear as stains, holes, knots, cracks, broken yarns,

or localized color inconsistencies. To address these challenges, several later studies introduced improved two-

stage detectors, background-suppression models, and reconstruction-based strategies. For example, EDSR-

enhanced Faster R-CNN was used to strengthen defect detail representation, while weighted double low-rank

decomposition was proposed to suppress repetitive background texture and highlight defective regions [5], [6].

Even so, these approaches still face limitations in efficient localization, geometric adaptability, and consistent

operation under real industrial constraints [5], [6].

The transition to deep learning significantly advanced the field. CNN-based methods began to outperform

handcrafted feature pipelines because they learn hierarchical visual representations directly from image data,

reducing reliance on manually designed descriptors [7], [8]. Deep learning methods, therefore, improved

discriminative representation learning and expanded fabric inspection beyond traditional classification-only or

segmentation-oriented workflows [7], [9]. However, many early deep learning studies were still not optimized

for real-time industrial detection, especially when defects had to be simultaneously localized, classified, and

processed at production-line speed [8], [9].

Table 1. Comparative review of related studies on fabric defect detection

| Study | Method/model | Attention or feature strategy | Dataset/task | Strength | Limitation |
|---|---|---|---|---|---|
| Abouelela et al. (2005) | Automated vision system for textile defect localization | Structural image analysis / classical vision | Fabric defect localization | Early automation for defect localization in textile surfaces | Limited robustness under complex textures and varying illumination |
| Selver et al. (2014) | Statistical texture transformation with gradient | Statistical texture features | Textile defect detection | Effective under controlled texture conditions | Sensitive to noise, pattern variation, and real production variability |
| Yapi et al. (2015) | Learning-based textile image defect detection | Machine learning feature learning | Automatic textile defect detection | Better adaptability than purely handcrafted approaches | Still depends on feature design and may not fully support real-time detection |
| Biradar et al. (2021) | Deep convolutional neural network | CNN-based deep feature extraction | Fabric defect detection/classification | Stronger discriminative feature learning than traditional methods | Primarily classification-oriented; limited emphasis on real-time object detection |
| Zheng et al. (2021) | Improved YOLOv5 | One-stage detection with YOLO optimization | Fabric defect detection | Good balance between speed and detection capability | Standard YOLO variants may still miss small or low-contrast defects in repetitive textures |
| Yao et al. (2022) | EDSR + improved Faster R-CNN | Super-resolution reconstruction + two-stage detection | AI Tianchi fabric defect dataset / 20 categories | Very high accuracy through background–defect separation and enhanced detail reconstruction | Computationally heavier; less suitable for lightweight real-time deployment |
| Zhang et al. (2022) | EPSANet | Efficient Pyramid Split Attention | General CNN attention enhancement | Efficient multi-scale feature recalibration | Not originally designed specifically for fabric defect detection in lightweight industrial settings |
| Proposed study | EPSA-enhanced YOLOv5-based detector | Pyramid-aware lightweight attention for multi-scale defect emphasis | Fabric defect detection on AI Tianchi-based setting | Targets accuracy–efficiency balance, better small-defect sensitivity, real-time suitability | Requires further validation on broader textile domains and embedded platforms |

Table 1 compares traditional, machine-learning, two-stage, one-stage, and attention-enhanced deep learning

approaches.

One-stage convolutional detection frameworks for fabric inspection

Among modern deep learning detectors, one-stage object detection frameworks have become especially

important for industrial inspection because they combine localization and classification within a single inference

pipeline, making them well suited to continuous production lines where latency and throughput are critical

operational constraints [1], [2]. Representative one-stage detectors include SSD and the YOLO family, both of

which have been widely adopted for real-time visual inspection tasks due to their relatively simple architectures

and fast inference speeds [1], [3]. In fabric defect detection, YOLO-based models are particularly attractive

because they generally provide a more practical balance between speed and accuracy than heavier two-stage

alternatives such as Faster R-CNN [4], [5].

This design choice is further supported by the selection of YOLOv5 as the base framework, given its high detection speed

and accuracy, which make it suitable for real-time defect inspection. This choice is also consistent with the

practical characteristics of textile defects, which may appear as circular holes and stains, elongated broken yarns,

or irregular knot-like structures, thus requiring a detector that can handle substantial variation in shape, size, and

contextual appearance [4], [6]. For this reason, a one-stage yet adaptable detection architecture provides a

rational baseline for industrial fabric inspection.

However, the literature also shows that standard one-stage detectors remain limited when the targets are

extremely small, low-contrast, irregular, or embedded in repetitive fabric textures. In such cases, conventional

YOLO-based detectors may not sufficiently emphasize defect-relevant regions within convolutional feature

maps, which reduces sensitivity to subtle flaws [4], [7]. This limitation is especially important in lightweight

industrial models, because reducing complexity to preserve real-time performance can also weaken the

representational richness needed to reliably identify difficult defects [6], [7].

Attention mechanisms in convolutional neural networks

Attention mechanisms were introduced into convolutional neural networks to improve selective feature emphasis

by amplifying informative responses and suppressing irrelevant background information. In visual detection

tasks, this is especially useful when the target object is small, partially occluded, weakly contrasted, or visually

entangled with the background, because attention can guide the network toward more discriminative feature

responses [1]–[3]. In the context of fabric defect detection, this is particularly relevant because subtle defects are

often embedded within repetitive textures, making conventional lightweight detectors prone to insufficient

feature discrimination [4], [5].

The literature commonly distinguishes among channel attention, spatial attention, and combined channel–spatial

attention. Channel attention mechanisms such as Squeeze-and-Excitation (SE) emphasize inter-channel

dependency by learning channel-wise importance weights, whereas spatial attention highlights informative

regions within the feature map [1], [2]. Combined mechanisms such as the Convolutional Block Attention

Module (CBAM) attempt to exploit both channel and spatial selectivity, thereby improving representational

capacity and localization sensitivity [2], [6]. This categorization helps to

explain why attention mechanisms can improve feature representation in CNN-based defect detection [4].
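
The channel-attention idea described above can be sketched in a few lines. The example below follows the squeeze, excitation, and scale steps of an SE-style block on a feature map stored as nested lists shaped [C][H][W]. It is only an illustrative sketch: `channel_attention` is a hypothetical helper, and `w1` and `w2` are placeholder matrices standing in for weights that a real network would learn during training.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feature_map, w1, w2):
    """SE-style channel attention on a feature map shaped [C][H][W].

    w1 has shape [C_reduced][C] and w2 has shape [C][C_reduced]; both are
    placeholders for learned fully connected weights (an assumption here).
    Returns (gates, scaled_feature_map).
    """
    # Squeeze: global average pooling yields one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map
    ]
    # Excitation: bottleneck layer with ReLU, then expansion with sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Scale: every value in a channel is multiplied by that channel's gate.
    scaled = [
        [[v * g for v in row] for row in ch] for ch, g in zip(feature_map, gates)
    ]
    return gates, scaled
```

Spatial attention follows the same pattern but produces one weight per spatial location instead of one per channel, and combined modules such as CBAM apply the two in sequence.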

Nevertheless, not all attention mechanisms are equally suitable for industrial fabric inspection. Many were

developed for general computer vision benchmarks rather than texture-dominated inspection problems, and some

improve raw detection accuracy at the cost of higher model complexity, memory usage, or inference delay [4],

[5]. Lightweight attention should therefore be assessed not only in terms of accuracy gain but also

by its balance between speed and detection quality. Reported comparative experiments show that the EPSA-based

backbone provides an effective trade-off between detection speed and accuracy, making it more suitable for real-

time industrial deployment than attention strategies that impose greater computational overhead [5], [7].

This shift in evaluation criterion is important. In textile inspection, the most useful attention mechanism is not

necessarily the one with the highest standalone mAP, but the one with the best deployable accuracy–efficiency

profile under real production constraints. That requirement directly motivates the use of a more efficiency-aware

and pyramid-sensitive attention design such as EPSA [3], [5], [7].
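
One way to operationalize a deployable accuracy–efficiency profile is sketched below: candidates are first filtered by a minimum frame rate, and the most accurate remaining model is selected; a Pareto front over mAP and FPS is also computed. The function names and the numeric values in the usage example are placeholders invented for illustration. They are not results from this paper.

```python
def pareto_front(candidates):
    """Return names of candidates not dominated on both mAP and FPS.

    candidates: list of (name, map_score, fps) tuples.
    """
    front = []
    for name, map_score, fps in candidates:
        dominated = any(
            other_map >= map_score
            and other_fps >= fps
            and (other_map > map_score or other_fps > fps)
            for _, other_map, other_fps in candidates
        )
        if not dominated:
            front.append(name)
    return front


def pick_deployable(candidates, min_fps):
    """Return the highest-mAP candidate meeting the FPS floor, or None."""
    feasible = [c for c in candidates if c[2] >= min_fps]
    if not feasible:
        return None
    return max(feasible, key=lambda c: c[1])[0]


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    models = [
        ("variant_a", 0.80, 40.0),
        ("variant_b", 0.85, 20.0),
        ("variant_c", 0.78, 35.0),
    ]
    print(pareto_front(models))
    print(pick_deployable(models, min_fps=30.0))
```

Under this selection rule, a model with the highest standalone mAP can still be rejected if it falls below the frame rate the production line requires.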

Pyramid-based and multi-scale attention strategies

Multi-scale representation has become central in modern object detection because real targets often appear at

highly variable spatial resolutions. This issue is particularly severe in fabric defect detection, where some defects

occupy only a few pixels while others span larger structured regions with very different aspect ratios and visual

signatures [1], [2]. Large variation in defect size places high demands on the feature-

fusion stage, and this was one of the main reasons for introducing the Bidirectional Feature Pyramid Network

(BiFPN) into the neck of the YOLOv5 architecture [1], [3].
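
For readers unfamiliar with BiFPN, the sketch below illustrates the fast normalized fusion rule from the original BiFPN design, in which each input map receives a non-negative weight and the weights are normalized by their sum plus a small constant. The text does not state that this exact rule is used here, so this is a hedged illustration only. The input maps are assumed to be already resized to a common resolution, and `fast_normalized_fusion` and its parameter names are illustrative.

```python
def fast_normalized_fusion(features, raw_weights, eps=1e-4):
    """Fuse same-shaped [H][W] maps with non-negative, normalized weights.

    Computes sum_i(w_i * I_i) / (eps + sum_j w_j), with w_i = max(0, raw_w_i).
    """
    # ReLU keeps every fusion weight non-negative.
    weights = [max(0.0, w) for w in raw_weights]
    norm = eps + sum(weights)
    height, width = len(features[0]), len(features[0][0])
    return [
        [
            sum(w * fmap[i][j] for w, fmap in zip(weights, features)) / norm
            for j in range(width)
        ]
        for i in range(height)
    ]
```

In a trained network the raw weights would be learned, which lets the neck decide how much each pyramid level contributes at every fusion node; the small `eps` keeps the division stable when all weights are close to zero.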

Feature pyramid architectures are important because they combine representations from different network depths,

allowing both fine-grained spatial details and higher-level semantic information to contribute to detection [4],

[5]. In fabric inspection, this is particularly valuable because the same defect class may appear as a tiny local

anomaly, a line-like structural break, or a broader textured contamination region. Pyramid-aware attention is

therefore more suitable than scale-agnostic attention, since it can refine not only which features are important,

but also the scale level at which they should be emphasized [3], [6].

However, the current literature still reveals a gap between multi-scale detection and efficient multi-scale

attention. Many studies improve scale robustness through deeper backbones, more complex feature-fusion

pipelines, or computationally expensive multi-branch architectures, but do not systematically address efficiency

constraints required for real-time industrial deployment [3], [6], [7]. As a result, although multi-scale feature

fusion is widely recognized as beneficial, efficient pyramid-aware attention remains insufficiently explored for

fabric defect detection in lightweight deployment settings [3], [7].

Figure 2: Research gap map for attention mechanisms in fabric defect detection.

Figure 2 shows the progression from traditional methods to CNN detection, then to generic attention, and finally

to the unmet need for lightweight pyramid-aware attention in industrial fabric inspection.

Theoretical framework: hierarchical feature learning

The theoretical foundation of this study is hierarchical feature learning, which underlies the operation of convolutional neural networks. In this framework, shallow layers capture low-level spatial details such as edges, local contrast, and fine texture variations, while deeper layers encode more abstract semantic structures. Effective object detection therefore depends on how well these hierarchical feature levels are coordinated for localization and classification across varying scales [1], [2]. This theoretical view is directly relevant to the present study because it explains why feature enhancement strategies are needed when subtle defect cues are easily overwhelmed by repetitive fabric background patterns [3], [4].

Fabric defects do not appear in a uniform visual form. Some are subtle local discontinuities, some are elongated structural breaks, and others are distributed spots or larger contaminated regions. A detector must therefore preserve sensitivity to low-level texture while also integrating broader contextual information. Attention mechanisms support this process by selectively emphasizing informative features and suppressing irrelevant background signals, thereby improving discriminative representation in complex visual scenes [3], [5]. Pyramid-based attention mechanisms extend this idea by enabling adaptive recalibration across multiple feature levels rather than relying on a single representational scale [4], [6].

Experimental evidence further strengthens this theoretical rationale. Rather than treating feature extraction, feature fusion, and attention as isolated modules, the cited work evaluates them as interacting components of a lightweight detection system. The reported findings indicate that BiFPN improves multi-scale feature fusion in small-object and complex-background scenes, deformable convolution enhances geometric adaptability to irregular defect structures, and EPSA provides a lightweight attention mechanism that balances speed and detection accuracy [1], [7]. This integrated design logic is consistent with hierarchical feature learning, as it assumes that improved detection performance emerges from better coordination across representation levels and scales within the network [1], [4], [7].

Research gaps and positioning of EPSA

The theoretical foundation of this study is hierarchical feature learning, which underlies the operation of

convolutional neural networks. In this framework, shallow layers capture low-level spatial details such as edges,

local contrast, and fine texture variations, while deeper layers encode more abstract semantic structures.

Effective object detection, therefore, depends on how well these hierarchical feature levels are coordinated for