www.rsisinternational.org
INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,
MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)
ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XV, Issue IV, April 2026
Laser Printer Identification Using Convolutional Neural Network for
Forensic Document Authentication
Dr. Pushpalata Gonasagi
Associate Professor, Department of Computer Science, Govt. First Grade College, Mahagaon Cross,
Kalaburagi, India.
DOI: https://doi.org/10.51583/IJLTEMAS.2026.150400019
Received: 06 April 2026; Accepted: 11 April 2026; Published: 02 May 2026
ABSTRACT
Document forgery has become easier with the advancement of printing technologies and image editing software.
Identifying the source printer of a printed document is an important task in forensic document analysis.
Traditional approaches rely on handcrafted texture features such as Local Binary Pattern (LBP), Local
Directional Pattern (LDP), and Local Optimal Oriented Pattern (LOOP). However, these methods require manual
feature extraction and often fail to capture complex intrinsic printer signatures effectively. This research
proposes a deep learning-based approach using Convolutional Neural Networks (CNN) to automatically identify
laser printer models based on texture patterns observed in printed documents. The CNN model learns
discriminative features from character-level images without requiring handcrafted descriptors. The dataset
consists of scanned document images printed from ten different laser printers, and character-level segmentation
is applied to extract the character ‘e’ images. The proposed CNN-based method achieves high classification
accuracy and demonstrates superior performance compared to traditional machine learning approaches such as
SVM with handcrafted features. The results show that CNN can effectively capture intrinsic printer signatures
and improve document authentication systems in forensic applications.
Keywords: Laser printer, CNN, ReLU, Grayscale.
INTRODUCTION
Forgery detection plays an important role in forensic document examination, particularly in cases involving legal
documents such as contracts, agreements, wills, ownership papers, and suicide notes [1]. With the rapid evolution
of printing technologies, identifying whether a document is genuine or forged has become a challenging task.
Modern printers produce high-quality outputs, making it difficult to distinguish printed documents visually. Each
printer has a unique intrinsic signature caused by mechanical imperfections, toner distribution, and printing
mechanisms [2]. These signatures appear as subtle texture variations in printed characters. Therefore, printer
identification can be used as a reliable method for document authentication. Traditional approaches rely on
handcrafted feature extraction methods such as LBP, LDP, GLCM, and DWT. However, these techniques
require manual feature engineering and may fail to generalize well for complex texture patterns. Recently, deep
learning techniques such as CNN have shown remarkable performance in image classification tasks due to their
ability to automatically learn hierarchical features. In this research, we propose a CNN-based approach to
identify ten laser printer models using character-level images extracted from scanned printed documents.
Related Work
Several studies have been conducted to identify printers based on texture and geometric distortions in printed
documents. Elkasrawi et al. [3] proposed a supervised learning method using noise features produced by printers
and achieved 76.75% accuracy. Tsai et al. [4] applied GLCM and DWT features with SVM classifier for printer
identification and obtained 98.64% accuracy. Lampert et al [5] used text-line features such as edge roughness
and correlation coefficients for identifying forged documents using SVM classifier. Mikkilineni et al. [6]
analysed font characteristics, paper type, and document age for printer identification. Wu et al. [7] used
geometric distortion features at the page level and achieved 100% classification accuracy using SVM. Ferreira
et al. [8] applied CNN to classify ten printers using character images and achieved 97.33% accuracy. Jain et al.
[9] used text-line geometric distortion features and obtained 98.85% accuracy using SVM classifier. Shang et al.
[10] extracted contour roughness and noise energy features to differentiate laser printers, inkjet printers, and
photocopiers. Although traditional machine learning approaches perform well, they depend heavily on
handcrafted features. CNN-based models can automatically extract discriminative features, improving
classification performance.
Proposed Method
The proposed CNN-based printer identification system follows four main steps. First, data collection is
performed by gathering scanned printed documents from different laser printer models. Second, preprocessing
is applied to improve the quality of the extracted character images by resizing, converting to grayscale, removing
noise, and normalizing pixel values. Third, feature learning using CNN is carried out, where the convolutional
neural network automatically learns important patterns such as edges, textures, and microscopic printing
characteristics from the character images. Finally, classification of printer models is performed using the trained
CNN model to identify the source printer of the document.
Data Collection
The dataset used in this study consists of scanned printed documents obtained from the Figshare dataset [8]. The
documents contain scientist biographies collected from Wikipedia and printed using ten different laser printer
models, including Brother HL-4070CDW, Canon D1150, Canon MF3240, Canon MF4370DN, HP CP1518, HP
CP2025A, HP CP2025B, Lexmark E260DN, OKI Data C330DN, and Samsung CLP315. A total of 600 pages
were printed and scanned at a resolution of 600 dpi to ensure high-quality image capture. The character ‘e’ was
selected for analysis because it appears frequently in English text, providing a large number of samples for
training the model. Character segmentation was applied to extract individual character images from the scanned
pages. The dataset contains 600 scanned pages and approximately 100,000 images of the character ‘e’, out of
which 10,000 images were selected for the experiment, with 1,000 images collected from each printer model.
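The paper does not describe the segmentation algorithm itself. As a minimal illustration of how character images can be cut out of a binarized scan, the sketch below labels dark connected components and returns their bounding boxes (pure NumPy/stdlib; the threshold of 128 and 4-connectivity are illustrative assumptions, not the authors' choices):

```python
import numpy as np
from collections import deque

def character_boxes(gray, thresh=128):
    """Bounding boxes (r0, c0, r1, c1) of dark connected components.

    gray: 2-D array of grayscale pixel values; pixels below `thresh`
    are treated as toner ("ink"). 4-connectivity BFS labelling.
    """
    ink = np.asarray(gray) < thresh
    seen = np.zeros_like(ink, dtype=bool)
    h, w = ink.shape
    boxes = []
    for r in range(h):
        for c in range(w):
            if ink[r, c] and not seen[r, c]:
                # flood-fill one component, tracking its extent
                q = deque([(r, c)])
                seen[r, c] = True
                r0 = r1 = r
                c0 = c1 = c
                while q:
                    y, x = q.popleft()
                    r0, r1 = min(r0, y), max(r1, y)
                    c0, c1 = min(c0, x), max(c1, x)
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and ink[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            q.append((ny, nx))
                boxes.append((r0, c0, r1 + 1, c1 + 1))
    return boxes
```

In a full pipeline, each box would then be filtered by size and shape, recognized (to keep only the letter 'e'), and cropped for preprocessing.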
Preprocessing
Character images often contain variations in size, orientation, and noise due to differences in printing and
scanning conditions. Therefore, preprocessing is applied to enhance the image quality before training the CNN
model. In this process, all character images are first resized to 28 × 28 pixels to maintain a uniform input size
for the network. The images are then converted into grayscale to reduce computational complexity while
preserving important structural information. A median filtering technique with a 3 × 3 kernel is applied to remove
noise generated during scanning, while preserving important edge details of the characters. Finally, the pixel
values are normalized between 0 and 1, which helps in faster convergence and improves the learning
performance of the CNN model. Median filtering plays an important role in maintaining edge information while
effectively reducing scanning noise, thereby improving the feature learning capability of the CNN.
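The preprocessing steps above can be sketched in NumPy as follows. This is an assumption-laden illustration, not the authors' implementation: nearest-neighbour resizing and BT.601 luma weights are stand-ins for whatever the original pipeline used, and only the 28 × 28 size, grayscale conversion, 3 × 3 median filter, and [0, 1] normalization are taken from the text:

```python
import numpy as np

def to_grayscale(rgb):
    # ITU-R BT.601 luma weights (an assumed choice; the paper does not specify)
    return rgb @ np.array([0.299, 0.587, 0.114])

def resize_nearest(img, size=28):
    # Nearest-neighbour resize to the fixed 28 x 28 network input
    h, w = img.shape
    return img[np.arange(size) * h // size][:, np.arange(size) * w // size]

def median_filter_3x3(img):
    # 3 x 3 median filter: stack the nine shifted views, take the pixel-wise median
    p = np.pad(img, 1, mode="edge")
    h, w = img.shape
    windows = np.stack([p[r:r + h, c:c + w] for r in range(3) for c in range(3)])
    return np.median(windows, axis=0)

def preprocess(rgb):
    gray = to_grayscale(np.asarray(rgb, dtype=np.float64))
    small = resize_nearest(gray)
    denoised = median_filter_3x3(small)
    return (denoised / 255.0).astype(np.float32)   # normalize to [0, 1]
```

The median filter is the step that suppresses isolated scanning-noise pixels while leaving character edges largely intact, which is why it is preferred here over a simple mean blur.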
CNN Architecture
The CNN is used to automatically learn discriminative features from the segmented character images without
manual feature extraction. The input to the network is a 28 × 28 grayscale image of the character ‘e’. The first
convolution layer applies multiple filters to detect low-level features such as edges and curves, followed by a
ReLU activation function to introduce non-linearity. A max-pooling layer reduces the spatial dimensions and
helps in extracting dominant features while reducing computational complexity. The second convolution layer
learns higher-level patterns such as textures and micro-printing characteristics that are unique to each printer.
Another max-pooling layer is applied to further reduce dimensionality. The extracted feature maps are then
flattened into a one-dimensional vector and passed to fully connected layers, which perform classification based
on learned patterns. A softmax output layer is used to classify the input character image into one of the ten printer
models [11]. The CNN architecture effectively captures subtle printing artifacts such as toner distribution,
microscopic distortions, and texture differences, enabling accurate identification of the source printer. CNN
automatically extracts hierarchical features such as edges, shapes, and textures from character images. The
proposed CNN architecture is summarized in Table 1 and illustrated in Figure 1.
Table 1: CNN architecture

| Layer No. | Layer Name | Configuration | Output Size | Purpose |
|---|---|---|---|---|
| 1 | Input Layer | 28 × 28 grayscale image | 28 × 28 × 1 | Takes character image as input |
| 2 | Convolution Layer 1 | 32 filters, kernel size (3 × 3), Activation: ReLU | 26 × 26 × 32 | Extracts low-level features such as edges and textures |
| 3 | Max Pooling Layer 1 | Pool size (2 × 2) | 13 × 13 × 32 | Reduces spatial dimensions and computation |
| 4 | Convolution Layer 2 | 64 filters, kernel size (3 × 3), Activation: ReLU | 11 × 11 × 64 | Extracts complex patterns and shapes |
| 5 | Max Pooling Layer 2 | Pool size (2 × 2) | 5 × 5 × 64 | Further reduces feature map size |
| 6 | Flatten Layer | Converts 2D feature maps to 1D vector | 1600 | Prepares data for fully connected layer |
| 7 | Fully Connected Layer | 128 neurons, Activation: ReLU | 128 | Learns high-level feature representation |
| 8 | Output Layer | 10 neurons, Activation: Softmax | 10 | Classifies input into 10 printer classes |
Figure 1: CNN architecture of printer identification.
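The output sizes in Table 1 follow from standard "valid" convolution and pooling arithmetic (conv: n − k + 1 with no padding and stride 1; pool: ⌊n / p⌋). The short check below reproduces each size, including the 1600-dimensional flattened vector:

```python
def conv_out(n, k):
    # 'valid' convolution: no padding, stride 1
    return n - k + 1

def pool_out(n, p):
    # non-overlapping max pooling with pool size p
    return n // p

n = 28                      # input: 28 x 28 x 1
n = conv_out(n, 3)          # conv1, 32 filters  -> 26 x 26 x 32
assert n == 26
n = pool_out(n, 2)          # pool1              -> 13 x 13 x 32
assert n == 13
n = conv_out(n, 3)          # conv2, 64 filters  -> 11 x 11 x 64
assert n == 11
n = pool_out(n, 2)          # pool2              -> 5 x 5 x 64
assert n == 5
assert n * n * 64 == 1600   # flatten -> 1600-dimensional vector
```

In a Keras implementation these sizes correspond to two Conv2D/MaxPooling2D pairs with the default `padding="valid"`, followed by Flatten and two Dense layers of 128 and 10 units.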
Training Parameters
The CNN model was trained using carefully selected parameters to ensure effective learning and accurate
classification of printer models. The input images used for training were 28 × 28 pixels in size and in grayscale
format, which reduces computational complexity while preserving important texture and structural information
necessary for printer identification. The categorical cross-entropy loss function was used because the problem
involves multi-class classification with 10 different printer models. The Adam optimizer was applied to
efficiently update network weights and achieve faster convergence during training [12][13]. Model performance
was evaluated using standard metrics including accuracy, precision, recall, and F1-score, which provide a
comprehensive assessment of classification performance.
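As a concrete illustration of the loss function named above, the NumPy sketch below implements softmax followed by categorical cross-entropy over the 10 printer classes (the weight updates themselves are left to the Adam optimizer in the framework):

```python
import numpy as np

def softmax(logits):
    # subtract the row maximum for numerical stability before exponentiating
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def categorical_cross_entropy(y_onehot, logits):
    # mean negative log-likelihood of the true class over the batch
    p = softmax(logits)
    return float(-np.mean(np.sum(y_onehot * np.log(p + 1e-12), axis=-1)))
```

An untrained network that assigns equal probability to all 10 classes incurs a loss of ln 10 ≈ 2.303, a useful sanity check on the first training epoch.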
EXPERIMENTAL RESULTS AND DISCUSSION
The experiment was conducted using Python with the TensorFlow deep learning framework to implement and
evaluate the proposed CNN model. The dataset was divided into training and testing sets, where 80% of the data
(8000 images) was used for training the model and 20% (2000 images) was used for testing its performance. The
confusion matrix is shown in Table 2. The experimental results show that the CNN model achieved an
accuracy of 99.3%, as shown in Table 3, while the SVM classifier with LOOP feature extraction achieved
99.8% accuracy [14].
The comparison is shown in Table 4. The CNN model effectively captures subtle texture variations
caused by toner distribution and mechanical differences among printers. The accuracy is slightly lower than
that of the existing method because the experiments were carried out on a system with limited computational
resources, which constrained the training of the CNN model. Nevertheless, the major advantages of the
CNN-based approach include automatic feature extraction, high classification accuracy, reduced manual effort,
robustness to noise variations, and scalability to large datasets, making it well suited to printer
identification tasks in digital forensics.
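The metrics reported above (accuracy, precision, recall, F1-score) can all be derived from the confusion matrix. The sketch below computes them per class with NumPy; the 3 × 3 matrix in the test is illustrative only, not the paper's data, and classes with zero predictions would need a divide-by-zero guard in production use:

```python
import numpy as np

def metrics_from_confusion(cm):
    """cm[i, j] = number of class-i samples predicted as class j."""
    cm = np.asarray(cm, dtype=np.float64)
    tp = np.diag(cm)                    # correct predictions per class
    precision = tp / cm.sum(axis=0)     # column sums: all predictions of each class
    recall = tp / cm.sum(axis=1)        # row sums: all true samples of each class
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = tp.sum() / cm.sum()
    return precision, recall, f1, accuracy
```

Applied to Table 2, the per-class recall values are exactly the per-printer classification rates reported in Table 3 (diagonal count divided by samples per class).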
Table 2. Confusion matrix (rows: actual printer; columns: predicted printer; "--" denotes zero)

| Actual \ Predicted | Canon D1150 | Canon MF3240 | Canon MF4370DN | HP CP1518 | HP CP2025A | HP CP2025B | Lexmark E260DN | OKI Data C330DN | Samsung CLP315 |
|---|---|---|---|---|---|---|---|---|---|
| Brother HL-4070CDW | 1 | -- | 2 | -- | -- | 2 | -- | 1 | -- |
| Canon D1150 | 994 | 1 | 1 | 1 | 1 | 2 | -- | -- | -- |
| Canon MF3240 | -- | 997 | 1 | 1 | -- | -- | -- | -- | 1 |
| Canon MF4370DN | 1 | 2 | 992 | 1 | 2 | -- | 1 | -- | -- |
| HP CP1518 | -- | -- | -- | 999 | -- | 1 | -- | -- | -- |
| HP CP2025A | -- | -- | -- | -- | 1000 | -- | -- | -- | -- |
| HP CP2025B | -- | -- | 4 | -- | -- | 994 | 1 | 1 | -- |
| Lexmark E260DN | 3 | -- | 1 | 2 | -- | -- | 1000 | -- | -- |
| OKI Data C330DN | 4 | -- | -- | -- | -- | -- | -- | 1000 | -- |
| Samsung CLP315 | -- | 2 | -- | -- | 1 | -- | -- | -- | 993 |
Table 3. Classification accuracy using CNN with ReLU activation

| Printer model | Classification rate (%) | Error rate (%) |
|---|---|---|
| Brother HL-4070CDW | 99.4 | 0.6 |
| Canon D1150 | 99.4 | 0.6 |
| Canon MF3240 | 97.0 | 3.0 |
| Canon MF4370DN | 99.1 | 0.9 |
| HP CP1518 | 98.8 | 1.2 |
| HP CP2025A | 100 | 0.0 |
| HP CP2025B | 99.4 | 0.6 |
| Lexmark E260DN | 99.2 | 0.8 |
| OKI Data C330DN | 100 | 0.0 |
| Samsung CLP315 | 100 | 0.0 |
| Average accuracy | 99.3 | 0.7 |
Table 4. Performance comparison of the proposed method.

| Authors | Classifier | Accuracy (%) |
|---|---|---|
| Gonasagi et al. [14] | Linear SVM | 99.2 |
| Gonasagi et al. [14] | Quadratic SVM | 99.8 |
| Proposed approach | CNN + ReLU activation | 99.3 |
CONCLUSION
This research presented a CNN-based approach for identifying laser printer models using printed character
images. The experimental results demonstrate that the CNN model can effectively capture intrinsic printer
signatures, such as texture patterns and toner distribution characteristics, and classify printers with high accuracy.
The proposed method enhances forensic document authentication by providing an automated and reliable system
for identifying the source printer of a document. For future work, the study can be extended by including inkjet
and dot-matrix printers, utilizing word-level and line-level features, and increasing the dataset size to improve
model generalization. In addition, advanced transfer learning models such as ResNet and VGG can be applied
to further improve performance, while efforts can be made to reduce computational complexity for faster
processing. Overall, CNN-based printer identification systems can assist forensic experts in efficiently detecting
forged documents and determining document ownership, making them valuable tools in the field of digital
forensics.
REFERENCES
1. Hilton, O. (1992). Scientific examination of questioned documents. CRC Press.
2. Schreyer, M., Schulze, C., Stahl, A., & Effelsberg, W. (2009, March). Intelligent Printing Technique
Recognition and Photocopy Detection for Forensic Document Examination. In Informatiktage (Vol. 8,
pp. 39-42).
3. Elkasrawi, S., & Shafait, F. (2014, April). Printer identification using supervised learning for document
forgery detection. In 2014 11th IAPR International Workshop on Document Analysis Systems (pp. 146-
150). IEEE.
4. Tsai, M. J., & Liu, J. (2013, May). Digital forensics for printed source identification. In 2013 IEEE
International Symposium on Circuits and Systems (ISCAS) (pp. 2347-2350). IEEE.
5. Lampert, C. H., Mei, L., & Breuel, T. M. (2006, November). Printing technique classification for
document counterfeit detection. In 2006 International Conference on Computational Intelligence and
Security (Vol. 1, pp. 639-644). IEEE.
6. Mikkilineni, A. K., Arslan, O., Chiang, P. J., Kumontoy, R. M., Allebach, J. P., Chiu, G. T. C., & Delp,
E. J. (2005, January). Printer forensics using SVM techniques. In NIP & Digital Fabrication Conference
(Vol. 2005, No. 1, pp. 223-226). Society for Imaging Science and Technology.
7. Wu, Y., Kong, X., & Guo, Y. (2009, November). Printer forensics based on page document's geometric
distortion. In 2009 16th IEEE International Conference on Image Processing (ICIP) (pp. 2909-2912).
IEEE.
8. Ferreira, A., Bondi, L., Baroffio, L., Bestagini, P., Huang, J., Dos Santos, J. A., ... & Rocha, A. (2017).
Data-driven feature characterization techniques for laser printer attribution. IEEE Transactions on
Information Forensics and Security, 12(8), 1860-1873.
9. Jain, H., Joshi, S., Gupta, G., & Khanna, N. (2020). Passive classification of source printer using text
line-level geometric distortion signatures from scanned images of printed documents. Multimedia Tools
and Applications, 79(11), 7377-7400.
10. Shang, S., Memon, N., & Kong, X. (2014). Detecting documents forged by printing and copying.
EURASIP Journal on Advances in Signal Processing, 2014(1), 1-13.
11. Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional
neural networks. Advances in neural information processing systems, 25.
12. Ioffe, S., & Szegedy, C. (2015, June). Batch normalization: Accelerating deep network training by
reducing internal covariate shift. In International conference on machine learning (pp. 448-456). PMLR.
13. Fix, E., & Hodges, J. L. (1951). Discriminatory analysis, nonparametric discrimination: Consistency
properties (Tech. Report 4, pp. 1-21). USAF School of Aviation Medicine, Randolph Field, Texas.
14. Gonasagi, P., & Hangarge, M. (2022). Source identification of documents based on LOOP features.
In Futuristic Trends for Sustainable Development and Sustainable Ecosystems (pp. 237-248). IGI
Global Scientific Publishing.