Page 800
www.rsisinternational.org
INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,
MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)
ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XV, Issue V, May 2026
Knowledge Management (Km) Information System for Laguna State
Polytechnic University - Santa Cruz Campus
Badillo, Jerahmeel A , Calapao, Jan Reinnen S , Rebong, Dexter D , Villarica, Mia V
College of Computer Studies Laguna State Polytechnic University Laguna, Philippines
DOI:
https://doi.org/10.51583/IJLTEMAS.2026.150500067
Received: 01 May 2026; Accepted: 07 May 2026; Published: 01 June 2026
ABSTRACT
Documents in any institution have been outnumbered and scattered in various repositories, hindering efficient
information retrieval and threatening institutional memory. This study developed a web-based Knowledge
Management Information System for Laguna State Polytechnic University - Sta. Cruz Campus that consolidates
core institutional documents such as research outputs, policies, course materials and extension documents into a
centralized, secure, and searchable platform.
Employing a Developmental and Experimental Research Design,
KMIS engineered four core intelligent components: Role-Based Access Control centralized repository, Retrieval
Augmented Generation pipeline that uses vector embeddings to utilize semantic search and LLM-generated
summaries from natural language queries. Lastly, the automated QPRO analysis engine. The developed KRA
text classification model achieved an overall accuracy of 98% and automatically categorized institutional reports
against the 22 Key Result Areas from the 2025-2029 LSPU Strategic Plan.
Keywords: Knowledge Management, Centralized Repository, Role-based Access Control, Semantic Search,
Retrieval Augmented Generation, LLM, Transformers
INTRODUCTION
Knowledge can be considered as one of the most important assets of universities because it is used as a basis for
teaching-learning activities, research and administration. According to the World Bank, Higher Education
Institutions are able to manage information through digital systems which allows them to organize their services
effectively and improve decision-making. Knowledge Management Information Systems at Laguna State
Polytechnic University Sta. Cruz Main Campus can help in gathering files and practices scattered in different
areas to a secured platform with searchable functions that can ease every day transactions. Knowledge at LSPU
Sta. Cruz Main Campus is fragmented. It is located per office, each individual's google drive and some are
kept in different systems. This makes it difficult to locate, reuse and manage vital documents.
Currently knowledge at the university is distributed among offices, personal google drives and other platforms
making it difficult to identify, reuse and manage key papers. This results in duplicate effort, out-of-date
references, and loss of institutional memory with human or institutional employee changes. One approach to
address difficulties is a campus-wide knowledge management information system that provides a central
repository for institutional files and reports such as research outputs, syllabi, policies, and extension reports with
explicit access controls and standard forms.
Artificial intelligence can improve the efficiency of knowledge management by offering more precise search
results and useful tags for content (Taherdoost, 2023). Intelligent search allows university users to ask natural
questions and receive relevant results, while automated tagging helps to provide consistent metadata without
burdening contributors. In practice, intelligent search allows users in universities to ask queries in natural
language and get relevant answers, while automated tagging ensures consistent metadata without burdening
contributors. These two aspects are realistic and valuable to be focused in the first place by LSPU Sta. Cruz
Main Campus and not a commitment for immediate enhancements, but for future enhancements with analytics
and chatbots.
Page 801
www.rsisinternational.org
INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,
MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)
ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XV, Issue V, May 2026
The ideal knowledge management information system for the campus should be accessible and secure and role
mapped to the different roles like administrators, faculty, personnel and students. It should have simple upload
work-flows, standard tags and reliable retrieval so knowledge flows easily between people and units. Such
infrastructure allows universities to retain institutional memory, speed up academic and administrative processes
and raise the profile of research and community outcomes. In the World Bank’s view, the digital transformation
in Philippine higher education is about building shared platforms and data-driven processes that will improve
teaching, research and governance. These directions are good background for a KMIS which enhances
knowledge consolidation, discoverability and responsible access at LSPU Sta. Cruz Main Campus. Further
details on the performance metrics are given in the following sections. What we need now is a good working
central system for the regular chores. This project is about the design and implementation of a KMIS for Laguna
State Polytechnic University Sta.Cruz Main Campus. The initial scope was a central repository, role-based
access, intelligent search and auto tagging.
Research Objectives
The purpose of this study is to create and develop a web-based Knowledge Management Information System
(KMIS) for Laguna State Polytechnic University Sta. Cruz Main Campus that creates institutional knowledge
into a single, secure, searchable platform with role-based access and intelligent search to improve everyday tasks
such as teaching, research and administration. Specifically, this study aimed to do the following:
1. To design and implement a centralized repository with role-based access controls for core assets or
institutional documents such as research outputs, course materials, campus policies, and extension
documents, ensuring secure storage and consistent organization across units.
2. To implement a semantic search functionality capable of processing natural language queries to optimize
information discovery and improve search precision while streamlining data retrieval workflows.
3. To develop an automated quarterly progress report on objectives analysis engine that intelligently evaluates
strategic plan progress through AI-powered document analysis, providing real-time insights into key result
areas and key performance indicator achievement across organizational units.
4. To design and train a text classification model for the automated categorization of institutional
accomplishments into specific key result areas (KRAs) and to evaluate the performance of the trained model
using standard metrics such as accuracy, precision, recall, and F1-score.
RELATED LITERATURE
The accumulated studies have been analyzed by the researchers and have served as a guide to concepts, methods,
strategies, and techniques in order to achieve the desired goals for the implementation of a centralized
Knowledge Management Information System.
Knowledge management aims to develop a structured process for creating, capturing, and sharing institutional
information or assets. Alvarenga et al. (2020) defined digital transformation as an essential shift that changes
how knowledge is produced and stored in higher education which makes it more efficient and relevant. In fact,
Putri et al. (2023) emphasized that without a centralized system, institutional knowledge remains a personal
advantage of individuals rather than a shared asset of the university, leading to documentation lacks and
duplication of effort.
In relation to the technical implementation, Che et al. (2024) highlighted the use of Retrieval-Augmented
Generation (RAG) and hierarchical context augmentation as powerful methods for handling huge number of
scientific papers. This approach allows the system to capture institutional knowledge such as the relationship
between abstracts which is more effective than traditional keyword-based searches. Similarly, Tinh and Phuong
(2025) highlighted heading-aware chunking as a way to enhance context retrieval by ensuring that the system
understands the contained headings and directory hierarchies of university files.
Page 802
www.rsisinternational.org
INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,
MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)
ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XV, Issue V, May 2026
Moreover, the application of Large Language Models has revolutionized document classification and
information retrieval. Xu (2024) shows that models like LLaMA2 outperform typical machine learning
classifiers in terms of precision and recall for academic document classification. Wang and Pei (2024) researched
LLM-based query expansion for increasing the accuracy of user inquiries so that the system could give accurate
answers even if the query terms of the user are incomplete.
Finally, the design of the system interface is important to its acceptance and success. According to Kurniawan
et al. (2024), the platform addresses the real business needs of a higher education institution using the User-
Centered Design (UCD) and evaluation by System Usability Scale (SUS). The proposed KMIS of the LSPU
Santa Cruz Campus integrates these sophisticated AI-enabled retrieval techniques in a user-centered design
paradigm to establish a repository of institutional knowledge that is robust, effective and highly accessible.
Digital transformation claims adaptive knowledge management in higher education institutions (HEIs) to sustain
relevance among fast technological and informational shifts. Alvarenga et al. (2020) emphasized that universities
must properly manage knowledge resources to promote intellectual alignment with institutional visions and
strategies. Mehta (2021) described knowledge management as essential for collecting, analyzing, and sharing
data to generate insights, strategies, and innovations among students and personnel where knowledge
management handles organizational output through best practices and efficient processes. Galgotia & Lakshmi
(2022) highlighted its role in enabling quick responses to challenges, better decision-making, skill enhancement,
reduced redundancies, cost savings, and error minimization via targeted policies and faster information access.
Putri et al. (2023) identified issues like poor documentation, uncoordinated updates, trust on informal verbal
sharing, duplicated efforts in operations, not using of online guides, and human errors producing misleading
data.
Academic repositories are improving knowledge access and storage in higher education by using structured
metadata and open data principles. Mosha & Ngulube (2023) literature study 2003-2023 through Scopus, Web
of Science, Google Scholar, Boolean searches. Metadata standards highlight repositories in research data sharing
since open data paradigm. Dube (2025) support the whole research data lifecycle, in accordance with the FAIR
principles of discoverability, reuse and collaboration. Faculty and institutional repositories gather different
resources in one place, making them more efficient. Faculty repositories, as observed by Zibani, Rajkoomar, &
Naicker (2021), can be used to curate teaching, learning, and research information and thereby save search time
compared to using the web to locate scattered web resources. Ferreras et al. (2013) pointed out the importance
of openness, rich metadata and interoperability in institutional repositories for information access.
Effective KM requires knowledge repositories with technical features for sharing and discovery. Fadhlan &
Sensuse (2022) recommend the use of ontological tagging for organizational knowledge bases that may be used
by LSPU students and faculty because data catalogs, strong search, APIs and visualization are needed.
RESEARCH METHODOLOGY
Research Design
The suggested Knowledge Management Information System has been carefully designed, developed and
deployed utilizing the developmental research design. The approach was to evaluate the existing information
management processes at LSPU, identify system needs, design the system architecture, and design core features
such as centralized knowledge repository, AI-based search, automatic tagging, and recommendation tools.
Developmental design included system development, integration of machine learning components, and
continuous improvement based on usability and usefulness. This approach is ideal (Clark-Plaskie et al., 2022)
as the primary purpose of the project is to create a usable institution-specific KM Information System that
promotes knowledge storage, retrieval, and sharing in the university. In addition, an experimental study design
was used to evaluate the performance of the constructed system. The second part of the study is controlled
testing. In this part the accuracy of the semantic search function, the relevance of the document insights and
suggestions, and the usability and reliability of the system are assessed. The team examined the system outputs
and the user interactions to assess if the new KMIS features actually enhance information accessibility and user
Page 803
www.rsisinternational.org
INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,
MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)
ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XV, Issue V, May 2026
experience in the academic setting of the LSPU.
According to Clark-Plaskie et al. (2022), developmental research designs are effective for studies focused on
system creation and improvement because they allow researchers to observe changes and enhancements
throughout the development process. Meanwhile, Zubair (2022) explained that experimental research includes
controlled testing of variables to determine their effects on results. By relating the concepts above, combining
both developmental and experimental research designs ensured that the proposed KMIS was not only should be
systematically evaluated in terms of functionality, performance, and user satisfaction.
Applied Concepts and Techniques
The study advanced on the principles and methods that were used in developing the knowledge management
information system so as to solve the challenges connected with disorganized information storage and retrieval,
the following concepts and methodologies were used:
Knowledge Management (KM)
Knowledge management is the basic concept used in this study. Knowledge management is a systematic
approach to obtaining, storing, organizing, sharing and retrieving knowledge inside an institution. It is composed
of academic materials, research findings, administrative documents, and institutional policies in the setting of
Laguna State Polytechnic University. The KMIS is designed to serve core KM tasks, including knowledge
creation, storage, sharing and reuse, through a centralized digital repository.
Figure 4. Knowledge Management Cycle (Manjunatha, 2023)
Figure 4 depicts the cyclic process of the framework for institutional knowledge handling in Laguna State
Polytechnic University The diagram shows four interconnected phases which are capture, store, organize and
share which are placed in a continuous loop to emphasize the iterative nature of information management. They
draw the dynamic flow with arrows from stage to stage. Knowledge is obtained from institutional documents,
such as strategic plans and operational reports, stored in centralized repositories, structured to enable easy access
and retrieval, and shared with stakeholders to guide decision making and align performance.
Machine Learning
Machine learning is a way for computer systems to find patterns from data and make predictions without being
explicitly programmed. This study applies machine learning to categorize institutional documents into the 22
Key Result Areas (KRAs) in LSPU’s Strategic Plan 20252029, converting unstructured strategic documents
and QPRO reports into structured, actionable insights for performance monitoring and alignment.
Page 804
www.rsisinternational.org
INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,
MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)
ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XV, Issue V, May 2026
Figure 5. Machine learning process diagram (Al-Saad et al. 2023)
Figure 5 shows the operation flow of ML algorithms used in this study. The procedure starts with a dataset of
institutional papers such as the QPRO reports, the filled PDO forms and the LSPU Strategic Plan. That leads
into data cleansing to eliminate inconsistencies, then feature engineering to provide structured inputs.
Retrieval Augmented Generation
Retrieval Augmented Generation (RAG) is a technique to augment large language models with real-time
retrieval from an external knowledge source. It generates contextually correct responses. It uses live documents
for the domain. This study uses RAG to enable the institutional knowledge repository to provide semantic search
for all kinds of submitted documents such as PDF, Word files, spreadsheets and reports regardless of their format
or structure.
Once documents are submitted to the centralized repository, documents will be processed automatically
including text extraction, generation of semantic embeddings with transformer models, and indexing in a vector
database for efficient similarity searching. When the user poses a query, the system will perform a retrieval step
where chunks of documents that are most semantically similar will be identified and ranked based on the cosine
similarity of the embeddings. The retrieved segments are then used as context for the generative model to
synthesize accurate answers, citing the location of the sources.
Figure 6. Retrieval Augmented Generation Diagram
Page 805
www.rsisinternational.org
INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,
MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)
ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XV, Issue V, May 2026
Figure 6 depicted the design of the Retrieval Augmented Generation pipeline implemented in the institutional
knowledge management information system. Smart Search occurs on user query with vector similarity matching
on embeddings in knowledge graph to retrieve relevant information from uploaded files. This context is loaded
with the query and sent to a Large Language Model which provides answer output based on the retrieved
provided documents.
Tokenization
Tokenization is the first step in preprocessing in natural language processing. It breaks down unstructured texts
of institutional documents into separate pieces or tokens that a machine can interpret. In this work, we used the
distilbert-base-uncased tokenizer in Hugging Face Transformers to tokenize the extracted text from the LSPU
Strategic Plan 20252029 and QPRO reports into subword tokens that preserve the semantic sense of the text
but enable out-of-vocabulary phrases to be processed by WordPiece algorithms.
Figure 7. Tokenization Method (Hashem et al., 2021)
Figure 7 provides an example of the tokenization approach that divides a phrase, a sentence, a paragraph or a
full text document into smaller units such as single words or phrases. Each little component is called a token
(Hashem et al., 2021).
Large Language Model
Large Language Models (LLMs) are transformer-based neural networks trained on enormous text structure,
offering improved capabilities in natural language processing, creation and reasoning. In this work we show how
LLMs can be used to improve semantic search, by improving the user query, generating semantically correct
answers from retrieved chunks of documents and generating prescriptive analysis from QPRO reports. It can
identify the Key Result Areas (KRA’s) and performance gaps between the actual operational data and the
strategic targets and can generate actionable recommendations e.g. re-allocation of resources or priority
interventions for planning officers. In other words, it can translate raw institutional documents into strategic
decision support.
Figure 8. How Large Language Model Work
Page 806
www.rsisinternational.org
INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,
MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)
ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XV, Issue V, May 2026
The main inference method of the LLM in the semantic search engine is illustrated in Figure 8. The instruction
prompt with labeled examples, and test input are created and passed to the Large Language Model. This
structured input is then passed to the model to generate text. The resulting text is then translated into labels to
build the final outputs such as KRA classifications or prescriptive insights.
Algorithm Analysis
The proponents used techniques for the analysis of the algorithms that may help the researchers to determine the
most suitable for the development of AI based system for institutional document classification and Key Result
Area (KRA) mapping in the Knowledge Management Information System (KMIS) application. To mitigate this,
researchers explored a number of transformer models for natural language processing (NLP) cited in the
literature, particularly those related to text classification and knowledge organization. The researchers hope that
by testing and comparing various models, they would be able to determine which algorithm offers the best
accuracy, speed and reliability for sorting university assets into their respective KRAs.
DistilBERT
DistilBERT is a distilled version of BERT base which is a compact, fast, inexpensive and light Transformer
type. It is designed to be 40% smaller than a BERT model, while keeping the same language understanding and
running 60% faster. DistilBERT employs a method known as knowledge distillation in which a smaller model
is trained to replicate the behaviors of a larger model. It makes it viable for settings where the computational
capacity is limited but the high-quality text classification is still needed.
Figure 9. DistilBERT Architecture
The architecture of the DistilBERT used for KRA text classification is presented in Figure 9. The model takes
as input tokenized text from QPRO reports and segments of the strategic plan. BERT Embedding Layer generates
contextual embedding. These are then fed through 6 Transformer Layers (diluted from the original 12 of BERT)
each consisting of multi-head attention mechanisms to capture semantic relationships as well as feed-forward
networks to apply non-linear transformations. Layer Normalization which stabilizes training. Multi-Head
Attention, which allows you to attend to different parts of the text at the same time. The final output layer
generates KRA probability distributions to facilitate appropriate transformation from operational data to strategic
objectives.
Page 807
www.rsisinternational.org
INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,
MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)
ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XV, Issue V, May 2026
DeBERTa-V3
DeBERTa-V3 (Decoding-enhanced BERT with Disentangled Attention Version 3) is a major improvement of
the original Transformer models, combining the pre-training methodology of ELECTRA with a disentangled
attention mechanism. While the common BERT models merge content and position information into a single
vector, DeBERTa disjoins the two, allowing the model to investigate how the meaning of administrative phrases
changes according to their relative distance to one another. Specifically, V3 uses Gradient-Disentangled
Embedding Sharing (GDES) to improve the training efficiency and learn more discriminative features for each
category.
Figure 10. DeBERTa-V3 Disentangled Attention and RTD Mechanism
The main architecture, as shown in Figure 10, employs a Replaced Token Detection (RTD) objective rather than
standard masking. Here, a generator model replaces some words with plausible fakes, and the DeBERTa
discriminator has to detect the true words. This sensitizes the model to indirect changes in organizational
language. In addition, the disentangled attention mechanism computes four different interaction scores, namely
content-to-content, content-to-position, position-to-content and position-to-position. By separately considering
these interactions, the KMIS AI is able to better discriminate between a “Research Report” (KRA 5) and a
“Financial Report for Research” (KRA 22) and reach the level of accuracy and precision required for institutional
data management.
MiniLM
MiniLM (Deep Self-Attention Distillation) is a highly efficient lightweight pre-trained model that focuses on
distilling the "self-attention" distributions of deep Transformer networks. Unlike other models that distill the
output of layers, MiniLM focuses on the relations between tokens. This allows the model to capture the complex
structure of a sentence even with a much smaller number of parameters, making it one of the best choices for
high-speed automated tagging and real-time search functionality in information systems (Wang et al., 2020).
Figure 11. MiniLM Self-Attention Distillation
Page 808
www.rsisinternational.org
INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,
MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)
ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XV, Issue V, May 2026
Figure 11 shows how MiniLM processes text by distilling the "Self-Attention" maps from the teacher model.
The researchers apply a deep self-attention distillation strategy, training the student model to mimic the attention
values and the “Value-Relation” between tokens. This guarantees that the model preserves the most important
linguistic features needed for the classification of the 22 KRAs of the university. The main merit of this model
is its capability of working on a high level and being substantially smaller than the standard models. This helps
the KMIS to stay responsive with the increase of the university's document database.
Data Collection Methods
Data collection methods are the process of collecting and evaluating information or data from multiple sources
to find answers to research problems, answer questions, evaluate outcomes, and forecast trends and probabilities
(Pulkit, 2025). In addition, data collection is the methodological process of gathering information about a specific
subject. It is crucial to ensure the data is complete during the collection phase and that it is collected legally and
ethically (Cote, 2021).
The researchers’ study entitled “Knowledge Management Information System (KMIS) for Laguna State
Polytechnic University- Santa Cruz Campus” is a collection of institutional papers and knowledge assets that
were exclusively obtained from the university departments. Such resources as accomplishment reports, research
outputs, administrative files were gathered by systematic collecting throughout LSPU units such as CSS, COE,
OSAS, HRMU etc. The Researchers perform formal data gathering by personally visiting various offices of
LSPU, the College of Computer Studies and administrative facilities to ask for assistance and to validate the
relevance of the dataset for KMIS development. They also worked with faculty specialists and department heads
internally through official channels to provide access to complete records needed for training, testing and
implementation of AI models for automated tagging, search and knowledge sharing.
Data Model Generation
In this study, the researchers include data collection, pre-processing, splitting and balancing, testing and
validation, data evaluation, model training and model evaluation, model selection, and integration. Hence, it
replicates within the scope of the data information that was being generated.
Figure 12. Data generation process
Figure 12 states that the data generation process involves four key stages, population, selection, recording, and
derivation. Each of which can introduce specific types of gaps or biases. The representation gap occurs during
the selection phase when only a subset of the population is chosen, which may not accurately reflect the whole.
The accuracy gap happens in the recording phase due inconsistencies in the captured data. The interpretation
gap happens during derivation where incorrect conclusions may be drawn from incomplete or biased data.
Page 809
www.rsisinternational.org
INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,
MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)
ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XV, Issue V, May 2026
Data Preparation
Data preparation is important to ensure that the institutional documents obtained are ready for analysis and
matched with the study’s objectives. The data preparation in this research was concentrated in the refining of the
Quarterly Physical Report of Operation (QPRO), and the LSPU Strategic Plan 2025-2029 to generate accurate
and reliable information for evaluation according to the needs of the study.
QPRO documents of each campus unit were collected, classified and formatted in a similar manner and then
labelled as the test dataset. This is to see if the operational performance of the university meets the requirements
and performance targets set in the strategic plan. The reports were checked for completeness, consistency of the
reporting period and comparability of the indicators in order to leave only certified entries for further research.
The LSPU Strategic Plan 2025-2029 is subject to data cleaning to eliminate narrative portions and other items
not directly related to the assessment of institutional performance. Critical components, i.e. Key Result Areas
(KRAs) and their related Key Performance Indicators (KPIs) were the only components which were kept and
encoded in a structured fashion to be the major reference framework. These KRAs and KPIs were further linked
to the respective indicators of the QPRO dataset. This allowed for a logical and objective comparison between
the actual operational data and the strategic performance requirements of the university.
Data Pre-processing
Data preprocessing in this study was performed on the LSPU Strategic Plan 20252029, which was provided as
a PDF document. The goal of this stage was to convert the unstructured textual content into a machine-readable
format suitable for input to the trained chosen model through a series of extraction, cleaning, balancing, and
tokenization procedures.
For data extraction and labeling, the text was first broken down from the PDF using a Python-based PDF
processing library. To partner each text segment with its corresponding Key Result Area (KRA), a rule-based
labeling strategy was applied. Specifically, the document was read consecutively and whenever a KRA marker
was encountered, all following lines were defined with that KRA label until the next KRA marker appeared,
resulting to a labeled corpus organized by KRA.
The text that was extracted was then cleaned to remove noise from the PDF structure and formatting. Headers,
footers, labels and administrative phrases that did not contain any useful information were removed via stop-
phrases and regular-expression matching. Bullet points, numbering stylization, and other special characters were
removed. Extremely small chunks of text were removed. Identical duplicate rows were dropped to prevent data
leakage in model training.
An inspection of the labeled corpus showed remarkable imbalance across the 22 key result areas (KRA) with
some areas being excessively relative to others. To reduce this, random oversampling was used which the KRA
with the highest sample count was taken as the reference and instances from minority KRAs were repeatedly
sampled with replacement until all KRAs attained an equal number of examples. This procedure yielded a fully
balanced dataset with uniform representation for each KRA class.
Data Splitting and Balancing
Initial observations revealed class imbalance across the 22 KRAs. Random Oversampling accomplished this by
identifying the KRA with maximum samples and up-sampling minority classes with replacement until all
achieved equal representation. The balanced dataset was then structured into training (80%) and testing (20%)
subsets to maintain proportional KRA distribution across splits, ensuring robust model evaluation.
Data Testing and Validation
The test dataset, which is 20 % of the balanced and tokenized corpus, was set aside for the final model evaluation
to achieve an unbiased performance evaluation. To address the multi-class (22 KRAs) balancing, the DistilBERT
Page 810
www.rsisinternational.org
INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,
MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)
ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XV, Issue V, May 2026
model was evaluated with accuracy, precision, recall, F1-score, and macro-averaged scores on the lasted subsets
from the training split. We then tuned the hyper-parameters and detected over-fitting on the training set via cross-
validation with early stopping.
Confusion matrix was built to perform detailed per-class performance evaluation and identify misclassifications
among related KRAs by comparing the model predictions on the test set with the actual KRA labels. Results
were evaluated for statistical significance using bootstrap resampling of test predictions, demonstrating
resilience across the data set. All validation methods are standard procedures for NLP categorization, so we can
be confident of the insights gained with respect to the degree to which the model aligns extracted text segments
to strategic plan objectives.
Data Evaluation
After testing and validation, the data is assessed to determine the accuracy and reliability of the constructed
dataset. The evaluation process is conducted using analytical techniques to assess the dataset’s capability to
support decision making for predictive purposes, missing value analysis, outlier detection and data consistency
with the original LSPU Strategic Plan and QPRO documents. The assessment uses analytical tools that consider
the ability of the dataset to predict decision making, including missing values checks, outlier detection and
consistency check against the original LSPU Strategic Plan. Descriptive statistics, distribution plots and cross-
referencing techniques were used to assess the data quality to generate reliable insights avoiding confusing
patterns The data format and sources were reviewed to confirm the quality and consistency of the information
used in the analysis.
Model Integration
The researchers did not directly integrate the model to the system, but first took steps to verify the reliability of
the model. They wanted to see if the model was good enough. Model selection stage: it was tested thoroughly
for performance before it was used as a model. The measures were the accuracy, precision and overall
effectiveness of the selected algorithms. And if there were any problems then they would be fixed here so the
model would work in the application and give results.
RESULTS AND DISCUSSION
This chapter presents the comprehensive results and detailed discussion relating to the development and
evaluation of the Knowledge Management Information System (KMIS) for Laguna State Polytechnic University
Santa Cruz Campus.
Research Objective 1: To design and implement a centralized repository with role-based access controls for
core assets or institutional documents such as research outputs, course materials, campus policies, and extension
documents, ensuring secure storage and consistent organization across units.
Researchers have developed a centralized institutional document repository providing a single document model
to store institutional core assets of documents with standard metadata and versioning, and role-based access
controls. The repository is organized at all institutional units with a dedicated unit entity and explicit unit-
document linkage. This allows structured grouping and retrieval while having a single source of truth. Data are
stored in a secure environment with controlled access via role-based access control with defined institutional
roles.
In addition to global RBAC, the system also supports unit scope (READ/WRITE/ADMIN) and document level
permissions, thus providing fine-grained governance over sensitive institutional files. The repository supports
accountability and auditability by persistently logging document views and downloads to reinforce data
governance requirements anticipated in an institutional knowledge management environment.
Page 811
www.rsisinternational.org
INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,
MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)
ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XV, Issue V, May 2026
Figure 17. Screenshot of the system’s repository UI
The visual implementation of the KMIS document repository is shown in Figure 17. Respond to Research
Question 1: KMIS document repository as common and secure platform for institutional documents. UI is well
balanced with a database-backed structure with institutional assets such as completed PDO forms and their
associated metadata and versioning. The repository supports Role Based Access Control (RBAC) for assignment
of permissions at the unit scope (READ/WRITE/ADMIN) and permissions at the document level. The system
will generate a standard file list with standard metadata elements like document type, unit of origin and version
number, the researchers said. The database allows version control and traceability assuring the user that he/she
is looking at research outputs, course materials or administrative policies. This is important standardization to
make auditability and compliance possible. It is consistent information and UI is clear with upload workflows
where required metadata can be added.
Research Objective 2: To implement a semantic search functionality capable of processing natural language
queries to optimize information discovery and improve search precision while streamlining data retrieval
workflows.
The researchers developed a semantic search that can take natural language queries. Researchers built a potent
Retrieval-Augmented Generation (RAG) pipeline tailored for extracting institutional knowledge. This pipeline
allows end-users to submit natural-language queries through a chat-style endpoint that uses LLM augmentation
to convert retrieved materials into accurate, human-readable summaries. The system adopts semantic processing
for high precision in search. Chunking is applied to documents, and these are converted to vector embeddings.
Retrieval is also optimized for speed with a managed vector index for efficient nearest neighbor queries. This
system supports efficient workflows such as ad-hoc searching of attached files by a temporary session chat search
path and continuous institutional search by persistent document embeddings.
Page 812
www.rsisinternational.org
INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,
MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)
ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XV, Issue V, May 2026
Figure 21. AI-Powered Search Query Output
Figure 21 shows that the system successfully implements semantic search and the Retrieval-Augmented
Generation (RAG) pipeline. The interface tests the system’s ability to enhance data retrieval processes by
displaying the source documents underlying the answer, which significantly enhances search accuracy, optimizes
information search and provides evidence-based responses to increase user confidence in the system’s ability to
retrieve knowledge.
Research Objective 3: To develop an automated quarterly progress report on objectives analysis engine that
intelligently evaluates strategic plan progress through AI-powered document analysis, providing real-time
insights into key result areas and key performance indicator achievement across organizational units.
The researchers built an end-to-end pipeline for QPRO analyses. This pipeline allows units to submit Quarterly
Progress Report of Operation (QPRO) documents which immediately invoke a server-side Analysis Engine. The
engine leverages embeddings and vector search to semantically identify the best matching document evidence
to the strategic plan. Then it asks an integrated LLM to generate structured outputs of analysis such as alignment
narratives, opportunities, gaps, recommendations, and an achievement score calculated for AI-powered
interpretation.
The outputs generated by the LLM are verified and stored in the QPRO Analysis database model, connected to
the original document. Most importantly, the system automatically extracts activities from the QPRO reports
and maps them to Key Result Area/Key Performance Indicator identifiers. The system stores and processes
analyses to provide strategic insights in real time. Background processing results in timely performance. The
final results are presented through KPI/KRA dashboards and detail views which display achievement
percentages and status badges.
Page 813
www.rsisinternational.org
INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,
MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)
ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XV, Issue V, May 2026
Figure 22. Submission Section of Reports
Figure 22 shows the upload flow of the Quarterly Progress Report of Operation by institutional units. On
submission of the document, the server-side Analysis Engine is invoked immediately to initiate the end-to-end
process of ingestion of the document, extraction and semantic analysis of the document content using
embeddings and Large Language Models, and mapping of activities to Key Result Areas (KRAs) and Key
Performance Indicators (KPIs).
Figure 23. Overview of the Analyses
Figure 23 illustrates the overview section showing the performance tracking dashboard that yields specific
insights regarding the institution’s strategic alignment and operational performance.
Page 814
www.rsisinternational.org
INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,
MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)
ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XV, Issue V, May 2026
Figure 24. AI Generated Insights
Figure 24 depicts the AI Insights module of the Knowledge Management Information System (KMIS). The
diagnostic assessment and the recommended strategic interventions are discussed based on the reviewed data of
institutional performance. This automated output transforms raw quarterly reports into real-time data-driven
guidance that enables administrators to quickly track KRA achievement and make targeted decisions on priority
interventions or resource allocation.
Research Objective 4: To design and train a text classification model for the automated categorization of
institutional accomplishments into specific key result areas (KRAs) and to evaluate the performance of the
trained model using standard metrics such as accuracy, precision, recall, and F1-score.
In developing and training the text classifier model that classifies institutional achievements into specific KRAs,
the researchers systematically developed the transformer-based classifier, which involved a complete process of
data preprocessing to extract, clean and oversample the LSPU Strategic Plan 2025-2029, in order to produce a
well-balanced dataset of 22 different classes of KRA, as well as to measure the performance of the trained model
by means of accuracy, precision, recall and F1-score. For this purpose, the researchers selected the DistilBERT
model because of its optimal performance with regards to the trade-off between speed of operation and language
comprehension capability. The model was trained to map QPRO text segments to the relevant strategic
objectives. In the evaluation phase, the model was put through strict tests to measure accuracy, precision, recall
and F1-score, in addition to a confusion matrix test to validate the reliability of the model to automatically
segment and classify the operational data, thus forming the core component of the prescriptive analytics engine
of the system.
Page 815
www.rsisinternational.org
INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,
MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)
ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XV, Issue V, May 2026
Figure 28. Model’s Performance Metrics
The performance metrics such as precision, recall and f1-score validate the effectiveness of the model to
automatically segment and recognize institutional reports into 22 Key Result Areas (KRAs) as shown in Figure
28.
Table 10 presents the results of the Ground Truth Verification phase to assess the performance of the AI-powered
analysis engine of the Knowledge Management (KM) Information System. Twenty-five completed PDO
documents were scored by PASS/FAIL validation and accuracy scoring. Of the 25 documents rated, 20 were
rated as “PASS” and 5 were rated as “FAIL”, with an overall model accuracy of 80%. The system is considered
“Accurate” by the interpretation scale used in the study, indicating that the model was able to properly extract,
interpret and analyze the majority of the given PDO documents.
The evaluation results revealed that the majority of the identified errors were categorized as “Data Extraction
Errors”. These errors occurred when the system incorrectly identified some information from the documents due
to inconsistencies in formatting or unclear text structures. Despite these limitations, the system still able to
maintain a satisfactory level of overall performance during the evaluation process.
SUMMARY, CONCLUSIONS, AND RECOMMENDATIONS
This study titled “KNOWLEDGE MANAGEMENT (KM) INFORMATION SYSTEM FOR LAGUNA STATE
POLYTECHNIC UNIVERSITY - SANTA CRUZ CAMPUS” which intends to develop a web-based
Knowledge Management Information System (KMIS) for Laguna State Polytechnic University Sta. Cruz Main
Campus that unites institutional documents into a single, secure, and searchable platform, utilizing role-based
access to documents, intelligent or semantic search to improve retrieval of documents, efficient research, and
administration. The researchers implemented a secure repository with role-based access control features for core
institutional documents and applied semantic search functionality using natural language queries to optimize
Page 816
www.rsisinternational.org
INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,
MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)
ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XV, Issue V, May 2026
discovery. The researchers also developed an automated quarterly progress report of operation (QPRO) analysis
engine to evaluate strategic plan progress against key result areas (KRA) and key performance indicators (KPI).
Lastly, the researchers design and train a KRA classification model to automatically categorize accomplishments
into 22 specific KRAs and evaluate its performance using standard metrics.
Conclusion
Based on the findings, objectives and confirmed results from system development and evaluation from the study,
the following conclusions are established:
1. The study successfully developed a unified institutional document repository that utilizes Microsoft
Azure’s blob storage. This centralized platform is secured because of the implemented RBAC (role-based
access control) and a mechanism to request permission for cross-unit access.
2. The developed system has produced a highly precise semantic search capability that was achieved through
the implementation of Retrieval-Augmented Generation (RAG) pipeline. It allows users to submit natural
language queries and uses an integrated Large Language Model (LLM) to convert retrieved document
segments into accurate, human-friendly summaries grounded in institutional documents that are uploaded
in the repository.
3. The system did a good job at giving prescriptive analysis through the reports that are submitted by every
unit. It is with the help of the automated QPRO analysis engine that was developed. Upon document
submission, this engine uses vector search to semantically match extracted operational evidence against
the strategic plan. The outputs include performance gaps and actionable recommendations that are
displayed in dashboards fulfilling the objective to provide intelligent, real-time insights into strategic plan
progress.
4. The study designed a KRA text classification model to automatically recognize the institutional
accomplishments into the 22 specific Key Result Area (KRA) of the 2025-2029 LSPU Strategic Plan. The
model is rigorously evaluated using standard metrics such as precision, recall and f1-score to confirm the
model’s reliability.
Recommendations
Based on the conclusion drawn from the findings of the study, the following recommendations are offered to
improve and address the identified challenges.
1. Use an excellent Optical character Recognition (OCR) for extracting tables from PDF files as the current
system supports only DOCX files.
2. They could include a conversational AI assistant or chatbot into future research. This improvement noted as
a future upgrade in the scope and limitations will allow users to query the knowledge base using natural
language that provides a more intuitive and context-aware experience than the current chat-style search.
LITERATURE CITED
1. AN APPROACH TO SEMANTIC EDUCATIONAL
CONTENT
MINING USING NLP. (2022, July
19). Proceedings of the 16th International Conference on E-Learning (EL 2022). 16th
International Conference on e-Learning.
https://doi.org/10.33965/EL2022_202203L0 05
2. Anshari, M., Syafrudin, M., Tan, A., Fitriyani, N. L., & Alas, Y. (2023). Optimisation of Knowledge
Management (KM) with Machine Learning (ML) Enabled. Information, 14(1), 35.
3. https://doi.org/10.3390/info14010035 Antonio
de Carvalho Junior, M., & Bandiera-Paiva, P. (2021).
Implications of loosened Role-based Access Control session control implementation for the enforcement
of Dynamic Mutually Exclusive Roles properties on Health Information Systems. Informatics in
Page 817
www.rsisinternational.org
INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,
MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)
ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XV, Issue V, May 2026
Medicine Unlocked, 27, 100780. https://doi.org/10.1016/j.imu.2021.100780 Che,
4. T.-Y., Mao, X.-L., Lan, T., & Huang, H. (2024). A Hierarchical Context Augmentation Method to
Improve Retrieval-Augmented LLMs on Scientific Papers. Proceedings of the 30th ACM SIGKDD
Conference on Knowledge Discovery and Data Mining, 243254.
https://doi.org/10.1145/3637528.3671847
5. Dube, T. V. (2025). Research data management in academic libraries: Institutional repositories as a
reservoir for research data. Library Management, 46(5), 319331. https://doi.org/10.1108/LM-06-2024-
0070
6. Eguchi, M., & Kyle, K. (2024). Building custom NLP tools to annotate discourse-functional features for
second language writing research: A tutorial. Research Methods in Applied Linguistics, 3(3), 100153.
7. Faruqui, S. H. A., Tasnim, N., Basith, I. I., Obeidat, S., & Yildiz, F. (2024). Integrating A.I. in Higher
Education: Protocol for a Pilot Study with “SAMCares: An Adaptive Learning Hub” (No.
arXiv:2405.00330).
8. Galgotia, D., & Lakshmi, N. (2022). Implementation of Knowledge Management in Higher Education:
A Comparative Study of Private and Government Universities in India and Abroad. Frontiers in
Psychology, 13, 944153. https://doi.org/10.3389/fpsyg.2022.944153
9. Hafeez, S., Shahzad, K., Helo, P., & Mubarak, M. F. (2025a). Knowledge management and SMEs’ digital
transformation: A systematic literature review and future research agenda. Journal of Innovation &
Knowledge, 10(3), 100728. https://doi.org/10.1016/j.jik.2025.100728
10. S., Shahzad, K., Helo, P., & Mubarak, M. F. (2025b). Knowledge management and SMEs’ digital
transformation: A systematic literature review and future research agenda. Journal of Innovation &
Knowledge, 10(3), 100728. https://doi.org/10.1016/j.jik.2025.100728
11. Jochim, C., Gleize, M., Bonin, F., & Ganguly, D. (2021). TDMSci: A Specialized Corpus for Scientific
Literature Entity Tagging of Tasks Datasets and Metrics. In P. Merlo, J. Tiedemann, & R. Tsarfaty (Eds.),
Proceedings of the 16th Conference of the European Chapter of the Association for Computational
Linguistics: Main Volume (pp. 707714). Association for Computational Linguistics.
https://doi.org/10.18653/v1/2021.eacl-main.
12. Kitto, S., Chiang, H. L. M., Ng, O., & Cleland, J. (2025). More, better feedback please: Are learning
analytics dashboards (LAD) the solution to a wicked problem? Advances in Health Sciences Education,
30(1), 6985. https://doi.org/10.1007/s10459-024-10358-8
13. Kliimask, K., & Nikiforova, A. (2024). TAGIFY: LLM-powered Tagging Interface for Improved Data
Findability on OGD portals (No. arXiv:2407.18764). arXiv. https://doi.org/10.48550/arXiv.2407.18764
14. Kulkarni, A., & Eagle, M. (n.d.). Towards Understanding the Impact of Real-Time AI-Powered
Educational Dashboards (RAED) on Providing Guidance to Instructors.
15. Kurniawan, S. S., Maharani, N. Z., Sensuse, D. I., Purwaningsih, E. H., & Hidayat, D. S. (2024a).
Applying User Centered Design and System Usability Scale to Design Knowledge Management System
for Exam Proctors in Higher Education. Scientific Journal of Informatics, 11(4), 10431056.
https://doi.org/10.15294/sji.v11i4.9919
16. Lhoest, Q., Villanova del Moral, A., Jernite, Y., Thakur, A., von Platen, P., Patil, S., Chaumond, J.,
Drame, M., Plu, J., Tunstall, L., Davison, J., Šaško, M., Chhablani, G., Malik, B., Brandeis, S., Le Scao,
T., Sanh, V., Xu, C., Patry, N., Wolf, T. (2021). Datasets: A Community Library for Natural Language
Processing. In H. Adel & S. Shi (Eds.), Proceedings of the 2021 Conference on Empirical Methods in
Natural Language Processing: System Demonstrations (pp. 175184). Association for Computational
Linguistics. https://doi.org/10.18653/v1/2021.emnlp-demo.21
17. Liu, M., & Xu, J. (2025). NLI4DB: A Systematic Review of Natural Language Interfaces for Databases
(No. arXiv:2503.02435). arXiv. https://doi.org/10.48550/arXiv.2503.02435
18. Mehta, A. (2021). Knowledge Management in Higher Education Institutions: A Framework to Improve
Collaboration. Asian Review of 129 Mechanical Engineering, 10(2), 4347.
https://doi.org/10.51983/arme2021.10.2.3179
19. Methods and Applications of Fine-Tuning Llama-2 and Llama-Based Models: A Systematic Literature
Analysis. (2024). Journal of System and Management Sciences.
https://doi.org/10.33168/JSMS.2024.1015
20. Mosha, N. F., & Ngulube, P. (2023). Metadata Standard for Continuous Preservation, Discovery, and
Reuse of Research Data in Repositories by Higher Education Institutions: A Systematic Review.
Page 818
www.rsisinternational.org
INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,
MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)
ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XV, Issue V, May 2026
Information, 14(8), 427. https://doi.org/10.3390/info14080427
21. Muhammad farid fadhlan, null, & Dana indra sensuse, null. (2022). Knowledge Repository Design to
Improve Knowledge Management Process Capabilities: A Systematic Literature Review. Jurnal RESTI
(Rekayasa Sistem Dan Teknologi Informasi), 6(2), 246251. https://doi.org/10.29207/resti.v6i2.3929