Page 949
www.rsisinternational.org
INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,
MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)
ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XV, Issue II, February 2026
Building A Unified Benefits Data Repository for Real-Time Eligibility
and Enrollment Processing
Murugan Ambalakannu
Director Consulting Services, CGI, USA
DOI: https://doi.org/10.51583/IJLTEMAS.2026.15020000084
Received: 25 February 2026; Accepted: 02 March 2026; Published: 19 March 2026
ABSTRACT
The establishment of a Unified Client Database (CDB) system and the associated ePRO program has dramatically improved the efficiency of benefits administration. This client-centric approach enabled the Technical Lead from the Engineering Group to create an end-to-end data framework built on a normalized, metadata-driven design that supports intricate multi-client benefit structures, with secured ETL/API connections integrating enrollment, human resources, and claims systems, and with robust data governance through automated data validation, audit logging, and version control. The platform was developed for a cloud-based environment, using Redshift for analytics and AWS S3 as the archival data repository, supported by telemetry dashboards that give real-time visibility into synchronization health and ingestion latency. Implementing the platform has enabled near real-time downstream integration, improving the timeliness and accuracy of data, reducing manual reconciliation errors by more than 25%, and strengthening compliance controls. In alignment with the trend toward data-driven automation in enterprise EPM modernization, the CDB platform enables organizations to onboard clients faster, achieve greater operational transparency, and build a scalable foundation for future cloud services. The next phase of enhancements will evolve the CDB platform into a fully automated cloud-based ecosystem, including AI-based anomaly detection, expanded serverless microservices, and advanced Redshift analytics that provide predictive insight into benefits usage.
Keywords: Unified Client Database (CDB) system, Cloud-Based Environment, EPM Modernization, Serverless
Microservices
INTRODUCTION
The CDB and ePRO program were developed to help manage eligibility, enrollment, and benefits more effectively across all organizational divisions, to overcome the complexity of creating client-specific plan designs and keeping all plans up to date, and ultimately to provide one unified system for all data related to employee benefits administration. Historically, organizations struggled to manage employee benefit data, largely because of outdated technology, and relied on manual paper-based processes that produced inaccurate data, hurt employee morale, and put organizations at risk of non-compliance with government regulations. To give organizations a flexible and efficient way to administer employee benefits and integrate data for their clients, the CDB and ePRO program create a "single source of truth" for employee benefit information, alleviating the challenges of working with multiple data sources in non-standardized formats and the inability to obtain real-time updates about employee benefits. Modernizing how employee benefit data is collected, managed, and reported prepares organizations to use cloud technologies to expand storage capacity and enhance analytics for employee benefit data, while maintaining a secure environment for sensitive employee data. The common challenges organizations face when administering employee benefits include reliance on manual methods, outdated technology, difficulties with regulatory compliance, and the integration of data from several different sources that create silos, which in turn produce high turnover rates and low ROI due to fragmented data. Further challenges arise from educational deficiencies in helping employees
understand their options and manage the open enrollment process, as well as the costs employers incur in managing the many aspects of open enrollment [1].
Software vendors and human resources consultants draw on a variety of sources for their insights into the difficulties so many businesses face in managing benefits data. The Client Database (CDB) allows a business to manage and organize both employee and customer benefits information in a centralized repository. By centralizing all benefit-related data (e.g., employee eligibility and benefit plan configurations), the CDB provides a single location from which all business units and applications can access accurate, timely benefits data. Beyond serving as a centralized store of benefit information, the CDB supports critical business processes such as eligibility determination, enrollment management, and the creation of complex benefit plans for clients and insurance companies. To address the challenges of managing real-time benefit data, the CDB must remain synchronized across multiple HR and payroll systems to avoid inaccurate eligibility determinations and compliance violations. It must also support complex benefit hierarchies and frequent policy changes while ensuring the integrity of the data it stores. Centralized databases are essential to managing the eligibility, enrollment, and configuration challenges that arise in administering employee benefits, and the CDB is widely acknowledged as the authoritative system of record for benefits information [2].
For the Client Database and ePRO project, the Engineering Technical Lead was tasked with developing a comprehensive framework to store client benefits data scalably across numerous client environments at an enterprise level. An end-to-end solution was needed that would capture, process, store, and distribute critical benefit data for clients and employees in an accurate, timely, and reliable way. The complexity of modeling varied client benefit configurations posed an ongoing challenge: client requirements had to be accommodated through a flexible yet compliant metadata model that allows modifications as rules and configurations change regularly.
The integration of downstream applications that rely on benefits, eligibility, and demographic data required the Engineering Technical Lead to build resilient ETL pipelines and secure API endpoints for processing and synchronizing this information. Inconsistent or untimely data can cause eligibility mismatches or enrollment delays that affect both organizations and their employees, so resolving these technology challenges was a priority. The architecture also incorporated failover strategies, monitoring dashboards, and anomaly detection to ensure uninterrupted data flow. Furthermore, the framework included data governance elements such as audit logging, version control, and validation criteria, using AWS technologies for scalable archival storage and advanced analytics. Throughout, technology requirements were balanced against the constant leadership and engineering demands for system performance and reliability [3].
A standardized, metadata-driven data architecture was developed for the project that efficiently manages complex multi-client benefit structures, provides an expandable solution, and eliminates redundant data across enterprise operations. Secure ETL pipelines and RESTful APIs were built to synchronize benefits, eligibility, and demographic data with the enrollment and claims systems, making the data flow resilient. Automated data governance was implemented, including version control, audit logging, and validation processes, providing the ability to detect anomalies in real time and track compliance [4].
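To illustrate the governance layer, a minimal Python sketch of the automated validation and audit-logging pattern is given below. The field names, rules, and log shape are illustrative assumptions, not the platform's actual schema; a content hash stands in for version tracking.

```python
import hashlib
import json
from datetime import datetime, timezone

# Hypothetical validation rules for an eligibility record; the fields and
# accepted values are assumptions for illustration only.
RULES = {
    "client_id": lambda v: isinstance(v, str) and len(v) > 0,
    "plan_code": lambda v: isinstance(v, str) and len(v) <= 10,
    "coverage_tier": lambda v: v in {"EE", "EE+SP", "EE+CH", "FAM"},
}

def validate_record(record: dict) -> list[str]:
    """Return the list of fields that violate a rule for one record."""
    return [f for f, rule in RULES.items() if not rule(record.get(f))]

def audit_entry(record: dict, errors: list[str]) -> dict:
    """Build an append-only audit-log entry; the hash gives a version anchor."""
    payload = json.dumps(record, sort_keys=True, default=str).encode()
    return {
        "record_hash": hashlib.sha256(payload).hexdigest(),
        "validated_at": datetime.now(timezone.utc).isoformat(),
        "status": "PASS" if not errors else "FAIL",
        "errors": errors,
    }

good = {"client_id": "C001", "plan_code": "PPO500", "coverage_tier": "FAM"}
bad = {"client_id": "", "plan_code": "PPO500", "coverage_tier": "GOLD"}
assert audit_entry(good, validate_record(good))["status"] == "PASS"
assert validate_record(bad) == ["client_id", "coverage_tier"]
```

In the deployed system the equivalent logic runs as Spark jobs over staged S3 data; the sketch only shows the rule-plus-audit shape of each check.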
The AWS-ready data framework was built on Redshift for analytical processing and S3 for data storage, establishing a low-cost, high-performance infrastructure for migration to the AWS cloud. Business process monitoring was automated through telemetry dashboards that track key performance indicators, allowing operational issues to be addressed before they escalate.
Creating a centralized data management system for client-specific configurations and complex benefit plans has established a solid operational basis for all of our clients' benefits operations, giving our company the ability to react quickly to new business and regulatory requirements without the cost of reengineering operational systems. Establishing a single source of truth for benefits, eligibility, and demographic data significantly reduces ongoing manual reconciliation across numerous data sources and greatly increases the accuracy and reliability of benefit operations [5]. The scalable design, able to support growing data volumes and complex transactions, has enabled our systems to keep pace as our client base continues to grow. By combining real-time availability of HR, claims, and enrollment system data, we have created an operationally efficient process for delivering benefit payments and have reduced processing delays. Telemetry and dashboard monitoring provide improved operational visibility and proactive issue identification, make compliance auditing easier, and support performance improvements that yield a more robust and scalable benefits-management system, ultimately creating opportunities for operational excellence and increased strategic agility in a competitive business environment [6].
Related Work
Various publications have addressed the need for centralized data architectures and quality frameworks, and the integration issues stemming from CDB and ePRO programme modernizations. One IJSAT paper describes several strategies for integrating multiple data sources into one central system, organized around entity resolution, metadata-driven models, and governance, with the goal of removing existing operational silos. Techment has developed an enterprise data quality framework that allows corporations to assess ROI through a structured approach, ensure compliance, and reduce rework costs through validation techniques. Another IJSAT paper discusses technological solutions for implementing systematic quality frameworks in business data warehouses to reduce costs. EWSolutions has developed a governance-to-analytics framework that facilitates scalability, while a research study published on SSRN analyzed architectural types for benefits data systems (BDSs) (Table 1) [7].
Several recent systematic literature reviews have focused on governance, measurement, maturity models, and integration issues in BDSs and corporate data quality frameworks. A 2023 systematic literature review in the IJCCT discusses the integration of AI and machine learning into corporate data quality frameworks and argues that flexible frameworks evaluable through scalability KPIs are crucial. An ACM systematic literature review published in 2024 identified gaps in real-time corporate applications and proposed hybrid frameworks that take contextual factors into account when managing data quality [8]. A review of over 55 publications on ScienceDirect presents key innovations in enterprise governance for improving global compliance and analytics and identifies the main concepts and frameworks. The IJTech study focuses on the problems created by multi-system corporate environments and proposes applying TQM principles to validation and auditing for BDSs. Taken together, the scholarly articles published between 2022 and 2024 demonstrate a clear relationship between normalized models, automated validation, and observable products of data quality management (DQM) [9]. Systematic literature reviews (SLRs) published since 2020 provide a comprehensive evaluation of DQM frameworks, characterized by governance, timely response, correctness, and completeness for both generic and specific domains. According to Miller (2025), hybrid techniques are more useful for achieving business compliance, though that research is limited by basic metric coverage and an absence of semantics. The ACM (2024) report covers contextual data quality management, proposing hybrid models for scale. Bernardo (2024) focuses on advancing data governance through validation and observability in multi-system environments, while the PMC (2021) report evaluates DQM frameworks for interoperability with benefits-like data and recommends technology expansions. The common theme across this research is the development of metadata-driven, adaptive frameworks for introducing automated governance into business benefit systems [10].
Case studies involving AWS Redshift-S3 integration illustrate that enterprise data access can be simplified by using IAM Identity Center for centralized user authentication, providing secure ETL operations for collecting eligibility and claims data from all tenant accounts. Role chaining enables tenant-isolated schemas to be housed within data lakes, providing fine-grained access to HR data synchronization within the same account. With Data Vault 2.0, companies can perform agile ETL operations, process data in real time, and still store the data in the cloud using its normalized model and data repository patterns. A CMU study prior to 2020 showed Redshift to have considerable utility at the petabyte scale, making it much easier for companies to integrate data with Redshift. As the Gavant case study shows, AWS migration can facilitate scalable enrollment processes for benefits administrators, validating the design of corporate benefits data platforms described in AWS papers [11].
Architectures integrating clinical quality data from Electronic Health Record (EHR) and electronic patient-reported outcome (ePRO) sources can ingest ePRO data in real time via AWS HealthLake and query it through Redshift. Clinical quality repositories are designed to facilitate querying clinical quality data through Redshift Data Vault 2.0, managing EHR/ePRO data ingestion, creating materialized views for ePRO reporting, and enabling Kinesis-Redshift data pipelines that support low-latency analytic approaches to ePROs in clinical trials. Redshift's MPP architecture supports the evolution of the infrastructure described in the CMU articles and is designed for ePRO-related workloads, based on findings from the CMU Heresies labs study of pharma big data use cases in ePRO and patient-outcome analytics.
The architecture of healthcare and patient data pipelines leverages Redshift for analytics and reporting on ePRO-like data stored in S3, supporting the infrastructure for benefits administration. AWS has developed the Zero-ETL Patient 360 model as a hybrid solution, integrating Kinesis/S3 and AWS CloudWatch to link Redshift to HealthLake streams of ePRO data for real-time querying of both eligibility and outcome data. A 2022 modernization case discussed clinical quality repositories that support the enrollment process using Redshift Data Vault over ePRO/EHR reports in S3, enabling companies such as Halodoc to use Airflow with S3-Redshift integration for ELT of sensitive, benefits-scale patient data. Redshift supports ETL orchestration of customer and enrollment data from S3 through the Redshift Data API in AWS Step Functions, supporting real-time synchronization of ePRO data with Redshift. In the enterprise reporting context, Redshift's data-warehouse capabilities are discussed in connection with the 2025 RE/MAX NorthBay case, which applies migration techniques that are CDB cloud-ready [13].
As noted in prior sections, healthcare and patient data pipelines use one of several AWS methods to stream ePRO-like data into S3 and load it into Redshift using COPY or Spectrum. For example, Kinesis Data Firehose can stream real-time updates from mobile applications into S3 for ingestion into Redshift. AWS Glue can facilitate ingestion and conversion of heterogeneous ePRO file types and perform schema inference in ETL jobs, with Airflow orchestrating the job-specific ETL. The Redshift COPY command can accelerate ingestion by distributing the load across several compute nodes while monitoring for new files in S3. Tools such as Integrate.io and Fivetran can automate ELT processes, reducing the manual intervention required to load raw data into Redshift and eliminating the need for manifest files. Important considerations for scaling ePRO data effectively include implementing incremental Change Data Capture, using Glue for schema evolution, and partitioning S3 content to improve query performance on the data [13].
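As a concrete sketch of the COPY-based load path, the statement below shows how partitioned Parquet files staged in S3 might be loaded; the table name, bucket layout, and IAM role ARN are hypothetical, and in practice the statement would be executed over a Redshift connection or the Data API.

```python
def build_copy_statement(table: str, s3_prefix: str, iam_role: str) -> str:
    """Compose a Redshift COPY statement for Parquet files staged in S3.
    COPY fans the load out across compute node slices automatically."""
    return (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        f"FORMAT AS PARQUET;"
    )

# Hypothetical names: a client/date-partitioned landing prefix in S3.
sql = build_copy_statement(
    "cdb.eligibility_staging",
    "s3://cdb-landing/client=acme/dt=2025-01-15/",
    "arn:aws:iam::123456789012:role/RedshiftCopyRole",
)
assert sql.startswith("COPY cdb.eligibility_staging")
assert "FORMAT AS PARQUET" in sql
```

Parquet loads need no column-delimiter options, which keeps the statement short; the same helper could emit a manifest-based FROM clause for batch runs.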
System Architecture
The Client Database (CDB) is a secure multi-tiered database containing a distinct set of information about each client and their employees, and a single solution for managing benefits via the ePRO platform. The CDB uses an AWS Redshift warehouse with a normalized data structure, where all benefit plan configurations and enrollment processes are stored as metadata-driven configurations, allowing dynamic benefit configuration settings and enrollment queries that support multi-client benefit hierarchies. Data
integration occurs via the AWS COPY/Spectrum load process after secure ETL, and the CDB integrates with client systems through RESTful APIs that allow bidirectional data synchronization between the CDB and client systems, greatly reducing reconciliation failures.
To ensure compliance with data quality and governance requirements, Spark tasks automate the data validation process, maintain an audit log of all activity performed on CDB data, keep a version history of CDB changes, and provide both automated and manual monitoring of all activity. The design uses AWS technologies including Redshift Serverless for on-demand analytical processing and S3 lifecycle policies for a secure cloud data storage solution. API flows and processing statistics can be monitored with AWS X-Ray and CloudWatch, enabling real-time visibility into ingestion timing and the synchronization health of migrated data. Through these efficiencies, CDB data is accessible to all authorized users, delivering reduced errors, enhanced scalability, and improved data governance across the benefits ecosystem.
Figure 1 outlines the architecture for building a scalable data repository and integration layer using the AWS services S3, Redshift, and CloudWatch to optimize benefits data for multiple clients in accordance with the Well-Architected Framework.
Figure 1: CDB and ePRO Benefits Administration Architecture
Ingestion and integration phase:
- Glue ETL converts input data into Parquet format in S3, stored in hierarchical structures based on the clients' plans.
- API Gateway secures RESTful endpoints to HR, claims, and enrollment data sources.
- Data synchronization completes in near real time using Spectrum or COPY commands, reducing reconciliation timelines by approximately 25%.
Redshift data model:
- A standardised metadata-driven schema for multiple client benefits (from clients to plans to eligibility).
- CloudTrail audit logs and GitOps govern configuration compliance.
- Automated validation routines using Spark ensure adherence to compliance standards.
Storage and Analytics phase:
- Redshift Serverless allows for scalable reporting on eligibility and enrollment trends.
- S3 lifecycle tiers archive the data, with GLACIER selected for historical data storage.
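The Glacier tiering could be expressed with a rule like the following, shown as the payload shape that boto3's put_bucket_lifecycle_configuration accepts; the prefix and day thresholds are assumptions for illustration.

```python
# Illustrative S3 lifecycle configuration: warm data moves to Infrequent
# Access after 90 days, then to Glacier after a year. The prefix and
# thresholds are assumptions, not the platform's actual policy.
lifecycle = {
    "Rules": [
        {
            "ID": "archive-benefits-history",
            "Filter": {"Prefix": "cdb/history/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 90, "StorageClass": "STANDARD_IA"},
                {"Days": 365, "StorageClass": "GLACIER"},
            ],
        }
    ]
}

rule = lifecycle["Rules"][0]
assert rule["Transitions"][-1]["StorageClass"] == "GLACIER"
```

Applying it would be a single boto3 call against the archive bucket; the dictionary above is the entire policy document.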
Observability phase:
- Dashboards monitor the health status of each entity.
- CloudWatch metrics monitor ingestion latency (<5-minute SLA) and synchronization failures.
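The 5-minute ingestion-latency SLA can be reduced to a simple per-record predicate, sketched below; this is illustrative logic, not the platform's actual monitoring code, which emits CloudWatch metrics instead.

```python
from datetime import datetime, timedelta

SLA = timedelta(minutes=5)  # ingestion-latency SLA cited in the text

def breaches_sla(event_time: datetime, loaded_time: datetime) -> bool:
    """True when a record's ingestion latency exceeds the 5-minute SLA."""
    return loaded_time - event_time > SLA

def sla_compliance_pct(latencies: list[timedelta]) -> float:
    """Share of records (in %) that met the SLA in a monitoring window."""
    met = sum(1 for lat in latencies if lat <= SLA)
    return 100.0 * met / len(latencies)

t0 = datetime(2025, 1, 15, 12, 0, 0)
assert not breaches_sla(t0, t0 + timedelta(minutes=3))
assert breaches_sla(t0, t0 + timedelta(minutes=7))
assert sla_compliance_pct([timedelta(minutes=m) for m in (1, 4, 9, 2)]) == 75.0
```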
The AWS services that support the CDB/ePRO benefits platform fall into two categories: ingestion and transformation. Ingestion focuses on the secure and efficient transfer of source data from HR, claims, and enrollment systems into S3 with minimal processing. The ingestion services include Kinesis Data Firehose as a real-time source for streaming enrollment and eligibility changes; API Gateway with Lambda for RESTful API ingestion; and AWS DMS for Change Data Capture replication from legacy databases into S3. Transformation targets the conversion of data from S3 into standardised Redshift schemas while ensuring data quality and governance. AWS Glue carries out the ETL tasks in a serverless environment, including schema inference and incremental loading, while AWS EMR (Spark) handles the heavier transformations (e.g., validation and audit of benefit plans, and building hierarchical structures for complex benefit plans). Finally, post-ingestion ELT through Redshift's COPY command and SQL views enables analytics on both the curated and raw S3 data.
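One way to realize the "SQL views over curated and raw data" step is a late-binding view that unions a curated Redshift table with a Spectrum external table; all schema and table names below are hypothetical.

```python
# Hypothetical late-binding view unifying curated Redshift rows with raw S3
# data exposed through a Spectrum external schema. Late-binding views require
# WITH NO SCHEMA BINDING when they reference external (Spectrum) tables.
view_sql = """
CREATE OR REPLACE VIEW cdb.v_eligibility AS
SELECT member_id, plan_code, effective_date, 'curated' AS source
FROM cdb.eligibility
UNION ALL
SELECT member_id, plan_code, effective_date, 'raw' AS source
FROM spectrum_cdb.eligibility_landing
WITH NO SCHEMA BINDING;
"""

assert "WITH NO SCHEMA BINDING" in view_sql
```

The `source` column lets downstream reports distinguish reconciled rows from freshly landed ones within a single query surface.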
AWS Batch supports the nightly loading of eligibility data for CDB/ePRO benefits, alongside S3, DMS, and AWS Glue, while streaming services such as Amazon MSK and Kinesis ingest real-time enrollment changes. Batch ingestion handles complex ETL operations with planned latencies ranging from minutes to hours at high data volumes, while streaming ingestion provides immediate updates at throughput levels of 10,000 events/minute. On the durability side, S3 is extremely reliable and scales well, with pay-per-throughput cost models available; Glue is priced per DPU-hour. A hybrid model is recommended, combining Firehose for real-time streaming with Glue batch jobs for transformation, to strengthen data governance, reduce errors by approximately 25%, and improve performance visibility through CloudWatch metrics. Refer to Table 1 for additional details.
Aspect | Batch Ingestion | Streaming Ingestion
Primary Services | AWS Glue (ETL jobs), AWS Batch (scheduled), S3 + DMS (CDC replication) | Kinesis Data Firehose/Streams (real-time buffering), MSK (Kafka-compatible)
Latency | Minutes-hours (scheduled runs, e.g., daily reconciliation) | Seconds (<60 s for eligibility updates)
Throughput | High volume, low velocity (e.g., 1 TB nightly benefits extracts) | Continuous high velocity (e.g., 10K enrollment events/min)
Use Case Fit | Historical loads, complex ETL (multi-client hierarchies to S3/Redshift COPY) | Real-time sync (claims eligibility, reducing errors by 25%)
Durability/Scaling | 99.999999999% (S3 landing); auto-scales jobs | 99.9% delivery; shard-based scaling (1 MB/s/shard)
Cost Model | Pay-per-job (Glue ~$0.44/DPU-hr); spot instances | Pay-per-throughput (~$0.015/GB ingested)
Downstream | Redshift COPY from S3 manifests; Glue crawlers | Firehose direct-to-S3, Glue streaming ETL → Redshift streaming
CDB/ePRO Example | Nightly HR extracts, normalized model build | Live enrollment APIs, <5 min latency dashboards
Table 1: AWS Services: Batch vs Streaming Ingestion
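The trade-offs in Table 1 can be condensed into a routing rule of thumb for new feeds; the thresholds below are illustrative assumptions, not platform-mandated cutoffs.

```python
def choose_ingestion_path(events_per_min: int, max_latency_s: int) -> str:
    """Pick an ingestion path from the Table 1 trade-offs: sub-minute
    latency needs streaming (Firehose/MSK); bulk nightly loads fit Glue
    batch jobs with Redshift COPY. Thresholds are illustrative."""
    if max_latency_s < 60:
        return "streaming"   # Kinesis Firehose -> S3 -> Redshift streaming
    if events_per_min > 5_000:
        return "streaming"   # sustained high velocity also favours streams
    return "batch"           # Glue ETL -> S3 manifest -> Redshift COPY

assert choose_ingestion_path(100, 30) == "streaming"    # live enrollment API
assert choose_ingestion_path(50, 3600) == "batch"       # nightly HR extract
```

A feed that tolerates hourly latency but bursts above the velocity threshold still routes to streaming, matching the hybrid recommendation in the text.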
A hybrid microbatch and streaming pipeline for CDB/ePRO on AWS is designed using a Kappa/Lambda architecture that corrects and reconciles microbatches each night, reducing errors by approximately 25%, while serving real-time eligibility and enrollment requests within 60 seconds of receipt. The architecture consists of multiple components: Amazon S3 and Amazon Redshift form the serving layers; Amazon Kinesis Firehose supports streaming; and AWS Glue handles both streaming and batch processing. Two distinct paths accommodate real-time and nightly transactions: a streaming path, which processes and consumes real-time data as it arrives, and a microbatch path, which reconciles errors across all transactions received during the previous 24 hours.
The architecture orchestrates processing along its two primary paths using Step Functions for microbatches and EventBridge for real-time processing. The ingestion layer uses Glue batch jobs to process received files and Kinesis Firehose to stream data into fulfilment systems. The microbatch and streaming paths keep data formats consistent and transformation logic closely aligned through the Glue Data Catalog and shared Spark UDFs. Data governance is supported by idempotent upserts and watermarking, which handle late-arriving data and provide rigor when using third-party sources. Observability is provided by X-Ray and CloudWatch, which monitor latency.
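The idempotent-upsert-with-watermarking pattern mentioned above can be sketched in a few lines; in production this would be a Hudi upsert or a Redshift MERGE, and the key and field names here are assumptions.

```python
from datetime import datetime

# In-memory stand-in for the serving table, keyed by a business key.
state: dict[str, dict] = {}
watermark = datetime(2025, 1, 15, 0, 0)  # events older than this are "late"

def upsert(record: dict) -> str:
    """Apply a change idempotently: replays and stale versions are no-ops;
    late arrivals are applied but flagged for the nightly reconciliation."""
    key = record["member_id"]
    current = state.get(key)
    if current and record["event_time"] <= current["event_time"]:
        return "skipped"          # replay or out-of-order stale event
    state[key] = record
    return "late" if record["event_time"] < watermark else "applied"

r1 = {"member_id": "M1", "event_time": datetime(2025, 1, 15, 9, 0), "tier": "FAM"}
assert upsert(r1) == "applied"
assert upsert(r1) == "skipped"    # replaying the same event changes nothing
r2 = {"member_id": "M2", "event_time": datetime(2025, 1, 14, 9, 0), "tier": "EE"}
assert upsert(r2) == "late"       # behind the watermark: reconcile nightly
```

The event-time comparison is what makes the streaming path safe to replay after failures, which is the property the governance layer relies on.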
The architecture is scalable, supporting multiple customers, and provides near real-time data corrections and reconciliations, improving the accuracy of transactional data and the integrity of historical transaction records. CloudFormation can be used to deploy it. Figure 2 provides a visual representation.
Figure 2: Hybrid Microbatch + Streaming Pipeline Design on AWS
The CDC choices shape a hybrid CDB/ePRO pipeline that manages eligibility and enrollment changes through Redshift and RDS. Before records are normalized into Redshift, they are staged in Amazon S3, where a combination of batch and stream processing is applied through AWS-native methods.
AWS DMS is the overall preferred solution: it performs both full and continuous replication with about one minute of latency between replications, and it has supported roughly 25% better reconciliation accuracy, with error rates below 1%. A viable option for real-time synchronized claims data is Kinesis together with Debezium, enabling sub-second latency for that combination. For streaming workloads, AWS Glue Streaming ETL provides an additional logical replication option between S3 and Redshift, while Lambda and Firehose are low-cost options for processing small data volumes.
Optimal architected pathways exist from RDS to Redshift for CDB/ePRO, with DMS and Glue Streaming the recommended options for this pipeline. For this use case, logical replication from RDS should permit Hudi for upsert capabilities, with CloudWatch monitoring validation accuracy. The key performance indicators for the hybrid microbatch and streaming pipelines combine throughput (TP), latency (LT), data freshness (DF), and error rates to make the best use of resources across the pipelines, while ensuring that all SLAs (Service Level Agreements) are met within five minutes and an error reduction of at least 25% is achieved.
The primary metrics of end-to-end latency, throughput, and error rate measure the performance of both batch and streaming processing for comparison, maintaining confidence that both pipelines can support the required event volumes (over 10K per minute) while tracking accuracy metrics such as validation and delivery failure rates, with reconciliation error rates anticipated to improve to below 1%. Data freshness targets support real-time eligibility determination, with systems updated within a five-minute timeframe.
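The KPIs above (throughput, latency, error rate, freshness) can be computed from per-record measurements as in this illustrative sketch; the record shape is an assumption, not the platform's telemetry schema.

```python
def pipeline_kpis(records: list[dict], window_s: float) -> dict:
    """Aggregate per-record measurements from one monitoring window into
    the four KPIs named in the text."""
    n = len(records)
    failed = sum(1 for r in records if not r["valid"])
    return {
        "throughput_per_min": n / (window_s / 60),
        "avg_latency_s": sum(r["latency_s"] for r in records) / n,
        "error_rate_pct": 100.0 * failed / n,
        "max_staleness_s": max(r["latency_s"] for r in records),  # freshness
    }

batch = [
    {"latency_s": 40, "valid": True},
    {"latency_s": 80, "valid": True},
    {"latency_s": 60, "valid": False},
]
kpis = pipeline_kpis(batch, window_s=60)
assert kpis["throughput_per_min"] == 3.0
assert kpis["avg_latency_s"] == 60.0
assert round(kpis["error_rate_pct"], 1) == 33.3
```

Emitting these four numbers per window is enough to drive both the SLA dashboards and the alerting thresholds described in the surrounding text.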
To manage the processes and mitigate over-provisioning, resource utilization is targeted at 60-80% of each system's capacity, while correctly managing primary/replica roles. Cost efficiency measures, both primary and secondary, emphasize keeping total costs below $3.00 per terabyte of data processed.
Building on these successes, total deliveries have increased substantially, aided by dashboard tools that monitor key performance metrics and alert on major issues. Machine learning will enable enhanced anomaly detection, and weekly scalability evaluations are in place to support this effort.
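The planned anomaly detection could start as simply as a z-score test over a pipeline metric; the sketch below flags latency samples far from the series mean (the three-sigma threshold is an assumption, not a documented design choice):

```python
from statistics import mean, stdev

def latency_anomalies(samples: list[float], z_threshold: float = 3.0) -> list[int]:
    """Return indices of samples whose z-score against the series exceeds the threshold."""
    if len(samples) < 2:
        return []
    mu, sigma = mean(samples), stdev(samples)
    if sigma == 0:
        return []  # a flat series has no outliers
    return [i for i, s in enumerate(samples) if abs(s - mu) / sigma > z_threshold]
```

A dashboard alert would fire whenever this returns a non-empty list for the latest window.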
The historical synthetic dataset illustrates the significant growth from 2018 through 2025, with a clear delineation of the transition from linear batch processing to the hybrid micro-batch/streaming solution.
One primary growth metric is total latency (TL), which fell from 240 minutes in 2018 to 3.6 minutes in 2025, with SLA compliance rising from 45% in 2018 to 92% in 2025. A second metric highlights significant improvements in data quality: reconciliation error rates dropped from 18.5% in 2018 to 2.1% in 2025, accomplishing the goal of at least a 25% error reduction.
The dataset also shows a substantial increase in efficiency: total cost fell from $12.50 per terabyte (2018) to $2.18 per terabyte (2025), roughly an 82% cost reduction. Lastly, the visual representation of throughput growth shows an increase from 0.4 billion records processed (2018) to 18.5 billion records (2025), a more than 45-fold increase in processing scale. The specific details regarding these trends can be referenced in Figure 3.
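The headline improvements summarized in Figure 3 follow directly from the reported 2018 and 2025 values, and the arithmetic can be checked as:

```python
def pct_reduction(before: float, after: float) -> float:
    """Percentage reduction relative to a baseline value."""
    return (before - after) / before * 100

# Reported 2018 vs 2025 values
latency_cut = pct_reduction(240.0, 3.6)    # total latency, minutes
error_cut = pct_reduction(18.5, 2.1)       # reconciliation error rate, %
cost_cut = pct_reduction(12.50, 2.18)      # cost per terabyte, USD
throughput_x = 18.5 / 0.4                  # billions of records processed
```

These yield about a 98.5% latency reduction, an 88.6% error-rate reduction, an 82.6% cost reduction, and a roughly 46x throughput increase.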
Figure 3: CDB/ePRO Progress (2018-2025)
CONCLUSION
The Client Database (CDB) and ePRO initiatives consolidated organization, employee, and benefits administration information into one shared repository accessible through a single scalable platform. The purpose of this consolidated platform was to expedite benefits eligibility determination, enrollment, and benefit configuration across multiple business lines. The unified platform established a centralized system of record with a consistent data architecture and secure ETL and API connections linking the claims, human resources (HR), and enrollment systems. Its design enables organizations to maintain high levels of data quality and governance, which has significantly reduced the errors associated with manual record reconciliation. Building on Amazon Web Services (AWS) provides the ability to add functionality to the platform in the future. As the project continues to develop, telemetry dashboards will be created to monitor real-time data flows across the platform and overall system health, strengthening compliance controls in the benefits administration process while improving the quality and timeliness of benefits data. The updated system architecture will help streamline new client onboarding and improve the efficiency of benefits administration, allowing for increased transparency and scalability. Future objectives include applying machine learning to identify and assess anomalies in client behavior, expanding containerized microservices, and using artificial intelligence to deepen insights into client behavior. This project demonstrates how moving to the cloud and adopting modern data designs allows organizations to build flexible, adaptable system architectures that respond to ever-changing business and regulatory needs.