Page 1207
www.rsisinternational.org
INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,
MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)
ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XV, Issue II, February 2026
Intelligent MPD Set-Point Control for Narrow Windows: A
Reinforcement Learning Framework for Automated Choke Control
in Deepwater HPHT Wells
John Lander Ichenwo, Marvellous Amos
Department of Petroleum Engineering, University of Port Harcourt
DOI: https://doi.org/10.51583/IJLTEMAS.2026.15020000105
Received: 01 March 2026; Accepted: 06 March 2026; Published: 20 March 2026
ABSTRACT
Deepwater high-pressure, high-temperature (HPHT) drilling with managed pressure drilling (MPD) poses serious challenges because of the very narrow margin between pore pressure and fracture gradient, frequently below 0.3 ppg equivalent mud weight. Traditional rule-based MPD tuning algorithms fail to hold optimal set-points in real time, resulting in influx/loss events and an inefficient rate of penetration (ROP).
This paper introduces a reinforcement learning (RL) framework for intelligent MPD set-point control, specifically backpressure and equivalent circulating density (ECD) optimization in narrow-margin deepwater wells characteristic of the Gulf of Guinea region, including Ghana and Nigeria. A Deep Deterministic Policy Gradient (DDPG) agent was trained on historical MPD operational data from West African deepwater campaigns using a multi-objective reward function that balances influx risk, loss risk, and ROP optimization. The framework improved pressure control accuracy by 23% over rule-based approaches, reducing mean absolute pressure deviation from 42 psi to 32 psi.
Moreover, the model increased average ROP by 15 percent while keeping the wellbore stable despite the narrow drilling window. The intelligent control system detected and reacted to simulated kick situations 18 seconds earlier than conventional automated MPD systems, a substantial advance in real-time well control capability. These results imply that RL-based MPD control offers large operational benefits in demanding deepwater HPHT drilling operations and could be applicable across Gulf of Guinea deepwater drilling ventures.
Keywords: Managed pressure drilling, reinforcement learning, HPHT, narrow margin, deepwater drilling,
DDPG, choke control, Gulf of Guinea
INTRODUCTION
The global pursuit of hydrocarbon resources has pushed drilling activity into deepwater environments with demanding geology, including high-pressure, high-temperature (HPHT) reservoirs with narrow operating windows. These challenges are prominent in the Gulf of Guinea, which contains the prolific hydrocarbon provinces offshore Nigeria and Ghana, where deepwater operations have added significant production capacity to the region. Nigeria's deepwater fields have contributed more than 800,000 barrels per day to national production, while Ghana's Jubilee, TEN, and Sankofa fields contribute around 160,000 barrels per day from deepwater offshore developments.
The central problem in HPHT deepwater drilling is that the margin between formation pore pressure and fracture gradient can be as little as 0.3 ppg equivalent mud weight (EMW). This constraint severely limits the operational window for maintaining wellbore stability while maximizing drilling performance. The pressure margin between pore pressure and fracture initiation pressure defines the drilling window; when the equivalent circulating density (ECD) approaches either limit, influx or loss
events can occur. Conventional drilling techniques are often inadequate for such narrow pressure margins because they cannot deliver the accuracy required for real-time pressure management.
Managed pressure drilling (MPD) has emerged as the enabling technology for overcoming such hazardous conditions. MPD uses dynamically controlled surface equipment, such as choke valves and auxiliary pumps, to exert precise control of bottomhole pressure (BHP). The method allows real-time pressure management, limiting influx and lost circulation and permitting drilling in previously undrillable formations. The performance of MPD operations, however, depends critically on the accuracy and responsiveness of the set-point control system governing choke position and applied backpressure.
Current MPD control systems are primarily based on proportional-integral-derivative (PID) controllers or model predictive control (MPC) schemes built on simplified hydraulic models. Although these methods have proven successful in many applications, they are limited in highly dynamic HPHT environments where formation pressures exhibit rapid ramps and regressions. Multiphase flow complexity, temperature-dependent fluid properties, and varying wellbore conditions all undermine the accuracy of the reduced-order models on which conventional controllers rely. Moreover, rule-based tuning methods demand extensive manual tuning and cannot adapt to changing downhole conditions.
Recent developments in artificial intelligence, especially reinforcement learning (RL), offer promising alternatives for complex control problems in drilling. Deep RL algorithms have shown an impressive ability to optimize drilling parameters such as weight on bit (WOB) and rotary speed (RPM) to maximize ROP without inducing downhole vibrations. The Deep Deterministic Policy Gradient (DDPG) algorithm, designed for continuous action spaces, is of special interest for drilling tasks in which control outputs must be tuned continuously rather than chosen from discrete values. This research fills an important gap in the literature by developing and testing a reinforcement learning model tailored to MPD set-point control in narrow-margin HPHT deepwater wells. The paper focuses on the Gulf of Guinea operating environment, using historical data from Nigerian and Ghanaian deepwater campaigns to develop models that recommend real-time backpressure and equivalent mud weight set-points. The proposed multi-objective reward function balances influx risk, loss risk, and ROP optimization simultaneously, unlike the single-objective formulations that have dominated drilling automation research.
The study has three main goals: first, to develop a DDPG-based control agent that learns optimal MPD set-point policies from historical operational data; second, to quantify the pressure control accuracy and drilling performance benefits of the framework relative to rule-based MPD tuning; and third, to demonstrate that the framework can detect and respond to kicks early under simulated HPHT conditions typical of the Gulf of Guinea.
LITERATURE REVIEW
Managed Pressure Drilling in HPHT Environments
Managed pressure drilling is a mature technology developed to address the difficulty of drilling HPHT reservoirs often encountered in deepwater settings. The technology offers accurate annular pressure profile control by manipulating surface backpressure through automated choke systems, giving operators the capability of maintaining constant bottomhole pressure (CBHP) within narrow operational windows. The basic principle is to seal the annulus with a rotating control device (RCD) and route return flow through a choke manifold, permitting rapid pressure adjustments up or down to counter changes in ECD during various drilling operations.
MPD use in deepwater HPHT environments has been well documented, with significant achievements in the Mediterranean, West Africa, and Southeast Asia. BP's application of MPD in the West Nile Delta demonstrated
that formations with PP-FG windows of less than 0.3 ppg EMW could be drilled to target depth, reaching wells unattainable by conventional techniques. Similarly, ENI's combination of continuous circulation technology with MPD enabled successful penetration of ultra-narrow pressure windows in deepwater HPHT conditions. These case studies establish MPD as a mature technology capable of coping with extreme drilling challenges, while also underlining the importance of accurate pressure control and the limitations of available automation solutions.
Control Systems for MPD Operations
MPD choke control automation has evolved over the last two decades from manual control to modern model-based systems. Most current MPD control systems use hydraulic models to estimate downhole pressure in real time and feedback control algorithms to set choke position automatically to achieve desired pressure set-points. Model predictive control has become the most popular approach for automated MPD operations because of its capacity to handle multiple inputs and outputs while honoring operational constraints. Park et al. (2020) reported the first application of a real-time high-fidelity flow model adapted to run inside an MPC controller; the model delivered better control outcomes during drilling, pipe connections, and mud density displacements. That work highlighted that controller success depends on model fidelity: discrepancies between the model and the real system result in poor control.
Nonlinear MPC (NMPC) strategies have been proposed to handle nonlinearities inherent to MPD systems, especially under the two-phase flow conditions that arise during a kick event. Researchers at Memorial University have progressed from simple PID controllers to complex NMPC with fault-management capabilities, validated both in simulation and at laboratory scale. Despite this progress, much work remains before robust control performance can be attained across the full range of conditions encountered in HPHT deepwater drilling, especially during transient operations such as connections, trips, and formation transitions.
Machine Learning Applications in Drilling
Machine learning has proved very successful in related drilling applications, including ROP prediction, ECD estimation, and drilling parameter optimization. Gamal et al. (2021) used artificial neural networks (ANNs) and adaptive network-based fuzzy inference systems (ANFIS) to predict ECD with correlation coefficients above 0.96 and average error ratios below 0.7. More recently, Ekechukwu et al. (2024) introduced an explainable machine learning framework based on XGBoost to predict ECD, reporting R² values of 0.989 on testing data together with a feature importance analysis to interpret the results. Such studies show that machine learning models can effectively capture the complex relationships between drilling parameters and key wellbore conditions.
Random Forest models have demonstrated especially strong ECD prediction performance, with reported R² of 0.9859 and RMSE of 0.0017 on testing datasets using only surface drilling parameters as input (Gao et al., 2024). Relying on surface measurements alone removes the need for expensive downhole sensors while retaining high prediction accuracy. These methods apply directly to MPD control systems, where reliable ECD estimation is required to keep pressure within tight operational limits.
Reinforcement Learning for Drilling Optimization
Reinforcement learning is an emerging paradigm for drilling automation in which agents learn optimal control policies by interacting with their environment rather than being explicitly programmed. For instance, the virtual drilling agent developed by Huang et al. (2024) uses the DDPG algorithm to automatically optimize drilling variables, incorporating ROP, vibration, bit dulling, and tool-breakage risk into a single reward function. Their findings indicated that the RL model can identify the
best policy for different drilling conditions, including hard formations, embedded rock, and unstable drilling conditions.
DDPG is an actor-critic RL algorithm that combines value-based and policy-based approaches to handle continuous action spaces. The algorithm maintains two main neural networks: an actor network that computes the best action for the current state, and a critic network that approximates the value of state-action pairs. This makes it particularly suited to MPD control, where the choke position and backpressure set-points are varied continuously rather than chosen from discrete options.
Keshavarz et al. (2024, 2025) proposed deep reinforcement learning formulations for real-time planning of drilling operations, using Markov Decision Process formulations and Gaussian process models to determine the safe operating window. Their work showed that automated decision-making is feasible during wellbore cleaning, that high-level performance can be sustained, and that non-value-added tasks can be eliminated. The applicability of RL methods to MPD set-point optimization is further supported by autonomous optimization techniques built on Q-learning to control drilling parameters.
Research Gaps and Contribution
Although tremendous progress has been achieved in MPD automation and in RL-based drilling optimization, an essential gap remains at their intersection. Existing MPD control systems are mainly model-based and require good hydraulic models and manual fine-tuning. Although RL methods have been used to optimize drilling parameters (WOB, RPM), their application to MPD set-point control has not been investigated in the published literature. Moreover, current RL drilling studies are mainly devoted to single-objective optimization (usually ROP maximization), whereas MPD control is inherently multi-objective, involving influx risk, loss risk, and drilling efficiency simultaneously.
This paper fills these gaps with a DDPG-based framework tailored to MPD set-point control in narrow-margin HPHT wells. The contributions are: (1) formulating MPD control as a continuous-action RL problem; (2) developing a multi-objective reward function balancing influx/loss risk and ROP maximization; (3) training and validating on data representative of Gulf of Guinea deepwater operations; and (4) quantitative comparison against rule-based MPD tuning techniques.
METHODS
Data Description and Preprocessing
The data used in this paper comprise operational records from 12 HPHT wells drilled in the Gulf of Guinea between 2018 and 2024: 7 wells offshore Nigeria and 5 offshore Ghana. The data were collected during MPD operations run under the CBHP methodology in water depths of 850 to 2,100 meters, with measured depths up to 5,500 meters. The dataset totals about 18,500 time-stamped records sampled at 10-second intervals while MPD was active.
The input features are surface measurements routinely acquired during MPD operations: pump flow rate (GPM), standpipe pressure (SPP), weight on hook (WOH), rotary speed (RPM), rate of penetration (ROP), mud density (ppg), and choke position (%). The target variables are bottomhole pressure and ECD from downhole pressure-while-drilling (PWD) measurements. Additional derived features include the flow-in/flow-out differential, pressure trend derivatives, and time-lagged variables that capture system dynamics.
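The derived features described above can be sketched as follows; the function and series names are illustrative, not the study's actual pipeline:

```python
import numpy as np

def derive_features(flow_in, flow_out, spp, n_lags=3):
    """Build illustrative derived features from raw 10-s series.

    flow_in, flow_out, spp: 1-D numpy arrays of equal length.
    Returns a dict of derived series (names are hypothetical).
    """
    flow_diff = flow_out - flow_in          # flow-in/flow-out differential
    spp_trend = np.gradient(spp)            # pressure trend derivative per sample
    # time-lagged copies of standpipe pressure to capture system dynamics
    lags = {f"spp_lag{k}": np.roll(spp, k) for k in range(1, n_lags + 1)}
    for k in range(1, n_lags + 1):
        lags[f"spp_lag{k}"][:k] = spp[0]    # pad the head with the first value
    return {"flow_diff": flow_diff, "spp_trend": spp_trend, **lags}
```

In practice these series would be computed per well before any train/test split, so that lagged values never cross well boundaries.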
Data preprocessing followed standard drilling data quality assurance. Null records, sensor malfunctions, and redundant data points were dropped, leaving 15,840 valid observations. Outliers were detected and removed with the Interquartile Range (IQR) method, which excluded about 4% of records that
contained anomalous pressure or flow measurements. All features were z-score normalized to guarantee equal scaling across input dimensions, which is vital for stable neural network training.
Stratified sampling divided the dataset into training (70%), validation (15%), and testing (15%) subsets so that well types and formation characteristics were represented in every partition. Records within each well were kept in time order to retain the sequential dependencies needed for RL training.
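As a concrete illustration, the IQR outlier filter and z-score normalization steps might look like this (a minimal sketch; the study's actual preprocessing code is not published):

```python
import numpy as np

def iqr_mask(x, k=1.5):
    """Boolean mask of values inside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x >= q1 - k * iqr) & (x <= q3 + k * iqr)

def zscore(x):
    """Z-score normalization so each feature contributes at a comparable scale."""
    return (x - x.mean()) / x.std()
```

The mask would be applied jointly across the pressure and flow channels before the stratified split, so a record is dropped if any channel is anomalous.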
Reinforcement Learning Framework
MPD set-point control was formulated as a Markov Decision Process (MDP) amenable to RL optimization. At every time step t, the agent observes state s_t (the current drilling parameters and pressure readings), chooses action a_t (backpressure and ECD set-point changes), receives reward r_t reflecting control performance, and transitions to successor state s_{t+1}.
State Space: The state vector s_t comprises pump flow rate, standpipe pressure, weight on hook, rotary speed, ROP, mud density, current choke position, current backpressure, BHP, ECD, formation depth, and pore-fracture margin (calculated from offset well data).
Action Space: The action vector a_t ∈ ℝ² defines continuous changes in (1) the backpressure set-point (±50 psi) and (2) the desired ECD set-point (±0.02 ppg). A continuous action space is needed for the fine-grained pressure control demanded by the tight margins typical of HPHT operations.
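Because the DDPG actor described later emits tanh-activated outputs in [−1, 1], a small helper can map them onto these set-point bounds; the bound values follow the action-space definition above, and the helper itself is an illustrative sketch:

```python
import numpy as np

# Assumed set-point bounds from the action-space definition above.
ACTION_BOUNDS = np.array([50.0, 0.02])  # [backpressure delta (psi), ECD delta (ppg)]

def scale_action(raw):
    """Map a tanh-activated actor output in [-1, 1]^2 to bounded set-point changes."""
    raw = np.clip(np.asarray(raw, dtype=float), -1.0, 1.0)
    return raw * ACTION_BOUNDS
```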
Reward Function: The reward function was designed to be multi-objective, balancing the competing goals of MPD control:
r_t = w₁·r_pressure + w₂·r_ROP + w₃·r_stability

where:

r_pressure = −|BHP_t − BHP_target| / σ_pressure penalizes deviation from the target BHP

r_ROP = (ROP_t − ROP_baseline) / ROP_baseline rewards ROP improvements

r_stability = −λ₁·I_influx − λ₂·I_loss penalizes influx/loss indicators
Sensitivity analysis was used to tune the weighting coefficients (w₁ = 0.5, w₂ = 0.3, w₃ = 0.2) for a balanced trade-off among objectives. The influx indicator I_influx activates when flow-out exceeds flow-in by more than 2%, while I_loss activates when the opposite condition holds for sustained periods.
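A minimal sketch of this reward, using the stated weights and the 2% influx threshold (the λ penalty weights are illustrative, since the paper does not report them):

```python
W = (0.5, 0.3, 0.2)   # (w1, w2, w3) from the sensitivity analysis
LAMBDAS = (1.0, 1.0)  # illustrative influx/loss penalty weights (assumed)

def mpd_reward(bhp, bhp_target, sigma_p, rop, rop_base,
               flow_in, flow_out, loss_sustained=False):
    """Multi-objective reward r_t = w1*r_pressure + w2*r_ROP + w3*r_stability."""
    r_pressure = -abs(bhp - bhp_target) / sigma_p
    r_rop = (rop - rop_base) / rop_base
    influx = flow_out > 1.02 * flow_in         # flow-out exceeds flow-in by >2%
    loss = (flow_in > flow_out) and loss_sustained
    r_stability = -LAMBDAS[0] * influx - LAMBDAS[1] * loss
    w1, w2, w3 = W
    return w1 * r_pressure + w2 * r_rop + w3 * r_stability
```

For example, a 50-psi overshoot with σ_pressure = 100 psi and a modest ROP gain yields a small negative reward, which turns sharply more negative once the influx indicator trips.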
DDPG Algorithm Implementation
The Deep Deterministic Policy Gradient algorithm was selected for its proven capability in continuous control tasks with high-dimensional state spaces. DDPG maintains four neural networks: an actor network μ(s|θ^μ), a critic network Q(s,a|θ^Q), and their respective target networks with parameters θ^μ′ and θ^Q′ that are soft-updated to improve training stability.
The actor network architecture comprises three fully-connected layers with 256, 128, and 64 neurons
respectively, using ReLU activation functions and batch normalization. The output layer employs tanh activation
scaled to the action bounds. The critic network concatenates state and action vectors after the first layer, with
subsequent architecture matching the actor.
Training utilized experience replay with buffer size 100,000 and mini-batch size 64. Target networks were
updated using soft update coefficient τ = 0.001. The Ornstein-Uhlenbeck process provided exploration noise
with parameters θ = 0.15 and σ = 0.2, decayed exponentially over training episodes. Learning rates were set at
10⁻⁴ for the actor and 10⁻³ for the critic, following recommendations for drilling applications.
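Two of the training mechanics above, the Ornstein-Uhlenbeck exploration noise and the soft target-network update with τ = 0.001, can be sketched in a few lines (a simplified illustration, not the study's implementation):

```python
import numpy as np

def ou_step(x, theta=0.15, sigma=0.2, mu=0.0, dt=1.0, rng=None):
    """One step of Ornstein-Uhlenbeck exploration noise: mean-reverting Gaussian."""
    if rng is None:
        rng = np.random.default_rng(0)
    return x + theta * (mu - x) * dt + sigma * np.sqrt(dt) * rng.standard_normal(x.shape)

def soft_update(target_params, online_params, tau=0.001):
    """Polyak averaging of target weights: theta' <- tau*theta + (1 - tau)*theta'."""
    return [tau * w + (1.0 - tau) * wt
            for wt, w in zip(target_params, online_params)]
```

With τ = 0.001 the target networks trail the online networks slowly, which is what stabilizes the bootstrapped critic targets during training.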
Baseline Comparison Methods
To quantify improvements over existing approaches, three baseline methods were implemented:
Rule-Based Control: Traditional MPD automation using a PID controller with fixed gains tuned for nominal operating conditions. Backpressure set-points are adjusted from lookup tables indexed by measured depth and drilling phase.
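A minimal discrete PID of the kind used in this baseline might look like the following; the gains shown in the usage note are placeholders, since the tuned values are not reported:

```python
class PID:
    """Minimal discrete PID controller, as used for the rule-based baseline sketch."""

    def __init__(self, kp, ki, kd, dt=10.0):  # dt matches the 10-s sample interval
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_err = None

    def step(self, setpoint, measured):
        """Return the control output for one sample given set-point and measurement."""
        err = setpoint - measured
        self.integral += err * self.dt
        deriv = 0.0 if self.prev_err is None else (err - self.prev_err) / self.dt
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv
```

For instance, `PID(0.5, 0.0, 0.0).step(100.0, 90.0)` returns a proportional correction of 5.0; the fixed gains are what limit this baseline under the rapidly varying conditions discussed earlier.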
Model Predictive Control: Linear MPC using a simplified hydraulic model with a prediction horizon of 40 time steps and a control horizon of 15 steps. The model's bulk modulus and effective density parameters were calibrated against initial well data.
Random Forest Regression: A machine-learning controller that uses an RF model to predict BHP and computes real-time set-points from the predicted pressure variations. The RF model was trained on the same data partitions used for RL training.
Evaluation Metrics
Model performance was assessed using metrics aligned with the operational priorities of MPD control:
Mean Absolute Pressure Deviation (MAPD): Average absolute deviation between actual and target BHP over all time steps.
Pressure Excursion Rate (PER): Percentage of time steps in which BHP fell outside the safe drilling window limits.
Average ROP: Mean drilling rate achieved during model control periods.
Kick Detection Time (KDT): Time elapsed between simulated kick initiation and control system response.
Control Stability Index (CSI): Standard deviation of choke position changes, indicating control smoothness.
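Three of these metrics can be computed directly from logged time series; a sketch follows (the function names are ours, not the paper's):

```python
import numpy as np

def mapd(bhp, target):
    """Mean Absolute Pressure Deviation (psi) between actual and target BHP."""
    return np.mean(np.abs(np.asarray(bhp) - np.asarray(target)))

def per(bhp, lo, hi):
    """Pressure Excursion Rate: % of time steps outside the safe window [lo, hi]."""
    bhp = np.asarray(bhp)
    return 100.0 * np.mean((bhp < lo) | (bhp > hi))

def csi(choke):
    """Control Stability Index: std of step-to-step choke-position changes."""
    return np.std(np.diff(np.asarray(choke)))
```

KDT, by contrast, requires event timestamps (influx start and first choke response) rather than plain series, so it is omitted here.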
RESULTS AND DISCUSSION
Model Training and Convergence
The DDPG agent was trained for 2,000 episodes, each comprising a complete drilling cycle of one well in the training set. Training convergence was assessed using cumulative episode reward and critic loss. The agent showed steady learning behavior: average episode reward rose from −245 (random actions at the start) to +128 at convergence over about 1,200 episodes. Validation performance stabilized by episode 1,500, suggesting adequate generalization without overfitting.
Hyperparameter sensitivity analysis showed that the reward-function weights strongly influenced the learned behavior. Increasing w₂ (the ROP weighting) produced aggressive policies that at times approached the window limits, whereas a conservative w₁ (the pressure weighting) produced consistent but slower drilling. The selected configuration (w₁ = 0.5, w₂ = 0.3, w₃ = 0.2) achieved the best balance for HPHT narrow-margin conditions.
Pressure Control Performance
| Metric | Rule-Based | MPC | RF Regression | DDPG (Proposed) |
|---|---|---|---|---|
| MAPD (psi) | 42.3 | 38.1 | 35.7 | 32.4 |
| PER (%) | 4.8 | 3.2 | 2.9 | 1.4 |
| CSI (% change/step) | 2.1 | 1.8 | 2.4 | 1.5 |
The DDPG model achieved the best pressure control accuracy on every evaluation metric on the held-out test set. Mean absolute pressure deviation fell by 23 percent relative to rule-based control (32.4 psi versus 42.3 psi), a substantial improvement in BHP tracking accuracy. The pressure excursion rate, which measures the frequency of unsafe window violations, dropped from 4.8% under rule-based control to 1.4% under DDPG, a 71% reduction in unsafe pressure excursions.
These gains matter most within the tight drilling margins of Gulf of Guinea HPHT wells, where the difference between pore pressure and fracture gradient can be below 0.3 ppg. Under such constraints, even small improvements in pressure control accuracy translate into significant reductions in influx/loss risk. The MPC baseline delivered moderate results (MAPD = 38.1 psi), consistent with literature reports that simplified hydraulic models limit achievable accuracy in highly dynamic environments.
Drilling Performance Optimization
The multi-objective reward design allowed the DDPG agent to pursue drilling performance and pressure regulation goals jointly. Average ROP on the test set was 15 percent higher than under rule-based control (47.2 ft/hr vs. 41.1 ft/hr) without increasing pressure excursion rates. This is consistent with industry experience that MPD enables higher ROP by drilling with lighter mud weights while using surface backpressure to maintain formation containment.
Examination of the learned control policies showed that the agent had acquired condition-dependent drilling strategies. During periods of stable formation, the agent kept backpressure at the lowest safe level to maximize ROP without compromising safety margins. When it observed precursors of formation pressure variation (increasing flow-out differential, pressure trends), it proactively adjusted set-points before excursions occurred. This anticipatory behavior, learned from data, mirrors the best practices of experienced MPD operators while delivering greater consistency and faster reaction.
Kick Detection and Response
Simulated kick scenarios were injected into the test data to assess the framework's responsiveness to influx events. The kick models apply step-wise increases in formation fluid influx rate and are tuned to reproduce the flow and pressure signatures of kick dynamics observed during MPD operations. The DDPG agent responded to simulated kicks markedly faster than rule-based control (18 seconds versus 34 seconds from influx start to choke response). This improvement stems from the agent's sensitivity to the subtle pressure and flow signals that precede evident kick patterns. Modern MPD systems include influx-loss detection components that flag symptoms of downhole events before they escalate into well control events; the DDPG model recognized and reacted to these early signals, taking corrective action at the incipient stage when control intervention is most effective.
Comparison with Related Work
The performance gains reported here compare favorably with those in related RL drilling applications. Huang et al. (2024) established that DDPG can optimize drilling parameters (WOB, RPM) to maximize ROP without inducing stick-slip vibrations. The present work extends this to the more intricate domain of MPD control, in which real-time continuous pressure management must balance multiple competing goals. The 23% improvement in pressure control accuracy, compared with the roughly 10% gain reported for RL-based heave compensation control using DDPG, indicates that the MPD control problem is one that RL optimization addresses especially well.
This likely reflects the rich feedback available in MPD systems: pressure and flow measurements provide immediate information on the effectiveness of control actions, which facilitates learning effective policies.
Limitations and Practical Considerations
Several limitations should be recognized when interpreting these results. First, the dataset is a restricted sample of Gulf of Guinea operations and may not represent the range of conditions encountered worldwide; transfer learning techniques might be needed to adapt the trained models to significantly different geological settings. Second, the simulation model, though tuned to historical data, does not capture all dynamic phenomena experienced during live drilling, especially rare extreme events. Real deployment of RL-based MPD control would require substantial validation, including hardware-in-the-loop testing with actual MPD equipment before field trials. Further implementation challenges include integration with existing rig control systems, cybersecurity, and regulatory approval. Nevertheless, the demonstrated performance gains suggest that further development of RL-based MPD control merits serious consideration for demanding deepwater operations.
CONCLUSION
This paper introduced a reinforcement learning framework for intelligent MPD set-point control in narrow-margin deepwater HPHT wells, addressing a significant gap in automated drilling technology. The DDPG-based method outperformed traditional rule-based and model-based control strategies, reducing mean absolute pressure deviation by 23 percent, cutting the pressure excursion rate by 71 percent, and improving average ROP by 15 percent on historical Gulf of Guinea operations. The multi-objective reward function design was critical to balancing the competing demands of pressure control accuracy, drilling efficiency, and well control safety inherent in HPHT MPD operations. The trained control policies exhibited advanced anticipatory behavior, sensing and responding to formation variations faster than traditional systems, and produced smoother control actions that reduce equipment wear and operational complexity.
These results indicate that reinforcement learning can be of considerable value in advancing MPD automation, especially in harsh environments where traditional methods fail to sustain optimal performance. The Gulf of Guinea region, with its established deepwater drilling history and continuing exploration and development, is a natural site for further development and field testing of intelligent MPD control systems.
Future research opportunities include: (1) multi-well transfer learning to enable rapid adaptation to new drilling campaigns; (2) physics-informed neural networks to improve model interpretability; (3) ensemble RL to make the models more robust; and (4) pilot testing in collaboration with operators in West African deepwater basins. The convergence of advanced control technology and growing deepwater resources positions the industry to reach resources in increasingly challenging environments safely and efficiently.
REFERENCES
1. Arnø, M., Godhavn, J.M., Aamo, O.M. (2020). Deep reinforcement learning applied to managed pressure
drilling. SPE Bergen One Day Seminar, Bergen, Norway.
2. Ding, Y., Chen, Z., Zhang, K., et al. (2023). A reinforcement learning method for optimal control of oil
wells. Heliyon, 9(7), e17751.
3. Ekechukwu, G., Uzochukwu, E., Ibekwe, K. (2024). Explainable machine-learning-based prediction of
equivalent circulating density. Scientific Reports, 14, 17620.
4. Elliott, D., Montilva, J., Francis, P., et al. (2011). Managed pressure drilling erases the lines. Oilfield
Review, 23(1), 14-23.
5. Gamal, H., Elkatatny, S., Abdulraheem, A. (2021). Machine learning models for equivalent circulating
density prediction. ACS Omega, 6(40), 26267-26276.
6. Gao, X., Liu, Y., Wang, H., et al. (2024). Equivalent circulation density prediction using random forest.
TPE, MS.ID.000541.
7. Hauge, E., Aamo, O.M., Godhavn, J.M. (2013). Automatic kick detection and handling in managed
pressure drilling. Ph.D. Thesis, Norwegian University of Science and Technology.
8. Huang, X., Luu, H., Shang, S., et al. (2024). Deep reinforcement learning for automatic drilling
optimization using an integrated reward function. SPE/IADC Drilling Conference and Exhibition,
Galveston, Texas.
9. Keshavarz, S., Elahifar, B., Gholami, A. (2024). Deep reinforcement learning algorithm for wellbore
cleaning across drilling operation. Fourth EAGE Digitalization Conference & Exhibition.
10. Keshavarz, S., Elahifar, B., Gholami, A. (2025). Deep reinforcement learning for automated decision
support in oil well operations. Energy Reports, 13, 5967.
11. Najjarpour, M., Jalalifar, H., Soleimani, B. (2022). Managed pressure drilling technology, mechanical
specific energy and bit management for ROP management. Journal of Petroleum Science and Engineering,
209, 109834.
12. Park, J., Price, C., Pixton, D., et al. (2020). Model predictive control and estimation of managed pressure
drilling using a real-time high fidelity flow model. ISA Transactions, 97, 76-89.
13. Scoular, T., Hathaway, K., Essam, W., et al. (2012). BP case study: MPD application supports HPHT
exploration. Drilling Contractor, July/August 2012.
14. Squintani, E., Bonin, R., Borello, F., et al. (2018). Deepwater HPHT drilling through ultra narrow PPFG
window: A case study by ENI. Abu Dhabi International Petroleum Exhibition & Conference.
15. Xiong, M., Wang, Y., Liu, X., et al. (2024). A rate of penetration (ROP) prediction method based on
BiLSTM-SA-IDBO. Scientific Reports, 14, 24812.
16. Zhou, J., Gravdal, J.E., Strand, P., et al. (2016). Automated kick control procedure for an influx in managed
pressure drilling operations. Modeling, Identification and Control, 37(1), 31-40.