Page 1207
www.rsisinternational.org
INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,
MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)
ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XV, Issue II, February 2026
Intelligent MPD Set-Point Control for Narrow Windows: A
Reinforcement Learning Framework for Automated Choke Control
in Deepwater HPHT Wells
John Lander Ichenwo, Marvellous Amos
Department of Petroleum Engineering, University of Port Harcourt
DOI: https://doi.org/10.51583/IJLTEMAS.2026.15020000105
Received: 01 March 2026; Accepted: 06 March 2026; Published: 20 March 2026
ABSTRACT
Deepwater high-pressure, high-temperature (HPHT) drilling with managed pressure drilling (MPD) poses serious challenges because of the very narrow margin between pore pressure and fracture gradient, frequently below 0.3 ppg equivalent mud weight. Traditional rule-based MPD tuning algorithms fail to hold optimal set-points in real time, resulting in influx/loss events and an inefficient rate of penetration (ROP).
This paper introduces a reinforcement learning (RL) framework for intelligent MPD set-point control, specifically backpressure and equivalent circulating density (ECD) optimization in narrow-margin deepwater wells characteristic of the Gulf of Guinea region, including Ghana and Nigeria. A Deep Deterministic Policy Gradient (DDPG) agent was trained on historical MPD operational data from West African deepwater campaigns using a multi-objective reward function that balances influx risk, loss risk, and ROP optimization. The framework improved pressure control accuracy by 23% over rule-based approaches, reducing mean absolute pressure deviation from 42 psi to 32 psi.
Moreover, the model increased average ROP by 15 percent while keeping the wellbore stable despite the narrow drilling window. The intelligent control system detected and reacted to simulated kick situations 18 seconds earlier than conventional automated MPD systems, a substantial advance in real-time well control capability. These results imply that RL-based MPD control offers large operational benefits in demanding deepwater HPHT drilling operations and could be applicable across Gulf of Guinea deepwater drilling ventures.
Keywords: Managed pressure drilling, reinforcement learning, HPHT, narrow margin, deepwater drilling,
DDPG, choke control, Gulf of Guinea
INTRODUCTION
The global pursuit of hydrocarbon resources has pushed drilling activity into deepwater environments with demanding geology, including high-pressure, high-temperature (HPHT) reservoirs with narrow operating windows. These challenges are prominent in the Gulf of Guinea, which contains the prolific hydrocarbon provinces offshore Nigeria and Ghana, where deepwater operations have added significant production capacity to the region. Nigeria's deepwater fields have contributed more than 800,000 barrels per day to national production, while Ghana's Jubilee, TEN, and Sankofa fields contribute around 160,000 barrels per day from deepwater offshore developments.
The central problem in HPHT deepwater drilling is that the margin between formation pore pressure and fracture gradient can be as little as 0.3 ppg equivalent mud weight (EMW). This constraint severely limits the operational window for maintaining wellbore stability while maximizing drilling performance. The pressure margin between pore pressure and fracture initiation pressure defines the drilling window; when the equivalent circulating density (ECD) approaches either limit, influx or loss
events can occur. Conventional drilling techniques are often inadequate for such narrow pressure margins because they cannot deliver the accuracy required for real-time pressure management.
Managed pressure drilling (MPD) has emerged as the enabling technology for overcoming such hazardous conditions. MPD uses dynamically controlled surface equipment, such as choke valves and auxiliary pumps, to exert precise control of bottomhole pressure (BHP). The method allows real-time pressure management, limiting influx and lost circulation and permitting drilling in previously undrillable formations. The performance of MPD operations, however, depends critically on the accuracy and responsiveness of the set-point control system governing choke position and applied backpressure.
Current MPD control systems are primarily based on proportional-integral-derivative (PID) controllers or model predictive control (MPC) schemes built on simplified hydraulic models. Although these methods have proven successful in many applications, they are limited in highly dynamic HPHT environments where formation pressures exhibit rapid ramps and regressions. Multiphase flow complexity, temperature-dependent fluid properties, and varying wellbore conditions all undermine the accuracy of the reduced-order models on which conventional controllers rely. Moreover, rule-based tuning methods demand extensive manual tuning and cannot adapt to changing downhole conditions.
Recent developments in artificial intelligence, especially reinforcement learning (RL), offer promising alternatives for complex control problems in drilling. Deep RL algorithms have shown an impressive ability to optimize drilling parameters such as weight on bit (WOB) and rotary speed (RPM) to maximize ROP without inducing downhole vibrations. The Deep Deterministic Policy Gradient (DDPG) algorithm, designed for continuous action spaces, is of special interest for drilling tasks in which control outputs must be tuned continuously rather than chosen from discrete values. This research fills an important gap in the literature by developing and testing a reinforcement learning model tailored to MPD set-point control in narrow-margin HPHT deepwater wells. The paper focuses on the Gulf of Guinea operating environment, using historical data from Nigerian and Ghanaian deepwater campaigns to develop models that recommend real-time backpressure and equivalent mud weight set-points. The proposed multi-objective reward function balances influx risk, loss risk, and ROP optimization simultaneously, unlike the single-objective formulations that have dominated drilling automation research.
The study has three main goals: first, to develop a DDPG-based control agent that learns optimal MPD set-point policies from historical operational data; second, to quantify the pressure control accuracy and drilling performance benefits of the framework relative to rule-based MPD tuning; and third, to demonstrate that the framework can detect and respond to kicks early under simulated HPHT conditions typical of the Gulf of Guinea.
LITERATURE REVIEW
Managed Pressure Drilling in HPHT Environments
Managed pressure drilling is a mature technology developed to address the difficulty of drilling HPHT reservoirs often encountered in deepwater settings. The technology offers accurate annular pressure profile control by manipulating surface backpressure through automated choke systems, giving operators the capability of maintaining constant bottomhole pressure (CBHP) within narrow operational windows. The basic principle is to seal the annulus with a rotating control device (RCD) and route return flow through a choke manifold, permitting rapid pressure adjustments up or down to counter changes in ECD during various drilling operations.
MPD use in deepwater HPHT environments has been well documented, with significant achievements in the Mediterranean, West Africa, and Southeast Asia. BP's application of MPD in the West Nile Delta demonstrated
that formations with PP-FG windows of less than 0.3 ppg EMW could be drilled to target depth, reaching wells unattainable by conventional techniques. Similarly, ENI's combination of continuous circulation technology with MPD enabled successful penetration of ultra-narrow pressure windows in deepwater HPHT conditions. These case studies establish MPD as a mature technology capable of coping with extreme drilling challenges, while also underlining the importance of accurate pressure control and the limitations of available automation solutions.
Control Systems for MPD Operations
MPD choke control automation has evolved over the last two decades from manual control to modern model-based systems. Most current MPD control systems use hydraulic models to estimate downhole pressure in real time and feedback control algorithms to set choke position automatically to achieve desired pressure set-points. Model predictive control has become the most popular approach for automated MPD operations because of its capacity to handle multiple inputs and outputs while honoring operational constraints. Park et al. (2020) reported the first application of a real-time high-fidelity flow model adapted to run inside an MPC controller; the model delivered better control outcomes during drilling, pipe connections, and mud density displacements. That work highlighted that controller success depends on model fidelity: discrepancies between the model and the real system result in poor control.
Nonlinear MPC (NMPC) strategies have been proposed to handle nonlinearities inherent to MPD systems, especially under the two-phase flow conditions that arise during a kick event. Researchers at Memorial University have progressed from simple PID controllers to complex NMPC with fault-management capabilities, validated both in simulation and at laboratory scale. Despite this progress, much work remains before robust control performance can be attained across the full range of conditions encountered in HPHT deepwater drilling, especially during transient operations such as connections, trips, and formation transitions.
Machine Learning Applications in Drilling
Machine learning has proved very successful in related drilling applications, including ROP prediction, ECD estimation, and drilling parameter optimization. Gamal et al. (2021) used artificial neural networks (ANNs) and adaptive network-based fuzzy inference systems (ANFIS) to predict ECD with correlation coefficients above 0.96 and average error ratios below 0.7. More recently, Ekechukwu et al. (2024) introduced an explainable machine learning framework based on XGBoost to predict ECD, reporting R² values of 0.989 on testing data together with a feature importance analysis to interpret the results. Such studies show that machine learning models can effectively capture the complex relationships between drilling parameters and key wellbore conditions.
Random Forest models have demonstrated especially strong ECD prediction performance, with reported R² of 0.9859 and RMSE of 0.0017 on testing datasets using only surface drilling parameters as input (Gao et al., 2024). Relying on surface measurements alone removes the need for expensive downhole sensors while retaining high prediction accuracy. These methods apply directly to MPD control systems, where reliable ECD estimation is required to keep pressure within tight operational limits.
Reinforcement Learning for Drilling Optimization
Reinforcement learning is an emerging paradigm for drilling automation in which agents learn optimal control policies by interacting with their environment rather than being explicitly programmed. For instance, the virtual drilling agent developed by Huang et al. (2024) uses the DDPG algorithm to automatically optimize drilling variables, incorporating ROP, vibration, bit dulling, and tool-breakage risk into a single reward function. Their findings indicated that the RL model can identify the
best policy for different drilling conditions, including hard formations, embedded rock, and unstable drilling conditions.
DDPG is an actor-critic RL algorithm that combines value-based and policy-based approaches to handle continuous action spaces. The algorithm maintains two main neural networks: an actor network that computes the best action for the current state, and a critic network that approximates the value of state-action pairs. This makes it particularly suited to MPD control, where the choke position and backpressure set-points are varied continuously rather than chosen from discrete options.
Keshavarz et al. (2024, 2025) proposed deep reinforcement learning formulations for real-time planning of drilling operations, using Markov Decision Process formulations and Gaussian process models to determine the safe operating window. Their work showed that automated decision-making is feasible during wellbore cleaning, that high-level performance can be sustained, and that non-value-added tasks can be eliminated. The applicability of RL methods to MPD set-point optimization is further supported by autonomous optimization techniques built on Q-learning to control drilling parameters.
Research Gaps and Contribution
Although tremendous progress has been achieved in MPD automation and in RL-based drilling optimization, an essential gap remains at their intersection. Existing MPD control systems are mainly model-based and require good hydraulic models and manual fine-tuning. Although RL methods have been used to optimize drilling parameters (WOB, RPM), their application to MPD set-point control has not been investigated in the published literature. Moreover, current RL drilling studies are mainly devoted to single-objective optimization (usually ROP maximization), whereas MPD control is inherently multi-objective, involving influx risk, loss risk, and drilling efficiency simultaneously.
This paper fills these gaps with a DDPG-based framework tailored to MPD set-point control in narrow-margin HPHT wells. The contributions are: (1) formulating MPD control as a continuous-action RL problem; (2) developing a multi-objective reward function balancing influx/loss risk and ROP maximization; (3) training and validating on data representative of Gulf of Guinea deepwater operations; and (4) quantitative comparison against rule-based MPD tuning techniques.
METHODS
Data Description and Preprocessing
The data used in this paper comprise operational records from 12 HPHT wells drilled in the Gulf of Guinea between 2018 and 2024: 7 wells offshore Nigeria and 5 offshore Ghana. The data were collected during MPD operations run under the CBHP methodology in water depths of 850 to 2,100 meters, with measured depths up to 5,500 meters. The dataset totals about 18,500 time-stamped records sampled at 10-second intervals while MPD was active.
The input features are surface measurements routinely acquired during MPD operations: pump flow rate (GPM), standpipe pressure (SPP), weight on hook (WOH), rotary speed (RPM), rate of penetration (ROP), mud density (ppg), and choke position (%). The target variables are bottomhole pressure and ECD from downhole pressure-while-drilling (PWD) measurements. Additional derived features include the flow-in/flow-out differential, pressure trend derivatives, and time-lagged variables that capture system dynamics.
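The derived features described above can be sketched as follows; the function and series names are illustrative, not the study's actual pipeline:

```python
import numpy as np

def derive_features(flow_in, flow_out, spp, n_lags=3):
    """Build illustrative derived features from raw 10-s series.

    flow_in, flow_out, spp: 1-D numpy arrays of equal length.
    Returns a dict of derived series (names are hypothetical).
    """
    flow_diff = flow_out - flow_in          # flow-in/flow-out differential
    spp_trend = np.gradient(spp)            # pressure trend derivative per sample
    # time-lagged copies of standpipe pressure to capture system dynamics
    lags = {f"spp_lag{k}": np.roll(spp, k) for k in range(1, n_lags + 1)}
    for k in range(1, n_lags + 1):
        lags[f"spp_lag{k}"][:k] = spp[0]    # pad the head with the first value
    return {"flow_diff": flow_diff, "spp_trend": spp_trend, **lags}
```

In practice these series would be computed per well before any train/test split, so that lagged values never cross well boundaries.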
Data preprocessing followed standard drilling data quality assurance. Null records, sensor malfunctions, and redundant data points were dropped, leaving 15,840 valid observations. Outliers were detected and removed with the Interquartile Range (IQR) method, which excluded about 4% of records that
contained anomalous pressure or flow measurements. All features were z-score normalized to guarantee equal scaling across input dimensions, which is vital for stable neural network training.
Stratified sampling divided the dataset into training (70%), validation (15%), and testing (15%) subsets so that well types and formation characteristics were represented in every partition. Records within each well were kept in time order to retain the sequential dependencies needed for RL training.
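As a concrete illustration, the IQR outlier filter and z-score normalization steps might look like this (a minimal sketch; the study's actual preprocessing code is not published):

```python
import numpy as np

def iqr_mask(x, k=1.5):
    """Boolean mask of values inside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x >= q1 - k * iqr) & (x <= q3 + k * iqr)

def zscore(x):
    """Z-score normalization so each feature contributes at a comparable scale."""
    return (x - x.mean()) / x.std()
```

The mask would be applied jointly across the pressure and flow channels before the stratified split, so a record is dropped if any channel is anomalous.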
Reinforcement Learning Framework
MPD set-point control was formulated as a Markov Decision Process (MDP) amenable to RL optimization. At every time step t, the agent observes state s_t (the current drilling parameters and pressure readings), chooses action a_t (backpressure and ECD set-point changes), receives reward r_t reflecting control performance, and transitions to successor state s_{t+1}.
State Space: The state vector s_t comprises pump flow rate, standpipe pressure, weight on hook, rotary speed, ROP, mud density, current choke position, current backpressure, BHP, ECD, formation depth, and pore-fracture margin (calculated from offset well data).
Action Space: The action vector a_t ∈ ℝ² defines continuous changes in (1) the backpressure set-point (±50 psi) and (2) the desired ECD set-point (±0.02 ppg). A continuous action space is needed for the fine-grained pressure control demanded by the tight margins typical of HPHT operations.
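Because the DDPG actor described later emits tanh-activated outputs in [−1, 1], a small helper can map them onto these set-point bounds; the bound values follow the action-space definition above, and the helper itself is an illustrative sketch:

```python
import numpy as np

# Assumed set-point bounds from the action-space definition above.
ACTION_BOUNDS = np.array([50.0, 0.02])  # [backpressure delta (psi), ECD delta (ppg)]

def scale_action(raw):
    """Map a tanh-activated actor output in [-1, 1]^2 to bounded set-point changes."""
    raw = np.clip(np.asarray(raw, dtype=float), -1.0, 1.0)
    return raw * ACTION_BOUNDS
```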
Reward Function: The reward function was designed to be multi-objective, balancing the competing goals of MPD control:
r_t = w₁·r_pressure + w₂·r_ROP + w₃·r_stability

where:

r_pressure = −|BHP_t − BHP_target| / σ_pressure penalizes deviation from the target BHP

r_ROP = (ROP_t − ROP_baseline) / ROP_baseline rewards ROP improvements

r_stability = −λ₁·I_influx − λ₂·I_loss penalizes influx/loss indicators
Sensitivity analysis was used to tune the weighting coefficients (w₁ = 0.5, w₂ = 0.3, w₃ = 0.2) for a balanced trade-off among objectives. The influx indicator I_influx activates when flow-out exceeds flow-in by more than 2%, while I_loss activates when the opposite condition holds for sustained periods.
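A minimal sketch of this reward, using the stated weights and the 2% influx threshold (the λ penalty weights are illustrative, since the paper does not report them):

```python
W = (0.5, 0.3, 0.2)   # (w1, w2, w3) from the sensitivity analysis
LAMBDAS = (1.0, 1.0)  # illustrative influx/loss penalty weights (assumed)

def mpd_reward(bhp, bhp_target, sigma_p, rop, rop_base,
               flow_in, flow_out, loss_sustained=False):
    """Multi-objective reward r_t = w1*r_pressure + w2*r_ROP + w3*r_stability."""
    r_pressure = -abs(bhp - bhp_target) / sigma_p
    r_rop = (rop - rop_base) / rop_base
    influx = flow_out > 1.02 * flow_in         # flow-out exceeds flow-in by >2%
    loss = (flow_in > flow_out) and loss_sustained
    r_stability = -LAMBDAS[0] * influx - LAMBDAS[1] * loss
    w1, w2, w3 = W
    return w1 * r_pressure + w2 * r_rop + w3 * r_stability
```

For example, a 50-psi overshoot with σ_pressure = 100 psi and a modest ROP gain yields a small negative reward, which turns sharply more negative once the influx indicator trips.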
DDPG Algorithm Implementation
The Deep Deterministic Policy Gradient algorithm was selected for its proven capability in continuous control tasks with high-dimensional state spaces. DDPG maintains four neural networks: an actor network μ(s|θ^μ), a critic network Q(s,a|θ^Q), and their respective target networks with parameters θ^μ′ and θ^Q′ that are soft-updated to improve training stability.
The actor network architecture comprises three fully-connected layers with 256, 128, and 64 neurons
respectively, using ReLU activation functions and batch normalization. The output layer employs tanh activation
scaled to the action bounds. The critic network concatenates state and action vectors after the first layer, with
subsequent architecture matching the actor.
Training utilized experience replay with buffer size 100,000 and mini-batch size 64. Target networks were
updated using soft update coefficient τ = 0.001. The Ornstein-Uhlenbeck process provided exploration noise
with parameters θ = 0.15 and σ = 0.2, decayed exponentially over training episodes. Learning rates were set at
10⁻⁴ for the actor and 10⁻³ for the critic, following recommendations for drilling applications.
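Two of the training mechanics above, the Ornstein-Uhlenbeck exploration noise and the soft target-network update with τ = 0.001, can be sketched in a few lines (a simplified illustration, not the study's implementation):

```python
import numpy as np

def ou_step(x, theta=0.15, sigma=0.2, mu=0.0, dt=1.0, rng=None):
    """One step of Ornstein-Uhlenbeck exploration noise: mean-reverting Gaussian."""
    if rng is None:
        rng = np.random.default_rng(0)
    return x + theta * (mu - x) * dt + sigma * np.sqrt(dt) * rng.standard_normal(x.shape)

def soft_update(target_params, online_params, tau=0.001):
    """Polyak averaging of target weights: theta' <- tau*theta + (1 - tau)*theta'."""
    return [tau * w + (1.0 - tau) * wt
            for wt, w in zip(target_params, online_params)]
```

With τ = 0.001 the target networks trail the online networks slowly, which is what stabilizes the bootstrapped critic targets during training.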
Baseline Comparison Methods
To quantify improvements over existing approaches, three baseline methods were implemented:
Rule-Based Control: Traditional MPD automation using a PID controller with fixed gains tuned for nominal operating conditions. Backpressure set-points are adjusted from lookup tables indexed by measured depth and drilling phase.
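A minimal discrete PID of the kind used in this baseline might look like the following; the gains shown in the usage note are placeholders, since the tuned values are not reported:

```python
class PID:
    """Minimal discrete PID controller, as used for the rule-based baseline sketch."""

    def __init__(self, kp, ki, kd, dt=10.0):  # dt matches the 10-s sample interval
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_err = None

    def step(self, setpoint, measured):
        """Return the control output for one sample given set-point and measurement."""
        err = setpoint - measured
        self.integral += err * self.dt
        deriv = 0.0 if self.prev_err is None else (err - self.prev_err) / self.dt
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv
```

For instance, `PID(0.5, 0.0, 0.0).step(100.0, 90.0)` returns a proportional correction of 5.0; the fixed gains are what limit this baseline under the rapidly varying conditions discussed earlier.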
Model Predictive Control: Linear MPC using a simplified hydraulic model with a prediction horizon of 40 time steps and a control horizon of 15 steps. The model's bulk modulus and effective density parameters were calibrated against initial well data.
Random Forest Regression: A machine-learning controller that uses an RF model to predict BHP and computes real-time set-points from the predicted pressure variations. The RF model was trained on the same data partitions used for RL training.
Evaluation Metrics
Model performance was assessed using metrics aligned with the operational priorities of MPD control:
Mean Absolute Pressure Deviation (MAPD): Average absolute deviation between actual and target BHP over all time steps.
Pressure Excursion Rate (PER): Percentage of time steps in which BHP fell outside the safe drilling window limits.
Average ROP: Mean drilling rate achieved during model control periods.
Kick Detection Time (KDT): Time elapsed between simulated kick initiation and control system response.
Control Stability Index (CSI): Standard deviation of choke position changes, indicating control smoothness.
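Three of these metrics can be computed directly from logged time series; a sketch follows (the function names are ours, not the paper's):

```python
import numpy as np

def mapd(bhp, target):
    """Mean Absolute Pressure Deviation (psi) between actual and target BHP."""
    return np.mean(np.abs(np.asarray(bhp) - np.asarray(target)))

def per(bhp, lo, hi):
    """Pressure Excursion Rate: % of time steps outside the safe window [lo, hi]."""
    bhp = np.asarray(bhp)
    return 100.0 * np.mean((bhp < lo) | (bhp > hi))

def csi(choke):
    """Control Stability Index: std of step-to-step choke-position changes."""
    return np.std(np.diff(np.asarray(choke)))
```

KDT, by contrast, requires event timestamps (influx start and first choke response) rather than plain series, so it is omitted here.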
RESULTS AND DISCUSSION
Model Training and Convergence
The DDPG agent was trained for 2,000 episodes, each comprising a complete drilling cycle of one well in the training set. Training convergence was assessed using cumulative episode reward and critic loss. The agent showed steady learning behavior: average episode reward rose from −245 (random actions at the start) to +128 at convergence over about 1,200 episodes. Validation performance stabilized by episode 1,500, suggesting adequate generalization without overfitting.
Hyperparameter sensitivity analysis showed that the reward-function weights strongly influenced the learned behavior. Increasing w₂ (the ROP weighting) produced aggressive policies that at times approached the window limits, whereas a conservative w₁ (the pressure weighting) produced consistent but slower drilling. The selected configuration (w₁ = 0.5, w₂ = 0.3, w₃ = 0.2) achieved the best balance for HPHT narrow-margin conditions.
Pressure Control Performance
| Metric | Rule-Based | MPC | RF Regression | DDPG (Proposed) |
|---|---|---|---|---|
| MAPD (psi) | 42.3 | 38.1 | 35.7 | 32.4 |
| PER (%) | 4.8 | 3.2 | 2.9 | 1.4 |
| CSI (% change/step) | 2.1 | 1.8 | 2.4 | 1.5 |
The DDPG model achieved the best pressure control accuracy on every evaluation metric on the held-out test set. Mean absolute pressure deviation fell by 23 percent relative to rule-based control (32.4 psi versus 42.3 psi), a substantial improvement in BHP tracking accuracy. The pressure excursion rate, which measures the frequency of unsafe window violations, dropped from 4.8% under rule-based control to 1.4% under DDPG, a 71% reduction in unsafe pressure excursions.
These gains matter most within the tight drilling margins of Gulf of Guinea HPHT wells, where the difference between pore pressure and fracture gradient can be below 0.3 ppg. Under such constraints, even small improvements in pressure control accuracy translate into significant reductions in influx/loss risk. The MPC baseline delivered moderate results (MAPD = 38.1 psi), consistent with literature reports that simplified hydraulic models limit achievable accuracy in highly dynamic environments.
Drilling Performance Optimization
The multi-objective reward design allowed the DDPG agent to pursue drilling performance and pressure regulation goals jointly. Average ROP on the test set was 15 percent higher than under rule-based control (47.2 ft/hr vs. 41.1 ft/hr) without increasing pressure excursion rates. This is consistent with industry experience that MPD enables higher ROP by drilling with lighter mud weights while using surface backpressure to maintain formation containment.
Examination of the learned control policies showed that the agent had acquired condition-dependent drilling strategies. During periods of stable formation, the agent kept backpressure at the lowest safe level to maximize ROP without compromising safety margins. When it observed precursors of formation pressure variation (increasing flow-out differential, pressure trends), it proactively adjusted set-points before excursions occurred. This anticipatory behavior, learned from data, mirrors the best practices of experienced MPD operators while delivering greater consistency and faster reaction.
Kick Detection and Response
Simulated kick scenarios were injected into the test data to assess the framework's responsiveness to influx events. The kick models apply step-wise increases in formation fluid influx rate and are tuned to reproduce the flow and pressure signatures of kick dynamics observed during MPD operations. The DDPG agent responded to simulated kicks markedly faster than rule-based control (18 seconds versus 34 seconds from influx start to choke response). This improvement stems from the agent's sensitivity to the subtle pressure and flow signals that precede evident kick patterns. Modern MPD systems include influx-loss detection components that flag symptoms of downhole events before they escalate into well control events; the DDPG model recognized and reacted to these early signals, taking corrective action at the incipient stage when control intervention is most effective.
Comparison with Related Work
The performance gains reported here compare favorably with those in related RL drilling applications. Huang et al. (2024) established that DDPG can optimize drilling parameters (WOB, RPM) to maximize ROP without inducing stick-slip vibrations. The present work extends this to the more intricate domain of MPD control, in which real-time continuous pressure management must balance multiple competing goals. The 23% improvement in pressure control accuracy, compared with the roughly 10% gain reported for RL-based heave compensation control using DDPG, indicates that the MPD control problem is one that RL optimization addresses especially well.
This likely reflects the rich feedback available in MPD systems: pressure and flow measurements provide immediate information on the effectiveness of control actions, which facilitates learning effective policies.
Limitations and Practical Considerations
Several limitations should be recognized when interpreting these results. First, the dataset is a restricted sample of Gulf of Guinea operations and may not represent the range of conditions encountered worldwide; transfer learning techniques might be needed to adapt the trained models to significantly different geological settings. Second, the simulation model, though tuned to historical data, does not capture all dynamic phenomena experienced during live drilling, especially rare extreme events. Real deployment of RL-based MPD control would require substantial validation, including hardware-in-the-loop testing with actual MPD equipment before field trials. Further implementation challenges include integration with existing rig control systems, cybersecurity, and regulatory approval. Nevertheless, the demonstrated performance gains suggest that further development of RL-based MPD control merits serious consideration for demanding deepwater operations.
CONCLUSION
This paper introduced a reinforcement learning framework for intelligent MPD set-point control in narrow-margin deepwater HPHT wells, addressing a significant gap in automated drilling technology. The DDPG-based method outperformed traditional rule-based and model-based control strategies, reducing mean absolute pressure deviation by 23 percent, cutting the pressure excursion rate by 71 percent, and improving average ROP by 15 percent on historical Gulf of Guinea operations. The multi-objective reward function design was critical to balancing the competing demands of pressure control accuracy, drilling efficiency, and well control safety inherent in HPHT MPD operations. The trained control policies exhibited advanced anticipatory behavior, sensing and responding to formation variations faster than traditional systems, and produced smoother control actions that reduce equipment wear and operational complexity.
These results indicate that reinforcement learning can be of considerable value in advancing MPD automation, especially in harsh environments where traditional methods fail to sustain optimal performance. The Gulf of Guinea region, with its established deepwater drilling history and continuing exploration and development, is a natural site for further development and field testing of intelligent MPD control systems.
Future research opportunities include: (1) multi-well transfer learning to enable rapid adaptation to new drilling campaigns; (2) physics-informed neural networks to improve model interpretability; (3) ensemble RL to make the models more robust; and (4) pilot testing in collaboration with operators in West African deepwater basins. The convergence of advanced control technology and growing deepwater resources positions the industry to reach resources in increasingly challenging environments safely and efficiently.
REFERENCES
1. Arnø, M., Godhavn, J.M., Aamo, O.M. (2020). Deep reinforcement learning applied to managed pressure
drilling. SPE Bergen One Day Seminar, Bergen, Norway.
2. Ding, Y., Chen, Z., Zhang, K., et al. (2023). A reinforcement learning method for optimal control of oil
wells. Heliyon, 9(7), e17751.
3. Ekechukwu, G., Uzochukwu, E., Ibekwe, K. (2024). Explainable machine-learning-based prediction of
equivalent circulating density. Scientific Reports, 14, 17620.
4. Elliott, D., Montilva, J., Francis, P., et al. (2011). Managed pressure drilling erases the lines. Oilfield
Review, 23(1), 14-23.
5. Gamal, H., Elkatatny, S., Abdulraheem, A. (2021). Machine learning models for equivalent circulating
density prediction. ACS Omega, 6(40), 26267-26276.
6. Gao, X., Liu, Y., Wang, H., et al. (2024). Equivalent circulation density prediction using random forest.
TPE, MS.ID.000541.
7. Hauge, E., Aamo, O.M., Godhavn, J.M. (2013). Automatic kick detection and handling in managed
pressure drilling. Ph.D. Thesis, Norwegian University of Science and Technology.
8. Huang, X., Luu, H., Shang, S., et al. (2024). Deep reinforcement learning for automatic drilling
optimization using an integrated reward function. SPE/IADC Drilling Conference and Exhibition,
Galveston, Texas.
9. Keshavarz, S., Elahifar, B., Gholami, A. (2024). Deep reinforcement learning algorithm for wellbore
cleaning across drilling operation. Fourth EAGE Digitalization Conference & Exhibition.
10. Keshavarz, S., Elahifar, B., Gholami, A. (2025). Deep reinforcement learning for automated decision
support in oil well operations. Energy Reports, 13, 5967.
11. Najjarpour, M., Jalalifar, H., Soleimani, B. (2022). Managed pressure drilling technology, mechanical
specific energy and bit management for ROP management. Journal of Petroleum Science and Engineering,
209, 109834.
12. Park, J., Price, C., Pixton, D., et al. (2020). Model predictive control and estimation of managed pressure
drilling using a real-time high fidelity flow model. ISA Transactions, 97, 76-89.
13. Scoular, T., Hathaway, K., Essam, W., et al. (2012). BP case study: MPD application supports HPHT
exploration. Drilling Contractor, July/August 2012.
14. Squintani, E., Bonin, R., Borello, F., et al. (2018). Deepwater HPHT drilling through ultra narrow PPFG
window: A case study by ENI. Abu Dhabi International Petroleum Exhibition & Conference.
15. Xiong, M., Wang, Y., Liu, X., et al. (2024). A rate of penetration (ROP) prediction method based on
BiLSTM-SA-IDBO. Scientific Reports, 14, 24812.
16. Zhou, J., Gravdal, J.E., Strand, P., et al. (2016). Automated kick control procedure for an influx in managed
pressure drilling operations. Modeling, Identification and Control, 37(1), 31-40.