Optimizing Lunar Descent Efficiency Using Reinforcement Learning: A Computational Physics Investigation of Fuel-Constrained Landing Dynamics


Akshey Sharma Kasibhatla

Autonomous planetary landings depend on control systems that can balance landing accuracy against fuel efficiency under nonlinear dynamics.


This study applies reinforcement learning (RL), specifically the Proximal Policy Optimization (PPO) algorithm, to simulate and optimize lunar descent dynamics in a one-dimensional environment governed by Newtonian physics.
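A one-dimensional Newtonian descent model of the kind described can be sketched as follows. The constants (`G_MOON`, `DT`) and the assumption that fuel is drawn in proportion to thrust are illustrative choices, not the paper's actual parameters:

```python
# Minimal 1-D lunar descent step under Newtonian physics (illustrative sketch).
G_MOON = 1.62   # lunar surface gravitational acceleration, m/s^2
DT = 0.1        # integration time step, s

def step(altitude, velocity, fuel, thrust_accel):
    """Advance the lander one time step; thrust_accel acts upward (m/s^2)."""
    if fuel <= 0.0:
        thrust_accel = 0.0                     # engine cannot fire without fuel
    net_accel = thrust_accel - G_MOON          # net vertical acceleration
    velocity += net_accel * DT                 # semi-implicit Euler integration
    altitude += velocity * DT
    fuel = max(0.0, fuel - thrust_accel * DT)  # fuel drawn in proportion to thrust
    return altitude, velocity, fuel
```

An RL agent interacts with such a model by choosing `thrust_accel` at each step and observing the resulting altitude, velocity, and remaining fuel.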


The agent was trained to achieve soft landings across a range of fuel and thrust-cost settings, using a reward function that penalized fuel consumption and high terminal velocity.
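A reward of this shape — a per-step fuel penalty plus a terminal term that rewards soft touchdowns and punishes hard ones — can be sketched as below. The weights (`FUEL_COST`, `SOFT_BONUS`, `CRASH_PENALTY`) and the 2 m/s threshold placement are illustrative, not the study's tuned values:

```python
# Illustrative reward: penalize fuel use each step, score the touchdown.
FUEL_COST = 0.1        # per-unit-fuel penalty per step
SOFT_BONUS = 100.0     # bonus for touching down below the safe speed
CRASH_PENALTY = 100.0  # penalty for a hard landing
SAFE_SPEED = 2.0       # safe touchdown speed threshold, m/s

def reward(fuel_used, landed, touchdown_speed):
    r = -FUEL_COST * fuel_used          # discourage thrusting
    if landed:
        if touchdown_speed <= SAFE_SPEED:
            r += SOFT_BONUS             # soft landing
        else:
            r -= CRASH_PENALTY          # crash
    return r
```

Raising `FUEL_COST` relative to the terminal terms shifts the learned policy toward thriftier, faster descents, which is the trade-off the study varies.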


Over 40,000 training episodes, the agent consistently converged to similar stable policies that reduced fuel usage while still reaching safe touchdown speeds (below 2 m/s).


Quantitative analysis revealed a strong negative correlation between available fuel and final velocity, and fuel-use patterns exhibited a distinctive dual-phase structure resembling real-world powered-descent sequences.
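The reported fuel/final-velocity relationship is the kind of statistic a Pearson correlation over episode logs would capture. The sketch below uses small synthetic numbers (the paper's actual episode data are not reproduced here):

```python
import numpy as np

# Hypothetical per-configuration results: more available fuel, softer touchdown.
fuel_budget = np.array([10.0, 20.0, 30.0, 40.0, 50.0])   # fuel units
final_speed = np.array([4.0, 3.1, 2.2, 1.5, 0.9])        # m/s, illustrative

# Pearson correlation coefficient between fuel budget and touchdown speed.
r = np.corrcoef(fuel_budget, final_speed)[0, 1]
```

For data with this monotone trend, `r` comes out close to -1, consistent with the strong negative correlation the study reports.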


These findings indicate that reinforcement learning provides a physically consistent, adaptive control technique for descent optimization, with potential applications in autonomous guidance systems for future lunar and planetary missions.

Optimizing Lunar Descent Efficiency Using Reinforcement Learning: A Computational Physics Investigation of Fuel-Constrained Landing Dynamics. (2025). International Journal of Latest Technology in Engineering Management & Applied Science, 14(11), 923–934. https://doi.org/10.51583/IJLTEMAS.2025.1411000088


