District-Level Crop Yield Prediction in India: A Random Forest Framework with SHAP-Enhanced Explainability and Spatial Residual Analysis.
Accurate assessment of district-level crop yields is crucial for food security planning and targeted agricultural interventions in India; however, conventional statistical methods fail to account for spatial variability and nonlinear relationships among agronomic variables. This study developed a Random Forest-based framework for predicting crop yield across Indian districts using multi-year data on crop type, season, production, and cultivated area, complemented by open-source agronomic datasets. Yield was log-transformed to stabilise variance, and the model was trained with an 80:20 train–test split, with hyperparameters tuned via grid search and cross-validation. Permutation importance and SHAP analyses were applied to interpret feature contributions, and residuals were examined at the district level. The Random Forest model achieved strong predictive performance on the test set, with […], low RMSE and MAE, and close alignment between predicted and observed yields for most districts. Feature attribution indicated that production, cultivated area, and season were the most influential predictors, and spatial aggregation of residuals revealed clusters of systematic over- and under-prediction linked to data-poor or agro-ecologically complex regions. These results suggest that an explainable machine learning pipeline resolved at the district level can accurately forecast crop output variability in India, providing more detailed insights than conventional regression techniques and supporting region-specific policy and management decisions. Operational use of the framework requires improved regional data quality and the incorporation of more comprehensive meteorological and soil information to better support agricultural monitoring.
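The workflow summarised above can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the variable names (`district`, `season`, `crop`, `area`, `yield_t_ha`), the parameter grid, and the use of `log1p` as the log transform are all assumptions made for illustration, and only scikit-learn's permutation importance is shown. SHAP values would typically be computed with the separate `shap` package, which is left out here to keep the example's dependencies small.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(0)
n = 600

# Synthetic stand-in records (hypothetical, not the study's data).
district = rng.integers(0, 20, size=n)      # district identifier
season = rng.integers(0, 3, size=n)         # encoded season
crop = rng.integers(0, 5, size=n)           # encoded crop type
area = rng.uniform(100.0, 5000.0, size=n)   # cultivated area (ha)
# Hypothetical yield driven by crop and season, with multiplicative noise.
yield_t_ha = (1.0 + 0.5 * crop + 0.3 * season) * rng.lognormal(0.0, 0.2, size=n)

X = np.column_stack([season, crop, area])
feature_names = ["season", "crop", "area"]
y_log = np.log1p(yield_t_ha)  # log transform to stabilise variance

# 80:20 train-test split; district ids are carried along for residual analysis.
X_tr, X_te, y_tr, y_te, d_tr, d_te = train_test_split(
    X, y_log, district, test_size=0.2, random_state=0
)

# Grid search with cross-validation over a small, illustrative grid.
grid = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [None, 8]},
    cv=3,
    scoring="neg_root_mean_squared_error",
)
grid.fit(X_tr, y_tr)
model = grid.best_estimator_

# Test-set metrics on the log scale.
pred_log = model.predict(X_te)
rmse = float(np.sqrt(mean_squared_error(y_te, pred_log)))
mae = float(mean_absolute_error(y_te, pred_log))
r2 = float(r2_score(y_te, pred_log))

# Permutation importance: drop in score when one feature is shuffled.
perm = permutation_importance(model, X_te, y_te, n_repeats=5, random_state=0)
importance = dict(zip(feature_names, perm.importances_mean))

# District-level residual aggregation: a mean residual far from zero
# flags systematic under-prediction (positive) or over-prediction (negative).
residuals = y_te - pred_log
district_bias = {
    int(d): float(residuals[d_te == d].mean()) for d in np.unique(d_te)
}
```

In a real application, `district_bias` would be joined to district boundaries and mapped to look for the spatial clusters of over- and under-prediction that the abstract describes.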

This work is licensed under a Creative Commons Attribution 4.0 International License.
All articles published in our journal are licensed under CC-BY 4.0, which permits authors to retain copyright of their work. This license allows for unrestricted use, sharing, and reproduction of the articles, provided that proper credit is given to the original authors and the source.