AI-Driven Welfare Scheme Recommendation Using Random Forest and RAG
We propose a scalable framework that integrates a multi-output Random Forest classifier with a retrieval-augmented generation (RAG) module to recommend government welfare schemes to citizens based on their demographic profiles. The system first applies a preprocessing pipeline that normalizes raw input features (such as age, income, occupation, and caste) and uses a fuzzy matching algorithm to resolve lexical inconsistencies in the categorical variables. A multi-label Random Forest ensemble comprising hundreds of decision trees then predicts eligibility probabilities for all available schemes simultaneously, and a calibrated confidence threshold selects a candidate subset of schemes. To keep the natural-language explanations delivered to users factually accurate, we incorporate a RAG component. This module embeds verified scheme descriptions into a high-dimensional vector space, uses cosine similarity to retrieve the document chunks most relevant to the candidate schemes, and passes the retrieved context together with the user's original query to an instruction-tuned large language model. The classification stage thus acts as a computational filter that narrows the retrieval search space, improving both system efficiency and response precision. The primary contribution of this work is the novel coupling of an ensemble-based eligibility predictor with a retrieval-constrained generative model, which grounds generated text in verified scheme documents to reduce hallucinated outputs while remaining adaptable to large-scale, heterogeneous citizen data. Experimental evaluations on synthetic datasets designed to mimic real-world public records show that the framework achieves high precision in eligibility prediction and generates coherent, evidence-backed recommendations. This approach could make complex social welfare systems considerably more accessible to underserved populations.
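The fuzzy normalization step can be sketched with Python's standard-library `difflib`, used here only as a stand-in for whatever fuzzy matcher the framework actually employs. The canonical occupation list, the `normalize_category` helper, and the 0.8 similarity cutoff are hypothetical choices for illustration, not values taken from the paper.

```python
from __future__ import annotations

import difflib

# Hypothetical canonical vocabulary for one categorical field (occupation);
# the framework's actual category lists are not given in the abstract.
CANONICAL_OCCUPATIONS = [
    "farmer",
    "daily wage labourer",
    "student",
    "self-employed",
    "unemployed",
]


def normalize_category(raw: str, vocabulary: list[str], cutoff: float = 0.8) -> str | None:
    """Map a free-text category value to its closest canonical label.

    difflib's ratio-based matcher stands in for the paper's fuzzy matching
    algorithm. Returns None when no label reaches the cutoff, so the record
    can be flagged for review instead of being silently mislabeled.
    """
    # Case-fold and collapse repeated whitespace before comparing.
    cleaned = " ".join(raw.lower().split())
    matches = difflib.get_close_matches(cleaned, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(normalize_category("  FARMER ", CANONICAL_OCCUPATIONS))  # prints: farmer
print(normalize_category("farmr", CANONICAL_OCCUPATIONS))      # prints: farmer
print(normalize_category("astronaut", CANONICAL_OCCUPATIONS))  # prints: None
```

Returning `None` instead of forcing a match is a deliberate design choice in this sketch: a badly garbled entry is surfaced for correction rather than being mapped to a wrong category that would then skew the eligibility predictions.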
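Once the ensemble has produced a probability per scheme, candidate selection reduces to comparing each probability with a confidence threshold. The sketch below assumes those probabilities are already available; with scikit-learn, for example, a `RandomForestClassifier` fitted on a two-dimensional label matrix returns from `predict_proba` a list containing one probability array per output. The scheme names, the `select_candidates` helper, and the 0.6 threshold are hypothetical, because the calibration procedure and its resulting threshold are not described here.

```python
from __future__ import annotations

# Hypothetical scheme identifiers, in the same order as the classifier's outputs.
SCHEMES = ["old_age_pension", "housing_subsidy", "scholarship", "crop_insurance"]


def select_candidates(
    probabilities: list[float], schemes: list[str], threshold: float = 0.6
) -> list[tuple[str, float]]:
    """Keep schemes whose eligibility probability meets the threshold.

    Results are sorted from most to least likely so that downstream
    retrieval can prioritise the strongest candidates.
    """
    if len(probabilities) != len(schemes):
        raise ValueError("expected exactly one probability per scheme")
    selected = [(s, p) for s, p in zip(schemes, probabilities) if p >= threshold]
    return sorted(selected, key=lambda pair: pair[1], reverse=True)


# Illustrative probabilities for one citizen (not results from the paper).
print(select_candidates([0.91, 0.42, 0.67, 0.60], SCHEMES))
# prints: [('old_age_pension', 0.91), ('scholarship', 0.67), ('crop_insurance', 0.6)]
```

A single global threshold keeps the sketch simple; if schemes have very different base rates, per-scheme thresholds would be a natural variant, though whether the framework uses one is not stated.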
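The way the classifier narrows the retrieval search space can be illustrated with a brute-force cosine-similarity search in pure Python, followed by assembly of a grounded prompt. This is a minimal sketch under stated assumptions: the `Chunk` type, the `cosine`, `retrieve`, and `build_prompt` helpers, the toy two-dimensional embeddings, and the prompt wording are all hypothetical. A production system would more likely compute embeddings with a sentence encoder and search them with an approximate nearest-neighbour index such as FAISS rather than a linear scan.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Chunk:
    scheme_id: str          # scheme this verified description belongs to
    text: str               # one chunk of the verified scheme description
    embedding: list[float]  # precomputed vector from some sentence encoder


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(
    query_vec: list[float], chunks: list[Chunk], candidate_ids: list[str], k: int = 3
) -> list[Chunk]:
    """Return the k chunks most similar to the query, among candidate schemes only.

    Skipping chunks of schemes the classifier rejected is what narrows
    the search space before any similarity is computed.
    """
    allowed = set(candidate_ids)
    scored = [(cosine(query_vec, c.embedding), c) for c in chunks if c.scheme_id in allowed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:k]]


def build_prompt(query: str, retrieved: list[Chunk]) -> str:
    """Combine retrieved context and the user's query into one grounded prompt."""
    context = "\n".join(f"[{c.scheme_id}] {c.text}" for c in retrieved)
    return (
        "Answer using only the context below. If the context does not "
        "support an answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
```

Instructing the model to answer only from the supplied context, and to say so when the context is insufficient, is one common way to realise the retrieval-constrained generation described above; it reduces, but does not by itself eliminate, the risk of unsupported statements.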

This work is licensed under a Creative Commons Attribution 4.0 International License.
All articles published in our journal are licensed under CC-BY 4.0, which permits authors to retain copyright of their work. This license allows for unrestricted use, sharing, and reproduction of the articles, provided that proper credit is given to the original authors and the source.