AI-Driven Welfare Scheme Recommendation Using Random Forest and RAG

A. Karunamurthy
S. Barath

We propose a scalable framework that integrates a multi-output Random Forest classifier with a retrieval-augmented generation (RAG) module to recommend government welfare schemes to citizens based on their demographic profiles. The system first applies a preprocessing pipeline that normalizes raw input features (such as age, income, occupation, and caste) and uses a fuzzy matching algorithm to resolve lexical inconsistencies in the categorical variables. The Random Forest ensemble, comprising hundreds of decision trees, then predicts eligibility probabilities across all available schemes simultaneously, treating the task as multi-label classification, and a calibrated confidence threshold selects a candidate subset of schemes. To ensure factual accuracy in the natural language explanations delivered to users, the RAG component embeds verified scheme descriptions into a high-dimensional vector space, retrieves the document chunks most relevant to the candidate schemes using cosine similarity, and feeds both the retrieved context and the user’s original query into an instruction-tuned large language model. The classification stage thus acts as a computational filter that narrows the retrieval search space, improving both system efficiency and response precision. The primary contribution of this work is the coupling of an ensemble-based eligibility predictor with a retrieval-constrained generative model, which is intended to prevent hallucinated outputs while remaining adaptable to large-scale, heterogeneous citizen data. Experimental evaluations on synthetic datasets, designed to mimic real-world public records, demonstrate that the framework achieves high precision in eligibility prediction and generates coherent, evidence-backed recommendations. This approach has significant implications for making complex social welfare systems more accessible to underserved populations.
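The data flow described in the abstract (fuzzy normalization, threshold-based candidate selection, then cosine-similarity retrieval restricted to the candidates) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The occupation vocabulary, scheme names, probabilities, and three-dimensional vectors are hypothetical toy values; in the described system the probabilities would come from the Random Forest and the vectors from a sentence-embedding model. The standard-library `difflib` matcher stands in for whichever fuzzy matching algorithm the paper uses.

```python
import difflib
import math

# ASSUMPTION: hypothetical canonical vocabulary for one categorical field.
OCCUPATIONS = ["farmer", "teacher", "weaver", "fisherman"]


def normalize_category(raw, vocabulary, cutoff=0.6):
    """Map a noisy category string to its closest canonical value, or None."""
    cleaned = raw.strip().lower()
    matches = difflib.get_close_matches(cleaned, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def select_candidates(probabilities, threshold=0.5):
    """Keep schemes whose predicted eligibility probability meets the threshold."""
    return [scheme for scheme, p in probabilities.items() if p >= threshold]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec, chunks, candidates, k=2):
    """Rank only the chunks of candidate schemes by similarity to the query."""
    pool = [c for c in chunks if c["scheme"] in candidates]
    pool.sort(key=lambda c: cosine(query_vec, c["vector"]), reverse=True)
    return pool[:k]


# ASSUMPTION: toy stand-ins for classifier output and embedded scheme chunks.
probabilities = {"pension": 0.82, "housing": 0.41, "crop_insurance": 0.67}
chunks = [
    {"scheme": "pension", "vector": [1.0, 0.0, 0.0]},
    {"scheme": "housing", "vector": [0.9, 0.1, 0.0]},
    {"scheme": "crop_insurance", "vector": [0.0, 1.0, 0.0]},
]

occupation = normalize_category(" Farmr ", OCCUPATIONS)
candidates = select_candidates(probabilities, threshold=0.5)
context = retrieve([0.9, 0.2, 0.0], chunks, candidates, k=2)
```

Here the "housing" chunk is never scored, even though its vector is close to the query, because its probability falls below the threshold. This is the filtering effect the abstract attributes to the classification stage. The retrieved `context` would then be passed to the language model together with the user's query.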

AI-Driven Welfare Scheme Recommendation Using Random Forest and RAG. (2026). International Journal of Latest Technology in Engineering Management & Applied Science, 15(5), 2449-2466. https://doi.org/10.51583/
