AI-Powered Document Generation: Using NLP for Intelligent Data-To-Template Mapping
Abstract: Augmenting Automated Document Generation. This paper introduces the Sandbox: Document Generating Engine, a secure, modular web application built with Python and Streamlit (Achachlouei, A., Patil, M. A., Joshi, Q., Vair, T. & N. 2021). The primary research objective is to validate the feasibility and efficacy of augmenting Intelligent Document Processing (IDP) workflows with contemporary Large Language Models (LLMs) for semantic data-to-template mapping. To address manual document creation, which is time-consuming and error-prone, the system uses Natural Language Processing (NLP) to analyze data uploaded in diverse formats (e.g., .csv, .xlsx, .txt) and automatically populate predefined document templates (Adhikari, P. R. 2018). The system includes a secure authentication module that uses bcrypt for password hashing and PostgreSQL for credential management. Our initial technical findings demonstrate high reliability, with extraction accuracy consistently above 95% across test documents. The system also substantially reduced the time required to create complex documents, which supports the claim that LLM-enhanced IDP improves efficiency and productivity over simple rule-based methods (Bitzenbauer, P. 2023).
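As an illustration of the data-to-template mapping idea described in the abstract, the following is a minimal sketch and not the authors' implementation. It substitutes plain string similarity from Python's standard difflib module for the LLM-based semantic matching, and the function names, the sample template, and the sample CSV are all hypothetical.

```python
import csv
import difflib
import io
import re
from string import Template


def normalize(name: str) -> str:
    """Lowercase a field name and drop separators, so 'Client Name' ~ 'client_name'."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def map_columns(placeholders, columns, cutoff=0.6):
    """Map each template placeholder to the closest-matching data column.

    Assumption: difflib string similarity stands in for the LLM-based
    semantic matching that the abstract describes.
    """
    normalized = {normalize(col): col for col in columns}
    mapping = {}
    for placeholder in placeholders:
        matches = difflib.get_close_matches(
            normalize(placeholder), list(normalized), n=1, cutoff=cutoff
        )
        if matches:
            mapping[placeholder] = normalized[matches[0]]
    return mapping


def fill_template(template_text, row, mapping):
    """Fill mapped placeholders; unmapped ones stay visible for manual review."""
    values = {ph: row[col] for ph, col in mapping.items()}
    return Template(template_text).safe_substitute(values)


# Hypothetical template and uploaded data, for illustration only.
template_text = "Dear $client_name, your invoice total is $total_amount."
uploaded_csv = "Client Name,Total Amount (USD)\nAda Lovelace,1200.00\n"

rows = list(csv.DictReader(io.StringIO(uploaded_csv)))
placeholders = re.findall(r"\$(\w+)", template_text)
mapping = map_columns(placeholders, rows[0].keys())
document = fill_template(template_text, rows[0], mapping)
print(document)  # Dear Ada Lovelace, your invoice total is 1200.00.
```

In a system like the one described, the map_columns step is where an LLM call would replace the similarity lookup, while the template-filling step could remain deterministic.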
References
Achachlouei, A., Patil, M. A., Joshi, Q., Vair, T. & N. (2021). Document Automation Architectures and Technologies: A Survey. arXiv. https://arxiv.org/abs/2109.02605
Adhikari, P. R. (2018). Understanding of Plagiarism through Information Literacy: A Study among the Students of Higher Education of Nepal. Journal of Business and Social Sciences Research, 3(2), 165–181. https://doi.org/10.3126/jbssr.v3i2.28132
AlAli, R., & Wardat, Y. (2024). Opportunities and Challenges of Integrating Generative Artificial Intelligence in Education. International Journal of Religion, 5(7), 784–793. https://doi.org/10.61707/8y29gv34
Aldosari, S. A. M. (2020). The Future of Higher Education in the Light of Artificial Intelligence Transformations. International Journal of Higher Education, 9(3), 145. https://doi.org/10.5430/ijhe.v9n3p145
Almahasees, Z., Khalil, M., & Aminzadeh, S. (2024). Students’ Perceptions of the Benefits and Challenges of Integrating ChatGPT in Higher Education. Pakistan Journal of Life and Social Sciences (PJLSS), 22(2), 3479–3494. https://doi.org/10.57239/PJLSS-2024-22.2.00256
Archila, P. A., Ortiz, B. T., Truscott de Mejía, A.-M., & Molina, J. (2024). Thinking critically about scientific information generated by ChatGPT. Information and Learning Science. https://doi.org/10.1108/ILS-04-2024-0040
Arora, S., Yang, S., Eyuboglu, B., Narayan, S., Hojel, A., Trummer, A., & E., I. R. (2023). Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes. Proc. VLDB Endow., 17(2), 92–104. https://doi.org/10.14778/3620359.3620366
Athaluri, A. S., Manthena, S. V., K., M. V. S. R., Kesapragada, V., Yarlagadda, T., Dave, & Dudumpudi, R. T. S. (2023). Exploring the Boundaries of Reality: Investigating the Phenomenon of Artificial Intelligence Hallucination in Scientific Writing Through ChatGPT References. Cureus, 15(12). https://doi.org/10.7759/cureus.49964
Bakiri, H., Mbembati, H., & Tinabo, R. (2023). Artificial Intelligence Services at Academic Libraries in Tanzania: Awareness, Adoption and Prospects. University of Dar Es Salaam Library Journal, 18(2). https://doi.org/10.4314/udslj.v18i2.3
Bearman, M., Tai, J., Dawson, P., Boud, D., & Ajjawi, R. (2024). Developing evaluative judgement for a time of generative artificial intelligence. Assessment & Evaluation in Higher Education, 49(6), 893–905. https://doi.org/10.1080/02602938.2024.2335321
Biswas, S., Jain, S., Morariu, R., Gu, V. L., Mathur, J., Wigington, P., Sun, C., & Uehida, T. (2024). DocSynthV2: A Practical Autoregressive Modelling for Document Generation. arXiv. https://arxiv.org/abs/2406.02492
Bitzenbauer, P. (2023). ChatGPT in physics education: A pilot study on easy-to-implement activities. Contemporary Educational Technology, 15(3), ep430. https://doi.org/10.30935/cedtech/13176
Borkovska, I., Kolosova, H., Kozubska, I., & Antonenko, I. (2024). Integration of AI into the Distance Learning Environment: Enhancing Soft Skills. Arab World English Journal, 1(1), 56–72. https://doi.org/10.24093/awej/ChatGPT.3
Bozkurt, A. (2024). Tell Me Your Prompts and I Will Make Them True: The Alchemy of Prompt Engineering and Generative AI. Open Praxis, 16(2), 111–118. https://doi.org/10.55982/openpraxis.16.2.661
Bradley, C. (2013). Information Literacy Articles in Science Pedagogy Journals. Evidence Based Library and Information Practice, 8(4), 78–92. https://doi.org/10.18438/B8JG76
Cain, W. (2024). Prompting Change: Exploring Prompt Engineering in Large Language Model AI and Its Potential to Transform Education. TechTrends, 68(1), 47–57. https://doi.org/10.1007/s11528-023-00896-0
Carroll, A. J., & Borycz, J. (2024). Integrating large language models and generative artificial intelligence tools into information literacy instruction. The Journal of Academic Librarianship, 50(4), 102899. https://doi.org/10.1016/j.acalib.2024.102899
ÇAYIR, A. (2023). A Literature Review on the Effect of Artificial Intelligence on Education. İnsan ve Sosyal Bilimler Dergisi, 6(2), 276–288. https://doi.org/10.53048/johass.1375684
Lin, C.-H., & Cheng, C. P. (2024). Legal Documents Drafting with Fine-Tuned Pre-trained Large Language Model. arXiv. https://arxiv.org/abs/2406.08860
Mohammadi, B., et al. (2024). Creativity Has Left the Chat: The Price of Debiasing Language Models. arXiv. https://arxiv.org/abs/2403.04595
Mridul, M. A., Sloyan, I., Gupta, A., & Seneviratne, O. (2025). AI4Contracts: LLM & RAG-Powered Encoding of Financial Derivative Contracts. arXiv. https://arxiv.org/abs/2506.09633
Nigam, S. K., Patnaik, B. D., Thomas, A. V., Shallum, N., Ghosh, K., & Bhattacharya, A. (2025). Structured Legal Document Generation in India: A Model-Agnostic Wrapper Approach with VidhiDastavej. International Journal of Law, Technology, and Management. https://doi.org/10.48550/arXiv.2506.09540
Zhao, H., & Li, D. (2024). A Large Language Model-based Framework for Semi-Structured Tender Document Retrieval–Augmented Generation. arXiv. https://arxiv.org/abs/2403.18560
Zhang, Q., Huang, B., Jiang, V., Wang, J., Jiang, Z., He, L., & Zhang, C. (2024). Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction. ResearchGate. https://arxiv.org/abs/2403.11186

This work is licensed under a Creative Commons Attribution 4.0 International License.
All articles published in our journal are licensed under CC-BY 4.0, which permits authors to retain copyright of their work. This license allows for unrestricted use, sharing, and reproduction of the articles, provided that proper credit is given to the original authors and the source.