FloatChat RAG: An AI-Powered Conversational System for Argo Oceanographic Data Exploration Using Retrieval-Augmented Generation

Dr. R. Madhavi
Moka Abhived
Tippabhotla Sri Harshavardhan
Polimetla Dennis Prathyush Paul

Oceanographic research involves massive volumes of heterogeneous data produced by autonomous profiling floats. The Argo program, one of the world's largest ocean observation efforts, generates datasets in NetCDF format containing temperature, salinity, and pressure measurements at varying ocean depths. However, accessing and querying this data requires specialized knowledge of scientific programming, data formats, and oceanographic conventions, creating barriers for non-technical users. This paper presents FloatChat RAG, an AI-powered conversational system that uses Retrieval-Augmented Generation (RAG) to enable natural language exploration of Argo float data. The system processes Argo NetCDF files streamed via OPeNDAP from NOAA's THREDDS servers into a SQLite relational database and generates semantic vector embeddings stored in ChromaDB using the all-MiniLM-L6-v2 sentence transformer model. A LangChain-based tool-calling agent, powered by Google's Gemini large language model, interprets user queries and autonomously selects from nine specialized tools spanning semantic search, structured SQL retrieval, geographic and temporal filtering, and interactive Plotly visualization generation. The system incorporates reliability mechanisms including API key rotation, deterministic fallback routes, and response caching. Evaluation on a proof-of-concept dataset from Indian Ocean Argo floats demonstrates 93.3% tool selection accuracy across 30 test queries, 100% factual correctness on deterministic queries, a semantic search precision@5 of 1.00, and a 0% hallucination rate. The system bridges the gap between raw oceanographic data and actionable insights through an intuitive Streamlit chat interface.
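The evaluation figures quoted above can be read as simple ratios. The following is a minimal sketch, assuming the standard definitions of precision@k and per-query tool selection accuracy; the abstract does not spell out the exact formulas used. The function names, tool names, and profile IDs are hypothetical. The count of 28 correct tool choices is inferred from the reported 93.3% over 30 queries and is not stated in the source.

```python
from typing import Sequence, Set


def precision_at_k(retrieved: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """Fraction of the top-k retrieved items that are relevant (divides by k)."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = list(retrieved)[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k


def tool_selection_accuracy(predicted: Sequence[str], expected: Sequence[str]) -> float:
    """Share of test queries for which the agent picked the expected tool."""
    if not expected or len(predicted) != len(expected):
        raise ValueError("predicted and expected must be non-empty and equal in length")
    correct = sum(1 for p, e in zip(predicted, expected) if p == e)
    return correct / len(expected)


# Hypothetical tool labels: 28 matching choices out of 30 queries (inferred count).
expected_tools = ["sql_query"] * 30
predicted_tools = ["sql_query"] * 28 + ["semantic_search"] * 2
accuracy = tool_selection_accuracy(predicted_tools, expected_tools)
print(f"Tool selection accuracy: {accuracy:.1%}")

# Hypothetical profile IDs: every one of the top five results is relevant.
retrieved_ids = ["p1", "p2", "p3", "p4", "p5"]
relevant_ids = {"p1", "p2", "p3", "p4", "p5", "p6"}
p_at_5 = precision_at_k(retrieved_ids, relevant_ids, k=5)
print(f"precision@5: {p_at_5:.2f}")
```

Under these definitions, a precision@5 of 1.00 means that all five top-ranked results were judged relevant for each evaluated query, which is a plausible outcome on a small proof-of-concept dataset.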

Floatchat RAG: An AI-Powered Conversational System for Argo Oceanographic Data Exploration Using Retrieval-Augmented Generation. (2026). International Journal of Latest Technology in Engineering Management & Applied Science, 15(5), 1292-1301. https://doi.org/10.51583/IJLTEMAS.2026.150500100
