
www.rsisinternational.org
INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,
MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)
ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XV, Issue V, May 2026
5. Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W.,
Rocktäschel, T., Riedel, S., & Kiela, D. (2020). Retrieval-augmented generation for knowledge-intensive
NLP tasks. NeurIPS 2020.
6. Anthropic. (2024). Prompt Caching (API feature). https://www.anthropic.com/news/prompt-caching
7. Gu, A., & Dao, T. (2023). Mamba: Linear-time sequence modeling with selective state spaces. arXiv
preprint arXiv:2312.00752.
8. Meng, K., Bau, D., Andonian, A., & Belinkov, Y. (2022). Locating and editing factual associations in
GPT. NeurIPS 2022.
9. Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., & Chen, W. (2021). LoRA:
Low-rank adaptation of large language models. ICLR 2022.
10. Anthropic. (2024). Model Context Protocol. https://modelcontextprotocol.io
11. Schick, T., Dwivedi-Yu, J., Dessì, R., Raileanu, R., Lomeli, M., Zettlemoyer, L., Cancedda, N., & Scialom,
T. (2023). Toolformer: Language models can teach themselves to use tools. NeurIPS 2023.
12. Qin, Y., Liang, S., Ye, Y., Zhu, K., Yan, L., Lu, Y., Lin, Y., Cong, X., Tang, X., Qian, B., Zhao, S., Hong,
L., Tian, R., Xie, R., Zhou, J., Gerstein, M., Li, D., Liu, Z., & Sun, M. (2023). ToolLLM: Facilitating large
language models to master 16000+ real-world APIs. ICLR 2024.
13. Wang, G., Xie, Y., Jiang, Y., Mandlekar, A., Xiao, C., Zhu, Y., Fan, L., & Anandkumar, A. (2023).
Voyager: An open-ended embodied agent with large language models. arXiv preprint arXiv:2305.16291.
14. Park, J. S., O’Brien, J., Cai, C. J., Morris, M. R., Liang, P., & Bernstein, M. S. (2023). Generative agents:
Interactive simulacra of human behavior. UIST 2023.
15. Significant Gravitas. (2023). AutoGPT: An autonomous GPT-4 experiment. https://github.com/Significant-Gravitas/AutoGPT
16. Turner, A., Thiergart, L., Udell, D., Leech, G., Mini, U., & MacDiarmid, M. (2023). Activation addition:
Steering language models without optimization. arXiv preprint arXiv:2308.10248.
17. Zou, A., Phan, L., Chen, S., Campbell, J., Guo, B., Ren, R., Pan, A., Yin, P., Mazeika, M., Dombrowski,
A. K., Goel, S., Li, N., Byun, M., Wang, Z., Mallen, A., Schwinn, L., Bhatt, U., Steinhardt, J., Fredrikson,
M., & Hendrycks, D. (2023). Representation engineering: A top-down approach to AI transparency. arXiv
preprint arXiv:2310.01405.
18. Li, X. L., & Liang, P. (2021). Prefix-tuning: Optimizing continuous prompts for generation. ACL-IJCNLP
2021.
19. Fu, D., Dao, T., Saab, K. K., Thomas, A. W., Rudra, A., & Ré, C. (2023). Hungry hungry hippos: Towards
language modeling with state space models. ICLR 2023.
20. Poli, M., Massaroli, S., Nguyen, E., Fu, D., Dao, T., Baccus, S., Bengio, Y., Ermon, S., & Ré, C. (2023).
Hyena hierarchy: Towards larger convolutional language models. ICML 2023.
21. Dettmers, T., Pagnoni, A., Holtzman, A., & Zettlemoyer, L. (2023). QLoRA: Efficient finetuning of
quantized LLMs. NeurIPS 2023.
22. Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.-A., Lacroix, T., Rozière, B., Goyal, N.,
Hambro, E., Azhar, F., Rodriguez, A., Joulin, A., Grave, E., & Lample, G. (2023). LLaMA: Open and
efficient foundation language models. arXiv preprint arXiv:2302.13971.
23. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I.
(2017). Attention is all you need. NeurIPS 2017.
24. Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., & Zhou, D. (2022).
Chain-of-thought prompting elicits reasoning in large language models. NeurIPS 2022.
25. Zhang, Z., et al. (2023). H2O: Heavy-hitter oracle for efficient generative inference of large language
models. NeurIPS 2023.
26. Xiao, G., et al. (2024). Efficient streaming language models with attention sinks (StreamingLLM). ICLR 2024.
27. Liu, Z., et al. (2023). Scissorhands: Exploiting the persistence of importance hypothesis for LLM KV cache
compression at test time. NeurIPS 2023.
28. Mu, J., Li, X., & Goodman, N. (2023). Learning to compress prompts with gist tokens. NeurIPS 2023.
29. Gu, A., et al. (2022). Efficiently modeling long sequences with structured state spaces (S4). ICLR 2022.
30. Gu, A., & Dao, T. (2023). Mamba: Linear-time sequence modeling with selective state spaces. arXiv
preprint arXiv:2312.00752.