
www.rsisinternational.org
INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,
MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)
ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XV, Issue V, May 2026
6. Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., et al. (2020). An
image is worth 16×16 words: Transformers for image recognition at scale. arXiv.
https://arxiv.org/abs/2010.11929
7. Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., et al. (2021). Swin Transformer: Hierarchical vision
transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on
Computer Vision (pp. 10012–10022). https://doi.org/10.1109/ICCV48922.2021.00986
8. Doshi-Velez, F., & Kim, B. (2017). Towards a rigorous science of interpretable machine learning. arXiv.
https://arxiv.org/abs/1702.08608
9. Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., & Batra, D. (2017). Grad-CAM: Visual
explanations from deep networks via gradient-based localization. In Proceedings of the IEEE
International Conference on Computer Vision (pp. 618–626). https://doi.org/10.1109/ICCV.2017.74
10. Achanta, R., Shaji, A., Smith, K., Lucchi, A., Fua, P., & Süsstrunk, S. (2012). SLIC superpixels compared
to state-of-the-art superpixel methods. IEEE Transactions on Pattern Analysis and Machine Intelligence,
34(11), 2274–2282. https://doi.org/10.1109/TPAMI.2012.120
11. Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). “Why should I trust you?”: Explaining the predictions
of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining (pp. 1135–1144).
https://doi.org/10.1145/2939672.2939778
12. Lundberg, S. M., & Lee, S. I. (2017). A unified approach to interpreting model predictions. In Advances
in Neural Information Processing Systems (Vol. 30, pp. 4765–4774).
13. Abnar, S., & Zuidema, W. (2020). Quantifying attention flow in transformers. In Proceedings of the 58th
Annual Meeting of the Association for Computational Linguistics (pp. 4190–4197).
https://doi.org/10.18653/v1/2020.acl-main.385
14. Woo, S., Park, J., Lee, J. Y., & Kweon, I. S. (2018). CBAM: Convolutional block attention module. In
Proceedings of the European Conference on Computer Vision (pp. 3–19).
https://doi.org/10.1007/978-3-030-01234-2_1
15. Samek, W., Montavon, G., Lapuschkin, S., Anders, C. J., & Müller, K. R. (2021). Explaining deep neural
networks and beyond: A review of methods and applications. Proceedings of the IEEE, 109(3), 247–278.
https://doi.org/10.1109/JPROC.2021.3060483
16. Hughes, D. P., & Salathé, M. (2015). An open access repository of images on plant health to enable the
development of mobile disease diagnostics. arXiv. https://arxiv.org/abs/1511.08060
17. Singh, D., Jain, N., Jain, P., Kayal, P., Kumawat, S., & Batra, N. (2020). PlantDoc: A dataset for visual
plant disease detection. In Proceedings of the 7th ACM IKDD CoDS and 25th COMAD (pp. 249–253).
https://doi.org/10.1145/3371158.3371196
18. Lin, T. Y., Goyal, P., Girshick, R., He, K., & Dollár, P. (2017). Focal loss for dense object detection. In
Proceedings of the IEEE International Conference on Computer Vision (pp. 2980–2988).
https://doi.org/10.1109/ICCV.2017.324
19. Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., et al. (2021). Learning
transferable visual models from natural language supervision. In Proceedings of the 38th International
Conference on Machine Learning (pp. 8748–8763).
20. Chen, T., Kornblith, S., Norouzi, M., & Hinton, G. (2020). A simple framework for contrastive learning
of visual representations. In Proceedings of the 37th International Conference on Machine Learning (pp.
1597–1607).
21. McMahan, B., Moore, E., Ramage, D., Hampson, S., & Agüera y Arcas, B. (2017).
Communication-efficient learning of deep networks from decentralized data. In Proceedings of the 20th International
Conference on Artificial Intelligence and Statistics (pp. 1273–1282).
22. Chollet, F. (2017). Xception: Deep learning with depthwise separable convolutions. In Proceedings of
the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1251–1258).
https://doi.org/10.1109/CVPR.2017.195
23. Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., & Torralba, A. (2016). Learning deep features for
discriminative localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition (pp. 2921–2929).
https://doi.org/10.1109/CVPR.2016.319