An Explainability-Driven Framework for Interpretable Cross-Modal Image-Text Retrieval Using CLIP