GestureTalk: Real-Time Sign Language Recognition Using Deep Learning
This paper presents GestureTalk+, a real-time Indian Sign Language (ISL) recognition and speech synthesis system designed for commodity Android devices. It combines MediaPipe landmark extraction with lightweight temporal models (BiLSTM and Transformer encoders) and quantized on-device inference to achieve sub-300 ms end-to-end latency under subject-independent evaluation [1]. The system targets accessibility in education and healthcare, addresses robustness under variable lighting, backgrounds, and camera viewpoints, and includes bilingual text-to-speech for Hindi and English output. Experiments on a 26-letter static ISL set and 20 common dynamic words yield macro-F1 scores of 0.93 and 0.89 respectively, outperforming SVM and CNN-TCN baselines while maintaining 30–45 FPS with NNAPI delegates on mid-range hardware. Ablation studies show that temporal attention and landmark normalization reduce confusions between similar signs (e.g., M vs. N), and that int8 quantization preserves accuracy while reducing compute and power. The paper contributes a reproducible edge-AI pipeline, signer-independent protocols, and deployment telemetry for latency and battery use.
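The abstract names landmark normalization as one of the steps that reduces confusions between similar signs, but does not specify the scheme. As an illustration only, the sketch below shows one common approach, assuming the 21 three-dimensional hand landmarks produced by MediaPipe Hands with the wrist at index 0: the hand is translated so the wrist sits at the origin and then scaled so the farthest landmark is at unit distance. The function name `normalize_landmarks` and the choice of scale factor are assumptions made for this example, not the authors' method.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def normalize_landmarks(landmarks: List[Point], eps: float = 1e-8) -> List[Point]:
    """Make hand landmarks invariant to position and size in the frame.

    Assumption: index 0 is the wrist, as in MediaPipe Hands output.
    This is an illustrative scheme, not the one used in the paper.
    """
    if not landmarks:
        raise ValueError("expected at least one landmark")

    # Translate so the wrist becomes the origin.
    wx, wy, wz = landmarks[0]
    centered = [(x - wx, y - wy, z - wz) for x, y, z in landmarks]

    # Scale by the largest wrist-to-landmark distance.
    scale = max(math.sqrt(x * x + y * y + z * z) for x, y, z in centered)
    if scale < eps:
        # Degenerate input (all points coincide): nothing to scale.
        return centered

    return [(x / scale, y / scale, z / scale) for x, y, z in centered]
```

With this kind of normalization, the same hand shape captured closer to or farther from the camera, or in a different part of the frame, maps to approximately the same feature vector before it reaches the temporal model.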
References
[1] F. Zhang, V. Bazarevsky, A. Vakunov, A. Tkachenka, G. Sung, C. L. Chang, and M. Grundmann, “MediaPipe hands: On-device real-time hand tracking,” arXiv preprint arXiv:2006.10214, 2020.
[2] P. Kumar and A. Sharma, “Indian Sign Language Recognition using Deep Convolutional Neural Networks,” International Journal of Computer Applications, vol. 175, no. 8, pp. 23–29, 2021.
[3] L. Zhang, X. Wang, Y. Li, and M. Chen, “CNN-RNN Hybrid Architecture for Dynamic Gesture Recognition in Real-time Applications,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 6, pp. 1234–1247, 2022.
[4] J. Shen, R. Pang, R. J. Weiss, M. Schuster, N. Jaitly, Z. Yang, … and Y. Wu, “Natural TTS synthesis by conditioning WaveNet on mel spectrogram predictions,” in Proc. IEEE ICASSP, pp. 4779–4783, 2018.
[5] R. Patel, S. Gupta, and M. Jain, “Mobile-based Gesture Recognition System for Augmentative and Alternative Communication,” Journal of Assistive Technologies, vol. 15, no. 2, pp. 89–102, 2021.
[6] Census of India, “Data on Disability – India and States/UTs,” Office of the Registrar General & Census Commissioner, Ministry of Home Affairs, Government of India, 2011.
[7] Microsoft Research, “Kinect-based Sign Language Translation System for American Sign Language,” Microsoft Technical Report MSR-TR-2020-15, 2020.
[8] G. Bradski, “The OpenCV Library,” Dr. Dobb’s Journal of Software Tools, vol. 25, no. 11, pp. 120–125, 2000.
[9] World Health Organization, Global Report on Assistive Technology, WHO Press, Geneva, Switzerland, 2023.
[10] Google AI, “TensorFlow Lite: Machine Learning for Mobile and Edge Devices,” Google Developers Documentation, 2022.
[11] A. Das, R. Kumar, and P. Singh, “Comparative Analysis of Assistive Communication Technologies in Indian Context,” in Proc. International Conference on Accessibility and Assistive Technologies, pp. 145–152, 2023.
[12] V. Sharma and N. Patel, “Cost-effective Solutions for Speech Impairment: A Survey of Emerging Technologies,” Journal of Rehabilitation Research and Development, vol. 58, no. 3, pp. 34–48, 2021.

This work is licensed under a Creative Commons Attribution 4.0 International License.
All articles published in our journal are licensed under CC-BY 4.0, which permits authors to retain copyright of their work. This license allows for unrestricted use, sharing, and reproduction of the articles, provided that proper credit is given to the original authors and the source.