
INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,
MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)
ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XV, Issue I, January 2026
www.rsisinternational.org
Overall, the study shows that an HMM-based POS tagger can serve as a strong baseline for English–Punjabi
code-mixed NLP, and it contributes an annotated resource and analysis to support future work. The corpus,
approach, and reported findings provide a basis for developing more reliable POS tagging models and for
supporting downstream applications such as information extraction in bilingual contexts, sentiment analysis, and
conversational understanding.