A Comparison of Conditional Random Fields and Hidden Markov Models for the Nyishi Part-of-Speech Tagging Task

Joyir Siram Murtem, Dr. Koj Sambyo, Dr. Achyuth Sarkar

Abstract

Part-of-speech (POS) tagging identifies the grammatical function of each word in a text. A part of speech is a class of words that share grammatical properties; in English, the main classes are nouns, verbs, adjectives, adverbs, pronouns, conjunctions, and prepositions. Nyishi is a language of the Tani branch of the Sino-Tibetan family, spoken by an estimated three lakh (300,000) Nyishi people, one of the most populous ethnic groups in the Indian state of Arunachal Pradesh. This paper presents POS tagging research for the Nyishi language using Conditional Random Fields (CRFs) and Hidden Markov Models (HMMs). In Nyishi, POS tagging is coupled with the problem of word identification, which makes it harder to solve than its English equivalent. The authors developed a Nyishi tagset and set up the POS tagging task. Experiments showed that the proposed CRF technique achieves a higher F-measure (87%), precision (89%), and recall (89%) on Nyishi than the HMM approach.
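
The precision, recall, and F-measure figures reported above are standard scores for comparing a tagger's predicted tags against gold-standard tags. The abstract does not say how the scores were averaged, so the following is only a minimal sketch that assumes macro-averaging over tags. The function name `macro_prf` and the toy tag sequences are hypothetical illustrations, not the authors' code or data.

```python
from collections import Counter


def macro_prf(gold, pred):
    """Return macro-averaged (precision, recall, F1) over all tags in gold or pred.

    Assumption: macro-averaging (unweighted mean over tags); the paper's
    abstract does not state which averaging scheme was used.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct tag
        else:
            fp[p] += 1          # predicted tag p where it was not the gold tag
            fn[g] += 1          # missed the gold tag g

    tags = sorted(set(gold) | set(pred))
    precisions, recalls, f1s = [], [], []
    for t in tags:
        p_den = tp[t] + fp[t]
        r_den = tp[t] + fn[t]
        prec = tp[t] / p_den if p_den else 0.0
        rec = tp[t] / r_den if r_den else 0.0
        f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(tags)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical toy data: gold tags versus a tagger's output for six tokens.
gold = ["NOUN", "VERB", "NOUN", "ADJ", "NOUN", "VERB"]
pred = ["NOUN", "VERB", "VERB", "ADJ", "NOUN", "NOUN"]

precision, recall, f1 = macro_prf(gold, pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Under macro-averaging, the averaged F-measure is the mean of the per-tag F-measures, so it need not equal the harmonic mean of the averaged precision and recall.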
