ISSN 2394-5125
 

Research Article 


PART-OF-SPEECH (POS) TAGGER FOR MALAY LANGUAGE USING NAÏVE BAYES AND K-NEAREST NEIGHBOR MODEL

Shamsan Gaber, Mohd Zakree Ahmad Nazri, Nazlia Omar, Salwani Abdullah.

Abstract
Effective Part-of-Speech (POS) tagging is essential in the era of the 4th industrial revolution,
as high-technology machines such as cars and smart homes can be controlled by human voice commands. A POS
tagger is important in many domains, including information retrieval. POS tags such as verb or noun, in turn,
can be used as features for higher-level natural language processing (NLP) tasks such as Named Entity
Recognition, Sentiment Analysis, and question-answering chatbots. However, research on developing an
effective POS tagger for the Malay language is still in its infancy. Many existing methods that
have been tested on English have not been tested on the Malay language. This study presents an experiment to
tag Malay words using a supervised machine learning (ML) approach. The purpose of this work is to
investigate the performance of supervised ML approaches in tagging Malay words and the effectiveness of
affix-based feature patterns. The Naïve Bayes and k-nearest neighbor models are used to assign a
specific tag to each word. A corpus obtained from Dewan Bahasa dan Pustaka (DBP) is used in this
experiment. DBP has defined a tagset of 21 tags (categories) for the corpus. Two corpus sizes are used for the
tests: 20,000 tokens and 40,000 tokens. Moreover, affix-based feature patterns have
been extracted from the corpora to improve the tagging process.
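
As a rough illustration of the approach described above, the following sketch tags single words with a Naïve Bayes model and a k-nearest neighbor model over affix-based features. It is not the authors' implementation: the toy word list, the VERB/NOUN labels (standing in for the 21 DBP tags), the choice of prefixes and suffixes of up to three letters, the Laplace smoothing, the Jaccard similarity, and k = 3 are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def affix_features(word, max_len=3):
    """Return prefix and suffix patterns of 1..max_len letters (assumed design)."""
    w = word.lower()
    feats = set()
    for i in range(1, max_len + 1):
        if len(w) > i:  # skip affixes that would swallow the whole word
            feats.add("pre=" + w[:i])
            feats.add("suf=" + w[-i:])
    return feats


class NaiveBayesTagger:
    """Multinomial Naive Bayes over affix features, with add-one smoothing."""

    def fit(self, words, tags):
        self.tag_counts = Counter(tags)
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for word, tag in zip(words, tags):
            for feat in affix_features(word):
                self.feat_counts[tag][feat] += 1
                self.vocab.add(feat)
        return self

    def predict(self, word):
        feats = affix_features(word)
        total = sum(self.tag_counts.values())
        best_tag, best_score = None, -math.inf
        for tag, count in self.tag_counts.items():
            # log prior plus smoothed log likelihood of each observed affix
            score = math.log(count / total)
            denom = sum(self.feat_counts[tag].values()) + len(self.vocab)
            for feat in feats:
                score += math.log((self.feat_counts[tag][feat] + 1) / denom)
            if score > best_score:
                best_tag, best_score = tag, score
        return best_tag


def jaccard(a, b):
    """Jaccard similarity between two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def knn_predict(word, words, tags, k=3):
    """Majority vote among the k training words with the most similar affixes."""
    feats = affix_features(word)
    sims = [(jaccard(feats, affix_features(w)), t) for w, t in zip(words, tags)]
    sims.sort(key=lambda pair: pair[0], reverse=True)  # stable sort
    votes = Counter(tag for _, tag in sims[:k])
    return votes.most_common(1)[0][0]


# Toy training data (illustrative only, not the DBP corpus).
train_words = ["membaca", "menulis", "berjalan", "membeli", "bermain",
               "makanan", "minuman", "pakaian", "tulisan", "bacaan"]
train_tags = ["VERB"] * 5 + ["NOUN"] * 5

nb = NaiveBayesTagger().fit(train_words, train_tags)
for unseen in ["membawa", "mainan"]:
    print(unseen, nb.predict(unseen), knn_predict(unseen, train_words, train_tags))
```

In this toy setting the verbal prefixes (mem-, men-, ber-) and the nominal suffix -an carry most of the signal, which is the intuition behind affix-based features for an agglutinative language such as Malay. A realistic tagger would also use context words and a far larger annotated corpus.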

Keywords: Natural Language Processing, Machine Learning, Part-of-Speech Tagging, Malay Language


 


How to Cite this Article

Shamsan Gaber, Mohd Zakree Ahmad Nazri, Nazlia Omar, Salwani Abdullah. PART-OF-SPEECH (POS) TAGGER FOR MALAY LANGUAGE USING NAÏVE BAYES AND K-NEAREST NEIGHBOR MODEL. Journal of Critical Reviews (JCR). 2020; 7(16): 248-257. doi:10.31838/jcr.07.16.33. http://www.jcreview.com/?mno=3479 [Access: June 01, 2021].