Automatic Extraction Of Malay Compound Nouns Using A Hybrid Of Statistical And Machine Learning Methods

Muneer A. S. Hazaa, Nazlia Omar, Fadl Mutaher Ba-Alwi, Mohammed Albared

Abstract


Identifying compound nouns is important for a wide spectrum of natural language processing applications, such as machine translation and information retrieval. Extracting compound nouns requires deep or shallow syntactic preprocessing tools and large corpora. This paper investigates several methods for extracting noun compounds from Malay text corpora. First, we present empirical results for sixteen statistical association measures applied to the extraction of Malay <N+N> compound nouns. Second, we explore the possibility of integrating multiple association measures. Third, this work provides a standard data set intended as a common platform for evaluating research on the identification of compound nouns in the Malay language. The data set contains 7,235 unique N-N candidates, of which 2,970 are N-N compound noun collocations. The extraction algorithms are evaluated against this reference data set. The experimental results demonstrate that one group of measures (t-test, Piatetsky-Shapiro (PS), C-value, FGM, and the rank combination method) outperforms the other association measures for <N+N> collocations in the Malay corpus. Finally, we describe several classification methods that combine the scores of the basic association measures, followed by their evaluation. The evaluation shows that the classification algorithms significantly outperform the individual association measures. The experimental results are quite satisfactory in terms of precision, recall and F-score.
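As a rough illustration of two of the association measures named above, the following sketch scores N-N candidates with the t-test and Piatetsky-Shapiro measures in their common textbook forms, then merges the two rankings by mean rank. The abstract does not specify the paper's exact formulations or its rank combination method, so the mean-rank merge, the function names, and the toy counts are all assumptions made for illustration only.

```python
from math import sqrt


def t_score(f_xy, f_x, f_y, n):
    """Textbook t-score: (observed - expected) / sqrt(observed)."""
    expected = f_x * f_y / n
    return (f_xy - expected) / sqrt(f_xy)


def piatetsky_shapiro(f_xy, f_x, f_y, n):
    """Piatetsky-Shapiro: P(xy) - P(x) * P(y)."""
    return f_xy / n - (f_x / n) * (f_y / n)


def rank_positions(scores):
    """Map each candidate to its 1-based rank (highest score first)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {cand: pos for pos, cand in enumerate(ordered, start=1)}


def combine_by_mean_rank(score_tables):
    """Assumed combination scheme: order candidates by their mean rank."""
    rank_tables = [rank_positions(table) for table in score_tables]
    candidates = list(rank_tables[0])
    mean_rank = {
        c: sum(r[c] for r in rank_tables) / len(rank_tables)
        for c in candidates
    }
    return sorted(candidates, key=mean_rank.get)


# Invented toy counts, not taken from the paper's data set:
# candidate -> (pair frequency, first-noun frequency, second-noun frequency)
N = 10_000  # hypothetical number of bigrams in the corpus
counts = {
    ("kereta", "api"): (30, 60, 50),
    ("rumah", "sakit"): (20, 200, 40),
    ("buku", "orang"): (2, 150, 300),
}

t_scores = {c: t_score(*f, N) for c, f in counts.items()}
ps_scores = {c: piatetsky_shapiro(*f, N) for c, f in counts.items()}
ranking = combine_by_mean_rank([t_scores, ps_scores])

for cand in ranking:
    print(cand, round(t_scores[cand], 3), round(ps_scores[cand], 5))
```

In a full pipeline, the per-measure scores could also serve as feature vectors for a classifier, which is the general idea behind combining association measures with machine learning.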

Keywords


Malay Compound Nouns; Statistical Methods; Machine Learning Methods


DOI: http://doi.org/10.11591/ijece.v6i3.pp925-935

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

International Journal of Electrical and Computer Engineering (IJECE)
p-ISSN 2088-8708, e-ISSN 2722-2578

This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).