Enhancing Performance in Medical Articles Summarization with Multi-Feature Selection
Abstract
This research aims to produce summaries of extraordinary-event information for public health surveillance systems by extracting content from online medical articles. The dataset consists of 7,346 items. Online medical articles typically contain more than one paragraph, and the core of the story (the important sentences) may appear at the beginning, middle, or end of a paragraph. This study therefore builds summaries that retain the important phrases related to extraordinary-event information scattered across every paragraph of an online medical article. The summarization method is maximal marginal relevance (MMR) with an n-best value of 0.7. Multi-feature selection refers to using several features to improve the performance of the summarization system. The first set of features comprises the title, statistics on word and noun occurrences, and TF-IDF weighting. The second feature is the word-level category in medical content patterns, which is used to identify the important sentences in each paragraph of an online medical article. In this study, important sentences are classified into three categories: core sentences, explanatory sentences, and supporting sentences. The system was evaluated with two types of test: extrinsic and intrinsic. The extrinsic test compares the summaries selected by experts with the summaries produced by the system. The intrinsic test compares three methods: n-best weighting values, a combination of feature selections, and that feature selection combination merged with the word-level category in medical content. The extrinsic evaluation result was 72%. In the intrinsic evaluation, the method merging the feature selection combination with the word-level category in medical content achieved 91.6% precision, 92.6% recall, and an F-measure of 92.2%.
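The abstract names maximal marginal relevance (MMR) with an n-best value of 0.7 and TF-IDF weighting but does not give the scoring formula. The following is a minimal Python sketch of generic MMR sentence selection, not the authors' implementation. It assumes that 0.7 is the MMR trade-off parameter λ, that the article title acts as the query, and that similarity is cosine similarity over TF-IDF vectors with the smoothed IDF ln((1 + N) / (1 + df)) + 1 (the form scikit-learn uses by default). The function names `tokenize`, `tfidf_vectors`, `cosine`, and `mmr_summarize` and the example sentences are hypothetical, and the paper's noun-count and medical word-category features are omitted.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(token_lists):
    """One sparse TF-IDF vector (term -> weight) per token list.

    Raw term counts times a smoothed IDF, ln((1 + N) / (1 + df)) + 1,
    so terms that occur in every document keep a non-zero weight.
    """
    n = len(token_lists)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}
    return [{term: tf * idf[term] for term, tf in Counter(tokens).items()}
            for tokens in token_lists]


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def mmr_summarize(sentences, title, k, lam=0.7):
    """Pick up to k sentence indices by Maximal Marginal Relevance.

    score(i) = lam * sim(i, title) - (1 - lam) * max_{j in selected} sim(i, j)
    Using the title as the query is an assumption of this sketch.
    Returns the chosen indices in original document order.
    """
    # IDF is fitted on the sentences plus the title (hypothetical choice).
    vectors = tfidf_vectors([tokenize(s) for s in sentences] + [tokenize(title)])
    sent_vecs, title_vec = vectors[:-1], vectors[-1]
    relevance = [cosine(v, title_vec) for v in sent_vecs]
    selected = []
    remaining = list(range(len(sentences)))

    def mmr_score(i):
        # Redundancy is the highest similarity to any already-selected sentence.
        redundancy = max((cosine(sent_vecs[i], sent_vecs[j]) for j in selected),
                         default=0.0)
        return lam * relevance[i] - (1 - lam) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=mmr_score)  # ties go to the earliest sentence
        selected.append(best)
        remaining.remove(best)
    return sorted(selected)


# Usage with hypothetical sentences; sentences 0 and 1 are duplicates.
sentences = [
    "Dengue outbreak reported in Jakarta hospital",
    "Dengue outbreak reported in Jakarta hospital",
    "Health officials confirm new dengue cases in Bandung",
    "The weather was sunny",
]
title = "Dengue outbreak in Jakarta"
print(mmr_summarize(sentences, title, k=2, lam=1.0))  # -> [0, 1]
print(mmr_summarize(sentences, title, k=2, lam=0.0))  # -> [0, 3]
```

With λ = 1 the score is pure relevance to the title, so the duplicate pair is chosen; with λ = 0 only redundancy counts, so the duplicate is skipped. Intermediate values such as 0.7 trade the two off, and whether this matches the paper's "n-best" parameter is an assumption.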
Keywords
text summarization; feature selection; N-Best; second opinion; weighting; word level category in medical content
DOI: http://doi.org/10.11591/ijece.v8i4.pp2299-2309
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
International Journal of Electrical and Computer Engineering (IJECE)
p-ISSN 2088-8708, e-ISSN 2722-2578
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).