Deep sequential pattern mining for readability enhancement of Indonesian summarization
Abstract
In text summarization research, readability is a major issue that must be addressed. We hypothesize that readability can be achieved by using text representations that keep the meaning of text documents intact. This study therefore combines sequential pattern mining (SPM), which produces word sequences as the text representation, with unsupervised deep learning to generate Indonesian text summaries; we call the combined method DeepSPM. It uses PrefixSpan as the SPM algorithm and a deep belief network (DBN) as the unsupervised deep learning method. The dataset consists of 18,774 Indonesian news texts from IndoSum. Readability is evaluated in three ways: recall-oriented understudy for gisting evaluation (ROUGE) as a co-selection-based analysis; the Dwiyanto Djoko Pranowo metrics, Gunning fog index (GFI), and Flesch-Kincaid grade level (FKGL) as content-based analyses; and human readability evaluation by two experts. The experimental results show that DeepSPM outperforms DBN, with F-measure values of 0.462 for ROUGE-1, 0.37 for ROUGE-2, and 0.41 for ROUGE-L. The significance of the ROUGE results is also tested using a t-test. The findings of the content-based analysis and the human readability evaluation are consistent with those of the co-selection-based analysis: the generated summaries are only partially readable, that is, they have a medium level of readability.
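To illustrate how an SPM step can turn sentences into word-sequence patterns, the following is a minimal Python sketch of the PrefixSpan idea for sequences of single words. It is not the authors' implementation: the function name `prefixspan`, the `min_support` threshold, the whitespace-style tokenization, and the three sample sentences are all assumptions made for illustration.

```python
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Mine frequent sequential patterns from lists of words.

    Returns a dict mapping each pattern (a tuple of words) to its support,
    i.e. the number of input sequences that contain the pattern in order
    (gaps between the words are allowed).
    """
    results = {}

    def mine(prefix, projected):
        # `projected` holds (sequence index, start position) pairs: the
        # suffix of each sequence that remains after matching `prefix`.
        counts = defaultdict(int)
        for seq_idx, start in projected:
            # Count every word at most once per sequence.
            for word in set(sequences[seq_idx][start:]):
                counts[word] += 1

        for word, support in sorted(counts.items()):
            if support < min_support:
                continue
            pattern = prefix + (word,)
            results[pattern] = support

            # Build the projected database for the extended pattern by
            # cutting each suffix after the first occurrence of `word`.
            new_projected = []
            for seq_idx, start in projected:
                seq = sequences[seq_idx]
                try:
                    pos = seq.index(word, start)
                except ValueError:
                    continue  # this sequence does not contain `word`
                new_projected.append((seq_idx, pos + 1))
            mine(pattern, new_projected)

    mine((), [(i, 0) for i in range(len(sequences))])
    return results


# Hypothetical tokenized sentences (not taken from IndoSum).
sentences = [
    ["pemerintah", "menaikkan", "harga", "bbm"],
    ["pemerintah", "menurunkan", "harga", "bbm"],
    ["harga", "bbm", "naik"],
]

patterns = prefixspan(sentences, min_support=2)
for pattern, support in sorted(patterns.items()):
    print(" ".join(pattern), support)
```

In this sketch, frequent word sequences such as "harga bbm" keep their original order, which is the property the abstract relies on when it describes representations that keep the meaning of the text intact. How such patterns are encoded as DBN input is not described in the abstract and is left out here.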
Keywords
Automatic text summarization; Deep learning; Indonesian language; Readability evaluation; Sequential pattern mining
DOI: http://doi.org/10.11591/ijece.v14i1.pp782-795
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
International Journal of Electrical and Computer Engineering (IJECE)
p-ISSN 2088-8708, e-ISSN 2722-2578
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).