Twitter Sentiment Analysis on 2013 Curriculum Using Ensemble Features and K-Nearest Neighbor

ABSTRACT


INTRODUCTION
According to a survey by IDC (International Data Corporation), a market research firm in the United States, the amount of digital information will grow by roughly a factor of 10 between 2013 and 2020, from about 4 trillion gigabytes to 44 trillion gigabytes. This growth is in line with the increasing number of social media users, who want to exchange information more quickly. However, the information shared does not always carry a favorable opinion: opinions on a given topic can be either positive or negative.
One of the most widely discussed topics today is the 2013 curriculum introduced by the Indonesian Ministry of Education and Culture. The 2013 curriculum is a new curriculum that succeeds the 2006 curriculum (often referred to as KTSP) in the Indonesian education system [1][2]. Its adoption has drawn a variety of opinions from the public. There are several significant differences between the new curriculum and the old one: students are required to be active, teachers only deliver the materials while students must find things out for themselves, some subjects are eliminated, and scouting is made compulsory. These and other changes have provoked a wide range of opinions, especially among Twitter users.
Twitter is one of the largest and most dynamic social media platforms built on user-generated content, and it is very popular among Indonesians. On Twitter, users post short messages called tweets, each no longer than 140 characters. It is estimated that about 400 million tweets are posted by 200 million users daily [3]. In this study, a sentiment analysis system is built to identify the positive and negative opinions about the 2013 curriculum that have developed in society, as expressed on Twitter. An ensemble of several features is used to classify the polarity of tweets. A previous study [4] used several statistical and semantic features, including textual features, Twitter-specific features, lexicon-based features, Part-of-Speech (POS) features, and Bag-of-Words (BOW) features. In that study, BOW features alone gave only 73.8% accuracy, whereas the ensemble of features improved the accuracy to 87.7%. The ensemble features also gave better accuracy than other approaches such as unigram + bigram, label propagation, sentiment topic features, SentiStrength, meta-level features, and Semantria (an online system).
In this study, we explore the use of K-Nearest Neighbor (K-NN) for the classification task. K-NN is an algorithm that classifies an object based on the training data that most closely resemble it [5][6]. In a previous study [7], K-NN yielded the highest accuracy when compared with Naive Bayes and Term Graph: the average accuracy was 98.95% for K-NN, 62.66% for Naive Bayes, and 98.72% for Term Graph. We therefore consider K-NN a suitable choice for this task.
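To make the voting idea behind K-NN concrete, the following is a minimal sketch on toy two-dimensional points. The data, the function name knn_predict, and the use of Euclidean distance are assumptions made purely for illustration; they are not the data or the proximity measure of this study.

```python
import math
from collections import Counter


def knn_predict(train_vectors, train_labels, query, k=3):
    """Return the majority label among the k training points closest to query."""
    # Rank all training points by Euclidean distance to the query (toy choice).
    ranked = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    # Let the k nearest neighbors vote; an odd k avoids ties with two classes.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy training set: two well-separated groups (illustrative values only).
train_vectors = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
                 (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
train_labels = ["positive"] * 3 + ["negative"] * 3

prediction = knn_predict(train_vectors, train_labels, (1.1, 1.0), k=3)
print(prediction)
```

Because K-NN keeps the whole training set and defers all computation to prediction time, it needs no separate training step, which suits a small dataset.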

RESEARCH METHOD
This section describes the steps in the sentiment analysis system. The main workflow of the system is shown in Figure 1. The first step is to take a tweet entered by the user and standardize its words. The purpose of this standardization is to convert non-standard words into their standard forms and to correct spelling errors. The next step is feature extraction. The features used in this work include textual features, Twitter-specific features, lexicon-based features, Part-of-Speech (POS) features, and Bag-of-Words (BOW) features.
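The word standardization step can be sketched as a dictionary lookup. The mapping below is a small hypothetical sample of common Indonesian abbreviations, not the slang dictionary used in this study, and the function name standardize is likewise an assumption. Spelling correction is not covered by this sketch.

```python
import re

# Hypothetical sample entries; the study relies on an external slang dictionary.
SLANG_TO_STANDARD = {
    "yg": "yang",
    "gak": "tidak",
    "tdk": "tidak",
    "dgn": "dengan",
}


def standardize(tweet, mapping=SLANG_TO_STANDARD):
    """Replace non-standard words with their standard forms, word by word."""
    standardized = []
    for token in tweet.split():
        # Separate the word from any trailing punctuation such as "!" or ",".
        match = re.match(r"^(\w+)(\W*)$", token)
        if match and match.group(1).lower() in mapping:
            standardized.append(mapping[match.group(1).lower()] + match.group(2))
        else:
            standardized.append(token)
    return " ".join(standardized)


result = standardize("Kurikulum yg baru gak jelas!")
print(result)
```

Running standardization before feature extraction means that lexicon lookups and word counts operate on a single canonical form of each word.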
The detailed features are listed in Table 1. For the POS features, we use the Kateglo API to get the POS tag of each word. We also use data from previous research [8] for the lexicon of positive and negative words, the emoticons, and the dictionary of amplifier (intensifier) words, together with the dictionary of non-standard (slang) words from [9].
For BOW feature extraction specifically, preprocessing should generally be conducted before the extraction begins [10]. The preprocessing consists of tokenization, filtering, and stemming. In tokenization, each document is split into smaller units called tokens [11]. In this step, all letters are converted to lowercase, and characters such as punctuation, numbers, and HTML tags are removed [12][13]. In filtering, uninformative words are removed based on the existing stoplist by Tala [14]. The last preprocessing step is stemming, which reduces every word to its root [15][16]; here we use the Sastrawi stemmer. For term weighting, we use TF.IDF, since it is a very popular method and generally performs very well on classification tasks [17].
The last stage is sentiment classification using K-Nearest Neighbor. The output of this stage is the category of each test tweet, either positive or negative. Neighbor proximity in this study is computed using cosine similarity instead of Euclidean distance, since previous works [18][19] report that cosine similarity performs very well on NLP tasks.
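The BOW pipeline described above can be sketched as follows. Several parts are assumptions: the stoplist is a tiny hypothetical sample rather than Tala's list, stemming is omitted because it depends on an external stemmer, and the weighting uses raw term frequency multiplied by log(N/df), which is only one common TF.IDF variant and may differ from the one used in this study.

```python
import math
import re
from collections import Counter

# Hypothetical miniature stoplist, standing in for a full Indonesian stoplist.
STOPWORDS = {"yang", "dan", "di", "ini", "itu"}


def preprocess(text, stopwords=STOPWORDS):
    """Tokenize and filter a tweet (stemming is left out of this sketch)."""
    text = re.sub(r"<[^>]+>", " ", text)    # strip HTML tags
    tokens = re.findall(r"[a-z]+", text.lower())  # drops punctuation and digits
    return [t for t in tokens if t not in stopwords]


def tfidf_vectors(docs_tokens):
    """Build sparse TF.IDF vectors (dicts) with tf * log(N / df) weighting."""
    n_docs = len(docs_tokens)
    df = Counter(term for tokens in docs_tokens for term in set(tokens))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    return [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in docs_tokens
    ]


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Made-up example tweets, for illustration only.
tweets = [
    "Kurikulum 2013 bagus dan menyenangkan",
    "Kurikulum 2013 buruk dan membingungkan",
    "Guru senang kurikulum bagus",
]
vectors = tfidf_vectors([preprocess(t) for t in tweets])
print(cosine_similarity(vectors[0], vectors[2]))
print(cosine_similarity(vectors[0], vectors[1]))
```

In this toy corpus the word "kurikulum" occurs in every tweet, so its weight is zero and it does not contribute to similarity. The first and third tweets are judged similar only through the shared word "bagus".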

RESULTS AND ANALYSIS
The dataset used in this study was obtained from Twitter. A total of 200 tweets containing the keyword 'Kurikulum2013' were collected, of which 100 are positive and 100 are negative. The category of each tweet was annotated manually by an expert. The dataset was then divided into training data and test data: 150 tweets were used as training data (75 positive and 75 negative) and 50 as test data (25 positive and 25 negative).
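A class-balanced split with these proportions can be sketched as below. How the tweets were actually assigned to the training and test sets is not stated, so the random shuffling, the seed, and the function name stratified_split are assumptions; the tweet texts are placeholders.

```python
import random
from collections import Counter


def stratified_split(items, labels, test_per_class, seed=0):
    """Hold out test_per_class items of every class; the rest is training data."""
    rng = random.Random(seed)
    by_class = {}
    for item, label in zip(items, labels):
        by_class.setdefault(label, []).append(item)

    train, test = [], []
    for label, members in by_class.items():
        members = members[:]          # copy so the caller's list is untouched
        rng.shuffle(members)
        test += [(m, label) for m in members[:test_per_class]]
        train += [(m, label) for m in members[test_per_class:]]
    return train, test


# Placeholder tweets: 100 positive and 100 negative, as in the dataset description.
tweets = [f"tweet_{i}" for i in range(200)]
labels = ["positive"] * 100 + ["negative"] * 100

train, test = stratified_split(tweets, labels, test_per_class=25)
print(Counter(label for _, label in train))
print(Counter(label for _, label in test))
```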
In this study, several experiments are conducted and their results analyzed. The first experiment determines the effect of the k value of K-NN on the accuracy of the sentiment analysis system. The second experiment compares the BOW features, the ensemble features without BOW (textual features, Twitter-specific features, lexicon-based features, and POS features), and the combination of them all.

K Value Experiment Result and Analysis
The first experiment analyzes the effect of the k value of K-NN on the accuracy of the sentiment analysis system and determines which k value gives the best accuracy. In this experiment, the complete ensemble features are used. The experiment is conducted using several values of k ranging from 3 to 31. The results are displayed in Figure 2. They show that when k is too small, for example k=3, the classification accuracy does not reach its maximum because some relevant data are not involved in the category voting by K-NN. Conversely, when k is too large, for example greater than 13, the accuracy decreases slowly because many irrelevant data are involved in the voting. The best accuracy, 96%, is obtained when k=5. This value of k is therefore used in the next experiment.
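The procedure of this experiment can be sketched as a sweep over k. The data below are synthetic two-dimensional points, not tweet features, so the accuracies this sketch produces say nothing about the reported results. Restricting the sweep to odd k (to avoid voting ties) and all function names are assumptions.

```python
import math
import random
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two dense vectors; 0.0 for a zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def knn_predict(train, query, k):
    """Majority label among the k training vectors most similar to query."""
    ranked = sorted(train, key=lambda pair: cosine(pair[0], query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy_for_k(train, test, k):
    """Fraction of test items whose predicted label matches the true label."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)


def make_synthetic(n_per_class, rng):
    """Generate noisy points around one center per class (illustrative only)."""
    data = []
    for label, center in (("positive", (1.0, 0.2)), ("negative", (0.2, 1.0))):
        for _ in range(n_per_class):
            point = (center[0] + rng.gauss(0, 0.3), center[1] + rng.gauss(0, 0.3))
            data.append((point, label))
    return data


rng = random.Random(42)
train = make_synthetic(75, rng)   # mirrors the 150-item training set size
test = make_synthetic(25, rng)    # mirrors the 50-item test set size

# Sweep odd k from 3 to 31 and keep the value with the highest accuracy.
results = {k: accuracy_for_k(train, test, k) for k in range(3, 32, 2)}
best_k = max(results, key=results.get)
print(best_k, results[best_k])
```

Selecting k on the same test set that is later used to report accuracy tends to give an optimistic estimate; a separate validation split or cross-validation is a common alternative when more data are available.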

Ensemble Features Experiment Result and Analysis
This experiment aims to analyze the use of ensemble features. We compare the BOW features, the ensemble features without BOW (textual features, Twitter-specific features, lexicon-based features, and POS features), and the combination of them all. The results are displayed in Figure 3. [...] tweet data never appeared in the training data. This shows that the use of the BOW features is highly dependent on the word statistics contained in the training data. The ensemble features without BOW (textual features, Twitter-specific features, lexicon-based features, and POS features) performed slightly better than the BOW features alone, with an accuracy of 82%. These features are very dependent on the dictionary or lexicon used. Words in the test tweets that are not well recognized or not contained in the lexicon affect the feature values and thereby the classification result.
The complete combination of all feature sets achieves the best accuracy, 96%, which is an improvement over each of the previous feature sets. Combining all of the feature sets covers the weaknesses of each individual set and gets the best out of them.
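One plausible way to combine the feature sets is to concatenate their vectors, as sketched below. The text does not specify how the combination is done, so concatenation is an assumption. The five handcrafted features, the small lexicons, the vocabulary, and all function names are also hypothetical illustrations rather than the features of Table 1.

```python
import re

# Hypothetical miniature lexicons; the study uses lexicons from previous research.
POSITIVE_WORDS = {"bagus", "senang"}
NEGATIVE_WORDS = {"buruk", "bingung"}


def tokenize(tweet):
    """Lowercase the tweet and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", tweet.lower())


def bow_features(tweet, vocabulary):
    """Raw count of each vocabulary term in the tweet."""
    tokens = tokenize(tweet)
    return [float(tokens.count(term)) for term in vocabulary]


def handcrafted_features(tweet):
    """A few illustrative textual, Twitter-specific, and lexicon-based features."""
    tokens = tokenize(tweet)
    return [
        float(tweet.count("!")),                           # textual: exclamation marks
        float(len(re.findall(r"#\w+", tweet))),            # Twitter-specific: hashtags
        float(len(re.findall(r"@\w+", tweet))),            # Twitter-specific: mentions
        float(sum(t in POSITIVE_WORDS for t in tokens)),   # lexicon: positive words
        float(sum(t in NEGATIVE_WORDS for t in tokens)),   # lexicon: negative words
    ]


def ensemble_features(tweet, vocabulary):
    """Concatenate BOW features with the handcrafted features."""
    return bow_features(tweet, vocabulary) + handcrafted_features(tweet)


vocabulary = ["kurikulum", "guru", "siswa"]   # hypothetical vocabulary
vector = ensemble_features("Kurikulum baru bagus! #Kurikulum2013 @guru", vocabulary)
print(vector)
```

With concatenation, a test tweet whose words are missing from the training vocabulary can still be described by its lexicon and Twitter-specific features, and a tweet whose words are missing from the lexicon can still be described by its BOW features. In practice the two groups may need rescaling to comparable ranges so that neither dominates the similarity computation.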

CONCLUSION
In this study, we built a sentiment analysis system for the 2013 curriculum using K-NN and ensemble features. Various test scenarios were conducted to determine the effect of the k value and the effect of feature combination on sentiment classification accuracy. The value of k has a strong influence on the accuracy of the K-NN method; the best value obtained was k=5, with an accuracy of 96%. A k value that is too small keeps the accuracy from reaching its maximum, whereas a k value that is too large causes the accuracy to decrease.
Apart from the k value, feature combination also has a significant influence on the accuracy. Combining BOW features with the other features, including textual features, Twitter-specific features, POS features, and lexicon-based features, improves the accuracy compared to using each feature set independently. Combining the feature sets covers the weaknesses of each individual set and gets the best out of them. The best accuracy, obtained by combining all feature sets, is 96%.