Word2Vec model for sentiment analysis of product reviews in Indonesian language

ABSTRACT


INTRODUCTION
Since the rise of Web 2.0, the Internet has become more user-centric [1]. People contribute more and more content to the Internet through social media, discussion boards, Web forums, and blogs. Alongside this trend, a growing number of websites have emerged where consumers can write and read reviews and express their experiences, feelings, opinions, views, and complaints about various products and services [2]. From a consumer behavior perspective, this can be regarded as one of the greatest developments on the Internet.
Online platforms have become a highly valuable source of information for both consumers and producers. In making purchase decisions, consumers often seek advice and purchase recommendations from others [3][4]. Previously, consumers commonly referred to advertisements in mass media to make these decisions [5]. However, with the growth of e-commerce and the increasing number of online review platforms, online reviews have become a reference that consumers can rely on when looking for information about a product they intend to purchase [6][7]. Consumers tend to learn how others like or dislike a product before buying. In fact, previous research found that consumers consider online reviews provided by other users more credible and trustworthy than traditional sources [8].
For producers, online reviews can serve as a reference for what people think about their products or services and help predict the level of public acceptance. This information can help to forecast product sales. Furthermore, negative reviews can be the basis for product improvement and marketing strategies [9]. Therefore, understanding such sentiment and opinion information has become increasingly important for both producers and customers. However, as the number of available reviews increases, it becomes increasingly difficult for people to understand and evaluate the general opinion about a particular product manually. Hence, an automatic approach is preferred.
Sentiment analysis, also known as sentiment or polarity classification, is the task of analyzing people's opinions or sentiment from a piece of text, for example to decide whether the sentiment is positive or negative [10]. Sentiment analysis can be treated as a text classification problem with sentiments as its classes. Nevertheless, sentiment classification is more challenging than traditional topic-based classification because it requires extracting more implicit information rather than only keywords [11].
One of the most popular techniques is the machine learning approach. In recent years, sentiment classification using machine learning methods has been widely adopted and shown to provide strong performance [12][13][14][15][16][17]. Prior research [10] also showed that machine learning techniques perform quite well, with SVMs tending to do best. Two key issues in the machine learning approach are how to extract complex features and how to determine which kinds of features are more valuable [18]. Several feature extraction methods have been proposed, such as single words [19][20], n-grams [21][22], lexicons [23], textual features [24], and many other new models [25][26][27]. However, semantic features have been infrequently employed in this field. Semantic features can disclose the implicit semantic relationships between words, which should be useful for improving sentiment classification performance.
Word embedding, also known as distributed word representation [28], is a feature learning technique in Natural Language Processing (NLP) in which words from the vocabulary are mapped to low-dimensional vectors of real numbers [29]. With word embedding, the semantic and syntactic information of words can be captured from large unlabeled corpora [30][31]. Word embeddings have been employed in many NLP works to produce more effective word representations [32][33][34][35][36]. One of the most popular examples of word embedding is the Word2Vec model. Word2Vec [37] maps each word in the vocabulary to a dense vector of real numbers using a shallow neural probabilistic language model [38]. With Word2Vec, similar words lie close to each other in the embedding space [39].
In this study, we explore the use of the Word2Vec model for sentiment analysis of product reviews in the Indonesian language. Word2Vec is used as the feature representation. For the classification task, we use a Support Vector Machine (SVM) because of its strong performance. We also explore the Bag of Words (BOW) model with several term weighting methods, including Term Frequency (TF) and Term Frequency-Inverse Document Frequency (TF.IDF).
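To make the BOW term weightings concrete, the following is a minimal sketch of binary TF, raw TF, and TF.IDF vectors for tokenized reviews. The function name `bow_features`, the toy reviews, and the idf form log(N/df) are assumptions for illustration; the study does not state which TF.IDF variant was used.

```python
import math
from collections import Counter

def bow_features(docs):
    """Return (vocab, binary_tf, raw_tf, tf_idf) for a list of tokenized docs."""
    vocab = sorted({t for doc in docs for t in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(t for doc in docs for t in set(doc))
    binary, raw, tfidf = [], [], []
    for doc in docs:
        counts = Counter(doc)
        raw.append([counts[t] for t in vocab])                  # raw TF: occurrence count
        binary.append([1 if counts[t] else 0 for t in vocab])   # binary TF: presence only
        # Assumed variant: tf * log(N / df); other variants add smoothing.
        tfidf.append([counts[t] * math.log(n_docs / df[t]) for t in vocab])
    return vocab, binary, raw, tfidf

# Hypothetical toy reviews, already tokenized.
docs = [["bagus", "bagus", "murah"], ["jelek", "murah"]]
vocab, binary, raw, tfidf = bow_features(docs)
print(vocab)
print(raw[0], binary[0], tfidf[0])
```

Under this variant, a term that occurs in every document (here "murah") receives a TF.IDF weight of zero, since it does not help distinguish one review from another.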

RESEARCH METHOD
The general flowchart of the sentiment analysis system in this study is shown in Figure 1. The system has three main stages: preprocessing, building the Word2Vec model, and classification using SVM. Each review is classified into the positive or negative class.

Preprocessing
Preprocessing is conducted before the main process begins. The steps in this stage are tokenization, case folding, and cleaning [40][41][42][43]. In tokenization, each review is split into smaller units called tokens or terms [44]. Case folding converts all characters in the review text to lowercase [45]. In cleaning, non-alphabetic characters such as punctuation, numbers, and HTML tags are removed. Stemming and filtering are not applied in this study because some previous studies found that they do not improve sentiment analysis performance.
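The preprocessing steps can be sketched roughly as follows. This is a minimal illustration, not the implementation used in the study: the function name `preprocess`, the regular expressions, the whitespace tokenizer, and the order of the steps are assumptions.

```python
import re

def preprocess(review):
    """Case-fold, clean, and tokenize one review (illustrative sketch)."""
    text = review.lower()                    # case folding
    text = re.sub(r"<[^>]+>", " ", text)     # cleaning: drop HTML tags
    text = re.sub(r"[^a-z\s]", " ", text)    # cleaning: drop punctuation, digits, other non-letters
    return text.split()                      # tokenization on whitespace (assumed tokenizer)

# Hypothetical review text.
tokens = preprocess("Produk ini <b>BAGUS</b> banget, 100% recommended!")
print(tokens)
```

No stemming or stop-word filtering is applied here, in line with the choice described above.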

Building Word2Vec model
After the preprocessing stage, we build word vector representations using Word2Vec. First, the Word2Vec model builds a vocabulary from the training data. Then, it learns the vector representation of each word. Word2Vec has two training algorithms: continuous bag-of-words (CBOW) and skip-gram [46]. In this study, CBOW is employed. In CBOW, the word vectors are learned by predicting each word from its neighboring words. The resulting word vectors are employed as the classification features. Word2Vec can generally help to improve classification performance because similar words have similar vectors.
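To illustrate what CBOW is trained on, the sketch below builds the (context words, target word) examples that the model learns to predict. It only shows how the training examples are formed, not the neural network training itself. The function name `cbow_pairs`, the toy review, and the window size are assumptions; the study does not report its window size.

```python
def cbow_pairs(tokens, window=2):
    """Return (context_words, target_word) pairs of the kind CBOW trains on."""
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]       # up to `window` words before the target
        right = tokens[i + 1:i + 1 + window]      # up to `window` words after the target
        context = left + right
        if context:                               # skip one-word reviews with no context
            pairs.append((context, target))
    return pairs

# Hypothetical tokenized review.
review = ["produk", "ini", "bagus", "banget"]
pairs = cbow_pairs(review, window=1)
for context, target in pairs:
    print(context, "->", target)
```

Words that tend to appear with the same neighbors receive similar training signals, which is why their learned vectors end up close to each other.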

Sentiment classification using support vector machine
Finally, in the last stage, the reviews are classified into the positive or negative class. In this study, a support vector machine (SVM) is used for the classification task. Despite its high computational complexity [47], SVM has become a popular algorithm in the last decade because of its excellent performance in text classification [48].
Based on the representation of the training data in feature space, SVM finds a hyperplane that separates the positive and negative data with maximum margin. The testing data are then mapped into the same feature space and predicted to belong to the positive or negative category based on which side of the hyperplane they fall. In this study, we use a linear kernel because, according to the work of McCallum and Nigam [49], linear SVM has the best performance in text classification. Another benefit of the linear kernel is that it is faster and requires fewer parameters than other SVM kernels.
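The idea of classifying by the side of the hyperplane can be sketched with scikit-learn's `SVC` and a linear kernel. The two-dimensional toy vectors below are hypothetical stand-ins for review features, and the default regularization setting is an assumption; the study does not report its SVM parameters.

```python
from sklearn.svm import SVC

# Toy 2-D feature vectors standing in for review features (hypothetical data).
X_train = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],   # negative reviews
           [3.0, 3.0], [3.0, 4.0], [4.0, 3.0]]   # positive reviews
y_train = [0, 0, 0, 1, 1, 1]                     # 0 = negative, 1 = positive

clf = SVC(kernel="linear")    # linear kernel; other settings left at defaults (assumption)
clf.fit(X_train, y_train)     # finds the maximum-margin separating hyperplane

X_test = [[0.5, 0.5], [3.5, 3.5]]
scores = clf.decision_function(X_test)   # signed score w.x + b for each test point
labels = clf.predict(X_test)             # the sign of the score decides the class
print(scores, labels)
```

A negative score places a review on the negative side of the hyperplane and a positive score on the positive side.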

RESULTS AND ANALYSIS
The experiment is conducted using 772 product reviews extracted from the FemaleDaily website (https://femaledaily.com/). The review texts and their ratings were collected from the website and labelled manually. There are 386 reviews labelled as positive and 386 labelled as negative. All of the reviews are in the Indonesian language. Scikit-Learn [50] was used to implement the experiments. In the experiments, we compared the results of sentiment classification using Word2Vec with other methods, namely Bag of Words (BOW) using Binary TF, Raw TF, and TF.IDF. We use 10-fold cross validation, which means the product review dataset is divided evenly into 10 folds and the experiment is iterated 10 times. In each iteration, reviews from 9 folds are used as training data and the remaining fold is used as testing data. Average accuracy is used as the evaluation measure. The experiment results can be seen in Figure 2.
The dataset used in this experiment is small. On a small dataset, Word2Vec cannot capture the semantic and syntactic information of words very well. When Word2Vec learns the word representations, each word starts at a random position in the vector space. The words are then gradually moved closer to the positions of similar words based on their neighbors in the training data. If

CONCLUSION
In this study, we used the Word2Vec model to represent the features for sentiment classification of product reviews in the Indonesian language. We used SVM as the classification method. We also compared the Word2Vec-based classification performance with Bag of Words features using Binary TF, Raw TF, and TF.IDF. In general, SVM performs well on sentiment classification. However, the Word2Vec model has the lowest accuracy among the compared methods. This is because we only have a small dataset to train the Word2Vec model. Word2Vec needs a large number of examples to learn the word representations and place similar words close together. In a small dataset, there are too few examples to move the words into better positions.
In future work, a larger dataset can be used to build the Word2Vec model. This dataset does not need to be labelled as positive or negative, nor does it need to be a sentiment analysis dataset. Other sources such as news, articles, Wikipedia, and so on can be used.