Sentiment review of coastal assessment using neural network and naïve Bayes

ABSTRACT


INTRODUCTION
Assessments of a place made by people who have visited or lived in the area are now often used as a benchmark by potential visitors when deciding whether to visit. Indonesia, an archipelagic country, has natural beauty that attracts many people, and its coasts are currently the prima donna of its tourism. Beaches have their own charm: visitors can enjoy the beauty of the sea and can also play along the shore. Beach reviews are now submitted via the internet, on websites or social media, and other visitors use them as reference material. Assessing the positive and negative sentiment of these reviews provides input for decision support when evaluating a place that has a coastline. The southern coast of Java Island, Indonesia, currently receives many visitors, which makes it a rich source of coastal assessment reviews for sentiment analysis.
Sentiment review is a technology that uses artificial intelligence to find the best model for classifying positive, negative, and neutral sentiments. It relies on machine learning (ML) as the main tool for analyzing the data. Which ML method is suitable depends on both the algorithm and the data, because the characteristics of each dataset differ. Sentiment models have been developed in various fields, such as product reviews [1], [2], movie and video reviews [3], comments on social media [4], [5], and reviews of tourist attractions and hotels [6], [7]. Sentiment reviews have also been carried out in the government sector, namely in the field of public services [8]. Neural network (NN) and naïve Bayes (NB) [9], [10] are machine learning algorithms that are currently widely used for sentiment analysis, and they have been reported to be among the best methods for sentiment models [11]. Studies using NN for sentiment models include a review of Chinese cars [12], a graph neural network model for sentiment analysis [13], and hurricane sentiment taken from social media tweets [14]. In other previous studies, naïve Bayes was widely used for the sentiment of product reviews [15], tourists, and hotel and restaurant services [16], [17]. However, these sentiment reviews mostly discuss tourist attractions, hotels, and other services and do not focus on reviewing the place itself. Based on this gap, this article proposes a sentiment review that focuses directly on the place itself, namely a review of the coast.
The weakness of existing NN and NB models is that their accuracy is still not optimal, so this article proposes optimizing the model to increase the accuracy of sentiment classification. The aim of the coastal sentiment review classification in this study is to find the model that provides the best level of accuracy. This article proposes optimizing the model using feature weights [18], [19] to increase its accuracy. Based on their advantages, the particle swarm optimization (PSO) [20] and genetic algorithm (GA) methods are proposed for the optimization, and the information gain (IG) method is also applied to increase accuracy. The contribution of this article is the application of an optimization model that finds the best weight values for the neural network and naïve Bayes models in order to reach the best accuracy.

METHOD
Dataset and tools
The dataset used in this study consists of Indonesian-language text containing review comments written by people who have visited, or at least seen, the coasts in the southern region of Java Island, Indonesia. For now, the data used are review comments on Pantai Teluk Penyu in Cilacap Regency, Central Java Province, Indonesia, collected between 2020 and 2022. The data are taken from the dataset [21], which contains the review comments on that place. The data were labeled based on star ratings: reviews with 1 to 3 stars were placed in the "negative" category (140 comments) and reviews with 5 stars in the "positive" category (250 comments). The data analysis used to obtain the best model was carried out with the RapidMiner Studio software.
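The labeling rule described above can be illustrated with a short Python sketch. The study itself used RapidMiner, so this is only an illustration; the function name and the sample reviews are hypothetical, and because the text does not say how 4-star reviews were handled, the sketch assumes they are left unlabeled and excluded.

```python
from typing import Optional


def label_from_stars(stars: int) -> Optional[str]:
    """Map a star rating to a sentiment label using the rule in the text."""
    if 1 <= stars <= 3:
        return "negative"
    if stars == 5:
        return "positive"
    # Assumption: 4-star reviews are not mentioned in the text, so this
    # sketch leaves them unlabeled.
    return None


# Hypothetical sample reviews (text, stars); not taken from the dataset.
reviews = [("pantai kotor", 2), ("pantai indah", 5), ("lumayan", 4)]
labeled = [
    (text, label_from_stars(stars))
    for text, stars in reviews
    if label_from_stars(stars) is not None
]
```

Only the reviews that receive a label are kept in `labeled`, which mirrors a two-class setup.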

Proposed method and framework
This study carried out the experiment in several stages to obtain the best model: data preprocessing, model application, model optimization, data validation, and finally model evaluation. This study classifies two categories of sentiment, namely "positive" and "negative". In the data preprocessing stage, case transformation, tokenization, and token filtering are carried out. The text documents are also transformed into weights using the TF-IDF [22] method as in (1),

$$W_{i,j} = tf_{i,j} \times \log\left(\frac{N}{df_i}\right) \quad (1)$$

where $W_{i,j}$ is the weight of term $i$ in document $j$, $tf_{i,j}$ is the number of occurrences of term $i$ in document $j$, $df_i$ is the number of documents containing term $i$, and $N$ is the total number of documents.
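As a minimal sketch of the preprocessing and weighting step, the following Python code lowercases, tokenizes, and filters tokens, then computes weights in the common form tf × log(N/df), which matches the symbol definitions given for (1). The minimum token length, the regular expression, and the function names are assumptions made for illustration; the RapidMiner operators used in the study may normalize the weights differently.

```python
import math
import re
from collections import Counter


def tokenize(text, min_len=2):
    """Lowercase the text, split it into alphabetic tokens, drop short tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if len(t) >= min_len]


def tfidf(docs):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append(
            {term: count * math.log(n_docs / df[term]) for term, count in tf.items()}
        )
    return weights


# Hypothetical two-document corpus.
example_weights = tfidf(["Pantai indah", "Pantai kotor"])
```

A term that appears in every document, such as "pantai" in this toy corpus, gets weight zero, so only discriminative terms contribute to the classifier input.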
In the model application stage, two main methods were used, namely neural networks and naïve Bayes, and various combinations of the parameters of the two methods were tried. The data validation stage uses cross-validation [23]: the data are split into training data and testing data, with 90% used for training and 10% for testing. Model evaluation is done by comparing all the proposed experimental models to identify the one with the best level of accuracy. Figure 1 shows the framework of the research method proposed in this article. The performance of each model is obtained from the confusion matrix [24], [25] as in (2),

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad (2)$$

where TP is true positive, TN is true negative, FP is false positive, and FN is false negative.
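The validation and evaluation steps can be sketched in Python as follows. The accuracy function uses the usual confusion-matrix definition, (TP + TN) / (TP + TN + FP + FN). The fold generator is a plain, non-stratified k-fold split written for illustration; the study used RapidMiner's cross-validation with stratified or linear sampling, which this sketch does not reproduce. All names and the example counts are hypothetical.

```python
def accuracy(tp, tn, fp, fn):
    """Accuracy from confusion-matrix counts."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


def kfold_indices(n, k):
    """Yield (train, test) index lists for a simple k-fold split of n items."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f_idx, fold in enumerate(folds) if f_idx != i for j in fold]
        yield train, test


# With 390 labeled reviews (140 negative + 250 positive) and k=10,
# each fold holds out 39 reviews (10%) and trains on 351 (90%).
splits = list(kfold_indices(390, 10))

# Hypothetical counts for one held-out fold of 39 reviews.
fold_accuracy = accuracy(tp=20, tn=14, fp=3, fn=2)
```

Averaging the per-fold accuracy over all k folds gives the cross-validated accuracy that the experiments compare.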

Neural network and naïve Bayes
A neural network (NN) is a technique widely used in artificial intelligence. It imitates the way the human brain works, using computational elements that mimic neurons [26], [27], and tries to identify the underlying relationships in a data set. An NN consists of an input layer, a hidden layer, and an output layer that are interconnected to produce the desired model.
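To make the layer structure concrete, the following Python sketch computes the forward pass of a network with one hidden layer. The sigmoid activation, the layer sizes, and all weight values are assumptions for illustration; the text does not state the architecture used in the experiments.

```python
import math


def sigmoid(z):
    """Logistic activation, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: input layer -> one hidden layer -> single output unit."""
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


# Hypothetical weights: 2 inputs, 2 hidden units, 1 output.
score = forward(
    x=[0.7, 0.0],
    w_hidden=[[0.5, -0.2], [0.1, 0.4]],
    b_hidden=[0.0, 0.0],
    w_out=[1.0, -1.0],
    b_out=0.0,
)
```

In a two-class sentiment setting, an output above 0.5 could be read as "positive" and below as "negative"; training adjusts the weights using the learning rate, momentum, and training cycles discussed in the experiments.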
Naïve Bayes is a machine learning algorithm that can be used for classification in sentiment analysis models and has been shown to produce models with a high level of accuracy. In addition, NB is very efficient as a classification model for multivariate analysis [28]. The equation of the naïve Bayes method is given in (3),

$$P(c \mid x) = \frac{P(x \mid c)\,P(c)}{P(x)} \quad (3)$$

which gives the probability that class $c$ is true given that $x$ is observed.
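A minimal Python sketch of a naïve Bayes text classifier follows. It assumes a multinomial model over token counts with Laplace (add-one) smoothing, which is one common variant and not necessarily the one implemented in RapidMiner; the function names and the toy documents are hypothetical. Because P(x) is the same for every class, the sketch compares only log P(c) + Σ log P(token | c).

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Collect class counts, per-class token counts, and the vocabulary."""
    class_counts = Counter(labels)
    term_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        term_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, term_counts, vocab


def predict_nb(tokens, model):
    """Return the class with the highest log posterior (up to a constant)."""
    class_counts, term_counts, vocab = model
    n_docs = sum(class_counts.values())
    scores = {}
    for c, count in class_counts.items():
        total = sum(term_counts[c].values())
        log_p = math.log(count / n_docs)  # log prior P(c)
        for t in tokens:
            if t in vocab:  # ignore tokens never seen in training
                # Laplace-smoothed likelihood P(t | c)
                log_p += math.log((term_counts[c][t] + 1) / (total + len(vocab)))
        scores[c] = log_p
    return max(scores, key=scores.get)


# Hypothetical toy training set.
toy_docs = [["pantai", "indah"], ["pantai", "bersih"], ["pantai", "kotor"]]
toy_labels = ["positive", "positive", "negative"]
toy_model = train_nb(toy_docs, toy_labels)
```

Working in log space avoids numerical underflow when a review contains many tokens.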

Genetic algorithm and particle swarm optimization
Genetic algorithm (GA) and particle swarm optimization (PSO) are optimization algorithms that can be used to search for the best weights for a model [29]-[31]. In this research, they are applied to NN and NB in order to increase classification accuracy for the coastal sentiment review. The PSO calculation process is shown in (4).
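The following Python sketch shows how PSO could search for feature weights. It uses the widely taught velocity update v ← w·v + c1·r1·(pbest − x) + c2·r2·(gbest − x), which is assumed here and may differ in detail from the form of (4). The inertia and acceleration constants, the clamping of weights to [0, 1], and the toy objective are all assumptions for illustration. In the study's setting the objective would be something like one minus the cross-validated accuracy of the classifier trained on the weighted features.

```python
import random


def pso(objective, dim, n_particles=10, iters=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over weight vectors in [0, 1]^dim with basic PSO."""
    rng = random.Random(seed)
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (
                    w * vel[i][d]
                    + c1 * r1 * (pbest[i][d] - pos[i][d])
                    + c2 * r2 * (gbest[d] - pos[i][d])
                )
                # Keep each feature weight inside [0, 1].
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_objective(weights):
    """Hypothetical stand-in for (1 - accuracy): distance from 0.5 per weight."""
    return sum((x - 0.5) ** 2 for x in weights)


best_weights, best_value = pso(toy_objective, dim=3)
```

The `n_particles` argument plays the role of the population parameter varied in the experiments (5 versus 10).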

RESULTS AND DISCUSSION
The experiments produced several models with different levels of accuracy depending on the method used. The experiments were run on a computer with a Core i7 processor and 8 GB of memory, and the data analysis used RapidMiner Studio 9.0 on the Windows 7 operating system. This section presents the experimental results of the proposed models according to the predetermined scenarios, namely neural networks, naïve Bayes, and the optimized versions of the two methods. The models give different or similar results depending on the experiment conducted.

Neural network experiment
The neural network algorithm in this experiment is configured with several parameters, and the values that give the best accuracy are found experimentally. The first stage applies a classical neural network (NN), with optimization carried out using only information gain (IG). The results are shown in Table 1 and Table 2; in this experiment the cross-validation parameter is fold=10. Table 1 and Table 2 show that the accuracy produced is in the range of 74% to 76%. The model with the highest accuracy, 76.15%, uses training cycle=700 and momentum=0.9 with stratified sampling, as shown in Table 1. This accuracy is the same as that of the NN+IG model with learning rate=0.04, momentum=0.9, and training cycle=900 using stratified sampling, as shown in Table 2. The NN+IG experiments using fold=10 are therefore not yet satisfactory: the accuracy is still below 80%, which falls short of expectations, so further improvement is needed. The next experiment applied NN and IG using cross-validation with fold=5 in order to find out the effect of changing the fold parameter. Broadly speaking, this model produces the accuracy values shown in Table 3 and Figure 2.

With fold=5, the accuracy is almost the same as in the previous experiments using fold=10, at around 76%. This similarity suggests that optimizing a classical neural network with information gain alone is not enough, so other optimization methods need to be pursued.

Feature weights in neural network
To increase the accuracy of the NN model, it is optimized using feature weights. At this stage, the NN is optimized using the particle swarm optimization (PSO) and genetic algorithm (GA) methods. The performance of the experiments at this stage is shown in Table 4. In Table 4, the parameters specified for NN are learning rate=0.01, momentum=3, and training cycle=900. Data validation used cross-validation with fold=10 and stratified sampling. In this experiment, the highest accuracy is 77.44%, obtained with the GA parameters population=10, selection scheme=tournament, and crossover type=uniform, with a processing time of 31.06 minutes. The results in Table 4 are on average still below 80%, so further efforts are needed to increase the accuracy. In addition, the best model produced is also the one that takes the longest to find.
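The two GA settings named above, tournament selection and uniform crossover, can be sketched in Python as follows. The tournament size, the 0.5 gene-swap probability, and the toy population are assumptions for illustration; RapidMiner's own operator parameters are not reproduced here.

```python
import random


def tournament_select(population, fitness, rng, size=2):
    """Pick `size` random individuals and return the fittest of them."""
    contenders = rng.sample(range(len(population)), size)
    winner = max(contenders, key=lambda i: fitness[i])
    return population[winner]


def uniform_crossover(parent_a, parent_b, rng):
    """Build a child by taking each gene from either parent with equal chance."""
    return [a if rng.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]


rng = random.Random(0)
# Hypothetical population of feature-weight vectors with their accuracies.
population = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
fitness = [0.74, 0.77]

parent = tournament_select(population, fitness, rng, size=2)
child = uniform_crossover(population[0], population[1], rng)
```

Here each individual is a vector of feature weights and its fitness is the accuracy of the classifier trained with those weights; repeating selection, crossover, and mutation over generations is what makes larger populations slower to evaluate.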
The next experiment optimizes the model using PSO, which is applied to optimize the weight values of the neural network algorithm. The NN parameters used with PSO are learning rate=0.01, momentum=0.3, and training cycle=700. The accuracy of the NN+IG+PSO model is shown in Table 5; the highest accuracy is 76.92%. The feature-weight-optimized models are evaluated by comparing them on the highest accuracy each one produces. Table 6 and Figure 3 show the performance differences of all optimized models and which of them reaches the highest accuracy.

Naïve Bayes experiment
The next research stage applies the naïve Bayes (NB) method to the sentiment review model for coastal assessment. On its own, the NB model produces a sentiment classification accuracy that is still not optimal and not as desired. The feature weights method is therefore applied to the NB model using the PSO and GA algorithms. The experimental results for the NB model based on feature weights (FS) are shown in Table 7 and Table 8. Table 7 shows the optimization using the genetic algorithm (GA) with population=5. The highest accuracy obtained is 86.61%, with the GA parameters selection scheme=tournament and crossover type=uniform, and with model validation using fold=9 and sampling type=linear. A comparable result was obtained by model number 7, with an accuracy of 86.81% using fold=9 and sampling=linear.
A slightly different model, in which PSO is assigned the parameters population=5, fold=9, and sampling=linear, gives a highest accuracy of 85.08%. The experiments shown in Table 9 set the PSO population parameter to 10 and reach an accuracy of 87.11%, slightly higher than the previous model in Table 8. These results show that the value of the population parameter affects the accuracy of the model.

Evaluation model and recommendation
The focus of this article is the search for the model that classifies sentiment reviews of the coast of the southern region of Java, Indonesia, with the best level of accuracy. Various experiments were conducted to find the best model, including optimizing the model using feature weights (FS). FS methods such as genetic algorithms and particle swarm optimization are applied to the main algorithms, namely neural networks and naïve Bayes. Based on the results obtained, the model with the highest accuracy reaches 87.11%, and we named it the NB_IG+PSO model.
The best model in Figure 4, which produces the highest classification accuracy, uses the naïve Bayes method based on information gain (IG) with feature weights optimized by the PSO algorithm. Among the models that use NN as the main method, the results are not much different: the accuracy differs by only 0.5%. The differences arise because, apart from the different methods used, the parameter values set for each method are also slightly different.
The experiments proposed and carried out in this article have not yet produced a perfect model, because several other improvements can still be made, especially at the data preprocessing stage. The step that has not been carried out optimally in this research is stemming: because the data are text in Indonesian, the method differs slightly from existing methods designed for English. This remains a challenge for future research, since more in-depth work is needed to obtain a model with a better level of accuracy.

CONCLUSION
A sentiment review model for assessing the coast of the southern region of Java, Indonesia, has been produced using a hybrid naïve Bayes method based on information gain and particle swarm optimization, which we call the NB_IG+PSO model. The best model proposed in this article reaches an accuracy of 87.11%. The model still has deficiencies in the process of finding the best parameter values, so further efforts are needed to determine more precise values. In addition, other methods need to be tested to find out whether the proposed model can be optimized further and whether accuracy increases. Methods that can be applied in future research include support vector machines (SVM) and k-NN, as well as other methods suited to the data type of the dataset, since several methods do not match the characteristics of the data used.

Table 1.
The accuracy of neural network with IG using momentum parameter 0.9

Table 3.
The accuracy of neural network with IG using momentum parameter 0.9 with fold=5
Figure 3 shows the best model of NN+IG, obtained using momentum=0.3 and training cycle=900, with the best accuracy of 76.41%. This experiment uses stratified sampling for data validation. The results of the experiments using fold=5 are almost the same as those obtained using fold=10.
Sentiment review of coastal assessment using neural network and naïve Bayes (Oman Somantri)

Table 4.
Performance of the NN model using genetic algorithm

Table 5.
NN optimization model performance using PSO

Table 7.
Results of model experiments using GA-based NB

Table 9.
Performance of model NN_IG+PSO by using population=10