Emoji’s sentiment score estimation using convolutional neural network with multi-scale emoji images

ABSTRACT


INTRODUCTION
Social media has become part of everyday life, allowing people to freely express their feelings, such as opinions, emotions, and shared experiences. The number of social media users has grown significantly because social media serves as a new communication channel for exchanging information. Posts can be either private or public depending on the user's preference and come in many different formats, such as text, images, voice, and video. Social media content can be user-generated content (UGC) or multi-format content, such as text with images, text with emojis, images with emojis, and text with both images and emojis. A single post can therefore carry a great deal of information, which makes it difficult to analyze the feeling of the person who posted it using traditional methods. Many researchers are interested in analyzing the sentiment of such content and classifying it as positive, neutral, or negative. Emojis are graphical symbols that are widely used to represent emotion in social media alongside text and images. Most existing emojis have been ranked and assigned sentiment scores; however, many newly released emojis have not been ranked yet. Several existing methods rank an emoji based on the entire content of a post or on the name of the emoji; thus, if the content or the name does not contain any word that conveys emotion, the sentiment of that emoji is classified as neutral, which is not always true. It would be better if a new emoji could be classified as one of the ranked emojis so that the sentiment score of the ranked emoji could be used to estimate the sentiment score of the new emoji. This estimated score can then be used for sentiment analysis.
Research on sentiment analysis has focused on developing new methods for analyzing sentiment from emojis, and several methods that enhance sentiment analysis using emojis have been proposed. Alzubaidi et al. [1] proposed sentiment analysis on social media posts using machine learning algorithms with emojis to classify emotions into neutral and negative categories. Zhao et al. [2] developed a machine learning-based approach to perform sentiment analysis from emojis for different social platforms. Novak et al. [3] investigated the sentiment of emojis and analyzed their use in different contexts and cultures. Zhang et al. [4] used deep learning for sentiment analysis on large-scale datasets and achieved state-of-the-art results. Eisner et al. [5] proposed Emoji2vec, a method for learning emoji representations from their descriptions, which could potentially improve sentiment analysis of new emojis. Additionally, Sun et al. [6] developed a sentiment analysis method that incorporated both emoji and text information for more accurate results. Barbieri et al. [7] also introduced a multi-modal sentiment analysis method that combined visual and textual features of emojis to improve the performance of sentiment analysis. Their research offered insights into the use of emojis for sentiment analysis and provided potential solutions for analyzing sentiment in multi-format content. A novel approach called the Chinese emoji-embedding long short-term memory (CEmo-LSTM) model, based on bi-directional LSTM (Bi-LSTM), integrated emojis into sentiment analysis algorithms and achieved high accuracy in analyzing online Chinese texts, particularly during the coronavirus disease 2019 (COVID-19) pandemic. By combining emojis and emoji usage with the sentiment analysis model, the model can effectively handle emotion mining tasks, providing a promising way to enhance sentiment analysis in micro-texts and online conversations [8]. Chauhan et al. [9] presented a deep learning-based framework for detecting sarcasm in text, which incorporated emojis as an important feature. The multi-task framework treated sarcasm detection as the primary task while considering emotion and sentiment analysis as auxiliary tasks. The SEEmoji MUStARD dataset was used for evaluation, and the proposed framework achieved state-of-the-art performance, outperforming existing methods in terms of accuracy and precision in sarcasm detection. Lou et al. [10] introduced the EA-Bi-LSTM model, which incorporated emojis into sentiment analysis for Chinese microblog posts. The model employed an attention mechanism to weigh words based on associated emojis, which can effectively capture the influence of emojis on sentiment polarity. The model's effectiveness was compared with baseline models, and the results highlighted potential applications in improving sentiment analysis accuracy.
In addition, new methods for evaluating emoji sentiment lexica and emoji resources have been presented. Milagros et al. [11] presented a gold-standard dataset of manually annotated English Twitter data that can be used for evaluating emoji sentiment lexica generated from various online resources. The proposed unsupervised methodology involved correlation analysis between sentiment scores from different resources and the gold-standard dataset. The results provide insights into the quality of emoji sentiment lexica and the potential for improving emoji-based sentiment analysis. Eunhee et al. [12] explored the impact of emojis on user engagement in brand-related UGC on Instagram by analyzing a dataset of 1,000 UGC posts. Factors such as the presence, number, and type of emojis used, as well as the valence of the accompanying text, were examined. It was found that emojis have a positive influence on consumer engagement, with emotional emojis being particularly effective. However, excessive use of emojis may not bring additional benefits and could overwhelm the audience.
Other methods for understanding emojis in communication and language processing have been proposed as well. Allan and Budd [13] investigated how alexithymia affects verbal and non-verbal expressivity in text messaging, focusing on the use of words and emojis. Their research suggested that emojis can compensate for verbal deficits in individuals with alexithymia, potentially helping them overcome communication challenges in computer-mediated settings. Scheffler et al. [14] examined the processing of emojis in sentence comprehension and their ability to replace words without affecting comprehension. Participants' reading times were measured during self-paced reading experiments, and their research demonstrated that emojis can be easily integrated into sentence interpretation, though certain types of emojis may exhibit visual ambiguity. Pfeifer et al. [15] investigated the influence of facial emojis on the emotional significance of text messages and on subsequent text processing. They concluded that facial emojis can significantly impact perceived sender emotion and facilitate downstream text processing. However, the subjective nature of facial emojis may lead to varied interpretations of emotional meaning.
There are also other research topics on emojis, such as emoji prediction and sense disambiguation. Shardlow et al. [16] developed a semantic network for emojis to examine their multiple meanings, or polysemous nature, by analyzing a corpus of tweets and using a deep learning-based disambiguation algorithm. The proposed network identified separable information among different classes of emoji senses, paving the way for improving emoji prediction and disambiguation in natural language processing systems.
Lee et al. [17] introduced a deep neural network model called MultiEmo that combined the emoji prediction and emotion detection tasks. The model was evaluated using two datasets: a Twitter dataset for emoji prediction and the GoEmotions dataset for emotion detection. MultiEmo achieved superior performance compared with existing models, demonstrating its effectiveness in predicting emoji and emotion labels. The evaluation results provide insights into the relationship between emojis and emotions, offering interpretability for the predictions and shedding light on the decision-making process of the model. Kaye et al. [18] investigated the impact of emojis on word processing and recall. Through self-paced reading experiments, they examined whether emojis can replace words without affecting sentence comprehension and whether they facilitate the retrieval of complete lexical entries. The findings indicate that emojis can be integrated into sentence interpretation, but their use may exhibit visual ambiguity. Additionally, familiarity with emojis may influence processing efficiency.
Currently, the sentiment of an emoji is typically analyzed from the name of the emoji or from the content of the post; however, this analysis might not be appropriate if the name of the emoji does not convey any emotion or if the content contains a variety of emotions. An alternative method proposed in this research is to estimate the sentiment score of an emoji from its image. The main ideas are to classify a new emoji as belonging to the class of the most similar ranked emoji image and to estimate its sentiment score from the classification results. Several methods have been proposed for image classification, and the convolutional neural network (CNN) is among the most widely used neural networks for image classification due to its ability to automatically extract features from images. The performance of CNNs has been evaluated on various image classification tasks on the Modified National Institute of Standards and Technology (MNIST) and Canadian Institute for Advanced Research CIFAR-10 datasets [19]-[21]. Additionally, comparisons between different CNN models, including AlexNet, GoogLeNet, and ResNet50, have been discussed with test results on the ImageNet, CIFAR-10, and CIFAR-100 datasets [22], [23]. The use of CNNs for medical image classification, such as diabetic retinopathy analysis and wound image classification, has been explored in several studies [24]-[26]. Moreover, data augmentation techniques used to increase and balance the size of a dataset, including transfer learning, web data augmentation, and different image transformation methods, have also been proposed [27], [28].
Burnik and Kneževic [29] utilized a CNN to classify emoticons with high accuracy despite a small dataset. Elastic transformations were employed to artificially expand the dataset, and background noise was added to increase dataset variance and accuracy. They also suggested that further improvements in CNN models could be achieved by collecting more data and exploring the use of generative adversarial networks for generating emoticons. Akter et al. [30] presented a system for detecting and categorizing hand-drawn emojis into eight distinct classes using a CNN model. A local dataset of 4,000 images was generated, and an accuracy of 97% was achieved. Their research provided valuable insights into image recognition and classification for hand-drawn emojis and has potential applications in social media platforms.
Noord et al. [31] proposed a method for enhancing image classification performance by utilizing both scale-variant and scale-invariant features with multi-scale images. Their approach was evaluated on a dataset of digital photographic reproductions of printed artworks, where the multi-scale convolutional neural network outperformed the single-scale CNN in terms of classification accuracy. They also suggested incorporating scale-invariant and scale-variant representations in CNNs to improve image recognition performance. Liu et al. [32] found that deep CNNs applied to small datasets such as CIFAR-10 can lead to overfitting. They proposed a modified visual geometry group (VGG) VGG-16 network that could prevent overfitting and achieved an error rate of 8.45%, and they suggested that deep convolutional neural networks (D-CNNs) can be used for small datasets with appropriate modifications. He et al. [33] examined training procedure refinements for image classification models, used an ablation study to evaluate the impact of these refinements, and concluded that combining them could improve the performance of various CNN models. Blot et al. [34] proposed a modification to the standard convolutional block of a CNN to transfer more information layer by layer while maintaining invariance, achieved by using a MaxMin strategy with a modified activation function before pooling to exploit positive and negative high scores in convolution maps. The modified CNN outperformed the standard CNN on two classic datasets, MNIST and CIFAR-10.
According to the existing literature, the sentiment score of an emoji has never been estimated from an emoji image. This paper therefore proposes a new method for estimating the sentiment score of an unranked emoji from multi-scale images using a convolutional neural network together with the proposed majority voting with probability (MVP) algorithm. An unranked emoji is classified by the CNN at different scales, and the sentiment score obtained from the classification at each scale is then used to estimate the sentiment score of the unranked emoji by the proposed MVP algorithm. The remainder of this paper is organized as follows. Section 2 presents an overview of the proposed method. Results and discussion are presented in Section 3. Section 4 draws conclusions and presents future work.

METHOD
The sentiment score of an unranked emoji is commonly estimated by classifying it as the most similar ranked emoji based on its name or Unicode and using the sentiment score of that ranked emoji. However, this classification can be imprecise because some emoji names or Unicodes do not express any emotion, and different names are sometimes used for emojis that express the same emotion. This paper proposes a new method that overcomes these problems by using a CNN and the MVP algorithm to classify an unranked emoji and estimate its sentiment score. In order to classify emojis from their images, emoji images were extracted from two sources [3], [35] before being passed through the proposed method. Figure 1 shows the proposed method, which is divided into three steps: data collection and pre-processing, emoji image classification, and sentiment score estimation.

Data collection
Data collection was performed by extracting data from two different sources: the Unicode consortium website [35] and the emoji sentiment ranking [3]. Emoji images were extracted from the Unicode consortium website using Python and the Beautiful Soup library [35] and were stored in .csv format. The emoji images are from version 13.1, which contains approximately 1,816 classes of emojis, and each class contains seven emoji images from seven platforms: Windows, Facebook, Apple, Twitter, Google, JoyPixels, and Samsung. For example, two emojis from the same class may come from different platforms, such as one image from the iOS platform and another from the Windows platform. Sample emoji images from different platforms are shown in Figure 2. These emoji images were used for emoji classification. On the other hand, the sentiment scores of the ranked emojis were obtained from the emoji sentiment ranking [3], which contains the sentiment scores and polarities of 752 emojis, as shown in Figure 3.
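The following minimal Python sketch illustrates how such a collection step could be implemented with requests and Beautiful Soup. The URL, the table cell classes, and the assumption that the page embeds emoji images as base64 data URIs are assumptions about the page layout made for illustration; this is not the exact script used in this work.

```python
# Hedged sketch: scraping emoji images from the Unicode full emoji list.
# Page URL and HTML structure are assumptions, not the authors' script.
import base64
import requests
from bs4 import BeautifulSoup

URL = "https://unicode.org/emoji/charts/full-emoji-list.html"  # assumed source page

html = requests.get(URL, timeout=60).text
soup = BeautifulSoup(html, "html.parser")

records = []
for row in soup.find_all("tr"):
    name_cell = row.find("td", class_="name")   # assumed CSS class for the CLDR short name
    images = row.find_all("img")                # one image per platform, if present
    if name_cell is None or not images:
        continue
    for img in images:
        src = img.get("src", "")
        if src.startswith("data:image/png;base64,"):  # images assumed embedded as data URIs
            png_bytes = base64.b64decode(src.split(",", 1)[1])
            records.append({"name": name_cell.get_text(strip=True), "png": png_bytes})

print(f"Collected {len(records)} emoji images")
```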

Emoji classification and sentiment score estimation
Emoji classification and sentiment score estimation were performed in two steps: emoji image classification and sentiment score estimation. In the first step, an unranked emoji was classified using a convolutional neural network with multi-scale images, which can effectively classify emojis based on their visual features. In the second step, sentiment score estimation was performed using the proposed MVP algorithm, which yields better estimates, especially when an emoji is classified as belonging to multiple classes.

Emoji classification using convolutional neural network
The main advantages of a CNN are that its architecture can be modified to suit a particular problem [36]-[41] and that it can automatically extract features from images. In this paper, a CNN was used to classify unranked emojis as one of the most similar ranked emojis based on emoji images instead of their names or Unicodes. The CNN architecture used for emoji classification consisted of one input layer, three convolutional layers with max pooling operations, followed by two fully connected layers and one output layer. The first and second convolutional layers consisted of 256 filters of size 7x7 pixels, the last convolutional layer consisted of 128 filters of size 3x3 pixels, and all max pooling operations used a 2x2 window with stride 2. For the fully connected layers, there were two dense layers with 4,096 nodes and 613 nodes, respectively.
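The architecture can be summarized by the Keras sketch below. The 72x72x3 input size, the padding, the rectified linear unit activations, and the treatment of the 613-node dense layer as the softmax output are assumptions read from the description above; this is a minimal sketch, not the authors' released implementation.

```python
# Minimal Keras sketch of the emoji-classification CNN described in the text.
from tensorflow import keras
from tensorflow.keras import layers

NUM_CLASSES = 613  # number of ranked emoji classes

def build_emoji_cnn(input_shape=(72, 72, 3)):
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        # First convolutional layer: 256 filters of size 7x7, followed by 2x2 max pooling
        layers.Conv2D(256, (7, 7), activation="relu", padding="same"),
        layers.MaxPooling2D(pool_size=(2, 2), strides=2),
        # Second convolutional layer: 256 filters of size 7x7, followed by 2x2 max pooling
        layers.Conv2D(256, (7, 7), activation="relu", padding="same"),
        layers.MaxPooling2D(pool_size=(2, 2), strides=2),
        # Third convolutional layer: 128 filters of size 3x3, followed by 2x2 max pooling
        layers.Conv2D(128, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D(pool_size=(2, 2), strides=2),
        layers.Flatten(),
        # Fully connected layers: 4,096 nodes and 613 nodes (softmax output)
        layers.Dense(4096, activation="relu"),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    return model
```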
The CNN was trained with a batch size of 16 for 70 epochs with early stopping to prevent overfitting, using categorical cross-entropy as the loss function, the Adam optimization algorithm with a learning rate of 0.0001, and softmax as the activation function of the output layer, as shown in Figure 6. The classification of each unranked emoji was repeated seven times with the seven multi-scale datasets: three times for the datasets of the individual scales, three times for the three datasets obtained from combinations of two scales, and once for the dataset obtained from the combination of all three scales. The classification in each iteration returned the class of the most similar emoji together with the probability of belonging to that class, which were used to estimate the sentiment score of the unranked emoji in the next step.
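A corresponding training configuration, reusing build_emoji_cnn from the sketch above, is shown below. Here x_train and y_train stand for one of the seven multi-scale training sets, and the early-stopping patience and validation split are assumed values that the paper does not state.

```python
# Sketch of the stated training configuration: batch size 16, up to 70 epochs with
# early stopping, categorical cross-entropy loss, Adam with learning rate 0.0001.
from tensorflow import keras

model = build_emoji_cnn()
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-4),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True  # assumed settings
)

# x_train / y_train: images and one-hot labels from one multi-scale training set (S1-S7)
history = model.fit(
    x_train, y_train,
    validation_split=0.1,   # assumed; the paper does not state a validation split
    batch_size=16,
    epochs=70,
    callbacks=[early_stop],
)
```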

Sentiment score estimation using the majority voting with probability algorithm
After obtaining the classification results from the seven multi-scale datasets, the sentiment scores of the classes assigned to an unranked emoji at the seven scales were used to improve the estimation of its sentiment score by the proposed MVP algorithm. The proposed MVP algorithm estimates the sentiment score of an unranked emoji in two steps. The first step assigns the sentiment score of the class that receives the majority vote among the seven classification results; that is, the sentiment score of the class returned as the output with the highest frequency is assigned as the sentiment score of the unranked emoji. However, there may be cases with no majority, i.e., the frequencies of two or more classes are equal. This problem is solved in the second step by returning the sentiment score of the class with the highest class probability as the sentiment score of the unranked emoji. The proposed MVP algorithm is shown in the Algorithm, where FreqClass is the frequency of the classification results.
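A minimal Python sketch of the MVP procedure as described above is given below. The variable names, the placeholder sentiment scores in the usage example, and the tie-breaking rule that restricts the probability comparison to the tied classes are one reading of the description, not the authors' exact code.

```python
# Sketch of the majority voting with probability (MVP) algorithm.
from collections import Counter

def mvp_sentiment(predictions, sentiment_scores):
    """predictions: list of (class_label, probability) from the 7 multi-scale classifiers.
    sentiment_scores: dict mapping a ranked class label to its sentiment score."""
    labels = [label for label, _ in predictions]
    freq = Counter(labels)                         # FreqClass in the algorithm
    top_count = max(freq.values())
    majority = [label for label, c in freq.items() if c == top_count]

    if len(majority) == 1:
        # Step 1: a single class wins the majority vote
        chosen = majority[0]
    else:
        # Step 2: tie between two or more classes; fall back to the prediction
        # with the highest class probability among the tied classes
        chosen = max(
            (p for p in predictions if p[0] in majority),
            key=lambda p: p[1],
        )[0]
    return sentiment_scores[chosen]

# Usage example with illustrative (not real) sentiment scores:
preds = [("face_with_tears_of_joy", 0.91)] * 5 + [("grinning_face", 0.55), ("smiling_face", 0.60)]
scores = {"face_with_tears_of_joy": 0.20, "grinning_face": 0.55, "smiling_face": 0.65}
print(mvp_sentiment(preds, scores))  # five of seven votes pick face_with_tears_of_joy
```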

RESULTS AND DISCUSSION
The proposed method was evaluated on seven datasets of multi-scale emoji images with seven different training sets of multi-scale images. The test images were divided into two sets: the first set consisted of ranked emoji images and the second set consisted of unranked emoji images. The first set was used to evaluate the performance of the proposed method, which was then used to estimate the sentiment scores of the unranked emojis in the second set. The first test set consisted of 613 emoji images that were randomly extracted from each class of the training set. The classification of the test images in the first set was performed seven times using the same test images and the same CNN model with the seven different training sets mentioned earlier. Thus, each iteration produced seven classification results with sentiment scores, and the final sentiment score was obtained by applying the proposed MVP algorithm to the sentiment scores obtained from the seven classifications with multi-scale training sets.
Figures 7 to 10 show examples of possible classification results, where the first column shows the test emoji image, the second column shows the classification results, the third column shows the Unicode common locale data repository (CLDR) short name of the class, and the last column shows the multi-scale classification results. Figure 7 is an example of an emoji that was classified correctly by the proposed method on all seven datasets with 100% accuracy. Figure 8 shows an example of an emoji that was classified as belonging to three different classes. In this case, the proposed MVP method was applied to determine the result; i.e., the emoji was finally classified as belonging to the face with tears of joy class, which had the highest probability and the majority vote of 5 out of 7. Figure 9 shows another example of a test image that is similar to training images from many different classes, but the proposed MVP method correctly classified it as belonging to the cat with wry smile class based on its highest probability. The sentiment score of that class was returned as the sentiment score of the test image.
However, there are some cases in which the proposed method gave wrong results, as shown in Figure 10. In Figure 10, the test image was classified as belonging to the angry face class when it actually belongs to the pouting face class. The classification result was incorrect because the dominant features of the test image, which are the eyes and the mouth, are very similar to those of the training image in the angry face class. The classifications and estimations with the seven multi-scale datasets were repeated nine times with different test sets. Table 1 shows the accuracies obtained with each training set, where the first seven columns show the accuracies of the classifications and estimations without the MVP algorithm and the last column shows the accuracies obtained by applying the MVP algorithm to the classification results of the seven multi-scale datasets. It can be noticed that the proposed MVP algorithm yields higher accuracies for all iterations, and the overall accuracy increases to 98.40%. The second test set consisted of 42 unranked emoji images that did not belong to any class of the training sets. The images in the second test set were classified using the same CNN architecture as the first test set, and the classification and estimation were performed in exactly the same way. However, it is not possible to measure the actual accuracy because there is no ground truth for the second test set; therefore, the accuracy of the proposed method was estimated by visual assessment. Figure 11 shows an example of an emoji that was classified into the same class by all seven multi-scale classifications, and the sentiment score of the face with tears of joy class was assigned as the sentiment score of the test image. However, there are some cases in which the classification results do not all agree. Figure 12 shows an example of an emoji that was classified as belonging to three different classes, but the proposed MVP method classified it as belonging to the neutral face class based on the highest probability and the majority vote, which seems reasonable. Figure 13 shows an example of an emoji that was classified as belonging to the knocked-out face class with the highest probability and a majority vote of 3 out of 7. This might be because both the test image and the knocked-out face training image have eyebrows and have eyes that differ from those of the other training images. Figure 14 is another good example of how the proposed MVP algorithm can classify an unranked emoji and estimate its sentiment score when the CNN assigns it to completely different classes across the seven multi-scale training datasets; the final result was obtained from the probability because there was no majority vote. Table 2 shows the classification and estimation results that were evaluated by visual assessment. The overall accuracy for each training set was calculated as the average of the accuracies obtained from repeating the experiments nine times. The features used for visual assessment include physical appearance, emotion, and color. It can be noticed that the classification results obtained with the proposed MVP method are about the same for all experiments, whereas the classification results obtained with the CNN alone, without the proposed MVP algorithm, depend on the training data.

CONCLUSION
Nowadays, social media has become an important communication channel and part of our daily lives, where people express their feelings and opinions or share experiences. Various forms of content are posted on social media, such as text, images, video, and emojis. This paper proposes a new method that can estimate the sentiment score of any new emoji by classifying it from its multi-scale images using a convolutional neural network and estimating the sentiment score with the proposed MVP algorithm. The CNN is used to solve the classification problem when the name of an emoji does not indicate any emotion, whereas the MVP algorithm is used to improve the classification performance when the test emoji image is classified as belonging to more than one class. The performance of the proposed method was evaluated on two test sets. The first test set contained 613 ranked emojis that were extracted from the training set, and the second test set contained 42 new or unranked emojis. For the first test set, the classification results of the proposed method yield higher accuracies for all seven multi-scale datasets, and the overall accuracy increases to 98.40%. For the second test set, the classification results had to be evaluated visually; the results are acceptable, although some of them are difficult to judge. For example, the classification result for a new emoji image whose features differ greatly from the training images is not easy to judge because it depends on individual assessment, and some emojis express ambiguous emotions that are very subjective. However, the overall accuracy for the second test set based on visual evaluation is approximately 51.06%.
In the future, it would be valuable to enhance classification accuracy by refining the proposed algorithm to better differentiate between visually similar emojis. Additionally, more effort could be devoted to proposing new features for classifying new emoji images and a new method for optimally estimating sentiment scores. This could lead to more robust emoji classification and sentiment score estimation.

Figure 1. Flowchart of the proposed emoji classification and sentiment score estimation method

Figure 6. The CNN architecture for emoji classification

The first three training sets (S1, S2, and S3) each consisted of 613 classes of emoji images with 127 images per class, for a total of 77,851 images, at 72x72x3 pixels (the original size) for S1, 55x55x3 pixels for S2, and 35x35x3 pixels for S3. The fourth, fifth, and sixth training sets (S4, S5, and S6) combined two scales each (72x72x3 with 55x55x3, 72x72x3 with 35x35x3, and 55x55x3 with 35x35x3, respectively), giving 254 images per class for a total of 155,702 images. The last training set (S7) combined all three scales, giving 381 images per class for a total of 233,553 images.
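A sketch of how the seven training sets could be assembled with Pillow and NumPy is given below; the file layout, the normalization, and the resizing step are assumptions. Note that when sets combining two or three scales are fed to a CNN with a fixed input size, the images would additionally need to be brought to a common input resolution, a detail not specified above.

```python
# Sketch: building the seven multi-scale training sets S1-S7 described in the text.
import numpy as np
from PIL import Image

SCALES = {"s72": (72, 72), "s55": (55, 55), "s35": (35, 35)}

def load_scaled(path):
    """Return the same emoji image at the three scales as normalized float32 arrays."""
    img = Image.open(path).convert("RGB")
    return {k: np.asarray(img.resize(size), dtype=np.float32) / 255.0
            for k, size in SCALES.items()}

def build_training_sets(image_paths_per_class):
    """image_paths_per_class: dict mapping class label -> list of image file paths."""
    combos = {
        "S1": ["s72"], "S2": ["s55"], "S3": ["s35"],
        "S4": ["s72", "s55"], "S5": ["s72", "s35"],
        "S6": ["s55", "s35"], "S7": ["s72", "s55", "s35"],
    }
    sets = {name: ([], []) for name in combos}          # (images, labels) per training set
    for label, paths in image_paths_per_class.items():
        for path in paths:
            scaled = load_scaled(path)
            for name, keys in combos.items():
                for k in keys:
                    sets[name][0].append(scaled[k])
                    sets[name][1].append(label)
    return sets
```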

Figure 7. An example of an emoji image that was classified correctly from all scales

Figure 11. An example of an unranked emoji image that was classified into the same class for all scales

Figure 12. An example of an unranked emoji image that was classified by using majority vote and probability from the proposed MVP algorithm

Table 1. The classification and estimation accuracies from using the ranked emojis as test data

Table 2. The classification and estimation results from using unranked emojis as test data