Personal customized recommendation system reflecting purchase criteria and product reviews sentiment analysis

Received Jul 31, 2020; Revised Sep 27, 2020; Accepted Oct 13, 2020

As the e-commerce market grows, its effects are appearing throughout society. Companies' business environments are shifting from a product-centered to a user-centered model, and many have introduced recommendation systems. However, existing research is limited in deriving customized recommendations that reflect the detailed criteria users consider when purchasing a product. The proposed system therefore reflects each user's subjective purchasing criteria in the recommendation algorithm and performs sentiment analysis on product review data. Finally, the sentiment scores are weighted according to the priority of the user's purchase criteria, and the resulting recommendations are presented to the user.


INTRODUCTION
Recently, consumers' purchasing behavior has been gradually shifting from offline store visits to online purchases. Data published by the Bank of Korea demonstrate this change: consumption in the online sector has outpaced consumption in the offline sector [1]. In the past, buying products online was considered disadvantageous because consumers could not see or touch the products. Recently, however, consumers have been overcoming this shortcoming by consulting other consumers' product reviews when making purchases. Indeed, research has shown that product reviews offer significant benefits to consumers [2][3][4][5].
In addition, many services have been introduced to make online shopping more convenient, and one of them is the recommendation system. A recommendation system helps users choose the products that suit them from among tens of thousands of items, saving them time and effort, and research on recommendation systems is therefore being actively conducted. A widely used technique for building recommendation systems is collaborative filtering, which recommends items that a target user is expected to find interesting based on the preference information of other users. Because it relies on other users' preferences, it does not require extensive data about users and items, and it is free from problems caused by inaccurate user profiles or item characteristic information [6][7][8].
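To make the idea concrete, a minimal user-based collaborative filtering sketch follows. It is a generic illustration of the technique described above, not part of the proposed system; the toy rating data, the cosine similarity measure, and the function names are assumptions chosen for brevity.

```python
from __future__ import annotations

import math

# Hypothetical user -> {item: rating} data; ratings assumed positive (e.g. 1-5).
Ratings = dict[str, dict[str, float]]


def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity computed over the items both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict(ratings: Ratings, user: str, item: str) -> float | None:
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine_similarity(ratings[user], their)
        if sim > 0:
            num += sim * their[item]
            den += sim
    return num / den if den else None


def recommend(ratings: Ratings, user: str, k: int = 3) -> list[str]:
    """Rank the items `user` has not rated yet by their predicted rating."""
    unseen = {i for r in ratings.values() for i in r} - set(ratings[user])
    scored = [(p, i) for i in unseen if (p := predict(ratings, user, i)) is not None]
    return [i for _, i in sorted(scored, reverse=True)[:k]]
```

The prediction for a user uses only other users' ratings, which is why collaborative filtering needs little item metadata; it is also why the approach depends entirely on how well those ratings capture preference.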
Existing collaborative filtering recommendation systems have relied heavily on ratings, which are quantitative and easy to process. However, as questions have been raised about whether ratings accurately reflect users' preferences, attention has turned to recommendation systems that use product reviews written by users. Indeed, research has shown that users' text reviews play an important role in building recommendation systems [9][10][11][12][13][14]. Accordingly, research on analyzing product review data and reflecting it in recommendation systems is being actively conducted [15][16][17][18][19][20], and recent studies have combined sentiment analysis of product reviews with recommendation systems [21][22][23][24][25].
However, existing online shopping malls show the same product list to every consumer, despite differences in taste and purchasing criteria. Moreover, existing recommendation systems are limited in providing personalized recommendations because they do not capture a consumer's specific consumption propensity or reasons for purchase. In this study, therefore, users select the product attributes that matter most to them when purchasing and rank them by priority. The proposed system then produces a personalized product recommendation list by analyzing product review data and applying weights according to the priorities selected by the user.

SYSTEM ARCHITECTURE
Architecture design
The proposed system collects data by crawling information about the products the user wants to purchase, together with the reviews of those products. The collected review data are pre-processed and filtered for the subsequent sentiment analysis. The pre-processed reviews are stored in the "crawl_review" table, and product information, namely the product name, the purchase page URL, and the number of reviews registered for the product, is stored in the "crawl_product" table. Sentiment analysis is then used to quantify the emotions users express in the reviews, which are qualitative data. Sentiment analysis requires a sentiment dictionary. Existing sentiment dictionaries are general-purpose rather than divided by domain or product group, which limits how well they analyze sentiment about specific products. In this study, therefore, sentiment analysis is conducted with a sentiment dictionary constructed for each product attribute.
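The two tables can be sketched as follows. The paper states only which information each table holds; the column names and types below are assumptions, and Python's built-in sqlite3 module stands in for MySQL so that the example runs without a database server.

```python
import sqlite3

# Hypothetical schema: the text only says which information each table stores.
SCHEMA = """
CREATE TABLE crawl_product (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL,
    purchase_url TEXT NOT NULL,
    review_count INTEGER NOT NULL  -- number of reviews registered for the product
);
CREATE TABLE crawl_review (
    review_id   INTEGER PRIMARY KEY,
    product_id  INTEGER NOT NULL REFERENCES crawl_product(product_id),
    review_text TEXT NOT NULL      -- pre-processed review text
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database (in memory by default) and create both tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def store_product(conn: sqlite3.Connection, name: str, url: str,
                  review_count: int, reviews: list[str]) -> int:
    """Insert one product and its pre-processed reviews; return the product id."""
    cur = conn.execute(
        "INSERT INTO crawl_product (product_name, purchase_url, review_count) VALUES (?, ?, ?)",
        (name, url, review_count),
    )
    product_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO crawl_review (product_id, review_text) VALUES (?, ?)",
        [(product_id, text) for text in reviews],
    )
    conn.commit()
    return product_id
```

Keeping reviews in a separate table keyed by product id lets later steps query all reviews of one product without duplicating its name and URL on every row.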
The review data for the product attributes selected by the user are labeled positive (1) or negative (0), and the final sentiment score is derived by applying different weights according to the user's priorities. The final recommendation list consists of the top 10 products in descending order of final sentiment score. By analyzing review data with attribute-specific sentiment dictionaries and applying the priority of the user's purchase criteria to the sentiment analysis, the system provides a customized recommendation service that reflects the user's purchasing tastes and preferences. Figure 1 shows the architecture of the proposed system.
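The scoring-and-ranking step can be sketched as below. Each analyzed review sentence is assumed to yield a (product, attribute, label) triple with label 1 for positive and 0 for negative. Averaging the labels into a per-attribute score, treating a missing attribute as 0, the function names, and the toy data are assumptions, since the text does not specify how labels are aggregated. The priority weights are taken as given here; the paper derives them with the rank order centroid method described in the dictionary section.

```python
import heapq
from collections import defaultdict
from typing import Iterable

Label = tuple[str, str, int]  # (product, attribute, 1 = positive / 0 = negative)


def attribute_scores(labels: Iterable[Label]) -> dict[str, dict[str, float]]:
    """Share of positive labels per product and attribute (an assumed aggregation)."""
    positives: dict = defaultdict(lambda: defaultdict(int))
    totals: dict = defaultdict(lambda: defaultdict(int))
    for product, attribute, label in labels:
        positives[product][attribute] += label
        totals[product][attribute] += 1
    return {
        product: {a: positives[product][a] / n for a, n in attrs.items()}
        for product, attrs in totals.items()
    }


def top_products(labels: Iterable[Label], weights: dict[str, float],
                 k: int = 10) -> list[tuple[str, float]]:
    """Weighted sum of attribute scores; attributes without reviews count as 0."""
    scores = attribute_scores(labels)
    final = {
        product: sum(w * attrs.get(attr, 0.0) for attr, w in weights.items())
        for product, attrs in scores.items()
    }
    # Descending order of the final sentiment score, keeping the top k products.
    return heapq.nlargest(k, final.items(), key=lambda item: item[1])
```

`heapq.nlargest` avoids sorting every candidate when only the top 10 are needed, although for a few hundred products a plain `sorted` would be equally fine.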

System flowchart
While searching for products, the user selects purchase criteria and product attributes and ranks them by priority. The system then iterates over the product list page returned by the user's search, selects only products with 20 or more reviews, and stores the selected product information and reviews in a database. The threshold of 20 reviews is used to obtain meaningful sentiment analysis results. The stored review data are then split into individual reviews, one per author. During preprocessing, each review is parsed to extract the nouns that correspond to product attributes and the adjectives and verbs that evaluate those attributes. Sentiment analysis is then performed with the pre-built sentiment dictionary, and a score is calculated for each attribute. Finally, the scores are weighted according to the priorities, and the top 10 products are returned as the recommendation list. Figure 2 shows the flowchart of the proposed system.

SYSTEM IMPLEMENTATION
The system was built on a Windows PC with an Intel i5-4690 CPU and 8 GB of RAM. It was implemented in Python, and the database was built with MySQL.

Crawling
This study collected data from domestic internet shopping malls using Python. "Air purifier" from the seasonal home appliance category was selected as the collection target, and the crawl was performed with the BeautifulSoup package and the Selenium tool. Starting from the first product on the product listing page, the crawler visits each product's purchase page in turn and collects its reviews. If the product detail page lists 20 or more reviews, the product information and review data are stored in the database; if it lists fewer than 20, the crawler moves on to the next product. In total, 12,632 product reviews were collected. Table 1 summarizes the number of product reviews collected for each product attribute.
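The crawl-and-filter loop can be sketched independently of the scraping details. In the paper, pages are opened with Selenium and parsed with BeautifulSoup; here that part is replaced by a hypothetical `fetch_detail` callable, and the `ProductPage` record is likewise an assumption, so that only the control flow is shown: visit products in listing order, keep those with at least 20 reviews, skip the rest.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Iterable

MIN_REVIEWS = 20  # review-count threshold stated in the paper


@dataclass
class ProductPage:
    """Hypothetical record of what is scraped from one purchase page."""
    name: str
    url: str
    review_count: int  # number of reviews shown on the detail page
    reviews: list[str] = field(default_factory=list)


def crawl(listing_urls: Iterable[str],
          fetch_detail: Callable[[str], ProductPage]) -> list[ProductPage]:
    """Visit product pages in listing order and keep those with enough reviews.

    `fetch_detail` is a hypothetical stand-in for the Selenium/BeautifulSoup code
    that opens a purchase page and scrapes its product information and reviews.
    """
    kept = []
    for url in listing_urls:
        page = fetch_detail(url)
        if page.review_count >= MIN_REVIEWS:
            kept.append(page)  # in the system, stored in crawl_product / crawl_review
        # Otherwise the product is skipped and the loop moves to the next one.
    return kept
```

A real crawler would read the review count before scraping every review, so that pages below the threshold cost only one page load.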

Pre-processing
Each review stored in the database consists of between 1 and 10 sentences. To obtain attribute scores, only the sentences that mention a product attribute are kept from each review. Special characters and stopwords are then removed. To support the later construction of the sentiment dictionary, the filtered reviews are split into one CSV file per product and stored. Finally, from the filtered reviews, the nouns corresponding to product attributes and the adjectives or verbs expressing positive or negative evaluations of those attributes are extracted.
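A rough sketch of the sentence filtering and cleaning steps follows. The paper works on Korean reviews and does not name its tokenizer, stopword list, or part-of-speech tagger; this sketch therefore uses English example text, a hypothetical attribute keyword map and stopword list, and simple regular expressions. It stops before the noun and predicate extraction, which would require a morphological analyzer.

```python
import csv
import io
import re

# Hypothetical keyword lists; the paper's Korean keywords are not given.
ATTRIBUTE_KEYWORDS = {
    "shipping": {"shipping", "delivery"},
    "design": {"design"},
    "performance": {"performance", "filter", "noise"},
}
STOPWORDS = {"the", "a", "an", "is", "was", "and", "it", "very"}  # hypothetical


def split_sentences(review: str) -> list[str]:
    """Naive sentence split on ., ! and ?."""
    return [s.strip() for s in re.split(r"[.!?]+", review) if s.strip()]


def clean(sentence: str) -> list[str]:
    """Lower-case, replace special characters with spaces, and drop stopwords."""
    tokens = re.sub(r"[^\w\s]", " ", sentence.lower()).split()
    return [t for t in tokens if t not in STOPWORDS]


def attribute_sentences(review: str) -> list[tuple[str, list[str]]]:
    """Keep only sentences that mention an attribute, tagged with that attribute.

    A sentence mentioning two attributes is emitted once per attribute.
    """
    kept = []
    for sentence in split_sentences(review):
        tokens = clean(sentence)
        for attribute, keywords in ATTRIBUTE_KEYWORDS.items():
            if keywords & set(tokens):
                kept.append((attribute, tokens))
    return kept


def to_csv(rows: list[tuple[str, list[str]]]) -> str:
    """Serialize one product's filtered sentences (one CSV file per product)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["attribute", "tokens"])
    for attribute, tokens in rows:
        writer.writerow([attribute, " ".join(tokens)])
    return buf.getvalue()
```

For example, "Delivery was very fast! The design is pretty. I bought it for my mom." keeps the first two sentences, tagged "shipping" and "design", and drops the third because it mentions no attribute.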

Creating sentiment dictionaries by product attribute
To construct the sentiment dictionary, we used the natural language processing machine learning API provided by Google. The product attributes used in the experiment were "shipping", "design" and "performance". Evaluation checks whether the polarity assigned by the dictionary matches the sentiment direction (positive or negative) of the verbs and adjectives that express opinions about the product attributes tagged in the evaluation set. Precision, recall and F-score were used as evaluation measures, as defined in (1).

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}, \qquad F = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (1)$$

where $TP$, $FP$ and $FN$ denote the numbers of true positives, false positives and false negatives for a given polarity class.
Table 2 summarizes the number of reviews used to build the sentiment dictionary. Because the same expression about an attribute can be positive for one product and negative for another, a separate sentiment dictionary is required for each product. For each of the three attributes "shipping", "design" and "performance", 120 reviews were used to construct the dictionary.
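The per-class measures in (1) can be checked with a short computation. The sketch compares the polarity assigned by the dictionary with the polarity tagged in the evaluation set, one class at a time; the label strings ("pos", "neg") and the toy data are assumptions.

```python
def precision_recall_f(gold: list[str], predicted: list[str],
                       target: str) -> tuple[float, float, float]:
    """Precision, recall and F-score for one polarity class, as in (1)."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if p == target and g == target)
    fp = sum(1 for g, p in pairs if p == target and g != target)
    fn = sum(1 for g, p in pairs if p != target and g == target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f_score
```

Computing the measures separately for the positive and the negative class is what allows the two classes of one attribute to differ, as reported for "shipping" in the discussion.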
When calculating the sentiment score, the weights reflecting the user's priorities were computed with the rank order centroid (ROC) method. ROC is a rank-based method: the evaluation criteria are ordered by importance, and the weight of each criterion is calculated from its rank alone. The weight $w_i$ is given by (2), where $i$ is the priority rank of the criterion and $K$ is the number of criteria.

$$w_i = \frac{1}{K}\sum_{j=i}^{K}\frac{1}{j}, \qquad i = 1, \dots, K \qquad (2)$$

Table 3 summarizes the sentiment score calculation process. It shows how the sentiment score is calculated for the review "I like delivery and I like the design." when priority is given to the "shipping" and "design" attributes of an air purifier. First, words denoting product attributes are found in the review sentence, and the predicate expressing the evaluation of each attribute is extracted. The polarity (positive or negative) of each extracted predicate is then determined, and the final sentiment score is derived by applying the weight for each priority.
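The ROC weights of (2) and the scoring steps of Table 3 can be combined into one small sketch. The attribute-specific dictionary entries, the English predicate forms, and the assumption that (attribute, predicate) pairs have already been extracted from the review are illustrative only; the paper's dictionary is built from Korean reviews. Each positive predicate (1) adds its attribute's weight and each negative one (0) adds nothing.

```python
def roc_weights(k: int) -> list[float]:
    """Rank order centroid weights, Eq. (2): w_i = (1/K) * sum_{j=i}^{K} 1/j."""
    return [sum(1 / j for j in range(i, k + 1)) / k for i in range(1, k + 1)]


# Hypothetical attribute-specific dictionary for one product: predicate -> polarity.
SENTIMENT_DICT: dict[str, dict[str, int]] = {
    "shipping": {"like": 1, "fast": 1, "slow": 0},
    "design": {"like": 1, "pretty": 1, "tacky": 0},
    "performance": {"like": 1, "quiet": 1, "noisy": 0},
}


def review_score(pairs: list[tuple[str, str]], priorities: list[str]) -> float:
    """Score one review from (attribute, predicate) pairs already extracted from it.

    `priorities` lists the user's attributes in order of importance (rank 1 first).
    """
    weights = dict(zip(priorities, roc_weights(len(priorities))))
    score = 0.0
    for attribute, predicate in pairs:
        if attribute not in weights:
            continue  # attribute not selected by the user
        # Unknown predicates are treated as negative (0): an assumption of this sketch.
        polarity = SENTIMENT_DICT.get(attribute, {}).get(predicate, 0)
        score += weights[attribute] * polarity
    return score
```

For three criteria the weights are 11/18, 5/18 and 1/9 (about 0.611, 0.278 and 0.111), and they always sum to 1. The Table 3 review, represented as `[("shipping", "like"), ("design", "like")]`, scores 1.0 when only "shipping" and "design" are ranked (weights 0.75 and 0.25) and 8/9 when "performance" is ranked third.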

DISCUSSION
Among the collected reviews, the product attribute mentioned most often was "shipping" and the one mentioned least often was "design". This is presumably because an air purifier is an electronic product, and when purchasing electronic products, consumers consider attributes such as shipping and performance more than design. The design attribute achieved 100% classification performance, likely because design reviews express subjective opinions more explicitly than "shipping" or "performance" reviews, which makes their polarity easier to classify. In addition, although all three attributes were built from the same number of reviews (120), their precision and recall differed, because the numbers of reviews used to classify positive and negative polarity differed. For the "shipping" attribute, the precision and recall for the negative class are 5% and 28.9% higher, respectively, than those for the positive class. Table 4 compares the performance of the sentiment dictionary by product attribute, and Table 5 summarizes its overall performance.
To compare the results under different priority weights, the priorities of the three attributes "shipping", "design" and "performance" were varied. Figure 3 shows the top 10 products when these attributes are ranked 1st, 3rd and 2nd, respectively; Figure 4 shows the top 10 products when they are ranked 2nd, 1st and 3rd; and Figure 5 shows the top 10 products when they are ranked 3rd, 2nd and 1st. These results show that product rankings change with the priority weights.
Moreover, the products that ranked near the top regardless of the priority weights have one thing in common: a large share of their reviews mention the "shipping", "design" and "performance" attributes relative to their total number of reviews.

CONCLUSION
As communication between consumers and sellers becomes more active online and on mobile devices, companies' business environments are shifting from a product focus to a user focus. Consumers are no longer simply purchasing products; they are taking part in promotion, marketing, and even product development. More and more companies are analyzing customers' opinions of their products and services, focusing on their needs, and reflecting them in their management activities. Research has also shown that reviews written by real consumers have a positive effect on company performance, and research on reflecting customer reviews in recommendation systems is accordingly under way. Existing recommendation systems, however, provide a uniform list of products even though users have different criteria for selecting products. Without understanding the user's purchase intention, such systems are limited in how well they can serve as recommendation systems.
The proposed system therefore uses detailed and reliable product reviews rather than the ratings used in collaborative filtering. To quantify product reviews, which are text data, a sentiment dictionary was constructed for each product attribute and used for sentiment analysis. The proposed system differs from existing recommendation systems in two ways. First, personal inclinations such as the user's purchase intention and preferences are reflected in the recommendation algorithm. When searching for a product, the user selects and prioritizes the product attributes that matter most to them, and these priorities are applied as different weights when the sentiment score is calculated. Scores are thus derived for the attributes the user wants to consider first and are used to build the recommendation list. Whereas existing internet shopping sites offer the same product list even though each user has different purchasing criteria and priorities, the proposed system can support each user's purchase decision. Second, the sentiment dictionary was constructed for each product attribute. Previous work has constructed a Korean emotion ontology based on linguistic characteristics. In English, SentiWordNet was established as a dictionary for sentiment analysis; it assigns sentiment scores to WordNet synsets using semi-supervised learning techniques. Some studies have adapted such dictionaries to Korean, constructing the dictionaries needed for sentiment analysis with Korean linguistic characteristics in mind, and there is also a study on constructing domain-specific positive/negative predicate dictionaries.
As described above, many studies on sentiment dictionaries have been conducted, but sentiment dictionaries constructed by product attribute are difficult to find. This study aims to reflect the linguistic characteristics of Korean and to analyze the sentiment of reviews in more detail by also considering the adverbs in review sentences.