Framework for opinion as a service on review data of customer using semantics based analytics

ABSTRACT


INTRODUCTION
In an increasingly competitive commercial market, it is essential to keep products and services as competitive as possible. Customer opinion is one mechanism for directly understanding the personalized opinion associated with each product or service [1]. Generally, such opinions take the form of text in a defined field. In this regard, opinion mining plays a critical role, as it targets the issues connected with varied opinions in order to enhance the service offering, which in turn positively affects service quality [2]. Opinion mining, emotion analysis, and sentiment analysis are used interchangeably in many studies [3]. However, sentiment analysis is distinct from opinion mining. Sentiment analysis investigates the feeling or emotion of the user towards a defined product or service, and its outcomes are normally binarized, i.e. positive or negative [4]. Opinion mining, in contrast, investigates in greater depth in order to extract the factors responsible for the feeling or emotion of the user towards a defined product or service. Sentiment analysis can also be considered an extension of the knowledge discovery process, and it uses computational linguistics-based methods for obtaining the opinion of the user [5].
The essential information used in opinion mining is obtained from different sources, e.g. blogs, news websites, social networks, and review-based sites, in accordance with the polarity [6]. The polarity is classified using a positive, negative, or neutral score. At the same time, the nature of the text may be either subjective or objective. As opinion mining is gaining importance in existing systems, it is imperative to emphasize the inherent challenges in this area. The first challenge is to manage the level of specification integrity, which arises from the presence of spam messages, opinions from non-experts, biased opinions, the credibility of opinions, the gap between the opinion topic and the keyword used to search for it, typographical errors, irrelevant opinions, and the domain dependency of sentiment analysis. The second challenge is connected with the word level, where the problems mainly relate to the absence of contextual information, issues of word orientation, the usage of verbs and adjectives as opinion words, and the adoption of orthogonal words. The third challenge is associated with the language or sentence level and is mainly caused by the usage of different linguistic forms, multiple writing styles, restrictions of filtering classification, product opinion, etc. There are many existing tools to carry out opinion mining, e.g. Opinion Observer, Opinion Finder, Natural Language Toolkit, Apache OpenNLP, LingPipe, WebFountain, Review Seer Tool, etc. [7,8].
Although a large body of dedicated research has been carried out towards enhancing the performance of opinion mining [9,10], it is mainly confined to performing complicated mining operations over static data. With the evolution of cloud computing and the services it offers, the scale and size of opinion data are increasing exponentially. This generates highly unstructured opinion data that is challenging to process further and cannot be stored in conventional storage units. Therefore, the contribution of the proposed study is a framework that performs a novel form of opinion mining such that the extracted knowledge can be offered as a service. The proposed system uses a lightweight design methodology for this purpose, which is discussed in this paper. The organization of the paper is as follows: Section 1(a) discusses the existing literature on opinion mining and sentiment analysis, followed by a discussion of the research problems in Section 1(b) and the proposed solution in Section 1(c). Section 2 discusses the algorithm implementation, followed by the result analysis in Section 3. Finally, concluding remarks are provided in Section 4.

a. Background
This section discusses recent work on sentiment analysis over opinions of varied forms using different approaches. A study of opinion mining over social networks was carried out by Riquelme et al. [11]. Existing systems have also used an ontology-based approach for opinion mining, as seen in the work of Siddiqui et al. [12]. That study also used semantics with mathematical modeling to facilitate feature engineering. A study of sentiment analysis was carried out by Xu et al. [13], where a memory-based approach is used along with a neural network classifier. A similar neural network-based approach was implemented by Yu et al. [14], who extracted opinion terms with the objective of finding the correlation between different tasks.
Huynh et al. [15] used a big data approach and a convolutional neural network for opinion mining from social networks in the Vietnamese language. He et al. [16] used a binary mechanism for classifying opinions from online networks in order to identify potential nodes. Classification problems associated with opinions from social networks were investigated by Fernandes et al. [17]. Consideration of metadata can further enrich the analysis process, as shown in the work of Kren et al. [18], where opinions are extracted from multimedia content and five different classifications are carried out over the data. A specific focus on influential nodes and their possible connection with the social network is found in the work of Mohammadinejad et al. [19]. That study also used a consensus factor for a better modeling perspective.
Mullick et al. [20] present a technique to extract particular features from social networks using a learning-based technique. The usage of a genetic algorithm was seen in the work of Iqbal et al. [21], where the focus was on feature reduction. A discrete model was constructed for computing social opinion by Wu et al. [22], where a voting-based approach was used for predictive mining analysis. Zhang and Zhong [23] mined the trust factor connecting two individuals using sentiment analysis and shortest paths. AskariSichani and Jalili [24] investigated opinion formation over complex networks using a maximum-a-posteriori process that considers the identification of influential nodes. Lv et al. [25] used Markov modeling in order to perform opinion mining over essential data resources. That study also utilized a belief propagation algorithm for this purpose. Zuo et al. [26] implemented an aspect-based knowledge extraction process using Dirichlet processes along with an involuntary indexing mechanism for differentiating multiple opinions. Zhou et al. [27] considered a language-specific opinion mining approach in which a co-ranking algorithm is presented. Sentiment computation was carried out by Jiang et al. [28] on a news dataset from social networks using semantics. Adoption of latent Dirichlet allocation for sentiment analysis was seen in the work of Zhang and Chow [29], while consideration of human-based interaction was seen in the work of Clavel and Callejas [30]. The next section briefly outlines the cumulative open issues in existing approaches to opinion mining.

b. The research problem
The significant research problems are as follows:
- Existing approaches use complex mining techniques that induce processing delay, so they can be applied to offline analysis but not to online analysis.
- The problem of data transformation, which is essential prior to storing the data over the cloud considering the various fields of opinion, is not considered in prior work.
- Adoption of machine learning approaches for optimization has a high dependency on training data, which makes the process of yielding outcomes highly time consuming.
- The majority of existing mining approaches are resource dependent and not well distributed, which makes applying an effective transformation technique challenging.
Therefore, the problem statement of the proposed study can be stated as "Developing a framework towards cost effective mining of the opinion data using semantic-based sentiment analysis is quite a challenging work".

c. The proposed solution
The core goal of the proposed system is to develop a novel framework that facilitates processing of the unstructured opinion from e-commerce applications in order to extract knowledge. The extracted knowledge is delivered in the form of a service, and hence the framework is termed opinion data as a service. The scheme of the service as per the proposed system is shown diagrammatically in Figure 1.

Figure 1. Scheme of the proposed service (labels: Algorithm, Mined data)

The main focus of the implementation of the proposed system is that it offers opinion-as-a-service. To achieve this, the proposed system presents a comprehensive framework with a highly distributed scheme in which multiple forms of unstructured data are aggregated, unlike existing systems. The prime emphasis is to offer a discrete transformation of the unstructured opinion data that makes it highly structured, so that it can be stored effectively in distributed cloud storage units. The proposed scheme in Figure 1 accepts different types of opinion data from e-commerce applications in the form of text. The study considers that a massive number of opinions may arrive at large scale from distributed systems, which cannot feasibly be stored in cloud units effectively. Therefore, the proposed system utilizes a novel indexing mechanism that is used to construct and retain definite metadata of the incoming stream of data.
ISSN: 2088-8708. Int J Elec & Comp Eng, Vol. 10, No. 5, October 2020: 5453-5461

The incoming stream of data is maintained in a temporary buffer, from which samples of the stream are forwarded to the algorithm running over the cloud in order to construct progressively transformed data. The algorithm running over the cloud clusters is developed to carry out the following operations sequentially: i) the algorithm initially performs aggregation of the distributed data with primary indexing, ii) the algorithm extracts all the indexed values from the incoming data index in the form of file objects with multiple attributes, and iii) the algorithm applies a semantic-based approach in order to perform a correlation-based extraction of the mined data in the form of knowledge. One of the interesting contributions of the proposed study is that it offers a user-friendly delivery of the knowledge from the mined opinion. Another essential contribution is that it does not store the raw opinion text but retains only the mined information. All the processed and intermediate data are stored in a temporary buffer system, which is released from memory once the final outcome, in terms of knowledge, is returned to the user as the response to the query generated by the user. The next section discusses the system design involved in the proposed study.
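The buffer behaviour described above can be illustrated with a minimal Python sketch. It is not the authors' implementation (the paper reports a MATLAB implementation); the names `temporary_buffer`, `answer_query`, and `count_opinions` are hypothetical, and an in-memory list stands in for the buffer system.

```python
from contextlib import contextmanager


@contextmanager
def temporary_buffer():
    """Hypothetical stand-in for the temporary buffer: a list that is
    emptied as soon as the enclosing block finishes."""
    buffer = []
    try:
        yield buffer
    finally:
        buffer.clear()  # release intermediate data once the response exists


def answer_query(stream, mine):
    """Buffer the incoming samples, mine them, and keep only the result.

    `mine` is any caller-supplied function (an assumption of this sketch)
    that turns buffered opinion text into mined knowledge. It should return
    a new value rather than the buffer itself, since the buffer is cleared.
    """
    with temporary_buffer() as buf:
        buf.extend(stream)      # raw opinions live only in the buffer
        knowledge = mine(buf)   # only this mined outcome survives
    return knowledge


def count_opinions(buf):
    # Trivial placeholder for the mining step.
    return {"opinions_seen": len(buf)}


result = answer_query(["great phone", "battery drains fast"], count_opinions)
```

Only `result` remains after the call; the raw opinion strings are no longer held by the buffer.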

SYSTEM IMPLEMENTATION
The core agenda of the proposed system is to design and develop a mechanism that is capable of extracting knowledge from the online log files of customer opinion. The proposed analytical operation is implemented using a novel semantic-based concept over the opinion data. This section gives an overview of the essential information associated with the system implementation with respect to assumptions and dependencies, implementation strategy, and execution flow.

Assumption and dependencies
The primary assumption of the proposed system is that there is a dedicated enterprise application that can pull customer opinion data from different electronic commerce portals and applications for a specific region of customer opinion rather than for all the data. The secondary assumption is that the customer opinion data may relate to different commercial domains connected with varied products or services. The tertiary assumption is that all the customer opinion data is explicitly in the form of text. Moreover, the study assumes that the data is likely to be highly unformatted as well as unstructured, owing to the distributed traffic system over the cloud and to peak traffic conditions caused by massive numbers of concurrent users. The prime dependency of the proposed system is that a specific domain of customer opinion is required in order to perform an accurate assessment, although the implemented model is capable of analyzing data from any commercial domain of customer opinion.

Implementation strategy
The idea of the proposed implementation is to perform online (or instantaneous) analysis of the incoming streams of data (customer opinion). However, this opinion originates from multiple sources in e-commerce applications and must be gathered in one place, without which the analysis is not feasible. Therefore, the proposed system develops a distributed environment in order to take the input data from its distributed points of origin. The next step is to carry out distributed data aggregation with proper indexing of the incoming stream of data, followed by identification of the core file objects. The next process is to extract the header files, which represent the fields of the essential values of the customer data, and deposit them in the distributed cloud storage. Finally, the proposed system develops a novel semantic indexing that is used for correlating the values of the customer opinion data in order to extract knowledge from the indexed aggregated data. The complete process is sequential, and hence the execution is progressive, leading to a faster transformation process. Hence, a simple and fast process for analyzing customer opinion data is obtained, whose outcome is offered as a service.

Execution flow
The proposed system is implemented using the divide-and-conquer rule, where the complete problem of extracting specific knowledge from textual customer opinion data over a highly distributed and collaborative system is divided into smaller sub-problems. The complete execution is carried out using three sequential algorithms: i) algorithm for distributed data aggregation, ii) algorithm for obtaining indexed values, and iii) algorithm for semantic-based knowledge extraction. The algorithms are discussed as follows:

Algorithm for distributed data aggregation
It is to be noted that the proposed system is an analytical design in which it is essential to develop highly distributed terminals that generate customer data. Therefore, this algorithm is responsible for constructing a distributed environment that facilitates data aggregation in order to obtain indexed aggregated data for better data transformation, leading to effective analysis of the distributed data.

This algorithm takes the input n (selected opinion files from the stream) and, after processing, yields dagg (aggregated data). The algorithm constructs different terminals in distributed order that take the various opinion files as input and thereby act as the points of origin of the data n (Line-1). The next process is to apply a function f(x) to the individual data di from each source point n (Line-2). This operation is used for indexing the incoming stream of data, which is maintained in the form of a matrix dindex (Line-2). The next step is to apply a function g1(x) to all the individual data di, considering the data index dindex generated in the prior step, to construct the aggregated data dagg (Line-3). Finally, a discrete function g2(x), which is a refinement of g1(x), generates a file object of the data with its index to confirm the indexed aggregated data dagg (Line-4). Figure 2 illustrates the process flow of the algorithm for distributed data aggregation.

Figure 2. Process flow of algorithm for distributed data aggregation (labels: Aggregated data, File objects, dagg)
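The four steps described for this algorithm can be sketched in Python as follows. The paper does not define f(x), g1(x), or g2(x) concretely, so the bodies below are assumptions chosen only to show the data flow: a running counter for indexing, a dictionary keyed by index for aggregation, and a small `FileObject` record (a hypothetical name) for the refined output.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class FileObject:
    index: int   # position assigned by the indexing step f
    source: str  # terminal the opinion file originated from
    text: str    # raw opinion text of the file


def f_index(sources):
    """f(x): give every item d_i from every source a running index (d_index)."""
    counter = count()
    d_index = []
    for source, items in sources.items():
        for text in items:
            d_index.append((next(counter), source, text))
    return d_index


def g1_aggregate(d_index):
    """g1(x): merge the indexed items into one collection keyed by index."""
    return {idx: (source, text) for idx, source, text in d_index}


def g2_file_objects(d_agg):
    """g2(x): refine the aggregate into file objects that carry their index."""
    return [FileObject(idx, src, txt) for idx, (src, txt) in sorted(d_agg.items())]


def aggregate(sources):
    """Run the steps in sequence: index, aggregate, build file objects."""
    return g2_file_objects(g1_aggregate(f_index(sources)))


# Two hypothetical terminals, each contributing plain-text opinion files.
sources = {
    "terminal_a": ["rating: 5\nreview: great phone"],
    "terminal_b": [
        "rating: 2\nreview: battery drains fast",
        "rating: 4\nreview: good value",
    ],
}
d_agg = aggregate(sources)
```

Keeping the three functions separate mirrors the sequential, non-iterative flow the text emphasizes: each step consumes only the output of the previous one.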

Algorithm for obtaining indexed values
This algorithm performs further indexing of the incoming aggregated data with respect to the essential attributes of the file objects. Without this algorithm, the system would not be able to index the explicit areas of the corpus present in the discrete file objects that are subsequently subjected to analysis.

Algorithm for obtaining indexed values
Input: m (number of file objects)
Output: tbuff (memory with indexed value)
Start
1. ...
2. eobj ← c(dagg, m)
3. ...
4. ...
5. γfobj ← extract(val(φ+1))
6. End
7. Store αfobj, βfobj in cloud
8. tbuff ← indexed(γfobj)
End

The system constructs an object of the entire file (individual aggregated data dagg) and performs further processing. The algorithm takes the input m (number of file objects) and, after processing, yields tbuff (memory with indexed value). Each file object consists of explicit fields in the form of headers and separators, which are stored in a cloud storage template so that they do not have to be stored again for each incoming data item. The algorithm applies a function c(x) to the aggregated data with respect to the number of file objects m in order to generate an encoded object eobj of the file object (Line-2). The encoded version of the file objects allows a more user-friendly representation of the corpus of opinion data. The algorithm then extracts all the necessary fields in the individual file objects.
The first attribute within the file object is the header files αfobj, where the encoded objects are extracted and retained with respect to the discrete identity of the encoded data encID (Line-3). The next part of the algorithm extracts the separator φ used for distinguishing the header object from the encoded value of the respective header object βfobj (Line-4). Finally, the algorithm extracts the value that immediately follows the separator object and stores it in a different matrix γfobj (Line-5). A closer look at the algorithmic steps shows that objects like αfobj and βfobj are constant for every incoming data stream and need not be processed every time; hence, these two attributes are directly keyed into the cloud storage system and are subsequently used only for indexing the respective value-based file objects γfobj. Figure 3 shows the process flow of this algorithm.

Figure 3. Algorithm for obtaining indexed values
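A minimal Python sketch of this field extraction is given below. It assumes that each file object is plain text with one `header: value` pair per line and that the separator φ is a colon; the paper specifies neither. The encoding function c(x) is replaced by a simple line split, and the "cloud" template is an ordinary dictionary, so all names here are illustrative.

```python
def encode(file_text):
    """Stand-in for c(x): split a file object into its non-empty lines."""
    return [line.strip() for line in file_text.splitlines() if line.strip()]


def extract_fields(encoded, separator=":"):
    """Split each encoded line into the header and the value after the separator."""
    headers, values = [], []
    for line in encoded:
        head, sep, value = line.partition(separator)
        if not sep:
            continue  # no separator on this line, so it is not a header/value pair
        headers.append(head.strip())
        values.append(value.strip())
    return headers, values


def index_values(file_texts, separator=":"):
    """Store headers and separator once; buffer only the per-file values."""
    template = None  # constant part (headers, separator), kept once
    t_buff = []      # value part, one list of values per file object
    for text in file_texts:
        headers, values = extract_fields(encode(text), separator)
        if template is None:
            template = {"headers": headers, "separator": separator}
        t_buff.append(values)
    return template, t_buff


template, t_buff = index_values([
    "rating: 5\nreview: great phone",
    "rating: 2\nreview: battery drains fast",
])
```

Because the headers and separator are identical for every incoming file, only the value lists grow with the stream, which is the storage saving the text describes.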

Algorithm for semantic-based knowledge extraction
This algorithm is responsible for extracting knowledge, using semantics, from the third file object, i.e. γfobj, obtained from the prior algorithm. The idea is to construct very simplified semantics that use a correlation-based approach to extract the significant corpus attributes from the values of the third file object as a representation of knowledge. The potential benefit of this algorithm is its applicability to any domain of products or services of customer opinion, which eliminates dependencies on lexicons, unlike existing mining approaches. This algorithm takes the input tbuff (memory with indexed value) and gives the outcome h (core semantic) (Line-1). The algorithm constructs various semantics h1, which are all possible representations of corpus attributes (Line-2). These attributes are modifiable, and the algorithm further checks all h2, which are used for separating each corpus attribute in a distinguished way, and stores them back in the matrix y (Line-4). The algorithm then performs a string comparison of the y matrix with the partitioned corpus attributes in order to find the appropriate semantic h (Line-6 and Line-9). Figure 4 presents the algorithm for semantic-based knowledge extraction.
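Since the algorithm listing itself is only shown in Figure 4, the following Python sketch is one possible reading of the description, not the published method. It assumes that the modifiable semantics h1 are supplied by the caller as labelled sets of attribute strings, that h2 is a plain word tokenizer producing y, and that the core semantic h is the label with the most exact string matches. The names `partition`, `core_semantic`, and `candidates` are hypothetical.

```python
import re
from collections import Counter


def partition(value):
    """h2 (assumed): split one opinion value into lower-cased word tokens, y."""
    return re.findall(r"[a-z']+", value.lower())


def core_semantic(t_buff, candidates):
    """Pick the candidate label whose attribute strings match the most tokens.

    `candidates` maps a semantic label to a set of attribute strings (h1);
    it is caller-supplied and freely modifiable in this sketch.
    """
    scores = Counter()
    for value in t_buff:
        y = partition(value)
        for label, attributes in candidates.items():
            # Plain string comparison between tokens and attribute strings.
            scores[label] += sum(1 for token in y if token in attributes)
    if not scores or max(scores.values()) == 0:
        return None, scores  # nothing matched, so no core semantic is reported
    return scores.most_common(1)[0][0], scores


candidates = {
    "satisfied": {"great", "good", "love"},
    "dissatisfied": {"bad", "poor", "drains", "broken"},
}
values = ["great phone", "battery drains fast", "good value, love it"]
h, scores = core_semantic(values, candidates)
```

In this example the tokens great, good, and love match the first label and drains matches the second, so `h` is `"satisfied"`.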

RESULTS ANALYSIS
The proposed system uses the standard dataset of product review data from Amazon that is frequently used for sentiment analysis [31]. The format of the data is confirmed to take the shape of big data by referring to publicly available big data [32]. Hence, synthetic data is constructed for opinion mining. The dataset consists of 5500 plain text files with multiple reviews of specific products. This dataset is taken as the input for the implementation of the proposed system, which is carried out in MATLAB. The proposed system is compared with standard mining approaches, e.g. supervised learning and the dictionary-based learning approach.
The analysis is carried out with respect to response time, memory consumption, and accuracy as the performance parameters. The study outcome shows that the proposed system performs comparatively better than the existing approaches to opinion mining for sentiment analysis. The prime reason behind the better response time is that the proposed system has no dependency on iterative training operations, unlike the existing approaches. Moreover, the complete process is progressive and not recursive, which reduces the cumulative time of operation, as shown in Figure 5(a). It can also be seen that the proposed system offers lower memory consumption than the existing approaches to opinion mining. This is because the proposed system makes use of a temporary memory system where the processing takes place, and only the final mined data is stored. This not only optimizes the storage space in the data center over the cloud but also offers faster processing of the queries that lead to the output of mined data, as shown in Figure 5(b). Apart from this, the proposed system also offers a higher accuracy score for mining performance, as it makes use of contexts that are formulated from the semantics used in the proposed system, as shown in Figure 5(c). These performance outcomes offer robust confirmation that the proposed system is suitable for relaying services in the form of mined opinion. Another interesting point is that the system is capable of mining opinion on any form of product or service and is hence applicable to heterogeneous data too. It also offers the flexibility to perform mining based on a specific domain of query, e.g. degree of satisfaction, products in higher demand, customer buying behavior, and the overall utilization score of the products or services used by the user. Hence, the proposed system offers a cost-effective computational model that is capable of delivering opinion as a service with higher accuracy.
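For readers who wish to reproduce this kind of comparison, the three performance parameters can be measured with a small harness such as the Python sketch below. It is not the authors' MATLAB evaluation code; `measure` and `accuracy` are hypothetical helpers, memory is taken as the peak Python allocation reported by `tracemalloc`, and accuracy is the fraction of predicted labels that equal the expected labels.

```python
import time
import tracemalloc


def measure(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds, peak bytes allocated)."""
    tracemalloc.start()
    try:
        start = time.perf_counter()
        result = fn(*args)
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed, peak


def accuracy(predicted, expected):
    """Fraction of positions where the predicted label equals the expected one."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("at least one label is required")
    return sum(p == e for p, e in zip(predicted, expected)) / len(expected)


def toy_miner(reviews):
    # Placeholder mining step so that the harness has something to time.
    return ["pos" if "good" in review else "neg" for review in reviews]


labels, seconds, peak_bytes = measure(toy_miner, ["good value", "bad battery"])
score = accuracy(labels, ["pos", "neg"])
```

Running the same harness over each compared approach with identical inputs gives directly comparable response time, memory, and accuracy figures.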

CONCLUSION
Opinion mining is proliferating owing to its importance in almost every field of product and service delivery. Over the past decade, various research works have been carried out on opinion mining and sentiment analysis; however, they are mostly concerned with centralized systems for applying analytics. The inclusion of distributed computing leads to various challenges in the mining process as well as in the data aggregation process. Therefore, the proposed system introduces a novel framework with the following contributions: i) the proposed system is capable of synchronizing all the incoming streams of opinion data and performing a structured aggregation, ii) the proposed system can also index all the file objects and then extract the entered values that directly represent the opinion, and iii) the proposed system applies a unique mechanism for the semantics-based knowledge discovery process, which is not only fast but also makes the design free from the usage of lexicons, unlike existing mining approaches.