Dynamic Hilbert clustering based on convex set for web services aggregation

ABSTRACT


INTRODUCTION
Web services can be defined as web applications that communicate with other systems using standard Internet mechanisms such as the hypertext transfer protocol (HTTP) and extensible markup language (XML) in order to provide the required service to the user [1]-[3]. Web services are competitive technologies that enable e-business and e-commerce by offering clients quick, efficient, and dependable services through the Internet [4]. They are a kind of service-oriented architecture (SOA)-based distributed computing paradigm that can be accessed through HTTP. Web servers offer dynamic replies to all requests made via the Internet [5]-[7]. Web service architecture refers to the software system design for supporting interoperable machine-to-machine communication over a network [1], [8]. Web services have become one of the most important areas of technological development in recent years [4], [9], and they are among the most widely used communication mechanisms on the Internet. Web service communication is built on standard technologies such as the simple object access protocol (SOAP), web service description language (WSDL), and universal description, discovery, and integration (UDDI) [10], [11]. SOAP is a widely used XML-based communication protocol for web services [12], [13]; it allows different distributed computing systems to communicate with each other [14]. Organizations are increasingly using web services because of the interoperability and dynamic scalability they provide [12], [15].
However, the XML-based encoding of web messages has harmed network efficiency, because the excessive repetition of information in these messages consumes high bandwidth [16]. Because of the large amount of XML data being sent across the network, consumers have suffered from bottlenecks and congestion [17]. Network latency is a major issue that often causes service delays for end users [18]. In particular, SOAP messages are inherently large, necessitating greater bandwidth for web service requests and responses [2], [19]. This fact has driven researchers to create solutions that minimize network load and increase the speed at which services can be provided. One of the most recent proposals is a web message aggregator, which combines several messages into a single message by removing duplicate content [20], [21]. However, the efficacy of aggregation depends on the degree of similarity between the aggregated messages. To provide high aggregation outcomes, a similarity measurement model is necessary. This demand would be met by effective clustering, which would also enhance the delivery of network-based web services to endpoints.
Technically, the proposed model makes several major contributions to web service performance over the network. First, it improves web-based communication by significantly reducing the required network volume, which alleviates network congestion and bottlenecks. Second, efficient clustering of web messages is achieved based on Hilbert similarity measurements. Finally, the proposed model can potentially improve web service response time as a result of its low processing time.
An evaluation strategy is established and executed to assess the effectiveness of the proposed model. The clusters of messages it produces are passed to the aggregator model developed in [17], [22], and the effects on processing time and on the compression ratio, which reflects the reduction in network traffic, are examined. Other clustering models, including k-means and principal component analysis (PCA) coupled with k-means, are treated in the same way, since they serve as baselines for the proposed clustering method. Compared to these methods, the Hilbert clustering model has been shown to be superior due to its ability to achieve similar outcomes while using less processing time and generating significantly less network traffic.
The rest of this paper is structured as follows: section 2 explains the related work. Section 3 describes the proposed model. Experiments and results are discussed in section 4. Finally, section 5 presents the conclusion and future work.

RELATED WORK
XML message clustering is essential in web service environments and applications. Machine learning methods have often been used for clustering web services. Many researchers have presented different clustering methods for textual and XML documents. The primary focus of the proposed clustering models has been on clustering XML documents. Another major trend in clustering methods focuses on numerical data; such methods cannot be applied to text unless it is first converted to a comparable binary form.
Hwang and Gu [23] presented a new approach to grouping XML texts based on the weight of repetitive structures in XML texts, in which each XML document is viewed as a transaction and its structures are extracted as transactional items. The method focuses on identifying the large elements included in each XML document and then grouping documents together based on their similarities. Evaluation metrics consist of XML tag and data item paths in the XML tree. A basic principle of the method is to allocate new XML documents according to an average of the accumulated frequency of structures in the XML tree. The method is compared to both the hierarchical agglomerative clustering (HAC) and k-means methods to determine its efficacy. The experimental results demonstrate that it outperforms both HAC and k-means.
The extended vector space model (EVSM) was developed by Yongming et al. [24] as a novel model for grouping XML documents according to their structure and content by calculating similarities between them. The hierarchical clustering method is also used in this approach. The effectiveness of the clustering approach was measured using two criteria, purity and entropy. Their model was experimentally tested on two datasets (the INEX IEEE corpus and the Wikipedia collection). According to the findings, this approach results in more efficient clustering.
A dynamic clustering approach for SOAP web messages was implemented by Al-Shammary et al. [17] based on fractal similarity measurements. The resulting clustered messages are formed in a compact form to reduce network traffic. Their model was examined using the compression ratio and processing time. By analyzing the compression ratio (CR) produced by the web aggregator, the model was compared with both k-means and PCA+k-means. The dataset used for testing and experimentation consists of 160 SOAP messages, divided into four sets of 40 messages each. These messages were created using the WSDL for stock quotes found at [25]. The highest average CR values obtained for the four datasets were 3.92, 7.98, 16.63, and 21.70, respectively. The average processing time of their model is around 15.6 milliseconds.

PROPOSED MODEL
This section explains the complete design of the proposed model, including the Hilbert convex function, web service aggregation, and the Hilbert clustering model. First, the Hilbert convex set used for similarity measurements is explained together with its equation. Next, the techniques for web service aggregation, which combine web messages into a compact size based on the Hilbert similarity grouping, are described. Finally, a new dynamic clustering technique based on Hilbert measurements is presented.

Hilbert convex set as similarity measurements
In this research, we propose Hilbert measurements based on convex sets for clustering SOAP messages. Each message is considered as a convex set and each feature in the message as a point. Convex set measurements are applied to calculate the similarity between two messages. Equation (1) gives the Hilbert convex set measurement, where x and y are vectors representing SOAP messages, n is the number of attributes in each vector, and Max(x) is the maximum value in vector x.
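To make the idea of a Hilbert-type measurement between two message vectors concrete, the following is a minimal sketch that assumes the classical Hilbert projective metric on strictly positive vectors as a stand-in. It is not claimed to match Equation (1), which is defined in terms of n and Max(x); the function names and the mapping from distance to similarity are hypothetical.

```python
import math


def hilbert_distance(x, y):
    """Classical Hilbert projective distance between strictly positive vectors.

    Assumption: this is the textbook metric log(max_i(x_i/y_i) / min_i(x_i/y_i)),
    used here only as an illustration, not the paper's Equation (1).
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if any(v <= 0 for v in x) or any(v <= 0 for v in y):
        raise ValueError("all components must be strictly positive")
    ratios = [a / b for a, b in zip(x, y)]
    return math.log(max(ratios) / min(ratios))


def hilbert_similarity(x, y):
    """Hypothetical mapping of the distance into (0, 1]; 1 means proportional vectors."""
    return 1.0 / (1.0 + hilbert_distance(x, y))


if __name__ == "__main__":
    print(hilbert_distance([1, 2, 3], [2, 4, 6]))   # proportional vectors give 0.0
    print(hilbert_similarity([1, 2], [2, 1]))
```

Under this metric, two vectors that differ only by a positive scale factor are at distance zero, so messages with the same relative feature weights are treated as maximally similar.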

Web service aggregation
In this study, an aggregation model based on compression [17], [22] is used to examine the performance of the proposed approach. The aggregation paradigm consists of three phases leading to a compact aggregated message. Initially, XML messages are transformed into a web matrix (in matrix form) and then represented as XML trees. The XML trees are then transformed into vectors using either depth-first or breadth-first tree traversal. Finally, these text expressions are compressed using either the fixed-length or the Huffman lossless compression method. The aggregation model comes in two primary forms, one-bit and two-bit aggregators, and both aggregators provide fixed-length and Huffman encoding versions.
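As a rough illustration of the tree-to-vector step, the sketch below parses a small XML message and lists its element names in depth-first and breadth-first order. The sample message and the function names are hypothetical and are not taken from the aggregator in [17], [22]; the sketch also assumes that only element names matter.

```python
import xml.etree.ElementTree as ET
from collections import deque


def depth_first(root):
    """Element names in depth-first (document) order."""
    return [element.tag for element in root.iter()]


def breadth_first(root):
    """Element names level by level, left to right."""
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node.tag)
        queue.extend(node)  # iterating an Element yields its direct children
    return order


# Hypothetical message, loosely shaped like a stock-quote response.
SAMPLE = (
    "<Envelope><Body>"
    "<Quote><Symbol>ABC</Symbol><Price>10</Price></Quote>"
    "<Quote><Symbol>XYZ</Symbol></Quote>"
    "</Body></Envelope>"
)
root = ET.fromstring(SAMPLE)

if __name__ == "__main__":
    print(depth_first(root))
    print(breadth_first(root))
```

The two traversals contain the same names but in a different order, which changes where repeated runs appear in the vector that is later compressed.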

Dynamic Hilbert clustering model
A novel Hilbert clustering technique is proposed to provide dynamic clustering for web services. Hilbert space and distance measurements are introduced as new mathematical computations of the similarity between XML messages. Hilbert clustering is suggested as a strategy for unsupervised clustering, which groups XML messages in a dynamic manner. The approach aims to build clusters with a high degree of similarity by grouping messages together dynamically, without requiring a fixed number of clusters. The proposed model first converts web messages into the numeric form developed by Al-Shammary et al. [17] and Hwang and Gu [23]. Next, Hilbert convex set measurements are computed on the numeric representations of the XML messages. Hilbert similarity measurements are then applied in order to cluster XML messages based on the maximum similarity values. The final Hilbert clusters are the input to the web aggregators, which aggregate the messages of each cluster together. Figure 1 explains the main steps of the proposed Hilbert clustering model.
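The conversion of messages into numeric form can be sketched with a term frequency-inverse document frequency (TF-IDF) weighting over element names. This is a minimal sketch under stated assumptions: each message is already reduced to a list of tag names, the smoothed IDF formula is one common variant chosen here for illustration, and `tfidf_matrix` is a hypothetical helper, not the representation defined in [17] or [23].

```python
import math
from collections import Counter


def tfidf_matrix(documents):
    """Return (vocabulary, rows), one weight row per document.

    documents: list of lists of tag names (assumed input format).
    IDF uses the smoothed form log((1 + N) / (1 + df)) + 1, an assumption
    that keeps the weight of a term present in every message above zero.
    """
    vocabulary = sorted({term for doc in documents for term in doc})
    n_docs = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc))
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocabulary}
    rows = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        rows.append([counts[t] / total * idf[t] if total else 0.0 for t in vocabulary])
    return vocabulary, rows


if __name__ == "__main__":
    messages = [
        ["Envelope", "Body", "Quote", "Quote"],
        ["Envelope", "Body", "Order"],
    ]
    vocabulary, rows = tfidf_matrix(messages)
    print(vocabulary)
    for row in rows:
        print([round(w, 3) for w in row])
```

Each row of the result is one message and each column one element name, which gives a two-dimensional matrix of vectors of the kind a similarity measure can compare.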
In practice, dynamic Hilbert clustering begins by constructing an XML tree that keeps a single occurrence of each element while eliminating duplicate closing tags, and then employs a term frequency-inverse document frequency (TF-IDF) weighting technique to calculate the frequency of elements in each message. The outcomes of this step are recorded in a two-dimensional matrix of vectors. Next, Hilbert convex measurements are applied to compute the degree of similarity between messages. The Hilbert similarity measurements of each (current) message are computed against all messages and compared, and the message is then assigned to the cluster with the highest similarity value. Finally, the resulting clusters are constructed in string form and input into an aggregator tool. This tool combines the contents of each cluster of similar messages into a single compressed message. Figure 2 shows a detailed explanation of all steps taken by the dynamic clustering model. Algorithm 1 illustrates this process.
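The assignment step can be sketched as follows. This is a minimal illustration under stated assumptions, not a reproduction of Algorithm 1: the similarity is derived from the classical Hilbert projective metric on strictly positive vectors, and the `threshold` parameter that decides when a new cluster is opened is hypothetical.

```python
import math


def hilbert_similarity(x, y):
    """Similarity in (0, 1] from the classical Hilbert projective metric.

    Assumption: vectors are strictly positive and of equal length.
    """
    ratios = [a / b for a, b in zip(x, y)]
    return 1.0 / (1.0 + math.log(max(ratios) / min(ratios)))


def dynamic_clusters(vectors, threshold=0.6, similarity=hilbert_similarity):
    """Group vectors without fixing the number of clusters in advance.

    Each vector joins the existing cluster containing its most similar member,
    provided that similarity reaches `threshold` (a hypothetical parameter);
    otherwise it starts a new cluster. Clusters hold indices into `vectors`.
    """
    clusters = []
    for index, vector in enumerate(vectors):
        best_score, best_cluster = -1.0, None
        for cluster in clusters:
            score = max(similarity(vector, vectors[m]) for m in cluster)
            if score > best_score:
                best_score, best_cluster = score, cluster
        if best_cluster is not None and best_score >= threshold:
            best_cluster.append(index)
        else:
            clusters.append([index])
    return clusters


if __name__ == "__main__":
    sample = [[1, 2, 3], [2, 4, 6.5], [5, 1, 1], [10, 2, 2.2]]
    print(dynamic_clusters(sample))  # [[0, 1], [2, 3]]
```

Because a new cluster is opened whenever no existing cluster is similar enough, the number of clusters follows from the data rather than from a preset value, which is the property the dynamic model relies on.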

EXPERIMENTS AND RESULTS
In order to evaluate the performance of the proposed Hilbert clustering model, a wide range of XML message sizes has been considered in the experimental analysis. The dataset was formed using the WSDL at [25]. It contains 160 messages divided into four groups based on their size: small, medium, large, and very large, with sizes ranging from 140 to 5,500 bytes. Each group has 40 messages [17]. We have used the compression-based aggregation model [22] to investigate the proposed model's efficiency compared to other standard clustering methods. The vector space model [26] has been applied to the same dataset and compared with the proposed model. The main evaluation metrics are the compression ratio, the clustering time, and the compressed file size, all computed by the aggregator tool on the resulting clusters. The method that achieves the highest CR is considered to provide the best clustering. Furthermore, comparisons are made with k-means and PCA+k-means from previous studies [17], [27].
All strategies achieved substantial compression ratios on the clustered XML messages. The structure of the clusters for each group is given in Table 1. For the small group, sets of 10, 20, 30, and 40 XML messages are distributed into 3, 8, 10, and 13 clusters, respectively, illustrating the dynamic nature of the Hilbert model, since the number of clusters is not predetermined. In the medium group, the number of clusters is 3, 6, 9, and 11 for 10, 20, 30, and 40 messages, respectively. The large group, with 10, 20, 30, and 40 messages, is distributed into 3, 6, 6, and 7 clusters, and the very large group is divided into 3, 5, 7, and 8 clusters for 10, 20, 30, and 40 messages, respectively. Clearly, the number of clusters decreases as the message size increases. This is because larger XML messages have more redundancy, so fewer clusters are allocated to them than to smaller messages. Moreover, most clusters are about equal in size. This shows that a dynamic model can efficiently distribute web messages of such sizes, which may be more appropriate than several other approaches. The average CR results achieved by applying the dynamic model with the fixed-length method are 3.2056334, 7.2671758, 13.8358308, and 15.3180299 for the small, medium, large, and very large groups, respectively, with 40 messages. With the Huffman technique, the average CR results are 3.0780064, 7.8991900, 17.6439850, and 20.4135664 for the small, medium, large, and very large groups, respectively, with 40 messages. Evidently, the dynamic model with the Huffman technique shows better CR results than with the fixed-length method in the medium, large, and very large groups, while the fixed-length method is slightly better for the small group. The average CR values are reported in Table 2.
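The difference between the two encodings can be illustrated with a small worked example. The sketch below assumes the conventional definition of the compression ratio as original size divided by compressed size, counts only payload bits (no codebook overhead), and uses hypothetical function names; it is not the aggregator of [17], [22].

```python
import heapq
import math
from collections import Counter


def fixed_length_bits(symbols):
    """Bits needed when every distinct symbol gets a code of equal width."""
    if not symbols:
        return 0
    width = max(1, math.ceil(math.log2(len(set(symbols)))))
    return width * len(symbols)


def huffman_bits(symbols):
    """Total bits of a Huffman encoding, computed from symbol frequencies."""
    counts = Counter(symbols)
    if not counts:
        return 0
    if len(counts) == 1:
        return len(symbols)  # a single symbol still needs one bit per occurrence
    heap = list(counts.values())
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        total += a + b  # each merge adds one bit to every symbol under it
        heapq.heappush(heap, a + b)
    return total


def compression_ratio(original_bits, compressed_bits):
    """Assumed definition: original size divided by compressed size."""
    return original_bits / compressed_bits


if __name__ == "__main__":
    data = list("aaaabbc")
    original = 8 * len(data)  # 8 bits per character before compression
    print(compression_ratio(original, fixed_length_bits(data)))  # 4.0
    print(compression_ratio(original, huffman_bits(data)))       # 5.6
```

Huffman coding gains over fixed-length coding when symbol frequencies are skewed, which is consistent with the larger, more redundant message groups benefiting most from it.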
Table 3 presents the resulting average CR for the proposed model compared with the k-means, PCA combined with k-means, and vector space clustering techniques for all groups with 40 messages and different cluster sizes. It shows that the dynamic Hilbert clustering model performs better than the other models (k-means, PCA+k-means) in the medium, large, and very large groups, and that it outperforms the vector space model in the large and very large groups of messages.
Several illustrative figures have been prepared in order to show the outcome clearly. Figure 3 shows the detailed results of the average clustering time for the small, medium, large, and very large groups with 10, 20, 30, and 40 messages, using fixed-length encoding in Figure 3(a) and Huffman encoding in Figure 3(b). The clustering time is lowest for small messages, and the required processing time increases as the number of messages increases. Figure 4 depicts the ability to minimize the overall size of the aggregated messages for various original sizes: small in Figure 4(a), medium in Figure 4(b), large in Figure 4(c), and very large in Figure 4(d). Moreover, the proposed model has achieved better results with the Huffman technique than with the fixed-length technique.

CONCLUSION
In conclusion, this paper has introduced a new Hilbert clustering model based on a convex set. The main technique is computing the similarity values between web service messages and clustering them accordingly. The experimental results show that the proposed model enables the aggregation approach to reduce network traffic more effectively than other traditional methods. The proposed model can achieve a high compression ratio that reaches up to 20 with the Huffman technique. Moreover, it provides a considerable reduction in processing time. In the future, we will apply the proposed model to high-dimensional datasets and use Hilbert similarity measurements for web service classification.
Int J Elec & Comp Eng ISSN: 2088-8708  Dynamic Hilbert clustering based on convex set for web services aggregation (Nawras A. Al-Musawi) 6655

Figure 1. Main steps of Hilbert clustering

Figure 3. Average clustering time of dynamic Hilbert clustering model for all groups of messages in (a) fixed-length and (b) Huffman techniques

Table 1. Number of clusters produced by the dynamic Hilbert model for small, medium, large, and very large groups with 10, 20, 30, and 40 messages

Table 2. Average compression ratio results of dynamic Hilbert model for small, medium, large, and very large groups with 10, 20, 30, and 40 messages

Table 3. Overall average compression ratio of k-means, PCA+k-means, Hilbert, and vector space for small, medium, large, and very large groups with 40 messages in both fixed-length and variable-length (Huffman) encoding