Fitness function X-means for prolonging wireless sensor networks lifetime

X-means and k-means are clustering algorithms proposed as solutions for prolonging the lifetime of wireless sensor networks (WSN). In general, X-means overcomes k-means limitations such as the predetermined number of clusters. The main concept of X-means is to create a network with basic clusters called parents and then generate (j) children clusters by splitting the parents. However, X-means provides neither criteria for splitting parent clusters nor a method to determine an acceptable number of children. This article proposes fitness function X-means (FFX-means) as an enhancement of X-means. FFX-means introduces a new method that first determines whether a parent cluster is worth splitting, based on predefined network criteria, and then determines the number of children. Furthermore, FFX-means proposes a new cluster-head selection method, in which the cluster-head is selected based on the remaining energy of the node and the intra-cluster distance. The simulation results show that FFX-means extends the network lifetime by 11.5% over X-means and 75.34% over k-means. The results also show that FFX-means balances the nodes' energy consumption, and nearly all nodes deplete their energy within an acceptable range of simulation rounds.

consumption among nodes. The concept of random clustering in classical algorithms suffers from unbalanced intra-cluster distances, unbalanced cluster sizes, and unbalanced energy consumption; thus, the concept of fixed clustering has been introduced in k-means algorithms [14]-[19]. The k-means algorithm has many other applications, such as developing vehicle driving cycles [20], fingerprint recognition systems [21], and data mining [22]. The k-means algorithms estimate the number of clusters and their initial centroids, and then assign nodes to clusters based on the minimum distance to the centroids. After that, the average Euclidean distance to these centroids is determined and the centroids are repositioned; the wireless nodes are then re-clustered and assigned to the new centroids. Determining the mean distance and re-clustering the nodes are repeated until the centroid positions are fixed and the cluster members are static.
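The k-means loop described above can be sketched in Python as follows. This is a minimal illustration rather than the simulation code: the function name `kmeans`, its `init`, `max_iter` and `seed` parameters, and the choice of randomly picked nodes as initial centroids are assumptions made for this example.

```python
import math
import random


def kmeans(nodes, k, init=None, max_iter=100, seed=0):
    """Cluster 2-D node positions with plain k-means iterations."""
    rng = random.Random(seed)
    # Assumption: unless given, initial centroids are k nodes picked at random.
    centroids = list(init) if init is not None else rng.sample(nodes, k)
    labels = [0] * len(nodes)
    for _ in range(max_iter):
        # Assign every node to its nearest centroid (Euclidean distance).
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in nodes
        ]
        # Reposition each centroid at the mean position of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(nodes, labels) if lab == c]
            if members:
                updated.append((
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                ))
            else:
                updated.append(centroids[c])  # empty cluster keeps its centroid
        if updated == centroids:  # centroids fixed, membership static
            break
        centroids = updated
    return centroids, labels
```

Calling `kmeans(nodes, 5)` on a list of `(x, y)` tuples returns the final centroids and one cluster label per node. The outcome depends on the initial centroids, which is the sensitivity discussed in the next paragraph.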
Unlike classical algorithms, k-means algorithms balance the intra-cluster distance, but they can still construct bad clusters. A bad choice of initial centroids leads to unbalanced cluster sizes and unbalanced energy consumption among cluster-heads (CHs), which causes premature CH death at an early stage of the network lifetime. Therefore, Radwan et al. [23]-[25] proposed the X-means algorithm in three articles. Their concept is to split each k-means centroid (parent) into multiple new positions (children), thus expanding the search space for better positioning and a better number of clusters. Their work provided a leap in network lifetime and reduced the k-means limitations, but the problem of determining the number of clusters persists at the children's level. Furthermore, X-means did not provide splitting criteria to determine which clusters should split; instead, it assumed that every cluster was worth splitting into a number of children determined by the user. This article proposes fitness function X-means (FFX-means) as a clustering algorithm based on X-means. In short, each parent centroid is given a fitness based on its cluster size and its Euclidean distance to the sink. If the cluster is worth splitting, the number of children is determined; otherwise, the parent cluster remains. The positions of the children centroids are selected randomly within the average intra-cluster distance of the parent centroid.
The rest of the article is organized as follows. Section 2 proposes FFX-means and describes it with mathematical equations. Section 3 presents the simulation results and discusses how the proposed algorithm prolongs WSN lifetime. Finally, section 4 concludes our work with recommendations for further enhancing the FFX-means algorithm.

PROPOSED METHOD
The proposed method in this article includes an energy model and introduces new methods for cluster formation, cluster splitting criteria, and cluster-head selection. The energy model is obtained from [2]. It consists of transmitter and receiver parts, as in (1) and (2). The transmitter part accounts for the energy required to aggregate and process data (Eelec), the data size (D), the number of nodes (n), and the amplification energy required to transmit (D) data over a distance (d). The amplification energy in (1) is described by two types of signal attenuation, where ɛfs describes the free-space attenuation model and ɛmp describes the multipath fading attenuation model. (d0) is the crossover distance between the free-space and multipath models, described in [15]. The total transmission energy (ETx) grows with the distance between the source and destination nodes: it follows d² attenuation when d is less than d0 and d⁴ attenuation when d is greater than or equal to d0. The receiving consumption (ERx) includes the energy required to process the data packet (D), in bits, received from (n) nodes, as in (2).
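The two regimes of the energy model can be illustrated with a short Python sketch. It assumes the usual first-order radio model form, with d0 taken as the square root of ɛfs/ɛmp. The constants below are commonly used placeholder values, not the values of Table 1, and the function names are hypothetical. Because the text does not spell out how (n) enters (1), the transmit function handles a single packet, and treating (n) as a multiplier on the receive side is likewise an assumption.

```python
import math

# Placeholder radio parameters (assumed); the simulation values are in Table 1.
E_ELEC = 50e-9       # J/bit, electronics energy (Eelec)
EPS_FS = 10e-12      # J/bit/m^2, free-space amplifier energy
EPS_MP = 0.0013e-12  # J/bit/m^4, multipath amplifier energy

# Assumption: crossover distance as defined in the first-order radio model.
D0 = math.sqrt(EPS_FS / EPS_MP)


def tx_energy(bits, d):
    """Energy to transmit `bits` over distance `d` (single packet)."""
    if d < D0:
        return bits * E_ELEC + bits * EPS_FS * d ** 2  # free-space, d^2
    return bits * E_ELEC + bits * EPS_MP * d ** 4      # multipath, d^4


def rx_energy(bits, n=1):
    """Energy to receive a `bits`-sized packet from each of `n` nodes."""
    return n * bits * E_ELEC
```

With these placeholder values the crossover distance is about 87.7 m, and the two branches of `tx_energy` give the same result at `d = D0`, so the model is continuous at the crossover.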

Clusters formation and splitting criteria
The concept of X-means clustering is summarized in previous work [25]. The first phase constructs random clusters with random centroids and then optimizes these positions using k-means [15], as in (3). The process is repeated recursively until a final copy of the centroids is determined; these final centroids are called parent centroids. The second phase searches for updates of the parent centroids, and here a new method is proposed. The method is derived from metaheuristic algorithms: a parent centroid's fitness is determined as in (4), and a parent is fit to split if its fitness (F) is greater than the average fitness. The average fitness is found by dividing the crossover distance (d0) by the average distance of the parents' centroids to the sink, and adding the expected cluster size divided by the average cluster size of the parents, as in (5). The expected cluster size is OCS × N, where OCS is a user-defined value (1%, 10%, 20%, …).
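The splitting threshold can be sketched in Python directly from the prose description of the average fitness. The function and parameter names are hypothetical. The per-parent fitness of (4) is not reconstructed here, so it is passed in as a precomputed list.

```python
def average_fitness(d0, dists_to_sink, cluster_sizes, ocs, n_nodes):
    """Average fitness as described in the text for (5).

    d0            -- crossover distance
    dists_to_sink -- distance of every parent centroid to the sink
    cluster_sizes -- number of member nodes of every parent
    ocs           -- optimal cluster size as a fraction (0.10 for 10%)
    n_nodes       -- total number of nodes N
    """
    mean_dist = sum(dists_to_sink) / len(dists_to_sink)
    mean_size = sum(cluster_sizes) / len(cluster_sizes)
    expected_size = ocs * n_nodes  # expected cluster size, OCS * N
    return d0 / mean_dist + expected_size / mean_size


def parents_to_split(fitness, threshold):
    """Indices of parents whose fitness exceeds the average fitness."""
    return [n for n, f in enumerate(fitness) if f > threshold]
```

For example, with `d0 = 80`, parent distances `[40, 120]`, cluster sizes `[10, 30]`, `ocs = 0.1` and `n_nodes = 100`, the threshold is 80/80 + 10/20 = 1.5, and only parents whose fitness is strictly above 1.5 are split.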
Pn : parent (n) centroid; αn-1(x,y) : x and y position of the previous (n-1) parent centroid; Sn : set of nodes assigned to the cluster of Pn; βi(x,y) : x and y position of node (i); Fn : fitness of parent (n) centroid to split; K : predefined initial number of centroids; Cn : cluster size of parent (n); dn,Sink : Euclidean distance between parent (n) centroid and the sink; OCS : optimal cluster size (percentage of the total nodes of the network); N : total number of nodes in the network.
After determining whether the parents are fit to split, the number of children per parent is determined as in (6). Then the location of every child is determined as in (7), where a random value is selected from the range 1 to dm and multiplied by the cosine of the placement angle to get the x position and by its sine to get the y position. Finally, (dm) is the mean Euclidean distance within the parent cluster, determined as in (8).
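A minimal Python sketch of the child placement described for (7) and (8) follows. The number of children from (6) is taken as an input. The function names are hypothetical. The uniform angle in [0, 2π) and the placement of children as offsets from the parent centroid are assumptions made for this illustration.

```python
import math
import random


def mean_intra_cluster_distance(parent, members):
    """Mean Euclidean distance (dm) from the parent centroid to its members."""
    return sum(math.dist(parent, m) for m in members) / len(members)


def place_children(parent, members, n_children, rng=None):
    """Scatter children around a parent centroid, within radius dm."""
    rng = rng or random.Random()
    d_m = mean_intra_cluster_distance(parent, members)
    children = []
    for _ in range(n_children):
        # Random radius in [1, dm]; if dm < 1 the radius is fixed at 1.
        radius = rng.uniform(1.0, max(1.0, d_m))
        # Assumption: the angle is drawn uniformly from [0, 2*pi).
        theta = rng.uniform(0.0, 2.0 * math.pi)
        children.append((
            parent[0] + radius * math.cos(theta),  # x position
            parent[1] + radius * math.sin(theta),  # y position
        ))
    return children
```

Passing a seeded `random.Random` instance as `rng` makes the placement reproducible across runs.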
Pn,j : number of children (j) generated from parent (n); Cj : location of child (j) of parent (n); dm,n : average intra-cluster distance between the parent centroid and the member nodes of the cluster.
After that, the new children Cj are set as cluster centroids, and the nodes are re-clustered and assigned using a minimization technique in which nodes are grouped by the minimum average of their distances to the sink and to the children (Cj), as in (9) and (10). Finally, the X-means algorithm recursively executes (11) until the children's centroid positions converge. The recursive run has three possible outcomes: first, the children form their own clusters and the parents collapse; second, some children and some parents diminish; third, the children collapse and the parents are repositioned to their best locations. Figure 1 shows an example of parent centroids, Figure 2 shows parent splitting, and Figure 3 shows the remaining centroids and their final positions.
ADi,j : average distance of node (i) from child (j) and from the sink; di,j : Euclidean distance of node (i) to child (j); Di,sink : Euclidean distance of node (i) to the sink; ∅(x,y) : position of the sink; CCn,j : final position of child (j) of parent (n).
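The re-clustering step can be sketched in Python under two assumptions: ADi,j is read as the arithmetic mean of di,j and Di,sink, and the recursive update moves each centroid to the mean position of its members, as in k-means. The function names are hypothetical, and dropping centroids that attract no nodes is one possible reading of a centroid "collapsing".

```python
import math


def assign_nodes(nodes, children, sink):
    """Assign each node to the child with the smallest AD value."""
    labels = []
    for p in nodes:
        d_sink = math.dist(p, sink)  # D_i,sink
        # Assumption: AD_i,j = (d_i,j + D_i,sink) / 2.
        labels.append(min(
            range(len(children)),
            key=lambda j: (math.dist(p, children[j]) + d_sink) / 2,
        ))
    return labels


def recenter(nodes, labels, n_children):
    """Move each centroid to the mean of its members; drop empty ones."""
    centroids = []
    for j in range(n_children):
        members = [p for p, lab in zip(nodes, labels) if lab == j]
        if members:  # a centroid with no members is treated as collapsed
            centroids.append((
                sum(x for x, _ in members) / len(members),
                sum(y for _, y in members) / len(members),
            ))
    return centroids
```

Alternating `assign_nodes` and `recenter` until the centroids stop moving mirrors the recursive run described above. Under this reading, Di,sink is the same for every child of a given node, so the selected child is also the nearest one.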

Cluster-heads selection
In previous work [25], cluster-heads were selected and rotated based on their remaining energy: if a cluster-head's energy dropped below a certain threshold, the cluster-head stepped down and a new cluster-head with energy exceeding the threshold was selected. The threshold itself is not a fixed value; a node lowers its threshold by a small step when its energy drops below it, so the node has a chance of being selected at a later stage. However, this technique benefits the cluster-head only, as it ignores the importance of minimizing the intra-cluster distance to extend the network lifetime. This article proposes a new technique to select cluster-heads, as in (12) to (14). At the beginning of the simulation, all nodes have the same initial energy, so the nodes closest to their cluster centroid are selected as cluster-heads. Then, as the energy of these nodes decays, the new cluster-heads are the nodes with the minimum distance to the centroid and the highest remaining energy among their peers.
Ω : ratio of the initial energy of a node to its remaining energy; di,c : distance of node (i) to the child/cluster centroid; Fc,i : fitness of node (i) to become cluster-head (nodes with a small distance to the centroid and high remaining energy have the smallest fitness); CHSc : the cluster-head selected from the set of nodes (Sc) that forms a cluster.
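The selection rule can be illustrated with a short Python sketch. The way Ω and di,c are combined in (13) is assumed here to be a product, which is one combination consistent with the description (small distance and high remaining energy give the smallest fitness). The function name and the data layout are hypothetical.

```python
import math


def select_cluster_head(members, centroid, initial_energy):
    """Return the index of the member with the smallest fitness.

    members -- list of ((x, y), remaining_energy) tuples for one cluster
    """
    best_index, best_fitness = None, math.inf
    for i, (position, remaining) in enumerate(members):
        if remaining <= 0:
            continue  # a depleted node cannot serve as cluster-head
        omega = initial_energy / remaining  # ratio of initial to remaining energy
        # Assumption: fitness is the product of omega and distance to centroid.
        fitness = omega * math.dist(position, centroid)
        if fitness < best_fitness:
            best_index, best_fitness = i, fitness
    return best_index
```

When all nodes still hold their initial energy, Ω equals 1 for every node and the node nearest to the centroid is chosen. As that node drains, its Ω grows and a slightly more distant node with more remaining energy takes over.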

RESULTS AND DISCUSSION
An extensive simulation has been carried out in MATLAB R2020b to test FFX-means. Table 1 shows the simulation parameters, and the results are benchmarked against traditional k-means and X-means in Figure 4. The initial number of clusters is five, and the positions of their centroids are selected randomly; the simulation area is 220×220 units², and the density of the nodes is uniform. Figure 4 shows that FFX-means prolonged the network lifetime by 11.5% over X-means and 75.34% over k-means. A sharp decline in the number of alive nodes of FFX-means appears after round 2750 of the simulation; this decline indicates balanced consumption among nodes and balanced cluster sizes in the network, so that the nodes deplete their energy at about the same simulation rounds. Heinzelman et al. [11], [26] estimated that the optimal cluster size (OCS) varies between 9% and 11% of the total network nodes; thus, FFX-means was simulated for OCS values in this range, and Figure 5 shows that the first node death (FND) appeared between rounds 2,600 and 2,750 for ten different runs. To further test the stability of FFX-means, the simulation was repeated for fifty runs with OCS set to 10%, and the results in Figure 6 capture the FND and the last node death (LND). Similar to Figure 5, there is no abnormal behavior in the FND, and it is noticeable that whenever the FND increases the LND decreases, and vice versa.
Figure 5. Ten runs of the simulation results of FFX-means when cluster-size varies from 9% to 11% of the total network nodes
Figure 6. Fifty runs of the simulation results of FFX-means when the cluster size is set to 10% of the total number of nodes in the network

CONCLUSION
Splitting and merging clusters in wireless sensor networks aims to extend the network lifetime. The X-means algorithm is an example: it splits clusters into multiple children in search of new centroids. Still, the algorithm provides neither criteria to determine whether a cluster is worth splitting nor a way to determine the number of new clusters. In this article, X-means is updated with a fitness function (FFX-means) to resolve these issues. The fitness function determines whether a cluster is worth splitting based on the distance of its centroid from the sink and on how its cluster size compares to the average cluster size of the network. Furthermore, FFX-means is equipped with a new cluster-head selection algorithm that balances the remaining energy of the node against its intra-cluster distance. For future work, new splitting criteria such as the intra-cluster distance and the average network energy should be explored, and further simulations with large-scale networks should be conducted.