An artificial immune system algorithm for solving the uncapacitated single allocation p-Hub median problem

The present paper deals with a variant of hub location problems (HLP): the uncapacitated single allocation p-Hub median problem (USApHMP). This problem consists to jointly locate hub facilities and to allocate demand nodes to these selected facilities. The objective function is to minimize the routing of demands between any origin and destination pair of nodes. This problem is known to be NP-hard. Based on the artificial immune systems (AIS) framework, this paper develops a new approach to efficiently solve the USApHMP. The proposed approach is in the form of a clonal selection algorithm (CSA) that uses appropriate encoding schemes of solutions and maintains their feasibility. Comprehensive experiments and comparison of the proposed approach with other existing heuristics are conducted on benchmark from civil aeronautics board, Australian post, PlanetLab and Urand data sets. The results obtained allow to demonstrate the validity and the effectiveness of our approach. In terms of solution quality, the results obtained outperform the best-known solutions in the literature.


INTRODUCTION
In a distribution network, hubs refer to facilities that allow consolidation, sorting, switching and connecting of flows between origins and destinations [1]. The design of a hubs network (i.e., distribution network containing hubs) aims to reduce the cost resulting from connecting many origin-destination pairs of nodes. Indeed, using hub facilities to consolidate flows brings the advantages of economies of scale. Hub networks are widely exploited in many industrial areas including telecommunication, air passenger and freight transportation, postal delivery and public transportation network [1,2]. In transportation networks, for example, decision makers are usually faced with the commodity flows problem and need to develop efficient and competitive logistic strategies to manage such flows. It has been shown that logistic platforms (hubs) can indeed offer a very interesting competitive potential to cover the commodity flows issue. Furthermore, in addition to their ability to reduce logistic costs, logistic platforms can be designed by considering sustainable criteria.
Dealing with hub location problems (HLP), two main decisions are addressed, namely the location of hubs and the allocation of the demand nodes to the selected hubs. The selection of hubs is performed so hub location problems. In [11], Klincewicz developed two heuristics based on tabu search and on a greedy randomized adaptive search procedure (GRASP) where nodes are assumed to be assigned to their nearest hub.
A new variant of tabu search algorithm, called TABUHUB, is proposed by Skorin-Kapov and [12]. This approach considers equally the locational and the allocational part of the hub location problem. It also consists of a single exchange heuristic for the location of hubs, and for the reallocations of non-hub nodes. Ernst and Krishnamoorthy [15] developed a heuristic method based on simulated annealing (SA) to solve the US-ApHMP. They have developed an LP-based branch and bound solution method by using the SA upper bound. Pérez et al. [16] have presented an algorithm based on a hybridization combining the variable neighborhood search (VNS) together with the path relinking (PR) heuristics. Their VNS-PR hybrid heuristic outperforms the TABUHUB-based approach [12] and SA-based approach [15] in both solution quality and computational times.
In Kratica et al. [17] two genetic algorithm GA-based approaches are proposed. New encoding schemes and modified genetic operators are used to maintain the feasibility of individuals. The proposed approach exploited the idea of frozen bits to increase the diversity of genetic algorithm. To improve the computational time, the authors used the caching technique to avoid the fitness calculation for individuals that reappear during the GA run. Experiments conducted on both civil aeronautics board (CAB) and Australia post (AP) benchmark data sets, demonstrated to be robustness and efficiency of the approach in solving the USApHMP with up to 200 nodes and 20 hubs.
Urošević et al. [18] proposed a new general variable neighborhood search algorithm for solving the USApHMP. The authors provided three neighborhoods and structures to efficiently computing the network's total flow. In addition, the authors proposed a new nested strategy in designing a deterministic variable neighborhood descent local search.
The more recent solution approach of the USApHMP appeared in Benaini et al. [19]. the authors implemented a parallel genetic algorithm on graphic processing unit (GPU) for solving the problem. Their approach improved some best-known solutions in cost and in computing times.
The present paper develops a new approach to efficiently solve the USApHMP. The approach is based on the artificial immune systems (AIS) framework which is a new class of computational intelligence inspired by the biologic immune system [20]. As point out in [21], AIS is one of the most effective methods to be used while dealing with optimization problems. It is a sort of population based metaheuristics which is known for its ability to reach global optimum and avoid local optimums [22]. The most appropriate AIS algorithms for optimization problems are clonal selection algorithms (CSA) which are based on clonal selection theory [23]. CSA starts by generating a set of candidate solutions in the form of an initial population, and improves the features of the population by applying evolutionary operators until a stopping criterion is reached. Its computational cost is reduced when compared to other evolutionary algorithms [22]. This paper is an extension to the work presented in [24]. In our previous work we provided a summary about theoretical framework of the artificial immune systems and a detailed review of the clonal selection algorithm. In this study we provide numerical results to the proposed approach and compare it with existing methods in the literature.
Accordingly, this paper develops a CSA-based method to solve the USApHMP, which is an adaptation of the optimization version of the CLONALG algorithm [22]. To the best of our knowledge, this heuristic has never been used before in solving the USApHMP. Comprehensive experiments and comparison of the proposed approach with other existing heuristics are conducted on benchmark from civil aeronautics board and Australian post data sets. Roughly speaking, the proposed approach is compared with: The simulated annealing method (SA), proposed by Ernst and Krishnamoorthy [15], the hybrid variable neighborhood search VNS and path relinking PR heuristics presented by [16], Genetic algorithm GAHUB2 proposed by Kratica et al. [17], and the general variable neighborhood search (GVNS) proposed by [18]. The results obtained allow to demonstrate the validity and the effectiveness of our approach. In terms of solution quality, the results obtained outperform the best-known solutions in the literature.
The rest of the paper is structured around 5 sections. Section 2 provides the mathematical formulation of USApHMP used in this work. A brief overview of artificial immune systems and clonal selection algorithms is given in section 3. Section 4 describes the proposed approach. Computational results on different problem instances from the literature are reported in section 5. Conclusions and future research works are summarized in section 6.

NOTATION AND MAIN WORKING ASSUMPTIONS
This section presents the notation system used and also states the main working assumptions made in this paper. − Hubs are assumed to have unlimited capacities.

Notation
− Each origin-destination node is assigned to a single hub.
− Direct link between nodes is not allowed.
− The network is complete.

PROBLEM DEFINITION AND MATHEMATICAL FORMULATION
Let us consider a hub network modelled as a complete graph G = (N , A) where N is the set of nodes and A is the set of arcs. A note i ∈ N can either be a hub or a spoke (non-hub nodes). To simplify the notations, nodes are considered as numbered from 1 to cardinality N of N . An arc (i, j) ∈ A (i = 1, · · · , N ; j = 1, · · · , N ) is a link between nodes i and j, respectively, the origin and destination. Each arc (i, j) ∈ A is characterized by the amount w ij of flow to be routed, and a distance d ij between nodes i and j (d ii = 0). Three cost rates are considered, namely the collection cost χ defined between origin and hub nodes, the transfer cost α defined between hubs, in addition to the distribution cost δ defined between hub and destination nodes.
The uncapacitated single allocation p-hub median problem (USApHMP) considered in the present work is a variant of the HLP where hubs are assumed to have unlimited capacities and each spoke is assigned to one and only one hub. Furthermore, the number p of hubs to locate is assumed to be known (exogenous parameter). The objective of the USApHMP consists to jointly select appropriate hub facilities and to allocate demand nodes to these selected facilities so that to minimize the transportation costs incurred from collection, transfer and distribution of the overall flow in the network. To model this optimization problem, we consider a binary decision variable z ij defined as (1): Accordingly, the objective function is computed as the sum of collection, transfer and distribution costs as [18]: The resulting USApHMP optimization problem can be then formulated as:

Model USApHMP
Subject to: In the above optimization model, the objective function in (3) minimizes the overall transportation cost. Constraints in (4) ensures that exactly p hubs are located. Constraints given by (5) ensure that all nodes are hubs or are assigned to a single hub. Finally, constraints in (6) ensures that no node is assigned to a location unless a hub is opened at that site, which also mean that node to node connection is not allowed in the network.

SOLUTION METHOD
The optimization model USApHMP formulated by (3)- (7), is NP-hard. An exhaustive examination of all possible solutions is therefore not realistic due to the computational time limitation. Meta-heuristic algorithms are well known approaches for their efficiency and effectiveness to search, in a reasonable amount of time, the optimal or a near optimal solution of complex combinational problems. Based on the artificial immune systems (AIS) framework, this paper develops a new approach to efficiently solve the USApHMP (3)- (7). AIS are computational intelligence algorithms inspired by the natural biological immune systems. The four known AIS-based algorithms are immune network, negative/positive selection and clonal selection algorithms. They are all recognized for their powerful adaptive learning and robustness, in addition to their valuable properties to avoid premature convergence and improve local search [23]. They have indeed been successfully applied to a large range of domains including classification/clustering, anomaly detection and pattern recognition [25,26].
To solve the USApHMP problem, the present paper develops a procedure based on clonal selection algorithm (CSA). CSA is inspired by the features of clonal selection theories of immune systems and appeared first in the work by de Castro and Von Zuben [27]. Since then, it has been demonstrated to be an efficient method to solve multi-modal and combinatorial optimization problems. The CSA is constructed around 7 steps and processes: Population initialization, affinity evaluation, selection and cloning (proliferation), hypermutation, receptor editing, convergence and termination condition. Algorithm 1 describes the pseudo code of our algorithm.
The analogy between the immune system and the algorithm is presented in Table 1. In this Table, an antigen refers to the value of the objective function, an antibody represents a candidate feasible solution, and the affinity value refers to the performance measure of an antibody. The affinity represents indeed the degree of matching between the antibody and the antigen. The detailed design (solution encoding and operators' definitions, initial population selection, · · · ) of the CSA developed here will be discussed as shown in Table 1.
Interchanging mutation(P i) Calculate objective f unction value f or each individual(P i) end for P ← Select(P, P clones, P size) P rand ← rand(d, N ) Replace(d, P rand) end while Return P Report the best antibody as the final solution Table 1. Analogy between natural immune system principles and our developed system Natural immune system Artificial immune system applied in USApHMP Antigen The value of the objective function to be optimized Antibody Candidate feasible solution to the problem Affinity Degree of matching between an antibody and the antigen Proliferation of antibody Creation of new antibodies by using of clonal operators

Solution representation (encoding), and initial population generation
In order to apply the CSA to the stated optimization problem, a specific representation scheme to encode antibodies is required. In the present work, individual solution (antibody) is represented by a vector s of integers whose length is given by the number N of nodes in the complete graph G. The i th element of the vector solution s corresponds to the hub number to which node i is assigned to. To illustrate, let us consider the solution s : Accordingly, the corresponding hub network is composed of a total of N = 11 nodes among which p = 3 nodes are defined as hubs. The three pre-selected hubs are numbered as 1, 4, and 7. From this representation, one may observe that each node of the network is linked to the appropriate hub. For example, nodes 2, 6 and 10 are all linked to hub 4, while nodes 3, 5 and 8 are assigned to hub 7. It is worth noticing that each hub is assigned to itself. To generate the initial population, a procedure is developed on the basis of flow quantities emanating from and destined to a given node. Roughly speaking, the procedure considers the total amount O i and D i of flow originating from and destined to a node i (i ∈ N ), respectively. These two quantities are evaluated as: O i = j w ij and D i = j w ji . The resulting total amount F i of flows at each node i of the network is computed as F i = O i + D i . The overall nodes are then arranged in a decreasing order of total flows. From the resulting ordered list, the initial population is then generated in two steps as summarized in the pseudo code of algorithm 2. From this algorithm,the first step defines the 80% of candidate solutions of the population. For this part of the population, for each individual we choose in a randomly way p hubs from nodes that represent the highest total amount of flows in the network. To do this, we sort nodes in a list (flow-list) in decreasing order of total amount of flows, and we select p hubs from the nodes that are in the range 1...3/4n of the flow-list. This strategy promotes the nodes that has higher amount of flow to have a higher probability of being selected as hub. The second step construct the remaining 20% of the population by choosing p hubs randomly from the entire set of nodes. Once the selection of hubs is achieved, we proceed to the allocation of the remaining spokes (non-hub nodes) to the established hubs. For each non-hub node i, the located hubs are sorted in ascending order of distance from i, then i is allocated to the nearest hub. Each established hub is allocated to itself, following the assumptions of the problem.

. Affinity determination
Let us recall that the affinity value of a given candidate solution measures the degree of matching between antibodies and antigen. In the present work, the affinity of a solution is inverse proportional to the solution fitness. We use the global cost given by the objective function of the optimization model USApHMP (3) to evaluate the fitness of each solution. Each solution in the population has a chance of surviving to the next generation with respect to its affinity level. The affinity value is computed for each solution to conduct the selection and cloning process.

Selection and cloning
Selection process is a stochastic procedure, based on the affinity/fitness value of each solution in the population, used to select the solutions that will move on to the subsequent generation. The stochastic universal sampling (SUS) is employed in this implementation of clonal selection algorithm (algorithm 3), it was developed as a single-phase sampling algorithm with minimum spread and zero bias. This selection method is a development of roulette wheel selection strategy (RWS); it uses N equally spaced pointers instead of a single selection pointer (N is the required number of selection), see algorithm 3. In the population, each individual gets a reproduction probability which depends on his own affinity value and the total affinity of all other individuals. The principle of this method consists to map the individuals to a line, such that each individual gets a portion of the line which is proportional to its affinity value. Then, a single random number (pointer1) is generated in the range [0, 1/N ] and N pointers are generated starting with pointer1 and Ì ISSN: 2088-8708 spaced by 1/N . The N individuals whose affinity spans the positions of the pointers are selected see Figure 1.
In Figure 1, the selected individuals consist of individuals 1, 2, 3, and 6. Unlike the RWS, this selection method gives weaker individuals in the population a chance to be selected. The antibodies that are selected from the initial population will be cloned directly proportional to their affinity values. The number of clones generated from each selected antibody depends on the selection probability of the antibody. The number of clones of a selected antibody i is hereafter denoted as N C i and computed as: where β is a multiplying factor (here β is set to β = 2), r i is the rank of the selected antibody i, and N s is the total number of selected antibodies. The function x gives the value of the integer nearest to x. From (8) results a proliferation phenomena according to which for an antibody with high affinity value (i.e. with low rank) corresponds a high number of clones. Consequently, the selection and cloning process gives to solutions with low costs (high affinity value) a high chance of being selected and also to be cloned with a high number of clones.

Hypermutation operator
Once the cloning process is achieved, the hypermutation operation is carried out on clones. In the present paper, a two-phase hypermutation method is adopted. This method consists first an interchanging mutation followed by a scramble mutation. This later is performed only when no affinity improvement is obtained in the interchanging mutation. To illustrate, let us consider an arbitrary solution vector s from the clone population. The interchanging mutation consists to choose randomly two elements, say s(i) and s(j) of the solution vector s, and then to interchange their respective hubs' values. If the affinity value of the resulting new solution vector s is better than that of the original solution s, then the antibody s is replaced by the new one s . In case of no affinity value improvement, a scramble mutation is then performed on the original antibody s. The scramble mutation consists to chose randomly a subset of elements and then to scramble or shuffle their respective hubs' values. Once again, if the affinity value of the resulting new solution improves that of the original solution, then the original antibody is replaced by the new one. If the affinity of the solution is not improved by both hypermutation phases, the original antibody string is conserved as is. Once the hypermutation process is accomplished, antibodies are sorted with respect to their respective affinity values.

Receptor editing operator
After the hypermutation process, new antibodies are introduced to ensure population diversity. Accordingly, a fraction of antibodies with the worst affinity values in the population are eliminated and replaced by randomly generated newcomers. The purpose of receptor editing operator consists to avoid pre-mature solution convergence by escaping from local optima.

Convergence and termination
When the maximum number of generations is reached, the convergence is thought to be attained and the corresponding solution can be considered as the best one, and the algorithm stops. In the contrary case, the search process is restarted from step 2 and the improvement process continues.

RESULTS AND DISCUSSION
In this section, computational results of the proposed CSA heuristic are provided and compared to the best results appeared in the literature. The clonal selection algorithm was implemented in MATLAB (2016a) and experiments are performed on a 2.00 GHz Intel Core IV PC, with 8 GB of RAM.
To compare our approach to the existing works, all experiments are conducted on four different sets of benchmark instances commonly used in the literature related to the HLP [4,14]. The first is the well-known CAB data set provided by the US civil aeronautics board. The CAB data set considers 100 cities in the US and provides Euclidean distances d ij and the amount of flows w ij routed between each pair of cities (i, j). Dealing with CAB data set, three instances with N ∈ 10, 20, 25 nodes and up to p = 4 hubs are selected. For these instances, the collection and distribution costs rates are equally estimated to χ = δ = 1 while the transfer cost rate α takes values between 0.2 and 1 (α ∈ [0.2, 1]). The second data set is established within the Australian postal (AP) delivery system where 200 postal districts in the metro Sydney area are considered. Transportation costs are proportional to the Euclidian distances d ij between two postal districts i and j with a postal flow w ij . Seven instances are also selected where the number N of nodes is such that N ∈ 10, 20, 25, 40, 50, 100, 200 nodes and up to p = 20 hubs. In the AP data set, collection, transfer and distribution costs rates are, respectively, estimated to χ = 3, α = 0.75 and δ = 2. The third data set is PlanetLab instances which is a real world nodeto-node delay data for performing internet measurements [18]. In these networks, α = χ = δ = 1 and the distance matrix does not respect the triangle inequality due to missing links. The flow w ij is equal to 0 if i = j and 1 otherwise. The fourth data set concerns the very large size instances with 1000 nodes of the Urand dataset which was generated by [18]. In these benchmarks, the nodes coordinates were randomly generated from 0 to 105 and the flow matrix was randomly generated.
The comparison of our approach against the existing works is made on the basis of both CPU time and solution quality criteria. Solutions quality comparisons is obtained by measuring the gap defined (in percent) as the absolute value of the ratio: Cost function obtained − Best exiting cost function Best exiting cost function (9) The Standard deviation of the average gap is also calculated: In all experiments conducted hereafter, each test instance was launched 20 times and the elimination ratio is set at 30% in the Receptor Editing process. The search process is altered whenever the maximum Ì ISSN: 2088-8708 number of generations is reached. In what follows, this stoppage criterion is set at 500 and 1000 for small and large problem instances, respectively.

Experiments set #1
: Case of small to moderate size problems By using the optimal solution obtained from CPLEX solver as a reference, we will show that our heuristic based on CSA also provides optimal solutions within a low computation time. For varying values of the parameters α and that of the number p of hubs, the results obtained for CAB instances are reported in Tables 2 and 3. For varying values of the number p of hubs, the results obtained for AP instances are given in Tables 4 and 5. In these tables, E (CPU time) and E (Gap), refer, respectively, to the average CSA computation time, and the average gap value of the 20 runs performed.
From the results reported in Tables 2, 3, 4, and 5, one may observe that our heuristic method reaches all previously known optimal solutions for small to moderate instances. From these table, one may also observe that the computation time is very low and the proposed approach is capable of providing an optimal solution in less than 1 second for instances with N = 50 nodes. Moreover, the robustness of our method is confirmed as indicated by the low values of standard deviation. Now that the proposed approach has been validated against previously published works for small to moderate problem sizes, numerical experiments will be run for large size problems.

Experiments set #2: Case of large size problems
In this set of experiments, the proposed approach is applied in large size problems. We will also show that our heuristic based on CSA provides fast, good and robust solutions. The performance of the proposed heuristic will be compared to 6 existing heuristic methods: Simulated annealing method (SA) [15], the hybrid Variable neighbourhood search (VNS) and path re-linking (PR) heuristics [16], Genetic algorithm (GAHUB2) in [17], the general variable neighborhood search (GVNS) [18] and the parallel genetic algorithm on GPU (GPU) [19]. In the present experiments, instances are selected from the AP data set where the number of the nodes N ∈ 100, 200, as shown in Tables 6 and 7 and from PlanetLab data set with up to 414 nodes, see  Tables 8 and 9.

Experiments set #3: Case of very large size problems
We also compare our heuristic method with the GVNS and GPU on very large instances from Urand data set with 1000 nodes generated by [18], as shown in Tables 10 and 11. By analyzing the above results, we conclude that our CSA method is more robust than other related methods for large sized problems, see the number of times it reached the best solution. In regards to small sized problems, the results of our approach are similar to those existed in the literature, but when the problem size increases our proposed heuristic enhances the results of GVNS and GPU in terms of solutions quality. Moreover, we could improve the best-known solution for eight instances in total.

CONCLUSION
In this paper we develop a clonal selection algorithm for solving the Uncapacitated single allocation p-Hub median problem. The clonal selection algorithm mimics the clonal selection principles in improving the affinity of individuals through generations in order to achieve the near optimal solution. The computational results state that our CSA method is more robust than other related methods for large sized problems, see the number of times it reached the best solution. In regards to small sized problems, the results of our approach are similar to those existed in the literature, but when the problem size increases our proposed heuristic enhances the results of GVNS in terms of solutions quality. Moreover, we could improve the best-known solution for six instances in total. The main contributions of this paper can be summarized as: i) To our best knowledge, CSA is applied to the USApHMP for the first time; ii) A new different selection method "Stochastic universal sampling "is used which is a development of the roulette wheel selection strategy (RWS) usually used in the literature; iii) An appropriate encoding scheme is used, which keep the feasibility of solutions through generations; and iv) Quality solution was enhanced for six instances.
In the future, we will concentrate on extending our heuristic method on an other more complex hub location problem model, which captures real life circumstances more effectively. The model will take into account the perishability nature of transported products in the network, i.e., the products have to be routed to their customers through the hub network in a limited time.