A hybrid constructive algorithm incorporating teaching-learning based optimization for neural network training

Received May 29, 2019; Revised Jan 16, 2020; Accepted Feb 1, 2020

In neural networks, simultaneously determining the optimal structure and weights is a challenge. This paper proposes a combination of the teaching-learning based optimization (TLBO) algorithm and a constructive algorithm (CA) to cope with this challenge. In the literature, TLBO is used to choose proper weights, while CA is adopted to construct different structures and select the proper one. In this study, both the basic TLBO algorithm and an improved version of it are used to select the network weights. Meanwhile, as the constructive algorithm, a novel modification of multiple operations using statistical tests (MOST) is applied and tested to choose the proper structure. The proposed combinatorial algorithms are applied to ten classification problems and two time-series prediction problems as benchmarks. The results are evaluated based on training and testing error, network complexity, and mean squared error. The experimental results illustrate that the proposed hybrid of the modified MOST constructive algorithm and the improved TLBO (MCO-ITLBO) algorithm outperforms the others; this has also been confirmed by Wilcoxon statistical tests. The proposed method achieves a lower average error with less complexity in the network structure.


1. INTRODUCTION
Artificial neural networks (ANN), having a strong resemblance to biological networks, are able to learn from noisy data as well as to classify and recognize different types of input patterns. This holds only if the neural network is well trained, without an over-fitted or under-fitted model. The most well-known training algorithm is back-propagation [1], but it has numerous drawbacks, such as becoming trapped in local minima [2]. Hence, researchers have turned to evolutionary algorithms instead. In addition to training and determining optimal weights, another critical issue is the design of an appropriate ANN architecture. Many studies have addressed architecture as well as weight optimization; for instance, a novel method based on Gaussian-PSO and fuzzy reasoning [3] optimizes both ANN weights and structure. In the literature, there are other methods to optimize ANN architecture, namely constructive algorithms and pruning algorithms. Constructive algorithms have many advantages over pruning algorithms, such as easy initialization, lower complexity of the final solution, and a lighter computational load. Furthermore, CAs are able to freeze existing weights in the neural network if they contribute usefully to the output, reducing the required time and memory. In pruning algorithms, several problem-dependent parameters must be properly identified in order to obtain an acceptable network with satisfactory performance, which makes them difficult to use in real-world applications [4].
This paper portrays a combination of random search procedures and systematic methods, proposing to hybridize improved teaching-learning algorithms with constructive algorithms for ANN design. The hybrid is advantageous because the teaching-learning algorithm is a parameter-free optimization algorithm that balances exploration and exploitation, while constructive algorithms are adopted to select an appropriate ANN architecture. Since constructive algorithms are cost-effective in terms of training time and ANN complexity, they prevent the production of networks with inefficient, very large architectures. With the aim of simultaneously optimizing ANN weights and architecture, this paper combines training and constructive algorithms and applies them to ten classification problems and two time-series prediction problems as benchmarks. After evaluating the performance of the proposed hybrid algorithms and comparing their results, we found that the proposed method outperformed the other algorithms, achieving a lower mean error in most cases. The rest of this study is organized as follows: Section 2 provides a brief description of the underlying algorithms. Section 3 presents the proposed hybrid method for ANN optimization. Section 4 reports the experimental results of applying the proposed approaches to the ANN problems, and the conclusion is drawn in the last section.

2. ALGORITHM DESCRIPTION

2.1. Improved teaching-learning based optimization (ITLBO)
Although TLBO provides high-quality solutions in little time and has great stability in convergence [5], in the learner phase of this algorithm each learner randomly chooses another learner from the population. This leads to a lack of balance between diversity and convergence. ITLBO overcomes this difficulty by improving the basic TLBO. In TLBO, the random choice of learning partners gives the algorithm low local search capability; in ITLBO, the added concept of a neighborhood reduces these random choices and exploits neighborhood information, increasing both local and global search capability. The main parts of ITLBO are as follows. Its teacher phase is the same as in the basic TLBO algorithm, sketched below for reference, while the learner phase is modified as described in the next subsection.
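Since the teacher phase is inherited unchanged from basic TLBO, a minimal Python sketch of that standard phase is given below for reference; the array layout, the greedy acceptance rule, and the `objective` callback are illustrative assumptions rather than the paper's exact implementation.

```python
import numpy as np

def teacher_phase(population, fitness, objective):
    """Standard TLBO teacher phase (minimization): every learner moves
    toward the teacher (best learner) relative to the class mean."""
    n, dim = population.shape
    teacher = population[np.argmin(fitness)]   # best learner in the class
    mean = population.mean(axis=0)             # class mean, per variable
    for i in range(n):
        TF = np.random.randint(1, 3)           # teaching factor, 1 or 2
        r = np.random.rand(dim)
        candidate = population[i] + r * (teacher - TF * mean)
        f = objective(candidate)
        if f < fitness[i]:                     # accept only improvements
            population[i], fitness[i] = candidate, f
    return population, fitness
```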

2.1.1. ITLBO learner phase
In this phase, each learner is encoded with an integer and placed in a rectangular array. Learners may learn from their neighbors or from the best individual in the whole class. This process adds local search ability and balances it against global search ability. In the local search, each learner $X_i$ updates its position with probability $P_c$ using both the best learner in its neighborhood, $X_{teacher}^{neigh}$, and the best learner in the whole population, $X_{teacher}$:

$$X_i^{new} = X_i + r_2\,(X_{teacher}^{neigh} - X_i) + r_3\,(X_{teacher} - X_i)$$

where $X_{teacher}^{neigh}$ is the teacher in the neighborhood, $X_{teacher}$ is the teacher of the whole class, and $r_2$, $r_3$ are random numbers in the range $(0, 1)$. The new position of each learner is accepted only if its fitness value has improved. For the global search, if the probability threshold $P_c$ is not met, each learner chooses a random learner $X_j$ from the whole class as the learning target; learning then occurs according to the learner phase of basic TLBO, moving toward $X_j$ if it is better than $X_i$ and away from it otherwise. Using these operations, both local and global search capability are obtained, and all learners accepted at the end of the learner phase are preserved. Owing to this enhanced exploitation ability, together with the exploration ability that already existed in the learner phase of the original algorithm, we use the concept of neighborhood in the classroom: each individual in the population has a number of neighborhood members that learn from the best among them. To maintain diversity, the neighborhood members of each individual are changed after a number of iterations, which balances the exploration and exploitation abilities. The algorithm has another advantage. When a new position is computed for a member, it may contain decision-variable values that lie outside the defined interval. In this case, most researchers clamp the variables to the upper and lower bounds, but this outdated approach can drive the algorithm toward local optima. In the improved teaching-learning based optimization method, we instead use a modified boundary-checking technique [6], whose advantage is that it avoids collapsing many decision variables onto identical bound values.
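The following minimal Python sketch illustrates the learner phase as described above, assuming a minimization problem; the names `Pc`, `neighbors`, and `objective` are illustrative, and the boundary-checking technique of [6] is omitted for brevity.

```python
import numpy as np

def itlbo_learner_phase(population, fitness, neighbors, objective, Pc=0.5):
    """Sketch of the ITLBO learner phase (minimization).
    `neighbors[i]` holds the indices forming learner i's neighborhood."""
    n, dim = population.shape
    g_best = np.argmin(fitness)                      # teacher of the whole class
    for i in range(n):
        if np.random.rand() < Pc:                    # local search branch
            nb = np.asarray(neighbors[i])
            n_best = nb[np.argmin(fitness[nb])]      # teacher of the neighborhood
            r2, r3 = np.random.rand(dim), np.random.rand(dim)
            cand = (population[i]
                    + r2 * (population[n_best] - population[i])
                    + r3 * (population[g_best] - population[i]))
        else:                                        # global search branch
            j = np.random.choice([k for k in range(n) if k != i])
            r = np.random.rand(dim)
            if fitness[j] < fitness[i]:              # basic TLBO learner rule
                cand = population[i] + r * (population[j] - population[i])
            else:
                cand = population[i] + r * (population[i] - population[j])
        f = objective(cand)
        if f < fitness[i]:                           # keep only improvements
            population[i], fitness[i] = cand, f
    return population, fitness
```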

2.2. Modified MOST algorithm (MMOST)
Determining the architecture of artificial neural networks has attracted many researchers in recent years. We build on the Multiple Operations using Statistical Tests (MOST) algorithm [7]. In MOST, there is no controlled mechanism for changing the structure, so the algorithm may make large changes to the network structure as it runs. Another weakness is that layers are added frequently, without any controlling condition. In the modified MOST algorithm, the operator pool is removed; to change the structure, neurons are added one after another, and new structures are selected more carefully by adding multiple conditions. The algorithm starts with a single-hidden-layer network with the minimum allowed number of neurons, for which we chose a popular setting: the average of the numbers of input and output units. Neurons are then added to the hidden layer one by one to obtain a proper network structure. To avoid creating very large networks, neurons are added to the single hidden layer only until the Max-hidden limit is reached; networks with very large structures not only generalize poorly but also increase the computational time of the algorithm. To go beyond this limit, we add a second layer to the network structure, with a probability less than P, to create a proper architecture; after adding the second layer, the number of neurons in each hidden layer is reset to Min-hidden. The MMOST constructive algorithm then chooses the best architecture among the constructed structures. In summary, the differences between MMOST and MOST are as follows: the operator pool is deleted; neurons are added one at a time; and a more careful choice is made among the previous, current, and candidate architectures. A sketch of this constructive loop follows.
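The following is a hedged Python sketch of the constructive loop described above, not the authors' exact procedure: `train_and_score` stands in for training a candidate network (e.g. with ITLBO) and returning its validation error, and the growth and second-layer conditions are simplified.

```python
import random

def mmost_search(n_in, n_out, max_hidden, p_second_layer, train_and_score):
    """Simplified sketch of the MMOST constructive search.
    `train_and_score(layers)` trains the candidate network defined by the
    hidden-layer sizes in `layers` and returns its validation error."""
    min_hidden = (n_in + n_out) // 2          # starting width: average of I/O sizes
    layers = [min_hidden]                      # begin with one hidden layer
    best_layers, best_err = list(layers), train_and_score(layers)
    while layers[-1] < max_hidden:             # grow the layer neuron by neuron
        layers[-1] += 1
        err = train_and_score(layers)
        if err < best_err:                     # keep a candidate only if it improves
            best_layers, best_err = list(layers), err
    if random.random() < p_second_layer:       # optionally add a second hidden layer
        layers = [min_hidden, min_hidden]      # widths reset to the minimum
        err = train_and_score(layers)
        if err < best_err:
            best_layers, best_err = list(layers), err
    return best_layers, best_err
```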

3. THE PROPOSED METHOD
In this paper, we propose a combined algorithm that produces a neural network with proper structure and weights, optimizing both simultaneously. For this purpose, the modified MOST constructive algorithm is combined with an improved version of the training algorithm. The role of the constructive algorithm is to construct different structures and select the proper one, using a systematic switching approach among the structures allowed for the neural network. The role of the training algorithm, in turn, is to find optimal weights for the structure created by the constructive algorithm. Using constructive algorithms to create a network architecture reduces computational cost and complexity; however, on their own they perform poorly on noisy problems [8], and combining them with other techniques, such as evolutionary algorithms, can remedy this. In addition, we have made some modifications to the MOST constructive algorithm. The pseudo-code of the proposed hybrid algorithms is shown in Figure 1, and the overall combination of evolutionary training algorithms and constructive algorithms is shown as a flowchart in Figure 2.
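Since the pseudo-code of Figure 1 is not reproduced here, the sketch below shows how the two components could interact under the described division of roles; `propose_structures`, `itlbo_train`, and `evaluate` are placeholder names, not the paper's identifiers.

```python
def hybrid_mco_itlbo(propose_structures, itlbo_train, evaluate):
    """Sketch of the hybrid loop: the constructive algorithm (MMOST)
    proposes structures one at a time; ITLBO searches the weights for
    each; the best (structure, weights) pair found so far is retained."""
    best = None                              # (error, structure, weights)
    for structure in propose_structures():   # systematic structure search
        weights = itlbo_train(structure)     # evolutionary weight search
        err = evaluate(structure, weights)
        if best is None or err < best[0]:
            best = (err, structure, weights)
    return best
```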

4. COMPARISON RESULTS
In this section, we evaluate the effectiveness of the proposed hybrid methods. The algorithms are applied to ten classification problems and two time-series prediction problems. We compare the performance of the proposed hybrid algorithms first with each other and then with other available methods.

4.1. Definition of classification and time series prediction problems
Classification is the task of assigning a sample to a proper group based on the characteristics describing that object. The classification problems used in this article include iris, diabetes diagnosis, thyroid, breast cancer, credit card, glass, heart, wine, page blocks, and liver, all taken from the UCI machine learning repository [9]. Time-series prediction problems, by contrast, use a model to predict future values based on previous ones. The first is the gas furnace dataset [10], compiled from Box and Jenkins' book on time series analysis, which contains the gas feed rate and the CO2 percentage in the output gas. The other is the Mackey-Glass dataset, obtained from the delay differential equation

$$\frac{dx(t)}{dt} = \frac{a\,x(t-\tau)}{1 + x(t-\tau)^{10}} - b\,x(t)$$

where $a$ and $b$ are positive constants and $\tau$ is the delay. All proposed hybrid algorithms in this article were implemented in MATLAB, and 30 independent runs were used to evaluate the performance of each method. The 4-fold cross-validation method was used to divide the original dataset into training and testing sets; because both the training and testing samples contribute to learning as much as possible, it provides a satisfactory learning effect and helps prevent trapping in local minima. The average error over the 4 folds is reported as the final error of the network. In addition, the input dataset is normalized to the interval [-1, 1] using min-max normalization. The comparison results are presented in two parts: first, the proposed algorithms are compared with each other, and then the best proposed method is compared with existing methods.
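As an illustration, the following sketch generates a Mackey-Glass series by Euler integration and applies the min-max normalization described above; the parameters a = 0.2, b = 0.1, τ = 17, the constant initial history, and the series length are common benchmark choices assumed here, not values stated in this paper.

```python
import numpy as np

def mackey_glass(n, a=0.2, b=0.1, tau=17, x0=1.2, dt=1.0):
    """Euler integration of the Mackey-Glass delay equation with a
    constant initial history (common benchmark settings, assumed here)."""
    x = np.full(n + tau, x0)
    for t in range(tau, n + tau - 1):
        x[t + 1] = x[t] + dt * (a * x[t - tau] / (1 + x[t - tau] ** 10) - b * x[t])
    return x[tau:]

def minmax_to_pm1(data):
    """Min-max normalization to [-1, 1], as used for the network inputs."""
    lo, hi = data.min(), data.max()
    return 2 * (data - lo) / (hi - lo) - 1

series = minmax_to_pm1(mackey_glass(1000))
```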

4.2. Comparing proposed methods with each other
Each of these algorithms was executed 30 times, and the experimental results were compared according to three criteria: the classification error percentage on training data, the classification error percentage on testing data, and the complexity percentage. The error function is RMSE for the Mackey-Glass problem and MSE for the gas furnace problem. First, we compare the two kinds of training algorithm: the classic training algorithm (back-propagation) and the evolutionary training algorithm (improved teaching-learning based optimization). The results in Table 1 show that the ITLBO algorithm is more effective on most datasets; according to Table 1, ITLBO outperforms the BP algorithm on all classification problems. The second part of Table 1 then compares the proposed hybrid algorithms with each other, with all results based on three characteristics: training and testing error for classification, MSE, and complexity. To better identify the superior algorithm, we performed an average-rank test; the average ranks over the different datasets are presented in Table 2. As can be seen there, the hybrid of the modified MOST constructive algorithm with ITLBO achieves the best average rank.
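To illustrate how such comparisons can be computed, the sketch below derives per-algorithm average ranks and a Wilcoxon signed-rank test from a matrix of per-dataset errors; the numbers are hypothetical placeholders, not the paper's results.

```python
import numpy as np
from scipy.stats import rankdata, wilcoxon

# Hypothetical testing-error matrix: rows = datasets, columns = algorithms.
errors = np.array([[4.1, 3.6, 3.9],
                   [12.0, 11.2, 11.8],
                   [7.5, 6.9, 7.7],
                   [2.3, 2.0, 2.4],
                   [9.8, 9.1, 9.6]])

# Average rank per algorithm (rank 1 = lowest error on a dataset).
ranks = np.vstack([rankdata(row) for row in errors])
print("average ranks:", ranks.mean(axis=0))

# Wilcoxon signed-rank test between the first two algorithms' errors.
stat, p = wilcoxon(errors[:, 0], errors[:, 1])
print(f"Wilcoxon statistic = {stat}, p-value = {p:.3f}")
```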

4.3. Results of comparing the best proposed hybrid method with other methods
In this section, we compare our hybrid algorithms with other methods from the literature in Table 4, which collects the training and testing error percentages. Each cited article works on its own batch of datasets; cells without a value (indicated with a '-') correspond to missing data or to datasets that the respective article did not use. We give a brief description of the comparative approaches below and cite every approach against which our best proposed method is compared.

Table 4. Comparing the results of the best algorithm with other methods in the literature

5. CONCLUSION
In this paper, we proposed hybridizing training algorithms with constructive algorithms to simultaneously determine the weights and structure of a neural network. The goal was to examine the hybridization of a deterministic, systematic procedure (the constructive algorithm) with random search (the evolutionary algorithm) for neural network optimization. The combined methods pair the basic and improved versions of the TLBO algorithm with the MMOST algorithm. We compared the hybrid algorithms and selected the superior one on classification and time-series prediction problems. The comparison shows that the best performance belongs to the MCO-ITLBO algorithm: its training algorithm resists premature convergence and balances exploitation and exploration, and in combination with the MMOST constructive algorithm it selects the optimal network structure more effectively. We verified these results with statistical tests, and a final comparison with other methods from the literature showed that it achieves lower classification and time-series prediction error than the other algorithms. These promising results motivate future work, such as incorporating chaotic maps into the method.