Optimization of Fuzzy Tsukamoto Membership Function using Genetic Algorithm to Determine the River Water Quality

Aquatic ecosystems in rivers depend on the river water, so its quality needs to be maintained through regular measurement and analysis. STORET is one of the methods used to measure river water quality, but it is time-consuming and costly. Fuzzy Tsukamoto is an alternative method that works by grouping the river water data, but its membership function values are difficult to determine. The solution offered in this study is to use a genetic algorithm to determine the membership function values of each criterion. Based on the test results, optimizing the fuzzy membership function with a genetic algorithm yields an accuracy of 95%, compared with 90% without the optimization process. The genetic algorithm parameters used are as follows: population size 80, generation number 175, crossover rate (cr) 0.6, and mutation rate (mr) 0.4.


INTRODUCTION
Water is a natural resource that is very important to human beings and other living beings. A river is a natural channel that holds water resources flowing to the lowest point on the earth's surface. Flowing from upstream to downstream, rivers are home to several aquatic ecosystems. The river is also one of the nearest water sources for residents of rural and urban areas in Indonesia. However, rivers, which are an important source of human life, have been polluted. The pollution is mostly caused by human activity in the catchment area that supplies the river, which then flows on to residential areas. Increased industrial and household activity is one of the main causes of pollution in the watershed.
The disposal of industrial and household waste without prior screening contributes to a decrease in the quality and quantity of the river water. Therefore, an effort is needed to maintain the quality, quantity, and continuity of the water so that the ecosystems in it remain balanced [1]. This is done by monitoring and controlling water pollution regularly, that is, by measuring and analyzing the water quality so that the factors affecting the pollution can be identified, as stipulated in the Indonesian Government Regulation No. 82 of 2001. According to Article 2 of the Decree of the Ministry of the Living Environment of Indonesia No. 115 of 2003, the status of water quality is determined using the STORET method or the Pollution Index method. Determination of water quality using the STORET method is still done manually, so it takes a long time and requires expensive laboratory work. Existing technologies can assist in the calculation and determination of water quality and reduce the time and cost required.
Research on the determination of river water quality has been conducted by Alam, Soebroto, and Dewi (2015) using the Fuzzy Tsukamoto method. In that study, the river water quality is calculated from the physicochemical criteria stipulated in the Indonesian Government Regulation No. 82 of 2001. The parameters used are TSS, BOD, DO, pH, phenols, oil, and grease. The study uses 60 river water records, calculates their quality using the Fuzzy Tsukamoto method, and compares the results with those of the STORET method. In the test results, 6 records were classified differently by the Fuzzy Tsukamoto method and the STORET method, so the accuracy obtained is 90% [2].
An optimization of the fuzzy membership function of each criterion is therefore required to obtain a higher accuracy value.
Optimization using genetic algorithms has been carried out successfully by previous researchers on a range of problems. The genetic algorithm has been applied to the flexible job-shop scheduling problem [3], ship route scheduling [4], and frozen food distribution [5]. Therefore, this research uses the genetic algorithm to optimize the membership function in fuzzy Tsukamoto for measuring river water quality. The water quality is divided into four classes based on predetermined criteria. The solution offered in this research is to form the membership function in fuzzy Tsukamoto so that the river water quality is measured with a higher accuracy value.

WATER
Water is a part of life on the earth's surface that has an important role. Water is one of the primary materials required to meet the needs of many people, and indeed of all living beings. According to the Indonesian Government Regulation No. 82 of 2001, water quality is classified into four classes as follows:
Class I: water that can be used for drinking and/or other uses that require the water quality equal to that usability.
Class II: water that can be used for infrastructure/water recreation facilities, freshwater fish farming, animal husbandry, water to irrigate crops or other uses that require the water quality equal to that usability.
Class III: water that can be used for freshwater fish farming, animal husbandry, water to irrigate crops or other uses that require the water quality equal to that usability.
Class IV: water that can be used to irrigate crops or other uses that require the water quality equal to that usability.
According to the Regulation of the Ministry of the Living Environment of Indonesia No. 01 of 2010, water pollution is defined as the entry of living beings, substances, energy, or other components into the water as a result of human activity. This can degrade the water quality to a level at which the water can no longer function according to its intended use. The liquid waste that pollutes the watershed can be divided into domestic waste and industrial waste.

PHYSICOCHEMICAL PARAMETERS
To detect water pollution, water pollutant parameters are used as indicators so that the pollution can be prevented and controlled. This research follows the guidelines of STORET (storage and retrieval), a method used to assess water quality status. The basic concept of STORET is to compare water quality data with the corresponding standard [6]. The STORET parameters include, among others, aldrin, dieldrin, chlordane, dichlorodiphenyltrichloroethane (DDT), detergent, lindane, polychlorinated biphenyls (PCB), endrin, benzene hexachloride (BHC), fecal coliform, and total coliform. Determining water quality with STORET requires all of these parameters, whereas with Fuzzy Tsukamoto the water quality can be calculated with only 8 parameters: Total Suspended Solids (TSS), Biological Oxygen Demand (BOD), Chemical Oxygen Demand (COD), Dissolved Oxygen (DO), degree of acidity (pH), phenol, fats, and oils. Because Fuzzy Tsukamoto uses fewer parameters than STORET, it can reduce the cost of inspection.

FUZZY LOGIC
Lotfi A. Zadeh introduced fuzzy logic, a set-theoretic logic that handles values lying between the truth values 'true' and 'false'. Fuzzy logic adopts the human way of thinking, so a value is not only 0 or 1 but can be anything between 0 and 1 [7]. Linguistic variables were introduced to describe fuzzy phenomena in natural language quantitatively. When expressed in linguistic terms, a fuzzy truth value can be very true, true, fairly true, less true, or not true [8].
In fuzzy systems, mapping an input to an output using fuzzy logic is called fuzzy reasoning. Fuzzy reasoning can be carried out using the Tsukamoto, Mamdani, or Sugeno method. Each type of reasoning obtains the output in a different way. Fuzzy reasoning consists of five main parts as follows [9]:
a. Fuzzification of input variables in the form of crisp data.
b. The use of a fuzzy operator (OR or AND).
c. The implication of the premise.
d. Aggregation of the effects based on the rule base that has been determined.
e. Defuzzification.
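The five parts above can be illustrated with a minimal Tsukamoto-style sketch. In Tsukamoto reasoning each rule has a monotonic consequent, so a firing strength maps back to a single crisp value, and the final output is the weighted average of those values. The two input criteria, all breakpoints, the two rules, and the 0 to 100 output score below are hypothetical values chosen only for illustration; they are not the membership functions or rule base of this study.

```python
def falling(x, lo, hi):
    """Monotonically decreasing membership: 1 at or below lo, 0 at or above hi."""
    if x <= lo:
        return 1.0
    if x >= hi:
        return 0.0
    return (hi - x) / (hi - lo)


def rising(x, lo, hi):
    """Monotonically increasing membership: 0 at or below lo, 1 at or above hi."""
    return 1.0 - falling(x, lo, hi)


def inverse_rising(alpha, lo, hi):
    """Crisp z whose rising membership on [lo, hi] equals alpha."""
    return lo + alpha * (hi - lo)


def inverse_falling(alpha, lo, hi):
    """Crisp z whose falling membership on [lo, hi] equals alpha."""
    return hi - alpha * (hi - lo)


def tsukamoto(do_mg_l, bod_mg_l):
    # a. Fuzzification (hypothetical breakpoints, for illustration only).
    do_low, do_high = falling(do_mg_l, 2.0, 6.0), rising(do_mg_l, 2.0, 6.0)
    bod_low, bod_high = falling(bod_mg_l, 2.0, 12.0), rising(bod_mg_l, 2.0, 12.0)

    # b-c. Rule 1: IF DO is high AND BOD is low THEN score is good (rising consequent).
    a1 = min(do_high, bod_low)
    z1 = inverse_rising(a1, 0.0, 100.0)
    # b-c. Rule 2: IF DO is low OR BOD is high THEN score is poor (falling consequent).
    a2 = max(do_low, bod_high)
    z2 = inverse_falling(a2, 0.0, 100.0)

    # d-e. Aggregate the rules and defuzzify with a weighted average.
    total = a1 + a2
    return (a1 * z1 + a2 * z2) / total if total > 0 else 0.0


score = tsukamoto(4.0, 7.0)  # both rules fire with strength 0.5
```

With these illustrative breakpoints, an input halfway along both ramps fires both rules equally and yields a mid-range score, while a sample with high DO and low BOD yields the maximum score.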

GENETIC ALGORITHM
The genetic algorithm (GA) is inspired by evolutionary biology and is used to find approximate solutions to a problem, especially in optimization [10]. A GA maintains a population of chromosomes, each of which represents a possible solution. It forms a new generation in each iteration through three main processes: crossover, mutation, and selection [11]. These processes are the basic principles by which a GA explores and exploits the space of possible solutions [12].
The process of the genetic algorithm is as follows [13]:
a. Generating a population of several random individuals (chromosomes), each composed of specific genes.
b. Calculating the fitness value of each individual.
c. Performing reproduction to produce offspring through crossover and mutation.
d. Selecting, from the combined population of parents and offspring, the individuals that survive into the next generation and replace the old population.
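Steps a to d can be sketched as a generic loop in which the operators are passed in as functions. The function name `run_ga`, the toy one-gene problem, and in particular the choice to produce `cr * pop_size` crossover offspring and `mr * pop_size` mutation offspring per generation are assumptions made for this illustration; the text does not state how cr and mr are applied.

```python
import random


def run_ga(fitness, init, crossover, mutate, pop_size, generations, cr, mr, rng):
    """Generic GA loop following steps a-d; operators are supplied by the caller."""
    population = [init(rng) for _ in range(pop_size)]        # step a
    for _ in range(generations):
        offspring = []
        for _ in range(round(cr * pop_size)):                # step c: crossover
            p1, p2 = rng.sample(population, 2)
            offspring.append(crossover(p1, p2, rng))
        for _ in range(round(mr * pop_size)):                # step c: mutation
            offspring.append(mutate(rng.choice(population), rng))
        pool = population + offspring
        pool.sort(key=fitness, reverse=True)                 # step b: evaluate and rank
        population = pool[:pop_size]                         # step d: survivors
    return population[0]


# Toy problem (hypothetical): find the x that maximizes -(x - 3)^2.
def toy_init(rng):
    return [rng.uniform(-10.0, 10.0)]


def toy_crossover(p1, p2, rng):
    a = rng.random()
    return [p1[0] + a * (p2[0] - p1[0])]


def toy_mutate(p, rng):
    return [p[0] + rng.uniform(-1.0, 1.0)]


def toy_fitness(c):
    return -(c[0] - 3.0) ** 2


best = run_ga(toy_fitness, toy_init, toy_crossover, toy_mutate,
              pop_size=20, generations=50, cr=0.6, mr=0.4, rng=random.Random(0))
```

Passing the operators as arguments keeps the loop independent of the chromosome encoding, so the same skeleton could host the real-coded operators described in the methodology.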

METHODOLOGY
The data used in this study consist of 60 records of river water samples. The data are grouped into four classes according to the water quality classification regulated by the Indonesian Government Regulation No. 82 of 2001. The water quality is symbolized by A for class I, B for class II, C for class III, and D for class IV.

Chromosome Representation
The first step in the GA process is to determine the type of chromosome representation. This research uses a real-coded representation. Each chromosome has 20 genes, which represent the values of the fuzzy membership functions for all criteria. Figure 1 shows an example of such a chromosome.
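A real-coded chromosome of this kind can be sketched as a plain list of 20 floating-point genes. The per-gene bounds below are hypothetical placeholders, and the sketch does not attempt to reproduce how the 20 genes are assigned to the individual criteria, since that mapping is given by Figure 1 rather than by the text.

```python
import random

N_GENES = 20  # stated in the text: 20 genes per chromosome


def random_chromosome(bounds, rng):
    """Draw one real-coded chromosome, one gene per (min, max) pair."""
    return [rng.uniform(lo, hi) for lo, hi in bounds]


def initial_population(pop_size, bounds, rng):
    """Build a population of independently drawn chromosomes."""
    return [random_chromosome(bounds, rng) for _ in range(pop_size)]


# Hypothetical bounds: every gene may vary in [0, 10].
bounds = [(0.0, 10.0)] * N_GENES
population = initial_population(80, bounds, random.Random(42))
```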

Fitness Function
To calculate the fitness value, the gene values of each chromosome are applied to the fuzzy membership functions according to the parameters that have been determined. The class of each of the 60 river water records is then predicted using fuzzy Tsukamoto with the resulting membership functions. Once the classes are obtained, the accuracy is calculated by comparing the fuzzy Tsukamoto classification with the classification produced by the STORET method. The fitness value is the accuracy value calculated using Equation (1).
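Read as a fraction of matching records, the fitness can be sketched as below. This reading is an assumption: it is consistent with the figures reported elsewhere in the paper (57 of 60 matching records corresponding to 95%, and fitness values such as 0.938), but the function name and the synthetic labels are illustrative only.

```python
def accuracy_fitness(predicted, reference):
    """Fraction of records whose predicted class equals the reference class."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


# Synthetic labels for illustration: 60 records, 3 of which disagree.
reference = ["A", "B", "C", "D"] * 15
predicted = list(reference)
predicted[0], predicted[1], predicted[2] = "B", "C", "D"

fitness = accuracy_fitness(predicted, reference)  # 57 of 60 match, i.e. 0.95
```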

Crossover
The crossover process works by exchanging information between the chromosomes of two parents [14]. Crossover provides a way to explore new regions of the search space [15].
The crossover method used is extended intermediate crossover, which combines the genes of both parents to produce two offspring [16].
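A minimal sketch of extended intermediate crossover follows, using the commonly cited formulation C1 = P1 + α(P2 − P1) and C2 = P2 + α(P1 − P2). The range [-0.25, 1.25] for α and the choice of drawing a separate α for each gene are assumptions taken from that common formulation, not values stated in this paper.

```python
import random


def extended_intermediate_crossover(p1, p2, rng, low=-0.25, high=1.25):
    """Return two offspring; each gene uses its own alpha drawn from [low, high]."""
    alphas = [rng.uniform(low, high) for _ in p1]
    c1 = [a + alpha * (b - a) for a, b, alpha in zip(p1, p2, alphas)]
    c2 = [b + alpha * (a - b) for a, b, alpha in zip(p1, p2, alphas)]
    return c1, c2


parent1 = [1.0, 4.0, 7.5]
parent2 = [3.0, 2.0, 7.5]
child1, child2 = extended_intermediate_crossover(parent1, parent2, random.Random(1))
```

Because α may fall slightly outside [0, 1], the offspring can lie a little beyond the segment joining the parents, which is what lets this operator explore outside the region the parents already span.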

Mutation
The mutation process helps prevent the solutions from becoming trapped in a local optimum because it maintains the diversity of chromosomes in the population [14]. Mutation restores lost genetic information and randomly perturbs genetic information to help the exploration of the search space [17].
The mutation method used is random mutation, which changes the value of a selected gene by a small random amount. Suppose variable x_i has a value range from min_i to max_i and the offspring formed is C = [x'_1, ..., x'_n]; the offspring is then formed using Equation 3, with r chosen at random from the range [-0.1, 0.1].
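A minimal sketch of this operator, assuming the usual form x'_i = x_i + r (max_i − min_i) for the selected gene. Selecting a single gene uniformly at random and clamping the result back into [min_i, max_i] are added assumptions that the text does not state.

```python
import random


def random_mutation(chromosome, bounds, rng, r_low=-0.1, r_high=0.1):
    """Return a copy of the chromosome with one randomly chosen gene perturbed."""
    child = list(chromosome)
    i = rng.randrange(len(child))          # assumption: mutate exactly one gene
    lo, hi = bounds[i]
    r = rng.uniform(r_low, r_high)
    child[i] = child[i] + r * (hi - lo)    # x'_i = x_i + r (max_i - min_i)
    child[i] = min(max(child[i], lo), hi)  # assumption: clamp to the valid range
    return child


gene_bounds = [(0.0, 10.0), (0.0, 14.0), (0.0, 100.0)]
parent = [5.0, 7.0, 50.0]
mutant = random_mutation(parent, gene_bounds, random.Random(3))
```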

Selection
Selection is the step in GA that chooses the individuals to be carried into the next generation [18]. The selection method used is elitism, which keeps the individuals with the highest fitness.
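The elitist replacement described in this section can be sketched in a few lines: parents and offspring are pooled, ranked by fitness, and truncated to the population size. The function name and the toy integer "chromosomes" are illustrative only.

```python
def elitism_selection(parents, offspring, fitness, pop_size):
    """Keep the pop_size fittest individuals from the combined pool."""
    pool = parents + offspring
    ranked = sorted(pool, key=fitness, reverse=True)  # highest fitness first
    return ranked[:pop_size]


# Toy example: each individual is a number and its fitness is its own value.
survivors = elitism_selection([3, 9, 1], [7, 8], fitness=lambda x: x, pop_size=3)
```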
All chromosomes of the parents and offspring are combined into one pool and sorted from the highest to the lowest fitness value. The top popsize chromosomes, where popsize is the population size, are carried into the next generation [19].
The tests conducted in this study are of three types: testing of the population size, testing of the combination of cr and mr, and testing of the generation number. These tests are done to obtain the best parameters for the genetic algorithm. Each test scenario is performed 10 times to obtain an average fitness value.
In the testing of the population size, the population sizes used are multiples of 20, from 20 to 200. The cr value used is 0.5 and the mr value is 0.5. The generation number used is 100. The test results for the population size are shown in Figure 9.

Figure 9. Test results of the population size
Figure 9 shows that the best average fitness value, 0.938, is produced by a population size of 80. In general, the test results show that a larger population gives a greater chance of obtaining optimum solutions and therefore of producing high fitness values.
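The population-size test protocol can be sketched as a sweep in which each setting is run 10 times and averaged. The function names are hypothetical, and `stand_in_run` is only a placeholder that returns a random number so the sketch executes; it does not reproduce any result of this study and would be replaced by an actual GA run that returns its best fitness.

```python
import random
from statistics import mean


def average_fitness(run_once, trials=10, seed=0):
    """Average the fitness returned by several independent runs of one scenario."""
    return mean(run_once(random.Random(seed + t)) for t in range(trials))


def sweep_population_sizes(run_ga_best_fitness):
    """Test population sizes 20, 40, ..., 200 with cr = 0.5, mr = 0.5, 100 generations."""
    results = {}
    for pop_size in range(20, 201, 20):
        results[pop_size] = average_fitness(
            lambda rng: run_ga_best_fitness(pop_size, 100, 0.5, 0.5, rng)
        )
    return results


def stand_in_run(pop_size, generations, cr, mr, rng):
    # Placeholder for a real GA run; the value returned here is meaningless.
    return rng.random()


results = sweep_population_sizes(stand_in_run)
```

The same pattern applies to the other two tests by sweeping the cr and mr combination or the generation number while holding the remaining parameters fixed.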
The testing of the combination of cr and mr is used to determine the cr and mr values that produce the best solution. The population size used is 80 and the generation number used is 100. The test results for the combination of cr and mr are shown in Figure 10. Figure 10 shows that the best average fitness value is produced by the combination of cr = 0.6 and mr = 0.4. Figure 10 also shows that the average fitness varies widely, because there is no fixed rule for which cr and mr values should be used to obtain an optimal solution. Each problem has its own best combination of cr and mr. If the values of cr and mr are not well chosen, exploration and exploitation cannot produce a favorable solution, and premature convergence may occur [20].
The testing of the generation number is done to find the generation number that produces an optimal solution. The generation numbers used are multiples of 25, from 25 to 250. The other parameters are taken from the previous test results: the population size is 80, the cr value is 0.6, and the mr value is 0.4. The test results for the generation number are shown in Figure 11.

Figure 11. Test result of the generation number
Based on the test results in Figure 11, a generation number of 175 produces the highest average fitness value. Generation numbers from 75 to 150 produce unstable fitness values, whereas generation numbers from 175 to 250 begin to show stability, with no large change in the average fitness value. A larger generation number requires a longer computation time, and the resulting solution is not necessarily better.
The genetic algorithm parameters obtained from the tests are used to classify the 60 records of river water samples. Table 1 compares the water quality classification results for 2009 obtained using STORET, Fuzzy Tsukamoto, and optimized Fuzzy Tsukamoto. With the optimized fuzzy Tsukamoto, 57 of the 60 records receive the same class as in the classification performed using the STORET method. This corresponds to an accuracy of 95%.

CONCLUSION
Based on the test results and analysis, it can be concluded that the genetic algorithm, with a real-coded representation, can be used to optimize the membership function in fuzzy Tsukamoto for determining river water quality. The genetic algorithm parameters strongly affect the best chromosome, which supplies the membership function values in fuzzy Tsukamoto. The parameters that produce the best solution are as follows: population size 80, generation number 175, cr 0.6, and mr 0.4. In the tests, optimizing the membership function in fuzzy Tsukamoto with the genetic algorithm produces a higher accuracy, 95%, than the measurement without optimization, 90%. In further research, the rule base of the Tsukamoto fuzzy inference system can also be optimized using the genetic algorithm, so that the accuracy may exceed that obtained by optimizing the membership function alone. A hybridization of the genetic algorithm and Variable Neighborhood Search (VNS) can also be developed for this problem so that the solution is obtained more efficiently [21].