Optimizing requirement analysis by the use of meta-heuristic in Search Based Software Engineering

Requirements analysis is the first phase of the software development process and one of the main concerns of software engineers. Selecting requirements is a complex problem because users are heterogeneous and have varied interests and demands. This paper argues that there is a strong need for optimization in requirement analysis and that requirement selection can be viewed as an application area of Search-Based Software Engineering (SBSE). The aim is to justify the claim that requirement engineering can be reformulated as a search problem to which meta-heuristic techniques can be applied.

INTRODUCTION
Requirement analysis is an important part of the software engineering process. The process of requirement engineering is described as a series of activities including elicitation, modeling and analysis, specification, and validation and verification.
In a given problem, the number of requirements is sometimes so large that implementing all of them is very difficult. It is also observed that the sets of requirements gathered from heterogeneous users vary in size.
A key aspect of any successful project is determining an appropriate set of requirements for implementation. Requirements selection needs to be optimized to limit the seemingly endless flow of demands posed by the users. Requirements selection is an engineering process that selects an optimal set of requirements out of all the requirements proposed by the customers. The objective of requirement analysis is to address the question "What do you want to achieve in a product or service?", which is, however, a subtle task in practice.
Hence, a formalized representation technique is required to optimize the requirement selection process, and representing these requirements in a formalized way is a major challenge. This suggests that requirements selection can be viewed as an application area of Search-Based Software Engineering (SBSE), which seeks to reformulate software engineering problems as search problems. Optimization is needed to find the best requirement set using metaheuristics. Harman [1,2] coined the term Search Based Software Engineering for the application of search-based optimization to software engineering. The first objective of this paper is to reformulate requirement selection as a search-based problem.

REQUIREMENT SELECTION: A SEARCH PROBLEM
It is assumed that an initial set of requirements has been collected and that the requirements have been identified using a requirements elicitation process. The task of optimizing requirements is usually regarded as a challenging and hard problem in itself [3,4,5]. Several feasible approaches in previous work address this problem [6,7,8], but none is definitive. In software engineering, determining the set of requirements is a critical foundation for the success of a project. Including or excluding requirements inappropriately may result in products that fail to satisfy stakeholders' needs and might cause a huge loss of revenue. However, uncertainty is inevitable in the early phase of requirements engineering and can lead to unsound requirement decisions. Requirement engineering therefore needs intelligent techniques to overcome these types of problems. The first objective is to represent the problem as a search space. Selection is then performed using a fitness function, following an encoding scheme. Crossover, mutation and replacement then take place, and these steps are repeated until the termination criterion is reached.
The framework proposed in this paper works in the context of representing requirements as a state space. Its emphasis is to provide a systematic method that uses search-based optimization to address the computational and cognitive complexity of requirement selection.
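The loop described above (represent, evaluate, select, cross over, mutate, replace, repeat) can be sketched in Python as follows. This is only an illustrative skeleton under stated assumptions: the requirement IDs, the `PREFERRED` set, the toy `fitness` function, the binary tournament in `select`, and all parameter values are hypothetical placeholders, not the encoding, fitness function or selection operator proposed in this paper.

```python
import random

REQUIREMENTS = range(1, 9)   # hypothetical requirement IDs R1..R8
PREFERRED = {1, 2, 4, 7}     # hypothetical stand-in for stakeholder preference


def fitness(individual):
    # Toy objective (assumption): reward preferred requirements, penalise others.
    return len(individual & PREFERRED) - len(individual - PREFERRED)


def select(population, rng):
    # Binary tournament, used here only as a simple placeholder operator.
    a, b = rng.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b


def crossover(parent1, parent2, rng):
    # Each requirement found in either parent is inherited from a random donor.
    child = set()
    for req in sorted(parent1 | parent2):
        donor = parent1 if rng.random() < 0.5 else parent2
        if req in donor:
            child.add(req)
    return frozenset(child)


def mutate(individual, rng, rate=0.1):
    # Flip the membership of each requirement with a small probability.
    flipped = {req for req in REQUIREMENTS if rng.random() < rate}
    return frozenset(individual ^ flipped)


def run_ga(pop_size=20, generations=40, seed=0):
    rng = random.Random(seed)
    population = [
        frozenset(r for r in REQUIREMENTS if rng.random() < 0.5)
        for _ in range(pop_size)
    ]
    best = max(population, key=fitness)
    for _ in range(generations):  # termination criterion: fixed generation count
        population = [
            mutate(crossover(select(population, rng), select(population, rng), rng), rng)
            for _ in range(pop_size)
        ]  # generational replacement
        best = max(population + [best], key=fitness)
    return best


if __name__ == "__main__":
    result = run_ga()
    print(sorted(result), fitness(result))
```

Each operator is a separate function so that any one of them can be swapped out without touching the loop itself.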

SEARCH BASED SOFTWARE ENGINEERING
Search Based Software Engineering is the name given by Harman [1] to a body of work in which Search Based Optimization is applied to Software Engineering. Optimization approaches have been applied to software engineering problems since the 1970s [9]. There are different approaches to solving an optimization problem, and one of them is search. According to Harman [10], there are three aspects to formulating a software engineering problem as a search-based optimization problem.

Problem representation
The problem should be represented in the form of a state space. Choosing a proper problem representation technique is essential for recasting a problem as a search-based optimization problem. To represent the problem as a state space, the state space must first be defined.
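As a rough illustration of what such a state space might look like for requirement selection, the sketch below assumes that each state is one subset of the candidate requirements and that two states are neighbours when they differ by a single requirement. Both assumptions, and the function names, are illustrative and are not taken from the paper.

```python
from itertools import combinations


def state_space(requirements):
    """Enumerate every subset of the requirements (2**M states for M requirements)."""
    reqs = sorted(requirements)
    return [
        frozenset(combo)
        for size in range(len(reqs) + 1)
        for combo in combinations(reqs, size)
    ]


def neighbours(state, requirements):
    """States reachable by adding or dropping exactly one requirement."""
    return [state ^ {req} for req in requirements]


if __name__ == "__main__":
    print(len(state_space({1, 2, 3})))
    print(neighbours(frozenset({1}), [1, 2, 3]))
```

Because the number of states doubles with every additional requirement, exhaustive enumeration is only feasible for very small M, which is the practical motivation for searching the space heuristically.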

State space representation
Search-based requirements selection and optimization lies within the general SBSE framework [11]. SBSE is concerned with the application of search-based algorithms to software engineering topics such as requirement analysis. The purpose of all search algorithms is to find optimal, near-optimal or "good enough" solutions among a number of possible solutions in a search space. State space search formally describes a search problem. A search problem consists of the following:

Use of meta-heuristic
Different meta-heuristic techniques are available, such as Genetic Algorithms (GA) [12], Simulated Annealing (SA) [13] and Hill Climbing (HC) [14]. In this paper, a Genetic Algorithm is proposed to search the state space.

Genetic algorithms
GA is a generalized search and optimization technique. It works with populations of individuals (chromosomes), each representing a possible solution to a given problem. Each individual is evaluated against the objective functions to give a measure of its fitness for the problem. The algorithm applies the principle of survival of the fittest to find better approximations. At each generation, a new set of approximations is created by selecting individual potential solutions according to their level of fitness in the problem domain and breeding them together using operators borrowed from natural genetics. A simple GA that yields good results in many practical problems is composed of the following operators [15]:

Encoding schemes-categorization
Depending on the structure of the encoding, schemes can be classified into two groups: one-dimensional and two-dimensional. Binary, value and permutation encodings are one-dimensional, and tree encoding is two-dimensional [6]. Studying these encoding schemes, one can infer that characters represented by permutation encoding are position dependent, whereas in binary and real-value encoding the characters are value oriented. The two factors identified by studying the different encoding schemes are thus the locus and the value of a character in the chromosome, and both should be kept in mind when encoding a solution for a particular problem.

Goldberg's classification of encoding techniques
Goldberg states that the fitness function for a specific encoding scheme depends on two factors: value and order [8]. Encodings can be grouped into three categories depending on these fitness evaluation factors:

NEED OF NOVEL ENCODING
The existing encoding schemes fall under the three categories mentioned above, and their fitness assessment depends on value, on order, or on both. These schemes are not suitable for representing requirements because there is no mathematical way to represent all the requirements in the form of a state space. Consequently, a new encoding scheme is needed that is independent of these two factors. Representing the requirements gathered from different users in the form of a chromosome is a major challenge for researchers, because the number of requirements is not fixed. In the proposed encoding scheme, requirements are stored in the form of a SET.

Prerequisites for proposed encoding scheme
It is assumed that there are N users and that M possible system requirements are gathered from them; of these M requirements, some are common to several users and some are altogether different. Every requirement is assigned a unique number on the basis of its uniqueness. It is assumed that there is a SET of users, $N = \{N_1, N_2, \ldots, N_n\}$, and a SET of possible system requirements. In this new encoding scheme, all the requirements gathered from a user are represented as a chromosome in raw form. In Figure 1, chromosome 1 represents the requirements gathered from user 1, and so on.
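The set-based encoding can be illustrated with a short Python sketch. The user names and requirement texts are invented sample data, and `encode` is a hypothetical helper that numbers requirements in order of first appearance; the paper does not prescribe a particular numbering rule.

```python
# Hypothetical raw requirement statements gathered from three users.
RAW_REQUIREMENTS = {
    "user1": ["login", "export report", "search"],
    "user2": ["search", "login", "dark mode"],
    "user3": ["export report", "offline mode"],
}


def encode(raw_by_user):
    """Give each distinct requirement a unique number and build one set per user."""
    ids = {}           # requirement text -> unique number
    chromosomes = {}   # user -> set of requirement numbers
    for user, requirements in raw_by_user.items():
        genes = set()
        for text in requirements:
            if text not in ids:
                ids[text] = len(ids) + 1  # numbered by first appearance (assumption)
            genes.add(ids[text])
        chromosomes[user] = frozenset(genes)
    return ids, chromosomes


if __name__ == "__main__":
    ids, chromosomes = encode(RAW_REQUIREMENTS)
    print(ids)
    for user, genes in chromosomes.items():
        print(user, sorted(genes))
```

Common requirements share one number across users, and chromosomes may differ in length, which matches the observation that the number of requirements per user is not fixed.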

FITNESS FUNCTION
It cannot be denied that there is uncertainty, inconsistency and ambiguity in the requirement analysis phase. Requirement selection is a complex task because it is difficult to choose a fitness function for requirements: no mathematical function is applicable to measure the fitness value of a particular requirement [16]. In this section, the author therefore argues that human-based computation can be used as the fitness function for the genetic algorithm. Human-based computation is a technique in which human computational power is used to solve problems that computers cannot yet solve [17]. Alex Kosorukoff [18], who coined the term, designed a genetic algorithm that allows humans to suggest solutions that might improve evolutionary processes. Since the emergence of Artificial Intelligence (AI) research in the 1950s, computer scientists have been trying to imitate human capabilities such as visual processing, language and reasoning. Alan Turing wrote in 1949: "The idea behind computers may be explained by saying that these machines are intended to carry out any operations which could be done by a human mind using mathematical model." [19] The idea of using human effort to perform tasks that computers cannot yet perform [20] is called human-based computation. It is a technique that makes use of human abilities for computation to solve complex problems [21]. The thesis "Human Computation" [17] defines the term as "a paradigm for utilizing human processing power to solve problems that computers cannot yet solve."
Both CAPTCHA (which stands for Completely Automated Public Turing Test to Tell Computers and Humans Apart) and reCAPTCHA are the invention of Luis von Ahn, a Carnegie Mellon computer scientist and MacArthur "genius grant" recipient. "A couple hundred million CAPTCHAs are typed daily around the world," von Ahn tells Newsweek. "The first time I did the calculations, I felt quite proud. And then I felt bad because people really find these annoying." They are also wasteful. It takes about 8 to 10 seconds to type a CAPTCHA (more, obviously, if you err and have to start over), meaning that a total of some 400,000 human hours per day are spent typing them in. As a point of comparison, according to von Ahn, the Empire State Building took 7.5 million human hours to build. "Life is only like 750,000 hours," von Ahn says. "It's almost the equivalent of a life. I thought, is there any way we can use this human effort in a way that's good for humanity?" Recognizing distorted words is one of the things that the human mind can still do better than computers. To make old newspapers, books and other texts searchable, pages are scanned and fed into optical character recognition software. Because ink and paper degrade over time, some words remain incomprehensible. The reCAPTCHA system presents Web users with two words: one that computers cannot read and one that they can. So long as you type the known word correctly, and a few other people agree with you on the unknown word, you have helped digitize an archival page. And, von Ahn says, typing in two words instead of one does not cost you a significant amount of extra time.
It has been said that one should not ask what comes next in terms of what technology and the Internet will be able to do, but should instead try to understand what society already has and figure out how to put it to good use. Von Ahn's efforts surely prove that point. They also show that, in some ways, people can help computers as much as they help society.
Another example of human-based computation is Duolingo. Duolingo is a freeware language-learning platform. It includes a language-learning website and application, as well as a digital language competence assessment exam. As of August 2018, the language-learning website and application offered 81 different language courses across 36 languages. The application has about 350 million registered users across the world. Duolingo won Apple's iPhone App of the Year award in 2013.
On the basis of the above instances, it is concluded that human-based computation is a compelling fitness function for requirement selection. With the ever-increasing number of Internet users, the relevance of human-based computation becomes all the more prominent. It is a potential medium for obtaining the fitness value of requirements.
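One way human judgments could be turned into fitness values is sketched below. The paper does not specify an aggregation rule, so everything here is an assumption made for illustration: the 1 to 5 rating scale, the use of the arithmetic mean, the `min_votes` agreement threshold (loosely analogous to reCAPTCHA requiring several people to agree), and the name `human_fitness`.

```python
from statistics import mean


def human_fitness(ratings_by_requirement, min_votes=2):
    """Average human ratings per requirement, skipping those with too few votes."""
    fitness = {}
    for requirement, ratings in ratings_by_requirement.items():
        if len(ratings) >= min_votes:  # need enough raters before trusting a score
            fitness[requirement] = mean(ratings)
    return fitness


if __name__ == "__main__":
    # Hypothetical ratings on an assumed 1 to 5 scale.
    ratings = {"R1": [5, 4, 3], "R2": [2], "R3": [1, 2]}
    print(human_fitness(ratings))
```

Requirements with too few ratings are left unscored rather than given a default value, so the genetic algorithm would need a policy for them, such as requesting more ratings.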

SELECTION
Selection is the first genetic operation in the reproductive phase of a genetic algorithm. The objective of selection is to choose the fitter individuals in the population, commonly known as the mating pool, that will create offspring for the next generation. The mating pool thus selected takes part in further genetic operations, advancing the population to the next generation and, ideally, closer to the optimal solution. In other words, selection is the process of choosing breeding stock or parents from a population. As the generations pass, the members of the population should get fitter and fitter. Individuals from the mating pool are used to generate new offspring, which form the basis of the next generation, so it is desirable that the mating pool contains good individuals. The selection operator works at the level of chromosomes. Its key idea is to give preference to better individuals by allowing them to pass on their genes to the next generation and to prevent the worst-fit individuals from entering the next generation. The goodness of each individual depends on its fitness, which is determined by an objective function [21]. Selection of individuals in the population is fitness dependent and is done using different algorithms [8], including roulette wheel selection, rank selection, tournament selection, steady state selection and many more. Selection acts as an active force in a genetic algorithm by directing the genetic search towards favorable regions of the search space. Selection operators emulate phenomena and processes in nature: selection chooses fitter individuals in analogy to Darwin's theory of evolution, survival of the fittest [22]. All individuals have a chance of being selected into the mating pool, and an individual can be selected more than once depending on its fitness.
Selection schemes are characterised by selection pressure, selection variance and loss of diversity, which primarily determine the convergence characteristics of genetic algorithms. Selection has to be balanced: selection that is too strong means that suboptimal, highly fit individuals take over the population and reduce its diversity, while selection that is too weak results in slow evolution [23]. Goldberg and Deb grouped selection methods into four categories: proportionate, ranking, tournament and steady state selection.

Rank selection
Rank selection is used in this paper for selection. Rank selection first sorts the population according to fitness value and then ranks the individuals: rank N is assigned to the best individual and rank 1 to the worst. Every chromosome is then allocated a selection probability with respect to its rank [24], and individuals are selected according to that probability. Rank selection is an explorative selection technique. It prevents overly quick convergence and differs from roulette wheel selection in terms of selection pressure. Rank selection overcomes scaling problems such as stagnation or premature convergence. Ranking controls selective pressure by applying a uniform method of scaling across the population. Rank selection behaves in a more robust manner than other methods [25,26].
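A minimal Python sketch of rank selection follows. It assumes linear ranking, in which the selection probability is the rank divided by the sum of all ranks; the text only states that probability is allocated "with respect to its rank", so this particular mapping and the function names are illustrative assumptions.

```python
import random


def rank_probabilities(fitnesses):
    """Rank 1 goes to the worst individual and rank N to the best.

    Selection probability is rank / (1 + 2 + ... + N), an assumed linear scheme.
    Ties are broken by position in the input list.
    """
    n = len(fitnesses)
    worst_first = sorted(range(n), key=lambda i: fitnesses[i])
    ranks = [0] * n
    for rank, index in enumerate(worst_first, start=1):
        ranks[index] = rank
    total = n * (n + 1) // 2
    return [rank / total for rank in ranks]


def rank_select(population, fitnesses, k, rng=random):
    """Draw k individuals (with replacement) according to rank-based probabilities."""
    return rng.choices(population, weights=rank_probabilities(fitnesses), k=k)


if __name__ == "__main__":
    population = ["A", "B", "C", "D"]
    fitnesses = [10, 50, 30, 1000]
    print(rank_probabilities(fitnesses))
    print(rank_select(population, fitnesses, k=4, rng=random.Random(0)))
```

In this example the individual with fitness 1000 receives a probability of 0.4, whereas fitness-proportionate (roulette wheel) selection would give it 1000/1090, roughly 0.92. This is the scaling effect that ranking is meant to dampen.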

CROSSOVER
Genetic algorithms are optimization algorithms that mimic the natural process of evolution. The important operators used in genetic algorithms are selection, crossover and mutation. In the previous sections, different forms of selection have been discussed. Crossover and mutation operators are used to introduce diversity into the population. The type of crossover and mutation operator used for a problem depends on the type of encoding used. Different crossover operators are described in this section. In natural systems, crossing-over is a complex process that occurs between pairs of chromosomes. Two chromosomes are physically aligned, breakage occurs at one or more corresponding locations on each chromosome, and homologous chromosome fragments are exchanged before the breaks are repaired. This results in a recombination of genetic material that contributes to variability in the population. In genetic algorithms, the crossover operator exchanges substrings between chromosomes represented as linear strings of symbols. The basic crossover operation is a three-step procedure [27]. First, two individuals are chosen at random from the population of 'parent' strings generated by the selection operator. Second, one or more string locations are chosen as breakpoints or crossover points delineating the string segments to exchange. Finally, parent string segments are exchanged and then combined to produce two resultant offspring individuals. Crossover operates on selected genes from parent chromosomes and creates new offspring. The simplest crossover exchanges genetic material between two strings with respect to a single crossover point. Crossover can be quite complicated and depends mainly on the encoding of chromosomes. A crossover designed for a specific problem can improve the performance of the genetic algorithm. Crossover combines parental solutions to form offspring in the hope of producing better solutions. Crossover operators are critical in ensuring good mixing of building blocks [28].

Int J Elec & Comp Eng
ISSN: 2088-8708

Uniform crossover
The uniform crossover operator does not divide the parent chromosome into segments for recombination. Rather, it treats each gene of the chromosome independently when choosing genes for the offspring. In uniform crossover, the number of crossover points is not fixed initially. It recombines the genes of the parent chromosomes on the basis of a crossover mask. It selects x crossover points in the chromosome, where x is a random value less than the length of the chromosome, and the crossover mask is generated according to this random value. Each gene in the offspring is created by copying the corresponding gene from one of the parents, and the parent is chosen via the randomly generated crossover mask [29,30,31]. At each index, the offspring gene is taken from the first parent if there is a 1 in the mask at this index, and from the second parent if there is a 0. Because of this construction principle, uniform crossover does not support the evolution of higher-order building blocks. Uniform crossover does not exhibit positional bias but does exhibit distributional bias: it has a strong tendency towards transmitting 50% of the genes from each parent and against transmitting to an offspring a large number of co-adapted genes from one parent.

Table 1. Uniform Crossover
Parent 1: R1, R2, R8, R4, R9, R6, R7
Parent 2: R4, R2, R6, R3, R5, R7, R1, R8
Mask:     1, 1, 0, 1, 0, 1, 1, 0
Child 1:  R1, R2, R6, R4, R5, R6, R7, R8
Child 2:  R4, R2, R8, R3, R9, R7, R1
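The mask rule can be sketched in Python using the values of Table 1. The handling of parents of unequal length (a gene is simply skipped when the chosen parent has no gene at that index) is an assumption, adopted because it reproduces the children shown in the table. The helper `random_mask` is likewise hypothetical, and it flips an independent fair coin per position rather than first drawing the number of crossover points x as the text describes.

```python
import random


def uniform_crossover(parent1, parent2, mask):
    """Mask bit 1: child 1 copies parent 1 and child 2 copies parent 2.

    Mask bit 0: the roles are swapped. A missing gene is skipped (assumption).
    """
    child1, child2 = [], []
    for index, bit in enumerate(mask):
        first, second = (parent1, parent2) if bit == 1 else (parent2, parent1)
        if index < len(first):
            child1.append(first[index])
        if index < len(second):
            child2.append(second[index])
    return child1, child2


def random_mask(length, rng=random):
    """Hypothetical helper: one independent fair coin flip per gene position."""
    return [rng.randint(0, 1) for _ in range(length)]


if __name__ == "__main__":
    parent1 = ["R1", "R2", "R8", "R4", "R9", "R6", "R7"]
    parent2 = ["R4", "R2", "R6", "R3", "R5", "R7", "R1", "R8"]
    mask = [1, 1, 0, 1, 0, 1, 1, 0]
    child1, child2 = uniform_crossover(parent1, parent2, mask)
    print(child1)
    print(child2)
```

Child 1 in Table 1 contains R6 twice, so a purely set-based chromosome would need a repair or de-duplication step after crossover; this sketch keeps the children as ordered lists to match the table.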

CONCLUSION AND FUTURE WORK
The application of metaheuristics to Software Engineering problems is part of a relatively new field called Search Based Software Engineering (SBSE). In this paper, the author justified the claim that requirement engineering can be reformulated as a search problem and presented a novel encoding technique for requirement optimization. The author used the idea of harnessing human computation power to solve a problem that computers cannot yet solve. People engage in computation not because they want to do a good deed but because they enjoy it. The future of SBSE is bright. There are many areas to which the techniques associated with SBSE surely apply but which have yet to be fully considered. In existing fields of application, the results are already very encouraging.