A survey of search-based refactoring for software maintenance

Mohan, Michael; Greer, Des

doi:10.1186/s40411-018-0046-4

Review
Open access
Published: 07 February 2018

A survey of search-based refactoring for software maintenance

Journal of Software Engineering Research and Development volume 6, Article number: 3 (2018) Cite this article

10k Accesses
26 Citations
4 Altmetric
Metrics details

Abstract

This survey reviews published materials related to the specific area of Search-Based Software Engineering that concerns software maintenance and, in particular, refactoring. The survey aims to give a comprehensive review of the use of search-based refactoring to maintain software. Fifty different papers have been selected from online databases to analyze and review the use of search-based refactoring in software engineering. The current state of the research is analyzed and patterns in the studies are investigated in order to assess gaps in the area and suggest opportunities for future research. The papers reviewed are tabulated in order to aid researchers in quickly referencing studies. The literature addresses different methods using search-based refactoring for software maintenance, as well as studies that investigate the optimization process and discuss components of the search. There are studies that analyze different software metrics, experiment with multi-objective techniques and propose refactoring tools for use. Analysis of the literature has indicated some opportunities for future research in the area. More experimentation of the techniques in an industrial environment and feedback from software developers is needed to support the approaches. Also, recent work with multi-objective techniques has shown that there are exciting possibilities for future research using these techniques with refactoring. This survey is beneficial as an introduction for any researchers aiming to work in the area of Search-Based Software Engineering with respect to software maintenance and will allow them to gain an understanding of the current landscape of the research and the insights gathered.

1 Introduction

SEARCH-Based Software Engineering (SBSE) concerns itself with the resolution of software engineering optimization problems by restructuring them as combinatorial optimization problems. The topic has been addressed and researched in a number of different areas of the software development life cycle, including requirements optimization, software code maintenance and refactoring, test case optimization and debugging. While the area has existed since the early 1990s and the term “search-based software engineering” was originally coined by (Harman & Jones, 2001), most work in this area has been recent with the number of published papers on the topic exploding in the last number of years (De Freitas & De Souza, 2011). Many of the papers in the area of SBSE propose using an automated approach to increase the efficiency of the area of the software process looked at. Of the papers published concerning SBSE, a relatively small amount is related to software maintenance. This is despite the fact that it is estimated that the maintenance process takes 70–75% of development effort (Bell, 2000; Pressman & Maxim, 2000) of the development process.

Software code can fall victim to what is known as technical debt. For a software project, especially large legacy systems, the structure of the software can be degraded over time as new requirements are added or removed. This “software entropy” implies that over time, the quality of the software tends towards untidiness and clutter. This degradation leads to negative consequences such as extra coupling between objects and increased difficulty in adding new features. As a result of this issue, the developer often has to restructure the program before new functionality can be added. This costs the developer time as the overall development time for functionality is offset by this obligatory cleaning up of code.

SBSE has been used to automate this process, thus decreasing time taken to restructure a program. SBSE can be applied to software maintenance by applying refactorings to the code to reduce technical debt. Using a search-based algorithm, the developer starts with the original program as a baseline from which to improve. The measure of improvement for the program is an uncertain aspect and can be subjective and so can be done in a variety of different ways. The developer needs to devise a heuristic, or most likely a set of heuristics to inform how the structure of the program should be improved. Often these improvements are based on the basic tenets of object-oriented design, where the software has been written in an object-oriented language (these tenets consist of cohesion, coupling, inheritance depth, use of polymorphism and adherence to encapsulation and information hiding). Additionally, there are other sources of heuristics such as the SOLID principles introduced by (Martin, 2003). The developer then needs to devise a number of changes that can be made to the software to refactor it in order to enforce the heuristics. A refactoring action modifies the structure of the code without changing the external functionality of the program. When the refactorings are applied to the software they may either improve or impair the quality, but regardless, they act as tools used to modify the solution.

The refactorings are applied stochastically to the original software solution and then the software is measured to see if the quality of the solution has improved or degraded. A “fitness function” combining one or more software metrics is generally used to measure the quality. These metrics are very important as they heavily influence how the software is modified. There are various metric suites available to measure characteristics like cohesion and coupling, but different metrics measure the software in different ways and thus how they are used will have a different affect on the outcome. The CK (Chidamber & Kemerer, 1994) and QMOOD (Quality Model for Object-Oriented Design) (Bansiya & Davis, 2002) metric suites have been designed to represent object-oriented properties of a system as well as more abstract concepts such as flexibility.

Metrics can be used to measure single aspects of quality in a program or multiple metrics can be combined to form an aggregate function. One common approach is to give weights to the metrics on which heuristics are more important to maintain and combine them into one weighted sum (although this weighting process is often subjective). This weighting may be inappropriate since there is a possibility of metrics conflicting with each other. For instance, one metric may cause inheritance depth to be improved but may increase coupling between the objects. Another method is to use Pareto fronts (Harman & Tratt, 2007) to measure and compare solutions and have the developer choose which solution is most desirable, depending on the trade-offs allowed. A Pareto front will indicate a set of optimal solutions among the available group and will allow the developer to compare the different solutions in the subset according to each individual objective used.

Using the metric or metrics to give an overall fitness value, the fitness function of the search-based technique measures the quality of the software solution and generates a numerical value to represent it. In the solution, refactorings are applied at random and then the program is measured to compare the quality with the previously measured value. If the new solution is improved according to the software metrics applied, this becomes the new solution to compare against. If not, the changes are discarded and the previous solution is kept. This approach is followed over a number of iterations, causing the software solution to gradually improve in quality until an end point is reached and an optimal (or near optimal) solution is generated. The end point can be triggered by various conditions such as number of iterations executed or the amount of time passed. The particular approach used by the search technique may vary depending on the type of search-based approach chosen, but the general method consists of iteratively making changes to the solution, measuring the quality of the new solution, and comparing the solutions to progress towards a more optimal result.

This survey aims to review and analyze papers that use search-based refactoring. We apply certain inclusion and exclusion criteria to find relevant papers across a number of research databases. We highlight different aspects of the research to inspect and analyze through a set of research questions. We also identify related work and highlight the differences between those papers and this survey (e.g. many of the related reviews investigate other areas of SBSE or different aspects of refactoring like UML model refactoring or refactoring opportunities. One study that looks at search-based refactoring is investigated in more detail in Section 7.1 to compare similar aspects of the analysis conducted in the paper). Each paper is reviewed and summarised to give an overview of any experiments conducted and the results gained. The overview of the papers is organised into five different groups to cluster together related studies. The papers are then analyzed to address the research questions outlined and derive similarities and differences between the studies. Various different aspects of the papers are analyzed as well as research gaps and possible areas for future work in the area.

The remainder of the survey is structured as follows. Section 2 gives an overview of some of the more common search techniques used for refactoring in SBSE. Section 3 gives an outline of how the survey is conducted, along with an outline of aspects to be measured and analyzed, and introduces the research questions. Section 4 gives a synopsis of the analyzed papers. Section 5 analyzes the papers reviewed and measures patterns that can be derived from the work conducted in the literature. Section 6 discusses and addresses the research questions outlined in Section 3. Section 7 gives an overview of related work along with a discussion of the differences and similarities. Section 8 looks at validity threats in the survey and Section 9 concludes the survey.

2 Search techniques

There are numerous different metaheuristic algorithms available to use in the SBSE field. These methods are used to automate search-based problems through gradual quality increases. Random search is used as a benchmark for most search-based metaheuristic algorithms to compare against. Although most metaheuristics use a non-deterministic approach to making choices, the choice must be assessed for validity and a fitness function is used to evaluate whether the search should continue from that point or backtrack. Below, the most common metaheuristic algorithms used in the literature are discussed:

2.1 Hill climbing

Hill climbing (HC) is a type of local search algorithm. With the HC approach, a random starting point is chosen in the solution, and the algorithm begins from that point. A change is then made, and the fitness function is used to compare the two solutions. The one with the highest perceived “quality” becomes the new optimum solution and the algorithm continues in this way. Over time, the quality of the solution is improved as less optimal changes are discarded and better solutions are chosen. Eventually, an optimal or sub-optimal solution is reached with the same functionality but a better structure. This is considered a fast algorithm in relation to the other metaheuristic choices but, as with other local search algorithms, it has the risk of being restricted to local optima. The algorithm may “peak” at a less optimal solution (akin to reaching a peak after climbing a hill). There are two main types of HC search algorithm that differ in one aspect. First-ascent HC is the simpler version of the algorithm, whereas steepest-ascent HC has a slightly more sophisticated search method and is a superior choice for quality. Other variations are stochastic HC (neighbors are chosen at random and compared) or random-restart HC (algorithm is restarted at different points to explore the search space and improve the local optima reached). HC is one of the more common search algorithms used in SBSE, and has similarities to other search techniques. The HC technique may not produce solutions as effective as some others do, but it does tend to find a suitable solution faster and more consistently (O’Keeffe & Cinnéide, 2006).

2.2 Simulated annealing

Simulated annealing (SA) is a modification of the local search algorithm, used to address the problem of being trapped with a locally optimum solution. In SA, the basic method is the same as the HC algorithm. The metaheuristic checks stochastically between different variations of a solution and decides between them with a fitness function until it reaches a higher quality. The variation is that it introduces a “cooling factor” to overcome the disadvantage of local optima in the HC approach. The cooling factor adds an extra heuristic by stating the probability that the algorithm will choose a solution that is less optimal than the current iteration. While this may seem unintuitive, it allows the process to explore different areas of the search space, giving extra options for optimization that would otherwise be unavailable. This probability is initially high, giving the search the ability to experiment with different options and choose the most desirable neighborhood in which to optimize. This is then generally decreased gradually until it is negligible. The probability given by the cooling factor is normally linked to a “temperature” value that is used to simulate the speed in which the algorithm “cools”. Although the SA process may come up with a better solution compared to the HC process, HC is a lot more reliable as the SA process may struggle to settle on a solution.

2.3 Genetic algorithms

Genetic algorithms (GAs) are a class of evolutionary algorithms (EAs) that, much like SA, mimic a process used elsewhere in science, namely the reproduction and mutation processes in genetics and natural selection. GAs use a fitness function to measure the quality among a number of different solutions (known as “genes”) and prioritize them. At each generation (i.e. each iteration of the search), the genes are measured to determine which are the “fittest”. Each generation, in order to introduce variation into the gene pool, a proportion of the population is selected and used to breed the new generation of solutions. With this selection, two steps are used to create the new generation. First, a crossover operator is used to create the child solution(s) from the parents selected. The algorithm itself determines exactly how the crossover operator works, but generally, selections are taken from each parent and spliced together to form a child. Once the child solution(s) have been created, the second step is mutation. Again, the mutation implementation depends on the GA written. The mutation is used to provide random changes in the solutions to maintain variation in the selection of solutions and prevent convergence. After mutation is applied to a selection of the child solutions, the newly created solutions are inserted back into the gene pool. At this point the algorithm calculates the fitness of any new solutions and reorders them in relation to the overall set. Generally, a population size is specified, and this ensures that the weakest solutions are weeded out of the gene pool each generation. This process is repeated until a termination condition is reached.

2.4 Multi-objective evolutionary algorithms

When refactoring a software project, as with other areas of software engineering, there are likely numerous conflicting objectives to address and optimize. A multi-objective algorithm can be used to consider the objectives independently instead of having to combine them into one overarching objective to improve. There are numerous EAs available that are used for multi-objective problems, known as multi-objective evolutionary algorithms (MOEAs). The downside to using multi-objective algorithms for software refactoring over the mono-objective metaheuristic algorithms is that the extra processing needed to consider the various objectives can cause an increase in the time needed to generate a set of solutions. Another issue is that when a MOEA generates a population of solutions, the “best solution” is up to the interpretation of the user. Whereas a single-objective EA can rank the final population of solutions by a single fitness value, there may be numerous possible choices in the MOEA population depending on which objective fitness is more important. On the other hand, this gives the user multiple options depending on their desire or the situation.

Most MOEAs use Pareto dominance (Coello Coello, 1999) in order to restrict the population of solutions generated. If, for a solution, at least one objective of that solution is better than in another solution and none of the remaining objectives are worse, that solution is said to dominate the other solution. Therefore, a solution is non-dominated unless another solution in the population dominates it. When Pareto dominance is used to generate an optimal set of solutions, they can be inspected on a Pareto front. When the number of objectives is less than or equal to three, the non-dominated solutions can be visualized on the Pareto front. This allows the user to easily visualize the state of the solutions according to each independent objective and choose the most suitable one along the Pareto front. Multi-objective algorithms that are designed to handle more than three objectives are generally referred to as many-objective algorithms (Deb & Jain, 2013). These tend to avoid using only Pareto dominance as it can have difficulty successfully handling more than three objectives (Jain & Deb, 2014). As the amount of objectives to measure increases, it becomes more difficult to rank the solutions into different fitness fronts (as an increasing amount become non-dominated, which results in many of the solutions being given the same fitness rank). When this happens, the multi-objective algorithm becomes less useful at discerning better populations of solutions. Table 1 lists MOEAs that use Pareto dominance to choose solutions. For further information, a survey of MOEAs is given by (Coello Coello, 1999).

Table 1 MOEAs That Use Pareto Dominance

A survey of search-based refactoring for software maintenance

Abstract

1 Introduction

2 Search techniques

2.1 Hill climbing

2.2 Simulated annealing

2.3 Genetic algorithms

2.4 Multi-objective evolutionary algorithms

3 Survey outline

4 Refactoring in search-based software engineering

4.1 Refactoring to improve software quality

4.2 Refactoring for testability

4.3 Testing metric effectiveness with refactoring

4.4 Refactoring to correct software defects

4.5 Refactoring tools

5 Analysis

5.1 Papers published over time

5.2 Types of paper

5.3 Authors

5.4 Types of study

5.5 Refactoring approaches

5.6 Search techniques

5.7 Input programs used

5.8 Tools

5.9 Metrics

5.10 Research gaps and opportunities

6 Discussion

7 Related work

7.1 Comparison

8 Threats to validity

9 Conclusion

Abbreviations

References

Acknowledgments

Funding

Availability of data and materials

Authors’ contributions

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Author’s information

Ethics approval and consent to participate

Consent for publication

Competing interests

Publisher’s Note

Appendix

Additional file

Additional file 1:

Rights and permissions

About this article

Cite this article

Share this article

Keywords