A genetic algorithm based framework for software effort prediction

Murillo-Morera, Juan; Quesada-López, Christian; Castro-Herrera, Carlos; Jenkins, Marcelo

doi:10.1186/s40411-017-0037-x

Research
Open access
Published: 31 May 2017

Journal of Software Engineering Research and Development volume 5, Article number: 4 (2017)

Abstract

Background

Several prediction models have been proposed in the literature, using different techniques and obtaining different results in different contexts. The need for accurate effort predictions for projects is one of the most critical and complex issues in the software industry. The automated selection and combination of techniques in alternative ways could improve the overall accuracy of the prediction models.

Objectives

In this study, we validate an automated genetic framework and conduct a sensitivity analysis across different genetic configurations. We then compare the framework with a random guessing baseline and with an exhaustive framework. Lastly, we investigate the performance of the best learning schemes.

Methods

In total, six hundred learning schemes, formed by combining eight data preprocessors, five attribute selectors and fifteen modeling techniques, represent our search space. The genetic framework, through the elitism technique, selects the best learning schemes automatically. The best learning scheme in this context is the combination of data preprocessing + attribute selection + learning algorithm with the highest possible correlation coefficient. The selected learning schemes are applied to eight datasets extracted from the ISBSG R12 Dataset.

Results

The genetic framework performs as well as an exhaustive framework. The analysis of the standardized accuracy (SA) measure revealed that all of the best learning schemes selected by the genetic framework outperform the random guessing baseline by 45–80%. The sensitivity analysis confirms the stability between different genetic configurations.

Conclusions

The genetic framework is stable, performs better than a random guessing approach, and is as good as an exhaustive framework. Our results confirm previous ones in the field: simple regression techniques with transformations could perform as well as nonlinear techniques, and ensembles of learning machines techniques such as SMO, M5P or M5R could optimize effort predictions.

1 Background

Providing accurate software effort prediction models is complex but necessary for the software industry (Moløkken and Jørgensen 2003). Software effort prediction models have been studied for many years, but empirical evaluation has not led to simple or consistent ways to interpret their results (Shepperd and MacDonell 2012). Many software companies still use expert judgment as their preferred estimation method, which produces inaccurate estimations and severe schedule overruns in many of their projects (Boehm 1981). Software project managers need to be able to estimate the effort and cost of development early in the life cycle, as it affects the success of software project management (Huang et al. 2015).

Several prediction models have been evaluated in the literature, and inconsistent findings have been reported regarding which technique is the best (Jorgensen and Shepperd 2007; Shepperd 2007; Dejaeger et al. 2012; Shepperd and MacDonell 2012). The results of these studies are not univocal and are often highly technique- and dataset-dependent. Since many models can fit certain datasets, the selection of the most efficient prediction model is crucial (Shepperd 2007; Mittas and Angelis 2008). Additionally, the results of different studies are difficult to compare due to different empirical setups and data preprocessing, possibly leading to contradictory results. The issue of which modeling technique to use for software effort estimation remains an open research question (Dejaeger et al. 2012).

The automated selection and combination of techniques in alternative ways could improve the overall accuracy of the prediction models for specific datasets (Dejaeger et al. 2012; Malhotra 2014). The motivation behind the use of these methods is to make minimal assumptions about the data used for training. In this sense, genetic algorithms are search-based algorithms that are much faster than exhaustive search procedures (Malhotra 2014). According to Harman (2007), search-based (SB) optimization techniques have been applied to a number of software engineering (SE) activities, and of all optimization algorithms, genetic algorithms have been the most widely applied search technique in search-based software engineering (SBSE).

In our genetic approach, we address how to select the data preprocessing, attribute selection techniques and learning algorithms automatically according to the characteristics of a specific data set. The main goal is to increase prediction performance while optimizing processing time. In this case, the automatic selection of the learning scheme (preprocessing + attribute selection + learning algorithm) is determined by using a genetic approach. The decision of how to select the different techniques through genetic algorithms, considering the characteristics of a specific dataset, can itself be considered a search-based problem for software engineering.

This paper reports an empirical validation of an automated genetic framework. In total, 600 learning schemes that include the combination of 8 data preprocessors, 5 attribute selectors, and 15 modeling techniques represented our search space. The genetic framework through the elitism technique selects the best learning schemes automatically based on the highest correlation coefficient.
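The search space and the elitism-based selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the technique labels (`PP1`, `AS1`, `LA1`, ...) are placeholders for the entries of Table 1, the fitness function is a random stand-in for "train the learning scheme and return its correlation coefficient", and the population size, number of generations, crossover and mutation rates are arbitrary illustrative values.

```python
import itertools
import random

# Placeholder labels (assumption): the real technique names are those of Table 1.
PREPROCESSORS = [f"PP{i}" for i in range(1, 9)]   # 8 data preprocessors
SELECTORS = [f"AS{i}" for i in range(1, 6)]       # 5 attribute selectors
LEARNERS = [f"LA{i}" for i in range(1, 16)]       # 15 learning algorithms
GENES = [PREPROCESSORS, SELECTORS, LEARNERS]

# Every learning scheme is one (preprocessor, selector, learner) triple.
SEARCH_SPACE = list(itertools.product(*GENES))    # 8 * 5 * 15 = 600 schemes


def make_toy_fitness(seed=0):
    """Hypothetical fitness: a fixed random score in [0, 1) per scheme.

    In a real framework this would train the scheme on a dataset and
    return the correlation between predicted and actual effort.
    """
    rng = random.Random(seed)
    table = {scheme: rng.random() for scheme in SEARCH_SPACE}
    return lambda scheme: table[scheme]


def evolve(fitness, pop_size=20, generations=10, p_cross=0.6, p_mut=0.1,
           n_elite=2, seed=1):
    """Return the best scheme found by a small genetic algorithm with elitism."""
    rng = random.Random(seed)
    population = [tuple(rng.choice(g) for g in GENES) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        next_gen = ranked[:n_elite]  # elitism: the best schemes survive unchanged
        while len(next_gen) < pop_size:
            # Parents are drawn from the fitter half of the population.
            a, b = rng.sample(ranked[:pop_size // 2], 2)
            child = list(a)
            if rng.random() < p_cross:
                cut = rng.randint(1, 2)          # single-point crossover
                child = list(a[:cut] + b[cut:])
            for i, options in enumerate(GENES):
                if rng.random() < p_mut:         # mutation: resample one gene
                    child[i] = rng.choice(options)
            next_gen.append(tuple(child))
        population = next_gen
    return max(population, key=fitness)


fitness = make_toy_fitness()
best = evolve(fitness)
print(len(SEARCH_SPACE), best, round(fitness(best), 3))
```

With these settings the sketch evaluates at most 200 candidate schemes rather than all 600, which is the kind of saving over an exhaustive search that motivates the genetic approach.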

In this study, we conducted a sensitivity analysis across different genetic configurations to evaluate the stability of an automated genetic framework and to discover which genetic configurations report the best performance. Further, we compared the performance of the automated genetic framework with a random guessing baseline (Langdon et al. 2016) and an exhaustive framework (Quesada-Lopez et al. 2016). Then, we analyzed the performance of the best learning schemes. We aim to find the best framework configuration (generation and population, mutation levels, crossover levels) for a given data set context. After that, we compare the genetic framework results with the exhaustive framework results in order to determine whether our genetic approach finds solutions similar to the best solutions found by the exhaustive approach. In addition, we evaluate whether the evaluation and prediction phases report similar results, as this would mean that our approach is reliable between phases. Finally, we identify the most frequently selected learning schemes and the learning schemes that report the best performance in the effort prediction domain. The main contribution of this empirical study is a genetic framework that can be used to automatically determine the best learning schemes to use according to the characteristics of a specific data set. Our approach differs from others because we use a full model (data preprocessing + attribute selection + learning algorithm) in our fitness function, thus optimizing the three components together.

This study uses genetic algorithms to generate software effort prediction models based on the function point measure, using the International Software Benchmarking Standards Group (ISBSG R12) Dataset (Hill 2010; ISBSG 2015), and evaluates their effectiveness. We analyze the performance of effort prediction models based on the base functional components (BFC) and unadjusted function point size (UFP) (Albrecht 1979; Jeng et al. 2011). We have established the following research questions as the focus of our analysis. We address each of these questions in the section referenced in parentheses.

RQ1. Which genetic framework configuration (generation and population, mutation levels, crossover levels) reported the best performance when compared to the baseline exhaustive framework? (Section 5.1).
RQ2. Is the performance of the genetic framework similar between evaluation and prediction phases? (Section 5.2).
RQ3. Which learning schemes (data preprocessors, attribute selectors, learning algorithms) are most frequently selected by the genetic framework? (Section 5.3).
RQ4. Which learning schemes reported the best performance according to the evaluation criteria metrics? (Section 5.4).

The remainder of the paper is structured as follows: Section 2 briefly provides information on previous studies in effort prediction. Section 3 describes the genetic framework used to automatically select and evaluate effort prediction models. Section 4 details the experimental design, and Section 5 presents the analysis and interpretation of results. Finally, Section 6 outlines conclusions and future work.

2 Related work

Several formal models have been employed in software effort prediction using a number of data mining techniques (Jorgensen and Shepperd 2007; Wen et al. 2012). These include several regression analysis techniques, neural networks, instance-based learners, tree/rule-based models, case-based reasoners, lazy learning, Bayesian classifiers, support vector machines, and ensembles of learners (Jorgensen and Shepperd 2007; Shepperd and MacDonell 2012). Most studies evaluate only a limited number of modeling techniques on a dataset, which limits the generalization of results. In addition, the results of different studies are difficult to compare due to their different empirical setups, data preprocessing, and dataset characteristics (Shepperd and MacDonell 2012; Langdon et al. 2016). Therefore, the issue of which modeling technique to use for software effort estimation remains an open research question (Dejaeger et al. 2012).

2.1 Frameworks for benchmarking prediction models

Several prediction models have been evaluated in the literature, and inconsistent findings have been reported regarding which technique is the best (Shepperd and MacDonell 2012; Dejaeger et al. 2012). For example, in (Jørgensen 2004) differences between model-based and expert-based estimation results were found. In (Mair and Shepperd 2005) regression and analogy methods for effort estimation were compared and conflicting evidence was found. In (Dejaeger et al. 2012) the authors conducted a large-scale benchmarking study using different types of techniques and analyzed aspects related to the selection of features. The results indicated that ordinary least squares regression in combination with a logarithmic transformation performs best; however, the combination of other techniques could obtain similar results. Similar results were found in previous work conducted by the authors of this paper (Quesada-Lopez et al. 2016; Murillo-Morera et al. 2016a; Quesada-López et al. 2016). In (Keung et al. 2013), ninety predictors were evaluated with 20 datasets, and 7 performance measures were used to determine stable rankings of different predictors. They concluded that regression trees or analogy-based methods are the best performers and offered means to address the conclusion instability issue. In (Huang et al. 2015) the effect of several data preprocessing techniques on the effectiveness of machine learning methods for effort estimation was empirically assessed. The results indicate that data preprocessing techniques may significantly influence the predictions, but that they can sometimes have a negative impact on prediction performance. They concluded that a careful selection is necessary according to the characteristics of the machine learning methods, as well as the datasets.
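To make the "ordinary least squares regression in combination with a logarithmic transformation" technique concrete, the following is a minimal sketch. The data are synthetic and generated only for illustration (they are not ISBSG values), and the variable names `size_ufp` and `effort_hours` as well as the generating constants are assumptions of this example.

```python
import numpy as np

# Synthetic projects for illustration only: effort grows roughly as a power
# of functional size, with multiplicative (log-normal) noise.
rng = np.random.default_rng(42)
size_ufp = rng.uniform(50, 2000, size=200)
effort_hours = 8.0 * size_ufp ** 0.9 * np.exp(rng.normal(0, 0.3, size=200))

# Fit ln(effort) = b0 + b1 * ln(size) by ordinary least squares.
X = np.column_stack([np.ones_like(size_ufp), np.log(size_ufp)])
y = np.log(effort_hours)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1 = coef


def predict_effort(size):
    """Back-transform the log-scale prediction to the original effort scale."""
    return np.exp(b0 + b1 * np.log(size))


print(f"b0={b0:.3f}, b1={b1:.3f}, effort(500 UFP)={predict_effort(500):.0f} h")
```

The log transformation turns a multiplicative size-effort relationship into a linear one, which is why a simple linear technique can compete with nonlinear learners on this kind of data. Note that the back-transformed prediction estimates the median rather than the mean effort under log-normal noise.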

In consequence, a number of frameworks for benchmarking prediction models in software effort estimation have been proposed. The main motivation of these studies is to achieve an unbiased criterion when comparing different software estimation models and to evaluate the effectiveness of data preprocessing techniques, data attribute selectors, and machine learning algorithms in the context of software effort estimation. For example, in (Shepperd and MacDonell 2012) the authors proposed a framework for evaluating prediction systems to reduce the inconsistency amongst validation study results and provide a more formal foundation to interpret results on continuous prediction systems. The use of an unbiased statistic will assist in performing future meta-analyses and in providing more robust and usable recommendations to practitioners. In (Menzies and Shepperd 2012; Keung et al. 2013), conclusion instability in prediction systems is discussed with the intention of providing a framework for studies in the area. That work analyzed known sources of instability, such as bias in the measures, variance from sampling, pre-processing and others, and then provided recommendations to reduce the instability problems. The authors state that an interesting research possibility is to tune the data mining and machine learning techniques using feedback from the domain. This approach generates the learner for a particular dataset. Finally, they concluded that learning learners is an active research area and that much further work is required before the costs and benefits of this approach can be understood. In (Song et al. 2013) the authors proposed a framework to investigate to what extent parameter settings affect the performance of learning machines in software effort estimation, and which learning machines are more sensitive to their parameters. They concluded that different learning machines have different sensitivity to their parameter settings. Finally, (Dolado et al. 2016; Langdon et al. 2016) proposed a measure based on a random guessing framework to compare methods for software estimation.
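As an illustration of a random-guessing baseline, the sketch below computes the standardized accuracy (SA) of a set of predictions as 1 - MAR / MAR_P0, where MAR is the mean absolute residual of the model and MAR_P0 is that of a baseline which predicts each project's effort by picking the actual effort of another project at random. This follows our reading of the measure introduced in (Shepperd and MacDonell 2012), with the baseline averaged exactly over all pairs instead of estimated by repeated random runs; the function names and the four example values are assumptions made for this example.

```python
import numpy as np


def mar(actual, predicted):
    """Mean absolute residual of a set of predictions."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted)))


def mar_random_guessing(actual):
    """Expected MAR when each case is predicted by another case's actual value.

    Computed exactly as the mean of |y_i - y_j| over all ordered pairs i != j.
    """
    y = np.asarray(actual, dtype=float)
    n = len(y)
    diffs = np.abs(y[:, None] - y[None, :])   # diagonal entries are zero
    return float(diffs.sum() / (n * (n - 1)))


def standardized_accuracy(actual, predicted):
    """SA as a fraction: 0 means no better than random guessing, 1 is perfect."""
    return 1.0 - mar(actual, predicted) / mar_random_guessing(actual)


# Hypothetical effort values (hours), for illustration only.
actual = [100, 200, 400, 800]
predicted = [150, 250, 350, 700]
print(round(100 * standardized_accuracy(actual, predicted), 1), "%")
```

Multiplying the returned fraction by 100 gives SA as a percentage, the form in which improvements over random guessing are usually reported.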

2.2 Effort prediction approaches using genetic algorithms

Harman and Jones (2001) stated that software engineering is ideal for the application of metaheuristic search techniques, such as genetic algorithms, simulated annealing and tabu search. They argued that such search-based techniques could provide solutions to the difficult problems of balancing competing (and sometimes inconsistent) constraints and may suggest ways of finding acceptable solutions in situations where perfect solutions are either theoretically impossible or practically infeasible. In their work they briefly set out key ingredients for successful reformulation and evaluation criteria for search-based software engineering.

In (Harman 2007), Harman described a study on the application of optimization techniques in software engineering. The optimization techniques in Harman’s work came from the operations research and metaheuristic computation research communities. His research reviewed the optimization techniques used and the key ingredients required for their successful application to software engineering, providing an overview of existing results in eight software engineering application domains. Harman’s paper also described the benefits that are likely to accrue from the growing body of work in this area and provided a set of open problems, challenges and areas for future work. He stated that for new areas of software engineering that have yet to be attacked using search-based approaches, it remains acceptable to experiment with a variety of search algorithms in order to obtain baseline data and to validate the application of search. However, in order to develop the field of search-based software engineering, a reformulation of classic software engineering problems as search problems is required. In (Harman et al. 2012), the authors argued that search-based software engineering has proved to be a very effective way of optimizing software engineering problems. Nevertheless, its full potential as a means of dynamic adaptivity remains underexplored.

In (Lefley and Shepperd 2003), the authors investigated the use of various techniques, including genetic programming, with public data sets to model and estimate software project effort. They analyzed when a genetic program can offer better solution search using public domain metrics rather than company-specific ones. The study also offered insights into genetic programming performance. They determined that genetic programming performed consistently well, but was harder to configure. In (Arcuri and Fraser 2011; Sayyad et al. 2013), the authors conducted a comprehensive study analyzing the impact of parameter settings in machine learning and software effort estimation. They performed a large study of parameter settings using genetic algorithms. Their results showed that parameter tuning can have a critical impact on algorithmic performance, and that overfitting of parameter tuning is a serious limitation of empirical studies in search-based software engineering. In (Aljahdali and Sheta 2013), the authors argued that computational intelligence paradigms had recently been explored to handle the software effort estimation problem, with promising results. In that paper, they evolved two new models for software effort estimation using Multigene Symbolic Regression Genetic Programming. One model uses the source lines of code as the input variable to estimate the effort, while the second model uses the inputs, outputs, files, and user inquiries to estimate the function points. Finally, in (Chen et al. 2017), the authors proposed that, instead of mutating a small population, a large initial population be built and then culled using a recursive bi-clustering binary chop approach. They evaluated this approach on multiple software engineering models, unconstrained as well as constrained, and compared its performance with standard evolutionary algorithms. Using just a few evaluations (under 100), they obtained results comparable to those of standard evolutionary algorithms.

In (Singh and Misra 2012), COCOMO is used as the algorithmic model, and the authors attempt to validate the soundness of the genetic algorithm technique using NASA project data. The main objective of that research is to investigate the effect of crisp inputs and genetic algorithm techniques on the accuracy of the system’s output when a modified version of the COCOMO model is applied to the NASA dataset. In (Ghatasheh et al. 2015), a firefly algorithm is proposed as a metaheuristic optimization method for optimizing the parameters of three COCOMO-based models. These models include the basic COCOMO model and two other models proposed in the literature as extensions of the basic model. The developed estimation models are evaluated using different evaluation metrics. Experimental results show high accuracy and significant error minimization for the firefly algorithm compared with other metaheuristic optimization algorithms, including genetic algorithms and particle swarm optimization.
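For readers unfamiliar with COCOMO, the sketch below shows the basic form of the model, effort = a · KLOC^b, together with an error function of the kind a metaheuristic (genetic algorithm, firefly algorithm, particle swarm optimization) would minimize when tuning the parameters a and b. It illustrates only the basic model, not the modified versions used in the cited studies; the default constants are the organic-mode values from Boehm (1981), and the project data are hypothetical.

```python
def cocomo_effort(kloc, a=2.4, b=1.05):
    """Basic COCOMO effort in person-months.

    Defaults are Boehm's organic-mode constants; a and b are the
    parameters a metaheuristic would tune for a given dataset.
    """
    return a * kloc ** b


def mean_abs_error(params, projects):
    """Objective to minimize: mean absolute error over (kloc, actual_effort) pairs."""
    a, b = params
    errors = [abs(actual - cocomo_effort(kloc, a, b)) for kloc, actual in projects]
    return sum(errors) / len(errors)


# Hypothetical projects (KLOC, actual person-months), for illustration only.
projects = [(10, 30.0), (50, 140.0), (100, 310.0)]
print(round(cocomo_effort(10), 2), round(mean_abs_error((2.4, 1.05), projects), 2))
```

A search-based tuner would repeatedly propose candidate (a, b) pairs and keep those with the lowest value of `mean_abs_error`.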

2.3 Techniques and algorithms applied to software effort estimation models

Several techniques have been applied to the field of software effort prediction. In our study, we applied and evaluated different data preprocessing approaches, attribute selector techniques, and machine learning algorithms, some of them representing groups of learning algorithms. In Table 1, we present a summary of techniques and algorithms based on previous literature in the domain of software effort prediction and other prediction contexts (Witten and Frank 2005; Song et al. 2011; Dejaeger et al. 2012). The data preprocessing approaches are represented by the tag (PP), the attribute selector techniques are represented by the tag (AS), and the learning algorithms are represented by the tag (LA). In the following sections, we briefly discuss each of the included techniques.

Table 1 Data mining techniques for learning schemes
