- Open Access
© The Author(s). 2018
- Received: 20 March 2018
- Accepted: 19 September 2018
- Published: 6 October 2018
- Genetic Improvement
- Source code Optimization
- Search Based Software Engineering
Genetic Improvement (GI) (Harman et al. 2012) is a set of techniques that allows the evolution of software through a process of genetic programming, aiming to improve the software regarding a set of quality criteria. Genetic Programming (Arcuri et al. 2008) is an automated method for creating a computer program from the high-level definition of a problem.
GI preserves the functional properties of the software. It can be seen as an automatic source code refactoring technique that generates functionally equivalent variants, which offer improvements in one or more non-functional properties such as usability, flexibility, performance, among others. The variants produced by the process are readable for a human and should be checked by software engineers before their use because changes can remove correct software behavior. However, the technique removes from the software engineers the laborious task of navigating among implementation alternatives to reach the one meeting the required software non-functional characteristics. Instead, the Optimization process proposes a text patch with changes that might attend to the needs expressed by a fitness function that drives the optimization.
The application of this kind of research is naturally experimental. Different configurations of an optimization algorithm (optimizer) should be executed until the observation that there is no difference in the optimizer performance regardless of the program under analysis can be observed. The large number of syntactical elements comprising a programming language and the diversity of selection, crossover, and mutation operators that can be applied to the source code leads to an extensive list of optimizer composition alternatives. A long cycle of experimentation is required to understand the patterns of changes made to the program and revealing optimization settings for a given language and domain. With each experiment cycle, more information is acquired, promoting adjustments to the optimization process, and requiring more experimentation cycles to study their effects.
Each experiment cycle generates a program variant, verifies that it compiles and executes the program’s test cases dozens of times to take into account the random aspects inherent from the selected improvement operators. Furthermore, the optimizer checks whether the program variant represents an improvement from the standpoint of the non-functional criteria under analysis. This process is costly regarding requiring significant computer resources to be executed.
The construction and testing of the optimizer revealed the need for using high-performance computing (HPC) resources (supercomputing) to enable the execution of the experimental plan. It was due to the amount of memory required to maintain, at the same running time, multiple source code variants and their correspondingly testing in order to validate variants correctness.
2.1 Genetic improvement
In 1992, John Koza used genetic algorithms (GA) to automatically generate programs for accomplishing specific tasks, such as solving mathematical expressions. The author baptized the method as Genetic Programming (GP). In GP, syntax trees contain function or terminal nodes to represent the programs. Functions can be arithmetic, Boolean, conditional, iteration-related, recursion-related, in addition to domain-specific functions for the problem in question. Terminals are variables or constants. Therefore, GP is a GA specialization in the domain of manipulating computer programs.
Genetic Improvement (GI) makes use of GP to evolve a human-written software (a target program) taking into account a set of quality criteria. Like GP, GI uses a syntax tree representation to evolve the target program over a sequence of generations in search of improved versions from the selected quality criteria perspective.
2.2 Local search
Local search is a class of point-based algorithms that use heuristics to systematic traverse the neighborhood of a given individual searching for a better neighbor if one can be found. A neighbor from a given individual A results from the application of a simple operation upon A, such as changing an individual characteristic. The neighborhood of an individual A is defined by the set of distinct neighbors that can be generated by applying the selected operation upon A. Once it finds a better neighbor, the search procedure is repeated to examine its neighborhood. Local search algorithms favor exploitation instead of exploration, concentrating the investigation in the neighboring region of a given individual taken as an initial solution for the problem at hand and neglecting the remaining parts of the search space.
Petke et al. (2013) applied GI to a system called MiniSAT, an open source project written in C++ to solve Boolean problems (SAT, Satisfiability or Boolean Satisfiability Problem). It uses state-of-the-art technologies for solving SAT problems, including Unit propagation, Conflict-driven clause learning and Watched literals (Jensen et al. 2009). According to them, the genetic improvement of the MiniSAT was a significant challenge, since expert programmers evolved it over the years. The researchers wanted to check if it was possible to improve the best human solution execution time. Harman et al. (2012) described the optimization approach. The optimization process was applied to one core system class (Solver.cc), and the observed improvement was 1% in the best case (measured in lines of code), where assertions were erased from the code.
White et al. (2010) proposed a framework for C program optimization using multi-objective GI to meet pre-configured non-functional criteria, where the software engineer can extend the evaluation behavior to include further criterion as needed (memory, disk, energy, among others). In particular, they evaluated the computer programs execution time. The authors selected eight distinct functions of the software for a simulation-based evaluation and submitted them to the optimization process. The results show an exciting improvement: in the best case, there was a 5% improvement over the original computer program. It is worth noting that some improvement patterns have been identified, such as the types of instructions executing faster than others do. However, the limitations in the study reduce the capacity of observation, due to i) the tests to evaluate a change were selected according to some structural criterion (coverage, for example), assessing the changes from a different perspective from the original; (ii) the experiment optimized the functions separately, observing improvements only in this isolated context; and (iii) there was no update of these functions in the original software for evaluation and comparison with all of them updated.
Harman et al. (2014) applied GI in the migration and transplantation of functionalities between software systems in operation. The researchers experimented with using an instant messaging system (Pidgin), and another one of text translation (Babel Fish). The goal was to add text translation behavior to Pidgin. The technique is divided into two parts: growth and graft. In the growth part, a software developer selects the parts responsible for the functionality along with their tests so that the optimization process looks for a group of instructions (classes and methods) optimized from a runtime perspective. In the graft part, the challenge is to find insertion points in the target software. To do so, the researchers randomly choose a point and apply three distinct types of mutation (variable replacement, statement replacement, and statement swapping). The study performed thirty rounds of the optimization process, with a population of 500 individuals evolved over 20 generations. The authors reported that in addition to finding real solutions, this was the first work of code transplantation using SBSE to that date.
Cody-Kenny et al. (2015) proposed a tool called locoGP that uses GI in Java computer programs to reduce their size, measured by the number of instructions. The tool reads the source code in Java, assembles its syntactic tree, and applies crossover and mutation operations. Operations can swap, erase, or clone nodes of the syntax tree as selected for the operation. Program instances implementing 12 distinct sorting algorithms (Insertion Sort, Bubblesort, BubbleLoops, among others) supported the tool evaluation. In all cases, locoGP was able to find better alternatives than the implementations presented to it, that is, it reduced the number of instructions used in each of these algorithms.
Although we have not found previous experiences of using HPC for the execution and analysis of experimental studies in the context of Software Engineering, the technical literature presents works involving the application of Software Engineering techniques in HPC environments. Fountoukis and Chatzistavrou (2018) propose the use of design patterns for software written for HPC environments. Two standards are presented to address bottlenecks in parallel execution, a common problem in this context. Overbey et al. (2005) present a tool called Photran for automatic refactoring of Fortran code. The tool uses static code transformations to increase the use of resources in HPC environments. According to the authors, the advent of object-oriented Fortran made it possible to observe static patterns of instruction exchange related to parallelism.
The related works presented here represent a small set of literature review works that we did during the research. Despite seeking inspiration in these researches, none influenced directly the implementation decisions described in this paper. The influence is presented in the experimental plan described in the next section.
The independent variable selected was the test cases execution time, measured in milliseconds. Since the optimization perspective is the code execution time, running the test suite faster is the optimization criterion guiding the search. The treatment applied is the transformation of code instructions with the purpose of reducing (remove or transform) instructions, where it is expected that with fewer instructions the execution time is also smaller.
Thirteen libraries observed. The first column shows the number of lines of each library
The experimental design is one factor and more than two treatments. The comparison regards the average improvement observed between treatments (GI, Local search, and random search). All instrumentation of the experimental study is automated in the Optimizer, as will be seen in Sections 5 and 6.
During the Optimizer construction, these experimental studies were performed twice on different platforms: using ordinary machines and using HPC (Section 5). The first configuration executed 30 optimization cycles for each library in which the genetic algorithm used an arrangement of 100 individuals evolved over 50 generations. In HPC, due to the available resources, we increased the observation for 60 optimization cycles and repeated the other configurations. Each variant presenting an original code improvement was executed 20 times to improve the execution time reliability of the test case suite. The initial implementation of each library received the same treatment. It intends to obtain a consistent baseline for comparisons against not controlled components residing in the execution time measure of programs in a multitasking environment, although restricting the number of applications running in parallel.
The results obtained with conventional machines were discarded due to the need to observe the normalized effects. The main reason for rejecting the first results is to maintain the basis of comparison using an HPC environment since the code optimization perspective was the execution time during the Optimizer construction. The observed libraries execution times are different between the ordinary machines and HPC environment.
5.1 The ECJ tool
5.2 The Jurassic project
5.3 Chrome V8
However, new problems emerged in this version. By creating the first real original code variations, sometimes the optimizer produced code containing recursive calls or infinite loops. These features caused memory leaks and stack overflows in Chrome V8, stopping the optimizer. Also, each new library required configuration and development effort to support the library’s specific features and unit tests (browsers, dependencies, and environment variables. Since Chrome V8 is not a browser, unit-testing support was not provided by default. Therefore, it was coded on an instance basis. Nevertheless, it was possible to perform automatic experiments in three libraries and observe their results described in Section 5.
5.4 Nodejs and typescript
Another challenge imposed by NodeJs is being a single thread, i.e., it only executes one process at a time. It is not possible, for instance, to generate and control two activities on the same execution stack. It prevented the Optimizer from adequately dealing with problems like lack of memory and fatal errors. Because Optimizer operations (as well as unit tests) can cause failures of these types, we have situations where the Optimizer execution terminated unexpectedly due to a fatal error or lack of resources. To address this problem, the Optimizer was redesigned to work in two layers: a server and a client layer. The purpose was to run an operation that might fail on the client and keep the primary process protected inside the server. The WebSocket protocol, which allows direct communication between different machines using the HTTP protocol, was used to exchange messages between the layers. Although it operates over HTTP, it lets a server to stay actively connected to clients throughout the process.
Optimizer exception handling started to work by controlling the connection status to a particular client. Once the server decides which operations need to be performed, it distributes them to the available clients, which in turn will carry out the operations. If one of these clients fails, the connection to the server will terminate unexpectedly. After the server notices the connection drop, it redirects the operation that caused the failure of another client. During the processing of operation by the clients, the server waits until a group of operations, which only have logical meaning together (all crossings and mutations in the genetic algorithm case), are completed and then proceed with the primary optimization process.
At this point, we begin the reproduction of the three experiments performed with the Optimizer third version. However, performance problems and lack of computational resources made it impossible to execute the experiments entirely and required us to restart the procedure a few times. Using multiple computers in a connected (HTTP) context has resulted in performance issues. Also, the memory consumption is very high for large libraries, with 5000 lines or more, making it impossible to use conventional platforms for their execution.
5.5 Using a high-performance computing environment on SE experiments
In July of 2016, it arose an opportunity to use a supercomputer of COPPE/UFRJ. Lobo Carneiro (LOBOC), in honor of the late professor Fernando Luiz Lobo Barboza Carneiro (1913–2001), is a supercomputer with the capacity of 226 teraflops, 16 Tbytes of RAM and 720 terabytes of disk, capable of running 226 trillion of mathematical operations per second. With these features available, the single thread limitation no longer exists (memory limitation over one single thread of NodeJs), and the fault handling returned to the previous state without having to split the Optimizer tasks among multiple clients (simpler and faster). It allowed us to conduct experiments with larger libraries (in lines of code) in a much shorter period.
To run the Optimizer in a parallel environment, like LOBOC, we produced a new specific optimizer version for that environment (fifth version). LOBOC runs SUSE Linux Enterprise Server version 12. As NodeJs does not have a specific package for this operating system, we compiled it under the assigned user account for use. Once NodeJs was running in LOBOC, the Optimizer was able to run its tests without significant problems. However, there was a complication: despite the NodeJs process can address the 16 terabytes of available memory; it was not able to see all available processors.
The LOBOC architecture is organized as a cluster, that is, it consists of 252 nodes, each with a processor composed of 24 cores using HyperThreading technology to execute up to 48 tasks in parallel. A queue system, called Altair PBS Professional, allocates the nodes to a process. The method of allocating resources (disk, memory, nodes, and cores) is simple: through a native Linux bash script file, it is possible to describe what features are needed to run a program and make use of them.
Even after allocating more nodes, NodeJs processes continued to use only 48 cores (24 physical cores, seen by the system as 48 cores due to HyperThreading). The PBS launches the Optimizer root process on one of these nodes that act as the execution host and informs that other node will compose the cluster during the session through a text file passed via command line. By itself, the Optimizer does not know the other nodes existence (and their cores). It was necessary to handle the details file sent by PBS inside the optimization process initialization.
First, each node proceeded to perform a genetic algorithm configuration, using an algorithm core and the other 47 cores for the evaluation of the objects under analysis (individuals). Then, 60 genetic algorithm rounds were executed in parallel (one on each node). We used 60 parallel rounds because the automatic experiments were designed to work with 60 optimization process observations per instance. Thus, we were able to run all the rounds of one library in parallel by the algorithm (for instance, Random Search) at a time. This composition consumed 2880 cores with 500 GB of RAM. In this scenario, the average duration of an automatic experiment was reduced to 10% of the previous total time (already in HPC, but without parallelism). Since we have all the necessary rounds in parallel, the last trial ending determines the total experiment time. Before finishing a library, others were started and the total time observed was the sum of these trials.
5.6 The current version of the optimizer
The fourth version6 was completed in July 2016 and was evaluated through experiments with thirteen libraries with the goal to reduce the code execution time of these libraries. The list of libraries that were observed is in Table 1. They were chosen based on size (measured in lines of code and presented in the LOC column), their use (measured in downloads per month), as well as the coverage and size of its suite of unit tests. The Downloads column shows the number of downloads for each library in December 2016. The suite size is measured in the number of unit test cases reported in the Tests column, while the coverage represents the percentage of library code statements exercised during the test case suite execution (Coverage column). As the whole process uses unit testing as an oracle, this is the primary criterion for selecting new libraries (code) for analysis. Little code coverage can generate false positives, i.e., removal or alteration of valid but untested code that will only be discovered during the results of the human analysis.
In the first run (conventional computers), the process took 28 days to complete the three libraries on three machines with identical configuration: Intel Core i7 3.4 GHz quad-core processor and HyperThreading technology (up to eight simultaneous tasks) and 4GB of memory. As described in the experimental plan, these results were discarded, and we carried out the second execution of the automatic experiment using LOBOC.
Results of the fourth version of the Optimizer, representing the test cases execution time (in milliseconds) of the original implementation and the best variant found by the optimization process
In the Tleaf library case, the results are more impressive, indicating 71% of time improvement. In this case, the Optimizer made substantial and significant modifications. The changes found in the library code are divided between the removal of parameters in a function call, elimination of the declaration and initialization of variables, removal of attributes in objects, assignment of values to variables and full body of functions. Of this group, the researchers’ attention was drawn to the fact that the Optimizer was able to remove entire functions: sometimes its declaration is completely removed and in others all its code while maintaining its declaration. The same type of behavior was observed in the uuid, gulp-cccr, express-ifttt, Plivo-Node, Jade, xml2js, and Node-Browserify libraries.
To understand how the Optimizer was able to remove entire functions, it was necessary to read the test cases as well the library code. During reading, it was clear that the reason why the removal happened is due to the behavior of the library. It changes according to the presence or absence of certain values in specific code properties (objects and properties).
Tleaf is a library for automatic generation of test code for Angular controllers, i.e., the physical path of a file containing one or several controllers written in Angular is passed to the library, in addition to the output path for automatic code generation of those controllers. Its test code is based on internal templates written inside the library code (string concatenation and use of some dictionaries). The file with test code generated by the library contains empty methods which the developer may include the appropriate testing behavior. In summary: Tleaf is a library that automatically generates empty test functions to increase a developer’s productivity in Angular technology. It has 131 unit tests.
Once a software engineering understands the library domain, s/he can understand the results obtained by the Optimizer. It was able to remove properties from the internal templates used by the library to generate the test codes. For example, the Optimizer removed an attribute in a template called ‘validate.’ This attribute had ‘required’ as its value. This attribute determines that in the generation of a test code the attributes values in the Angular controllers are mandatory. Once this attribute (and its value) has been removed, the function that writes the code and makes the validation mandatory is no longer necessary for the library and, with that, it was removed. The same removal behavior was observed in three other library functions (identifyDeps, addUnitDependency, and isEmptyString).
The same behavior was observed in uuid and Plivo-node. However, in the gulp-cccr, express-ifttt, jade, xml2js, and Node-Browserify libraries, the relevant factor was the lack of direct code coverage of the removed functions. To illustrate the situation, we will use the uuid library as an example.
Inside uuid, the Optimizer has found two cases (fromURN and fromBytes functions) where there was a problem regarding the absence of coverage in the tests, that is, at no time these functions are exercised during the unit tests execution. So, it was possible to remove them and still succeed in running the tests.
This fact helps in to confirm that the unit tests coverage directly affects the optimization process results. There is also evidence contributing to show a tester (or developer) characteristics in their test cases that need attention because they do not exercise a reasonable source code area.
All libraries have had positive results, that is, the Optimizer was able to find variations of the original code that run in less time where it is possible to realize that the altered or removed statement directly affects the code execution of its function or library. Typical examples of modifications include unused variable declarations or the removal of their initialization values (which are not used), parameters removed in internal function calls and others. These results show the importance of a good design of tests cases to subsidize the genetic improvement, as well as the need for the review of a software engineer after the optimization process. All changes produced by the Optimizer were confirmed after manual analysis.
The validity of the results observed through an experimental study should be addressed in all phases of its life cycle, from planning to analysis. This section outlines the major threats to validity regarding our findings.
Though some automation should support it, the process of setting up new libraries and determining their test suite statement coverage usually requires the manual developer intervention. It subjects experimentation to the researcher intervention and, therefore, requires the researchers to follow strict rules (clone repository, install and configure the Istanbul tool7 and extract coverage percent’s) for the sake of consistent replication and generalization (external validity).
Finally, we are interested in a qualitative analysis of the optimized libraries. In Section 5, objective improvement measures (see Table 2) were given far less attention than the specific changes proposed to some libraries in the discussion following Table 2. Therefore, we have not used statistical tests to assert whether the results of independent optimization rounds were due to chance (conclusion/construct validity). We intend to apply such tests in the future to compare the power of different optimization algorithms, but in the sense of results attained and the time required to find these results.
During the Optimizer construction of the some essential lessons were learned. Some of them could have led the research to a premature end, whether a solution for that thread was not possible in viable time. They were someway depicted in Sections 5 and 6.
8.1 Correct choice of language interpretation engine
Choosing the right interpretation engine is one of the initial decisions when one is studying interpreted languages. Nevertheless, one of the most critical decisions. The criteria that should be observed for the choice are (i) compatibility with the correct version of the code of the subjects chosen for observation and (ii) the performance required to observe thousands of variations of this code. The troubleshooting about this item is discussed in Sections 5.2 and 5.3.
8.2 Representation used for code variants (solutions)
Representing a solution is a common problem in SBSE. Therefore, there is a small trap in this decision. Using a standard representation in other searches, such as code lines or even AST of code can lead to very high memory consumption, as seen in Section 5. Le Goues et al. (2012) propose using a Patch representation to reduce memory consumption for genetic algorithms targeting automated bug fixing. This type of representation might reduce significantly the memory consumed by the optimization process of this kind of research. The troubleshooting about this item is discussed in Sections 5.4, 5.5 and 5.6.
8.3 Tests quality
The tests percent coverage criteria that were used to select the subjects was considered necessary to prevent the Optimizer from removing “correct” instructions, that is, instructions that the test exercise them. However, we do not consider analyzing the quality of these tests as a selection criterion. After analyzing some previous Optimizer results, we noticed that covered instructions and even whole functions were removed entirely. This behavior is directly associated with the perceived quality of the tests in question. One point to be observed in this type of research is how the test design has, in addition to completeness (measure in coverage percent), quality.
We did not find a quality metric of tests that could support us in this decision and, therefore, only in the analysis of the previous results it was possible to revise the code of the tests in the perspective of the modification produced by the Optimizer. The troubleshooting about this item is discussed along with all Section 6.
8.4 Environment for running tests
8.5 Some other situations
Some of the lessons learned in the construction of the Optimizer were classified as specific to the development of the Optimizer. For example, platform switching from the typical environment to the HPC environment or the absent of libraries requirements/documents. These two and other situations are described in detail in Section 5 of the text.
The authors would like to thank Núcleo Avançado de Computação de Alto Desempenho at COPPE/UFRJ, CAPES and CNPq for their support. Prof. Barros and Prof. Travassos are CNPq Researchers.
This research was developed with the support of the Núcleo Avançado de Computação de Alto Desempenho (NACAD) of COPPE/UFRJ. CAPES and CNPq also supported this research. Prof. Travassos and Prof. Barros are CNPq Researchers.
Availability of data and materials
Please contact the first author for data requests.
FF performed the experiments and analyses while supervised by GT and MB. MB and GT repeated the analyses with a revision bias, proposed and supervise corrections. FF, MB, and GT jointly wrote the text. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
- Arcuri A, White DR, Clark J, Yao X (2008) Multi-objective improvement of software using coevolution and smart seeding. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). http://doi.org/10.1007/978-3-540-89694-4_7 Google Scholar
- Bierman G, Abadi M, Torgersen M (2014) Understanding typescript. Proceedings of the 23rd European Conference on Object-Oriented Programming, Genova, ITView ArticleGoogle Scholar
- Cody-Kenny B, Lopez EG, Barrett S (2015) locoGP: Improving Performance by Genetic Programming Java Source Code. In Genetic Improvement 2015 Workshop. http://doi.org/doi:10.1145/2739482.2768419
- Harman M, Jia Y, Langdon WB (2014) Babel Pidgin: SBSE can grow and graft entirely new functionality into a real world system. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). http://doi.org/10.1007/978-3-319-09940-8_19 Google Scholar
- Harman M, Langdon WB, Jia Y, White DR, Arcuri A, Clark JA (2012) The GISMOE challenge: constructing the pareto program surface using genetic programming to find better programs (keynote paper). In: Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering (ASE’12). http://doi.org/10.1145/2351676.2351678
- Le Goues C, Dewey-Vogt M, Forrest S, Weimer W (2012) A systematic study of automated program repair: Fixing 55 out of 105 bugs for $8 each. In Proceedings - International Conference on Software Engineering. http://doi.org/10.1109/ICSE.2012.6227211
- Orlov M, Sipper M (2011) Flight of the FINCH through the Java wilderness. IEEE Transactions Evolutionary Computation 15(2):166–182View ArticleGoogle Scholar
- Fountoukis SG and Chatzistavrou DT (2009). Pattern Based Object Oriented Software Systems Design for High Performance Computing. https://www.semanticscholar.org/paper/Pattern-Based-Object-Oriented-Software-Systems-for-Fountoukis-CHATZISTAVROU/2ccab1ae9af32cc0a1d109becb6d93928fb1fb91?tab=abstract
- Overbey J, Xanthos S, Johnson R, Foote B (2005) Refactorings for Fortran and high-performance computing. In Proceedings of the second international workshop on Software engineering for high performance computing system applications - SE-HPCS ’05. http://doi.org/10.1145/1145319.1145331
- Petke J, Haraldsson SO, Harman M, Langdon WB, White DR, Woodward JR (2018) Genetic Improvement of Software: A Comprehensive Survey. IEEE Transactions on Evolutionary Computation. http://doi.org/10.1109/TEVC.2017.2693219 View ArticleGoogle Scholar
- Petke J, Langdon WB, Harman M (2013) Applying genetic improvement to MINISAT. Proceedings of the Symposium on 6th Search Based Software Engineering, St Petersburgh, RUView ArticleGoogle Scholar
- Marques-Silva J, Lynce I, Malik S (2009) Conflict-driven clause learning SAT solvers. Frontiers in Artificial Intelligence and Applications. http://doi.org/10.3233/978-1-58603-929-5-131
- White DR (2010) Genetic Programming for Low-Resource Systems. PhD Thesis, Dapartment of Computer Science, University of York, UKGoogle Scholar
- Wohlin C, Runeson P, Höst M, Ohlsson MC, Regnell B, Wesslén A (2012) Experimentation in software engineering. Experimentation in Software Engineering. http://doi.org/10.1007/978-3-642-29044-2 View ArticleGoogle Scholar