
An algorithm for combinatorial interaction testing: definitions and rigorous evaluations

Abstract

Background

Combinatorial Interaction Testing (CIT) approaches have drawn the attention of the software testing community because they generate smaller, efficient, and effective sets of test cases, and they have been successful in detecting faults caused by the interaction of several input parameters. Recent empirical studies show that greedy algorithms are still competitive for CIT. It is thus interesting to investigate new greedy approaches to CIT test case generation and to perform rigorous evaluations within the greedy context.

Methods

We present a new greedy algorithm for unconstrained CIT, T-Tuple Reallocation (TTR), to generate CIT test suites specifically via the Mixed-value Covering Array (MCA) technique. The main reasoning behind TTR is to generate an MCA M by creating and reallocating t-tuples into this matrix M, considering a variable called goal (ζ). We performed two controlled experiments in which we addressed cost-efficiency and cost alone. Considering both experiments, we performed 3,200 executions related to 8 solutions. In the first controlled experiment, we compared versions 1.1 and 1.2 of TTR in order to check whether there is a significant difference between both versions of our algorithm. In this experiment, we jointly considered cost (size of test suites) and efficiency (time to generate the test suites) in a multi-objective perspective. In the second controlled experiment, we confronted TTR 1.2 with five other greedy algorithms/tools for unconstrained CIT: IPOG-F, jenny, IPO-TConfig, PICT, and ACTS. We performed two different evaluations within this second experiment: the first addressed cost-efficiency (multi-objective) and the second only cost (single objective).

Results

Results of the first controlled experiment indicate that TTR 1.2 is more adequate than TTR 1.1, especially for higher strengths (5 and 6). In the second controlled experiment, TTR 1.2 also presents better performance for higher strengths (5 and 6), where only in one case (the comparison with IPOG-F) it is not superior. We attribute this better performance of TTR 1.2 to the fact that it no longer generates, at the beginning, the matrix of t-tuples; instead, the algorithm creates and reallocates t-tuples into M one at a time.

Conclusion

Considering the metrics we defined in this work and based on both controlled experiments, TTR 1.2 is a better option if we need to consider higher strengths (5, 6). For lower strengths, other solutions, like IPOG-F, may be better alternatives.

1 Introduction

The academic community has been making efforts to reduce the cost of the software testing process by decreasing the size of test suites while at the same time aiming at maintaining the effectiveness (ability to detect defects) of such sets of test cases. Hence, several contributions exist for test suite/case minimization (Yoo and Harman 2012; Ahmed 2016; Huang et al. 2016; Khan et al. 2016) where the goal is to decrease the size of a test suite by eliminating redundant test cases, and hence demanding less effort to execute the test cases (Yoo and Harman 2012). One of the approaches to reduce the number of test cases is Combinatorial Interaction Testing (CIT) (Petke et al. 2015), also known as Combinatorial Testing (CT) (Kuhn et al. 2013; Schroeder and Korel 2000), Combinatorial Test Design (CTD) (Tzoref-Brill et al. 2016), or Combinatorial Designs (CD) (Mathur 2008). CIT relates to combinatorial analysis whose objective is to answer whether it is possible to organize elements of a finite set into subsets so that certain balance or symmetry properties are satisfied (Stinson 2004).

There are reports which claim the success of CIT (Dalal et al. 1999; Tai and Lei 2002; Kuhn et al. 2004; Yilmaz et al. 2014; Qu et al. 2007; Petke et al. 2015). Such approaches have drawn the attention of the software testing community because they generate smaller test suites (lower cost to run) that remain effective (greater ability to find faults in the software), and they have been successful in detecting faults due to the interaction of several input parameters (factors).

CIT approaches to generate test cases can be divided into four main classes: Binary Decision Diagrams (BDDs) (Segall et al. 2011), Satisfiability (SAT) solving (Cohen et al. 1997; Yamada et al. 2015; Yamada et al. 2016), meta-heuristics (Garvin et al. 2011; Shiba et al. 2004; Hernandez et al. 2010), and greedy algorithms (Lei and Tai 1998; Lei et al. 2007)Footnote 1. Recent CIT test case generation methods based on BDD and SAT are interesting for constrained problems (where there are restrictions related to parameter interactions) but perform worse than greedy algorithms/tools for unconstrained problems (where there are no restrictions at all).

To corroborate this claim, in (Segall et al. 2011) a BDD-based approach, implemented in the Focus tool, was better in terms of cost than the greedy solutions Advanced Combinatorial Testing System (ACTS) (Yu et al. 2013), Pairwise Independent Combinatorial Testing (PICT) (Czerwonka 2006), and jenny (Jenkins 2016) in the constrained domain. However, their method was worse than such greedy solutions for unconstrained problems.

A recent SAT-based approach (Yamada et al. 2016), implemented in the Calot tool, performed well in terms of efficiency (time to generate the test suites) and cost (test suite sizes), again compared with the greedy tools ACTS (Yu et al. 2013) and PICT (Czerwonka 2006). Despite the advantages of the SAT-based approach, ACTS was much faster than Calot for many 3-way test case examples. Moreover, if unconstrained CIT is considered, ACTS was again remarkably faster than Calot for large SUT models and higher-strength test case generation.

In the context of CIT, meta-heuristics such as simulated annealing (Garvin et al. 2011), genetic algorithms (Shiba et al. 2004), and the Tabu Search Approach (TSA) (Hernandez et al. 2010) have been used. Recent empirical studies show that meta-heuristic and greedy algorithms have similar performance (Petke et al. 2015). Hence, early fault detection via a greedy algorithm with constraint handling (implemented in the ACTS tool (Yu et al. 2013)) was no worse than a simulated annealing algorithm (implemented in the CASA tool (Garvin et al. 2011)). Moreover, there was not enough difference between test suites generated by ACTS and CASA in terms of efficiency (runtime) and t-way coverage. All such previous remarks, some of them based on strong empirical evidence, emphasize that greedy algorithms are still very competitive for CIT.

Even though some authors have argued that real-world CIT applications reside in the constrained domain (Bryce and Colbourn 2006; Cohen et al. 2008; Petke et al. 2015), it is important to mention that unconstrained CIT may be interesting from a practical point of view, especially for critical applications such as satellites, rockets, airplanes, controllers of unmanned metro train systems, etc. For such types of applications, robustness testing is very important. In the context of software systems, robustness testing aims to verify whether the Software Under Test (SUT) behaves correctly in the presence of invalid inputs. Therefore, even though an unconstrained CIT-derived test case may seem pointless or even somewhat difficult to execute, it may still be interesting to see how the software will behave in the presence of inconsistent inputs.

Let us consider that we need to test a communication protocol implemented in several critical embedded systems. If each field of such a protocol is a parameter, it is interesting to impose no restriction (no constraint) on the parameter interactions, so that a certain Protocol Data Unit (PDU) sent from system A to system B may have values not allowed in the combination of the fields (parameters) of the PDU. In other words, if the specification says that when field f_i = 1, possible values of field f_j are between 20 and 70 (20 ≀ f_j ≀ 70), and another field f_k < 5, then a test case where f_i = 1, 1 ≀ f_j ≀ 4, and f_k < 5 is clearly inconsistent because of the value of f_j. But this can be precisely the goal of the test designer, who wants to check how the receiving system (B) will act upon receiving such a PDU from A. This is an example where unconstrained CIT is relevant. It is important to mention that the argument is not that constraints cannot be used for testing critical systems but rather that, for certain types of tests (robustness), constraints are not as relevant.

Based on the context and motivation previously presented, this research relates to greedy algorithms for unconstrained CIT. In (Pairwise 2017), 43 algorithms/tools are presented for CIT, and many others exist that are not shown there. Some of these solutions are variations of the In-Parameter-Order (IPO) algorithm (Lei and Tai 1998) such as IPOG, IPOG-D (Lei et al. 2007), IPOG-F, IPOG-F2 (Forbes et al. 2008), IPOG-C (Yu et al. 2013), IPO-TConfig (Williams 2000), ACTS (where IPOG, IPOG-D, IPOG-F, IPOG-F2 are implemented) (Yu et al. 2013), and CitLab (Cavalgna et al. 2013). All IPO-based proposals have in common the fact that they perform horizontal and vertical growths to construct the final test suite. Moreover, some need two auxiliary matrices, which may decrease their performance by demanding more computer memory. Such algorithms accomplish exhaustive comparisons within each horizontal extension, which may penalize efficiency.

PICT can be regarded as a baseline tool on which other approaches have been built (PictMaster 2017). The algorithm implemented in this tool works in two phases, the first being the construction of all t-tuples to be covered. This can be a drawback, since storing many t-tuples may require a large amount of memory or disk space.

Thus, it is interesting to think about a new greedy solution for CIT that does not need, at the beginning, to enumerate all t-tuples (as PICT does) and does not demand many auxiliary matrices to operate (as some IPO-based approaches do). Although we have some recent rigorous empirical evaluations comparing greedy algorithms with meta-heuristic solutions (Petke et al. 2015) and greedy approaches against SAT-based methods (Yamada et al. 2016), there are no rigorous empirical assessments comparing greedy algorithms/tools, representative of the unconstrained CIT domain, among each other.

In this paper, we present a new algorithm, called T-Tuple Reallocation (TTR), to generate CIT test suites specifically via the Mixed-value Covering Array (MCA) technique. The main reasoning behind TTR is to generate an MCA M by creating and reallocating t-tuples into this matrix M, considering a variable called goal (ζ). TTR is a greedy algorithm for unconstrained CIT.

Three versions of the TTR algorithm were developed and implemented in Java. Version 1.0 is the original version of TTR (Balera and Santiago JĂșnior 2015). In version 1.1 (Balera and Santiago JĂșnior 2016), we made a change where we do not order the input parameters. In the last version, 1.2, the algorithm no longer generates the matrix of t-tuples (Θ) but rather creates and reallocates t-tuples into M one at a time. Moreover, version 1.2 was also implemented in C.

We performed two controlled experiments addressing cost-efficiency and cost alone. Considering both experiments, we performed 3,200 executions related to 8 solutions. In the first controlled experiment, our goal was to compare versions 1.1 and 1.2 of TTR (in Java) in order to check whether there is a significant difference between both versions of our algorithm. In this experiment, we jointly considered cost (size of test suites) and efficiency (time to generate the test suites) in a multi-objective perspective. We conclude that TTR 1.2 is more adequate than TTR 1.1, especially for higher strengths (5 and 6).

We then carried out a second controlled experiment where we confronted TTR 1.2 with five other greedy algorithms/tools for unconstrained CIT: IPOG-F (Forbes et al. 2008), jenny (Jenkins 2016), IPO-TConfig (Williams 2000), PICT (Czerwonka 2006), and ACTS (Yu et al. 2013). We performed two evaluations. In the first one we compared TTR 1.2 with IPOG-F and jenny, since these were the solutions for which we had the source code (needed to precisely measure the time). Hence, a cost-efficiency (multi-objective) assessment was accomplished. In order to address a possible evaluation bias in the time measures due to different programming languages, we compared the implementation of TTR 1.2 (in Java) with IPOG-F (in Java), and the implementation of TTR 1.2 (in C) with jenny (in C). In the second assessment, we did a cost (single objective) evaluation where TTR 1.2 (Java) was compared with PICT, IPO-TConfig, and ACTS. The conclusion is the same as before: TTR 1.2 is better for higher strengths (5 and 6).

In this paper, we extend our previous works where we presented version 1.0 of TTR (Balera and Santiago JĂșnior 2015), and version 1.1 together with another controlled experiment (Balera and Santiago JĂșnior 2016). The contributions of this work are:

  ‱ Even though we considered version 1.1 of TTR in (Balera and Santiago JĂșnior 2016), we did not detail this version there, since the focus of that paper was on the controlled experiment itself. Thus, we highlight the key features of TTR 1.1 here;

  ‱ We created another version of our algorithm, 1.2, where, at the beginning, TTR does not generate the matrix of t-tuples. Our goal here is to avoid an exhaustive combination of t-tuples, as might happen with other classical greedy approaches. Moreover, we rely on just one auxiliary matrix, unlike other greedy solutions that require two auxiliary matrices;

  ‱ We performed two controlled experiments in the unconstrained CIT domain (TTR 1.1 × TTR 1.2; TTR 1.2 × IPOG-F, jenny, IPO-TConfig, PICT, ACTS) with almost three times more participants, in each experiment, than in the previous one (Balera and Santiago JĂșnior 2016). In addition, we ran each participant (instance) 5 times with different input orders of parameters and values to address the nondeterminism of the solutions. To the best of our knowledge, no previous research presented rigorous empirical evaluations for greedy solutions within the unconstrained CIT domain;

  ‱ We accomplished a truly multi-objective (cost-efficiency) evaluation in both controlled experiments (in the second experiment, in its first assessment). Previously (Balera and Santiago JĂșnior 2016), we analyzed cost and efficiency in isolation.

This paper is structured as follows. Section 2 presents an overview of the main concepts related to CIT. In Section 3, we show the main definitions and procedures of versions 1.1 and 1.2 of our algorithm. Section 4 shows all the details of the first controlled experiment, in which we compare TTR 1.1 against TTR 1.2, and Section 5 presents its results and discussion. In Section 6, the second controlled experiment is presented, where TTR 1.2 is confronted with the other five greedy tools. Section 7 presents related work. In Section 8, we show the conclusions and future directions of our research.

2 Background

In this section we present some basic concepts and definitions (Kuhn et al. 2013; Petke et al. 2015; Cohen et al. 2003) related to CIT. A CIT algorithm receives as input a number of parameters (also known as factors), p, which refer to the input variables. Each parameter can assume a number of values (also known as levels), v. Moreover, t is the strength of the coverage of interactions. For example, in pairwise testing, the degree of interaction is two, so the value of strength is 2. In t-way testing, a t-tuple is an interaction of parameter values of size equal to the strength. Thus, a t-tuple is a finite ordered list of elements, i.e. a sequence rather than a mere set of elements.

A Fixed-value Covering Array (CA), denoted by CA(N,p,v,t), is an N×p matrix of entries from the set {0,1,⋯,(v−1)} such that every set of t columns contains each possible t-tuple of entries at least a certain number of times (e.g. once). N is the number of rows of the array (matrix). Note that in a CA, entries are from the same set of v values.

A Mixed-value Covering Array (MCA)Footnote 2 is an extension of a CA and is more flexible because it allows parameters to assume values from different sets. Hence, it is represented as MCA\(\left(N, v^{p_{1}}_{1} v^{p_{2}}_{2} \ldots v^{p_{m}}_{m}, t\right)\), where N is the number of rows of the matrix, \(\sum\limits_{i=1}^{m} p_{i}\) is the number of parameters, each \(v_i\) is the number of values for each parameter \(p_i\), and t is the strength.

Therefore, in CIT a CA or MCA is a test suite and each row of such matrices is a test case. Suppose that we need to generate a pairwise unconstrained CIT test suite considering the following parameters and their respective values:

$$\begin{array}{*{20}l} OS &= \{macOS, Linux, Windows\},\\ Protocol &= \{IPv4, IPv6\},\\ DBMS &= \{MySQL, PostgreSQL, Oracle\}. \end{array} $$

We can formulate this problem as MCA(N, 2^1 3^2, 2), which is denoted as a model for the CIT problem. In other words, we have one parameter (Protocol) which can assume two values, two parameters (OS, DBMS) which can assume three values, and t=2.
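To make the pairwise coverage criterion concrete, the following Java sketch (ours, not part of TTR or any of the cited tools; names such as coversAllPairs are illustrative) checks whether a candidate test suite covers every 2-way combination of values for the OS/Protocol/DBMS model above.

```java
// Illustrative only: checks 2-way (pairwise) coverage of a test suite.
public class PairwiseCheck {

    // Returns true if every pair of values of every pair of parameters
    // appears in at least one test case (row) of the suite.
    static boolean coversAllPairs(String[][] suite, String[][] domains) {
        int p = domains.length;
        for (int i = 0; i < p; i++) {
            for (int j = i + 1; j < p; j++) {
                for (String vi : domains[i]) {
                    for (String vj : domains[j]) {
                        boolean covered = false;
                        for (String[] test : suite) {
                            if (test[i].equals(vi) && test[j].equals(vj)) {
                                covered = true;
                                break;
                            }
                        }
                        if (!covered) return false;
                    }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[][] domains = {
            {"macOS", "Linux", "Windows"},    // OS
            {"IPv4", "IPv6"},                 // Protocol
            {"MySQL", "PostgreSQL", "Oracle"} // DBMS
        };
        // One possible pairwise-covering suite (9 rows) for MCA(N, 2^1 3^2, 2).
        String[][] suite = {
            {"macOS", "IPv4", "MySQL"},      {"macOS", "IPv6", "PostgreSQL"},
            {"macOS", "IPv4", "Oracle"},     {"Linux", "IPv6", "MySQL"},
            {"Linux", "IPv4", "PostgreSQL"}, {"Linux", "IPv6", "Oracle"},
            {"Windows", "IPv4", "MySQL"},    {"Windows", "IPv6", "PostgreSQL"},
            {"Windows", "IPv4", "Oracle"}
        };
        System.out.println("All pairs covered: " + coversAllPairs(suite, domains));
    }
}
```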

As we have mentioned in Section 1, CIT is an interesting solution for the test suite minimization problem. As a matter of perspective, let us consider that there are 10 parameters (A, B, ⋯, J) and that each parameter has 5 values, i.e. A = {a_1, a_2, ⋯, a_5}, B = {b_1, b_2, ⋯, b_5}, ..., J = {j_1, j_2, ⋯, j_5}. If we performed an exhaustive combination, there would be 5^10 = 9,765,625 generated test cases, where each test case is tc_i = {a_k, b_k, ⋯, j_k}. By using version 1.2 of TTR with t=2, even in an unconstrained context, the test suite is reduced to 45 test cases. This gives an idea of the strength of CIT for test suite minimization.

Note that the concepts and definitions we provided in this section are related to the context in which our work is inserted: unconstrained CIT. In case of constrained CIT, constraints must be considered and other definitions can be used (see e.g. (Yamada et al. 2016)).

3 TTR: a new algorithm for combinatorial interaction testing

In this section we detail versions 1.1 and 1.2 of our algorithm. The three versions (1.0 (Balera and Santiago JĂșnior 2015), 1.1, and 1.2) of TTR were implemented in Java.

3.1 TTR: Version 1.1

Version 1.0 of TTR (Balera and Santiago JĂșnior 2015) can be summarized as follows: (i) it generates all possible t-tuples that have not yet been covered. The Constructor procedure constructs the matrix Θ; (ii) it generates an initial solution, the matrix M; and (iii) it reallocates the t-tuples from Θ in order to achieve the best final solution (M) via the Main procedure. Then, the final set of test cases is updated in the matrix M. An important point here is that we order the parameters and values that are submitted to the algorithm. In other words, if we submit five parameters A, B, C, D, E with 10, 4, 3, 8, 5 values respectively, TTR orders these five parameters in descending order of their number of values: A, D, E, B, C. The goal is to make the algorithm insensitive to the input order of parameters and values.
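For illustration only, the ordering step of TTR 1.0 could be sketched in Java as below (the class and method names are ours; TTR 1.1 simply omits this step):

```java
import java.util.Arrays;
import java.util.Comparator;

// Illustrative sketch of the ordering step of TTR 1.0: parameters are sorted
// by the number of values they have, from the largest to the smallest domain.
class ParameterOrdering {
    static String[][] orderByDomainSize(String[][] values) {
        String[][] ordered = values.clone();
        ordered = Arrays.copyOf(ordered, ordered.length);
        Arrays.sort(ordered, Comparator.comparingInt((String[] v) -> v.length).reversed());
        return ordered;
    }
    // For domains of sizes 10, 4, 3, 8, 5 (parameters A..E), the resulting
    // order corresponds to A, D, E, B, C.
}
```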

The same steps described above also exist in TTR 1.1. However, compared with version 1.0 (Balera and Santiago JĂșnior 2015), in version 1.1 we do not order the parameters and values submitted to our algorithm. The result is that test suites of different sizes may be derived if we submit a different order of parameters and values. The motivation for such a change is that we realized that, in some cases, fewer test cases were created when parameters and values were not ordered.

Let us consider the running example in Fig. 1 with the strength, t, equal to 2. It is important to note that this is at the unit testing level and hence each parameter of the register method is an input parameter submitted to TTR. Thus, there are 3 parameters: bank, function, and card. We assume that there are two banks (bankA, bankB), two functions (debit, credit), and three types of cards (cardA, cardB, cardC) to deal with. Therefore, there are 2, 2, and 3 values of bank, function, and card, respectively, as shown in Table 1.

Fig. 1

A running example: register method

Table 1 Example of parameters and values: Fig. 1

A high-level view of version 1.1 of TTR is in Algorithm 1. The main reasoning of TTR 1.1 is to build an MCA M through the reallocation of t-tuples from a matrix Θ to this matrix M, such that each reallocated t-tuple covers the greatest number of t-tuples not yet covered, considering a variable called goal (ζ). Also note that P is the submitted set of parameters, V is the set of values of the parameters, and t is the strength. As we have just pointed out, TTR 1.1 follows the same general 3 steps as TTR 1.0.

Before going on with the descriptions of the procedures of our algorithm, we need to define the following operators applied to the structures (set, sequence, matrix) we handle. We also present some examples to better illustrate how such operators work.

Definition 1

Let A be a sequence and B be a set. The addition sequence-set operator, ⊙, is such that A⊙B is a sequence where the elements of B are added after the last position of A. Thus, if |A| is the length of sequence A and |B| is the cardinality of set B, |A⊙B|=|A|+|B|.

Example: Let us consider sequence A={1,2,3} and set B={4,5}. Then, A⊙B={1,2,3,4,5}.

Definition 2

Let A and B be two sequences with the same length, i.e. |A| = |B|. The addition sequence-sequence operator, ⊕, is such that A⊕B is a sequence where the element in position i of A⊕B, ab_i, is a_i, the element of A in position i, or b_i, the element of B in position i. Also note the definition of an "empty" element, λ, within a sequence, which is an element with no value. This operator then assumes that if a_i ≠ λ and b_i ≠ λ then ab_i = a_i = b_i. However, if a_i = λ and b_i ≠ λ then ab_i = b_i. On the other hand, if a_i ≠ λ and b_i = λ then ab_i = a_i. Note that |A⊕B| = |A| = |B|.

Example: Let us consider sequences A={1,2,λ} and B={λ,2,3}. Then, A⊕B={1,2,3}.
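A minimal Java sketch of the ⊕ operator, assuming λ is represented as null (the method name is ours):

```java
// Illustrative sketch of the addition sequence-sequence operator (⊕);
// the empty element λ is represented here as null.
static String[] oplus(String[] a, String[] b) {
    if (a.length != b.length) {
        throw new IllegalArgumentException("sequences must have the same length");
    }
    String[] result = new String[a.length];
    for (int i = 0; i < a.length; i++) {
        // The definition assumes a[i] and b[i] agree whenever both are non-λ.
        result[i] = (a[i] != null) ? a[i] : b[i];
    }
    return result;
}
// Example: oplus({"1", "2", null}, {null, "2", "3"}) yields {"1", "2", "3"}.
```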

Definition 3

Let A and B be two sequences. The removal operator, ⊖, is such that A⊖B is a sequence obtained by "removing" each element of B, b_i, from A. This operator assumes that the original sequences A and B are known so that A⊖B = A.

Example: Let us consider that originally we have sequences A={1,2,λ}, B={λ,2,3}, and A⊕B={1,2,3}. Then A⊖B=A={1,2,λ}.

Definition 4

Let A and B be two sets. The set difference operator, ∖, is as defined in set theory.

Example: Let us consider we have sets A={1,2,3} and B={2,3}. Then A∖B={1}.

Definition 5

Let A be a matrix and B be a sequence. The concatenation operator, ∙, is such that A∙B is a matrix where a new row (sequence) B is added after the last row of A.

Example: Let us consider the matrix A below and sequence B={10,11,12}. The matrix A∙B is shown below.

$${\kern14pt} A = \left[ \begin{array}{lll} 1 & 2 & 3 \\[0.3em] 4 & 5 & 6 \\[0.3em] 7 & 8 & 9 \end{array}\right] $$
$$A \bullet B = \left[ \begin{array}{lll} 1 & 2 & 3 \\[0.3em] 4 & 5 & 6 \\[0.3em] 7 & 8 & 9 \\[0.3em] 10 & 11 & 12 \end{array}\right] $$

Definition 6

Let A be a matrix and B be a sequence. The removal from matrix operator, ∘, is such that A∘B is a matrix obtained by removing row (sequence) B, which is the last row of matrix A. This operator assumes that the original matrix A and sequence B are known so that A∘B = A.

Example: Let us consider we have matrix A and sequence B presented in the previous example. Then A∘B=A as shown below.

$$A \circ B = A = \left[ \begin{array}{lll} 1 & 2 & 3 \\[0.3em] 4 & 5 & 6 \\[0.3em] 7 & 8 & 9 \end{array}\right] $$

3.1.1 The constructor procedure

According to the specified input (parameters and values), the Constructor procedure aims to generate all t-tuples that need to be covered. Each t-tuple is a row of the matrix Θ_{|C|×|P|}Footnote 3, where |C| represents the number of t-tuples, t is the strength, and |P| is the number of parameters.

Each row, Ξ_i, of Θ is a t-tuple that has not yet been covered, and it has a variable, flag, associated with it whose purpose is to aid in the reallocation process of the t-tuple into the final solution. Note that since the order matters, each t-tuple Ξ_i is indeed a sequence and not a set. Moreover, flag does not belong to Θ. Table 2 shows the matrix Θ for the example shown in Fig. 1 and t=2. Note that interactions are made for the values of bank∖function, bank∖card, and function∖card. Then, a t-tuple corresponding to the interaction of factors bank∖function can be written in the form Ξ_i = {bankA, debit, λ}. Initially, all values of flag are false. Algorithm 2 shows the Constructor procedure.

Table 2 Matrix Θ for the example in Fig. 1

Constructor operates as follows: based on the set of parameters (domain), P, and the strength (t), interactions between the parameters are generated through the enumeration procedure and stored in a set named E (line 1). For example, we have 3 parameters (bank, function, and card) and t = 2, thus the enumerator will generate the interactions 2 by 2 (t=2) between these 3 parameters. Thus E = {I_1, I_2, I_3}, where we have the sets I_1 = {bank, function, λ}, I_2 = {bank, λ, card}, and I_3 = {λ, function, card}. For better understanding, we denote the elements of I_l in this way: bank∖function, bank∖card, and function∖card. Then, the interactions (I_l) are selected one at a time (line 2), and during this selection, t-tuples are constructed based on each parameter of that interaction: in line 5, the first parameter of the first interaction, p_1, is selected. Note that each parameter, p_j, is indeed another set composed of values, v_k. Thus, p_1 = bank = {bankA, bankB}, p_2 = function = {debit, credit}, and p_3 = card = {cardA, cardB, cardC}. Therefore, each of the values (v_k) is added to t-tuples (Ξ_i) (line 6) and also to Θ (line 7). Recall that Ξ_i is indeed a sequence. From then on, subsequent parameters are selected one by one, and a new t-tuple is generated from the combination of each of the values (v_k) with each of the preexisting t-tuples (Ξ_i) in Θ (line 16). For example, the algorithm selects the first generated interaction, I_1 = bank∖function, and constructs all t-tuples between these two parameters. After processing each interaction, I_l, the Constructor procedure removes it from the set E (line 21).
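As a rough illustration of this combination step (not the actual TTR code), the following Java sketch enumerates the t-tuples of a single parameter interaction; λ is again represented as null and all names are ours.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: enumerates all t-tuples of one parameter interaction,
// leaving null (λ) in the positions of parameters outside the interaction.
class TupleEnumeration {

    // values[j] holds the values of parameter j; 'interaction' lists the
    // indices of the t parameters taking part in this interaction.
    static List<String[]> tuplesOf(int[] interaction, String[][] values) {
        List<String[]> tuples = new ArrayList<>();
        tuples.add(new String[values.length]);        // start from an all-λ tuple
        for (int j : interaction) {
            List<String[]> extended = new ArrayList<>();
            for (String[] partial : tuples) {
                for (String v : values[j]) {
                    String[] copy = partial.clone();
                    copy[j] = v;                      // combine value with existing tuple
                    extended.add(copy);
                }
            }
            tuples = extended;
        }
        return tuples;
    }
}
```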

Note that the main difference between TTR 1.0 and 1.1 is that TTR 1.0 performs the ordering of the domain, P, that is, the parameters are ordered according to the number of values they have, from the highest to the lowest quantity. For example, consider Fig. 1 and this input order: bank, function, and card. In version 1.0, parameters are stored in an ordered way: the first parameter becomes card (3 values), the second parameter is bank (2 values), and the last parameter is function (2 values). In version 1.1, there is no such ordering, and this explains why bank and function generate the first rows (t-tuples) of Θ (see Table 2).

3.1.2 The initial solution and addition of test cases

The matrix M_{N×(|P|+1)} is the MCA we need to construct, where there are N rows (i.e. test cases) and |P| parameters. The (|P|+1)-th column is not used to represent any parameter but rather to hold the value of the goal (ζ) associated with that test case. There exists an initial solution for the matrix M that is obtained by selecting the parameter interaction I_l that has the largest amount of uncovered t-tuples (line 3 in Algorithm 1). Considering the input order bank, function, card, I_2 = bank∖card is chosen because it has 6 t-tuples and it appears before I_3 = function∖card. All t-tuples derived via I_2 in the initial solution are combined with empty test cases, respecting the order of input of the parameters/values submitted to TTR 1.1, as shown in Table 3 (see t-tuples Ξ_5 = {bankA, λ, cardA}, Ξ_6 = {bankA, λ, cardB}, ⋯ from Θ (Table 2) in the initial M).

Table 3 Initial M: example of Fig. 1

In the same way, to the extent that existing test cases are no longer sufficient to allocate the remaining t-tuples of the Θ matrix, the same procedure is used to include new test cases in matrix M. In other words, when the reallocation of t-tuples becomes inefficient, it is necessary to include new test cases. Thus, as in the construction of the initial solution, the parameter interaction I_l that has the largest amount of uncovered t-tuples is selected, so that its t-tuples become new test cases. This strategy is performed on line 3 of Algorithm 1.

3.1.3 Goals

In order to modify the current solution to obtain the final solution, the test suite M, we rely on the variable goal (ζ). For each row of M, i.e. for each test case, there is an associated goal.

As the objective is to address the largest number of uncovered t-tuples, the goal is calculated according to the maximum number of uncovered t-tuples which may potentially be covered when a t-tuple Ξ_i is moved from Θ to M. This results in a temporary test case τ_r. In order to find ζ, it is necessary to take into account: (i) the disjoint parameters, P_d, covered by the union of t-tuple Ξ_i and a test case from M; (ii) the number of parameter interactions, y, which τ_r has already covered; and (iii) the strength t. Therefore:

$$\zeta = \binom{P_{d}}{t} - y. $$

Let us consider again Fig. 1 and t = 2. According to Θ (see Table 2), the initial solution, M, is composed of the t-tuples due to the parameter interaction bank∖card. This is because I_2 = bank∖card has 6 t-tuples, I_3 = function∖card has 6 t-tuples, and I_1 = bank∖function has 4 t-tuples. As bank∖card appears before function∖card and both have 6 t-tuples, the algorithm selects it for reallocation into M.

The number of disjoint parameters, P_d, is equal to 3. As the interaction bank∖card is already contemplated in matrix M, the next parameter interaction providing the largest number of non-addressed t-tuples is function∖card. Then all 3 parameters are involved when we consider bank∖function and function∖card, which explains P_d = 3. As t = 2, we have \(\binom{3}{2} = 3\). However, one of the 3 parameter interactions has already been covered during the initial solution (bank∖card), so we need to cover only 2 parameter interactions. Thus, for each t-tuple in the initial solution M, there remains to be covered:

$$\zeta = \binom{3}{2} - 1 = 2. $$

This explains the goal (ζ) in Table 3. It is very important that y is subtracted in order to find ζ. If this is not done, the final goal will never be matched, since there are no uncovered t-tuples that correspond to this interaction.
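For illustration only, this calculation can be transcribed directly into Java as in the sketch below (method names are ours):

```java
// Illustrative sketch of the goal calculation: ζ = C(P_d, t) - y, where P_d is
// the number of disjoint parameters, t is the strength, and y is the number of
// parameter interactions already covered.
static int goal(int disjointParams, int strength, int alreadyCovered) {
    return binomial(disjointParams, strength) - alreadyCovered;
}

// Binomial coefficient C(n, k) via the multiplicative formula.
static int binomial(int n, int k) {
    int result = 1;
    for (int i = 1; i <= k; i++) {
        result = result * (n - k + i) / i;
    }
    return result;
}
// For the running example: goal(3, 2, 1) == 2, matching ζ in Table 3.
```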

Even considering y, it is also important to note that the expected targets will not always be reached with the current configurations of the M and Θ matrices. In other words, in certain cases, there will be times when no existing t-tuple will allow the test cases of the M matrix to reach their goals. It is at this point that it becomes necessary to insert new test cases in M. This insertion is done in the same way as the initial solution for M is constructed, as described in the section above.

3.1.4 The Main Procedure

The Main procedure is presented in Algorithm 3. After the construction of the matrix Θ, the initial solution, and the calculation of the goals of all t-tuples, Main sorts Θ so that the elements belonging to the parameter interaction with the greatest amount of t-tuples come first (line 1). However, these t-tuples will not be reallocated from Θ to M all at once. This is done gradually, one by one, as goals are reached (lines 7 to 11). Since the matrix M is being traversed in the loop (line 4), it will be updated every time a t-tuple is combined with some of its test cases (note ⊕ in line 5).

Let us consider Fig. 2. All matrices in this figure represent snapshots of M. The upper left matrix (a) is the initial solution. As long as there exist t-tuples (Ξ_i) in Θ, the Main procedure keeps working. Thus, Main selects from Θ the parameter interaction with the largest amount of uncovered t-tuples. In Table 2, t-tuples were selected from the parameter interaction I_3 = function∖card. Every t-tuple of the function∖card interaction is combined with each test case in M until the t-tuple matches some goal (line 7).

Fig. 2

Snapshots of M: a initial solution; b and c intermediate matrices; d final test suite

When an uncovered t-tuple fits into a row of M to complete a test case and this t-tuple is not removed on line 9 of Algorithm 3, it means that the goal for that row of M is reached. Take the first row of the initial M (Table 3), which is a test case (τ_r) originated from Ξ_5 = {bankA, λ, cardA}, and the first t-tuple of the function∖card interaction not yet covered in Θ, Ξ_11 = {λ, debit, cardA}. The addition of Ξ_11 = {λ, debit, cardA} to M is accepted because ζ = 2 is reached. Note that the initial M, with test cases τ_r, is also an input parameter of this procedure. Hence, in line 5, M is updated due to the addition sequence-sequence operator (⊕). In addition, note that τ_r is also a sequence, as Ξ_i is. In other words, by inserting Ξ_11 = {λ, debit, cardA}, we have a complete test case τ_r = {bankA, debit, cardA}. In this way, the other two interactions, bank∖function (Ξ_1 = {bankA, debit, λ}) and function∖card (Ξ_11 = {λ, debit, cardA}), are covered, and the goal is achieved. The upper right matrix (b) in Fig. 2 shows the result of this first addition.

After all combinations between t-tuples and test cases are made, that is, when the procedure ends, the new ζ is calculated. The bottom left matrix (c) shows the new values of ζ (see rows 3 and 6). Then the steps described above are repeated with the insertion/reallocation of t-tuples into the matrix M. Once an uncovered t-tuple of Θ is included in M and meets the goal, that t-tuple is excluded from Θ (line 7). Note that if a t-tuple does not allow the test case with which it was combined to reach the goal, it is "unbound" (line 9) from this test case so that it can be combined with the next test case. The final test suite is the matrix M shown at the bottom right (d).

It is possible that a certain uncovered t-tuple does not fit into M. Consequently, the flag variable associated with this t-tuple in Θ is set to true so that the Main procedure knows that such a t-tuple can no longer be compared with rows of M. Main continues as long as there are uncovered t-tuples. Table 4 shows part of Θ after the first iteration. Note that the t-tuples Ξ_13 = {λ, debit, cardC} and Ξ_16 = {λ, credit, cardC} of the function∖card interaction are not inserted into M (see the values true).

Table 4 Part of Θ: unfitness

This exception, illustrated in Table 4 with Ξ_13 = {λ, debit, cardC} and Ξ_16 = {λ, credit, cardC}, happens because the test cases generated by these t-tuples and the available rows of the matrix M address t-tuples already covered in Θ. Assuming that the test case consists of the combination of a t-tuple and row 3 of M, only one t-tuple is covered, since there are no more t-tuples to be covered in bank∖card and bank∖function, as illustrated in Table 4. However, ζ = 2 is not satisfied and these t-tuples cannot be removed from Θ. Then it is necessary to recalculate the goals according to the parameter interactions that have already been addressed.

3.2 TTR: version 1.2

The high-level view of the new version of TTR, 1.2, is in Algorithm 4. This new version no longer uses the Constructor procedure, since t-tuples are generated one at a time as they are reallocated. In other words, there is no longer a Θ, the matrix of t-tuples. What we have now is only φ, which is a matrix of parameter interactions. TTR 1.2 works as follows: (i) it generates only the parameter interactions (it does not generate the t-tuples yet); (ii) it generates an initial solution, the matrix M; and (iii) the t-tuples are generated from φ in order to obtain the final solution (M) via the Main procedure.

Let us consider the code in Fig. 3, where parameters and values are given in Table 5 and t=3. It is a method to update information in the database of a company. TTR 1.2 constructs only the parameter interactions according to the strength and stores the number of corresponding t-tuples (Ί) in a matrix φ. These parameter interactions are I_1 = {status, education, regime, λ, 8}, I_2 = {status, education, λ, working_hours, 8}, I_3 = {status, λ, regime, working_hours, 8}, and I_4 = {λ, education, regime, working_hours, 8}, where the last element of each I_l is the number of t-tuples Ί (in all these cases Ί = 8). Here, each interaction I_l is indeed a sequence because the algorithm needs to know the exact number of t-tuples and hence position matters. Note that λ is the empty element. No t-tuple corresponding to any parameter/value interaction is constructed, as shown in Table 6. The calculation of Ί is simply done by multiplying the number of values of each parameter in the corresponding interaction.
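A minimal Java sketch of this calculation (ours, for illustration):

```java
// Illustrative sketch: Ί (the number of t-tuples of a parameter interaction)
// is the product of the domain sizes of the parameters in that interaction.
// values[j] holds the values of parameter j; 'interaction' lists the indices
// of the parameters taking part in the interaction.
static int phi(int[] interaction, String[][] values) {
    int count = 1;
    for (int j : interaction) {
        count *= values[j].length;
    }
    return count;
}
// For the example of Fig. 3 (t = 3), each interaction involves three
// parameters with 2 values each, hence Ί = 2 * 2 * 2 = 8.
```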

Fig. 3

A second running example: update method

Table 5 Example of parameters and values: Fig. 3
Table 6 Matrix φ for the example of Fig. 3

3.2.1 Initial solution

In this case, the initial solution is nothing more than the construction of the t-tuples due to the parameter interaction with the greatest Ί, and their transformation into test cases. In Table 7, the t-tuples of the parameter interaction I_1 = {status, education, regime, λ, 8} were all transformed into test cases and therefore, for this parameter interaction, Ί becomes 0 and it is no longer considered in the goal (ζ) calculation (Table 8). In fact, we have 4 parameters and t = 3, thus 4 possible parameter interactions are generated: one is already covered, leaving 3 parameter interactions (I_2, I_3, I_4) to be addressed. This justifies ζ=3 (Table 7).

Table 7 Initial M for the example of Fig. 3
Table 8 Matrix φ for the example of Fig. 3: after the initial solution

3.2.2 The main procedure

The new Main procedure is presented in Algorithm 5. After calculating the parameter interactions, Ί, the initial solution, and the goals of all test cases of M, Main selects the parameter interaction that has the highest amount of uncovered t-tuples (line 2) and constructs its t-tuples so that they can be reallocated. However, they are reallocated gradually, one by one, as goals are reached (lines 4 to 13). The procedure combines the t-tuples with the test cases of M in order to match them.

Let us take the second running example (Fig. 3). The parameter interaction with the highest amount of non-addressed t-tuples is I_2 = {status, education, λ, working_hours, 8} (Ί = 8; Table 8 after the initial solution): all t-tuples of this interaction are generated and stored in a sequence S (line 3). The first t-tuple, Ξ_1 = {active, undergraduate, λ, afternoon}, is combined with each test case τ_r in M (line 7). The t-tuple in question fits test case 1, τ_1. At that moment, it is verified whether the t-tuple Ξ_i makes the test case τ_r reach its goal. This control is done through the goal() function, which receives the test case τ_r, which is then broken into t-tuples (line 8) according to the parameter interactions that have Ί other than 0. For example, the test case τ_1 = {active, undergraduate, partial, afternoon} is broken into the t-tuples {{active, undergraduate, partial, λ}, {active, undergraduate, λ, afternoon}, {active, λ, partial, afternoon}, {λ, undergraduate, partial, afternoon}}. It is then verified how many of these t-tuples do not exist in M and, if this amount equals the respective ζ, Ξ_i is permanently stored in M and one unit is subtracted from the value of Ί of each of the parameter interactions that have t-tuples covered by this test case (line 12), because this keeps track of the quantity of t-tuples that still have to be covered. Since the matrix M is being traversed in the loop (line 6), it will be updated every time a t-tuple is combined with some of its test cases (line 7).
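The following Java sketch illustrates how a complete test case can be broken into its t-tuples (it is our illustration of the idea behind the goal() function, not the actual TTR 1.2 code):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: breaks a complete test case into its t-tuples, one per
// parameter interaction of size t; positions outside the interaction are set
// to null (λ).
class TestCaseBreaker {

    static List<String[]> breakIntoTuples(String[] testCase, int t) {
        List<String[]> tuples = new ArrayList<>();
        combine(testCase, t, 0, new int[t], 0, tuples);
        return tuples;
    }

    // Recursively chooses t parameter indices and projects the test case onto them.
    private static void combine(String[] tc, int t, int start, int[] chosen,
                                int depth, List<String[]> out) {
        if (depth == t) {
            String[] tuple = new String[tc.length];
            for (int idx : chosen) {
                tuple[idx] = tc[idx];
            }
            out.add(tuple);
            return;
        }
        for (int i = start; i <= tc.length - (t - depth); i++) {
            chosen[depth] = i;
            combine(tc, t, i + 1, chosen, depth + 1, out);
        }
    }
    // For τ_1 = {active, undergraduate, partial, afternoon} and t = 3, this
    // yields the four 3-tuples listed above.
}
```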

This step is repeated for all t-tuples. Each time a t-tuple is reallocated from S into M, the goals are recalculated. For example, when the matrix M permanently receives the 4th t-tuple, the test cases that become complete (with a value for each parameter) have ζ = 0 while the others still have ζ = 3 (Table 9).

Table 9 Intermediate matrix M for the example of Fig. 3

All I_2 t-tuples are reallocated from S in order to achieve the goal of all M test cases, resulting in the final test suite presented in Table 10. In fact, the Main procedure does not construct new t-tuples from another parameter interaction while the current one still has uncovered t-tuples: if the parameter interaction I_2 (selected due to the greatest Ί) still has t-tuples, Main will not select another parameter interaction. To do this, the goal of the test cases is decreased by one unit until all t-tuples of the parameter interaction I_2 make the test cases match ζ.

Table 10 Final matrix M for the example of Fig. 3

4 Controlled experiment 1: TTR 1.1 × TTR 1.2

This section presents a controlled experiment where we compare versions 1.1 and 1.2 of TTR in order to determine whether there is a significant difference between both versions of our algorithm. We accomplished such an experiment by jointly considering cost and efficiency in a multi-objective perspective.

4.1 Definition and context

The primary aim of this study is to evaluate cost and efficiency related to CIT test case generation via versions 1.1 and 1.2 of the TTR algorithm (both implemented in Java). The rationale is to perceive whether we have significant differences between the two versions of our algorithm.

Regarding the metrics, cost refers to the size of the test suites while efficiency refers to the time to generate the test suites. Although the size of the test suite is used as an indicator of cost, it does not necessarily mean that test execution cost is always less for smaller test suites. However, we assume that this relationship (higher size of test suite means higher execution cost) is generally valid. We should also emphasize that the time we addressed is not the time to run the test suites derived from each algorithm but rather the time to generate them. We jointly analyzed cost and efficiency in a multi-objective way.

The set of samples, i.e. the subjects, are formed by instances that were submitted to both versions of TTR to generate the test suites. We randomly chose 80 test instances/samples (composed of parameters and values) with the strength, t, ranging from 2 to 6. Table 11 shows part of the 80 instances/samples used in this study. Full data obtained in this experiment are presented in (Balera and Santiago JĂșnior 2017).

Table 11 Samples for the controlled experiment: Instances. Caption: val = value; par = parameter

It is important to mention how each instance/sample can be interpreted. Let us consider instance i=1 in Table 11:

$$2^{1} 4^{1} 5^{1} 3^{1} 6^{1}, \quad t=2. $$

In the context of unit test case generation for programs developed according to the Object-Oriented Programming (OOP) paradigm, this instance can be used to generate test cases for a class that has one attribute (parameter) which can take 2 values (2^1), one attribute that can take 4 values (4^1), another attribute that can take 5 values (5^1), ⋯, and one attribute that can take 6 values (6^1). In the system and acceptance testing context, this same sample can be used to identify test scenarios (test objectives) in a model-based test case generation approach (Santiago JĂșnior 2011; Santiago JĂșnior and Vijaykumar 2012). In both cases, the test suites must meet the criteria of pairwise testing (t=2) where each combination of 2 values of all parameters must be covered. Note that these samples were randomly selected and they cover a wide range of combinations of parameters, values, and strengths to be selected for very simple but also more complex case studies with different testing levels (unit, system, acceptance, etc.).

4.2 Hypotheses and variables

We defined two hypotheses as shown below:

  • Null Hypothesis, H 0.1 - There is no difference regarding cost-efficiency between TTR 1.1 and TTR 1.2;

  ‱ Alternative Hypothesis, H 1.1 - There is a difference regarding cost-efficiency between TTR 1.1 and TTR 1.2.

Regarding the variables involved in this experiment, we can highlight the independent and dependent variables (Wohlin et al. 2012). The first type are those that can be manipulated or controlled during the trial process and define the causes of the hypotheses. For this experiment, we identified the algorithm/tool for CIT test case generation. The dependent variables allow us to observe the result of the manipulation of the independent ones. For this study, we identified the number of generated test cases and the time to generate each set of test cases, and we considered them jointly.

4.3 Description of the experiment

The experiment was conducted by the researchers who defined it. We relied on the experimentation process proposed in (Wohlin et al. 2012), using the R programming language version 3.2.2 (Kohl 2015). Both algorithms/tools (TTR 1.1, TTR 1.2) were subjected to each one of the 80 test instances (see Table 11), one at a time. The output of each algorithm/tool, with the number of test cases and the time to generate them, was recorded.

To measure cost, we simply verified the number of generated test cases, i.e. the number of rows of the final matrix M, for each instance/sample. The efficiency measurement required us to instrument each one of the implemented versions of TTR and measure the current system time before and after the execution of each algorithm. In all cases, we used a computer with an Intel Core(TM) i7-4790 CPU @ 3.60 GHz processor, 8 GB of RAM, running the Ubuntu 14.04 LTS (Trusty Tahr) 64-bit operating system. The goal of this second analysis is to provide an empirical evaluation of the time performance of the algorithms.
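For illustration, the instrumentation can be as simple as the sketch below (TTRGenerator and generate are hypothetical names, not the actual API of our implementations):

```java
// Illustrative sketch of the timing instrumentation around a generator call.
long start = System.nanoTime();
int[][] suite = TTRGenerator.generate(parameters, values, strength); // hypothetical call
long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
System.out.println("Test cases: " + suite.length + ", generation time (ms): " + elapsedMillis);
```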

To perform the multi-objective cost-efficiency evaluation, we followed two steps. First, we transformed the cost-efficiency (two-dimensional) representation into a one-dimensional one. Then, in a second step, we used statistical tests, such as the t-test or the nonparametric Wilcoxon test (Signed Rank) (Kohl 2015), to compare the two solutions (TTR 1.1 and TTR 1.2). To address the nondeterminism of the algorithms/tools, related to the input ordering of parameters and values, we generated test cases with 5 variations in the order of parameters and values, and took into account the average of these 5 assessments for the statistical tests. We then obtained points (c_{A_i}, t_{A_i}) that represent the average cost (c_{A_i}) and average time (t_{A_i}) of algorithm A (TTR 1.1, TTR 1.2) for each instance i (1 ≀ i ≀ 80).

We then determined an optimal point in the two-dimensional space, the point (0,0), which represents a cost close to 0 and a time close to 0. We say close rather than exactly 0 because an algorithm is not expected to generate a test suite with exactly 0 test cases, nor to require 0 units of time to generate the set of test cases. We then used a distance measure, the Euclidean distance, to measure the distance from the optimal point (0,0) to (c_{A_i}, t_{A_i}). Thus, each algorithm is represented by a one-dimensional set, D, where each d_i ∈ D is the Euclidean distance between (0,0) and (c_{A_i}, t_{A_i}) for every instance i. We selected the Euclidean distance because it is one of the most widely used similarity/distance measures. In software testing, the Euclidean distance has been used as a quality indicator in multi-objective test case/data generation (Filho and Vergilio 2015; Santiago JĂșnior and Silva 2017), to support the automation of test oracles for complex output domains (web applications (Delamaro et al. 2013), text-to-speech systems (Oliveira 2017)), and in many other applications.
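A minimal Java sketch of this computation (ours, for illustration):

```java
// Illustrative sketch: the one-dimensional cost-efficiency value of an
// algorithm on one instance is the Euclidean distance from the optimal point
// (0, 0) to (average cost, average time) over the 5 runs with shuffled inputs.
static double distanceToOptimal(double[] costs, double[] times) {
    double avgCost = average(costs);
    double avgTime = average(times);
    return Math.sqrt(avgCost * avgCost + avgTime * avgTime);
}

static double average(double[] xs) {
    double sum = 0.0;
    for (double x : xs) {
        sum += x;
    }
    return sum / xs.length;
}
```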

Based on this cost-efficiency one-dimensional representation, we relied on appropriate statistical evaluation to check data normality. Verification of normality was done in three steps: (i) by using the Shapiro-Wilk test (Shapiro and Wilk 1965) with a significance level α = 0.05; (ii) by checking the skewness of the frequency distribution (in this case, − 0.1 ≀ skewness ≀ 0.1 so that the data can be considered as normally distributed); and (iii) by graphical verification by means of a Q-Q plot (Kohl 2015) and a histogram. Thus, we believe we have greater confidence in this conclusion on data normality compared to an approach based only on the Shapiro-Wilk test, considering the bias effects due to the size of the samples.

If we concluded that the data came from a normally distributed population, then the paired, two-sided t-test was applied with α = 0.05. Otherwise, we applied the nonparametric paired, two-sided Wilcoxon test (Signed Rank) (Kohl 2015), also with α = 0.05. However, if the samples presented ties, we applied a variation of the Wilcoxon test, the asymptotic paired, two-sided Wilcoxon test (Signed Rank) (Kohl 2015), suitable for treating ties, with significance level α = 0.05.

In order to reject the Null Hypothesis, H 0.1, we checked whether p-value < 0.05 (t-test) or whether both p-value < 0.05 and |z| > 1.96 (Wilcoxon), where z is the z-score. If H 0.1 was rejected, we observed the average of all 80 Euclidean distances due to each algorithm. The algorithm that presented the lowest average of Euclidean distances was chosen as the most adequate. If H 0.1 could not be rejected, then the conclusion was that no statistical difference existed between both algorithms.

5 Results and discussion

In this section, we present the results of this first controlled experiment. Based on the cost-efficiency one-dimensional representation (Section 4.3), we considered four evaluation classes as follows:

  ‱ All strengths. In this case, all 80 instances/samples (Table 11) with all strengths (2, 3, 4, 5, and 6) were taken into account. Our idea here is to assess the cost-efficiency performance of both algorithms in a context where several different strengths are selected to generate a test suite;

  ‱ Low strengths. In this case, we selected only the samples with strength equal to 2. Our aim is to note how the algorithms perform for low strengths;

  ‱ Medium strengths. By selecting samples with strength equal to 3 or 4, we want to evaluate an intermediate strength context;

  • High strengths. We aim to assess the performance for higher strengths, i.e. t= 5 or 6.

Table 12 presents the Euclidean distances of part of the 80 samples (all strengths class only; complete data are in (Balera and Santiago JĂșnior 2017)) as well as the average values, \(\overline{x}\), of such distances. We checked data normality, and Table 13 presents the p-value, p, due to the Shapiro-Wilk test and the skewness. Note that this table shows p and skewness for all four classes above (all, low, medium, and high strengths). Moreover, Sol 1 is TTR 1.1 and Sol 2 is TTR 1.2. Figures 4 and 5 present the Q-Q plots and histograms for all strengths, Figs. 6 and 7 for lower strengths, Figs. 8 and 9 for medium strengths, and Figs. 10 and 11 for higher strengths, respectively.

Fig. 4

Experiment 1: Q-Q plots. a TTR1.1; b TTR 1.2 - All Strengths

Fig. 5

Experiment 1: Histograms. a TTR1.1; b TTR 1.2 - All Strengths

Fig. 6

Experiment 1: Q-Q plots. a TTR1.1; b TTR 1.2 - 2 Strength

Fig. 7

Experiment 1: Histograms. a TTR1.1; b TTR 1.2 - 2 Strength

Fig. 8

Experiment 1: Q-Q plots. a TTR1.1; b TTR 1.2 - 3 and 4 Strengths

Fig. 9

Experiment 1: Histograms. a TTR1.1; b TTR 1.2 - 3 and 4 Strengths

Fig. 10

Experiment 1: Q-Q plots. a TTR1.1; b TTR 1.2 - 5 and 6 Strengths

Fig. 11

Experiment 1: Histograms. a TTR1.1; b TTR 1.2 - 5 and 6 Strengths

Table 12 Experiment 1 - Results of the analysis of Euclidean Distance (all strengths)
Table 13 Experiment 1 - Results of the analysis of data normality

We can clearly see that these data did not come from a normally distributed population, because p < 0.05 and the skewness is far from 0. Moreover, the Q-Q plots and histograms reinforce this conclusion. Hence, we used the nonparametric paired, two-sided Wilcoxon test (Signed Rank) or its variation (asymptotic) when ties were detected. Table 14 presents the p-value, p, |z|, and additional information for the all and low strengths classes, while Table 15 shows the results for medium and high strengths.

Table 14 Experiment 1 - Results of the Wilcoxon test
Table 15 Experiment 1 - Results of the Wilcoxon test

Based on Tables 14 and 15, we could not reject H 0.1 (no difference) for the all strengths class, but we could do so for the other evaluation classes and hence accept the Alternative Hypothesis, H 1.1. As we have previously pointed out, when there is a difference regarding cost-efficiency, we examine the average values of the Euclidean distances: the smaller the better. TTR 1.1 is better, in terms of cost-efficiency, than TTR 1.2 for lower strengths (t=2). However, for medium (t=3,4) and higher strengths (t=5,6), TTR 1.2 surpassed TTR 1.1. This makes sense because TTR 1.2 does not generate, at the beginning, the matrix of t-tuples, and hence we expect this last version of our algorithm to handle higher strengths properly.

Therefore, even though we did not find a statistical difference for the all strengths class and TTR 1.1 was the best for lower strengths, we decided to select TTR 1.2 to compare with the other solutions for unconstrained CIT test case generation, because TTR 1.2 performed better than TTR 1.1 for medium and higher strengths.

5.1 Validity

The conclusion validity has to do with how confident we are that the treatment we used in an experiment is really related to the actual observed outcome (Wohlin et al. 2012). One of the threats to conclusion validity is the reliability of the measures (Campanha et al. 2010). We automatically obtained the measures via the implementations of the algorithms, and hence we believe that replication of this study by other researchers will produce similar results. Even if other researchers get different absolute results, especially related to the time to generate the test suites, simply because such results depend on the computer configuration (processor, memory, operating system), we do not expect a different conclusion validity. Moreover, we relied on adequate statistical methods to reason about data normality and whether we really found a statistical difference between TTR 1.1 and TTR 1.2. Hence, our study has a high conclusion validity.

The internal validity aims to analyze whether the treatment actually caused the outcome (result). Hence, we need to be sure that other parameters, which have not been controlled or measured, have not caused the outcome. There are many threats to internal validity, such as testing effects (measuring the participants repeatedly), history (external events or events between repeated measures of the dependent variable may influence the responses of the subjects, e.g. interruption of the treatment), instrument change, maturation (participants might mature during the study or between measurements), selection bias (differences between groups), etc. Note that the participants of our experiment are randomly generated samples composed of parameters, values, and strengths. Hence, we had neither any human, natural, or social parameter nor unanticipated events interrupting the collection of the measures once it started that could pose a threat to internal validity. Hence, we claim that our experiment has a high internal validity.

In the construct validity, the goal is to ensure that the treatment reflects the construction of the cause, and the result the construction of the effect. This validity is also high because we used the implementations of TTR 1.1 and TTR 1.2 to assess the cause, and the results, supported by the decision-making procedure via statistical tests, clearly provided the basis for the decision to be made between both algorithms.

Threats to external validity compromise the confidence in asserting that the results of the study can be generalized to and between individuals, settings, and under the temporal perspective. Basically, we can divide threats to external validity in two categories: threats to population and ecological threats.

Threats to population refer to how representative the selected samples of the population are. For our study, the ranges of strengths, parameters, and values are the determining factors for this threat. Note that, for such a study, the number of possible combinations of strengths and parameters/values is practically infinite. However, we believe that our set of samples (80) is significant, with strengths spanning from 2 to 6. Also, recall that the samples were determined completely at random (by combining parameters, values, and strengths), and the input order of parameters and values was also random (for the 5 executions addressing nondeterminism). With this, we guarantee one of the basic principles of the sampling process, which is randomness, to avoid selection bias.
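For illustration, the following sketch (in Java, the language of most of our implementations) shows how such random instances and input orders could be drawn. It is only a sketch: the ranges for the number of parameters and of values per parameter are illustrative assumptions, while the strength range (2 to 6) and the 5 input-order variations follow the description above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Illustrative sketch of the random sampling scheme; not the authors' original generator. */
public class InstanceSampler {

    public static void main(String[] args) {
        Random rnd = new Random();

        int strength = 2 + rnd.nextInt(5);               // t in {2,...,6}, as in the experiment
        int numParameters = strength + rnd.nextInt(8);   // illustrative range, at least t parameters
        List<Integer> domainSizes = new ArrayList<>();
        for (int p = 0; p < numParameters; p++) {
            domainSizes.add(2 + rnd.nextInt(5));          // illustrative number of values per parameter
        }

        // 5 executions per instance, each with a random input order of the parameters.
        for (int run = 1; run <= 5; run++) {
            List<Integer> order = new ArrayList<>(domainSizes);
            Collections.shuffle(order, rnd);
            System.out.println("run " + run + ", t=" + strength + ", parameter domains=" + order);
        }
    }
}
```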

Ecological threats refer to the degree to which the results may be generalized between different configurations. Pre-test effects, post-test effects, and the Hawthorne effect (participants feeling stimulated simply by knowing that they are taking part in an innovative experiment) are some of these threats. The participants in our experiment are the instances/samples composed of parameters, values, and strengths and, therefore, this type of threat does not apply to our case.

6 Controlled experiment 2: TTR 1.2 × other solutions

In this section, we present a second controlled experiment where we compare TTR 1.2 with five other significant greedy approaches for unconstrained CIT test case generation. Many characteristics of this second controlled experiment resemble those of the first one (Section 4). We emphasize here the main differences and refer to that section whenever necessary.

6.1 Definition and context

The aim of this experiment is to compare TTR 1.2 with five other greedy algorithms/tools for unconstrained CIT: IPOG-F (Forbes et al. 2008), jenny (Jenkins 2016), IPO-TConfig (Williams 2000), PICT (Czerwonka 2006), and ACTS (Yu et al. 2013). These algorithms/tools have been selected due to their relevance for unconstrained CIT via greedy strategies.

The IPO algorithm (Lei and Tai 1998) is the basis for several other solutions such as IPOG, IPOG-D (Lei et al. 2007), IPOG-F, IPOG-F2 (Forbes et al. 2008), IPOG-C (Yu et al. 2013), IPO-TConfig (Williams 2000), ACTS (where several versions of IPO are implemented) (Yu et al. 2013), and CitLab (Cavalgna et al. 2013). Thus, we considered three of its variations: our own implementation of IPOG-F (in Java), IPO-TConfig (in Java), and IPOG-F2 as implemented within ACTS (in Java). Note that ACTS is probably one of the most popular CIT tools, used not only in academia but also by industry professionals for various purposes (NIST National Institute of Standards and Technology 2015). A tool implemented in C, jenny (Jenkins 2016), has been used in informal (Pairwise 2017) and more formal (Segall et al. 2011) CIT comparisons. PICT (in C++) can be regarded as a baseline greedy tool, on which other tools have been built (PictMaster 2017).

Like in Section 4, the metrics are cost, measured as the size of the test suites, and efficiency, which again refers to the time to generate them. However, to properly measure the time to generate the test suites, we should have access to the source code of the tools in order to instrument them and get more precise and accurate measures. We only had the code of the implementation of TTR 1.2, of our own implementation of IPOG-F, and of jenny. Thus, we could not measure the time to generate the test cases for IPO-TConfig, PICT, and ACTS (IPOG-F2). Moreover, note that the time measurements may be influenced by the different programming languages within the cost-efficiency evaluation (TTR 1.2, IPOG-F, and jenny). Hence, we implemented TTR 1.2 not only in Java but also in C in order to address a possible evaluation bias in the time measures when comparing TTR 1.2 against the other solutions. To sum up, we decided to perform two evaluations:

  • Cost-Efficiency (multi-objective). Here, we focused on TTR 1.2, IPOG-F, and jenny since these were the solutions we had the source code and could properly measure the time to generate the test suites. Hence, we compared TTR 1.2 (in Java) with IPOG-F (in Java), and TTR 1.2 (in C) with jenny (in C);

  • Cost (single objective). In this case, we compared TTR 1.2 (only in Java since efficiency is not considered here and thus time does not matter) with PICT, IPO-TConfig, and ACTS.

With respect to the subjects, the same 80 participants of Section 4 were used (Table 11; full data are in (Balera and Santiago Júnior 2017)).

6.2 Hypotheses and variables

Hypotheses of this second experiment are:

  • Null Hypothesis, H 0.2 - There is no difference regarding cost-efficiency between TTR 1.2 (in Java) and IPOG-F (in Java);

  • Alternative Hypothesis, H 1.2 - There is difference regarding cost-efficiency between TTR 1.2 (in Java) and IPOG-F (in Java);

  • Null Hypothesis, H 0.3 - There is no difference regarding cost-efficiency between TTR 1.2 (in C) and jenny (in C);

  • Alternative Hypothesis, H 1.3 - There is difference regarding cost-efficiency between TTR 1.2 (in C) and jenny (in C);

  • Null Hypothesis, H 0.4 - There is no difference regarding cost between TTR 1.2 (in Java) and PICT;

  • Alternative Hypothesis, H 1.4 - There is difference regarding cost between TTR 1.2 (in Java) and PICT;

  • Null Hypothesis, H 0.5 - There is no difference regarding cost between TTR 1.2 (in Java) and IPO-TConfig;

  • Alternative Hypothesis, H 1.5 - There is difference regarding cost between TTR 1.2 (in Java) and IPO-TConfig;

  • Null Hypothesis, H 0.6 - There is no difference regarding cost between TTR 1.2 (in Java) and ACTS;

  • Alternative Hypothesis, H 1.6 - There is difference regarding cost between TTR 1.2 (in Java) and ACTS.

The independent variable is the algorithm/tool for CIT test case generation in both assessments (cost-efficiency, cost). The dependent variables are the number of generated test cases (cost evaluation) and, for the cost-efficiency evaluation, this number together with the time to generate each test suite, considered in a multi-objective perspective as in the previous experiment.

6.3 Description of the experiment

The general description of both evaluations (cost-efficiency, cost) of this second study is basically the same as in Section 4. Each algorithm/tool was subjected to each one of the 80 test instances, one at a time, and the outcome was recorded. Cost is the number of generated test cases, and efficiency was obtained via instrumentation of the source code, on the same computer previously mentioned.
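As a minimal sketch of this kind of instrumentation (the generate method below is a hypothetical stand-in for TTR 1.2, IPOG-F, or jenny; only the measurement pattern reflects the text), the generation time can be taken as the wall-clock interval around the call that builds the test suite:

```java
/** Timing sketch; generate(...) is a placeholder, not the actual algorithm API. */
public class TimingSketch {

    // Placeholder: in the real setting this would invoke the actual generation algorithm.
    static int[][] generate(int[] domainSizes, int strength) {
        return new int[0][];
    }

    public static void main(String[] args) {
        int[] domainSizes = {3, 3, 4, 2, 5}; // example instance with 5 parameters
        int strength = 3;

        long start = System.nanoTime();
        int[][] suite = generate(domainSizes, strength);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        // Cost = number of generated test cases; efficiency = time to generate them.
        System.out.println("cost = " + suite.length + " test cases, time = " + elapsedMs + " ms");
    }
}
```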

For the multi-objective cost-efficiency evaluation (IPOG-F, jenny), we followed the same two steps previously mentioned: transformation of the two-dimensional cost-efficiency representation into a one-dimensional one, and usage of statistical tests, such as the t-test or the nonparametric Wilcoxon Signed Rank test (Kohl 2015), to compare each pair of solutions (TTR 1.2 and the other one). To address the nondeterminism of the algorithms/tools, we again generated test cases with 5 variations in the order of parameters and values, and took the average of these 5 executions into account for the statistical tests. Hence, we obtained the points \((c_{A_i}, t_{A_i})\) and calculated the Euclidean distances from the optimal point (0,0) to \((c_{A_i}, t_{A_i})\). Then, we checked data normality and, based on the result, we used the paired, two-sided t-test with α = 0.05 (normal data) or the nonparametric paired, two-sided Wilcoxon Signed Rank test, or its asymptotic version, with α = 0.05 (non-normal data).

For the evaluation of cost (PICT, IPO-TConfig, ACTS), we did not need to transform two dimensions into one because it is a single-dimension problem. The optimal point here is the value 0, and the Euclidean distance from 0 to \(c_{A_i}\) (average cost of algorithm A for instance i, 1 ≀ i ≀ 80) is \(|0 - c_{A_i}| = |c_{A_i}|\). We then performed the statistical evaluation just as in the multi-objective case.
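The decision procedure of both evaluations can be summarized by the sketch below. It is illustrative only: the use of Apache Commons Math 3 for the paired t-test and the Wilcoxon signed-rank test is an assumption of this sketch, not necessarily the tooling used in the experiments, and the normality check (e.g., Shapiro-Wilk) must be performed separately.

```java
import org.apache.commons.math3.stat.inference.TTest;
import org.apache.commons.math3.stat.inference.WilcoxonSignedRankTest;

/** Sketch of the decision procedure; library choice and names are illustrative assumptions. */
public class DecisionProcedure {

    /** Multi-objective case: distance from the ideal point (0,0) to (cost, time)
     *  for each of the 80 instances (values already averaged over the 5 input-order runs). */
    static double[] costEfficiencyDistances(double[] avgCost, double[] avgTime) {
        double[] d = new double[avgCost.length];
        for (int i = 0; i < d.length; i++) {
            d[i] = Math.hypot(avgCost[i], avgTime[i]); // sqrt(c^2 + t^2)
        }
        return d;
    }

    /** Single-objective (cost only) case: |0 - c| = |c|. */
    static double[] costDistances(double[] avgCost) {
        double[] d = new double[avgCost.length];
        for (int i = 0; i < d.length; i++) {
            d[i] = Math.abs(avgCost[i]);
        }
        return d;
    }

    /** Paired two-sided comparison of TTR 1.2 against another solution, alpha = 0.05.
     *  'normal' must come from a separate normality analysis (e.g., Shapiro-Wilk),
     *  which this library does not provide. */
    static boolean significantDifference(double[] ttr, double[] other, boolean normal) {
        double p = normal
                ? new TTest().pairedTTest(ttr, other)
                : new WilcoxonSignedRankTest().wilcoxonSignedRankTest(ttr, other, false);
        return p < 0.05;
    }
}
```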

6.4 Results, discussion and validity

In this section, we present the outcomes of both assessments of our second controlled experiment. As in the first controlled experiment, to compare TTR 1.2 with IPOG-F, jenny, PICT, IPO-TConfig, and ACTS, we considered four evaluation classes: all, low, medium, and high strengths. Table 16 presents the Euclidean distances of part of the 80 samples (all strengths class only; complete data are in (Balera and Santiago Júnior 2017)) and the average values, \(\overline {x}\). Table 17 presents the results of the analysis of data normality (p-value (p) and skewness) for all evaluation classes. In this table, Sol 1 is the other solution and Sol 2 is TTR 1.2. Figures 12 and 13 present the Q-Q plots and histograms for all strengths, Figs. 14 and 15 for lower strengths, Figs. 16 and 17 for medium strengths, and Figs. 18 and 19 for higher strengths, respectively.

Fig. 12 Experiment 2: Q-Q plots. a IPOG-F; b jenny; c PICT; d IPO-TConfig; e ACTS - All Strengths

Fig. 13 Experiment 2: Histograms. a IPOG-F; b jenny; c PICT; d IPO-TConfig; e ACTS - All Strengths

Fig. 14 Experiment 2: Q-Q plots. a ACTS; b IPO-TConfig; c IPOG-F; d jenny; e PICT - Lower Strengths

Fig. 15 Experiment 2: Histograms. a ACTS; b IPO-TConfig; c IPOG-F; d jenny; e PICT - Lower Strengths

Fig. 16 Experiment 2: Q-Q plots. a ACTS; b IPO-TConfig; c IPOG-F; d jenny; e PICT - Medium Strengths

Fig. 17 Experiment 2: Histograms. a ACTS; b IPO-TConfig; c IPOG-F; d jenny; e PICT - Medium Strengths

Fig. 18 Experiment 2: Q-Q plots. a ACTS; b IPO-TConfig; c IPOG-F; d jenny; e PICT - Higher Strengths

Fig. 19 Experiment 2: Histograms. a ACTS; b IPO-TConfig; c IPOG-F; d jenny; e PICT - Higher Strengths

Table 16 Experiment 2 - Results of the analysis of Euclidean Distance (all strengths)
Table 17 Experiment 2 - Results of the analysis of data normality

Again we note that these data did not come from a normally distributed population. The nonparametric paired, two-sided Wilcoxon Signed Rank test, or its asymptotic variation, was then applied. Table 18 presents the p-value (p), |z|, and additional information for the all and low strengths classes, while Table 19 shows the results for medium and high strengths. We should mention that in 23 instances (3 with strength = 4, 12 with strength = 5, and 8 with strength = 6) jenny was not able to generate test cases, for some input orders of the parameters, due to an out-of-memory issue. Specifically, jenny failed to finish when the test suite size exceeded 1,000 test cases. A similar outcome happened with IPO-TConfig: even after waiting for about 6 hours, it produced no output and hence did not create test cases in 20 instances (3 with strength = 4, 9 with strength = 5, and 8 with strength = 6). In these cases, we adopted a penalty policy: in order to still take these unsuccessful participants into account, we doubled the respective measure obtained by TTR 1.2 (average value of the Euclidean distance) and assigned it to jenny and IPO-TConfig. We believe that this is a fair decision because TTR 1.2 was able to finish generating test cases for all 80 instances.
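A minimal sketch of this penalty policy is shown below; the method and array names are illustrative, with array indices corresponding to the 80 instances.

```java
/** Sketch of the penalty policy: when a competing tool failed on an instance,
 *  its measure is replaced by twice the average Euclidean distance obtained
 *  by TTR 1.2 for that same instance. Names are illustrative. */
public class PenaltyPolicy {

    static double[] applyPenalty(double[] ttrDistances, double[] otherDistances, boolean[] otherFailed) {
        double[] adjusted = otherDistances.clone();
        for (int i = 0; i < adjusted.length; i++) {
            if (otherFailed[i]) {
                adjusted[i] = 2.0 * ttrDistances[i]; // penalize the unsuccessful participant
            }
        }
        return adjusted;
    }
}
```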

Table 18 Results of the Wilcoxon test (all and low strengths)
Table 19 Results of the Wilcoxon test (medium and high strengths)

As shown in Table 18, for the all strengths class, two Null Hypotheses were rejected: H 0.2 (TTR 1.2 × IPOG-F) and H 0.5 (TTR 1.2 × IPO-TConfig). TTR 1.2 was better (lower average value of the Euclidean distances) than IPO-TConfig but worse than IPOG-F. There is no difference between TTR 1.2 and jenny, PICT, or ACTS.

As in controlled experiment 1, TTR 1.2 did not demonstrate good performance for low strengths. There is no difference between TTR 1.2 and IPO-TConfig. In all the other comparisons, the Null Hypothesis was rejected and TTR 1.2 was worse than the other solutions. This can be attributed to the fact that the algorithm favors test cases whose parameter interactions generate a large number of t-tuples, which is usually the case for larger strengths: the algorithm gives priority to covering the interaction of parameters with the greatest number of t-tuples.

For medium strengths, TTR 1.2 had mixed results. The Null Hypothesis H 0.6 (TTR 1.2 × ACTS) could not be rejected, and our algorithm was better than IPO-TConfig, while IPOG-F, jenny, and PICT surpassed TTR 1.2.

The greatest advantage of TTR 1.2 turned out to be again for higher strengths. Recall that TTR 1.2 does not create the matrix of t-tuples at the beginning, and this can potentially benefit our solution compared with the other five for higher strengths. Note that TTR 1.2 was better than jenny, PICT, IPO-TConfig, and ACTS. The only exception is the comparison against IPOG-F where the Null Hypothesis, H 0.2, could not be rejected and thus there is no statistical difference between both approaches.

In general, we can say that IPOG-F presented the best performance compared with TTR 1.2: IPOG-F was better for the all, lower, and medium strengths classes. For higher strengths, there was a statistical draw between the two approaches. An explanation for IPOG-F being better than TTR 1.2 is that TTR 1.2 ends up performing more interactions than IPOG-F. In general, the efficiency of IPOG-F is better than that of TTR 1.2, which influenced the cost-efficiency result. However, if we look at cost in isolation for all strengths, the average test suite size generated via TTR 1.2 (734.50) is better than that of IPOG-F (770.88).

As we have just stated, for higher strengths, TTR 1.2 is better than two IPO-based approaches (IPO-TConfig and ACTS/IPOG-F2), but there is no difference between our own implementation of IPOG-F and TTR 1.2. This can be explained as follows. The way the array that stores all t-tuples is constructed influences the order in which the t-tuples are evaluated by the algorithm. However, IPOG-F does not prescribe how this array should be built, leaving it to the developer to define the best way. As the order in which the parameters are presented to the algorithms alters the number of generated test cases, as previously stated, the order in which the t-tuples are evaluated can also produce a certain difference in the final result.

The conclusion of the two evaluations of this second experiment is that our solution is better and quite attractive for the generation of test cases considering higher strengths (5 and 6), where it was superior to basically all other algorithms/tools. Certainly, the main fact contributing to this result is that TTR 1.2 does not create the matrix of t-tuples at the beginning, which makes our solution more scalable (for higher strengths) in terms of cost-efficiency or cost compared with the other strategies. However, for low strengths, other greedy approaches, like IPOG-F, may be better alternatives.

As before, by making a comparison between pairs of solutions (TTR 1.2 × other) in both assessments (cost-efficiency and cost), we can say that we have high conclusion, internal, and construct validity. Regarding the external validity, we believe that we selected a significant population for our study. Detailed explanations have been given in Section 5.1 and remain valid here.

7 Related work

In this section we present some relevant studies related to greedy algorithms for CIT. The IPO algorithm (Lei and Tai 1998) is a very traditional solution designed for pairwise testing. Several approaches are based on IPO, such as IPOG, IPOG-D (Lei et al. 2007), IPOG-F, IPOG-F2 (Forbes et al. 2008), IPOG-C (Yu et al. 2013), IPO-TConfig (Williams 2000), ACTS (where IPOG, IPOG-D, IPOG-F, and IPOG-F2 are implemented) (Yu et al. 2013), and CitLab (Cavalgna et al. 2013). All IPO-based proposals have in common the fact that they perform horizontal and vertical growth to construct the final test suite. Moreover, some need two auxiliary matrices, which may decrease their performance by demanding more computer memory. Such algorithms also accomplish exhaustive comparisons within each horizontal extension, which may penalize efficiency.

IPOG-F (Forbes et al. 2008) is an adaptation of the IPOG algorithm (Lei et al. 2007). Through two main steps, horizontal and vertical growth, an MCA is built. Both growths work based on an initial solution. The algorithm is supported by two auxiliary matrices, which may decrease its performance by demanding more computer memory. Moreover, the algorithm performs exhaustive comparisons within each horizontal extension, which may cause longer execution times. On the other hand, TTR 1.2 only needs one auxiliary matrix to work and it does not generate, at the beginning, the matrix of t-tuples. These features make our solution better for higher strengths (5, 6), even though we did not find statistical difference when we compared TTR 1.2 with our own implementation of IPOG-F (Section 6.4).

IPO-TConfig is an implementation of IPO in the TConfig tool (Williams 2000). The TConfig tool can generate test cases based on strengths varying from 2 to 6. However, it is not entirely clear whether the IPOG algorithm (Lei et al. 2007) was implemented in the tool or if another approach was chosen for t-way testing. In our empirical evaluation, TTR 1.2 was superior to IPO-TConfig not only for higher strengths (5, 6) but also for all strengths (from 2 to 6). Moreover, IPO-TConfig was unable to generate test cases in 25% of the instances (strengths 4, 5, 6) we selected.

The ACTS tool (Yu et al. 2013) is one of the most used CIT tools to date. Several variations of IPO are implemented in ACTS: IPOG, IPOG-D (Lei et al. 2007), IPOG-F, and IPOG-F2 (Forbes et al. 2008). The implementation of our algorithm performed better in terms of cost, compared with IPOG-F2/ACTS, for higher strengths. However, both solutions performed similarly when we considered all strengths.

IPOG-C (Yu et al. 2013) generates MCAs considering constraints. It is an adaptation of IPOG where constraint handling is provided via a SAT solver. Its greatest contribution is a set of three optimizations that seek to reduce the number of calls to the SAT solver. As IPOG-C is based on IPOG, it accomplishes exhaustive comparisons in the horizontal growth, which may lead to longer execution times. Besides, each t-tuple must be evaluated to check whether it is valid or not.

The algorithm implemented in the PICT tool (Czerwonka 2006) has two main phases: preparation and generation. In the first phase, the algorithm generates all t-tuples to be covered. In the second phase, it generates the MCA. The upfront generation of all t-tuples can be a drawback, since a large number of tuples demands considerable storage space. With respect to its application, the tool seems best suited to lower strengths (Yamada et al. 2016). Other tools have been created based on PICT (PictMaster 2017).

The jenny tool is implemented in C (Jenkins 2016). It is a lightweight greedy tool, but one of its limitations is the number of parameters it handles: from 2 to 52. In the controlled experiment we performed, TTR 1.2 was superior to jenny for higher strengths (5, 6), but they presented similar performance for all strengths (from 2 to 6). In 27.5% of the samples (strengths 4, 5, 6), jenny could not create test cases, as we have mentioned before.

Automatic Efficient Test Generator (AETG) (Cohen et al. 1997) is based on algorithms that use ideas from statistical experimental design theory to minimize the number of tests needed for a specific level of coverage of the input test space. AETG generates test cases by means of Experimental Designs (ED) (Cochran and Cox 1950), statistical techniques used for planning experiments so that one can extract the maximum possible information from as few experiments as possible. It relies on greedy algorithms and the test cases are constructed one at a time, i.e. it does not use an initial solution.

In (Cavalgna et al. 2013), a new tool for generating MCAs with constraint handling support is presented: CitLab. Like ACTS, CitLab offers several algorithms for test suite generation: AETG, IPO, and others. The bottom line is that test case generation is only one of the characteristics of the tool. Like ACTS, CitLab does not present a new algorithm; it implements algorithms proposed in the literature. Hence, the same limitations of the existing proposals also apply here.

The Feedback Driven Adaptive Combinatorial Testing Process (FDA-CIT) algorithm is presented in (Yilmaz et al. 2014). At each iteration of the algorithm, verification of the masking of potential defects is accomplished, isolating their probable causes and then generating a new configuration that omits such causes. The idea is that masked defects exist and that the proposed algorithm provides an efficient way of dealing with this situation before test execution. However, there is no assessment of the cost of the algorithm to generate MCAs.

In order to better compare the previous studies with our algorithm, TTR 1.2, in Table 20 we show some main characteristics of all the algorithms/tools. In this table, * means that the characteristic is present, - means that it is not present, and empty (blank space) means that either it is not totally evident that the algorithm/tool has such a feature or it is not applicable.

Table 20 Greedy algorithms/tools for CIT

8 Conclusions

This paper presented a novel CIT algorithm, called TTR, to generate test cases specifically via the MCA technique. TTR produces an MCA M, i.e. a test suite, by creating and reallocating t-tuples into this matrix M, considering a variable called goal (ζ). TTR is a greedy algorithm for unconstrained CIT.

TTR was implemented in Java and C (TTR 1.2) and we developed three versions of our algorithm. In this paper, we focused on the description of versions 1.1 and 1.2 since version 1.0 was detailed elsewhere (Balera and Santiago Júnior 2015).

We carried out two rigorous evaluations to assess the performance of our proposal. In total, we performed 3,200 executions related to 8 solutions (80 instances × 5 variations × 8). In the first controlled experiment, we compared versions 1.1 and 1.2 of TTR in order to know whether there is significant difference between both versions of our algorithm. In such experiment, we jointly considered cost (size of test suites) and efficiency (time to generate the test suites) in a multi-objective perspective. We conclude that TTR 1.2 is more adequate than TTR 1.1 especially for higher strengths (5, 6). This is explained by the fact that, in TTR 1.2, we no longer generate the matrix of t-tuples (Θ) but rather the algorithm works on a t-tuple by t-tuple creation and reallocation into M. This benefits version 1.2 so that it can properly handle higher strengths.

Having chosen version 1.2, we conducted another controlled experiment where we confronted TTR 1.2 with five other greedy algorithms/tools for unconstrained CIT: IPOG-F (Forbes et al. 2008), jenny (Jenkins 2016), IPO-TConfig (Williams 2000), PICT (Czerwonka 2006), and ACTS (Yu et al. 2013). In this case, we carried out two evaluations where in the first one we compared TTR 1.2 with IPOG-F and jenny since these were the solutions we had the source code (to precisely measure the time). Moreover, to address a possible evaluation bias in the time measures when comparing TTR 1.2 against jenny (developed in C), we also implemented it in C in addition to the standard implementation in Java. Hence, a cost-efficiency (multi-objective) evaluation was performed. In the second assessment, we did a cost (single objective) evaluation where TTR 1.2 was compared with PICT, IPO-TConfig, and ACTS. The conclusion is as previously stated: TTR 1.2 is better for higher strengths (5, 6) where only in one case our solution is not superior (in the comparison with IPOG-F where we have a draw). The fact of not creating the matrix of t-tuples at the beginning explains this result.

Therefore, considering the metrics we defined in this work and based on both controlled experiments, TTR 1.2 is a better option if we need to consider higher strengths (5, 6). For lower strengths, other solutions, like IPOG-F, may be better alternatives.

Thinking about the testing process as a whole, one important metric is the time to execute the test suite, which may eventually be even more relevant than other metrics. Hence, we need to run multi-objective controlled experiments where we execute all the test suites (TTR 1.1 × TTR 1.2; TTR 1.2 × other solutions), probably assigning different weights to the metrics. We also need to investigate the parallelization of our algorithm so that it can perform even better when subjected to a more complex set of parameters, values, and strengths. One possibility is to use the Compute Unified Device Architecture/Graphics Processing Unit (CUDA/GPU) platform (Ploskas and Samaras 2016). Finally, we intend to develop another multi-objective controlled experiment addressing the effectiveness (ability to detect defects) of our solution compared with the other five greedy approaches.

Notes

  1. Despite this classification, some algorithms/tools are both SAT and greedy-based.

  2. Some authors (Kuhn et al. 2013; Cohen et al. 2003) abbreviate a Mixed-Level Covering Array as CA too. However, as we have made an explicit distinction between Fixed-value and Mixed-Level arrays, we prefer to abbreviate it as MCA. Note that an MCA is naturally a Covering Array. We use this abbreviation simply to stress that our work relates to mixed- and not fixed-value arrays.

  3. Θ is a matrix whose order varies. In other words, TTR knows the number of columns beforehand (|f|), but the number of rows (|C|) depends on the interaction of the t-way parameters’ values. During the reallocation process, TTR removes rows until Θ is empty.

Abbreviations

ACTS: Advanced combinatorial test system

AETG: Automatic efficient test generator

CA: Covering array

CIT: Combinatorial interaction testing

CUDA: Compute unified device architecture

ED: Experimental designs

GA: Genetic algorithm

GPU: Graphics processing unit

IPOG: In-parameter-order-general

IPO-TConfig: In-parameter-order TConfig

MCA: Mixed-level covering array

MOA: Mixed-level orthogonal array

OA: Orthogonal array

OOP: Object-oriented programming

PICT: Pairwise independent combinatorial testing

SA: Simulated annealing

SWPDC: Software for the payload data handling computer

TSA: Tabu search approach

TTR: T-tuple reallocation

References

  • Ahmed, BS (2016) “Test case minimization approach using fault detection and combinatorial optimization techniques for configuration-aware structural testing”. Eng Sci Technol, Int J 19(2):737–753. http://www.sciencedirect.com/science/article/pii/S2215098615001706.

  • Balera, JM, Santiago Júnior VA (2015) “T-tuple Reallocation: An algorithm to create mixed-level covering arrays to support software test case generation” In: 15th International Conference on Computational Science and Its Applications (ICCSA), 503–517. Springer International Publishing, Berlin, Heidelberg.

  • Balera, JM, Santiago Júnior VA (2016) “A controlled experiment for combinatorial testing” In: Proceedings of the 1st Brazilian Symposium on Systematic and Automated Software Testing (SAST), 2:1–2:10. ACM, New York, NY, USA. http://doi.acm.org/10.1145/2993288.2993289.

  • Balera, JM, Santiago Júnior VA (2017) Data set. https://www.dropbox.com/sh/to3a47ncqpliq5l/AACj34JQ9S1I4fzQJf0xPZfva?dl=0. Accessed 17 Oct 2016.

  • Bryce, RC, Colbourn CJ (2006) “Prioritized interaction testing for pair-wise coverage with seeding and constraints”. Inf Softw Technol 48(10):960–970.

  • Cochran, WG, Cox GM (1950) “Experimental designs”. John Wiley & Sons, New York; Chichester.

  • Cohen, MB, Dalal SR, Fredman ML, Patton GC (1997) “The AETG system: an approach to testing based on combinatorial design”. IEEE Trans Softw Eng 23(7):437–444.

  • Cohen, MB, Dwyer MB, Shi J (2008) “Constructing interaction test suites for highly-configurable systems in the presence of constraints: A greedy approach”. IEEE Trans Softw Eng 34(5):633–650.

  • Cohen, MB, Gibbons PB, Mugridge WB, Colbourn CJ, Collofello JS (2003) “A variable strength interaction testing of components” In: Proceedings of 27th Annual Int. Comp. Software and Applic. Conf. (COMPSAC), 413–418. IEEE, USA.

  • Campanha, DN, Souza SRS, Maldonado JC (2010) “Mutation testing in procedural and object-oriented paradigms: An evaluation of data structure programs” In: Brazilian Symposium on Software Engineering, 90–99. IEEE, USA.

  • Cavalgna, A, Gargantini A, Vavassori P (2013) “Combinatorial interaction testing with citlab” In: Proceedings of the 2013 IEEE Sixth International Conference on Software Testing, Verification and Validation, 376–382. IEEE, New York.

  • Czerwonka, J (2006) “Pairwise testing in the real world: Practical extensions to test-case generators” In: Proceedings 24th Pacific Northwest Software Quality Conference, 285–294. Academic Press, Portland.

  • Dalal, SR, A Jain NK, Leaton JM, Lott CM, Patton GC, Horowitz B (1999) “Model-based testing in practice” In: Proceedings 21st International Conference on Software Engineering (ICSE’99), 285–294. ACM, New York.

  • Delamaro, ME, de Lourdes dos Santos Nunes F, de Oliveira RAP (2013) “Using concepts of content-based image retrieval to implement graphical testing oracles”. Softw Test Verif Reliab 23:171–198. doi:10.1002/stvr.463.

  • Filho, RAM, Vergilio SR (2015) “A Mutation and Multi-objective Test Data Generation Approach for Feature Testing of Software Product Lines” In: 29th Brazilian Symposium on Software Engineering, Belo Horizonte.

  • Forbes, M, Lawrence J, Lei Y, Kacker RN, Kuhn DR (2008) “Refining the in-parameter-order strategy for constructing covering arrays”. J Res Natl Inst Stand Technol 113(5):287–297.

  • Garvin, BJ, Cohen MB, Dwyer MB (2011) “Evaluating improvements to a meta-heuristic search for constrained interaction testing”. Empirical Softw Eng 16(1):61–102.

  • Hernandez, LG, Valdez NR, Jimenez JT (2010) “Construction of mixed covering arrays of variable strength using a tabu search approach”. Springer International Publishing, Berlin, Heidelberg.

  • Huang, CY, Chen CS, Lai CE (2016) “Evaluation and analysis of incorporating fuzzy expert system approach into test suite reduction”. Inf Softw Technol 79:79–105. http://www.sciencedirect.com/science/article/pii/S0950584916301197.

  • Jenkins, B (2016) “Jenny: A pairwise tool”. http://burtleburtle.net/bob/math/jenny.html. Accessed 6 June 2016.

  • Khan, SUR, Lee SP, Ahmad RW, Akhunzada A, Chang V (2016) “A survey on test suite reduction frameworks and tools”. Int J Inf Manag 36(6, Part A):963–975. http://www.sciencedirect.com/science/article/pii/S0268401216303437.

  • Kohl, M (2015) “Introduction to statistical data analysis with R”. bookboon.com, London.

  • Kuhn, DR, Wallace DR, Gallo AM (2004) “Software fault interactions and implications for software testing”. IEEE Trans Softw Eng 30(6):418–421. http://doi.ieeecomputersociety.org/10.1109/TSE.2004.24.

  • Kuhn, RD, Kacker RN, Lei Y (2013) “Introduction to Combinatorial Testing”. Chapman and Hall/CRC, USA.

  • Lei, Y, Kacker R, Kuhn DR, Okun V, Lawrence J (2007) “IPOG: A general strategy for t-way software testing”.

  • Lei, Y, Tai K-C (1998) “In-Parameter-Order: A test generation strategy for pairwise testing” In: Proceedings of the IEEE Int. Symp. on High-Assurance Syst. Eng. (HASE), 254–261. IEEE Computer Society Press, USA.

  • Mathur, AP (2008) “Foundations of software testing”. Dorling Kindersley (India), Pearson Education in South Asia, Delhi, India.

  • NIST National Institute of Standards and Technology (2015) “Automated combinatorial testing for software (ACTS)”. http://csrc.nist.gov/groups/SNS/acts/. Accessed 29 July 2017.

  • Oliveira, RAP (2017) “Test oracles for systems with complex outputs: the case of TTS systems”. PhD Thesis, Universidade de São Paulo, Brazil.

  • Pairwise (2017) “Pairwise Testing: Combinatorial Test Case Generation”. http://www.pairwise.org/tools.asp. Accessed 29 July 2017.

  • Petke, J, Cohen MB, Harman M, Yoo S (2015) “Practical combinatorial interaction testing: Empirical findings on efficiency and early fault detection”. IEEE Trans Softw Eng 41(9):901–924.

  • PictMaster (2017) “Combinatorial testing tool PictMaster”. https://osdn.net/projects/pictmaster/. Accessed 29 July 2017.

  • Ploskas, N, Samaras N (2016) “GPU Programming in MATLAB”. Morgan Kaufmann, Boston. http://www.sciencedirect.com/science/article/pii/B9780128051320099951.

  • Qu, X, Cohen MB, Woolf KM (2007) “Combinatorial interaction regression testing: A study of test case generation and prioritization” In: Proc. IEEE Int. Conf. Softw. Maintenance, 255–264. IEEE Computer Society Press, USA.

  • Santiago Júnior, VA (2011) “Solimva: A methodology for generating model-based test cases from natural language requirements and detecting incompleteness in software specifications”. PhD thesis, Instituto Nacional de Pesquisas Espaciais (INPE).

  • Santiago Júnior, VA, Silva FEC (2017) “From Statecharts into Model Checking: A Hierarchy-based Translation and Specification Patterns Properties to Generate Test Cases” In: Proceedings of the 2nd Brazilian Symposium on Systematic and Automated Software Testing (SAST), Fortaleza, 10–20. ACM Press, New York.

  • Santiago Júnior, VA, Vijaykumar NL (2012) “Generating model-based test cases from natural language requirements for space application software”. Softw Qual J 20(1):77–143. doi:10.1007/s11219-011-9155-6.

  • Schroeder, PJ, Korel B (2000) “Black-box test reduction using input-output analysis” In: Harold M (ed) Proceedings of the 2000 ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA ’00), 173–177. ACM, New York.

  • Segall, I, Tzoref-Brill R, Farchi E (2011) “Using binary decision diagrams for combinatorial test design” In: Proceedings of the 2011 International Symposium on Software Testing and Analysis (ISSTA ’11), 254–264. ACM, New York.

  • Shapiro, SS, Wilk MB (1965) “An analysis of variance test for normality (complete samples)”. Biometrika 52(3-4):591.

  • Shiba, T, Tsuchiya T, Kikuno T (2004) “Using artificial life techniques to generate test cases for combinatorial testing” In: Proceedings 28th Int. Comput. Softw. Appl. Conf., Des. Assessment Trustworthy Softw.-Based Syst, 72–77. IEEE Computer Society Press, USA.

  • Stinson, DR (2004) “Combinatorial Designs: Constructions and Analysis”. Springer, New York.

  • Tai, KC, Lei Y (2002) “A test generation strategy for pairwise testing”. IEEE Trans Softw Eng 28(1):109–111.

  • Tzoref-Brill, R, Wojciak P, Maoz S (2016) “Visualization of combinatorial models and test plans” In: Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering (ASE), 144–154. IEEE, USA.

  • Williams, AW (2000) “Determination of test configurations for pairwise interaction coverage” In: Testing of Communicating Systems: Tools and Techniques, IFIP TC6/WG6.1 13th International Conference on Testing Communicating Systems (TestCom 2000), August 29 - September 1, 2000, 59–74, Ottawa, Canada.

  • Wohlin, C, Runeson P, Host M, Ohlsson MC, Regnell B, Wesslén A (2012) “Experimentation in Software Engineering”. Springer-Verlag Berlin Heidelberg, Germany.

  • Yamada, A, Kitamura T, Artho C, Choi E, Oiwa Y, Biere A (2015) “Optimization of combinatorial testing by incremental SAT solving”. IEEE, USA.

  • Yamada, A, Biere A, Artho C, Kitamura T, Choi EH (2016) “Greedy combinatorial test case generation using unsatisfiable cores” In: Proceedings of the 2016 31st IEEE/ACM International Conference on Automated Software Engineering (ASE), 614–624. IEEE, USA.

  • Yilmaz, C, Cohen MB, Porter A (2014) “Reducing masking effects in combinatorial interaction testing: A feedback driven adaptive approach”. IEEE Trans Softw Eng:43–66.

  • Yoo, S, Harman M (2012) “Regression testing minimization, selection and prioritization: A survey”. Softw Test Verif Reliab 22(2):67–120. https://dl.acm.org/citation.cfm?id=2284813.

  • Yu, L, Lei Y, Nourozborazjany M, Kacker RN, Kuhn DR (2013) “An efficient algorithm for constraint handling in combinatorial test generation” In: 2013 IEEE Sixth International Conference on Software Testing, Verification and Validation, 242–251. IEEE, New York.

  • Yu, L, Lei Y, Kacker RN, Kuhn DR (2013) “ACTS: A combinatorial test generation tool” In: Proceedings of the 2013 IEEE Sixth International Conference on Software Testing, Verification and Validation, 370–375. IEEE, New York.


Acknowledgements

The authors would like to thank the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) for supporting this research and Leoni Augusto Romain da Silva for his support in running part of the second controlled experiment.

Funding

This work was partially funded by Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) through a scholarship granted to the first author (JMB).

Availability of data and materials

Full data obtained during the experiments are in (Balera and Santiago Júnior 2017).

Author information

Contributions

JMB worked in the definitions and implementations of all three versions of the TTR algorithm, and carried out the two controlled experiments. VASJ worked in the definitions of the TTR algorithm, and in the planning, definitions, and executions of the two controlled experiments. All authors contributed to all sections of the manuscript. All authors read and approved the submitted manuscript.

Corresponding author

Correspondence to Juliana M. Balera.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


About this article


Cite this article

Balera, J., Santiago Júnior, V. An algorithm for combinatorial interaction testing: definitions and rigorous evaluations. J Softw Eng Res Dev 5, 10 (2017). https://doi.org/10.1186/s40411-017-0043-z


Keywords