A metrics suite for JUnit test code: a multiple case study on open source software

Toure, Fadel; Badri, Mourad; Lamontagne, Luc

doi:10.1186/s40411-014-0014-6

Research
Open access
Published: 30 December 2014

Fadel Toure¹,², Mourad Badri¹ & Luc Lamontagne²

Journal of Software Engineering Research and Development volume 2, Article number: 14 (2014)

Abstract

Background

The code of JUnit test cases is commonly used to characterize software testing effort. Different metrics have been proposed in the literature to measure various perspectives of the size of JUnit test cases. Unfortunately, there is little understanding of the empirical application of these metrics, particularly of which metrics provide the most useful information.

Methods

This paper aims at proposing a unified metrics suite that can be used to quantify the unit testing effort. We addressed the unit testing effort from the perspective of unit test case construction, and particularly the effort involved in writing the code of JUnit test cases. We used five unit test case metrics in our study, two of which were introduced in a previous work. We collected data from six open source Java software systems, of different sizes and from different domains, for which JUnit test cases exist, and conducted an empirical study in three main stages. In the first stage, we performed a Principal Component Analysis to find whether the analyzed unit test case metrics are independent or measure similar structural aspects of the code of JUnit test cases. In the second stage, we used clustering techniques to determine the unit test case metrics that are the least volatile, i.e. the least affected by the style adopted by developers while writing the code of test cases. In the third stage, we used correlation and linear regression analysis to evaluate the relationships between the internal software class attributes and the test case metrics.

Results and Conclusions

The main goal of this study was to identify a subset of unit test case metrics that: (1) provide useful information on the effort involved in writing the code of JUnit test cases, (2) are independent of each other, and (3) are the least volatile. Results confirm the conclusions of our previous work and show, in addition, that: (1) the set of analyzed unit test case metrics can be reduced to a subset of two independent metrics that maximize the information provided, (2) these metrics are the least volatile, and (3) they are also the most correlated with the internal software class attributes.

1 Background

Software testing plays a crucial role in software quality assurance. It is an important part of the software development lifecycle. Software testing is, however, a time- and resource-consuming process. The overall effort spent on testing depends on many different factors, including human factors, testing techniques, the tools used, characteristics of the software development artifacts, and so forth. We focus, in this paper, on unit test case construction, and particularly on the effort required to write unit test cases. Software metrics can be used to quantify different perspectives related to unit test case construction. Different metrics have, in fact, been proposed in the literature to quantify various perspectives related to the size of JUnit test cases. Unfortunately, there is little understanding of the empirical application of these metrics, particularly of which metrics provide the most useful information on the effort involved in writing the code of JUnit test cases.

In a previous work (Toure et al. [2014]), we extended existing JUnit test case metrics by introducing two new metrics. We analyzed the code of the JUnit test cases of two open source Java software systems. We used five unit test case metrics in total. We investigated, using the Principal Component Analysis technique, the orthogonal dimensions captured by the studied suite of unit test case metrics. We wanted, in fact, to better understand the structural aspects of the code of JUnit test cases measured by the metrics, and particularly to determine which metrics are the most useful for quantifying the JUnit test code. Results show that, overall: (1) the newly introduced unit test case metrics are relevant in the sense that they provide useful information related to the code of unit test cases, (2) the studied unit test case metrics are not independent (they carry overlapping information), and (3) the subset of independent unit test case metrics providing the best independent information (maximizing the variance) varies from one system to another. As the number of analyzed systems was limited to two, we could not reasonably draw final conclusions about the best subset of metrics. Furthermore, this preliminary study led us to suspect that some of the unit test case metrics are more volatile than others, in the sense that they are more influenced by the style adopted by developers while writing the code of unit test cases.

The empirical study presented in this paper extends our previous work and aims at analyzing the suite of unit test case metrics more deeply. The study was conducted in three main stages. We used the same five unit test case metrics. This time, we collected data from six open source Java software systems for which JUnit test cases exist. The analyzed case studies are of different sizes and from different domains. In the first stage, we replicated the study performed in our previous work on the data we collected from the six selected systems. We performed a Principal Component Analysis (PCA) to find whether the analyzed unit test case metrics are independent or measure similar structural aspects of the code of JUnit test cases. In the second stage, we used clustering techniques, particularly K-Means and Univariate clustering, to determine the unit test case metrics that are the least volatile, i.e. the least influenced by the style adopted by developers while writing the code of unit test cases. We investigated the distribution and the variance of the unit test case metrics based on three important internal software class attributes: size, complexity and coupling. In the third stage, we used correlation and linear regression analysis to evaluate the relationships between the internal software class attributes and the suite of unit test case metrics, and particularly to determine which unit test case metrics are the most related to the internal software class attributes. Results confirm two observations made in our previous work: (1) the studied unit test case metrics are not independent, i.e. they capture overlapping information, and (2) the newly introduced unit test case metrics provide useful information related to the code of JUnit test cases. Results also show three new findings: (3) there is a pair of independent unit test case metrics that maximizes the information, (4) these two metrics are the least affected by the style adopted by developers while writing the code of unit test cases, and (5) these metrics are also the most related to the internal software class attributes.
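As a rough illustration of the third stage, the sketch below computes a correlation coefficient and a least-squares line between one class attribute and one test metric. It is not the authors' analysis: the choice of the Pearson coefficient, the function name `pearson_and_line`, and the data values are all assumptions made for this example.

```python
from math import sqrt

def pearson_and_line(xs, ys):
    """Return (r, slope, intercept) for paired samples xs and ys.

    r is the Pearson correlation coefficient; slope and intercept
    describe the ordinary least-squares line y = slope * x + intercept.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squares of the centered values.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    r = sxy / sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return r, slope, intercept

# Hypothetical data: lines of code of four classes and the lines of
# code of their JUnit test classes (values invented for illustration).
class_loc = [100, 200, 300, 400]
test_loc = [50, 90, 160, 200]

r, slope, intercept = pearson_and_line(class_loc, test_loc)
print(f"r={r:.3f}, slope={slope:.2f}, intercept={intercept:.1f}")
```

A coefficient close to 1 on such data would suggest that the test metric grows with the class attribute; the fitted slope then gives a rough rate of that growth.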

The rest of this paper is organized as follows: Section 2 gives a brief survey of related work. The studied unit test case metrics are presented in Section 3. Section 4 presents the different stages of the empirical study we conducted. Finally, Section 5 concludes the paper and outlines some future work directions.

2 Related work

Several studies in the literature have addressed the estimation (prediction) of the testing effort by considering various factors such as use case points, number of test cases, test case execution, defects, cost, and so forth. Unfortunately, only a few studies have focused on the analysis (quantification) of different aspects related to the test code. Unit test code has, however, been used in different studies addressing, for example, testing coverage (Mockus et al. [2009]) or the relationships (links) between the units under test and the corresponding test code (Rompaey and Demeyer [2009], Qusef et al. [2011]).

Bruntink and Van Deursen ([2004], [2006]) investigated factors of testability of object-oriented software systems. The authors studied five open source Java software systems in order to explore the relationships between object-oriented design metrics and some characteristics of the code of JUnit test cases. Testability was measured inversely by the number of lines of test code and the number of assert statements in the test code. Results show that there is a significant relationship between the object-oriented design metrics used and the measured characteristics of JUnit test classes. The two unit test case metrics (the number of lines of test code and the number of assert statements in the test code) used by Bruntink and Van Deursen were, in fact, intended to measure two perspectives related to the size of the JUnit test cases. The authors used an adapted version of the fish bone diagram developed by Binder ([1994]) to identify testability factors. Bruntink and Van Deursen ([2004], [2006]) argued that the test case metrics used reflect, in fact, different source code factors: factors that influence the number of required test cases and factors that influence the effort involved in developing each individual test case. These two categories have been referred to as test case generation and test case construction factors.

Singh et al. ([2008]) used object-oriented metrics and neural networks to predict the testing effort. The testing effort was measured in terms of lines of code added or changed during the lifecycle of a defect. Singh and Saha ([2010]) focused on the prediction of the testability of Eclipse at the package level. Testability was measured using several metrics including the number of lines of test code, the number of assert statements in the test code, the number of test methods and the number of test classes. Results show that there is a significant relationship between the used object-oriented metrics and test metrics.

Badri et al. ([2010]) explored the relationship between lack of cohesion metrics and unit testability in object-oriented software systems. Badri et al. ([2011]) investigated the capability of lack of cohesion metrics to predict the testability of classes using logistic regression methods. In these studies also, testability was measured inversely by the number of lines of test code and the number of assert statements in the test code. Results show that lack of cohesion is a significant predictor of the unit testability of classes. Badri and Toure ([2012]) explored the capacity of object-oriented metrics to predict the unit testing effort of classes using logistic regression analysis. Results indicate, among other things, that multivariate regression models based on object-oriented design metrics are able to accurately predict the unit testing effort of classes. The same unit test case metrics were used in that study.

Zhou et al. ([2012]) investigated the relationship between the object-oriented metrics measuring structural properties and unit testability of a class. The investigated structural metrics cover in fact five property dimensions including size, cohesion, coupling, inheritance, and complexity. In this study, the size of a test class is used to indicate the effort involved in unit testing.

We can intuitively expect that all the metrics mentioned above are related to the size of test suites. However, there is little understanding of the empirical application of these metrics, particularly of which metrics provide the most useful information on the effort involved in writing the code of JUnit test cases. To the best of our knowledge, there is no empirical evidence on the underlying orthogonal dimensions captured by these metrics. Are these metrics independent, or do they measure similar structural aspects of the code of JUnit test cases (overlapping information)? In addition, is the distribution of these metrics influenced by the design of the systems and the style adopted by the developers while writing the code of unit test cases? In other words, does the distribution of these metrics vary significantly from one developer to another for similar classes (in which case the information given by the test case metrics could be strongly biased)? If the unit test case metrics do vary significantly, which subset of metrics is the least sensitive to variations in development style? Furthermore, are there other structural aspects that these metrics do not capture? Indeed, some classes, depending on the design and particularly on the collaboration between classes, will require drivers and/or monitors to achieve unit testing. We believe that this will also affect the effort involved in the construction of test cases. The metrics mentioned above do not seem to capture these dimensions. This issue needs, however, to be investigated.

3 Unit test case metrics

We used in our study the following unit test case metrics:

TLOC: This metric counts the number of lines of code of a test class (Bruntink and Van Deursen [2004]). It is used to indicate the size of the test class.

TASSERT: This metric counts the number of assert statements that occur in the code of a test class (Bruntink and Van Deursen [2004]). In JUnit, assert statements are used by the testers to compare the expected behavior of the class under test to its current behavior. This metric is used to indicate another perspective of the size of a test class. It is directly related to the construction of test cases.

TNOO: This metric counts the number of methods in a test class (Singh and Saha [2010]). It reflects another perspective of the size of a test class.

The metrics TLOC, TASSERT and TNOO were chosen in our study because they have been used in many related empirical studies in the literature. Size is an attribute that strongly characterizes the effort involved in writing the code of test cases. TLOC and TNOO are size-related metrics. TNOO, however, differs slightly from TLOC in that it captures a different perspective of size by counting the number of methods in a test class. Furthermore, even if we can intuitively expect the TASSERT metric to be correlated with the size of a test class, it differs slightly from the other size-related metrics: it is related rather to the effort involved in verifying the actual behavior of the class under test against its expected behavior.

We also used in our study the two unit test case metrics that we introduced in our previous work (Toure et al. [2014]):

TINVOK: This metric counts the number of direct method invocations in a test class. It captures the dependencies needed to run the test class.

TDATA: This metric gives the number of new Java objects created in a test class. These data are required to initialize the test.

We assume that the effort necessary to write the code of a test class is proportional to the characteristics measured by the selected unit test case metrics.
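To make the five definitions above concrete, the following sketch approximates each metric on a small JUnit test class using regular expressions. It is only an illustration under stated assumptions, not the tooling used in the study: the test class `StackTest`, the function `measure_test_class`, and the counting rules (TLOC as non-blank lines, TINVOK as receiver-qualified calls only, which leaves out assert calls) are all choices made for this example.

```python
import re

# Hypothetical JUnit 4 test class, invented for illustration only.
JAVA_TEST = '''\
import org.junit.Test;
import static org.junit.Assert.*;

public class StackTest {
    @Test
    public void testPush() {
        Stack s = new Stack();
        s.push(1);
        assertEquals(1, s.size());
    }

    @Test
    public void testPop() {
        Stack s = new Stack();
        s.push(2);
        assertEquals(2, s.pop());
        assertTrue(s.isEmpty());
    }
}
'''

# Approximate pattern for a method declaration with an access modifier.
METHOD_DECL = re.compile(
    r'\b(?:public|protected|private)\s+[\w<>\[\]]+\s+\w+\s*'
    r'\([^)]*\)\s*(?:throws\s+[\w.,\s]+?\s*)?\{'
)

def measure_test_class(source):
    """Approximate the five test metrics for one Java test class."""
    return {
        # TLOC: assumed here to be the number of non-blank lines.
        "TLOC": sum(1 for line in source.splitlines() if line.strip()),
        # TASSERT: calls whose name starts with "assert".
        "TASSERT": len(re.findall(r'\bassert\w*\s*\(', source)),
        # TNOO: method declarations in the test class.
        "TNOO": len(METHOD_DECL.findall(source)),
        # TINVOK: assumed here to be calls of the form receiver.method(...).
        "TINVOK": len(re.findall(r'\b\w+\.\w+\s*\(', source)),
        # TDATA: object creations of the form "new Type(...)".
        "TDATA": len(re.findall(r'\bnew\s+[\w.]+\s*[(\[<]', source)),
    }

print(measure_test_class(JAVA_TEST))
```

A regex-based count is fragile (comments and string literals are not excluded, for instance); a parser-based tool would be the more robust choice for real measurements.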

4 Empirical study

4.1 Selected case studies

Six open source Java software systems were selected for the study: (1) ANT^a is a Java library and command-line tool that drives processes described in build files as targets and extension points dependent upon each other. (2) JFREECHART (JFC)^b is a free chart library for the Java platform. (3) JODA-Time (JODA)^c is the de facto standard library for advanced date and time in Java. It provides a quality replacement for the Java date and time classes. The design supports multiple calendar systems, while still providing a simple API. (4) Apache Lucene Core (LUCENE)^d is a high-performance, full-featured text search engine library. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform. (5) POI^e is a set of Java APIs for manipulating various file formats based upon the Office Open XML standards (OOXML) and Microsoft’s OLE 2 Compound Document format (OLE2). It can read and write MS Excel files using Java. (6) IVY^f is a popular dependency manager. It is characterized by flexibility, simplicity and tight integration with Apache ANT.

These systems were selected based on several requirements: (1) the source code (and test code) archives of the subject systems must be available and large enough to provide a significant data set on the systems and the corresponding JUnit test cases, (2) the subject systems must be of different overall sizes and from different domains, in order to see whether our results differ from one system to another, and (3) the subject systems must be developed in Java. Table 1 summarizes some of the characteristics of the analyzed systems. It gives, for each system: (1) the total number of source code classes, (2) the total number of lines of code of source code classes, (3) the number of classes for which JUnit test cases have been developed, (4) the total number of lines of code of JUnit test cases, (5) the percentage of source code classes for which JUnit test cases have been developed, (6) the percentage of tested lines of code (lines of source code classes for which JUnit test cases have been developed), and (7) the ratio of the number of lines of test code to the number of tested lines of source code.
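The three derived columns of Table 1, items (5) to (7), follow from simple arithmetic on the raw counts. The sketch below shows one way to compute them; the function name `table1_ratios`, its parameter names, and the input values are hypothetical and are not taken from the paper's data. Whether the paper reports item (7) as a fraction or as a percentage is not stated here, so a plain fraction is assumed.

```python
def table1_ratios(n_classes, total_loc, n_tested_classes, tested_loc, test_loc):
    """Derive the percentage and ratio columns from raw counts.

    n_classes        -- total number of source code classes
    total_loc        -- total lines of code of the source code classes
    n_tested_classes -- classes that have JUnit test cases
    tested_loc       -- lines of code of those tested classes
    test_loc         -- total lines of code of the JUnit test cases
    """
    return {
        # (5) percentage of classes that have JUnit test cases
        "pct_tested_classes": 100.0 * n_tested_classes / n_classes,
        # (6) percentage of source lines belonging to tested classes
        "pct_tested_loc": 100.0 * tested_loc / total_loc,
        # (7) lines of test code per tested line of source code
        "test_to_source_ratio": test_loc / tested_loc,
    }

# Hypothetical system: 200 classes and 50,000 lines of code, of which
# 50 classes (20,000 lines) are covered by 10,000 lines of test code.
print(table1_ratios(200, 50_000, 50, 20_000, 10_000))
```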

Table 1 Some statistics on the selected systems
