The fundamental research question of our work is: “What are the trends observed in empirical studies available in the technical literature regarding the benefits and challenges of using Kanban in software organizations?” To answer this question, we aggregated the results of primary studies on Kanban using the SSM. The SSM allows the aggregation of qualitative and quantitative evidence through the use of diagrammatic models (Santos and Travassos 2013). As both a qualitative and quantitative research synthesis method, the SSM briefly depicts the essential contextual aspects and reports the trend of the effects (e.g., positive or negative), together with a certainty estimate for them. The SSM thus provides balanced information about the phenomena: it aggregates neither precise quantitative findings nor rich qualitative descriptions, but a trade-off between the two.
This blend of integrative and interpretive synthesis (Cruzes and Dybå 2011) was the primary reason for choosing the SSM as our research method, since the primary studies on the use of Kanban in SE report both quantitative and qualitative evidence. In the SSM, the interpretive aspects of the synthesis concern the organization and development of concepts to describe contextual aspects of the evidence, whereas the integrative aspects focus on pooling data about cause-effect or moderation relations, taking into account the uncertainty estimated for each piece of evidence. In addition, the SSM offers tool support for modeling and synthesizing evidence (Santos et al. 2015; Santos and Travassos 2017a), including facilities for graphical modeling, evidence search, and support for the synthesis itself. Another essential feature is the evidence model comparison used to aggregate evidence, which includes mechanisms for ‘conflict resolution’ between the models. The Evidence Factory tool, including all the results of the synthesis presented in this paper, can be accessed at http://evidencefactory.lens-ese.cos.ufrj.br/synthesis/editor/80416.
In general terms, an SSM synthesis study follows three steps: (i) the selection of primary studies, (ii) the analysis and representation of the evidence acquired by those studies, and (iii) the evidence synthesis. The basic idea behind these three steps is to collect evidence and then represent it from the same perspective so that the results can be consolidated and synthesized. It is similar to statistical meta-analysis – a kind of integrative synthesis – where the effect size provides a uniform view over the studies’ outcomes and is also used in their aggregation (Borenstein et al. 2009).
Next, we describe how we applied the SSM to synthesize the research on the benefits and challenges of using Kanban in SE. We also describe the parts of the SSM’s definition and use that are necessary for understanding how this synthesis study was conducted. We refer the reader to Santos and Travassos (2013) for further details on the method, and to Martinez-Fernandez et al. (2015), Chapetta (2016), and Santos and Travassos (2017b) for examples of its application.
SSM step 1: Selecting primary studies
As there are four secondary studies on Kanban, including a recently published one (Ahmad et al. 2018), there was no need to perform some of the typical procedures involved in this step, such as defining a search string and selecting studies based on inclusion and exclusion criteria. Instead, we used the datasets from these secondary studies to form the set of primary studies to be aggregated. We took the primary studies from the two most recent secondary studies (Al-Baik and Miller 2015; Ahmad et al. 2018). Regarding the other two, one (Ahmad et al. 2013) is updated by Ahmad et al. (2018), and the other (Corona and Pani 2013) focuses on the tools available for Kanban boards in software development. Only papers reporting results from primary studies (i.e., case study, survey, controlled experiment, or simulation study) on using Kanban in SE were selected from the two secondary studies. Grey literature and experience reports considered in these two secondary studies were excluded from the synthesis.
Ahmad et al. (2018) enumerate 23 technical papers as primary studies (and another 23 as experience reports). However, we found that three of them (Corona and Pani 2013; Ahmad et al. 2013; Al-Baik and Miller 2015) were, in fact, secondary studies. One additional paper was excluded (Heikkilä et al. 2016), since the only challenge it reported, called “Setting up and maintaining Kanban”, was not translated because it did not represent a moderator – more details about the benefits and challenges of using Kanban in SE considered in each primary study are shown in Table 1 (Section 4). Thus, of the 23 primary studies, 19 were included in the synthesis.
Al-Baik and Miller (2015) enumerate 37 papers as studies, of which we could classify six as primary studies. Only one of these six was not included in Ahmad et al. (2018) and was therefore added to the synthesis. The remaining 31 papers were excluded for the following reasons: (i) grey literature (19 papers); (ii) experience reports (six papers – all included in Ahmad et al. (2018)); (iii) not an empirical study (five papers); and (iv) not reporting results describing the benefits and challenges of Kanban use (one paper). Appendix B lists all the primary studies. It should also be noted that all included papers are from the Software Engineering realm. That is, they investigate Kanban as a software technology (i.e., a set of techniques and tools employed in software development).
SSM step 2: Analysis and evidence representation
In this step, the goal is to put the selected primary studies under the same perspective so they can be aggregated. The idea is similar to statistical meta-analysis, in which each primary study is represented by a numerical value called the effect size and the studies are then aggregated by combining those values (Borenstein et al. 2009). In the SSM, each primary study is represented by an evidence model called a theoretical structure. The evidence models describe the primary studies’ contextual aspects and the effects/moderators expected from the object of study – Kanban, in this synthesis. These descriptions are used as input both for determining evidence compatibility and for the aggregation itself.
As 20 primary studies were selected, we created 20 theoretical structures. The name theoretical structure is related to the origin of the model constructs, which were taken from a representation created for theory building (Sjøberg et al. 2008). Since the model was adapted for research synthesis, we use the name theoretical structure to draw attention to the model’s structure rather than the epistemological aspects of theory building. This emphasis is also reflected in the method’s name, Structured Synthesis Method. In the following paragraphs, we describe the evidence model constructs.
The ten semantic constructs used in the theoretical structures are shown in Fig. 2. There are three possible types of structural relationships in the representation: is a, part of and property of. All of them have counterparts in UML, respectively: generalization, composition and class attributes. The is a and part of relationships use the same UML notation for generalization and composition. Dashed connections denote properties. The relationships are used to link two types of concepts – value and variable.
A value concept represents a particular variable value, usually an independent variable. Rectangles represent value concepts. They are classified into archetypes (the root of each hierarchy), causes (indicated by bold font and a ‘C1’ following the name, denoting ‘cause 1’, e.g., ‘Kanban’), and contextual aspects (e.g., ‘Distributed Project’). The four archetypes – activity, actor, system, and technology – were suggested by Sjøberg et al. (2008) in an attempt to capture the typical scenario in SE: an actor applying a technology to perform activities on a software system.
A variable concept focuses on value variations and is usually associated with a dependent variable. Variable concepts are represented by ellipses, symbolizing effects (e.g., ‘Work Visibility’), or parallelograms, symbolizing moderators (e.g., ‘Training’). Effects are not connected to the cause by lines, as the cause-effect relationships are assumed to exist when reading the diagram. Lines are also absent in the link between moderators and the (moderated) effects; in this case, a textual hint (e.g., ‘M1’) is shown beside both the moderated effect and the moderator. Both kinds of relationship, cause-effect and moderation, are called influence relationships.
A seven-point Likert scale, ranging from strongly negative to strongly positive, is used to indicate an effect size. It is indicated by arrow symbols above the ellipse: the number and direction of the arrows give the value in the scale, and half arrows indicate a range. For example, the arrows above ‘Collaboration’ indicate that it is between weakly positively and positively affected by ‘Kanban.’ The other type of variable concept, the moderator, indicates that some positive or negative effect is moderated (i.e., reduced) as the moderator increases or decreases. It has a scale with three values indicating the moderation direction: inversely proportional, indifferent, and directly proportional. For instance, the moderator ‘Training’ has an inversely proportional influence on ‘Collaboration,’ meaning that the more training is present, the less moderation influence it exerts. The last aspect related to variable concepts is the association of a belief value (ranging from 0% to 100%, or 0 to 1) estimating the confidence in the observed effects and moderations. The bar under each element represents the belief value; e.g., ‘Flow of work’ has a belief value of 47%.
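As an illustration of this notation, a variable concept can be sketched as a small data structure. The class and field names below are our own illustrative choices, not part of the SSM tooling, and the intensity range shown for ‘Flow of work’ is a hypothetical placeholder; only its 47% belief value comes from the example above.

```python
from dataclasses import dataclass
from typing import Tuple

# Seven-point Likert scale for effect intensity, from strongly negative
# to strongly positive (abbreviations as used later in the DST frame).
LIKERT = ("SN", "NE", "WN", "IF", "WP", "PO", "SP")

@dataclass
class Effect:
    name: str
    intensity: Tuple[str, str]  # a point value or a range, e.g., ("WP", "PO")
    belief: float               # confidence in the effect, in [0.0, 1.0]

# 'Flow of work' with the 47% belief value cited in the text; the
# intensity range here is assumed for illustration only.
flow_of_work = Effect("Flow of work", ("WP", "PO"), 0.47)
```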
Extracting information to build evidence models
To create the evidence models, it is necessary to extract information from the primary studies. The goal is to identify and define the concepts (contextual aspects, moderators, and effects) that will form the evidence model, and to estimate the confidence (i.e., belief value) in the variable concepts (moderators and effects). This is usually performed in two stages: one for determining the concepts and another for estimating the confidence. These two stages are described next.
In the first stage, the procedures are analogous to the coding process (Auerbach and Silverstein 2003), but with the specific goal of developing concepts and relating them according to the diagrammatic model definitions given earlier. Hence, the coding in the SSM does not necessarily need to go through the continuous, iterative process of small steps usually indicated for coding, but can focus on the elements of the theoretical structures. There are several recommendations for performing this coding process in the SSM. One of them is the translation procedure (Britten et al. 2002). As the SSM’s goal is to aggregate evidence by combining compatible theoretical structures, the translation procedure can support the identification of concepts that at first glance are not comparable but become comparable once translated to a proper concept. One example in the software context would be translating Understandability and Learnability into a more generic concept such as Usability. This kind of generalization is not free from threats and should be considered on a case-by-case basis according to the researchers’ interpretations. Readers interested in a detailed view of the recommendations and heuristics for the coding process in the SSM can find it in Santos (2015).
During the coding, besides the evidence model concepts, it is also necessary to determine the effects’ intensity and the moderators’ direction. For qualitative studies, the adverbs and adjectives used to qualify the reported outcomes are translated into the seven-point Likert scale describing the effect size or intensity. When there was no indication of the effect intensity, we were conservative and defined a range of values to represent the imprecision regarding the intensity, e.g., ‘between weakly positive and positive.’ For quantitative studies, on the other hand, we needed to arbitrate ranges of values, using the domain of the dependent variable’s scale as input, to translate the outcome into the seven-point Likert scale. For instance, in Fitzgerald et al. (2014), “the overall cycle time was reduced from almost 100 days to just over 60 days, a significant improvement” – the authors qualified the difference of 40 days as a significant improvement, which led us to classify this effect as strongly positive on the Likert scale.
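A sketch of this translation for a quantitative outcome is shown below. The cut-off percentages are illustrative assumptions of ours; in the SSM the ranges are arbitrated per study from the dependent variable’s domain, not fixed globally.

```python
def improvement_to_likert(before: float, after: float) -> str:
    """Map a relative improvement in a metric (e.g., cycle time, where
    lower is better) to the seven-point Likert scale. The thresholds
    below are illustrative, not the SSM's prescribed values."""
    change = (before - after) / before  # positive means improvement
    if change >= 0.30:
        return "strongly positive"
    if change >= 0.10:
        return "positive"
    if change > 0.0:
        return "weakly positive"
    if change == 0.0:
        return "indifferent"
    if change > -0.10:
        return "weakly negative"
    if change > -0.30:
        return "negative"
    return "strongly negative"

# Fitzgerald et al. (2014): cycle time dropped from ~100 to ~60 days,
# a 40% reduction the authors qualified as a significant improvement.
print(improvement_to_likert(100, 60))  # strongly positive
```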
In the second stage, with the concepts and their relationships defined, the SSM requires further definitions to determine the confidence (i.e., the belief value) associated with the effects and moderators. Two inputs are used to that end. One is the type of study from which the evidence was acquired. The SSM uses the GRADE evidence hierarchy (Atkins et al. 2004) to split the 0–1 belief value range into four subranges: unsystematic observations [0.00, 0.25]; observational studies [0.25, 0.50]; quasi-experiments [0.50, 0.75]; and randomized controlled trials [0.75, 1.00]. The second input is the quality assessment, whose score is mapped into the 0.25-wide subrange. The SSM proposes two checklists to assess the quality of each study, which are explained in Santos and Travassos (2013). Based on this, the belief values listed in Table 5 (Appendix A) are calculated; e.g., study P1 was observational (0.25) and scored 0.17 out of 0.25 in the quality assessment performed with the checklists. As one can see, the estimation procedure gives lower belief values to less reliable studies and higher values to more reliable ones. Thus, the basic idea is to reflect the reliability of the evidence represented by a theoretical structure. Details of the quality assessment for each study can be found in the Evidence Factory tool.
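The estimation just described amounts to adding the quality score to the lower bound of the study type’s GRADE subrange. A minimal sketch of this computation, with the function name being our own:

```python
# Lower bounds of the GRADE-based subranges the SSM uses to anchor
# belief values by study type.
STUDY_TYPE_BASE = {
    "unsystematic observation": 0.00,
    "observational study": 0.25,
    "quasi-experiment": 0.50,
    "randomized controlled trial": 0.75,
}

def belief_value(study_type: str, quality_score: float) -> float:
    """Belief value = lower bound of the study type's subrange plus the
    quality score, which the SSM checklists map into [0, 0.25]."""
    if not 0.0 <= quality_score <= 0.25:
        raise ValueError("quality score must lie in [0, 0.25]")
    return STUDY_TYPE_BASE[study_type] + quality_score

# Study P1: observational (0.25) with a quality score of 0.17 out of 0.25.
print(round(belief_value("observational study", 0.17), 2))  # 0.42
```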
To perform these two stages of translating evidence from the primary studies into the diagrammatic evidence models, three researchers – the first three authors – organized the tasks in the following manner.
First, the 20 papers were evenly distributed among the researchers. Each of them thoroughly read their papers and extracted the relevant information to create the evidence models. The benefits and challenges enumerated in the secondary study by Ahmad et al. (2018) were used as the primary source for identifying the effects (i.e., benefits) and moderators (i.e., challenges) in the primary studies’ reports. It should be noted that this process is usually performed inductively based on the primary studies’ textual reports, but in this study we used the work of Ahmad et al. (2018), as its benefits and challenges represent codes extracted from the primary studies. Still, we were not able to find in the papers all the benefits and challenges indicated in Ahmad et al. (2018) – the differences are presented in Section 4. Contextual aspects, in turn, were identified using the various SSM recommendations and heuristics.
Second, after the evidence models were created, the researchers discussed them together. Each researcher summarized his papers and presented the models to the other two. This discussion focused on three main aspects: (i) assessing the understanding of the primary studies’ context and outcomes, (ii) tracing the theoretical structures’ concepts to the excerpts from which they were extracted, and (iii) reaching a consensus on the definition of the theoretical structures’ concepts (e.g., guided by the reciprocal translation procedure indicated in the SSM – adapted from Meta-Ethnography (Da Silva et al. 2013)).
SSM step 3: Evidence synthesis
In this step, the evidence extracted from the primary studies is aggregated based on the evidence models. Therefore, it is essential to define what makes theoretical structures match, i.e., what makes them compatible and allows their evidence to be aggregated. The SSM defines that two theoretical structures are compatible when their value concepts – cause, archetypes, and contextual aspects – are the same or have the same meaning. Once the researcher determines that the theoretical structures are compatible, their effects and moderators are combined according to their directions and intensities.
Pair-by-pair comparisons determine the compatibility among theoretical structures. When a pair is found to be compatible, the combined theoretical structure is formed by the value concepts common to both theoretical structures and by the variable concepts present in at least one of the two. Archetypes and contextual aspects, represented by value concepts, describe the conditions under which the aggregation is valid. For instance, for an evidence model to be compatible with the one shown in Fig. 1, it must have the same value concepts, namely: ‘portfolio management,’ ‘software development process,’ ‘software project,’ ‘distributed project,’ ‘software team,’ ‘medium-scale system,’ and ‘Kanban.’
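At its simplest, this compatibility check is a set comparison over the value concepts, as sketched below. The sketch assumes concepts have already been translated to common names; in the SSM, deciding that two differently named concepts “have the same meaning” is a researcher judgment supported by the translation procedure, not an automatic string match.

```python
def compatible(value_concepts_a: set, value_concepts_b: set) -> bool:
    """Two theoretical structures are compatible when their value concepts
    (cause, archetypes, and contextual aspects) coincide."""
    return value_concepts_a == value_concepts_b

# Value concepts of the evidence model in Fig. 1.
fig1 = {"portfolio management", "software development process",
        "software project", "distributed project", "software team",
        "medium-scale system", "Kanban"}

candidate = set(fig1)              # identical value concepts -> compatible
print(compatible(fig1, candidate))  # True
```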
After compatibility is established based on the value concepts, the variable concepts’ (i.e., effects’ and moderators’) intensity (e.g., positive or negative) and uncertainty (i.e., belief value) are pooled, in such a way that the resulting intensity reflects the agreement in the combined evidence. To that end, an uncertainty formalism is necessary to combine the results – otherwise, only a simple vote-counting strategy could be used. In the SSM, the Mathematical Theory of Evidence (Shafer 1976), also known as Dempster-Shafer theory (DST), is the mathematical formalism that enables obtaining the pooled outcomes. The DST uses two primary inputs to combine two pieces of evidence: the hypotheses believed to have a chance of being true (i.e., with a belief value greater than zero) and the belief values themselves. Hypotheses are defined as subsets of the frame of discernment (i.e., elements of its powerset), whereas the belief values are estimated using the procedures described in the previous step.
To perform the aggregation in the SSM using the DST formalism, the different intensity values that an effect can assume are represented as the frame of discernment. Since the intensity of an effect uses a seven-point Likert scale, the corresponding frame of discernment in the DST is defined as Θ = {SN, NE, WN, IF, WP, PO, SP} – the element names abbreviate the Likert scale terms, e.g., SN is ‘strongly negative,’ IF is ‘indifferent,’ and WP is ‘weakly positive.’ Likewise, the frame of discernment for moderators is formed by the three values indicating the moderation direction: Θ = {IP, IF, DP} – ‘inversely proportional,’ ‘indifferent,’ and ‘directly proportional,’ respectively.
Once hypotheses and belief values are defined for each piece of evidence, Dempster’s Rule of Combination is applied. Eq. (1) shows that the aggregated belief value for each hypothesis C is the sum of the products of the belief values of all pairs of hypotheses Ai and Bj, one from each piece of evidence, whose intersection is C. The function m is called the basic probability assignment function and, as the name implies, assigns a belief value to the different hypotheses of the powerset.
$$ m_3(C)=\frac{\sum\limits_{\substack{i,j \\ A_i \cap B_j = C}} m_1(A_i)\, m_2(B_j)}{1-K}, \quad \text{where } K=\sum\limits_{\substack{i,j \\ A_i \cap B_j = \varnothing}} m_1(A_i)\, m_2(B_j) $$
(1)
When the intersection between two hypotheses is the empty set, we say there is a conflict. The conflicting mass is then redistributed among the aggregated hypotheses – that is the function of the 1 − K term in the denominator. More details about how DST is used in the SSM are available in Santos and Travassos (2013). An example of how to compute the combination is given at the end of Section 5.
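Dempster’s rule can be sketched in a few lines of code. The implementation below follows Eq. (1) directly; the two input mass assignments are illustrative numbers of ours, with each piece of evidence placing its remaining mass on the whole frame Θ (total ignorance), a common DST modeling choice.

```python
from itertools import product

def combine(m1: dict, m2: dict) -> dict:
    """Dempster's rule of combination (Eq. 1). Keys are frozensets of
    hypotheses (subsets of the frame of discernment); values are basic
    probability assignments summing to 1."""
    combined = {}
    conflict = 0.0  # K: total mass assigned to empty intersections
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        c = a & b
        if c:
            combined[c] = combined.get(c, 0.0) + ma * mb
        else:
            conflict += ma * mb
    # Redistribute the conflict by normalizing with 1 - K.
    return {h: v / (1.0 - conflict) for h, v in combined.items()}

# Frame of discernment for effect intensities.
theta = frozenset({"SN", "NE", "WN", "IF", "WP", "PO", "SP"})

# Two pieces of evidence on the same effect (masses are illustrative):
# one supports {WP, PO} with mass 0.42, the other {PO, SP} with 0.47.
m1 = {frozenset({"WP", "PO"}): 0.42, theta: 0.58}
m2 = {frozenset({"PO", "SP"}): 0.47, theta: 0.53}
m3 = combine(m1, m2)  # the overlap {PO} receives mass 0.42 * 0.47
```

Because every intersection here is non-empty, K = 0 and no renormalization is needed; with disjoint hypotheses the 1 − K denominator would rescale the surviving mass.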