Technical Debt Prioritization: State of the Art. A Systematic Literature Review
Valentina Lenarduzzi, Terese Besker, Davide Taibi, Antonio Martini, Francesca Arcelli Fontana
Valentina Lenarduzzi a, Terese Besker b, Davide Taibi c, Antonio Martini d, Francesca Arcelli Fontana e
a LUT University, Lahti (Finland)
b Chalmers University of Technology, Göteborg (Sweden)
c Tampere University, Tampere (Finland)
d University of Oslo, Oslo (Norway)
e University of Milano-Bicocca, Milan (Italy)
Abstract
Background.
Software companies need to manage and refactor Technical Debt issues. Therefore, it is necessary to understand if and when refactoring of Technical Debt should be prioritized with respect to developing features or fixing bugs.
Objective.
The goal of this study is to investigate the existing body of knowledge in software engineering to understand what Technical Debt prioritization approaches have been proposed in research and industry.
Method.
We conducted a Systematic Literature Review of 557 unique papers published until 2019, following a consolidated methodology applied in software engineering. We included 44 primary studies.
Results.
Different approaches have been proposed for Technical Debt prioritization, all having different goals and proposing optimization regarding different criteria. The proposed measures capture only a small part of the plethora of factors used to prioritize Technical Debt qualitatively in practice. We present an impact map of such factors. However, an empirically validated set of tools is lacking.
Conclusion.
We observed that Technical Debt prioritization research is preliminary and there is no consensus on what the important factors are and how to measure them. Consequently, we cannot consider current research conclusive. In this paper, we therefore outline different directions for necessary future investigations.
Preprint submitted to JSS, January 31, 2020
Keywords:
Technical Debt, Technical Debt Prioritization
1. Introduction
Technical Debt (TD) is a metaphor introduced by Ward Cunningham [1] to represent sub-optimal design or implementation solutions that yield a benefit in the short term but make changes more costly or even impossible in the medium to long term [2].
Software companies need to manage such sub-optimal solutions. The presence of TD is inevitable [3] and even desirable under some circumstances [4] for a number of reasons, which may often be related to unpredictable business or environmental forces internal or external to the organization.
However, just like any financial debt, every TD has an interest attached, that is, an extra cost or negative impact generated by the presence of a sub-optimal solution [5]. When such interest becomes very costly, it can lead to disruptive events, such as development crises [3]. The current best practices employed by software companies include keeping TD at bay by avoiding it if the consequences are known, or refactoring or rewriting code and other artifacts in order to get rid of the accumulated sub-optimal solutions and their negative impact.
However, companies cannot afford to avoid or repay all the TD that is generated continuously and may be unknown [3]. The main business goals of companies are to continuously deliver value to their customers and to maintain their products. Thus, the activity of refactoring TD usually competes with developing new features and fixing defects: such activities are often prioritized over repayment of TD [3]. It is therefore of utmost importance to understand when refactoring TD becomes more important than postponing a feature or a bug fix. In other words, it is important to understand how to prioritize TD with respect to features and bugs.
In addition, recent studies show how different projects and even different types of TD might be associated with different refactoring costs (principal) and negative impact (interest) [6]. This means that some TDs can be more dangerous than others [7, 8], and it is therefore important to understand how to prioritize TD with respect to other TD.
However, there is no overall study reporting the current state of the art and practice related to how to prioritize TD. Our goal in this paper is to survey the existing body of knowledge in software engineering to understand which approaches have been proposed in research and industry to prioritize TD.
For this reason, we performed a Systematic Literature Review (SLR) on the prioritization of TD, investigating how TD is prioritized in software organizations and which research approaches have been proposed.
The main contribution of this paper is a report on the state of the art concerning approaches, factors, measures, and tools used in practice or proposed in research to prioritize TD.
The paper is structured as follows: In Section 2, we describe the background of this review. In Section 3, we outline the research methodology adopted in this study. Section 4 and Section 5 present and discuss the obtained results. Finally, in Section 6, we identify the threats to validity and in Section 7 draw the conclusion.
2. Background
In this Section, we will explain the meaning of TD in order to avoid confusion or misunderstandings, and we will report on previously published systematic reviews.
The concept of TD was introduced for the first time in 1992 by Cunningham as "The debt incurred through the speeding up of software project development which results in a number of deficiencies ending up in high maintenance overheads" [1]. In 2013, McConnell [9] refined the definition of TD as "A design or construction approach that's expedient in the short term but that creates a technical context in which the same work will cost more to do later than it would cost to do now (including increased cost over time)". In 2016, Avgeriou et al. [10] defined it as "A collection of design or implementation constructs that are expedient in the short term, but set up a technical context that can make future changes more costly or impossible. TD presents an actual or contingent liability whose impact is limited to internal system qualities, primarily maintainability and evolvability".
Li et al. [5] conducted a systematic mapping study for understanding the concept of TD and created an overview of the current state of research on managing TD. Based on the selected studies (94), they proposed a classification of ten types of TD at different levels, as reported in Table 1. Since this classification derives from a recent secondary study and is, to our knowledge, the most complete one available in the literature, we considered it in our search strategy process (Section 3.2) to define our search terms.
Table 1: Definition of Technical Debt [5]
TD Type | Definition
Requirements TD | "refers to the distance between the optimal requirements specification and the actual system implementation, under domain assumptions and constraints"
Architectural TD | "is caused by architecture decisions that make compromises in some internal quality aspects, such as maintainability"
Design TD | "refers to technical shortcuts that are taken in detailed design"
Code TD | "is the poorly written code that violates best coding practices or coding rules. Examples include code duplication and over-complex code"
Test TD | "refers to shortcuts taken in testing. An example is lack of tests (e.g., unit tests, integration tests, and acceptance tests)"
Build TD | "refers to flaws in a software system, in its build system, or in its build process that make the build overly complex and difficult"
Documentation TD | "refers to insufficient, incomplete, or outdated documentation in any aspect of software development. Examples include out-of-date architecture documentation and lack of code comments"
Infrastructure TD | "refers to a sub-optimal configuration of development-related processes, technologies, supporting tools, etc. Such a sub-optimal configuration negatively affects the team's ability to produce a quality product"
Versioning TD | "refers to the problems in source code versioning, such as unnecessary code forks"
Defect TD | "refers to defects, bugs, or failures found in software systems"

In this Section, we briefly report on previous systematic reviews (Systematic Mapping Studies and Systematic Literature Reviews) available in the literature, with their main goals summarized in Table 2. We present the studies in chronological order to show the evolution of research on TD. The first systematic review was published in 2012 [11] and the last ones, to the best of our knowledge, in 2018 [12], [13].
Tom et al.
[11] exploited an exploratory case study technique that involves a multivocal literature review, supplemented by interviews with software practitioners and academics, in order to establish the boundaries of the TD phenomenon. As a result, they created a theoretical framework that provides a holistic view of TD, comprising a set of TD dimensions, attributes, precedents, and outcomes. The framework provides a useful approach to understanding the overall phenomenon of TD for practical purposes.
Li et al. [5] investigated TD management (TDM), providing a classification of TD concepts and presenting the current state of research on TDM. They considered publications between 1992 and 2013, ultimately selecting 94 studies. The results showed a need for empirical studies with high-quality evidence on the TDM process, application of TDM approaches in industrial contexts, and tools for managing the different TD types during the TDM process.
Ampatzoglou et al. [14] analyzed research efforts regarding TD, focusing on financial aspects underlying software engineering concepts. They considered publications until 2015, selecting 69 studies. The results provide a glossary of terms and a classification scheme for financial approaches to be applied for managing TD. Moreover, they discovered that a clear mapping between financial and software engineering concepts is lacking.
Ribeiro et al. [15] evaluated the appropriate time for paying a TD item and how to apply decision-making criteria to balance the short-term benefits against long-term costs. They considered publications until 2016, selecting 38 studies. They identified 14 decision-making criteria that can be used by development teams to prioritize the payment of TD items and a list of types of debt related to the criteria.
Alves et al. [16] investigated what strategies have been proposed to identify and manage TD in software projects, considering publications between 2010 and 2014 and selecting 100 studies.
They proposed an initial taxonomy of TD types and provided a list of indicators to identify TD and management strategies. Moreover, they analyzed the current state of TD research, highlighting possible research gaps. The results showed a growing interest of researchers in the TD area. They identified some gaps regarding new indicator proposals and management strategies and tools for controlling TD. Another gap they identified regards empirical studies for validating the proposed strategies.
Fernández-Sánchez et al. [17] identified the elements needed to manage TD, considering publications until 2017 and selecting 69 studies. They did not provide a general overview of the TD phenomenon or of the activities for managing TD. The elements were classified into three groups (basic decision-making factors, cost estimation techniques, practices and techniques for decision-making) and grouped based on stakeholders' points of view (engineering, engineering management, and business-organizational management).
Behutiye et al. [18] analyzed the state of the art of TD and its causes, consequences, and management strategies in the context of agile software development (ASD). They considered publications until 2017 and selected 38 studies, finding potential research areas for further investigation. The study highlighted positive interest in TD and ASD and provided some potential categories that can easily lead to TD, such as a focus on quick delivery and architectural and design issues.
Besker et al. [12] investigated Architectural TD (ATD), synthesizing and compiling research efforts in order to create new knowledge with a specific interest in ATD. They considered publications between 2005 and 2016, selecting 43 studies. The results showed a lack of guidelines on how to manage ATD successfully in practice and of an overall process where these activities are fully integrated.
Rios et al.
[13] performed a tertiary study based on a set of five research questions and evaluated 13 secondary studies dating from 2012 to March 2018. They evolved a taxonomy of TD types, identified a list of situations in which debt items can be found in software projects, and organized a map representing the state of the art of activities, strategies, and tools for supporting TD management. Their results can help to identify points that still require further investigation in TD research. For example, they found that there are management activities that do not have any type of support tool.
Recently, Khomyakov et al. [19] investigated existing tools for the measurement and analysis of TD, focusing on quantitative methods that could also be automated. They selected 21 papers out of 331 retrieved. Their results show that many new approaches are being defined to measure TD.
Table 2: Previous SLRs
ID | Year | Goal
[11] | 2012 | Understanding the nature of TD
[5] | 2015 | TD management and TD classification
[14] | 2015 | Financial approaches for managing TD
[15] | 2016 | TD payment prioritization
[16] | 2016 | TD management strategies, TD taxonomy
[17] | 2017 | TD management elements
[18] | 2017 | TD in Agile development
[12] | 2018 | Managing architectural TD
[13] | 2018 | TD types, management strategies
[19] | 2019 | TD tools
3. Methodology
In order to understand the state of the art and the practice on Technical Debt prioritization, we conducted a systematic literature review based on the guidelines defined by Kitchenham et al. [20], [21]. We also applied the "snowballing" process defined by Wohlin [22].
In this Section, we will describe the goal and the research questions (Section 3.1) and report our search strategy approach (Section 3.2). Moreover, we performed a quality assessment (Section 3.3) for each included paper and outlined the data extraction and the analysis (Section 3.4) of the corresponding data.
The study goal was to investigate the existing body of knowledge in software engineering to understand how TD is prioritized in software organizations and what research approaches have been proposed.
Based on our goal, we defined the following research questions (RQs):
RQ1. Which types of TD have been investigated mostly?
RQ2. Which prioritization aspects have been proposed?
RQ2.1. Are papers prioritizing TD vs TD or TD vs Features?
RQ2.2. Is the prioritization based on a one-shot activity or on a continuous process?
RQ3. Which factors and measures have been considered for TD prioritization?
RQ4. Which tools have been used to prioritize TD?
In order to satisfy our goal, we first investigated which types of TD are investigated mostly by researchers and where they should concentrate research efforts in the future (RQ1). Regarding TD types, we adopted the classification proposed by Li et al. [5] reported in Table 1. Moreover, we characterized how the different TD types are evaluated, highlighting the measures and information.
The second research question targets how the investigated research papers address the prioritization process of TD, both in terms of different aspects (RQ2), i.e., whether the prioritization process of TD mainly focuses on different TD items or also includes prioritization between TD items and, e.g., the implementation of new features (RQ2.1), and of how the prioritization process is described in terms of its periodicity (RQ2.2).
Based on the above RQs, we aimed at identifying a set of factors and measures considered useful during TD prioritization activities (RQ3). Moreover, we aimed at understanding which measures are considered in the prioritization of the main TD components, principal and interest.
Finally, we aimed to provide a list of existing tools used to evaluate TD in order to depict the current situation in terms of numbers and the maturity of each tool (RQ4).
The search strategy involves the outline of the most relevant bibliographic sources and search terms, the definition of the inclusion and exclusion criteria, and the selection process relevant for the inclusion decision. Our search strategy is depicted in Figure 1.
[Figure 1 stages: keywords and bibliographic sources; retrieved papers; inclusion and exclusion criteria testing; inclusion and exclusion criteria; full reading; snowballing (references); accepted papers]
Figure 1: The Search and Selection Process
Search terms. In our search string, we included all the terms related to TD proposed by Li et al. [5] and reported in Table 1 (Section 2).
The search string contained the following search terms:
("technical debt") OR ("design debt") OR ("architect* debt") OR ("test* debt") OR ("implem* debt") OR ("docum* debt") OR ("requirement debt") OR ("code debt") OR ("Infrastructure debt") OR ("versioning debt") OR ("defect debt") OR ("build debt")
We used the asterisk character (*) for the second term group in order to capture possible term variations such as plurals and verb conjugations. To increase the likelihood of finding publications addressing TD prioritization, we applied the search string to both title and abstract.
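The matching behavior of the search string can be illustrated with a minimal sketch. It assumes that the asterisk stands for any run of word characters and that matching is case-insensitive; the digital libraries used in the review may interpret wildcards differently. The names SEARCH_TERMS, term_to_regex, and matches_search_string are hypothetical and introduced only for illustration.

```python
import re

# Search terms taken from the search string; '*' is treated as a wildcard.
SEARCH_TERMS = [
    "technical debt", "design debt", "architect* debt", "test* debt",
    "implem* debt", "docum* debt", "requirement debt", "code debt",
    "infrastructure debt", "versioning debt", "defect debt", "build debt",
]


def term_to_regex(term: str) -> str:
    """Translate one search term into a regular expression.

    Assumption: '*' matches zero or more word characters, so that
    "architect*" covers "architecture", "architectural", and so on.
    """
    words = []
    for word in term.split():
        # re.escape turns '*' into '\*'; swap that for the wildcard pattern.
        words.append(re.escape(word).replace(r"\*", r"\w*"))
    return r"\b" + r"\s+".join(words) + r"\b"


# The OR of all terms becomes a single alternation.
SEARCH_PATTERN = re.compile(
    "|".join(term_to_regex(term) for term in SEARCH_TERMS),
    re.IGNORECASE,
)


def matches_search_string(title: str, abstract: str) -> bool:
    """Return True if any search term occurs in the title or in the abstract."""
    return any(SEARCH_PATTERN.search(text) for text in (title, abstract))
```

Title and abstract are checked separately so that a term cannot match across the boundary between the two fields.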
Bibliographic sources. We selected the list of relevant bibliographic sources following the suggestions of Kitchenham and Charters [20], since these sources are recognized as the most representative in the software engineering domain and used in many reviews. The list includes: ACM Digital Library, IEEEXplore Digital Library, Science Direct, Scopus, Google Scholar, CiteSeer library, Inspec, Springer link. Moreover, we performed a manual search on the most important conferences and workshops on Technical Debt, such as the International Conference on Technical Debt (TechDebt).
Inclusion and exclusion criteria. We defined inclusion and exclusion criteria to be applied to the title and abstract (T/A) or to the full text (F) or to both cases (All), as reported in Table 3.
Table 3: Inclusion and exclusion criteria
Criteria | Assessment Criteria | Step
Inclusion | Papers that prioritize TD issues | All
Inclusion | Papers that report the criteria of removal/refactoring/remediation of TD issues regarding any aspect (financial, maintenance, performance, readability, ...) | All
Inclusion | Papers that compare TD issues | All
Inclusion | Papers that empirically validated/elicited the results | F
Exclusion | Papers not fully written in English | T/A
Exclusion | Papers not peer-reviewed (i.e., blog, forum ...) |
Exclusion | Duplicate papers (only consider the most recent version) | T/A
Exclusion | Position papers and work plans (i.e., papers that do not report results) | T/A
Exclusion | Publications where the full paper cannot be located (i.e., if database used does not have access to the full text of the publication) | T/A
Exclusion | Publications that only mention prioritization of TD in an introductory statement and do not fully or partly focus on it | All
Exclusion | Only the latest version of the papers (e.g., journal papers that extend conference papers are excluded if they refer to the same dataset) | All
Search and selection process. The search was conducted in December 2019 and included all the publications available until this period. The application of the search terms returned 557 unique papers.
Testing the applicability of inclusion and exclusion criteria:
Before applying the inclusion and exclusion criteria, we tested their applicability [21] on a subset of ten papers (assigned to all the authors) randomly selected from the papers retrieved.
Applying inclusion and exclusion criteria to title and abstract:
We applied the refined criteria to the remaining 547 papers. Each paper was read by two authors; in the case of disagreement, a third author was involved in the discussion to resolve it. For 29 papers, we involved a third author. Out of the 557 initial papers, we included 116 based on title and abstract.
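The two-reader screening with escalation to a third author can be sketched as follows. This is an illustration under a simplifying assumption: the discussion involving the third author is modeled as a single deciding vote. The names screen_paper, screen_all, and the example paper identifiers are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def screen_paper(vote_a: bool, vote_b: bool,
                 third_vote: Callable[[], bool]) -> Tuple[bool, bool]:
    """Return (included, escalated) for one paper.

    The third author is consulted only when the two readers disagree.
    """
    if vote_a == vote_b:
        return vote_a, False
    return third_vote(), True


def screen_all(votes: Dict[str, Tuple[bool, bool]],
               third_votes: Dict[str, bool]) -> Tuple[List[str], List[str]]:
    """Screen every paper; return the included papers and the escalated ones."""
    included, escalated = [], []
    for paper_id, (vote_a, vote_b) in votes.items():
        decision, needed_third = screen_paper(
            vote_a, vote_b, lambda pid=paper_id: third_votes[pid]
        )
        if needed_third:
            escalated.append(paper_id)
        if decision:
            included.append(paper_id)
    return included, escalated


# Hypothetical votes of two readers on three papers.
votes = {"A": (True, True), "B": (True, False), "C": (False, False)}
included, escalated = screen_all(votes, third_votes={"B": True})
```

Recording the escalated papers separately makes it possible to report how often a third author was needed, as done in the text.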
Full reading:
We fully read the 116 papers included by title and abstract, applying the criteria defined in Table 3 and assigning each one to two authors. We involved a third author for six papers to reach a final decision. Based on this step, we selected 49 papers as possibly relevant contributions.
Snowballing:
We performed the snowballing process [22], considering all the references presented in the retrieved papers and evaluating all the papers referencing the retrieved ones. We applied the same process as for the retrieved papers. The snowballing search was conducted in December 2019. We identified 11 potential papers, of which only one was included in the final set of publications.
Based on the search and selection process, we retrieved a total of 50 papers for the review, as reported in Table 5.
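Backward and forward snowballing can be sketched on a toy citation graph. The sketch only collects candidate papers; each candidate would still have to pass the inclusion and exclusion criteria. The function name snowball_candidates and the paper identifiers are hypothetical, and the citation data is invented for illustration.

```python
from typing import Dict, Set


def snowball_candidates(included: Set[str],
                        cites: Dict[str, Set[str]]) -> Set[str]:
    """Collect snowballing candidates for a set of included papers.

    `cites` maps each paper to the set of papers it references.
    Backward snowballing follows the reference lists of the included papers;
    forward snowballing looks for papers that reference an included paper.
    """
    backward: Set[str] = set()
    for paper in included:
        backward |= cites.get(paper, set())
    forward = {paper for paper, refs in cites.items() if refs & included}
    # Papers that are already included are not new candidates.
    return (backward | forward) - included


# Toy citation graph: P1 and P2 are already included.
cites = {
    "P1": {"R1", "R2"},
    "P2": {"R2", "P1"},
    "C1": {"P1"},   # cites an included paper: forward candidate
    "C2": {"X9"},   # unrelated
}
candidates = snowball_candidates({"P1", "P2"}, cites)
```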
Before proceeding with the review, we checked whether the quality of the selected papers was sufficient to support our goal and whether each paper reached a minimum quality level. We performed this step according to the protocol proposed by Dybå and Dingsøyr [23]. To evaluate the selected papers, we prepared a checklist (Table 4) with a set of specific questions. We ranked each answer, assigning a score on a five-point Likert scale (0=poor, 4=excellent). A paper satisfied the quality assessment criteria if it achieved a rating higher than (or equal to) 2.
Table 4: Quality Assessment Criteria
Response scale: Excellent = 4, Very Good = 3, Good = 2, Fair = 1, Poor = 0
QA | Quality Assessment Criteria
QA1 | Is the paper based on research (or is it merely a "lessons learned" report based on expert opinion)?
QA2 | Is there a clear statement of the aims of the research?
QA3 | Is there an adequate description of the context in which the research was carried out?
QA4 | Was the research design appropriate to address the aims of the research?
QA5 | Was the recruitment strategy appropriate for the aims of the research?
QA6 | Was there a control group with which to compare treatments?
QA7 | Was the data collected in a way that addressed the research issue?
QA8 | Was the data analysis sufficiently rigorous?
QA9 | Has the relationship between researcher and participants been considered to an adequate degree?
QA10 | Is there a clear statement of findings?
QA11 | Is the study of value for research or practice?
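The quality threshold can be illustrated with a short sketch. The text does not state how the eleven answers are combined into a single rating, so the sketch assumes the arithmetic mean of the scores; this aggregation is an assumption, not the authors' documented procedure. The names QA_QUESTIONS, QA_THRESHOLD, and passes_quality_assessment are hypothetical.

```python
from statistics import mean
from typing import Sequence

QA_QUESTIONS = 11   # number of checklist questions in Table 4
QA_THRESHOLD = 2    # "higher than (or equal to) 2"


def passes_quality_assessment(scores: Sequence[int]) -> bool:
    """Check whether a paper meets the quality threshold.

    Each score is on the five-point scale 0 (poor) to 4 (excellent).
    Assumption: the overall rating is the mean of the eleven scores.
    """
    if len(scores) != QA_QUESTIONS:
        raise ValueError(f"expected {QA_QUESTIONS} scores, got {len(scores)}")
    if any(score not in range(5) for score in scores):
        raise ValueError("scores must be integers between 0 and 4")
    return mean(scores) >= QA_THRESHOLD
```

Under a different aggregation, for example requiring every single answer to reach the threshold, fewer papers would pass.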
Among the 50 papers included in the review from the search and selection process, only 44 fulfilled the quality assessment criteria, as reported in Table 5.
Table 5: Results of search and selection and application of quality assessment criteria
Step | Papers
Retrieval from bibliographic sources (unique papers) | 557
Reading by title and abstract | 439 rejected
Full reading | 68 rejected
Backward and forward snowballing | 1
Papers identified | 50
Quality assessment | 6 rejected
Primary studies | 44
In Table 6, we list the 44 papers included in the review. The detailed references of all 44 primary studies are reported in Appendix A.
Table 6: The Selected Papers
ID | Title | Authors | Year
[SP1] | An empirical model of technical debt and interest | Nugroho, A. et al. | 2011
[SP2] | Investigating the impact of design debt on software quality | Zazworka, N. et al. | 2011
[SP3] | Prioritizing design debt investment opportunities | Zazworka, N. et al. | 2011
[SP4] | Estimating the principal of an application's technical debt | Curtis, B. et al. | 2012
[SP5] | Investigating the impact of code smells debt on quality code evaluation | Arcelli Fontana, F. et al. | 2012
[SP6] | Using technical debt data in decision making: Potential decision approaches | Seaman, C. et al. | 2012
[SP7] | Defining the decision factors for managing defects: A technical debt perspective | Snipes, W. et al. | 2012
[SP8] | A formal approach to technical debt decision making | Schmid, K. | 2013
[SP9] | Challenges to and Solutions for Refactoring Adoption: An Industrial Perspective | Sharma, T. et al. | 2015
[SP10] | Investigating Architectural Technical Debt accumulation and refactoring over time: A multiple-case study | Martini, A. et al. | 2015
[SP11] | On the use of time series and search based software engineering for refactoring recommendation | Wang, H. et al. | 2015
[SP12] | Towards Prioritizing Architecture Technical Debt: Information Needs of Architects and Product Owners | Martini, A. and Bosch, J. | 2015
[SP13] | Validating and prioritizing quality rules for managing technical debt: An industrial case study | Falessi, D. and Voegele, A. | 2015
[SP14] | Developing processes to increase technical debt visibility and manageability - An action research study in industry | Yli-Huumo, J. et al. | 2016
[SP16] | How do software development teams manage technical debt? An empirical study | Yli-Huumo, J. et al. | 2016
[SP17] | Identifying and quantifying architectural debt | Xiao, L. et al. | 2016
[SP18] | JSpIRIT: A flexible tool for the analysis of code smells | Vidal, S. et al. | 2016
[SP19] | Minimizing refactoring effort through prioritization of classes based on historical, architectural and code smell information | Choudhary, A. and Singh, P. | 2016
[SP20] | Pragmatic approach for managing technical debt in legacy software project | Gupta, R.K. et al. | 2016
[SP21] | Technical debt prioritization using predictive analytics | Codabux, Z. and Williams, B.J. | 2016
[SP22] | Technical Debt Management with Genetic Algorithms | Vathsavayi, S. H. and Systa, K. | 2016
[SP23] | A Heuristic for Estimating the Impact of Lingering Defects: Can Debt Analogy Be Used as a Metric? | Akbarinasaji, S. et al. | 2017
[SP24] | A strategy based on multiple decision criteria to support technical debt management | Ribeiro, L.F. et al. | 2017
[SP25] | An empirical assessment of technical debt practices in industry | Codabux, Z. et al. | 2017
[SP26] | Assessing code smell interest probability: A case study | Charalampidou, S. et al. | 2017
[SP27] | Impact of architectural technical debt on daily software development work - A survey of software practitioners | Besker, T. et al. | 2017
[SP28] | Investigating the identification of technical debt through code comment analysis | de Freitas Farias, M.A. et al. | 2017
[SP29] | Lessons learned from the ProDebt research project on planning technical debt strategically | Ciolkowski, M. et al. | 2017
[SP30] | Looking for Peace of Mind? Manage Your (Technical) Debt: An Exploratory Field Study | Ghanbari, H. et al. | 2017
[SP31] | Revealing social debt with the CAFFEA framework: An antidote to architectural debt | Martini, A., Bosch, J. | 2017
[SP32] | Technical debt interest assessment: From issues to project | Martini, A. et al. | 2017
[SP33] | The magnificent seven: Towards a systematic estimation of technical debt interest | Martini, A., Bosch, J. | 2017
[SP34] | The pricey bill of Technical Debt - When and by whom will it be paid? | Besker, T. et al. | 2017
[SP35] | A semi-automated framework for the identification and estimation of Architectural Technical Debt: A comparative case-study on the modularization of a software component | Martini, A. et al. | 2018
[SP36] | Early evaluation of technical debt impact on maintainability | Conejero, J.M. et al. | 2018
[SP37] | Technical Debt tracking: Current state of practice: A survey and multiple case study in 15 large organizations | Martini, A. et al. | 2018
[SP38] | Identifying and Prioritizing Architectural Debt Through Architectural Smells: A Case Study in a Large Software Company | Martini, A. et al. | 2018
[SP39] | Prioritize technical debt in large-scale systems using codescene | Tornhill A. | 2018
[SP40] | Prioritizing technical debt in database normalization using portfolio theory and data quality metrics | Albarak M. and Bahsoon R. | 2018
[SP41] | Towards a Technical Debt Management Framework based on Cost-Benefit Analysis | Firdaus H.M. and Lichter H. | 2018
[SP42] | Design debt prioritization: a design best practice-based approach | Plösch R. et al. | 2018
[SP43] | Aligning Technical Debt Prioritization with Business Objectives: A Multiple-Case Study | Rebouças R. et al. | 2018
[SP44] | Technical Debt Prioritization: A Search-Based Approach | Alfayez R. and Boehm B. | 2019

We extracted data from the 44 primary studies (PSs) that satisfied the quality assessment criteria. The data extracted from each PS is organized into Context Data, Process Data, and Outcome Data, as reported in Table 7.
Context Data is necessary to outline the context of each PS in terms of the type of evaluated TD, according to the list proposed by [5]. We also extracted data regarding the projects considered in the study, such as number of projects, project size, and programming languages. Moreover, we collected information about the process phase where the TD is evaluated.
Process Data explains the process adopted to evaluate and prioritize TD issues. We collected data on the type of process (single activity or continuous process, proactive or reactive) and the type of analysis, distinguishing between qualitative, quantitative, and mixed evaluation approaches. We also retrieved information about the frameworks and tools adopted to evaluate and prioritize TD issues. This data is exclusively based on what is reported in the papers, without any kind of personal interpretation.
Outcome Data identifies the criteria of removal/refactoring/remediation of TD issues. Moreover, we extracted the measures and factors used to assess the prioritization of a TD issue and which of these are suggested during the prioritization process.
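The three data categories can be pictured as one extraction record per primary study. The following sketch is only an illustration of such a record; the class and field names are hypothetical and do not reproduce the authors' actual extraction form.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class ProcessType(Enum):
    SINGLE_ACTIVITY = "single activity"
    CONTINUOUS = "continuous process"


class Stance(Enum):
    PROACTIVE = "proactive"
    REACTIVE = "reactive"


class AnalysisType(Enum):
    QUALITATIVE = "qualitative"
    QUANTITATIVE = "quantitative"
    MIXED = "mixed"


@dataclass
class ExtractionRecord:
    """One record per primary study; None or an empty list means 'not reported'."""
    study_id: str
    # Context Data
    td_types: List[str] = field(default_factory=list)
    number_of_projects: Optional[int] = None
    project_size: Optional[str] = None
    programming_languages: List[str] = field(default_factory=list)
    process_phase: Optional[str] = None
    # Process Data
    process_type: Optional[ProcessType] = None
    stance: Optional[Stance] = None
    analysis_type: Optional[AnalysisType] = None
    frameworks_and_tools: List[str] = field(default_factory=list)
    # Outcome Data
    remediation_criteria: List[str] = field(default_factory=list)
    measures_and_factors: List[str] = field(default_factory=list)


# Hypothetical record with invented values, for illustration only.
record = ExtractionRecord(
    study_id="SP-example",
    td_types=["Code TD"],
    analysis_type=AnalysisType.QUANTITATIVE,
)
```

Leaving unreported fields empty, rather than guessing a value, mirrors the statement that the extracted data is based exclusively on what the papers report.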
Table 7: Data Extraction
Category | Type
Context Data | Technical Debt type (according to [5]); Analyzed project (

3.5. Replicability
In order to allow replication and extension of our work by other researchers, we prepared a replication package for this study with the complete results obtained.
4. Results
Based on the adopted selection process, we identified 44 primary studies (PSs), as listed in Table 6. We illustrate the distribution by year in Figure 2.
The first three relevant papers on TD prioritization were published in 2011. In the next two years, between 2012 and 2014, only three papers were published. From 2015, the number of publications increased (5 papers), with a considerable increase in 2016, 2017, and 2018 with 10, 12, and 8 papers, respectively.
The selected PSs are published in 22 different sources, including 6 journals and 15 conferences and workshops. Specifically, the journal publication sources are: (2 papers) Information and Software Technology (IST), (2 papers) Journal of Systems and Software (JSS), (2 papers) IEEE Software, (1 paper) Empirical Software Engineering Journal (EMSE), (1 paper) Journal of Software: Evolution and Process (JSEP), (1 paper) Science of Computer Programming.
Regarding conferences and workshops, the numbers are: (10 papers) International Conference on Technical Debt (TechDebt) (formerly the Workshop on Managing Technical Debt (MTD)), (4 papers) Euromicro Conference on Software Engineering and Advanced Applications (SEAA), (3 papers) International Conference on Agile Software Development (XP), (2 papers) International Conference on Product-Focused Software Process Improvement (PROFES), (2 papers) International Conference on Software Engineering (ICSE), (1 paper) International Conference on Management of Digital Eco Systems (MEDES), (1 paper) International Conference on Services Computing (SCCC), (1 paper) International Workshop on Quantitative Approaches to Software Quality (QuASoQ), (1 paper) International Workshop on Emerging Trends in Software Metrics (WETSoM), (1 paper) International Conference on Enterprise Information Systems (ICEIS), (1 paper)
Figure 2: Paper Distribution by Year (vertical axis: Number of Papers)
28 PSs (75.67%) conducted case studies in order to investigate TD issues, analyzing different sets of projects. 24 out of 28 PSs report the findings for each analyzed project in terms of number of projects, project size, and programming language.
Regarding the number of projects analyzed, the majority of the PSs considered fewer than seven each, with most considering only one project. We identified three papers that considered a larger number of projects: [SP4] with 700 projects, [SP1] with 44 projects, and [SP5] with 12 projects. Only 11 PSs report on the programming language of the project(s), with Java, C
4.2. RQ1: Which types of TD have been investigated mostly?
Considering the TD types reported in Table 1, the types of TD considered most frequently in the PSs were: Code Debt (38%), Architectural Debt (24%), and Design Debt (10%). Moreover, some PSs (24%) do not report on issues of any specific TD type, but evaluate TD in general (Figure 3).
Figure 3: Types of TD
Code TD is generally investigated from the point of view of its impact on one or more software qualities [SP13], [SP18], [SP19], [SP26]. Maintainability [SP4], [SP5], [SP11] and maintenance effort [SP1], [SP2], [SP11], [SP19] are considered most often by the PSs. Code debt evaluation is mostly based on code smells [SP2], [SP5], [SP11], [SP18], [SP19], [SP26].

Other metrics are also considered, such as the time [SP4], [SP23] or cost [SP1] needed to fix a violation, and quality rules [SP13].

Some factors related to subjective evaluation, such as customer feedback [SP23] or developers' comments in the code [SP28], are evaluated less often.

The approaches mainly involve models that reduce TD by removing or refactoring code smells or other metrics [SP11], [SP18]. These approaches look at the impact on code smells [SP5], make a comparison with classes without smells [SP2], [SP26], or rank the code rules [SP13] perceived as critical by developers.

Architectural TD is generally investigated taking into account the role of architectural smells [SP17], [SP19], [SP20] or complex architectural design [SP17], [SP27], which negatively impact software quality [SP17], [SP19], [SP20]. Architectural TD is evaluated by measuring the extra maintenance effort for bug fixing [SP17] or analyzing the bug-proneness [SP17] of the code. Another approach combines three different perspectives, namely historical data of the projects, architectural design, and severity of the class, to prioritize the refactoring activities [SP19].

Architectural design is used to identify high interest in terms of wasted time related to architectural TD [SP27], combined with other metrics such as number of files and percentage of complex functions and files [SP35].

Another approach identifies dependencies and social gaps across architecture organization in order to define architectural TD [SP31].

Which prioritization aspects have been proposed?
TD prioritization is considered one of the most important activities when managing TD. The TD prioritization process is used for defining the ordering and/or scheduling of planned refactoring initiatives based on the priority of each identified TD item concerning the impact of the individual items on the software. Several different prioritization aspects have been proposed by researchers in the reviewed publications and a few methods on how to prioritize TD have been developed, but there is no unified approach regarding how the TD prioritization process should be carried out, nor is there a consensus on which aspects to focus on when performing the TD prioritization process. The selection of the prioritization strategy is currently context-dependent in most organizations [SP21].

In order to analyze the prioritization aspects presented in the retrieved publications, a thematic analysis approach was used. Thematic analysis is an effective method for identifying, analyzing, and reporting patterns and themes within a searched data scope [24]. The thematic analysis returned mainly five themes illustrating different prioritization aspects.
However, one should note that from a software evolution perspective, these aspects can potentially have dependencies and couplings.

Based on the analysis, the different prioritization strategies suggested in the reviewed publications are mainly: a) improving software quality, b) increasing software practitioners' productivity, c) the effect on the correctness of the software, d) cost-benefit analysis (CBA) to compare various TD items with respect to low cost and high payoff, or e) a combination of several different approaches.

Studies focusing on internal software quality as a prioritization strategy commonly focus on a quality assessment of the software in order to identify the TD items that cause the highest maintenance costs [SP1], [SP2], [SP13], [SP19], [SP28], [SP26], [SP4], [SP31], [SP35], [SP41], [SP44], together with factors such as remaining product life, debt severity and its impact on future development activities, and current business-related constraints [SP3], [SP9].

Xiao et al. [SP17] suggest an approach that focuses on architectural TD, both locating TD items and ranking and prioritizing them. Their approach returns the TD items that consume the largest maintenance effort and therefore deserve more attention and higher priority for refactoring.

Plösch et al. propose a TD prioritization approach with a primary focus on the prioritization of Design debt; their approach relies on the quantification of design best practices by transferring the identified TD items into a portfolio matrix [SP42].
Albarak and Bahsoon further claim that software systems having database tables below fourth normal form are likely to form TD and therefore the ill-normalized tables should be prioritized for refactoring [SP40].

Other reviewed publications also take the decrease in software practitioners' productivity into consideration when prioritizing TD, since software suffering from architectural TD, for example, slows down development by causing rework [SP2], [SP3].

Also, the effect TD has on the correctness of the software is described as an approach for evaluating different candidate TD items for prioritization [SP2]. More specifically, Arcelli Fontana, Ferme and Spinelli [SP5] report that the prioritization of the refactoring of code smells representing design debt can be evaluated by studying the impact of the refactoring of the code smells on different quality metrics, with the goal being to identify and prioritize "the most dangerous smell and hence the smell which represents the worst TD". When prioritizing defect debt, in particular, Akbarinasaji et al. [SP23] focus their approach on the severity of the debt items (using the categorizations critical, major, normal, and minor) and the duration of bug-fixing time.

Codabux et al. [SP21] used a Bayesian approach to build a prediction model for determining the "TD proneness" of each TD item using a classification scheme according to the TD proneness probability where the risk of the individual items is assessed.

Other researchers such as [SP3], [SP6] use a cost-benefit analysis when prioritizing different TD items, focusing on which refactoring activities should be performed first because they are likely to be inexpensive to implement yet have a significant effect, and which refactoring should be postponed due to high cost and low payoff.
The main focus of this approach is on making a lucrative investment in the software, with the output of this analysis being a prioritized list of different TD items ordered by the profitability of the different possible refactoring activities [SP3].

This strategy is echoed by Martini et al. [SP32], who state that "if the interest is (or is going to be) high, the debt is worth being paid. On the contrary, if the interest is not enough to justify the cost of refactoring, there is no reason to 'waste' resources to refactor the system." However, Martini et al. [SP32] also stress the importance of not only focusing the prioritization decisions on single TD items by assessing each TD item separately, but also understanding the overall impact TD items generally have on the whole project, thus focusing on the overall project goals by evaluating the information holistically. In this approach, Martini et al. [SP32] also include factors such as the portion of the code affected by the TD, the project size, the roadmap, the positive impact of the TD, the existence of an alternative, and the cultural attitude of the team when prioritizing TD refactoring activities.

Further, Alfayez and Boehm [SP44] propose an automated search-based approach for prioritizing TD using a multi-objective evolutionary algorithm called MOEA (which is an open-source Java library), having a focus on the repayment of the TD refactoring activity within a specific cost constraint.

Borrowing prioritization approaches from other disciplines, such as finance and psychology, Seaman et al. [SP6] include techniques such as Analytic Hierarchy Process (AHP), the Portfolio method, and the Options approach. The AHP approach involves building a criteria hierarchy, assigning weights and scales to the criteria, and finally performing a series of pairwise comparisons between the alternatives against the various criteria.
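The AHP steps summarized above (weighting criteria, then comparing alternatives pairwise under each criterion) can be illustrated with a minimal Python sketch. It is not the procedure used in [SP6]: the criteria names, the two TD items, and all pairwise judgements are hypothetical, and the weights are approximated with the row geometric-mean method rather than the principal eigenvector.

```python
from math import prod


def ahp_weights(pairwise):
    """Approximate AHP priority weights with the row geometric-mean method.

    `pairwise[i][j]` states how strongly element i is preferred over element j
    (reciprocal matrix, e.g. on a 1-9 scale).
    """
    n = len(pairwise)
    geo_means = [prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


# Hypothetical criteria and pairwise judgements (assumptions for illustration).
criteria = ["maintenance cost", "customer impact", "refactoring effort"]
criteria_matrix = [
    [1.0, 1 / 3, 3.0],
    [3.0, 1.0, 5.0],
    [1 / 3, 1 / 5, 1.0],
]

# Hypothetical TD items, compared pairwise under each criterion.
td_items = ["TD-1", "TD-2"]
item_matrices = {
    "maintenance cost": [[1.0, 3.0], [1 / 3, 1.0]],
    "customer impact": [[1.0, 1 / 5], [5.0, 1.0]],
    "refactoring effort": [[1.0, 2.0], [1 / 2, 1.0]],
}

criteria_weights = ahp_weights(criteria_matrix)

# Global score of an item = sum over criteria of (criterion weight * local weight).
global_scores = {item: 0.0 for item in td_items}
for criterion, weight in zip(criteria, criteria_weights):
    local_weights = ahp_weights(item_matrices[criterion])
    for item, local_weight in zip(td_items, local_weights):
        global_scores[item] += weight * local_weight

ranking = sorted(td_items, key=global_scores.get, reverse=True)
print(ranking)
```

With these made-up judgements, "customer impact" receives the largest criterion weight, so the item that dominates under that criterion ranks first. A fuller treatment would also check the consistency of each judgement matrix, which is omitted here.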
The goal of using the Portfolio approach is to select those assets that maximize the return on investment or minimize the investment risk.

Codabux et al. [SP25] stress the importance of adopting a broader perspective on the prioritization process, focusing on the liability of TD. According to them, decision makers need to think beyond the cost associated with fixing the debt, including estimates of the possible future costs resulting from the decision to ship. The additional costs reflected during the prioritization in terms of liability costs include, e.g., costs for responding to support requests, costs associated with catastrophic failures, and potential litigation costs if service level agreements are violated because of unmanageable debts.

Ribeiro et al. [SP24] present a multiple decision strategy criteria model using a combination of different prioritization approaches, which can be used during different project phases. Their model focuses on aspects such as the severity of the impact the TD items have from a customer perspective on the interest cost of TD, the lifetime of the project's properties, and its possibility of evolution.

Yet another prioritization process that includes different perspectives is the approach described by Ciolkowski et al. [SP29]. Their approach focuses on a combination of the overall software quality with a focus on productivity improvement from a future-oriented perspective, using a proactive methodology.

Gupta et al. [SP20] use a two-level approach for prioritizing TD. First, the TD items are assessed according to their importance and urgency. In a second step, the TD items' impact on business values and effort is assessed.

Guo et al. [SP15] present a TD prioritization approach that ranks customer expectations as the top priority, followed by availability of development resources, the interest of the TD items, the current status of the debt-infected modules, and the impact of the debt on other features.
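One possible reading of the ordered criteria reported for [SP15] is a lexicographic sort, in which a later criterion only breaks ties on the earlier ones. The sketch below illustrates that reading only; the summary above does not state how [SP15] combines the criteria, and the field names, the ordinal scores (higher meaning more pressing), and the TD items are hypothetical.

```python
# Criteria in the order reported for [SP15]; the identifiers are assumptions.
CRITERIA_ORDER = [
    "customer_expectation",
    "resource_availability",
    "interest",
    "module_status",
    "impact_on_features",
]

# Hypothetical TD items with ordinal scores per criterion (higher = more pressing).
td_items = {
    "TD-A": {"customer_expectation": 2, "resource_availability": 1, "interest": 5,
             "module_status": 1, "impact_on_features": 1},
    "TD-B": {"customer_expectation": 3, "resource_availability": 0, "interest": 1,
             "module_status": 1, "impact_on_features": 1},
    "TD-C": {"customer_expectation": 2, "resource_availability": 3, "interest": 1,
             "module_status": 0, "impact_on_features": 0},
}


def rank_lexicographically(items):
    """Sort item names by their criterion scores, compared in CRITERIA_ORDER."""
    def sort_key(name):
        return tuple(items[name][criterion] for criterion in CRITERIA_ORDER)
    return sorted(items, key=sort_key, reverse=True)


ranking = rank_lexicographically(td_items)
print(ranking)
```

Note the consequence of this reading: TD-A has by far the highest interest score, yet it ranks last because the two earlier criteria already decide the order. A weighted combination of the same criteria could produce a different ranking.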
By studying how software practitioners prioritize TD items in practice, Yli-Huumo et al. [SP14], [SP16] concluded that their prioritization approach commonly focuses on scalability, business value, use of a feature, and customer effect.

Snipes et al. [SP7] suggest a TD prioritization approach that includes a combination of factors such as severity, the existence of a workaround, urgency of the refactoring required by customers, refactoring effort, the risk of the proposed refactoring, and the scope of testing required.

Schmid [SP8] distinguishes between potential and effective TD, where potential TD is any type of suboptimal software system, while effective TD refers to issues in the software system that make further development of that system more difficult. This prioritization approach considers aspects such as evolution cost, refactoring cost, and the probability that the predicted evolution path will be realized.

Further, Almeida et al. [SP43] suggest also focusing on business objectives when prioritizing TD in order to support business expectations and goals. The researchers compared the differences between a technical prioritization and a business-oriented one, and they state that their results show that taking business priorities into account can change decisions related to technical debt prioritization. This prioritization aspect is also described as facilitating the argumentation from the technical side and thereby helping to convince business stakeholders to prioritize what were previously considered purely technical problems.

Martini and Bosch [SP33] propose a tool called AnaConDebt to provide assistance during the TD prioritization process. Their tool assesses the severity of the interest for different TD items, with the calculation of the interest being based on an assessment of seven different factors and their growth.
The assessed factors are: 1) reduced development speed, 2) bugs related to the TD item, 3) other qualities compromised, 4) other extra costs, 5) frequency of the issue, 6) spread in the system, and 7) users affected. Vidal et al. [SP18] also propose a tool called JSpIRIT for specifically prioritizing source-code-related TD, where the TD items are evaluated according to their importance based on different prioritization criteria. The tool calculates a ranking for a set of code smells according to their importance and can be instantiated to prioritize TD items by different criteria. Examples of such criteria are the relevance of the kind of code smells, the history of the system, or different software metrics, among others. Additionally, the developer can use external information to improve the prioritization.

Yet another reviewed publication [SP39] suggests performing TD prioritization using a tool called CodeScene, where factors such as how developers work with the code are taken into consideration. The process uses a complexity trend analysis when calculating the indentation-based complexity of the identified TD items, and the final TD prioritization is set out together with a skilled human observer.

Are papers prioritizing TD vs TD or TD vs Features?
Since today's software companies face increasing pressure to deliver customer value, the balance between spending developer time, effort, and resources on implementing new features or spending it on TD remediation activities, on fixing bugs, or on other system improvements becomes vital. In this study, we limited the scope to studying the balance between prioritizing the implementation of new features or the remediation of existing TD.

To conclude, this research question seeks to address whether the TD prioritization process mainly focuses on the prioritization among different TD items or whether the TD items are described as competing with the implementation of new features.

Budget, resources, and available time are important factors in a software project, especially during the prioritization process, since spending time and effort on refactoring activities commonly implies that less time can be spent on implementing new features, for example. This is one of the main reasons why software companies do not always spend additional budget and effort on the refactoring of TD, since they commonly have a strong focus on delivering customer-visible features [SP18].

Ciolkowski et al. [SP29] describe this situation as follows: "The challenge for project managers is to find a balance when using the given budget and schedule, either by reducing TD or by adding technical features. This balance is needed to keep time to market for current product releases short and future maintenance costs at an acceptable level." [SP39] echoes this view, stating that, ideally, actionable refactoring targets should be prioritized based on the technical debt interest rate to balance the trade-offs between improvements, risk, and new features.

Furthermore, Martini, Bosch and Chaudron [SP10] state that TD refactoring initiatives usually get low priority compared to the implementation of new features and that TD that is not directly related to the implementation of new features is often postponed.

Vathsavayi and Systä [SP22] echo this notion, stating that "Deciding whether to spend resources for developing new features or fixing the debt is a challenging task." The researchers highlight that software teams need to prioritize new features, bug fixes, and TD refactoring within the same prioritization process.

However, even if the balance between implementing new features and TD refactoring activities is described as important [SP31], the papers investigated in this study commonly focus their prioritization approaches on prioritization among different TD items, with the goal being to determine which item should be refactored first. None of the prioritization approaches described in the surveyed publications explicitly addresses how the prioritization between implementing new features and spending time and effort on the refactoring of TD should be carried out. However, the study by Besker et al. [25] states that "the pressure of delivering customer value and meeting delivery deadlines forces the software teams to down-prioritize TD refactorings continuously in favor of implementing new features rapidly".

Is the prioritization based on a one-shot activity or on a continuous process?
Just as important as prioritizing TD refactoring activities in a project is describing a management strategy for the prioritization process.

Therefore, this research question focuses on how the prioritization process is described in the reviewed publications in terms of its periodicity. We distinguish the different approaches in terms of one-shot activities versus part of a continuous process.

Some of the publications reviewed in this study highlight the TD prioritization process in terms of it being a continuous, integrated, and iterative process [SP16], [SP22], whereas others stress the importance of prioritizing TD refactoring within each sprint [SP15]. Choudhary et al. [SP19] illustrate the prioritization process as being an integral part of the continuous development process by saying "ideally software companies try to incorporate refactoring practices as an integral part of their development and maintenance processes" [SP9], and [SP39] echoes this notion, stating that a systematic management of TD and how to reduce it should also be considered important in each release of the development project.

Interestingly, however, the rest of the publications reviewed in this study do not give any explicit recommendations on how often or in what way the prioritization of TD should be carried out.

Which factors and measures have been considered for TD prioritization?
During the prioritization process, six PSs considered both principal and interest ([SP1], [SP10], [SP13], [SP15], [SP23], [SP35]), while four PSs considered only interest ([SP13], [SP17], [SP27], [SP34]).

Principal is calculated as cost [SP1], [SP10] or time [SP1], [SP4] needed to fix technical quality issues [SP1] or violations of quality rules [SP13]. Other factors are also considered, such as page rank or customer feedback [SP23].

Interest is calculated as extra cost spent on maintenance due to technical quality issues [SP1], [SP10], [SP17], [SP35] or as wasted time related to different activities (management or refactoring) [SP27], [SP34].

Principal is compared with interest, and any item for which the benefit does not outweigh the cost is not considered [SP15]. The factors considered are: customer expectations, which have the top priority, followed by availability of development resources, the interest of the TD items, the current status of the debt-infected modules, and the impact of the debt on other features [SP15].

In Table 8, we present an "Impact Map", which highlights the plethora of factors related to the impact (interest) of TD to be considered for prioritization, and their wide variation across studies and projects. In total, we counted 53 unique factors.

A few of the factors might overlap, although in different papers the factors are calculated differently. For example, "number of bugs" and "ROI (calculated on number of bugs)" are obviously overlapping factors, although using the sheer number of bugs or the cost of their impact as indicators might give very different results when prioritizing. In other cases, a generic concept of "interest" or "cost" has been used, although such values were probably implicitly calculated by the researchers or practitioners taking into consideration some of the other 52 remaining factors explicitly mentioned in the other papers. However, given the reported information, there is no way to perform such a mapping.
Thus, we report a generic factor, for example "risk", as different from all the other specific ones.

The factors have been grouped into categories, when possible, to help navigate them. First, we mapped the factors to qualities that are mentioned most often in relation to TD. These categories are "Evolution", "Maintenance", and "Productivity". For example, the current working definition of TD explicitly mentions the impact on maintainability and evolvability. Given the emphasis on such qualities, we first grouped the factors according to them. TD impacting other qualities was gathered under "System Qualities" (which do not include the former two). Productivity is also usually associated with TD in the form of extra effort spent because of the debt.

Next, we proceeded to categorize and group the remaining factors according to what aspect of software development the impact is related to. This can be important in order to understand which roles would be hit the most by such impact and what consequences it might have on the prioritization. As an example, TD can have a direct impact on the "Customer" factors, so such TD might be considered more important by some organizations in their prioritization. Understanding the impact on "Business" factors can also be very useful in a prioritization against features that are prioritized mostly using business concerns. "Social" and "Project" factors need to be taken into consideration as well, as non-technical aspects of software development. For some of the factors, it was not possible to find a common category ("Other factors"), or they were only described as high-level factors without additional details ("Not specified").

The majority of the papers focus on the impact of TD on maintainability (12).
Some papers focus on productivity (7), evolvability (5), and other system qualities (6), while 5 papers consider the customer perspective.

Only a few papers take into consideration other factors, such as business factors (3), social factors (3), project factors (3), and other non-categorized factors (6). In most of these cases (including the customer aspect), the identified factors have been reported in only one or two papers. This highlights either their specificity for a specific context or a lack of focus on these factors in the literature. In both [SP10] and [SP24], the authors conducted a survey with practitioners to understand which of these factors are most important for developers, architects, and product owners. In most cases, customer and business factors were considered the most important ones. However, only a few papers address such factors when prioritizing TD, so we can conclude that these factors have been overlooked in the literature.

In quite a few studies (8), the interest (impact) of TD has been identified and assessed as generic interest, interest likelihood, risk, severity, or as customizable by the practitioners. Six papers present factors not categorized specifically in the previously mentioned categories and that represent the impact of TD spanning multiple categories or represent a specific aspect not related to these categories.

Eight other papers assume that the impact of TD is associated with the (co-)occurrence of instances of different issues (e.g., code smells) that are considered sub-optimal ("quantity of debt" in the table). However, the measures used in different papers differ according to the tools used, and the impact of the individual issues is assumed to be the same or was assigned arbitrarily. Very few papers (4) use an estimate or a measure of the cost of refactoring (principal) in contrast to the impact of TD (interest). This is in contrast with the theoretical approach ([26], [27], [SP8]), according to which TD needs to be prioritized by taking into consideration both the cost of refactoring and the impact.
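As a minimal illustration of weighing principal against interest, the following Python sketch drops items whose estimated interest does not exceed their principal and ranks the rest by their interest-to-principal ratio. It is a sketch under stated assumptions, not a method taken from any of the PSs: the `TDItem` type, the ratio used for ranking, the example items, and all numbers are hypothetical, and both quantities are assumed to be expressed in the same unit (e.g., person-hours over a fixed planning horizon).

```python
from dataclasses import dataclass


@dataclass
class TDItem:
    name: str
    principal: float  # estimated cost of refactoring the item
    interest: float   # estimated extra cost incurred if the item is not refactored

    def __post_init__(self):
        # A positive principal keeps the ratio below well defined.
        if self.principal <= 0:
            raise ValueError("principal must be positive")


def prioritize(items):
    """Keep items whose interest exceeds the principal, highest ratio first."""
    worthwhile = [item for item in items if item.interest > item.principal]
    return sorted(worthwhile,
                  key=lambda item: item.interest / item.principal,
                  reverse=True)


# Hypothetical TD items and estimates (assumptions for illustration).
items = [
    TDItem("duplicated parser", principal=10.0, interest=15.0),
    TDItem("god class", principal=40.0, interest=120.0),
    TDItem("legacy logging", principal=30.0, interest=12.0),
]

for item in prioritize(items):
    print(item.name, round(item.interest / item.principal, 2))
```

Ranking by ratio favors cheap fixes with high payoff; ranking by the difference between interest and principal would instead favor the largest absolute savings, so the choice of ordering is itself a prioritization decision.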
Table 8: Impact Map: Factors and measures related to the interest of TD considered when prioritizing (RQ3)
Category          Factors                                                  PSs ID
----------------  -------------------------------------------------------  ------------------------------------------------------------
Business          competitive advantage                                    [SP10]
                  lead time                                                [SP10]
                  attractiveness for the market                            [SP10]
                  penalties                                                [SP10]
                  feature usage                                            [SP16]
                  business value                                           [SP16]
                  ROI (calculated per bug)                                 [SP20]
Customer          satisfaction                                             [SP12]
                  long-term satisfaction                                   [SP10]
                  specific customer value                                  [SP10]
                  customer expectations                                    [SP13]
                  customer effect                                          [SP16], [SP24]
Evolution         time of impact on evolution (short- or long-term)        [SP8]
                  risk of critical impact on evolution (possible crisis)   [SP8]
                  impact on other features                                 [SP13], [SP24]
                  impact on upcoming features                              [SP22], [SP24], [SP32]
Maintenance       modifiability                                            [SP2], [SP18], [SP26], [SP28]
                  number of bugs                                           [SP2], [SP10], [SP11], [SP17], [SP20], [SP23], [SP28], [SP32], [SP33], [SP38]
                  maintenance cost                                         [SP10], [SP17], [SP35]
System Qualities  robustness                                               [SP4]
                  performance efficiency                                   [SP2], [SP4], [SP12]
                  security                                                 [SP4]
                  transferability                                          [SP4]
                  scalability                                              [SP16]
                  generic qualities                                        [SP32], [SP33], [SP38]
Quality Debt
Productivity      % wasted time (effort)                                   [SP27], [SP32], [SP33], [SP34], [SP35], [SP38]
                  number of developers working on TD                       [SP35]
                  wasted development hours                                 [SP35]
                  generic effort                                           [SP24]
                  coding output/effort                                     [SP29]
Project Factors   availability of resources                                [SP13]
                  project size and complexity                              [SP32]
                  postponement of bugs                                     [SP23]
Social Factors    developers' morale                                       [SP30]
                  social debt                                              [SP31]
                  positive impact of TD                                    [SP32]
                  team culture                                             [SP32]
Other Factors     contagious debt                                          [SP10]
                  existence of TD solution (alternative)                   [SP32]
                  spread of impact in the system                           [SP32], [SP33], [SP38]
                  number of users affected                                 [SP32], [SP33], [SP38]
                  frequency of negative impact                             [SP32], [SP33], [SP38]
                  kind of smell                                            [SP18], [SP24]
                  history of the system                                    [SP18]
                  compromise architecture                                  [SP18]
                  future cost                                              [SP22]
                  user perception                                          [SP24]
Not Specified     risk                                                     [SP10], [SP25]
                  interest likelihood                                      [SP13], [SP22]
                  interest                                                 [SP13], [SP24]
                  severity                                                 [SP24], [SP38]
                  customizable                                             [SP18], [SP24], [SP25], [SP32], [SP33], [SP38]

Which tools have been used to prioritize TD?
As reported in Table 9, only 14 papers mentioned the usage of tools for evaluating and prioritizing TD, but only ten of them report information on which tools were used. The other studies used a custom-made tool developed for their specific purposes.

Table 9: Tools Used when Prioritizing TD (RQ4)
Tool Name                    Tool Link                                         Paper ID
---------------------------  ------------------------------------------------  --------------
AnaConDebt                   https://anacondebt.com                            [SP32], [SP33]
ARCAN [28], [29]             http://essere.disco.unimib.it/wiki/arcan          [SP38]
CAFFEA                       not available                                     [SP31]
CAST                                                                           [SP4]
Coverity                                                                       [SP20]
Findbugs                     http://findbugs.sourceforge.net                   [SP20]
Visual Studio FxCopAnalyzer                                                    [SP20]
iPlasma                      http://loose.cs.upt.ro/index.php?n=Main.IPlasma   [SP5]
JSpIRIT                      https://sites.google.com/site/santiagoavidal      [SP18]
Scitool Understand           https://scitools.com                              [SP21]
SonarQube                                                                      [SP30]
CodeScene                    https://codescene.io                              [SP39]
Out of the aforementioned tools, we can identify ten static analysis tools: ARCAN, CAST, Coverity, Findbugs, Visual Studio FxCopAnalyzer, iPlasma, JSpIRIT, Scitool Understand, and SonarQube. Scitool Understand analyzes the code and visualizes its architecture. The remaining ones detect TD issues such as code or architectural smells, security violations, or others.

CAST, Coverity, Findbugs, Visual Studio FxCopAnalyzer, CodeScene, and SonarQube are commercial tools commonly used to analyze code compliance against a set of rules. When the rules are violated, they raise a TD issue. These tools provide the severity of the issues and classify them into different types (e.g., issues that could lead to bugs, to increased software maintenance effort, or to security vulnerabilities). Moreover, CAST and SonarQube also associate a remediation effort (principal), the time needed to remove the TD issue. ARCAN, iPlasma, and JSpIRIT are open-source tools, developed by research teams and aimed at detecting architectural smells (ARCAN) and code smells (iPlasma and JSpIRIT).

AnaConDebt [30] is a management tool based on a TD-enhanced backlog. The backlog allows the creation of TD items and performs TD-specific operations on the created items. In [SP32] and [SP33], AnaConDebt has been used to report and visualize the information on TD manually collected by product managers and developers.

The CAFFEA framework [31] identifies organizational roles, where architectural responsibilities are allocated. Moreover, the tool defines the team members and share among them. The framework has been used in [SP31] to analyze mismatches between the architecture community and the system architecture.

ARCAN was used in [SP38] to detect architectural smells. The TD principal was then investigated by means of a survey in a large company.

In [SP30], developers were asked to discuss the TD issues raised by SonarQube. However, there is no information on whether the developers considered the severity or the type of TD issues. In [SP4], the authors used CAST as is to estimate the principal calculated as time to remove all TD issues. iPlasma and JSpIRIT were used in [SP5] and [SP18], respectively, to detect the number of code smells to be refactored in the systems under investigation.

Scitool Understand was used in [SP21] to identify architectural issues in the system under investigation.

The TD issues detected by Coverity, Findbugs, and Visual Studio FxCopAnalyzer were used in [SP20] for an industrial survey.
5. Discussion
In this Section, we discuss the results obtained, outlining some implications for researchers and practitioners working in the TD domain.

Although the TD domain is relatively young compared to other domains such as software testing or software quality, significant contributions have been published in the last ten years and researchers are becoming more and more active (Figure 2).

Among the ten TD types proposed in 2015 by Li et al. [5] (Table 1), only Code Debt and Architectural Debt have been considered frequently by researchers ( RQ ) in the context of TD prioritization.

In the study by Li et al. [5], Code Debt was the most commonly investigated type of TD, followed by Test Debt. However, other types of TD have also received significant attention. In contrast to [5], our work shows that Code Debt and Architectural Debt are by far the most frequently investigated types of debt when considering TD prioritization. This could be due to the fact that they are easy to measure, mainly based on extensions of previous research from other domains, or it may also be due to the fact that they (particularly ATD) are considered the most harmful and expensive types to manage in software. For example, architectural and code patterns have been investigated for more than twenty years, even though they were not considered as "debt".

The two most commonly considered types of TD (Code Debt and Architectural Debt) are mainly evaluated by means of architectural or code-level anti-patterns (architectural smells, code smells, or code violations). Moreover, their harmfulness is mainly related to the influence they have on some external quality (e.g., the impact of a specific code smell on maintenance effort). However, their influence is still not clear, since the vast majority of studies do not agree on their harmfulness. Other types of TD should be investigated in the future.
We believe that Code Debt is the type investigated most often since the data is easy to access through mining software repository studies, while other types of debt require other types of studies, including case studies involving developers. We recommend that practitioners consider the measures identified in this RQ, but complement them with expert judgement to understand which architectural smells, code smells, or code violations to consider.

In software affected by TD, the only significantly effective way to reduce this TD is to refactor it. This fact stresses the importance of continuously and iteratively prioritizing the identified refactoring tasks and thereby highlights the importance of using an appropriate TD prioritization process. Through this study, we have identified several different approaches for prioritizing TD ( RQ , RQ . , and RQ . ). However, there is no unified approach for this activity, nor is there a consensus on which aspects to focus on when performing the TD prioritization process.

It is clear from the findings that the prioritization process of TD refactoring can be carried out using different approaches, all having different goals and proposing optimization with regard to different criteria. This study has identified five different main approaches that aim to: a) improve software qualities, especially maintainability and evolvability, b) increase software practitioners' productivity, c) reduce the fault-proneness of the software, d) compare various TD items using cost-benefit analysis (CBA) to understand the convenience of refactoring, and e) combine several different approaches.

This result is of value to both academics and practitioners and illustrates that it is important to first identify the goals of TD prioritization, and thereafter to implement a corresponding TD prioritization approach targeting the identified and specified goals.

One interesting finding is that the investigated papers usually only compared different TD items during this prioritization process and more rarely compared the need for implementing a new feature with the refactoring of TD.

Regarding the characteristics and measures considered during the prioritization process ( RQ ), the results so far imply that prioritizing TD is an activity that requires a holistic view of several factors. The systematic assessment of TD requires a large amount of information, which might change from case to case, and in most cases TD is prioritized without following a standardized approach. Also, the known measures used in a few papers capture only a small part of the factors that are used to prioritize TD (proxy for maintenance costs or productivity). Using only such measures to prioritize TD without considering the full picture of the relevant factors (risks and costs) might consequently result in partial and thus biased prioritization, which in turn could lead to poor business decisions. On the other hand, some of the factors have been reported in a single study conducted in a specific context and might not be relevant in other prioritization cases.

More studies are necessary in order to obtain better evidence on factors that have been overlooked (for example factors related to customers, business, social, and project aspects). In addition, we need to better understand which factors should be considered in different contexts, and which additional measures should be considered when prioritizing TD. Finally, although a few holistic approaches have been reported ([27], [SP24], [SP33]), there is a need for a better defined framework and a standardized approach for assessing TD.

Considering the two main components of TD, only a limited number of papers propose how to evaluate principal and interest. Interest is mainly calculated as extra cost, or as time wasted to fix TD issues. The reason could be that TD interest is not easy to calculate without access to empirical data from companies.
Researchers should design and perform studies to understand the actual interest of existing TD issues.
The tool support for prioritization activities is very fragmented ( RQ ), which highlights the lack of a solid, widely used, and validated set of tools specifically for TD prioritization. Current tools mainly identify TD issues and, in some cases, propose an estimate of the time needed to fix them. However, to the best of our knowledge, no tools calculate the interest due to the postponement of activities.
Our results can be useful for both researchers and practitioners. Researchers should focus on the other types of TD, also considering types of TD that have not been investigated much in the last few years. They can also evaluate approaches, factors, and measures and how to prioritize them. Moreover, since the available tools are not fully mature, research activities can focus on empirical validation of existing tools, confirming the usefulness of each measure proposed by each tool.
Practitioners can benefit from our results by using our impact map to explore/anticipate what kind of impact might occur because of TD. Moreover, they should be careful in selecting tools, and should consider more than one tool rather than relying on a single one.
Based on our results, we propose a preliminary framework to help practitioners during TD prioritization activities, as illustrated in Figure 4. This framework offers an exploration of the different factors that need to be taken into consideration during the TD prioritization process and of how these factors relate to each other.
The first step for practitioners is to decide whether they require a prioritization of the refactorings among TD issues or whether they need to prioritize a TD refactoring versus the implementation of new features and bug fixes (results from RQ ). This is because the approaches differ in terms of assessing the impact of TD and assessing the value or the impact of features and bugs. In the former case, the comparison can use the same factors, while in the latter case, it is more probable that the principal and interest of TD need to be compared with feature-oriented factors, for example competitive advantage or cost of delay.
Once the scope of the comparison is defined, the evaluation of TD should be performed taking into account: 1) the difference in the TD principal (the cost of fixing the issues), 2) the impact (the TD interest), and 3) other factors, including economic and marketing factors (results from RQ ).
The evaluation can be both quantitative and qualitative ( RQ ), and in some cases could be supported by tools ( RQ ). As an example, companies might quantitatively evaluate the presence of Code Debt using tools, but they might also need to perform a qualitative evaluation (e.g., with code reviews) of factors that cannot be measured with tools, for example considering code readability, analyzability, or other quality characteristics. In addition, some tools provide means to calculate the principal of the TD, but practitioners might need to calculate the interest by qualitatively assessing the impact factors.
Moreover, the evaluation should be performed considering different scenarios, including the available resources and the possible evolution of the system. In fact, TD can be quite context-dependent (as we discussed for the impact factors in RQ ), which means that practitioners need to assess it with estimations of future scenarios. For example, in the tool AnaConDebt, practitioners can specify events happening in short-, medium- or long-term scenarios. The evaluation of the different scenarios should help in making refactoring decisions, for example regarding which refactorings should be performed and which should be postponed.
As an example of the decision process, a company might consider not implementing a new feature that involves a code section or module that is suffering from TD.
This can happen if such TD is estimated to generate high interest in the short-term scenario: in such a case, the interest generated by the TD could exceed the cost of delaying the feature. The practitioners might then decide to refactor the code before implementing the feature.
Let us take a concrete example of how a refactoring decision is made following the steps in the prioritization framework. An architect needs to decide whether to refactor a "sub-optimal" interface before more applications accessing it are developed. The main activity of the architect is to evaluate whether to prioritize the refactoring of TD vs. developing new features. Then the architect needs to take into consideration and calculate different factors (principal, interest, and other factors). Without the refactoring, the TD would spread to all the new code (Contagious debt, Table 8). In addition, all the new applications would suffer from the negative impact (interest) generated by interacting with the sub-optimal API (Spread of impact in the system, Table 8). Although delaying the development of the new applications (feature-oriented factor) would imply costs in the short-term scenario, the lead time (Table 8) for developing new features in the long-term scenario could be reduced, as the developers would not pay the interest generated by the sub-optimal interface. If such long-term gain exceeds the cost of delaying the application development, the practitioners should choose to perform the refactoring of the API. In this case, the refactoring decision would be made by evaluating whether, in a future scenario, avoiding the interest is worth paying the principal.
The TD prioritization framework can assist practitioners, in combination with the other results presented in this paper (impact map, description of prioritization approaches, and available tools), in reaching a refactoring decision.
[Figure 4 diagram. Node labels: Prioritization Approach; Comparison (TD vs. other TD; TD vs. Features, Bugs); Principal (cost of fix); Impact (interest); Other factors; Scenarios (Over Time, Resources, ...); Qualitative; Quantitative; Tools; Refactoring Decision. Edge labels: Main activity; Evaluation of; Is calculated for; Based on; Assessed by; Output. Annotations: RQ2, RQ3, RQ4.]
Figure 4: The TD Prioritization Framework
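The decision logic of the interface example can be sketched in a few lines of Python. This is only a minimal illustration under our own assumptions: the names `Scenario`, `net_benefit`, and `decide` are hypothetical, all quantities are assumed to be expert estimates in a common unit (e.g., person-days), and the additive comparison of avoided interest against principal plus cost of delay is one simple way to read the framework, not a model prescribed by the primary studies.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Scenario:
    """One time horizon with hypothetical, expert-estimated costs (person-days)."""
    name: str
    interest_if_kept: float  # extra effort paid in this horizon if the TD stays
    cost_of_delay: float     # cost of postponing the feature in this horizon


def net_benefit(principal: float, scenarios: List[Scenario]) -> float:
    """Avoided interest minus (principal + cost of delay) over all scenarios."""
    avoided_interest = sum(s.interest_if_kept for s in scenarios)
    delay = sum(s.cost_of_delay for s in scenarios)
    return avoided_interest - (principal + delay)


def decide(principal: float, scenarios: List[Scenario]) -> str:
    """Recommend refactoring only when the estimated gain is positive."""
    if net_benefit(principal, scenarios) > 0:
        return "refactor first"
    return "postpone refactoring"


# Illustrative numbers only; they are not taken from any primary study.
principal = 20.0
scenarios = [
    Scenario("short-term", interest_if_kept=5.0, cost_of_delay=15.0),
    Scenario("long-term", interest_if_kept=60.0, cost_of_delay=0.0),
]
print(decide(principal, scenarios))      # refactor first
print(decide(principal, scenarios[:1]))  # postpone refactoring
```

With these assumed figures, looking only at the short-term scenario favors postponing the refactoring, whereas including the long-term scenario reverses the decision, which mirrors the role that scenarios play in the framework. Qualitative factors (e.g., contagiousness of the debt) are not captured by such a calculation and would still require expert judgement.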
6. Threats to Validity
The results of an SLR may be subject to validity threats, mainly concerning the correctness and completeness of the survey. In this Section, we discuss these threats, structured as proposed by Wohlin et al. [32] into construct, internal, external, and conclusion validity threats.
6.1. Construct validity
Construct validity is related to the generalization of the result to the concept or theory behind the study execution [32]. In our case, it is related to the potentially subjective analysis of the selected studies. As recommended by Kitchenham's guidelines [20], data extraction was performed independently by two or more researchers and, in case of discrepancies, a third author was involved in the discussion to clear up any disagreement. Moreover, the quality of each selected paper was checked according to the protocol proposed by Dybå and Dingsøyr [23].
6.2. Internal validity
Internal validity threats are related to possible wrong conclusions about causal relationships between treatment and outcome [32]. In the case of secondary studies, internal validity represents how well the findings of the review represent the findings reported in the literature. In order to address these threats, we carefully followed the tactics proposed by [20].
6.3. External validity
External validity threats are related to the ability to generalize the result [32]. In secondary studies, external validity depends on the validity of the selected studies. If the selected studies are not externally valid, the synthesis of their content will not be valid either. In our work, we were not able to evaluate the external validity of all the included studies.
6.4. Conclusion validity
Conclusion validity is related to the reliability of the conclusions drawn from the results [32]. In our case, threats are related to the potential non-inclusion of some studies. In order to mitigate this threat, we carefully applied the search strategy, performing the search in eight digital libraries in conjunction with the snowballing process [22], considering all the references presented in the retrieved papers, and evaluating all the papers that reference the retrieved ones, which resulted in one additional relevant paper. We applied a broad search string, which led to a large set of articles, but enabled us to include more possible results. We defined inclusion and exclusion criteria and applied them first to title and abstract. However, we did not rely exclusively on titles and abstracts to establish whether the work reported evidence on Technical Debt prioritization. Before accepting a paper based on title and abstract, we browsed the full text, again applying our inclusion and exclusion criteria.
7. Conclusion
Software companies need to manage and refactor TD issues since sometimes their presence is inevitable, due to a number of causes that may be related to unpredictable business or environmental forces internal or external to the organization. Moreover, some types of TD can be more dangerous than others.
Therefore, it is necessary to understand when refactoring TD should be prioritized with respect to implementing features or fixing bugs, or with respect to other types of TD.
We conducted an SLR in order to investigate the existing body of knowledge in software engineering and gain an understanding of how TD is prioritized in software organizations and what research approaches have been proposed.
The SLR process was carried out by following two rigorous approaches. We included scientific articles indexed by the most important bibliographic sources and selected by a rigorous process. We considered articles published before December 2019. Our work is based on 44 selected studies, which include data on the state of the art concerning approaches, factors, measures, and tools used in practice or proposed in research to prioritize TD.
The results of our review show that Code Debt and Architectural Debt are by far the most frequently investigated types of debt when considering TD prioritization, while there is scant evidence about other types of TD such as Test Debt and Requirement Debt. The prioritization process of TD refactoring can be carried out using different approaches, all having different goals and proposing optimization with regard to different criteria. However, the identified measures used in a few papers capture only a small part of the factors that are used to prioritize TD.
There is a lack of empirical evidence on measuring principal and interest. Moreover, our results highlight the lack of a solid, validated, and widely used set of tools specifically for TD prioritization.
In practice, we found that there is a plethora of aspects that need to be considered when prioritizing TD. We presented an impact map of such factors, which can be used as a comprehensive reference regarding which interest might be paid by an organization and how it should be considered. This map can also be used to follow up with further research.
Future work should focus on the investigation of types of TD that have been investigated less often. Moreover, we are planning to investigate how to systematically evaluate and measure the principal and interest of different types of TD. We also aim at developing a framework to support decision-making related to the prioritization of TD.
References