Technical Debt Prioritization: State of the Art. A Systematic Literature Review

Abstract

Background. Software companies need to manage and refactor Technical Debt issues. Therefore, it is necessary to understand if and when refactoring Technical Debt should be prioritized with respect to developing features or fixing bugs. Objective. The goal of this study is to investigate the existing body of knowledge in software engineering to understand what Technical Debt prioritization approaches have been proposed in research and industry. Method. We conducted a Systematic Literature Review of 384 unique papers published until 2018, following a consolidated methodology applied in Software Engineering. We included 38 primary studies. Results. Different approaches have been proposed for Technical Debt prioritization, all having different goals and optimizing for different criteria. The proposed measures capture only a small part of the plethora of factors used to prioritize Technical Debt qualitatively in practice. We report an impact map of such factors. However, there is a lack of an empirical and validated set of tools. Conclusion. We observed that Technical Debt prioritization research is preliminary and there is no consensus on what the important factors are and how to measure them. Consequently, we cannot consider current research conclusive, and in this paper we outline different directions for necessary future investigations.



Valentina Lenarduzzi (a), Terese Besker (b), Davide Taibi (c), Antonio Martini (d), Francesca Arcelli Fontana (e)
(a) LUT University, Lahti (Finland); (b) Chalmers University of Technology, Göteborg (Sweden); (c) Tampere University, Tampere (Finland); (d) University of Oslo, Oslo (Norway); (e) University of Milano-Bicocca, Milan (Italy)

Abstract

Background.

Software companies need to manage and refactor Technical Debt issues. Therefore, it is necessary to understand if and when refactoring of Technical Debt should be prioritized with respect to developing features or fixing bugs.

Objective.

The goal of this study is to investigate the existing body of knowledge in software engineering to understand what Technical Debt prioritization approaches have been proposed in research and industry.

Method.

We conducted a Systematic Literature Review of 557 unique papers published until 2019, following a consolidated methodology applied in software engineering. We included 44 primary studies.

Results.

Different approaches have been proposed for Technical Debt prioritization, all having different goals and optimizing for different criteria. The proposed measures capture only a small part of the plethora of factors used to prioritize Technical Debt qualitatively in practice. We present an impact map of such factors. However, there is a lack of an empirical and validated set of tools.

Conclusion.

We observed that Technical Debt prioritization research is preliminary and there is no consensus on what the important factors are and how to measure them. Consequently, we cannot consider current research conclusive. In this paper, we therefore outline different directions for necessary future investigations.

Keywords: Technical Debt, Technical Debt Prioritization

Email addresses: [email protected] (Valentina Lenarduzzi), [email protected] (Terese Besker), [email protected] (Davide Taibi), [email protected] (Antonio Martini), [email protected] (Francesca Arcelli Fontana)

Preprint submitted to JSS, January 31, 2020

1. Introduction

Technical Debt (TD) is a metaphor introduced by Ward Cunningham [1] to represent sub-optimal design or implementation solutions that yield a benefit in the short term but make changes more costly or even impossible in the medium to long term [2].

Software companies need to manage such sub-optimal solutions. The presence of TD is inevitable [3] and even desirable under some circumstances [4] for a number of reasons, which are often related to unpredictable business or environmental forces internal or external to the organization.

However, just like financial debt, every TD item has interest attached, that is, an extra cost or negative impact generated by the presence of a sub-optimal solution [5]. When such interest becomes very costly, it can lead to disruptive events, such as development crises [3]. The current best practices employed by software companies include keeping TD at bay by avoiding it when its consequences are known, or by refactoring or rewriting code and other artifacts to get rid of the accumulated sub-optimal solutions and their negative impact.

However, companies cannot afford to avoid or repay all the TD that is generated continuously and that may be unknown [3]. The main business goals of companies are to continuously deliver value to their customers and to maintain their products. Thus, refactoring TD usually competes with developing new features and fixing defects, and these activities are often prioritized over repayment of TD [3]. It is therefore of utmost importance to understand when refactoring TD becomes more important than postponing a feature or a bug fix. In other words, it is important to understand how to prioritize TD with respect to features and bugs.

In addition, recent studies show how different projects and even different types of TD might be associated with different refactoring costs (principal) and negative impact (interest) [6].
This means that some TD items can be more dangerous than others [7, 8], and it is therefore important to understand how to prioritize TD with respect to other TD.

However, there is no overall study reporting the current state of the art and practice related to how to prioritize TD. Our goal in this paper is to survey the existing body of knowledge in software engineering to understand which approaches have been proposed in research and industry to prioritize TD. To this end, we performed a Systematic Literature Review (SLR) on TD prioritization, investigating how TD is prioritized in software organizations and which research approaches have been proposed.

The main contribution of this paper is a report on the state of the art concerning approaches, factors, measures, and tools used in practice or proposed in research to prioritize TD.

The paper is structured as follows: In Section 2, we describe the background of this review. In Section 3, we outline the research methodology adopted in this study. Section 4 and Section 5 present and discuss the obtained results. Finally, in Section 6, we identify the threats to validity, and in Section 7 we draw conclusions.

2. Background

In this Section, we explain the meaning of TD in order to avoid confusion or misunderstandings, and we report on previously published systematic reviews.

The concept of TD was introduced for the first time in 1992 by Cunningham as "The debt incurred through the speeding up of software project development which results in a number of deficiencies ending up in high maintenance overheads" [1]. In 2013, McConnell [9] refined the definition of TD as "A design or construction approach that's expedient in the short term but that creates a technical context in which the same work will cost more to do later than it would cost to do now (including increased cost over time)". In 2016, Avgeriou et al. [10] defined it as "A collection of design or implementation constructs that are expedient in the short term, but set up a technical context that can make future changes more costly or impossible. TD presents an actual or contingent liability whose impact is limited to internal system qualities, primarily maintainability and evolvability".

Li et al. [5] conducted a systematic mapping study to understand the concept of TD and created an overview of the current state of research on managing TD. Based on the 96 selected studies, they proposed a classification of ten types of TD at different levels, as reported in Table 1. Since this classification derives from a recent secondary study and is, to our knowledge, the most complete one available in the literature, we used it in our search strategy (Section 3.2) to define our search terms.
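McConnell's definition frames TD as work that "will cost more to do later than it would cost to do now". The minimal sketch below makes that trade-off concrete under a deliberately simple assumption that is not taken from any of the cited definitions: each TD item has a fixed principal (the cost of refactoring it now) and a constant interest (the extra cost it causes per period while it remains). The names `TDItem`, `cost_of_deferral` and `rank_by_interest_rate`, and the ratio-based ranking, are hypothetical illustrations, not the prioritization approach of any primary study.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TDItem:
    """A hypothetical TD item with a linear cost model (illustration only)."""
    name: str
    principal: float  # effort to repay (refactor) the item now
    interest: float   # extra effort the item causes per period while unpaid


def cost_of_deferral(item: TDItem, periods: int) -> float:
    """Total cost if repayment is postponed by `periods`.

    Assumes the principal stays constant and interest accrues linearly;
    in practice interest may grow over time, as McConnell's definition notes.
    """
    return item.principal + item.interest * periods


def rank_by_interest_rate(items: List[TDItem]) -> List[TDItem]:
    """Order items by interest per unit of principal, highest first.

    One simple heuristic among many: repaying a high-ratio item removes
    the most recurring cost for each unit of refactoring effort.
    """
    return sorted(items, key=lambda i: i.interest / i.principal, reverse=True)


backlog = [
    TDItem("duplicated parser", principal=10, interest=5),
    TDItem("outdated docs", principal=2, interest=0.5),
    TDItem("cyclic modules", principal=8, interest=8),
]
print(cost_of_deferral(backlog[0], periods=3))  # 25
print([i.name for i in rank_by_interest_rate(backlog)])
# ['cyclic modules', 'duplicated parser', 'outdated docs']
```

A linear model is used only because it keeps the arithmetic transparent; the primary studies discussed later use a range of other factors and measures.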

Table 1: Definition of Technical Debt [5]

TD Type | Definition
Requirements TD | "refers to the distance between the optimal requirements specification and the actual system implementation, under domain assumptions and constraints"
Architectural TD | "is caused by architecture decisions that make compromises in some internal quality aspects, such as maintainability"
Design TD | "refers to technical shortcuts that are taken in detailed design"
Code TD | "is the poorly written code that violates best coding practices or coding rules. Examples include code duplication and over-complex code"
Test TD | "refers to shortcuts taken in testing. An example is lack of tests (e.g., unit tests, integration tests, and acceptance tests)"
Build TD | "refers to flaws in a software system, in its build system, or in its build process that make the build overly complex and difficult"
Documentation TD | "refers to insufficient, incomplete, or outdated documentation in any aspect of software development. Examples include out-of-date architecture documentation and lack of code comments"
Infrastructure TD | "refers to a sub-optimal configuration of development-related processes, technologies, supporting tools, etc. Such a sub-optimal configuration negatively affects the team's ability to produce a quality product"
Versioning TD | "refers to the problems in source code versioning, such as unnecessary code forks"
Defect TD | "refers to defects, bugs, or failures found in software systems"

In this Section, we briefly report on previous systematic reviews (Systematic Mapping Studies and Systematic Literature Reviews) available in the source engines, summarizing their main goals in Table 2. We present the studies in chronological order to show how research on TD has evolved. The first systematic review was published in 2012 [11] and the last ones, to the best of our knowledge, in 2018 [12], [13].

Tom et al. [11] exploited an exploratory case study technique that involves a multivocal literature review, supplemented by interviews with software practitioners and academics, in order to establish the boundaries of the TD phenomenon. As a result, they created a theoretical framework that provides a holistic view of TD, comprising a set of TD dimensions, attributes, precedents, and outcomes. The framework provides a useful approach to understanding the overall phenomenon of TD for practical purposes.

Li et al. [5] investigated TD management (TDM), providing a classification of TD concepts and presenting the current state of research on TDM. They considered publications between 1992 and 2013, ultimately selecting 44 studies. The results showed a need for empirical studies with high-quality evidence on the TDM process, application of TDM approaches in industrial contexts, and tools for managing the different TD types during the TDM process.

Ampatzoglou et al. [14] analyzed research efforts regarding TD, focusing on financial aspects underlying software engineering concepts. They considered publications until 2015, selecting 69 studies. The results provide a glossary of terms and a classification scheme for financial approaches to be applied for managing TD. Moreover, they discovered that a clear mapping between financial and software engineering concepts is lacking.

Ribeiro et al. [15] evaluated the appropriate time for paying a TD item and how to apply decision-making criteria to balance short-term benefits against long-term costs. They considered publications until 2016, selecting 38 studies. They identified 14 decision-making criteria that can be used by development teams to prioritize the payment of TD items and a list of types of debt related to the criteria.

Alves et al. [16] investigated what strategies have been proposed to identify and manage TD in software projects, considering publications between 2010 and 2014 and selecting 100 studies.
They proposed an initial taxonomy of TD types and provided a list of indicators to identify TD and management strategies. Moreover, they analyzed the current state of TD research, highlighting possible research gaps. The results showed a growing interest of researchers in the TD area. They identified some gaps regarding new indicator proposals and management strategies and tools for controlling TD. Another gap they identified regards empirical studies for validating the proposed strategies.

Fernández-Sánchez et al. [17] identified the elements needed to manage TD, considering publications until 2017 and selecting 69 studies. They did not provide a general overview of the TD phenomenon or of the activities for managing TD. The elements were classified into three groups (basic decision-making factors, cost estimation techniques, practices and techniques for decision-making) and grouped based on stakeholders' points of view (engineering, engineering management, and business-organizational management).

Behutiye et al. [18] analyzed the state of the art of TD and its causes, consequences, and management strategies in the context of agile software development (ASD). They considered publications until 2017 and selected 38 studies, finding potential research areas for further investigation. The study highlighted positive interest in TD and ASD and provided some potential categories that can easily lead to TD, such as focus on quick delivery and architectural and design issues.

Besker et al. [12] investigated Architectural TD (ATD), synthesizing and compiling research efforts in order to create new knowledge with a specific interest in ATD. They considered publications between 2005 and 2016, selecting 43 studies. The results showed a lack of guidelines on how to manage ATD successfully in practice and of an overall process where these activities are fully integrated.

Rios et al. [13] performed a tertiary study based on a set of five research questions and evaluated 13 secondary studies dating from 2012 to March 2018. They evolved a taxonomy of TD types, identified a list of situations in which debt items can be found in software projects, and organized a map representing the state of the art of activities, strategies, and tools for supporting TD management. Their results can help to identify points that still require further investigation in TD research. For example, they found that there are management activities that do not have any type of support tool.

Recently, Khomyakov et al. [19] investigated existing tools for the measurement and analysis of TD, focusing on quantitative methods that could also be automated. They selected 21 papers out of 331 retrieved. Their results show that many new approaches are being defined to measure TD.

Table 2: Previous SLRs

ID | Year | Goal
[11] | 2012 | Understanding the nature of TD
[5] | 2015 | TD management and TD classification
[14] | 2015 | Financial approaches for managing TD
[15] | 2016 | TD payment prioritization
[16] | 2016 | TD management strategies, TD taxonomy
[17] | 2017 | TD management elements
[18] | 2017 | TD in Agile development
[12] | 2018 | Managing architectural TD
[13] | 2018 | TD types, management strategies
[19] | 2019 | TD tools

3. Methodology

In order to understand the state of the art and the practice on Technical Debt prioritization, we conducted a systematic literature review based on the guidelines defined by Kitchenham et al. [20], [21]. We also applied the "snowballing" process defined by Wohlin [22].

In this Section, we describe the goal and the research questions (Section 3.1) and report our search strategy approach (Section 3.2). Moreover, we performed a quality assessment (Section 3.3) for each included paper and outlined the data extraction and the analysis (Section 3.4) of the corresponding data.

The study goal was to investigate the existing body of knowledge in software engineering to understand how TD is prioritized in software organizations and what research approaches have been proposed.

Based on our goal, we defined the following research questions (RQs):

RQ1 Which types of TD have been investigated mostly?
RQ2 Which prioritization aspects have been proposed?
RQ2.1 Are papers prioritizing TD vs TD or TD vs Features?
RQ2.2 Is the prioritization based on a one-shot activity or on a continuous process?
RQ3 Which factors and measures have been considered for TD prioritization?
RQ4 Which tools have been used to prioritize TD?

In order to satisfy our goal, we first investigated which types of TD are investigated mostly by researchers and where research efforts should be concentrated in the future (RQ1). Regarding TD types, we adopted the classification proposed by Li et al. [5] reported in Table 1. Moreover, we characterized how the different TD types are evaluated, highlighting the measures and information used.

The second research question targets how the investigated research papers address the prioritization process of TD, both in terms of different aspects (RQ2), i.e., whether the prioritization process of TD mainly focuses on different TD items or also includes prioritization between TD items and, e.g., the implementation of new features (RQ2.1), and of how the prioritization process is described in terms of its periodicity (RQ2.2).

Based on the above RQs, we aimed at identifying a set of factors and measures considered useful during TD prioritization activities (RQ3). Moreover, we aimed at understanding which measures are considered in the prioritization of the main TD components, principal and interest.

Finally, we aim to provide a list of existing tools used to evaluate TD in order to depict the current situation in terms of the number of tools and the maturity of each tool (RQ4).
The search strategy involves the outline of the most relevant bibliographic sources and search terms, the definition of the inclusion and exclusion criteria, and the selection process relevant for the inclusion decision. Our search strategy is depicted in Figure 1.

[Figure 1 (diagram): Keywords and Bibliographic sources → Retrieved papers → Inclusion and exclusion criteria testing → Inclusion and exclusion criteria → Full reading → Snowballing (References) → Accepted papers]

Figure 1: The Search and Selection Process

Search terms. In our search string, we included all the terms related to TD proposed by Li et al. [5] and reported in Table 1 (Section 2). The search string contained the following search terms:

("technical debt") OR ("design debt") OR ("architect* debt") OR ("test* debt") OR ("implem* debt") OR ("docum* debt") OR ("requirement debt") OR ("code debt") OR ("Infrastructure debt") OR ("versioning debt") OR ("defect debt") OR ("build debt")

We used the asterisk character (*) for the second term group in order to capture possible term variations such as plurals and verb conjugations. To increase the likelihood of finding publications addressing TD prioritization, we applied the search string to both title and abstract.
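The search string and its wildcards can be approximated locally, for example to re-check retrieved records. The sketch below is an approximation under stated assumptions, not how any of the listed digital libraries evaluates queries (each engine has its own syntax, stemming and tokenization): it treats a trailing `*` as "any continuation of the word", matches case-insensitively, and applies the OR-ed terms to title and abstract. `SEARCH_TERMS`, `term_to_regex` and `matches_search_string` are hypothetical names.

```python
import re

# The OR-ed terms of the search string; '*' marks a truncated word.
SEARCH_TERMS = [
    "technical debt", "design debt", "architect* debt", "test* debt",
    "implem* debt", "docum* debt", "requirement debt", "code debt",
    "infrastructure debt", "versioning debt", "defect debt", "build debt",
]


def term_to_regex(term: str) -> str:
    """Translate one quoted term into a regex.

    A trailing '*' on a word becomes \\w* (any word continuation), and the
    gap between words matches any run of whitespace.
    """
    words = []
    for word in term.split():
        if word.endswith("*"):
            words.append(re.escape(word[:-1]) + r"\w*")
        else:
            words.append(re.escape(word))
    return r"\b" + r"\s+".join(words) + r"\b"


SEARCH_PATTERN = re.compile(
    "|".join(f"(?:{term_to_regex(t)})" for t in SEARCH_TERMS),
    re.IGNORECASE,
)


def matches_search_string(title: str, abstract: str) -> bool:
    """True if any term occurs in the title or in the abstract."""
    return bool(SEARCH_PATTERN.search(title) or SEARCH_PATTERN.search(abstract))


print(matches_search_string("Prioritizing architectural debt", ""))               # True
print(matches_search_string("Refactoring legacy code", "We study code smells."))  # False
```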

Bibliographic sources. We selected the list of relevant bibliographic sources following the suggestions of Kitchenham and Charters [20], since these sources are recognized as the most representative in the software engineering domain and are used in many reviews. The list includes: ACM Digital Library, IEEEXplore Digital Library, Science Direct, Scopus, Google Scholar, CiteSeer library, Inspec, and Springer Link. Moreover, we performed a manual search on the most important conferences and workshops on Technical Debt, such as the International Conference on Technical Debt (TechDebt).

Inclusion and exclusion criteria. We defined inclusion and exclusion criteria to be applied to the title and abstract (T/A), to the full text (F), or to both (All), as reported in Table 3.

Table 3: Inclusion and exclusion criteria

Criteria | Assessment Criteria | Step
Inclusion | Papers that prioritize TD issues | All
Inclusion | Papers that report the criteria of removal/refactoring/remediation of TD issues regarding any aspect (financial, maintenance, performance, readability, ...) | All
Inclusion | Papers that compare TD issues | All
Inclusion | Papers that empirically validated/elicited the results | F
Exclusion | Papers not fully written in English | T/A
Exclusion | Papers not peer-reviewed (e.g., blogs, forums, ...) |
Exclusion | Duplicate papers (only consider the most recent version) | T/A
Exclusion | Position papers and work plans (i.e., papers that do not report results) | T/A
Exclusion | Publications where the full paper cannot be located (i.e., if the database used does not have access to the full text of the publication) | T/A
Exclusion | Publications that only mention prioritization of TD in an introductory statement and do not fully or partly focus on it | All
Exclusion | Only the latest version of the papers (e.g., journal papers that extend conference papers are excluded if they refer to the same dataset) | All

Search and selection process. The search was conducted in December 2019 and included all the publications available until then. Applying the search terms returned 557 unique papers.

Testing the applicability of inclusion and exclusion criteria:

Before applying the inclusion and exclusion criteria, we tested their applicability [21] on a subset of ten papers (assigned to all the authors) randomly selected from the retrieved papers.

Applying inclusion and exclusion criteria to title and abstract:

We applied the refined criteria to the remaining 547 papers. Each paper was read by two authors; in the case of disagreement, a third author was involved in the discussion to resolve it. For 29 papers, we involved a third author. Out of the 557 initial papers, we included 116 based on title and abstract.
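The two-reviewer screening with a third-author tie-breaker can be expressed as a small function. This is a hypothetical sketch: reviewers are modelled as callables that return True (include) or False (exclude), and the function also counts how many papers needed a third reviewer, the quantity reported as 29 papers for the title-and-abstract step.

```python
from typing import Callable, Iterable, List, Tuple

# A reviewer maps a paper identifier to an include (True) / exclude (False) vote.
Reviewer = Callable[[str], bool]


def screen(
    papers: Iterable[str],
    first: Reviewer,
    second: Reviewer,
    third: Reviewer,
) -> Tuple[List[str], int]:
    """Screen papers with two reviewers; a third one resolves disagreements.

    Returns the included papers and the number of escalations.
    """
    included: List[str] = []
    escalations = 0
    for paper in papers:
        vote_a, vote_b = first(paper), second(paper)
        if vote_a == vote_b:
            decision = vote_a
        else:
            escalations += 1
            decision = third(paper)  # third author breaks the tie
        if decision:
            included.append(paper)
    return included, escalations


# Toy example with hypothetical reviewers.
inc, esc = screen(
    ["P1", "P2", "P3"],
    first=lambda p: p in {"P1", "P2"},
    second=lambda p: p in {"P1", "P3"},
    third=lambda p: p == "P2",
)
print(inc, esc)  # ['P1', 'P2'] 2
```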

Full reading:

We fully read the 116 papers included by title and abstract, applying the criteria defined in Table 3 and assigning each one to two authors. We involved a third author for six papers to reach a final decision. Based on this step, we selected 49 papers as possibly relevant contributions.

Snowballing:

We performed the snowballing process [22], considering all the references presented in the retrieved papers (backward snowballing) and evaluating all the papers referencing the retrieved ones (forward snowballing), and applying the same process as for the retrieved papers. The snowballing search was conducted in December 2019. We identified 11 potential papers, of which only one was included in the final set of publications. Based on the search and selection process, we retrieved a total of 50 papers for the review, as reported in Table 5.
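Backward snowballing follows the reference lists of the included papers, and forward snowballing follows the papers that cite them. A minimal sketch over an in-memory citation graph follows; the dictionaries `references` and `cited_by` and the `is_relevant` predicate (standing in for applying the inclusion and exclusion criteria) are hypothetical. The loop repeats until no new relevant paper appears, as iterative snowballing is usually described.

```python
from typing import Callable, Dict, Set


def snowball(
    seed: Set[str],
    references: Dict[str, Set[str]],
    cited_by: Dict[str, Set[str]],
    is_relevant: Callable[[str], bool],
) -> Set[str]:
    """Return the papers added by iterative backward and forward snowballing.

    references[p]: papers that p cites (backward direction).
    cited_by[p]:   papers that cite p (forward direction).
    """
    screened = set(seed)  # never screen the same paper twice
    added: Set[str] = set()
    frontier = set(seed)
    while frontier:
        candidates: Set[str] = set()
        for paper in frontier:
            candidates |= references.get(paper, set())  # backward
            candidates |= cited_by.get(paper, set())    # forward
        candidates -= screened
        screened |= candidates
        # Only papers that pass the criteria are expanded in the next round.
        frontier = {c for c in candidates if is_relevant(c)}
        added |= frontier
    return added


refs = {"A": {"B", "C"}, "B": {"D"}}
cites = {"A": {"E"}}
print(sorted(snowball({"A"}, refs, cites, lambda p: p in {"B", "D"})))  # ['B', 'D']
```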

Before proceeding with the review, we checked whether the quality of the selected papers was sufficient to support our goal and whether each paper reached a minimum quality level. We performed this step according to the protocol proposed by Dybå and Dingsøyr [23]. To evaluate the selected papers, we prepared a checklist (Table 4) with a set of specific questions. We rated each answer on a five-point Likert scale (0 = poor, 4 = excellent). A paper satisfied the quality assessment criteria if it achieved a rating of at least 2.

Table 4: Quality Assessment Criteria

QA | Quality Assessment Criteria
QA1 | Is the paper based on research (or is it merely a "lessons learned" report based on expert opinion)?
QA2 | Is there a clear statement of the aims of the research?
QA3 | Is there an adequate description of the context in which the research was carried out?
QA4 | Was the research design appropriate to address the aims of the research?
QA5 | Was the recruitment strategy appropriate for the aims of the research?
QA6 | Was there a control group with which to compare treatments?
QA7 | Was the data collected in a way that addressed the research issue?
QA8 | Was the data analysis sufficiently rigorous?
QA9 | Has the relationship between researcher and participants been considered to an adequate degree?
QA10 | Is there a clear statement of findings?
QA11 | Is the study of value for research or practice?

Response scale (all criteria): Excellent = 4, Very Good = 3, Good = 2, Fair = 1, Poor = 0
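The acceptance rule can be sketched as follows. The text does not state how the eleven per-criterion scores are combined into one rating; this sketch assumes the arithmetic mean, which is only one plausible reading (a minimum score or a summed threshold would be others). `passes_quality_assessment` is a hypothetical helper.

```python
from statistics import mean
from typing import Sequence

NUM_CRITERIA = 11  # QA1..QA11 in Table 4
THRESHOLD = 2      # "Good" on the 0 (Poor) .. 4 (Excellent) scale


def passes_quality_assessment(scores: Sequence[int]) -> bool:
    """Return True if a paper's aggregated QA rating is at least THRESHOLD.

    Assumption: the rating is the mean of the per-criterion scores.
    """
    if len(scores) != NUM_CRITERIA:
        raise ValueError(f"expected {NUM_CRITERIA} scores, got {len(scores)}")
    if any(not 0 <= s <= 4 for s in scores):
        raise ValueError("scores must lie on the 0..4 Likert scale")
    return mean(scores) >= THRESHOLD


print(passes_quality_assessment([2] * 11))                           # True
print(passes_quality_assessment([4, 4, 4, 0, 0, 0, 0, 0, 0, 4, 4]))  # False (mean 20/11)
```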

Among the 50 papers included in the review from the search and selection process, only 44 fulfilled the quality assessment criteria, as reported in Table 5.

Table 5: Results of search and selection and application of quality assessment criteria

Step | Papers
Retrieval from bibliographic sources (unique papers) | 557
Reading by title and abstract | 439 rejected
Full reading | 68 rejected
Backward and forward snowballing | 1
Papers identified | 50
Quality assessment | 6 rejected
Primary studies | 44

Table 6 lists the 44 papers included in the review; Appendix A reports the details and the full references of all the primary studies.

Table 6: The Selected Papers

id | Title | Authors | Year
[SP1] | An empirical model of technical debt and interest | Nugroho, A. et al. | 2011
[SP2] | Investigating the impact of design debt on software quality | Zazworka, N. et al. | 2011
[SP3] | Prioritizing design debt investment opportunities | Zazworka, N. et al. | 2011
[SP4] | Estimating the principal of an application's technical debt | Curtis, B. et al. | 2012
[SP5] | Investigating the impact of code smells debt on quality code evaluation | Arcelli Fontana, F. et al. | 2012
[SP6] | Using technical debt data in decision making: Potential decision approaches | Seaman, C. et al. | 2012
[SP7] | Defining the decision factors for managing defects: A technical debt perspective | Snipes, W. et al. | 2012
[SP8] | A formal approach to technical debt decision making | Schmid, K. | 2013
[SP9] | Challenges to and Solutions for Refactoring Adoption: An Industrial Perspective | Sharma, T. et al. | 2015
[SP10] | Investigating Architectural Technical Debt accumulation and refactoring over time: A multiple-case study | Martini, A. et al. | 2015
[SP11] | On the use of time series and search based software engineering for refactoring recommendation | Wang, H. et al. | 2015
[SP12] | Towards Prioritizing Architecture Technical Debt: Information Needs of Architects and Product Owners | Martini, A. and Bosch, J. | 2015
[SP13] | Validating and prioritizing quality rules for managing technical debt: An industrial case study | Falessi, D. and Voegele, A. | 2015
[SP14] | Developing processes to increase technical debt visibility and manageability - An action research study in industry | Yli-Huumo, J. et al. | 2016
[SP16] | How do software development teams manage technical debt? An empirical study | Yli-Huumo, J. et al. | 2016
[SP17] | Identifying and quantifying architectural debt | Xiao, L. et al. | 2016
[SP18] | JSpIRIT: A flexible tool for the analysis of code smells | Vidal, S. et al. | 2016
[SP19] | Minimizing refactoring effort through prioritization of classes based on historical, architectural and code smell information | Choudhary, A. and Singh, P. | 2016
[SP20] | Pragmatic approach for managing technical debt in legacy software project | Gupta, R.K. et al. | 2016
[SP21] | Technical debt prioritization using predictive analytics | Codabux, Z. and Williams, B.J. | 2016
[SP22] | Technical Debt Management with Genetic Algorithms | Vathsavayi, S. H. and Systa, K. | 2016
[SP23] | A Heuristic for Estimating the Impact of Lingering Defects: Can Debt Analogy Be Used as a Metric? | Akbarinasaji, S. et al. | 2017
[SP24] | A strategy based on multiple decision criteria to support technical debt management | Ribeiro, L.F. et al. | 2017
[SP25] | An empirical assessment of technical debt practices in industry | Codabux, Z. et al. | 2017
[SP26] | Assessing code smell interest probability: A case study | Charalampidou, S. et al. | 2017
[SP27] | Impact of architectural technical debt on daily software development work - A survey of software practitioners | Besker, T. et al. | 2017
[SP28] | Investigating the identification of technical debt through code comment analysis | de Freitas Farias, M.A. et al. | 2017
[SP29] | Lessons learned from the ProDebt research project on planning technical debt strategically | Ciolkowski, M. et al. | 2017
[SP30] | Looking for Peace of Mind? Manage Your (Technical) Debt: An Exploratory Field Study | Ghanbari, H. et al. | 2017
[SP31] | Revealing social debt with the CAFFEA framework: An antidote to architectural debt | Martini, A. and Bosch, J. | 2017
[SP32] | Technical debt interest assessment: From issues to project | Martini, A. et al. | 2017
[SP33] | The magnificent seven: Towards a systematic estimation of technical debt interest | Martini, A. and Bosch, J. | 2017
[SP34] | The pricey bill of Technical Debt - When and by whom will it be paid? | Besker, T. et al. | 2017
[SP35] | A semi-automated framework for the identification and estimation of Architectural Technical Debt: A comparative case-study on the modularization of a software component | Martini, A. et al. | 2018
[SP36] | Early evaluation of technical debt impact on maintainability | Conejero, J.M. et al. | 2018
[SP37] | Technical Debt tracking: Current state of practice: A survey and multiple case study in 15 large organizations | Martini, A. et al. | 2018
[SP38] | Identifying and Prioritizing Architectural Debt Through Architectural Smells: A Case Study in a Large Software Company | Martini, A. et al. | 2018
[SP39] | Prioritize technical debt in large-scale systems using CodeScene | Tornhill, A. | 2018
[SP40] | Prioritizing technical debt in database normalization using portfolio theory and data quality metrics | Albarak, M. and Bahsoon, R. | 2018
[SP41] | Towards a Technical Debt Management Framework based on Cost-Benefit Analysis | Firdaus, H.M. and Lichter, H. | 2018
[SP42] | Design debt prioritization: a design best practice-based approach | Plösch, R. et al. | 2018
[SP43] | Aligning Technical Debt Prioritization with Business Objectives: A Multiple-Case Study | Rebouças, R. et al. | 2018
[SP44] | Technical Debt Prioritization: A Search-Based Approach | Alfayez, R. and Boehm, B. | 2019

We extracted data from the 43 primary studies (PSs) that satisfied the quality assessment criteria. The context of each PS is explained in terms of Context Data, Process Data, and Outcome Data, as reported in Table 7.

Context Data is necessary to outline the context of each PS in terms of the type of evaluated TD, according to the list proposed by Li et al. [5]. We also extracted data regarding the projects considered in the study, such as number of projects, project size, and programming languages. Moreover, we collected information about the process phase in which the TD is evaluated.

Process Data explain the process adopted to evaluate and prioritize TD issues. We collected data on the type of process (single activity or continuous process, proactive or reactive) and the type of analysis, distinguishing between qualitative, quantitative, and mixed evaluation approaches. We also retrieved information about the frameworks and tools adopted to evaluate and prioritize TD issues. This data is exclusively based on what is reported in the papers, without any kind of personal interpretation.

Outcome Data identifies the criteria of removal/refactoring/remediation of TD issues. Moreover, we extracted the measures and factors used to assess the prioritization of a TD issue and which of these are suggested during the prioritization process.
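One way to keep extraction consistent across reviewers is to fix the form as a typed record. The sketch below mirrors the three categories (Context, Process, and Outcome Data); the field names, the enum, the "TD in general" catch-all and the validation against the Table 1 types are illustrative assumptions, not the extraction form used in this study, which is summarized in Table 7.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional

# TD types from Table 1 (Li et al.), plus a hypothetical catch-all label.
TD_TYPES = {
    "Requirements TD", "Architectural TD", "Design TD", "Code TD", "Test TD",
    "Build TD", "Documentation TD", "Infrastructure TD", "Versioning TD",
    "Defect TD", "TD in general",
}


class Analysis(Enum):
    QUALITATIVE = "qualitative"
    QUANTITATIVE = "quantitative"
    MIXED = "mixed"


@dataclass
class ContextData:
    td_types: List[str]
    num_projects: Optional[int] = None
    project_size: Optional[str] = None
    languages: List[str] = field(default_factory=list)
    process_phase: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject labels that are not part of the classification.
        unknown = set(self.td_types) - TD_TYPES
        if unknown:
            raise ValueError(f"unknown TD types: {sorted(unknown)}")


@dataclass
class ProcessData:
    continuous: bool           # continuous process (True) or single activity
    proactive: Optional[bool]  # None if the paper does not say
    analysis: Analysis
    tools: List[str] = field(default_factory=list)


@dataclass
class OutcomeData:
    removal_criteria: List[str] = field(default_factory=list)
    measures: List[str] = field(default_factory=list)
    factors: List[str] = field(default_factory=list)


@dataclass
class ExtractionRecord:
    study_id: str
    context: ContextData
    process: ProcessData
    outcome: OutcomeData


# Hypothetical record; it does not describe any real primary study.
record = ExtractionRecord(
    study_id="SP-example",
    context=ContextData(td_types=["Code TD"], num_projects=1, languages=["Java"]),
    process=ProcessData(continuous=False, proactive=None, analysis=Analysis.QUANTITATIVE),
    outcome=OutcomeData(measures=["number of code smells"]),
)
```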

Table 7: Data Extraction

Category | Type
Context Data | Technical Debt type (according to [5]); Analyzed project (…)

3.5. Replicability

In order to allow replication and extension of our work by other researchers, we prepared a replication package for this study with the complete results obtained.

4. Results

Based on the adopted selection process, we identified 39 primary studies (PSs), as listed in Table 6. We illustrate their distribution by year in Figure 2.

The first three relevant papers on TD prioritization were published in 2011. In the following years, between 2012 and 2014, only three papers were published. From 2015, the publication trend increased (5 papers), followed by a considerable increase in 2016, 2017, and 2018, with 10, 12, and 8 papers, respectively.

The selected PSs are published in 22 different sources, including 6 journals and 15 conferences and workshops. Specifically, the journal publication sources are: (2 papers) Information and Software Technology (IST), (2 papers) Journal of Systems and Software (JSS), (2 papers) IEEE Software, (1 paper) Empirical Software Engineering Journal (EMSE), (1 paper) Journal of Software: Evolution and Process (JSEP), (1 paper) Science of Computer Programming.

Regarding conferences and workshops, the numbers are: (10 papers) International Conference on Technical Debt (TechDebt) (formerly the Workshop on Managing Technical Debt (MTD)), (4 papers) Euromicro Conference on Software Engineering and Advanced Applications (SEAA), (3 papers) International Conference on Agile Software Development (XP), (2 papers) International Conference on Product-Focused Software Process Improvement (PROFES), (2 papers) International Conference on Software Engineering (ICSE), (1 paper) International Conference on Management of Digital Eco Systems (MEDES), (1 paper) International Conference on Services Computing (SCCC), (1 paper) International Workshop on Quantitative Approaches to Software Quality (QuASoQ), (1 paper) International Workshop on Emerging Trends in Software Metrics (WETSoM), (1 paper) International Conference on Enterprise Information Systems (ICEIS), (1 paper) …

Figure 2: Paper Distribution by Year (number of papers per year)

28 PSs (75.67%) conducted case studies in order to investigate TD issues, analyzing different sets of projects. 24 out of 28 PSs report the findings for each analyzed project in terms of number of projects, project size, and programming language. Regarding the number of projects analyzed, the majority of the PSs considered fewer than seven projects each, with most considering only one project. We identified three papers that considered a much larger number of projects as context: [SP4] with 700 projects, [SP1] with 44 projects, and [SP5] with 12 projects. Only 11 PSs report on the programming language of the project(s), with Java, C …

4.2. RQ1. Which types of TD have been investigated the most?

Considering the TD types reported in Table 1, the types of TD considered most frequently in the PSs were Code Debt (38%), Architectural Debt (24%), and Design Debt (10%). Moreover, some PSs (24%) do not report on issues of any specific TD type but evaluate TD in general (Figure 3).

Figure 3: Types of TD

Code TD is generally investigated from the point of view of its impact on one or more software qualities [SP13], [SP18], [SP19], [SP26]. Maintainability [SP4], [SP5], [SP11] and maintenance effort [SP1], [SP2], [SP11], [SP19] are considered most often by the PSs. Code debt evaluation is mostly based on code smells [SP2], [SP5], [SP11], [SP18], [SP19], [SP26]. Other metrics are also considered, such as the time [SP4], [SP23] or cost [SP1] needed to fix a violation, and quality rules [SP13]. Factors related to subjective evaluation, such as customer feedback [SP23] or developers' comments in the code [SP28], are evaluated less often. The approaches mainly involve models that reduce TD by removing or refactoring code smells or by improving other metrics [SP11], [SP18]. These approaches look at the impact of refactoring code smells [SP5], compare classes with and without smells [SP2], [SP26], or rank the code rules [SP13] perceived as critical by developers.

Architectural TD is generally investigated by taking into account the role of architectural smells [SP17], [SP19], [SP20] or complex architectural design [SP17], [SP27], which negatively impact software quality [SP17], [SP19], [SP20]. Architectural TD is evaluated by measuring the extra maintenance effort for bug fixing [SP17] or by analyzing the bug-proneness [SP17] of the code. Another approach prioritizes refactoring activities by combining three different perspectives: historical data of the projects, architectural design, and severity of the class [SP19]. Architectural design is used to identify high interest, in terms of wasted time, related to architectural TD [SP27], combined with other metrics such as the number of files and the percentage of complex functions and files [SP35]. Another approach identifies dependencies and social gaps across the architecture and the organization in order to define architectural TD [SP31].

RQ2. Which prioritization aspects have been proposed?

TD prioritization is considered one of the most important activities in managing TD. The TD prioritization process is used to define the ordering and/or scheduling of planned refactoring initiatives based on the priority of each identified TD item, according to the impact of the individual items on the software. Researchers have proposed several different prioritization aspects in the reviewed publications, and a few methods on how to prioritize TD have been developed, but there is no unified approach regarding how the TD prioritization process should be carried out, nor is there a consensus on which aspects to focus on when performing it. The selection of the prioritization strategy is currently context-dependent in most organizations [SP21].

In order to analyze the prioritization aspects presented in the retrieved publications, we used a thematic analysis approach. Thematic analysis is an effective method for identifying, analyzing, and reporting patterns and themes within a searched data scope [24]. The thematic analysis yielded five main themes illustrating different prioritization aspects.
However, one should note that, from a software evolution perspective, these aspects can potentially have dependencies and couplings. Based on the analysis, the different prioritization strategies presented in the reviewed publications are mainly based on: a) improving software quality, b) increasing software practitioners' productivity, c) the effect of TD on the correctness of the software, d) cost-benefit analysis (CBA) to compare various TD items with respect to low cost and high payoff, or e) a combination of several different approaches.

Studies focusing on internal software quality as a prioritization strategy commonly rely on a quality assessment of the software in order to identify the TD items that cause the highest maintenance costs [SP1], [SP2], [SP13], [SP19], [SP28], [SP26], [SP4], [SP31], [SP35], [SP41], [SP44], together with factors such as remaining product life, debt severity and its impact on future development activities, and current business-related constraints [SP3], [SP9].

Xiao et al. [SP17] suggest an approach that focuses on architectural TD and that both locates TD items and ranks and prioritizes them. Their approach returns the TD items that consume the largest maintenance effort and therefore deserve more attention and higher priority for refactoring. Plösch et al. propose a TD prioritization approach that focuses primarily on the prioritization of Design debt; it relies on the quantification of design best practices and transfers the identified TD items into a portfolio matrix [SP42].
Albarak and Bahsoon further claim that software systems having database tables below fourth normal form are likely to form TD, and therefore the ill-normalized tables should be prioritized for refactoring [SP40].

Other reviewed publications also take the decrease in software practitioners' productivity into consideration when prioritizing TD since, for example, software suffering from architectural TD slows down development by causing rework [SP2], [SP3].

Also, the effect TD has on the correctness of the software is described as an approach for evaluating different candidate TD items for prioritization [SP2]. More specifically, Arcelli Fontana, Ferme, and Spinelli [SP5] report that the prioritization of the refactoring of code smells representing design debt can be evaluated by studying the impact of the refactoring of the code smells on different quality metrics, with the goal of identifying and prioritizing "the most dangerous smell and hence the smell which represents the worst TD". When prioritizing defect debt in particular, Akbarinasaji et al. [SP23] focus their approach on the severity of the debt items (using the categories critical, major, normal, and minor) and the duration of bug-fixing time.

Codabux et al. [SP21] used a Bayesian approach to build a prediction model for determining the "TD proneness" of each TD item, using a classification scheme based on the TD proneness probability through which the risk of the individual items is assessed.

Other researchers, such as [SP3] and [SP6], use a cost-benefit analysis when prioritizing different TD items, focusing on which refactoring activities should be performed first because they are likely to be inexpensive to implement yet have a significant effect, and which refactoring should be postponed due to high cost and low payoff.
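The cost-benefit strategy described above can be illustrated with a minimal Python sketch. It assumes that every TD item already carries an estimated refactoring cost (principal) and an estimated payoff (e.g., avoided interest) in the same unit; the TDItem type, the cba_prioritize function, the example items, and the default break-even threshold (min_ratio=1.0) are hypothetical and are not taken from [SP3] or [SP6].

```python
from dataclasses import dataclass


@dataclass
class TDItem:
    name: str
    cost: float    # estimated refactoring cost (principal), e.g. in person-hours
    payoff: float  # estimated benefit of refactoring (e.g. avoided interest), same unit

    def __post_init__(self):
        if self.cost <= 0:
            raise ValueError("cost must be positive")


def cba_prioritize(items, min_ratio=1.0):
    """Return (refactor_now, postpone).

    Items whose payoff is at least min_ratio times their cost are kept and
    ordered by profitability (payoff per unit of cost, highest first); the
    others (high cost, low payoff) are postponed. Hypothetical rule.
    """
    refactor_now = [i for i in items if i.payoff >= min_ratio * i.cost]
    postpone = [i for i in items if i.payoff < min_ratio * i.cost]
    refactor_now.sort(key=lambda i: i.payoff / i.cost, reverse=True)
    return refactor_now, postpone


# Hypothetical backlog (cost and payoff in person-hours).
backlog = [
    TDItem("god class in billing", cost=16, payoff=40),
    TDItem("duplicated parser code", cost=4, payoff=20),
    TDItem("cyclic package dependency", cost=80, payoff=30),
]
refactor_now, postpone = cba_prioritize(backlog)
print([i.name for i in refactor_now])  # ['duplicated parser code', 'god class in billing']
print([i.name for i in postpone])      # ['cyclic package dependency']
```

Raising min_ratio above 1.0 would demand a safety margin before an item is refactored; the threshold is an assumption of this sketch, not a value reported in the reviewed studies.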
The main focus of this approach is on making a lucrative investment in the software, with the output of this analysis being a prioritized list of TD items ordered by the profitability of the possible refactoring activities [SP3].

This strategy is echoed by Martini et al. [SP32], who state that "if the interest is (or is going to be) high, the debt is worth being paid. On the contrary, if the interest is not enough to justify the cost of refactoring, there is no reason to 'waste' resources to refactor the system." However, Martini et al. [SP32] also stress the importance of not basing prioritization decisions only on single TD items assessed separately, but also of understanding the overall impact that TD items have on the whole project, thus focusing on the overall project goals by evaluating the information holistically. In this approach, Martini et al. [SP32] also include factors such as the portion of the code affected by the TD, the project size, the roadmap, the positive impact of the TD, the existence of an alternative, and the cultural attitude of the team when prioritizing TD refactoring activities.

Further, Alfayez and Boehm [SP44] propose an automated search-based approach for prioritizing TD using a multi-objective evolutionary algorithm (MOEA, relying on an open-source Java library), with a focus on TD repayment through refactoring within a specific cost constraint.

Borrowing prioritization approaches from other disciplines, such as finance and psychology, Seaman et al. [SP6] include techniques such as the Analytic Hierarchy Process (AHP), the Portfolio method, and the Options approach. The AHP approach involves building a criteria hierarchy, assigning weights and scales to the criteria, and finally performing a series of pairwise comparisons between the alternatives against the various criteria.
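As a rough illustration of the AHP steps listed above, the Python sketch below derives weights from pairwise-comparison matrices and combines them into a ranking of two TD items. It uses the row geometric-mean approximation of the priority vector (a common alternative to Saaty's principal-eigenvector method) and omits the consistency check; the criteria, the judgments on Saaty's 1 to 9 scale, and the item names are hypothetical and are not taken from [SP6].

```python
import math


def ahp_weights(matrix):
    """Approximate an AHP priority vector with row geometric means.

    matrix[i][j] says how strongly element i is preferred over element j
    (Saaty scale 1-9); the matrix is assumed reciprocal: m[j][i] == 1 / m[i][j].
    """
    n = len(matrix)
    gmeans = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(gmeans)
    return [g / total for g in gmeans]


def ahp_rank(criteria_matrix, alternative_matrices, names):
    """Weight each alternative's local priorities by the criteria weights."""
    criteria_weights = ahp_weights(criteria_matrix)
    scores = [0.0] * len(names)
    for cw, alt_matrix in zip(criteria_weights, alternative_matrices):
        for k, local in enumerate(ahp_weights(alt_matrix)):
            scores[k] += cw * local
    return sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)


# Hypothetical criteria: interest, low refactoring cost, risk of keeping the debt.
criteria = [
    [1, 3, 5],
    [1 / 3, 1, 2],
    [1 / 5, 1 / 2, 1],
]
# One 2x2 comparison of the TD items per criterion (same order as above).
per_criterion = [
    [[1, 4], [1 / 4, 1]],  # interest: TD-A is clearly more costly to keep
    [[1, 1 / 3], [3, 1]],  # low cost: TD-B is cheaper to refactor
    [[1, 2], [1 / 2, 1]],  # risk: TD-A is slightly riskier to keep
]
ranking = ahp_rank(criteria, per_criterion, ["TD-A", "TD-B"])
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In a full AHP application, a consistency ratio would normally be computed for each comparison matrix before its weights are trusted.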
The goal of using the Portfolio approach is to select those assets that maximize the return on investment or minimize the investment risk.

Codabux et al. [SP25] stress the importance of adopting a broader perspective on the prioritization process, focusing on the liability of TD. According to them, decision makers need to think beyond the cost associated with fixing the debt, including estimates of the possible future costs resulting from the decision to ship. The additional costs reflected during prioritization in terms of liability include, e.g., costs for responding to support requests, costs associated with catastrophic failures, and potential litigation costs if service level agreements are violated because of unmanageable debts.

Ribeiro et al. [SP24] present a multiple-criteria decision strategy model that combines different prioritization approaches and can be used during different project phases. Their model focuses on aspects such as the severity of the impact of the TD items from a customer perspective, the interest cost of TD, the lifetime of the project's properties, and their possibility of evolution.

Yet another prioritization process that includes different perspectives is the approach described by Ciolkowski et al. [SP29]. Their approach combines the overall software quality with a focus on productivity improvement from a future-oriented perspective, using a proactive methodology.

Gupta et al. [SP20] use a two-level approach for prioritizing TD. First, the TD items are assessed according to their importance and urgency. In a second step, the impact of the TD items on business value and effort is assessed.

Guo et al. [SP15] present a TD prioritization approach that ranks customer expectations as the top priority, followed by the availability of development resources, the interest of the TD items, the current status of the debt-infected modules, and the impact of the debt on other features.
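One possible reading of the ordered criteria of Guo et al. [SP15] described above is a lexicographic ranking, in which each later criterion only breaks ties left by the earlier ones. The Python sketch below shows that reading; the lexicographic interpretation, the 1 to 5 ordinal scores (higher means "refactor sooner"), and the item names are assumptions, not details reported in [SP15].

```python
from dataclasses import dataclass


@dataclass
class RankedDebt:
    name: str
    # Hypothetical ordinal scores from 1 (low) to 5 (high); higher = refactor sooner.
    customer_expectations: int
    resource_availability: int
    interest: int
    module_status: int       # current status of the debt-infected module
    impact_on_features: int

    def key(self):
        # Python compares tuples lexicographically, so the field order below
        # encodes the priority order of the criteria.
        return (
            self.customer_expectations,
            self.resource_availability,
            self.interest,
            self.module_status,
            self.impact_on_features,
        )


items = [
    RankedDebt("X", 5, 2, 3, 1, 2),
    RankedDebt("Y", 5, 4, 1, 1, 1),
    RankedDebt("Z", 3, 5, 5, 5, 5),
]
ranked = sorted(items, key=RankedDebt.key, reverse=True)
print([d.name for d in ranked])  # ['Y', 'X', 'Z']
```

Item Z is ranked last despite the highest scores on every later criterion, which is exactly what a strict lexicographic order produces; a weighted score would trade the criteria off against each other instead.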
By studying how software practitioners prioritize TD items in practice, Yli-Huumo et al. [SP14], [SP16] concluded that their prioritization commonly focuses on scalability, business value, use of a feature, and customer effect.

Snipes et al. [SP7] suggest a TD prioritization approach that combines factors such as severity, the existence of a workaround, the urgency of the refactoring required by customers, refactoring effort, the risk of the proposed refactoring, and the scope of testing required.

Schmid [SP8] distinguishes between potential and effective TD, where potential TD is any type of suboptimal software system, while effective TD refers to issues in the software system that make further development of that system more difficult. This prioritization approach considers aspects such as evolution cost, refactoring cost, and the probability that the predicted evolution path will be realized.

Further, Almeida et al. [SP43] suggest also focusing on business objectives when prioritizing TD in order to support business expectations and goals. The researchers compared a technical prioritization with a business-oriented one, and they state that their results show that taking business priorities into account can change decisions related to TD prioritization. This prioritization aspect is also described as facilitating the argumentation from the technical side, thereby helping to convince business stakeholders to prioritize what were previously considered purely technical problems.

Martini and Bosch [SP33] propose a tool called AnaConDebt to provide assistance during the TD prioritization process. Their tool assesses the severity of the interest of different TD items, with the calculation of the interest being based on an assessment of seven different factors and their growth.
The assessed factors are: 1) reduced development speed, 2) bugs related to the TD item, 3) other qualities compromised, 4) other extra costs, 5) frequency of the issue, 6) spread in the system, and 7) users affected. Vidal et al. [SP18] also propose a tool called JSpIRIT specifically for prioritizing source-code-related TD, where the TD items are evaluated according to their importance based on different prioritization criteria. The tool calculates a ranking of a set of code smells according to their importance and can be instantiated to prioritize TD items by different criteria. Examples of such criteria are the relevance of the kind of code smell, the history of the system, or different software metrics, among others. Additionally, the developer can use external information to improve the prioritization.

Yet another reviewed publication [SP39] suggests performing TD prioritization using a tool called CodeScene, which takes into consideration factors such as how developers work with the code. The process uses a complexity trend analysis, calculating the indentation-based complexity of the identified TD items, and relies on a skilled human observer to set the final TD prioritization.

RQ2.1. Are papers prioritizing TD vs TD or TD vs Features?

Since today's software companies face increasing pressure to deliver customer value, the balance between spending developer time, effort, and resources on implementing new features or on TD remediation activities, bug fixing, or other system improvements becomes vital. In this study, we limited the scope to the balance between prioritizing the implementation of new features and the remediation of existing TD. This research question therefore seeks to address whether the TD prioritization process mainly focuses on prioritization among different TD items or whether TD items are described as competing with the implementation of new features.

Budget, resources, and available time are important factors in a software project, especially during the prioritization process, since spending time and effort on refactoring activities commonly implies that less time can be spent on, for example, implementing new features. This is one of the main reasons why software companies do not always spend additional budget and effort on the refactoring of TD, since they commonly have a strong focus on delivering customer-visible features [SP18].

Ciolkowski et al. [SP29] describe this situation as follows: "The challenge for project managers is to find a balance when using the given budget and schedule, either by reducing TD or by adding technical features. This balance is needed to keep time to market for current product releases short and future maintenance costs at an acceptable level." [SP39] echoes this view, stating that, ideally, actionable refactoring targets should be prioritized based on the technical debt interest rate to balance the trade-offs between improvements, risk, and new features.

Furthermore, Martini, Bosch, and Chaudron [SP10] state that TD refactoring initiatives usually get low priority compared to the implementation of new features and that TD that is not directly related to the implementation of new features is often postponed.

Vathsavayi and Systä [SP22] echo this notion, stating that "Deciding whether to spend resources for developing new features or fixing the debt is a challenging task." The researchers highlight that software teams need to prioritize new features, bug fixes, and TD refactoring within the same prioritization process.

However, even if the balance between implementing new features and TD refactoring activities is described as important [SP31], the papers investigated in this study commonly focus their prioritization approaches on prioritization among different TD items, with the goal of determining which item should be refactored first. None of the prioritization approaches described in the surveyed publications explicitly addresses how the prioritization between implementing new features and spending time and effort on the refactoring of TD should be carried out. However, the study by Besker et al. [25] states that "the pressure of delivering customer value and meeting delivery deadlines forces the software teams to down-prioritize TD refactorings continuously in favor of implementing new features rapidly".

RQ2.2. Is the prioritization based on a one-shot activity or on a continuous process?

Just as important as prioritizing TD refactoring activities in a project is describing a management strategy for the prioritization process. Therefore, this research question focuses on how the prioritization process is described in the reviewed publications in terms of its periodicity. We distinguish the different approaches in terms of one-shot activities versus part of a continuous process.

Some of the publications reviewed in this study describe the TD prioritization process as a continuous, integrated, and iterative process [SP16], [SP22], whereas others stress the importance of prioritizing TD refactoring within each sprint [SP15]. Choudhary et al. [SP19] illustrate the prioritization process as an integral part of the continuous development process, saying that "ideally software companies try to incorporate refactoring practices as an integral part of their development and maintenance processes" [SP9], and [SP39] echoes this notion, stating that the systematic management of TD and how to reduce it should also be considered important in each release of the development project.

Interestingly, however, the rest of the publications reviewed in this study do not give any explicit recommendations on how often or in what way the prioritization of TD should be carried out.

RQ3. Which factors and measures have been considered for TD prioritization?

During the prioritization process, six PSs considered both principal and interest ([SP1], [SP10], [SP13], [SP15], [SP23], [SP35]), while four PSs considered only interest ([SP13], [SP17], [SP27], [SP34]). Principal is calculated as the cost [SP1], [SP10] or time [SP1], [SP4] needed to fix technical quality issues [SP1] or violations of quality rules [SP13]. Other factors are also considered, such as page rank or customer feedback [SP23]. Interest is calculated as the extra cost spent on maintenance due to technical quality issues [SP1], [SP10], [SP17], [SP35] or as the time wasted on different activities (management or refactoring) [SP27], [SP34].

In [SP15], principal is compared with interest, and any item for which the benefit does not outweigh the cost is not considered. The factors considered are customer expectations, which have the top priority, followed by the availability of development resources, the interest of the TD items, the current status of the debt-infected modules, and the impact of the debt on other features [SP15].

In Table 8, we present an "Impact Map", which highlights the plethora of factors related to the impact (interest) of TD to be considered for prioritization, and their wide variation across studies and projects. In total, we counted 53 unique factors.

A few of the factors might overlap, although they are calculated differently in different papers. For example, "number of bugs" and "ROI (calculated on number of bugs)" are obviously overlapping factors, although using the sheer number of bugs or the cost of their impact as indicators might give very different results when prioritizing. In other cases, a generic concept of "interest" or "cost" has been used, although such values were probably calculated implicitly by the researchers or practitioners taking into consideration some of the 52 remaining factors explicitly mentioned in the other papers. However, given the reported information, there is no way to perform such a mapping.
Thus, we report a generic factor, for example "risk", as distinct from all the other specific ones.

The factors have been grouped into categories, when possible, to help navigate them. First, we mapped the factors to the qualities that are mentioned most often in relation to TD. These categories are "Evolution", "Maintenance", and "Productivity". For example, the current working definition of TD explicitly mentions the impact on maintainability and evolvability. Given the emphasis on such qualities, we first grouped the factors according to them. TD impacting other qualities was gathered under "System Qualities" (which does not include the former two). Productivity is also usually associated with TD in the form of extra effort spent because of the debt.

Next, we categorized and grouped the remaining factors according to the aspect of software development that the impact is related to. This can be important in order to understand which roles would be hit the most by such impact and what consequences it might have on the prioritization. As an example, TD can have a direct impact on the "Customer" factors, so such TD might be considered more important by some organizations in their prioritization. Understanding the impact on "Business" factors can also be very useful when prioritizing against features, which are prioritized mostly based on business concerns. "Social" and "Project" factors need to be taken into consideration as well, as non-technical aspects of software development. For some of the factors, it was not possible to find a common category ("Other factors"), or they were only described as high-level factors without additional details ("Not specified").

The majority of the papers focus on the impact of TD on maintainability (12).
Some papers focus on productivity (7), evolvability (5), and other system qualities (6), while 5 papers consider the customer perspective. Only a few papers take into consideration other factors, such as business factors (3), social factors (3), project factors (3), and other non-categorized factors (6). In most of these cases (including the customer aspect), the identified factors have been reported in only one or two papers. This highlights either their specificity to a particular context or a lack of focus on these factors in the literature. In both [SP10] and [SP24], the authors conducted a survey with practitioners to understand which of these factors are most important for developers, architects, and product owners. In most cases, customer and business factors were considered the most important ones. However, only a few papers address such factors when prioritizing TD, so we can conclude that these factors have been overlooked in the literature.

In quite a few studies (8), the interest (impact) of TD has been identified and assessed as generic interest, interest likelihood, risk, severity, or as customizable by the practitioners. Six papers present factors that do not fit the previously mentioned categories, either because they represent an impact of TD spanning multiple categories or because they represent a specific aspect not related to these categories.

Eight other papers assume that the impact of TD is associated with the (co-)occurrence of instances of different issues (e.g., code smells) that are considered sub-optimal ("quantity of debt" in the table). However, the measures used in different papers differ according to the tools used, and the impact of the individual issues is assumed to be the same or is assigned arbitrarily. Very few papers (4) use an estimate or a measure of the cost of refactoring (principal) in contrast to the impact of TD (interest).
This is in contrast with the theoretical approach ([26], [27], [SP8]), according to which TD needs to be prioritized by taking into consideration both the cost of refactoring and the impact.
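To make this principal/interest trade-off concrete, the Python sketch below compares a one-off principal with interest accumulated over a planning horizon. It assumes that interest accrues at a constant rate per month, which is a simplification (the reviewed papers measure interest in many different ways); the Debt type, the 12-month horizon, and all figures are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Debt:
    name: str
    principal: float           # one-off refactoring cost, in person-hours
    interest_per_month: float  # extra effort per month while the debt remains


def break_even_months(d):
    """Months after which the accumulated interest equals the principal."""
    if d.interest_per_month <= 0:
        return math.inf  # no interest: refactoring never pays back
    return d.principal / d.interest_per_month


def net_benefit(d, horizon_months):
    """Interest avoided over the horizon minus the cost of refactoring."""
    return d.interest_per_month * horizon_months - d.principal


debts = [
    Debt("legacy ORM layer", principal=120, interest_per_month=6),
    Debt("flaky test suite", principal=10, interest_per_month=4),
    Debt("outdated build script", principal=8, interest_per_month=0),
]
horizon = 12  # hypothetical planning horizon, e.g. remaining product life
for d in sorted(debts, key=lambda x: net_benefit(x, horizon), reverse=True):
    verdict = "refactor" if net_benefit(d, horizon) > 0 else "defer"
    print(f"{d.name}: break-even after {break_even_months(d)} months, {verdict}")
```

With a longer horizon the legacy ORM layer would eventually pay back (after 20 months), which shows why the expected remaining lifetime of the code matters as much as the raw principal and interest figures.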

Table 8: Impact Map: Factors and measures related to the interest of TD considered when prioritizing (RQ3)

Category: Factors [PSs ID]

Business: competitive advantage [SP10]; lead time [SP10]; attractiveness for the market [SP10]; penalties [SP10]; feature usage [SP16]; business value [SP16]; ROI (calculated per bug) [SP20]
Customer: satisfaction [SP12]; long-term satisfaction [SP10]; specific customer value [SP10]; customer expectations [SP13]; customer effect [SP16], [SP24]
Evolution: time of impact on evolution (short- or long-term) [SP8]; risk of critical impact on evolution (possible crisis) [SP8]; impact on other features [SP13], [SP24]; impact on upcoming features [SP22], [SP24], [SP32]
Maintenance: modifiability [SP2], [SP18], [SP26], [SP28]; number of bugs [SP2], [SP10], [SP11], [SP17], [SP20], [SP23], [SP28], [SP32], [SP33], [SP38]; maintenance cost [SP10], [SP17], [SP35]
System Qualities: robustness [SP4]; performance efficiency [SP2], [SP4], [SP12]; security [SP4]; transferability [SP4]; scalability [SP16]; generic qualities [SP32], [SP33], [SP38]
Quality Debt: …
Productivity: % wasted time (effort) [SP27], [SP32], [SP33], [SP34], [SP35], [SP38]; number of developers working on TD [SP35]; wasted development hours [SP35]; generic effort [SP24]; coding output/effort [SP29]
Project Factors: availability of resources [SP13]; project size and complexity [SP32]; postponement of bugs [SP23]
Social Factors: developers' morale [SP30]; social debt [SP31]; positive impact of TD [SP32]; team culture [SP32]
Other Factors: contagious debt [SP10]; existence of TD solution (alternative) [SP32]; spread of impact in the system [SP32], [SP33], [SP38]; number of users affected [SP32], [SP33], [SP38]; frequency of negative impact [SP32], [SP33], [SP38]; kind of smell [SP18], [SP24]; history of the system [SP18]; compromise architecture [SP18]; future cost [SP22]; user perception [SP24]
Not Specified: risk [SP10], [SP25]; interest likelihood [SP13], [SP22]; interest [SP13], [SP24]; severity [SP24], [SP38]; customizable [SP18], [SP24], [SP25], [SP32], [SP33], [SP38]

RQ4. Which tools have been used to prioritize TD?

As reported in Table 9, only 14 papers mentioned the usage of tools for evaluating and prioritizing TD, and only ten of them report information on which tools were used. The other studies used custom-made tools developed for their specific purposes.

Table 9: Tools Used when Prioritizing TD (RQ4)

Tool name (tool link): Paper ID

AnaConDebt (https://anacondebt.com): [SP32], [SP33]
ARCAN [28], [29] (http://essere.disco.unimib.it/wiki/arcan): [SP38]
CAFFEA (link not available): [SP31]
CAST: [SP4]
Coverity: [SP20]
Findbugs (http://findbugs.sourceforge.net): [SP20]
Visual Studio FxCopAnalyzer: [SP20]
iPlasma (http://loose.cs.upt.ro/index.php?n=Main.IPlasma): [SP5]
JSpIRIT (https://sites.google.com/site/santiagoavidal): [SP18]
Scitool Understand (https://scitools.com): [SP21]
SonarQube: [SP30]
Codescene (https://codescene.io): [SP39]

Out of the aforementioned tools, we can identify ten static analysis tools: ARCAN, CAST, Coverity, Findbugs, Visual Studio FxCopAnalyzer, iPlasma, JSpIRIT, Scitool Understand, and SonarQube. Scitool Understand analyzes the code and visualizes its architecture. The remaining ones detect TD issues such as code or architectural smells, security violations, or others.

CAST, Coverity, Findbugs, Visual Studio FxCopAnalyzer, Codescene, and SonarQube are tools, several of them commercial, commonly used to analyze code compliance against a set of rules. When the rules are violated, they raise a TD issue. These tools provide the severity of the issues and classify them into different types (e.g., issues that could lead to bugs, to increased software maintenance effort, or to security vulnerabilities). Moreover, CAST and SonarQube also associate each issue with a remediation effort (principal), i.e., the time needed to remove the TD issue.
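Because such tools report a remediation effort per issue, the principal of a system or of one of its components is typically obtained by summing these estimates. The Python sketch below aggregates hypothetical issue records per file and per severity; the record fields and severity labels are illustrative assumptions and do not reproduce the export format of CAST or SonarQube. The resulting sum estimates only the principal; it says nothing about the interest.

```python
from collections import defaultdict

# Hypothetical issue records, e.g. parsed from a static analysis report.
issues = [
    {"file": "Billing.java", "severity": "CRITICAL", "effort_min": 60},
    {"file": "Billing.java", "severity": "MAJOR", "effort_min": 30},
    {"file": "Parser.java", "severity": "MAJOR", "effort_min": 20},
    {"file": "Parser.java", "severity": "MINOR", "effort_min": 5},
    {"file": "Util.java", "severity": "INFO", "effort_min": 2},
]


def principal_by(records, field):
    """Sum the remediation effort (minutes) grouped by the given field."""
    totals = defaultdict(int)
    for record in records:
        totals[record[field]] += record["effort_min"]
    return dict(totals)


total_principal = sum(r["effort_min"] for r in issues)
by_file = principal_by(issues, "file")
by_severity = principal_by(issues, "severity")
hotspots = sorted(by_file.items(), key=lambda kv: kv[1], reverse=True)

print(total_principal)  # 117
print(hotspots)         # [('Billing.java', 90), ('Parser.java', 25), ('Util.java', 2)]
print(by_severity)      # {'CRITICAL': 60, 'MAJOR': 50, 'MINOR': 5, 'INFO': 2}
```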

ARCAN, iPlasma, and JSpIRIT are open-source tools developed by research teams and aimed at detecting architectural smells (ARCAN) and code smells (iPlasma and JSpIRIT).

AnaConDebt [30] is a management tool based on a TD-enhanced backlog. The backlog allows the creation of TD items and performs TD-specific operations on the created items. In [SP32] and [SP33], AnaConDebt has been used to report and visualize the information on TD manually collected by product managers and developers.

The CAFFEA framework [31] identifies the organizational roles to which architectural responsibilities are allocated. Moreover, the tool defines the team members and how these responsibilities are shared among them. The framework has been used in [SP31] to analyze mismatches between the architecture community and the system architecture.

ARCAN was used in [SP38] to detect architectural smells. The TD principal was then investigated by means of a survey in a large company. In [SP30], developers were asked to discuss the TD issues raised by SonarQube. However, there is no information on whether the developers considered the severity or the type of the TD issues. In [SP4], the authors used CAST as is to estimate the principal, calculated as the time needed to remove all TD issues. iPlasma and JSpIRIT were used in [SP5] and [SP18], respectively, to detect the number of code smells to be refactored in the systems under investigation.

Scitool Understand was used in [SP21] to identify architectural issues in the system under investigation. The TD issues detected by Coverity, Findbugs, and Visual Studio FxCopAnalyzer were used in [SP20] for an industrial survey.

5. Discussion

In this section, we discuss the results obtained, outlining some implications for researchers and practitioners working in the TD domain.

Although the TD domain is relatively young compared to other domains such as software testing or software quality, significant contributions have been published in the last ten years, and researchers are becoming more and more active (Figure 2).

Among the ten TD types proposed in 2015 by Li et al. [5] (Table 1), only Code Debt and Architectural Debt have been considered frequently by researchers (RQ1) in the context of TD prioritization. In the study by Li et al. [5], Code Debt was the most commonly investigated type of TD, followed by Test Debt. However, other types of TD have also received significant attention. Unlike in [5], in our work it emerged that Code Debt and Architectural Debt are by far the most frequently investigated types of debt when considering TD prioritization. This could be because they are easy to measure, mainly based on extensions of previous research from other domains, or because they (particularly Architectural Debt) are considered the most harmful and expensive types to manage in software. For example, architectural and code patterns have been investigated for more than twenty years, even though they were not considered as "debt".

The two most commonly considered types of TD (Code Debt and Architectural Debt) are mainly evaluated by means of architectural or code-level anti-patterns (architectural smells, code smells, or code violations). Moreover, their harmfulness is mainly related to the influence they have on some external quality (e.g., the impact of a specific code smell on maintenance effort). However, their influence is still not clear, since studies do not agree on their harmfulness. Other types of TD should be investigated in the future.
We believe that Code Debt is the type investigatedmost often since it is easy to access the data by mining software repositorystudies, while other types of debt require other types of studies, includ-ing case studies involving developers. We recommend that practitionersshould consider the measures identiﬁed in this RQ, but should complementthem with expert judgement to understand which architectural smells, codesmells, or code violations to consider.In a software aﬀected by TD, the only signiﬁcantly eﬀective way to re-duce this TD is to refactor it. This fact stresses the importance of continu-ously and iteratively prioritizing the identiﬁed refactoring tasks and therebyhighlights the importance of using an appropriate TD prioritization process.Through this study, we have identiﬁed several diﬀerent approaches for prior-itizing TD ( RQ , RQ . , and RQ . ). However, there is no uniﬁed approachfor this activity, nor is there a consensus on which aspects to focus on whenperforming the TD prioritization process.It is evidently clear from the ﬁndings that the prioritization process ofTD refactoring can be carried out using diﬀerent approaches, all havingdiﬀerent goals and proposing optimization with regard to diﬀerent criteria.This study has identiﬁed ﬁve diﬀerent main approaches that aim to: a)improve software qualities, especially maintainability and evolvability, b)increase software practitioners productivity, c) reduce the fault-pronenessof the software, d) compare various TD items using cost-beneﬁt analysis(CBA) to understand the convenience of refactoring, and e) combine severaldiﬀerent approaches.This result is of value to both academics and practitioners and illustratesthat is it important to ﬁrst identify the goals of TD prioritization, and there-after to implement a corresponding TD prioritization approach targeting theidentiﬁed and speciﬁed goals.One interesting ﬁnding is that the investigated papers usually only com-pared diﬀerent TD items during this 
prioritization process and more rarely compared the need for implementing a new feature with the refactoring of TD.

Regarding the characteristics and measures considered during the prioritization process ( RQ ), the results so far suggest that prioritizing TD requires a holistic view of several factors. The systematic assessment of TD requires a large amount of information, which might change from case to case, and in most cases TD is prioritized without following a standardized approach. Also, the known measures used in a few papers capture only a small part of the factors that are used to prioritize TD (mostly proxies for maintenance costs or productivity). Using only such measures to prioritize TD, without considering the full picture of the relevant factors (risks and costs), might result in a partial and thus biased prioritization, which in turn could lead to poor business decisions. On the other hand, some of the factors have been reported in a single study conducted in a specific context and might not be relevant in other prioritization cases.

More studies are necessary to obtain better evidence on factors that have been overlooked (for example, factors related to customers, business, social, and project aspects). In addition, we need to better understand which factors should be considered in different contexts, and which additional measures should be considered when prioritizing TD. Finally, although a few holistic approaches have been reported ([27], [SP24], [SP33]), there is a need for a better-defined framework and a standardized approach for assessing TD.

Regarding the two main components of TD, principal and interest, only a limited number of papers propose how to evaluate them. Interest is mainly calculated as extra cost, or as time wasted to fix TD issues. The reason could be that TD interest is not easy to calculate without access to empirical data from companies.
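To make the principal/interest vocabulary and the cost-benefit approach (d) concrete, the sketch below ranks TD items by the interest that refactoring would avoid over a planning horizon, minus the principal paid to refactor. It assumes, for illustration only, that interest is a constant extra effort per period (one way of expressing interest as extra cost) and that refactoring removes it completely. The `TDItem` class, the `horizon` parameter, and all numbers are hypothetical and are not taken from any primary study.

```python
from dataclasses import dataclass


@dataclass
class TDItem:
    """A hypothetical TD item; effort units (e.g., person-hours) are assumed."""
    name: str
    principal: float            # estimated cost of fixing the item now
    interest_per_period: float  # assumed constant extra effort paid per period if not fixed


def net_benefit(item: TDItem, horizon: int) -> float:
    """Interest avoided over `horizon` periods minus the principal paid to refactor.

    Simplifying assumption: refactoring removes all future interest.
    """
    return item.interest_per_period * horizon - item.principal


def break_even_periods(item: TDItem) -> float:
    """Periods after which refactoring pays off (infinite if there is no interest)."""
    if item.interest_per_period <= 0:
        return float("inf")
    return item.principal / item.interest_per_period


def rank_by_cba(items: list[TDItem], horizon: int) -> list[tuple[str, float]]:
    """Rank TD items by net benefit, highest first; positive values pay off within the horizon."""
    scored = [(item.name, net_benefit(item, horizon)) for item in items]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Illustrative backlog with made-up estimates.
    backlog = [
        TDItem("god class in billing module", principal=40, interest_per_period=8),
        TDItem("duplicated parsing code", principal=10, interest_per_period=1),
        TDItem("cyclic dependency between services", principal=120, interest_per_period=15),
    ]
    for name, benefit in rank_by_cba(backlog, horizon=6):
        print(f"{name}: {benefit:+.1f}")
```

With `horizon=6`, only the god class pays off (+8.0, versus -4.0 and -30.0). With `horizon=12`, the cyclic dependency (+60) overtakes it (+56), which illustrates why the outcome depends on the time frame considered.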
Researchers should design and perform studies to understand the actual interest of existing TD issues.

The tool support for prioritization activities is very fragmented ( RQ ), which highlights the lack of a solid, widely used, and validated set of tools specifically for TD prioritization. Current tools mainly identify TD issues and, in some cases, estimate the time needed to fix them. However, to the best of our knowledge, no tool calculates the interest due to the postponement of activities.

Our results can be useful for both researchers and practitioners. Researchers should focus on other types of TD, including those that have received little attention in the last few years. They can also evaluate approaches, factors, and measures, and how to prioritize them. Moreover, since the available tools are not fully mature, research activities can focus on the empirical validation of existing tools, confirming the usefulness of each measure proposed by each tool.

Practitioners can benefit from our results by using our impact map to explore and anticipate the kinds of impact that TD might cause. Moreover, they should select tools carefully, considering more than one rather than relying on a single tool.

Based on our results, we propose a preliminary framework, illustrated in Figure 4, to help practitioners during TD prioritization activities. This framework outlines the different factors that need to be taken into consideration during the TD prioritization process and how these factors relate to each other.

The first step for practitioners is to decide whether they need to prioritize refactorings among TD issues or to prioritize a TD refactoring against the implementation of new features and bug fixes (results from RQ ). This distinction matters because the approaches differ: assessing the impact of TD is not the same as assessing the value or impact of features and bugs.
In the former case, the comparison can use the same factors, while in the latter case, the principal and interest of TD will more likely need to be compared with feature-oriented factors, for example competitive advantage or cost of delay.

Once the scope of the comparison is defined, the evaluation of TD should take into account: 1) the difference in TD principal (the cost of fixing the issues), 2) the impact (the TD interest), and 3) other factors, including economic and marketing factors (results from RQ ). The evaluation can be both quantitative and qualitative ( RQ ), and in some cases it can be supported by tools ( RQ ). As an example, companies might quantitatively evaluate the presence of Code Debt using tools, but they might also need to perform a qualitative evaluation (e.g., with code reviews) of factors that cannot be measured with tools, such as code readability, analyzability, or other quality characteristics. In addition, some tools provide means to calculate the principal of the TD, but practitioners might need to estimate the interest by qualitatively assessing the impact factors.

Moreover, the evaluation should consider different scenarios, including the available resources and the possible evolution of the system. In fact, TD can be highly context-dependent (as we discussed for the impact factors in RQ ), which means that practitioners need to assess it against estimates of future scenarios. For example, in the tool AnaConDebt, practitioners can specify events happening in short-, medium-, or long-term scenarios. The evaluation of the different scenarios should help in making refactoring decisions, for example regarding which refactorings should be performed and which should be postponed.

As an example of the decision process, a company might decide to postpone a new feature that involves a code section or module suffering from TD.
This can happen if such TD is estimated to generate high interest in the short-term scenario; in that case, the interest generated by the TD could exceed the cost of delaying the feature. The practitioners might then decide to refactor the code before implementing the feature.

As a concrete example of how a refactoring decision is made following the steps of the prioritization framework, consider an architect who needs to decide whether to refactor a “sub-optimal” interface before more applications that access it are developed. The main activity of the architect is to evaluate whether to prioritize the refactoring of TD or the development of new features. The architect then needs to consider and estimate the different factors (principal, interest, and other factors). Without the refactoring, the TD would spread to all the new code (Contagious debt, Table 8). In addition, all the new applications would suffer from the negative impact (interest) generated by interacting with the sub-optimal API (Spread of impact in the system, Table 8). Although delaying the development of the new applications (a feature-oriented factor) would imply costs in the short-term scenario, the lead time (Table 8) for developing new features in the long-term scenario could be reduced, as the developers would not pay the interest generated by the sub-optimal interface. If this long-term gain exceeds the cost of delaying the application development, the practitioners should choose to refactor the API. In this case, the refactoring decision is made by evaluating whether, in a future scenario, avoiding the interest is worth paying the principal.

Together with the other results presented in this paper (the impact map, the description of prioritization approaches, and the available tools), the TD prioritization framework can help practitioners reach a refactoring decision.

[Figure 4 diagram: main activity: comparison (TD vs. other TD; TD vs. features, bugs); evaluation of principal (cost of fix), impact (interest), and other factors; assessed qualitatively or quantitatively, supported by tools; based on scenarios over time and resources; output: refactoring decision. Labels refer to RQ2, RQ3, and RQ4.]

Figure 4: The TD Prioritization Framework
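The architect's interface decision described above can be sketched as a per-scenario comparison. This is a minimal illustration under assumed simplifications: each scenario carries a single effort estimate for the interest avoided and one for the cost of delaying the feature. The decision rule (refactor first if the avoided interest exceeds the principal plus the cost of delay) is an assumption made for this sketch. The `Scenario` type, the functions, and the numbers are hypothetical and do not reproduce AnaConDebt or any primary study.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """A hypothetical future scenario (e.g., short-, medium-, or long-term)."""
    name: str
    interest_avoided: float  # estimated interest (extra effort) avoided by refactoring now
    cost_of_delay: float     # estimated cost of postponing the feature to refactor first


def refactor_first(principal: float, scenario: Scenario) -> bool:
    """Assumed decision rule: refactor before the feature only if the avoided
    interest outweighs the principal plus the cost of delaying the feature."""
    return scenario.interest_avoided > principal + scenario.cost_of_delay


def evaluate(principal: float, scenarios: list[Scenario]) -> dict[str, bool]:
    """Apply the decision rule to each scenario so the outcomes can be compared."""
    return {s.name: refactor_first(principal, s) for s in scenarios}


if __name__ == "__main__":
    # Sub-optimal interface: illustrative effort estimates in person-days.
    principal = 20.0
    scenarios = [
        Scenario("short-term", interest_avoided=5.0, cost_of_delay=10.0),
        Scenario("long-term", interest_avoided=60.0, cost_of_delay=10.0),
    ]
    print(evaluate(principal, scenarios))  # {'short-term': False, 'long-term': True}
```

The sketch returns one verdict per scenario rather than a single answer. This mirrors the framework's emphasis on evaluating scenarios explicitly, while leaving the weighing of those scenarios to the practitioners.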

6. Threats to Validity

The results of an SLR may be subject to validity threats, mainly concerning the correctness and completeness of the survey. In this Section, we outline the threats to the validity of our review and how we addressed them. We have structured this Section as proposed by Wohlin et al. [32], covering construct, internal, external, and conclusion validity threats.

6.1. Construct validity

Construct validity is related to the generalization of the results to the concept or theory behind the study execution [32]. In our case, it is related to the potentially subjective analysis of the selected studies. As recommended by Kitchenham's guidelines [20], data extraction was performed independently by two or more researchers and, in case of discrepancies, a third author was involved in the discussion to resolve any disagreement. Moreover, the quality of each selected paper was checked according to the protocol proposed by Dybå and Dingsøyr [23].

6.2. Internal validity

Internal validity threats are related to possible wrong conclusions about causal relationships between treatment and outcome [32]. In the case of secondary studies, internal validity represents how well the findings of the review represent the findings reported in the literature. To address these threats, we carefully followed the tactics proposed in [20].

6.3. External validity

External validity threats are related to the ability to generalize the results [32]. In secondary studies, external validity depends on the validity of the selected studies: if the selected studies are not externally valid, the synthesis of their content will not be valid either. In our work, we were not able to evaluate the external validity of all the included studies.

6.4. Conclusion validity

Conclusion validity is related to the reliability of the conclusions drawn from the results [32]. In our case, threats are related to the potential non-inclusion of some studies. To mitigate this threat, we carefully applied the search strategy, performing the search in eight digital libraries in conjunction with the snowballing process [22]. We considered all the references presented in the retrieved papers and evaluated all the papers that reference the retrieved ones, which resulted in one additional relevant paper. We applied a broad search string, which led to a large set of articles but enabled us to include more potentially relevant results. We defined inclusion and exclusion criteria and applied them first to titles and abstracts. However, we did not rely exclusively on titles and abstracts to establish whether a work reported evidence on Technical Debt prioritization: before accepting a paper based on its title and abstract, we browsed the full text, again applying our inclusion and exclusion criteria.
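The backward and forward snowballing described above can be sketched as a traversal of a citation graph. The sketch makes three assumptions. First, references and citing papers are available as in-memory maps. Second, a predicate stands in for the manual application of the inclusion and exclusion criteria. Third, the traversal iterates until no new paper is found, as is common in snowballing, although the text above does not state how many rounds were performed. All names and data are hypothetical.

```python
from collections import deque
from typing import Callable


def snowball(
    seeds: set[str],
    references: dict[str, set[str]],  # paper -> papers it cites (backward snowballing)
    cited_by: dict[str, set[str]],    # paper -> papers citing it (forward snowballing)
    include: Callable[[str], bool],   # hypothetical stand-in for manual inclusion/exclusion screening
) -> set[str]:
    """Return the papers found by backward and forward snowballing from the seeds
    that pass `include`, repeating until no new paper is found (an assumption)."""
    included = set(seeds)
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        paper = queue.popleft()
        candidates = references.get(paper, set()) | cited_by.get(paper, set())
        for candidate in candidates:
            if candidate in seen:
                continue  # each paper is screened only once
            seen.add(candidate)
            if include(candidate):
                included.add(candidate)
                queue.append(candidate)  # newly included papers are snowballed too
    return included - seeds  # only the additional papers found by snowballing


if __name__ == "__main__":
    refs = {"P1": {"A", "B"}, "P2": {"B"}}
    cites = {"P1": {"C"}, "A": {"D"}}
    relevant = {"A", "D"}
    print(sorted(snowball({"P1", "P2"}, refs, cites, relevant.__contains__)))  # ['A', 'D']
```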

7. Conclusion

Software companies need to manage and refactor TD issues, since their presence is sometimes inevitable due to a number of causes that may be related to unpredictable business or environmental forces internal or external to the organization. Moreover, some types of TD can be more dangerous than others. Therefore, it is necessary to understand when refactoring TD should be prioritized with respect to implementing features or fixing bugs, or with respect to other types of TD.

We conducted an SLR to investigate the existing body of knowledge in software engineering and to understand how TD is prioritized in software organizations and what research approaches have been proposed.

The SLR process was carried out following two rigorous approaches. We included scientific articles indexed by the most important bibliographic sources and selected through a rigorous process. We considered articles published before December 2019. Our work is based on 38 selected studies, which include data on the state of the art concerning approaches, factors, measures, and tools used in practice or proposed in research to prioritize TD.

The results of our review show that Code Debt and Architectural Debt are by far the most frequently investigated types of debt when considering TD prioritization, while there is scant evidence about other types of TD such as Test Debt and Requirement Debt. The prioritization of TD refactoring can be carried out using different approaches, all having different goals and optimizing with regard to different criteria. However, the identified measures, used in only a few papers, capture only a small part of the factors that are used to prioritize TD.

There is a lack of empirical evidence on measuring principal and interest. Moreover, our results highlight the lack of a solid, validated, and widely used set of tools specifically for TD prioritization.

In practice, we found that a plethora of aspects need to be considered when prioritizing TD.
We presented an impact map of such factors, which can be used as a comprehensive reference regarding which interest might be paid by an organization and how it should be considered. This map can also serve as a starting point for further research.

Future work should focus on the types of TD that have been investigated less often. Moreover, we plan to investigate how to systematically evaluate and measure the principal and interest of different types of TD. We also aim to develop a framework to support decision-making related to the prioritization of TD.

References