Characterizing and Mitigating Self-Admitted Build Debt
Tao Xiao, Dong Wang, Shane McIntosh, Hideaki Hata, Raula Gaikovina Kula, Takashi Ishio, Kenichi Matsumoto
Abstract—Technical Debt is a metaphor used to describe the situation in which long-term code quality is traded for short-term goals in software projects. In recent years, the concept of self-admitted technical debt (SATD) was proposed, which focuses on debt that is intentionally introduced and described by developers. Although prior work has made important observations about admitted technical debt in source code, little is known about SATD in build systems. In this paper, we coin the term Self-Admitted Build Debt (SABD) and, through a qualitative analysis of 500 SABD comments in the Maven build systems of 300 projects, we characterize SABD by location and rationale (reason and purpose). Our results show that limitations in tools and libraries, and complexities of dependency management, are the most frequent causes, accounting for 49% and 23% of the comments, respectively. We also find that developers often document SABD as issues to be fixed later. To automate the detection of SABD rationale, we train classifiers to label comments according to the surrounding document content. The classifier performance is promising, achieving F1-scores of 0.67–0.75. Finally, within 16 identified ‘ready-to-be-addressed’ SABD instances, the three SABD instances submitted as pull requests and the five submitted as issue reports were resolved after developers were made aware of them. Our work presents a first step towards understanding technical debt in build systems and opens up avenues for future work, such as tool support to track and manage SABD backlogs.
Index Terms—Self-Admitted Technical Debt, Build Debt, Build System, Build Maintenance
1 Introduction

Throughout the software development process, stakeholders strive to build functional, maintainable, and high-quality software. Despite their best efforts, developers inevitably encounter situations where suboptimal solutions, known as Technical Debt (TD), are implemented in a software project [8]. Although studies have traced evidence of TD in source code, TD covers a range of software artifacts and processes (i.e., architecture, build, defects, design, documentation, infrastructure, people, process, requirements, service, and testing) [2]. Clear evidence of TD is at the core of self-admitted technical debt (SATD), where developers record the reasoning behind such suboptimal solutions. Potdar and Shihab [35] observed that SATD was present in 31% of source code files.

• T. Xiao, D. Wang, H. Hata, R. G. Kula, T. Ishio, and K. Matsumoto are with the Nara Institute of Science and Technology, Japan. E-mail: {tao.xiao.ts2, wang.dong.vt8, hata, raula-k, ishio, matumoto}@is.naist.jp.
• S. McIntosh is with the Cheriton School of Computer Science, University of Waterloo, Canada. E-mail: [email protected].

Although prior work on SATD in source code has made important advances, modern software development has a broader scope than solely producing source code. Indeed, a complex collection of other software artifacts and tools is needed to produce official software releases. At the heart of these other artifacts is the build system, which orchestrates tools (e.g., automated test suites, containerization tools, external and internal dependency management) into a repeatable (and ideally incremental) process. Software teams use build system specifications to express dependencies within and among internal and external software artifacts. Build systems also suffer from TD. Developers at Google refer to this form of TD as
Build Debt (BD), which describes the effort to measure and pay down technical debt found in their build files and associated dead code [33]. They described four types of BD: (i) dependency specifications, where under-declared direct dependencies slow down the build and test systems and make a project's build brittle; (ii) unbuildable targets, i.e., abandoned targets that have not successfully built for several months (zombie targets); (iii) visibility, where public components that become private are never removed; and (iv) unnecessary command-line flags, where a set of flags for libraries and binaries is no longer needed.

Despite the critical role that build systems play, to the best of our knowledge, there have been no previous studies that investigate SATD in build specification files. To fill this gap, in this paper, we propose to analyze build files and their existing SATD, which we refer to as
Self-Admitted Build Debt (SABD). More specifically, we set out to characterize SABD, explore its potential for automation, and evaluate SABD mitigation strategies. By analyzing 500 SABD comments extracted from 300 GitHub repositories that utilize the Maven build system, we address the following three research questions:

(RQ1) What are the characteristics of SABD?
Motivation:
It is unclear what types of SABD exist. Similar to SATD, analyzing SABD characteristics (location and rationale) will lay the foundation for understanding the scope of build debt. Answering this question will drive future research and tool development on SABD problems of practical relevance.

(RQ1.1) Location: Which build file specifications are most susceptible to SABD?

Results:
SABD tends to occur in the plugin configuration and the external dependency configuration code, accounting for 47% and 31% of comments, respectively.

(RQ1.2) Rationale: What causes a developer to document SABD and what purpose does it serve?
Results:
We analyze rationale along the reason and purpose dimensions. First, we find that there are ten categories of SABD reasons. The most frequent reasons include limitations in tools and libraries, and complexities of dependency management, accounting for 49% and 23% of analyzed SABD instances, respectively. Second, we find that there are six purposes for leaving SABD comments, with documenting issues to be fixed later and explaining the rationale for a workaround occurring most frequently, accounting for 34% and 23% of analyzed SABD instances, respectively.

(RQ2) Can automated classifiers accurately identify the characteristics of SABD?
Motivation:
Analysis of build systems at large organizations like Google would require an automated approach to be practical. For practitioners, automatic SABD identification could facilitate the replication of detection approaches to promote their adoption, and improve detection quality and traceability. Hence, we explore the feasibility of training automatic classifiers to identify SABD characteristics.
Results:
Experimental results show that automation is feasible, achieving a precision of 0.68 and 0.79, a recall of 0.67 and 0.75, and an F1-score of 0.67 and 0.75 for SABD reasons and purposes, respectively. Comparing both traditional and state-of-the-art machine learning techniques, we find that the auto-sklearn based classifiers tend to outperform the set of baseline classifiers, i.e., Naive Bayes (NB), Support Vector Machine (SVM), and k-Nearest Neighbors (kNN), by a margin of 10–22 percentage points.

(RQ3) To what extent can SABD be removed?
Motivation:
The manner in which developers handle SABD is currently unknown, i.e., whether or not SABD can be removed. Hence, we investigate the removal of ‘ready-to-be-addressed’ SABD by submitting pull requests and issue reports. Answering this research question can establish the necessity of proposing automatic tools that identify SABD comments for researchers, and furthermore facilitate better technical debt management for projects.
Results:
Within 16 ‘ready-to-be-addressed’ SABD instances, we propose pull requests for seven cases, three of which were merged. Moreover, we produce issue reports for nine cases, five of which were resolved within 20 days. While there are a number of factors at play, these responses suggest that developers are receptive and reactive to SABD.
Replication package.
To facilitate replication and future work in the area, we have prepared a replication package, which includes the manually labeled dataset and the scripts for reproducing our analyses. The package is available online at https://github.com/NAIST-SE/SABD.

The remainder of the paper is organized as follows. Section 2 describes the workflow that we followed to collect SABD comments. Sections 3–5 present the experiments that we conducted to address RQ1–3, respectively. Section 6 presents our recommendations for build system stakeholders based on our study. Section 7 situates our work with respect to the literature on build systems and technical debt. Section 8 discusses the threats to validity. Section 9 draws conclusions and highlights opportunities for future work.

Fig. 1: Overview of the study. Data Preparation: (DP1) extract comments from Maven repositories on GitHub; (DP2) identify SABD comments. RQ1: manually classify locations, reasons, and purposes, producing coded SABD comments. RQ2: construct and evaluate classification models. RQ3: identify ‘ready-to-be-addressed’ SABD, create issue reports and pull requests, and collect feedback.
2 Data Preparation
In this section, we describe the data collection procedure. Figure 1 shows an overview of the procedure, which consists of two steps: (DP1) Extract comments from Maven repositories; and (DP2) Identify SABD comments.

(DP1) Extract comments from Maven repositories.
Maven is a popular build automation tool used primarily for Java projects. In a large-scale analysis of 177,039 repositories, McIntosh et al. [31] found that Maven repositories tend to be the most actively maintained. Since developers are actively updating their Maven files, we suspect that technical debt may also be accruing. Thus, we select Maven as our studied build system.

We analyze Java repositories in the dataset shared by Hata et al. [14]. That dataset includes the Git repositories of actively developed software projects containing (i) more than 500 commits; and (ii) at least 100 commits during the most active two-year period. Forked repositories are excluded from the analysis. We analyze the latest version (HEAD revision) of the repositories. The list of HEAD revisions is summarized in the replication package.

We select the Maven specifications from each studied repository using the filename convention (i.e., pom.xml). Each studied repository may have several Maven specifications. Since the specifications are written in XML, comments are recognized as content appearing between the "<!--" and "-->" XML tokens. We extract comment content from Maven specifications using a script that builds upon the Java SE XML parser. Finally, we extract 253,555 comments from 100,765 POM files in 3,710 Maven repositories.

(DP2) Identify SABD comments.
We detect SABD comments using the keyword-based approach of Potdar and Shihab [35]. In addition, to reduce the risk of missing SABD comments and to enlarge the dataset, we expand Potdar and Shihab's keyword list to include 13 frequent features that were recommended by Huang et al. [16]. Our adapted list of SABD keywords is summarized in our online appendix. In the end, we detect 3,424 SABD comments. Table 1 provides an overview of our studied dataset.

1. https://github.com/takashi-ishio/CommentLister

TABLE 1: Summary of studied dataset
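The two-step pipeline (DP1 comment extraction and DP2 keyword matching) can be sketched as follows. This is a minimal illustration: the keyword list shown is a small hypothetical subset of our adapted list (the full list is in the online appendix), the regular expression stands in for the Java SE XML parser that we actually use, and the POM content is invented for the example.

```python
import re

# Hypothetical subset of the adapted keyword list; the actual list
# combines Potdar and Shihab's patterns with features from Huang et al.
SABD_KEYWORDS = ["todo", "fixme", "hack", "workaround", "temporary", "broken"]

def extract_comments(pom_source):
    """DP1: return all XML comments (<!-- ... -->) found in a POM file."""
    return [m.strip() for m in re.findall(r"<!--(.*?)-->", pom_source, re.DOTALL)]

def is_sabd(comment):
    """DP2: flag a comment as SABD if it contains any debt keyword."""
    lowered = comment.lower()
    return any(keyword in lowered for keyword in SABD_KEYWORDS)

# Illustrative POM fragment (not from a studied repository).
pom = """
<project>
  <dependencies>
    <!-- TODO remove exclusions after we fix netty module -->
    <!-- Required to implement OAuth2 testing -->
  </dependencies>
</project>
"""
comments = extract_comments(pom)
sabd = [c for c in comments if is_sabd(c)]
```

In this sketch, only the first comment is flagged: it contains the keyword "todo", while the second comment admits no debt.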
3 Characterizing SABD Comments (RQ1)
SABD comments may appear in different locations within build files. Moreover, the rationale for incurring SABD may differ. In this section, we analyze the locations, reasons for adoption, and purposes served by SABD comments. To perform our analyses, we use a manually intensive method. Below, we present our approach to classify SABD comments according to locations, reasons, and purposes (Section 3.1), followed by the results (Section 3.2). Finally, we explore the relationships between locations and reasons, and locations and purposes (Section 3.3).
We apply an open coding approach [7] to classify randomly sampled SABD comments in build files. Open coding is a qualitative data analysis method by which the artifacts under inspection (SABD comments in our case) are classified according to emergent concepts (i.e., codes). After coding, we apply open card sorting [34] to raise low-level codes to higher-level, more abstract concepts, especially for SABD reasons. Below, we describe our code saturation, sample coding, and card sorting procedures in more detail.
Code saturation.
Section 2 shows that there are 3,424 SABD comments out of the 253,555 comments appearing in the curated set of GitHub Maven repositories. Since coding all 3,424 instances is impractical, we elect to code a sample of SABD comments.

To discover as complete a list of SABD locations, reasons, and purposes as possible, we strive for saturation. Similar to prior work [15], we set our saturation criterion to 50, i.e., we continue to code randomly selected SABD comments until no new codes have been discovered for 50 consecutive comments. We reach saturation after coding 266 SABD comments. To test the level of agreement on our constructed codes, we calculate the Kappa agreement among the first two authors, who independently coded the locations, reasons, and purposes of all 266 SABD comments. Cohen's Kappa for location codes is 0.91, or ‘Almost perfect’ agreement [44], whereas Cohen's Kappa values for reason and purpose codes are 0.78 and 0.75, respectively, which indicate ‘Substantial’ agreement [44]. The somewhat lower agreement can be explained by the need for extrapolation when coding the reason and purpose of an instance of SABD from its context and comment content.
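For reference, Cohen's Kappa compares the observed agreement p_o against the agreement p_e expected by chance. A minimal sketch with invented labels (not our actual codes):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), where p_o is the observed
    agreement and p_e is the agreement expected by chance."""
    n = len(labels_a)
    # Observed agreement: fraction of items both coders labeled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each coder's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    categories = set(labels_a) | set(labels_b)
    p_e = sum((freq_a[c] / n) * (freq_b[c] / n) for c in categories)
    return (p_o - p_e) / (1 - p_e)

# Two coders labeling ten comments with location codes (illustrative).
coder1 = ["plugin"] * 6 + ["dependency"] * 4
coder2 = ["plugin"] * 5 + ["dependency"] * 5
kappa = cohens_kappa(coder1, coder2)  # 0.8: 'substantial' agreement
```

Here nine of ten labels agree (p_o = 0.9) while chance alone would yield p_e = 0.5, giving a kappa of 0.8.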
2. https://doi.org/10.6084/m9.figshare.13018580
Sample coding.
To increase the generalizability of our results, after our codes achieved saturation, we coded additional SABD comments to reach 500 samples. We divided the additional 234 samples into two sets. Then, the first author independently coded the first set, and the second author independently coded the second set. In a series of follow-up meetings with the third author, each case where the coders disagreed was discussed until a consensus was achieved. When coding each SABD comment, the coders focus on the location, key reason, and key purpose. For example, a SABD comment from the Apache OODT project is located in the plugin configuration; the reason for this comment is labeled ‘External library limitation’ and its purpose is ‘Document for later fix’.

Since open coding is an exploratory data analysis technique, it may be prone to errors. To mitigate errors, we code in two passes. First, we code based on the comment itself. After completing an initial round of coding, we perform a second pass over all SABD comments to correct miscoded instances. In the second pass, we code based on contextual information, such as the surrounding build specification code, prior commit history, and other relevant development records.
Card sorting.
We apply open card sorting to construct a taxonomy of SABD reasons. Open card sorting helps us to generate general types from our low-level codes. The open card sorting includes two steps. First, the coded comments are merged into cohesive groups that can be represented by a similar subcategory. Second, the related subcategories are merged to form categories that can be summarized by a short title.
In this section, we present our results for RQ1, consisting of SABD location and rationale (reason and purpose).
RQ1.1 - SABD Location
Observation 1:
Plugin configuration and External dependencies configuration are the most frequently occurring locations in our sample. We identified nine location codes that emerged from our qualitative analysis. Table 2 provides an overview of the categories and their definitions, frequencies, and lines of code (LOC) for SABD locations. From the table, we observe that Plugin configuration is the most frequently occurring location for developers to leave SABD comments, with 47% of SABD comments appearing in that location. The second most frequently occurring location is External dependencies configuration, accounting for 31% of SABD comments. In addition, locations such as Project metadata, Build organization, and Software configuration management are rarely associated with SABD, i.e., 1% for each category.

The fourth column of Table 2 shows that the location tendencies appear to follow the volume of code in each category. This result shows that, as one might expect, categories that require larger volumes of build configuration code tend to be more prone to contain SABD.
TABLE 2: Definition and Frequency of SABD locations and lines of code (LOC) of build code
Category Definition Frequency LOC
Plugin configuration: Build code that specifies which build plugin features should be included or excluded and how they should be configured for build execution.
Reasons.
The top portion of Table 3 defines and quantifies the reasons that we observe for SABD comments in our studied sample.
Observation 2:
Limitation is the most common reason category for developers to leave SABD comments. We identified 17 subcategories that emerged from our classification of SABD reasons. The 17 subcategories fit into ten categories. Table 3 shows the definitions and frequencies for the SABD reason categories. As we can see from the table, 49% of SABD comments are left due to the Limitation reason.

Upon closer inspection, external library limitation is the main limitation, accounting for 42% of the occurrences in the Limitation category. Indeed, it appears that working around the constraints imposed by external libraries is a complexity of modern development from which build specifications are not exempt. The following is an example of the Limitation reason. The comment describes the limitation of a specific version of the maven-war-plugin plugin.

<!-- This is broken in maven-war-plugin 2.0, works in 2.0.1 -->
<warSourceExcludes>WEB-INF/no-lib/*.jar</warSourceExcludes>

The second most frequently occurring category is the Dependency reason (23%). In the example below, the org.apache.httpcomponents:httpclient dependency is required to implement OAuth2 testing. Thus, developers leave a comment as a reminder of why this dependency is needed.

<!-- Required to implement OAuth2 testing -->
<dependency>
  <groupId>org.apache.httpcomponents</groupId>
  <artifactId>httpclient</artifactId>
  <scope>test</scope>
</dependency>
3. https://tinyurl.com/y5jtkxkb
4. https://tinyurl.com/y5jxjk45
Less frequently occurring patterns include the Deployment process reason (1%). Finally, only 5% of comments are labeled as No reason, which means that we could not determine their reasons.

In their study, Mensah et al. [32] identified the possible causes of SATD introduction. The most prominent causes are code smells (23%), complicated and complex tasks (22%), inadequate code testing (21%), and unexpected code performance (17%). Comparatively, those causes account for only 1% of the SABD. We suspect that this is because build code specifies a set of rules to prepare and transform the source code into deliverables. Unlike imperative or object-oriented systems, build specifications are primarily implemented to inform an expert system so that it may make efficient and correct decisions. This change in paradigm likely changes the characteristics of observable SABD.

We describe the remaining reason categories in detail using representative examples in our online appendix to help the reader understand this taxonomy.

Purposes.
The bottom portion of Table 3 defines and quantifies the purposes that we observe for SABD comments in our studied sample.
Observation 3:
Document for later fix is the most frequently occurring purpose. Our classification revealed six SABD purposes. Table 3 shows the results of our purpose classification. We observe that 34% of SABD comments are left by developers with the Document for later fix purpose. This result indicates that SABD comments are likely to be used as short-term memos for developers to recheck in the future. For example, since Maven resolves dependencies transitively, it is possible to include unwanted or problematic
5. https://doi.org/10.6084/m9.figshare.13147727
dependencies. For the software.amazon.awssdk:s3 dependency, which includes the broken software.amazon.awssdk:netty-nio-client dependency, developers exclude that dependency to preserve a clean (i.e., non-broken) build status. Developers left this comment as a note to revisit in the future.

<dependency>
  <groupId>software.amazon.awssdk</groupId>
  <artifactId>s3</artifactId>
  <version>${awsjavasdk.version}</version>
  <exclusions>
    <exclusion>
      <!-- TODO remove exclusions after we fix netty module -->
      <artifactId>netty-nio-client</artifactId>
      <groupId>software.amazon.awssdk</groupId>
    </exclusion>
  </exclusions>
</dependency>

TABLE 3: Definition and Frequency of SABD reasons and purposes. Ten categories merged from subcategories for SABD reasons are shown in bold.

Reasons:
- Limitation: Constraints imposed by the design or implementation of third-party libraries or development tools. 245 (49%)
  - External library limitation: 208 (42%); External tool limitation: 22 (4%); Build tool limitation: 15 (3%)
- Dependency: Accessing unavailable artifacts or assets, such as missing or stale dependencies. 117 (23%)
  - Missing dependency: 56 (11%); Stale dependency: 49 (10%); Problematic dependency: 12 (2%)
- Recursive call: Coherence issues; recursive calls that invoke another build file. 33 (7%)
- Document: Inadequate project description issues, such as licensing and metadata specification. 23 (4%)
  - Specify metadata: 12 (2%); Licensing: 11 (2%)
- Build break: Broken builds (i.e., failures that occur during the build process) in build files. 22 (4%)
- Compiler setting: Configuration issues during the compilation process, such as compiler configuration and symbol visibility. 18 (4%)
  - Compiler configuration: 16 (3%); Symbol visibility: 2 (1%)
- Deployment process: Processes which make the software artifacts ready for execution or available for use. 12 (2%)
  - Installation: 6 (1%); Deployment: 6 (1%)
- Code smell: Violations of fundamental design principles, i.e., instances of poor coding practice in build files.
- Change propagation: Changes that need to be propagated to keep software artifacts in sync during updates.
- No reason: A label could not be assigned (due to lack of information). 25 (5%)

Purposes:
- Document for later fix: Document an issue that should be revisited in the future. 172 (34%)
- Workaround: Document constraints imposed by design or implementation choices and the impact they have had on solution structure and/or content. 113 (23%)
- Warning for future developers: Warn other developers to pay attention to an aspect of the solution that may not be clear from its structure or content. 111 (22%)
- Document suboptimal implementation choice: Explain why a problematic solution has been adopted. 82 (16%)
- Placeholder for later extension: Document an extension point for later enhancement(s). 13 (3%)
- Silence build warnings: Defer or ignore warnings emitted by underlying tools. 9 (2%)

Another commonly occurring purpose is the
Workaround purpose, accounting for 23% of our sample. In the example below, the io.grpc:grpc-core dependency is partly unusable. Developers comment out this dependency and leave the comment to document this temporary fix.
6. https://tinyurl.com/y4wg8n3z
7. https://tinyurl.com/y43rxj9a

<!-- FIXME(lesv) Temporary fix due to Datastore having the wrong @@version@@ -->
<!--
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>io.grpc</groupId>
      <artifactId>grpc-core</artifactId>
      <version> </version>
    </dependency>
  </dependencies>
</dependencyManagement>
-->
<!-- end of FIXME -->
Placeholder for later extension purposeand the
Silence build warnings purpose.The survey of Maldonado et al. [26] showed that devel-opers most often use SATD to track future bugs and badimplementation areas. In our context of build systems, thehigh frequency of the
Document for later fix purpose agreeswith their observations.We describe the remaining purpose categories in detailusing representative examples in our online appendix tohelp the reader understand this taxonomy. Observation 4:
Location categories share a strong relationship with reason categories.
Motivated by representative examples, we observe that SABD comments in similar locations can vary in terms of reasons and purposes. Thus, we conduct a further study to investigate the relationships between locations and reasons, and locations and purposes. We visualize these relationships using two parallel sets [19] in parallel categories diagrams. Parallel sets are variants of parallel coordinates, in which the width of the lines that connect sets corresponds to the frequency of their co-occurrence. Figure 2 shows the relationships between locations and reasons, and locations and purposes.

Fig. 2: Parallel sets between locations and reasons ((a) Location-Reason), and locations and purposes ((b) Location-Purpose). For example, Plugin configuration most frequently occurs because of the Limitation reason.

8. https://doi.org/10.6084/m9.figshare.13147739
9. https://tinyurl.com/y6fuzkrk
10. https://tinyurl.com/yxdopn3g

In Figure 2a, SABD comments in Plugin configuration most frequently occur (65.1%) because of the Limitation reason in our sample, while SABD comments in the External dependencies configuration most frequently occur (58.3%) due to the Dependency reason. The example below shows the latter relationship, where a comment located in the External dependencies configuration is left for the Dependency reason.

<dependencies>
  <!-- fix protobuf dependency issue -->
  <dependency>
    <groupId>com.google.protobuf</groupId>
    <artifactId>protobuf-java</artifactId>
    <version> </version>
  </dependency>
</dependencies>

These two observations suggest that location categories tend to be more prone to particular SABD causes. Moreover, for the relationships between locations and purposes that are shown in Figure 2b, SABD comments in the
Plugin configuration location are most often left with the
Workaround purpose (30.7%). For instance, the SABD comment below provides a workaround for the Travis build.

<plugins>
  <plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-surefire-plugin</artifactId>
    <version> </version>
    <configuration>
      <!-- Travis build workaround -->
      <argLine>-Xms1024m -Xmx2048m</argLine>
    </configuration>
  </plugin>
</plugins>

11. https://tinyurl.com/y65fqwlv
Ex-ternal dependencies configuration location are most often leftwith the
Document for later fix purpose (41.7%). In a casebelow, the comment located in the
We identified nine SABD locations,ten reasons, and six purposes in Maven build sys-tems. Location categories tend to be more proneto SABD causes. In the build system maintenance,stakeholders involved in SABD management shouldbe aware of diverse SABD characteristics to assist inmaking effective management decisions.
4 Comment Classification (RQ2)
In Section 3, our results provide evidence that diverse SABD locations and rationales (i.e., reasons and purposes) exist in build files. To facilitate the replication of detection approaches, and to promote the adoption and traceability of SABD, an automated SABD classifier would be beneficial. Thus, we further study the feasibility of automatically classifying SABD comments. To do so, we use the manually coded SABD comments from Section 3 as a dataset. With this dataset, we train classifiers based on machine learning techniques and evaluate their performance. Below, we present our approach to automated classification (Section 4.1) and model evaluation (Section 4.2), as well as the results for RQ2 (Section 4.3).
Among the SABD reasons and purposes, minority codes provide too little signal to classify reliably, although they may possess valuable knowledge. Thus, to reduce the bias introduced by oversampling, we rearrange the codes whose frequencies are less than 10% of the sampled comments along the SABD reason and purpose dimensions. For the reason category, we merge Recursive call, Document, Build break, Compiler setting, Deployment process, Code smell, Change propagation, and No reason into Other. For the purpose category, we merge Placeholder for later extension and Silence build warnings into Other.

12. https://tinyurl.com/y49s8kg3

Text preprocessing.
An analysis of coded SABD comments revealed that bug report links often appear in comments. Thus, for all SABD comments, we replace hyperlinks with the token abstracturl by using a regular expression similar to the previous study [25]:

https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%_\+.~#?&\/\/=]*)

Moreover, to reduce the impact of noisy text in comments, we remove special characters by using the regular expression [^A-Za-z0-9]+. Additionally, we apply spaCy to lemmatize words, which accounts for term conjugation. Although it is common practice, we opt to exclude stop word removal, since stop words like ‘for’ and ‘until’ convey critical semantics in the context of SABD comments [25].

Feature extraction.
We apply the N-gram Inverse Document Frequency (IDF) approach to extract features from the preprocessed text using the N-gram weighting scheme tool [39] with its default settings. N-gram IDF [40] is a theoretical extension of the IDF approach for handling words and phrases of any length. The approach generates a list of all valid N-gram terms, along with the strength of their association with the targeted classes, excluding Other. We remove any term that appears only once in each class. In total, 1,997 and 1,120 N-gram terms are retrieved for SABD reasons and purposes, respectively.
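The preprocessing and candidate-term steps can be sketched as follows, under simplifying assumptions: the URL pattern is a shortened stand-in for the full regular expression, spaCy lemmatization is omitted, terms are filtered by corpus-level counts rather than per class, and the N-gram IDF weighting itself is left to the N-gram weighting scheme tool. The comment strings are invented for the example.

```python
import re

# Simplified stand-in for the full URL-matching regular expression.
URL_PATTERN = re.compile(r"https?://\S+")

def preprocess(comment):
    """Replace hyperlinks with a placeholder token, drop special characters."""
    text = URL_PATTERN.sub("abstracturl", comment)
    text = re.sub(r"[^A-Za-z0-9]+", " ", text)  # remove noisy characters
    return text.lower().strip()

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def candidate_terms(comments, max_n=3):
    """Collect candidate N-gram terms, dropping terms that appear only once."""
    counts = {}
    for comment in comments:
        tokens = preprocess(comment).split()
        for n in range(1, max_n + 1):
            for term in ngrams(tokens, n):
                counts[term] = counts.get(term, 0) + 1
    return {term for term, k in counts.items() if k > 1}

comments = [
    "TODO see https://issues.example.org/123 for details",
    "TODO see the tracker for details",
]
terms = candidate_terms(comments)  # e.g. contains the bigram "todo see"
```

Note that the singleton filter removes the "abstracturl" token here because it appears in only one comment; with a real corpus, recurring link placeholders would survive.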
Classifier preparation.
Previous studies [24, 25, 45] reported that classifiers trained by combining N-gram IDF and auto-sklearn machine learning tend to outperform classifiers that are trained with single-word features. Heeding their advice, we train our classifiers using auto-sklearn [10], which automatically determines effective machine learning pipelines for classification. Auto-sklearn searches a configuration space of 15 classification algorithms, 14 feature preprocessors, and four data preprocessors for optimal hyperparameter settings. We configure the approach to optimize for the weighted F1-score, with a budget of one hour for each round and a memory capacity of 32 GB.
To evaluate our classifiers, we use common performance measures. Precision is the fraction of comments assigned to a category that truly belong to that category. Recall is the fraction of comments belonging to a category that are assigned to it. The F1-score is the harmonic mean of precision and recall. To investigate the impact of the choice of classification technique, we apply the Naive Bayes (NB), Support Vector Machine (SVM), and k-Nearest Neighbors (kNN) classification techniques as baselines. These classifiers have been broadly adopted in prior studies [16, 46]. Similar to prior work [25], we apply TF-IDF [38] to extract the features for our baseline classifiers.
13. https://spacy.io/
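The three measures can be written down concretely. Below is a minimal per-category computation (the study optimizes the weighted F1-score across categories):

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F1-score for one category, as defined above."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```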
TABLE 4: Performance of classifiers for SABD reason (precision, recall, and F1-score per category for auto-sklearn, NB, SVM, and kNN)

TABLE 5: Performance of classifiers for SABD purpose (precision, recall, and F1-score per category for auto-sklearn, NB, SVM, and kNN)
Observation 5: The auto-sklearn classifier tends to outperform the baseline classifiers for both reasons and purposes.

Table 4 shows the classifier performance with respect to the reasons for SABD. The table shows that the average precision is 0.68, which is greater than the precision of the NB and kNN classifiers (0.57 and 0.62, respectively). The average recall and F1-score of auto-sklearn are also greater than those of the baseline classifiers by at least nine and seven percentage points, respectively. Upon closer inspection, we find that the classification of Limitation achieves the best performance when compared with the other two reason categories. For instance, the precision, recall, and F1-score for Limitation are 0.75, 0.73, and 0.73, respectively, which are greater than the next best performance, Dependency, by a margin of at least seven percentage points.

Table 5 shows the classifier performance with respect to the purposes of SABD. As we can see from Table 5, the auto-sklearn classifier outperforms the baseline classifiers. The average recall and F1-score are greater than those of the baseline classifiers by at least eight percentage points. Closer inspection of the purpose categories reveals that classifying the Workaround purpose has the best performance, with the F1-score reaching 0.89. Such high performance is possible because keywords usually exist that explicitly map to this category, e.g., 'workaround' and 'temporary'. Moreover, for the other purpose categories, we find that the performance is still promising, e.g., achieving F1-scores of 0.77 and 0.70 for the Document for later fix and Warning for future developers purposes. On the other hand, we observe that SVM outperforms auto-sklearn in terms of precision: SVM reaches a mean/median precision of 0.82 for the SABD purpose, while the auto-sklearn classifier achieves a mean/median precision of 0.79.

In Table 6, we list the most frequently occurring N-gram features in each category. For example, 'workaround for' appeared 54 times in SABD comments with the Workaround purpose, and 'be break' appeared 18 times in SABD comments about the Limitation reason.

TABLE 6: Frequently occurring N-gram features in each category

Reason
  Limitation: be break (18), available (10), break in (10), link (9), jdk (9)
  Dependency: java 9 (7), require by (6), to implement (6), framework (5), be need (4)
Purpose
  Document for later fix: fix this (7), late (6), the project (6), todo fix this (6), pron fix (6)
  Document suboptimal implementation choice: offline (6), todo why (6), script (4), do pron (4), copy (3)
  Workaround: workaround for (54), workaround for abstracturl (19), workaround to (14), a workaround (9), java 9 (7)
  Warning for future developers: break in (10), war (6), fix a (5), in osgi (4), mvn (4)
RQ2 Summary:
The auto-sklearn classifier tends to outperform the baselines for SABD reasons (0.68 precision, 0.67 recall, 0.67 F1-score) and purposes (0.79 precision, 0.75 recall, 0.75 F1-score).
REMOVAL (RQ3)
In this section, we investigate the willingness of developers to remove 'ready-to-be-addressed' SABD that contains resolved bug reports, similar to the previous study [24]. To do so, we mine for links in the comments of the manually coded data from Section 3. We then systematically assess whether the SABD is ready to be addressed. This concept builds on 'on-hold' SATD, a condition indicating that a developer is waiting for a certain event to occur elsewhere (e.g., an update to the behavior of a third-party library or tool), according to the study of Maipradit et al. [25]. Below, we first describe our study of the incidence of 'ready-to-be-addressed' SABD (Section 5.1), and then our proposed clean-up pull requests and tracking issue reports (Section 5.2).

TABLE 7: Frequency of link target types in 91 SABD comments

Category              Frequency
bug report            87 (84%)
tutorial or article    6 (6%)
404                    4 (4%)
Stack Overflow         2 (2%)
pull request           1 (1%)
software homepage      1 (1%)
forum thread           1 (1%)
blog post              1 (1%)
Sum                  103 (100%)
Identify ready-to-be-addressed SABD.
We systematically identify 'ready-to-be-addressed' SABD using the following steps:

Step 1. Extract hyperlinks or issue IDs from the comments. Using regular expressions, we extract 103 links from 91 SABD comments. We then manually code them based on the link target coding guide of Hata et al. [14]. Table 7 shows the link target distribution. We observe that bug report is the most frequently occurring (84%) link target in SABD comments.

Step 2. Check the link targets of SABD comments. We check whether the link target is a bug report with the status 'resolved', 'closed', 'verified', or 'completed', and a resolution type set to 'fixed', similar to the previous study [24]. Furthermore, to facilitate the creation of our pull requests and issue reports, we exclude four candidates where: (I) the repository referenced in the SABD comment has been archived; (II) the repository referenced by the issue report in the SABD comment has been archived; (III) the repository referenced in the SABD comment is a mirror repository; or (IV) the issue report link in the SABD comment is a 'cross-reference' (e.g., the issue report is referenced to aid in documenting the rationale behind an implementation choice).
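The extraction in Step 1 can be sketched as follows. The JIRA-style issue-ID pattern is an illustrative assumption; the study's exact expressions are not reproduced in the text.

```python
import re

# Simple URL pattern and an assumed JIRA-style issue-ID pattern (e.g., MNG-4565).
LINK_RE = re.compile(r"https?://\S+")
ISSUE_ID_RE = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")

def extract_references(comment):
    """Return (hyperlinks, issue_ids) mentioned in a SABD comment."""
    links = LINK_RE.findall(comment)
    # Blank out extracted links so IDs embedded in a URL are not double-counted.
    remainder = LINK_RE.sub(" ", comment)
    ids = ISSUE_ID_RE.findall(remainder)
    return links, ids
```

Each extracted link target would then be manually coded (Table 7) and, for bug reports, checked against the issue tracker's status and resolution fields as in Step 2.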
Observation 6:
Of the 91 SABD comments that contain hyperlinks, 16 contain 'ready-to-be-addressed' SABD. Among the 16, Plugin configuration is the most frequently occurring location, Limitation is the most frequently occurring reason, and Workaround is the most frequently occurring purpose, i.e., 13, 9, and 12 cases, respectively.

Table 8 shows that we initially identified 27 'ready-to-be-addressed' SABD in our dataset. However, we observed that 10 of the 27 SABD had already been removed by developers. Five SABD were removed because the entire file was deleted. The other five SABD were addressed directly by developers. Additionally, one SABD had been submitted as an issue report by a developer, but it had not yet been closed.
To evaluate the importance of 'ready-to-be-addressed' SABD, we created issue reports and pull requests for the studied projects. When preparing issue reports, we also provide possible solutions for developers to deal with 'ready-to-be-addressed' SABD. Examples of issue reports and pull requests are shown in Figures 3a and 3b.
Observation 7:
For 'ready-to-be-addressed' SABD, removal rates reached 43% and 56% for pull requests and issue reports, respectively.

Fig. 3: Examples of a created issue report (a) and a created pull request (b)

TABLE 8: Distribution of 'ready-to-be-addressed' SABD

Status                  Frequency
Existing                16 (59%)
File deleted             5 (19%)
Fixed by developers      5 (19%)
Developers try to fix    1 (3%)
In total, we prepared seven issue reports for nine instances of SABD, since three SABD belong to the same repository. In addition, we prepared seven pull requests for the other seven instances of SABD comments. We found that developers actively resolve these pull requests and issue reports within a 20-day time frame.

Three of the four pull requests that received responses have been accepted and merged into the master branch. For instance, one developer responded: "I merged it, and found and fixed two others of the same type, which would have remained if you had not brought it to our attention. Thanks!". Only one pull request was rejected, because the developer had to consider the plugin version dependency, i.e., "Thanks for the reminder! Upgrading the plugin in my TODO list for the summer, so I'll look into it shortly. The plugin version must be updated before removing the config.".

For the prepared issue reports, five 'ready-to-be-addressed' SABD were resolved. For instance, one developer left an appreciation: "Thanks for making us aware of this fact.". On the other hand, two issue reports were rejected: in one case, the developer did not agree that the issue was an instance of technical debt; in the other case, the SABD could not be removed because the system supports multiple versions.
RQ3 Summary:
We identified 16 instances of SABD that are 'ready-to-be-addressed'. Through our experiment, we proposed pull requests for seven cases, three of which were merged. Moreover, we produced issue reports for nine cases, five of which were resolved within 20 days. These responses suggest that developers are receptive and reactive to SABD in build systems.
RECOMMENDATIONS
Based on our findings, we make the following recommendations for practitioners, researchers, and tool builders. First, we recommend that practitioners:
• Track SABD by using issue trackers, as we find that developers have tried to add issue report hyperlinks or issue IDs in tandem with comments. Using only comments to track or manage SABD is still problematic. Indeed, only 91 out of the 500 SABD comments contain hyperlinks. Explicitly referencing related content will improve traceability.
• Check SABD containing resolved bug reports, as we identified 16 instances of SABD comments that are ready to be addressed in RQ3. These stale SABD comments could create confusion for anyone inspecting the code.
Even more insights for practitioners could be discovered through the following future research:
• Further studies of workarounds for SABD.
During the coding process, we observed that one SABD workaround can be used across several repositories. This suggests that the retrieval and curation of workarounds may have broad implications beyond the scope of a specific project, and are thus important and valuable for practitioners.
• Establishing an understanding of SABD in other build systems.
As seen in Tables 2 and 3, we identified nine locations, ten reasons, and six purposes, which could improve the overall understanding of SABD in build systems. Applying the coding guides from RQ1 to other build systems (e.g., make, CMake, or Ant) could help to establish a broader theory of SABD in build systems in general.
• Improving the classification of SABD in build systems.
In RQ2, we propose automatic classifiers to identify SABD characteristics. Tables 4 and 5 show that our classifiers are promising. The results demonstrate the feasibility of automatic classification and serve as a key step toward developing a SABD classification system. However, there is still room for performance improvement. We suggest that future research evaluate other approaches to improve the SABD classifiers.
The following directions for future work may yield value for tool builders:
• Tool support for managing SABD in build systems.
Although we recommend that practitioners use issue report hyperlinks or IDs to track SABD, it could be practically useful to have tools or systems that help practitioners manage SABD traceability automatically. A SABD management tool could make developers aware of the debt being incurred and make it easy to continually avoid the debt as part of their normal workflow. A possible mock-up is presented by Maipradit et al. [24, Fig. 7].
• Focusing on top SABD locations and reasons would provide the most benefit to developers. In RQ1, we provide the most frequently occurring locations and reasons for SABD in build systems. We suggest that tool builders make an extra effort on these top locations and reasons.
• Tool support for recommending solutions to SABD in build systems.
During the creation of pull requests and issue reports, we observed that the possible solutions that we provided for developers to mitigate 'ready-to-be-addressed' SABD are similar and straightforward (e.g., removing the extraneous comment or code). This observation suggests that an automated tool for addressing SABD could be useful. This would not just help developers to manage such SABD, but would also improve the quality of the final product.
RELATED WORK
In this section, we position our work with respect to the literature on build systems and technical debt.
Build system maintenance is a hidden cost that takes a considerable amount of development effort. Kumfert et al. [20] argued that the need to keep the build system synchronized with the source code generates an implicit overhead on the development process; in their survey, developers claimed that they spend up to 35.71% of their time on build system maintenance. McIntosh et al. [30] analyzed ten large, long-lived projects by mining their version histories, showing that build system maintenance imposes a 27% overhead on source code development. Adams et al. [1] studied the evolution of the Linux KBUILD files and how these files co-evolve with the source code. McIntosh et al. [29] made similar observations in Java build systems.

Build breakage and how to repair it have been widely studied. Kerzazi et al. [18] interviewed 28 software engineers to study why build breakages are introduced in an industrial setting. Rausch et al. [36] performed an analysis of build failures, studying the variety and frequency of build breakage in the CI environments of 14 open source Java projects. Islam and Zibran [17] studied the factors that may impact the build outcome, observing that the number of changed lines of code, files, and built commits in tasks are most significantly associated with build outcomes. Zolfagharinia et al. [47] studied the impact of the operating system and runtime environment on build breakage in the CI environment of the Comprehensive Perl Archive Network (CPAN) ecosystem, suggesting that the interpretation of build results is not straightforward.

In addition, researchers have proposed automated approaches to repair build breakages. For example, Macho et al. [23] proposed BUILDMEDIC, an approach to automatically repair Maven builds that break due to dependency-related issues. Hassan and Wang [13] proposed HireBuild, an approach to automatically repair build scripts using fixing histories.
Hassan [12] also outlined promising preliminary work towards automatic build repair in CI environments that involves both source code and build scripts.

There have also been predictive approaches proposed to promote awareness and simplify interactions with build systems. Tufano et al. [42] envisioned a predictive model that would preemptively alert developers about the extent to which their software changes may impact future building activities. Hassan and Zhang [11] defined a model for predicting the certification results of a software build. Bisong et al. [5] proposed and analyzed models that can predict the build time of a job. Cao et al. [6] proposed BuildMétéo, a tool to forecast the duration of incremental build jobs by analyzing a timing-annotated Build Dependency Graph (BDG).

Although plenty of studies investigate the importance of build system maintenance and propose approaches to relieve build issues, no study has focused on SATD within the scope of build system maintenance. Yet build systems often undergo substantial maintenance activity during the development process, and part of this activity stems from SATD, since SATD changes are more difficult to perform and SATD inevitably generates long-term maintenance problems from a short-term hack. Thus, in this study, we first characterize and mitigate SABD in the Maven build system and explore the feasibility of training automatic classifiers to identify SABD characteristics.
Due to the importance of technical debt to the software development process and quality, there have been surveys and mapping studies about technical debt. Sierra et al. [41] surveyed research work on SATD, analyzing the characteristics of current approaches and techniques for SATD detection, comprehension, and repayment. Li et al. [21] performed a mapping study on technical debt and its management. Vassallo et al. [43] showed that 88% of participants mentioned documenting their suboptimal implementation choices in the code that they produced.

Prior studies have widely analyzed the factors and activities that affect technical debt. Besker et al. [3] observed that six organizational factors (experience of developers, software knowledge of startup founders, employee growth, uncertainty, lack of development process, and the autonomy of developers regarding TD decisions) were associated with the benefits and challenges of the intentional accumulation of technical debt in software. Besker et al. [4] also investigated the activities on which wasted time is spent and whether different TD types impact the wasted time in different ways.

The detection of technical debt is also widely studied. Liu et al. [22] proposed SATD Detector to automatically detect SATD comments and highlight, list, and manage detected comments in an Integrated Development Environment (IDE). Farias et al. [9] carried out three empirical studies to curate the knowledge embedded in the SATD identification vocabulary, which can be used to automatically identify and classify TD items through code comment analysis. Yan et al. [46] proposed an automated change-level TD determination model that can identify TD-introducing changes. Wattanakriengkrai et al. [45] combined N-gram IDF and auto-sklearn machine learning approaches to train classifiers that identify requirement and design debt. Maldonado et al. [27] used NLP maximum entropy classifiers [28] to automatically identify design and requirement SATD in source code comments. Moreover, Ren et al. [37] used Convolutional Neural Network-based approaches, compared against baseline text-mining approaches [16], to identify SATD in a cross-project prediction setting. Maipradit et al. [24, 25] identified 'on-hold' SATD for automated management.

Inspired by these past studies of SATD, in this paper, we conduct the first study on self-admitted technical debt in build systems. Similar to prior work, we first set out to characterize SABD in build systems in terms of locations, reasons, and purposes. We provide three coding guides for SABD in build systems, and automated SABD classifiers are provided in Section 4. Furthermore, we investigate the willingness of developers to remove 'ready-to-be-addressed' SABD that refers to resolved issue reports.

THREATS TO VALIDITY
Below, we discuss the threats to the validity of our study:
Construct validity.
We use comment patterns to identify SABD comments in build files. Since SABD comment patterns are not enforced, we will miss SABD comments that do not conform to these patterns. To mitigate this risk, we expand upon a popular list of comment patterns [35] with features recommended by Huang et al. [16].
Internal validity.
We rely on manually coded data, which may be miscoded due to the subjective nature of understanding the coding schema. To mitigate this threat, we apply three best practices for open coding: 1) we conduct four rounds of independent coding and calculate Cohen's Kappa to ensure that our agreement is at least 'Substantial'; 2) we pursue saturation with concrete criteria, i.e., 50 consecutively coded comments for which no new categories were discovered; and 3) we perform two passes revisiting miscoded SABD comments based on additional contextual information.
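For reference, Cohen's Kappa compares the observed agreement between two coders with the agreement expected by chance. A minimal sketch:

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's Kappa for two raters labeling the same items.
    On the commonly used Landis-and-Koch scale, values of
    0.61-0.80 are interpreted as 'Substantial' agreement."""
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: product of each coder's marginal label proportions.
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)
```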
External validity.
We conduct an empirical study of only 300 Maven projects. As such, our results may not generalize to all Maven projects or to other build technologies. On the other hand, our sample of projects is diverse, including projects of varying sizes and domains. Nonetheless, replication studies may help to improve the strength of the generalizations that can be drawn.
CONCLUSIONS
Addressing self-admitted technical debt (SATD) is an important step in the development process. Recently, many studies have focused on SATD in source code, but little is known about SATD in build systems. Thus, in this paper, we characterize and propose mitigation strategies for the coined term Self-Admitted Build Debt (SABD). To do so, we (i) manually classified 500 SABD comments according to their locations, reasons, and purposes; (ii) trained SABD classifiers using the coded SABD comments; and (iii) investigated the willingness of developers to remove 'ready-to-be-addressed' SABD that references resolved bug reports.

We observe that (i) SABD comments in Maven build systems most often occur in the plugin configuration location, and the most frequently occurring reasons behind SABD are to document limitations in tools and libraries, as well as issues to be fixed later; (ii) our auto-sklearn classifiers achieve better performance than baseline classifiers, achieving F1-scores of 0.67–0.75; and (iii) the removal rates of 'ready-to-be-addressed' SABD in pull requests and issue reports reached 43% and 56%, respectively. We foresee many promising avenues for future work, such as improvements to the classifiers, expanding our coded corpus of SABD comments to other build systems, and automatic approaches to address SABD in build systems.

ACKNOWLEDGEMENTS
We would like to thank Rungroj Maipradit for provid-ing technical assistance in training auto-sklearn classi-fier. This work has been supported by JSPS KAKENHIGrant Numbers JP18KT0013, JP18H04094, JP20K19774, andJP20H05706. R EFERENCES [1] B. Adams, K. D. Schutter, H. Tromp, and W. Meuter,“The evolution of the linux build system,”
ElectronicCommunication of the European Association of SoftwareScience and Technology (ECEASST) , 2007.[2] N. S. R. Alves, L. F. Ribeiro, V. Caires, T. S. Mendes,and R. O. Sp´ınola, “Towards an ontology of termson technical debt,” in
Proceedings of the InternationalWorkshop on Managing Technical Debt (MTD) , 2014.[3] T. Besker, A. Martini, R. Edirisooriya Lokuge, K. Blin-coe, and J. Bosch, “Embracing technical debt, froma startup company perspective,” in
Proceedings of theInternational Conference on Software Maintenance and Evo-lution (ICSME) , 2018.[4] T. Besker, A. Martini, and J. Bosch, “Software developerproductivity loss due to technical debt—a replicationand extension study examining developers’ develop-ment work,”
Journal of Systems and Software (JSS) , 2019.[5] E. Bisong, E. Tran, and O. Baysal, “Built to last or builttoo fast? evaluating prediction models for build times,”in
Proceedings of the International Conference on MiningSoftware Repositories (MSR) , 2017.[6] Q. Cao, R. Wen, and S. McIntosh, “Forecasting theduration of incremental build jobs,” in
Proceedings ofthe International Conference on Software Maintenance andEvolution (ICSME) , 2017.[7] K. Charmaz,
Constructing Grounded Theory . SAGE,2014.[8] W. Cunningham, “The wycash portfolio managementsystem,”
SIGPLAN OOPS Messenger , 1992.[9] M. A. de Freitas Farias, M. G. de Mendonc¸a Neto,M. Kalinowski, and R. O. Sp´ınola, “Identifying self-admitted technical debt through code comment anal-ysis with a contextualized vocabulary,”
Information andSoftware Technology (IST) , 2020.[10] M. Feurer, A. Klein, K. Eggensperger, J. T. Springen-berg, M. Blum, and F. Hutter, “Efficient and robustautomated machine learning,” in
Proceedings of the In-ternational Conference on Neural Information ProcessingSystems (NIPS) , 2015.[11] A. E. Hassan and K. Zhang, “Using decision trees topredict the certification result of a build,” in
Proceed-ings of the International Conference on Automated SoftwareEngineering (ASE) , 2006. [12] F. Hassan, “Tackling build failures in continuous inte-gration,” in Proceedings of the International Conference onAutomated Software Engineering (ASE) , 2019.[13] F. Hassan and X. Wang, “Hirebuild: An automaticapproach to history-driven repair of build scripts,” in
Proceedings of the International Conference on SoftwareEngineering (ICSE) , 2018.[14] H. Hata, C. Treude, R. G. Kula, and T. Ishio, “9.6 millionlinks in source code comments: Purpose, evolution, anddecay,” in
Proceedings of the International Conference onSoftware Engineering (ICSE) , 2019.[15] T. Hirao, S. McIntosh, A. Ihara, and K. Matsumoto,“The review linkage graph for code review analytics: Arecovery approach and empirical study,” in
Proceedingsof the European Conference on Foundations of SoftwareEngineering (ESEC/FSE) , 2019.[16] Q. Huang, E. Shihab, X. Xia, D. Lo, and S. Li, “Iden-tifying self-admitted technical debt in open sourceprojects using text mining,”
Empirical Software Engineer-ing (EMSE) , 2017.[17] M. R. Islam and M. F. Zibran, “Insights into continuousintegration build failures,” in
Proceedings of the Interna-tional Conference on Mining Software Repositories (MSR) ,2017.[18] N. Kerzazi, F. Khomh, and B. Adams, “Why do auto-mated builds break? an empirical study,” in
Proceedingsof the International Conference on Software Maintenanceand Evolution ((ICSME)) , 2014.[19] R. Kosara, F. Bendix, and H. Hauser, “Parallel sets:Interactive exploration and visual analysis of categor-ical data,”
Transactions on Visualization and ComputerGraphics (TVCG) , 2006.[20] G. Kumfert and T. Epperly, “Software in the doe: Thehidden overhead of ”the build”,” Lawrence LivermoreNational Lab., CA (US), Tech. Rep., 2002.[21] Z. Li, P. Avgeriou, and P. Liang, “A systematic mappingstudy on technical debt and its management,”
Journalof Systems and Software (JSS) , 2015.[22] Z. Liu, Q. Huang, X. Xia, E. Shihab, D. Lo, and S. Li,“Satd detector: A text-mining-based self-admitted tech-nical debt detection tool,” in
Proceedings of the Inter-national Conference on Software Engineering: CompanionProceeedings (ICSE-Companion) , 2018.[23] C. Macho, S. McIntosh, and M. Pinzger, “Automat-ically repairing dependency-related build breakage,”in
Proceedings of the International Conference on SoftwareAnalysis, Evolution and Reengineering (SANER) , 2018.[24] R. Maipradit, B. Lin, C. Nagy, G. Bavota, M. Lanza,H. Hata, and K. Matsumoto, “Automated identificationof on-hold self-admitted technical debt,” in
Proceedingsof the International Working Conference on Source CodeAnalysis and Manipulation (SCAM) , 2020.[25] R. Maipradit, C. Treude, H. Hata, and K. Matsumoto,“Wait for it: identifying “on-hold” self-admitted techni-cal debt,”
Empirical Software Engineering (EMSE) , 2020.[26] E. D. S. Maldonado, R. Abdalkareem, E. Shihab, andA. Serebrenik, “An empirical study on the removalof self-admitted technical debt,” in
Proceedings of theInternational Conference on Software Maintenance and Evo-lution (ICSME) , 2017.[27] E. D. S. Maldonado, E. Shihab, and N. Tsantalis, “Using natural language processing to automatically detectself-admitted technical debt,”
Transactions on SoftwareEngineering (TSE) , 2017.[28] C. Manning and D. Klein, “Optimization, maxent mod-els, and conditional estimation without magic,” in
Pro-ceedings of the Conference of the North American Chapterof the Association for Computational Linguistics on HumanLanguage Technology: Tutorials(NAACL-Tutorials) , 2003.[29] S. Mcintosh, B. Adams, and A. E. Hassan, “The evolu-tion of java build systems,”
Empirical Software Engineer-ing (EMSE) , 2012.[30] S. McIntosh, B. Adams, T. H. Nguyen, Y. Kamei, andA. E. Hassan, “An empirical study of build mainte-nance effort,” in
Proceedings of the International Confer-ence on Software Engineering (ICSE) , 2011.[31] S. McIntosh, M. Nagappan, B. Adams, A. Mockus,and A. E. Hassan, “A large-scale empirical study ofthe relationship between build technology and buildmaintenance,”
Empirical Software Engineering (EMSE) ,2015.[32] S. Mensah, J. Keung, J. Svajlenko, K. E. Bennin, andQ. Mi, “On the value of a prioritization scheme for re-solving self-admitted technical debt,”
Journal of Systemsand Software (JSS) , 2018.[33] J. D. Morgenthaler, M. Gridnev, R. Sauciuc, andS. Bhansali, “Searching for build debt: Experiencesmanaging technical debt at google,” in
Proceedings ofthe International Workshop on Managing Technical Debt(MTD) , 2012.[34] P. Morville and L. Rosenfeld,
Information architecturefor the World Wide Web: Designing large-scale web sites .O’Reilly Media, 2006.[35] A. Potdar and E. Shihab, “An exploratory study on self-admitted technical debt,” in
Proceedings of the Interna-tional Conference on Software Maintenance and Evolution(ICSME) , 2014.[36] T. Rausch, W. Hummer, P. Leitner, and S. Schulte, “Anempirical analysis of build failures in the continuousintegration workflows of java-based open-source soft-ware,” in
Proceedings of the International Conference onMining Software Repositories (MSR) , 2017.[37] X. Ren, Z. Xing, X. Xia, D. Lo, X. Wang, andJ. Grundy, “Neural network-based detection of self-admitted technical debt: from performance to explain-ability,”
Transactions on Software Engineering and Method-ology (TOSEM) , 2019.[38] G. Salton and C. Buckley, “Term-weighting approachesin automatic text retrieval,”
Information Processing andManagement (IP&M) , 1988.[39] M. Shirakawa. N-gram weighting scheme,. [Online].Available: https://github.com/iwnsew/ngweight[40] M. Shirakawa, T. Hara, and S. Nishio, “Idf for wordn-grams,”
Transactions on Information Systems (TOIS) ,2017.[41] G. Sierra, E. Shihab, and Y. Kamei, “A survey of self-admitted technical debt,”
Journal of Systems and Software(JSS) , 2019.[42] M. Tufano, H. Sajnani, and K. Herzig, “Towards pre-dicting the impact of software changes on buildingactivities,” in
Proceedings of the International Conferenceon Software Engineering: New Ideas and Emerging Results (ICSE-NIER) , 2019.[43] C. Vassallo, F. Zampetti, D. Romano, M. Beller,A. Panichella, M. Di Penta, and A. Zaidman, “Continu-ous delivery practices in a large financial organization,”in Proceedings of the International Conference on SoftwareMaintenance and Evolution (ICSME) , 2016.[44] A. Viera and J. Garrett, “Understanding interobserveragreement: The kappa statistic,”
Family medicine , 2005.[45] S. Wattanakriengkrai, R. Maipradit, H. Hata,M. Choetkiertikul, T. Sunetnanta, and K. Matsumoto,“Identifying design and requirement self-admittedtechnical debt using n-gram idf,” in
Proceedings ofInternational Workshop on Empirical Software Engineeringin Practice (IWESEP) , 2018.[46] M. Yan, X. Xia, E. Shihab, D. Lo, J. Yin, and X. Yang,“Automating change-level self-admitted technical debtdetermination,”
Transactions on Software Engineering(TSE) , 2019.[47] M. Zolfagharinia, B. Adams, and Y.-G. Gu´eh´eneuc,“Do not trust build results at face value: An empiricalstudy of 30 million cpan builds,” in
Proceedings of theInternational Conference on Mining Software Repositories(MSR) , 2017.
Tao Xiao is a Master's student at the Department of Information Science, Nara Institute of Science and Technology, Japan. He received his BSc degree in Software Engineering from Chiang Mai University, Thailand, in 2020. His main research interests are empirical software engineering, mining software repositories, and natural language processing.

Dong Wang is currently working toward a doctoral degree at the Nara Institute of Science and Technology, Japan. His research interests include code review and mining software repositories.

Shane McIntosh is an Associate Professor at the University of Waterloo. Previously, he was an Assistant Professor at McGill University, where he held the Canada Research Chair in Software Release Engineering. He received his Ph.D. from Queen's University, for which he was awarded the Governor General's Academic Gold Medal. In his research, Shane uses empirical methods to study software build systems, release engineering, and software quality: http://shanemcintosh.org/.

Hideaki Hata is an Assistant Professor at the Nara Institute of Science and Technology. His research interests include software ecosystems, human capital in software engineering, and software economics. He received a Ph.D. in information science from Osaka University. More about Hideaki and his work is available online at https://hideakihata.github.io/.

Raula Gaikovina Kula is an Assistant Professor at the Nara Institute of Science and Technology. He received the Ph.D. degree from the Nara Institute of Science and Technology in 2013. His interests include software libraries, software ecosystems, code reviews, and mining software repositories.

Takashi Ishio received the Ph.D. degree in information science and technology from Osaka University in 2006. He was a JSPS Research Fellow from 2006–2007 and an Assistant Professor at Osaka University from 2007–2017. He is now an Associate Professor at the Nara Institute of Science and Technology. His research interests include program analysis, program comprehension, and software reuse. He is a member of the IEEE, ACM, IPSJ, and JSSST.