Improved dependency management for issue trackers in large collaborative projects
Mikko Raatikainen, Quim Motger, Clara Marie Lüders, Xavier Franch, Lalli Myllyaho, Elina Kettunen, Jordi Marco, Juha Tiihonen, Mikko Halonen, Tomi Männistö
Abstract—Issue trackers, such as Jira, have become the prevalent collaborative tools in software engineering for managing issues, such as requirements, development tasks, and software bugs. However, issue trackers inherently focus on the life-cycle of single issues, although issues have and express dependencies on other issues that constitute an issue dependency network in large, complex collaborative projects. The objective of this study is to develop supportive solutions for the improved management of dependent issues in an issue tracker. This study follows the Design Science methodology, consisting of the elicitation of drawbacks, and the construction and evaluation of a solution and system. The study was carried out in the context of The Qt Company's Jira, which exemplifies an actively used, almost two-decade-old issue tracker with over 100,000 issues. The drawbacks capture how users operate with issue trackers to handle issue information in large, collaborative, and long-lived projects. The basis of the solution is to keep issues and dependencies as separate objects and to automatically construct an issue graph. Dependency detection complements the issue graph by proposing missing dependencies, while consistency check and diagnosis identify incompatible issue priorities and release assignments. Jira's plugin- and service-based system architecture realizes the functional and quality concerns of the system implementation. We show how to adopt the supporting intelligent techniques of an issue tracker in a complex use context and with a large data-set. The solution takes into account an integrated and holistic system-view, practical applicability and utility, and the practical characteristics, such as inherent incompleteness, of issue data.
Index Terms—Issue, issue tracker, issue management, dependency, release, requirement, bug, design science, Jira.
1 INTRODUCTION AND MOTIVATION
Issue management is a fundamental activity in many of today's software development projects, and is especially prevalent in open-source software development projects. It consists of the identification and resolution of new requirements, development tasks, unexpected problems, software bugs, and questions (i.e., the issues) that may arise at any moment during the project. As issues convey important observations, failing to manage issues may result in delays, quality problems, or even complete failure of the software project [1]. Due to this critical nature, as well as the complexity that it entails, particularly in large collaborative projects, issue management is usually tool-supported: software engineering teams use issue trackers to report, manage, and resolve software project related issues [2]. Issue trackers are typically used collaboratively by various project stakeholders, including project managers, developers, and even end-users.

In this complex, collaborative environment, issues cannot be conceived as independent entities. Instead, issues affect each other through various types of dependencies, which form an issue dependency network. For example, a reported issue might be part of a major bug, a bug might contribute to a specific requirement, or two issues might refer to the same topic.

• M. Raatikainen, L. Myllyaho, E. Kettunen, J. Tiihonen, and T. Männistö are with University of Helsinki, Finland. E-mail: first.last@helsinki.fi.
• Q. Motger, X. Franch, and J. Marco are with Universitat Politècnica de Catalunya, Barcelona, Spain. E-mail: [email protected], [email protected], [email protected].
• C. M. Lüders is with University of Hamburg, Germany. E-mail: [email protected].
• M. Halonen is with The Qt Company, Oulu, Finland. E-mail: [email protected].
In fact, dependencies are one of the key concerns that need to be considered in various software engineering planning activities, such as requirements prioritization [3], [4], and release planning [5], [6]. Even though issue trackers usually allow the expression of dependencies, they are still specified in relation to an individual issue and, in practice, not always reported. Moreover, issue trackers have been designed primarily to provide stakeholders with specialized support for each individual issue and its properties throughout its life-cycle. Advanced understanding as well as analytical and management features over an issue dependency network are not well supported. Although some well-known issue trackers, such as Jira, offer filters and dashboards that group issues by their properties, issue trackers lack the ability to thoroughly analyze and manage these dependencies. Given that issues rarely appear in isolation, the limitations in managing an issue dependency network are harmful.

This paper addresses the problem of how to support stakeholders of a software project with the management of dependent issues in an issue tracker over the software system development life-cycle. In this paper, we refer to dependencies as horizontal interdependencies between issues rather than vertical dependencies, i.e., traceability between issues and other types of artifacts, such as issues and their implementation [7]. The solution focuses on the nature of dependencies themselves, the detection of missing dependencies between issues, and consistency analysis of the issue dependency network to extend well-known features offered by issue trackers. The solution is implemented as a Jira plugin and service-based system considering contextual product quality characteristics – including security, scalability, and efficiency – to fit in real, large data-set scenarios.
To this end, the research, the solution design, and its evaluation have been carried out in the context of The Qt Company (TQC), a publicly listed, global software company.

This paper is organized as follows. Section 2 provides background about issue trackers. Section 3 depicts the overall research method, including the research questions, and a description of TQC and its Jira as the context for this research. The results are presented in three sections: Section 4 reports the main drawbacks in issue tracker use; Section 5 depicts the objectives and the techniques of our solution; and Section 6 addresses the artifact implementation. Section 7 reports the evaluation, while Section 8 collects discussion, the threats to validity, and related work. Finally, Section 9 concludes the research.
2 BACKGROUND: ISSUE TRACKERS
Issue trackers provide technological support for issue management tasks. Given the collaborative nature of these tools, they can be conceived as "a type of social media" [8] for the software development domain. As a consequence, the users of an issue tracker, e.g., software developers, product owners, and project managers, rely on the features of these tools.

Karre et al. [9] conducted a categorization analysis of 31 well-known issue trackers to identify their features and main differences. The analysis resulted in 24 characteristics, including simple, traditional features, such as e-mail support or the existence of a comments section, as well as more complex features like a customizable graphical user interface, or the ability to establish links between independent issues. The result is a four-class categorization based on the complexity and number of features offered by each issue tracker. In this categorization, the most advanced cluster comprises the issue trackers with a high number of features, including custom fields, planning, and project management features.

Among the most advanced issue trackers from this cluster, we highlight Redmine, Mantis, BugZilla, and Jira. They are all well-known and widely-used tools that provide complex and advanced issue modeling features, and include a wide variety of issue management functionalities, especially for single issue management. These single issue modeling features include type (e.g., 'epic', 'bug', 'user story', and 'task'), scope (product or component), and status (e.g., 'open' and 'closed'). However, there are significant differences among them. For instance, neither BugZilla nor Mantis supports issue types other than 'bugs', which is the underlying type of each issue, nor the definition of custom issue types or statuses. On the other hand, all of these are features supported by both Redmine and Jira.

If we focus on more advanced features beyond single-entity analysis, all of them support some type of specification process for dependencies among issues (i.e., issues depending on the resolution of another issue) or duplicated issues (i.e., marking an issue as a copy of an existing one). However, these dependency and duplicate management features are limited to specification as properties of a single issue. In addition, this specification process requires human action to manually label and create these dependencies.

More advanced features, such as release management tasks, including creating a release plan and adding issues to a scheduled release, are supported only by Jira. This makes Jira one of the most advanced issue trackers in terms of the scope of its functionalities.
Since Jira is the issue tracker used by TQC, the company providing the context for this research, we argue that findings related to drawbacks and improvements presented in our study may apply to other trackers (see Section 8.4 for a more detailed discussion on generalization of the findings). A more detailed account of the existing features in Jira is provided in Sections 3 and 4.

3 RESEARCH APPROACH
Our research follows the Design Science methodology [10], in which the solution knowledge and artifact are developed for a specific context, and aim to solve the problem and have utility in that context. In this section, we first describe how we applied Design Science, and then the context of TQC, where we carried out the research.
We apply Peffers et al.'s [11] incremental and iterative process for Design Science, which Figure 1 illustrates. The phases are linked to the research questions listed in Table 1. We also discussed the research with TQC's stakeholders frequently, and their feedback was incorporated.
Problem identification (RQ1).
As part of the OpenReq research and innovation collaborative project, we conducted a multiple case study to understand the needs of companies for a platform to support large-scale requirements engineering [12]. TQC was one of the five organizations in the study – the details of the protocol are available at [12]. The main problems found in these interviews were: information overload, limited tool support, handling of dependencies between requirements, and stakeholder identification for issue assignment. RQ1 refines the main findings of [12] from TQC's Jira use perspective. After a preliminary analysis, we excluded the problem of stakeholder identification because open source communities are in general sensitive to disclosure of personal information. The remaining problems, which are detailed by the drawbacks in Section 4, were refined during the process of developing a solution.

Define solution objectives and design (RQ2).
On the basis of the drawbacks, we synthesized the solution objectives and scenarios, as well as solution techniques that integrate with Jira as the tool, and with the existing issues of Jira as the data. As a major design principle, the solution should not change but rather support and complement the current processes at TQC to lower the adoption barrier. The design was based on promoting the role of dependencies in issue management. We define the solution in Section 5.
Implementation of the solution and demonstration of its operations (RQ3).

The incremental development of the artifact to realize the solution was done iteratively with
a continuous feedback loop, which helped to ensure that the artifact was meeting general quality objectives, which we structured following the ISO/IEC 25010 product quality model [13]. These challenges shaped the final artifact design, as detailed in Section 6.

Fig. 1. The phases of the incremental and iterative Design Science process of Peffers et al. [11] applied in this research.

TABLE 1
Research questions of the study

ID   Text
RQ1  What drawbacks do stakeholders suffer with current issue trackers?
RQ2  What features can be added to issue trackers to address these drawbacks?
RQ3  How can these features be integrated in an issue tracker so that it has value for use?
Evaluation.
The evaluation is divided into verification and validation [13]. Verification evaluates the results against the stated objectives, and it was carried out by executing an extensive set of tests that explored the functionality and observed and measured the product quality characteristics [13]. Validation techniques assess the results with their intended users. For validation, we interviewed five of TQC's stakeholders, who were all active Jira users who tested and used our solution. The technical details and results of the evaluation are provided in Section 7.
TQC's product is a software development kit (Qt) that consists of the Qt software framework itself and its supporting tools, including the integrated development environment (IDE) called Creator, and the 3D Studio (3DS) and Automotive Suite extensions to the Qt software framework. Qt specifically targets the development of cross-platform mobile applications, graphical user interfaces, and embedded applications. Qt is estimated to be used by about one million developers, and most of today's embedded and touch screen systems rely on Qt. Qt is licensed under open source and commercial licenses.

All issues of Qt are managed in Jira, and Jira is the only system for product management and requirements engineering. Each Jira issue has an ID consisting of a preceding project acronym and a running number (e.g., 'QBS-991'), a title ('Qt Android support') and description, as well as several properties, such as the type (in QBS-991, a bug), release (referred to as Fix Version/s), priority, status (identifies where an issue is in its life-cycle, such as 'Open' or 'Closed'), resolution (gives additional details for the status, such as that an issue is closed because it is a 'Duplicate'), and automatic meta-data, such as the creation date. There are various releases, such as major and minor releases and bug fixes, and the release numbering typically follows up to three parts (x.y.z). Priority is a number from 0 ('P0 blocker') to 5 ('P5 not important'). In addition, an issue includes comments.

In TQC's Jira, issues may report Bugs, Epics, User Stories, Suggestions, and Tasks. While bugs are the prevalent issues, TQC aims to organize development by applying an issue hierarchy like in agile methods: large functionalities or features are defined as Epics that are refined as User Stories and further as Tasks. In addition to the parent-child relationships induced by this issue hierarchy, issues can have dependencies referred to as Issue Links in Jira. These links can only be set by employees of TQC or authorized open-source developers. Other TQC Jira users, even the creators of issues, cannot set any links. TQC's Jira supports the following links: 'duplicates', 'requires', 'relates', 'replaces', 'results', and 'tests'. All these links are bidirectional (e.g., 'is related to' and 'relates to'), but it is not uncommon for users to declare an incorrect direction, especially in the case of a duplicate, as the resolution already shows duplication. For simplicity, we use the term dependency to denote both parent-child relationships and links. There are also several exceptions to or misuses of these patterns. Sometimes issues are used only to gather other issues, such as one major epic depending on other epics, as epics cannot form a parent-child hierarchy (e.g., QTBUG-62425). Some issues group other issues in the description or comments field (e.g., QTCOMPONENTS-200), and not necessarily all of them are linked in the appropriate fields.

TQC's Jira is divided into projects. Examples include: 'QTBUG', which contains issues related to the Qt Framework; and 'QTCREATORBUG', which contains issues related to Creator. The large projects are further divided into
4. https://bugreports.qt.io/browse/QBS-991
5. https://bugreports.qt.io/browse/QTBUG-62425
6. https://bugreports.qt.io/browse/QTCOMPONENTS-200
components, such as a Bluetooth component in 'QTBUG'. Each component has a responsible maintainer either from TQC's R&D department or the open source community. TQC's product management has more general responsibility for the projects. The projects and components are not isolated but have cross-project dependencies to each other, such as the Automotive Suite being built on top of the Qt Framework.

TQC operates in a meritocratic manner in which developers get promoted when they contribute to Qt and receive recommendations from other developers. This meritocratic structure is reflected in TQC's Jira. Anyone can register and report issues to TQC's Jira as well as view the full details of issues, follow issues, and add comments. However, only those who have received elevated rights can edit issues in order to preserve issue quality and the integrity of the issues.

To monitor overall progress, TQC uses dashboards for each release with swim lanes for the status categories 'not started', 'in progress', 'blocked', and 'done'. A dashboard is a feature in Jira to automatically filter, organize, and visualize a set of Jira issues based on their property values, such as the above release and status.

TQC's Jira is an independent deployment in a virtual machine in the Amazon cloud. In addition, TQC has a snapshot of this virtual machine as a test environment, which we use in our research. The snapshot was taken on November 29, 2019. In this snapshot, TQC's Jira is divided into 20 public, separate projects which can have cross-project dependencies to other projects in Jira. We used this same snapshot of the data for all tests in order to make the results comparable. Table 2 shows the number of issues and dependencies in Qt Framework, Creator, and 3D Studio, which are the three largest projects, and the remaining 17 other projects combined.

TABLE 2
The number of issues and dependencies in the three largest and other projects in total on the 29th November 2019.

Project          Issues   Internal dependencies   Cross-project dependencies*
Qt Framework     78,676   15,739                  1,811
Creator          21,926   3,126                   1,132
3D Studio        3,877    2,023                   133
Other projects   15,441   3,517                   1,307
* A dependency between two projects is counted in both projects.
Out of the total of 119,920 issues, 26,746 (22%) issues were modified within the past year (29.11.2018–29.11.2019), and 25,938 (22%) were open, i.e., not resolved, at the end of the period. Modifications include any changes, such as editing text, changing properties, or adding comments. In addition, TQC has about ten private projects in Jira for specific customers and product management, which contain a few thousand additional issues. For confidentiality reasons, these projects are not included in the data-set of this paper.
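For illustration, the issue structure described in this section (key, type, priority, status, resolution, release, and links) can be sketched as a small data model. The field names below are our own simplification, not Jira's actual schema:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Issue:
    """Illustrative model of a TQC Jira issue (simplified, hypothetical schema)."""
    key: str                            # project acronym + running number, e.g. 'QBS-991'
    type: str                           # 'Bug', 'Epic', 'User Story', 'Suggestion', 'Task'
    priority: int                       # 0 ('P0 blocker') .. 5 ('P5 not important')
    status: str                         # life-cycle state, e.g. 'Open', 'Closed'
    resolution: Optional[str] = None    # extra detail for status, e.g. 'Duplicate'
    fix_version: Optional[str] = None   # release, up to three parts 'x.y.z'
    links: List[Tuple[str, str]] = field(default_factory=list)  # (link type, other key)


issue = Issue(key="QBS-991", type="Bug", priority=2, status="Open")
issue.links.append(("relates", "QBS-912"))
# The project acronym can be recovered from the key prefix.
project = issue.key.split("-")[0]
```

Such a model is sufficient for the dependency and consistency analyses discussed later, since they only read issue properties and links.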
4 DRAWBACKS IN ISSUE MANAGEMENT
The first research question of our study results in a refine-ment of drawbacks related to the use of Jira at TQC.
Drawback 1. Limited view of the issue dependency network.

TQC's Jira users, when resolving an issue, typically need to take the issue dependency network into account. For instance, in the epic – user story – task hierarchy, a developer needs to consider all issues in the hierarchy. Another example is having two issues where one issue can be resolved only if a solution is found for the other.

As noted, Jira issues have Issue Links for dependencies. To explore the resulting issue dependency network beyond direct dependencies, a user needs to follow the dependencies from one issue to another. The drawback is that it is tedious and error-prone for TQC's Jira users to form an overall understanding of the network structure by following the links one by one, because Jira does not support any other ways to explore an issue dependency network — there is no view beyond the list of direct dependencies. Moreover, none of the features in Jira, such as searches and dashboards, can take any dependencies into account automatically, as the dependencies appear only on the issue pages. However, the issues of TQC's Jira constitute a set of large, disconnected networks comprising both internal and cross-project dependencies, in which the largest network consists of 8,952 issues.
Example 1.
Issue QT3DS-1802 has 15 dependencies to other issues, which in turn have another 59 additional direct dependencies. The network grows further similarly beyond these issues. A Jira user needs to open each dependent issue in order to see its details and how many – if any – dependencies there are in these dependent issues beyond direct dependencies. This means, in the worst case, separately opening dozens of issues, and keeping in mind what is dependent on what and how. This is practically impossible.
Drawback 2. Issues lack explicit dependencies.
Jira requires users to report dependencies among issues manually. Eventually, users may not report all of them, resulting in missing dependencies. TQC's Jira users have reported that this is a frequent situation and identified five different reasons behind missing dependencies:

• Unawareness. When reporting an issue, a user is not aware of all related issues and may completely miss the corresponding dependencies.
• Uncertainty. A user may be unsure whether a certain dependency is needed or not, and thus may mistakenly decide not to add it. In fact, it is customary in TQC's practices that uncertain dependencies are only mentioned in the description or comments of an issue rather than marked properly.
• Discrepancy. Users have different opinions on whether or not an explicit dependency is needed.
• Lack of time. Even when a user is completely sure about a dependency, adding it can be cumbersome, requiring several actions (clicks, scrolls, etc.).
• Lack of permissions. Not everyone is allowed to add dependencies. As said above, adding dependencies in Jira means editing an existing issue to modify its properties, and at TQC this operation requires elevated privileges.

In this situation, it becomes difficult (if not impossible) for TQC's Jira users to be aware of all dependent issues, considering the potentially large size of the dependency network as pointed out above (and manually searching for them is tedious and error-prone work). Missing dependencies may have critical consequences for activities like ensuring the integrity and quality of a release.

To understand the magnitude of possibly missing dependencies, we can take a closer look at the Qt 3D Studio project. A developer of Qt 3D Studio once stated that their aim is to use dependencies rigorously. As a result, 50% of the issues in the project have dependencies, compared to 25% in Qt Framework and 24% in Qt Creator. Because an issue can have multiple dependencies, another measure is the dependency-issue ratio, i.e., how many dependencies there are compared to issues. The ratio in Qt 3D Studio is 0.6, and 0.2 in the other two projects (cf. Table 2).
Example 2.
A Jira user commented on the issue QBS-881: "i see this task as being redundant with QBS-912 - close?" (sic). Another user responded and agreed in a follow-up comment. However, no-one declared an explicit dependency. As a consequence, dependencies do not appear properly but require reading the comments, which makes understanding the issue dependency network even more challenging. Furthermore, when a user, such as the reporter or a watcher of the bug, inspects whether the bug has been resolved, they can see that the bug has been closed as a duplicate, but they do not see in which version the bug has been fixed unless they notice the comment about duplication and open issue QBS-912. In practice, such comments often go unnoticed. On the other hand, users looking at QBS-912 cannot find QBS-881 to be its duplicate because the comment is not visible on that end.
Drawback 3. Duplicated issues are reported.
As anyone can report an issue, it is not uncommon that the same concept is reported more than once, resulting in very similar issues, which can be considered duplicates. For instance, a unique bug can be identified and reported by different users, or similar features can be requested several times. As found in [12], TQC's Jira users reported that it would be convenient to detect and link duplicates in order to better comprehend the structure of the issue network. On the other hand, it is also important not to delete any of them, because each similar issue can still have some original content of its own. For example, different issues reporting the same bug may contain a description in different contexts, making debugging easier, or may suggest slightly different solutions that provide novel insights.

Issue trackers offer limited features as support for modeling and identifying these duplicates. Jira offers 'duplicate' as a resolution property value to indicate that the issue duplicates another issue, and the 'duplicates' dependency to connect duplicate issues. Any issue that duplicates another issue should have a 'duplicates' dependency towards the duplicated issue, and have the resolution and status properties marked as 'duplicate' and 'done', respectively. However, this does not always happen. For instance, TQC's Jira has in total 8,150 (7%) issues marked as 'duplicate', of which 5,839 lack a 'duplicates' dependency. 4,925 of these issues have some other dependency, which in some cases can mean that, e.g., a 'relates' dependency is used incorrectly to denote duplication. Still, the remaining 914 issues do not have any dependency. In addition, it is possible that duplicate issues have simply been closed without setting the resolution, or that some duplicates have gone unnoticed.

Since duplicate dependencies are a type of dependency, the reasons for, and consequences of, missing duplicates are similar to the previous drawback.
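A consistency check of this kind — partitioning duplicate-resolved issues by whether their links reflect the duplication — can be mechanized. The sketch below assumes a simplified issue schema of our own, not Jira's REST model:

```python
def classify_duplicates(issues):
    """Partition issues resolved as 'Duplicate' by how their links reflect it.

    `issues` maps an issue key to a dict with a 'resolution' string and a
    'links' list of (link_type, other_key) pairs (illustrative schema).
    Returns (properly linked, other link only, no links at all).
    """
    properly_linked, other_link_only, no_links = [], [], []
    for key, issue in issues.items():
        if issue.get("resolution") != "Duplicate":
            continue
        link_types = {link_type for link_type, _ in issue.get("links", [])}
        if "duplicates" in link_types:
            properly_linked.append(key)
        elif link_types:
            # e.g. a 'relates' link possibly misused to denote duplication
            other_link_only.append(key)
        else:
            no_links.append(key)
    return properly_linked, other_link_only, no_links


# Hypothetical sample data mirroring the three cases discussed above.
issues = {
    "QBS-881": {"resolution": "Duplicate", "links": []},
    "QBS-100": {"resolution": "Duplicate", "links": [("relates", "QBS-99")]},
    "QBS-101": {"resolution": "Duplicate", "links": [("duplicates", "QBS-99")]},
}
ok, misused, missing = classify_duplicates(issues)
```

Run over the whole data-set, such a partition yields exactly the three counts reported above (properly linked, other link only, no links).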
Another shortcoming is that the TQC community can voice their opinion on issues by watching them. This is an indicator for TQC of the popularity of an issue. If there are duplicates of an issue and the watchers are split over all of them, TQC will not be able to hear the community voice properly, since that voice is incoherent. Thus, important information goes missing unless duplicates are detected.
Example 3.
Example 2 already represents a missing duplicate dependency, but likewise issue QTBUG-33588 contains three comments suggesting a link to three different issues: "May be related to QTBUG-3145", "Could be related to QTBUG-34552 Please, consider increasing priority of this issue since there's not work-around. Thanks." and "QTBUG-35085 is relevant as well since custom context menus are also broken." (sic) Even though QTBUG-33588 has three different comments suggesting a link to different issues, no link is marked in TQC's Jira. While QTBUG-33588 only has 6 watchers, the issues mentioned have 33 watchers altogether.
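Comments like these, which mention issue keys in free text, can be surfaced as candidate dependencies with a simple textual scan. The sketch below is a heuristic of our own, not the dependency detection technique of the solution; it relies only on the 'PROJECT-number' key pattern described in Section 3:

```python
import re

# Issue keys follow the 'project acronym + running number' pattern, e.g. 'QTBUG-3145'.
ISSUE_KEY = re.compile(r"\b([A-Z][A-Z0-9]+-\d+)\b")


def mentioned_issues(comments, own_key):
    """Collect issue keys mentioned in free-text comments, in order of first
    appearance, as candidate dependencies to be confirmed by a user."""
    found = []
    for text in comments:
        for key in ISSUE_KEY.findall(text):
            if key != own_key and key not in found:
                found.append(key)
    return found


comments = [
    "May be related to QTBUG-3145",
    "Could be related to QTBUG-34552",
    "QTBUG-35085 is relevant as well",
]
candidates = mentioned_issues(comments, "QTBUG-33588")
# candidates: ['QTBUG-3145', 'QTBUG-34552', 'QTBUG-35085']
```

Such candidates would still need review, since a mention does not always imply a dependency.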
Drawback 4. Incorrect release assignments and priorities in an issue dependency network.

As Qt has specific release cycles, it is relevant when issues are – or are planned to be – resolved when taking the issue dependency network into account. For example, an A requires B dependency means that the solution of A needs the solution of B to operate properly – it is not meaningful to implement or release A first, as its solution will not be useful without B. TQC's Jira users reported two practically relevant dependency rules:

• Parent-child rule. In a parent-child dependency, the children must be scheduled in the same or an earlier release than their parent, or have a lower priority.
• Requires rule. A required issue must not have a later release or a lower priority than an issue requiring it.

However, TQC's Jira users do not always set the dependencies, priorities, and releases of issues correctly, and the dashboards and filters – like practically all functionalities in Jira – are not able to take dependencies into account. As a result, the checks for dependency rule violations in an issue dependency network need to be carried out manually by inspecting the release, priority, and dependencies of each issue. Any violation that goes unnoticed can lead to an incomplete release. We found that over 12% of the dependencies in TQC's Jira violate the rules. All dependency rule violations must be manually located and corrected.
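For illustration, the requires rule can be expressed as a predicate over (priority, release) pairs. The sketch below uses an assumed simplified schema, compares releases as numeric version tuples, treats an unassigned release as scheduled last, and recalls that a larger P-number means lower priority; the parent-child rule can be checked analogously:

```python
def release_tuple(version):
    """Parse an 'x.y.z' release string into a comparable tuple.

    None (no release assigned) sorts after every concrete release, so a
    dependent issue left out of all releases is flagged (an assumption of
    this sketch, not a rule stated by TQC).
    """
    if version is None:
        return (float("inf"),)
    return tuple(int(part) for part in version.split("."))


def violates_requires_rule(requiring, required):
    """Requires rule: a required issue must not have a later release or a
    lower priority (larger P-number) than the issue requiring it.
    Issues are (priority, release) pairs in an illustrative schema.
    """
    requiring_priority, requiring_release = requiring
    required_priority, required_release = required
    return (release_tuple(required_release) > release_tuple(requiring_release)
            or required_priority > requiring_priority)


# A P0 issue requiring a P2 issue in the same release violates the rule.
assert violates_requires_rule((0, "5.12"), (2, "5.12"))
# A dependent issue not assigned to any release also violates it.
assert violates_requires_rule((2, "5.13"), (2, None))
```

Running such predicates over every dependency edge is how the share of rule-violating dependencies quoted above can be measured.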
Example 4.
An example of an incorrect release version with the 'requires' rule is issue QTBUG-72510. It has release version 5.13 and a sub-task that is not assigned to any release. An example of an incorrect priority is QTBUG-27426 (with priority P0) requiring QTBUG-28416 (priority P2). This violates the rule that a required issue cannot have a lower priority.
5 OBJECTIVES AND FEATURES FOR THE ENRICHMENT OF ISSUE MANAGEMENT
Following the Design Science process presented in Section 3, in this section we cover the objectives and scenario, and the background and concrete techniques of our solutions.
On the basis of the drawbacks enumerated in the previous section, we synthesize the objectives of our solution, which aims to improve dependency management in TQC's Jira:

• Users gain a better understanding of the existing issue dependency network in the surroundings of the issues they are working on.
• Users can search for missing dependencies and unidentified duplicate issues of the issues they are working on.
• Users can check the correct release assignments and priorities of the issue dependency network in the surroundings of the issues they are working on, and they can receive suggestions for resolving inconsistencies.

These objectives share three common characteristics. First, the objectives integrate into the current ways of working at TQC, being usable whenever needed without disturbing existing processes. Second, the objectives are about improving Jira so that their realization becomes integrated into the functionalities and, especially, the data of Jira. Third, the objectives address the context and surrounding dependent issues of the existing issues that the user is working on. That is, the objectives primarily address tool improvement rather than process improvements or changes at TQC.

In order to make the objectives more concrete, we illustrate an example scenario as follows: "Jane, a developer at TQC, is assigned to develop a solution for an issue A, which is a task in a user story for the next release. To understand A better, she opens a user interface that visualizes all dependencies and issues in the proximity of A. Jane also gets a notification about another issue that looks like a duplicate of A. She checks and confirms that the issues are duplicates, which means resolving the other issue and creating a 'duplicates' dependency between the issues. She also gets a notification that another issue is a part of the same user story A, but it is not assigned to the same release even though its priority is the same as in the user story.
Thisis a mistake in the release assignment that needs to be takeninto account and resolved before the release is complete.” The fundamental principle of our solution is that the rolesof dependencies in Jira can be first-class entities rather thanonly properties of issues. We approached this by handlingissues and dependencies as two separate entity types in agraph-like structure: issues are nodes, and dependencies aretyped (i.e., labeled) and directed edges between the nodes.This approach gives issues a context beyond their explicitproperties, revealing implicit constraints, e.g., the mutualaggregation of two issues through a dependency betweenthem. Moreover, dependencies can then have properties oftheir own, like issues have, such as a status and creationdate.We define what we call an issue graph as follows. Wedenote the set of all issues as R and the set of all depen-dencies between issues of R as D , i.e., D ⊆ R × R , where D is anti-reflexive, i.e., ∀ r i ∈ R : ( r i , r i ) / ∈ D ; and all edges are bidirected, i.e. ∀ r i , r j ∈ R : ( r i , r j ) ∈ D ⇐⇒ ( r j , r i ) ∈ D .That is, for every edge that belongs to the graph, there isalso the corresponding inverse edge where the semantics ofthe edge depends on the direction. For a particular issue r ∈ R , the issue graph is a symmetric connected graph G = ( R , D ) , where R ⊆ R and D ⊆ D , so thatall issues of R are reachable from r , i.e., for all issues r i ∈ R there is a path from r to r i and D includesall dependencies between the issues in R and only thoseones. A special case of G is an orphan issue r that has nodependencies, and thus for which R = r and D = ∅ .This definition of an issue graph is issue-centered and doesnot necessarily include all issues ( R (cid:40) R ) because thereis no path between all issues. However, the union of all G , denoted by G = (cid:83) G , contains all issues ( R ) anddependencies ( D ). 
Equivalently, every G is a component of 𝒢.

Given an issue r, we define G_p, called a p-depth issue graph, as an induced subgraph of G that includes all issues up to p edges apart from r and all dependencies between the included issues. That is, an issue is taken as the point of focus, and we follow all dependencies of that issue to neighboring issues and beyond, breadth-first, up to the desired depth. The rationale and benefit of a p-depth issue graph are that contexts of analysis of different sizes can be constructed automatically, without user involvement, to provide a given issue with the issues and dependencies in a specific proximity.

For an issue rᵢ, we can apply functions such as rᵢ.property(priority) to obtain its priority and rᵢ.property(release) to get its scheduled release. Similarly, dᵢ.property(status) yields the status property of a dependency dᵢ, with possible values 'proposed', 'accepted' or 'rejected', and dᵢ.property(score) gives a score value (0..1) representing the confidence level of the correctness or validity of the dependency.

These definitions provide the baseline of the background techniques of dependency management for formulating the techniques required by TQC's Jira users, addressing the objectives presented in Section 5.1. An issue graph (G) – or the issue graph corresponding to the entire issue dependency network (𝒢) – can be generated automatically from the information stored in Jira; therefore, any operation defined over an issue graph, or any transformation to another formalism (e.g., a constraint satisfaction problem (CSP)), can be computed from Jira, as we have effectively done. An issue graph does not need to affect Jira; rather, the graph can form a parallel, complementary structure. In particular, an issue graph (G) makes efficient issue management and visualization easier.
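To make the definitions concrete, the following Python sketch stores issues as nodes and dependencies as typed, bidirected edges, and extracts a p-depth issue graph breadth-first. The class, method names, and inverse-label map are hypothetical illustrations, not the paper's implementation:

```python
from collections import deque

class IssueGraph:
    """Sketch: issues as nodes, dependencies as typed, bidirected edges."""

    # hypothetical inverse labels so every edge has its inverse direction
    INVERSE = {"requires": "is required by", "duplicates": "is duplicated by"}

    def __init__(self):
        self.edges = {}  # issue id -> {neighbour id: dependency type}

    def add_dependency(self, src, dst, dep_type):
        if src == dst:
            raise ValueError("anti-reflexive: no self-dependencies")
        # store both directions, matching the bidirected-edge definition
        self.edges.setdefault(src, {})[dst] = dep_type
        self.edges.setdefault(dst, {})[src] = self.INVERSE.get(dep_type, dep_type)

    def p_depth(self, r, p):
        """Breadth-first extraction of the p-depth issue graph G_p around r."""
        seen = {r: 0}
        queue = deque([r])
        while queue:
            node = queue.popleft()
            if seen[node] == p:          # do not expand beyond depth p
                continue
            for nb in self.edges.get(node, ()):
                if nb not in seen:
                    seen[nb] = seen[node] + 1
                    queue.append(nb)
        return set(seen)                 # an orphan yields {r}

g = IssueGraph()
g.add_dependency("A", "B", "requires")
g.add_dependency("B", "C", "duplicates")
g.add_dependency("C", "D", "requires")
assert g.p_depth("A", 2) == {"A", "B", "C"}
```

Following edges breadth-first and stopping expansion at depth p yields exactly the induced neighbourhood described above, without any user involvement.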
In this subsection, we describe four concrete techniques of our solution, relying on the background techniques built on the concept of an issue graph. These techniques have been designed considering the objectives in Section 5.1. In particular, the techniques need to work in the context of TQC, such as providing near real-time response times even when managing large sets of issues, which may sometimes prevent the adoption of more sophisticated approaches.
Algorithm 1 ReferenceDetection(R, projectID)
  R: set of issues of an issue graph
  projectID: set of project IDs (e.g., "QTWB", "QTBUG")
  D_p = []: set of proposed dependencies
  for all r_i in R do
    for all p_i in projectID do
      toID[] = r_i.findStrings(p_i + "-" + [0-9]+)
      for all to_i in toID do
        D_p.add(r_i, to_i, 'dependency', 'proposed')
      end for
    end for
  end for
  return D_p

Automated detection of potential missing dependencies.
Section 4 showed how TQC's Jira users may neglect a significant number of dependencies. Therefore, TQC would greatly benefit from an automatic dependency detection procedure that informs Jira users about missing dependencies. This mitigates the burden of searching for the dependent issues, making it also less critical for users to be familiar with all other existing issues. It is possible to automatically detect missing dependencies using various techniques, including deep learning [14], active learning, and ontology-based approaches [15].

Our solution includes a reference detection technique for natural language text (see Algorithm 1). This simple technique was selected after prototyping more complex techniques (which did not meet the stringent time requirements and lacked proper training data), and based on recommendations from TQC's Jira users, who noted that dependencies are often only mentioned as a reference to another issue in the textually added content of an issue, i.e., the title, description, and comments (Section 4, see Example 2). The reference detection technique analyzes this textually added content of the issues by searching for sub-strings that represent an issue ID (Line 4 of Algorithm 1) and creates proposals for new dependencies whenever other issues are mentioned (Lines 5–7). The reference detection technique marks the found dependencies as 'proposed' (Line 6).
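The scanning step of Algorithm 1 can be sketched with a simple regular expression over the issue text. The function name and the proposal tuple format below are hypothetical, chosen only for illustration:

```python
import re

def detect_references(issue_text, project_ids):
    """Sketch of Algorithm 1: propose dependencies for issue IDs in text.

    Matches substrings such as 'QTBUG-1234' for each known project key
    and marks each hit as a 'proposed' dependency.
    """
    proposals = []
    for pid in project_ids:
        for match in re.findall(rf"{pid}-[0-9]+", issue_text):
            proposals.append((match, "dependency", "proposed"))
    return proposals

text = "Crash reproduced; see also QTBUG-7 and the workbench issue QTWB-12."
assert detect_references(text, ["QTBUG", "QTWB"]) == [
    ("QTBUG-7", "dependency", "proposed"),
    ("QTWB-12", "dependency", "proposed"),
]
```

A single linear scan per project key keeps the technique fast enough for near real-time use, which is why it was preferred over heavier learning-based detectors.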
Automated detection of potential duplicated issues.
The need for, solution to, and benefits of automatic duplication detection are much like the above because, as already noted in Section 4, duplicates result in a particular type of dependency in Jira. State-of-the-practice approaches use bag-of-words representations of natural language and measure the similarity between these representations using vector-space models [16]. Among these approaches, Term Frequency - Inverse Document Frequency (TF-IDF) is the theoretical baseline for the detection of duplicated entities or issues [17], [18]. More recent deep contextualized models, such as Google's BERT [19] or ELMo [20], are more suitable for complex information retrieval scenarios, but they introduce challenges in terms of efficiency, complexity, and the training data required [21]. These challenges make it difficult to use them in the TQC context of large issue dependency networks.

Our solution (see Algorithm 2) is an extension of the TF-IDF model based on three additional steps to improve
the accuracy and performance of the similarity evaluation.

Algorithm 2 DuplicateDetection(G, thr)
  G = (R, D): issue graph
  thr: similarity threshold score
  bow = []: bag of words
  clusters = []: set of sub-graphs of duplicated issues
  for all r_i in R do
    bow.add(text_preprocess(r_i))
  end for
  tfidf_model = build_model(bow)
  for all r_i, r_j ∈ R where i ≠ j and d_ij = (r_i, r_j) ∉ D do
    score = cosine_sim(r_i, r_j, tfidf_model)
    if score ≥ thr then
      D.add(r_i, r_j, 'duplicates', 'proposed', score)
    end if
  end for
  clusters = compute_clusters(R, D)
  return clusters

After initially running the title and description of each issue through a lexical analysis pipeline (Lines 1–4 of Algorithm 2), we build a TF-IDF model from the resulting bag-of-words representations (Line 5). Then, we apply cosine similarity over the resulting TF-IDF model to compare each pair of issues. Each resulting score is compared to a context-based minimum threshold value to decide whether a pair is a potential duplicate, in which case a new 'duplicates' dependency proposal is constructed (Lines 6–11).

After the similarity evaluation, we represent the duplicated issues as sets of complete graphs, where issues have an existing or proposed 'duplicates' dependency to other issues. We treat these sets of graphs as clusters: the process proposes sets of duplicated issues by simply including the issues belonging to the same cluster (Lines 12–13). During this process, we apply transitivity through existing duplicate dependencies to all issues belonging to the same cluster, which results in new duplicate proposals. Hence, instead of reporting all the existing and proposed 'duplicates' dependencies among them, we only report, for each other issue in the cluster, the duplicate dependency with the greatest similarity score. Given a sub-graph of m duplicated issues, the clusters can be reported using (m − 1) dependencies instead of representing all (m · (m − 1)/2) dependency objects, improving performance efficiency in data processing and transactions.
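The pairwise TF-IDF-with-cosine-similarity core of Algorithm 2 can be sketched as follows. This is a self-contained, simplified illustration with plain term-frequency weighting, no lexical pipeline, and no clustering step; it is not the ORSI implementation, and all names are made up:

```python
import math
from itertools import combinations

def tfidf_vectors(docs):
    """Build simple smoothed TF-IDF vectors for tokenised documents."""
    n = len(docs)
    df = {}
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    vecs = []
    for doc in docs:
        vec = {}
        for term in doc:
            tf = doc.count(term) / len(doc)
            idf = math.log((1 + n) / (1 + df[term])) + 1  # smoothed IDF
            vec[term] = tf * idf
        vecs.append(vec)
    return vecs

def cosine(u, v):
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    norm = math.sqrt(sum(x * x for x in u.values())) \
         * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def propose_duplicates(issues, thr):
    """Sketch of Algorithm 2's pairwise comparison against a threshold."""
    ids, docs = zip(*issues.items())
    vecs = tfidf_vectors([d.lower().split() for d in docs])
    proposals = []
    for i, j in combinations(range(len(ids)), 2):
        score = cosine(vecs[i], vecs[j])
        if score >= thr:
            proposals.append((ids[i], ids[j], "duplicates", "proposed",
                              round(score, 2)))
    return proposals

issues = {"QTBUG-1": "crash when opening file dialog",
          "QTBUG-2": "crash when opening file dialog on linux",
          "QTBUG-3": "documentation typo in qml tutorial"}
dups = propose_duplicates(issues, 0.7)
```

On this toy input, only the two crash reports exceed the threshold, while the unrelated documentation issue is ignored, mirroring how the threshold controls the precision/recall trade-off discussed in the evaluation.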
Contextualization of dependency proposals for an issue.
Contextualization is a practically necessary technique that takes the user's context into account and prioritizes the results of the detection techniques when a user fetches dependency proposals for an issue (r). We present in Algorithm 3 the resulting algorithm that aggregates these techniques into a holistic solution. The algorithm presumes that Algorithms 1 and 2 have been executed and that their results are stored and retrievable.

Algorithm 3 Proposals(r, D, D′, depth, orphan, property)
  r: issue of interest
  D = [(r, r_p), ...]: dependencies for r in TQC's Jira
  D′ = [(r, r′_p), ...]: dependencies for r stored as rejected
  depth = [p, f_depth]: minimum depth and its factor
  orphan = f_orphan: orphan factor (default value = 1)
  property = [[p, v, f], ...]: properties, values and factors
  D_p = []: set of proposed dependencies for r
  D_p.combine(references(r) + duplicates(r))
  for all d_p in D_p do
    if (d_p member of D) OR (d_p member of D′) then
      D_p.delete(d_p)
    else
      if r.distance(d_p.r_p) > p then
        d_p.score.multiply(f_depth)
      end if
      if d_p.r_p.orphan() then
        d_p.score.multiply(f_orphan)
      end if
      for all (p_i, v_i, f_i) in property do
        if r_p.property(p_i) == v_i then
          d_p.score.multiply(f_i)
        end if
      end for
    end if
  end for
  return D_p

First, our solution retrieves the stored results of the detection techniques for an issue r (Line 1 of Algorithm 3). If a dependency is proposed between the same issues by both techniques, retrieval includes merging these two proposals into one 'duplicates' dependency with an aggregated score. The 'duplicates' dependency type is applied because the reference technique does not propose any type. Because a proposal is more likely correct when two techniques detect it, the aggregation is simply the sum of the cosine similarity (0..1, see Algorithm 2) for duplicate detection and a default value for reference detection. This also prevents a proposal between two issues from appearing twice.

Next, the solution examines all proposals obtained (loop comprising Lines 2–18). As the detection techniques can result in proposals of dependencies for r that already exist in TQC's Jira or have already been rejected by users, these are filtered out from the combined proposals (Lines 3–4). For the remaining proposals, our solution applies two specific contextualizations that were developed based on the feedback of TQC's Jira users. Both rely on factors that are used to multiply, i.e., to increase (or decrease, if the factor is < 1), the score of a proposal:

• Issue graph based contextualization has the purpose of prioritizing proposals to issues that are not in close proximity in the issue graph and are therefore considered more valuable to the user. First, it increases the score of those dependencies from r to issues in different issue graphs, or in the same issue graph at a greater distance than the given minimum depth p (Lines 6–8). Second, it increases the score of dependencies from r to orphans; in this way, the orphaned, disconnected issues become easier to discover as part of an issue graph (Lines 9–11).

• Property based contextualization increases the score when a property of an issue in a proposed dependency has the same value as specified by the user, such as environment, project, or creation time (Lines 12–16). For example, if a user wishes to find duplicates from the Qt Framework project, the scores of those proposals whose issues are in this project are increased.

Algorithm 4 CheckConsistencyAndDiagnose(r, G)
  G = (R, D): issue graph for r
  D_i: inconsistent dependencies
  diag_d: dependency diagnosis
  diag_i: issue diagnosis
  mergeDuplicates(G)
  for all d in D do
    if inconsistent(d) then
      D_i.add(d)
    end if
  end for
  if D_i = ∅ then
    return('Consistent')
  else
    diag_d = FastDiag(r, D, sortByPriority(R − r))
    diag_i = FastDiag(r, sortByPriority(R − r), D)
    return('Inconsistent', D_i, diag_d, diag_i)
  end if
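The multiplicative re-scoring of Algorithm 3 can be sketched as follows. The factor values, data shapes, and function names are made up for illustration; as a simplification, issues in a different issue graph are signalled by a distance of None and receive the same depth factor as distant issues:

```python
def contextualize(proposals, existing, rejected, distance, min_depth,
                  f_depth=1.5, property_factors=()):
    """Sketch of Algorithm 3: filter known/rejected proposals, then
    multiply scores by context factors (factor values are made up).

    proposals: list of dicts with 'target', 'score', 'properties'
    distance:  function mapping a target issue to its graph distance
               from r (None for an issue in another graph)
    """
    result = []
    for p in proposals:
        if p["target"] in existing or p["target"] in rejected:
            continue  # already in Jira, or already rejected by a user
        d = distance(p["target"])
        if d is None or d > min_depth:
            p["score"] *= f_depth      # prioritise distant/disconnected issues
        for prop, value, factor in property_factors:
            if p["properties"].get(prop) == value:
                p["score"] *= factor   # prioritise user-specified properties
        result.append(p)
    return result

props = [{"target": "QTBUG-9", "score": 0.6, "properties": {"project": "QTBUG"}}]
out = contextualize(props, existing=set(), rejected=set(),
                    distance=lambda t: None, min_depth=3,
                    property_factors=[("project", "QTBUG", 1.2)])
```

Because the factors only multiply the detection scores, the two contextualizations compose freely and can be tuned independently of the underlying detectors.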
Automated consistency check and diagnosis of inconsistencies.
Dependencies between issues need to be considered when analyzing the correctness of release assignments or priorities in issue graphs. The existing release planning models (cf. [5], [6]) are techniques for the task of finding an optimal release assignment from existing requirements by assigning requirements to releases. Since the release assignment task is not a problem at TQC, the existing release assignments instead need to be checked for consistency. When an issue graph is represented in a more machine-understandable manner, a consistency check is an elementary operation that can be automated. In addition, a diagnosis can identify minimal sets of constraints whose removal restores consistency. The first diagnosis algorithm, HSDAG (Hitting Set Directed Acyclic Graph) [22], uses breadth-first search to find all minimal sets of constraints that could be deleted to restore consistency. Several improved diagnosis algorithms have been developed (e.g., [23]). Clearly defined dependency types (e.g., [7], [24], [25]) form the basis for any automation.

In our solution for consistency check and diagnosis, we utilize 'requires' and 'parent-child' dependencies, which have well-defined semantics that take priorities and release assignments into account; the details are described in Drawback 4 in Section 4. In addition, our solution merges issues with a 'duplicates' dependency between them, and the resulting merged issue inherits all dependencies from the merged issues; this is the first step (Line 1 of Algorithm 4). The consistency check is a procedural method that evaluates, for each dependency, whether the conditions of the dependency are satisfied, and reports the violated dependencies (Lines 2–6).

If the issue graph contains inconsistent dependencies,
diagnosis can be invoked. We adopted FastDiag (see details in [26]), which is an efficient divide-and-conquer algorithm used to determine preferred diagnoses of constraint sets. The diagnosis applies a CSP representation of an issue graph where dependencies, priorities, and releases are constraints. The constraints are assumed to be in a lexical order according to their priorities: a higher priority constraint is retained if at all possible, even if all lower priority constraints would have to be removed.

Fig. 2. The software architecture of the artifact.
The issue diagnosis (Line 9) identifies a set of issues that need to be assigned to a different release, re-prioritized, or removed to restore the consistency of the network. For this diagnosis, each issue is considered as a constraint that can be relaxed or 'diagnosed away'. The dependency diagnosis (Line 10) determines a set of dependencies whose removal from the issue graph restores the consistency.
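A minimal sketch of the consistency check step (Lines 2–6 of Algorithm 4) follows. The exact dependency semantics are defined in Drawback 4 of Section 4 and are not reproduced here, so we assume, purely for illustration, that an issue must not be scheduled to an earlier release than an issue it requires; all names and data shapes are hypothetical:

```python
def check_consistency(issues, dependencies):
    """Illustrative consistency check over 'requires' dependencies.

    Assumed semantics (for illustration only): a requiring issue must
    not be assigned to an earlier release than the issue it requires.

    issues: {id: {"release": int}}
    dependencies: [(src, "requires", dst)]
    Returns the list of violated dependencies.
    """
    violated = []
    for src, dep_type, dst in dependencies:
        if dep_type == "requires":
            if issues[src]["release"] < issues[dst]["release"]:
                violated.append((src, dep_type, dst))
    return violated

issues = {"A": {"release": 2}, "B": {"release": 3}}
# A (release 2) requires B (release 3): A would ship before its prerequisite
violations = check_consistency(issues, [("A", "requires", "B")])
```

Such per-dependency checks are cheap and local; the expensive part, as the evaluation shows, is the diagnosis over whole issue graphs, which is why the solution delegates it to FastDiag over a CSP encoding.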
6 ARTIFACT IMPLEMENTATION
In this section, we describe the developed artifact (cf. Section 3). We first elaborate on the design objectives of the artifact, which we derived from TQC's Jira users, and then describe the implementation.
We articulate the design objectives using the eight ISO 25010 quality model characteristics [13].

• Functional suitability. The artifact needs to implement the techniques described in the previous section in the context of TQC's Jira.
• Performance efficiency. The artifact needs to efficiently handle a large number of issues, and the efficiency of Jira may not be unacceptably damaged. TQC's Jira users informally estimated this goal as "responses even to the largest requests within a few seconds".
• Compatibility. The artifact itself needs to be compatible (co-exist and interoperate) with Jira's functionality and data without the need to develop additional software for interchanging data or accessing functions.
• Usability. The usage of the artifact needs to be smoothly integrated with Jira and the way of working at TQC.
• Reliability. The integration of the artifact and its data should not interfere with Jira's current issue management. TQC's Jira users had as their top priority to avoid any risk concerning their current Jira management.
• Security. The solution must not compromise private data, especially from non-public projects, and must adhere to TQC's access policies.
• Maintainability. The architecture needs to support easy evolution and extension as Jira evolves, as well as allow for easy integration of new techniques.
• Portability. The solution should not be strongly tied to any particular technology except Jira, or impose unnecessary additional installation decisions.
We implemented the artifact as a Jira plugin and a service-based system consisting of independent microservices (→ maintainability, compatibility), which in practice operate in a choreographic manner following a layered architectural style. The services collaborate through JSON-based messages following a generic ontology [27] that adheres to REST principles (→ portability). There are three classes of microservices and the plugin, as summarized below and in the architecture diagram in Figure 2.
1. Integration microservices.
First, one microservice (Milla) integrates with Jira, fetches issue data, and constructs dependencies as separate, first-class entities. We realized the integration by using Jira's existing OAuth-based REST API (→ portability, security). A full projection of TQC's Jira issues is made, and relevant information is cached to provide more efficient access to issue data (→ efficiency). The resulting issue and dependency data from Jira is cached in a local database embedded in an auxiliary integration microservice (Mallikas). Frequent updates fetch new and changed issues from TQC's Jira (→ compatibility).
2. Detector microservices.
After the data projection has completed, the integration service (Milla) sends the resulting issues and dependencies – or their changes when updating – to the detector microservices for processing (→ efficiency). The reference detector (Nikke) searches for missing dependencies (i.e., implementing Algorithm 1 presented in Section 5) and the similarity detector (ORSI) searches for duplicated issues (implementing Algorithm 2). As users make a limited amount of references, the former (Nikke) is stateless. It returns proposed dependencies ('proposals' in Figure 2), which are then stored in the same local database (Mallikas) as the existing dependencies, applying the 'proposed' value for the status property. However, the similarity detector (ORSI) requires persistence on the service side to optimize the similarity computation due to clustering and vector-based algorithms. Therefore, the proposals are stored internally in the cluster (→ efficiency).

Fig. 3. Two screen captures of Issue Link Map for the issue QBS-585 of TQC Jira. The p-depth issue graph to depth five (G_5) and the properties of the issue (r) from Jira are shown on the left. The consistency check tab is shown on the right.
3. Model microservices.
The integration service (Milla) also sends the issue graph (𝒢) to the model microservices (Mulperi and KeljuCaaS). These microservices translate the issue graph into a more general knowledge representation, and store the data as a map datatype with issues as keys and a list of each issue's neighbors along with the corresponding dependency types. The way in which the issue graphs are stored allows the easy extraction of various p-depth issue graphs (G_p) by following the dependencies recursively to the required depth (→ efficiency).
4. User interface plugin.
Users interact through a dedicated Jira plugin (Fisutankki) installed in TQC's Jira. The plugin technology integrates the user interface into Jira and into Jira's security mechanisms (→ usability, compatibility). This allows public access, where authenticated users adhere to Jira's security schema (→ security).

On the users' side, Issue Link Map [28] (Figure 3) is embedded in the Jira plugin (Fisutankki), which creates a browser-based user interface (→ usability, compatibility). A central part of the user interface is a 2D representation of a p-depth issue graph (G_p). The issue (r) in focus is in the center, and the other issues are automatically positioned around it circularly, depending on their depth. A user can select the desired depth, up to depth five, from the top left, rearrange the issues, zoom in and out, etc. The colors indicate the status of the issues. A set of filters, such as type or status, can be applied to the visualization. On the right, various tabs present the other techniques. The first tab shows the basic information for the selected issue as in Jira, because the 2D diagram cannot convey all the details of an issue. The second tab shows dependency proposals, which are then also shown in the 2D diagram as dashed lines. The third tab shows the results of the consistency check.

The user interface accesses the functionality provided by the other services through REST calls, which we refer to as queries in Figure 2. Each query goes through the plugin (Fisutankki), which applies Jira's security policies. Then the integration microservice (Milla) orchestrates all queries to the other microservices (→ maintainability). The elementary functionality to initiate the user interface is to query an issue graph to depth five (G_5) from the model microservices, which the user interface visualizes to the desired depth.

The integration microservice (Milla) processes a user's query for a dependency proposal implementing Algorithm 3. First, it combines reference proposals (in Mallikas) and similarity proposals (in ORSI), and removes rejected proposals (stored in Mallikas). Second, it calls the model services for the desired p-depth issue graph (G_p) to apply the issue graph based contextualization. Third, it queries the cached data (in Mallikas) for the property based contextualization. A user can accept a proposed dependency, which requires them to specify its type, or reject or disregard the proposal. Provided that the authorized user has sufficient privileges, the plugin (Fisutankki) writes accepted decisions to Jira as new dependencies, while the local database (Mallikas) stores rejection decisions.

The integration service (Milla) forwards a user's query for consistency check and diagnosis to the model services, which first construct an issue graph (G_p) internally and prepare the data for inference, such as translating the version numbers to integers (in Mulperi). Then the consistency check is carried out and, in case of inconsistency, the issue graph is read into constraint programming objects and the Choco solver [29] (in KeljuCaaS) is used to infer the diagnosis (Algorithm 4).

The microservices are deployed to the same server as TQC's Jira, which then relies on the server's security mechanisms ('server boundary' in Figure 2). Although the microservices use secure communication, the data is not transferred to other servers, remaining behind the server's firewall; only the plugin's (Fisutankki) REST endpoint is publicly accessible (→ security).

7 EVALUATION
The evaluation focused on verification of the microservices by system tests for functionality (Sections 7.1–7.3) and performance (Section 7.4), and validation of the solution by user interviews (Section 7.5). Table 3 summarizes the metrics for functionality, and we measured performance by the execution times.

TABLE 3. A summary of evaluation, metrics, and used data-sets (columns: Technique, Metric Id, Description, Data-sets).

Throughout the evaluation, we had our artifact deployed to TQC's test environment (cf. Section 3.2), and we used its full public data – referred to as Qt Repository – consisting of 119,920 issues in 20 different projects and their 29,582 dependencies. Additionally, for comparability, we executed all microservice verification tests on the same Linux computing node with a single Intel Xeon CPU E7-8890 v4 2.20GHz processor and 50GB of memory, located at the University of Helsinki, Finland.

We did not measure reliability, but we did not encounter problems with reliability during the test period. We experimented with various test setups, and the final tests took over a week without discontinuity of service. A small number of performance tests behaved abnormally, such as 9 out of 119,920 (0.0075%) dependency queries, which should take roughly the same time but in practice took more than twice the average time. Since we could not reproduce the behavior, we assume that it was caused by the infrastructure, such as Java's garbage collection. The system was also operational, although only experimentally used, for several months in TQC's test Jira without discontinuity of the service.
Evaluation design and data-set.
For the evaluation of the background techniques (Section 5.2), we carried out an exploratory analysis of Qt Repository. This included the evaluation of metrics related to the topology and size of the generated p-depth issue graphs, as shown by the first block in Table 3.

Evaluation results. In total, 31,182 issues (26%) have at least one dependency declared by TQC's Jira users via Issue Links in Jira, meaning that 88,738 issues (74%) are orphans before any automated dependency detection. Out of the issues that have dependencies, 75% have only one dependency. The average is 1.7 and the median is 1. As noted in Section 3.2, issues are sometimes used for grouping, which explains why the maximum number of dependencies is 139 and why 24 issues have at least 50 dependencies. Generating all different p-depth issue graphs for all issues (i.e., ∀rᵢ ∈ ℛ we generated Gᵢ_p ∀p ∈ [1, n] so that Gᵢ_n = Gᵢ) resulted in 320,159 issue graphs. By analyzing the number of issues in the various p-depth issue graphs, we observed that the largest issue graph consists of 8,952 issues, and the maximum depth in its topology is 42. This issue graph is exceptionally large, with a large number of subgraphs, as the next largest maximal issue graph consists of 162 issues with a maximum depth of 16. Finally, inspecting the number of issues in all different p-depth issue graphs reveals that there are many dependencies but also disjoint issue graphs (G), including orphans. However, the number of issues in p-depth issue graphs can often be quite large, and it grows rapidly, even exponentially, as a consequence of the average dependency count but also of the grouping issues in the topology.

Evaluation design.
We quantitatively evaluated the results of the reference detection (Nikke) and the duplicate detection (ORSI), as well as the union and the intersection of their results. We also report a statistical quality analysis by running a cross-validation analysis with k = 10 for a sub-set of labeled potential duplicated issues (the second block in Table 3).
Data-sets.
The analysis was carried out for each issue in the following data-sets.

• Qt Repository. All issues and their dependencies.
• Duplicate set. A sub-set of Qt Repository consisting of 5,839 issues marked as duplicates without a 'duplicates' dependency (see Drawback 3 in Section 4). As these issues were duplicates, we assumed a duplicating issue in Qt Repository.
• Duplicate set. A sub-set of Duplicate set consisting of 914 issues without any dependencies.

TABLE 4. The results of dependency detection in terms of the metrics defined in Table 3.
Data-set        Detector              Issues with proposals (%)   Proposed dependencies
Qt Repository   Reference detection   24,097 (20%)                31,646
                Duplicate detection   45,570 (38%)                578,739
                Union                 60,250 (50%)                610,348
                Intersection          1,727 (1%)                  1,801
Duplicate set
TABLE 5. Cross-validation results of detectors for Cross-validation set.

Measure      Reference detection   Duplicate detection
Accuracy     77.15%                91.66%
Recall       53.31%                86.15%
Precision    100.00%               96.42%
F-measure    69.54%                91.00%

• Cross-validation set. A sub-set of 2,936 pairs of issues without existing dependencies in Qt Repository, structured as follows. On the one hand, 1,437 pairs of issues reported as duplicates in TQC's Jira that we labeled as duplicates. On the other hand, 1,499 pairs of randomly selected closed issues with no duplicate resolution reported in TQC's Jira that we labeled as not-duplicates.

Evaluation results.
The results of the quantitative analysis for the first three data-sets are shown in Table 4. In the case of issue graph based contextualization, only 2% of the proposals were three edges apart or closer in Qt Repository, and all of them resulted from duplicate detection (ORSI). Table 5 shows the results of the cross-validation analysis for the detector services. We compare both detectors although reference detection is not designed only for duplicate detection, and therefore the results must be interpreted with this in mind.

The quantitative analysis shows that the detectors have the potential to expand the issue dependency network by proposing a significant number of novel dependencies. The number of issues for which reference detection makes proposals is relatively large, but the number of dependencies for one issue is small: on average, 1.4 proposals for the issues for which a proposal is made. A few issues have several proposals, but an analysis of a sample showed that dependent issues were then gathered as a list or table (cf. Section 3.2). Duplicate detection finds proposals for many issues and results in many proposals per issue, especially considering that the proposals are about duplicates: 38% of issues cannot all be duplicates, so the results include false positives. Likewise, the number of issues in Qt Repository (119,920), compared to the number of proposed dependencies (578,739), indicates false positives. Only a small number of false positives can be explained by closely connected issues, such as between the children of an epic, based on issue graph based contextualization.

Duplicate detection reports balanced quality metrics, with special emphasis on high precision. Compared with the data in Table 4, our solution tries to reduce false positive instances as much as possible, given the large number of issues and, as a consequence, the large number of dependency proposals. This idea is reinforced when compared with the reference detection results, where perfect precision is achieved. For reference detection, the low recall is expected, but the high precision is unexpected. However, a qualitative analysis of the sample revealed that it is customary to add a comment about duplication, which explains the high precision. The precision and the small number of proposals of reference detection were used to justify its default score of 1.0, while experimentation with different cross-validation settings was used to select the threshold of 0.7 (in Algorithm 2) for duplicate detection.
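Threshold selection of this kind can be sketched as follows: given labelled pairs from a cross-validation fold, compute the F-measure at candidate thresholds and keep the best. The data, candidate thresholds, and function name are made up for illustration:

```python
def f1_at_threshold(pairs, thr):
    """Precision/recall/F-measure of a similarity threshold on labelled pairs.

    pairs: [(similarity_score, is_duplicate)] from a labelled fold.
    """
    tp = sum(1 for s, dup in pairs if s >= thr and dup)
    fp = sum(1 for s, dup in pairs if s >= thr and not dup)
    fn = sum(1 for s, dup in pairs if s < thr and dup)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)

# toy labelled fold: (cosine score, is a true duplicate)
pairs = [(0.9, True), (0.8, True), (0.75, False), (0.4, False), (0.3, True)]
best = max((0.5, 0.6, 0.7, 0.8), key=lambda t: f1_at_threshold(pairs, t))
```

In practice a grid of thresholds is scored per fold and the scores averaged across the k folds, trading recall for the high precision that the users of a large repository need.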
Evaluation design and data-set.
Using Qt Repository as the data-set, we analyzed the consistency of each dependency individually, i.e., taking into account the dependency and the issues at both ends, as well as the consistency and diagnosis of all p-depth issue graphs (G_p). However, since we noticed that different Jira projects do not have comparable and machine-understandable version numbering, we disregarded all cross-project dependencies from the analysis. As diagnoses turned out to be computationally heavy operations, we set a time limit of five seconds for each p-depth issue graph and did not carry out diagnosis at any greater depth. A five-second limit was considered reasonable from the user's perspective. This limitation was also necessary as the tests already took over a week, and a larger limit or removing the limit would have required a significantly longer time or a design change with little practical value.

Evaluation results.
The consistency check for each dependency individually found inconsistencies in 780 (20%) of the 'requires' dependencies and 884 (11%) of the 'parent-child' dependencies. The results of the consistency check and diagnoses for all 320,159 p-depth issue graphs are summarized in Table 6 by depth, up to a depth of 10 (G_1 ... G_10), to draw an overview of the evolution of inconsistencies with issue graph depth. With respect to issue graph sizes, the first unsuccessful and the last successful execution of the issue diagnosis were carried out for issue graphs of 371 and 701 issues, respectively. The respective numbers for the dependency diagnosis were 580 and 1,362.

We observe that a significant amount (11–20%) of dependencies are inconsistent. However, some of the inconsistencies result from new issues that have not yet been assigned to a release. Inconsistency becomes prevalent for issue graphs at any greater depth, as shown by the decreasing p-depth-consistency, presented as a percentage in Table 6 (the 3rd row). Moreover, the number of detected inconsistencies increases significantly with greater depths of issue graphs. There are already dozens of inconsistencies at quite small depths, as shown by the first two rows of Table 6.

TABLE 6. A summary of consistency check and diagnosis results up to a depth of 10 (G_1 ... G_10).

Depth / Measure   1      2      3      4      5     6     7     8     9     10
(%)               100%   100%   100%   99%    91%   69%   54%   39%   28%   21%
(%)               100%   100%   100%   100%   98%   80%   67%   55%   41%   32%

Success is measured by not exceeding the time limit (5 seconds), since all other diagnoses found a solution.
When considering the success of the diagnosis (issue-diagnosis-success (%) and dependency-diagnosis-success (%) in Table 6), the diagnoses start to fail, i.e., take more than five seconds, from depth 4, and the success rate falls quite rapidly at any greater depth. At small depths, when all diagnoses are successful, we see that the diagnosis of dependencies essentially proposes to remove all inconsistent dependencies, while the diagnosis of issues requires changes to the priority or release of a significantly smaller number of issues. The relatively small increase in these numbers as depth increases means that only the smallest issue graphs are diagnosed successfully; there is a large variance in the issue graph sizes at greater depths, as covered above. The evaluation shows that the implemented diagnosis is functionally successful, although it is computationally so expensive that issue graphs containing over 1,000 issues are not practically meaningful to diagnose. However, a qualitative analysis of the diagnosis results revealed that lexical order does not always work properly when dependencies are not clearly prioritized and issues fall into only a few priority classes.
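The five-second budget described above can be enforced, for instance, by running each diagnosis in a worker and abandoning it on timeout. This is only a generic Python sketch, not the system's actual mechanism; note that a thread-based timeout cannot stop the underlying computation, it only stops waiting for it.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

def diagnose_with_budget(diagnose, graph, budget_s=5.0):
    """Run a potentially expensive diagnosis, giving up after `budget_s`
    seconds. Returns (succeeded, result); on timeout the result is None and
    the abandoned worker keeps running in the background."""
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        future = pool.submit(diagnose, graph)
        try:
            return True, future.result(timeout=budget_s)
        except FutureTimeout:
            return False, None
    finally:
        pool.shutdown(wait=False)

# A fast "diagnosis" succeeds within the budget...
ok, result = diagnose_with_budget(sorted, ["B", "A"], budget_s=5.0)
print(ok, result)  # True ['A', 'B']

# ...while a slow one is reported as unsuccessful.
slow = lambda g: (time.sleep(1.0), g)[1]
print(diagnose_with_budget(slow, [], budget_s=0.05))  # (False, None)
```

Counting timeouts of this kind per depth is exactly how the success percentages in Table 6 are defined: a diagnosis is "successful" when it returns within the budget.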
Evaluation design.
We divided the performance evaluation into (i) background tasks including updates, which are batch processes, and (ii) queries, which are usage scenarios. In order to evaluate the background tasks, including updates, individually in the different microservices, we divided them into data projection from Jira, which also covers processing dependencies, and processing in both detectors. We report the average times of five test runs to eliminate random errors. For the evaluation of the queries, we applied the various usage scenarios to the microservices as an orchestrated end-to-end system, measuring the time from sending a user's query request to receiving a response. This corresponds to the time for submitting a query to, and getting a response from, the integration service (Milla in Figure 2). Since we focus on the microservices, we omitted user interface rendering and Jira plugin functionality. We analyzed execution times in the data-sets for the dependency query for all issues, and for issue graph initialization, consistency check, and diagnosis for all p-depth issue graphs.
Evaluation data-sets.
We applied various data-sets for evaluation, as detailed below.
• Qt Repository. All issues and their dependencies.
• Large issue graphs. A sub-set of Qt Repository containing all p-depth issue graphs for any p with at least 8,000 issues, which integrate 82,640 different issue graphs. We use this data-set for the worst-case scenario.
• Sizeable issue graphs. A sub-set of Qt Repository containing all p-depth issue graphs for any p with 500-1,000 issues, which integrate 14,783 different issue graphs. We use this data-set to represent a possible large-case scenario that a user might be interested in, being similar to the largest 5-depth issue graphs.
• Update data-set. The small project (QTWB) as a sub-set of Qt Repository, consisting of 27 issues and 9 dependencies, used to simulate an update. This data was first manually removed from Qt Repository and our system.

TABLE 7
Performance analysis results.

Task (Data-set)                       Technique                        Time
Data processing (Qt Repository)       Data projection (Milla)          40 m
                                      Reference processing (Nikke)     31 m
                                      Similarity processing (ORSI)     4 h 34 m
Update processing (Update data-set)   Data update projection (Milla)   4.4 s
                                      Reference processing (Nikke)     1.4 s
                                      Similarity processing (ORSI)     28.6 s
Queries (Qt Repository)               p-depth issue graph query        0.3 s
                                      Dependency query                 1.7 s
                                      Consistency check query          1.9 s
                                      Diagnosis                        —
Queries (Large issue graphs)          p-depth issue graph query        0.7 s
                                      Consistency check query          4.7 s
Queries (Sizeable issue graphs)       p-depth issue graph query        0.01 s
                                      Consistency check query          0.2 s
Evaluation results.
The results of the performance evaluation are summarized in Table 7 as average execution times. Data transfer between servers took the majority of the time in the data projection, but even when all software is deployed to the same server, we found that data projection takes several minutes because of the large amount of data and Jira's inefficient REST interface, which requires fetching issues as sets of individual issues. The p-depth issue graph queries are fast, and they depend on the size of the issue graph because many issue properties are returned, making the returned data large. The execution times of dependency queries have a small variance and do not depend on data size: the minimum time was 1.3 seconds, and 62 queries took over 2.5 seconds, of which 25 returned fewer than 10 proposals. The time required for the consistency check appears to increase almost linearly with the number of issues. The data has minor variation, as 0.15% of the queries take 10-17 seconds. We do not present average times for diagnosis because diagnoses for large graphs were not calculated; diagnosis under a five-second limit was discussed in the previous sub-section.

The evaluation results show that the initial operations take hours, but they are performed as a batch process upon system initialization. Updates are then relatively fast, up to tens of seconds. Queries other than diagnosis are within reasonable limits for a user, as they take less than five seconds on average, even for the largest issue graphs. Moreover, the tests with Sizeable issue graphs show that operations are fast and that even diagnoses are then feasible, as discussed above. Although we did not measure the time required for authorization and visualization in the Jira plugin, we have not experienced any significant delays.
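As an illustration of what a p-depth issue graph query computes, a plain breadth-first traversal over an in-memory adjacency map suffices. The function below is a minimal sketch, not the microservice code, and it assumes dependencies are treated as undirected for context building.

```python
from collections import deque

def p_depth_issue_graph(root: str, neighbours: dict, p: int) -> set:
    """Collect all issues reachable from `root` over at most `p` dependency
    hops, i.e. the node set of the p-depth issue graph around `root`."""
    seen = {root: 0}           # issue key -> depth at which it was reached
    queue = deque([root])
    while queue:
        issue = queue.popleft()
        depth = seen[issue]
        if depth == p:         # frontier reached: do not expand further
            continue
        for nxt in neighbours.get(issue, ()):
            if nxt not in seen:
                seen[nxt] = depth + 1
                queue.append(nxt)
    return set(seen)

# Toy dependency chain A - B - C - D.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(sorted(p_depth_issue_graph("A", graph, 2)))  # ['A', 'B', 'C']
```

The traversal itself is linear in the size of the resulting graph, which matches the observation above that query time is dominated by the amount of issue data returned rather than by graph construction.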
Validation study design.
We validated the artifact by interviewing five of TQC's Jira users: two release managers, one software architect, one product manager, and one developer. We interviewed each respondent individually, following a semi-structured approach consisting of an introduction and parts on issue dependencies (especially their visualization), consistency check and diagnosis, dependency proposals, and updates. We had prepared and printed a set of slides, but only some example screenshots and diagrams were shown to the respondents on paper when they were needed to explain something; the slides also contained the questions to which the interviewers sought answers [30]. Each respondent had been instructed to use the system beforehand. During the interviews, they were asked to use a shared meeting room monitor to demonstrate and explain the tasks, while the interviewers voice-recorded the sessions and took notes.
Results.
The users appreciated very different functionalities, although they understood that the other functionalities could be important for other roles or tasks. For example, two users considered finding duplicates the key functionality, while the others did not consider duplicate detection relevant to their daily work. Duplicate detection was also considered important for large projects and less so in small projects. The existing dependencies and larger issue graphs are especially important and challenging for R&D team leads and product managers, who valued visualization. One user summarized vividly: "Using Jira is like looking through a keyhole".

Although our solution relies on a data projection from Jira that can be out of sync when issues are updated, the users commented that even day-old information is usable, although a practical update interval would be from a few minutes up to an hour, especially during the busy days before a release.
Issue graphs.
The respondents liked the p-depth issue graph and its visualizations as a means of capturing information at a glance. The users considered depths 2-4 the most relevant; a 5-depth issue graph already showed too much information. One user discussed representing the parent-child hierarchy better, while acknowledging that it is difficult to visualize without ending up with a very wide view and that it is a very implementation-specific challenge. Likewise, another user mentioned a release as another relevant viewpoint. The users also commented on the user interface. A recurring comment concerned adding more information, such as tooltips or additional information shown by hovering the mouse cursor.

7. https://github.com/ESE-UH
Dependency detection.
Finding duplicated issues was considered the most practical technique, although other types of missing dependencies were also acknowledged. The users felt that detection could take place in different phases and tasks, mentioning creating, triaging, resolving, and managing issues, and making releases. The time around releases is especially critical for finding duplicates, although the earlier the duplicates are found the better, especially if the reported issue turns out to be a blocker. Nobody considered false positives or incorrect proposals to be a problem, because a proposal needs to be checked manually anyway and proposals can always be disregarded; false negatives, i.e., undetected duplicates, were considered much more inconvenient. In particular, one user noted that duplicate detection could also be used to find similar older issues in order to find out how they were resolved or who resolved them, so that those users could be asked for help or even to resolve similar open issues. Our solution of storing rejected dependency proposals and not showing them again to any user was considered workable, although a more delicate approach could be applied: a rejection decision is context- and sometimes user-specific, and it should be possible to revise such decisions. In particular, if an issue is changed, the rejection decision should be re-evaluated. Additional desired functionality was that the detectors should detect when issues have changed and the existing dependency between them has become obsolete. In contrast, predicting the type of a dependency was not considered important or even feasible.
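As a rough illustration of similarity-based duplicate proposals (the actual detectors, such as the similarity processing in ORSI, are more elaborate), a bag-of-words TF-IDF representation with cosine similarity can already rank candidate duplicates; the function names and toy summaries below are illustrative.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Compute simple TF-IDF vectors (term -> weight) for issue summaries."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    df = Counter(term for doc in tokenized for term in set(doc))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}  # +1 keeps ubiquitous terms
    return [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in tokenized]

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

summaries = [
    "crash when opening settings dialog",
    "application crash opening the settings dialog",
    "add dark theme support",
]
vecs = tfidf_vectors(summaries)
# Propose the most similar other issue as a duplicate candidate for issue 0.
scores = [(cosine(vecs[0], vecs[j]), j) for j in range(1, len(vecs))]
best = max(scores)
print(best[1])  # index of the proposed duplicate
```

In practice such scores would only rank proposals for a human to confirm or reject, which matches the users' view above that false positives are tolerable but missed duplicates are not.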
Consistency check.
The users considered consistency checking to be relevant, especially in larger projects where the complexity and size of the issue dependency network have grown. Such a large project at TQC contains several parallel versions and multiple R&D teams. In small projects, the users did not consider consistency checks necessary, because the users can manage consistency manually. One user reported that, on the one hand, the consistency check would be more valuable if the processes inside TQC were more rigorous and issues contained fewer inconsistencies. On the other hand, he reckoned that the consistency check has the potential to improve the processes if inconsistencies or incorrect information can be made more visible. This could also make it possible to check cross-project dependencies more reliably. A stated challenge for the consistency check was time-boxed releases, where the release is often set on the issues only after the resolving solution is ready, if at all. Thus, for detected inconsistencies in issues, the corresponding resolving solutions need to be checked and might exist, meaning that a cause of the inconsistency sometimes lies in the correspondence between Jira issues and their resolving solutions. The limitation of the consistency check to the 'parent-child', 'requires', and 'duplicate' dependencies was considered extensive enough. All respondents commented that only a general 'relates' dependency would also be useful, but nothing additional was needed. Finally, other checks, such as the identification of cyclic dependencies, could be interesting but are not yet clearly needed in practice.

DISCUSSION
RQ1. What drawbacks do stakeholders suffer with current issue trackers?
The drawbacks in how users operate with issue trackers to handle information in issues, which we captured as part of RQ1, are especially relevant in the context of large, collaborative, long-lived projects. When focusing on the constructs and the quality of the underlying issue dependency network, large projects bring forward the limitations of the data model, missing explicit dependencies, and inconsistencies. This results in an incomplete broader view, which is critical for complex tasks like product management. The number of issues, potential dependencies, and stakeholders involved, all of them in constant change, raises the complexity. As a consequence of this complexity, capturing all dependencies and having full consistency are elusive targets and even rest on subjective and contextual judgment: issues are not a static specification but a constantly evolving network of things to be done. Thus, the drawbacks need to be mitigated rather than resolved. Therefore, it is important to provide users with useful information and practical support features, rather than aiming at fully automatic decision making. Notably, the drawbacks are not necessarily specific to TQC or even to Jira, but can appear in the use of other issue trackers, or of other systems used for similar purposes, although they appear predominantly in the aforementioned large project contexts.
RQ2. What features can be added to issue trackers to address these drawbacks?
Our solution proposal of issue graphs forms a parallel, automatically constructed structure to the data available in Jira, which enables more efficient dependency management and visualization. Beyond the focus on the life-cycle of a single issue, we proposed to treat dependencies as first-class entities with their own properties, which are usable, e.g., in dependency detection. We used issue (r) centered, bottom-up p-depth issue graphs (G_p) as the principal contextual structure for analysis and users. However, future work can allow other partial issue graphs and better emphasize existing hierarchies between issues.

Regarding the extension techniques, the detection techniques aim to assist users with simple but effective algorithms that operate on large data-sets. A quintessential system-view is added by contextualization, which combines proposals, considers them in the context of existing issue graphs and issue properties, and manages rejected dependencies. While the quite simple but holistic solution appeared valuable, bringing forward many practical consequences, the solution can be further improved by more refined rejection handling and by adding other, more advanced, detection techniques and algorithms, which may then require a different aggregation approach. Another desired improvement is adding explainability to the detection techniques, pointing out why a proposal was made.

Regarding the consistency check and diagnoses, rather than achieving full consistency, the practical value of these techniques is to make inconsistencies in an issue graph visible. This improves the transparency and control of the development process and can even induce process improvements. To this end, we did not focus on fully automated decision making, but on providing users with assistance during the consistency check process within a specified (G_p=50) context of analysis, rather than a full analysis of all inconsistencies, which might not be relevant or even practical information.
Among the main future challenges are more suitable and efficient algorithms for diagnosis, but also the study of other analyses, such as redundant dependencies, including their practical value.

RQ3. How can these features be integrated in an issue tracker in a way that it has value for use?
The Jira plugin and microservice-based architecture we depicted in RQ3 addresses practical implementation and use concerns. The plugin technology facilitates compatibility, security, and usability in the context of TQC's Jira. However, TQC's Jira is a standard deployment and, apart from the integration microservice (Milla), the other microservices are independent of Jira, providing good maintainability, portability, and compatibility. The system should be deployable beyond TQC's Jira to other Jira installations, and with minor modifications even to other issue trackers and other systems, such as requirements management, backlog, or roadmapping systems. In fact, we have already prototyped the same microservices in a research prototype. Likewise, we have prototyped two other, more advanced detectors within the system, which turned out to be too unreliable. On the one hand, a solely plugin-based design could be realized for a smaller data-set, but the design would have been very Jira-specific, resulting in an inefficient and more complex design. On the other hand, we had the microservices operational without the plugin technology, but they then could not handle private issues, write decisions to Jira, or integrate the user interface with Jira. Such a tool, independent of Jira, was considered to have little practical value for TQC. The projection of data was another key design decision that allowed us to separate batch processes and user queries. This was needed for the microservice-based solution and beneficial for efficiency, while the disadvantages were within the users' acceptance limits.

Besides the aforementioned improvements to the solution, certain design improvements could be considered. Our primary focus was not on graphical design and usability, both of which can be improved. Additionally, the system's usability could be further improved through integration with existing dashboards, rather than being in a separate plugin.
We analyze the threats to validity according to the four categories proposed by [31] for experimental research.

Construct validity refers to proper conceptualization or theoretical generalizations. This study focused on tool (Jira) improvement rather than process improvements. Our conceptualization is based on a few stakeholders and, as noted in the validation interviews, their needs differ. One threat is whether we conceptualized the problem correctly, and another is whether we focused on a relevant problem of the case company. However, the respondents were highly experienced and several in number, the researcher had a prolonged engagement with the problem as the process lasted a reasonably long time, and the problems the experts raised were also evident in the data. Furthermore, the results cause no harm, as they aim to help and do not disturb existing ways of working. In our solution development, we relied on hand-picked examples. In order to alleviate potential threats related to the selection of the examples, we established good communication with TQC's stakeholders. In eliciting the drawbacks in RQ1, we used interviews that were carefully designed and piloted. This helped us to assess which issues would be suitable to serve as examples for our research. However, the evaluation iterated through all public data, with the exception of cross-validation, thus not limiting itself to the hand-chosen examples.
Internal validity refers to inferences about whether the presumed treatment and the presumed outcome reflect a causal relationship. Our solution aims to address drawbacks that had been acknowledged beforehand by the stakeholders. Thus, the knowledge claim concerns whether the suggested solution, i.e., the techniques implemented and integrated into Jira, helps in addressing the drawbacks. The solutions were validated with TQC's Jira users to check that they were actually applicable to tackling the drawbacks. However, a limitation is that the Jira users testing our system used real data but did not test the system extensively in their daily work.
External validity concerns whether our knowledge claims can be generalized beyond the TQC environment. We consider TQC a good case for research due to its large, standard Jira and open source practices. Thus, there is a high probability that the solutions could be applicable in other environments as well. However, TQC's Jira is a fairly mature and complex environment, and the drawbacks and our solutions reflect this. Although our solutions may technically work in less complex environments, it is not certain that they would be equally valuable. In terms of the mutability of the artifact, we intentionally constructed the solution to be flexibly adaptable to new algorithms and microservices. Interviews with a few selected users do not fully compare to full-scale use in practice. This is notable, as the generalizability of the artifact depends not only on its applicability to the drawbacks themselves but also on whether the solution is accepted by the users. This is difficult to assess with only a few interviewees, and it might come down to, for example, whether or not the users are satisfied with the artifact and its microservices in the long run, and not just initially.
RELATED WORK

Feature extension of traditional issue trackers in an open-source context.
Several studies have analyzed the main challenges raised by the use of traditional issue trackers in open-source environments. Bertram et al. [1] reported a list of seven design consideration features for issue trackers based on a qualitative study of their main drawbacks, including (i) providing customizable features for the visualization of issue data and their relations, and (ii) the simplification of tagging and reporting complex issue properties such as 'requires' or 'duplicates' relations, opening the door to automated features for the autonomous detection of these properties. Baysal et al. [32] ran a qualitative analysis through 20 personal interviews with Bugzilla community stakeholders. From these interviews, they identified that developers faced difficulties managing large issue repositories due to the constant flow of data (e.g., new issues, comments, reported dependencies) and the lack of support for filtering, visualizing, and managing changes in the issue dependency network. Heck and Zaidman [33] studied a set of 20 open-source GitHub projects, from which they highlighted the management of duplicated issues, as well as the visualization of issues and issue dependencies, as two of the most critical challenges for software developers. However, these contributions are limited to providing general highlights of key challenges and features for issue management tasks, rather than designing and depicting concrete, detailed processes or theoretical models for the practical application of these features.
Modeling and visualization of the issue dependency network.
Both the Baysal et al. and the Heck and Zaidman studies mentioned above highlight visualization of the issue dependency network beyond the single-issue perspective. The latter briefly depicts a modeling and visualization proposal based on the Bug Report Network (BRN) proposed by Sandusky et al. [34], where an issue dependency network is represented as a tree of issues linked by their relations (including dependencies and duplicate relationships). The SwarmOS Analyzer Jira plugin delivers a practical solution for representing the issue dependency network as an issue graph. Despite its filtering and classification features, it lacks advanced visualization tools that would enable large projects to simplify and adapt the context of visualization to a specific issue or sub-set of issues.
Dependency detection and duplicate detection in issue management.
Although traceability and dependency management are largely addressed in the state of the art, very few works focus on the issue tracker domain. Borg et al. [35] conducted a systematic mapping of information retrieval techniques for traceability and artifact dependencies in software projects. However, even among 79 related publications, most were limited to a proof-of-concept solution with a reduced sample validation, with partial quality metrics like precision or recall, in validation scenarios of no more than 500 artifacts. Despite the existence of supporting tools, such as Jira plugins for the visualization of issue dependency trees like SwarmOS Analyzer or Vivid Trace, there appear to be no popular examples of plugins or tools for the autonomous detection of dependencies or cross-references among issues in an issue repository.

On the other hand, managing and detecting duplicated issues is a well-known problem considered critical by several studies on managing issues with issue trackers [36], [37], [38]. Ellmann [39] defines a theoretical background for the potential of state-of-the-art natural language and machine learning techniques to extend issue trackers with automated duplicate detection. However, no artifact or practical implementation is reported. The Find Duplicates Jira plugin uses techniques similar to those reported by Ellmann to extend Jira's search features by reporting potential duplicates at report time or by running queries to find related issues. Nevertheless, these tools do not provide valid knowledge about the scalability of these solutions for large data-sets, as the emphasis is on proof-of-concept evaluation. Instead, they offer centralized server-side extensions for Jira environments with few details from a software architecture point of view, which makes them less suitable for large data-sets.

8. https://marketplace.atlassian.com/apps/1217806/
9. https://marketplace.atlassian.com/apps/1212548/
Consistency checking and repair of releases.
As reported in Section 5.3, the literature on release planning for issue management focuses especially on autonomous release plan generation, rather than on consistency checking and repair of releases [5], [6]. As a consequence, it is difficult to find related work focused on the analysis and diagnosis of release planning in the issue tracker domain. Regarding tool support, in addition to the visualization of issue dependencies, the Vivid Trace Jira plugin provides deep dependency analysis capabilities focused on visual representation, monitoring of chains of events, and the detection of potential blockers or conflicts among the dependencies.
CONCLUSIONS
We have presented an approach that addresses drawbacks in issue dependency networks. The contributions lie in applied Design Science research in the context of issue tracker use in large projects, which TQC's Jira concretizes. The basis of the solution is having issues and dependencies as separate objects and automatically constructing a complementary issue graph. Dependency detection complements an issue graph by proposing missing dependencies, and the consistency check identifies incorrectness in an issue graph. The results show how to adopt technologically quite straightforward techniques in a complex collaborative issue tracker use context and with a large data-set, taking into consideration the integrated system concern, practical applicability, and the inherent incompleteness of issue data. The system is not yet in active use because it is a research prototype without a guarantee of technical support and maintenance for TQC. However, TQC has expressed interest in taking the system into operational use, and the results can be generalized beyond TQC. Issue trackers remain a little-researched area, although they are prevalent in open source communities and widely used in other organizations. More research on issue trackers is needed, including studies on how they are used and on adding intelligence to their functionalities.
10. https://marketplace.atlassian.com/apps/1212706/

REFERENCES

[1] D. Bertram, A. Voida, S. Greenberg, and R. Walker, "Communication, collaboration, and bugs: The social nature of issue tracking in small, collocated teams," in ACM Conference on Computer Supported Cooperative Work, 2010, pp. 291–300.
[2] T. F. Bissyandé, D. Lo, L. Jiang, L. Réveillère, J. Klein, and Y. L. Traon, "Got issues? Who cares about it? A large scale investigation of issue trackers from GitHub," in IEEE International Symposium on Software Reliability Engineering, 2013, pp. 188–197.
[3] P. Achimugu, A. Selamat, R. Ibrahim, and M. N. Mahrin, "A systematic literature review of software requirements prioritization research," Information and Software Technology, vol. 56, no. 6, pp. 568–585, 2014.
[4] R. Thakurta, "Understanding requirement prioritization artifacts: a systematic mapping study," Requirements Engineering, vol. 22, no. 4, pp. 491–526, 2017.
[5] M. Svahnberg, T. Gorschek, R. Feldt, R. Torkar, S. B. Saleem, and M. U. Shafique, "A systematic review on strategic release planning models," Information and Software Technology, vol. 52, no. 3, pp. 237–248, 2010.
[6] D. Ameller, C. Farré, X. Franch, and G. Rufian, "A survey on software release planning models," in International Conference on Product-Focused Software Process Improvement, 2016, pp. 48–65.
[7] Å. G. Dahlstedt and A. Persson, Engineering and Managing Software Requirements. Springer, 2005, ch. Requirements Interdependencies: State of the Art and Future Challenges, pp. 95–116.
[8] G. Chrupala, "Learning from evolving data streams: Online triage of bug reports," in Conference of the European Chapter of the Association for Computational Linguistics, 2012, pp. 613–622.
[9] S. A. Karre, A. Shukla, and Y. R. Reddy, "Does your bug tracking tool suit your needs? A study on open source bug tracking tools," CoRR, vol. abs/1706.06799, 2017.
[10] S. Gregor, "The nature of theory in information systems," MIS Quarterly, vol. 30, no. 3, pp. 611–642, 2006.
[11] K. Peffers, T. Tuunanen, M. Rothenberger, and S. Chatterjee, "A design science research methodology for information systems research," Journal of Management Information Systems, vol. 24, no. 3, pp. 45–77, 2007.
[12] D. Fucci, C. Palomares, X. Franch, D. Costal, M. Raatikainen, M. Stettinger, Z. Kurtanovic, T. Kojo, L. Koenig, A. Falkner, G. Schenner, F. Brasca, T. Männistö, A. Felfernig, and W. Maalej, "Needs and challenges for a platform to support large-scale requirements engineering: A multiple-case study," in ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, 2018.
[13] ISO/IEC, "Systems and software engineering — systems and software quality requirements and evaluation (SQuaRE) — system and software quality models ISO/IEC 25010," Tech. Rep., 2011.
[14] J. Guo, J. Cheng, and J. Cleland-Huang, "Semantically enhanced software traceability using deep learning techniques," in IEEE/ACM International Conference on Software Engineering, 2017, pp. 3–14.
[15] G. Deshpande, Q. Motger, C. Palomares, I. Kamra, K. Biesialska, X. Franch, G. Ruhe, and J. Ho, "Requirements dependency extraction by integrating active learning with ontology-based retrieval," in IEEE International Requirements Engineering Conference, 2020, pp. 78–89.
[16] O. Shahmirzadi, A. Lugowski, and K. Younge, "Text similarity in vector space models: A comparative study," ArXiv, vol. abs/1810.00664, 2019.
[17] C. Sun, D. Lo, S.-C. Khoo, and J. Jiang, "Towards more accurate retrieval of duplicate bug reports," in IEEE/ACM International Conference on Automated Software Engineering, 2011, pp. 253–262.
[18] Q. Motger, C. Palomares, and J. Marco, "RESim: Automated detection of duplicated requirements in software engineering projects," in Working Conference on Requirements Engineering: Foundation for Software Quality, 2020.
[19] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," in Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019.
[20] M. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, and L. Zettlemoyer, "Deep contextualized word representations," in Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018.
[21] B. Wang, A. Wang, F. Chen, Y. Wang, and C.-C. J. Kuo, "Evaluating word embedding models: methods and experimental results," APSIPA Transactions on Signal and Information Processing, vol. 8, 2019.
[22] R. Reiter, "A theory of diagnosis from first principles," Artificial Intelligence, vol. 32, no. 1, pp. 57–95, 1987.
[23] A. Felfernig, S. Reiterer, F. Reinfrank, G. Ninaus, and M. Jeran, "Conflict detection and diagnosis in configuration," in Knowledge-Based Configuration: From Research to Business Cases, A. Felfernig, L. Hotz, C. Bagley, and J. Tiihonen, Eds. Morgan Kaufmann, 2014, pp. 73–87.
[24] P. Carlshamre, K. Sandahl, M. Lindvall, B. Regnell, and J. Natt och Dag, "An industrial survey of requirements interdependencies in software product release planning," in IEEE International Symposium on Requirements Engineering, 2001, pp. 84–91.
[25] A. Felfernig, J. Spöcklberger, R. Samer, M. Stettinger, M. Atas, J. Tiihonen, and M. Raatikainen, "Configuring release plans," in International Workshop on Configuration, vol. 2220 (CEUR Workshop Proceedings), 2018, pp. 9–14.
[26] A. Felfernig, M. Schubert, and C. Zehentner, "An efficient diagnosis algorithm for inconsistent constraint sets," Artificial Intelligence for Engineering Design, Analysis and Manufacturing, vol. 26, no. 1, pp. 53–62, 2012.
[27] C. Quer, X. Franch, C. Palomares, A. Falkner, A. Felfernig, D. Fucci, W. Maalej, J. Nerlich, M. Raatikainen, G. Schenner et al., "Reconciling practice and rigour in ontology-based heterogeneous information systems construction," in IFIP Working Conference on The Practice of Enterprise Modeling, 2018, pp. 205–220.
[28] C. Lüders, M. Raatikainen, J. Motger, and W. Maalej, "OpenReq issue link map: A tool to visualize issue links in Jira," in IEEE International Requirements Engineering Conference, 2019.
[29] C. Prud'homme, J.-G. Fages, and X. Lorca, Choco Documentation.
[30] R. K. Yin, Case Study Research and Applications, 6th ed. Thousand Oaks: SAGE, 2018.
[31] W. R. Shadish, T. D. Cook, and D. T. Campbell, Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Boston: Houghton Mifflin, 2002.
[32] O. Baysal, R. Holmes, and M. W. Godfrey, "No issue left behind: Reducing information overload in issue tracking," in ACM SIGSOFT International Symposium on Foundations of Software Engineering, 2014, pp. 666–677.
[33] P. Heck and A. Zaidman, "An analysis of requirements evolution in open source projects: Recommendations for issue trackers," in Workshop on Principles of Software Evolution, 2013, pp. 43–52.
[34] R. J. Sandusky, L. Gasser, and G. Ripoche, "Bug report networks: Varieties, strategies, and impacts in a F/OSS development community," in MSR, 2004, pp. 80–84.
[35] M. Borg, P. Runeson, and A. Ardö, "Recovering from a decade: a systematic mapping of information retrieval approaches to software traceability,"
Empirical Software Engineering , vol. 19, no. 6, pp.1565–1616, 2014.[36] A. Alipour, A. Hindle, and E. Stroulia, “A contextual approachtowards more accurate duplicate bug report detection,” in
WorkingConference on Mining Software Repositories , 2013, pp. 183–192.[37] A. P. Kshirsagar and P. R. Chandre, “Issue tracking system withduplicate issue detection,” in
International Conference on Computerand Communication Technology , 2015, p. 41–45.[38] J. Deshmukh, K. Annervaz, S. Podder, S. Sengupta, andN. Dubash, “Towards accurate duplicate bug retrieval using deeplearning techniques,” in
IEEE International Conference on SoftwareMaintenance and Evolution , 2017, pp. 115–124.[39] M. Ellmann, “Natural language processing (nlp) applied on issuetrackers,” in
International Workshop on NLP for Software Engineering ,2018, pp. 38–41. A CKNOWLEDGMENTS
The work presented in this paper has been conducted within the scope of the Horizon 2020 project OpenReq, which is supported by the European Union under Grant No. 732463. We are grateful for the provision of the Finnish computing infrastructure to carry out the tests (persistent identifier urn:nbn:fi:research-infras-2016072533).
Mikko Raatikainen received his PhD in computer science and engineering from Aalto University. He is a researcher in the empirical software engineering research group at the University of Helsinki. His research interests include empirical research in software engineering and business.
Quim Motger is a PhD student at Universitat Politècnica de Catalunya (UPC). He is a member of the UPC research group on software and service engineering. His research focuses on natural language processing, machine/deep learning software systems, and web-based software architecture environments.
Clara Marie Lüders is a PhD student at the University of Hamburg (UHH). She is a member of the UHH research group on applied software technology. Her research focuses on machine/deep learning, natural language processing, issue tracking systems, and graph theory.
Xavier Franch received his PhD from the Universitat Politècnica de Catalunya (UPC). He is a full professor at UPC, where he leads the research group on software and service engineering. His research focuses on requirements engineering and empirical software engineering. He is an associate editor of IST, REJ, and Computing, and J1 chair at JSS.
Lalli Myllyaho is a PhD student at the University of Helsinki (UH). With a background in mathematics and teaching, he is a member of the empirical software engineering group at UH. His current interests include the reliability and operations of machine learning systems.
Elina Kettunen received her PhD in plant biology and her Master's degree in computer science from the University of Helsinki. Her research interests include empirical software engineering and paleobotany.
Jordi Marco received his PhD from the Universitat Politècnica de Catalunya (UPC). He is an Associate Professor in Computer Science at UPC and a member of the software and service engineering group (GESSI). His research interests include natural language processing, machine learning, service-oriented computing, quality of service, and conceptual modeling.
Juha Tiihonen received his PhD in computer science and engineering from Aalto University. His research interests include configuration systems and processes for physical, service, and software products. This work was performed at the University of Helsinki. He is currently the lead developer of sales configuration systems at Variantum Oy.
Mikko Halonen holds a B.Sc. (Automation Eng. Tech.) from the Technical College of Oulu. He currently works as a quality manager at The Qt Company.