Automatic Detection and Resolution of Software Merge Conflicts: Are We There Yet?

Abstract

Developers create software branches for tentative feature addition and bug fixing, and periodically merge branches to release software with new features or repairing patches. When the program edits from different branches textually overlap (i.e., textual conflicts), or the co-application of those edits leads to compilation or runtime errors (i.e., compiling or dynamic conflicts), it is challenging and time-consuming for developers to eliminate merge conflicts. Prior studies examined how conflicts were related to code smells or software development process; tools were built to find and solve conflicts. However, some fundamental research questions are still not comprehensively explored, including (1) how conflicts were introduced, (2) how developers manually resolved conflicts, and (3) what conflicts cannot be handled by current tools. For this paper, we took a hybrid approach that combines automatic detection with manual inspection to reveal 204 merge conflicts and their resolutions in 15 open-source repositories. Our data analysis reveals three phenomena. First, compiling and dynamic conflicts are harder to detect, although current tools mainly focus on textual conflicts. Second, in the same merging context, developers usually resolved similar textual conflicts with similar strategies. Third, developers manually fixed most of the inspected compiling and dynamic conflicts by editing the merged version in the same way as they had edited one of the branches. Our research reveals the challenges and opportunities for automatic detection and resolution of merge conflicts; it also sheds light on related areas like systematic program editing and change recommendation.



Bowen Shen, Virginia Polytechnic Institute and State University, USA

Cihan Xiao, Virginia Polytechnic Institute and State University, USA

Na Meng, Virginia Polytechnic Institute and State University, USA

Fei He, Tsinghua University, China


CCS CONCEPTS

• Software and its engineering → Software maintenance tools; Maintaining software; Software evolution.

KEYWORDS

Empirical, software merge, conflict detection, conflict resolution

1 INTRODUCTION

“Integration Hell” refers to the scenarios in which developers integrate or merge a big chunk of code changes at the last minute before delivering a software product [3]. In traditional software development environments, this integration process is rarely smooth and seamless; it results in conflicts, which can take developers hours or even days to fix before the code finally integrates [7]. To avoid “Integration Hell”, more and more developers have recently adopted Continuous Integration (CI) to integrate code more frequently (e.g., several times a day), and to verify each integration via automated builds (i.e., compilation and testing) [33, 35].

Nevertheless, CI practices do not eliminate the challenges posed by merge conflicts. Instead, developers still mainly rely on the merge feature of version control systems (e.g., git-merge [9]) to automatically (1) integrate branches and (2) reveal any conflict for manual resolution. However, such text-based merge usually produces many false positives and false negatives. For example, when two branches reformat the same line in divergent ways (e.g., add vs. delete a whitespace), git-merge simply reports a textual conflict although such conflicts are unimportant and cause no difference syntactically or semantically. Meanwhile, when two branches edit different lines and modify the program semantics or logic in conflicting ways, git-merge silently applies both edits with no conflict reported.

Tools were proposed to improve over text-based merge [11–13, 20, 38]. Specifically, FSTMerge reduces false positives by modeling Java program entities (e.g., classes, methods, and fields) as unordered leaf nodes, and resolving textual conflicts when two edits insert unrelated declarations at the same location [12]. JDime removes fake conflicts by modeling both program entities and statements in its tree representation, and by applying tree matching and amalgamation algorithms to resolve conflicts [11, 20]. Given textually conflicting edits, AutoMerge uses Version Space Algebra (VSA) to enumerate all possible combinations of the edit operations from both sides and to recommend alternative resolutions [38]. However, none of these tools help reduce the false negatives of text-based merge. Additionally, Crystal proactively monitors developers’ program commits in separate branches [13]. By tentatively merging the latest commits between branches and building the merged software, Crystal notifies developers of higher-order conflicts, i.e., the changes that are semantically incompatible.

Despite existing tool support, our knowledge of software merge conflicts is still limited. For instance, although some tools (e.g., JDime and AutoMerge) refine and resolve the conflicts reported by text-based merge, it is still unknown whether these tools can resolve all true conflicts fully automatically. Even though Crystal can reveal different conflicts from git-merge, it is still unknown whether Crystal can capture all the conflicts missed by git-merge.

Having a better understanding of merge conflicts and their resolutions is crucially important for two reasons. First, by exploring the gap between challenges of merge conflicts and capabilities of existing tools, we can shed light on future tools to better aid developers. Second, by characterizing the limit of automatic tool support, we can design better human-in-the-loop approaches to focus developers’ manual effort on the most important and challenging conflicts.

Figure 1: Workflow of our hybrid approach to study merge conflicts and their resolutions. (Flowchart. Automatic part: merging scenarios from open-source repositories; “Can we merge branches with git-merge?”; “Can we merge branches with JDime?”; “Does the merged software compile?”; “Does the merged software pass all tests?”. Manual part: inspection for textual/compiling/dynamic conflicts; inspection for compiling conflicts; inspection for dynamic conflicts.)

For this paper, we conducted a comprehensive in-depth investigation of software merge conflicts and their resolutions. Because we are unsure whether existing tools can detect all kinds of conflicts, we took a hybrid approach that combines automatic tools with manual inspection. Specifically, as shown in Figure 1, we first applied both git-merge and JDime to the merging scenarios in 35 open-source repositories. Given a scenario with two branches, if neither tool could merge the branches, we examined related edits to identify various conflicts to the best of our ability. Otherwise, if the given pair is automatically mergeable, we further used automatic compilation and testing to identify any higher-order conflict. With this approach, we collected 100 textual conflicts, 100 compiling conflicts (i.e., incompatible changes causing compilation errors), and 4 dynamic conflicts (i.e., incompatible changes triggering abnormal program behaviors). We contrasted these conflicts with the capabilities of existing tools, and characterized the root cause and resolution of each conflict quantitatively and qualitatively.

By analyzing many previously unrevealed conflicts, our research uncovers findings that have not been reported before. The major findings are summarized as follows.

• How were conflicts introduced? 51% of textual conflicts were caused by the contradictory statement updates between branches. 93% of compiling conflicts occurred when one branch adds one or more references to a program entity that is updated, removed, or replaced in another branch. 75% of dynamic conflicts happened because the test oracle added by one branch does not correspond to the code implementation updated by the other branch.
• How did developers manually resolve such conflicts? For 86% of textual conflicts, developers resolved conflicts by (1) keeping the changes from one branch or (2) combining part or all of the edits from both sides. For compiling and dynamic conflicts, developers never purely combined edits from both branches; instead, they applied extra edits to the merged software such that all similar code locations were modified consistently.
• What conflicts cannot be handled by current tools? In our data set, 79% of compiling conflicts and 75% of dynamic conflicts were not reported or reflected by any explored automatic approach, let alone resolved automatically. Although some tools could suggest resolutions for 92% of textual conflicts and 25% of dynamic conflicts, they rarely guarantee the correctness of their suggestions.

Importantly, we found that although many conflicts cannot be detected or resolved by existing tools, the conflicts were introduced for typical reasons and developers resolved them in characteristic ways. Our study will enlighten future software merge tools, and suggest future research directions in related areas like automatic program transformation and change recommendation.

This section defines the terminology used in this paper (Section 2.1), and introduces the software merge tools we explored (Section 2.2).

When developers merge two branches (e.g., b_l and b_r) in a software repository, there can be three types of conflicts.

Figure 2: Exemplar merge conflicts

(a) Textual conflict
    Changes in b_l:
        - private int a = 4;
        + private int a = 40;
    Changes in b_r:
        - private int a = 4;
        + private int a = 20;

(b) Compiling conflict
    Changes in b_l:
        - public class A{...}
        + public class A extends B{...}
    Changes in b_r:
        - public class B{...}
        + public class B2{...}

(c) Dynamic conflict
    Changes in b_l:
          y = x;
        - if (y < 13) {... // x < 13}
        + if (y < 12) {... // x < 12}
    Changes in b_r:
        - y = x;
        + y = x + 1;
          if (y < 13) {... // x < 12}

1. Textual Conflicts exist when b_l and b_r edit the same line of text. As illustrated by Figure 2 (a), since b_l and b_r update the same statement with conflicting values (e.g., 40 vs. 20), there is a textual conflict between the branches.

2. Compiling Conflicts happen when (1) the edits of b_l and b_r do not have any textual conflict, and (2) the co-application of both edits triggers a compilation error. As shown in Figure 2 (b), when b_l adds a reference to class B and b_r renames B to B2, the integration of these edits can make the B-reference unresolvable.

3. Dynamic Conflicts occur when (1) the edits of b_l and b_r do not have any textual conflict, and (2) the co-application of both edits triggers a runtime error or unexpected program behavior. For example, in Figure 2 (c), although the edits in b_l and b_r separately satisfy the invariant “x < 12” inside the then-branch, applying both of them dissatisfies the invariant.

In our research, we directly adopted text-based merge (i.e., git-merge) and JDime to detect textual conflicts. We also mimicked the workflow of Crystal to reveal compiling and dynamic conflicts.

Figure 3: Merging scenarios where b_l and b_r are different

(a) Origin version (base):            String s = "hello";
    Left version (b_l):               String s = "hello";
    Right version (b_r):              String s = "bye";
    Automatically merged (A_m):       String s = "bye";   (by git-merge)

(b) Origin version (base):            for (int i = 0; i < 10; i++) {
    Left version (b_l):               for (int i = 0; i < 5; i++) {
    Right version (b_r):              for (int i = 0; i < 20; i++) {
    Automatically merged (A_m):       -- (conflict detected by git-merge)

(c) Origin version (base):            int a=b;
    Left version (b_l):               int a = b;
    Right version (b_r):              int a=b+c;
    Automatically merged (A_m):       int a=b+c;   (by JDime)

(d) Origin version (base):            … // unchanged code
    Left version (b_l):               … // unchanged code
                                      public int foo() { … }
    Right version (b_r):              … // unchanged code
                                      public int bar() { … }
    Automatically merged (A_m):       … // unchanged code
                                      public int foo() { … }
                                      public int bar() { … }   (by JDime)

(e) Origin version (base):            … // unchanged code
    Left version (b_l):               public void set(int a) { this.f = a; }
                                      … // unchanged code
    Right version (b_r):              … // unchanged code
                                      public void set(int a) { this.f = a * a; }
    Automatically merged (A_m):       public void set(int a) {
                                      -- (conflict detected by JDime)
                                      }
                                      … // unchanged code
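To make the dynamic conflict of Figure 2 (c) concrete, the following minimal sketch encodes the guard of each version as a method and evaluates it for x = 11. The class and method names are hypothetical and chosen only for illustration; the guards themselves are taken from Figure 2 (c).

```java
/** Illustrative sketch of Figure 2 (c); the class and method names are hypothetical. */
final class DynamicConflictDemo {

    /** b_l: keeps "y = x" and tightens the guard to "y < 12". */
    static boolean entersThenBranchInLeft(int x) {
        int y = x;
        return y < 12;
    }

    /** b_r: changes the assignment to "y = x + 1" and keeps the guard "y < 13". */
    static boolean entersThenBranchInRight(int x) {
        int y = x + 1;
        return y < 13;
    }

    /** Merged version: applies both edits, "y = x + 1" and "y < 12". */
    static boolean entersThenBranchInMerged(int x) {
        int y = x + 1;
        return y < 12;
    }

    public static void main(String[] args) {
        int x = 11;
        // Each branch alone enters the then-branch for x = 11,
        // but the merged version does not.
        System.out.println("left:   " + entersThenBranchInLeft(x));
        System.out.println("right:  " + entersThenBranchInRight(x));
        System.out.println("merged: " + entersThenBranchInMerged(x));
    }
}
```

Each branch alone enters the then-branch exactly when x < 12, whereas the merged code enters it only when x < 11. No line is edited by both branches, so a text-based merge has nothing to report; only running the program exposes the difference.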

Text-based merge is the default merge feature provided by various version control systems (e.g., Git and SVN). We used “git-merge” in our study, which conducts a three-way merge [26] to analyze and resolve differences between branches. Suppose that b_l and b_r derive from the same origin, base. Whenever b_l and b_r have different text for the same line of code, three-way merge further compares both branches with base to decide which branch changed the line. As shown in Figure 3 (a), if only one branch (i.e., b_r) changes the line, the change is kept in the automatically merged software A_m; otherwise, if both branches modify the line in divergent ways (see Figure 3 (b)), three-way merge reports a textual conflict.

Given a Java file, JDime creates a syntax tree by modeling declarations of program entities (imports, classes, methods, and fields) as unordered nodes, and modeling Java statements as ordered tree nodes. JDime matches unordered nodes based on the content and subtrees of each node; it matches ordered nodes by also considering the sequential ordering between nodes.
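The per-line rule of three-way merge described above (Figure 3 (a) and (b)) can be sketched as follows. This is a simplified illustration, not git's implementation: it assumes the three versions of a line have already been aligned, whereas real tools first compute diffs and work on hunks. The class and method names are hypothetical.

```java
import java.util.Optional;

/** Simplified per-line three-way merge; names are hypothetical, not git's API. */
final class ThreeWayLineMerge {

    /**
     * Merges one aligned line from base, the left branch (b_l), and the right branch (b_r).
     * Returns the merged text, or Optional.empty() to signal a textual conflict.
     */
    static Optional<String> mergeLine(String base, String left, String right) {
        if (left.equals(right)) {
            return Optional.of(left);   // both branches agree, or neither changed the line
        }
        if (left.equals(base)) {
            return Optional.of(right);  // only b_r changed the line
        }
        if (right.equals(base)) {
            return Optional.of(left);   // only b_l changed the line
        }
        return Optional.empty();        // divergent edits on the same line
    }

    public static void main(String[] args) {
        // Figure 3 (a): only b_r changes the line, so its change is kept.
        System.out.println(mergeLine("String s = \"hello\";",
                                     "String s = \"hello\";",
                                     "String s = \"bye\";"));
        // Figure 3 (b): both branches change the loop bound differently.
        System.out.println(mergeLine("for (int i = 0; i < 10; i++) {",
                                     "for (int i = 0; i < 5; i++) {",
                                     "for (int i = 0; i < 20; i++) {"));
        // Figure 3 (c): a pure reformatting in b_l still differs textually from base,
        // so this textual rule reports a conflict.
        System.out.println(mergeLine("int a=b;", "int a = b;", "int a=b+c;"));
    }
}
```

The third call shows why a purely textual comparison yields false positives: the strings differ even though b_l only inserted whitespace.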

Intuitively, given a textual conflict C, if JDime can resolve C via structured merge, it does not report the conflict; otherwise, JDime still reports C.

Theoretically, JDime can improve over git-merge in three ways. First, as JDime compares syntax trees to infer changes, it can suppress the unimportant conflicts due to formatting changes. As shown in Figure 3 (c), when b_l inserts whitespaces and b_r modifies the assignment logic, JDime simply keeps the logic change in A_m instead of reporting any conflict. Second, JDime compares entity declarations by ignoring their code locations. Therefore, when b_l and b_r insert distinct declarations at the same code location (as illustrated in Figure 3 (d)), JDime can resolve the textual conflict by inserting both declarations to A_m. Third, JDime can reveal some conflicts missed by git-merge. For instance, when b_l and b_r insert same-named entity declarations at different code locations (see Figure 3 (e)), JDime is able to correlate these insertions and decide whether the declarations contain any textual conflict.

When developers work on different software branches and commit program changes now and then, Crystal speculatively merges the latest versions between branches and notifies developers of potential conflicts, before developers conduct any actual merge. As Crystal is an interactive tool that works while developers actively work on different software branches, it does not fit into our history-based merging scenario analysis. Therefore, we did not run Crystal, but followed Crystal’s three-step strategy to reveal different kinds of conflicts:

• Step 1: If b_l and b_r cannot be integrated via text-based merge, there are one or more textual conflicts.
• Step 2: If b_l and b_r compile and can be merged automatically, but the merged software A_m fails to compile, then there is at least one compiling conflict.
• Step 3: If b_l and b_r pass their separate test suites and can be merged automatically, but the merged software A_m fails its test suite, then there is at least one dynamic conflict.

Our goal of trying these tools is not to evaluate their implementation status. Instead, we focus on the limitation of their approach design. Assuming that existing methodologies are perfectly implemented, we were curious what conflicts have not been automatically handled yet and what tool support is missing.

As shown in Figure 1, our approach has two phases. Phase I uses automatic approaches to detect conflicts or identify errors due to conflicts (Section 3.1). Phase II adopts manual inspection to examine the merging commits reported by Phase I or reveal extra conflicts, creating a data set of conflicts and their resolutions (Section 3.2).
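Steps 1 to 3 above, which Phase I automates, amount to a short decision procedure over the observed merge, build, and test outcomes. The sketch below is a hypothetical illustration: the class, enum, and field names are invented here, and returning SKIPPED when a branch fails its own tests is an assumption (the text only states that commits are skipped when a branch does not compile).

```java
/** Hypothetical sketch of the three-step check; all names are illustrative. */
final class ConflictClassifier {

    enum Outcome { TEXTUAL_CONFLICT, COMPILING_CONFLICT, DYNAMIC_CONFLICT, NONE_OBSERVED, SKIPPED }

    /** Facts observed for one merging scenario after running merge, build, and tests. */
    static final class Scenario {
        final boolean mergedAutomatically; // an automatic merge produced A_m
        final boolean leftCompiles, rightCompiles, mergedCompiles;
        final boolean leftPasses, rightPasses, mergedPasses;

        Scenario(boolean mergedAutomatically,
                 boolean leftCompiles, boolean rightCompiles, boolean mergedCompiles,
                 boolean leftPasses, boolean rightPasses, boolean mergedPasses) {
            this.mergedAutomatically = mergedAutomatically;
            this.leftCompiles = leftCompiles;
            this.rightCompiles = rightCompiles;
            this.mergedCompiles = mergedCompiles;
            this.leftPasses = leftPasses;
            this.rightPasses = rightPasses;
            this.mergedPasses = mergedPasses;
        }
    }

    static Outcome classify(Scenario s) {
        if (!s.mergedAutomatically) {
            return Outcome.TEXTUAL_CONFLICT;      // Step 1
        }
        if (!s.leftCompiles || !s.rightCompiles) {
            return Outcome.SKIPPED;               // a broken branch says nothing about the merge
        }
        if (!s.mergedCompiles) {
            return Outcome.COMPILING_CONFLICT;    // Step 2
        }
        if (!s.leftPasses || !s.rightPasses) {
            return Outcome.SKIPPED;               // assumption: treated like a non-compiling branch
        }
        if (!s.mergedPasses) {
            return Outcome.DYNAMIC_CONFLICT;      // Step 3
        }
        return Outcome.NONE_OBSERVED;
    }

    public static void main(String[] args) {
        // Both branches compile and merge automatically, but A_m does not compile.
        Scenario s = new Scenario(true, true, true, false, false, false, false);
        System.out.println(classify(s));
    }
}
```

The outcome only signals that at least one conflict of a given kind exists; locating the conflicting edits still requires inspecting the error messages and the branch edits.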

Given a software repository, for each merging commit or scenario (see Figure 4), there are two branches to merge, left (b_l) and right (b_r), the most recent common ancestor of those branches (base), and the merged version created by developers (M_m). With all merging scenarios identified in the version history, similar to Crystal, our approach took three steps to reveal conflicts.

Figure 4: Software versions related to a merging scenario. (Diagram nodes: Origin (base), Left (b_l), Right (b_r), Manually merged (M_m), Automatically merged (A_m).)

In Step 1, we first applied git-merge to b_l and b_r, in order to generate an automatically merged version A_m (see Figure 4). If git-merge failed to produce A_m due to textual conflicts in some Java files, we further applied JDime specifically to those files, hoping to generate A_m successfully. We intentionally applied the tools in this specific order for three reasons. First, git-merge compares branches faster while JDime better resolves conflicting Java edits. By focusing JDime on the textual conflicts that git-merge cannot process, our approach can efficiently detect true textual conflicts. Second, git-merge can propagate file-level operations (e.g., moving files across folders) to A_m, while JDime cannot. By using both tools, we ensure that the merged program is more likely to compile than the one generated solely by JDime. Third, we intended to reveal any conflicts unsolvable by existing tools, so JDime was used as a filter to refine the textual conflicts reported by git-merge.

In Step 2, we further compiled b_l, b_r, and the A_m produced by Step 1. If both branches compile successfully but A_m fails to compile, we conclude that the branches have at least one compiling conflict; otherwise, if either branch does not compile, we skip the commit. Currently, our research handles the programs compilable by Maven [4], Ant [8], or Gradle [2]. Notice that this step cannot directly pinpoint the location of any compiling conflict; instead, it presents symptoms caused by those conflicts.

In Step 3, we ran the compiled versions of b_l, b_r, and A_m with their separate test suites. If both branches pass all tests while A_m fails any test, our approach concludes that the branches have at least one dynamic conflict. Similar to Step 2, this step presents effects of dynamic conflicts instead of showing the conflicts themselves.

The ultimate goal of our manual inspection is to (1) reveal as many true conflicts as possible and (2) remove the false conflicts reported by tools. However, for individual types of conflicts, our inspection serves slightly different purposes.

Given a textual conflict reported by Step 1, we first determined whether the conflict is a false positive. Due to some implementation issues, JDime sometimes failed to eliminate the textual conflicts that its methodology is supposed to handle. Again, our study does not examine the limitation of existing tool implementation. Instead, it explores the conflicts overlooked by current automatic approaches. Therefore, we manually filtered out the textual conflicts that should have been resolved by a perfect implementation of JDime. Additionally, since it is almost infeasible to include all true (i.e., unresolved) textual conflicts into our study, we gathered 100 samples of such conflicts for further analysis. In particular, for each gathered sample, we compared b_l, b_r, and the manually merged version M_m to learn (1) how the conflict was introduced and (2) how developers resolved it.

Given an error reported by Step 2 or Step 3, we identified the conflict by examining the error message, source code, and related edits from both branches. Because test failures are usually harder to reason about than compilation errors, our manual analysis is not guaranteed to reveal the root cause of every observed test failure. Additionally, for each identified conflict, we also inspected the corresponding resolution edits applied by developers in M_m.

Based on our experience, Step 2 and Step 3 did not generate many errors or failures for further examination. To collect more data, we also inspected the unmerged branches in Step 1 to recognize more conflicts.

Figure 5: Manual detection of compiling/dynamic conflicts. (Diagram: Δ_left and Δ_right lead from Origin (base) to Left (b_l) and Right (b_r); Δ_mleft and Δ_mright lead from Left and Right to Manually merged (M_m). Questions posed: Δ_mleft - Δ_right = ? and Δ_mright - Δ_left = ?)

Our manual process is shown in Figure 5. Given a merging scenario where branches cannot be automatically merged, we identify the edits that developers applied in individual branches as Δ_left and Δ_right. We also denote the difference between the left branch and the merged version as Δ_mleft, and the difference between the right branch and the merged version as Δ_mright. Formally,

$$M_m = \mathit{apply}(\mathit{apply}(\mathit{base}, \Delta_{left}), \Delta_{mleft}) = \mathit{apply}(\mathit{apply}(\mathit{base}, \Delta_{right}), \Delta_{mright})$$

Ideally, if the two branches can be automatically integrated without any conflict, Δ_mleft = Δ_right and Δ_mright = Δ_left. However, for the scenarios where branches cannot be automatically merged, by comparing Δ_right with Δ_mleft (or comparing Δ_left with Δ_mright), we can learn how developers manually resolved conflicts. For each observed difference, we speculated whether any compilation or runtime error would occur if developers had not made the observed edit. Since we do not have sufficient knowledge of any project under study, our speculative analysis is limited to revealing obvious compiling or dynamic conflicts.

For one commit of elasticsearch [5], we compared the diff files and found the following difference between Δ_right and Δ_mleft:

    < + if (parseContext.queryTypes().size() > 1) {
    ---
    > + if (parseContext.shardContext().queryTypes().size() > 1) {

It means that to merge the branches, developers manually updated a method call (i.e., parseContext.queryTypes()), which was added in Δ_right. To confirm that this update is mandatory for software merge, we further checked the edits in Δ_left, finding that the related method declaration was removed. This implies that without the manual update, the two branches have at least one compiling conflict because b_r added a reference to a method removed by b_l.

For this phase, three authors were involved in manual inspection. For each conflict included in our data set, we ensured that at least two people examined and described the merging scenario and conflict resolution. When unsure about certain conflicts, we had discussions to achieve consensus.

This section first introduces the open source projects studied and the data set constructed (Section 4.1); it then explains the experiment design and findings for each research question (Sections 4.2-4.4).

Table 1: Observed conflicts included in our sample data

| Project Name  | Textual conflicts | Compiling conflicts | Dynamic conflicts |
|---------------|-------------------|---------------------|-------------------|
| orientdb      | 32                | -                   | -                 |
| wildfly       | 7                 | *3                  | -                 |
| pmd           | 13                | -                   | -                 |
| lombok        | 13                | -                   | -                 |
| bigbluebutton | 7                 | -                   | -                 |
| cassandra     | 15                | -                   | -                 |
| Activiti      | 13                | 8, *1               | -                 |
| fastjson      | -                 | 2                   | -                 |
| javapoet      | -                 | 1                   | -                 |
| pebble        | -                 | 4                   | 1                 |
| truth         | -                 | 2                   | -                 |
| vectorz       | -                 | 2                   | -                 |
| webmagic      | -                 | 2                   | -                 |
| nuxeo         | -                 | *1                  | -                 |
| elasticsearch | -                 | *74                 | *3                |
| Sum           | 100               | 100                 | 4                 |

“-” means no conflict of certain type(s) is revealed or included in the data set.
“*” implies that the reported conflicts were manually detected.

We experimented with the repositories of 35 open source projects. Thirty-three of the projects were included because they were mentioned in prior work [11, 22]. Besides, we also included nuxeo [6] and elasticsearch [1] because these projects are large and popular, containing many branches and merging scenarios. Table 1 shows the conflicts we found in these projects and included in our data set. As mentioned in Section 3, there are so many real textual conflicts revealed by tools that it is infeasible to manually analyze all of them. Therefore, we sampled 100 of the revealed conflicts in 7 projects.

Based on the compilation of automatically merged software (A_m), we only identified 21 compiling conflicts in 7 projects; among 5 of these projects, automatic compilation could reveal at most 2 conflicts for each project. In contrast, we manually inspected the automatically unmergeable branches in four projects (Activiti, wildfly, nuxeo, and elasticsearch) and detected many more conflicts. For simplicity, we included 79 of the manually found conflicts, obtaining 100 compiling conflicts in total.

We identified only four dynamic conflicts in total, one of which was revealed by automatic compilation and testing, while three were manually detected. Three reasons can explain the small number of obtained conflicts. First, when both b_l and b_r can pass their separate test suites and A_m can compile, A_m almost always passes all tests. Second, when A_m fails a test, it is quite challenging to reason about the root cause. We actually had three more merging scenarios where A_m failed at least one test. However, since we could not understand how those test failures were related to software merge, we did not include them in our data set. Third, manually detecting dynamic conflicts is also very time-consuming and error-prone. Although we inspected the automatically unmergeable branches and developers’ resolutions in one project, elasticsearch, we could rarely assess what runtime errors would occur had developers not applied their edits in the merging commits.

This section introduces our characterization for different kinds of conflicts (Sections 4.2.1-4.2.3).

As shown in Table 2, we identified six reasons to explain how textual conflicts were introduced. Specifically, 51 conflicts happened when b_l and b_r updated the same statement(s) in distinct ways. 29 conflicts occurred because a branch deleted certain statement(s) while the other branch updated the same statement(s). 15 conflicts were introduced because of conflicting statement insertion. For instance, suppose that b_l inserts the following if-statement:

    if (!genericArgs.isEmpty()) {
        sb.delete(sb.length() - 3, sb.length() - 1);
    }

while b_r inserts a similar if-statement at the same location:

    if (!genericArgs.isEmpty()) {
        sb.replace(sb.length() - 3, sb.length() - 1, "");
    }

The two statements are different because b_l invokes the StringBuffer.delete(...) method to remove some characters, while b_r invokes a different method, StringBuffer.replace(...), to achieve the same goal.

Table 2: Classification of textual conflicts

| Idx | Conflict Type |
|-----|---------------|
| (a) | update-update |
|     | delete-update |
|     | insert-insert |
|     | update-move   |
|     | delete-move   |
|     | move-move     |

Additionally, 6 conflicts occurred because a branch updated a statement while the other branch moved the statement; 2 conflicts happened because one branch deleted a statement while the other branch moved the statement. Finally, one conflict was introduced because both branches moved the same statement to different places.

Finding 1: 66% of textual conflicts happened when branches applied conflicting updates or insertions; the remaining textual conflicts occurred because one branch updated or moved certain statements, while the other branch deleted those statements or moved the statements to a different place.
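The six conflict types of Table 2 are named after the pair of edit kinds that the two branches apply to the same statement(s). As a rough illustration of that naming scheme only, the sketch below derives the label from the two edit kinds regardless of which branch made which edit. The class, enum, and method names are hypothetical, and how the edit kind of each branch is determined in the first place is outside the scope of this sketch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

/** Hypothetical helper that names a textual conflict after the two branch edits. */
final class TextualConflictTypes {

    /** Declaration order decides which edit kind is written first in a label. */
    enum Edit { DELETE, UPDATE, INSERT, MOVE }

    /** The six labels listed in Table 2. */
    private static final Set<String> OBSERVED = new HashSet<>(Arrays.asList(
            "update-update", "delete-update", "insert-insert",
            "update-move", "delete-move", "move-move"));

    /** Returns the Table 2 label for a pair of edits, or empty if the pair is not listed. */
    static Optional<String> label(Edit left, Edit right) {
        Edit first = left.compareTo(right) <= 0 ? left : right;
        Edit second = (first == left) ? right : left;
        String name = first.name().toLowerCase(Locale.ROOT)
                + "-" + second.name().toLowerCase(Locale.ROOT);
        return OBSERVED.contains(name) ? Optional.of(name) : Optional.empty();
    }

    public static void main(String[] args) {
        // The StringBuffer example: both branches insert an if-statement at the same place.
        System.out.println(label(Edit.INSERT, Edit.INSERT));
        // One branch moves a statement that the other branch updates.
        System.out.println(label(Edit.MOVE, Edit.UPDATE));
    }
}
```

Pairs that do not appear in Table 2, such as a deletion in one branch and an insertion in the other, yield an empty result rather than an invented label.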

Table 3: Classification of compiling conflicts based on their introduction

| Entity | Conflict Type | Description | Count |
|--------|---------------|-------------|-------|
| Class-related | Referencing a missing class C | One branch adds a reference to C, while the other branch renames, replaces, or removes C, or removes the import declaration of C. | 17 |
| Class-related | Importing a missing class C | One branch imports a class declaration of C, while the other branch removes the dependency library that declares C. | 1 |
| Interface-related | Referencing a missing interface I | One branch adds a reference to I, while the other branch renames I. | 1 |
| Interface-related | Restructuring an interface I | One branch declares a new method in I, while the other branch redefines the interface as an abstract class. | 2 |
| Enum-related | Referencing a missing enum data type E | One branch adds a reference to E, while the other branch removes E or moves it into a different Java class. | 2 |
| Method-related | Referencing a missing method M | One branch adds an invocation to M, while the other branch (1) renames, replaces, or removes M, (2) removes the import declaration of M, or (3) changes the parameter list. | 58 |
| Method-related | Invoking a method M with an updated return type | One branch adds an invocation to M, while the other branch changes the return type of M. | 2 |
| Method-related | Updating the parameter list of M | One branch updates one parameter, while the other branch adds a new parameter. | 1 |
| Method-related | Restructuring a method M | One branch changes the parameter list, while the other branch overrides M. | 2 |
| Field-related | Referencing a missing field F | One branch adds a reference to F, while the other branch renames or replaces F. | 12 |
| Var-related | Referencing a missing variable V | One branch adds a reference to V, while the other branch removes V. | 1 |
| Var-related | Adding duplicated declarations for variable V | Two branches separately add a declaration of V at different locations. | 1 |

We classified the observed compiling conflicts based on two factors: (1) the major entities involved, and (2) the edits producing the conflicts. As shown in Table 3, each conflict involves one of the following six program entities: classes, interfaces, enums, methods, fields, and variables. In particular, 63% of conflicts were related to methods, 18% of conflicts were relevant to classes, and 12% of conflicts were about field usage. Specifically among the method-related conflicts, 58 conflicts were introduced because a branch added a reference (invocation) to method M, and the other branch voided the referenced method by (1) renaming, removing, or replacing M, or (2) changing the parameter list of M. Similarly, among the conflicts related to other entities, most conflicts occurred because of the references to missing entities. Such conflicts usually produce the compilation errors of unresolvable referenced entities.

    base:
        public interface QueryBuilder extends ToContent { … }

    b_l:
        public interface QueryBuilder extends ToContent {
            QueryValidationException validation();
            …
        }

    b_r:
        public abstract class QueryBuilder extends ToContentToBytes { … }

    A_m:
        public abstract class QueryBuilder extends ToContentToBytes {
            QueryValidationException validation();
            …
        }

Figure 6: A compiling conflict that produces an erroneous abstract class definition

In addition to the typical broken def-use relationship of entities mentioned above, some compiling conflicts can produce invalid entity definitions. As illustrated by Figure 6, 𝑏_𝑙 declares a new method validation() in an existing interface QueryBuilder, while 𝑏_𝑟 converts the interface to an abstract class. Although the edits can be textually integrated, the resulting software 𝐴_𝑚 has an erroneous abstract class declaration. Java compilers usually prompt developers to fix such errors by either adding a method body for validation() or declaring it as an abstract method (i.e., using the abstract modifier).

Finding 2: 93% of compiling conflicts were related to either Java classes, methods, or fields. 91% of conflicts were introduced because the conflicting edits broke the referencer-referencee relationship of entities.
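The two compiler-suggested fixes for the erroneous declaration in Figure 6 can be sketched as follows. This is only an illustration: the nested types are hypothetical stand-ins for the project's real classes (which the figure elides), and the class names QueryBuilderWithAbstractMethod, QueryBuilderWithBody, and MatchAllQuery are invented for the example.

```java
// A minimal sketch of the two ways to repair the merged class from Figure 6.
// All types below are hypothetical stand-ins; the real project types are elided in the paper.
final class AbstractClassFixSketch {
    static class QueryValidationException extends RuntimeException {
        QueryValidationException(String message) { super(message); }
    }

    static abstract class ToContentToBytes { }

    // Fix 1: mark the merged-in method with the abstract modifier,
    // so every concrete subclass must implement it.
    static abstract class QueryBuilderWithAbstractMethod extends ToContentToBytes {
        abstract QueryValidationException validation();
    }

    // Fix 2: keep the method non-abstract and give it a body.
    static abstract class QueryBuilderWithBody extends ToContentToBytes {
        QueryValidationException validation() {
            return null; // assumed meaning: no validation error
        }
    }

    // A concrete subclass (hypothetical) that satisfies fix 1.
    static class MatchAllQuery extends QueryBuilderWithAbstractMethod {
        @Override
        QueryValidationException validation() {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(new MatchAllQuery().validation());
        System.out.println(new QueryBuilderWithBody() { }.validation());
    }
}
```

Leaving the method without a body and without the abstract modifier, as in 𝐴_𝑚, is what the compiler rejects; either variant above compiles.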

We identified two reasons to explain the four inspected dynamic conflicts.

1. Incorrect Test Oracle. Three conflicts occurred for this reason. As shown in Figure 7, 𝑏_𝑙 replaces the implementation of method getTemplate(...) such that a new exception RuntimePebbleException is thrown. Meanwhile, 𝑏_𝑟 adds some test cases, assuming that getTemplate(...) is unchanged and still throws the original exception ParserException. Consequently, these test cases fail because the expected exception type in the test oracle (i.e., ParserException) does not match the actual thrown exception (i.e., RuntimePebbleException).

2. Incorrectly Added Class Declaration. One manually detected conflict happened for this reason. As shown in Figure 8, 𝑏_𝑙 refactors the class hierarchy such that the interface QueryParser is implemented by only one class, the new class BaseQueryParserTemp, and all other classes originally implementing the interface are changed to instead inherit from this new class. On the other hand, 𝑏_𝑟 defines a new class ExistsQueryParser to implement the original interface. Blindly merging the two versions can break the new class hierarchy design that 𝑏_𝑙 tries to realize. Therefore, in 𝑀_𝑚, developers modified ExistsQueryParser to instead inherit from BaseQueryParserTemp.

Finding 3: The observed dynamic conflicts were introduced for two reasons: (1) the inconsistent changes between code implementation and test oracle, and (2) the inconsistent maintenance of the same class hierarchy.

    base:
        public getTemplate(final String templateName) throws PebbleException {
            // original implementation that may throw ParserException
        }

    𝑏_𝑙:
        public getTemplate(final String templateName) throws PebbleException {
            // updated implementation that may
            // throw RuntimePebbleException
        }

    𝑏_𝑟:
        + @Test(expected = ParserException.class)
        + public void testRenderWithoutEndBlock()
        +         throws PebbleException, IOException { …

    𝐴_𝑚:
        public getTemplate(final String templateName) throws PebbleException {
            // updated implementation that may
            // throw RuntimePebbleException
        }
        …
        @Test(expected = ParserException.class)
        public void testRenderWithoutEndBlock() throws PebbleException, IOException { …

    𝑀_𝑚:
        public getTemplate(final String templateName) throws PebbleException {
            // updated implementation that may
            // throw RuntimePebbleException
        }
        …
        @Test(expected = RuntimePebbleException.class)
        public void testRenderWithoutEndBlock() throws PebbleException, IOException { …

Figure 7: A dynamic conflict that causes incorrect test oracle

    base:
        public class QueryStringQueryParser implements QueryParser { … }
        public class BoolQueryParser implements QueryParser { … }

    𝑏_𝑙:
        + public class BaseQueryParserTemp
        +         implements QueryParser { … }
        public class QueryStringQueryParser extends BaseQueryParserTemp { … }
        public class BoolQueryParser extends BaseQueryParserTemp { … }

    𝑏_𝑟:
        + public class ExistsQueryParser
        +         implements QueryParser { … }

    𝐴_𝑚:
        public class BaseQueryParserTemp implements QueryParser { … }
        public class QueryStringQueryParser extends BaseQueryParserTemp { … }
        public class BoolQueryParser extends BaseQueryParserTemp { … }
        public class ExistsQueryParser implements QueryParser { … }

    𝑀_𝑚:
        public class BaseQueryParserTemp implements QueryParser { … }
        public class QueryStringQueryParser extends BaseQueryParserTemp { … }
        public class BoolQueryParser extends BaseQueryParserTemp { … }
        public class ExistsQueryParser extends BaseQueryParserTemp { … }

Figure 8: A dynamic conflict that incorrectly adds a class

Generally speaking, developers fixed conflicts by either keeping the edits from 𝑏_𝑙 (L), keeping the edits from 𝑏_𝑟 (R), keeping part or all edits from both branches (L + R), or applying extra edits after merging the branches (L + R + M). As shown in Table 4, developers resolved 69% of textual conflicts by taking the edits from 𝑏_𝑙 or 𝑏_𝑟. There is no evidence showing that developers usually preferred one branch over the other, probably because developers’ decision-making often depends on the applied edits and edit locations. Interestingly, for all six inspected update-move conflicts, developers always tried to integrate the branches instead of inheriting from one branch. We further grouped conflicts based on (1) their types and (2) the commits from which they were extracted. Among the 19 groups of conflicts identified in this way, 15 groups were resolved consistently. Namely, for the multiple conflicts in each of these 15 groups, developers took the same resolution strategy: 7 groups were resolved via L, 4 groups via R, 1 group via L + R, and 3 groups via L + R + M.

Table 4: Resolutions for different kinds of textual conflicts

Conflict Type | L | R | L + R | L + R + M
update-update | 17 | 13 | 13 | 8
delete-update | 14 | 9 | - | 2
insert-insert | | | |
update-move | - | - | 2 | 4
delete-move | | | |
move-move | - | - | 1 | -
Total | 38 | 31 | 17 | 14

“-” indicates a zero entry.

Finding 4: Developers handled 69% of textual conflicts by giving up the edits in one branch; they fixed the other 31% of conflicts by somehow integrating the edits from both sides.
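The observation that same-typed conflicts from the same commit were usually resolved with the same strategy suggests a simple lookup-based recommender. The sketch below is one possible shape for it, not a tool from this study: the class ResolutionSuggester, its method names, and the string identifiers for commits and conflict types are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// A minimal sketch: remember how a developer resolved one conflict and suggest
// the same strategy for later conflicts of the same type in the same merge commit.
// Class and method names are hypothetical.
final class ResolutionSuggester {
    enum Strategy { L, R, L_PLUS_R, L_PLUS_R_PLUS_M }

    // Key: merge commit id + conflict type, e.g. "abc123|update-update".
    private final Map<String, Strategy> observed = new HashMap<>();

    private static String key(String commitId, String conflictType) {
        return commitId + "|" + conflictType;
    }

    // Record the strategy the developer just applied (the latest one wins).
    void record(String commitId, String conflictType, Strategy strategy) {
        observed.put(key(commitId, conflictType), strategy);
    }

    // Suggest a strategy only when a same-typed conflict in the same commit was seen.
    Optional<Strategy> suggest(String commitId, String conflictType) {
        return Optional.ofNullable(observed.get(key(commitId, conflictType)));
    }

    public static void main(String[] args) {
        ResolutionSuggester suggester = new ResolutionSuggester();
        suggester.record("abc123", "update-update", Strategy.L);
        System.out.println(suggester.suggest("abc123", "update-update"));
        System.out.println(suggester.suggest("abc123", "delete-update"));
    }
}
```

Since 4 of the 19 groups were not resolved consistently, such a suggestion would need to remain a recommendation that developers can override.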

Although there are many conflict types listed in Table 3, to simplify discussion, we merged these types into three major categories, as shown in Table 5.

Table 5: Resolutions for compiling conflicts

Category | L | R | L + R + M
broken def-use | | |
invalid def | - | - | 6
other | - | - | 3
Total | | |

• Broken def-use includes all conflicts for which an added entity usage (e.g., method call, field access, or class reference) refers to a missing entity definition. This category corresponds to the finer-grained types in Table 3 that match the pattern “Referencing a missing entity *”.
• Invalid def includes the cases where two branches declare or update the definition of related entities, creating problematic entity definitions. This category covers the following types listed in Table 3: “Restructuring an interface 𝐼”, “Updating the parameter list of 𝑀”, “Restructuring a method 𝑀”, and “Adding duplicated declarations for variable 𝑉”.
• Other covers the cases not included by the above categories.

By definition, developers cannot resolve compiling conflicts by naïvely integrating the edits from both sides (i.e., L + R), because such integration can cause compilation errors. Therefore, in Table 5, we only list three resolution strategies: L, R, and L + R + M. According to the table, developers resolved 95 conflicts by applying extra edits after merging branches. It implies that developers usually integrated as many edits as possible, as long as those edits could be orchestrated with moderate effort.

    base:
        public SearchParseException(SearchContext context, String msg) {
            // original implementation
        }
        …
        throw new SearchParseException(context, “No parse for … ”);

    𝑏_𝑙:
        public SearchParseException(SearchContext context, String msg,
                @Nullable XContentLocation location) {
            // updated implementation
        }
        …
        throw new SearchParseException(context, “No parse for … ”,
                parser.getTokenLocation());

    𝑏_𝑟:
        + throw new
        +     SearchParseException(context,
        +         “Unknown key …”);

    𝐴_𝑚:
        public SearchParseException(SearchContext context, String msg,
                @Nullable XContentLocation location) {
            // updated implementation
        }
        …
        throw new SearchParseException(context, “No parse for … ”,
                parser.getTokenLocation());
        …
        throw new SearchParseException(context,
      +         “Unknown key …”);

    𝑀_𝑚:
        public SearchParseException(SearchContext context, String msg,
                @Nullable XContentLocation location) {
            // updated implementation
        }
        …
        throw new SearchParseException(context, “No parse for … ”,
                parser.getTokenLocation());
        …
        throw new SearchParseException(context,
      +         “Unknown key …”, parser.getTokenLocation());

Figure 9: A compiling conflict that developers resolved by consistently updating method calls

Figure 9 shows a typical way that developers took to resolve conflicts. In this example, 𝑏_𝑙 inserts a method call to SearchParseException(...); 𝑏_𝑟 updates the method signature by adding the parameter location, and consistently modifies the method calls by passing in one more value parser.getTokenLocation(). Blindly combining these edits can produce a broken def-use chain between the declared method and the added method call. Therefore, 𝑀_𝑚 contains developers’ extra edit to similarly update the new method invocation. Although the extra edits in 𝑀_𝑚 can be more complex for many conflicts (e.g., inserting, deleting, or moving statements), such edits were usually similar to some of the edits in one branch.

Additionally, developers resolved five other conflicts by keeping 𝑏_𝑙 or 𝑏_𝑟. Four of these conflicts are about adding a reference to a class whose import declaration is removed by the other branch. Naturally, developers resolved the conflicts by keeping the branches with imports. Another conflict is about adding a reference to a variable whose declaration is removed by the other branch. Similarly, developers kept the branch that has the original declaration.

Finding 5: Developers handled 95% of compiling conflicts by applying extra edits to the integrated version. These extra edits were usually similar to some of the edits in one branch.
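The extra edit in Figure 9 repeats an edit that one branch already applied elsewhere, so it can in principle be derived from a single before/after example. The sketch below illustrates that idea on a deliberately simplified model in which a call is just a callee name plus a list of argument strings; the class CallUpdateSketch and its methods are hypothetical, and the sketch only handles the "one trailing argument was appended" pattern.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// A minimal sketch of replicating an "append one argument" call update,
// learned from one example call that a branch already updated.
// The model and all names are hypothetical simplifications.
final class CallUpdateSketch {
    static final class Call {
        final String callee;
        final List<String> args;

        Call(String callee, List<String> args) {
            this.callee = callee;
            this.args = new ArrayList<>(args);
        }

        static Call of(String callee, String... args) {
            return new Call(callee, Arrays.asList(args));
        }

        String render() {
            return callee + "(" + String.join(", ", args) + ")";
        }
    }

    // If the example (before -> after) appends exactly one trailing argument, and the
    // target still calls the same callee with the old arity, append that argument too.
    // Otherwise the target is returned unchanged.
    static Call adapt(Call before, Call after, Call target) {
        int oldArity = before.args.size();
        boolean exampleAppendsOneArgument = after.callee.equals(before.callee)
                && after.args.size() == oldArity + 1
                && after.args.subList(0, oldArity).equals(before.args);
        boolean targetUsesOldSignature = target.callee.equals(before.callee)
                && target.args.size() == oldArity;
        if (!exampleAppendsOneArgument || !targetUsesOldSignature) {
            return target;
        }
        List<String> updated = new ArrayList<>(target.args);
        updated.add(after.args.get(oldArity));
        return new Call(target.callee, updated);
    }

    public static void main(String[] args) {
        Call before = Call.of("SearchParseException", "context", "\"No parse for ...\"");
        Call after = Call.of("SearchParseException", "context", "\"No parse for ...\"",
                "parser.getTokenLocation()");
        Call added = Call.of("SearchParseException", "context", "\"Unknown key ...\"");
        System.out.println(adapt(before, after, added).render());
    }
}
```

A real tool would work on syntax trees and would have to check that the appended expression (here parser.getTokenLocation()) is valid at the new call site, which this sketch does not attempt.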

All dynamic conflicts were resolved with the strategy L + R + M. For the three conflicts with incorrect test oracles, developers applied extra edits to correct those oracles. In particular, the corrective edits in 𝑀_𝑚 for two conflicts were similar to those applied in one branch. However, the corrective edits for the third conflict are different from those in both branches. As shown in Figure 7, the corrective edit in 𝑀_𝑚 is:

    - @Test(expected = ParserException.class)
    + @Test(expected = RuntimePebbleException.class)

Meanwhile, the adaptive edit consistently applied in 𝑏_𝑙 is:

    - @Test(expected = ParserException.class)
    + @Test

Theoretically, even if developers replicate their edits in 𝑏_𝑙 to 𝑀_𝑚, they can still fix the conflict. This example still supports our observation that the extra edits in 𝑀_𝑚 are usually similar to those in one branch. Finally, for the conflict producing an incorrectly added class declaration, developers resolved the problem by similarly applying their 𝑏_𝑙 edits to 𝑀_𝑚 (see Figure 8).

Finding 6: Developers handled all dynamic conflicts by applying extra edits to the integrated versions; 75% of these extra edits were similar to those from one branch.

This section discusses the capability of current tools in two aspects: conflict detection (Section 4.4.1) and resolution (Section 4.4.2).

As shown in Table 6, all explored tools could reveal the textual conflicts in our data set. However, only Crystal-like merge revealed any of the other conflicts: 21% of the inspected compiling conflicts and 25% of the dynamic conflicts. In particular, although we manually found most compiling conflicts in automatically unmergeable branches, Crystal-like merge cannot detect these conflicts because automatic compilation is feasible only when 𝐴_𝑚 is generated. Therefore, to more efficiently detect compiling conflicts, future tools can (1) statically compare the program structures of software branches, and (2) mimic compilers to identify any conflicting dependency between program entities (e.g., an added method call refers to a removed method declaration).

Table 6: Conflicts detected by existing tools

Tools | Textual Conflicts | Compiling Conflicts | Dynamic Conflicts
Text-based merge (e.g., git-merge) | | |
JDime | | |
Crystal-like merge (e.g., Crystal and WeCode [16]) | | |
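The static check suggested above, flagging an added reference whose declaration the other branch removed, can be sketched without compiling any merged version. The code below is a toy illustration under strong assumptions: entities are represented as plain signature strings, the declaration and reference sets are assumed to have been extracted from each version beforehand, and the class BrokenDefUseDetector and its methods are hypothetical.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// A minimal sketch of detecting broken def-use conflicts by comparing the
// declarations and newly added references of two branches against the base.
// Entities are modeled as signature strings; all names are hypothetical.
final class BrokenDefUseDetector {
    static Set<String> names(String... values) {
        return new TreeSet<>(Arrays.asList(values));
    }

    // References added by one branch to entities that exist in the base and in that
    // branch, but are no longer declared in the other branch.
    private static Set<String> dangling(Set<String> baseDecls, Set<String> ownDecls,
                                        Set<String> otherDecls, Set<String> addedRefs) {
        Set<String> result = new TreeSet<>();
        for (String ref : addedRefs) {
            boolean declaredBefore = baseDecls.contains(ref) && ownDecls.contains(ref);
            if (declaredBefore && !otherDecls.contains(ref)) {
                result.add(ref);
            }
        }
        return result;
    }

    // Runs the check in both directions and returns the conflicting entities.
    static Set<String> detect(Set<String> baseDecls,
                              Set<String> leftDecls, Set<String> rightDecls,
                              Set<String> leftAddedRefs, Set<String> rightAddedRefs) {
        Set<String> conflicts = dangling(baseDecls, leftDecls, rightDecls, leftAddedRefs);
        conflicts.addAll(dangling(baseDecls, rightDecls, leftDecls, rightAddedRefs));
        return conflicts;
    }

    public static void main(String[] args) {
        String oldCtor = "SearchParseException(SearchContext,String)";
        String newCtor = "SearchParseException(SearchContext,String,XContentLocation)";
        // One branch keeps the old constructor and adds a call to it;
        // the other branch replaces the constructor with a three-parameter one.
        System.out.println(detect(names(oldCtor), names(oldCtor), names(newCtor),
                names(oldCtor), names()));
    }
}
```

Because the check needs only the three input versions, it would also apply to branches that text-based merge cannot combine into an 𝐴_𝑚.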

Similarly, Crystal-like approaches are not effective at revealing dynamic conflicts, either. Two reasons can explain such deficiency. First, automatic testing does not help when no 𝐴_𝑚 is generated or executable. Second, when dynamic conflicts are not covered by any test run, testing cannot reveal such conflicts. Sousa et al. recently proposed SafeMerge, a verification algorithm to reason about the semantics of 𝑏𝑎𝑠𝑒, 𝑏_𝑙, 𝑏_𝑟, and 𝐴_𝑚 [34]. By identifying any semantic equivalence between the four versions, the researchers showed that SafeMerge correctly revealed three dynamic conflicts without running any test. However, SafeMerge verifies one procedure at a time, assuming that other procedures never affect this one; the approach cannot detect conflicts when the assumption does not hold. In our data set, unfortunately, SafeMerge cannot detect any of the studied dynamic conflicts because the edits applied in distinct program entities affect each other.

Finding 7: In our data set, 79% of compiling conflicts and 75% of dynamic conflicts could NOT be revealed by any existing tool, meaning that we still need better detection tools that require no compilation or test execution.

Due to the approach we took to collect data, none of the tools mentioned in Section 4.4.1 could resolve the studied conflicts.

Additionally, AutoMerge is a tool that recommends alternative resolutions for developers’ consideration [38]. For instance, suppose that 𝑏𝑎𝑠𝑒 has two statements (𝑆₁; 𝑆₂), which were updated to (𝑆₁′; 𝑆₂′) by 𝑏_𝑙 and updated to (𝑆₁′′; 𝑆₂′′) by 𝑏_𝑟 (𝑆₁′ ≠ 𝑆₁′′, 𝑆₂′ ≠ 𝑆₂′′). AutoMerge enumerates all possible combinations between the updated statements, e.g., (𝑆₁′; 𝑆₂′′), to suggest candidate resolutions. In our data set, the actual resolutions for 86% of textual conflicts can be covered by such candidate resolutions, meaning that AutoMerge can potentially resolve 86% of the inspected textual conflicts. However, given a conflict, AutoMerge cannot automatically decide for developers which candidate resolution to take.

WeCode resolves certain kinds of textual conflicts in predefined ways [16]. Specifically, for the update-update conflicts where the same variable is assigned to different constant integers, WeCode simply assigns the variable to a predefined default value (e.g., 0). For delete-update conflicts, WeCode keeps the updated statement. However, WeCode does not ensure the correctness of its resolutions. Theoretically, WeCode can resolve at most 30 conflicts (i.e., 5 update-update for variable assignments + 25 delete-update).

Xing et al. built a semi-automatic approach to resolve the dynamic conflicts implied by test failures [36]. After test failures, developers are required to define extra tests to show the expected program behaviors; an off-the-shelf Automatic Program Repair (APR) system (i.e., kGenProg [17]) is then used to generate patches such that the patched program can pass all tests. Theoretically, this approach can resolve only one conflict in our data set: the conflict that passes compilation but fails testing. In reality, the approach is limited by two assumptions. First, the newly defined tests can cover all desired program behaviors of the merged software. Second, APR can create the correct patch within a given period of time.

Finding 8: Theoretically, the resolutions for 86% of textual conflicts and 25% of dynamic conflicts can be generated by existing tools. However, these tools do not guarantee to automatically choose the correct resolutions over the other generated candidates. No tool resolves compiling conflicts.
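The candidate enumeration described for AutoMerge amounts to a Cartesian product over the alternative versions of each conflicting statement. The sketch below shows only that enumeration step; it is not AutoMerge's implementation (which, per its title, is built on version space algebra), and the class CandidateEnumerator and the string encoding of statements are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// A minimal sketch of enumerating candidate resolutions as the Cartesian product
// of per-statement alternatives. Names are hypothetical; statements are plain strings.
final class CandidateEnumerator {
    // alternatives.get(i) holds the competing versions of the i-th statement.
    static List<List<String>> enumerate(List<List<String>> alternatives) {
        List<List<String>> candidates = new ArrayList<>();
        candidates.add(new ArrayList<String>()); // start from the empty resolution
        for (List<String> options : alternatives) {
            List<List<String>> extendedCandidates = new ArrayList<>();
            for (List<String> prefix : candidates) {
                for (String option : options) {
                    List<String> extended = new ArrayList<>(prefix);
                    extended.add(option);
                    extendedCandidates.add(extended);
                }
            }
            candidates = extendedCandidates;
        }
        return candidates;
    }

    public static void main(String[] args) {
        // Two statements, each with a left-branch and a right-branch version.
        List<List<String>> alternatives = Arrays.asList(
                Arrays.asList("S1'", "S1''"),
                Arrays.asList("S2'", "S2''"));
        for (List<String> candidate : enumerate(alternatives)) {
            System.out.println(candidate);
        }
    }
}
```

With two alternatives for each of two statements the product has four candidates, and it grows exponentially with the number of conflicting statements, which is one reason why choosing among candidates remains the developer's task.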

Our work reveals the characteristics of various conflicts and the resolution strategies of developers. These findings lead us to give the following recommendations on future tool design.

Resolution Prediction for Textual Conflicts. Given a textual conflict, existing tools at most enumerate and suggest the feasible resolution alternatives, but cannot predict which suggestion developers finally take. According to Section 4.3.1, for the same-typed conflicts from the same commit, developers usually took the same resolution strategy. Inspired by this observation, we can build future tools that (1) monitor developers’ resolutions (e.g., L) for certain types of conflicts (e.g., update-update), and (2) dynamically suggest the same resolution strategies for the same-typed unresolved conflicts occurring in the same commit. In this way, these tools can minimize the manual effort required to fix textual conflicts, and thus help developers create the merged software much faster.

Detection of Higher-Order Conflicts. Existing tools detect higher-order conflicts via compiling and testing the 𝐴_𝑚 created by text-based merge. However, we observed that (1) many higher-order conflicts were actually introduced by text-based merge; (2) dynamic conflicts do not always trigger test failures; and (3) not every test failure can help pinpoint the related conflict. All our observations motivate better approaches that conduct static program analysis to detect all types of conflicts at once. With a holistic view of the conflicts between branches, developers can prioritize conflicts based on their importance instead of the exposure sequence, and avoid introducing higher-order conflicts when handling textual conflicts.

Resolution of Higher-Order Conflicts. Current tools rarely fix higher-order conflicts. However, we observed that developers typically took the L + R + M strategy by applying extra edits to 𝐴_𝑚, and these edits were similar to those applied in one branch. Some systematic editing tools like SYDIT [25] and LASE [24] can generalize abstract program transformations from concrete code change examples, and repetitively apply similar transformations to similar code snippets. It is promising to extend these tools for automatic conflict resolution. For instance, given a code snippet with a higher-order conflict, we can extend LASE to search for similar code that was edited in one branch, and then similarly apply the edit to 𝐴_𝑚.

The related work includes empirical studies of merge conflicts, automatic merge approaches, and change recommendation systems.

Several studies were recently conducted to characterize merge conflicts [10, 15, 19, 23, 32]. Specifically, Leßenich et al. surveyed 41 developers and identified 7 potential indicators (e.g., … They observed that 44% of conflicts were caused by conflicting updates on the same line of code, and developers resolved 99% of conflicts by taking either the left or right version of code. Our study revealed similar findings. Nguyen et al. observed that (1) a higher integration rate of a project does not generate a higher unresolved conflict rate, and (2) developers are more likely to take the left or right version of code to resolve higher-order conflicts [30]. Nelson et al. conducted both surveys and interviews with developers [28]. They learnt that developers deferred responding to conflicts based on their perception of the complexity of the conflicting code, and that deferring affects the workflow of the entire team.

Our study is different from prior work in two aspects. First, by experimenting with various conflict detection tools, we explored the gap between hard-to-detect conflicts and capabilities of current tools, and suggested future research to close the gap. Second, by investigating the resolution strategies adopted by developers, we summarized the fixing patterns for conflicts and identified opportunities for further automation.

Tools were built to detect or resolve merge conflicts [11–14, 16, 20, 36, 38]. For instance, FSTMerger combines structured and unstructured merge [12]. It matches Java methods purely based on the method signatures, and integrates the content of matched methods via text-based merge. WeCode continuously merges the committed and uncommitted changes in software branches, and raises developers’ awareness when any conflict is detected [16]. The system detects conflicts matching certain patterns and resolves conflicts in predefined ways. Xing et al. proposed a semi-automatic approach to resolve dynamic conflicts [36]. When a merged program fails one or more tests, the approach requires developers to define more tests, and then adopts Automatic Program Repair (APR) to generate candidate fixes to resolve conflicts until all tests are passed.

We took both manual and automatic approaches to reveal a spectrum of conflicts. By revealing the conflicts overlooked by existing tools, we suggested new tools for conflict detection. By characterizing developers’ strategies to resolve conflicts, we checked whether existing tools mimic humans’ resolutions, and revealed some unknown but frequently adopted strategies that are automatable.

Based on the insight that similar code is likely to be changed similarly, researchers proposed tools to recommend code changes or facilitate systematic program editing [18, 21, 24, 25, 27, 29, 31]. In more detail, simultaneous editing enables developers to simultaneously edit multiple pre-selected code fragments in the same way [27]. While a developer interactively demonstrates the edit operations in one fragment, the tool replicates the lexical edits (e.g., copy a line) to other fragments. CP-Miner identifies code clones (i.e., similar code snippets), and detects copy-paste related bugs if clones have inconsistent identifier or context mappings [21]. Given two or more similarly changed code examples, LASE extracts the common edit operations, infers a general program transformation, and leverages the transformation to locate code for similar edits [24].

Although both software merge and change recommendation systems are active research areas, our study is the first piece of work that identifies a connection between the areas. Our study uncovers various scenarios where developers repetitively applied similar or related edits to resolve conflicts. It motivates future research to improve change recommendations such that the conflict-specific systematic edits can be dynamically generated and applied.

Threats to External Validity. Our study is based on the 204 conflicts extracted from 15 Java project repositories. The characteristics these conflicts present and the observed resolutions may not generalize well to other conflicts, other projects, or other programming languages. In the future, we plan to reduce this threat by considering the projects of other programming languages, investigating more projects, and including more samples in our data set.

Threats to Construct Validity. We took a hybrid approach to detect conflicts. In particular, as automatic tools only revealed 21 compiling conflicts and 1 dynamic conflict, we manually analyzed unmerged software branches to detect more higher-order conflicts. It is possible that certain conflicts are easier to manually detect than others, so the collected data can be subject to human bias. To overcome this limitation, we leveraged cross-validation (i.e., having multiple people examine the same conflicts) to reduce random errors and avoid bias. In the future, we will build better tools to detect various conflicts more systematically and efficiently.

Prior empirical studies showed that merge conflicts frequently occur and solving conflicts is important but challenging. In this study, we comprehensively studied all kinds of conflicts and their resolutions, and characterized the conflicts that cannot be handled by existing tools. Different from prior studies that mainly focus on textual conflicts, our study is wider and deeper for two reasons. First, we also examined higher-order conflicts, which are harder to reveal and resolve. Second, by assessing related tools with the same data set, we evaluated the capabilities of current approaches in a theoretical way and characterized the limit of current approach design.

Our study provides multiple insights. First, same-typed textual conflicts in the same commits were usually resolved with the same strategy (i.e., L, R, L + R, or L + R + M). Therefore, even though it is almost impossible to predict developers’ strategy for any arbitrary textual conflict, such prediction is feasible given the resolution of other textual conflicts in the same merging commit. Second, text-based merge can produce higher-order conflicts by silently integrating semantically conflicting edits, while compilation and testing usually fail to capture the produced conflicts. It means that better tools are needed to detect all kinds of conflicts together, instead of detecting certain conflicts at the cost of introducing other conflicts. Third, developers usually resolved higher-order conflicts by consistently applying similar edits to similar code locations. By automating such practices, future tools can resolve many higher-order conflicts that we observed. Our future work is on building the (semi-)automatic tools motivated by this study.

REFERENCES

… , pages 58–67, Nov 2017.
[11] S. Apel, O. Lessenich, and C. Lengauer. Structured merge with auto-tuning: Balancing precision and performance. In Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering, ASE 2012, pages 120–129, New York, NY, USA, 2012. ACM.
[12] S. Apel, J. Liebig, B. Brandl, C. Lengauer, and C. Kastner. Semistructured merge: Rethinking merge in revision control systems. In Proceedings of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering, ESEC/FSE ’11, pages 190–200, New York, NY, USA, 2011. ACM.
[13] Y. Brun, R. Holmes, M. D. Ernst, and D. Notkin. Proactive detection of collaboration conflicts. In Proceedings of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering, ESEC/FSE ’11, pages 168–178, New York, NY, USA, 2011. ACM.
[14] Y. Brun, R. Holmes, M. D. Ernst, and D. Notkin. Early detection of collaboration conflicts and risks. IEEE Transactions on Software Engineering, 39(10):1358–1375, Oct 2013.
[15] H. C. Estler, M. Nordio, C. A. Furia, and B. Meyer. Awareness and merge conflicts in distributed software development. In , pages 26–35, Aug 2014.
[16] M. L. Guimarães and A. R. Silva. Improving early detection of software merge conflicts. In Proceedings of the 34th International Conference on Software Engineering, ICSE ’12, pages 342–352, Piscataway, NJ, USA, 2012. IEEE Press.
[17] Y. Higo, S. Matsumoto, R. Arima, A. Tanikado, K. Naitou, J. Matsumoto, Y. Tomida, and S. Kusumoto. kgenprog: A high-performance, high-extensibility and high-portability apr system. In , pages 697–698, Dec 2018.
[18] L. Jiang, Z. Su, and E. Chiu. Context-based detection of clone-related bugs. In Proceedings of the 6th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on The Foundations of Software Engineering, 2007.
[19] O. Leßenich, J. Siegmund, S. Apel, C. Kästner, and C. Hunsen. Indicators for merge conflicts in the wild: Survey and empirical study. Automated Software Engg., 25(2):279–313, June 2018.
[20] O. Leßenich, S. Apel, and C. Lengauer. Balancing precision and performance in structured merge. Automated Software Engineering, 22:367–397, 2014.
[21] Z. Li, S. Lu, S. Myagmar, and Y. Zhou. CP-Miner: A tool for finding copy-paste and related bugs in operating system code. In OSDI, pages 289–302, 2004.
[22] F. Long, P. Amidon, and M. Rinard. Automatic inference of code transforms for patch generation. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, pages 727–739. ACM, 2017.
[23] M. Mahmoudi, S. Nadi, and N. Tsantalis. Are refactorings to blame? an empirical study of refactorings in merge conflicts. In , pages 151–162, Feb 2019.
[24] N. Meng, M. Kim, and K. McKinley. Lase: Locating and applying systematic edits. In ICSE, page 10, 2013.
[25] N. Meng, M. Kim, and K. S. McKinley. Systematic editing: Generating program transformations from an example. In Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’11, pages 329–342, New York, NY, USA, 2011. ACM.
[26] T. Mens. A state-of-the-art survey on software merging. IEEE Transactions on Software Engineering, 28(5):449–462, May 2002.
[27] R. C. Miller and B. A. Myers. Interactive simultaneous editing of multiple text regions. In Proceedings of the General Track: 2002 USENIX Annual Technical Conference, pages 161–174, Berkeley, CA, USA, 2001. USENIX Association.
[28] N. Nelson, C. Brindescu, S. McKee, A. Sarma, and D. Dig. The life-cycle of merge conflicts: processes, barriers, and strategies. Empirical Software Engineering, pages 1–44, 2018.
[29] H. A. Nguyen, T. T. Nguyen, G. Wilson, Jr., A. T. Nguyen, M. Kim, and T. N. Nguyen. A graph-based approach to API usage adaptation. pages 302–321, 2010.
[30] H. L. Nguyen and C.-L. Ignat. An analysis of merge conflicts and resolutions in git-based open source projects. Computer Supported Cooperative Work (CSCW), 27(3):741–765, Dec 2018.
[31] T. T. Nguyen, H. A. Nguyen, N. H. Pham, J. M. Al-Kofahi, and T. N. Nguyen. Clone-aware configuration management. In ASE, pages 123–134, 2009.
[32] M. Owhadi-Kareshk, S. Nadi, and J. Rubin. Predicting merge conflicts in collaborative software development. https://arxiv.org/pdf/1907.06274.pdf.
[33] T. Savor, M. Douglas, M. Gentili, L. Williams, K. Beck, and M. Stumm. Continuous deployment at facebook and oanda. In Proceedings of the 38th International Conference on Software Engineering Companion, ICSE ’16, pages 21–30, New York, NY, USA, 2016. ACM.
[34] M. Sousa, I. Dillig, and S. Lahiri. Verified three-way program merge. In Object-Oriented Programming, Systems, Languages & Applications Conference (OOPSLA 2018). ACM, November 2018.
[35] B. Vasilescu, Y. Yu, H. Wang, P. Devanbu, and V. Filkov. Quality and productivity outcomes relating to continuous integration in github. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2015, pages 805–816, New York, NY, USA, 2015. ACM.
[36] X. Xing and K. Maruyama. Automatic software merging using automated program repair. In , pages 11–16, Feb 2019.
[37] R. Yuzuki, H. Hata, and K. Matsumoto. How we resolve conflict: an empirical study of method-level conflict resolution. In , pages 21–24, March 2015.
[38] F. Zhu and F. He. Conflict resolution for structured merge via version space algebra.