Recommending More Efficient Workflows to Software Developers
arXiv [cs.SE], Feb
Dylan Bates
Coker University
ABSTRACT
Existing recommendation systems can help developers improve their software development abilities by recommending new programming tools, such as a refactoring tool or a program navigation tool. However, simply recommending tools in isolation may not, in and of itself, allow developers to successfully complete their tasks. In this paper, I introduce a new recommendation system that recommends workflows, or sequences of tools, to developers. By helping developers learn more efficient workflows, the system could make them more efficient.
1. RESEARCH PROBLEM AND MOTIVATION
Software developers use tools to make programming easier, reducing development time and improving software quality. However, developers use only a small fraction of the tools available to them [4]. Several systems combat this by recommending tools to the user. Some simply recommend the most popular tools in an integrated development environment that the user does not use [2]. Others compare a user’s tool frequencies to those of the entire user population [3]. Exposure to more tools encourages the developer to use them, reducing development time and ultimately increasing software quality.

The sequential use of several tools characterizes a software developer’s workflow, which Linton and colleagues define as “one or more plans for attaining a goal,” which “ultimately decompose to a sequence of actions” [2]. In this paper, I interpret this as a series of tools used one after another in order to accomplish a specific, unique, and efficient task. I propose building on prior work that recommends tools by instead recommending more efficient workflows.

Consider the following example of an inefficient workflow. Suppose Evelyn is working in the Eclipse development environment, repeatedly using the Find References command on several methods in a call chain. The task she is trying to accomplish is walking up a call hierarchy. This workflow is inefficient because she could accomplish the same task with the Call Hierarchy tool, which takes fewer steps than using Find References repeatedly. Most workflows use several tools, such as Copy/Paste to duplicate and move text and Organize Imports/Format/Save All to clean up code.

The insight of this paper is that some of the same ideas previously used to recommend tools can also be used to recommend workflows.

My paper makes three contributions: it introduces a system that recognizes the most common workflows composed of n tools; it demonstrates how n-tool workflows (called n-flows) can then be used to recommend a workflow to a software developer; and it explains how a recommendation system could implement this technique of recommending workflows, potentially increasing developers’ efficiency and productivity.
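To make the notion of an n-flow concrete, the following is a minimal sketch that reads a developer's tool log as an ordered list and returns every run of n consecutive tool uses. The function name `n_flows` and the sample session are hypothetical and chosen only for illustration; the paper does not prescribe a particular representation.

```python
from typing import List, Sequence, Tuple


def n_flows(tools: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return every run of n consecutive tool uses, in order of occurrence."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Slide a window of width n over the ordered tool log.
    return [tuple(tools[i:i + n]) for i in range(len(tools) - n + 1)]


# Hypothetical session: Evelyn walks up a call chain, then cleans up her code.
session = [
    "Find References", "Find References", "Find References",
    "Organize Imports", "Format", "Save All",
]

for flow in n_flows(session, 3):
    print("/".join(flow))
```

Under this reading, a 1-flow is a single tool, and the repeated Find References run shows up as a 3-flow made of one tool, which is the kind of pattern a system might flag as a candidate for replacement by Call Hierarchy.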
2. BACKGROUND AND RELATED WORK
The previous work in this field recommends single tools (which I define as 1-flows) to a user. Existing systems do this via one of several algorithms, such as content-based filtering [3, 4], collaborative filtering [3, 4], and most popular [2]. The main difference between these methods is the way they generate the items to recommend. These methods have already been used by Matejka and colleagues [3], Murphy-Hill and colleagues [4], and Linton and colleagues [2] to recommend tools. One could feasibly extend their methods to recommend workflows to developers.

These methods focus on recommending single tools that will potentially increase the productivity of a developer. The problem with existing implementations is that, presented out of context, some tools do not constitute a complete task and are therefore difficult for a user to apply effectively. Viriyakattiyaporn and Murphy [5] addressed this problem with a system called Spyglass, which attempts to recognize inefficient navigational workflows and suggests tools to aid program navigation as a developer works. Unfortunately, Spyglass is limited in that it can only recommend single navigational tools, rather than potential workflows that users could adopt.

3. APPROACH AND UNIQUENESS

I identified the most common workflows by analyzing about 23 million time-stamped tool uses from 4308 Eclipse users, collected from the Eclipse Usage Data Collector. A total of 700 unique tools were used. I tried three ways to discover the most common n-flows, each of which has its own advantages and disadvantages: a top-k sequential pattern mining algorithm called TKS [1], sorting an n-dimensional matrix, and using a Map to map n-flows to their number of uses. Each implementation worked well for different dataset sizes and numbers of dimensions.

Using all three algorithms, I was able to determine the top n-flows for n < 5. I was then able to use Linton and colleagues’ most popular algorithm [2] to recommend workflows to any given user in the set.

By looking at n-flows instead of single tools, this system can recommend entire workflows that a user may not otherwise discover on their own. For example, if Evelyn is new to Eclipse and uses an existing recommendation system to learn new tools, it will recommend a single tool to her (for example, Copy). Without proper documentation for the tool, Evelyn may be lost. Certain tools, when used, make no observable changes to the screen; she will find that nothing apparent happens. Had Evelyn been recommended a workflow containing the tool (in this case Copy/Paste), it may be clearer what the workflow accomplishes, because the context for use is then clear. This method can eliminate the discoverability barrier [4] presented by Murphy-Hill and colleagues, as developers will learn new tools as well as more efficient workflows.

When cleaning the data, I had to prune the dataset to remove all repeated uses of tools, such as deleting an entire line one character at a time, or saving seven times before doing anything else. Before doing this, the top five workflows for n < 5 included repetitions such as Delete/Delete/Delete or Save/Save/Save/Save. These were removed from the results.
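The Map-based counting, the pruning of repeated tool uses, and the most popular recommendation step can be sketched together as follows. This is an illustration under stated assumptions, not the implementation used in the study: it assumes pruning means collapsing each run of consecutive identical tools into a single use, it assumes a "most popular" recommendation is the most frequent n-flow that the user has not already performed, and the names `collapse_repeats`, `count_flows`, `recommend`, and the sample `sessions` are all hypothetical.

```python
from collections import Counter
from itertools import groupby
from typing import Dict, List, Sequence, Tuple

Flow = Tuple[str, ...]


def collapse_repeats(tools: Sequence[str]) -> List[str]:
    """Collapse runs of the same tool (e.g. Save/Save/Save) into one use.

    Assumption: this is one plausible reading of the pruning step.
    """
    return [tool for tool, _ in groupby(tools)]


def count_flows(sessions: Dict[str, Sequence[str]], n: int) -> Counter:
    """Map each n-flow to its number of uses across all users."""
    counts: Counter = Counter()
    for tools in sessions.values():
        pruned = collapse_repeats(tools)
        for i in range(len(pruned) - n + 1):
            counts[tuple(pruned[i:i + n])] += 1
    return counts


def recommend(sessions: Dict[str, Sequence[str]], user: str,
              n: int, k: int) -> List[Flow]:
    """Return up to k of the most used n-flows that `user` has never used."""
    overall = count_flows(sessions, n)
    own = count_flows({user: sessions[user]}, n)
    ranked = [flow for flow, _ in overall.most_common() if flow not in own]
    return ranked[:k]


# Hypothetical tool logs, one ordered list per user.
sessions = {
    "evelyn": ["Copy", "Save", "Save", "Save"],
    "user_b": ["Copy", "Paste", "Save"],
    "user_c": ["Copy", "Paste", "Copy", "Paste"],
}

print(recommend(sessions, "evelyn", n=2, k=1))
```

A hash map keeps only the n-flows that actually occur, so it avoids allocating an n-dimensional matrix over all 700 tools, which grows quickly as n increases.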
4. RESULTS AND CONTRIBUTION
My research suggests that while there are several common workflows that Eclipse users follow in order to accomplish a task, there are many others that do not appear to serve a specific purpose. I collected the top 100 n-flows for n < 5 and was able to generate 209 workflows and individual tools to recommend to a user in the dataset. I found that many of the n-flows I discovered did not meet my definition of a workflow, as they did not appear to accomplish a specific, unique, or efficient task. Because of this, most should not be recommended in a system that recommends more efficient workflows.

While 2-flows occurred most frequently, they are also the most trivial. For example, the most common 2-flow was Copy/Paste, which more than 99% of all users in the dataset used. The most common 3-flow was Paste/Copy/Paste. The most common 4-flow was Copy/Paste/Copy/Paste. Each of these n-flows is a subset of an (n + 1)-flow, a pattern that was found to occur throughout the n-flows discovered. Interestingly, the top eight n-flows for n < 5 consisted only of Copy, Cut, Delete, Paste, and Save. In fact, most of the n-flows I discovered consisted of these five tools, as well as simple text editing tools, such as Go To Line Start, Select To Line End, Go To Previous Word, and Select Next Word.

Footnote: Available at http://goo.gl/nRKMv3

The major limitation that I faced was the noise found in mining common workflows, which makes recommending more efficient workflows difficult. That is to say, the vast majority of n-flows found do not appear to complete a task. Also, the most common n-flows found are redundant, in that they are often a subset of another common workflow. Additionally, the most frequently occurring n-flows consist of tools that nearly everybody uses, such as saving, moving the cursor, and selecting text, which means they would not be as effective for recommendations. A recommendation system would need to recommend workflows that are necessary, efficient, and accomplish a task.

Building on the results presented here, I can use the idea of recommending workflows to create a functioning recommendation system based on the algorithms mentioned in Section 2. In principle, one could go through these n-flows and keep only those that meet the definition of a workflow given earlier. Then, a recommendation system could be built that recommends only relevant workflows to users. Finally, a study could be carried out to determine the effectiveness of recommending workflows. Only then could the insight of this paper be confirmed: that learning more efficient workflows makes software developers more efficient.
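Two of the limitations described in this section, redundancy and near-universal use, lend themselves to a simple filtering pass over the mined n-flows. The sketch below is one possible approach and is not part of the study: it drops any n-flow that occurs as a contiguous run inside a longer mined flow, and any flow used by more than a chosen share of users. The helper names, the `max_share` threshold, and the usage shares in the example are hypothetical. Judging whether a flow actually accomplishes a task is not addressed here.

```python
from typing import Dict, List, Tuple

Flow = Tuple[str, ...]


def is_contained(short: Flow, long: Flow) -> bool:
    """True if `short` occurs as a contiguous run inside the longer `long`."""
    n = len(short)
    if n >= len(long):
        return False
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))


def filter_candidates(flows: List[Flow], user_share: Dict[Flow, float],
                      max_share: float = 0.9) -> List[Flow]:
    """Keep flows that are neither redundant nor used by nearly everybody.

    `max_share` is an assumed cutoff, not a value taken from the paper.
    """
    kept = []
    for flow in flows:
        redundant = any(is_contained(flow, other) for other in flows)
        too_common = user_share.get(flow, 0.0) > max_share
        if not redundant and not too_common:
            kept.append(flow)
    return kept


# Mined flows named in the paper; the shares below are illustrative only.
flows = [
    ("Copy", "Paste"),
    ("Paste", "Copy", "Paste"),
    ("Copy", "Paste", "Copy", "Paste"),
    ("Organize Imports", "Format", "Save All"),
]
user_share = {
    ("Copy", "Paste"): 0.99,
    ("Copy", "Paste", "Copy", "Paste"): 0.95,   # hypothetical value
    ("Organize Imports", "Format", "Save All"): 0.20,  # hypothetical value
}

print(filter_candidates(flows, user_share))
```

With these illustrative inputs, the Copy/Paste variants are removed either as sub-runs of a longer flow or as too widely used, leaving only the code clean-up workflow as a candidate.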
5. REFERENCES
[1] P. Fournier-Viger, A. Gomariz, T. Gueniche, E. Mwamikazi, and R. Thomas. TKS: Efficient mining of top-k sequential patterns. In ADMA (1), pages 109–120, 2013.
[2] F. Linton, D. Joy, H. Schaefer, and A. Charron. OWL: A recommender system for organization-wide learning. Educational Technology & Society, 3(1):62–76, 2000.
[3] J. Matejka, W. Li, T. Grossman, and G. Fitzmaurice. CommunityCommands: Command recommendations for software applications. In Proceedings of the 22nd Annual ACM Symposium on User Interface Software and Technology, UIST ’09, pages 193–202, 2009.
[4] E. Murphy-Hill, R. Jiresal, and G. C. Murphy. Improving software developers’ fluency by recommending development environment commands. In Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering, FSE ’12, pages 42:1–42:11, 2012.
[5] P. Viriyakattiyaporn and G. C. Murphy. Improving program navigation with an active help system. In