CcNav: Understanding Compiler Optimizations in Binary Code
Sabin Devkota, Pascal Aschwanden, Adam Kunen, Matthew Legendre, Katherine E. Isaacs
© 2020 IEEE. This is the author’s version of the article that has been published in IEEE Transactions on Visualization and Computer Graphics. The final version of this record is available at: 10.1109/TVCG.2020.3030357
Fig. 1. CcNav uses multiple coordinated views to enable correlation between source code (a) and disassembled binary code (b). A loop is selected in the loop hierarchy view (e). Matching disassembly is highlighted (b) with source variables annotated either with automated analysis or manual entry (h). The control flow graph (c), call graph (d), and function inlining (f) views provide extra context to the selection and alternative modes of navigation. A separate panel (g) collects all highlighted items for detailed examination.
Abstract—Program developers spend significant time on optimizing and tuning programs. During this iterative process, they apply optimizations, analyze the resulting code, and modify the compilation until they are satisfied. Understanding what the compiler did with the code is crucial to this process but is very time-consuming and labor-intensive. Users need to navigate through thousands of lines of binary code and correlate it to source code concepts to understand the results of the compilation and to identify optimizations. We present a design study in collaboration with program developers and performance analysts. Our collaborators work with various artifacts related to the program such as binary code, source code, control flow graphs, and call graphs. Through interviews, feedback, and pair-analytics sessions, we analyzed their tasks and workflow. Based on this task analysis and through a human-centric design process, we designed a visual analytics system, Compilation Navigator (CcNav), to aid exploration of the effects of compiler optimizations on the program. CcNav provides a streamlined workflow and a unified context that integrates disparate artifacts. CcNav supports consistent interactions across all the artifacts, making it easy to correlate binary code with source code concepts. CcNav enables users to navigate and filter large binary code to identify and summarize optimizations such as inlining, vectorization, loop unrolling, and code hoisting. We evaluate CcNav through guided sessions and semi-structured interviews. We reflect on our design process, particularly the immersive elements, and on the transferability of design studies through our experience with a previous design study on program analysis.
Index Terms—Design Study, Program Analysis, Compilation, Binary Code, Transferability, Immersion
1 Introduction
• Sabin Devkota and Katherine E. Isaacs are with University of Arizona. E-mail: { [email protected] | [email protected] }.
• Pascal Aschwanden, Adam Kunen, and Matthew Legendre are with LLNL. E-mail: { aschwanden1 | kunen1 | legendre1 }@llnl.gov.

Demand for high performance computing (HPC) resources as well as scalability limitations of HPC applications drive the need for optimization. Even small percentage increases in efficiency can mean more science computed, either through more programs running or higher fidelity results than previously computationally possible. Thus, application developers and performance analysts spend significant time optimizing and tuning these programs.
One vector for optimization is at the compilation stage. When building the application, there are many choices in terms of which compiler to use and with what options. Furthermore, small non-algorithmic changes in the source code can lead the compiler to make different choices in how it transforms source code into machine-interpretable instructions. Running experiments can show which compiler version with which options performs better for a specific machine. However, for some applications performance is so critical that significant time and labor is devoted to trying to determine what optimizations were made by the compiler, whether they were effective, and what can be done to encourage it to optimize further. Understanding the optimizations may not only increase application efficiency on the target system, but lead to portable improvements where findings can be applied when compiling on another system.
Analyzing compiler optimizations is an iterative, experimental, and time-consuming task.
Typically an analyst will disassemble a compiled binary into human-readable instructions and inspect them in a text editor. They may also view the source code, make annotations, draw figures, and run ad hoc analyses. Even with debugging tools that show both source and disassembled code, analysts struggle to orient themselves in even moderately-sized programs of a few hundred lines of code.
This project is a collaboration between visualization and program analysis experts resulting in CcNav, a visual analytics tool to aid identification of compiler optimizations, their underlying causes, and their effect on performance. CcNav combines automated static analysis of compiled binaries with visual interfaces to support fine-grained analysis of compilation results. We conduct an ongoing design study [44] over 18 months with regular pair analytics [15, 26] sessions and a three month immersive study. Through these activities we develop a data and task abstraction driving the design of our integrated system. We evaluate the system through pair analytics sessions with domain experts.
We find that experts in this style of program analysis employ a wide range of strategies, often jumping between whatever different abstractions and organizations of the data they had available to them and deriving or annotating new data. We therefore design CcNav to automatically derive views where possible, support linked navigation consistently through all views, and assist the most used forms of annotations. We also find that the collaborative and immersive nature of our meetings were fundamental in understanding these workflows.
We also reflect on the transferability of design studies based on our experience with a previous design study on program analysis [23] that led the domain experts on our team to seek out the visualization experts. We describe the limitations of transferability, despite remaining in the same domain, and how our process either supported or dissuaded the transference of design.
In summary, our contributions are:
• a data and task analysis for fine-grained analysis of compilation output (Sect. 3.2, Sect. 3.3),
• the design and evaluation of a visual analytics system for analyzing the results of compilation (Sect. 4, Sect. 5), and
• a reflection on both transferability of visual solutions and immersive design techniques with implications for future visualization studies (Sect. 6).
Before discussing these contributions, we provide a brief overview of the domain and related work (Sect. 2). We then discuss our methodology in further detail (Sect. 3). We conclude in Sect. 7.
2 Background and Related Work
Scientific simulation is used in diverse fields such as climate science, medicine, energy, and physics to study phenomena where it may be infeasible to do so otherwise. These simulations are frequently computationally-intensive and run on large, shared resources such as supercomputers and clusters. Thus, optimizing these programs to run faster can free resources for further scientific questions to be answered.
One avenue for optimizing these applications is to increase the efficiency of an algorithm through its translation for execution on a machine.
CcNav aims to help developers and program analysts in understanding this process and ultimately using that understanding to generate more highly optimized software. We discuss the necessary background in compilation, optimization, and program analysis workflow, followed by a review of related literature.
Fig. 2. Source code (a), disassembly (b), and control flow graph (c).

Compilation is the transformation of source code into a machine-interpretable format. The program that performs this transformation is known as the compiler and the resulting machine-executable software is known as the executable or binary file. As the machine-interpretable format is different between machines, this process must be done for each machine architecture targeted. Thus, compilation is also an important element of code portability across machines.
People typically write source code in a higher-level language than that understood by the machine. As operations are available in the high level language that are not available to the machine, compilation is not a direct translation. The low level language of the machine is called the instruction set architecture (ISA) or assembly language. Assembly instructions typically have a format of an operation (e.g., add, mov) followed by parameters such as values or locations of values. These locations can be in memory or in temporary storage on the computation unit. These temporary locations are known as registers.
There are a multitude of valid transformations from source code to machine code. While the compiler must always generate correct code, it may also attempt to fulfill goals such as making the executable more efficient or producing a small binary. While in many contexts developers are satisfied with the choices made by the compiler under default options, our collaborators are particularly concerned with generating more optimized code. Common optimizations seen in scientific code include function inlining, loop unrolling, and vectorization. Additionally, the compiler may create several variants—blocks of instructions that correspond to the same code but are optimized for different situations.
Function inlining removes the instructions (and therefore the overhead) of a function call by moving the body of the function within its calling function. This process sometimes requires duplicating instructions when a function is called from multiple places.
Loop unrolling similarly removes overhead associated with checking loop bounds and jumping by placing several iterations of a loop body sequentially before performing the jump.
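As a concrete, hypothetical illustration (not from the paper), a Python sketch of a summation loop unrolled by a factor of four, with a remainder loop for the leftover iterations, which is roughly the shape of the instruction stream a compiler emits:

```python
def sum_naive(xs):
    total = 0
    for x in xs:          # one bounds check and jump per element
        total += x
    return total

def sum_unrolled(xs):
    """Same result as sum_naive, but with four loop-body copies per
    bounds check, mirroring compiler-style unrolling."""
    total = 0
    n = len(xs)
    i = 0
    # four iterations of the body per bounds check and jump
    while i + 4 <= n:
        total += xs[i]
        total += xs[i + 1]
        total += xs[i + 2]
        total += xs[i + 3]
        i += 4
    # remainder loop handles the final n % 4 elements
    while i < n:
        total += xs[i]
        i += 1
    return total
```

The unrolled variant trades a larger instruction footprint for fewer branch instructions, which is why identifying repeated body copies in disassembly is a telltale sign of this optimization.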
Vectorization translates repeated operations that might naively be performed in sequence to take advantage of parallel features of the computation unit. For example, a loop that multiplies every value in an array by a constant can be transformed to perform the operation in parallel across chunks of that array. The ISA typically has separate instructions and registers for vectorized operations.
To generate a more performant executable through compilation, developers can change the compiler (e.g., gcc, clang, or llvm), the compiler options (e.g., -O3 for optimization-level-3 or -funroll to encourage loop unrolling), or even make small changes to the source code without changing the algorithm. However, even with these features, it can be difficult to predict what the compiler will do.
Since performance is at a premium to our collaborators, they want to know whether the optimizations they expected were made and if not, what they can do to further encourage them. We call the collective strategies by which they answer these questions program analysis.
Developers can examine the results of the compilation by viewing the generated instructions, possibly with the help of automated tools. A compiled binary can be translated into human-readable machine instructions through a process known as disassembly. The resulting text file is often referred to as disassembly code (or just ‘disassembly’) and includes the address (memory location) associated with each instruction. These addresses are used to jump non-sequentially, e.g., in a loop or function call.
If the binary was compiled with an option to retain debug information, more information can be retrieved, such as mappings between source code and disassembly or whether a function was inlined. The quality of debug data is dependent on the features of the compiler.
It is often incomplete or incorrect, especially in the presence of heavy optimization [39], so manual inspection is required.
To provide a structural interpretation of the disassembly, other structures, such as the control flow graph and call graph, may be derived from it. The control flow graph (CFG) divides the disassembly into basic blocks: contiguous address ranges that must be executed sequentially. Basic blocks are the nodes in the graph. Edges represent valid paths between basic blocks due to jumps, branches, and function calls. Fig. 2 shows a small example. In the call graph, the functions are the nodes and the edges are valid calls between them.
DWARF [3] is a popular format to support source level debugging. Objdump [10] and dwarfdump [4] are popular tools for retrieving disassembly with debug information. Both produce text files. Dyninst [21] is a library for more advanced analysis. We use Dyninst as a basis for the automated analysis components of CcNav.
Typical workflows.
Analysts typically use the above tools to get the disassembly and view both it and source code with a text editor, switching between views to orient themselves. They may also generate a CFG, sometimes filtered locally to the portion of the disassembly of interest. This is sometimes done with pen and paper or with tools like LLVM [37] that generate a DOT file for rendering with the GraphViz dot algorithm [25]. Our domain experts’ interest in the CFGExplorer [23] visualization over dot was an impetus for our collaboration.
When the domain experts initially trained the visualization experts in this process, they started with small enough examples that the recommended workflow was almost entirely pen and paper (Fig. 3). The learner printed a filtered version of the disassembly. As they were able to correlate with source, they annotated the disassembly with variables and structures from source along with evidence of optimizations.
A complementary approach is to use an integrated debugging tool which aids navigation between source code and disassembly, but is more focused on correctness debugging than optimization. We found most people we spoke with viewed files directly rather than through a debugger when trying to understand compiler optimizations.
Fig. 3. Annotations made on the disassembly of a benchmark program for vector addition during immersion study.

Several tools link source and disassembly [6, 9, 11, 13, 29, 41, 42] for debugging or reverse engineering. Intel VTune [9] can incorporate profiling information—metrics about how fast the code ran. The Godbolt Compiler Explorer allows fast switching between compilers and options, linking across the multiple generated assembly files, and SeeSoft [24]-style file navigation. However, it does not scale to large programs. Reverse engineering tools [6, 13, 42] also incorporate a visualization of the CFG, though with limited selection and filtering.
Other approaches prioritize either the source or disassembly. Rivet [45] visualizes how instructions are scheduled on superscalar processors. Instructions within a window of time are linked back to source code. The focus is on the processor’s scheduling, rather than the choice of instructions. PSE [34] visualizes instructions collected while a program executes. It can therefore show performance metrics, but does not incorporate source code. Baum et al. [16] present a visual tool for exploring conditionally compiled variants of programs. The focus of the tool is displaying what portions of the source code remain, rather than the resulting disassembly.
Linking between source code and call graph has also been used in applications like performance analysis [12] and software maintenance [32]. Several tree metaphors have been used for call graphs including indentation [12, 27, 40], node-link diagrams [17, 22, 35, 40], icicle timelines [18], and sunbursts [14]. As the call graph served as an auxiliary view and following design study methodology guidelines of ‘satisfying rather than optimizing,’ we use an indented tree and node-link view for different subgraphs of the call graph, leveraging familiarity of our users while supporting their tasks.
While many of these visualizations share core views and features with CcNav, we found no tool or design that suited the needs of our target audience in terms of other important elements such as annotation, filtering, scalability, and integration with structural views like the CFG. Furthermore, like the visual designs, the integrated analyses were for other purposes, not for exploring compiler optimizations. Despite the similar domains of these projects, the task differences led to a different design. We discuss related issues of transferability further in Sect. 6.
3 Methodology, Data, and Task Analysis
In conducting this design study, we followed the guidance of Sedlmair et al. [44]. We detail our collaboration below (Sect. 3.1) as well as the resulting data (Sect. 3.2) and task analyses (Sect. 3.3). Sect. 4 then describes the resulting visual analytics approach.
Our team consists of two visualization experts, an HPC applications expert, an expert in (HPC) program analysis and tools, and a software developer. Two additional HPC experts attended the early project meetings as well. The applications expert represented the typical front-line analyst, though the program analysis expert also had goals in understanding compilation.
The program analysis expert approached the visualization experts upon seeing their prior work with visualizing CFGs [23]. He wanted to extend the work to support his use case of optimizing compilations.
The resulting collaboration has been ongoing for 18 months with video conferences scheduled every other week. These meetings included discussions of the available data, the analysis needs, and the development and deployment of CcNav, including both the visualization front end and the analysis software backing it. Copious notes were generated each meeting. Demonstrations via screen share were frequent, with the domain experts modeling their tasks using a combination of existing tools and the presented prototype as driven by the lead visualization expert in a pair analytics [15, 26] fashion.
A visualization expert (the lead author) also spent three months on-site with the domain experts. With their guidance, he performed their current workflow to better understand their tasks. We discuss the immersive elements of our collaboration further in Sect. 6.
The input data for CcNav is a compiled executable and its source code, the former of which can be disassembled into disassembly code. Both source and disassembly code are text data. There may be multiple source files associated with a single executable file.
Through a custom static analysis tool built using Dyninst [21] by the program analysis expert, we derive a mapping between lines of source code and address ranges in disassembly code. Note that this mapping can be many-to-many. We also derive a control flow graph, loops within that graph, a mapping between source code variables and disassembly, and annotations regarding disassembly addresses of inlined functions.
There are limitations to the automated analysis. For example, different compilers report varying amounts of information, which affects the completeness of the mapping between source and disassembly. The program cannot match some variables with registers in the disassembly. Some function names, which are mangled into unique identifiers by the compiler, do not get properly de-mangled. Experts must combine the automated assistance with their awareness of compiler reporting limitations and knowledge of the domain.
The programs of interest to our collaborators are sizable and complicated, using many advanced features and libraries. Thus, we can make few assumptions. For example, in one case we found a de-mangled function name that was (correctly) 137,777 characters long.
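As a concrete (hypothetical) data sketch of the many-to-many mapping described above, each source line can carry a list of half-open address ranges, with the reverse query scanning those ranges; the line numbers and addresses here are illustrative only:

```python
# Hypothetical sketch of the source-line <-> address-range mapping.
# Each source line maps to one or more half-open address ranges, and
# one address may fall under several lines (many-to-many).
line_to_ranges = {
    12: [(0x400A10, 0x400A20), (0x400B00, 0x400B08)],  # line split across the binary
    13: [(0x400A18, 0x400A20)],                        # overlaps line 12's range
}

def lines_for_address(addr):
    """Reverse query: which source lines cover this instruction address?"""
    return sorted(
        line
        for line, ranges in line_to_ranges.items()
        if any(lo <= addr < hi for lo, hi in ranges)
    )
```

The overlap in this toy example is exactly why a single disassembly selection can light up several source lines at once in a linked interface.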
The ultimate goal of our collaborators is to determine a combination of source code changes, compiler choice, and compiler flags that will achieve improved performance. The domain experts are aware of strategies the compiler can take, so they analyze the results of the compilation to determine where there is room for improvement.
Following the ethos of understanding tasks in the context of high level goals [20, 36, 43, 48], two visualization experts independently coded the observation notes and then met to discuss and ultimately generate the task hierarchy. We found no task typology mapped well to the low-level operations, which were frequently correlating concepts (e.g., source code lines to disassembly lines) and identifying known structures. We present the higher levels below and summarize the lower levels in text. The full hierarchy is in the supplemental materials.
Focus on particular optimizations or analysis strategies varied from meeting to meeting, though the overall goal did not change. Similar to Williams et al. [47], we used the persistence of tasks over time to prioritize the design and implementation of CcNav features. Fig. 4 shows when tasks were demonstrated or discussed in our interactions with domain experts.
Goal: Understand performance / Identify optimizations
T1 Understand/Identify compiled structure
T1.1 Match source code with binary code
T1.2 Identify/Relate structures with code
T1.3 Annotate relations
T1.4 Trace variable
T2 Understand optimizations
T2.1 Find areas of interest
T2.2 Identify optimizations
T2.3 Assess optimizations
T2.4 Compare generated code
T2.5 Annotate optimizations
We found two major groupings of tasks: understanding and interpreting the disassembly itself (T1) and understanding what optimizations were applied in it (T2). When we started this project, we expected the focus would be on T2, specifically comparisons across parameters (T2.4). However, our initial collaborative analysis sessions showed us that simply understanding how what we were looking at related to source (T1) was a significant hurdle.
T1: Understand/Identify Compiled Structure. The disassembly represents what the compiler did. To understand what the compiler did, analysts must match the disassembly and the source (T1.1). Typical queries are “What disassembly matches these lines of code?” or “What are these lines of disassembly doing with respect to the source code?” As code structures like functions and loops both help organize the code, identifying those structures in particular is a common task (T1.2). Once these first sub-tasks are done, the disassembly may be annotated (T1.3), e.g., marking a register by its associated source code variable or marking an address range with a line of code, loop, or function. Another way to understand disassembly in the context of source is to trace (T1.4) a source variable through the disassembly.
T2: Understand Optimizations. Analyzing how well a compiler has optimized some code is typically focused on the instructions that will be run the most. Thus, the first sub-task would be finding those areas of interest (T2.1). This is often a winnowing task—decreasing the data to a specific function, loop, or even line of code. However, it may also be a search task, like identifying anomalous code performing an unreasonable number of operations. Thus, the entirety of the code must be accessible.
Once the area of interest has been found, the analyst will try to identify the optimizations present (T2.2), such as inlining, vectorization, code variants, or unrolling, and make an assessment (T2.3) regarding whether the optimizations applied are appropriate or if any are missing. Performance metrics, if available, can also be used to assess the efficacy of the optimizations. Typical queries include “Is this loop vectorized? What about its nested children?” and “How much inlining is there?” As with T1, discoveries are annotated (T2.5) during the analysis process.
The identification of the absence or presence of possible optimizations and their effect on performance may be further supported by comparing disassembly generated with different source, using different compilers, or using different compiler optimizations (T2.4). However, this operation is limited by the difficulty of understanding even one compilation.
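For example, one rough heuristic behind the “Is this loop vectorized?” query (T2.2) is to scan the selected instructions for SIMD mnemonics, such as x86 AVX instructions (v-prefixed) or packed arithmetic (ps/pd suffixes). This Python sketch is illustrative only; the hint list is far from exhaustive and is not the paper's actual analysis:

```python
# Rough heuristic: AVX mnemonics are 'v'-prefixed (vmulpd, vmovaps, ...)
# and packed SSE/AVX arithmetic ends in 'ps' or 'pd'.
SIMD_HINTS = ("vmul", "vadd", "vfmadd", "vmov", "mulp", "addp")

def looks_vectorized(instructions):
    """Return the mnemonics in a selection that suggest vectorized code."""
    hits = []
    for ins in instructions:
        mnemonic = ins.split()[0].lower()
        if mnemonic.startswith(SIMD_HINTS) or mnemonic.endswith(("ps", "pd")):
            hits.append(mnemonic)
    return hits
```

An empty result over a hot loop's basic blocks would prompt the analyst to ask why the compiler declined to vectorize.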
The target audience of our project is application or program analysis experts with experience reviewing disassembly code. We focus on those who are interested in optimization, but there is overlap with those who are trying to debug compilation or build issues as well. These expert users are familiar with DWARF and other debug data, as well as the limitations in collecting and reporting it.
4 CcNav: Compilation Navigator
The existing workflow of our collaborators involved understanding the compiled code using multiple tools to create or view different artifacts related to the program such as the source code, disassembly, and debug information. The process of relating between these artifacts requires a large amount of context switching between the different programs and is both labor-intensive and time-consuming.
Through our regular meetings, we iterated on the design of CcNav, discovering in addition to the findings of our data and task analyses (Sect. 3), that experts in this style of program analysis: 1) have many strategies in navigating code artifacts, indicating a highly linked multi-view system could streamline their strategies, and 2) generate new data and new data arrangements in the form of supporting annotations and graphs, and that some of this generation could be automated. Thus, we developed a custom analysis program designed in tandem with a highly-coordinated multi-view system to better serve the needs of compilation analysis.
Fig. 4. Tasks appearing in our meetings and on-site immersive study. The on-site period is highlighted with a gray background. Tasks related to understanding the disassembly and finding areas of interest for optimization dominated. These tasks are necessary to perform the less prevalent tasks but are difficult in their own right, thus our design study focused on them.
We balanced the effort in our design by focusing more on the tasks most numerous and persistent across time (Fig. 4). These were the tasks necessary to perform the other tasks: those related to understanding the disassembly and finding areas of interest for optimization.
The input to CcNav is a binary file compiled with debug information. We derive the rest of the data through a custom analysis program developed for this project (Sect. 4.3). We first describe the views and interactions (Sect. 4.1), which are based in our task analysis and observations and drove the development of the automated analysis.
CcNav is composed of multiple views which can be arranged, resized, closed, and re-opened by the user via a flexible window management system. We describe these views and their relation to our tasks.
Source Code View (Fig. 1(a)). The source code view displays a single source code file. By default, it displays the one with the most data, but the file can be changed in the interface. Multiple lines can be selected and will be highlighted across other views, supporting the task of matching the source code and disassembly (T1.1). Lines with no mapping are grayed out. We chose not to use syntax highlighting to conserve the use of color and because our domain experts did not consider it a priority.
Disassembly View (Fig. 1(b)). The disassembly represents the ground truth of the compiled program. One strategy commonly employed by users was to use linked navigation to get close to an area of interest not otherwise selectable with information from our automated analysis and then search by scrolling from there, so we include it in its entirety. This view also supports multi-line linked highlighting in support of T1.1.
When available, we modify the instruction text to include the associated source code variable name. We denote this by striking through the register name and presenting the source code with a pink background. This feature supports our annotation (T1.3), structure identification (T1.2), and variable tracing (T1.4) tasks.
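The register-to-variable substitution behind this annotation can be sketched, hypothetically and in simplified form (the actual interface shows both the register and the variable), as a per-instruction text rewrite driven by the recovered mapping:

```python
import re

def annotate(instruction, reg_to_var):
    """Replace register names in an instruction's text with the source
    variable names recovered by the analysis (best effort)."""
    def swap(match):
        token = match.group(0)
        return reg_to_var.get(token, token)
    # \b\w+\b matches whole tokens only, so a short register name
    # cannot rewrite part of a longer identifier
    return re.sub(r"\b\w+\b", swap, instruction)

# e.g., if the analysis mapped register rdi to source variable 'n':
# annotate("mov rax, rdi", {"rdi": "n"}) yields "mov rax, n"
```

Instructions whose registers have no recovered mapping pass through unchanged, which is where the manual annotation view picks up.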
Control Flow Graph (CFG) View (Fig. 1(c)). The CFG view shows a subgraph of the full binary CFG, based on the current selection. Prior work on CFGs by the visualization experts led to this project. However, early meetings indicated matching of source and disassembly was the main workflow. Thus, our initial prototypes did not include a CFG. (See Fig. 5 for one such example prototype.) In subsequent meetings, we observed our domain experts had difficulty understanding structures such as loops (T1.2) with only matching or nesting. We thus chose to provide such context with a CFG view.
We chose the visual design and layout from CFGExplorer [23] as (1) that design was the impetus for our collaboration and (2) the tasks fulfilled by a CFG in this project matched well with the tasks in CFGExplorer. The design is a node-link diagram with a modified Sugiyama layout [46] which prioritizes drawing loops similar to by-hand diagrams of small CFGs, matching the mental model of compilation experts. The convex hull of loop nodes is drawn with an orange background, with nested loops having a darker shade of orange.

Fig. 5. An early prototype visualizing the mapping between source code and disassembly. Even in this toy example, a single line of code is dispersed across the disassembly. Early prototypes like this one also clarified the importance of the CFG in conveying structure.

Instead of showing all contained instructions as CFGExplorer did, we show the block ID and its containing function. We found including all instructions led to very long nodes which obscured the graph topology and worked against T1.2. After making the design choice, during one of our pair analytics sessions, our collaborators commented they could see the global structure and the connected components in the graph. They noticed that a program we were viewing had a disconnected CFG, leading to the insight that library and initialization code were present but unable to be retrieved by the automated analysis.
Another change from CFGExplorer is filtering the graph to a k-hop region of interest around selected basic blocks. Our data creates CFGs that are too large for Sugiyama-style layouts. To support the winnowing of data to find areas of interest (T2.1), k is configurable via the interface.
Highlighted Items View (Fig. 1(g)). The highlighted items view lists the highlighted source lines, disassembly lines, and basic blocks without context. As highlighted items are often dispersed across large ranges of source lines (see Fig. 5), this view provides a way to examine them together when the content is more important than the context, e.g., when assessing the use of instruction types (T2.2, T2.3) or the presence of variables (T1.2).
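The k-hop filtering used in the CFG view amounts to a bounded breadth-first traversal from the selected blocks. A minimal sketch over an adjacency-list CFG, with hypothetical names and edges treated as undirected for neighborhood purposes:

```python
from collections import deque

def k_hop_subgraph(edges, selected, k):
    """Return the basic blocks within k hops of the selected blocks,
    for filtering large CFGs down to a layout-friendly region."""
    neighbors = {}
    for src, dst in edges:
        neighbors.setdefault(src, set()).add(dst)
        neighbors.setdefault(dst, set()).add(src)
    seen = set(selected)
    frontier = deque((block, 0) for block in selected)
    while frontier:
        block, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand past the hop limit
        for nxt in neighbors.get(block, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen
```

Only the returned blocks (and the edges among them) would then be handed to the Sugiyama-style layout.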
Function Inlining View (Fig. 1(f)). Function inlining is one of the most common optimizations performed by the compiler and is of great interest to our collaborators. Thus, we design a separate panel for inlining information to help identify (T2.2) and navigate (T2.1) to them. We use a selectable, collapsible indented tree with non-inlined functions as the top level and only inlined children beneath them. Selections in other views will filter this one.
The function inlining hierarchy can get quite large as functions may be inlined in multiple places and inlining chains into libraries or kernel code can be tens of layers deep. We include an autocomplete search feature to further support navigation (T2.1) and a reset control to restore the full view.
Function calls and therefore inlining form a hierarchy, so a tree visualization is intuitive. We picked the collapsible indented tree to preserve readability of function names and efficiently use screen space given their size and deep nesting. Also, while a direct inlining view is not common, our audience is familiar with collapsible indented trees for navigating call stacks or file systems.
Loop Hierarchy View (Fig. 1(e)). Identifying (T1.2) or navigating to (T2.1) a particular loop is a common operation, so we chose to directly support it with a loop-centric view. Consistent with the function inlining view, selections in other views filter this one, providing loop context to those other views.

We designed this view to show the nesting hierarchy of loops as a collapsible indented tree with linked selection to the other views. The top level is the containing function, matching the policy of the function inlining view.

There is no standard way to name loops, nor can the appropriate line of source code be derived with suitable consistency. Thus, we assign multi-part IDs to indicate nesting behavior and rely on the analyst to interpret them further using the other features of CcNav.

Explicitly showing the loop hierarchy is not common, and thus there is no standard view. Our rationale for using a collapsible indented tree is similar to that of the function inlining view; as this view was added second, we chose to keep the two designs as consistent as possible.
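Multi-part IDs of this kind (e.g., loop 3.1 as the first child of loop 3) can be assigned with a simple depth-first numbering of the loop forest. This Python sketch is illustrative only; the dict-based loop representation is our own assumption.

```python
def assign_loop_ids(loops, prefix=""):
    """Label a forest of loops with multi-part IDs ('1', '1.1',
    '1.2', '2', ...) by depth-first traversal. Each loop is a dict
    with a 'children' list; the number of ID parts equals the
    nesting depth, so the ID itself conveys nesting behavior."""
    for i, loop in enumerate(loops, start=1):
        lid = f"{prefix}.{i}" if prefix else str(i)
        loop["id"] = lid
        assign_loop_ids(loop.get("children", []), lid)
```

Sibling loops differ in their last component, while descending one nesting level appends a component, so an analyst can read relative position directly from the label.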
Call Graph View (Fig. 1(d)). The call graph view shows a subgraph of the full call graph, with all functions reported by our analysis regardless of whether they were inlined. This view provides a way to relate selected disassembly to functions and the call stack in support of T1.2 and T1.4. Inlined calls are shown with a dashed red line to help identify them (T2.2). We chose a node-link diagram to display the call graph as analysts performed navigation tasks [28, 33, 38] on the graph. This view supports linked selection with the other views.
Variable Annotation View (Fig. 1(h)). Annotating the disassembly with source code variable names is a common task (T1.3). While our automated analysis provides a best-effort annotation, it is incomplete. We allow the user to manually add annotations with this view. The view further summarizes all active renamings.
We follow a consistent scheme for selections across all the panels. On performing any selection, the corresponding items in all text and node-link panels are highlighted with a teal border. To reduce scrolling, the views automatically scroll to the first highlighted item in text views and center on the first highlighted node in node-link views. We do not alter the zoom for node-link diagrams, as users found it disorienting.

For our indented tree views, we filter to matching items rather than highlighting them. The ordering of top-level nodes in these trees is not consistently related to source code structure. The ordering is instead an artifact of the analysis, and thus the context is less meaningful, so we filter these hierarchy views more aggressively.

We support a consistent interaction mechanism across all the views, where 'click' interactions select single items (e.g., a line of code, a node). Text views support range selection through 'click and drag', while node-link views support it through brushing.

The one exception to our linking is the Highlighted Items View. We found linking this view resulted in mis-clicks and mis-selections as people focused on this window. However, based on our evaluation (Sect. 5), we are considering changing this policy in the next iteration.

All linking and filtering is calculated based on disassembly address. Unit items in a view (e.g., a line of source code, an instruction address, a basic block, a loop, a function) can be represented as a list of corresponding address ranges from our automated analysis. Thus, any selection is translated into a list of (non-contiguous) address ranges, which is then used to query matching items in all the other views.

We use interval trees to speed up queries. Specifically, we create four interval trees for storing the address ranges associated with i) lines of source code, ii) basic blocks, iii) functions, and iv) loops. These trees are also used to reconstruct the inlining tree and loop hierarchy.
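The translation from a selection to matching items in other views thus reduces to interval overlap queries. The following Python sketch illustrates the idea, with a linear scan standing in for the interval trees CcNav uses for speed; the function name and data shapes are hypothetical.

```python
def select_items(selection, view_index):
    """Translate a selection -- a list of half-open [lo, hi) address
    ranges -- into the matching items of another view. view_index
    maps an item name to the (possibly non-contiguous) address
    ranges the automated analysis reports for it."""
    def overlaps(a, b):
        # Half-open intervals overlap iff each starts before the
        # other ends.
        return a[0] < b[1] and b[0] < a[1]

    return {item
            for item, ranges in view_index.items()
            if any(overlaps(r, s) for r in ranges for s in selection)}
```

Because every view is indexed the same way, the same query answers "which loops match this source selection?" and "which source lines match this basic block?" alike, which is what makes the uniform linking scheme practical.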
CcNav takes as input a binary file compiled with debug information and from it retrieves or generates all additional data used in the visualization. This includes disassembling the binary into disassembly code, retrieving the source code (if reachable), reporting available mappings between source and assembly (including variable names), reporting inlined functions, detecting loops, and generating a control flow graph and call graph. These features were added iteratively, matching the visualization design.

The data generated by the automated analysis is incomplete by nature, due to limitations in what an individual compiler will report and limitations in state-of-the-art static program analysis. For example, most lines of code do not map to disassembly. These limitations are one reason a completely automated solution is infeasible, leading us to design a visual analytics system that combines partially automated analysis with expert knowledge and intuition. Furthermore, the limitations drive our design to support multiple workflows to target disassembly, as any single workflow may fail in a particular situation.

The automated analysis also provides the front-end visualization with shortened names for strings greater than 256 characters by eliding the middle characters of long function names. Our domain experts indicated further elision was too much.
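Middle elision itself is straightforward; a minimal sketch follows. The 256-character threshold comes from the text above, while the amount kept at each end (here 40 characters) is a hypothetical parameter, as is the function name.

```python
def elide_middle(name, limit=256, keep=40):
    """Shorten names longer than `limit` characters by eliding the
    middle, keeping the head and tail. For heavily templated C++
    symbols these ends usually carry the namespace and the
    innermost template arguments, which is why middle elision is
    preferred over simple truncation."""
    if len(name) <= limit:
        return name
    return name[:keep] + "..." + name[-keep:]
```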
CcNav is a browser-based client-server application. The automated analysis is written in C++ using the Dyninst API [21]. The server returns its output as JSON in a RESTful manner. The client is written in JavaScript using D3.js [19], with Dagre [2] as the base layout for directed graphs. Flexible window management is implemented using GoldenLayout [7]. The autocomplete search is supported by the awesomplete library [1]. We use the flatten-js interval tree library [5] to speed up the calculation of address matches across our linked views.
Evaluation
To evaluate the effectiveness of CcNav, we conducted evaluation sessions with four participants.
The evaluation sessions were 90 minutes in length and consisted of an initial briefing, an overview and demonstration of CcNav, tasks for the participants, a semi-structured interview, and a debriefing. The overview and demonstration used a small example dataset. With questions, the demonstration portion was approximately 25 minutes long. All evaluations were conducted remotely over video conference, with the facilitator sharing his screen.
Participants.
There were four participants. The first, P0, was a graduate student with experience with disassembly, but not performance analysis. The other three, P1-P3, were professionals who often perform compilation performance analysis. P3 attended design meetings for the first two months of the project, but had not seen any prototypes in the intervening 16 months. P0-P2 were not involved in the design. P2 had a time constraint, so their session was limited to 70 minutes.
Pair Analytics.
We employed pair analytics [15, 26] in our evaluation. Following this method, we encouraged participants to provide specific instructions (e.g., 'click', 'scroll', 'go to the CFG view') to the facilitator, who would then perform them. Participants were also encouraged to 'think aloud' as they directed these actions. The facilitator answered questions from the participants. One author acted as facilitator while another took notes.
Fig. 6. Drilling down into the loop hierarchy (left) reveals nested loops in the CFG subgraph (right). Associated disassembly (middle) is highlighted. Several registers have been automatically annotated with variable names from source code.
Our choice of pair analytics was driven by our goal of evaluating whether the design of CcNav supported analysis workflows. We wanted the focus to remain on the analysis rather than on the troubleshooting associated with using an in-development system or the learning curve of a new complex system. We also value the benefits of pair analytics in encouraging participants to communicate their actions and thoughts. However, we recognize there are potential biases associated with pair analytics, which we discuss further in Sect. 5.5.
Evaluation Dataset.
For the task part of our session, we used the LTIMES application of RAJAPerf [31]. RAJAPerf is a proxy application for assessing the performance and portability of HPC code. The core computation of LTIMES is a quadruply nested loop, of which several versions are implemented in the same C++ file. We compiled LTIMES using the Intel C++ Compiler v19.1.0 [8] with flags -O3 and -g. P1 and P3 had worked with RAJAPerf before and P2 was familiar with it, though none were particularly familiar with LTIMES.

Tasks.
Participants were asked to perform evaluation tasks of increasing complexity. Our initial task list included basic tasks like identifying what was inlined in a line of source code. After sessions with P0 and P1, we determined these tasks were too elementary and omitted them to afford more time for the more open-ended tasks. We list the tasks given to all participants below with their corresponding task abstraction items from Sect. 3.3:

E1. Identify the assembly of a loop containing a selected line of source code (T1.1, T1.2, T2.1)
E2. Identify/Assess vectorization in that loop (T2.2, T2.3)
E3. Compare/Assess multiple variants in the source code (T2.4, T2.3)
We summarize our observations of task performance below. A more detailed account is available in the supplemental materials.
E1: Identify the assembly of a loop containing a selected line of source code.
Because a loop spans multiple lines and the mapping between source code and disassembly is imperfect, this task requires more than straightforward highlighting. All participants were able to complete this task, with different strategies.

All participants started by asking to click on the first line of the loop in the source code, which highlights the directly corresponding disassembly, but not the entire loop. They all recognized this fact. P0, P1, and P2 next examined the loop hierarchy view. P0 and P1 asked to click on the loop hierarchy view to highlight the assembly, while P2 returned to the source code and asked for a range selection. Both strategies result in the targeted selection. P2 followed up by asking to click on the loop hierarchy view, verifying the selection was the same. P3 instead looked at the selected items view, finding the loop index variable 'z' annotated, and was satisfied they had found the code. When asked for the loop name in the loop hierarchy, they asked to click on the top-level loop loop3, similar to the other participants. Observing that both the source code and the loop hierarchy have five levels of nested loops, P3 guessed the correct loop.
E2: Identify/Assess vectorization in that loop.
P1, P2, and P3 said they would look for vector instructions, but noted they did not recall or know them by name. P0 required some background knowledge on vectorization. The facilitator explained that the presence of a vector register would indicate vectorization and suggested names of vector registers to P0, P1, and P2.

P0, P1, and P2 started by asking to click on loop 3.1 in the hierarchy view, while P3 asked to click on the body of the innermost loop in the source code, explaining they wanted to look for arithmetic instructions and unrolling. All participants then went to the selected items view. P0 and P3 asked to scroll through the items, while P1 and P2 chose to search (ctrl-f). They all discovered vector registers and instructions. P1, P2, and P3 concluded the loop was vectorized. P0 followed up by returning to the disassembly view and asking to select the found instruction there. They verified the loop was selected in the source code and only then asserted the loop was vectorized.
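The heuristic the facilitator described, i.e., that the presence of a vector register indicates vectorization, can be approximated mechanically by scanning disassembly text for x86 SIMD register names. This is an illustrative sketch, not a CcNav feature; it checks only the xmm/ymm/zmm register families.

```python
import re

# x86 SIMD register names: xmm/ymm/zmm followed by a number.
VECTOR_REG = re.compile(r"\b[xyz]mm\d+\b")

def uses_vector_registers(disassembly_lines):
    """Return the disassembly lines that reference an x86 vector
    register -- a quick signal that the compiler vectorized the
    surrounding loop."""
    return [ln for ln in disassembly_lines if VECTOR_REG.search(ln)]
```

Scrolling or searching the selected items view, as the participants did, amounts to performing this scan by eye.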
E3: Compare/Assess multiple variants in the source code.
The LTIMES application has several versions of the same computation. In this task, we focused on two: a) a "base-sequential" ("Base") version with four nested loops, and b) a "RAJA-sequential" ("RAJA") version where loops are implemented using RAJA constructs and thus the quadruple nesting is not explicitly written in the source file. Some participants also chose to look at a third variant, "lambda-sequential" ("Lambda"), which is like Base but uses a lambda function for the body.

The task was free-form by design. Each participant approached it with a different strategy. P1, P2, and P3 were able to draw conclusions. P0 was able to isolate the RAJA disassembly, but said they did not know how to assess differences due to a lack of experience with such analyses.

Identifying each variant's disassembly and assessing the optimizations were key sub-tasks. As in E1 and E2, participants typically started by selecting the source code or the loop hierarchy, switching between these views to further their search while using the other views to examine the changes in the selection.

Selecting the RAJA disassembly was the most tricky because, while it could be selected from the full loop hierarchy (P0, P1), selecting the source code (P0-P3) retrieved only a few instructions and filtered out all loops. All participants recognized this limitation of the automatic matching with the source and found the full target disassembly either by searching for RAJA code in the inlining view (P1) or by finding related lines of source code (P0, P2, P3). From there, P0, P1, and P2 used the loop hierarchy to further navigate the disassembly.

P3 did not recall that elements in the loop hierarchy could be clicked to drill down and instead examined the selected items view. Spotting the annotations in the disassembly for variable phidat, P3 hypothesized they were looking at the data setup, but wanted arithmetic instructions that would indicate the loop body. They switched to the full disassembly view, found some non-highlighted arithmetic instructions, and said "that's completely what we want to see."

Fig. 7. Loop hierarchy view. Evaluation participant P1 determines the leaves are four variants of the same loop, generated by the compiler to aid loop unrolling.
While navigating the code, the participants all considered the CFG view. However, in many cases they noted it was not enough information because it showed function names and not instructions (P1), often returned disconnected nodes due to filtering (P2), or was too low-level and lacked context (P3). P2 used the call graph view to reason about why the nodes in their selection were disconnected. P1 identified the quadruply nested RAJA loop in the CFG view (Fig. 6), and from there identified candidates for the preamble and postamble loop instructions.

In assessing variant similarities, P1 noted the code structure was similar between RAJA and Base, but RAJA was obfuscated by a long call stack. P2 and P3 remarked both versions had similar vectorization. After drilling down further in the loop hierarchy, P2 hypothesized that both versions have everything inlined, but that there is more overhead in the RAJA version due to the indirect call. This is consistent with performance data not used in the evaluation.

P1 also compared the Base and Lambda versions, finding them to be similar. By navigating down the loop hierarchy, they came to the conclusion that the inner loops (Fig. 7) in both versions were vectorized and that the leaf loops are "fixing up the ends for the vectorization unroll." They repeated the process with the Base version, confirming their hypothesis.
Participants were asked what tasks were easy or difficult and what features they would like to see. We summarize the resulting discussions.

Participants generally liked the linking between all views (P0, P2, P3), with some remarking that the variable renaming is helpful in decreasing the need to switch between multiple sources (P0, P3). P3 said of the linking, "It already beats pawing around in something like VTune" and "I gotta say that variable renaming thing changes so much in trying to navigate this thing."

Participants also remarked that other views were useful for overview and navigation, including the loop hierarchy view (P1, P2), the function inlining view (P2), and the CFG and call graph views (P0). P1 noted the CFG picked up loops well in the RAJA version, but not the Base version. P3 found the selected items view convenient.

Participants expressed difficulty with the drill-down behavior in the loop hierarchy (P2, P3). P1 noted the autocentering of the source code was disorienting and wanted more text in the CFG nodes. Suggestions for new features included a back button and history (P2, P3), annotations of loop preamble, postamble, and body (P1), keyboard shortcuts (P2), and pop-out windows for more space (P3).

P2 summarized their remarks with "I'm kind of excited to try this out on a couple of different things." They later added in email a situation where they previously compared different compilation flags for three versions of the same source code. They manually created a rough equivalent of the selected items view and produced a diff of the results. They remarked CcNav would have been "easier, faster, and cleaner" if it supported this kind of comparison.

P3 shared that he has compared program performance across compilers, noting he would do exactly the tasks from the evaluation session when trying to determine whether the compiler applied the changes correctly.
The participants completed all of the tasks, with the exception of the non-expert P0 on the comparison task. Participants employed a variety of strategies in each task. We consider this to be positive evidence of the system's flexibility in supporting compilation analysis. Compilation analysis is complicated and often requires clever ways to probe.

For example, in E3 the RAJA disassembly proved non-trivial to isolate. Participants used multiple views in sequence for selection (source, loop hierarchy, function inlining) and multiple views to assess the results (disassembly, selected items, loop hierarchy, CFG, call graph). This meandering style of navigation, where participants are free to consider different facets, matches our task analysis observations. Multiple strategies can further allow analysts to verify discoveries, as we saw P0 do in E1.

Participants also expressed positivity regarding linked navigation, but noted a lack of tool-maintained history supporting their exploration. We observed some participants repeat actions to return to prior views, further underscoring this potential area of improvement.

Another goal of CcNav was to aid users with their mental model. We observed all participants using the nested nature of the loop hierarchy to navigate. P1 was able to match disassembly instructions with higher-level loop constructs using the CFG view.

Through the evaluation tasks, the participants performed tasks from our task abstraction. Source-disassembly matching (T1.1), loop identification (T1.2), and finding areas of interest (T2.1) were sub-tasks in all evaluation tasks. Participants identified (T2.2) and assessed optimizations (T2.3) in E2 and E3. Participant P3 used annotations (T1.3) in E3. We interpret this as validation of our task analysis and of CcNav's ability to support those tasks.

Though comparison is not supported explicitly, P1-P3 were able to compare (T2.4) results of different versions of the same code in E3. The only tasks not demonstrated were tracing a variable (T1.4) and annotating optimizations (T2.5). These were not required by the evaluation tasks, and as they were the least performed tasks over our design study meetings, they were the lowest priority in our design.

All views were used by at least one participant to achieve some insight during the evaluation. We interpret this as validation of our choice of views. However, there was also some confusion caused by some of these views, much of it related to selection and filtering choices, explained below. Another issue is that the call graph view can get very wide; a more compact layout will require further research.

Though the participants acknowledged limitations in debug information, these limitations still led to confusion regarding some of the selections. For example, participants clicked on the for loop line rather than range-selecting the whole loop. There was similar confusion with how much context was shown in the loop hierarchy and CFG. We believe both can be improved by showing more nesting context. We have since revised our CFG view to pull in the entirety of loops overlapping the selection rather than only those within the k-hop radius.

All participants came from the authors' institutions. In the briefing they were told the purpose was to evaluate CcNav and determine issues for future iterations. However, they may still have been inclined to give positive feedback.

The small participant pool in this evaluation limits its generalizability. Though the group was small, they demonstrated similar patterns in selecting disassembly of interest and using the source, disassembly, loop hierarchy, and selected items views. However, use of the inlining, CFG, and call graph views was more unique and should be interpreted as preliminary and with caution.

The remote nature of our evaluation required some concessions. All participants required a larger font size, decreasing screen real estate. Also, they could not point to anything on screen or "take the reins," which may have changed their behavior.
All participants asked for reminders regarding details of particular views or interactions. Due to the complexity of both the visualization and their tasks, the demonstration was insufficient. Furthermore, the basic tasks performed by P0 and P1 (see supplementary materials) may have had a tutorial effect, accounting for some participant differences.

While pair analytics may have alleviated some of the training issues, it may have also altered participant actions. For example, P3 did not recall they could click on the loop hierarchy and was unable to rediscover the functionality through remote pair analytics. We did not suggest it to them because they did not explicitly state that was their intended effect, and thus we did not want to bias them.

In addition to limiting participant discovery, there is a complementary threat of leading, over-interpreting, or otherwise biasing participant actions. To mitigate biased findings, we explicitly recorded and reported where the facilitator made suggestions or answered complex questions. These are available in our more detailed description of participant actions in the supplemental material.

On reflection, in future projects with similarly complex tasks, we could combine pair analytics sessions for one set of participants with traditional sessions for another, thereby covering the limitations and enjoying the benefits of both. However, it may be difficult to recruit enough qualified participants.
Reflections and Lessons Learned
We reflect on our design study and what we learned regarding transferability between design studies and immersion in the design process.
Transferability from a previous, highly related design study was beneficial, but more limited than expected. One key outcome of design studies is transferable design knowledge, but it can be difficult to assess in what ways and to what extent such knowledge is transferable. This design study started in response to the domain experts seeing the visualization experts' previous work, CFGExplorer [23]. The domain experts were particularly interested in the custom node-link layout of CFGExplorer and its linking to the assembly code. They wanted to directly extend CFGExplorer for their problem. The team thus began the project assuming previous work would be highly transferable and the process would be like a design iteration. However, in practice, we found the process more similar to a new design.

While two of the main data types (CFG and disassembly code) were the same between CcNav and CFGExplorer, the goals of our users, and thus the tasks the visualizations had to support, differed enough that we started the design anew. In CFGExplorer, domain experts are trying to recover parallelizable loops from the disassembly and CFG only. In CcNav, domain experts are trying to understand what optimizations were performed on their source. This shift in goals prioritizes source code in CcNav, a data type that was not available in CFGExplorer.

Despite our initial assumptions, we avoided premature design commitment to CFGExplorer by restarting our task analysis, questioning design choices frequently, and creating revolutionary prototypes. These correspond to the discover, design, and implement phases of design study methodology. We did not assume any tasks going into our first meeting. The workflow described in that meeting emphasized the correspondence between source code and disassembly. We thus questioned the assumption to include the CFG and ultimately decided to omit it from our first design/prototype based on the experts' described operations. However, when we tried analyzing a problem using this prototype with the experts on our team, the value of the CFG became clear. The experts struggled to understand and recall how disassembly instructions related to loops, despite source code linking. This discovery led us to add the CFG view. Following this early design discovery, we continued to question our designs as the project evolved.

The custom node-link layout from CFGExplorer transferred because the primary tasks it served remained the same, albeit in a lesser role. In both CFGExplorer and CcNav, the node-link view serves in building a mental model from disassembly code and identifying loops. The two projects differ in their use of this view only in the level of detail required. In CcNav, some of the lower level operations, such as determining the loop bounds, were better served by the linked source code that was unavailable to CFGExplorer.
Immersive data analysis and prototyping activities had the most influence on our design. Immersive activities are those in which visualization experts engage in the work of the target domain or vice versa. We found immersive data analysis and prototyping activities, as catalogued by the Design by Immersion framework of Hall et al. [30], to be the most fruitful. These correspond to the discover and implement phases of design study methodology. In particular, one visualization expert performed typical analyses "by hand" (Fig. 3), and both visualization and domain experts engaged in collaborative analyses with the in-process prototype.

The collaborative analyses provided insight into the data analysis process and feedback on the prototype simultaneously. These analysis sessions occurred during biweekly meetings. The meetings were remote, with the lead author driving a pair analytics session, sometimes using the prototype in tandem with ad hoc file browsing when features were not yet implemented or even yet ideated. This process helped us find gaps in our design.

Prototyping the visualization was done in tandem with prototyping the automated analysis. This is also a prototyping activity as noted by Hall et al. and adds a "moving target" element as discussed by Williams et al. [47]. As noted by Williams et al., the copious documentation of tasks and interests over time helped us prioritize design elements that fulfilled long-standing task needs over those that had gained attention fleetingly.

As the collaborative, immersive analysis processes required deep attention, we found it especially helpful to have multiple people from the visualization team present when interacting with prototypes. This setup allowed one visualization expert to become fully immersed in the activity and workflow without pause, while reserving another to generate the observation artifacts that were used to refine the task analysis and design over time.

We note that both of these findings relate to the core stages of the design study framework: discover, design, implement, and deploy [44]. Design study methodology notes that stages may overlap and that the process can iterate through any sub-loop of stages. This overlapping and looping describes our workflow with the core stages, to the point where we might even consider them continuous. In particular, the open nature of the discover phase was important in understanding the differences in needs between CFGExplorer and CcNav, as well as in adapting to the evolving capabilities of the automated analysis and the refinement naturally arising through iteration.
Conclusion
We have presented a data abstraction, task analysis, and visual analytics system, CcNav, for analyzing how an application is translated into optimized machine instructions by the compiler. Through evaluation sessions, we showed CcNav assisted in performing tasks common to experts' workflows. We found it was important to support a variety of paths through different representations of the instructions and source code. We also observed that while experts appreciated automated assistance and acknowledged its limitations, its integration still led to confusion, which we plan to continue to address in future work.

In conducting this design study, we found that immersive activities such as collaborative analysis sessions, having visualization experts perform analysis workflows, and frequent engagement with unpolished prototypes elicited rich feedback aiding our task analysis and visual design. We also found that, despite a high degree of similarity between this design study and a previous one, the transferability between the designs was limited. The immersive activities helped us identify this quickly, and careful task analysis allowed us to retain the transferable elements.
Acknowledgements
We thank our study participants for their valuable time and the LLNL LEARN project, LLNS B639881 & B630670, and NSF III-1656958 & IIS-1844573 for supporting this research. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. LLNL-CONF-812737.
References

[1] Awesomplete: Lightweight autocomplete widget. https://leaverou.github.io/awesomplete/. Accessed: 2020-04-26.
[2] dagrejs: Directed graph layout for JavaScript. Accessed: 2020-04-23.
[3] DWARF debugging standard. http://dwarfstd.org/. Accessed: 2020-04-08.
[4] dwarfdump. Accessed: 2020-04-08.
[5] flatten-js: Interval binary search tree. https://github.com/alexbol99/flatten-interval-tree. Accessed: 2020-04-26.
[6] Ghidra software reverse engineering framework. https://ghidra-sre.org/. Accessed: 2020-04-08.
[7] GoldenLayout: Multi-screen layout manager for webapps. http://golden-layout.com/. Accessed: 2020-04-23.
[8] Intel C++ compiler. https://software.intel.com/en-us/c-compilers. Accessed: 2020-04-27.
[9] Intel VTune profiler. https://software.intel.com/en-us/vtune. Accessed: 2020-04-08.
[10] objdump. https://linux.die.net/man/1/objdump. Accessed: 2020-04-08.
[11] TotalView HPC debugging software. https://totalview.io/products/totalview. Accessed: 2020-04-08.
[12] HPCToolkit: tools for performance analysis of optimized parallel programs. Concurrency and Computation: Practice and Experience.
In Proceedings of the 5th International Symposium on Software Visualization, SoftVis, pp. 73-82. ACM, New York, NY, USA, 2010. doi: 10.1145/1879211.1879224
[15] R. Arias-Hernandez, L. T. Kaastra, T. M. Green, and B. Fisher. Pair analytics: Capturing reasoning processes in collaborative visual analytics. pp. 1-10, 2011.
[16] D. Baum, C. Sixtus, L. Vogelsberg, and U. Eisenecker. Understanding conditional compilation through integrated representation of variability and source code. In Proceedings of the 23rd International Systems and Software Product Line Conference - Volume B, SPLC '19, pp. 21-24. Association for Computing Machinery, New York, NY, USA, 2019. doi: 10.1145/3307630.3342387
[17] A. Bergel, A. Bhatele, D. Boehme, P. Gralka, K. Griffin, M.-A. Hermanns, D. Okanović, O. Pearce, and T. Vierjahn. Visual analytics challenges in analyzing calling context trees. In A. Bhatele, D. Boehme, J. A. Levine, A. D. Malony, and M. Schulz, eds., Programming and Performance Visualization Tools, pp. 233-249. Springer International Publishing, Cham, 2019.
[18] C. Bezemer, J. Pouwelse, and B. Gregg. Understanding software performance regressions using differential flame graphs. pp. 535-539, 2015.
[19] M. Bostock, V. Ogievetsky, and J. Heer. D3: Data-driven documents. IEEE Trans. Visualization & Comp. Graphics (Proc. InfoVis), 2011.
[20] M. Brehmer and T. Munzner. A multi-level typology of abstract visualization tasks. IEEE Transactions on Visualization and Computer Graphics, 19(12):2376-2385, Dec. 2013. doi: 10.1109/TVCG.2013.124
[21] B. Buck and J. K. Hollingsworth. An API for runtime code patching. Int. J. High Perform. Comput. Appl., 14(4):317-329, Nov. 2000. doi: 10.1177/109434200001400404
[22] L. DeRose, B. Homer, and D. Johnson. Detecting application load imbalance on high end massively parallel systems. In A.-M. Kermarrec, L. Bougé, and T. Priol, eds., Euro-Par, vol. 4641 of Lecture Notes in Computer Science, pp. 150-159. Springer, 2007. doi: 10.1007/978-3-540-74466-5_17
[23] S. Devkota and K. E. Isaacs. CFGExplorer: Designing a visual control flow analytics system around basic program analysis operations. Computer Graphics Forum, 37(3):453-464, 2018. doi: 10.1111/cgf.13433
[24] S. G. Eick and J. L. Steffen. Visualizing code profiling line oriented statistics. In Proceedings of the 3rd Conference on Visualization '92, VIS '92, pp. 210-217. IEEE Computer Society Press, Washington, DC, USA, 1992.
[25] J. Ellson, E. Gansner, L. Koutsofios, S. North, and G. W. Woodhull. Graphviz — open source graph drawing tools. In Lecture Notes in Computer Science, pp. 483-484. Springer-Verlag, 2001.
[26] N. Elmqvist and J. S. Yi. Patterns for visualization evaluation. In Proceedings of the 2012 BELIV Workshop: Beyond Time and Errors - Novel Evaluation Methods for Visualization, BELIV '12. Association for Computing Machinery, New York, NY, USA, 2012. doi: 10.1145/2442576.2442588
[27] M. Geimer, F. Wolf, B. J. N. Wylie, E. Ábrahám, D. Becker, and B. Mohr. The Scalasca performance toolset architecture. Concurr. Comput.: Pract. Exper., 22(6):702-719, Apr. 2010. doi: 10.1002/cpe.v22:6
[28] M. Ghoniem, J.-D. Fekete, and P. Castagliola. A comparison of the readability of graphs using node-link and matrix-based representations. In IEEE Symposium on Information Visualization, pp. 17-24, 2004.
[29] M. Godbolt. Godbolt compiler explorer. https://github.com/mattgodbolt/compiler-explorer. Accessed: 2020-04-08.
[30] K. W. Hall, A. J. Bradley, U. Hinrichs, S. Huron, J. Wood, C. Collins, and S. Carpendale. Design by immersion: A transdisciplinary approach to problem-driven visualizations.
IEEE Transactions on Visualization andComputer Graphics , 26(1):109–118, 2020.[31] R. Hornung. Raja performance suite. https://github.com/LLNL/RAJAPerf . Accessed: 2020-04-26.[32] T. Karrer, J.-P. Kr¨amer, J. Diehl, B. Hartmann, and J. Borchers. Stack-splorer: Call graph navigation helps increasing code maintenance effi-ciency. In
Proceedings of the 24th Annual ACM Symposium on UserInterface Software and Technology , UIST, pp. 217–224. ACM, New York,NY, USA, 2011. doi: 10.1145/2047196.2047225[33] R. Keller, C. Eckert, and P. Clarkson. Matrices or node-link diagrams:Which visual representation is better for visualising connectivity models?
Information Visualization , 5:62–76, 04 2006. doi: 10.1057/palgrave.ivs.9500116[34] D. M. Koppelman and C. J. Michael. Discovering barriers to efficientexecution, both obvious and subtle, using instruction-level visualization.In
Proceedings of the 1st Workshop on Visual Performance Analysis , VPA,pp. 36 – 41, Nov 2014. doi: 10.1109/VPA.2014.11[35] J. Krinke. Visualization of program dependence and slices. In ,pp. 168–177, 2004.[36] H. Lam, M. Tory, and T. Munzner. Bridging from goals to tasks withdesign study analysis reports.
IEEE Transactions on Visualization andComputer Graphics , 24:435–445, 2018.[37] C. Lattner and V. Adve. Llvm: A compilation framework for lifelongprogram analysis & transformation. In
Proceedings of the InternationalSymposium on Code Generation and Optimization: Feedback-Directedand Runtime Optimization , CGO ’04, p. 75. IEEE Computer Society, USA,2004.[38] B. Lee, C. Plaisant, C. S. Parr, J.-D. Fekete, and N. Henry. Task taxonomyfor graph visualization. In
Proceedings of the 2006 AVI Workshop onBEyond Time and Errors: Novel Evaluation Methods for InformationVisualization , BELIV ’06, p. 1–5. Association for Computing Machinery,New York, NY, USA, 2006. doi: 10.1145/1168149.1168168[39] Y. Li, S. Ding, Q. Zhang, and D. Italiano. Debug information validation foroptimized code. In
Programming Languages Design and Implementation(PLDI) , 2020.[40] S. Lin, F. Ta¨ıani, T. C. Ormerod, and L. J. Ball. Towards anomaly com-prehension: Using structural compression to navigate profiling call-trees.In
Proceedings of the 5th International Symposium on Software Visualiza-tion , SOFTVIS, pp. 103–112. ACM, New York, NY, USA, 2010. doi: 10.1145/1879211.1879228[41] J. Roberts and C. Zilles. TraceVis: An execution trace visualization tool.In
MoBS ’05
Theoretical Issues in ErgonomicsScience , 11:504–531, 11 2010. doi: 10.1080/14639220903165169[44] M. Sedlmair, M. Meyer, and T. Munzner. Design study methodology:Reflections from the trenches and the stacks.
IEEE Transactions onVisualization and Computer Graphics , 18(12):2431–2440, Dec 2012. doi:10.1109/TVCG.2012.213[45] C. Stolte, R. Bosch, P. Hanrahan, and M. Rosenblum. Visualizing appli-cation behavior on superscalar processors. In
Proceedings 1999 IEEESymposium on Information Visualization (InfoVis’99) , pp. 10–17, 1999. [46] K. Sugiyama, S. Tagawa, and M. Toda. Methods for visual understandingof hierarchical system structures. IEEE Transactions on Systems, Man,and Cybernetics , 11(2):109–125, 1981.[47] K. Williams, A. Bigelow, and K. E. Isaacs. Visualizing a moving target: Adesign study on task parallel programs in the presence of evolving data andconcerns.
To appear in IEEE Transactions on Visualization and ComputerGraphics (Proceedings of InfoVis ’19) , Jan. 2020. doi: 10.1109/TVCG.2019.2934285[48] Y. Zhang, K. Chanana, and C. Dunne. Idmvis: Temporal event sequencevisualization for type 1 diabetes treatment decision support.
IEEE Trans-actions on Visualization & Computer Graphics , 25(01):512–522, jan 2019.doi: 10.1109/TVCG.2018.2865076, 25(01):512–522, jan 2019.doi: 10.1109/TVCG.2018.2865076