How Researchers Use Diagrams in Communicating Neural Network Systems

Abstract

Neural networks are a prevalent and effective machine learning component, and their application is leading to significant scientific progress in many domains. As the field of neural network systems is fast growing, it is important to understand how advances are communicated. Diagrams are key to this, appearing in almost all papers describing novel systems. This paper reports on a study into the use of neural network system diagrams, through interviews, card sorting, and qualitative feedback structured around ecologically-derived examples. We find high diversity of usage, perception and preference in both creation and interpretation of diagrams, examining this in the context of existing design, information visualisation, and user experience guidelines. Considering the interview data alongside existing guidance, we propose guidelines aiming to improve the way in which neural network system diagrams are constructed.


Guy Clarke Marshall, André Freitas and Caroline Jay
Department of Computer Science, University of Manchester, UK
arXiv [cs.HC], Aug

Highlights

• Twelve artificial intelligence system experts are interviewed about their use of diagrams.
• Differences in interpretation, preference and use of scholarly system diagrams are discovered.
• Priorities and problems that scholarly system diagram users encounter are identified.
• Guidelines for neural network system diagrams are proposed.


Keywords: Neural Networks; Systems; Diagrams; Interview Study


1. Introduction

Neural networks are often used in Artificial Intelligence (AI) systems. Two important application domains are Natural Language Processing (NLP) and Computer Vision (CV), which specialise in creating systems to perform tasks involving language and images respectively. In addition to the core areas of classification and pattern prediction, neural network systems have been successfully applied in complex domains such as autonomous driving, language translation, and automated question answering.

Increasingly complex and niche application areas are being identified, and systems are being built to address these problems. SemEval, an annual semantic evaluation workshop, sets different tasks each year. In 2020 the tasks included Memotion Analysis (the analysis of internet memes), Detection of Propaganda Techniques in News Articles, and Multilingual Offensive Language Identification in Social Media (SemEval-2020, 2020). Neural network systems have demonstrated the potential to address a wide range of modern issues. In some cases they outperform humans by a considerable margin: in a recent biological image classification task, a neural network achieved 90% accuracy against 50% for human experts (Buetti-Dinh et al., 2019). With such a wide range of useful application areas, and such demonstrable potential for advancement, there is a great deal of scholarly activity related to neural networks.

As in other disciplines, scholars communicate advances to their community through journal and conference papers, which often include a system diagram. Interpreting these diagrams is an important part of scholarly communication about neural network systems. An example diagram is shown in Figure 1.

Incorrect interpretation of these diagrams can cause misunderstandings about the system design, leading other researchers to misunderstand the scientific advance. For scientists and software engineers applying the research in their own application areas, there is a risk of wasting time on unsuitable techniques, again through a lack of proper understanding of the system. For these reasons, the accuracy, clarity, and overall effectiveness of system diagrams are important.

We use an interview study to capture rich, individual feedback about diagrams. We explore a broad range of topics about the use of diagrams, and uncover preferences and communication issues, in order to generate requirements for guidelines. The guidelines we propose aim to improve communication about advances in neural network systems.

⋆ Funding: This work was supported by the Department of Computer Science, University of Manchester, UK.
∗ Corresponding author (G.C. Marshall).


GC Marshall et al.:

Preprint submitted to Elsevier

Page 1 of 21 Neural Network Diagrams Interview Study

Figure 1:

An example scholarly neural network system diagram, from Maharjan et al. (2018)

Figure 2:

Long and short paper submissions to ACL (Joyce Chai and Tetreault, 2020a,b), CVPR (Computer Vision Foundation, 2019, 2020) and SIGCHI (ACM, 2019, 2020)

Figure 2 shows main track long paper submissions to the ACL (natural language processing), CVPR (computer vision) and SIGCHI (human-computer interaction) conferences from 2009 to 2020. Note the rapid increase in submissions to ACL and CVPR since 2017, even relative to SIGCHI, whose submissions are also growing fast. Each of these conferences has the highest h5-index and the highest attendance in its domain, and all three have similar acceptance rates (approximately 25%). From 2017 to 2020, submissions to SIGCHI increased by 30%, compared with increases of 160% for ACL and 119% for CVPR. This growth brings familiar administrative issues for organisers, and makes it harder for researchers to remain current with the field. The fast pace also increases the importance of effective communication.

In terms of scholarly communication, AI research is disseminated through journals, conferences and reviews. Perhaps in part due to the fast pace of development in the field, conferences are particularly prestigious in Computer Science (Freyne et al., 2010), and for this reason we focus our analysis and discussion on conference proceedings.

Neural network systems are usually designed and trained to perform a specific task, such as classifying images or predicting the next word in a sentence. A neural network system can be considered to encompass the entire software system, rather than a distinct neural component in isolation. This scope corresponds well to the content commonly included in diagrams in scholarly publications.

A neural network takes an input (such as text or images) and processes it through a series of layers to arrive at an output (a classification or prediction). Each layer contains a number of nodes that hold information and transmit outputs to nodes in other layers. Specific mathematical functions or operations are also used in these systems, such as sigmoid, concatenate, softmax, max pooling, and loss. Hyperparameters are parameters used to control the learning process, such as the learning rate, and are often tuned for each system implementation. The system architecture describes the way in which the components are arranged. Different architectures are used for different types of activities. For example, Convolutional Neural Networks (CNNs) are commonly used for processing images, and Long Short-Term Memory networks (LSTMs), a type of Recurrent Neural Network (RNN) designed for processing sequences, are often used for text.

Neural networks "learn" a function, but have to be trained to do so. Training consists of providing inputs together with their expected outputs, and adjusting the network's internal parameters (its weights) so that its outputs move closer to the expected ones. The system is then tested with unseen inputs, to measure whether it handles these correctly and generalises to new cases. A more detailed introduction to the field is provided by Goodfellow et al. (2016).
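To make the vocabulary above concrete, the following is a minimal pure-Python sketch, not code from any system discussed in this paper, of a network with one sigmoid hidden layer and a softmax output layer. It is trained with a cross-entropy loss and a fixed learning rate on a hypothetical toy task, then tested on inputs it has not seen. The class name TinyNet, the data and the hyperparameter values are illustrative assumptions; real systems use dedicated libraries and much larger architectures such as CNNs or LSTMs.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    m = max(logits)  # subtract the largest logit for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dense(x, weights, bias):
    # Each output node is a weighted sum of every input node plus a bias.
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


class TinyNet:
    """Hypothetical network: input layer -> sigmoid hidden layer -> softmax output layer."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        hidden = [sigmoid(z) for z in dense(x, self.w1, self.b1)]
        return hidden, softmax(dense(hidden, self.w2, self.b2))

    def loss(self, data):
        # Mean cross-entropy loss between predicted probabilities and expected classes.
        return sum(-math.log(self.forward(x)[1][y]) for x, y in data) / len(data)

    def train_step(self, x, y, learning_rate):
        hidden, probs = self.forward(x)
        # Gradient of softmax + cross-entropy with respect to the output logits.
        d_out = [p - (1.0 if k == y else 0.0) for k, p in enumerate(probs)]
        # Backpropagate to the hidden layer before the output weights change.
        d_hidden = [
            sum(self.w2[k][j] * d_out[k] for k in range(len(d_out))) * h * (1.0 - h)
            for j, h in enumerate(hidden)
        ]
        for k, g in enumerate(d_out):
            for j, h in enumerate(hidden):
                self.w2[k][j] -= learning_rate * g * h
            self.b2[k] -= learning_rate * g
        for j, g in enumerate(d_hidden):
            for i, xi in enumerate(x):
                self.w1[j][i] -= learning_rate * g * xi
            self.b1[j] -= learning_rate * g

    def predict(self, x):
        probs = self.forward(x)[1]
        return probs.index(max(probs))


# Hypothetical toy task: class 0 if the first input is larger, otherwise class 1.
train = [([0.9, 0.1], 0), ([0.7, 0.2], 0), ([0.8, 0.4], 0), ([0.6, 0.1], 0),
         ([0.1, 0.9], 1), ([0.2, 0.7], 1), ([0.4, 0.8], 1), ([0.1, 0.6], 1)]
unseen = [([0.85, 0.15], 0), ([0.15, 0.85], 1)]  # held-out test inputs

LEARNING_RATE = 0.5  # a hyperparameter: chosen before training, not learned
net = TinyNet(n_in=2, n_hidden=4, n_out=2)
initial_loss = net.loss(train)
for _ in range(2000):
    for x, y in train:
        net.train_step(x, y, LEARNING_RATE)
final_loss = net.loss(train)
print(f"training loss: {initial_loss:.3f} -> {final_loss:.3f}")
print("predictions on unseen inputs:", [net.predict(x) for x, _ in unseen])
```

The split between `train` and `unseen` mirrors the distinction drawn above between training and testing: only the first list ever changes the weights.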

Diagrams are a useful way of representing systems in general. Peirce, an American philosopher and semiotician, defines diagrams as "icons of relation" (Peirce and Moore, 1998). In Cybernetics, a system can be defined as "an integral set of elements in the aggregate of their relations and connections" (Novikov, 2015). This shared emphasis on relations suggests that diagrams may be a suitable and useful representation for systems.

In practice, diagrams are a prevalent medium for communicating neural network systems, yet the neural network system diagrams found in conference proceedings follow few conventions. They vary at a structural level: in what is represented (such as an example input, data shapes, processing steps, or class names), in the level of granularity, and in how it is represented (such as blocks, graphical icons, natural language, or mathematical notation). At a lower level, fundamental elements such as vectors are drawn as different graphical components, sometimes even within the same diagram. This contrasts with text, equations, pseudocode, and code, whose terms are predominantly formalised and consistent.
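The kind of inconsistency described above can be made checkable if a diagram is treated, following the definitions of Peirce and Novikov, as a set of elements and relations, each with a meaning and a visual encoding (a glyph). The sketch below is a hypothetical data model of our own, not a tool used in this study: it flags meanings drawn with several glyphs (for example, vectors drawn two ways) and glyphs reused for several meanings (for example, one arrow style for both data flow and control flow). The labels and glyph names are illustrative.

```python
from collections import defaultdict


def encoding_conflicts(pairs):
    """Given (meaning, glyph) pairs for everything drawn in one diagram, return
    meanings drawn with several glyphs and glyphs reused for several meanings."""
    glyphs_per_meaning = defaultdict(set)
    meanings_per_glyph = defaultdict(set)
    for meaning, glyph in pairs:
        glyphs_per_meaning[meaning].add(glyph)
        meanings_per_glyph[glyph].add(meaning)
    many_glyphs = {m: g for m, g in glyphs_per_meaning.items() if len(g) > 1}
    many_meanings = {g: m for g, m in meanings_per_glyph.items() if len(m) > 1}
    return many_glyphs, many_meanings


# Hypothetical diagram: (label, meaning, glyph) for each element and each relation.
diagram = [
    ("word embeddings", "vector", "row of circles"),
    ("hidden state", "vector", "rectangle"),
    ("BiLSTM", "layer", "rectangle"),
    ("embeddings to BiLSTM", "data flow", "solid arrow"),
    ("repeat for each time step", "control flow", "solid arrow"),
]
many_glyphs, many_meanings = encoding_conflicts((m, g) for _, m, g in diagram)
print({m: sorted(g) for m, g in many_glyphs.items()})
# {'vector': ['rectangle', 'row of circles']}
print({g: sorted(m) for g, m in many_meanings.items()})
# {'rectangle': ['layer', 'vector'], 'solid arrow': ['control flow', 'data flow']}
```

Checking both directions matters: a one-to-many mapping from meaning to glyph forces readers to learn several symbols for one idea, while a many-to-one mapping makes a single symbol ambiguous.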

The study addresses the following research questions, in the domain of neural network systems:
• Why do people create system diagrams for scholarly papers? (Interview)
• How do people create them? (Interview)
• What tasks do system diagrams support for readers of scholarly papers? (Card sorting; Interview)
• What aspects of presentation do people find helpful or confusing? (Example diagrams; Interview)

The research questions are designed to gather requirements for potential diagrammatic tools and to identify avenues for future research. This information is also useful to researchers in the domain of neural network systems, to inform diagram design.

We find a large variety of opinions, with only slight agreement on preferred example diagrams, task importance, and diagramming tools. We also identify a number of areas causing confusion to readers, such as whether a precise depiction is meaningful, and the omission of expected details from a diagram. Finally, we report that for some readers, scholarly system diagrams provide an overview of the system that lets them quickly understand a paper. This highlights the important and unique role of diagrams in scholarly communication.

2. Related work

This section discusses literature related to each of the research questions, and concludes with related work concerning scholarly figures.

There are good reasons for using diagrams in general, which also apply to system diagrams. Diagrams make abstract properties and relations accessible (Hutchins, 1995, 2005). They are external representations which support cognitive processes (Clark and Chalmers, 1998; Zhang and Norman, 1994). Further, in a public setting, they can enable collective or distributed thinking (Peirce and Moore, 1998). Diagrams can also be "manipulated in order to profile known information in an optimal fashion" (Tylén et al., 2014). The cognitive and perceptual benefits of diagrams for handling complexity are well documented, particularly their ability to limit abstraction and aid "processibility" (Stenning and Oberlander, 1995). Each of these attributes of diagrams can support research processes.

More specifically, there are benefits in using visual representations to display information. Van Wijk's (2005) economic model quantifies this value by attaching a "cost" to each activity. For example, a frequently used business information visualisation that reduces the time employees spend on a task gives a measurable financial benefit; the cost of initially building and maintaining the visualisation then determines the return on investment, and whether the visualisation is good value. From an Information Visualisation perspective, diagrams make information more useful by removing noise, and improve the accessibility of complex algorithms (Keim, 2002).

In education, the cognitive benefits of diagrams have been researched for Venn diagrams, tree diagrams and other representations "encouraging thought regarding the whole and its parts" (Stokes, 2002). In a recent meta-study, Guo et al. (2020) showed that diagrams had a moderate overall positive effect on the comprehension of educational texts. Both research and education require information-searching behaviour, so it would be reasonable to expect some of these education domain results to also hold in the research domain. However, whereas the research tasks we consider are primarily communicative, education tasks have different, pedagogical outcomes. See Tippett (2016) for a systematic review of visual representations in science education. The substantial differences in user profiles, use cases and representational choices compared with scholarly research lead us to exclude the education domain from further discussion.
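The cost-benefit reasoning attributed above to Van Wijk (2005) can be made concrete with a little arithmetic. The sketch below is a deliberately simplified, linear reading under our own assumptions: a fixed gain and a fixed reading cost per use, plus a one-off build and maintenance cost, all measured in minutes. Van Wijk's model distinguishes more cost components than this, so the function names and figures are hypothetical illustrations, not his formulation.

```python
def net_value(uses, gain_per_use, cost_per_use, initial_cost):
    """Simplified net value: total time saved minus time spent reading and building.
    All quantities share one unit (here, minutes); the linear form is an assumption."""
    return uses * (gain_per_use - cost_per_use) - initial_cost


def break_even_uses(gain_per_use, cost_per_use, initial_cost):
    """Number of uses after which the visualisation has paid for itself."""
    saving = gain_per_use - cost_per_use
    if saving <= 0:
        raise ValueError("each use must save more time than it costs")
    return initial_cost / saving


# Hypothetical dashboard: saves 30 minutes per use, takes 6 minutes to read,
# and took 2400 minutes (40 hours) to build and maintain.
print(net_value(uses=200, gain_per_use=30, cost_per_use=6, initial_cost=2400))  # 2400
print(break_even_uses(gain_per_use=30, cost_per_use=6, initial_cost=2400))  # 100.0
```

Under these assumptions the dashboard breaks even after 100 uses, and every further use adds net value, which is why frequency of use matters as much as the saving per use.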

In terms of the process for creating diagrams, cognitive theories are helpful for understanding how people summarise and integrate information. There is a close relation between "how" and "why" diagrams are created, particularly in terms of perceptual and cognitive attributes, so the work discussed in Section 2.1 is also relevant to this research question. Authors' mental models (Johnson-Laird, 1983) of systems have been explored in the AI system diagram domain (Marshall et al., 2020b), suggesting a close relationship between an author's mental model and the diagram they create.

In an interview study conducted retrospectively with building architects, Suwa and Tversky (1997) use protocol analysis to demonstrate the utility of sketches in "crystallizing design ideas".

Conceptual diagramming, the use of diagrams to support the cognition of concepts, can be considered a closely related domain if we regard a system architecture as a conceptualisation of the design. In an interview study of conceptual diagramming, Ma’ayan et al. (2020) investigated how people draw diagrams relating to complex concepts, including computer systems, in order to generate requirements for diagramming tools. They focus on what they term "natural diagramming", in which the author has a direct relation between their conceptualisation and the diagram, and is able to use the diagram to explore a conceptual space.

Different ways of writing things down can lead to vastly different outcomes, both in natural language (Evans, 2006; Wason and Johnson-Laird, 1972) and in diagrams (Shimojima, 2015), particularly mathematical education diagrams (Diezmann, 1999; Martinovic et al., 2013; Novick et al., 1999). The specific graphical objects used in diagrams can convey entirely different meanings, aiding or hindering accurate interpretation. In mechanical drawings, experiments have shown that adding arrows changes a structural diagram so that it conveys functional information (Heiser and Tversky, 2006). Physics of notation, a diagram analysis framework proposed by Moody (2009) and increasingly used to design new notations (Van Der Linden and Hadar, 2018), includes a category for "semantic transparency", where chosen visual representations automatically suggest their meaning.

Additional related work from the Design, Information Visualisation, and User Experience domains, which provides further insight into helpful and hindering practices, is discussed in Section 5.4. Placing this additional related work later allows more specific discussion with reference to our domain, and facilitates comparison with the interview outputs.

We are not aware of any prior work examining the usage of scholarly neural network system diagrams. There is a significant body of empirical systems diagram research for the Unified Modeling Language (UML), a diagrammatic language used for software diagrams (Booch et al., 1998). The tasks commonly studied in this empirical research relate to software engineering rather than software research, and often prioritise error detection (Gopalakrishnan et al., 2010) or maintenance (Soh et al., 2012), alongside more generally applicable diagramming topics such as cognitive integration (Hahn and Kim, 1999) or comprehension (Purchase et al., 2003).

The tasks that researchers ask participants to perform using diagrams, such as those above, are often stated without evidence or discussion. In experiments on flow maps with non-specialist users, Koylu and Guo (2017) conclude that "The influence of the design on performance and perception depends on the type of the task", suggesting that the tasks a diagram supports are a useful research question.

Carberry et al. (2006) note that some information resides in figures and cannot be found elsewhere in the text. Diagrams may therefore have an important and unique role when reading and extracting information from a paper.

Rowley-Jolivet (2000) examines the academic conference presentation as a medium, investigating the different types of imagery used, from photographs to system diagrams. Rowley-Jolivet found that in Physics, 52% of slides consisted of images. This highlights the prevalence of diagrams in scientific scholarly communication.

Figures and diagrams are important and prevalent, but popular academic writing guides discuss them only briefly. Swales and Feak's (2004) "Academic Writing for Graduate Students", despite including 11 conceptual diagrams to explain their own work, only gives guidance on the use of charts, not on other figures such as system or conceptual diagrams. In 212 pages, the single mention of figures or diagrams in Murray's (2009) "Writing for academic journals" is the rhetorical question "Do you have any figures, diagrams or tables to include?". Hall's (2012) medically focused "How to write a paper" discusses some specific areas related to diagramming, providing extensive advice on captions, legends and referencing the figure in the text, and advising brevity and minimal duplication in the content of diagrams. Hall deals with figures and illustrations primarily in relation to graphs in the "Results" section, and also notes in the "Methods" section that "A diagram may be helpful if the design of the study is complex or if a complicated sequence of interventions is carried out". This is the only reference to system diagrams, and no further guidance on the content or presentation of diagrams is given. Schimel's (2012) "Writing Science" includes limited advice on referencing a chart in the text, and its advice on diagrams and figures extends only to the following comment: "I have always felt that I don't understand something until I can draw a cartoon to explain it. A simple diagram or model - the clearer the picture, the better".

None of the above paper-writing guides includes a chapter, section or subsection discussing diagrams. These examples are indicative of the usual level of diagram discussion in highly cited scholarly paper-writing guides. There are exceptions to this brevity. One relevant domain-specific guide that gives some depth of diagramming advice is "Writing for Computer Science", in which Zobel (2004) includes one chapter and two additional subsections about figures, covering tables, algorithm figures, graphs, and figures in slide presentations. For system diagrams, Zobel suggests making use of sketches and of available diagrammatic languages for the specific domain, and outlines general design considerations such as removing clutter. These guidelines are not evidence-based and are published without citations, though many replicate the influential advice of Tufte et al. (1990). Zobel notes that:

"Diagrams illustrating system structure often seem to be poor. In too many of these pictures the symbolism is inconsistent: boxes have different meanings in different places, lines represent both control flow and data flow, objects of primary interest are not distinguished from minor components, and so on. Unnecessary elements are included, such as cheesy clip-art or computer components that are irrelevant to the system."

Graphical abstracts (GAs) are diagrams which summarise scholarly work, and are "increasingly required by publishers to make scientific findings more accessible across and within disciplines" (Hullman and Bach, 2018). In their analysis of 54 GAs, Hullman and Bach (2018) define a taxonomy to describe, classify and analyse the visual structure of GAs, noting that "design of GAs is more diverse in its use of spatial layout than the textbook diagrams, which were presumably created by professional artists". At present, formal graphical abstracts are uncommon in Computer Science.

Code  Role            Sector            Specialism
P1    PhD year 1      Academic          AI for Physics
P2    PhD year 3      Academic          NLP
P3    Postdoctoral    Academic          NLP
P4    Postdoctoral    Academic          NLP
P5    PhD year 3      Academic          NLP
P6    Postdoctoral    Academic          CV
P7    PhD year 2      Academic          NLP
P8    PhD year 4      Academic          NLP
P9    Data scientist  Academic-related  NLP
P10   Data scientist  Industry          CV
P11   Postdoctoral    Academic          CV
P12   Data scientist  Industry          NLP

Table 1

Participant summary

Tenopir et al. (2007) use a survey and a series of user studies to understand how readers use figures within scholarly documents, and to test prototypes for ProQuest, a digital research library. The prototype was tested in ecological science, and involved extracting data, including figures, into a metadata page of "disaggregated components". Their conclusions are primarily about researcher activity using these components, noting "emerging opportunities to conduct research into scholarly communications focused on artifacts at finer levels of granularity". From their hands-on study, they identify four main readership uses of figures: (i) "creating new fixed documents", (ii) "creating documents to support performative activities", (iii) "making comparisons between a scientist's own work and the work of other researchers", and (iv) "creating other information forms and objects". Of relevance to diagrams, Tenopir et al. state that "in-depth indexing is applied to individual tables and figures, which allows searchers to locate information of interest even if the entire article is not on that topic". Referring to a lack of metascience, they note more generally that "investigations of scientists' use of journal articles for purposes other than research have been rare". We are not aware of any prior empirical research on system diagrams contained in conference proceedings.

In their study of scholarly information, Pontis et al. (2017) identified attributes, such as experience level and the project's state, that influence researchers' information-seeking behaviour. Pain points, uses and strategies are described along the information journey. They concluded that better support for filtering content is important. Use of diagrams was not reported.

3. Study setup

We conduct a semi-structured interview study, which includes the examination of six example diagrams and a closed card-sorting exercise to identify "useful and not-at-all-useful tasks". Approval for this study was granted by the University of Manchester Department of Computer Science ethical review board (2019-7852-11951). Full interview scripts and transcripts have been made available (Marshall et al., 2020a).

We recruited 12 participants (Table 1), each of whom reported having read, in the previous 12 months, at least one paper from the top three H-indexed computer vision or natural language processing conferences (ACL, NAACL, EMNLP, CVPR, ECCV, or ICCV). All participants were previously known to the research team, though not necessarily to the interviewer, and came from seven academic, academic-related, and commercial institutions.

Semi-structured interview

Semi-structured interviews are a well-established technique for collecting data (Kallio et al., 2016). Before formal commencement of the study, one pilot participant was taken through a preliminary interview script, which led to the refinement of the interview materials. The full interview questions are available alongside the transcript data, the overarching questions being:
• Can you describe how you use diagrams when communicating your research?
• How do you use diagrams when consuming research?

Following the graphic elicitation and card sorting exercises detailed below, additional follow-up questions were asked about the role of diagrams more generally, exploring topics that came up during the interview. The entire interview session, including the two exercises, was audio recorded and documented in the transcript. Six participants were interviewed face-to-face, and six over Skype video software. Interview resources were presented as printouts or as PDFs. The interviews took an average of just over 1 hour, resulting in 12 hours, 4 minutes and 54 seconds of audio recording. The recordings were transcribed with personally identifiable information and unnecessary non-words redacted, resulting in over 58,000 words of transcription. The interviews were conducted in English, and the majority of participants were non-native English speakers. The transcripts capture what was said, with the interviewer adding clarifications of understood meaning in square brackets where required.

Graphical stimuli

Graphic elicitation is a complex term, used in a variety of ways, as discussed by Umoquit et al. (2013). In our study we use pre-made diagrams as stimuli, fitting with Crilly et al.'s (2006) definition and usage of graphic elicitation. We use example diagrams for "graphic communication" rather than "graphic ideation". We chose to use graphic stimuli as part of the interview for the reasons outlined by Crilly et al.: they allow a shared frame of reference, facilitate complex lines of enquiry, and provoke comments on interpretation and assumptions.

The six example diagrams we used were chosen after the research team conducted an open card sorting exercise on a manually extracted corpus of 120 scholarly neural network (NN) system diagrams. Twenty diagrams were randomly selected from each of CVPR 2019, ICCV 2017, ECCV 2018, ACL 2019, NAACL 2019 and EMNLP 2018. From these 120 diagrams, six groupings were identified: "Labelled layers", "3D blocks", "pictorial example centric", "text example centric", "modular" and "block diagram". The groups are not distinct, but encompass the main visual aspects that authors seem to prioritise. This follows the classification advice of Futrelle (2004), who states that "family resemblances" are often the best we can do for diagram schemas. The specific examples used were selected from 2019 conferences, and were chosen (a) to be contemporary, (b) to cover a range of venues, (c) to be visually different and (d) to be clearly placed within the groups identified. These selection criteria were chosen in order to cover the search space, and to facilitate discussion about the different visual and content aspects. Due to the heterogeneity of representations used in the field, we were not able to construct a small subset that we felt was representative of the whole field. Instead we aimed to cover a range of the most commonly observed diagrammatic phenomena.
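The corpus construction above is a stratified random sample: a fixed number of items drawn per venue, so that no single venue dominates. The sketch below shows one way to do this with Python's standard `random` module. The paper does not describe its tooling, so the function name, the candidate pools and the seed are hypothetical; in the study the candidate diagrams were extracted manually from papers.

```python
import random

VENUES = ["CVPR 2019", "ICCV 2017", "ECCV 2018", "ACL 2019", "NAACL 2019", "EMNLP 2018"]


def stratified_sample(corpus, per_venue, seed=None):
    """Draw `per_venue` items uniformly at random, without replacement, from each venue."""
    rng = random.Random(seed)
    sample = {}
    for venue, diagrams in corpus.items():
        if len(diagrams) < per_venue:
            raise ValueError(f"{venue} has only {len(diagrams)} candidate diagrams")
        sample[venue] = rng.sample(diagrams, per_venue)
    return sample


# Hypothetical candidate pools of 150 diagrams per venue.
corpus = {venue: [f"{venue}, diagram {i}" for i in range(150)] for venue in VENUES}
sample = stratified_sample(corpus, per_venue=20, seed=7)
print(sum(len(d) for d in sample.values()))  # 120
```

Passing a seed makes the draw reproducible, which would let others regenerate the same 6 × 20 = 120 diagram corpus from the same candidate pools.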

Card sorting

Closed card sorting (Wood and Wood, 2008) asks users to put cards into predefined groups. Each of our cards named an activity a researcher might perform using scholarly diagrams, and the groups used were: "Important in your use of diagrams", "absolutely not important in your use of diagrams, do not do this at all" and "somewhere between". This method was chosen in order to gather quantitative data about reported usage. Initially we encouraged participants to rank all the cards from most to least important, but this proved too demanding for the first participant, so we switched to the simplified groupings. The task list was generated from the experience of the researchers conducting the study, and participants were given the opportunity to add to or remove from this list.
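The quantitative output of a closed card sort is a count, per card, of how many participants placed it in each group. The sketch below tallies such counts; it is an illustration under our own assumptions, with shortened group labels and hypothetical task names and responses, not the study's data. Because each participant's placements are stored separately, a card added by one participant simply appears with counts only from that participant.

```python
from collections import Counter

# Shortened labels for the three groups used in the study.
CATEGORIES = ("important", "somewhere between", "not important")


def tally(sorts):
    """For each task card, count how many participants put it in each category."""
    counts = {}
    for placements in sorts.values():
        for task, category in placements.items():
            if category not in CATEGORIES:
                raise ValueError(f"unknown category: {category!r}")
            counts.setdefault(task, Counter())[category] += 1
    return counts


# Hypothetical responses; task names are illustrative, not the study's cards.
sorts = {
    "P1": {"get an overview of the system": "important",
           "reimplement the system": "not important"},
    "P2": {"get an overview of the system": "important",
           "reimplement the system": "somewhere between",
           "compare with my own work": "important"},  # a card this participant added
    "P3": {"get an overview of the system": "somewhere between",
           "reimplement the system": "not important"},
}
for task, c in tally(sorts).items():
    print(f"{task}: {[c[cat] for cat in CATEGORIES]}")
# get an overview of the system: [2, 1, 0]
# reimplement the system: [0, 1, 2]
# compare with my own work: [1, 0, 0]
```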

Analysis method

We conducted a thematic analysis, following the framework of Braun and Clarke (2006). A bottom-up analysis was appropriate due to the lack of a theoretical framework to inform a priori categorisation. With a brief commentary, the steps were:
1. "Familiarising with the data": Assisted by the researchers conducting the transcription.
2. "Generating initial codes": Where applicable, we chose to code at the latent level rather than the semantic (literal) level, in order to examine underlying issues. The initial scope was the entire interview content.
3. "Searching for themes": In gathering themes, we found the investigative scope too broad, and restricted our thematic analysis to visual encoding mechanisms. This decision was made from a reflexive standpoint (Blandford et al., 2016), as it enables guidelines which will be pragmatic and relatively straightforward for diagram authors to implement.
4. "Reviewing themes": Carried out iteratively within the research team, with external input from a thematic analysis expert.
5. "Defining and naming themes": Including establishing a narrative of the research.
6. "Producing the report": This stage involved tweaking themes and reviewing previous codes, particularly on whether to classify (and in doing so quantify) parts of the qualitative feedback, and selecting aspects for publication in this venue.

Topics such as diagram content requirements and inter-participant agreement were also captured through the coding iterations. We chose to treat these as categories rather than themes, as the possible responses were relatively restricted ("do you like or not like this diagram", and ordering of tasks). The thematic analysis addresses the narrower research question of which presentation aspects are helpful or confusing.
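This section does not state which statistic underlies the "slight agreement" between participants reported in the introduction. As an assumption for illustration only, one common choice when many raters assign items to fixed categories, as in the card sort, is Fleiss' kappa, often read with the Landis and Koch (1977) bands, in which "slight" covers kappa values from 0 to 0.20. The sketch below computes both; the rating table is hypothetical.

```python
def fleiss_kappa(table):
    """Fleiss' kappa for table[i][j] = number of raters who put item i in category j.
    Every item must be rated by the same number of raters (at least two)."""
    n_items = len(table)
    n_raters = sum(table[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in table):
        raise ValueError("each item needs the same number of raters, at least two")
    n_categories = len(table[0])
    # Share of all assignments that fell into each category.
    p_j = [sum(row[j] for row in table) / (n_items * n_raters) for j in range(n_categories)]
    # Per-item agreement: fraction of rater pairs that agree on that item.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1)) for row in table]
    p_bar = sum(p_i) / n_items
    p_e = sum(p * p for p in p_j)  # agreement expected by chance
    if p_e == 1:
        raise ValueError("all ratings fall in one category; kappa is undefined")
    return (p_bar - p_e) / (1 - p_e)


def landis_koch_label(kappa):
    """Verbal bands from Landis and Koch (1977)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical card sort: 3 task cards, 4 participants; columns are
# (important, somewhere between, not important).
table = [
    [3, 1, 0],
    [2, 2, 0],
    [0, 2, 2],
]
k = fleiss_kappa(table)
print(round(k, 3), landis_koch_label(k))  # 0.022 slight
```

A kappa near zero means participants agree little more than chance would predict, even when individual cards attract a majority vote.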

4. Results

Our analysis examines the differences in opinions and usage of NN diagrams. Transcripts were uploaded into NVivo 12 qualitative data analysis software (QSR International Pty Ltd.). Our reporting of this study is centred on the research questions, and includes a thematic analysis based around user requirements for reading diagrams. The reporting does not reference individual diagrams in each instance, because (i) many quotes do not refer to a diagram, (ii) some diagrams have multiple instances of a phenomenon, of which the participant may be referring to only one, and (iii) the transcripts are available and easily text-searchable for the quote, with the example diagram letter added where it is not easily identifiable from the verbal transcription alone.

Participants reported constructing diagrams in order to give a "schematic overview" (P1), to provide an "anchor" (P12), and to "simplify understanding" (P3). Whilst these topics are conceptually intertwined, our analysis led to three categories of why people use diagrams: (i) Summary view, (ii) Perceived effectiveness, and (iii) Relation expression.

All participants expressed that their diagrams (or those they read) facilitate a holistic overview of the system. This also relates to the use of the diagram to screen a paper for relevance. "Because in the diagram you can express directionality and you can show consistent things over the whole approach. You can have a holistic view in a diagram, that is quite hard to do in text, there is a bunch of stuff you have to go through to understand. For a new reader, providing a diagram that gives you most of the picture." (P7) "I use a diagram when I want to summarise or represent a higher level view of some process, or the building blocks of a method that I'm trying to convey." (P8) "So what I prefer to do through the diagram is give the intuition rather than the specification." (P12)

The difference in opinion on whether a diagram should provide a schematic summary appears to be linked to the macroscopic readership behaviours described in Section 4.1.4, and is further complicated by the level of "engineering" specification that participants felt was beneficial to detail diagrammatically.

Participants reported that part of the value of a diagram lies in omitting information to make understanding easier for readers, which overlaps "summary" with "relational communication": "there are some things that a diagram can just explain really succinctly and clearly" (P11).

All participants expressed that diagrams were good, useful or effective for communicating systems, though this was often latent or vague about the reasons (e.g. "I like diagrams" (P1), "Diagrams are good" (P2)). The perceived effectiveness of diagrams led two participants to compare the utility of text and diagrams. P3 described text as being an effective default modality, with diagrams secondary: "Maybe this is the type of information that reading from a diagram is more effort than reading from a text. The main idea of creating a diagram is to simplify the understanding. If the concept is easy to describe in a few words, it is better to read a paragraph than to look at a picture and try to decode it." (P3). In contrast, P4 described a diagram as being an effective default modality, with text secondary: "Usually I prefer to use a graphical representation because it reduces the time to understand the idea that is being expressed. Text for me is only to explain something that is not easy to do with a diagram by itself." (P4). Both participants' comments were underpinned by the suitability of the representation, though otherwise they seem to be based on personal preference. Overall, the perceived relative effectiveness of text or diagrams seems to be contextual and personal, as found in many other aspects of the interview analysis.

The ability of diagrams to represent relations was also expressed: "people can easily lose track of what connects to what and why, while a diagram gives something like a grounding or an anchor" (P12). Section 4.4 includes commentary on examples of usage of the relational nature of diagrams, especially navigation, which can be considered as the ability to easily transition between objects using relations. Section 4.5 explores the effectiveness of diagrams in more detail, particularly within the theme "Visual ease of use" (Section 4.5.1).

Participants saw the (relational) complexity of their systems as better represented by a diagram than by linear text. Comments on this topic focused on explanatory and communicative value: "It is not easy to explain a complex network without a diagram." (P4) and "I am not sure they would grasp this concept of compositionality as well through text . . . what I try to do is show how this fits together." (P12).

For some participants there were multiple reasons for using the relational advantage of diagrams: "It encodes a couple of things quite well, it encodes data flow and also computation steps, so it is a nice way of doing both of those two things at the same time. [Pause] It is generally easier to walk through people, so if you are presenting some design it tends to be easier to walk people through what is going on by pointing at blocks and describing that particular block in the overall picture." (P10).

We now consider primarily readership usage, though readership is naturally intertwined with authorship. In an academic paper reading context, six participants reported using diagrams as an accompaniment to the text, while six reported a special role for diagrams.

Three of the 12 participants reported reading the diagram before the text of a scholarly publication: "I personally start with the diagram to get a general view" (P3), "Even before reading the abstract or conclusions. I go directly to the diagram, this is what I'm looking for" (P6) and "Reading the paper starts with the diagram, for me." (P7). This contrasts with the comment that "It is a process of moving to and from the text and the diagram to get a complete picture." (P1). It is interesting to note, particularly in terms of usability, that these three participants gave time pressure as the reason for using the diagram in this way, and were using the diagram as a cross-cutting schematic to understand the paper and screen it for relevance. This usage suggests that system diagrams may be fulfilling a role as a type of informal graphical abstract.

An initial overview was not the only special role afforded to diagrams. Participants also reported using the diagram as the primary view on the system: "if I look at a diagram and I really can't figure out what this is doing, I'll usually read the paragraph before, or I'll scan the text around it to see where the figure is referenced and read that bit. I don't have enough time to read whole papers." (P10); and as an index: "I go back every now and then to the diagram to see how it fits together." (P12)

Six participants reported not having a systematic method to create diagrams. Two quotes describe this unsystematic authorship process: "There is no standard for diagrams in our area, so I try to make something that makes sense, or seems to make sense, and additionally put some explanation and hope the reader will understand." (P3) and "I guess I've always got an idea about what a figure is trying to communicate that I feel is easier done in images than words but it just depends what that happens to be." (P2).

Three further participants based their own diagrams on "the people we cited; their diagrams" (P5).

Two participants had created their own standard method for authoring diagrams, and both drew on related work in creating it. One (P8) always used labelled layers; the other (P10) created a custom diagram notation: "I do enforce that when I'm doing my own notebook diagramming, I'll use a consistent set of conventions for myself. . . the abstraction just melts away, and you can understand what's going on rather than the shapes." (P10).

One participant (P9) was not authoring papers for conferences.

The interview focused on authoring and readership practices for scholarly publication. Some additional uses of diagrams arose during the interviews.

In addition to authoring papers, all non-academic researchers described using diagrams when giving presentations, a use case not commonly discussed by the academic participants (see Table 1 for participant backgrounds).

When asked about their creative process, participants mentioned using block diagrams in a broader cognitive context: to understand their own wider projects (P3), to solve coding problems (P1), and to interpret papers (P4). For some participants, diagrams were fundamental to their research process: "Interviewer: Do you draw diagrams then before you are building the system? P4: Before, during and after [Laugh]. The whole process."

Figure 3: Number of different digital diagram creation tools reported as being used by each participant.

Five participants mentioned having an iterative diagram creation process, often involving a colleague or supervisor, or using a whiteboard.

The software tools used to create diagrams were also very diverse. Between them, the 12 participants reported using 16 different digital tools to create NN system diagrams. Figure 3 shows the distribution of the number of tools used per participant. The main reasons for tool choice were ease of use and file export format. Inkscape (4 participants), Google Draw (3), Draw.io (2), and Tix (2) were the most commonly used tools; the other tools were each used by only one participant (astah, fast.ai, Google Slides, Graphviz, Illustrator, Lucidchart, OmniGraffle, OpenOffice Draw, Microsoft Paint, Microsoft PowerPoint, Microsoft Visio, and yEd). Six participants commented that a custom tool for creating NN system diagrams would be useful, either as a plug-in to an existing tool or stand-alone.

The card-sorting exercise gave insight into the tasks performed by participants reading NN system architecture diagrams. We had expected some tasks to emerge as prevalent; instead, we found heterogeneity. We started with 15 core tasks, as shown in Table 2. Three additional tasks were each added by one participant and are included in the table.

We asked participants to pick their top three tasks, and to highlight any number of tasks that they felt they did not do at all. The results are reported in Table 2. Five participants chose to group some tasks together into one task, and one participant chose only two top tasks, so these participants did not conform to the "three top tasks" instruction. The researchers were intentionally not rigid in enforcing the limit, as the intention was to understand usage rather than force participants into the task framework. P11 did not select any "do not do at all" tasks, stating "You wouldn't do all of them because then you'd be trying to put too much information in the diagram. Depending on what you're trying to get across, each of these could potentially be uses. I wouldn't rule any of them out". All 12 presented tasks were chosen by at least one participant as a "top importance" task. Nine out of 12 tasks were reported as "not done at all" by at least one participant. This highlights the variable use of diagrams by readers. The top-rated tasks were:
• "Understanding how the system works" (8/12)
• "Identifying the system novelty or contribution of the paper" (5/12)
• "Identifying layers, relations between components, or internal dependencies" (5/12)
That these modal tasks were selected by so few participants further underlines the differences in users' requirements of the diagrams, and complements the differences in opinion seen when discussing the Examples. To quantify this, we used Fleiss' Kappa for m Raters, executed in the software R (Subjects = 15, Raters = 12). We find κ = 0.… and z = 3.… with p-value = 0.000149. This indicates statistically significant, but only "slight", agreement on top tasks. The low number of added tasks, and the fact that each core task was chosen at least once, suggest the 15 core tasks presented have reasonably good coverage of the task space.
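For readers unfamiliar with the statistic, the sketch below computes Fleiss' kappa from a subjects × categories table of rating counts using the standard formula: mean observed pairwise agreement, corrected for the agreement expected by chance from the overall category proportions. It is a minimal Python illustration, not the R routine used in this study, and it does not compute the z statistic or p-value reported above; the function name `fleiss_kappa` and the toy data are hypothetical.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a subjects x categories table of rating counts.

    counts[i][j] is the number of raters who assigned subject i to category j.
    Every row must sum to the same number of raters n, with n >= 2.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of raters")
    n_categories = len(counts[0])

    # Observed agreement P_i per subject: proportion of agreeing rater pairs.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects

    # Expected (chance) agreement from the overall category proportions p_j.
    total = n_subjects * n_raters
    p_j = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")

    return (p_bar - p_e) / (1 - p_e)


# Hypothetical toy data: 3 subjects, 12 raters, 3 categories, total agreement.
print(fleiss_kappa([[12, 0, 0], [0, 12, 0], [0, 0, 12]]))  # prints 1.0
```

Values near 0 mean agreement is close to what chance alone would produce, which is why a significant p-value can coexist with only "slight" agreement.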

Task | Responses (P1 to P12)
Identifying corpora and data types | N Y N
Identifying representational choices (e.g. embeddings, graphs) | Y Y
Identifying the purpose of the system | N Y Y Y
Identifying specific architectural features | Y Y Y Y
Identifying opportunity to alter the architecture | Y N N N N Y Y N
Identifying what the author thinks is important to communicate | N N Y Y N
Comparing to other systems | N Y N
Initial check to see if they use a particular thing I am interested in | Y N Y Y Y Y
Index to navigate the paper | N Y N N N Y
Memory aid | N N N Y N N Y Y Y
Aid for writing a summary of the paper | Y N N
Understanding how the system works | Y Y Y Y Y Y Y Y
Extra: Identifying input and outputs | Y
Extra: Parameters to rebuild | Y
Extra: Gauge overall complexity | Y

Table 2

User tasks reported by each participant when reading diagrams. Y indicates a "top three most important task" and N indicates "do not do at all". Some participants did not select precisely three tasks, as explained in Section 4.3.
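Table 2 records two kinds of selection per participant. One plausible way to encode such responses for an agreement statistic such as Fleiss' kappa is to treat each task as a subject and count, per task, how many participants placed it in each of three categories: top three, not done at all, or neither. This encoding is an assumption, since the exact input used in the study is not described here; the function name, participant IDs and task names below are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set

# Column order of the count matrix: top-three choice, "not done at all", neither.
CATEGORIES = ("top", "not_done", "neither")


def card_sort_counts(
    responses: Dict[str, Dict[str, Set[str]]], tasks: Sequence[str]
) -> List[List[int]]:
    """Build a tasks x categories count matrix from card-sort selections.

    responses maps a participant ID to {"top": {...}, "not_done": {...}}.
    A task the participant did not mark is counted as "neither".
    """
    matrix = []
    for task in tasks:
        tally = Counter()
        for selection in responses.values():
            if task in selection.get("top", set()):
                tally["top"] += 1  # a "top" mark wins if both were ticked
            elif task in selection.get("not_done", set()):
                tally["not_done"] += 1
            else:
                tally["neither"] += 1
        matrix.append([tally[c] for c in CATEGORIES])
    return matrix


# Hypothetical responses from three participants over three tasks.
tasks = ["understand system", "memory aid", "compare systems"]
responses = {
    "Pa": {"top": {"understand system"}, "not_done": {"memory aid"}},
    "Pb": {"top": {"understand system", "memory aid"}, "not_done": set()},
    "Pc": {"top": set(), "not_done": {"compare systems"}},
}
print(card_sort_counts(responses, tasks))  # prints [[2, 0, 1], [1, 1, 1], [0, 1, 2]]
```

Because every participant contributes exactly one count per task, each row sums to the number of raters, which is the shape a Fleiss' kappa calculation expects. Participants who chose fewer than three top tasks, or grouped tasks, simply produce fewer "top" counts under this encoding.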

Codes were generated using the thematic analysis protocol described above. However, because we needed to report heterogeneity, the clustering aspect of thematic analysis was not appropriate for this section. To be clear for a thematic analysis purist, this subsection contains codes that are categorical labels, not themes. When describing requirements for the domain in Section 4.5, we use full thematic analysis.

Schematic overview versus implementation details

In terms of what the diagram should convey, nine participants expressed that the system diagram should provide an overview. One participant commented that diagrams should be context dependent (P11), and two participants required implementation details such as hyperparameters (P1, P4), which the other 10 participants deemed unnecessary. When prompted with example diagrams, participants occasionally changed their preferences. This highlights the benefit of graphic elicitation.

Use of concrete examples

Two participants used diagrams as a way to instantiate an example (P5, P12). Ten participants reported finding an example input helpful (all except P4 and P7). The instantiation described was key to the cognition of these two participants (P5, P12): "I tend to inductively understand something. That is, from an example, generalise to how it works. . . They usually don't put examples in text, the example is usually in the diagram" (P5).

Technical knowledge and familiarity

All participants commented that technical knowledge was required to understand the example diagrams. Comments ranged from "Maybe this is easy to understand because I know what resnet is." (P4, overall positive opinion) to "I've not used resnets so I don't know what resnet-34 means" (P11, overall negative opinion).

Familiarity appears to have had a substantial impact on opinion of the diagram, for all participants: "Since I come from a more similar field and I understand this diagram well, I like this diagram." (P7). Preference differences are quantified in Section 4.6, though with the small sample size and broad specialism groupings we did not identify any clusters, including any similar sentiment based on primary research domain.

Not meaningful
P1: "I doubt it, I wouldn't expect so."
P7: "And the number, it cannot be four dimensions, it must be much more than that. It is either misleading or just plain their own internal reasoning to put it like that."
P8: "No. I don't think they are significant."
P11: "I'm working on the assumption that there aren't four inputs there, and it's kind of an arbitrary number."
P12: "And the fact that they have four and five, I think probably the reason why they did that was to show that the dimensions don't have to match. But I don't think it is that important, to be honest."

Need to refer to text or unsure
P2: "Well, it's not clear whether that is just an abstract representation, I'd have to look in the text."
P4: "I don't know."
P5: "It might be significant or it might be arbitrary, we'd probably need to check in the paper."
P7: "Or maybe the hidden layer has some other transformation layer, then maybe that is why they put that. I'd go in and check it out in the paper."

Meaningful
P6: "Yes! I mean this is computer science not literature. If you have four, it means you have four."
P10: "So we've got the embedding layer which is a 5-vector and out inputs which are 4-vectors."

Table 3

Participants expressed a spectrum of opinions, of differing confidence, on whether a precise depiction is meaningful. P7 made two different comments, and P3 and P9 did not make a comment mapped to this theme.

Navigation

Diagram E, which features labelled modules containing further detail, drew conflicting views on the ease of navigation: "Yeah this is automatically easier, because you've got the input on the top left, and it's a bit clearer about where you're supposed to follow the model, as you've these clear arrows and guidelines." (P1) contrasts with "This lack of direction here. I like how they have put examples here, so at least I have a start and an end, but there is a lack of directionality in the diagram." (P7). These two quotes reflect the differences in perception and perceived meaning found elsewhere in this study.

Ineffective use of diagrams' relational properties

Commenting while reading example diagrams, 2/12 participants suggested text would be more appropriate than a diagram: "It's nice and linear, but has not revealed much more to me than the text caption did to be honest." (P10) and "The diagram isn't giving you over and above what you could get from text. So in the sense of conveying information clearly, because it is that simple and basic it isn't giving you anything a simple sentence wouldn't give you." (P11).

Precision meaningfulness

Participants expressed varying opinions as to whether a set of circles, of the type shown in Figure 1, represented a specific or an arbitrary vector dimension. This style of depicting a vector is fairly common in neural network diagrams. Table 3 demonstrates this issue through some of the conflicting quotes, which are based on Example A or E.

Hyperparameters

A final comment on specificity relates to hyperparameters, the settings that control the training process. These include, for example, the size of the vector used in each hidden layer, and are therefore related to the requirement for precision, but not to the visual encoding of precision: the dimension of a hidden layer is often quite large. In practice, few authors attempt to visually encode the hyperparameters of their systems, relying instead on numeric labels (as in Example B). Feedback on explicit numerical representation varied from "If you want to see that in an experimental section, they are not fixed. I would not expect to see that in an architecture diagram, certainly not." (P8) to "It would be good to have a labelling of the dimensionality of the different layers" (P1). The example of hyperparameters is indicative of the varied levels of granularity required by participants. Granularity requirements did not always diverge, however: details at a similar level, such as specific functions like "loss" or "pooling", were often requested by participants.
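P1's request for labelled layer dimensionality, and the practice of numeric labelling seen in Example B, can be illustrated with a small generator for Graphviz DOT (Graphviz being one of the tools participants reported using). The sketch below is a hypothetical helper, not anything used in the study: it emits DOT text for a simple chain of layers, writes each layer's size into its label rather than encoding it visually, and lays the graph out left to right. The function name, layer names and sizes are invented for illustration.

```python
from typing import List, Tuple


def layers_to_dot(layers: List[Tuple[str, int]], graph_name: str = "nn_system") -> str:
    """Return Graphviz DOT source for a left-to-right chain of layers.

    layers: (name, dimension) pairs in data-flow order.
    """
    lines = [f"digraph {graph_name} {{", "  rankdir=LR;", "  node [shape=box];"]
    for name, dim in layers:
        # State the dimensionality as text in the label, e.g. "hidden\n(d=512)".
        lines.append(f'  "{name}" [label="{name}\\n(d={dim})"];')
    # One edge per consecutive pair of layers, following the data flow.
    for (src, _), (dst, _) in zip(layers, layers[1:]):
        lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


print(layers_to_dot([("input", 4), ("embedding", 5), ("output", 2)]))
```

The printed text can be rendered with the Graphviz `dot` command. Stating the size as a label sidesteps the ambiguity participants reported around whether a drawn number of circles is meaningful, at the cost of the extra detail some participants did not want in an architecture diagram.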

Our thematic analysis focuses on visual encoding user requirements, and as such examines preferences, which are primarily explicit rather than latent. Despite the variety and conflict of opinions demonstrated previously, there is some underlying commonality. This subsection considers both the creation and consumption of diagrams. The top-level themes we identified were visual ease of use, appropriate content, and expectation matching, with multiple sub-themes. Codes in this subsection are in addition to the codes and categories outlined in the previous sections.

All participants mentioned aspects related to ease of use. The Navigation and Ineffective use of relational properties codes (Section 4.4.2) support this theme.

Clear navigation

Navigational issues focused on overall structure: "it is relatively easy to follow due to the structure of it." (P2). See Section 4.4.2 for more detail.

Aesthetics

Consistency within diagram

Consistency was seen as important by all participants. This was sometimes at a structural level (e.g. "It should be all at the same level of abstraction" (P6)) and sometimes at a graphical component level (e.g. colour inconsistency "annoys me a little bit because it is making me think that those things are different while probably they are not." (P12)). The importance of consistency can also be inferred from participants requesting diagram guidelines or standards (see Section 4.5.4). There is a slight dissonance between the reported importance of consistency and the lack of confusion due to precision meaninglessness (Section 3). This may be partially explained by participants being unable to validate their assumptions against the full paper. Two participants expressed frustration at having to turn their heads to read rotated labels: "I don't particularly like having to turn my head to read the labels." (P10), and "I don't like that some things are written horizontally, some things vertically, because I have to turn my head around to actually read them" (P12).

Process stages

Particularly when creating diagrams, 8/12 participants commented on using the diagram to understand the process steps: "I need to write something in the paper to not only imagine but to think well about the paper, about sequence, about its choice" (P4).

This theme builds on the codes of Section 4.4.

Wanting more information in the diagram

The "missing" information participants sought included symbols (P1), what to focus on (P2), specific details ("score is not clear, it would be good to have an explanation" (P3)), maths ("𝑥, 𝑥ₖ, 𝑥ₙ, this part is confusing" (P4)), inputs and outputs (P8), a caption, key or legend (P9), and the purpose of colour (P10).

Wanting less information in the diagram

This does not conflict directly with the previous sub-theme, as both are about presenting the right information: "I don't know what's important here because everything is on there" (P2), "It is not explicit what is core in the diagram" (P4), and "The whole point is it is meant to be concise and get the information over to you quickly but that it not concise at all." (P11).

Wanting multiple diagrams

Eight participants expressed wanting multiple diagrams in order to get different content from each, usually one schematic and one or more detailed diagrams for specific components: "I think we need another, more detailed graph to represent the architecture of the model. This is just an overview." (P6).

This theme is primarily latent. It encompasses sub-themes relating to social and contextual aspects (Section 4.1.4),including familiarity (Section 4.4), consistency, and more broadly "seeing what I expect to".

Consistency across diagrams

This includes comments on complying with conventions. Half the participants said it was difficult to understand a given author's diagrams: "We have to unify the language of diagrams because I take a long time understanding the visual language of each author." (P4). Ten participants commented on the current lack of a standard visual language. This is further supported by the request for guidelines (see Section 4.5.4).

Consistency within domain

This includes numerically representing hyperparameters in CNN diagrams, which is a common practice. Another example of consistency commented on was the use of domain-specific terminology, such as "resnet-34", "conv1", or "BERT".

Unexplained symbols

These often caused frustration (9 participants): "There are some links which aren't explained. There is quite a lot of notation, mathematical notation, which is in the diagram but not explained in the caption. I don't know what half, any, of these symbols stand for." (P1). For other participants this was less of an issue or was case-dependent: "I don't know the symbols like 𝐻_𝑔. But I assume they are described in the text." (P6).

Guidelines for creating diagrams were requested explicitly by five participants: "A set of guidance, something like that, could be super useful for researchers because most people don't really know what they are doing and don't know even basic things about use of shape and colour and fonts." (P11). The nature of this request ranged from design topics to standardised symbols. All participants made a comment that could be viewed as supporting the creation of guidelines for authors. One participant (P9) requested guidelines for reading diagrams.

This subsection refers to the subjective preference for each example diagram. As such, it may involve any contextual or non-contextual factors personal to the reader. This is not an attempt to discover "good" diagrams, particularly as the diagrams (by definition of being in a scholarly publication) describe different systems and different contributions. Again, the analysis suggests heterogeneity.

As part of the interview, we asked a binary "do you like or not like this diagram" question for each of the examples. Neutral sentiment was permitted, giving three possible ratings. Reliability of agreement for diagram preference was analysed using Fleiss' Kappa for m Raters, a standard measure of agreement for categorical ratings by multiple raters. This was executed using R software (Subjects = 6, Raters = 12), giving κ = 0.…, z = 2.…, with p-value = 0.02. This represents statistically significant, but only "slight", agreement between participants (Landis and Koch, 1977). Table 4 shows all reported overall opinions. Overall, Example Diagrams A and F were most "liked" (10 and 9 participants liked them, respectively), Example Diagram D polarised opinion (6 liked, 6 disliked, 0 neutral) and Example Diagram C was least liked overall (2 liked, 1 neutral, 9 disliked).

In an attempt to obtain a higher rater agreement, we also examined "expressed positive" or "expressed negative" individually, effectively removing neutral responses. This gives approx 0.1 kappa and p value <
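The verbal labels used in this section follow the benchmark bands of Landis and Koch (1977): below 0 poor, 0.00 to 0.20 slight, 0.21 to 0.40 fair, 0.41 to 0.60 moderate, 0.61 to 0.80 substantial, and 0.81 to 1.00 almost perfect. A minimal helper mapping a kappa value to these bands might look as follows; the function name is hypothetical, and assigning values that fall between the published two-decimal cut-offs (e.g. 0.205) to the higher band is our assumption.

```python
# Upper bound of each Landis and Koch (1977) band, in ascending order.
_BANDS = (
    (0.20, "slight"),
    (0.40, "fair"),
    (0.60, "moderate"),
    (0.80, "substantial"),
    (1.00, "almost perfect"),
)


def landis_koch_label(kappa: float) -> str:
    """Return the Landis and Koch descriptive label for a kappa value."""
    if kappa > 1.0:
        raise ValueError("kappa cannot exceed 1")
    if kappa < 0.0:
        return "poor"
    for upper, label in _BANDS:
        if kappa <= upper:
            return label
    raise ValueError("kappa must be a real number")  # reached only for NaN


print(landis_koch_label(0.1))  # prints "slight"
```

These bands are descriptive conventions rather than statistical tests: a kappa in the "slight" band can still be significantly different from zero, as with the results reported here.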

5. Discussion

We have shown that these diagrams are used by authors and readers in a wide range of ways, with a range of needs, perceptions, and preferences. In the interview study, we found:

Participant | Opinion of example A B C D E F
P1 | + - - -
P2 | + - - - + +
P3 | + + - +
P4 | + + - + + +
P5 | - - - + +
P6 | - + - - + +
P7 | + + + -
P8 | + + - + + +
P9 | + - - - - +
P10 | + + + + + +
P11 | + + - - - -
P12 | + - - + - +

Table 4

Overall opinion of example diagram: + = like, - = dislike, blank = neutral.

Topic | Observation
Heterogeneity (reading) | Perception, e.g. navigation (Sections 4.4, 4.5); precision meaning (Section 4.4); use cases (Sections 4.3, 4.2.2); preferences (Sections 4.4, 4.5, 4.6)
Heterogeneity (creation) | Method (Section 4.2); software used (Section 4.2.3)
Role of scholarly diagrams | Diagrams as a summary (Section 4.1.1); diagrams as cognitive entry point (Section 4.1.4); diagrams as extraction of example for understanding (Section 4.4)

Table 5

Summary of interview findings related to the usage of scholarly neural network system diagrams

• Participants reported a wide variety of tasks performed while reading system diagrams, and had a wide variety of preferences.
• Three important themes were identified: visual ease of use, appropriate content, and expectation matching.
• For some researchers, diagrams are not just an accompaniment to the text, but may be used before (and in preference to) the textual content. This suggests some usage of system diagrams as a schematic, or an informal graphical abstract.
• A large variety of different tools are used to author diagrams, even by the same individual researcher.
• Participants requested diagram creation guidelines.
• There is potential for confusion and communication error caused by the content and representation of diagrams. This includes topics such as navigation ambiguity, and whether a precise depiction carries meaning.

Table 5 summarises the key findings of this study, including the areas in which heterogeneity was manifest. A possible explanation is that the cognitive tools to support understanding in this domain are still not well developed, so each individual has created their own way of reasoning about the topic, which manifests in their diagrams. The fast pace of the field (as discussed in Section 1.1) may also be contributing to this.

This heterogeneity affects the creation, use and effectiveness of diagrams, so it is directly related to the RQs. We found that why people use diagrams, how they create them, and their presentation preferences are all extremely varied.

In this scholarly and technical setting, it was very useful to use ecologically derived examples to elicit comments from participants, as they encouraged discussion of topics that we did not foresee. This is unusual in previous literature, which has instead tended to use more constrained, researcher-created diagrams.

• We conducted a small-scale interview study, and participants were selected to have varied levels of experience and expertise (Table 1). This method is useful for providing rich perspectives from individuals, and allows for a wide range of topics. However, a limitation of this approach is that the findings cannot be generalised.
• Diagrams were considered outside of the paper context, in order to focus the discussion on the diagrams rather than the content of the paper. This risks diagrams being harder to interpret than they would be alongside their text; we mitigated this risk by using example diagrams that were relatively self-contained. This methodological choice, combined with the unnatural interview setting, means we are able to discuss "reported usage" rather than "actual usage".
• Participants took part in the study knowing it was about diagrams, and may be unrepresentatively positive about their use.
• Participant opinion on diagrams may have been distorted by their own perceived self-efficacy: "So, since I come from a more similar field and I understand this diagram well, I like this diagram." (P7). We did not assess the correctness of their statements or sentiments as part of this study, in order to help participants feel at ease. It is possible participants were wrong in their assumptions about the systems, and the accompanying text was not provided to help validate assumptions. This could have influenced their opinions on the example diagrams.
• For the card-sorting, we used a broad range of tasks, all of which were relevant and therefore useful to include. We constrained the number of tasks to a level we felt appropriate in order to make the exercise feasible for participants. In hindsight, due to the heterogeneous usage of diagrams, there would have been benefit from more precision, particularly in "understanding how the system works", for example regarding the level of granularity the user requires.

In the face of this heterogeneity in both usage and preference, we hope to lay a path for discussion and perhaps take a step towards conventions or evidence-based guidelines for improved effectiveness. Unlike Human-AI interaction design, where multiple sets of guidelines already exist and are being methodically refined (Amershi et al., 2019), we are not aware of any guidelines for AI system architecture diagrams. There is a long-established tradition of providing theoretically grounded guidelines (Koyani and Allison, 2003) for many areas of HCI which may have relevance to diagrams, including guidelines for GUIs (Mayhew, 1994), to support navigation (Vinson, 1999) and to optimise perceptual properties of layouts (Serrano et al., 2017). Further, there are HCI methods for comparing the utility of guidelines for user interfaces in theory (Jeffries et al., 1991) and evaluating their effectiveness in practice (Linehan et al., 2011; Miniukovich et al., 2019; Power et al., 2012). These methods and measures could be applied for the benefit of system diagrams. Empirically evaluated guidelines would have the potential to make a quantifiable reduction in ambiguity, miscommunication and errors, and allow scholarly diagrams to better serve their readers and authors. In the following sections we discuss existing HCI guidelines and their relevance (or otherwise) to AI system diagrams, proposing a set specifically to address this area in Table 6.

In abstract diagram design, fragments of advice can be found in the literature, on topics from cognition to perception. Perhaps the most concrete diagram guidelines are those derived by Larkin and Simon (1987) for maximising the interpretability of diagrams:
• Group together spatially information that is used together, in order to avoid searching during inference.
• Avoid symbolic labels.
• Make use of perceptual enhancement, for example working from left to right.

GC Marshall et al.:

Preprint submitted to Elsevier

Neural Network Diagrams Interview Study

Whilst perhaps useful for neural network diagrams, neither grouping nor symbolic labels featured prominently in the interviews, suggesting that these may not be the priority for useful guidelines in this domain. Concrete examples of perceptual enhancements applicable to diagrammatic representations can be found in the Gestalt laws (Wertheimer, 1923). Relevant Gestalt principles for AI diagrams include proximity, similarity, closure, direction, and habit (common association). They optimise for perceptual ease, such as easy discriminability of elements. Gestalt laws are worth considering; however, they address perceptual effectiveness rather than communication, and are optimised for visual speed rather than communicative efficacy. Gestalt principles feature heavily in UX guidelines (see Section 5.4.4).

There are further practical recommendations in other studies. These offer at least a partial view of "good diagrams", and include ensuring good labelling and highlighting relative importance (Moody, 2007), using non-linguistic symbols depending on audience experience (Petre, 1995) and minimising the number of symbolic elements (Nordbotten and Crosby, 1999). These general diagramming guidelines appear relevant to neural network diagrams. However, they were not designed for the complexity of neural network systems or for communicative scholarly tasks, and would benefit from empirical evaluation.

There are existing design guidelines which may support the improvement of diagrams, such as those relating to User Interface Design (Shneiderman and Plaisant, 2010). Four of Shneiderman’s "Eight Golden Rules of Interface Design" are applicable to diagrams:
• "Strive for consistency": Whilst Shneiderman focuses on internal consistency, it was contextual consistency that came out strongly in the interviews. As such, it is unclear how well this rule translates from interfaces to diagrams.
• "Design dialog to yield closure": Extending this concept suggests that the inputs and outputs are worth including, as they make the diagram more complete. It could also support the "expectation matching" theme.
• "Reduce short-term memory load": This supports the comments about schematics and simplicity.
• "Enable frequent users to use shortcuts": This supports abbreviations and the exploitation of existing conventions.
The remaining four rules are not relevant to (static) diagrams, as they focus on interaction: "Offer informative feedback", "Offer simple error handling", "Permit easy reversal of actions" and "Support internal locus of control".

Of Shneiderman’s (1996) seven tasks, "overview, zoom, filter, details-on-demand, relate, history and extract", two (overview and relate) were found to be important for participants. Shneiderman’s Mantra of "overview first, zoom and filter, then details on demand" does not appear to be useful for static system diagrams. This perhaps reflects that, whilst the high-level mantra may not fit diagrams, insight can still be gained from the granular empirical evidence from which it is derived.

Tufte’s (1990) influential work centred on data visualisation also applies to information visualisation more broadly, and spans a wide variety of two-dimensional representations. In his wide-ranging coverage of domains, from multivariate data to planetary relationships and from maps to music, Tufte comments that for technical engineering diagrams "What matters - inevitably, unrelentingly - is the proper relationship among information layers." (emphasis in original). Our interviews suggested the same.

Several of Tufte’s guidelines support the creation of schematic diagrams, such as "Maximise Data Ink; Minimise non-data ink" and the avoidance of "Chartjunk". The interviews also lend support to his guidance on the effective use of colour and on emphasising a horizontal direction, as issues with colour and direction were sometimes reported as problematic for readers of NN system diagrams. A number of Tufte’s recommendations may be less appropriate for complex system diagrams, such as (a) high density being desirable, (b) assuming everyone is an expert, and (c) giving readers all the data so they can exercise their processing power. This advice would appear to conflict with the aims of the "summary overview" use case indicated by participants (see Section 4.1.1).


Table 6
Proposed guidelines for neural network system architectures

A. Use conventional graphical objects where possible: These are aesthetically preferred, and less likely to cause confusion.
B. Only use one type of arrow for information flow: This is less likely to cause confusion. Reserve different types of arrow for fundamentally different uses.
C. Use precision with care: Using (for example) 4 of a thing will make some readers think there are 4 of the thing and others n of the thing.
D. Include the input and output of the whole system: This helps make the overall purpose of the system clear.
E. Consider using a single consistent example throughout: This helps some readers to understand by instantiating the example and then generalising.
F. Use visual encodings meaningfully: When using a visual encoding principle, such as grouping by proximity or alignment, there should be a reason for it.
G. Make navigation easy: Ensure it is easy to navigate a path through the diagram. Labels for layers, arrows, and linear alignment help to make navigation straightforward.
H. Do not use colour for aesthetics: If you use colour, it should indicate grouping, otherwise it can cause confusion.
I. Use available conventions: For example, if representing a CNN, it seems good to use the conventional 3D CNN format, and include all the filter widths numerically.
J. Consider what people might expect to see: For example, if representing a CNN, put pooling in as a step. If you don’t use pooling and that is important, consider noting that in a caption or label, as otherwise it may be assumed present.
K. Be specific: For example, "BERT" is better than "embedding". This aids interpretability by avoiding obvious gaps.
L. Consider that some readers may use the diagram without text: For these readers, a relatively self-contained diagram is particularly helpful.

User Experience (UX) draws heavily on both design and information visualisation, combining evidence and drawing new conclusions relevant to the UX domain. Hartson and Pyla (2012) recommend Tufte’s approach, "Don’t let affordances for new users be performance barriers to experienced users", whilst also advising designers to "Accommodate different levels of expertise/experience with preferences". Hartson and Pyla’s guidelines for UX cover a wide field. Many of these guidelines are related to the fields of Design and Information Visualisation, and appear relevant to system diagrams (as discussed in Sections 5.4.2 and 5.4.3).

An important UX consideration is accessibility: the ease of use for specific user groups, such as those with disabilities. Accessibility in diagrams does not appear to be prioritised, particularly with respect to colour-blindness, language fluency, the examples chosen, or textual or mathematical modalities.

It is not clear without further research whether Hartson and Pyla’s guideline to "Support human memory limits with recognition over recall" would also be advisable for scholarly system diagrams. In scholarly research, recall (the retrieval of related details) is required to situate a diagram relative to related work, whilst recognition (the ability to identify familiar information) is likely to aid efficient perception of the diagram. In scientific practice, scholarly diagrams are used in closer connection with their context than a comparatively stand-alone user interface, so the appropriate prioritisation of recognition over recall may also differ.

Few of the existing guidelines are evidence-based, and none has been empirically evaluated in a scholarly domain. Given the specific cognitive tasks performed in research and the specific representational requirements of neural network systems, guidelines devised for general usage may not be appropriate.

Table 6 proposes a set of guidelines designed for pragmatic improvement of neural network system diagrams, based on the findings of this study. The intention is for the guidelines to be adapted as the community’s requirements change and as evidence for the efficacy of each guideline is established.

6. Conclusion

Diagrams are an important and widely used way of communicating the architecture of neural network systems. Our interview study finds heterogeneity in the way they are constructed and understood, which provides freedom for the author, but leads to potential inaccuracies in their interpretation. Existing HCI guidelines have relevance for scholarly neural network system diagrams, but no set maps directly to the issues we uncovered in the study. To bridge this gap, we propose guidelines specifically addressing the main causes of confusion. We conclude with a participant comment that concisely summarises the findings of this study: "I think this lack of language for diagrams is so bad, even at a high level there is nothing the same at all." (P10).

CRediT authorship contribution statement

Guy Clarke Marshall:

Conceptualization, Methodology, Investigation, Formal analysis, Writing - Original draft preparation.

André Freitas:

Conceptualization, Writing - review & editing, Supervision.

Caroline Jay:

Conceptualization, Methodology, Writing - review & editing, Supervision.

References

ACM, 2019. CHI ’20: Post-program committee update. https://chi2020.acm.org/blog/chi-2020-post-program-committee-update/. Accessed: 2020-06-17.
ACM, 2020. CHI ’20: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems acceptance rates. https://dl.acm.org/doi/proceedings/10.1145/3313831 https://pdfs.semanticscholar.org/fc51/1dcebd3dae76133d5dbbda4250bebd0fb5e3.pdf.
Braun, V., Clarke, V., 2006. Using thematic analysis in psychology. Qualitative Research in Psychology 3, 77–101.
Buetti-Dinh, A., Galli, V., Bellenberg, S., Ilie, O., Herold, M., Christel, S., Boretska, M., Pivkin, I.V., Wilmes, P., Sand, W., et al., 2019. Deep neural networks outperform human expert’s capacity in characterizing bioleaching bacterial biofilm composition. Biotechnology Reports 22, e00321.
Carberry, S., Elzer, S., Demir, S., 2006. Information graphics: an untapped resource for digital libraries, in: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 581–588.
Clark, A., Chalmers, D., 1998. The Extended Mind. Analysis 58, 7–19. URL: https://academic.oup.com/analysis/article-lookup/doi/10.1093/analys/58.1.7.


pp. 689–801. doi: https://doi.org/10.1016/B978-0-12-385241-0.00022-1.
Heiser, J., Tversky, B., 2006. Arrows in comprehending and producing mechanical diagrams. Cognitive Science 30, 581–592.
Hullman, J., Bach, B., 2018. Picturing science: Design patterns in graphical abstracts, in: International Conference on Theory and Application of Diagrams, Springer. pp. 183–200.
Hutchins, E., 1995. How a Cockpit Remembers Its Speeds. Cognitive Science 19, 265–288. URL: http://doi.wiley.com/10.1207/s15516709cog1903_1.
Hutchins, E., 2005. Material anchors for conceptual blends. Pragmatics 37, 1555–1577.
Jeffries, R., Miller, J.R., Wharton, C., Uyeda, K., 1991. User interface evaluation in the real world: a comparison of four techniques, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 119–124.
Johnson-Laird, P.N., 1983. Mental models: Towards a cognitive science of language, inference, and consciousness. volume 6. Harvard University Press.
Joyce Chai, N.S., Tetreault, J., 2020a. ACL wiki: Conference acceptance rates. https://aclweb.org/aclwiki/Conference_acceptance_rates. Accessed: 2020-06-17.
Joyce Chai, N.S., Tetreault, J., 2020b. ACL2020: General conference statistics. https://acl2020.org/blog/general-conference-statistics/. Accessed: 2020-06-17.
Kallio, H., Pietilä, A.M., Johnson, M., Kangasniemi, M., 2016. Systematic methodological review: Developing a framework for a qualitative semi-structured interview guide. Journal of Advanced Nursing 72, 2954–2965.
Keim, D.A., 2002. Information visualization and visual data mining. IEEE Transactions on Visualization and Computer Graphics 8, 1–8.
Koyani, S., Allison, S., 2003. Use of research-based guidelines in the development of websites, in: CHI’03 Extended Abstracts on Human Factors in Computing Systems, pp. 696–697.
Koylu, C., Guo, D., 2017. Design and evaluation of line symbolizations for origin–destination flow maps. Information Visualization 16, 309–331.
Landis, J.R., Koch, G.G., 1977. The measurement of observer agreement for categorical data. Biometrics, 159–174.
Larkin, J.H., Simon, H., 1987. Why a Diagram is (Sometimes) Worth Ten Thousand Words. Technical Report. Carnegie-Mellon University. URL: https://pdfs.semanticscholar.org/b7bd/d9331ed1ecbc931ccaf50c091cd0bb8b71b7.pdf.
Linehan, C., Kirman, B., Lawson, S., Chan, G., 2011. Practical, appropriate, empirically-validated guidelines for designing educational games, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 1979–1988.
Ma’ayan, D., Ni, W., Ye, K., Kulkarni, C., Sunshine, J., 2020. How domain experts create conceptual diagrams and implications for tool design, in: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, pp. 1–14.
Maharjan, S., Montes, M., González, F.A., Solorio, T., 2018. A genre-aware attention model to improve the likability prediction of books, in: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 3381–3391.
Marshall, G., Freitas, A., Jay, C., 2020a. Neural network diagram interview transcripts. URL: https://figshare.com/articles/dataset/_/12765596/0.
Marshall, G.C., Jay, C., Freitas, A., 2020b. Diagrammatic signification of artificial intelligence systems. Unpublished results.
Martinovic, D., Freiman, V., Karadag, Z., 2013. Visual mathematics and cyberlearning in view of affordance and activity theories, in: Visual Mathematics and Cyberlearning. Springer, pp. 209–238.
Mayhew, D.J., 1994. The conceptual model in graphical user interface design, in: Conference Companion on Human Factors in Computing Systems, pp. 361–362.
Miniukovich, A., Scaltritti, M., Sulpizio, S., De Angeli, A., 2019. Guideline-based evaluation of web readability, in: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, pp. 1–12.
Moody, D., 2007. What Makes a Good Diagram? Improving the Cognitive Effectiveness of Diagrams in IS Development, in: Advances in Information Systems Development. Springer US, Boston, MA, pp. 481–492. URL: http://link.springer.com/10.1007/978-0-387-70802-7_40.
Moody, D., 2009. The "physics" of notations: toward a scientific basis for constructing visual notations in software engineering. IEEE Transactions on Software Engineering 35, 756–779.
Murray, R., 2009. Writing for academic journals. McGraw-Hill/Open University Press.
Nordbotten, J.C., Crosby, M.E., 1999. The effect of graphic style on data model interpretation. Information Systems Journal 9, 139–155. URL: http://doi.wiley.com/10.1046/j.1365-2575.1999.00052.x.
Novick, L.R., Hurley, S.M., Francis, M., 1999. Evidence for abstract, schematic knowledge of three spatial diagram representations. Memory & Cognition 27, 288–308.
Novikov, D.A., 2015. Cybernetics: from past to future. volume 47. Springer.
Peirce, C.S., Moore, E.C., 1998. Charles S. Peirce: the essential writings. Prometheus Books. URL: https://catalog.hathitrust.org/Record/004212310.
Petre, M., 1995. Why Looking Isn’t Always Seeing: Readership Skills and Graphical Programming. Communications of the ACM 38, 33–44.
Pontis, S., Blandford, A., Greifeneder, E., Attalla, H., Neal, D., 2017. Keeping up to date: An academic researcher’s information journey. Journal of the Association for Information Science and Technology 68, 22–35.
Power, C., Freire, A., Petrie, H., Swallow, D., 2012. Guidelines are only half of the story: accessibility problems encountered by blind users on the web, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 433–442.
Purchase, H.C., Colpoys, L., Carrington, D., McGill, M., 2003. UML class diagrams: an empirical study of comprehension, in: Software Visualization. Springer, pp. 149–178.
QSR International Pty Ltd. NVivo qualitative data analysis software. March 2020 release.
Rowley-Jolivet, E., 2000. Image as text. Aspects of the shared visual language of scientific conference participants. ASp. la revue du GERAS,


http://alt.qcri.org/semeval2020/index.php?id=tasks. Accessed: 2020-07-25.
Serrano, M., Roudaut, A., Irani, P., 2017. Visual composition of graphical elements on non-rectangular displays, in: Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, pp. 4405–4416.
Shimojima, A., 2015. Semantic properties of diagrams and their cognitive potentials. Center for the Study of Language and Information.
Shneiderman, B., 1996. The eyes have it: A task by data type taxonomy for information visualizations, in: Proceedings 1996 IEEE Symposium on Visual Languages, IEEE. pp. 336–343.
Shneiderman, B., Plaisant, C., 2010. Designing the user interface: strategies for effective human-computer interaction. Pearson Education India.
Soh, Z., Sharafi, Z., Van den Plas, B., Porras, G.C., Guéhéneuc, Y.G., Antoniol, G., 2012. Professional status and expertise for UML class diagram comprehension: An empirical study, in: 2012 20th IEEE International Conference on Program Comprehension (ICPC), IEEE. pp. 163–172.
Stenning, K., Oberlander, J., 1995. A cognitive theory of graphical and linguistic reasoning: Logic and implementation. Cognitive Science 19, 97–140.
Stokes, S., 2002. Visual literacy in teaching and learning: A literature perspective. Electronic Journal for the Integration of Technology in Education 1, 10–19.
Suwa, M., Tversky, B., 1997. What do architects and students perceive in their design sketches? A protocol analysis. Design Studies 18, 385–403.
Swales, J.M., Feak, C.B., et al., 2004. Academic writing for graduate students: Essential tasks and skills. volume 1. University of Michigan Press, Ann Arbor.
Tenopir, C., Sandusky, R.J., Casado, M.M., 2007. Uses of figures and tables from scholarly journal articles in teaching and research, in: Proceedings of the 70th Annual Meeting of the American Society for Information Science & Technology (ASIS&T), pp. 18–25.
Tippett, C.D., 2016. What recent research on diagrams suggests about learning with rather than learning from visual representations in science. International Journal of Science Education 38, 725–746.
Tufte, E.R., Goeler, N.H., Benson, R., 1990. Envisioning information. volume 126. Graphics Press, Cheshire, CT.
Tylén, K., Fusaroli, R., Bjørndahl, J.S., Rączaszek-Leonardi, J., Østergaard, S., Stjernfelt, F., 2014. Diagrammatic reasoning: Abstraction, interaction, and insight. Pragmatics & Cognition 22, 264–283. URL: https://pdfs.semanticscholar.org/fce0/6d97b7cd7bdac420daa189d363f5844ebb38.pdf.
Umoquit, M., Tso, P., Varga-Atkins, T., O’Brien, M., Wheeldon, J., 2013. Diagrammatic elicitation: Defining the use of diagrams in data collection. The Qualitative Report 18, 1–12.
Van Der Linden, D., Hadar, I., 2018. A systematic literature review of applications of the physics of notation. IEEE Transactions on Software Engineering.
Van Wijk, J.J., 2005. The value of visualization, in: VIS 05. IEEE Visualization, 2005., IEEE. pp. 79–86.
Vinson, N.G., 1999. Design guidelines for landmarks to support navigation in virtual environments, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 278–285.
Wason, P.C., Johnson-Laird, P.N., 1972. Psychology of reasoning: Structure and content. volume 86. Harvard University Press.
Wertheimer, M., 1923. Untersuchungen zur Lehre von der Gestalt. II. Psychologische Forschung 4, 301–350. URL: http://link.springer.com/10.1007/BF00410640.
Wood, J.R., Wood, L.E., 2008. Card sorting: current practices and beyond. Journal of Usability Studies 4, 1–6.
Zhang, J., Norman, D.A., 1994. Representations in distributed cognitive tasks. Cognitive Science 18, 87–122.
Zobel, J., 2004. Writing for computer science. volume 8. Springer.
