Externalizing Transformations of Historical Documents: Opportunities for Provenance-Driven Visualization
Tomas Vancisin * University of St Andrews
Mary Orr † University of St Andrews
Uta Hinrichs ‡ University of St Andrews
ABSTRACT
Transcription, annotation, digitization and/or visualization are common transformations that historical documents such as national records, birth/death registers, university records, letters or books undergo. Reasons for those transformations span from the (physical) protection of the original materials to disclosure of “hidden” information or patterns within the documents. Even though such transformations bring new insights and perspectives on the documents, they also modify the documents’ content, structure, and/or artifactual form and thus, occlude prior knowledge and interpretation. When it comes to visualization as a means to transform historical documents from written to abstract visual form, there is typically little acknowledgment or even understanding of the previous transformation steps these documents have gone through. The “tremendous rhetorical force” [3] of visualization, we argue, should not be at the expense of the multiple pasts, contexts, and curators that are inherent in historical record collections. Rather, the urgent question for the fields of visualization and the (digital) humanities is how to better support awareness of these multiple layers of interpretation and the people behind them when representing historical documents. We begin to address this question based on a collection of historical university records by (a) investigating common transformation processes of historical documents, and (b) discussing opportunities and challenges for making such transformations transparent through what we call “provenance-driven visualization”; the idea for a visualization that makes visible the layers of transformation (including interpretation, re-structuring, and curation) inherent in historical documents.
Keywords: Visualization, Historical Records, Digital Humanities, Interpretation, Provenance Visualization
INTRODUCTION
The visualization of historical documents and cultural collections has attracted extensive research within the fields of visualization and the (digital) humanities (see Windhager et al. [30] and Jänicke et al. [16] for overviews). Whether the research focuses on building tools that provide new ways for exploration of historical documents [8, 29], or on the development of novel visualization techniques tailored for such documents [13, 22], the variety and the amount of visualization-focused research show its far-reaching potential in the area. However, efforts to disclose processes related to data acquisition and document transformation have, so far, focused mostly on the technical steps crucial for visualization [7, 11, 15]. Details on how records were transformed from original into digital/visualization form, and the curatorial decisions that were involved in these transformation steps are typically omitted. Similarly, the resulting interfaces and visualizations often do not allude to the labor and interpretative work involved in these processes.
Research in visualization and digital humanities has already started to critically discuss the issue of (often) hidden data choices and transformation processes [5, 14, 27], and their potential societal and political impact [2–4]. The question that stands is how to transform theory into practice. How can we characterize these transformation processes and their impact on interpretation of and engagement with historical collections? How can we use visualization to make these processes visible or, at least, more transparent in order to facilitate contextual interpretations and re-engagement with the past knowledge (processes)? Focusing on the case study of the St Andrews Historical University records, an exemplary collection of student records that date back to the 15th century [18, 19, 25], we begin to address these questions.
The collection is particularly interesting because parts of it have undergone a variety of documented transformations across three centuries (see Fig. 1), from the original handwritten matriculation rolls to interactive visualizations of the records [27]. Through in-depth interviews conducted with experts from the University of St Andrews who have worked with these records at different stages, we capture and characterize these processes and their impact not only on the collection’s structure, content, artifactual and representational form, but, ultimately, on the way researchers and the general public can engage with it.
Based on this analysis and in the context of previous work in the digital humanities and visualization [5, 14], we present and define provenance-driven visualization as visualizations which focus on disclosing the transformation processes that historical and cultural collections have gone through, e.g., prior to or as part of digitization and visualization processes. Provenance-driven visualization can be considered as a visualization approach and/or visualization-based research method to make such processes explorable. We illustrate the idea of provenance-driven visualization based on a visualization prototype that shows interpretation and transformation work that has been done to the St Andrews Historical University records across three centuries. Part of this visualization work has been discussed in a DH2020 750-word abstract [28]. Here, we expand on this prior design work by addressing its methodological and theoretical implications.
We see provenance-driven visualization as a novel method/perspective to visualization in DH and, ultimately, as an integral part of what we call digital research ethics: methodologies and research approaches to data analysis and visualization that focus not only on the content of historical documents and cultural collections, but also on the inherent interpretation and curatorial processes these documents and subsequent data representations embody.
ST ANDREWS’ HISTORICAL UNIVERSITY RECORDS
The University of St Andrews has been keeping records of its students and staff members since its foundation in 1413. The records provide rich insights into the University’s history as well as the societal and political structures at the time. Our case study focuses on the records created between 1747 and 1897 which have undergone a variety of transformations as part of several projects that aimed to preserve and conserve this collection (see Figure 1 for an overview).
Figure 1: Transformations of the St Andrews’ Historical University Records (1747–1897).
Originally, each student wrote down their name, and toward the end of this period, also church affiliation and birth place into the Matriculation/Graduation Roll (see Fig. 1.1). From 1888 to 1905 the then Keeper of Manuscripts and Muniments, James Maitland-Anderson, transcribed these records which resulted in a printed book (see Fig. 1.2). Anderson’s work was re-visited between 1960 and 2004 by another Keeper of Manuscripts and Muniments at St Andrews, Dr. Robert Smart, who also modified the records’ content, drawing from a large variety of additional sources. He transformed the collection into what is now known as the Biographical Register of the University of St Andrews (BRUSA) [24], a physically bound alphabetical index of student and staff names that includes information about their demographics, courses taken in St Andrews, parentage, and subsequent careers (see Fig. 1.3). Almost 10 years after the publication of BRUSA, the University Library’s Digital Humanities and Research Computing team led by Dr. Alice Crawford transformed the register into searchable digital form using the Text Encoding Initiative (TEI) (see Fig. 1.4). This resulted in an online interface (https://arts.st-andrews.ac.uk/biographical-register/) which enables the textual search of BRUSA. In 2018, we transformed the 11,894 TEI files, each representing one person, into a relational database and created a number of interactive visualization sketches in Tableau Desktop that enable the exploration of the records’ content from different perspectives [27] (see Fig. 1.5).
Our early visualizations take a decidedly quantitative approach to this collection and they have revealed interesting trends and patterns. However, we also found this approach to be potentially misleading as these graphs, charts and maps do not disclose the curatorial and interpretative efforts that have shaped the underlying data. The incompleteness of records, for example, or uncertainty of names and geographic locations become invisible behind definitive graphs and charts [27]. This led us to research the history of this collection and the transformation processes it has gone through by conducting interviews with experts who have worked on its previous iterations.
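To make the 2018 step more concrete, the following is a minimal sketch of how one per-person XML record might be segmented into relational tables. It is not the authors' pipeline: the sample record, the table and column names, and the `load_record` helper are all hypothetical, and the markup is a simplified stand-in for real TEI (which uses the TEI namespace and a richer schema). The section names mirror the record sections the paper lists later (name, education, birth, floruit, death).

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, simplified record. Real TEI files use the TEI namespace and
# a richer schema; the person and all values here are invented.
SAMPLE_RECORD = """
<person id="p0001">
  <name>Doe, John</name>
  <education>Arts 1850-54</education>
  <birth>1834, Cupar</birth>
  <floruit>Minister</floruit>
  <death>1901, Edinburgh</death>
</person>
"""

# Record sections that are split out into the detail table (assumed names).
SECTIONS = ("education", "birth", "floruit", "death")


def make_db():
    """Create an in-memory database with two related tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE person (id TEXT PRIMARY KEY, name TEXT, source_file TEXT)"
    )
    conn.execute(
        "CREATE TABLE detail ("
        "person_id TEXT REFERENCES person(id), section TEXT, content TEXT)"
    )
    return conn


def load_record(conn, xml_text, source_file):
    """Segment one per-person record into rows of the two tables."""
    root = ET.fromstring(xml_text)
    person_id = root.get("id")
    name = root.findtext("name", default="").strip()
    # Keeping the source file name preserves a pointer back to the TEI stage.
    conn.execute(
        "INSERT INTO person (id, name, source_file) VALUES (?, ?, ?)",
        (person_id, name, source_file),
    )
    for section in SECTIONS:
        text = root.findtext(section)
        if text is not None:
            conn.execute(
                "INSERT INTO detail (person_id, section, content) VALUES (?, ?, ?)",
                (person_id, section, text.strip()),
            )


conn = make_db()
load_record(conn, SAMPLE_RECORD, "p0001.xml")
rows = conn.execute(
    "SELECT section, content FROM detail WHERE person_id = ? ORDER BY section",
    ("p0001",),
).fetchall()
```

Even in this toy form, the record stops being a single entity once it is split across `person` and `detail`. The `source_file` column is one simple way to keep a trace of the stage the data came from.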
INTERVIEWS WITH EXPERTS
We conducted seven in-depth interviews with the expert archivists, librarians, historians, and software engineers who have worked with the St Andrews University records at different stages. All participants agreed to be credited with their name for this research. Interviewees included Dr. Robert Smart, the historian, archivist, and paleographer responsible for compiling the records into BRUSA [24] (see Fig. 1.3). From the team that worked on the transformation of BRUSA into a web-based search platform (see Fig. 1.4) we interviewed Dr. Alice Crawford (project lead), Siri Hjelsvold, a medieval historian responsible for manually introducing XML:TEI tags to the records, and Patrick McCann and Swithun Crowe, software engineers who helped conceptualize the tagging framework and implemented the search platform. The interviewees were asked to describe their processes when working with the University records, what motivated these processes, what key decisions and challenges informed their methods, and how they thought the transformations that they applied to the records impacted subsequent interpretations and engagement with this historical collection. While an interview with Maitland-Anderson (1852–1927) about his transcription processes of the original records (see Fig. 1 [18]) was not possible, we interviewed Rachel Hart, senior archivist and the current Keeper of Manuscripts and Muniments at the University who provided rich details about the context in which the original records were created and who also described Maitland-Anderson’s working practices. Hart’s work on similar document collections also provided invaluable information about the opportunities as well as issues introduced by the digitization of historical records. Finally, we interviewed Sean Rippington, the digital archives officer responsible for curating and implementing the current digital preservation system at the University.
While not directly involved with the historical University records, he provided insights into common digitization processes.
All interviews were transcribed and analyzed for common themes (motivation, process, modifications, challenges, effect, insights, advice, future for the records) using an open-coding and thematic analysis approach [1, 9]. The qualitative coding focused on characterizing transformation processes and related challenges as well as implications for interpretation. Below, we describe the transformation processes we have identified as part of our interview analysis.
TRANSFORMATIONS
Our qualitative analysis revealed four key categories of transformation processes that the University records have gone through: Manual Transcription, Content Modification, Organizational & Structural Modifications, and Artifactual & Representational Form.
The first transformation the records (handwritten, originally in Latin) underwent involved their manual transcription. This process is defined as “the effort to report – insofar as typography allows – precisely what the textual inscription of a manuscript consists of.” [20, p.201]. The records were transcribed by Maitland-Anderson and later by Smart, who verified and in some cases re-transcribed Maitland-Anderson’s work. Our interviews with Hart and Smart highlight the effort and level of interpretation inherent in this process which involves extensive experience in paleography: “[...] it takes time, it takes experience, and you have to learn how to read the old hands.” [Hart]. A paleographer also often needs to transcribe Latin texts: “Latin has a lot of abbreviations within it, so, immediately, you need to have somebody who can understand Latin and expand abbreviations correctly.” [Hart]. Hart also stresses that a transcription of historical records can never be considered a reproduction of the originals; interpretation is necessary: “You’re dependent on the ability to read the language but also to read the hands in order to be able to interpret. And this is why it’s never a 100% certain that the person who’s transcribing has got it absolutely right.” [Hart]. Smart himself acknowledges this in relation to transcription: “The further back in time you get, the more difficult it becomes, so that with the present one [student records from 1413 to 1579], I am not even sure if I got the names right.” [Smart]. Interpretation is necessary in the transcription process and it will introduce uncertainties, but without the meticulous work of Anderson and Smart, the knowledge within the original Roll would only be available to paleographers; and by transcribing the Roll, they have protected the physicality of the original materials and its onward curation.
Maitland-Anderson aimed to preserve the content included in the original Matriculation/Graduation Roll. Smart, however, deliberately excluded some student information such as their age [24]. At the same time, he vastly expanded the demographic information about students and staff by researching the University archives (e.g., library records, class lists, or medical degree testimonials) as well as national and church records, academic publications, newspapers, individual/family/national biographies, and history books for additional information. He even corresponded with living relatives and traveled to graveyards to find information on monumental inscriptions. As part of his archival work, Smart had to interpret information from multiple record collections in order to extract usable and consistent snippets to include in the existing student and staff records. His curatorial expansion of the historical records is remarkable and provides a much richer picture of University students and staff than the original records. However, Smart himself also emphasizes the limitations of his work in terms of completeness: “I simply used the sources that were available at the time. But since it [BRUSA] was published, of course, a lot of new resources have become available. The Internet has become available. I didn’t have any of that.” [Smart]. Crawford’s project further expanded the historical records by adding URLs to student and staff publications where available. While all these expansions of the records were done manually through extensive research, we expanded the records computationally using Google’s geocoding API to link locations of birth and death with exact geographic coordinates, a requirement for the creation of geospatial visualizations that introduce uncertainties due to ambiguities in historical place names [27]. These expansions of the original records have contributed to the records’ overall value and research potential, but also, again, introduced additional interpretation layers as well as uncertainty.
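One way to keep the uncertainty of computational geocoding from disappearing is to store it next to the coordinates. The sketch below is an illustration only: it does not call Google's geocoding API, the `GAZETTEER` dictionary is a hypothetical stand-in for a service response, the coordinates are rounded approximations, and the `GeocodedPlace` fields are assumptions rather than anything described in the paper.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical candidate lists standing in for a geocoding service response.
# Coordinates are rounded, approximate values for illustration only.
GAZETTEER = {
    "Perth": [
        ("Perth, Scotland", 56.40, -3.43),
        ("Perth, Australia", -31.95, 115.86),
    ],
    "St Andrews": [("St Andrews, Scotland", 56.34, -2.80)],
}


@dataclass
class GeocodedPlace:
    original_text: str           # place name exactly as it appears in the record
    chosen_label: Optional[str]  # candidate that was picked, if any
    lat: Optional[float]
    lon: Optional[float]
    n_candidates: int            # how many places matched the name
    ambiguous: bool              # True when more than one candidate matched
    method: str                  # how the choice was made


def geocode_with_provenance(place_text: str) -> GeocodedPlace:
    """Resolve a place name while recording how certain the match is."""
    candidates = GAZETTEER.get(place_text, [])
    if not candidates:
        return GeocodedPlace(place_text, None, None, None, 0, False, "unresolved")
    # Naive policy for the sketch: take the first candidate, but keep a flag
    # so that a later visualization can mark the point as uncertain.
    label, lat, lon = candidates[0]
    return GeocodedPlace(
        place_text, label, lat, lon,
        len(candidates), len(candidates) > 1, "first-candidate",
    )


perth = geocode_with_provenance("Perth")
unknown = geocode_with_provenance("Kirktown of Somewhere")
```

A map layer built on such rows could, for instance, draw ambiguous or unresolved places differently instead of presenting every point as equally definitive.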
Another category of transformation processes includes organizational and structural changes which can have a strong influence on how people engage with and make sense of historical and cultural collections. Maitland-Anderson decidedly aimed to avoid modifications of the original records as much as possible: “The reader of the printed Roll is, thus, as nearly as may be, in the same position as the consulter of the manuscript Roll.” [18, p.62]. However, his transcribed version of the records moves away from the original records’ tabular representation by excluding the explicit labeling of individual parts of the records (see Fig. 1.1 & 1.2). He also removed the numbering of individual records. Nevertheless, the order of listed records still mirrors the order in which students signed the Matriculation Roll. A more substantial structural modification is introduced by Smart who moved away from this originally temporal structure of the records and organized them alphabetically. This enables easy look-up of individual names, but the inherent chronological order of the records is lost. Smart also introduced an implicit internal structure to the additional information he gathered for each record. All records contain consistent sections (name, education, birth, floruit, and death), although these are only visible in each record’s internal structure; no explicit labels are provided.
This internal structure was kept and further emphasized in Crawford’s project where consistent tags (<name>, <education>, <birth>, <floruit> and <death>) were applied to each record. Tagging makes the implied internal structure of each record explicit and allows the identification of individual record parts across the collection. The process requires an interpretative effort, as Patrick McCann explains: “[With TEI] you got a very rigid structure. [Rigid] in terms of the kinds of elements you can have and the kinds of information those things can describe. So, there is necessarily a change to the data in that process.” When it comes to the external structure of the records, Crawford’s team divided the register into 11,894 individual XML:TEI files without any order.
The order and organization of records purely depends on search queries put forward by the user. For example, text-searching for a particular student name will bring up all records that contain this name. Our process of transforming the TEI-tagged records into a relational database further emphasizes the rigid structure introduced by Crawford and colleagues (based on Smart’s prior work): information included in each record is segmented into tables. This enables more flexibility when it comes to searching and visually representing the records, but can be considered as a strong interpretation step that permits certain perspectives on the records and hinders others.
Figure 1 clearly shows that as part of curatorial and interpretation processes, the records have fundamentally changed both in their artifactual and representational form. Hart describes the original Matriculation/Graduation Roll as a “[...] lovely big book. Physically large, it’s labeled Matriculation and Graduation Roll 1739 - 1888. [...] This is clever, because it has one end – matriculations – and on another end, backwards, it has graduations. They’ve used the same volume for two purposes, and they’ve simply turned it over in the middle.” Maitland-Anderson describes the Roll as “an autograph album of a most interesting kind.” [18, p.62]. The “mechanical print” artifacts produced by Maitland-Anderson [18] and Smart [24] broaden this collection’s audience to non-expert readers, but rule out further paleographical inquiry, and do not support the same kind of intriguing reading affordance or human “touch” implied by the hand-crafted book and handwritten text.
The transformation of the records from physical to digital form is an even more significant change. Physical affordance is replaced by on-screen interactions; nothing remains of the aesthetics and materiality of the original records, a potential problem discussed in previous research [26].
The University records are a good example of this: their original artifactual form illustrates, for example, the diversity of individuals attending the University at the time, as visible in their unique signatures [27], and the ways in which knowledge was passed on at the time through record keeping. Although searchable by name and keyword, the digital versions of the records occlude all prior efforts of transcription and interpretation in a way that also changes how the information is represented artifactually and epistemologically (e.g., alphabetical re-ordering).
Apart from the visualization, all modification processes applied to the University records are text-based. Our transition from text to abstract visual representations is perhaps the most (visually) notable transformation the records have gone through. For example, birth locations are given geographical context through a map view, and people from the same country of birth are aggregated in a bar chart view to provide an overview of students’ demography. This visual transformation of records enables “bird’s-eye” perspectives on the records and reveals high-level trends and patterns. However, individual records and the people behind them get lost along with all the previous transformations. The viewer is left with a mere tip of an information/context “iceberg” which, even though rhetorically powerful, can only portray certain perspectives on the records.
Figure 2: Prototype to illustrate what a provenance-driven visualization could look like.
It is crucial to emphasize that while these categories apply to our collection of historical University records, other types of historical and cultural collections may reveal additional categories of transformations and also additional nuances to the ones introduced here. However, we do believe that the list we present here, in some form, likely applies to other historical and cultural collections.
What our case study shows is that there is not only a variety of transformation processes that take place when digitizing historical collections, but also that these processes have a strong influence on (1) the content of records, (2) how these records, individually and as a collection, can be represented, and (3) how people may engage with and interpret them on a physical and cognitive level. Representing only the final stage of data collected from such a collection, as is most often the case, deprives the viewer of important contextual information and, therefore, can skew interpretation. Again, our own initial visualizations (see Fig. 1.5) may lead viewers to assume completeness of the student and staff records from 1747–1897 when there are not only gaps, but also layers of interpretation and curatorial decisions. This is also problematic from an ethical perspective because the people involved in these transformations often remain unacknowledged [2, 3]. This led us to the question of if and how visualization could be leveraged to not only focus on historical and cultural collections’ content, but also on the transformation processes that such collections have gone through and the people behind these processes.
PROVENANCE-DRIVEN VISUALIZATION
The idea of provenance-driven visualization is in line with recent discussions in the field of visualization and the (digital) humanities toward critical approaches to data- and visualization-driven research processes. Diakopoulos & Hullman emphasize the importance of data provenance in narrative visualization as a means for “transparency and trustworthiness of the presentation source to the end-users” [14, p.2234]. A similar approach is pointed at by Doerk et al. who also highlight the importance of trustworthiness in the field of visualization that can be achieved through disclosure of decisions made with the data [5]. Drucker’s critical reflection on the role of visualization in humanities research stresses the constructed nature of data and calls for visualization approaches that portray data as “interpreted knowledge, situated and partial, rather than complete” [6]. Correll highlights the importance of considering the ethical implications of visualization from the perspective of giving proper credit to the people and labor involved in all related processes [2].
On a practical level, in order to provide context for a visualization and its underlying data, Wrisley has introduced the concept of pre-visualization [31]. This idea aims to introduce textual prefaces that are common in books (see Maitland-Anderson [18] or Smart [24] as examples), or in web-based projects’ ‘About’ sections (see Crawford, https://arts.st-andrews.ac.uk/biographical-register/about-the-project/). Another approach is introduced by Peoux & Houllier who propose abstract process diagrams to reveal transformation processes in the context of information management [23]. In contrast, provenance-driven visualization takes a data/visualization-driven approach where transformation processes inherent in a historical or cultural collection are not only identified, but also characterized in the form of data and then visually represented in the form of static or interactive visualizations so they can be directly explored by viewers, perhaps even alongside more content-based visualizations. As such, provenance-driven visualization could be considered as a visual and interactive type of pre-visualization. Provenance-driven visualization provides a visual trace of the multiple contexts, formats, and curatorial decisions embodied in historical data, recognizing the importance of the onward value and interpretation of such decisions. Below we illustrate the idea of provenance-driven visualization, based on our case study of the St Andrews’ Historical University Records.
PROVENANCE-DRIVEN VISUALIZATION PROTOTYPE
Figure 2 shows an example of a provenance-driven visualization that we designed to make visible many of the transformation processes inherent in the St Andrews’ historical University records. The visualization highlights the stages of transformation in the form of five layers (see Fig. 2.1–5). Each layer represents the characteristics of the historical records as a result of the transformation processes they have gone through. The visualization combines overviews of all records to highlight changes in their content and structure (see Fig. 2.1–5) with details on individual records enabling a close-reading perspective (see Fig. 2.6).
Layer 1: Original Records. The amount and structure of original records is represented in the form of a temporal bar chart at the bottom layer of the visualization (see Fig. 2.1). Student records are aggregated and organized according to their temporal distribution, hinting at the chronological order in which student signatures were initially collected in the Matriculation Roll. Using a sketch-based stroke for bars we emphasize the unique characteristics of the original, handwritten records’ Artifactual & Representational Form. Hovering over a bar reveals the number of students in that year and provides individual names of corresponding students to the right in the “Record View” (see Fig. 3.1). Variation in fonts emphasizes the ‘different hands’ that signed the Matriculation Roll.
Layer 2: First Transcription. Maitland-Anderson’s transcription work on the records left the temporal distribution of student records unchanged, so we represent student records, again, in the form of a temporal bar chart in this layer (see Fig. 2.2). However, to emphasize the results of Maitland-Anderson’s transcription process that transformed the hand-written records into print form we used a smooth stroke for the bar chart, hinting at the unifying effect of this process. As in the previous layer, hovering over a bar reveals the number of students in that year and provides individual names of corresponding students to the right (see Fig. 3.2). However, student names are shown in the same font, again, hinting at the print-based character of records after transcription.
Layer 3: Expansion & Re-structuring. The next layer shows the result of Smart’s work on the student records, highlighting in particular his Content Modification (expansion of record content) and Organizational & Structural Modification (from chronological order to alphabetical index) (see Fig. 2.3). The bar chart remains as the visualization technique of choice, but records are aggregated by the first letter of students’ last names, and bar width represents the aggregated amount of content (word count) by alphabetical index. We keep the depiction of bars in smooth lines, again, hinting at the print-based format of the records at this stage. Hovering over a bar reveals corresponding student numbers and shows corresponding student records to the right (see Fig. 3.3). Here, Smart’s expansion and structuring of individual records becomes visible as student names are expanded with additional demographic information.
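The difference between the chronological layers (1 and 2) and the alphabetical layer (3) comes down to two different aggregations of the same records. The following minimal sketch shows both, using invented sample records; the field names and the helper functions `by_year` and `by_initial` are hypothetical and not taken from the prototype's code.

```python
from collections import Counter, defaultdict

# Invented sample records; names, years, and texts are for illustration only.
RECORDS = [
    {"name": "Adam, James", "year": 1850,
     "text": "Adam, James. Arts 1850-54. Minister."},
    {"name": "Brown, Alan", "year": 1850,
     "text": "Brown, Alan. Arts 1850-51."},
    {"name": "Baird, Tom", "year": 1851,
     "text": "Baird, Tom. Medicine 1851-55. Surgeon, later physician."},
]


def by_year(records):
    """Layers 1 and 2: number of student records per year (bar height)."""
    return dict(sorted(Counter(r["year"] for r in records).items()))


def by_initial(records):
    """Layer 3: per first letter of the last name, the number of students
    and the aggregated word count of their expanded records (bar width)."""
    groups = defaultdict(lambda: {"students": 0, "words": 0})
    for r in records:
        initial = r["name"][0].upper()
        groups[initial]["students"] += 1
        groups[initial]["words"] += len(r["text"].split())
    return dict(sorted(groups.items()))


year_bars = by_year(RECORDS)
letter_bars = by_initial(RECORDS)
```

Under these assumptions, `year_bars` keeps the temporal order of the Roll, while `letter_bars` discards it in favor of the alphabetical index and adds a measure of how much content each group gained through expansion.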
Layer 4: Re-Organization & Re-structuring via TEI. Crawford’s work removed both the temporal and alphabetical structure of the historical University records and applied more structure to individual records. Our visualization reflects this by portraying each individual record as a square where squares are randomly arranged into a pile with no inherent ordering (see Fig. 2.4). Hovering over a square reveals the corresponding student record in TEI form to the right (see Fig. 3.4), including the applied TEI tags that provide a stronger structure to each record.
Layer 5: Structuring into a Relational Database. The most recent transformation process, the Organizational & Structural Modification of Crawford’s TEI files into a relational database, is represented in the top-most layer of our provenance-driven visualization. The database structure is represented as a hierarchical tree diagram where each circle represents a table and links depict relations between them (see Fig. 2.5). This representation emphasizes how records are no longer treated as individual entities; their content has been broken down into segments (e.g., locations of birth and death, degree, college, occupation, etc.). Hovering over one of the nodes will reveal data that can be extracted from the corresponding table (e.g., occupations, see Fig. 3.5).
Figure 3: Internal Transformations.
Prototype: https://tv8.host.cs.st-andrews.ac.uk/provenanceDrivenVisualization/
Sketch-based stroke (d3.sketchy): https://github.com/sebastian-meier/d3.sketchy/blob/master/README.md
Key to our visualization is that all layers are interactive and interlinked. Hovering over an element in one layer automatically brings up the corresponding records in the “Record View” to the right and also highlights these in the other visualization layers. For example, when hovering over a year bar in the original record layer at the bottom, the same year is highlighted in Maitland-Anderson’s layer, and all student records from that year are shown in Smart’s layer, according to their last names. In Crawford’s layer individual squares corresponding to these records are highlighted in orange and in our database representation, the user can see which tables in the database contain which parts of the records (see Fig. 2).
We consider this visualization prototype as one example of a provenance-driven visualization; of course other approaches, also incorporating different visualization techniques, are possible. We found that organizational and structural transformations can be visualized relatively easily by modifying the grouping and/or spatial position of visual elements. Other aspects such as changes in Artifactual & Representational Form are more difficult to depict. For example, in our visualization the material form of the original records or subsequent books produced by Maitland-Anderson and Smart are still invisible. We would like to incorporate this in the form of scanned snippets of original records, but this is not without significant effort in terms of data preparation.
We also highlight that a provenance-driven visualization as presented here should not be considered as a replacement of textual background information of the collection and corresponding data represented; some textual explanations of curatorial decisions and transformations are likely necessary and should be incorporated into the visualization. However, we believe that a visual and interactive form of representing the often layered transformation processes of historical and cultural collections can be evocative and raise curiosity that may promote critical perspectives on the visualization of historical and cultural collections, stimulate discussion, and, ultimately, point the viewer to engage in more in-depth research on the history and background of such collections. Based on our interviews and the case study presented here, we discuss opportunities for provenance-driven visualization, also in the context of ongoing work at the intersection of (digital) humanities and data visualization.
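The linked highlighting between layers described for the prototype can be thought of as a single lookup from one selection to the matching marks in every layer. The sketch below illustrates that idea for hovering over a year bar. It is written in Python for brevity and is not the prototype's code: the sample records, the `tables` field, and the `highlight_for_year` function are all hypothetical.

```python
# Invented sample records. "tables" lists the database tables that hold
# parts of each record (hypothetical table names).
RECORDS = [
    {"id": "p1", "name": "Adam, James", "year": 1850,
     "tables": ["person", "birth"]},
    {"id": "p2", "name": "Brown, Alan", "year": 1850,
     "tables": ["person", "occupation"]},
    {"id": "p3", "name": "Baird, Tom", "year": 1851,
     "tables": ["person", "death"]},
]


def highlight_for_year(records, year):
    """Return, per layer, the marks to highlight when a year bar is hovered."""
    selected = [r for r in records if r["year"] == year]
    return {
        # Layers 1 and 2 share the temporal structure: highlight the same bar.
        "layer1_year_bars": [year] if selected else [],
        "layer2_year_bars": [year] if selected else [],
        # Layer 3 is alphabetical: highlight the initials of the selected names.
        "layer3_initials": sorted({r["name"][0] for r in selected}),
        # Layer 4 has one square per record.
        "layer4_squares": [r["id"] for r in selected],
        # Layer 5 shows which tables contain parts of the selected records.
        "layer5_tables": sorted({t for r in selected for t in r["tables"]}),
        # The record view lists the selected people.
        "record_view": [r["name"] for r in selected],
    }


hover = highlight_for_year(RECORDS, 1850)
```

Keeping a stable record identifier across all stages is the assumption that makes such a lookup possible. Where an identifier was not carried through a transformation, the link between layers would itself have to be reconstructed and would add another layer of interpretation.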
DISCUSSION
We see the concept of provenance-driven visualization as an opportunity to (1) promote attribution & fairness by acknowledging the laborious process of making historical and cultural collections available to broader audiences, (2) expose different layers of knowledge and interpretation that may have changed throughout the years, (3) promote transparency of transformation processes that have been applied to the collection, and (4) encourage interdisciplinary research.
Attribution & Fairness.
Working with historical and cultural collections in order to preserve them and/or to make them available to a broader audience is a huge effort. Our interviews revealed years of work behind each transformation of the historical University records. For example, Smart worked on the expansion of the records for 60 years. Making transformations of historical records visible ensures that this type of labor and the people involved are properly acknowledged. Provenance-driven visualization provides an opportunity here as it can help to disclose the effort and nuances in this work in evocative ways. This approach is in line with feminist approaches to data analysis and visualization [3, 4] as introduced by D’Ignazio and Klein, who advocate for a stronger acknowledgment of the hidden labor involved in data-driven analysis processes. Provenance-driven visualization can thus be considered a practical approach to ensure ethical practices in visualization as advocated by Correll, who argues that we “[...] ought to visualize hidden labor. Properly acknowledging and rewarding people for their labor is a key component of fairness. Certain kinds of labor (especially those performed by marginalized groups) are under-represented or under-valued in our current schemes of commodification or valuation.” [2, p.8]. Our provenance-driven visualization prototype addresses this by making visible the transformation layers alongside the names of the people responsible for them. However, future work should investigate other forms of provenance-driven visualization that put, for example, the historians, archivists, librarians, paleographers, and computer scientists involved in such processes even more into focus.
Exposing Different Layers of Knowledge-Making.
Drucker argues that the “history of knowledge is the history of forms of expression of knowledge, and those forms change. What can be said, expressed, represented in any era is distinct from that of any other, with all the attendant caveats and reservations that attend to the study of the sequence of human intellectual events, keeping us from any assertion of progress while noting the facts of change and transformation.” [6]. Engaging in provenance-driven visualization is an opportunity for exploring and exposing transformations of knowledge expression. Visualizing the transformation of historical record collections can be considered an open-ended inquiry into their contents. This means representing knowledge not as absolute but rather as layered, organic and ever-evolving. Maitland-Anderson’s transcriptions of the University records, for example, not only made available the information ‘hidden’ behind the many student hands to a wider audience, but his work is also a manifestation of practices and technologies at the time: he followed archiving practices common for the era, and his curatorial decisions would not have been the same a hundred years before or after his work. Taking records’ transformations into consideration through provenance-driven visualization is an opportunity to highlight changes in knowledge production practices and underlying assumptions. However, this also raises the question of whether there are ways of integrating or combining provenance-driven with traditional, content-focused approaches to visualization that aim at representing selected perspectives on the collection based on corresponding data in its final stages.
Promoting Transparency.
Transparency about data, design and research processes has been highlighted as key for visualization design studies [21]. To achieve this, textual descriptions and/or diagrams are often used to portray the transformation processes involved in preparing a collection for visualization [11, 23]. With provenance-driven visualization we argue for a more data-driven and visual approach to making transformation processes of records visible and explorable in order to allow for a better understanding of curatorial decisions and their impact on the data and subsequent interpretations. We see this as an opportunity for humanities and visualization researchers to engage with and reflect on previous work conducted on the collection at hand in order to inform subsequent data processing and design approaches. Moreover, it is also an opportunity for the general public to better understand the background of historical or cultural collections represented in digital space. However, the concrete impact of provenance-driven visualizations to promote the critical interpretation of historical or cultural collections has yet to be studied in detail, and we invite researchers in the humanities and visualization to actively engage in this endeavor.
Encouraging Interdisciplinary Research.
Provenance-driven visualization and, with it, the visual disclosure of different stages of transformation and interpretation of historical and cultural collections can also be an opportunity for encouraging interdisciplinary research not only at the intersection of visualization and humanities fields, but also involving public audiences. Many researchers from the humanities and visualization community have pushed for such an approach [10, 12] and also for incorporating more diverse perspectives in visualization research [17]. Through provenance-driven visualization we aim to trigger new questions about historical and cultural collections, both regarding insights they can promote as well as opportunities for design. The different perspectives that archivists, paleographers, digitization officers, visualization researchers, and data analysts have about such collections can be very enriching, and we believe that both the design and exploration of provenance-driven visualizations can promote interesting discussions.
CONCLUSION
We have introduced and illustrated the concept of provenance-driven visualization as an approach to visualizing historical and cultural collections that focuses on the exploration of records through the lens of the layered transformation processes they have gone through. By externalizing the “tremendous rhetorical force” [3] of these layers, provenance-driven visualization exposes the individual and combined curatorial and interpretative efforts of the people who have been working with these collections. In contrast to visualization approaches that focus on representing the content of historical and cultural collections in its final processed stage, provenance-driven visualization enables viewers to see a more nuanced perspective on the curatorial history of such collections in order to inform critical interpretation and research perspectives. Our case study has outlined opportunities for provenance-driven visualization, which gives credit to the people and labor involved in preparing historical and cultural collections for digitization and visualization, exposes different layers of knowledge-making, promotes data transparency and underlying curation processes, and, ultimately, encourages interdisciplinary research. We hope this paper will spark further practical explorations of provenance-driven visualizations and how this approach can impact the interpretation of historical and cultural collections.
REFERENCES
[1] R. Boyatzis. Transforming Qualitative Information: Thematic Analysis and Code Development. Sage Publications, 1998.
[2] M. Correll. Ethical Dimensions of Visualization Research. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, pp. 1–13, 2018.
[3] C. D’Ignazio and L. Klein. Data Feminism. Strong Ideas. MIT Press, 2020.
[4] C. D’Ignazio and L. F. Klein. Feminist Data Visualization. In Workshop on Visualization for the Digital Humanities (VIS4DH), IEEE, 2016.
[5] M. Dörk, P. Feng, C. Collins, and S. Carpendale. Critical InfoVis: Exploring the Politics of Visualization. In Proceedings of CHI Extended Abstracts on Human Factors in Computing Systems, pp. 2189–2198, 2013.
[6] J. Drucker. Humanities Approaches to Graphical Display. Digital Humanities Quarterly, 5(1):1–21, 2011.
[7] D. Edelstein, P. Findlen, G. Ceserani, C. Winterer, and N. Coleman. Historical Research in a Digital Age: Reflections from the Mapping the Republic of Letters Project. American Historical Review, 122(2):400–424, 2017.
[8] Google. Google Arts & Culture. Library Catalog: artsandculture.google.com.
[9] G. Guest, K. MacQueen, and E. Namey. Introduction to Applied Thematic Analysis. Sage Publications, 2012.
[10] K. Hall, A. Bradley, U. Hinrichs, S. Huron, J. Wood, C. Collins, and S. Carpendale. Design by immersion: A transdisciplinary approach to problem-driven visualizations. IEEE Transactions on Visualization and Computer Graphics, 26(1):109–118, 2020.
[11] U. Hinrichs, B. Alex, J. Clifford, A. Watson, A. Quigley, E. Klein, and C. M. Coates. Trading Consequences: A Case Study of Combining Text Mining and Visualization to Facilitate Document Exploration. Digital Scholarship in the Humanities, 30(suppl 1):i50–i75, 2015.
[12] U. Hinrichs, S. Forlini, and B. Moynihan. In defense of sandcastles: Research thinking through visualization in digital humanities. Digital Scholarship in the Humanities (DSH), 34(Issue Supplement 1):i80–i99, 2019.
[13] U. Hinrichs, H. Schmidt, and S. Carpendale. EMDialog: Bringing Information Visualization into the Museum. IEEE Transactions on Visualization and Computer Graphics, 14(6):1181–1188, 2008.
[14] J. Hullman and N. Diakopoulos. Visualization Rhetoric: Framing Effects in Narrative Visualization. IEEE Transactions on Visualization and Computer Graphics, 17(12):2231–2240, 2011.
[15] E. Hyvönen, P. Leskinen, E. Heino, J. Tuominen, and L. Sirola. Reassembling and enriching the life stories in printed biographical registers: Norssi high school alumni on the semantic web. In International Conference on Language, Data and Knowledge, pp. 113–119, 2017.
[16] S. Jänicke, G. Franzini, M. F. Cheema, and G. Scheuermann. Visual Text Analysis in Digital Humanities. Computer Graphics Forum, 36(6):226–250, 2017.
[17] B. Lee, K. Isaacs, D. A. Szafir, G. E. Marai, C. Turkay, M. Tory, S. Carpendale, A. Endert, and T.-M. Rhyne. Broadening Intellectual Diversity in Visualization Research Papers. IEEE Computer Graphics and Applications, 39(4):78–85, 2019.
[18] J. Maitland Anderson. The matriculation roll of the University of St Andrews, 1747-1897. Blackwood, Edinburgh, 1905.
[19] J. Maitland-Anderson. Early records of the University of St Andrews: the graduation roll, 1413 to 1579, and the matriculation roll, 1473 to 1579. Printed by T. and A. Constable for the Scottish History Society, Edinburgh, 1926.
[20] D. L. V. Meulen and G. T. Tanselle. A System of Manuscript Transcription. Studies in Bibliography, pp. 201–212, 1999.
[21] M. Meyer and J. Dykes. Criteria for Rigor in Visualization Design Study. IEEE Transactions on Visualization and Computer Graphics, 26(1):87–97, 2019.
[22] T. B. Museum. The Museum of the World. The British Museum. Library Catalog: britishmuseum.withgoogle.com.
[23] G. Péoux and J.-R. Houllier. To Visualize Past Communities: A Solution from Contemporary Practices in the Industry for the Digital Humanities. Digital Humanities Quarterly, 11(2), 2017.
[24] R. N. Smart. Biographical Register of the University of St Andrews, 1747-1897. St Andrews University Library, 2004.
[25] R. N. Smart. Alphabetical register of the students, graduates and officials of the University of St Andrews 1579-1747. University of St Andrews Library, St Andrews, 2012.
[26] J. B. Stefania Forlini, Uta Hinrichs. Mining the material archive: Balancing sensate experience and sense-making in digitized print collections. Open Library of Humanities, 2018.
[27] T. Vancisin, A. Crawford, M. Orr, and U. Hinrichs. From people to pixels: Visualizing historical university records. In Proceedings of the 5th Biennial Transdisciplinary Imaging Conference 2018 (Transimage 2018), pp. 41–57, 2018.
[28] T. Vancisin, M. Orr, and U. Hinrichs. Illuminating Past Labor: Making Transformation Processes of Historical Documents Visible. In Proceedings of the ADHO Conference on Digital Humanities (DH2020) (Long Presentation), 2020.
[29] M. Whitelaw. Generous Interfaces for Digital Cultural Collections. Digital Humanities Quarterly, 2015.
[30] F. Windhager, P. Federico, S. Miksch, and E. Mayr. Visualization of Cultural Heritage Collection Data: State of the Art and Future Challenges. IEEE Transactions on Visualization and Computer Graphics, 25(6):2311–2330, 2018.
[31] D. J. Wrisley. Pre-visualization. In