A 25 Year Retrospective on D-Lib Magazine
Michael L. Nelson, Old Dominion University, Norfolk, VA USA
Herbert Van de Sompel, Data Archiving and Networked Services, The Hague, Netherlands
August 28, 2020
Abstract
In July, 1995 the first issue of D-Lib Magazine was published as an on-line, HTML-only, open access magazine, serving as the focal point for the then emerging digital library research community. In 2017 it ceased publication, in part due to the maturity of the community it served as well as the increasing availability of and competition from eprints, institutional repositories, conferences, social media, and online journals: the very ecosystem that D-Lib Magazine nurtured and enabled. As long-time members of the digital library community and authors with the most contributions to D-Lib Magazine, we reflect on the history of the digital library community and D-Lib Magazine, taking its very first issue as guidance. It contained three articles, which described: the Dublin Core Metadata Element Set, a project status report from the NSF/DARPA/NASA-funded Digital Library Initiative (DLI), and a summary of the Kahn-Wilensky Framework (KWF) which gave us, among other things, Digital Object Identifiers (DOIs). These technologies, as well as many more described in D-Lib Magazine through its 22 years, have had a profound and continuing impact on the digital library and general web communities.
In July, 1995, the Corporation for National Research Initiatives (CNRI) published the first issue of D-Lib Magazine. D-Lib Magazine was the most visible and impactful component of the D-Lib Forum administered by CNRI and funded by the Defense Advanced Research Projects Agency (DARPA). In July, 2017, D-Lib Magazine published its 265th and final issue, bringing to a close a successful 22 year run that saw it evolve into an entity around which the entire digital library (DL) community coalesced. D-Lib Magazine was itself an innovation: it was published in HTML only and thereby encouraged exploration in scholarly publishing with hypertext and hypermedia; it was open access with no article processing charge, so it reached a broad community; its “magazine” focus and initially monthly publication schedule facilitated community building in a pre-blog and pre-social media world; and it found the elusive middle ground between researchers and practitioners.

During its 22 year run, D-Lib Magazine offered several opportunities for self-reflection for both the magazine and the community at large. In 2000, Bill Arms surveyed the first five years [10]. In 2005, a ten year anniversary special issue was published with contributions from many of the central figures of D-Lib Magazine and the DL community at large [54, 128]. The 20 year anniversary had a more muted tone, with only the issue’s editorial marking the event [65], perhaps because editor Larry Lannom knew the time for the final editorial was not far off [66]. So we take this, the 25 year anniversary of the first issue, to reflect on the impact of D-Lib Magazine, both for the information that it conveyed and as a proof-of-concept for many DL and web concepts and technologies that we enjoy today.
We provide this retrospective as those for whom D-Lib Magazine had a significant career impact, both as readers and authors; after the editors we were the top two most frequent authors, with 39 unique contributions between us.

Internet-based digital libraries (or “electronic libraries” as they were frequently known prior to 1994) predated the popularity of the web; some of the well-known examples include: “Knowbots” [55], the CORE electronic journal project [69], Netlib [25], xxx.lanl.gov [34], Computer Science Technical Reports Project (CS-TR) [53], Wide-Area Technical Report Server (WATERS) [73], and the Langley Technical Report Server (LTRS) [85]. However, the NSF-funded Digital Library Initiative (DLI, 1994–1998) co-occurred with the rapidly increasing interest in the web, which was accelerated by the late 1993 release of the NCSA Mosaic browser [4]. As a result, the story of the early web parallels the story of digital libraries and D-Lib Magazine. It is in this context of the nascent web that D-Lib Magazine should be understood, for it addressed a critical need in 1995. From the editorial of the first issue [31]:

    The magazine is itself an experiment in electronic publishing, which fulfills its communication function for the Digital Library Forum by testing the limits of writing in and for a wholly networked environment. We have no – and propose no – print analogue, and we will be most intrigued by substantive articles that take advantage of the power of hypermedia while retaining the strengths of traditional, print publishing.

https://doi.org/10.1045/july2005-contents

The first issue had three articles, then carried under the heading of “stories and briefings”, reflecting the early position of a “magazine” and not an online journal. In fact, they were summaries of existing conventional reports and publications:

1. “Metadata: the foundations of resource description”: a summary of the OCLC/NCSA Metadata Workshop [123] that produced the Dublin Core Metadata Element Set, which continues today as the Dublin Core Metadata Initiative (dublincore.org).
2. “An agent-based architecture for digital libraries”: a description of the distributed agent architecture explored in the University of Michigan Digital Library (UMDL) [12]; the University of Michigan was one of six participants in the first NSF/DARPA/NASA Digital Library Initiative (DLI).
3. “Key concepts in the architecture of the digital library”: an introduction to and contextualization of what would become known as the “Kahn-Wilensky Framework” (KWF) [51], part of which included handles [63], upon which Digital Object Identifiers [92] are implemented.

As tentative steps in this new publishing experiment, all three articles are single authored (though they summarize multi-author publications), are relatively short, and have limited figures and references. Although D-Lib Magazine would soon evolve into a venue where original research was published (e.g., a 1999 editorial estimates that half of the contributions described original research [9]) and essentially functioned as an online journal, it was edited and never refereed. This produced a well-known problem: if you wanted your material to reach a wide audience, it needed to be in D-Lib Magazine, but if you wanted academic “credit”, it needed to be in a conventional journal or refereed conference proceedings. In the time before Google, Google Scholar, CiteSeer, Microsoft Academic et al., this was a binary choice. Now it is possible for authors to gain the imprimatur of a quality journal or conference proceedings, and at the same time leverage the permissive attitude regarding pre-prints and e-prints of many publishers (e.g., ACM) to ensure that articles are discoverable

https://twitter.com/search?q=%23JCDL2020&src=typed_query&f=live

D-Lib Magazine was unique in many respects.
First, although it clearly billed itself as a “magazine”, it quickly became a venue where original research was published. Second, although it initially offered additional services and categories, the real innovation came about because it embraced HTML, and only HTML, as the publication medium. HTML allowed the articles themselves to take advantage of a rapidly evolving medium, including links and multimedia in a way PDF-primary publications could not. Finally, with the vantage of 25 years, the decisions made in how D-Lib Magazine would be structured and maintained compare favorably to other Web-based publishing peers which began shortly after D-Lib Magazine.
Although early issues had unsuccessful experiments with HyperNews [16] for comments as well as a separate “technology playpen” / “technology spotlight” section [128], these features were eventually subsumed within the HTML publishing experiment itself, and D-Lib Magazine’s primary unit of currency became its articles. From 1995 through 2017, D-Lib Magazine published 265 issues and 1062 articles (D-Lib Magazine actually defined and evolved many different categories of contributions [127], but we refer to entries available from the title index as simply “articles”). The issues were published monthly through June, 2006 (with the July/August issues published simultaneously as “7/8”), and it switched to bimonthly publication from July/August 2006 through July/August 2017. D-Lib Magazine was always “a magazine, not a peer-reviewed journal” and aimed for “articles that are 1,500 to 3,000 words in length and seldom accept articles in excess of 5,000 words” [128]. To explore this, we took the title index:

1. from the HTML extracted all links that begin with
, which includes the articles but excludes “in brief” and “opinion” entries.
2. for each of the 1062 URLs, we used lynx -dump $URL > $filename, which saves only the result of rendering the HTML into plain text.
3. used wc -w on each of the resulting files to count the number of words in the article.

Using lynx to render the HTML is not perfect, but it reasonably approximates the number of words in the article. Figure 1 shows the number of articles published each calendar year, and Figure 2 shows the average number of words per article for each calendar year. From Figure 1 we can see that although switching to bimonthly publication in 2006 reduced the number of articles per year, it did not halve it. Even though in 2017 D-Lib published only four issues (instead of six), the total number of articles was only slightly down from 2016, perhaps indicating that the queue of remaining articles was being cleared for the year.

Figure 1: Total articles published per year (1995 and 2017 were incomplete years).

Figure 2 shows a trend of shorter articles in the first three years, with the magazine finally hitting its stride in 1998, perhaps corresponding with the acceptance of the format by both authors and editors. From 1998 on, the values fluctuate (we are unsure of why 2009 has a low value), but it is not until the last six years (2012–2017) that the word count approximates the early peak from 1998.

Even though it was never peer-reviewed, and did not have an editorial board like a conventional journal (though it did have an advisory board), D-Lib Magazine had a significant impact in the conventional literature and served as a de facto journal. A ten year anniversary analysis (from 2005) showed that D-Lib Magazine had acquired 147 citations from the ACM/IEEE Joint Conference on Digital Libraries and its predecessor conferences [128]. A more detailed authorship and citation analysis showed over 1300 citations in the first 15 years [91].
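The word-count procedure described earlier in this section (extract the article links from the title index, render each page to plain text, count the words, average per year) can be sketched as follows. This is only an illustration under stated assumptions, not the pipeline we ran: it swaps lynx -dump and wc -w for Python's standard html.parser, so its counts would differ somewhat from those behind Figure 2, and because the link prefix is not given in the text above, `prefix` is left as a caller-supplied, hypothetical parameter. All function and class names are ours.

```python
from html.parser import HTMLParser
from statistics import mean


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags that start with a given prefix."""

    def __init__(self, prefix):
        super().__init__()
        self.prefix = prefix  # hypothetical: the real prefix is not stated
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href and href.startswith(self.prefix):
                self.links.append(href)


class TextExtractor(HTMLParser):
    """Keep text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def article_links(index_html, prefix):
    """Step 1: links in the title index that begin with `prefix`."""
    collector = LinkCollector(prefix)
    collector.feed(index_html)
    return collector.links


def word_count(article_html):
    """Steps 2 and 3: reduce HTML to text, count whitespace-separated words."""
    extractor = TextExtractor()
    extractor.feed(article_html)
    return len(" ".join(extractor.chunks).split())


def mean_words_per_year(counts_by_year):
    """Average word count per calendar year, as plotted in a per-year chart."""
    return {year: mean(counts) for year, counts in counts_by_year.items()}
```

Fetching the pages is deliberately left out; any HTTP client could supply the HTML strings. Unlike lynx -dump, this sketch does not append a numbered list of link targets to the rendered text, which is one reason its totals would not match the lynx-based figures exactly.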
Figure 3: Library & Information Science, Google Scholar https://scholar.google.com/citations?view_op=top_venues&hl=en&vq=eng_libraryinformationscience (from 2020-07-19).

Figure 4: D-Lib Magazine, Google Scholar https://scholar.google.com/citations?hl=en&view_op=search_venues&vq=D-Lib++magazine&btnG= (from 2020-07-19).

Figure 5: JCDL, Google Scholar https://scholar.google.com/citations?hl=en&view_op=search_venues&vq=joint+conference+on+digital+libraries&btnG= (from 2020-07-19).

Figure 6: First Monday, Google Scholar https://scholar.google.com/citations?hl=en&view_op=search_venues&vq=First+Monday&btnG= (from 2020-07-19).

As the initial editorial makes clear, D-Lib Magazine was an ongoing experiment in “electronic publishing” itself, and as a result was an early adopter and proof-of-concept for a lot of conventions and techniques that are now best practices in the community. Perhaps most importantly, D-Lib Magazine was always published in HTML, and only in HTML: there was never a parallel PDF version. Submissions were encouraged in MS Word, but the editors handled the conversion to HTML themselves. Adopting an HTML-only publishing strategy seems obvious in retrospect, but considering the limitations of HTML ca. 1995 (cf. HTML5 [1] today) this was a bold strategy. Despite the dominance of the PDF in the scholarly publishing ecosystem, the HTML format allowed authors to experiment with multimedia and interactivity extensions not possible with PDFs. Quoting from the October, 1995 editorial, one gets a glimpse of the willingness to explore the boundaries of what an HTML-only publication could be [32]:

    You will see that the stories have varied in their treatment of images, for example, in the background color, and even in the organization of the text itself.
    But I do not believe that these individual treatments posed a problem for our readers, partly because the stories are unified by subject, partly because the medium is itself experimental and preconceptions are fairly few, and partly because in each case, the structure of the story reinforces and extends its informational content. Thus, the highly visual story that the Informedia team wrote on indexing video subtly embodies the notion of frames in its file structure. It offers readers multiple paths through the material and cues through buttons not unlike the signage found in museums and airports, and through menus that other writers for the magazine have also employed. In the same issue, the Netlib authors used a classic, straightforward narrative approach with an internal menu to explain the complex structure of a library of mathematical software.

As authors, we certainly appreciated the editors’ willingness to explore what new features were possible in an HTML scholarly publication. For example, in our 1999 article about the Universal Preprint Service [113], we included screen cams to show the now defunct ups.cs.odu.edu digital library in action. Those screen cams were stored in .exe format and would thus likely require emulation to run now, but those animations (stored at dlib.org) would not have been possible in a PDF. Another of our articles from 2002 used animations, but this time in a more web-friendly and standard MPEG format [83]. In a 2005 article, we did not use animations, but did have 377 images linked from the article, a feat that would have been unwieldy at best in PDF [17]. Our last article in D-Lib Magazine used JavaScript to make annotated hyperlinks in the article actionable, thereby serving as a demonstration of how “Robust Links” could work in practice [115].

Another significant decision was to fix the template and formatting of past issues, and not reformat earlier issues with updated templates.
Updates were only made in the cases of errata and corrigenda. D-Lib Magazine updated its design as tools and experience allowed, but the first issue looks the same today as it did 25 years ago, thereby serving as a monument to the best practices of the time. Indeed, the live web version of the first issue and the web archived version of the first issue are indistinguishable (Figures 7, 8, 9). Not only did they keep their HTML and style intact, but thanks to an ongoing commitment from CNRI all of D-Lib Magazine’s issues are still available on the live web, with no changes in their URIs since the fourth issue (October, 1995). Although we have long known “Cool URIs Don’t Change” [13], the reality is that most do, and persisting over 5,000 URIs for up to 25 years is an accomplishment in itself.

https://robustlinks.mementoweb.org

Although we thought we remembered this policy being explicitly stated somewhere, we could find no record of it. In emails with former editors Larry Lannom and Cathy Rey, neither could recall such a document. The closest we could find was “Once the issue has been released, only vital corrections or changes will be made to the file. These changes will be noted and dated at the end of the file.” in the Author Guidelines: .

The first three issues were published at (cf. ), and it was not until the October, 1995 issue that ).

Figure 9: Using curl to download both the live web version and first archived version (from 1997 and in “raw” format, via id_) and show they produce the same md5 hash.

Another groundbreaking innovation for D-Lib Magazine was that it was open access before that term was even coined, with the authors retaining their copyright, and D-Lib requiring neither subscriptions for readers nor article processing charges from the authors. This ensured it reached a wide audience, both authors and readers, but it also resulted in chronic funding problems after the initial grants that supported the D-Lib Forum expired.
In an editorial for the ten year anniversary issue [54], Robert Kahn said:

    Producing a high quality magazine on the net each month turned out to be somewhat less difficult than I would have expected, due almost entirely to the quality of the editorial staff and the willingness of the readership to contribute interesting articles. Funding the continued production of the magazine has been, perhaps, its biggest challenge. While the initial funding from DARPA covered most of the early costs, DARPA was unable to continue the support indefinitely. Subsequent funding from NSF helped greatly, but covered perhaps half the ongoing costs, with CNRI picking up the other half.

Although subscriptions and author fees were considered [64], they were never implemented. In 2007, the “D-Lib Alliance” membership organization was created [126] that assisted with funding, but the final issue in July 2017 acknowledged that decreased financial support was part of the reason for ceasing publication [66]:

    Financial support for the magazine has waned over recent years, the number of unsolicited high quality articles thrown over our transom has declined, and the very phrase ‘Digital Libraries’ has gone from sounding innovative to sounding a bit redundant. In short, it seemed like time to make a graceful exit.

Another innovation that resulted from open access HTML-only publishing was D-Lib being the first venue to have its handles (and later DOIs, to be discussed further in section 3.3) resolve to articles themselves, not a landing page describing the article. By eschewing PDF, the format of paywalls, D-Lib Magazine was able to subtly reinforce that its content was part of the Web, and not something separate, to be downloaded via the Web. The ability to link and provide embedded multimedia enables the scholarly object to enjoy the same advances (and risks, such as link rot [49, 57, 75]) as the rest of the web.
Another subtle result of embracing handles (and DOIs) is that although D-Lib Magazine was published as a conventional serial, it also embraced persistent identifiers for individual articles (owing to the computer science technical report heritage of CNRI’s technology). This facilitates the disaggregation of serials into articles that are directly and persistently identifiable, which reinforces them as being “on the Web” as first-class citizens.

Another innovation D-Lib Magazine embraced was the use of site mirrors, allowing users in Europe and Asia to interact with geographically closer mirrors for faster response. That approach to addressing bandwidth limitations was common at the time; the problem is now solved via content delivery networks (CDNs). Three of the D-Lib mirrors are still functioning, down from a peak of five. In addition to the utility the mirrors provide, they were also presumably intended as demonstrators for more advanced Handle resolution techniques, such as being able to resolve to one of multiple URLs [93].

http://web.archive.org/web/20150224045836/mirror.dlib.org/about.html

There were other contemporary experiments in on-line publishing from generally the same community as well. For example, Ariadne is an online magazine that began publishing in 1996 and is still publishing (78 issues since 1996). It was similarly not peer-reviewed, aimed at practitioners, and was initially funded by the Joint Information Systems Committee (JISC, since renamed Jisc), a UK activity that can be considered roughly analogous to the USA DLI program. Ariadne also had an HTML focus from the very beginning. It has changed publishers a few times, as well as changed its URIs and template through time (Figures 10, 11, 12). It does not use handles or DOIs.

First Monday began in 1996 as a monthly peer-reviewed journal, and is still being published. But over time, its URIs have changed (from firstmonday.dk to simultaneously firstmonday.org and a path within journals.uic.edu), and its
template changed along the way. It uses DOIs, and we believe it adopted them in 2013.

Figure 12: The original URI for the first issue of Ariadne is 404.

Figure 13: An article from the first issue of First Monday, archived in 1998 (web.archive.org/web/19980205181322/http://firstmonday.dk/issues/issue1/ecash/index.html).

There is more difference between the original First Monday (archived in 1998, Figure 13), the current live Web First Monday (Figure 14), and the inner frame of the live Web First Monday (Figure 15) than first appears, a result of significant reformatting of the articles over time. Figure 16 shows downloading the archived raw version (via id_), the live version, and the inner frame of the live version, respectively. The Unix utility wc (word count) shows the lines, words, and characters of each file, all of which are significantly different.

The Journal of Digital Information (JoDI) began as a peer-reviewed journal in 1997, and ceased publication in 2013 after irregular publication of 46 issues. While it was active it transitioned from the University of Southampton (jodi.ecs.soton.ac.uk and journals.ecs.soton.ac.uk; the former no longer resolves (Figure 17)) to Texas A&M University and Texas Digital Library (journals.tdl.org). The templates changed through time, and publication was always a hybrid of HTML and PDF. It did not use handles or DOIs.

The web archives have archived registration walls, since JoDI originally required a (free) account and login to browse. Since the Internet Archive crawls
Since the Internet Archive crawlsonly the surface web (i.e., no login credentials), the end result is the earliest15igure 14: A live Web version of the same article: https://journals.uic.edu/ojs/index.php/fm/article/view/465/386 .Figure 15: The inner frame of the live Web version: https://journals.uic.edu/ojs/index.php/fm/article/download/465/386?inline=1 .16 curl -sL web.archive.org/web/19980205181322id_/http://firstmonday.dk/issues/issue1/ecash/index.html> first-monday-old$ curl -ksL https://journals.uic.edu/ojs/index.php/fm/article/view/465/386 > first-monday-now$ curl -ksL https://journals.uic.edu/ojs/index.php/fm/article/download/465/386?inline=1> first-monday-now-frame$ wc first-monday-*39 100 1941 first-monday-now1 4521 34585 first-monday-now-frame0 4379 32308 first-monday-old40 9000 68834 total Figure 16: The word count (wc) utility shows the differences in lines, words,and characters (respectively) for each version of the same article. % curl -I http://jodi.ecs.soton.ac.uk/curl: (6) Could not resolve host: jodi.ecs.soton.ac.uk
Figure 17: jodi.ecs.soton.ac.uk no longer resolves.versions of JoDI were not web archived around the time they were published.Eventually the requirement for logins ceased, and the earliest web archivedpages without a registration wall are from 2000, including snapshots of the ear-liest articles created two years after their original publication. In Figure 18, wesee an archived landing page for an article from the first issue of JoDI (1997).Clicking through (Figure 19) returns a 404 page from the Internet Archive sincethat article itself was not archived at that location because of login restrictions.Figure 20 shows an archived copy of that same article, meanwhile available ata different URI, created in 2000 when JoDI had removed the registration wall,and Figure 21 shows the same article now.
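The two kinds of version comparison used in this section, the md5 check of Figure 9 and the wc counts of Figure 16, can be sketched in a few lines. This is an illustrative sketch rather than the commands we ran: the helper names are ours, the Wayback Machine URL pattern with the id_ modifier is inferred from the curl command in Figure 16, and downloading is left to the caller so that the functions work on bytes already in hand.

```python
import hashlib


def wayback_raw_url(timestamp, original_url):
    """Build a Wayback Machine URL that uses the id_ modifier.

    Assumption: the pattern follows the curl command in Figure 16, where
    id_ is appended to the 14-digit timestamp to request the raw capture.
    """
    return f"http://web.archive.org/web/{timestamp}id_/{original_url}"


def md5_hex(content: bytes) -> str:
    """Hex md5 digest of a downloaded page, as compared in Figure 9."""
    return hashlib.md5(content).hexdigest()


def same_bytes(live: bytes, archived: bytes) -> bool:
    """True when the live and archived copies hash identically."""
    return md5_hex(live) == md5_hex(archived)


def wc(content: bytes):
    """Return (lines, words, bytes) in the manner of the Unix wc utility.

    Lines are counted as newline characters and words as runs of
    non-whitespace, so a file without a trailing newline reports 0 lines.
    """
    return content.count(b"\n"), len(content.split()), len(content)
```

Identical digests, as with the first issue of D-Lib Magazine, indicate byte-for-byte identical copies; when the digests differ, the three wc columns give a rough sense of how far the versions have drifted, as in the First Monday example.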
With the vantage point of 25 years, we can properly assess the significance of the first issue of D-Lib Magazine, especially the first three articles it published. Two of the articles introduced technologies that continue to shape the digital library community (Dublin Core and DOIs), and the other article is a testament to the significant funding that the NSF, DARPA, and NASA put into research in digital libraries, with one of the most prominent outcomes being Google [42].
Figure 18: A landing page for an article from issue 1 (1997), archived in 1998: http://web.archive.org/web/19980715030423/http://jodi.ecs.soton.ac.uk/Abstracts/v01/01.berners-lee.html .

The first article, “Metadata: The Foundations of Resource Description” [122], is a summary of the OCLC/NCSA Metadata Workshop Report, which resulted from the workshop in Dublin, Ohio, only four months prior (March, 1995) [123]. The Dublin Core Metadata Element Set (DCMES, or “Dublin Core”) was still forming at this point, with only 13 of the final 15 metadata elements defined; the DCMES would later become the Dublin Core Metadata Initiative (DCMI) Terms. While the DCMI has gone on to issue over 70 specifications, today’s DCMI Terms can trace their origin to the 1995 Metadata Workshop and the original DCMES (Table 1). The impact of Dublin Core is far beyond what we can cover here, but Figure 22 shows that a search for “dublin core” in Google yields over 11M hits, and Figure 23 shows that a similar search in Google Scholar yields over 98K hits.

Dublin Core would form its own community, complete with its own governance and document series. But D-Lib Magazine would continue to be a venue for conveying the status of Dublin Core [23, 58, 124, 111], and other related Web metadata efforts, such as PICS [78] and its progeny, RDF [77], and IEEE LOM [26].

Dublin Core is abundantly used for the description of assets in a variety of content management systems, and it continues to this day to play a role in web-based discovery, co-existing with similar formats such as the Open Graph Protocol [44] (Figure 24) and Schema.org [39] (Figure 25), yet facing some significant competition from the latter when it comes to Search Engine Optimization [47].
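To make the co-existence of Dublin Core and Open Graph metadata concrete, the following minimal sketch collects both kinds of `<meta>` tags from a page's HTML. It assumes the common conventions of `name="dc.*"` (or `dcterms.*`) for Dublin Core and `property="og:*"` for Open Graph; the class and function names are hypothetical, and the sketch does not handle Schema.org markup, which is usually expressed differently.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect Dublin Core and Open Graph <meta> tags from an HTML page."""

    def __init__(self):
        super().__init__()
        self.dublin_core = {}  # e.g. {"dc.title": ["..."]}
        self.open_graph = {}   # e.g. {"og:title": ["..."]}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        content = attr.get("content")
        if content is None:
            return
        # Attribute values keep their case, so normalize before matching.
        name = (attr.get("name") or "").lower()
        prop = (attr.get("property") or "").lower()
        if name.startswith(("dc.", "dcterms.")):
            self.dublin_core.setdefault(name, []).append(content)
        if prop.startswith("og:"):
            self.open_graph.setdefault(prop, []).append(content)


def extract_metadata(html):
    """Return (dublin_core, open_graph) dictionaries for one HTML document."""
    collector = MetaTagCollector()
    collector.feed(html)
    return collector.dublin_core, collector.open_graph
```

Values are kept as lists because a page may repeat an element, for example several `dc.subject` entries. Applied to a page that carries both vocabularies, the two dictionaries would show the same title or description expressed twice, once per vocabulary.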
https://lov.linkeddata.es/dataset/lov/

Figure 19: http://web.archive.org/web/19980715030423/http://jodi.ecs.soton.ac.uk/jodi/Articles/v01/i01/BernersLee/ produces a 404 since this page was not on the surface web in 1998, and since the server jodi.ecs.soton.ac.uk is no longer on the live web, we cannot patch the archive.

Figure 20: In 2000 JoDI changed the URIs and removed login restrictions. http://web.archive.org/web/20000830084738/http://jodi.ecs.soton.ac.uk/Articles/v01/i01/BernersLee/ .

Figure 21: The same article on the live web in 2020. https://journals.tdl.org/jodi/index.php/jodi/article/view/3/3 .

Figure 22: 11M+ hits for a Google search for “dublin core”.

Figure 23: 98K+ hits for a Google Scholar search for “dublin core”.

Table 1: Original 1995 DC elements and the current terms.

    1995 DCMES     Current DCMI Terms
    Subject        Subject
    Title          Title
    Author         Creator
    Publisher      Publisher
    OtherAgent     Contributor
    Date           Date
    ObjectType     Type
    Form           Format
    Identifier     Identifier
    Relation       Relation
    Source         Source
    Language       Language
    Coverage       Coverage
                   Description
                   Rights

The second article, “An Agent-Based Architecture for Digital Libraries” [15], is a high-level summary of the University of Michigan Digital Library (UMDL) project, one of the original six NSF/DARPA/NASA Digital Library Initiative (DLI) projects. The DLI ran from 1994–1998, so the 1995 article only summarizes the earliest results.

The architectural details of the UMDL are academically interesting, but the real value in 2020 is reading the article as a time capsule of the 1990s perception of the Web, DLs, and DL architecture. A quote from near the beginning of the article describes a scenario that we have since seen come to pass:

    The WWW, while it probably contains more information than any single traditional library, is arguably not as useful as a traditional library because it lacks these services (particularly organization and sophisticated search support).
    No one is dismantling their libraries because of the WWW yet.

The envisioned architecture focuses heavily on agents, which navigate a heterogeneous tapestry of distributed repositories on behalf of the user. The model of distributed search was dominant in early DL architecture thinking, and was reflected in the design of search protocols like Z39.50 and WAIS, as well as DLs such as WATERS [73], NCSTRL [21], NTRS [86], and many other ex-
Figure 24: The Library of Congress home page with both Dublin Core (dc.) and Open Graph (og:) support.

    % curl -s https://search.datacite.org/works/10.5281/zenodo.2597274