How many years can a tiny unbalanced parenthesis go unnoticed on a widely accessed Internet document, older than the World Wide Web itself ?
Michele Finelli
January 12, 2021
Abstract

Reading the RFC Index we noticed a syntax error: a tiny unbalanced parenthesis in one of the first paragraphs, as reported in table 1. For example, the file available at still has this issue, as of today.

The error is clearly a minor one, but we realized it was also present in older releases of the same document: for example, looking at the Wayback Machine of https://web.archive.org, we found that the error is already present in the first capture, dated 15 June 2007. Also, on the same IETF site, there is a PDF file dating back to 2012-04-06 10:17 that shows the same issue; the PDF, by the way, reports “CREATED ON: 04/04/2012.”, so it has probably been sitting in that web folder since April 2012.

The RFCs are a fundamental part of the Internet documents, the oldest of all: RFC 1, titled “Host Software”, was written by Steve Crocker and published on 7 April 1969. We may assume that, since there was a list of RFCs, there was also a document with the list of the citations to the published RFCs, and that a document like the rfc-index.txt file (maybe by another name) has existed since the ’70s. Because of the relevance of the RFCs we may also assume that the RFC index has been downloaded a lot of times, but probably not read in detail many times, as the present investigation suggests.

As everybody does on their Christmas holidays, at least in civilized lands.

Wrong
Obsoletes xxxx refers to other RFCs that this one replaces; Obsoleted by xxxx refers to RFCs that have replaced this one. Updates xxxx refers to other RFCs that this one merely updates but does not replace); Updated by xxxx refers to RFCs that have updated (but not replaced) this one. Generally, only immediately succeeding and/or preceding RFCs are indicated, not the entire history of each related earlier or later RFC in a related series.
Right
Obsoletes xxxx refers to other RFCs that this one replaces; Obsoleted by xxxx refers to RFCs that have replaced this one. Updates xxxx refers to other RFCs that this one merely updates (but does not replace); Updated by xxxx refers to RFCs that have updated (but not replaced) this one. Generally, only immediately succeeding and/or preceding RFCs are indicated, not the entire history of each related earlier or later RFC in a related series.
Table 1: The sentence is italic in the first block and bold in the second.

This brings us to the question that gives the paper its title: “How many years can a tiny unbalanced parenthesis go unnoticed on a widely accessed Internet document, older than the WWW itself ?” (probably much older than the WWW).
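The error in table 1 is the kind that a mechanical check can detect. What follows is only a minimal sketch in Python, assuming that round parentheses are the only delimiters of interest and that they may nest; the function name `unbalanced_parens` and the shortened sample strings are ours, and this is not part of any RFC Editor tooling.

```python
def unbalanced_parens(text):
    """Return sorted (offset, char) pairs for every unmatched '(' or ')'."""
    open_offsets = []  # stack of offsets of '(' still waiting for a ')'
    unmatched = []
    for offset, ch in enumerate(text):
        if ch == "(":
            open_offsets.append(offset)
        elif ch == ")":
            if open_offsets:
                open_offsets.pop()              # closes the most recent '('
            else:
                unmatched.append((offset, ch))  # nothing left to close
    # Any '(' still on the stack was never closed.
    unmatched.extend((offset, "(") for offset in open_offsets)
    return sorted(unmatched)


# Shortened versions of the two sentences shown in table 1.
wrong = ("Updates xxxx refers to other RFCs that this one merely updates but "
         "does not replace); Updated by xxxx refers to RFCs that have "
         "updated (but not replaced) this one.")
right = wrong.replace("updates but does", "updates (but does")

for label, sentence in (("wrong", wrong), ("right", right)):
    print(label, unbalanced_parens(sentence))
```

For the wrong sentence the function reports a single stray `)`; for the right one it returns an empty list.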
We began our quest for ancient RFC indexes: we wanted to find the precise date when the error was introduced, or the best estimate we could determine.

The main issue is that, while a web search for an rfc-index.txt file returns millions of hits, what we really need are some of the very few archived versions, not one of the many recent ones.

Help came from noticing that rfc-index.txt has some “brother and sister” documents, as stated in RFC2648: the For Your Information fyi-index.txt, the Standards std-index.txt and the
Best Common Practices bcp-index.txt. Moreover, and more importantly for our investigation, current releases of those files show the same mistake, with the exception of the std-index.txt files, because they do not have that paragraph. Besides, these three indexes (RFC, FYI and BCP) show a very similar structure, and in particular the sentence “Obsoletes xxxx (. . . ) in a related series.” is the most conserved region of the header; clearly they differ in the second part of the file, where the proper citations list begins. So, it seems reasonable to infer that they are generated by some software that makes the same mistake whenever it builds the indexes, and that we could use dates and references of either index to pin down the time of change.

To cut a long story short, we believe that there is good evidence that the change happened between the 14th and the 15th of July 1994: the key finding was the FTP server of the Clausthal University of Technology (see the Resources section 5 for the URLs), which contains some indexes that seem to mark exactly the threshold between the latest occurrence of a correct text and the earliest occurrence of the mistake.

2 Analysis

We downloaded the files at the following URLs and analyzed their content and metadata (for readability, the path is relative to ftp://ftp.tu-clausthal.de/pub/docs/rfc/):

rfc-index.txt   /other_indexes/rfc-index.txt
fyi-index.txt   /other_indexes/fyi-index.txt
std-index.txt   /other_indexes/std-index.txt, which is in fact a symbolic link for the file /standards/std-index.txt

Table 2 shows the file size and the date, in ISO 8601 format, of the three files.

Filename        Bytes    Date - ISO 8601 format
fyi-index.txt   186821   1994-07-15 20:58:34.000000000 +0200
rfc-index.txt   222733   1994-07-15 21:32:32.000000000 +0200
std-index.txt   186821   1994-07-15 21:38:13.000000000 +0200

Table 2: Metadata of index files downloaded from TU Clausthal’s FTP server.

The reader may notice a strange thing, namely that fyi-index.txt and std-index.txt have exactly the same size, which is very suspicious since their content should be quite different. Looking at the content, the mystery is solved: all the indexes are in fact RFC indexes, despite what the file names say.

An MD5 comparison of fyi-index.txt and std-index.txt shows that the files are indeed different. An inspection of the files shows that they are the same until line 271: after that, fyi-index.txt has a sequence of ^@ characters, as if it were corrupted (it is worth mentioning that the rfc-index.txt file is similarly corrupted, beginning from line 232). We have not been able to track the source.

So there are two distinct RFC indexes: the first is properly named rfc-index.txt, while the second is referred to by two different misleading names.
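The MD5 comparison and the line-by-line inspection described above can be sketched in Python as follows. This is a hedged illustration, not the procedure actually used for the paper: the helper names are ours, we assume that the ^@ characters are NUL bytes (0x00, which many viewers display as ^@), and the byte strings are tiny synthetic stand-ins for the downloaded files.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest, used here only to tell two files apart."""
    return hashlib.md5(data).hexdigest()


def first_differing_line(a: bytes, b: bytes):
    """1-based number of the first line where a and b differ, or None."""
    lines_a = a.split(b"\n")
    lines_b = b.split(b"\n")
    for number, (x, y) in enumerate(zip(lines_a, lines_b), start=1):
        if x != y:
            return number
    if len(lines_a) != len(lines_b):
        # One file is a strict prefix of the other.
        return min(len(lines_a), len(lines_b)) + 1
    return None


def first_nul_line(data: bytes):
    """1-based number of the first line containing a NUL byte, or None."""
    for number, line in enumerate(data.split(b"\n"), start=1):
        if b"\x00" in line:
            return number
    return None


# Synthetic stand-ins of equal size: one intact, one padded with NUL bytes.
intact = b"RFC INDEX\nline two\nline three\n"
damaged = b"RFC INDEX\nline two\n" + b"\x00" * 10 + b"\n"

print(len(intact) == len(damaged))            # same size, as in table 2
print(md5_hex(intact) == md5_hex(damaged))    # yet different digests
print(first_differing_line(intact, damaged))  # first line that differs
print(first_nul_line(damaged))                # where the NUL run begins
```

To apply it to real data, read the downloaded files in binary mode (`open(path, "rb").read()`) and pass the resulting bytes to the same functions.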
Reading and comparing the file contents, we see that the properly named RFC index has these characteristics:

1. an embedded date at line 4,
2. the right sentence,
3. the list of citations reverse sorted, from the newest to the oldest.

Instead, the misnamed RFC indexes have:

1. the wrong sentence,
2. the list of citations sorted from the oldest to the newest.

The above findings suggest that the rfc-index.txt file was probably created on 14 July 1994, as the embedded date suggests, and copied to the FTP server the day after; it would be useful if the list of citations showed further evidence of the above, but this cannot happen, since the file begins with RFC0001 and then it is corrupted well before it reaches RFCs issued by July 1994.

We assume that the bug was then introduced and the RFC files generated afterwards show the unbalanced parenthesis.

To support this deduction, we see that the wrong files have a date on the FTP server that is 15 July 1994, as reported in table 2; even assuming that the date was changed for some reason, the last RFC shown in std-index.txt (the only file that is not corrupted) is RFC1653, which was issued in July 1994. So, in any case, the std-index.txt file could not have been generated before the beginning of July 1994.

To summarize:

1. it is highly probable that a correct RFC index was generated on 14 July 1994,
2. a mistaken file, which was necessarily created in July 1994 or later, is present on the FTP server of the Clausthal University of Technology with a timestamp of 15 July 1994.

In our opinion the most plausible scenario is that the rfc-index.txt files generated before 14 July 1994 had no error, that the mistake was introduced that day, and that, therefore, the answer to the question posed at the beginning of this paper is: an unbalanced parenthesis may go unnoticed for more than twenty-six years.
We have further evidence that in December 1993 the RFC index was correct, which supports the view that the mistake was not present before 1994.
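The two characteristics used above to tell the indexes apart, the form of the “Updates xxxx” sentence and the direction in which the citations are sorted, can be checked mechanically. The sketch below is only an illustration under stated assumptions: the function names are ours, the header is assumed to be available as plain text, and extracting the RFC numbers from the citation list is left out because we make no claim about the exact citation layout of the 1994 files.

```python
RIGHT = "merely updates (but does not replace)"
WRONG = "merely updates but does not replace)"


def classify_sentence(header: str) -> str:
    """Return 'right', 'wrong' or 'absent' for the 'Updates xxxx' sentence."""
    flat = " ".join(header.split())  # undo line wrapping before searching
    if RIGHT in flat:
        return "right"
    if WRONG in flat:
        return "wrong"
    return "absent"


def citation_order(numbers):
    """Describe the sort direction of a list of RFC numbers."""
    if len(numbers) < 2:
        return "undetermined"
    pairs = list(zip(numbers, numbers[1:]))
    if all(a < b for a, b in pairs):
        return "oldest first"
    if all(a > b for a, b in pairs):
        return "newest first"
    return "unsorted"


# Header fragment wrapped as in the excerpt of the analyzed files.
fragment = ("Updates xxxx refers to other RFCs that this one merely updates but\n"
            "does not replace); Updated by xxxx refers to RFCs that have been\n"
            "updated by this one (but not replaced).")

print(classify_sentence(fragment))
print(citation_order([1, 2, 3]))
print(citation_order([1653, 1652, 1651]))
```

The whitespace normalization matters because the sentence is wrapped across lines in the index files, so a plain substring search on the raw text would miss it.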
There is an empirical law, dubbed Linus’ law, that states that “given enough eyeballs, all bugs are shallow”. It was formulated by Eric S. Raymond in the book “The Cathedral and the Bazaar” and so named in honour of Linus Torvalds, Linux creator.

The law applies to software, not to documentation, and it has been criticized, so there is no clear evidence either of its validity or of its falsity.

Our little investigation would like to bring some evidence towards a better understanding of the idea behind Linus’ law: is it true that simply having content under wide public scrutiny ensures its quality? If we compare syntax errors in documentation to software bugs in computer code (which is not too far-fetched an analogy, in our opinion), then the present paper gives a negative answer.

We have tried to understand whether there is some other reason behind the fact that the error went unfixed for so long, and among the issues we noticed that:

• it is not easy to provide proper feedback for this kind of error: it is possible to report RFC errata, but that is limited to RFCs, and the last resort is emailing the mailto:[email protected] address (we did that on 10 January 2021);
• it is not explained how the indexes are created: we have not been able to find the repository of the software that generates them and file a bug report.

In our opinion the main enabler is not the “number of eyeballs”, to quote the law’s statement, but how easy it is to contribute changes. Clearly open source and free software have this property; the same does not necessarily hold for documentation, even if it is freely available (legally speaking, the RFCs’ licenses are very permissive) and freely distributable.

TU Clausthal’s FTP server
The anonymous FTP server is reachable at the address: ftp://ftp.tu-clausthal.de/pub/docs/rfc/

rfc-index.txt   ftp://ftp.tu-clausthal.de/pub/docs/rfc/other_indexes/rfc-index.txt
fyi-index.txt   ftp://ftp.tu-clausthal.de/pub/docs/rfc/other_indexes/fyi-index.txt
std-index.txt   ftp://ftp.tu-clausthal.de/pub/docs/rfc/standards/std-index.txt

Excerpt of the header of the index files analyzed in this paper.

rfc-index.txt fyi-index.txt

    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    RFC INDEX
    -------------
    This file contains citations for all RFCs in numeric order.
    RFC citations appear in this format:
    author, or list of authors (terminated with a period), and the date
    (terminated with a period).
    The format and byte information follows in parenthesis. The format,
    either ASCII text (TXT) or PostScript (PS) or both, is noted, followed
    by an equals sign and the number of bytes for that version. For
    example (Format: TXT=aaaaa, PS=bbbbbb bytes) shows that the ASCII text
    version is aaaaa bytes, and the PostScript version of the RFC is
    bbbbbb bytes.
    Obsoletes xxxx refers to other RFCs that this one replaces;
    Obsoleted by xxxx refers to RFCs that have replaced this one.
    Updates xxxx refers to other RFCs that this one merely updates but
    does not replace); Updated by xxxx refers to RFCs that have been
    updated by this one (but not replaced). Only immediately succeeding
    and/or preceding RFCs are indicated, not the entire history of each
    related earlier or later RFC in a related series.

std-index.txt

    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    RFC INDEX
    -------------
    This file contains citations for all RFCs in numeric order.
    RFC citations appear in this format:
The author thanks Andrea ‘ap’ Paolini and Guido ‘zen’