C. M. Sperberg-McQueen

Publications

Featured research published by C. M. Sperberg-McQueen.

World Wide Web | 1997

Extensible markup language

Tim Bray; Jean Martin Paoli; C. M. Sperberg-McQueen

XML is the lingua franca of the wireless Web. Its strength is in its generality: XML can describe virtually any kind of structured data. Once described, the data can be presented in other formats. Moreover, XML is already being used for a host of server-server communication applications, which make it possible for different data servers to easily exchange information. The trend toward a common format for representing data will doubtlessly present new opportunities for both Web and wireless Web clients.

International Workshop on Principles of Digital Document Processing | 2000

GODDAG: A Data Structure for Overlapping Hierarchies

C. M. Sperberg-McQueen; Claus Huitfeldt

Notations like SGML and XML represent document structures using tree structures; while this is in general a step forward from earlier systems, it creates certain difficulties for the representation of documents in which the structures of interest are not properly nested. Overlapping structures, discontinuous structures, and material which occurs in different orders in different parts, views, or versions of a document are all problems for SGML and XML. Overlapping structures have received attention from a variety of authors on SGML and XML, who have proposed various solutions including the use of non-SGML notations with translation into SGML for processing, the use of the concur feature of SGML, exploitation of conditional marked sections in the DTD and document instance, the imposition of various kinds of unusual interpretations on SGML/XML elements as milestones or as fragments of some larger ‘virtual’ element, or the use of detailed annotation separate from the base text being annotated.
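The nesting condition that a single tree imposes can be stated precisely: any two elements must either be disjoint, or one must contain the other. The sketch below is a minimal illustration under stated assumptions, not the GODDAG structure itself. Elements are modelled as hypothetical `(name, (start, end))` spans of half-open character offsets, and `properly_nested` reports whether a set of spans could all live in one SGML/XML tree. Spans that fail the test are the overlapping structures that the paper's data structure is meant to hold.

```python
from itertools import combinations

# Hypothetical model: an element is (name, (start, end)) with half-open
# character offsets into the text it marks up.

def overlaps(a, b):
    """True if spans a and b intersect but neither contains the other."""
    (s1, e1), (s2, e2) = a, b
    intersect = s1 < e2 and s2 < e1
    nested = (s1 <= s2 and e2 <= e1) or (s2 <= s1 and e1 <= e2)
    return intersect and not nested

def properly_nested(elements):
    """True if every pair of elements nests or is disjoint, so one tree suffices."""
    return not any(
        overlaps(span_a, span_b)
        for (_, span_a), (_, span_b) in combinations(elements, 2)
    )

text = "one two three four"
lines = [("line", (0, 7)), ("line", (8, 18))]   # "one two" / "three four"
sentences = [("s", (4, 13))]                    # "two three" crosses the line break

print(properly_nested(lines))              # True
print(properly_nested(lines + sentences))  # False
```

Verse lines and sentences are each well nested on their own; only when both hierarchies are put over the same text does a single tree become impossible.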

Document Engineering | 2002

Towards a semantics for XML markup

Allen H. Renear; David Dubin; C. M. Sperberg-McQueen

Although XML Document Type Definitions provide a mechanism for specifying, in machine-readable form, the syntax of an XML markup language, there is no comparable mechanism for specifying the semantics of an XML vocabulary. That is, there is no way to characterize the meaning of XML markup so that the facts and relationships represented by the occurrence of XML constructs can be explicitly, comprehensively, and mechanically identified. This has serious practical and theoretical consequences. On the positive side, XML constructs can be assigned arbitrary semantics and used in application areas not foreseen by the original designers. On the less positive side, both content developers and application engineers must rely upon prose documentation or, worse, conjectures about the intention of the markup language designer: a process that is time-consuming, error-prone, incomplete, and unverifiable, even when the language designer properly documents the language. In addition, the lack of a substantial body of research in markup semantics means that digital document processing is undertheorized as an engineering application area. Although there are some related projects underway (XML Schema, RDF, the Semantic Web) which provide relevant results, none of these projects directly and comprehensively addresses the core problems of XML markup semantics. This paper (i) summarizes the history of the concept of markup meaning, (ii) characterizes the specific problems that motivate the need for a formal semantics for XML, and (iii) describes an ongoing research project, the BECHAMEL Markup Semantics Project, that is attempting to develop such a semantics.

International World Wide Web Conferences | 1995

HTML to the max: a manifesto for adding SGML intelligence to the World-Wide Web

C. M. Sperberg-McQueen; Robert F. Goldstein

HTML demonstrates that SGML markup is useful for networked information. How can it be made even more useful? One way is to extend the tag set from HTML to HTML2, etc. We argue here for a more radical approach: full SGML awareness in WWW. We believe the difficulties are small, the cost affordable, and the advantages overwhelming. SGML is a metalanguage for defining markup languages; HTML is just one instance of this infinite family. At present, documents in other SGML document types must be translated into HTML for display by a Mosaic client, and sometimes this imposes unacceptable information loss. WWW browsers could handle other SGML document types without translation by launching a general-purpose SGML browser to view them, as they now launch graphics viewers; a better solution overall would be to build SGML display into the WWW browsers themselves. Either way, display of an SGML document would be controlled by a style sheet using a small number of display primitives (“bold”, “line break”, etc.) to specify the rendition of each element type. For “well-known” document type definitions (DTDs) like HTML, style sheets could be distributed with the browser, or built in. For other DTDs, the browser would fetch a style sheet from the server. Using style sheets, browser software can also make it easy to customize document display. DTDs and style sheets can be designed to accommodate extensions, ensuring that authors can make small extensions to the tag set with no change whatsoever in the target browsers and virtually no performance penalty.
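The style-sheet mechanism proposed here, in which each element type is mapped to one of a small number of display primitives, can be sketched in a few lines. This is a hedged illustration, not the authors' design: it uses XML and Python's standard `xml.etree.ElementTree` parser as a stand-in for a full SGML parser, and the primitive names and the plain-text markers they produce (`*bold*`, `_italic_`) are hypothetical. Element types missing from the style sheet fall back to plain text, which is one way small tag-set extensions could pass through a browser unchanged.

```python
import xml.etree.ElementTree as ET

# Hypothetical style sheet: element type -> display primitive.
STYLE = {"title": "bold", "emph": "italic", "p": "line-break"}

def render(elem, style):
    """Render an element tree as plain text using only a few display primitives."""
    # Element text, then each child's rendering followed by the text after it.
    inner = (elem.text or "") + "".join(
        render(child, style) + (child.tail or "") for child in elem
    )
    primitive = style.get(elem.tag)  # unknown element types get no special display
    if primitive == "bold":
        return "*" + inner + "*"
    if primitive == "italic":
        return "_" + inner + "_"
    if primitive == "line-break":
        return inner + "\n"
    return inner

doc = ET.fromstring("<doc><title>Notes</title><p>One <emph>two</emph></p><p>Three</p></doc>")
print(repr(render(doc, STYLE)))  # '*Notes*One _two_\nThree\n'
```

Swapping in a different `STYLE` dictionary changes the display without touching `render`, which mirrors the customization the abstract attributes to style sheets.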

Computers and The Humanities | 1995

Hierarchical encoding of text: Technical problems and SGML solutions

David T. Barnard; Lou Burnard; Jean-Pierre Gaspart; Lynne A. Price; C. M. Sperberg-McQueen; Giovanni Battista Varile

One recurring theme in the TEI project has been the need to represent non-hierarchical information in a natural way (or at least in a way that is acceptable to those who must use it) using a technical tool that assumes a single hierarchical representation. This paper proposes solutions to a variety of such problems: the encoding of segments which do not reflect a document’s primary hierarchy; relationships among non-adjacent segments of text; ambiguous content; overlapping structures; parallel structures; cross-references; and vague locations.

Archive | 1994

The Text Encoding Initiative

C. M. Sperberg-McQueen

This paper describes the goals and work of the Text Encoding Initiative, an international cooperative project to develop and disseminate guidelines for the encoding and interchange of electronic text for research purposes. It begins by outlining some basic problems which arise in the attempt to represent textual material in computers, and some problems which arise in the attempt to encourage the sharing and reuse of electronic textual resources. These problems provide the necessary background for a brief review of the origins and organization of the Text Encoding Initiative itself. Next, the paper describes the rationale for the decision of the TEI to use the Standard Generalized Markup Language (SGML) as the basis for its work. Finally, the work accomplished by the TEI is described in general terms, and some attempt is made to clarify what the project has and has not accomplished.

Literary and Linguistic Computing | 2003

A Logic Programming Environment for Document Semantics and Inference

David Dubin; Allen H. Renear; C. M. Sperberg-McQueen; Claus Huitfeldt

Markup licenses inferences about a text. But the information warranting such inferences may not be entirely explicit in the syntax of the markup language used to encode the text. This paper describes a Prolog environment for exploring alternative approaches to representing facts and rules of inference about structured documents. It builds on earlier work proposing an account of how markup licenses inferences, and of what is needed in a specification of the meaning of a markup language. Our system permits an analyst to specify facts and rules of inference about domain entities and properties as well as facts about the markup syntax, and to construct and test alternative approaches to translation between representation layers. The system provides a level of abstraction at which the performative or interpretive meaning of the markup can be explicitly represented in machine-readable and executable form.
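The layering described here, where facts about markup syntax license facts about domain entities, can be sketched outside Prolog. The following is a hedged Python illustration, not the authors' system: the predicate names (`element`, `child`, `content`, `title_of`, `work`) and both rules are hypothetical, and the loop applies rules forward until nothing new is derived, whereas Prolog answers queries by backward chaining.

```python
# Facts are tuples whose first item names the predicate.
# Syntax layer (hypothetical): element types, parent links, and text content.
FACTS = {
    ("element", "e1", "book"),
    ("element", "e2", "title"),
    ("child", "e2", "e1"),          # e2 is a child of e1
    ("content", "e2", "Moby-Dick"),
}

def title_rule(facts):
    """Markup to domain: a <title> child of a <book> gives the book's title."""
    types = {f[1]: f[2] for f in facts if f[0] == "element"}
    text = {f[1]: f[2] for f in facts if f[0] == "content"}
    return {
        ("title_of", parent, text[kid])
        for (_, kid, parent) in (f for f in facts if f[0] == "child")
        if types.get(kid) == "title" and types.get(parent) == "book" and kid in text
    }

def work_rule(facts):
    """Domain to domain: anything with a title is treated as a work."""
    return {("work", f[1]) for f in facts if f[0] == "title_of"}

def forward_chain(facts, rules):
    """Apply every rule repeatedly until no new facts appear (a fixed point)."""
    known = set(facts)
    while True:
        new = set()
        for rule in rules:
            new |= rule(known) - known
        if not new:
            return known
        known |= new

derived = forward_chain(FACTS, [title_rule, work_rule]) - FACTS
print(sorted(derived))  # [('title_of', 'e1', 'Moby-Dick'), ('work', 'e1')]
```

Keeping syntax-layer facts and domain-layer facts in separate rules makes it possible to swap in an alternative translation between the layers and compare the results, which is the kind of experimentation the abstract describes.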

Computers and The Humanities | 1995

The TEI: History, Goals, and Future

Nancy Ide; C. M. Sperberg-McQueen

This paper traces the history of the Text Encoding Initiative, through the Vassar Conference and the Poughkeepsie Principles, to the publication in May 1994 of the Guidelines for Electronic Text Encoding and Interchange. The authors explain the types of questions that were raised, the attempts made to resolve them, the TEI project’s aims, and the general organization of the TEI committees, and they discuss the project’s future.

Computers and The Humanities | 1995

The Design of the TEI Encoding Scheme

C. M. Sperberg-McQueen; Lou Burnard

This paper discusses the basic design of the encoding scheme described by the Text Encoding Initiative’s Guidelines for Electronic Text Encoding and Interchange (TEI document number TEI P3, hereafter simply P3 or the Guidelines). It first reviews the basic design goals of the TEI project and their development during the course of the project. Next, it outlines some basic notions relevant for the design of any markup language and uses those notions to describe the basic structure of the TEI encoding scheme. It also describes briefly the “core” tag set defined in chapter 6 of P3, and the “default text structure” defined in chapter 7 of that work. The final section of the paper attempts an evaluation of P3 in the light of its original design goals, and outlines areas in which further work is still needed.

ACM/IEEE Joint Conference on Digital Libraries | 2003

XML semantics and digital libraries

Allen H. Renear; David Dubin; C. M. Sperberg-McQueen; Claus Huitfeldt

The lack of a standard formalism for expressing the semantics of an XML vocabulary is a major obstacle to the development of high-function interoperable digital libraries. XML document type definitions (DTDs) provide a mechanism for specifying the syntax of an XML vocabulary, but there is no comparable mechanism for specifying the semantics of that vocabulary, where semantics simply means the basic facts and relationships represented by the occurrence of XML constructs. A substantial loss of functionality and interoperability in digital libraries results from not having a common machine-readable formalism for expressing these relationships for the XML vocabularies currently being used to encode content. Recently a number of projects and standards have begun taking up related topics. We describe the problem and our own project.
