Publication


Featured research published by Robert Sanderson.


PLOS ONE | 2014

Scholarly context not found: One in five articles suffers from reference rot

Martin Klein; Herbert Van de Sompel; Robert Sanderson; Harihar Shankar; Lyudmila Balakireva; Ke Zhou; Richard Tobin

The emergence of the web has fundamentally affected most aspects of information communication, including scholarly communication. The immediacy that characterizes publishing information to the web, as well as accessing it, allows for a dramatic increase in the speed of dissemination of scholarly knowledge. However, the transition from a paper-based to a web-based scholarly communication system also poses challenges. In this paper, we focus on reference rot, the combination of link rot and content drift to which references to web resources included in Science, Technology, and Medicine (STM) articles are subject. We investigate the extent to which reference rot impacts the ability to revisit the web context that surrounds STM articles some time after their publication. We do so on the basis of a vast collection of articles from three corpora that span publication years 1997 to 2012. For over one million references to web resources extracted from over 3.5 million articles, we determine whether the HTTP URI is still responsive on the live web and whether web archives contain an archived snapshot representative of the state the referenced resource had at the time it was referenced. We observe that the fraction of articles containing references to web resources is growing steadily over time. We find one out of five STM articles suffering from reference rot, meaning it is impossible to revisit the web context that surrounds them some time after their publication. When only considering STM articles that contain references to web resources, this fraction increases to seven out of ten. We suggest that, in order to safeguard the long-term integrity of the web-based scholarly record, robust solutions to combat the reference rot problem are required. In conclusion, we provide a brief insight into the directions that are explored in this regard in the context of the Hiberlink project.
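
As a rough illustration of the per-reference check described above, the Python sketch below classifies a web reference from two observations: its live-web HTTP status and the capture times of its archived snapshots. The `WebReference` type, the 7-day representativeness window, the rule that only a final status of 200 counts as responsive, and the article-level rule are all assumptions made for this example; the abstract does not state the paper's exact criteria.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class WebReference:
    uri: str
    referenced_at: datetime            # assumed: publication date of the citing article
    live_status: Optional[int]         # final HTTP status on the live web; None if unreachable
    memento_datetimes: List[datetime]  # capture times of archived snapshots


def has_representative_memento(ref: WebReference, window: timedelta) -> bool:
    """True if some snapshot was captured within `window` of the reference date."""
    return any(abs(m - ref.referenced_at) <= window for m in ref.memento_datetimes)


def classify(ref: WebReference, window: timedelta = timedelta(days=7)) -> str:
    """Label a reference; the 7-day window is an assumption, not the paper's value."""
    if has_representative_memento(ref, window):
        return "archived"    # the referenced state can still be revisited
    if ref.live_status == 200:
        return "live-only"   # responsive, but the content may have drifted
    return "rotten"          # neither responsive nor representatively archived


def article_has_rotten_reference(refs: List[WebReference]) -> bool:
    """Assumed article-level rule: at least one rotten reference."""
    return any(classify(r) == "rotten" for r in refs)
```

The "live-only" label is kept separate because a responsive URI alone says nothing about content drift.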


ACM/IEEE Joint Conference on Digital Libraries | 2010

Making web annotations persistent over time

Robert Sanderson; Herbert Van de Sompel

As Digital Libraries (DLs) become more aligned with the web architecture, their functional components need to be fundamentally rethought in terms of URIs and HTTP. Annotation, a core scholarly activity enabled by many DL solutions, exhibits a clearly unacceptable characteristic when existing models are applied to the web: because the representations of web resources change over time, an annotation made about a web resource today may no longer be relevant to the representation that is served from that same resource tomorrow. We assume the existence of archived versions of resources and combine the temporal features of the emerging Open Annotation data model with the capability offered by the Memento framework, which allows seamless navigation from the URI of a resource to archived versions of that resource, to arrive at a solution that provides guarantees regarding the persistence of web annotations over time. More specifically, we provide theoretical solutions and proof-of-concept experimental evaluations for two problems: reconstructing an existing annotation so that the correct archived version is displayed for all resources involved in the annotation, and retrieving all annotations that involve a given archived version of a web resource.
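
A minimal sketch of the two problems named above, assuming an annotation should be displayed against the snapshot of its target that was current when the annotation was made, i.e. the latest memento captured at or before the annotation's timestamp. In Memento (RFC 7089) a client asks for a past state via the `Accept-Datetime` header and a TimeGate selects the memento; the selection rule here is an assumed policy, and both function names are hypothetical.

```python
import bisect
from datetime import datetime
from typing import Dict, List, Optional


def memento_at(annotated_at: datetime, mementos: List[datetime]) -> Optional[datetime]:
    """Latest snapshot captured at or before `annotated_at` (mementos sorted ascending).

    Returns None when every snapshot postdates the annotation.
    """
    i = bisect.bisect_right(mementos, annotated_at)
    return mementos[i - 1] if i > 0 else None


def annotations_for_memento(memento: datetime,
                            annotations: Dict[str, datetime],
                            mementos: List[datetime]) -> List[str]:
    """IDs of annotations whose reconstructed target is exactly `memento`."""
    return [anno_id for anno_id, made_at in annotations.items()
            if memento_at(made_at, mementos) == memento]
```

Reconstruction and reverse lookup share one selection rule, so the two answers stay consistent with each other.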


arXiv: Digital Libraries | 2011

The Open Annotation Collaboration (OAC) Model

Bernhard Haslhofer; Rainer Simon; Robert Sanderson; Herbert Van de Sompel

Annotations allow users to associate additional information with existing resources. Using proprietary and closed systems on the Web, users are already able to annotate multimedia resources such as images, audio and video. So far, however, this information is almost always kept locked up and inaccessible to the Web of Data. We believe that an important step to take is the integration of multimedia annotations and the Linked Data principles. This should allow clients to easily publish, consume, and thus exchange annotations about resources via common Web standards. We first present the current status of the Open Annotation Collaboration, an international initiative that is currently working on annotation interoperability specifications based on best practices from the Linked Data effort. Then we present two use cases and early prototypes that make use of the proposed annotation model, present lessons learned, and discuss open technical issues.


Web Science | 2013

Designing the W3C open annotation data model

Robert Sanderson; Paolo Ciccarese; Herbert Van de Sompel

The Open Annotation Core Data Model specifies an interoperable framework for creating associations between related resources, called annotations, using a methodology that conforms to the Architecture of the World Wide Web. Open Annotations can easily be shared between platforms, with sufficient richness of expression to satisfy complex requirements while remaining simple enough to also allow for the most common use cases, such as attaching a piece of text to a single web resource. This paper presents the W3C Open Annotation Community Group specification and the rationale behind the scoping and technical decisions that were made. It also motivates interoperable Annotations via use cases, and provides a brief analysis of the advantages over previous specifications.
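
For the most common case mentioned above, attaching a piece of text to a single web resource, an annotation reduces to a resource that links a body to a target. The sketch below builds one as a JSON-LD-style Python dict using the `oa:` namespace (`http://www.w3.org/ns/oa#`) and its `oa:Annotation`, `oa:hasBody` and `oa:hasTarget` terms. The example.org URIs are hypothetical, and the specification's own JSON-LD context and serialization conventions may differ from this hand-written form.

```python
import json

OA_NS = "http://www.w3.org/ns/oa#"


def make_annotation(anno_id: str, body_uri: str, target_uri: str) -> dict:
    """A minimal annotation linking one body (e.g. a comment) to one target resource."""
    return {
        "@context": {"oa": OA_NS},      # prefix mapping only; not the spec's published context
        "@id": anno_id,
        "@type": "oa:Annotation",
        "oa:hasBody": {"@id": body_uri},
        "oa:hasTarget": {"@id": target_uri},
    }


if __name__ == "__main__":
    anno = make_annotation("http://example.org/anno/1",
                           "http://example.org/comment/1",
                           "http://example.org/page.html")
    print(json.dumps(anno, indent=2))
```

Because body and target are plain URIs, any platform that dereferences them can reuse the annotation, which is the interoperability point the abstract makes.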


Machine Learning and Data Mining in Pattern Recognition | 2007

Statistical Identification of Key Phrases for Text Classification

Frans Coenen; Paul H. Leng; Robert Sanderson; Yanbo J. Wang

Algorithms for text classification generally involve two stages, the first of which aims to identify textual elements (words and/or phrases) that may be relevant to the classification process. This stage often involves an analysis of the text that is both language-specific and possibly domain-specific, and may also be computationally costly. In this paper we examine a number of alternative keyword-generation methods and phrase-construction strategies that identify key words and phrases by simple, language-independent statistical properties. We present results that demonstrate that these methods can produce good classification accuracy, with the best results being obtained using a phrase-based approach.
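
The following is a generic, language-independent sketch in the spirit of the approach described above, not a reproduction of the paper's methods. A word is kept as a keyword if it appears in enough documents and its document frequency is concentrated in a single class. Maximal runs of consecutive keywords in a text are then treated as candidate phrases. The thresholds `min_support` and `min_share` and both function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple


def select_keywords(docs: List[Tuple[str, List[str]]],
                    min_support: int, min_share: float) -> Set[str]:
    """Keep words found in >= min_support documents whose document frequency
    is concentrated (share >= min_share) in one class."""
    per_class: Dict[str, Counter] = defaultdict(Counter)
    total: Counter = Counter()
    for label, tokens in docs:
        for w in set(tokens):  # count documents, not raw occurrences
            per_class[label][w] += 1
            total[w] += 1
    keywords = set()
    for w, n in total.items():
        if n < min_support:
            continue
        best = max(per_class[c][w] for c in per_class)
        if best / n >= min_share:
            keywords.add(w)
    return keywords


def key_phrases(tokens: List[str], keywords: Set[str]) -> List[Tuple[str, ...]]:
    """Group maximal runs of consecutive keywords into phrases."""
    phrases, run = [], []
    for w in tokens:
        if w in keywords:
            run.append(w)
        else:
            if run:
                phrases.append(tuple(run))
            run = []
    if run:
        phrases.append(tuple(run))
    return phrases
```

Nothing here depends on a stemmer, stop-word list or grammar, so the same code applies to any whitespace-tokenized language.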


Multimedia Tools and Applications | 2014

Open annotations on multimedia Web resources

Bernhard Haslhofer; Robert Sanderson; Rainer Simon; Herbert Van de Sompel

Many Web portals allow users to associate additional information with existing multimedia resources such as images, audio, and video. However, these portals are usually closed systems, and user-generated annotations are almost always kept locked up and inaccessible to the Web of Data. We believe that an important step to take is the integration of multimedia annotations and the Linked Data principles. We present the current state of the Open Annotation Model, explain our design rationale, and describe how the model can represent user annotations on multimedia Web resources. Applying this model in Web portals and devices that support user annotations should allow clients to easily publish, consume, and thus exchange annotations on multimedia Web resources via common Web standards.
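
One standard way to point an annotation at part of an image or a stretch of video is a W3C Media Fragments URI. It appends `#t=start,end` (seconds) or `#xywh=x,y,w,h` (pixels) to the media URI and joins several dimensions with `&`. The helper below is a hypothetical sketch that builds such URIs for use as annotation targets. It assumes the input URI has no existing fragment and does not show how the Open Annotation Model itself expresses fragments.

```python
from typing import Optional, Tuple


def media_fragment_uri(uri: str,
                       t: Optional[Tuple[float, float]] = None,
                       xywh: Optional[Tuple[int, int, int, int]] = None) -> str:
    """Append W3C Media Fragments (temporal `t`, spatial `xywh`) to `uri`.

    Assumes `uri` carries no fragment yet.
    """
    parts = []
    if t is not None:
        start, end = t
        parts.append(f"t={start:g},{end:g}")   # seconds, e.g. t=10,20
    if xywh is not None:
        x, y, w, h = xywh
        parts.append(f"xywh={x},{y},{w},{h}")  # pixels from the top-left corner
    return f"{uri}#{'&'.join(parts)}" if parts else uri
```

For example, `media_fragment_uri("http://example.org/clip.mp4", t=(10, 20))` yields `http://example.org/clip.mp4#t=10,20`.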


ACM/IEEE Joint Conference on Digital Libraries | 2011

SharedCanvas: a collaborative model for medieval manuscript layout dissemination

Robert Sanderson; Benjamin Albritton; Rafael Schwemmer; Herbert Van de Sompel

In this paper we present a model based on the principles of Linked Data that can be used to describe the interrelationships of images, texts and other resources to facilitate the interoperability of repositories of medieval manuscripts or other culturally important handwritten documents. The model is designed from a set of requirements derived from the real-world use cases of some of the largest digitized medieval content holders, and instantiations of the model are intended as the input to collection-independent page-turning and scholarly presentation interfaces. A canvas painting paradigm, such as in PDF and SVG, was selected because there is no one-to-one correspondence between image and page, and to fulfill complex requirements such as when the full text of a page is known but only fragments of the physical object remain. The model is implemented using technologies such as OAI-ORE Aggregations and OAC Annotations, as the fundamental building blocks of emerging Linked Digital Libraries. The model and implementation are evaluated through prototypes of both content-providing and content-consuming applications. Although the system was designed from requirements drawn from the medieval manuscript domain, it is applicable to any layout-oriented presentation of images of text.
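
The canvas painting paradigm can be illustrated with a toy data structure. A canvas is an abstract page with its own coordinate space, and images or text are painted onto rectangular regions of it, so a page whose full text is known but of which only a fragment survives can still be represented. The `Canvas` and `Painting` classes below are hypothetical; the actual model expresses these relationships with OAC annotations and OAI-ORE aggregations in RDF.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Region = Tuple[int, int, int, int]  # x, y, width, height in canvas coordinates


@dataclass
class Painting:
    resource: str   # URI of an image or text resource (hypothetical)
    region: Region


@dataclass
class Canvas:
    width: int
    height: int
    paintings: List[Painting] = field(default_factory=list)

    def paint(self, resource: str, region: Region) -> None:
        """Paint a resource onto a region that must lie inside the canvas."""
        x, y, w, h = region
        if x < 0 or y < 0 or x + w > self.width or y + h > self.height:
            raise ValueError("region lies outside the canvas")
        self.paintings.append(Painting(resource, region))

    def resources_at(self, px: int, py: int) -> List[str]:
        """Resources painted over the canvas point (px, py), in painting order."""
        return [p.resource for p in self.paintings
                if p.region[0] <= px < p.region[0] + p.region[2]
                and p.region[1] <= py < p.region[1] + p.region[3]]
```

An image of a surviving fragment can cover only the top of the canvas while a full transcription covers all of it, which is the image-page mismatch the abstract describes.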


Scalable Information Systems | 2006

Indexing and searching tera-scale Grid-Based Digital Libraries

Robert Sanderson; Ray R. Larson

The University of California, Berkeley and the University of Liverpool, in conjunction with the San Diego Supercomputer Center, are developing a framework for Grid-Based Digital Library systems and Information Retrieval Services (Cheshire3) that operates in both single-processor and distributed computing environments. In this paper we discuss some results of testing Grid-based parallel approaches in indexing and retrieval for a variety of information resources, ranging from small test collections like the TREC and INEX collections, to medium-scale metadata collections like Medline and a test version of the University of California Online Union Catalog, MELVYL (with 15 million and 16.5 million records respectively), up to large-scale collections like the US National Archives and Records Administration (NARA) Preservation Prototype. This paper examines our approaches to indexing and retrieving from these collections and the architecture of the system that supports them.
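
A minimal sketch of a partition-and-merge approach to parallel indexing, a common pattern for this kind of system but not necessarily Cheshire3's architecture. Each partition of a collection is indexed independently into an inverted index (term to document IDs), and the partial indexes are merged before Boolean AND retrieval. Lowercased whitespace tokenization and all function names are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Index = Dict[str, List[int]]  # term -> sorted document IDs


def index_partition(docs: List[Tuple[int, str]]) -> Index:
    """Build an inverted index for one partition (one worker's share)."""
    idx: Dict[str, List[int]] = defaultdict(list)
    for doc_id, text in docs:
        for term in sorted(set(text.lower().split())):
            idx[term].append(doc_id)
    return dict(idx)


def merge_indexes(parts: List[Index]) -> Index:
    """Concatenate postings from all partitions and sort them per term."""
    merged: Dict[str, List[int]] = defaultdict(list)
    for part in parts:
        for term, postings in part.items():
            merged[term].extend(postings)
    return {term: sorted(postings) for term, postings in merged.items()}


def search_and(index: Index, query: str) -> List[int]:
    """Documents containing every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)
```

Because `index_partition` calls share no state, they could be dispatched to separate processes or grid nodes (for example with `concurrent.futures.ProcessPoolExecutor.map`); the sketch runs them sequentially for simplicity.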


Concurrency and Computation: Practice and Experience | 2012

A Web-based resource model for scholarship 2.0: object reuse & exchange

Carl Lagoze; Herbert Van de Sompel; Michael L. Nelson; Simeon Warner; Robert Sanderson; Pete Johnston

Digital scholarship offers the opportunity to move beyond the limitations of traditional scholarly publication. Rather than limiting scholarly communication to text-based static documents, the Web makes it possible for scholars to expose and share the full evidence of their research, including data, images, video, and other genres of material. These aggregations of evidence, or compound documents, can then be integrated into a linked data cloud, the basis of Scholarship 2.0: an open environment in which scholars collaborate and build new knowledge on existing scholarship. We present Open Archives Initiative Object Reuse and Exchange (OAI-ORE), a set of standards to identify and describe aggregations of Web resources, thereby making the Scholarship 2.0 vision possible.
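
To make the notion of an aggregation concrete, the sketch below emits, as N-Triples, a minimal OAI-ORE description. In it, a Resource Map (`ore:ResourceMap`) `ore:describes` an `ore:Aggregation` that `ore:aggregates` the constituent resources. The vocabulary terms come from the ORE namespace `http://www.openarchives.org/ore/terms/`. The example.org URIs are hypothetical, and other properties defined by the specification are omitted.

```python
from typing import List, Tuple

ORE = "http://www.openarchives.org/ore/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

Triple = Tuple[str, str, str]


def resource_map(rem_uri: str, agg_uri: str, resources: List[str]) -> List[Triple]:
    """Triples for a Resource Map describing an Aggregation of `resources`."""
    triples = [
        (rem_uri, RDF_TYPE, ORE + "ResourceMap"),
        (rem_uri, ORE + "describes", agg_uri),
        (agg_uri, RDF_TYPE, ORE + "Aggregation"),
    ]
    triples += [(agg_uri, ORE + "aggregates", r) for r in resources]
    return triples


def to_ntriples(triples: List[Triple]) -> str:
    """Serialize URI-only triples, one `<s> <p> <o> .` statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)
```

Keeping the Resource Map and the Aggregation as distinct URIs lets the same compound object be described by several maps, which is the separation ORE is built around.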


International Conference on Innovative Techniques and Applications of Artificial Intelligence | 2007

Frequent Set Meta Mining: Towards Multi-Agent Data Mining

Kamal Ali Albashiri; Frans Coenen; Robert Sanderson; Paul H. Leng

In this paper we describe the concept of Meta ARM in the context of its objectives and challenges and go on to describe and analyse a number of potential solutions. Meta ARM is defined as the process of combining the results of a number of individually obtained Association Rule Mining (ARM) operations to produce a composite result. The typical scenario where this is desirable is multi-agent data mining, where individual agents wish to preserve the security and privacy of their raw data but are prepared to share data mining results. Four Meta ARM algorithms are described: a Brute Force approach, an Apriori approach and two hybrid techniques. A benchmark system is also described to allow for appropriate comparison. A complete analysis of the algorithms is included that considers the effect of the number of data sources, the number of records in the data sets, and the number of attributes represented.
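
The brute-force idea can be sketched as follows, assuming each agent keeps its transactions private but will report its locally frequent itemsets and answer count queries for any itemset. Global support is a weighted average of local supports, so any itemset that is frequent overall must be frequent in at least one source. The union of local results is therefore a complete candidate set, and the global counts then decide which candidates survive. The `Agent` interface and the `max_len` cap are hypothetical, and the paper's Apriori and hybrid variants are not shown.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set


class Agent:
    """A data source that keeps raw transactions private (hypothetical interface)."""

    def __init__(self, transactions: Iterable[Iterable[str]]):
        self._tx = [frozenset(t) for t in transactions]

    def size(self) -> int:
        return len(self._tx)

    def count(self, itemset: FrozenSet[str]) -> int:
        """Number of local transactions containing `itemset`."""
        return sum(1 for t in self._tx if itemset <= t)

    def local_frequent(self, min_sup: float, max_len: int) -> Set[FrozenSet[str]]:
        """Brute-force local mining over every itemset of up to `max_len` items."""
        n = len(self._tx)
        if n == 0:
            return set()
        items = sorted(set().union(*self._tx))
        return {frozenset(c)
                for k in range(1, max_len + 1)
                for c in combinations(items, k)
                if self.count(frozenset(c)) / n >= min_sup}


def meta_arm_brute_force(agents: List[Agent], min_sup: float,
                         max_len: int = 3) -> Set[FrozenSet[str]]:
    """Union the locally frequent itemsets, then keep those frequent globally."""
    candidates = set().union(*(a.local_frequent(min_sup, max_len) for a in agents))
    total = sum(a.size() for a in agents)
    return {s for s in candidates
            if sum(a.count(s) for a in agents) / total >= min_sup}
```

Only itemsets and counts cross agent boundaries, which matches the privacy scenario in the abstract; the cost is one count query per candidate per agent.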

Collaboration


Top Co-Authors

Herbert Van de Sompel

Los Alamos National Laboratory

Carl Lagoze

University of Michigan

Martin Klein

Los Alamos National Laboratory

Frans Coenen

University of Liverpool

Ray R. Larson

University of California

Benjamin Albritton

Royal Netherlands Academy of Arts and Sciences
