
Publications

Featured research published by Joan A. Smith.


Information Processing and Management | 2005

Toward alternative metrics of journal impact: a comparison of download and citation data

Johan Bollen; Herbert Van de Sompel; Joan A. Smith; Richard Luce

We generated networks of journal relationships from citation and download data, and determined journal impact rankings from these networks using a set of social network centrality metrics. The resulting journal impact rankings were compared to the ISI IF. Results indicate that, although social network metrics and ISI IF rankings deviate moderately for citation-based journal networks, they differ considerably for journal networks derived from download data. We believe the results represent a unique aspect of general journal impact that is not captured by the ISI IF. These results furthermore raise questions regarding the validity of the ISI IF as the sole assessment of journal impact, and suggest the possibility of devising impact metrics based on usage information in general.
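The core idea, ranking journals by a social network centrality metric computed over a citation (or download) network, can be illustrated with a small sketch. The snippet below runs power-iteration PageRank, one centrality metric of the kind the paper considers, over an invented toy citation graph; the journal names and links are illustrative, not data from the study.

```python
# Toy citation graph: each journal maps to the journals it cites (invented data).
CITATIONS = {
    "J1": ["J2", "J3"],
    "J2": ["J3"],
    "J3": ["J1"],
    "J4": ["J3"],
}

def pagerank(graph, damping=0.85, iters=100):
    """Power-iteration PageRank: incoming citations raise a journal's rank."""
    nodes = set(graph) | {n for tgts in graph.values() for n in tgts}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for src, tgts in graph.items():
            if tgts:
                share = damping * rank[src] / len(tgts)
                for t in tgts:
                    new[t] += share
            else:
                # Dangling journal: spread its rank uniformly.
                for n in nodes:
                    new[n] += damping * rank[src] / len(nodes)
        rank = new
    return rank

ranks = pagerank(CITATIONS)
ranking = sorted(ranks, key=ranks.get, reverse=True)
```

Running the same procedure over a download-derived network instead of a citation network, then comparing both orderings to the ISI IF ranking, is the comparison the paper performs.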


International Journal on Digital Libraries | 2007

Using the web infrastructure to preserve web pages

Michael L. Nelson; Frank McCown; Joan A. Smith; Martin Klein

To date, most of the focus regarding digital preservation has been on replicating copies of the resources to be preserved from the “living web” and placing them in an archive for controlled curation. Once inside an archive, the resources are subject to careful processes of refreshing (making additional copies to new media) and migrating (conversion to new formats and applications). For small numbers of resources of known value, this is a practical and worthwhile approach to digital preservation. However, due to the infrastructure costs (storage, networks, machines) and more importantly the human management costs, this approach is unsuitable for web scale preservation. The result is that difficult decisions need to be made as to what is saved and what is not saved. We provide an overview of our ongoing research projects that focus on using the “web infrastructure” to provide preservation capabilities for web pages and examine the overlap these approaches have with the field of information retrieval. The common characteristic of the projects is they creatively employ the web infrastructure to provide shallow but broad preservation capability for all web pages. These approaches are not intended to replace conventional archiving approaches, but rather they focus on providing at least some form of archival capability for the mass of web pages that may prove to have value in the future. We characterize the preservation approaches by the level of effort required by the web administrator: web sites are reconstructed from the caches of search engines (“lazy preservation”); lexical signatures are used to find the same or similar pages elsewhere on the web (“just-in-time preservation”); resources are pushed to other sites using NNTP newsgroups and SMTP email attachments (“shared infrastructure preservation”); and an Apache module is used to provide OAI-PMH access to MPEG-21 DIDL representations of web pages (“web server enhanced preservation”).
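One of the techniques named above, lexical signatures for "just-in-time preservation", can be sketched in a few lines: extract the terms that best characterize a page and use them as a search query to relocate the same or similar pages elsewhere on the web. This sketch uses plain term frequency and a tiny stopword list for brevity; published lexical-signature work weights terms against a corpus (TF-IDF), and the sample text is invented.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "of", "and", "to", "in", "is", "for", "on", "by", "from", "s"}

def lexical_signature(text, k=5):
    """Top-k characteristic terms of a page (plain term frequency here;
    real lexical signatures weight terms by TF-IDF against a corpus)."""
    terms = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in terms if t not in STOPWORDS)
    return [term for term, _ in counts.most_common(k)]

page = ("Lazy preservation reconstructs websites from the caches of "
        "search engines; lexical signatures help relocate lost pages "
        "by querying search engines for the page's signature terms.")
sig = lexical_signature(page)
```

The resulting term list would be submitted to a search engine; a strong hit suggests a surviving copy of the lost page.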


European Conference on Research and Advanced Technology for Digital Libraries | 2006

Repository replication using NNTP and SMTP

Joan A. Smith; Martin Klein; Michael L. Nelson

We present the results of a feasibility study using shared, existing, network-accessible infrastructure for repository replication. We utilize the SMTP and NNTP protocols to replicate both the metadata and the content of a digital library, using OAI-PMH to facilitate management of the archival process. We investigate how dissemination of repository contents can be piggybacked on top of existing email and Usenet traffic. Long-term persistence of the replicated repository may be achieved thanks to current policies and procedures which ensure that email messages and news posts are retrievable for evidentiary and other legal purposes for many years after the creation date. While the preservation issues of migration and emulation are not addressed with this approach, it does provide a simple method of refreshing content with unknown partners for smaller digital repositories that do not have the administrative resources for more sophisticated solutions.
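The SMTP side of this replication scheme amounts to wrapping each repository record as an ordinary email message so that existing mail infrastructure carries and retains it. The sketch below does that with Python's standard `email` package; the `X-OAI-Identifier` header name and addresses are illustrative inventions, not a convention from the paper.

```python
from email.message import EmailMessage
from email import message_from_bytes

def package_record(identifier, metadata_xml, content, archive_addr):
    """Wrap one repository record as an email message for SMTP replication.
    The X-* header name is illustrative, not a published convention."""
    msg = EmailMessage()
    msg["To"] = archive_addr
    msg["Subject"] = f"Repository record {identifier}"
    msg["X-OAI-Identifier"] = identifier
    msg.set_content(metadata_xml)  # OAI-PMH metadata travels as the body
    msg.add_attachment(content, maintype="application",
                       subtype="octet-stream", filename="record.bin")
    return msg

msg = package_record("oai:repo:42", "<oai_dc:dc>...</oai_dc:dc>",
                     b"resource bytes", "archive@example.org")
# A replica recovers the record simply by parsing the stored message:
parsed = message_from_bytes(bytes(msg))
recovered = parsed.get_payload(1).get_payload(decode=True)
```

Handing `msg` to `smtplib` (or posting an equivalent article via NNTP) is then all the "replication" the sender performs; retention falls out of the mail and news systems' own archival policies, as the abstract notes.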


European Conference on Research and Advanced Technology for Digital Libraries | 2008


A Quantitative Evaluation of Dissemination-Time Preservation Metadata

Joan A. Smith; Michael L. Nelson

One of many challenges facing web preservation efforts is the lack of metadata available for web resources. In prior work, we proposed a model that takes advantage of a site's own web server to prepare its resources for preservation. When responding to a request from an archiving repository, the server applies a series of metadata utilities, such as JHOVE and Exif, to the requested resource. The output from each utility is included in the HTTP response along with the resource itself. This paper addresses the question of feasibility: Is it in fact practical to use the site's web server as a just-in-time metadata generator, or does the extra processing create an unacceptable deterioration in the server's responsiveness to routine requests? Our tests indicate that (a) this approach can work effectively for both the crawler and the server; and that (b) utility selection is an important factor in overall performance.
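The dissemination-time model can be sketched as a pipeline: run each configured utility over the requested resource and return the collected output alongside the resource. The sketch below substitutes stdlib stand-ins (`hashlib`, `mimetypes`) for the real external tools such as JHOVE; the utility names and response shape are illustrative.

```python
import hashlib
import mimetypes

# Stand-in "metadata utilities"; the paper's server invokes external
# tools such as JHOVE and Exif on the resource instead.
UTILITIES = {
    "sha256": lambda name, data: hashlib.sha256(data).hexdigest(),
    "size-bytes": lambda name, data: len(data),
    "guessed-type": lambda name, data: mimetypes.guess_type(name)[0],
}

def disseminate(name, data):
    """Run every utility over the resource and return the resource
    together with the collected metadata, as the responding server would."""
    metadata = {util: fn(name, data) for util, fn in UTILITIES.items()}
    return {"resource": data, "metadata": metadata}

response = disseminate("report.pdf", b"%PDF-1.4 ...")
```

The paper's feasibility question is exactly the cost of the `UTILITIES` loop under load: each additional utility adds per-request latency, which is why utility selection dominates overall performance.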


ACM/IEEE Joint Conference on Digital Libraries | 2007

Generating best-effort preservation metadata for web resources at time of dissemination

Joan A. Smith; Michael L. Nelson

HTTP and MIME, while sufficient for contemporary webpage access, do not provide enough forensic information to enable the long-term preservation of the resources they describe and transport. But what if the originating web server automatically provided preservation metadata encapsulated with the resource at time of dissemination? Perhaps the ingestion process could be streamlined, with additional forensic metadata available to future information archeologists. We have adapted an Apache web server implementation of OAI-PMH which can utilize third-party metadata analysis tools to provide a metadata-rich description of each resource. The resource and its forensic metadata are packaged together as a complex object, expressed in plain ASCII and XML. The result is a CRATE: a self-contained preservation-ready version of the resource, created at time of dissemination.
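The shape of a CRATE, a resource and its forensic metadata packaged together as one self-contained XML complex object, can be sketched with the stdlib as below. The element names here are invented for illustration; the actual CRATE work expresses the complex object in MPEG-21 DIDL.

```python
import base64
import xml.etree.ElementTree as ET

def build_crate(uri, content, metadata):
    """Package a resource and its forensic metadata as one XML object.
    Element names are illustrative; the paper uses MPEG-21 DIDL."""
    crate = ET.Element("crate", {"uri": uri})
    for name, value in metadata.items():
        field = ET.SubElement(crate, "metadata", {"name": name})
        field.text = value
    res = ET.SubElement(crate, "resource", {"encoding": "base64"})
    res.text = base64.b64encode(content).decode("ascii")
    return ET.tostring(crate, encoding="unicode")

xml_doc = build_crate("http://example.org/page", b"<html>hi</html>",
                      {"content-type": "text/html"})
# The resource is recoverable from the self-contained package alone:
root = ET.fromstring(xml_doc)
original = base64.b64decode(root.find("resource").text)
```

Base64-encoding the payload keeps the whole package plain ASCII, matching the abstract's point that a CRATE is self-contained and preservation-ready at time of dissemination.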


Web Information and Data Management | 2006

Lazy preservation: reconstructing websites by crawling the crawlers

Frank McCown; Joan A. Smith; Michael L. Nelson


Web Information and Data Management | 2006

Efficient, automatic web resource harvesting

Michael L. Nelson; Joan A. Smith; Ignacio Garcia del Campo


D-Lib Magazine | 2008

Creating Preservation-Ready Web Resources

Joan A. Smith; Michael L. Nelson


arXiv: Information Retrieval | 2005

Reconstructing Websites for the Lazy Webmaster

Frank McCown; Joan A. Smith; Michael L. Nelson; Johan Bollen


Digital Government Research | 2006

Repository replication using SMTP and NNTP

Michael L. Nelson; Joan A. Smith; Martin Klein

Collaboration

Top co-authors of Joan A. Smith:

Martin Klein (Los Alamos National Laboratory)
Johan Bollen (Indiana University Bloomington)
Herbert Van de Sompel (Los Alamos National Laboratory)
Richard Luce (Los Alamos National Laboratory)