John Kunze
University of California, Berkeley
Publications
Featured research published by John Kunze.
Symposium on Operating Systems Principles | 1985
John K. Ousterhout; Herve Da Costa; David Harrison; John Kunze; Michael D. Kupfer; James Thompson
We analyzed the UNIX 4.2BSD file system by recording activity in trace files and writing programs to analyze the traces. The trace analysis shows that the average file system bandwidth needed per user is low (a few hundred bytes per second). Most of the files accessed are short, are open a short time, and are accessed sequentially. Most new information is deleted or overwritten within a few minutes of its creation. We wrote a simulator that uses the traces to predict the performance of caches for disk blocks. The moderate-sized caches used in UNIX reduce disk traffic by about 50%, but larger caches (several megabytes) can achieve much greater reductions, eliminating 90% or more of all disk traffic. With those large caches, large block sizes (16 kbytes or more) result in the fewest disk accesses.
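As a rough illustration of trace-driven cache simulation, the sketch below replays a list of block numbers through a fixed-size cache and reports the fraction of accesses that would go to disk. The abstract does not describe the simulator's design, so the LRU replacement policy, the function name `simulate_lru_cache`, and the synthetic trace are all assumptions made for this example.

```python
from collections import OrderedDict


def simulate_lru_cache(trace, cache_blocks):
    """Replay block numbers through an LRU cache and return the miss ratio.

    LRU is an assumed policy for illustration; the abstract does not say
    which replacement policy the original simulator used.
    """
    cache = OrderedDict()
    misses = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)  # mark as most recently used
        else:
            misses += 1  # a miss stands for one disk access
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict the least recently used
    return misses / len(trace) if trace else 0.0


# Hypothetical trace: a sequential pass over 8 blocks, repeated 4 times.
trace = list(range(8)) * 4
small = simulate_lru_cache(trace, cache_blocks=4)
large = simulate_lru_cache(trace, cache_blocks=8)
print(f"4-block cache miss ratio: {small:.2f}")
print(f"8-block cache miss ratio: {large:.2f}")
```

On this toy trace the cache that holds the whole working set misses only on the first pass, while the smaller cache misses on every access. This is only a toy analogue of the size effect reported above.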
ZooKeys | 2015
Robert P. Guralnick; Nico Cellinese; John Deck; Richard L. Pyle; John Kunze; Lyubomir Penev; Ramona L. Walls; Gregor Hagedorn; Donat Agosti; John Wieczorek; Terry Catapano; Roderic D. M. Page
Biodiversity data is being digitized and made available online at a rapidly increasing rate but current practices typically do not preserve linkages between these data, which impedes interoperation, provenance tracking, and assembly of larger datasets. For data associated with biocollections, the biodiversity community has long recognized that an essential part of establishing and preserving linkages is to apply globally unique identifiers at the point when data are generated in the field and to persist these identifiers downstream, but this is seldom implemented in practice. There has neither been coalescence towards one single identifier solution (as in some other domains), nor even a set of recommended best practices and standards to support multiple identifier schemes sharing consistent responses. In order to further progress towards a broader community consensus, a group of biocollections and informatics experts assembled in Stockholm in October 2014 to discuss community next steps to overcome current roadblocks. The workshop participants divided into four groups focusing on: identifier practice in current field biocollections; identifier application for legacy biocollections; identifiers as applied to biodiversity data records as they are published and made available in semantically marked-up publications; and cross-cutting identifier solutions that bridge across these domains. The main outcome was consensus on key issues, including recognition of differences between legacy and new biocollections processes, the need for identifier metadata profiles that can report information on identifier persistence missions, and the unambiguous indication of the type of object associated with the identifier. Current identifier characteristics are also summarized, and an overview of available schemes and practices is provided.
PLOS Biology | 2017
Julie McMurry; Nick Juty; Niklas Blomberg; Tony Burdett; Tom Conlin; Nathalie Conte; Mélanie Courtot; John Deck; Michel Dumontier; Donal Fellows; Alejandra Gonzalez-Beltran; Philipp Gormanns; Jeffrey S. Grethe; Janna Hastings; Jean-Karim Hériché; Henning Hermjakob; Jon Ison; Rafael C. Jimenez; Simon Jupp; John Kunze; Camille Laibe; Nicolas Le Novère; James Malone; María Martín; Johanna McEntyre; Chris Morris; Juha Muilu; Wolfgang Müller; Philippe Rocca-Serra; Susanna-Assunta Sansone
In many disciplines, data are highly decentralized across thousands of online databases (repositories, registries, and knowledgebases). Wringing value from such databases depends on the discipline of data science and on the humble bricks and mortar that make integration possible; identifiers are a core component of this integration infrastructure. Drawing on our experience and on work by other groups, we outline 10 lessons we have learned about the identifier qualities and best practices that facilitate large-scale data integration. Specifically, we propose actions that identifier practitioners (database providers) should take in the design, provision and reuse of identifiers. We also outline the important considerations for those referencing identifiers in various circumstances, including by authors and data generators. While the importance and relevance of each lesson will vary by context, there is a need for increased awareness about how to avoid and manage common identifier problems, especially those related to persistence and web-accessibility/resolvability. We focus strongly on web-based identifiers in the life sciences; however, the principles are broadly relevant to other disciplines.
International Journal on Digital Libraries | 2005
José Luis Borbinha; John Kunze; Angela Spinazzè; Peter Mutschke; Hans-Jörg Lieder; Michael Mabe; Larry E. Dixson; Howard Besser; Becky Dean; Warwick Cathro
This article summarizes the discussions of the DELOS/NSF Working Group that reviewed current research and existing practice to better understand the ways in which actors and their roles are perceived within the digital library community. Definitions given to new roles depend too often on the narrow, subjective perspective of a local context. The current situation makes it difficult to understand objectively the key actor/role issues that arise in individual cases and also to perform comparative analysis between different cases. This work brings to light several issues that warrant further research and underscores the community’s need for formal and objective reference models for the description of actors and their roles in digital libraries.
F1000Research | 2014
Carly Strasser; John Kunze; Stephen Abrams; Patricia Cruse
Scientific datasets have immeasurable value, but they lose their value over time without proper documentation, long-term storage, and easy discovery and access. Across disciplines as diverse as astronomy, demography, archeology, and ecology, large numbers of small heterogeneous datasets (i.e., the long tail of data) are especially at risk unless they are properly documented, saved, and shared. One unifying factor for many of these at-risk datasets is that they reside in spreadsheets. In response to this need, the California Digital Library (CDL) partnered with Microsoft Research Connections and the Gordon and Betty Moore Foundation to create the DataUp data management tool for Microsoft Excel. Many researchers creating these small, heterogeneous datasets use Excel at some point in their data collection and analysis workflow, so we were interested in developing a data management tool that fits easily into those workflows and minimizes the learning curve for researchers. The DataUp project began in August 2011. We first formally assessed the needs of researchers by conducting surveys and interviews of our target research groups: earth, environmental, and ecological scientists. We found that, on average, researchers had very poor data management practices, were not aware of data centers or metadata standards, and did not understand the benefits of data management or sharing. Based on our survey results, we composed a list of desirable components and requirements and solicited feedback from the community to prioritize potential features of the DataUp tool. These requirements were then relayed to the software developers, and DataUp was successfully launched in October 2012.
Scientific Data | 2018
Sarala M. Wimalaratne; Nick Juty; John Kunze; Greg Janée; Julie McMurry; Niall Beard; Rafael C. Jimenez; Jeffrey S. Grethe; Henning Hermjakob; Maryann E. Martone; Timothy W.I. Clark
Most biomedical data repositories issue locally unique accession numbers, but do not provide globally unique, machine-resolvable, persistent identifiers for their datasets, as required by publishers wishing to implement data citation in accordance with widely accepted principles. Local accessions may, however, be prefixed with a namespace identifier, providing global uniqueness. Such “compact identifiers” have been widely used in biomedical informatics to support global resource identification with local identifier assignment. We report here on our project to provide robust support for machine-resolvable, persistent compact identifiers in biomedical data citation, by harmonizing the Identifiers.org and N2T.net (Name-To-Thing) meta-resolvers and extending their capabilities. Identifiers.org services hosted at the European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), and N2T.net services hosted at the California Digital Library (CDL), can now resolve any given identifier from over 600 source databases to its original source on the Web, using a common registry of prefix-based redirection rules. We believe these services will be of significant help to publishers and others implementing persistent, machine-resolvable citation of research data.
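The idea of prefix-based redirection can be sketched in a few lines: split a compact identifier into its namespace prefix and local accession, look the prefix up in a registry, and fill the accession into a URL template. Everything concrete below is an assumption for illustration: the registry contents, the URL templates, and the `resolve` function are hypothetical and do not reflect the actual Identifiers.org or N2T.net rules or APIs.

```python
# Hypothetical registry: these prefixes and URL templates are illustrative
# only and are not taken from the Identifiers.org / N2T.net registry.
REGISTRY = {
    "exampledb": "https://example.org/records/{id}",
    "sampleseq": "https://example.com/seq?acc={id}",
}


def resolve(compact_id, registry=REGISTRY):
    """Map a 'prefix:accession' compact identifier to a target URL."""
    # Split on the first colon only, so accessions may contain colons.
    prefix, sep, accession = compact_id.partition(":")
    if not sep or not prefix or not accession:
        raise ValueError(f"not a compact identifier: {compact_id!r}")
    # Assumption: prefixes are matched case-insensitively.
    template = registry.get(prefix.lower())
    if template is None:
        raise KeyError(f"unknown prefix: {prefix!r}")
    return template.format(id=accession)


print(resolve("exampledb:AB123"))
```

Because the local accession is assigned by the source database and only the prefix is registered centrally, a new repository would need just one registry entry in this sketch to become resolvable.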
Archive | 1991
John K. Ousterhout; Herve Da Costa; David Harrison; John Kunze; Michael D. Kupfer; James Thompson
Symposium on Operating Systems Principles | 1991
John K. Ousterhout; Herve Da Costa; David Harrison; John Kunze; Mike Kupfer
D-Lib Magazine | 2011
William K. Michener; Dave Vieglais; Todd Vision; John Kunze; Patricia Cruse; Greg Janée
International Journal of Digital Curation | 2010
Stephen Abrams; John Kunze; David Loy