P. Bryan Heidorn
University of Arizona
Publication
Featured research published by P. Bryan Heidorn.
Journal of Library Administration | 2011
P. Bryan Heidorn
ABSTRACT The role of libraries is to collect, preserve, and disseminate the intellectual output of society. This output includes books and serials as well as their digital versions. Scientists, other scholars, and all of society are now producing, storing, and disseminating the digital data that underpin these documents, in much larger volumes than the text itself. The survival of these data is in question because they are not housed in long-lived institutions such as libraries. This situation threatens the underlying principle of scientific replicability, since in many cases the data cannot readily be collected again. Libraries are the institutions best positioned to manage this intellectual output.
ASIS&T '10 Proceedings of the 73rd ASIS&T Annual Meeting on Navigating Streams in an Information Ecosystem - Volume 47 | 2010
Jing Cheng; Xiao Hu; P. Bryan Heidorn
User satisfaction, though difficult to measure, is the main goal of Information Retrieval (IR) systems. In recent years, as Interactive Information Retrieval (IIR) systems have become increasingly popular, user effectiveness also has become critical in evaluating IIR systems. However, existing measures in IR evaluation are not particularly suitable for gauging user satisfaction and user effectiveness. In this paper, we propose two new measures to evaluate IIR systems, the Normalized Task Completion Time (NT) and the Normalized User Effectiveness (NUE). The two measures overcome limitations of existing measures and are efficient to calculate in that they do not need a large pool of search tasks. A user study was conducted to investigate the relationships between the two measures and the user satisfaction and effectiveness of a given IR system. The learning effects described by NT, NUE, and the task completion time were also studied and compared. The results show that NT is strongly correlated with user satisfaction, NUE is a better indicator of system effectiveness than task completion time, and both new measures are superior to task completion time in describing the learning effect of the given IR system.
Archive | 2013
P. Bryan Heidorn; Qianjin Zhang
The LABELX (Label Annotation through Biodiversity Enhanced Learning) system is an extension of the HERBIS NLP system reported previously (Heidorn & Wei, 2008). The objective of the system is to formally structure the output of Optical Character Recognition (OCR) applied to the highly variable labels of natural history museum specimens. Errors are common in this OCR output, and genus and species names are particularly prone to them. Records are preprocessed using a fuzzy-match algorithm that finds genus and species names, including those containing OCR errors, and replaces them with a constant token. Integers, and strings that begin with alphabetic characters and end with numbers, are also replaced with tokens. LABELX generates structured XML data and RDF, and corrects OCR errors in some fields. The main algorithm is a Hidden Markov Model (HMM). This poster reports an enhancement to the previous system using a larger data set.
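The preprocessing step described in this abstract (fuzzy-matching genus and species names, then tokenizing integers and letter-initial, digit-final strings) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the LABELX implementation: the three-name lexicon, the `<TAXON>`/`<INT>`/`<ALNUM>` token strings, the 0.8 similarity cutoff, and the use of Python's `difflib` for fuzzy matching are all hypothetical choices, and the HMM stage that follows preprocessing is not shown.

```python
import difflib
import re

# Hypothetical mini-lexicon; a real system would use a full list of taxon names.
GENUS_SPECIES = ["Quercus alba", "Pinus ponderosa", "Carex lurida"]

INT_RE = re.compile(r"^\d+$")                       # pure integers, e.g. "1923"
ALNUM_RE = re.compile(r"^[A-Za-z][A-Za-z0-9]*\d$")  # starts with a letter, ends with a digit, e.g. "C12"


def replace_taxa(text, names=GENUS_SPECIES, cutoff=0.8):
    """Replace two-word genus+species pairs (even with OCR typos) by a constant token."""
    tokens = text.split()
    out = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens):
            pair = tokens[i] + " " + tokens[i + 1]
            # Fuzzy match tolerates OCR errors such as "Quercns" for "Quercus".
            if difflib.get_close_matches(pair, names, n=1, cutoff=cutoff):
                out.append("<TAXON>")
                i += 2
                continue
        out.append(tokens[i])
        i += 1
    return out


def replace_numbers(tokens):
    """Replace integers and letter-initial, digit-final strings by tokens."""
    result = []
    for tok in tokens:
        if INT_RE.match(tok):
            result.append("<INT>")
        elif ALNUM_RE.match(tok):
            result.append("<ALNUM>")
        else:
            result.append(tok)
    return result


def preprocess(text):
    """Hypothetical preprocessing pipeline run before sequence labeling (e.g. an HMM)."""
    return replace_numbers(replace_taxa(text))


if __name__ == "__main__":
    print(preprocess("Quercns alba collected 1923 near Camp C12 by J. Smith"))
    # ['<TAXON>', 'collected', '<INT>', 'near', 'Camp', '<ALNUM>', 'by', 'J.', 'Smith']
```

Replacing high-variance strings with a small set of tokens shrinks the vocabulary the downstream sequence model has to learn, which is one plausible reason for this kind of preprocessing.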
arXiv: Instrumentation and Methods for Astrophysics | 2018
Gretchen Stahlman; P. Bryan Heidorn; Julie Steffen
As research datasets and analyses grow in complexity, data that could be valuable to other researchers and that support the integrity of published work remain uncurated across disciplines. These data are especially concentrated in the Long Tail of funded research, where curation resources and related expertise are often inaccessible. In the domain of astronomy, it is undisputed that uncurated dark data exist, but the scope of the problem remains uncertain. The Astrolabe Project is a collaboration between University of Arizona researchers, the CyVerse cyberinfrastructure environment, and the American Astronomical Society, with a mission to identify and ingest previously uncurated astronomical data, to provide a robust computational environment for analyzing and sharing data, and to offer services for authors wishing to deposit data associated with publications. Informed by expert feedback obtained through two workshops held in 2015 and 2016, Astrolabe is funded in part by the National Science Foundation. The system is being actively developed within CyVerse, and Astrolabe collaborators are soliciting heterogeneous datasets and potential users for the prototype system. Astrolabe team members are currently working to characterize the properties of uncurated astronomical data and to develop automated methods for locating potentially useful data to be targeted for ingest into Astrolabe, while cultivating a user community for the new data management system.
Astrophysical Journal Supplement Series | 2018
P. Bryan Heidorn; Gretchen Stahlman; Julie Steffen
Where appropriate repositories are not available to support all relevant astronomical data products, data can fall into darkness: unseen and unavailable for future reference and re-use. Some data in this category are legacy or old data, but newer datasets are also often uncurated and could remain dark. This paper describes the design motivation and development of Astrolabe, a cyberinfrastructure project that addresses a set of community recommendations for locating and ensuring the long-term curation of dark or otherwise at-risk data, and for integrated computing. This paper also describes the outcomes of the series of community workshops that informed the creation of Astrolabe. According to participants in these workshops, much astronomical dark data currently exist that are not curated elsewhere, as does software that only a few individuals can execute and that therefore becomes unusable as computing platforms change. Astronomical research questions and challenges would be better addressed with integrated data and computational resources that fall outside the scope of existing observatory and space mission projects. As a solution, the Astrolabe system is designed to develop new resources for the management of astronomical data. The project is based on CyVerse cyberinfrastructure technology and is a collaboration between the University of Arizona and the American Astronomical Society. Overall, the project aims to support open access to research data by leveraging existing cyberinfrastructure resources, and to promote scientific discovery by making potentially useful data broadly available to the astronomical community in a computable format.
13th International Conference on Transforming Digital Worlds, iConference 2018 | 2018
Vikas Yadav; Farig Sadeque; P. Bryan Heidorn; Hong Cui
iSchools are highly interdisciplinary in nature; hence their direction and vision have attracted researchers from various disciplines in recent years. In this paper, we analyzed the contents of the courses offered by 22 iSchools in different parts of the world. Our system extracts information from the course descriptions offered by these iSchools and visualizes the current trend toward offering more courses with substantially more emphasis on computation than on other paradigms. The architecture of our system is simple yet powerful, which may encourage others to implement similar techniques in other iSchool-related research.
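The abstract does not say how course descriptions are assigned to paradigms, so the following is only a hypothetical sketch of one simple way such an analysis could work: a keyword-count classifier that labels each description with its best-scoring paradigm and tallies the results. The paradigm names, keyword sets, and tie-breaking rule are illustrative assumptions, not the authors' method.

```python
from collections import Counter

# Hypothetical paradigms and keyword sets; not taken from the paper.
PARADIGM_KEYWORDS = {
    "computation": {"programming", "algorithm", "algorithms", "database", "databases", "python", "machine", "data"},
    "social": {"policy", "ethics", "society", "social", "community"},
    "library": {"cataloging", "archives", "reference", "collection", "library"},
}


def classify(description):
    """Label a course description with the paradigm whose keywords appear most often."""
    words = [w.strip(".,;:()").lower() for w in description.split()]
    scores = {p: sum(w in kws for w in words) for p, kws in PARADIGM_KEYWORDS.items()}
    # Ties go to the paradigm listed first in PARADIGM_KEYWORDS.
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "other"


def paradigm_counts(descriptions):
    """Count how many course descriptions fall into each paradigm."""
    return Counter(classify(d) for d in descriptions)


if __name__ == "__main__":
    courses = [
        "Introduction to Python programming and algorithms.",
        "Information policy, ethics, and society.",
        "Database design and data management.",
        "Cataloging and classification of library collections.",
    ]
    print(paradigm_counts(courses))
```

Per-school counts like these could then be plotted to compare the emphasis on computation against other paradigms; a production system would likely need richer text features than bare keyword matches.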
Archive | 2013
Robert Anglin; Jason H. Best; Renato Figueiredo; Edward Gilbert; Nathan Gnanasambandam; Stephen Gottschalk; Elspeth Haston; P. Bryan Heidorn; Daryl Lafferty; Peter Lang; Gil Nelson; Deborah Paul; William Ulate; Kimberly Watson; Qianjin Zhang
There are an estimated 2–3 billion museum specimens worldwide (OECD 1999, Ariño 2010). In an effort to increase the research value of their collections, institutions across the U.S. have been seeking new ways to cost-effectively transcribe the label information associated with these specimens. Current digitization methods are still relatively slow and labor-intensive, and therefore expensive. New methods, such as optical character recognition (OCR), natural language processing, and human-in-the-loop assisted parsing, are being explored to reduce these costs. The National Science Foundation (NSF), through the Advancing Digitization of Biodiversity Collections (ADBC) program, funded Integrated Digitized Biocollections (iDigBio) in 2011 to create a Home Uniting Biodiversity Collections (HUB) cyberinfrastructure that aggregates and integrates specimen data, to find ways to digitize specimen data faithfully and faster, and to disseminate knowledge of how to achieve this. The iDigBio Augmenting OCR Working Group is part of this national effort.
Archive | 2013
Deborah Paul; P. Bryan Heidorn
The Augmenting OCR Working Group (A-OCR WG) at Integrated Digitized Biocollections (iDigBio) seeks to improve community OCR strategies and algorithms for faster, better parsing of the OCR output derived from valuable data on natural history collection specimen labels. This task is exceedingly difficult because museum labels are often annotated and vary in content, form, and font. Under the National Science Foundation's (NSF) Advancing Digitization of Biodiversity Collections (ADBC) program, iDigBio is building a cyberinfrastructure to aggregate quality data from museum specimens housed in collections across the United States for use by researchers, educators, environmentalists, and the public. The A-OCR WG formed through community consensus in March 2012 to take up its role in this endeavor and has defined reachable goals, including setting up a hackathon concurrent with iConference 2013. This paper reports on the definition of some key problems identified by the A-OCR WG, since these science problems will drive research and cyberinfrastructure development.
Archive | 2013
Deborah Paul; P. Bryan Heidorn; Jason H. Best; Edward Gilbert; Amanda K. Neill; Gil Nelson; William Ulate
Integrated Digitized Biocollections (iDigBio) is a nationwide effort funded by the National Science Foundation (NSF) to digitize data from hundreds of millions of natural history museum specimens. In a concerted five-part outreach effort, the iDigBio Augmenting Optical Character Recognition Working Group (A-OCR WG) coordinated a 2013 iConference Workshop, Poster, Notes submission, and Alternative Event, along with a concurrent Hackathon hosted by the Botanical Research Institute of Texas (BRIT). The Workshop, titled "Help iDigBio Reveal Hidden Data: iDigBio Augmenting OCR Working Group Needs You," introduces the iSchools community to iDigBio and to the A-OCR WG's mission and its challenges in improving digitization efficiency. This related Alternative Event gives the A-OCR WG an opportunity to report back to iConference Workshop attendees on our first experience using a Hackathon model to work on parsing and user interface design issues specific to our needs. We anticipate a lively, open discussion with event attendees and future collaborators.
Archive | 2010
Qin Wei; P. Bryan Heidorn; Chris Freeland