Publication


Featured research published by James A. Macklin.


PLOS Biology | 2015

Finding Our Way through Phenotypes

Andrew R. Deans; Suzanna E. Lewis; Eva Huala; Salvatore S. Anzaldo; Michael Ashburner; James P. Balhoff; David C. Blackburn; Judith A. Blake; J. Gordon Burleigh; Bruno Chanet; Laurel Cooper; Mélanie Courtot; Sándor Csösz; Hong Cui; Wasila M. Dahdul; Sandip Das; T. Alexander Dececchi; Agnes Dettai; Rui Diogo; Robert E. Druzinsky; Michel Dumontier; Nico M. Franz; Frank Friedrich; George V. Gkoutos; Melissa Haendel; Luke J. Harmon; Terry F. Hayamizu; Yongqun He; Heather M. Hines; Nizar Ibrahim

Imagine if we could compute across phenotype data as easily as genomic data; this article calls for efforts to realize this vision and discusses the potential benefits.


International Conference on Conceptual Structures | 2012

Kurator: A Kepler package for data curation workflows

Lei Dou; G. Cao; Paul J. Morris; Robert A. Morris; Bertram Ludäscher; James A. Macklin; James Hanken

Data curation is critical for scientific data digitization, sharing, integration, and use. This paper presents Kurator, a software package for automating data curation pipelines in the Kepler scientific workflow system. Several curation tools and services are integrated into this package as actors to enable construction of workflows to perform and document various data curation tasks. The integration of Google cloud services (e.g., Google spreadsheets) allows workflow steps to invoke human experts outside the workflow in a manner that greatly simplifies the complex data handling in distributed, multi-user curation workflows. The Kepler platform provides the modeling, execution and management ability, including a collection-oriented model of computation (COMAD), and provenance tracking and browsing for the curation package. These features not only allow workflows to be easily modeled, maintained, and evolved, but also facilitate QA/QC of curation results through examination of provenance information recorded during workflow execution. Effectiveness of the Kurator package is demonstrated through a workflow for data curation of natural science collections.


Computer Science and Information Engineering | 2009

Filtered-Push: A Map-Reduce Platform for Collaborative Taxonomic Data Management

Zhimin Wang; Hui Dong; Maureen Kelly; James A. Macklin; Paul J. Morris; Robert A. Morris

The Filtered-Push project aims to establish a cross-institutional infrastructure to help biologists (especially taxonomists) share and improve digitized natural history collection data via the exchange and management of specimen record annotations. Three challenges commonly confront the holders of data documenting specimens collected in the field: the identification of the organism and the annotation of records that arose from a single collection event but where parts of the organism have been distributed and are held as duplicates in multiple institutions; the quality control of new annotations [2]; and, more generally, the dissemination of annotations of specimen records, whether or not representing duplicate specimens. Addressing these can accelerate the rate of digital capture of data from paper records (such as handwritten labels attached to pinned insects or pasted on herbarium sheets with dried plants) and provide mechanisms for the global community of biologists to improve the quality of the data.


PLOS ONE | 2013

Semantic Annotation of Mutable Data

Robert A. Morris; Lei Dou; James Hanken; Maureen Kelly; David Lowery; Bertram Ludäscher; James A. Macklin; Paul J. Morris

Electronic annotation of scientific data is very similar to annotation of documents. Both types of annotation amplify the original object, add related knowledge to it, and dispute or support assertions in it. In each case, annotation is a framework for discourse about the original object, and, in each case, an annotation needs to clearly identify its scope and its own terminology. However, electronic annotation of data differs from annotation of documents: the content of the annotations, including expectations and supporting evidence, is more often shared among members of networks. Any consequent actions taken by the holders of the annotated data could be shared as well. But even those current annotation systems that admit data as their subject often make it difficult or impossible to annotate at fine-enough granularity to use the results in this way for data quality control. We address these kinds of issues by offering simple extensions to an existing annotation ontology and describe how the results support an interest-based distribution of annotations. We are using the result to design and deploy a platform that supports annotation services overlaid on networks of distributed data, with particular application to data quality control. Our initial instance supports a set of natural science collection metadata services. An important application is the support for data quality control and provision of missing data. A previous proof of concept demonstrated such use based on data annotations modeled with XML-Schema.


BMC Bioinformatics | 2016

Introducing Explorer of Taxon Concepts with a case study on spider measurement matrix building

Hong Cui; Dongfang Xu; Steven S. Chong; Martín J. Ramírez; Thomas Rodenhausen; James A. Macklin; Bertram Ludäscher; Robert A. Morris; Eduardo M. Soto; Nicolás Mongiardino Koch

Background: Taxonomic descriptions are traditionally composed in natural language and published in a format that cannot be directly used by computers. The Exploring Taxon Concepts (ETC) project has been developing a set of web-based software tools that convert morphological descriptions published in telegraphic style to character data that can be reused and repurposed. This paper introduces the first semi-automated pipeline, to our knowledge, that converts morphological descriptions into taxon-character matrices to support systematics and evolutionary biology research. We then demonstrate and evaluate the use of the ETC Input Creation - Text Capture - Matrix Generation pipeline to generate body part measurement matrices from a set of 188 spider morphological descriptions and report the findings.

Results: From the given set of spider taxonomic publications, two versions of input (original and normalized) were generated and used by the ETC Text Capture and ETC Matrix Generation tools. The tools produced two corresponding spider body part measurement matrices, and the matrix from the normalized input was found to be much more similar to a gold standard matrix hand-curated by the scientist co-authors. The lower performance with the original input was attributed to special conventions used in the original descriptions (e.g., the omission of measurement units). The results show that simple normalization of the description text greatly increased the quality of the machine-generated matrix and reduced edit effort. The machine-generated matrix also helped identify issues in the gold standard matrix.

Conclusions: ETC Text Capture and ETC Matrix Generation are low-barrier and effective tools for extracting measurement values from spider taxonomic descriptions and are more effective when the descriptions are self-contained. Special conventions that make the description text less self-contained challenge automated extraction of data from biodiversity descriptions and hinder the automated reuse of the published knowledge. The tools will be updated to support new requirements revealed in this case study.


BMC Bioinformatics | 2015

OTO: Ontology Term Organizer.

Fengqiong Huang; James A. Macklin; Hong Cui; Heather A. Cole; Lorena Endara

Background: The need to create controlled vocabularies such as ontologies for knowledge organization and access has been widely recognized in various domains. Despite the indispensable need of thorough domain knowledge in ontology construction, most software tools for ontology construction are designed for knowledge engineers and not for domain experts to use. The differences in the opinions of different domain experts and in the terminology usages in source literature are rarely addressed by existing software.

Methods: OTO software was developed based on the Agile principles. Through iterations of software release and user feedback, new features are added and existing features modified to make the tool more intuitive and efficient to use for small and large data sets. The software is open source and built in Java.

Results: Ontology Term Organizer (OTO; http://biosemantics.arizona.edu/OTO/) is a user-friendly, web-based, consensus-promoting, open source application for organizing domain terms by dragging and dropping terms to appropriate locations. The application is designed for users with specific domain knowledge such as biology but not in-depth ontology construction skills. Specifically, OTO can be used to establish is_a, part_of, synonym, and order relationships among terms in any domain that reflects the terminology usage in source literature and based on multiple experts’ opinions. The organized terms may be fed into formal ontologies to boost their coverage. All datasets organized on OTO are publicly available.

Conclusion: OTO has been used to organize the terms extracted from thirty volumes of Flora of North America and Flora of China combined, in addition to some smaller datasets of different taxon groups. User feedback indicates that the tool is efficient and user friendly. Being open source software, the application can be modified to fit varied term organization needs for different domains.


Proceedings of the American Society for Information Science and Technology | 2012

OTO: Ontology term organizer

Fengqiong Huang; James A. Macklin; Paul J. Morris; Partha Pratim Sanyal; Robert A. Morris; Hong Cui

We believe biology ontologies should reflect the consensus of biologists. To promote consensus and assist the development of ontologies in biology domains, we implemented a web-based ontology term organizer (OTO) to collect grouping and relationship opinions from biologists. Besides is_a and part_of relations (the most frequently used relations in biology domains), OTO can also be used to sort out ordered values in any domain. A demo dataset, OTO_demo, has been set up at http://biosemantics.arizona.edu/ONTNEW. Use username OTOdemo and password OTOdemopass to log in.


Taxon | 2003

Lectotypification of Crataegus coccinea L. and its conspecificity with C. pedicellata Sarg. (Rosaceae)

James B. Phipps; Steve Cafferty; James A. Macklin

Crataegus coccinea L., a name with an erratic history of usage but which has given its name to an important subdivision of the genus, C. ser. (or sect.) Coccineae, is lectotypified. It is held to be identical to the well-known North American species C. pedicellata Sarg., which it can now safely displace.


International Journal of Digital Curation | 2014

Towards Automated Design, Analysis and Optimization of Declarative Curation Workflows

Tianhong Song; Sven Köhler; Bertram Ludäscher; James Hanken; Maureen Kelly; David Lowery; James A. Macklin; Paul J. Morris; Robert A. Morris

Data curation is increasingly important. Our previous work on a Kepler curation package has demonstrated advantages that come from automating data curation pipelines by using workflow systems. However, manually designed curation workflows can be error-prone and inefficient due to a lack of user understanding of the workflow system, misuse of actors, or human error. Correcting problematic workflows is often very time-consuming. A more proactive workflow system can help users avoid such pitfalls. For example, static analysis before execution can be used to detect the potential problems in a workflow and help the user to improve workflow design. In this paper, we propose a declarative workflow approach that supports semi-automated workflow design, analysis and optimization. We show how the workflow design engine helps users to construct data curation workflows, how the workflow analysis engine detects different design problems of workflows and how workflows can be optimized by exploiting parallelism.


Biodiversity Informatics | 2010

Natural history specimen digitization: challenges and concerns

Ana Vollmar; James A. Macklin; Linda Ford

Collaboration


Dive into James A. Macklin's collaborations.

Top Co-Authors

Hong Cui

University of Arizona

Eva Huala

Carnegie Institution for Science
