Katherine G. Herbert | Researchain


Publication

Featured researches published by Katherine G. Herbert.

international conference on tools with artificial intelligence | 2004

XML clustering by principal component analysis

Jianghui Liu; Jason Tsong-Li Wang; Wynne Hsu; Katherine G. Herbert

XML is increasingly important in data exchange and information management. Considerable effort has been spent on developing efficient techniques for storing, querying, indexing and accessing XML documents. In this work we propose a new approach to clustering XML data. In contrast to previous work, which focused on documents defined by different DTDs, the proposed method works for documents with the same DTD. Our approach is to extract features from documents, modeled by ordered labeled trees, and transform the documents to vectors in a high-dimensional Euclidean space based on the occurrences of the features in the documents. We then reduce the dimensionality of the vectors by principal component analysis (PCA) and cluster the vectors in the reduced dimensional space. The PCA enables one to identify vectors with co-occurrent features, thereby enhancing the accuracy of the clustering. Experimental results based on documents obtained from Wisconsin's XML data bank show the effectiveness and good performance of the proposed techniques.
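The pipeline in this abstract (features from trees, occurrence vectors, PCA, clustering in the reduced space) can be illustrated with a minimal sketch. It is not the authors' implementation: the feature choice (parent/child tag pairs), the power-iteration PCA that keeps only the first principal component, and the sign-based split into two clusters are all assumptions made for illustration, and every function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def edge_features(xml_text):
    """Count parent/child tag pairs in one document (an assumed feature choice)."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for parent in root.iter():
        for child in parent:
            counts[(parent.tag, child.tag)] += 1
    return counts


def to_vectors(docs):
    """Map each document to a vector of feature occurrence counts."""
    feats = [edge_features(d) for d in docs]
    vocab = sorted(set().union(*feats))
    return [[float(f[k]) for k in vocab] for f in feats]


def project_on_first_component(vectors, iters=200):
    """Center the vectors and project them onto the top principal component.

    The component is found by power iteration on the covariance matrix.
    The fixed start vector is an assumption; it must not be orthogonal
    to the top eigenvector for the iteration to converge to it.
    """
    n, dim = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(dim)]
    centered = [[v[j] - mean[j] for j in range(dim)] for v in vectors]
    cov = [[sum(r[a] * r[b] for r in centered) / n for b in range(dim)]
           for a in range(dim)]
    w = [1.0 + 0.1 * j for j in range(dim)]
    for _ in range(iters):
        nxt = [sum(cov[a][b] * w[b] for b in range(dim)) for a in range(dim)]
        norm = sum(x * x for x in nxt) ** 0.5
        if norm == 0.0:  # no variance left in this direction; keep the last w
            break
        w = [x / norm for x in nxt]
    return [sum(r[j] * w[j] for j in range(dim)) for r in centered]


def cluster_documents(docs):
    """Split documents into two clusters by the sign of their 1-D projection."""
    scores = project_on_first_component(to_vectors(docs))
    return [0 if s < 0 else 1 for s in scores]


docs = [
    "<lib><book><author/></book><book><author/></book></lib>",
    "<lib><book><author/></book><book><author/></book><book><author/></book></lib>",
    "<lib><article><title/></article><article><title/></article></lib>",
    "<lib><article><title/></article><article><title/></article><article><title/></article></lib>",
]
labels = cluster_documents(docs)
```

In this toy input the book-heavy and article-heavy documents end up in different clusters. Which cluster is labeled 0 or 1 depends on the sign of the computed component, so only the grouping is meaningful.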

international conference on management of data | 2004

BIO-AJAX: an extensible framework for biological data cleaning

Katherine G. Herbert; Narain H. Gehani; William H. Piel; Jason Tsong-Li Wang; Cathy H. Wu

As databases become more pervasive through the biological sciences, various data quality issues regarding data legacy, data uniformity and data duplication arise. Due to the nature of this data, each of these problems is non-trivial. For biological data to be corrected and standardized, new methods and frameworks must be developed. This paper proposes one such framework, called BIO-AJAX, which uses principles from data cleaning to improve data quality in biological information systems, specifically in TreeBASE.

International Journal of Information Quality | 2007

Biological data cleaning: a case study

Katherine G. Herbert; Jason Tsong-Li Wang

As databases become more pervasive through the biological sciences, various data quality concerns are emerging. Biological databases tend to develop data quality issues regarding data legacy, data uniformity and data duplication. Due to the nature of this data, each of these problems is non-trivial and can cause many problems for the database. For biological data to be corrected and standardised, methods and frameworks must be developed to handle both structural and traditional data. This paper discusses issues concerning biological data quality with respect to data cleaning. It presents BIO-AJAX, a framework developed to address these issues. Finally, it describes BIO-AJAX for TreeBASE and BIO-AJAX for Lineage Path, two implementations of BIO-AJAX on phylogenetic data sets.

statistical and scientific database management | 2002

A structure-based search engine for phylogenetic databases

Huiyuan Shan; Katherine G. Herbert; William H. Piel; Dennis E. Shasha; Jason Tsong-Li Wang

Phylogenetic trees are essential for understanding the relationships among organisms or taxa. Many of the current techniques for searching phylogenetic repositories allow the user to perform a keyword-type search or an aligned sequence data search, or to browse a hierarchical list of taxa. Here we describe a new search engine that allows the user to present an example phylogeny, or a query tree, and then searches a phylogenetic database for trees that contain the query structure. The presented search engine is fully operational and is available on the World Wide Web.
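One way to make "trees that contain the query structure" concrete is sketched below. It is an illustrative definition, not the search engine's actual algorithm: a rooted database tree is taken to contain a query tree when, after restricting the database tree to the query's taxa, both trees have the same set of clades. The nested-tuple tree encoding and the function names `clusters` and `contains` are assumptions.

```python
def clusters(tree):
    """Return (leaf set, set of clades) for a rooted tree.

    A leaf is a taxon name (str); an internal node is a tuple of children.
    A clade is the frozenset of taxa below an internal node.
    """
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves, found = frozenset(), set()
    for child in tree:
        child_leaves, child_clusters = clusters(child)
        leaves |= child_leaves
        found |= child_clusters
    found.add(leaves)
    return leaves, found


def contains(db_tree, query):
    """True if db_tree, restricted to the query's taxa, matches the query."""
    q_leaves, q_clusters = clusters(query)
    t_leaves, t_clusters = clusters(db_tree)
    if not q_leaves <= t_leaves:
        return False  # the query mentions a taxon the database tree lacks
    # Restrict every database clade to the query taxa; drop trivial ones.
    restricted = {c & q_leaves for c in t_clusters}
    restricted = {c for c in restricted if len(c) >= 2}
    return restricted == {c for c in q_clusters if len(c) >= 2}


db = ((("human", "chimp"), "gorilla"), ("mouse", "rat"))
match = contains(db, (("human", "chimp"), "mouse"))
mismatch = contains(db, (("human", "mouse"), "chimp"))
unknown = contains(db, (("human", "dog"), "mouse"))
```

Here `match` is true because human and chimp are grouped together apart from mouse in the database tree, while `mismatch` and `unknown` are false. A search over a repository would apply `contains` to each stored tree, or use an index to avoid a full scan.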

International Journal of Computational Intelligence and Applications | 2002

XML QUERY BY EXAMPLE

Sen Zhang; Jason Tsong-Li Wang; Katherine G. Herbert

XML's tree structure provides a rich background for complicated structural searches. In this paper we present a new system, called XML Query by Example (XML QBE), that allows the user to query XML documents exploiting their inherent tree structure. We present some interesting queries and describe the underlying query processing algorithms. We also describe the system's architecture and report its implementation status. Finally, we conclude the paper by pointing out some future work.

conference on computer supported cooperative work | 2014

GeoTagger: a collaborative and participatory environmental inquiry system

Jerry Alan Fails; Katherine G. Herbert; Emily Hill; Christopher Loeschorn; Spencer Kordecki; David Dymko; Andrew J. DeStefano; Zill Christian

This note focuses on the motivation, approach, and the initial prototype implementation of Geotagger: a collaborative participatory environmental inquiry system. We situate the need for such a technology, and discuss related work -- much of which is situated in the realm of citizen science. Our work uniquely distinguishes itself from many other citizen science applications in that it supports limited data collection and analysis, with the additional benefit of supporting social interactions and engagement through conversations about observed data. This is accomplished by creating friends and groups which are collaborators in the observational inquiry process.

international conference on big data | 2015

Current Developments in Big Data and Sustainability Sciences in Mobile Citizen Science Applications

Nikita S. Panchariya; Andrew J. DeStefano; Varsha Nimbagal; Revathi Ragupathy; Serkan Yavuz; Katherine G. Herbert; Emily Hill; Jerry Alan Fails

Sustainability Sciences Studies is an interdisciplinary approach towards understanding how to develop a culture of conservation. This culture of conservation can be viewed from many different aspects, from the individual person's decisions to the larger community's impacts. In all of this, we see large quantities of data in a push-pull relationship with each other, with stakeholders needing to have access to real-time generated data sets and analytics from populations as large as a metropolitan community and be able to respond to as well as disseminate information to these populations. In this paper, we present a survey of the literature in these areas. The pervasive and diverse nature of big data in these fields demonstrates the need for data scientists to collaborate to identify ways to address the common and disparate needs that different projects may have in relation to big data. We also briefly present a platform and implementation, the Sustainability Studies Mobile Toolkit (SSMT) and Geotagger, that seek to address some of these needs.

technical symposium on computer science education | 2006

An interdisciplinary undergraduate science informatics degree in a liberal arts context

Dorothy Deremer; Katherine G. Herbert

In this paper, we describe a new interdisciplinary B.S. degree in Science Informatics at Montclair State University, a multipurpose public institution that includes a substantial General Education component. Beginning in the freshman year, the Science Informatics curriculum contains 16 semester hours of interdisciplinary science informatics courses including a freshman experience, internships, a research component, ethics, and a concentration currently in bioinformatics, cheminformatics, or computer science as well as core science and mathematics courses.

statistical and scientific database management | 2006

PhyloMiner: A Tool for Evolutionary Data Analysis

Sen Zhang; Katherine G. Herbert; Jason Tsong-Li Wang; William H. Piel; David R. B. Stockwell

Currently, phylogenetic tree techniques are being used in multiple areas, from tree of life problems to pathogen recognition to drug discovery. With all of these applications for phylogenetic tree techniques, methods are needed to exploit the knowledge modeled in phylogenetic trees more thoroughly. One such information point of interest is the behavior of frequent patterns in phylogenetic trees. While there are many techniques that look at maximal, consensus and super tree patterns, there are few techniques that look at frequent, but not maximal, patterns. This demonstration paper presents PhyloMiner, a tool that automatically discovers frequent agreement subtrees from multiple phylogenies. It introduces this topic of frequent agreement subtrees and then concludes with describing the PhyloMiner tool that implements these concepts and is available freely on the World Wide Web.

bioinformatics and bioengineering | 2006

A New Kernel Method for RNA Classification

Xiaoming Wu; Jason Tsong-Li Wang; Katherine G. Herbert

Support vector machines (SVMs) are a state-of-the-art machine learning tool widely used in speech recognition, image processing and biological sequence analysis. An essential step in SVMs is to devise a kernel function to compute the similarity between two data points in Euclidean space. In this paper we present a new kernel that takes advantage of both global and local structural information in RNAs and uses the information together to classify RNAs with support vector machines. Experimental results demonstrate the good performance of the new kernel and show that it outperforms existing kernels when applied to classifying non-coding RNA sequences.
