Publication


Featured research published by Terence Critchlow.


International Conference on Management of Data | 1999

Practical lessons in supporting large-scale computational science

Ron Musick; Terence Critchlow

Business needs have driven the development of commercial database systems since their inception. As a result, there has been a strong focus on supporting many users, minimizing the potential corruption or loss of data, and maximizing performance metrics such as transactions-per-second and benchmark results [Gra93]. These goals have little to do with supporting business intelligence needs such as the decision support and data mining activities common in on-line analytic processing (OLAP) applications. As a result, business data are typically off-loaded to secondary systems before these activities occur. These goals also have little to do with the needs of the scientific community, which typically revolve around a great deal of compute- and I/O-intensive analysis, often over large, high-dimensional data. In many cases scientific data were never collected in a DBMS in the first place, so analysis and visualization take place over specialized flat-file formats. This is a painful solution, because a DBMS has much to offer in the overall process of managing and exploring data.


Knowledge Discovery and Data Mining | 2007

Tracking multiple topics for finding interesting articles

Raymond K. Pon; Alfonso F. Cardenas; David Buttler; Terence Critchlow

We introduce multiple topic tracking (MTT) for iScore to better recommend news articles for users with multiple interests and to address changes in user interests over time. As an extension of the basic Rocchio algorithm, traditional topic detection and tracking, and single-pass clustering, MTT maintains multiple interest profiles to identify interesting articles for a specific user given user feedback. Focusing on only interesting topics enables iScore to discard useless profiles to address changes in user interests and to achieve a balance between resource consumption and classification accuracy. Also, by relating a topic's interestingness to an article's interestingness, iScore is able to achieve higher quality results than traditional methods such as the Rocchio algorithm. We identify several operating parameters that work well for MTT. Using the same parameters, we show that MTT alone yields high quality results for recommending interesting articles from several corpora. The inclusion of MTT improves iScore's performance by 9% in recommending news articles from the Yahoo! News RSS feeds and the TREC11 adaptive filter article collection. Through a small user study, we show that iScore can still perform well when provided with only a little user feedback.
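As a rough illustration of tracking several interest profiles at once, the sketch below keeps one Rocchio-style term-weight vector per topic, scores an article by its similarity to the closest profile, and spawns a new profile when positive feedback does not fit any existing one. This is not the iScore/MTT implementation; the bag-of-words representation, thresholds, and update weights are illustrative assumptions.

```python
# Minimal sketch of multi-profile, Rocchio-style interest tracking.
# NOT the iScore/MTT implementation; names, thresholds, and the
# bag-of-words representation are illustrative assumptions.
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class MultiTopicTracker:
    def __init__(self, new_topic_threshold=0.15, alpha=0.8, beta=0.2):
        self.profiles = []                     # one term-weight Counter per tracked topic
        self.new_topic_threshold = new_topic_threshold
        self.alpha, self.beta = alpha, beta    # Rocchio-style mixing weights

    def score(self, article: Counter) -> float:
        """Interestingness = similarity to the closest tracked topic."""
        return max((cosine(p, article) for p in self.profiles), default=0.0)

    def feedback(self, article: Counter, interesting: bool):
        """On positive feedback, update the closest profile or start a new topic."""
        if not interesting:
            return
        sims = [cosine(p, article) for p in self.profiles]
        if not sims or max(sims) < self.new_topic_threshold:
            self.profiles.append(Counter(article))   # article opens a new topic
            return
        best = self.profiles[sims.index(max(sims))]
        for term, weight in article.items():
            best[term] = self.alpha * best[term] + self.beta * weight
```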


International Conference of the IEEE Engineering in Medicine and Biology Society | 2000

DataFoundry: information management for scientific data

Terence Critchlow; Krzysztof Fidelis; Madhavan Ganesh; Ron Musick; Tom Slezak

Data warehouses and data marts have been successfully applied to a multitude of commercial business applications. They have proven to be invaluable tools by integrating information from distributed, heterogeneous sources and summarizing this data for use throughout the enterprise. Although the need for information dissemination is as vital in science as in business, working warehouses in this community are scarce because traditional warehousing techniques do not transfer to scientific environments. There are two primary reasons for this difficulty. First, schema integration is more difficult for scientific databases than for business sources because of the complexity of the concepts and the associated relationships. Second, scientific data sources have highly dynamic data representations (schemata). When a data source participating in a warehouse changes its schema, both the mediator transferring data to the warehouse and the warehouse itself need to be updated to reflect these modifications. The cost of repeatedly performing these updates in a traditional warehouse, as is required in a dynamic environment, is prohibitive. The paper discusses these issues within the context of the DataFoundry project, an ongoing research effort at Lawrence Livermore National Laboratory. DataFoundry utilizes a unique integration strategy to identify corresponding instances while maintaining differences between data from different sources, and a novel architecture and an extensive meta-data infrastructure, which reduce the cost of maintaining a warehouse.


International Conference on Web Services | 2005

Domain-specific Web service discovery with service class descriptions

Daniel Rocco; James Caverlee; Ling Liu; Terence Critchlow

This paper presents DynaBot, a domain-specific Web service discovery system. The core idea of the DynaBot service discovery system is to use domain-specific service class descriptions powered by an intelligent Deep Web crawler. In contrast to current registry-based service discovery systems, such as the several available UDDI registries, DynaBot promotes focused crawling of the Deep Web of services and discovers candidate services that are relevant to the domain of interest. It uses intelligent filtering algorithms to match services found by focused crawling with the domain-specific service class descriptions. We demonstrate the capability of DynaBot through the BLAST scenario and describe our initial experience with DynaBot.
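As a rough illustration of matching crawled candidates against a domain-specific service class description, the sketch below checks whether a discovered form exposes the required inputs and whether a probe result mentions the expected domain vocabulary. The data structures and the matching rule are illustrative assumptions, not DynaBot's actual SCD format or filtering algorithms.

```python
# Hedged sketch of matching a crawled Web form against a domain-specific
# service class description (SCD). Field names and the matching rule are
# illustrative assumptions, not DynaBot's actual format.
from dataclasses import dataclass

@dataclass
class ServiceClassDescription:
    name: str
    required_inputs: set          # form parameters the class expects
    output_keywords: set          # strings expected on a result page

@dataclass
class CandidateService:
    url: str
    form_fields: set              # parameters found by the crawler
    sample_output: str = ""       # text of a probe query's result page

def matches(scd: ServiceClassDescription, svc: CandidateService) -> bool:
    """A candidate matches if it accepts the required inputs and its
    probe output mentions the expected domain vocabulary."""
    has_inputs = scd.required_inputs <= svc.form_fields
    has_output = any(k.lower() in svc.sample_output.lower()
                     for k in scd.output_keywords)
    return has_inputs and has_output

# Hypothetical description of the BLAST service class used in the paper's scenario:
blast_scd = ServiceClassDescription(
    name="BLAST",
    required_inputs={"sequence", "program"},
    output_keywords={"alignment", "e-value"},
)
```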


Statistical and Scientific Database Management | 2008

Flexible Scientific Workflow Modeling Using Frames, Templates, and Dynamic Embedding

Anne H. H. Ngu; Shawn Bowers; Nicholas Haasch; Timothy M. McPhillips; Terence Critchlow

While most scientific workflow systems are based on dataflow, some amount of control-flow modeling is often necessary for engineering fault-tolerant, robust, and adaptive workflows. However, control-flow modeling within dataflow often results in workflow specifications that are hard to comprehend, reuse, and maintain. We describe new modeling constructs that address these issues by providing a structured approach for modeling control-flow within scientific workflows, and discuss their implementation within the Kepler scientific workflow system.
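The constructs themselves are specific to Kepler, but the underlying idea, separating a reusable control-flow shell from the dataflow actors it wraps, can be shown with a generic sketch. The retry/fallback wrapper below is an illustration under that reading, not Kepler's actual frame or template mechanism; the function names and parameters are hypothetical.

```python
# Hedged sketch of the idea behind frames/templates: wrap a dataflow
# "actor" (here, a plain function over tokens) in a reusable control-flow
# shell that adds retry and fallback behaviour without cluttering the
# dataflow graph itself. This is an illustration, not Kepler's implementation.
import time

def with_fault_handling(actor, retries=3, delay_s=1.0, fallback=None):
    """Return a new actor that retries `actor` and falls back on failure."""
    def framed(token):
        for _attempt in range(retries):
            try:
                return actor(token)
            except Exception:
                time.sleep(delay_s)      # transient failure: wait and retry
        return fallback(token) if fallback else None
    return framed

# Usage: the workflow graph keeps a single node; the control flow lives in
# the frame, so the specification stays readable and reusable.
def flaky_alignment_step(sequence):
    ...  # hypothetical call to an external tool that may fail transiently

robust_step = with_fault_handling(flaky_alignment_step, retries=5)
```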


Cooperative Information Systems | 1998

Meta-data based mediator generation

Terence Critchlow; Madhavan Ganesh; Ron Musick

Mediators are a critical component of any data warehouse; they transform data from source formats to the warehouse representation while resolving semantic and syntactic conflicts. The close relationship between mediators and databases requires a mediator to be updated whenever an associated schema is modified. Failure to quickly perform these updates significantly reduces the reliability of the warehouse because queries do not have access to the most current data. This may result in incorrect or misleading responses, and reduce user confidence in the warehouse. Unfortunately, this maintenance may be a significant undertaking if a warehouse integrates several dynamic data sources. This paper describes a meta-data framework, and associated software, designed to automate a significant portion of the mediator generation task and thereby reduce the effort involved in adapting to schema changes. By allowing the DBA to concentrate on identifying the modifications at a high level, instead of reprogramming the mediator, turnaround time is reduced and warehouse reliability is improved.
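To make the role of the meta-data concrete, here is a minimal sketch of a mediator generated from a declarative field mapping, so that adapting to a source schema change means editing the mapping rather than reprogramming the mediator. The mapping format, field names, and conversion functions are illustrative assumptions, not DataFoundry's actual meta-data model.

```python
# Minimal sketch of generating a mediator from declarative meta-data:
# the source-to-warehouse field mapping lives in data, so a schema change
# is handled by editing the mapping rather than the mediator code.
# The mapping format and field names are illustrative assumptions.
field_mapping = {
    # warehouse attribute : (source attribute, conversion function)
    "protein_id":  ("accession",  str),
    "length_aa":   ("seq_length", int),
    "description": ("de_line",    str.strip),
}

def make_mediator(mapping):
    """Generate a mediator function from the declarative mapping."""
    def mediate(source_record: dict) -> dict:
        return {dst: convert(source_record[src])
                for dst, (src, convert) in mapping.items()
                if src in source_record}
    return mediate

mediator = make_mediator(field_mapping)
warehouse_row = mediator({"accession": "P12345", "seq_length": "310",
                          "de_line": " hypothetical protein "})
```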


Bioinformatics | 2003

Automatic discovery and classification of bioinformatics Web sources

Daniel Rocco; Terence Critchlow

MOTIVATION: The World Wide Web provides an incredible resource to genomics researchers in the form of query access to distributed data sources, e.g. BLAST sequence homology search interfaces. The number of these autonomous sources and their rate of change outpace the speed at which they can be manually classified, meaning that the available data is not being utilized to its full potential. Manually maintaining a wrapper library will not scale to accommodate the growth of genomics data sources on the Web, challenging us to produce an automated system that can find, classify and wrap new sources without constant human intervention. Previous research has not addressed the problem of automatically locating, classifying and integrating classes of bioinformatics data sources.

RESULTS: This paper presents an overview of a system for finding classes of bioinformatics data sources and integrating them behind a unified interface. We describe our approach for automatic classification of new Web sources into relevance categories, which eliminates the human effort required to maintain a current repository of sources. Our approach is based on a meta-data description of classes of interesting sources that describes the important features of an entire class of services without tying that description to any particular Web source. We examine the features of this format in the context of BLAST sources to show how it relates to the Web sources being described. We then show how a description can be used to determine whether an arbitrary Web source is an instance of the described service. To validate the effectiveness of this approach, we have constructed a prototype that correctly classifies approximately two-thirds of the BLAST sources we tested. We conclude with a discussion of these results, the factors that affect correct automatic classification and areas for future study.
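One way to decide whether an arbitrary Web form is an instance of a described service class, as the abstract mentions, is behavioral probing: submit a query the service must handle and a nonsense query, then compare the responses. The sketch below illustrates that idea only; the probe strings, field name, and keyword checks are hypothetical and are not the paper's actual classifier.

```python
# Hedged sketch of a probe-based instance test: a BLAST-like source should
# report alignments for a plausible sequence but not for a garbage query.
# The form field name, probe strings, and keywords are assumptions.
import urllib.parse
import urllib.request

def looks_like_blast(form_url: str, seq_field: str = "sequence") -> bool:
    def probe(payload: str) -> str:
        data = urllib.parse.urlencode({seq_field: payload}).encode()
        with urllib.request.urlopen(form_url, data=data, timeout=30) as resp:
            return resp.read().decode(errors="replace").lower()

    valid = probe("ACGTACGTACGTACGTACGTACGTACGT")   # plausible DNA query
    garbage = probe("not-a-sequence-123")            # should be rejected
    has_hits = any(k in valid for k in ("alignment", "e-value", "score"))
    return has_hits and "alignment" not in garbage
```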


International Conference on Tools with Artificial Intelligence | 2006

Multi-Criterion Active Learning in Conditional Random Fields

Christopher T. Symons; Nagiza F. Samatova; Ramya Krishnamurthy; Byung-Hoon Park; Tarik Umar; David Buttler; Terence Critchlow; David Hysom

Conditional random fields (CRFs), which are popular supervised learning models for many natural language processing (NLP) tasks, typically require a large collection of labeled data for training. In practice, however, manual annotation of text documents is quite costly. Furthermore, even large labeled training sets can have arbitrarily limited performance peaks if they are not chosen with care. This paper considers the use of multi-criterion active learning for identification of a small but sufficient set of text samples for training CRFs. Our empirical results demonstrate that our method is capable of reducing the manual annotation costs, while also limiting the retraining costs that are often associated with active learning. In addition, we show that the generalization performance of CRFs can be enhanced through judicious selection of training examples.
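As a rough illustration of multi-criterion sample selection, the sketch below greedily picks unlabeled examples by combining an uncertainty criterion with a diversity criterion. The specific criteria, weights, and the stubbed "model confidence" are assumptions for illustration; they are not the paper's exact formulation or its CRF machinery.

```python
# Hedged sketch of multi-criterion active learning: pick unlabeled samples
# that the current model is least sure about AND that differ from what has
# already been selected, so annotation effort goes to informative, diverse
# examples. Criteria and weights are illustrative assumptions.
import math

def uncertainty(confidence: float) -> float:
    """Higher when the model's confidence in its own labeling is low."""
    return 1.0 - confidence

def diversity(candidate_vec, selected_vecs) -> float:
    """Higher when the candidate is far from everything already chosen."""
    if not selected_vecs:
        return 1.0
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0
    return 1.0 - max(cos(candidate_vec, s) for s in selected_vecs)

def select_batch(pool, batch_size=10, w_unc=0.7, w_div=0.3):
    """pool: list of (sample_id, model_confidence, feature_vector) tuples."""
    selected, chosen_vecs = [], []
    for _ in range(min(batch_size, len(pool))):
        best = max((s for s in pool if s[0] not in selected),
                   key=lambda s: w_unc * uncertainty(s[1])
                                 + w_div * diversity(s[2], chosen_vecs))
        selected.append(best[0])
        chosen_vecs.append(best[2])
    return selected
```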


International World Wide Web Conference | 2005

Automatic Discovery and Inferencing of Complex Bioinformatics Web Interfaces

Anne H. H. Ngu; Daniel Rocco; Terence Critchlow; David Buttler

The World Wide Web provides a vast resource to genomics researchers, with Web-based access to distributed data sources such as BLAST sequence homology search interfaces. However, finding the desired scientific information can still be very tedious and frustrating. While there are several known servers on genomic data (e.g., GeneBank, EMBL, NCBI) that are shared and accessed frequently, new data sources are created each day in laboratories all over the world. Sharing these new genomics results is hindered by the lack of a common interface or data exchange mechanism. Moreover, the number of autonomous genomics sources and their rate of change outpace the speed at which they can be manually identified, meaning that the available data is not being utilized to its full potential. An automated system that can find, classify, describe, and wrap new sources without tedious and low-level coding of source-specific wrappers is needed to assist scientists in accessing hundreds of dynamically changing bioinformatics Web data sources through a single interface. A correct classification of any kind of Web data source must address both the capability of the source and the conversation/interaction semantics inherent in the design of the data source. We propose a service class description (SCD), a meta-data approach for classifying Web data sources that takes into account both the capability and the conversational semantics of the source. The ability to discover the interaction pattern of a Web source leads to increased accuracy in the classification process. Our results show that an SCD-based approach successfully classifies two-thirds of BLAST sites with 100% accuracy and two-thirds of bioinformatics keyword search sites with around 80% precision.


International Conference on Management of Data | 2004

Simulation data as data streams

Ghaleb Abdulla; Terence Critchlow; William Arrighi

Computational or scientific simulations are increasingly being applied to solve a variety of scientific problems. Domains such as astrophysics, engineering, chemistry, biology, and environmental studies are benefiting from this important capability. Simulations, however, produce enormous amounts of data that need to be analyzed and understood. In this overview paper, we describe scientific simulation data, its characteristics, and the way scientists generate and use the data. We then compare and contrast simulation data to data streams. Finally, we describe our approach to analyzing simulation data, present the AQSim (Ad-hoc Queries for Simulation data) system, and discuss some of the challenges that result from handling this kind of data.
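Viewing simulation output as a stream suggests keeping one-pass summaries that can later answer ad-hoc queries without re-reading the raw time steps. The sketch below keeps per-variable running statistics in that spirit; it is an illustration only, not the AQSim system, and the record layout is an assumption.

```python
# Hedged sketch of treating simulation output as a data stream: maintain
# one-pass (Welford) summaries per variable so later queries can be answered
# from compact statistics instead of re-reading the raw time steps.
# This illustrates the idea only; it is not AQSim.
class StreamSummary:
    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0
        self.minimum, self.maximum = float("inf"), float("-inf")

    def update(self, x: float):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)

    @property
    def variance(self):
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

# One summary per (variable, zone) as time steps stream past:
summaries = {}

def consume_timestep(records):
    """records: iterable of (variable_name, zone_id, value) tuples (assumed layout)."""
    for var, zone, value in records:
        summaries.setdefault((var, zone), StreamSummary()).update(value)
```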

Collaboration


Dive into Terence Critchlow's collaborations.

Top Co-Authors

David Buttler, Lawrence Livermore National Laboratory
Ghaleb Abdulla, Lawrence Livermore National Laboratory
Ron Musick, Lawrence Livermore National Laboratory
Ling Liu, Georgia Institute of Technology
George Chin, Pacific Northwest National Laboratory
Arie Shoshani, Lawrence Berkeley National Laboratory
Daniel Rocco, University of West Georgia
Ilkay Altintas, University of California
Kerstin Kleese van Dam, Pacific Northwest National Laboratory