Daniel Rocco
University of West Georgia
Publications
Featured research published by Daniel Rocco.
international conference on management of data | 2002
David Buttler; Matthew Coleman; Terence Critchlow; Renato Fileto; Wei Han; Calton Pu; Daniel Rocco; Li Xiong
Advances in the Semantic Web and ontologies have pushed the role of semantics to a new frontier: semantic composition of Web services. A good example of such composition is the querying of multiple bioinformatics data sources. Supporting effective querying over a large collection of bioinformatics data sources presents a number of unique challenges. First, queries over bioinformatics data sources are often complex associative queries over multiple Web documents, where most associations are defined by string matching of textual fragments in two documents. Second, most of the queries required by genomics researchers involve complex data extraction and sophisticated workflows that implement the complex associative access. Third, and not least, complex genomics-specific queries are often reused many times by genomics researchers, either directly or with some refinement, and are considered part of their research results. In this short article we present a list of challenging issues in supporting effective querying over bioinformatics data sources and illustrate them through a selection of representative search scenarios provided by biologists. We end the article with a discussion of how state-of-the-art research and technological development in the Semantic Web, ontologies, Internet data management, and Internet computing systems can help address these issues.
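To make the first challenge concrete, here is a minimal sketch, in Python, of an associative query realized by string matching: records from two sources are joined whenever a textual fragment from one appears in the other. All field names and sample data are hypothetical, not drawn from any system in the paper.

```python
# Minimal sketch of an associative query: the association between two
# Web documents is defined by substring matching of textual fragments.
# All field names and data here are hypothetical.

def associate(source_a, source_b, key_field):
    """Pair records whose key_field fragment appears in the other source's text."""
    matches = []
    for rec_a in source_a:
        fragment = rec_a[key_field]
        for rec_b in source_b:
            if fragment in rec_b["text"]:   # string-matching association
                matches.append((rec_a, rec_b))
    return matches

# Hypothetical example: link gene records to publication abstracts.
genes = [{"symbol": "BRCA1", "organism": "human"}]
papers = [{"text": "We report a novel variant of BRCA1 ..."}]
print(associate(genes, papers, "symbol"))
```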
international conference on web services | 2005
Daniel Rocco; James Caverlee; Ling Liu; Terence Critchlow
This paper presents DynaBot, a domain-specific Web service discovery system. The core idea of the DynaBot service discovery system is to use domain-specific service class descriptions powered by an intelligent Deep Web crawler. In contrast to current registry-based service discovery systems, like the several available UDDI registries, DynaBot promotes focused crawling of the Deep Web of services and discovers candidate services that are relevant to the domain of interest. It uses intelligent filtering algorithms to match services found by focused crawling with the domain-specific service class descriptions. We demonstrate the capability of DynaBot through the BLAST scenario and describe our initial experience with DynaBot.
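As a rough illustration of the filtering step, the sketch below matches a crawled candidate page against a domain-specific service class description; the keywords, input types, and matching rule are our own illustrative assumptions, not DynaBot's actual description format.

```python
import re

# Illustrative service class description; the fields are assumptions,
# not DynaBot's real format.
SERVICE_CLASS = {
    "required_keywords": {"blast", "sequence", "alignment"},
    "required_inputs": {"textarea"},  # e.g. a box for pasting a sequence
}

def matches_service_class(page_text, form_input_types, description):
    """Filter a crawled candidate against the class description."""
    words = set(re.findall(r"[a-z]+", page_text.lower()))
    has_keywords = description["required_keywords"] <= words
    has_inputs = description["required_inputs"] <= set(form_input_types)
    return has_keywords and has_inputs

# Hypothetical candidate discovered by the focused crawler:
page = "NCBI BLAST interface: paste a sequence to request an alignment"
print(matches_service_class(page, ["textarea", "submit"], SERVICE_CLASS))  # True
```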
international conference on service oriented computing | 2004
James Caverlee; Ling Liu; Daniel Rocco
In this paper we present a personalized web service discovery and ranking technique for discovering and ranking relevant data-intensive web services. Our first prototype, called BASIL, supports a personalized view of data-intensive web services through source-biased focus. BASIL provides service discovery and ranking through source-biased probing and source-biased relevance metrics. Concretely, the BASIL approach has three unique features: (1) it is able to determine in very few interactions whether a target service is relevant to the given source service by probing the target with very precise probes; (2) it can evaluate and rank the relevant services discovered based on a set of source-biased relevance metrics; and (3) it can identify interesting types of relationships for each source service with respect to other discovered services, which can be used as value-added metadata for each service. We also introduce a performance optimization technique called source-biased probing with focal terms to further improve the effectiveness of the basic source-biased service discovery algorithm. The paper concludes with a set of initial experiments showing the effectiveness of the BASIL system.
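One plausible way to realize a source-biased relevance metric is sketched below: the target service is summarized only through responses to probes drawn from the source's vocabulary, so relevance is measured from the source's point of view. The weighting is our own simplification, not BASIL's exact formula.

```python
from collections import Counter

def source_biased_focus(source_terms, target_responses):
    """Fraction of the source's term occurrences covered by the probed target."""
    source_counts = Counter(t.lower() for t in source_terms)
    target_vocab = set()
    for response in target_responses:
        target_vocab |= set(response.lower().split())
    covered = sum(c for t, c in source_counts.items() if t in target_vocab)
    return covered / max(1, sum(source_counts.values()))

# Hypothetical probe responses from a target service:
source = ["protein", "sequence", "alignment", "sequence"]
responses = ["sequence alignment results found"]
print(source_biased_focus(source, responses))  # 0.75: 3 of 4 occurrences covered
```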
Bioinformatics | 2003
Daniel Rocco; Terence Critchlow
MOTIVATION: The World Wide Web provides an incredible resource to genomics researchers in the form of query access to distributed data sources, e.g. BLAST sequence homology search interfaces. The number of these autonomous sources and their rate of change outpace the speed at which they can be manually classified, meaning that the available data is not being utilized to its full potential. Manually maintaining a wrapper library will not scale to accommodate the growth of genomics data sources on the Web, challenging us to produce an automated system that can find, classify and wrap new sources without constant human intervention. Previous research has not addressed the problem of automatically locating, classifying and integrating classes of bioinformatics data sources.
RESULTS: This paper presents an overview of a system for finding classes of bioinformatics data sources and integrating them behind a unified interface. We describe our approach for automatic classification of new Web sources into relevance categories, which eliminates the human effort required to maintain a current repository of sources. Our approach is based on a meta-data description of classes of interesting sources that captures the important features of an entire class of services without tying that description to any particular Web source. We examine the features of this format in the context of BLAST sources to show how it relates to the Web sources being described. We then show how a description can be used to determine whether an arbitrary Web source is an instance of the described service. To validate the effectiveness of this approach, we have constructed a prototype that correctly classifies approximately two-thirds of the BLAST sources we tested. We conclude with a discussion of these results, the factors that affect correct automatic classification, and areas for future study.
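The instance test can be pictured as below: the class description supplies a canonical probe and an expected response pattern, and a candidate source is classified as an instance if its response matches. The probe sequence, URL handling, and regular expression are illustrative assumptions, not the paper's actual meta-data format.

```python
import re
import urllib.parse
import urllib.request

# Illustrative class description for BLAST-like sources (assumed fields).
BLAST_CLASS = {
    "probe": {"sequence": "ACGTACGTACGT"},         # canonical test input
    "response_pattern": r"Score\s*=|E[- ]?value",  # output a BLAST site should emit
}

def is_instance(form_url, description, timeout=10):
    """Probe a candidate form and test its response against the class pattern."""
    data = urllib.parse.urlencode(description["probe"]).encode()
    try:
        with urllib.request.urlopen(form_url, data=data, timeout=timeout) as resp:
            body = resp.read().decode(errors="replace")
    except OSError:
        return False  # unreachable sources cannot be classified as instances
    return re.search(description["response_pattern"], body) is not None
```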
international world wide web conferences | 2005
Anne H. H. Ngu; Daniel Rocco; Terence Critchlow; David Buttler
The World Wide Web provides a vast resource to genomics researchers, with Web-based access to distributed data sources such as BLAST sequence homology search interfaces. However, finding the desired scientific information can still be very tedious and frustrating. While there are several known servers on genomic data (e.g., GeneBank, EMBL, NCBI) that are shared and accessed frequently, new data sources are created each day in laboratories all over the world. Sharing these new genomics results is hindered by the lack of a common interface or data exchange mechanism. Moreover, the number of autonomous genomics sources and their rate of change outpace the speed at which they can be manually identified, meaning that the available data is not being utilized to its full potential. An automated system that can find, classify, describe, and wrap new sources without tedious and low-level coding of source-specific wrappers is needed to assist scientists in accessing hundreds of dynamically changing bioinformatics Web data sources through a single interface. A correct classification of any kind of Web data source must address both the capability of the source and the conversation/interaction semantics inherent in the design of the data source. We propose a service class description (SCD), a meta-data approach for classifying Web data sources that takes into account both the capability and the conversational semantics of the source. The ability to discover the interaction pattern of a Web source leads to increased accuracy in the classification process. Our results show that an SCD-based approach successfully classifies two-thirds of BLAST sites with 100% accuracy and two-thirds of bioinformatics keyword search sites with around 80% precision.
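The conversational side of an SCD can be pictured as a small state machine over page states, as in the sketch below; the state names and ordering are our own illustration, not the paper's notation.

```python
# Hedged sketch: model a source's interaction pattern as an ordered list
# of page states and check that an observed conversation follows it.
# State names are our own illustration, not the SCD notation.

EXPECTED_CONVERSATION = ["query_form", "waiting_page", "results_page"]

def follows_conversation(observed_states, expected=EXPECTED_CONVERSATION):
    """True if observed states advance through the expected order
    (optional intermediate states may be skipped) and end at results."""
    pos = -1
    for state in observed_states:
        if state not in expected:
            return False              # unknown page state
        nxt = expected.index(state)
        if nxt <= pos:
            return False              # conversation moved backwards
        pos = nxt
    return pos == len(expected) - 1   # must reach the results page

print(follows_conversation(["query_form", "results_page"]))  # True: waiting page skipped
print(follows_conversation(["query_form", "waiting_page"]))  # False: no results reached
```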
technical symposium on computer science education | 2011
Daniel Rocco; Will Lloyd
Modern distributed version control systems offer compelling advantages for teaching students professional software development practices and skills. In this paper, we explore the potential for incorporating Mercurial into introductory, intermediate, and advanced computing courses. By incorporating version control into the entire CS curriculum, instructors create unique opportunities to engage students in collaborative, real-world projects and activities, giving them critical early exposure to the expectations and assumptions prevalent in the software development community. Early introduction to version control provides students with an important foundation in both personal and collaborative development excellence, offering them a competitive edge in the marketplace and a superior understanding of software development best practice.
international world wide web conferences | 2004
David Buttler; Daniel Rocco; Ling Liu
The Internet and the World Wide Web have enabled a publishing explosion of useful online information, which has produced the unfortunate side effect of information overload: it is increasingly difficult for individuals to keep abreast of fresh information. In this paper we describe an approach for building a system for efficiently monitoring changes to Web documents. This paper has three main contributions. First, we present a coherent framework that captures different characteristics of Web documents. The system uses the Page Digest encoding to provide a comprehensive monitoring system for content, structure, and other interesting properties of Web documents. Second, the Page Digest encoding enables improved performance for individual page monitors through mechanisms such as short-circuit evaluation, linear time algorithms for document and structure similarity, and data size reduction. Finally, we develop a collection of sentinel grouping techniques based on the Page Digest encoding to reduce redundant processing in large-scale monitoring systems by grouping similar monitoring requests together. We examine the effectiveness of these techniques over a wide range of parameters and observe an order-of-magnitude speedup over existing Web-based information monitoring systems.
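A minimal sketch of the short-circuit idea follows, using our own digest choice (SHA-1 over separately extracted structure and content) rather than the actual Page Digest encoding: a monitor compares two small digests instead of walking whole documents.

```python
import hashlib
from html.parser import HTMLParser

class DigestBuilder(HTMLParser):
    """Separate a page's tag structure from its text content."""
    def __init__(self):
        super().__init__()
        self.tags, self.text = [], []
    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
    def handle_data(self, data):
        self.text.append(data.strip())

def page_digest(html):
    builder = DigestBuilder()
    builder.feed(html)
    structure = hashlib.sha1("".join(builder.tags).encode()).hexdigest()
    content = hashlib.sha1("".join(builder.text).encode()).hexdigest()
    return structure, content

def changed(old_html, new_html):
    # Short-circuit evaluation: two digest comparisons decide the common
    # "nothing changed" case without traversing either document again.
    return page_digest(old_html) != page_digest(new_html)

print(changed("<p>hello</p>", "<p>hello</p>"))  # False
print(changed("<p>hello</p>", "<p>world</p>"))  # True
```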
congress on evolutionary computation | 2003
Daniel Rocco; David Buttler; Ling Liu
We introduce Page Digest, a mechanism for efficient storage and processing of Web documents. The Page Digest design encourages a clean separation of the structural elements of Web documents from their content. Its encoding transformation produces many of the advantages of traditional string digest schemes yet remains invertible without introducing significant additional cost or complexity. Using the Page Digest encoding can provide at least an order-of-magnitude speedup when traversing a Web document compared to using a standard document object model implementation. Our experiments show that change detection using Page Digest operates in linear time, offering a 75% improvement in execution performance compared with existing systems. In addition, the Page Digest encoding can reduce the tag name redundancy found in Web documents, allowing a 30% to 50% reduction in document size.
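The separation and invertibility can be pictured with the toy encoder below: tag names go into a small dictionary (removing redundancy), structure becomes (tag id, child count) pairs in document order, and text is stored separately. The field layout is our own sketch; only the invertibility property mirrors the paper.

```python
# Toy separation of structure from content, with an invertible encoding.
# The exact layout is our illustration, not the Page Digest format.

def encode(node):
    """node = (tag, [children], text). Returns (tag_table, shape, texts)."""
    tag_table, shape, texts = [], [], []
    def walk(n):
        tag, children, text = n
        if tag not in tag_table:
            tag_table.append(tag)       # each tag name stored only once
        shape.append((tag_table.index(tag), len(children)))
        texts.append(text)
        for child in children:
            walk(child)
    walk(node)
    return tag_table, shape, texts

def decode(tag_table, shape, texts):
    it = iter(zip(shape, texts))
    def build():
        (tag_id, n_children), text = next(it)
        return (tag_table[tag_id], [build() for _ in range(n_children)], text)
    return build()

doc = ("html", [("body", [("p", [], "hello"), ("p", [], "world")], "")], "")
assert decode(*encode(doc)) == doc   # round-trip: the encoding is invertible
```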
bioinformatics and bioengineering | 2003
Ling Liu; David Buttler; Terence Critchlow; Wei Han; Henrique Paques; Calton Pu; Daniel Rocco
Modern bioinformatics data sources are widely used by molecular biologists for homology searching and new drug discovery. User-friendly yet responsive access is one of the most desirable properties for integrated access to the rapidly growing, heterogeneous, and distributed collection of data sources. The increasing volume and diversity of digital information related to bioinformatics (such as genomes, protein sequences, protein structures, etc.) have led to a growing problem that conventional data management systems do not address, namely finding which information sources, out of many candidate choices, are the most relevant and most accessible for answering a given user query. We refer to this problem as the query routing problem. In this paper we introduce the notion of query routing and its issues, and present a practical solution for designing a scalable query routing system based on multi-level progressive pruning strategies. The key idea is to create and maintain source capability profiles independently, and to provide algorithms that can dynamically discover relevant information sources for a given query through the smart use of source profiles. Compared to the keyword-based indexing techniques adopted in most search engines and software, our approach offers fine-granularity interest matching, making it more powerful and effective for handling queries with complex conditions.
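A hedged sketch of multi-level progressive pruning: each level applies a stricter test against independently maintained source capability profiles, shrinking the candidate set before the next level runs. The profile fields and the three levels below are illustrative, not the paper's exact design.

```python
# Illustrative source capability profiles (assumed fields).
SOURCE_PROFILES = {
    "src_blast":  {"domains": {"genomics"},  "operations": {"homology_search"},  "avg_latency_s": 4.0},
    "src_pdb":    {"domains": {"proteomics"}, "operations": {"structure_lookup"}, "avg_latency_s": 1.5},
    "src_pubmed": {"domains": {"genomics", "proteomics"}, "operations": {"keyword_search"}, "avg_latency_s": 0.8},
}

def route(query, profiles=SOURCE_PROFILES):
    """Progressively prune candidate sources, cheapest test first."""
    candidates = list(profiles.items())
    # Level 1: coarse domain match.
    candidates = [(n, p) for n, p in candidates if query["domain"] in p["domains"]]
    # Level 2: required operation (capability match).
    candidates = [(n, p) for n, p in candidates if query["operation"] in p["operations"]]
    # Level 3: accessibility constraint, e.g. a latency budget.
    candidates = [(n, p) for n, p in candidates if p["avg_latency_s"] <= query["max_latency_s"]]
    return [n for n, _ in candidates]

print(route({"domain": "genomics", "operation": "homology_search", "max_latency_s": 5.0}))
# ['src_blast']
```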
international world wide web conferences | 2006
James Caverlee; Ling Liu; Daniel Rocco
The proliferation of deep web databases has been phenomenal over the last decade, spawning growing interest in the automated discovery of interesting relationships among available deep web databases. Unlike the “surface” web of static pages, these deep web databases provide data through a web-based query interface and account for a huge portion of all web content. This paper presents a novel source-biased approach to efficiently discover interesting relationships among web-enabled databases on the deep web. Our approach supports a relationship-centric view over a collection of deep web databases through source-biased database analysis and exploration, and has three unique features. First, we develop source-biased probing techniques, which allow us to determine in very few interactions whether a target database is relevant to the source database by probing the target with very precise probes. Second, we introduce source-biased relevance metrics to evaluate the relevance of the deep web databases discovered, to identify interesting types of source-biased relationships for a collection of deep web databases, and to rank them accordingly. The source-biased relationships discovered not only present value-added metadata for each deep web database but can also provide direct support for personalized relationship-centric queries. Third, and not least, we develop a performance optimization using source-biased probing with focal terms to further improve the effectiveness of the basic source-biased model. A prototype system is designed for crawling, probing, and supporting relationship-centric queries over deep web databases using the source-biased approach. Our experiments evaluate the effectiveness of the proposed source-biased analysis and discovery model, showing that the source-biased approach outperforms both query-biased and unbiased probing.
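A minimal sketch of the focal-term idea: rather than probing with arbitrary source terms, probes are spread across topical groups of the source vocabulary so that fewer probes cover more of the source's topics. The grouping heuristic below (terms grouped by the first document they appear in) is a simplifying assumption, not the paper's clustering method.

```python
from itertools import islice

def focal_term_probes(source_docs, probes_per_group=1, n_groups=3):
    """Pick probe terms spread across rough topical groups of source terms."""
    # Group terms by the document they first appear in: a crude proxy
    # for the co-occurrence-based grouping behind focal terms.
    groups, seen = [], set()
    for doc in source_docs:
        group = [t for t in doc.lower().split() if t not in seen]
        seen.update(group)
        if group:
            groups.append(group)
    probes = []
    for group in islice(groups, n_groups):
        probes.extend(group[:probes_per_group])
    return probes

docs = ["protein folding energy", "gene expression microarray", "sequence alignment score"]
print(focal_term_probes(docs))  # one probe per topical group: ['protein', 'gene', 'sequence']
```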