Publication


Featured research published by David E. Simmen.


international conference on management of data | 1996

Fundamental techniques for order optimization

David E. Simmen; Eugene J. Shekita; Timothy R. Malkemus

Decision support applications are growing in popularity as more business data is kept on-line. Such applications typically include complex SQL queries that can test a query optimizer's ability to produce an efficient access plan. Many access plan strategies exploit the physical ordering of data provided by indexes or sorting. Sorting is an expensive operation, however. Therefore, it is imperative that sorting is optimized in some way or avoided altogether. Toward that goal, this paper describes novel optimization techniques for pushing down sorts in joins, minimizing the number of sorting columns, and detecting when sorting can be avoided because of predicates, keys, or indexes. A set of fundamental operations is described that provides the foundation for implementing such techniques. The operations exploit data properties that arise from predicate application, uniqueness, and functional dependencies. These operations and techniques have been implemented in IBM's DB2/CS.
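
As an illustration of the kind of reasoning involved, here is a minimal Python sketch of reducing an order requirement using columns bound by equality predicates and functional dependencies. The function names and data structures (reduce_order, fd_closure, the dict-of-sets FD representation) are hypothetical illustrations, not DB2 internals.

```python
def fd_closure(known, functional_deps):
    """Expand a set of known columns with functional dependencies until a fixpoint."""
    known = set(known)
    changed = True
    while changed:
        changed = False
        for determinants, dependents in functional_deps.items():
            if determinants <= known and not dependents <= known:
                known |= dependents
                changed = True
    return known

def reduce_order(required_order, bound_columns, functional_deps):
    """Drop columns from a required sort order that are constant or functionally implied."""
    reduced = []
    known = fd_closure(bound_columns, functional_deps)
    for col in required_order:
        if col in known:
            continue          # value already fixed: sorting on it adds nothing
        reduced.append(col)
        known = fd_closure(known | {col}, functional_deps)
    return reduced

# ORDER BY a, b, c with an equality predicate a = 5 and a functional dependency b -> c:
print(reduce_order(["a", "b", "c"], {"a"}, {frozenset({"b"}): {"c"}}))   # ['b']
```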


international conference on management of data | 2008

Damia: data mashups for intranet applications

David E. Simmen; Mehmet Altinel; Volker Markl; Sriram Padmanabhan; Ashutosh Singh

Increasingly large numbers of situational applications are being created by enterprise business users as a by-product of solving day-to-day problems. In efforts to address the demand for such applications, corporate IT is moving toward Web 2.0 architectures. In particular, the corporate intranet is evolving into a platform of readily accessible data and services where communities of business users can assemble and deploy situational applications. Damia is a web-style data integration platform being developed to address the data problem presented by such applications, which often access and combine data from a variety of sources. Damia allows business users to quickly and easily create data mashups that combine data from desktop, web, and traditional IT sources into feeds that can be consumed by AJAX and other types of web applications. This paper describes the key features and design of Damia's data integration engine, which has been packaged with Mashup Hub, an enterprise feed server currently available for download on IBM alphaWorks. Mashup Hub exposes Damia's data integration capabilities in the form of a service that allows users to create hosted data mashups.
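
As a rough illustration of what such a data mashup does, the toy Python sketch below merges and filters feeds of entries. The functions and fields (merge_feeds, filter_feed, the sample feeds) are invented for illustration and are not Damia's operator API.

```python
from datetime import datetime

def merge_feeds(*feeds):
    """Union several feeds (lists of dict entries) into one feed, newest first."""
    merged = [entry for feed in feeds for entry in feed]
    return sorted(merged, key=lambda e: e.get("updated", datetime.min), reverse=True)

def filter_feed(feed, predicate):
    """Keep only the entries satisfying the predicate, mirroring a filter operator."""
    return [entry for entry in feed if predicate(entry)]

crm_feed = [{"title": "New lead", "updated": datetime(2008, 3, 1), "region": "EMEA"}]
news_feed = [{"title": "Quarterly results", "updated": datetime(2008, 3, 2), "region": "US"}]

mashup = filter_feed(merge_feeds(crm_feed, news_feed), lambda e: e["region"] == "EMEA")
print(mashup)   # only the EMEA entry survives
```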


IBM Systems Journal | 2006

Cost-based optimization in DB2 XML

Andrey Balmin; Tom Eliaz; John F. Hornibrook; Lipyeow Lim; Guy M. Lohman; David E. Simmen; Min Wang; Chun Zhang

DB2 XML is a hybrid database system that combines the relational capabilities of DB2 Universal Database™ (UDB) with comprehensive native XML support. DB2 XML augments DB2® UDB with a native XML store, XML indexes, and query processing capabilities for both XQuery and SQL/XML that are integrated with those of SQL. This paper presents the extensions made to the DB2 UDB compiler, and especially its cost-based query optimizer, to support XQuery and SQL/XML queries, using much of the same infrastructure developed for relational data queried by SQL. It describes the challenges that supporting XQuery and SQL/XML poses to the relational infrastructure and provides the rationale for the extensions that were made to the three main parts of the optimizer: the plan operators, the cardinality and cost model, and statistics collection.
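
As a generic illustration of cost-based plan choice (not DB2's actual cost model), the sketch below prices two candidate access plans from a cardinality estimate and keeps the cheaper one; the cost constants and function names are made up.

```python
def cost_full_scan(doc_count, pages_per_doc=0.1, cpu_per_doc=0.5):
    """Rough cost of scanning and navigating every document."""
    return doc_count * (pages_per_doc + cpu_per_doc)

def cost_index_scan(doc_count, selectivity, io_per_match=1.0, cpu_per_match=0.2):
    """Rough cost of probing an XML index and fetching only the matching documents."""
    matches = doc_count * selectivity
    return matches * (io_per_match + cpu_per_match)

def choose_plan(doc_count, selectivity):
    """Pick the cheapest candidate plan given the estimated selectivity."""
    plans = {
        "full scan": cost_full_scan(doc_count),
        "XML index scan": cost_index_scan(doc_count, selectivity),
    }
    return min(plans.items(), key=lambda kv: kv[1])

print(choose_plan(doc_count=1_000_000, selectivity=0.001))   # index scan wins
print(choose_plan(doc_count=1_000_000, selectivity=0.9))     # full scan wins
```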


international conference on data engineering | 2001

Using EELs, a practical approach to outerjoin and antijoin reordering

Jun Rao; Bruce G. Lindsay; Guy M. Lohman; Hamid Pirahesh; David E. Simmen

Outerjoins and antijoins are two important classes of joins in database systems. Reordering outerjoins and antijoins with innerjoins is challenging because not all join orders preserve the semantics of the original query. Previous work did not consider antijoins and was restricted to a limited class of queries. We consider using a conventional bottom-up optimizer to reorder different types of joins. We propose extending each join predicate's eligibility list, which contains all the tables referenced in the predicate. An extended eligibility list (EEL) includes all the tables needed by a predicate to preserve the semantics of the original query. We describe an algorithm that can set up the EELs properly in a bottom-up traversal of the original operator tree. A conventional join optimizer is then modified to check the EELs when generating sub-plans. Our approach handles antijoins and can resolve many practical issues. It is now being implemented in an upcoming release of IBM's Universal Database Server for Unix, Windows, and OS/2.
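
The Python sketch below illustrates only the EEL check itself: a predicate may be applied at a join of two sub-plans when its extended eligibility list is covered by the joined tables. How the EELs are derived from the outerjoin/antijoin structure is the subject of the paper and is not reproduced here; the EELs below are hand-assigned for illustration.

```python
def applicable_predicates(left_tables, right_tables, predicates):
    """Return predicates whose EEL is covered by the combined sub-plan's tables."""
    combined = left_tables | right_tables
    return [
        p for p in predicates
        if p["eel"] <= combined                                   # semantics-preserving check
        and p["eel"] & left_tables and p["eel"] & right_tables    # actually joins the two sides
    ]

predicates = [
    # inner-join predicate R.a = S.a: the EEL is just the referenced tables
    {"name": "R.a = S.a", "eel": {"R", "S"}},
    # predicate involving the null-producing side of an outerjoin: the EEL also pulls in T
    {"name": "S.b = T.b", "eel": {"R", "S", "T"}},
]

print([p["name"] for p in applicable_predicates({"R"}, {"S"}, predicates)])
# ['R.a = S.a']  -- the second predicate must wait until T is joined in
print([p["name"] for p in applicable_predicates({"R", "S"}, {"T"}, predicates)])
# ['S.b = T.b']
```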


very large data bases | 2014

Large-scale graph analytics in Aster 6: bringing context to big data discovery

David E. Simmen; Karl Schnaitter; Jeff Davis; Yingjie He; Sangeet Lohariwala; Ajay Mysore; Vinayak Shenoi; Mingfeng Tan; Yu Xiao

Graph analytics is an important big data discovery technique. Applications include identifying influential employees for retention, detecting fraud in a complex interaction network, and determining product affinities by exploiting community buying patterns. Specialized platforms have emerged to satisfy the unique processing requirements of large-scale graph analytics; however, these platforms do not enable graph analytics to be combined with other analytics techniques, nor do they work well with the vast ecosystem of SQL-based business applications. Teradata Aster 6.0 adds support for large-scale graph analytics to its repertoire of analytics capabilities. The solution extends the multi-engine processing architecture with support for bulk synchronous parallel execution, and a specialized graph engine that enables iterative analysis of graph structures. Graph analytics functions written to the vertex-oriented API exposed by the graph engine can be invoked from the context of an SQL query and composed with existing SQL-MR functions, thereby enabling data scientists and business applications to express computations that combine large-scale graph analytics with techniques better suited to a different style of processing. The solution includes a suite of pre-built graph analytic functions adapted for parallel execution.
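
To give a flavor of the vertex-oriented, bulk synchronous parallel style (the names and structure here are a generic sketch, not Aster's actual API), the toy program below labels connected components by propagating minimum vertex ids across supersteps.

```python
from collections import defaultdict

def bsp_connected_components(edges):
    """Label connected components with per-vertex compute steps separated by barriers."""
    vertices = {v for e in edges for v in e}
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    label = {v: v for v in vertices}
    inbox = defaultdict(list)
    for v in vertices:                       # superstep 0: announce labels to neighbors
        for n in neighbors[v]:
            inbox[n].append(label[v])

    while any(inbox.values()):
        outbox = defaultdict(list)
        for v, messages in inbox.items():    # "compute" for each vertex with pending messages
            best = min(messages)
            if best < label[v]:
                label[v] = best
                for n in neighbors[v]:       # send only when the label improves
                    outbox[n].append(best)
        inbox = outbox                       # barrier: next superstep
    return label

print(bsp_connected_components([(1, 2), (2, 3), (7, 8)]))
# vertices 1, 2, 3 end up labeled 1; vertices 7, 8 end up labeled 7
```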


very large data bases | 2004

Progressive optimization in action

Vijayshankar Raman; Volker Markl; David E. Simmen; Guy M. Lohman; Hamid Pirahesh

Progressive Optimization (POP) is a technique to make query plans robust and minimize the need for DBA intervention by repeatedly re-optimizing a query during runtime if the cardinalities estimated during optimization prove to be significantly incorrect. POP works by carefully calculating validity ranges for each plan operator under which the overall plan can be optimal. POP then instruments the query plan with checkpoints that validate at runtime that cardinalities do lie within validity ranges, and re-optimizes the query otherwise. In this demonstration we showcase POP implemented for a research prototype version of IBM's DB2 DBMS, using a mix of real-world and synthetic benchmark databases and workloads. For selected queries of the workload we display the query plans with validity ranges as well as the placement of the various kinds of CHECK operators using the DB2 graphical plan explain tool. We also execute the queries, showing how and where re-optimization is triggered through the CHECK operators, the new plan generated upon re-optimization, and the extent to which previously computed intermediate results are reused.
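
A minimal sketch of the checkpoint idea, assuming a simple iterator-style runtime rather than DB2's implementation: a CHECK-like wrapper counts the rows an operator actually produces and signals re-optimization when the count falls outside the validity range the optimizer attached to it.

```python
class ReoptimizationNeeded(Exception):
    """Signal that the runtime cardinality invalidates the current plan."""

def check_operator(rows, validity_range):
    """Stream rows through, then verify the actual cardinality against the range."""
    low, high = validity_range
    count = 0
    for row in rows:
        count += 1
        yield row
    if not (low <= count <= high):
        raise ReoptimizationNeeded(f"actual cardinality {count} outside [{low}, {high}]")

# Example: the optimizer assumed the wrapped operator would produce 100-10000 rows.
filtered = (r for r in range(50))            # stand-in for a plan operator's output
try:
    results = list(check_operator(filtered, validity_range=(100, 10_000)))
except ReoptimizationNeeded as e:
    print("re-optimizing:", e)               # here the query would be re-planned
```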


International Workshop on Model-Based Software and Data Integration | 2008

Data Mashups for Situational Applications

Volker Markl; Mehmet Altinel; David E. Simmen; Ashutosh Singh

Situational applications require business users to create, combine, and catalog data feeds and other enterprise data sources. Damia is a lightweight enterprise data integration engine inspired by the Web 2.0 mashup phenomenon. It consists of (1) a browser-based user interface that allows for the specification of data mashups as data flow graphs using a set of Damia operators, following programming-by-example principles, (2) a server with an execution engine, as well as (3) APIs for searching, debugging, executing, and managing mashups. Damia provides a base data model and primitive operators based on the XQuery Infoset. A feed abstraction built on that model enables combining, filtering, and transforming data feeds. This paper presents an overview of the Damia system as well as a research vision for data-intensive situational applications. A first version of Damia realizing some of the concepts described in this paper is available as a web service [17] and for download as part of IBM's Mashup Starter Kit [18].
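
The data-flow notion can be sketched as follows; the operator names, graph encoding, and execution scheme are illustrative assumptions, not Damia's actual primitives.

```python
def run_mashup(graph, sources):
    """Evaluate a data-flow graph given as {node: (operator, [input_nodes])}."""
    results = dict(sources)                  # leaf nodes are pre-loaded source feeds

    def evaluate(node):
        if node not in results:
            op, inputs = graph[node]
            results[node] = op(*[evaluate(i) for i in inputs])
        return results[node]

    # evaluate every node; the last node is taken as the mashup's output feed
    return [evaluate(n) for n in graph][-1]

graph = {
    "filter_news": (lambda feed: [e for e in feed if "acquisition" in e], ["news"]),
    "merge":       (lambda a, b: a + b, ["filter_news", "tickets"]),
}
sources = {"news": ["acquisition rumor", "weather"], "tickets": ["ticket 42 opened"]}
print(run_mashup(graph, sources))   # ['acquisition rumor', 'ticket 42 opened']
```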


extending database technology | 1996

Fundamental Techniques for Order Optimization

David E. Simmen; Eugene J. Shekita; Timothy R. Malkemus

This paper briefly describes some of the novel techniques used by the query optimizer of IBM's DB2 to process and optimize the way order requirements are satisfied.


international conference on management of data | 2009

Enabling enterprise mashups over unstructured text feeds with InfoSphere MashupHub and SystemT

David E. Simmen; Frederick R. Reiss; Yunyao Li; Suresh Thalamati

Enterprise mashup scenarios often involve feeds derived from data created primarily for human consumption, such as email, news, calendars, blogs, and web feeds. These data sources can test the capabilities of current data mashup products, as the attributes needed to perform join, aggregation, and other operations are often buried within unstructured feed text. Information extraction technology is a key enabler in such scenarios, using annotators to convert unstructured text into structured information that can facilitate mashup operations. Our demo presents the integration of SystemT, an information extraction system from IBM Research, with IBM's InfoSphere MashupHub. We show how to build domain-specific annotators with SystemT's declarative rule language, AQL, and how to use these annotators to combine structured and unstructured information in an enterprise mashup.
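
As a stand-in for the kind of extraction an AQL annotator performs (this is plain Python, not AQL, and the rule and field names are invented), the sketch below pulls structured tuples out of free-form feed text so that a mashup can join on them.

```python
import re

TICKET_RULE = re.compile(r"ticket\s+#(?P<ticket_id>\d+)", re.IGNORECASE)

def annotate(feed_entries):
    """Extract {ticket_id, text} records from unstructured feed text."""
    records = []
    for text in feed_entries:
        for match in TICKET_RULE.finditer(text):
            records.append({"ticket_id": match.group("ticket_id"), "text": text})
    return records

emails = ["Re: ticket #1234 is still failing", "Lunch on Friday?"]
tickets = [{"ticket_id": "1234", "owner": "dsimmen"}]

# The extraction makes a structured join with the ticket feed possible:
extracted = annotate(emails)
joined = [{**e, **t} for e in extracted for t in tickets if e["ticket_id"] == t["ticket_id"]]
print(joined)
```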


international conference on data engineering | 2015

Accelerating Big Data analytics with Collaborative Planning in Teradata Aster 6

Aditi Subodh Pandit; Derrick Poo-ray Kondo; David E. Simmen; Anjali Norwood; Tongxin Bai

The volume, velocity, and variety of Big Data necessitate the development of new and innovative data processing software. A multitude of SQL implementations on distributed systems have emerged in recent years to enable large-scale data analysis. User-defined table operators (written in procedural languages) embedded in these SQL implementations are a powerful mechanism to succinctly express and perform analytic operations typical in Big Data discovery workloads. Table operators can be easily customized to implement different processing models such as map, reduce, and graph execution. Despite an inherently parallel execution model, the performance and scalability of these table operators are greatly restricted because they appear as a black box to a typical SQL query optimizer. The optimizer cannot infer even the basic properties of table operators, which prevents the application of optimization rules and strategies. In this paper, we introduce the concept of “Collaborative Planning”, which results in the removal of redundant operations and a better rearrangement of query plan operators. The optimization of the query proceeds through a collaborative exchange between the planner and the table operator: plan properties and context information about surrounding query plan operations are exchanged between the optimizer and the table operator. Knowing these properties also allows the author of the table operator to optimize its embedded logic. Our main contribution in this paper is the design and implementation of Collaborative Planning in the Teradata Aster 6 system. Using real-world workloads, we show that Collaborative Planning reduces query execution times by as much as 90.0% in common use cases, resulting in a 24x speedup.
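
A hedged sketch of the property-exchange idea, using an invented contract object rather than Teradata Aster's API: the table operator declares which columns it needs and whether it preserves the input partitioning, and a toy planner uses that contract to push down a projection and elide a redundant repartition.

```python
class TableOperatorContract:
    """Properties a table operator exposes to the planner (hypothetical)."""
    def __init__(self, required_columns, preserves_partitioning):
        self.required_columns = required_columns
        self.preserves_partitioning = preserves_partitioning

def plan_with_contract(input_columns, input_partitioned_on, contract, output_partition_key):
    plan = []
    # projection pushdown: feed the operator only the columns it declared it needs
    pruned = [c for c in input_columns if c in contract.required_columns]
    plan.append(f"project {pruned}")
    plan.append("run table operator")
    # redundant-operation removal: skip repartitioning when the operator preserves
    # a partitioning that already matches the required output key
    if not (contract.preserves_partitioning and input_partitioned_on == output_partition_key):
        plan.append(f"repartition on {output_partition_key}")
    return plan

contract = TableOperatorContract(required_columns={"user_id", "event"}, preserves_partitioning=True)
print(plan_with_contract(
    input_columns=["user_id", "event", "payload", "ts"],
    input_partitioned_on="user_id",
    contract=contract,
    output_partition_key="user_id",
))
# -> projection pushed down, repartition elided
```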
