Seema Sundara
Oracle Corporation
Publications
Featured research published by Seema Sundara.
international conference on data engineering | 2000
Jagannathan Srinivasan; Ravi Murthy; Seema Sundara; Nipun Agarwal; Samuel DeFazio
Extensible indexing is a SQL-based framework that allows users to define domain-specific indexing schemes and integrate them into the Oracle8i server. Users register a new indexing scheme, the set of related operators, and additional properties through SQL data definition language (DDL) extensions. The implementation of an indexing scheme is provided as a set of Oracle Data Cartridge Interface (ODCIIndex) routines for index-definition, index-maintenance, and index-scan operations. An index created using the new indexing scheme, referred to as a domain index, behaves and performs analogously to indexes built natively by the database system. The Oracle8i server implicitly invokes the user-supplied index implementation code when domain index operations are performed, and executes user-supplied index scan routines to evaluate domain-specific operators efficiently. This paper provides an overview of the framework and describes the steps needed to implement an indexing scheme. It also presents a case study of the Oracle cartridges (interMedia Text, Spatial, and Visual Information Retrieval) and the Daylight (chemical compound searching) cartridge, which have implemented new indexing schemes using this framework, and discusses the framework's benefits and limitations.
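To make the division of labor concrete, here is a minimal Python sketch of the framework's shape, assuming a toy keyword index as the domain-specific scheme. The class and method names (`IndexImplementation`, `create`, `insert`, `delete`, `scan`, `Table`) are hypothetical: they loosely mirror the definition, maintenance, and scan routine groups described above and are not Oracle's ODCIIndex interface. The point is that the "server" (`Table`) calls the user-supplied routines implicitly on DML and routes operator evaluation through the index scan.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List, Optional, Set


class IndexImplementation(ABC):
    """Hypothetical user-supplied indexing scheme (not Oracle's ODCI API)."""

    @abstractmethod
    def create(self) -> None: ...  # index definition

    @abstractmethod
    def insert(self, rowid: int, value: str) -> None: ...  # index maintenance

    @abstractmethod
    def delete(self, rowid: int, value: str) -> None: ...  # index maintenance

    @abstractmethod
    def scan(self, operator: str, arg: str) -> Iterator[int]: ...  # index scan


class KeywordIndex(IndexImplementation):
    """Toy domain-specific scheme: an inverted index from words to row ids."""

    def create(self) -> None:
        self.postings: Dict[str, Set[int]] = {}

    def insert(self, rowid: int, value: str) -> None:
        for word in value.lower().split():
            self.postings.setdefault(word, set()).add(rowid)

    def delete(self, rowid: int, value: str) -> None:
        for word in value.lower().split():
            self.postings.get(word, set()).discard(rowid)

    def scan(self, operator: str, arg: str) -> Iterator[int]:
        if operator != "contains":
            raise ValueError(f"unsupported operator {operator!r}")
        yield from sorted(self.postings.get(arg.lower(), set()))


class Table:
    """Toy 'server' that invokes the index routines on the user's behalf."""

    def __init__(self) -> None:
        self.rows: Dict[int, str] = {}
        self.index: Optional[IndexImplementation] = None
        self._next_rowid = 0

    def create_domain_index(self, impl: IndexImplementation) -> None:
        impl.create()
        for rowid, value in self.rows.items():  # build over existing rows
            impl.insert(rowid, value)
        self.index = impl

    def insert(self, value: str) -> int:
        rowid = self._next_rowid
        self._next_rowid += 1
        self.rows[rowid] = value
        if self.index is not None:
            self.index.insert(rowid, value)  # implicit index maintenance
        return rowid

    def delete(self, rowid: int) -> None:
        value = self.rows.pop(rowid)
        if self.index is not None:
            self.index.delete(rowid, value)  # implicit index maintenance

    def select_where(self, operator: str, arg: str) -> List[str]:
        """Evaluate a domain operator through the index scan routine."""
        if self.index is None:
            raise RuntimeError("no domain index defined")
        return [self.rows[r] for r in self.index.scan(operator, arg)]
```

For example, inserting a row, creating the index over it, and inserting a second row leaves both rows visible to `select_where("contains", "cartridge")`; deleting the second row removes it from later scans without any explicit index call by the user.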
international conference on data engineering | 2012
Souripriya Das; Seema Sundara; Matthew Perry; Jagannathan Srinivasan; Jayanta Banerjee; Aravind Yalamanchi
This paper describes the Semantic Indexing feature introduced in Oracle Database for indexing unstructured text (document) columns. This capability enables searching for concepts (such as people, places, organizations, and events) in addition to words or phrases, with further options for sense disambiguation and term expansion by consulting knowledge captured in OWL/RDF ontologies. The distinguishing aspects of our approach are: 1) Indexing: instead of building a traditional inverted index of (annotated) token and/or named-entity occurrences, we extract the entities, associations, and events present in text column data and store them as RDF named graphs in the Oracle Database Semantic Store. This base content can be further augmented with knowledge bases and inferred triples (obtained by applying domain-specific ontologies and rule bases). 2) Querying: instead of relying on proprietary extensions for specifying a search, we allow users to specify a complete SPARQL query pattern that can capture arbitrarily complex relationships between query terms. We have implemented this feature by introducing a sem_contains SQL operator and the associated sem_indextype indexing scheme. The indexing scheme employs an extensible architecture that supports indexing of unstructured text using native as well as third-party text extraction tools. The paper presents a model for the semantic index and querying, describes the feature, and outlines its implementation leveraging Oracle's native support for RDF/OWL storage, inferencing, and querying. We also report a study involving use of this feature on a TREC collection of over 130,000 news articles.
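As a rough illustration of querying per-document graphs with a pattern rather than keywords, the Python sketch below stores each document's extracted triples as a named graph and returns the documents whose graph matches a conjunctive triple pattern (a tiny subset of SPARQL basic graph patterns). The gazetteer-based extractor, the `mentionedWith` predicate, and the function names are hypothetical placeholders; the sketch does not model sem_contains, sem_indextype, real extraction tools, or the Oracle semantic store.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hypothetical gazetteer standing in for a real entity-extraction tool.
GAZETTEER = {"oracle": "Organization", "redwood shores": "Place", "larry ellison": "Person"}


def extract(text: str) -> Set[Triple]:
    """Toy extractor: type each known entity and link co-occurring entities."""
    lowered = text.lower()
    found = [name for name in GAZETTEER if name in lowered]
    triples: Set[Triple] = set()
    for name in found:
        triples.add((name, "rdf:type", GAZETTEER[name]))
        for other in found:
            if other != name:
                triples.add((name, "mentionedWith", other))
    return triples


def _unify(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend the binding so the pattern matches the triple, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):  # variable: bind it or check the existing binding
            if result.setdefault(term, value) != value:
                return None
        elif term != value:  # constant: must match exactly
            return None
    return result


def _match(patterns: List[Triple], graph: Set[Triple], binding: Binding) -> Iterator[Binding]:
    """Backtracking evaluation of a list of triple patterns over one graph."""
    if not patterns:
        yield binding
        return
    for triple in graph:
        extended = _unify(patterns[0], triple, binding)
        if extended is not None:
            yield from _match(patterns[1:], graph, extended)


def search(index: Dict[str, Set[Triple]], patterns: List[Triple]) -> List[str]:
    """Return ids of documents whose named graph satisfies every pattern."""
    return sorted(doc for doc, graph in index.items()
                  if next(_match(patterns, graph, {}), None) is not None)
```

A query such as "a Person mentioned with an Organization" is then the pattern list `[("?p", "rdf:type", "Person"), ("?p", "mentionedWith", "?o"), ("?o", "rdf:type", "Organization")]`, which no plain keyword search can express directly.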
extending database technology | 2009
Ying Hu; Seema Sundara; Jagannathan Srinivasan
The concept of time-constrained SQL queries was introduced to address the problem of long-running SQL queries. A key approach to supporting time-constrained SQL queries is to use sampling to reduce the amount of data that needs to be processed, thereby allowing the query to complete within the specified time constraint. However, sampling makes the query results approximate and hence requires the system to estimate the values of the expressions (especially aggregates) occurring in the select list. Producing estimates for aggregates is therefore crucial if time-constrained approximate SQL queries are to be useful, and it is the focus of this paper. Specifically, we address the problem of estimating commonly occurring aggregates (namely, SUM, COUNT, AVG, MEDIAN, MIN, and MAX) in time-constrained approximate queries. We give both point and interval estimates for SUM, COUNT, AVG, and MEDIAN using Bernoulli sampling for various types of queries, including join processing with cross-product sampling. For MIN (MAX), we give the confidence level that a proportion 100γ% of the population will exceed the MIN (or be less than the MAX) obtained from the sampled data.
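The sketch below shows standard textbook estimators under Bernoulli sampling with inclusion probability `p`, assuming a normal approximation for the intervals: the Horvitz-Thompson estimator for SUM (COUNT is SUM over a column of ones), the sample mean for AVG, and the distribution-free confidence 1 - γ^n that at least a fraction γ of the population exceeds the sample MIN, which holds for n i.i.d. draws from a continuous distribution. These are not necessarily the exact formulas derived in the paper; MEDIAN and join (cross-product) estimates are omitted, and the function names and default `z = 1.96` (roughly 95% confidence) are assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Estimate = Tuple[float, float, float]  # (point, lower, upper)


def bernoulli_sample(values: Sequence[float], p: float, seed: int = 0) -> List[float]:
    """Keep each row independently with probability p."""
    rng = random.Random(seed)
    return [v for v in values if rng.random() < p]


def estimate_sum(sample: Sequence[float], p: float, z: float = 1.96) -> Estimate:
    """Horvitz-Thompson SUM estimate; variance estimated as (1-p)/p^2 * sum(y^2)."""
    point = sum(sample) / p
    variance = (1 - p) / p ** 2 * sum(y * y for y in sample)
    half = z * math.sqrt(variance)
    return point, point - half, point + half


def estimate_count(sample: Sequence[float], p: float, z: float = 1.96) -> Estimate:
    """COUNT is SUM over a column of ones."""
    return estimate_sum([1.0] * len(sample), p, z)


def estimate_avg(sample: Sequence[float], z: float = 1.96) -> Estimate:
    """Sample mean with a CLT interval; needs at least two sampled rows and
    ignores the finite-population correction."""
    n = len(sample)
    mean = sum(sample) / n
    s2 = sum((y - mean) ** 2 for y in sample) / (n - 1)
    half = z * math.sqrt(s2 / n)
    return mean, mean - half, mean + half


def min_confidence(gamma: float, n: int) -> float:
    """Confidence that at least a fraction gamma of the population exceeds the
    sample MIN, assuming n i.i.d. draws from a continuous distribution.
    The MAX case is symmetric (apply it to negated values)."""
    return 1 - gamma ** n
```

With `p = 1` every row is kept and the SUM interval collapses to the exact total, which is a quick sanity check on the variance term.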
international conference on data engineering | 2010
Seema Sundara; Medha Atre; Vladimir Kolovski; Souripriya Das; Zhe Wu; Eugene Inseok Chong; Jagannathan Srinivasan
The paper addresses the problem of visualizing large-scale RDF data via a 3-S approach, namely by using 1) Subsets: to present only relevant data for visualization; both static and dynamic subsets can be specified; 2) Summaries: to capture the essence of the RDF data being viewed; summarized data can be expanded on demand, thereby allowing users to create hybrid (summary-detail) fisheye views of RDF data; and 3) Sampling: to further optimize visualization of large-scale data where a representative sample suffices. The visualization scheme works with both asserted and inferred triples (generated using RDF(S) and OWL semantics). This scheme is implemented in Oracle by developing a plug-in for the Cytoscape graph visualization tool, which uses functions defined in an Oracle PL/SQL package to provide fast and optimized access to the Oracle Semantic Store containing RDF data. Interactive visualization of a synthesized RDF data set (LUBM, 1 million triples), two native RDF datasets (Wikipedia, 47 million triples, and UniProt, 700 million triples), and an OWL ontology (eClassOwl, with a large class hierarchy including over 25,000 OWL classes, 5,000 properties, and 400,000 class-properties) demonstrates the effectiveness of our visualization scheme.
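A minimal Python sketch of the three S's over an in-memory list of triples, assuming each instance has a single `rdf:type`: a subset keeps chosen predicates, a summary collapses instances into class-level edges weighted by how many triples they stand for, `expand` recovers the instance triples behind one summary edge (the drill-down behind a summary-detail view), and a sample keeps each triple with a fixed probability. The function names and the grouping rule are illustrative assumptions, not the plug-in's actual PL/SQL functions.

```python
import random
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "rdf:type"


def subset(triples: Iterable[Triple], predicates: Set[str]) -> List[Triple]:
    """Static subset: keep only triples whose predicate is of interest."""
    return [t for t in triples if t[1] in predicates]


def _classes(triples: Iterable[Triple]) -> Dict[str, str]:
    """Map each instance to its class (assumes one rdf:type per instance)."""
    return {s: o for s, p, o in triples if p == RDF_TYPE}


def summarize(triples: List[Triple]) -> Dict[Triple, int]:
    """Collapse instance-level edges into class-level edges with counts."""
    cls = _classes(triples)
    counts: Counter = Counter()
    for s, p, o in triples:
        if p != RDF_TYPE and s in cls and o in cls:
            counts[(cls[s], p, cls[o])] += 1
    return dict(counts)


def expand(triples: List[Triple], edge: Triple) -> List[Triple]:
    """Drill into one summary edge, returning the instance triples behind it."""
    cls = _classes(triples)
    c_s, pred, c_o = edge
    return [t for t in triples
            if t[1] == pred and cls.get(t[0]) == c_s and cls.get(t[2]) == c_o]


def sample(triples: List[Triple], fraction: float, seed: int = 0) -> List[Triple]:
    """Bernoulli sample: keep each triple with probability `fraction`."""
    rng = random.Random(seed)
    return [t for t in triples if rng.random() < fraction]
```

Showing the summary first and calling `expand` only for the edges a user clicks on keeps the rendered graph small while still giving exact detail where it is requested.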
international conference on data engineering | 2003
Ravi Murthy; Seema Sundara; Nipun Agarwal; Ying Hu; Timothy Chorma; Jagannathan Srinivasan
Most commercial SQL database systems support user-defined functions that can be used in WHERE clause filters, SELECT list items, or sorting/grouping clauses. Often, user-defined functions are used as inexact search filters, and the filtered rows are then sorted by a relevance measure. This is commonplace in Web search engines, multimedia, and personalization applications. We refer to the values associated with the filtered rows, such as a relevance measure, as ancillary values, and address the problem of supporting queries involving them efficiently and expressively in Oracle. In our approach, the filtering operator is designated as the primary operator, and the associated ancillary values are modeled by additional operators that are declared to be ancillary to the primary operator. An ancillary operator can represent any auxiliary value for the filtered rows, including relevance values (e.g., a score that describes how well a document matches the text search query) and additional properties (e.g., the nature of the spatial relationship for objects that overlap a given region). Query execution is optimized by allowing the primary and ancillary operator invocations to share computations via a shared context. In addition, queries involving ancillary values can exploit user-defined indexes and their capability to return results in the order of ancillary values. We present the key concepts, describe our implementation scheme and optimization techniques, and discuss alternative approaches for supporting ancillary values. Finally, we provide an experimental study that illustrates the scalability and effectiveness of our approach.
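The primary/ancillary pairing resembles the familiar Oracle Text idiom `WHERE CONTAINS(text, 'oracle', 1) > 0 ORDER BY SCORE(1) DESC`, where the label `1` ties `SCORE` to its `CONTAINS` call. The Python sketch below models only the shared-context idea: the primary operator computes a relevance value while filtering and stores it under (label, rowid), and the ancillary operator reads it back instead of recomputing it. The term-frequency score and all names are hypothetical simplifications, not Oracle's implementation.

```python
from typing import Dict, List, Tuple


class SharedContext:
    """Per-query state shared by a primary operator and its ancillaries."""

    def __init__(self) -> None:
        self.values: Dict[Tuple[int, int], float] = {}


def contains(ctx: SharedContext, label: int, rowid: int, text: str, term: str) -> bool:
    """Primary operator: filter rows, recording a relevance score as a side effect."""
    words = text.lower().split()
    hits = words.count(term.lower())
    if hits:
        ctx.values[(label, rowid)] = hits / len(words)  # toy term-frequency score
    return hits > 0


def score(ctx: SharedContext, label: int, rowid: int) -> float:
    """Ancillary operator: reuse the value computed by the primary operator."""
    return ctx.values[(label, rowid)]


def search(rows: Dict[int, str], term: str, label: int = 1) -> List[Tuple[int, float]]:
    """Filter with contains(), then order by the ancillary score, best first."""
    ctx = SharedContext()
    matched = [rid for rid, text in rows.items() if contains(ctx, label, rid, text, term)]
    return sorted(((rid, score(ctx, label, rid)) for rid in matched),
                  key=lambda pair: pair[1], reverse=True)
```

Keying the context by label lets one query carry several primary operators, each with its own ancillary values, without the scores colliding.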
extending database technology | 2010
Ying Hu; Wen-Chi Hou; Seema Sundara; Jagannathan Srinivasan
Although the notion of time-constrained queries was first introduced two decades ago to address the problem of long-running SQL queries, none of the commercial database systems support such a feature. This is surprising given that database systems are beginning to accommodate large datasets on the order of terabytes to petabytes, so the long-running SQL query problem needs to be addressed. Recently, at Oracle we investigated and proposed a mechanism for supporting time-constrained queries that provides quick approximate answers to such long-running SQL queries by means of sampling. We followed this up with error estimates as a measure of the goodness of the approximation. To further validate our time-constrained query work, in this paper we present an experimental study conducted on our time-constrained query prototype built on the Oracle Database. It is our hope that this work will revive interest in time-constrained queries.
international conference on management of data | 2018
Cagri Balkesen; Nitin Kunal; Georgios Giannikis; Pit Fender; Seema Sundara; Felix Schmidt; Jarod Wen; Sandeep R. Agrawal; Arun Raghavan; Venkatanathan Varadarajan; Anand Viswanathan; Balakrishnan Chandrasekaran; Sam Idicula; Nipun Agarwal; Eric Sedlar
Today, an ever-increasing number of transistors are packed into processor designs, with extra features to support a broad range of applications. As a consequence, processors are becoming more and more complex and power hungry. At the same time, they sustain only average performance across a wide variety of applications rather than providing the best performance for specific applications. In this paper, we demonstrate, through a carefully designed modern data processing system called RAPID and a simple, low-power processor specially tailored for data processing, that at least an order-of-magnitude performance/power improvement in SQL processing can be achieved over a modern system running on today's complex processors. RAPID is designed from the ground up with hardware/software co-design in mind, to provide architecture-conscious extreme performance while consuming less power than modern database systems. The paper presents in detail the design and implementation of RAPID, a relational, columnar, in-memory query processing engine supporting analytical query workloads.
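RAPID's specialized hardware cannot be captured in a few lines, but the columnar, in-memory execution style the abstract mentions can be sketched. The Python example below is a generic illustration under assumed names, not RAPID's design: it stores each column as a contiguous typed array and evaluates `SUM(price) WHERE qty > 2` by scanning only the `qty` column into a selection vector and then touching only the selected `price` values.

```python
from array import array
from typing import Dict, List


def to_columns(rows: List[Dict[str, int]]) -> Dict[str, array]:
    """Pivot row dictionaries into one 64-bit integer array per column
    (assumes a non-empty list of rows that share the same keys)."""
    return {name: array("q", (row[name] for row in rows)) for name in rows[0]}


def select_greater(column: array, threshold: int) -> List[int]:
    """Scan a single column and return positions of qualifying rows."""
    return [i for i, value in enumerate(column) if value > threshold]


def sum_at(column: array, positions: List[int]) -> int:
    """Aggregate a second column only at the selected positions."""
    return sum(column[i] for i in positions)
```

Real engines process such columns in vectorized or SIMD batches; the scan here only shows the data layout and the selection-vector handoff between the filter and the aggregate.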
very large data bases | 2005
Ying Hu; Seema Sundara; Timothy Chorma; Jagannathan Srinivasan
very large data bases | 2007
Ying Hu; Seema Sundara; Jagannathan Srinivasan
Archive | 1999
Seema Sundara; Ravi Murthy; Nipun Agarwal; Jagannathan Srinivasan