Publication


Featured research published by Badrish Chandramouli.


Very Large Data Bases | 2009

Microsoft CEP server and online behavioral targeting

Mohamed H. Ali; C. Gerea; Balan Sethu Raman; Beysim Sezgin; T. Tarnavski; Tomer Verona; Ping Wang; Peter Zabback; Asvin Ananthanarayan; Anton Kirilov; M. Lu; Alex Raizman; R. Krishnan; Roman Schindlauer; Torsten Grabs; S. Bjeletich; Badrish Chandramouli; Jonathan Goldstein; S. Bhat; Ying Li; V. Di Nicola; Xiaoyang Sean Wang; David Maier; S. Grell; O. Nano; Ivo Santos

In this demo, we present the Microsoft Complex Event Processing (CEP) Server, Microsoft CEP for short. Microsoft CEP is an event stream processing system distinguished by its declarative query language and its multiple consistency levels for stream query processing. Query composability, query fusing, and operator sharing are key features of the Microsoft CEP query processor. Moreover, the debugging and supportability tools of Microsoft CEP give users visibility into system internals. Web click analysis has been crucial to behavior-based online marketing. Streams of web click events provide a typical workload for a CEP server, and a CEP server's processing capabilities in turn play a key role in web click analysis. This demo highlights the features of Microsoft CEP under a workload of web click events.


Very Large Data Bases | 2014

Trill: a high-performance incremental query processor for diverse analytics

Badrish Chandramouli; Jonathan Goldstein; Mike Barnett; Robert DeLine; Danyel Fisher; John Platt; James F. Terwilliger; John Wernsing

This paper introduces Trill -- a new query processor for analytics. Trill fulfills a combination of three requirements for a query processor to serve the diverse big data analytics space: (1) Query Model: Trill is based on a tempo-relational model that enables it to handle streaming and relational queries with early results, across the latency spectrum from real-time to offline; (2) Fabric and Language Integration: Trill is architected as a high-level language library that supports rich data-types and user libraries, and integrates well with existing distribution fabrics and applications; and (3) Performance: Trill's throughput is high across the latency spectrum. For streaming data, Trill's throughput is 2-4 orders of magnitude higher than that of comparable streaming engines. For offline relational queries, Trill's throughput is comparable to that of a major modern commercial columnar DBMS. Trill uses a streaming batched-columnar data representation with a new dynamic compilation-based system architecture that addresses all these requirements. In this paper, we describe Trill's new design and architecture, and report experimental results that demonstrate Trill's high performance across diverse analytics scenarios. We also describe how Trill's ability to support diverse analytics has resulted in its adoption across many usage scenarios at Microsoft.
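
The tempo-relational model can be illustrated with a minimal sketch (plain Python; the names `Event` and `count_at` are invented for this example, and Trill itself is a .NET library with a far richer operator set): each event carries a payload together with a validity interval, and a relational operator such as a count is evaluated at each instant of time.

```python
from collections import namedtuple

# An event in a tempo-relational model: a payload valid over [start, end).
Event = namedtuple("Event", ["start", "end", "payload"])

def count_at(events, t):
    """Count of events whose validity interval contains time t."""
    return sum(1 for e in events if e.start <= t < e.end)

stream = [Event(0, 10, "a"), Event(5, 15, "b"), Event(12, 20, "c")]
# At t=7 both "a" and "b" are valid; at t=13, "b" and "c" are valid.
```

Because the same interval semantics apply whether events arrive live or are read from a log, one query model covers both real-time and offline evaluation, which is the point of requirement (1).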


Very Large Data Bases | 2010

High-performance dynamic pattern matching over disordered streams

Badrish Chandramouli; Jonathan Goldstein; David Maier

Current pattern-detection proposals for streaming data recognize the need to move beyond a simple regular-expression model over strictly ordered input. We continue in this direction, relaxing restrictions present in some models, removing the requirement for ordered input, and permitting stream revisions (modification of prior events). Further, recognizing that patterns of interest in modern applications may change frequently over the lifetime of a query, we support updating of a pattern specification without blocking input or restarting the operator. Our new pattern operator (called AFA) is a streaming adaptation of a non-deterministic finite automaton (NFA) where additional schema-based user-defined information, called a register, is accessible to NFA transitions during execution. AFAs support dynamic patterns, where the pattern itself can change over time. We propose clean order-agnostic pattern-detection semantics for AFAs, with new algorithms that allow a very efficient implementation, while retaining significant expressiveness and supporting native handling of out-of-order input, stream revisions, dynamic patterns, and several optimizations. Experiments on Microsoft StreamInsight show that we achieve event rates of more than 200K events/sec (up to 5x better than simpler schemes). Our dynamic patterns give up to orders-of-magnitude better throughput than solutions such as operator restart, and our other optimizations are very effective, incurring low memory and latency.
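
The register mechanism can be sketched as follows (an illustrative Python toy, not the paper's AFA implementation; `step`, `detect`, and the threshold of 5 are invented for the example): each transition sees both the incoming event and a per-run register, so a pattern can accumulate state, here a running total over consecutive positive ticks.

```python
# Sketch of an augmented-automaton run: a transition maps
# (state, register, event) to (next_state, new_register).
# Example pattern: consecutive positive ticks whose accumulated
# sum (held in the register) reaches a threshold of 5.
def step(state, register, event):
    if state == "start" and event > 0:
        return ("rising", register + event)
    if state == "rising" and event > 0:
        new_reg = register + event
        return ("match" if new_reg >= 5 else "rising", new_reg)
    return ("start", 0)  # any non-positive tick resets the run

def detect(events):
    state, register = "start", 0
    for e in events:
        state, register = step(state, register, e)
        if state == "match":
            return True
    return False
```

A plain NFA could not express the threshold without one state per partial sum; the register carries that information alongside the automaton state.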


International Conference on Data Engineering | 2011

The extensibility framework in Microsoft StreamInsight

Mohamed H. Ali; Badrish Chandramouli; Jonathan Goldstein; Roman Schindlauer

Microsoft StreamInsight (StreamInsight, for brevity) is a platform for developing and deploying streaming applications, which need to run continuous queries over high-data-rate streams of input events. StreamInsight leverages a well-defined temporal stream model and operator algebra as the underlying basis for processing long-running continuous queries over event streams. This allows StreamInsight to handle imperfections in event delivery and to provide correctness guarantees on the generated output. StreamInsight natively supports a diverse range of off-the-shelf streaming operators. In order to cater to a much broader range of customer scenarios and applications, StreamInsight has recently introduced a new extensibility infrastructure. With this infrastructure, StreamInsight enables developers to integrate their domain expertise within the query pipeline in the form of user-defined modules (functions, operators, and aggregates). This paper describes the extensibility framework in StreamInsight, an ongoing effort at Microsoft SQL Server to support the integration of user-defined modules in a stream processing system. More specifically, the paper addresses the extensibility problem from three perspectives: the query writer's perspective, the user-defined module writer's perspective, and the system's internal perspective. The paper introduces and addresses a range of new and subtle challenges that arise when we try to add extensibility to a streaming system in a manner that is easy to use, powerful, and practical. We summarize our experience and provide future directions for supporting stream-oriented workloads in different business domains.


International Conference on Management of Data | 2007

Query suspend and resume

Badrish Chandramouli; Christopher N. Bond; Shivnath Babu; Jun Yang

Suppose a long-running analytical query is executing on a database server and has been allocated a large amount of physical memory. A high-priority task comes in and we need to run it immediately with all available resources. We have several choices. We could swap out the old query to disk, but writing out a large execution state may take too much time. Another option is to terminate the old query and restart it after the new task completes, but we would waste all the work already performed by the old query. Yet another alternative is to periodically checkpoint the query during execution, but traditional synchronous checkpointing carries high overhead. In this paper, we advocate a database-centric approach to implementing query suspension and resumption, with negligible execution overhead, bounded suspension cost, and efficient resumption. The basic idea is to let each physical query operator perform lightweight checkpointing according to its own semantics, and coordinate asynchronous checkpoints among operators through a novel contracting mechanism. At the time of suspension, we find an optimized suspend plan for the query, which may involve a combination of dumping current state to disk and going back to previous checkpoints. The plan seeks to minimize the suspend/resume overhead while observing the constraint on suspension time. Our approach requires only small changes to the iterator interface, which we have implemented in the PREDATOR database system. Experiments with our implementation demonstrate significant advantages of our approach over traditional alternatives.
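
The operator-level checkpointing idea can be sketched as follows (hypothetical Python names; the paper's contract mechanism and PREDATOR integration are far richer): an operator exposes a small, semantics-aware checkpoint instead of having its full in-memory execution state written to disk.

```python
# Sketch: a running-sum operator whose checkpoint is just (position, total),
# far cheaper to persist than a dump of its whole memory image.
class SumOperator:
    def __init__(self, data):
        self.data = data
        self.pos = 0
        self.total = 0

    def run(self, steps):
        """Process up to `steps` input items."""
        for _ in range(steps):
            if self.pos >= len(self.data):
                break
            self.total += self.data[self.pos]
            self.pos += 1

    def checkpoint(self):
        # Lightweight state chosen by the operator's own semantics.
        return (self.pos, self.total)

    @classmethod
    def resume(cls, data, ckpt):
        op = cls(data)
        op.pos, op.total = ckpt
        return op

data = [1, 2, 3, 4]
op = SumOperator(data)
op.run(2)                       # suspend arrives after partial progress
ckpt = op.checkpoint()          # persist just two numbers
op2 = SumOperator.resume(data, ckpt)
op2.run(10)                     # resume and finish
```

The suspend planner described above generalizes this choice per operator: dump current state, or fall back to an earlier checkpoint, whichever minimizes total suspend/resume cost within the suspension-time budget.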


International Conference on Management of Data | 2013

Stat!: an interactive analytics environment for big data

Mike Barnett; Badrish Chandramouli; Robert DeLine; Steven M. Drucker; Danyel Fisher; Jonathan Goldstein; Patrick Morrison; John Platt

Exploratory analysis on big data requires us to rethink data management across the entire stack -- from the underlying data processing techniques to the user experience. We demonstrate Stat! -- a visualization and analytics environment that allows users to rapidly experiment with exploratory queries over big data. Data scientists can use Stat! to quickly refine to the correct query, while getting immediate feedback after processing a fraction of the data. Stat! can work with multiple processing engines in the backend; in this demo, we use Stat! with the Microsoft StreamInsight streaming engine. StreamInsight is used to generate incremental early results to queries and refine these results as more data is processed. Stat! allows data scientists to explore data, dynamically compose multiple queries to generate streams of partial results, and display partial results in both textual and visual form.
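
Incremental early results can be sketched as follows (illustrative Python, not Stat! or StreamInsight code; `progressive_mean` is an invented name): an aggregate emits an estimate after each chunk of input, refining toward the exact answer as more data is processed.

```python
# Sketch: a progressive aggregate that yields one estimate per chunk,
# so a user sees a usable partial answer long before the full scan ends.
def progressive_mean(data, chunk_size):
    total, count, estimates = 0, 0, []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        total += sum(chunk)
        count += len(chunk)
        estimates.append(total / count)  # estimate over data seen so far
    return estimates
```

The final estimate equals the exact mean; the earlier entries are the "partial results" a data scientist can inspect to decide whether the query is worth running to completion.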


International Conference on Data Engineering | 2011

Accurate latency estimation in a distributed event processing system

Badrish Chandramouli; Jonathan Goldstein; Roger S. Barga; Mirek Riedewald; Ivo Santos

A distributed event processing system consists of one or more nodes (machines), and can execute a directed acyclic graph (DAG) of operators called a dataflow (or query), over long-running high-event-rate data sources. An important component of such a system is cost estimation, which predicts or estimates the “goodness” of a given input, i.e., operator graph and/or assignment of individual operators to nodes. Cost estimation is the foundation for solving many problems: optimization (plan selection and distributed operator placement), provisioning, admission control, and user reporting of system misbehavior. Latency is a significant user metric in many commercial real-time applications. Users are usually interested in quantiles of latency, such as worst-case or 99th percentile. However, existing cost estimation techniques for event-based dataflows use metrics that, while they may have the side-effect of being correlated with latency, do not directly or provably estimate latency. In this paper, we propose a new cost estimation technique using a metric called Mace (Maximum cumulative excess). Mace is provably equivalent to maximum system latency in a (potentially complex, multi-node) distributed event-based system. The close relationship to latency makes Mace ideal for addressing the problems described earlier. Experiments with real-world datasets on Microsoft StreamInsight deployed over 1–13 nodes in a data center validate our ability to closely estimate latency (within 4%), and the use of Mace for plan selection and distributed operator placement.
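
One simplified, single-operator reading of cumulative excess can be sketched in Python (a toy under stated assumptions: per-tick arrival work and service capacity, with backlog tracked by the Lindley recursion; the paper's Mace is defined over multi-node operator DAGs and is what actually bounds end-to-end latency):

```python
def max_cumulative_excess(arrivals, capacity):
    """Maximum backlog of arrived-but-unprocessed work.

    backlog_t = max(0, backlog_{t-1} + arrivals_t - capacity_t)
    """
    backlog, worst = 0, 0
    for a, c in zip(arrivals, capacity):
        backlog = max(0, backlog + a - c)
        worst = max(worst, backlog)
    return worst

# A burst against constant capacity 3 per tick:
# arrivals [5, 5, 1, 0] gives backlogs 2, 4, 2, 0, so the maximum is 4.
```

Intuitively, the worst backlog determines how long the last-enqueued event waits, which is why a cumulative-excess style metric tracks worst-case latency rather than merely correlating with it.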


Very Large Data Bases | 2008

End-to-end support for joins in large-scale publish/subscribe systems

Badrish Chandramouli; Jun Yang

We address the problem of supporting a large number of select-join subscriptions for wide-area publish/subscribe. Subscriptions are joins over different tables, with varying interests expressed as range selection conditions over table attributes. Naive schemes, such as computing and sending join results from a server, are inefficient because they produce redundant data, and are unable to share dissemination costs across subscribers and events. We propose a novel, scalable scheme that group-processes and disseminates a general mix of multi-way select-join subscriptions. We also propose a simple and application-agnostic extension to content-driven networks (CN), which further improves sharing of dissemination costs. Experimental evaluations show that our schemes can generate orders of magnitude lower network traffic at very low processing cost. Our extension to CN can further reduce traffic by another order of magnitude, with almost no increase in notification latency.
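
The range-selection side of such subscriptions can be sketched as follows (illustrative Python; the subscriber names and `matching_subscribers` are invented, and the paper's group-processing of multi-way joins goes well beyond this): each subscriber registers an interval over an attribute, and an arriving event is matched against all intervals at once.

```python
# Sketch: subscriptions as [lo, hi] ranges over one attribute; an event
# notifies every subscriber whose range contains the event's value.
def matching_subscribers(subs, value):
    return sorted(s for s, (lo, hi) in subs.items() if lo <= value <= hi)

subs = {"alice": (0, 50), "bob": (40, 100), "carol": (90, 120)}
```

Group-processing exploits the overlap among such ranges: subscribers with similar intervals can share both join computation and dissemination paths instead of each receiving independently computed results.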


International Conference on Management of Data | 2006

On the database/network interface in large-scale publish/subscribe systems

Badrish Chandramouli; Junyi Xie; Jun Yang

The work performed by a publish/subscribe system can conceptually be divided into subscription processing and notification dissemination. Traditionally, research in the database and networking communities has focused on these aspects in isolation. The interface between the database server and the network is often overlooked by previous research. At one extreme, database servers are directly responsible for notifying individual subscribers; at the other extreme, updates are injected directly into the network, and the network is solely responsible for processing subscriptions and forwarding notifications. These extremes are unsuitable for complex and stateful subscription queries. A primary goal of this paper is to explore the design space between the two extremes, and to devise solutions that incorporate both database-side and network-side considerations in order to reduce the communication and server load and maintain system scalability. Our techniques apply to a broad range of stateful query types, and we present solutions for several of them. Our detailed experiments based on real and synthetic workloads with varying characteristics and link-level network simulation show that by exploiting the query semantics and building an appropriate interface between the database and the network, it is possible to achieve orders-of-magnitude savings in network traffic at low server-side processing cost.


IEEE Computer | 2010

Data Stream Management Systems for Computational Finance

Badrish Chandramouli; Mohamed H. Ali; Jonathan Goldstein; Beysim Sezgin; Balan Sethu Raman

Because financial applications rely on a continual stream of time-sensitive data, any data management system must be able to process complex queries on the fly. Although many organizations turn to custom solutions, data stream management systems can offer the same low-latency processing with the flexibility to handle a range of applications.

Collaboration


Dive into Badrish Chandramouli's collaborations.

Top Co-Authors

Mohamed H. Ali

University of Washington


David Maier

Portland State University
