Lisa Amini | Researchain

Publication

Featured research published by Lisa Amini.

international conference on management of data | 2006

Design, implementation, and evaluation of the linear road benchmark on the stream processing core

Navendu Jain; Lisa Amini; Henrique Andrade; Richard P. King; Yoonho Park; Philippe Selo; Chitra Venkatramani

Stream processing applications have recently gained significant attention in the networking and database community. At the core of these applications is a stream processing engine that performs resource allocation and management to support continuous tracking of queries over collections of physically-distributed and rapidly-updating data streams. While numerous stream processing systems exist, there has been little work on understanding the performance characteristics of these applications in a distributed setup. In this paper, we examine the performance bottlenecks of streaming data applications, in particular the Linear Road stream data management benchmark, in achieving good performance in large-scale distributed environments, using the Stream Processing Core (SPC), a stream processing middleware we have developed. First, we present the design and implementation of the Linear Road benchmark on the SPC middleware. SPC has been designed to scale to tens of thousands of processing nodes, while supporting concurrent applications and multiple simultaneous queries. Second, we identify the main performance bottlenecks in the Linear Road application in achieving scalability and low query response latency. Our results show that data locality, buffer capacity, physical allocation of processing elements to infrastructure nodes, and packaging for transporting streamed data are important factors in achieving good application performance. Though we evaluate our system primarily for the Linear Road application, we believe it also provides useful insights into the overall system behavior for supporting other distributed and large-scale continuous streaming data applications. Finally, we examine how SPC can be used and tuned to enable a very efficient implementation of the Linear Road application in a distributed environment.

Proceedings of the 4th international workshop on Data mining standards, services and platforms | 2006

SPC: a distributed, scalable platform for data mining

Lisa Amini; Henrique Andrade; Ranjita Bhagwan; Frank Eskesen; Richard P. King; Philippe Selo; Yoonho Park; Chitra Venkatramani

The Stream Processing Core (SPC) is distributed stream processing middleware designed to support applications that extract information from a large number of digital data streams. In this paper, we describe the SPC programming model which, to the best of our knowledge, is the first to support stream-mining applications using a subscription-like model for specifying stream connections as well as to provide support for non-relational operators. This enables stream-mining applications to tap into, analyze and track an ever-changing array of data streams which may contain information relevant to the streaming-queries placed on it. We describe the design, implementation, and experimental evaluation of the SPC distributed middleware, which deploys applications on to the running system in an incremental fashion, making stream connections as required. Using micro-benchmarks and a representative large-scale synthetic stream-mining application, we evaluate the performance of the control and data paths of the SPC middleware.

international conference on distributed computing systems | 2006

Adaptive Control of Extreme-scale Stream Processing Systems

Lisa Amini; Navendu Jain; Anshul Sehgal; Jeremy I. Silber; Olivier Verscheure

Distributed stream processing systems offer a highly scalable and dynamically configurable platform for time-critical applications ranging from real-time, exploratory data mining to high performance transaction processing. Resource management for distributed stream processing systems is complicated by a number of factors: processing elements are constrained by their producer-consumer relationships, data and processing rates can be highly bursty, and traditional measures of effectiveness, such as utilization, can be misleading. In this paper, we propose a novel distributed, adaptive control algorithm that maximizes weighted throughput while ensuring stable operation in the face of highly bursty workloads. Our algorithm is designed to meet the challenges of extreme-scale stream processing systems, where overprovisioning is not an option, by making the best use of resources even when the proffered load is greater than available resources. We have implemented our algorithm in a real-world distributed stream processing system and a simulation environment. Our results show that our algorithm is not only self-stabilizing and robust to errors, but also outperforms traditional approaches over a broad range of buffer sizes, processing graphs, and burstiness types and levels.

Computer Communications | 2002

Joint server scheduling and proxy caching for video delivery

Olivier Verscheure; Chitra Venkatramani; Pascal Frossard; Lisa Amini

We consider the delivery of video assets over a best-effort network, possibly through a caching proxy located close to the clients generating the requests. We are interested in the joint server scheduling and prefix/partial caching strategy that minimizes the aggregate transmission rate over the backbone network (i.e. average output server rate) under a cache of given capacity. We present multiple schemes to address various service levels and client resources by enabling bandwidth and cache space tradeoffs. We also propose an optimization algorithm selecting the working set of asset prefixes. We detail algorithms for practical implementation of our schemes. Simulation results show that our scheme dramatically outperforms the full caching technique.

international conference on computer communications | 2004

Effective peering for multi-provider content delivery services

Lisa Amini; Anees Shaikh; Henning Schulzrinne

Peering allows service providers to handle traffic surges without over-provisioning, reduce the cost of dedicated infrastructure, and leverage the specialization and prices of partner providers. We develop a peering system for multi-provider content delivery based on a cost-optimized peer selection algorithm. We formulate a cost model for evaluating competing peering strategies, and use measurement data collected from globally distributed network probe stations, large-scale Web sites, and existing service provider infrastructures to empirically evaluate proposed peering strategies. Our analysis shows that our peer selection algorithm is significantly more efficient than greedy alternatives, in terms of minimizing service cost and respecting network delay and server capacity thresholds, over a broad range of real-world scenarios.

IEEE Journal of Selected Topics in Signal Processing | 2007

Configuring Competing Classifier Chains in Distributed Stream Mining Systems

Fu Fangwen; Deepak S. Turaga; Olivier Verscheure; M. van der Schaar; Lisa Amini

Networks of classifiers are capturing the attention of system and algorithmic researchers because they offer improved accuracy over single model classifiers, can be distributed over a network of servers for improved scalability, and can be adapted to available system resources. In this paper, we develop algorithms to optimally configure networks (chains) of such classifiers given system processing resource constraints. We first formally define a global performance metric for classifier chains by trading off the end-to-end probabilities of detection and false alarm. We then design centralized and distributed algorithms to provide efficient and fair resource allocation among several classifier chains competing for system resources. We use the Nash bargaining solution from game theory to ensure this. We also extend our algorithms to consider arbitrary topologies of classifier chains (with shared classifiers among competing chains). We present results for both simulated and state-of-the-art classifier chains for speaker verification operating on real telephony data, discuss the convergence of our algorithms to the optimal solution, and present interesting directions for future research.
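The Nash bargaining idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes only two competing chains, a single shared resource budget, and a hypothetical saturating utility curve, and it finds the split by a plain grid search that maximizes the product of each chain's gain over its disagreement point. The names nash_bargaining_split and saturating are illustrative only.

```python
import math
from typing import Callable, Tuple


def nash_bargaining_split(
    u1: Callable[[float], float],
    u2: Callable[[float], float],
    capacity: float,
    d1: float = 0.0,
    d2: float = 0.0,
    steps: int = 1000,
) -> Tuple[float, float]:
    """Split `capacity` between two chains by maximizing the Nash product.

    u1, u2 are hypothetical utility functions of the allocated resource;
    d1, d2 are the disagreement utilities (what each chain gets with no deal).
    """
    best_x, best_product = 0.0, -math.inf
    for i in range(steps + 1):
        x1 = capacity * i / steps
        x2 = capacity - x1
        gain1 = u1(x1) - d1
        gain2 = u2(x2) - d2
        # Only allocations that leave neither chain worse off are feasible.
        if gain1 < 0 or gain2 < 0:
            continue
        product = gain1 * gain2
        if product > best_product:
            best_x, best_product = x1, product
    return best_x, capacity - best_x


def saturating(rate: float) -> Callable[[float], float]:
    """Assumed utility shape: diminishing returns as resources grow."""
    return lambda x: 1.0 - math.exp(-rate * x)


if __name__ == "__main__":
    # Two chains with different (made-up) utility curves share 10 units.
    print(nash_bargaining_split(saturating(1.0), saturating(0.2), 10.0))
```

With identical utility curves the search returns an even split; with different curves, the chain whose utility saturates sooner receives the smaller share under these assumptions.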

international conference on data mining | 2006

Resource Management for Networked Classifiers in Distributed Stream Mining Systems

Deepak S. Turaga; Olivier Verscheure; Upendra V. Chaudhari; Lisa Amini

Networks of classifiers are capturing the attention of system and algorithmic researchers because they offer improved accuracy over single model classifiers, can be distributed over a network of servers for improved scalability, and can be adapted to available system resources. This work provides a principled approach for the optimized allocation of system resources across a networked chain of classifiers. We begin with an illustrative example of how complex classification tasks can be decomposed into a network of binary classifiers. We formally define a global performance metric by recursively collapsing the chain of classifiers into one combined classifier. The performance metric trades off the end-to-end probabilities of detection and false alarm, both of which depend on the resources allocated to each individual classifier. We formulate the optimization problem and present optimal resource allocation results for both simulated and state-of-the-art classifier chains operating on telephony data.
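As a rough illustration of collapsing a chain into one combined classifier, the sketch below assumes that a data object reaches a stage only if every earlier stage forwarded it, and that stage decisions are conditionally independent, so the end-to-end probabilities of detection and false alarm are simple products. The linear trade-off in chain_utility and its false_alarm_weight parameter are assumptions for illustration, not the metric defined in the paper.

```python
from typing import Iterable, Tuple


def collapse_chain(stages: Iterable[Tuple[float, float]]) -> Tuple[float, float]:
    """Collapse a chain of binary classifiers into one combined classifier.

    Each stage is (p_detect, p_false_alarm). Under the independence
    assumption stated above, end-to-end probabilities multiply.
    """
    p_d, p_f = 1.0, 1.0
    for stage_pd, stage_pf in stages:
        if not (0.0 <= stage_pd <= 1.0 and 0.0 <= stage_pf <= 1.0):
            raise ValueError("probabilities must lie in [0, 1]")
        p_d *= stage_pd
        p_f *= stage_pf
    return p_d, p_f


def chain_utility(
    stages: Iterable[Tuple[float, float]], false_alarm_weight: float = 1.0
) -> float:
    """Hypothetical metric: reward detections, penalize false alarms."""
    p_d, p_f = collapse_chain(stages)
    return p_d - false_alarm_weight * p_f


if __name__ == "__main__":
    chain = [(0.9, 0.2), (0.8, 0.1)]  # made-up operating points
    print(collapse_chain(chain), chain_utility(chain, false_alarm_weight=2.0))
```

Because each stage's operating point depends on the resources it receives, a resource allocator could evaluate candidate allocations by recomputing this collapsed metric.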

international world wide web conferences | 2003

Modeling redirection in geographically diverse server sets

Lisa Amini; Anees Shaikh; Henning Schulzrinne

Internet server selection mechanisms attempt to optimize, subject to a variety of constraints, the distribution of client requests to a geographically and topologically diverse pool of servers. Research on server selection has thus far focused primarily on techniques for choosing a server from a group administered by a single entity, like a content distribution network provider. In a federated, multi-provider computing system, however, selection must occur over distributed server sets deployed by the participating providers, without the benefit of the full information available in the single-provider case. Intelligent server set selection algorithms will require a model of the expected performance clients would receive from a candidate server set. In this paper, we study whether the complex policies and dynamics of intelligent server selection can be effectively modeled in order to predict client performance for server sets. We introduce a novel server set distance metric, and use it in a measurement study of several million server selection transactions to develop simple models of existing server selection schemes. We then evaluate these models in terms of their ability to accurately predict performance for a second, larger set of distributed clients. We show that our models are able to predict performance within 20 ms for over 90% of the observed samples. Our analysis demonstrates that although existing deployments use a variety of complex and dynamic server selection criteria, most of which are proprietary, these schemes can be modeled with surprising accuracy.
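The accuracy figure quoted in this abstract (predictions within 20 ms for over 90% of samples) corresponds to a simple coverage statistic. The sketch below shows one way such a statistic could be computed; the function name and the sample values are hypothetical and do not come from the paper's data.

```python
from typing import Sequence


def fraction_within(
    observed_ms: Sequence[float],
    predicted_ms: Sequence[float],
    tolerance_ms: float = 20.0,
) -> float:
    """Return the fraction of samples whose absolute prediction error
    is at most `tolerance_ms` milliseconds."""
    if len(observed_ms) != len(predicted_ms):
        raise ValueError("observed and predicted must have the same length")
    if not observed_ms:
        raise ValueError("at least one sample is required")
    hits = sum(
        1 for obs, pred in zip(observed_ms, predicted_ms)
        if abs(obs - pred) <= tolerance_ms
    )
    return hits / len(observed_ms)


if __name__ == "__main__":
    # Made-up latencies in milliseconds, for illustration only.
    observed = [100.0, 120.0, 150.0, 200.0]
    predicted = [110.0, 150.0, 155.0, 205.0]
    print(fraction_within(observed, predicted))
```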

software visualization | 2008

Streamsight: a visualization tool for large-scale streaming applications

Wim De Pauw; Henrique Andrade; Lisa Amini

Stream processing is becoming a new and important computing paradigm. Innovative streaming applications are being developed in areas ranging from scientific applications (e.g., environment monitoring), to business intelligence (e.g., fraud detection and trend analysis), to financial markets (e.g., algorithmic trading strategies). Developing, understanding, debugging, and optimizing streaming applications is non-trivial because of the adaptive and dynamic nature of these applications. The sheer complexity and the distributed character of a large number of cooperating components hosted on a distributed environment further complicate matters. In this paper we describe Streamsight, a new visualization tool built to examine, monitor, and help understand the dynamic behavior of streaming applications. Previously developed stream processing visualization tools focus solely on composition of dataflow graphs. Streamsight's novelty hinges on a wide range of capabilities, including the ability to manage the dynamics of large and evolving topologies comprising multiple streaming applications with thousands of nodes and interconnections. From rendering live performance counters using different perspectives to allowing recordings and replays of the execution process, Streamsight provides the mechanisms that permit a better understanding of the evolving and adaptive behavior of streaming applications. These capabilities are used for debugging purposes, for performance optimization, and management of resources, including capacity planning. More than 50 developers, both inside and outside IBM, have been using Streamsight.

IEEE Transactions on Circuits and Systems for Video Technology | 2011

Configuring Trees of Classifiers in Distributed Multimedia Stream Mining Systems

Brian Foo; Deepak S. Turaga; Olivier Verscheure; M. van der Schaar; Lisa Amini

Multimedia stream mining applications require the identification of several different attributes in data content, and hence rely on a set of cascaded statistical classifiers to filter and process the data dynamically. In this paper, we introduce a novel methodology for configuring such cascaded classifier topologies, specifically binary classifier trees, in resource-constrained, distributed stream mining systems. Instead of traditional load shedding, our approach configures classifiers with optimized operating points after jointly considering the misclassification cost of each end-to-end class of interest in the tree, the resource constraints for every classifier, and the confidence level of each data object that is classified. The proposed approach allows for both intelligent load shedding as well as data replication based on available resources dynamically. We evaluate the algorithm on a sports video concept detection application and identify huge cost savings over load shedding alone. Additionally, we propose several distributed algorithms that enable each classifier in the tree to reconfigure itself based on local information exchange. We analyze the associated tradeoffs between convergence time, information overhead, and the cost efficiency of results achieved by each classifier for each of these algorithms.
