Theodore L. Willke | Researchain

Publication

Featured research published by Theodore L. Willke.

IEEE Communications Surveys and Tutorials | 2009

A survey of inter-vehicle communication protocols and their applications

Theodore L. Willke; Patcharinee Tientrakool; Nicholas F. Maxemchuk

Inter-vehicle communication (IVC) protocols have the potential to increase the safety, efficiency, and convenience of transportation systems involving planes, trains, automobiles, and robots. The applications targeted include peer-to-peer networks for web surfing, coordinated braking, runway incursion prevention, adaptive traffic control, vehicle formations, and many others. The diversity of the applications and their potential communication protocols has challenged a systematic literature survey. We apply a classification technique to IVC applications to provide a taxonomy for detailed study of their communication requirements. The applications are divided into type classes which share common communication organization and performance requirements. IVC protocols are surveyed separately and their fundamental characteristics are revealed. The protocol characteristics are then used to determine the relevance of specific protocols to specific types of IVC applications.

International Parallel and Distributed Processing Symposium | 2014

How Well Do Graph-Processing Platforms Perform? An Empirical Performance Evaluation and Analysis

Yong Guo; Marcin Biczak; Ana Lucia Varbanescu; Alexandru Iosup; Claudio Martella; Theodore L. Willke

Graph-processing platforms are increasingly used in a variety of domains. Although both industry and academia are developing and tuning graph-processing algorithms and platforms, the performance of graph-processing platforms has never been explored or compared in-depth. Thus, users face the daunting challenge of selecting an appropriate platform for their specific application. To alleviate this challenge, we propose an empirical method for benchmarking graph-processing platforms. We define a comprehensive process, and a selection of representative metrics, datasets, and algorithmic classes. We implement a benchmarking suite of five classes of algorithms and seven diverse graphs. Our suite reports on basic (user-level) performance, resource utilization, scalability, and various overheads. We use our benchmarking suite to analyze and compare six platforms. We gain valuable insights for each platform and present the first comprehensive comparison of graph-processing platforms.

First International Workshop on Graph Data Management Experiences and Systems | 2013

GraphBuilder: scalable graph ETL framework

Nilesh K. Jain; Guangdeng Liao; Theodore L. Willke

Graph abstraction is essential for many applications from finding a shortest path to executing complex machine learning (ML) algorithms like collaborative filtering. Graph construction from raw data for various applications is becoming challenging, due to exponential growth in data, as well as the need for large scale graph processing. Since graph construction is a data-parallel problem, MapReduce is well-suited for this task. We developed GraphBuilder, a scalable framework for graph Extract-Transform-Load (ETL), to offload many of the complexities of graph construction, including graph formation, tabulation, transformation, partitioning, output formatting, and serialization. GraphBuilder is written in Java, for ease of programming, and it scales using the MapReduce model. In this paper, we describe the motivation for GraphBuilder, its architecture, MapReduce algorithms, and performance evaluation of the framework. Since large graphs should be partitioned over a cluster for storing and processing and partitioning methods have significant performance impacts, we develop several graph partitioning methods and evaluate their performance. We also open source the framework at https://01.org/graphbuilder/.
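
The data-parallel view of graph construction described in this abstract can be illustrated with a minimal sketch. It is not GraphBuilder's code (which is written in Java and runs on MapReduce); the record format, the function names `map_record`, `reduce_edges`, and `partition_by_vertex`, and the modulo partitioning rule are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_record(record):
    """Map step: emit (source, destination) edge pairs from one raw record.

    Assumed record format (hypothetical): (vertex_id, [linked_vertex_ids]).
    Self-loops are dropped as a simple example of a transformation.
    """
    src, targets = record
    for dst in targets:
        if dst != src:
            yield (src, dst)


def reduce_edges(pairs):
    """Reduce step: group edges by source vertex and remove duplicates."""
    adjacency = defaultdict(set)
    for src, dst in pairs:
        adjacency[src].add(dst)
    return {vertex: sorted(neighbors) for vertex, neighbors in adjacency.items()}


def partition_by_vertex(adjacency, num_parts):
    """Placeholder partitioner: assign each integer vertex id to id % num_parts.

    This is only a stand-in; it is not one of the partitioning methods
    evaluated in the paper.
    """
    parts = [dict() for _ in range(num_parts)]
    for vertex, neighbors in adjacency.items():
        parts[vertex % num_parts][vertex] = neighbors
    return parts


records = [(1, [2, 3, 2]), (2, [3]), (3, [3, 1])]
edge_pairs = chain.from_iterable(map_record(r) for r in records)
graph = reduce_edges(edge_pairs)
parts = partition_by_vertex(graph, 2)
```

With the three sample records above, `graph` is `{1: [2, 3], 2: [3], 3: [1]}`: the duplicate edge from 1 to 2 and the self-loop on 3 are removed. Because each record is mapped independently, the map step could in principle be distributed across workers.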

Proceedings of the 2nd International Workshop on Green Computing Middleware | 2011

Energy efficient scheduling of MapReduce workloads on heterogeneous clusters

Nezih Yigitbasi; Kushal Datta; Nilesh K. Jain; Theodore L. Willke

Energy efficiency has become the center of attention in emerging data center infrastructures as increasing energy costs continue to outgrow all other operating expenditures. In this work we investigate energy aware scheduling heuristics to increase the energy efficiency of MapReduce workloads on heterogeneous Hadoop clusters comprising both low power (wimpy) and high performance (brawny) nodes. We first make a case for heterogeneity by showing that low power Intel Atom processors and high performance Intel Sandy Bridge processors are more energy efficient for I/O bound workloads and CPU bound workloads, respectively. Then we present several energy efficient scheduling heuristics that exploit this heterogeneity and real-time power measurements enabled by modern processor architectures. Through experiments on a 23-node heterogeneous Hadoop cluster we demonstrate up to 27% better energy efficiency with our heuristics compared with the default Hadoop scheduler.

Modeling, Analysis, and Simulation of Computer and Telecommunication Systems | 2013

Towards Machine Learning-Based Auto-tuning of MapReduce

Nezih Yigitbasi; Theodore L. Willke; Guangdeng Liao; Dick H. J. Epema

MapReduce, which is the de facto programming model for large-scale distributed data processing, and its most popular implementation Hadoop have enjoyed widespread adoption in industry during the past few years. Unfortunately, from a performance point of view getting the most out of Hadoop is still a big challenge due to the large number of configuration parameters. Currently these parameters are tuned manually by trial and error, which is ineffective due to the large parameter space and the complex interactions among the parameters. Even worse, the parameters have to be re-tuned for different MapReduce applications and clusters. To make the parameter tuning process more effective, in this paper we explore machine learning-based performance models that we use to auto-tune the configuration parameters. To this end, we first evaluate several machine learning models with diverse MapReduce applications and cluster configurations, and we show that the support vector regression (SVR) model has good accuracy and is also computationally efficient. We further assess our auto-tuning approach, which uses the SVR performance model, against the Starfish auto-tuner, which uses a cost-based performance model. Our findings reveal that our auto-tuning approach can provide comparable or in some cases better performance improvements than Starfish with a smaller number of parameters. Finally, we propose and discuss a complete and practical end-to-end auto-tuning flow that combines our machine learning-based performance models with smart search algorithms for the effective training of the models and the effective exploration of the parameter space.

International Conference on Parallel Processing | 2013

Gunther: search-based auto-tuning of MapReduce

Guangdeng Liao; Kushal Datta; Theodore L. Willke

MapReduce has emerged as a very popular programming model for large-scale data analytics. Despite its industry-wide acceptance, the open source Apache™ Hadoop™ framework for MapReduce remains difficult to optimize, particularly in large-scale production environments. The vast search space defined by the hundreds of MapReduce configuration parameters and the complex interactions between them makes it time consuming to rely on manual tuning. Hence something more is needed. In this paper we evaluate approaches to the automatic tuning of Hadoop MapReduce including ones based on cost-based and machine learning models. We determine that they are inadequate and instead propose a search-based approach called Gunther for Hadoop MapReduce optimization. Gunther uses a Genetic Algorithm which is specially designed to aggressively identify parameter settings that result in near-optimal job execution time. We evaluate Gunther on two types of clusters with different resource characteristics. Our experiments demonstrate that Gunther can obtain near-optimal performance within a small number of trials (<30), outperforming existing auto-tuning solutions and industry recommended configurations. We also describe a methodology for reducing the dimensionality of the auto-tuning problem, further improving search efficiency without sacrificing performance improvement.
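
The general idea of search-based tuning with a genetic algorithm can be sketched as follows. This is a generic, textbook-style genetic algorithm, not Gunther's specially designed one; the parameter names and ranges in `PARAM_SPACE` are hypothetical (they are not real Hadoop configuration keys), and `synthetic_runtime` is an invented stand-in for actually running a job and measuring its execution time.

```python
import random

# Hypothetical parameter space: names and ranges are illustrative only.
PARAM_SPACE = {
    "sort_buffer_mb": (50, 800),
    "num_reducers": (1, 64),
    "parallel_copies": (1, 50),
}


def synthetic_runtime(cfg):
    """Invented cost function; a real tuner would run the job and time it."""
    return (
        60
        + abs(cfg["sort_buffer_mb"] - 400) / 10
        + abs(cfg["num_reducers"] - 24)
        + abs(cfg["parallel_copies"] - 20)
    )


def random_config(rng):
    """Draw one configuration uniformly from the parameter space."""
    return {k: rng.randint(lo, hi) for k, (lo, hi) in PARAM_SPACE.items()}


def crossover(a, b, rng):
    """Uniform crossover: take each parameter from one parent at random."""
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in PARAM_SPACE}


def mutate(cfg, rng, rate=0.2):
    """Resample each parameter with probability `rate`."""
    out = dict(cfg)
    for k, (lo, hi) in PARAM_SPACE.items():
        if rng.random() < rate:
            out[k] = rng.randint(lo, hi)
    return out


def genetic_search(cost, generations=10, pop_size=8, elite=2, seed=0):
    """Return the best configuration found and the best cost per generation."""
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=cost)
        history.append(cost(population[0]))
        parents = population[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - elite)
        ]
        # Elitism: the best configurations survive unchanged.
        population = population[:elite] + children
    return min(population, key=cost), history


best_config, best_per_generation = genetic_search(synthetic_runtime)
```

Because the best configurations are carried over unchanged into each new generation, the best cost recorded in `best_per_generation` never increases. In a real setting every cost evaluation is a full job run, which is why the number of trials matters as much as the quality of the final configuration.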

International Conference on Performance Engineering | 2014

Benchmarking graph-processing platforms: a vision

Yong Guo; Ana Lucia Varbanescu; Alexandru Iosup; Claudio Martella; Theodore L. Willke

Processing graphs, especially at large scale, is an increasingly useful activity in a variety of business, engineering, and scientific domains. Already, there are tens of graph-processing platforms, such as Hadoop, Giraph, GraphLab, etc., each with a different design and functionality. For graph-processing to continue to evolve, users have to find it easy to select a graph-processing platform, and developers and system integrators have to find it easy to quantify the performance and other non-functional aspects of interest. However, the state of performance analysis of graph-processing platforms is still immature: there are few studies and, for the few that exist, there are few similarities, and relatively little understanding of the impact of dataset and algorithm diversity on performance. Our vision is to develop, with the help of the performance-savvy community, a comprehensive benchmarking suite for graph-processing platforms. In this work, we take a step in this direction, by proposing a set of seven challenges, summarizing our previous work on performance evaluation of distributed graph-processing platforms, and introducing our on-going work within the SPEC Research Group's Cloud Working Group.

Knowledge and Information Systems | 2017

Graphlet decomposition: framework, algorithms, and applications

Nesreen K. Ahmed; Jennifer Neville; Ryan A. Rossi; Nick G. Duffield; Theodore L. Willke

From social science to biology, numerous applications often rely on graphlets for intuitive and meaningful characterization of networks. While graphlets have witnessed a tremendous success and impact in a variety of domains, there has yet to be a fast and efficient framework for computing the frequencies of these subgraph patterns. However, existing methods are not scalable to large networks with billions of nodes and edges. In this paper, we propose a fast, efficient, and parallel framework as well as a family of algorithms for counting k-node graphlets. The proposed framework leverages a number of theoretical combinatorial arguments that allow us to obtain significant improvement on the scalability of graphlet counting. For each edge, we count a few graphlets and obtain the exact counts of others in constant time using the combinatorial arguments. On a large collection of

Nature Neuroscience | 2017

Computational approaches to fMRI analysis

Jonathan D. Cohen; Nathaniel D. Daw; Barbara E. Engelhardt; Uri Hasson; Kai Li; Yael Niv; Kenneth A. Norman; Jonathan W. Pillow; Peter J. Ramadge; Nicholas B. Turk-Browne; Theodore L. Willke

International Conference on Big Data | 2016