Julian James Stephen | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Julian James Stephen is active.

Explore More

Publication

Featured researches published by Julian James Stephen.

IEEE Transactions on Computers | 2014

From the Cloud to the Atmosphere: Running MapReduce across Data Centers

Chamikara Jayalath; Julian James Stephen; Patrick Eugster

Efficiently analyzing big data is a major issue in our current era. Examples of analysis tasks include identification or detection of global weather patterns, economic changes, social phenomena, or epidemics. The cloud computing paradigm along with software tools such as implementations of the popular MapReduce framework offer a response to the problem by distributing computations among large sets of nodes. In many scenarios, input data are, however, geographically distributed (geodistributed) across data centers, and straightforwardly moving all data to a single data center before processing it can be prohibitively expensive. Above-mentioned tools are designed to work within a single cluster or data center and perform poorly or not at all when deployed across data centers. This paper deals with executing sequences of MapReduce jobs on geo-distributed data sets. We analyze possible ways of executing such jobs, and propose data transformation graphs that can be used to determine schedules for job sequences which are optimized either with respect to execution time or monetary cost. We introduce G-MR, a system for executing such job sequences, which implements our optimization framework. We present empirical evidence in Amazon EC2 and VICCI of the benefits of G-MR over common, naïve deployments for processing geodistributed data sets. Our evaluations show that using G-MR significantly improves processing time and cost for geodistributed data sets.

automated software engineering | 2014

Program analysis for secure big data processing

Julian James Stephen; Savvas Savvides; Russell Seidel; Patrick Eugster

The ubiquitous nature of computers is driving a massive increase in the amount of data generated by humans and machines. Two natural consequences of this are the increased efforts to (a) derive meaningful information from accumulated data and (b) ensure that data is not used for unintended purposes. In the direction of analyzing massive amounts of data (a.), tools like MapReduce, Spark, Dryad and higher level scripting languages like Pig Latin and DryadLINQ have significantly improved corresponding tasks for software developers. The second, but equally important aspect of ensuring confidentiality (b.), has seen little support emerge for programmers: while advances in cryptographic techniques allow us to process directly on encrypted data, programmer-friendly and efficient ways of programming such data analysis jobs are still missing. This paper presents novel data flow analyses and program transformations for Pig Latin, that automatically enable the execution of corresponding scripts on encrypted data. We avoid fully homomorphic encryption because of its prohibitively high cost; instead, in some cases, we rely on a minimal set of operations performed by the client. We present the algorithms used for this translation, and empirically demonstrate the practical performance of our approach as well as improvements for programmers in terms of the effort required to preserve data confidentiality.

international middleware conference | 2013

Assured Cloud-Based Data Analysis with ClusterBFT

Julian James Stephen; Patrick Eugster

The shift to cloud technologies is a paradigm change that offers considerable financial and administrative gains. However governmental and business institutions wanting to tap into these gains are concerned with security issues. The cloud presents new vulnerabilities and is dominated by new kinds of applications, which calls for new security solutions.

symposium on cloud computing | 2016

STYX: Stream Processing with Trustworthy Cloud-based Execution

Julian James Stephen; Savvas Savvides; Vinaitheerthan Sundaram; Masoud Saeida Ardekani; Patrick Eugster

With the advent of the Internet of Things (IoT), billions of devices are expected to continuously collect and process sensitive data (e.g., location, personal health). Due to limited computational capacity available on IoT devices, the current de facto model for building IoT applications is to send the gathered data to the cloud for computation. While private cloud infrastructures for handling large amounts of data streams are expensive to build, using low cost public (untrusted) cloud infrastructures for processing continuous queries including on sensitive data leads to concerns over data confidentiality. This paper presents STYX, a novel programming abstraction and managed runtime system, that ensures confidentiality of IoT applications whilst leveraging the public cloud for continuous query processing. The key idea is to intelligently utilize partially homomorphic encryption to perform as many computationally intensive operations as possible in the untrusted cloud. STYX provides a simple abstraction to the IoT developer to hide the complexities of (1) applying complex cryptographic primitives, (2) reasoning about performance of such primitives, (3) deciding which computations can be executed in an untrusted tier, and (4) optimizing cloud resource usage. An empirical evaluation with benchmarks and case studies shows the feasibility of our approach.

ieee international conference on cloud computing technology and science | 2014

Universal Cross-Cloud Communication

Chamikara Jayalath; Julian James Stephen; Patrick Eugster

Integration of applications, data-centers, and programming abstractions in the cloud-of-clouds poses many challenges to system engineers. Different cloud providers offer different communication abstractions, and applications exhibit different communication patterns. By abstracting from hardware addresses and lower-level communication, the publish/subscribe paradigm seems like an adequate abstraction for supporting communication across clouds, as it supports many-to-many communication between publishers and subscribers, of which one-to-one or one-to-many can be viewed as special cases. In particular, content-based publish/subscribe (CPS) systems provide an expressive abstraction that matches well with the key-value pair model of many established cloud storage and computing systems, and decentralized overlay-based CPS implementations scale up well. However, CPS systems perform poorly at small scale, e.g., one-to-one or one-to-many communication. This holds especially for multi-send scenarios which we refer to as entourages that may range from a channel between a publisher and a single subscriber to a broadcast between a publisher and a handful of subscribers. These scenarios are common in cloud computing, where cheap hardware is exploited for parallelism (efficiency) and redundancy (fault-tolerance). With CPS, multi-send messages go over several hops before their destinations are even identified via predicate matching, resulting in increased latency, especially when destinations are located in different data-centers or zones. Topic-based publish/subscribe (TPS) systems support communication at small scale more efficiently, but still route messages over multiple hops and inversely lack the flexibility of CPS systems. In this paper, we propose CPS protocols for cloud-of-clouds communication that can dynamically identify entourages of publishers and corresponding subscribers. Our CPS protocols dynamically connect the publishers with their entourages through überlays . These überlays can transmit messages from a publisher to its corresponding subscribers with low latency. Our experiments show that our protocols make CPS abstraction viable and beneficial for many applications. We introduce a CPS system named Atmosphere that leverages out CPS protocols and illustrate how Atmosphere has allowed us to implement, with little effort, versions of the popular HDFS and ZooKeeper systems which operate efficiently across data-centers.

international conference on autonomic computing | 2015

Distributed Real-Time Event Analysis

Julian James Stephen; Daniel Gmach; Rob Block; Adit Madan; Alvin AuYoung

Security Information and Event Management (SIEM) systems perform complex event processing over a large number of event streams at high rate. As event streams increase in volume and event processing becomes more complex, traditional approaches such as scaling up to more powerful systems quickly become ineffective. This paper describes the design and implementation of DRES, a distributed, rule-based event evaluation system that can easily scale to process a large volume of non-trivial events. DRES intelligently forwards events across a cluster of nodes to evaluate complex correlation and aggregation rules. This approach enables DRES to work with any rules engine implementation. Our evaluation shows DRES scales linearly to more than 16 nodes. At this size it successfully processed more than half a million events per second.

international middleware conference | 2013

Atmosphere: A Universal Cross-Cloud Communication Infrastructure

Chamikara Jayalath; Julian James Stephen; Patrick Eugster

As demonstrated by the emergence of paradigms like fog computing [1] or cloud-of-clouds [2], the landscape of third-party computation is moving beyond straightforward single datacenter-based cloud computing. However, building applications that execute efficiently across data-centers and clouds is tedious due to the variety of communication abstractions provided, and variations in latencies within and between datacenters.

symposium on cloud computing | 2017

Secure data types: a simple abstraction for confidentiality-preserving data analytics

Savvas Savvides; Julian James Stephen; Masoud Saeida Ardekani; Vinaitheerthan Sundaram; Patrick Eugster

Cloud computing offers a cost-efficient data analytics platform. However, due to the sensitive nature of data, many organizations are reluctant to analyze their data in public clouds. Both software-based and hardware-based solutions have been proposed to address the stalemate, yet all have substantial limitations. We observe that a main issue cutting across all solutions is that they attempt to support confidentiality in data queries in a way transparent to queries. We propose the novel abstraction of secure data types with corresponding annotations for programmers to conveniently denote constraints relevant to security. These abstractions are leveraged by novel compilation techniques in our system Cuttlefish to compute data analytics queries in public cloud infrastructures while keeping sensitive data confidential. Cuttlefish encrypts all sensitive data residing in the cloud and employs partially homomorphic encryption schemes to perform operations securely, resorting however to client-side completion, re-encryption, or secure hardware-based re-encryption based on Intels SGX when available based on a novel planner engine. Our evaluation shows that our prototype can execute all queries in standard benchmarks such as TPC-H and TPC-DS with an average overhead of 2.34× and 1.69× respectively compared to a plaintext execution that reveals all data.

ieee international conference on cloud computing technology and science | 2014