Featured Research

Distributed Parallel And Cluster Computing

CausalEC: A Causally Consistent Data Storage Algorithm based on Cross-Object Erasure Coding

Causally consistent distributed storage systems have received significant recent attention because they can provide lower-latency data access than linearizability. Current causally consistent data stores use partial or full replication to make data available to clients in a distributed setting. In this paper, we develop, for the first time, an erasure-coding-based algorithm called CausalEC that ensures causal consistency for a collection of read-write objects stored on a distributed set of nodes in an asynchronous message-passing system. CausalEC can use an arbitrary linear erasure code for data storage and ensures the liveness and storage properties prescribed by that code. CausalEC retains a key benefit of previously designed replication-based algorithms: every write operation is local, that is, a server performs only local actions before returning to the client that issued the write. For servers that store certain objects in uncoded form, read operations on those objects also return locally. In general, a read operation on an object can be served by a server after contacting a small subset of other servers, as long as the underlying erasure code allows the object to be decoded from that subset. As a byproduct, we develop EventualEC, a new eventually consistent erasure-coding-based data storage algorithm. A novel technical aspect of CausalEC is its use of cross-object erasure coding, where nodes encode values across multiple objects, unlike previous consistent erasure-coding-based solutions. CausalEC navigates the technical challenges of cross-object erasure coding, in particular those pertaining to re-encoding objects when writes update their values and to serving reads during the transient state in which the system transitions to storing the codeword symbols for the new object versions.
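The idea of cross-object erasure coding can be illustrated with a toy Python sketch, shown below. This is not the CausalEC protocol; it only shows a linear code applied across two objects A and B (here the trivial XOR code [A, B, A xor B]), so that any two of three hypothetical servers suffice to decode both objects. All names are illustrative.

# Toy illustration of cross-object erasure coding (not the CausalEC protocol):
# three servers store codeword symbols computed ACROSS two objects A and B
# using the linear code [A, B, A xor B]; any 2 of the 3 symbols recover both.

def xor_bytes(x: bytes, y: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(x, y))

def encode(a: bytes, b: bytes) -> dict:
    """Return the symbol each (hypothetical) server stores."""
    return {"s1": a, "s2": b, "s3": xor_bytes(a, b)}

def decode(symbols: dict) -> tuple:
    """Recover (A, B) from any two of the three servers' symbols."""
    if "s1" in symbols and "s2" in symbols:
        return symbols["s1"], symbols["s2"]
    if "s1" in symbols and "s3" in symbols:
        return symbols["s1"], xor_bytes(symbols["s1"], symbols["s3"])
    if "s2" in symbols and "s3" in symbols:
        return xor_bytes(symbols["s2"], symbols["s3"]), symbols["s2"]
    raise ValueError("need symbols from at least two servers")

stored = encode(b"va1", b"vb1")                          # write new versions of A and B
print(decode({"s2": stored["s2"], "s3": stored["s3"]}))  # read while s1 is unavailable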

Read more
Distributed Parallel And Cluster Computing

Cellular Automata and Kan Extensions

In this paper, we formalize precisely the sense in which applying a cellular automaton to a partial configuration is a natural extension of its local transition function, through the categorical notion of Kan extension. In fact, the two possible ways to perform such an extension, and the ingredients involved in their definition, are related through Kan extensions in many ways. These relations provide additional links between computer science and category theory, and also give a new point of view on the famous Curtis-Hedlund theorem of cellular automata from the extended topological perspective provided by category theory. No prior knowledge of category theory is assumed.
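As a concrete, paper-agnostic illustration of how a local transition function induces a map on (partial) configurations, the Python sketch below applies elementary rule 110 wherever the whole neighbourhood is defined. Dropping cells with an incomplete neighbourhood is only a simplification for illustration, not the categorical construction studied in the paper.

# Illustrative 1D cellular automaton (elementary rule 110): a local transition
# function on neighbourhoods induces a global map on configurations. A partial
# configuration is modelled as a dict from cell positions to states.

RULE = 110

def local_rule(left: int, centre: int, right: int) -> int:
    """Local transition function on a radius-1 neighbourhood."""
    index = (left << 2) | (centre << 1) | right
    return (RULE >> index) & 1

def step(config: dict) -> dict:
    """Apply the local rule wherever the whole neighbourhood is defined."""
    out = {}
    for pos in config:
        if pos - 1 in config and pos + 1 in config:
            out[pos] = local_rule(config[pos - 1], config[pos], config[pos + 1])
    return out

partial = {i: (1 if i == 0 else 0) for i in range(-3, 4)}   # finite window
print(step(partial))                                        # defined on -2..2 only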

Read more
Distributed Parallel And Cluster Computing

Cerberus: Minimalistic Multi-shard Byzantine-resilient Transaction Processing

To enable high-performance and scalable blockchains, we need to step away from traditional consensus-based, fully replicated designs. One direction is to explore sharding, in which the managed dataset is partitioned over many shards that independently operate as blockchains. Sharding, however, requires an efficient fault-tolerant primitive for ordering and executing multi-shard transactions. In this work, we seek to design such a primitive suitable for distributed ledger networks with high transaction throughput. To do so, we propose Cerberus, a set of minimalistic primitives for processing single-shard and multi-shard UTXO-like transactions. Cerberus aims to maximize parallel processing at the shards while minimizing coordination within and between shards. First, we propose Core-Cerberus, which uses strict environmental requirements to enable simple yet powerful multi-shard transaction processing. In our intended UTXO environment, Core-Cerberus operates perfectly with respect to all transactions proposed and approved by well-behaved clients, but provides no guarantees for other transactions. To also support more general-purpose environments, we propose two generalizations of Core-Cerberus: Optimistic-Cerberus, a protocol that requires no additional coordination phases in the well-behaved optimistic case but requires intricate coordination when recovering from attacks; and Pessimistic-Cerberus, a protocol that adds sufficient coordination to the well-behaved case of Core-Cerberus, allowing it to operate in general-purpose fault-tolerant environments without significant recovery costs after attacks. Finally, we compare the three protocols, showing their potential scalability and high transaction throughput in practical environments.
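To fix the terminology of multi-shard UTXO-like transactions, the toy Python sketch below routes a transaction to the shards owning its inputs and outputs and checks whether every owning shard still holds the consumed UTXOs. It contains no consensus and no Byzantine fault tolerance, and is not the Cerberus protocol; all names and the hash-based shard assignment are illustrative.

# Toy routing of a UTXO-like transaction across shards (illustrative only; this
# sketch has no consensus or Byzantine fault tolerance and is not Cerberus).

from dataclasses import dataclass, field
from hashlib import sha256

NUM_SHARDS = 4

def shard_of(utxo_id: str) -> int:
    """Deterministically map a UTXO identifier to a shard."""
    return int(sha256(utxo_id.encode()).hexdigest(), 16) % NUM_SHARDS

@dataclass
class Transaction:
    inputs: list          # UTXO ids consumed
    outputs: list         # new UTXO ids created

    def involved_shards(self) -> set:
        return {shard_of(u) for u in self.inputs + self.outputs}

@dataclass
class Shard:
    unspent: set = field(default_factory=set)

    def can_spend(self, utxo_id: str) -> bool:
        return utxo_id in self.unspent

shards = [Shard(unspent={"a", "b", "c", "d"}) for _ in range(NUM_SHARDS)]
tx = Transaction(inputs=["a", "b"], outputs=["e"])

# The transaction commits only if, for each input, the shard owning it still
# holds the UTXO as unspent; otherwise it aborts.
commit = all(shards[shard_of(u)].can_spend(u) for u in tx.inputs)
print("involved shards:", sorted(tx.involved_shards()), "commit:", commit)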

Read more
Distributed Parallel And Cluster Computing

Characterizing BigBench queries, Hive, and Spark in multi-cloud environments

BigBench is the new standard (TPCx-BB) for benchmarking and testing Big Data systems. The TPCx-BB specification describes several business use cases -- queries -- which require a broad combination of data-extraction techniques, including SQL, Map/Reduce (M/R), user code (UDFs), and Machine Learning, to fulfill them. However, there is currently no widespread knowledge of the resource requirements and expected performance of each query, as there is for more established benchmarks. At the same time, cloud providers now offer convenient on-demand managed big data clusters (PaaS) with a pay-as-you-go model. In PaaS, analytical engines such as Hive and Spark come ready to use, with a general-purpose configuration and managed upgrades. This study characterizes both the BigBench queries and the out-of-the-box performance of Spark and Hive versions in the cloud, while also comparing popular PaaS offerings from Azure HDInsight, Amazon Web Services EMR, and Google Cloud Dataproc in terms of reliability, data scalability (1 GB to 10 TB), versions, and settings. The query characterization highlights the similarities and differences between the Hive and Spark frameworks, and identifies which queries are the most resource-consuming in terms of CPU, memory, and I/O. The scalability results show that most cloud providers require configuration tuning as the data scale grows, especially for Spark's memory usage. These results can help practitioners quickly test systems by picking a subset of queries that stresses each of the categories. At the same time, the results show how Hive and Spark compare and what performance can be expected from each in PaaS.
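The kind of per-query timing such a characterization relies on can be sketched in a few lines of PySpark, as below. The query is a placeholder, not an actual TPCx-BB query, and cluster-level CPU, memory, and I/O metrics would come from the provider's monitoring rather than from this snippet; a running Spark installation is assumed.

# Minimal sketch of per-query wall-clock timing on Spark SQL (placeholder query,
# not a BigBench/TPCx-BB query).

import time
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("query-characterization").getOrCreate()

queries = {
    "q_placeholder": "SELECT COUNT(*) FROM range(1000000)",   # stand-in query
}

for name, sql in queries.items():
    start = time.time()
    spark.sql(sql).collect()           # force full execution
    print(f"{name}: {time.time() - start:.2f} s")

spark.stop()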

Read more
Distributed Parallel And Cluster Computing

Characterizing and Optimizing EDA Flows for the Cloud

Cloud computing accelerates design space exploration in logic synthesis and parameter tuning in physical design. However, deploying EDA jobs on the cloud requires EDA teams to deeply understand the characteristics of their jobs in cloud environments. Unfortunately, there has been little to no public information on these characteristics. Thus, in this paper, we formulate the problem of migrating EDA jobs to the cloud. First, we characterize the performance of four main EDA applications, namely synthesis, placement, routing, and static timing analysis. We show that different EDA jobs require different machine configurations. Second, using observations from our characterization, we propose a novel model based on Graph Convolutional Networks to predict the total runtime of a given application on different machine configurations. Our model achieves a prediction accuracy of 87%. Third, we develop a new formulation for optimizing cloud deployments in order to reduce deployment costs while meeting deadline constraints. We present a pseudo-polynomial optimal solution using a multi-choice knapsack mapping that reduces costs by 35.29%.
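The flavour of the multi-choice knapsack formulation can be sketched with a generic pseudo-polynomial dynamic program, shown below: pick exactly one machine configuration per job so that total runtime stays within a deadline while total cost is minimized. This assumes integer runtimes and jobs running back to back, which may differ from the paper's actual mapping; the numbers are hypothetical.

# Generic pseudo-polynomial multi-choice knapsack DP (illustrative, not the
# paper's exact formulation): one configuration per job, minimize cost subject
# to a total-runtime deadline.

import math

def min_cost_schedule(jobs, deadline):
    """jobs: one option list per job, each option a (runtime, cost) pair; deadline: int."""
    best = [0.0] + [math.inf] * deadline      # best[t]: min cost using exactly t time units
    for options in jobs:
        nxt = [math.inf] * (deadline + 1)
        for t, cost_so_far in enumerate(best):
            if math.isinf(cost_so_far):
                continue
            for runtime, cost in options:
                if t + runtime <= deadline:
                    nxt[t + runtime] = min(nxt[t + runtime], cost_so_far + cost)
        best = nxt
    return min(best)                          # math.inf if the deadline cannot be met

# Two hypothetical jobs, each with three machine configurations (runtime, cost):
jobs = [[(4, 10.0), (2, 18.0), (1, 30.0)],
        [(6, 8.0), (3, 15.0), (2, 26.0)]]
print(min_cost_schedule(jobs, deadline=5))    # -> 33.0 (picks options (2, 18) and (3, 15))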

Read more
Distributed Parallel And Cluster Computing

Characterizing the Energy Trade-Offs of End-to-End Vehicular Communications using an Hyperfractal Urban Modelling

We characterize trade-offs between the end-to-end communication delay and the energy consumed in urban vehicular communications with infrastructure assistance. Our study exploits the self-similarity of the locations of communication entities in cities by modeling them with an innovative model called the "hyperfractal". We show that the hyperfractal model can be extended to incorporate roadside infrastructure, and we provide stochastic geometry tools that allow a rigorous analysis. We compute theoretical bounds on the end-to-end communication hop count under two different energy-minimization goals: minimizing either the total accumulated energy or the maximum energy per node. We prove that the hop count for an end-to-end transmission is bounded by O(n^(1 - α/(d_F - 2))), where α < 1 and d_F > 2 is the fractal dimension of the mobile node process. This shows that, for both constraints, the energy decreases as we allow routing paths of greater length. The asymptotic limit of the energy becomes significantly small as the number of nodes grows asymptotically large. A lower bound on the network throughput capacity under path-energy constraints is also given. We show that our model fits real deployments for which open data sets are available. The results are confirmed through simulations with different fractal dimensions in a MATLAB simulator.
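Purely as a numerical illustration of how such a bound scales, the snippet below evaluates the exponent form reconstructed above, n^(1 - α/(d_F - 2)), for sample parameter values; the exact exponent should be checked against the paper, and the α and d_F values here are made up.

# Numerical illustration only: evaluate the reconstructed hop-count scaling
# n**(1 - alpha/(d_F - 2)) for sample (made-up) parameter values.

for n in (10**3, 10**4, 10**5):
    for alpha, d_f in ((0.5, 3.0), (0.9, 4.5)):
        bound = n ** (1 - alpha / (d_f - 2))
        print(f"n={n:>6}  alpha={alpha}  d_F={d_f}  ~hops O({bound:,.0f})")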

Read more
Distributed Parallel And Cluster Computing

Checkpointing and Localized Recovery for Nested Fork-Join Programs

While checkpointing is typically combined with a restart of the whole application, localized recovery permits all but the affected processes to continue. In task-based cluster programming, for instance, the application can then be finished on the intact nodes and the lost tasks reassigned. This extended abstract suggests adapting a checkpointing and localized recovery technique, originally developed for independent tasks, to nested fork-join programs. We consider a Cilk-like work stealing scheme with a work-first policy in a distributed memory setting and describe the required algorithmic changes. The original technique has checkpointing overheads below 1% and negligible recovery costs; we expect the new algorithm to achieve similar performance.
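The general idea of localized recovery in a fork-join computation can be illustrated with the toy Python sketch below: results of completed subtasks are checkpointed, so after a simulated failure only the work that was never checkpointed is re-executed. This is only an illustration of the concept, not the authors' work-stealing scheme.

# Toy illustration (not the authors' scheme): checkpoint completed subtask
# results so that only lost work is redone after a failure.

checkpoint = {}          # stands in for stable storage; survives the "crash"
evaluations = 0

def fib(n: int) -> int:
    """Nested fork-join task: fork fib(n-1) and fib(n-2), then join."""
    global evaluations
    if n in checkpoint:                      # completed subtask: reuse its result
        return checkpoint[n]
    evaluations += 1
    result = n if n < 2 else fib(n - 1) + fib(n - 2)
    checkpoint[n] = result                   # checkpoint the completed subtask
    return result

fib(10)                                      # initial run
first_run = evaluations
del checkpoint[10], checkpoint[9]            # "crash" loses the most recent results
evaluations = 0
fib(10)                                      # localized recovery: redo only 2 tasks
print(first_run, evaluations)                # prints 11 and 2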

Read more
Distributed Parallel And Cluster Computing

Checkpointing with cp: the POSIX Shared Memory System

We present the checkpointing scheme of Abacus, an N-body simulation code that allocates all persistent state in POSIX shared memory (ramdisk). Checkpointing then becomes as simple as copying files from the ramdisk to external storage. The main simulation executable is invoked once per time step; it memory-maps the input state, computes the output state directly into the ramdisk, and unmaps the input state. The main executable remains unaware of the concept of checkpointing; the top-level driver code launches a file-system copy between executable invocations whenever a checkpoint is needed. Since the only information flow is through files on the ramdisk, the checkpoint must be correct so long as the simulation is correct. However, we find that with multiple gigabytes of state, unmapping the shared memory incurs significant overhead. This can be partially mitigated with multithreading, but ultimately we do not recommend shared memory for use with a large state.
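The ramdisk-plus-copy pattern can be sketched in a few lines of Python, as below. This is a generic, Linux-specific illustration rather than Abacus code; the /dev/shm path, file names, and state layout are assumptions.

# Generic illustration of the ramdisk checkpoint pattern (not Abacus itself):
# persistent state lives as a file in POSIX shared memory (/dev/shm on Linux),
# the compute step memory-maps it, and a checkpoint is just a file copy.

import mmap
import shutil
from pathlib import Path

STATE = Path("/dev/shm/sim_state.bin")       # assumed location of the state file
CHECKPOINT_DIR = Path("/tmp/checkpoints")    # stand-in for external storage

STATE.write_bytes(bytes(4096))               # allocate a small state file

# "Time step": map the state, update it in place, then unmap it again.
with STATE.open("r+b") as f:
    with mmap.mmap(f.fileno(), 0) as mem:
        mem[0:8] = (1).to_bytes(8, "little") # e.g. bump a step counter

# Checkpoint between invocations: a plain copy from ramdisk to external storage.
CHECKPOINT_DIR.mkdir(exist_ok=True)
shutil.copy2(STATE, CHECKPOINT_DIR / "sim_state.step0001.bin")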

Read more
Distributed Parallel And Cluster Computing

Chimbuko: A Workflow-Level Scalable Performance Trace Analysis Tool

Because of the limits that input/output systems currently impose on high-performance computing systems, a new generation of workflows that include online data reduction and analysis is emerging. Diagnosing their performance requires sophisticated performance-analysis capabilities due to the complexity of the execution patterns and underlying hardware, and no existing tool can handle the voluminous performance trace data needed to detect potential problems. This work introduces Chimbuko, a performance analysis framework that provides real-time, distributed, in situ anomaly detection. Data volumes are reduced for human-level processing without losing necessary details. Chimbuko supports online performance monitoring via a visualization module that presents the overall workflow anomaly distribution, call stacks, and timelines. Chimbuko also supports the capture and reduction of performance provenance. To the best of our knowledge, Chimbuko is the first online, distributed, and scalable workflow-level performance trace analysis framework, and we demonstrate the tool's usefulness on Oak Ridge National Laboratory's Summit system.
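The in situ reduction idea can be illustrated with a simple streaming detector over function-execution timings, sketched below: keep running statistics per function and forward only events whose duration deviates strongly from the mean. This is a generic z-score sketch, not Chimbuko's actual detector or data model; the event names and thresholds are made up.

# Generic streaming anomaly filter over function-execution timings (a sketch,
# not Chimbuko's algorithm): Welford running statistics per function; forward
# only events that deviate strongly from the running mean.

import math
from collections import defaultdict

class StreamingDetector:
    def __init__(self, threshold_sigma: float = 6.0):
        self.threshold = threshold_sigma
        self.stats = defaultdict(lambda: [0, 0.0, 0.0])   # count, mean, M2

    def observe(self, func: str, duration_us: float) -> bool:
        """Update running stats; return True if the event looks anomalous."""
        n, mean, m2 = self.stats[func]
        anomalous = False
        if n >= 30:                                       # warm-up before flagging
            std = math.sqrt(m2 / (n - 1))
            anomalous = std > 0 and abs(duration_us - mean) > self.threshold * std
        n += 1
        delta = duration_us - mean
        mean += delta / n
        m2 += delta * (duration_us - mean)
        self.stats[func] = [n, mean, m2]
        return anomalous

detector = StreamingDetector()
events = [("MPI_Allreduce", 100.0 + (i % 5)) for i in range(200)] + [("MPI_Allreduce", 5000.0)]
flagged = [e for e in events if detector.observe(*e)]
print(flagged)    # only the outlier is kept for human inspection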

Read more
Distributed Parallel And Cluster Computing

Chiron: Optimizing Fault Tolerance in QoS-aware Distributed Stream Processing Jobs

Fault tolerance needs deeper consideration for streaming jobs that require high availability and low-latency processing even in the presence of failures, where Quality-of-Service constraints must be adhered to. Typically, systems achieve fault tolerance and the ability to recover automatically from partial failures by implementing checkpoint and rollback recovery. However, this is an expensive operation that negatively impacts the overall performance of the system, and manually optimizing fault tolerance for specific jobs is a difficult and time-consuming task. In this paper we introduce Chiron, an approach for automatically optimizing the frequency with which checkpoints are performed in streaming jobs. For any chosen job, parallel profiling runs are performed, each with a variant of the configuration, and the resulting metrics are used to model the impact of checkpoint-based fault tolerance on performance and availability. Understanding these relationships is key to minimizing performance objectives and meeting strict Quality-of-Service constraints. We implemented a Chiron prototype on top of Apache Flink and demonstrate its usefulness experimentally.
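The trade-off being optimized can be illustrated with the classical Young/Daly first-order approximation of the checkpoint interval, sketched below. This is a standard textbook baseline, not Chiron's profiling-based model; the checkpoint cost and MTBF values are hypothetical.

# Classical Young/Daly first-order approximation of the checkpoint interval that
# minimizes expected overhead, shown only as a baseline for the checkpointing
# frequency trade-off (Chiron itself builds its model from profiling runs).

import math

def young_daly_interval(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """Near-optimal time between checkpoints for a given checkpoint cost and MTBF."""
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)

def expected_overhead(interval_s: float, checkpoint_cost_s: float, mtbf_s: float) -> float:
    """First-order fraction of time lost to checkpointing plus rework after failures."""
    return checkpoint_cost_s / interval_s + (interval_s + checkpoint_cost_s) / (2.0 * mtbf_s)

ckpt, mtbf = 5.0, 6 * 3600.0                    # hypothetical: 5 s checkpoints, 6 h MTBF
best = young_daly_interval(ckpt, mtbf)
for interval in (60.0, best, 3600.0):
    print(f"interval {interval:7.1f} s -> overhead {expected_overhead(interval, ckpt, mtbf):.2%}")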

Read more
