Featured Research

Distributed Parallel And Cluster Computing

A General Framework for the Security Analysis of Blockchain Protocols

Blockchain protocols differ in fundamental ways, including the mechanics of selecting users to produce blocks (e.g., proof-of-work vs. proof-of-stake) and the method to establish consensus (e.g., longest chain rules vs. Byzantine fault-tolerant (BFT) inspired protocols). These fundamental differences have hindered "apples-to-apples" comparisons between different categories of blockchain protocols and, in turn, the development of theory to formally discuss their relative merits. This paper presents a parsimonious abstraction sufficient for capturing and comparing properties of many well-known permissionless blockchain protocols, simultaneously capturing essential properties of both proof-of-work (PoW) and proof-of-stake (PoS) protocols, and of both longest-chain-type and BFT-type protocols. Our framework blackboxes the precise mechanics of the user selection process, allowing us to isolate the properties of the selection process that are significant for protocol design. We demonstrate the utility of our general framework with several concrete results:

1. We prove a CAP-type impossibility theorem asserting that liveness with an unknown level of participation rules out security in a partially synchronous setting.
2. Delving deeper into the partially synchronous setting, we prove that a necessary and sufficient condition for security is the production of "certificates," meaning stand-alone proofs of block confirmation.
3. Restricting to synchronous settings, we prove that typical protocols with a known level of participation (including longest chain-type PoS protocols) can be adapted to provide certificates, but those with an unknown level of participation cannot.
4. Finally, we use our framework to articulate a modular two-step approach to blockchain security analysis that effectively reduces the permissionless case to the permissioned case.

A High-Performance Sparse Tensor Algebra Compiler in Multi-Level IR

Tensor algebra is widely used in many applications, such as scientific computing, machine learning, and data analytics. Tensors representing real-world data are usually large and sparse. Tens of storage formats have been designed for sparse matrices and tensors, and the performance of sparse tensor operations depends on the target architecture and the selected sparse format, which makes it challenging to implement and optimize every tensor operation of interest and to port the code from one architecture to another. We propose COMET, a tensor algebra domain-specific language (DSL) and compiler infrastructure that automatically generates kernels for mixed sparse-dense tensor algebra operations. The proposed DSL provides high-level programming abstractions that resemble the familiar Einstein notation to represent tensor algebra operations. The compiler performs code optimizations and transformations for efficient code generation while covering a wide range of tensor storage formats. The COMET compiler also leverages data reordering to improve spatial or temporal locality for better performance. Our results show that the automatically generated kernels outperform the state-of-the-art sparse tensor algebra compiler TACO, with up to 20.92x, 6.39x, and 13.9x performance improvement for parallel SpMV, SpMM, and TTM, respectively.
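To make the SpMV kernel mentioned above concrete, here is a minimal hand-written sketch of the Einstein-notation expression y[i] = A[i,j] * x[j] with A stored in compressed sparse row (CSR) format. It is not COMET's DSL or its generated code; the function name `spmv_csr` and the argument layout are assumptions chosen for illustration only.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y[i] = sum_j A[i, j] * x[j] for a CSR matrix A.

    row_ptr[i]:row_ptr[i + 1] is the slice of col_idx/values that
    holds the nonzeros of row i (hypothetical layout for this sketch).
    """
    num_rows = len(row_ptr) - 1
    y = [0.0] * num_rows
    for i in range(num_rows):
        # Only stored nonzeros are visited; zero entries cost nothing.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y


# A = [[1, 0, 2],
#      [0, 0, 0],
#      [0, 3, 0]]
row_ptr = [0, 2, 2, 3]
col_idx = [0, 2, 1]
values = [1.0, 2.0, 3.0]
y = spmv_csr(row_ptr, col_idx, values, [1.0, 2.0, 3.0])
```

A different storage format (for example coordinate or compressed sparse column) would need a different loop nest for the same expression, which is the kind of per-format effort a compiler of this sort is meant to remove.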

A Machine Learning Approach to Online Fault Classification in HPC Systems

As High-Performance Computing (HPC) systems strive towards the exascale goal, failure rates at both the hardware and software levels will increase significantly. Thus, detecting and classifying faults in HPC systems as they occur, and initiating corrective actions before they can transform into failures, becomes essential for continued operation. Central to this objective is fault injection, which is the deliberate triggering of faults in a system so as to observe their behavior in a controlled environment. In this paper, we propose a fault classification method for HPC systems based on machine learning. The novelty of our approach is that it can operate on streamed data in an online manner, thus opening the possibility to devise and enact control actions on the target system in real time. We introduce a high-level, easy-to-use fault injection tool called FINJ, with a focus on the management of complex experiments. In order to train and evaluate our machine learning classifiers, we inject faults into an in-house experimental HPC system using FINJ and generate a fault dataset, which we describe extensively. Both FINJ and the dataset are publicly available to facilitate resiliency research in the HPC systems field. Experimental results demonstrate that our approach reaches almost perfect classification accuracy for different fault types with low computational overhead and minimal delay.

A New Perspective of Graph Data and A Generic and Efficient Method for Large Scale Graph Data Traversal

Breadth-first search (BFS) is a basic graph processing algorithm, and many other graph processing algorithms share its architectural features and can be built on the BFS model. We analyze in detail the differences between graph algorithms and traditional high-performance algorithms, propose a new classification of algorithms into data-independent and data-correlated algorithms based on their run-time correlation with the data, and use this classification to explain the validity of the methods proposed in this paper. Through a deeper analysis of graph data, we propose a new fundamental perspective on understanding graph data, establishing a link between two basic data structures, graph and tree, and viewing graph data as consisting of smaller subgraphs and edge trees. Low-degree vertices are found to be one of the important causes of random memory access. Based on this, we propose a general, easy-to-implement, and efficient method for graph data processing, whose basic idea is to treat low-degree vertices and core subgraphs separately, thus significantly reducing the amount of random memory access and improving memory access efficiency. Finally, we evaluated the method on three major data center computing platforms (Intel, AMD, and ARM); the experiments showed performance improvements of 19.7%, 31.8%, and 17.9%, respectively, with a performance-power ratio of 282.70 MTEPS/W on the ARM platform, ranking it No. 1 in the world on the big data set list of the Green Graph500 in November 2019.
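The idea of handling low-degree vertices separately from a core subgraph can be illustrated with a small sketch. This is not the paper's method: it assumes a simple undirected graph given as a symmetric adjacency dictionary, uses repeated removal of degree-1 vertices as a stand-in for the "edge trees", and the name `bfs_with_peeling` is hypothetical.

```python
from collections import deque


def bfs_with_peeling(adj, source):
    """BFS distances from source, computed on the core subgraph only.

    Degree-1 vertices are peeled off first (never the source); each
    peeled vertex gets its distance later from its single parent.
    """
    degree = {v: len(neighbors) for v, neighbors in adj.items()}
    removed, parent, order = set(), {}, []
    stack = [v for v in adj if degree[v] == 1 and v != source]
    while stack:
        v = stack.pop()
        if v in removed or degree[v] != 1:
            continue
        p = next(u for u in adj[v] if u not in removed)
        removed.add(v)
        parent[v] = p
        order.append(v)
        degree[p] -= 1
        if degree[p] == 1 and p != source:
            stack.append(p)

    # Ordinary BFS, restricted to vertices that were not peeled.
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in removed and u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)

    # A peeled vertex is reachable only through its parent, so walk
    # the peel order backwards to resolve parents before children.
    for v in reversed(order):
        if parent[v] in dist:
            dist[v] = dist[parent[v]] + 1
    return dist


# Triangle 0-1-2 with a path 2-3-4 and a leaf 5 hanging off vertex 0.
adj = {0: [1, 2, 5], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3], 5: [0]}
dist = bfs_with_peeling(adj, 0)
```

In this example the queue-driven traversal touches only the triangle; the three tree vertices are resolved in a single linear pass over the peel order.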

A Newcomer In The PGAS World -- UPC++ vs UPC: A Comparative Study

A newcomer in the Partitioned Global Address Space (PGAS) 'world' has arrived in its version 1.0: Unified Parallel C++ (UPC++). UPC++ targets distributed data structures where communication is irregular or fine-grained. The key abstractions are global pointers, asynchronous programming via RPC, futures, and promises. The UPC++ APIs for moving non-contiguous data and handling memories with different optimal access methods resemble those used in modern C++. In this study we provide two kernels implemented in UPC++: a sparse matrix-vector multiplication (SpMV) as part of a partial differential equation solver, and an implementation of the heat equation on a 2D domain. Code listings of these two kernels are available in the article in order to show the differences in programming style between UPC and UPC++. We provide a performance comparison between UPC and UPC++ on single-node, multi-node, and many-core hardware (Intel Xeon Phi Knights Landing).

A Novel Approach for the Process Planning and Scheduling Problem Using the Concept of Maximum Weighted Independent Set

Process Planning and Scheduling (PPS) is an essential and practical topic but a very intractable problem in manufacturing systems. Many studies use iterative methods to solve such problems; however, they cannot achieve satisfactory results in both quality and computational speed. Other studies formulate scheduling problems as a graph coloring problem (GCP) or its extensions, but these formulations are limited to certain types of scheduling problems. In this paper, we propose a novel approach to formulate a general type of the PPS problem with resource allocation and process planning integrated towards a typical objective, minimizing the makespan. The PPS problem is formulated as an undirected weighted conflicting graph, where nodes represent operations and their resources, edges represent constraints, and weight factors are guidelines for node selection at each time slot. Then, the Maximum Weighted Independent Set (MWIS) problem can be solved to find the best set of operations with their desired resources for each discrete time slot. This proposed approach solves the PPS problem directly with minimum iterations. We establish that the proposed approach always returns a feasible optimum or near-optimum solution to the PPS problem. The different weight configurations of the proposed approach are tested on a real-world PPS example and on further designated test instances to evaluate scalability, accuracy, and robustness.
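As a rough illustration of the formulation above, the sketch below builds a small conflict graph whose nodes are (operation, resource) pairs and selects an independent set for one time slot. It uses a plain greedy heuristic, which is an assumption made for brevity and not the solver used in the paper; a greedy pick is not guaranteed to be a maximum-weight set. The names `build_conflicts` and `greedy_mwis`, and the rule that two nodes conflict when they share an operation or a resource, are likewise hypothetical.

```python
def build_conflicts(nodes):
    """Connect two (operation, resource) nodes if they share either part."""
    conflicts = {n: set() for n in nodes}
    for a in nodes:
        for b in nodes:
            if a != b and (a[0] == b[0] or a[1] == b[1]):
                conflicts[a].add(b)
    return conflicts


def greedy_mwis(weights, conflicts):
    """Greedy heuristic: take the heaviest free node, block its neighbors."""
    chosen, blocked = [], set()
    # Sort by descending weight; ties are broken by node name.
    for node in sorted(weights, key=lambda n: (-weights[n], n)):
        if node in blocked:
            continue
        chosen.append(node)
        blocked.add(node)
        blocked.update(conflicts.get(node, ()))
    return chosen


# Two operations that can each run on machine M1 or M2 (toy weights).
weights = {
    ("op1", "M1"): 3,
    ("op1", "M2"): 2,
    ("op2", "M1"): 2,
    ("op2", "M2"): 1,
}
conflicts = build_conflicts(list(weights))
slot_plan = greedy_mwis(weights, conflicts)
```

Here the selected set assigns each operation to a distinct machine for the slot. A full scheduler would repeat the selection per time slot, updating the graph and weights as operations complete.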

A Novel Graph-based Computation Offloading Strategy for Workflow Applications in Mobile Edge Computing

With the fast development of mobile edge computing (MEC), there is an increasing demand for running complex applications on the edge. These complex applications can be represented as workflows where task dependencies are explicitly specified. To achieve better Quality of Service (QoS), for instance, faster response time and lower energy consumption, computation offloading is widely used in the MEC environment. However, many existing computation offloading strategies focus only on independent computation tasks and overlook task dependencies. Moreover, most of these strategies are based on search algorithms such as particle swarm optimization (PSO) and genetic algorithms (GA), which are often time-consuming and hence not suitable for many delay-sensitive complex applications in MEC. A highly efficient graph-based strategy was proposed in our recent work, but it can only deal with simple workflow applications with a linear (namely sequential) structure. To address these problems, a novel graph-based strategy is proposed for workflow applications in MEC. Specifically, this strategy can deal with complex workflow applications with nonlinear (viz. parallel, selective, and iterative) structures. In addition, the offloading decision plan with the lowest energy consumption of the end device under the deadline constraint can be found by using the graph-based partition technique. We have comprehensively evaluated our strategy using both a real-world case study on an MEC-based UAV (Unmanned Aerial Vehicle) delivery system and extensive simulation experiments on the FogWorkflowSim platform for MEC-based workflow applications. The evaluation results demonstrate the effectiveness of our proposed strategy and its overall better performance than other representative strategies.

A Probabilistic Approach for Data Management in Pervasive Computing Applications

Current advances in Pervasive Computing (PC) involve the adoption of the huge infrastructures of the Internet of Things (IoT) and Edge Computing (EC). Both IoT and EC can support innovative applications around end users to facilitate their activities. Such applications are built upon the collected data and the appropriate processing demanded in the form of requests. To limit latency, instead of relying on the Cloud for data storage and processing, the research community provides a number of models for data management at the EC. Requests, usually defined in the form of tasks or queries, demand the processing of specific data. A model that pre-processes the data, preparing them and detecting their statistics before requests arrive, is therefore necessary. In this paper, we propose a promising and easy-to-implement scheme for selecting the appropriate host for incoming data based on a probabilistic approach. Our aim is to store similar data in the same distributed datasets so as to have, beforehand, knowledge of their statistics while keeping their solidity at high levels. By solidity, we mean a limited statistical deviation of the data; thus, we can support the storage of highly correlated data in the same dataset. Additionally, we propose an aggregation mechanism for outlier detection applied just after the arrival of data. Outliers are transferred to the Cloud for further processing. When data are accepted for local storage, we propose a model for selecting the appropriate datasets where they will be replicated to build a fault-tolerant system. We analytically describe our model and evaluate it through extensive simulations, presenting its pros and cons.

A Programming Model for Hybrid Workflows: combining Task-based Workflows and Dataflows all-in-one

This paper aims to reduce the effort of learning, deploying, and integrating several frameworks for the development of e-Science applications that combine simulations with High-Performance Data Analytics (HPDA). We propose a way to extend task-based management systems to support continuous input and output data, enabling the combination of task-based workflows and dataflows (Hybrid Workflows from now on) using a single programming model. Hence, developers can build complex Data Science workflows with different approaches depending on the requirements. To illustrate the capabilities of Hybrid Workflows, we have built a Distributed Stream Library and a fully functional prototype extending COMPSs, a mature, general-purpose, task-based, parallel programming model. The library can be easily integrated with existing task-based frameworks to provide support for dataflows. It also provides a homogeneous, generic, and simple representation of object and file streams in both Java and Python, enabling complex workflows to handle any data type without dealing directly with the streaming back-end.

A Pub-Sub Architecture to Promote Blockchain Interoperability

The maturing of blockchain technology leads to heterogeneity, where multiple solutions each specialize in a particular use case. While the development of different blockchain networks shows great potential for blockchains, the isolated networks have led to data and asset silos, limiting the applications of this technology. Blockchain interoperability solutions are essential to enable distributed ledgers to reach their full potential. Such solutions allow blockchains to support asset and data transfer, resulting in the development of innovative applications. This paper proposes a novel blockchain interoperability solution for permissioned blockchains based on the publish/subscribe architecture. We implemented a prototype of this platform to show the feasibility of our design. We evaluate our solution by implementing examples of different publisher and subscriber networks, such as Hyperledger Besu, which is an Ethereum client, and two different versions of Hyperledger Fabric. We present a performance analysis of the whole network that indicates its limits and bottlenecks. Finally, we discuss the extensibility and scalability of the platform in different scenarios. Our evaluation shows that our system can handle a throughput on the order of hundreds of transactions per second.
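The publish/subscribe pattern underlying this design can be sketched in a few lines. The code below is a generic in-memory illustration only: it involves no blockchain, no Hyperledger API, and nothing from the paper's prototype. The `Broker` class, the topic name, and the message fields are all hypothetical.

```python
class Broker:
    """Minimal in-memory publish/subscribe broker (illustrative only)."""

    def __init__(self):
        # Maps a topic name to the callbacks registered for it.
        self._subscribers = {}

    def subscribe(self, topic, callback):
        self._subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, message):
        """Deliver message to every subscriber; return the delivery count."""
        callbacks = list(self._subscribers.get(topic, ()))
        for callback in callbacks:
            callback(message)
        return len(callbacks)


# Two hypothetical subscriber networks listen to one publisher topic.
broker = Broker()
ledger_a, ledger_b = [], []
broker.subscribe("asset-updates", ledger_a.append)
broker.subscribe("asset-updates", ledger_b.append)
delivered = broker.publish("asset-updates", {"asset": "A1", "owner": "org2"})
```

The publisher never addresses a subscriber directly; it only names a topic. That decoupling is what lets heterogeneous networks be added or removed without changing the publisher.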
