Featured Researches

Distributed Parallel And Cluster Computing

AdEle: An Adaptive Congestion-and-Energy-Aware Elevator Selection for Partially Connected 3D NoCs

By lowering the number of vertical connections in fully connected 3D networks-on-chip (NoCs), partially connected 3D NoCs (PC-3DNoCs) help alleviate reliability and fabrication issues. This paper proposes a novel, adaptive congestion- and energy-aware elevator-selection scheme called AdEle to improve the traffic distribution in PC-3DNoCs. AdEle employs an offline multi-objective simulated-annealing-based algorithm to find good elevator subsets and an online policy to improve elevator selection during routing. Compared to state-of-the-art techniques under different real-application traffic patterns and configuration scenarios, AdEle improves network latency by 14.9% on average (up to 21.4%) with less than 10.5% energy-consumption overhead.
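The offline stage can be pictured as a multi-objective simulated-annealing search over candidate elevator subsets. The sketch below is only an illustration under assumed inputs: `latency_cost` and `energy_cost` stand in for whatever congestion and energy estimates the paper actually optimizes, and the weighted-sum scalarization is an assumption, not AdEle's objective.

```python
import math
import random

def anneal_elevator_subset(candidates, subset_size, latency_cost, energy_cost,
                           w_latency=0.7, w_energy=0.3,
                           t_start=1.0, t_end=1e-3, cooling=0.95, iters_per_temp=50):
    """Multi-objective simulated annealing over elevator subsets (illustrative only).

    candidates   : list of possible elevator locations, e.g. (x, y) tuples
    subset_size  : number of elevators to keep (must be < len(candidates))
    latency_cost : callable(subset) -> congestion/latency estimate (assumed)
    energy_cost  : callable(subset) -> energy estimate (assumed)
    """
    assert 0 < subset_size < len(candidates)

    def cost(subset):
        # Weighted-sum scalarization of the two objectives (an assumption here).
        return w_latency * latency_cost(subset) + w_energy * energy_cost(subset)

    current = random.sample(candidates, subset_size)
    best, best_cost = list(current), cost(current)
    t = t_start
    while t > t_end:
        for _ in range(iters_per_temp):
            # Neighbor move: swap one selected elevator for an unselected candidate.
            neighbor = list(current)
            unused = [c for c in candidates if c not in current]
            neighbor[random.randrange(subset_size)] = random.choice(unused)
            delta = cost(neighbor) - cost(current)
            if delta < 0 or random.random() < math.exp(-delta / t):
                current = neighbor
                if cost(current) < best_cost:
                    best, best_cost = list(current), cost(current)
        t *= cooling
    return best, best_cost
```

A Pareto-style variant would instead keep a set of non-dominated subsets rather than a single weighted-sum optimum.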

Read more
Distributed Parallel And Cluster Computing

Adaptive Serverless Learning

With the emergence of distributed data, training machine learning models in a serverless manner has attracted increasing attention in recent years. Numerous training approaches have been proposed in this regime, such as decentralized SGD. However, all existing decentralized algorithms focus on standard SGD, which may not suit applications such as deep factorization machines, where features are highly sparse and categorical and an adaptive training algorithm is needed. In this paper, we propose a novel adaptive decentralized training approach that computes the learning rate from data dynamically. To the best of our knowledge, this is the first adaptive decentralized training approach. Our theoretical results show that the proposed algorithm achieves linear speedup with respect to the number of workers. Moreover, to reduce the communication overhead, we further propose a communication-efficient adaptive decentralized training approach, which also achieves linear speedup with respect to the number of workers. Finally, extensive experiments on different tasks confirm the effectiveness of both proposed approaches.
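One way to picture "adaptive plus decentralized" is to combine a per-coordinate adaptive learning rate with gossip averaging over a worker's neighbors. The sketch below is a minimal illustration under that assumption (AdaGrad-style scaling and a doubly stochastic mixing step), not the paper's algorithm; `neighbor_params` and `mixing_weights` are hypothetical inputs.

```python
import numpy as np

def adaptive_decentralized_step(x, grad, accum, neighbor_params, mixing_weights,
                                base_lr=0.1, eps=1e-8):
    """One illustrative worker update: gossip averaging + AdaGrad-style scaling.

    x               : this worker's parameter vector (np.ndarray)
    grad            : stochastic gradient evaluated at x (np.ndarray)
    accum           : running sum of squared gradients (np.ndarray, updated in place)
    neighbor_params : parameter vectors of the worker's neighbors (including x itself)
    mixing_weights  : doubly stochastic weights over those neighbors (sum to 1)
    """
    # Decentralized averaging: mix parameters with neighbors, no central server.
    mixed = sum(w * p for w, p in zip(mixing_weights, neighbor_params))
    # Adaptive step: per-coordinate learning rate derived from past gradients.
    accum += grad ** 2
    adaptive_lr = base_lr / (np.sqrt(accum) + eps)
    return mixed - adaptive_lr * grad
```

A communication-efficient variant would additionally compress or sparsify what is exchanged with neighbors before the mixing step.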

Read more
Distributed Parallel And Cluster Computing

Training with Imbalanced Datasets (Addestramento con Dataset Sbilanciati)

This document compares several methods for balancing a dataset and obtaining a trained model. The dataset used for training consists of short- and medium-length sentences, such as simple phrases or extracts from conversations that took place on web channels. The models are trained using the facilities provided by the Apache Spark framework and may subsequently be useful for implementing a solution capable of classifying sentences in a distributed environment, as described in "New frontier of textual classification: Big data and distributed calculation" by Massimiliano Morrelli et al.
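As a concrete example of one common balancing method such a comparison could include, the following PySpark sketch undersamples the larger classes via stratified sampling. The input path and the `label` column name are assumptions, and this is not necessarily one of the methods evaluated in the paper.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("balance-dataset-sketch").getOrCreate()

# Assumed schema: a 'text' column with the sentence and a 'label' column with the class.
df = spark.read.parquet("sentences.parquet")  # hypothetical input path

# Undersampling: sample each class down to roughly the size of the smallest one.
counts = {row["label"]: row["count"] for row in df.groupBy("label").count().collect()}
min_count = min(counts.values())
fractions = {label: min(1.0, min_count / n) for label, n in counts.items()}

balanced = df.sampleBy("label", fractions=fractions, seed=42)
balanced.groupBy("label").count().show()  # classes should now be roughly even
```

Other options in the same spirit are oversampling the minority classes or passing per-class weights to the classifier instead of resampling.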

Read more
Distributed Parallel And Cluster Computing

Algorithm for Cross-shard Cross-EE Atomic User-level ETH Transfer in Ethereum

Sharding is a way to address the scalability problem in blockchain technologies. Ethereum, a prominent blockchain technology, has included sharding in its roadmap to increase its throughput. The plan also includes multiple execution environments (EEs). We address the problem of atomic cross-shard value transfer in the presence of multiple execution environments. We leverage the proposed Ethereum architecture, more specifically the Beacon chain and crosslinks, and propose a solution on top of the netted-balance approach that was proposed for EE-level atomic transfers. We split a cross-shard transfer into two transactions: a debit and a credit. First, the debit transaction is processed at the source shard. The corresponding credit transaction is processed at the destination shard in a subsequent block. We use netted shard states as channels to communicate pending credits and pending reverts. We discuss various scenarios of debit failures and credit failures, and show that our approach ensures atomicity even in the presence of a Byzantine block proposer. The benefit of our approach is that we use no locks and impose no constraints on the block proposer to select specific transactions. However, we inherit an expensive operation from the netted-balance approach: querying partial states from all other shards. We also show a bound on the size of such inter-shard state reads.
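The debit/credit split can be illustrated with a toy, non-Byzantine model in which each shard keeps pending-credit and pending-revert queues as its "netted" channels. The sketch below only illustrates the flow described above; the `ShardState` structure and helper names are hypothetical, and it omits the Beacon chain, crosslinks, and the partial-state reads discussed in the paper.

```python
from dataclasses import dataclass, field

@dataclass
class ShardState:
    """Toy per-shard state: balances plus 'netted' channels for cross-shard messages."""
    balances: dict = field(default_factory=dict)
    pending_credits: list = field(default_factory=list)  # credits to apply on this shard
    pending_reverts: list = field(default_factory=list)  # refunds to apply on this shard

def debit(source: ShardState, sender: str, amount: int, dest: ShardState, receiver: str):
    """Phase 1 (source shard): debit the sender, record a pending credit for the destination."""
    if source.balances.get(sender, 0) < amount:
        return False  # debit failure: nothing leaves the source shard
    source.balances[sender] -= amount
    dest.pending_credits.append((sender, receiver, amount))
    return True

def apply_credits(dest: ShardState, source: ShardState, credit_ok=lambda receiver: True):
    """Phase 2 (destination shard, next block): apply credits, or queue reverts on failure."""
    for sender, receiver, amount in dest.pending_credits:
        if credit_ok(receiver):
            dest.balances[receiver] = dest.balances.get(receiver, 0) + amount
        else:
            # Credit failure: send the value back to the source shard as a pending revert.
            source.pending_reverts.append((sender, amount))
    dest.pending_credits.clear()

def apply_reverts(source: ShardState):
    """Refund failed credits on the source shard, restoring atomicity."""
    for sender, amount in source.pending_reverts:
        source.balances[sender] = source.balances.get(sender, 0) + amount
    source.pending_reverts.clear()
```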

Read more
Distributed Parallel And Cluster Computing

Algorithm-Based Checkpoint-Recovery for the Conjugate Gradient Method

As computers reach exascale and beyond, the incidence of faults will increase. Solutions to this problem are an active research topic. We focus on strategies to make the preconditioned conjugate gradient (PCG) solver resilient against node failures, specifically the exact state reconstruction (ESR) method, which exploits redundancies in PCG. Reducing the frequency at which redundant information is stored lessens the runtime overhead. However, after a node failure, the solver must restart from the last iteration for which redundant information was stored, which increases the recovery overhead. This formulation highlights the method's similarity to checkpoint-restart (CR). Thus, this method, which we call ESR with periodic storage (ESRP), can be considered a form of algorithm-based checkpoint-restart: the state is stored implicitly, by exploiting redundancy inherent to the algorithm, rather than explicitly as in CR. Compared with CR, ESRP also minimizes the amount of data to be stored and retrieved, but additional computation is required to reconstruct the solver's state. In this paper, we describe the modifications necessary to convert ESR into ESRP and perform an experimental evaluation, comparing ESRP with the previously existing ESR and with application-level in-memory CR. Our results confirm that the overhead of ESR is reduced significantly, both in the failure-free case and when node failures are introduced. In the former case, the overhead of ESRP is usually lower than that of CR; however, CR is faster when node failures happen. We claim that these differences can be alleviated by implementing more appropriate preconditioners.
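For contrast with ESRP, the CR-style baseline mentioned above can be sketched as plain conjugate gradient with periodic in-memory checkpoints and a rollback on failure. This is a minimal illustration of that baseline (unpreconditioned, single-process, simulated failure), not the paper's ESRP reconstruction.

```python
import numpy as np

def cg_with_periodic_checkpoint(A, b, x0, period=10, tol=1e-8, max_iter=1000,
                                fail_at=None):
    """Conjugate gradient with periodic in-memory checkpointing (CR-style baseline).

    Every `period` iterations the full solver state is copied; on a (simulated)
    node failure the solver rolls back to the last stored iteration and resumes.
    A must be symmetric positive definite.
    """
    x, r = x0.copy(), b - A @ x0
    p, rs = r.copy(), float(r @ r)
    checkpoint = (0, x.copy(), r.copy(), p.copy(), rs)

    it = 0
    while it < max_iter and np.sqrt(rs) > tol:
        if fail_at is not None and it == fail_at:
            # Simulated node failure: restart from the last checkpointed iteration.
            it, x, r, p, rs = checkpoint
            x, r, p = x.copy(), r.copy(), p.copy()
            fail_at = None
            continue
        Ap = A @ p
        alpha = rs / float(p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = float(r @ r)
        p = r + (rs_new / rs) * p
        rs = rs_new
        it += 1
        if it % period == 0:
            checkpoint = (it, x.copy(), r.copy(), p.copy(), rs)
    return x, it
```

ESRP differs in that the rolled-back state is not copied wholesale but reconstructed from redundancy already present in the PCG recurrences, trading extra computation for less stored data.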

Read more
Distributed Parallel And Cluster Computing

All You Need is DAG

We present DAG-Rider, the first asynchronous Byzantine Atomic Broadcast protocol that achieves optimal resilience, optimal amortized communication complexity, and optimal time complexity. DAG-Rider is post-quantum safe and ensures that all messages proposed by correct processes eventually get decided. We construct DAG-Rider in two layers: In the first layer, processes reliably broadcast their proposals and build a structured Directed Acyclic Graph (DAG) of the communication among them. In the second layer, processes locally observe their DAGs and totally order all proposals with no extra communication.
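The first layer's output can be pictured as a round-structured DAG in which every vertex references a quorum of vertices from the previous round. The sketch below is a minimal data-structure illustration under that common DAG-BFT convention (a 2f+1 parent rule); the `Vertex` and `LocalDAG` names are hypothetical, and this is not a full protocol implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Vertex:
    """One DAG vertex: a process's reliably broadcast proposal for a given round."""
    source: int           # id of the process that broadcast this vertex
    round: int            # round number
    block: tuple          # payload: the proposals carried by this vertex
    parents: frozenset    # references to vertices of round - 1

class LocalDAG:
    """Layer 1 output: each process keeps its own copy of the DAG built from
    reliably broadcast vertices. Layer 2 (ordering) only reads this structure."""
    def __init__(self, n: int, f: int):
        self.n, self.f = n, f
        self.rounds = {0: set()}  # round number -> set of Vertex

    def try_add(self, v: Vertex) -> bool:
        # A vertex is added only once all its parents are already present, so
        # every local DAG is a consistent prefix of the jointly built DAG.
        if v.round > 0:
            if len(v.parents) < 2 * self.f + 1:
                return False
            if not v.parents <= self.rounds.get(v.round - 1, set()):
                return False
        self.rounds.setdefault(v.round, set()).add(v)
        return True
```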

Read more
Distributed Parallel And Cluster Computing

AlphaBlock: An Evaluation Framework for Blockchain Consensus Protocols

Consensus protocols play a pivotal role in balancing security and efficiency in blockchain systems. In this paper, we propose an evaluation framework for blockchain consensus protocols termed AlphaBlock. In this framework, we compare the overall performance of Byzantine Fault Tolerant (BFT) consensus and Nakamoto Consensus (NC). BFT consensus is reached through multiple rounds of quorum votes from a supermajority, while NC is reached by accumulating credibility through the implicit voting of appended blocks. AlphaBlock incorporates the key concepts of HotStuff BFT (HBFT) and Proof-of-Authority (PoA) as the case study of BFT and NC. Using this framework, we compare the throughput and latency of HBFT and PoA under practical network and blockchain configurations. Our results show that the performance of HBFT dominates PoA in most scenarios due to the absence of forks in HBFT. Moreover, we identify a set of optimal configurations in AlphaBlock, which sheds light on improving the performance of blockchain consensus algorithms.
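The two commit styles being compared can be caricatured as follows: BFT finality comes from explicit quorum votes, while NC finality comes from depth in the chain. The toy functions below only illustrate that contrast; the 2f+1 quorum and the confirmation depth are generic assumptions, not AlphaBlock parameters.

```python
def bft_round_committed(votes: int, n: int, f: int) -> bool:
    """BFT-style commit (HotStuff-like abstraction): a block is committed once a
    supermajority quorum of explicit votes is gathered in the current round."""
    assert n >= 3 * f + 1, "BFT safety needs n >= 3f + 1"
    return votes >= 2 * f + 1

def nc_block_confirmed(blocks_on_top: int, confirmation_depth: int = 6) -> bool:
    """Nakamoto-style commit: a block gains 'implicit votes' from every block
    appended after it and is treated as final after enough confirmations."""
    return blocks_on_top >= confirmation_depth
```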

Read more
Distributed Parallel And Cluster Computing

Amortized Constant Round Atomic Snapshot in Message-Passing Systems

We study the lattice agreement (LA) and atomic snapshot problems in asynchronous message-passing systems where up to f nodes may crash. Our main result is a crash-tolerant atomic snapshot algorithm with amortized constant round complexity. To the best of our knowledge, the best prior result is given by Delporte et al. [TPDS, 18], with amortized O(n) complexity if there are more scans than updates. Our algorithm achieves amortized constant rounds if there are Ω(√k) operations, where k is the number of actual failures in an execution and is bounded by f. Moreover, when there are no failures, our algorithm has O(1) round complexity unconditionally. To achieve amortized constant round complexity, we devise a simple early-stopping lattice agreement algorithm and use it to "order" the update and scan operations for our snapshot object. Our LA algorithm has O(√k) round complexity. It is the first early-stopping LA algorithm for asynchronous systems.
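The lattice-agreement interface can be illustrated over the set-union lattice: each node outputs a value that includes its own proposal, and all outputs are comparable under inclusion. The toy, synchronous simulation below only conveys that interface and the early-stopping idea; it is not the paper's asynchronous algorithm, and the "first n − f values" stand in for whichever messages arrive first in a real execution.

```python
def lattice_agreement_sketch(proposals, n, f, max_rounds=None):
    """Toy synchronous simulation of lattice agreement over the set-union lattice.

    proposals : list of n input sets, one per node
    n, f      : number of nodes and crash-fault bound (n > f)
    Each node repeatedly joins (unions) the values it 'receives' and stops early
    once a round no longer changes any value.
    """
    values = [set(p) for p in proposals]
    max_rounds = max_rounds if max_rounds is not None else n
    for _ in range(max_rounds):
        changed = False
        new_values = []
        for i in range(n):
            # 'Receive' n - f values: here simply the first n - f nodes' values.
            received = values[:n - f]
            joined = set(values[i]).union(*received)
            changed |= (joined != values[i])
            new_values.append(joined)
        values = new_values
        if not changed:  # early stopping: no node learned anything new this round
            break
    return values
```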

Read more
Distributed Parallel And Cluster Computing

An Adaptive Self-Scheduling Loop Scheduler

Many shared-memory parallel irregular applications, such as sparse linear algebra and graph algorithms, depend on efficient loop scheduling (LS) in a fork-join manner, even though the work per loop iteration can vary greatly depending on the application and the input. Because of its importance, many different methods (e.g., workload-aware self-scheduling) and parameters (e.g., chunk size) have been explored, but achieving reasonable performance with them requires expert prior knowledge about the application and input. This work proposes a new LS method that requires little to no expert knowledge to achieve speedups close to those of tuned LS methods, by self-managing the chunk size based on a heuristic of workload variance and by using work-stealing. This method, named iChunk, is implemented in libgomp for testing. It is evaluated against OpenMP's guided, dynamic, and taskloop methods, as well as against BinLPT and generic work-stealing, on an array of applications that includes a synthetic benchmark, breadth-first search, K-Means, the molecular dynamics code LavaMD, and sparse matrix-vector multiplication. On a 28-thread Intel system, iChunk is the only method that is always among the top three LS methods. On average across all applications, iChunk is within 5.4% of the best method and even outperforms the other LS methods for breadth-first search and K-Means.
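The chunk-size heuristic can be pictured as: shrink chunks when observed per-chunk times vary a lot (so work-stealing can rebalance), and grow them when times are uniform (to cut scheduling overhead). The sketch below is an illustrative stand-in for such a heuristic, not the libgomp implementation; all parameter names are hypothetical.

```python
import statistics

def next_chunk_size(chunk_times, remaining_iters, n_threads,
                    min_chunk=1, max_chunk=4096):
    """Variance-driven chunk-size heuristic (illustrative sketch).

    chunk_times     : wall-clock times of recently completed chunks
    remaining_iters : loop iterations not yet scheduled
    n_threads       : number of worker threads
    """
    if len(chunk_times) < 2:
        # Not enough samples yet: fall back to a guided-style default.
        return max(min_chunk, remaining_iters // (2 * n_threads))
    mean = statistics.fmean(chunk_times)
    # Coefficient of variation as the workload-variance signal.
    cv = statistics.stdev(chunk_times) / mean if mean > 0 else 0.0
    base = max(min_chunk, remaining_iters // (2 * n_threads))
    # High variance -> smaller chunks; low variance -> larger chunks.
    scaled = int(base / (1.0 + cv))
    return max(min_chunk, min(max_chunk, scaled))
```

In a full scheduler, threads that run out of chunks would then steal queued chunks from loaded neighbors, which is why smaller chunks help under high variance.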

Read more
Distributed Parallel And Cluster Computing

An Algebraic-Topological Approach to Processing Cross-Blockchain Transactions

The state-of-the-art techniques for processing cross-blockchain transactions take a simple centralized approach: when the assets on blockchain X, say X-coins, are exchanged with the assets on blockchain Y (the Y-coins), those X-coins need to be exchanged into a "middle" medium (such as Bitcoin) that is then exchanged into Y-coins. If more than two parties are involved in a single global transaction, the global transaction is split into multiple local two-party transactions, each of which follows the above central-exchange protocol. Unfortunately, the atomicity of the global transaction is violated with the central-exchange approach: those local two-party transactions, once committed, cannot be rolled back if the global transaction decides to abort. More generally, the graph-based model of (two-party) transactions can hardly be extended to an arbitrary number of parties in a cross-blockchain transaction. In this paper, we introduce a higher-level abstraction of cross-blockchain transactions. We adopt the abstract simplicial complex, an extensively studied mathematical object in algebraic topology, to represent an arbitrary number of parties involved in blockchain transactions. Essentially, each party in the global transaction is modeled as a vertex, and the global transaction among n+1 (n ∈ Z, n > 0) parties composes an n-dimensional simplex. While this higher-level abstraction may seem trivial, we show how this simple extension leads to a new line of modeling methods and protocols for better processing cross-blockchain transactions.
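The abstraction itself is easy to make concrete: a global transaction among n+1 parties is the set of its participants, and its faces are the sub-transactions among subsets of them. The sketch below is a minimal illustration of that modeling step only; the helper names are hypothetical, and it says nothing about the paper's protocols.

```python
from itertools import combinations

def simplex(parties):
    """Model a global cross-blockchain transaction among n+1 parties as an
    n-dimensional simplex, represented by the frozenset of its vertices."""
    return frozenset(parties)

def dimension(sigma):
    # A simplex on n+1 vertices has dimension n.
    return len(sigma) - 1

def faces(sigma, k=None):
    """All sub-transactions among subsets of the parties: the faces of the simplex.
    If k is given, return only the k-dimensional faces (those with k+1 parties)."""
    sizes = range(1, len(sigma) + 1) if k is None else [k + 1]
    return {frozenset(c) for size in sizes for c in combinations(sorted(sigma), size)}

# Example: a 3-party transaction is a 2-simplex (a "triangle");
# its 1-dimensional faces are the three pairwise (two-party) exchanges.
tx = simplex(["chain_X", "chain_Y", "chain_Z"])
assert dimension(tx) == 2
assert len(faces(tx, k=1)) == 3
```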

Read more
