Featured Research

Distributed, Parallel, and Cluster Computing

"Reduction of Monetary Cost in Cloud Storage System by Using Extended Strict Timed Causal Consistency"

Cloud storage systems provide a scalable, secure, reliable, and highly available data storage environment for organizations and end users. To serve users at scale, a provider must expand geographically, and storage service provision at that scale in turn requires a replication mechanism. Replication imposes many costs on cloud storage, including synchronization, communication, and storage costs among the replicas, and synchronizing replicas is a major challenge; consistency can be defined as the coordination among the replicas. In this paper, we propose an extension to strict timed causal consistency that accounts for monetary costs and the number of violations in cloud storage systems, which we call extended strict timed causal consistency. Our proposed method supports the monotonic read, read-your-write, monotonic write, and write-follows-read models by taking into account the causal relations between users' operations at the client side, and it supports timed causal consistency at the server side. We employed the Cassandra cloud database, which supports various consistency levels such as ALL, ONE, and QUORUM. Our method outperforms the ALL, ONE, QUORUM, and causal levels in reducing staleness rate, severity of violations, and monetary cost.
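
As background for the comparison above, here is a minimal sketch of how per-request consistency levels are selected in Cassandra with the DataStax Python driver; it only illustrates the tunable-consistency mechanism the paper evaluates against, and the keyspace and table names are hypothetical.

```python
# Minimal sketch: tunable consistency in Cassandra via the Python driver.
# Assumes a reachable cluster and an existing keyspace/table; the names
# "store" and "kv" are hypothetical.
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("store")

# A write acknowledged by a quorum of replicas.
write = SimpleStatement(
    "INSERT INTO kv (k, v) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.QUORUM,
)
session.execute(write, ("user42", "hello"))

# A read served by a single replica: cheap, but it may return stale data --
# the staleness/violation trade-off the paper quantifies.
read = SimpleStatement(
    "SELECT v FROM kv WHERE k = %s",
    consistency_level=ConsistencyLevel.ONE,
)
row = session.execute(read, ("user42",)).one()
```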

8 Steps to 3.7 TFLOP/s on NVIDIA V100 GPU: Roofline Analysis and Other Tricks

Performance optimization can be a daunting task, especially as hardware architectures become more and more complex. This paper takes a kernel from the materials science code BerkeleyGW and demonstrates a few performance analysis and optimization techniques. Despite challenges such as high register usage, low occupancy, complex data access patterns, and several long-latency instructions, we achieve 3.7 TFLOP/s of double-precision performance on an NVIDIA V100 GPU with 8 optimization steps. This is 55% of the theoretical peak, 6.7 TFLOP/s, at the nominal frequency of 1312 MHz, and 70% of a more customized peak of 5.3 TFLOP/s based on our 58% FMA ratio. We present an array of techniques used to analyze this OpenACC kernel and optimize its performance, including the hierarchical Roofline performance model and the performance tool Nsight Compute. This kernel exhibits computational characteristics commonly seen in many high-performance computing (HPC) applications, so the techniques presented should be helpful to a general audience of HPC developers and computational scientists pursuing more performance on NVIDIA GPUs.
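
The quoted peak and efficiency figures can be reproduced with a short calculation. The plain-Python sketch below uses the V100's published FP64 configuration (80 SMs, 32 FP64 units per SM, 2 FLOPs per FMA) together with the frequency and FMA ratio stated above:

```python
# Reproducing the paper's peak and efficiency figures for the V100.
sms, fp64_units, flops_per_fma = 80, 32, 2  # published V100 FP64 configuration
freq_ghz = 1.312                            # nominal clock quoted above

theoretical_peak = sms * fp64_units * flops_per_fma * freq_ghz / 1e3  # TFLOP/s
print(f"theoretical FP64 peak: {theoretical_peak:.1f} TFLOP/s")  # ~6.7

# With a 58% FMA ratio, the remaining 42% of floating-point instructions
# retire one FLOP per cycle instead of two, lowering the attainable ceiling:
fma_ratio = 0.58
custom_peak = theoretical_peak * (fma_ratio + (1 - fma_ratio) / 2)
print(f"FMA-adjusted peak:     {custom_peak:.1f} TFLOP/s")  # ~5.3

achieved = 3.7
print(f"vs theoretical peak: {achieved / theoretical_peak:.0%}")  # ~55%
print(f"vs adjusted peak:    {achieved / custom_peak:.0%}")       # ~70%
```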

A Big Data Approach for Sequences Indexing on the Cloud via Burrows-Wheeler Transform

Indexing sequence data is important in the context of Precision Medicine, where large amounts of "omics" data have to be collected and analyzed daily in order to categorize patients and identify the most effective therapies. Here we propose an algorithm for computing the Burrows-Wheeler transform that relies on Big Data technologies, i.e., Apache Spark and Hadoop. Our approach is the first to distribute the index computation, and not only the input dataset, making it possible to fully benefit from the available cloud resources.
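
To make the transform itself concrete, here is a deliberately naive PySpark sketch that distributes the cyclic rotations of a small string, sorts them, and reads off the last column. It is only illustrative: materializing all rotations is quadratic in space, and the paper's contribution is precisely a distributed index computation that scales to real "omics" datasets.

```python
# Naive Burrows-Wheeler transform on Spark: distribute the cyclic rotations,
# sort them lexicographically, and keep the last character of each.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("bwt-sketch").getOrCreate()
sc = spark.sparkContext

s = "BANANA$"      # '$' is the end-of-string sentinel (sorts before letters)
n = len(s)

bwt = (sc.parallelize(range(n))
         .map(lambda i: s[i:] + s[:i])   # i-th cyclic rotation
         .sortBy(lambda rot: rot)        # lexicographic sort of rotations
         .map(lambda rot: rot[-1])       # last column of the sorted matrix
         .collect())

print("".join(bwt))  # -> "ANNB$AA"
```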

A Big Data Lake for Multilevel Streaming Analytics

Large organizations are seeking to create new architectures and scalable platforms to effectively handle the data management challenges posed by the availability of streaming data arriving at high velocity, from various sources, and in multiple formats. This shift in the data paradigm has led to the emergence of new data analytics and management architectures. This paper focuses on storing high-volume, high-velocity, and high-variety data in its raw format in a storage architecture called a data lake. First, we present our study of the limitations of traditional data warehouses in handling these recent changes in the data paradigm. We discuss and compare different open-source and commercial platforms that can be used to develop a data lake. We then describe our end-to-end data lake design and implementation approach using the Hadoop Distributed File System (HDFS) on the Hadoop Data Platform (HDP). Finally, we present a real-world data lake development use case for data stream ingestion, staging, and multilevel streaming analytics that combines structured and unstructured data. This study can serve as a guide for individuals or organizations planning to implement a data lake solution for their use cases.
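
As an illustration of the ingestion step (not the paper's exact pipeline), the sketch below lands a raw stream into an HDFS-backed "raw zone" with Spark Structured Streaming; the broker address, topic, and paths are hypothetical.

```python
# Illustrative sketch: landing a raw stream into an HDFS-backed data lake.
# Broker, topic, and HDFS paths are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lake-ingest").getOrCreate()

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "events")
       .load())

# A data lake keeps the payload as-is and defers schema interpretation to
# analysis time (schema-on-read), unlike a traditional warehouse.
query = (raw.selectExpr("CAST(value AS STRING) AS payload", "timestamp")
            .writeStream
            .format("parquet")
            .option("path", "hdfs:///lake/raw/events")
            .option("checkpointLocation", "hdfs:///lake/checkpoints/events")
            .start())
```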

A Bilateral Game Approach for Task Outsourcing in Multi-access Edge Computing

Multi-access edge computing (MEC) is a promising architecture for providing low-latency applications in future Internet of Things (IoT)-based network systems. Together with the increasing scholarly attention on task offloading, the problem of edge servers' resource allocation has been widely studied. Most previous works focus on a single edge server (ES) serving multiple terminal entities (TEs), which restricts the TEs' access to sufficient resources. In this paper, we consider an MEC resource transaction market with multiple ESs and multiple TEs, which are interdependent and mutually influence each other. This many-to-many interaction requires resolving several problems, including task allocation, the TEs' selection of ESs, and the conflicting interests of the two parties. Game theory is an effective tool for reconciling the interests of two or more conflicting individuals in a trading market. We therefore propose a bilateral game framework among multiple ESs and multiple TEs by modeling the task outsourcing problem as two noncooperative games: a supplier-side game and a customer-side game. In the first game, a supply function bidding mechanism models the ESs' profit maximization problem: the ESs submit bids to the scheduler, where the computing service price is computed and sent to the TEs. In the second game, the TEs determine their optimal demand profiles according to the ESs' bids to maximize their payoff. We prove the existence and uniqueness of the Nash equilibrium in both games, and design a distributed task outsourcing algorithm (DTOA) to determine the equilibrium. Simulation results demonstrate the superior performance of DTOA in increasing the ESs' profit and the TEs' payoff, as well as in flattening peak and off-peak load.
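
As a toy illustration of the two-sided interaction (not the paper's DTOA or its equilibrium analysis), the sketch below iterates between a price formed from linear supply-function bids and the TEs' best-response demands until the market clears; all bids, utilities, and the damping factor are invented for the example.

```python
# Toy bilateral market: ES i bids a supply function q_i(p) = b_i * p;
# TE j picks demand d_j maximizing u_j * log(1 + d_j) - p * d_j, which
# gives the best response d_j = max(0, u_j / p - 1). Numbers hypothetical.
bids = [2.0, 1.5, 1.5]        # ESs' supply-function slopes b_i
utilities = [2.0, 3.0, 4.0]   # TEs' utility weights u_j

def demands(p):
    return [max(0.0, u / p - 1.0) for u in utilities]

p = 1.0
for _ in range(200):
    total_demand = sum(demands(p))
    # Market clearing: total supply sum(b_i) * p meets total demand.
    # Damping the update lets the fixed-point iteration settle.
    p = 0.5 * p + 0.5 * total_demand / sum(bids)

print(f"equilibrium price  ~ {p:.3f}")                             # ~1.075
print(f"equilibrium demand ~ {[round(d, 3) for d in demands(p)]}")
```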

A Brief Survey on Replica Consistency in Cloud Environments

Cloud computing is a general term for delivering hosted services over the Internet. With the accelerated growth of the volume of data used by applications, many organizations have moved their data onto cloud servers to provide scalable, reliable, and highly available services. A particularly challenging issue in cloud storage systems with geographically distributed data replication is how to reach a consistent state across all replicas. This survey reviews the major aspects of consistency in cloud data storage systems, grouping recently proposed methods into three categories: (1) fixed consistency methods, (2) configurable consistency methods, and (3) consistency monitoring methods.

A Case For Adaptive Deep Neural Networks in Edge Computing

Edge computing offers an additional layer of compute infrastructure, closer to the data source, before raw data from privacy-sensitive and performance-critical applications is transferred to a cloud data center. Deep Neural Networks (DNNs) are one class of applications reported to benefit from collaborative computing between the edge and the cloud: a DNN is partitioned so that specific layers are deployed onto the edge and others onto the cloud to meet performance and privacy objectives. However, there is limited understanding of: (a) whether and how evolving operational conditions (increased CPU and memory utilization at the edge, or reduced data transfer rates between the edge and the cloud) affect the performance of already deployed DNNs, and (b) whether a new partition configuration is required to maximize performance. A DNN that adapts to changing operational conditions is referred to as an 'adaptive DNN'. This paper investigates whether there is a case for adaptive DNNs in edge computing by considering four questions: (i) Are DNNs sensitive to operational conditions? (ii) How sensitive are DNNs to operational conditions? (iii) Do individual operational conditions, or combinations of them, affect DNNs equally? (iv) Is DNN partitioning sensitive to the hardware architectures on the cloud/edge? The exploration is carried out in the context of 8 pre-trained DNN models, and the results presented are from analyzing nearly 8 million data points. The results highlight that network conditions affect DNN performance more than CPU- or memory-related operational conditions. Repartitioning is noted to provide a performance gain in a number of cases, but no specific trend was observed in relation to the underlying hardware architecture. Nonetheless, the need for adaptive DNNs is confirmed.
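
To see why partitioning reacts to operational conditions, consider choosing the split layer that minimizes end-to-end latency when layers [0, k) run on the edge and the activation crossing the split is shipped to the cloud. The per-layer numbers below are invented for illustration and are not the paper's measurements:

```python
# Hypothetical per-layer costs for a 5-layer DNN (not the paper's data).
edge_ms  = [4.0, 6.0, 9.0, 12.0, 15.0]   # per-layer time on the edge device
cloud_ms = [1.0, 1.5, 2.0, 2.5, 3.0]     # per-layer time on the cloud
out_mb   = [8.0, 4.0, 1.0, 0.5, 0.1]     # activation size after each layer
input_mb = 16.0                          # raw input shipped when k == 0

def best_split(bandwidth_mbps):
    """Return (k, latency_ms): layers [0, k) on the edge, the rest on the cloud."""
    n = len(edge_ms)
    def latency(k):
        shipped = input_mb if k == 0 else out_mb[k - 1]
        transfer = shipped * 8.0 / bandwidth_mbps * 1000.0   # MB over Mbps -> ms
        return sum(edge_ms[:k]) + transfer + sum(cloud_ms[k:])
    return min(((k, latency(k)) for k in range(n + 1)), key=lambda t: t[1])

# The optimal split moves as bandwidth changes -- consistent with the finding
# that network conditions dominate CPU- and memory-related conditions:
for bw in (5, 50, 500, 5000):            # Mbps
    k, ms = best_split(bw)
    print(f"{bw:>5} Mbps -> run {k} layer(s) on the edge, ~{ms:.1f} ms")
```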

A Competitive Edge: Can FPGAs Beat GPUs at DCNN Inference Acceleration in Resource-Limited Edge Computing Applications?

When trained as generative models, deep learning algorithms have shown exceptional performance on tasks involving high-dimensional data, such as image denoising and super-resolution. In an increasingly connected world dominated by mobile and edge devices, there is surging demand for these algorithms to run locally on embedded platforms. FPGAs, by virtue of their reprogrammability and low-power characteristics, are ideal candidates for these edge computing applications. We therefore design a spatio-temporally parallelized hardware architecture that accelerates a deconvolution algorithm optimized for power-efficient inference on a resource-limited FPGA, and propose this FPGA-based accelerator for Deconvolutional Neural Network (DCNN) inference in low-power edge computing applications. To this end, we develop methods that systematically exploit micro-architectural innovations, design space exploration, and statistical analysis. Using a Xilinx PYNQ-Z2 FPGA, we leverage our architecture to accelerate inference for two DCNNs trained on the MNIST and CelebA datasets using the Wasserstein GAN framework. On these networks, our FPGA design achieves a higher throughput-to-power ratio with lower run-to-run variation when compared to the NVIDIA Jetson TX1 edge computing GPU.
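
For readers unfamiliar with the kernel being accelerated: a deconvolution (transposed convolution) can be expressed as zero-insertion followed by an ordinary convolution. The NumPy sketch below illustrates this in 1-D, a simplification of the multi-channel 2-D case the accelerator targets; the inserted zeros are also why naive hardware mappings waste multiply-accumulate cycles, which motivates dedicated dataflows like the one proposed.

```python
# 1-D deconvolution (transposed convolution) as zero-insertion + convolution.
import numpy as np

def deconv1d(x, kernel, stride):
    # Insert (stride - 1) zeros between input samples...
    upsampled = np.zeros(len(x) * stride - (stride - 1))
    upsampled[::stride] = x
    # ...then apply a plain full convolution.
    return np.convolve(upsampled, kernel, mode="full")

x = np.array([1.0, 2.0, 3.0])
k = np.array([1.0, 0.5])
print(deconv1d(x, k, stride=2))   # [1.  0.5 2.  1.  3.  1.5]
# Output length (len(x) - 1) * stride + len(kernel) = 6, matching the usual
# transposed-convolution size formula with no padding.
```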

A Computational-Graph Partitioning Method for Training Memory-Constrained DNNs

Many state-of-the-art Deep Neural Networks (DNNs) have substantial memory requirements, and limited device memory becomes a bottleneck when training such models. We propose ParDNN, an automatic, generic, and non-intrusive partitioning strategy for DNNs represented as computational graphs. ParDNN decides a placement of a DNN's underlying computational graph operations across multiple devices so that the devices' memory constraints are met and the training time is minimized. ParDNN is completely independent of the deep learning aspects of a DNN: it requires no modification at either the model level or the systems-level implementation of its operation kernels. ParDNN partitions DNNs having billions of parameters and hundreds of thousands of operations in seconds to a few minutes. Our experiments with TensorFlow on 16 GPUs demonstrate efficient training of 5 very large models while achieving superlinear scaling of both batch size and training throughput. ParDNN either outperforms or qualitatively improves upon the related work.
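
For a flavor of the placement problem (a toy sketch, not ParDNN's algorithm), the snippet below walks a small computational graph in topological order and greedily assigns each operation to a device with enough free memory, preferring the device that already holds its inputs; the graph and memory budgets are hypothetical.

```python
# Toy memory-constrained placement over a computational graph.
# op -> (memory in MB, input ops); insertion order here is topological.
graph = {
    "a": (40, []), "b": (40, []),
    "c": (50, ["a"]), "d": (50, ["b"]),
    "e": (30, ["c", "d"]),
}
free = {"gpu0": 120, "gpu1": 120}   # per-device memory budgets (MB)

placement = {}
for op, (mem, inputs) in graph.items():
    candidates = [d for d in free if free[d] >= mem]
    if not candidates:
        raise MemoryError(f"no device can hold {op}")
    # Prefer devices holding more of this op's inputs (less communication),
    # breaking ties by remaining free memory (better load balance).
    dev = max(candidates,
              key=lambda d: (sum(placement[i] == d for i in inputs), free[d]))
    placement[op] = dev
    free[dev] -= mem

print(placement)  # e.g. {'a': 'gpu0', 'b': 'gpu1', 'c': 'gpu0', 'd': 'gpu1', ...}
```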

A Consent Model for Blockchain-based Distributed Data Sharing Platforms

In modern healthcare systems, being able to share electronic health records is crucial for providing quality care and for enabling a larger spectrum of health services. Health data sharing depends on obtaining individual consent, which, in turn, is hindered by a lack of resources. To this end, blockchain-based platforms facilitate data sharing by inherently creating a trusted distributed network of users, who can share their data without depending on the time and resources of specific players (such as the health services). In blockchain-based platforms, data governance mechanisms become very important due to the need to specify and monitor data sharing and data use conditions. In this paper, we present a blockchain-based data sharing consent model for access control over individual health data. We use smart contracts to dynamically represent individual consent over health data and to enable data requesters to search and access it. The dynamic consent model builds upon two ontologies: the Data Use Ontology (DUO), which models the individual consent of users, and the Automatable Discovery and Access Matrix (ADA-M), which describes queries from data requesters. We deploy the model on the Ethereum blockchain and evaluate different data sharing scenarios. The contribution of this paper is an individual consent model for health data sharing platforms that guarantees individual consent is respected and that all participants in the data sharing platform are accountable. Our evaluation indicates that the model provides a flexible approach to deciding how data is used by requesters, and our experiments show that it is efficient and adapts to personalized access control policies.
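
The core consent-matching idea can be sketched independently of the blockchain machinery. The snippet below models a data owner's consent as a set of permitted use categories and checks a requester's query against it; the category strings are invented placeholders rather than actual DUO or ADA-M vocabulary, and in the paper this logic lives in Ethereum smart contracts.

```python
# Simplified consent matching (hypothetical terms, not DUO/ADA-M codes).
from dataclasses import dataclass, field

@dataclass
class Consent:            # in the paper, represented by a smart contract
    owner: str
    permitted_uses: set = field(default_factory=set)

@dataclass
class AccessQuery:        # an ADA-M-style request from a data requester
    requester: str
    intended_use: str

def authorize(consent: Consent, query: AccessQuery) -> bool:
    # Grant access only for uses the data owner explicitly consented to.
    return query.intended_use in consent.permitted_uses

alice = Consent("alice", {"disease-specific-research"})
print(authorize(alice, AccessQuery("lab-1", "disease-specific-research")))  # True
print(authorize(alice, AccessQuery("corp-2", "commercial-use")))            # False
```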
