Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Dong Dai is active.

Publication


Featured research published by Dong Dai.


IEEE International Conference on High Performance Computing, Data, and Analytics | 2014

Two-choice randomized dynamic I/O scheduler for object storage systems

Dong Dai; Yong Chen; Dries Kimpe; Robert B. Ross

Object storage is considered a promising solution for next-generation (exascale) high-performance computing platforms because of its flexible and high-performance object interface. However, delivering high burst-write throughput is still a critical challenge. Although deploying more storage servers can potentially provide higher throughput, it can be ineffective because the burst-write throughput can be limited by a small number of stragglers (storage servers that are occasionally slower than others). In this paper, we propose a two-choice randomized dynamic I/O scheduler that schedules the concurrent burst-write operations in a balanced way to avoid stragglers and hence achieve high throughput. The contributions in this study are threefold. First, we propose a two-choice randomized dynamic I/O scheduler with collaborative probe and preassign strategies. Second, we design and implement a redirect table and metadata maintainer to address the metadata management challenge introduced by dynamic I/O scheduling. Third, we evaluate the proposed scheduler with both simulation tests and experimental tests in an HPC cluster. The evaluation results confirm the scalability and performance benefits of the proposed I/O scheduler.
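
The core scheduling idea follows the classic "power of two choices" load-balancing principle. The sketch below is a minimal Python illustration of that idea, not the authors' implementation; the server names, the load-probing function, and the dispatch interface are hypothetical.

```python
import random

class TwoChoiceScheduler:
    """Minimal 'power of two choices' dispatcher (illustrative only).

    For each write, probe two randomly chosen storage servers and send
    the request to the one reporting the lighter queue, which keeps
    occasional stragglers from dominating burst-write latency.
    """

    def __init__(self, servers):
        self.servers = list(servers)           # hypothetical server handles
        self.queue_depth = {s: 0 for s in servers}

    def probe(self, server):
        # Stand-in for a collaborative probe of the server's pending I/O.
        return self.queue_depth[server]

    def dispatch(self, write_request):
        a, b = random.sample(self.servers, 2)
        target = a if self.probe(a) <= self.probe(b) else b
        self.queue_depth[target] += 1          # enqueue on the less-loaded server
        return target                          # caller would record the mapping in a redirect table


# Usage sketch: schedule a burst of 1000 writes across 8 servers.
sched = TwoChoiceScheduler(servers=[f"oss{i}" for i in range(8)])
placements = [sched.dispatch(req) for req in range(1000)]
```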


IEEE International Conference on High Performance Computing, Data, and Analytics | 2014

Using property graphs for rich metadata management in HPC systems

Dong Dai; Robert B. Ross; Philip H. Carns; Dries Kimpe; Yong Chen

HPC platforms are capable of generating huge amounts of metadata about different entities including jobs, users, and files. Simple metadata, which describes the attributes of these entities (e.g., file size, name, and permission mode), has been well recorded and used in current systems. However, only a limited amount of rich metadata, which records not only the attributes of entities but also the relationships between them, is captured in current HPC systems. Rich metadata may include information from many sources, including users and applications, and must be integrated into a unified framework. Collecting, integrating, processing, and querying such a large volume of metadata pose considerable challenges for HPC systems. In this paper, we propose a rich metadata management approach that unifies metadata into one generic property graph. We argue that this approach supports not only simple metadata operations such as directory traversal and permission validation but also rich metadata operations such as provenance query and security auditing. The property graph approach provides an extensible method to store diverse metadata and presents an opportunity to leverage rapidly evolving graph storage and processing techniques.
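
As a rough illustration of the property-graph idea, the sketch below models jobs, users, and files as vertices with attribute dictionaries and records relationships as labeled edges. The entity names, edge labels, and query are invented for the example and are not the paper's schema.

```python
# Minimal in-memory property graph for rich metadata (illustrative sketch).
from collections import defaultdict

class PropertyGraph:
    def __init__(self):
        self.vertices = {}                    # vertex id -> property dict
        self.edges = defaultdict(list)        # src id -> [(label, dst id, properties)]

    def add_vertex(self, vid, **props):
        self.vertices[vid] = props

    def add_edge(self, src, label, dst, **props):
        self.edges[src].append((label, dst, props))

    def neighbors(self, vid, label=None):
        return [dst for (lbl, dst, _) in self.edges[vid] if label is None or lbl == label]


# Hypothetical provenance-style query: "which files did job 42 write?"
g = PropertyGraph()
g.add_vertex("user:alice", type="user")
g.add_vertex("job:42", type="job", nodes=128)
g.add_vertex("file:/out/ckpt.0", type="file", size=4096, mode=0o640)
g.add_edge("user:alice", "submitted", "job:42")
g.add_edge("job:42", "wrote", "file:/out/ckpt.0", timestamp=1700000000)
print(g.neighbors("job:42", label="wrote"))
```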


International Conference on Big Data | 2014

Provenance-based object storage prediction scheme for scientific big data applications

Dong Dai; Yong Chen; Dries Kimpe; Robert B. Ross

Object storage has been increasingly adopted in high-performance computing for scientific big data applications. With object storage, applications usually use object IDs, queries, or collections to identify the data instead of using files. Since the object store changes the way data is accessed in applications, it introduces new challenges for I/O prediction, which traditionally works based on interfile or intrafile pattern detection. The key challenge is that the inputs of object-based applications are no longer expressed as static file names: they become much more dynamic and unstable, hidden inside application logic. Traditional prediction strategies do not work well in such conditions. In this paper, we introduce the use of provenance information, which is collected for data management in high-performance computing systems, in order to build an accurate coarse-grained (object-level) input prediction. The prediction results can be preloaded into a burst buffer to accelerate future reads. To the best of our knowledge, this study is the first to use provenance information in object stores to predict application inputs. Evaluation results confirm the effectiveness and accuracy of our provenance-based prediction and show that the proposed prediction system is feasible for real-world deployment.
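
A rough way to picture the approach: from provenance records, count which objects an application's past executions read, and preload the most frequently read objects into a burst buffer before the next run. The sketch below is an assumption-laden simplification; the record format and the prefetch call are hypothetical, not the paper's system.

```python
from collections import Counter

def predict_inputs(provenance_records, app_name, top_k=4):
    """Predict likely input objects for `app_name` from past runs.

    `provenance_records` is assumed to be an iterable of
    (application, object_id) pairs describing which objects each past
    execution read; a real system would derive this from its provenance
    store rather than a flat list.
    """
    reads = Counter(obj for app, obj in provenance_records if app == app_name)
    return [obj for obj, _ in reads.most_common(top_k)]


# Hypothetical usage: warm a burst buffer before the next run.
history = [("climate_sim", "obj:grid_2020"), ("climate_sim", "obj:grid_2021"),
           ("climate_sim", "obj:grid_2021"), ("viz_tool", "obj:frame_007")]
for obj_id in predict_inputs(history, "climate_sim"):
    print("prefetch", obj_id)   # stand-in for a burst-buffer preload call
```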


International Conference on Cluster Computing | 2012

Cloud Based Short Read Mapping Service

Dong Dai; Xi Li; Chao Wang; Xuehai Zhou

Bioinformatics is an emerging field with seemingly limitless possibilities for advances in numerous scientific research and application domains. In this paper, we summarize the rapidly growing body of cutting-edge acceleration engines for the emerging short-read mapping problem. Moreover, we propose a novel cloud-based web service solution to the short-read mapping problem in DNA sequencing, which greatly accelerates the task of aligning a continuous stream of incoming short reads to known reference genomes. The approach is based on preprocessing of the reference genomes and iterative MapReduce jobs for aligning the continuously incoming reads. The MapReduce-based read-mapping algorithm is modeled after RMAP. Preliminary experimental results on the incorporated MapReduce programming framework demonstrate that the proposed architecture and methods efficiently reduce the waiting time for large-scale short-read applications. This architecture could be particularly important and efficient in future commercial personal genome sequencing services.
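
The seed-and-lookup style of MapReduce read mapping used in RMAP-like pipelines can be pictured as follows: the reference is preprocessed into a seed index, mappers emit (seed, read) pairs, and reducers verify each read at the candidate positions for its seed. The plain-Python simulation below only illustrates the stages; the seed length and data layout are arbitrary choices for the example, not the paper's service.

```python
from collections import defaultdict

SEED_LEN = 8  # arbitrary seed length for the illustration

def index_reference(ref):
    """Preprocessing: map every length-SEED_LEN substring of the reference to its positions."""
    index = defaultdict(list)
    for i in range(len(ref) - SEED_LEN + 1):
        index[ref[i:i + SEED_LEN]].append(i)
    return index

def map_phase(reads):
    """Mapper: emit (seed, read) pairs keyed by the read's leading seed."""
    for read in reads:
        yield read[:SEED_LEN], read

def reduce_phase(pairs, ref, index):
    """Reducer: look up each seed and verify the full read at candidate positions."""
    hits = defaultdict(list)
    for seed, read in pairs:
        for pos in index.get(seed, []):
            if ref[pos:pos + len(read)] == read:
                hits[read].append(pos)
    return hits

reference = "ACGTACGTTAGCCGATACGTACGGAT"
reads = ["TAGCCGAT", "ACGTACGG"]
print(dict(reduce_phase(map_phase(reads), reference, index_reference(reference))))
```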


International Conference on Applications of Digital Information and Web Technologies | 2014

Bwasw-Cloud: Efficient sequence alignment algorithm for two big data with MapReduce

Mingming Sun; Xuehai Zhou; Feng Yang; Kun Lu; Dong Dai

Recent next-generation sequencing machines generate sequences at an unprecedented rate, and the sequences, called reads, are no longer short. The reference sequences against which reads are aligned are also increasingly large. Efficiently mapping a large number of long reads against big reference sequences therefore poses a new challenge to sequence alignment: alignment algorithms must now match two big datasets against each other. To address this problem, we propose a new parallel sequence alignment algorithm called Bwasw-Cloud, optimized for aligning long reads against a large reference sequence (e.g., the human genome). It is modeled after the widely used BWA-SW algorithm and uses the open-source Hadoop implementation of MapReduce. The results show that Bwasw-Cloud can effectively and quickly align two big datasets on a commodity cluster.
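
One way to picture the "two big data" aspect is that the large reference itself must be split into overlapping chunks so that reads and reference pieces can be joined in parallel map tasks. The chunking sketch below, with an overlap of at least the maximum read length so no alignment is lost at a boundary, is an assumption about how such a split could look, not Bwasw-Cloud's actual partitioning.

```python
def chunk_reference(ref, chunk_size, max_read_len):
    """Split a long reference into overlapping chunks so that any alignment
    falling across a chunk boundary is still fully contained in one chunk."""
    overlap = max_read_len - 1
    step = chunk_size - overlap
    chunks = []
    for start in range(0, max(len(ref) - overlap, 1), step):
        chunks.append((start, ref[start:start + chunk_size]))
    return chunks

# Hypothetical usage: each (offset, chunk) pair would become one map task's input.
for offset, chunk in chunk_reference("ACGT" * 10, chunk_size=16, max_read_len=6):
    print(offset, chunk)
```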


High Performance Distributed Computing | 2014

Domino: an incremental computing framework in cloud with eventual synchronization

Dong Dai; Yong Chen; Dries Kimpe; Robert B. Ross; Xuehai Zhou

In recent years, more and more cloud applications have needed to process large-scale online data sets that evolve over time as entries are added or modified. Several programming frameworks, such as Percolator and Oolong, have been proposed for such incremental data processing and can achieve efficient updates with an event-driven abstraction. However, these frameworks are inherently asynchronous, leaving the heavy burden of managing synchronization to application developers. Such a limitation significantly restricts their usability. In this paper, we introduce a trigger-based incremental computing framework, called Domino, with a flexible synchronization mechanism and runtime optimizations that coordinate parallel triggers efficiently. With this new framework, both synchronous and asynchronous applications can be developed seamlessly. Use cases and current evaluation results confirm that the new Domino programming model delivers sufficient performance and is easy to use in large-scale distributed computing.
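
The trigger abstraction can be pictured roughly as in the sketch below: application code registers a callback on a key predicate, updates fire the trigger in parallel, and a synchronization point lets synchronous applications wait for all pending triggers to drain. The class and method names are hypothetical and do not reflect the Domino API.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class TriggerStore:
    """Toy key-value store with update triggers (illustrative, not Domino's API)."""

    def __init__(self, workers=4):
        self.data = {}
        self.triggers = []                     # list of (predicate, callback)
        self.pool = ThreadPoolExecutor(max_workers=workers)
        self.pending = []
        self.lock = threading.Lock()

    def on_update(self, predicate, callback):
        self.triggers.append((predicate, callback))

    def put(self, key, value):
        with self.lock:
            self.data[key] = value
            for pred, cb in self.triggers:
                if pred(key):
                    self.pending.append(self.pool.submit(cb, key, value))

    def sync(self):
        """Synchronization point: wait until all fired triggers have finished."""
        with self.lock:
            pending, self.pending = self.pending, []
        for f in pending:
            f.result()


# Usage sketch: collect updates to "sales:" keys and aggregate after a sync barrier.
store = TriggerStore()
updates = []                                   # list.append is safe enough for this sketch
store.on_update(lambda k: k.startswith("sales:"),
                lambda k, v: updates.append(v))
store.put("sales:jan", 10)
store.put("sales:feb", 32)
store.sync()                                   # synchronous callers wait here
print(sum(updates))                            # 42
```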


International Conference on Cluster Computing | 2012

Phase Detection for Loop-Based Programs on Multicore Architectures

Chao Wang; Xi Li; Dong Dai; Gangyong Jia; Xuehai Zhou

Phase detection and behavior analysis have been major concerns for improving performance as well as system throughput. However, for distributed acceleration engines, the execution of different phases is much more difficult to analyze, especially for loop-based programs. With respect to tasks in different iterations, efficiently detecting the phases that belong to the same loop iteration, or even span iterations, poses a significant challenge. In this paper we propose a phase detection method for loop-based programs on multiprocessor systems-on-chip (MPSoC). A cross-compiling tool based on the state-of-the-art ARM RVDS is employed to locate the hot-spot functions of the program. Based on the hot spots, we target function optimization on a Hadoop cluster for performance evaluation. The preliminary experimental results demonstrate that our proposed techniques can extract the hot-block functions with high accuracy and modest overhead. The method can be applied to guide optimization and adaptive mapping schemes on MPSoC architectures.
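
As a rough analogue of interval-based phase detection, the sketch below compares per-interval execution profiles (here, per-function sample counts) and marks a new phase whenever consecutive profiles diverge beyond a threshold. The profile format, distance metric, and threshold are invented for the example and are not the paper's MPSoC method.

```python
def manhattan(a, b, keys):
    return sum(abs(a.get(k, 0) - b.get(k, 0)) for k in keys)

def detect_phases(interval_profiles, threshold=0.5):
    """Assign a phase id to each interval; start a new phase when the
    normalized distance between consecutive profiles exceeds `threshold`."""
    phases, current = [], 0
    for i, prof in enumerate(interval_profiles):
        if i > 0:
            prev = interval_profiles[i - 1]
            keys = set(prof) | set(prev)
            total = sum(prof.values()) + sum(prev.values())
            if total and manhattan(prof, prev, keys) / total > threshold:
                current += 1
        phases.append(current)
    return phases

# Hypothetical per-interval function sample counts from a loop-based program.
profiles = [{"fft": 90, "io": 10}, {"fft": 88, "io": 12},
            {"solve": 95, "io": 5}, {"solve": 97, "io": 3}]
print(detect_phases(profiles))   # [0, 0, 1, 1]
```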


IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2017

SuperMIC: Analyzing Large Biological Datasets in Bioinformatics with Maximal Information Coefficient

Chao Wang; Dong Dai; Xi Li; Aili Wang; Xuehai Zhou

The maximal information coefficient (MIC) has been proposed to discover relationships and associations between pairs of variables. Accelerating the MIC calculation poses significant challenges for bioinformatics scientists, especially in genome sequencing and biological annotation. In this paper, we explore a parallel approach that uses the MapReduce framework to improve the computing efficiency and throughput of the MIC computation. The acceleration system includes biological data storage on HDFS, preprocessing algorithms, a distributed memory cache mechanism, and the partitioning of MapReduce jobs. Based on this acceleration approach, we extend the traditional two-variable algorithm to a multiple-variable algorithm. The experimental results show that our parallel solution provides a linear speedup compared with the original algorithm without affecting correctness or sensitivity.
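
For context, the quantity being parallelized is the standard MIC of Reshef et al. (stated here in its general form, not anything specific to SuperMIC):

```latex
\mathrm{MIC}(D) \;=\; \max_{x\,y \,<\, B(n)} \frac{I^{*}(D, x, y)}{\log \min(x, y)}
```

where \(I^{*}(D,x,y)\) is the largest mutual information achievable by any x-by-y grid imposed on the n-sample dataset D, and \(B(n)\) (commonly \(n^{0.6}\)) bounds the grid resolution. The grid search for each variable pair is independent of every other pair, which is what makes the pairwise (and multi-variable) computation amenable to MapReduce-style parallelism.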


International Conference on Parallel Processing | 2016

Log-Assisted Straggler-Aware I/O Scheduler for High-End Computing

Neda Tavakoli; Dong Dai; Yong Chen

Object-based parallel file systems have emerged as promising storage solutions for high-end computing (HEC) systems. Despite the fact that object storage provides a flexible interface, scheduling highly concurrent I/O requests that access a large number of objects still remains a challenging problem, especially when stragglers (storage servers that are significantly slower than others) exist in the system. An efficient I/O scheduler needs to avoid possible stragglers to achieve low latency and high throughput. In this paper, we introduce a log-assisted straggler-aware I/O scheduler to mitigate the impact of storage server stragglers. The contribution of this study is threefold. First, we introduce a client-side, log-assisted, straggler-aware I/O scheduler architecture to tackle the storage straggler issue in HEC systems. Second, we present two scheduling algorithms that can make efficient decisions on scheduling I/O while avoiding stragglers based on such an architecture. Third, we evaluate the proposed I/O scheduler using simulations. The simulation results confirm the promise of the newly introduced log-assisted straggler-aware I/O scheduler in large-scale HEC systems.
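
A minimal sketch of the client-side idea, under the assumption that the client keeps a short log of recent per-server latencies and steers new requests away from servers whose recent history looks straggler-like. The window size, slowdown threshold, and interfaces are invented for illustration and are not the paper's two algorithms.

```python
from collections import defaultdict, deque
from statistics import median

class StragglerAwareScheduler:
    """Client-side scheduler that consults a latency log (illustrative sketch)."""

    def __init__(self, servers, window=16, slowdown=2.0):
        self.servers = list(servers)
        self.slowdown = slowdown               # how much slower than the median counts as a straggler
        self.log = defaultdict(lambda: deque(maxlen=window))

    def record(self, server, latency):
        self.log[server].append(latency)

    def _recent(self, server):
        hist = self.log[server]
        return median(hist) if hist else 0.0

    def pick(self):
        estimates = {s: self._recent(s) for s in self.servers}
        overall = median(estimates.values())
        healthy = [s for s, est in estimates.items()
                   if overall == 0 or est <= self.slowdown * overall]
        candidates = healthy or self.servers   # fall back if every server looks slow
        return min(candidates, key=lambda s: estimates[s])


# Usage sketch: server "oss2" has been slow lately, so it is avoided.
sched = StragglerAwareScheduler(["oss0", "oss1", "oss2"])
for lat in (1.0, 1.1, 1.0):
    sched.record("oss0", lat)
for lat in (1.2, 1.0, 1.1):
    sched.record("oss1", lat)
for lat in (9.0, 8.5, 10.0):
    sched.record("oss2", lat)
print(sched.pick())                            # picks oss0/oss1, never the straggler oss2
```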


International Conference on Parallel and Distributed Systems | 2014

Combine thread with memory scheduling for maximizing performance in multi-core systems

Gangyong Jia; Guangjie Han; Liang Shi; Jian Wan; Dong Dai

The growing gap between microprocessor speed and DRAM speed is a major problem that computer designers are facing. In order to narrow the gap, it is necessary to improve DRAM's speed and throughput. Moreover, on multi-core platforms, the DRAM memory shared by all cores usually suffers from memory contention and interference, which can cause serious performance degradation and unfairness among parallel running threads. To address these problems, this paper proposes techniques that take advantage both of partitioning cores, threads, and memory banks into groups to reduce interference among different groups, and of grouping memory accesses to the same row together to reduce the cache miss rate. A memory optimization framework that combines thread scheduling with memory scheduling (CTMS) is proposed in this paper; it simultaneously minimizes the memory access schedule length and memory access time and reduces interference to maximize performance for multi-core systems. Experimental results show that CTMS shortens memory access time by 12.6% while improving throughput by 11.8% on average. Moreover, CTMS also saves 5.8% of the energy consumption.
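
The grouping idea can be pictured as in the sketch below: threads and memory banks are partitioned into disjoint groups to isolate interference, and within a group requests to the same DRAM row are batched so an open row serves several accesses. The round-robin partitioning and the request layout are simplifications invented for the example, not CTMS itself.

```python
from itertools import groupby

def partition(items, n_groups):
    """Round-robin partition of items into n_groups disjoint groups."""
    groups = [[] for _ in range(n_groups)]
    for i, item in enumerate(items):
        groups[i % n_groups].append(item)
    return groups

def schedule_group(requests):
    """Within one group, batch requests to the same (bank, row) so that a row,
    once opened, serves several accesses before it is closed."""
    ordered = sorted(requests, key=lambda r: (r["bank"], r["row"]))
    return [list(batch) for _, batch in groupby(ordered, key=lambda r: (r["bank"], r["row"]))]

# Hypothetical setup: 8 threads and 8 banks split into 2 interference-isolated groups.
threads = partition([f"t{i}" for i in range(8)], 2)
banks = partition(list(range(8)), 2)
print(threads, banks)

# Requests from one group, batched by (bank, row).
reqs = [{"bank": 0, "row": 3}, {"bank": 0, "row": 3}, {"bank": 2, "row": 7}, {"bank": 0, "row": 5}]
print(schedule_group(reqs))
```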

Collaboration


Dive into Dong Dai's collaborations.

Top Co-Authors

Yong Chen, Texas Tech University
Robert B. Ross, Argonne National Laboratory
Xi Li, University of Science and Technology of China
Xuehai Zhou, University of Science and Technology of China
Gangyong Jia, University of Science and Technology of China
Philip H. Carns, Argonne National Laboratory
Chao Wang, University of Science and Technology of China
Dries Kimpe, Argonne National Laboratory
John Jenkins, North Carolina State University
Jian Wan, Hangzhou Dianzi University