Network


Latest external collaborations at the country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Jialin Liu is active.

Publication


Featured research published by Jialin Liu.


international conference on big data | 2013

Segmented analysis for reducing data movement

Jialin Liu; Surendra Byna; Yong Chen

Many scientific applications nowadays generate a few terabytes (TB) of data in a single run, and data sizes are expected to reach petabytes (PB) in the near future. Enabling fast extraction of knowledge from these large datasets holds the key to faster scientific discoveries. However, reading data from a traditional storage subsystem is slow, as I/O performance lags far behind computational performance. Reducing data movement from the storage subsystem is widely considered a viable option for improving the performance of data analysis. In this paper, we propose Segmented Analysis, a data movement reduction strategy that reuses results when multiple similar analysis tasks process the same segments of data. The basic idea is to segment the data accessed by an analysis task, to process the data segments with that task, and to store the per-segment results in a cache for future use. When a later analysis task needs to perform the same processing on segments whose results are available in the cache, it can avoid both moving the data and recomputing those results. The Segmented Analysis framework contains modules for computation and I/O access overlap detection, in situ segmentation, and segment result caching. We evaluate the Segmented Analysis strategy by varying factors such as the overlap rate among analysis tasks, the request size, and the granularity of segmentation. We observed 2X to 13X I/O speedups and 2X to 8X computation speedups when the overlap is above 50%.
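
The segment-result caching idea is easy to picture in code. Below is a minimal, illustrative sketch of per-segment result reuse, not the paper's actual framework; the names (segment_cache, analyze_segmented) and the hash-based keying are assumptions made for this example.

```python
import hashlib
import numpy as np

# Illustrative segment-result cache: keys identify (operation, segment),
# values hold previously computed per-segment results.
segment_cache = {}

def segment_key(op_name, dataset_id, start, end):
    """Derive a stable cache key from the operation and segment bounds."""
    raw = f"{op_name}:{dataset_id}:{start}:{end}".encode()
    return hashlib.sha1(raw).hexdigest()

def analyze_segmented(op_name, op_func, dataset_id, data, seg_size):
    """Apply op_func per segment, reusing cached segment results so that
    overlapping analysis tasks skip both I/O and computation."""
    results = []
    for start in range(0, len(data), seg_size):
        end = min(start + seg_size, len(data))
        key = segment_key(op_name, dataset_id, start, end)
        if key in segment_cache:                # overlap with an earlier task:
            results.append(segment_cache[key])  # no read, no recompute
        else:
            res = op_func(data[start:end])      # in a real system, the read
            segment_cache[key] = res            # itself would also be skipped
            results.append(res)
    return results

# Two "tasks" computing the mean over the same range: the second task
# hits the cache for every segment the first task already processed.
data = np.arange(1_000_000, dtype=np.float64)
task1 = analyze_segmented("mean", np.mean, "ds0", data, 100_000)
task2 = analyze_segmented("mean", np.mean, "ds0", data, 100_000)  # all hits
```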


ieee international conference on high performance computing data and analytics | 2012

Improving Data Analysis Performance for High-Performance Computing with Integrating Statistical Metadata in Scientific Datasets

Jialin Liu; Yong Chen

Scientific data formats and libraries, such as HDF5, ADIOS, and NetCDF, have been used widely in many data-intensive applications. These libraries have their own file formats and I/O functions to provide efficient access to large datasets. As data sizes keep increasing, these high-level I/O libraries face new challenges. Recent studies have started to utilize database techniques, such as indexing, subsetting, and data reorganization, to manage the growing datasets. In this work, we present a new approach to boost data analysis performance, namely Fast Analysis with Statistical Metadata (FASM), via data subsetting and the integration of a small amount of statistics into the original datasets. The added statistical information describes the data shape and provides knowledge of the data distribution, so the original I/O libraries can use these statistical metadata to perform fast queries and analyses. The proposed FASM approach is currently evaluated with PnetCDF on Lustre file systems, but can also be implemented with other scientific libraries. FASM can potentially lead to a new dataset design and have an impact on big data analysis.
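
As a rough illustration of how a small amount of statistical metadata can accelerate queries, the sketch below keeps per-subset min/max values and prunes subsets during a range query. The function names and the choice of min/max as the statistics are assumptions for this example, not details taken from FASM itself.

```python
import numpy as np

def build_subset_stats(data, subset_size):
    """Precompute min/max for each fixed-size subset of a 1-D array."""
    stats = []
    for start in range(0, len(data), subset_size):
        chunk = data[start:start + subset_size]
        stats.append((start, start + len(chunk), chunk.min(), chunk.max()))
    return stats

def range_query(data, stats, lo, hi):
    """Return values in [lo, hi], reading only subsets whose min/max
    overlap the query range; other subsets are pruned without any I/O."""
    hits = []
    for start, end, cmin, cmax in stats:
        if cmax < lo or cmin > hi:
            continue                    # metadata rules this subset out
        chunk = data[start:end]         # would be a file read in practice
        hits.append(chunk[(chunk >= lo) & (chunk <= hi)])
    return np.concatenate(hits) if hits else np.empty(0)

data = np.random.default_rng(0).normal(size=1_000_000)
stats = build_subset_stats(data, 10_000)
selected = range_query(data, stats, 3.5, 10.0)  # most subsets are skipped
```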


international parallel and distributed processing symposium | 2014

Model-Driven Data Layout Selection for Improving Read Performance

Jialin Liu; Surendra Byna; Bin Dong; Kesheng Wu; Yong Chen

The performance of reading scientific data from a parallel file system depends on the organization of data on physical storage devices. Data is often immutable after producers, such as large-scale simulations, experiments, and observations, write it to the parallel file system. As a result, read performance of data analysis tasks is often slow when the read pattern does not conform to the original organization of the data. For example, reading small noncontiguous chunks of data from a large array is many times slower than reading the same amount of data in contiguous chunks. Toward improving data read performance during the analysis phase, we are developing the Scientific Data Services (SDS) framework for automatically reorganizing previously written data to conform to known read patterns. In this paper, we introduce a model-driven strategy for selecting data layouts that benefit the performance of different read patterns. We have developed a parallel I/O model based on the striping parameters of the Lustre file system and the block-level striping on RAID-based disks within an Object Storage Target (OST) of Lustre. We have applied the model to reorganize large 3D array datasets on a Cray XE6 platform and achieved 9X to 128X improvements in accessing the reorganized data compared to reading the data in its original layout.
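 
A toy version of such a cost model conveys the idea. The sketch below scores candidate (stripe count, stripe size) layouts for a given read pattern; the bandwidth and seek-cost constants are invented for illustration and this is not the paper's calibrated model.

```python
# Toy read-cost model in the spirit of the paper's Lustre striping model.
# All constants below are illustrative assumptions, not measured values.

def estimate_read_time(request_bytes, n_requests,
                       stripe_count, stripe_size,
                       ost_bandwidth=500e6,   # bytes/s per OST (assumed)
                       seek_cost=5e-3):       # per noncontiguous access (assumed)
    """Estimate time to read n_requests chunks of request_bytes each
    from a file striped across stripe_count OSTs."""
    # Each request touches ceil(request_bytes / stripe_size) stripes, so
    # small requests spread over many stripes pay extra seek penalties.
    stripes_per_request = -(-request_bytes // stripe_size)
    osts_used = min(stripe_count, stripes_per_request)
    transfer = request_bytes / (osts_used * ost_bandwidth)
    return n_requests * (seek_cost * stripes_per_request + transfer)

def best_layout(request_bytes, n_requests, candidate_layouts):
    """Pick the (stripe_count, stripe_size) pair the model predicts fastest."""
    return min(candidate_layouts,
               key=lambda c: estimate_read_time(request_bytes, n_requests, *c))

layouts = [(4, 1 << 20), (16, 1 << 20), (16, 4 << 20), (64, 8 << 20)]
print(best_layout(request_bytes=2 << 20, n_requests=1024,
                  candidate_layouts=layouts))
```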


international parallel and distributed processing symposium | 2016

PANDA: Extreme Scale Parallel K-Nearest Neighbor on Distributed Architectures

Md. Mostofa Ali Patwary; Nadathur Satish; Narayanan Sundaram; Jialin Liu; Peter J. Sadowski; Evan Racah; Surendra Byna; Craig Tull; Wahid Bhimji; Prabhat; Pradeep Dubey

Computing k-Nearest Neighbors (KNN) is one of the core kernels used in many machine learning, data mining, and scientific computing applications. Although kd-tree based O(log n) algorithms have been proposed for computing KNN, their inherent sequentiality means that linear algorithms are often used in practice. This limits the applicability of such methods to millions of data points, with limited scalability for Big Data analytics challenges in the scientific domain. In this paper, we present parallel and highly optimized kd-tree based KNN algorithms (both construction and querying) suitable for distributed architectures. Our algorithm includes novel approaches for pruning the search space and improving load balancing and partitioning among nodes and threads. Using TB-sized datasets from three science applications (astrophysics, plasma physics, and particle physics), we show that our implementation can construct a kd-tree of 189 billion particles in 48 seconds utilizing ~50,000 cores. We also demonstrate computation of KNN for 19 billion queries in 12 seconds. We demonstrate almost linear speedup on both shared and distributed memory computers. Our algorithms outperform earlier implementations by more than an order of magnitude, thereby radically improving the applicability of our implementation to state-of-the-art Big Data analytics problems.
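
The partition-and-merge structure of distributed kd-tree KNN can be sketched on a single machine. In the example below each "node" is stood in for by a local cKDTree (from SciPy), and per-partition candidates are merged into a global top-k; this illustrates the general approach, not the PANDA implementation.

```python
import numpy as np
from scipy.spatial import cKDTree

def build_partitions(points, n_parts):
    """Each partition models the points owned by one compute node."""
    return [cKDTree(p) for p in np.array_split(points, n_parts)]

def knn(trees, query, k):
    """Query every partition's local kd-tree, then merge the
    per-partition k-nearest candidates into a global top-k."""
    dists, _ = zip(*(t.query(query, k=k) for t in trees))
    all_d = np.concatenate([np.atleast_1d(d) for d in dists])
    return np.sort(all_d)[:k]    # k smallest distances overall

rng = np.random.default_rng(1)
points = rng.random((100_000, 3))
trees = build_partitions(points, n_parts=8)
print(knn(trees, query=rng.random(3), k=5))
```

In a real distributed setting the merge would be a communication step, and the paper's pruning would avoid querying partitions whose bounding regions are provably farther than the current k-th candidate.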


international conference on cluster computing | 2013

Fast data analysis with integrated statistical metadata in scientific datasets

Jialin Liu; Yong Chen

Scientific data formats and libraries, such as HDF5, ADIOS, and NetCDF, have been used widely in many data-intensive applications. These libraries have their own file formats and I/O functions to provide efficient access to large datasets. Recent studies have started to utilize indexing, subsetting, and data reorganization to manage the increasingly large datasets. In this work, we present an approach to boost data analysis performance, namely Fast Analysis with Statistical Metadata (FASM), via data subsetting and the integration of a small amount of statistics into the original datasets. The added statistical information describes the data shape and provides knowledge of the data distribution, so the original I/O libraries can use these statistical metadata to perform fast queries and analyses. Various subsetting schemes can affect the access pattern and the I/O performance, so we present a comparison study of different subsetting schemes, focusing on three dominant factors: shape, concurrency, and locality. The added statistical metadata slightly increases the original data size, and we evaluate this cost and trade-off as well. This work is the first study to utilize statistical metadata with various subsetting schemes to perform fast queries and analyses on large datasets. The proposed FASM approach is currently evaluated with PnetCDF on Lustre file systems, but can also be implemented with other scientific libraries. FASM can potentially lead to a new dataset design and have an impact on big data analysis.


international conference on parallel processing | 2012

Fast Data Analysis with Integrated Statistical Metadata in Scientific Datasets

Jialin Liu; Yong Chen

Scientific data formats and libraries, such as HDF5, ADIOS, and NetCDF, have been used widely in many data-intensive applications. These libraries have their own file formats and I/O functions to provide efficient access to large datasets. Recent studies have started to utilize indexing, subsetting, and data reorganization to manage the increasingly large datasets. In this work, we present an approach to boost data analysis performance, namely Fast Analysis with Statistical Metadata (FASM), via data subsetting and the integration of a small amount of statistics into the original datasets. The added statistical information describes the data shape and provides knowledge of the data distribution, so the original I/O libraries can use these statistical metadata to perform fast queries and analyses. Various subsetting schemes can affect the access pattern and the I/O performance, so we present a comparison study of different subsetting schemes, focusing on three dominant factors: shape, concurrency, and locality. The added statistical metadata slightly increases the original data size, and we evaluate this cost and trade-off as well. This work is the first study to utilize statistical metadata with various subsetting schemes to perform fast queries and analyses on large datasets. The proposed FASM approach is currently evaluated with PnetCDF on Lustre file systems, but can also be implemented with other scientific libraries. FASM can potentially lead to a new dataset design and have an impact on big data analysis.


international conference on parallel processing | 2014

Using Working Set Reorganization to Manage Storage Systems with Hard and Solid State Disks

Junjie Chen; Jialin Liu; Philip C. Roth; Yong Chen

Scientific applications from many problem domains produce and/or access large volumes of data. To support these applications, designers of high-end computing (HEC) systems have greatly increased the capacity of storage systems in recent years. However, because hard disk drives (HDDs) are still the dominant storage device in HEC storage systems, and because HDD performance has not improved as quickly as capacity, it can be challenging to deploy a storage system that provides both extreme capacity and extreme performance at a reasonable cost. Solid State Drives (SSDs) are a promising high-bandwidth and low-latency alternative to HDDs for HEC storage systems, but they too have deficiencies: small capacity, limited write cycles, and high cost compared to HDDs. Because of their complementary characteristics, storage system designers are beginning to consider heterogeneous storage system designs that include both HDDs and SSDs. However, managing the workload so as to take advantage of the strengths of each type of storage device while controlling overhead is a major challenge. In this study, we propose a novel approach for managing a heterogeneous storage system called the Working Set-based Reorganization Scheme (WS-ROS). With WS-ROS, applications write to both HDDs and SSDs using all the available storage system bandwidth. Later, a background process reorganizes the data so as to place the data most likely to be read on SSDs, while relegating the data most likely to be written, and the data not likely to be accessed, to the slower but higher-capacity HDDs. For our evaluation workloads, the WS-ROS approach provided a 3X to 10X performance improvement compared to a heterogeneous storage system without a working set-based data reorganization scheme, suggesting the value of lazy reorganization of data based on data access working sets.
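
The placement policy can be sketched as a simple classifier over observed access counts. Everything below (the Extent record, the threshold, the field names) is an assumption made for illustration; the paper's working-set tracking is more involved than this.

```python
from dataclasses import dataclass

@dataclass
class Extent:
    name: str
    reads: int      # observed read count in the tracking window
    writes: int     # observed write count in the tracking window

def reorganize(extents, read_hot_threshold=10):
    """Return (to_ssd, to_hdd) placement lists based on the working set."""
    to_ssd, to_hdd = [], []
    for e in extents:
        # Read-dominated, frequently accessed data benefits from SSD reads;
        # write-heavy data stays on HDD to limit SSD write-cycle wear.
        if e.reads >= read_hot_threshold and e.reads > e.writes:
            to_ssd.append(e.name)
        else:
            to_hdd.append(e.name)
    return to_ssd, to_hdd

extents = [Extent("checkpoint.dat", reads=1, writes=40),
           Extent("mesh.h5", reads=120, writes=2),
           Extent("log.txt", reads=0, writes=5)]
print(reorganize(extents))   # mesh.h5 -> SSD; the rest stay on HDD
```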


international conference on parallel processing | 2018

Contention-Aware Resource Scheduling for Burst Buffer Systems

Weihao Liang; Yong Chen; Jialin Liu; Hong An

Many scientific applications in critical areas are becoming increasingly data-intensive. As data volumes continue to grow, data movement between storage and compute nodes has become a crucial performance bottleneck for many data-intensive applications. Burst buffers provide a promising solution for these applications by absorbing bursty I/O traffic and letting applications return to the computation phase quickly. However, resource allocation policies for burst buffers are understudied, and existing strategies may cause severe I/O contention when a large number of I/O-intensive jobs access the burst buffer system concurrently. In this study, based on a theoretical analysis of the I/O model, we present a contention-aware resource scheduling (CARS) strategy that manages burst buffer resources to coordinate concurrent jobs. Extensive experiments demonstrate that the proposed CARS design outperforms existing allocation strategies and improves both job performance and system utilization.
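
A minimal sketch of contention-aware allocation, assuming each burst buffer node tracks how many jobs currently use it; this illustrates the general policy rather than the paper's actual CARS implementation.

```python
import heapq

def allocate(bb_nodes, n_needed):
    """bb_nodes maps node id -> number of jobs currently using it.
    Pick the n_needed least-contended nodes (instead of allocating by
    capacity alone) and record the new job's presence on them."""
    chosen = heapq.nsmallest(n_needed, bb_nodes, key=bb_nodes.get)
    for node in chosen:
        bb_nodes[node] += 1      # the new job now shares these nodes
    return chosen

bb_nodes = {"bb0": 3, "bb1": 0, "bb2": 1, "bb3": 0}
print(allocate(bb_nodes, 2))     # -> ['bb1', 'bb3'], avoiding busy bb0
```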


IEEE Transactions on Cloud Computing | 2016

Segmented In-Advance Data Analytics for Fast Scientific Discovery

Jialin Liu; Yong Chen

Scientific discovery usually involves data generation, data preprocessing, data storage, and data analysis. As the data volume of a single simulation run exceeds a few terabytes (TB), data movement, which happens during each cycle of scientific discovery, continues to be the bottleneck in most scientific big data applications. Much research has been conducted on reducing data movement; among the existing efforts, and based on our previous research, reusing analysis results shows significant potential for optimizing the data movement between analysis operations. In this work, we propose the Segmented In-Advance (SIA) data analytics approach for optimizing data movement, along with a cloud-based, elastic, distributed in-memory database to manage the intermediate analysis results. The fundamental idea of the Segmented In-Advance approach is to analyze the history of operations and predict future analytics operations of interest. The predicted analysis operation is performed in advance on a finer-grained segmentation of the dataset, and the per-segment results are distributed in an in-memory key-value store for future reuse. The evaluation shows that the Segmented In-Advance approach achieves 1.2X to 6.1X speedups, and that the in-memory distributed data store scales well. The proposed Segmented In-Advance data analytics approach is a promising data movement reduction solution for scientific big data applications and fast scientific discovery.
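
The prefetch-and-reuse loop can be sketched with an ordinary dictionary standing in for the distributed in-memory store. The naive frequency-based predictor and all names below are assumptions made for this example.

```python
from collections import Counter
import numpy as np

kv_store = {}          # (op, segment_index) -> precomputed result
history = []           # operations observed so far

OPS = {"mean": np.mean, "max": np.max, "std": np.std}

def record_and_prefetch(op_name, data, seg_size):
    """Record an observed operation, predict the likeliest next one,
    and precompute its per-segment results into the key-value store."""
    history.append(op_name)
    predicted = Counter(history).most_common(1)[0][0]  # naive predictor
    for i, start in enumerate(range(0, len(data), seg_size)):
        key = (predicted, i)
        if key not in kv_store:   # compute in advance, before it is requested
            kv_store[key] = OPS[predicted](data[start:start + seg_size])

def query(op_name, data, seg_size):
    """Serve a segment-wise query, hitting the store when results exist."""
    out = []
    for i, start in enumerate(range(0, len(data), seg_size)):
        res = kv_store.get((op_name, i))
        out.append(res if res is not None
                   else OPS[op_name](data[start:start + seg_size]))
    return out

data = np.random.default_rng(2).random(1_000_000)
record_and_prefetch("mean", data, 100_000)   # predicts "mean", precomputes
print(query("mean", data, 100_000)[:3])      # served from kv_store
```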


international conference on parallel processing | 2015

Collective Computing for Scientific Big Data Analysis

Jialin Liu; Yong Chen; Surendra Byna

Big science discovery requires an efficient computing framework on high performance computing architectures. Traditional scientific data analysis relies on the Message Passing Interface (MPI) and MPI-IO to achieve fast computation and low I/O overhead. In particular, two-phase collective I/O is commonly used to reduce data movement by optimizing noncontiguous I/O patterns. However, an inherent constraint of collective I/O prevents it from being flexibly combined with computation, and current HPC systems lack an efficient non-blocking I/O-compute framework. In this work, we propose Collective Computing, a framework that breaks the constraint of two-phase collective I/O and provides an efficient non-blocking computing paradigm with runtime support. The fundamental idea is to move the analysis stage earlier and insert the computation into the two-phase I/O, so that the data read in the first I/O phase can be computed in place and the second (shuffle) phase is reduced to a reduce operation. We motivate this idea by profiling I/O and CPU usage. With both theoretical analysis and evaluation on a real application and benchmarks, we show that Collective Computing can achieve 2.5X speedups and is promising for big scientific data analysis.
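
The compute-in-place-then-reduce pattern can be sketched with mpi4py. In the example below, each rank's contiguous block stands in for the data delivered by the first phase of two-phase collective I/O, and a single allreduce replaces the shuffle phase; the global sum is an assumed stand-in for the analysis kernel.

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Partition a notional 1-D dataset of N elements across ranks.
N = 1_000_000
counts = [N // size + (1 if r < N % size else 0) for r in range(size)]
offset = sum(counts[:rank])

# Phase 1: each rank "reads" its contiguous block (here synthesized;
# in the real framework this is the aggregation phase of collective I/O).
local = np.arange(offset, offset + counts[rank], dtype=np.float64)

# Compute in place on the freshly read block ...
local_sum = local.sum()

# ... and replace the second (shuffle) phase with a single reduction.
total = comm.allreduce(local_sum, op=MPI.SUM)

if rank == 0:
    print(f"global sum = {total:.0f}")  # run with: mpiexec -n 4 python demo.py
```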

Collaboration


Dive into Jialin Liu's collaborations.

Top Co-Authors

Yong Chen, Texas Tech University
Surendra Byna, Lawrence Berkeley National Laboratory
Yin Lu, Texas Tech University
Yu Zhuang, Texas Tech University
Bin Dong, Lawrence Berkeley National Laboratory
Dongfang Zhao, University of California
Evan Racah, Lawrence Berkeley National Laboratory