Guangchen Ruan | Researchain

Publication

Featured research published by Guangchen Ruan.

scientific cloud computing | 2014

Cloud computing data capsules for non-consumptive use of texts

Jiaan Zeng; Guangchen Ruan; Alexander Crowell; Atul Prakash; Beth Plale

As digital data sources grow in number and size, they pose an opportunity for computational investigation by means of text mining, natural language processing (NLP), and other text analysis techniques. In this paper we propose a virtual machine (VM) framework and methodology for non-consumptive text analysis. Using a remote VM model, the VM is configured with software and tooling for text analysis. When the analysis is completed, the VM is wiped and its resources are released for other users to share. Our approach extends the VM by turning it into a data capsule that prevents leakage of copyrighted content in the event that the VM is compromised. The HathiTrust Research Center Data Capsules has seen early use against the HathiTrust repository of digitized books from university libraries nationwide.

international conference on big data | 2014

Parallel and quantitative sequential pattern mining for large-scale interval-based temporal data

Guangchen Ruan; Hui Zhang; Beth Plale

Mining frequent subsequences of patterns, or sequential pattern mining, has wide application in customer shopping sequence analysis, web log stream analysis, and multi-modal behavioral studies, to name a few. Detecting unknown, anomalous, and unexpected patterns in large-scale interval-based temporal data without complete a priori knowledge is challenging. In this paper, we present a framework, PESMiner, that allows parallel and quantitative mining of sequential patterns at scale. Whereas most existing sequential mining algorithms can only find sequential orders of temporal events, our work presents a novel interactive temporal data mining algorithm capable of extracting precise temporal properties of sequential patterns. Furthermore, our work provides a unified parallel solution that scales our algorithms to larger temporal data sets by exploiting iterative MapReduce tasks. Comprehensive performance evaluations demonstrate that PESMiner significantly outperforms existing interval-based mining algorithms in terms of both quality (i.e., accuracy, precision, and recall) and scalability.

extreme science and engineering discovery environment | 2013

XSEDE-enabled high-throughput lesion activity assessment

Hui Zhang; Michael Boyles; Guangchen Ruan; Huian Li; Hongwei Shen; Masatoshi Ando

Caries lesion activity assessment has been a routine diagnostic procedure in dental caries management, traditionally employing subjective measurements incorporating visual and tactile inspections. Recently, advances in 2D/3D image processing and analysis methods and microfocus x-ray computerized tomography (μ-CT) hardware, together with increased power of high performance computing, have created a synergic effect that is revolutionizing many fields in dental computing. In this paper, we report such an XSEDE-enabled high-throughput lesion activity assessment workflow that exploits 2D/3D image processing, visual analytics, and high performance computing technologies. Our paper starts with a brief introduction of the image dataset in our dental studies. We then proceed to a family of 2D image analysis, ROI segmentation, and 3D geometric construction methods. By combining dental imaging technology and 2D/3D image processing algorithms, we transform the task of lesion activity assessment into a 3D-time series analysis of computer generated lesion models. Building on the computational algorithms and implementation models, we develop a high-throughput dental computing workflow exploiting MapReduce tasks to parallelize the image analysis of dental CT scans, the segmentation of region-of-interest (ROI), and the 3D construction of lesion volumes. We showcase the employment of 3D-time series analysis and several other information representations that are applied to our lesion activity assessment scenario focusing on large scale dental image data.

Ecological Informatics | 2017

Mining lake time series using symbolic representation

Guangchen Ruan; Paul C. Hanson; Hilary A. Dugan; Beth Plale

Sensor networks deployed in lakes and reservoirs, when combined with simulation models and expert knowledge from the global community, are creating deeper understanding of the ecological dynamics of lakes. However, the amount of data and the complex patterns in the data demand substantial compute resources and efficient data mining algorithms, both of which are beyond the realm of traditional limnological research. This paper uniquely adapts methods from computer science for application to data intensive ecological questions, in order to provide ecologists with approachable methodology to facilitate knowledge discovery in lake ecology. We apply a state-of-the-art time series mining technique based on symbolic representation (SAX) to high-frequency time series of phycocyanin (PHYCO) and chlorophyll (CHLORO) fluorescence, both of which are indicators of algal biomass in lakes, as well as model predictions of algal biomass (MODEL). We use data mining techniques to demonstrate that MODEL predicts PHYCO better than it predicts CHLORO. All time series have high redundancy, resulting in a relatively small subset of unique patterns. However, MODEL is much less complex than either PHYCO or CHLORO and fails to reproduce high biomass periods indicative of algal blooms. We develop a set of tools in R to enable motif discovery and anomaly detection within a single lake time series, and relationship study among multiple lake time series through distance metrics, clustering and classification. Furthermore, to improve computation times, we provision web services to launch R tools remotely on high performance computing (HPC) resources. Comprehensive experimental results on observational and simulated lake data demonstrate the effectiveness of our approach.
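The symbolic representation (SAX) mentioned in this abstract can be illustrated with a minimal sketch. It is written in Python for brevity, whereas the paper describes tools in R, and it is not the authors' implementation. The function name `sax`, its parameters, and the default four-letter alphabet are assumptions made for illustration. The sketch follows the commonly described SAX steps: z-normalize the series, average it over equal-width segments (piecewise aggregate approximation), then map each segment mean to a letter using breakpoints that divide the standard normal distribution into equiprobable regions.

```python
import bisect
import statistics
from statistics import NormalDist


def sax(series, n_segments, alphabet="abcd"):
    """Convert a numeric series to a SAX word (illustrative sketch only)."""
    n = len(series)
    if n_segments < 1 or n_segments > n:
        raise ValueError("n_segments must be between 1 and len(series)")

    # Step 1: z-normalize so the series has zero mean and unit variance.
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    if std == 0:
        normalized = [0.0] * n  # a constant series carries no shape information
    else:
        normalized = [(x - mean) / std for x in series]

    # Step 2: piecewise aggregate approximation (mean of each segment).
    paa = []
    for i in range(n_segments):
        start = i * n // n_segments
        end = (i + 1) * n // n_segments
        paa.append(statistics.fmean(normalized[start:end]))

    # Step 3: breakpoints that cut the standard normal into equiprobable bins.
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(j / k) for j in range(1, k)]

    # Step 4: map each segment mean to the letter of the bin it falls into.
    return "".join(alphabet[bisect.bisect_left(breakpoints, v)] for v in paa)


if __name__ == "__main__":
    # A low plateau followed by a high plateau maps to the lowest and highest letters.
    print(sax([1, 1, 1, 1, 10, 10, 10, 10], n_segments=2))
```

Once series are reduced to short words like this, motif discovery and anomaly detection can be approached as string comparison, which is one reason symbolic representations suit long, redundant sensor records.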

Proceedings of the Practice and Experience on Advanced Research Computing | 2018

High Performance Photogrammetry for Academic Research

Guangchen Ruan; Eric A. Wernert; Tassie Gniady; Esen Tuna; William R. Sherman

Photogrammetry is the process of computationally extracting a three-dimensional surface model from a set of two-dimensional photographs of an object or environment. It is used to build models of everything from terrains to statues to ancient artifacts. In the past, the computational process was done on powerful PCs and could take weeks for large datasets. Even relatively small objects often required many hours of compute time to stitch together. With the availability of parallel processing options in the latest release of state-of-the-art photogrammetry software, it is possible to leverage the power of high performance computing systems on large datasets. In this paper we present a particular implementation of a high performance photogrammetry service. Though the service is currently based on a specific software package (Agisoft's PhotoScan), our system architecture is designed around a general photogrammetry process that can be easily adapted to leverage other photogrammetry tools. In addition, we report on an extensive performance study that measured the relative impacts of dataset size, software quality settings, and processing cluster size. Furthermore, we share lessons learned that are useful to system administrators looking to establish a similar service, and we describe the user-facing support components that are crucial for the success of the service.

international conference on cluster computing | 2016

Horme: Random Access Big Data Analytics

Guangchen Ruan; Beth Plale

MapReduce is a parallel framework that has been widely adopted for conducting large-scale data analytics. In cases where multiple millions of books must be analyzed using federally funded high performance computing (HPC) resources, the framework fails to port directly. We propose a solution that builds on MapReduce for use on an HPC system. It preserves the key-value semantics of MapReduce while supporting random access for queries that subset Big Data datasets, and at the same time hosts the service on the storage medium found in HPC architectures (parallel file systems) for reduced latencies. Experimental results demonstrate Horme's good performance in the HPC setting, running up to 41.4% faster than a NoSQL-based solution in a random access scenario.

international conference on big data | 2015

Scalable dental computing on cyberinfrastructure

Hui Zhang; Riqing Chen; Guangchen Ruan; Masatoshi Ando

Dentistry is a particularly complex and sophisticated applied science; many problems have to be solved by analyzing intensive longitudinal data. For example, dynamic carious lesion assessment requires dental researchers to perform knowledge discovery in a situation with multiple specimens across different experimental phases. The technological development and availability of cyberinfrastructure today can enable dental researchers to perform existing procedures far faster and more accurately than ever. This paper uses dynamic carious lesion activity assessment as a case study to illustrate how visual computing on advanced cyberinfrastructure can expand beyond statistical number crunching and information retrieval to make an imaginative and creative contribution to some aspects of dental science. Our work focuses on the generation of BIG pictures on cyberinfrastructure and the interactive presentation of derived dental structures, which combine to allow researchers to navigate from observation to qualitative discovery and then to quantitative assessment with multiple variables and degrees of freedom. Our work has seen early use by our collaborators in oral health research, where our system has been used to pose and answer domain-specific questions for quantitative assessment of dynamic carious lesion activities.

extreme science and engineering discovery environment | 2014

TextRWeb: Large-Scale Text Analytics with R on the Web

Guangchen Ruan; Hui Zhang; Eric A. Wernert; Beth Plale

As digital data sources grow in number and size, they pose an opportunity for computational investigation by means of text mining, NLP, and other text analysis techniques. R is a popular and powerful text analytics tool; however, it needs to run in parallel and requires special handling to protect copyrighted content against full access (consumption). The HathiTrust Research Center (HTRC) currently has 11 million volumes (books), of which 7 million are copyrighted. In this paper we propose HTRC TextRWeb, an interactive R software environment that employs complexity-hiding interfaces and automatic code generation to allow large-scale text analytics in a non-consumptive manner. For our principal test case of copyrighted data in the HathiTrust Digital Library, TextRWeb permits us to code, edit, and submit text analytics methods empowered by a family of interactive web user interfaces. All these methods combine to reveal a new interactive paradigm for large-scale text analytics on the web.

extreme science and engineering discovery environment | 2013

Exploiting MapReduce and data compression for data-intensive applications

Guangchen Ruan; Hui Zhang; Beth Plale

Archive | 2014

Big Data at Scale for Digital Humanities: An Architecture for the HathiTrust Research Center

Stacy T. Kowalczyk; Yiming Sun; Zong Peng; Beth Plale; Aaron Todd; Loretta Auvil; Craig Willis; Jiaan Zeng; Milinda Pathirage; Samitha Liyanage; Guangchen Ruan; J. Stephen Downie
