Stephen Ranshous | Researchain

Publication

Featured research published by Stephen Ranshous.

European Conference on Parallel Processing | 2014

Improving Read Performance with Online Access Pattern Analysis and Prefetching

Houjun Tang; Xiaocheng Zou; John Jenkins; David A. Boyuka; Stephen Ranshous; Dries Kimpe; Scott Klasky; Nagiza F. Samatova

Among the major challenges of transitioning to exascale in HPC is the ubiquitous I/O bottleneck. For analysis and visualization applications in particular, this bottleneck is exacerbated by the write-once-read-many property of most scientific datasets combined with typically complex access patterns. One promising way to alleviate this problem is to recognize the application’s access patterns and use them to prefetch data, thereby overlapping computation and I/O. However, current methods for analyzing access patterns are offline-only, lack support for complex access patterns (such as high-dimensional strided or composition-based unstructured patterns), or both. Therefore, we propose an online analyzer capable of detecting both simple and complex access patterns with high accuracy and low computational and memory overhead. By combining our pattern detection with prefetching, we consistently observe run-time reductions of up to 26% across 18 configurations of PIOBench and 4 configurations of a micro-benchmark with both structured and unstructured access patterns.
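
The basic loop of online pattern detection followed by prefetching can be illustrated with a deliberately simplified sketch. It assumes one-dimensional read offsets and detects only a constant stride; `StrideDetector`, `observe`, and `prefetch_candidates` are hypothetical names, and this is not the analyzer described above, which also handles high-dimensional strided and composition-based unstructured patterns.

```python
from collections import deque
from typing import Deque, List, Optional


class StrideDetector:
    """Toy online detector for 1-D constant-stride read patterns (hypothetical).

    It keeps only a fixed-size window of recent offsets, so memory overhead is
    O(window), and reports a stride once every gap in the window agrees.
    """

    def __init__(self, window: int = 4) -> None:
        self.offsets: Deque[int] = deque(maxlen=window)

    def observe(self, offset: int) -> None:
        # Record one read offset as it arrives; the oldest offset falls out.
        self.offsets.append(offset)

    def stride(self) -> Optional[int]:
        # Do not trust a pattern until the window is full.
        if len(self.offsets) < self.offsets.maxlen:
            return None
        seq = list(self.offsets)
        gaps = {b - a for a, b in zip(seq, seq[1:])}
        # A single distinct gap means a constant-stride pattern.
        return gaps.pop() if len(gaps) == 1 else None

    def prefetch_candidates(self, depth: int = 2) -> List[int]:
        # Predict the next `depth` offsets to fetch ahead of the application.
        s = self.stride()
        if s is None:
            return []
        last = self.offsets[-1]
        return [last + s * k for k in range(1, depth + 1)]
```

Feeding the offsets 0, 100, 200, 300 yields a stride of 100 and prefetch candidates 400 and 500; a window whose gaps disagree yields no prediction, so nothing is prefetched and no I/O is wasted on a wrong guess.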

SIAM International Conference on Data Mining | 2016

A Scalable Approach for Outlier Detection in Edge Streams Using Sketch-based Approximations

Stephen Ranshous; Steve Harenberg; Kshitij Sharma; Nagiza F. Samatova

Dynamic graphs are a powerful way to model an evolving set of objects and their ongoing interactions. A broad spectrum of systems, such as information, communication, and social networks, are naturally represented by dynamic graphs. Outlier (or anomaly) detection in dynamic graphs can provide unique insights into the relationships among objects and identify novel or emerging relationships. To date, outlier detection in dynamic graphs has been studied in the context of graph streams, focusing on the analysis and comparison of entire graph objects. However, the volume and velocity of data are necessitating a transition from outlier detection in graph streams to outlier detection in edge streams, where the stream consists of individual graph edges instead of entire graph objects. In this paper, we propose the first approach for outlier detection in edge streams. We first describe a high-level model for outlier detection based on global and local structural properties of a stream. We then propose a novel application of the Count-Min sketch for approximating these properties, and prove probabilistic error bounds on our edge outlier scoring functions. Our sketch-based implementation provides a scalable solution, with constant-time updates and constant space requirements. Experiments on synthetic and real-world datasets demonstrate our method’s scalability, its effectiveness for discovering outliers, and the effects of approximation.
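
To make the Count-Min idea concrete, here is a minimal sketch. The `CountMinSketch` class follows the standard construction (width ceil(e/eps), depth ceil(ln(1/delta)), point queries take the minimum over rows, so estimates never fall below the true count). The `EdgeStreamScorer` class and its score, which rates an edge as more outlying when its endpoints are active but have rarely interacted with each other, are hypothetical illustrations and not the paper's scoring functions.

```python
import hashlib
import math
from typing import Hashable, List


class CountMinSketch:
    """Standard Count-Min sketch: a point query overestimates the true count
    by at most eps * N with probability at least 1 - delta (N = total added)."""

    def __init__(self, eps: float = 0.01, delta: float = 0.01) -> None:
        self.width = math.ceil(math.e / eps)
        self.depth = math.ceil(math.log(1.0 / delta))
        self.table: List[List[int]] = [[0] * self.width for _ in range(self.depth)]

    def _index(self, row: int, key: Hashable) -> int:
        # Salting one hash with the row number simulates independent hash functions.
        digest = hashlib.sha256(f"{row}|{key!r}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, key: Hashable, count: int = 1) -> None:
        for r in range(self.depth):
            self.table[r][self._index(r, key)] += count

    def estimate(self, key: Hashable) -> int:
        # Collisions only inflate cells, so the row minimum is the tightest estimate.
        return min(self.table[r][self._index(r, key)] for r in range(self.depth))


class EdgeStreamScorer:
    """Hypothetical local-structure outlier score for an undirected edge stream."""

    def __init__(self, eps: float = 0.01, delta: float = 0.01) -> None:
        self.edge_counts = CountMinSketch(eps, delta)  # approximate edge multiplicities
        self.degrees = CountMinSketch(eps, delta)      # approximate vertex degrees

    def score_and_update(self, u: str, v: str) -> float:
        key = (u, v) if u <= v else (v, u)  # canonical order for undirected edges
        c_uv = self.edge_counts.estimate(key)
        d_min = min(self.degrees.estimate(u), self.degrees.estimate(v))
        # 1.0 for a never-seen endpoint; near 0.0 when u and v mostly interact with each other.
        score = 1.0 if d_min == 0 else max(0.0, 1.0 - c_uv / d_min)
        # Constant-time, constant-space updates, independent of stream length.
        self.edge_counts.add(key)
        self.degrees.add(u)
        self.degrees.add(v)
        return score
```

With the default eps = delta = 0.01, each sketch holds 5 rows of 272 counters regardless of how many edges arrive, which is the constant-space property the abstract relies on.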

European Conference on Parallel Processing | 2014

Fast Set Intersection through Run-Time Bitmap Construction over PForDelta-Compressed Indexes

Xiaocheng Zou; Sriram Lakshminarasimhan; David A. Boyuka; Stephen Ranshous; Houjun Tang; Scott Klasky; Nagiza F. Samatova

Set intersection is a fundamental operation for evaluating conjunctive queries in the context of scientific data analysis. The state-of-the-art approach to performing set intersection, compressed bitmap indexing, achieves high computational efficiency because of cheap bitwise operations; however, its overall efficiency is often nullified by the HPC I/O bottleneck, because compressed bitmap indexes typically have a heavy storage footprint. Conversely, the recently presented PForDelta-compressed index has been demonstrated to be storage-lightweight, but has limited performance for set intersection. Thus, a more effective set intersection approach should be efficient in both computation and I/O.
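
The general idea named in the title, decoding a delta-compressed posting list and building a bitmap from it at query time, can be sketched as follows. This is a simplification under stated assumptions: it uses plain delta (gap) coding without PForDelta's fixed-width bit packing and exception patching, represents the bitmap as an arbitrary-precision Python integer, and uses hypothetical function names; it is not the paper's algorithm.

```python
from itertools import accumulate
from typing import List


def delta_encode(sorted_ids: List[int]) -> List[int]:
    # Store the gap between consecutive sorted IDs; the first gap is measured from 0.
    return [b - a for a, b in zip([0] + sorted_ids[:-1], sorted_ids)]


def delta_decode(gaps: List[int]) -> List[int]:
    # A running sum of the gaps restores the original sorted IDs.
    return list(accumulate(gaps))


def build_bitmap(ids: List[int]) -> int:
    # Bit i of the integer is set when record ID i is present.
    bitmap = 0
    for i in ids:
        bitmap |= 1 << i
    return bitmap


def intersect(gaps_a: List[int], gaps_b: List[int]) -> List[int]:
    a, b = delta_decode(gaps_a), delta_decode(gaps_b)
    # Build the bitmap at run time from the shorter list, then probe it with the longer one.
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    bitmap = build_bitmap(small)
    return [i for i in large if (bitmap >> i) & 1]
```

In this sketch, intersecting the lists [3, 7, 10, 42] and [1, 3, 10, 11, 42, 100], each stored as gaps, returns [3, 10, 42]. Only the compact gap lists would need to be read from storage; the bitmap exists only in memory for the duration of the query.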

Advanced Data Mining and Applications | 2017

An Intelligent Weighted Fuzzy Time Series Model Based on a Sine-Cosine Adaptive Human Learning Optimization Algorithm and Its Application to Financial Markets Forecasting

Ruixin Yang; Mingyang Xu; Junyi He; Stephen Ranshous; Nagiza F. Samatova

Financial forecasting is an extremely challenging task given the complex, nonlinear nature of financial market systems. To overcome this challenge, we present an intelligent weighted fuzzy time series model for financial forecasting, which uses a sine-cosine adaptive human learning optimization (SCHLO) algorithm to search for the optimal forecasting parameters. New weighted operators that consider frequency-based chronological order and stock volume are analyzed, and SCHLO is integrated to determine the effective intervals and weighting factors. Furthermore, a novel short-term trend repair operation is developed to complement the final forecasting process. Finally, the proposed model is applied to four major world trading markets: the Dow Jones Index (DJI), the German Stock Index (DAX), the Japanese Stock Index (NIKKEI), and the Taiwan Stock Index (TAIEX). Experimental results show that our model is consistently more accurate than state-of-the-art baseline methods. Its ease of implementation and effective forecasting performance suggest that the proposed model has favorable prospects for market applications.

International Conference on Complex Networks and their Applications | 2017

Efficient Outlier Detection in Hyperedge Streams Using MinHash and Locality-Sensitive Hashing

Stephen Ranshous; Mandar S. Chaudhary; Nagiza F. Samatova

Mining outliers in graph data is a rapidly growing area of research. Traditional methods either focus on static graphs or restrict relationships to be pairwise. In this work we address both of these limitations directly, and propose the first approach for mining outliers in hyperedge streams. Hyperedges, which generalize edges, faithfully capture higher-order relationships that naturally occur in complex systems. Our model annotates every incoming hyperedge with an outlier score, which is based on the incident vertices and the historical relationships among them. Additionally, we describe an approximation scheme that makes our model suitable for streaming environments. Experimental results on several real-world datasets show that our model effectively identifies outliers, and that our approximation provides speedups of 33x to 775x.
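
MinHash and locality-sensitive hashing (LSH) can be combined to compare an incoming hyperedge against past ones without scanning the whole history. The sketch below uses the standard constructions: the fraction of agreeing MinHash components estimates Jaccard similarity, and LSH banding retrieves only earlier hyperedges that agree on at least one full band. The outlier score (1 minus the highest estimated similarity to any retrieved hyperedge), the class name `HyperedgeLSH`, and the band sizes are hypothetical choices for illustration, not the paper's model.

```python
import hashlib
from typing import Dict, Hashable, Iterable, List, Set, Tuple

NUM_HASHES = 64
BANDS, ROWS = 16, 4  # LSH banding: BANDS * ROWS must equal NUM_HASHES

Signature = Tuple[int, ...]


def _hash(seed: int, item: Hashable) -> int:
    # Salting one hash with `seed` simulates a family of independent hash functions.
    data = f"{seed}|{item!r}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash(items: Iterable[Hashable], k: int = NUM_HASHES) -> Signature:
    # Component s is the minimum hash value of the set under hash function s.
    items = list(items)
    if not items:
        raise ValueError("hyperedge must contain at least one vertex")
    return tuple(min(_hash(s, x) for x in items) for s in range(k))


def jaccard_estimate(sig_a: Signature, sig_b: Signature) -> float:
    # P[two min-hashes agree] equals the Jaccard similarity of the underlying sets.
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


class HyperedgeLSH:
    """Hypothetical streaming scorer for hyperedges (sets of vertices)."""

    def __init__(self) -> None:
        self.buckets: Dict[Tuple[int, Signature], List[int]] = {}
        self.signatures: List[Signature] = []

    def _band_keys(self, sig: Signature) -> List[Tuple[int, Signature]]:
        return [(b, sig[b * ROWS:(b + 1) * ROWS]) for b in range(BANDS)]

    def score_and_add(self, hyperedge: Iterable[Hashable]) -> float:
        sig = minhash(hyperedge)
        keys = self._band_keys(sig)
        # Candidates: earlier hyperedges that share at least one LSH bucket.
        candidates: Set[int] = set()
        for key in keys:
            candidates.update(self.buckets.get(key, ()))
        best = max((jaccard_estimate(sig, self.signatures[i]) for i in candidates),
                   default=0.0)
        idx = len(self.signatures)
        self.signatures.append(sig)
        for key in keys:
            self.buckets.setdefault(key, []).append(idx)
        return 1.0 - best  # unlike anything seen before -> score near 1.0
```

Only bucket-mates are compared exactly, which is where the speedup over comparing each new hyperedge with every earlier one comes from; the price is that a similar hyperedge can occasionally be missed if it shares no band.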

International Conference on Big Data | 2016

Exploring memory hierarchy and network topology for runtime AMR data sharing across scientific applications

Wenzhao Zhang; Houjun Tang; Stephen Ranshous; Surendra Byna; Daniel F. Martin; Kesheng Wu; Bin Dong; Scott Klasky; Nagiza F. Samatova

Runtime data sharing across applications is of great importance for avoiding high I/O overhead in scientific data analytics. Sharing data on a staging space running on a set of dedicated compute nodes is faster than writing data to a slow disk-based parallel file system (PFS) and then reading it back for post-processing. Originally, the staging space was purely based on main memory (DRAM), and thus was several orders of magnitude faster than the PFS approach. However, storing all the data produced by large-scale simulations in DRAM is impractical. Moving data from memory to SSD-based burst buffers is a potential approach to address this issue. However, SSDs are about one order of magnitude slower than DRAM. To optimize data access performance over the staging space, methods such as prefetching data from SSDs according to detected spatial access patterns and distributing data across the network topology have been explored. Although these methods work well for the uniform mesh data they were designed for, they are not well suited for adaptive mesh refinement (AMR) data. Two major issues must be addressed before constructing such a memory-hierarchy-aware and topology-aware runtime AMR data sharing framework: (1) spatial access pattern detection and prefetching for AMR data, and (2) AMR data distribution across the network topology at runtime. We propose a framework that addresses these challenges and demonstrate its effectiveness with extensive experiments on AMR data. Our results show that the framework’s spatial access pattern detection and prefetching methods yield about a 26% performance improvement for client analytical processes. Moreover, the framework’s topology-aware data placement can improve overall data access performance by up to 18%.

SIAM International Conference on Data Mining | 2014

Memory-efficient query-driven community detection with application to complex disease associations

Steve Harenberg; Ramona G. Seay; Stephen Ranshous; Kanchana Padmanabhan; Jitendra K. Harlalka; Eric R. Schendel; Michael P. O'Brien; Rada Chirkova; William Hendrix; Alok N. Choudhary; Vipin Kumar; Murali Doraiswamy; Nagiza F. Samatova

Community detection in real-world graphs presents a number of challenges. First, even if the number of detected communities grows linearly with the graph size, it becomes impossible to manually inspect each community for value added to the application knowledge base. Mining for communities with query nodes as knowledge priors could allow for filtering out irrelevant information and for enriching end users’ knowledge associated with the problem of interest, such as the discovery of genes functionally associated with Alzheimer’s disease (AD) biomarker genes. Second, the data-intensive nature of community enumeration challenges current approaches, which often assume that the input graph and the detected communities fit in memory. As computer systems scale, DRAM sizes are not expected to increase linearly, while technologies such as SSDs have the potential to provide much higher capacities at a lower power cost, with much lower latency than disks. Out-of-core algorithms and/or database-inspired indexing could provide an opportunity for different design optimizations of query-driven community detection algorithms tuned for emerging architectures. Therefore, this work addresses the need for query-driven and memory-efficient community detection. Using maximal cliques as the community definition, due to their high signal-to-noise ratio, we propose and systematically compare two contrasting methods: index-based and out-of-core. Both methods improve peak memory efficiency by as much as 1000X compared to the state of the art. However, the index-based method, which also achieves a 10-to-100-fold run-time reduction, outperforms the out-of-core algorithm in most cases. The achieved scalability enables the discovery of diseases that are known or likely to be associated with Alzheimer’s when the genome-scale network is mined with AD biomarker genes as knowledge priors.
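
Query-driven maximal clique enumeration can be illustrated with the classic Bron-Kerbosch algorithm with pivoting, seeded with a query node. This is an in-memory sketch that assumes an adjacency-set graph; it shows only the query-driven restriction, not the index-based or out-of-core memory optimizations proposed in the paper, and `maximal_cliques_with` is a hypothetical name.

```python
from typing import Dict, Hashable, List, Set

Graph = Dict[Hashable, Set[Hashable]]


def _bron_kerbosch(g: Graph, r: Set, p: Set, x: Set, out: List[Set]) -> None:
    # r: current clique; p: vertices that can still extend r; x: vertices already explored.
    if not p and not x:
        out.append(set(r))  # nothing can extend r, so it is a maximal clique
        return
    # Pivot on the vertex with the most neighbours in p to skip redundant branches.
    pivot = max(p | x, key=lambda u: len(g[u] & p))
    for v in list(p - g[pivot]):
        _bron_kerbosch(g, r | {v}, p & g[v], x & g[v], out)
        p = p - {v}
        x = x | {v}


def maximal_cliques_with(g: Graph, query: Hashable) -> List[Set]:
    """Return all maximal cliques that contain `query`.

    Any vertex extending a clique that contains `query` must be a neighbour of
    `query`, so the search starts from r = {query} and p = N(query).
    """
    out: List[Set] = []
    _bron_kerbosch(g, {query}, set(g[query]), set(), out)
    return out
```

On a graph with a triangle a-b-c plus the path c-d-e, querying "c" returns the cliques {a, b, c} and {c, d}, while querying "e" returns only {d, e}. Restricting the search to the query node's neighbourhood is what keeps the output focused on communities relevant to the query.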

Financial Cryptography | 2017

Exchange Pattern Mining in the Bitcoin Transaction Directed Hypergraph

Stephen Ranshous; Cliff A. Joslyn; Sean J. Kreyling; Kathleen Nowak; Nagiza F. Samatova; Curtis L. West; Samuel T. Winters

Bitcoin exchanges operate between digital and fiat currency networks, thus providing an opportunity to connect real-world identities to pseudonymous addresses, an important task for anti-money laundering efforts. We seek to characterize, understand, and identify patterns centered around exchanges in the context of a directed hypergraph model for Bitcoin transactions. We introduce the idea of motifs in directed hypergraphs, considering a particular 2-motif as a potential laundering pattern. We identify distinct statistical properties of exchange addresses related to the acquisition and spending of bitcoin. We then leverage these properties to build classification models that learn a set of discriminating features, and are able to predict whether an address is owned by an exchange with greater than 80% accuracy using purely structural features of the graph. Applying this classifier to the 2-motif patterns reveals a preponderance of inter-exchange activity, but not necessarily significant laundering patterns.

International Conference on Complex Networks and their Applications | 2017

A Community-Driven Graph Partitioning Method for Constraint-Based Causal Discovery

Mandar S. Chaudhary; Stephen Ranshous; Nagiza F. Samatova

Constraint-based (CB) methods are widely used for discovering causal relationships in observational data. The PC-stable algorithm is a prominent example of CB methods. A critical component of the PC-stable algorithm is finding d-separators and performing conditional independence (CI) tests to eliminate spurious causal relationships. While pairwise CI tests are necessary for identifying causal relationships, the error rate (the rate at which true causal relationships are erroneously removed) increases with the number of tests performed. Efficiently searching for the true d-separator set is thus critical to increasing the accuracy of the causal graph. To this end, we propose a novel recursive algorithm for constructing causal graphs, based on a two-phase divide-and-conquer strategy. In phase one, we recursively partition the undirected graph using community detection, and subsequently construct partial skeletons from each partition. Phase two uses a bottom-up approach to merge the subgraph skeletons, ultimately yielding the full causal graph. Simulations on several real-world data sets show that our approach effectively finds the d-separators, leading to a significant improvement in the quality of causal graphs.

Wiley Interdisciplinary Reviews: Computational Statistics | 2014