
Oliver Ruebel

Lawrence Berkeley National Laboratory



Publication

Featured research published by Oliver Ruebel.

Algorithmic Finance | 2013

A Big Data Approach to Analyzing Market Volatility

Kesheng Wu; E. Wes Bethel; Ming Gu; David Leinweber; Oliver Ruebel

Understanding the microstructure of the financial market requires processing a vast amount of data related to individual trades, and sometimes even multiple levels of quotes. This requires computing resources that are not easily available to financial academics and regulators. Fortunately, data-intensive scientific research has developed a series of tools and techniques for working with large amounts of data. In this work, we demonstrate that these techniques are effective for market data analysis by computing an early warning indicator called Volume-synchronized Probability of Informed trading (VPIN) on a massive set of futures trading records. The test data contains five and a half years' worth of trading data for about 100 of the most liquid futures contracts, includes about 3 billion trades, and occupies 140 GB as text files. By (1) using a more efficient file format for storing the trading records, (2) using more effective data structures and algorithms, and (3) parallelizing the computations, we are able to explore 16,000 different parameter combinations for computing VPIN in less than 20 hours on a 32-core IBM DataPlex machine. On average, computing VPIN of one futures contract over 5.5 years takes around 1.5 seconds on one core, which demonstrates that a modest computer is sufficient to monitor a vast number of trading activities in real time, an ability that could be valuable to regulators. By examining a large number of parameter combinations, we are also able to identify the parameter settings that improve the prediction accuracy from 80% to 93%.
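As a rough illustration of the indicator named in the abstract above, the following Python sketch computes VPIN from equal-volume buckets. It is a minimal sketch and not the authors' implementation: it assumes each trade already carries a buy/sell label (the abstract does not say how trades are classified), and the function names `volume_buckets` and `vpin` as well as the parameters `bucket_size` and `window` are hypothetical.

```python
from typing import Iterable, List, Tuple


def volume_buckets(trades: Iterable[Tuple[int, bool]],
                   bucket_size: int) -> List[Tuple[int, int]]:
    """Split (volume, is_buy) trades into buckets of equal total volume.

    Returns one (buy_volume, sell_volume) pair per completed bucket.
    A trade that crosses a bucket boundary is divided between buckets.
    A trailing, partially filled bucket is dropped.
    ASSUMPTION: trades are pre-labeled as buys or sells.
    """
    buckets = []
    buy = sell = filled = 0
    for volume, is_buy in trades:
        remaining = volume
        while remaining > 0:
            take = min(remaining, bucket_size - filled)
            if is_buy:
                buy += take
            else:
                sell += take
            filled += take
            remaining -= take
            if filled == bucket_size:
                buckets.append((buy, sell))
                buy = sell = filled = 0
    return buckets


def vpin(buckets: List[Tuple[int, int]],
         bucket_size: int, window: int) -> List[float]:
    """Rolling VPIN: total order imbalance over the last `window`
    buckets, divided by the total volume in those buckets."""
    values = []
    for end in range(window, len(buckets) + 1):
        recent = buckets[end - window:end]
        imbalance = sum(abs(b - s) for b, s in recent)
        values.append(imbalance / (window * bucket_size))
    return values


# Hypothetical toy data: (volume, is_buy)
example_trades = [(60, True), (40, False), (100, True), (50, False), (50, True)]
example_buckets = volume_buckets(example_trades, bucket_size=100)
example_vpin = vpin(example_buckets, bucket_size=100, window=2)
```

Bucket size and window length are examples of the kind of free parameters a parameter sweep would vary; the specific parameters explored in the paper are not listed in the abstract.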

Neuron | 2016

High-Performance Computing in Neuroscience for Data-Driven Discovery, Integration, and Dissemination

Kristofer E. Bouchard; James B. Aimone; Miyoung Chun; Thomas Dean; Michael Denker; Markus Diesmann; David Donofrio; Loren M. Frank; Narayanan Kasthuri; Christof Koch; Oliver Ruebel; Horst D. Simon; Friedrich T. Sommer; Prabhat

Opportunities offered by new neuro-technologies are threatened by lack of coherent plans to analyze, manage, and understand the data. High-performance computing will allow exploratory analysis of massive datasets stored in standardized formats, hosted in open repositories, and integrated with simulations.

Metabolites | 2015

Analysis of Metabolomics Datasets with High-Performance Computing and Metabolite Atlases.

Yushu Yao; Terence Sun; Tony Wang; Oliver Ruebel; Trent R. Northen; Benjamin P. Bowen

Even with the widespread use of liquid chromatography mass spectrometry (LC/MS) based metabolomics, there are still a number of challenges facing this promising technique. Many diverse experimental workflows exist, yet there is a lack of infrastructure and systems for tracking and sharing information. Here, we describe the Metabolite Atlas framework and interface, which provides highly efficient, web-based access to raw mass spectrometry data in concert with assertions about chemicals detected, to help address some of these challenges. This integration, by design, enables experimentalists to explore their raw data and to specify and refine feature annotations such that they can be leveraged for future experiments. Fast queries of the data through the web using SciDB, a parallelized database for high-performance computing, make this process operate quickly. By using scripting containers, such as IPython or Jupyter, to analyze the data, scientists can utilize a wide variety of freely available graphing, statistics, and information management resources. In addition, the interfaces facilitate integration with systems biology tools to ultimately link metabolomics data with biological models.

International Parallel and Distributed Processing Symposium | 2016

A Multi-Platform Evaluation of the Randomized CX Low-Rank Matrix Factorization in Spark

Alex Gittens; Jey Kottalam; Jiyan Yang; Michael F. Ringenburg; Jatin Chhugani; Evan Racah; Mohitdeep Singh; Yushu Yao; Curt R. Fischer; Oliver Ruebel; Benjamin P. Bowen; Norman G. Lewis; Michael W. Mahoney; Venkat Krishnamurthy; Prabhat

We investigate the performance and scalability of the randomized CX low-rank matrix factorization and demonstrate its applicability through the analysis of a 1TB mass spectrometry imaging (MSI) dataset, using Apache Spark on an Amazon EC2 cluster, a Cray XC40 system, and an experimental Cray cluster. We implemented this factorization both as a parallelized C implementation with hand-tuned optimizations and in Scala using the Apache Spark high-level cluster computing framework. We obtained consistent performance across the three platforms: using Spark we were able to process the 1TB size dataset in under 30 minutes with 960 cores on all systems, with the fastest times obtained on the experimental Cray cluster. In comparison, the C implementation processed the 1TB size dataset 21X faster on the Amazon EC2 system, due to careful cache optimizations, bandwidth-friendly access of matrices and vector computation using SIMD units. We report these results and their implications on the hardware and software issues arising in supporting data-centric workloads in parallel and distributed environments.

bioRxiv | 2015

BRAINformat: A Data Standardization Framework for Neuroscience Data

Oliver Ruebel; Prabhat; Peter Denes; David F. Conant; Edward F. Chang; Kristofer E. Bouchard

Neuroscience is entering the era of ‘extreme data’ with little experience and few plans for the associated volume, velocity, variety, and veracity challenges. This is a serious impediment both to sharing data across labs and to using modern high-performance computing capabilities to enable data-driven discovery. Here, we introduce BRAINformat, a novel file format and model for management and storage of neuroscience data. The BRAINformat library defines application-independent design concepts and modules that together create a general framework for standardization of scientific data. We describe the formal specification of scientific data standards, which facilitates sharing and verification of data and formats. We introduce the concept of Managed Objects, enabling semantic components of data formats to be specified as self-contained units, supporting modular and reusable design of data format components and file storage. The BRAINformat is built on HDF5, enabling portable, scalable, and self-describing data storage. We introduce the novel concept of Relationship Attributes for modeling and use of semantic relationships between data objects, and discuss the annotation of data using dedicated data annotation modules provided by the BRAINformat library. Based on these concepts, we implement dedicated, application-oriented modules and design a data standard for neuroscience data. The BRAINformat software library is open source and easy to use, provides detailed user and developer documentation, and is freely available at: https://bitbucket.org/oruebel/brainformat.

Visualization of Large and Unstructured Data Sets | 2008

PointCloudXplore 2: Visual exploration of 3D gene expression

LBNL Genomics Division; Oliver Rübel; Gunther H. Weber; Min-Yu Huang; E. Wes Bethel; Soile V.E. Keranen; Charless C. Fowlkes; Cris L. Luengo Hendriks; Angela H. DePace; Lisa Simirenko; Michael B. Eisen; Mark D. Biggin; Hans Hagen; Jitendra Malik; David W. Knowles; Bernd Hamann

To better understand how developmental regulatory networks are defined in the genome sequence, the Berkeley Drosophila Transcription Network Project (BDTNP) has developed a suite of methods to describe 3D gene expression data, i.e., the output of the network at cellular resolution for multiple time points. To allow researchers to explore these novel data sets, we have developed PointCloudXplore (PCX). In PCX we have linked physical and information visualization views via the concept of brushing (cell selection). For each view, dedicated operations for selecting cells are available. In PCX, all cell selections are stored in a central management system. Cells selected in one view can thus be highlighted in any other view, allowing further properties of the cell subset to be determined. Complex cell queries can be defined by combining different cell selections using logical operations such as AND, OR, and NOT. Here we provide an overview of PointCloudXplore 2 (PCX2), the latest publicly available version of PCX. PCX2 has been shown to be an effective tool for visual exploration of 3D gene expression data. We discuss (i) all views available in PCX2, (ii) different strategies to perform cell selection, and (iii) the basic architecture of PCX2, and (iv) illustrate the usefulness of PCX2 using selected examples.
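The idea of combining stored cell selections with AND, OR, and NOT can be sketched with Python sets. This is only an illustration of the concept and not PCX2 code: the `select` helper, the gene names `gene_a` and `gene_b`, the expression values, and the 0.5 threshold are all hypothetical.

```python
from typing import Callable, Dict, Set

# Hypothetical data: cell id -> expression level per gene.
cells: Dict[int, Dict[str, float]] = {
    0: {"gene_a": 0.9, "gene_b": 0.1},
    1: {"gene_a": 0.8, "gene_b": 0.7},
    2: {"gene_a": 0.2, "gene_b": 0.9},
    3: {"gene_a": 0.1, "gene_b": 0.2},
}


def select(predicate: Callable[[Dict[str, float]], bool]) -> Set[int]:
    """Return the ids of all cells whose expression record satisfies
    the predicate (a stand-in for brushing cells in one view)."""
    return {cid for cid, expr in cells.items() if predicate(expr)}


all_cells = set(cells)

# Two independent selections, as if brushed in two different views.
high_a = select(lambda e: e["gene_a"] > 0.5)
high_b = select(lambda e: e["gene_b"] > 0.5)

# Logical combinations of stored selections.
both = high_a & high_b          # AND
either = high_a | high_b        # OR
not_a = all_cells - high_a      # NOT
a_only = high_a & (all_cells - high_b)  # gene_a AND NOT gene_b
```

Because each selection is just a set of cell ids, a combined selection can be stored and reused in further queries in the same way as a selection made directly in a view.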

Archive | 2016

Rendering and Compositing Infrastructure Improvements to VisIt for Insitu Rendering

Burlen Loring; Oliver Ruebel

Compared to posthoc rendering, insitu rendering often generates larger numbers of images; as a result, rendering performance and scalability are critical in the insitu setting. In this work we present improvements to VisIt's rendering and compositing infrastructure that deliver increased performance and scalability in both posthoc and insitu settings. We added the capability for alpha blend compositing and use it with ordered compositing when datasets have disjoint block domain decomposition to optimize the rendering of transparent geometry. We also made improvements that increase overall efficiency by reducing communication and data movement, and we addressed a number of performance issues. We structured our code to take advantage of SIMD parallelization and use threads to overlap communication and compositing. We tested our improvements on a 20-core workstation using 8 cores to render geometry generated from a

bioRxiv | 2017

MAGI: A Bayesian-like method for metabolite, annotation, and gene integration

Onur Erbilgin; Oliver Ruebel; Katherine Louie; Matthew Trinh; Markus de Raad; Tony Wildish; Daniel W Udwary; Cindi A. Hoover; Samuel Deutsch; Trent R. Northen; Benjamin P. Bowen

256^3

Archive | 2012