Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Andrew E. Bruno is active.

Publication


Featured research published by Andrew E. Bruno.


Comparative and Functional Genomics | 2011

The Influence of 3′UTRs on MicroRNA Function Inferred from Human SNP Data

Zihua Hu; Andrew E. Bruno

MicroRNAs (miRNAs) regulate gene expression posttranscriptionally. Although previous efforts have demonstrated the functional importance of miRNA target sites, little is known about the influence of the rest of the 3′ untranslated regions (3′UTRs) of target genes on miRNA function. We conducted a genome-wide study and found that the entire 3′UTR sequence, not just the miRNA target sites, can play an important role in miRNA function. This was evidenced by the fact that human single nucleotide polymorphisms (SNPs) in both the seed target region and the rest of the 3′UTRs of miRNA target genes were under significantly stronger negative selection when compared to non-miRNA target genes. We also discovered that the flanking nucleotides on both sides of miRNA target sites were subject to moderately strong selection, defining a local sequence region of ~67 nucleotides with a symmetric structure. Additionally, from gene expression analysis, we found that SNPs and miRNA target sites on target sequences may interactively affect gene expression.
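The central comparison here, contrasting SNP density in the 3′UTRs of miRNA target genes against non-target genes as a proxy for selection strength, can be illustrated with a minimal sketch. The gene lists below are invented placeholders, and the Mann-Whitney test merely stands in for whatever selection statistics the authors actually computed.

```python
# Minimal sketch: compare SNP density (SNPs per kb of 3'UTR) between
# miRNA target genes and non-target genes. Lower density in targets is
# taken here as a crude proxy for stronger negative selection.
# The input lists are hypothetical placeholders, not the study's data.
from statistics import median
from scipy.stats import mannwhitneyu

# (snp_count, utr_length_bp) per gene -- invented example values
target_genes = [(3, 1200), (1, 800), (2, 1500), (0, 600)]
nontarget_genes = [(7, 1100), (5, 900), (9, 1600), (4, 700)]

def snp_density_per_kb(genes):
    return [1000.0 * snps / length for snps, length in genes]

target_density = snp_density_per_kb(target_genes)
nontarget_density = snp_density_per_kb(nontarget_genes)

# One-sided test: are target 3'UTRs depleted of SNPs relative to non-targets?
stat, pval = mannwhitneyu(target_density, nontarget_density, alternative="less")
print(f"median density (targets)     = {median(target_density):.2f} SNPs/kb")
print(f"median density (non-targets) = {median(nontarget_density):.2f} SNPs/kb")
print(f"Mann-Whitney U = {stat:.1f}, one-sided p = {pval:.3f}")
```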


Concurrency and Computation: Practice and Experience | 2013

Performance metrics and auditing framework using application kernels for high-performance computer systems

Thomas R. Furlani; Matthew D. Jones; Steven M. Gallo; Andrew E. Bruno; Charng-Da Lu; Amin Ghadersohi; Ryan J. Gentner; Abani K. Patra; Robert L. DeLeon; Gregor von Laszewski; Fugang Wang; Ann Zimmerman

This paper describes XSEDE Metrics on Demand, a comprehensive auditing framework for use by high-performance computing centers, which provides metrics regarding resource utilization, resource performance, and impact on scholarship and research. This role-based framework is designed to meet the following objectives: (1) provide the user community with a tool to manage their allocations and optimize their resource utilization; (2) provide operational staff with the ability to monitor and tune resource performance; (3) provide management with a tool to monitor utilization, user base, and performance of resources; and (4) provide metrics to help measure scientific impact. Although initially focused on the XSEDE program, XSEDE Metrics on Demand can be adapted to any high-performance computing environment. The framework includes a computationally lightweight application kernel auditing system that utilizes performance kernels to measure overall system performance. This allows continuous resource auditing to measure all aspects of system performance, including filesystem performance, processor and memory performance, and network latency and bandwidth. Metrics that focus on scientific impact, such as publications, citations, and external funding, will be included to help quantify the important role high-performance computing centers play in advancing research and scholarship.
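The application-kernel idea, periodically running a small, fixed benchmark and logging its runtime so that performance regressions show up as drift, can be sketched in a few lines. The matrix-multiply kernel and the JSON-lines log format below are assumptions made for illustration; they are not XDMoD's actual application kernels or storage schema.

```python
# Minimal sketch of an application-kernel audit run: execute a small,
# fixed benchmark, time it, and append the result to a JSON-lines log.
# The matmul kernel and log format are illustrative assumptions only.
import json
import socket
import time

import numpy as np

def matmul_kernel(n=512, seed=0):
    """Fixed-size dense matrix multiply used as a repeatable workload."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n))
    b = rng.standard_normal((n, n))
    start = time.perf_counter()
    np.dot(a, b)
    return time.perf_counter() - start

def audit_run(logfile="appkernel_log.jsonl"):
    elapsed = matmul_kernel()
    record = {
        "kernel": "matmul_512",
        "host": socket.gethostname(),
        "timestamp": time.time(),
        "elapsed_seconds": round(elapsed, 4),
    }
    with open(logfile, "a") as fh:
        fh.write(json.dumps(record) + "\n")
    return record

if __name__ == "__main__":
    print(audit_run())
```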


BMC Bioinformatics | 2013

FUSIM: a software tool for simulating fusion transcripts

Andrew E. Bruno; Jeffrey C. Miecznikowski; Maochun Qin; Jianmin Wang; Song Liu

Background: Gene fusions are the result of chromosomal aberrations and encode chimeric RNAs (fusion transcripts) that play an important role in cancer genesis. Recent advances in high-throughput transcriptome sequencing have given rise to computational methods for new fusion discovery. The ability to simulate fusion transcripts is essential for testing and improving those tools.
Results: To address this need, we developed FUSIM (FUsion SIMulator), a software tool for simulating fusion transcripts. The simulation of events known to create fusion genes and their resulting chimeric proteins is supported, including inter-chromosome translocation, trans-splicing, complex chromosomal rearrangements, and transcriptional read-through events.
Conclusions: FUSIM provides the ability to assemble a dataset of fusion transcripts useful for testing and benchmarking applications in fusion gene discovery.
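The basic operation a fusion-transcript simulator performs, joining the 5′ portion of one transcript to the 3′ portion of another at chosen breakpoints, can be sketched as follows. This is not FUSIM's code or API; the toy sequences and the random breakpoint choice are invented for illustration.

```python
# Minimal sketch of simulating a fusion transcript: take the 5' part of
# gene A up to a breakpoint and append the 3' part of gene B from its
# breakpoint onward. An illustration, not FUSIM's implementation.
import random

def simulate_fusion(seq_a, seq_b, break_a=None, break_b=None, seed=None):
    """Return a chimeric transcript built from two parental transcripts."""
    rng = random.Random(seed)
    # Pick breakpoints at random if not supplied (avoid the very ends).
    if break_a is None:
        break_a = rng.randint(1, len(seq_a) - 1)
    if break_b is None:
        break_b = rng.randint(1, len(seq_b) - 1)
    fusion = seq_a[:break_a] + seq_b[break_b:]
    return fusion, break_a, break_b

# Invented toy transcripts (a real simulation would draw from a transcriptome).
gene_a = "ATGGCCATTGTAATGGGCCGC"
gene_b = "ATGAAACGCATTAGCACCACC"
fusion, ba, bb = simulate_fusion(gene_a, gene_b, seed=42)
print(f"breakpoints: A@{ba}, B@{bb}")
print(f"fusion transcript: {fusion}")
```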


IEEE International Conference on High Performance Computing, Data, and Analytics | 2009

Comparing the performance of clusters, Hadoop, and Active Disks on microarray correlation computations

Jeffrey A. Delmerico; Nathanial A. Byrnes; Andrew E. Bruno; Matthew D. Jones; Steven M. Gallo; Vipin Chaudhary

Microarray-based comparative genomic hybridization (aCGH) offers an increasingly fine-grained method for detecting copy number variations in DNA. These copy number variations can directly influence the expression of the proteins encoded by the genes in question. A useful analysis of the data produced from these microarray experiments is pairwise correlation. However, the high resolution of today's microarray technology requires that supercomputing computation and storage resources be leveraged in order to perform this analysis. This application is an exemplar of the class of data-intensive problems which require high-throughput I/O in order to be tractable. Although the performance of these types of applications on a cluster can be improved by parallelization, storage hardware and network limitations restrict the scalability of an I/O-bound application such as this. The Hadoop software framework is designed to enable data-intensive applications on cluster architectures, and offers significantly better scalability due to its distributed file system. However, specialized architectures adhering to the Active Disk paradigm, in which compute power is placed close to the disk instead of across a network, can further improve performance. Netezza Corporation's database systems are designed around the Active Disk approach, and offer tremendous gains in implementing this application over the traditional cluster architecture. We present methods and performance analyses of several implementations of this application: on a cluster, on a cluster with a parallel file system, with Hadoop on a cluster, and using a Netezza data warehouse appliance. Our results offer benchmarks for the performance of data-intensive applications within these distributed computing paradigms.
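The underlying computation, an all-pairs correlation over probe intensity vectors, is easy to state even though it becomes I/O-bound at microarray scale. The sketch below uses NumPy on a small in-memory matrix with invented dimensions, just to show the O(n²) object that the cluster, Hadoop, and Active Disk implementations each have to distribute.

```python
# Minimal sketch of the pairwise (probe-by-probe) Pearson correlation that
# the paper distributes across clusters, Hadoop, and Active Disks.
# Here everything fits in memory; the array sizes are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n_probes, n_samples = 1000, 50          # real aCGH data is far larger
data = rng.standard_normal((n_probes, n_samples))

# np.corrcoef treats each row as a variable, giving an n_probes x n_probes
# correlation matrix -- the quadratic object that makes this data intensive.
corr = np.corrcoef(data)

# Report the strongest off-diagonal correlation as a simple summary.
np.fill_diagonal(corr, 0.0)
i, j = np.unravel_index(np.abs(corr).argmax(), corr.shape)
print(f"correlation matrix shape: {corr.shape}")
print(f"strongest pair: probes {i} and {j}, r = {corr[i, j]:.3f}")
```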


PLOS ONE | 2014

Comparing Chemistry to Outcome: The Development of a Chemical Distance Metric, Coupled with Clustering and Hierarchal Visualization Applied to Macromolecular Crystallography

Andrew E. Bruno; Amanda Ruby; Joseph R. Luft; Thomas D. Grant; Jayaraman Seetharaman; Gaetano T. Montelione; John F. Hunt; Edward H. Snell

Many bioscience fields employ high-throughput methods to screen multiple biochemical conditions, and the analysis of the results becomes tedious without a degree of automation. Crystallization, a rate-limiting step in biological X-ray crystallography, is one of these fields. Screening of multiple potential crystallization conditions (cocktails) is the most effective method of probing a protein's phase diagram and guiding crystallization, but the interpretation of results can be time-consuming. To aid this empirical approach, a cocktail distance coefficient was developed to quantitatively compare macromolecule crystallization conditions and outcomes. These coefficients were evaluated against an existing similarity metric developed for crystallization, the C6 metric, using both virtual crystallization screens and a comparison of two related 1,536-cocktail high-throughput crystallization screens. Hierarchical clustering was employed to visualize one of these screens, and the crystallization results from an exopolyphosphatase-related protein from Bacteroides fragilis (BfR192) were overlaid on this clustering. This demonstrated a strong correlation between certain chemically related clusters and crystal lead conditions. While this analysis was not used to guide the initial crystallization optimization, it led to the re-evaluation of unexplained peaks in the electron density map of the protein and to the insertion and correct placement of sodium, potassium, and phosphate atoms in the structure. With these in place, the resulting structure of the putative active site demonstrated features consistent with the active sites of other phosphatases involved in binding the phosphoryl moieties of nucleotide triphosphates. The new distance coefficient, CDcoeff, appears to be robust in this application, and coupled with hierarchical clustering and the overlay of crystallization outcomes, reveals information of biological relevance. While tested with a single example, the potential applications related to crystallography appear promising, and the distance coefficient, clustering, and hierarchical visualization of results undoubtedly have applications in wider fields.
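The workflow described, computing a pairwise distance between crystallization cocktails and visualizing it with hierarchical clustering, can be sketched with SciPy. The cocktail feature vectors and the Euclidean distance used here are placeholders; the paper's CDcoeff is a purpose-built chemical distance that is not reproduced below.

```python
# Minimal sketch: hierarchical clustering of crystallization cocktails from a
# pairwise distance matrix. The feature vectors and Euclidean distance are
# stand-ins; the paper defines its own chemical distance coefficient (CDcoeff).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# Hypothetical cocktail descriptors (e.g. normalized concentrations of a few
# chemical components per cocktail).
cocktails = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.1, 0.7, 0.2],
    [0.0, 0.8, 0.2],
    [0.1, 0.1, 0.8],
])

# Condensed pairwise distance matrix, then average-linkage clustering.
dists = pdist(cocktails, metric="euclidean")
tree = linkage(dists, method="average")

# Cut the tree into three chemically related clusters.
labels = fcluster(tree, t=3, criterion="maxclust")
for idx, lab in enumerate(labels):
    print(f"cocktail {idx}: cluster {lab}")
```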


Nucleic Acids Research | 2017

Trypanosome RNA Editing Mediator Complex proteins have distinct functions in gRNA utilization

Rachel M. Simpson; Andrew E. Bruno; Runpu Chen; Kaylen Lott; Brianna L. Tylec; Jonathan Bard; Yijun Sun; Michael J. Buck; Laurie K. Read

Uridine insertion/deletion RNA editing is an essential process in kinetoplastid parasites whereby mitochondrial mRNAs are modified through the specific insertion and deletion of uridines to generate functional open reading frames, many of which encode components of the mitochondrial respiratory chain. The roles of numerous non-enzymatic editing factors have remained opaque given the limitations of conventional methods to interrogate the order and mechanism by which editing progresses, and thus the roles of individual proteins. Here, we examined whole populations of partially edited sequences using high-throughput sequencing and a novel bioinformatic platform, the Trypanosome RNA Editing Alignment Tool (TREAT), to elucidate the roles of three proteins in the RNA Editing Mediator Complex (REMC). We determined that the factors examined function in the progression of editing through a gRNA; however, they have distinct roles, and REMC is likely heterogeneous in composition. We provide the first evidence that editing can proceed through numerous paths within a single gRNA and that non-linear modifications are essential, generating commonly observed junction regions. Our data support a model in which RNA editing is executed via multiple paths that necessitate successive re-modification of junction regions facilitated, in part, by the REMC variant containing TbRGG2 and MRB8180.


PLOS ONE | 2014

Statistical Analysis of Crystallization Database Links Protein Physico-Chemical Features with Crystallization Mechanisms

Diana Fusco; Timothy James Barnum; Andrew E. Bruno; Joseph R. Luft; Edward H. Snell; Sayan Mukherjee; Patrick Charbonneau

X-ray crystallography is the predominant method for obtaining atomic-scale information about biological macromolecules. Despite the success of the technique, obtaining well-diffracting crystals still critically limits going from protein to structure. In practice, the crystallization process proceeds through knowledge-informed empiricism. Better physico-chemical understanding remains elusive because of the large number of variables involved; hence, little guidance is available to systematically identify solution conditions that promote crystallization. To help determine relationships between macromolecular properties and their crystallization propensity, we have trained statistical models on samples for 182 proteins supplied by the Northeast Structural Genomics consortium. Gaussian processes, which capture trends beyond the reach of linear statistical models, distinguish between two main physico-chemical mechanisms driving crystallization. One is characterized by low levels of side-chain entropy and has been extensively reported in the literature. The other identifies specific electrostatic interactions not previously described in the crystallization context. Because evidence for two distinct mechanisms can be gleaned both from crystal contacts and from solution conditions leading to successful crystallization, the model offers future avenues for optimizing crystallization screens based on partial structural information. The availability of crystallization data coupled with structural outcomes analyzed through state-of-the-art statistical models may thus guide macromolecular crystallization toward a more rational basis.
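As a rough illustration of the statistical approach, a Gaussian process classifier can be trained on per-protein physico-chemical features to predict crystallization outcome. The features, labels, and kernel below are placeholders; this is not the paper's model, feature set, or NESG data.

```python
# Minimal sketch: a Gaussian process classifier relating protein
# physico-chemical features to crystallization outcome. The feature matrix,
# labels, and kernel choice are illustrative, not the paper's model or data.
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(1)

# Hypothetical features per protein: [side-chain entropy proxy, net charge, pI]
X = rng.standard_normal((60, 3))
# Hypothetical outcome: 1 = crystallized, 0 = did not (toy labeling rule).
y = ((X[:, 0] < 0.0) & (X[:, 1] > -0.5)).astype(int)

model = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0), random_state=0)
model.fit(X[:40], y[:40])

# Held-out accuracy and a probability for one unseen protein.
print(f"held-out accuracy: {model.score(X[40:], y[40:]):.2f}")
print(f"P(crystallize) for first held-out protein: {model.predict_proba(X[40:41])[0, 1]:.2f}")
```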


Concurrency and Computation: Practice and Experience | 2014

Comprehensive, open-source resource usage measurement and analysis for HPC systems

James C. Browne; Robert L. DeLeon; Abani K. Patra; William L. Barth; John Hammond; Jones; Tom Furlani; Barry I. Schneider; Steven M. Gallo; Amin Ghadersohi; Ryan J. Gentner; Jeffrey T. Palmer; Nikolay Simakov; Martins Innus; Andrew E. Bruno; Joseph P. White; Cynthia D. Cornelius; Thomas Yearke; Kyle Marcus; G. Von Laszewski; Fugang Wang

The important role high-performance computing (HPC) resources play in science and engineering research, coupled with their high cost (capital, power, and manpower), short life, and oversubscription, requires us to optimize their usage, an outcome that is only possible if adequate analytical data are collected and used to drive systems management at different granularities: job, application, user, and system. This paper presents a method for comprehensive job, application, and system-level resource use measurement and analysis, and its implementation. The steps in the method are: system-wide collection of comprehensive resource use and performance statistics at the job and node levels in a uniform format across all resources; mapping and storage of the resultant job-wise data to a relational database; and further transformation of the data to the formats required by specific statistical and analytical algorithms. Analyses can be carried out at different levels of granularity: job, user, application, or system-wide. Measurements are based on a new lightweight job-centric measurement tool, TACC_Stats, which gathers a comprehensive set of resource use metrics on all compute nodes, and on data logged by the system scheduler. The data mapping and analysis tools are an extension of the XDMoD project. The method is illustrated with analyses of resource use for the Texas Advanced Computing Center's Lonestar4, Ranger, and Stampede supercomputers and the HPC cluster at the Center for Computational Research. The illustrations are focused on resource use at the system, job, and application levels and reveal many interesting insights into system usage patterns as well as anomalous behavior due to failure or misuse. The method can be applied to any system that runs the TACC_Stats measurement tool and a tool to extract job execution environment data from the system scheduler.
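The pipeline described, collecting per-job resource metrics in a uniform format, loading them into a relational database, and aggregating by user or application, can be sketched with SQLite. The table schema and sample rows are assumptions for illustration; they do not reflect TACC_Stats' actual fields or XDMoD's schema.

```python
# Minimal sketch: load per-job resource-use records into a relational table
# and roll them up at the application level. The schema and sample rows are
# illustrative; they are not TACC_Stats' fields or XDMoD's schema.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE jobs (
        job_id      INTEGER PRIMARY KEY,
        user        TEXT,
        application TEXT,
        cpu_hours   REAL,
        avg_mem_gb  REAL
    )
""")
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, ?, ?, ?)",
    [
        (1, "alice", "NAMD",   128.0, 3.2),
        (2, "alice", "NAMD",    64.0, 2.9),
        (3, "bob",   "WRF",    512.0, 7.5),
        (4, "carol", "Python",   8.0, 1.1),
    ],
)

# Application-level rollup: total CPU hours and mean memory per application.
for row in conn.execute("""
        SELECT application, SUM(cpu_hours), AVG(avg_mem_gb)
        FROM jobs GROUP BY application ORDER BY SUM(cpu_hours) DESC"""):
    print(row)
conn.close()
```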


Advances in Bioinformatics | 2013

Comparing Imputation Procedures for Affymetrix Gene Expression Datasets Using MAQC Datasets

Sreevidya Sadananda Sadasiva Rao; Lori Shepherd; Andrew E. Bruno; Song Liu; Jeffrey C. Miecznikowski

Introduction. The microarray datasets from the MicroArray Quality Control (MAQC) project have enabled the assessment of the precision and comparability of microarrays and of various microarray analysis methods. However, to date no studies that we are aware of have reported the performance of missing-value imputation schemes on the MAQC datasets. In this study, we use the MAQC Affymetrix datasets to evaluate several imputation procedures for Affymetrix microarrays. Results. We evaluated several cutting-edge imputation procedures and compared them using different error measures. We randomly deleted 5% and 10% of the data and imputed the missing values with each procedure. We performed 1000 simulations and averaged the results. The results for both 5% and 10% deletion are similar. Among the imputation methods, we observe that the local least squares method with k = 4 is the most accurate under the error measures considered. The k-nearest neighbor method with k = 1 has the highest error rate among the imputation methods and error measures. Conclusions. We conclude that for imputing missing values in Affymetrix microarray datasets, using the MAS 5.0 preprocessing scheme, the local least squares method with k = 4 has the best overall performance and the k-nearest neighbor method with k = 1 has the worst overall performance. These results hold true for both 5% and 10% missing values.
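The evaluation protocol, randomly deleting a fraction of values from a complete matrix, imputing them, and scoring the error against the known originals, can be sketched with scikit-learn's KNN imputer. The expression matrix here is synthetic, and KNNImputer merely stands in for the local least squares and other methods the study actually compared.

```python
# Minimal sketch of the evaluation loop: randomly delete 5% of entries from a
# complete expression matrix, impute them, and score normalized RMSE against
# the known values. KNNImputer stands in for the methods compared in the
# study (local least squares, k-nearest neighbor, etc.); the data are synthetic.
import numpy as np
from sklearn.impute import KNNImputer

rng = np.random.default_rng(0)
complete = rng.standard_normal((200, 20))   # genes x arrays, toy scale

# Randomly delete 5% of entries.
mask = rng.random(complete.shape) < 0.05
with_missing = complete.copy()
with_missing[mask] = np.nan

# Impute with k nearest neighbors (k=4 here, echoing the k used for local
# least squares in the paper, though the method itself differs).
imputed = KNNImputer(n_neighbors=4).fit_transform(with_missing)

# Normalized RMSE over the deleted entries only.
err = imputed[mask] - complete[mask]
nrmse = np.sqrt(np.mean(err**2)) / np.std(complete[mask])
print(f"deleted entries: {mask.sum()}, NRMSE = {nrmse:.3f}")
```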


International Journal of Bioinformatics Research and Applications | 2010

iGenomicViewer: R package for visualisation of high dimension genomic data

Daniel P. Gaile; Lori Shepherd; Andrew E. Bruno; Song Liu; Carl Morrison; Lara Sucheston; Jeffrey C. Miecznikowski

While the technologies for generating high-dimensional data have been advancing, adequate visualisation tools to accommodate the results, and the ability to integrate multiple sources of data, have been lacking. The move towards multi-disciplinary work and collaborative research underscores the need for visualisation and analysis tools that are platform independent and customisable. iGenomicViewer, through the use of customisable tool-tips that may include links and images, allows for a greater level of data integration for genomic data in a variety of formats. iGenomicViewer is a freely available R package that allows users to generate interactive, platform-independent plots of genomic data.

Collaboration


Dive into Andrew E. Bruno's collaboration network.

Top Co-Authors

Edward H. Snell (Hauptman-Woodward Medical Research Institute)
Joseph R. Luft (Hauptman-Woodward Medical Research Institute)
Ryan J. Gentner (State University of New York System)
Song Liu (Roswell Park Cancer Institute)