Armaghan W. Naik | Researchain

Publication

Featured research published by Armaghan W. Naik.

Bioinformatics | 2013

Determining the subcellular location of new proteins from microscope images using local features

Luis Pedro Coelho; Joshua D. Kangas; Armaghan W. Naik; Elvira Osuna-Highley; Estelle Glory-Afshar; Margaret H. Fuhrman; Ramanuja Simha; Peter B. Berget; Jonathan W. Jarvik; Robert F. Murphy

MOTIVATION: Evaluation of previous systems for automated determination of subcellular location from microscope images has been done using datasets in which each location class consisted of multiple images of the same representative protein. Here, we frame a more challenging and useful problem where previously unseen proteins are to be classified. RESULTS: Using CD-tagging, we generated two new image datasets for evaluation of this problem, which contain several different proteins for each location class. Evaluation of previous methods on these new datasets showed that it is much harder to train a classifier that generalizes across different proteins than one that simply recognizes a protein it was trained on. We therefore developed and evaluated additional approaches, incorporating novel modifications of local features techniques. These extended the notion of local features to exploit both the protein image and any reference markers that were imaged in parallel. With these, we obtained a large accuracy improvement in our new datasets over existing methods. Additionally, these features help achieve classification improvements for other previously studied datasets. AVAILABILITY: The datasets are available for download at http://murphylab.web.cmu.edu/data/. The software was written in Python and C++ and is available under an open-source license at http://murphylab.web.cmu.edu/software/. The code is split into a library, which can be easily reused for other data, and a small driver script for reproducing all results presented here. A step-by-step tutorial on applying the methods to new datasets is also available at that address. CONTACT: [email protected] SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

PLOS ONE | 2013

Efficient Modeling and Active Learning Discovery of Biological Responses

Armaghan W. Naik; Joshua D. Kangas; Christopher James Langmead; Robert F. Murphy

High throughput and high content screening involve determination of the effect of many compounds on a given target. As currently practiced, screening for each new target typically makes little use of information from screens of prior targets. Further, choices of compounds to advance to drug development are made without significant screening against off-target effects. The overall drug development process could be made more effective, as well as less expensive and time consuming, if potential effects of all compounds on all possible targets could be considered, yet the cost of such full experimentation would be prohibitive. In this paper, we describe a potential solution: probabilistic models that can be used to predict results for unmeasured combinations, and active learning algorithms for efficiently selecting which experiments to perform in order to build those models and determining when to stop. Using simulated and experimental data, we show that our approaches can produce powerful predictive models without exhaustive experimentation and can learn them much faster than by selecting experiments at random.
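The loop described in this abstract (predict unmeasured combinations, then choose which experiments to run next) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the authors' method: the predictor is a toy row/column average rather than the probabilistic models of the paper, the selection rule is plain uncertainty sampling, and the names `predict`, `active_learning` and `accuracy_on_unmeasured` are hypothetical.

```python
def predict(measured, r, c):
    """Toy predictor (assumption): average of the measured row mean and
    column mean, falling back to 0.5 when nothing has been measured yet."""
    row = [v for (i, _), v in measured.items() if i == r]
    col = [v for (_, j), v in measured.items() if j == c]
    row_mean = sum(row) / len(row) if row else 0.5
    col_mean = sum(col) / len(col) if col else 0.5
    return (row_mean + col_mean) / 2


def active_learning(truth, budget, batch_size=1):
    """Pool-based loop: repeatedly 'run' the experiments whose predictions
    are closest to 0.5 (most uncertain). batch_size must be at least 1."""
    n_rows, n_cols = len(truth), len(truth[0])
    budget = min(budget, n_rows * n_cols)
    measured = {}
    while len(measured) < budget:
        candidates = [(r, c) for r in range(n_rows) for c in range(n_cols)
                      if (r, c) not in measured]
        # Stable sort: ties are broken in row-major order.
        candidates.sort(key=lambda rc: abs(predict(measured, rc[0], rc[1]) - 0.5))
        take = min(batch_size, budget - len(measured))
        for r, c in candidates[:take]:
            measured[(r, c)] = truth[r][c]  # look up the hidden outcome
    return measured


def accuracy_on_unmeasured(truth, measured):
    """Fraction of unmeasured cells whose thresholded prediction is correct."""
    cells = [(r, c) for r in range(len(truth)) for c in range(len(truth[0]))
             if (r, c) not in measured]
    if not cells:
        return 1.0
    correct = sum((predict(measured, r, c) >= 0.5) == bool(truth[r][c])
                  for r, c in cells)
    return correct / len(cells)
```

Here `truth` stands in for the compound-by-target outcomes that real experiments would reveal one cell at a time. The sketch shows only the shape of the loop and makes no claim about matching the performance reported in the paper.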

BMC Bioinformatics | 2014

Efficient discovery of responses of proteins to compounds using active learning

Joshua D. Kangas; Armaghan W. Naik; Robert F. Murphy

Background: Drug discovery and development has been aided by high throughput screening methods that detect compound effects on a single target. However, when using focused initial screening, undesirable secondary effects are often detected late in the development process after significant investment has been made. An alternative approach would be to screen against undesired effects early in the process, but the number of possible secondary targets makes this prohibitively expensive. Results: This paper describes methods for making this global approach practical by constructing predictive models for many target responses to many compounds and using them to guide experimentation. We demonstrate for the first time that by jointly modeling targets and compounds using descriptive features and using active machine learning methods, accurate models can be built by doing only a small fraction of possible experiments. The methods were evaluated by computational experiments using a dataset of 177 assays and 20,000 compounds constructed from the PubChem database. Conclusions: An average of nearly 60% of all hits in the dataset were found after exploring only 3% of the experimental space, which suggests that active learning can be used to enable more complete characterization of compound effects than otherwise affordable. The methods described are also likely to find widespread application outside drug discovery, such as for characterizing the effects of a large number of compounds or inhibitory RNAs on a large number of cell or tissue phenotypes.

eLife | 2016

Active machine learning-driven experimentation to determine compound effects on protein patterns

Armaghan W. Naik; Joshua D. Kangas; Devin P. Sullivan; Robert F. Murphy

High throughput screening determines the effects of many conditions on a given biological target. Currently, to estimate the effects of those conditions on other targets requires either strong modeling assumptions (e.g. similarities among targets) or separate screens. Ideally, data-driven experimentation could be used to learn accurate models for many conditions and targets without doing all possible experiments. We have previously described an active machine learning algorithm that can iteratively choose small sets of experiments to learn models of multiple effects. We now show that, with no prior knowledge and with liquid handling robotics and automated microscopy under its control, this learner accurately learned the effects of 48 chemical compounds on the subcellular localization of 48 proteins while performing only 29% of all possible experiments. The results represent the first practical demonstration of the utility of active learning-driven biological experimentation in which the set of possible phenotypes is unknown in advance. DOI: http://dx.doi.org/10.7554/eLife.10047.001
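To put the reported fraction in absolute terms, the arithmetic can be sketched as follows. The abstract gives only the 48 x 48 design and the 29% figure. The rounded counts below are derived from those numbers and are approximate, and the function name `experiment_counts` is hypothetical.

```python
def experiment_counts(n_conditions, n_targets, fraction_performed):
    """Return (total, performed, skipped) experiment counts for a full
    condition-by-target grid, rounding the performed count to an integer."""
    total = n_conditions * n_targets
    performed = round(total * fraction_performed)
    return total, performed, total - performed


total, performed, skipped = experiment_counts(48, 48, 0.29)
print(total, performed, skipped)  # 2304 668 1636
```

So, assuming one experiment per compound-protein pair, the learner ran roughly 668 of 2,304 possible experiments.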

BMC Bioinformatics | 2015

Deciding when to stop: efficient experimentation to learn to predict drug-target interactions

Maja Temerinac-Ott; Armaghan W. Naik; Robert F. Murphy

Background: Active learning is a powerful tool for guiding an experimentation process. Instead of doing all possible experiments in a given domain, active learning can be used to pick the experiments that will add the most knowledge to the current model. In drug discovery and development in particular, active learning has been shown to reduce the number of experiments needed to obtain high-confidence predictions. However, in practice, it is crucial to have a method to evaluate the quality of the current predictions and decide when to stop the experimentation process. Only by applying reliable stopping criteria to active learning can time and costs in the experimental process actually be saved. Results: We compute active learning traces on simulated drug-target matrices in order to determine a regression model for the accuracy of the active learner. By analyzing the performance of the regression model on simulated data, we design stopping criteria for previously unseen experimental matrices. We demonstrate on four previously characterized drug effect data sets that applying the stopping criteria can result in up to 40% savings of the total experiments for highly accurate predictions. Conclusions: We show that active learning accuracy can be predicted using simulated data and results in substantial savings in the number of experiments required to make accurate drug-target predictions.
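The general idea of stopping on a predicted accuracy can be illustrated with a minimal sketch. This is not the regression model or the criteria from the paper. It assumes that some estimator already supplies one accuracy estimate per round, and the `target` and `patience` parameters, like the function names, are hypothetical.

```python
def should_stop(estimated_accuracies, target=0.9, patience=3):
    """Stop once the last `patience` accuracy estimates all reach `target`.
    Requiring several consecutive rounds guards against a single noisy estimate."""
    if len(estimated_accuracies) < patience:
        return False
    return all(a >= target for a in estimated_accuracies[-patience:])


def rounds_until_stop(estimates, target=0.9, patience=3):
    """Return the 1-based round at which the rule first fires, or None."""
    for n in range(1, len(estimates) + 1):
        if should_stop(estimates[:n], target, patience):
            return n
    return None


# Made-up estimates for illustration only.
estimates = [0.50, 0.70, 0.91, 0.89, 0.92, 0.93, 0.95]
print(rounds_until_stop(estimates))  # 7
```

If a full campaign would have taken 10 rounds, stopping at round 7 would save 30% of the rounds. The actual savings depend on how well the accuracy estimates track the true accuracy.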

Cytometry Part A | 2016

Point process models for localization and interdependence of punctate cellular structures

Ying Li; Timothy D. Majarian; Armaghan W. Naik; Gregory R. Johnson; Robert F. Murphy

Accurate representations of cellular organization for multiple eukaryotic cell types are required for creating predictive models of dynamic cellular function. To this end, we have previously developed the CellOrganizer platform, an open source system for generative modeling of cellular components from microscopy images. CellOrganizer models capture the inherent heterogeneity in the spatial distribution, size, and quantity of different components among a cell population. Furthermore, CellOrganizer can generate quantitatively realistic synthetic images that reflect the underlying cell population. A current focus of the project is to model the complex, interdependent nature of organelle localization. We built upon previous work on developing multiple non‐parametric models of organelles or structures that show punctate patterns. The previous models described the relationships between the subcellular localization of puncta and the positions of cell and nuclear membranes and microtubules. We extend these models to consider the relationship to the endoplasmic reticulum (ER), and to consider the relationship between the positions of different puncta of the same type. Our results do not suggest that the punctate patterns we examined are dependent on ER position or inter‐ and intra‐class proximity. With these results, we built classifiers to update previous assignments of proteins to one of 11 patterns in three distinct cell lines. Our generative models demonstrate the ability to construct statistically accurate representations of puncta localization from simple cellular markers in distinct cell types, capturing the complex phenomena of cellular structure interaction with little human input. This protocol represents a novel approach to vesicular protein annotation, a field that is often neglected in high‐throughput microscopy. These results suggest that spatial point process models provide useful insight with respect to the spatial dependence between cellular structures.

Research in Computational Molecular Biology | 2015

Deciding When to Stop: Efficient Experimentation to Learn to Predict Drug-Target Interactions (Extended Abstract)

Maja Temerinac-Ott; Armaghan W. Naik; Robert F. Murphy

An active learning method for identifying drug-target interactions is presented which considers the interaction between multiple drugs and multiple targets at the same time. The goal of the proposed method is not simply to predict such interactions from experiments that have already been conducted, but to iteratively choose as few new experiments as possible to improve the accuracy of the predictive model. Kernelized Bayesian matrix factorization (KBMF) is used to model the interactions. We demonstrate on four previously characterized drug effect data sets that active learning driven experimentation using KBMF can result in highly accurate models while performing as few as 14% of the possible experiments, and more accurately than random sampling of an equivalent number. We also provide a method for estimating the accuracy of the current model based on the learning curve; and show how it can be used in practice to decide when to stop an active learning process.

Genome Research | 2018