Publication


Featured research published by Phuong Dao.


Bioinformatics | 2010

PSORTb 3.0

Nancy Y. Yu; James R. Wagner; Matthew R. Laird; Gabor Melli; Sébastien Rey; Raymond Lo; Phuong Dao; S. Cenk Sahinalp; Martin Ester; Leonard J. Foster; Fiona S. L. Brinkman

Motivation: PSORTb has remained the most precise bacterial protein subcellular localization (SCL) predictor since it was first made available in 2003. However, the recall needs to be improved and no accurate SCL predictors yet make predictions for archaea, nor differentiate important localization subcategories, such as proteins targeted to a host cell or bacterial hyperstructures/organelles. Such improvements should preferably be encompassed in a freely available web-based predictor that can also be used as a standalone program. Results: We developed PSORTb version 3.0 with improved recall, higher proteome-scale prediction coverage, and new refined localization subcategories. It is the first SCL predictor specifically geared for all prokaryotes, including archaea and bacteria with atypical membrane/cell wall topologies. It features an improved standalone program, with a new batch results delivery system complementing its web interface. We evaluated the most accurate SCL predictors using 5-fold cross validation plus we performed an independent proteomics analysis, showing that PSORTb 3.0 is the most accurate but can benefit from being complemented by Proteome Analyst predictions. Availability: http://www.psort.org/psortb (download open source software or use the web interface). Contact: [email protected] Supplementary Information: Supplementary data are available at Bioinformatics online.


Journal of Chemical Information and Modeling | 2008

Combinatorial QSAR Modeling of Chemical Toxicants Tested against Tetrahymena pyriformis

Hao Zhu; Alexander Tropsha; Denis Fourches; Alexandre Varnek; Ester Papa; Paola Gramatica; Tomas Öberg; Phuong Dao; Artem Cherkasov; Igor V. Tetko

Selecting the most rigorous quantitative structure-activity relationship (QSAR) approaches is of great importance in the development of robust and predictive models of chemical toxicity. To address this issue in a systematic way, we have formed an international virtual collaboratory consisting of six independent groups with shared interests in computational chemical toxicology. We have compiled an aqueous toxicity data set containing 983 unique compounds tested in the same laboratory over a decade against Tetrahymena pyriformis. A modeling set including 644 compounds was selected randomly from the original set and distributed to all groups that used their own QSAR tools for model development. The remaining 339 compounds in the original set (external set I) as well as 110 additional compounds (external set II) published recently by the same laboratory (after this computational study was already in progress) were used as two independent validation sets to assess the external predictive power of individual models. In total, our virtual collaboratory has developed 15 different types of QSAR models of aquatic toxicity for the training set. The internal prediction accuracy for the modeling set ranged from 0.76 to 0.93 as measured by the leave-one-out cross-validation correlation coefficient ($Q^2_{\mathrm{abs}}$). The prediction accuracy for the external validation sets I and II ranged from 0.71 to 0.85 (linear regression coefficient $R^2_{\mathrm{absI}}$) and from 0.38 to 0.83 (linear regression coefficient $R^2_{\mathrm{absII}}$), respectively. The use of an applicability domain threshold implemented in most models generally improved the external prediction accuracy but at the same time led to a decrease in chemical space coverage. Finally, several consensus models were developed by averaging the predicted aquatic toxicity for every compound using all 15 models, with or without taking into account their respective applicability domains. We find that consensus models afford higher prediction accuracy for the external validation data sets with the highest space coverage as compared to individual constituent models. Our studies prove the power of a collaborative and consensual approach to QSAR model development. The best validated models of aquatic toxicity developed by our collaboratory (both individual and consensus) can be used as reliable computational predictors of aquatic toxicity and are available from any of the participating laboratories.
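The consensus step described in this abstract (averaging per-compound predictions across models, with or without applicability domains) can be illustrated with a minimal sketch. It is not the collaboratory's code: the function names `consensus_predictions` and `coverage`, the list-of-lists layout, and the boolean in-domain flags are all assumptions made for illustration.

```python
from statistics import mean


def consensus_predictions(predictions, in_domain=None):
    """Average per-compound predictions across models (hypothetical helper).

    predictions: one list per model, each holding one value per compound.
    in_domain:   optional flags of the same shape; when given, a model only
                 contributes to a compound it marks as inside its
                 applicability domain.
    Returns one consensus value per compound, or None when no model covers it.
    """
    n_models = len(predictions)
    n_compounds = len(predictions[0])
    consensus = []
    for j in range(n_compounds):
        values = [
            predictions[i][j]
            for i in range(n_models)
            if in_domain is None or in_domain[i][j]
        ]
        consensus.append(mean(values) if values else None)
    return consensus


def coverage(consensus):
    """Fraction of compounds that received a consensus prediction."""
    return sum(value is not None for value in consensus) / len(consensus)


# Two toy models, three compounds (made-up numbers, not data from the study).
preds = [[1.0, 2.0, 3.0], [3.0, 2.0, 5.0]]
flags = [[True, False, False], [True, True, False]]
without_ad = consensus_predictions(preds)
with_ad = consensus_predictions(preds, flags)
```

In this toy setup the domain-aware consensus leaves the third compound unpredicted, which mirrors the trade-off the abstract reports between accuracy and chemical space coverage.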


Bioinformatics | 2010

Next-generation VariationHunter

Fereydoun Hormozdiari; Iman Hajirasouliha; Phuong Dao; Faraz Hach; Deniz Yorukoglu; Can Alkan; Evan E. Eichler; S. Cenk Sahinalp

Recent years have witnessed an increase in research activity for the detection of structural variants (SVs) and their association to human disease. The advent of next-generation sequencing technologies makes it possible to extend the scope of structural variation studies to a point previously unimaginable as exemplified by the 1000 Genomes Project. Although various computational methods have been described for the detection of SVs, no such algorithm is yet fully capable of discovering transposon insertions, a very important class of SVs to the study of human evolution and disease. In this article, we provide a complete and novel formulation to discover both loci and classes of transposons inserted into genomes sequenced with high-throughput sequencing technologies. In addition, we also present ‘conflict resolution’ improvements to our earlier combinatorial SV detection algorithm (VariationHunter) by taking the diploid nature of the human genome into consideration. We test our algorithms with simulated data from the Venter genome (HuRef) and are able to discover >85% of transposon insertion events with precision of >90%. We also demonstrate that our conflict resolution algorithm (denoted as VariationHunter-CR) outperforms current state of the art (such as original VariationHunter, BreakDancer and MoDIL) algorithms when tested on the genome of the Yoruba African individual (NA18507). Availability: The implementation of the algorithm is available at http://compbio.cs.sfu.ca/strvar.htm. Contact: [email protected]; [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.


Intelligent Systems in Molecular Biology | 2008

Biomolecular network motif counting and discovery by color coding

Noga Alon; Phuong Dao; Iman Hajirasouliha; Fereydoun Hormozdiari; S. Cenk Sahinalp

Protein–protein interaction (PPI) networks of many organisms share global topological features such as degree distribution, k-hop reachability, betweenness and closeness. Yet, some of these networks can differ significantly from the others in terms of local structures: e.g. the number of specific network motifs can vary significantly among PPI networks. Counting the number of network motifs provides a major challenge to compare biomolecular networks. Recently developed algorithms have been able to count the number of induced occurrences of subgraphs with k ≤ 7 vertices. Yet no practical algorithm exists for counting non-induced occurrences, or counting subgraphs with k ≥ 8 vertices. Counting non-induced occurrences of network motifs is not only challenging but also quite desirable as available PPI networks include several false interactions and miss many others. In this article, we show how to apply the ‘color coding’ technique for counting non-induced occurrences of subgraph topologies in the form of trees and bounded treewidth subgraphs. Our algorithm can count all occurrences of motif G′ with k vertices in a network G with n vertices in time polynomial in n, provided k = O(log n). We use our algorithm to obtain ‘treelet’ distributions for k ≤ 10 of available PPI networks of unicellular organisms (Saccharomyces cerevisiae, Escherichia coli and Helicobacter pylori), which are all quite similar, and a multicellular organism (Caenorhabditis elegans) which is significantly different. Furthermore, the treelet distribution of the unicellular organisms is similar to that obtained by the ‘duplication model’ but is quite different from that of the ‘preferential attachment model’. The treelet distribution is robust w.r.t. sparsification with bait/edge coverage of 70% but differences can be observed when bait/edge coverage drops to 50%. Contact: [email protected]
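As a rough illustration of the color coding idea mentioned in this abstract, the sketch below handles only the simplest tree motif, a path on k vertices; the paper itself covers general trees and bounded treewidth subgraphs, which this sketch does not attempt. Vertices are colored at random with k colors, "colorful" paths (all colors distinct) are counted by dynamic programming over (end vertex, color set) pairs, and the count is rescaled by the probability k!/k^k that a given path is colorful. The function names and the adjacency-dictionary representation are assumptions, not the authors' implementation.

```python
import random
from math import factorial


def count_colorful_paths(adj, colors, k):
    """Count non-induced paths on k vertices whose colors are all distinct.

    adj:    dict mapping each vertex to the set of its neighbours (undirected).
    colors: dict mapping each vertex to a color in range(k).
    Distinct colors force distinct vertices, so every counted walk is a path.
    """
    # table[(v, mask)] = number of colorful paths ending at v that use
    # exactly the colors in the bitmask `mask`.
    table = {(v, 1 << colors[v]): 1 for v in adj}
    for _ in range(k - 1):
        extended = {}
        for (v, mask), count in table.items():
            for u in adj[v]:
                bit = 1 << colors[u]
                if mask & bit:
                    continue  # color already used on this path
                key = (u, mask | bit)
                extended[key] = extended.get(key, 0) + count
        table = extended
    total = sum(table.values())
    # Each undirected path with k > 1 vertices is found once from each end.
    return total // 2 if k > 1 else total


def estimate_path_count(adj, k, trials=200, seed=0):
    """Unbiased estimate of the number of k-vertex paths via random colorings."""
    rng = random.Random(seed)
    p_colorful = factorial(k) / k**k
    total = 0
    for _ in range(trials):
        colors = {v: rng.randrange(k) for v in adj}
        total += count_colorful_paths(adj, colors, k)
    return total / trials / p_colorful


# Toy graphs: a path 0-1-2-3 and a triangle.
path_graph = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
```

The dynamic program tracks at most n·2^k states, which is why the running time stays polynomial in n when k = O(log n), as the abstract states.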


Intelligent Systems in Molecular Biology | 2011

Optimally discriminative subnetwork markers predict response to chemotherapy

Phuong Dao; Kendric Wang; Colin Collins; Martin Ester; Anna Lapuk; S. Cenk Sahinalp

Motivation: Molecular profiles of tumour samples have been widely and successfully used for classification problems. A number of algorithms have been proposed to predict classes of tumor samples based on expression profiles with relatively high performance. However, prediction of response to cancer treatment has proved to be more challenging and novel approaches with improved generalizability are still highly needed. Recent studies have clearly demonstrated the advantages of integrating protein–protein interaction (PPI) data with gene expression profiles for the development of subnetwork markers in classification problems. Results: We describe a novel network-based classification algorithm (OptDis) using color coding technique to identify optimally discriminative subnetwork markers. Focusing on PPI networks, we apply our algorithm to drug response studies: we evaluate our algorithm using published cohorts of breast cancer patients treated with combination chemotherapy. We show that our OptDis method improves over previously published subnetwork methods and provides better and more stable performance compared with other subnetwork and single gene methods. We also show that our subnetwork method produces predictive markers that are more reproducible across independent cohorts and offer valuable insight into biological processes underlying response to therapy. Availability: The implementation is available at: http://www.cs.sfu.ca/~pdao/personal/OptDis.html Contact: [email protected]; [email protected]; [email protected]


Genome Research | 2011

Alu repeat discovery and characterization within human genomes

Fereydoun Hormozdiari; Can Alkan; Mario Ventura; Iman Hajirasouliha; Maika Malig; Faraz Hach; Deniz Yorukoglu; Phuong Dao; Marzieh Bakhshi; S. Cenk Sahinalp; Evan E. Eichler

Human genomes are now being rapidly sequenced, but not all forms of genetic variation are routinely characterized. In this study, we focus on Alu retrotransposition events and seek to characterize differences in the pattern of mobile insertion between individuals based on the analysis of eight human genomes sequenced using next-generation sequencing. Applying a rapid read-pair analysis algorithm, we discover 4342 Alu insertions not found in the human reference genome and show that 98% of a selected subset (63/64) experimentally validate. Of these new insertions, 89% correspond to AluY elements, suggesting that they arose by retrotransposition. Eighty percent of the Alu insertions have not been previously reported and more novel events were detected in Africans when compared with non-African samples (76% vs. 69%). Using these data, we develop an experimental and computational screen to identify ancestry informative Alu retrotransposition events among different human populations.


Journal of Proteome Research | 2011

Mapping the protein interaction network in methicillin-resistant Staphylococcus aureus

Artem Cherkasov; Michael Hsing; Roya Zoraghi; Leonard J. Foster; Raymond H. See; Nikolay Stoynov; Jihong Jiang; Sukhbir Kaur; Tian Lian; Linda Jackson; Huansheng Gong; Rick Swayze; Emily Amandoron; Farhad Hormozdiari; Phuong Dao; Cenk Sahinalp; Osvaldo Santos-Filho; Peter Axerio-Cilies; Kendall G. Byler; William R. McMaster; Robert C. Brunham; B. Brett Finlay; Neil E. Reiner

Mortality attributable to infection with methicillin-resistant Staphylococcus aureus (MRSA) has now overtaken the death rate for AIDS in the United States, and advances in research are urgently needed to address this challenge. We report the results of the systematic identification of protein-protein interactions for the hospital-acquired strain MRSA-252. Using a high-throughput pull-down strategy combined with quantitative proteomics to distinguish specific from nonspecific interactors, we identified 13,219 interactions involving 608 MRSA proteins. Consecutive analyses revealed that this protein interaction network (PIN) exhibits scale-free organization with the characteristic presence of highly connected hub proteins. When clinical and experimental antimicrobial targets were queried in the network, they were generally found to occupy peripheral positions in the PIN with relatively few interacting partners. In contrast, the hub proteins identified in this MRSA PIN that are essential for network integrity and stability have largely been overlooked as drug targets. Thus, this empirical MRSA-252 PIN provides a rich source for identifying critical proteins essential for network stability, many of which can be considered as prospective antimicrobial drug targets.


Bioinformatics | 2010

Inferring cancer subnetwork markers using density-constrained biclustering

Phuong Dao; Recep Colak; Raheleh Salari; Flavia Moser; Elai Davicioni; Alexander Schönhuth; Martin Ester

Motivation: Recent genomic studies have confirmed that cancer is of utmost phenotypical complexity, varying greatly in terms of subtypes and evolutionary stages. When classifying cancer tissue samples, subnetwork marker approaches have proven to be superior over single gene marker approaches, most importantly in cross-platform evaluation schemes. However, prior subnetwork-based approaches do not explicitly address the great phenotypical complexity of cancer. Results: We explicitly address this and employ density-constrained biclustering to compute subnetwork markers, which reflect pathways being dysregulated in many, but not necessarily all samples under consideration. In breast cancer we achieve substantial improvements over all cross-platform applicable approaches when predicting TP53 mutation status in a well-established non-cross-platform setting. In colon cancer, we raise prediction accuracy in the most difficult instances from 87% to 93% for cancer versus non-cancer and from 83% to (astonishing) 92%, for with versus without liver metastasis, in well-established cross-platform evaluation schemes. Availability: Software is available on request. Contact: [email protected]; [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.


Nucleic Acids Research | 2015

Large scale analysis of the mutational landscape in HT-SELEX improves aptamer discovery

Jan Hoinka; Alexey Berezhnoy; Phuong Dao; Zuben E. Sauna; Eli Gilboa; Teresa M. Przytycka

High-Throughput (HT) SELEX combines SELEX (Systematic Evolution of Ligands by EXponential Enrichment), a method for aptamer discovery, with massively parallel sequencing technologies. This emerging technology provides data for a global analysis of the selection process and for simultaneous discovery of a large number of candidates but currently lacks dedicated computational approaches for their analysis. To close this gap, we developed novel in-silico methods to analyze HT-SELEX data and utilized them to study the emergence of polymerase errors during HT-SELEX. Rather than considering these errors as a nuisance, we demonstrated their utility for guiding aptamer discovery. Our approach builds on two main advancements in aptamer analysis: AptaMut—a novel technique allowing for the identification of polymerase errors conferring an improved binding affinity relative to the ‘parent’ sequence and AptaCluster—an aptamer clustering algorithm which is to our best knowledge, the only currently available tool capable of efficiently clustering entire aptamer pools. We applied these methods to an HT-SELEX experiment developing aptamers against Interleukin 10 receptor alpha chain (IL-10RA) and experimentally confirmed our predictions thus validating our computational methods.


Bioinformatics | 2015

MEMCover: integrated analysis of mutual exclusivity and functional network reveals dysregulated pathways across multiple cancer types

Yoo-Ah Kim; Dong-Yeon Cho; Phuong Dao; Teresa M. Przytycka

Motivation: The data gathered by the Pan-Cancer initiative has created an unprecedented opportunity for illuminating common features across different cancer types. However, separating tissue-specific features from across cancer signatures has proven to be challenging. One of the often-observed properties of the mutational landscape of cancer is the mutual exclusivity of cancer driving mutations. Even though studies based on individual cancer types suggested that mutually exclusive pairs often share the same functional pathway, the relationship between across cancer mutual exclusivity and functional connectivity has not been previously investigated. Results: We introduce a classification of mutual exclusivity into three basic classes: within tissue type exclusivity, across tissue type exclusivity and between tissue type exclusivity. We then combined across-cancer mutual exclusivity with interactions data to uncover pan-cancer dysregulated pathways. Our new method, Mutual Exclusivity Module Cover (MEMCover) not only identified previously known Pan-Cancer dysregulated subnetworks but also novel subnetworks whose across cancer role has not been appreciated well before. In addition, we demonstrate the existence of mutual exclusivity hubs, putatively corresponding to cancer drivers with strong growth advantages. Finally, we show that while mutually exclusive pairs within or across cancer types are predominantly functionally interacting, the pairs in between cancer mutual exclusivity class are more often disconnected in functional networks.

Collaboration


Top Co-Authors

S. Cenk Sahinalp

Indiana University Bloomington

Teresa M. Przytycka

National Institutes of Health

Martin Ester

Simon Fraser University

Artem Cherkasov

University of British Columbia

Jan Hoinka

National Institutes of Health

Kendric Wang

University of British Columbia
