Ling-Hong Hung | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Ling-Hong Hung is active.

Explore More

Publication

Featured researches published by Ling-Hong Hung.

Nucleic Acids Research | 2005

PROTINFO: new algorithms for enhanced protein structure predictions

Ling-Hong Hung; Shing-Chung Ngan; Tianyun Liu; Ram Samudrala

We describe new algorithms and modules for protein structure prediction available as part of the PROTINFO web server. The modules, comparative and de novo modelling, have significantly improved back-end algorithms that were rigorously evaluated at the sixth meeting on the Critical Assessment of Protein Structure Prediction methods. We were one of four server groups invited to make an oral presentation (only the best performing groups are asked to do so). These two modules allow a user to submit a protein sequence and return atomic coordinates representing the tertiary structure of that protein. The PROTINFO server is available at .

Protein Science | 2003

Accurate and automated classification of protein secondary structure with PsiCSI

Ling-Hong Hung; Ram Samudrala

PsiCSI is a highly accurate and automated method of assigning secondary structure from NMR data, which is a useful intermediate step in the determination of tertiary structures. The method combines information from chemical shifts and protein sequence using three layers of neural networks. Training and testing was performed on a suite of 92 proteins (9437 residues) with known secondary and tertiary structure. Using a stringent cross‐validation procedure in which the target and homologous proteins were removed from the databases used for training the neural networks, an average 89% Q3 accuracy (per residue) was observed. This is an increase of 6.2% and 5.5% (representing 36% and 33% fewer errors) over methods that use chemical shifts (CSI) or sequence information (Psipred) alone. In addition, PsiCSI improves upon the translation of chemical shift information to secondary structure (Q3 = 87.4%) and is able to use sequence information as an effective substitute for sparse NMR data (Q3 = 86.9% without 13C shifts and Q3 = 86.8% with only Hα shifts available). Finally, errors made by PsiCSI almost exclusively involve the interchange of helix or strand with coil and not helix with strand (<2.5 occurrences per 10000 residues). The automation, increased accuracy, absence of gross errors, and robustness with regards to sparse data make PsiCSI ideal for high‐throughput applications, and should improve the effectiveness of hybrid NMR/de novo structure determination methods. A Web server is available for users to submit data and have the assignment returned.

Nucleic Acids Research | 2003

PROTINFO: secondary and tertiary protein structure prediction

Ling-Hong Hung; Ram Samudrala

Information about the secondary and tertiary structure of a protein sequence can greatly assist biologists in the generation and testing of hypotheses, as well as design of experiments. The PROTINFO server enables users to submit a protein sequence and request a prediction of the three-dimensional (tertiary) structure based on comparative modeling, fold generation and de novo methods developed by the authors. In addition, users can submit NMR chemical shift data and request protein secondary structure assignment that is based on using neural networks to combine the chemical shifts with secondary structure predictions. The server is available at http://protinfo.compbio.washington.edu.

PLOS ONE | 2016

GUIdock: Using Docker Containers with a Common Graphics User Interface to Address the Reproducibility of Research

Ling-Hong Hung; Daniel Kristiyanto; Sung Bong Lee; Ka Yee Yeung

Reproducibility is vital in science. For complex computational methods, it is often necessary, not just to recreate the code, but also the software and hardware environment to reproduce results. Virtual machines, and container software such as Docker, make it possible to reproduce the exact environment regardless of the underlying hardware and operating system. However, workflows that use Graphical User Interfaces (GUIs) remain difficult to replicate on different host systems as there is no high level graphical software layer common to all platforms. GUIdock allows for the facile distribution of a systems biology application along with its graphics environment. Complex graphics based workflows, ubiquitous in systems biology, can now be easily exported and reproduced on many different platforms. GUIdock uses Docker, an open source project that provides a container with only the absolutely necessary software dependencies and configures a common X Windows (X11) graphic interface on Linux, Macintosh and Windows platforms. As proof of concept, we present a Docker package that contains a Bioconductor application written in R and C++ called networkBMA for gene network inference. Our package also includes Cytoscape, a java-based platform with a graphical user interface for visualizing and analyzing gene networks, and the CyNetworkBMA app, a Cytoscape app that allows the use of networkBMA via the user-friendly Cytoscape interface.

Archive | 2007

De Novo Protein Structure Prediction

Ling-Hong Hung; Shing-Chung Ngan; Ram Samudrala

An unparalleled amount of sequence data is being made available from large-scale genome sequencing efforts. The data provide a shortcut to the determination of the function of a gene of interest, as long as there is an existing sequenced gene with similar sequence and of known function. This has spurred structural genomic initiatives with the goal of determining as many protein folds as possible (Brenner and Levitt, 2000; Burley, 2000; Brenner, 2001; Heinemann et al., 2001). The purpose of this is twofold: First, the structure of a gene product can often lead to direct inference of its function. Second, since the function of a protein is dependent on its structure, direct comparison of the structures of gene products can be more sensitive than the comparison of sequences of genes for detecting homology. Presently, structural determination by crystallography and NMR techniques is still slow and expensive in terms of manpower and resources, despite attempts to automate the processes. Computer structure prediction algorithms, while not providing the accuracy of the traditional techniques, are extremely quick and inexpensive and can provide useful low-resolution data for structure comparisons (Bonneau and Baker, 2001). Given the immense number of structures which the structural genomic projects are attempting to solve, there would be a considerable gain even if the computer structure prediction approach were applicable to a subset of proteins.

Journal of the American Medical Informatics Association | 2018

Reproducible Bioconductor workflows using browser-based interactive notebooks and containers.

Reem Almugbel; Ling-Hong Hung; Jiaming Hu; Abeer M. Almutairy; Nicole E. Ortogero; Yashaswi Tamta; Ka Yee Yeung

Objective Bioinformatics publications typically include complex software workflows that are difficult to describe in a manuscript. We describe and demonstrate the use of interactive software notebooks to document and distribute bioinformatics research. We provide a user-friendly tool, BiocImageBuilder, that allows users to easily distribute their bioinformatics protocols through interactive notebooks uploaded to either a GitHub repository or a private server. Materials and methods We present four different interactive Jupyter notebooks using R and Bioconductor workflows to infer differential gene expression, analyze cross-platform datasets, process RNA-seq data and KinomeScan data. These interactive notebooks are available on GitHub. The analytical results can be viewed in a browser. Most importantly, the software contents can be executed and modified. This is accomplished using Binder, which runs the notebook inside software containers, thus avoiding the need to install any software and ensuring reproducibility. All the notebooks were produced using custom files generated by BiocImageBuilder. Results BiocImageBuilder facilitates the publication of workflows with a point-and-click user interface. We demonstrate that interactive notebooks can be used to disseminate a wide range of bioinformatics analyses. The use of software containers to mirror the original software environment ensures reproducibility of results. Parameters and code can be dynamically modified, allowing for robust verification of published results and encouraging rapid adoption of new methods. Conclusion Given the increasing complexity of bioinformatics workflows, we anticipate that these interactive software notebooks will become as necessary for documenting software methods as traditional laboratory notebooks have been for documenting bench protocols, and as ubiquitous.

bioRxiv | 2017

Building containerized workflows for RNA-seq data using the BioDepot-workflow-Builder (BwB)

Ling-Hong Hung; Trevor Meiss; Jayant Keswani; Yuguang Xiong; Eric Sobie; Ka Yee Yeung

We present BioDepot-workflow-Builder (BwB), a portable and open-source tool for creating bioinformatics workflows with a simple drag-and-drop graphical user interface. The individual components of the workflows are Docker containers which are available from public repositories or provided by the user. The use of software containers ensures that workflows will give identical results across different operating systems and hardware architectures. The use of Docker also allows for individual components to be deployed on the cloud. The modularity and ease of customization and installation of bioinformatics tools using BwB allows for researchers to efficiently test new workflows and compare competing algorithms. Since BwB itself is packaged in a Docker container, the setup is minimal. In particular, users only need to install Docker and have access to a web browser to begin creating and running workflows. As a proof-of-concept case study, we illustrated the feasibility of BwB by developing widgets for the RNA-seq differential expression analysis workflow employed by the NIH BD2K-LINCS Drug Toxicity Signature Generation Center at Mount Sinai. The app and all the containers are available on the BioDepot repository (https://hub.docker.eom/r/biodepot).

bioRxiv | 2017

Software solutions for reproducible RNA-seq workflows

Trevor Meiss; Ling-Hong Hung; Yuguang Xiong; Eric Sobie; Ka Yee Yeung

Computational workflows typically consist of many tools that are usually distributed as compiled binaries or source code. Each of these software tools typically depends on other installed software, and performance could potentially vary due to versions, updates, and operating systems. We show here that the analysis of mRNA-seq data can depend on the computing environment, and we demonstrate that software containers represent practical solutions that ensure the reproducibility of RNAseq data analyses.

bioRxiv | 2017

Integration of multiple data sources for gene network inference using genetic perturbation data

Xiao Liang; William Chad Young; Ling-Hong Hung; Adrian E. Raftery; Ka Yee Yeung

Background The inference of gene regulatory networks is of great interest and has various applications. The recent advances in high-throughout biological data collection have facilitated the construction and understanding of gene regulatory networks in many model organisms. However, the inference of gene networks from large-scale human genomic data can be challenging. Generally, it is difficult to identify the correct regulators for each gene in the large search space, given that the high dimensional gene expression data only provides a small number of observations for each gene. Results We present a Bayesian approach integrating external data sources with knockdown data from human cell lines to infer gene regulatory networks. In particular, we assemble multiple data sources including gene expression data, genome-wide binding data, gene ontology, known pathways and use a supervised learning framework to compute prior probabilities of regulatory relationships. We show that our integrated method improves the accuracy of inferred gene networks. We apply our method to two different human cell lines, which illustrates the general scope of our method. Conclusions We present a flexible and systematic framework for external data integration that improves the accuracy of human gene network inference while retaining efficiency. Integrating various data sources of biological information also provides a systematic way to build on knowledge from existing literature.

GigaScience | 2017

fastBMA: scalable network inference and transitive reduction

Ling-Hong Hung; Kaiyuan Shi; Migao Wu; William Chad Young; Adrian E. Raftery; Ka Yee Yeung

Abstract Inferring genetic networks from genome-wide expression data is extremely demanding computationally. We have developed fastBMA, a distributed, parallel, and scalable implementation of Bayesian model averaging (BMA) for this purpose. fastBMA also includes a computationally efficient module for eliminating redundant indirect edges in the network by mapping the transitive reduction to an easily solved shortest-path problem. We evaluated the performance of fastBMA on synthetic data and experimental genome-wide time series yeast and human datasets. When using a single CPU core, fastBMA is up to 100 times faster than the next fastest method, LASSO, with increased accuracy. It is a memory-efficient, parallel, and distributed application that scales to human genome-wide expression data. A 10 000-gene regulation network can be obtained in a matter of hours using a 32-core cloud cluster (2 nodes of 16 cores). fastBMA is a significant improvement over its predecessor ScanBMA. It is more accurate and orders of magnitude faster than other fast network inference methods such as the 1 based on LASSO. The improved scalability allows it to calculate networks from genome scale data in a reasonable time frame. The transitive reduction method can improve accuracy in denser networks. fastBMA is available as code (M.I.T. license) from GitHub (https://github.com/lhhunghimself/fastBMA), as part of the updated networkBMA Bioconductor package (https://www.bioconductor.org/packages/release/bioc/html/networkBMA.html) and as ready-to-deploy Docker images (https://hub.docker.com/r/biodepot/fastbma/).

Explore More