Publication


Featured research published by Chun-Hsi Huang.


Nature Methods | 2006

Minimotif Miner: a tool for investigating protein function

Sudha Balla; Vishal Thapar; Snigdha Verma; ThaiBinh Luong; Tanaz Faghri; Chun-Hsi Huang; Sanguthevar Rajasekaran; Jacob J. del Campo; Jessica H Shinn; William A. Mohler; Mark W. Maciejewski; Michael R. Gryk; Bryan Piccirillo; Stanley R Schiller; Martin R. Schiller

In addition to large domains, many short motifs mediate functional post-translational modification of proteins, as well as protein-protein interactions and protein trafficking. We have constructed a motif database comprising 312 unique motifs and a web-based tool, Minimotif Miner (MnM), for identifying motifs in proteins. Functional motifs predicted by MnM can be ranked by several approaches. We validated these scores by analyzing thousands of confirmed examples and by confirming the prediction of previously unidentified 14-3-3 motifs in EFF-1.
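The abstract does not describe how MnM matches motifs internally. As a rough illustration of what scanning a protein for a short linear motif involves, the sketch below searches a sequence with a regular expression.

Several parts of it are assumptions and do not come from the MnM database:

- The pattern `RS.[ST].P` is hypothetical. It is loosely modeled on the RSXpSXP 14-3-3 binding consensus.
- The sample sequence is invented.
- The function name `find_motif_sites` is our own.

```python
import re

# Hypothetical pattern (not an MnM database entry): R, S, any residue,
# a phosphorylatable S/T, any residue, P. It is loosely modeled on the
# RSXpSXP 14-3-3 binding consensus. Phosphorylation itself cannot be seen
# in a plain sequence, so only the residue pattern is checked.
# The lookahead lets overlapping occurrences be reported.
MOTIF = re.compile(r"(?=(RS.[ST].P))")


def find_motif_sites(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched segment) for every occurrence of MOTIF."""
    return [(m.start() + 1, m.group(1)) for m in MOTIF.finditer(protein.upper())]


if __name__ == "__main__":
    # Invented toy sequence, for illustration only.
    print(find_motif_sites("MARSASAPLLRSKTAPQ"))  # [(3, 'RSASAP'), (11, 'RSKTAP')]
```

This only reports raw pattern matches. Ranking predicted motifs, as the abstract describes, would require additional scoring on top of it.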


Journal of Computational Biology | 2005

Exact algorithms for planted motif problems.

Sanguthevar Rajasekaran; Sudha Balla; Chun-Hsi Huang

The problem of identifying meaningful patterns (i.e., motifs) from biological data has been studied extensively due to its paramount importance. Three versions of this problem have been identified in the literature. One of these three problems is the planted (l, d)-motif problem. Several instances of this problem have been posed as a challenge. Numerous algorithms have been proposed in the literature that address this challenge. Many of these algorithms fall under the category of heuristic algorithms. In this paper we present algorithms for the planted (l, d)-motif problem that always find the correct answer(s). Our algorithms are very simple and are based on some ideas that are fundamentally different from the ones employed in the literature. We believe that the techniques we introduce in this paper will find independent applications.
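To make the problem statement concrete, here is a deliberately naive exact solver. It assumes the common formulation in which a planted (l, d)-motif is an l-mer M such that every input sequence contains some l-mer within Hamming distance at most d of M.

The solver simply tries all 4^l candidates. It is therefore only practical for small l and serves as a correctness reference. It is not the algorithm from the paper, whose techniques the abstract does not detail. The function names are our own.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif: str, seq: str, d: int) -> bool:
    """True if some length-l window of seq is within Hamming distance d of motif."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d for i in range(len(seq) - l + 1))


def exhaustive_planted_motifs(seqs: list[str], l: int, d: int,
                              alphabet: str = "ACGT") -> list[str]:
    """Exact but exponential: test all len(alphabet)**l candidate l-mers."""
    motifs = []
    for letters in product(alphabet, repeat=l):
        candidate = "".join(letters)
        if all(occurs_within(candidate, s, d) for s in seqs):
            motifs.append(candidate)
    return motifs


if __name__ == "__main__":
    seqs = ["ACGTT", "TACGA", "GGACG"]  # toy input
    print(exhaustive_planted_motifs(seqs, l=3, d=0))  # ['ACG']
```

The running time grows as 4^l times the total input length. This is why exact algorithms for realistic challenge instances need a smarter search than full enumeration.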


BMC Bioinformatics | 2009

PCA-based population structure inference with generic clustering algorithms.

Chih Lee; Ali Abdool; Chun-Hsi Huang

Background. Handling genotype data typed at hundreds of thousands of loci is very time-consuming, and population structure inference is no exception. We therefore propose the following approach:

- apply principal component analysis (PCA) to the genotype data of a population;
- select the significant principal components using the Tracy-Widom distribution;
- assign individuals to one or more subpopulations using generic clustering algorithms.

Results. We investigated K-means, soft K-means and spectral clustering, and compared them with STRUCTURE, a model-based algorithm designed specifically for population structure inference. We also investigated methods for predicting the number of subpopulations in a population. Results on four simulated datasets and two real datasets indicate that our approach performs comparably to STRUCTURE. For the simulated datasets, STRUCTURE and soft K-means with BIC produced identical predictions of the number of subpopulations. We also showed that, for real data, BIC is a better index than likelihood for predicting the number of subpopulations.

Conclusion. Our approach has the advantage of being fast and scalable. STRUCTURE, by contrast, is very time-consuming because it relies on MCMC for parameter estimation. We therefore suggest choosing the algorithm according to the population structure inference application at hand.
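The pipeline described above (PCA on the genotype matrix, then generic clustering of the individuals) can be sketched in dependency-free Python.

Several simplifications are assumptions of this sketch, not the paper's method:

- The caller supplies the number of principal components `n_pcs` and the number of subpopulations `k`. The paper instead selects components with the Tracy-Widom distribution and predicts the number of subpopulations with indices such as BIC.
- Principal components come from power iteration with deflation on the individuals' Gram matrix.
- Clustering is plain (hard) K-means with a deterministic farthest-point initialization.
- Genotypes are coded 0/1/2 with no missing values.

```python
import math
import random


def _dist2(p: list[float], q: list[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, q))


def pc_scores(X: list[list[float]], n_pcs: int, iters: int = 200,
              seed: int = 0) -> list[list[float]]:
    """Scores of each individual (row) on the top n_pcs principal components."""
    n, m = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(m)]
    Xc = [[row[j] - means[j] for j in range(m)] for row in X]  # center each locus
    # Gram matrix G = Xc Xc^T; its unit eigenvectors scaled by sqrt(eigenvalue)
    # are the individuals' PC scores.
    G = [[sum(a * b for a, b in zip(Xc[i], Xc[k])) for k in range(n)] for i in range(n)]
    rng = random.Random(seed)
    scores: list[list[float]] = [[] for _ in range(n)]
    for _ in range(n_pcs):
        v = [rng.random() + 0.1 for _ in range(n)]
        lam = 0.0
        for _ in range(iters):  # power iteration
            w = [sum(G[i][k] * v[k] for k in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v, lam = [x / norm for x in w], norm
        for i in range(n):
            scores[i].append(math.sqrt(lam) * v[i])
        for i in range(n):  # deflate: remove the component just found
            for k in range(n):
                G[i][k] -= lam * v[i] * v[k]
    return scores


def kmeans(points: list[list[float]], k: int, iters: int = 100) -> list[int]:
    """Lloyd's K-means with deterministic farthest-point initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        centers.append(list(max(points, key=lambda p: min(_dist2(p, c) for c in centers))))
    labels: list[int] = []
    for _ in range(iters):
        new = [min(range(k), key=lambda c: _dist2(p, centers[c])) for p in points]
        if new == labels:  # assignments stable: converged
            break
        labels = new
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def infer_populations(X: list[list[float]], n_pcs: int, k: int) -> list[int]:
    """Assign each individual to one of k subpopulations."""
    return kmeans(pc_scores(X, n_pcs), k)


if __name__ == "__main__":
    # Toy genotypes (0/1/2) for six individuals at four loci; two obvious groups.
    X = [[0, 0, 2, 2], [0, 1, 2, 2], [0, 0, 2, 1],
         [2, 2, 0, 0], [2, 1, 0, 0], [2, 2, 0, 1]]
    print(infer_populations(X, n_pcs=1, k=2))  # [0, 0, 0, 1, 1, 1]
```

Working on the n-by-n Gram matrix keeps the toy example small. With hundreds of thousands of loci and few individuals, this is also the cheaper side of the decomposition. Swapping `kmeans` for soft K-means or spectral clustering would not change the PCA step.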


BioTechniques | 2013

LASAGNA-Search: an integrated web tool for transcription factor binding site search and visualization.

Chih Lee; Chun-Hsi Huang

The release of ChIP-seq data from the ENCyclopedia Of DNA Elements (ENCODE) and Model Organism ENCyclopedia Of DNA Elements (modENCODE) projects has significantly increased the amount of transcription factor (TF) binding affinity information available to researchers. However, scientists still routinely use TF binding site (TFBS) search tools to scan unannotated sequences for TFBSs, particularly when searching for lesser-known TFs or TFs in organisms for which ChIP-seq data are unavailable. The sequence analysis often involves multiple steps such as TF model collection, promoter sequence retrieval, and visualization; thus, several different tools are required. We have developed a novel integrated web tool named LASAGNA-Search that allows users to perform TFBS searches without leaving the web site. LASAGNA-Search uses the LASAGNA (Length-Aware Site Alignment Guided by Nucleotide Association) algorithm for TFBS alignment. Important features of LASAGNA-Search include (i) acceptance of unaligned variable-length TFBSs, (ii) a collection of 1726 TF models, (iii) automatic promoter sequence retrieval, (iv) visualization in the UCSC Genome Browser, and (v) gene regulatory network inference and visualization based on binding specificities. LASAGNA-Search is freely available at http://biogrid.engr.uconn.edu/lasagna_search/.


Asia-Pacific Bioinformatics Conference | 2005

Exact algorithms for planted motif challenge problems.

Sanguthevar Rajasekaran; Sudha Balla; Chun-Hsi Huang

The problem of identifying meaningful patterns (i.e., motifs) from biological data has been studied extensively due to its paramount importance. Three versions of this problem have been identified in the literature. One of these three problems is the planted (l, d)-motif problem. Several instances of this problem have been posed as a challenge. Numerous algorithms have been proposed in the literature that address this challenge. Many of these algorithms fall under the category of approximation algorithms. In this paper we present algorithms for the planted (l, d)-motif problem that always find the correct answer(s). Our algorithms are very simple and are based on some ideas that are fundamentally different from the ones employed in the literature. We believe that the techniques we introduce in this paper will find independent applications. This research was supported in part by NSF grants CCR-9912395 and ITR-0326155.


Biology Direct | 2014

A survey of motif finding Web tools for detecting binding site motifs in ChIP-Seq data

Ngoc Tam L. Tran; Chun-Hsi Huang

ChIP-Seq (chromatin immunoprecipitation sequencing) has provided an advantage for motif finding, because ChIP-Seq experiments narrow the search down to binding site locations. Recent motif finding tools facilitate motif detection by providing user-friendly Web interfaces. In this work, we reviewed nine motif finding Web tools that are capable of detecting binding site motifs in ChIP-Seq data. We showed that each tool has its own advantages and can detect motifs that other tools may not discover. We therefore recommend that users apply multiple tools that implement different algorithms. Doing so yields significant motifs, overlapping similar motifs, and non-overlapping motifs. Finally, we offer suggestions for the future development of motif finding Web tools that better assist researchers in finding motifs in ChIP-Seq data.

Reviewers. This article was reviewed by Prof. Sandor Pongor, Dr. Yuriy Gusev, and Dr. Shyam Prabhakar (nominated by Prof. Limsoon Wong).


BMC Bioinformatics | 2006

Clustering of gene expression data: performance and similarity analysis.

Longde Yin; Chun-Hsi Huang; Jun Ni

Background. DNA microarray technology is an innovative methodology in experimental molecular biology that has produced huge amounts of valuable gene expression profile data. Many clustering algorithms have been proposed to analyze gene expression data, but little guidance is available to help choose among them. Evaluating which clustering algorithms are feasible and applicable is becoming an important issue in today's bioinformatics research.

Results. In this paper we first experimentally study three major clustering algorithms and compare their performance on gene expression data from the yeast Saccharomyces cerevisiae:

- Hierarchical Clustering (HC);
- Self-Organizing Map (SOM);
- Self-Organizing Tree Algorithm (SOTA).

We then introduce Cluster Diff, a new data mining tool, to analyze the similarity of clusters generated by different algorithms. The performance study shows that SOTA is more efficient than SOM, while HC is the least efficient. The similarity analysis shows that, given a target cluster, Cluster Diff can efficiently determine the closest match from a set of clusters. It is therefore an effective approach for evaluating different clustering algorithms.

Conclusion. HC methods allow a visual, convenient representation of genes. However, they are neither robust nor efficient. SOM is more robust against noise, but the number of clusters has to be fixed beforehand. SOTA combines the advantages of both hierarchical and SOM clustering: it allows a visual representation of the clusters and their structure, and it is not sensitive to noise. SOTA is also more flexible than the other two clustering methods. Our data mining tool, Cluster Diff, makes it possible to analyze the similarity of clusters generated by different algorithms and thereby to compare different clustering methods.
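The abstract does not state which similarity measure Cluster Diff uses. Suppose clusters are compared as sets of gene identifiers. One simple way to find the closest match for a target cluster is then to maximize the Jaccard index |A ∩ B| / |A ∪ B|. The sketch below does exactly that. The function names, the cluster labels and the toy gene IDs are all hypothetical.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard index of two sets; defined as 1.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def closest_cluster(target: set[str],
                    candidates: dict[str, set[str]]) -> tuple[str, float]:
    """Name and score of the candidate cluster most similar to target."""
    best = max(candidates, key=lambda name: jaccard(target, candidates[name]))
    return best, jaccard(target, candidates[best])


if __name__ == "__main__":
    hc_cluster = {"g1", "g2", "g3"}  # hypothetical target cluster, e.g. from HC
    som_clusters = {"SOM-1": {"g1", "g2", "g4"}, "SOM-2": {"g3", "g5"}}
    print(closest_cluster(hc_cluster, som_clusters))  # ('SOM-1', 0.5)
```

The comparison needs only cluster memberships. It therefore works the same way whether the clusters come from HC, SOM or SOTA.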


Journal of Clinical Monitoring and Computing | 2005

High-performance exact algorithms for motif search

Sanguthevar Rajasekaran; Sudha Balla; Chun-Hsi Huang; Vishal Thapar; Michael R. Gryk; Mark W. Maciejewski; Martin R. Schiller

Objective. The Human Genome Project has generated voluminous biological data, and novel computational techniques are needed to extract useful information from it. One such technique is finding patterns that are repeated over many sequences (and possibly over many species). In this paper we study the problem of identifying meaningful patterns (i.e., motifs) in biological data, known as the motif search problem.

Methods. The general version of the motif search problem is NP-hard. Numerous algorithms have been proposed in the literature to solve this problem, and many of them are heuristics. We concentrate on exact algorithms, in particular for two different versions of the motif search problem.

Results. We present algorithms for two versions of the motif search problem. All of our algorithms are elegant and use only simple data structures such as arrays.

- For the first version (Problem 1 in the paper), we present a simple sorting-based algorithm, SMS (Simple Motif Search). It has been implemented and evaluated experimentally.
- For the second version (Problem 2), we present two different algorithms: a deterministic algorithm (called DMS) and a randomized Monte Carlo algorithm.

We also show how these algorithms can be parallelized.

Conclusions. All the algorithms proposed in this paper improve on existing algorithms for these versions of motif search in biological sequence data. They have the potential to perform well in practice.
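The abstract describes SMS only as a simple sorting-based algorithm for Problem 1. This sketch assumes Problem 1 is the quorum variant of motif search: report every l-mer that occurs with at most d mismatches in at least q of the n input sequences.

Under that assumption, a sorting-based approach works as follows:

1. Generate the d-neighborhood of every l-mer in every sequence.
2. Tag each neighbor with its sequence index.
3. Sort the tagged pairs.
4. Keep the patterns whose run of equal entries spans at least q distinct sequences.

This is an illustrative reconstruction under that assumption, not the authors' code, and the names are our own.

```python
from itertools import combinations, groupby, product


def neighborhood(kmer: str, d: int, alphabet: str = "ACGT") -> set[str]:
    """All strings within Hamming distance d of kmer, kmer included."""
    d = min(d, len(kmer))
    out = set()
    # Choosing d positions and allowing any letter there (even the original one)
    # covers every string with at most d mismatches.
    for positions in combinations(range(len(kmer)), d):
        for letters in product(alphabet, repeat=d):
            chars = list(kmer)
            for pos, ch in zip(positions, letters):
                chars[pos] = ch
            out.add("".join(chars))
    return out


def sort_based_motif_search(seqs: list[str], l: int, d: int, q: int,
                            alphabet: str = "ACGT") -> list[str]:
    """l-mers occurring with <= d mismatches in at least q sequences, sorted."""
    pairs = []
    for sid, s in enumerate(seqs):
        for i in range(len(s) - l + 1):
            pairs.extend((p, sid) for p in neighborhood(s[i:i + l], d, alphabet))
    pairs.sort()  # equal patterns become adjacent
    motifs = []
    for pattern, run in groupby(pairs, key=lambda t: t[0]):
        if len({sid for _, sid in run}) >= q:  # distinct sequences in this run
            motifs.append(pattern)
    return motifs


if __name__ == "__main__":
    seqs = ["ACGTT", "TACGA", "GGACG"]  # toy input
    print(sort_based_motif_search(seqs, l=3, d=0, q=2))  # ['ACG']
```

Sorting replaces a hash table here. A plain array of pairs is all the data structure this approach needs.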


Bioinformatics | 2014

LASAGNA-Search 2.0: integrated transcription factor binding site search and visualization in a browser

Chih Lee; Chun-Hsi Huang

LASAGNA-Search 2.0 is an integrated web tool for transcription factor (TF) binding site search and visualization. The tool is based on the LASAGNA (Length-Aware Site Alignment Guided by Nucleotide Association) algorithm and eliminates manual TF model collection and promoter sequence retrieval. Search results can be visualized locally or in the University of California Santa Cruz Genome Browser. Gene regulatory network inference based on the search results offers another way of visualization. A list of TFs and target genes is all a user needs to start using the tool. LASAGNA-Search 2.0 currently offers 1792 TF models and supports 15 species for automatic promoter retrieval and visualization in the University of California Santa Cruz Genome Browser. It is a user-friendly tool designed for non-bioinformaticians and is suitable for both research and teaching. We describe important changes made since the initial release.

Availability and implementation. LASAGNA-Search 2.0 is freely available without registration at http://biogrid.engr.uconn.edu/lasagna_search/.


BMC Bioinformatics | 2013

LASAGNA: A novel algorithm for transcription factor binding site alignment

Chih Lee; Chun-Hsi Huang

Background. Scientists routinely scan DNA sequences for transcription factor (TF) binding sites (TFBSs). Most of the available tools rely on position-specific scoring matrices (PSSMs) constructed from aligned binding sites. Because of the resolutions of assays used to obtain TFBSs, databases such as TRANSFAC, ORegAnno and PAZAR store unaligned variable-length DNA segments containing binding sites of a TF. These DNA segments need to be aligned to build a PSSM. While the TRANSFAC database provides scoring matrices for TFs, nearly 78% of the TFs in the public release do not have matrices available. As work on TFBS alignment algorithms has been limited, it is highly desirable to have an alignment algorithm tailored to TFBSs.

Results. We designed a novel algorithm named LASAGNA, which is aware of the lengths of input TFBSs and utilizes position dependence. Results on 189 TFs of 5 species in the TRANSFAC database showed that our method significantly outperformed ClustalW2 and MEME. We further compared a PSSM method dependent on LASAGNA to an alignment-free TFBS search method. Results on 89 TFs whose binding sites can be located in genomes showed that our method is significantly more precise at fixed recall rates. Finally, we described LASAGNA-ChIP, a more sophisticated version for ChIP (chromatin immunoprecipitation) experiments. Under the one-per-sequence model, it showed comparable performance with MEME in discovering motifs in ChIP-seq peak sequences.

Conclusions. We conclude that the LASAGNA algorithm is simple and effective in aligning variable-length binding sites. It has been integrated into a user-friendly web tool for TFBS search and visualization called LASAGNA-Search. The tool currently stores precomputed PSSM models for 189 TFs and 133 TFs built from TFBSs in the TRANSFAC Public database (release 7.0) and the ORegAnno database (08Nov10 dump), respectively. The web tool is available at http://biogrid.engr.uconn.edu/lasagna_search/.
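LASAGNA itself handles the alignment step. Once the sites are aligned to a common length, a PSSM can be built and used to scan a sequence. The sketch below shows that downstream step in its most generic form: log2-odds scores with an assumed pseudocount of 0.5 and an assumed uniform background. It is not LASAGNA-Search's scoring scheme, and the toy sites are invented.

```python
import math
from typing import Optional

ALPHABET = "ACGT"


def build_pssm(sites: list[str], pseudocount: float = 0.5,
               background: Optional[dict[str, float]] = None) -> list[dict[str, float]]:
    """Log2-odds PSSM from aligned, equal-length binding sites."""
    if not sites or any(len(s) != len(sites[0]) for s in sites):
        raise ValueError("sites must be non-empty and aligned to equal length")
    bg = background or {b: 0.25 for b in ALPHABET}  # assumed uniform background
    total = len(sites) + pseudocount * len(ALPHABET)
    pssm = []
    for j in range(len(sites[0])):
        column = [s[j].upper() for s in sites]
        pssm.append({b: math.log2((column.count(b) + pseudocount) / total / bg[b])
                     for b in ALPHABET})
    return pssm


def scan(seq: str, pssm: list[dict[str, float]]) -> list[tuple[int, float]]:
    """(0-based start, score) for every full-length window of seq."""
    w = len(pssm)
    seq = seq.upper()
    return [(i, sum(col[seq[i + j]] for j, col in enumerate(pssm)))
            for i in range(len(seq) - w + 1)]


if __name__ == "__main__":
    pssm = build_pssm(["ACGT", "ACGT", "ACGA"])  # toy aligned sites
    hits = scan("TTACGTTT", pssm)
    print(max(hits, key=lambda h: h[1])[0])  # 2
```

Windows containing characters other than A, C, G and T (such as N) raise a `KeyError` in this sketch. A real scanner would skip or penalize them.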

Collaboration


Top Co-Authors

Chih Lee
University of Connecticut

Xin He
University at Buffalo

Sudha Balla
University of Connecticut

Vishal Thapar
University of Connecticut

Chain-Wu Lee
State University of New York System

Longde Yin
University of Connecticut

Mark W. Maciejewski
University of Connecticut Health Center