Minmei Hou

Pennsylvania State University

Publication

Featured research published by Minmei Hou.

Genome Research | 2014

Alignathon: A competitive assessment of whole genome alignment methods

Dent Earl; Ngan Nguyen; Glenn Hickey; Robert S. Harris; Stephen Fitzgerald; Kathryn Beal; Seledtsov I; Molodtsov; Brian J. Raney; Hiram Clawson; Jaebum Kim; Carsten Kemena; Jia-Ming Chang; Ionas Erb; Poliakov A; Minmei Hou; Javier Herrero; William Kent; Solovyev; Aaron E. Darling; Jian Ma; Cedric Notredame; Michael Brudno; Inna Dubchak; David Haussler; Benedict Paten

Multiple sequence alignments (MSAs) are a prerequisite for a wide variety of evolutionary analyses. Published assessments and benchmark data sets for protein and, to a lesser extent, global nucleotide MSAs are available, but less effort has been made to establish benchmarks in the more general problem of whole-genome alignment (WGA). Using the same model as the successful Assemblathon competitions, we organized a competitive evaluation in which teams submitted their alignments and then assessments were performed collectively after all the submissions were received. Three data sets were used: Two were simulated and based on primate and mammalian phylogenies, and one was comprised of 20 real fly genomes. In total, 35 submissions were assessed, submitted by 10 teams using 12 different alignment pipelines. We found agreement between independent simulation-based and statistical assessments, indicating that there are substantial accuracy differences between contemporary alignment tools. We saw considerable differences in the alignment quality of differently annotated regions and found that few tools aligned the duplications analyzed. We found that many tools worked well at shorter evolutionary distances, but fewer performed competitively at longer distances. We provide all data sets, submissions, and assessment programs for further study and provide, as a resource for future benchmarking, a convenient repository of code and data for reproducing the simulation assessments.

SIAM Journal on Computing | 2008

Approximating the Spanning Star Forest Problem and Its Application to Genomic Sequence Alignment

C. Thach Nguyen; Jian Shen; Minmei Hou; Li Sheng; Webb Miller; Louxin Zhang

This paper studies the algorithmic issues of the spanning star forest problem. We prove the following results: (1) there is a polynomial-time approximation scheme for planar unweighted graphs; (2) there is a polynomial-time algorithm with approximation ratio 3/5 for unweighted graphs; (3) it is NP-hard to approximate the problem within ratio 545/546 + ε for unweighted graphs; (4) there is a linear-time algorithm to compute the maximum star forest of a weighted tree; (5) there is a polynomial-time algorithm with approximation ratio 1/2 for weighted graphs. We also show how to apply this spanning star forest model to aligning multiple genomic sequences over a tandem duplication region.
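
To make the problem concrete: in an unweighted graph, a spanning star forest is determined by its set of star centers, which must form a dominating set, so a forest with the most edges corresponds to a smallest dominating set. The sketch below builds a star forest from a greedily chosen dominating set. It is a simple illustrative heuristic, not any of the algorithms analyzed in the paper, and the function name `greedy_star_forest` and the adjacency-dict input format are assumptions made for this example.

```python
def greedy_star_forest(adj):
    """Build a spanning star forest of an undirected, unweighted graph.

    adj maps each vertex to the set of its neighbours (no self-loops).
    Returns a dict mapping each star center to the list of its leaves.
    This is a plain greedy heuristic with no optimality guarantee.
    """
    undominated = set(adj)
    centers = []
    while undominated:
        # Pick the vertex whose closed neighbourhood covers the most
        # still-undominated vertices (ties broken by vertex order).
        best = max(sorted(adj),
                   key=lambda v: len(({v} | adj[v]) & undominated))
        centers.append(best)
        undominated -= {best} | adj[best]

    center_set = set(centers)
    stars = {c: [] for c in centers}
    for v in sorted(adj):
        if v in center_set:
            continue
        # Every non-center is adjacent to at least one center,
        # because it was dominated by a neighbour chosen above.
        c = min(adj[v] & center_set)
        stars[c].append(v)
    return stars


# Path on five vertices: 1 - 2 - 3 - 4 - 5
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
forest = greedy_star_forest(path)
edge_count = sum(len(leaves) for leaves in forest.values())
```

On the five-vertex path, the greedy choice yields two centers and three forest edges, which matches the optimum for this small instance; in general the greedy result can be worse than optimal.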

Workshop on Algorithms in Bioinformatics | 2006

Controlling size when aligning multiple genomic sequences with duplications

Minmei Hou; Piotr Berman; Louxin Zhang; Webb Miller

For a genomic region containing a tandem gene cluster, a proper set of alignments needs to align only orthologous segments, i.e., those separated by a speciation event. Otherwise, methods for finding regions under evolutionary selection will not perform properly. Conversely, the alignments should indicate every orthologous pair of genes or genomic segments. Attaining this goal in practice requires a technique for avoiding a combinatorial explosion in the number of local alignments. To better understand this process, we model it as a graph problem of finding a minimum cardinality set of cliques that contain all edges. We provide an upper bound for an important class of graphs (the problem is NP-hard and very difficult to approximate in the general case), and use the bound and computer simulations to evaluate two heuristic solutions. An implementation of one of them is evaluated on mammalian sequences from the α-globin gene cluster.
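
The graph problem described here, covering every edge with as few cliques as possible, can be illustrated with a small greedy sketch. The code below is a generic heuristic written for this example, not one of the two heuristics evaluated in the paper; the function name `greedy_edge_clique_cover` and the edge-list input format are assumptions.

```python
from itertools import combinations


def greedy_edge_clique_cover(edges):
    """Cover all edges of an undirected graph with cliques, greedily.

    edges is a list of (u, v) pairs with u != v and sortable vertices.
    Returns a list of vertex sets, each of which is a clique.
    The result covers every edge but is not guaranteed to be minimum.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    uncovered = {frozenset(e) for e in edges}
    cover = []
    for u, v in edges:
        if frozenset((u, v)) not in uncovered:
            continue
        # Grow the edge into a maximal clique: add any vertex that is
        # adjacent to every vertex already in the clique.
        clique = {u, v}
        for w in sorted(adj[u] & adj[v]):
            if clique <= adj[w]:
                clique.add(w)
        cover.append(clique)
        for a, b in combinations(clique, 2):
            uncovered.discard(frozenset((a, b)))
    return cover


# A triangle (1, 2, 3) with one pendant edge (3, 4).
example_edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
example_cover = greedy_edge_clique_cover(example_edges)
```

In the alignment setting, one could read vertices as genomic segments and edges as orthologous pairs, so that each clique corresponds to one multiple alignment; that reading is an interpretation of the abstract, not a detail taken from the paper's implementation.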

Bioinformatics | 2007

HomologMiner: looking for homologous genomic groups in whole genomes

Minmei Hou; Piotr Berman; Chih-Hao Hsu; Robert S. Harris

MOTIVATION: Complex genomes contain numerous repeated sequences, and genomic duplication is believed to be a main evolutionary mechanism to obtain new functions. Several tools are available for de novo repeat sequence identification, and many approaches exist for clustering homologous protein sequences. We present an efficient new approach to identify and cluster homologous DNA sequences with high accuracy at the level of whole genomes, excluding low-complexity repeats, tandem repeats and annotated interspersed repeats. We also determine the boundaries of each group member so that it closely represents a biological unit, e.g. a complete gene, or a partial gene coding a protein domain.

RESULTS: We developed a program called HomologMiner to identify homologous groups applicable to genome sequences that have been properly marked for low-complexity repeats and annotated interspersed repeats. We applied it to the whole genomes of human (hg17), macaque (rheMac2) and mouse (mm8). Groups obtained include gene families (e.g. olfactory receptor gene family, zinc finger families), unannotated interspersed repeats and additional homologous groups that resulted from recent segmental duplications. Our program incorporates several new methods: a new abstract definition of consistent duplicate units, a new criterion to remove moderately frequent tandem repeats, and new algorithmic techniques. We also provide preliminary analysis of the output on the three genomes mentioned above, and show several applications including identifying boundaries of tandem gene clusters and novel interspersed repeat families.

AVAILABILITY: All programs and datasets are downloadable from www.bx.psu.edu/miller_lab.

Bioinformatics | 2011

Pico-inplace-inversions between human and chimpanzee

Minmei Hou; Ping Yao; Angela Antonou; Mitrick A. Johns

MOTIVATION: There have been several studies on the micro-inversions between human and chimpanzee, but there are large discrepancies among their results. Furthermore, all of them rely on alignment procedures or existing alignment results to identify inversions. However, the core alignment procedures do not take very small inversions into consideration. Therefore, their analyses cannot find inversions that are too small to be detected by a classic aligner. We call such inversions pico-inversions.

RESULTS: We re-analyzed human-chimpanzee alignment from the UCSC Genome Browser for micro-inplace-inversions and screened for pico-inplace-inversions using a likelihood ratio test. We report that the quantity of inplace-inversions between human and chimpanzee is substantially greater than what had previously been discovered. We also present the software tool PicoInversionMiner to detect pico-inplace-inversions between closely related species.

AVAILABILITY: Software tools, scripts and result data are available at http://faculty.cs.niu.edu/~hou/PicoInversion.html.

Research in Computational Molecular Biology | 2009

Aligning Two Genomic Sequences That Contain Duplications

Minmei Hou; Cathy Riemer; Piotr Berman; Ross C. Hardison; Webb Miller

It is difficult to properly align genomic sequences that contain intra-species duplications. With this goal in mind, we have developed a tool, called TOAST (two-way orthologous alignment selection tool), for predicting whether two aligned regions from different species are orthologous, i.e., separated by a speciation event, as opposed to a duplication event. The advantage of restricting alignment to orthologous pairs is that they constitute the aligning regions that are most likely to share the same biological function, and most easily analyzed for evidence of selection. We evaluate TOAST on 12 human/mouse gene clusters.

Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine | 2012

Alignment seeding strategies using contiguous pyrimidine purine matches

Minmei Hou; Louxin Zhang; Robert S. Harris

Large-scale genomic pairwise aligners usually start with a seeding procedure, which scans two sequences to obtain base matches (called hits) that follow a certain pattern (called a seed). The seed pattern and size determine the sensitivity and specificity of the seeding procedure and greatly affect the alignment accuracy and computational efficiency. Much effort has been focused on obtaining an optimal (set of) spaced seed(s) to improve sensitivity. However, specificity also becomes a big concern when aligning very long genomic sequences. We present a seeding strategy that identifies contiguous pyrimidine purine (py·pu) matches. This model may improve sensitivity and specificity simultaneously compared to a contiguous base match model. We further present a seeding strategy that identifies contiguous py·pu matches with at least a certain number of contiguous base matches. This model significantly improves sensitivity and specificity simultaneously compared to the base match model. It can also achieve better sensitivity than an optimal spaced seed without loss of specificity, when the ratio of transition to transversion is high. Our examination on the CFTR region of 2M bases between human and mouse shows that this new model can have very high specificity without much loss of sensitivity compared to an optimal spaced seed. Based on the characteristics (e.g. the sequence similarity, the ratio between transition and transversion, and the lengths of gapless alignments) of alignments between human and other mammals, the new seeding strategies are promising in improving alignment quality of a wide selection of species pairs. This paper also lays the groundwork for future advancement of applying spaced patterns in these seeding strategies.
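
The two seeding ideas in this abstract can be sketched in a few lines. In the code below, two bases form a py·pu match when both are purines (A, G) or both are pyrimidines (C, T), so identical bases and transitions both count. The function `pypu_hits`, its parameters `span` and `min_exact`, and the brute-force scan over all position pairs are assumptions made for illustration; a production aligner would index seeds rather than compare every pair of positions, and the paper's exact parameterization may differ.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def base_class(base):
    """Return 'R' for a purine, 'Y' for a pyrimidine, None otherwise."""
    base = base.upper()
    if base in PURINES:
        return "R"
    if base in PYRIMIDINES:
        return "Y"
    return None


def longest_run(flags):
    """Length of the longest run of consecutive True values."""
    best = current = 0
    for flag in flags:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best


def pypu_hits(s, t, span, min_exact=0):
    """Find hits of `span` contiguous py.pu matches between s and t.

    A hit (i, j) means s[i:i+span] and t[j:j+span] agree in
    purine/pyrimidine class at every position and contain at least
    `min_exact` contiguous identical bases. With min_exact=0 this is
    the plain contiguous py.pu model.
    """
    hits = []
    for i in range(len(s) - span + 1):
        for j in range(len(t) - span + 1):
            a, b = s[i:i + span], t[j:j + span]
            same_class = all(
                base_class(x) is not None and base_class(x) == base_class(y)
                for x, y in zip(a, b)
            )
            if not same_class:
                continue
            exact = [x.upper() == y.upper() for x, y in zip(a, b)]
            if longest_run(exact) >= min_exact:
                hits.append((i, j))
    return hits


# The two sequences differ only by A<->G transitions at positions 0 and 4.
relaxed = pypu_hits("ACGTAC", "GCGTGC", span=6, min_exact=3)
strict = pypu_hits("ACGTAC", "GCGTGC", span=6, min_exact=4)
```

Here the full six-base window is a py·pu hit whose longest exact run is three bases (CGT), so it passes with `min_exact=3` and is rejected with `min_exact=4`. Raising `min_exact` is what trades sensitivity for specificity in this sketch.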

Genome Research | 2005

Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes

Adam Siepel; Gill Bejerano; Jakob Skou Pedersen; Angie S. Hinrichs; Minmei Hou; Kate R. Rosenbloom; Hiram Clawson; John Spieth; LaDeana W. Hillier; Stephen Richards; George M. Weinstock; Richard Wilson; Richard A. Gibbs; W. James Kent; Webb Miller; David Haussler

Genome Research | 2007

28-Way vertebrate alignment and conservation track in the UCSC Genome Browser

Webb Miller; Kate R. Rosenbloom; Ross C. Hardison; Minmei Hou; James Taylor; Brian J. Raney; Richard Burhans; David C. King; Robert Baertsch; Daniel Blankenberg; Sergei L. Kosakovsky Pond; Anton Nekrutenko; Belinda Giardine; Robert S. Harris; Svitlana Tyekucheva; Mark Diekhans; Thomas H. Pringle; William J. Murphy; Arthur M. Lesk; George M. Weinstock; Kerstin Lindblad-Toh; Richard A. Gibbs; Eric S. Lander; Adam Siepel; David Haussler; W. James Kent

Genome Research | 2005