Publication


Featured research published by Wentian Li.


IEEE Transactions on Information Theory | 1992

Random texts exhibit Zipf's-law-like word frequency distribution

Wentian Li

It is shown that the distribution of word frequencies for randomly generated texts is very similar to Zipf's law observed in natural languages such as English. The facts that the frequency of occurrence of a word is almost an inverse power-law function of its rank, and that the exponent of this inverse power law is very close to 1, are largely due to the transformation from the word's length to its rank, which stretches an exponential function to a power-law function.
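The length-to-rank argument can be checked numerically. The following is a minimal sketch, not the paper's own calculation: it assumes an alphabet of 26 letters plus a space, all equally likely, and the function names are hypothetical. Word probability decays exponentially with length, while the rank reached by words of a given length grows exponentially, so the log-log slope comes out close to -1.

```python
import math

def random_text_rank_frequency(num_letters, max_length):
    """Return one (rank, probability) point per word length.

    Assumption: each letter and the space are typed with equal probability,
    so every word of a given length is equally likely.
    """
    p_symbol = 1.0 / (num_letters + 1)
    points = []
    last_rank = 0
    for length in range(1, max_length + 1):
        # All num_letters**length words of this length tie in frequency;
        # record the last rank that this group occupies.
        last_rank += num_letters ** length
        # `length` letters followed by one space (up to a normalizing constant).
        prob = p_symbol ** length * p_symbol
        points.append((last_rank, prob))
    return points

def log_log_slope(points):
    """Slope of log(probability) against log(rank) between the end points."""
    (r0, p0), (r1, p1) = points[0], points[-1]
    return (math.log(p1) - math.log(p0)) / (math.log(r1) - math.log(r0))

points = random_text_rank_frequency(num_letters=26, max_length=6)
slope = log_log_slope(points)
# Large-length limit of the exponent under the same assumptions.
limit = -math.log(26 + 1) / math.log(26)
print(f"slope = {slope:.4f}, limiting value = {limit:.4f}")
```

With a smaller alphabet the limiting exponent moves further from 1, which is one way to see why the result is "Zipf's-law-like" rather than exact.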


Genes and Immunity | 2006

High-density SNP analysis of 642 Caucasian families with rheumatoid arthritis identifies two new linkage regions on 11p12 and 2q33

Christopher I. Amos; Wei Chen; Annette Lee; Wentian Li; Marlena Kern; R. Lundsten; Franak Batliwalla; Mark H. Wener; Elaine F. Remmers; D. A. Kastner; Lindsey A. Criswell; Michael F. Seldin; Peter K. Gregersen

We have completed a genome-wide linkage scan using >5700 informative single-nucleotide polymorphism (SNP) markers (Illumina IV SNP linkage panel) in 642 Caucasian families containing affected sibling pairs with rheumatoid arthritis (RA), ascertained by the North American Rheumatoid Arthritis Consortium. The results show striking new evidence of linkage at chromosomes 2q33 and 11p12 with logarithm of odds (LOD) scores of 3.52 and 3.09, respectively. In addition to a strong and broad linkage interval surrounding the major histocompatibility complex (LOD>16), regions with LOD>2.5 were observed on chromosomes 5 and 10. Additional linkage evidence (LOD scores between 1.46 and 2.35) was also observed on chromosomes 4, 7, 12, 16 and 18. This new evidence for multiple regions of genetic linkage is partly explained by the significantly increased information content of the Illumina IV SNP linkage panel (75.6%) compared with a standard microsatellite linkage panel utilized previously (mean 52.6%). Stratified analyses according to whether or not the sibling pair members showed elevated anti-cyclic citrullinated peptide titers indicate significant variation in evidence for linkage among strata on chromosomes 4, 5, 6 and 7. Overall, these new linkage data should reinvigorate efforts to utilize positional information to identify susceptibility genes for RA.


Computational Biology and Chemistry | 1999

Statistical properties of open reading frames in complete genome sequences

Wentian Li

Some statistical properties of open reading frames in all currently available complete genome sequences are analyzed (17 prokaryotic genomes and 16 chromosome sequences from the yeast genome). The size distribution of open reading frames is characterized by various techniques, such as quantile tables, QQ-plots, rank-size plots (Zipf's plots), and spatial densities. The issue of the influence of CG% on the size distribution is addressed. When yeast chromosomes are compared with archaeal and eubacterial genomes, they tend to have more long open reading frames. There is little or no evidence to reject the null hypothesis that open reading frames in the six reading frames and on the two strands are distributed similarly. A topic of current interest, the base composition asymmetry in open reading frames between the two strands, is studied using regression analysis. The base composition asymmetry at the three codon positions is analyzed separately. It was shown in these genome sequences that the first codon position is G- and A-rich (i.e. purine-rich); there is a co-existence of A- and T-rich branches at the second codon position; and the third codon position is weakly T-rich.


Entropy | 2010

Fitting Ranked Linguistic Data with Two-Parameter Functions

Wentian Li; Pedro Miramontes; Germinal Cocho

It is well known that many ranked linguistic data can be fitted well with one-parameter models such as Zipf’s law for ranked word frequencies. However, in cases where discrepancies from the one-parameter model occur (typically at the two extremes of the rank), it is natural to use one more parameter in the fitting model. In this paper, we compare several two-parameter models, including the Beta, Yule, and Weibull functions (all of which can be framed as a multiple regression on the logarithmic scale), in their performance in fitting several ranked linguistic data sets, such as letter frequencies, word-spacings, and word frequencies. We observed that the Beta function fits the ranked letter frequencies the best, the Yule function fits the ranked word-spacing distribution the best, and the Altmann, Beta, and Yule functions all slightly outperform Zipf’s power-law function for the ranked word-frequency distribution.
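As an illustration of what "framed as a multiple regression on the logarithmic scale" means, the sketch below writes out Zipf's law and the Beta rank function in log form. The notation is an assumption rather than taken from the abstract: f(r) is the value at rank r, N is the number of ranked items, and the Beta rank function is taken in its commonly used form f(r) = C (N + 1 - r)^b / r^a.

```latex
\begin{align}
\text{Zipf:}\quad & \log f(r) = \log C - a \log r \\
\text{Beta:}\quad & \log f(r) = \log C - a \log r + b \log (N + 1 - r)
\end{align}
```

In both cases the unknowns enter linearly, so the one-parameter model is a simple regression on log r and the two-parameter model adds log(N + 1 - r) as a second predictor.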


BMC Bioinformatics | 2009

Partial correlation analysis indicates causal relationships between GC-content, exon density and recombination rate in the human genome

Jan Freudenberg; Mingyi Wang; Yaning Yang; Wentian Li

Background: Several features are known to correlate with the GC-content in the human genome, including recombination rate, gene density and distance to telomere. However, by testing for pairwise correlation only, it is impossible to distinguish direct associations from indirect ones and to distinguish between causes and effects. Results: We use partial correlations to construct partially directed graphs for the following four variables: GC-content, recombination rate, exon density and distance-to-telomere. Recombination rate and exon density are unconditionally uncorrelated, but become inversely correlated by conditioning on GC-content. This pattern indicates a model where recombination rate and exon density are two independent causes of GC-content variation. Conclusion: Causal inference and graphical models are useful methods to understand genome evolution and the mechanisms of isochore evolution in the human genome.
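The pattern described in the Results (two variables uncorrelated on their own, inversely correlated once a third is conditioned on) can be reproduced on synthetic data. This is a minimal sketch, not the study's analysis: the data are simulated, the variable names are placeholders, and only a first-order partial correlation on three variables is shown.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, z):
    """First-order partial correlation of x and y given z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Simulated data (an assumption, not genome measurements): two independent
# causes and one common effect that is their sum plus noise.
rng = random.Random(0)
n = 5000
cause_a = [rng.gauss(0, 1) for _ in range(n)]
cause_b = [rng.gauss(0, 1) for _ in range(n)]
effect = [a + b + rng.gauss(0, 1) for a, b in zip(cause_a, cause_b)]

r_marginal = pearson(cause_a, cause_b)
r_partial = partial_corr(cause_a, cause_b, effect)
print(f"marginal r = {r_marginal:.3f}, partial r given effect = {r_partial:.3f}")
```

In this toy setup the marginal correlation is near zero while the partial correlation is clearly negative, which is the signature used to orient edges toward the common effect.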


Genome Research | 2009

Two-parameter characterization of chromosome-scale recombination rate

Wentian Li; Jan Freudenberg

The genome-wide recombination rate (RR) of a species is often described by one parameter, the ratio between total genetic map length (G) and physical map length (P), measured in centimorgans per megabase (cM/Mb). The value of this parameter varies greatly between species, but the cause for these differences is not entirely clear. A constraining factor of overall RR in a species, which may cause increased RR for smaller chromosomes, is the requirement of at least one chiasma per chromosome (or chromosome arm) per meiosis. In the present study, we quantify the relative excess of recombination events on smaller chromosomes by a linear regression model, which relates the genetic length of chromosomes to their physical length. We find for several species that the two-parameter regression, G = G_0 + k × P, provides a better characterization of the relationship between genetic and physical map length than the one-parameter regression that runs through the origin. A nonzero intercept (G_0) indicates a relative excess of recombination on smaller chromosomes in a genome. Given G_0, the parameter k predicts the increase of genetic map length over the increase of physical map length. The observed values of G_0 have a similar magnitude for diverse species, whereas k varies by two orders of magnitude. The implications of this strategy for the genetic maps of human, mouse, rat, chicken, honeybee, worm, and yeast are discussed.
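The comparison between the two regressions can be sketched as follows. The chromosome lengths below are made-up illustrative numbers, not values from the study, and the function names are hypothetical; the point is only to show how an intercept fit and a through-origin fit are computed and compared by residual sum of squares.

```python
def fit_with_intercept(p, g):
    """Ordinary least squares for g = g0 + k * p; returns (g0, k)."""
    n = len(p)
    mp, mg = sum(p) / n, sum(g) / n
    k = sum((x - mp) * (y - mg) for x, y in zip(p, g)) / sum((x - mp) ** 2 for x in p)
    return mg - k * mp, k

def fit_through_origin(p, g):
    """Least squares for g = k * p (no intercept); returns k."""
    return sum(x * y for x, y in zip(p, g)) / sum(x * x for x in p)

def residual_sum_of_squares(p, g, g0, k):
    """Sum of squared residuals of the line g0 + k * p."""
    return sum((y - (g0 + k * x)) ** 2 for x, y in zip(p, g))

# Hypothetical chromosomes: physical length in Mb, genetic length in cM.
physical_mb = [50.0, 100.0, 150.0, 200.0, 250.0]
genetic_cm = [100.0, 150.0, 200.0, 250.0, 300.0]

g0, k = fit_with_intercept(physical_mb, genetic_cm)
k_origin = fit_through_origin(physical_mb, genetic_cm)

rss_two = residual_sum_of_squares(physical_mb, genetic_cm, g0, k)
rss_one = residual_sum_of_squares(physical_mb, genetic_cm, 0.0, k_origin)
print(f"G0 = {g0:.2f} cM, k = {k:.3f} cM/Mb, RSS = {rss_two:.3f}")
print(f"through origin: k = {k_origin:.3f} cM/Mb, RSS = {rss_one:.3f}")
```

On real data the two models differ in parameter count, so a fair comparison would also penalize the extra parameter rather than rely on the residual sum of squares alone.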


Molecular Medicine | 2015

Enrichment of Genetic Variants for Rheumatoid Arthritis within T-Cell and NK-Cell Enhancer Regions

Jan Freudenberg; Peter K. Gregersen; Wentian Li

To identify disease-causative variants, we intersected the published results of a meta-analysis of genome-wide association studies (GWAS) for rheumatoid arthritis (RA) with the set of enhancer regions for 71 primary cell types that was provided by the FANTOM consortium. We first retrieved all single nucleotide polymorphisms (SNPs) that are associated (P < 5 × 10^-8) with RA in the GWAS meta-analysis and that are located in any of these enhancer regions. After excluding the major histocompatibility complex (MHC) region, we identified 50 such RA-associated SNPs that are located in enhancer regions. Enhancer sets from different cell types were then compared with each other for their number of RA-associated SNPs by permutation analysis. This analysis showed that RA-associated SNPs are preferentially located in enhancers from several immunological cell types. In particular, we see a strong relative enrichment in enhancer regions that are active in T cells (P < 0.001) and NK cells (P < 0.001). Several loci display multiple RA-associated SNPs in tight linkage disequilibrium that are located within the same or neighboring enhancers. These haplotypes may have a greater likelihood to influence enhancer activity than any SNP on its own. Taken together, these results support the hypothesis that RA-causative variants often act through altering the activity of immune cell enhancers. The enrichment in T-cell and NK-cell enhancer regions indicates that expression changes in these cell types are particularly relevant for the pathogenesis of RA. The specific SNPs that account for this enrichment can be used as a basis for focused genotype-phenotype studies of these cell types.


PLOS ONE | 2016

Beyond Zipf’s Law: The Lavalette Rank Function and Its Properties

Oscar Fontanelli; Pedro Miramontes; Yaning Yang; Germinal Cocho; Wentian Li

Although Zipf’s law is widespread in natural and social data, one often encounters situations where one or both ends of the ranked data deviate from the power-law function. Previously we proposed the Beta rank function to improve the fitting of data which do not follow a perfect Zipf’s law. Here we show that when the two parameters in the Beta rank function have the same value (which gives the Lavalette rank function), the probability density function can be derived analytically. We also show both computationally and analytically that the Lavalette distribution is approximately equal, though not identical, to the lognormal distribution. We illustrate the utility of the Lavalette rank function in several datasets. We also address three analysis issues: the statistical testing of the Lavalette fitting function, the comparison between Zipf’s law and the lognormal distribution through the Lavalette function, and the comparison between the lognormal distribution and the Lavalette distribution.
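A minimal sketch of fitting the Lavalette rank function is given below. It assumes the form f(r) = C ((N + 1 - r) / r)^a, that is, the Beta rank function with both exponents equal; the data are generated from that form rather than taken from any dataset in the paper, and the function names are hypothetical.

```python
import math

def lavalette(rank, n, scale, exponent):
    """Assumed Lavalette rank function: scale * ((n + 1 - rank) / rank) ** exponent."""
    return scale * ((n + 1 - rank) / rank) ** exponent

def fit_lavalette(values):
    """Fit (scale, exponent) to values sorted in decreasing order.

    Uses ordinary least squares of log(value) on log((N + 1 - r) / r),
    which is linear in the exponent under the assumed form.
    """
    n = len(values)
    xs = [math.log((n + 1 - r) / r) for r in range(1, n + 1)]
    ys = [math.log(v) for v in values]
    mx, my = sum(xs) / n, sum(ys) / n
    exponent = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    scale = math.exp(my - exponent * mx)
    return scale, exponent

# Noise-free synthetic data from the assumed form, for illustration only.
n_items = 50
data = [lavalette(r, n_items, scale=100.0, exponent=0.8) for r in range(1, n_items + 1)]
scale_hat, exponent_hat = fit_lavalette(data)
print(f"scale = {scale_hat:.3f}, exponent = {exponent_hat:.3f}")
```

Because the single predictor log((N + 1 - r) / r) changes sign at the middle rank, the same exponent bends both ends of the curve away from a pure power law.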


Comparative and Functional Genomics | 2013

Periodic distribution of a putative nucleosome positioning motif in human, nonhuman primates, and archaea: mutual information analysis

Daniela Sosa; Pedro Miramontes; Wentian Li; Victor Mireles; Juan R. Bobadilla; Marco V. José

Recently, Trifonov's group proposed a 10-mer DNA motif YYYYYRRRRR as a solution of the long-standing problem of sequence-based nucleosome positioning. To test whether this generic decamer represents a biologically meaningful signal, we compare the distribution of this motif in primates and Archaea, which are known to contain nucleosomes, and in Eubacteria, which do not possess nucleosomes. The distribution of the motif is analyzed by the mutual information function (MIF) with a shifted version of itself (MIF profile). We found common features in the patterns of this generic decamer on MIF profiles among primate species, and interestingly we found conspicuous but dissimilar MIF profiles for each of the Archaea tested. The overall MIF profiles for each chromosome in each primate species also follow a similar pattern. Trifonov's generic decamer may be a highly conserved motif for nucleosome positioning, but we argue that this is not the only motif. The distribution of this generic decamer exhibits previously unidentified periodicities, which are associated with highly repetitive sequences in the genome. Alu repetitive elements contribute to the most fundamental structure of nucleosome positioning in higher Eukaryotes. In some regions of primate chromosomes, the distribution of the decamer shows symmetrical patterns including inverted repeats.
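The idea of a mutual information profile, computed between a sequence and a shifted copy of itself, can be sketched on a toy example. This is not the paper's pipeline: the input is a short artificial string over the symbols Y and R with period 4, and the function name is hypothetical.

```python
import math
from collections import Counter

def mutual_information(sequence, lag):
    """Mutual information in bits between symbols `lag` positions apart."""
    pairs = list(zip(sequence, sequence[lag:]))
    total = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / total
        p_a = left[a] / total
        p_b = right[b] / total
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi

# Toy periodic sequence (an assumption for illustration, not genomic data).
toy = "YYRR" * 25
profile = {lag: mutual_information(toy, lag) for lag in range(1, 5)}
for lag, value in profile.items():
    print(lag, round(value, 3))
```

Peaks in such a profile mark shifts at which knowing one position is informative about the other, which is how periodic placement of a motif would show up.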


Royal Society Open Science | 2017

Population patterns in World’s administrative units

Oscar Fontanelli; Pedro Miramontes; Germinal Cocho; Wentian Li

Whereas there has been an extended discussion concerning city population distribution, little has been said about that of administrative divisions. In this work, we investigate the population distribution of second-level administrative units of 150 countries and territories and propose the discrete generalized beta distribution (DGBD) rank-size function to describe the data. After testing the balance between the goodness of fit and the number of parameters of this function compared with a power law, which is the most common model for city population, we find that the DGBD is a good statistical model for 96% of our datasets and is preferred over a power law in almost every case. Moreover, the DGBD is preferred over a power law for fitting country population data, which can be seen as the zeroth-level administrative unit. We present a computational toy model to simulate the formation of administrative divisions in one dimension and give numerical evidence that the DGBD arises from a particular case of this model. This model, along with the fitting of the DGBD, proves adequate in reproducing and describing local unit evolution and its effect on the population distribution.

Collaboration


Top Co-Authors

Pedro Miramontes
National Autonomous University of Mexico

Germinal Cocho
National Autonomous University of Mexico

Peter K. Gregersen
The Feinstein Institute for Medical Research

Oscar Fontanelli
National Autonomous University of Mexico

Annette Lee
The Feinstein Institute for Medical Research

D. A. Kastner
National Institutes of Health

Danny Ben-Avraham
Albert Einstein College of Medicine

Elaine F. Remmers
National Institutes of Health