Lei Kong
Peking University
Publication
Featured research published by Lei Kong.
Nucleic Acids Research | 2007
Lei Kong; Yong Zhang; Zhi-Qiang Ye; Xiaoqiao Liu; Shuqi Zhao; Liping Wei
Recent transcriptome studies have revealed that a large number of transcripts in mammals and other organisms do not encode proteins but function as noncoding RNAs (ncRNAs) instead. As millions of transcripts are generated by large-scale cDNA and EST sequencing projects every year, there is a need for automatic methods to distinguish protein-coding RNAs from noncoding RNAs accurately and quickly. We developed a support vector machine-based classifier, named Coding Potential Calculator (CPC), to assess the protein-coding potential of a transcript based on six biologically meaningful sequence features. Tenfold cross-validation on the training dataset and further testing on several large datasets showed that CPC can discriminate coding from noncoding transcripts with high accuracy. Furthermore, CPC also runs an order-of-magnitude faster than a previous state-of-the-art tool and has higher accuracy. We developed a user-friendly web-based interface of CPC at http://cpc.cbi.pku.edu.cn. In addition to predicting the coding potential of the input transcripts, the CPC web server also graphically displays detailed sequence features and additional annotations of the transcript that may facilitate users’ further investigation.
Nucleic Acids Research | 2011
Chen Xie; Xizeng Mao; Jiaju Huang; Yang Ding; Jianmin Wu; Shan Dong; Lei Kong; Chuan-Yun Li; Liping Wei
High-throughput experimental technologies often identify dozens to hundreds of genes related to, or changed in, a biological or pathological process. From these genes one wants to identify biological pathways that may be involved and diseases that may be implicated. Here, we report a web server, KOBAS 2.0, which annotates an input set of genes with putative pathways and disease relationships based on mapping to genes with known annotations. It allows for both ID mapping and cross-species sequence similarity mapping. It then performs statistical tests to identify statistically significantly enriched pathways and diseases. KOBAS 2.0 incorporates knowledge across 1327 species from 5 pathway databases (KEGG PATHWAY, PID, BioCyc, Reactome and Panther) and 5 human disease databases (OMIM, KEGG DISEASE, FunDO, GAD and NHGRI GWAS Catalog). KOBAS 2.0 can be accessed at http://kobas.cbi.pku.edu.cn.
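The enrichment step described above can be illustrated with a one-sided hypergeometric test. This is a minimal sketch only: the abstract does not say which statistical tests KOBAS 2.0 applies, so the choice of test, the function name enrichment_p_value and all parameter names are assumptions made for illustration.

```python
from math import comb


def enrichment_p_value(background_size, annotated_in_background,
                       input_size, annotated_in_input):
    """One-sided hypergeometric p-value, P(X >= annotated_in_input).

    Hypothetical helper, not part of KOBAS: it asks how likely it is to
    see at least `annotated_in_input` pathway genes when `input_size`
    genes are drawn at random from a background of `background_size`
    genes, of which `annotated_in_background` belong to the pathway.
    """
    upper = min(annotated_in_background, input_size)
    total = comb(background_size, input_size)
    tail = sum(
        comb(annotated_in_background, i)
        * comb(background_size - annotated_in_background, input_size - i)
        for i in range(annotated_in_input, upper + 1)
    )
    return tail / total


# Example: 50 input genes, 5 of them in a pathway that covers
# 100 of 20000 background genes (made-up numbers).
p = enrichment_p_value(20000, 100, 50, 5)
```

In practice one such p-value would be computed per pathway or disease term and then corrected for multiple testing; that correction is omitted here.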
Nucleic Acids Research | 2014
Jinpu Jin; He Zhang; Lei Kong; Jingchu Luo
To provide a resource for functional and evolutionary study of plant transcription factors (TFs), we updated the plant TF database PlantTFDB to version 3.0 (http://planttfdb.cbi.pku.edu.cn). After refining the TF classification pipeline, we systematically identified 129 288 TFs from 83 species, of which 67 species have genome sequences, covering the main lineages of green plants. Besides the abundant annotation provided in the previous version, we generated more annotations for identified TFs, including expression, regulation, interaction, conserved elements, phenotype information, expert-curated descriptions derived from UniProt, TAIR and NCBI GeneRIF, as well as references to provide clues for functional studies of TFs. To help identify evolutionary relationships among identified TFs, we assigned 69 450 TFs into 3924 orthologous groups, and constructed 9217 phylogenetic trees for TFs within the same families or same orthologous groups, respectively. In addition, we set up a TF prediction server in this version for users to identify TFs from their own sequences.
Nucleic Acids Research | 2017
Jinpu Jin; Feng Tian; Dechang Yang; Yu-Qi Meng; Lei Kong; Jingchu Luo
With the goal of providing a comprehensive, high-quality resource for both plant transcription factors (TFs) and their regulatory interactions with target genes, we upgraded the plant TF database PlantTFDB to version 4.0 (http://planttfdb.cbi.pku.edu.cn/). In the new version, we identified 320 370 TFs from 165 species, presenting a more comprehensive genomic TF repertoire of green plants. Besides updating the pre-existing abundant functional and evolutionary annotation for identified TFs, we generated three new types of annotation that provide more direct clues for investigating the underlying functional mechanisms: (i) a set of high-quality, non-redundant TF binding motifs derived from experiments; (ii) multiple types of regulatory elements identified from high-throughput sequencing data; (iii) regulatory interactions curated from literature and inferred by combining TF binding motifs and regulatory elements. In addition, we upgraded the previous TF prediction server, and set up four novel tools for regulation prediction and functional enrichment analyses. Finally, we set up a novel companion portal PlantRegMap (http://plantregmap.cbi.pku.edu.cn) for users to access the regulation resource and analysis tools conveniently.
Database | 2011
Jonathan M. Guberman; J. Ai; Olivier Arnaiz; Joachim Baran; Andrew Blake; Richard Baldock; Claude Chelala; David Croft; Anthony Cros; Rosalind J. Cutts; A. Di Génova; Simon A. Forbes; T. Fujisawa; Emanuela Gadaleta; David Goodstein; Gunes Gundem; Bernard Haggarty; Syed Haider; Matthew Hall; Todd W. Harris; Robin Haw; Songnian Hu; Simon J. Hubbard; Jack Hsu; Vivek Iyer; Philip Jones; Toshiaki Katayama; Rhoda Kinsella; Lei Kong; Daniel Lawson
BioMart Central Portal is a first of its kind, community-driven effort to provide unified access to dozens of biological databases spanning genomics, proteomics, model organisms, cancer data, ontology information and more. Anybody can contribute an independently maintained resource to the Central Portal, allowing it to be exposed to and shared with the research community, and linking it with the other resources in the portal. Users can take advantage of the common interface to quickly utilize different sources without learning a new system for each. The system also simplifies cross-database searches that might otherwise require several complicated steps. Several integrated tools streamline common tasks, such as converting between ID formats and retrieving sequences. The combination of a wide variety of databases, an easy-to-use interface, robust programmatic access and the array of tools make Central Portal a one-stop shop for biological data querying. Here, we describe the structure of Central Portal and show example queries to demonstrate its capabilities. Database URL: http://central.biomart.org.
Bioinformatics | 2014
An Xiao; Zhenchao Cheng; Lei Kong; Zuoyan Zhu; Shuo Lin; Bo Zhang
The CRISPR/Cas or Cas9/guide RNA system is a newly developed, easily engineered and highly effective tool for gene targeting, but it has considerable off-target effects in cultured human cells and in several organisms. However, the Cas9/guide RNA target site is too short for existing alignment tools to exhaustively and effectively identify potential off-target sites. CasOT is a local tool designed to find potential off-target sites in any given genome or user-provided sequence, with user-specified types of protospacer adjacent motif, and number of mismatches allowed in the seed and non-seed regions. Availability: http://eendb.zfgenetics.org/casot/. Supplementary data are available at Bioinformatics online.
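The kind of search this abstract describes (a PAM filter plus separate mismatch limits for the seed and non-seed regions) can be sketched as a brute-force scan. This is not CasOT's code or algorithm: the function names, the NGG default, the 12-nt PAM-proximal seed and the default mismatch limits are all assumptions chosen for illustration, and only the forward strand is scanned.

```python
def pam_matches(site_pam, pam_pattern):
    """True if site_pam fits pam_pattern, where 'N' matches any base."""
    return all(p == "N" or p == b for p, b in zip(pam_pattern, site_pam))


def find_off_targets(genome, guide, pam="NGG", seed_len=12,
                     max_seed_mm=0, max_nonseed_mm=2):
    """Scan the forward strand for guide-like sites followed by a PAM.

    The seed is assumed to be the `seed_len` bases next to the PAM.
    Returns (start, site, seed_mismatches, nonseed_mismatches) tuples.
    """
    genome, guide = genome.upper(), guide.upper()
    g, p = len(guide), len(pam)
    seed_len = min(seed_len, g)  # guard against a seed longer than the guide
    hits = []
    for start in range(len(genome) - g - p + 1):
        if not pam_matches(genome[start + g:start + g + p], pam):
            continue
        site = genome[start:start + g]
        mismatches = [a != b for a, b in zip(guide, site)]
        seed_mm = sum(mismatches[g - seed_len:])
        nonseed_mm = sum(mismatches[:g - seed_len])
        if seed_mm <= max_seed_mm and nonseed_mm <= max_nonseed_mm:
            hits.append((start, site, seed_mm, nonseed_mm))
    return hits


# Toy sequence: one perfect site and one site with a single
# non-seed mismatch, each followed by an NGG PAM.
guide = "ACGTACGTACGTACGTACGT"
genome = "TTT" + guide + "AGG" + "CCC" + "TCGTACGTACGTACGTACGT" + "TGG"
hits = find_off_targets(genome, guide)
```

A complete search would also scan the reverse complement and would need an index rather than a linear scan for whole genomes.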
BMC Bioinformatics | 2006
Xiyin Wang; Xiaoli Shi; Zhe Li; Qihui Zhu; Lei Kong; Wen Ying Tang; Song Ge; Jingchu Luo
Background: The identification of chromosomal homology will shed light on such mysteries of genome evolution as DNA duplication, rearrangement and loss. Several approaches have been developed to detect chromosomal homology based on gene synteny or colinearity. However, the previously reported implementations lack statistical inferences which are essential to reveal actual homologies. Results: In this study, we present a statistical approach to detect homologous chromosomal segments based on gene colinearity. We implement this approach in a software package ColinearScan to detect putative colinear regions using a dynamic programming algorithm. Statistical models are proposed to estimate proper parameter values and evaluate the significance of putative homologous regions. Statistical inference, high computational efficiency and flexibility of input data type are three key features of our approach. Conclusion: We apply ColinearScan to the Arabidopsis and rice genomes to detect duplicated regions within each species and homologous fragments between these two species. We find many more homologous chromosomal segments in the rice genome than previously reported. We also find many small colinear segments between rice and Arabidopsis genomes.
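The idea of finding colinear regions by dynamic programming can be illustrated with a simple chaining routine over homologous gene pairs. This is a sketch under stated assumptions, not ColinearScan's algorithm: the function name, the max_gap parameter and the plain chain-length objective are hypothetical, only same-orientation chains are considered, and the statistical significance model from the abstract is not reproduced.

```python
def longest_colinear_chain(pairs, max_gap=5):
    """Longest chain of homologous gene pairs that is colinear.

    `pairs` holds (index_on_chromosome_A, index_on_chromosome_B) gene
    order positions. Consecutive chain members must advance on both
    chromosomes by at most `max_gap` genes. O(n^2) dynamic programming.
    """
    pairs = sorted(set(pairs))
    n = len(pairs)
    if n == 0:
        return []
    best = [1] * n    # best[j]: length of the longest chain ending at pair j
    prev = [-1] * n   # back-pointer used to recover the chain
    for j in range(n):
        aj, bj = pairs[j]
        for i in range(j):
            ai, bi = pairs[i]
            if (0 < aj - ai <= max_gap and 0 < bj - bi <= max_gap
                    and best[i] + 1 > best[j]):
                best[j] = best[i] + 1
                prev[j] = i
    end = max(range(n), key=best.__getitem__)
    chain = []
    while end != -1:
        chain.append(pairs[end])
        end = prev[end]
    return chain[::-1]


# Toy input: four pairs form a colinear run, two are scattered hits.
pairs = [(1, 10), (2, 11), (3, 50), (4, 12), (6, 14), (30, 15)]
chain = longest_colinear_chain(pairs)
# chain == [(1, 10), (2, 11), (4, 12), (6, 14)]
```

Whether a chain of a given length is more than chance agreement is exactly the question the statistical models in the abstract address, and it is left out of this sketch.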
Nucleic Acids Research | 2007
Yong Zhang; Jiong-Tang Li; Lei Kong; Qing-Rong Liu; Liping Wei
Natural antisense transcripts (NATs) are reverse complementary at least in part to the sequences of other endogenous sense transcripts. Most NATs are transcribed from opposite strands of their sense partners. They regulate sense genes at multiple levels and are implicated in various diseases. Using an improved whole-genome computational pipeline, we identified abundant cis-encoded exon-overlapping sense–antisense (SA) gene pairs in human (7356), mouse (6806), fly (1554), and eight other eukaryotic species (total 6534). We developed NATsDB (Natural Antisense Transcripts DataBase) to enable efficient browsing, searching and downloading of this currently most comprehensive collection of SA genes, grouped into six classes based on their overlapping patterns. NATsDB also includes non-exon-overlapping bidirectional (NOB) genes and non-bidirectional (NBD) genes. To facilitate the study of functions, regulations and possible pathological implications, NATsDB includes extensive information about gene structures, poly(A) signals and tails, phastCons conservation, homologues in other species, repeat elements, expressed sequence tag (EST) expression profiles and OMIM disease association. NATsDB supports interactive graphical display of the alignment of all supporting EST and mRNA transcripts of the SA and NOB genes to the genomic loci. It supports advanced search by species, gene name, sequence accession number, chromosome location, coding potential, OMIM association and sequence similarity.
BioMed Research International | 2013
Ming Ma; Adam Yongxin Ye; Weiguo Zheng; Lei Kong
Cas9/CRISPR has been reported to efficiently induce targeted gene disruption and homologous recombination in both prokaryotic and eukaryotic cells. Thus, we developed a Guide RNA Sequence Design Platform for the Cas9/CRISPR silencing system for model organisms. The platform is easy to use for gRNA design with input query sequences. It finds potential targets by PAM and ranks them according to factors including uniqueness, SNP, RNA secondary structure, and AT content. The platform allows users to upload and share their experimental results. In addition, most guide RNA sequences from published papers have been put into our database.
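The first step described above, finding candidate targets by PAM and computing a simple per-guide property such as AT content, can be sketched as follows. This is not the platform's code: the function names, the 20-nt guide length and the NGG PAM are assumptions, and the other ranking factors (uniqueness, SNPs, RNA secondary structure) and their weighting are not modelled.

```python
def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def candidate_guides(seq, guide_len=20):
    """List (guide, pam, strand, at_fraction) for every NGG site.

    Both strands are scanned; guides on the '-' strand are reported
    as read 5'->3' on that strand.
    """
    seq = seq.upper()
    found = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(len(s) - guide_len - 3 + 1):
            pam = s[i + guide_len:i + guide_len + 3]
            if pam[1:] != "GG":  # NGG: any base followed by GG
                continue
            guide = s[i:i + guide_len]
            at_fraction = sum(base in "AT" for base in guide) / guide_len
            found.append((guide, pam, strand, at_fraction))
    return found


# Toy query with a single NGG site on the forward strand.
guides = candidate_guides("ATATATATATGCGCGCGCGC" + "TGG")
# guides == [("ATATATATATGCGCGCGCGC", "TGG", "+", 0.5)]
```

The resulting list could then be sorted by whatever combination of factors a user prefers; the abstract does not state how the platform weights AT content against the other criteria.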
Nucleic Acids Research | 2008
Jiong-Tang Li; Yong Zhang; Lei Kong; Qing-Rong Liu; Liping Wei
Natural antisense transcripts are at least partially complementary to their sense transcripts. Cis-Sense/Antisense pairs (cis-SAs) have been extensively characterized and are known to play diverse regulatory roles, whereas trans-Sense/Antisense pairs (trans-SAs) in animals are poorly studied. We identified long trans-SAs in human and nine other animals, using ESTs to increase coverage significantly over previous studies. The percentage of transcriptional units (TUs) involved in trans-SAs among all TUs was as high as 4.13%. In particular, 2896 human TUs (or 2.89% of all human TUs) were involved in 3327 trans-SAs. Sequence complementarities over multiple segments with predicted RNA hybridization indicated that some trans-SAs might have sophisticated RNA–RNA pairing patterns. One-fourth of human trans-SAs involved noncoding TUs, suggesting that many noncoding RNAs may function by a trans-acting antisense mechanism. TUs in trans-SAs were statistically significantly enriched in nucleic acid binding, ion/protein binding and transport and signal transduction functions and pathways; a significant number of human trans-SAs showed concordant or reciprocal expression patterns; a significant number of human trans-SAs were conserved in mouse. This evidence suggests important regulatory functions of trans-SAs. In 30 cases, trans-SAs were related to cis-SAs through paralogues, suggesting a possible mechanism for the origin of trans-SAs. All trans-SAs are available at http://trans.cbi.pku.edu.cn/.