Sin Lam Tan
Marshfield Clinic
Publication
Featured research published by Sin Lam Tan.
Nature Genetics | 2006
Piero Carninci; Albin Sandelin; Boris Lenhard; Shintaro Katayama; Kazuro Shimokawa; Jasmina Ponjavic; Colin A. Semple; Martin S. Taylor; Pär G. Engström; Martin C. Frith; Alistair R. R. Forrest; Wynand B.L. Alkema; Sin Lam Tan; Charles Plessy; Rimantas Kodzius; Timothy Ravasi; Takeya Kasukawa; Shiro Fukuda; Mutsumi Kanamori-Katayama; Yayoi Kitazume; Hideya Kawaji; Chikatoshi Kai; Mari Nakamura; Hideaki Konno; Kenji Nakano; Salim Mottagui-Tabar; Peter Arner; Alessandra Chesi; Stefano Gustincich; Francesca Persichetti
Mammalian promoters can be separated into two classes, conserved TATA box–enriched promoters, which initiate at a well-defined site, and more plastic, broad and evolvable CpG-rich promoters. We have sequenced tags corresponding to several hundred thousand transcription start sites (TSSs) in the mouse and human genomes, allowing precise analysis of the sequence architecture and evolution of distinct promoter classes. Different tissues and families of genes differentially use distinct types of promoters. Our tagging methods allow quantitative analysis of promoter usage in different tissues and show that differentially regulated alternative TSSs are a common feature in protein-coding genes and commonly generate alternative N termini. Among the TSSs, we identified new start sites associated with the majority of exons and with 3′ UTRs. These data permit genome-scale identification of tissue-specific promoters and analysis of the cis-acting elements associated with them.
Nature Biotechnology | 2004
Vladimir B. Bajic; Sin Lam Tan; Yutaka Suzuki; Sumio Sugano
Promoter prediction programs (PPPs) are important for in silico gene discovery without support from expressed sequence tag (EST)/cDNA/mRNA sequences, in the analysis of gene regulation and in genome annotation. Contrary to previous expectations, a comprehensive analysis of PPPs reveals that no program simultaneously achieves sensitivity and a positive predictive value >65%. PPP performances deduced from a limited number of chromosomes or smaller data sets do not hold when evaluated at the level of the whole genome, with serious inaccuracy of predictions for non-CpG-island-related promoters. Some PPPs even perform worse than, or close to, pure random guessing.
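The two measures named in this abstract can be illustrated with a minimal sketch. The function name and the counts below are hypothetical and are not taken from the paper; the sketch only assumes the standard definitions of sensitivity, TP / (TP + FN), and positive predictive value, TP / (TP + FP).

```python
from typing import Tuple


def sensitivity_and_ppv(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Return (sensitivity, positive predictive value) from prediction counts.

    tp: predictions that match a real promoter (true positives)
    fp: predictions that match no real promoter (false positives)
    fn: real promoters that were not predicted (false negatives)
    """
    if tp + fn == 0 or tp + fp == 0:
        raise ValueError("need at least one real promoter and one prediction")
    sensitivity = tp / (tp + fn)  # share of real promoters that were found
    ppv = tp / (tp + fp)          # share of predictions that were correct
    return sensitivity, ppv


# Hypothetical counts, for illustration only (not data from the study).
se, ppv = sensitivity_and_ppv(tp=700, fp=300, fn=300)
print(f"sensitivity={se:.2f}, ppv={ppv:.2f}")
```

Under these definitions, a program can raise one measure at the expense of the other, which is why the abstract reports whether both exceed the 65% threshold at once.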
Nucleic Acids Research | 2003
Vladimir B. Bajic; Sin Lam Tan; Allen Chong; Suisheng Tang; Anders Ström; Jan Åke Gustafsson; Chin-Yo Lin; Edison T. Liu
We present a unique program for identification of estrogen response elements (EREs) in genomic DNA and related analyses. The detection algorithm was tested on several large datasets and makes one prediction in 13 300 nt while achieving a sensitivity of 83%. Users can further investigate selected regions around the identified ERE patterns for transcription factor binding sites based on the TRANSFAC database. It is also possible to search for candidate human genes with a match for the identified EREs and their flanking regions within EPD annotated promoters. Additionally, users can search among the extended promoter regions of approximately 11 000 human genes for those that have a high degree of similarity to the identified ERE patterns. Dragon ERE Finder version 2 is freely available for academic and non-profit users (http://sdmc.lit.org.sg/ERE-V2/index).
PLOS Genetics | 2006
Martin C. Frith; Laurens Wilming; Alistair Raymond Russell Forrest; Hideya Kawaji; Sin Lam Tan; Claes Wahlestedt; Vladimir B. Bajic; Chikatoshi Kai; Jun Kawai; Piero Carninci; Yoshihide Hayashizaki; Timothy L. Bailey; Lukasz Huminiecki
The mammalian transcriptome harbours shadowy entities that resist classification and analysis. In analogy with pseudogenes, we define pseudo–messenger RNA to be RNA molecules that resemble protein-coding mRNA, but cannot encode full-length proteins owing to disruptions of the reading frame. Using a rigorous computational pipeline, which rules out sequencing errors, we identify 10,679 pseudo–messenger RNAs (approximately half of which are transposon-associated) among the 102,801 FANTOM3 mouse cDNAs: just over 10% of the FANTOM3 transcriptome. These comprise not only transcribed pseudogenes, but also disrupted splice variants of otherwise protein-coding genes. Some may encode truncated proteins, only a minority of which appear subject to nonsense-mediated decay. The presence of an excess of transcripts whose only disruptions are opal stop codons suggests that there are more selenoproteins than currently estimated. We also describe compensatory frameshifts, where a segment of the gene has changed frame but remains translatable. In summary, we survey a large class of non-standard but potentially functional transcripts that are likely to encode genetic information and effect biological processes in novel ways. Many of these transcripts do not correspond cleanly to any identifiable object in the genome, implying fundamental limits to the goal of annotating all functional elements at the genome sequence level.
BMC Bioinformatics | 2006
Manisha Brahmachary; Christian Schönbach; Liang Yang; Enli Huang; Sin Lam Tan; Rajesh Chowdhary; S. P. T. Krishnan; Chin-Yo Lin; David A. Hume; Chikatoshi Kai; Jun Kawai; Piero Carninci; Yoshihide Hayashizaki; Vladimir B. Bajic
Background: Mammalian antimicrobial peptides (AMPs) are effectors of the innate immune response. A multitude of signals coming from pathways of mammalian pathogen/pattern recognition receptors and other proteins affect the expression of AMP-coding genes (AMPcgs). For many AMPcgs the promoter elements and transcription factors that control their tissue cell-specific expression have yet to be fully identified and characterized. Results: Based upon the RIKEN full-length cDNA and public sequence data derived from human, mouse and rat, we identified 178 candidate AMP transcripts derived from 61 genes belonging to 29 AMP families. However, only for 31 mouse genes belonging to 22 AMP families we were able to determine true orthologous relationships with 30 human and 15 rat sequences. We screened the promoter regions of AMPcgs in the three species for motifs by an ab initio motif finding method and analyzed the derived promoter characteristics. Promoter models were developed for alpha-defensins, penk and zap AMP families. The results suggest a core set of transcription factors (TFs) that regulate the transcription of AMPcg families in mouse, rat and human. The three most frequent core TFs groups include liver-, nervous system-specific and nuclear hormone receptors (NHRs). Out of 440 motifs analyzed, we found that three represent potentially novel TF-binding motifs enriched in promoters of AMPcgs, while the other four motifs appear to be species-specific. Conclusion: Our large-scale computational analysis of promoters of 22 families of AMPcgs across three mammalian species suggests that their key transcriptional regulators are likely to be TFs of the liver-, nervous system-specific and NHR groups. The computationally inferred promoter elements and potential TF binding motifs provide a rich resource for targeted experimental validation of TF binding and signaling studies that aim at the regulation of mouse, rat or human AMPcgs.
Nucleic Acids Research | 2007
Suisheng Tang; Zhuo Zhang; Sin Lam Tan; Man-Hung Eric Tang; Arun Prashanth Kumar; Suresh Kumar Ramadoss; Vladimir B. Bajic
Estrogen has a profound impact on human physiology affecting transcription of numerous genes. To decipher functional characteristics of estrogen responsive genes, we developed KnowledgeBase for Estrogen Responsive Genes (KBERG). Genes in KBERG were derived from Estrogen Responsive Gene Database (ERGDB) and were analyzed from multiple aspects. We explored the possible transcription regulation mechanism by capturing highly conserved promoter motifs across orthologous genes, using promoter regions that cover the range of [−1200, +500] relative to the transcription start sites. The motif detection is based on ab initio discovery of common cis-elements from the orthologous gene cluster from human, mouse and rat, thus reflecting a degree of promoter sequence preservation during evolution. The identified motifs are linked to transcription factor binding sites based on the TRANSFAC database. In addition, KBERG uses two established ontology systems, GO and eVOC, to associate genes with their function. Users may assess gene functionality through the description terms in GO. Alternatively, they can gain gene co-expression information through evidence from human EST libraries via eVOC. KBERG is a user-friendly system that provides links to other relevant resources such as ERGDB, UniGene, Entrez Gene, HomoloGene, GO, eVOC and GenBank, and thus offers a platform for functional exploration and potential annotation of genes responsive to estrogen. KBERG database can be accessed at .
PLOS ONE | 2012
Rajesh Chowdhary; Sin Lam Tan; Jinfeng Zhang; Shreyas Karnik; Vladimir B. Bajic; Jun S. Liu
Background: Protein interaction networks (PINs) specific within a particular context contain crucial information regarding many cellular biological processes. For example, PINs may include information on the type and directionality of interaction (e.g. phosphorylation), location of interaction (i.e. tissues, cells), and related diseases. Currently, very few tools are capable of deriving context-specific PINs for conducting exploratory analysis. Results: We developed a literature-based online system, Context-specific Protein Network Miner (CPNM), which derives context-specific PINs in real-time from the PubMed database based on a set of user-input keywords and enhanced PubMed query system. CPNM reports enriched information on protein interactions (with type and directionality), their network topology with summary statistics (e.g. most densely connected proteins in the network; most densely connected protein-pairs; and proteins connected by most inbound/outbound links) that can be explored via a user-friendly interface. Some of the novel features of the CPNM system include PIN generation, ontology-based PubMed query enhancement, real-time, user-queried, up-to-date PubMed document processing, and prediction of PIN directionality. Conclusions: CPNM provides a tool for biologists to explore PINs. It is freely accessible at http://www.biotextminer.com/CPNM/.
Data Mining in Bioinformatics | 2013
Rajesh Chowdhary; Jinfeng Zhang; Sin Lam Tan; Daniel E. Osborne; Vladimir B. Bajic; Jun S. Liu
Information on Protein Interactions (PIs) is valuable for biomedical research, but often lies buried in the scientific literature and cannot be readily retrieved. While much progress has been made over the years in extracting PIs from the literature using computational methods, there is a lack of free, public, user-friendly tools for the discovery of PIs. We developed an online tool for the extraction of PI relationships from PubMed abstracts, which we name PIMiner. Protein pairs and the words that describe their interactions are reported by PIMiner so that new interactions can be easily detected within text. The interaction likelihood levels are reported too. The option to extract only specific types of interactions is also provided. The PIMiner server can be accessed through a web browser or remotely through a client's command line. PIMiner can process 50,000 PubMed abstracts in approximately 7 min and thus appears suitable for large-scale processing of biological/biomedical literature.
International Conference of the IEEE Engineering in Medicine and Biology Society | 2012
Shreyas Karnik; Sin Lam Tan; Bess Berg; Ingrid Glurich; Jinfeng Zhang; Humberto Vidaillet; C. David Page; Rajesh Chowdhary
Electronic Health Records (EHR) contain large amounts of useful information that could potentially be used for building models for predicting onset of diseases. In this study, we have investigated the use of free-text and coded data in Marshfield Clinic's EHR, individually and in combination, for building machine learning-based models to predict the first-ever episode of atrial fibrillation and/or atrial flutter (AFF). We trained and evaluated our AFF models on the EHR data across different time intervals (1, 3, 5 and all years) prior to first documented onset of AFF. We applied several machine learning methods, including naïve Bayes, support vector machines (SVM), logistic regression and random forests, for building AFF prediction models and evaluated these using a 10-fold cross-validation approach. On text-based datasets, the best model achieved an F-measure of 60.1%, when applied exclusively to coded data. The combination of textual and coded data achieved comparable performance. The study results attest to the relative merit of utilizing textual data to complement the use of coded data for disease onset prediction modeling.
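The evaluation terms used in this abstract, 10-fold cross-validation and F-measure, can be sketched in a few lines. This is an illustrative sketch only: the function names and counts are hypothetical, the folds are simple contiguous splits, and nothing here reproduces the study's actual pipeline or data.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for k contiguous, near-equal folds."""
    # The first (n_samples % k) folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall (F1); 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical usage: 25 records split into 10 folds, plus one F-measure.
for train, test in k_fold_indices(25, k=10):
    pass  # a model would be fit on `train` and scored on `test` here
print(f_measure(tp=60, fp=40, fn=40))
```

Each record appears in exactly one test fold, so every record contributes once to the reported score.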
American Journal of Respiratory Cell and Molecular Biology | 2012
Rajesh Chowdhary; Sin Lam Tan; Giulio Pavesi; Jingjing Jin; Difeng Dong; Sameer K. Mathur; Arthur Burkart; Vipin Narang; Ingrid Glurich; Benjamin A. Raby; Scott T. Weiss; Limsoon Wong; Jun S. Liu; Vladimir B. Bajic
Many genes have been implicated in the pathogenesis of common respiratory and related diseases (RRDs), yet the underlying mechanisms are largely unknown. Differential gene expression patterns in diseased and healthy individuals suggest that RRDs affect or are affected by modified transcription regulation programs. It is thus crucial to characterize implicated genes in terms of transcriptional regulation. For this purpose, we conducted a promoter analysis of genes associated with 11 common RRDs including allergic rhinitis, asthma, bronchiectasis, bronchiolitis, bronchitis, chronic obstructive pulmonary disease, cystic fibrosis, emphysema, eczema, psoriasis, and urticaria, many of which are thought to be genetically related. The objective of the present study was to obtain deeper insight into the transcriptional regulation of these disease-associated genes by annotating their promoter regions with transcription factors (TFs) and TF binding sites (TFBSs). We discovered many TFs that are significantly enriched in the target disease groups including associations that have been documented in the literature. We also identified a number of putative TFs/TFBSs that appear to be novel. The results of our analysis are provided in an online database that is freely accessible to researchers at http://www.respiratorygenomics.com. Promoter-associated TFBS information and related genomic features, such as histone modification sites, microsatellites, CpG islands, and SNPs, are graphically summarized in the database. Users can compare and contrast underlying mechanisms of specific RRDs relative to candidate genes, TFs, gene ontology terms, micro-RNAs, and biological pathways for the conduct of metaanalyses. This database represents a novel, useful resource for RRD researchers.