Publication


Featured research published by Chih-yu Chen.


Nucleic Acids Research | 2014

JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles

Anthony Mathelier; Xiaobei Zhao; Allen W. Zhang; François Parcy; Rebecca Worsley-Hunt; David J. Arenillas; Sorana Buchman; Chih-yu Chen; Alice Yi Chou; Hans Ienasescu; Jonathan S. Lim; Casper Shyr; Ge Tan; Michelle Zhou; Boris Lenhard; Albin Sandelin; Wyeth W. Wasserman

JASPAR (http://jaspar.genereg.net) is the largest open-access database of matrix-based nucleotide profiles describing the binding preference of transcription factors from multiple species. The fifth major release greatly expands the heart of JASPAR—the JASPAR CORE subcollection, which contains curated, non-redundant profiles—with 135 new curated profiles (74 in vertebrates, 8 in Drosophila melanogaster, 10 in Caenorhabditis elegans and 43 in Arabidopsis thaliana; a 30% increase in total) and 43 older updated profiles (36 in vertebrates, 3 in D. melanogaster and 4 in A. thaliana; a 9% update in total). The new and updated profiles are mainly derived from published chromatin immunoprecipitation-seq experimental datasets. In addition, the web interface has been enhanced with advanced capabilities in browsing, searching and subsetting. Finally, the new JASPAR release is accompanied by a new BioPython package, a new R tool package and a new R/Bioconductor data package to facilitate access for both manual and automated methods.


Nucleic Acids Research | 2016

JASPAR 2016: a major expansion and update of the open-access database of transcription factor binding profiles

Anthony Mathelier; Oriol Fornes; David J. Arenillas; Chih-yu Chen; Grégoire Denay; Jessica Lee; Wenqiang Shi; Casper Shyr; Ge Tan; Rebecca Worsley-Hunt; Allen W. Zhang; François Parcy; Boris Lenhard; Albin Sandelin; Wyeth W. Wasserman

JASPAR (http://jaspar.genereg.net) is an open-access database storing curated, non-redundant transcription factor (TF) binding profiles representing transcription factor binding preferences as position frequency matrices for multiple species in six taxonomic groups. For this 2016 release, we expanded the JASPAR CORE collection with 494 new TF binding profiles (315 in vertebrates, 11 in nematodes, 3 in insects, 1 in fungi and 164 in plants) and updated 59 profiles (58 in vertebrates and 1 in fungi). The introduced profiles represent an 83% expansion and 10% update when compared to the previous release. We updated the structural annotation of the TF DNA binding domains (DBDs) following a published hierarchical structural classification. In addition, we introduced 130 transcription factor flexible models trained on ChIP-seq data for vertebrates, which capture dinucleotide dependencies within TF binding sites. This new JASPAR release is accompanied by a new web tool to infer JASPAR TF binding profiles recognized by a given TF protein sequence. Moreover, we provide the users with a Ruby module complementing the JASPAR API to ease programmatic access and use of the JASPAR collection of profiles. Finally, we provide the JASPAR2016 R/Bioconductor data package with the data of this release.
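The abstract above describes binding preferences stored as position frequency matrices. As a minimal sketch of how such a count matrix is commonly turned into log-odds weights for scoring, the Python below uses a made-up 4-position matrix; the counts, the pseudocount of 0.8, and the uniform background are illustrative assumptions, not values taken from JASPAR.

```python
import math

BASES = "ACGT"

# Hypothetical count matrix (rows: A, C, G, T; columns: motif positions).
# These numbers are invented for illustration and are not a real JASPAR profile.
PFM = {
    "A": [8, 0, 1, 0],
    "C": [1, 0, 0, 9],
    "G": [0, 10, 0, 1],
    "T": [1, 0, 9, 0],
}


def pfm_to_pwm(pfm, pseudocount=0.8, background=0.25):
    """Convert counts to log2-odds weights against a uniform background.

    The pseudocount is spread across the four bases in proportion to the
    background, so the smoothed frequencies in each column still sum to 1.
    """
    length = len(pfm["A"])
    pwm = {base: [] for base in BASES}
    for i in range(length):
        column_total = sum(pfm[base][i] for base in BASES) + pseudocount
        for base in BASES:
            freq = (pfm[base][i] + pseudocount * background) / column_total
            pwm[base].append(math.log2(freq / background))
    return pwm
```

Positive weights mark bases that are more frequent than the background at a position; negative weights mark bases that are less frequent.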


Human Molecular Genetics | 2014

Spread of X-chromosome inactivation into autosomal sequences: role for DNA elements, chromatin features and chromosomal domains

Allison M. Cotton; Chih-yu Chen; Lucia L. Lam; Wyeth W. Wasserman; Michael S. Kobor; Carolyn J. Brown

X-chromosome inactivation results in dosage equivalence between the X chromosome in males and females; however, over 15% of human X-linked genes escape silencing and these genes are enriched on the evolutionarily younger short arm of the X chromosome. The spread of inactivation onto translocated autosomal material allows the study of inactivation without the confounding evolutionary history of the X chromosome. The heterogeneity and reduced extent of silencing on autosomes are evidence for the importance of DNA elements underlying the spread of silencing. We have assessed DNA methylation in six unbalanced X-autosome translocations using the Illumina Infinium HumanMethylation450 array. Between 2% and 42% of translocated autosomal genes showed this mark of silencing, with the highest degree of inactivation observed for trisomic autosomal regions. Generally, the extent of silencing was greatest close to the translocation breakpoint; however, silencing was detected well over 100 kb into the autosomal DNA. Alu elements were found to be enriched at autosomal genes that escaped from inactivation, while L1s were enriched at genes subject to inactivation. In cells without the translocation, there was enrichment of heterochromatic features such as EZH2 and H3K27me3 for those genes that become silenced when translocated, suggesting that underlying chromatin structure predisposes genes towards silencing. Additionally, the analysis of topological domains indicated physical clustering of autosomal genes of common inactivation status. Overall, our analysis indicated a complex interaction between DNA sequence, chromatin features and the three-dimensional structure of the chromosome.


BMC Medical Genomics | 2014

On the identification of potential regulatory variants within genome wide association candidate SNP sets

Chih-yu Chen; I-Shou Chang; Chao A Hsiung; Wyeth W. Wasserman

Background: Genome wide association studies (GWAS) are a population-scale approach to the identification of segments of the genome in which genetic variations may contribute to disease risk. Current methods focus on the discovery of single nucleotide polymorphisms (SNPs) associated with disease traits. As there are many SNPs within identified risk loci, and the majority of these are situated within non-coding regions, a key challenge is to identify and prioritize variants affecting regulatory sequences that are likely to contribute to the phenotype assessed. Methods: We focused investigation on SNPs within lung and breast cancer GWAS loci that reached genome-wide significance for potential roles in gene regulation with a specific focus on SNPs likely to disrupt transcription factor binding sites. Within risk loci, the regulatory potential of sub-regions was classified using relevant open chromatin and epigenetic high throughput sequencing data sets from the ENCODE project in available cancer and normal cell lines. Furthermore, transcription factor affinity altering variants were predicted by comparison of position weight matrix scores between disease and reference alleles. Lastly, ChIP-seq data of transcription associated factors and topological domains were included as binding evidence and potential gene target inference. Results: The sets of SNPs, including both the disease-associated markers and those in high linkage disequilibrium with them, were significantly over-represented in regulatory sequences of cancer and/or normal cells; however, over-representation was generally not restricted to disease-relevant tissue specific regions. The calculated regulatory potential, allelic binding affinity scores and ChIP-seq binding evidence were the three criteria used to prioritize candidates. Fitting all three criteria, we highlighted breast cancer susceptibility SNPs and a borderline lung cancer relevant SNP located in cancer-specific enhancers overlapping multiple distinct transcription associated factor ChIP-seq binding sites. Conclusion: Incorporating high throughput sequencing epigenetic and transcription factor data sets from both cancer and normal cells into cancer genetic studies reveals potential functional SNPs and informs subsequent characterization efforts.
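The Methods above mention comparing position weight matrix scores between disease and reference alleles. The sketch below shows one simple way such a comparison could be set up; it is not the authors' pipeline. The matrix values, the helper names (`score_window`, `best_score`, `allele_score_change`), the example sequences, and the forward-strand-only scan are all assumptions made for illustration.

```python
# Hypothetical log-odds PWM for a 4-bp motif (rows: A, C, G, T).
# The weights are invented; a real analysis would load a curated profile.
PWM = {
    "A": [1.2, -2.0, -1.5, -2.0],
    "C": [-1.0, -2.0, -2.0, 1.5],
    "G": [-2.0, 1.8, -2.0, -1.0],
    "T": [-1.0, -2.0, 1.6, -2.0],
}
MOTIF_LEN = 4


def score_window(window):
    """Sum the per-position weights for one motif-length window."""
    return sum(PWM[base][i] for i, base in enumerate(window))


def best_score(seq):
    """Best score over all motif-length windows (forward strand only)."""
    return max(
        score_window(seq[i:i + MOTIF_LEN])
        for i in range(len(seq) - MOTIF_LEN + 1)
    )


def allele_score_change(flank5, ref, alt, flank3):
    """Alternate-allele best score minus reference-allele best score.

    With flanks of MOTIF_LEN - 1 bases, every scanned window overlaps the SNP.
    """
    return best_score(flank5 + alt + flank3) - best_score(flank5 + ref + flank3)


# Example: a G-to-A change inside a site resembling the motif consensus.
delta = allele_score_change("TTA", "G", "A", "TCA")
print(f"score change (alt - ref): {delta:.2f}")
```

A negative change suggests the alternate allele weakens the predicted site and a positive change suggests it strengthens it. A fuller treatment would also scan the reverse strand and apply a score threshold.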


Journal of Computational Biology | 2016

Deep Feature Selection: Theory and Application to Identify Enhancers and Promoters

Yifeng Li; Chih-yu Chen; Wyeth W. Wasserman

Sparse linear models approximate target variable(s) by a sparse linear combination of input variables. Since they are simple, fast, and able to select features, they are widely used in classification and regression. Essentially they are shallow feed-forward neural networks that have three limitations: (1) inability to model nonlinearity of features, (2) inability to learn high-level features, and (3) unnatural extensions to select features in a multiclass case. Deep neural networks are models structured by multiple hidden layers with nonlinear activation functions. Compared with linear models, they have two distinctive strengths: the capability to (1) model complex systems with nonlinear structures and (2) learn high-level representation of features. Deep learning has been applied in many large and complex systems where deep models significantly outperform shallow ones. However, feature selection at the input level, which is very helpful to understand the nature of a complex system, is still not well studied. In genome research, the cis-regulatory elements in noncoding DNA sequences play a key role in the expression of genes. Since the activity of regulatory elements involves highly interactive factors, a deep tool is strongly needed to discover informative features. In order to address the above limitations of shallow and deep models for selecting features of a complex system, we propose a deep feature selection (DFS) model that (1) takes advantage of deep structures to model nonlinearity and (2) conveniently selects a subset of features right at the input level for multiclass data. Simulation experiments convince us that this model is able to correctly identify both linear and nonlinear features. We applied this model to the identification of active enhancers and promoters by integrating multiple sources of genomic information. Results show that our model outperforms elastic net in terms of size of discriminative feature subset and classification accuracy.
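The abstract does not spell out how features are selected "right at the input level". One plausible reading, sketched below as an assumption rather than the authors' architecture, is to give each input feature its own scalar weight ahead of a nonlinear network and to apply a sparsity-inducing elastic-net penalty to those weights. All function names, the tanh/sigmoid choices, and the parameter values are hypothetical.

```python
import math


def forward(x, w_select, W_hidden, b_hidden, w_out, b_out):
    """Tiny network: per-feature input weights -> tanh hidden layer -> sigmoid."""
    # Each feature is scaled by its own weight; a zero weight removes it.
    gated = [xi * wi for xi, wi in zip(x, w_select)]
    hidden = [
        math.tanh(sum(w * g for w, g in zip(row, gated)) + b)
        for row, b in zip(W_hidden, b_hidden)
    ]
    z = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return 1.0 / (1.0 + math.exp(-z))


def elastic_net_penalty(w, lam=0.01, alpha=0.5):
    """lam * (alpha * L1 norm + (1 - alpha) / 2 * squared L2 norm)."""
    l1 = sum(abs(v) for v in w)
    l2_squared = sum(v * v for v in w)
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2_squared)


def selected_features(w_select, tol=1e-8):
    """Indices of features whose input-level weight is effectively non-zero."""
    return [i for i, v in enumerate(w_select) if abs(v) > tol]
```

In this reading, training would minimize the classification loss plus `elastic_net_penalty(w_select)`, and the features remaining in `selected_features(w_select)` form the selected subset. The training loop is omitted here.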


BioSystems | 2015

The identification of cis-regulatory elements: A review from a machine learning perspective

Yifeng Li; Chih-yu Chen; Alice M. Kaye; Wyeth W. Wasserman

The majority of the human genome consists of non-coding regions that have been called junk DNA. However, recent studies have unveiled that these regions contain cis-regulatory elements, such as promoters, enhancers, silencers, and insulators. These regulatory elements can play crucial roles in controlling gene expression in specific cell types, conditions, and developmental stages. Disruption to these regions could contribute to phenotype changes. Precisely identifying regulatory elements is key to deciphering the mechanisms underlying transcriptional regulation. Cis-regulatory events are complex processes that involve chromatin accessibility, transcription factor binding, DNA methylation, histone modifications, and the interactions between them. The development of next-generation sequencing techniques has allowed us to capture these genomic features in depth. Applied analysis of genome sequences for clinical genetics has increased the urgency for detecting these regions. However, the complexity of cis-regulatory events and the deluge of sequencing data require accurate and efficient computational approaches, in particular, machine learning techniques. In this review, we describe machine learning approaches for predicting transcription factor binding sites, enhancers, and promoters, primarily driven by next-generation sequencing data. Data sources are provided in order to facilitate testing of novel methods. The purpose of this review is to attract computational experts and data scientists to advance this field.


Molecular Therapy. Methods & Clinical Development | 2016

PAX6 MiniPromoters drive restricted expression from rAAV in the adult mouse retina

Jack W. Hickmott; Chih-yu Chen; David J. Arenillas; Andrea J. Korecki; Siu Ling Lam; Laurie L. Molday; Russell J. Bonaguro; Michelle Zhou; Alice Y Chou; Anthony Mathelier; Sanford L. Boye; William W. Hauswirth; Robert S. Molday; Wyeth W. Wasserman; Elizabeth Simpson

Current gene therapies predominantly use small, strong, and readily available ubiquitous promoters. However, as the field matures, the availability of small, cell-specific promoters would be greatly beneficial. Here we design seven small promoters from the human paired box 6 (PAX6) gene and test them in the adult mouse retina using recombinant adeno-associated virus. We chose the retina due to previous successes in gene therapy for blindness, and the PAX6 gene since it is: well studied; known to be driven by discrete regulatory regions; expressed in therapeutically interesting retinal cell types; and mutated in the vision-loss disorder aniridia, which is in need of improved therapy. At the PAX6 locus, 31 regulatory regions were bioinformatically predicted, and nine regulatory regions were constructed into seven MiniPromoters. Driving Emerald GFP, these MiniPromoters were packaged into recombinant adeno-associated virus, and injected intravitreally into postnatal day 14 mice. Four MiniPromoters drove consistent retinal expression in the adult mouse, driving expression in combinations of cell-types that endogenously express Pax6: ganglion, amacrine, horizontal, and Müller glia. Two PAX6-MiniPromoters drive expression in three of the four cell types that express PAX6 in the adult mouse retina. Combined, they capture all four cell types, making them potential tools for research, and PAX6-gene therapy for aniridia.


bioRxiv | 2016

A role for YY1 in sex-biased transcription revealed through X-linked promoter activity and allelic binding analyses

Chih-yu Chen; Wenqiang Shi; Allison Matthews; Yifeng Li; David J. Arenillas; Anthony Mathelier; Masayoshi Itoh; Hideya Kawaji; Timo Lassmann; Yoshihide Hayashizaki; Piero Carninci; Alistair R. R. Forrest; Carolyn J. Brown; Wyeth W. Wasserman

Sex differences in susceptibility and progression have been reported in numerous diseases. Female cells have two copies of the X chromosome with X-chromosome inactivation imparting mono-allelic gene silencing for dosage compensation. However, a subset of genes, named escapees, escape silencing and are transcribed bi-allelically resulting in sexual dimorphism. Here we conducted analyses of the sexes using human datasets to gain perspectives in such regulation. We identified transcription start sites of escapees (escTSSs) based on higher transcription levels in female cells using FANTOM5 CAGE data. Significant over-representations of YY1 transcription factor binding motif and ChIP-seq peaks around escTSSs highlighted its positive association with escapees. Furthermore, YY1 occupancy is significantly biased towards the inactive X (Xi) at long non-coding RNA loci that are frequent contacts of Xi-specific superloops. Our study elucidated the importance of YY1 on transcriptional activity on Xi in general through sequence-specific binding, and its involvement at superloop anchors.


Molecular Therapy | 2015

599. Deep Informatics Utilized to Design MiniPromoters for Driving PAX6-Like Retinal Expression with AAV

Jack W. Hickmott; Chih-yu Chen; David J. Arenillas; Yifeng Li; Laurie L. Molday; Andrea J. Korecki; Siu Ling Lam; Russell J. Bonaguro; Michelle Zhou; Alice Y. Chan; Sanford L. Boye; William W. Hauswirth; Robert S. Molday; Wyeth W. Wasserman; Elizabeth Simpson

Purpose: Gene-based therapies are making a comeback, as evidenced by the clinical success of Glybera, rAAV2.RPE65, and rAAV2.REP1. These therapies utilize ubiquitous promoters; however, future gene therapies may benefit from promoters that can direct tissue- and cell-type specific expression. Capturing the expression of a human gene in a promoter small enough to use in AAV is challenging. Paired box six (PAX6) encodes a transcription factor with a complex and potentially therapeutically-useful retinal-expression pattern. Also, PAX6 has several previously published discrete regulatory regions (RRs), and therefore may contain modular regulation suitable for MiniPromoter design. Finally, mutations in PAX6 cause the vision-loss disorder aniridia, and a MiniPromoter capable of recapitulating the expression of PAX6 would be a useful tool for an aniridia gene therapy. Methods: PAX6 RRs were predicted using Hi-C, UCSC Genes, CAGE, 100 vertebrate phastCons, transcription factor binding site (TFBS), Segway and ChromHMM data. MiniPromoters were cloned into AAV genomes containing an emGFP reporter and WPRE, packaged into AAV2(Y272F, Y444F, Y500F, Y730F, T491V), and administered to postnatal day 14 mice by intravitreal injection. Ocular tissue was collected 30 days after injection, emGFP expression was evaluated by epifluorescent imaging of mouse retinas, and staining with antibodies against PAX6, Brn3a, syntaxin, and calbindin. MiniPromoters driving PAX6-like expression were examined for unique TFBSs using oPOSSUM 3. Results: A PAX6-containing highly interactive domain was revealed within which most previously published PAX6 RRs are located. Within this domain, 31 RRs were bioinformatically predicted, nine of which were selected to construct seven MiniPromoters. Of these MiniPromoters, two did not drive interesting emGFP expression in the retina, and five drove expression in the two retinal layers that express PAX6 (ganglion cell and inner nuclear). Twenty-one TFBSs were found to be unique to these MiniPromoters. Interestingly, one of the five (Ple255) not only drives expression in the same cell layers that express PAX6, but also in all the same cell types (ganglion, amacrine, and horizontal). Conclusions: Bioinformatic approaches can be used to design MiniPromoters capturing the endogenous expression pattern of a gene for a specific tissue. The resulting MiniPromoters may be important tools for future retinal gene therapy, with Ple255 being especially applicable to gene therapy for aniridia.


The Annals of Applied Statistics | 2012

A Bayesian measurement error model for two-channel cell-based RNAi data with replicates

Chung-Hsing Chen; Wen-Chi Su; Chih-yu Chen; Jing-Ying Huang; Fang-Yu Tsai; Wen-Chang Wang; Chao A. Hsiung; King-Song Jeng; I-Shou Chang

RNA interference (RNAi) is an endogenous cellular process in which small double-stranded RNAs lead to the destruction of mRNAs with complementary nucleoside sequence. With the production of RNAi libraries, large-scale RNAi screening in human cells can be conducted to identify unknown genes involved in a biological pathway. One challenge researchers face is how to deal with the multiple testing issue and the related false positive rate (FDR) and false negative rate (FNR). This paper proposes a Bayesian hierarchical measurement error model for the analysis of data from a two-channel RNAi high-throughput experiment with replicates, in which both the activity of a particular biological pathway and cell viability are monitored and the goal is to identify short hair-pin RNAs (shRNAs) that affect the pathway activity without affecting cell activity. Simulation studies demonstrate the flexibility and robustness of the Bayesian method and the benefits of having replicates in the experiment. This method is illustrated by analyzing the data from an RNAi high-throughput screen that searches for cellular factors affecting HCV replication without affecting cell viability; comparisons of the results from this HCV study and some of those reported in the literature are included.

Collaboration


Dive into Chih-yu Chen's collaborations.

Top Co-Authors

Wyeth W. Wasserman

University of British Columbia

David J. Arenillas

University of British Columbia

Yifeng Li

University of British Columbia

Michelle Zhou

University of British Columbia

I-Shou Chang

National Health Research Institutes

Allen W. Zhang

University of British Columbia

Andrea J. Korecki

University of British Columbia

Carolyn J. Brown

University of British Columbia

Casper Shyr

University of British Columbia
