Shikui Tu

University of Massachusetts Medical School

Publication

Featured research published by Shikui Tu.

Cell | 2014

The HP1 Homolog Rhino Anchors a Nuclear Complex that Suppresses piRNA Precursor Splicing

Zhao Zhang; Jie Wang; Nadine Schultz; Fan Zhang; Swapnil S. Parhad; Shikui Tu; Thom Vreven; Phillip D. Zamore; Zhiping Weng; William E. Theurkauf

piRNAs guide an adaptive genome defense system that silences transposons during germline development. The Drosophila HP1 homolog Rhino is required for germline piRNA production. We show that Rhino binds specifically to the heterochromatic clusters that produce piRNA precursors, and that binding directly correlates with piRNA production. Rhino colocalizes to germline nuclear foci with Rai1/DXO-related protein Cuff and the DEAD box protein UAP56, which are also required for germline piRNA production. RNA sequencing indicates that most cluster transcripts are not spliced and that rhino, cuff, and uap56 mutations increase expression of spliced cluster transcripts over 100-fold. LacI::Rhino fusion protein binding suppresses splicing of a reporter transgene and is sufficient to trigger piRNA production from a trans combination of sense and antisense reporters. We therefore propose that Rhino anchors a nuclear complex that suppresses cluster transcript splicing and speculate that stalled splicing differentiates piRNA precursors from mRNAs.

PLOS ONE | 2016

Accurate Prediction of Transposon-Derived piRNAs by Integrating Various Sequential and Physicochemical Features

Longqiang Luo; Dingfang Li; Wen Zhang; Shikui Tu; Xiaopeng Zhu; Gang Tian

Background: Piwi-interacting RNA (piRNA) is the largest class of small non-coding RNA molecules. Predicting transposon-derived piRNAs can enrich research on small ncRNAs and help to further understand the mechanism of gamete generation. Methods: In this paper, we differentiate transposon-derived piRNAs from non-piRNAs based on their sequential and physicochemical features using machine learning methods. We explore six sequence-derived features, i.e., spectrum profile, mismatch profile, subsequence profile, position-specific scoring matrix, pseudo dinucleotide composition and local structure-sequence triplet elements, and systematically evaluate their performance for transposon-derived piRNA prediction. Finally, we consider two approaches, direct combination and ensemble learning, to integrate useful features and build high-accuracy prediction models. Results: We construct three datasets covering three species (Human, Mouse and Drosophila) and evaluate the prediction models by 10-fold cross validation. In the computational experiments, direct combination models achieve AUC of 0.917, 0.922 and 0.992 on Human, Mouse and Drosophila, respectively; ensemble learning models achieve AUC of 0.922, 0.926 and 0.994 on the three datasets. Conclusions: Compared with other state-of-the-art methods, our methods achieve better performance. The proposed methods are promising for transposon-derived piRNA prediction. The source code and datasets are available in S1 File.
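
As a rough illustration of the simplest feature named in this abstract, a spectrum profile can be read as the normalized count of every k-mer in a sequence. The sketch below rests on that general reading only: the function name `spectrum_profile`, the default `k=2`, and the T-to-U normalization are assumptions made for illustration, and this is not the authors' code from S1 File.

```python
from itertools import product


def spectrum_profile(seq, k=2, alphabet="ACGU"):
    """Return normalized k-mer frequencies, ordered lexicographically by k-mer.

    Hypothetical helper: windows containing characters outside `alphabet`
    (for example N) are skipped rather than counted.
    """
    seq = seq.upper().replace("T", "U")  # assumption: treat DNA input as RNA
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = {kmer: 0 for kmer in kmers}
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    if total == 0:
        # Sequence shorter than k, or no valid window at all.
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]
```

A vector like this has length 4^k, so it can be concatenated with other feature vectors, which is one plausible reading of the "direct combination" approach mentioned in the abstract.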

Bioinformatics and Biomedicine | 2016

Drug side effect prediction through linear neighborhoods and multiple data source integration

Wen Zhang; Yanlin Chen; Shikui Tu; Feng Liu; Qianlong Qu

Predicting drug side effects is a critical task in drug discovery that attracts great attention in both academia and industry. Although many machine learning methods have been proposed, great challenges arise with the boom of precision medicine. On the one hand, many methods assume that similar drugs may share the same side effects, but measuring drug-drug similarity appropriately is challenging. On the other hand, multi-source data provide diverse information for the analysis of side effects and should be integrated for high-accuracy prediction. In this paper, we tackle the side effect prediction problem through linear neighborhoods and multi-source data integration. In the feature space, linear neighborhoods are constructed to extract the drug-drug similarity, named “linear neighborhood similarity”. By transferring the similarity into the side effect space, known side effect information is propagated through the similarity-based graph. Thus, we propose the linear neighborhood similarity method (LNSM), which uses single-source data for side effect prediction. Further, we extend LNSM to deal with multi-source data and propose two data integration methods, the similarity matrix integration method (LNSM-SMI) and the cost minimization integration method (LNSM-CMI), which integrate drug substructure, target, transporter, enzyme, pathway and indication data to improve prediction accuracy. The proposed methods are evaluated on benchmark datasets. LNSM produces satisfactory results on single-source data. The data integration methods (LNSM-SMI and LNSM-CMI) effectively integrate multi-source data and outperform other state-of-the-art side effect prediction methods in cross validation and an independent test. The proposed methods are promising for drug side effect prediction.
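
The propagation step described in this abstract can be sketched with generic graph-based label propagation. The code below is an assumption-laden illustration, not the published LNSM: it takes a row-normalized similarity matrix `W` as given (the linear neighborhood weights themselves are not computed here), and the update rule, the function name `propagate_labels`, and the parameters `alpha` and `n_iter` are hypothetical choices.

```python
def propagate_labels(W, Y, alpha=0.8, n_iter=200):
    """Iterate F <- alpha * W @ F + (1 - alpha) * Y on plain nested lists.

    W: drug-drug similarity matrix (n x n), assumed row-normalized.
    Y: known drug-side-effect matrix (n x m) with 0/1 entries.
    Returns an n x m matrix of propagated scores.
    """
    n, m = len(Y), len(Y[0])
    F = [row[:] for row in Y]  # start from the known labels
    for _ in range(n_iter):
        F = [
            [
                alpha * sum(W[i][j] * F[j][c] for j in range(n))
                + (1 - alpha) * Y[i][c]
                for c in range(m)
            ]
            for i in range(n)
        ]
    return F
```

With `0 < alpha < 1` and a row-normalized `W`, the iteration contracts toward a fixed point, so drugs with no recorded side effect receive scores borrowed from their neighbors while known labels keep a weight of `1 - alpha`.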

Pattern Recognition Letters | 2012

A theoretical investigation of several model selection criteria for dimensionality reduction

Shikui Tu; Lei Xu

Based on the problem of determining the hidden dimensionality (or the number of latent factors) of the Factor Analysis (FA) model, this paper provides a theoretical comparison of several classical model selection criteria, including Akaike's Information Criterion (AIC), Bozdogan's Consistent Akaike's Information Criterion (CAIC), the Hannan-Quinn information criterion (HQC), and Schwarz's Bayesian Information Criterion (BIC). We focus on building up a partial order of the relative underestimation tendency. The order is shown to be AIC, HQC, BIC, and CAIC, from the smallest underestimation probability to the largest. This order largely reflects the order of model selection performance, because underestimations usually account for the major proportion of wrong selections when the sample size and the population signal-to-noise ratio (SNR, defined as the ratio of the smallest variance of the hidden dimensions to the variance of noise) decrease. Synthetic experiments that vary the SNR and the training sample size N verify the theoretical results.
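
One way to see why the order AIC, HQC, BIC, CAIC is plausible is to compare the complexity penalties in the textbook forms of the four criteria, each written as -2 ln L plus a penalty. The sketch below uses those standard definitions rather than anything taken from the paper; the function name `criterion_penalties` and the arguments `k` (number of free parameters) and `n` (sample size) are illustrative assumptions.

```python
import math


def criterion_penalties(k, n):
    """Penalty terms added to -2 * log-likelihood, in standard textbook form.

    k: number of free parameters of the candidate model (assumed given).
    n: sample size; must exceed 1 so that log(log(n)) is defined.
    """
    return {
        "AIC": 2 * k,
        "HQC": 2 * k * math.log(math.log(n)),
        "BIC": k * math.log(n),
        "CAIC": k * (math.log(n) + 1),
    }
```

For sample sizes above roughly 16, the penalties increase in the order AIC, HQC, BIC, CAIC. A heavier penalty favors smaller models, which is consistent with the underestimation order reported in the abstract.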

Nucleic Acids Research | 2015

Comparative functional characterization of the CSR-1 22G-RNA pathway in Caenorhabditis nematodes

Shikui Tu; Monica Z. Wu; Jie Wang; Asher D. Cutter; Zhiping Weng; Julie M. Claycomb

As a champion of small RNA research for two decades, Caenorhabditis elegans has revealed the essential Argonaute CSR-1 to play key nuclear roles in modulating chromatin, chromosome segregation and germline gene expression via 22G-small RNAs. Despite CSR-1 being preserved among diverse nematodes, the conservation and divergence in function of the targets of small RNA pathways remain poorly resolved. Here we apply comparative functional genomic analysis between C. elegans and Caenorhabditis briggsae to characterize the CSR-1 pathway, its targets and their evolution. C. briggsae CSR-1-associated small RNAs that we identified by immunoprecipitation-small RNA sequencing overlap with 22G-RNAs depleted in cbr-csr-1 RNAi-treated worms. By comparing 22G-RNAs and target genes between species, we defined a set of CSR-1 target genes with conserved germline expression, enrichment in operons and more slowly evolving coding sequences than other genes, along with a small group of evolutionarily labile targets. We demonstrate that the association of CSR-1 with chromatin is preserved, and show that depletion of cbr-csr-1 leads to chromosome segregation defects and embryonic lethality. This first comparative characterization of a small RNA pathway in Caenorhabditis establishes a conserved nuclear role for CSR-1 and highlights its key role in germline gene regulation across multiple animal species.

Proteome Science | 2011

A binary matrix factorization algorithm for protein complex prediction

Shikui Tu; Runsheng Chen; Lei Xu

Background: Identifying biologically relevant protein complexes from a large protein-protein interaction (PPI) network is essential to understand the organization of biological systems. However, high-throughput experimental techniques that can produce a large amount of PPIs are known to yield non-negligible rates of false positives and false negatives, making the protein complexes difficult to identify. Results: We propose a binary matrix factorization (BMF) algorithm under Bayesian Ying-Yang (BYY) harmony learning to detect protein complexes by clustering the proteins that share similar interactions through factorizing the binary adjacency matrix of a PPI network. The proposed BYY-BMF algorithm automatically determines the cluster number, whereas this number must be given in advance for most existing BMF algorithms. Also, BYY-BMF's clustering results do not depend on any parameters or thresholds, unlike the Markov Cluster Algorithm (MCL), which relies on a so-called inflation parameter. On synthetic PPI networks, the predictions evaluated against the known annotated complexes indicate that BYY-BMF is more robust than MCL in most cases. On real PPI networks from the MIPS and DIP databases, BYY-BMF obtains better-balanced prediction accuracies than MCL and a spectral analysis method, while MCL has its own advantages, e.g., good separation values.

EMBO Reports | 2015

Glycolytic enzymes localize to ribonucleoprotein granules in Drosophila germ cells, bind Tudor and protect from transposable elements

Ming Gao; Travis Thomson; T. Michael Creed; Shikui Tu; Sudan N. Loganathan; Christina A. Jackson; Patrick McCluskey; Yanyan Lin; Scott E. Collier; Zhiping Weng; Paul Lasko; Melanie D. Ohi; Alexey L. Arkov

Germ cells give rise to all cell lineages in the next generation and are responsible for the continuity of life. In a variety of organisms, germ cells and stem cells contain large ribonucleoprotein granules. Although these particles were discovered more than 100 years ago, their assembly and functions are not well understood. Here we report that glycolytic enzymes are components of these granules in Drosophila germ cells and both their mRNAs and the enzymes themselves are enriched in germ cells. We show that these enzymes are specifically required for germ cell development and that they protect their genomes from transposable elements, providing the first link between metabolism and transposon silencing. We further demonstrate that in the granules, glycolytic enzymes associate with the evolutionarily conserved Tudor protein. Our biochemical and single-particle EM structural analyses of purified Tudor show a flexible molecule and suggest a mechanism for the recruitment of glycolytic enzymes to the granules. Our data indicate that germ cells, similarly to stem cells and tumor cells, might prefer to produce energy through the glycolytic pathway, thus linking a particular metabolism to pluripotency.

Neurocomputing | 2014

Learning local factor analysis versus mixture of factor analyzers with automatic model selection

Lei Shi; Zhi-Yong Liu; Shikui Tu; Lei Xu

Considering Factor Analysis (FA) for each component of Gaussian Mixture Model (GMM), clustering and local dimensionality reduction can be addressed simultaneously by Mixture of Factor Analyzers (MFA) and Local Factor Analysis (LFA), which correspond to two FA parameterizations, respectively. This paper investigates the performance of Variational Bayes (VB) and Bayesian Ying-Yang (BYY) harmony learning on MFA/LFA for the problem of automatically determining the component number and the local hidden dimensionalities (i.e., the number of factors of FA in each component). Similar to the existing VB learning algorithm on MFA, we develop an alternative VB algorithm on LFA with a similar conjugate Dirichlet-Normal-Gamma (DNG) prior on all parameters of LFA. Also, the corresponding BYY algorithms are developed for MFA and LFA. A wide range of synthetic experiments shows that LFA is superior to MFA in model selection under either VB or BYY, while BYY outperforms VB reliably on both MFA and LFA. These empirical findings are consistently observed from real applications on not only face and handwritten digit images clustering, but also unsupervised image segmentation.

Cell Reports | 2018

The Coding Regions of Germline mRNAs Confer Sensitivity to Argonaute Regulation in C. elegans

Meetu Seth; Masaki Shirayama; Wen Tang; En-zhi Shen; Shikui Tu; Heng-Chi Lee; Zhiping Weng; Craig C. Mello

Protein-coding genes undergo a wide array of regulatory interactions with factors that engage non-coding regions. Open reading frames (ORFs), in contrast, are thought to be constrained by coding function, precluding a major role in gene regulation. Here, we explore Piwi-interacting (pi)RNA-mediated transgene silencing in C. elegans and show that marked differences in the sensitivity to piRNA silencing map to the endogenous sequences within transgene ORFs. Artificially increasing piRNA targeting within the ORF of a resistant transgene can lead to a partial yet stable reduction in expression, revealing that piRNAs not only silence but can also “tune” gene expression. Our findings support a model that involves a temporal element to mRNA regulation by germline Argonautes, likely prior to translation, and suggest that piRNAs afford incremental control of germline mRNA expression by targeting the body of the mRNA, including the coding region.

Developmental Cell | 2017

Adaptive Evolution Leads to Cross-Species Incompatibility in the piRNA Transposon Silencing Machinery

Swapnil S. Parhad; Shikui Tu; Zhiping Weng; William E. Theurkauf

Reproductive isolation defines species divergence and is linked to adaptive evolution of hybrid incompatibility genes. Hybrids between Drosophila melanogaster and Drosophila simulans are sterile and phenocopy mutations in the PIWI-interacting RNA (piRNA) pathway, which silences transposons and shows pervasive adaptive evolution. Drosophila rhino and deadlock encode rapidly evolving components of a complex that binds to piRNA clusters. We show that Rhino and Deadlock interact and co-localize in simulans and melanogaster, but simulans Rhino does not bind melanogaster Deadlock, due to substitutions in the rapidly evolving Shadow domain. Significantly, a chimera expressing the simulans Shadow domain in a melanogaster Rhino backbone fails to support piRNA production, disrupts binding to piRNA clusters, and leads to ectopic localization to bulk heterochromatin. Fusing melanogaster Deadlock to simulans Rhino, by contrast, restores localization to clusters. Deadlock binding thus directs Rhino to piRNA clusters, and Rhino-Deadlock co-evolution has produced cross-species incompatibilities, which may contribute to reproductive isolation.
