Publication


Featured research published by De-Shuang Huang.


European Conference on Computer Vision | 2012

Robust and efficient subspace segmentation via least squares regression

Canyi Lu; Hai Min; Zhong-Qiu Zhao; Lin Zhu; De-Shuang Huang; Shuicheng Yan

This paper studies the subspace segmentation problem, which aims to segment data drawn from a union of multiple linear subspaces. Recent works using sparse representation, low rank representation and their extensions have attracted much attention. If the subspaces from which the data are drawn are independent or orthogonal, these methods are able to obtain a block diagonal affinity matrix, which usually leads to a correct segmentation. The main differences among them are their objective functions. We theoretically show that if the objective function satisfies some conditions, and the data are sufficiently drawn from independent subspaces, the obtained affinity matrix is always block diagonal. Furthermore, the data sampling can be insufficient if the subspaces are orthogonal. Some existing methods are all special cases. Then we present the Least Squares Regression (LSR) method for subspace segmentation. It takes advantage of data correlation, which is common in real data. LSR encourages a grouping effect which tends to group highly correlated data together. Experimental results on the Hopkins 155 database and Extended Yale Database B show that our method significantly outperforms state-of-the-art methods. Beyond segmentation accuracy, all experiments demonstrate that LSR is much more efficient.
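
The block diagonal property can be illustrated with a small sketch. It assumes the ridge-style self-representation usually associated with LSR, min_Z ||X - XZ||_F^2 + lam * ||Z||_F^2, whose closed form is Z = (X^T X + lam * I)^(-1) X^T X; the abstract does not state this formula, so the objective, the value of lam, the toy data and all function names here are assumptions rather than the authors' implementation.

```python
def gram(points):
    """Gram matrix G[i][j] = <x_i, x_j> for a list of equal-length vectors."""
    return [[sum(a * b for a, b in zip(p, q)) for q in points] for p in points]


def solve(a, b):
    """Solve A Z = B for Z by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row_a[:] + row_b[:] for row_a, row_b in zip(a, b)]  # augmented [A | B]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]


def lsr_affinity(points, lam=0.1):
    """Symmetric affinity (|Z| + |Z^T|) / 2 from the assumed closed form."""
    n = len(points)
    g = gram(points)
    a = [[g[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    z = solve(a, g)
    return [[(abs(z[i][j]) + abs(z[j][i])) / 2 for j in range(n)] for i in range(n)]


# Toy data (hypothetical): three points in span(e1, e2), three in span(e3, e4).
points = [
    (1.0, 0.5, 0.0, 0.0), (0.8, 1.0, 0.0, 0.0), (0.3, 0.9, 0.0, 0.0),
    (0.0, 0.0, 1.0, 0.2), (0.0, 0.0, 0.4, 1.0), (0.0, 0.0, 0.7, 0.7),
]
affinity = lsr_affinity(points)
cross_block_max = max(affinity[i][j] for i in range(3) for j in range(3, 6))
```

Because the two toy subspaces are orthogonal, the Gram matrix is block diagonal and so is the resulting affinity: entries linking the two groups vanish while entries within a group do not. In practice the affinity would then be passed to a spectral clustering step.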


Bioinformatics | 2018

iEnhancer-EL: Identifying enhancers and their strength with ensemble learning approach

Bin Liu; Kai Li; De-Shuang Huang; Kuo-Chen Chou

Motivation: Identification of enhancers and their strength is important because they play a critical role in controlling gene expression. Although some bioinformatics tools have been developed, they are limited to discriminating enhancers from non‐enhancers. Recently, a two‐layer predictor called ‘iEnhancer‐2L’ was developed that can also predict enhancer strength. However, its prediction quality needs further improvement to enhance its practical value. Results: A new predictor called ‘iEnhancer‐EL’ is proposed that contains two layers of predictors: the first (for identifying enhancers) is formed by fusing an array of six key individual classifiers, and the second (for their strength) is formed by fusing an array of ten key individual classifiers. All these key classifiers were selected from 171 elementary classifiers formed by SVM (Support Vector Machine) based on kmer, subsequence profile and PseKNC (Pseudo K‐tuple Nucleotide Composition), respectively. Rigorous cross‐validations indicate that the proposed predictor is remarkably superior to the existing state‐of‐the‐art one in this area. Availability and implementation: A web server for iEnhancer‐EL has been established at http://bioinformatics.hitsz.edu.cn/iEnhancer‐EL/, by which users can easily get their desired results without the need to go through the mathematical details. Supplementary information: Supplementary data are available at Bioinformatics online.
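
To make the two ingredients named above more concrete, here is a minimal sketch of k-mer frequency features and of fusing several binary classifiers by majority vote. The abstract does not say how the key classifiers are fused, and the real predictor uses SVMs; the voting rule, the toy threshold classifiers and every name below are assumptions for illustration only.

```python
from itertools import product


def kmer_frequencies(seq, k):
    """Normalized counts of every DNA k-mer, in lexicographic order."""
    total = len(seq) - k + 1
    if total <= 0:
        raise ValueError("sequence is shorter than k")
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(total):
        word = seq[i:i + k]
        if word in counts:  # windows with symbols outside ACGT are skipped
            counts[word] += 1
    return [counts[w] / total for w in kmers]


def majority_vote(classifiers, features):
    """Fuse binary classifiers (callables returning 0 or 1) by simple majority."""
    votes = sum(clf(features) for clf in classifiers)
    return int(2 * votes > len(classifiers))


# Hypothetical stand-ins for trained classifiers: simple threshold rules
# on the 1-mer frequencies [A, C, G, T].
toy_classifiers = [
    lambda f: int(f[1] + f[2] > 0.5),  # GC-rich sequence
    lambda f: int(f[0] < 0.25),        # few A's
    lambda f: int(f[3] > 0.5),         # T-rich sequence
]

features = kmer_frequencies("ACGTGGCC", 1)
label = majority_vote(toy_classifiers, features)
```

Here two of the three toy rules vote 1, so the fused label is 1. A two-layer design would apply a second ensemble of this kind only to sequences that the first layer labels as enhancers.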


Science in China Series F: Information Sciences | 2016

Understanding tissue-specificity with human tissue-specific regulatory networks

Wei-Li Guo; Lin Zhu; Suping Deng; Xingming Zhao; De-Shuang Huang

Tissue-specificity is important for the function of the human body. However, it is still not clear how the functional diversity of different tissues is achieved. Here we construct gene regulatory networks in 13 human tissues by integrating large-scale transcription factor (TF)-gene regulations with gene and protein expression data. By comparing these regulatory networks, we find many tissue-specific regulations that are important for tissue identity. In particular, the tissue-specific TFs are found to regulate more genes than those expressed in multiple tissues, and the processes regulated by these tissue-specific TFs are closely related to tissue functions. Moreover, the regulations that are present in a certain tissue are found to be enriched in the tissue-associated disease genes, and these networks provide the molecular context of disease genes. Therefore, recognizing tissue-specific regulatory networks can help better understand the molecular mechanisms underlying diseases and identify new disease genes.


Scientific Reports | 2017

WSMD: weakly-supervised motif discovery in transcription factor ChIP-seq data

Hongbo Zhang; Lin Zhu; De-Shuang Huang

Although discriminative motif discovery (DMD) methods are promising for eliciting motifs from high-throughput experimental data, most existing DMD methods, out of concern for computational expense, have to choose approximate schemes that greatly restrict the search space, leading to a significant loss of predictive accuracy. In this paper, we propose Weakly-Supervised Motif Discovery (WSMD) to discover motifs from ChIP-seq datasets. In contrast to the learning strategies adopted by previous DMD methods, WSMD allows a “global” optimization scheme of the motif parameters in continuous space, thereby reducing the information loss of model representation and improving the quality of resultant motifs. Meanwhile, by exploiting the connection between the DMD framework and existing weakly supervised learning (WSL) technologies, we also present highly scalable learning strategies for the proposed method. The experimental results on both real ChIP-seq datasets and synthetic datasets show that WSMD substantially outperforms former DMD methods (including DREME, HOMER, XXmotif, motifRG and DECOD) in terms of predictive accuracy, while also achieving a competitive computational speed.


IEEE Transactions on Nanobioscience | 2016

Collaborative Completion of Transcription Factor Binding Profiles via Local Sensitive Unified Embedding

Lin Zhu; Wei-Li Guo; Canyi Lu; De-Shuang Huang

Although the newly available ChIP-seq data provides immense opportunities for comparative study of regulatory activities across different biological conditions, due to cost, time or sample material availability, it is not always possible for researchers to obtain binding profiles for every protein in every sample of interest, which considerably limits the power of integrative studies. Recently, by leveraging related information from measured data, Ernst et al. proposed ChromImpute for predicting additional ChIP-seq and other types of datasets; it was demonstrated that the imputed signal tracks accurately approximate the experimentally measured signals, and thereby could potentially enhance the power of integrative analysis. Despite the success of ChromImpute, in this paper we reexamine its learning process and show that its performance may degrade substantially, and that it may sometimes even fail to output a prediction, when the available data is scarce. This limitation could hurt its applicability to important predictive tasks, such as the imputation of TF binding data. To alleviate this problem, we propose a novel method called Local Sensitive Unified Embedding (LSUE) for imputing new ChIP-seq datasets. In LSUE, the ChIP-seq data compendium is fused together by mapping proteins, samples, and genomic positions simultaneously into the Euclidean space, thereby making their underlying associations directly evaluable using simple calculations. In contrast to ChromImpute, which mainly makes use of the local correlations between available datasets, LSUE can better estimate the overall data structure by formulating the representation learning of all involved entities as a single unified optimization problem. Meanwhile, a novel form of local sensitive low rank regularization is also proposed to further improve the performance of LSUE. Experimental evaluations on the ENCODE TF ChIP-seq data illustrate the performance of the proposed model.
The code of LSUE is available at https://github.com/ekffar/LSUE.


Bioinformatics | 2017

Direct AUC optimization of regulatory motifs

Lin Zhu; Hongbo Zhang; De-Shuang Huang

Motivation: The discovery of transcription factor binding site (TFBS) motifs is essential for untangling the complex mechanism of genetic variation under different developmental and environmental conditions. Among the large number of computational approaches for de novo identification of TFBS motifs, discriminative motif learning (DML) methods have proven promising for harnessing the discovery power of the huge amount of accumulated high‐throughput binding data. However, they have to sacrifice accuracy for speed and could fail to fully utilize the information of the input sequences. Results: We propose a novel algorithm called CDAUC for optimizing DML‐learned motifs based on the area under the receiver‐operating characteristic curve (AUC) criterion, which has been widely used in the literature to evaluate the significance of extracted motifs. We show that when the considered AUC loss function is optimized in a coordinate‐wise manner, the cost function of each resultant sub‐problem is a piece‐wise constant function, whose optimal value can be found exactly and efficiently. Further, a key step of each iteration of CDAUC can be efficiently solved as a computational geometry problem. Experimental results on real world high‐throughput datasets illustrate that CDAUC outperforms competing methods for refining DML motifs, while being one order of magnitude faster. Meanwhile, preliminary results show that CDAUC may also be useful for improving the interpretability of convolutional kernels generated by the emerging deep learning approaches for predicting TF sequence specificities. Availability and Implementation: CDAUC is available at: https://drive.google.com/drive/folders/0BxOW5MtIZbJjNFpCeHlBVWJHeW8. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
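
The piece-wise constant behaviour can be seen in a small sketch. AUC is computed here as the fraction of correctly ordered (positive, negative) pairs, and a single parameter w is swept while the rest of the scorer stays fixed. The abstract does not give CDAUC's motif scoring function, so the linear scorer w * x[0] + x[1], the toy data and the function names are assumptions standing in for it.

```python
def auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def auc_along_coordinate(pos, neg, w_values):
    """AUC of the assumed scorer w * x[0] + x[1] for each candidate w."""
    return [
        auc([w * a + b for a, b in pos], [w * a + b for a, b in neg])
        for w in w_values
    ]


# Hypothetical two-feature examples.
pos = [(1.0, 0.0), (2.0, -1.0)]
neg = [(0.0, 0.5), (1.0, 0.5)]
w_values = [0.0, 0.25, 0.6, 0.7, 1.0, 1.4, 2.0, 3.0]
curve = auc_along_coordinate(pos, neg, w_values)
```

For this data the AUC only changes at w = 0.5, 0.75 and 1.5, the values where some positive and some negative example receive the same score. Between those breakpoints it is flat, which is why a coordinate-wise optimum can be found by checking a finite set of candidate intervals rather than by following gradients.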


Bioinformatics and Biomedicine | 2016

ILSES: Identification lysine succinylation-sites with ensemble classification

Wenzheng Bao; Lin Zhu; De-Shuang Huang

Lysine succinylation is one of the most important types of protein post-translational modification and is involved in many cellular processes and serious diseases. However, recognizing such sites with traditional experimental methods is time-consuming and laborious, and those methods can hardly meet the need to identify large numbers of succinylated sites quickly. In this work, several features of succinylated sites are extracted, such as the physicochemical properties of the amino acids. A flexible neural tree, employed as the classification model, integrates the above-mentioned features to produce a novel lysine succinylation prediction framework named ILSES (identification lysine succinylation-sites with ensemble features classification). The method is able to combine diverse features to predict lysine succinylation with high accuracy and in real time.


Bioinformatics and Biomedicine | 2016

Learning regulatory motifs by direct optimization of Fisher Exact Test Score

Lin Zhu; Ning Li; Wenzheng Bao; De-Shuang Huang

Built upon the hypergeometric distribution, the Fisher Exact Test score (FETS) and its variants offer a natural way of quantifying the level of TF binding site (TFBS) motif enrichment, and have been chosen as the objective functions of several widely used discriminative motif discovery methods, such as HOMER and DREME. In spite of its popularity and efficacy, FETS is non-smooth and non-differentiable, and is thus difficult to optimize numerically. To circumvent this limitation, existing tools that learn to optimize FETS have to rely either on discrete search strategies or on indirect tuning of a few external parameters, which could hurt accuracy and fail to fully utilize the potential of input sequences to generate motifs. In this paper, we propose DirectFS, which is (to the best of our knowledge) the first FETS-based approach that allows direct learning of the motif parameters in continuous space. We show that when the resultant loss function is optimized in a coordinate-wise manner, the cost function of each resultant sub-problem is a piece-wise constant function, whose optimal value can be found exactly and efficiently. Further, a key step in each iteration of DirectFS requires finding the most statistically significant one among tens of thousands of Fisher's exact tests, which is solved efficiently using a novel ‘lookahead’-style algorithm. Experimental evaluations on ENCODE ChIP-seq data illustrate the performance of the proposed approach.
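
For readers unfamiliar with the score, the following sketch computes a textbook one-sided Fisher exact test for motif enrichment from the hypergeometric distribution. The exact FETS variant used by HOMER, DREME or DirectFS is not specified in the abstract, so the one-sided tail, the -log10 transform, the counts and the function names are assumptions.

```python
from math import comb, log10


def fisher_enrichment_pvalue(pos_hits, pos_total, neg_hits, neg_total):
    """One-sided P(X >= pos_hits) under the hypergeometric null.

    pos_hits / neg_hits: positive / negative sequences containing the motif.
    pos_total / neg_total: sizes of the positive / negative sequence sets.
    """
    total = pos_total + neg_total
    hits = pos_hits + neg_hits
    upper = min(hits, pos_total)
    # math.comb returns 0 when more items are chosen than exist,
    # so impossible tables contribute nothing to the tail.
    tail = sum(
        comb(pos_total, i) * comb(neg_total, hits - i)
        for i in range(pos_hits, upper + 1)
    )
    return tail / comb(total, hits)


def fisher_score(pos_hits, pos_total, neg_hits, neg_total):
    """Enrichment score as -log10 of the one-sided p-value."""
    return -log10(fisher_enrichment_pvalue(pos_hits, pos_total, neg_hits, neg_total))


# Hypothetical counts: the motif occurs in 8 of 10 bound and 2 of 10 unbound sequences.
p_enriched = fisher_enrichment_pvalue(8, 10, 2, 10)
p_absent = fisher_enrichment_pvalue(0, 10, 5, 10)
```

The score depends on the motif parameters only through the integer hit counts, so it jumps whenever a sequence starts or stops matching and is flat otherwise. That is the non-smoothness the abstract refers to.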


Bioinformatics and Biomedicine | 2015

Imputation of ChIP-seq datasets via Low Rank Convex Co-Embedding

Lin Zhu; Wei-Li Guo; De-Shuang Huang; Canyi Lu

In recent years, thanks to the efforts of individual scientists and research consortiums, a huge amount of chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) experimental data has been accumulated. Although several recent studies have demonstrated that a wealth of insights can be gained by integrative analysis of these data, owing to cost, time or sample material availability, it is not always possible for researchers to obtain binding profiles for every protein in every sample of interest, which considerably limits the power of integrative studies. In this paper, we propose a novel method called Low Rank Convex Co-Embedding (LRCCE) for imputing new ChIP-seq datasets. In LRCCE, a diverse collection of available ChIP-seq data is fused together by mapping proteins, samples, and genomic positions simultaneously into the Euclidean space, thereby making their underlying associations directly evaluable using simple calculations. In contrast with previous approaches, which mainly make use of the local correlations between available datasets, LRCCE can better estimate the overall data structure by formulating the representation learning of all involved entities as a single unified optimization problem. Experimental evaluations on the ENCODE data illustrate the usefulness of the proposed model.


Bioinformatics | 2018

iPromoter-2L: a two-layer predictor for identifying promoters and their types by multi-window-based PseKNC

Bin Liu; Fan Yang; De-Shuang Huang; Kuo-Chen Chou

Collaboration


Dive into De-Shuang Huang's collaborations.

Top Co-Authors

Bin Liu

Harbin Institute of Technology Shenzhen Graduate School

Kuo-Chen Chou

University of Electronic Science and Technology of China

Canyi Lu

National University of Singapore

Fan Weng

Harbin Institute of Technology Shenzhen Graduate School

Fan Yang

Harbin Institute of Technology Shenzhen Graduate School

Hai Min

University of Science and Technology of China
