Publication


Featured research published by Buzhou Tang.


BMC Bioinformatics | 2009

Prediction of protein binding sites in protein structures using hidden Markov support vector machine

Bin Liu; Xiaolong Wang; Lei Lin; Buzhou Tang; Qiwen Dong; Xuan Wang

Background: Predicting the binding sites between two interacting proteins provides important clues to the function of a protein. Recent research on protein binding site prediction has been based mainly on widely known machine learning techniques, such as artificial neural networks, support vector machines and conditional random fields. However, the prediction performance is still too low to be used in practice. It is necessary to explore new algorithms, theories and features to further improve the performance.

Results: In this study, we introduce a novel machine learning model, the hidden Markov support vector machine, for protein binding site prediction. The model treats protein binding site prediction as a sequential labelling task based on the maximum margin criterion. Common features derived from protein sequences and structures, including the protein sequence profile and residue accessible surface area, are used to train the hidden Markov support vector machine. When tested on six data sets, the method based on the hidden Markov support vector machine shows better performance than some state-of-the-art methods, including artificial neural networks, support vector machines and conditional random fields. Furthermore, its running time is several orders of magnitude shorter than that of the compared methods.

Conclusion: The improved prediction performance and computational efficiency of the method based on the hidden Markov support vector machine can be attributed to three factors. Firstly, the relation between labels of neighbouring residues is useful for protein binding site prediction. Secondly, the kernel trick is very advantageous in this field. Thirdly, by using the cutting-plane algorithm, the complexity of the training step for the hidden Markov support vector machine is linear in the number of training samples.


Journal of Biomedical Informatics | 2015

Automatic de-identification of electronic medical records using token-level and character-level conditional random fields

Zengjian Liu; Yangxin Chen; Buzhou Tang; Xiaolong Wang; Qingcai Chen; Haodi Li; Jingfeng Wang; Qiwen Deng; Suisong Zhu

De-identification, the identification and removal of all protected health information (PHI) present in clinical data such as electronic medical records (EMRs), is a critical step in making clinical data publicly available. The 2014 i2b2 (Center of Informatics for Integrating Biology and Bedside) clinical natural language processing (NLP) challenge included a track for de-identification (track 1). In this study, we propose a hybrid system for this track that combines machine learning and rule-based approaches. In our system, PHI instances are first identified by two conditional random fields (CRFs), one token-level and one character-level, and by a rule-based classifier; the results are then merged using a set of rules. Experiments conducted on the i2b2 corpus show that the system submitted for the challenge achieves the highest micro F-scores of 94.64%, 91.24% and 91.63% under the token, strict and relaxed criteria respectively, placing it among the top-ranked systems of the 2014 i2b2 challenge. After integrating some refined localization dictionaries, the system improves further, with F-scores of 94.83%, 91.57% and 91.95% under the token, strict and relaxed criteria respectively.


Journal of Biomedical Informatics | 2015

An automatic system to identify heart disease risk factors in clinical texts over time

Qingcai Chen; Haodi Li; Buzhou Tang; Xiaolong Wang; Xin Liu; Zengjian Liu; Shu Liu; Weida Wang; Qiwen Deng; Suisong Zhu; Yangxin Chen; Jingfeng Wang

Despite recent progress in prediction and prevention, heart disease remains a leading cause of death. One preliminary step in heart disease prediction and prevention is risk factor identification. Many studies have sought to identify risk factors associated with heart disease; however, none have attempted to identify all risk factors. In 2014, the National Center of Informatics for Integrating Biology and Bedside (i2b2) issued a clinical natural language processing (NLP) challenge that involved a track (track 2) for identifying heart disease risk factors in clinical texts over time. This track aimed to identify medically relevant information related to heart disease risk and to track its progression over sets of longitudinal patient medical records. Identification of tags and attributes associated with disease presence and progression, risk factors, and medications in the patient medical history was required. Our participation led to the development of a hybrid pipeline system based on both machine learning-based and rule-based approaches. Evaluation using the challenge corpus revealed that our system achieved an F1-score of 92.68%, making it the top-ranked system (without additional annotations) of the 2014 i2b2 clinical NLP challenge.


Journal of Biomedical Informatics | 2017

De-identification of clinical notes via recurrent neural network and conditional random field

Zengjian Liu; Buzhou Tang; Xiaolong Wang; Qingcai Chen

De-identification, the identification of sensitive information in data, such as protected health information (PHI) present in clinical data, is a critical step in enabling data to be shared or published. The 2016 Centers of Excellence in Genomic Science (CEGS) Neuropsychiatric Genome-scale and RDOC Individualized Domains (N-GRID) clinical natural language processing (NLP) challenge contains a track for de-identifying electronic medical records (EMRs) (track 1). The challenge organizers provide 1000 annotated mental health records for this track, 600 of which are used as a training set and 400 as a test set. We develop a hybrid system for the de-identification task on the training set. Firstly, four individual subsystems are used to identify PHI instances: a subsystem based on bidirectional LSTM (long short-term memory, a variant of the recurrent neural network), a subsystem based on bidirectional LSTM with features, a subsystem based on a conditional random field (CRF) and a rule-based subsystem. Then, an ensemble learning-based classifier is deployed to combine all PHI instances predicted by the three machine learning-based subsystems. Finally, the results of the ensemble learning-based classifier and the rule-based subsystem are merged. Experiments conducted on the official test set show that our system achieves the highest micro F1-scores of 93.07%, 91.43% and 95.23% under the token, strict and binary token criteria respectively, ranking first in the 2016 CEGS N-GRID NLP challenge. In addition, on the dataset of the 2014 i2b2 NLP challenge, our system achieves the highest micro F1-scores of 96.98%, 95.11% and 98.28% under the token, strict and binary token criteria respectively, outperforming other state-of-the-art systems. These experiments demonstrate the effectiveness of the proposed method.


Heart and Vessels | 2016

Tachycardia pacing induces myocardial neovascularization and mobilizes circulating endothelial progenitor cells partly via SDF-1 pathway in canines.

Jing-Ting Mai; Fei Wang; Qiong Qiu; Buzhou Tang; YongQing Lin; Nian-Sang Luo; Woliang Yuan; Xiaolong Wang; Qingcai Chen; Jingfeng Wang; YangXin Chen

Neovascularization plays a pivotal role in ischemic heart failure; however, its role in non-ischemic heart failure is unclear. Non-ischemic heart failure was induced in 12 dogs by chronic rapid right ventricular pacing at 200 beats/min for 3 or 6 weeks. A sham operation was performed in another 6 dogs as controls. Three-week tachycardia pacing induced mild/moderate heart failure and 6-week pacing induced severe heart failure. Pan-microvessel density (MVD) was assessed by CD31 and neovascularization density was assessed by CD105. Mean CD31-MVD and CD105-MVD were significantly increased after 3-week pacing. However, CD105-MVD was significantly decreased by 80% in the 6-week pacing group compared with the 3-week pacing group, whereas CD31-MVD was only decreased slightly (15%; P < 0.05). The myocardial proangiogenic factor stromal cell-derived factor 1 (SDF-1), hypoxia-inducible factor 1α (HIF-1α, a transcription factor that can regulate SDF-1 expression), serum SDF-1 levels and circulating endothelial progenitor cell (EPC) mobilization were greatly elevated after 3-week pacing but nearly returned to baseline levels after 6-week pacing, in accordance with the changes in neovascularization levels assessed by CD105. The angiogenesis and migrating ability of EPCs were enhanced after stimulation with SDF-1, an effect that could be abolished by pretreatment with the SDF-1 receptor antagonist AMD3100. In addition, the angiogenesis and migrating functions of EPCs were significantly enhanced by serum from 3-week pacing dogs, but responded much more weakly to serum from 6-week pacing dogs. In conclusion, tachycardia pacing-induced non-ischemic heart failure promoted myocardial neovascularization and mobilized circulating EPCs, which might be mediated partly through the SDF-1 pathway.


Angiology | 2016

Validation of the Ability of SYNTAX and Clinical SYNTAX Scores to Predict Adverse Cardiovascular Events After Stent Implantation: A Systematic Review and Meta-Analysis

Jia-Yuan Chen; Buzhou Tang; YongQing Lin; Ying Ru; Mao-Xiong Wu; Xiaolong Wang; Qingcai Chen; YangXin Chen; Jingfeng Wang

This study compared the predictive ability of the SYNTAX (Synergy between PCI with Taxus and Cardiac Surgery) and clinical SYNTAX scores for major adverse cardiac events (MACEs) after stent implantation in patients with coronary artery disease (CAD). Studies were identified by electronic and manual searches. Twenty-six studies were included in the meta-analysis. The pooled C-statistics of the SYNTAX score for 1- and 5-year all-cause mortality (ACM) were 0.65 (95% confidence interval [CI]: 0.61-0.68) and 0.62 (95% CI: 0.59-0.65), respectively, with weak heterogeneity. The 1- and 5-year ACM pooled C-statistics for clinical SYNTAX scores were significantly higher at 0.77 and 0.71, respectively (Ps < .05). Both scoring systems predicted 1- and 5-year MACE equally well. The pooled risk ratio of the SYNTAX score for predicting 1-year ACM per unit was 1.04 (95% CI: 1.03-1.05). Calibration analysis indicated that SYNTAX scores overestimated the risk of major adverse cardiac and cerebrovascular events in each risk stratum. The SYNTAX score demonstrated minimal discrimination in predicting 1- or 5-year adverse cardiovascular events after percutaneous coronary intervention in patients with CAD. The clinical SYNTAX score could further improve the predictive capability for ACM but not MACE.


International Conference on Neural Information Processing | 2010

Reranking for stacking ensemble learning

Buzhou Tang; Qingcai Chen; Xuan Wang; Xiaolong Wang

Ensemble learning refers to methods that combine multiple models to improve performance. Ensemble methods, such as stacking, have been intensively studied and can bring slight performance improvements. However, there is no guarantee that a stacking algorithm outperforms all base classifiers. In this paper, we propose a new stacking algorithm in which the predictive scores of each possible class label returned by the base classifiers are first collected by the meta-learner, and then all possible class labels are reranked according to the scores. This algorithm is able to find the best linear combination of the base classifiers on the training samples, which ensures that it outperforms all base classifiers during the training process. Experiments conducted on several public datasets show that the proposed algorithm outperforms the baseline algorithms and several state-of-the-art stacking algorithms.
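The idea of collecting per-label scores from the base classifiers and reranking the labels by a linear combination can be illustrated with a minimal sketch. This is not the paper's algorithm: the function names `fit_weights` and `rerank` are hypothetical, and the least-squares fit against one-hot targets is an assumption chosen only to keep the example short.

```python
import numpy as np

def fit_weights(base_scores, labels):
    """Fit one weight per base classifier.

    base_scores: array of shape (n_models, n_samples, n_classes)
    labels:      integer array of shape (n_samples,)
    Assumption: weights come from a least-squares fit to one-hot targets;
    the paper's actual training objective may differ.
    """
    n_models, n_samples, n_classes = base_scores.shape
    # One row per (sample, class) pair, one column per base classifier.
    X = base_scores.transpose(1, 2, 0).reshape(-1, n_models)
    y = np.eye(n_classes)[labels].reshape(-1)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def rerank(base_scores, w):
    """Return class labels for each sample, ordered from best to worst."""
    # Weighted sum over the model axis -> shape (n_samples, n_classes).
    combined = np.tensordot(w, base_scores, axes=1)
    return np.argsort(-combined, axis=1)
```

The first column of the array returned by `rerank` is the ensemble's predicted label; the remaining columns give the reranked alternatives.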


Chinese National Conference on Social Media Processing | 2014

Identifying Opinion Leaders from Online Comments

Yi Chen; Xiaolong Wang; Buzhou Tang; Ruifeng Xu; Bo Yuan; Xin Xiang; Junzhao Bu

Online comments are ubiquitous in social media such as micro-blogs, forums and blogs. They provide opinions of reviewers that are useful for understanding social media. Identifying opinion leaders among all reviewers is one of the most important tasks in analyzing online comments. Most existing methods for identifying opinion leaders consider only positive opinions. Few studies investigate the effect of negative opinions on opinion leader identification. In this paper, we propose a novel method to identify opinion leaders from online comments based on both positive and negative opinions. In this method, we first construct a signed network from online comments, and then design a new model based on PageTrust, called TrustRank, to identify opinion leaders from the signed network. Experimental results on the online comments of a real forum show that the proposed method is competitive with other related state-of-the-art methods.


International Conference on Neural Information Processing | 2011

Diversifying Question Recommendations in Community-Based Question Answering

Yaoyun Zhang; Xiaolong Wang; Xuan Wang; Ruifeng Xu; Buzhou Tang

Question retrieval is an important research topic in community-based question answering (QA). Conventionally, questions semantically equivalent to the query question are ranked highest. However, traditional question retrieval techniques have difficulty handling users’ information needs that are implicitly embedded in the question. This paper proposes a novel method of question recommendation that considers a user’s diverse information needs. By estimating the compactness of information needs in the question retrieval results, we identify the retrieval results that need to be diversified. For these results, the scores of the information retrieval model and the importance and novelty of both the question types and the informational aspects of the question content are combined to produce diverse question recommendations. Comparative experiments on a large-scale real community-based QA dataset show that the proposed method effectively improves information need coverage and diversity by recommending relevant questions.


Journal of Computers | 2011

Protein Remote Homology Detection and Fold Recognition based on Features Extracted from Frequency Profiles

Lei Lin; Bin Liu; Xiaolong Wang; Xuan Wang; Buzhou Tang

Protein remote homology detection and fold recognition are central problems in bioinformatics. Currently, discriminative methods based on the support vector machine (SVM) are the most effective and accurate methods for solving these problems. The performance of an SVM depends on the method of protein vectorization, so a suitable representation of the protein sequence is a key step for SVM-based methods. In this paper, two kinds of profile-level building blocks of proteins, binary profiles and N-nary profiles, are presented; both contain the evolutionary information of the protein sequence frequency profile. The protein sequence frequency profiles calculated from the multiple sequence alignments output by PSI-BLAST are converted into binary profiles or N-nary profiles. The protein sequences are transformed into fixed-dimension feature vectors by counting the occurrences of each binary profile or N-nary profile, and the corresponding vectors are then input to support vector machines. The latent semantic analysis (LSA) model, an efficient feature extraction algorithm, is adopted to further improve the performance of our methods. Experiments on protein remote homology detection and fold recognition show that the methods based on profile-level building blocks give better results than related methods.
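The conversion from a frequency profile to binary profiles, followed by occurrence counting, can be sketched roughly as follows. The thresholding rule, the default threshold value and the function names `to_binary_profiles` and `count_vector` are assumptions made for illustration; the abstract does not specify how the conversion is done.

```python
from collections import Counter
import numpy as np

def to_binary_profiles(freq_profile, threshold=0.1):
    """Turn each row of a frequency profile into a binary profile string.

    freq_profile: array-like of shape (sequence_length, alphabet_size),
                  each row holding amino acid frequencies at one position.
    Assumption: a frequency above `threshold` maps to 1, otherwise 0.
    """
    binary = (np.asarray(freq_profile) > threshold).astype(int)
    return ["".join(map(str, row)) for row in binary]

def count_vector(profiles, vocabulary):
    """Fixed-dimension feature vector: occurrences of each known profile."""
    counts = Counter(profiles)
    return np.array([counts[p] for p in vocabulary])
```

With a shared `vocabulary` of binary profiles, every sequence maps to a vector of the same length regardless of sequence length, which is the property an SVM needs from its input.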

Collaboration


Dive into Buzhou Tang's collaborations.

Top Co-Authors

Xiaolong Wang
Harbin Institute of Technology Shenzhen Graduate School

Qingcai Chen
Harbin Institute of Technology Shenzhen Graduate School

Xuan Wang
Harbin Institute of Technology Shenzhen Graduate School

Zengjian Liu
Harbin Institute of Technology Shenzhen Graduate School

Bin Liu
Harbin Institute of Technology

Haodi Li
Harbin Institute of Technology Shenzhen Graduate School

Lei Lin
Harbin Institute of Technology

Xin Liu
Harbin Institute of Technology Shenzhen Graduate School

Bingquan Liu
Harbin Institute of Technology