Yangyong Zhu | Researchain

Publication

Featured research published by Yangyong Zhu.

Bioinformatics | 2008

ITFP: an integrated platform of mammalian transcription factors

Guangyong Zheng; Kang Tu; Qing Yang; Yun Xiong; Chaochun Wei; Lu Xie; Yangyong Zhu; Yixue Li

Investigating transcription factors (TFs) and their downstream regulated genes (targets) is an important problem in the post-genome era, and it can offer a new view of vital biological processes. However, information on TFs and their targets in mammals is far from sufficient. Here, we developed an integrated TF platform (ITFP), which includes abundant TFs and their targets in mammals. In the current release, ITFP includes 4105 putative TFs and 69 496 potential TF-target pairs for human, 3134 putative TFs and 37 040 potential TF-target pairs for mouse, and 1114 putative TFs and 18 055 potential TF-target pairs for rat. In short, ITFP will serve as an important resource for the transcription research community and provide strong support for regulatory network studies.

IEEE Transactions on Knowledge and Data Engineering | 2015

Top-k Similarity Join in Heterogeneous Information Networks

Yun Xiong; Yangyong Zhu; Philip S. Yu

As a newly emerging network model, heterogeneous information networks (HINs) have received growing attention. Many data mining tasks have been explored in HINs, including clustering, classification, and similarity search. Similarity join is a fundamental operation required for many problems. It is attracting attention from various applications on network data, such as friend recommendation, link prediction, and online advertising. Although similarity join has been well studied in homogeneous networks, it has not yet been studied in heterogeneous networks. In particular, none of the existing research on similarity join takes the different semantic meanings behind paths into consideration, and almost all of it completely ignores the heterogeneity and diversity of HINs. In this paper, we propose a path-based similarity join (PS-join) method that returns the top k similar pairs of objects based on any user-specified join path in a heterogeneous information network. We study how to prune expensive similarity computation by introducing bucket pruning based locality sensitive hashing (BPLSH) indexing. Compared with the existing link-based similarity join (LS-join) method, PS-join can derive various similarity semantics. Experimental results on real data sets show the efficiency and effectiveness of the proposed approach.
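
As a rough illustration of what a top-k join along a user-specified path computes, the sketch below scores every pair of authors along an author-paper-author path and keeps the k best pairs. The abstract does not state the similarity measure, so a PathSim-style normalized path count is assumed here. The toy data and the names `AUTHOR_PAPERS`, `path_count`, `pathsim`, and `top_k_join` are hypothetical. The BPLSH pruning is not reproduced, so this is a brute-force baseline rather than the PS-join method itself.

```python
import heapq
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Hypothetical toy network: each author is linked to the papers they wrote.
AUTHOR_PAPERS: Dict[str, Set[str]] = {
    "a1": {"p1", "p2"},
    "a2": {"p1", "p2"},
    "a3": {"p2", "p3"},
    "a4": {"p4"},
}


def path_count(x: str, y: str, links: Dict[str, Set[str]]) -> int:
    """Number of author-paper-author path instances between x and y."""
    return len(links[x] & links[y])


def pathsim(x: str, y: str, links: Dict[str, Set[str]]) -> float:
    """PathSim-style score (assumed measure): 2*|x~y| / (|x~x| + |y~y|)."""
    denom = path_count(x, x, links) + path_count(y, y, links)
    return 2 * path_count(x, y, links) / denom if denom else 0.0


def top_k_join(links: Dict[str, Set[str]], k: int) -> List[Tuple[float, str, str]]:
    """Brute-force top-k join: score all pairs, keep the k highest."""
    scored = (
        (pathsim(x, y, links), x, y)
        for x, y in combinations(sorted(links), 2)
    )
    return heapq.nlargest(k, scored)


if __name__ == "__main__":
    for score, x, y in top_k_join(AUTHOR_PAPERS, 2):
        print(f"{x} - {y}: {score:.2f}")
```

Changing the join path (for example, author-paper-venue-paper-author) would change which pairs count as similar, which is the sense in which a path-based join can express different similarity semantics.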

BMC Bioinformatics | 2008

The combination approach of SVM and ECOC for powerful identification and classification of transcription factor

Guangyong Zheng; Ziliang Qian; Qing Yang; Chaochun Wei; Lu Xie; Yangyong Zhu; Yixue Li

Background: Transcription factors (TFs) are core functional proteins that play important roles in gene expression control, and they are key factors for gene regulation network construction. Traditionally, they were identified and classified through experimental approaches. To save time and reduce costs, many computational methods have been developed to identify TFs among new proteins and to classify the resulting TFs. Though these methods have facilitated screening of TFs to some extent, low accuracy is still a common problem. With the fast-growing number of new proteins, more precise algorithms for identifying TFs among new proteins and classifying the resulting TFs are in high demand. Results: The support vector machine (SVM) algorithm was used to construct an automatic detector for TF identification, with protein domains and functional sites employed as feature vectors. The error-correcting output coding (ECOC) algorithm, which originated in the information and communication engineering fields, was combined with the SVM methodology for TF classification. The overall success rates of identification and classification reached 88.22% and 97.83%, respectively. Finally, a web site was constructed to let users access our tools (see Availability and requirements section for URL). Conclusion: The SVM method was a valid and stable means of TF identification with protein domains and functional sites as feature vectors. The ECOC algorithm is a powerful method for multi-class classification problems. When combined with the SVM method, it can remarkably increase the accuracy of TF classification using protein domains and functional sites as feature vectors. In addition, our work implies that the ECOC algorithm may succeed in a broad range of applications in biological data mining.
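
To make the ECOC idea concrete, here is a minimal sketch of the decoding step. Each class is assigned a binary codeword, one binary classifier (an SVM in the paper) predicts each bit, and the class whose codeword is nearest in Hamming distance wins. The code matrix, the class labels `A` to `D`, and the function names are illustrative assumptions, not the paper's actual configuration, and the bit predictions are supplied by hand instead of by trained SVMs.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical 7-bit code matrix for four classes. Every pair of codewords
# differs in 4 positions, so any single wrong bit can still be corrected.
CODE_MATRIX: Dict[str, Tuple[int, ...]] = {
    "A": (1, 1, 1, 1, 1, 1, 1),
    "B": (0, 0, 0, 0, 1, 1, 1),
    "C": (0, 0, 1, 1, 0, 0, 1),
    "D": (0, 1, 0, 1, 0, 1, 0),
}


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two equal-length bit vectors differ."""
    if len(a) != len(b):
        raise ValueError("bit vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def ecoc_decode(
    bits: Sequence[int],
    code_matrix: Dict[str, Tuple[int, ...]] = CODE_MATRIX,
) -> str:
    """Return the class whose codeword is closest to the predicted bits."""
    return min(code_matrix, key=lambda label: hamming(code_matrix[label], bits))


if __name__ == "__main__":
    # Codeword of class C with the fifth bit flipped by one faulty classifier.
    noisy = (0, 0, 1, 1, 1, 0, 1)
    print(ecoc_decode(noisy))
```

The redundancy in the codewords is what lets the ensemble tolerate mistakes by individual binary classifiers.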

knowledge discovery and data mining | 2007

Incremental mining of sequential patterns using prefix tree

Yue Chen; Jiankui Guo; Yaqin Wang; Yun Xiong; Yangyong Zhu

This paper first demonstrates that IncSpan+, the current PrefixSpan-based incremental mining algorithm proposed at PAKDD05, cannot completely mine all sequential patterns. A new incremental algorithm for mining sequential patterns using a prefix tree is then proposed. This algorithm constructs a prefix tree to represent the sequential patterns and then continuously scans the incremental element set to maintain the tree structure, using width pruning and depth pruning to reduce the search space. The experiment shows that this algorithm performs well.
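
The sketch below shows one simple way a prefix tree can hold sequential patterns together with their support counts and be updated as new data arrives. It assumes patterns are plain sequences of single items. The names `Node`, `PatternTree`, `add`, and `frequent` are hypothetical, and the width and depth pruning mentioned in the abstract are not implemented, so this illustrates the data structure only, not the paper's algorithm.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


@dataclass
class Node:
    """One item in the tree; support is the count for the pattern ending here."""
    support: int = 0
    children: Dict[str, "Node"] = field(default_factory=dict)


class PatternTree:
    """Prefix tree mapping each sequential pattern to its support count."""

    def __init__(self) -> None:
        self.root = Node()

    def add(self, pattern: Sequence[str], support_delta: int = 1) -> None:
        """Insert a pattern, or raise its support when new data arrives."""
        node = self.root
        for item in pattern:
            node = node.children.setdefault(item, Node())
        node.support += support_delta

    def frequent(self, min_support: int) -> List[Tuple[Tuple[str, ...], int]]:
        """List every stored pattern whose support meets the threshold."""
        result = []
        stack = [((), self.root)]
        while stack:
            prefix, node = stack.pop()
            if prefix and node.support >= min_support:
                result.append((prefix, node.support))
            for item, child in node.children.items():
                stack.append((prefix + (item,), child))
        return sorted(result)


if __name__ == "__main__":
    tree = PatternTree()
    tree.add(("a",), 3)
    tree.add(("a", "b"), 2)
    tree.add(("c",), 1)
    tree.add(("a", "b"), 1)  # incremental update from a new batch
    print(tree.frequent(2))
```

Because patterns that share a prefix share a path in the tree, an incremental update only touches the nodes along the affected pattern.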

world congress on intelligent control and automation | 2006

Dynamic Traffic Prediction Based on Traffic Flow Mining

Yaqin Wang; Yue Chen; Minggui Qin; Yangyong Zhu

ITS technology collects a large amount of historical traffic flow data that may provide information for supporting and improving traffic control. Data mining techniques are well suited to analyzing this large amount of ITS data to extract useful traffic patterns. We present a dynamic traffic prediction model that converts traffic flow data into traffic status. In this paper, two data mining techniques, clustering analysis and classification analysis, are used to develop the model, and the classification model can be used to predict traffic status in real time. The experiment shows that the prediction model can be used efficiently for dynamic traffic prediction in urban traffic flow guidance.

Brain Informatics | 2009

Data explosion, data nature and dataology

Yangyong Zhu; Ning Zhong; Yun Xiong

The essence of computer applications is to store real-world things in computer systems in the form of data; that is, it is a process of producing data. Some data are records related to culture and society, and others are descriptions of phenomena of the universe and life. Data are being generated rapidly and on a large scale and stored in computer systems, a phenomenon called data explosion. Data explosion forms data nature in computer systems. To explore data nature, new theories and methods are required. In this paper, we present the concept of data nature and introduce the problems arising from it, and then we define a new discipline named dataology (also called data science or science of data), which is an umbrella for theories, methods and technologies for studying data nature. The research issues and framework of dataology are proposed.

web intelligence | 2010

User Navigation Behavior Mining Using Multiple Data Domain Description

Li Xue; Yun Xiong; Yangyong Zhu

User Navigation Behavior Mining (UNBM) mainly studies the problem of extracting interesting user access patterns from user access sequences (UAS), which are usually used for user access prediction and web page recommendation. By analyzing real-world web data, we find that most user access sequences carry hybrid features of different patterns rather than those of a single pattern.

computer and information technology | 2005

A Multi-Supports-Based Sequential Pattern Mining Algorithm

Yun Xiong; Yangyong Zhu

Sequential pattern mining is now widely used in various areas, such as the analysis of biological sequences, Web access patterns, customer purchase patterns, and so on. In this paper, we propose a new definition of M-sequences. We also present multiple supports (local support, total support, and distribution support) for mining local sequential patterns, total sequential patterns, and existence sequential patterns, respectively. Based on these multiple supports, a multi-supports-based sequential pattern mining algorithm is developed that can be generally applied to find such patterns.

international conference data science | 2015

A Spark-Based Big Data Platform for Massive Remote Sensing Data Processing

Zhongyi Sun; Fengke Chen; Mingmin Chi; Yangyong Zhu

With the fast development of remote sensing techniques, the volume of acquired data grows exponentially. This makes processing massive remote sensing data a major challenge. In this paper, an in-memory computing framework is proposed to address this problem. Here, Spark is an open-source distributed computing platform, with Hadoop YARN as the resource scheduler and HDFS as the cloud storage system. On the Spark-based platform, data loaded into memory in the first iteration can be reused in subsequent iterations. This mechanism makes Spark much more suitable for running multi-iteration algorithms than MapReduce, which has to load the data in each iteration. The experiments are carried out on massive remote sensing data using a multi-iteration singular value decomposition (SVD) algorithm. The results show that Spark-based SVD achieves significantly faster computation time than MapReduce, usually by one order of magnitude.

bioinformatics and biomedicine | 2010

TOPPER: An algorithm for mining top k patterns in biological sequences based on regularity measurement

Yun Xiong; Junhua He; Yangyong Zhu

Biological sequential patterns usually carry significant functions in a set of sequences. Mining such patterns offers key insight into transcription regulation mechanisms and has become a useful primitive task underlying much research and many applications. Recently, various methods have been developed to identify biological patterns. However, traditional approaches to mining sequential patterns produce a huge result set, which makes it difficult for biologists to decide which patterns are interesting and meaningful. In this paper, we study a variant of biological sequential pattern mining that targets this huge result set, termed top k representative pattern mining based on regularity measurement. As a first attempt to tackle the problem, a new measurement, ‘regularity’, is defined to evaluate the interestingness of each pattern, and an efficient algorithm with a pruning strategy is proposed that returns the top k representative patterns ranked by regularity. Experimental results demonstrate that the proposed method is more efficient than state-of-the-art methods on real datasets.
