Ziyu Lin | Researchain

Publication

Featured research published by Ziyu Lin.

Briefings in Bioinformatics | 2014

Survey of MapReduce frame operation in bioinformatics

Quan Zou; Xubin Li; Wen-Rui Jiang; Ziyu Lin; Gui-Lin Li; Ke Chen

Bioinformatics is challenged by the fact that traditional analysis tools have difficulty processing large-scale data from high-throughput sequencing. The open source Apache Hadoop project, which adopts the MapReduce framework and a distributed file system, has recently given bioinformatics researchers an opportunity to achieve scalable, efficient and reliable computing performance on Linux clusters and on cloud computing services. In this article, we present MapReduce frame-based applications that can be employed in next-generation sequencing and other biological domains. In addition, we discuss the challenges faced by this field as well as future work on parallel computing in bioinformatics.
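
To make the MapReduce model concrete, here is a minimal single-process sketch of its map, shuffle, and reduce phases applied to k-mer counting over sequencing reads, a common illustrative task. It is plain Python, not Hadoop: the function names (`map_kmers`, `shuffle`, `reduce_counts`, `mapreduce_kmer_count`), the toy reads, and k = 3 are hypothetical. In a real Hadoop job the framework would distribute the map and reduce calls across cluster nodes and perform the shuffle itself.

```python
# Hypothetical sketch: the three MapReduce phases simulated in one process.
from collections import defaultdict
from itertools import chain


def map_kmers(read_id, sequence, k=3):
    """Map phase: emit (k-mer, 1) for every k-mer in one read (read_id is the input key, unused here)."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k], 1


def shuffle(pairs):
    """Shuffle phase: group intermediate values by key, as Hadoop does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(kmer, counts):
    """Reduce phase: sum the partial counts for one k-mer."""
    return kmer, sum(counts)


def mapreduce_kmer_count(reads, k=3):
    intermediate = chain.from_iterable(map_kmers(rid, seq, k) for rid, seq in reads.items())
    grouped = shuffle(intermediate)
    return dict(reduce_counts(kmer, counts) for kmer, counts in grouped.items())


reads = {"r1": "ACGTAC", "r2": "GTACGT"}
print(mapreduce_kmer_count(reads, k=3))  # {'ACG': 2, 'CGT': 2, 'GTA': 2, 'TAC': 2}
```

Because each map call only sees one read and each reduce call only sees one key, both phases parallelize independently, which is what lets Hadoop scale them across a cluster.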

BioMed Research International | 2015

Prediction of MicroRNA-Disease Associations Based on Social Network Analysis Methods

Quan Zou; Jinjin Li; Qingqi Hong; Ziyu Lin; Yun Wu; Hua Shi; Ying Ju

MicroRNAs constitute an important class of noncoding, single-stranded RNA molecules, about 22 nucleotides long, that are encoded by endogenous genes. They play an important role in regulating gene transcription and normal development. MicroRNAs can be associated with disease; however, only a few microRNA-disease associations have been confirmed by traditional experimental approaches. We introduce two methods to predict microRNA-disease associations. The first method, KATZ, integrates social network analysis with machine learning and is based on networks derived from known microRNA-disease associations, disease-disease associations, and microRNA-microRNA associations. The second method, CATAPULT, is a supervised machine learning method. We applied the two methods to 242 known microRNA-disease associations and evaluated their performance using leave-one-out cross-validation and 3-fold cross-validation. Experiments showed that our methods outperformed state-of-the-art methods.
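
The KATZ method builds on the Katz index from social network analysis, which scores a node pair by counting the walks between them while down-weighting longer walks by a factor beta per step: S = beta*A + beta^2*A^2 + ... + beta^L*A^L. The sketch below computes that truncated sum on a toy heterogeneous miRNA-disease adjacency matrix. The matrix, beta = 0.1, and L = 3 are assumptions for illustration; the paper's exact network construction and parameter settings are not reproduced here.

```python
# Hypothetical sketch: truncated Katz index on a toy heterogeneous network.
def matmul(a, b):
    """Multiply two dense matrices given as lists of lists."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[t] * b[t][j] for t in range(inner)) for j in range(cols)] for row in a]


def katz_scores(adj, beta=0.1, max_len=3):
    """Return S = beta*A + beta^2*A^2 + ... + beta^max_len*A^max_len."""
    n = len(adj)
    scores = [[0.0] * n for _ in range(n)]
    power = [row[:] for row in adj]  # current power A^length, starting at A^1
    for length in range(1, max_len + 1):
        weight = beta ** length  # longer walks count less
        for i in range(n):
            for j in range(n):
                scores[i][j] += weight * power[i][j]
        power = matmul(power, adj)
    return scores


# Nodes: 0 = m1, 1 = m2 (miRNAs); 2 = d1, 3 = d2 (diseases).
# Edges: m1-m2 (miRNA similarity), m1-d1 and m2-d2 (known associations).
adj = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
]
scores = katz_scores(adj)
print(round(scores[1][2], 4))  # candidate pair (m2, d1): 0.01
```

The unobserved pair (m2, d1) receives a nonzero score through the two-step walk m2-m1-d1, which is how Katz-style scoring proposes candidate associations for experimental follow-up.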

BioMed Research International | 2013

An Approach for Identifying Cytokines Based on a Novel Ensemble Classifier

Quan Zou; Zhen Wang; Xinjun Guan; Bin Liu; Yunfeng Wu; Ziyu Lin

Identifying cytokines and investigating their various functions and biochemical mechanisms is meaningful and important in biology. However, several issues remain, including the large scale of benchmark datasets, serious data imbalance, and the discovery of new gene families. In this paper, we employ a machine learning approach based on a novel ensemble classifier to predict cytokines. We directly selected amino acid sequences as research objects. First, we carefully preprocessed the benchmark data. Second, we analyzed the physicochemical properties and distribution of the amino acids and then extracted a group of 120-dimensional (120D) valid features to represent sequences. Third, in view of the serious imbalance in the benchmark datasets, we used a sampling approach based on the synthetic minority oversampling technique (SMOTE) and K-means clustering undersampling to rebuild the training set. Finally, we built a library for dynamic selection and circulating combination based on clustering (LibD3C) and used the new training set to perform cytokine classification. Experiments showed that the geometric mean of sensitivity and specificity obtained with our approach is as high as 93.3%, indicating that our approach is effective for identifying cytokines.
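
The resampling step relies on SMOTE, which creates synthetic minority samples by interpolating between a minority sample and one of its k nearest minority neighbours, and the reported metric is the geometric mean (G-mean) of sensitivity and specificity. Below is a minimal sketch of both, assuming numeric feature vectors given as tuples. The function names, k, and seed are hypothetical, and the K-means undersampling of the majority class and the LibD3C ensemble are not shown.

```python
# Hypothetical sketch: SMOTE oversampling plus the G-mean metric.
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points, each on the segment between a minority sample
    and one of its k nearest minority neighbours (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # random position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (nb - b) for b, nb in zip(base, neighbour)))
    return synthetic


def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity (recall on positives) and specificity."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)


new_points = smote([(0.0, 1.0), (1.0, 1.0), (0.5, 2.0)], n_synthetic=4, k=2)
```

G-mean is a natural target here because a classifier that ignores the minority class gets a sensitivity of 0, and therefore a G-mean of 0, no matter how high its accuracy on the majority class.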

Big Data Research | 2016

Finding the Best Classification Threshold in Imbalanced Classification

Quan Zou; Sifa Xie; Ziyu Lin; Meihong Wu; Ying Ju

Classification with imbalanced class distributions is a major problem in machine learning, and researchers have given considerable attention to its applications in many real-world scenarios. Although several works have used the area under the receiver operating characteristic (ROC) curve to select potentially optimal classifiers in imbalanced classification, few studies have been devoted to finding the classification threshold for testing or unknown datasets. In general, the classification threshold is simply set to 0.5, which is usually unsuitable for imbalanced classification. In this study, we analyze the drawbacks of using ROC as the sole measure of imbalance in data classification problems. In addition, we propose a novel framework for finding the best classification threshold. Experiments with SCOP v.1.53 data reveal that our proposed framework achieved a 20.63% improvement in F-score compared with more commonly used methods that use the default threshold of 0.5. The findings suggest that the proposed framework is both effective and efficient. A web server and software tools are available via http://datamining.xmu.edu.cn/prht/ or http://prht.sinaapp.com/.
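
A simple way to see why 0.5 is a poor default is to sweep candidate thresholds over held-out scores and keep the one that maximizes F-score. The sketch below does exactly that. It is a generic threshold sweep assuming labelled validation data with scores in [0, 1], not the framework proposed in the paper, and the function names and data are hypothetical.

```python
# Hypothetical sketch: pick the decision threshold that maximizes F-score on validation data.
def f_score(y_true, y_pred):
    """F1 score for binary labels (1 = minority/positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(y_true, scores):
    """Start from the 0.5 default, then try each distinct score as a cut-off."""
    best_t, best_f = 0.5, f_score(y_true, [int(s >= 0.5) for s in scores])
    for t in sorted(set(scores)):
        f = f_score(y_true, [int(s >= t) for s in scores])
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f


y_true = [0, 0, 0, 0, 1, 1]
scores = [0.05, 0.1, 0.2, 0.3, 0.35, 0.45]
print(best_threshold(y_true, scores))  # (0.35, 1.0)
```

On this toy data no score reaches 0.5, so the default threshold predicts no positives (F-score 0), while a cut-off of 0.35 separates the classes perfectly. The threshold should be chosen on validation data and then applied unchanged to test or unknown data.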

Sensors | 2015

Adaptive Data Gathering in Mobile Sensor Networks Using Speedy Mobile Elements

Yongxuan Lai; Jinshan Xie; Ziyu Lin; Tian Wang; Minghong Liao

Data gathering is a key operation for applications in wireless sensor networks, yet it is a challenging problem in mobile sensor networks, where all nodes are mobile and communication among them is opportunistic. This paper proposes an efficient data gathering scheme called ADG that adopts speedy mobile elements as the mobile data collector and takes advantage of the movement patterns of the network. ADG first extracts the network metadata at initial epochs and calculates a set of proxy nodes based on this metadata. Data gathering is then mapped to the Proxy node Time Slot Allocation (PTSA) problem, which schedules the time slots and their order so that the data collector can gather the maximum amount of data within a limited period. Finally, the collector follows the schedule and picks up the sensed data from the proxy nodes through one-hop message transmissions. ADG learns the periods when nodes are relatively stationary, so that the collector is able to pick up data from them during the limited data gathering period. Moreover, proxy nodes and data gathering points can be updated in a timely manner, so that the collector adapts to changes in node movement. Extensive experimental results show that the proposed scheme outperforms other data gathering schemes in terms of message transmission cost and data gathering rate, especially when the data gathering period is limited.
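
The flavour of the PTSA scheduling problem can be illustrated with a simplified model: each proxy node is reachable only during a stationary window and holds some amount of data, the collector serves one proxy at a time, and the goal is to maximize the data collected within the period. Under these assumptions, which ignore travel time and partial visits and are not the paper's exact formulation, the problem reduces to weighted interval scheduling and can be solved by dynamic programming. Proxy names, windows, and data volumes are hypothetical.

```python
# Hypothetical sketch: PTSA simplified to weighted interval scheduling.
from bisect import bisect_right


def schedule_proxies(windows, period):
    """windows: list of (name, start, end, data) stationary windows of proxy nodes.
    Returns (total_data, names of chosen proxies in visiting order)."""
    usable = sorted((w for w in windows if w[2] <= period), key=lambda w: w[2])
    ends = [w[2] for w in usable]
    best = [(0, [])]  # best[i] = optimum using only the first i windows (by end time)
    for i, (name, start, end, data) in enumerate(usable):
        # Number of earlier windows that finish no later than this one starts.
        j = bisect_right(ends, start, 0, i)
        take = (best[j][0] + data, best[j][1] + [name])
        skip = best[i]
        best.append(take if take[0] > skip[0] else skip)
    return best[-1]


windows = [("p1", 0, 4, 5), ("p2", 3, 6, 4), ("p3", 5, 9, 6), ("p4", 8, 12, 7)]
print(schedule_proxies(windows, period=10))  # (11, ['p1', 'p3'])
```

Here p4's window ends after the 10-slot period, so it is excluded, and visiting p1 then p3 beats any schedule that includes the overlapping window of p2.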

Science in China Series F: Information Sciences | 2012

MBA: A market-based approach to data allocation and dynamic migration for cloud database

Tengjiao Wang; Ziyu Lin; Bishan Yang; Jun Gao; Allen Huang; Dongqing Yang; Qi Zhang; Shiwei Tang; Jinzhong Niu

With the coming shift to cloud computing, cloud databases are emerging to provide database services over the Internet. In a cloud-based environment, data are distributed at Internet scale, and the system needs to handle a huge number of user queries simultaneously without delay. How data are distributed among servers has a crucial impact on query load distribution and system response time. In this paper, we propose a market-based control method, called MBA, to achieve query load balance through appropriate data distribution. In MBA, database nodes are treated as traders in a market, and a set of market rules is used to intelligently decide data allocation and migration. We built a prototype system and conducted extensive experiments. Experimental results show that MBA significantly improves system performance in terms of average query response time and fairness.
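
The abstract does not spell out MBA's market rules, so the following is only a hypothetical illustration of the trader idea: the most heavily loaded node sells a data partition to the least loaded node whenever the trade narrows the load gap between them, and trading stops when no such trade exists. Partition loads, node names, and the "sell the largest partition that still narrows the gap" rule are all assumptions.

```python
# Hypothetical sketch of market-style load balancing; not MBA's actual rules.
def rebalance(nodes, max_rounds=100):
    """nodes maps node name -> {partition: query load}; it is mutated in place.
    Returns the list of (partition, seller, buyer) migrations performed."""
    migrations = []
    for _ in range(max_rounds):
        load = {node: sum(parts.values()) for node, parts in nodes.items()}
        seller = max(load, key=load.get)  # most loaded trader sells
        buyer = min(load, key=load.get)   # least loaded trader buys
        gap = load[seller] - load[buyer]
        # A trade is accepted only if it narrows the gap between the two traders.
        offers = [(part_load, part) for part, part_load in nodes[seller].items() if 0 < part_load < gap]
        if not offers:
            break
        _, partition = max(offers)  # sell the largest acceptable partition
        nodes[buyer][partition] = nodes[seller].pop(partition)
        migrations.append((partition, seller, buyer))
    return migrations


nodes = {"A": {"x": 5, "y": 3, "z": 2}, "B": {"w": 1}}
print(rebalance(nodes))  # [('x', 'A', 'B')]
```

Each accepted trade moves a partition with load l where 0 < l < gap, which strictly lowers the sum of squared node loads, so the loop always terminates; `max_rounds` is only a safety guard.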

Computer and Information Technology | 2007

User-Oriented Materialized View Selection

Ziyu Lin; Dongqing Yang; Guojie Song; Tengjiao Wang

The problem of materialized view selection has long been researched, and many approaches have been proposed to address it. However, all methods proposed to date strive to improve overall query performance instead of being user-oriented. In this paper, we propose a new user-oriented method, called SOMES (user-oriented materialized view selection), that aims to achieve better performance on the view selection problem. SOMES takes into account the query characteristics of different users: users are classified into groups according to their query characteristics, and each user group is provided with its own user view window, which contains the views involved in that group's queries. Experimental results show that our method achieves desirable performance improvements over other methods such as BPUS and FPUS.
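
As a rough illustration of the user-oriented idea, the sketch below runs a greedy benefit-per-unit-space selection (the criterion behind BPUS) separately for each user group, so each group gets its own window of views within its own space budget. Group names, budgets, benefits, and sizes are hypothetical, the grouping of users is assumed to be given, and unlike full BPUS the benefits are not recomputed after each selection. This is not SOMES itself.

```python
# Hypothetical sketch: per-group greedy view selection by benefit per unit space.
def select_views_per_group(groups):
    """groups maps group -> (space_budget, [(view, benefit, size), ...]).
    Returns group -> list of selected views (that group's view window)."""
    windows = {}
    for group, (budget, candidates) in groups.items():
        chosen, used = [], 0
        # Consider only views this group's queries touch, best benefit/size first.
        for view, benefit, size in sorted(candidates, key=lambda c: c[1] / c[2], reverse=True):
            if used + size <= budget:
                chosen.append(view)
                used += size
        windows[group] = chosen
    return windows


groups = {
    "analysts": (10, [("v_sales_by_month", 40, 5), ("v_sales_by_region", 30, 6), ("v_returns", 12, 3)]),
    "managers": (4, [("v_kpi", 20, 2), ("v_sales_by_region", 30, 6)]),
}
print(select_views_per_group(groups))
# {'analysts': ['v_sales_by_month', 'v_returns'], 'managers': ['v_kpi']}
```

A global selector would rank all views together, so a heavy group's views could crowd out a light group's; selecting per group keeps each group's frequently used views materialized.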

Journal of Software | 2012

Research on Cloud Databases

Ziyu Lin; Yongxuan Lai; Chen Lin; Yi Xie; Quan Zou

With the recent development of cloud computing, the importance of cloud databases has been widely acknowledged. First, the features, influence, and related products of cloud databases are discussed. Then, research issues in cloud databases are presented in detail, including data models, architecture, consistency, programming models, data security, performance optimization, and benchmarks, among others. Finally, some future trends in this area are discussed.

Asia Pacific Web Conference | 2011

Maintaining Internal Consistency of Report for Real-Time OLAP with Layer-Based View

Ziyu Lin; Yongxuan Lai; Chen Lin; Yi Xie; Quan Zou

Maintaining the internal consistency of reports is an important aspect of real-time data warehousing. OLAP and query tools were initially designed to operate on top of unchanging, static historical data. In a real-time environment, however, the results they produce are usually negatively affected by data changes that occur concurrently with query execution, which may cause internal inconsistencies in a report. In this paper, we propose a new method, called the layer-based view approach, to appropriately and effectively maintain report data consistency. The core idea is to use a lock mechanism to prevent the data involved in an OLAP query from being changed, and to use a layer mechanism to avoid conflicts between read and write operations. Our approach effectively addresses the report consistency issue while avoiding query contention between read and write operations in a real-time OLAP environment.
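
The lock-plus-layer idea can be sketched as follows, under stated assumptions: a running query pins (read-locks) the layer that was current when it started, writes accumulate in a pending buffer, and publishing them creates a new layer that only later queries see. The class and method names are hypothetical, and each layer is a full copy purely for simplicity; the paper's actual lock and layer mechanisms are not specified in the abstract.

```python
# Hypothetical sketch: queries read a pinned layer, writers build the next one.
class LayeredStore:
    def __init__(self, rows):
        self.layers = [dict(rows)]  # published layers, oldest first
        self.pins = {}              # layer index -> active queries (acts as a read lock)
        self.pending = {}           # writes not yet visible to any query

    def begin_query(self):
        """Pin the newest layer so this query sees one consistent snapshot."""
        idx = len(self.layers) - 1
        self.pins[idx] = self.pins.get(idx, 0) + 1
        return idx

    def read(self, idx, key):
        return self.layers[idx].get(key)

    def end_query(self, idx):
        """Release the pin; an unpinned old layer could then be discarded."""
        self.pins[idx] -= 1
        if self.pins[idx] == 0:
            del self.pins[idx]

    def write(self, key, value):
        self.pending[key] = value  # never touches a layer a query might be reading

    def publish(self):
        """Expose pending writes as a new layer; running queries keep their old one."""
        new_layer = dict(self.layers[-1])
        new_layer.update(self.pending)
        self.layers.append(new_layer)
        self.pending = {}


store = LayeredStore({"sales": 100})
q = store.begin_query()
store.write("sales", 150)
store.publish()
print(store.read(q, "sales"))  # 100: the running report is unaffected
```

Every figure in one report comes from the same pinned layer, so concurrent loads cannot make two totals in the same report disagree, and writers never wait on readers.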

International Journal of Distributed Sensor Networks | 2012

Data gathering in opportunistic wireless sensor networks

Yongxuan Lai; Ziyu Lin

Wireless sensor networks and opportunistic networks now show a trend toward technological convergence. On one hand, nodes periodically sense the environment and continuously generate sensing data; on the other hand, node movement and sparse deployment usually lead to intermittently connected links and create a form of opportunistic communication. Effectively collecting the sensing data in opportunistic wireless sensor networks is therefore a challenging problem. In this paper, we propose an efficient data gathering algorithm based on location prediction in opportunistic wireless sensor networks. The algorithm first collects network metadata such as the history of node encounters and contact durations; it then creates a node contact graph, from which predicted optimal data gathering locations are dynamically calculated and updated. Finally, the sink is directed to move to these locations to collect sensing data, avoiding many unnecessary data exchanges and message transmissions. Extensive experimental results show that the proposed algorithm effectively reduces message transmissions and improves the data collection coverage rate.
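
A toy version of this pipeline might build the node contact graph from encounter records, treat the nodes with the most total contact time as the best points to gather data from, and predict each such node's next location as the one it visited most often. Both the weighted-degree heuristic and the frequency-based location predictor are assumptions for illustration, not the paper's prediction model, and all names and data are hypothetical.

```python
# Hypothetical sketch: contact graph -> hub nodes -> predicted gathering locations.
from collections import Counter, defaultdict


def build_contact_graph(encounters):
    """encounters: iterable of (node_a, node_b, duration).
    Returns node -> neighbour -> total contact time (undirected)."""
    graph = defaultdict(lambda: defaultdict(float))
    for a, b, duration in encounters:
        graph[a][b] += duration
        graph[b][a] += duration
    return graph


def predict_location(history):
    """Predict a node's next location as its most frequently visited one."""
    return Counter(history).most_common(1)[0][0]


def gathering_locations(encounters, location_history, k=1):
    graph = build_contact_graph(encounters)
    # Nodes with the most total contact time have likely relayed the most data.
    hubs = sorted(graph, key=lambda n: sum(graph[n].values()), reverse=True)[:k]
    return [(node, predict_location(location_history[node])) for node in hubs]


encounters = [("n1", "n2", 30), ("n2", "n3", 50), ("n1", "n3", 5), ("n3", "n4", 20)]
history = {"n2": ["L1", "L3", "L3"], "n3": ["L2", "L2", "L1"]}
print(gathering_locations(encounters, history, k=2))  # [('n2', 'L3'), ('n3', 'L2')]
```

Re-running this as new encounters arrive gives the "dynamically calculated and updated" behaviour in its simplest form: the sink is re-targeted whenever the hub ranking or a hub's predicted location changes.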
