Jiarui Zhou
Zhejiang University
Publication
Featured research published by Jiarui Zhou.
IEEE Transactions on Evolutionary Computation | 2011
Zexuan Zhu; Jiarui Zhou; Zhen Ji; Yuhui Shi
With the rapid development of high-throughput DNA sequencing technologies, the amount of DNA sequence data is accumulating exponentially. The huge influx of data creates new challenges for storage and transmission. This paper proposes a novel adaptive particle swarm optimization-based memetic algorithm (POMA) for DNA sequence compression. POMA is a synergy of comprehensive learning particle swarm optimization (CLPSO) and an adaptive intelligent single particle optimizer (AdpISPO)-based local search. It takes advantage of both CLPSO and AdpISPO to optimize the design of the approximate repeat vector (ARV) codebook for DNA sequence compression. ARV is first introduced in this paper to represent the repeated fragments across multiple sequences in direct, mirror, pairing, and inverted patterns. In POMA, candidate ARV codebooks are encoded as particles, and the optimal solution, which covers the most approximate repeated fragments with the fewest base variations, is identified through the exploration and exploitation of POMA. In each iteration of POMA, the leader particles in the swarm are selected based on weighted fitness values and each leader particle is fine-tuned with an AdpISPO-based local search, so that the convergence of the search in the local region is accelerated. A detailed comparison study between POMA and the counterpart algorithms is performed on 29 (23 basic and 6 composite) benchmark functions and 11 real DNA sequences. POMA is observed to obtain better or competitive performance with a limited number of function evaluations. POMA also attains lower bits-per-base than other state-of-the-art DNA-specific algorithms on DNA sequence data. The experimental results suggest that the cooperation of CLPSO and AdpISPO in the framework of a memetic algorithm is capable of searching the ARV codebook space efficiently.
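As a rough illustration of the four repeat patterns named in the abstract above, the sketch below derives each variant of a DNA fragment and counts base variations between two fragments. The mapping used here (mirror = reversed, pairing = complemented, inverted = reverse complement) is a common convention and an assumption, not taken from the paper; the function names are hypothetical.

```python
# Illustrative sketch only. Assumed convention (not confirmed by the abstract):
#   mirror   = fragment read backwards
#   pairing  = base-wise complement (A<->T, C<->G)
#   inverted = reverse complement
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def repeat_variants(fragment: str) -> dict[str, str]:
    """Return the fragment under each of the four repeat patterns."""
    reversed_fragment = fragment[::-1]
    return {
        "direct": fragment,
        "mirror": reversed_fragment,
        "pairing": fragment.translate(COMPLEMENT),
        "inverted": reversed_fragment.translate(COMPLEMENT),
    }


def base_variations(a: str, b: str) -> int:
    """Count positions where two equal-length fragments differ."""
    if len(a) != len(b):
        raise ValueError("fragments must have equal length")
    return sum(x != y for x, y in zip(a, b))
```

An approximate repeat in this sense would be a fragment whose `base_variations` against some variant of a codebook entry is small; how POMA scores and selects such entries is not shown here.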
Bioinformatics | 2014
Jiarui Zhou; Ralf J. M. Weber; J. William Allwood; Robert Mistrik; Zexuan Zhu; Zhen Ji; Siping Chen; Warwick B. Dunn; Shan He; Mark R. Viant
Summary: Experimental MSn mass spectral libraries currently do not adequately cover chemical space. This limits the robust annotation of metabolites in metabolomics studies of complex biological samples. In silico fragmentation libraries would improve the identification of compounds from experimental multistage fragmentation data when experimental reference data are unavailable. Here, we present a freely available software package to automatically control Mass Frontier software to construct in silico mass spectral libraries and to perform spectral matching. Based on two case studies, we have demonstrated that automating Mass Frontier allows researchers to generate in silico mass spectral libraries in a high-throughput fashion with little or no human intervention. Availability and implementation: Documentation, examples, results and source code are available at http://www.biosciences-labs.bham.ac.uk/viant/hammer/. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
BMC Bioinformatics | 2014
Jiarui Zhou; Zhen Ji; Zexuan Zhu; Shan He
Background: The exponential growth of next-generation sequencing (NGS) derived DNA data poses great challenges to data storage and transmission. Although many compression algorithms have been proposed for DNA reads in NGS data, few methods are designed specifically to handle the quality scores. Results: In this paper we present a memetic algorithm (MA) based NGS quality score data compressor, namely MMQSC. The algorithm extracts raw quality score sequences from FASTQ formatted files, and designs a compression codebook using MA based multimodal optimization. The input data is then compressed in a substitutional manner. Experimental results on five representative NGS data sets show that MMQSC obtains a higher compression ratio than the other state-of-the-art methods. Particularly, MMQSC is a lossless reference-free compression algorithm, yet obtains an average compression ratio of 22.82% on the experimental data sets. Conclusions: The proposed MMQSC compresses NGS quality score data effectively. It can be utilized to improve the overall compression ratio on FASTQ formatted files.
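The first step described above, extracting raw quality score sequences from FASTQ files, might look like the minimal sketch below. It assumes the common four-line FASTQ layout (header, bases, `+` separator, quality string) with no wrapped lines; the helper names are hypothetical. The `compression_ratio` helper assumes the ratio is compressed size divided by original size, which is one plausible reading of the reported percentage, not something the abstract states.

```python
import io
from typing import Iterable, Iterator


def quality_lines(fastq: Iterable[str]) -> Iterator[str]:
    """Yield the quality string (4th line) of each 4-line FASTQ record."""
    for index, line in enumerate(fastq):
        if index % 4 == 3:
            yield line.rstrip("\n")


def compression_ratio(compressed_bytes: int, original_bytes: int) -> float:
    """Assumed definition: compressed size as a fraction of original size."""
    if original_bytes <= 0:
        raise ValueError("original size must be positive")
    return compressed_bytes / original_bytes


# Usage with an in-memory two-record example.
sample = "@r1\nACGT\n+\nIIII\n@r2\nGGCA\n+\n!!#5\n"
scores = list(quality_lines(io.StringIO(sample)))
```

The codebook design and substitutional coding that MMQSC applies to these quality strings are not reproduced here.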
Memetic Computing | 2011
Jiarui Zhou; Zhen Ji; Linlin Shen; Zexuan Zhu; Siping Chen
A Gabor filter-based face recognition algorithm named POMA-Gabor is proposed in this paper. The algorithm applies particular Gabor wavelets to feature extraction on specific areas of the face image. A particle swarm optimization (PSO)-based memetic algorithm (POMA), which combines comprehensive learning particle swarm optimizer (CLPSO) global search with self-adaptive intelligent single particle optimizer (AdpISPO) local search, is introduced to select the Gabor filter parameters. The experimental results demonstrate that POMA obtains better performance than the other PSO algorithms compared. Employing POMA for Gabor filter design, POMA-Gabor is capable of obtaining more representative information and a higher recognition rate with less computational time.
Metabolites | 2013
James William Allwood; Ralf J. M. Weber; Jiarui Zhou; Shan He; Mark R. Viant; Warwick B. Dunn
The Critical Assessment of Small Molecule Identification (CASMI) contest was developed to provide a systematic comparative evaluation of strategies applied for the annotation and identification of small molecules. The authors participated in eleven challenges in both category 1 (to deduce a molecular formula) and category 2 (to deduce a molecular structure) related to high resolution LC-MS data. For category 1 challenges, the PUTMEDID_LCMS workflows provided the correct molecular formula in nine challenges; the two incorrect submissions were related to a larger mass error in experimental data than expected or the absence of the correct molecular formula in a reference file applied in the PUTMEDID_LCMS workflows. For category 2 challenges, MetFrag was applied to construct in silico fragmentation data and compare with experimentally-derived MS/MS data. The submissions for three challenges were correct, and for eight challenges, the submissions were not correct; some submissions showed similarity to the correct structures, while others showed no similarity. The low number of correct submissions for category 2 was a result of applying the assumption that all chemicals were derived from biological samples and highlights the importance of knowing the origin of biological or chemical samples studied and the metabolites expected to be present to define the correct chemical space to search in annotation processes.
Journal of Biological Research-Thessaloniki | 2016
Junshan Yang; Jiarui Zhou; Zexuan Zhu; Xiaoliang Ma; Zhen Ji
Background: Microarray technology allows biologists to monitor expression levels of thousands of genes among various tumor tissues. Identifying relevant genes for sample classification of various tumor types is beneficial to clinical studies. One of the most widely used classification strategies for multiclass classification data is the One-Versus-All (OVA) schema, which divides the original problem into multiple binary classifications of one class against the rest. Nevertheless, multiclass microarray data tend to suffer from imbalanced class distribution between majority and minority classes, which inevitably deteriorates the performance of the OVA classification. Results: In this study, we propose a novel iterative ensemble feature selection (IEFS) framework for multiclass classification of imbalanced microarray data. In particular, filter feature selection and balanced sampling are performed iteratively and alternately to boost the performance of each binary classification in the OVA schema. The proposed framework is tested and compared with other representative state-of-the-art filter feature selection methods using six benchmark multiclass microarray data sets. The experimental results show that the IEFS framework provides superior or comparable performance to the other methods in terms of both classification accuracy and area under the receiver operating characteristic curve. The more classes the data have, the better the IEFS framework performs. Conclusions: Balanced sampling and feature selection together work well in improving the performance of multiclass classification of imbalanced microarray data. The IEFS framework is readily applicable to other biological data analysis tasks facing the same problem.
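To make the OVA decomposition and the class-imbalance issue concrete, here is a minimal sketch using only the standard library. The function names are hypothetical, and random undersampling of the larger group stands in for "balanced sampling"; the abstract does not say which sampling scheme IEFS uses, so treat that choice as an assumption.

```python
import random


def one_versus_all(labels: list[str]) -> dict[str, list[int]]:
    """Build one binary task per class: 1 for that class, 0 for the rest."""
    return {c: [1 if y == c else 0 for y in labels] for c in sorted(set(labels))}


def balanced_undersample(binary_labels: list[int], seed: int = 0) -> list[int]:
    """Return sample indices with equal numbers of positives and negatives.

    Assumed scheme: randomly undersample whichever group is larger.
    """
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(binary_labels) if y == 1]
    negatives = [i for i, y in enumerate(binary_labels) if y == 0]
    n = min(len(positives), len(negatives))
    return sorted(rng.sample(positives, n) + rng.sample(negatives, n))


# Usage: a six-sample, three-class toy label vector.
labels = ["a", "a", "a", "a", "b", "c"]
tasks = one_versus_all(labels)
balanced_b = balanced_undersample(tasks["b"])
```

In the toy data the "b versus rest" task has one positive against five negatives, which is the kind of imbalance the abstract describes; the balanced subset keeps the single positive and one randomly chosen negative.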
Soft Computing | 2013
Zhen Ji; Jiarui Zhou; Zexuan Zhu; Siping Chen
This paper presents a novel self-configuration single particle optimizer (SCSPO) for DNA sequence compression. Particularly, SCSPO searches for an optimal compression codebook of all unique repeat patterns, and DNA sequences are then compressed by replacing the duplicate fragments with the indexes of the corresponding matched code vectors in the codebook. Featuring a crucial self-configuration process, SCSPO optimizes the codebook with no predefined parameter settings required. Experimental results on benchmark numerical functions and real-world DNA sequences demonstrate that SCSPO is capable of attaining better fitness values than many other PSO variants and that the proposed DNA sequence compression algorithm based on SCSPO attains encouraging compression performance.
Congress on Evolutionary Computation | 2016
Jun Xiao; Yanming Yang; Xiaoliang Ma; Jiarui Zhou; Zexuan Zhu
This paper formulates one-to-many-to-one pickup and delivery problems with dynamic customer requests and traffic information. A multi-objective memetic algorithm named prioLSH-MOMA is proposed to solve the problems. The new algorithm is characterized by a priority and locality-sensitive hashing based local search. prioLSH-MOMA is designed to find an optimal route of a dynamic pickup and delivery problem in terms of route length and workload. Particularly, a re-planning strategy is introduced to handle the dynamic information. Priority and locality-sensitive hashing based local search is applied to fine-tune the candidate routes during the evolution process. prioLSH-MOMA is evaluated with two dynamic pickup and delivery problems simulated on real-world maps, and the results demonstrate the efficiency of the proposed algorithm.
Environmental Science: Nano | 2018
Susan Dekkers; Timothy Williams; Jinkang Zhang; Jiarui Zhou; Rob J. Vandebriel; Liset J.J. de la Fonteyne; Eric R. Gremmer; Shan He; Emily J. Guggenheim; Iseult Lynch; Flemming R. Cassee; Wim H. de Jong; Mark R. Viant
The toxicity of silver (Ag) and zinc oxide (ZnO) nanoparticles (NPs) has been associated with their dissolution or ability to release metal ions while the toxicity of cerium dioxide (CeO2) NPs has been related to their ability to induce or reduce oxidative stress dependent on their surface redox state. To examine the underlying biochemical mechanisms, multiple omics technologies were applied to characterise the responses at the molecular level in cells exposed to various metal-based particles and their corresponding metal ions. Human lung epithelial carcinoma cells (A549) were exposed to various Ag, ZnO, and CeO2 NPs, Ag and ZnO micro-sized particles (MPs), Ag ions (Ag+) and zinc ions (Zn2+) over a 24 h time course. Molecular responses at exposure levels that caused ∼20% cytotoxicity were characterised by direct infusion mass spectrometry lipidomics and polar metabolomics and by RNAseq transcriptomics. All Ag, Zn and ZnO exposures resulted in significant metabolic and transcriptional responses and the great majority of these molecular changes were common to both ionic and NP exposures and characteristic of metal ion exposure. The low toxicity CeO2 NPs elicited few molecular changes, showing slight evidence of oxidative stress for only one of the four CeO2 NPs tested. The multiple omics analyses highlight the main pathways implicated in metal ion-mediated effects. These results can be used to establish adverse outcome pathways as well as strategies to group nanomaterials for risk assessment.
Congress on Evolutionary Computation | 2016
Nuosi Wu; Jiarui Zhou; Zexuan Zhu; Zhen Ji
Structural biology is a branch of molecular biology and biochemistry that aims to understand the interaction of molecules such as proteins by observing their structures. Crystallization is one of the most widely used methods to identify protein structure, yet it is laborious and time-consuming. Researchers are seeking assistance from computers. This paper implements an improved WDSP program for recognizing and predicting the secondary structure of WD40 repeat proteins, a large protein family in eukaryotes. The original WDSP works well on predicting WD40 protein structures but suffers from low computational efficiency. We propose a more computationally efficient WDSP, named FWDSP, by adding clustering and a specific local search to the original WDSP. Experimental results on three datasets of WD40 proteins demonstrate the effectiveness and efficiency of FWDSP.