Publication


Featured research published by André Fujita.


BMC Bioinformatics | 2006

Evaluating different methods of microarray data normalization

André Fujita; João Ricardo Sato; Leonardo de Oliveira Rodrigues; Carlos Eduardo Ferreira; Mari Cleide Sogayar

Background: With the development of DNA hybridization microarray technologies, it is now possible to simultaneously assess the expression levels of thousands to tens of thousands of genes. Quantitative comparison of microarrays uncovers distinct patterns of gene expression, which define different cellular phenotypes or cellular responses to drugs. Due to technical biases, normalization of the intensity levels is a prerequisite to performing further statistical analyses. Therefore, choosing a suitable approach for normalization can be critical and deserves judicious consideration.

Results: Here, we considered three commonly used normalization approaches, namely Loess, Splines and Wavelets, and two non-parametric regression methods that have yet to be used for normalization, namely Kernel smoothing and Support Vector Regression. The results obtained were compared using artificial microarray data and benchmark studies. The results indicate that Support Vector Regression is the most robust to outliers and that Kernel is the worst normalization technique, while no practical differences were observed between Loess, Splines and Wavelets.

Conclusion: In light of our results, Support Vector Regression is favored for microarray normalization because it is more robust than the other methods in estimating the normalization curve.
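The general idea of intensity-dependent normalization discussed in this abstract can be sketched as follows: compute the log-ratio M and the mean log-intensity A for each spot, fit a smooth curve of M against A, and subtract it. The sketch below uses a Gaussian Nadaraya-Watson kernel smoother purely because it is short to write; it is not the authors' implementation, the function names and the bandwidth are assumptions, and the two-channel data are made up for illustration.

```python
import math

def kernel_smooth(a_values, m_values, bandwidth=1.0):
    """Nadaraya-Watson estimate of M at each A, using a Gaussian kernel."""
    fitted = []
    for a0 in a_values:
        weights = [math.exp(-0.5 * ((a - a0) / bandwidth) ** 2) for a in a_values]
        total = sum(weights)  # never zero: the point itself has weight 1
        fitted.append(sum(w * m for w, m in zip(weights, m_values)) / total)
    return fitted

def normalize(red, green, bandwidth=1.0):
    """Subtract the smoothed intensity-dependent trend from the log-ratios."""
    m = [math.log2(r / g) for r, g in zip(red, green)]                     # log-ratio
    a = [0.5 * (math.log2(r) + math.log2(g)) for r, g in zip(red, green)]  # mean log-intensity
    trend = kernel_smooth(a, m, bandwidth)
    return [mi - ti for mi, ti in zip(m, trend)]

# Hypothetical toy data: the red channel carries a constant 1.5-fold dye bias.
green = [100.0, 200.0, 400.0, 800.0, 1600.0]
red = [g * 1.5 for g in green]
normalized = normalize(red, green)
print(normalized)  # values are numerically zero once the bias is removed
```

Swapping `kernel_smooth` for a Loess, spline, wavelet or support vector regression fit would leave the rest of the procedure unchanged; only the curve estimator differs between the approaches compared in the paper.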


BMC Systems Biology | 2007

Modeling gene expression regulatory networks with the sparse vector autoregressive model

André Fujita; João Ricardo Sato; Humberto Miguel Garay-Malpartida; Rui Yamaguchi; Satoru Miyano; Mari Cleide Sogayar; Carlos Eduardo Ferreira

Background: To understand the molecular mechanisms underlying important biological processes, a detailed description of the gene product networks involved is required. In order to define and understand such molecular networks, several statistical methods have been proposed in the literature to estimate gene regulatory networks from time-series microarray data. However, several problems still need to be overcome. Firstly, information flow needs to be inferred, in addition to the correlation between genes. Secondly, we usually try to identify large networks from a large number of genes (parameters) originating from a smaller number of microarray experiments (samples). Due to this situation, which is rather frequent in bioinformatics, it is difficult to perform statistical tests using methods that model large gene-gene networks. In addition, most of the models are based on dimension reduction using clustering techniques; therefore, the resulting network is not a gene-gene network but a module-module network. Here, we present the Sparse Vector Autoregressive (SVAR) model as a solution to these problems.

Results: We have applied the Sparse Vector Autoregressive model to estimate gene regulatory networks based on gene expression profiles obtained from time-series microarray experiments. Through extensive simulations, by applying the SVAR method to artificial regulatory networks, we show that SVAR can infer true positive edges even under conditions in which the number of samples is smaller than the number of genes. Moreover, it is possible to control for false positives, a significant advantage when compared to other methods described in the literature, which are based on ranks or score functions. By applying SVAR to actual HeLa cell cycle gene expression data, we were able to identify well-known transcription factor targets.

Conclusion: The proposed SVAR method is able to model gene regulatory networks in frequent situations in which the number of samples is lower than the number of genes, making it possible to naturally infer partial Granger causalities without any a priori information. In addition, we present a statistical test to control the false discovery rate, which was not previously possible using other gene regulatory network models.
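To make the notion of Granger causality in a vector autoregressive model concrete, here is a minimal sketch for two simulated series. It fits an ordinary (non-sparse) VAR(1) equation by least squares and compares the residual sum of squares with and without the lagged predictor. It is not the SVAR method of the paper: there is no sparsity penalty, no statistical test and no false discovery rate control, and the coefficients 0.5 and 0.8, the noise level and all names are assumptions chosen for illustration.

```python
import random

def ols_two(y, x1, x2):
    """Least-squares fit of y = b1*x1 + b2*x2 (no intercept) via the normal equations."""
    s11 = sum(u * u for u in x1)
    s22 = sum(v * v for v in x2)
    s12 = sum(u * v for u, v in zip(x1, x2))
    s1y = sum(u * t for u, t in zip(x1, y))
    s2y = sum(v * t for v, t in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return (s1y * s22 - s2y * s12) / det, (s2y * s11 - s1y * s12) / det

# Simulate a "regulator" x that drives a "target" y with a one-step delay.
rng = random.Random(0)
n = 500
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [0.0]
for t in range(1, n):
    y.append(0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.1 * rng.gauss(0.0, 1.0))

target, lag_y, lag_x = y[1:], y[:-1], x[:-1]

# Full model: y_t depends on its own past and on the past of x.
a_hat, b_hat = ols_two(target, lag_y, lag_x)
rss_full = sum((t - a_hat * u - b_hat * v) ** 2
               for t, u, v in zip(target, lag_y, lag_x))

# Restricted model: y_t depends only on its own past.
a_restricted = sum(u * t for u, t in zip(lag_y, target)) / sum(u * u for u in lag_y)
rss_restricted = sum((t - a_restricted * u) ** 2 for t, u in zip(target, lag_y))

print(a_hat, b_hat, rss_full, rss_restricted)
```

A large drop from `rss_restricted` to `rss_full` indicates that the past of `x` helps predict `y`, which is the sense in which `x` Granger-causes `y`. With thousands of genes and few time points, this plain least-squares fit is no longer identifiable, which is the setting the sparse model addresses.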


Bioinformatics | 2007

Time-varying modeling of gene expression regulatory networks using the wavelet dynamic vector autoregressive method

André Fujita; João Ricardo Sato; Humberto Miguel Garay-Malpartida; Pedro A. Morettin; Mari Cleide Sogayar; Carlos Eduardo Ferreira

Motivation: A variety of biological cellular processes are achieved through a variety of extracellular regulators, signal transduction, protein-protein interactions and differential gene expression. Understanding the mechanisms underlying these processes requires a detailed molecular description of the protein and gene networks involved. To better understand these molecular networks, we propose a statistical method to estimate time-varying gene regulatory networks from time series microarray data. One well-known problem when inferring connectivity in gene regulatory networks is that the relationships found constitute correlations that do not allow inferring causation, for which a priori biological knowledge is required. Moreover, it is also necessary to know the time period at which this causation occurs. Here, we present the Dynamic Vector Autoregressive model as a solution to these problems.

Results: We have applied the Dynamic Vector Autoregressive model to estimate time-varying gene regulatory networks based on gene expression profiles obtained from microarray experiments. The network is determined entirely from gene expression profile data, without any prior biological knowledge. Through construction of three gene regulatory networks (of p53, NF-kappaB and c-myc) for HeLa cells, we were able to predict the connectivity, Granger causality and dynamics of the information flow in these networks.

Supplementary information: Additional figures may be found at http://mariwork.iq.usp.br/dvar/.


BMC Systems Biology | 2009

Recursive regularization for inferring gene networks from time-course gene expression profiles

Teppei Shimamura; Seiya Imoto; Rui Yamaguchi; André Fujita; Masao Nagasaki; Satoru Miyano

Background: Inferring gene networks from time-course microarray experiments with the vector autoregressive (VAR) model is the process of identifying functional associations between genes through multivariate time series. This problem can be cast as a variable selection problem in statistics. One of the promising methods for variable selection is the elastic net proposed by Zou and Hastie (2005). However, while VAR modeling with the elastic net succeeds in increasing the number of true positives, it also increases the number of false positives.

Results: By incorporating the relative importance of the VAR coefficients into the elastic net, we propose a new class of regularization, called the recursive elastic net, to increase the capability of the elastic net and estimate gene networks based on the VAR model. The recursive elastic net can reduce the number of false positives gradually by updating the importance. Numerical simulations and comparisons demonstrate that the proposed method drastically reduces the number of false positives while keeping a high number of true positives in the network inference, and achieves a true discovery rate (the proportion of true positives among the selected edges) two or more times higher than the competing methods, even when the number of time points is small. We also compared our method with various reverse-engineering algorithms on experimental data of MCF-7 breast cancer cells stimulated with two ErbB ligands, EGF and HRG.

Conclusion: The recursive elastic net is a powerful tool for inferring gene networks from time-course gene expression profiles.
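The true discovery rate used as the evaluation measure in this abstract is defined there as the proportion of true positives among the selected edges. A minimal sketch of that computation follows; the function name and the gene labels are hypothetical, and returning 0.0 when no edge is selected is a convention assumed here, not something stated in the source.

```python
def true_discovery_rate(true_edges, selected_edges):
    """Proportion of selected edges that are also in the reference network."""
    selected = set(selected_edges)
    if not selected:
        return 0.0  # assumed convention for an empty selection
    true_positives = len(selected & set(true_edges))
    return true_positives / len(selected)

# Hypothetical directed edges (regulator, target) in a four-gene network.
true_edges = {("g1", "g2"), ("g2", "g3"), ("g3", "g4")}
selected_edges = {("g1", "g2"), ("g2", "g3"), ("g4", "g1"), ("g1", "g3")}

tdr = true_discovery_rate(true_edges, selected_edges)
print(tdr)  # 2 true positives out of 4 selected edges
```

Edges are treated as ordered pairs, so an edge inferred in the wrong direction counts as a false positive.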


NeuroImage | 2009

Evaluating SVM and MLDA in the extraction of discriminant regions for mental state prediction

João Ricardo Sato; André Fujita; Carlos Eduardo Thomaz; María M. Martín; Janaina Mourão-Miranda; Michael Brammer; Edson Amaro Junior

Pattern recognition methods have been successfully applied in several functional neuroimaging studies. These methods can be used to infer cognitive states, so-called brain decoding. Using such approaches, it is possible to predict the mental state of a subject or a stimulus class by analyzing the spatial distribution of neural responses. In addition, it is possible to identify the regions of the brain containing the information that underlies the classification. The Support Vector Machine (SVM) is one of the most popular methods used to carry out this type of analysis. The aim of the current study is to evaluate SVM and Maximum uncertainty Linear Discrimination Analysis (MLDA) in extracting the voxels containing discriminative information for the prediction of mental states. The comparison was carried out using fMRI data from 41 healthy control subjects who participated in two experiments, one involving visual-auditory stimulation and the other based on bimanual finger-tapping sequences. The results suggest that MLDA uses significantly more voxels containing discriminative information (related to different experimental conditions) to classify the data. On the other hand, SVM is more parsimonious and uses fewer voxels to achieve similar classification accuracies. In conclusion, MLDA is mostly focused on extracting all discriminative information available, while SVM extracts the information that is sufficient for classification.


Frontiers in Systems Neuroscience | 2012

Evaluation of pattern recognition and feature extraction methods in ADHD prediction

João Ricardo Sato; Marcelo Q. Hoexter; André Fujita; Luis Augusto Rohde

Attention Deficit/Hyperactivity Disorder (ADHD) is a neurodevelopmental disorder and one of the most prevalent psychiatric disorders in childhood. The neural substrates associated with this condition, both from structural and functional perspectives, are not yet well established. Recent studies have highlighted the relevance of neuroimaging not only to provide a more solid understanding of the disorder but also for possible clinical support. The ADHD-200 Consortium organized the ADHD-200 global competition, making hundreds of structural magnetic resonance imaging (MRI) and functional MRI (fMRI) datasets of both ADHD patients and typically developing (TD) controls publicly available for research use. In the current study, we evaluate the predictive power of a set of three different feature extraction methods and 10 different pattern recognition methods. The features tested were regional homogeneity (ReHo), amplitude of low frequency fluctuations (ALFF), and independent components analysis maps (resting state networks; RSN). Our findings suggest that the combination of ALFF+ReHo maps contains relevant information to discriminate ADHD patients from TD controls, but with limited accuracy. All classifiers provided almost the same performance in this case. In addition, the combination ALFF+ReHo+RSN was relevant in combined vs. inattentive ADHD classification, achieving an accuracy of 67%. In this latter case, the performances of the classifiers were not equivalent, and L2-regularized logistic regression (both in primal and dual space) provided the most accurate predictions. The analysis of the brain regions containing the most discriminative information suggested that in both classifications (ADHD vs. TD controls and combined vs. inattentive), the relevant information is not confined to a small set of regions but is spatially distributed across the whole brain.


Psychiatry Research-neuroimaging | 2011

Maximum-uncertainty linear discrimination analysis of first-episode schizophrenia subjects

Tomáš Kašpárek; Carlos Eduardo Thomaz; João Ricardo Sato; Daniel Schwarz; Eva Janoušová; Radek Mareček; Radovan Prikryl; Jiri Vanicek; André Fujita; Eva Češková

Recent image analysis techniques have made it possible to recognize subjects based on discriminative image features. We performed a magnetic resonance imaging (MRI)-based classification study to assess its usefulness for outcome prediction of first-episode schizophrenia patients (FES). We included 39 FES patients and 39 healthy controls (HC) and performed the maximum-uncertainty linear discrimination analysis (MLDA) of MRI brain intensity images. The classification accuracy index (CA) was correlated with the Positive and Negative Syndrome Scale (PANSS) and the Global Assessment of Functioning scale (GAF) at 1-year follow-up. The rate of correct classifications of patients with poor and good outcomes was analyzed using chi-square tests. MLDA classification was significantly better than classification by chance. Leave-one-out accuracy was 72%. CA correlated significantly with PANSS and GAF scores at the 1-year follow-up. Moreover, significantly more patients with poor outcome than those with good outcome were classified correctly. MLDA of brain MR intensity features is, therefore, able to correctly classify a significant number of FES patients, and the discriminative features are clinically relevant for clinical presentation 1 year after the first episode of schizophrenia. The accuracy of the current approach is, however, insufficient for immediate use in clinical practice. Several methodological issues need to be addressed to increase the usefulness of this classification approach.


Briefings in Bioinformatics | 2014

A comparative study of statistical methods used to identify dependencies between gene expression signals

Suzana de Siqueira Santos; Daniel Yasumasa Takahashi; Asuka Nakata; André Fujita

One major task in molecular biology is to understand the dependency among genes to model gene regulatory networks. Pearson's correlation is the most common method used to measure dependence between gene expression signals, but it works well only when data are linearly associated. For other types of association, such as non-linear or non-functional relationships, methods based on the concepts of rank correlation and information theory-based measures are more adequate than Pearson's correlation, but are less used in applications, most probably because of a lack of clear guidelines for their use. This work seeks to summarize the main methods (Pearson's, Spearman's and Kendall's correlations; distance correlation; Hoeffding's D measure; Heller-Heller-Gorfine measure; mutual information and maximal information coefficient) used to identify dependency between random variables, especially gene expression data, and also to evaluate the strengths and limitations of each method. Systematic Monte Carlo simulation analyses ranging from sample size, local dependence and linear/non-linear and also non-functional relationships are shown. Moreover, comparisons in actual gene expression data are carried out. Finally, we provide a suggestive list of methods that can be used for each type of data set.
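The contrast drawn here between Pearson's correlation and rank-based measures can be seen on a small example. The sketch below computes both coefficients for a monotonic but strongly non-linear relationship, y = exp(x). The data are synthetic, the helper names are hypothetical, and the rank function assumes there are no ties; it illustrates only two of the measures listed in the abstract.

```python
import math

def pearson(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def ranks(values):
    """Ranks from 1 to n (assumes all values are distinct)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = float(rank)
    return result

def spearman(x, y):
    """Spearman's correlation: Pearson's correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Synthetic monotonic, non-linear relationship.
x = [i / 2 for i in range(1, 21)]   # 0.5, 1.0, ..., 10.0
y = [math.exp(v) for v in x]

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print(r_pearson, r_spearman)
```

Because the relationship is perfectly monotonic, Spearman's coefficient is numerically 1, while Pearson's coefficient falls well below 1. Non-monotonic or non-functional relationships would defeat both, which is where the other measures surveyed in the paper come in.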


Computational Statistics & Data Analysis | 2014

A non-parametric method to estimate the number of clusters

André Fujita; Daniel Yasumasa Takahashi; Alexandre G. Patriota

An important and yet unsolved problem in unsupervised data clustering is how to determine the number of clusters. The proposed slope statistic is a non-parametric and data driven approach for estimating the number of clusters in a dataset. This technique uses the output of any clustering algorithm and identifies the maximum number of groups that breaks down the structure of the dataset. Intensive Monte Carlo simulation studies show that the slope statistic outperforms (for the considered examples) some popular methods that have been proposed in the literature. Applications in graph clustering, in iris and breast cancer datasets are shown.


Journal of Bioinformatics and Computational Biology | 2008

Modeling nonlinear gene regulatory networks from time series gene expression data

André Fujita; João Ricardo Sato; Humberto Miguel Garay-Malpartida; Mari Cleide Sogayar; Carlos Eduardo Ferreira; Satoru Miyano

In cells, molecular networks such as gene regulatory networks are the basis of biological complexity. Therefore, gene regulatory networks have become the core of research in systems biology. Understanding the processes underlying the several extracellular regulators, signal transduction, protein-protein interactions, and differential gene expression processes requires a detailed molecular description of the protein and gene networks involved. To better understand these complex molecular networks and to infer new regulatory associations, we propose a statistical method based on vector autoregressive models and Granger causality to estimate nonlinear gene regulatory networks from time series microarray data. Most of the models available in the literature assume linearity in the inference of gene connections; moreover, these models do not infer directionality in these connections. Thus, a priori biological knowledge is required. However, in pathological cases, no a priori biological information is available. To overcome these problems, we present the nonlinear vector autoregressive (NVAR) model. We have applied the NVAR model to estimate nonlinear gene regulatory networks based entirely on gene expression profiles obtained from DNA microarray experiments. We show the results obtained by NVAR through several simulations and by the construction of three actual gene regulatory networks (p53, NF-kappaB, and c-Myc) for HeLa cells.
