Publication


Featured research published by Aurelie C. Lozano.


Bioinformatics | 2009

Grouped graphical Granger modeling for gene expression regulatory networks discovery

Aurelie C. Lozano; Naoki Abe; Yan Liu; Saharon Rosset

We consider the problem of discovering gene regulatory networks from time-series microarray data. Recently, graphical Granger modeling has gained considerable attention as a promising direction for addressing this problem. These methods apply graphical modeling methods on time-series data and invoke the notion of ‘Granger causality’ to make assertions on causality through inference on time-lagged effects. Existing algorithms, however, have neglected an important aspect of the problem: the group structure among the lagged temporal variables naturally imposed by the time series they belong to. Specifically, existing methods in computational biology share this shortcoming, as well as additional computational limitations, prohibiting their effective application to large datasets comprising a large number of genes and many data points. In the present article, we propose a novel methodology which we term ‘grouped graphical Granger modeling method’, which overcomes the limitations mentioned above by applying a regression method suited for high-dimensional and large data, and by leveraging the group structure among the lagged temporal variables according to the time series they belong to. We demonstrate the effectiveness of the proposed methodology on both simulated and actual gene expression data, specifically the human cancer cell (HeLa S3) cycle data. The simulation results show that the proposed methodology generally exhibits higher accuracy in recovering the underlying causal structure. The results on the gene expression data demonstrate that it leads to improved accuracy with respect to prediction of known links, and also uncovers additional causal relationships uncaptured by earlier works.
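The group structure mentioned in the abstract can be illustrated with a minimal sketch. It assumes a data matrix `X` of shape (time points, series), and it builds the lagged predictors so that all lags of one series form one group. The names `lagged_design` and `group_norms` are hypothetical helpers, not taken from the article, and no group-sparse regression is fitted here; the sketch only shows how the groups are laid out and how a coefficient vector could be scored per source series.

```python
import numpy as np

def lagged_design(X, max_lag):
    """Build lagged predictors from X (shape T x p).

    Row i of the result corresponds to time t = max_lag + i, and the
    column block for series j holds X[t-1, j], ..., X[t-max_lag, j].
    Returns the design matrix and the group label of each column.
    """
    T, p = X.shape
    cols = [
        X[max_lag - lag : T - lag, j]
        for j in range(p)
        for lag in range(1, max_lag + 1)
    ]
    Z = np.column_stack(cols)
    groups = np.repeat(np.arange(p), max_lag)  # column -> source series
    return Z, groups

def group_norms(coef, groups):
    """Euclidean norm of the coefficients belonging to each source series."""
    n_groups = groups.max() + 1
    return np.array([np.linalg.norm(coef[groups == j]) for j in range(n_groups)])
```

A group-wise penalty would act on the quantities returned by `group_norms`, so a source series is kept or dropped as a whole rather than one lag at a time.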


Knowledge Discovery and Data Mining | 2009

Spatial-temporal causal modeling for climate change attribution

Aurelie C. Lozano; Hongfei Li; Alexandru Niculescu-Mizil; Yan Liu; Claudia Perlich; J. R. M. Hosking; Naoki Abe

Attribution of climate change to causal factors has been based predominantly on simulations using physical climate models, which have inherent limitations in describing such a complex and chaotic system. We propose an alternative, data-centric approach that relies on actual measurements of climate observations and human and natural forcing factors. Specifically, we develop a novel method to infer causality from spatial-temporal data, as well as a procedure to incorporate extreme value modeling into our method in order to address the attribution of extreme climate events, such as heatwaves. Our experimental results on a real-world dataset indicate that changes in temperature are not solely accounted for by solar radiance, but attributed more significantly to CO2 and other greenhouse gases. Combined with extreme value modeling, we also show that there has been a significant increase in the intensity of extreme temperatures, and that such changes in extreme temperature are also attributable to greenhouse gases. These preliminary results suggest that our approach can offer a useful alternative to the simulation-based approach to climate modeling and attribution, and provide valuable insights from a fresh perspective.


Knowledge Discovery and Data Mining | 2008

Multi-class cost-sensitive boosting with p-norm loss functions

Aurelie C. Lozano; Naoki Abe

We propose a family of novel cost-sensitive boosting methods for multi-class classification by applying the theory of gradient boosting to p-norm based cost functionals. We establish theoretical guarantees including proof of convergence and convergence rates for the proposed methods. Our theoretical treatment provides interpretations for some of the existing algorithms in terms of the proposed family, including a generalization of the costing algorithm, DSE and GBSE-t, and the Average Cost method. We also experimentally evaluate the performance of our new algorithms against existing methods of cost-sensitive boosting, including AdaCost, CSB2, and AdaBoost.M2 with cost-sensitive weight initialization. We show that our proposed scheme generally achieves superior results in terms of cost minimization and, with the use of higher-order p-norm loss in certain cases, consistently outperforms the comparison methods, thus establishing its empirical advantage.


International Conference on Data Mining | 2014

Orthogonal Matching Pursuit for Sparse Quantile Regression

Aleksandr Y. Aravkin; Aurelie C. Lozano; Ronny Luss; Prabhanjan Kambadur

We consider new formulations and methods for sparse quantile regression in the high-dimensional setting. Quantile regression plays an important role in many data mining applications, including outlier-robust exploratory analysis in gene selection. In addition, the sparsity consideration in quantile regression enables the exploration of the entire conditional distribution of the response variable given the predictors and therefore yields a more comprehensive view of the important predictors. We propose a generalized Orthogonal Matching Pursuit algorithm for variable selection, taking the misfit loss to be either the traditional quantile loss or a smooth version we call quantile Huber, and compare the resulting greedy approaches with convex sparsity-regularized formulations. We apply a recently proposed interior point methodology to efficiently solve all formulations, provide theoretical guarantees of consistent estimation, and demonstrate the performance of our approach using empirical studies of simulated and genomic datasets.
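For readers unfamiliar with the two misfit losses named above, the following is a minimal sketch. The quantile (pinball) loss is standard. The smoothed variant shown here is one common way to replace the kink at zero with a quadratic piece of width controlled by `kappa`; it is offered as an assumption for illustration and may differ in detail from the quantile Huber loss defined in the paper. The function names are hypothetical.

```python
import numpy as np

def pinball_loss(r, tau):
    """Standard quantile loss of residual r at quantile level tau in (0, 1)."""
    r = np.asarray(r, dtype=float)
    return np.maximum(tau * r, (tau - 1.0) * r)

def smoothed_pinball_loss(r, tau, kappa):
    """Quantile loss with the kink at zero replaced by a quadratic piece.

    Quadratic on [-(1 - tau) * kappa, tau * kappa], linear outside, with
    value and slope matched at both break points (illustrative assumption).
    """
    r = np.asarray(r, dtype=float)
    upper = tau * kappa
    lower = -(1.0 - tau) * kappa
    quad = r**2 / (2.0 * kappa)
    right = tau * r - kappa * tau**2 / 2.0
    left = (tau - 1.0) * r - kappa * (1.0 - tau) ** 2 / 2.0
    return np.where(r > upper, right, np.where(r < lower, left, quad))
```

Because the smoothed loss is differentiable everywhere, gradient-based selection steps can be applied to it directly, while the plain pinball loss needs subgradients at zero.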


Journal of Bioinformatics and Computational Biology | 2011

Temporal Graphical Models for Cross-Species Gene Regulatory Network Discovery

Yan Liu; Alexandru Niculescu-Mizil; Aurelie C. Lozano; Yong Lu

Many genes and biological processes function in similar ways across different species. Cross-species gene expression analysis, as a powerful tool to characterize the dynamical properties of the cell, has found a number of applications, such as identifying a conserved core set of cell cycle genes. However, to the best of our knowledge, there has been limited effort to develop appropriate techniques to capture the causality relations between genes from time-series microarray data across species. In this paper, we present hidden Markov random field regression with L1 penalty to uncover the regulatory network structure for different species. The algorithm provides a framework for sharing information across species via hidden component graphs and is able to incorporate domain knowledge across species easily. We demonstrate our method on two synthetic datasets and apply it to discover causal graphs from innate immune response data.


Automatica | 2017

Generalized Kalman smoothing: Modeling and algorithms

Aleksandr Y. Aravkin; James V. Burke; Lennart Ljung; Aurelie C. Lozano; Gianluigi Pillonetto

State-space smoothing has found many applications in science and engineering. Under linear and Gaussian assumptions, smoothed estimates can be obtained using efficient recursions, for example Rauch-Tung-Striebel and Mayne-Fraser algorithms. Such schemes are equivalent to linear algebraic techniques that minimize a convex quadratic objective function with structure induced by the dynamic model. These classical formulations fall short in many important circumstances. For instance, smoothers obtained using quadratic penalties can fail when outliers are present in the data, and cannot track impulsive inputs and abrupt state changes. Motivated by these shortcomings, generalized Kalman smoothing formulations have been proposed in the last few years, replacing quadratic models with more suitable, often nonsmooth, convex functions. In contrast to classical models, these general estimators require use of iterated algorithms, and these have received increased attention from control, signal processing, machine learning, and optimization communities. In this survey we show that the optimization viewpoint provides the control and signal processing community great freedom in the development of novel modeling and inference frameworks for dynamical systems. We discuss general statistical models for dynamic systems, making full use of nonsmooth convex penalties and constraints, and providing links to important models in signal processing and machine learning. We also survey optimization techniques for these formulations, paying close attention to dynamic problem structure. Modeling concepts and algorithms are illustrated with numerical examples.
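The equivalence between classical smoothing and a quadratic minimization, mentioned in the abstract, can be written out as follows. The notation is an assumption chosen for illustration and is not taken from the survey: states $x_k$, measurements $y_k$, process model $G_k$, measurement model $H_k$, and noise covariances $Q_k$ and $R_k$, with $x_0$ a given initial estimate.

```latex
% Linear Gaussian state-space model (illustrative notation)
x_k = G_k x_{k-1} + w_k, \quad w_k \sim \mathcal{N}(0, Q_k), \qquad
y_k = H_k x_k + v_k, \quad v_k \sim \mathcal{N}(0, R_k)

% Smoothed estimate as the minimizer of a convex quadratic objective
\min_{x_1,\dots,x_N} \; \sum_{k=1}^{N}
  \tfrac{1}{2} (y_k - H_k x_k)^{\top} R_k^{-1} (y_k - H_k x_k)
  + \tfrac{1}{2} (x_k - G_k x_{k-1})^{\top} Q_k^{-1} (x_k - G_k x_{k-1})

% Generalized formulation: replace the quadratics with convex penalties V and J
\min_{x_1,\dots,x_N} \; \sum_{k=1}^{N}
  V\!\left( R_k^{-1/2} (y_k - H_k x_k) \right)
  + J\!\left( Q_k^{-1/2} (x_k - G_k x_{k-1}) \right)
```

Choosing $V$ as a robust loss such as the Huber or $\ell_1$ penalty addresses outliers in the measurements, while a nonsmooth $J$ allows abrupt state changes; both cases generally require iterative solvers rather than a single recursion.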


PLOS ONE | 2015

Variable-Selection Emerges on Top in Empirical Comparison of Whole-Genome Complex-Trait Prediction Methods

David Haws; Irina Rish; Simon Teyssedre; Dan He; Aurelie C. Lozano; Prabhanjan Kambadur; Zivan Karaman; Laxmi Parida

Accurate prediction of complex traits based on whole-genome data is a computational problem of paramount importance, particularly to plant and animal breeders. However, the number of genetic markers is typically orders of magnitude larger than the number of samples (p >> n), amongst other challenges. We assessed the effectiveness of a diverse set of state-of-the-art methods on publicly accessible real data. The most surprising finding was that approaches with feature selection performed better than others on average, in contrast to the expectation in the community that variable selection is mostly ineffective, i.e. that it does not improve accuracy of prediction, in spite of p >> n. We observed superior performance despite a somewhat simplistic approach to variable selection, possibly suggesting an inherent robustness. This bodes well in general since variable selection methods usually improve interpretability without loss of prediction power. Apart from identifying a set of benchmark data sets (including one simulated data set), we also discuss the performance analysis for each data set in terms of the input characteristics.


Knowledge Discovery and Data Mining | 2013

Robust sparse estimation of multiresponse regression and inverse covariance matrix via the L2 distance

Aurelie C. Lozano; Huijing Jiang; Xinwei Deng

We propose a robust framework to jointly perform two key modeling tasks involving high dimensional data: (i) learning a sparse functional mapping from multiple predictors to multiple responses while taking advantage of the coupling among responses, and (ii) estimating the conditional dependency structure among responses while adjusting for their predictors. The traditional likelihood-based estimators lack resilience with respect to outliers and model misspecification. This issue is exacerbated when dealing with high dimensional noisy data. In this work, we propose instead to minimize a regularized distance criterion, which is motivated by the minimum distance functionals used in nonparametric methods for their excellent robustness properties. The proposed estimates can be obtained efficiently by leveraging a sequential quadratic programming algorithm. We provide theoretical justification such as estimation consistency for the proposed estimator. Additionally, we shed light on the robustness of our estimator through its linearization, which yields a combination of weighted lasso and graphical lasso with the sample weights providing an intuitive explanation of the robustness. We demonstrate the merits of our framework through simulation study and the analysis of real financial and genetics data.


Nature Communications | 2018

Stratification of TAD boundaries reveals preferential insulation of super-enhancers by strong boundaries

Yixiao Gong; Charalampos Lazaris; Theodore Sakellaropoulos; Aurelie C. Lozano; Prabhanjan Kambadur; Panagiotis Ntziachristos; Iannis Aifantis; Aristotelis Tsirigos

The metazoan genome is compartmentalized in areas of highly interacting chromatin known as topologically associating domains (TADs). TADs are demarcated by boundaries mostly conserved across cell types and even across species. However, a genome-wide characterization of TAD boundary strength in mammals is still lacking. In this study, we first use fused two-dimensional lasso as a machine learning method to improve Hi-C contact matrix reproducibility, and, subsequently, we categorize TAD boundaries based on their insulation score. We demonstrate that higher TAD boundary insulation scores are associated with elevated CTCF levels and that they may differ across cell types. Intriguingly, we observe that super-enhancers are preferentially insulated by strong boundaries. Furthermore, we demonstrate that strong TAD boundaries and super-enhancer elements are frequently co-duplicated in cancer patients. Taken together, our findings suggest that super-enhancers insulated by strong TAD boundaries may be exploited, as a functional unit, by cancer cells to promote oncogenesis.

Topologically associating domains (TADs) detected by Hi-C technologies are megabase-scale areas of highly interacting chromatin. Here Gong, Lazaris et al. develop a computational approach to improve the reproducibility of Hi-C contact matrices and stratify TAD boundaries based on their insulating strength.


Journal of Applied Statistics | 2015

Multi-relational learning via hierarchical nonparametric Bayesian collective matrix factorization

Hongxia Yang; Aurelie C. Lozano

Relational learning addresses problems where the data come from multiple sources and are linked together through complex relational networks. Two important goals are pattern discovery (e.g. by (co)-clustering) and predicting unknown values of a relation, given a set of entities and observed relations among entities. In the presence of multiple relations, combining information from different but related relations can lead to better insights and improved prediction. For this purpose, we propose a nonparametric hierarchical Bayesian model that improves on existing collaborative factorization models and frames a large number of relational learning problems. The proposed model naturally incorporates (co)-clustering and prediction analysis in a single unified framework, and allows for the estimation of entire missing row or column vectors. We develop an efficient Gibbs algorithm and a hybrid Gibbs algorithm using Newton's method to enable fast computation in high dimensions. We demonstrate the value of our framework on simulated experiments and on two real-world problems: discovering kinship systems and predicting the authors of certain articles based on article–word co-occurrence features.

Collaboration


Aurelie C. Lozano's collaborations.

Top Co-Authors

Yan Liu

University of Southern California
