Evrim Acar
University of Copenhagen
Publication
Featured research published by Evrim Acar.
IEEE Transactions on Knowledge and Data Engineering | 2009
Evrim Acar; Bülent Yener
Two-way arrays, or matrices, are often not enough to represent all the information in the data, and standard two-way analysis techniques commonly applied to matrices may fail to find the underlying structures in multi-modal datasets. Multiway data analysis has recently become popular as an exploratory analysis tool for discovering the structures in higher-order datasets, where data have more than two modes. We provide a review of significant contributions in the literature on multiway models and algorithms, as well as their applications in diverse disciplines including chemometrics, neuroscience, social network analysis, text mining and computer vision.
ACM Transactions on Knowledge Discovery From Data | 2011
Daniel M. Dunlavy; Tamara G. Kolda; Evrim Acar
The data in many disciplines such as social networks, Web analysis, etc. is link-based, and the link structure can be exploited for many different data mining tasks. In this article, we consider the problem of temporal link prediction: Given link data for times 1 through T, can we predict the links at time T + 1? If our data has underlying periodic structure, can we predict out even further in time, i.e., links at time T + 2, T + 3, etc.? In this article, we consider bipartite graphs that evolve over time and consider matrix- and tensor-based methods for predicting future links. We present a weight-based method for collapsing multiyear data into a single matrix. We show how the well-known Katz method for link prediction can be extended to bipartite graphs and, moreover, approximated in a scalable way using a truncated singular value decomposition. Using a CANDECOMP/PARAFAC tensor decomposition of the data, we illustrate the usefulness of exploiting the natural three-dimensional structure of temporal link data. Through several numerical experiments, we demonstrate that both matrix- and tensor-based techniques are effective for temporal link prediction despite the inherent difficulty of the problem. Additionally, we show that tensor-based techniques are particularly effective for temporal data with varying periodic patterns.
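The matrix-based steps described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names, the geometric decay weight `(1 - theta)`, and the parameters `theta`, `rank` and `beta` are assumptions chosen for the example. The Katz part relies on the fact that paths between the two node sets of a bipartite graph have odd length, so the series of damped path counts collapses to a function of the singular values.

```python
import numpy as np

def collapse_weighted(slices, theta=0.2):
    """Collapse T link matrices into one, weighting recent periods more.

    Assumption: geometric decay (1 - theta)**(T - t); other weightings are possible.
    """
    T = len(slices)
    return sum((1.0 - theta) ** (T - 1 - t) * Z for t, Z in enumerate(slices))

def truncated_bipartite_katz(X, rank, beta):
    """Approximate bipartite Katz scores from a rank-truncated SVD of X.

    With X = U diag(s) V^T, the sum over odd path lengths
    sum_k beta**(2k+1) (X X^T)**k X equals U diag(psi) V^T,
    where psi = beta*s / (1 - (beta*s)**2), provided beta * s_max < 1.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    U, s, Vt = U[:, :rank], s[:rank], Vt[:rank, :]
    if beta * s[0] >= 1.0:
        raise ValueError("beta must satisfy beta * (largest singular value) < 1")
    psi = beta * s / (1.0 - (beta * s) ** 2)
    return (U * psi) @ Vt  # scores[i, j]: predicted strength of link (i, j)
```

A typical use would collapse the observed periods with `collapse_weighted` and pass the result to `truncated_bipartite_katz`; higher scores suggest more likely future links.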
Intelligent Systems in Molecular Biology | 2007
Evrim Acar; Canan Aykut-Bingol; Haluk Bingol; Rasmus Bro; Bülent Yener
Motivation: The success or failure of an epilepsy surgery depends greatly on the localization of the epileptic focus (origin of a seizure). We address the problem of identification of a seizure origin through an analysis of ictal electroencephalogram (EEG), which has proven to be an effective standard in epileptic focus localization. Summary: With a goal of developing an automated and robust way of visual analysis of large amounts of EEG data, we propose a novel approach based on multiway models to study epilepsy seizure structure. Our contributions are 3-fold. First, we construct an Epilepsy Tensor with three modes, i.e. time samples, scales and electrodes, through wavelet analysis of multi-channel ictal EEG. Second, we demonstrate that multiway analysis techniques, in particular parallel factor analysis (PARAFAC), provide promising results in modeling the complex structure of an epilepsy seizure, localizing a seizure origin and extracting artifacts. Third, we introduce an approach for removing artifacts using multilinear subspace analysis and discuss its merits and drawbacks. Results: Ictal EEG analyses of 10 seizures from 7 patients are included in this study. Our results for 8 seizures match with clinical observations in terms of seizure origin and extracted artifacts. On the other hand, for 2 of the seizures, seizure localization is not achieved using an initial trial of PARAFAC modeling. In these cases, we first apply an artifact removal method and subsequently apply the PARAFAC model on the epilepsy tensor from which potential artifacts have been removed. This method successfully identifies the seizure origin in both cases.
Journal of Chemometrics | 2011
Evrim Acar; Daniel M. Dunlavy; Tamara G. Kolda
Tensor decompositions are higher‐order analogues of matrix decompositions and have proven to be powerful tools for data analysis. In particular, we are interested in the canonical tensor decomposition, otherwise known as CANDECOMP/PARAFAC (CP), which expresses a tensor as the sum of component rank‐one tensors and is used in a multitude of applications such as chemometrics, signal processing, neuroscience and web analysis. The task of computing CP, however, can be difficult. The typical approach is based on alternating least‐squares (ALS) optimization, but it is not accurate in the case of overfactoring. High accuracy can be obtained by using nonlinear least‐squares (NLS) methods; the disadvantage is that NLS methods are much slower than ALS. In this paper, we propose the use of gradient‐based optimization methods. We discuss the mathematical calculation of the derivatives and show that they can be computed efficiently, at the same cost as one iteration of ALS. Computational experiments demonstrate that the gradient‐based optimization methods are more accurate than ALS and faster than NLS in terms of total computation time.
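To make the gradient-based view concrete, the following sketch evaluates a least-squares CP objective for a three-way tensor together with its gradient with respect to each factor matrix. It is an illustration under stated assumptions: the function names and the 1/2 scaling are choices made here, and the residual is formed explicitly for readability, so the sketch does not reproduce the efficient derivative computation the abstract refers to.

```python
import numpy as np

def cp_reconstruct(A, B, C):
    """Sum of R rank-one tensors: M[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]."""
    return np.einsum('ir,jr,kr->ijk', A, B, C)

def cp_loss_and_grad(X, A, B, C):
    """Return f = 0.5 * ||[[A, B, C]] - X||^2 and its gradients w.r.t. A, B, C."""
    R = cp_reconstruct(A, B, C) - X           # dense residual tensor
    loss = 0.5 * np.sum(R ** 2)
    # df/dA[i, r] = sum_{j,k} R[i, j, k] * B[j, r] * C[k, r], and similarly for B and C
    GA = np.einsum('ijk,jr,kr->ir', R, B, C)
    GB = np.einsum('ijk,ir,kr->jr', R, A, C)
    GC = np.einsum('ijk,ir,jr->kr', R, A, B)
    return loss, (GA, GB, GC)
```

The returned value and gradients can be handed to any first-order optimizer after stacking the factor matrices into a single vector.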
International Conference on Data Mining | 2009
Evrim Acar; Daniel M. Dunlavy; Tamara G. Kolda
The data in many disciplines such as social networks, web analysis, etc. is link-based, and the link structure can be exploited for many different data mining tasks. In this paper, we consider the problem of temporal link prediction: Given link data for time periods 1 through T, can we predict the links in time period T + 1? Specifically, we look at bipartite graphs changing over time and consider matrix- and tensor-based methods for predicting links. We present a weight-based method for collapsing multi-year data into a single matrix. We show how the well-known Katz method for link prediction can be extended to bipartite graphs and, moreover, approximated in a scalable way using a truncated singular value decomposition. Using a CANDECOMP/PARAFAC tensor decomposition of the data, we illustrate the usefulness of exploiting the natural three-dimensional structure of temporal link data. Through several numerical experiments, we demonstrate that both matrix- and tensor-based techniques are effective for temporal link prediction despite the inherent difficulty of the problem.
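For the tensor-based route, one simple way to turn CP factors into link scores is to weight each component by the recent average of its temporal factor. The sketch below assumes the factor matrices `A` (first node set), `B` (second node set) and `C` (time) have already been computed; the function name, the optional `weights` argument and the window length `last` are hypothetical choices for illustration, not details taken from the abstract.

```python
import numpy as np

def cp_link_scores(A, B, C, weights=None, last=3):
    """Score links for the next period from CP factors of a node x node x time tensor.

    Each component r contributes weights[r] * gamma[r] * outer(A[:, r], B[:, r]),
    where gamma[r] is the mean of the temporal factor C[:, r] over the last
    `last` time periods (an assumed heuristic).
    """
    if weights is None:
        weights = np.ones(A.shape[1])
    gamma = C[-last:, :].mean(axis=0)       # recent temporal activity per component
    return (A * (weights * gamma)) @ B.T    # scores[i, j] for link (i, j)
```

Components whose temporal profile is currently active therefore dominate the prediction, while components that have faded contribute little.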
Data Mining and Knowledge Discovery | 2015
Beyza Ermis; Evrim Acar; A. Taylan Cemgil
This study deals with missing link prediction, the problem of predicting the existence of missing connections between entities of interest. We approach the problem as filling in missing entries in a relational dataset represented by several matrices and multiway arrays, which we simply call tensors. Consequently, we address the link prediction problem by data fusion formulated as simultaneous factorization of several observation tensors where latent factors are shared among the observations. Previous studies on joint factorization of such heterogeneous datasets have focused on a single loss function (mainly squared Euclidean distance or Kullback–Leibler divergence) and specific tensor factorization models (CANDECOMP/PARAFAC and/or Tucker). In this paper, however, we use the generalized coupled tensor factorization framework to study various alternative tensor models and loss functions, including those already studied in the literature. Through extensive experiments on two real-world datasets, we demonstrate that (i) joint analysis of data from multiple sources via coupled factorization significantly improves the link prediction performance, (ii) selection of a suitable loss function and a tensor factorization model is crucial for accurate missing link prediction, and loss functions that have not been studied for link prediction before may outperform the commonly used loss functions, (iii) joint factorization of datasets can handle difficult cases, such as the cold start problem that arises when a new entity enters the dataset, and (iv) our approach is scalable to large-scale data.
Archive (Sandia Report) | 2010
Daniel M. Dunlavy; Tamara G. Kolda; Evrim Acar
We present Poblano v1.0, a Matlab toolbox for solving unconstrained optimization problems with gradient-based methods. Poblano implements three optimization methods (nonlinear conjugate gradients, limited-memory BFGS, and truncated Newton) that require only first-order derivative information. In this paper, we describe the Poblano methods, provide numerous examples of how to use Poblano, and present results of Poblano used in solving problems from a standard test collection of unconstrained optimization problems.
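The calling pattern for such first-order solvers, a function that returns both the objective value and its gradient, can be illustrated with SciPy. This is a stand-in for illustration only: it uses `scipy.optimize.minimize` with its limited-memory BFGS variant (`L-BFGS-B`), not Poblano or its Matlab interface, and the Rosenbrock test function and the helper names are choices made here.

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

def rosen_with_grad(x):
    """Return the Rosenbrock function value and its gradient at x."""
    return rosen(x), rosen_der(x)

def solve_lbfgs(x0):
    """Minimize the Rosenbrock function using only first-order information."""
    # jac=True tells SciPy that the objective returns (value, gradient).
    return minimize(rosen_with_grad, x0, jac=True, method="L-BFGS-B")
```

Calling `solve_lbfgs(np.array([-1.2, 1.0]))` returns a result object whose `x` attribute holds the computed minimizer. SciPy also offers `method="CG"` (nonlinear conjugate gradients) and `method="Newton-CG"` (a truncated Newton method) behind the same interface.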
BMC Bioinformatics | 2014
Evrim Acar; Evangelos E. Papalexakis; Gözde Gürdeniz; Morten Rasmussen; Anders J. Lawaetz; Mathias Nilsson; Rasmus Bro
Background: Analysis of data from multiple sources has the potential to enhance knowledge discovery by capturing underlying structures, which are, otherwise, difficult to extract. Fusing data from multiple sources has already proved useful in many applications in social network analysis, signal processing and bioinformatics. However, data fusion is challenging since data from multiple sources are often (i) heterogeneous (i.e., in the form of higher-order tensors and matrices), (ii) incomplete, and (iii) have both shared and unshared components. In order to address these challenges, in this paper, we introduce a novel unsupervised data fusion model based on joint factorization of matrices and higher-order tensors. Results: While the traditional formulation of coupled matrix and tensor factorizations modeling only shared factors fails to capture the underlying structures in the presence of both shared and unshared factors, the proposed data fusion model has the potential to automatically reveal shared and unshared components through modeling constraints. Using numerical experiments, we demonstrate the effectiveness of the proposed approach in terms of identifying shared and unshared components. Furthermore, we measure a set of mixtures with known chemical composition using both LC-MS (Liquid Chromatography - Mass Spectrometry) and NMR (Nuclear Magnetic Resonance) and demonstrate that the structure-revealing data fusion model can (i) successfully capture the chemicals in the mixtures and extract the relative concentrations of the chemicals accurately, (ii) provide promising results in terms of identifying shared and unshared chemicals, and (iii) reveal the relevant patterns in LC-MS by coupling with the diffusion NMR data. Conclusions: We have proposed a structure-revealing data fusion model that can jointly analyze heterogeneous, incomplete data sets with shared and unshared components and demonstrated its promising performance as well as potential limitations on both simulated and real data.
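The idea of revealing shared and unshared components through constraints can be sketched as an objective function. The version below is one plausible formulation, assumed for illustration rather than taken from the abstract: a three-way tensor `X` and a matrix `Y` share the factor `A` in their first mode, each component carries a weight in the tensor (`lam`) and in the matrix (`sig`), and an l1 penalty with strength `beta` pushes the weights of components absent from a data set toward zero. All names are hypothetical, and missing entries are not handled.

```python
import numpy as np

def coupled_objective(X, Y, A, B, C, V, lam, sig, beta=1e-3):
    """Least-squares fit of a coupled tensor/matrix model plus sparsity on weights.

    Model (assumed): X[i, j, k] ~ sum_r lam[r] * A[i, r] * B[j, r] * C[k, r]
                     Y[i, m]    ~ sum_r sig[r] * A[i, r] * V[m, r]
    """
    X_hat = np.einsum('r,ir,jr,kr->ijk', lam, A, B, C)
    Y_hat = (A * sig) @ V.T
    fit = 0.5 * np.sum((X - X_hat) ** 2) + 0.5 * np.sum((Y - Y_hat) ** 2)
    penalty = beta * (np.abs(lam).sum() + np.abs(sig).sum())
    return fit + penalty
```

Under this formulation, a component with a nonzero weight in both `lam` and `sig` is shared, while a component whose weight is (near) zero in one of them is specific to the other data set.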
BMC Systems Biology | 2008
Bülent Yener; Evrim Acar; Pheadra Aguis; Kristin P. Bennett; Scott L. Vandenberg; George E. Plopper
Background: Systems biology refers to multidisciplinary approaches designed to uncover emergent properties of biological systems. Stem cells are an attractive target for this analysis, due to their broad therapeutic potential. A central theme of systems biology is the use of computational modeling to reconstruct complex systems from a wealth of reductionist, molecular data (e.g., gene/protein expression, signal transduction activity, metabolic activity, etc.). A number of deterministic, probabilistic, and statistical learning models are used to understand sophisticated cellular behaviors such as protein expression during cellular differentiation and the activity of signaling networks. However, many of these models are bimodal, i.e., they only consider row-column relationships. In contrast, multiway modeling techniques (also known as tensor models) can analyze multimodal data, which capture much more information about complex behaviors such as cell differentiation. In particular, tensors can be very powerful tools for modeling the dynamic activity of biological networks over time. Here, we review the application of systems biology to stem cells and illustrate application of tensor analysis to model collagen-induced osteogenic differentiation of human mesenchymal stem cells. Results: We applied Tucker1, Tucker3, and Parallel Factor Analysis (PARAFAC) models to identify protein/gene expression patterns during extracellular matrix-induced osteogenic differentiation of human mesenchymal stem cells. In one case, we organized our data into a tensor of type protein/gene locus link × gene ontology category × osteogenic stimulant, and found that our cells expressed two distinct, stimulus-dependent sets of functionally related genes as they underwent osteogenic differentiation. In a second case, we organized DNA microarray data in a three-way tensor of gene IDs × osteogenic stimulus × replicates, and found that application of tensile strain to a collagen I substrate accelerated the osteogenic differentiation induced by a static collagen I substrate. Conclusion: Our results suggest gene- and protein-level models whereby stem cells undergo transdifferentiation to osteoblasts, and lay the foundation for mechanistic, hypothesis-driven studies. Our analysis methods are applicable to a wide range of stem cell differentiation models.
BMC Genomics | 2007
Kristin P. Bennett; Charles Bergeron; Evrim Acar; Robert F. Klees; Scott L. Vandenberg; Bülent Yener; George E. Plopper
Background: Recently, we demonstrated that human mesenchymal stem cells (hMSC) stimulated with dexamethasone undergo gene focusing during osteogenic differentiation (Stem Cells Dev 14(6): 1608–20, 2005). Here, we examine the protein expression profiles of three additional populations of hMSC stimulated to undergo osteogenic differentiation via either contact with pro-osteogenic extracellular matrix (ECM) proteins (collagen I, vitronectin, or laminin-5) or osteogenic media supplements (OS media). Specifically, we annotate these four protein expression profiles, as well as profiles from naïve hMSC and differentiated human osteoblasts (hOST), with known gene ontologies and analyze them as a tensor with modes for the expressed proteins, gene ontologies, and stimulants. Results: Direct component analysis in the gene ontology space identifies three components that account for 90% of the variance between hMSC, osteoblasts, and the four stimulated hMSC populations. The directed component maps the differentiation stages of the stimulated stem cell populations along the differentiation axis created by the difference in the expression profiles of hMSC and hOST. Surprisingly, hMSC treated with ECM proteins lie closer to osteoblasts than do hMSC treated with OS media. Additionally, the second component demonstrates that proteomic profiles of collagen I- and vitronectin-stimulated hMSC are distinct from those of OS-stimulated cells. A three-mode tensor analysis reveals additional focus proteins critical for characterizing the phenotypic variations between naïve hMSC, partially differentiated hMSC, and hOST. Conclusion: The differences between the proteomic profiles of OS-stimulated hMSC and ECM-hMSC characterize different transitional phenotypes en route to becoming osteoblasts. This conclusion is arrived at via a three-mode tensor analysis validated using hMSC plated on laminin-5.