Cristian Gatu
Alexandru Ioan Cuza University
Publications
Featured research published by Cristian Gatu.
Journal of Computational and Graphical Statistics | 2006
Cristian Gatu; Erricos John Kontoghiorghes
An efficient branch-and-bound algorithm for computing the best-subset regression models is proposed. The algorithm avoids the computation of the whole regression tree that generates all possible subset models. It is formally shown that if the branch-and-bound test holds, then the current subtree together with its right-hand-side subtrees can be cut. This significantly reduces the computational burden of the proposed algorithm when compared to an existing leaps-and-bounds method, which generates two trees. Specifically, the proposed algorithm, which is based on orthogonal transformations, outperforms the leaps-and-bounds strategy by O(n³). The criteria used in identifying the best subsets are based on monotone functions of the residual sum of squares (RSS), such as R², adjusted R², mean square error of prediction, and Cp. Strategies and heuristics that improve the computational performance of the proposed algorithm are investigated. A computationally efficient heuristic version of the branch-and-bound strategy, which decides to cut subtrees using a tolerance parameter, is proposed. The heuristic algorithm derives models close to the best ones. Moreover, it is shown analytically that the relative error of the RSS, and consequently of the corresponding statistic, of the computed subsets is smaller than the value of the tolerance parameter, which lies between zero and one. Computational results and experiments on random and real data are presented and analyzed.
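The cutting test can be illustrated with a minimal sketch. This is not the authors' orthogonal-transformation implementation: it refits each node with plain least squares, and all function names are illustrative. The bound exploits the monotonicity of the RSS in the variable set: a model can never fit better than the model built from the superset of its variables.

```python
import numpy as np

def rss(X, y, idx):
    """Residual sum of squares of the OLS fit on columns idx."""
    if not idx:
        return float(y @ y)
    beta, *_ = np.linalg.lstsq(X[:, idx], y, rcond=None)
    r = y - X[:, idx] @ beta
    return float(r @ r)

def best_subset(X, y, k):
    """Best k-variable model by a naive branch-and-bound search.

    At a node (chosen, remaining), rss(chosen + remaining) is a lower
    bound on the RSS of any k-subset drawn from this subtree, because
    dropping variables can only increase the RSS.  If the bound is not
    below the incumbent, the whole subtree is cut.
    """
    p = X.shape[1]
    best = (np.inf, None)

    def search(chosen, remaining):
        nonlocal best
        if len(chosen) == k:
            r = rss(X, y, chosen)
            if r < best[0]:
                best = (r, tuple(chosen))
            return
        if len(chosen) + len(remaining) < k:
            return
        if rss(X, y, chosen + remaining) >= best[0]:
            return  # branch-and-bound test holds: cut the subtree
        for i, v in enumerate(remaining):
            search(chosen + [v], remaining[i + 1:])

    search([], list(range(p)))
    return best
```

Because any criterion that is a monotone function of the RSS (for a fixed subset size) ranks models identically, the same cut is valid for R², adjusted R², or Cp.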
Computational Statistics & Data Analysis | 2007
Marc Hofmann; Cristian Gatu; Erricos John Kontoghiorghes
Several strategies for computing the best subset regression models are proposed. Some of the algorithms are modified versions of existing regression-tree methods, while others are new. The first algorithm selects the best subset models within a given size range. It uses a reduced search space and is found to outperform computationally the existing branch-and-bound algorithm. The properties and computational aspects of the proposed algorithm are discussed in detail. The second new algorithm preorders the variables inside the regression tree. A radius is defined in order to measure the distance of a node from the root of the tree. The algorithm applies the preordering to all nodes whose distance is smaller than a radius that is given a priori. An efficient method of preordering the variables is employed. The experimental results indicate that the algorithm performs best when preordering is employed with a radius of between one quarter and one third of the number of variables. The algorithm has been applied with such a radius to tackle large-scale subset-selection problems that are considered to be computationally infeasible by conventional exhaustive-selection methods. A class of new heuristic strategies is also proposed. The most important of these is one that assigns a different tolerance value to each subset model size. By choosing the tolerances appropriately, this strategy encompasses all of the exhaustive and heuristic subset-selection strategies. In addition, the strategy can be used to investigate submodels having noncontiguous size ranges. Its implementation provides a flexible tool for tackling large-scale models.
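The preordering idea can be sketched as follows. This is a simplification of the method in the paper: here each variable is ranked by how much it alone explains, and the criterion and function name are assumptions made for illustration.

```python
import numpy as np

def preorder_by_marginal_rss(X, y):
    """Order variables so the most explanatory come first.

    A simple preordering heuristic: rank each column by the RSS of
    the one-variable regression of y on that column alone.  Placing
    strong variables near the root tends to trigger the bound test
    earlier and cut larger subtrees.
    """
    rss1 = []
    for j in range(X.shape[1]):
        x = X[:, j]
        beta = (x @ y) / (x @ x)   # OLS slope for a single column
        r = y - beta * x
        rss1.append(float(r @ r))
    return np.argsort(rss1)        # ascending RSS = best first
```

Restricting the preordering to nodes within a given radius of the root trades the cost of the reordering itself against the pruning it enables, which is why an intermediate radius performs best.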
Parallel Computing | 2003
Cristian Gatu; Erricos John Kontoghiorghes
Efficient parallel algorithms for computing all possible subset regression models are proposed. The algorithms are based on the dropping columns method that generates a regression tree. The properties of the tree are exploited in order to provide an efficient load balancing which results in no inter-processor communication. Theoretical measures of complexity suggest linear speedup. The parallel algorithms are extended to deal with the general linear and seemingly unrelated regression models. The case where new variables are added to the regression model is also considered. Experimental results on a shared memory machine are presented and analyzed.
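The communication-free load balancing rests on the skewed but predictable sizes of the subtrees of the regression tree. A rough sketch, assuming child i of the root heads a subtree of 2^(p-1-i) nodes (the paper's exact node counts and allocation scheme may differ), greedily assigns whole subtrees to the least-loaded processor:

```python
def subtree_sizes(p):
    """Assumed node counts of the top-level subtrees of the
    all-subsets regression tree on p variables: child i of the root
    heads a subtree of 2**(p-1-i) nodes, so the workload halves
    from one child to the next."""
    return [2 ** (p - 1 - i) for i in range(p)]

def assign_subtrees(p, nprocs):
    """Static allocation: hand out subtrees in decreasing size,
    always to the currently least-loaded processor.  Each processor
    then traverses its subtrees independently, so no inter-processor
    communication is needed during the enumeration."""
    load = [0] * nprocs
    plan = [[] for _ in range(nprocs)]
    for i, w in enumerate(subtree_sizes(p)):
        j = load.index(min(load))
        load[j] += w
        plan[j].append(i)
    return plan, load
```

Because the whole schedule is computable up front from p alone, the speedup depends only on how evenly the fixed subtree sizes can be partitioned, which is what makes a near-linear theoretical speedup plausible.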
Computational Statistics & Data Analysis | 2007
Cristian Gatu; Petko Yanev; Erricos John Kontoghiorghes
A regression graph to enumerate and evaluate all possible subset regression models is introduced. The graph is a generalization of a regression tree. All the spanning trees of the graph are minimum spanning trees and provide an optimal computational procedure for generating all possible submodels. Each minimum spanning tree has a different structure and characteristics. An adaptation of a branch-and-bound algorithm which computes the best-subset models using the regression graph framework is proposed. Experimental results and comparison with an existing method based on a regression tree are presented and discussed.
Computational Management Science | 2005
Cristian Gatu; Erricos John Kontoghiorghes
Algorithms for computing the subset Vector Autoregressive (VAR) models are proposed. These algorithms can be used to choose a subset of the most statistically significant variables of a VAR model. In such cases, the selection criteria are based on the residual sum of squares or the estimated residual covariance matrix. The VAR model with zero-coefficient restrictions is formulated as a Seemingly Unrelated Regressions (SUR) model. Furthermore, the SUR model is transformed into one of smaller size, where the exogenous matrices comprise columns of a triangular matrix. Efficient algorithms which exploit the common columns of the exogenous matrices, the sparse structure of the variance-covariance matrix of the disturbances, and the special properties of the SUR models are investigated. The main computational tool of the selection strategies is the generalized QR decomposition and its modification.
Journal of Computational and Graphical Statistics | 2010
Marc Hofmann; Cristian Gatu; Erricos John Kontoghiorghes
A new algorithm to solve exact least trimmed squares (LTS) regression is presented. The adding row algorithm (ARA) extends existing methods that compute the LTS estimator for a given coverage. It employs a tree-based strategy to compute a set of LTS regressors for a range of coverage values. Thus, prior knowledge of the optimal coverage is not required. New nodes in the regression tree are generated by updating the QR decomposition of the data matrix after adding one observation to the regression model. The ARA is enhanced by employing a branch and bound strategy. The branch and bound algorithm is an exhaustive algorithm that uses a cutting test to prune nonoptimal subtrees. It significantly improves over the ARA in computational performance. Observation preordering throughout the traversal of the regression tree is investigated. A computationally efficient and numerically stable calculation of the bounds using Givens rotations is designed around the QR decomposition, avoiding the need to explicitly update the triangular factor when an observation is added. This reduces the overall computational load of the preordering device by approximately half. A solution is proposed to allow preordering when the model is underdetermined. It employs pseudo-orthogonal rotations to downdate the QR decomposition. The strategies are illustrated by example. Experimental results confirm the computational efficiency of the proposed algorithms. Supplemental materials (R package and formal proofs) are available online.
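The row-adding update at the heart of the ARA can be sketched as follows. This is a generic Givens-rotation update of the triangular factor, not the paper's optimized implementation; it assumes R is the square upper-triangular factor with as many columns as the data matrix.

```python
import numpy as np

def qr_add_row(R, x):
    """Update the triangular factor R of a QR decomposition after
    appending the observation row x to the data matrix.

    A sequence of Givens rotations zeroes the new row against the
    diagonal of R, so the orthogonal factor Q never needs to be
    formed or stored.
    """
    R = R.copy()
    x = x.astype(float).copy()
    n = R.shape[0]
    for i in range(n):
        r, xi = R[i, i], x[i]
        h = np.hypot(r, xi)      # rotation radius; stable vs. sqrt(r**2 + xi**2)
        if h == 0.0:
            continue
        c, s = r / h, xi / h
        # rotate row i of R together with the incoming row
        Ri, xrow = R[i, i:].copy(), x[i:].copy()
        R[i, i:] = c * Ri + s * xrow
        x[i:] = -s * Ri + c * xrow   # x[i] is annihilated here
    return R
```

Each update costs O(n²) instead of recomputing the factorization from scratch, which is what makes generating tree nodes by adding one observation at a time affordable.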
Archive | 2007
Erricos John Kontoghiorghes; Cristian Gatu
Optimisation Models and Methods:
A Supply Chain Network Perspective for Electric Power Generation, Supply, Transmission, and Consumption
Worst-Case Modelling for Management Decisions under Incomplete Information, with Application to Electricity Spot Markets
An Approximate Winner Determination Algorithm for Hybrid Procurement Mechanisms in Logistics
Proximal-ACCPM: A Versatile Oracle Based Optimization Method
A Survey of Different Integer Programming Formulations of the Travelling Salesman Problem
Econometric Modelling and Prediction:
The Threshold Accepting Optimization Algorithm in Economics and Statistics
The Autocorrelation Functions in SETARMA Models
Trend Estimation and De-Trending
Non-Dyadic Wavelet Analysis
Measuring Core Inflation by Multivariate Structural Time Series Models
Financial Modelling:
Random Portfolios for Performance Measurement
Real Options with Random Controls, Rare Events, and Risk-to-Ruin
Computational Statistics & Data Analysis | 2007
Cristian Gatu; James E. Gentle; John Hinde; Moon Yul Huh
The journal Computational Statistics & Data Analysis (CSDA) regularly publishes papers with a strong algorithmic and software component. Some recent CSDA-related articles can be found in Bustos and Frery (2006), Hammill and Preisser (2006), Keeling and Pavur (2007), Novikov and Oberman (2007), Rosenthal (2007) and Tomasi and Bro (2006). This is the first CSDA special issue on Statistical Algorithms and Software. It brings together a number of contributions that relate to statistical software, algorithms, and methodology. The first five papers are concerned with statistical software and packages. Krätzig (2007) introduces an open-source Java software framework, JStatCom, that aims to support the development of rich desktop clients for data analysis. In the second article, Aluja-Banet et al. (2007) present a system for multipurpose data fusion based on the k-nearest neighbor hot-deck imputation method. Fujiwara et al. (2007) have implemented a general statistical language using Java and MathML technologies. A package for co-breaking analysis, COBRA, is presented by Massmann (2007). Finally, Höhle and Feldmann (2007) describe an R package, RLadyBug, for the simulation, visualization and estimation of stochastic epidemic models. The second part is concerned with statistical algorithms. Chavent et al. (2007) present a divisive hierarchical clustering algorithm, DIVCLUS-T, based on a monothetic bipartitional approach, allowing the dendrogram of the hierarchy to be read as a decision tree. Fernando and Kulatunga (2007) describe a Fortran program for the fitting of multivariate isotonic regression. Adaptive population-based search algorithms for the estimation of nonlinear regression parameters are proposed and implemented by Tvrdik et al. (2007). Gramm et al. (2007) provide a comparison and evaluation of algorithms for compact letter displays.
Beninel and Grelaud (2007) devise algorithms for computing exact distribution values of statistics that are linear combinations of 3-nomial variables. The third and last part of the special issue gathers contributions related to methodological algorithms. Park et al. (2007) present an algorithm for sampling streaming data with replacement. Bernholt et al. (2007) describe algorithms for computing the least quartile difference estimator in the plane. Several applications of random recursive partitioning are discussed by Iacus and Porro (2007). The article by Consonni and Marin (2007) illustrates the behavior of mean-field variational Bayesian inference in the setting of the probit model. Finally, Gatu et al. (2007) introduce a graph approach to the combinatorial problem of subset regression model selection.
Computational Statistics & Data Analysis | 2010
Cristian Gatu; B. D. McCullough
Computational Statistics & Data Analysis has long published articles on algorithms and software. Recently it published its first Special Issue on Statistical Algorithms and Software (Gatu et al., 2007a). The 15 papers in the issue included, among others: Gatu et al. (2007b) on all possible regression submodels; Park et al. (2007) on sampling streaming data; Tvrdik, Krivy and Misik (2007) on adaptive population search; and several software packages (Aluja-Banet et al., 2007; Massmann, 2007; Höhle and Feldmann, 2007). The present issue is the journal's second such special issue, in which we have again collected fifteen papers. Three papers focus on specific software packages. Bulla et al. (2010) present an R package for analyzing hidden semi-Markov models. Harrington and Salibian-Barrera (2010) use the package BIRCH to find approximate solutions to combinatorial problems on large datasets. Wu et al. (2010) introduce the package GAP, which is designed to facilitate matrix visualization and cluster analysis. Seven papers present algorithms for solving statistical problems. Davidov and Iliopoulos (2010) comment on an iterative algorithm for nonparametric estimation in biased sampling models. Hu and Kam-Wah (2010) offer a Bayesian approach to distributed evolutionary Monte Carlo. Iacobucci et al. (2010) use double Rao-Blackwellisation to enhance variance stabilisation in Population Monte Carlo. McNicholas et al. (2010) offer serial and parallel implementations of model-based clustering via parsimonious Gaussian mixture models. Poitevineau and Lecoutre (2010) use the K-prime and K-square distributions for Bayesian predictive procedures. Reddy and Rajaratnam (2010) use component-wise parameter smoothing to learn mixture models. Saadaoui (2010) reviews EM acceleration procedures and offers a new method. Five methodological papers have algorithmic or software components. Escanciano and Jacho-Chavez (2010) approximate the critical values of the Cramér-von Mises statistic.
Gallegos and Ritter (2010) use combinatorial optimization for clustering with cardinality constraints. Hanea et al. (2010) use Bayesian Belief Networks to mine and visualize ordinal data. Yang et al. (2010) discuss generalized quasi-regression. Yucel and Demirtas (2010) use simulation to assess the impact of non-normal random effects on inference by multiple imputation. Computational Statistics and Data Analysis will continue to publish special issues devoted to statistical algorithms and software, providing researchers a specialized outlet for disseminating their advances in these areas.
Statistics and Computing | 2013
Cristian Gatu; Erricos John Kontoghiorghes
An efficient optimization algorithm for identifying the best least-squares regression model under the condition of non-negative coefficients is proposed. The algorithm derives its solution from the unrestricted least squares problem and is based on the regression-tree and branch-and-bound techniques for computing the best subset regression. The aim is to fill a gap in computationally tractable solutions to the non-negative least squares problem with model selection. The proposed method is illustrated with a real dataset. Experimental results on real and artificial random datasets confirm the computational efficacy of the new strategy and demonstrate its ability to solve large model selection problems that are subject to non-negativity constraints.
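The problem being solved admits a brute-force reference version, which the paper's branch-and-bound search accelerates. The sketch below is such a reference, not the paper's algorithm; the function name and the at-most-k interface are illustrative.

```python
import numpy as np
from itertools import combinations

def best_nonneg_subset(X, y, k):
    """Exhaustive reference for best-subset selection under
    non-negativity: among all subsets of at most k variables, return
    the one with the smallest RSS whose unrestricted least-squares
    coefficients are all non-negative."""
    p = X.shape[1]
    best = (float(y @ y), (), None)   # start from the empty model
    for size in range(1, k + 1):
        for idx in combinations(range(p), size):
            beta, *_ = np.linalg.lstsq(X[:, idx], y, rcond=None)
            if np.all(beta >= 0):     # keep only feasible fits
                r = y - X[:, idx] @ beta
                rss = float(r @ r)
                if rss < best[0]:
                    best = (rss, idx, beta)
    return best
```

Only subsets whose unrestricted fit already satisfies the sign constraints are candidates, which is the link between the unrestricted least squares solution and the constrained model selection problem that the abstract alludes to.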