Vartan Choulakian
Université de Moncton
Publications
Featured research published by Vartan Choulakian.
Technometrics | 2001
Vartan Choulakian; M. A. Stephens
Tests of fit are given for the generalized Pareto distribution (GPD) based on Cramér–von Mises statistics. Examples are given to illustrate the estimation techniques and the goodness-of-fit procedures. The tests are applied to the exceedances over given thresholds for 238 river flows in Canada; in general, the GPD provides an adequate fit. The tests are useful in deciding the threshold in such applications; this method is investigated and also the closeness of the GPD to some other distributions that might be used for long-tailed data.
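As a rough illustration of the statistics named in this abstract, the sketch below computes the Cramér–von Mises W² and Anderson–Darling A² values for a sample against a fully specified GPD. It assumes the common (sigma, xi) parametrization and treats the parameters as known; the function names are hypothetical, and the paper's estimation step and percentage-point tables are not reproduced here.

```python
import math

def gpd_cdf(x, sigma, xi):
    """GPD distribution function in the (sigma, xi) parametrization (an assumption).

    x is expected to lie inside the support of the distribution.
    """
    if abs(xi) < 1e-12:
        # Limiting case xi -> 0 is the exponential distribution.
        return 1.0 - math.exp(-x / sigma)
    return 1.0 - (1.0 + xi * x / sigma) ** (-1.0 / xi)

def edf_statistics(sample, cdf):
    """Return (W2, A2) from the probability integral transform z = F(x)."""
    z = sorted(cdf(x) for x in sample)
    n = len(z)
    # W2 = sum (z_(i) - (2i-1)/(2n))^2 + 1/(12n), written with 0-based i.
    w2 = 1.0 / (12 * n) + sum(
        (z[i] - (2 * i + 1) / (2 * n)) ** 2 for i in range(n)
    )
    # A2 = -n - (1/n) sum (2i-1) [ln z_(i) + ln(1 - z_(n+1-i))].
    a2 = -n - sum(
        (2 * i + 1) * (math.log(z[i]) + math.log(1.0 - z[n - 1 - i]))
        for i in range(n)
    ) / n
    return w2, a2
```

For example, `edf_statistics(exceedances, lambda x: gpd_cdf(x, sigma, xi))` would give the two statistics for a list of threshold exceedances; when the parameters are estimated from the same data, the values must be compared with tables built for that case rather than the fully specified one.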
Canadian Journal of Statistics / Revue Canadienne de Statistique | 1994
Vartan Choulakian; Richard A. Lockhart; M. A. Stephens
Cramér-von Mises statistics are developed for use in testing for discrete distributions, and tables are given for tests for the discrete uniform distribution. The Cramér-von Mises family of goodness-of-fit statistics is a well-known group of statistics used to test fit to a continuous distribution. In this article we extend the family to provide tests for discrete distributions. The statistics examined are the analogues of those called Cramér-von Mises, Watson, and Anderson-Darling, namely W², U² and A² respectively, and their components. We provide formulae for the test statistics, and asymptotic percentage points for the test for a uniform distribution with k cells. The tests are based on the empirical distribution function (EDF) of the sample. They are closely related to Pearson's X² test, and to Neyman-Barton smooth tests; in particular, all the tests can be broken down into components, as has been observed by many authors. It is suggested that A² be used to test the overall null hypothesis in general, and U² for the particular case where observations are counts around a circle. Their components can be used to test for particular types of departure from the null. In Section 2, we define the test statistics and give the general distribution theory. In Section 3 the solution of the uniform case is given, together with two examples; in Section 4 modified versions of the statistics are discussed. In Section 5 power studies are given which show that A² is a good omnibus test statistic. Finally, in Section 6 we discuss the use of components as individual test statistics and demonstrate the use of a graphical procedure called the Z-plot to determine, when a statistic is found to be significant, the type of departure from the null.
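The discrete statistics can be sketched in a few lines. The code below assumes the usual form of the definitions, in which Z_j is the difference between cumulative observed and cumulative expected counts and H_j is the cumulative null probability; the function name is hypothetical, and the formulas should be checked against Section 2 of the paper before any serious use. Percentage points are not computed.

```python
def discrete_cvm(observed, probs):
    """Return (W2, U2, A2) for cell counts `observed` and null cell probabilities `probs`."""
    n = sum(observed)
    k = len(observed)
    z, h = [], []
    cum_obs = cum_exp = 0.0
    for count, p in zip(observed, probs):
        cum_obs += count
        cum_exp += n * p
        z.append(cum_obs - cum_exp)   # Z_j = S_j - T_j
        h.append(cum_exp / n)         # H_j = T_j / N
    z_bar = sum(zj * pj for zj, pj in zip(z, probs))
    w2 = sum(zj ** 2 * pj for zj, pj in zip(z, probs)) / n
    u2 = sum((zj - z_bar) ** 2 * pj for zj, pj in zip(z, probs)) / n
    # The last cell is skipped in A2: Z_k = 0 and H_k = 1 there.
    a2 = sum(
        z[j] ** 2 * probs[j] / (h[j] * (1.0 - h[j])) for j in range(k - 1)
    ) / n
    return w2, u2, a2
```

Calling `discrete_cvm([20, 10, 5, 5], [0.25] * 4)` tests 40 observations against a uniform distribution on four cells; the resulting values would then be compared with the tabulated asymptotic percentage points for k = 4.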
Conference on Communication Networks and Services Research | 2004
L. Pelletier; Jalal Almhana; Vartan Choulakian
We present a new spam filter which acts as an additional layer in the spam filtering process. This filter is based on what we call a representative vocabulary. Spam e-mails are divided into categories, and each category is represented by a set of tokens which form a representative text (RT). Tokens are strings of characters (words, sentences, or sometimes meaningless strings of characters). This RT is used to compute a resemblance ratio with incoming e-mails. With this ratio, we decide whether the incoming e-mail is spam. This filter was implemented and integrated into the Spamihilator software. Some interesting experimental results are presented.
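The abstract does not define the resemblance ratio, so the sketch below uses a stand-in: the fraction of an e-mail's tokens that appear in a category's representative text. The tokenizer, the ratio, the threshold of 0.5, and all function names are assumptions made for illustration and are not the authors' definitions.

```python
import re

def tokenize(text):
    """Split text into lowercase word-like tokens (a simplification of the paper's tokens)."""
    return re.findall(r"[a-z0-9$!]+", text.lower())

def resemblance_ratio(email_text, representative_tokens):
    """Assumed ratio: share of e-mail tokens found in the representative text."""
    tokens = tokenize(email_text)
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in representative_tokens)
    return hits / len(tokens)

def is_spam(email_text, categories, threshold=0.5):
    """Flag the e-mail if it resembles any category's representative text closely enough."""
    return any(
        resemblance_ratio(email_text, tokens) >= threshold
        for tokens in categories.values()
    )
```

A call such as `is_spam("Free offer today", {"ads": {"free", "offer", "viagra"}})` compares the message against one hypothetical category; in a layered setup this check would run only on messages that earlier filters let through.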
Computational Statistics & Data Analysis | 2006
Zikuan Liu; Jalal Almhana; Vartan Choulakian; Robert McGorman
Since histograms of many real network traces show strong evidence of mixture, this paper uses mixture distributions to model Internet traffic and applies the EM algorithm to fit the models. Making use of the fact that at each iteration of the EM algorithm the parameter increment has a positive projection on the gradient of the likelihood function, this paper proposes an online EM algorithm to fit the models and the Bayesian Information Criterion is applied to select the best model. Experimental results on real traces are provided to illustrate the efficiency of the proposed algorithm.
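To make the fitting and selection steps concrete, here is a minimal batch EM sketch for a mixture of exponential distributions, followed by a BIC helper. The choice of exponential components, the initial values, and the function names are assumptions for illustration; the paper's online variant is not reproduced.

```python
import math

def em_exponential_mixture(data, k, iters=200):
    """Fit a k-component exponential mixture by batch EM; returns (weights, rates, loglik)."""
    n = len(data)
    mean = sum(data) / n
    weights = [1.0 / k] * k
    rates = [(2.0 ** j) / mean for j in range(k)]  # spread-out starting rates (arbitrary)
    for _ in range(iters):
        resp_sum = [0.0] * k     # sum of responsibilities per component
        resp_x_sum = [0.0] * k   # sum of responsibility * x per component
        for x in data:
            # E-step: posterior probability of each component for this point.
            dens = [w * r * math.exp(-r * x) for w, r in zip(weights, rates)]
            total = sum(dens)
            for j in range(k):
                g = dens[j] / total
                resp_sum[j] += g
                resp_x_sum[j] += g * x
        # M-step: closed-form updates for weights and rates.
        weights = [s / n for s in resp_sum]
        rates = [resp_sum[j] / resp_x_sum[j] for j in range(k)]
    loglik = sum(
        math.log(sum(w * r * math.exp(-r * x) for w, r in zip(weights, rates)))
        for x in data
    )
    return weights, rates, loglik

def bic(loglik, n_params, n):
    """Bayesian Information Criterion; smaller values indicate the preferred model."""
    return -2.0 * loglik + n_params * math.log(n)
```

Model selection would then fit several values of k and keep the one with the smallest `bic(loglik, 2 * k - 1, len(data))`, since a k-component exponential mixture has k rates and k - 1 free weights.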
Psychometrika | 1988
Vartan Choulakian
Goodman's (1979, 1981, 1985) loglinear formulation for bi-way contingency tables is extended to tables with or without missing cells and is used for exploratory purposes. A similar formulation is done for three-way tables and generalizations of correspondence analysis are deduced. A generalized version of Goodman's algorithm, based on Newton's elementary unidimensional method, is used to estimate the scores in all cases.
Psychometrika | 2003
Vartan Choulakian
The aim of this note is to show that the centroid method has two optimality properties. It yields loadings with the highest sum of absolute values, even in the absence of the constraint that the squared component weights be equal. In addition, it yields scores with maximum variance, subject to the constraint that none of the squared component weights be larger than 1.
Psychometrika | 2006
Vartan Choulakian
Taxicab correspondence analysis is based on the taxicab singular value decomposition of a contingency table, and it shares some similar properties with correspondence analysis. It is more robust than the ordinary correspondence analysis, because it gives uniform weights to all the points. The visual map constructed by taxicab correspondence analysis has a larger sweep and clearer perspective than the map obtained by correspondence analysis. Two examples are provided.
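The first step of a taxicab singular value decomposition can be sketched by brute force: search over sign vectors u for the one that maximizes the L1 norm of Xu. The code below assumes this formulation and a small matrix, since the search grows exponentially with the number of columns; the function name is hypothetical. In taxicab correspondence analysis the input would typically be the table of residuals from independence rather than raw counts.

```python
from itertools import product

def taxicab_first_axis(matrix):
    """Return (lam, u, v): lam = max ||X u||_1 over u in {-1, 1}^J, with v = sign(X u)."""
    n_cols = len(matrix[0])
    best_norm, best_u = -1.0, None
    for u in product((1, -1), repeat=n_cols):
        if u[0] < 0:
            continue  # u and -u give the same norm, so half the search is enough
        xu = [sum(a * b for a, b in zip(row, u)) for row in matrix]
        norm1 = sum(abs(t) for t in xu)
        if norm1 > best_norm:
            best_norm, best_u = norm1, u
    xu = [sum(a * b for a, b in zip(row, best_u)) for row in matrix]
    v = [1 if t >= 0 else -1 for t in xu]
    return best_norm, list(best_u), v
```

Because every entry of u and v is +1 or -1, each row and column enters the criterion with the same weight, which is the source of the robustness mentioned in the abstract. Later axes would be obtained by repeating the search on a deflated matrix, which this sketch does not do.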
Psychometrika | 1996
Vartan Choulakian
Generalized bilinear models are presented for the statistical analysis of two-way arrays. These models combine bilinear models and generalized linear modeling, and yield a family of models that includes many existing models, as well as suggest other potentially useful ones. This approach both unifies and extends models for two-way arrays, including the ability to treat response and explanatory variables differently in the models, and the incorporation of external information about the variables directly into the analysis. A unifying framework for the generalized bilinear models is provided by considering four particular cases which have been proposed and used in the existing statistical literature. A three-step procedure is proposed to analyze data sets by generalized bilinear models. Two data sets of different nature are analyzed.
IEEE Communications Letters | 2006
Zikuan Liu; Jalal Almhana; Vartan Choulakian; Robert McGorman
Internet traffic has been shown to have long-range dependence, and is often modeled by using the fractional Gaussian noise model. The fractional Gaussian noise model can capture the autocorrelation of a real trace, but cannot fit the marginal distribution when the trace has a non-Gaussian marginal distribution. In this letter, we use the inverted Box-Cox transformation to establish a long-range dependent Internet traffic model that can simultaneously capture both the long-range dependence parameter and the marginal distribution of a real trace.
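The transformation itself is short enough to write out. The sketch below gives the standard Box-Cox transform and its inverse; applying the inverse pointwise to a Gaussian series yields a series with a non-Gaussian marginal. The function names are hypothetical, and generating fractional Gaussian noise or estimating the parameter lam from a trace is outside this sketch.

```python
import math

def box_cox(x, lam):
    """Box-Cox transform of a positive value x."""
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam

def inverse_box_cox(y, lam):
    """Inverse Box-Cox transform; maps a (Gaussian) value y back to a positive value."""
    if lam == 0:
        return math.exp(y)
    base = lam * y + 1.0
    if base <= 0:
        # Outside the range of the forward transform for this lam.
        raise ValueError("lam * y + 1 must be positive")
    return base ** (1.0 / lam)
```

Given a Gaussian series `g`, the modeled traffic would be `[inverse_box_cox(y, lam) for y in g]`; values of `y` that make `lam * y + 1` non-positive need separate handling, for example truncation, which is a modeling decision left open here.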
Conference on Communication Networks and Services Research | 2004
Zikuan Liu; Jalal Almhana; Vartan Choulakian; Robert McGorman
In the past decade, many quantities characterizing high-speed telecommunication network performance have been reported to have heavy-tailed distributions, namely, with tails decreasing hyperbolically rather than exponentially. Since mixture distributions can approximate many heavy-tailed distributions with high precision, the paper uses mixture distributions to model Internet traffic and applies the EM algorithm to fit the models. Making use of the fact that, at each iteration of the EM algorithm, the parameter increment has a positive projection on the gradient of the likelihood function, the paper proposes a recursive EM algorithm to fit the models, and the Bayesian information criterion is applied to select the best model. To illustrate the efficiency of the proposed algorithm, numerical results and experimental results on real traffic are provided.
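One generic way to make EM recursive is to update running averages of the sufficient statistics one observation at a time with a decreasing step size. The sketch below does this for a mixture of exponentials. It is a standard stochastic-approximation scheme swapped in for illustration, not the gradient-projection algorithm proposed in the paper, and the component family, step-size rule, and function name are all assumptions.

```python
import math

def online_em_exponential_mixture(stream, weights, rates, decay=0.6):
    """Update mixture weights and rates one observation at a time; returns (weights, rates)."""
    k = len(weights)
    # Running averages of responsibilities and of responsibility * x,
    # initialised to be consistent with the starting parameters.
    s1 = list(weights)
    s2 = [w / r for w, r in zip(weights, rates)]
    for n, x in enumerate(stream, start=1):
        dens = [w * r * math.exp(-r * x) for w, r in zip(weights, rates)]
        total = sum(dens)
        gamma = (n + 1) ** (-decay)  # decreasing step size, always below 1
        for j in range(k):
            resp = dens[j] / total
            s1[j] = (1.0 - gamma) * s1[j] + gamma * resp
            s2[j] = (1.0 - gamma) * s2[j] + gamma * resp * x
        # Parameters are re-derived from the running statistics after each point.
        s1_total = sum(s1)
        weights = [a / s1_total for a in s1]
        rates = [a / b for a, b in zip(s1, s2)]
    return weights, rates
```

Each observation is processed once and then discarded, which is what makes this style of update attractive for long traffic traces; the fitted models for different numbers of components could still be compared with the Bayesian information criterion on a held-back batch.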