Gilbert Ritschard
University of Geneva
Publication
Featured research published by Gilbert Ritschard.
Sociological Methods & Research | 2011
Matthias Studer; Gilbert Ritschard; Alexis Gabadinho; Nicolas S. Müller
In this article, the authors define a methodological framework for analyzing the relationship between state sequences and covariates. Inspired by the principles of analysis of variance, this approach looks at how the covariates explain the discrepancy of the sequences. The authors use the pairwise dissimilarities between sequences to determine the discrepancy, which makes it possible to develop a series of statistical significance–based analysis tools. They introduce generalized simple and multifactor discrepancy-based methods to test for differences between groups, a pseudo-R² for measuring the strength of sequence-covariate associations, a generalized Levene statistic for testing differences in the within-group discrepancies, as well as tools and plots for studying the evolution of the differences along the time frame and a regression tree method for discovering the most significant discriminant covariates and their interactions. In addition, the authors extend all methods to account for case weights. The scope of the proposed methodological framework is illustrated using a real-world sequence data set.
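As a rough illustration of the ANOVA-like decomposition described in this abstract, the following Python sketch computes a pseudo-R² and a pseudo-F from a dissimilarity matrix and a grouping variable. It assumes the usual convention that a sum of squares is the sum of pairwise dissimilarities divided by the number of cases; the function names and the toy matrix are hypothetical, and case weights are ignored.

```python
from itertools import combinations

def sum_of_squares(d, idx):
    """Sum of pairwise dissimilarities among the cases in idx, divided by their number."""
    n = len(idx)
    if n == 0:
        return 0.0
    return sum(d[i][j] for i, j in combinations(idx, 2)) / n

def discrepancy_anova(d, groups):
    """Return (pseudo_r2, pseudo_f) for a dissimilarity matrix d and group labels."""
    n = len(groups)
    labels = sorted(set(groups))
    m = len(labels)
    ss_total = sum_of_squares(d, list(range(n)))
    ss_within = sum(
        sum_of_squares(d, [i for i in range(n) if groups[i] == g]) for g in labels
    )
    ss_between = ss_total - ss_within
    pseudo_r2 = ss_between / ss_total
    pseudo_f = (ss_between / (m - 1)) / (ss_within / (n - m))
    return pseudo_r2, pseudo_f

# Toy data (made up): dissimilarity 1 within groups, 3 between groups.
d = [
    [0, 1, 3, 3],
    [1, 0, 3, 3],
    [3, 3, 0, 1],
    [3, 3, 1, 0],
]
groups = ["a", "a", "b", "b"]
pseudo_r2, pseudo_f = discrepancy_anova(d, groups)
print(round(pseudo_r2, 4), pseudo_f)  # 0.7143 5.0
```

In this toy case the grouping accounts for about 71% of the total discrepancy. Since the pseudo-F has no known reference distribution for arbitrary dissimilarities, its significance would normally be assessed by permuting the group labels.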
Econometrica | 1983
Gilbert Ritschard
This article is devoted to computable techniques for solving comparative static problems when only the sign of the partial derivatives of the model is considered. We first show how to extract unambiguously signed multipliers, or more generally qualitatively linked multipliers. This information then helps to reduce the size of the original system by means of a qualitative aggregation principle which we establish. As to the computation of solutions, a branch-and-bound algorithm is presented which considerably increases the efficiency of the Samuelson-Lancaster elimination principle. Finally we derive an efficient algorithm to check for signed determinants. The techniques are then applied to the analysis of an actual 20-equation model.
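To make the notion of a signed determinant concrete, here is a brute-force Python sketch. It is not the efficient algorithm of the article: it simply enumerates every term of the Leibniz expansion of a sign matrix (entries -1, 0, 1) and reports a sign only when all nonzero terms agree. The function names are hypothetical, and the cost grows factorially with the matrix size.

```python
from itertools import permutations

def permutation_sign(p):
    """Parity of a permutation, computed from its inversion count."""
    inversions = sum(
        1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j]
    )
    return -1 if inversions % 2 else 1

def determinant_sign(signs):
    """Return +1 or -1 if the determinant is signed, 0 if it is identically
    zero, and None if the sign pattern leaves the sign ambiguous."""
    n = len(signs)
    seen = set()
    for p in permutations(range(n)):
        term = permutation_sign(p)
        for i in range(n):
            term *= signs[i][p[i]]
            if term == 0:
                break
        if term != 0:
            seen.add(term)
    if not seen:
        return 0
    if len(seen) == 1:
        return seen.pop()
    return None

signed = determinant_sign([[-1, 1], [-1, -1]])   # both terms are positive
ambiguous = determinant_sign([[1, 1], [1, 1]])   # terms of opposite sign
print(signed, ambiguous)  # 1 None
```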
Archive | 2012
Fabrice Guillet; Gilbert Ritschard; Djamel A. Zighed
During the last decade, the French-speaking scientific community developed a very strong research activity in the field of Knowledge Discovery and Management (KDM, or EGC for Extraction et Gestion des Connaissances in French), which is concerned with, among others, Data Mining, Knowledge Discovery, Business Intelligence, Knowledge Engineering and the Semantic Web. The recent and novel research contributions collected in this book are extended and reworked versions of a selection of the best papers that were originally presented in French at the EGC 2009 Conference held in Strasbourg, France, in January 2009. The volume is organized in four parts. Part I includes five papers concerned with various aspects of supervised learning or information retrieval. Part II presents five papers concerned with unsupervised learning issues. Part III includes two papers on data streaming and two on security, while in Part IV the last four papers are concerned with ontologies and semantics.
international joint conference on knowledge discovery, knowledge engineering and knowledge management | 2009
Alexis Gabadinho; Gilbert Ritschard; Matthias Studer; Nicolas S. Müller
This paper is concerned with the summarization of a set of categorical sequences. More specifically, the problem studied is the determination of the smallest possible number of representative sequences that ensure a given coverage of the whole set, i.e. that together have a given percentage of sequences in their neighbourhood. The proposed heuristic for extracting the representative subset requires as main arguments a pairwise distance matrix, a representativeness criterion and a distance threshold under which two sequences are considered as redundant or, equivalently, in the neighbourhood of each other. It first builds a list of candidates using a representativeness score and then eliminates redundancy. We also propose a visualization tool for rendering the results and quality measures for evaluating them. The proposed tools have been implemented in our TraMineR R package for mining and visualizing sequence data and we demonstrate their efficiency on a real-world example from social sciences. The methods are nonetheless in no way limited to social science data and should prove useful in many other domains.
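The two-step heuristic described in this abstract (rank candidates by a representativeness score, then drop redundant ones until the requested coverage is reached) can be sketched in Python as follows. This is a simplified illustration, not the TraMineR implementation: the score used here is neighbourhood density, which is only one possible criterion, and the function name, parameters and toy data are hypothetical.

```python
def representatives(d, threshold, coverage=0.8):
    """Pick representatives from distance matrix d until the share of cases
    lying within `threshold` of a representative reaches `coverage`."""
    n = len(d)
    # Representativeness score: number of cases in each candidate's neighbourhood.
    density = [sum(1 for j in range(n) if d[i][j] <= threshold) for i in range(n)]
    candidates = sorted(range(n), key=lambda i: (-density[i], i))
    chosen, covered = [], set()
    for c in candidates:
        if len(covered) >= coverage * n:
            break
        # Redundancy elimination: skip candidates close to a chosen representative.
        if any(d[c][r] <= threshold for r in chosen):
            continue
        chosen.append(c)
        covered.update(j for j in range(n) if d[c][j] <= threshold)
    return chosen, len(covered) / n

# Toy data (made up): six cases placed on a line, distance = absolute difference.
points = [0, 1, 2, 10, 11, 20]
d = [[abs(a - b) for b in points] for a in points]
chosen, achieved = representatives(d, threshold=1.5, coverage=0.8)
print(chosen, round(achieved, 3))  # [1, 3] 0.833
```

Two representatives cover five of the six cases here; the isolated case at position 20 is left uncovered because the 80% target is already met.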
The Statistician | 1995
Michael Olszak; Gilbert Ritschard
This paper is concerned with the sampling behaviour of raw and partial measures of association between categorical variables. It summarizes the asymptotic results established for raw measures and extends them in a systematic way for the derived partial associations. The validity of the asymptotic results is then stressed by means of a simulation study. Three proportional reduction in error of prediction measures are considered for nominal variables and three concordance-discordance indices for ordinal variables.
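The abstract does not name the six measures studied. As an illustration of the two families it mentions, the Python sketch below computes Goodman and Kruskal's lambda (a proportional reduction in error measure for nominal variables) and Goodman and Kruskal's gamma (a concordance-discordance index for ordinal variables) from a contingency table. Choosing these two measures is an assumption made for illustration only, and the table is made up.

```python
def gk_lambda(table):
    """Goodman-Kruskal lambda for predicting the column variable from the row variable."""
    n = sum(sum(row) for row in table)
    column_totals = [sum(col) for col in zip(*table)]
    errors_without_rows = n - max(column_totals)            # predict the modal column
    errors_with_rows = n - sum(max(row) for row in table)   # predict the modal column per row
    return (errors_without_rows - errors_with_rows) / errors_without_rows

def gk_gamma(table):
    """Goodman-Kruskal gamma: (C - D) / (C + D) over concordant and discordant pairs."""
    concordant = discordant = 0
    rows, cols = len(table), len(table[0])
    for i in range(rows):
        for j in range(cols):
            for k in range(i + 1, rows):
                for l in range(cols):
                    if l > j:
                        concordant += table[i][j] * table[k][l]
                    elif l < j:
                        discordant += table[i][j] * table[k][l]
    return (concordant - discordant) / (concordant + discordant)

table = [[30, 10],
         [10, 30]]
lam = gk_lambda(table)
gam = gk_gamma(table)
print(lam, gam)  # 0.5 0.8
```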
data warehousing and knowledge discovery | 2008
Nicolas S. Müller; Alexis Gabadinho; Gilbert Ritschard; Matthias Studer
This article presents some of the facilities offered by our TraMineR R package for clustering and visualizing sequence data. First, we discuss our implementation of the optimal matching algorithm for evaluating the distance between two sequences and its use for generating a distance matrix for the whole sequence data set. Once such a matrix is obtained, we may use it as input for a cluster analysis, which can be done straightforwardly with any method available in the R statistical environment. Then we present three kinds of plots for visualizing the characteristics of the obtained clusters: an aggregated plot depicting the average sequential behavior of cluster members; a sequence index plot that shows the diversity inside clusters; and an original frequency plot that highlights the frequencies of the n most frequent sequences. TraMineR was designed for analysing sequences representing life courses, and our presentation is illustrated on such a real-world data set. The material presented should also be of interest for other kinds of sequential data, such as DNA sequences or web logs.
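For readers unfamiliar with optimal matching, the following Python sketch shows the underlying dynamic programme: the distance between two sequences is the minimal total cost of the insertions, deletions and substitutions needed to turn one into the other. It is a generic edit-distance sketch, not TraMineR's code; the constant costs (indel 1, substitution 2) and the state labels are assumptions chosen for the example.

```python
def om_distance(a, b, indel=1.0, sub=2.0):
    """Optimal matching (edit) distance with constant indel and substitution costs."""
    previous = [j * indel for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        current = [i * indel]
        for j in range(1, len(b) + 1):
            substitution = 0.0 if a[i - 1] == b[j - 1] else sub
            current.append(min(
                previous[j] + indel,              # delete a[i-1]
                current[j - 1] + indel,           # insert b[j-1]
                previous[j - 1] + substitution,   # match or substitute
            ))
        previous = current
    return previous[-1]

# Toy state sequences (S = school, E = employed, U = unemployed; made up).
sequences = [list("SSEE"), list("SEEE"), list("SEU")]
distance_matrix = [[om_distance(x, y) for y in sequences] for x in sequences]
print(distance_matrix[0][1])  # 2.0
```

The resulting symmetric matrix is the kind of input a clustering method would then take.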
EGC (best of volume) | 2010
Matthias Studer; Gilbert Ritschard; Alexis Gabadinho; Nicolas S. Müller
In this article we consider objects for which we have a matrix of dissimilarities and we are interested in their links with covariates. We focus on state sequences for which pairwise dissimilarities are given, for instance, by edit distances. The methods discussed apply, however, to any kind of objects and measures of dissimilarity. We start with a generalization of the analysis of variance (ANOVA) to assess the link of complex objects (e.g. sequences) with a given categorical variable. The trick is to show that the discrepancy among objects can be derived from the pairwise dissimilarities alone, which then makes it possible to identify the factors that most reduce this discrepancy. We present a general statistical test and introduce an original way of rendering the results for state sequences. We then generalize the method to the case with more than one factor and discuss its advantages and limitations, especially regarding interpretation. Finally, we introduce a new tree method for analyzing the discrepancy of complex objects that uses the former test as splitting criterion. We demonstrate the scope of the methods presented through a study of the factors that most discriminate Swiss occupational trajectories. All methods presented are freely accessible in our TraMineR package for the R statistical environment.
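The "trick" mentioned in this abstract can be checked numerically in the Euclidean case, where the classical sum of squared deviations from the mean equals the sum of pairwise squared differences divided by the number of cases. The Python sketch below verifies this identity and then runs a simple permutation test on the within-group sum of squares. The helper names, the toy values and the number of permutations are hypothetical; this is an illustration of the general idea rather than the authors' procedure.

```python
import random
from itertools import combinations

def sum_of_squares(d, idx):
    """Sum of pairwise dissimilarities among the cases in idx, divided by their number."""
    return sum(d[i][j] for i, j in combinations(idx, 2)) / len(idx)

def within_ss(d, groups):
    """Within-group sum of squares computed from dissimilarities only."""
    total = 0.0
    for g in sorted(set(groups)):
        members = [i for i, label in enumerate(groups) if label == g]
        total += sum_of_squares(d, members)
    return total

def permutation_p_value(d, groups, n_perm=999, seed=0):
    """Share of label permutations with a within-group SS at least as small as observed."""
    rng = random.Random(seed)
    observed = within_ss(d, groups)
    labels = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if within_ss(d, labels) <= observed + 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Euclidean check: squared differences between made-up numeric values.
x = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
d = [[(a - b) ** 2 for b in x] for a in x]
mean = sum(x) / len(x)
classical_ss = sum((v - mean) ** 2 for v in x)
pairwise_ss = sum_of_squares(d, list(range(len(x))))
print(classical_ss, pairwise_ss)  # both 125.5 up to rounding

groups = ["a", "a", "a", "b", "b", "b"]
p_value = permutation_p_value(d, groups)
```

Because the total sum of squares does not change under permutation, testing on the within-group sum of squares is equivalent to testing on the between-group part. With only six cases there are few distinct partitions, so the p-value cannot get very small here.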
Research Papers by the Institute of Economics and Econometrics, Geneva School of Economics and Management, University of Geneva | 2010
Gilbert Ritschard
The aim of this paper is twofold. First, we discuss the origin of tree methods. Essentially, we survey earlier methods that led to CHAID (Kass, 1980; Biggs et al., 1991). The second goal is then to explain in detail how CHAID works, especially the differences between the original method as described in Kass (1980) and the extension proposed by Biggs et al. (1991), which is the one commonly implemented today.
Advances in Life Course Research | 2005
Gilbert Ritschard; Michel Oris
This paper has an essentially methodological purpose. In the first section, we briefly explain why demographers have been relatively reluctant to implement the life course paradigm and methods, even though the quantitative focus and the concepts of demographic analysis a priori favored such implementation. A real intellectual crisis was needed before demographers accepted the necessity of facing up to the challenge of shifting “from structure to process, from macro to micro, from analysis to synthesis, from certainty to uncertainty” (Willekens, 1999, p. 26). This retrospective look also shows impressive progress in promoting a real interdisciplinarity in population studies, strengthening the ties between demography and the social sciences. However, we also note that the success of multivariate causal analyses has been so rapid that some pitfalls are not always avoided. In Section 2, we focus on statistical methods for studying transitions. First, we refresh the reader's memory about regression-like models, and then we consider the issue of population heterogeneity. We show how it can affect the interpretation of results, and illustrate the value of robust estimates and of the notion of shared frailty for dealing with it. We also present Markovian modeling. Though less popular than regression event history models, Markovian models are particularly well suited for studying successive transitions between states observed at periodic times. In Section 3, we promote some tools from the developing field of data mining, with special attention to the mining of frequent sequences and induction trees. These highly flexible heuristic tools can, among other things, handle trajectories. Hence, they may prove very useful for addressing the lack of knowledge on trajectories that we observe between standard demographic analysis and causal research.
1 From demographic analysis to life course approach
Although demographic analysis has a long history (see Dupâquier and Dupâquier, 1985), the methods still used today were essentially elaborated between the mid-nineteenth and the mid-twentieth century in Western societies that felt successively threatened by race degeneration, declining birth rates and ageing. The macro frame was that of the demographic transition, i.e. the evolution from young populations with high
1 Gilbert Ritschard is at the Department of Econometrics and Michel Oris at the Department of Economic History. Both also belong to the Laboratory of Demography and Family Studies and have benefited, for this research, from the support of the Swiss National Science Foundation, projects 1114-68113 and 100012-105478.
international symposium on methodologies for intelligent systems | 2003
Gilbert Ritschard; Djamel A. Zighed
This paper is concerned with the goodness-of-fit of induced decision trees. Namely, we explore the possibility of measuring the goodness-of-fit as is classically done in statistical modeling. We show how Chi-square statistics, and especially the Log-likelihood Ratio statistic that is abundantly used in the modeling of cross tables, can be adapted for induction trees. The Log-likelihood Ratio is well suited for testing the significance of the difference between two nested trees. In addition, we derive pseudo-R²'s from it. We also propose adapted forms of the Akaike (AIC) and Bayesian (BIC) information criteria that prove useful in selecting the best compromise model between fit and complexity.
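A minimal Python sketch of the idea, under stated assumptions: each tree is summarized by the class counts in its leaves, its log-likelihood is the multinomial log-likelihood of those counts, and the number of parameters is taken as (number of leaves) × (number of classes − 1). The AIC and BIC below are the textbook definitions applied to that likelihood; the adapted forms proposed in the paper may differ. All names and counts are hypothetical.

```python
import math

def log_likelihood(leaves):
    """Multinomial log-likelihood of the class counts observed in each leaf."""
    ll = 0.0
    for counts in leaves:
        total = sum(counts)
        ll += sum(c * math.log(c / total) for c in counts if c > 0)
    return ll

def n_parameters(leaves):
    """Free class probabilities: (number of classes - 1) per leaf."""
    return len(leaves) * (len(leaves[0]) - 1)

def lr_statistic(coarse, fine):
    """Log-likelihood ratio statistic and degrees of freedom for two nested trees."""
    g2 = 2.0 * (log_likelihood(fine) - log_likelihood(coarse))
    df = n_parameters(fine) - n_parameters(coarse)
    return g2, df

def aic(leaves):
    return -2.0 * log_likelihood(leaves) + 2.0 * n_parameters(leaves)

def bic(leaves):
    n = sum(sum(counts) for counts in leaves)
    return -2.0 * log_likelihood(leaves) + n_parameters(leaves) * math.log(n)

# Toy trees (made up): the root alone, and the same 100 cases after one binary split.
root = [[50, 50]]
split = [[40, 10], [10, 40]]
g2, df = lr_statistic(root, split)
print(round(g2, 2), df)  # 38.55 1
```

Here the statistic of about 38.5 on one degree of freedom is far above the 5% chi-square critical value of 3.84, and both criteria also prefer the split tree, since the gain in fit outweighs the penalty for the extra leaf.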