Cees H. Elzinga
VU University Amsterdam
Publications
Featured research published by Cees H. Elzinga.
Sociological Methods & Research | 2003
Cees H. Elzinga
This article reviews objections to optimal-matching (OM) algorithms in sequence analysis and reformulates the concept of sequence similarity in terms of a binary precedence relation. This precedence relation is then used to develop a new quantification of sequence similarity. The new measure is used to reanalyze the life history data that were previously discussed by Dijkstra and Taris (1995). The reanalysis demonstrates the new measure to be superior to the OM algorithm and the alternatives proposed by Dijkstra and Taris. A new algorithm is presented to enumerate matching k-tuples from pairs of sequences in polynomial time.
Theoretical Computer Science | 2008
Cees H. Elzinga; Sven Rahmann; Hui Wang
A subsequence is obtained from a string by deleting any number of characters; thus in contrast to a substring, a subsequence is not necessarily a contiguous part of the string. Counting subsequences under various constraints has become relevant to biological sequence analysis, to machine learning, to coding theory, to the analysis of categorical time series in the social sciences, and to the theory of word complexity. We present theorems that lead to efficient dynamic programming algorithms to count (1) distinct subsequences in a string, (2) distinct common subsequences of two strings, (3) matching joint embeddings in two strings, (4) distinct subsequences with a given minimum span, and (5) sequences generated by a string allowing characters to come in runs of a length that is bounded from above.
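As an illustration of item (1), the following is a minimal sketch of a well-known dynamic programming recurrence for counting distinct subsequences. It is not taken from the paper; the function name `count_distinct_subsequences` and the convention of counting the empty subsequence are assumptions made for this example.

```python
def count_distinct_subsequences(s: str) -> int:
    """Count the distinct subsequences of s, including the empty one.

    Textbook recurrence (an illustrative sketch, not the paper's algorithm):
    appending a character doubles the count, minus the subsequences that
    were already created by the previous occurrence of that character.
    """
    # total[i] = number of distinct subsequences of the prefix s[:i]
    total = [1] * (len(s) + 1)
    last_seen = {}  # character -> 1-based position of its previous occurrence
    for i, ch in enumerate(s, start=1):
        total[i] = 2 * total[i - 1]
        if ch in last_seen:
            # Remove duplicates that end in the earlier occurrence of ch.
            total[i] -= total[last_seen[ch] - 1]
        last_seen[ch] = i
    return total[len(s)]


if __name__ == "__main__":
    # "aba" has the subsequences "", a, b, aa, ab, ba, aba.
    print(count_distinct_subsequences("aba"))
```

The sketch runs in time linear in the length of the string, assuming constant-time dictionary lookups.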
Sociological Methods & Research | 2010
Cees H. Elzinga
Categorical time series, covering comparable time spans, are often quite different in a number of aspects: the number of distinct states, the number of transitions, and the distribution of durations over states. Each of these aspects contributes to an aggregate property of such series that is called complexity. Among sociologists and demographers, complexity is believed to systematically differ between groups as a result of social structure or social change. Such groups differ in, for example, age, gender, or status. The author proposes quantifications of complexity, based upon the number of distinct subsequences in combination with, in case of associated durations, the variance of these durations. A simple algorithm to compute these coefficients is provided and some of the statistical properties of the coefficients are investigated in an application to family formation histories of young American females.
Demography | 2010
Hilde Bras; Aart C. Liefbroer; Cees H. Elzinga
This article examines pathways to adulthood among Dutch cohorts born in the second half of the nineteenth century. Although largely overlooked by previous studies, theory suggests that life courses of young adults born during this period were already influenced by a process of standardization, in the sense that their life courses became more similar over time. Using data from a Dutch registry-based sample, we examine household trajectories: that is, sequences of living arrangements of young adults aged 15–40. Our study shows that for successive cohorts, household trajectories became more similar. We identified six types of trajectories: early death, life-cycle service, early family formation, late family formation, singlehood, and childless but with partner. Over time, early family formation gradually became the “standard” trajectory to adulthood. However, late family formation and singlehood, common pathways within the preindustrial western European marriage pattern, remained widespread among cohorts born in the late nineteenth century. Laboring class youths, farmers’ daughters, young people of mixed religious background, and urban-born youngsters were the nineteenth century forerunners of a standard pathway to adulthood.
Journal of Classification | 2005
Cees H. Elzinga
This paper presents new representations of token sequences, with and without associated quantities, in Euclidean space. The representations are free of assumptions about the nature of the sequences or the processes that generate them. Algorithms and applications from the domains of structured interviews and life histories are discussed.
Information Sciences | 2011
Cees H. Elzinga; Hui Wang; Zhiwei Lin; Yash Kumar
This paper deals with the measurement of concordance and the construction of consensus in preference data, either in the form of preference rankings or in the form of response distributions with Likert-items. We propose a set of axioms of concordance in preference orderings and a new class of concordance measures. The measures outperform classic measures like Kendall's τ and W and Spearman's ρ in sensitivity and apply to large sets of orderings instead of just to pairs of orderings. For sets of N orderings of n items, we present very efficient and flexible algorithms that have a time complexity of only O(Nn^2). Remarkably, the algorithms also allow for fast calculation of all longest common subsequences of the full set of orderings. We experimentally demonstrate the performance of the algorithms. A new and simple measure for assessing concordance on Likert-items is proposed.
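For orientation, here is a minimal sketch of one of the classic pairwise measures the abstract compares against, Kendall's rank correlation for two orderings without ties. It does not implement the new concordance measures proposed in the paper; the function name `kendall_tau` and the input convention (two equal-length lists of ranks) are assumptions made for this example.

```python
from itertools import combinations


def kendall_tau(rank_a, rank_b):
    """Kendall's rank correlation for two rankings of the same items.

    Illustrative baseline only (no tie correction): the difference between
    the numbers of concordant and discordant item pairs, divided by the
    total number of pairs.
    """
    n = len(rank_a)
    if n != len(rank_b) or n < 2:
        raise ValueError("rankings must have equal length of at least 2")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # Positive product: both rankings order items i and j the same way.
        sign = (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


if __name__ == "__main__":
    print(kendall_tau([1, 2, 3], [1, 3, 2]))
```

This baseline compares only one pair of orderings at a time and inspects all item pairs, which is the limitation the abstract contrasts with measures defined on whole sets of orderings.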
Sociological Methods & Research | 2015
Cees H. Elzinga; Matthias Studer
Because optimal matching (OM) distance is not very sensitive to differences in the order of states, we introduce a subsequence-based distance measure that can be adapted to subsequence length, to subsequence duration, and to soft-matching of states. Using a simulation technique developed by Studer, we investigate the sensitivity, relative to OM, of several variants of this metric to variations in order, timing, and duration of states. The results show that the behavior of the metric is as intended. Furthermore, we use family formation data from the Swiss Household Panel to compare a few variants of the new metric to OM. The new metrics have been implemented in the freely available TraMineR-package.
Theoretical Computer Science | 2013
Cees H. Elzinga; Hui Wang
This paper proposes a class of string kernels that can handle a variety of subsequence-based features. Slight adaptations of the basic algorithm allow for weighting subsequence lengths, restricting or soft-penalizing gap size, character weighting, and soft-matching of characters. An easy extension of the kernels allows for comparing run-length encoded strings with a time complexity that is independent of the length of the original strings. Such kernels have applications in image processing, computational biology, demography, and in comparing partial rankings.
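To make the idea of a subsequence-based kernel concrete, the following is a minimal sketch of the textbook all-subsequences kernel, which counts pairs of embeddings that spell the same subsequence in both strings. It is an assumption-laden illustration rather than the class of kernels proposed in the paper: it has no length weighting, gap penalty, or soft-matching, and the name `count_matching_embeddings` is hypothetical.

```python
def count_matching_embeddings(x: str, y: str) -> int:
    """Unweighted all-subsequences kernel of x and y.

    Counts pairs (embedding in x, embedding in y) that yield the same
    subsequence, including the empty subsequence. Illustrative sketch only.
    """
    rows, cols = len(x) + 1, len(y) + 1
    # k[i][j] = kernel value for the prefixes x[:i] and y[:j];
    # an empty prefix shares only the empty subsequence, hence the 1s.
    k = [[1] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            # Inclusion-exclusion over pairs that skip x[i-1] or y[j-1].
            k[i][j] = k[i - 1][j] + k[i][j - 1] - k[i - 1][j - 1]
            if x[i - 1] == y[j - 1]:
                # Pairs that end by matching x[i-1] with y[j-1].
                k[i][j] += k[i - 1][j - 1]
    return k[-1][-1]


if __name__ == "__main__":
    # "ab" and "ba" share the empty subsequence, "a", and "b".
    print(count_matching_embeddings("ab", "ba"))
```

The table has (|x|+1)(|y|+1) cells, so this sketch takes time proportional to the product of the string lengths.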
Advances in Sequence Analysis: Theory, Method, Applications | 2014
Cees H. Elzinga
In this chapter we focus on the axiomatic foundations of and the relations between two fundamental concepts of sequence analysis: distance and similarity. We discuss and interpret each of the individual axioms and point out their relevance in practical application. We will discuss units of distance, admissible transformations and normalization as a method that allows for interpreting the size of distances and similarities. We also discuss how similarity and distance can be derived from each other and, in passing, we deal with some quite common misunderstandings pertaining to these concepts.
Pattern Recognition Letters | 2012
Cees H. Elzinga; Hui Wang
This paper proposes two efficient kernels for comparing acyclic, directed graphs. The first kernel counts the number of common paths and allows for weighting according to path length and/or according to the vertices contained in each particular path. The second kernel counts the number of paths in common minors of the graphs involved and allows for length- and vertex-weighting too. Both kernels have algorithmic complexity that is cubic in the size of the vertex set. The performance of the algorithms is concisely demonstrated using synthetic and real data.