Publication


Featured research published by Cees H. Elzinga.


Sociological Methods & Research | 2003

Sequence Similarity: A Nonaligning Technique

Cees H. Elzinga

This article reviews objections to optimal-matching (OM) algorithms in sequence analysis and reformulates the concept of sequence similarity in terms of a binary precedence relation. This precedence relation is then used to develop a new quantification of sequence similarity. The new measure is used to reanalyze the life history data that were previously discussed by Dijkstra and Taris (1995). The reanalysis demonstrates the new measure to be superior to the OM algorithm and the alternatives proposed by Dijkstra and Taris. A new algorithm is presented to enumerate matching k-tuples from pairs of sequences in polynomial time.
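As an illustration of what counting matching k-tuples can look like, the following is a minimal sketch and not necessarily the algorithm from the article. It assumes that a "matching k-tuple" is a pair of index tuples of length k, one in each sequence, that spell out the same subsequence. The function name `count_matching_k_tuples` is hypothetical.

```python
def count_matching_k_tuples(x, y, k):
    """Count pairs of length-k index tuples (one in x, one in y)
    that select the same subsequence. Runs in O(k * len(x) * len(y)).

    Assumption: this is one plausible reading of "matching k-tuples";
    it is a sketch, not the published algorithm.
    """
    if k == 0:
        return 1  # only the pair of empty tuples
    n, m = len(x), len(y)
    # prev[i][j]: number of matching (t-1)-tuples inside x[:i] and y[:j]
    prev = [[1] * (m + 1) for _ in range(n + 1)]
    for _ in range(k):
        cur = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                # Pairs that skip the last position of x or of y
                # (inclusion-exclusion over the two prefixes).
                cur[i][j] = cur[i - 1][j] + cur[i][j - 1] - cur[i - 1][j - 1]
                # Pairs that end exactly on the last positions of both.
                if x[i - 1] == y[j - 1]:
                    cur[i][j] += prev[i - 1][j - 1]
        prev = cur
    return prev[n][m]


print(count_matching_k_tuples("aba", "aa", 1))  # 4
print(count_matching_k_tuples("aba", "aa", 2))  # 1
```

The table for each length is built from the table for the previous length, which keeps the running time polynomial even though the number of subsequences grows exponentially.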


Theoretical Computer Science | 2008

Algorithms for subsequence combinatorics

Cees H. Elzinga; Sven Rahmann; Hui Wang

A subsequence is obtained from a string by deleting any number of characters; thus in contrast to a substring, a subsequence is not necessarily a contiguous part of the string. Counting subsequences under various constraints has become relevant to biological sequence analysis, to machine learning, to coding theory, to the analysis of categorical time series in the social sciences, and to the theory of word complexity. We present theorems that lead to efficient dynamic programming algorithms to count (1) distinct subsequences in a string, (2) distinct common subsequences of two strings, (3) matching joint embeddings in two strings, (4) distinct subsequences with a given minimum span, and (5) sequences generated by a string allowing characters to come in runs of a length that is bounded from above.
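Item (1), counting distinct subsequences of a string, can be sketched with a well-known dynamic programming recurrence. This is a minimal illustration and may differ from the formulation in the paper. The function name `count_distinct_subsequences` is hypothetical, and the count includes the empty subsequence.

```python
def count_distinct_subsequences(s):
    """Number of distinct subsequences of s, including the empty one.

    Standard recurrence: appending a character doubles the count, minus
    the subsequences that were already created the last time the same
    character was appended.
    """
    counts = [1]     # counts[i]: distinct subsequences of s[:i]
    last_seen = {}   # character -> 1-based position of its last occurrence
    for i, ch in enumerate(s, start=1):
        total = 2 * counts[i - 1]
        if ch in last_seen:
            # Remove duplicates: everything that ended in this character
            # at its previous occurrence is generated a second time.
            total -= counts[last_seen[ch] - 1]
        counts.append(total)
        last_seen[ch] = i
    return counts[-1]


print(count_distinct_subsequences("aba"))  # 7: "", a, b, aa, ab, ba, aba
```

The loop visits each character once, so the sketch runs in time linear in the length of the string (treating dictionary lookups as constant time).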


Sociological Methods & Research | 2010

Complexity of Categorical Time Series

Cees H. Elzinga

Categorical time series, covering comparable time spans, are often quite different in a number of aspects: the number of distinct states, the number of transitions, and the distribution of durations over states. Each of these aspects contributes to an aggregate property of such series that is called complexity. Among sociologists and demographers, complexity is believed to systematically differ between groups as a result of social structure or social change. Such groups differ in, for example, age, gender, or status. The author proposes quantifications of complexity, based upon the number of distinct subsequences in combination with, in case of associated durations, the variance of these durations. A simple algorithm to compute these coefficients is provided and some of the statistical properties of the coefficients are investigated in an application to family formation histories of young American females.


Demography | 2010

Standardization of Pathways to Adulthood? An Analysis of Dutch Cohorts Born Between 1850 and 1900

Hilde Bras; Aart C. Liefbroer; Cees H. Elzinga

This article examines pathways to adulthood among Dutch cohorts born in the second half of the nineteenth century. Although largely overlooked by previous studies, theory suggests that life courses of young adults born during this period were already influenced by a process of standardization, in the sense that their life courses became more similar over time. Using data from a Dutch registry-based sample, we examine household trajectories: that is, sequences of living arrangements of young adults aged 15–40. Our study shows that for successive cohorts, household trajectories became more similar. We identified six types of trajectories: early death, life-cycle service, early family formation, late family formation, singlehood, and childless but with partner. Over time, early family formation gradually became the “standard” trajectory to adulthood. However, late family formation and singlehood, common pathways within the preindustrial western European marriage pattern, remained widespread among cohorts born in the late nineteenth century. Laboring class youths, farmers’ daughters, young people of mixed religious background, and urban-born youngsters were the nineteenth century forerunners of a standard pathway to adulthood.


Journal of Classification | 2005

Combinatorial Representations of Token Sequences

Cees H. Elzinga

This paper presents new representations of token sequences, with and without associated quantities, in Euclidean space. The representations are free of assumptions about the nature of the sequences or the processes that generate them. Algorithms and applications from the domains of structured interviews and life histories are discussed.


Information Sciences | 2011

Concordance and consensus

Cees H. Elzinga; Hui Wang; Zhiwei Lin; Yash Kumar

This paper deals with the measurement of concordance and the construction of consensus in preference data, either in the form of preference rankings or in the form of response distributions with Likert-items. We propose a set of axioms of concordance in preference orderings and a new class of concordance measures. The measures outperform classic measures like Kendall's τ and W and Spearman's ρ in sensitivity and apply to large sets of orderings instead of just to pairs of orderings. For sets of N orderings of n items, we present very efficient and flexible algorithms that have a time complexity of only O(Nn^2). Remarkably, the algorithms also allow for fast calculation of all longest common subsequences of the full set of orderings. We experimentally demonstrate the performance of the algorithms. A new and simple measure for assessing concordance on Likert-items is proposed.
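To make the O(Nn^2) bound concrete, here is a minimal sketch of one way to obtain the length of the longest common subsequence of N complete orderings within that bound. It is an illustration under stated assumptions, not the algorithm from the paper. It assumes that every ordering is a ranking of the same n items without ties, and the function name `longest_common_subsequence_length` is hypothetical. The idea is that a sequence of items is common to all orderings exactly when each consecutive pair appears in the same relative order everywhere.

```python
def longest_common_subsequence_length(orderings):
    """Length of the longest subsequence shared by all orderings.

    Assumption: each ordering is a permutation of the same items
    (complete rankings without ties). Sketch only, O(N * n^2) time.
    """
    if not orderings or not orderings[0]:
        return 0
    first = orderings[0]
    # Position of every item within every ordering.
    positions = [{item: idx for idx, item in enumerate(o)} for o in orderings]
    best = {}  # best[b]: longest common chain that ends in item b
    for j, b in enumerate(first):
        longest_before = 0
        # Only items before b in the first ordering can precede b everywhere.
        for a in first[:j]:
            if all(p[a] < p[b] for p in positions):
                longest_before = max(longest_before, best[a])
        best[b] = longest_before + 1
    return max(best.values())


rankings = [
    ["a", "b", "c", "d"],
    ["a", "c", "b", "d"],
    ["b", "a", "c", "d"],
]
print(longest_common_subsequence_length(rankings))  # 3, e.g. a, c, d
```

Each of the roughly n^2/2 item pairs is checked against all N orderings, which is where the N·n^2 factor comes from in this sketch.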


Sociological Methods & Research | 2015

Spell sequences, state proximities and distance metrics

Cees H. Elzinga; Matthias Studer

Because optimal matching (OM) distance is not very sensitive to differences in the order of states, we introduce a subsequence-based distance measure that can be adapted to subsequence length, to subsequence duration, and to soft-matching of states. Using a simulation technique developed by Studer, we investigate the sensitivity, relative to OM, of several variants of this metric to variations in order, timing, and duration of states. The results show that the behavior of the metric is as intended. Furthermore, we use family formation data from the Swiss Household Panel to compare a few variants of the new metric to OM. The new metrics have been implemented in the freely available TraMineR-package.


Theoretical Computer Science | 2013

Versatile string kernels

Cees H. Elzinga; Hui Wang

This paper proposes a class of string kernels that can handle a variety of subsequence-based features. Slight adaptations of the basic algorithm allow for weighing subsequence lengths, restricting or soft-penalizing gap-size, character-weighing and soft-matching of characters. An easy extension of the kernels allows for comparing run-length encoded strings with a time-complexity that is independent of the length of the original strings. Such kernels have applications in image processing, computational biology, in demography and in comparing partial rankings.


Advances in Sequence Analysis: Theory, Method, Applications | 2014

Distance, Similarity and Sequence Comparison

Cees H. Elzinga

In this chapter we focus on the axiomatic foundations of and the relations between two fundamental concepts of sequence analysis: distance and similarity. We discuss and interpret each of the individual axioms and point out their relevance in practical application. We will discuss units of distance, admissible transformations and normalization as a method that allows for interpreting the size of distances and similarities. We also discuss how similarity and distance can be derived from each other and, in passing, we deal with some quite common misunderstandings pertaining to these concepts.


Pattern Recognition Letters | 2012

Kernels for acyclic digraphs

Cees H. Elzinga; Hui Wang

This paper proposes two efficient kernels for comparing acyclic, directed graphs. The first kernel counts the number of common paths and allows for weighing according to path-length and/or according to the vertices contained in each particular path. The second kernel counts the number of paths in common minors of the graphs involved and allows for length- and vertex-weighting too. Both kernels have algorithmic complexity that is cubic in the size of the vertex-set. The performance of the algorithms is concisely demonstrated using synthetic and real data.

Collaboration

Top Co-Authors

W. Dijkstra, University of Amsterdam
Hilde Bras, VU University Amsterdam
Yu Han, University of Groningen
Yash Kumar, International Institute of Information Technology
Sven Rahmann, University of Duisburg-Essen