Publication


Featured research published by Lars Vilhuber.


International Conference on Data Engineering | 2008

Privacy: Theory meets Practice on the Map

Ashwin Machanavajjhala; Daniel Kifer; John M. Abowd; Johannes Gehrke; Lars Vilhuber

In this paper, we propose the first formal privacy analysis of a data anonymization process known as synthetic data generation, a technique becoming popular in the statistics community. The target application for this work is a mapping program that shows the commuting patterns of the population of the United States. The source data for this application were collected by the U.S. Census Bureau, but due to privacy constraints, they cannot be used directly by the mapping program. Instead, we generate synthetic data that statistically mimic the original data while providing privacy guarantees. We use these synthetic data as a surrogate for the original data. We find that while some existing definitions of privacy are inapplicable to our target application, others are too conservative and render the synthetic data useless since they guard against privacy breaches that are very unlikely. Moreover, the data in our target application are sparse, and none of the existing solutions are tailored to anonymize sparse data. In this paper, we propose solutions to address the above issues.


Journal of Business & Economic Statistics | 2005

The Sensitivity of Economic Statistics to Coding Errors in Personal Identifiers

John M. Abowd; Lars Vilhuber

In this article we describe the sensitivity of small-cell flow statistics to coding errors in the identity of the underlying entities. Specifically, we present results based on a comparison of the U.S. Census Bureau's Quarterly Workforce Indicators before and after correcting for such errors in Social Security Number-based identifiers in the underlying individual wage records. The correction used involves a novel application of existing statistical matching techniques. It is found that even a very conservative correction procedure has a sizable impact on the statistics. The average bias ranges from 0.25% up to 15% for flow statistics, and up to 5% for payroll aggregates.


Journal of Managerial Psychology | 2008

Procedural justice criteria in salary determination

Julie Cloutier; Lars Vilhuber

Purpose – The purpose of this research is to identify the dimensionality of the procedural justice construct and the criteria used by employees to assess procedural justice, in the context of salary determination. Design/methodology/approach – Based on a survey of 297 Canadian workers, the paper uses confirmatory factor analysis (CFA) to test the dimensionality and the discriminant and convergent validity of our procedural justice construct. Convergent and predictive validity are also tested using hierarchical linear regressions. Findings – The paper shows the multidimensionality of the procedural justice construct: justice of the salary determination process is assessed through the perceived characteristics of allocation procedures, the perceived characteristics of decision‐makers, and system transparency. Research limitations/implications – Results could be biased towards acceptance; this is discussed. The results also suggest possible extensions to the study. Practical implications – Knowledge of the justi...


Industrial and Labor Relations Review | 2004

Escaping Low Earnings: The Role of Employer Characteristics and Changes

Harry J. Holzer; Julia Lane; Lars Vilhuber

Using a unique dataset based on individual Unemployment Insurance wage records for Illinois in the 1990s that are matched to other Census data, the authors analyze the extent to which escape from or entry into low earnings among adult workers was associated with changes in their employers and firm characteristics. The results show considerable mobility into and out of low earnings status, even for adults. They indicate that job changes were an important part of the process by which workers escaped or entered low-wage status, and that changes in employer characteristics help to account for these job changes. Matches between personal and firm characteristics also contributed to observed earnings outcomes.


Archive | 2012

Dynamically consistent noise infusion and partially synthetic data as confidentiality protection measures for related time-series

John M. Abowd; R. Kaj Gittings; Kevin L. McKinney; Bryce Stephens; Lars Vilhuber; Simon D. Woodcock

The Census Bureau's Quarterly Workforce Indicators (QWI) provide detailed quarterly statistics on employment measures such as worker and job flows, tabulated by worker characteristics in various combinations. The data are released for several levels of NAICS industries and geography, the lowest aggregation of the latter being counties. Disclosure avoidance methods are required to protect the information about individuals and businesses that contribute to the underlying data. The QWI disclosure avoidance mechanism we describe here relies heavily on the use of noise infusion through a permanent multiplicative noise distortion factor, used for magnitudes, counts, differences and ratios. There is minimal suppression and no complementary suppressions. To our knowledge, the release in 2003 of the QWI was the first large-scale use of noise infusion in any official statistical product. We show that the released statistics are analytically valid along several critical dimensions: measures are unbiased and time series properties are preserved. We provide an analysis of the degree to which confidentiality is protected. Furthermore, we show how the judicious use of synthetic data, injected into the tabulation process, can completely eliminate suppressions, maintain analytical validity, and increase the protection of the underlying confidential data.
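The idea of a permanent multiplicative noise factor mentioned in the abstract above can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and not the QWI procedure: the function names, the uniform draw, and the bounds 0.02 and 0.10 are hypothetical. The sketch only shows the general pattern in which each employer receives one factor, drawn once and bounded away from 1, and that same factor is reused in every period.

```python
import random


def draw_noise_factor(rng, low=0.02, high=0.10):
    """Draw a multiplicative factor bounded away from 1.

    The uniform draw and the bounds are illustrative assumptions,
    not the distribution used for the published statistics.
    """
    magnitude = rng.uniform(low, high)
    sign = rng.choice((-1.0, 1.0))
    return 1.0 + sign * magnitude


def assign_permanent_factors(employer_ids, seed=0):
    """Assign each employer one factor that is reused in every period."""
    rng = random.Random(seed)
    return {eid: draw_noise_factor(rng) for eid in employer_ids}


def distorted_total(cell, factors):
    """Sum a cell of (employer_id, value) pairs after applying the factors."""
    return sum(factors[eid] * value for eid, value in cell)


# Hypothetical employers and counts for two consecutive quarters.
factors = assign_permanent_factors(["A", "B", "C"], seed=42)
quarter_1 = [("A", 100), ("B", 40), ("C", 10)]
quarter_2 = [("A", 110), ("B", 38), ("C", 12)]
print(distorted_total(quarter_1, factors))
print(distorted_total(quarter_2, factors))
```

Because the factor for a given employer does not change between quarters, the ratio of that employer's distorted values across periods equals the ratio of its true values, which is one intuition for why time series properties can survive this kind of distortion.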


Chapters | 2004

Early career experiences and later career success: an international comparison

David N. Margolis; Erik Plug; Véronique Simonnet; Lars Vilhuber

Human Capital Over the Life Cycle synthesises comparative research on the processes of human capital formation in the areas of education and training in Europe, in relation to the labour market. The book proposes that one of the most important challenges faced by Europe today is to understand the link between education and training on the one hand and economic and social inequality on the other. The authors focus the analysis on three main aspects of the links between education and social inequality: educational inequality, differences in access to labour markets and differences in lifelong earnings and training.


Privacy in Statistical Databases | 2012

A proposed solution to the archiving and curation of confidential scientific inputs

John M. Abowd; Lars Vilhuber; William C. Block

We develop the core of a method for solving the data archive and curation problem that confronts the custodians of restricted-access research data and the scientific users of such data. Our solution recognizes the dual protections afforded by physical security and access limitation protocols. It is based on extensible tools and can be easily incorporated into existing instructional materials.


Metadata and Semantics Research | 2013

Encoding Provenance Metadata for Social Science Datasets

Carl Lagoze; Jeremy Williams; Lars Vilhuber

Recording provenance is a key requirement for data-centric scholarship, allowing researchers to evaluate the integrity of source data sets and to reproduce, and thereby validate, results. Provenance has become even more critical in the web environment in which data from distributed sources and of varying integrity can be combined and derived. Recent work by the W3C on the PROV model provides the foundation for semantically-rich, interoperable, and web-compatible provenance metadata. We apply that model to complex, but characteristic, provenance examples of social science data, describe scenarios that make scholarly use of those provenance descriptions, and propose a manner for encoding this provenance metadata within the widely-used DDI metadata standard.


International Conference on Management of Data | 2017

Utility Cost of Formal Privacy for Releasing National Employer-Employee Statistics

Samuel Haney; Ashwin Machanavajjhala; John M. Abowd; Matthew R. Graham; Mark J. Kutzbach; Lars Vilhuber

National statistical agencies around the world publish tabular summaries based on combined employer-employee (ER-EE) data. The privacy of both individuals and business establishments that feature in these data is protected by law in most countries. These data are currently released using a variety of statistical disclosure limitation (SDL) techniques that do not reveal the exact characteristics of particular employers and employees, but lack provable privacy guarantees limiting inferential disclosures. In this work, we present novel algorithms for releasing tabular summaries of linked ER-EE data with formal, provable guarantees of privacy. We show that state-of-the-art differentially private algorithms add too much noise for the output to be useful. Instead, we identify the privacy requirements mandated by current interpretations of the relevant laws, and formalize them using the Pufferfish framework. We then develop new privacy definitions that are customized to ER-EE data and satisfy the statutory privacy requirements. We implement the experiments in this paper on production data gathered by the U.S. Census Bureau. An empirical evaluation of utility for these data shows that for reasonable values of the privacy-loss parameter ε ≥ 1, the additive error introduced by our provably private algorithms is comparable to, and in some cases better than, the error introduced by existing SDL techniques that have no provable privacy guarantees. For some complex queries currently published, however, our algorithms do not have utility comparable to the existing traditional SDL algorithms. Those queries are fodder for future research.
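To make the relationship between the privacy-loss parameter ε and additive error concrete, here is a minimal Python sketch of the textbook Laplace mechanism for a single count. This is the generic differential privacy baseline, not the customized algorithms the paper develops; the function name `laplace_count` and the default sensitivity of 1 are assumptions chosen for illustration. The Laplace noise is sampled as the difference of two exponential draws.

```python
import random


def laplace_count(true_count, epsilon, rng, sensitivity=1.0):
    """Return true_count plus Laplace noise with scale sensitivity / epsilon.

    Generic textbook mechanism, shown only to illustrate how epsilon
    controls additive error. Not suitable for production releases.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponential draws with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


def mean_abs_error(epsilon, draws=20000, seed=0):
    """Estimate the average absolute error for one count by simulation."""
    rng = random.Random(seed)
    total = sum(abs(laplace_count(0.0, epsilon, rng)) for _ in range(draws))
    return total / draws


for eps in (0.1, 1.0, 4.0):
    print(eps, mean_abs_error(eps))
```

For this mechanism the expected absolute error equals sensitivity divided by ε, so smaller values of ε give stronger protection at the cost of noisier counts.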


Statistical Journal of the IAOS | 2014

A first step towards a German SynLBD: Constructing a German Longitudinal Business Database

Joerg Drechsler; Lars Vilhuber

One major criticism against the use of synthetic data has been that the efforts necessary to generate useful synthetic data are so intense that many statistical agencies cannot afford them. We argue many lessons in this evolving field have been learned in the early years of synthetic data generation, and can be used in the development of new synthetic data products, considerably reducing the required investments. The final goal of the project described in this paper will be to evaluate whether synthetic data algorithms developed in the U.S. to generate a synthetic version of the Longitudinal Business Database (LBD) can easily be transferred to generate a similar data product for other countries. We construct a German data product with information comparable to the LBD - the German Longitudinal Business Database (GLBD) - that is generated from different administrative sources at the Institute for Employment Research, Germany. In a future step, the algorithms developed for the synthesis of the LBD will be applied to the GLBD. Extensive evaluations will illustrate whether the algorithms provide useful synthetic data without further adjustment. The ultimate goal of the project is to provide access to multiple synthetic datasets similar to the SynLBD at Cornell to enable comparative studies between countries. The Synthetic GLBD is a first step towards that goal.

Collaboration


Dive into Lars Vilhuber's collaborations.

Top Co-Authors

Carl Lagoze

University of Michigan

Mark J. Kutzbach

United States Census Bureau

Fredrik Andersson

Office of the Comptroller of the Currency
