Publication


Featured research published by Matthew Sobek.


Historical Methods | 2011

The North Atlantic Population Project: Progress and Prospects

Steven Ruggles; Evan Roberts; Sula Sarkar; Matthew Sobek

Abstract The North Atlantic Population Project (NAPP) is a massive database of historical census microdata from European and North American countries. The backbone of the project is the unique collection of completely digitized censuses providing information on the entire enumerated populations of each country. In addition, for some countries, the NAPP includes sample data from surrounding census years. In this article, the authors provide a brief history of the project, describe their progress to date and plans for the future, and discuss some potential implications of this unique data resource for social and economic research.


Historical Methods: A Journal of Quantitative and Interdisciplinary History | 2003

Challenges and methods of international census harmonization

Albert Esteve; Matthew Sobek

Abstract The development of IPUMS-International involves harmonizing data from different national statistical offices created over several decades. The original samples vary in quality and have different data formats and variable coding schemes. The authors describe the methods developed to deal with the challenges posed by such diversity and unevenness. The first stage of harmonization involves standardizing the data formats and correcting errors. Diagnostic routines analyze each data set, and custom computer programs modify the different data structures into a single standard format. The second stage of the work centers on harmonizing the codes for all variables shared across data sets, including the compilation and integration of all the relevant documentation.
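The second harmonization stage described in this abstract, translating each sample's own variable codes into one shared scheme, can be illustrated with a minimal Python sketch. The sample names, source codes, and harmonized codes below are invented for illustration and are not taken from IPUMS-International documentation; the abstract does not specify how the project implements this step.

```python
# Hypothetical harmonized scheme for a marital-status variable.
HARMONIZED_LABELS = {1: "single", 2: "married", 3: "widowed", 9: "unknown"}
UNKNOWN = 9

# Hypothetical translation tables: one per source sample, mapping that
# sample's original codes onto the shared harmonized codes.
TRANSLATION_TABLES = {
    "sample_a": {"S": 1, "M": 2, "W": 3},
    "sample_b": {"1": 2, "2": 1, "3": 3},
}


def harmonize(sample_id, source_code):
    """Translate one source code into the harmonized scheme.

    Source codes are normalized to stripped strings first. Codes that are
    absent from the translation table map to UNKNOWN rather than being guessed.
    """
    table = TRANSLATION_TABLES[sample_id]
    return table.get(str(source_code).strip(), UNKNOWN)


records = [("sample_a", "M"), ("sample_b", 1), ("sample_b", "7")]
harmonized = [harmonize(sample, code) for sample, code in records]
```

Keeping the translation tables as data rather than as branching logic means that adding a sample only requires adding one more table, which is one plausible way to manage many sources with differing coding schemes.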


Historical Methods | 2011

Big Data: Large-Scale Historical Infrastructure from the Minnesota Population Center

Matthew Sobek; Lara Cleveland; Sarah Flood; Patricia Kelly Hall; Miriam L. King; Steven Ruggles; Matthew Schroeder

Abstract The Minnesota Population Center (MPC) provides aggregate data and microdata that have been integrated and harmonized to maximize cross-temporal and cross-spatial comparability. All MPC data products are distributed free of charge through an interactive Web interface that enables users to limit the data and metadata being analyzed to samples and variables of interest to their research. In this article, the authors describe the integrated databases available from the MPC, report on recent additions and enhancements to these data sets, and summarize new online tools and resources that help users to analyze the data over time. They conclude with a description of the MPC's newest and largest infrastructure project to date: a global population and environment data network.


Privacy in Statistical Databases | 2012

When excessive perturbation goes wrong and why IPUMS-International relies instead on sampling, suppression, swapping, and other minimally harmful methods to protect privacy of census microdata

Lara Cleveland; Robert McCaa; Steven Ruggles; Matthew Sobek

IPUMS-International disseminates population census microdata at no cost for 69 countries. Currently, a series of 212 samples totaling almost half a billion person records is available to researchers. Registration is required for researchers to gain access to the microdata. Statistics from Google Analytics show that IPUMS-International's lengthy, probing registration form is an effective deterrent for unqualified applicants. To protect data privacy, we rely principally on sampling, suppression of geographic detail, swapping of records across geographic boundaries, and other minimally harmful methods such as top and bottom coding. We do not use excessively perturbative methods. Three pieces of evidence confirm the wisdom of IPUMS microdata management protocols and statistical disclosure controls: a recent case of perturbation gone wrong (the household samples of the 2000 census of the USA (PUMS), the 2003-2006 American Community Survey, and the 2004-2009 Current Population Survey), an empirical study of the impact of perturbation on the usability of UK census microdata (the Individual SARs of the 1991 census of the UK), and a mathematical demonstration in a timely compendium of statistical confidentiality practices.
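Two of the protection methods named in this abstract, sampling and top and bottom coding, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the thresholds, the sampling fraction, and the function names are hypothetical, and the abstract does not describe the actual IPUMS-International procedures in this detail.

```python
import random


def top_bottom_code(values, lower, upper):
    """Clamp each value into [lower, upper] so that extreme, potentially
    identifying values are reported only as the threshold itself."""
    return [min(max(v, lower), upper) for v in values]


def draw_sample(records, fraction, seed=0):
    """Return a simple random sample without replacement.

    A fixed seed is used here only to make the illustration repeatable.
    """
    rng = random.Random(seed)
    k = round(len(records) * fraction)
    return rng.sample(records, k)


# Hypothetical income values and hypothetical thresholds.
incomes = [-500, 1200, 30000, 250000]
coded = top_bottom_code(incomes, lower=0, upper=100000)

# Hypothetical 10 percent sample of 100 record identifiers.
sample = draw_sample(list(range(100)), fraction=0.1)
```

Both operations leave the retained values unperturbed apart from the clamped extremes, which is the sense in which such methods can be called minimally harmful compared with adding noise to every record.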


African Population Studies | 2015

Statistical coherence of primary schooling in population census microdata: IPUMS-International integrated samples compared for fifteen African countries.

Robert McCaa; Lara Cleveland; Patricia Kelly-Hall; Steven Ruggles; Matthew Sobek

The IPUMS-International project, now in its fifteenth year, integrates and disseminates population microdata for twenty-two African countries (82 countries world-wide), and the number continues to increase as more National Statistical Offices cooperate with the initiative. Statistical quality is a serious concern both for the producers of the microdata and for the researchers who use them. This paper applies the intra-cohort comparison method to pairs of integrated (harmonized) samples for fifteen African countries to assess statistical coherence, using as a benchmark the proportion completing primary school by single years of birth. Samples for six countries show near perfect coherence (R² > .9, and regression coefficients ~1.0 +/- 0.6 <0.9). Large deviations from 1.0 characterize samples for only four countries. On the whole, the results suggest that samples for the fifteen countries have considerable utility for socio-demographic analysis.


Privacy in Statistical Databases | 2010

IPUMS-international statistical disclosure controls: 159 census microdata samples in dissemination, 100+ in preparation

Robert McCaa; Steven Ruggles; Matthew Sobek

In the last decade, a revolution has occurred in access to census microdata for social and behavioral research. More than 325 million person records (55 countries, 159 samples) representing two-thirds of the world's population are now readily available to bona fide researchers from the IPUMS-International website, www.ipums.org/international, hosted by the Minnesota Population Center. Confidentialized extracts are disseminated on a restricted access basis at no cost to bona fide researchers. Over the next five years, from the microdata already entrusted by National Statistical Office-owners, the database will encompass more than 80 percent of the world's population (85 countries, ~100 additional datasets) with priority given to samples from the 2010 round of censuses. A profile of the most frequently used samples and variables is described from 64,248 requests for microdata extracts. The development of privacy protection standards by National Statistical Offices, international organizations and academic experts is fundamental to eliciting world-wide cooperation and, thus, to the success of the IPUMS initiative. This paper summarizes the legal, administrative and technical underpinnings of the project, including statistical disclosure controls, as well as the conclusions of a lengthy on-site review by the former Australian Statistician, Mr. Dennis Trewin.


Chinese Journal of Sociology | 2015

Statistical coherence of primary schooling in IPUMS-International integrated population samples for China, India, Vietnam, and ten other Asia-Pacific countries.

Robert McCaa; Lara Cleveland; Patricia Kelly-Hall; Steven Ruggles; Matthew Sobek

IPUMS-International disseminates harmonized census microdata for more than 80 countries at no cost, although access is restricted to bona-fide researchers and students who agree to the stringent conditions-of-use license. Currently over 270 samples are available, totaling more than 600 million person records. Each year, 15–20 additional samples are released, as more countries cooperate with the IPUMS initiative and the integration of 2010 round census samples is completed. With so much microdata so readily available, questions of data quality naturally arise. This article focusses on statistical coherence over time for a single concept, primary schooling completed. From an analysis of the percentage completing primary schooling by birth year for pairs of samples for 13 Asia-Pacific countries, outstanding coherence is found for four countries – China, Mongolia, Vietnam and Indonesia – with mean differences of less than 0.5 percentage points, regression coefficient (b) ranging from 0.93 to 1.07 and R² = 0.99. For the 13 countries as a group there is considerable variation overall, with mean absolute difference as high as 16 percentage points, b ranging from 0.62 to 1.44 and R² from 0.65 to 0.99. As a whole, statistical coherence of primary schooling is outstanding. Nonetheless, to make expert use of the harmonized microdata, researchers are cautioned to carefully study the IPUMS integrated metadata as well as the original source documentation. National Statistical Offices not currently cooperating or that have not yet entrusted 2010 round census microdata are invited to do so.
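The three coherence measures reported in this abstract (mean absolute difference, regression coefficient b, and R²) can be computed for one pair of samples with a short Python sketch. The function name and the example percentages are hypothetical, and regressing the later sample on the earlier one by ordinary least squares is an assumption; the abstract does not state the exact specification the authors used.

```python
def coherence(earlier, later):
    """Compare the percentage completing primary school, by birth year,
    as measured in two samples of the same country.

    Both lists must be aligned on the same birth years. The later sample
    is regressed on the earlier one with ordinary least squares.
    """
    n = len(earlier)
    mean_x = sum(earlier) / n
    mean_y = sum(later) / n
    sxx = sum((x - mean_x) ** 2 for x in earlier)
    syy = sum((y - mean_y) ** 2 for y in later)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(earlier, later))
    return {
        # Average gap between the two samples, in percentage points.
        "mean_abs_diff": sum(abs(x - y) for x, y in zip(earlier, later)) / n,
        # Slope of the regression line; 1.0 means the samples move together.
        "b": sxy / sxx,
        # Squared correlation, the R² of a one-predictor regression.
        "r2": sxy ** 2 / (sxx * syy),
    }


# Hypothetical percentages for four birth cohorts in two census samples.
earlier_sample = [40.0, 45.0, 50.0, 55.0]
later_sample = [41.0, 46.0, 51.0, 56.0]
result = coherence(earlier_sample, later_sample)
```

A pair of samples with b near 1.0, R² near 1.0, and a small mean absolute difference would count as highly coherent in the sense the abstract describes.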


Historical Methods: A Journal of Quantitative and Interdisciplinary History | 2012

Making Social Class Work

Matthew Sobek

Occupation data are a workhorse of social history. In a succinct form, an occupation suggests a person’s place within the social structure and the life chances that stem from it. Occupations are the only such social locator widely available in historical sources, giving them added importance in research. But modern scholars’ reliance on occupations is not simply an act of ahistorical desperation; occupations are ubiquitous in the historical record because contemporaries saw them as an essential piece of information about a person’s identity and position in society. Because of their richness, occupations have been asked to do a lot of work by historical researchers who have sometimes adapted methods from sociology to leverage more power from these data. Broadly speaking, there are two ways occupations get used in quantitative research. The first method scores occupations on a continuous scale based on social status or income. Such explicitly hierarchical measures are easy to interpret (higher is better) and readily amenable to a range of statistical techniques. The second method organizes occupations into classes or strata intended to reflect similar life chances or socially significant divisions within the working population. In HISCLASS: A Historical International Social Class Scheme, Marco H. D. van Leeuwen and Ineke Maas take the latter approach, describing a method to infer social class from occupation data. The book has two primary motivations. The first stems from the aforementioned lack of alternatives. As the only indicator of social structural position commonly present in historical sources, occupations are indispensable for answering questions about past society that hinge on social class. That covers considerable ground of importance to social historians. The second motivation is to facilitate international comparative research. If class is operationalized differently from one study to the next, it can be impossible to discern real differences from artifacts of classification. HISCLASS


History and Computing | 1996

Distributing Large Historical Census Samples on the Internet

Steven Ruggles; Matthew Sobek; Todd Gardner

A new project at the University of Minnesota will provide data extraction and distribution over the Internet of the Integrated Public Use Microdata Series – a database which integrates all existing national samples of the US census from 1850 to 1990 into a consistent format.


Historical Methods: A Journal of Quantitative and Interdisciplinary History | 1995

The Comparability of Occupations and the Generation of Income Scores

Matthew Sobek

Collaboration


Top co-authors of Matthew Sobek.

Robert McCaa

University of Minnesota

Todd Gardner

University of Minnesota

Albert Esteve

Autonomous University of Barcelona

Evan Roberts

University of Minnesota

Sarah Flood

University of Minnesota