A demographic microsimulation model with an integrated household alignment method

Abstract

Many dynamic microsimulation models have shown their ability to reasonably project detailed population and households using non-data based household formation and dissolution rules. Although those rules allow modellers to simplify changes in the household construction, they typically fall short in replicating household projections or, if applied retrospectively, the observed household numbers. Consequently, such models, with their biased estimates of household size and other household related attributes, lose their usefulness in applications that are sensitive to household size, such as travel demand and housing demand modelling. Nonetheless, these demographic microsimulation models, with their associated shortcomings, have been commonly used to assess various planning policies, which can result in misleading judgements. In this paper, we contribute to the literature of population microsimulation by introducing a fully integrated system of models for different life events, where a household alignment method adjusts the household size distribution to closely align with any given target distribution. Furthermore, some demographic events that are generally difficult to model, such as incorporating immigrant families into a population, can be included. We illustrate an example of the household alignment method and put it to the test in a dynamic microsimulation model that we developed using dymiumCore, a general-purpose microsimulation toolkit in R, to show potential improvements and weaknesses of the method. The implementation of this model has been made publicly available on GitHub.



Amarin Siripanich (a), Taha Rashidi (∗, a)
(a) UNSW Sydney, NSW, 2052, Australia


Key words: microsimulation, demography, household formation, open-source software

1. Introduction

Models that can project human populations at the levels of individual and household are a starting point for modelling complex systems. They explore people's behaviour and interactions as they advance through different stages of their lives and as their environment (such as policies, social trends and the economy) changes. This modelling method is known as dynamic microsimulation, a subset of microsimulation models in which the simulated units change over time. It has been widely adopted in studies where agents' behaviour and states change over their life courses, such as economic decisions (Fatmi and Habib, 2018), disability status (van Sonsbeek and Alblas, 2012), and health outcomes (Rutter et al., 2011). It has also gained recognition as an attractive tool for policy analysis (Figari et al., 2015). The insights that can be generated by a dynamic microsimulation model enable impact assessment under different policy scenarios across multiple dimensions, such as different population groups and spatio-temporal scales (Zaidi and Rake, 2001).

∗ Corresponding author. Email addresses: [email protected] (Amarin Siripanich), [email protected] (Taha Rashidi)
Preprint submitted to Journal of Artificial Societies and Social Simulation, June 18, 2020

Several demographic microsimulation methods have shown their ability to produce simulated populations matching some characteristics of the real populations, by using a set of simple household formation rules and demographic transition models. Although those rules are often based on no empirical data, such as assuming that adult leavers will create one-person households (Galler, 1988), they can still produce an acceptable result for short-period validation, as changes in the demographic structure of most populations are only observed in the long run. Therefore, the effectiveness of these rules is questionable, as their performance has not been examined against having no rules in long-run scenarios. One very important characteristic of future populations is the distribution of their household sizes. It is a significant factor in forecasting travel demand, housing demand, and household expenditures. An obvious resolution would be to develop behavioural models to simulate life events. However, this may not be possible for many studies where appropriate household data is missing over time to observe the dynamics of household evolution.
Another alternative would be to align the micro-level outcomes to their target projections, which can be transferred to the future with the aim of maintaining the overall structure of households in the population.

In this study, we present a dynamic microsimulation model for projecting the future structure of population and households, with a novel alignment method to allocate newly formed and newly immigrated households to the population such that it closely matches pre-defined household size distributions that can vary over time. We developed 'dymiumCore', a general-purpose microsimulation toolkit in R which incorporates several individual and household level decisions. The modular structure of our toolkit facilitates its maintenance as more behavioural models are introduced to improve the accuracy of the whole microsimulation platform. The platform is unique in terms of its holistic structure of lifestyle events as well as the efficiency of the algorithms that update lifestyle changes for all agents in the entire population.

The rest of the paper is structured as follows. The next section presents the literature review and background of the study, followed by the discussion of the proposed model, data, assumptions, and sub-models. The model was run multiple times and those results were used for validation. Lastly, we conclude with lessons learnt from this study and possible future improvements.

2. Literature review and background

The term microsimulation has been given slightly different meanings depending on the context in which it appears. However, the broad meaning of the term refers to a computerised micro-analytical approach that generates stochastic micro-level and highly complex outcomes which emerge from interactions between agents (e.g. people, firms, and vehicles) and their realisation, such as a state transition.
Hence this approach requires highly detailed unit record data and parameters that describe the behaviour of the agents and their environment. In this paper, we refer to microsimulation in the context of econometrics and social science, as pioneered by Orcutt (1957) in the late 1950s.

The earlier work of Guy Orcutt provides the foundation for microanalytic simulation approaches. This highly influential work paved a promising avenue for demographers, economists and social scientists to break away from traditional aggregate approaches for forecasting population, such as the 'headship rate' approach, to account for the effects of individual behaviour (Galler, 1988). Clarke (1986) provides a review of the early development of population and household microsimulation models. Microsimulation can be used not only to model evolving populations and their relationships, but also to simulate other decisions and changes in characteristics and behaviour throughout people's life courses. This type of microsimulation model is referred to as a dynamic microsimulation model.

Many dynamic microsimulation models have been developed for policy evaluations at national level across the world. Some examples are MOSART for Norway's labour supply and public pension benefits (Fredriksen, 1998), APPSIM for many policy-related questions specific to the future Australian population (Harding, 2007), INAHSIM for Japan's household living arrangement and public assistance related policies (Fukawa, 2011), SESIM for the impact of Sweden's ageing population on its pension system and public finances (Flood et al., 2005), POLISIM for the USA's social security projection (McKay, 2003), LIAM for evaluating reform scenarios to the Irish pension system (O'Donoghue et al., 2009), and MIDAS for analysing the social security and pension systems of Belgium, Germany and Italy (Dekkers et al., 2010).

As a multidisciplinary tool, microsimulation platforms have several use cases, such as health care (Zucchelli et al., 2010), household formation and dissolution (Galler, 1988), transportation and land use (Salvini and Miller, 2005), travel demand (Goulias and Kitamura, 1992), housing choice (Benenson and Torrens, 2004), labour market (Harmon and Miller, 2018), and firm life-cycle (also known as 'firmography') (Bodenmann, 2011). These social and economic systems rely on an evolving population to simulate dynamic outcomes, by simulating the progression in one's life trajectory.

Many efforts have been put toward the development of demographic components of dynamic microsimulation models, specifically to capture population changes at the individual level (person, family, and household) with more behavioural realism (Li and O'Donoghue, 2013). Different models may require different characteristics of their decision-making units, which usually depend on the intended purpose of each model. Models that are intended for examining the spread of long-term infectious diseases (Geard et al., 2013), residential location choice (Moeckel, 2016), and household expenditure (Lawson, 2014) would require their demographic components to accurately capture the household structure and composition of their simulated population. As an example, in the case of residential mobility, changes in the characteristics of a household, such as the household composition and the number of household members, can affect the housing decisions of the household (Rashidi, 2015; Rashidi et al., 2011). Despite those differences, the core demographic processes that are responsible for population growth and family formation and dissolution, namely marriage, divorce, birth, death, leaving parental home and migration, can be found in most instances (Morand et al., 2010), with limitations on the interaction and realism of these processes. These life events are usually simulated using statistical models (such as logistic regression and decision tree models) or simple transition probability tables estimated from population surveys. There can be a series of decisions and predicted quantities required in one event. As an example, Eluru et al. (2008) apply their birth event only to women aged between 10 and 49 (most models have the lower bound starting from 15), and for each woman giving birth, the number of newborns and their genders are also determined in the process. The demographic decisions of an individual commonly affect their family or household members.
For instance, when a married couple gets divorced, one of the partners, usually the male partner, will leave the household while their children, if any, will remain in the old household with the female partner. Therefore, simplifications in demographic models can propagate error into the overall structure of the model.

Although changes in family structure and population growth can be modelled quite easily using microsimulation approaches, modelling household formation and dissolution can be quite challenging, especially when population surveys are scarce. An ideal example is SVERIGE, a Swedish microsimulation model, where longitudinal socioeconomic information on the entire population of Sweden from 1985 to 1995 is available along with a highly detailed geographical identifier for each household (Rephann and Holm, 2004). However, many models do not have the same luxury as SVERIGE, so they restrict themselves to simple household formation and dissolution assumptions. A set of common assumptions, a de facto standard per se, for household formation and dissolution can be found in several lifestyle models, as shown in Table 1. By definition, a household can be made up of related and unrelated individuals, while a family only consists of people that are related by blood, marriage or cohabitation. Most demographic models do not explicitly consider the formation of households with non-related individuals or of family households with other individuals; hence, these types of households are not reflected in those models. Consequently, the household size and composition distributions of those models will most likely be unaligned with their official projection or historical data. A few exceptions exist: Paul (2014) developed a roommate model which groups non-related individuals into group households, and the model of Inagaki (2018) captures multi-generation households and people returning to their parental home after relationship breakdown.
However, even the models that do consider these household types struggle to create households of appropriate sizes because no housing constraints are imposed.

Household formation and dissolution assumptions directly affect the composition and size of new households that are formed as a result of demographic outcomes, as well as changes to existing households. Chingcuanco and Miller (2018) apply a probabilistic model for the child custody decision that accompanies partnership dissolution events, while most models use a far simpler rule such as leaving the children in the old household with their mother. Such a simplification may lead to an overestimation of the number of lone parents of one gender and could affect policy decisions derived from such a model. Another simplification that is often found in the literature is the modelling of migration as net migration (see MOSART, DESTINE and CORSIM). The justification for this is often a lack of migrant data and that some regions expect a net positive number of migrants; hence, emigration is not explicitly considered. This assumption would not have a significant impact on population distributions, such as by age and sex, in regions where the flows of migrants are not high. However, major cities such as Melbourne or Sydney in Australia have been attractive destinations for migrants; hence, not considering emigration explicitly can severely impact the correctness of the microsimulation result. Other approaches have been introduced to deal with a lack of data, such as the Pageant algorithm, an alignment method proposed by Chénard (2000) for correcting the population structure when some information about the characteristics of individual migrants is known. Although the algorithm is used by a number of models, such as by Dekkers (2015), it does not incorporate new migrant families joining existing households, which could paint a wrong picture in the final analysis, for instance if housing demand is a quantity that the model is projecting.
This is supported by Deloitte (2011), who forecast that 36% of new migrant families in Australia will initially be dependent on existing households for housing; hence, migrants do not put immediate pressure on the housing market when they first arrive.

Parameter calibration and alignment are the main approaches for correcting the results of a microsimulation model to match external totals (Baekgaard, 2002). Both approaches have been widely used and discussed in many studies. As said by Miller (2018), ".. microsimulation is not a model per se, but rather is a computational structure for the implementation of models of system behavior"; hence, parameter calibration needs to be done both jointly and individually to ensure the system behaviour is correctly reflected in every part of the model. Parameter calibration refers to a procedure for modifying the coefficients of an estimated model (usually only the intercept terms are changed, to scale the average proportions) such that the model's output closely matches an exogenous target. Alignment approaches, which are mechanisms that select micro units to undergo an event such that the total number of event occurrences is consistent with an external total, are often used to make sure that a microsimulation model produces results that do not differ from exogenous expectations of future events (Li and O'Donoghue, 2013). Although alignment has downsides that are quite concerning, as discussed by Baekgaard (2002) and Li and O'Donoghue (2014), Anderson (1997) noted that it is common practice and can be found in most existing dynamic microsimulation models used for policy analysis.
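The two correction approaches described above can be illustrated with a minimal Python sketch. It is not taken from the paper's implementation: the function names, the bisection search for the intercept shift, and the choice of ranking units by the difference between predicted probability and a random draw are all assumptions made for illustration.

```python
import math
import random


def calibrate_intercept(linear_preds, target_rate, lo=-20.0, hi=20.0, tol=1e-10):
    """Parameter calibration: find the shift added to every logit-scale
    linear predictor so that the mean predicted probability equals
    `target_rate` (bisection works because the mean is monotone in the shift)."""
    def mean_prob(shift):
        probs = [1.0 / (1.0 + math.exp(-(x + shift))) for x in linear_preds]
        return sum(probs) / len(probs)

    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if mean_prob(mid) < target_rate:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def align_by_sorting(probs, target_count, rng):
    """Alignment: select exactly `target_count` units to undergo the event,
    ranking units by predicted probability minus a uniform random draw."""
    ranked = sorted(range(len(probs)),
                    key=lambda i: probs[i] - rng.random(),
                    reverse=True)
    return set(ranked[:target_count])
```

Calibration changes the model so that its expected output matches the target, whereas alignment leaves the model untouched and forces the realised number of events to equal the external total.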

In this paper, we propose a household size alignment method, integrated into a joint system of behavioural household and individual decision models, which allows an external household size distribution to be matched. This method allows new migrant families, people leaving their parental home and those leaving their household due to relationship breakdown to join existing households or create new households as required to match the pre-defined household size target. The reason that we call this an 'alignment-like' method is that it can only match the target distribution of household sizes if the new households can be assigned to those bins that are lower or higher than their expected count. More details of the alignment method will be discussed in the next section. We also made the code of the proposed dynamic microsimulation model, written in the R language (R Core Team, 2019), available on GitHub.

Table 1: A comparison of the household formation and dissolution rules in household microsimulation models.

Columns: Reference; Use case; Partnership formation; Partnership dissolution; Leaving parental home; Migration; Code publicly available.

Common assumptions (CA)
    Partnership formation: Create a new household where both partners and their dependants (normally only resident children) move in, or one partner joins another household along with dependants.
    Partnership dissolution: The male partner leaves the family household and creates a new lone household. The children usually stay in the same household with the female partner.
    Leaving parental home: Leavers create new one-person households.
    Migration: Add new migrant families as new households to the population; often only net migration is considered.

Galler (1988)
    Use case: Household dynamics
    Partnership formation: Randomly select one of the CAs.
    Did not consider explicitly.
    Code publicly available: No

Rephann and Holm (2004)
    Use case: Policy evaluation
    Partnership formation: The female partners move into their males' households after getting married.
    Migration: Consider immigration and emigration separately.

Eluru et al. (2008)
    Use case: Activity-based modelling
    Partnership formation: Randomly select one of the CAs.
    Partnership dissolution: Consider child custody between the parents.
    Leaving parental home: Leavers can form non-family households of various sizes and can leave the study area.
    Migration: Consider immigration and emigration separately. New migrants can join the population as new households or join existing households.
    Code publicly available: No

Fukawa (2011)
    Use case: Projection of health and long-term care expenditures
    Partnership formation: Each of the CAs has a different probability value.
    Partnership dissolution: Also consider people returning to their parental homes after relationship breakdown.
    Leaving parental home: Doesn't consider children leaving home for reasons other than to get married.
    Migration: Doesn't consider migration at all.
    Code publicly available: No

Wu and Birkin (2012)
    Use case: Regional planning
    Not stated. Not stated. Not stated.
    Code publicly available: No

Geard et al. (2013)
    Use case: Explore patterns of infection and immunity
    Partnership formation: If both of the partners are living with their parents, then create a new household. If either of the individuals has their own household, then the other partner, along with their children, joins them in this household.
    Clone existing households.
    Code publicly available: Yes

Lawson and Anderson (2014)
    Use case: Forecast household expenditures
    Partnership dissolution: Also consider people returning to their parental homes after relationship breakdown.
    N/A
    Code publicly available: Yes

Paul (2014)
    Use case: Activity-based modelling
    Partnership formation: The female partners move into their males' households after getting married.
    Partnership dissolution: Consider child custody between the parents. The partner that gains custody of their children stays in the current household. The other partner leaves to form a non-family household.
    Leaving parental home: Leavers can form non-family households of various sizes.
    Migration: Consider immigration and emigration separately.
    Code publicly available: No

Moeckel (2016)
    Use case: Housing and transport demand
    Partnership formation: Create a new two-person household. If either of the partners has children, then their children will join the new household.
    Migration: Consider immigration and emigration separately.
    Code publicly available: Yes

Chingcuanco and Miller (2018)
    Use case: Urban land use
    Partnership formation: New couples can merge their households.
    Partnership dissolution: Consider child custody between the parents.
    Migration: Consider immigration and emigration separately. 75% chance to emigrate as a household and 25% chance for only household heads to emigrate.
    Code publicly available: Yes

* CA = Common assumption

3. The microsimulation model

The main objective of our model is to simulate individuals and households that are closely aligned with Australia's official population projections by simulating all the 'usual' demographic events and the household alignment method. There are 10 demographic events in the following order: ageing, birth, death, marriage, divorce, cohabitation, break up, leaving parental home, emigration, and immigration. The migration events were simulated separately for interregional migrants and overseas migrants. All these events happened sequentially and in discrete time, where one simulation cycle was equal to one year of change. The order of events is known to have a profound impact on the model's results. As a simple example, if we were to simulate the death event before the birth event, we could expect to see fewer births each year, because fewer women would be alive by the time the birth event is simulated. If we switch their order, we should expect to see more births each year than with the former order. Only a few studies have explored this particular issue; see Dumont et al. (2018). Despite that, some justification can be made without requiring a thorough experiment, such as putting the divorce and break up events before the marriage and cohabitation events to make sure that remarriage or a new cohabitation can happen within the same year, which is also what APPSIM does (Bacon and Pennec, 2007).

Apart from the demographic events, two socio-economic characteristics of all the individuals, labour force status and education status, were also updated in each simulation cycle.
Where applicable, these attributes were also used as covariates in the demographic models. Hence, any changes in those attributes of individuals will change their chance of undergoing a demographic event. Ideally, this is where one can implement a macro-micro linkage so that demographic decisions at the individual level can be behaviourally responsive to macroeconomic factors.

Figure 1 shows a high level picture of how a microsimulation model iterates through events in one simulation cycle. One can think of a microsimulation model as a data pipeline, where a dataset gets passed in and flows through a set of predefined operations, referred to as events, that contain parameters and other settings. However, as this is a stochastic model, not every event will be performed on all of the data points of the input dataset, where each data point represents a unique entity such as a person. Whether an agent undergoes an event (e.g. giving birth) depends on the event's candidate selection criteria, which are deterministic, and on the risk probability, which is stochastic and usually associated with characteristics of the selected candidates. The stochastic part is simulated using a Monte Carlo simulation. For an agent to undergo an event usually means that some characteristics of that agent will be changed, given that the event is simulated to occur for that agent. The output of one event will be the input of the next event. This procedure is repeated until all events are done, which marks the end of a simulation cycle.
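The event pipeline described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not dymiumCore's API: agents are plain dictionaries, and the names `ageing`, `make_birth_event`, `run_cycle` and the user-supplied `birth_prob` risk function are hypothetical.

```python
import random


def ageing(population, rng):
    """Deterministic event: every agent becomes one year older."""
    for person in population:
        person["age"] += 1


def make_birth_event(birth_prob):
    """Build a birth event from a (hypothetical) risk function that maps
    an agent to her probability of giving birth in this cycle."""
    def birth(population, rng):
        for person in population:
            # Deterministic part: candidate selection criteria.
            if person["sex"] != "female" or not 15 <= person["age"] <= 49:
                continue
            # Stochastic part: Monte Carlo draw against the risk probability.
            if rng.random() < birth_prob(person):
                person["n_children"] += 1
    return birth


def run_cycle(population, events, rng):
    """One simulation cycle: the output of each event is the input of the next."""
    for event in events:
        event(population, rng)
    return population
```

Because the events mutate the same population in sequence, swapping two entries of `events` changes who is eligible for the later event, which is the order effect discussed earlier in this section.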

We used a 1% basic confidentialised unit record file (also referred to as a microdataset or a reference sample) and tabulations (also known as marginal sums, marginal distributions, and control totals), containing various demographic and socioeconomic characteristics.

[Figure 1 flowchart nodes: Start; World (t); Any events?; Select candidates; Any candidates?; Event occurs?; Update the candidate; Next candidate; No more candidates; World (t+1); End]

Figure 1: A flowchart that shows how a microsimulation model iterates through events in one simulation cycle.

Table 2: Synthetic population and household characteristics

Characteristics       Levels

Individual
  Age                 18 categories: 0 - 4, ..., 85 and over
  Sex                 2 categories: Male and Female
  Marital status      6 categories: Not applicable, Never married, Married, Separated, Divorced, Widowed
  Employment status   4 categories: Not applicable, Employed, Unemployed, Not in labour force
  Student status      3 categories: Not applicable, Part-time, Full-time
  Father id           Numeric
  Mother id           Numeric
  Partner id          Numeric

Household
  Household size      6 categories: 1, 2, ..., 6+
  Place of residence  1 category: Greater Melbourne

To generate a baseline synthetic population, we followed a standard synthetic reconstruction procedure which involves two stages: fitting and generation (Müller, 2017).

For the fitting stage, the iterative proportional updating method, also known as IPU, was used. It is a heuristic reweighting approach proposed by Ye et al. (2009). This method is known to be highly efficient and easy to understand, as it is an Iterative Proportional Fitting procedure. The highlight of this method is its ability to calibrate the case weights of a population sample to match their individual-level and household-level marginal sums. Hence, the calibrated weight of each record in the reference sample reflects the record's contribution at both levels. There are many software packages that provide implementations of various fitting methods. For this study we used the simPop package's implementation of the IPU method (Templ et al., 2017).

Different demographic models may have different requirements, which largely depend on what the models are developed for. The specification of the synthetic population is one of the many requirements to be discussed prior to the model development phase. For this study, in addition to demonstrating the effectiveness of the overall microsimulation framework, one of our main goals is to show that the proposed allocation algorithm works as intended. Hence we kept the specification of our synthetic population as simple as possible. The following individual-level cross-tabulations were used as control tables:

• age x sex x marital status,
• age x sex x employment status, and
• age x sex x student status.

Only household size was used as the household-level constraint.

Once the calibrated weights of the reference sample were obtained, they had to be integerised to determine how many copies of each unique record should be generated to mirror the real population.
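One integerisation approach, truncate, replicate, sample (TRS; Lovelace and Ballas, 2013), can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function names are hypothetical, and the weighted sampling without replacement is done with random keys of the form u^(1/r), which is an implementation choice made here to stay within the Python standard library.

```python
import math
import random


def trs_integerise(weights, rng):
    """Truncate, replicate, sample: turn fractional weights into integer counts."""
    counts = [math.floor(w) for w in weights]              # truncate
    remainders = [w - c for w, c in zip(weights, counts)]
    deficit = round(sum(remainders))                        # copies still to place
    # Weighted sampling without replacement: each record with remainder r > 0
    # gets the key u ** (1 / r); the records with the largest keys are drawn.
    keyed = [(rng.random() ** (1.0 / r), i)
             for i, r in enumerate(remainders) if r > 0]
    for _, i in sorted(keyed, reverse=True)[:deficit]:
        counts[i] += 1                                      # sample
    return counts


def replicate(records, counts):
    """Generate `counts[i]` copies of each record."""
    return [rec for rec, c in zip(records, counts) for _ in range(c)]
```

The integer counts always sum to the rounded total of the fractional weights, and no record's count differs from its weight by one or more.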
There are many ways to perform integerisation. We picked the TRS approach as it strikes a balance between ease of implementation and accuracy, which Lovelace and Ballas (2013) show to be superior to that of other approaches.

The same multi-level fitting approach and reference sample were used to create a synthetic population of migrants. The reference sample only contained recent migrants at this point, and regional and international migrants were marked accordingly. All records in the reference sample were calibrated against target distributions of the 2011 Greater Melbourne migrant population. To account for migration, which makes up a significant part of Australia's annual population growth, three different groups of migrants were generated: inter-regional, overseas temporary, and permanent overseas migrants. However, due to the limited data available on migrants across different time periods, we assume that all future migrants have the same characteristics as the migrants of the base year. This is a big assumption to make, but it can be easily relaxed with a longitudinal dataset on migrants.

One extra step was added after the population synthesis procedure, which is to create immediate family relationships between individuals who belong to the same family. The microdata contains a variable which describes the person's relationship to a reference person of the family, or of the household if the person does not belong to a family unit. Using that variable we were able to create parent-child and partner links for each individual in a family household. However, parent-child relationships in multi-generation households cannot be identified due to the limitation described above. It should also be noted that in all of our simulation runs only 1% of the synthetic population was used; this was to significantly reduce the computing resources required for the study.
The main reason why we had to reweight the 1% microdata was to adjust for omitting non-residents and incomplete households present in the population sample.

Empirical rates of the main demographic events, such as fertility, mortality, marriage and divorce, can be found on the ABS website. Those rates are too broad geographically and have only a few dimensions (usually grouped by age, sex and state). Hence, longitudinal surveys are preferable for the estimation of demographic sub-models when other dimensions of life or a finer geographic resolution are required. Luckily, in Australia we have the Household, Income and Labour Dynamics in Australia Survey (Summerfield and Hahn, n.d.), also known as HILDA. HILDA is a household-based panel survey where many dimensions of life are captured year after year. The survey has been running for 18 consecutive waves, from 2001 to 2018. This study used the 2006 to 2016 panels to estimate its demographic sub-models. Some demographic events occur rarely and are therefore also rarely captured by the survey. Due to the small sample size, for those rare events we fitted the models using data pooled across all major Australian capital cities and the panels.

Where possible, we estimated the models separately for different groups of people based on their characteristics, such as gender and marital status. To keep the main body of the paper concise, estimation results can be found in the Appendix section of the paper. A summary of the parameters used in the sub-models and which sub-population they applied to can be found in Table 6.

Household size can be difficult to simulate correctly in a dynamic microsimulation model, and it is often not captured to the extent that it is valid or aligns with an official projection. This is because there are various factors at play, from demographic to housing, and the whole decision chain does not always get modelled based on empirical data. Demographic factors such as relationship formation and dissolution (i.e. marriage, cohabitation, divorce and break up), having children, and leaving home change the size of households. These events are part of the life-cycle of families. In addition, the preferred living arrangements of overseas migrants, such as living with extended family members and renting with other people, can significantly affect household size. Moreover, housing factors such as affordability and shortage can also influence the distribution of household sizes of a population. Hence, we have devised an alignment method that allows a feasible target of household sizes to be achieved without adding more complexity to what can already be found in the standard demographic components of a dynamic microsimulation model. This alignment method was applied in the immigration events to allocate new migrant families, and in the divorce, break up, and leave parental home events, where it replaces the de facto household formation rules discussed in the literature review section.

This method works iteratively: in each iteration it allocates one household to the household size bin that will reduce the standard deviation of the household size differences the most at that moment. The use of standard deviation as the 'scoring' function allows the errors in all of the household sizes to be balanced. This process is repeated until all the new households are assigned. An assignment can happen in two different ways. A household can either be assigned to a household size bin as a new household, or join an existing household such that their combined household size is equal to the allocated bin. For example, assume there is a shortage of households of size four and a surplus of households of size two. If the next household to be allocated is of size two, it will be combined with an existing household of size two, which immediately reduces both the surplus in size two and the shortage in size four. This is an ideal outcome; however, it may not always be possible due to the unpredictable order in which new households are processed.
Using a random order of new households can reducethe chance that there is a systematic bias in the process. Pseudo-code for this alignmentprocedure is presented in Algorithm 1 and Algorithm 2.This method is not without its drawbacks. First, it does not recognise that merginglarge size households can be problematic. That case would happen when large sizehouseholds are to be allocated ﬁrst, while big households are missing in the population.Second, this implementation of the method doesn’t choose most likely households, otherthan their combined household size, once merged, that the new households should join.12or example, one would expect people leaving parental homes for education to joingroup households that are also students. However, the method make no distinctionbetween a group household and a family household of the same size. This can be easilyimproved by including a compatibility model, similar to a couple compatibility model,that can evaluate the compatibility of a joining household and a host household, if suchdata is available.
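The iterative allocation described above can be illustrated with a minimal Python sketch. It is not the dymiumCore implementation: it tracks only the number of households per size bin rather than individual households, and the function names (`score`, `rank_best_size`, `align`) and the `seed` argument are hypothetical. The differences are taken as existing minus target, and a merge is only allowed when a host household of the required size exists. The input counts are those of the example in Table 3.

```python
import random
from statistics import stdev


def score(diff, target):
    """Standard deviation of the relative differences diff[k] / target[k]."""
    return stdev([d / t for d, t in zip(diff, target)])


def rank_best_size(size, diff, target):
    """Rank the options for a household of `size`, best first.

    Option 0 means "form a new household"; option s >= 1 means "join an
    existing household of size s". Bin k is stored at list index k - 1.
    """
    n = len(target)
    scores = {}
    trial = diff.copy()
    trial[min(size, n) - 1] += 1           # option 0: one more household in its own bin
    scores[0] = score(trial, target)
    for s in range(1, n + 1):
        trial = diff.copy()
        trial[s - 1] -= 1                  # the host household leaves bin s ...
        trial[min(size + s, n) - 1] += 1   # ... and the merged one enters bin size + s
        scores[s] = score(trial, target)
    return sorted(scores, key=scores.get)


def align(new_sizes, existing, target, seed=0):
    """Allocate new households and return the resulting counts per size bin."""
    rng = random.Random(seed)
    counts = list(existing)
    n = len(target)
    order = list(new_sizes)
    rng.shuffle(order)                     # random processing order
    for size in order:
        diff = [c - t for c, t in zip(counts, target)]
        for s in rank_best_size(size, diff, target):
            if s == 0:
                counts[min(size, n) - 1] += 1
                break
            if counts[s - 1] > 0:          # merge only if a host of size s exists
                counts[s - 1] -= 1
                counts[min(size + s, n) - 1] += 1
                break
    return counts


existing = [2250, 3300, 1800, 2600]        # Table 3, row (2)
target = [2300, 3180, 1710, 2810]          # Table 3, row (3)
new_sizes = [1] * 100 + [2] * 100 + [3] * 100
after = align(new_sizes, existing, target)
print(after)
```

Because this sketch treats a merge into the last (open-ended) bin as leaving the counts unchanged, the score can never get worse from one allocation to the next as long as that bin is not empty. The exact counts it prints depend on the random order and need not reproduce row (4) of Table 3.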

Algorithm 1: Household size alignment procedure

Inputs:
    HH_u, an array of unallocated households.
    HH_e, an array of existing households.
    T, an array of targeted household sizes, where the last index is the last category of household size (e.g. 6 or more people).

B ← calculate household size bins of HH_e
D ← calculate differences between B and T
foreach household h ∈ HH_u do
    S ← RankBestSize(h, D, T)
    foreach household size s ∈ S do
        if s == 0 then
            h creates a new household
            x ← min(size of household h, length(T))
            D[x] ← D[x] + 1
            break
        k ← randomly select a household of size s from HH_e
        if no suitable household with size s then
            continue with the next best s
        Make members of h join k
        j ← min(size of household h + size of household k, length(T))
        D[s] ← D[s] − 1
        D[j] ← D[j] + 1
        break

Algorithm 2: RankBestSize

Inputs:
    h, a household.
    D, an array that contains differences between targeted and existing household size bins.
    T, an array that contains the target distribution of household sizes to be matched.
Output:
    R, an array containing the ranks of the household sizes that the household, h, should join to minimise the standard deviation of the relative differences D_k / T_k, where 0 means to create a new household.

n ← length(D)
S ← an empty array with length n
x ← min(household size of h, n)
foreach i ∈ 1, ..., n do
    D* ← D
    if i == n then
        D*[i] ← D*[i] + 1
        S[i] ← sd(D*_k / T_k)
        break
    j ← min(x + i, n)
    D*[i] ← D*[i] − 1
    D*[j] ← D*[j] + 1
    S[i] ← sd(D*_k / T_k)          ▷ sd is a function to calculate standard deviation
D* ← D                             ▷ The next two lines evaluate h as a new household.
D*[x] ← D*[x] + 1
s ← sd(D*_k / T_k)
S ← append s to S
R ← sort 0 to n from the lowest to the highest according to S
return R

3.4. Sub-models

Our model contains 12 sub-models, and within those exist sub-processes. All agents’ decisions are outcomes of a Monte Carlo simulation. For example, the probability of a binary decision is determined using an appropriate model for that decision. The probability can be conditioned on attributes of the individual making the decision. Using a pseudo-random number generator, a value between 0 and 1 is drawn from a uniform distribution. If the randomly drawn value is less than the probability, then the individual is assumed to undergo that decision. The same technique is also used for simulating decisions with multiple outcomes. This process is also known as weighted random sampling, where the weights are the probabilities corresponding to the choices.
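As a minimal illustration of the Monte Carlo step described above, the following Python sketch simulates a binary decision and a decision with multiple outcomes. The function names and the example probabilities are hypothetical and are not taken from the fitted sub-models.

```python
import random


def simulate_binary(probability, rng):
    """Return True when a uniform draw on [0, 1) falls below the probability."""
    return rng.random() < probability


def simulate_choice(probabilities, rng):
    """Weighted random sampling over a dict of {outcome: probability}."""
    outcomes = list(probabilities)
    weights = list(probabilities.values())
    return rng.choices(outcomes, weights=weights, k=1)[0]


rng = random.Random(42)

# Hypothetical probabilities, for illustration only.
gives_birth = simulate_binary(0.08, rng)
labour_status = simulate_choice(
    {"employed": 0.6, "unemployed": 0.1, "not in labour force": 0.3}, rng
)
print(gives_birth, labour_status)
```

Passing an explicit `random.Random` instance keeps each run reproducible, which is convenient when results are averaged over several independent runs.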

The ageing event increases the age of people by one year, since one simulation cycle of this model is equivalent to one year. This is a very important event, which is also the main distinction between static and dynamic microsimulation. Ageing plays a vital role in reflecting changes in the behaviour of people, such as the decision to have kids or to leave their parental home.

Birth is simulated in three steps. The first step is to determine the risk of giving birth for females aged 18 to 49 and to simulate the risk outcomes for all of them using a Monte Carlo simulation. Once the outcome of the risk is determined, the chance of giving birth to more than one baby is simulated for each female who gives birth, along with the gender of the newborn babies. Parent and child relationships are also created.

The probability of dying depends on the age and sex of each individual. Once a married person dies, their partner's marital status is changed to widowed. Households in which only children under 15 years of age are left are removed from the population. However, that rarely happened in our model, so its impact on the overall result is negligible. An alternative approach would be to assign the orphans to existing households.

New households are formed whenever individuals enter a partnership, through marriage or cohabitation. The marriage and cohabitation events are procedurally similar. The main differences between them are in their eligibility criteria and in that the marriage event is responsible for simulating two different types of marriage, with and without premarital cohabitation, following different procedures. For those with premarital cohabitation, when they are selected to be married, it is only required that their marital status is changed to ‘married’. In contrast, single individuals must find a suitable partner to marry or to enter a cohabitation relationship with. Hence, a mate matching market is established to match individuals who are seeking a partner. A mate matching process is required to find compatible partners. Our mate matching model assumes that no one has perfect knowledge about other seekers in the market. Hence, each individual is assigned 30 potential partners in their choice set. They then evaluate their compatibility, and the final pick is simulated using a weighted random sampling approach, where the weights are calculated based on their differences in age. This compatibility assumption can be easily changed based on the availability of empirical data that captures such information. Both partnership formation events guarantee that no one is ‘left over’ by drawing the required additional number of individuals, based on their transition probabilities, to balance both of the candidate pools.
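The choice-set step of the mate matching market can be sketched as follows. Table 6 states only that the partner score is an exponential function of the age difference; the specific form `exp(-decay * |age gap|)`, the `decay` value and the function name `pick_partner` are assumptions made for illustration.

```python
import math
import random


def pick_partner(seeker_age, candidate_ages, rng, choice_set_size=30, decay=0.2):
    """Return the index of the chosen candidate, or None if there are none.

    A limited choice set is sampled first (imperfect knowledge of the market),
    then one candidate is picked by weighted random sampling.
    """
    if not candidate_ages:
        return None
    k = min(choice_set_size, len(candidate_ages))
    choice_set = rng.sample(range(len(candidate_ages)), k)
    # Assumed score: a smaller age gap gives a larger weight.
    weights = [
        math.exp(-decay * abs(seeker_age - candidate_ages[i])) for i in choice_set
    ]
    return rng.choices(choice_set, weights=weights, k=1)[0]


rng = random.Random(7)
candidates = [rng.randint(18, 60) for _ in range(200)]
chosen = pick_partner(30, candidates, rng)
print(chosen, candidates[chosen])
```

A full matching market would also remove matched individuals from the pool and balance the two candidate pools, as described above; the sketch covers a single seeker only.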

Relationship breakdown leads to housing stress, which causes a decrease in the size of households and an increase in the number of households. When a couple is simulated to end their relationship, if they have any children, child custody is also simulated. It is assumed that the partner who gets child custody continues to stay with the children in the same household, and the other person leaves the household. If the relationship involves no children, then the male partner has to leave the household. All individuals who are leaving their households will either form a new lone person household or join an existing household, which is decided by the household size alignment algorithm presented in Section 3.3. For a demographic model that uses a microsimulation approach, divorce, which sometimes also includes de-cohabitation, is one of the ways to account for the splitting of individuals into more households. Our model simulates divorce and break up (the ending of cohabitation) separately from each other. This is to account for the fact that the two types of partnership differ in terms of relationship commitment.

There are many reasons why people leave their parental homes, for example, moving out into a couple relationship or to live with other related or non-related individuals. The purpose of this event is to model the leave home decision of individuals who have never left their parental homes, for reasons that are not related to cohabitation or marriage. The cohabitation and marriage events already account for those people who leave home to move in with their partners. Different parameters are used for males and females. In most demographic microsimulation models, leavers are assumed to go on and form one-person households; this assumption undoubtedly leads to an over-representation of households of that size. Only very few models, such as Paul (2014), try to resolve this problem by introducing a roommate matching model to group leavers together to form group households. In our case, the household alignment algorithm was used to allocate leavers to households.

Immigration is one of the main drivers of population growth in the Greater Melbourne region. The number of migrants was modelled based on the ABS official migration projection. The total number of migrants expected in each year was converted into the total number of migrant households. The person-to-household conversion rate was calculated for each migration type using the 1% CURF data, restricted to the sample that migrated to the study region. That number was then used to draw households from the synthetic migrant data based on their weights, which we had calibrated prior to the simulation. In each iteration, a list of migrant households was integrated with the main population using the household size alignment procedure to ensure that the overall number of households would not exceed the household projection of that year. As described in the previous section, immigration is modelled separately for each type of migrant (permanent overseas migrants, temporary overseas migrants and interstate migrants), since they exhibit different characteristics at all levels, as observed in the 1% CURF file.

To model emigration, we used a procedure similar to the Pageant algorithm proposed by Chénard (2000). We assumed that people emigrate as households; this avoids removing married people from the population, which would leave their remaining dependent children marked as orphans. We randomly selected a number of individuals that aligned with the distribution of emigrants, by age and sex, from the ABS. The procedure works iteratively as follows. In each iteration, an individual was randomly drawn from the population, and the age and sex of the individual and of its household members were checked against the target distribution. If no category in the target distribution became negative after removing the characteristics of those selected individuals, then they were marked as emigrants and subsequently removed from the population. Note that their probabilities of being selected were weighted by their household sizes. This was to make sure that households of all sizes have an equal chance of emigrating. However, this algorithm does not ensure that the simulated number of emigrants in each year will always satisfy its projection. Despite that, we found that the differences were minute.
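The household-level emigration check can be sketched in Python as below. This is an assumption-laden illustration rather than the model's code: the data layout, the function name `select_emigrants` and the `max_draws` cap are hypothetical. Drawing a person with a weight of one over their household size is read here as equivalent to drawing a household uniformly at random, which is how the sketch implements it.

```python
import random


def select_emigrants(households, target, rng, max_draws=10_000):
    """Pick whole households to emigrate without overshooting any target cell.

    households: {household_id: [(age_group, sex), ...]}
    target:     {(age_group, sex): number of emigrants wanted}
    """
    remaining = dict(target)
    pool = dict(households)
    emigrants = []
    for _ in range(max_draws):
        if not pool or all(v == 0 for v in remaining.values()):
            break
        # Assumed reading of the weighting: every household is equally likely.
        hh_id = rng.choice(list(pool))
        needed = {}
        for key in pool[hh_id]:
            needed[key] = needed.get(key, 0) + 1
        # Accept only if no target category would become negative.
        if all(remaining.get(k, 0) - c >= 0 for k, c in needed.items()):
            for k, c in needed.items():
                remaining[k] -= c
            emigrants.append(hh_id)
            del pool[hh_id]
    return emigrants, remaining


households = {
    1: [("30-34", "F"), ("30-34", "M")],
    2: [("60-64", "M")],
    3: [("30-34", "F")],
    4: [("0-4", "F"), ("30-34", "F"), ("30-34", "M")],
}
target = {("30-34", "F"): 1, ("30-34", "M"): 1}
emigrants, remaining = select_emigrants(households, target, random.Random(0))
print(emigrants, remaining)
```

As in the text, the targets are never exceeded but are not guaranteed to be met: if the household with a single 30-34 female is accepted first, the remaining male target cannot be filled by any household in this toy population.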

Making the demographic decisions of individuals sensitive to changes in their socioeconomic status can make the model more behavioural, which can be highly desirable for many studies. In a more comprehensive model, socioeconomic variables such as labour force participation can be linked to a macroeconomic model and, for example, the risk of giving birth can be associated with women’s labour force participation status. Hence, any changes in the labour force participation rates will affect the total number of females having children in that year. In our model, labour force status and educational attainment are modelled using multinomial logistic regression models.

4. Results

This subsection illustrates how the household size alignment procedure works, using a simple example as shown in Table 3. There are 100 households of each household size category, from 1 to 3, to be assigned to the existing population. However, assigning all the unallocated households as new households according to their household size would oversimulate most of the categories, except the last category, 4 or more people, since there is no new household that can be assigned to it. Once we applied the household size alignment procedure to all the unallocated households, we saw very significant improvements across all the household size categories, even in the 4 or more category, to which no unallocated households belonged. This was because some unallocated households were combined with existing households of size 3, which then became households of size 4 or more. Many mergers occurred between households of size 3 and new unallocated households, as can be seen in Figure 2: from the first iteration to around the 60th iteration there was a sharp decline in the size 3 category and a surge in the 4 or more category. The relative percentage differences dropped to almost 0% in all of the household sizes; their average was around 4.5% prior to the alignment.

Table 3: An example of a household alignment problem

Household size                                  1          2          3          4 or more
(1) Unallocated households                      100        100        100        0
(2) Existing households                         2,250      3,300      1,800      2,600
(3) Target distribution                         2,300      3,180      1,710      2,810
(4) After alignment                             2,296      3,192      1,716      2,823
Relative difference before, [(2) - (3)] / (3)   -2.17 %    3.77 %     5.26 %     -7.47 %
Relative difference after, [(4) - (3)] / (3)    -0.17 %    0.38 %     0.35 %     0.46 %
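The relative differences in the last two rows of Table 3 follow directly from rows (2), (3) and (4). The short Python check below recomputes them; the variable and function names are chosen here for illustration.

```python
sizes = ["1", "2", "3", "4 or more"]
existing = [2250, 3300, 1800, 2600]   # row (2)
target = [2300, 3180, 1710, 2810]     # row (3)
aligned = [2296, 3192, 1716, 2823]    # row (4)


def relative_difference(count, goal):
    """Percentage difference of a count from its target, rounded to 2 decimals."""
    return round(100 * (count - goal) / goal, 2)


before = [relative_difference(c, t) for c, t in zip(existing, target)]
after = [relative_difference(c, t) for c, t in zip(aligned, target)]

for size, b, a in zip(sizes, before, after):
    print(f"size {size}: before {b} %, after {a} %")
```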

Table 4 shows a comparison between the 1% projected and simulated figures of population and households across five different years. For both agent types, the errors grew as the number of iterations increased; the same can be said about the variation in the simulated values, reported as standard deviations in brackets. It should be pointed out that, while the total number of starting households matched its calibration target, the total number of individuals did not. This was because the IPU algorithm could not converge in some categories between the 1% microdata and the target distributions it was given. From Table 4, it can be calculated that the average household size in the year 2040 would be 2.778 and 2.674 based on the simulation result and the projection, respectively. This indicates that individual agents were forming bigger households over time, in fact bigger than what is implied by the projection. Since the number of immigrants added each year was already based on the immigrant projection, it is clear that births and deaths were factors in the large discrepancy in the total number of individuals in 2040, where the model oversimulated by around 5.1%. Unsurprisingly, all the models were estimated based on data from a historical period, which only captured the then economic and social trends that influenced those events, unlike many of the assumptions made in the ABS projection, which were not linear with time.
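The 2040 figures quoted above can be reproduced from the population and household counts reported in Table 4. The following Python lines are a simple arithmetic check; the dictionary layout is chosen here for illustration.

```python
# 2040 counts from Table 4 (1 percent scale).
simulated = {"population": 78286, "households": 28177}
projected = {"population": 74466, "households": 27852}


def average_household_size(counts):
    """Persons per household."""
    return counts["population"] / counts["households"]


simulated_size = round(average_household_size(simulated), 3)
projected_size = round(average_household_size(projected), 3)
oversimulation = round(
    100 * (simulated["population"] / projected["population"] - 1), 1
)

print(simulated_size, projected_size, oversimulation)
```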

Figure 2: Relative percentage differences between the target distribution and the existing distribution of household sizes of the example alignment problem. (Panels: relative difference and absolute difference; horizontal axis: iteration; one series per household size.)

Table 4: A comparison of the 1 percent simulated and observed counts of population and households.

Measure           2011           2016           2020           2030           2040
Population
  Simulated (SD)  38,754 (0)     44,566 (75)    50,277 (143)   64,101 (190)   78,286 (229)
  Projected       40,000         44,852         52,287         63,835         74,466
  Ratio           0.969          0.994          0.962          1.004          1.051
Households
  Simulated (SD)  14,947 (0)     16,662 (39)    18,610 (48)    23,289 (53)    28,177 (56)
  Projected       14,947         16,645         19,225         23,547         27,852
  Ratio           1.000          1.001          0.968          0.989          1.012

Detailed historical data from the 2016 Australian Population and Housing Survey were used for benchmarking the model performance. The following comparisons are made to show that our model can reasonably capture the demographic evolution of the study region at the aggregate level. This level of comparison is generally acceptable in microsimulation studies.

Figure 5 shows the marginal errors between the average result of 20 independent runs and their corresponding validation targets. The shares of nearly all of the simulated 5-year age groups closely matched their observed shares, with more than half of the differences being less than 0.20%. The largest difference, of negative 0.51%, can be seen in people in the ‘85 years and over’ category. However, the most significant imbalances, based on their proportions, were amongst people aged between 0 to 5, 5 to 9, and 35 to 39. These categories accounted for 6.4%, 6.2% and 7.2% of the observed population, respectively, and the model clearly overestimated them. In contrast, the three largest age categories, covering 20 to 34 years, were very well captured by the model, with absolute differences of less than 0.08%, the lowest being 0.01%. Although the model explicitly accounted for both in- and out-migration using the administrative migration projection values, the age structure of the new migrants did not drastically change with time but only by chance of them being added, as discussed in the previous section.

It should also be noted that the single-year age variable of the population was simulated from their five-year age group to allow ageing with the simulation cycle. Although we tried to ensure that the count of each single-year age group matched its observed distribution, it was not possible to simulate each individual’s age jointly with their household members due to a lack of appropriate data. Some studies were able to simulate the single-year age of parents conditioned on their children’s, and vice versa, and also within each couple. This is a potential improvement to our model in a future iteration that could lessen the differences in the age structure.

Evidently, education and employment were the areas where the sub-models did not do well: many of the categories saw absolute differences of well over 1.5%, and in particular the share of people who were not in the labour force was substantially lower than its observed proportion. These sub-models were estimated from multiple waves of the HILDA survey with the lagged state as the independent variable. Hence, this signifies the need to revise the models with a better specification that allows the effects of other demographic variables to be accounted for.

Marital status is another area where the model was able to do well. Out of all the categories in marital status, separation was fairly out of proportion, given its size. This disproportion was most likely the reason why the marriage category was underestimated, since only married people can go into separation.

The proportions of males and females seem to be fairly accurate. This is one of the characteristics that we were able to control. The discrepancies can only be explained by the result of the calibration of the base synthetic population and by the random drawings made to add new immigrants each year.

Overall, the results of the simulated individuals in 2016 show that the model could reasonably simulate age, sex and marital status. However, there are still some weaknesses that should be addressed in the next iteration, such as the specification of the multinomial logistic models used to predict the changes in education and employment status. Additionally, removing emigrants solely based on their age and sex helped to keep the total number of individuals close to its projection; however, this was done at the cost of creating errors in other uncontrolled categories.

Table 5: Person-level validation results in the year 2016.

Category                                            Observed   Simulated   Range (%)       Difference
Education
  Advanced Diploma and Diploma Level                8.6 %      8.2 %       [8, 8.4]        -0.4 %
  Bachelor Degree Level                             16.8 %     16.4 %      [16.1, 16.6]    -0.5 %
  Certificate Level                                 11.8 %     14.1 %      [13.9, 14.3]    2.4 %
  Graduate Diploma and Graduate Certificate Level   2.3 %      2.5 %       [2.4, 2.6]      0.2 %
  Not Applicable                                    20.4 %     18.3 %      [18.2, 18.5]    -2.1 %
  Postgraduate Degree Level                         5.8 %      5.2 %       [5.1, 5.3]      -0.6 %
  Year 12 or Below                                  34.2 %     35.3 %      [35.1, 35.5]    1.1 %
Employment
  Employed                                          49.6 %     52.5 %      [52.2, 52.8]    3 %
  Not Applicable                                    19.2 %     18.3 %      [18.2, 18.5]    -0.9 %
  Not in the Labour Force                           27.6 %     23.7 %      [23.5, 23.9]    -3.9 %
  Unemployed                                        3.6 %      5.4 %       [5.2, 5.6]      1.8 %
Marital Status
  Divorced                                          6.1 %      6.4 %       [6.2, 6.6]      0.3 %
  Married                                           39.5 %     37.6 %      [37.3, 37.9]    -1.9 %
  Never Married                                     29.9 %     32.7 %      [32.4, 33]      2.8 %
  Not Applicable                                    18.3 %     18.3 %      [18.2, 18.5]    0 %
  Separated                                         2.3 %      1.5 %       [1.4, 1.5]      -0.9 %
  Widowed                                           3.9 %      3.5 %       [3.4, 3.6]      -0.4 %
Sex
  Female                                            51 %       51.3 %      [51.2, 51.5]    0.3 %
  Male                                              49 %       48.7 %      [48.5, 48.8]    -0.3 %

Figure 3 shows that all of the simulated household sizes closely matched their historical targets of the same period. At first glance, it is obvious that all of the household sizes, with the only exception of one-person households, were consistently underestimated by the model. However, their RMSE values indicate that the differences were relatively small. This is evidence that the household alignment method presented in Section 3.3, which was used instead of the simple household formation rules, worked as intended. Not only could it reasonably match the observed numbers, the algorithm was also able to deliver a household size distribution that was consistent with our assumption that the household size distributions of all future periods would remain the same as in 2016, as shown in Figure 4. Initially, some slight variations can be observed, which then gradually stabilised toward the end. In the same figure, we can see that the total number of households of size 6 or more, the smallest category, was overestimated due to the merging of large households, as identified in Section 3.3.

To further investigate the validity of the simulated households, we looked at how well the model could replicate three different household types by comparing the projected and simulated results, as shown in Figure 5. The trends of lone person households and family households were correctly replicated; however, the number of family households was consistently below its projection, while the number of group households was over-represented. These discrepancies are most likely the result of the household selection criteria used, where new households only selected existing households to join that matched their preferred household size, ignoring other probable compatibility metrics such as household type. Again, this was not unexpected and can be fixed with the right data that reveals migrant families’ household formation behaviour.

Figure 3: Validation of household sizes in 2016. The red dots are observed counts and the grey dots are simulated counts from different runs. (Panels by household size, each with an RMSE value; horizontal axis: simulated count.)

Figure 4: Household size prediction vs assumption from 2016 to 2040. (Horizontal axis: year; vertical axis: percent; one series per household size.)

Figure 5: Household type prediction vs official projection from 2011 to 2040. (Panels: family households, group households, lone person households; horizontal axis: year; vertical axis: number of households; series: official projection, simulation.)

In addition to analysing the demographic outcomes by looking at the population’s characteristics, we can also evaluate the results based on the number of occurrences of each event, as depicted in Figure 6. The grey lines are the results from each simulation run; they fluctuate considerably but, overall, they all have an upward trend, which is to be expected as the population increases. Out of all the demographic events, the number of divorces has the highest amount of variation across different periods.

Since the validation targets are for the entire State of Victoria, they had to be scaled down using a quotient calculated from their population sizes. Many demographic events that could not be validated, due to a lack of administrative data, are also included, such as the number of break ups, non consensual unions, and people who left their parental home. As expected, the magnitudes of many demographic events are off from the approximations of their observed values.

For the migration events, a steep increase can be seen for in-migration from the beginning, levelling off at around 1900 people (or 190,000 when adjusted back from the downscaling) from the year 2020 onward, while for out-migration a similar upward trend can be observed up to 2030, reaching a plateau at 775 outgoing migrants per year. These trends are from the administrative projection, with an assumption that Greater Melbourne will see medium interstate and overseas migration flows. However, in reality, migration is extremely volatile due to a multitude of factors, such as immigration policies, the world’s economy, a country’s economy and global pandemics, that are occurring in this fast-changing world.

Figure 6: Demographic occurrences. (Panels: Marriages, Marriages from Cohabitation, Gave Birth, Immigrant Persons, Left Home, Deaths, Divorces, Emigrants, Births, Breakups, Cohabitations; horizontal axis: year; vertical axis: occurrences; series: observed, simulated.)

5. Conclusion

In this paper we proposed a transparent and comprehensive microsimulation platform which incorporates several major elements of population evolution, each of which has been discussed elsewhere but which rarely come together as a whole. The proposed modular framework facilitates maintenance of the system of models, as they can be easily updated. Further, a household alignment method is introduced as an alternative to the simple non-data-based rules, which are deemed conservative yet have been widely used by many demographic microsimulation models for governing the household formation and dissolution behaviour of people. The proposed alignment method allows all the standard demographic events to be simulated as usual, while keeping the distribution of household sizes of the population closely aligned with any pre-defined target distributions, which can vary with time. This alignment method can be applied to other demographics if a priori information is available about their distributions. We applied the method to allocate people who experienced the relationship dissolution events (divorce and break up) or the in-migration event to the population, assuming that the proportions of household sizes were the same as what was observed in the 2016 population. This was done for three particular reasons: to lessen the number of one-person households that would otherwise be created through the conservative rule when people first leave their parental homes, to do the same for those who experienced relationship breakdown, and to make sure that the additions of new immigrant families would not exceed a probable number of households the region was projected to have.

We showed that the model was able to reasonably project the included individual and household characteristics and the population at various periods. Some drawbacks still remain to be addressed by future studies, such as assigning people to their most likely household type and providing a more comprehensive validation of the results. At present, this method allows the total number of households and its household size distribution to be controlled in the absence of appropriate empirical data and models that can capture how factors such as housing stress and demographic shifts affect the household formation and dissolution behaviour of people. Ultimately, household size has a profound impact on many household and individual level decisions, such as in travel demand modelling, vehicle ownership decisions, household expenditures, residential mobility and more. Hence, models that deal with these decision levels should ensure that the household size distribution in their models does not exceed a feasible figure, such as an official household projection, to make the conclusions drawn from their results more valid.

6. Appendix

Table 6: Summary of the sub-models

Sub-model names are flush left; their subprocesses are indented beneath them. A dash marks a cell that is empty in the source.

Subprocess                         Eligible to                                Model                   Outcomes     Covariates                          Data source
Birth
  Fertility                        Females aged 16 and above                  Logistic regression     Binary       Age, parity, age of youngest child  HILDA waves 2006 - 2016
  Multiplicity                     Females giving birth                       Rate-based              Categorical  -                                   ABS Statistic
  Sex of newborn                   Newborns                                   Rate-based              Categorical  -                                   ABS Statistic
Death
  Dying                            All individuals                            Logistic regression     Binary       Age and sex                         ABS Statistic
Marriage
  Direct marriage                  Cohabiting males                           Logistic regression     Binary       Age, marital status                 HILDA waves 2006 - 2016
  Indirect marriage                Never married females aged above 18        Logistic regression     Binary       Age, marital status                 HILDA waves 2006 - 2016
                                   Never married males aged above 18          Logistic regression     Binary       Age, marital status                 HILDA waves 2006 - 2016
                                   Previously married females aged above 18   Logistic regression     Binary       Age, marital status                 HILDA waves 2006 - 2016
                                   Previously married males aged above 18     Logistic regression     Binary       Age, marital status                 HILDA waves 2006 - 2016
  Partner score                    Individuals seeking partner                Exponential function    Numeric      Age difference                      -
Cohabitation
  Indirect cohabitation            Never married females aged above 18        Logistic regression     Binary       Age, marital status                 HILDA waves 2006 - 2016
                                   Never married males aged above 18          Logistic regression     Binary       Age, marital status                 HILDA waves 2006 - 2016
                                   Previously married females aged above 18   Logistic regression     Binary       Age, marital status                 HILDA waves 2006 - 2016
                                   Previously married males aged above 18     Logistic regression     Binary       Age, marital status                 HILDA waves 2006 - 2016
  Partner score                    Individuals seeking partner                Exponential function    Numeric      Age difference                      -
Breakup
  Breakup decision                 Never married females aged above 18        Logistic regression     Binary       Age, marital status                 HILDA waves 2006 - 2016
  Moveout decision                 Never married males aged above 18          Logistic regression     Binary       Age, marital status                 HILDA waves 2006 - 2016
Divorce
  Decision to divorce              Previously married females aged above 18   Logistic regression     Binary       Age, marital status                 HILDA waves 2006 - 2016
  Moveout decision                 Previously married males aged above 18     Logistic regression     Binary       Age, marital status                 HILDA waves 2006 - 2016
Leave home
  Decision to leave parental home  Female children aged between 18 to 40      Logistic regression     Binary       Age                                 HILDA waves 2006 - 2016
Socioeconomics
  Education                        All individuals aged above 15              Multinomial regression  Categorical  Education                           HILDA waves 2006 - 2016
  Employment                       All individuals aged above 15              Multinomial regression  Categorical  Employment                          HILDA waves 2006 - 2016
Emigration
  Overseas migrants                All individuals                            Algorithmic             Binary       5-year age, sex                     ABS migration projection
Immigration
  Overseas temporary migrants      -                                          Weighted draw           Numeric      Calibrated weights                  2011 CURF and ABS migration projection
  Overseas permanent migrants      -                                          Weighted draw           Numeric      Calibrated weights                  2011 CURF and ABS migration projection
  Inter-regional migrants          -                                          Weighted draw           Numeric      Calibrated weights                  2011 CURF and ABS migration projection

Table 7: Single females’ fertility model

Term                      Estimate    Std.error   Statistic   P.value
Intercept                 -11.853     5.230       -2.266      0.023
Age                       0.610       0.374       1.632       0.103
Age^2                     -0.012      0.006       -1.866      0.062
Employed                  -1.036      0.631       -1.642      0.101
Has one child             2.607       0.846       3.081       0.002
Has two or more children  1.946       0.998       1.951       0.051
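To show how a table of logistic regression estimates turns into a probability for the Monte Carlo step, the sketch below applies the inverse logit to the Table 7 coefficients. The covariate coding is an assumption: age in years, `employed` as a 0/1 indicator, and having no children as the reference category. The function names are hypothetical.

```python
import math

# Coefficients copied from Table 7 (single females' fertility model).
COEF = {
    "intercept": -11.853,
    "age": 0.610,
    "age_sq": -0.012,
    "employed": -1.036,
    "one_child": 2.607,
    "two_plus_children": 1.946,
}


def linear_predictor(age, employed, n_children):
    """Log-odds of giving birth under the assumed covariate coding."""
    one = 1 if n_children == 1 else 0
    two_plus = 1 if n_children >= 2 else 0
    return (
        COEF["intercept"]
        + COEF["age"] * age
        + COEF["age_sq"] * age ** 2
        + COEF["employed"] * employed
        + COEF["one_child"] * one
        + COEF["two_plus_children"] * two_plus
    )


def birth_probability(age, employed, n_children):
    """Inverse logit of the linear predictor."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(age, employed, n_children)))


print(birth_probability(30, 0, 0))
```

The resulting probability is what would be compared with a uniform random draw to decide whether the simulated individual gives birth in a given year.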

Table 8: Cohabiting females’ fertility model

Term                      Estimate    Std.error   Statistic   P.value
Intercept                 -11.089     5.396       -2.055      0.040
Age                       0.576       0.333       1.728       0.084
Age^2                     -0.008      0.005       -1.655      0.098
Age of youngest child     -0.054      0.093       -0.586      0.558
Employed                  -1.672      0.460       -3.638      0.000

Table 9: Married females’ fertility model

Term                                          Estimate    Std.error   Statistic   P.value
Intercept                                     -5.225      1.375       -3.801      0.000
Age of youngest child                         -0.073      0.043       -1.700      0.089
Employed                                      -0.340      0.216       -1.580      0.114
Age: 15 - 19 years with no child              -8.039      520.429     -0.015      0.988
Age: 20 - 25 years with no child              2.930       1.590       1.843       0.065
Age: 25 - 29 years with no child              4.112       1.397       2.943       0.003
Age: 30 - 34 years with no child              4.405       1.392       3.164       0.002
Age: 35 - 39 years with no child              4.944       1.408       3.511       0.000
Age: 40 - 44 years with no child              2.985       1.500       1.989       0.047
Age: 45 - 49 years with no child              2.325       1.755       1.324       0.185
Age: 20 - 25 years with one child             18.721      620.648     0.030       0.976
Age: 25 - 29 years with one child             5.117       1.383       3.700       0.000
Age: 30 - 34 years with one child             4.425       1.370       3.229       0.001
Age: 35 - 39 years with one child             4.409       1.369       3.220       0.001
Age: 40 - 44 years with one child             3.775       1.378       2.740       0.006
Age: 45 - 49 years with one child             1.844       1.832       1.006       0.314
Age: 20 - 25 years with two or more children  3.747       1.468       2.553       0.011
Age: 25 - 29 years with two or more children  4.031       1.370       2.942       0.003
Age: 30 - 34 years with two or more children  2.866       1.365       2.099       0.036
Age: 35 - 39 years with two or more children  0.890       1.471       0.605       0.545

Table 10: Never married males’ cohabitation model

Term       Estimate  Std.error  Statistic  P.value
Intercept   -13.159      1.929     -6.820     0.00
Age           0.687      0.133      5.149     0.00
Age^2        -0.011      0.002     -4.988     0.00
Employed      0.729      0.283      2.577     0.01
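As a reading aid for the coefficient tables in this appendix, the sketch below turns the Table 10 estimates into a predicted probability. It assumes the model is a binary logit, that age enters in years, and that "Employed" is a 0/1 indicator; none of these is stated in the table itself, and the function and variable names are hypothetical rather than part of dymiumCore.

```python
import math

# Coefficients copied from Table 10 (never married males' cohabitation model).
TABLE_10 = {
    "intercept": -13.159,
    "age": 0.687,
    "age_sq": -0.011,
    "employed": 0.729,
}


def cohabitation_probability(age, employed, coef=TABLE_10):
    """Predicted probability under an ASSUMED binary logit specification.

    age      -- age in years (assumed unit)
    employed -- True/False, mapped to a 1/0 indicator (assumed coding)
    """
    linear_predictor = (
        coef["intercept"]
        + coef["age"] * age
        + coef["age_sq"] * age ** 2
        + coef["employed"] * (1 if employed else 0)
    )
    # Inverse logit link: maps the linear predictor to the (0, 1) interval.
    return 1.0 / (1.0 + math.exp(-linear_predictor))


p_employed = cohabitation_probability(30, True)
p_not_employed = cohabitation_probability(30, False)
print(f"age 30, employed:     {p_employed:.3f}")
print(f"age 30, not employed: {p_not_employed:.3f}")
```

Under these assumptions, the positive "Employed" estimate raises the predicted probability at any given age, and the negative Age^2 term makes the age effect peak and then decline. The same calculation applies to the other tables once their own terms are substituted.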

Table 11: Priorly married males’ cohabitation model

Term       Estimate  Std.error  Statistic  P.value
Intercept   -12.525      5.030     -2.490    0.013
Age           0.406      0.240      1.688    0.091
Age^2        -0.005      0.003     -1.840    0.066
Employed      0.539      0.811      0.665    0.506

Table 12: Never married females’ cohabitation model

Term       Estimate  Std.error  Statistic  P.value
Intercept   -10.406      1.740     -5.982    0.000
Age           0.548      0.125      4.396    0.000
Age^2        -0.009      0.002     -4.253    0.000
Employed      0.151      0.263      0.572    0.567

Table 13: Priorly married females’ cohabitation model

Term       Estimate  Std.error  Statistic  P.value
Intercept   -11.440      4.684     -2.442    0.015
Age           0.368      0.225      1.640    0.101
Age^2        -0.004      0.003     -1.696    0.090
Employed     -1.025      0.382     -2.683    0.007

Table 14: Never married males’ direct marriage model

Term       Estimate  Std.error  Statistic  P.value
Intercept   -26.424      9.636     -2.742    0.006
Age           1.354      0.622      2.176    0.030
Age^2        -0.021      0.010     -2.076    0.038

Table 15: Priorly married males’ direct marriage model

Term         Estimate  Std.error  Statistic  P.value
Intercept  -14810.575  307025.01     -0.048    0.962
Age           638.060   13206.92      0.048    0.961
Age^2          -6.873     142.01     -0.048    0.961

Table 16: Never married females’ direct marriage model

Term       Estimate  Std.error  Statistic  P.value
Intercept   -16.509      5.399     -3.058    0.002
Age           0.769      0.365      2.106    0.035
Age^2        -0.012      0.006     -1.958    0.050

Table 17: Priorly married females’ direct marriage model

Term       Estimate  Std.error  Statistic  P.value
Intercept   -12.718     19.571     -0.650    0.516
Age           0.163      0.850      0.192    0.848
Age^2        -0.001      0.009     -0.098    0.922

Table 18: Breakup model for males

Term            Estimate  Std.error  Statistic  P.value
Intercept         -0.049      0.528     -0.093    0.926
Age^2             -0.001      0.000     -2.413    0.016
Holds a degree    -0.145      0.435     -0.333    0.739
Employed          -1.906      0.424     -4.492    0.000

Table 19: Breakup model for females

Term            Estimate  Std.error  Statistic  P.value
Intercept         -0.434      0.402     -1.078    0.281
Age^2             -0.001      0.000     -3.808    0.000
Holds a degree    -0.842      0.394     -2.135    0.033
Employed          -0.381      0.337     -1.133    0.257

Table 20: Divorce model for males

Term            Estimate  Std.error  Statistic  P.value
Intercept         -1.691      0.710     -2.382    0.017
Age               -0.037      0.018     -2.026    0.043
Has children      -0.694      0.378     -1.837    0.066
Holds a degree    -1.494      0.498     -3.002    0.003

Table 21: Divorce model for females

Term            Estimate  Std.error  Statistic  P.value
Intercept         -2.062      0.760     -2.712    0.007
Age               -0.052      0.019     -2.707    0.007
Has children      -0.164      0.443     -0.370    0.712
Holds a degree    -0.067      0.363     -0.184    0.854

References