Estimation of marriage incidence rates by combining two cross-sectional retrospective designs: Event history analysis of two dependent processes
Sangita Kulathinal, Minna Säävälä, Kari Auranen, Olli Saarela
Department of Mathematics and Statistics, University of Helsinki, Finland
Population Research Institute, Väestöliitto, Helsinki, Finland
Department of Mathematics and Statistics, University of Turku, Finland
Dalla Lana School of Public Health, University of Toronto, Canada
September 7, 2020
Abstract
The aim of this work is to develop methods for studying the determinants of marriage incidence using marriage histories collected under two different types of retrospective cross-sectional study designs. These designs are: sampling of ever married women before the cross-section, a prevalent cohort, and sampling of women irrespective of marital status, a general cross-sectional cohort. While retrospective histories from a prevalent cohort do not identify incidence rates without parametric modelling assumptions, the rates can be identified when combined with data from a general cohort. Moreover, education, a strong endogenous covariate, and marriage processes are correlated. Hence, they need to be modelled jointly in order to estimate the marriage incidence. For this purpose, we specify a multi-state model and propose a likelihood-based estimation method. We outline the assumptions under which a likelihood expression involving only marriage incidence parameters can be derived. This is of particular interest when either retrospective education histories are not available or related parameters are not of interest. Our simulation results confirm the gain in efficiency by combining data from the two designs, while demonstrating how the parameter estimates are affected by violations of the assumptions used in deriving the simplified likelihood expressions. Two Indian National Family Health Surveys are used as motivation for the methodological development and to demonstrate the application of the methods.
Keywords: Correlated processes, cross-sectional surveys, event history analysis, incidence rate, multi-state models, prevalent cohort, retrospective histories
Correspondence to: Sangita Kulathinal, Department of Mathematics and Statistics, University of Helsinki, PL 68 (Pietari Kalmin katu 5) 00014 Finland, e-mail: [email protected], Tel.: +358505608064.
Introduction
In sociology and demography, population-based cross-sectional surveys have been used to estimate rates of events such as marriage or cohabitation. For estimation of marriage incidence rates, retrospective marriage history, e.g. age at first marriage, can be collected by sampling at a cross-section. Two commonly employed sampling designs at a cross-section are: (i) sampling of ever married women before the cross-section, a prevalent cohort, and (ii) sampling of women irrespective of marital status, a general cross-sectional cohort. Marriage histories are collected retrospectively under the two designs. We refer to studies based on these two designs as retrospective cohort studies I and II, respectively.

Similar designs are used in epidemiology to estimate the incidence rate of a disease based on retrospective disease histories, with methods described in e.g. Keiding (1991); Keiding et al. (2012). Keiding (2006) gives an overview of event history analysis and the cross-section with focus on complex sampling patterns. Further, Saarela et al. (2009) proposed combining retrospective event histories from individuals with prevalent disease and prospective follow-up of disease-free individuals at the cross-section, an incident cohort, to improve efficiency in estimating effects of time-invariant covariates on disease incidence. Gain in efficiency has also been demonstrated in estimation of survival time from disease onset to death based on combined prevalent and incident cohort data (Ning et al., 2017; Wolfson et al., 2019).

Although methods for estimating incidence rates from retrospective disease histories are known in epidemiology, their applications in other fields are sparse. In the sociological context, retrospective event histories are typically collected under the cross-sectional retrospective designs described earlier.
To estimate the incidence of the outcome when the outcome of interest is correlated with an endogenous covariate process, the outcome and the covariate processes need to be modelled jointly. Moreover, the estimation method should account for the sampling. In the absence of complete covariate process histories at the cross-section, estimation of incidence rates may be possible only under special assumptions or with sufficient background information on the covariate processes.

The novelty of the present work is in modelling marriage and education processes jointly using a multi-state model by combining the two retrospective cohort studies. We thus extend the existing likelihood-based methods for estimation of incidence rates to simultaneously account for two different sampling patterns, two correlated processes, and two time scales. We outline the assumptions under which the likelihood expressions for the marriage incidence rates can be derived when complete retrospective histories of the education process are not available or when parameters characterizing the education process are not of interest. In a simulation study, we assess the gain in efficiency due to using the proposed method over relying on data from either of the two studies. We apply the methods to data from two nationally representative Indian National Family Health Surveys (NFHS) to study the trends and determinants of marriage incidence in India. While we present results in the context of education and marriage, the results are general and can be applied to other similar settings.

The paper is organised as follows. Section 2 introduces the empirical data from the two NFHS. Section 3 outlines the model of female marriage incidence and derives the necessary likelihood expression of the model parameters to estimate them from cross-sectional data. Section 4 considers calculation of predictive probabilities based on the model. A simulation study and data analysis results are presented in sections 5 and 6. The paper concludes with a discussion.
The motivation for this work comes from the estimation of marriage incidence rates and their determinants using two NFHS surveys conducted in India during 1998-99 (NFHS-2) and 2005-06 (NFHS-3). The NFHS-2, an example of retrospective cohort study I, was a cross-section of a nationally representative sample of 91196 households with 90265 ever-married women aged 15-49 years and gave a retrospective cohort of ever married women. The NFHS-3, an example of retrospective cohort study II, included 109041 households with 124373 women aged 15-49 years irrespective of marital status and gave a retrospective cohort, irrespective of the current status of marriage at the time of survey. The data and reports of the NFHS are available through the National Family Health Survey website (http://rchiips.org/nfhs/). A schematic Lexis diagram illustration of the three cohorts is presented in Figure 1. Education is known to be a key determinant of marriage and hence, we model the joint dependency of the education and marriage processes in this context. The NFHS reports clearly bring out differences between the states with respect to education (cf. Appendix A). Hence, we concentrate on, in addition to whole India, a subset of the NFHS data that covers four Indian states (Kerala, Maharashtra, Punjab and Rajasthan) with different demographic situations.

The data used in the current analysis include each female participant's age at the time of the survey, age at first marriage, state, birth cohort, urban/rural residence, caste, religion, and highest educational level completed, categorized as in Table 1. The total number of study subjects in the four states was 38052 (Table 6, Appendix A).
As noted earlier, education is known to be a key determinant of marriage and vice versa, and hence the two are highly correlated. We model the two correlated processes, the at-school and marriage processes, in a multi-state modelling framework. Each process has two states indicating the respective status, and the joint process can be described using a multi-state model as depicted in Figure 2. The state space of the joint process is {at school and unmarried, at school and married, out of school and unmarried, out of school and married, dead}. We denote these five states as $\{1, 2, \ldots, 5\}$, respectively. Of note, the at-school process jumps to the out of school state when formal education ends. Let $a_e$ and $a_0$ ($> a_e$) be the minimum age of starting basic compulsory education and the minimum marriageable age, respectively. In the Indian context, $(a_e, a_0)$ are taken as $(6, 12)$.

We denote the calendar time corresponding to age $a$ as $t(a) = t_0 + a$, where $t_0$ is the birth year, and define both processes in a Lexis diagram with calendar time and age as the two time scales. We define the at-school process $\{N_1(t, a), a \ge 0, t = t(a)\}$ as a stochastic process giving the education status, with $N_1(t, a) = 1$ indicating being in school and $N_1(t, a) = 0$ having stopped formal education (out of school) by age $a$ at time $t(a)$. Similarly, the marriage process $\{N_2(t, a), a \ge 0, t = t(a)\}$ is a stochastic process giving the marital status of a woman aged $a$ at time $t(a)$, with $N_2(t, a) = 0$ indicating unmarried and $N_2(t, a) = 1$ married status. The corresponding histories are defined as $F_r(t, a) = \{N_r(s, u), u \le a, s(u) \le t(a)\}$, $r = 1, 2$, respectively, and the joint history as $F(t, a) = \{(N_1(s, u), N_2(s, u)), u \le a, s \le t\}$.

Figure 1: Lexis diagram illustration of the two cross-sectional surveys (x-axis: calendar year, 1950 to 2000; y-axis: age in years). NFHS-2 (black) collected retrospective marriage histories from ever married women only (retrospective cohort study I). NFHS-3 (red) included also never married women and collected retrospective marriage histories from currently married women (retrospective cohort study II).

Figure 2: At-school and marriage processes as a multi-state model (states are: 1 = at school and unmarried, 2 = at school and married, 3 = out of school and unmarried, 4 = out of school and married, 5 = dead).

The counting process $N_1(t, a)$ remains at zero between the age 0 and $a_e$, that is, between the birth year $t_0$ and the year $t(a_e)$. Because of the minimum marriageable age $a_0$ ($> a_e$), the process $N_2(t, a)$ is zero for all $a < a_0$ and $t < t(a_0)$. The association between the two processes is modelled through the dependence on the joint history $F(t, a)$. Because the two times grow together at the same pace, we denote the history using only one time scale as $F(a)$. Time-invariant information or fixed covariates at birth ($x$), such as religion and caste, are also included in this history. We also construct a deterministic counting process giving the schooling years of a woman aged $a$ at time $t(a)$, as the accumulated history of the at-school process, $\{\int_{0 \le u \le a} N_1(t(u), u)\, du\}$.

Table 1: Covariates in the marriage incidence model. The reference categories are indicated as 'ref.'

Covariate          Category                      Notation
Birth cohort       1942-62 (ref.)                $x_1 = 0$
                   1962-72                       $x_1 = 1$
                   1972-82                       $x_1 = 2$
                   1982-92                       $x_1 = 3$
Residence status   Urban (ref.)                  $x_2 = 0$
                   Rural                         $x_2 = 1$
Caste              Scheduled Caste (SC, ref.)    $x_3 = 0$
                   Scheduled Tribe (ST)          $x_3 = 1$
                   Other Backward Class (OBC)    $x_3 = 2$
                   Other                         $x_3 = 3$
Religion           Hindu (ref.)                  $x_4 = 0$
                   Muslim                        $x_4 = 1$
                   Christian                     $x_4 = 2$
                   Sikh                          $x_4 = 3$
                   Other                         $x_4 = 4$
Education (a)      None (< 5 years) (ref.)       $x_5 = 0$
                   Primary (5-9 years)           $x_5 = 1$
                   Secondary (10-12 years)       $x_5 = 2$
                   Higher (> 12 years)           $x_5 = 3$
Note: (a) ordinal variable

The corresponding transition intensities for the two processes are defined as

$$\alpha_{1,k}(t, a \mid F(a^-)) = \lim_{\Delta t \to 0} \frac{P(N_1(t + \Delta t, a + \Delta t) = 0 \mid N(t^-, a^-) = (1, k), F(a^-))}{\Delta t},$$

$$\lambda_{k,0}(t, a \mid F(a^-)) = \lim_{\Delta t \to 0} \frac{P(N_2(t + \Delta t, a + \Delta t) = 1 \mid N(t^-, a^-) = (k, 0), F(a^-))}{\Delta t},$$

where $N(t, a) = (N_1(t, a), N_2(t, a))$ and $k = 0, 1$. Since the process $N_1(t, a) = 1$ until the transition happens, we drop the subscript 1 and simplify the notation $\alpha_{1,k}(t, a \mid F(a^-))$ to $\alpha_k(t, a)$. Similarly, the process $N_2(t, a) = 0$ until the transition happens, and hence we use $\lambda_k(t, a \mid F(a^-))$ as a simplified notation for $\lambda_{k,0}(t, a)$. Furthermore, we define $\mu_{jk}(t, a)$ to be the rate of moving to state 5 (dead), where $j, k \in \{0, 1\}$ represent the current schooling and marriage status, respectively.

Figure 3: A sample path of at-school (blue) and marriage (green) processes (x-axis: age $a$ in years). Basic education starts at the age of 6 years and marriageable age is 12 years. The inner left y-axis indicates education status 0 (= out of school) and 1 (= at school) and the outer left y-axis gives the accumulated schooling years. The right y-axis is the marital status axis, 0 (= unmarried) and 1 (= married). Dashed black lines indicate the observation period relevant for the marriage process and the solid black line indicates the cross-section.

Figure 3 exhibits example sample paths of the two processes based on the retrospective information collected at the cross-sectional age of 25. In the example, formal schooling ends at the age of 18 and marriage takes place at the age of 21. The at-school process remains in state 1 between the age of 6 years ($a_e = 6$) and 18 years, then jumps to state 0. The marriage process starts at the age of 12 years ($a_0 = 12$) in state unmarried (0) and jumps to state married (1) at the age of 21 years. In addition, the deterministic counting process giving the accumulated schooling years is also shown. The observation process stops at the first marriage, and our main interest in the joint processes is in the time interval between the age $a_0$ and the age at the first marriage. In the observation period, the multi-state process starts in state 1 at age 12 and calendar year $(t_0 + 12)$ and moves to state 3 before transitioning to state 4 at age 21 and calendar year $(t_0 + 21)$. The schooling years at that time are 12, which are attained at the age of 18. In principle, changes in the at-school process after marriage can be inferred based on the method given below, subject to the availability of data.

We derive the likelihood contributions for all possible event histories conditional on being alive in either state 1 or 3 at age $a_e$. Let us first consider an individual born in the calendar time $t_0$ and in state 1 (at school and unmarried) at age $a_e$. Figure 2 shows possible transitions between the five states.
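The sample path of Figure 3 can be traced with a few lines of code. The Python sketch below is illustrative only: the function names and the encoding of a history by two ages (end of formal schooling and first marriage) are assumptions made here, not part of the method itself, and death (state 5) is ignored. The ages 6 and 12 are the minimum school-starting and marriageable ages used in the text.

```python
A_E = 6    # minimum age of starting basic compulsory education
A_0 = 12   # minimum marriageable age

# Joint state labels of Figure 2, keyed by (at-school status, marital status).
STATE = {(1, 0): 1, (1, 1): 2, (0, 0): 3, (0, 1): 4}


def at_school(age, school_end):
    """At-school indicator: 1 from age A_E until formal education ends, else 0."""
    return 1 if A_E <= age < school_end else 0


def married(age, marriage_age):
    """Marriage indicator: 0 until first marriage, 1 from then on.

    marriage_age is None for a woman who has not married.
    """
    if marriage_age is not None and marriage_age < A_0:
        raise ValueError("marriage before the minimum marriageable age")
    return 1 if marriage_age is not None and age >= marriage_age else 0


def schooling_years(age, school_end):
    """Accumulated schooling years: the at-school indicator integrated up to `age`."""
    return max(0, min(age, school_end) - A_E)


def joint_state(age, school_end, marriage_age):
    """State (1 to 4) of the joint process at a given age, for ages >= A_E."""
    return STATE[(at_school(age, school_end), married(age, marriage_age))]


# The woman of Figure 3: school ends at 18, first marriage at 21, surveyed at 25.
path = [(a, joint_state(a, 18, 21), schooling_years(a, 18)) for a in (12, 18, 21, 25)]
```

Each entry of `path` is a triple (age, joint state, accumulated schooling years), so the example history can be checked against the description of the observation period given above.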
We develop the model following the notation of Keiding (1991), extending it to include two correlated processes. The probability density of being unmarried and at school with $w = z - a_e$ years of schooling and aged $[z, z + dz)$ at time $t$ is proportional to $\beta(t - z, a_e)\, k_1(t, z, w)$, where $\beta(t - z, a_e)$ is the probability density of being born in year $t - z$ and being in state 1 at age $a_e$, and

$$k_1(t, z, w) = \exp\left\{ -\int_{a_e}^{z} \left[ \mu_{10}(t - z + u, u) + \alpha_0(t - z + u, u) \right] du \right\} \times \exp\left\{ -\int_{a_0}^{z} \lambda_1(t - z + u, u)\, du \right\}. \tag{1}$$

Similarly, the probability density of being unmarried and out of school with $w$ years of schooling and alive aged $[z, z + dz)$ at time $t$ is proportional to $\beta(t - z, a_0)\, k_3(t, z, w)\, dz$, where

$$k_3(t, z, w) = \exp\left\{ -\int_{a_e}^{a_w} \alpha_0(t - z + u, u)\, du \right\} \times \alpha_0(t - z + a_w, a_w)\, \mathbf{1}\{a_e < \cdots\} \cdots$$

… 50) years, denoted by $[a_j, a_{j+1})$, $j = 1, \ldots, 17$. We define the education level $x_{5j}$ at age band $j$ as a time-dependent covariate by modifying the woman's highest attained level $x_5$, recorded at the time of survey, so that $x_{5j} = \min(x_5, 1)$ when in age band $j = 1$, $x_{5j} = \min(x_5, 2)$ when $j \in \{2, 3, 4\}$, and $x_{5j} = x_5$ otherwise. For example, consider a woman aged 25 years at the time of survey, married at the age of 21 years, who has reported education level Secondary ($x_5 = 2$; cf. Figure 3 and Table 1). In the analysis, her contribution to the education variable will be Primary in the age band $[12, 15)$ and Secondary in the bands $[15, 16), \ldots, [20, 21)$ and $[21, 22)$ to span the age range from 12 years to the age at her marriage.

To sum up, the model for the marriage incidence rate $\lambda(a; x, \theta) = \lambda_j(x, \theta)$, $a \in [a_j, a_{j+1})$, $j = 1, \ldots, 17$, conditional on covariate values $x$ was specified as

$$\log\{\lambda_j(x, \theta)\} = \alpha_j + \sum_{i=1}^{3} \beta_{1i} \mathbf{1}\{x_1 = i\} + \beta_2 \mathbf{1}\{x_2 = 1\} + \sum_{i=1}^{3} \beta_{3i} \mathbf{1}\{x_3 = i\} + \sum_{i=1}^{4} \beta_{4i} \mathbf{1}\{x_4 = i\} + \sum_{i=1}^{3} \beta_{5i} \mathbf{1}\{x_{5j} = i\}, \tag{14}$$

with 31 parameters, including 17 log-baseline rates and 14 covariate effects (log-rate ratios). The same model was fitted for each state separately and, in addition, to the all-India data (all 29 states) to assess how the state-specific patterns differ from the national pattern. The model was fitted by maximising the product of likelihood expressions of the form (7) and (11). Because the age at marriage was only reported at the precision of one year, the numerator contribution for married women was expressed as

$$\int_{\lfloor y \rfloor}^{\lceil y \rceil} \lambda(a; x, \theta) \exp\left\{ -\int_{a_0}^{a} \lambda(u; x, \theta)\, du \right\} da = \exp\left\{ -\int_{a_0}^{\lfloor y \rfloor} \lambda(u; x, \theta)\, du \right\} \left[ 1 - \exp\left\{ -\int_{\lfloor y \rfloor}^{\lceil y \rceil} \lambda(u; x, \theta)\, du \right\} \right],$$

where $\lfloor y \rfloor$ and $\lceil y \rceil = \lfloor y \rfloor + 1$ denote the floor and ceiling of the exact age $y$ at which the marriage took place. The joint likelihood expression was maximised with respect to the parameter vector $\theta$ using the optim function of the R statistical environment (R Core Team, 2020). The standard errors were evaluated by inverting the numerically differentiated observed information matrix at the maximum likelihood point. The results were presented as point estimates and 95% confidence intervals. Of note, by letting the marriage rate depend on the birth cohort, the third possible time scale (calendar time) can be omitted.

Figure 4 presents the estimated age-specific baseline marriage rates in the four Indian states and in all India. Although the hazard of first marriage after age 30 has remained low in each state, different patterns emerge otherwise.
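As a rough illustration of the rounded-age contribution used in the fitting described above, the following Python sketch evaluates it for a piecewise-constant rate. It is not the authors' implementation (the analysis was carried out in R with optim on the full likelihood); the function names, the band cut points and the rate values are hypothetical, and only the numerator term for a married woman is shown.

```python
import math


def cumulative_hazard(age, cuts, rates):
    """Integral of a piecewise-constant rate from cuts[0] up to `age`.

    Band j is [cuts[j], cuts[j + 1]) and carries rate rates[j],
    so len(cuts) == len(rates) + 1.
    """
    total = 0.0
    for j, rate in enumerate(rates):
        lo, hi = cuts[j], cuts[j + 1]
        if age <= lo:
            break
        total += rate * (min(age, hi) - lo)
    return total


def married_contribution(y, cuts, rates):
    """Probability that the first marriage falls in [floor(y), floor(y) + 1).

    Survival without marriage up to floor(y), times the probability of
    marrying within the following year.
    """
    lo = math.floor(y)
    hi = lo + 1
    h_lo = cumulative_hazard(lo, cuts, rates)
    h_hi = cumulative_hazard(hi, cuts, rates)
    return math.exp(-h_lo) * (1.0 - math.exp(-(h_hi - h_lo)))


# Hypothetical age bands and rates, for illustration only.
cuts = [12, 15, 18, 21, 50]
rates = [0.01, 0.05, 0.15, 0.10]
contribution = married_contribution(21.4, cuts, rates)
```

Working with the cumulative hazard at integer ages avoids numerical integration, which is the practical benefit of the closed form above. In a full fit, the rates of each woman would be built from the log-linear model with her band-specific covariate values before this term is evaluated.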
The rate is generally lowest in Kerala, in particular in comparison to Maharashtra and Punjab. In Rajasthan, the rate starts increasing earliest in age.

Figure 5 shows the estimated covariate effects on the marriage rates. The rate decreases by birth cohort, except for Punjab where the rate is the highest for the 1972-1982 cohort. By the last cohort in this analysis (1982-1992), the rates have declined considerably in all four states. Since this birth cohort, being 6-16 years of age at the time of survey, was underrepresented in NFHS-2, we repeated the analysis by using only the NFHS-3 data and the estimates of marriage rates were essentially unchanged (results not shown).

Unsurprisingly, women in rural areas have a larger rate of marriage (all India incidence rate ratio of 1.19) compared to urban areas, except in Punjab where the reverse is true. The higher rate in rural areas is particularly striking in Kerala and Maharashtra. At the India level, the marriage rates are similar for OBC and SC while ST and Other caste have lower marriage rates. However, this pattern is not evident in all four of the state-level results. In Punjab, the confidence interval for ST is wide because this caste is rare (Table 6).

There are clear differences in the marriage rate across religions. At the India level, the marriage incidence rates are clearly smaller in Christian, Sikh and other religions as compared to Hindu. The same pattern emerges in the state-level analysis, except for Muslims in Kerala. Again, to interpret the state-specific results we note that not all religions were sufficiently represented in each state (Table 6).

The effect of education is evident. There is a clear decrease in the incidence rate when moving from no education to higher education levels in India and in all the four states. In the all India analysis, the incidence rate for a woman with primary education to marry at any given age is about half that for a woman with no education.
The corresponding rates are 31% and 28% of the uneducated rate for a woman with secondary and higher education. The same pattern shows up in all four states, although the effect of education level is relatively smaller in Kerala.

Predictive probabilities of type (12) for marrying by age $a$ were calculated as discussed in Section 4, with $a_{\min} = a_0 = 12$, using 2010 mortality rates based on census data, and marriage incidence rates corresponding to different calendar periods (Figure 6). The covariate values were set to the reference categories (urban area, scheduled caste, Hindu religion, and uneducated). Clearly, the women's absolute probability of marrying by late twenties has remained consistently high, but in Maharashtra there has been a clear shift towards marrying later in life. The patterns in Kerala and Rajasthan are more difficult to interpret, as the high estimated marriage rates in late twenties in the later calendar periods actually also result in higher projected absolute probabilities in late twenties. However, this projection does not reflect all the changes in the background population, since the overall education level has increased over time, bringing the population marriage incidence rates down, while in this projection education was fixed to the reference level. In Punjab, any changes over time have been comparatively small.
Discussion
In this article we formulated a multi-state model for modeling an outcome and a covariate process jointly in two types of retrospective cross-sectional cohort studies. Our methodological contributions can be summarised as follows.
(i) Combined analysis of retrospective histories from two types of cross-sectional cohorts; (ii) multi-state modeling of retrospective histories of two correlated processes in two time scales; (iii) assessment of the performance of the method based on combining the two retrospective cross-sectional cohort designs against using either of the two; and (iv) illustration through an application to the estimation of marriage incidence rates. We have used explicitly the structure of the Indian education system in building the joint model and also in extracting retrospective information from the cross-section. We also assume that everyone adheres to that structure. When retrospective history on schooling is available, in addition to the cross-section, this assumption can be relaxed.

Statistical methods have been developed and applied for the estimation of incidence rates from cross-sectional cohorts, with or without subsequent prospective follow-up (Keiding, 1991, 2006; Saarela et al., 2009; Keiding et al., 2012). The incidence rate, in general, is not identifiable from data under retrospective cross-section design I only without supplementary information, e.g., data from design II. The estimation is simplified under assumptions such as time homogeneity and non-differential mortality before and after the incident event (Keiding, 1991). Much of the existing literature has focused on nonparametric estimation of cumulative incidence and survival functions through appropriately weighting the risk sets. Herein our main focus was on factors that modify the incidence rates, and therefore we applied likelihood-based methods for piecewise constant hazard models. For this purpose, we needed to combine likelihood functions arising from two different sampling plans, namely the cross-sectional cohort setting of NFHS-2, and the setting of NFHS-3.
To combine information collected under the different sampling plans, the likelihood contributions from the individual surveys are conditioned on the specific sampling plan employed in the survey, with the overall likelihood expression obtained simply as the product of these.

This result was applied in the estimation of the marriage incidence rates in four Indian states as functions of age and birth cohort, as well as demographic characteristics. Unlike previous approaches (Kashyap et al., 2015) to estimate marriage rates, the proposed method allows combining information from more than one survey and modelling education and marriage jointly. This brings several advantages. First, the increased sample size leads to more powerful analyses of age at marriage data at the sub-population level (e.g. Indian states). Second, it also allows learning of calendar time trends in the strength of association of many factors affecting marriage rates.

The analysis goes beyond simply describing the age- and sex-based marriage rates and puts forward a model which takes into account the well-recognised factors driving the marriages in India. The marriage incidence rates differ regionally (or state-wise) and hence the rates obtained using the India-level data may not bring out the real marriage squeeze problem existent in social strata defined by caste, religion and education. Although the caste effect on the marriage incidence rates did not differ much by state, those of education and religion did. Our analysis provides strong evidence towards religion, education and urban/rural area as the main factors affecting the marriage pattern among women in India. Education levels or qualifications seem to be replacing the earlier role of caste in shaping the marriage market in India. The effects of women's educational expansion on marriage incidence have been studied worldwide and found to have some impact.
However, a considerable portion of the reduction in early marriage is not explained by changes in levels of education (Mensch et al., 2005). To predict the real magnitude of the marriage squeeze problem in India, predictions of married and unmarried populations in different age and social strata defined by state, caste, religion, urban/rural, and education are needed. The model proposed here will have a direct application for such predictions.
Acknowledgements
The first two authors were partly supported by the project 'Precarious family formation' financed by the Kone foundation. The first author's work was also supported by the research mobility grant (No. 325990) awarded by the Academy of Finland.

Figure 4: Age-specific baseline rates for a woman to marry in India and the four selected states (panels: India, Kerala, Maharashtra, Punjab, Rajasthan; x-axis: age; y-axis: baseline incidence rate). The horizontal lines show the maximum likelihood estimates of parameters $\exp\{\alpha_j\}$ in (14), and their corresponding 95% confidence intervals.

Figure 5: Forest plots of the estimated covariate effects on marriage incidence rates of women for India and the four selected states (panels: birth cohort, urban/rural, caste, religion, education; x-axis: incidence rate ratio). The horizontal lines correspond to the rate ratio estimate, and 95% confidence interval.

Figure 6: Predictive probabilities for women to be married by age $a$ by birth cohort, calculated by combining 2010 mortality rates with the marriage incidence model (panels: India, Kerala, Maharashtra, Punjab, Rajasthan; x-axis: age; y-axis: cumulative marriage probability). The other covariates were set to the reference levels.
Appendices
Appendix A: Data selection and description
The NFHS reports clearly bring out differences between the states with respect to education (http://rchiips.org/nfhs/). All four states considered here show increasing trends in the proportion of women attaining higher education but differ by education attainment. There is a decreasing trend in the proportion of primary and no education, and an increasing trend in the secondary and higher education level.
Rajasthan stands out when looking at the education levels of women, with the highest proportion of women with no education.

Punjab has suffered from an imbalanced child sex ratio, starting already in the 1980's (908 girls per 1000 boys in 1981) when the child sex ratios were still normal in most other states in India. Rajasthan has remained a state with a relatively high total fertility, unlike the other states examined (TFR 4.1 in 1998). Kerala has enjoyed replacement level fertility since the early 1990's. Maharashtra has come to suffer from imbalance in child sex ratio during the last two decades, combined with replacement level fertility since the 2000's.

Table 6: Observed proportions (in %) of women by state: categorical variables used are birth cohort, urban/rural, caste, religion, and education. (Source: NFHS-2 and -3 data)

                 Kerala   Maharashtra   Punjab   Rajasthan   India
N                6450     14424         6477     10701       214638
Birth cohort
  1942-1962      22       15            19       18          16
  1962-1972      33       28            30       30          28
  1972-1982      28       33            30       36          33
  1982-1992      17       24            21       16          23
Urban            33       66            36       28          40
Caste
  SC             10       15            30       18          17
  ST             1        8             0.1      14          13
  OBC            38       25            12       31          30
  Other          51       52            58       38          40
Religion
  Hindu          55       75            41       89          75
  Muslim         30       13            3        10          13
  Christian      15       2             1        0.1         7
  Sikh           0        0.3           55       0.5         2
  Other          0.1      10            0.4      1           3
Education
  None           16       34            35       73          48
  Primary        39       34            30       18          28
  Secondary      31       20            26       6           16
  Higher         14       12            10       4           8

Appendix B: Likelihood conditioning on the sampling pattern
To see that the likelihood obtained by multiplying (7) and (11) is still a conditional probability (less multiplicative terms), and thus a conditional likelihood, we partition the data collected under survey $j$ as $(v_j, w_j) \equiv \{(v_{ij}, w_{ij}) : i \in C_j\}$, $j = 2, 3$, where $(v_{ij})$ represents the conditioning event or sampling pattern. Further, $(w_{ij})$ denote the retrospective marriage histories recorded through the survey. Let $\Theta = (\theta, \beta(t), \mu(t, a))$ denote the parameters of interest $\theta$ as well as birth and mortality rates $(\beta(t), \mu(t, a))$. The parametrised joint distribution of all observed data $p(v_j, w_j, j = 2, 3 \mid \Theta)$ may now be decomposed as

$$\begin{aligned}
p(v_2, w_2, v_3, w_3 \mid \Theta) &= p(w_2, w_3 \mid v_2, v_3; \Theta)\, p(v_2, v_3 \mid \Theta) \\
&= \prod_{j=2}^{3} p(w_j \mid v_j; \theta)\, p(v_j \mid \Theta) \\
&= \prod_{j=2}^{3} \prod_{i \in C_j} p(w_{ij} \mid v_{ij}; \theta)\, p(v_{ij} \mid \Theta) \\
&\overset{\theta}{\propto} \prod_{j=2}^{3} L_j(\theta) \prod_{j=2}^{3} p(v_j \mid \Theta),
\end{aligned}$$

where conditioning on the sampling plan (ignoring $\prod_{j=2}^{3} p(v_j \mid \Theta)$) may result in some loss of information on $\theta$, but results in valid inferences.
References
Cook, R. J. and Lawless, J. F. (2018). Multistate Models for the Analysis of Life History Data. Monographs on Statistics and Applied Probability 158, CRC Press.
Kashyap, R., Esteve, A., and Garcia-Roman, J. (2015). Potential (Mis)match? Marriage Markets Amidst Sociodemographic Change in India, 2005-2050. Demography, 52(1):183–208.
Keiding, N. (1991). Age-specific incidence and prevalence: a statistical perspective. Journal of the Royal Statistical Society, Series A, 154:371–412.
Keiding, N. (2006). Event history analysis and the cross-section. Statistics in Medicine, 25:2343–2364.
Keiding, N., Hansen, O. K. H., Sørensen, D. N., and Slama, R. (2012). The current duration approach to estimating time to pregnancy. Scandinavian Journal of Statistics, 39:185–204.
Mensch, B. S., Singh, S., and Casterline, J. B. (2005). Trends in the Timing of First Marriage Among Men and Women in the Developing World. The Population Council, Inc.
Ning, J., Hong, C., Li, L., Huang, X., and Shen, Y. (2017). Estimating treatment effects in observational studies with both prevalent and incident cohorts. The Canadian Journal of Statistics.
… Statistical Methods in Medical Research.
R Core Team (2020). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
Saarela, O., Kulathinal, S., and Karvanen, J. (2009). Joint analysis of prevalence and incidence data using conditional likelihood.