An Instrumental Variable Estimator for Mixed Indicators: Analytic Derivatives and Alternative Parameterizations
Zachary F. Fisher and Kenneth A. Bollen
University of North Carolina at Chapel Hill

Abstract

Methodological development of the Model-implied Instrumental Variable (MIIV) estimation framework has proved fruitful over the last three decades. Major milestones include Bollen's (1996) original development of the MIIV estimator and its robustness properties for continuous endogenous variable SEMs, the extension of the MIIV estimator to ordered categorical endogenous variables (Bollen & Maydeu-Olivares, 2007), and the introduction of a Generalized Method of Moments (GMM) estimator (Bollen, Kolenikov, & Bauldry, 2014). This paper furthers these developments by making several unique contributions not present in the prior literature: (1) we use matrix calculus to derive the analytic derivatives of the PIV estimator, (2) we extend the PIV estimator to apply to any mixture of binary, ordinal, and continuous variables, (3) we generalize the PIV model to include intercepts and means, (4) we devise a method to input known threshold values for ordinal observed variables, and (5) we enable a general parameterization that permits the estimation of means, variances, and covariances of the underlying variables to use as input into a SEM analysis with PIV. An empirical example illustrates a mixture of continuous variables and ordinal variables with fixed thresholds. We also include a simulation study to compare the performance of this novel estimator to WLSMV.
Introduction
Classic structural equation models (SEMs) treated all endogenous variables as continuous. Contemporary SEM research has paid more attention to binary, ordinal, and other noncontinuous endogenous variables (Arminger and Küsters, 1988; Muthén, 1984, 1993; Jöreskog, 1994). Item response theory (IRT), of course, has a long history of treating measurement models with binary or ordinal measures (e.g., Thurstone, 1925; Lazarsfeld, 1950; Bock, 1972; Lord et al., 1968). A number of estimators have also been proposed for SEMs with categorical endogenous variables, including the Diagonally Weighted Least Squares (DWLS), Full-Information Maximum Likelihood (FIML), Pairwise Maximum Likelihood (PML; Katsikatsou et al., 2012), and Polychoric Instrumental Variable (PIV; Bollen and Maydeu-Olivares, 2007) estimators.

Recent research on the PIV estimator has revealed some promising features. Nestler (2013) found the PIV estimator to be as accurate as the Unweighted Least Squares (ULS) and DWLS estimators for correctly specified models and more robust in the presence of structural misspecifications. Similarly, Jin et al. (2016) found the PIV estimates to be as good as those from ULS and DWLS when the equation in question was correctly specified (and no additional misspecifications were present in the model), and less biased for correctly specified equations when specification errors were present in the model at large.

These are important findings, as it is far more likely that our models are approximations rather than perfect realizations of the process under study. Furthermore, for system-wide estimators the bias induced by even minor misspecifications, such as an omitted factor loading, has been found to increase linearly with each additional misspecification (Yang-Wallentin et al., 2010).
Jin and Cao (2018) proposed an equation-by-equation overidentification test for DWLS analogous to that used with Model-Implied Instrumental Variables (MIIVs) 2SLS for continuous outcomes (Kirby and Bollen, 2009).

Despite the PIV's valuable properties and extensions, it has several restrictions. Bollen and Maydeu-Olivares' (2007) current PIV estimator assumes that all observed variables are binary or ordinal. They do not consider the common situation where a mix of binary, ordinal, censored, and continuous variables is analyzed. This limits the applicability of their results. Further, to simplify the situation they assumed that all the continuous variables that underlie the binary and ordinal variables had means of zero and variances of one.

With continuous variables, researchers routinely estimate their means, variances, and covariances as part of the analyses. This practice is particularly valuable in longitudinal data where the differences in central tendency or variances are of research interest. The same is true in multiple group analysis and invariance testing. A parameterization that allows estimated means, variances, and covariances creates greater continuity between analyses of continuous variables and those with ordinal variables than does the dominant practice of assuming means of zero and variances of one for the underlying variables. In addition, many ordinal variables are categorized versions of continuous variables due to the constraints of survey research. For instance, a questionnaire item on household income collapsed into ranges such as "less than $20,000, $20,000 to $40,000, ..., greater than $100,000" creates an ordinal variable from a continuous variable. Similarly, questionnaire responses on education, hours watching TV, number of alcoholic drinks, cigarettes smoked, time on the internet, etc. often appear in ordinal categories that are collapsed versions of continuous variables.
In such cases, the thresholds are known constants (e.g., $20,000, $40,000, etc. for income) that researchers could enter rather than estimate, thereby scaling the underlying continuous variable in a metric close to the one they would have obtained with the continuous variable. Jöreskog (2002) proposed using thresholds as a way of estimating means and variances, but he restricted his alternative parameterization to setting the first two thresholds of an ordinal variable to 0 and 1. Our paper provides a more general approach to a mean, variance, and covariance parameterization that is applicable to DWLS as well as the MIIV approach.

Another simplification of Bollen and Maydeu-Olivares (2007) is the use of numerical rather than analytic derivatives when developing estimates of the standard errors and significance tests. Numerical derivatives are computationally slower and less accurate than analytic derivatives. The analytic derivatives require matrix derivatives and are time consuming to find, but they have clear advantages. One advantage is that once the analytic derivatives are available, they enable standard error estimates when a covariance (correlation) matrix and means are input for analysis, as long as the asymptotic covariances of the covariance (correlation) matrix among the underlying variables and their means are available.

Among the unique contributions of this paper are: (1) we use matrix calculus to derive the analytic derivatives of the PIV estimator, (2) we extend the PIV estimator to apply to any mixture of binary, ordinal, and continuous variables, (3) we generalize the PIV model to include intercepts and means so the analyst can include these as part of the modeling, (4) we devise a method to input known threshold values for ordinal observed variables, and (5) we enable a general parameterization that permits the estimation of means, variances, and covariances of the underlying variables to use as input into a SEM analysis with PIV.
An empirical example illustrates a mixture of continuous variables and ordinal variables with fixed thresholds. We also include a simulation to compare the performance of this modified PIV estimator to WLSMV.
The General Model and Assumptions
We begin with a modification of a general structural equation model where the latent variable model is

η = α_η + B η + ζ,   (1)

where η is an m × 1 vector of latent endogenous variables, α_η is an m × 1 vector of intercepts, and B is an m × m matrix of regression coefficients relating the endogenous variables. The equation disturbances are contained in the m × 1 vector ζ, and their variances and (cross-equation) covariances in the m × m covariance matrix Σ_ζ. We assume E(ζ) = 0 and that ζ is uncorrelated with any exogenous variables in η. (Our notation is similar to the LISREL notation of Jöreskog & Sörbom (1978) except that we do not include exogenous latent variables and their corresponding indicators and we use y* instead of y to refer to the underlying indicators of η.)

Furthermore, we propose the measurement model

y* = α_y + Λ_y η + ε,   (2)

where y* is a p × 1 vector of underlying indicators, α_y is a p × 1 vector of intercepts, Λ_y is a p × m matrix of factor loadings, and ε is a p × 1 vector of errors with E(ε) = 0 that are uncorrelated with the latent variables, Cov(ε, η) = 0.

An additional note describing the auxiliary measurement model linking y* to the vector of observed variables y is warranted. To clarify our notation, p is the total number of observed variables in the system, r is the number of continuous observed variables, s is the number of observed ordered categorical variables, and p = s + r. If y_1, ..., y_r represent continuous variables then y_j = y*_j for j = 1, ..., r. Conversely, if y_j for j = r + 1, ..., r + s represent binary or ordinal variables then y*_j is categorized according to the threshold parameters, τ_j, such that y_j = c if τ_{j,c} < y*_j < τ_{j,c+1}, where c = 0, ..., C_j − 1. Here, C_j is the total number of discrete values the ordered categorical variable y_j can take, each such variable has C_j − 1 free thresholds, and τ_{j,0} = −∞ and τ_{j,C_j} = ∞.

A General Estimation Procedure
We partition our estimation procedure into three distinct steps. In the first step we obtain consistent estimates of the first and second order sample statistics required for the subsequent analysis. In the second step, we develop alternative parameterizations for the y* variables such that it becomes possible to construct an unconstrained covariance matrix Σ* and mean vector µ* to serve as input to the analysis. Using results from multivariate calculus, we obtain closed-form solutions for the limiting covariance matrix of υ(Σ̂*) − υ(Σ*) and υ(µ̂*) − υ(µ*). Here, the υ(·) operator stacks the unique elements of a patterned matrix column-wise as in Magnus (1983). Finally, in step 3 we derive the PIV coefficients corresponding to the full model parameters and the analytic derivatives required to approximate their large sample variances. Table 1 provides a summary of this algorithm.

First and Second Order Sample Statistics
Here we obtain the first and second order sample statistics for the subsequent analysis. In the case of the model described by Bollen and Maydeu-Olivares (2007) these statistics amount to the thresholds and polychoric correlations among the ordered categorical variables. With continuous variables included in y we must also estimate the means and covariance elements. Finally, we must also estimate the polyserial correlations among the continuous and categorical variables.

A number of methods have been described for obtaining these quantities (Olsson, 1979; Muthén, 1984; Jöreskog, 1994; Lee et al., 1995; Katsikatsou et al., 2012; Monroe, 2018), and theoretically we could use any of these approaches. However, for didactic purposes the results from Olsson (1979) and Muthén (1984) suffice. We obtain the means and thresholds from the univariate margins. Let us consider a single variable y_j from the set of 1, ..., p observed variables. If y_j is binary or ordinal, the most common convention to identify the thresholds is to set µ_j = 0 and σ_jj = 1. This enables us to estimate τ̂_j from the univariate margins using maximum likelihood. If y_j is a continuous variable, µ_j and σ_jj are freely estimated. Subsequently, for every pairwise combination of variables the thresholds are held fixed at the estimates obtained from the univariate margins and the correlation structure (σ̂_jk for j ≠ k) is obtained. As a result of this conditional estimation procedure σ̂_jk is considered a pseudo-maximum likelihood estimate.

Now, collecting these sample statistics we have ω̂ = (µ̂'_{y*}, τ̂', υ(Σ̂_{y*})')', where Σ_{y*} takes the following form

          ⎡ 1                                                      ⎤
          ⎢ ρ_{2,1}      1                                          ⎥
          ⎢   ⋮                ⋱                                    ⎥
Σ_{y*} =  ⎢ ρ_{s,1}      ⋯          1                               ⎥   (3)
          ⎢ ρ̃_{s+1,1}   ⋯    ρ̃_{s+1,s}   σ_{s+1,s+1}               ⎥
          ⎢   ⋮                  ⋮              ⋱                   ⎥
          ⎣ ρ̃_{s+r,1}   ⋯    ρ̃_{s+r,s}   σ_{s+r,s+1}  ⋯  σ_{s+r,s+r} ⎦

where ρ_{j,k} = Cor(y*_j, y*_k) for all j ≠ k with j, k ≤ s are the polychoric correlations among the s ordered categorical variables, ρ̃_{j,k} = Cor(y*_j, y*_k) for s < j ≤ s + r and k ≤ s are the polyserial correlations among the s ordered categorical variables and the r continuous variables, and σ_{j,k} = Cov(y*_j, y*_k) for s < j, k ≤ s + r are the covariances among the r continuous variables. The mean vector is µ_{y*} = (0_1, ..., 0_s, µ_{s+1}, ..., µ_{s+r})'. For the preceding procedure the consistency of ω̂, plim ω̂ = ω, and the assumptions that underlie the proof are given by Muthén and Satorra (1995, Appendix A). Furthermore, Muthén and Satorra (1995, pp. 494–498) provide a convenient procedure for constructing ω̂ and the conditions under which it achieves asymptotic normality,

n^{1/2}(ω̂ − ω) →d N(0, Σ_ω).   (4)

The asymptotic covariance matrix Σ_ω can likewise be estimated using the methods of Olsson (1979), Muthén (1984), Jöreskog (1994), Lee et al. (1995), Katsikatsou et al. (2012), or Monroe (2018); the analytic derivatives provided here are compatible with any of these methods.

An Unconstrained Covariance Matrix and Mean Vector
When the means and variances of the y* variables are of interest, it is possible to estimate these quantities rather than constrain them to 0 and 1, respectively. For this reason we also consider alternative parameterizations for ordered categorical variables with three or more response options. Jöreskog (2002) showed a specific alternative parameterization where the first two thresholds of y*_j are fixed to zero and one, respectively. From these identifying constraints the mean and variance of y*_j can be estimated. We generalize this idea in the following manner. Consider the latent response variable y*_j underlying the ordered categorical variable y_j. In step 1 we assume y*_j ~ N(0, 1), where y*_j is determined only up to a monotonic transformation of y_j. To retain the normality assumption we can allow any arbitrary linear transformation of y*_j. If we only wish to estimate the mean (while keeping Var(y*_j) fixed to one) we shift the distribution of y*_j by a constant, q_1, such that the transformed distribution is ÿ*_j ~ N(q_1, 1). To estimate both the mean and variance of y*_j, we shift and scale the distribution of y*_j such that ÿ*_j ~ N(q_2 q_1, q_2²).

To provide a concrete example of this transformation let τ_{j,a} and τ_{j,b} be the first two thresholds of the latent response variable y*_j underlying the four-category observed variable y_j. In step 1 of the analysis, we estimated all three thresholds of y*_j assuming µ_j = 0 and σ_jj = 1. Now, let τ̈_{j,a} and τ̈_{j,b} be the corresponding first two thresholds under an alternative parameterization where we wish to instead estimate µ_j and σ_jj. Accordingly we must now fix any two of the previously estimated thresholds to some constant values (e.g., τ̈_{j,a} = 0 and τ̈_{j,b} = 1). Using the following algebraic results

q_1 = −(τ_{j,a} τ̈_{j,b} − τ_{j,b} τ̈_{j,a}) / (τ̈_{j,b} − τ̈_{j,a}),   (5)

q_2 = τ̈_{j,a} (q_1 + τ_{j,a})^{−1},   (6)

it becomes possible to estimate the mean, variance, and remaining thresholds, τ̈_{j,k} = (τ_{j,k} + q_1) q_2, of the transformed distribution.

To construct Σ_{y*} and µ_{y*} under the alternative parameterization we must re-express the scalar representation given above in matrix form. To do this we provide the most general form, the case where one wishes to estimate both the mean and variance for each of the s ordered categorical variables by fixing two of the thresholds to constant values. For latent response variable y*_j we denote the two fixed-value thresholds as τ̈_{j,a} and τ̈_{j,b} and their freely estimated counterparts under the standard parameterization as τ_{j,a} and τ_{j,b}. Building on this notation we let D_a = diag(τ_{1,a}, ..., τ_{s,a}, 1_{s+1}, ..., 1_{s+r}), D_b = diag(τ_{1,b}, ..., τ_{s,b}, 1_{s+1}, ..., 1_{s+r}), D̈_a = diag(τ̈_{1,a}, ..., τ̈_{s,a}, 1_{s+1}, ..., 1_{s+r}), and D̈_b = diag(τ̈_{1,b}, ..., τ̈_{s,b}, 1_{s+1}, ..., 1_{s+r}). Lastly we let D_{τ,k} = diag(τ_{1,k}, ..., τ_{s,k}, 0_{s+1}, ..., 0_{s+r}); if τ_{j,k} does not exist for y*_j, the j-th diagonal element of D_{τ,k} equals zero. Then, in matrix form we can express the scalar constants q_1 and q_2 from above as

Q_1 = −(D_a D̈_b − D_b D̈_a)(D̈_b − D̈_a)^{−1},   (7)

Q_2 = D̈_a (Q_1 + D_a)^{−1},   (8)

along with the thresholds, mean vector and covariance matrix

τ̈_{y*,k} = diag[(D_{τ,k} + Q_1) Q_2],   (9)

µ̈_{y*} = diag(Q_2 Q_1) + µ_{y*},   (10)

Σ̈_{y*} = Q_2 Σ_{y*} Q_2.   (11)

Approximate Asymptotic Distribution of π̂. Let π contain the free elements of the unconstrained covariance matrix and mean vector. To obtain the asymptotic distribution of π we employ the multivariate delta method (Cramér, 1999). The multivariate delta method provides a convenient approach for resolving the large sample distribution of a vector function of a multinormally distributed random vector. Earlier it was shown that n^{1/2}(ω̂ − ω) →d N(0, Σ_ω). As the elements of π are a function of ω we can express the large sample variances of π as

Var(π) = [∂π(ω̂)/∂ω]|_{ω=ω̂} Σ̂_ω [∂π(ω̂)/∂ω]'|_{ω=ω̂},   (12)

where ∂π(ω̂)/∂ω is the Jacobian matrix containing the first order partial derivatives of Σ̈_{y*}, µ̈_{y*} and τ̈_{y*} evaluated with respect to D_a, D_b, D_{τ,k}, Σ_{y*} and µ_{y*}.
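Before turning to the Jacobian, the scalar transformation in (5)–(6) can be checked numerically. The sketch below uses hypothetical step-1 threshold estimates for a four-category item and fixes the first two thresholds at known income cut points of 20 and 40 (in $1,000s), in the spirit of the survey-income example above:

```python
import numpy as np

# Step-1 threshold estimates under the standard parameterization
# (mu_j = 0, sigma_jj = 1); all values here are hypothetical.
tau = np.array([-0.5, 0.3, 1.0])

# Fix the first two thresholds at known constants, e.g. income cut
# points of $20,000 and $40,000 expressed in $1,000s.
tau_a_fix, tau_b_fix = 20.0, 40.0

# Equations (5) and (6): shift (q1) and scale (q2) of the affine map.
q1 = -(tau[0] * tau_b_fix - tau[1] * tau_a_fix) / (tau_b_fix - tau_a_fix)
q2 = tau_a_fix / (q1 + tau[0])

# Transformed distribution y*_j ~ N(q2*q1, q2^2); the remaining
# thresholds move with the same affine map.
mean, sd = q2 * q1, q2
tau_fix = (tau + q1) * q2

print(q1, q2)       # about 1.3 and 25.0
print(mean, sd)     # about 32.5 and 25.0
print(tau_fix)      # about [20.0, 40.0, 57.5]
```

The first two transformed thresholds reproduce the fixed constants, confirming that (5)–(6) simply re-express the affine identification of y*_j in the new metric.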
The Jacobian matrix is comprised of the partial derivatives of τ̈_{y*}, µ̈_{y*} and Σ̈_{y*} with respect to the freely varying elements in the p × 1 mean vector µ_{y*}, the p × p diagonal matrices D_a, D_b and D_{τ,k}, and the p × p correlation/covariance matrix Σ_{y*}:

∂υ(Σ̈_{y*})/∂υ(D_a) = ∆⁺[(Q_2 Σ_{y*} ⊗ I + I ⊗ Q_2 Σ_{y*}) ∂Q_2/∂D_a]∆,   (13)

∂υ(Σ̈_{y*})/∂υ(D_b) = ∆⁺[(Q_2 Σ_{y*} ⊗ I + I ⊗ Q_2 Σ_{y*}) ∂Q_2/∂D_b]∆,   (14)

∂υ(Σ̈_{y*})/∂υ(Σ_{y*}) = ∆⁺(Q_2 ⊗ Q_2)∆,   (15)

∂υ(µ̈_{y*})/∂υ(D_a) = ∆⁺[(Q_1 ⊗ I) ∂Q_2/∂D_a + (I ⊗ Q_2) ∂Q_1/∂D_a]∆,   (16)

∂υ(µ̈_{y*})/∂υ(D_b) = ∆⁺[(Q_1 ⊗ I) ∂Q_2/∂D_b + (I ⊗ Q_2) ∂Q_1/∂D_b]∆,   (17)

∂υ(µ̈_{y*})/∂υ(µ_{y*}) = ∆⁺∆,   (18)

∂υ(τ̈)/∂υ(D_{τ,k}) = ∆⁺Q_2∆,   (19)

∂υ(τ̈)/∂υ(D_a) = ∆⁺[(Q_2 ⊗ I) ∂Q_1/∂D_a + (I ⊗ (Q_1 + D_τ)) ∂Q_2/∂D_a]∆,   (20)

∂υ(τ̈)/∂υ(D_b) = ∆⁺[(Q_2 ⊗ I) ∂Q_1/∂D_b + (I ⊗ (Q_1 + D_τ)) ∂Q_2/∂D_b]∆,   (21)

where

∂Q_2/∂D_a = −[(Q_1 + D_a)^{−1} ⊗ D̈_a (Q_1 + D_a)^{−1}](∂Q_1/∂D_a + I),   (22)

∂Q_1/∂D_a = −[(D̈_b − D̈_a)^{−1} D̈_b ⊗ I],   (23)

∂Q_2/∂D_b = −[(Q_1 + D_a)^{−1} ⊗ D̈_a (Q_1 + D_a)^{−1}] ∂Q_1/∂D_b,   (24)

∂Q_1/∂D_b = [(D̈_b − D̈_a)^{−1} D̈_a ⊗ I],   (25)

and ∆ is a generalized duplication matrix and ∆⁺ a generalized elimination matrix (Magnus, 1983). Descriptions of these operators and their relation to patterned matrix derivatives are given in Appendix A. Now, let L_1 = [∂υ(Σ̈_{y*})/∂υ(D_a), ∂υ(Σ̈_{y*})/∂υ(D_b), ∂υ(Σ̈_{y*})/∂υ(Σ_{y*})], L_2 = [∂υ(µ̈_{y*})/∂υ(D_a), ∂υ(µ̈_{y*})/∂υ(D_b), ∂υ(µ̈_{y*})/∂υ(µ_{y*})], and L_3 = [∂υ(τ̈)/∂υ(D_τ), ∂υ(τ̈)/∂υ(D_a), ∂υ(τ̈)/∂υ(D_b)]. Here ∂υ(τ̈)/∂υ(D_τ) is a block diagonal matrix composed of the k = 1, ..., g Jacobian matrices ∂υ(τ̈)/∂υ(D_{τ,k}), and g is the maximum number of free thresholds for the variables in y*.
Finally,

∂π(ω̂)/∂ω = [L_1', L_2', L_3']',   (26)

where the entries of ∂π(ω̂)/∂ω have been reordered to match the ordering of elements in Σ_ω. Combining (12) and the properties of the multivariate delta method, n^{1/2}(υ(π̂) − υ(π)) →d N(0, Var(π)). With a consistent estimate of Var(π) in hand we move to the estimation of the full model parameters.

PIV Estimation
In step 3 the correlation or covariance matrix (standard parameterization) or the unconstrained covariance matrix (alternative parameterization) from step 2 is used to obtain point estimates for the parameters detailed in (1) and (2). Furthermore, the large sample distribution of these parameters is derived. Although it is beyond the scope of this paper to present an exhaustive description of MIIV estimation, to contextualize our procedure we provide the prerequisite details and point readers to the source material for more detailed exposition.
MIIV Estimation.
The MIIV family of estimators (Bollen, 1996; Bollen and Maydeu-Olivares, 2007; Bollen et al., 2014) begins by transforming the latent variable model into a system of estimating equations. A detailed description of this transformation is given by Bollen (1996). A consequence of this transformation is that each equation's error will be a composite of errors from related equations. In most cases terms in this composite error will correlate with the endogenous right-hand-side variables. To address this endogeneity, instrumental variables implied by the model specification itself (MIIVs) are ascertained for each equation in the system. To be useful these instruments must be related to the endogenous variables and unrelated to the composite error, and a number of tests are available for assessing the validity of instrumental variables (Kirby and Bollen, 2009). For the purpose of this paper we focus on the MIIV-2SLS estimation itself, concentrating on the stages occurring after the model has been transformed into a system of estimating equations; however, a brief description of the estimating equations is warranted.

We begin by scaling each latent variable in the model by choosing one of its indicators and setting its intercept to zero and its factor loading to one. We label this variable the scaling indicator. This choice allows us to re-express each latent variable in terms of its scaling indicator minus its error. This is called the latent-to-observed variable transformation and proceeds by partitioning the p manifest variables into two vectors, y^(s) and y^(n), where y^(s) contains the m scaling indicators and y^(n) contains the p − m non-scaling indicators, such that Y = [y^(s)', y^(n)']'.
Using algebra we can then transform (1) and (2) into

Y = ⎡ α_η     ⎤ + ⎡ B      ⎤ y^(s) + ⎡ (I − B)   0   I ⎤ ⎡ ε^(s) ⎤   (27)
    ⎣ α_y^(n) ⎦   ⎣ Λ^(n)  ⎦         ⎣ −Λ^(n)    I   0 ⎦ ⎢ ε^(n) ⎥
                                                         ⎣ ζ     ⎦

where α_η is the m × 1 vector of latent variable intercepts, α_y^(n) is the (p − m) × 1 vector of intercepts for the non-scaling indicators, and Λ^(n) is the (p − m) × m factor loading matrix for the non-scaling indicators.

The transformation above has translated the structural relations from the original model into a system of estimating equations. Consolidating the composite disturbance from (27) we can further simplify our notation to express this system as Y = Z θ_{BΛ} + e_u, where Z is block-diagonal and contains all relevant regressors from y^(s), θ_{BΛ} contains the free parameters in B and Λ from (1) and (2), and e_u contains the composite error terms for each equation. The difficulty in estimating θ_{BΛ} from Y = Z θ_{BΛ} + e_u results from the composite disturbance term e_u, which by construction will have a nonzero correlation with variables in Z. To overcome this difficulty we can use the MIIV-2SLS estimator to obtain consistent estimates of θ_{BΛ}, where instruments for each equation are implied by the larger model specification. Let V be the block diagonal matrix containing these equation-specific instrumental variables. For equation j, the matrix of IVs, V_j, is valid if the following conditions are satisfied: (a) Cov(e_{u_j}, V_j) = 0, (b) the rank of Cov(V_j, Z_j) must equal the number of columns in Z_j, (c) Cov(V_j) is nonsingular, and (d) Cov(Z_j, V_j) ≠ 0.

It is helpful to discuss these assumptions further and specifically note their implications for the SEM context. Assumption (a) is essential to ensure the consistency of the estimator. It requires the instrumental variables to be uncorrelated with the equation disturbance.
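Conditions (a) and (d) can be made concrete with a small simulation. The sketch below (with hypothetical loadings, not the empirical example analyzed later) builds a three-indicator, one-factor model, then estimates the loading of y2 on the scaling indicator y1 both by OLS and by an IV estimator that uses the nonscaling indicator y3 as the MIIV:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
lam2, lam3 = 0.8, 0.7                 # hypothetical factor loadings

eta = rng.normal(size=n)              # latent variable
y1 = eta + rng.normal(scale=0.6, size=n)          # scaling indicator
y2 = lam2 * eta + rng.normal(scale=0.6, size=n)
y3 = lam3 * eta + rng.normal(scale=0.6, size=n)

# Estimating equation after the latent-to-observed transformation:
# y2 = lam2*y1 + u with u = eps2 - lam2*eps1, so y1 correlates with u.
ols = np.cov(y1, y2)[0, 1] / np.var(y1)           # inconsistent (attenuated)

# y3 satisfies (a) Cov(u, y3) = 0 and (d) Cov(y1, y3) != 0,
# so it is a valid MIIV for this equation.
iv = np.cov(y3, y2)[0, 1] / np.cov(y3, y1)[0, 1]  # consistent for lam2

print(round(ols, 2), round(iv, 2))    # roughly 0.59 and 0.80
```

The OLS estimate is attenuated by the measurement error in y1, while the MIIV estimate recovers the population loading.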
We note that the covariances between the composite error and MIIVs can be read directly from the model-implied moments, and thus a correctly specified model will immediately point to the observed variables that satisfy (a). Condition (b) is used to ensure the parameters in equation j are identified. This rank condition fails if the number of instruments is less than the number of explanatory variables in an equation, and an implication of this condition is that we must have at least one instrument for each explanatory variable. Condition (c) ensures the absence of perfect multicollinearity among the MIIVs. Finally, condition (d) is designed to ensure a sufficient association between the endogenous variables and the MIIVs specified for equation j; marginal association leads to the weak instrument problem. In the context of SEM the weak instrument condition has received less attention. One reason is that nonscaling indicators of the same latent variable are commonly MIIVs, and this often leads to MIIVs that are moderate to strong. This departs from many typical econometric cases where the instruments are auxiliary variables not part of the original model. However, the consequences of weak instruments are serious enough to warrant additional discussion.

In the presence of any nonzero correlation between the instruments and the equation error, and a weak correlation between the endogenous regressors and instruments (weak instruments), instrumental variable estimators can become biased and the rejection rate of the null hypothesis of no effect too large. Phillips (1989) was one of the first to draw attention to the consequences of weak instruments; however, the developments of Stock and Yogo (2005) have become the de facto framework for testing weak instruments in statistical software.
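The instability that weak instruments induce can be seen directly in simulation: shrinking the instrument's loading toward zero leaves condition (d) only marginally satisfied, and the sampling distribution of the IV estimate becomes erratic. A sketch with hypothetical values:

```python
import numpy as np

rng = np.random.default_rng(1)

def iv_estimate(n, lam_inst):
    # One-factor setup; lam_inst controls instrument strength.
    eta = rng.normal(size=n)
    y1 = eta + rng.normal(scale=0.6, size=n)             # endogenous regressor
    y2 = 0.8 * eta + rng.normal(scale=0.6, size=n)       # dependent variable
    y3 = lam_inst * eta + rng.normal(scale=0.6, size=n)  # instrument
    return np.cov(y3, y2)[0, 1] / np.cov(y3, y1)[0, 1]

strong = [iv_estimate(500, 0.70) for _ in range(200)]
weak = [iv_estimate(500, 0.02) for _ in range(200)]

# The weak-instrument sampling distribution is far more dispersed
# around the population value of 0.8.
print(np.std(strong), np.std(weak))
```

With a near-zero instrument loading the denominator covariance is dominated by sampling noise, so individual estimates can be arbitrarily far from the population value.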
Stock and Yogo (2005) based their approach on quantifying the relationship between instrument strength and the resulting bias of the instrumental variable estimator relative to the bias of OLS (and the size of inferential tests). In the case of a single endogenous regressor Stock and Yogo (2005) employ an F statistic obtained from a regression of the endogenous regressor on the instrumental variables. In the case of multiple endogenous regressors the minimum-eigenvalue statistic of Cragg and Donald (1993) is used. Importantly, Stock and Yogo (2005) provide tables of the critical values in both cases.

Unfortunately, the inferential methods developed by Stock and Yogo (2005) rely heavily on the assumption of homoskedasticity and are intended for continuous variables only. Another widely used measure of instrument strength is Shea's (1997) partial R². Shea's partial R² is useful in the current context as it (a) has a straightforward interpretation, (b) can handle multiple endogenous variables in a single equation, and (c) its use does not rely on identifying the appropriate sampling distribution of the statistic. For these reasons we present a version of the statistic adapted to the use of covariance matrices as input, R²_S = S'_{VX} S^{−1}_{VV} S_{VX} S^{−1}_{XX}, where S_{VX} is the covariance matrix of the MIIVs and endogenous regressors, while S_{VV} and S_{XX} are the covariance matrices of the MIIVs and endogenous variables, respectively. Although this statistic is calculated at the equation level we have omitted the subscript j for clarity.

Estimation of PIV Regression Model Parameters.
Let θ contain the full set of model parameters from (1) and (2). It is convenient to separate θ into θ_1 and θ_2, where these contain the mathematically independent elements of (α_η, α_y*, Λ, B) and (Σ_ζ, Σ_ε), respectively. Furthermore, we partition θ_1 into θ_{BΛ} = [υ(Λ)', υ(B)']' and θ_α = [υ(α_η)', υ(α_y*)']' such that θ_1 = [θ_α', θ_{BΛ}']'.

To estimate the parameters in θ_1 we employ a generalized method of moments (GMM) approach. GMM estimation is based on the idea that a set of parameters θ from an overidentified model are related to a set of data X through a series of orthogonality conditions on the population moments, E[g(x; θ)] = 0 (Hansen, 1982). Here we can define the vector g(·) as

g = g(x; θ) = V'(Y − Z θ_{BΛ}),   (28)

where V is a block-diagonal matrix of instrumental variables, Y is the vector of dependent variables, and Z is a block diagonal matrix of explanatory variables. As the input to our analysis is the joint unconditional covariance matrix Σ*, we can express the quantities S*_{vy*} and S*_{vz} as the sample moments corresponding to Σ*_{vy*} and Σ*_{vz}, respectively. The goal in estimation is to choose θ̂ to minimize the joint distance between the sample orthogonality conditions using the symmetric, positive-definite weight matrix Ŵ, N(S*_{vy*} − S*_{vz} θ̂)' Ŵ (S*_{vy*} − S*_{vz} θ̂). The choice of W leads to a number of different GMM estimators; however, we will focus on the 2SLS weight matrix (W = N(V'V)^{−1}). Hayashi (2011) provides a general treatment of the various single and multiple equation GMM estimators associated with W, and Bollen et al. (2014) develop a MIIV-GMM estimator for latent variable models. Under this framework we write the estimating equations for consistent estimation of the factor loadings and regression coefficients (θ_{BΛ}) from (1) and (2) as

θ̂_{BΛ} = (Ŝ*'_{vz} Ŝ*^{−1}_{vv} Ŝ*_{vz})^{−1} Ŝ*'_{vz} Ŝ*^{−1}_{vv} Ŝ*_{vy*},   (29)

where S*_{vz} is the block-diagonal matrix containing the covariances between the explanatory variables and instruments, S*_{vy*} is the block-diagonal matrix containing the covariances between the instruments and dependent variables, and S*_{vv} is the block-diagonal instrument covariance matrix. Fisher et al. (2019, Eq. 22) provide formulas for consistent estimation of θ_{BΛ} in the presence of equality constraints. With θ̂_{BΛ} in hand we can obtain consistent estimates of the latent variable and non-scaling indicator intercepts, θ_α, from (1) and (2) as θ̂_α = µ̂*_y − µ̂*'_z θ̂_{BΛ}. Combining these estimates gives us θ̂_1 = [θ̂_{BΛ}', θ̂_α']'.

Estimation of PIV Variance Parameters.
Estimation of the PIV varianceparameters requires an expression for the model-implied variances of y ∗ , Σ ( θ ) = Λ ( I − B ) − Σ ζ ( I − B ) − Λ + Σ ε (30)IXED INDICATOR DERIVATIVES 17with Σ ( θ ) representing the restrictions imposed on Σ ∗ by the parameters in θ .Similarly, we use γ ( θ ) to denote the specific restrictions placed on υ ( Σ ∗ ) by the PIVregression model parameters contained in θ . With consistent estimates of θ in handwe can now estimate the parameters in θ conditional on ˆ θ . A number of system-wideestimators could be used to estimate θ . Here, we use a weighted least squaresdiscrepancy function given by F ( θ ; W ) = ( σ − σ ( θ , ˆ θ )) ˆ W W LS ( σ − σ ( θ , ˆ θ )) (31)where σ ( θ , ˆ θ ) represent the mean and variance parameters implied by the full modelconditional on ˆ θ and ˆ W W LS = V ar ( π ) − (Muthén, 1978). These conditional varianceand covariance parameter estimates will be consistent estimates of their populationvalues, however, the naive standard errors corresponding to the estimates will be toosmall. In fact, the standard errors for both θ and θ require modification and theseadjustments are addressed now. Asymptotic Variance of θ . A naive estimator for V ar ( θ ) is N − u u ( Σ ∗ vz Σ ∗− vv Σ ∗ vz ) − where V ar ( u u ) is estimated from the residual variances of theestimating equations. In the current context this estimator is not correct as it fails toaccount for the uncertainty arising in the estimation of Σ ∗ . In the context ofendogenous ordinal variables Bollen and Maydeu-Olivares (2007) show that √ N (ˆ θ − θ ) d −→ N ( , K V ar ( Σ ∗ ) K ) where K = ∂ γ ( Σ ∗ ) /∂ Σ ∗ . Writing this quantitywith respect to the patterning observed in the MIIV matrices we obtain the followingapproximation of the large sample covariance matrix of θ V ar (ˆ θ ) = ∂ γ ( Σ ∗ ) ∂ υ ( Σ ∗ ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) Σ ∗ = ˆ Σ ∗ V ar ( ˆ Σ ∗ ) ∂ γ ( Σ ∗ ) ∂ υ ( Σ ∗ ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) Σ ∗ = ˆ Σ ∗ . 
(32)

Bollen and Maydeu-Olivares (2007) only discussed ordinal variables and used numerical derivatives to obtain $K$. Here we present the complementary analytic derivatives for each of the requisite submatrices required to obtain this quantity. Our derivatives are valid for a model where $\Sigma^{*}_{vv}$ is strictly lower-triangular (ordinal variables only under the standard parameterization), a combination of strictly lower-triangular and lower-triangular (a combination of ordinal and continuous variables under the standard parameterization), or lower-triangular (any combination of variables under the alternative parameterization). The submatrices of $K$ pertaining to the covariance elements are

$\dfrac{\partial \gamma(\theta)}{\partial\, \mathrm{vec}(\Sigma^{*}_{vz})^{\prime}} = \left[ (\Sigma^{*\prime}_{vy} \Sigma^{*-1}_{vv} \otimes U^{-1}) - (\theta^{\prime} \Sigma^{*\prime}_{vz} \Sigma^{*-1}_{vv} \otimes U^{-1}) \right] K_{s,r} - (\theta^{\prime} \otimes U^{-1} \Sigma^{*\prime}_{vz} \Sigma^{*-1}_{vv})$

$\dfrac{\partial \gamma(\theta)}{\partial\, \mathrm{vec}(\Sigma^{*}_{vy})^{\prime}} = (\Sigma^{*\prime}_{vz} \Sigma^{*-1}_{vv} \Sigma^{*}_{vz})^{-1} \Sigma^{*\prime}_{vz} \Sigma^{*-1}_{vv}$

$\dfrac{\partial \gamma(\theta)}{\partial\, \upsilon(\Sigma^{*}_{vv})^{\prime}} = \left[ (V^{\prime} U^{-1} \Sigma^{*\prime}_{vz} \Sigma^{*-1}_{vv} \otimes U^{-1} \Sigma^{*\prime}_{vz} \Sigma^{*-1}_{vv}) - (\Sigma^{*\prime}_{vy} \Sigma^{*-1}_{vv} \otimes U^{-1} \Sigma^{*\prime}_{vz} \Sigma^{*-1}_{vv}) \right] \Delta$

where $U = \Sigma^{*\prime}_{vz} \Sigma^{*-1}_{vv} \Sigma^{*}_{vz}$ and $V = \Sigma^{*\prime}_{vz} \Sigma^{*-1}_{vv} \Sigma^{*}_{vy}$, and again, $\Sigma^{*}_{vv}$ may take on an arbitrary patterning depending on the variable types and parameterization.

As Bollen and Maydeu-Olivares (2007) only considered ordinal variables, they had no need for the mean structure of $y^{*}$. Here we consider the estimation of mean structures for both continuous and ordinal variables and thus require results for these parameters as well. As the intercept parameters are dependent on $S^{*}_{vz}$, $S^{*}_{vy^{*}}$, $S^{*}_{vv}$ and the means of the dependent variables, $\mu^{*}_{y}$, and explanatory variables, $\mu^{*}_{z}$, the partial derivatives with respect to each of these quantities are required. For the mean vectors the partial derivatives are straightforward,

$\dfrac{\partial \gamma(\mu^{*})}{\partial \mu^{*\prime}_{y}} = I$ and $\dfrac{\partial \gamma(\mu^{*})}{\partial \mu^{*\prime}_{z}} = -\theta^{\prime}_{B\Lambda}.$
The partial derivatives with respect to the matrices needed for the estimate of $\theta_{B\Lambda}$ require building on the previous results:

$\dfrac{\partial \gamma_{\alpha}(\theta)}{\partial\, \mathrm{vec}(\Sigma^{*}_{vz})^{\prime}} = -\mu^{*\prime}_{z} \dfrac{\partial \gamma(\theta)}{\partial\, \mathrm{vec}(\Sigma^{*}_{vz})^{\prime}}, \quad \dfrac{\partial \gamma_{\alpha}(\theta)}{\partial\, \mathrm{vec}(\Sigma^{*}_{vy})^{\prime}} = -\mu^{*\prime}_{z} \dfrac{\partial \gamma(\theta)}{\partial\, \mathrm{vec}(\Sigma^{*}_{vy})^{\prime}}, \quad \dfrac{\partial \gamma_{\alpha}(\theta)}{\partial\, \upsilon(\Sigma^{*}_{vv})^{\prime}} = -\mu^{*\prime}_{z} \dfrac{\partial \gamma(\theta)}{\partial\, \upsilon(\Sigma^{*}_{vv})^{\prime}} \Delta,$

where again the duplication matrix $\Delta$ accounts for the arbitrary patterning of $\Sigma^{*}_{vv}$. Now, let $M_{1} = \left[\frac{\partial \gamma(\theta)}{\partial\, \mathrm{vec}(\Sigma^{*}_{vz})^{\prime}}, \frac{\partial \gamma(\theta)}{\partial\, \mathrm{vec}(\Sigma^{*}_{vy})^{\prime}}, \frac{\partial \gamma(\theta)}{\partial\, \upsilon(\Sigma^{*}_{vv})^{\prime}}\right]$ and $M_{2} = \left[\frac{\partial \gamma(\theta)}{\partial \mu^{*\prime}_{y}}, \frac{\partial \gamma(\theta)}{\partial \mu^{*\prime}_{z}}, \frac{\partial \gamma_{\alpha}(\theta)}{\partial\, \mathrm{vec}(\Sigma^{*}_{vz})^{\prime}}, \frac{\partial \gamma_{\alpha}(\theta)}{\partial\, \mathrm{vec}(\Sigma^{*}_{vy})^{\prime}}, \frac{\partial \gamma_{\alpha}(\theta)}{\partial\, \upsilon(\Sigma^{*}_{vv})^{\prime}}\right]$; then the partial derivative of $\gamma(\hat{\theta}_{1})$ with respect to $\sigma$ is $\frac{\partial \gamma(\hat{\theta}_{1})}{\partial \sigma^{\prime}} = [M_{1}, M_{2}]$, where the entries of $M_{1}$ and $M_{2}$ have been reordered to match the ordering of elements in $Var(\hat{\sigma}^{*})$.

Asymptotic Variance of $\theta_{2}$. With a consistent approximation for the asymptotic distribution of $\theta_{1}$ we employ results from Bollen and Maydeu-Olivares (2007, Eq. 31) showing

$Var(\hat{\theta}_{2}) = \hat{H}(I - \hat{J}_{1}\hat{K})\, Var(\hat{\Sigma}^{*})\, (I - \hat{J}_{1}\hat{K})^{\prime}\hat{H}^{\prime}$ (33)

where $J = \partial \sigma(\theta_{2})/\partial \theta_{2}^{\prime}$, $H = (J^{\prime}J)^{-1}J^{\prime}$, and $J_{1} = \partial \sigma(\theta_{2}, \theta_{1})/\partial \theta_{1}^{\prime}$, all evaluated at $\hat{\theta}_{1}$. For completeness the elements of $J$ and $J_{1}$ are also presented as matrix expressions to facilitate computational implementation of the estimator described:

$\dfrac{\partial \sigma(\theta)}{\partial\, \upsilon(\Lambda)^{\prime}} = \Delta^{+}\left[(\Lambda F \Sigma_{\zeta} F^{\prime} \otimes I) + (I \otimes \Lambda F \Sigma_{\zeta} F^{\prime})K\right]\Delta,$ (34)

$\dfrac{\partial \sigma(\theta)}{\partial\, \upsilon(B)^{\prime}} = \Delta^{+}\left[(\Lambda F \Sigma_{\zeta} F^{\prime} \otimes \Lambda F) + (\Lambda F \otimes \Lambda F \Sigma_{\zeta} F^{\prime})K\right]\Delta,$ (35)

$\dfrac{\partial \sigma(\theta_{2}, \hat{\theta}_{1})}{\partial\, \upsilon(\Sigma_{\zeta})^{\prime}} = \Delta^{+}(\Lambda F \otimes \Lambda F)\Delta,$ (36)

$\dfrac{\partial \sigma(\theta_{2}, \hat{\theta}_{1})}{\partial\, \upsilon(\Sigma_{\varepsilon})^{\prime}} = \Delta^{+}\Delta,$ (37)

where $F = (I - B)^{-1}$. It should be noted that the patterning of the elements in $\Lambda$, $B$, $\Sigma_{\zeta}$ and $\Sigma_{\varepsilon}$ is arbitrary and will not be known in advance. For this reason the algorithm described by Nel (1980) is required.
Moreover, in the case of $\partial \sigma(\theta_{2}, \hat{\theta}_{1})/\partial\, \upsilon(\Sigma_{\varepsilon})^{\prime}$, the product $\Delta^{+}\Delta$ need not reduce to $I$, as $\Delta^{+}$ is the elimination matrix associated with the model-implied variances and covariances, while $\Delta$ is the duplication matrix associated with the patterning specific to $\Sigma_{\varepsilon}$.
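A practical way to validate analytic Jacobians such as (36) is to compare them against central finite differences. The sketch below checks the unconstrained vec-derivative $\partial\, \mathrm{vec}(\Lambda F \Sigma_{\zeta} F^{\prime} \Lambda^{\prime})/\partial\, \mathrm{vec}(\Sigma_{\zeta})^{\prime} = (\Lambda F \otimes \Lambda F)$ numerically; the $\Delta^{+}$ and $\Delta$ matrices that handle patterning are omitted here, and all dimensions and values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
p, q = 4, 2
Lam = rng.normal(size=(p, q))                 # factor loadings (illustrative)
B = np.array([[0.0, 0.0],
              [0.4, 0.0]])                    # recursive structural coefficients
F = np.linalg.inv(np.eye(q) - B)

vec = lambda M: M.flatten(order='F')          # column-stacking vec operator

def sigma_zeta_part(Szeta):
    # model-implied covariance contribution Lambda F Sigma_zeta F' Lambda'
    return vec(Lam @ F @ Szeta @ F.T @ Lam.T)

# Analytic Jacobian: vec(A X A') = (A kron A) vec(X) with A = Lambda F
J_analytic = np.kron(Lam @ F, Lam @ F)

# Central finite differences over every element of Sigma_zeta
Szeta0 = np.array([[1.0, 0.3],
                   [0.3, 0.8]])
eps = 1e-6
J_fd = np.zeros((p * p, q * q))
for j in range(q * q):
    E = np.zeros(q * q)
    E[j] = eps
    dS = E.reshape(q, q, order='F')
    J_fd[:, j] = (sigma_zeta_part(Szeta0 + dS) - sigma_zeta_part(Szeta0 - dS)) / (2 * eps)
```

Because this moment function is linear in $\Sigma_{\zeta}$, the finite-difference and analytic Jacobians agree up to roundoff; the same check applies, with more care, to the nonlinear derivatives in (34)-(35).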
An Illustrative Example

To demonstrate the PIV estimator developed here and the utility of our general alternative parameterization, we used data from the 2016 General Social Survey (GSS). We chose to model three constructs representing educational attainment: maternal education (ME), paternal education (PE), and the respondent's (or child's) education (CE). Each construct was measured using two indicators, a near-continuous variable measuring years of education (educ) and an ordered categorical variable representing the highest degree achieved (deg). Response options for the degree question were: (1) less than high school, (2) high school, (3) junior college, (4) bachelor, and (5) graduate. We hypothesized that high school and bachelor degrees were the response options linked most closely to consistent durations, at 12 and 16 years, respectively. For this reason we fixed these two thresholds to the numerical values 12 and 16 and freely estimated the thresholds for junior college and graduate degrees. Furthermore, we regressed the child's educational attainment on the parents' educational attainment. See Figure 1 for a path diagram of the fitted model.

To obtain the PIV estimates for the model shown in Figure 1 we used the MIIVsem package (Fisher et al., 2017). For comparative purposes we also fit the model using the WLSMV estimator in lavaan (Rosseel, 2012). The data contained observations from 1,910 subjects. Estimates from the two estimators are provided in Table 2. An examination of the threshold estimates paints an interesting picture in terms of the correspondence between the years of education associated with obtaining a junior college and graduate degree; for example, in this sample a junior college degree is associated with approximately 15 years of education. Shea's partial $R^2$ measures for the structural parameters in our model are sufficiently large to avoid concerns of weak instruments. Additional indirect evidence of instrument strength is provided by the consistency across the two estimators, which is typically also an indication of a well-specified model.
Simulation Study

In this section we examine the finite sample properties of the proposed point estimates and their standard errors in a small simulation study. To promote reproducibility and enrich cross-study comparisons we adapt the main simulation condition from Jin et al. (2016) to the case of mixed ordinal and continuous endogenous variables. MIIV estimates were obtained using the MIIVsem package (Fisher et al., 2017) and WLSMV estimates using lavaan (Rosseel, 2012).
Model Specification and Data Generation
The same data generating parameters were employed across all simulation conditions: the free regression coefficients in $B$, the twelve factor loadings in $\Lambda$, and the factor variances and covariances in $\Sigma_{\zeta}$ were adapted from the main condition of Jin et al. (2016). The factors $\eta_{1}$ and $\eta_{2}$ were measured by continuous indicators, while $\eta_{3}$, $\eta_{4}$ and $\eta_{5}$ were measured by 5-category ordinal variables, for twelve indicators in total. All observed variables were generated according to a normal distribution with mean zero, and the ordinal indicators were subsequently discretized using a fixed set of response probabilities for the five categories. Data were generated at five sample sizes, including $N$ = 100, 200, and 400, and 1,000 datasets were generated within each condition.
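The discretization step can be sketched as follows, using Python's standard-library normal quantile function; the response probabilities below are placeholders, not the values used in the study:

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(2)
n = 10_000
y_star = rng.standard_normal(n)                 # underlying normal variable

# Placeholder 5-category response probabilities (illustrative only)
probs = np.array([0.10, 0.20, 0.40, 0.20, 0.10])
cum = np.cumsum(probs)[:-1]                     # four cumulative cut probabilities
taus = [NormalDist().inv_cdf(c) for c in cum]   # thresholds on the y* scale

y_ord = np.searchsorted(taus, y_star) + 1       # ordinal categories 1..5
```

The observed category proportions then match the target probabilities up to sampling error.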
Improper Solutions and Nonconvergence
For the purposes of our analysis, nonpositive definite factor and error variance matrices were flagged. Previous examinations of the PIV estimator have handled improper solutions differently. For example, Nestler (2013) retained Heywood cases for all estimators in the final analysis. The rationale here is that Heywood cases reflect true sampling variability of the estimator, and through their omission bias is introduced into summary statistics involving means or variances. On the other hand, Jin et al. (2016) omitted solutions containing nonpositive definite matrices from all summary statistics except for the calculation of a parameter's empirical standard deviation, which was subsequently used to assess the accuracy of estimated standard errors. For this reason we have chosen to conduct a sensitivity analysis to better understand the impact of omitting these solutions from our analysis. Across all simulation conditions, solutions which did not converge were omitted from the analysis. This is discussed further in the results section, and a breakdown of the percentage of nonpositive definite matrices and nonconverged solutions by estimator and parameterization is given in Table 3.
Outcome Variables
In line with previous simulations we examined the relative bias of both the point estimates and standard errors within each simulation condition. The mean percentage of relative bias for point estimates was calculated as $RB = \mathrm{mean}[(\hat{\theta}_{ak} - \theta_{a})/\theta_{a}] \times 100$, where $\theta_{a}$ is the data generating parameter in a given simulation condition and $\hat{\theta}_{ak}$ is the estimate for parameter $a$ in the $k$th Monte Carlo replication. We calculated the median percentage of relative bias for the standard errors as $RBSE = \mathrm{median}[(SE(\hat{\theta}_{ak}) - SD(\hat{\theta}_{a}))/SD(\hat{\theta}_{a})] \times 100$, where $SD(\hat{\theta}_{a})$ is the empirical standard deviation of the estimates across replications. Here we use the heuristic that a relative bias of less than 5% in absolute value is considered trivial. The median was chosen to ensure a robust measure of central tendency given the inclusion of improper solutions in our analysis. Consistent with previous simulation studies, the mean RB and mean RBSE were calculated across each parameter set ($\tau$, $\alpha^{\eta}$, $\alpha^{y(c)}$, $\alpha^{y(o)}$, $\Lambda^{y(c)}$, $\Lambda^{y(o)}$, $B$, $\Sigma_{\varepsilon}^{y(c)}$, $\Sigma_{\varepsilon}^{y(o)}$, $\Sigma_{\zeta}$) within a given simulation condition. Here a superscript of $y(c)$ indicates parameters corresponding to one of the continuous variables in the system and a superscript of $y(o)$ indicates parameters corresponding to an ordered categorical variable. Finally, Shea's partial $R^2$ was used to assess instrument strength.
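The two bias summaries can be computed as below; the replication values are simulated stand-ins for actual Monte Carlo output:

```python
import numpy as np

rng = np.random.default_rng(3)
theta_a = 0.7                                   # data generating value
K = 1000                                        # number of replications
est = theta_a + 0.05 * rng.normal(size=K)       # estimates across replications
se = 0.05 * rng.uniform(0.9, 1.1, size=K)       # reported standard errors

# Mean percentage relative bias of the point estimates
rb = np.mean((est - theta_a) / theta_a) * 100

# Median percentage relative bias of the standard errors, measured against the
# empirical standard deviation of the estimates across replications
sd_emp = est.std(ddof=1)
rbse = np.median((se - sd_emp) / sd_emp) * 100
```

With these simulated inputs both summaries fall well inside the 5% triviality heuristic.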
Results

Across all sample sizes and conditions none of the MIIV solutions exhibited nonconvergence. For the WLSMV estimator convergence problems arose at the smaller sample sizes. Across the two parameterizations approximately 10% and 0.5% of the WLSMV solutions failed to converge for the sample sizes of 100 and 200, respectively. Nonconvergence of WLSMV was more frequent under the alternative parameterization. Both estimators exhibited a large portion of nonpositive definite covariance matrices at the smaller sample sizes. Here the percentage of nonpositive definite solutions was similar across estimators and ranged from approximately 50% at a sample size of 100 to 25% at a sample size of 200. The rate of nonpositive definite matrices decreased considerably at larger sample sizes. Details on the frequency of nonconverged and nonpositive definite solutions are provided in Table 3.
Point Estimates.
The percentages of relative bias for point estimates and standard errors are provided in Table 3. Here any percentage of relative bias exceeding 5% in absolute value is presented in bold typeface. First we consider the summary statistics for the sample of datasets that does not include solutions with nonpositive definite matrices. Under the standard parameterization the MIIV estimator exhibited acceptable bias in point estimates for nearly all parameter sets and sample sizes, the only exceptions being the ordinal variable factor loadings and error variances at the smallest sample size of 100. Under the alternative parameterization the intercepts and factor variances showed nontrivial bias at the two smallest sample sizes, $N$ = 100 and $N$ = 200.

For the WLSMV estimator the pattern of relative bias is more nuanced. Under the standard parameterization, relative bias for WLSMV exceeds the 5% cutoff for all parameters except the thresholds at $N$ = 100. Additionally, the continuous variable factor loadings, regression coefficients and factor variances also exhibited nontrivial bias at $N$ = 200. Under the alternative parameterization, bias of the WLSMV estimates was nontrivial for the variance parameters at the two smallest sample sizes and for the factor loadings and regression coefficients at the smallest sample size only; the pattern of problematic bias was otherwise similar to the standard parameterization, except for the error variances, which no longer showed meaningful bias.

Next we consider the situation in which all converged datasets are included in the analysis. The rationale for including improper solutions is based on the ideas that (1) this situation may be quite common in practical modeling settings and (2) parameter estimates from these solutions are required to obtain unbiased summaries of the estimator's sampling variability.
For the MIIV estimator, a larger amount of bias in the point estimates was observed when all converged datasets were included in the analysis. This bias was nontrivial at the smallest sample size of 100 for the factor loadings and variance parameters under the standard parameterization, and for the factor loadings, regression coefficients and error variances under the alternative parameterization. For the WLSMV estimator the factor variance parameters exhibited problematic levels of bias at sample sizes of 100 and 200, as did the error variances at the smallest sample size of 100 under the standard parameterization only.

Standard Errors.
Again we begin by examining the sample of datasets where solutions with nonpositive definite matrices have been excluded. For the MIIV estimator the bias of the standard errors speaks to both the performance of the estimator and the accuracy of the analytic results derived earlier. Under the standard parameterization, standard error bias exceeded nominal levels for the thresholds and variance parameters at the smallest sample size, and for the error variances at a sample size of 200. Under the alternative parameterization only the thresholds at the smallest sample size exhibited problematic bias. Compared to the MIIV estimator, WLSMV exhibited a greater degree of standard error bias at smaller sample sizes. This finding was consistent across parameter types, and the bias was problematic for the majority of parameters at a sample size of 100.

Including all converged datasets in the analysis increased the standard error bias for both estimators. For the MIIV estimator, the regression coefficients and variance parameters exhibited problematic bias at the two smallest sample sizes across both parameterizations. Under the alternative parameterization the MIIV standard errors for the thresholds, intercepts and factor loadings also showed nontrivial bias at the two smallest sample sizes. The factor variance standard errors were the only MIIV estimates across all simulation conditions to show appreciable bias at a sample size of 400 or larger. When including all converged datasets the WLSMV estimator also showed an increase in SE bias across the two parameterizations. Under both the standard and alternative parameterizations the factor loadings, regression coefficients, and variance parameters showed problematic bias at sample sizes of 100 and 200. Under the alternative parameterization only, the WLSMV intercepts and thresholds also showed problematic bias at the smaller sample sizes.
For the WLSMV estimator the onlyparameters that exhibited problematic bias at sample sizes of 400 or greater were theregression coefficient and factor variance standard errors.
Instrument Strength.
Equation-level instrument strength varied across all replications included in our simulations. For this reason it is possible to examine the impact of instrument strength on outcome measures such as relative bias. In Figure 2, instrument strength as indexed by Shea's partial $R^2$ is aggregated within each parameter and presented across sample sizes. From this illustration it is clear that smaller partial $R^2$ values are associated with more variability in bias, and this is most apparent at the smaller sample sizes. The variability in the percentage of relative bias also tends to decrease as the partial $R^2$ increases at each sample size, though this is less noticeable at the largest sample sizes. Unfortunately, as the sampling distribution of Shea's partial $R^2$ has not been derived, it is not possible to examine the impact of instrument strength on inferential test size in our simulations.
Discussion

Jin and Cao (2018) recently reviewed the evidence comparing the PIV estimator to DWLS and ULS for factor analysis models and concluded that the PIV approach produces estimates as accurate as ULS and DWLS if the model is correctly specified and more robust estimates if the model is misspecified. We agree, but note that the PIV estimator has several restrictions which, if removed, would make it an even more powerful alternative. Among these is that Bollen and Maydeu-Olivares (2007) developed the PIV estimator for only binary and ordinal endogenous variables and not a mixture of categorical and continuous variables. Also, they, and the researchers who followed them, used numerical derivatives to develop the asymptotic covariance matrix of the parameter estimates rather than analytic derivatives. Analytic derivatives are valuable because they can increase both the efficiency of the estimation and the accuracy of the estimates. In addition, with the analytic derivatives in hand, the PIV estimator can be applied to any means and covariance matrix used as input along with their asymptotic covariance matrix. Furthermore, nearly all of the work on polychoric correlations has focused on correlations rather than means, variances, and covariances. An additional property that would be desirable for both WLSMV and PIV is the ability to input the means, variances, and covariances of the underlying variables. Researchers using traditional SEM with continuous endogenous variables can estimate models based on the covariance matrix and means; using these as input facilitates comparisons of effects both over time and across different groups.

Our paper overcomes all of these restrictions. We develop the PIV estimator to permit a mixture of binary, ordinal, or continuous endogenous variables.
We use differential matrix calculus to derive the analytic derivatives for a general set of patterned matrices, which permits us to use the multivariate delta method to obtain the asymptotic standard errors of the parameter estimates. In addition, we pick up a thread originally elaborated for SEM by Jöreskog (2002) regarding the construction of ordinal variable indicators. To our knowledge we are the first to realize and detail the full generality of this idea. Jöreskog (2002) originally considered the estimation of mean and variance parameters for the underlying variable of an ordinal variable whose first two thresholds are constrained to zero and one, respectively. The choice of the first two thresholds and the fixed constants of zero and one is often convenient, as it sets a unit scale for the threshold values; however, as demonstrated in the empirical example, a number of other possibilities are both useful and feasible. Our general method enables the estimation of means, variances, and covariances for the underlying variables of ordinal observed variables with three or more categories along with continuous variables. By so doing, we create a mean vector and covariance matrix as is common when researchers analyze all continuous variables. As part of this, we use methods that allow analysts to input known threshold values. This permits researchers to take advantage of the cutoff points (e.g., 8 years of education, less than high school, etc.) for ordinal variables that are collapsed versions of continuous variables. By providing the algebraic transformations required for obtaining these parameterizations we hope to contribute to additional work on ordinal variable parameterizations, including multiple group models where measurement invariance testing often requires complicated parameter constraints. Furthermore, our matrix derivatives take this scaling flexibility into account.
We note that analysts can use this method of generating means, variances, and covariances with continuous and ordinal variables with the DWLS and ULS methods as well as with PIV. In a simulation study we have shown the performance of the proposed point estimator and standard errors to be equivalent to the popular WLSMV estimator across a number of sample sizes and correct model parameterizations. Importantly, we have only considered correctly specified models in our simulation study. In doing so we have ignored the scenarios most likely encountered in applied research (model misspecification) and also the conditions under which the MIIV estimator is most likely to demonstrate superior performance over the system-wide estimators. Our empirical example using education in continuous and ordinal forms allowed us to enter fixed thresholds (e.g., 8 years, 12 years) as well as to estimate the thresholds that correspond to other ordinal categories where the years of education are uncertain.

Finally, the proposed developments not only support the current goal of accommodating mixed data types but also support future extensions of the MIIV framework. Other researchers can use the analytic derivatives we present in a number of important situations not yet considered in the MIIV framework, such as the handling of missing data in a manner similar to that proposed by Yuan and Bentler (2000) and Savalei and Falk (2014), and the development of accurate standard errors for data with complex dependencies such as time-series data. For this reason, and the extensions to multiple-group modeling discussed earlier, we view the developments made here as an important building block for future development of the PIV estimator.
Appendix A: Notation and Algebraic Results

Vec and related operators.
Our derivations make use of the vec and related operators for transforming a matrix into a vector. Although these operators are commonly encountered in multivariate analysis, their representations vary considerably by author. For this reason we provide definitions corresponding to our own usage. For a $p \times q$ matrix $X$, the vec operator stacks the $q$ columns of $X$ columnwise into a $pq \times 1$ vector; if $a_{1}, \ldots, a_{q}$ are the columns of $A_{p,q}$ taken in order, then $\mathrm{vec}\, A_{p,q} = [a_{1}^{\prime}, \ldots, a_{q}^{\prime}]^{\prime}$.

The $\upsilon(\cdot)$ operator is also used heavily in these derivations. Here $\upsilon(\cdot)$ can be understood as a generalization of the vech($\cdot$) (vector-half) operator for symmetric matrices, or the vecp($\cdot$) operator for strictly lower-triangular matrices, to any patterned matrix. A $p \times q$ matrix is labeled patterned if it contains $p^{*} = pq - s - v$ mathematically independent and variable elements, where $s$ and $v$ are the numbers of repeated and constant elements, respectively. Covariance and correlation matrices are two examples of patterned matrices. For a $p \times p$ covariance matrix $S$, $p^{*} = \frac{1}{2}p(p+1)$, $s = \frac{1}{2}p(p-1)$, and $v = 0$. Likewise, for a $p \times p$ correlation matrix $P$, $p^{*} = \frac{1}{2}p(p-1)$, $s = \frac{1}{2}p(p-1)$, and $v = p$. Consider a $2 \times 2$ covariance matrix $S$; then $\upsilon(S) = [s_{1,1}, s_{2,1}, s_{2,2}]^{\prime}$. Similarly, for a $2 \times 2$ correlation matrix $P$, $\upsilon(P) = [p_{2,1}]$. Generally, for any patterned matrix $X_{p,q}$, $\upsilon(X)$ will be a $p^{*} \times 1$ vector.
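In numerical work the two operators amount to index bookkeeping. A small sketch for a symmetric matrix, where $\upsilon(\cdot)$ reduces to vech:

```python
import numpy as np

S = np.array([[2.0, 0.5],
              [0.5, 1.0]])

# vec: stack the columns (column-major order)
vecS = S.flatten(order='F')                     # [2.0, 0.5, 0.5, 1.0]

# vech: unique lower-triangular elements taken column by column
r, c = np.triu_indices(S.shape[0])              # (i, j) index pairs with i <= j
vechS = S[c, r]                                 # [s11, s21, s22]
```

Here $p^{*} = 3$, matching $\frac{1}{2}p(p+1)$ for $p = 2$.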
The Kronecker product. For the matrices $X_{p,q}$ and $A_{r,s}$, we define the Kronecker product as $X \otimes A = (x_{i,j}A)_{pr \times qs}$, for $i = 1, \ldots, p$ and $j = 1, \ldots, q$. A useful result linking the Kronecker product to the vec operator states that $\mathrm{vec}(A_{m,n} B_{n,q} C_{q,r}) = (C^{\prime} \otimes A)\, \mathrm{vec}\, B$. Other properties of the Kronecker product utilized in these derivations for simplifying the resultant expressions are $(A \otimes B)(C \otimes D) = AC \otimes BD$ and $(A \otimes B)^{\prime} = A^{\prime} \otimes B^{\prime}$.
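The identity linking vec and the Kronecker product is easy to verify numerically with arbitrary conformable matrices:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(2, 3))
B = rng.normal(size=(3, 4))
C = rng.normal(size=(4, 2))

vec = lambda M: M.flatten(order='F')   # column-major vec

lhs = vec(A @ B @ C)
rhs = np.kron(C.T, A) @ vec(B)
# lhs and rhs coincide: vec(ABC) = (C' kron A) vec(B)
```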
The commutation matrix. Commutation (or vec-permutation) matrices translate between the vectors $\mathrm{vec}\, X$ and $\mathrm{vec}\, X^{\prime}$. The commutation matrix $K_{p,q}$ is defined such that $\mathrm{vec}\, X^{\prime}_{p,q} = K_{p,q}\, \mathrm{vec}\, X$. The commutation matrix plays a central role in the formulation of matrix derivatives using the vec operator, and the following derivative is used throughout: for an $m \times n$ matrix $X$, $\partial\, \mathrm{vec}(X^{\prime})/\partial\, \mathrm{vec}(X)^{\prime} = K_{m,n}$. Note also that $\partial\, \mathrm{vec}(X)/\partial\, \mathrm{vec}(X)^{\prime} = I_{mn}$.
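A direct construction of $K_{p,q}$ from its definition, with a check that it maps $\mathrm{vec}\, X$ to $\mathrm{vec}\, X^{\prime}$:

```python
import numpy as np

def commutation(p, q):
    # K_{p,q} satisfies K_{p,q} vec(X) = vec(X') for any p x q matrix X.
    # With column-major vec: vec(X)[j*p + i] = X[i, j] and vec(X')[i*q + j] = X[i, j].
    K = np.zeros((p * q, p * q))
    for i in range(p):
        for j in range(q):
            K[i * q + j, j * p + i] = 1.0
    return K

X = np.arange(6.0).reshape(2, 3)
vec = lambda M: M.flatten(order='F')
# commutation(2, 3) @ vec(X) equals vec(X.T)
```

Since $K_{p,q}$ is a permutation matrix, $K_{p,q}K_{q,p} = I$, which makes it cheap to apply in the Jacobian expressions above.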
Matrix Derivatives and L-Structured Matrices. The results herein require taking partial derivatives with respect to lower-triangular, strictly lower-triangular, diagonal and arbitrarily patterned matrices. Furthermore, the solution matrices resulting from these matrix derivatives are themselves often known a priori to be symmetric or patterned. For these reasons we rely on a number of results detailed by Magnus (1983) and Magnus and Neudecker (1986) for L-structured matrices. The use of L-structures allows us to derive our results in the most general way possible across the different parameterizations available. The following properties of L-structured matrices are used throughout: $\mathrm{vec}(X) = \Delta\, \upsilon(X)$ and $\upsilon(X) = \Delta^{+}\, \mathrm{vec}(X)$.

It is useful to consider $\Delta$ as a generalized duplication matrix, and $\Delta^{+}$ as a generalized elimination matrix. If $X$ is symmetric, $\Delta$ is $p^{2} \times \frac{1}{2}p(p+1)$, while $\Delta$ is $p^{2} \times \frac{1}{2}p(p-1)$ if $X$ is strictly lower-triangular. The most interesting case occurs when $X$ exhibits an arbitrary constellation of free, fixed and repeating elements. In this case a general method is needed for constructing $\Delta$ and $\Delta^{+}$ when the specific patterning of $X$ is unknown prior to the analysis. Fortunately, a result for this specific case was derived by Nel (1980, Definition 6.1.1). In the case of arbitrary patterning $\Delta$ is $p^{2} \times p^{*}$. Extending these properties to the case of matrix derivatives, it can be shown that

$\dfrac{\partial\, \mathrm{vec}(X)}{\partial\, \upsilon(X)^{\prime}} = \Delta$ and $\dfrac{\partial\, \upsilon(X)}{\partial\, \mathrm{vec}(X)^{\prime}} = \Delta^{+}$.

It follows that if the matrix function $Z = f(X)$,

$\dfrac{\partial\, \mathrm{vec}(Z)}{\partial\, \upsilon(X)^{\prime}} = \dfrac{\partial\, \mathrm{vec}(Z)}{\partial\, \mathrm{vec}(X)^{\prime}} \dfrac{\partial\, \mathrm{vec}(X)}{\partial\, \upsilon(X)^{\prime}} = \dfrac{\partial\, \mathrm{vec}(Z)}{\partial\, \mathrm{vec}(X)^{\prime}} \Delta,$ (38)

and if $Z$ is also patterned,

$\dfrac{\partial\, \upsilon(Z)}{\partial\, \upsilon(X)^{\prime}} = \dfrac{\partial\, \upsilon(Z)}{\partial\, \mathrm{vec}(Z)^{\prime}} \dfrac{\partial\, \mathrm{vec}(Z)}{\partial\, \mathrm{vec}(X)^{\prime}} \dfrac{\partial\, \mathrm{vec}(X)}{\partial\, \upsilon(X)^{\prime}} = \Delta^{+} \dfrac{\partial\, \mathrm{vec}(Z)}{\partial\, \mathrm{vec}(X)^{\prime}} \Delta.$ (39)
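For the symmetric case, the duplication matrix $\Delta$ and a working $\Delta^{+}$ can be built directly, as sketched below; general patterned matrices would instead follow Nel's (1980) construction, which is not reproduced here:

```python
import numpy as np

def duplication(p):
    # Delta satisfies vec(X) = Delta @ vech(X) for symmetric p x p X,
    # with vech taking the lower triangle column by column.
    pairs = [(i, j) for j in range(p) for i in range(j, p)]   # vech ordering
    pos = {pair: k for k, pair in enumerate(pairs)}
    D = np.zeros((p * p, len(pairs)))
    for j in range(p):
        for i in range(p):
            D[j * p + i, pos[(max(i, j), min(i, j))]] = 1.0
    return D

p = 3
D = duplication(p)
D_plus = np.linalg.pinv(D)        # Moore-Penrose inverse acts as the elimination map

S = np.array([[2.0, 0.3, 0.1],
              [0.3, 1.0, 0.4],
              [0.1, 0.4, 1.5]])
vec = lambda M: M.flatten(order='F')
vechS = D_plus @ vec(S)           # unique elements of S
# D @ vechS recovers vec(S)
```

For a symmetric argument, $\Delta^{+}\mathrm{vec}(S)$ averages the duplicated off-diagonal entries, which reproduces vech($S$) exactly.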
Derivatives of Common Matrix Functions. The following derivatives are used throughout and are restated here for clarity. For the following results suppose $X$ is $m \times n$, $U$ is $p \times q$, and $V$ is $q \times r$, where both $U$ and $V$ are matrix functions of $X$. Then

$\dfrac{\partial\, \mathrm{vec}(U + V)}{\partial\, \mathrm{vec}(X)^{\prime}} = \dfrac{\partial\, \mathrm{vec}(U)}{\partial\, \mathrm{vec}(X)^{\prime}} + \dfrac{\partial\, \mathrm{vec}(V)}{\partial\, \mathrm{vec}(X)^{\prime}},$ (40)

and

$\dfrac{\partial\, \mathrm{vec}(UV)}{\partial\, \mathrm{vec}(X)^{\prime}} = (V^{\prime} \otimes I_{p}) \dfrac{\partial\, \mathrm{vec}(U)}{\partial\, \mathrm{vec}(X)^{\prime}} + (I_{r} \otimes U) \dfrac{\partial\, \mathrm{vec}(V)}{\partial\, \mathrm{vec}(X)^{\prime}}.$ (41)

In addition we state a general rule for taking derivatives of matrix inverses: if $Y = X^{-1}$, then

$\dfrac{\partial\, \mathrm{vec}(Y)}{\partial\, \mathrm{vec}(X)^{\prime}} = -(X^{-1\prime} \otimes X^{-1}).$ (42)

Now suppose $X$ is a symmetric matrix, $A$ is a matrix of constants, and $Y = XAX$; using successive applications of (41) we can show that

$\dfrac{\partial\, \mathrm{vec}(Y)}{\partial\, \mathrm{vec}(X)^{\prime}} = (XA^{\prime} \otimes I) + (I \otimes XA).$ (43)

Suppose instead that $A$ and $B$ are matrices containing constant elements and $Y = A X^{-1} B$; then

$\dfrac{\partial\, \mathrm{vec}(Y)}{\partial\, \mathrm{vec}(X)^{\prime}} = -(B^{\prime} X^{-1\prime} \otimes A X^{-1}).$ (44)

References

Arminger, G. and Küsters, U. (1988). Latent Trait Models with Indicators of Mixed Measurement Level. In Langeheine, R. and Rost, J., editors, Latent Trait and Latent Class Models, pages 51–73. Springer US, Boston, MA.

Bock, D. R. (1972). Estimating item parameters and latent ability when responses are scored in two or more nominal categories.
Psychometrika, 37(1):29–51.

Bollen, K. A. (1996). An alternative two stage least squares (2SLS) estimator for latent variable equations. Psychometrika, 61(1):109–121.

Bollen, K. A., Kolenikov, S., and Bauldry, S. (2014). Model-Implied Instrumental Variable-Generalized Method of Moments (MIIV-GMM) Estimators for Latent Variable Models. Psychometrika, 79(1):20–50.

Bollen, K. A. and Maydeu-Olivares, A. (2007). A Polychoric Instrumental Variable (PIV) Estimator for Structural Equation Models with Categorical Variables. Psychometrika, 72(3):309.

Cragg, J. G. and Donald, S. G. (1993). Testing Identifiability and Specification in Instrumental Variable Models. Econometric Theory, 9(2):222–240.

Cramér, H. (1999). Mathematical Methods of Statistics. Princeton University Press.

Fisher, Z., Bollen, K., Gates, K., and Rönkkö, M. (2017). MIIVsem: Model implied instrumental variable (MIIV) estimation of structural equation models. R package version 0.5.2.

Fisher, Z. F., Bollen, K. A., and Gates, K. M. (2019). A limited information estimator for dynamic factor models.

Hansen, L. P. (1982). Large sample properties of generalized method of moments estimators. Econometrica, 50(4):1029–54.

Hayashi, F. (2011). Econometrics. Princeton University Press.

Jin, S. and Cao, C. (2018). Selecting polychoric instrumental variables in confirmatory factor analysis: An alternative specification test and effects of instrumental variables. British Journal of Mathematical and Statistical Psychology, 71(2).

Jin, S., Luo, H., and Yang-Wallentin, F. (2016). A Simulation Study of Polychoric Instrumental Variable Estimation in Structural Equation Models. Structural Equation Modeling: A Multidisciplinary Journal, 23(5):680–694.

Jöreskog, K. G. (1994). On the estimation of polychoric correlations and their asymptotic covariance matrix. Psychometrika, 59(3).

Jöreskog, K. G. (2002). Structural Equation Modeling with Ordinal Variables using LISREL.

Katsikatsou, M., Moustaki, I., Yang-Wallentin, F., and Jöreskog, K. G. (2012). Pairwise likelihood estimation for factor analysis models with ordinal data. Computational Statistics & Data Analysis, 56(12):4243–4258.

Kirby, J. B. and Bollen, K. A. (2009). Using Instrumental Variable (IV) Tests to Evaluate Model Specification in Latent Variable Structural Equation Models. Sociological Methodology, 39(1):327–355.

Lazarsfeld, P. F. (1950). The logical and mathematical foundation of latent structure analysis. Studies in Social Psychology in World War II Vol. IV: Measurement and Prediction, pages 362–412.

Lee, S.-Y., Poon, W.-Y., and Bentler, P. M. (1995). A two-stage estimation of structural equation models with continuous and polytomous variables. British Journal of Mathematical and Statistical Psychology, 48(2):339–358.

Lord, F., Novick, M., and Birnbaum, A. (1968). Statistical theories of mental test scores. Addison-Wesley, Oxford, England.

Magnus, J. R. (1983). L-structured matrices and linear matrix equations. Linear and Multilinear Algebra, 14(1):67–88.

Magnus, J. R. and Neudecker, H. (1986). Symmetry, 0-1 Matrices and Jacobians: A Review. Econometric Theory, 2(2):157–190.

Monroe, S. (2018). Contributions to Estimation of Polychoric Correlations. Multivariate Behavioral Research, 53(2):247–266.

Muthén, B. (1978). Contributions to factor analysis of dichotomous variables. Psychometrika, 43(4):551–560.

Muthén, B. (1984). A general structural equation model with dichotomous, ordered categorical, and continuous latent variable indicators. Psychometrika, 49(1):115–132.

Muthén, B. (1993). Latent Trait Models with Indicators of Mixed Measurement Level. Testing structural equation models. Sage Publications, Inc, Thousand Oaks, CA, US.

Muthén, B. and Satorra, A. (1995). Technical aspects of Muthén's LISCOMP approach to estimation of latent variable relations with a comprehensive measurement model. Psychometrika, 60(4):489–503.

Nel, D. G. (1980). On matrix differentiation in statistics. South African Statistical Journal, 14(2):137–193.

Nestler, S. (2013). A Monte Carlo study comparing PIV, ULS and DWLS in the estimation of dichotomous confirmatory factor analysis. British Journal of Mathematical and Statistical Psychology, 66(1):127–143.

Olsson, U. (1979). Maximum likelihood estimation of the polychoric correlation coefficient. Psychometrika, 44(4):443–460.

Phillips, P. C. B. (1989). Partially Identified Econometric Models. Econometric Theory, 5(2):181–240.

Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48(2):1–36.

Savalei, V. and Falk, C. F. (2014). Robust Two-Stage Approach Outperforms Robust Full Information Maximum Likelihood With Incomplete Nonnormal Data. Structural Equation Modeling: A Multidisciplinary Journal, 21(2):280–302.

Shea, J. (1997). Instrument Relevance in Multivariate Linear Models: A Simple Measure. The Review of Economics and Statistics, 79(2):348–352.

Stock, J. H. and Yogo, M. (2005). Testing for Weak Instruments in Linear IV Regression. In Stock, J. H. and Andrews, D. W. K., editors, Identification and Inference for Econometric Models: Essays in Honor of Thomas J. Rothenberg. Cambridge University Press, Cambridge, UK.

Thurstone, L. (1925). A method of scaling psychological and educational tests. Journal of Educational Psychology, 16(7):433–451.

Yang-Wallentin, F., Jöreskog, K. G., and Luo, H. (2010). Confirmatory Factor Analysis of Ordinal Variables With Misspecified Models. Structural Equation Modeling: A Multidisciplinary Journal, 17(3):392–423.

Yuan, K.-H. and Bentler, P. M. (2000). Three likelihood-based methods for mean and covariance structure analysis with nonnormal missing data. Sociological Methodology, 30:165.

Table 1
Description of estimation algorithm.
Parameterization          Step  Details
Standard and Alternative  1a    Obtain consistent estimates of the sample means, thresholds, variances, covariances and correlations, ˆω = [ˆµ_y*, ˆτ, υ(ˆΣ_y*)], where Σ_y* is given in (3) and the patterned matrix operator υ(·) is defined previously.
                          1b    Obtain an estimate of the large sample variances of ω from Step 1a, Var(ω).
Alternative Only          2a    Obtain consistent estimates of the unconstrained mean vector, transformed thresholds, and unconstrained variances and covariances, ˆπ = [ˆτ̈, ˆµ̈_y*, υ(ˆΣ̈_y*)], where the elements of π are given in (9)–(11).
                          2b    Obtain an estimate of the large sample variances of π from Step 2a, Var(π).
Standard and Alternative  3a    Obtain consistent estimates of the intercepts, factor loadings and regression parameters in θ, ˆθ = [υ(ˆα_η), υ(ˆα_y*), υ(Λ), υ(B)], as described in (28).
                          3b    Obtain consistent estimates of the variance parameters in θ, ˆθ = [ˆΣ_ζ, ˆΣ_ε], as given in (31).
                          3c    Obtain estimates of the large sample variance of the estimates obtained in 3a, Var(θ), given in (32).
                          3d    Obtain estimates of the large sample variance of the estimates obtained in 3b, Var(θ), given in (33).
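To make the moment form of Step 3a concrete, the following sketch shows how a MIIV estimate reduces to a ratio of estimated covariances in the simplest possible case: a one-factor model with three indicators. All numeric values here are hypothetical and chosen only for illustration; they are not taken from the paper's empirical example or simulation.

```python
import numpy as np

# Assumed one-factor model for y1, y2, y3 with loading on y1 fixed to 1
# for scaling; hypothetical loadings, factor variance, unique variances.
lam2, lam3, psi = 0.8, 0.6, 1.0
theta = np.array([0.51, 0.36, 0.64])

# Model-implied covariance matrix of (y1, y2, y3).
lam = np.array([1.0, lam2, lam3])
Sigma = psi * np.outer(lam, lam) + np.diag(theta)

# For the equation y2 = lam2*y1 + u, y3 serves as a model-implied
# instrument, and the IV estimator of lam2 in moment form is the ratio
# cov(y3, y2) / cov(y3, y1), evaluated at the Step 1 estimates.
lam2_hat = Sigma[2, 1] / Sigma[2, 0]
print(round(lam2_hat, 3))  # 0.8
```

In practice the covariances in the ratio come from the Step 1 polychoric/polyserial estimates rather than a known population matrix, but the algebra is the same.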
Empirical Example
Estimators: WLSMV and MIIV. For each parameter (factor loadings λ, regression coefficients β, thresholds τ, intercepts α, and variance parameters ζ and ε), the table reports the estimate (Est.), standard error (Std. Err.), and Shea's partial R² (R²_S).

[Numeric entries of Table 2 lost in extraction.]

Note. For the padeg equation, all items except paeduc were used as MIIVs. For madeg, all items except maeduc were used as MIIVs. For chdeg, all items except cheduc were used as MIIVs. For the CE equation, all items except cheduc and chdeg served as MIIVs. R²_S indicates Shea's partial R², an index of instrument strength.

Table 3
Percentage of Nonconverging and Nonpositive Definite Solutions
Nonconverged or Nonpositive Definite (%)

              Standard Parameterization              Alternative Parameterization
              Nonconverged    Nonpositive Definite   Nonconverged    Nonpositive Definite
Sample Size   MIIV   WLSMV    MIIV   WLSMV           MIIV   WLSMV    MIIV   WLSMV
100           0.0    4.5      45.6   54.5            0.0    15.3     46.2   48.1
200           0.0    0.3      24.1   26.6            0.0    0.8      23.7   26.6
400           0.0    0.0      9.3    7.4             0.0    0.0      9.0    7.4
800           0.0    0.0      2.9    1.7             0.0    0.0      2.8    1.7
3200          0.0    0.0      0.0    0.0             0.0    0.0      0.0    0.0
Table 4
Simulation Results.
Percentage of Relative Bias

[Table 4 reports the percentage of relative bias of the MIIV and WLSMV point estimates and standard errors at sample sizes of 100, 200, 400, 800, and 3,200, in three panels: standard parameterization with NPD datasets excluded, alternative parameterization with NPD datasets excluded, and alternative parameterization for all converged datasets. Rows cover α_η, α_y(o), τ, Λ_y(c), Λ_y(o), B, Σ_ε y(c), Σ_ε y(o), and Σ_ζ; the cell values were lost in extraction.]

Note. y(c) denotes parameters corresponding to continuous variables and y(o) denotes parameters corresponding to ordered categorical variables. Relative bias could not be calculated for the latent variable intercepts under the standard parameterization, as the population value is zero.
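The summary measure tabulated in Table 4 is percentage relative bias, 100 × (mean estimate across replications − true value) / true value. A minimal sketch, with hypothetical replication values:

```python
import numpy as np

def pct_relative_bias(estimates, true_value):
    # Percentage relative bias of a set of replication estimates.
    if true_value == 0:
        # Undefined at zero, which is why the latent variable intercepts
        # under the standard parameterization are omitted from the table.
        raise ValueError("relative bias is undefined for a zero population value")
    return 100.0 * (np.mean(estimates) - true_value) / true_value

# Hypothetical estimates of a loading whose population value is 0.8:
print(round(pct_relative_bias([0.76, 0.78, 0.82, 0.76], 0.8), 1))  # -2.5
```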
Figure 1. Path Diagram for Empirical Example

[Figure: five panels labeled N=100, N=200, N=400, N=800, and N=3200; horizontal axes 0.0–0.5, vertical axes −10 to 10.]